Re: [Resin-interest] Logrotation and NFS

2016-02-12 Thread Jens Dueholm Christensen
Hi Scott

Thank you for confirming that.

On a side notice, what do I do in regards to a feature request?

As far as I can see there is no way of controlling the 
programname/tag/application (3 names for the same field in a syslog message), 
but only facility and severity.

Unless I begin overloading facility and severity and filter just on those two 
fields in the message, I cannot easily filter 10+ different systems running in 
20+ different resin instances to  a single file per instance – let alone a 
single log for each system now that syslog is in play.

As far as I can tell from looking at src/resin/jni_vfs.c all syslog messages 
will be comming from “Resin” (there’s a call to “openlog("Resin", 0, 
LOG_DAEMON);” and then just calls to “syslog(priority, "%s", buffer);” , which 
makes it a bit hard to filter unless I change all existing logging to prepend a 
key for easier sorting – or create individual patched Resin versions per 
application with different names..


My feature request is that it should be possible to specify 
programname/tag/application as well as hostname via XML tags like so:



 
daemon
notice
my_application
my_non_FQDN
 


Regards,
Jens Dueholm Christensen
Survey IT

P +45 5161 7879
j...@ramboll.com<mailto:j...@ramboll.com>


Rambøll
Olof Palmes Allé 20
DK-8200 Aarhus N
Denmark
www.ramboll.dk<http://www.ramboll-management.dk/>

From: resin-interest-boun...@caucho.com 
[mailto:resin-interest-boun...@caucho.com] On Behalf Of Scott Ferguson
Sent: Wednesday, February 10, 2016 7:04 PM
To: resin-interest@caucho.com
Subject: Re: [Resin-interest] Logrotation and NFS

On 2/8/16 5:01 AM, Jens Dueholm Christensen wrote:
Hi

Resin license #1016608

Recently we’ve migrated to a ceph-based storage setup where we use NFS to allow 
our multiple Resin instances to write their logs directly to Ceph over NFS.

The logging did change between Resin 3 and Resin 4 mostly to improve threading, 
but the log rotation is essentially the same. (As I remember.) So if NFS stalls 
during log rotation, it's still going to be a problem. The log is written 
assuming the rotation might be slow (and avoids blocking other threads while 
it's happening), but it does assume the rotation will complete and not freeze.

-- Scott



However, on 2 of our client-machines we see NFS stalls and retransmissions a 
few minutes past midnight for some files.

So far I’ve managed to write off posibility after posibility and right now I’m 
stuck a few last possibilities, and the next one in line is the way Resin does 
logrotation.

A poll of open files each second (via lsof output) shows this where out stdout 
logfile gets rotated:

COMMAND PID  USER   FD  TYPE   DEVICE   SIZE/OFF
   NODE NAME
Sun Feb  7 23:59:58 CET 2016
java  30865   results  238w  REG   0,25 1215099201 
1073774597 
/nfsdata-web11nfs/results/logs/web9_results_backend1/resin-web-stdout.log
Sun Feb  7 23:59:59 CET 2016
java  30865   results  238w  REG   0,25 1215100254 
1073774597 
/nfsdata-web11nfs/results/logs/web9_results_backend1/resin-web-stdout.log
Mon Feb  8 00:00:00 CET 2016
java  30865   results  107w  REG   0,25  233381888 
1073774632 
/nfsdata-web11nfs/results/logs/web9_results_backend1/resin-web-stdout.log.20160207
java  30865   results  118r  REG   0,25 1215103872 
1073774597 
/nfsdata-web11nfs/results/logs/web9_results_backend1/resin-web-stdout.log
Mon Feb  8 00:00:04 CET 2016
java  30865   results  107w  REG   0,25  929890304 
1073774632 
/nfsdata-web11nfs/results/logs/web9_results_backend1/resin-web-stdout.log.20160207
java  30865   results  118r  REG   0,25 1215103872 
1073774597 
/nfsdata-web11nfs/results/logs/web9_results_backend1/resin-web-stdout.log
Mon Feb  8 00:00:55 CET 2016
java  30865   results  161r  REG   0,251439432 
1073774597 
/nfsdata-web11nfs/results/logs/web9_results_backend1/resin-web-stdout.log

All day on the 7th the logfile is opened on filehandle 238 with write access 
(238w in the FD column)
Then around midnight a new file is created named resin-web-stdout.log.20160207 
on filehandle 107 with write permission (107w in the FD column).
At the same time it seems resin-web-stdout.log is opened with a new filehandle 
and only with read permission (118r).

Then over time data is copied from the file with handle 118r to 107w – the 
size/offset numbers increase between 00:00:00 and 00:00:04.

The stall I experience happens between 00:00:04 and 00:00:55 where lsof output 
stalls (the man-page mentions that lstat(2), readlink(2), and stat(2) calls are 
blocked if the NFS server is unresponsive).

Other open files in other Resin instances running on the same server does not 
see these stalls – nor does other client-machines, so I am confide

Re: [Resin-interest] Logrotation and NFS

2016-02-10 Thread Scott Ferguson

On 2/8/16 5:01 AM, Jens Dueholm Christensen wrote:


Hi

Resin license #1016608

Recently we’ve migrated to a ceph-based storage setup where we use NFS 
to allow our multiple Resin instances to write their logs directly to 
Ceph over NFS.




The logging did change between Resin 3 and Resin 4 mostly to improve 
threading, but the log rotation is essentially the same. (As I 
remember.) So if NFS stalls during log rotation, it's still going to be 
a problem. The log is written assuming the rotation might be slow (and 
avoids blocking other threads while it's happening), but it does assume 
the rotation will complete and not freeze.


-- Scott

However, on 2 of our client-machines we see NFS stalls and 
retransmissions a few minutes past midnight for some files.


So far I’ve managed to write off posibility after posibility and right 
now I’m stuck a few last possibilities, and the next one in line is 
the way Resin does logrotation.


A poll of open files each second (via lsof output) shows this where 
out stdout logfile gets rotated:


COMMAND PID  USER   FD  TYPE   DEVICE   
SIZE/OFF   NODE NAME


Sun Feb  7 23:59:58 CET 2016

java  30865   results  238w REG   0,25 1215099201 
1073774597 
/nfsdata-web11nfs/results/logs/web9_results_backend1/resin-web-stdout.log


Sun Feb  7 23:59:59 CET 2016

java  30865   results  238w REG   0,25 1215100254 
1073774597 
/nfsdata-web11nfs/results/logs/web9_results_backend1/resin-web-stdout.log


Mon Feb  8 00:00:00 CET 2016

java  30865   results  107w  REG   0,25  
233381888 1073774632 
/nfsdata-web11nfs/results/logs/web9_results_backend1/resin-web-stdout.log.20160207


java  30865   results  118r REG   0,25 1215103872 
1073774597 
/nfsdata-web11nfs/results/logs/web9_results_backend1/resin-web-stdout.log


Mon Feb  8 00:00:04 CET 2016

java  30865   results  107w REG   0,25  929890304 
1073774632 
/nfsdata-web11nfs/results/logs/web9_results_backend1/resin-web-stdout.log.20160207


java  30865   results  118r REG   0,25 1215103872 
1073774597 
/nfsdata-web11nfs/results/logs/web9_results_backend1/resin-web-stdout.log


Mon Feb  8 00:00:55 CET 2016

java  30865   results  161r REG   0,251439432 
1073774597 
/nfsdata-web11nfs/results/logs/web9_results_backend1/resin-web-stdout.log


All day on the 7^th the logfile is opened on filehandle 238 with write 
access (238w in the FD column)


Then around midnight a new file is created named 
resin-web-stdout.log.20160207 on filehandle 107 with write permission 
(107w in the FD column).


At the same time it seems resin-web-stdout.log is opened with a new 
filehandle and only with read permission (118r).


Then over time data is copied from the file with handle 118r to 107w – 
the size/offset numbers increase between 00:00:00 and 00:00:04.


The stall I experience happens between 00:00:04 and 00:00:55 where 
lsof output stalls (the man-page mentions that lstat(2), readlink(2), 
and stat(2) calls are blocked if the NFS server is unresponsive).


Other open files in other Resin instances /running on the same server/ 
does not see these stalls – nor does other client-machines, so I am 
confident that our NFS-server is alive and answering requests.


Now for the real questions:

Since we are running an ancient version of Resin (3.1.13 and 3.1.14), 
have the way of rotating logfiles changed in Resin 4.0 or later?


Are the stalls I’m seeing (I assume it could just as well happen 
locally if a disksubsystem was slow enough) a known issue that has 
been fixed in Resin 4.0 or later?


Right now the best way of solving this is to change to syslog logging, 
but since we already have a roadmap for Resin 4 planned for later this 
year (yeay – finally!) this might just be enough for us to move it up 
to the top of our list.


Regards,

*Jens Dueholm Christensen
*Survey IT

P +45 5161 7879

j...@ramboll.com 



Rambøll

Olof Palmes Allé 20

DK-8200 Aarhus N

Denmark

www.ramboll.dk 



___
resin-interest mailing list
resin-interest@caucho.com
http://maillist.caucho.com/mailman/listinfo/resin-interest


___
resin-interest mailing list
resin-interest@caucho.com
http://maillist.caucho.com/mailman/listinfo/resin-interest