Hi all!

Recently we experienced a "nice" lockup of our web nodes that do
remote TCP logging to a centralized syslog server.

The central rsyslog server was shut down for maintenance for a few
hours. Very soon the first web servers running nginx/php-fpm started to
fail with timeouts, and later ssh access to the nodes themselves
stopped working, making the boxes inaccessible. With iLO it was
possible to log in to the consoles of the nodes, but no problems were
spotted: memory, CPU and TCP socket footprints were low, and everything
looked normal.

Our educated guess was that this was related to rsyslog, as it was the
only change in the servers' configuration that day, and indeed killing
rsyslogd on the nodes brought the machines back to normal.

A bit of googling brought up this story:

http://blog.bitbucket.org/2012/01/12/follow-up-on-our-downtime-last-week/

as well as a conversation from last year on this ML:

http://lists.adiscon.net/pipermail/rsyslog/2011-October/013944.html

as well as a bug report in Red Hat Bugzilla:

https://bugzilla.redhat.com/show_bug.cgi?id=519201

Switching to UDP transport instead of TCP cured the problem, and
running a simple test with:

# for n in $(seq 1 1000); do echo $n; echo "Syslog test $n" | logger; done

showed that with TCP remote logging enabled and the remote log
server absent, after the first 250-300 messages all following messages
were sent to syslog very slowly, with a delay of 1-2 seconds each.
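
The slowdown can be made easier to see by timing each message instead
of watching the loop scroll by. The following is only a sketch, not
something that was run during the incident: time_messages, now_ms and
SEND_CMD are hypothetical names, and it assumes GNU date (for %N) and
seq are available on the node.

```shell
#!/bin/sh
# Hypothetical helper, not part of rsyslog or util-linux.
# SEND_CMD is the command each test message is piped into; it defaults
# to logger and can be overridden to try the script without syslog.
SEND_CMD=${SEND_CMD:-logger}

# Current time in milliseconds (assumes GNU date, which supports %N).
now_ms() {
    echo $(( $(date +%s%N) / 1000000 ))
}

# time_messages COUNT THRESHOLD_MS
# Sends COUNT test messages and prints every message whose hand-over
# to SEND_CMD took at least THRESHOLD_MS milliseconds.
# The body runs in a subshell so its variables stay local.
time_messages() (
    count=$1
    threshold_ms=$2
    slow=0
    for n in $(seq 1 "$count"); do
        start=$(now_ms)
        echo "Syslog test $n" | $SEND_CMD
        elapsed=$(( $(now_ms) - start ))
        if [ "$elapsed" -ge "$threshold_ms" ]; then
            echo "message $n took $elapsed ms"
            slow=$((slow + 1))
        fi
    done
    echo "slow messages: $slow of $count"
)
```

Calling "time_messages 1000 500" with the central server down would be
expected to list the messages after the first few hundred, given the
1-2 second delays seen above; the 500 ms threshold is an arbitrary
choice.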

All nodes are running Debian 6 (squeeze) with:

rsyslogd 5.8.11, compiled with:
        FEATURE_REGEXP:                         Yes
        FEATURE_LARGEFILE:                      No
        GSSAPI Kerberos 5 support:              Yes
        FEATURE_DEBUG (debug build, slow code): No
        32bit Atomic operations supported:      Yes
        64bit Atomic operations supported:      Yes
        Runtime Instrumentation (slow code):    No

And configuration:

$MaxMessageSize 8k
$ModLoad imuxsock # provides support for local system logging
$ModLoad imklog   # provides kernel logging support

# Store PID of the process in the log
$SystemLogUsePIDFromSystem on
# Rate limit for imuxsock
$SystemLogRateLimitInterval 1
$SystemLogRateLimitBurst 500

$WorkDirectory /var/spool/rsyslog

$ModLoad imfile
$InputFilePollInterval 5

$InputFileName /var/log/nginx/access.log
$InputFilePersistStateInterval 100
$InputFileTag nginx/access:
$InputFileStateFile nginx_access_log_state
$InputFileFacility local7
$InputFileSeverity notice
$InputRunFileMonitor

$ActionQueueType LinkedList     # enable a separate queue for this action
$ActionQueueFileName remote     # set file name, also enables disk mode
$ActionResumeRetryCount -1      # infinite retries on insert failure
$ActionQueueSaveOnShutdown on
*.* @@10.0.0.200
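
For comparison, the UDP workaround mentioned above amounts to a
one-character change in the forwarding rule: in rsyslog's selector
syntax a single @ forwards over UDP, while @@ forwards over TCP. A
minimal sketch, assuming the same target host and the default port:

```
# forward everything over UDP instead of TCP
*.* @10.0.0.200
```

This only illustrates the transport change; it says nothing about
whether the queue directives above should be kept with UDP.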

I went through the ChangeLog on the site for the versions of rsyslogd
following 5.8.11, but didn't notice any fix for this problem. Nor is
there an open ticket in Bugzilla.

Interestingly, Red Hat claims they fixed the problem in version 3.22:

http://rhn.redhat.com/errata/RHBA-2010-0213.html

Can you point me to the version where this problem is fixed? Or, if
it's not fixed yet - can it be done?

BTW, I'm not sure how this would behave with the RELP protocol instead of TCP...

with best regards,
Timur Bakeyev.