Following up on what Rainer is saying.

You should not think of this as fundamentally a rsyslog problem, you will run into this problem in many other situations.

You tripped over this with nginx blocking when it couldn't write to syslog. I'll point out that nginx would do the same thing if it was writing to disk and the disk filled up. similarly, if you write to a NFS server that doesn't have a fast disk cache, the performance of nginx will slow to a crawl because the NFS server is required by spec to save each write to the log to disk before it considers the write complete.

This is because nginx is written in the mode that they consider logs so critical that they would prefer to have the system halt rather than loose a single log message. Nginx is not alone in this, Apache does the same thing. This is why the Apache docs tell you not to have your log files on NFS.

The fear of this sort of problem is why nginx doesn't support logging to syslog by default and you need to patch it to do so.

Traditional syslog did a sync to disk for every message it recieved. This caused it to have a very low peak message rate, and also meant that if the disk filled up you would not be able to login to fix it (since login or su would try and create a log message, which could not be written to disk, so the login or su would never complete). This problem extended across a reboot as the disk would still be full.

Using a reliable transport between systems (TCP or RELP) now extends this behavior across systems.


Rsyslog relaxes this strict behavior by default because it has a queue (in memory by default) where some number of messages can accumulate without needing to hit the disk. These messages will be lost if the machine looses power before they get written to disk.

But if this queue fills up, you end up back in the exact same problem again. There is nothing that you can do with the log message, and you are still in the mode that stopping everything instead of loosing a log message is the 'right' thing to do.

To some extent you can address this by creating a disk queue, but disk queues are very slow compared to memory queues, so you may not be able to handle your log volume when doing so.

Another thing that you can do is to configure rsyslog to discard messages when the queue gets full instead of blocking. Rsyslog has fairly extensive options for specifying what to discard and when.

And then you can always opt to use UDP as your transport. As an 'unreliable' trnasport protocol (i.e. no acknowledgement of if the message has arrived or not) a down reciever will not block the senders. I put unreliable in quotes, because while it can loose messages, a properly configured system can handle messages at wire speed without loosing anything (unless a system is down, and you can implement high availability on youre recieving servers to deal with that).

David Lang

On Wed, 7 Nov 2012, Timur I. Bakeyev wrote:

Hi, Rainer!

Of course you know internals of rsyslog better, so can't argue with you
here. But as for the end user such behavior for me looks undesired. What
I'd expect:

1. All local(imuxsock) go to the local log files.
2. All remote messages would be spooled in an $ActionQueueFileName file.
3. Nothing would block

I also found this old discussion:

http://www.mail-archive.com/[email protected]/msg01330.html

So you tried to resolve it in 3.x series, but, apparently the problem is
still there.

DO I understand correctly that the reason is in the fact that Output queue
for local files and remote TCP is shared, so until confirmation from both
backends is received the message isn't discarded?

Is there a way to spit those queues?

Or, can you suggest the configuration that does, what I've described? Any
change if TCP would be replaced with RELP?

With regards,
Timur Bakeyev.

On Tue, Nov 6, 2012 at 3:37 PM, Rainer Gerhards <[email protected]>wrote:

Long story short: the problem is with your config. You configured rsyslog
to never lose any message, at some point it runs out of queue space, then
the inputs begin to block, so the log socket blocks and finally the apps
that write to the log socket.
There are many options to control under which circumstances you permit
message loss, and you need to make sure you know what you want.
The current versions also have lower enq timeout, which can kind of solve
this config issue, at least for systems that are not super-busy. Note that
it is not really an issue, but may be intended behavior in some
environments (e.g. banking, when you absolutely need to log).

Rainer

-----Original Message-----
From: [email protected] [mailto:rsyslog-
[email protected]] On Behalf Of Timur I. Bakeyev
Sent: Tuesday, November 06, 2012 3:26 PM
To: [email protected]
Cc: Michael Biebl
Subject: [rsyslog] Rsyslog freezes the box when can't send logs over
TCP

Hi all!

Recently we experienced a "nice" lock up of our web nodes that do
remote TCP logging to the centralized syslog server.

Central rsyslog server was shut down due the maintenance for few
hours. Very soon first Web servers with nginx/php-fpm started to fail
with timeouts and later access to the nodes themselves over ssh
stopped working, making boxes unaccessible. With the iLO it was
possible to log to the consoles of the nodes, but no problems were
spotted - memory, CPU, TCP sockets footprints were low, everything
looked normal.

Educated guess that it may be related to the rsyslog, as it was the
only change in the servers configuration on that day and killing
rsyslogd on the nodes brought machines back to normal.

A bit of googling brought up this story:

http://blog.bitbucket.org/2012/01/12/follow-up-on-our-downtime-last-
week/

as well as a last year conversation on this ML:

http://lists.adiscon.net/pipermail/rsyslog/2011-October/013944.html

as well as a bug report in RedHat Bugzilla:

https://bugzilla.redhat.com/show_bug.cgi?id=519201

Switching to UDP transport instead of TCP cured the problem and
running simple test with:

# for each n in `seq 1 1000`; do echo $n; echo "Syslog test $n" |
logger; done

showed that with TCP remote logging enabled and absent remote log
server after first 250-300 messages all the following messages were
sent to syslog very slowly, with the 1-2sec delay each.

All nodes are running Debian 6(squeeze) with:

rsyslogd 5.8.11, compiled with:
        FEATURE_REGEXP:                         Yes
        FEATURE_LARGEFILE:                      No
        GSSAPI Kerberos 5 support:              Yes
        FEATURE_DEBUG (debug build, slow code): No
        32bit Atomic operations supported:      Yes
        64bit Atomic operations supported:      Yes
        Runtime Instrumentation (slow code):    No

And configuration:

$MaxMessageSize 8k
$ModLoad imuxsock # provides support for local system logging
$ModLoad imklog   # provides kernel logging support

# Store PID of the process in the log
$SystemLogUsePIDFromSystem on
# Rate limit for imuxsock
$SystemLogRateLimitInterval 1
$SystemLogRateLimitBurst 500

$WorkDirectory /var/spool/rsyslog

$ModLoad imfile
$InputFilePollInterval 5

$InputFileName /var/log/nginx/access.log
$InputFilePersistStateInterval 100
$InputFileTag nginx/access:
$InputFileStateFile nginx_access_log_state
$InputFileFacility local7
$InputFileSeverity notice
$InputRunFileMonitor

$ActionQueueType                        LinkedList      # enable a
separate queue for this action
$ActionQueueFileName                 remote           # set file name,
also enables disk mode
$ActionResumeRetryCount            -1                  # infinite
retries on insert failure
$ActionQueueSaveOnShutdown     on
*.*                                               @@10.0.0.200

I went through ChangeLog on the site for the versions of rsyslogd
following 5.8.11, but didn't notice any fix to that problem. Neither
there is an open ticket in bugzilla.

Interesting enough that RedHat claims they fixed the problem in version
3.22:

http://rhn.redhat.com/errata/RHBA-2010-0213.html

Can you point me to the version where this problem is fixed? Or, if
it's not fixed yet - can it be done?

BTW, not sure how this would behave with REPL protocol instead of
TCP...

with best regards,
Timur Bakeyev.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST
if you DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to