Actually looking at my config I think I may have answered my own last question. In it I have:
if ( $programname == 'sshd' ) and ( $syslogfacility-text == 'auth' or
$syslogfacility-text == 'authpriv' ) then {
auth,authpriv.* @someotherhost:514;RSYSLOG_TraditionalForwardFormat
}
And since that fwd is already in its own conditional block, I probably need to
just do:
if ( $programname == 'sshd' ) and ( $syslogfacility-text == 'auth' or
$syslogfacility-text == 'authpriv' ) then {
action( type="omfwd" ... )
}
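Something like this, I'm guessing — combining the existing conditional with the forward action, with the target/port carried over from my legacy selector line (untested sketch on my part):

if ( $programname == 'sshd' ) and ( $syslogfacility-text == 'auth' or
$syslogfacility-text == 'authpriv' ) then {
    # forward to the same destination the legacy line used
    action( type="omfwd"
            target="someotherhost"
            port="514"
            protocol="udp"
            template="RSYSLOG_TraditionalForwardFormat" )
}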
On Nov 13, 2013, at 1:40 PM, Leggett, Torrance I. <[email protected]> wrote:
> I've never gotten real deep in the rsyslog configuration so I have a few
> questions.
>
> The ${MainMsg/Action}ResumeRetryCount looks like a 'legacy'-style
> option, but I don't see the equivalent in the RainerScript style:
>
> action( type="omfwd"
> target="127.0.0.1"
> port="5544"
> protocol="udp"
> template="RSYSLOG_TraditionalForwardFormat"
> queue.type="LinkedList"
> queue.filename="logstash"
> queue.size="1000000"
> queue.highwatermark="60000"
> queue.lowwatermark="50000"
> queue.maxdiskspace="1g"
> queue.saveonshutdown="on"
> )
>
> It seems that if I'm going through this overhaul I should be converting to
> the newer and future format.
>
> Second, I've seen the pstats module, but do you have some examples of how I
> could be using it to tell what's going on?
>
> Lastly, with the legacy format I could do something like:
>
> auth,authpriv.* @someotherhost:514;RSYSLOG_TraditionalForwardFormat
>
> I don't grok how I could do that in the RainerScript notation. Would it be
> something like:
>
> auth,authpriv.* action( type="omfwd" ... )
>
> or do I need to approach it a completely different way?
>
> Thanks for all your help so far! At least now I have much more predictable
> logging and indexing and I'm getting my rsyslog bearings even more.
>
>
>
> On Nov 12, 2013, at 2:46 PM, Dave Caplinger <[email protected]>
> wrote:
>
>> On Nov 12, 2013, at 1:02 PM, Leggett, Torrance I. <[email protected]>
>> wrote:
>>> So indeed it is logstash causing the backup. If I do UDP, the backup
>>> doesn't happen - I haven't verified logstash is getting every single
>>> message, but it does seem to be getting the vast majority. If I do TCP in
>>> an action queue, that action queue starts backing up as the main queue did
>>> previously - it's getting some messages to logstash, but most seem to just
>>> be backing up in the queue. Restarting logstash doesn't seem to help at all
>>> - those queue files always seem to stay on disk once they're there.
>>
>> Looking at your config, two things stand out to me:
>>
>> 1) There doesn't appear to be a ${MainMsg/Action}ResumeRetryCount setting.
>> Setting this to -1 will cause rsyslog to keep retrying delivery to a failed
>> [remote] action destination forever, and is probably what you want in this
>> case.
>>
>> 2) The ${MainMsg/Action}QueueLowWatermark setting may be too low.
>>
>> When I was playing with rsyslog -> logstash -> elasticsearch I was running
>> into what might be a similar situation. First of all, I was regularly
>> breaking ElasticSearch (filling the local disk, running out of memory,
>> etc.), so every time that happened logstash delivery would hang, and the
>> problem would propagate back to the sending rsyslog and leave a bunch of DA
>> files on disk.
>>
>> But outside of that, "something" was causing the logstash feed to
>> periodically stop. When it did this, rsyslog would buffer to disk as
>> expected, and the incoming log rate visible at elasticsearch (via kibana)
>> would appear to drop dramatically (but be non-zero -- some logs were
>> trickling in at a much lower rate).
>>
>> After some period of time, things would "unstick," the rsyslog backlog would
>> finally get delivered (and on-disk files removed except for the last one),
>> and elasticsearch would insert the new records in their correct
>> chronological location, back-filling the data in kibana.
>>
>> This wound up being a consequence of rsyslog's high- and low- watermark
>> settings for this queue. When the number of messages in the in-memory queue
>> surged above the 80% (8000-message default) high water mark, rsyslog would
>> start writing the queue to disk and keep doing so until the in-memory queue
>> size dropped down to the 20%-full (2000-message default) low water mark. So
>> even though the short-term spike in message volume was over, it would
>> continue to send messages from mem->disk->mem->net until it got the queue
>> all the way back down to 20% full. This extra use of disk I/O dramatically
>> reduced throughput and explained the apparent drop in log volume seen by
>> elasticsearch and kibana in my case, at least.
>>
>> Your high- and low-watermark numbers appear to also be 80% and 20% of the
>> queue size. So perhaps either 1) rsyslog gives up trying to send after 1
>> failure and zero retries (the default!), or 2) maybe it really is delivering
>> the backlog very slowly? It's possible that the disk-constrained outbound
>> traffic throughput may not be fast enough to empty the queue before more
>> incoming traffic gets appended to the queue and eventually fills the
>> filesystem.
>>
>> If it's the latter case, then you could either A) put the DA queue files on
>> faster disk, or B) increase the LowWatermark to something more like 50-60%
>> so it gets out of write-to-disk mode faster.
>>
>> Some pstats output from the time leading up to and well into the problem
>> would be really helpful to get an idea of the message volume hitting the
>> main message queue and action queue, so you can tell where the messages are
>> going.
>>
>> - Dave
>
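P.S. On the pstats question — from what I can tell in the docs, I'd load the impstats module with something like the one below. The interval and log file path are just my guesses; I haven't tried this yet:

module( load="impstats"
        interval="60"                            # emit counters every 60 seconds
        severity="7"
        log.syslog="off"                         # don't inject stats into the syslog stream
        log.file="/var/log/rsyslog-stats.log" )  # write them to a separate file instead

That should give me per-queue size/enqueued/full counters I can watch while the backlog builds, which sounds like what you were asking for.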

