I've never gotten real deep in the rsyslog configuration so I have a few 
questions.

The ${MainMsg/Action}ResumeRetryCount looks like a 'legacy'-style 
option, but I don't see the equivalent in the RainerScript style:

action( type="omfwd"
        target="127.0.0.1"
        port="5544"
        protocol="udp"
        template="RSYSLOG_TraditionalForwardFormat"
        queue.type="LinkedList"
        queue.filename="logstash"
        queue.size="1000000"
        queue.highwatermark="60000"
        queue.lowwatermark="50000"
        queue.maxdiskspace="1g"
        queue.saveonshutdown="on"
)

It seems that if I'm going through this overhaul anyway, I should convert to 
the newer format going forward.
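
My best guess from skimming the docs (untested, so the parameter name may 
be off) is that it moved into the per-action parameters, something like:

action( type="omfwd"
        target="127.0.0.1"
        port="5544"
        protocol="udp"
        action.resumeRetryCount="-1"
        ...
)

Is that the idea?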

Second, I've seen the pstats module, but do you have some examples of how I 
could use it to tell what's going on?
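
The only thing I've tried so far is loading impstats and pointing it at a 
file, something like (going straight from the docs, so treat this as a 
sketch):

module( load="impstats"
        interval="60"
        severity="7"
        log.syslog="off"
        log.file="/var/log/rsyslog-stats.log"
)

but I don't really know what to look for in the output.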

Lastly, with the legacy format I could do something like:

auth,authpriv.* @someotherhost:514;RSYSLOG_TraditionalForwardFormat

I don't grok how I could do that in the RainerScript notation. Would it be 
something like:

auth,authpriv.* action( type="omfwd" ... )

or do I need to approach it a completely different way?
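
My other guess from the docs was the prifilt() form (again untested):

if prifilt("auth,authpriv.*") then {
    action( type="omfwd"
            target="someotherhost"
            port="514"
            protocol="udp"
            template="RSYSLOG_TraditionalForwardFormat"
    )
}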

Thanks for all your help so far! At least now I have much more predictable 
logging and indexing and I'm getting my rsyslog bearings even more.



On Nov 12, 2013, at 2:46 PM, Dave Caplinger <[email protected]> 
wrote:

> On Nov 12, 2013, at 1:02 PM, Leggett, Torrance I. <[email protected]> wrote:
>> So indeed it is logstash causing the backup. If I do UDP, the backup doesn't 
>> happen - I haven't verified logstash is getting every single message, but it 
>> does seem to be getting the vast majority. If I do TCP in an action queue, 
>> that action queue starts backing up as the main queue did previously - it's 
>> getting some messages to logstash, but most seem to just be backing up in 
>> the queue. Restarting logstash doesn't seem to help at all - those queue 
>> files always seem to stay on disk once they're there.
> 
> Looking at your config, two things stand out to me:
> 
> 1) There doesn't appear to be a ${MainMsg/Action}ResumeRetryCount setting.  
> Setting this to -1 will cause rsyslog to keep retrying delivery to a failed 
> [remote] action destination forever, and is probably what you want in this 
> case.
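> 
> In the legacy directive style that would be something along the lines of 
> (for a plain action, at least):
> 
>   $ActionResumeRetryCount -1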
> 
> 2) The ${MainMsg/Action}QueueLowWatermark setting may be too low.
> 
> When I was playing with rsyslog -> logstash -> elasticsearch I was running 
> into what might be a similar situation.  First of all, I was regularly 
> breaking ElasticSearch (filling the local disk, running out of memory, etc.), 
> so every time that happened logstash delivery would hang, and the problem 
> would propagate back to the sending rsyslog and leave a bunch of DA files on 
> disk.
> 
> But outside of that, "something" was causing the logstash feed to 
> periodically stop. When it did this, rsyslog would buffer to disk as 
> expected, and the incoming log rate visible at elasticsearch (via kibana) 
> would appear to drop dramatically (but be non-zero -- some logs were 
> trickling in at a much lower rate).
> 
> After some period of time, things would "unstick," the rsyslog backlog would 
> finally get delivered (and on-disk files removed except for the last one), 
> and elasticsearch would insert the new records in their correct chronological 
> location, back-filling the data in kibana.
> 
> This wound up being a consequence of rsyslog's high- and low- watermark 
> settings for this queue. When the number of messages in the in-memory queue 
> surged above the 80% (8000-message default) high water mark, rsyslog would 
> start writing the queue to disk and keep doing so until the in-memory queue 
> size dropped down to the 20%-full (2000-message default) low water mark. So 
> even though the short-term spike in message volume was over, it would 
> continue to send messages from mem->disk->mem->net until it got the queue all 
> the way back down to 20% full.  This extra use of disk I/O dramatically 
> reduced throughput and explained the apparent drop in log volume seen by 
> elasticsearch and kibana in my case, at least.
> 
> Your high- and low-watermark numbers appear to also be 80% and 20% of the 
> queue size.  So perhaps either 1) rsyslog gives up trying to send after 1 
> failure and zero retries (the default!), or 2) maybe it really is delivering 
> the backlog very slowly?  It's possible that the disk-constrained outbound 
> traffic throughput may not be fast enough to empty the queue before more 
> incoming traffic gets appended to the queue and eventually fills the 
> filesystem.
> 
> If it's the latter case, then you could either A) put the DA queue files on 
> faster disk, or B) increase the LowWatermark to something more like 50-60% 
> so it gets out of write-to-disk mode faster.
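> 
> With your queue.size of 1000000, that would mean something roughly like 
> (the number here is just an illustration):
> 
>   queue.lowwatermark="550000"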
> 
> Some pstats output from the time leading up to and well into the problem 
> would be really helpful to get an idea of the message volume hitting the main 
> message queue and action queue, so you can tell where the messages are going.
> 
> - Dave


_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.
