I've never gotten really deep into rsyslog configuration, so I have a few questions.
The ${MainMsg/Action}ResumeRetryCount option looks like a 'legacy' style
setting, but I don't see its equivalent in the RainerScript style:
action( type="omfwd"
target="127.0.0.1"
port="5544"
protocol="udp"
template="RSYSLOG_TraditionalForwardFormat"
queue.type="LinkedList"
queue.filename="logstash"
queue.size="1000000"
queue.highwatermark="60000"
queue.lowwatermark="50000"
queue.maxdiskspace="1g"
queue.saveonshutdown="on"
)
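(From skimming the v7 docs, my best guess is that this is now a per-action
parameter, action.resumeRetryCount, with -1 meaning "retry forever" -- so
something like the following, but please correct me if I'm misreading:

    action( type="omfwd"
            target="127.0.0.1"
            port="5544"
            protocol="udp"
            template="RSYSLOG_TraditionalForwardFormat"
            # retry forever instead of the default of 0 retries
            action.resumeRetryCount="-1"
            queue.type="LinkedList"
            queue.filename="logstash"
            ...
    )
)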
It seems that if I'm going through this overhaul anyway, I should convert to
the newer, going-forward format.
Second, I've seen the pstats module, but do you have some examples of how I
could use it to tell what's going on?
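For reference, the only thing I've tried so far is loading impstats -- guessing
at the parameters from the module docs, something like:

    module( load="impstats"
            interval="60"            # emit counters every 60 seconds
            severity="7"
            log.syslog="off"         # don't inject counters into the normal log stream
            log.file="/var/log/rsyslog-stats.log" )

and then watching the enqueued/full/discarded counters for the main queue and
the omfwd action queue -- is that the right general approach?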
Lastly, with the legacy format I could do something like:
auth,authpriv.* @someotherhost:514;RSYSLOG_TraditionalForwardFormat
I don't grok how I could do that in the RainerScript notation. Would it be
something like:
auth,authpriv.* action( type="omfwd" ... )
or do I need to approach it a completely different way?
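Or, guessing from the docs again, maybe wrapping the action in a prifilt()
condition (assuming that function is available in my version)?

    if prifilt("auth,authpriv.*") then {
        action( type="omfwd"
                target="someotherhost"
                port="514"
                protocol="udp"
                template="RSYSLOG_TraditionalForwardFormat" )
    }

I also gather the old-style selector on the left can be combined directly with
a new-style action() object on the right, but I'm not sure which form is
preferred.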
Thanks for all your help so far! At least now I have much more predictable
logging and indexing and I'm getting my rsyslog bearings even more.
On Nov 12, 2013, at 2:46 PM, Dave Caplinger <[email protected]>
wrote:
> On Nov 12, 2013, at 1:02 PM, Leggett, Torrance I. <[email protected]> wrote:
>> So indeed it is logstash causing the backup. If I do UDP, the backup doesn't
>> happen - I haven't verified logstash is getting every single message, but it
>> does seem to be getting the vast majority. If I do TCP in an action queue,
>> that action queue starts backing up as the main queue did previously - it's
>> getting some messages to logstash, but most seem to just be backing up in
>> the queue. Restarting logstash doesn't seem to help at all - those queue
>> files always seem to stay on disk once they're there.
>
> Looking at your config, two things stand out to me:
>
> 1) There doesn't appear to be a ${MainMsg/Action}ResumeRetryCount setting.
> Setting this to -1 will cause rsyslog to keep retrying delivery to a failed
> [remote] action destination forever, and is probably what you want in this
> case.
>
> 2) The ${MainMsg/Action}QueueLowWatermark setting may be too low.
>
> When I was playing with rsyslog -> logstash -> elasticsearch I was running
> into what might be a similar situation. First of all, I was regularly
> breaking ElasticSearch (filling the local disk, running out of memory, etc.),
> so every time that happened logstash delivery would hang, and the problem
> would propagate back to the sending rsyslog and leave a bunch of DA files on
> disk.
>
> But outside of that, "something" was causing the logstash feed to
> periodically stop. When it did this, rsyslog would buffer to disk as
> expected, and the incoming log rate visible at elasticsearch (via kibana)
> would appear to drop dramatically (but be non-zero -- some logs were
> trickling in at a much lower rate).
>
> After some period of time, things would "unstick," the rsyslog backlog would
> finally get delivered (and on-disk files removed except for the last one),
> and elasticsearch would insert the new records in their correct chronological
> location, back-filling the data in kibana.
>
> This wound up being a consequence of rsyslog's high- and low- watermark
> settings for this queue. When the number of messages in the in-memory queue
> surged above the 80% (8000-message default) high water mark, rsyslog would
> start writing the queue to disk and keep doing so until the in-memory queue
> size dropped down to the 20%-full (2000-message default) low water mark. So
> even though the short-term spike in message volume was over, it would
> continue to send messages from mem->disk->mem->net until it got the queue all
> the way back down to 20% full. This extra use of disk I/O dramatically
> reduced throughput and explained the apparent drop in log volume seen by
> elasticsearch and kibana in my case, at least.
>
> Your high- and low-watermark numbers appear to also be 80% and 20% of the
> queue size. So perhaps either 1) rsyslog gives up trying to send after 1
> failure and zero retries (the default!), or 2) maybe it really is delivering
> the backlog very slowly? It's possible that the disk-constrained outbound
> traffic throughput may not be fast enough to empty the queue before more
> incoming traffic gets appended to the queue and eventually fills the
> filesystem.
>
> If it's the latter case, then you could either A) put the DA queue files on
> faster disk, or B) increase the LowWatermark to something more like 50-60%
> so it gets out of write-to-disk mode faster.
>
> Some pstats output from the time leading up to and well into the problem
> would be really helpful to get an idea of the message volume hitting the main
> message queue and action queue, so you can tell where the messages are going.
>
> - Dave
_______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog

