Actually looking at my config I think I may have answered my own last question. In it I have:
if ( $programname == 'sshd' ) and ( $syslogfacility-text == 'auth' or
$syslogfacility-text == 'authpriv' ) then {
auth,authpriv.* @someotherhost:514;RSYSLOG_TraditionalForwardFormat
}
And since that fwd is already in its own conditional block, I probably need to
just do:
if ( $programname == 'sshd' ) and ( $syslogfacility-text == 'auth' or
$syslogfacility-text == 'authpriv' ) then {
action( type="omfwd" ... )
}
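Something like this, I'm guessing — combining the existing conditional with the forward action, with the target/port carried over from my legacy selector line (untested sketch on my part):

if ( $programname == 'sshd' ) and ( $syslogfacility-text == 'auth' or
$syslogfacility-text == 'authpriv' ) then {
    # forward to the same destination the legacy line used
    action( type="omfwd"
            target="someotherhost"
            port="514"
            protocol="udp"
            template="RSYSLOG_TraditionalForwardFormat" )
}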
On Nov 13, 2013, at 1:40 PM, Leggett, Torrance I. <[email protected]> wrote:
> I've never gotten real deep in the rsyslog configuration so I have a few
> questions.
>
> The ${MainMsg/Action}ResumeRetryCount looks like a 'legacy'-style
> option, but I don't see the equivalent in the RainerScript style:
>
> action( type="omfwd"
> target="127.0.0.1"
> port="5544"
> protocol="udp"
> template="RSYSLOG_TraditionalForwardFormat"
> queue.type="LinkedList"
> queue.filename="logstash"
> queue.size="1000000"
> queue.highwatermark="60000"
> queue.lowwatermark="50000"
> queue.maxdiskspace="1g"
> queue.saveonshutdown="on"
> )
>
> It seems that if I'm going through this overhaul I should be converting to
> the newer and future format.
>
> Second, I've seen the pstats module, but do you have some examples of how I
> could be using it to tell what's going on?
>
> Lastly, with the legacy format I could do something like:
>
> auth,authpriv.* @someotherhost:514;RSYSLOG_TraditionalForwardFormat
>
> I don't grok how I could do that in the RainerScript notation. Would it be
> something like:
>
> auth,authpriv.* action( type="omfwd" ... )
>
> or do I need to approach it a completely different way?
>
> Thanks for all your help so far! At least now I have much more predictable
> logging and indexing and I'm getting my rsyslog bearings even more.
>
>
>
> On Nov 12, 2013, at 2:46 PM, Dave Caplinger <[email protected]>
> wrote:
>
>> On Nov 12, 2013, at 1:02 PM, Leggett, Torrance I. <[email protected]>
>> wrote:
>>> So indeed it is logstash causing the backup. If I do UDP, the backup
>>> doesn't happen - I haven't verified logstash is getting every single
>>> message, but it does seem to be getting the vast majority. If I do TCP in
>>> an action queue, that action queue starts backing up as the main queue did
>>> previously - it's getting some messages to logstash, but most seem to just
>>> be backing up in the queue. Restarting logstash doesn't seem to help at all
>>> - those queue files always seem to stay on disk once they're there.
>>
>> Looking at your config, two things stand out to me:
>>
>> 1) There doesn't appear to be a ${MainMsg/Action}ResumeRetryCount setting.
>> Setting this to -1 will cause rsyslog to keep retrying delivery to a failed
>> [remote] action destination forever, and is probably what you want in this
>> case.
>>
>> 2) The ${MainMsg/Action}QueueLowWatermark setting may be too low.
>>
>> When I was playing with rsyslog -> logstash -> elasticsearch I was running
>> into what might be a similar situation. First of all, I was regularly
>> breaking ElasticSearch (filling the local disk, running out of memory,
>> etc.), so every time that happened logstash delivery would hang, and the
>> problem would propagate back to the sending rsyslog and leave a bunch of DA
>> files on disk.
>>
>> But outside of that, "something" was causing the logstash feed to
>> periodically stop. When it did this, rsyslog would buffer to disk as
>> expected, and the incoming log rate visible at elasticsearch (via kibana)
>> would appear to drop dramatically (but be non-zero -- some logs were
>> trickling in at a much lower rate).
>>
>> After some period of time, things would "unstick," the rsyslog backlog would
>> finally get delivered (and on-disk files removed except for the last one),
>> and elasticsearch would insert the new records in their correct
>> chronological location, back-filling the data in kibana.
>>
>> This wound up being a consequence of rsyslog's high- and low- watermark
>> settings for this queue. When the number of messages in the in-memory queue
>> surged above the 80% (8000-message default) high water mark, rsyslog would
>> start writing the queue to disk and keep doing so until the in-memory queue
>> size dropped down to the 20%-full (2000-message default) low water mark. So
>> even though the short-term spike in message volume was over, it would
>> continue to send messages from mem->disk->mem->net until it got the queue
>> all the way back down to 20% full. This extra use of disk I/O dramatically
>> reduced throughput and explained the apparent drop in log volume seen by
>> elasticsearch and kibana in my case, at least.
>>
>> Your high- and low-watermark numbers appear to also be 80% and 20% of the
>> queue size. So perhaps either 1) rsyslog gives up trying to send after 1
>> failure and zero retries (the default!), or 2) maybe it really is delivering
>> the backlog very slowly? It's possible that the disk-constrained outbound
>> traffic throughput may not be fast enough to empty the queue before more
>> incoming traffic gets appended to the queue and eventually fills the
>> filesystem.
>>
>> If it's the latter case, then you could either A) put the DA queue files on
>> faster disk, or B) increase the LowWatermark to something more like 50-60%
>> so it gets out of write-to-disk mode faster.
>>
>> Some pstats output from the time leading up to and well into the problem
>> would be really helpful to get an idea of the message volume hitting the
>> main message queue and action queue, so you can tell where the messages are
>> going.
>>
>> - Dave
>
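P.S. On the pstats question — from what I can tell in the docs, I'd load the impstats module with something like the one below. The interval and log file path are just my guesses; I haven't tried this yet:

module( load="impstats"
        interval="60"                            # emit counters every 60 seconds
        severity="7"
        log.syslog="off"                         # don't inject stats into the syslog stream
        log.file="/var/log/rsyslog-stats.log" )  # write them to a separate file instead

That should give me per-queue size/enqueued/full counters I can watch while the backlog builds, which sounds like what you were asking for.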

