On Mon, 4 Aug 2014, Doug McClure wrote:
Thanks. I fully suspect the bottleneck is upstream and am adding more instances
of logstash/shippers behind my haproxy instance. If there wasn't a
bottleneck, would rsyslog push more out on its own, or do I need to increase
batch/thread/queue?
rsyslog can handle a very large number of requests without needing more
threads. The number of worker threads should only be increased if a single
thread shows that it's bottlenecked on CPU (and even then, it may make more
sense to split the work up instead).
Threads/Workers
This should only be increased if the existing thread(s) are bottlenecked on
CPU. This is pretty rare, but can happen if you are doing a lot of string
manipulation (regex, complex templates, etc.).
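As a sketch of where that knob lives (rsyslog v7+ RainerScript syntax; the values here are illustrative, not recommendations):

```
# Only worth raising if a single worker is pegged on CPU.
main_queue(
    queue.workerThreads="2"                     # default is 1
    queue.workerThreadMinimumMessages="40000"   # queue depth before an extra worker starts
)
```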
Batch Size
This can be increased with very little risk (the default should probably be
raised at some point), but it has diminishing returns.
A batch will only be as large as the number of messages that have arrived
since the last batch was processed, so in general the batches that actually get
processed end up being small (it would be handy to get stats on this), even with
a large batch size configured.
Large batches only end up being processed when the outputs can't keep up with
small batches. For databases, I've seen them handle 1000 inserts in a single
batch in about the same time it would take to handle 2 inserts as two separate
requests.
If your backend that you are sending to just splits the requests and handles
them individually, large batch sizes are unlikely to help.
The drawbacks of large batch sizes are:
A. the message sent to the output can get large
B. with imudp, rsyslog will use the same timestamp for up to batchsize
messages if there is no gap between them
The default batch size started out at 16, but I believe it's been bumped to
128; going to 1024 or so is unlikely to hurt.
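For reference, the knob in question (RainerScript syntax; the target name here is made up for illustration):

```
# An action queue with a larger dequeue batch; hundreds up to ~1024 is low-risk.
action(type="omfwd" target="logstash.example.com" port="10514" protocol="tcp"
       queue.type="linkedlist"
       queue.dequeueBatchSize="1024")
```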
Queue Sizes
The purpose of a queue is to handle a burst of requests until they can be
output. Think of them as a buffer to smooth things out.
If you don't have outages, the queue size can be small (a second or so's
worth of messages should be enough).
To deal with an outage, your queue needs to be large enough to hold all the
messages that you want to deliver after the outage ends. If you can't fit
this many messages in memory, you need to either configure the watermark levels
to throw some messages away, or you need to spill to disk (DA queues). Using a
DA queue is much slower than using a memory queue, so you can find yourself in a
situation where it takes a LONG time to flush the queue.
If you have outputs that are likely to have outages independently of other
outputs (network vs local disk, different network destinations, etc.), then it
makes sense to create a queue for just that destination (or a ruleset's worth of
destinations, to avoid the overhead of queuing the same message many times).
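A sketch of what that can look like (the ruleset name, sizes, and destination are all illustrative only):

```
# A dedicated disk-assisted (DA) queue for one network destination, so its
# outages don't back up the rest of the pipeline.
ruleset(name="fwd_logstash"
        queue.type="linkedlist"            # in-memory queue...
        queue.size="100000"                # ...holding up to this many messages
        queue.filename="fwd_logstash_q"    # setting a filename enables DA (spill-to-disk) mode
        queue.maxDiskSpace="1g"
        queue.highWatermark="80000"        # start spilling to disk above this
        queue.lowWatermark="20000"         # drain back to memory-only below this
        queue.discardMark="97500"          # alternatively, start dropping less-important
        queue.discardSeverity="5") {       # messages (severity >= notice) at this depth
    action(type="omfwd" target="logstash.example.com" port="10514" protocol="tcp")
}
```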
to get more out assuming a steady state? Will it try
to push/process as much out from queues/cache every time and back off if it
can't?
everything that arrives is put into the main queue by the im* threads and is
processed as quickly as it can be by the worker threads running the om* code.
Is there a point of no return (no improvement) in tweaking the knobs?
absolutely; in fact, if you configure too many worker threads and/or too many
queues, rsyslog can end up spending all its CPU locking and unlocking the queues
and getting very little actual work done.
David Lang
Doug
On Mon, Aug 4, 2014 at 3:02 PM, David Lang <[email protected]> wrote:
well, it's clear that you are getting new requests FAR faster than you are
processing them:
Mon Aug 4 13:14:16 2014: imuxsock: submitted=3 ratelimit.discarded=0
ratelimit.numratelimiters=2
Mon Aug 4 13:14:16 2014: action 1: processed=0 failed=0
Mon Aug 4 13:14:16 2014: action 2: processed=603 failed=0
Mon Aug 4 13:14:16 2014: action 3: processed=547 failed=0
Mon Aug 4 13:14:16 2014: action 4: processed=0 failed=0
Mon Aug 4 13:14:16 2014: action 5: processed=0 failed=0
Mon Aug 4 13:14:16 2014: action 6: processed=0 failed=0
Mon Aug 4 13:14:16 2014: action 7: processed=0 failed=0
Mon Aug 4 13:14:16 2014: action 8: processed=0 failed=0
Mon Aug 4 13:14:16 2014: action 9: processed=0 failed=0
Mon Aug 4 13:14:16 2014: logstashforwarder: processed=270878 failed=0
Mon Aug 4 13:14:16 2014: imptcp(*/10514/IPv4): submitted=270859
Mon Aug 4 13:14:16 2014: imptcp(*/10514/IPv6): submitted=0
Mon Aug 4 13:14:16 2014: logstashforwarder[DA]: size=73726973
enqueued=114807 full=0 discarded.full=0 discarded.nf=0 maxqsize=73756802
Mon Aug 4 13:14:16 2014: logstashforwarder: size=147 enqueued=270878
full=0 discarded.full=0 discarded.nf=0 maxqsize=9770
Mon Aug 4 13:14:16 2014: main Q: size=0 enqueued=270878 full=0
discarded.full=0 discarded.nf=0 maxqsize=31209
Mon Aug 4 13:15:16 2014: imuxsock: submitted=10 ratelimit.discarded=0
ratelimit.numratelimiters=6
Mon Aug 4 13:15:16 2014: action 1: processed=0 failed=0
Mon Aug 4 13:15:16 2014: action 2: processed=1877 failed=0
Mon Aug 4 13:15:16 2014: action 3: processed=592 failed=0
Mon Aug 4 13:15:16 2014: action 4: processed=4 failed=0
Mon Aug 4 13:15:16 2014: action 5: processed=2 failed=0
Mon Aug 4 13:15:16 2014: action 6: processed=0 failed=0
Mon Aug 4 13:15:16 2014: action 7: processed=0 failed=0
Mon Aug 4 13:15:16 2014: action 8: processed=0 failed=0
Mon Aug 4 13:15:16 2014: action 9: processed=0 failed=0
Mon Aug 4 13:15:16 2014: logstashforwarder: processed=694102 failed=0
Mon Aug 4 13:15:16 2014: imptcp(*/10514/IPv4): submitted=696044
Mon Aug 4 13:15:16 2014: imptcp(*/10514/IPv6): submitted=0
Mon Aug 4 13:15:16 2014: logstashforwarder[DA]: size=73817861
enqueued=317479 full=0 discarded.full=0 discarded.nf=0 maxqsize=73817861
Mon Aug 4 13:15:16 2014: logstashforwarder: size=1392 enqueued=694130
full=0 discarded.full=0 discarded.nf=0 maxqsize=9770
Mon Aug 4 13:15:16 2014: main Q: size=4150 enqueued=696078 full=0
discarded.full=0 discarded.nf=0 maxqsize=31209
if you look at the queue sizes, in this timeframe you fell WAY behind; you
received far more messages than you processed (see the difference in the queue
sizes for the logstashforwarder, logstashforwarder[DA], and main Q
stats). It looks like you fell behind by >100k messages.
So this looks to me like the logstash instance just isn't able to keep up.
Can you look at the data there?
also, it would be good to restart this with the DA cache files removed;
putting messages into the DA cache files does cost performance.
At this data volume, I'd also suggest turning the impstats interval down to
something like 10 seconds so that the numbers don't get too big.
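For instance (impstats parameters as of rsyslog v7+; the log path is illustrative):

```
# Emit stats every 10s to a dedicated file instead of the default 300s interval.
module(load="impstats"
       interval="10"
       severity="7"
       log.syslog="off"                          # keep stats out of the normal log stream
       log.file="/var/log/rsyslog-stats.log")
```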
David Lang
On Mon, 4 Aug 2014, Doug McClure wrote:
Date: Mon, 4 Aug 2014 14:49:38 -0400
From: Doug McClure <[email protected]>
Reply-To: rsyslog-users <[email protected]>
To: rsyslog-users <[email protected]>
Subject: Re: [rsyslog] Finding the holy grail tuning setting...
I appreciate it - I desire an objective approach to this challenge!
Attached is a fresh impstats file. Appreciate any interpretation advice
and tuning actions.
Doug
On Mon, Aug 4, 2014 at 1:05 PM, David Lang <[email protected]> wrote:
On Mon, 4 Aug 2014, Doug McClure wrote:
I've read, re-read, and read again everything I can find out there on
queues, options, etc., and I still feel I don't really know what I'm doing,
other than haphazardly changing one or more settings hoping to get more
data through/out of rsyslog.
I'm growing about one 1GB DA cache file every 10 min or so, and I can't seem
to increase the processing to clear them up. I probably clear one for
every 2-4 new ones that are created.
What's the best setting to focus on to increase DA queue file processing?
I've taken dequeuebatchsize from as low as 100 or 1000 (which everything
seems to talk about) to as high as 100,000 or more, and I can't seem to hit
a sweet spot. I have varied threads up to 200, and queue size up to 1G, 500M,
or 200K.
What are the rules of thumb here - change X, watch Y until you get to some
ceiling, and then you need to add more system resources or more upstream
instances?
well, rather than focusing on the DA queue handling, let's try to figure
out what's slow and causing things to queue.
have you configured impstats? Configure it to log to a file, and log
fairly frequently, and we should be able to see which action is holding
things up. Once we know that, we can work on figuring out how to solve that
bottleneck.
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.