On Mon, 4 Aug 2014, Doug McClure wrote:
Thanks. I fully suspect the bottleneck is upstream and am adding more instances
of logstash/shippers behind my haproxy instance. If there wasn't a
bottleneck, would rsyslog push more out on its own, or do I need to increase
batch/thread/queue?
rsyslog can handle a very large number of requests without needing more
threads. The number of worker threads should only be increased if a single
thread shows that it's bottlenecked on CPU (and even then, it may make more
sense to split the work up instead).
Threads/Workers
This should only be increased if the existing thread(s) are bottlenecked on
CPU. This is pretty rare, but can happen if you are doing a lot of string
manipulation (regex, complex templates, etc.).
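As a sketch of where that knob lives (rsyslog v7+ RainerScript syntax; the values here are illustrative, not recommendations):

```
# Only worth raising if a single worker is pegged on CPU.
main_queue(
    queue.workerThreads="2"                     # default is 1
    queue.workerThreadMinimumMessages="40000"   # queue depth before an extra worker starts
)
```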
Batch Size
This can be increased with very little risk (the default should probably be
raised at some point), but it has diminishing returns.
A batch will only be as large as the number of messages that have arrived
since the last batch was processed, so in general the batches that actually get
processed end up being small (it would be handy to get stats on this), even with
a large batch size configured.
Large batches only end up being processed when the outputs can't keep up with
small batches. For databases, I've seen them handle 1000 inserts in a single
batch in about the same time it would take to handle 2 inserts as two separate
requests.
If your backend that you are sending to just splits the requests and handles
them individually, large batch sizes are unlikely to help.
The drawbacks of large batch sizes are:
A. the message sent to the output can get large
B. with imudp, rsyslog will use the same timestamp for up to batchsize
messages if there is no gap between them
The default batch size started out at 16, but I believe it's been bumped to
128; going to 1024 or so is unlikely to hurt.
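For reference, the knob in question (RainerScript syntax; the target name here is made up for illustration):

```
# An action queue with a larger dequeue batch; hundreds up to ~1024 is low-risk.
action(type="omfwd" target="logstash.example.com" port="10514" protocol="tcp"
       queue.type="linkedlist"
       queue.dequeueBatchSize="1024")
```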
Queue Sizes
The purpose of a queue is to handle a burst of requests until they can be
output. Think of them as a buffer to smooth things out.
If you don't have outages, the queue size can be small (a second or so's
worth of messages should be enough).
To deal with an outage, your queue needs to be large enough to hold all the
messages that you want to deliver after the outage ends. If you can't fit
this many messages in memory, you need to either configure the watermark levels
to throw some messages away, or you need to spill to disk (DA queues). Using a
DA queue is much slower than using a memory queue, so you can find yourself in a
situation where it takes a LONG time to flush the queue.
If you have outputs that are likely to have outages independently of other
outputs (network vs local disk, different network destinations, etc.), then it
makes sense to create a queue for just that destination (or a ruleset's worth of
destinations, to avoid the overhead of queuing the same message many times).
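A sketch of what that can look like (the ruleset name, sizes, and destination are all illustrative only):

```
# A dedicated disk-assisted (DA) queue for one network destination, so its
# outages don't back up the rest of the pipeline.
ruleset(name="fwd_logstash"
        queue.type="linkedlist"            # in-memory queue...
        queue.size="100000"                # ...holding up to this many messages
        queue.filename="fwd_logstash_q"    # setting a filename enables DA (spill-to-disk) mode
        queue.maxDiskSpace="1g"
        queue.highWatermark="80000"        # start spilling to disk above this
        queue.lowWatermark="20000"         # drain back to memory-only below this
        queue.discardMark="97500"          # alternatively, start dropping less-important
        queue.discardSeverity="5") {       # messages (severity >= notice) at this depth
    action(type="omfwd" target="logstash.example.com" port="10514" protocol="tcp")
}
```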
to get more out assuming a steady state? Will it try
to push/process as much out from queues/cache every time and back off if it
can't?
everything that arrives is put into the main queue by the im* threads and is
processed as quickly as it can be by the worker threads running the om* code.
Is there a point of no return (no improvement) in tweaking the knobs?
absolutely; in fact, if you configure too many worker threads and/or too many
queues, rsyslog can end up spending all its CPU locking and unlocking the queues
and getting very little actual work done.
David Lang
Doug
On Mon, Aug 4, 2014 at 3:02 PM, David Lang <[email protected]> wrote:
well, it's clear that you are getting new requests FAR faster than you are
processing them:
Mon Aug 4 13:14:16 2014: imuxsock: submitted=3 ratelimit.discarded=0
ratelimit.numratelimiters=2
Mon Aug 4 13:14:16 2014: action 1: processed=0 failed=0
Mon Aug 4 13:14:16 2014: action 2: processed=603 failed=0
Mon Aug 4 13:14:16 2014: action 3: processed=547 failed=0
Mon Aug 4 13:14:16 2014: action 4: processed=0 failed=0
Mon Aug 4 13:14:16 2014: action 5: processed=0 failed=0
Mon Aug 4 13:14:16 2014: action 6: processed=0 failed=0
Mon Aug 4 13:14:16 2014: action 7: processed=0 failed=0
Mon Aug 4 13:14:16 2014: action 8: processed=0 failed=0
Mon Aug 4 13:14:16 2014: action 9: processed=0 failed=0
Mon Aug 4 13:14:16 2014: logstashforwarder: processed=270878 failed=0
Mon Aug 4 13:14:16 2014: imptcp(*/10514/IPv4): submitted=270859
Mon Aug 4 13:14:16 2014: imptcp(*/10514/IPv6): submitted=0
Mon Aug 4 13:14:16 2014: logstashforwarder[DA]: size=73726973
enqueued=114807 full=0 discarded.full=0 discarded.nf=0 maxqsize=73756802
Mon Aug 4 13:14:16 2014: logstashforwarder: size=147 enqueued=270878
full=0 discarded.full=0 discarded.nf=0 maxqsize=9770
Mon Aug 4 13:14:16 2014: main Q: size=0 enqueued=270878 full=0
discarded.full=0 discarded.nf=0 maxqsize=31209
Mon Aug 4 13:15:16 2014: imuxsock: submitted=10 ratelimit.discarded=0
ratelimit.numratelimiters=6
Mon Aug 4 13:15:16 2014: action 1: processed=0 failed=0
Mon Aug 4 13:15:16 2014: action 2: processed=1877 failed=0
Mon Aug 4 13:15:16 2014: action 3: processed=592 failed=0
Mon Aug 4 13:15:16 2014: action 4: processed=4 failed=0
Mon Aug 4 13:15:16 2014: action 5: processed=2 failed=0
Mon Aug 4 13:15:16 2014: action 6: processed=0 failed=0
Mon Aug 4 13:15:16 2014: action 7: processed=0 failed=0
Mon Aug 4 13:15:16 2014: action 8: processed=0 failed=0
Mon Aug 4 13:15:16 2014: action 9: processed=0 failed=0
Mon Aug 4 13:15:16 2014: logstashforwarder: processed=694102 failed=0
Mon Aug 4 13:15:16 2014: imptcp(*/10514/IPv4): submitted=696044
Mon Aug 4 13:15:16 2014: imptcp(*/10514/IPv6): submitted=0
Mon Aug 4 13:15:16 2014: logstashforwarder[DA]: size=73817861
enqueued=317479 full=0 discarded.full=0 discarded.nf=0 maxqsize=73817861
Mon Aug 4 13:15:16 2014: logstashforwarder: size=1392 enqueued=694130
full=0 discarded.full=0 discarded.nf=0 maxqsize=9770
Mon Aug 4 13:15:16 2014: main Q: size=4150 enqueued=696078 full=0
discarded.full=0 discarded.nf=0 maxqsize=31209
if you look at the queue sizes, in this timeframe you fell WAY behind; you
received far more messages than you processed (see the difference in the queue
sizes for the logstashforwarder, logstashforwarder[DA], and main Q
stats). It looks like you fell behind by >100k messages.
So this looks to me like the logstash instance just isn't able to keep up.
Can you look at the data there?
also, it would be good to restart this with the DA cache files removed;
putting messages into the DA cache files does cost performance.
At this data volume, I'd also suggest turning the impstats interval down to
something like 10 seconds so that the numbers don't get too big.
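For instance (impstats parameters as of rsyslog v7+; the log path is illustrative):

```
# Emit stats every 10s to a dedicated file instead of the default 300s interval.
module(load="impstats"
       interval="10"
       severity="7"
       log.syslog="off"                          # keep stats out of the normal log stream
       log.file="/var/log/rsyslog-stats.log")
```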
David Lang
On Mon, 4 Aug 2014, Doug McClure wrote:
Date: Mon, 4 Aug 2014 14:49:38 -0400
From: Doug McClure <[email protected]>
Reply-To: rsyslog-users <[email protected]>
To: rsyslog-users <[email protected]>
Subject: Re: [rsyslog] Finding the holy grail tuning setting...
I appreciate it - I desire an objective approach to this challenge!
Attached is a fresh impstats file. Appreciate any interpretation advice
and tuning actions.
Doug
On Mon, Aug 4, 2014 at 1:05 PM, David Lang <[email protected]> wrote:
On Mon, 4 Aug 2014, Doug McClure wrote:
I've read, re-read, and read again everything I can find out there on
queues, options, etc., and I still feel I don't really know what I'm doing,
other than haphazardly changing one or more settings hoping to get more
data through/out of rsyslog.
I'm growing about one 1GB DA cache file every 10 min or so, and I can't seem
to increase the processing to clear them up. I probably clear one for
every 2-4 new ones that are created.
What's the best setting to focus on to increase DA queue file processing?
I've taken dequeuebatchsize from as low as 100 or 1000 (which everything
seems to talk about) to as high as 100,000 or more, and I can't seem to hit
a sweet spot. I have varied threads up to 200, and queue size up to 1G, 500M,
or 200K.
What are the rules of thumb here - change X, watch Y until you get to some
ceiling, and then you need to add more system resources or more upstream
instances?
well, rather than focusing on the DA queue handling, let's try to figure
out what's slow and causing things to queue.
have you configured impstats? Configure it to log to a file, and log
fairly frequently, and we should be able to see which action is holding
things up. Once we know that, we can work on figuring out how to solve that
bottleneck.
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.