On 05/17/2018 05:52 AM, Brian Knox wrote:
To my knowledge, Rich is correct. This also would explain a case we
hit maybe every couple of months, where rsyslog very quickly
duplicates some messages it is sending to elasticsearch. I would
assume this would be a case where a batch is submitted, only some of
the messages are rejected, and rsyslog then duplicates messages trying
to send the batch over and over again.
You can confirm this by monitoring the bulk index thread pool
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/cat-thread-pool.html
to see if you are getting bulk rejections.
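
A rough way to script that check, as a minimal sketch: it assumes you have fetched the plain-text output of the cat thread pool API with headers enabled (`?v`) and that it includes `host` and `bulk.rejected` columns, as the 2.x API prints. The helper name `bulk_rejections` and the sample output below are made up for illustration.

```python
def bulk_rejections(cat_output):
    """Map host -> bulk.rejected count from `_cat/thread_pool?v` text output.

    Assumes the first non-empty line is the header row and that the
    columns are whitespace-separated, as in the cat APIs' plain output.
    """
    lines = [ln for ln in cat_output.splitlines() if ln.strip()]
    header = lines[0].split()
    host_col = header.index("host")
    rejected_col = header.index("bulk.rejected")
    counts = {}
    for ln in lines[1:]:
        fields = ln.split()
        counts[fields[host_col]] = int(fields[rejected_col])
    return counts


# Hypothetical sample, not captured from a real cluster.
SAMPLE = """\
host   bulk.active bulk.queue bulk.rejected
node-a           0          0             0
node-b           4         50            17
"""

if __name__ == "__main__":
    print(bulk_rejections(SAMPLE))
```

If the rejected count on any node climbs around the time the duplicates show up, that would support the partial-rejection explanation.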
On Thu, May 17, 2018 at 12:08 AM David Lang <da...@lang.hm> wrote:
On Wed, 16 May 2018, Rich Megginson wrote:
> On 05/16/2018 05:58 PM, David Lang wrote:
>> there's no need to add this extra complexity (multiple rulesets and
>> queues)
>>
>> What should be happening (on any output module) is:
>>
>> submit a batch.
>> If rejected with a soft error, retry/suspend the output
>
> retry of the entire batch? see below
>
>> if batch-size=1 and a hard error, send to errorfile
>> if rejected with a hard error resubmit half of the batch
>
> But what if 90% of the batch was successfully added? Then you are
> needlessly resubmitting many of the records in the batch.
When submitting batches, you get a success/fail for the batch as a whole
(for 99% of things that actually allow you to insert in batches), so you
don't know which message failed. This is a database transaction (again,
in most cases), so if a batch fails, all you can do is bisect to figure
out which message fails. If the endpoint is inserting some of the
messages from a batch that fails, that's usually a bad thing.
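
To make the bisect idea concrete, here is a minimal sketch in Python. It is not rsyslog code. It assumes an all-or-nothing `submit` callable that raises a hypothetical `HardError` when the endpoint rejects the whole batch; `SoftError`, `deliver` and `error_sink` are likewise illustrative names. A soft error is deliberately not caught, so the caller can retry or suspend the output.

```python
class SoftError(Exception):
    """Temporary failure: the caller should retry or suspend (hypothetical)."""


class HardError(Exception):
    """Permanent failure of the batch as a whole (hypothetical)."""


def deliver(batch, submit, error_sink):
    """Submit a batch, bisecting on hard errors to isolate bad messages.

    submit(chunk) is assumed to be transactional: it either accepts the
    whole chunk or raises HardError and accepts none of it. A message
    that still fails in a batch of one is appended to error_sink.
    """
    pending = [list(batch)]
    while pending:
        chunk = pending.pop()
        if not chunk:
            continue
        try:
            submit(chunk)
        except HardError:
            if len(chunk) == 1:
                error_sink.append(chunk[0])  # batch-size=1: give up on it
            else:
                mid = len(chunk) // 2
                # Push the second half first so the first half runs next.
                pending.append(chunk[mid:])
                pending.append(chunk[:mid])


if __name__ == "__main__":
    accepted, errors = [], []

    def fake_submit(chunk):
        # Stand-in endpoint: rejects any chunk containing "bad".
        if "bad" in chunk:
            raise HardError("batch rejected")
        accepted.extend(chunk)

    deliver(["a", "b", "bad", "c"], fake_submit, errors)
    print(accepted, errors)
```

Note that this only avoids duplicates if a failed batch really inserts nothing, which is the transactional assumption described above.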
Now, if ES batch mode isn't an ACID transaction and it accepts some
messages and then tells you which ones failed, then you can mark the
accepted ones as done and just retry the ones that failed. But there's
still no need for a separate ruleset and queue. In rsyslog, if an output
cannot accept a message and there's reason to think that it will in the
future, then you suspend that output and try again later. If you have
reason to believe that the message is never going to be deliverable,
then you need to fail the message or you will be stuck forever. This is
what the error output was made for.
> If using the "index" (default) bulk type, this causes duplicate records
> to be added.
> If using the "create" type (and you have assigned a unique _id), you
> will get back many 409 Duplicate errors.
> This causes problems - we know because this is how the fluentd plugin
> used to work, which is why we had to change it.
>
>
> https://www.elastic.co/guide/en/elasticsearch/guide/2.x/_monitoring_individual_nodes.html#_threadpool_section
> "Bulk Rejections"
> "It is much better to handle queuing in your application by gracefully
> handling the back pressure from a full queue. When you receive bulk
> rejections, you should take these steps:
>
> 1. Pause the import thread for 3–5 seconds.
> 2. Extract the rejected actions from the bulk response, since it is
>    probable that many of the actions were successful. The bulk response
>    will tell you which succeeded and which were rejected.
> 3. Send a new bulk request with just the rejected actions.
> 4. Repeat from step 1 if rejections are encountered again.
>
> Using this procedure, your code naturally adapts to the load of your
> cluster and naturally backs off.
> "
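
The "extract the rejected actions" step quoted above could look roughly like the following Python sketch. It assumes the bulk response has already been parsed from JSON, that its `items` array is in the same order as the submitted actions, and that each item is a one-key object carrying a `status` field. Treating only HTTP 429 as retryable is an assumption here, and `split_bulk_response` is a hypothetical helper, not part of any client library.

```python
RETRYABLE = {429}  # assumption: 429 is how a bulk queue rejection is reported


def split_bulk_response(actions, response):
    """Return (retry, failed) for a parsed bulk response.

    retry  -- actions rejected with a retryable status
    failed -- (action, error) pairs for every other non-2xx status
    Successful actions are simply dropped, since they are done.
    """
    retry, failed = [], []
    if not response.get("errors"):
        return retry, failed
    for action, item in zip(actions, response["items"]):
        # Each item looks like {"index": {"status": 201, ...}}.
        (result,) = item.values()
        status = result.get("status", 0)
        if 200 <= status < 300:
            continue
        if status in RETRYABLE:
            retry.append(action)
        else:
            # Includes 409 on "create"; how to treat it is a policy choice.
            failed.append((action, result.get("error")))
    return retry, failed


if __name__ == "__main__":
    sent = ["msg1", "msg2", "msg3"]
    reply = {
        "errors": True,
        "items": [
            {"index": {"status": 201}},
            {"index": {"status": 429, "error": "rejected"}},
            {"index": {"status": 400, "error": "mapping problem"}},
        ],
    }
    print(split_bulk_response(sent, reply))
```

Only the `retry` list would be resubmitted after the pause, so records that were already indexed are not sent again.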
Does it really accept some and reject some in a random manner? Or is it
a matter of accepting the first X and rejecting any after that point?
The first is easier to deal with.
Batch mode was created to be able to more efficiently process messages
that are inserted into databases; we then found that the reduced queue
congestion was a significant advantage in itself.
But unless you have a queue just for the ES action, doing queue
manipulation isn't possible; all you can do is succeed or fail, and if
you fail, the retry logic will kick in.
Rainer is going to need to comment on this.
David Lang
>
>> repeat
>>
>> all that should be needed is to add tests into omelasticsearch to
>> detect the soft errors and turn them into retries (or suspend the
>> output as appropriate)
>>
>> David Lang
>
>
>
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT
POST if you DON'T LIKE THAT.