To my knowledge, Rich is correct. This would also explain a case we hit
every couple of months, where rsyslog rapidly duplicates some of the
messages it is sending to Elasticsearch. I assume this happens when a
batch is submitted, only some of its messages are rejected, and rsyslog
then resends the whole batch over and over, duplicating the messages that
had already been accepted.
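
For what it's worth, here is a rough Python sketch of what I understand
Rich's suggestion to be: look at the per-item status in the bulk response
and resubmit only the rejected records instead of the whole batch. The
function name and the choice of which status codes count as soft errors
are my own assumptions, not a description of what omelasticsearch does.
The sketch only relies on the bulk response carrying an "items" list in
the same order as the request, each entry with its own "status".

```python
def split_bulk_response(records, response):
    """Partition the records of one bulk request by their per-item status.

    `records` are the records that were submitted, in order; `response` is
    the decoded JSON body of the Elasticsearch bulk response.
    Returns (done, retry, failed).
    """
    done, retry, failed = [], [], []
    for record, item in zip(records, response["items"]):
        # Each item is a single-key dict such as {"index": {...}} or
        # {"create": {...}}; the value holds the per-record result.
        (result,) = item.values()
        status = result["status"]
        if status < 300:
            done.append(record)      # accepted, must not be resent
        elif status == 409:
            done.append(record)      # assumption: "create" conflict means
                                     # the record is already stored
        elif status == 429 or status >= 500:
            retry.append(record)     # assumption: soft error, retry later
        else:
            failed.append(record)    # assumption: hard error, will never
                                     # succeed, belongs in the error file
    return done, retry, failed
```

Only `retry` would go into the next bulk request, after a pause. Resending
the records in `done` with the "index" type is exactly what would produce
the duplicates we see.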

On Thu, May 17, 2018 at 12:08 AM David Lang <da...@lang.hm> wrote:

> On Wed, 16 May 2018, Rich Megginson wrote:
>
> > On 05/16/2018 05:58 PM, David Lang wrote:
> >> there's no need to add this extra complexity (multiple rulesets and
> >> queues)
> >>
> >> What should be happening (on any output module) is:
> >>
> >> submit a batch.
> >>    If rejected with a soft error, retry/suspend the output
> >
> > retry of the entire batch?  see below
> >
> >> if batch-size=1 and a hard error, send to errorfile
> >>    if rejected with a hard error resubmit half of the batch
> >
> > But what if 90% of the batch was successfully added?  Then you are
> > needlessly resubmitting many of the records in the batch.
>
> when submitting batches, you get a success/fail for the batch as a whole
> (for 99% of things that actually allow you to insert in batches), so you
> don't know what message failed. This is a database transaction (again, in
> most cases), so if a batch fails, all you can do is bisect to figure out
> what message fails. If the endpoint is inserting some of the messages from
> a batch that fails, that's usually a bad thing.
>
> now, if ES batch mode isn't an ACID transaction and it accepts some
> messages and then tells you which ones failed, then you can mark the ones
> accepted as done and just retry the ones that fail. But there's still no
> need for a separate ruleset and queue. In Rsyslog, if an output cannot
> accept a message and there's reason to think that it will in the future,
> then you suspend that output and try again later. If you have reason to
> believe that the message is never going to be able to be delivered, then
> you need to fail the message or you will be stuck forever. This is what
> the error output was made for.
>
> > If using the "index" (default) bulk type, this causes duplicate records
> > to be added.
> > If using the "create" type (and you have assigned a unique _id), you
> > will get back many 409 Duplicate errors.
> > This causes problems - we know because this is how the fluentd plugin
> > used to work, which is why we had to change it.
> >
> >
> > https://www.elastic.co/guide/en/elasticsearch/guide/2.x/_monitoring_individual_nodes.html#_threadpool_section
> > "Bulk Rejections"
> > "It is much better to handle queuing in your application by gracefully
> > handling the back pressure from a full queue. When you receive bulk
> > rejections, you should take these steps:
> >
> >     1. Pause the import thread for 3–5 seconds.
> >     2. Extract the rejected actions from the bulk response, since it is
> >        probable that many of the actions were successful. The bulk
> >        response will tell you which succeeded and which were rejected.
> >     3. Send a new bulk request with just the rejected actions.
> >     4. Repeat from step 1 if rejections are encountered again.
> >
> > Using this procedure, your code naturally adapts to the load of your
> > cluster and naturally backs off.
> > "
>
> Does it really accept some and reject some in a random manner? or is it a
> matter of accepting the first X and rejecting any after that point? The
> first is easier to deal with.
>
> Batch mode was created to be able to more efficiently process messages
> that are inserted into databases, we then found that the reduced queue
> congestion was a significant advantage in itself.
>
> But unless you have a queue just for the ES action, doing queue
> manipulation isn't possible, all you can do is succeed or fail, and if you
> fail, the retry logic will kick in.
>
> Rainer is going to need to comment on this.
>
> David Lang
>
> >
> >> repeat
> >>
> >> all that should be needed is to add tests into omelasticsearch to
> >> detect the soft errors and turn them into retries (or suspend the
> >> output as appropriate)
> >>
> >> David Lang
> >
> >
> >
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> DON'T LIKE THAT.
