Re: [DISCUSSION] KAFKA-1697 - make NotEnoughReplicasAfterAppend a non-retriable exception

Guozhang Wang Wed, 11 Feb 2015 16:19:42 -0800

If people have agreed upon this semantic:

<quote>
if you set retries > 0 you are saying "I accept
duplicates but want to ensure my stuff gets written", if you set retries =
0 you are saying "I can't abide duplicates and am willing to tolerate
loss". So Retryable for us means "retry may succeed".
<\quote>


then NotEnoughReplicasAfterAppend should be retriable.

PS: we can probably make it clearer in the new producer config table?

Guozhang

On Wed, Feb 11, 2015 at 5:41 AM, Joel Koshy <jjkosh...@gmail.com> wrote:

> Thanks for the comments - however, it is not clear to me what your
> preference is on making NotEnoughReplicasAfterAppend retriable vs
> non-retriable.
>
> As for me, my preference is to leave it as retriable since it is clear
> that the produce may succeed on a retry (and may introduce a
> duplicate). I agree that idempotence will bring full closure to this
> though.
>
> Anyone else have a preference on this?
>
> Thanks,
>
> Joel
>
> On Tue, Feb 10, 2015 at 08:23:08PM -0800, Jay Kreps wrote:
> > Yeah there are really two concepts here as I think you noted:
> > 1. Retry safe: we know that the write did not occur
> > 2. Retry fixable: if you send that again it could work
> >
> > (probably there are better names for these).
> >
> > Some things we know did not do a write and may be fixed by retrying (no
> > leader). Some things we know didn't do a write and are not worth retrying
> > (message too large). Somethings we don't know and are worth retrying
> > (network error), and probably some things we don't know and aren't worth
> it
> > (can't think of one though).
> >
> > (I feel like Donald Rumsfeld with the "known unknowns" thing).
> >
> > In the current world if you set retries > 0 you are saying "I accept
> > duplicates but want to ensure my stuff gets written", if you set retries
> =
> > 0 you are saying "I can't abide duplicates and am willing to tolerate
> > loss". So Retryable for us means "retry may succeed".
> >
> > Originally I thought of maybe trying to model both concepts. However the
> > two arguments against it are:
> > 1. Even if you do this the guarantee remains "at least once delivery"
> > because: (1) in the network error case you just don't know, (2) consumer
> > failure.
> > 2. The proper fix for this is to add idempotence support on the server,
> > which we should do.
> >
> > Doing idempotence support on the server will actually fix all duplicate
> > problems, including the network error case (because of course the server
> > knows whether your write went through even though the client doesn't).
> When
> > we have that then the client can always just retry anything marked
> > Retriable (i.e. retry may work) without fear of duplicates.
> >
> > This gives exactly once delivery to the log, and a co-operating consumer
> > can use the offset to dedupe and get it end-to-end.
> >
> > So that was why I had just left one type of Retriable and used it to mean
> > "retry may work" and don't try to flag anything for duplicates.
> >
> > -Jay
> >
> >
> >
> >
> > On Tue, Feb 10, 2015 at 4:32 PM, Gwen Shapira <gshap...@cloudera.com>
> wrote:
> >
> > > Hi Kafka Devs,
> > >
> > > Need your thoughts on retriable exceptions:
> > >
> > > If a user configures Kafka with min.isr > 1 and there are not enough
> > > replicas to safely store the data, there are two possibilities:
> > >
> > > 1. The lack of replicas was discovered before the message was written.
> We
> > > throw NotEnoughReplicas.
> > > 2. The lack of replicas was discovered after the message was written to
> > > leader. In this case, we throw  NotEnoughReplicasAfterAppend.
> > >
> > > Currently, both errors are Retriable. Which means that the new producer
> > > will retry multiple times.
> > > In case of the second exception, this will cause duplicates.
> > >
> > > KAFKA-1697 suggests:
> > > "we probably want to make NotEnoughReplicasAfterAppend a non-retriable
> > > exception and let the client decide what to do."
> > >
> > > I agreed that the client (the one using the Producer) should weight the
> > > problems duplicates will cause vs. the probability of losing the
> message
> > > and do something sensible and made the exception non-retriable.
> > >
> > > In the RB (https://reviews.apache.org/r/29647/) Joel raised a good
> point:
> > > (Joel, feel free to correct me if I misrepresented your point)
> > >
> > > "I think our interpretation of retriable is as follows (but we can
> discuss
> > > on the list if that needs to change): if the produce request hits an
> error,
> > > and there is absolutely no point in retrying then that is a
> non-retriable
> > > error. MessageSizeTooLarge is an example - since unless the producer
> > > changes the request to make the messages smaller there is no point in
> > > retrying.
> > >
> > > ...
> > > Duplicates can arise even for other errors (e.g., request timed out).
> So
> > > that side-effect is not compelling enough to warrant a change to make
> this
> > > non-retriable. "
> > >
> > > *(TL;DR;  )  Should exceptions where retries can cause duplicates
> should
> > > still be *
> > > *retriable?*
> > >
> > > Gwen
> > >
>
>


-- 
-- Guozhang

Re: [DISCUSSION] KAFKA-1697 - make NotEnoughReplicasAfterAppend a non-retriable exception

Reply via email to