Re: [DISCUSSION] KAFKA-1697 - make NotEnoughReplicasAfterAppend a non-retriable exception

Gwen Shapira Wed, 11 Feb 2015 16:22:13 -0800

Makes sense to me.

Thanks for the very detailed clarification, Jay :)


Will leave NotEnoughReplicasAfterAppend as retriable.

Gwen

On Wed, Feb 11, 2015 at 4:18 PM, Guozhang Wang <[email protected]> wrote:

> If people have agreed upon this semantic:
>
> <quote>
> if you set retries > 0 you are saying "I accept
> duplicates but want to ensure my stuff gets written", if you set retries =
> 0 you are saying "I can't abide duplicates and am willing to tolerate
> loss". So Retryable for us means "retry may succeed".
> <\quote>
>
> then NotEnoughReplicasAfterAppend should be retriable.
>
> PS: we can probably make it clearer in the new producer config table?
>
> Guozhang
>
> On Wed, Feb 11, 2015 at 5:41 AM, Joel Koshy <[email protected]> wrote:
>
> > Thanks for the comments - however, it is not clear to me what your
> > preference is on making NotEnoughReplicasAfterAppend retriable vs
> > non-retriable.
> >
> > As for me, my preference is to leave it as retriable since it is clear
> > that the produce may succeed on a retry (and may introduce a
> > duplicate). I agree that idempotence will bring full closure to this
> > though.
> >
> > Anyone else have a preference on this?
> >
> > Thanks,
> >
> > Joel
> >
> > On Tue, Feb 10, 2015 at 08:23:08PM -0800, Jay Kreps wrote:
> > > Yeah there are really two concepts here as I think you noted:
> > > 1. Retry safe: we know that the write did not occur
> > > 2. Retry fixable: if you send that again it could work
> > >
> > > (probably there are better names for these).
> > >
> > > Some things we know did not do a write and may be fixed by retrying (no
> > > leader). Some things we know didn't do a write and are not worth
> retrying
> > > (message too large). Somethings we don't know and are worth retrying
> > > (network error), and probably some things we don't know and aren't
> worth
> > it
> > > (can't think of one though).
> > >
> > > (I feel like Donald Rumsfeld with the "known unknowns" thing).
> > >
> > > In the current world if you set retries > 0 you are saying "I accept
> > > duplicates but want to ensure my stuff gets written", if you set
> retries
> > =
> > > 0 you are saying "I can't abide duplicates and am willing to tolerate
> > > loss". So Retryable for us means "retry may succeed".
> > >
> > > Originally I thought of maybe trying to model both concepts. However
> the
> > > two arguments against it are:
> > > 1. Even if you do this the guarantee remains "at least once delivery"
> > > because: (1) in the network error case you just don't know, (2)
> consumer
> > > failure.
> > > 2. The proper fix for this is to add idempotence support on the server,
> > > which we should do.
> > >
> > > Doing idempotence support on the server will actually fix all duplicate
> > > problems, including the network error case (because of course the
> server
> > > knows whether your write went through even though the client doesn't).
> > When
> > > we have that then the client can always just retry anything marked
> > > Retriable (i.e. retry may work) without fear of duplicates.
> > >
> > > This gives exactly once delivery to the log, and a co-operating
> consumer
> > > can use the offset to dedupe and get it end-to-end.
> > >
> > > So that was why I had just left one type of Retriable and used it to
> mean
> > > "retry may work" and don't try to flag anything for duplicates.
> > >
> > > -Jay
> > >
> > >
> > >
> > >
> > > On Tue, Feb 10, 2015 at 4:32 PM, Gwen Shapira <[email protected]>
> > wrote:
> > >
> > > > Hi Kafka Devs,
> > > >
> > > > Need your thoughts on retriable exceptions:
> > > >
> > > > If a user configures Kafka with min.isr > 1 and there are not enough
> > > > replicas to safely store the data, there are two possibilities:
> > > >
> > > > 1. The lack of replicas was discovered before the message was
> written.
> > We
> > > > throw NotEnoughReplicas.
> > > > 2. The lack of replicas was discovered after the message was written
> to
> > > > leader. In this case, we throw  NotEnoughReplicasAfterAppend.
> > > >
> > > > Currently, both errors are Retriable. Which means that the new
> producer
> > > > will retry multiple times.
> > > > In case of the second exception, this will cause duplicates.
> > > >
> > > > KAFKA-1697 suggests:
> > > > "we probably want to make NotEnoughReplicasAfterAppend a
> non-retriable
> > > > exception and let the client decide what to do."
> > > >
> > > > I agreed that the client (the one using the Producer) should weight
> the
> > > > problems duplicates will cause vs. the probability of losing the
> > message
> > > > and do something sensible and made the exception non-retriable.
> > > >
> > > > In the RB (https://reviews.apache.org/r/29647/) Joel raised a good
> > point:
> > > > (Joel, feel free to correct me if I misrepresented your point)
> > > >
> > > > "I think our interpretation of retriable is as follows (but we can
> > discuss
> > > > on the list if that needs to change): if the produce request hits an
> > error,
> > > > and there is absolutely no point in retrying then that is a
> > non-retriable
> > > > error. MessageSizeTooLarge is an example - since unless the producer
> > > > changes the request to make the messages smaller there is no point in
> > > > retrying.
> > > >
> > > > ...
> > > > Duplicates can arise even for other errors (e.g., request timed out).
> > So
> > > > that side-effect is not compelling enough to warrant a change to make
> > this
> > > > non-retriable. "
> > > >
> > > > *(TL;DR;  )  Should exceptions where retries can cause duplicates
> > should
> > > > still be *
> > > > *retriable?*
> > > >
> > > > Gwen
> > > >
> >
> >
>
>
> --
> -- Guozhang
>

Re: [DISCUSSION] KAFKA-1697 - make NotEnoughReplicasAfterAppend a non-retriable exception

Reply via email to