Makes sense to me. Thanks for the very detailed clarification, Jay :)
Will leave NotEnoughReplicasAfterAppend as retriable. Gwen On Wed, Feb 11, 2015 at 4:18 PM, Guozhang Wang <wangg...@gmail.com> wrote: > If people have agreed upon this semantic: > > <quote> > if you set retries > 0 you are saying "I accept > duplicates but want to ensure my stuff gets written", if you set retries = > 0 you are saying "I can't abide duplicates and am willing to tolerate > loss". So Retryable for us means "retry may succeed". > <\quote> > > then NotEnoughReplicasAfterAppend should be retriable. > > PS: we can probably make it clearer in the new producer config table? > > Guozhang > > On Wed, Feb 11, 2015 at 5:41 AM, Joel Koshy <jjkosh...@gmail.com> wrote: > > > Thanks for the comments - however, it is not clear to me what your > > preference is on making NotEnoughReplicasAfterAppend retriable vs > > non-retriable. > > > > As for me, my preference is to leave it as retriable since it is clear > > that the produce may succeed on a retry (and may introduce a > > duplicate). I agree that idempotence will bring full closure to this > > though. > > > > Anyone else have a preference on this? > > > > Thanks, > > > > Joel > > > > On Tue, Feb 10, 2015 at 08:23:08PM -0800, Jay Kreps wrote: > > > Yeah there are really two concepts here as I think you noted: > > > 1. Retry safe: we know that the write did not occur > > > 2. Retry fixable: if you send that again it could work > > > > > > (probably there are better names for these). > > > > > > Some things we know did not do a write and may be fixed by retrying (no > > > leader). Some things we know didn't do a write and are not worth > retrying > > > (message too large). Somethings we don't know and are worth retrying > > > (network error), and probably some things we don't know and aren't > worth > > it > > > (can't think of one though). > > > > > > (I feel like Donald Rumsfeld with the "known unknowns" thing). > > > > > > In the current world if you set retries > 0 you are saying "I accept > > > duplicates but want to ensure my stuff gets written", if you set > retries > > = > > > 0 you are saying "I can't abide duplicates and am willing to tolerate > > > loss". So Retryable for us means "retry may succeed". > > > > > > Originally I thought of maybe trying to model both concepts. However > the > > > two arguments against it are: > > > 1. Even if you do this the guarantee remains "at least once delivery" > > > because: (1) in the network error case you just don't know, (2) > consumer > > > failure. > > > 2. The proper fix for this is to add idempotence support on the server, > > > which we should do. > > > > > > Doing idempotence support on the server will actually fix all duplicate > > > problems, including the network error case (because of course the > server > > > knows whether your write went through even though the client doesn't). > > When > > > we have that then the client can always just retry anything marked > > > Retriable (i.e. retry may work) without fear of duplicates. > > > > > > This gives exactly once delivery to the log, and a co-operating > consumer > > > can use the offset to dedupe and get it end-to-end. > > > > > > So that was why I had just left one type of Retriable and used it to > mean > > > "retry may work" and don't try to flag anything for duplicates. > > > > > > -Jay > > > > > > > > > > > > > > > On Tue, Feb 10, 2015 at 4:32 PM, Gwen Shapira <gshap...@cloudera.com> > > wrote: > > > > > > > Hi Kafka Devs, > > > > > > > > Need your thoughts on retriable exceptions: > > > > > > > > If a user configures Kafka with min.isr > 1 and there are not enough > > > > replicas to safely store the data, there are two possibilities: > > > > > > > > 1. The lack of replicas was discovered before the message was > written. > > We > > > > throw NotEnoughReplicas. > > > > 2. The lack of replicas was discovered after the message was written > to > > > > leader. In this case, we throw NotEnoughReplicasAfterAppend. > > > > > > > > Currently, both errors are Retriable. Which means that the new > producer > > > > will retry multiple times. > > > > In case of the second exception, this will cause duplicates. > > > > > > > > KAFKA-1697 suggests: > > > > "we probably want to make NotEnoughReplicasAfterAppend a > non-retriable > > > > exception and let the client decide what to do." > > > > > > > > I agreed that the client (the one using the Producer) should weight > the > > > > problems duplicates will cause vs. the probability of losing the > > message > > > > and do something sensible and made the exception non-retriable. > > > > > > > > In the RB (https://reviews.apache.org/r/29647/) Joel raised a good > > point: > > > > (Joel, feel free to correct me if I misrepresented your point) > > > > > > > > "I think our interpretation of retriable is as follows (but we can > > discuss > > > > on the list if that needs to change): if the produce request hits an > > error, > > > > and there is absolutely no point in retrying then that is a > > non-retriable > > > > error. MessageSizeTooLarge is an example - since unless the producer > > > > changes the request to make the messages smaller there is no point in > > > > retrying. > > > > > > > > ... > > > > Duplicates can arise even for other errors (e.g., request timed out). > > So > > > > that side-effect is not compelling enough to warrant a change to make > > this > > > > non-retriable. " > > > > > > > > *(TL;DR; ) Should exceptions where retries can cause duplicates > > should > > > > still be * > > > > *retriable?* > > > > > > > > Gwen > > > > > > > > > > > -- > -- Guozhang >