[GitHub] [kafka] urbandan commented on pull request #13796: KAFKA-14034 Idempotent producer should wait for preceding in-flight b…

2023-06-14 Thread via GitHub


urbandan commented on PR #13796:
URL: https://github.com/apache/kafka/pull/13796#issuecomment-1590766255

   Thank you for the info. I'd say that in isolation, this specific issue 
(KAFKA-14034) should be resolved by retrying the in-flight batches.
   Are you suggesting that until KAFKA-14359 is fixed, we shouldn't try to fix 
KAFKA-14034, because we increase the risk of encountering KAFKA-14359?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] urbandan commented on pull request #13796: KAFKA-14034 Idempotent producer should wait for preceding in-flight b…

2023-06-08 Thread via GitHub


urbandan commented on PR #13796:
URL: https://github.com/apache/kafka/pull/13796#issuecomment-1582188829

   > I guess I just need to clarify what retried batches are here -- is the 
idea that we wait for inflight batches to return a response or time out? What 
if the response triggers another retry? Would we prevent that from sending out?
   
   The core idea is that we let each of the in-flight batches complete, even if 
they need multiple retries. This would allow the producer to
   1. Avoid inconsistency - by letting in-flight batches finish, we do not risk 
reusing their sequence numbers while it is still unknown whether they were 
appended.
   2. Operate on a best-effort basis - when an idempotent producer encounters 
an error, it is costly to verify whether a message was appended to the log (I 
think the "official" suggestion is to consume the topic to verify). By letting 
the in-flight batches finish, the idempotent producer will report fewer false 
positive errors.
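
   To make the ordering in the two points above concrete, here is a minimal 
sketch of the idea. It is only a model: `SequenceResetGate` and all of its 
methods are hypothetical names invented for illustration and do not correspond 
to the producer's actual internals. It assumes a single partition and treats 
"complete" as any final outcome of a batch (acknowledged, failed for good, or 
expired).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the proposal; this is NOT the Kafka producer's real code.
public class SequenceResetGate {
    // Sequence numbers of batches that were sent but have no final outcome yet.
    private final Set<Integer> inFlight = new HashSet<>();
    private int nextSequence = 0;
    private boolean resetPending = false;

    // New batches are held back while a reset is waiting on older batches.
    public boolean canSend() {
        return !resetPending;
    }

    // Assigns the next sequence number to a batch and marks it as in flight.
    public int send() {
        if (resetPending) {
            throw new IllegalStateException("sequence reset pending");
        }
        int seq = nextSequence++;
        inFlight.add(seq);
        return seq;
    }

    // Called once a batch reaches a final outcome, possibly after several retries.
    public void complete(int seq) {
        inFlight.remove(seq);
        maybeReset();
    }

    // Asks for a reset; it only takes effect once nothing is in flight.
    public void requestReset() {
        resetPending = true;
        maybeReset();
    }

    public boolean isResetPending() {
        return resetPending;
    }

    private void maybeReset() {
        if (resetPending && inFlight.isEmpty()) {
            nextSequence = 0;
            resetPending = false;
        }
    }
}
```

   In this model, a reset requested while two batches are in flight stays 
pending until both have completed, so their sequence numbers cannot be reused 
while their outcome is unknown.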
   
   
   > I'm also wondering the benefit of preserving the previous batches if there 
is an error. How does the system recover differently if we allow those batches 
to "complete". I think we could run into cases where the error causes the 
inflight batches to be unable to be written. Do we prefer to fail them (what we 
may do with this change) and start clean or try to write them with new 
sequences? I can see both scenarios causing issues.
   
   I believe that produce errors should be handled separately, and should not 
cascade to other batches. I think most errors do not really predict the result 
of other produce requests.
   
   > I guess it boils down to availability of writes (rewriting the sequences 
allows us to continue writing) or idempotency correctness (trying to wait for 
them to complete with their old sequences). The sticking point I'm running into 
is why getting those extra inflight requests (potentially) written is better if 
we've hit a non-retriable error.
   
   My understanding is that here correctness beats availability. Are you 
suggesting that we should just cancel in-flight batches when encountering an 
error?
   
   > Maybe I just need an example :)
   
   I will try to write up some examples, and also write more unit tests to 
demonstrate those scenarios.
   
   





[GitHub] [kafka] urbandan commented on pull request #13796: KAFKA-14034 Idempotent producer should wait for preceding in-flight b…

2023-06-06 Thread via GitHub


urbandan commented on PR #13796:
URL: https://github.com/apache/kafka/pull/13796#issuecomment-1580030486

   @viktorsomogyi Since this part of the code is quite tricky, I would try to 
address the different issues in different PRs. I believe that the fix I'm 
proposing will solve the issue reported in KAFKA-14034 specifically. 





[GitHub] [kafka] urbandan commented on pull request #13796: KAFKA-14034 Idempotent producer should wait for preceding in-flight b…

2023-06-05 Thread via GitHub


urbandan commented on PR #13796:
URL: https://github.com/apache/kafka/pull/13796#issuecomment-1576211511

   > > Instead, the producer should wait for the preceding, retried batches to 
complete before resetting the sequence number. This ensures that the sequence 
numbers can only get reset after the preceding batches are definitely completed.
   > 
   > Can we guarantee that the retrying batches will complete?
   
   No, we can't - it all depends on the retries and the delivery timeout.
   Do you think that causes a problem?
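
   For reference, these are the producer settings that bound how long a batch 
may keep retrying. The sketch below only builds a `Properties` object with 
illustrative values; the class name `IdempotentProducerConfigSketch` is 
hypothetical, and the values are assumptions chosen for the example rather than 
recommendations.

```java
import java.util.Properties;

// Hypothetical helper; only the config keys are real producer settings.
public class IdempotentProducerConfigSketch {
    public static Properties build() {
        Properties p = new Properties();
        p.setProperty("enable.idempotence", "true");
        p.setProperty("acks", "all");
        // Idempotence allows at most 5 in-flight requests per connection.
        p.setProperty("max.in.flight.requests.per.connection", "5");
        // Retry count is effectively unbounded here, so the delivery timeout
        // becomes the limit on how long a batch may stay unresolved.
        p.setProperty("retries", Integer.toString(Integer.MAX_VALUE));
        p.setProperty("delivery.timeout.ms", "120000");
        p.setProperty("request.timeout.ms", "30000");
        return p;
    }
}
```

   With settings like these, a retried batch either completes or expires 
within the delivery timeout, so any wait on preceding batches is bounded by 
that timeout rather than open-ended.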





[GitHub] [kafka] urbandan commented on pull request #13796: KAFKA-14034 Idempotent producer should wait for preceding in-flight b…

2023-06-02 Thread via GitHub


urbandan commented on PR #13796:
URL: https://github.com/apache/kafka/pull/13796#issuecomment-1573381258

   @philipnee @ijuma @hachikuji @jolshan Could you please review this fix? 
Based on the git history, you recently made changes to the related code.





[GitHub] [kafka] urbandan commented on pull request #13796: KAFKA-14034 Idempotent producer should wait for preceding in-flight b…

2023-06-01 Thread via GitHub


urbandan commented on PR #13796:
URL: https://github.com/apache/kafka/pull/13796#issuecomment-1572188562

   @viktorsomogyi Could you please review this fix?

