divijvaidya commented on PR #14242: URL: https://github.com/apache/kafka/pull/14242#issuecomment-1779404361
> If we ignore producer-state-flush failure here, recovery-point might be incremented even with stale on-disk producer state snapshot. So, in case of restart after power failure, the broker might restore stale producer state without rebuilding (since recovery point is incremented) which could cause idempotency issues.

Great point. May I suggest that we document the consistency expectations of the producer snapshot relative to the segment data on disk? From what you mentioned, it sounds like: "Kafka expects the producer snapshot to be strongly consistent with the segment data on disk before the recovery checkpoint, but does not expect this after the checkpoint. The inconsistency after the checkpoint is acceptable because ... blah blah".

We can then verify those expectations with experts such as Justine and Jun. Based on that, we can make a decision on quietly vs. async etc. The documentation will also help future contributors reason about the code base. Initially, you can put the documentation in the description of this PR itself, and later we can find a home for it in the Kafka website docs.

We need to do the same exercise for the other files that you are changing in this PR.
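
To make the expectation above easier to discuss, here is a minimal toy model of the concern raised in the quoted comment. It is not Kafka code: `ToyLog`, `flushStrict`, `flushQuietly` and every field name are hypothetical, and the check `snapshotOffset >= recoveryPoint` is only my assumed reading of "the snapshot is consistent with segment data before the recovery checkpoint".

```java
import java.io.IOException;

// Toy model only; none of these names exist in Kafka.
final class ToyLog {
    long logEndOffset = 0;    // offset of the next record to be appended
    long snapshotOffset = 0;  // offset covered by the on-disk producer snapshot
    long recoveryPoint = 0;   // data below this is assumed consistent on disk
    boolean snapshotFlushFails = false; // switch to simulate an fsync failure

    void append(int records) {
        logEndOffset += records;
    }

    private void flushProducerSnapshot() throws IOException {
        if (snapshotFlushFails) {
            throw new IOException("simulated snapshot flush failure");
        }
        snapshotOffset = logEndOffset;
    }

    // Strict variant: a failed snapshot flush stops the recovery point from advancing.
    void flushStrict() throws IOException {
        flushProducerSnapshot();
        recoveryPoint = logEndOffset;
    }

    // "Quiet" variant: the failure is swallowed and the recovery point advances anyway.
    void flushQuietly() {
        try {
            flushProducerSnapshot();
        } catch (IOException e) {
            // Ignored on purpose to show the hazard.
        }
        recoveryPoint = logEndOffset;
    }

    // Assumed expectation: the snapshot covers everything below the recovery point.
    boolean snapshotConsistentWithCheckpoint() {
        return snapshotOffset >= recoveryPoint;
    }
}

class ToyLogDemo {
    public static void main(String[] args) {
        ToyLog quiet = new ToyLog();
        quiet.append(10);
        quiet.snapshotFlushFails = true;
        quiet.flushQuietly();
        // Recovery point moved to 10 while the snapshot still covers 0.
        System.out.println("quiet consistent: " + quiet.snapshotConsistentWithCheckpoint());

        ToyLog strict = new ToyLog();
        strict.append(10);
        strict.snapshotFlushFails = true;
        try {
            strict.flushStrict();
        } catch (IOException e) {
            // Recovery point stays at 0, so a restart would rebuild producer state.
        }
        System.out.println("strict consistent: " + strict.snapshotConsistentWithCheckpoint());
    }
}
```

In this model the quiet variant breaks the assumed expectation after a failed flush, while the strict variant preserves it at the cost of surfacing the error. Whether the real code needs the strict behaviour is exactly what the documented expectations should settle.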