Justine Olshan created KAFKA-14920:
--------------------------------------
Summary: Address timeouts and out of order sequences
Key: KAFKA-14920
URL: https://issues.apache.org/jira/browse/KAFKA-14920
Project: Kafka
Issue Type: Sub-task
Reporter: Justine Olshan
Assignee: Justine Olshan
KAFKA-14844 showed the destructive nature of a timeout on the first produce
request for a topic partition (i.e. one that has no state in the
ProducerStateManager, psm).
Since we currently don't validate the first sequence (we will in part 2 of
KIP-890), any transient error on the first produce can lead to out of order
sequences that never recover.
Originally, KAFKA-14561 relied on the producer's retry mechanism for these
transient issues, but until that is fixed, we may need to retry from within the
AddPartitionsManager instead. We addressed the concurrent transactions error,
but there are other errors, like coordinator loading, that we could run into
and that would increase out of order issues.
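
A rough sketch of what a broker-side retry could look like is below. This is not the actual Kafka code: the class, the VerifyError enum, and verifyWithRetry are hypothetical names made up for illustration, and the set of errors treated as transient is an assumption based only on the two errors named above (concurrent transactions and coordinator loading).

```java
import java.util.EnumSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch only; none of these names come from the Kafka code base.
final class VerificationRetrySketch {

    // Illustrative outcomes of one verification attempt (assumed names).
    enum VerifyError { NONE, CONCURRENT_TRANSACTIONS, COORDINATOR_LOAD_IN_PROGRESS, INVALID_TXN_STATE }

    // Assumption: only these two outcomes are worth retrying on the broker.
    static final Set<VerifyError> TRANSIENT = EnumSet.of(
            VerifyError.CONCURRENT_TRANSACTIONS,
            VerifyError.COORDINATOR_LOAD_IN_PROGRESS);

    // Calls verify up to maxAttempts times, sleeping backoffMs between attempts.
    // Returns the first non-transient result, or the last transient result
    // once the attempts are exhausted.
    static VerifyError verifyWithRetry(Supplier<VerifyError> verify, int maxAttempts, long backoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        VerifyError last = VerifyError.NONE;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            last = verify.get();
            if (!TRANSIENT.contains(last)) {
                return last; // success or a non-retriable error: stop immediately
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return last;
                }
            }
        }
        return last;
    }

    public static void main(String[] args) {
        // Simulated coordinator responses: two transient errors, then success.
        Iterator<VerifyError> responses = List.of(
                VerifyError.CONCURRENT_TRANSACTIONS,
                VerifyError.COORDINATOR_LOAD_IN_PROGRESS,
                VerifyError.NONE).iterator();
        System.out.println(verifyWithRetry(responses::next, 5, 10L));
    }
}
```

The point of bounding the attempts is that the first produce request is only failed back to the producer after the transient condition has had a chance to clear, rather than on the first transient error.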