Sean Quah created KAFKA-16386:
---------------------------------
Summary: NETWORK_EXCEPTIONs from transaction verification are not
translated
Key: KAFKA-16386
URL: https://issues.apache.org/jira/browse/KAFKA-16386
Project: Kafka
Issue Type: Bug
Affects Versions: 3.6.0
Reporter: Sean Quah
KAFKA-14402
([KIP-890|https://cwiki.apache.org/confluence/display/KAFKA/KIP-890%3A+Transactions+Server-Side+Defense])
adds verification with the transaction coordinator on the Produce and
TxnOffsetCommit paths as a defense against hanging transactions. For
compatibility with older clients, retriable errors from the verification step
are translated to errors that existing clients already expect and handle. When
verification was added, we forgot to translate {{NETWORK_EXCEPTION}} errors.
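A minimal sketch of the translation step, assuming a simplified model: the {{ErrorCode}} and {{RequestPath}} enums and the {{translate}} helper below are hypothetical stand-ins, not Kafka's {{Errors}} class or broker code. The target errors ({{NOT_ENOUGH_REPLICAS}} on Produce, {{COORDINATOR_NOT_AVAILABLE}} on TxnOffsetCommit) are assumptions chosen for illustration. The sketch only shows the shape of the fix: {{NETWORK_EXCEPTION}} has to be in the set of verification errors that are rewritten before the response reaches an older client.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: ErrorCode mirrors a few Kafka error names but is NOT
// org.apache.kafka.common.protocol.Errors, and translate() is not broker code.
public class VerificationErrorTranslation {

    enum ErrorCode {
        NONE,
        NETWORK_EXCEPTION,
        CONCURRENT_TRANSACTIONS,
        COORDINATOR_LOAD_IN_PROGRESS,
        COORDINATOR_NOT_AVAILABLE,
        NOT_COORDINATOR,
        NOT_ENOUGH_REPLICAS,
        INVALID_TXN_STATE
    }

    enum RequestPath { PRODUCE, TXN_OFFSET_COMMIT }

    // Retriable errors from the verification step that older clients may not
    // handle on these paths. NETWORK_EXCEPTION is the entry the ticket says was missing.
    static final Set<ErrorCode> TRANSLATED = EnumSet.of(
            ErrorCode.NETWORK_EXCEPTION,
            ErrorCode.CONCURRENT_TRANSACTIONS,
            ErrorCode.COORDINATOR_LOAD_IN_PROGRESS,
            ErrorCode.COORDINATOR_NOT_AVAILABLE,
            ErrorCode.NOT_COORDINATOR);

    static ErrorCode translate(ErrorCode verificationError, RequestPath path) {
        if (!TRANSLATED.contains(verificationError)) {
            // NONE and non-retriable errors pass through unchanged.
            return verificationError;
        }
        // Assumed targets: errors an older client already retries on each path.
        return path == RequestPath.PRODUCE
                ? ErrorCode.NOT_ENOUGH_REPLICAS
                : ErrorCode.COORDINATOR_NOT_AVAILABLE;
    }

    public static void main(String[] args) {
        System.out.println(translate(ErrorCode.NETWORK_EXCEPTION, RequestPath.PRODUCE));           // NOT_ENOUGH_REPLICAS
        System.out.println(translate(ErrorCode.NETWORK_EXCEPTION, RequestPath.TXN_OFFSET_COMMIT)); // COORDINATOR_NOT_AVAILABLE
        System.out.println(translate(ErrorCode.INVALID_TXN_STATE, RequestPath.PRODUCE));           // INVALID_TXN_STATE
    }
}
```

Any retriable verification error left out of {{TRANSLATED}} passes through unchanged, which is how an untranslated {{NETWORK_EXCEPTION}} can reach a client that does not know how to handle it.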
[~dajac] noticed this manifesting as a test failure when
tests/kafkatest/tests/core/transactions_test.py was run with an older client
(pre-KAFKA-16122):
{quote}
{{NETWORK_EXCEPTION}} is indeed returned as a partition error. The
{{TransactionManager.TxnOffsetCommitHandler}} treats it as a fatal error, so the
{{TransactionManager}} transitions to the fatal state.
It seems that there are two cases where the server could return it: (1) when
the verification request times out or its connection is cut; or (2) in
{{AddPartitionsToTxnManager.addTxnData}}, where we say that we use it because we
want a retriable error.
{quote}
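To illustrate why an untranslated error is fatal for an older client, here is a hypothetical sketch of a per-partition error classifier for a TxnOffsetCommit response. The enums, the {{RETRIABLE}} set and {{handlePartitionError}} are invented for illustration; the retriable set is an assumption about a pre-KAFKA-16122 client, not a copy of {{TxnOffsetCommitHandler}}. The only point is that any error outside the known set falls through to the fatal branch.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of how an older client might classify partition errors in
// a TxnOffsetCommit response. Not Kafka's TxnOffsetCommitHandler.
public class OldClientOffsetCommitHandler {

    enum ErrorCode {
        NONE,
        COORDINATOR_NOT_AVAILABLE,
        NOT_COORDINATOR,
        REQUEST_TIMED_OUT,
        COORDINATOR_LOAD_IN_PROGRESS,
        UNKNOWN_TOPIC_OR_PARTITION,
        NETWORK_EXCEPTION
    }

    enum Outcome { SUCCESS, RETRY, FATAL }

    // Assumed retriable set for an older client on this path.
    // NETWORK_EXCEPTION is deliberately absent, as described in the ticket.
    static final Set<ErrorCode> RETRIABLE = EnumSet.of(
            ErrorCode.COORDINATOR_NOT_AVAILABLE,
            ErrorCode.NOT_COORDINATOR,
            ErrorCode.REQUEST_TIMED_OUT,
            ErrorCode.COORDINATOR_LOAD_IN_PROGRESS,
            ErrorCode.UNKNOWN_TOPIC_OR_PARTITION);

    static Outcome handlePartitionError(ErrorCode error) {
        if (error == ErrorCode.NONE) {
            return Outcome.SUCCESS;
        }
        if (RETRIABLE.contains(error)) {
            return Outcome.RETRY;
        }
        // Anything unexpected is treated as fatal, so the producer would
        // move to its fatal state.
        return Outcome.FATAL;
    }

    public static void main(String[] args) {
        System.out.println(handlePartitionError(ErrorCode.COORDINATOR_NOT_AVAILABLE)); // RETRY
        System.out.println(handlePartitionError(ErrorCode.NETWORK_EXCEPTION));         // FATAL
    }
}
```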
The first case was triggered during the test. The second case occurs when a
verification request ({{AddPartitionsToTxn}}) with the same epoch is already in
flight, and we want the client to try again once the broker is no longer busy.
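A simplified, hypothetical sketch of the second case: the broker tracks at most one in-flight verification per transactional ID and rejects a same-epoch duplicate with a retriable error. The {{PendingVerifications}} class and its {{tryStartVerification}} and {{complete}} methods are invented for illustration; this is not how {{AddPartitionsToTxnManager}} is actually structured, and which request receives the error in the real code may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified sketch: one pending verification per transactional id.
// Not Kafka's AddPartitionsToTxnManager.
public class PendingVerifications {

    enum ErrorCode { NONE, NETWORK_EXCEPTION, INVALID_PRODUCER_EPOCH }

    // transactionalId -> producer epoch of the verification currently in flight
    private final Map<String, Short> inFlight = new HashMap<>();

    /** Returns NONE if the verification was started, otherwise an error for the caller. */
    synchronized ErrorCode tryStartVerification(String transactionalId, short producerEpoch) {
        Short pendingEpoch = inFlight.get(transactionalId);
        if (pendingEpoch != null) {
            if (producerEpoch < pendingEpoch) {
                // An older epoch than the pending request: treat the caller as fenced.
                return ErrorCode.INVALID_PRODUCER_EPOCH;
            }
            if (producerEpoch == pendingEpoch) {
                // Same epoch already being verified: ask the client to retry later.
                // An older client must see a translated error here, not NETWORK_EXCEPTION.
                return ErrorCode.NETWORK_EXCEPTION;
            }
            // A newer epoch simply replaces the pending entry (simplification).
        }
        inFlight.put(transactionalId, producerEpoch);
        return ErrorCode.NONE;
    }

    /** Called when the coordinator's response (or a timeout) completes the verification. */
    synchronized void complete(String transactionalId) {
        inFlight.remove(transactionalId);
    }

    public static void main(String[] args) {
        PendingVerifications pv = new PendingVerifications();
        System.out.println(pv.tryStartVerification("txn-1", (short) 5)); // NONE
        System.out.println(pv.tryStartVerification("txn-1", (short) 5)); // NETWORK_EXCEPTION
        System.out.println(pv.tryStartVerification("txn-1", (short) 4)); // INVALID_PRODUCER_EPOCH
        pv.complete("txn-1");
        System.out.println(pv.tryStartVerification("txn-1", (short) 5)); // NONE
    }
}
```

Rejecting the duplicate with a retriable error keeps the broker from issuing two verification calls for the same transaction at once; the ticket's point is that, for older clients, that error still needs the same translation as the timeout case.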