Graham Campbell created KAFKA-16951:
---------------------------------------
Summary: TransactionManager should rediscover coordinator on
disconnection
Key: KAFKA-16951
URL: https://issues.apache.org/jira/browse/KAFKA-16951
Project: Kafka
Issue Type: Improvement
Components: clients, producer
Affects Versions: 3.7.0
Reporter: Graham Campbell
When a transaction coordinator for a transactional client shuts down for
restart or due to failure, the NetworkClient notices the broker disconnection
and [will automatically refresh cluster
metadata|https://github.com/apache/kafka/blob/f380cd1b64134cf81e5dab16d71a276781de890e/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L1182-L1183]
to get the latest partition assignments.
The TransactionManager does not notice the change until the next transactional
request. If the broker is still offline, that request [blocks while the
client attempts to reconnect to the old
coordinator|https://github.com/apache/kafka/blob/f380cd1b64134cf81e5dab16d71a276781de890e/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L489-L490],
for up to request.timeout.ms (default 30 seconds). Coordinator
lookup is only performed after a transactional request times out and fails. The
lookup is triggered in either the [Sender|#L525-L528]
or
[TransactionManager's|https://github.com/apache/kafka/blob/f380cd1b64134cf81e5dab16d71a276781de890e/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java#L1225-L1229]
error handling.
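
For reference, a minimal sketch of where the request.timeout.ms bound mentioned above sits in a transactional producer's configuration. It uses only java.util.Properties so that it runs without the Kafka client on the classpath. The class name, broker address, transactional id, and the 10-second value are placeholder assumptions, not values taken from this ticket.

```java
import java.util.Properties;

// Hypothetical helper for illustration only; not part of the Kafka client.
class TransactionalProducerConfigSketch {

    // Builds the properties a transactional KafkaProducer would be constructed with.
    static Properties build(String bootstrapServers, String transactionalId, int requestTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("transactional.id", transactionalId);
        // Upper bound on the wait described above when the old coordinator is unreachable.
        props.setProperty("request.timeout.ms", Integer.toString(requestTimeoutMs));
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values; adjust for a real cluster.
        Properties props = build("localhost:9092", "example-txn-id", 10_000);
        System.out.println("request.timeout.ms = " + props.getProperty("request.timeout.ms"));
    }
}
```

Lowering this value shortens the worst-case wait, but the setting applies to every request the producer sends, so it is a trade-off rather than a fix for the delayed lookup.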
To support faster recovery and faster reaction to transaction coordinator
reassignments, the TransactionManager should proactively look up the transaction
coordinator whenever the client is disconnected from the current transaction
coordinator.
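
One way the proposed behavior could look, as a simplified and self-contained sketch. CoordinatorTrackerSketch and its methods are hypothetical stand-ins invented for illustration. They are not the actual TransactionManager or NetworkClient APIs, and the real change would have to fit the producer's existing request queueing and state handling.

```java
import java.util.OptionalInt;

// Hypothetical, simplified model of the coordinator-tracking state; not Kafka's actual class.
class CoordinatorTrackerSketch {
    private OptionalInt coordinatorId = OptionalInt.empty();
    private boolean lookupPending = false;

    // Records the coordinator returned by a completed lookup.
    void onCoordinatorFound(int nodeId) {
        coordinatorId = OptionalInt.of(nodeId);
        lookupPending = false;
    }

    // Called when the network layer reports a disconnect from a broker.
    // If that broker is the current coordinator, forget it and flag a lookup
    // immediately instead of waiting for the next transactional request to time out.
    void onDisconnect(int nodeId) {
        if (coordinatorId.isPresent() && coordinatorId.getAsInt() == nodeId) {
            coordinatorId = OptionalInt.empty();
            lookupPending = true;
        }
    }

    boolean isLookupPending() {
        return lookupPending;
    }

    OptionalInt coordinator() {
        return coordinatorId;
    }

    public static void main(String[] args) {
        CoordinatorTrackerSketch tracker = new CoordinatorTrackerSketch();
        tracker.onCoordinatorFound(1);

        tracker.onDisconnect(2); // unrelated broker: nothing to do
        System.out.println("lookup pending after broker 2 disconnect: " + tracker.isLookupPending());

        tracker.onDisconnect(1); // current coordinator: rediscover right away
        System.out.println("lookup pending after broker 1 disconnect: " + tracker.isLookupPending());
    }
}
```

In this sketch the disconnect notification alone is enough to schedule a lookup, so a coordinator that has moved to another broker could be found before the next transactional request is sent.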
--
This message was sent by Atlassian Jira
(v8.20.10#820010)