Graham Campbell created KAFKA-16951:
---------------------------------------

             Summary: TransactionManager should rediscover coordinator on 
disconnection
                 Key: KAFKA-16951
                 URL: https://issues.apache.org/jira/browse/KAFKA-16951
             Project: Kafka
          Issue Type: Improvement
          Components: clients, producer 
    Affects Versions: 3.7.0
            Reporter: Graham Campbell


When a transaction coordinator for a transactional client shuts down for 
restart or due to failure, the NetworkClient notices the broker disconnection 
and [will automatically refresh cluster 
metadata|https://github.com/apache/kafka/blob/f380cd1b64134cf81e5dab16d71a276781de890e/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L1182-L1183]
 to get the latest partition assignments.

The TransactionManager does not notice any changes until the next transactional 
request. If the broker is still offline, this is a [blocking wait while the 
client attempts to reconnect to the old 
coordinator|https://github.com/apache/kafka/blob/f380cd1b64134cf81e5dab16d71a276781de890e/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L489-L490],
 which can be up to request.timeout.ms long (default 30 seconds). Coordinator 
lookup is only performed after a transactional request times out and fails. The 
lookup is triggered in either the [Sender|#L525-L528]
 or 
[TransactionManager's|https://github.com/apache/kafka/blob/f380cd1b64134cf81e5dab16d71a276781de890e/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java#L1225-L1229]
 error handling.

To support faster recovery and faster reaction to transaction coordinator 
reassignments, the TransactionManager should proactively look up the transaction 
coordinator whenever the client is disconnected from the current transaction 
coordinator.
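
A minimal sketch of the proposed behaviour, for illustration only. 
CoordinatorTracker and its methods are hypothetical names in a simplified 
standalone model. They are not part of Kafka's TransactionManager or 
NetworkClient, and the real change would have to hook into the client's 
existing disconnect handling.

```java
// Hypothetical, simplified model of the proposal; not Kafka's actual classes.
final class CoordinatorTracker {
    private String coordinatorId;   // null means the coordinator is unknown
    private boolean lookupPending;  // true when a coordinator lookup should be sent

    CoordinatorTracker(String coordinatorId) {
        this.coordinatorId = coordinatorId;
    }

    // Assumed callback: invoked when the connection to any broker drops.
    void onDisconnect(String nodeId) {
        if (nodeId.equals(coordinatorId)) {
            // Forget the stale coordinator right away instead of waiting for
            // the next transactional request to time out against it.
            coordinatorId = null;
            lookupPending = true;
        }
    }

    // Assumed callback: invoked when a coordinator lookup response arrives.
    void onCoordinatorFound(String nodeId) {
        coordinatorId = nodeId;
        lookupPending = false;
    }

    boolean coordinatorKnown() { return coordinatorId != null; }

    boolean lookupPending() { return lookupPending; }

    public static void main(String[] args) {
        CoordinatorTracker tracker = new CoordinatorTracker("broker-1");
        tracker.onDisconnect("broker-2");   // unrelated broker: no effect
        tracker.onDisconnect("broker-1");   // coordinator lost: lookup scheduled
        System.out.println("lookup pending: " + tracker.lookupPending());
        tracker.onCoordinatorFound("broker-3");
        System.out.println("coordinator known: " + tracker.coordinatorKnown());
    }
}
```

In this model a disconnect from a non-coordinator broker is ignored, so 
only the loss of the current coordinator triggers a lookup.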



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
