[ https://issues.apache.org/jira/browse/KAFKA-16951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854858#comment-17854858 ]

Justine Olshan commented on KAFKA-16951:
----------------------------------------

[~gcampbell] for my understanding, this only takes the full
request.timeout.ms if the old coordinator is still shut down (i.e. not
restarted in that time)? I agree that in that case this is unfortunate and
there is a way to make this faster.

> TransactionManager should rediscover coordinator on disconnection
> -----------------------------------------------------------------
>
>                 Key: KAFKA-16951
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16951
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients, producer
>    Affects Versions: 3.7.0
>            Reporter: Graham Campbell
>            Priority: Major
>
> When a transaction coordinator for a transactional client shuts down for
> restart or due to failure, the NetworkClient notices the broker disconnection
> and [will automatically refresh cluster
> metadata|https://github.com/apache/kafka/blob/f380cd1b64134cf81e5dab16d71a276781de890e/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L1182-L1183]
> to get the latest partition assignments.
> The TransactionManager does not notice any changes until the next
> transactional request. If the broker is still offline, this is a [blocking
> wait while the client attempts to reconnect to the old
> coordinator|https://github.com/apache/kafka/blob/f380cd1b64134cf81e5dab16d71a276781de890e/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L489-L490],
> which can be up to request.timeout.ms long (default 30 seconds). Coordinator
> lookup is only performed after a transactional request times out and fails.
> The lookup is triggered in either the [Sender|#L525-L528]
> or
> [TransactionManager's|https://github.com/apache/kafka/blob/f380cd1b64134cf81e5dab16d71a276781de890e/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java#L1225-L1229]
> error handling.
> To support faster recovery and faster reaction to transaction coordinator
> reassignments, the TransactionManager should proactively look up the
> transaction coordinator whenever the client is disconnected from the current
> transaction coordinator.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
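
For illustration only, the proposed behaviour can be sketched as a small
self-contained model. This is not Kafka's actual TransactionManager or
NetworkClient code: the class ToyTransactionManager and its methods
(onDisconnect, onCoordinatorFound, needsCoordinatorLookup) are hypothetical
names chosen for this sketch. The assumption is that the network layer can
report which node was disconnected, so the coordinator can be marked unknown
right away instead of after a request has timed out.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the proposal; all names here are hypothetical, not Kafka APIs. */
public class CoordinatorRediscoverySketch {

    /** Hypothetical stand-in for the producer's transaction state. */
    static final class ToyTransactionManager {
        // null means "coordinator unknown, a lookup is needed"
        private Integer coordinatorId;
        private final List<String> events = new ArrayList<>();

        ToyTransactionManager(int coordinatorId) {
            this.coordinatorId = coordinatorId;
        }

        /** Assumed callback from the network layer when a broker connection drops. */
        void onDisconnect(int nodeId) {
            // Only react if the lost node is the current transaction coordinator.
            if (coordinatorId != null && coordinatorId == nodeId) {
                coordinatorId = null;
                events.add("lookup-requested");
            }
        }

        /** Assumed callback once a coordinator lookup has returned a node. */
        void onCoordinatorFound(int nodeId) {
            coordinatorId = nodeId;
            events.add("coordinator=" + nodeId);
        }

        boolean needsCoordinatorLookup() {
            return coordinatorId == null;
        }

        List<String> events() {
            return new ArrayList<>(events);
        }
    }

    public static void main(String[] args) {
        ToyTransactionManager tm = new ToyTransactionManager(1);
        tm.onDisconnect(2);        // unrelated broker: coordinator stays known
        tm.onDisconnect(1);        // coordinator dropped: lookup flagged immediately
        tm.onCoordinatorFound(3);  // lookup result arrives
        System.out.println(tm.events()); // prints [lookup-requested, coordinator=3]
    }
}
```

In this model the lookup is flagged when the disconnect is observed, so it
does not wait for a transactional request to fail. How the real client would
wire such a callback between NetworkClient, Sender and TransactionManager is
left open here.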