[ 
https://issues.apache.org/jira/browse/KAFKA-16951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854858#comment-17854858
 ] 

Justine Olshan commented on KAFKA-16951:
----------------------------------------

[~gcampbell] for my understanding, this only takes the full 
request.timeout.ms if the old coordinator is still shut down (i.e. not 
restarted within that time)? I agree that in that case this is unfortunate, 
and there is a way to make this faster.

> TransactionManager should rediscover coordinator on disconnection
> -----------------------------------------------------------------
>
>                 Key: KAFKA-16951
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16951
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients, producer 
>    Affects Versions: 3.7.0
>            Reporter: Graham Campbell
>            Priority: Major
>
> When a transaction coordinator for a transactional client shuts down for 
> restart or due to failure, the NetworkClient notices the broker disconnection 
> and [will automatically refresh cluster 
> metadata|https://github.com/apache/kafka/blob/f380cd1b64134cf81e5dab16d71a276781de890e/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L1182-L1183]
>  to get the latest partition assignments.
> The TransactionManager does not notice any changes until the next 
> transactional request. If the broker is still offline, this is a [blocking 
> wait while the client attempts to reconnect to the old 
> coordinator|https://github.com/apache/kafka/blob/f380cd1b64134cf81e5dab16d71a276781de890e/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L489-L490],
>  which can be up to request.timeout.ms long (default 30 seconds). Coordinator 
> lookup is only performed after a transactional request times out and fails. 
> The lookup is triggered in either the [Sender|#L525-L528]
>  or 
> [TransactionManager's|https://github.com/apache/kafka/blob/f380cd1b64134cf81e5dab16d71a276781de890e/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java#L1225-L1229]
>  error handling.
> To support faster recovery and faster reaction to transaction coordinator 
> reassignments, the TransactionManager should proactively look up the 
> transaction coordinator whenever the client is disconnected from the current 
> transaction coordinator.
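
The behaviour proposed in the description can be sketched roughly as follows. 
This is a minimal, self-contained model and not the actual Kafka client code: 
SketchTransactionManager, handleDisconnect, and the node ids are hypothetical 
names chosen for illustration, and a real change would have to hook into the 
NetworkClient and TransactionManager internals.

```java
/**
 * Hypothetical, simplified model of "rediscover the coordinator on
 * disconnection". None of these types are real Kafka classes.
 */
public class CoordinatorRediscoverySketch {

    /** Stand-in for the part of a transaction manager that tracks the coordinator. */
    static final class SketchTransactionManager {
        private String coordinatorNodeId; // null means "coordinator unknown"
        private boolean lookupPending;    // true when a coordinator lookup should be sent

        /** Records the result of a (simulated) coordinator lookup. */
        void setCoordinator(String nodeId) {
            this.coordinatorNodeId = nodeId;
            this.lookupPending = false;
        }

        /**
         * Assumed hook: the network layer calls this for every broker
         * disconnect. Only a disconnect from the current coordinator
         * invalidates it and schedules a new lookup.
         */
        void handleDisconnect(String nodeId) {
            if (nodeId != null && nodeId.equals(coordinatorNodeId)) {
                coordinatorNodeId = null;
                lookupPending = true;
            }
        }

        boolean isLookupPending() {
            return lookupPending;
        }

        String coordinator() {
            return coordinatorNodeId;
        }
    }

    public static void main(String[] args) {
        SketchTransactionManager tm = new SketchTransactionManager();
        tm.setCoordinator("broker-1");

        // A disconnect from some other broker leaves the coordinator untouched.
        tm.handleDisconnect("broker-2");
        System.out.println("lookup pending after unrelated disconnect: " + tm.isLookupPending());

        // A disconnect from the coordinator triggers rediscovery immediately,
        // instead of waiting for the next transactional request to time out.
        tm.handleDisconnect("broker-1");
        System.out.println("lookup pending after coordinator disconnect: " + tm.isLookupPending());
    }
}
```

The point of the sketch is only the ordering: the lookup is scheduled at 
disconnect time rather than after a transactional request has waited out 
request.timeout.ms against a broker that is still offline.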



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
