[jira] [Commented] (KAFKA-16951) TransactionManager should rediscover coordinator on disconnection

Graham Campbell (Jira) Tue, 18 Jun 2024 13:04:05 -0700


    [ https://issues.apache.org/jira/browse/KAFKA-16951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856057#comment-17856057 ]


Graham Campbell commented on KAFKA-16951:
-----------------------------------------

Yes. If the original coordinator is online, the transactional request will 
either succeed as normal (if leader election has happened for the relevant 
__transaction_state partition) or quickly return a NOT_COORDINATOR error.
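The two outcomes above can be illustrated with a toy state holder. After reconnecting, the cached coordinator either handles the request or answers NOT_COORDINATOR. The second answer should send the client back to coordinator lookup. `CoordinatorState` and `CoordinatorResult` are hypothetical stand-ins, not Kafka classes. The real producer keeps comparable bookkeeping inside its TransactionManager.

```java
/** Hypothetical outcome of a transactional request sent to the cached coordinator. */
enum CoordinatorResult { SUCCESS, NOT_COORDINATOR }

/** Toy stand-in (not a Kafka class) for a transactional producer's coordinator bookkeeping. */
final class CoordinatorState {
    // null means "coordinator unknown": a FindCoordinator lookup is needed first.
    private String coordinatorNodeId;

    CoordinatorState(String coordinatorNodeId) {
        this.coordinatorNodeId = coordinatorNodeId;
    }

    boolean needsLookup() {
        return coordinatorNodeId == null;
    }

    void onResponse(CoordinatorResult result) {
        switch (result) {
            case SUCCESS:
                // The broker still leads the relevant __transaction_state partition.
                break;
            case NOT_COORDINATOR:
                // The broker replied quickly but no longer owns the partition, so
                // drop the cached coordinator and let the next step rediscover it.
                coordinatorNodeId = null;
                break;
        }
    }

    void onLookupComplete(String newCoordinatorNodeId) {
        coordinatorNodeId = newCoordinatorNodeId;
    }
}
```

In this model, the cheap case is the quick NOT_COORDINATOR reply. The client loses only one round trip before it looks up the new coordinator.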


In the linked PR, I've attempted to turn the handleServerDisconnect method 
used by the MetadataUpdater into a more general interface.
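Below is a hypothetical sketch of a disconnect callback that more than one component can register for. With such a callback, the producer's transaction bookkeeping could react to a disconnect in the same way that metadata refresh does today. `ServerDisconnectListener`, `DisconnectNotifier`, `MetadataRefresher` and `CoordinatorWatcher` are illustrative names only. They are loosely modeled on `handleServerDisconnect` and are not the interface from the linked PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback; not the interface proposed in the linked PR. */
interface ServerDisconnectListener {
    void onServerDisconnect(long nowMs, String nodeId);
}

/** Stands in for the network layer: fans one disconnect event out to every listener. */
final class DisconnectNotifier {
    private final List<ServerDisconnectListener> listeners = new ArrayList<>();

    void register(ServerDisconnectListener listener) {
        listeners.add(listener);
    }

    void fireDisconnect(long nowMs, String nodeId) {
        for (ServerDisconnectListener listener : listeners) {
            listener.onServerDisconnect(nowMs, nodeId);
        }
    }
}

/** Plays the MetadataUpdater role: any broker disconnect requests a metadata refresh. */
final class MetadataRefresher implements ServerDisconnectListener {
    boolean refreshRequested = false;

    @Override
    public void onServerDisconnect(long nowMs, String nodeId) {
        refreshRequested = true;
    }
}

/** Plays the TransactionManager role: only a disconnect from the coordinator matters. */
final class CoordinatorWatcher implements ServerDisconnectListener {
    String coordinatorNodeId;
    boolean lookupRequested = false;

    CoordinatorWatcher(String coordinatorNodeId) {
        this.coordinatorNodeId = coordinatorNodeId;
    }

    @Override
    public void onServerDisconnect(long nowMs, String nodeId) {
        if (nodeId.equals(coordinatorNodeId)) {
            coordinatorNodeId = null; // forget the stale coordinator
            lookupRequested = true;   // schedule a FindCoordinator lookup right away
        }
    }
}
```

Each listener decides for itself whether a given node matters. This keeps the network layer unaware of transactions.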

Related to this ticket, I also opened KAFKA-16902 to use the 
socket.connection.setup.timeout.ms config to reduce the impact of 
reconnection attempts.
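The sketch below shows where the timeouts mentioned here are set on a transactional producer. The config keys are existing Kafka client settings. The values and the `TxnProducerTimeouts` helper are illustrative assumptions. The sketch does not reproduce the KAFKA-16902 change itself.

```java
import java.util.Properties;

/** Hypothetical helper; values are illustrative, not tuning advice. */
final class TxnProducerTimeouts {
    static Properties props(String bootstrapServers, String transactionalId) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", bootstrapServers);
        p.setProperty("transactional.id", transactionalId);
        // Bounds how long the client waits for a response; per the issue text it
        // also bounds the Sender's blocking wait to reconnect to the coordinator.
        p.setProperty("request.timeout.ms", "30000");
        // Initial time allowed to establish a broker connection; after consecutive
        // failures it grows exponentially up to the .max.ms value below.
        p.setProperty("socket.connection.setup.timeout.ms", "10000");
        p.setProperty("socket.connection.setup.timeout.max.ms", "30000");
        return p;
    }
}
```

The resulting Properties object would be passed to the KafkaProducer constructor along with key and value serializers.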

> TransactionManager should rediscover coordinator on disconnection
> -----------------------------------------------------------------
>
>                 Key: KAFKA-16951
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16951
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients, producer 
>    Affects Versions: 3.7.0
>            Reporter: Graham Campbell
>            Priority: Major
>
> When a transaction coordinator for a transactional client shuts down for a 
> restart or due to failure, the NetworkClient notices the broker disconnection 
> and [will automatically refresh cluster 
> metadata|https://github.com/apache/kafka/blob/f380cd1b64134cf81e5dab16d71a276781de890e/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L1182-L1183]
> to get the latest partition assignments.
> The TransactionManager does not notice any changes until the next 
> transactional request. If the broker is still offline, this is a [blocking 
> wait while the client attempts to reconnect to the old 
> coordinator|https://github.com/apache/kafka/blob/f380cd1b64134cf81e5dab16d71a276781de890e/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L489-L490],
> which can last up to request.timeout.ms (default 30 seconds). Coordinator 
> lookup is only performed after a transactional request times out and fails. 
> The lookup is triggered in either the [Sender|#L525-L528] or 
> [TransactionManager's|https://github.com/apache/kafka/blob/f380cd1b64134cf81e5dab16d71a276781de890e/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java#L1225-L1229]
> error handling.
> To support faster recovery and faster reaction to transaction coordinator 
> reassignments, the TransactionManager should proactively look up the 
> transaction coordinator whenever the client is disconnected from the current 
> transaction coordinator.
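The difference between the current and proposed behavior can be estimated with a back-of-the-envelope model. It assumes that the old broker stays offline and that the next transactional request arrives after the disconnect. It ignores retries, backoff and lookup latency. The class name and the example times are hypothetical.

```java
/** Hypothetical worst-case timing model; all times in ms on one clock. */
final class CoordinatorLookupTiming {
    /** Current behavior: the next transactional request waits on the old
     *  coordinator until request.timeout.ms expires, and only then is a lookup run. */
    static long reactiveLookupAt(long nextTxnRequestAtMs, long requestTimeoutMs) {
        return nextTxnRequestAtMs + requestTimeoutMs;
    }

    /** Proposed behavior: the disconnect itself schedules the lookup. */
    static long proactiveLookupAt(long disconnectSeenAtMs) {
        return disconnectSeenAtMs;
    }

    public static void main(String[] args) {
        long disconnectSeenAt = 1_000; // coordinator connection drops at t = 1 s
        long nextTxnRequestAt = 2_000; // application's next transactional call at t = 2 s
        long requestTimeout = 30_000;  // illustrative request.timeout.ms value
        System.out.println("reactive lookup at  " + reactiveLookupAt(nextTxnRequestAt, requestTimeout) + " ms");
        System.out.println("proactive lookup at " + proactiveLookupAt(disconnectSeenAt) + " ms");
    }
}
```

With these example numbers, the lookup starts at 32000 ms in the reactive case and at 1000 ms in the proactive case. The gap is roughly one full request timeout.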



--
This message was sent by Atlassian Jira
(v8.20.10#820010)