[ https://issues.apache.org/jira/browse/KAFKA-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dong Lin updated KAFKA-1565:
----------------------------
Assignee: Dong Lin
> Transaction manager failover handling
> -------------------------------------
>
> Key: KAFKA-1565
> URL: https://issues.apache.org/jira/browse/KAFKA-1565
> Project: Kafka
> Issue Type: New Feature
> Reporter: Dong Lin
> Assignee: Dong Lin
> Labels: transactions
>
> The transaction manager should guarantee that, once a pre-commit/pre-abort
> request is acknowledged, the corresponding commit/abort request will be
> delivered to all partitions involved in the transaction.
> In particular, we handle the following failover scenarios:
> 1) Transaction manager or its followers fail before txRequest is replicated
> to the local log and followers.
> Solution: Transaction manager responds to the request with an error status
> if it is alive. The producer keeps retrying the commit.
> 2) The txPartition’s leader is not available.
> Solution: Put txRequest on unSentTxRequestQueue. When metadataCache is
> updated, check and re-send txRequest from unSentTxRequestQueue if possible.
> 3) The txPartition’s leader fails while txRequest is in the channel manager.
> Solution: Retrieve all txRequests queued for transmission to this broker and
> put them on unSentTxRequestQueue.
> 4) Transaction manager does not receive a success response from the
> txPartition’s leaders within the timeout period.
> Solution: Transaction manager expires the txRequest and re-sends it.
> 5) Transaction manager fails.
> Solution: The new transaction manager reads transactionHW from ZooKeeper and
> sends txRequests starting from the transactionHW.
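
For illustration only, scenarios 2 to 5 above can be sketched as a small in-memory model. This is a rough Python sketch, not the proposed Kafka implementation: the class TxRequestRouter, its method names, the dict standing in for metadataCache, and the assumption that transactionHW is an index into a list of logged requests are all hypothetical. Only unSentTxRequestQueue, txRequest and transactionHW come from the description.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional


@dataclass
class TxRequest:
    tx_id: int                       # hypothetical transaction id
    partition: str                   # the txPartition this request targets
    sent_at: Optional[float] = None  # when it was handed to the channel, if at all


class TxRequestRouter:
    """Toy model of scenarios 2-4 (names are assumptions, not Kafka APIs)."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.leaders: Dict[str, int] = {}                # stand-in for metadataCache
        self.unsent: Deque[TxRequest] = deque()          # unSentTxRequestQueue
        self.in_flight: Dict[int, List[TxRequest]] = {}  # broker id -> awaiting ack

    def send(self, req: TxRequest, now: float) -> bool:
        leader = self.leaders.get(req.partition)
        if leader is None:
            # Scenario 2: leader unavailable, park the request.
            req.sent_at = None
            self.unsent.append(req)
            return False
        req.sent_at = now
        self.in_flight.setdefault(leader, []).append(req)
        return True

    def on_metadata_update(self, leaders: Dict[str, int], now: float) -> None:
        # Scenario 2, second half: retry parked requests after a metadata change.
        self.leaders = dict(leaders)
        pending, self.unsent = self.unsent, deque()
        for req in pending:
            self.send(req, now)  # re-parks itself if there is still no leader

    def on_broker_failure(self, broker_id: int) -> None:
        # Scenario 3: move everything queued for the failed broker back to unsent.
        self.leaders = {p: b for p, b in self.leaders.items() if b != broker_id}
        for req in self.in_flight.pop(broker_id, []):
            req.sent_at = None
            self.unsent.append(req)

    def on_ack(self, broker_id: int, tx_id: int, partition: str) -> None:
        reqs = self.in_flight.get(broker_id, [])
        self.in_flight[broker_id] = [
            r for r in reqs if (r.tx_id, r.partition) != (tx_id, partition)
        ]

    def expire_and_resend(self, now: float) -> int:
        # Scenario 4: expire requests with no ack within the timeout, then resend.
        expired: List[TxRequest] = []
        for broker_id, reqs in self.in_flight.items():
            keep: List[TxRequest] = []
            for r in reqs:
                (expired if now - r.sent_at >= self.timeout_s else keep).append(r)
            self.in_flight[broker_id] = keep
        for r in expired:
            self.send(r, now)
        return len(expired)


def requests_to_replay(logged: List[TxRequest], transaction_hw: int) -> List[TxRequest]:
    # Scenario 5: a new transaction manager resends everything from transactionHW on.
    return logged[transaction_hw:]
```

Note that in this model scenarios 4 and 5 can deliver the same txRequest to a partition more than once, so the sketch only makes sense if the receiving side tolerates duplicate commit/abort requests.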