[ https://issues.apache.org/jira/browse/KAFKA-18575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Justine Olshan resolved KAFKA-18575.
------------------------------------
Resolution: Fixed
> Transaction Version 2 doesn't correctly handle race condition with completing and new transaction
> -------------------------------------------------------------------------------------------------
>
> Key: KAFKA-18575
> URL: https://issues.apache.org/jira/browse/KAFKA-18575
> Project: Kafka
> Issue Type: Bug
> Reporter: Justine Olshan
> Assignee: Justine Olshan
> Priority: Blocker
>
> Right now we have a check to figure out whether we need to verify/add a partition,
> and it involves checking whether there is an ongoing transaction.
> If the previous transaction is in the process of committing/aborting, we can
> run into a scenario where we decide that a transaction is not ongoing for a
> given epoch, so we need to add the partition via the coordinator. We put the
> partition in the queue to be added to the transaction, along with a
> verification guard. By the time the request reaches the coordinator, the
> previous transaction has completed and we can add the partition. However, we
> still have the verification guard check at the log level right before the
> write, and that check fails because completing the previous transaction
> clobbers the verification guard. I think what we need to do is drop this
> second check at the log layer for TV2 and instead check that the epoch is
> correct.
> (This was in the KIP, but we didn't quite implement it that way.) The result
> is that we self-fence in a scenario where we shouldn't.
> (This doesn't happen with TV0 because the client has to add the partition
> first, and we hit all the concurrent transactions errors there. We can only
> proceed to the produce request, and the write, once the previous
> transaction is complete.)
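A toy Python replay of the interleaving described in the issue may make the failure easier to see. It is a sketch under simplifying assumptions, not Kafka's broker code: `VerificationGuard`, `PartitionState`, `needs_add`, `start_verification`, `write_marker`, `append`, `race` and `tv0_order` are all hypothetical names, the guard is modeled as an opaque object compared by identity, and the epoch comparison `request_epoch >= self.epoch` is only a stand-in for whatever epoch validation TV2 actually performs.

```python
from typing import Optional


class VerificationGuard:
    """Hypothetical opaque token; two guards match only if they are the same object."""


class PartitionState:
    """Toy per-partition state for one producer ID (not Kafka's actual classes)."""

    def __init__(self, epoch: int) -> None:
        self.epoch = epoch                           # last epoch written here
        self.ongoing_epoch: Optional[int] = epoch    # previous txn still open here
        self.guard: Optional[VerificationGuard] = None

    def needs_add(self, request_epoch: int) -> bool:
        # "Is there an ongoing transaction for this epoch?" If not, go to the coordinator.
        return self.ongoing_epoch != request_epoch

    def start_verification(self) -> VerificationGuard:
        self.guard = VerificationGuard()
        return self.guard

    def write_marker(self) -> None:
        # Completing the previous transaction ends it on this partition and
        # clears ("clobbers") any pending verification guard.
        self.ongoing_epoch = None
        self.guard = None

    def append(self, request_epoch: int, guard: VerificationGuard, epoch_check: bool) -> bool:
        if epoch_check:
            ok = request_epoch >= self.epoch   # simplified stand-in for a TV2 epoch check
        else:
            ok = self.guard is guard           # log-layer verification guard check
        if ok:
            self.epoch = request_epoch
            self.ongoing_epoch = request_epoch
        return ok


def race(epoch_check: bool) -> bool:
    """TV2 interleaving from the issue; returns True if the append succeeds."""
    part = PartitionState(epoch=5)      # previous transaction (epoch 5) is completing
    new_epoch = 6                       # TV2 bumps the epoch for the next transaction
    if not part.needs_add(new_epoch):
        raise AssertionError("expected the partition to need adding")
    guard = part.start_verification()   # queued for the coordinator with a guard
    part.write_marker()                 # previous transaction completes meanwhile
    # The coordinator can now add the partition, and the write reaches the log.
    return part.append(new_epoch, guard, epoch_check)


def tv0_order() -> bool:
    """TV0: the client's own add is retried until the previous transaction
    completes, so the marker lands before any guard is created."""
    part = PartitionState(epoch=5)
    part.write_marker()
    guard = part.start_verification()
    return part.append(5, guard, epoch_check=False)


print("TV2, guard check:", race(epoch_check=False))  # TV2, guard check: False
print("TV2, epoch check:", race(epoch_check=True))   # TV2, epoch check: True
print("TV0, guard check:", tv0_order())              # TV0, guard check: True
```

In this model the TV2 replay self-fences only because the marker write clears the guard between guard creation and the append; replacing the guard comparison with an epoch comparison lets the same interleaving succeed, while the TV0 ordering never exposes the race because the guard is created after the marker.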
--
This message was sent by Atlassian Jira
(v8.20.10#820010)