Justine Olshan created KAFKA-18575:
--------------------------------------
Summary: Transaction Version 2 doesn't correctly handle race condition with completing and new transaction
Key: KAFKA-18575
URL: https://issues.apache.org/jira/browse/KAFKA-18575
Project: Kafka
Issue Type: Bug
Reporter: Justine Olshan
Right now we have a check to figure out whether we need to verify/add a
partition, and it involves checking whether there is an ongoing transaction.
If the previous transaction is in the process of committing/aborting, we can
run into a scenario where we decide a transaction is not ongoing for the given
epoch, so we need to add the partition to the transaction via the coordinator.
We queue the partition to be added to the transaction, along with a
verification guard. By the time the request reaches the coordinator, the
previous transaction has completed, so the coordinator can add the partition.
However, we still check the verification guard at the log layer right before
the write, and that check fails because completing the previous transaction
clobbers the verification guard. The result is that we self-fence in a
scenario where we shouldn’t.
I think what we need to do is drop this second check at the log layer for TV2
and instead check that the epoch is correct. (This was in the KIP, but we
didn’t quite implement it that way.)
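A rough, hypothetical model of the race, just to make the ordering concrete. PartitionTxnState, VerificationGuard, guard_check, epoch_check and simulate_race are illustrative names, not Kafka's actual classes or methods, and the rule in epoch_check (reject only a stale epoch) is an assumption about what "check that the epoch is correct" would mean:

```python
from dataclasses import dataclass
from typing import Optional


class VerificationGuard:
    """Opaque token; only its identity matters (hypothetical stand-in)."""


@dataclass
class PartitionTxnState:
    # Hypothetical per-partition producer state, not Kafka's real class.
    producer_epoch: int
    ongoing_txn: bool = False
    guard: Optional[VerificationGuard] = None

    def needs_verification(self, epoch: int) -> bool:
        # Verify/add the partition unless a transaction is already ongoing at this epoch.
        return not (self.ongoing_txn and self.producer_epoch == epoch)

    def start_verification(self) -> VerificationGuard:
        # A fresh guard is created before the round trip to the coordinator.
        self.guard = VerificationGuard()
        return self.guard

    def complete_txn(self) -> None:
        # Writing the previous transaction's commit/abort marker clears the guard.
        self.ongoing_txn = False
        self.guard = None


def guard_check(state: PartitionTxnState, captured: VerificationGuard) -> bool:
    # Current log-layer check: the guard must still be the one captured earlier.
    return state.guard is captured


def epoch_check(state: PartitionTxnState, request_epoch: int) -> bool:
    # Assumed TV2 alternative: only reject requests carrying a stale (fenced) epoch.
    return request_epoch >= state.producer_epoch


def simulate_race(old_epoch: int = 5, new_epoch: int = 6) -> tuple:
    # The previous transaction (old_epoch) is still completing on this partition.
    p = PartitionTxnState(producer_epoch=old_epoch, ongoing_txn=True)
    # 1. The new transaction's first produce sees no ongoing txn at new_epoch.
    if not p.needs_verification(new_epoch):
        raise RuntimeError("scenario expects verification to be required")
    captured = p.start_verification()
    # 2. The previous transaction's marker lands before our write.
    p.complete_txn()
    # 3. The coordinator adds the partition (modeled as always succeeding here).
    # 4. Log-layer checks right before the append.
    return guard_check(p, captured), epoch_check(p, new_epoch)


print(simulate_race())  # (False, True)
```

The guard check fails even though the coordinator accepted the partition, which is the self-fencing case; the epoch check in this model lets the write through.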
(This doesn’t happen with TV0 because we have to add the partition client-side
first, and we hit all the concurrent transactions errors at that step. We can
only proceed to writing the produce request once the previous transaction is
complete.)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)