Travis Bischel created KAFKA-14315:
--------------------------------------
Summary: Kraft: 1 broker setup, broker took 34 seconds to
transition from PrepareCommit to CompleteCommit
Key: KAFKA-14315
URL: https://issues.apache.org/jira/browse/KAFKA-14315
Project: Kafka
Issue Type: Bug
Components: kraft
Reporter: Travis Bischel
I'm still looking into a PR failure in [my
client|https://github.com/twmb/franz-go/pull/223] and noticed something a bit
strange. I know that _technically_ I should be using RequireStableFetchOffsets
in my transaction tests to prevent rebalances while a transaction is not
finalized. I'll be adding that.
However, these tests have never failed against ZooKeeper mode. The client goes
to a lot of effort to avoid needing KIP-447 behavior, and the assumption
with localhost testing is that things run fast enough (and that there are
enough guards) that problems would not be encountered.
That assumption does not appear to hold with a KRaft broker. Looking at
__transaction_state, the following looks especially problematic:
{{__transaction_state partition 33 offset 7 at [2022-10-18 11:15:37.821]}}
{{TxnMetadataKey(0)
9f87dc04dc3f4d5b15ef3072c531cf46327278307df8e149fa966462cd40c10b}}
{{TxnMetadataValue(0)}}
{{ ProducerID 41}}
{{ ProducerEpoch 0}}
{{ TimeoutMillis 120000}}
{{ State PrepareCommit}}
{{ Topics __consumer_offsets=>[13]
e7c7d971626fbaf4bfb33975e57089167939e6acabb4c4fc534eb148462e45cc=>[4 5 12 16]
}}
{{ LastUpdateTimestamp 1666113337821}}
{{ StartTimestamp 1666113335311}}
{{__transaction_state partition 33 offset 8 at [2022-10-18 11:16:11.419]}}
{{TxnMetadataKey(0)
9f87dc04dc3f4d5b15ef3072c531cf46327278307df8e149fa966462cd40c10b}}
{{TxnMetadataValue(0)}}
{{ ProducerID 41}}
{{ ProducerEpoch 0}}
{{ TimeoutMillis 120000}}
{{ State CompleteCommit}}
{{ Topics }}
{{ LastUpdateTimestamp 1666113337821}}
{{ StartTimestamp 1666113335311}}
I captured the records above using my kcl tool.
Note that the transaction enters PrepareCommit at 11:15:37.821 and then enters
CompleteCommit at 11:16:11.419. AFAICT, this means that in my single-node KRaft
setup, the broker took roughly 34 seconds (33.598 s) to transition between
commit states internally.
I noticed this in tests because a rebalance happened during those 34 seconds,
which caused duplicate consumption: the transactional offset commits were
not yet finalized, so the old commits were picked up.
This ticket is related to KAFKA-14312, in that this failure is cropping up now
that I've worked around KAFKA-14312 within the client itself.