We have come across an issue in which FATAL messages like the following are logged on the broker:

FATAL kafka.coordinator.transaction.TransactionMetadata: 
TransactionMetadata(transactionalId=tx-id-1, producerId=96011, 
producerEpoch=51, txnTimeoutMs=60000, state=CompleteCommit, 
pendingState=Some(Ongoing), topicPartitions=Set(), 
txnStartTimestamp=1580894482199, txnLastUpdateTimestamp=1580894482292)'s 
transition to TxnTransitMetadata(producerId=96011, producerEpoch=51, 
txnTimeoutMs=60000, txnState=Ongoing, topicPartitions=Set(topic1-0), 
txnStartTimestamp=1580894480766, txnLastUpdateTimestamp=1580894480766) failed: 
this should not happen

On closer inspection, we found that the message is logged because the completed 
transaction has a newer start timestamp (txnStartTimestamp=1580894482199) than 
the TxnTransitMetadata it is transitioning to 
(txnStartTimestamp=1580894480766), a difference of 1433 ms. We also found that 
the broker clocks may be out of sync by a few seconds. The relevant check is at:

https://github.com/apache/kafka/blob/b526528cafe4142b73df8c930473b0cddc84ca9d/core/src/main/scala/kafka/coordinator/transaction/TransactionMetadata.scala#L382
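
To make the failing condition concrete, here is a minimal Java sketch of how we read that check, using the two timestamps from the log above. It is not the Kafka source (which is Scala): the class and method names are hypothetical, and it models only the start-timestamp comparison, on the assumption that the other conditions in the transition (epoch, partitions, timeout) pass.

```java
// Hypothetical, simplified model of the start-timestamp condition.
// Not Kafka source code; names are illustrative only.
final class TxnStartTimestampCheck {

    // Returns true when the start-timestamp condition would allow the move to
    // Ongoing, i.e. the current metadata's start time is not newer than the
    // start time carried by the transit metadata.
    static boolean canTransitionToOngoing(long currentTxnStartTimestamp,
                                          long transitTxnStartTimestamp) {
        return currentTxnStartTimestamp <= transitTxnStartTimestamp;
    }

    public static void main(String[] args) {
        long current = 1580894482199L; // txnStartTimestamp of the CompleteCommit metadata
        long transit = 1580894480766L; // txnStartTimestamp of the TxnTransitMetadata

        System.out.println("skew ms: " + (current - transit));
        System.out.println("allowed: " + canTransitionToOngoing(current, transit));
    }
}
```

With the values from the log, the transit timestamp is 1433 ms older than the current one, so the sketch reports the transition as not allowed, which matches the FATAL message we see.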


The general scenario is acknowledged and partially addressed in the comment 
linked below. However, that does not cover the case where the start time of the 
Ongoing transaction is older than the start time of the completed/aborted 
transaction.
https://issues.apache.org/jira/browse/KAFKA-5415?focusedCommentId=16045170&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16045170

Is this deliberate? Do we need that check there?
