We have come across an issue in which FATAL messages are logged in the broker:
FATAL kafka.coordinator.transaction.TransactionMetadata: TransactionMetadata(transactionalId=tx-id-1, producerId=96011, producerEpoch=51, txnTimeoutMs=60000, state=CompleteCommit, pendingState=Some(Ongoing), topicPartitions=Set(), txnStartTimestamp=1580894482199, txnLastUpdateTimestamp=1580894482292)'s transition to TxnTransitMetadata(producerId=96011, producerEpoch=51, txnTimeoutMs=60000, txnState=Ongoing, topicPartitions=Set(topic1-0), txnStartTimestamp=1580894480766, txnLastUpdateTimestamp=1580894480766) failed: this should not happen

On closer inspection, we found that the message is logged because the completed transaction has a newer start timestamp (txnStartTimestamp=1580894482199) than the TxnTransitMetadata it is transitioning to (txnStartTimestamp=1580894480766). We also found that the clocks on the brokers may have been out of sync by a few seconds.

The relevant code:
https://github.com/apache/kafka/blob/b526528cafe4142b73df8c930473b0cddc84ca9d/core/src/main/scala/kafka/coordinator/transaction/TransactionMetadata.scala#L382

The general scenario is acknowledged and partially addressed in the following comment:
https://issues.apache.org/jira/browse/KAFKA-5415?focusedCommentId=16045170&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16045170

However, it does not cover the case where the start time of the Ongoing transaction is older than the start time of the completed or aborted transaction.

Is this deliberate? Do we need that check there?
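
To make the comparison concrete, here is a minimal Python sketch of the condition as we understand it from the log above. It is not the Kafka code: `TxnSnapshot` and `start_timestamp_goes_backwards` are hypothetical names invented for illustration, and the sketch models only the start-timestamp comparison, using the values from the log line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxnSnapshot:
    """Hypothetical, reduced view of the fields relevant to this report."""
    producer_epoch: int
    txn_timeout_ms: int
    txn_start_timestamp: int  # milliseconds since the Unix epoch


def start_timestamp_goes_backwards(current: TxnSnapshot, proposed: TxnSnapshot) -> bool:
    """Assumed model of the failing condition: the stored metadata carries a
    later start timestamp than the metadata it is asked to transition to."""
    return current.txn_start_timestamp > proposed.txn_start_timestamp


# Values copied from the FATAL log line.
current = TxnSnapshot(producer_epoch=51, txn_timeout_ms=60000,
                      txn_start_timestamp=1580894482199)   # state=CompleteCommit
proposed = TxnSnapshot(producer_epoch=51, txn_timeout_ms=60000,
                       txn_start_timestamp=1580894480766)  # txnState=Ongoing

# How far "back in time" the new transaction appears to start.
skew_ms = current.txn_start_timestamp - proposed.txn_start_timestamp

print(start_timestamp_goes_backwards(current, proposed), skew_ms)
```

With the logged values the proposed Ongoing transaction appears to start 1433 ms before the completed one, which is consistent with a clock skew of a second or two between the hosts involved.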