David Arthur created KAFKA-15552:
------------------------------------

             Summary: Duplicate Producer ID blocks during ZK migration
                 Key: KAFKA-15552
                 URL: https://issues.apache.org/jira/browse/KAFKA-15552
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 3.5.1, 3.4.1, 3.5.0, 3.4.0, 3.6.0
            Reporter: David Arthur
            Assignee: David Arthur
             Fix For: 3.4.2, 3.5.2, 3.6.1

When migrating producer ID blocks from ZK to KRaft, we take the current producer ID block from ZK and write its "firstProducerId" into the producer IDs KRaft record. However, KRaft stores the _next_ producer ID block in the log, rather than the current block as ZK does. As a result, the first block given to a caller of AllocateProducerIds after the migration is a duplicate of the last block allocated in ZK mode.

This can result in duplicate producer IDs being given to transactional or idempotent producers. For transactional producers this can cause long-term problems, since their producer IDs are persisted and reused for a long time.

The bug can occur in the window between the last producer ID block being allocated by the ZK controller and all the brokers being restarted following the metadata migration.

Symptoms include OutOfOrderSequenceException errors from ReplicaManager and possibly some producer epoch validation errors.

To see if a cluster is affected, search for the offending producer ID and check whether it is being used by more than one producer. For example, the following error was observed:

{code}
Out of order sequence number for producer 376000 at offset 381338 in partition REDACTED: 0 (incoming seq. number), 21 (current end sequence number)
{code}

A search for "376000" in the org.apache.kafka.clients.producer.internals.TransactionManager logs then shows the same producer ID being provisioned on two brokers:

{code}
Broker 0
[Producer clientId=REDACTED-0] ProducerId set to 376000 with epoch 1

Broker 5
[Producer clientId=REDACTED-1] ProducerId set to 376000 with epoch 1
{code}
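
The manual search described above could be automated along these lines. This is only a rough sketch, not part of Kafka: the function name find_shared_producer_ids is hypothetical, the regular expression assumes the TransactionManager log lines look like the two quoted in this ticket, and the third sample line (producer ID 377000) is made up for contrast.

```python
import re
from collections import defaultdict

# Matches client log lines of the form quoted in this ticket, e.g.
# "[Producer clientId=REDACTED-0] ProducerId set to 376000 with epoch 1".
# Anything else inside the brackets after the clientId is skipped.
PID_LINE = re.compile(
    r"\[Producer clientId=(?P<client>[^\],]+)[^\]]*\]\s+"
    r"ProducerId set to (?P<pid>\d+) with epoch (?P<epoch>\d+)"
)


def find_shared_producer_ids(lines):
    """Return {producer_id: sorted client IDs} for IDs seen under more than one clientId."""
    clients_by_pid = defaultdict(set)
    for line in lines:
        match = PID_LINE.search(line)
        if match:
            clients_by_pid[int(match.group("pid"))].add(match.group("client"))
    return {
        pid: sorted(clients)
        for pid, clients in clients_by_pid.items()
        if len(clients) > 1
    }


if __name__ == "__main__":
    sample = [
        "[Producer clientId=REDACTED-0] ProducerId set to 376000 with epoch 1",
        "[Producer clientId=REDACTED-1] ProducerId set to 376000 with epoch 1",
        # Hypothetical line, not from the ticket: an ID used by a single client.
        "[Producer clientId=REDACTED-2] ProducerId set to 377000 with epoch 0",
    ]
    for pid, clients in find_shared_producer_ids(sample).items():
        print(f"producer ID {pid} used by: {', '.join(clients)}")
```

A hit should be treated as a lead to investigate rather than proof of this bug. A transactional producer that restarts under a different client.id but the same transactional.id can legitimately show up with the same producer ID, and two producers sharing one client.id would not be flagged at all.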