[ https://issues.apache.org/jira/browse/KAFKA-15552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786694#comment-17786694 ]
Luke Chen commented on KAFKA-15552:
-----------------------------------

Closing this ticket since the PR is merged.

> Duplicate Producer ID blocks during ZK migration
> ------------------------------------------------
>
>                 Key: KAFKA-15552
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15552
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.4.0, 3.5.0, 3.4.1, 3.6.0, 3.5.1
>            Reporter: David Arthur
>            Assignee: David Arthur
>            Priority: Critical
>             Fix For: 3.5.2, 3.6.1
>
>
> When migrating producer ID blocks from ZK to KRaft, we are taking the current
> producer ID block from ZK and writing its "firstProducerId" into the
> producer IDs KRaft record. However, in KRaft we store the _next_ producer ID
> block in the log rather than storing the current block like ZK does. The end
> result is that the first block given to a caller of AllocateProducerIds is a
> duplicate of the last block allocated in ZK mode.
>
> This can result in duplicate producer IDs being given to transactional or
> idempotent producers. In the case of transactional producers, this can cause
> long-term problems since the producer IDs are persisted and reused for a long
> time.
> This bug is possible in the window between the last producer ID block being
> allocated by the ZK controller and all the brokers being restarted following
> the metadata migration.
>
> Symptoms of this bug will include ReplicaManager OutOfOrderSequenceException
> and possibly some producer epoch validation errors. To see if a cluster is
> affected by this bug, search for the offending producer ID and see if it is
> being used by more than one producer.
>
> For example, the following error was observed:
> {code}
> Out of order sequence number for producer 376000 at offset 381338 in
> partition REDACTED: 0 (incoming seq. number), 21 (current end sequence
> number)
> {code}
> Searching the org.apache.kafka.clients.producer.internals.TransactionManager
> logs for "376000" then shows the same producer ID being provisioned on two
> brokers:
> {code}
> Broker 0 [Producer clientId=REDACTED-0] ProducerId set to 376000 with epoch 1
> Broker 5 [Producer clientId=REDACTED-1] ProducerId set to 376000 with epoch 1
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
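
The off-by-one-block behaviour described in the issue can be modelled with a small, self-contained sketch. This is a toy model and not Kafka code: the names Block, KRaftAllocator, migrate_buggy and migrate_shifted are hypothetical, the block size of 1000 is an assumption made for illustration, and the starting ID 376000 is borrowed from the log excerpt only to make the numbers recognisable. The "shifted" migration shows one plausible way to avoid the duplicate and is not necessarily what the merged PR does.

```python
from dataclasses import dataclass

BLOCK_SIZE = 1000  # assumed block length, for illustration only


@dataclass(frozen=True)
class Block:
    """A contiguous range of producer IDs starting at first_id."""
    first_id: int
    size: int = BLOCK_SIZE

    @property
    def next_first_id(self) -> int:
        # First ID of the block that immediately follows this one.
        return self.first_id + self.size


class KRaftAllocator:
    """Toy controller that persists the first ID of the *next* block to hand out."""

    def __init__(self, next_first_id: int) -> None:
        self.next_first_id = next_first_id

    def allocate(self) -> Block:
        # Hand out the stored block, then advance the stored value.
        block = Block(self.next_first_id)
        self.next_first_id = block.next_first_id
        return block


def migrate_buggy(zk_current: Block) -> KRaftAllocator:
    # Copies the ZK *current* block's first ID into a field that means "next".
    return KRaftAllocator(zk_current.first_id)


def migrate_shifted(zk_current: Block) -> KRaftAllocator:
    # Hypothetical correction: store the block that follows the ZK current one.
    return KRaftAllocator(zk_current.next_first_id)


# Last block allocated in ZK mode (hypothetical value echoing the log excerpt).
zk_last_block = Block(376000)

buggy_first = migrate_buggy(zk_last_block).allocate()
shifted_first = migrate_shifted(zk_last_block).allocate()

print(buggy_first == zk_last_block)  # True: the same block is handed out twice
print(shifted_first.first_id)        # 377000: the first ID past the ZK block
```

In this model, two brokers that receive zk_last_block and buggy_first would both hand out IDs starting at 376000, which matches the symptom of one producer ID appearing for two different client IDs.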