David Arthur created KAFKA-15552:
------------------------------------

             Summary: Duplicate Producer ID blocks during ZK migration
                 Key: KAFKA-15552
                 URL: https://issues.apache.org/jira/browse/KAFKA-15552
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 3.5.1, 3.4.1, 3.5.0, 3.4.0, 3.6.0
            Reporter: David Arthur
            Assignee: David Arthur
             Fix For: 3.4.2, 3.5.2, 3.6.1


When migrating producer ID blocks from ZK to KRaft, we take the current 
producer ID block from ZK and write its "firstProducerId" into the producer 
IDs KRaft record. However, in KRaft we store the _next_ producer ID block in 
the log rather than the current block, as ZK does. The end result is 
that the first block given to a caller of AllocateProducerIds is a duplicate of 
the last block allocated in ZK mode.
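
The mismatch can be illustrated with a small toy model. This is only a sketch, not Kafka's code: the names Block, KRaftAllocator, migrate_buggy and migrate_shifted are hypothetical, the block length of 1000 is an assumption made for illustration, and 376000 is used as the start of the last ZK block only because it matches the producer ID in the example further down.

```python
from dataclasses import dataclass

BLOCK_SIZE = 1000  # assumed block length, for illustration only


@dataclass(frozen=True)
class Block:
    """A contiguous range of producer IDs starting at first_producer_id."""
    first_producer_id: int
    size: int = BLOCK_SIZE

    @property
    def next_block_first_id(self) -> int:
        # First ID of the block that follows this one.
        return self.first_producer_id + self.size


class KRaftAllocator:
    """Toy model: the stored value is the start of the NEXT block to hand out."""

    def __init__(self, next_producer_id: int):
        self.next_producer_id = next_producer_id

    def allocate(self) -> Block:
        block = Block(self.next_producer_id)
        self.next_producer_id = block.next_block_first_id
        return block


def migrate_buggy(zk_current: Block) -> KRaftAllocator:
    # Copies the start of the CURRENT ZK block, as described in this issue.
    return KRaftAllocator(zk_current.first_producer_id)


def migrate_shifted(zk_current: Block) -> KRaftAllocator:
    # One way to avoid the overlap in this toy model: store the start of the
    # block AFTER the current ZK block. Not necessarily the actual patch.
    return KRaftAllocator(zk_current.next_block_first_id)


zk_last = Block(376000)  # last block allocated in ZK mode (illustrative value)
buggy_first = migrate_buggy(zk_last).allocate()
shifted_first = migrate_shifted(zk_last).allocate()

print(buggy_first == zk_last)           # the first KRaft block repeats the ZK block
print(shifted_first.first_producer_id)  # the first KRaft block starts after it
```

In this model, the buggy migration hands out the same range that ZK already gave away, while shifting by one block length yields a fresh range.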

 

This can result in duplicate producer IDs being given to transactional or 
idempotent producers. In the case of transactional producers, this can cause 
long-term problems, since the producer IDs are persisted and reused for a long 
time.


The bug can occur in the window between the ZK controller allocating its last 
producer ID block and all the brokers being restarted following the metadata 
migration.
 

Symptoms of this bug include OutOfOrderSequenceException errors from 
ReplicaManager and possibly some producer epoch validation errors. To see if a 
cluster is affected by this bug, search for the offending producer ID and see 
if it is being used by more than one producer.

 

For example, the following error was observed:
{code}
Out of order sequence number for producer 376000 at offset 381338 in partition 
REDACTED: 0 (incoming seq. number), 21 (current end sequence number) 
{code}

Searching for "376000" in the 
org.apache.kafka.clients.producer.internals.TransactionManager logs then shows 
the same producer ID being provisioned on two brokers:

{code}
Broker 0 [Producer clientId=REDACTED-0] ProducerId set to 376000 with epoch 1
Broker 5 [Producer clientId=REDACTED-1] ProducerId set to 376000 with epoch 1
{code}
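
The search described above could be automated with a short script. The following is a minimal sketch that assumes the TransactionManager log lines look like the two shown above; the helper name find_shared_producer_ids and the regular expression are illustrative and may need adjusting for other log layouts.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   [Producer clientId=REDACTED-0] ProducerId set to 376000 with epoch 1
# Any extra fields after clientId inside the brackets are tolerated.
PATTERN = re.compile(
    r"\[Producer clientId=([^,\]\s]+)[^\]]*\] "
    r"ProducerId set to (\d+) with epoch (\d+)"
)


def find_shared_producer_ids(lines):
    """Return {producer_id: [client ids]} for IDs seen with more than one client."""
    clients_by_pid = defaultdict(set)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            client_id, producer_id = match.group(1), int(match.group(2))
            clients_by_pid[producer_id].add(client_id)
    return {
        pid: sorted(clients)
        for pid, clients in clients_by_pid.items()
        if len(clients) > 1
    }


sample = [
    "Broker 0 [Producer clientId=REDACTED-0] ProducerId set to 376000 with epoch 1",
    "Broker 5 [Producer clientId=REDACTED-1] ProducerId set to 376000 with epoch 1",
]
shared = find_shared_producer_ids(sample)
print(shared)
```

A hit is only a hint, not proof: a transactional producer that restarts under a different client ID can legitimately keep its producer ID with a bumped epoch, so matching epochs across distinct clients, as in the sample, are the more telling sign.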





--
This message was sent by Atlassian Jira
(v8.20.10#820010)
