[ 
https://issues.apache.org/jira/browse/KAFKA-19367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-19367.
------------------------------------
    Fix Version/s: 4.0.1
       Resolution: Fixed

> InitProducerId with TV2 double-increments epoch if ongoing transaction is 
> aborted
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-19367
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19367
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 4.0.0
>            Reporter: Artem Livshits
>            Assignee: Ritika Reddy
>            Priority: Blocker
>             Fix For: 4.0.1, 4.1.0
>
>
> When InitProducerId is handled on the transaction coordinator, the producer 
> epoch is incremented (so that stale requests are fenced); then, if a 
> transaction was ongoing at that time, it is aborted.  With transaction 
> version 2 (a.k.a. KIP-890 part 2), the abort increments the producer epoch 
> again (this is part of the new abort / commit protocol), so the epoch ends up 
> incremented twice.
> In most cases this is benign, but when the epoch of the ongoing 
> transaction is 32766, the first increment takes it to 32767, the maximum 
> value of a short, and the second increment overflows to a negative value and 
> causes an IllegalArgumentException.
>  # First increment happens 
> [here|https://github.com/apache/kafka/blob/b1ea280ab1012e857d4c8354fc57951f9c88f667/core/src/main/scala/kafka/coordinator/transaction/TransactionMetadata.scala#L100]
>  # Second increment happens 
> [here|https://github.com/apache/kafka/blob/b1ea280ab1012e857d4c8354fc57951f9c88f667/core/src/main/scala/kafka/coordinator/transaction/TransactionMetadata.scala#L195]
>  # Illegal argument exception happens 
> [here|https://github.com/apache/kafka/blob/b1ea280ab1012e857d4c8354fc57951f9c88f667/core/src/main/scala/kafka/coordinator/transaction/TransactionMetadata.scala#L289]
>  
> Most likely the solution would be to not bump the epoch when we reset the 
> fenced state 
> [here|https://github.com/apache/kafka/blob/b1ea280ab1012e857d4c8354fc57951f9c88f667/core/src/main/scala/kafka/coordinator/transaction/TransactionCoordinator.scala#L572].
>  
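
The overflow described above can be reproduced in isolation. The following is a minimal sketch, not Kafka's actual code: the class EpochBumpSketch, its method names, and the exception message are hypothetical, and the only assumption carried over from the report is that the producer epoch is stored as a 16-bit short and that a negative epoch is rejected with an IllegalArgumentException. It contrasts the double bump (InitProducerId fencing followed by the TV2 abort) with a single bump, which is the direction the report suggests for the fix.

```java
public class EpochBumpSketch {

    // Hypothetical stand-in for an epoch bump on the coordinator.
    // The epoch is a 16-bit signed value, so 32767 + 1 wraps to -32768.
    static short bumpEpoch(short epoch) {
        short bumped = (short) (epoch + 1);
        if (bumped < 0) {
            // Mirrors the failure mode in the report; the message text is made up.
            throw new IllegalArgumentException("negative producer epoch " + bumped);
        }
        return bumped;
    }

    // Models the reported behaviour: InitProducerId bumps the epoch to fence
    // stale requests, then the TV2 abort of the ongoing transaction bumps it again.
    static short initProducerIdDoubleBump(short ongoingEpoch) {
        short fenced = bumpEpoch(ongoingEpoch);
        return bumpEpoch(fenced);
    }

    // Models the suggested direction: skip the fencing bump and let the
    // abort perform the only increment.
    static short initProducerIdSingleBump(short ongoingEpoch) {
        return bumpEpoch(ongoingEpoch);
    }

    public static void main(String[] args) {
        short epoch = 32766;

        // A single bump lands exactly on Short.MAX_VALUE.
        System.out.println(initProducerIdSingleBump(epoch)); // prints 32767

        // The double bump overflows on the second increment.
        try {
            initProducerIdDoubleBump(epoch);
        } catch (IllegalArgumentException e) {
            System.out.println("double bump failed: " + e.getMessage());
        }
    }
}
```

For any starting epoch below 32766 the double bump succeeds and simply skips one epoch value, which matches the report's note that the behaviour is benign in most cases. The sketch leaves out everything else the coordinator does around these steps, so it should be read as an illustration of the arithmetic only, not of the actual fix.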



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
