[ 
https://issues.apache.org/jira/browse/KAFKA-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525161#comment-17525161
 ] 

Guozhang Wang commented on KAFKA-13768:
---------------------------------------

Hello [~ddrid], I think what you're describing here is aligned with this 
proposed KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-691%3A+Enhance+Transactional+Producer+Exception+Handling

Could you check and see if that's the case?

This KIP was proposed by another contributor, but he is no longer actively 
working on it. If you are interested in picking it up, please let us know.

> Transactional producer exits because of expiration in RecordAccumulator
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-13768
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13768
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer 
>    Affects Versions: 2.0.0
>            Reporter: xuexiaoyue
>            Priority: Major
>
> Hi team, I'm using a transactional producer with request.timeout.ms set to a 
> rather small value such as 10s, while zookeeper.session.timeout.ms is set 
> longer, such as 30s.
> When the producer is sending records and one broker accidentally shuts down, 
> I notice that the producer throws 'org.apache.kafka.common.KafkaException: 
> The client hasn't received acknowledgment for some previously sent messages 
> and can no longer retry them. It isn't safe to continue' and exits.
> Looking into the code, I found that when a batch expires in 
> RecordAccumulator, it is marked as unresolved in Sender#sendProducerData. 
> If the producer is transactional, it is then doomed to 
> transitionToFatalError later.
> I'm wondering why we need to transitionToFatalError here. Would it be 
> better to abort this transaction instead? I know it's necessary to bump the 
> epoch for idempotent sends, but why do we let the producer crash in this 
> case?
> I found that KAFKA-8805; Bump producer epoch on recoverable errors (#7389) 
> fixes this by automatically bumping the producer epoch after aborting the 
> transaction, but why is it necessary to bump the epoch? What problem would 
> occur if we called transitionToAbortableError directly and let the user 
> abort the transaction?
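
For anyone trying to reproduce this, the client-side settings described in the report might look roughly like the sketch below. Only the 10s value for request.timeout.ms comes from the description. The class name, method name, broker address, transactional id and serializer choice are placeholders assumed for illustration. zookeeper.session.timeout.ms is a broker-side setting, so it is not part of the producer configuration.

```java
import java.util.Properties;

// Hypothetical helper (not part of Kafka): gathers the producer settings
// mentioned in the report into a plain java.util.Properties object.
class TxnProducerConfigSketch {

    // bootstrapServers and transactionalId are placeholders chosen by the caller.
    static Properties producerProps(String bootstrapServers, String transactionalId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Setting transactional.id is what makes the producer transactional.
        props.setProperty("transactional.id", transactionalId);
        // "Rather small value such as 10s" from the report, in milliseconds.
        props.setProperty("request.timeout.ms", "10000");
        // Serializer choice is an assumption; the report does not mention it.
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProps("localhost:9092", "demo-txn-id");
        // Print the resulting configuration for inspection.
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

With kafka-clients on the classpath, the returned object could be passed to the KafkaProducer constructor that accepts Properties.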



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
