[ 
https://issues.apache.org/jira/browse/KAFKA-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuexiaoyue updated KAFKA-13768:
-------------------------------
    Description: 
Hi team, I'm using a transactional producer with request.timeout.ms set to a 
rather small value, such as 10s, while zookeeper.session.timeout.ms is set 
longer, such as 30s. 
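For reference, the producer side of this setup would look roughly like the sketch below. The broker address and transactional.id are illustrative placeholders, and note that zookeeper.session.timeout.ms is a broker-side setting, so it does not appear in the producer Properties at all:

```java
import java.util.Properties;

// Minimal sketch of the producer configuration described above.
// All concrete values (address, transactional.id) are illustrative.
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");   // illustrative broker address
props.put("transactional.id", "my-txn-id");         // hypothetical transactional id
props.put("request.timeout.ms", "10000");           // 10s, as in the report
props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
// zookeeper.session.timeout.ms (30s in the report) lives in the broker's
// server.properties, not here.
```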

While the producer was sending records and one broker shut down unexpectedly, I 
noticed the producer throw 'org.apache.kafka.common.KafkaException: The 
client hasn't received acknowledgment for some previously sent messages and can 
no longer retry them. It isn't safe to continue' and then exit.

Looking into the code, I found that when a batch expires in RecordAccumulator, 
it is marked as unresolved in Sender#sendProducerData. If the producer is 
transactional, it is then doomed to transitionToFatalError later.

I'm wondering why we need to transitionToFatalError here. Wouldn't it be better 
to abort the transaction instead? I understand that bumping the epoch is 
necessary for idempotent sends, but why do we let the producer crash in this case?

I found that KAFKA-8805 (Bump producer epoch on recoverable errors, #7389) fixes 
this by automatically bumping the producer epoch after aborting the 
transaction. But why is bumping the epoch necessary? What problem would occur 
if we called transitionToAbortableError directly and let the user abort the 
transaction?
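To make the question concrete, here is a tiny state-machine sketch of the fatal-vs-abortable distinction being asked about. TxnStateSketch and its methods are hypothetical stand-ins mirroring the real TransactionManager's transitionToFatalError/transitionToAbortableError; the real client is far more involved, and whether batch expiration can safely be abortable is exactly the open question:

```java
// Hypothetical sketch, NOT the real TransactionManager: it only models the
// user-visible difference between a fatal and an abortable error state.
class TxnStateSketch {
    enum State { READY, IN_TRANSACTION, ABORTABLE_ERROR, FATAL_ERROR }

    private State state = State.READY;

    void beginTransaction() {
        if (state == State.FATAL_ERROR)
            throw new IllegalStateException("producer is in a fatal state");
        if (state == State.ABORTABLE_ERROR)
            throw new IllegalStateException("abort the previous transaction first");
        state = State.IN_TRANSACTION;
    }

    // Current behavior when a transactional batch expires: fatal, producer unusable.
    void transitionToFatalError() { state = State.FATAL_ERROR; }

    // The alternative the report proposes: an error the user can recover from.
    void transitionToAbortableError() {
        if (state != State.FATAL_ERROR) state = State.ABORTABLE_ERROR;
    }

    void abortTransaction() {
        if (state == State.FATAL_ERROR)
            throw new IllegalStateException("cannot abort from a fatal state");
        state = State.READY; // abortable errors are recoverable
    }

    State state() { return state; }
}
```

In this model, an abortable error lets the application call abortTransaction() and carry on, while a fatal error forces it to close the producer, which matches the crash-and-exit behavior observed above.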



> Transactional producer exits because of expiration in RecordAccumulator
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-13768
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13768
>             Project: Kafka
>          Issue Type: Improvement
>          Components: producer 
>    Affects Versions: 2.0.0
>            Reporter: xuexiaoyue
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
