[ https://issues.apache.org/jira/browse/KAFKA-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175704#comment-16175704 ]

ASF GitHub Bot commented on KAFKA-5957:
---------------------------------------

GitHub user hachikuji opened a pull request:

    https://github.com/apache/kafka/pull/3942

    KAFKA-5957: Prevent second deallocate if response for aborted batch returns

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hachikuji/kafka KAFKA-5957

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/3942.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3942
    
----
commit 4974842e629c098b0d36bc42e189bff211e7faac
Author: Jason Gustafson <ja...@confluent.io>
Date:   2017-09-22T00:09:30Z

    KAFKA-5957: Prevent second deallocate if produce response for aborted batch returns

----


> Producer IllegalStateException due to second deallocate after aborting a batch
> ------------------------------------------------------------------------------
>
>                 Key: KAFKA-5957
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5957
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer 
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Critical
>             Fix For: 1.0.0
>
>
> Saw this recently in a system test failure:
> {code}
> [2017-09-21 05:04:52,033] ERROR [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Aborting producer batches due to fatal error (org.apache.kafka.clients.producer.internals.Sender)
> org.apache.kafka.common.KafkaException: The client hasn't received acknowledgment for some previously sent messages and can no longer retry them. It isn't safe to continue.
>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211)
>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:164)
>         at java.lang.Thread.run(Thread.java:745)
> [2017-09-21 05:04:52,033] TRACE Aborting batch for partition output-topic-2 (org.apache.kafka.clients.producer.internals.ProducerBatch)
> org.apache.kafka.common.KafkaException: The client hasn't received acknowledgment for some previously sent messages and can no longer retry them. It isn't safe to continue.
>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211)
>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:164)
>         at java.lang.Thread.run(Thread.java:745)
> [2017-09-21 05:04:52,134] TRACE [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Not sending transactional request (type=EndTxnRequest, transactionalId=my-second-transactional-id, producerId=1000, producerEpoch=0, result=COMMIT) because we are in an error state (org.apache.kafka.clients.producer.internals.TransactionManager)
> [2017-09-21 05:04:52,134] INFO [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. (org.apache.kafka.clients.producer.KafkaProducer)
> [2017-09-21 05:04:52,134] DEBUG [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Beginning shutdown of Kafka producer I/O thread, sending remaining records. (org.apache.kafka.clients.producer.internals.Sender)
> [2017-09-21 05:04:52,360] TRACE [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Received produce response from node 1 with correlation id 245 (org.apache.kafka.clients.producer.internals.Sender)
> [2017-09-21 05:04:52,360] DEBUG [Producer clientId=producer-1, transactionalId=my-second-transactional-id] ProducerId: 1000; Set last ack'd sequence number for topic-partition output-topic-2 to 136 (org.apache.kafka.clients.producer.internals.Sender)
> [2017-09-21 05:04:52,360] TRACE Successfully produced messages to output-topic-2 with base offset 387. (org.apache.kafka.clients.producer.internals.ProducerBatch)
> [2017-09-21 05:04:52,360] DEBUG ProduceResponse returned for output-topic-2 after batch had already been aborted. (org.apache.kafka.clients.producer.internals.ProducerBatch)
> [2017-09-21 05:04:52,360] ERROR [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Uncaught error in request completion: (org.apache.kafka.clients.NetworkClient)
> java.lang.IllegalStateException: Remove from the incomplete set failed. This should be impossible.
>         at org.apache.kafka.clients.producer.internals.IncompleteBatches.remove(IncompleteBatches.java:44)
>         at org.apache.kafka.clients.producer.internals.RecordAccumulator.deallocate(RecordAccumulator.java:612)
>         at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:585)
>         at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:561)
>         at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:475)
>         at org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74)
>         at org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:685)
>         at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:101)
>         at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:481)
>         at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:473)
>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:225)
>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:177)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> Although we allow a batch to be aborted before its produce response returns, we do not guard against a second call to {{deallocate()}} when that response arrives later, and this second call causes the error above.
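The failure sequence in the log (the Sender aborts the batch on a fatal error, then a late produce response for the same batch triggers a second deallocate) can be illustrated with a minimal, self-contained sketch. The classes `SketchBatch`, `IncompleteSet` and `SketchAccumulator` are hypothetical, simplified stand-ins for `ProducerBatch`, `IncompleteBatches` and `RecordAccumulator`; they are not Kafka's code. The guard shown (an atomic compare-and-set on the batch's final state, with only the winning completion path allowed to deallocate) is one assumed way to "prevent second deallocate", not a copy of the change in PR #3942.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical final states for a batch (simplified; not Kafka's enum).
enum FinalState { ABORTED, FAILED, SUCCEEDED }

// Hypothetical stand-in for a producer batch.
final class SketchBatch {
    final String partition;
    private final AtomicReference<FinalState> finalState = new AtomicReference<>(null);

    SketchBatch(String partition) { this.partition = partition; }

    /** Moves the batch to a final state; returns true only for the first caller. */
    boolean complete(FinalState state) {
        return finalState.compareAndSet(null, state);
    }

    FinalState finalState() { return finalState.get(); }
}

// Hypothetical stand-in for the incomplete-batch set: a second removal throws.
final class IncompleteSet {
    private final Set<SketchBatch> incomplete = new HashSet<>();

    synchronized void add(SketchBatch batch) { incomplete.add(batch); }

    synchronized void remove(SketchBatch batch) {
        if (!incomplete.remove(batch))
            throw new IllegalStateException("Remove from the incomplete set failed.");
    }

    synchronized int size() { return incomplete.size(); }
}

// Hypothetical stand-in for the accumulator's allocate/deallocate bookkeeping.
final class SketchAccumulator {
    final IncompleteSet incomplete = new IncompleteSet();
    int deallocations = 0;

    SketchBatch append(String partition) {
        SketchBatch batch = new SketchBatch(partition);
        incomplete.add(batch);
        return batch;
    }

    /** Unguarded: calling this twice for one batch throws IllegalStateException. */
    void deallocate(SketchBatch batch) {
        incomplete.remove(batch);
        deallocations++;
    }

    /** Fatal-error path: free the batch only if this call finalized it. */
    void abort(SketchBatch batch) {
        if (batch.complete(FinalState.ABORTED))
            deallocate(batch);
    }

    /** Late produce-response path: skip deallocation if the batch was already finalized. */
    void onProduceResponse(SketchBatch batch) {
        if (batch.complete(FinalState.SUCCEEDED))
            deallocate(batch);
    }
}

public class DoubleDeallocateGuard {
    public static void main(String[] args) {
        SketchAccumulator acc = new SketchAccumulator();
        SketchBatch batch = acc.append("output-topic-2");
        acc.abort(batch);             // the Sender aborts batches on a fatal error
        acc.onProduceResponse(batch); // the response from the broker arrives afterwards
        System.out.println(batch.finalState() + " deallocations=" + acc.deallocations);
    }
}
```

Running `main` prints `ABORTED deallocations=1`: the late response loses the compare-and-set and is ignored. If `onProduceResponse` called `deallocate` unconditionally, the second removal would throw the `IllegalStateException` seen in the trace.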



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
