[ https://issues.apache.org/jira/browse/KAFKA-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175704#comment-16175704 ]
ASF GitHub Bot commented on KAFKA-5957:
---------------------------------------

GitHub user hachikuji opened a pull request:

    https://github.com/apache/kafka/pull/3942

    KAFKA-5957: Prevent second deallocate if response for aborted batch returns

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hachikuji/kafka KAFKA-5957

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/3942.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3942

----
commit 4974842e629c098b0d36bc42e189bff211e7faac
Author: Jason Gustafson <ja...@confluent.io>
Date:   2017-09-22T00:09:30Z

    KAFKA-5957: Prevent second deallocate if produce response for aborted batch returns

----

> Producer IllegalStateException due to second deallocate after aborting a batch
> ------------------------------------------------------------------------------
>
>                 Key: KAFKA-5957
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5957
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Critical
>             Fix For: 1.0.0
>
>
> Saw this recently in a system test failure:
> {code}
> [2017-09-21 05:04:52,033] ERROR [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Aborting producer batches due to fatal error (org.apache.kafka.clients.producer.internals.Sender)
> org.apache.kafka.common.KafkaException: The client hasn't received acknowledgment for some previously sent messages and can no longer retry them. It isn't safe to continue.
>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211)
>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:164)
>         at java.lang.Thread.run(Thread.java:745)
> [2017-09-21 05:04:52,033] TRACE Aborting batch for partition output-topic-2 (org.apache.kafka.clients.producer.internals.ProducerBatch)
> org.apache.kafka.common.KafkaException: The client hasn't received acknowledgment for some previously sent messages and can no longer retry them. It isn't safe to continue.
>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211)
>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:164)
>         at java.lang.Thread.run(Thread.java:745)
> [2017-09-21 05:04:52,134] TRACE [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Not sending transactional request (type=EndTxnRequest, transactionalId=my-second-transactional-id, producerId=1000, producerEpoch=0, result=COMMIT) because we are in an error state (org.apache.kafka.clients.producer.internals.TransactionManager)
> [2017-09-21 05:04:52,134] INFO [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. (org.apache.kafka.clients.producer.KafkaProducer)
> [2017-09-21 05:04:52,134] DEBUG [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Beginning shutdown of Kafka producer I/O thread, sending remaining records. (org.apache.kafka.clients.producer.internals.Sender)
> [2017-09-21 05:04:52,360] TRACE [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Received produce response from node 1 with correlation id 245 (org.apache.kafka.clients.producer.internals.Sender)
> [2017-09-21 05:04:52,360] DEBUG [Producer clientId=producer-1, transactionalId=my-second-transactional-id] ProducerId: 1000; Set last ack'd sequence number for topic-partition output-topic-2 to 136 (org.apache.kafka.clients.producer.internals.Sender)
> [2017-09-21 05:04:52,360] TRACE Successfully produced messages to output-topic-2 with base offset 387. (org.apache.kafka.clients.producer.internals.ProducerBatch)
> [2017-09-21 05:04:52,360] DEBUG ProduceResponse returned for output-topic-2 after batch had already been aborted. (org.apache.kafka.clients.producer.internals.ProducerBatch)
> [2017-09-21 05:04:52,360] ERROR [Producer clientId=producer-1, transactionalId=my-second-transactional-id] Uncaught error in request completion: (org.apache.kafka.clients.NetworkClient)
> java.lang.IllegalStateException: Remove from the incomplete set failed. This should be impossible.
>         at org.apache.kafka.clients.producer.internals.IncompleteBatches.remove(IncompleteBatches.java:44)
>         at org.apache.kafka.clients.producer.internals.RecordAccumulator.deallocate(RecordAccumulator.java:612)
>         at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:585)
>         at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:561)
>         at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:475)
>         at org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74)
>         at org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:685)
>         at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:101)
>         at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:481)
>         at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:473)
>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:225)
>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:177)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> Although we allow a batch to be aborted before it returns, we are not careful
> about preventing a second call to {{deallocate()}} which causes this error.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
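
For illustration only, here is one way the double-deallocate described in the issue could be guarded against. This is a minimal sketch, not the code in the pull request: `DeallocateGuardSketch`, `Batch`, `Accumulator`, `FinalState` and `complete()` are hypothetical stand-ins rather than Kafka's actual classes. It assumes the batch records its final state exactly once, and that deallocation happens only for the caller that wins that transition.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified stand-ins for illustration; NOT Kafka's real classes.
final class DeallocateGuardSketch {

    enum FinalState { ABORTED, FAILED, SUCCEEDED }

    static final class Batch {
        private final String partition;
        private final AtomicReference<FinalState> finalState = new AtomicReference<>(null);

        Batch(String partition) { this.partition = partition; }

        // Returns true only for the first call; later calls leave the state untouched.
        boolean done(FinalState state) {
            return finalState.compareAndSet(null, state);
        }

        FinalState finalState() { return finalState.get(); }

        String partition() { return partition; }
    }

    static final class Accumulator {
        private final Set<Batch> incomplete = new HashSet<>();

        void track(Batch batch) { incomplete.add(batch); }

        // Like the failure in the log above, removing a batch twice is treated as a bug.
        void deallocate(Batch batch) {
            if (!incomplete.remove(batch))
                throw new IllegalStateException(
                    "Remove from the incomplete set failed for " + batch.partition());
        }

        int incompleteCount() { return incomplete.size(); }
    }

    // Deallocate only if this call is the one that actually completed the batch.
    static boolean complete(Accumulator accumulator, Batch batch, FinalState state) {
        if (batch.done(state)) {
            accumulator.deallocate(batch);
            return true;
        }
        return false; // e.g. a produce response arriving after the batch was aborted
    }

    public static void main(String[] args) {
        Accumulator accumulator = new Accumulator();
        Batch batch = new Batch("output-topic-2");
        accumulator.track(batch);

        boolean abortedFirst = complete(accumulator, batch, FinalState.ABORTED);
        boolean lateResponse = complete(accumulator, batch, FinalState.SUCCEEDED);
        System.out.println(abortedFirst + " " + lateResponse + " " + batch.finalState());
    }
}
```

In this sketch the late produce response is simply ignored because the batch already has a final state, so `deallocate()` runs once per batch regardless of which path completes it first.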