[ https://issues.apache.org/jira/browse/KAFKA-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ismael Juma updated KAFKA-5957: ------------------------------- Priority: Critical (was: Major) > Producer IllegalStateException due to second deallocate after aborting a batch > ------------------------------------------------------------------------------ > > Key: KAFKA-5957 > URL: https://issues.apache.org/jira/browse/KAFKA-5957 > Project: Kafka > Issue Type: Bug > Components: producer > Reporter: Jason Gustafson > Assignee: Jason Gustafson > Priority: Critical > Fix For: 1.0.0 > > > Saw this recently in a system test failure: > {code} > [2017-09-21 05:04:52,033] ERROR [Producer clientId=producer-1, > transactionalId=my-second-transactional-id] Aborting producer batches due to > fatal error (org.apache.kafka.clients.producer.internals.Sender) > org.apache.kafka.common.KafkaException: The client hasn't received > acknowledgment for some previously sent messages and can no longer retry > them. It isn't safe to continue. > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:164) > at java.lang.Thread.run(Thread.java:745) > [2017-09-21 05:04:52,033] TRACE Aborting batch for partition output-topic-2 > (org.apache.kafka.clients.producer.internals.ProducerBatch) > org.apache.kafka.common.KafkaException: The client hasn't received > acknowledgment for some previously sent messages and can no longer retry > them. It isn't safe to continue. > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:164) > at java.lang.Thread.run(Thread.java:745) > [2017-09-21 05:04:52,134] TRACE [Producer clientId=producer-1, > transactionalId=my-second-transactional-id] Not sending transactional request > (type=EndTxnRequest, transactionalId=my-second-transactional-id, > producerId=1000, producerEpoch=0, result=COMMIT) because we are in an error > state (org.apache.kafka.clients.producer.internals.TransactionManager) > [2017-09-21 05:04:52,134] INFO [Producer clientId=producer-1, > transactionalId=my-second-transactional-id] Closing the Kafka producer with > timeoutMillis = 9223372036854775807 ms. > (org.apache.kafka.clients.producer.KafkaProducer) > [2017-09-21 05:04:52,134] DEBUG [Producer clientId=producer-1, > transactionalId=my-second-transactional-id] Beginning shutdown of Kafka > producer I/O thread, sending remaining records. > (org.apache.kafka.clients.producer.internals.Sender) > [2017-09-21 05:04:52,360] TRACE [Producer clientId=producer-1, > transactionalId=my-second-transactional-id] Received produce response from > node 1 with correlation id 245 > (org.apache.kafka.clients.producer.internals.Sender) > [2017-09-21 05:04:52,360] DEBUG [Producer clientId=producer-1, > transactionalId=my-second-transactional-id] ProducerId: 1000; Set last ack'd > sequence number for topic-partition output-topic-2 to 136 > (org.apache.kafka.clients.producer.internals.Sender) > [2017-09-21 05:04:52,360] TRACE Successfully produced messages to > output-topic-2 with base offset 387. > (org.apache.kafka.clients.producer.internals.ProducerBatch) > [2017-09-21 05:04:52,360] DEBUG ProduceResponse returned for output-topic-2 > after batch had already been aborted. > (org.apache.kafka.clients.producer.internals.ProducerBatch) > [2017-09-21 05:04:52,360] ERROR [Producer clientId=producer-1, > transactionalId=my-second-transactional-id] Uncaught error in request > completion: (org.apache.kafka.clients.NetworkClient) > java.lang.IllegalStateException: Remove from the incomplete set failed. This > should be impossible. > at > org.apache.kafka.clients.producer.internals.IncompleteBatches.remove(IncompleteBatches.java:44) > at > org.apache.kafka.clients.producer.internals.RecordAccumulator.deallocate(RecordAccumulator.java:612) > at > org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:585) > at > org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:561) > at > org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:475) > at > org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74) > at > org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:685) > at > org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:101) > at > org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:481) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:473) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:225) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:177) > at java.lang.Thread.run(Thread.java:745) > {code} > Although we allow a batch to be aborted before it returns, we are not careful > about preventing a second call to {{deallocate()}} which causes this error. -- This message was sent by Atlassian JIRA (v6.4.14#64029)