[jira] [Commented] (KAFKA-8325) Remove from the incomplete set failed. This should be impossible
[ https://issues.apache.org/jira/browse/KAFKA-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913015#comment-16913015 ]

ASF GitHub Bot commented on KAFKA-8325:

hachikuji commented on pull request #7176: KAFKA-8325: Remove batch from in-flight requests when handling MESSAG…
URL: https://github.com/apache/kafka/pull/7176

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Remove from the incomplete set failed. This should be impossible
>
>                 Key: KAFKA-8325
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8325
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer
>    Affects Versions: 2.1.0, 2.3.0
>            Reporter: Mattia Barbon
>            Assignee: Bob Barrett
>            Priority: Major
>
> I got this error when using the Kafka producer. So far it happened twice,
> with an interval of about 1 week.
> {{ERROR [2019-05-05 08:43:07,505] org.apache.kafka.clients.producer.internals.Sender: [Producer clientId=, transactionalId=] Uncaught error in kafka producer I/O thread:}}
> {{ ! java.lang.IllegalStateException: Remove from the incomplete set failed. This should be impossible.}}
> {{ ! at org.apache.kafka.clients.producer.internals.IncompleteBatches.remove(IncompleteBatches.java:44)}}
> {{ ! at org.apache.kafka.clients.producer.internals.RecordAccumulator.deallocate(RecordAccumulator.java:645)}}
> {{ ! at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:717)}}
> {{ ! at org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:365)}}
> {{ ! at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:308)}}
> {{ ! at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:233)}}
> {{ ! at java.lang.Thread.run(Thread.java:748)}}

--
This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (KAFKA-8325) Remove from the incomplete set failed. This should be impossible
[ https://issues.apache.org/jira/browse/KAFKA-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902452#comment-16902452 ]

ASF GitHub Bot commented on KAFKA-8325:

bob-barrett commented on pull request #7176: KAFKA-8325: Remove batch from in-flight requests when handling MESSAG…
URL: https://github.com/apache/kafka/pull/7176

…E_TOO_LARGE error

This patch fixes a bug in the handling of MESSAGE_TOO_LARGE errors. The large batch is split, the smaller batches are re-added to the accumulator, and the batch is deallocated, but it was not removed from the list of in-flight batches. When the batch was eventually expired from the in-flight batches, the producer would try to deallocate it a second time, causing an error. This patch changes the behavior to correctly remove the batch from the list of in-flight requests.
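The sequence described in the pull request can be reproduced with a toy model. The sketch below is not the producer's actual code: `Accumulator`, the `inFlight` list, and the batch ids are hypothetical stand-ins, and only the exception message is taken from the report. It shows that leaving a split batch in the in-flight list makes the later expiry pass deallocate it a second time.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DoubleDeallocateSketch {
    // Hypothetical stand-in for the accumulator's set of incomplete batches.
    static final class Accumulator {
        private final Set<String> incomplete = new HashSet<>();

        void track(String batchId) {
            incomplete.add(batchId);
        }

        void deallocate(String batchId) {
            // Removing a batch that is no longer tracked is the failure in the report.
            if (!incomplete.remove(batchId)) {
                throw new IllegalStateException(
                        "Remove from the incomplete set failed. This should be impossible.");
            }
        }
    }

    // Simulates a MESSAGE_TOO_LARGE split followed by expiry of in-flight batches.
    // Returns true if the expiry pass hit a double deallocation.
    static boolean splitThenExpire(boolean removeFromInFlight) {
        Accumulator accumulator = new Accumulator();
        List<String> inFlight = new ArrayList<>();

        accumulator.track("big");
        inFlight.add("big");

        // Split: the smaller batches are re-added, the original is deallocated.
        accumulator.track("big-1");
        accumulator.track("big-2");
        accumulator.deallocate("big");
        if (removeFromInFlight) {
            inFlight.remove("big"); // the step the patch adds, in this toy model
        }

        try {
            // Expiry pass: every batch still listed as in flight is deallocated.
            for (String batchId : new ArrayList<>(inFlight)) {
                accumulator.deallocate(batchId);
            }
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("without removal: " + splitThenExpire(false));
        System.out.println("with removal:    " + splitThenExpire(true));
    }
}
```

In this model the only difference between the two runs is whether the original batch is dropped from `inFlight` at split time, which mirrors the one-line nature of the fix described above.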
[jira] [Commented] (KAFKA-8325) Remove from the incomplete set failed. This should be impossible
[ https://issues.apache.org/jira/browse/KAFKA-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901429#comment-16901429 ]

Bob Barrett commented on KAFKA-8325:

Looks like the problem is that when handling a MESSAGE_TOO_LARGE error, we don't correctly remove the original batch from the list of in-flight batches, but we do deallocate it in the accumulator. When we check the in-flight batches for expiration, we then try to deallocate the batch a second time, which causes this error. I'll have a fix out this week. Thanks for the report and the logs, [~mbarbon] and [~lukestephenson]!

[~lukestephenson], thanks for providing that demo code! Regarding the OutOfMemory you found, I think the underlying cause is the same: because we don't remove the batch from the list of in-flight batches, and because we retry MESSAGE_TOO_LARGE errors infinitely, the batches build up and eventually use all the available memory. I'll run your program with my fix and see if it fixes the issue.

As for why we don't decrement retries after splitting batches, it's because we want to treat the new, smaller batches as separate requests that get the same number of attempts as any other request. If we didn't do this, and the producer batch size was too high relative to the number of retries, we might run out of retries before splitting down to a safe size and fail to produce the records, even though each individual one is viable. Eventually we'll split large batches down to a single record, and if that is still too large we don't retry. In the case of your demo, I suspect the memory ran out before the split batches got down below the broker size limit, but that should be addressed by the fix for this bug.
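The retry reasoning in this comment can be illustrated with a small simulation. This is a sketch under stated assumptions, not the producer's implementation: batches are modelled as lists of record sizes, an oversized batch is split in half (the real client's split strategy may differ), and `SplitRetrySketch`, `send`, and `decrementOnSplit` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

final class SplitRetrySketch {
    /** Number of records delivered and number of records failed. */
    static final class Result {
        final int delivered;
        final int failed;

        Result(int delivered, int failed) {
            this.delivered = delivered;
            this.failed = failed;
        }
    }

    private static final class Batch {
        final List<Integer> recordSizes;
        final int attemptsLeft;

        Batch(List<Integer> recordSizes, int attemptsLeft) {
            this.recordSizes = recordSizes;
            this.attemptsLeft = attemptsLeft;
        }

        int bytes() {
            int sum = 0;
            for (int size : recordSizes) {
                sum += size;
            }
            return sum;
        }
    }

    // decrementOnSplit = false: split batches get a fresh retry budget (the behaviour described above).
    // decrementOnSplit = true: split batches inherit a reduced budget (the alternative being ruled out).
    static Result send(List<Integer> recordSizes, int brokerLimit, int retries, boolean decrementOnSplit) {
        Deque<Batch> queue = new ArrayDeque<>();
        queue.add(new Batch(recordSizes, retries));
        int delivered = 0;
        int failed = 0;
        while (!queue.isEmpty()) {
            Batch batch = queue.poll();
            int count = batch.recordSizes.size();
            if (batch.bytes() <= brokerLimit) {
                delivered += count;
            } else if (count == 1 || batch.attemptsLeft == 0) {
                // A single record that is still too large is not retried;
                // a batch with no attempts left cannot be split further.
                failed += count;
            } else {
                int next = decrementOnSplit ? batch.attemptsLeft - 1 : retries;
                int mid = count / 2;
                queue.add(new Batch(batch.recordSizes.subList(0, mid), next));
                queue.add(new Batch(batch.recordSizes.subList(mid, count), next));
            }
        }
        return new Result(delivered, failed);
    }

    public static void main(String[] args) {
        // 16 records of 100 bytes each against a hypothetical 250-byte broker limit.
        List<Integer> sizes = new ArrayList<>(Collections.nCopies(16, 100));
        Result fresh = send(sizes, 250, 2, false);
        Result shared = send(sizes, 250, 2, true);
        System.out.println("fresh budget:  delivered=" + fresh.delivered + " failed=" + fresh.failed);
        System.out.println("shared budget: delivered=" + shared.delivered + " failed=" + shared.failed);
    }
}
```

With a fresh budget per split, the 1600-byte batch is halved three times and every record is delivered. With a shared budget of 2, the attempts run out at 400-byte batches and all 16 records fail, even though each record is individually small enough.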
[jira] [Commented] (KAFKA-8325) Remove from the incomplete set failed. This should be impossible
[ https://issues.apache.org/jira/browse/KAFKA-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900495#comment-16900495 ]

Luke Stephenson commented on KAFKA-8325:

I played around further with `batch.size`. Setting it to large values can cause some strange behaviour on the producer. Here is one example: https://github.com/lukestephenson-zendesk/kafka-bug (see the project readme for an overview).
[jira] [Commented] (KAFKA-8325) Remove from the incomplete set failed. This should be impossible
[ https://issues.apache.org/jira/browse/KAFKA-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899811#comment-16899811 ]

Luke Stephenson commented on KAFKA-8325:

I suspect the trigger for this is having a large `batch.size` on the producer. I had configured:

"batch.size" = 100

I was initially under the impression that this was just used to influence throughput by causing the producer to wait for more bytes. However, when the batch is "split and retried", it also uses the `batch.size` value for splitting.

Curious if the other people who raised this bug also had a high value for this setting. I've reduced the value by half and I've stopped seeing the Exception with similar load.
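To make the suspicion above concrete, here is a minimal sketch of a sanity check on the producer's `batch.size`. The config key `batch.size` and its client default of 16384 bytes are real; the broker limit constant, the `localhost:9092` address, and the helper names are assumptions for illustration only. The check is a rough heuristic and ignores compression and per-record overhead.

```java
import java.util.Properties;

final class BatchSizeCheck {
    // Hypothetical broker-side batch limit; check message.max.bytes on your own brokers.
    static final int ASSUMED_BROKER_LIMIT_BYTES = 1_000_000;

    // Builds plain producer properties with string keys, so no Kafka dependency is needed here.
    static Properties producerProps(int batchSizeBytes) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        props.setProperty("batch.size", Integer.toString(batchSizeBytes));
        return props;
    }

    // Returns true when the configured batch.size does not exceed the given broker limit.
    static boolean batchSizeWithinLimit(Properties props, int brokerLimitBytes) {
        // 16384 bytes is the producer's default batch.size.
        int batchSize = Integer.parseInt(props.getProperty("batch.size", "16384"));
        return batchSize <= brokerLimitBytes;
    }

    public static void main(String[] args) {
        Properties small = producerProps(16_384);
        Properties large = producerProps(2_000_000);
        System.out.println("16384 within limit:   " + batchSizeWithinLimit(small, ASSUMED_BROKER_LIMIT_BYTES));
        System.out.println("2000000 within limit: " + batchSizeWithinLimit(large, ASSUMED_BROKER_LIMIT_BYTES));
    }
}
```

A check like this could run at application start-up to flag configurations where batches are more likely to be rejected with MESSAGE_TOO_LARGE and sent down the split-and-retry path.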
[jira] [Commented] (KAFKA-8325) Remove from the incomplete set failed. This should be impossible
[ https://issues.apache.org/jira/browse/KAFKA-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899717#comment-16899717 ]

Luke Stephenson commented on KAFKA-8325:

I'm also seeing this issue with version 2.2.1 of the client (brokers are running Kafka 2.1.1).

In my application, when the KafkaProducer callback is invoked, I'm completing a Scala Future. This is also causing a `java.lang.IllegalStateException: Promise already completed.` to be logged, which is consistent with what `IncompleteBatches` reports.

As [~bob-barrett] reported, this seems to be initially triggered by
```
Got error produce response in correlation id 1386 on topic-partition my.topic-0, splitting and retrying (30 attempts left). Error: MESSAGE_TOO_LARGE
```
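The "Promise already completed" symptom suggests the send callback fired more than once for the same record. As a defensive measure, application code can make the completion idempotent. The sketch below shows the idea in Java with `CompletableFuture`, whose `complete` and `completeExceptionally` return `false` instead of throwing when the future is already done. The `onCompletion` helper and its arguments are hypothetical; in Scala, `Promise.trySuccess` and `Promise.tryFailure` play the same role.

```java
import java.util.concurrent.CompletableFuture;

final class CallbackOnceSketch {
    // Hypothetical stand-in for the body of a producer send callback.
    // Returns true only if this invocation actually completed the future.
    static boolean onCompletion(CompletableFuture<Long> result, long offset, Exception error) {
        if (error != null) {
            return result.completeExceptionally(error);
        }
        return result.complete(offset);
    }

    public static void main(String[] args) {
        CompletableFuture<Long> result = new CompletableFuture<>();

        // First invocation: the record was acknowledged.
        boolean first = onCompletion(result, 42L, null);
        // Second, unexpected invocation for the same record: ignored rather than thrown.
        boolean second = onCompletion(result, -1L, new IllegalStateException("expired"));

        System.out.println("first completed the future:  " + first);
        System.out.println("second completed the future: " + second);
        System.out.println("observed offset: " + result.join());
    }
}
```

This only hides the duplicate callback from the application; logging the `false` return value would still surface the underlying producer problem.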
[jira] [Commented] (KAFKA-8325) Remove from the incomplete set failed. This should be impossible
[ https://issues.apache.org/jira/browse/KAFKA-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893750#comment-16893750 ]

Mattia Barbon commented on KAFKA-8325:

[~bob-barrett] all the cases seem to be instances of
{quote}
{{WARN [2019-07-26 13:09:27.670] [20295:kafka-producer-network-thread] org.apache.kafka.clients.producer.internals.Sender: [Producer clientId=232-prd-0, transactionalId=232-prd-0] Got error produce response in correlation id 4397 on topic-partition topic-31, splitting and retrying (2147483647 attempts left). Error: MESSAGE_TOO_LARGE}}
{{WARN [2019-07-26 13:09:27.839] [20295:kafka-producer-network-thread] org.apache.kafka.clients.producer.internals.Sender: [Producer clientId=232-prd-0, transactionalId=232-prd-0] Got error produce response with correlation id 4404 on topic-partition topic-31, retrying (2147483646 attempts left). Error: OUT_OF_ORDER_SEQUENCE_NUMBER}}
{{ERROR [2019-07-26 13:09:30.562] [20295:kafka-producer-network-thread] org.apache.kafka.clients.producer.internals.Sender: [Producer clientId=232-prd-0, transactionalId=232-prd-0] Uncaught error in kafka producer I/O thread:}}
{quote}
[jira] [Commented] (KAFKA-8325) Remove from the incomplete set failed. This should be impossible
[ https://issues.apache.org/jira/browse/KAFKA-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893166#comment-16893166 ]

Bob Barrett commented on KAFKA-8325:

[~mbarbon] Would you be able to upload producer logs from around the time this error was raised? This error is caused when the producer tries to deallocate a batch that has already been deallocated. There are multiple reasons a batch would be removed; the stack trace in this case shows the producer attempting to remove a batch that has expired. Additional logs might tell us what actually removed the batch before this call was made. If you don't have logs, are you aware of any interesting producer event around the time of this error, such as an aborted transaction or a request timeout?
[jira] [Commented] (KAFKA-8325) Remove from the incomplete set failed. This should be impossible
[ https://issues.apache.org/jira/browse/KAFKA-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891090#comment-16891090 ]

Mattia Barbon commented on KAFKA-8325:

It happens with the 2.3 client as well.
[jira] [Commented] (KAFKA-8325) Remove from the incomplete set failed. This should be impossible
[ https://issues.apache.org/jira/browse/KAFKA-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838718#comment-16838718 ]

Ming Liu commented on KAFKA-8325:

We also ran into this problem after we upgraded to Kafka 2.2.