[ https://issues.apache.org/jira/browse/KAFKA-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839587#comment-15839587 ]
Rajini Sivaram commented on KAFKA-4557:
---------------------------------------

[~salaev] Does the producer application send messages from a send callback when an exception occurs? I think this scenario can occur if a message is sent from the callback when some message expires. I think we do allow sends to be called from callbacks (can't find anything in Javadocs that says otherwise) and it is not unusual to send messages to a dead-letter-queue when send fails. So it makes sense to fix this scenario.

> ConcurrentModificationException in KafkaProducer event loop
> -----------------------------------------------------------
>
>                 Key: KAFKA-4557
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4557
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 0.10.1.0
>            Reporter: Sergey Alaev
>            Assignee: Rajini Sivaram
>            Priority: Critical
>              Labels: reliability
>             Fix For: 0.10.2.0
>
>
> Under heavy load, Kafka producer can stop publishing events. Logs below.
> [2016-12-19T15:01:28.779Z] [sgs] [kafka-producer-network-thread | producer-3]
> [NetworkClient] [] [<none>] [] [DEBUG]: Disconnecting from node 2 due to
> request timeout.
> [2016-12-19T15:01:28.793Z] [sgs] [kafka-producer-network-thread | producer-3]
> [KafkaProducerClient] [] [<none>] [1B2M2Y8Asg] [WARN]: Error sending message
> to Kafka
> org.apache.kafka.common.errors.NetworkException: The server disconnected
> before a response was received.
> [2016-12-19T15:01:28.838Z] [sgs] [kafka-producer-network-thread | producer-3]
> [KafkaProducerClient] [] [<none>] [1B2M2Y8Asg] [WARN]: Error sending message
> to Kafka
> org.apache.kafka.common.errors.NetworkException: The server disconnected
> before a response was received.
> (#2 from 2016-12-19T15:01:28.793Z)
> --------------------------------
> [2016-12-19T15:01:28.956Z] [sgs] [kafka-producer-network-thread | producer-3]
> [KafkaProducerClient] [] [<none>] [1B2M2Y8Asg] [WARN]: Error sending message
> to Kafka
> org.apache.kafka.common.errors.TimeoutException: Expiring 46 record(s) for
> events-deadletter-0 due to 30032 ms has passed since batch creation plus
> linger time (#285 from 2016-12-19
> T15:01:28.793Z)
> [2016-12-19T15:01:28.956Z] [sgs] [kafka-producer-network-thread | producer-3]
> [SgsService] [] [<none>] [1B2M2Y8Asg] [WARN]: Error writing signal to Kafka
> deadletter queue
> org.apache.kafka.common.errors.TimeoutException: Expiring 46 record(s) for
> events-deadletter-0 due to 30032 ms has passed since batch creation plus
> linger time (#286 from 2016-12-19
> T15:01:28.793Z)
> [2016-12-19T15:01:28.960Z] [sgs] [kafka-producer-network-thread | producer-3]
> [Sender] [] [<none>] [1B2M2Y8Asg] [ERROR]: Uncaught error in kafka producer
> I/O thread:
> java.util.ConcurrentModificationException: null
> at java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:643)
> ~[na:1.8.0_45]
> at
> org.apache.kafka.clients.producer.internals.RecordAccumulator.abortExpiredBatches(RecordAccumulator.java:242)
> ~[kafka-clients-0.10.1.0.jar:na]
> at
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:212)
> ~[kafka-clients-0.10.1.0.jar:na]
> at
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135)
> ~[kafka-clients-0.10.1.0.jar:na]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> [2016-12-19T15:01:28.981Z] [sgs] [kafka-producer-network-thread | producer-3]
> [NetworkClient] [] [<none>] [1B2M2Y8Asg] [WARN]: Error while fetching
> metadata with correlation id 28711 : {events-deadletter=LEADER_NOT_AVAILABLE}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
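
For illustration only, the scenario suggested in the comment (a send issued from a failure callback while expired batches are being iterated) can be sketched in plain Java without any Kafka code. Everything here is hypothetical: `ReentrantSendDemo`, `send`, and `abortExpired` are stand-ins invented for this sketch and are not the actual `RecordAccumulator` implementation. The stack trace shows an `ArrayDeque`; the sketch swaps in a `LinkedList` as the `Deque`, on the assumption that its iterator detects this single-threaded modification deterministically.

```java
import java.util.ConcurrentModificationException;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.function.Consumer;

class ReentrantSendDemo {
    // Hypothetical stand-in for a per-partition queue of pending batches.
    private final Deque<String> batches = new LinkedList<>();

    // Hypothetical stand-in for a producer send: appends to the queue.
    void send(String record) {
        batches.addLast(record);
    }

    // Mimics expiring batches: iterate, remove each one, then invoke the
    // failure callback while the iterator is still live.
    void abortExpired(Consumer<String> onFailure) {
        Iterator<String> it = batches.iterator();
        while (it.hasNext()) {
            String expired = it.next();
            it.remove();
            onFailure.accept(expired); // callback may re-enter send()
        }
    }

    // Returns true if the re-entrant send triggers the exception.
    public static boolean reproduces() {
        ReentrantSendDemo demo = new ReentrantSendDemo();
        demo.send("event-1");
        demo.send("event-2");
        try {
            // The callback forwards the failed record to the same queue,
            // modifying it behind the iterator's back.
            demo.abortExpired(failed -> demo.send("deadletter:" + failed));
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("CME reproduced: " + reproduces());
    }
}
```

Note that only one thread is involved: the exception comes from re-entrancy during iteration, not from a data race, which would be consistent with the error surfacing on the producer I/O thread itself.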