Hi Mickael,

This looks to be the same as KAFKA-4669. In theory, this should never happen and it's unclear when/how it can happen. Not sure if someone has investigated it in more detail.

Ismael

On Mon, Mar 6, 2017 at 5:15 PM, Mickael Maison <mickael.mai...@gmail.com> wrote:

> Hi,
>
> In one of our clusters, some of our clients occasionally see this
> exception:
> java.lang.IllegalStateException: Correlation id for response (4564) does not match request (4562)
>     at org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:486)
>     at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:381)
>     at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:229)
>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134)
>     at java.lang.Thread.run(Unknown Source)
>
> We've also seen it from consumer poll() and commit()
>
> Usually the response's correlation id is off by just 1 or 2 (like
> above) but we've also seen it off by a few hundreds:
> java.lang.IllegalStateException: Correlation id for response (742) does not match request (174)
>     at org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:486)
>     at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:381)
>     at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>     at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:426)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1059)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1027)
>
> When this happens, all subsequent responses are also shifted:
> java.lang.IllegalStateException: Correlation id for response (743) does not match request (742)
> java.lang.IllegalStateException: Correlation id for response (744) does not match request (743)
> java.lang.IllegalStateException: Correlation id for response (745) does not match request (744)
> java.lang.IllegalStateException: Correlation id for response (746) does not match request (745)
> ...
> It's easy to discard and recreate the consumer instance to recover
> however we can't do that with the producer as it occurs in the Sender
> thread.
>
> Our cluster and our clients are running Kafka 0.10.0.1.
> Under which circumstances would such an error happen ?
> Even with logging set to TRACE, we can't spot anything suspicious
> shortly before the issue. Is there any data we should try to capture
> when this happens ?
>
> Thanks!
>
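
For readers following the thread, here is a minimal sketch of the kind of check that produces this exception, and of why one mismatch can shift every later response. It is a simplified model, not Kafka's actual NetworkClient code: the class CorrelationSketch and its methods are hypothetical names made up for illustration. It assumes only what the traces above suggest, namely that the client keeps in-flight requests per connection in send order and compares each response's correlation id with the oldest one. The "lost response" scenario is just one way to reproduce the off-by-one pattern in this model; it is not a diagnosis of the issue reported here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of per-connection request/response correlation.
final class CorrelationSketch {
    // Correlation ids of requests sent but not yet answered, oldest first.
    private final Deque<Integer> inFlight = new ArrayDeque<>();
    private int nextCorrelationId = 1;

    // Record a new request and return the correlation id assigned to it.
    int send() {
        int id = nextCorrelationId++;
        inFlight.addLast(id);
        return id;
    }

    // Match a response against the oldest in-flight request.
    // The request is removed from the queue even when the ids do not match.
    void receive(int responseCorrelationId) {
        Integer requestId = inFlight.pollFirst();
        if (requestId == null) {
            throw new IllegalStateException(
                "No in-flight request for response (" + responseCorrelationId + ")");
        }
        if (requestId != responseCorrelationId) {
            throw new IllegalStateException("Correlation id for response ("
                + responseCorrelationId + ") does not match request (" + requestId + ")");
        }
    }

    // Send four requests, then deliver responses 2, 3 and 4 only
    // (the response to request 1 never arrives in this scenario).
    static List<String> simulateLostResponse() {
        CorrelationSketch client = new CorrelationSketch();
        for (int i = 0; i < 4; i++) {
            client.send();
        }
        List<String> errors = new ArrayList<>();
        for (int responseId = 2; responseId <= 4; responseId++) {
            try {
                client.receive(responseId);
            } catch (IllegalStateException e) {
                errors.add(e.getMessage());
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        for (String error : simulateLostResponse()) {
            System.out.println(error);
        }
    }
}
```

In this model the queue never resynchronizes by itself: each response is compared with a request one position too old, so every response after the first mismatch fails in the same way until the connection state is discarded. That would be consistent with recreating the consumer instance clearing the error.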