[ https://issues.apache.org/jira/browse/KAFKA-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220416#comment-15220416 ]
Rajini Sivaram commented on KAFKA-3488:
---------------------------------------

[~hachikuji] Thank you for the feedback. I was thinking along the lines of 2) since it felt like the simplest change. I will take a look at the commits to see if I can reuse the code. I will look into 1) as well. Thanks.

> commitAsync() fails if metadata update creates new SASL/SSL connection
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-3488
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3488
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.9.0.1
>            Reporter: Rajini Sivaram
>            Assignee: Rajini Sivaram
>             Fix For: 0.10.0.0
>
>
> Sasl/SslConsumerTest.testSimpleConsumption() fails intermittently with a
> failure in {{commitAsync()}}. The exception stack trace shows:
> {quote}
> kafka.api.SaslPlaintextConsumerTest.testSimpleConsumption FAILED
>     java.lang.AssertionError: expected:<1> but was:<0>
>         at org.junit.Assert.fail(Assert.java:88)
>         at org.junit.Assert.failNotEquals(Assert.java:834)
>         at org.junit.Assert.assertEquals(Assert.java:645)
>         at org.junit.Assert.assertEquals(Assert.java:631)
>         at kafka.api.BaseConsumerTest.awaitCommitCallback(BaseConsumerTest.scala:340)
>         at kafka.api.BaseConsumerTest.testSimpleConsumption(BaseConsumerTest.scala:85)
> {quote}
> I have recreated this with some additional trace. The tests run with a very
> small metadata expiry interval, triggering metadata updates quite often. If a
> metadata request immediately following a {{commitAsync()}} call creates a new
> SSL/SASL connection, {{ConsumerNetworkClient.poll}} returns to process the
> connection handshake packets. Since {{ConsumerNetworkClient.poll}} discards
> all unsent packets before returning from poll, this can result in the failure
> of the commit - the callback is invoked with {{SendFailedException}}.
> I understand that {{ConsumerNetworkClient.poll()}} discards unsent packets
> rather than buffer them to keep the code simple. And perhaps it is ok to fail
> {{commitAsync}} occasionally since the callback does indicate that the caller
> should retry. But it feels like an unnecessary limitation that requires error
> handling in client applications when there are no real failures and makes it
> much harder to test reliably. As special handling to fix issues like
> KAFKA-3412, KAFKA-2672 adds more complexity to the code anyway, and because
> it is much harder to debug failures that affect only SSL/SASL, it may be
> worth considering improving this behaviour.
> I will see if I can submit a PR for the specific issue I was seeing with the
> impact of handshakes on {{commitAsync()}}, but I will be interested in views
> on improving the logic in {{ConsumerNetworkClient}}.
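
For illustration only, here is a toy model of the failure mode described in the issue: a poll pass that returns early to process a handshake and either fails or keeps whatever is still unsent. This is a sketch under stated assumptions and not the actual {{ConsumerNetworkClient}} code. The names {{ToyNetworkClient}}, {{ToySendFailedException}}, {{startHandshake}}, {{failUnsentOnReturn}} and {{demo}} are all hypothetical, and the handshake is assumed to finish within a single poll pass.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for the SendFailedException named in the issue. */
class ToySendFailedException extends RuntimeException {
    ToySendFailedException(String message) {
        super(message);
    }
}

/** Toy model only: poll() may return early to process a connection handshake. */
class ToyNetworkClient {
    /** A queued request and the callback to invoke when it completes or fails. */
    private static final class Pending {
        final String name;
        final Consumer<RuntimeException> callback;

        Pending(String name, Consumer<RuntimeException> callback) {
            this.name = name;
            this.callback = callback;
        }
    }

    private final Deque<Pending> unsent = new ArrayDeque<>();
    private final boolean failUnsentOnReturn;
    private boolean handshakePending;

    ToyNetworkClient(boolean failUnsentOnReturn) {
        this.failUnsentOnReturn = failUnsentOnReturn;
    }

    /** Queues a request; nothing is sent until poll() runs. */
    void send(String name, Consumer<RuntimeException> callback) {
        unsent.add(new Pending(name, callback));
    }

    /** Simulates a metadata update that opens a new SSL/SASL connection. */
    void startHandshake() {
        handshakePending = true;
    }

    /** One poll pass: handshake work takes priority and ends the pass early. */
    void poll() {
        if (handshakePending) {
            handshakePending = false; // assumption: handshake completes in this pass
            if (failUnsentOnReturn) {
                // Behaviour described in the issue: unsent requests are failed.
                Pending p;
                while ((p = unsent.poll()) != null) {
                    p.callback.accept(new ToySendFailedException(p.name + " was not sent"));
                }
            }
            return; // otherwise unsent requests stay buffered for the next pass
        }
        // No handshake in progress: every queued request is sent successfully.
        Pending p;
        while ((p = unsent.poll()) != null) {
            p.callback.accept(null);
        }
    }

    /** Runs the commit-then-handshake sequence and records callback outcomes. */
    static List<String> demo(boolean failUnsentOnReturn) {
        List<String> outcomes = new ArrayList<>();
        ToyNetworkClient client = new ToyNetworkClient(failUnsentOnReturn);
        client.send("commit", e -> outcomes.add(e == null ? "ok" : "failed"));
        client.startHandshake(); // metadata update right after the commit
        client.poll();           // returns early for the handshake
        client.poll();           // next pass, no handshake pending
        return outcomes;
    }
}
```

In this model, {{ToyNetworkClient.demo(true)}} reports the commit callback as failed even though nothing actually went wrong on the wire, while {{ToyNetworkClient.demo(false)}} shows the same sequence succeeding on the following poll when unsent requests are buffered instead.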