[ https://issues.apache.org/jira/browse/KAFKA-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362972#comment-15362972 ]
Guozhang Wang commented on KAFKA-3876: -------------------------------------- I have seen a similar failure for a different test case of {{testPatternUnsubscription}}: {code} Error Message java.lang.AssertionError: Expected partitions [topic-0, topic-1, tblablac-0, tblablac-1] but actually got [] Stacktrace java.lang.AssertionError: Expected partitions [topic-0, topic-1, tblablac-0, tblablac-1] but actually got [] at org.junit.Assert.fail(Assert.java:88) at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:771) at kafka.api.PlaintextConsumerTest.testPatternUnsubscription(PlaintextConsumerTest.scala:288) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) {code} Example: https://builds.apache.org/job/kafka-trunk-git-pr-jdk7/4552/. [~hachikuji] [~ijuma] My feeling is that a lot of transient failures in {{PlaintextConsumerTest}} is time dependent, and some times 5 seconds is not enough: although the metadata refresh period is 500ms, we cannot really determine how long it will take for two round trips (metadata fresh, and join group) in a jammed test environment; so I suggest checking {{TestConsumerReassignmentListener}}'s callback triggered with expected partitions instead, and increase the waitUntil timeout to 30 seconds, to all correlated test cases not just this one. > Transient test failure: > kafka.api.PlaintextConsumerTest.testExpandingTopicSubscriptions > --------------------------------------------------------------------------------------- > > Key: KAFKA-3876 > URL: https://issues.apache.org/jira/browse/KAFKA-3876 > Project: Kafka > Issue Type: Sub-task > Components: unit tests > Reporter: Ismael Juma > Labels: transient-unit-test-failure > > Failed in a recent build: > {code} > java.lang.AssertionError: Partition [__consumer_offsets,0] metadata not > propagated after 5000 ms > at org.junit.Assert.fail(Assert.java:88) > at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:771) > at > kafka.utils.TestUtils$.waitUntilMetadataIsPropagated(TestUtils.scala:812) > at > kafka.utils.TestUtils$$anonfun$createTopic$1.apply(TestUtils.scala:240) > at > kafka.utils.TestUtils$$anonfun$createTopic$1.apply(TestUtils.scala:239) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.Range.foreach(Range.scala:160) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at kafka.utils.TestUtils$.createTopic(TestUtils.scala:239) > at > kafka.api.IntegrationTestHarness$class.setUp(IntegrationTestHarness.scala:80) > at kafka.api.BaseConsumerTest.setUp(BaseConsumerTest.scala:60) > {code} > Standard out: > {code} > [2016-06-19 09:09:20,081] WARN Client session timed out, have not heard from > server in 4002ms for sessionid 0x15567ebcc160001 > (org.apache.zookeeper.ClientCnxn:1108) > [2016-06-19 09:09:21,602] WARN caught end of stream exception > (org.apache.zookeeper.server.NIOServerCnxn:357) > EndOfStreamException: Unable to read additional data from client sessionid > 0x15567ebcc160001, likely client has closed socket > at > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203) > at java.lang.Thread.run(Thread.java:745) > [2016-06-19 09:09:20,081] WARN Client session timed out, have not heard from > server in 4001ms for sessionid 0x15567ebcc160003 > (org.apache.zookeeper.ClientCnxn:1108) > [2016-06-19 09:09:21,598] WARN Client session timed out, have not heard from > server in 4001ms for sessionid 0x15567ebcc160002 > (org.apache.zookeeper.ClientCnxn:1108) > [2016-06-19 09:09:21,613] WARN caught end of stream exception > (org.apache.zookeeper.server.NIOServerCnxn:357) > EndOfStreamException: Unable to read additional data from client sessionid > 0x15567ebcc160003, likely client has closed socket > at > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203) > at java.lang.Thread.run(Thread.java:745) > [2016-06-19 09:09:21,613] WARN caught end of stream exception > (org.apache.zookeeper.server.NIOServerCnxn:357) > EndOfStreamException: Unable to read additional data from client sessionid > 0x15567ebcc160002, likely client has closed socket > at > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203) > at java.lang.Thread.run(Thread.java:745) > [2016-06-19 09:09:21,602] WARN Client session timed out, have not heard from > server in 4084ms for sessionid 0x15567ebcc160000 > (org.apache.zookeeper.ClientCnxn:1108) > [2016-06-19 09:09:21,615] WARN caught end of stream exception > (org.apache.zookeeper.server.NIOServerCnxn:357) > EndOfStreamException: Unable to read additional data from client sessionid > 0x15567ebcc160000, likely client has closed socket > at > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203) > at java.lang.Thread.run(Thread.java:745) > [2016-06-19 09:11:26,866] WARN caught end of stream exception > (org.apache.zookeeper.server.NIOServerCnxn:357) > EndOfStreamException: Unable to read additional data from client sessionid > 0x15567edb4d10002, likely client has closed socket > at > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203) > at java.lang.Thread.run(Thread.java:745) > [2016-06-19 09:11:29,171] ERROR [KafkaApi-0] Error when handling request > Name: FetchRequest; Version: 2; CorrelationId: 0; ClientId: > ReplicaFetcherThread-0-0; ReplicaId: 2; MaxWait: 500 ms; MinBytes: 1 bytes; > RequestInfo: [__consumer_offsets,0] -> PartitionFetchInfo(0,1048576) > (kafka.server.KafkaApis:103) > kafka.common.NotAssignedReplicaException: Leader 0 failed to record follower > 2's position -1 since the replica is not recognized to be one of the assigned > replicas for partition [__consumer_offsets,0]. > at > kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:251) > at > kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:864) > at > kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:861) > at scala.collection.immutable.Map$Map1.foreach(Map.scala:116) > at > kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:861) > at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:470) > at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:496) > at kafka.server.KafkaApis.handle(KafkaApis.scala:77) > at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60) > at java.lang.Thread.run(Thread.java:745) > [2016-06-19 09:11:29,171] ERROR [ReplicaFetcherThread-0-0], Error for > partition [__consumer_offsets,0] to broker > 0:org.apache.kafka.common.errors.UnknownServerException: The server > experienced an unexpected error when processing the request > (kafka.server.ReplicaFetcherThread:97) > [2016-06-19 09:11:47,714] WARN caught end of stream exception > (org.apache.zookeeper.server.NIOServerCnxn:357) > EndOfStreamException: Unable to read additional data from client sessionid > 0x15567ee0b4c0002, likely client has closed socket > at > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203) > at java.lang.Thread.run(Thread.java:745) > {code} > https://jenkins.confluent.io/job/kafka-trunk/932/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)