[ https://issues.apache.org/jira/browse/KAFKA-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887913#comment-15887913 ]
Rajini Sivaram commented on KAFKA-4779:
---------------------------------------

The new failure looks like a very different issue. The consumer logs show:
{quote}
[2017-02-26 05:14:28,833] WARN Error while fetching metadata with correlation id 10712 : test_topic=LEADER_NOT_AVAILABLE (org.apache.kafka.clients.NetworkClient)
....
[2017-02-26 05:14:29,655] DEBUG Received successful JoinGroup response for group group: org.apache.kafka.common.requests.JoinGroupResponse@7802468d (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-02-26 05:14:29,655] DEBUG Performing assignment for group group using strategy range with subscriptions console-consumer-810cd1cc-eb5b-4d6d-bfb5-6e094e107d14=Subscription(topics=[test_topic])(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2017-02-26 05:14:29,655] DEBUG Skipping assignment for topic test_topic since no metadata is available (org.apache.kafka.clients.consumer.internals.AbstractPartitionAssignor)
[2017-02-26 05:14:29,655] WARN The following subscribed topics are not assigned to any members in the group group : [test_topic] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
{quote}

Even though partitions of the subscribed topic could not be assigned because no metadata is available due to a transient error resulting from broker restart, no metadata refresh is requested. The consumer times out after a minute. Metadata expiry is 5 minutes, so rebalancing will be delayed for 5 minutes. Consumers currently request metadata when leader is not known during fetching, but not if assignment cannot be performed. I think we should request metadata sooner for this case. Will submit a PR.
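
To make the timing concrete, here is a minimal toy model of the behaviour described above. It is only a sketch: `Metadata`, `perform_assignment` and `wait_before_refresh` are hypothetical names invented for illustration and are not Kafka's client classes or the eventual fix. The 5-minute expiry and 1-minute consumer timeout are the values quoted in the comment.

```python
from dataclasses import dataclass, field

METADATA_MAX_AGE_MS = 5 * 60 * 1000  # metadata expiry quoted above (5 minutes)
CONSUMER_TIMEOUT_MS = 60 * 1000      # consumer timeout quoted above (1 minute)


@dataclass
class Metadata:
    """Hypothetical stand-in for client-side metadata, not Kafka's class."""
    last_refresh_ms: int = 0
    need_update: bool = False
    partitions: dict = field(default_factory=dict)  # topic -> partition count

    def request_update(self):
        # Mark the metadata as stale so the next poll refreshes it at once.
        self.need_update = True

    def time_to_next_update(self, now_ms):
        # An explicit request wins; otherwise wait for the expiry interval.
        if self.need_update:
            return 0
        return max(0, self.last_refresh_ms + METADATA_MAX_AGE_MS - now_ms)


def perform_assignment(metadata, subscribed, refresh_on_missing):
    """Split subscribed topics into assignable and skipped ones.

    With refresh_on_missing=True, a skipped topic also triggers a metadata
    update request, which is the change proposed in the comment.
    """
    assigned = [t for t in subscribed if t in metadata.partitions]
    missing = [t for t in subscribed if t not in metadata.partitions]
    if missing and refresh_on_missing:
        metadata.request_update()
    return assigned, missing


def wait_before_refresh(refresh_on_missing, now_ms=0):
    """Milliseconds until the next metadata refresh after a failed assignment."""
    metadata = Metadata(last_refresh_ms=now_ms)  # no metadata for test_topic
    perform_assignment(metadata, ["test_topic"], refresh_on_missing)
    return metadata.time_to_next_update(now_ms)
```

In this model, `wait_before_refresh(False)` exceeds `CONSUMER_TIMEOUT_MS`, so the consumer gives up before metadata is refreshed, while `wait_before_refresh(True)` lets the refresh happen on the next poll.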

> Failure in kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-4779
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4779
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Apurva Mehta
>            Assignee: Rajini Sivaram
>             Fix For: 0.10.3.0, 0.10.2.1
>
>
> This test failed on 01/29, on both trunk and 0.10.2, error message:
> {noformat}
> The consumer has terminated, or timed out, on node ubuntu@worker3.
> Traceback (most recent call last):
>   File "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 123, in run
>     data = self.run_test()
>   File "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 176, in run_test
>     return self.test_context.function(self.test)
>   File "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py", line 321, in wrapper
>     return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py", line 148, in test_rolling_upgrade_phase_two
>     self.run_produce_consume_validate(self.roll_in_secured_settings, client_protocol, broker_protocol)
>   File "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 100, in run_produce_consume_validate
>     self.stop_producer_and_consumer()
>   File "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 87, in stop_producer_and_consumer
>     self.check_alive()
>   File "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 79, in check_alive
>     raise Exception(msg)
> Exception: The consumer has terminated, or timed out, on node ubuntu@worker3.
> {noformat}
> Looks like the console consumer times out:
> {noformat}
> [2017-01-30 04:56:00,972] ERROR Error processing message, terminating consumer process: (kafka.tools.ConsoleConsumer$)
> kafka.consumer.ConsumerTimeoutException
> 	at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:90)
> 	at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:120)
> 	at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:75)
> 	at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50)
> 	at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
> {noformat}
> A bunch of these security_rolling_upgrade tests failed, and in all cases, the
> producer produced ~15k messages, of which ~7k were acked, and the consumer
> only got around ~2600 before timing out.
> There are a lot of messages like the following for different request types on
> the producer and consumer:
> {noformat}
> [2017-01-30 05:13:35,954] WARN Received unknown topic or partition error in produce request on partition test_topic-0. The topic/partition may not exist or the user may not have Describe access to it (org.apache.kafka.clients.producer.internals.Sender)
> {noformat}