[ https://issues.apache.org/jira/browse/KAFKA-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887913#comment-15887913 ]
Rajini Sivaram commented on KAFKA-4779:
---------------------------------------
The new failure looks like a very different issue. The consumer logs show:
{quote}
[2017-02-26 05:14:28,833] WARN Error while fetching metadata with correlation
id 10712 : test_topic=LEADER_NOT_AVAILABLE
(org.apache.kafka.clients.NetworkClient)
....
[2017-02-26 05:14:29,655] DEBUG Received successful JoinGroup response for
group group: org.apache.kafka.common.requests.JoinGroupResponse@7802468d
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-02-26 05:14:29,655] DEBUG Performing assignment for group group using
strategy range with subscriptions
console-consumer-810cd1cc-eb5b-4d6d-bfb5-6e094e107d14=Subscription(topics=[test_topic]) (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2017-02-26 05:14:29,655] DEBUG Skipping assignment for topic test_topic since
no metadata is available
(org.apache.kafka.clients.consumer.internals.AbstractPartitionAssignor)
[2017-02-26 05:14:29,655] WARN The following subscribed topics are not assigned
to any members in the group group : [test_topic]
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
{quote}
Even though the partitions of the subscribed topic could not be assigned
because no metadata was available (a transient error caused by the broker
restart), no metadata refresh is requested. The consumer times out after a
minute. Metadata expiry is 5 minutes, so without an explicit refresh the
rebalance that would assign the partitions can be delayed by up to 5 minutes.
Consumers currently request a metadata update when the leader is not known
during fetching, but not when assignment cannot be performed. I think we
should request metadata sooner in this case. Will submit a PR.
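
To make the proposal concrete, here is a rough sketch of the idea in Python.
It is a toy model, not the Kafka client code and not the eventual PR: the
names MetadataSketch, assign_range and perform_assignment are hypothetical,
and the range assignment is simplified. The only point it illustrates is that
a topic skipped during assignment for lack of metadata triggers a metadata
update request instead of waiting for the metadata expiry.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataSketch:
    """Toy stand-in for client metadata (hypothetical, not a Kafka class)."""
    partitions_by_topic: dict = field(default_factory=dict)
    update_requested: bool = False

    def request_update(self):
        # In this model a refresh request is just a flag.
        self.update_requested = True


def assign_range(subscriptions, metadata):
    """Simplified range assignment.

    Returns (assignment, skipped), where skipped lists the topics that had
    no partition metadata and therefore could not be assigned.
    """
    assignment = {member: [] for member in subscriptions}
    skipped = []
    topics = sorted({t for ts in subscriptions.values() for t in ts})
    for topic in topics:
        count = metadata.partitions_by_topic.get(topic, 0)
        if count == 0:
            # Mirrors "Skipping assignment for topic ... since no metadata
            # is available" in the log above.
            skipped.append(topic)
            continue
        members = sorted(m for m, ts in subscriptions.items() if topic in ts)
        per_member, extra = divmod(count, len(members))
        start = 0
        for i, member in enumerate(members):
            length = per_member + (1 if i < extra else 0)
            assignment[member].extend(
                (topic, p) for p in range(start, start + length))
            start += length
    return assignment, skipped


def perform_assignment(subscriptions, metadata):
    """Assign partitions and ask for fresh metadata if any topic was skipped."""
    assignment, skipped = assign_range(subscriptions, metadata)
    if skipped:
        # Proposed behaviour (assumption): do not wait for metadata expiry.
        metadata.request_update()
    return assignment, skipped
```

With empty metadata, perform_assignment({"c1": ["test_topic"]}, MetadataSketch())
leaves c1 with no partitions, reports test_topic as skipped, and sets
update_requested, which is the state where the real consumer currently just
waits.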
> Failure in kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py
> ----------------------------------------------------------------------------
>
> Key: KAFKA-4779
> URL: https://issues.apache.org/jira/browse/KAFKA-4779
> Project: Kafka
> Issue Type: Bug
> Reporter: Apurva Mehta
> Assignee: Rajini Sivaram
> Fix For: 0.10.3.0, 0.10.2.1
>
>
> This test failed on 01/29, on both trunk and 0.10.2, error message:
> {noformat}
> The consumer has terminated, or timed out, on node ubuntu@worker3.
> Traceback (most recent call last):
> File
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
> line 123, in run
> data = self.run_test()
> File
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
> line 176, in run_test
> return self.test_context.function(self.test)
> File
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
> line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
> File
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py",
> line 148, in test_rolling_upgrade_phase_two
> self.run_produce_consume_validate(self.roll_in_secured_settings,
> client_protocol, broker_protocol)
> File
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
> line 100, in run_produce_consume_validate
> self.stop_producer_and_consumer()
> File
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
> line 87, in stop_producer_and_consumer
> self.check_alive()
> File
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
> line 79, in check_alive
> raise Exception(msg)
> Exception: The consumer has terminated, or timed out, on node ubuntu@worker3.
> {noformat}
> Looks like the console consumer times out:
> {noformat}
> [2017-01-30 04:56:00,972] ERROR Error processing message, terminating
> consumer process: (kafka.tools.ConsoleConsumer$)
> kafka.consumer.ConsumerTimeoutException
> at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:90)
> at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:120)
> at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:75)
> at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50)
> at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
> {noformat}
> Several of these security_rolling_upgrade tests failed, and in all cases the
> producer produced ~15k messages, of which ~7k were acked, and the consumer
> received only ~2600 before timing out.
> There are a lot of messages like the following for different request types on
> the producer and consumer:
> {noformat}
> [2017-01-30 05:13:35,954] WARN Received unknown topic or partition error in
> produce request on partition test_topic-0. The topic/partition may not exist
> or the user may not have Describe access to it
> (org.apache.kafka.clients.producer.internals.Sender)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)