[ 
https://issues.apache.org/jira/browse/KAFKA-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887913#comment-15887913
 ] 

Rajini Sivaram commented on KAFKA-4779:
---------------------------------------

The new failure looks like a very different issue. The consumer logs show:

{quote}
[2017-02-26 05:14:28,833] WARN Error while fetching metadata with correlation 
id 10712 : test_topic=LEADER_NOT_AVAILABLE 
(org.apache.kafka.clients.NetworkClient)
....
[2017-02-26 05:14:29,655] DEBUG Received successful JoinGroup response for 
group group: org.apache.kafka.common.requests.JoinGroupResponse@7802468d 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-02-26 05:14:29,655] DEBUG Performing assignment for group group using 
strategy range with subscriptions 
console-consumer-810cd1cc-eb5b-4d6d-bfb5-6e094e107d14=Subscription(topics=[test_topic])(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2017-02-26 05:14:29,655] DEBUG Skipping assignment for topic test_topic since 
no metadata is available 
(org.apache.kafka.clients.consumer.internals.AbstractPartitionAssignor)
[2017-02-26 05:14:29,655] WARN The following subscribed topics are not assigned 
to any members in the group group : [test_topic]  
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
{quote}

Partitions of the subscribed topic could not be assigned because no metadata 
was available, a transient error resulting from the broker restart, yet no 
metadata refresh is requested. The consumer times out after a minute. 
Metadata expiry is 5 minutes, so without an explicit refresh the rebalance 
that would assign the partitions is delayed for 5 minutes. Consumers currently 
request a metadata update when the leader is not known during fetching, but 
not when assignment cannot be performed. I think we should request metadata 
sooner in this case. Will submit a PR.
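
A minimal sketch of the proposed behaviour, not the actual client code: when 
the group leader has to skip a topic during assignment because it has no 
metadata for it, it also flags the metadata cache for an immediate refresh 
rather than waiting for expiry. The names Metadata, request_update and 
perform_assignment are hypothetical stand-ins used only for illustration, and 
the range-style split is simplified.

```python
class Metadata:
    """Toy stand-in for the client's metadata cache (hypothetical, not the Kafka API)."""

    def __init__(self, partitions_by_topic):
        # topic name -> number of partitions currently known to the client
        self.partitions_by_topic = dict(partitions_by_topic)
        self.update_requested = False

    def request_update(self):
        # Flag the cache so the next poll fetches fresh metadata right away.
        self.update_requested = True


def perform_assignment(metadata, subscriptions):
    """Assign partitions in contiguous ranges; refresh metadata if a topic is skipped.

    subscriptions maps member id -> list of subscribed topics.
    Returns (assignment, unassigned_topics).
    """
    assignment = {member: [] for member in subscriptions}
    unassigned = set()
    all_topics = sorted({t for topics in subscriptions.values() for t in topics})

    for topic in all_topics:
        num_partitions = metadata.partitions_by_topic.get(topic, 0)
        if num_partitions == 0:
            # Corresponds to "Skipping assignment for topic ... since no
            # metadata is available" in the log above.
            unassigned.add(topic)
            continue
        members = sorted(m for m, topics in subscriptions.items() if topic in topics)
        per_member, extra = divmod(num_partitions, len(members))
        start = 0
        for i, member in enumerate(members):
            count = per_member + (1 if i < extra else 0)
            assignment[member].extend((topic, p) for p in range(start, start + count))
            start += count

    if unassigned:
        # The proposed change: do not wait for metadata expiry.
        metadata.request_update()
    return assignment, sorted(unassigned)
```

With an empty cache, perform_assignment(Metadata({}), {"c1": ["test_topic"]}) 
leaves test_topic unassigned and sets update_requested, so the next metadata 
response can trigger the rebalance instead of the 5 minute expiry.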


> Failure in kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-4779
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4779
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Apurva Mehta
>            Assignee: Rajini Sivaram
>             Fix For: 0.10.3.0, 0.10.2.1
>
>
> This test failed on 01/29, on both trunk and 0.10.2, error message:
> {noformat}
> The consumer has terminated, or timed out, on node ubuntu@worker3.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
>     data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
>     return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
>     return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py",
>  line 148, in test_rolling_upgrade_phase_two
>     self.run_produce_consume_validate(self.roll_in_secured_settings, 
> client_protocol, broker_protocol)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 100, in run_produce_consume_validate
>     self.stop_producer_and_consumer()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 87, in stop_producer_and_consumer
>     self.check_alive()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in check_alive
>     raise Exception(msg)
> Exception: The consumer has terminated, or timed out, on node ubuntu@worker3.
> {noformat}
> Looks like the console consumer times out: 
> {noformat}
> [2017-01-30 04:56:00,972] ERROR Error processing message, terminating 
> consumer process:  (kafka.tools.ConsoleConsumer$)
> kafka.consumer.ConsumerTimeoutException
>         at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:90)
>         at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:120)
>         at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:75)
>         at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50)
>         at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
> {noformat}
> A bunch of these security_rolling_upgrade tests failed, and in all cases the 
> producer produced ~15k messages, of which ~7k were acked, and the consumer 
> only got ~2600 before timing out. 
> There are a lot of messages like the following for different request types on 
> the producer and consumer:
> {noformat}
> [2017-01-30 05:13:35,954] WARN Received unknown topic or partition error in 
> produce request on partition test_topic-0. The topic/partition may not exist 
> or the user may not have Describe access to it 
> (org.apache.kafka.clients.producer.internals.Sender)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
