[ https://issues.apache.org/jira/browse/KAFKA-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885716#comment-15885716 ]
Ismael Juma commented on KAFKA-4779:
------------------------------------
The test failed again, this time with a different message:
{code}
--------------------------------------------------------------------------------
test_id:
kafkatest.tests.core.security_rolling_upgrade_test.TestSecurityRollingUpgrade.test_rolling_upgrade_phase_two.broker_protocol=SASL_PLAINTEXT.client_protocol=SSL
status: FAIL
run time: 4 minutes 32.586 seconds
1152 acked message did not make it to the Consumer. They are: 12288, 12289,
12290, 12291, 12292, 12293, 12294, 12295, 12296, 12297, 12298, 12299, 12300,
12301, 12302, 12303, 12304, 12305, 12306, 12307...plus 1132 more. Total Acked:
12184, Total Consumed: 11032. We validated that the first 1000 of these missing
messages correctly made it into Kafka's data files. This suggests they were
lost on their way to the consumer.
Traceback (most recent call last):
File
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
line 123, in run
data = self.run_test()
File
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
line 176, in run_test
return self.test_context.function(self.test)
File
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
line 321, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
File
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py",
line 148, in test_rolling_upgrade_phase_two
self.run_produce_consume_validate(self.roll_in_secured_settings,
client_protocol, broker_protocol)
File
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
line 117, in run_produce_consume_validate
self.validate()
File
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
line 179, in validate
assert success, msg
AssertionError: 1152 acked message did not make it to the Consumer. They are:
12288, 12289, 12290, 12291, 12292, 12293, 12294, 12295, 12296, 12297, 12298,
12299, 12300, 12301, 12302, 12303, 12304, 12305, 12306, 12307...plus 1132 more.
Total Acked: 12184, Total Consumed: 11032. We validated that the first 1000 of
these missing messages correctly made it into Kafka's data files. This suggests
they were lost on their way to the consumer.
--------------------------------------------------------------------------------
{code}
http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2017-02-26--001.1488103947--apache--trunk--5b682ba/report.html
http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2017-02-26--001.1488103947--apache--trunk--5b682ba/TestSecurityRollingUpgrade/test_rolling_upgrade_phase_two/broker_protocol=SASL_PLAINTEXT.client_protocol=SSL/62.tgz
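For anyone reading the numbers in the failure above: the check amounts to a set difference between the acked and the consumed message ids. The following is only a minimal sketch of that idea, not the actual code in produce_consume_validate.py; the function names `find_missing` and `summarize` and the `preview` parameter are made up for illustration.

```python
def find_missing(acked, consumed):
    """Return the acked message ids that never reached the consumer, sorted."""
    return sorted(set(acked) - set(consumed))


def summarize(acked, consumed, preview=20):
    """Build a (success, message) pair similar in spirit to the test output.

    Hypothetical helper: the real test's wording and data-file validation
    step are not reproduced here.
    """
    missing = find_missing(acked, consumed)
    if not missing:
        return True, "All acked messages were consumed."
    shown = ", ".join(str(m) for m in missing[:preview])
    rest = len(missing) - preview
    suffix = "...plus %d more" % rest if rest > 0 else ""
    msg = ("%d acked messages did not reach the consumer. They are: %s%s. "
           "Total acked: %d, total consumed: %d."
           % (len(missing), shown, suffix, len(set(acked)), len(set(consumed))))
    return False, msg


if __name__ == "__main__":
    ok, msg = summarize(range(100), range(90), preview=3)
    print(ok, msg)
```

The reported totals are consistent with this reading: 12184 acked minus 11032 consumed gives the 1152 missing messages, assuming every consumed message was also acked.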
> Failure in kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py
> ----------------------------------------------------------------------------
>
> Key: KAFKA-4779
> URL: https://issues.apache.org/jira/browse/KAFKA-4779
> Project: Kafka
> Issue Type: Bug
> Reporter: Apurva Mehta
> Assignee: Rajini Sivaram
> Fix For: 0.10.3.0, 0.10.2.1
>
>
> This test failed on 01/29 on both trunk and 0.10.2, with the error message:
> {noformat}
> The consumer has terminated, or timed out, on node ubuntu@worker3.
> Traceback (most recent call last):
> File
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
> line 123, in run
> data = self.run_test()
> File
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
> line 176, in run_test
> return self.test_context.function(self.test)
> File
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
> line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
> File
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py",
> line 148, in test_rolling_upgrade_phase_two
> self.run_produce_consume_validate(self.roll_in_secured_settings,
> client_protocol, broker_protocol)
> File
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
> line 100, in run_produce_consume_validate
> self.stop_producer_and_consumer()
> File
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
> line 87, in stop_producer_and_consumer
> self.check_alive()
> File
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
> line 79, in check_alive
> raise Exception(msg)
> Exception: The consumer has terminated, or timed out, on node ubuntu@worker3.
> {noformat}
> Looks like the console consumer times out:
> {noformat}
> [2017-01-30 04:56:00,972] ERROR Error processing message, terminating
> consumer process: (kafka.tools.ConsoleConsumer$)
> kafka.consumer.ConsumerTimeoutException
> at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:90)
> at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:120)
> at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:75)
> at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50)
> at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
> {noformat}
> Several of these security_rolling_upgrade tests failed. In every case the
> producer sent ~15k messages, of which ~7k were acked, and the consumer
> received only ~2600 before timing out.
> There are a lot of messages like the following for different request types on
> the producer and consumer:
> {noformat}
> [2017-01-30 05:13:35,954] WARN Received unknown topic or partition error in
> produce request on partition test_topic-0. The topic/partition may not exist
> or the user may not have Describe access to it
> (org.apache.kafka.clients.producer.internals.Sender)
> {noformat}