[ https://issues.apache.org/jira/browse/KAFKA-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743246#comment-15743246 ]
Ewen Cheslack-Postava commented on KAFKA-4526:
----------------------------------------------
re: related test failures, we're also seeing this in the same test run:
{quote}
====================================================================================================
test_id: kafkatest.tests.core.replication_test.ReplicationTest.test_replication_with_broker_failure.security_protocol=SASL_SSL.failure_mode=hard_bounce.broker_type=controller
status: FAIL
run time: 3 minutes 35.081 seconds
2 acked message did not make it to the Consumer. They are: [43137, 43140].
We validated that the first 2 of these missing messages correctly made it into
Kafka's data files. This suggests they were lost on their way to the
consumer.(There are also 1110 duplicate messages in the log - but that is an
acceptable outcome)
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 123, in run
    data = self.run_test()
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 176, in run_test
    return self.test_context.function(self.test)
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py", line 321, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/replication_test.py", line 155, in test_replication_with_broker_failure
    self.run_produce_consume_validate(core_test_action=lambda: failures[failure_mode](self, broker_type))
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 101, in run_produce_consume_validate
    self.validate()
  File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 163, in validate
    assert success, msg
AssertionError: 2 acked message did not make it to the Consumer. They are:
[43137, 43140]. We validated that the first 2 of these missing messages
correctly made it into Kafka's data files. This suggests they were lost on
their way to the consumer.(There are also 1110 duplicate messages in the log -
but that is an acceptable outcome)
{quote}
These tests use common utilities, so the failures may be unrelated and merely
produce similar error messages. However, the fact that they seem to have
started happening at the same time is suspicious.
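
For readers unfamiliar with the message above, the check behind it amounts to comparing the set of acked messages with what the consumer saw. The following is only a rough sketch of that idea, not the actual code in produce_consume_validate.py; the function name and the sample values are hypothetical.

```python
from collections import Counter


def find_missing_and_duplicates(acked, consumed):
    """Compare acked message ids against consumed ones.

    Hypothetical helper for illustration only; it is not the kafkatest
    implementation. Returns the sorted list of acked ids that were never
    consumed, plus the number of extra (duplicate) deliveries.
    """
    consumed_counts = Counter(consumed)
    # An acked message that never shows up at the consumer is a failure.
    missing = sorted(set(acked) - set(consumed_counts))
    # Duplicates are tolerated, so they are only counted, not treated as errors.
    num_duplicates = sum(count - 1 for count in consumed_counts.values() if count > 1)
    return missing, num_duplicates


# Made-up sample data: ids 3 and 5 are lost, id 2 arrives twice, id 4 three times.
missing, num_duplicates = find_missing_and_duplicates(
    acked=[1, 2, 3, 4, 5],
    consumed=[1, 2, 2, 4, 4, 4],
)
print(missing, num_duplicates)
```

Under this reading, any test that goes through the same kind of validation would emit the same message whenever an acked id is absent from the consumed set, whatever the underlying cause.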
> Transient failure in ThrottlingTest.test_throttled_reassignment
> ---------------------------------------------------------------
>
> Key: KAFKA-4526
> URL: https://issues.apache.org/jira/browse/KAFKA-4526
> Project: Kafka
> Issue Type: Bug
> Reporter: Ewen Cheslack-Postava
> Assignee: Jason Gustafson
> Labels: system-test-failure, system-tests
> Fix For: 0.10.2.0
>
>
> This test is failing intermittently:
> {quote}
> Module: kafkatest.tests.core.throttling_test
> Class: ThrottlingTest
> Method: test_throttled_reassignment
> Arguments:
> {
> "bounce_brokers": false
> }
> {quote}
> This happens with both bounce_brokers = true and false. It fails with:
> {quote}
> AssertionError: 1646 acked message did not make it to the Consumer. They are:
> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus
> 1626 more. Total Acked: 174799, Total Consumed: 173153. We validated that the
> first 1000 of these missing messages correctly made it into Kafka's data
> files. This suggests they were lost on their way to the consumer.
> {quote}
> See
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html
> for an example.
> Note that there are a number of similar bug reports for different tests:
> https://issues.apache.org/jira/issues/?jql=text%20~%20%22acked%20message%20did%20not%20make%20it%20to%20the%20Consumer%22%20and%20project%20%3D%20Kafka
> I am wondering whether we have a wrong ack setting somewhere: one that we
> should be specifying as acks=all but that is only defaulting to 0?
> It also seems interesting that the missing messages in these recent failures
> seem to always start at 0...