[ https://issues.apache.org/jira/browse/KAFKA-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ewen Cheslack-Postava resolved KAFKA-4673.
------------------------------------------
       Resolution: Fixed
    Fix Version/s: 0.10.2.0
                   0.10.1.2
                   0.10.3.0

Issue resolved by pull request 2427
[https://github.com/apache/kafka/pull/2427]

> Python VerifiableConsumer service has thread-safety bug for event_handlers
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-4673
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4673
>             Project: Kafka
>          Issue Type: Bug
>          Components: system tests
>    Affects Versions: 0.10.1.1
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Ewen Cheslack-Postava
>             Fix For: 0.10.3.0, 0.10.1.2, 0.10.2.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> From
> http://confluent-kafka-0-10-1-system-test-results.s3-us-west-2.amazonaws.com/2017-01-17--001.1484653357--apache--0.10.1--d436fa2/report.html
> {quote}
> ====================================================================================================
> test_id:
> 2017-01-17--001.kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_bounce.clean_shutdown=True.bounce_mode=rolling
> status: FAIL
> run time: 1 minute 42.663 seconds
>     dictionary changed size during iteration
> Traceback (most recent call last):
>   File "/var/lib/jenkins/workspace/system-test-kafka-0.10.1/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.3-py2.7.egg/ducktape/tests/runner.py", line 106, in run_all_tests
>     data = self.run_single_test()
>   File "/var/lib/jenkins/workspace/system-test-kafka-0.10.1/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.3-py2.7.egg/ducktape/tests/runner.py", line 162, in run_single_test
>     return self.current_test_context.function(self.current_test)
>   File "/var/lib/jenkins/workspace/system-test-kafka-0.10.1/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.3-py2.7.egg/ducktape/mark/_mark.py", line 331, in wrapper
>     return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File "/var/lib/jenkins/workspace/system-test-kafka-0.10.1/kafka/tests/kafkatest/tests/client/consumer_test.py", line 137, in test_consumer_bounce
>     self.await_all_members(consumer)
>   File "/var/lib/jenkins/workspace/system-test-kafka-0.10.1/kafka/tests/kafkatest/tests/verifiable_consumer_test.py", line 83, in await_all_members
>     self.await_members(consumer, self.num_consumers)
>   File "/var/lib/jenkins/workspace/system-test-kafka-0.10.1/kafka/tests/kafkatest/tests/verifiable_consumer_test.py", line 80, in await_members
>     err_msg="Consumers failed to join in a reasonable amount of time")
>   File "/var/lib/jenkins/workspace/system-test-kafka-0.10.1/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.3-py2.7.egg/ducktape/utils/util.py", line 31, in wait_until
>     if condition():
>   File "/var/lib/jenkins/workspace/system-test-kafka-0.10.1/kafka/tests/kafkatest/tests/verifiable_consumer_test.py", line 78, in <lambda>
>     wait_until(lambda: len(consumer.joined_nodes()) == num_consumers,
>   File "/var/lib/jenkins/workspace/system-test-kafka-0.10.1/kafka/tests/kafkatest/services/verifiable_consumer.py", line 317, in joined_nodes
>     return [handler.node for handler in self.event_handlers.itervalues()
> RuntimeError: dictionary changed size during iteration
> {quote}
> It looks like the background thread is incorrectly inserting elements into
> self.event_handlers during this iteration. This is the first time I've seen
> this, so I suspect it's rarely hit. Looking at the code, most access is
> protected by a lock except for the additions at the beginning of the _worker
> method. Looking through the stacktrace, this looks like the likely culprit
> since we start the consumer and immediately start calling
> `await_all_members`, which ultimately iterates over that list. If that call
> happens faster than the thread gets started we could hit a bad interleaving.
> Fix is probably easy -- just add the proper locking around use of that object.
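
The locking approach suggested in the description can be sketched roughly as follows. This is a minimal illustration and not the code from pull request 2427: the `HandlerRegistry` class, its `register` method, and the dict-shaped handlers are hypothetical stand-ins for the service's real bookkeeping. The point it shows is only that the insertion done by the worker thread and the iteration done by the test thread take the same lock, so the dictionary cannot change size mid-iteration.

```python
import threading


class HandlerRegistry(object):
    """Hypothetical stand-in for the service's event_handlers bookkeeping."""

    def __init__(self):
        self.lock = threading.Lock()
        self.event_handlers = {}

    def register(self, node, handler):
        # Runs on a background worker thread. Taking the lock here closes the
        # gap described in the report: insertions no longer race with readers.
        with self.lock:
            self.event_handlers[node] = handler

    def joined_nodes(self):
        # Runs on the test thread. The dict is iterated only while the same
        # lock is held, so a concurrent register() has to wait.
        with self.lock:
            return [node for node, handler in self.event_handlers.items()
                    if handler["joined"]]


registry = HandlerRegistry()

# Eight workers register handlers; even-numbered nodes count as "joined"
# (an arbitrary choice made for this example).
workers = [
    threading.Thread(target=registry.register, args=(i, {"joined": i % 2 == 0}))
    for i in range(8)
]
for worker in workers:
    worker.start()

# Safe to call while workers are still starting, which is the interleaving
# that failed in the report.
partial = registry.joined_nodes()

for worker in workers:
    worker.join()

joined = sorted(registry.joined_nodes())
```

Returning a freshly built list from inside the `with` block means callers never hold a live view of the dictionary after the lock is released.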