This is an automated email from the ASF dual-hosted git repository.

jgus pushed a commit to branch 1.0
in repository https://gitbox.apache.org/repos/asf/kafka.git

commit 52f152bbc631c9334ae5b841b44574de0b441540
Author: Jason Gustafson <[email protected]>
AuthorDate: Fri Jun 7 16:53:50 2019 -0700

    MINOR: Lower producer throughput in flaky upgrade system test

    We see the upgrade test failing from time to time. I looked into it and found that the root cause is basically that the test throughput can be too high for the 0.9 producer to make progress. Eventually it reaches a point where it has a huge backlog of timed out requests in the accumulator which all have to be expired. We see a long run of messages like this in the output:

    ```
    {"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386132,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335160","key":null}
    {"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386132,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335163","key":null}
    {"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386133,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335166","key":null}
    {"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386133,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335169","key":null}
    ```

    This can continue for a long time (I have observed up to 1 min) and prevents the producer from successfully writing any new data. While it is busy expiring the batches, no data is getting delivered to the consumer, which causes it to eventually raise a timeout.

    ```
    kafka.consumer.ConsumerTimeoutException
        at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:50)
        at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:109)
        at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
        at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:47)
        at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
    ```

    The fix here is to reduce the throughput, which seems reasonable since the purpose of the test is to verify the upgrade, which does not demand heavy load.

    Note that I investigated several failing instances of this test going back to 1.0 and saw a similar pattern, so there does not appear to be a regression.

    Author: Jason Gustafson <[email protected]>

    Reviewers: Gwen Shapira

    Closes #6907 from hachikuji/lower-throughput-for-upgrade-test
---
 tests/kafkatest/tests/core/upgrade_test.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/kafkatest/tests/core/upgrade_test.py b/tests/kafkatest/tests/core/upgrade_test.py
index c8cdac7..8f97654 100644
--- a/tests/kafkatest/tests/core/upgrade_test.py
+++ b/tests/kafkatest/tests/core/upgrade_test.py
@@ -36,7 +36,7 @@ class TestUpgrade(ProduceConsumeValidateTest):
         self.zk.start()
 
         # Producer and consumer
-        self.producer_throughput = 10000
+        self.producer_throughput = 1000
         self.num_producers = 1
         self.num_consumers = 1
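
The backlog argument in the commit message can be illustrated with a toy model. This is only a sketch: `backlog_after` is a hypothetical helper, and the sustained rate of 4000 records/s is an invented figure chosen for illustration, not a measurement of the 0.9 producer. It shows how an offered rate above what the producer can sustain leaves a growing queue of unsent records, while a lower offered rate leaves none.

```python
def backlog_after(seconds, offered_per_sec, sustained_per_sec):
    """Return the number of unsent records queued after `seconds`.

    Toy model (not Kafka code): each second `offered_per_sec` records
    arrive and at most `sustained_per_sec` of the queued records are
    delivered. Both rates are hypothetical inputs.
    """
    backlog = 0
    for _ in range(seconds):
        backlog += offered_per_sec                   # new records enqueued
        backlog -= min(backlog, sustained_per_sec)   # records delivered
    return backlog


# Assumed sustained rate of 4000 records/s (illustrative only).
old_backlog = backlog_after(60, offered_per_sec=10000, sustained_per_sec=4000)
new_backlog = backlog_after(60, offered_per_sec=1000, sustained_per_sec=4000)
print(old_backlog, new_backlog)
```

Under these assumed numbers the old setting of 10000 accumulates a backlog that would later have to be expired, whereas the new setting of 1000 stays drained. The real producer's behaviour depends on batch sizes, timeouts, and broker load, which this model ignores.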
