This is an automated email from the ASF dual-hosted git repository.

jgus pushed a commit to branch 1.0
in repository https://gitbox.apache.org/repos/asf/kafka.git

commit 52f152bbc631c9334ae5b841b44574de0b441540
Author: Jason Gustafson <[email protected]>
AuthorDate: Fri Jun 7 16:53:50 2019 -0700

    MINOR: Lower producer throughput in flaky upgrade system test
    
    We see the upgrade test failing from time to time. I looked into it and
    found that the root cause is basically that the test throughput can be too
    high for the 0.9 producer to make progress. Eventually it reaches a point
    where it has a huge backlog of timed out requests in the accumulator which
    all have to be expired. We see a long run of messages like this in the
    output:
    
    ```
    {"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386132,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335160","key":null}
    {"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386132,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335163","key":null}
    {"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386133,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335166","key":null}
    {"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386133,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335169","key":null}
    ```
    This can continue for a long time (I have observed up to 1 min) and
    prevents the producer from successfully writing any new data. While it is
    busy expiring the batches, no data is getting delivered to the consumer,
    which causes it to eventually raise a timeout.
    ```
    kafka.consumer.ConsumerTimeoutException
    at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:50)
    at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:109)
    at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
    at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:47)
    at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
    ```
    The fix here is to reduce the throughput, which seems reasonable since the
    purpose of the test is to verify the upgrade, which does not demand heavy
    load. Note that I investigated several failing instances of this test going
    back to 1.0 and saw a similar pattern, so there does not appear to be a
    regression.
    
    Author: Jason Gustafson <[email protected]>
    
    Reviewers: Gwen Shapira
    
    Closes #6907 from hachikuji/lower-throughput-for-upgrade-test
---
 tests/kafkatest/tests/core/upgrade_test.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/kafkatest/tests/core/upgrade_test.py b/tests/kafkatest/tests/core/upgrade_test.py
index c8cdac7..8f97654 100644
--- a/tests/kafkatest/tests/core/upgrade_test.py
+++ b/tests/kafkatest/tests/core/upgrade_test.py
@@ -36,7 +36,7 @@ class TestUpgrade(ProduceConsumeValidateTest):
         self.zk.start()
 
         # Producer and consumer
-        self.producer_throughput = 10000
+        self.producer_throughput = 1000
         self.num_producers = 1
         self.num_consumers = 1
 

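The run of "Batch Expired" errors described in the commit message can be measured from captured producer output. The following is a minimal sketch, not part of the commit or of kafkatest: the function name `summarize_batch_expirations` is hypothetical, and it assumes each line of output is a JSON event carrying the `name`, `message`, and `time_ms` fields seen in the log excerpt above.

```python
import json


def summarize_batch_expirations(lines):
    """Return (count, span_ms) for "Batch Expired" send errors.

    Hypothetical helper: assumes one JSON event per line with the
    "name", "message", and "time_ms" fields shown in the log excerpt.
    """
    count = 0
    first_ms = None
    last_ms = None
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON event
        try:
            event = json.loads(line)
        except ValueError:
            continue  # skip truncated or malformed events
        if event.get("name") != "producer_send_error":
            continue
        if event.get("message") != "Batch Expired":
            continue
        count += 1
        time_ms = event.get("time_ms")
        if time_ms is None:
            continue
        if first_ms is None or time_ms < first_ms:
            first_ms = time_ms
        if last_ms is None or time_ms > last_ms:
            last_ms = time_ms
    span_ms = 0 if first_ms is None else last_ms - first_ms
    return count, span_ms


# Two events taken from the excerpt in the commit message.
SAMPLE = [
    '{"exception":"class org.apache.kafka.common.errors.TimeoutException",'
    '"time_ms":1559907386132,"name":"producer_send_error","topic":"test_topic",'
    '"message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer",'
    '"value":"335160","key":null}',
    '{"exception":"class org.apache.kafka.common.errors.TimeoutException",'
    '"time_ms":1559907386133,"name":"producer_send_error","topic":"test_topic",'
    '"message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer",'
    '"value":"335166","key":null}',
]

if __name__ == "__main__":
    count, span_ms = summarize_batch_expirations(SAMPLE)
    print(count, span_ms)
```

Run against the full producer output of a failing test, a large count spread over a span of tens of seconds would match the stall described above, during which no new data reaches the consumer.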