[ https://issues.apache.org/jira/browse/KAFKA-590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Fung resolved KAFKA-590.
-----------------------------
    Resolution: Fixed

These 4 test cases pass after setting "producer-retry-backoff-ms" to 2500, which is supported by ProducerPerformance (see KAFKA-267).

> System Test - 4 cases failed due to insufficient no. of retry in ProducerPerformance
> ------------------------------------------------------------------------------------
>
>          Key: KAFKA-590
>          URL: https://issues.apache.org/jira/browse/KAFKA-590
>      Project: Kafka
>   Issue Type: Bug
>     Reporter: John Fung
>
> 1. Functional Test Area: Replication with Leader Hard Failure (1 Topic, 3 Partitions)
>
> 2. Testcases failed:
>      0151 (Sync Producer, Acks = -1, No Compression)
>      0152 (Async Producer, Acks = -1, No Compression)
>      0155 (Sync Producer, Acks = -1, Compressed)
>      0156 (Async Producer, Acks = -1, Compressed)
>
> 3. Sample test results:
>      2012-10-25 18:22:20,206 - INFO - ======================================================
>      2012-10-25 18:22:20,206 - INFO - validating data matched
>      2012-10-25 18:22:20,206 - INFO - ======================================================
>      2012-10-25 18:22:20,206 - DEBUG - request-num-acks [-1] (kafka_system_test_utils)
>      2012-10-25 18:22:20,228 - INFO - no. of unique messages on topic [test_1] sent from publisher : 900 (kafka_system_test_utils)
>      2012-10-25 18:22:20,235 - INFO - no. of unique messages on topic [test_1] at simple_consumer_1.log : 853 (kafka_system_test_utils)
>      2012-10-25 18:22:20,242 - INFO - no. of unique messages on topic [test_1] at simple_consumer_2.log : 853 (kafka_system_test_utils)
>      2012-10-25 18:22:20,247 - INFO - no. of unique messages on topic [test_1] at simple_consumer_3.log : 853 (kafka_system_test_utils)
>
> 4. Investigations:
>
> a. Merge log segment files per partition.
>    Under test_1351181987/testcase_0151/logs/broker-1/kafka_server_1_logs:
>      cat test_1-0/00000000000000000000.log >> merged_test_1_0/00000000000000000000.log
>      cat test_1-0/00000000000000000197.log >> merged_test_1_0/00000000000000000000.log
>      . . .
>
> b. Retrieve all CRCs from the merged data log segment:
>      bin/kafka-run-class.sh kafka.tools.DumpLogSegments merged_test_1_0/00000000000000000000.log | grep crc | sed 's/.* crc: //' | sort -u > test_1_0_crc.log
>      . . .
>
> c. Merge the CRC files together:
>      cat test_1_0_crc.log >> all_crc.log
>      cat test_1_1_crc.log >> all_crc.log
>      cat test_1_2_crc.log >> all_crc.log
>
> d. Sort the merged CRC file:
>      cat all_crc.log | sort -u > all_crc_sorted.log
>
> e. Get the no. of 'failed to send' CRCs in producer_performance.log (70 in this case):
>      grep 'failed to send' producer_performance.log | sed 's/.* crc = //' | sed 's/, key = null.*//' | sort -u | wc -l
>      70
>
> f. Match those 'failed to send' CRCs from producer_performance.log against all_crc_sorted.log to see how many of the failed messages were eventually sent successfully on retry:
>      $ for i in `grep 'failed to send' ../../producer_performance-4/producer_performance.log | sed 's/.* crc = //' | sed 's/, key = null.*//' | sort -u`; do echo -n "$i => "; grep $i all_crc_sorted.log || echo "n/a"; done
>      . . .
>      1302684126 => n/a
>      1456125554 => 1456125554
>      15299643 => n/a
>      1653550869 => 1653550869
>      1741661084 => n/a
>      1764395211 => 1764395211
>      . . .
>      (23 msgs were sent successfully on retry)
>
> g. As a result: 70 messages 'failed to send' in producer_performance.log - 23 messages successfully sent on retry = 47 messages lost, which matches the data loss count in the test result (900 published - 853 consumed).
>
> Therefore, if the no. of retries is increased to a higher value, all the messages could be sent successfully.
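The cross-check in steps b-f above can be collapsed into a single pass with comm(1) over the two sorted CRC lists, instead of grepping once per failed CRC. A minimal sketch using stand-in CRC values and hypothetical file names (demo_broker_crc.log stands in for all_crc_sorted.log built from DumpLogSegments output; demo_failed_crc.log stands in for the unique CRCs grepped from the 'failed to send' lines of producer_performance.log):

```shell
#!/bin/sh
# Sketch of the CRC cross-check in steps b-f, runnable without a Kafka
# cluster. The CRC values below are made-up stand-ins.

# CRCs found in the broker's merged log segments (sorted, unique).
printf '%s\n' 1456125554 1653550869 1764395211 2000000001 \
    | sort -u > demo_broker_crc.log

# CRCs the producer reported as 'failed to send' (sorted, unique).
printf '%s\n' 1302684126 1456125554 15299643 1653550869 1741661084 1764395211 \
    | sort -u > demo_failed_crc.log

failed=$(wc -l < demo_failed_crc.log)

# comm -12 prints only the lines common to both sorted files, i.e. the
# failed sends whose CRC still reached the broker log - a later retry
# must have succeeded for those messages.
retried_ok=$(comm -12 demo_failed_crc.log demo_broker_crc.log | wc -l)

lost=$((failed - retried_ok))

echo "failed to send : $failed"
echo "retry succeeded: $retried_ok"
echo "messages lost  : $lost"
```

With the real files, this reproduces the arithmetic of step g (70 failed - 23 retried successfully = 47 lost) in one pass; comm requires both inputs sorted, which the sort -u steps already guarantee.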
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira