[ https://issues.apache.org/jira/browse/KAFKA-606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495012#comment-13495012 ]
John Fung commented on KAFKA-606: --------------------------------- • This issue is transient and is showing randomly in this functional test group. The test case is failing because the log segment checksums are not matching across the replicas. When a DumpLogSegment is run on one of the log segment files and indexes, it gives error: Non-secutive offsets in :00000000000000000405.log Exception in thread "main" java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer • Transient Failure Testcase: 0302, 0308 • Functional Test Group: Replication with Simulated Leader GC Pause / 1 topic / 3 partitions / Replica Factor = 3 / No. of brokers = 3 • validation_status : Leader Election Latency MAX : None Leader Election Latency MIN : None Unique messages from consumer on [test_1] at simple_consumer_test_1-0_r1.log : 971 Unique messages from consumer on [test_1] at simple_consumer_test_1-0_r2.log : 967 Unique messages from consumer on [test_1] at simple_consumer_test_1-0_r3.log : 971 Unique messages from consumer on [test_1] at simple_consumer_test_1-1_r1.log : 961 Unique messages from consumer on [test_1] at simple_consumer_test_1-1_r2.log : 961 Unique messages from consumer on [test_1] at simple_consumer_test_1-1_r3.log : 961 Unique messages from consumer on [test_1] at simple_consumer_test_1-2_r1.log : 825 Unique messages from consumer on [test_1] at simple_consumer_test_1-2_r2.log : 825 Unique messages from consumer on [test_1] at simple_consumer_test_1-2_r3.log : 825 Validate for data matched on topic [test_1] across replicas : FAILED Validate for merged log segment checksum in cluster [source] : FAILED • Failure is due to log segment checksum doesn't match. • Leader with brokerID 2 is GC paused: 2012-11-11 04:03:50,967 - DEBUG - brokerid: [2] entity_id: [2] (kafka_system_test_utils) 2012-11-11 04:03:50,968 - INFO - ====================================================== 2012-11-11 04:03:50,968 - INFO - Found leader with entity id: 2 2012-11-11 04:03:50,968 - INFO - ====================================================== 2012-11-11 04:03:50,968 - DEBUG - executing command [ssh ela4-app0998.prod 'pid=17256; prev_pid=""; echo $pid; while [[ "x$pid" != "x" ]]; do prev_pid=$pid; for child in $(ps -o pid,ppid ax | awk "{ if ( \$2 == $pid ) { print \$1 }}"); do ech o $child; pid=$child; done; if [ $prev_pid == $pid ]; then break; fi; done' 2> /dev/null (system_test_utils) 2012-11-11 04:03:51,120 - DEBUG - executing command [ssh ela4-app0998.prod 'kill -SIGSTOP 17258'] (system_test_utils)2012-11-11 04:03:51,136 - DEBUG - executing command [ssh ela4-app0998.prod 'kill -SIGSTOP 17256'] (system_test_utils) 2012-11-11 04:03:56,159 - DEBUG - executing command [ssh ela4-app0998.prod 'kill -SIGCONT 17256'] (system_test_utils) 2012-11-11 04:03:56,176 - DEBUG - executing command [ssh ela4-app0998.prod 'kill -SIGCONT 17258'] (system_test_utils) • Log segment files in all brokers kafka@ela4-app0996 [~/kafka_hudson_test_output_logs/test_1352592075/testcase_0302/logs] ./ls_log_segments.sh -rw-rw-r-- 1 kafka kafka 105966 Nov 11 05:30 broker-1/kafka_server_1_logs/test_1-0/00000000000000000000.log -rw-rw-r-- 1 kafka kafka 105444 Nov 11 05:30 broker-1/kafka_server_1_logs/test_1-0/00000000000000000203.log -rw-rw-r-- 1 kafka kafka 103878 Nov 11 05:30 broker-1/kafka_server_1_logs/test_1-0/00000000000000000405.log -rw-rw-r-- 1 kafka kafka 103878 Nov 11 05:30 broker-1/kafka_server_1_logs/test_1-0/00000000000000000604.log -rw-rw-r-- 1 kafka kafka 87696 Nov 11 05:30 broker-1/kafka_server_1_logs/test_1-0/00000000000000000803.log . . . ====================================================== -rw-rw-r-- 1 kafka kafka 105966 Nov 11 05:30 broker-2/kafka_server_2_logs/test_1-0/00000000000000000000.log -rw-rw-r-- 1 kafka kafka 105444 Nov 11 05:30 broker-2/kafka_server_2_logs/test_1-0/00000000000000000203.log -rw-rw-r-- 1 kafka kafka 105444 Nov 11 05:30 broker-2/kafka_server_2_logs/test_1-0/00000000000000000405.log -rw-rw-r-- 1 kafka kafka 114840 Nov 11 05:30 broker-2/kafka_server_2_logs/test_1-0/00000000000000000611.log -rw-rw-r-- 1 kafka kafka 73080 Nov 11 05:30 broker-2/kafka_server_2_logs/test_1-0/00000000000000000831.log . . . ====================================================== -rw-rw-r-- 1 kafka kafka 105966 Nov 11 05:30 broker-3/kafka_server_3_logs/test_1-0/00000000000000000000.log -rw-rw-r-- 1 kafka kafka 105444 Nov 11 05:30 broker-3/kafka_server_3_logs/test_1-0/00000000000000000203.log -rw-rw-r-- 1 kafka kafka 103878 Nov 11 05:30 broker-3/kafka_server_3_logs/test_1-0/00000000000000000405.log -rw-rw-r-- 1 kafka kafka 103878 Nov 11 05:30 broker-3/kafka_server_3_logs/test_1-0/00000000000000000604.log -rw-rw-r-- 1 kafka kafka 87696 Nov 11 05:30 broker-3/kafka_server_3_logs/test_1-0/00000000000000000803.log . . . • DumpLogSegment is showing Exception: kafka@ela4-app0996 [/mnt/u001/kafka_08_replication_system_test] bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files /export/home/kafka/kafka_hudson_test_output_logs/test_1352592075/testcase_0302/logs/broker-2/kafka_server_2_logs/test_1-0/00000000000000000405.log,/export/home/kafka/kafka_hudson_test_output_logs/test_1352592075/testcase_0302/logs/broker-2/kafka_server_2_logs/test_1-0/00000000000000000405.index Dumping /export/home/kafka/kafka_hudson_test_output_logs/test_1352592075/testcase_0302/logs/broker-2/kafka_server_2_logs/test_1-0/00000000000000000405.log Starting offset: 405 offset: 405 position: 0 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 2241529125 offset: 406 position: 522 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 2398559000 offset: 407 position: 1044 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 2470656351 offset: 408 position: 1566 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 3047120790 . . . offset: 606 position: 102834 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 1253807556 offset: 607 position: 103356 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 1818751757 offset: 608 position: 103878 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 1711972973 offset: 609 position: 104400 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 1980328956 offset: 610 position: 104922 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 1566762723 Dumping /export/home/kafka/kafka_hudson_test_output_logs/test_1352592075/testcase_0302/logs/broker-2/kafka_server_2_logs/test_1-0/00000000000000000405.index offset: 504 position: 49590 offset: 520 position: 57942 offset: 541 position: 68904 offset: 555 position: 76212 offset: 576 position: 87174 offset: 597 position: 98136 Non-secutive offsets in :00000000000000000405.log Exception in thread "main" java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer at scala.runtime.BoxesRunTime.unboxToInt(Unknown Source) at scala.Tuple2._1$mcI$sp(Tuple2.scala:23) at kafka.tools.DumpLogSegments$$anonfun$main$3$$anonfun$apply$2.apply(DumpLogSegments.scala:85) at kafka.tools.DumpLogSegments$$anonfun$main$3$$anonfun$apply$2.apply(DumpLogSegments.scala:84) at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61) at scala.collection.immutable.List.foreach(List.scala:45) at kafka.tools.DumpLogSegments$$anonfun$main$3.apply(DumpLogSegments.scala:84) at kafka.tools.DumpLogSegments$$anonfun$main$3.apply(DumpLogSegments.scala:81) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80) at scala.collection.Iterator$class.foreach(Iterator.scala:631) at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:161) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:194) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:80) at kafka.tools.DumpLogSegments$.main(DumpLogSegments.scala:81) at kafka.tools.DumpLogSegments.main(DumpLogSegments.scala) > System Test Transient Failure (case 0302 GC Pause) - Log segments mismatched > across replicas > -------------------------------------------------------------------------------------------- > > Key: KAFKA-606 > URL: https://issues.apache.org/jira/browse/KAFKA-606 > Project: Kafka > Issue Type: Bug > Reporter: John Fung > Attachments: testcase_0302_data_and_log4j.tar.gz > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira