Ewen Cheslack-Postava created KAFKA-1771:
--------------------------------------------

             Summary: replicate_testsuite data verification broken if num_partitions > replica_factor
                 Key: KAFKA-1771
                 URL: https://issues.apache.org/jira/browse/KAFKA-1771
             Project: Kafka
          Issue Type: Bug
          Components: system tests
    Affects Versions: 0.8.1.1
            Reporter: Ewen Cheslack-Postava


As discussed in KAFKA-1763, testcase_0131, testcase_0132, and testcase_0133 currently fail with an exception:

{quote}
Traceback (most recent call last):
  File "/mnt/u001/kafka_replication_system_test/system_test/replication_testsuite/replica_basic_test.py", line 434, in runTest
    kafka_system_test_utils.validate_simple_consumer_data_matched_across_replicas(self.systemTestEnv, self.testcaseEnv)
  File "/mnt/u001/kafka_replication_system_test/system_test/utils/kafka_system_test_utils.py", line 2223, in validate_simple_consumer_data_matched_across_replicas
    replicaIdxMsgIdList[replicaIdx - 1][topicPartition] = consumerMsgIdList
IndexError: list index out of range
{quote}

The root cause seems to be kafka_system_test_utils.start_simple_consumer. The current logic seems incorrect. It should be generating one consumer per partition per replica so it can verify the data from all sources, but it currently has a loop involving the list of brokers, where that loop variable isn't even used.

But probably a bigger issue is that it's generating multiple processes in the background. It records pids to the single well-known entity pid path, which means only the last pid is saved and we could easily leave zombie processes if one of them hangs for some reason.
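
To illustrate the intended behavior, here is a minimal sketch (not the actual kafka_system_test_utils code) of starting one consumer per partition per replica while keeping every pid. The names plan_simple_consumers, start_consumers, and the launch callable are hypothetical and stand in for whatever the test harness uses to spawn a SimpleConsumer process. Keying the results by (broker_id, partition) rather than by a positional list index is also an assumption about how the out-of-range lookup could be avoided.

```python
def plan_simple_consumers(replica_broker_ids, num_partitions):
    """Return one (broker_id, partition) pair per partition per replica.

    Hypothetical helper: the broker loop variable is actually used, so each
    replica is read for every partition.
    """
    return [
        (broker_id, partition)
        for broker_id in replica_broker_ids
        for partition in range(num_partitions)
    ]


def start_consumers(replica_broker_ids, num_partitions, launch):
    """Start every planned consumer and keep all pids.

    `launch(broker_id, partition)` is a hypothetical callable that spawns one
    background consumer and returns its pid. Every pid is stored under its own
    key instead of being written to a single shared pid path, so none is
    overwritten and each process can be waited on or killed later.
    """
    pids = {}
    for broker_id, partition in plan_simple_consumers(
        replica_broker_ids, num_partitions
    ):
        pids[(broker_id, partition)] = launch(broker_id, partition)
    return pids
```

With 3 replicas and 5 partitions this plans 15 consumers, and the case num_partitions > replica_factor needs no special handling because nothing is indexed by replica position.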