Ewen Cheslack-Postava created KAFKA-1771:
--------------------------------------------

             Summary: replicate_testsuite data verification broken if num_partitions > replica_factor
                 Key: KAFKA-1771
                 URL: https://issues.apache.org/jira/browse/KAFKA-1771
             Project: Kafka
          Issue Type: Bug
          Components: system tests
    Affects Versions: 0.8.1.1
            Reporter: Ewen Cheslack-Postava


As discussed in KAFKA-1763, testcase_0131, testcase_0132, and testcase_0133 
currently fail with an exception:

{quote}
Traceback (most recent call last):
  File "/mnt/u001/kafka_replication_system_test/system_test/replication_testsuite/replica_basic_test.py", line 434, in runTest
    kafka_system_test_utils.validate_simple_consumer_data_matched_across_replicas(self.systemTestEnv, self.testcaseEnv)
  File "/mnt/u001/kafka_replication_system_test/system_test/utils/kafka_system_test_utils.py", line 2223, in validate_simple_consumer_data_matched_across_replicas
    replicaIdxMsgIdList[replicaIdx - 1][topicPartition] = consumerMsgIdList
IndexError: list index out of range
{quote}

The root cause appears to be kafka_system_test_utils.start_simple_consumer, 
whose current logic looks incorrect. It should start one consumer per 
partition per replica so that it can verify the data from every source. 
Instead, it loops over the list of brokers without ever using the loop 
variable.
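
To illustrate the intended behavior, here is a minimal sketch of planning one 
consumer per partition per replica. It is not the existing 
start_simple_consumer code: the function name plan_simple_consumers, the 
replicas_by_partition mapping, and the topic name are hypothetical, and the 
sketch assumes the test already knows which brokers host each partition.

```python
def plan_simple_consumers(topic, replicas_by_partition):
    """Return one consumer spec per (partition, replica broker) pair.

    replicas_by_partition is a hypothetical mapping of
    partition id -> list of broker ids hosting a replica of it.
    """
    specs = []
    for partition, broker_ids in sorted(replicas_by_partition.items()):
        for broker_id in broker_ids:
            # The broker id is actually used here, so every replica is read.
            specs.append(
                {"topic": topic, "partition": partition, "broker_id": broker_id}
            )
    return specs


# Hypothetical layout: 3 partitions, replication factor 2, 3 brokers,
# i.e. more partitions than replicas per partition.
assignment = {0: [1, 2], 1: [2, 3], 2: [3, 1]}
specs = plan_simple_consumers("test_1", assignment)
print(len(specs))  # 3 partitions x 2 replicas = 6 consumers
```

Keying the plan by (partition, broker) rather than by broker alone would also 
give the validation step a natural key for each replica's message list, 
instead of a positional index.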

A probably bigger issue is that it starts multiple background processes but 
records each pid to the single well-known entity pid path. Only the last pid 
is saved, so we could easily leave orphaned processes behind if one of them 
hangs for some reason.
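
One possible way to avoid losing pids is to keep a handle for every process 
that gets started and stop them all during cleanup. The sketch below uses 
only the standard subprocess module. The helper names, the (topic, partition, 
broker) keys, and the sleeping placeholder command are assumptions made for 
illustration and are not part of the system test framework.

```python
import subprocess
import sys


def start_background_processes(commands):
    """Start every command and keep one Popen handle per key."""
    return {key: subprocess.Popen(cmd) for key, cmd in commands.items()}


def stop_all(procs, timeout=5):
    """Terminate every tracked process, killing any that ignore the request."""
    for proc in procs.values():
        if proc.poll() is None:  # still running
            proc.terminate()
            try:
                proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()


# Placeholder for a long-running consumer; a real test would launch the
# consumer command here instead.
sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
commands = {
    ("test_1", 0, 1): sleeper,  # hypothetical (topic, partition, broker) key
    ("test_1", 0, 2): sleeper,
}

procs = start_background_processes(commands)
pids = {key: proc.pid for key, proc in procs.items()}  # one pid per consumer
stop_all(procs)
```

Because each process is tracked under its own key, a hung consumer can still 
be found and killed at teardown rather than being overwritten by the next 
pid.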




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
