[ https://issues.apache.org/jira/browse/KAFKA-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490835#comment-13490835 ]
Prashanth Menon commented on KAFKA-574:
---------------------------------------

So I ran the system test on an Ubuntu box and two of the test cases fail consistently for me, both with and without the patch:

_test_case_name: test_case_0001
_test_clss_name: ReplicaBasicTest
arg : bounce_broker : false
arg : broker_type : leader
arg : message_producing_free_time_sec : 15
arg : num_iteration : 1
arg : num_messages_to_produce_per_product_call : 50
arg : num_partition : 1
arg : replica_factor : 3
arg: sleep_sseconds_between_producer_calls : 1
validation_status:
Leader Election latency MAX : None
Leader Election latency MIN : None
Validate leader election successful : FAILED

_test_case_name: test_case_1
_test_clss_name: ReplicaBasicTest
arg : bounce_broker : true
arg : broker_type : leader
arg : message_producing_free_time_sec : 15
arg : num_iteration : 2
arg : num_messages_to_produce_per_product_call : 50
arg : num_partition : 2
arg : replica_factor : 3
arg: sleep_sseconds_between_producer_calls : 1
validation_status:
Validate leader election successful : FAILED

Is this happening for anyone else? I'll investigate on my end to see what's causing it.

> KafkaController unnecessarily reads leaderAndIsr info from ZK
> -------------------------------------------------------------
>
> Key: KAFKA-574
> URL: https://issues.apache.org/jira/browse/KAFKA-574
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.8
> Reporter: Jun Rao
> Assignee: Prashanth Menon
> Priority: Blocker
> Labels: bugs
> Attachments: KAFKA-574-v1.patch, KAFKA-574-v2.patch, KAFKA-574-v3.patch
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> KafkaController calls updateLeaderAndIsrCache() in onBrokerFailure(). This is
> unnecessary since in onBrokerFailure(), we will make leader and isr change
> anyway so there is no need to first read that information from ZK. Latency is
> critical in onBrokerFailure() since it determines how quickly a leader can be
> made online.
> Similarly, updateLeaderAndIsrCache() is called in onBrokerStartup()
> unnecessarily. In this case, the controller does not change the leader or the
> isr. It just needs to send the current leader and the isr info to the newly
> started broker. We already cache leader in the controller. Isr in theory
> could change any time by the leader. So, reading from ZK doesn't guarantee
> that we can get the latest isr anyway. Instead, we just need to get the isr
> last selected by the controller (which can be cached together with the leader
> in the controller). If the leader epoch in a broker is at or larger than the
> epoch in the leaderAndIsr request, the broker can just ignore it. Otherwise,
> the leader and the isr selected by the controller should be used.
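
For reference, the epoch rule at the end of the description can be sketched roughly as below. This is only an illustration of the comparison, not the code in any of the attached patches: the class LeaderAndIsrEpochCheck, the LeaderAndIsr holder and the applyRequest() method are hypothetical names, and treating a broker with no local state as "accept the request" is my assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the epoch rule from the issue description.
// None of these names come from the Kafka code base.
class LeaderAndIsrEpochCheck {

    // Minimal stand-in for the leader/isr state cached by the controller
    // and held by a broker.
    static final class LeaderAndIsr {
        final int leader;
        final int leaderEpoch;
        final List<Integer> isr;

        LeaderAndIsr(int leader, int leaderEpoch, List<Integer> isr) {
            this.leader = leader;
            this.leaderEpoch = leaderEpoch;
            this.isr = isr;
        }
    }

    // Returns the state the broker should hold after seeing a request.
    // If the broker's epoch is at or larger than the request's epoch, the
    // request is ignored; otherwise the controller's choice is adopted.
    // Assumption: a broker with no local state (null) accepts the request.
    static LeaderAndIsr applyRequest(LeaderAndIsr local, LeaderAndIsr request) {
        if (local != null && local.leaderEpoch >= request.leaderEpoch) {
            return local;
        }
        return request;
    }

    public static void main(String[] args) {
        LeaderAndIsr local = new LeaderAndIsr(1, 5, Arrays.asList(1, 2, 3));
        LeaderAndIsr stale = new LeaderAndIsr(2, 4, Arrays.asList(2, 3));
        LeaderAndIsr newer = new LeaderAndIsr(2, 6, Arrays.asList(2, 3));

        // The stale request is ignored, so the local epoch 5 is kept.
        System.out.println(applyRequest(local, stale).leaderEpoch); // prints 5
        // The newer request wins, so the epoch becomes 6.
        System.out.println(applyRequest(local, newer).leaderEpoch); // prints 6
    }
}
```

With a check like this on the broker side, the controller can send its cached leader and isr on broker startup without first reading ZK, since a request carrying an older or equal epoch is simply dropped.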