[ 
https://issues.apache.org/jira/browse/KAFKA-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490835#comment-13490835
 ] 

Prashanth Menon commented on KAFKA-574:
---------------------------------------

So I ran the system test on an Ubuntu box and two of the test cases fail 
consistently for me, both with and without the patch:

_test_case_name: test_case_0001
_test_class_name: ReplicaBasicTest
arg : bounce_broker : false
arg : broker_type : leader
arg : message_producing_free_time_sec : 15
arg : num_iteration : 1
arg : num_messages_to_produce_per_producer_call : 50
arg : num_partition : 1
arg : replica_factor : 3
arg : sleep_seconds_between_producer_calls : 1
validation_status:
  Leader Election latency MAX : None
  Leader Election latency MIN : None
  Validate leader election successful : FAILED

_test_case_name: test_case_1
_test_class_name: ReplicaBasicTest
arg : bounce_broker : true
arg : broker_type : leader
arg : message_producing_free_time_sec : 15
arg : num_iteration : 2
arg : num_messages_to_produce_per_producer_call : 50
arg : num_partition : 2
arg : replica_factor : 3
arg : sleep_seconds_between_producer_calls : 1
validation_status:
  Validate leader election successful : FAILED

Any idea if this is happening for everyone else?  I'll investigate on my end to 
see what's causing it.
                
> KafkaController unnecessarily reads leaderAndIsr info from ZK
> -------------------------------------------------------------
>
>                 Key: KAFKA-574
>                 URL: https://issues.apache.org/jira/browse/KAFKA-574
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Prashanth Menon
>            Priority: Blocker
>              Labels: bugs
>         Attachments: KAFKA-574-v1.patch, KAFKA-574-v2.patch, 
> KAFKA-574-v3.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> KafkaController calls updateLeaderAndIsrCache() in onBrokerFailure(). This is 
> unnecessary: onBrokerFailure() changes the leader and isr anyway, so there is 
> no need to first read that information from ZK. Latency is critical in 
> onBrokerFailure() since it determines how quickly a leader can be brought 
> online.
> Similarly, updateLeaderAndIsrCache() is called in onBrokerStartup() 
> unnecessarily. In this case, the controller does not change the leader or the 
> isr. It just needs to send the current leader and isr info to the newly 
> started broker. We already cache the leader in the controller. In theory, the 
> isr can be changed by the leader at any time, so reading from ZK doesn't 
> guarantee that we get the latest isr anyway. Instead, we just need the isr 
> last selected by the controller (which can be cached together with the leader 
> in the controller). If the leader epoch in a broker is equal to or larger than 
> the epoch in the leaderAndIsr request, the broker can just ignore the request. 
> Otherwise, the leader and the isr selected by the controller should be used. 
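
For illustration, the epoch rule in the description above could be sketched roughly as follows. This is only a minimal Python model, not Kafka code: the names LeaderAndIsr, ControllerCache, Broker, record, snapshot and handle_leader_and_isr are all hypothetical. It assumes the controller keeps the leader, leader epoch and isr it last selected in memory, and that a broker ignores any request whose epoch is not newer than the one it already holds.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# (topic, partition) pair used as the cache key; hypothetical alias.
TopicPartition = Tuple[str, int]


@dataclass(frozen=True)
class LeaderAndIsr:
    """Hypothetical record of what the controller last selected."""
    leader: int
    leader_epoch: int
    isr: Tuple[int, ...]


class ControllerCache:
    """Assumed in-memory cache, so broker startup needs no ZK read."""

    def __init__(self) -> None:
        self._cache: Dict[TopicPartition, LeaderAndIsr] = {}

    def record(self, tp: TopicPartition, info: LeaderAndIsr) -> None:
        # Called whenever the controller itself picks a leader and isr.
        self._cache[tp] = info

    def snapshot(self) -> Dict[TopicPartition, LeaderAndIsr]:
        # What would be sent to a newly started broker.
        return dict(self._cache)


class Broker:
    """Assumed broker-side handling of a leaderAndIsr request."""

    def __init__(self) -> None:
        self.state: Dict[TopicPartition, LeaderAndIsr] = {}

    def handle_leader_and_isr(self, tp: TopicPartition,
                              request: LeaderAndIsr) -> bool:
        current = self.state.get(tp)
        # Ignore the request if the broker's epoch is equal or larger.
        if current is not None and current.leader_epoch >= request.leader_epoch:
            return False
        # Otherwise use the leader and isr selected by the controller.
        self.state[tp] = request
        return True
```

With this model, a broker that already holds epoch 5 for a partition would drop requests carrying epoch 4 or 5 and accept one carrying epoch 6.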
