[ https://issues.apache.org/jira/browse/KAFKA-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swapnil Ghike updated KAFKA-999:
--------------------------------

    Attachment:     (was: LIKAFKA-269-v3.patch)
    
> Controlled shutdown never succeeds until the broker is killed
> -------------------------------------------------------------
>
>                 Key: KAFKA-999
>                 URL: https://issues.apache.org/jira/browse/KAFKA-999
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Swapnil Ghike
>            Priority: Critical
>         Attachments: kafka-999-v1.patch, kafka-999-v2.patch, 
> kafka-999-v3.patch
>
>
> A race condition between the broker's handling of the LeaderAndIsr request 
> and controlled shutdown can lead to a situation where controlled shutdown 
> never succeeds and the only way to bounce the broker is to kill it.
> The root cause is that the broker uses a check to avoid fetching from a 
> leader that is not alive according to the controller. This check causes the 
> broker to abort a become-follower request. When the replication factor is 
> 2, leadership can never be transferred to the follower, since the follower 
> keeps rejecting the become-follower request and stays out of the ISR. This 
> causes controlled shutdown to fail forever.
> One sequence of events that led to this bug is as follows -
> - Broker 2 is leader and controller
> - Broker 2 is bounced (uncontrolled shutdown)
> - Controller fails over
> - Controlled shutdown is invoked on broker 1
> - Controller starts leader election for partitions that broker 2 led
> - Controller sends a become-follower request to broker 2 with broker 1 as 
> leader. At the same time, it does not include broker 1 in the alive broker 
> list sent as part of the LeaderAndIsr request
> - Broker 2 rejects the LeaderAndIsr request since the leader is not in the 
> list of alive brokers
> - Broker 1 fails to transfer leadership to broker 2 since broker 2 is not in 
> the ISR
> - Controlled shutdown can never succeed on broker 1
> Since controlled shutdown is a config option, if there are bugs in controlled 
> shutdown, there is no option but to kill the broker.
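
The sequence of events described above can be modeled with a small toy simulation. This is only a sketch in Python and not Kafka's actual code: the `Broker` class, `handle_become_follower`, `try_controlled_shutdown`, the partition name `topic-0`, and the retry count are all hypothetical names chosen for illustration. It assumes, as the report states, that a follower joins the ISR only after accepting a become-follower request, and that leadership can move only to a broker in the ISR.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    broker_id: int
    # Partitions this (hypothetical) broker has agreed to follow.
    following: set = field(default_factory=set)

    def handle_become_follower(self, partition, leader_id, alive_brokers):
        """Return True if the broker starts following leader_id."""
        if leader_id not in alive_brokers:
            # The check from the report: refuse to fetch from a leader
            # that the controller did not list as alive.
            return False
        self.following.add(partition)
        return True


def try_controlled_shutdown(partition, leader_id, follower, alive_brokers,
                            max_attempts=3):
    """Toy retry loop: leadership can move only to a follower in the ISR."""
    isr = {leader_id}
    for _ in range(max_attempts):
        if follower.handle_become_follower(partition, leader_id, alive_brokers):
            isr.add(follower.broker_id)
        if follower.broker_id in isr:
            return True  # leadership could be handed to the follower
    return False  # every attempt was rejected, shutdown stays stuck


# Broker 1 is shutting down and is left out of the alive list sent to broker 2.
stuck = try_controlled_shutdown("topic-0", 1, Broker(2), alive_brokers={2})
# Same request, but with broker 1 still listed as alive.
fixed = try_controlled_shutdown("topic-0", 1, Broker(2), alive_brokers={1, 2})
print(stuck, fixed)  # prints: False True
```

In this model, retrying does not help in the first case because the input to the check never changes, which mirrors why the reported shutdown could only be ended by killing the broker.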

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
