[ 
https://issues.apache.org/jira/browse/KAFKA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565914#comment-14565914
 ] 

Joe Stein commented on KAFKA-1778:
----------------------------------

Hey, sorry for the late reply. I have now seen, on a few dozen clusters, 
situations where the broker gets into a state where the controller is hung and 
the only recourse is either to delete the /controller znode from Zookeeper to 
force a re-election or to shut down the broker. In the former case, I saw one 
situation where the entire cluster went down. I am fairly certain this was 
because of the version of Zookeeper they were running (3.4.5); however, I 
haven't ever tried to reproduce it. In the latter case, many folks don't want 
to shut down the broker because they are in high-traffic situations and doing 
so could be a lot worse than the controller not working... sometimes that 
changes and they shut the broker down so the controller can fail over and their 
partition reassignment can continue to the new brokers they just launched (as 
an example).

So, originally we were thinking of fixing this by having an admin call that 
could safely trigger another leader election. We have been finding, though, 
that just having the broker start without ever being able to become the 
controller (can.be.controller = false) is preferable in *a lot* of cases. This 
way some brokers will never be the controller, and the controller is always 
elected from among the brokers that are allowed to take the role.
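
To make the idea concrete, here is a rough toy model of it in Python. This is 
not Kafka code and not a proposed implementation: the Broker and Cluster 
classes and the shut_down helper are made up for illustration, and the single 
controller_id slot only stands in for the /controller znode. The one thing 
taken from this ticket is the proposed can.be.controller flag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Broker:
    broker_id: int
    # Models the proposed can.be.controller setting (hypothetical, not a released config).
    can_be_controller: bool = True


class Cluster:
    """Toy stand-in for controller election; controller_id plays the role of /controller."""

    def __init__(self, brokers: List[Broker]) -> None:
        self.brokers = list(brokers)
        self.controller_id: Optional[int] = None

    def elect(self) -> Optional[int]:
        # Only fill the slot when it is empty, the way a broker can only
        # create the controller znode when it does not already exist.
        if self.controller_id is None:
            for broker in self.brokers:
                if broker.can_be_controller:
                    self.controller_id = broker.broker_id
                    break
        return self.controller_id

    def shut_down(self, broker_id: int) -> Optional[int]:
        # Remove the broker; if it held the controller role, clear the slot
        # so the remaining eligible brokers can take over.
        self.brokers = [b for b in self.brokers if b.broker_id != broker_id]
        if self.controller_id == broker_id:
            self.controller_id = None
        return self.elect()
```

With brokers 1 (can_be_controller=False), 2 and 3, elect() picks broker 2, and 
shutting broker 2 down moves the role to broker 3. Broker 1 never becomes the 
controller, even when it is the only broker left.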

~ Joestein

> Create new re-elect controller admin function
> ---------------------------------------------
>
>                 Key: KAFKA-1778
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1778
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Joe Stein
>            Assignee: Abhishek Nigam
>             Fix For: 0.8.3
>
>
> kafka --controller --elect



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
