[ https://issues.apache.org/jira/browse/KAFKA-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323747#comment-16323747 ]

Andreas commented on KAFKA-6442:
--------------------------------

Thanks for the reply. I am afraid "unclean.leader.election.enable" is not set 
at all, so it should default to true.
Running ./zookeeper-shell.sh localhost:2181 <<< "ls /brokers/ids" returns

WatchedEvent state:SyncConnected type:None path:null
[1, 2, 3, 4]

which is legit.
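
For reference, this is how I checked the setting (the config path assumes a 
stock install, with the shell run from the Kafka bin directory):

grep unclean.leader.election ../config/server.properties

which returns nothing, so the broker-side default (true on 0.8.2.1) applies.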

> Catch 22 with cluster rebalancing
> ---------------------------------
>
>                 Key: KAFKA-6442
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6442
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.1
>            Reporter: Andreas
>
> PS. I classified this as a bug because I think the cluster should not be 
> stuck in that situation, apologies if that is wrong.
> Hi,
> I found myself in a situation that is a bit difficult to explain, so I will 
> skip how I ended up here and go straight to the problem.
> Some of the brokers of my cluster are permanently gone. Consequently, some 
> partitions now had offline leaders, so I used {{kafka-reassign-partitions.sh}} 
> to rebalance my topics, and for the most part that worked ok. Where it did not 
> work was for partitions whose leader, replica set (RS) and in-sync replica 
> set (ISR) were entirely on the gone brokers. Those got stuck halfway through, 
> and now look like
> Topic: topicA Partition: 32 Leader: -1 Replicas: 1,6,2,7,3,8 Isr: 
> (1,2,3 are legit, 6,7,8 permanently gone)
> So the first catch 22 is that I cannot elect a new leader, because the leader 
> needs to be elected from the ISR, and I cannot rebuild the ISR because the 
> partition has no leader. (A possible ZooKeeper-level escape hatch is sketched 
> below the quoted text.)
> The second catch 22 is that I cannot rerun {{kafka-reassign-partitions.sh}}, 
> because the previous run is supposedly still in progress, and I cannot 
> increase the number of partitions to account for the now permanently offline 
> ones, because that produces the following error: {{Error while executing 
> topic command requirement failed: All partitions should have the same number 
> of replicas.}} I cannot recover from that either, because I cannot run 
> {{kafka-reassign-partitions.sh}}. (Clearing the stuck reassignment is also 
> sketched below.)
> Is there a way to recover from such a situation? 
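
One last-resort idea I am considering for the leaderless partition (untested 
and risky; it assumes the standard partition-state layout Kafka keeps in 
ZooKeeper, so treat it strictly as a sketch) is to hand-edit the partition 
state znode so that a surviving replica becomes leader and the sole ISR member:

./zookeeper-shell.sh localhost:2181
get /brokers/topics/topicA/partitions/32/state
set /brokers/topics/topicA/partitions/32/state {"controller_epoch":N,"leader":1,"version":1,"leader_epoch":M,"isr":[1]}

Here N is the controller_epoch the get returned and M a leader_epoch bumped 
from the returned value (both placeholders, to be filled in by hand). Deleting 
/controller afterwards forces a controller re-election so the new state is 
picked up.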
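
For the stuck reassignment itself, the in-progress marker lives in a znode, so 
deleting it should let the tool run again (again untested, and again assuming 
the standard ZooKeeper layout):

./zookeeper-shell.sh localhost:2181
get /admin/reassign_partitions
delete /admin/reassign_partitions

after which {{kafka-reassign-partitions.sh}} could be rerun with only the live 
brokers (1,2,3) as targets.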



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
