[jira] [Commented] (KAFKA-6442) Catch 22 with cluster rebalancing

2018-01-12 Thread Andreas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323763#comment-16323763
 ] 

Andreas commented on KAFKA-6442:


For what it's worth, restarting node4 (for maintenance) seems to have gotten 
everything unstuck. I guess it triggered a leader re-election internally?
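
If node4 happened to be the active controller, restarting it would indeed force 
a controller election, and the new controller re-runs leader election for 
partitions that have none. A quick way to check which broker holds the 
controller role (a sketch, assuming the stock zookeeper-shell.sh tool and the 
localhost:2181 ZooKeeper address used elsewhere in this thread):

```shell
# Print the JSON payload of the /controller znode, which includes the
# broker id of the current controller.
# Assumption: ZooKeeper is reachable at localhost:2181.
./zookeeper-shell.sh localhost:2181 <<< "get /controller"
```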

> Catch 22 with cluster rebalancing
> -
>
> Key: KAFKA-6442
> URL: https://issues.apache.org/jira/browse/KAFKA-6442
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
>Reporter: Andreas
>
> PS. I classified this as a bug because I think the cluster should not be 
> stuck in that situation; apologies if that is wrong.
> Hi,
> I found myself in a situation that is a bit difficult to explain, so I will 
> skip how I ended up here and describe the problem.
> Some of the brokers in my cluster are permanently gone. Consequently, some 
> partitions now had offline leaders, so I used 
> {{kafka-reassign-partitions.sh}} to rebalance my topics, and for the most 
> part that worked OK. Where it did not work was for partitions whose leader, 
> replica set and ISR were entirely on the gone brokers. Those got stuck 
> halfway through and now look like:
> Topic: topicA Partition: 32 Leader: -1 Replicas: 1,6,2,7,3,8 Isr: 
> (1,2,3 are legit, 6,7,8 permanently gone)
> So the first catch-22 is that I cannot elect a new leader, because the 
> leader must be elected from the ISR, and I cannot rebuild the ISR because 
> the partition has no leader.
> The second catch-22 is that I cannot rerun {{kafka-reassign-partitions.sh}}, 
> because the previous run is supposedly still in progress, and I cannot 
> increase the number of partitions to account for the now permanently offline 
> ones, because that produces the following error: {{Error while executing 
> topic command requirement failed: All partitions should have the same number 
> of replicas.}} And I cannot recover from that, because I cannot run 
> {{kafka-reassign-partitions.sh}}.
> Is there a way to recover from such a situation? 
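
For the second catch-22, one widely used escape hatch (an assumption about the 
tool's bookkeeping, not something confirmed in this thread) is that the 
in-progress reassignment is recorded in the /admin/reassign_partitions znode, 
so clearing that znode lets a fresh plan be submitted that targets only the 
live brokers. A sketch, using the hypothetical topicA/partition-32 values from 
the description and brokers 1-4 as the surviving set:

```shell
# Assumptions: ZooKeeper at localhost:2181, stock Kafka CLI tools on PATH,
# and brokers 1,2,3,4 being the live ones. Replica counts per partition
# must stay consistent across the topic, so this is illustrative only.

# 1. Write a fresh plan that places topicA partition 32 on live brokers.
cat > plan.json <<'EOF'
{
  "version": 1,
  "partitions": [
    { "topic": "topicA", "partition": 32, "replicas": [1, 2, 3, 4] }
  ]
}
EOF

# 2. Clear the stuck reassignment so the tool will accept a new one
#    (run against the live cluster):
# ./zookeeper-shell.sh localhost:2181 <<< "delete /admin/reassign_partitions"

# 3. Resubmit the plan:
# ./kafka-reassign-partitions.sh --zookeeper localhost:2181 \
#     --reassignment-json-file plan.json --execute
```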



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6442) Catch 22 with cluster rebalancing

2018-01-12 Thread Andreas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323747#comment-16323747
 ] 

Andreas commented on KAFKA-6442:


Thanks for the reply. I am afraid "unclean.leader.election.enable" is not set 
at all, so it should default to true.
Running ./zookeeper-shell.sh localhost:2181 <<< "ls /brokers/ids" returns

WatchedEvent state:SyncConnected type:None path:null
[1, 2, 3, 4]

which is legit.



[jira] [Commented] (KAFKA-6442) Catch 22 with cluster rebalancing

2018-01-11 Thread Jan Filipiak (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323021#comment-16323021
 ] 

Jan Filipiak commented on KAFKA-6442:
-

I guess discussing this on the mailing list / Slack first could get you help more quickly.

Did you set "unclean.leader.election.enable" to false? An unclean election 
should allow broker 1 to take leadership with whatever it has.

If unclean election is enabled and the broker is still not stepping up, I 
usually delete the /controller node in ZooKeeper so a new controller is 
elected to take care of these partitions again.
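
The /controller deletion described above can be done with the stock ZooKeeper 
shell. A sketch, assuming a ZooKeeper at localhost:2181 as in the other 
commands in this thread; the znode is ephemeral, so the surviving brokers 
immediately elect a new controller:

```shell
# Force a controller re-election: deleting the ephemeral /controller znode
# makes the remaining brokers race to recreate it, and the new controller
# re-runs leader election for partitions currently showing Leader: -1.
# Assumption: ZooKeeper is reachable at localhost:2181.
./zookeeper-shell.sh localhost:2181 <<< "delete /controller"
```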

