Andreas created KAFKA-6442:
------------------------------
Summary: Catch 22 with cluster rebalancing
Key: KAFKA-6442
URL: https://issues.apache.org/jira/browse/KAFKA-6442
Project: Kafka
Issue Type: Bug
Reporter: Andreas
Fix For: 0.8.2.1
PS. I classified this as a bug because I think the cluster should not be stuck
in that situation, apologies if that is wrong.
Hi,
I ended up in a situation that is hard to explain, so I will skip how I got
here and go straight to the problem.
Some of the brokers in my cluster are permanently gone. As a result, some
partitions had offline leaders, so I used `kafka-reassign-partitions.sh` to
rebalance my topics, and for the most part that worked. It did not work for
partitions whose leader, replicas and ISR were all on the gone brokers. Those
got stuck halfway through the reassignment and now look like this:
`Topic: topicA Partition: 32 Leader: -1 Replicas: 1,6,2,7,3,8 Isr:`
(brokers 1, 2 and 3 are alive; 6, 7 and 8 are permanently gone)
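
To make that state concrete, here is a minimal Python sketch that parses a
partition line in the format shown above and splits its replica list into live
and gone brokers. It is not part of Kafka: the function names
`parse_describe_line` and `is_stuck` and the `LIVE_BROKERS` set are my own
assumptions, chosen only to illustrate the situation.

```python
import re

# Assumption: the brokers that are still alive in my cluster.
LIVE_BROKERS = {1, 2, 3}

def parse_describe_line(line):
    """Parse one partition line in the format quoted above (hypothetical helper)."""
    m = re.search(
        r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(-?\d+)"
        r"\s+Replicas:\s*([\d,]*)\s+Isr:\s*([\d,]*)",
        line,
    )
    if m is None:
        raise ValueError("unrecognised line: %r" % line)

    def to_ids(s):
        # An empty field (such as the empty Isr above) becomes an empty list.
        return [int(x) for x in s.split(",") if x]

    return {
        "topic": m.group(1),
        "partition": int(m.group(2)),
        "leader": int(m.group(3)),
        "replicas": to_ids(m.group(4)),
        "isr": to_ids(m.group(5)),
    }

def is_stuck(info):
    """A partition with no leader (-1) and an empty ISR (hypothetical check)."""
    return info["leader"] == -1 and not info["isr"]

line = "Topic: topicA Partition: 32 Leader: -1 Replicas: 1,6,2,7,3,8 Isr:"
info = parse_describe_line(line)
live = [b for b in info["replicas"] if b in LIVE_BROKERS]
gone = [b for b in info["replicas"] if b not in LIVE_BROKERS]
print(info["topic"], info["partition"], "stuck:", is_stuck(info))
print("live replicas:", live, "gone replicas:", gone)
```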
So the first catch 22 is that I cannot elect a new leader, because the leader
must be elected from the ISR, and I cannot rebuild the ISR because the
partition has no leader.
The second catch 22 is that I cannot rerun `kafka-reassign-partitions.sh`
because the previous run is supposedly still in progress. I also cannot
increase the number of partitions to make up for the now permanently offline
ones, because that fails with `Error while executing topic command
requirement failed: All partitions should have the same number of replicas.`
I cannot recover from that error either, because I cannot rerun
`kafka-reassign-partitions.sh`.
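
For reference, the assignment I would like to end up with for the stuck
partition can be sketched as a reassignment file. The snippet below is only an
illustration: `build_reassignment` is a hypothetical helper of mine, and the
target replica list `[1, 2, 3]` is my assumption based on the brokers that are
still alive.

```python
import json

def build_reassignment(topic, partition, replicas):
    """Build a reassignment plan for a single partition (hypothetical helper)."""
    plan = {
        "version": 1,
        "partitions": [
            {
                "topic": topic,
                "partition": partition,
                "replicas": list(replicas),
            }
        ],
    }
    return json.dumps(plan, indent=2)

# Assumption: keep only the live brokers 1, 2 and 3 as replicas.
plan = build_reassignment("topicA", 32, [1, 2, 3])
print(plan)
```

Saved to a file, this would normally be passed to
`kafka-reassign-partitions.sh` with `--reassignment-json-file` and `--execute`,
which is exactly the step I cannot run while the earlier reassignment is
reported as still in progress.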
Is there a way to recover from such a situation?