[ 
https://issues.apache.org/jira/browse/KAFKA-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010601#comment-16010601
 ] 

Edoardo Comar commented on KAFKA-5200:
--------------------------------------

Thanks [~huxi_2b]. Unfortunately, such steps would imply significant downtime, 
which is not acceptable to us.

We actually tested a much less intrusive way to handle this occurrence: 
delete the ZooKeeper info about the topic while the cluster is still 
running (minus the dead broker, of course), 
and then force *only the controller broker* to restart.

Even though this is less intrusive, it still means that for a short time two 
brokers are down.
With replication factor 3 and min.insync.replicas=2, this implies an outage for 
some clients, 
which remains unacceptable.
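
To illustrate the arithmetic, here is a minimal sketch in Python (not Kafka code). The helper name, the broker ids, and the choice of which two brokers are down are all hypothetical, and it assumes every replica on a live broker is in sync. A partition with min.insync.replicas=2 accepts acks=all writes only while at least 2 of its 3 replicas are in sync, so any partition with replicas on both unavailable brokers is blocked.

```python
from itertools import combinations

def writable(replicas, down, min_insync=2):
    """Return True if enough replicas sit on live brokers for acks=all writes.

    Simplifying assumption: every replica on a live broker is in sync.
    """
    live = [b for b in replicas if b not in down]
    return len(live) >= min_insync

brokers = [1, 2, 3, 4, 5]
# Hypothetical ids: the dead broker plus the controller being restarted.
down = {4, 5}

# Every possible placement of 3 replicas on 5 brokers.
assignments = list(combinations(brokers, 3))
blocked = [a for a in assignments if not writable(a, down)]

print(f"{len(blocked)} of {len(assignments)} possible replica sets are blocked")
for a in blocked:
    print("blocked:", a)
```

Under these assumptions, only the replica sets that contain both unavailable brokers lose write availability, which is why the outage affects some clients rather than all of them.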



> Deleting topic when one broker is down will prevent topic to be re-creatable
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-5200
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5200
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>            Reporter: Edoardo Comar
>
> In a cluster with 5 brokers, replication factor=3, min in sync=2,
> one broker went down.
> A user's app remained, of course, unaware of that and deleted a topic that 
> (unknowingly) had a replica on the dead broker.
> The topic went into 'pending delete' mode.
> The user then tried to recreate the topic, which failed, so his app was left 
> stuck: no working topic and no ability to create one.
> The reassignment tool fails to move the replica off the dead broker, 
> specifically because the broker with the partition replica to move is dead :-)
> Incidentally, the confluent-rebalancer docs say
> http://docs.confluent.io/current/kafka/post-deployment.html#scaling-the-cluster
> > Supports moving partitions away from dead brokers
> It'd be nice to similarly improve the open-source reassignment tool.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
