Yes, that seems like a real issue. Could you file a JIRA?

Thanks,
Jun

On Tue, May 13, 2014 at 11:58 AM, Alex Demidko <alexan...@metamarkets.com> wrote:

> Hi,
>
> Kafka version is 0.8.1.1. We have three machines: A, B, C. Let’s say there
> is a topic with replication factor 2 and one of its partitions, partition 1,
> is placed on brokers A and B. If broker A is already down, then for
> partition 1 we have: Leader: B, ISR: [B]. If the current controller is node
> C, then killing broker B will turn partition 1 into the state: Leader: -1,
> ISR: []. But if the current controller is node B, then killing it won’t
> update the leader/ISR for partition 1 even after the controller is
> restarted on node C, so partition 1 will forever think its leader is node
> B, which is dead.
>
> It looks like KafkaController.onBrokerFailure handles the situation where
> the failed broker is the partition leader: it sets the new leader value to
> -1. In contrast, KafkaController.onControllerFailover never removes the
> leader from a partition with all replicas offline, allegedly because the
> partition gets into the ReplicaDeletionIneligible state. Is this intended
> behavior?
>
> This behavior affects DefaultEventHandler.getPartition in the null key
> case: it can’t determine that partition 1 has no leader, and this results
> in event send failures.
>
> What we are trying to achieve is to be able to write data even if some
> partitions have lost all replicas, which is a rare yet still possible
> scenario. Using a null key looked suitable with minor DefaultEventHandler
> modifications (like getting rid of
> DefaultEventHandler.sendPartitionPerTopicCache to avoid caching and uneven
> event distribution), as we neither use log compaction nor rely on
> partitioning of the data. We had such behavior with Kafka 0.7: if a node is
> down, simply produce to a different one.
>
> Thanks,
> Alex
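
For illustration, the null-key case described in the quoted message can be sketched roughly as below. This is a simplified Java model, not Kafka's DefaultEventHandler code: the class NullKeyPartitionSketch, its methods, and the array-of-leader-ids representation are all hypothetical, and broker ids 0, 1, 2 merely stand in for A, B, C. It only shows why a stale leader entry keeps partition 1 among the candidates, whereas Leader: -1 would exclude it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; none of these names are Kafka classes or methods.
final class NullKeyPartitionSketch {
    // The thread uses -1 to mean "this partition has no leader".
    static final int NO_LEADER = -1;

    // leaderByPartition[i] is the broker id the metadata reports as leader
    // of partition i. Returns the ids of partitions that appear to have a leader.
    static List<Integer> availablePartitions(int[] leaderByPartition) {
        List<Integer> available = new ArrayList<>();
        for (int partition = 0; partition < leaderByPartition.length; partition++) {
            if (leaderByPartition[partition] != NO_LEADER) {
                available.add(partition);
            }
        }
        return available;
    }

    // With a null key, pick any partition that appears to have a leader.
    static int choosePartition(int[] leaderByPartition, Random random) {
        List<Integer> available = availablePartitions(leaderByPartition);
        if (available.isEmpty()) {
            throw new IllegalStateException("no partition has a leader");
        }
        return available.get(random.nextInt(available.size()));
    }

    public static void main(String[] args) {
        // Assumed ids: A = 0, B = 1, C = 2. Brokers A and B are both dead.
        // Controller was C when B died: partition 1 is marked leaderless.
        int[] updatedMetadata = {2, NO_LEADER, 2};
        // Controller was B when it died: partition 1 still claims leader B.
        int[] staleMetadata = {2, 1, 2};

        System.out.println(availablePartitions(updatedMetadata)); // [0, 2]
        System.out.println(availablePartitions(staleMetadata));   // [0, 1, 2]
        System.out.println(choosePartition(updatedMetadata, new Random()));
    }
}
```

With the stale metadata, partition 1 remains a candidate, so a null-key send can still be routed to the dead broker B and fail, which matches the failure described above.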