[ 
https://issues.apache.org/jira/browse/KAFKA-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-6082.
-----------------------------
    Resolution: Fixed

> consider fencing zookeeper updates with controller epoch zkVersion
> ------------------------------------------------------------------
>
>                 Key: KAFKA-6082
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6082
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Onur Karaman
>            Assignee: Zhanxiang (Patrick) Huang
>            Priority: Major
>             Fix For: 2.1.0
>
>
>  
> Kafka controller may fail to function properly (even after repeated 
> controller movement) due to the following sequence of events:
>  - User requests topic deletion
>  - Controller A deletes the partition znode
>  - Controller B becomes controller and reads the topic znode
>  - Controller A deletes the topic znode and remove the topic from the topic 
> deletion znode
>  - Controller B reads the partition znode and topic deletion znode
>  - According to controller B's context, the topic znode exists, the topic is 
> not listed for deletion, and some partition is not found for the given topic. 
> Then controller B will create topic znode with empty data (i.e. partition 
> assignment) and create the partition znodes.
>  - All controller after controller B will fail because there is not data in 
> the topic znode.
> The long term solution is to have a way to prevent old controller from 
> writing to zookeeper if it is not the active controller. One idea is to use 
> the zookeeper multi API (See 
> [https://zookeeper.apache.org/doc/r3.4.3/api/org/apache/zookeeper/ZooKeeper.html#multi(java.lang.Iterable))]
>  such that controller only writes to zookeeper if the zk version of the 
> controller epoch znode has not been changed.
> The short term solution is to let controller reads the topic deletion znode 
> first. If the topic is still listed in the topic deletion znode, then the new 
> controller will properly handle partition states of this topic without 
> creating partition znodes for this topic. And if the topic is not listed in 
> the topic deletion znode, then both the topic znode and the partition znodes 
> of this topic should have been deleted by the time the new controller tries 
> to read them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to