[ https://issues.apache.org/jira/browse/KAFKA-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dong Lin updated KAFKA-6618:
----------------------------
    Issue Type: Bug  (was: Improvement)

> Prevent two controllers from updating znodes concurrently
> ---------------------------------------------------------
>
>                 Key: KAFKA-6618
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6618
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Dong Lin
>            Assignee: Dong Lin
>            Priority: Major
>
> The Kafka controller may fail to function properly (even after repeated
> controller movement) due to the following sequence of events:
>  - User requests topic deletion
>  - Controller A deletes the partition znode
>  - Controller B becomes controller and reads the topic znode
>  - Controller A deletes the topic znode and removes the topic from the
> topic deletion znode
>  - Controller B reads the partition znode and the topic deletion znode
>  - According to controller B's context, the topic znode exists, the topic
> is not listed for deletion, and some partition znodes are not found for
> the given topic. Controller B will then create the topic znode with empty
> data (i.e. an empty partition assignment) and create the partition znodes.
>  - All controllers after controller B will fail because there is no data
> in the topic znode.
> The long term solution is to have a way to prevent an old controller from
> writing to zookeeper if it is not the active controller. One idea is to
> use the zookeeper multi API (see
> [https://zookeeper.apache.org/doc/r3.4.3/api/org/apache/zookeeper/ZooKeeper.html#multi(java.lang.Iterable)])
> such that the controller only writes to zookeeper if the zk version of the
> controller znode has not been changed.
> The short term solution is to let the controller read the topic deletion
> znode first. If the topic is still listed in the topic deletion znode,
> then the new controller will properly handle the partition states of this
> topic without creating partition znodes for this topic. If the topic is
> not listed in the topic deletion znode, then both the topic znode and the
> partition znodes of this topic should have been deleted by the time the
> new controller tries to read them.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
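
The long term idea in the description (write only if the zk version of the controller znode is unchanged) can be illustrated with a small in-memory model. This is only a sketch, not the ZooKeeper client API and not the fix that was committed: `ToyZk`, `BadVersionError`, `guarded_write` and the tuple-based op format are hypothetical names invented here, the znode paths are illustrative, and the model assumes that every controller change bumps the version of the guarded znode.

```python
from dataclasses import dataclass, field


class BadVersionError(Exception):
    """Raised when a version check inside a batch does not match."""


@dataclass
class ToyZk:
    """In-memory stand-in for a versioned znode store (hypothetical, not ZooKeeper)."""
    data: dict = field(default_factory=dict)      # path -> bytes
    versions: dict = field(default_factory=dict)  # path -> int

    def set(self, path, value):
        """Create or overwrite a znode and return its new version."""
        self.data[path] = value
        self.versions[path] = self.versions.get(path, -1) + 1
        return self.versions[path]

    def multi(self, ops):
        """Apply every op atomically: work on copies, commit only if all succeed."""
        data, versions = dict(self.data), dict(self.versions)
        for op in ops:
            kind, path = op[0], op[1]
            if kind == "check":
                if versions.get(path) != op[2]:
                    raise BadVersionError(path)
            elif kind == "set":
                data[path] = op[2]
                versions[path] = versions.get(path, -1) + 1
            elif kind == "delete":
                if path not in data:
                    raise KeyError(path)
                del data[path]
                del versions[path]
            else:
                raise ValueError(f"unknown op: {kind}")
        # Nothing failed, so publish the new state in one step.
        self.data, self.versions = data, versions


def guarded_write(zk, controller_version, ops):
    """Run ops only if /controller still has the version this controller saw."""
    try:
        zk.multi([("check", "/controller", controller_version)] + list(ops))
        return True
    except BadVersionError:
        return False


# Controller A is elected and a topic is queued for deletion.
zk = ToyZk()
version_seen_by_a = zk.set("/controller", b"A")
zk.set("/brokers/topics/t", b"assignment")
zk.set("/admin/delete_topics/t", b"")

# Controller B takes over; in this model that bumps the znode version.
version_seen_by_b = zk.set("/controller", b"B")

delete_topic = [("delete", "/brokers/topics/t"),
                ("delete", "/admin/delete_topics/t")]

# A is stale, so its batch is rejected as a whole and nothing is deleted.
print(guarded_write(zk, version_seen_by_a, delete_topic))
print("/brokers/topics/t" in zk.data)

# B is the active controller, so the same batch goes through.
print(guarded_write(zk, version_seen_by_b, delete_topic))
```

Because the version check and the deletes travel in one batch, a stale controller cannot interleave a partial update with the new controller's reads, which is the interleaving the sequence of events above depends on.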