[ https://issues.apache.org/jira/browse/FLINK-11809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
MalcolmSanders reassigned FLINK-11809: -------------------------------------- Assignee: MalcolmSanders > Implement etcd based StateHandleStore > ------------------------------------- > > Key: FLINK-11809 > URL: https://issues.apache.org/jira/browse/FLINK-11809 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination > Reporter: MalcolmSanders > Assignee: MalcolmSanders > Priority: Minor > > Implement StateHandleStore using jetcd. > Previously ZooKeeperStateHandleStore stores data in a dfs file while records > its metadata to a zookeeper node in order to keep the data size of a > zookeeper node small. EtcdStateHandleStore should work in the same way while > there is a corner case that should be carefully dealt with using etcd. > As described in FLINK-6612, the ZooKeeperStateHandleStore does not guard > against concurrent delete operations which could happen in case of a lost > leadership and a new leadership grant. The problem is that checkpoint nodes > can get deleted even after they have been recovered by another > ZooKeeperCompletedCheckpointStore. This corrupts the recovered checkpoint and > thwarts future recoveries. In order to guard against deletions of ZooKeeper > nodes which are still being used by a different ZooKeeperStateHandleStore, a > locking mechanism has been introduced to make sure that zookeeper nodes are > allowed to be deleted only after all ZooKeeperStateHandleStores have released > their lock. The locking mechanism is implemented via ephemeral child nodes of > the respective ZooKeeper node. Whenever a ZooKeeperStateHandleStore wants to > lock a ZNode, thus, protecting it from being deleted, it creates an ephemeral > child node. The node's name is unique to the ZooKeeperStateHandleStore > instance. The delete operations will then only delete the node if it does not > have any children associated. In order to guard against orphaned lock nodes, > they are created as ephemeral nodes. This means that they will be deleted by > ZooKeeper once the connection of the ZooKeeper client which created the node > timed out. > The solution leverages that a zookeeper directory cannot be deleted if it > still has child nodes and a ephemeral node will be deleted by zookeeper > server once the corresponding client is disconnected. Since etcd doesn’t have > hierarchical structure like zookeeper, there is no actual relations between a > path and its so-called parent directory. Suppose we create a persistent > key-value to store metadata and then create an ephemeral key-value using > previous key as the prefix. Once the client disconnects with etcd server, the > ephemeral key-value pair will be deleted from etcd server while there’ll be > no effect on its parent key-value pair. > My proposal to tackle this case is illustrated in Part 6 Implementation of > etcd based StateHandleStore in [design > doc|https://docs.google.com/document/d/12-gIZDuT4IOWG7gmwSqNFsOHuGlkdRHge0ahJ7M311Y/edit#heading=h.sqkj9zjvgicu]. > Any comments or suggestions will be appreciated. -- This message was sent by Atlassian JIRA (v7.6.3#76005)