[ 
https://issues.apache.org/jira/browse/FLINK-11809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-11809:
-------------------------------------

    Assignee:     (was: MalcolmSanders)

> Implement etcd based StateHandleStore
> -------------------------------------
>
>                 Key: FLINK-11809
>                 URL: https://issues.apache.org/jira/browse/FLINK-11809
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: MalcolmSanders
>            Priority: Minor
>
> Implement StateHandleStore using jetcd.
> The existing ZooKeeperStateHandleStore stores the data in a DFS file and
> records only its metadata in a ZooKeeper node in order to keep the data size
> of a ZooKeeper node small. EtcdStateHandleStore should work in the same way,
> but there is a corner case that must be handled carefully when using etcd.
> As described in FLINK-6612, the ZooKeeperStateHandleStore does not guard 
> against concurrent delete operations which could happen in case of a lost 
> leadership and a new leadership grant. The problem is that checkpoint nodes 
> can get deleted even after they have been recovered by another 
> ZooKeeperCompletedCheckpointStore. This corrupts the recovered checkpoint and 
> thwarts future recoveries. In order to guard against deletions of ZooKeeper 
> nodes which are still being used by a different ZooKeeperStateHandleStore, a 
> locking mechanism has been introduced to make sure that zookeeper nodes are 
> allowed to be deleted only after all ZooKeeperStateHandleStores have released 
> their lock. The locking mechanism is implemented via ephemeral child nodes of 
> the respective ZooKeeper node. Whenever a ZooKeeperStateHandleStore wants to
> lock a ZNode, thus protecting it from being deleted, it creates an ephemeral
> child node. The node's name is unique to the ZooKeeperStateHandleStore
> instance. A delete operation will then only delete the node if it has no
> children. In order to guard against orphaned lock nodes, they are created as
> ephemeral nodes. This means that they will be deleted by ZooKeeper once the
> connection of the ZooKeeper client which created the node times out.
> This solution relies on two ZooKeeper properties: a node cannot be deleted
> while it still has child nodes, and an ephemeral node is deleted by the
> ZooKeeper server once the corresponding client is disconnected. Since etcd
> does not have a hierarchical structure like ZooKeeper, there is no actual
> relation between a key and its so-called parent directory. Suppose we create
> a persistent key-value pair to store metadata and then create an ephemeral
> key-value pair using the previous key as its prefix. Once the client
> disconnects from the etcd server, the ephemeral key-value pair will be
> deleted from the etcd server, while its parent key-value pair is unaffected.
> Conversely, the parent key can be deleted regardless of any keys that share
> its prefix, so the prefixed key alone does not protect it.
> My proposal for handling this case is described in Part 6, "Implementation
> of etcd based StateHandleStore", of the [design
> doc|https://docs.google.com/document/d/12-gIZDuT4IOWG7gmwSqNFsOHuGlkdRHge0ahJ7M311Y/edit#heading=h.sqkj9zjvgicu].
> Any comments or suggestions will be appreciated.
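
The corner case described above can be illustrated with a small in-memory sketch. It is only a toy model under stated assumptions: the class and method names (EphemeralLockSketch, TreeStore, FlatStore, expireSession) are hypothetical, it uses neither the ZooKeeper client nor jetcd, and it is not the design proposed in the linked document. TreeStore mimics the "cannot delete a node that has children" rule, while FlatStore mimics a flat key space in which a key that merely shares a prefix gives no protection.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy in-memory models (hypothetical names); not the ZooKeeper or etcd APIs. */
class EphemeralLockSketch {

    /** Hierarchical model: a node cannot be deleted while it has lock children. */
    static class TreeStore {
        // node path -> owners of the ephemeral lock children under that node
        private final Map<String, Set<String>> locks = new HashMap<>();

        void create(String path) {
            locks.putIfAbsent(path, new HashSet<>());
        }

        /** Adds an ephemeral child named after the locking store instance. */
        void lock(String path, String owner) {
            Set<String> children = locks.get(path);
            if (children == null) {
                throw new IllegalStateException("no such node: " + path);
            }
            children.add(owner);
        }

        /** Simulates a session timeout: every ephemeral child of the owner vanishes. */
        void expireSession(String owner) {
            for (Set<String> children : locks.values()) {
                children.remove(owner);
            }
        }

        /** Deletes the node only if it exists and has no children left. */
        boolean delete(String path) {
            Set<String> children = locks.get(path);
            if (children == null || !children.isEmpty()) {
                return false;
            }
            locks.remove(path);
            return true;
        }
    }

    /** Flat model: keys that share a prefix are unrelated entries. */
    static class FlatStore {
        private final TreeMap<String, String> kv = new TreeMap<>();

        void put(String key, String value) {
            kv.put(key, value);
        }

        boolean delete(String key) {
            return kv.remove(key) != null;
        }

        boolean exists(String key) {
            return kv.containsKey(key);
        }
    }

    public static void main(String[] args) {
        TreeStore tree = new TreeStore();
        tree.create("/checkpoints/1");
        tree.lock("/checkpoints/1", "store-B");
        System.out.println(tree.delete("/checkpoints/1")); // false: still locked
        tree.expireSession("store-B");
        System.out.println(tree.delete("/checkpoints/1")); // true: lock is gone

        FlatStore flat = new FlatStore();
        flat.put("/checkpoints/1", "metadata");
        flat.put("/checkpoints/1/store-B", ""); // intended as a lock
        System.out.println(flat.delete("/checkpoints/1")); // true: nothing prevents it
    }
}
```

In the flat model the "lock" key survives the deletion of the key it was meant to protect, which is why an etcd based implementation needs an explicit check rather than relying on key naming alone.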



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
