[ 
https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729516#comment-16729516
 ] 

TisonKun edited comment on FLINK-10333 at 1/4/19 2:20 AM:
----------------------------------------------------------

Following the ZK transaction idea, I drafted a design doc on how to rework the 
ZK-based stores to achieve atomicity.

Mainly, we will re-implement the leader election service to expose the 
{{election-node-path}}. This could be done by porting Curator's 
{{LeaderLatch}} implementation to Flink. We then generate the dispatcher/jm/rm's 
{{session id}} from the {{election-node-path}}, and when communicating with ZK 
we pass the {{session id}}, which is converted underneath back to the 
{{election-node-path}} to ensure that the caller is the leader, just as in the 
code snippet above.
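
To make the idea concrete, here is a minimal sketch of a leader-guarded write. It is only an illustration under assumptions: {{InMemoryZk}}, {{SessionId}}, {{LeaderGuardedStore}} and the node paths are hypothetical names, not Flink or Curator classes, and the in-memory map merely stands in for ZooKeeper. Against a real ensemble the check and the write would presumably go into one ZooKeeper transaction instead of a synchronized method.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory stand-in for ZooKeeper (hypothetical, not a real client). */
final class InMemoryZk {
    private final Map<String, byte[]> nodes = new HashMap<>();

    synchronized void create(String path, byte[] data) {
        nodes.put(path, data);
    }

    synchronized void delete(String path) {
        nodes.remove(path);
    }

    /** Returns the node's data, or null if the node does not exist. */
    synchronized byte[] get(String path) {
        return nodes.get(path);
    }

    /**
     * Models a transaction: the existence check on checkPath and the write to
     * path happen under one lock, so no other caller can interleave.
     */
    synchronized boolean checkAndPut(String checkPath, String path, byte[] data) {
        if (!nodes.containsKey(checkPath)) {
            return false; // the caller's election node is gone: not the leader
        }
        nodes.put(path, data);
        return true;
    }
}

/** Hypothetical session id that simply wraps the election node path. */
final class SessionId {
    private final String electionNodePath;

    private SessionId(String electionNodePath) {
        this.electionNodePath = electionNodePath;
    }

    static SessionId fromElectionNodePath(String path) {
        return new SessionId(path);
    }

    String toElectionNodePath() {
        return electionNodePath;
    }
}

/** Store whose writes succeed only while the caller's election node exists. */
final class LeaderGuardedStore {
    private final InMemoryZk zk;

    LeaderGuardedStore(InMemoryZk zk) {
        this.zk = zk;
    }

    boolean put(SessionId sessionId, String path, String value) {
        return zk.checkAndPut(
                sessionId.toElectionNodePath(),
                path,
                value.getBytes(StandardCharsets.UTF_8));
    }

    String get(String path) {
        byte[] data = zk.get(path);
        return data == null ? null : new String(data, StandardCharsets.UTF_8);
    }
}

final class LeaderGuardedStoreDemo {
    public static void main(String[] args) {
        InMemoryZk zk = new InMemoryZk();
        String electionNode = "/leader/dispatcher/latch-0000000001"; // made-up path
        zk.create(electionNode, new byte[0]);

        SessionId sessionId = SessionId.fromElectionNodePath(electionNode);
        LeaderGuardedStore store = new LeaderGuardedStore(zk);

        System.out.println(store.put(sessionId, "/jobgraphs/job-1", "v1")); // true
        zk.delete(electionNode); // leadership lost
        System.out.println(store.put(sessionId, "/jobgraphs/job-1", "v2")); // false
        System.out.println(store.get("/jobgraphs/job-1")); // v1
    }
}
```

The point of the sketch is that a former leader cannot overwrite stored state once its election node has disappeared, because the leadership check and the write cannot be separated.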

The work can be split into the following subtasks:

# re-layout the ZooKeeper content
# re-implement ZK leader election and expose the election node path, which can 
be converted to a session id
# re-implement the ZK submitted job graph store, reading and writing with a 
session id (the session id is then converted to the election node path)
# re-implement the ZK completed checkpoint store, reading and writing with a 
session id (the session id is then converted to the election node path)
# let only the dispatcher maintain the running job registry

What do you think? [~till.rohrmann] [~StephanEwen]



was (Author: tison):
Following the ZK transaction idea, I drafted a design doc on how to rework the 
ZK-based stores to achieve atomicity.

Mainly, we will re-implement the leader election service to expose the 
{{election-node-path}}. This could be done by porting Curator's 
{{LeaderLatch}} implementation to Flink. We then generate the dispatcher/jm/rm's 
{{session id}} from the {{election-node-path}}, and when communicating with ZK 
we pass the {{session id}}, which is converted underneath back to the 
{{election-node-path}} to ensure that the caller is the leader, just as in the 
code snippet above.

The work can be split into the following subtasks:

# re-implement ZK leader election and expose the election node path, which can 
be converted to a session id
# re-implement the ZK submitted job graph store, reading and writing with a 
session id (the session id is then converted to the election node path)
# re-implement the ZK completed checkpoint store, reading and writing with a 
session id (the session id is then converted to the election node path)
# discuss how the running job registry should work (this is an ongoing issue, 
mainly about how we publish the status of a job)
# (maybe) re-layout the ZooKeeper content

What do you think? [~till.rohrmann] [~StephanEwen]


> Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, 
> CompletedCheckpoints)
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-10333
>                 URL: https://issues.apache.org/jira/browse/FLINK-10333
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>    Affects Versions: 1.5.3, 1.6.0, 1.7.0
>            Reporter: Till Rohrmann
>            Priority: Major
>             Fix For: 1.8.0
>
>
> While going over the ZooKeeper based stores 
> ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, 
> {{ZooKeeperCompletedCheckpointStore}}) and the underlying 
> {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were 
> introduced with past incremental changes.
> * Depending on whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} 
> or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization 
> problems will either lead to removing the Znode or not
> * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of 
> exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case 
> of a failure)
> * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be 
> better to move {{RetrievableStateStorageHelper}} out of it for a better 
> separation of concerns
> * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even 
> if it is locked. This should not happen since it could leave another system 
> in an inconsistent state (imagine a changed {{JobGraph}} which restores from 
> an old checkpoint)
> * Redundant but also somewhat inconsistent put logic in the different stores
> * Shadowing of ZooKeeper specific exceptions in {{ZooKeeperStateHandleStore}} 
> which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}}
> * Getting rid of the {{SubmittedJobGraphListener}} would be helpful
> These problems made me wonder how reliably these components actually work. 
> Since these components are very important, I propose to refactor them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
