[ 
https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911377#comment-16911377
 ] 

TisonKun edited comment on FLINK-10333 at 8/20/19 1:56 PM:
-----------------------------------------------------------

Thanks for your review [~till.rohrmann]. For the implementation side it is 
helpful that we follow a staged fashion and focus on stage by stage. Actually, 
we internally implement it first simply a new {{HighAvailabilityServices}}. The 
document will provide a overview and details on the fly with discussion wrt the 
title "Rethink high-availability store".

For your concern on

>If Flink strictly requires this, then certain implementations might no longer 
>be possible.

we don't stick the implementation follow a transactional way but the statement 
is "commit only if it is the leader". There is other approach to achieve this 
requirement, for example, in standalone case one is always the leader, or in 
BookKeeper they have a mechanism to fencing stale leader. As for the statement 
"commit only if it is the leader", it is one of the basic requirements to make 
distributed coordination works and without it the implementation should no 
longer be valid.


was (Author: tison):
Thanks for your review [~till.rohrmann]. In the implementation side it is 
helpful that we follow a staged fashion and focus on stage by stage. Actually, 
we internally implement it first simply a new {{HighAvailabilityServices}}. The 
document will provide a overview and details on the fly with discussion wrt the 
title "Rethink high-availability store".

For your concern on

>If Flink strictly requires this, then certain implementations might no longer 
>be possible.

we don't stick the implementation follow a transactional way but the statement 
is "commit only if it is the leader". There is other approach to achieve this 
requirement, for example, in standalone case one is always the leader, or in 
BookKeeper they have a mechanism to fencing stale leader. As for the statement 
"commit only if it is the leader", it is one of the basic requirements to make 
distributed coordination works and without it the implementation should no 
longer be valid.

> Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, 
> CompletedCheckpoints)
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-10333
>                 URL: https://issues.apache.org/jira/browse/FLINK-10333
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.5.3, 1.6.0, 1.7.0
>            Reporter: Till Rohrmann
>            Priority: Major
>
> While going over the ZooKeeper based stores 
> ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, 
> {{ZooKeeperCompletedCheckpointStore}}) and the underlying 
> {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were 
> introduced with past incremental changes.
> * Depending whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} 
> or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization 
> problems will either lead to removing the Znode or not
> * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of 
> exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case 
> of a failure)
> * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be 
> better to move {{RetrievableStateStorageHelper}} out of it for a better 
> separation of concerns
> * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even 
> if it is locked. This should not happen since it could leave another system 
> in an inconsistent state (imagine a changed {{JobGraph}} which restores from 
> an old checkpoint)
> * Redundant but also somewhat inconsistent put logic in the different stores
> * Shadowing of ZooKeeper specific exceptions in {{ZooKeeperStateHandleStore}} 
> which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}}
> * Getting rid of the {{SubmittedJobGraphListener}} would be helpful
> These problems made me think how reliable these components actually work. 
> Since these components are very important, I propose to refactor them.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

Reply via email to