[jira] [Commented] (ZOOKEEPER-1011) fix Java Barrier Documentation example's race condition issue and polish up the Barrier Documentation

Michael K. Edwards (JIRA) Wed, 21 Nov 2018 18:07:34 -0800


    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695473#comment-16695473 ]


Michael K. Edwards commented on ZOOKEEPER-1011:
-----------------------------------------------

Appropriate for 3.5.5?

> fix Java Barrier Documentation example's race condition issue and polish up 
> the Barrier Documentation
> -----------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1011
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1011
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: documentation
>            Reporter: Semih Salihoglu
>            Assignee: maoling
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> There is a race condition in the Barrier example in the Java documentation:
> http://hadoop.apache.org/zookeeper/docs/current/zookeeperTutorial.html. It is
> in the enter() method. Here is the original example:
> boolean enter() throws KeeperException, InterruptedException {
>     zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE,
>             CreateMode.EPHEMERAL_SEQUENTIAL);
>     while (true) {
>         synchronized (mutex) {
>             List<String> list = zk.getChildren(root, true);
>             if (list.size() < size) {
>                 mutex.wait();
>             } else {
>                 return true;
>             }
>         }
>     }
> }
> Here is the race condition scenario:
> Suppose two machines/nodes, node1 and node2, use this code to synchronize
> over ZK with a barrier size of 2, and the following steps take place:
> 1- node1 calls zk.create, then reads the number of children, sees that it
> is 1, and starts waiting.
> 2- node2 calls zk.create but has not yet called zk.getChildren (say it is
> very slow).
> 3- node1 is notified that the children of the root znode changed; it checks
> that the size is now 2, so enter() returns. node1 does its work and then
> leaves the barrier, deleting its node.
> 4- node2 calls zk.getChildren and, because node1 has already left, sees only
> 1 child. Since node1 will never enter the barrier again, node2 waits forever.
> --- End of scenario ---
> Here are Flavio's suggested fixes (copied from the email thread):
> ...
> I see two possible action points out of this discussion:
>       
> 1- State clearly at the beginning that the example is not correct if a
> process may finish its computation before another has started, and that the
> example is there for illustration purposes only;
> 2- Add another example after the current one that discusses the problem
> and shows how to fix it. This is an interesting option that illustrates how
> one could reason about a solution when developing with ZooKeeper.
> ...
> We'll go with the 2nd option.
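For illustration only, here is a minimal, self-contained Java sketch. It is not taken from the issue or the ZooKeeper codebase. It replays the four steps of the scenario against a hypothetical in-memory stand-in for the barrier's root znode, so no ZooKeeper server or client library is needed. The sketch contrasts the original check ("pass if I currently see `size` children") with an assumed fix in the spirit of the double-barrier recipe in the ZooKeeper recipes documentation. In that fix, the first process to observe `size` children records the fact in a persistent `ready` znode, and later arrivals pass if `ready` exists, even after early processes have left. The names `BarrierReplay`, `FakeRoot`, `mayPass`, `originalMayPass` and `fixedMayPass` are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, single-threaded replay of the ZOOKEEPER-1011 scenario.
// FakeRoot is an in-memory stand-in for the barrier's root znode; it is
// NOT the ZooKeeper API, and no server or client library is involved.
class BarrierReplay {
    static final int SIZE = 2; // barrier size: node1 and node2

    static class FakeRoot {
        final Set<String> children = new HashSet<>(); // ephemeral child znodes
        boolean ready = false;                        // models a persistent root + "/ready"
    }

    // Original enter() condition: pass only if SIZE children are visible right now.
    static boolean originalMayPass(FakeRoot root) {
        return root.children.size() >= SIZE;
    }

    // Assumed fix (ready-znode idea): the first process that sees SIZE children
    // creates "ready"; from then on anyone may pass, even if others already left.
    static boolean fixedMayPass(FakeRoot root) {
        if (root.ready) {
            return true;
        }
        if (root.children.size() >= SIZE) {
            root.ready = true; // a real client would create root + "/ready" here
            return true;
        }
        return false;
    }

    static boolean mayPass(FakeRoot root, boolean useFix) {
        return useFix ? fixedMayPass(root) : originalMayPass(root);
    }

    // Replays steps 1-4 of the scenario; returns whether node2 gets through.
    static boolean replay(boolean useFix) {
        FakeRoot root = new FakeRoot();

        // Step 1: node1 creates its child, sees 1 child and waits.
        root.children.add("node1");
        boolean node1Passes = mayPass(root, useFix); // false: 1 < SIZE

        // Step 2: node2 creates its child but is slow to call getChildren.
        root.children.add("node2");

        // Step 3: node1 is notified, re-checks, passes, does its work, then leaves.
        node1Passes = mayPass(root, useFix);
        if (node1Passes) {
            root.children.remove("node1"); // leaving deletes node1's znode
        }

        // Step 4: node2 finally reads the children list.
        return mayPass(root, useFix);
    }

    public static void main(String[] args) {
        System.out.println("original enter(): node2 passes = " + replay(false)); // false
        System.out.println("ready-node fix:   node2 passes = " + replay(true));  // true
    }
}
```

The design point is that the original check depends on a count that shrinks as soon as a fast process leaves. The `ready` znode, by contrast, is persistent, so the "barrier reached" decision outlives the early leaver. In a real client, the `ready` node would presumably be created with `CreateMode.PERSISTENT`. A `KeeperException.NodeExistsException` from a concurrent creator would be treated as success, and waiters would watch it with `zk.exists(root + "/ready", true)`. These details are assumptions and are not taken from the patch for this issue.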



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

