[ 
https://issues.apache.org/jira/browse/CURATOR-233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713621#comment-14713621
 ] 

J D commented on CURATOR-233:
-----------------------------

Hi Mike,

I can confirm that the code works for 2 nodes.

However, I think the two lines marked below should be in the else branch.

{code:title=DistributedDoubleBarrier.java|borderStyle=solid}
            String watchPath; // Watch somebody else that still exists
            if ( ourIndex == 0 )
            {
                watchPath = ZKPaths.makePath(barrierPath, children.get(children.size() - 1));
            }
            else
            {
                watchPath = ZKPaths.makePath(barrierPath, children.get(0));
                checkDeleteOurPath(ourNodeShouldExist); //here
                ourNodeShouldExist = false; //here
            }

            Stat stat = client.checkExists().usingWatcher(watcher).forPath(watchPath);

            checkDeleteOurPath(ourNodeShouldExist); //not here
            ourNodeShouldExist = false; //not here
{code}


As you guessed correctly, the fix changes the behavior for 3+ nodes. The reason 
is that the exit barrier uses a shortcut that is not compatible with client 0 
leaving prematurely 
(http://zookeeper.apache.org/doc/r3.1.2/recipes.html#sc_doubleBarriers).

Here, client 0 is the client with ourIndex == 0.

Client 0 watches one of the other nodes (and leaves the barrier once it is the 
only one left).
All other clients watch client 0 (and leave the barrier once client 0 has left).
Thus, if any client other than client 0 leaves after maxWaitMs, nothing happens 
and all remaining clients keep waiting.
But if client 0 leaves after maxWaitMs, all other nodes leave together with 
client 0 (even if they do not have a maxWaitMs time limit).
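
The cases above can be reproduced with a small standalone Java sketch. It is only a simplified model of the watch scheme as described in this comment, not Curator code: it does not talk to ZooKeeper, client indices stay fixed instead of being recomputed from the children list, and the names ExitBarrierSketch, watchedIndex and leaversAfterTimeout are hypothetical.

```java
import java.util.ArrayList;
import java.util.Set;
import java.util.TreeSet;

// Simplified model of the exit-barrier shortcut (assumption: indices are fixed).
class ExitBarrierSketch {
    // Client 0 watches the last child; every other client watches child 0.
    static int watchedIndex(int ourIndex, int clientCount) {
        return (ourIndex == 0) ? clientCount - 1 : 0;
    }

    // Returns the clients that end up leaving when `timedOut` gives up after maxWaitMs.
    static Set<Integer> leaversAfterTimeout(int timedOut, int clientCount) {
        Set<Integer> present = new TreeSet<>();
        for (int i = 0; i < clientCount; i++) {
            present.add(i);
        }
        Set<Integer> left = new TreeSet<>();
        present.remove(timedOut);
        left.add(timedOut);

        boolean changed = true;
        while (changed) {
            changed = false;
            for (Integer c : new ArrayList<Integer>(present)) {
                // Client 0 leaves only when it is alone; the others leave once client 0 is gone.
                boolean leaves = (c == 0) ? present.size() == 1 : !present.contains(0);
                if (leaves) {
                    present.remove(c);
                    left.add(c);
                    changed = true;
                }
            }
        }
        return left;
    }

    public static void main(String[] args) {
        System.out.println(leaversAfterTimeout(1, 3)); // [1]
        System.out.println(leaversAfterTimeout(0, 3)); // [0, 1, 2]
    }
}
```

With three clients, a timeout of client 1 removes only client 1, while a timeout of client 0 takes everybody out of the barrier, which is the asymmetry described above.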


Best regards,

J D

> Bug in double barrier
> ---------------------
>
>                 Key: CURATOR-233
>                 URL: https://issues.apache.org/jira/browse/CURATOR-233
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 2.8.0
>            Reporter: J D
>            Assignee: Mike Drob
>             Fix For: 2.9.0
>
>         Attachments: DoubleBarrierClient.java, DoubleBarrierTester.java
>
>
> Hi,
> I think I discovered a bug in the internalLeave method of the double barrier 
> implementation.
> When a client is told to leave the barrier after maxWait, it does not do so. A 
> flag is set, but the client does not leave the barrier; instead it keeps 
> iterating through the control loop and drives CPU usage to 100%.
> I have attached an example.
> Best regards
> Lianro


