Guys, I've been looking into a fix for CURATOR-79 (https://issues.apache.org/jira/browse/CURATOR-79) and have found it to be slightly more complicated than initially expected.
The locking recipes use protected zNodes for locks (i.e. the zNode name contains a random UUID that is tied to a particular builder instance). That is sensible, but there seems to be an issue with it.

The protected logic looks at the cause of a failed create. If the cause is connection loss, it does an ensured delete on the path it was trying to create, so the node is removed if it did get created. In CURATOR-79, an InterruptedException causes the create call to fail while it is waiting for the response from ZK. Because that is not a connection loss, the protected logic does not fire and we end up with an orphaned lock node that nobody will release while the session is alive.

It's possible, with some ugliness, to handle this in InterProcessMutex, but I think it is better fixed in the protected logic. The protected logic could be modified so that it kicks in on ConnectionLoss or on any non-KeeperException (e.g. InterruptedException). This would cause the zNode to be removed if it was created, and would fix this deadlock issue.

I would welcome anyone's opinion on the way forward.

Cheers,
Cam
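For illustration, a minimal sketch of the current versus proposed cleanup rule, in plain Java with no ZooKeeper dependency. The `KeeperException`, `ConnectionLossException` and `NodeExistsException` classes are hypothetical stand-ins for ZooKeeper's real exception hierarchy, the in-memory list plays the role of the server, and `protectedCreate`, `currentPolicy` and `proposedPolicy` are illustrative names, not Curator API. The `_c_<uuid>-` prefix is modeled on Curator's protected-node naming but should be treated as an assumption. The simulated create always writes the node and then fails, which is the CURATOR-79 case where the client never sees the reply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.function.Predicate;

public class ProtectedCleanupSketch {
    // Hypothetical stand-ins for ZooKeeper's KeeperException hierarchy,
    // so the sketch runs without the ZooKeeper client on the classpath.
    public static class KeeperException extends Exception {}
    public static class ConnectionLossException extends KeeperException {}
    public static class NodeExistsException extends KeeperException {}

    // Current rule as described: clean up only on connection loss.
    public static boolean currentPolicy(Throwable t) {
        return t instanceof ConnectionLossException;
    }

    // Proposed rule: also clean up on any non-KeeperException,
    // e.g. an InterruptedException thrown while waiting for ZK's reply.
    public static boolean proposedPolicy(Throwable t) {
        return t instanceof ConnectionLossException || !(t instanceof KeeperException);
    }

    // Simulates a protected create where the node IS written server-side,
    // but the client fails (with `failure`) before seeing the response.
    public static void protectedCreate(List<String> server, String parent,
                                       Throwable failure, Predicate<Throwable> shouldCleanUp) {
        String prefix = parent + "/_c_" + UUID.randomUUID() + "-";
        server.add(prefix + "lock-0000000001");
        if (shouldCleanUp.test(failure)) {
            // Remove any node carrying this builder's protected prefix.
            server.removeIf(path -> path.startsWith(prefix));
        }
    }

    public static void main(String[] args) {
        List<String> current = new ArrayList<>();
        protectedCreate(current, "/locks", new InterruptedException(),
                ProtectedCleanupSketch::currentPolicy);
        System.out.println("current policy leaves " + current.size() + " orphan(s)");

        List<String> proposed = new ArrayList<>();
        protectedCreate(proposed, "/locks", new InterruptedException(),
                ProtectedCleanupSketch::proposedPolicy);
        System.out.println("proposed policy leaves " + proposed.size() + " orphan(s)");
    }
}
```

Running `main` should report one orphan under the current rule and none under the proposed one. Other `KeeperException`s, such as a definitive server-side rejection, still skip cleanup under both rules, so the proposed change only widens the trigger to failures where the client cannot know whether the create reached the server.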
