[ https://issues.apache.org/jira/browse/ZOOKEEPER-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231654#comment-13231654 ]
Mihai Claudiu Toader commented on ZOOKEEPER-1424:
-------------------------------------------------

I'm leaving for a trip right now, but I'll do that as soon as I get to a
computer, no later than 22 March.

> ZooKeeper will not allow a client to delete a tree when it should allow it
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1424
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1424
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.2
>         Environment: Linux Ubuntu 11.10, ZooKeeper 3.4.2, one server, two
> Java clients
>            Reporter: Mihai Claudiu Toader
>
> Hi all,
> While using ZooKeeper at Midokura we hit an interesting bug in ZooKeeper.
> We hit it sporadically while developing some functional tests, so I had to
> build a test case for it.
> I finally created the test case and I think I have narrowed down the
> conditions under which it happens.
> I wanted to let you know my findings since they are somewhat troublesome.
> We need:
>   - one running ZooKeeper server (I didn't test this with a cluster)
>       let's name this: the server
>   - one running ZooKeeper client that will create an ephemeral node under
>     the tree created by the next client
>       let's name this: the ephemeral client
>   - one running ZooKeeper client that will create a persistent tree and try
>     to delete that tree
>       let's name this: the persistent client
> What needs to happen is this:
>   step 1. - the server starts
>   step 2. - the persistent client connects and creates a tree
>   step 3. - the ephemeral client connects and adds an ephemeral node under
>             the tree created by the persistent client
>   step 4. - the persistent client tries to delete the tree recursively
>             (without including the ephemeral node in the multi op)
>   step 5. - the ephemeral client crashes hard (the equivalent of kill -9)
>   step 6. - the persistent client tries to delete the tree recursively
>             again (and fails with NoEmptyNode even though listing the node
>             shows no children)
>           - the ZooKeeper server needs to be restarted in order for this
>             to work.
> Step 4 is critical: without it (that is, with no previous error while
> trying to remove the tree), the next steps behave as we would expect them
> to (they pass).
> Also, no amount of fiddling with the ZooKeeper connection timeouts (between
> the server and the ephemeral client) helps.
>
> If the ephemeral client is shut down properly, everything seems to behave
> properly (even with step 4).
> The test code is available here:
>     https://github.com/mtoadermido/play
> It needs ZooKeeper 3.4.2 installed on the system (it uses the installed
> jars from the deb to spawn the ZooKeeper server).
> The entry point is
>     https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java
> There is a lot of boilerplate, since I didn't want it to depend on code
> from midonet, but the interesting part is the BlockingBug.main() method.
> It launches a ZooKeeper process and an external ephemeral client process,
> and after that acts as the second client.
> Available tweaks:
>   - the ZooKeeper client timeout for the ephemeral client, here:
>     https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L56
>   - step 4, here (set to true / false):
>     https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L69
>   - the shutdown of the ephemeral client (soft, aka clean shutdown; hard,
>     aka kill -9):
>     https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L88
> The result displayed depends on whether the final recursive deletion
> succeeded or not:
>   "We hit it !. The clear tree failed."
>     https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L103
>   "No error :("
>     https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L99
> The conclusion is that the bug seems to be inside the ZooKeeper codebase
> and that it is prone to being triggered by this particular usage of
> ZooKeeper combined with the misfortune of having to kill the ephemeral
> process hard.
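
For readers following steps 4 and 6 of the quoted report, here is a minimal sketch of how a recursive delete is typically expressed as one ordered batch of deletes (children before parents), and why a child that is missing from the batch is expected to block it. It is plain Java with no ZooKeeper dependency. The class name, the method names and the paths (/test, /test/a/ephemeral) are hypothetical and are not taken from BlockingBug.java. In a real client each path would presumably become an Op.delete(path, -1) passed to ZooKeeper.multi(...), and the server answers a delete of a node that still has children with KeeperException.NotEmptyException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class RecursiveDeletePlan {

    /** Depth of an absolute path, counted as its number of '/' characters. */
    static int depth(String path) {
        return (int) path.chars().filter(c -> c == '/').count();
    }

    /** Orders the known paths deepest-first so every child precedes its parent. */
    static List<String> deleteOrder(List<String> knownPaths) {
        List<String> ordered = new ArrayList<>(knownPaths);
        ordered.sort((a, b) -> {
            int byDepth = Integer.compare(depth(b), depth(a));
            return byDepth != 0 ? byDepth : a.compareTo(b);
        });
        return ordered;
    }

    /**
     * Toy model of the expected server-side check (an assumption, not ZooKeeper code):
     * walks the deletes in order and returns the first path that still has a
     * child among the existing nodes, or null if every delete would be accepted.
     */
    static String firstBlockedDelete(List<String> deletes, Set<String> existingNodes) {
        Set<String> remaining = new TreeSet<>(existingNodes);
        for (String path : deletes) {
            String childPrefix = path + "/";
            for (String node : remaining) {
                if (node.startsWith(childPrefix)) {
                    return path; // a child still exists, so this delete is rejected
                }
            }
            remaining.remove(path);
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical tree created by the persistent client (step 2).
        List<String> persistentTree = Arrays.asList("/test", "/test/a", "/test/a/b");
        List<String> deletes = deleteOrder(persistentTree);

        // Step 4: the ephemeral node exists but is not part of the batch.
        Set<String> withEphemeral = new TreeSet<>(persistentTree);
        withEphemeral.add("/test/a/ephemeral");
        System.out.println("blocked at: " + firstBlockedDelete(deletes, withEphemeral));

        // Step 6 as expected: the ephemeral node is gone, nothing should block.
        Set<String> withoutEphemeral = new TreeSet<>(persistentTree);
        System.out.println("blocked at: " + firstBlockedDelete(deletes, withoutEphemeral));
    }
}
```

The model only captures the expected behaviour: once the ephemeral node has disappeared, nothing in the batch should be rejected. The report above is about the server still rejecting the delete at that point.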