[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160955#comment-14160955
 ] 

Yip Ng commented on ZOOKEEPER-2052:
-----------------------------------

Thanks Rakesh, it took some time for us to reproduce this issue since it 
happens intermittently.  

I see there is a great deal of discussion and information exchange in 
https://issues.apache.org/jira/browse/ZOOKEEPER-965 ,
in particular a comment there regarding details of the 
rollbackPendingChanges() implementation.  It would certainly be 
very helpful if folks involved with that jira could comment further.

Nevertheless, there seems to be a bug in getPendingChanges(), where it can save 
a ChangeRecord with zxid = -1 (obtained from getRecordForPath()).
If I understand this correctly, this ChangeRecord is NOT an entry from the 
zks.outstandingChangesForPath map but came from ZKDatabase.  Since 
it is not an actual outstanding change for the path, it shouldn't be saved in 
getPendingChanges(), because it would then introduce new ChangeRecords when the
multi() fails.
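
To make the suspected mechanism concrete, here is a minimal, self-contained
sketch.  It is NOT the ZooKeeper server code: the class, its fields, the
childCount field and the skipDatabaseRecords flag are all hypothetical
stand-ins, and only the method names are borrowed from the discussion above.
It assumes that a failed multi() puts the saved records back into the
per-path map, which is my reading of the rollback and may not match the real
implementation exactly.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PendingChangesSketch {
    /** Hypothetical, trimmed-down stand-in for the server's ChangeRecord. */
    static final class ChangeRecord {
        final long zxid;
        final String path;
        final int childCount;

        ChangeRecord(long zxid, String path, int childCount) {
            this.zxid = zxid;
            this.path = path;
            this.childCount = childCount;
        }
    }

    // Stand-in for zks.outstandingChangesForPath.
    final Map<String, ChangeRecord> outstandingChangesForPath = new HashMap<>();
    // Stand-in for ZKDatabase: path -> committed number of children.
    final Map<String, Integer> database = new HashMap<>();

    /** Prefers an outstanding change; otherwise builds a zxid = -1 record from the database. */
    ChangeRecord getRecordForPath(String path) {
        ChangeRecord record = outstandingChangesForPath.get(path);
        if (record == null && database.containsKey(path)) {
            record = new ChangeRecord(-1, path, database.get(path));
        }
        return record;
    }

    /** Saves records before a multi(); the flag switches between the two behaviours. */
    Map<String, ChangeRecord> getPendingChanges(List<String> paths, boolean skipDatabaseRecords) {
        Map<String, ChangeRecord> saved = new HashMap<>();
        for (String path : paths) {
            ChangeRecord record = skipDatabaseRecords
                    ? outstandingChangesForPath.get(path) // only real outstanding changes
                    : getRecordForPath(path);             // may be a zxid = -1 database record
            if (record != null) {
                saved.put(path, record);
            }
        }
        return saved;
    }

    /** Assumed rollback: restore whatever was saved into the per-path map. */
    void rollbackPendingChanges(Map<String, ChangeRecord> saved) {
        outstandingChangesForPath.putAll(saved);
    }

    /** A delete is allowed only if the record seen for the path has no children. */
    boolean canDelete(String path) {
        ChangeRecord record = getRecordForPath(path);
        return record != null && record.childCount == 0;
    }

    static PendingChangesSketch runScenario(boolean skipDatabaseRecords) {
        PendingChangesSketch server = new PendingChangesSketch();
        String path = "/metadata/resources";
        server.database.put(path, 2); // two ephemeral children exist
        Map<String, ChangeRecord> saved =
                server.getPendingChanges(Collections.singletonList(path), skipDatabaseRecords);
        // The multi delete fails here because the node has children.
        server.rollbackPendingChanges(saved);
        server.database.put(path, 0); // children disappear when the session expires
        return server;
    }

    public static void main(String[] args) {
        String path = "/metadata/resources";
        System.out.println("saving database records:   canDelete = "
                + runScenario(false).canDelete(path));
        System.out.println("skipping database records: canDelete = "
                + runScenario(true).canDelete(path));
    }
}
```

In this model, saving the database-backed record leaves a stale entry (zxid =
-1, two children) in the per-path map after the rollback, so the later delete
is refused even though the database shows no children.  Skipping such records
leaves the map untouched and the delete goes through.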

> Unable to delete a node when the node has no children
> -----------------------------------------------------
>
>                 Key: ZOOKEEPER-2052
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2052
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6, 3.5.0
>         Environment: Red Hat Enterprise Linux 6.1 x86_64, standalone or 3 
> node ensemble (v3.4.6), 2 Java clients (v3.4.6)
>            Reporter: Yip Ng
>         Attachments: ZOOKEEPER-2052.patch, ZOOKEEPER-2052.patch, 
> ZOOKEEPER-2052.patch, zookeeper.log
>
>
> We stumbled upon a ZooKeeper bug where a node with no children cannot be 
> removed on our 3 node ZooKeeper ensemble or standalone ZooKeeper on Red Hat 
> Enterprise Linux x86_64 environment.  Here is an example scenario/setup:
> o Standalone ZooKeeper or 3 node ensemble (v3.4.6)
> o 2 Java clients (v3.4.6)
>   - Client A creates a persistent node (e.g.:  /metadata/resources)
>   - Client B creates ephemeral nodes under this persistent node 
> o Client A attempts to remove the /metadata/resources node via multi op  
>    delete but fails since there are children
> o Client B's session expires and all the ephemeral nodes are removed
> o Client A attempts to recursively remove the /metadata/resources node via 
>    multi op; this is expected to succeed but fails with the following exception:
>       org.apache.zookeeper.KeeperException$NotEmptyException:     
>          KeeperErrorCode = Directory not empty
>    (Note that Client B is the only client that creates these ephemeral nodes)
> o After this, we used zkCli.sh to inspect the problematic node.  zkCli.sh 
> shows that the /metadata/resources node indeed has no children, but it 
> will not allow the /metadata/resources node to be deleted (shown below):
> [zk: localhost:2181(CONNECTED) 0] ls /
> [zookeeper, metadata]
> [zk: localhost:2181(CONNECTED) 1] ls /metadata
> [resources]
> [zk: localhost:2181(CONNECTED) 2] get /metadata/resources
> null
> cZxid = 0x3
> ctime = Wed Oct 01 22:04:11 PDT 2014
> mZxid = 0x3
> mtime = Wed Oct 01 22:04:11 PDT 2014
> pZxid = 0x9
> cversion = 2
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 0
> numChildren = 0
> [zk: localhost:2181(CONNECTED) 3] delete /metadata/resources
> Node not empty: /metadata/resources
> [zk: localhost:2181(CONNECTED) 4] get /metadata/resources   
> null
> cZxid = 0x3
> ctime = Wed Oct 01 22:04:11 PDT 2014
> mZxid = 0x3
> mtime = Wed Oct 01 22:04:11 PDT 2014
> pZxid = 0x9
> cversion = 2
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 0
> numChildren = 0
> o The only ways to remove this node are to either:
>    a) Restart the ZooKeeper server
>    b) Set data on /metadata/resources, then follow with a delete.



