[ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049197#comment-13049197 ]
Camille Fournier commented on ZOOKEEPER-1046: --------------------------------------------- Is this going to force me to write some kind of upgrade script for the ZK servers I have running 3.3.3? > Creating a new sequential node results in a ZNODEEXISTS error > ------------------------------------------------------------- > > Key: ZOOKEEPER-1046 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046 > Project: ZooKeeper > Issue Type: Bug > Components: server > Affects Versions: 3.3.2, 3.3.3 > Environment: A 3 node-cluster running Debian squeeze. > Reporter: Jeremy Stribling > Assignee: Vishal K > Labels: sequence > Fix For: 3.4.0 > > Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, > ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz > > > On several occasions, I've seen a create() with the sequential flag set fail > with a ZNODEEXISTS error, and I don't think that should ever be possible. In > past runs, I've been able to closely inspect the state of the system with the > command line client, and saw that the parent znode's cversion is smaller than > the sequential number of existing children znode under that parent. In one > example: > {noformat} > [zk:<ip:port>(CONNECTED) 3] stat /zkrsm > cZxid = 0x5 > ctime = Mon Jan 17 18:28:19 PST 2011 > mZxid = 0x5 > mtime = Mon Jan 17 18:28:19 PST 2011 > pZxid = 0x1d819 > cversion = 120710 > dataVersion = 0 > aclVersion = 0 > ephemeralOwner = 0x0 > dataLength = 0 > numChildren = 2955 > {noformat} > However, the znode /zkrsm/000000000000002d_record0000120804 existed on disk. > In a recent run, I was able to capture the Zookeeper logs, and I will attach > them to this JIRA. The logs are named as nodeX.<zxid_prefixes>.log, and each > new log represents an application process restart. > Here's the scenario: > # There's a cluster with nodes 1,2,3 using zxid 0x3. > # All three nodes restart, forming a cluster of zxid 0x4. > # Node 3 restarts, leading to a cluster of 0x5. > At this point, it seems like node 1 is the leader of the 0x5 epoch. In its > log (node1.0x4-0x5.log) you can see the first (of many) instances of the > following message: > {noformat} > 2011-04-11 21:16:12,607 16649 [ProcessThread:-1] INFO > org.apache.zookeeper.server.PrepRequestProcessor - Got user-level > KeeperException when processing sessionid:0x512f466bd44e0002 type:create > cxid:0x4da376ab zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error > Path:/zkrsm/00000000000000b2_record0001761440 Error:KeeperErrorCode = > NodeExists for /zkrsm/00000000000000b2_record0001761440 > {noformat} > This then repeats forever as my application isn't expecting to ever get this > error message on a sequential node create, and just continually retries. The > message even transfers over to node3.0x5-0x6.log once the 0x6 epoch comes > into play. > I don't see anything terribly fishy in the transition between the epochs; the > correct snapshots seem to be getting transferred, etc. Unfortunately I don't > have a ZK snapshot/log that exhibits the problem when starting with a fresh > system. > Some oddities you might notice in these logs: > * Between epochs 0x3 and 0x4, the zookeeper IDs of the nodes changed due to a > bug in our application code. (They are assigned randomly, but are supposed > to be consistent across restarts.) > * We manage node membership dynamically, and our application restarts the > ZooKeeperServer classes whenever a new node wants to join (without restarting > the entire application process). This is why you'll see messages like the > following in node1.0x4-0x5.log before a new election begins: > {noformat} > 2011-04-11 21:16:00,762 4804 [QuorumPeer:/0.0.0.0:2888] INFO > org.apache.zookeeper.server.quorum.Learner - shutdown called > {noformat} > * There is in fact one of these dynamic membership changes in > node1.0x4-0x5.log, just before the 0x4 epoch is formed. I'm not sure how > this would be related though, as no transactions are done during this period. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira