[ https://issues.apache.org/jira/browse/ZOOKEEPER-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895579#comment-13895579 ]
Flavio Junqueira commented on ZOOKEEPER-1573:
---------------------------------------------

It seems reasonable to solve the problem this way, ignoring a NoNode error for the parent. Perhaps a better way would be to advance the txn log to make sure that the delete of the parent is there. The assumption for the current patch is that a delete will show up in the txn log at some point, which is fine if nothing has gone wrong. If it is difficult to advance the txn log, we could alternatively keep information about the missing parent to check later that the delete is there.

Because we don't want to hold 3.4.6 for much longer, if people prefer, we could check this in and create a new JIRA to fix it later in the way I'm proposing, assuming my proposal makes sense.

> Unable to load database due to missing parent node
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-1573
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1573
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3, 3.5.0
>            Reporter: Thawan Kooburat
>            Assignee: Vinay
>            Priority: Critical
>             Fix For: 3.4.6, 3.5.0
>
>         Attachments: ZOOKEEPER-1573-3.4.patch, ZOOKEEPER-1573.patch,
> ZOOKEEPER-1573.patch, ZOOKEEPER-1573.patch, ZOOKEEPER-1573.patch
>
>
> While replaying the txnlog on the data tree, the server has code to detect a
> missing parent node. This code block was last modified as part of
> ZOOKEEPER-1333. In our production, we found a case where this check returns a
> false positive.
> The sequence of txns is as follows:
> zxid 1: create /prefix/a
> zxid 2: create /prefix/a/b
> zxid 3: delete /prefix/a/b
> zxid 4: delete /prefix/a
> The server starts capturing a snapshot at zxid 1. However, by the time it
> traverses the data tree down to /prefix, txn 4 is already applied and
> /prefix has no children.
> When the server restores from the snapshot, it processes the txnlog starting
> from zxid 2. This txn generates a missing parent error and the server refuses
> to start up.
> The same check allowed me to discover the bug in ZOOKEEPER-1551, but I don't know
> if we have any option besides removing this check to solve this issue.
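
To make the alternative in the comment concrete, here is a minimal sketch of the "keep information about the missing parent and check later that the delete is there" idea, run against the txn sequence from the description. It is not ZooKeeper's code: the data tree is modelled as a plain set of paths, and every name in it (`Txn`, `replay`, `MissingParentError`, `SNAPSHOT`, `LOG`) is hypothetical. It also assumes that a delete of a node that does not exist is simply ignored during replay.

```python
from dataclasses import dataclass


class MissingParentError(Exception):
    """A create had no parent and no later delete of that parent explains it."""


@dataclass(frozen=True)
class Txn:
    zxid: int
    op: str    # "create" or "delete"
    path: str


def parent_of(path):
    # "/prefix/a/b" -> "/prefix/a"; "/prefix" -> "/"
    head = path.rsplit("/", 1)[0]
    return head or "/"


def replay(snapshot_nodes, txns, strict=False):
    """Replay txns on top of a (possibly fuzzy) snapshot.

    strict=True models the check described in the issue: fail on the first
    create whose parent is missing.
    strict=False models the proposal: remember the missing parent and only
    fail if the log never deletes it.
    """
    tree = set(snapshot_nodes)
    pending = {}  # missing parent path -> zxid of the create that needed it
    for txn in sorted(txns, key=lambda t: t.zxid):
        if txn.op == "create":
            parent = parent_of(txn.path)
            if parent in tree:
                tree.add(txn.path)
            elif strict:
                raise MissingParentError(
                    f"zxid {txn.zxid}: parent {parent} of {txn.path} is missing")
            else:
                # Skip the create, but remember which parent we expect to be deleted.
                pending.setdefault(parent, txn.zxid)
        elif txn.op == "delete":
            # Assumption: deleting a node that is absent is not an error here.
            tree.discard(txn.path)
            pending.pop(txn.path, None)
        else:
            raise ValueError(f"unknown op {txn.op!r}")
    if pending:
        raise MissingParentError(
            f"no delete found for missing parent(s): {sorted(pending)}")
    return tree


# Fuzzy snapshot from the description: started at zxid 1, but /prefix was
# serialized after txn 4 had been applied, so it has no children.
SNAPSHOT = {"/", "/prefix"}
LOG = [
    Txn(2, "create", "/prefix/a/b"),
    Txn(3, "delete", "/prefix/a/b"),
    Txn(4, "delete", "/prefix/a"),
]

if __name__ == "__main__":
    print(sorted(replay(SNAPSHOT, LOG)))
```

With the full log, the lenient replay finishes because zxid 4 accounts for the missing /prefix/a. With `strict=True` it fails at zxid 2, which is the false positive reported here. If the log were cut off before zxid 4, the lenient replay would still fail at the end, which is the case the current patch cannot tell apart.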