[ https://issues.apache.org/jira/browse/ZOOKEEPER-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13824644#comment-13824644 ]
Raul Gutierrez Segales commented on ZOOKEEPER-1573:
---------------------------------------------------

Nit - maybe this:

{noformat}
+     * Snapshots are lazily created. So when the snapshot was in progress
+     * there is a chance that some of the later transactions can go into
+     * snapshot. While restoring same transactions NONODE/NODEEXISTS errors
+     * can come. Basically we can ignore all errors during the restore.
{noformat}

could be more clear like this:

{noformat}
+     * Snapshots are lazily created. So when a snapshot is in progress,
+     * there is a chance for later transactions to make it into the snapshot.
+     * Then when the snapshot is restored, NONODE/NODEEXISTS errors
+     * could occur. It should be safe to ignore these.
{noformat}

Nit:

{noformat}
+                LOG.warn("Intrrupted");
{noformat}

typo.

Nit:

{noformat}
+                LOG.debug("Ignoring processTxn failure hdr: " + hdr.getType() + " : error: " + rc.err + " path: " + rc.path);
{noformat}

use string interpolation with {} placeholders instead of string concatenation.

Nit:

{noformat}
+    /**
+     * Test we can restore a snapshot that has delete txns ahead of the zxid of the snapshot file. ZOOKEEPER-1573
+     */
{noformat}

make it:

{noformat}
+    /**
+     * ZOOKEEPER-1573: test restoring a snapshot with delete txns ahead of the snapshot file's zxid.
+     */
{noformat}

Nit:

{noformat}
+        LOG.info("Set lastProcessedZxid to " + zks.getZKDatabase().getDataTreeLastProcessedZxid());
{noformat}

ditto regarding string interpolation via {}.

> Unable to load database due to missing parent node
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-1573
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1573
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3, 3.5.0
>            Reporter: Thawan Kooburat
>         Attachments: ZOOKEEPER-1573.patch
>
>
> While replaying the txnlog on the data tree, the server has code to detect a
> missing parent node. This code block was last modified as part of ZOOKEEPER-1333.
> In our production, we found a case where this check returns a false positive.
> The sequence of txns is as follows:
> zxid 1: create /prefix/a
> zxid 2: create /prefix/a/b
> zxid 3: delete /prefix/a/b
> zxid 4: delete /prefix/a
> The server starts capturing the snapshot at zxid 1. However, by the time it
> traverses the data tree down to /prefix, txn 4 has already been applied and
> /prefix has no children.
> When the server restores from the snapshot, it processes the txnlog starting
> from zxid 2. This txn generates a missing parent error and the server refuses
> to start up.
> The same check allowed me to discover the bug in ZOOKEEPER-1551, but I don't
> know if we have any option besides removing this check to solve this issue.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
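
For illustration only, the scenario in the issue description can be reproduced with a toy model. This is a minimal sketch, not ZooKeeper code: the class FuzzySnapshotReplay, its Txn and Err types, and the ignoreErrors flag are all hypothetical names, and the data tree is assumed to be a plain sorted set of paths. It shows a snapshot labelled with zxid 1 that already reflects txn 4, and how replaying from zxid 2 fails in strict mode but converges when replay errors are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical toy model; none of these names come from the ZooKeeper code base.
class FuzzySnapshotReplay {
    enum Op { CREATE, DELETE }
    enum Err { OK, NONODE, NODEEXISTS }

    static final class Txn {
        final long zxid;
        final Op op;
        final String path;

        Txn(long zxid, Op op, String path) {
            this.zxid = zxid;
            this.op = op;
            this.path = path;
        }
    }

    static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }

    // Applies one txn to the tree (a set of paths) and returns an error code.
    static Err apply(TreeSet<String> tree, Txn txn) {
        if (txn.op == Op.CREATE) {
            if (!tree.contains(parentOf(txn.path))) {
                return Err.NONODE;      // the "missing parent" case
            }
            return tree.add(txn.path) ? Err.OK : Err.NODEEXISTS;
        }
        return tree.remove(txn.path) ? Err.OK : Err.NONODE;
    }

    // Replays the log on a copy of the snapshot. With ignoreErrors == false,
    // the first failing txn aborts the restore.
    static TreeSet<String> restore(TreeSet<String> snapshot, List<Txn> log,
                                   boolean ignoreErrors) {
        TreeSet<String> tree = new TreeSet<>(snapshot);
        for (Txn txn : log) {
            Err err = apply(tree, txn);
            if (err != Err.OK && !ignoreErrors) {
                throw new IllegalStateException(
                        "zxid " + txn.zxid + ": " + err + " for " + txn.path);
            }
        }
        return tree;
    }

    // Snapshot labelled zxid 1, but serialized after txn 4 was applied,
    // so /prefix is written without any children.
    static TreeSet<String> fuzzySnapshot() {
        TreeSet<String> tree = new TreeSet<>();
        tree.add("/");
        tree.add("/prefix");
        return tree;
    }

    // Txns replayed after the snapshot's zxid (zxid 2 through 4).
    static List<Txn> txnsAfterSnapshotZxid() {
        List<Txn> log = new ArrayList<>();
        log.add(new Txn(2, Op.CREATE, "/prefix/a/b"));
        log.add(new Txn(3, Op.DELETE, "/prefix/a/b"));
        log.add(new Txn(4, Op.DELETE, "/prefix/a"));
        return log;
    }

    public static void main(String[] args) {
        try {
            restore(fuzzySnapshot(), txnsAfterSnapshotZxid(), false);
        } catch (IllegalStateException e) {
            System.out.println("strict restore failed: " + e.getMessage());
        }
        System.out.println("lenient restore: "
                + restore(fuzzySnapshot(), txnsAfterSnapshotZxid(), true));
    }
}
```

In this model the lenient restore ends with only / and /prefix, which matches the live tree after txn 4. Whether ignoring every replay error is safe in the real server is exactly what the patch under review has to establish.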