[
https://issues.apache.org/jira/browse/ZOOKEEPER-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17924725#comment-17924725
]
Andor Molnar edited comment on ZOOKEEPER-4846 at 2/6/25 11:21 PM:
------------------------------------------------------------------
Found a solution. It's super simple. ZooKeeper actually doesn't lose that txn
gap, it's in the previous logfile and it iterates over it.
Look at this log snippet:
{noformat}
2025-02-06 17:12:40,818 [myid:] - INFO [main:FileSnap@85] - Reading snapshot
/home/andor/tmp/zookeeper/version-2/snapshot.674c0
2025-02-06 17:12:40,845 [myid:] - INFO [main:ReferenceCountedACLCache@174] -
Ignoring acl 7382 as it does not exist in the cache
2025-02-06 17:12:40,847 [myid:] - INFO [main:DataTree@1719] - The digest in
the snapshot has digest version of 2, with zxid as 0x674c1, and digest value as
8282470946294
2025-02-06 17:12:40,858 [myid:] - INFO [main:FileTxnLog$FileTxnIterator@693] -
Going to next log file /home/andor/tmp/zookeeper/version-2/log.5a1df
2025-02-06 17:12:40,924 [myid:] - INFO [main:FileTxnSnapLog@351] - Processing
zxid 0x674c1: 0x674c1
2025-02-06 17:12:40,924 [myid:] - INFO [main:DataTree@469] - Creating node
/cloudera_manager_zookeeper_canary with acl 6
2025-02-06 17:12:40,928 [myid:] - INFO [main:FileTxnLog$FileTxnIterator@693] -
Going to next log file /home/andor/tmp/zookeeper/version-2/log.674c2
2025-02-06 17:12:40,928 [myid:] - INFO [main:FileTxnSnapLog@351] - Processing
zxid 0x674c2: 0x674c2
2025-02-06 17:12:40,928 [myid:] - INFO [main:FileTxnSnapLog@351] - Processing
zxid 0x674c3: 0x674c3
2025-02-06 17:12:40,929 [myid:] - ERROR [main:ReferenceCountedACLCache@92] -
ERROR: ACL not available for long 7382
2025-02-06 17:12:40,929 [myid:] - ERROR [main:ZooKeeperServerMain@91] -
Unexpected exception, exiting abnormally
java.lang.RuntimeException: Failed to fetch acls for 7382
at
org.apache.zookeeper.server.ReferenceCountedACLCache.convertLong(ReferenceCountedACLCache.java:93)
{noformat}
Snapshot zxid is {*}0x674c0{*}, so ZK goes to txn logfile *0x5a1df* first and
fast forward to txn zxid {*}0x674c1{*}, so actually processing the txn where
the parent znode was created. Clearly the ACL id is different in the cache and
the data tree: 6 <> 7382
Look at the code where we process the txn:
{code:java}
synchronized (parent) {
parentAcl = getACL(parent); // Add the ACL to ACL cache first, to avoid
the ACL not being
// created race condition during fuzzy snapshot sync.
//
// This is the simplest fix, which may add ACL reference count
// again if it's already counted in in the ACL map of fuzzy
// snapshot, which might also happen for deleteNode txn, but
// at least it won't cause the ACL not exist issue.
//
// Later we can audit and delete all non-referenced ACLs from
// ACL map when loading the snapshot/txns from disk, like what
// we did for the global sessions.
Long longval = aclCache.convertAcls(acl); <---- we re-create the ACL in
the ACL cache, but potentially with different id
Set<String> children = parent.getChildren();
if (children.contains(childName)) {
throw new KeeperException.NodeExistsException(); <---- znode already
exists in DataTree, so we skip
}
... {code}
The only thing that we need to do here is to get a reference to the existing
znode and fix the ACL reference.
{code:java}
if (children.contains(childName)) {
DataNode child = nodes.get(path);
child.acl = longval;
throw new KeeperException.NodeExistsException();
} {code}
or maybe something more elegant.
We don't lose the ACL information.
was (Author: andorm):
Found a solution. It's super simple. ZooKeeper actually doesn't lose that txn
gap, it's in the previous logfile and it iterates over it.
Look at this log snippet:
{noformat}
2025-02-06 17:12:40,818 [myid:] - INFO [main:FileSnap@85] - Reading snapshot
/home/andor/tmp/zookeeper/version-2/snapshot.674c0
2025-02-06 17:12:40,845 [myid:] - INFO [main:ReferenceCountedACLCache@174] -
Ignoring acl 7382 as it does not exist in the cache
2025-02-06 17:12:40,847 [myid:] - INFO [main:DataTree@1719] - The digest in
the snapshot has digest version of 2, with zxid as 0x674c1, and digest value as
8282470946294
2025-02-06 17:12:40,858 [myid:] - INFO [main:FileTxnLog$FileTxnIterator@693] -
Going to next log file /home/andor/tmp/zookeeper/version-2/log.5a1df
2025-02-06 17:12:40,924 [myid:] - INFO [main:FileTxnSnapLog@351] - Processing
zxid 0x674c1: 0x674c1
2025-02-06 17:12:40,924 [myid:] - INFO [main:DataTree@469] - Creating node
/cloudera_manager_zookeeper_canary with acl 6
2025-02-06 17:12:40,928 [myid:] - INFO [main:FileTxnLog$FileTxnIterator@693] -
Going to next log file /home/andor/tmp/zookeeper/version-2/log.674c2
2025-02-06 17:12:40,928 [myid:] - INFO [main:FileTxnSnapLog@351] - Processing
zxid 0x674c2: 0x674c2
2025-02-06 17:12:40,928 [myid:] - INFO [main:FileTxnSnapLog@351] - Processing
zxid 0x674c3: 0x674c3
2025-02-06 17:12:40,929 [myid:] - ERROR [main:ReferenceCountedACLCache@92] -
ERROR: ACL not available for long 7382
2025-02-06 17:12:40,929 [myid:] - ERROR [main:ZooKeeperServerMain@91] -
Unexpected exception, exiting abnormally
java.lang.RuntimeException: Failed to fetch acls for 7382
at
org.apache.zookeeper.server.ReferenceCountedACLCache.convertLong(ReferenceCountedACLCache.java:93)
{noformat}
Snapshot zxid is {*}0x674c0{*}, so ZK goes to txn logfile *0x5a1df* first and
fast forward to txn zxid {*}0x674c1{*}, so actually processing the txn where
the parent znode was created. Clearly the ACL id is different in the cache and
the data tree: 6 <> 7382
Look at the code where we process the txn:
{code:java}
synchronized (parent) {
parentAcl = getACL(parent); // Add the ACL to ACL cache first, to avoid
the ACL not being
// created race condition during fuzzy snapshot sync.
//
// This is the simplest fix, which may add ACL reference count
// again if it's already counted in in the ACL map of fuzzy
// snapshot, which might also happen for deleteNode txn, but
// at least it won't cause the ACL not exist issue.
//
// Later we can audit and delete all non-referenced ACLs from
// ACL map when loading the snapshot/txns from disk, like what
// we did for the global sessions.
Long longval = aclCache.convertAcls(acl); <---- we re-create the ACL
in the ACL cache, but potentially with different id
Set<String> children = parent.getChildren();
if (children.contains(childName)) {
throw new KeeperException.NodeExistsException(); <---- znode already
exists in DataTree, so we skip
}
... {code}
The only thing that we need to do here is to get a reference to the existing
znode and fix the ACL reference.
{code:java}
if (children.contains(childName)) {
DataNode child = nodes.get(path);
child.acl = longval;
throw new KeeperException.NodeExistsException();
} {code}
or maybe something more elegant.
We don't lose the ACL information.
> Failure to reload database due to missing ACL
> ---------------------------------------------
>
> Key: ZOOKEEPER-4846
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4846
> Project: ZooKeeper
> Issue Type: Bug
> Reporter: Damien Diederen
> Assignee: Damien Diederen
> Priority: Blocker
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> ZooKeeper snapshots are {_}fuzzy{_}, as the server does not stop processing
> requests while ACLs and nodes are being streamed to disk.
> ACLs, notably, are streamed {_}first{_}, as a mapping between the full
> serialized ACL and an "ACL ID" referenced by the node.
> Consequently, a snapshot can very well contain ACL IDs which do not exist in
> the mapping. Prior to ZOOKEEPER-4799, such situations would produce harmless
> (if annoying) "Ignoring acl XYZ as it does not exist in the cache" INFO
> entries in the server logs.
> With ZOOKEEPER-4799, we started "eagerly" fetching the referenced ACLs in
> {{DataTree}} operations such as {{{}createNode{}}}, {{{}deleteNode{}}},
> etc.—as opposed to just fetching them from request processors.
> This can result in fatal errors during the {{fastForwardFromEdits}} phase of
> restoring a database, when transactions are processed on top of an
> inconsistent data tree—preventing the server from starting.
> The errors are thrown in this code path:
> {code:java}
> // ReferenceCountedACLCache.java:90
> List<ACL> acls = longKeyMap.get(longVal);
> if (acls == null) {
> LOG.error("ERROR: ACL not available for long {}", longVal);
> throw new RuntimeException("Failed to fetch acls for " + longVal);
> }
> {code}
> Here is a scenario leading to such a failure:
> * An existing node {{{}/foo{}}}, sporting an unique ACL, is deleted. This is
> recorded in transaction log {{{}$SNAP-1{}}}; said ACL is also deallocated;
> * Snapshot {{$SNAP}} is started;
> * The ACL map is serialized to {{{}$SNAP{}}};
> * A new node {{/foo}} sporting the same unique ACL is created in a portion
> of the data tree which still has to be serialized;
> * Node {{/foo}} is serialized to {{{}$SNAP{}}}—but its ACL isn't;
> * The server is restarted;
> * The {{DataTree}} is initialized from {{{}$SNAP{}}}, including node
> {{/foo}} with a dangling ACL reference;
> * Transaction log {{$SNAP-1}} is being replayed, leading to a
> {{{}deleteNode("/foo"){}}};
> * {{getACL(node)}} panics, preventing a successful restart.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)