[ https://issues.apache.org/jira/browse/ZOOKEEPER-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17924725#comment-17924725 ]

Andor Molnar edited comment on ZOOKEEPER-4846 at 2/6/25 11:21 PM:
------------------------------------------------------------------

Found a solution. It's super simple. ZooKeeper doesn't actually lose that txn 
gap: the missing transactions are in the previous logfile, and it iterates over them.

Look at this log snippet: 
{noformat}
2025-02-06 17:12:40,818 [myid:] - INFO  [main:FileSnap@85] - Reading snapshot /home/andor/tmp/zookeeper/version-2/snapshot.674c0
2025-02-06 17:12:40,845 [myid:] - INFO  [main:ReferenceCountedACLCache@174] - Ignoring acl 7382 as it does not exist in the cache
2025-02-06 17:12:40,847 [myid:] - INFO  [main:DataTree@1719] - The digest in the snapshot has digest version of 2, with zxid as 0x674c1, and digest value as 8282470946294
2025-02-06 17:12:40,858 [myid:] - INFO  [main:FileTxnLog$FileTxnIterator@693] - Going to next log file /home/andor/tmp/zookeeper/version-2/log.5a1df
2025-02-06 17:12:40,924 [myid:] - INFO  [main:FileTxnSnapLog@351] - Processing zxid 0x674c1: 0x674c1
2025-02-06 17:12:40,924 [myid:] - INFO  [main:DataTree@469] - Creating node /cloudera_manager_zookeeper_canary with acl 6
2025-02-06 17:12:40,928 [myid:] - INFO  [main:FileTxnLog$FileTxnIterator@693] - Going to next log file /home/andor/tmp/zookeeper/version-2/log.674c2
2025-02-06 17:12:40,928 [myid:] - INFO  [main:FileTxnSnapLog@351] - Processing zxid 0x674c2: 0x674c2
2025-02-06 17:12:40,928 [myid:] - INFO  [main:FileTxnSnapLog@351] - Processing zxid 0x674c3: 0x674c3
2025-02-06 17:12:40,929 [myid:] - ERROR [main:ReferenceCountedACLCache@92] - ERROR: ACL not available for long 7382
2025-02-06 17:12:40,929 [myid:] - ERROR [main:ZooKeeperServerMain@91] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Failed to fetch acls for 7382
        at org.apache.zookeeper.server.ReferenceCountedACLCache.convertLong(ReferenceCountedACLCache.java:93)
{noformat}
The snapshot zxid is *0x674c0*, so ZK goes to txn logfile *0x5a1df* first and 
fast-forwards to txn zxid *0x674c1*, i.e. it actually re-processes the txn in which 
the parent znode was created. Clearly the ACL id differs between the ACL cache and 
the data tree: 6 <> 7382
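
For reference, the replay step is roughly the following (a minimal sketch only; the names follow ZooKeeper's {{TxnLog}}/{{TxnIterator}} interfaces and {{DataTree.processTxn()}}, but this is not the actual {{FileTxnSnapLog.fastForwardFromEdits()}} code):
{code:java}
import java.io.File;
import java.io.IOException;

import org.apache.zookeeper.server.DataTree;
import org.apache.zookeeper.server.persistence.FileTxnLog;
import org.apache.zookeeper.server.persistence.TxnLog.TxnIterator;

// Sketch of the fast-forward seen in the log: open the txn log at the first
// zxid after the snapshot (which lands in log.5a1df) and apply every later
// txn to the DataTree restored from snapshot.674c0.
class ReplaySketch {

    static long fastForward(DataTree dt, File logDir, long snapshotZxid) throws IOException {
        long highestZxid = snapshotZxid;
        FileTxnLog txnLog = new FileTxnLog(logDir);
        TxnIterator itr = txnLog.read(snapshotZxid + 1);
        try {
            while (itr.getHeader() != null) {
                highestZxid = itr.getHeader().getZxid();
                // 0x674c1 (the create of the canary znode) is applied here on top of
                // a tree that already contains that node from the fuzzy snapshot.
                dt.processTxn(itr.getHeader(), itr.getTxn());
                if (!itr.next()) {
                    break;
                }
            }
        } finally {
            itr.close();
        }
        return highestZxid;
    }
}
{code}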

Look at the code where we process the txn:
{code:java}
synchronized (parent) {
    parentAcl = getACL(parent);
    // Add the ACL to ACL cache first, to avoid the ACL not being
    // created race condition during fuzzy snapshot sync.
    //
    // This is the simplest fix, which may add ACL reference count
    // again if it's already counted in in the ACL map of fuzzy
    // snapshot, which might also happen for deleteNode txn, but
    // at least it won't cause the ACL not exist issue.
    //
    // Later we can audit and delete all non-referenced ACLs from
    // ACL map when loading the snapshot/txns from disk, like what
    // we did for the global sessions.
    Long longval = aclCache.convertAcls(acl);   // <---- we re-create the ACL in the ACL cache,
                                                //       but potentially with a different id

    Set<String> children = parent.getChildren();
    if (children.contains(childName)) {
        throw new KeeperException.NodeExistsException();    // <---- znode already exists in the DataTree, so we skip
    }
    ...
{code}
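
For context, the 6 <> 7382 mismatch is exactly what {{ReferenceCountedACLCache}} is about: the znode restored from the fuzzy snapshot still carries an id whose ACL list was never serialized, while {{convertAcls()}} during replay hands out whatever id the cache currently maps that ACL to. A stripped-down model of the cache (only an illustration; the real class also does reference counting and special-cases OPEN_ACL_UNSAFE) behaves like this:
{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the ACL cache: converting an ACL that is not in the cache
// mints a fresh id, and resolving an id with no mapping fails hard. This is
// why the node's dangling id (7382) can never be resolved after the restart,
// even though the same ACL has been re-registered under another id (6).
class AclCacheModel {
    private final Map<Long, List<String>> longKeyMap = new HashMap<>();
    private final Map<List<String>, Long> aclKeyMap = new HashMap<>();
    private long aclIndex = 0;

    synchronized Long convertAcls(List<String> acls) {
        // Reuse the id if this exact ACL list is already cached,
        // otherwise register it under the next free id.
        return aclKeyMap.computeIfAbsent(acls, k -> {
            long id = ++aclIndex;
            longKeyMap.put(id, acls);
            return id;
        });
    }

    synchronized List<String> convertLong(Long id) {
        List<String> acls = longKeyMap.get(id);
        if (acls == null) {
            // The RuntimeException path seen in the log above.
            throw new RuntimeException("Failed to fetch acls for " + id);
        }
        return acls;
    }
}
{code}
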
The only thing we need to do here is get a reference to the existing znode and 
fix its ACL reference.
{code:java}
if (children.contains(childName)) {
    DataNode child = nodes.get(path);
    child.acl = longval;
    throw new KeeperException.NodeExistsException();
} {code}
or maybe something more elegant.
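
For instance (purely a sketch, not a tested patch; {{nodes}}, {{path}} and {{longval}} are the surrounding {{DataTree}} fields/locals from the snippet above, and the null check and synchronization are just defensive additions):
{code:java}
Set<String> children = parent.getChildren();
if (children.contains(childName)) {
    DataNode existing = nodes.get(path);
    if (existing != null) {
        synchronized (existing) {
            // Point the znode restored from the fuzzy snapshot at the ACL id we
            // just (re)registered via convertAcls(), so a later getACL() call no
            // longer hits the dangling id that was serialized without its mapping.
            existing.acl = longval;
        }
    }
    throw new KeeperException.NodeExistsException();
}
{code}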

We don't lose the ACL information.


> Failure to reload database due to missing ACL
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-4846
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4846
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Damien Diederen
>            Assignee: Damien Diederen
>            Priority: Blocker
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> ZooKeeper snapshots are {_}fuzzy{_}, as the server does not stop processing 
> requests while ACLs and nodes are being streamed to disk.
> ACLs, notably, are streamed {_}first{_}, as a mapping between the full 
> serialized ACL and an "ACL ID" referenced by the node.
> Consequently, a snapshot can very well contain ACL IDs which do not exist in 
> the mapping. Prior to ZOOKEEPER-4799, such situations would produce harmless 
> (if annoying) "Ignoring acl XYZ as it does not exist in the cache" INFO 
> entries in the server logs.
> With ZOOKEEPER-4799, we started "eagerly" fetching the referenced ACLs in 
> {{DataTree}} operations such as {{{}createNode{}}}, {{{}deleteNode{}}}, 
> etc.—as opposed to just fetching them from request processors.
> This can result in fatal errors during the {{fastForwardFromEdits}} phase of 
> restoring a database, when transactions are processed on top of an 
> inconsistent data tree—preventing the server from starting.
> The errors are thrown in this code path:
> {code:java}
> // ReferenceCountedACLCache.java:90
> List<ACL> acls = longKeyMap.get(longVal);
> if (acls == null) {
>     LOG.error("ERROR: ACL not available for long {}", longVal);
>     throw new RuntimeException("Failed to fetch acls for " + longVal);
> }
> {code}
> Here is a scenario leading to such a failure:
>  * An existing node {{{}/foo{}}}, sporting a unique ACL, is deleted. This is 
> recorded in transaction log {{{}$SNAP-1{}}}; said ACL is also deallocated;
>  * Snapshot {{$SNAP}} is started;
>  * The ACL map is serialized to {{{}$SNAP{}}};
>  * A new node {{/foo}} sporting the same unique ACL is created in a portion 
> of the data tree which still has to be serialized;
>  * Node {{/foo}} is serialized to {{{}$SNAP{}}}—but its ACL isn't;
>  * The server is restarted;
>  * The {{DataTree}} is initialized from {{{}$SNAP{}}}, including node 
> {{/foo}} with a dangling ACL reference;
>  * Transaction log {{$SNAP-1}} is being replayed, leading to a 
> {{{}deleteNode("/foo"){}}};
>  * {{getACL(node)}} panics, preventing a successful restart.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
