I have a crazy idea for this that should be very quick to implement:

This is where we add the ACL usage to the cache:
https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/DataTree.java#L1358

What if we do this...?

In the cache, when we find that the ACL is missing, return false:
https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/ReferenceCountedACLCache.java#L175
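
A rough sketch of the cache-side change, just to make the idea concrete.
This is not the real ReferenceCountedACLCache: SketchAclCache, register()
and usageCount() are hypothetical names, and the only point is that
addUsage() reports a missing ACL id instead of silently ignoring it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified stand-in for the ACL cache; not the real class.
final class SketchAclCache {
    // -1 is the ACL id this thread uses for "no ACL on the znode".
    static final long OPEN_UNSAFE_ACL_ID = -1L;

    // ACL id -> number of znodes currently referencing it.
    private final Map<Long, AtomicLong> referenceCounter = new HashMap<>();

    // Hypothetical helper: makes an ACL id known to the cache with zero usages.
    synchronized void register(long aclId) {
        referenceCounter.putIfAbsent(aclId, new AtomicLong(0));
    }

    // Returns false when the id is unknown, so the caller can repair the znode.
    synchronized boolean addUsage(long aclId) {
        if (aclId == OPEN_UNSAFE_ACL_ID) {
            return true; // nothing to count for the open ACL
        }
        AtomicLong count = referenceCounter.get(aclId);
        if (count == null) {
            return false; // missing ACL: report it instead of ignoring it
        }
        count.incrementAndGet();
        return true;
    }

    // Hypothetical helper for inspection: current usage count, 0 if unknown.
    synchronized long usageCount(long aclId) {
        AtomicLong count = referenceCounter.get(aclId);
        return count == null ? 0 : count.get();
    }
}
```

With this shape, a known id is counted as before, and an unknown id
simply yields false for the caller to act on.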

In DataTree we'll change the znode's ACL reference to "-1", which maps
to OPEN_ACL_UNSAFE (world:anyone with all permissions), essentially
removing the ACL from the znode, and continue:

synchronized (node) {
    if (!aclCache.addUsage(node.acl)) {
        // Fix missing ACL
        node.acl = OPEN_UNSAFE_ACL_ID;
        LOG.warn("Missing ACL has been removed from znode, proceeding.");
    }
}

Txn processing will be fine, and the next snapshot will be "fixed".

Andor



On Wed, 2025-02-05 at 15:45 -0600, Andor Molnar wrote:
> Hi ZK folks,
> 
> Let me draw your attention to this ticket. We've seen this happening
> in production and I would like to work on a fix.
> 
> Damien already created a draft PR here:
> https://github.com/apache/zookeeper/pull/2183
> 
> Let's take a closer look and work on a strategic solution.
> 
> Thanks,
> Andor
> 
> 
