I have a crazy idea for this that would be super quick to implement. This is where we add ACL usage to the cache:
https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/DataTree.java#L1358
What if we do this? In the cache, when we realize that the ACL is missing, return false:
https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/ReferenceCountedACLCache.java#L175

In DataTree we'll change the znode's ACL reference to -1 (OPEN_UNSAFE_ACL_ID), which maps to OPEN_ACL_UNSAFE, i.e. world:anyone with all permissions, not just read. This essentially removes the ACL from the znode, and we continue:

    synchronized (node) {
        if (!aclCache.addUsage(node.acl)) {
            // Fix missing ACL: fall back to the open ACL
            node.acl = OPEN_UNSAFE_ACL_ID;
            LOG.warn("Missing ACL has been removed from znode, proceeding.");
        }
    }

Txn processing will be fine, and the next snapshot will be "fixed".

Andor

On Wed, 2025-02-05 at 15:45 -0600, Andor Molnar wrote:
> Hi ZK folks,
>
> Let me draw your attention to this ticket. We've seen this happening
> in production and I would like to work on a fix.
>
> Damien already created a draft PR here:
> https://github.com/apache/zookeeper/pull/2183
>
> Let's take a closer look and work on a strategic solution.
>
> Thanks,
> Andor
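To make the proposal easier to try out, here is a rough, self-contained sketch of both halves: addUsage() returning false for an unknown ACL ID, and the caller rewriting the znode's reference to -1. Note that AclCacheSketch, Node, register(), usageCount() and attachAcl() are hypothetical, simplified stand-ins, not the real ReferenceCountedACLCache/DataNode/DataTree code. ACLs are modelled as plain strings, and System.out stands in for LOG.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical, simplified stand-in for ReferenceCountedACLCache that only
 * illustrates the proposed boolean return value of addUsage().
 * ACL lists are modelled as plain strings instead of real ACL objects.
 */
public class AclCacheSketch {
    // Special ID for the open ACL; it is never stored or reference counted.
    public static final long OPEN_UNSAFE_ACL_ID = -1L;

    private final Map<Long, List<String>> longKeyMap = new HashMap<>();
    private final Map<Long, AtomicLong> referenceCounter = new HashMap<>();
    private long nextId = 1;

    /** Hypothetical helper: stores an ACL list and returns its new ID (0 references). */
    public synchronized long register(List<String> acl) {
        long id = nextId++;
        longKeyMap.put(id, acl);
        referenceCounter.put(id, new AtomicLong(0));
        return id;
    }

    /** Proposed behaviour: report a missing ACL instead of silently ignoring it. */
    public synchronized boolean addUsage(long acl) {
        if (acl == OPEN_UNSAFE_ACL_ID) {
            return true; // the open ACL is always valid
        }
        if (!longKeyMap.containsKey(acl)) {
            return false; // missing ACL: the caller decides how to repair the node
        }
        referenceCounter.computeIfAbsent(acl, k -> new AtomicLong(0)).incrementAndGet();
        return true;
    }

    /** Hypothetical helper: current reference count for an ACL ID. */
    public synchronized long usageCount(long acl) {
        AtomicLong count = referenceCounter.get(acl);
        return count == null ? 0 : count.get();
    }

    /** Hypothetical minimal znode that only carries an ACL reference. */
    static final class Node {
        long acl;

        Node(long acl) {
            this.acl = acl;
        }
    }

    /** Mirrors the proposed DataTree change: repoint nodes with a missing ACL to -1. */
    static void attachAcl(AclCacheSketch aclCache, Node node) {
        synchronized (node) {
            if (!aclCache.addUsage(node.acl)) {
                node.acl = OPEN_UNSAFE_ACL_ID;
                System.out.println("Missing ACL has been removed from znode, proceeding.");
            }
        }
    }

    public static void main(String[] args) {
        AclCacheSketch cache = new AclCacheSketch();
        long known = cache.register(List.of("world:anyone:r"));

        Node healthy = new Node(known);
        Node broken = new Node(42L); // an ID that was never registered in the cache

        attachAcl(cache, healthy);
        attachAcl(cache, broken);

        System.out.println(healthy.acl + " " + cache.usageCount(known) + " " + broken.acl);
    }
}
```

Running main() prints the warning once (for the broken node) followed by `1 1 -1`: the healthy node keeps its ACL ID and gains one reference, while the broken node now points at the open ACL. Keeping the repair in the caller leaves the cache free of policy decisions, so a stricter fallback than OPEN_ACL_UNSAFE could be swapped in later without touching addUsage().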