In the scenario described in the pull request, it sounds like we ultimately are not serializing the ACL for inode /foo. IIUC, this proposal would result in setting /foo to world readable, not the original intended ACL that was dropped from serialization. It would fix the failure, but it could open a security risk. (Let me know if I misunderstood.)
The only other solution I can think of is to persist a full representation of the ACLs per every node. The ReferenceCountedACLCache would not be serialized, and instead be reconstructed from the deserialized nodes, becoming purely a runtime memory optimization. The HDFS implementation for ACLs is similar to this. Of course, this would be a much bigger change. Chris Nauroth On Wed, Feb 5, 2025 at 4:03 PM Andor Molnar <an...@apache.org> wrote: > I've a crazy idea for this which is super quick: > > Here we add usage of ACL to the cache: > > https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/DataTree.java#L1358 > > What if we do this...? > > In the cache, when we realize that ACL is missing, return false: > > https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/ReferenceCountedACLCache.java#L175 > > In DataTree we'll modify the Znode ACL reference to "-1" which is > "world readable", essentially removing the ACL from the znode and > continue: > > synchronized (node) { > if (!aclCache.addUsage(node.acl)) { > // Fix missing ACL > node.acl = OPEN_UNSAFE_ACL_ID; > LOG.warn("Missing ACL has been removed from znode, > proceeding."); > } > } > > Txn's processing will be fine, next snapshot will be "fixed". > > Andor > > > > On Wed, 2025-02-05 at 15:45 -0600, Andor Molnar wrote: > > Hi ZK folks, > > > > Let me draw your attention to this ticket. We've seen this happening > > in > > production and I would like to work on a fix. > > > > Damien already created a draft PR here: > > https://github.com/apache/zookeeper/pull/2183 > > > > Let's take a closer look and work on a strategic solution. > > > > Thanks, > > Andor > > > > > >