adamyi commented on PR #1997: URL: https://github.com/apache/zookeeper/pull/1997#issuecomment-1583944146
Unfortunately, there are some bugs with this patch: Bug 1 ----- Consider the following sequence of events: 1. Serialize ACL (up to 5) 2. Create /a1 with new ACL 6 3. Create /a2 with ACL 6 4. Delete /a1 Without this feature, we would create ACL 6 with a refcount of 1 at step 2 and leave it unchanged at step 3 because ACL 6 already exists. Deleting /a1 at step 4 would decrement the refcount and consequently garbage-collect the ACL entry, leaving /a2 pointing to a non-existent ACL entry. Bug 2 ----- Consider the following sequence of events: 0. ACL is up to 5 1. Create /a0 with ACL 6 2. Remove /a0 3. Serialize ACL (up to 5) 4. Create /a1 with new ACL 7 5. Create /a2 with new ACL 8 6. Create /a3 with new ACL 7 When replaying, we would create ACL 6 at step 4; 7 at step 5. When replaying step 6, we would treat ACL 7 as already created with the value from step 5. That causes /a3 to have the ACL of /a2. This is wrong. Fixes --- I'm proposing three approaches to fix these bugs. I don't feel strongly among these solutions and am happy to implement any of them. Do Zookeeper reviewers have a preference? 1. Make the ACL cache skip indices for nodes that were deserialized even if these entries weren't there originally, as if we created and then removed them (ACL cache is discrete due to GC). We can do this by also bumping aclIndex in ReferenceCountedACLCache.addUsage. For bug 1 scenario, we'll move /a1 and /a2 both to ACL 7, making it easier to accurately ref-count them. Deleting /a1 will not affect /a2. For bug 2 scenario, we'll set /a1 and /a3 to point to ACL 9 and /a2 to point to ACL 10. 2. Directly change ACL to -2 during node deserialization if ACL cache does not exist. When creating node, we overwrite ACL if and only if it's -2. This is essentially the same as skipping these indices as we're making (ACLs originally missing) and (recreated ACLs during replay) two disjoint spaces. It might feel like a more local fix than solution 1 as the code change only involves DataTree but it's not. It still relies on the fact that -2 is not a valid ACL index in ReferenceCountedACLCache (ACLCache already uses -1 as special value). 3. An even more radical variation of solution 2 would be to directly delete the node during node deserialization if ACL cache does not exist. The argument is that we must re-create it during transaction replay; otherwise the node would not be accessible due to missing ACL MarshallError anyway. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@zookeeper.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org