adamyi commented on PR #1997:
URL: https://github.com/apache/zookeeper/pull/1997#issuecomment-1583944146

   Unfortunately, there are some bugs with this patch:
   
    
   Bug 1
   -----
    
   Consider the following sequence of events:
    
   1. Serialize ACL (up to 5)
   2. Create /a1 with new ACL 6
   3. Create /a2 with ACL 6
   4. Delete /a1
    
   Without this feature, we would create ACL 6 with a refcount of 1 at step 2 
and
   leave it unchanged at step 3 because ACL 6 already exists. Deleting /a1 at 
step
   4 would decrement the refcount and consequently garbage-collect the ACL 
entry,
   leaving /a2 pointing to a non-existent ACL entry.
    
   Bug 2
   -----
   Consider the following sequence of events:
    
   0. ACL is up to 5
   1. Create /a0 with ACL 6
   2. Remove /a0
   3. Serialize ACL (up to 5)
   4. Create /a1 with new ACL 7
   5. Create /a2 with new ACL 8
   6. Create /a3 with new ACL 7
    
   When replaying, we would create ACL 6 at step 4; 7 at step 5. When replaying
   step 6, we would treat ACL 7 as already created with the value from step 5. 
That
   causes /a3 to have the ACL of /a2. This is wrong.
   
   Fixes
   ---
   I'm proposing three approaches to fix these bugs. I don't feel strongly 
among these solutions and am happy to implement any of them. Do Zookeeper 
reviewers have a preference?
   
   1. Make the ACL cache skip indices for nodes that were deserialized even if 
these entries weren't there originally, as if we created and then removed them 
(ACL cache is discrete due to GC). We can do this by also bumping aclIndex in 
ReferenceCountedACLCache.addUsage. For bug 1 scenario, we'll move /a1 and /a2 
both to ACL 7, making it easier to
   accurately ref-count them. Deleting /a1 will not affect /a2. For bug 2 
scenario, we'll set /a1 and /a3 to point to ACL 9 and /a2 to point to ACL 10.
   2. Directly change ACL to -2 during node deserialization if ACL cache does 
not exist. When creating node, we overwrite ACL if and only if it's -2. This is 
essentially the same as skipping these indices as we're making (ACLs originally 
missing) and (recreated ACLs during replay) two disjoint spaces. It might feel 
like a more local fix than solution 1 as the code change only involves DataTree 
but it's not. It still relies on the fact that -2 is not a valid ACL index in 
ReferenceCountedACLCache (ACLCache already uses -1 as special value).
    3. An even more radical variation of solution 2 would be to directly delete 
the node during node deserialization if ACL cache does not exist. The argument 
is that we must re-create it during transaction replay; otherwise the node 
would not be accessible due to missing ACL MarshallError anyway.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to