[
https://issues.apache.org/jira/browse/ZOOKEEPER-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902505#comment-17902505
]
mutu edited comment on ZOOKEEPER-4689 at 12/3/24 3:31 AM:
----------------------------------------------------------
I also encounter this problem. Here are three system logs in
[^node1.log],[^node2.log],[^node3.log].
Hope it helpful for this issue.
was (Author: JIRAUSER304604):
I also encounter this problem. Here are three system logs in
https://issues.apache.org/jira/secure/attachment/13073164/node1.log,https://issues.apache.org/jira/secure/attachment/13073163/node2.log,https://issues.apache.org/jira/secure/attachment/13073162/node3.log
> Node may not accessible due the the inconsistent ACL reference map after SNAP
> sync (again)
> ------------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-4689
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4689
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.6.0, 3.7.0, 3.8.0
> Reporter: Adam Yi
> Priority: Critical
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> In Zookeeper, we do a "fuzzy snapshot". It means that we don't make a copy of
> the DataTree or grab a lock or anything when serializing a DataTree. Instead,
> we note down the zxid when we start serializing DataTree. We serialize the
> DataTree while it's getting mutated and replay the \{transactions after
> starting to take snapshot} after deserializing the DataTree. The idea is that
> those transactions should be idempotent.
> Zookeeper also implements its own interned ACL. It keeps a [long -> ACL] map
> and store the `long` in each node as nodes tend to share the same ACL.
> When serializing DataTree, we first serialize the ACL cache and then
> serialize the nodes. It's possible that with the following sequence, a node
> points to an invalid ACL entry:
> 1. Serialize ACL
> 2. Create node with new ACL
> 3. Serialize node
> ZOOKEEPER-3306 fixes this by making sure to insert the ACL to cache upon
> calling `DataTree.createNode` when replaying transactions and when the node
> already exists. However, we only insert it to the cache, but do not set the
> interned ACL in the node to point to the new entry.
> It's possible that the longval we get for the ACL is inconsistent, even
> though we follow the same zxid ordering of events. Specifically, we keep a
> [aclIndex] pointing to the max entry that currently exists and increment that
> whenever we need to intern a new ACL we've never seen before.
> With ZOOKEEPER-2214, we started to do reference counting in ACL cache and
> remove no-longer used entries from the cache.
> Say the current aclIndex is 10. If we create a node with ACL unseen before
> and delete that node, aclIndex will increment to 11. However, when we
> deserialize the tree, we'll set aclIndex to the max existent ACL cache entry,
> so it's reverted back to 10. aclIndex inconsistency on its own is fine but it
> causes problem to the ZOOKEEPER-3306 patch.
> Now if we follow the same scenario mentioned in ZOOKEEPER-3306:
> # Leader creates ACL entry 11 and delete it due to node deletion
> # Server A starts to have snap sync with leader
> # After serializing the ACL map to Server A, there is a txn T1 to create a
> node N1 with new ACL_1 which was not exist in ACL map
> # On leader, after this txn, the ACL map will be 12 -> (ACL_1, COUNT: 1),
> and data tree N1 -> 12
> # On server A, it will be ACL map with max ID 10, and N1 -> 12 in fuzzy
> snapshot
> # When replaying the txn T1, it will add 11 -> (ACL_1, COUNT: 1) to the ACL
> cache but the node N1 still points to 12.
> N1 still points to invalid ACL entry.
> There are two ways to fix this:
> # Make aclIndex consistent upon re-deserialization (by either serializing it
> in snapshot or paying special attention to decrement it when removing cache)
> # Fix ZOOKEEPER-3306 patch so that we also override the ACL of node to new
> key if previous entry does not exist in the ACL table.
>
> I think solution 2 is nicer as aclIndex inconsistency itself is not a
> problem. With solution 1, we're still implicitly depending on aclIndex
> consistency and ordering of events. It's harder to reason about and it seems
> more fragile than solution 1.
> I'm going to send a patch for solution 2 but please let me know if you
> disagree and I'm happy to go with solution 1 instead.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)