[ https://issues.apache.org/jira/browse/OAK-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stefan Egli updated OAK-10658:
------------------------------
Description:
Consider the following series of events (in the context of late-writes / OAK-10254):
* a tree structure /a/b/c/d is created properly
* the subtree /a/b is half-removed in a late-write, i.e. it is not properly removed
* (at this point the late-write removal would still be detected)
* then a different cluster instance starts up, reusing the clusterId of the crashed instance that did the late-write
* that cluster instance then does a sweep, thereby implicitly marking those late-writes as committed (their commit value resolves implicitly to "c" because they are older than sweepRev), so /a/b is now actually deleted. In other words, the late-write revision that was never merged is suddenly considered merged.
** the side effect of this sweep is that the state of a revision has suddenly changed. Unless the nodesCache is properly invalidated, it now contains the wrong state, e.g. that /a/b/c/d exists even though it no longer does.
* next, that cluster instance does a classic GC, which deletes the /a/b subtree documents (but fails to properly invalidate that subtree in the caches)
* now that cluster instance tries to create /a/b/c/d/e
** this attempt fails with a ConflictException, since one part of the code expects /a/b/c/d to exist (the nodesCache) while another part says it does not (the documentStore), hence "The node 4:/a/b/c/d does not exist or is already deleted at base revision"

was:
Consider the following series of events (in the context of late-writes / OAK-10254):
* a tree structure /a/b/c/d is created properly
* the subtree /a/b is half-removed in a late-write, i.e. it is not properly removed
* (at this point the late-write removal would still be detected)
* then a different cluster instance starts up, reusing the clusterId of the crashed instance that did the late-write
* that cluster instance then does a sweep, thereby implicitly marking those late-writes as committed (their commit value resolves implicitly to "c" because they are older than sweepRev), so /a/b is now actually deleted. In other words, the late-write revision that was never merged is suddenly considered merged.
** the side effect of this sweep is that the state of a revision has suddenly changed. Unless the nodesCache is properly invalidated, it now contains the wrong state, e.g. that /a/b/c/d exists even though it no longer does.
* now that cluster instance tries to create /a/b/c/d/e
** this attempt fails with a ConflictException, since one part of the code expects /a/b/c/d to exist (the nodesCache) while another part says it does not, hence "The node 4:/a/b/c/d does not exist or is already deleted at base revision"

> Missing cache invalidation after a late-write-then-sweep
> --------------------------------------------------------
>
>                 Key: OAK-10658
>                 URL: https://issues.apache.org/jira/browse/OAK-10658
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: documentmk
>            Reporter: Stefan Egli
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
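The failure described in this issue comes down to a cache whose entries were derived from commit state that a later sweep changes after the fact. The following is a toy Java (17+) model of that mechanism only, not Oak's DocumentNodeStore: the classes `LateWriteSweepDemo`, `ToyStore` and `Change`, the integer revisions, the `path@readRev` cache key, the "half-removed" late-write modelled as a single uncommitted deletion of /a/b, and the subtree eviction in `sweep` are all hypothetical simplifications. `IllegalStateException` stands in for Oak's ConflictException, and the GC step is left out because in this model the stale cache entry alone is enough to make the two reads disagree.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model (not Oak's DocumentNodeStore) of a nodes cache that goes
// stale when a sweep retroactively turns an uncommitted change into a committed one.
public class LateWriteSweepDemo {

    // One change to a node document: created or deleted at an integer revision.
    record Change(long revision, int clusterId, boolean deleted, boolean explicitlyCommitted) {}

    static class ToyStore {
        final Map<String, List<Change>> docs = new HashMap<>();
        final Map<Integer, Long> sweepRevs = new HashMap<>();
        final Map<String, Boolean> nodesCache = new HashMap<>(); // key: path@readRev
        final boolean invalidateOnSweep;

        ToyStore(boolean invalidateOnSweep) { this.invalidateOnSweep = invalidateOnSweep; }

        void write(String path, Change c) { docs.computeIfAbsent(path, p -> new ArrayList<>()).add(c); }

        // Committed if explicitly marked, or implicitly (like the "c" value) when the
        // change is not newer than the sweep revision of the cluster that wrote it.
        boolean isCommitted(Change c) {
            return c.explicitlyCommitted()
                    || c.revision() <= sweepRevs.getOrDefault(c.clusterId(), -1L);
        }

        // Uncached read: the node and every ancestor need a committed, non-deleted
        // latest state at readRev.
        boolean existsInStore(String path, long readRev) {
            for (String p = path; !p.isEmpty(); p = p.substring(0, p.lastIndexOf('/'))) {
                Change latest = docs.getOrDefault(p, List.of()).stream()
                        .filter(c -> c.revision() <= readRev && isCommitted(c))
                        .max(Comparator.comparingLong(Change::revision))
                        .orElse(null);
                if (latest == null || latest.deleted()) {
                    return false;
                }
            }
            return true;
        }

        boolean existsCached(String path, long readRev) {
            return nodesCache.computeIfAbsent(path + "@" + readRev, k -> existsInStore(path, readRev));
        }

        // Sweep: changes of clusterId up to newSweepRev become implicitly committed.
        // With invalidateOnSweep, cached entries for the affected paths and all their
        // descendants are evicted; without it, the cache keeps the pre-sweep answer.
        void sweep(int clusterId, long newSweepRev) {
            List<String> affected = docs.entrySet().stream()
                    .filter(e -> e.getValue().stream().anyMatch(c -> c.clusterId() == clusterId
                            && !isCommitted(c) && c.revision() <= newSweepRev))
                    .map(Map.Entry::getKey)
                    .toList();
            sweepRevs.put(clusterId, newSweepRev);
            if (invalidateOnSweep) {
                nodesCache.keySet().removeIf(key -> {
                    String cachedPath = key.substring(0, key.lastIndexOf('@'));
                    return affected.stream().anyMatch(a -> cachedPath.equals(a) || cachedPath.startsWith(a + "/"));
                });
            }
        }

        // Creating a child checks the parent via the cache and via the store; a
        // disagreement is reported as a conflict (stand-in for ConflictException).
        boolean addChild(String parent, String name, long baseRev) {
            boolean cached = existsCached(parent, baseRev);
            boolean stored = existsInStore(parent, baseRev);
            if (cached != stored) {
                throw new IllegalStateException("The node " + parent
                        + " does not exist or is already deleted at base revision");
            }
            if (stored) {
                write(parent + "/" + name, new Change(baseRev + 1, 1, false, true));
            }
            return stored; // false: consistent "parent is gone" answer, nothing created
        }
    }

    static ToyStore scenario(boolean invalidateOnSweep) {
        ToyStore s = new ToyStore(invalidateOnSweep);
        for (String p : List.of("/a", "/a/b", "/a/b/c", "/a/b/c/d")) {
            s.write(p, new Change(1, 1, false, true)); // created and committed by cluster 1
        }
        s.write("/a/b", new Change(5, 2, true, false)); // late-write deletion, never committed
        s.existsCached("/a/b/c/d", 10);                 // read still sees the node and caches it
        s.sweep(2, 6);                                  // new instance reuses clusterId 2 and sweeps
        return s;
    }

    public static void main(String[] args) {
        System.out.println("with invalidation, created: " + scenario(true).addChild("/a/b/c/d", "e", 10));
        try {
            scenario(false).addChild("/a/b/c/d", "e", 10);
        } catch (IllegalStateException e) {
            System.out.println("without invalidation: " + e.getMessage());
        }
    }
}
```

In this sketch, evicting every cached entry at or below a swept path is the bluntest possible invalidation; it is chosen only because it makes the cached and uncached reads agree again after the sweep, and says nothing about how the actual fix in Oak is scoped.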