[ https://issues.apache.org/jira/browse/OAK-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Egli updated OAK-10658:
------------------------------
    Description: 
Consider the following series of events (in the context of late-writes / 
OAK-10254) :
* a tree structure /a/b/c/d is created properly
* the subtree /a/b is half-removed in a late-write - i.e. it is not properly 
removed
* (at this point this late-write removal would still be detected)
* then a different cluster instance starts up, reusing the clusterId from the 
above crashed instance
* that cluster instance then does a sweep, thereby implicitly marking those 
late-writes as committed (the commit value then resolves to "c" implicitly, as 
they are older than sweepRev), so /a/b is now actually deleted. In other words, 
the late-write revision that was never merged is now suddenly considered merged.
** the side-effect of this sweep is that the state of a revision has suddenly 
changed. Unless the nodesCache is properly invalidated, it now contains the 
wrong state: e.g. that /a/b/c/d exists even though it no longer does.
* next, that cluster instance does a classic GC, which deletes the /a/b 
subtree documents (but fails to properly invalidate that subtree in the caches)
* that cluster instance then tries to create /a/b/c/d/e
** this attempt fails with a ConflictException, since one part of the code 
(the nodesCache) expects /a/b/c/d to exist while another part (the 
documentStore) says it does not, hence "The node 4:/a/b/c/d does not exist or 
is already deleted at base revision"

  was:
Consider the following series of events (in the context of late-writes / 
OAK-10254) :
* a tree structure /a/b/c/d is created properly
* the subtree /a/b is half-removed in a late-write - i.e. it is not properly 
removed
* (at this point this late-write removal would still be detected)
* then a different cluster instance starts up, reusing the clusterId from the 
above crashed instance
* that cluster instance then does a sweep - thereby implicitly marking those 
late-writes as committed (commit value then resolves to "c" implicitly as they 
are older than sweepRev) - so /a/b is now actually deleted. IOW the late-write 
revision that was never merged, now all of a sudden is considered merged)
** the side-effect of this (sweep) situation is that the state of a revision 
all of a sudden has changed. But unless the nodesCache is properly invalidated, 
it would now contain the wrong state : that eg /a/b/c/d exists even though it 
now no longer does.
* now that cluster instance then tries to create /a/b/c/d/e
** this attempt fails with a ConflictException since part of the code now 
expects /a/b/c/d to exist (the nodesCache) - but another part says it doesn't 
exist - hence "The node 4:/a/b/c/d does not exist or is already deleted at base 
revision"


> Missing cache invalidation after a late-write-then-sweep
> --------------------------------------------------------
>
>                 Key: OAK-10658
>                 URL: https://issues.apache.org/jira/browse/OAK-10658
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: documentmk
>            Reporter: Stefan Egli
>            Priority: Major
>
> Consider the following series of events (in the context of late-writes / 
> OAK-10254) :
> * a tree structure /a/b/c/d is created properly
> * the subtree /a/b is half-removed in a late-write - i.e. it is not properly 
> removed
> * (at this point this late-write removal would still be detected)
> * then a different cluster instance starts up, reusing the clusterId from the 
> above crashed instance
> * that cluster instance then does a sweep, thereby implicitly marking those 
> late-writes as committed (the commit value then resolves to "c" implicitly, 
> as they are older than sweepRev), so /a/b is now actually deleted. In other 
> words, the late-write revision that was never merged is now suddenly 
> considered merged.
> ** the side-effect of this sweep is that the state of a revision has suddenly 
> changed. Unless the nodesCache is properly invalidated, it now contains the 
> wrong state: e.g. that /a/b/c/d exists even though it no longer does.
> * next, that cluster instance does a classic GC, which deletes the /a/b 
> subtree documents (but fails to properly invalidate that subtree in the 
> caches)
> * that cluster instance then tries to create /a/b/c/d/e
> ** this attempt fails with a ConflictException, since one part of the code 
> (the nodesCache) expects /a/b/c/d to exist while another part (the 
> documentStore) says it does not, hence "The node 4:/a/b/c/d does not exist 
> or is already deleted at base revision"
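
The sequence described above can be sketched with a toy model. This is an illustration only, not Oak code: StaleCacheSketch and all of its methods are hypothetical names, the "documentStore" is a plain set of paths and the "nodesCache" is a plain map of cached existence answers. The sweep and the classic GC are collapsed into a single step that removes the /a/b subtree from the store, with a flag that controls whether the cache is invalidated as well.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical names, not Oak classes or Oak APIs.
final class StaleCacheSketch {

    // Stand-in for the documentStore: the authoritative set of existing paths.
    private final Set<String> documentStore = new HashSet<>();
    // Stand-in for the nodesCache: cached "does this path exist?" answers.
    private final Map<String, Boolean> nodesCache = new HashMap<>();

    void create(String path) {
        documentStore.add(path);
    }

    // Reads go through the cache; a miss falls back to the store.
    boolean existsCached(String path) {
        return nodesCache.computeIfAbsent(path, documentStore::contains);
    }

    private static boolean inSubtree(String path, String root) {
        return path.equals(root) || path.startsWith(root + "/");
    }

    // Sweep + GC collapsed into one step: the subtree disappears from the store.
    void removeSubtree(String root, boolean invalidateCache) {
        documentStore.removeIf(p -> inSubtree(p, root));
        if (invalidateCache) {
            nodesCache.keySet().removeIf(p -> inSubtree(p, root));
        }
    }

    // The cache and the store are consulted separately, as in the report.
    String addChild(String parent, String name) {
        boolean cachedExists = existsCached(parent);
        boolean storedExists = documentStore.contains(parent);
        if (cachedExists && !storedExists) {
            return "CONFLICT";        // cache and store disagree
        }
        if (!cachedExists) {
            return "PARENT_MISSING";  // consistent view: parent is gone
        }
        documentStore.add(parent + "/" + name);
        return "CREATED";
    }

    static String run(boolean invalidateCache) {
        StaleCacheSketch s = new StaleCacheSketch();
        for (String p : new String[] {"/a", "/a/b", "/a/b/c", "/a/b/c/d"}) {
            s.create(p);
        }
        s.existsCached("/a/b/c/d");               // cache now says "exists"
        s.removeSubtree("/a/b", invalidateCache); // late-write removal takes effect
        return s.addChild("/a/b/c/d", "e");
    }

    public static void main(String[] args) {
        System.out.println("without invalidation: " + run(false)); // CONFLICT
        System.out.println("with invalidation: " + run(true));     // PARENT_MISSING
    }
}
```

In this sketch the conflict only arises when the cache is left untouched. With invalidation, both views agree that the parent is gone, and the caller sees one consistent answer instead of a contradiction.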



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
