[ 
https://issues.apache.org/jira/browse/OAK-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810909#comment-17810909
 ] 

Stefan Egli commented on OAK-10595:
-----------------------------------

{quote}another approach might be a change to what sweep does in the first 
place? I assume that's either not possible or too intrusive?{quote}
I don't see a way to change sweep itself. Maybe we could look into keeping the 
collisions around - and make sure the CommitValueResolver looks out for 
revisions in collisions.

I was under the impression that adding more functionality (to collisions & 
CommitValueResolver) would make things more complicated than sending an 
invalidate journal entry. Of course the invalidate itself does have a 
(Schroedinger-like) influence on the caches (they are invalidated, hence the 
documents could in theory be re-read shortly thereafter, perhaps at a critical 
moment), so even this change poses a risk (which is why I think the tests must 
be thorough enough). How that risk compares to adjusting sweep/collision 
handling more broadly is up for discussion - I'd say it's lower.
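
To make the "invalidate journal entry" idea concrete, here is a minimal toy sketch. It assumes a shared journal that peers poll; every name in it (InvalidateJournalSketch, Peer, rollbackAndInvalidate, readJournal) is hypothetical and not an Oak API. It only shows the intended effect: after a rollback, the ids listed in the journal entry get evicted from a peer's cache.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only: none of these names are Oak APIs. */
class InvalidateJournalSketch {
    /** Shared journal: each entry lists document ids whose cached state should be dropped. */
    static final List<Set<String>> journal = new ArrayList<>();

    /** A peer with a local document cache (id -> cached value) and a journal read position. */
    static class Peer {
        final Map<String, String> cache = new HashMap<>();
        int journalPos = 0;

        /** Read all new journal entries and evict every id they mention. */
        void readJournal() {
            while (journalPos < journal.size()) {
                for (String id : journal.get(journalPos++)) {
                    cache.remove(id);
                }
            }
        }
    }

    /** The writer has rolled back a collided change and announces the affected ids. */
    static void rollbackAndInvalidate(Set<String> ids) {
        journal.add(ids);
    }

    public static void main(String[] args) {
        Peer peer = new Peer();
        peer.cache.put("/a", "uncommitted-value");         // cached before the rollback
        rollbackAndInvalidate(Collections.singleton("/a"));
        peer.readJournal();
        System.out.println(peer.cache.containsKey("/a"));  // false
    }
}
```

The sketch also shows where the remaining risk sits: between the eviction and the next read, the peer has to reload the document from the store.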

> Cached data before a collision rollback can be read as committed
> ----------------------------------------------------------------
>
>                 Key: OAK-10595
>                 URL: https://issues.apache.org/jira/browse/OAK-10595
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: documentmk
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>            Priority: Major
>              Labels: candidate_oak_1_22
>
> There is a race condition between a collision rollback and 
> MongoDocumentStore's nodesCache, which can leak uncommitted data that later 
> gets treated as if committed.
> Under normal circumstances, a collision is properly cleaned up via a 
> rollback: all colliding data written is removed, and the revision is never 
> marked as committed in the first place. Without the revision marked as 
> committed, no-one would know of that revision - i.e. it could not be read, 
> since that clusterId doesn't update the parent's lastRevs etc. Subsequent 
> updates on involved documents result in caches being updated accordingly, 
> after which all traces of a collision rollback are gone.
> But if a peer cluster manages to read and cache uncommitted data that is 
> later rolled back due to a collision, it can happen that it treats that data 
> as if committed.
> This situation only persists as long as that process is running - since this 
> is dependent on cached data. The data in the physical repository is always 
> consistent. So a restart will cause that uncommitted data to disappear again.
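
The sequence described above can be sketched with a toy model. This is only an illustration under simplifying assumptions: the class and method names are made up and are not Oak classes, and it models just the stale cache entry, not how the revision later comes to be resolved as committed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Toy model of the race; the names are illustrative, not Oak classes. */
class CollisionRaceSketch {
    /** Backing store: document id -> (revision -> value), including uncommitted revisions. */
    static final Map<String, Map<String, String>> store = new HashMap<>();

    /** A peer that reads through a local cache and never re-validates cached copies. */
    static class Peer {
        final Map<String, Map<String, String>> cache = new HashMap<>();

        Map<String, String> read(String id) {
            return cache.computeIfAbsent(id,
                k -> new HashMap<>(store.getOrDefault(k, Collections.emptyMap())));
        }
    }

    /** Writer stores a (not yet committed) change under the given revision. */
    static void write(String id, String rev, String value) {
        store.computeIfAbsent(id, k -> new HashMap<>()).put(rev, value);
    }

    /** Collision rollback: remove the change again from the backing store. */
    static void rollback(String id, String rev) {
        Map<String, String> doc = store.get(id);
        if (doc != null) {
            doc.remove(rev);
        }
    }

    public static void main(String[] args) {
        Peer peer = new Peer();
        write("/a", "r1", "colliding");   // uncommitted change is written
        peer.read("/a");                  // peer caches it before the rollback
        rollback("/a", "r1");             // collision detected, change removed

        System.out.println(store.get("/a").containsKey("r1"));        // false
        System.out.println(peer.read("/a").containsKey("r1"));        // true
        System.out.println(new Peer().read("/a").containsKey("r1"));  // false
    }
}
```

The last line stands in for a restart: a peer with a fresh cache only sees what is in the store, so the rolled-back revision is gone.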



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
