[ https://issues.apache.org/jira/browse/OAK-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810909#comment-17810909 ]
Stefan Egli commented on OAK-10595:
-----------------------------------

{quote}another approach might be a change to what sweep does in the first place? I assume that's either not possible or too intrusive?{quote}
I don't see a way to change sweep itself. Maybe we could look into keeping the collisions around, and make sure the CommitValueResolver looks out for revisions in collisions. I was under the impression that adding more functionality (to collisions & CommitValueResolver) would make things more complicated than sending an invalidate journal entry. Of course the invalidate itself does have a (Schroedinger-like) influence on the caches (they are invalidated, hence could in theory be read shortly thereafter in a perhaps critical moment), so even this change poses a risk (which is why I think the tests must be thorough enough). How that risk compares to adjusting sweep/collision handling more broadly is up for discussion - I'd say it's lower.

> Cached data before a collision rollback can be read as committed
> ----------------------------------------------------------------
>
>                 Key: OAK-10595
>                 URL: https://issues.apache.org/jira/browse/OAK-10595
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: documentmk
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>            Priority: Major
>              Labels: candidate_oak_1_22
>
> There is a race condition between a collision rollback and
> MongoDocumentStore's nodesCache, which can leak uncommitted data that later
> gets treated as if committed.
> Under normal circumstances, a collision is properly cleaned up via a rollback:
> all colliding data written is removed, and the revision was never marked as
> committed in the first place. Without the revision marked as committed,
> no-one would know of that revision - i.e. it could not be read, since that
> clusterId doesn't update the parent's lastRevs etc. Subsequent updates
> on the involved documents cause the caches to be updated accordingly, after
> which all traces of a collision rollback are gone.
> But if a peer cluster node manages to read and cache uncommitted data that is
> later rolled back due to a collision, it can happen that it treats that data
> as if committed.
> This situation only persists as long as that process is running, since it
> depends on cached data. The data in the physical repository is always
> consistent. So a restart will cause that uncommitted data to disappear again.
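
To make the cache side of the race concrete, here is a minimal toy sketch in plain Java. It does not use any Oak classes: `store`, `peerCache`, `readThroughCache`, `rollback` and `invalidate` are hypothetical names invented for this illustration, and `invalidate` only stands in for the effect that an invalidate journal entry is meant to have on a peer's cache. How the stale entry then ends up being treated as committed is not modelled here.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionRollbackSketch {

    // Hypothetical stand-in for the persisted documents:
    // document id -> (revision -> value). Not an Oak data structure.
    static final Map<String, Map<String, String>> store = new HashMap<>();

    // Hypothetical stand-in for a peer's document cache.
    // It holds private copies, just like a cache holds snapshots.
    static final Map<String, Map<String, String>> peerCache = new HashMap<>();

    // Read a document through the cache, loading a snapshot on a miss.
    static Map<String, String> readThroughCache(String id) {
        return peerCache.computeIfAbsent(id, k -> {
            Map<String, String> persisted = store.get(k);
            return persisted == null
                    ? new HashMap<String, String>()
                    : new HashMap<String, String>(persisted);
        });
    }

    // Collision rollback: remove the uncommitted change from the store only.
    static void rollback(String id, String revision) {
        Map<String, String> doc = store.get(id);
        if (doc != null) {
            doc.remove(revision);
        }
    }

    // Assumed effect of processing an invalidation: evict the cached snapshot.
    static void invalidate(String id) {
        peerCache.remove(id);
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("r1", "uncommitted value");
        store.put("/a", doc);

        // The peer reads and caches the document while r1 is still present.
        System.out.println("cached before rollback: "
                + readThroughCache("/a").containsKey("r1"));

        // The writer rolls back r1; the store is consistent again.
        rollback("/a", "r1");
        System.out.println("cached after rollback:  "
                + readThroughCache("/a").containsKey("r1"));

        // Evicting the snapshot forces a fresh read from the store.
        invalidate("/a");
        System.out.println("cached after invalidate: "
                + readThroughCache("/a").containsKey("r1"));
    }
}
```

In this toy model the peer still sees `r1` after the rollback because nothing told it to drop its snapshot; only the eviction brings it back in line with the store. A restart has the same effect, since it empties the cache.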