[ https://issues.apache.org/jira/browse/CASSANDRA-14763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620926#comment-16620926 ]
Blake Eggleston commented on CASSANDRA-14763:
---------------------------------------------

Pushed up fixes and started a new circle run. Good catch with the race condition. I realized there’s another race: another pending anti-compaction could complete after we’d filtered our sstables and before we locked them, potentially leaking data from one session to another. I fixed that in AcquisitionCallback.

> Fail incremental repair prepare phase if it encounters sstables from un-finalized sessions
> ------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14763
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14763
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Repair
>            Reporter: Blake Eggleston
>            Assignee: Blake Eggleston
>            Priority: Major
>             Fix For: 4.0
>
>
> Raised in CASSANDRA-14685. If we encounter sstables from other IR sessions
> during an IR prepare phase, we should fail the new session. If we don't, the
> expectation that all data received before a repair session is consistent when
> it completes wouldn't always be true.
> In more detail:
> We don’t have a foolproof way of determining if a repair session has hung. To
> prevent hung repair sessions from locking up sstables indefinitely,
> incremental repair sessions will auto-fail after 24 hours. During this time,
> the sstables for this session will remain isolated from the rest of the data
> set. Afterwards, the sstables are moved back into the unrepaired set.
>
> During the prepare phase of an incremental repair, we isolate the data to be
> repaired. However, we ignore other sstables marked pending repair for the
> same token range. I think the intention here was to prevent a hung repair
> from locking up incremental repairs for 24 hours without manual intervention.
> Assuming the session succeeds, its data will be moved to repaired.
> _However, the data from a hung session will eventually be moved back to unrepaired._
> This means that you can’t use the most recent successful incremental repair
> as the high water mark for fully repaired data.
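
For readers following along, here is a rough sketch of the behaviour discussed above: fail the prepare phase when a candidate sstable belongs to another un-finalized session, and repeat that check once the lock is held so that nothing can slip in between filtering and locking. This is not the code from the patch and not what AcquisitionCallback actually does. Every name in it (PrepareCheckSketch, SSTable, select, acquire, PrepareFailedException) is hypothetical, and a plain monitor stands in for whatever locking Cassandra really uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch only; none of these types are Cassandra's real classes.
final class PrepareCheckSketch {

    /** Minimal stand-in for an sstable: a name plus the session that has claimed it. */
    static final class SSTable {
        final String name;
        volatile UUID pendingRepair; // null means unrepaired and not claimed by any session

        SSTable(String name, UUID pendingRepair) {
            this.name = name;
            this.pendingRepair = pendingRepair;
        }
    }

    static final class PrepareFailedException extends RuntimeException {
        PrepareFailedException(String message) {
            super(message);
        }
    }

    /**
     * Returns the sstables this session may take. Sstables held by a finalized
     * session are skipped; an sstable held by any other session fails the prepare.
     */
    static List<SSTable> select(List<SSTable> candidates, UUID session, Set<UUID> finalized) {
        List<SSTable> usable = new ArrayList<>();
        for (SSTable s : candidates) {
            UUID owner = s.pendingRepair;
            if (owner == null || owner.equals(session)) {
                usable.add(s);
            } else if (!finalized.contains(owner)) {
                throw new PrepareFailedException(
                    s.name + " is pending repair for un-finalized session " + owner);
            }
        }
        return usable;
    }

    /** Filters once to fail fast, then filters again under the lock before claiming. */
    static List<SSTable> acquire(List<SSTable> candidates, UUID session,
                                 Set<UUID> finalized, Object lock) {
        select(candidates, session, finalized); // early check, result discarded
        synchronized (lock) {
            // Ownership may have changed since the first pass, so check again here.
            List<SSTable> usable = select(candidates, session, finalized);
            for (SSTable s : usable) {
                s.pendingRepair = session;
            }
            return usable;
        }
    }
}
```

The second call to select is the part that matters for the race: the first pass only lets the session fail early, and the claim is made from the result computed while holding the lock.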