[ https://issues.apache.org/jira/browse/CASSANDRA-14763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620283#comment-16620283 ]
Marcus Eriksson commented on CASSANDRA-14763:
---------------------------------------------

A few comments:
* The error message given by the failing nodetool could be a bit better: {{Repair job has failed with the error message: [2018-09-19 10:01:51,386] null}}. Maybe we could add that the user should have a look in the logs for further details.
* I left a comment about isPending() on the commit on GitHub.

I wrote a dtest making sure that we throw an exception if this happens: https://github.com/krummas/cassandra-dtest/commits/marcuse/14763

It also looks like a few repair dtests need fixing.

> Fail incremental repair prepare phase if it encounters sstables from
> un-finalized sessions
> ------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14763
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14763
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Repair
>            Reporter: Blake Eggleston
>            Assignee: Blake Eggleston
>            Priority: Major
>             Fix For: 4.0
>
>
> Raised in CASSANDRA-14685. If we encounter sstables from other IR sessions
> during an IR prepare phase, we should fail the new session. If we don't, the
> expectation that all data received before a repair session is consistent when
> it completes wouldn't always be true.
> In more detail:
> We don’t have a foolproof way of determining if a repair session has hung. To
> prevent hung repair sessions from locking up sstables indefinitely,
> incremental repair sessions will auto-fail after 24 hours. During this time,
> the sstables for this session will remain isolated from the rest of the data
> set. Afterwards, the sstables are moved back into the unrepaired set.
>
> During the prepare phase of an incremental repair, we isolate the data to be
> repaired. However, we ignore other sstables marked pending repair for the
> same token range. I think the intention here was to prevent a hung repair
> from locking up incremental repairs for 24 hours without manual intervention.
> Assuming the session succeeds, its data will be moved to repaired. _However
> the data from a hung session will eventually be moved back to unrepaired._
> This means that you can’t use the most recent successful incremental repair
> as the high water mark for fully repaired data.
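
The prepare-phase behaviour proposed in the issue can be sketched roughly as below. This is a minimal, hypothetical Python model and not Cassandra's Java implementation: `SSTable`, `PrepareFailed`, and `prepare` are names invented for illustration, and `pending_repair` is assumed to hold the id of an un-finalized session (or `None` when the sstable is not pending repair).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an sstable; not Cassandra's real class."""
    name: str
    # Assumption: id of the un-finalized session holding this sstable, else None.
    pending_repair: Optional[str] = None


class PrepareFailed(Exception):
    """Raised when the prepare phase must fail the new session."""


def prepare(session_id: str, candidates: Iterable[SSTable]) -> List[SSTable]:
    """Return the sstables to isolate for session_id.

    Instead of silently ignoring sstables that belong to another
    un-finalized session, fail the new session.
    """
    selected = []
    for sstable in candidates:
        other = sstable.pending_repair
        if other is not None and other != session_id:
            raise PrepareFailed(
                f"{sstable.name} is pending repair in un-finalized session "
                f"{other}; see the logs for further details"
            )
        selected.append(sstable)
    return selected
```

With this rule, a new session either isolates every candidate sstable for its ranges or fails up front, so a successful session is not completed alongside data that a hung session may later return to unrepaired.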