[ https://issues.apache.org/jira/browse/CASSANDRA-14763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620283#comment-16620283 ]

Marcus Eriksson commented on CASSANDRA-14763:
---------------------------------------------

A few comments:

* The error message from the failing nodetool command could be more helpful:
{{Repair job has failed with the error message: [2018-09-19 10:01:51,386] null}}.
Maybe we could add a hint telling the user to check the logs for further details.
* There is a comment about isPending() on the commit on GitHub.
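
A minimal sketch of how the message could be improved, assuming the trailing "null" comes from an exception whose getMessage() returned null. RepairFailureMessage and format are hypothetical names, not Cassandra's actual API; the helper falls back to the exception type and always points the user at the logs.

```java
// Hypothetical helper, not Cassandra's API: formats the text a failing repair command prints.
public final class RepairFailureMessage {
    private RepairFailureMessage() {}

    static String format(String timestamp, Throwable cause) {
        String detail;
        if (cause == null) {
            detail = "unknown error";
        } else if (cause.getMessage() == null) {
            // Throwable.getMessage() can return null; use the exception type
            // instead of printing the literal string "null".
            detail = cause.getClass().getSimpleName();
        } else {
            detail = cause.getMessage();
        }
        return "Repair job has failed with the error message: [" + timestamp + "] " + detail
                + ". Check the logs on the coordinator and participating nodes for further details.";
    }

    public static void main(String[] args) {
        System.out.println(format("2018-09-19 10:01:51,386", new RuntimeException()));
    }
}
```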

I wrote a dtest that makes sure we throw an exception if this happens:
https://github.com/krummas/cassandra-dtest/commits/marcuse/14763

It also looks like a few repair dtests need fixing.

> Fail incremental repair prepare phase if it encounters sstables from 
> un-finalized sessions
> ------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14763
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14763
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Repair
>            Reporter: Blake Eggleston
>            Assignee: Blake Eggleston
>            Priority: Major
>             Fix For: 4.0
>
>
> Raised in CASSANDRA-14685. If we encounter sstables from other IR sessions 
> during an IR prepare phase, we should fail the new session. If we don't, the 
> expectation that all data received before a repair session is consistent when 
> it completes wouldn't always be true.
> In more detail: 
> We don’t have a foolproof way of determining if a repair session has hung. To 
> prevent hung repair sessions from locking up sstables indefinitely, 
> incremental repair sessions will auto-fail after 24 hours. During this time, 
> the sstables for this session will remain isolated from the rest of the data 
> set. Afterwards, the sstables are moved back into the unrepaired set.
>  
> During the prepare phase of an incremental repair, we isolate the data to be 
> repaired. However, we ignore other sstables marked pending repair for the 
> same token range. I think the intention here was to prevent a hung repair 
> from locking up incremental repairs for 24 hours without manual intervention. 
> Assuming the session succeeds, its data will be moved to repaired. _However 
> the data from a hung session will eventually be moved back to unrepaired._ 
> This means that you can’t use the most recent successful incremental repair 
> as the high water mark for fully repaired data.
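
The check proposed in the description can be sketched as below. This is a minimal, hypothetical model, not Cassandra's implementation: PendingRepairCheck, SSTable and checkNoUnfinalizedPending are invented names, and SessionState only loosely mirrors an incremental repair session lifecycle. It assumes the prepare phase can see each candidate sstable's pending-repair session id and that session's state; any sstable pending for a different session that is not yet FINALIZED (including one whose state is unknown) fails the new session instead of being skipped.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;

public final class PendingRepairCheck {
    // Hypothetical states, loosely modeled on an incremental repair session lifecycle.
    enum SessionState { PREPARING, PREPARED, REPAIRING, FINALIZE_PROMISED, FINALIZED, FAILED }

    // Minimal stand-in for an sstable: a name plus the session it is pending repair for.
    static final class SSTable {
        final String name;
        final UUID pendingRepair; // null when the sstable is not pending repair

        SSTable(String name, UUID pendingRepair) {
            this.name = name;
            this.pendingRepair = pendingRepair;
        }
    }

    // Throws if any candidate sstable is pending repair for a different, un-finalized session.
    static void checkNoUnfinalizedPending(UUID newSession,
                                          List<SSTable> candidates,
                                          Map<UUID, SessionState> sessions) {
        for (SSTable sstable : candidates) {
            UUID other = sstable.pendingRepair;
            if (other == null || other.equals(newSession)) {
                continue; // unrepaired data, or data already claimed by this session
            }
            SessionState state = sessions.get(other);
            if (state != SessionState.FINALIZED) {
                // Skipping this sstable would let data written before the new session
                // fall back to unrepaired if the other session fails or times out.
                throw new IllegalStateException("Cannot prepare incremental repair session " + newSession
                        + ": sstable " + sstable.name + " is pending repair for un-finalized session "
                        + other + " (state: " + (state == null ? "unknown" : state) + ")");
            }
        }
    }

    public static void main(String[] args) {
        UUID mine = UUID.randomUUID();
        UUID hung = UUID.randomUUID();
        List<SSTable> candidates = List.of(new SSTable("sstable-1", null), new SSTable("sstable-2", hung));
        try {
            checkNoUnfinalizedPending(mine, candidates, Map.of(hung, SessionState.REPAIRING));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Treating an unknown session as un-finalized is a conservative choice: it may fail a repair unnecessarily, but it never lets the new session complete while older data sits outside it.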



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
