[ 
https://issues.apache.org/jira/browse/CASSANDRA-11264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241220#comment-15241220
 ] 

Paulo Motta commented on CASSANDRA-11264:
-----------------------------------------

After looking at your original patch, I saw that a failed task is 
re-prioritized against the other scheduled jobs/tasks with a high priority 
(because its last run time is not updated), so that is already a retry 
mechanism in itself.
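
To illustrate why skipping the last-run update acts as an implicit retry, here 
is a minimal sketch. It assumes that the scheduler simply picks the task with 
the oldest last run time; the class `LastRunScheduler` and its methods are 
hypothetical names for illustration and are not taken from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical scheduler sketch; not the code from the patch. */
final class LastRunScheduler {
    // Task name -> last successful run time (0 = never run).
    // LinkedHashMap keeps registration order so ties are broken deterministically.
    private final Map<String, Long> lastRun = new LinkedHashMap<>();

    void register(String task) {
        lastRun.putIfAbsent(task, 0L);
    }

    /** Returns the task with the oldest last run time, or null if none is registered. */
    String next() {
        String best = null;
        long bestTime = Long.MAX_VALUE;
        for (Map.Entry<String, Long> e : lastRun.entrySet()) {
            if (e.getValue() < bestTime) {
                best = e.getKey();
                bestTime = e.getValue();
            }
        }
        return best;
    }

    /** Only a successful run updates the last run time (assumed behaviour). */
    void complete(String task, boolean success, long now) {
        if (success) {
            lastRun.put(task, now);
        }
    }
}
```

With two registered tasks, a failed run of the first one leaves its last run 
time untouched, so `next()` returns it again before moving on to the second.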

Rather than cluttering the scheduled repair mechanism with retry logic, I think 
it's better to add a retry option to the (non-scheduled) repair job and do 
finer-grained retries on individual steps such as validation and sync. This 
will be more effective against transient failures than retrying the whole task 
and potentially losing the work of non-failed tasks.
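
As a rough sketch of what per-step retry could look like, the helper below 
wraps a single step and retries only that step. Everything here is an 
assumption for illustration: `StepRetry`, `runWithRetry`, the step names, and 
the use of `System.err` in place of a real logger are hypothetical and not 
part of Cassandra's repair code.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-step retry helper; not part of Cassandra. */
final class StepRetry {
    private StepRetry() {}

    /**
     * Runs one step, retrying it up to maxRetries additional times.
     * Warns on each failed attempt that will be retried and rethrows
     * the last failure once the retries are exhausted.
     */
    static <T> T runWithRetry(String stepName, int maxRetries, Callable<T> step) throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return step.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxRetries) {
                    System.err.println("WARN: " + stepName + " failed on attempt "
                            + (attempt + 1) + ", retrying: " + e.getMessage());
                }
            }
        }
        System.err.println("ERROR: " + stepName + " failed after "
                + (maxRetries + 1) + " attempts");
        throw last;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger validationRuns = new AtomicInteger();
        AtomicInteger syncRuns = new AtomicInteger();

        // Validation succeeds on the first attempt.
        String trees = runWithRetry("validation", 2, () -> {
            validationRuns.incrementAndGet();
            return "merkle-trees";
        });

        // Sync fails twice with a simulated transient error, then succeeds.
        String result = runWithRetry("sync", 2, () -> {
            if (syncRuns.incrementAndGet() < 3) {
                throw new IOException("simulated transient stream failure");
            }
            return "synced using " + trees;
        });

        System.out.println(result + " (validation runs: " + validationRuns.get()
                + ", sync runs: " + syncRuns.get() + ")");
    }
}
```

In this example the validation step runs once and only the sync step is 
repeated, so the validation work is not lost when sync hits a transient error.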

We can of course log warnings and gather statistics when a scheduled task 
fails, but I think we should add retry support to repair independently of 
this. WDYT?

> Repair scheduling - Failure handling and retry
> ----------------------------------------------
>
>                 Key: CASSANDRA-11264
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11264
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Marcus Olsson
>            Assignee: Marcus Olsson
>            Priority: Minor
>
> Make it possible for repairs to be run again if they fail and clean up the 
> associated resources (validations and streaming sessions) before retrying. 
> Log a warning for each re-attempt and an error if it can't complete in X 
> times. The number of retries before considering the repair a failure could be 
> configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
