[ https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107876#comment-15107876 ]

Anuj Wadehra edited comment on CASSANDRA-10446 at 1/20/16 3:08 AM:
-------------------------------------------------------------------

I think this is an important issue. We should increase the priority and change
the type from Improvement to Bug so that it gets due attention.

Consider the following scenario and flow of events, which demonstrate the
importance of this issue:

Scenario: I have a 20-node cluster with RF=5, QUORUM reads/writes, and a gc
grace period of 20 days. I believe my Cassandra cluster is fault tolerant and
can afford 2 node failures.

Suddenly, one node goes down due to a hardware issue. The failed node would
prevent repair on many nodes in the cluster, since it holds approximately a
5/20 share of the total data: 1/20 that it owns, plus 4/20 stored as replicas
of data owned by other nodes. Now it is 10 days since the node went down, most
of the nodes have not been repaired, and it is decision time for me. I am not
sure how soon the issue will be fixed; maybe in the next 2 days, i.e. 8 days
before the gc grace period ends, so I shouldn't remove the node prematurely
and add it back later, as that would cause significant and unnecessary
streaming due to token re-arrangement. At the same time, if I don't remove the
failed node now, i.e. 10 days after the failure (well before gc grace), and
instead wait for the issue to be resolved, the health of my entire system is
in question and it becomes a panic situation: most of the data has not been
repaired in the last 10 days, and gc grace is approaching. I need sufficient
time to repair all nodes before the gc grace period ends.
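
To make the arithmetic behind the 5/20 share concrete, here is a rough
back-of-the-envelope sketch (assuming a perfectly even token and replica
distribution, which real clusters only approximate):

{code:python}
# Rough estimate of how much repair work a single failed node blocks,
# assuming tokens and replicas are spread evenly across the cluster.
nodes = 20
rf = 5

# Each node owns 1/nodes of the ring and additionally holds replicas of
# (rf - 1)/nodes of the ring, so it participates in rf/nodes of all data.
share_owned = 1 / nodes              # 0.05
share_replicated = (rf - 1) / nodes  # 0.20
share_total = rf / nodes             # 0.25

print(f"Owned ranges:     {share_owned:.0%}")
print(f"Replica ranges:   {share_replicated:.0%}")
print(f"Total data share: {share_total:.0%}")
# While this one node stays down, roughly 25% of the ring cannot be
# repaired if repair refuses to run with any replica down.
{code}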

What looked like a fault-tolerant Cassandra cluster that could easily afford
2 node failures instead requires urgent attention and manual decision making
every time a single node goes down, just as it did in the scenario above.

If some replicas are down, we should allow repair to proceed with the
remaining replicas. If a failed node comes back up before the gc grace period
ends, we would run repair to fix the inconsistencies; otherwise, we would
discard the failed node's data and bootstrap it afresh. I think that would
make for a really robust, fault-tolerant system.
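
For illustration only, here is a minimal sketch of the behavior I am proposing
(the helper names are hypothetical, not Cassandra's actual internals): repair
of a token range would proceed against whichever replicas are alive, instead
of aborting outright:

{code:python}
# Hypothetical sketch of "repair with down replicas": instead of failing
# when any replica of a range is down, repair the live subset and leave
# the down replica to be repaired (or replaced) later.
def plan_repair(range_replicas, is_alive):
    """range_replicas: mapping of token range -> list of replica endpoints.
    is_alive: predicate that says whether an endpoint is currently up.
    Returns the ranges that can still be repaired, with their live replicas."""
    plan = {}
    for token_range, replicas in range_replicas.items():
        live = [r for r in replicas if is_alive(r)]
        if len(live) >= 2:  # need at least two live replicas to compare
            plan[token_range] = live
        # else: skip the range for now; nothing useful to compare
    return plan

# Tiny example (RF=3 here for brevity): node 10.0.0.5 is down, but the
# ranges it replicates are still repaired among the remaining live replicas.
ranges = {"(0, 100]": ["10.0.0.1", "10.0.0.2", "10.0.0.5"],
          "(100, 200]": ["10.0.0.2", "10.0.0.3", "10.0.0.4"]}
down = {"10.0.0.5"}
print(plan_repair(ranges, lambda endpoint: endpoint not in down))
{code}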





> Run repair with down replicas
> -----------------------------
>
>                 Key: CASSANDRA-10446
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10446
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Priority: Minor
>             Fix For: 3.x
>
>
> We should have an option of running repair when replicas are down. We can 
> call it -force.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
