[ https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977089#comment-13977089 ]

Tupshin Harper commented on CASSANDRA-6696:
-------------------------------------------

bq. or we probably need to have a dynamic allocation strategy, and the problem 
with that is that when the token range gets redistributed by node 
additions/removals, the whole cluster suddenly needs to start kicking off 
rebalancing of their local disks.
A node addition will add 256 vnodes to the ring. Unless I misunderstand, this
will be a DC-local resizing of vnodes, and even if the cluster is huge, there
will still only be 256 (times RF?) distinct resize operations that have to
take place in that DC. So there is a finite cap on the amount of work performed
per node addition (and presumably per removal), and that cap is bounded by
vnodes per node, not by cluster size.
If true, then Jonathan's solution feels good enough, since the upper bound is 
reasonably constrained. Not saying I wouldn't prefer doing less overall work, 
though.
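The bound argued above can be checked with a small simulation. It assumes random token selection with 256 tokens per node, and all names are hypothetical. Each token of a joining node lands inside exactly one existing range, so at most 256 existing ranges are split, however many nodes the ring already has. The sketch counts only primary ranges and ignores replication, which the comment suggests multiplies the bound by roughly RF.

```python
import bisect
import random

VNODES_PER_NODE = 256  # tokens per node, as in the comment (assumed random allocation)


def build_ring(num_nodes, vnodes, rng):
    """Hypothetical helper: sorted list of unique random tokens for num_nodes nodes."""
    tokens = set()
    while len(tokens) < num_nodes * vnodes:
        tokens.add(rng.randrange(-2**63, 2**63))
    return sorted(tokens)


def ranges_split_by_new_node(ring, new_tokens):
    """Count distinct existing ranges that receive at least one new token.

    Range i is (ring[i-1], ring[i]]; tokens beyond the last ring token
    (or at/below the first) fall into the wrap-around range with index 0.
    """
    split = set()
    for t in new_tokens:
        split.add(bisect.bisect_left(ring, t) % len(ring))
    return len(split)


rng = random.Random(42)
for n in (10, 100, 1000):
    ring = build_ring(n, VNODES_PER_NODE, rng)
    new = [rng.randrange(-2**63, 2**63) for _ in range(VNODES_PER_NODE)]
    # Each count is at most 256, independent of n.
    print(n, ranges_split_by_new_node(ring, new))
```

The count can be slightly below 256 when two new tokens land in the same existing range. It never grows with cluster size.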

> Drive replacement in JBOD can cause data to reappear. 
> ------------------------------------------------------
>
>                 Key: CASSANDRA-6696
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6696
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: sankalp kohli
>            Assignee: Marcus Eriksson
>             Fix For: 3.0
>
>
> In JBOD, when someone gets a bad drive, the bad drive is replaced with a new 
> empty one and repair is run. 
> This can cause deleted data to come back in some cases. The same is true for 
> corrupt sstables, where we delete the corrupt sstable and run repair. 
> Here is an example:
> Say we have 3 nodes A, B and C, with RF=3 and GC grace=10 days. 
> row=sankalp col=sankalp is written 20 days back and successfully went to all 
> three nodes. 
> Then a delete/tombstone was written successfully for the same row column 15 
> days back. 
> Since this tombstone is older than gc grace, it was purged on nodes A and B 
> when it got compacted together with the actual data. So there is no trace of 
> this row column on nodes A and B.
> Now on node C, say the original data is on drive1 and the tombstone is on 
> drive2. Compaction has not yet reclaimed the data and tombstone.  
> Drive2 becomes corrupt and is replaced with a new empty drive. 
> Due to the replacement, the tombstone is now gone and row=sankalp col=sankalp 
> has come back to life. 
> Now, after replacing the drive, we run repair, and this data is propagated to 
> all nodes. 
> Note: This is still a problem even if we run repair every gc grace. 
>  
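The scenario in the issue description can be replayed with a toy model. The model is hypothetical and greatly simplified. Cells carry a write day relative to now, and reconciliation takes the newest cell. Compaction drops a tombstone older than gc grace together with the data it shadows. "Repair" simply copies every cell any replica has onto every other replica. Real Cassandra repair compares Merkle trees and streams mismatched ranges, but the outcome for this key is the same.

```python
from dataclasses import dataclass

GC_GRACE_DAYS = 10  # gc grace from the example


@dataclass(frozen=True)
class Cell:
    key: tuple        # (row, column)
    written_day: int  # relative to "now"; negative means in the past
    tombstone: bool


def live_value(cells, key):
    """Reconcile one key: the newest cell wins; a winning tombstone means deleted."""
    matching = [c for c in cells if c.key == key]
    if not matching:
        return None
    newest = max(matching, key=lambda c: c.written_day)
    return None if newest.tombstone else newest


def compact(cells, gc_grace_days=GC_GRACE_DAYS):
    """Toy compaction: keep the newest cell per key, purging tombstones past gc grace."""
    out = []
    for key in {c.key for c in cells}:
        newest = max((c for c in cells if c.key == key), key=lambda c: c.written_day)
        if newest.tombstone and -newest.written_day > gc_grace_days:
            continue  # tombstone and the data it shadowed disappear together
        out.append(newest)
    return out


def node_cells(drives):
    return [c for sstable_cells in drives.values() for c in sstable_cells]


def naive_repair(nodes):
    """Toy repair: every replica receives every cell that any replica holds."""
    union = {c for drives in nodes.values() for c in node_cells(drives)}
    for drives in nodes.values():
        drives["drive1"].extend(union - set(node_cells(drives)))


KEY = ("sankalp", "sankalp")
data = Cell(KEY, -20, False)
tomb = Cell(KEY, -15, True)

nodes = {
    # A and B compacted data and tombstone together, so both were purged.
    "A": {"drive1": compact([data, tomb]), "drive2": []},
    "B": {"drive1": compact([data, tomb]), "drive2": []},
    # On C the data and tombstone sit on different drives and were never merged.
    "C": {"drive1": [data], "drive2": [tomb]},
}
assert live_value(node_cells(nodes["C"]), KEY) is None  # still deleted

nodes["C"]["drive2"] = []  # drive2 fails and is replaced with an empty drive
naive_repair(nodes)
print({name: live_value(node_cells(d), KEY) is not None for name, d in nodes.items()})
# prints {'A': True, 'B': True, 'C': True}: the deleted cell is back everywhere
```

The model also shows why running repair every gc grace does not help. By the time the drive is replaced, A and B no longer hold the tombstone that repair would need to send back to C.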



--
This message was sent by Atlassian JIRA
(v6.2#6252)
