[ https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958647#comment-13958647 ]

Benedict commented on CASSANDRA-6696:
-------------------------------------

Just a suggestion (not 100% certain it is better, but it seems cleaner to me):

Once the user activates this feature, it might be easier to have an upgrade 
period during which existing sstables are migrated using DiskAwareWriter, after 
which we know the constraints hold. That would let us leave the code largely 
unchanged in a few places (e.g. the scrubber and CompactionTask) that are 
already, prior to this ticket, a little on the complex side. It also seems 
easier to reason about future behaviour if we know the constraints are safely 
imposed, whereas relying on DiskAwareWriter alone leaves the impression that 
we're never quite sure whether the files obey our constraints.
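
A rough sketch of the shape I have in mind (the names SSTable, DiskAwareRewriter 
and obeysDiskBoundaries are made up for illustration, not existing classes):

    import java.util.List;
    import java.util.function.Predicate;

    // Hypothetical sketch: run once when the feature is enabled, rewriting only
    // the sstables that violate the per-disk token ranges. After this completes,
    // scrub/compaction can simply assume the constraint holds.
    final class UpgradeMigration
    {
        interface SSTable { }                       // stand-in for an sstable handle
        interface DiskAwareRewriter { void rewrite(SSTable sstable); }

        static void migrate(List<SSTable> sstables,
                            Predicate<SSTable> obeysDiskBoundaries,
                            DiskAwareRewriter rewriter)
        {
            for (SSTable sstable : sstables)
                if (!obeysDiskBoundaries.test(sstable))   // one-off migration of offenders
                    rewriter.rewrite(sstable);
        }
    }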

Really it's not a major issue, but worth considering.

One other minor thing (I'm more certain about this one): perDiskExecutor should 
be an array of executors, one per disk, and any configurable parallelism should 
then set the number of threads each executor is given. Otherwise we could get an 
uneven distribution of work across the disks: since we add tasks in disk order, 
if multiple tasks are queued at once they will clump by disk, over-utilising 
some disks and under-utilising the others, reducing overall throughput.
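
Something like the following, in sketch form (perDiskExecutors, threadsPerDisk 
and diskIndex are illustrative names, not existing code):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Hypothetical sketch: one fixed-size executor per data directory, so a
    // backlog on one disk cannot delay work destined for the others.
    public class PerDiskExecutors
    {
        private final ExecutorService[] perDiskExecutors;

        public PerDiskExecutors(int diskCount, int threadsPerDisk)
        {
            perDiskExecutors = new ExecutorService[diskCount];
            for (int i = 0; i < diskCount; i++)
                perDiskExecutors[i] = Executors.newFixedThreadPool(threadsPerDisk);
        }

        // Route each task to the executor that owns its target disk.
        public void submit(int diskIndex, Runnable task)
        {
            perDiskExecutors[diskIndex].submit(task);
        }
    }

With this shape, a single parallelism setting maps naturally to threadsPerDisk 
rather than to one shared pool.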

> Drive replacement in JBOD can cause data to reappear. 
> ------------------------------------------------------
>
>                 Key: CASSANDRA-6696
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6696
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: sankalp kohli
>            Assignee: Marcus Eriksson
>             Fix For: 3.0
>
>
> In JBOD, when someone gets a bad drive, the bad drive is replaced with a new 
> empty one and repair is run. 
> This can cause deleted data to come back in some cases. The same is true for 
> corrupt sstables, where we delete the corrupt sstable and run repair. 
> Here is an example:
> Say we have 3 nodes A, B and C, with RF=3 and GC grace=10 days. 
> row=sankalp col=sankalp was written 20 days ago and successfully reached all 
> three nodes. 
> Then a delete/tombstone was successfully written for the same row/column 15 
> days ago. 
> Since this tombstone is older than gc grace, it was compacted away on nodes A 
> and B together with the actual data it shadowed. So there is no trace of this 
> row/column on nodes A and B.
> Now on node C, say the original data is on drive1 and the tombstone is on drive2. 
> Compaction has not yet reclaimed the data and tombstone.  
> Drive2 becomes corrupt and is replaced with a new, empty drive. 
> Due to the replacement, the tombstone is now gone and row=sankalp col=sankalp 
> has come back to life. 
> Now, after replacing the drive, we run repair, and this data will be propagated 
> to all nodes. 
> Note: This is still a problem even if we run repair every gc grace. 
>  
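
For what it's worth, the timeline in the description comes down to a simple 
gc-grace comparison (the values below are just the ones from the example; 
nothing here is a real Cassandra API):

    import java.util.concurrent.TimeUnit;

    // Worked check of the example: data written 20 days ago, tombstone written
    // 15 days ago, gc grace of 10 days.
    public class TombstoneResurrectionTimeline
    {
        public static void main(String[] args)
        {
            long gcGraceSeconds   = TimeUnit.DAYS.toSeconds(10);
            long tombstoneAgeSecs = TimeUnit.DAYS.toSeconds(15);

            // On A and B the tombstone is older than gc grace, so compaction may
            // drop it together with the data it shadows: no trace remains there.
            System.out.println("purgeable on A/B: " + (tombstoneAgeSecs > gcGraceSeconds));

            // On C the data (drive1) and tombstone (drive2) were never compacted
            // together; replacing drive2 removes only the tombstone, so the data
            // on drive1 survives and repair propagates it back to A and B.
        }
    }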



