[ 
https://issues.apache.org/jira/browse/CASSANDRA-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053302#comment-13053302
 ] 

Sylvain Lebresne commented on CASSANDRA-2811:
---------------------------------------------

The question that remains is whether we prefer adding a dedicated single-threaded 
executor for validation compaction (could make sense) or simply introducing a 
validationCompactionLock.
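
To make the two options concrete, here is a minimal sketch, not actual
Cassandra code. The class name, the method names and the flushAndValidate
stand-in are all hypothetical; only the validationCompactionLock name comes
from the comment above. It records the peak number of validations running at
once under each approach.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class ValidationSerializationSketch {
    // Hypothetical lock, named after the validationCompactionLock idea.
    private static final ReentrantLock validationCompactionLock = new ReentrantLock();

    /** Stand-in for "flush, then validate"; records the peak overlap. */
    private static void flushAndValidate(AtomicInteger running, AtomicInteger peak) {
        int now = running.incrementAndGet();
        peak.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(5); // simulated flush + validation work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            running.decrementAndGet();
        }
    }

    /** Option 1: a dedicated single-threaded executor for validations. */
    public static int peakWithDedicatedExecutor(int requests) throws Exception {
        return run(Executors.newSingleThreadExecutor(), requests, false);
    }

    /** Option 2: a shared multi-threaded pool, with validations guarded by a lock. */
    public static int peakWithLock(int requests, int poolThreads) throws Exception {
        return run(Executors.newFixedThreadPool(poolThreads), requests, true);
    }

    private static int run(ExecutorService executor, int requests, boolean useLock)
            throws Exception {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < requests; i++) {
                futures.add(executor.submit(() -> {
                    if (useLock) {
                        validationCompactionLock.lock();
                        try {
                            flushAndValidate(running, peak);
                        } finally {
                            validationCompactionLock.unlock();
                        }
                    } else {
                        flushAndValidate(running, peak);
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every validation to finish
            }
        } finally {
            executor.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(peakWithDedicatedExecutor(8)); // prints 1
        System.out.println(peakWithLock(8, 4));           // prints 1
    }
}
```

Both variants keep the peak at one validation at a time. The difference in
this sketch is that with the lock, threads of the shared pool sit blocked
while waiting for their turn, whereas the dedicated executor queues the
requests without occupying other threads.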

> Repair doesn't stagger flushes
> ------------------------------
>
>                 Key: CASSANDRA-2811
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2811
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 0.8.2
>
>
> When you do a nodetool repair (with no options), the following things occur:
> * For each keyspace, a call to SS.forceTableRepair is issued
> * In each of those calls: for each token range the node is responsible for, a 
> repair session is created and started
> * Each of these sessions will request one merkle tree per column family (from 
> each node for which it makes sense, which includes the node the repair is 
> started on)
> All those merkle tree requests are issued at basically the same time. And now 
> that compaction is multi-threaded, this means that usually more than one 
> validation compaction will be started at the same time. The problem is that a 
> validation compaction starts with a flush. Given that by default the 
> flush_queue_size is 4 and the number of compaction threads is the number of 
> processors, and given that on any recent machine the number of cores will be 
> >= 4, this will easily end up blocking writes for some period of 
> time.
> It turns out there is also a more subtle problem for repair itself. If two 
> validation compactions for the same column family (but different ranges) are 
> started within a very short time interval, the first validation will block on 
> the flush, but the second one may not block at all if the memtable is clean 
> when it requests its own flush. In that case the second validation will be 
> executed on data older than it should be.
> I think the simplest fix is to make sure we only ever do one validation 
> compaction at a time. It's probably a better use of resources anyway.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
