[ https://issues.apache.org/jira/browse/CASSANDRA-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053302#comment-13053302 ]
Sylvain Lebresne commented on CASSANDRA-2811:
---------------------------------------------

The question that remains is whether we prefer adding a dedicated mono-threaded executor for validation compaction (which could make sense) or simply introducing a validationCompactionLock.

> Repair doesn't stagger flushes
> ------------------------------
>
>                 Key: CASSANDRA-2811
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2811
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 0.8.2
>
>
> When you do a nodetool repair (with no options), the following things happen:
> * For each keyspace, a call to SS.forceTableRepair is issued.
> * In each of those calls, a repair session is created and started for each token range the node is responsible for.
> * Each of these sessions requests one merkle tree per column family, sending the request to each node for which it makes sense (including the node the repair is started on).
>
> All those merkle tree requests are made at basically the same time. And now that compaction is multi-threaded, this usually means that more than one validation compaction will start at the same time. The problem is that a validation compaction starts with a flush. Given that flush_queue_size is 4 by default, that the number of compaction threads is the number of processors, and that any recent machine has at least 4 cores, this can easily end up blocking writes for some period of time.
>
> It turns out there is also a more subtle problem for repair itself. If two validation compactions for the same column family (but different ranges) are started within a very short time interval, the first validation will block on the flush, but the second one may not block at all if the memtable is already clean when it requests its own flush. In that case, the second validation will be executed on data that is older than it should be.
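To make the race in the last quoted paragraph concrete, here is a small, deterministic model. It is a hypothetical sketch, not Cassandra code: the names FlushRace, forceFlush and buildTreeIfFlushed are made up, and it assumes that a flush switches out the memtable immediately while the write to disk completes later on a flush queue, which is what lets the second request find a clean memtable and skip waiting.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Hypothetical model of the flush race; names and structure are illustrative only.
public class FlushRace {
    final List<String> memtable = new ArrayList<>();
    final List<String> sstables = new ArrayList<>();
    // Pending flush writes, drained by hand so the interleaving is deterministic.
    final Queue<Runnable> flushQueue = new ArrayDeque<>();

    void write(String row) { memtable.add(row); }

    // Assumption: switching out the memtable is immediate, the disk write happens later.
    CompletableFuture<Void> forceFlush() {
        if (memtable.isEmpty())
            return CompletableFuture.completedFuture(null); // clean memtable: nothing to wait for
        List<String> frozen = new ArrayList<>(memtable);
        memtable.clear();
        CompletableFuture<Void> done = new CompletableFuture<>();
        flushQueue.add(() -> { sstables.addAll(frozen); done.complete(null); });
        return done;
    }

    // Rows a merkle tree would be built from, or null while this validation's flush is pending.
    List<String> buildTreeIfFlushed(CompletableFuture<Void> flush) {
        return flush.isDone() ? new ArrayList<>(sstables) : null;
    }

    public static void main(String[] args) {
        FlushRace cf = new FlushRace();
        cf.write("row1");
        CompletableFuture<Void> first = cf.forceFlush();  // real flush, queued
        CompletableFuture<Void> second = cf.forceFlush(); // memtable already clean
        System.out.println("first blocked: " + (cf.buildTreeIfFlushed(first) == null));
        System.out.println("second sees: " + cf.buildTreeIfFlushed(second)); // misses row1
        cf.flushQueue.poll().run();                       // the queued flush finishes
        System.out.println("first sees: " + cf.buildTreeIfFlushed(first));
    }
}
```

The second validation proceeds immediately and builds its tree without "row1", while the first correctly waits and then sees it.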
>
> I think the simplest fix is to make sure we only ever do one validation compaction at a time. It's probably a better use of resources anyway.

-- 
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
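The two options weighed in the comment, a dedicated mono-threaded executor versus a validationCompactionLock, can both be sketched with plain java.util.concurrent primitives. This is a hypothetical illustration rather than the actual patch: ValidationSerialization, fakeValidation and the run* methods are invented names, and fakeValidation merely sleeps in place of a flush plus merkle tree build. Each variant records the peak number of validations running at once, which should be 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of serializing validation compactions; not Cassandra's classes.
public class ValidationSerialization {
    static final AtomicInteger running = new AtomicInteger();
    static final AtomicInteger maxRunning = new AtomicInteger();
    static final ReentrantLock validationCompactionLock = new ReentrantLock();

    // Stand-in for a validation compaction (flush, then build a merkle tree).
    static void fakeValidation() {
        maxRunning.accumulateAndGet(running.incrementAndGet(), Math::max);
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            running.decrementAndGet();
        }
    }

    // Option 1: a dedicated single-threaded executor runs validations one after another.
    static int runWithSingleThreadExecutor(int n) throws Exception {
        maxRunning.set(0);
        ExecutorService validationExecutor = Executors.newSingleThreadExecutor();
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < n; i++)
            futures.add(validationExecutor.submit(ValidationSerialization::fakeValidation));
        for (Future<?> f : futures)
            f.get();
        validationExecutor.shutdown();
        return maxRunning.get();
    }

    // Option 2: keep a multi-threaded compaction pool but hold a lock around each validation.
    static int runWithLock(int n) throws Exception {
        maxRunning.set(0);
        ExecutorService compactionPool = Executors.newFixedThreadPool(4);
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < n; i++)
            futures.add(compactionPool.submit(() -> {
                validationCompactionLock.lock();
                try {
                    fakeValidation();
                } finally {
                    validationCompactionLock.unlock();
                }
            }));
        for (Future<?> f : futures)
            f.get();
        compactionPool.shutdown();
        return maxRunning.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("executor peak: " + runWithSingleThreadExecutor(8));
        System.out.println("lock peak: " + runWithLock(8));
    }
}
```

One difference between the two shapes in this sketch: with the lock, a validation waiting for its turn still occupies a thread of the shared compaction pool, whereas the dedicated executor simply queues it and leaves the pool free for regular compactions.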