[ https://issues.apache.org/jira/browse/CASSANDRA-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938595#comment-13938595 ]
Yuki Morishita commented on CASSANDRA-6455:
-------------------------------------------

Here's my work in progress: https://github.com/yukim/cassandra/tree/6455-1 (against the cassandra-2.1 branch)

For this improvement, I upgraded the repair execution flow with Guava's [ListenableFuture/ListeningExecutorService|https://code.google.com/p/guava-libraries/wiki/ListenableFutureExplained], so we can build promise pipelines over the various async and distributed tasks (the main execution flow is here: https://github.com/yukim/cassandra/blob/6455-1/src/java/org/apache/cassandra/repair/RepairJob.java#L75).

I also replaced the AntiEntropySessions thread pool, which runs RepairSessions, with a one-time ListeningExecutorService that runs RepairJobs. This allows RepairJobs to run concurrently and makes progress easy to track through nodetool tpstats. (This version still uses a hard-coded single thread for the thread pool, so this needs to be made configurable through the nodetool command.)

Other improvements include:
* RepairSession is now a ListenableFuture that returns repair stats. This lets the repair command print progress as each RepairSession finishes, with more information. (The only stat collected so far is the number of differences between MerkleTrees, and this WIP version does not change any output messages yet.)
* Removed unnecessary local messaging on the coordinator node.

I will work on providing a repair command option for concurrency to finalize the patch.
Comments or concerns are welcome.

> Improve concurrency of repair process
> -------------------------------------
>
>                 Key: CASSANDRA-6455
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6455
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Yuki Morishita
>            Assignee: Yuki Morishita
>            Priority: Minor
>
> Currently, most of the repair tasks (taking snapshots, send/receiving merkle
> tree, compute MT difference, etc) are done on single threaded
> AntiEntropyStage.
> This causes a problem like CASSANDRA-6415 and likely to cause unnecessary
> wait.
> Also, repair is done one CF at the time. I think we can parallelize
> this(concurrency is configurable by a user based on # of CF and load of the
> nodes) for faster processing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
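
For readers unfamiliar with the promise-pipeline style mentioned in the comment, here is a minimal sketch of the idea. It is not the code from the 6455-1 branch: it uses the JDK's CompletableFuture as a stand-in for Guava's ListenableFuture so that it runs without extra dependencies, and every name in it (RepairPipelineSketch, buildTree, countDifferences, repairJob, repairSession) is hypothetical. Each "job" builds two placeholder trees asynchronously, counts their differences once both are ready, and the "session" runs all jobs on a pool whose size is the configurable concurrency.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class RepairPipelineSketch {

    // Hypothetical placeholder: treat each character as one Merkle tree leaf hash.
    static int[] buildTree(String replicaData) {
        return replicaData.chars().toArray();
    }

    // Count the leaves that differ between the two trees
    // (leaves present in only one tree also count as differences).
    static int countDifferences(int[] local, int[] remote) {
        int shared = Math.min(local.length, remote.length);
        int differences = Math.max(local.length, remote.length) - shared;
        for (int i = 0; i < shared; i++) {
            if (local[i] != remote[i]) {
                differences++;
            }
        }
        return differences;
    }

    // One job: build both trees asynchronously, then diff them once both are ready.
    static CompletableFuture<Integer> repairJob(String localData, String remoteData,
                                                ExecutorService pool) {
        CompletableFuture<int[]> localTree =
                CompletableFuture.supplyAsync(() -> buildTree(localData), pool);
        CompletableFuture<int[]> remoteTree =
                CompletableFuture.supplyAsync(() -> buildTree(remoteData), pool);
        return localTree.thenCombine(remoteTree, RepairPipelineSketch::countDifferences);
    }

    // One session: submit every job to a pool of the requested size and collect
    // per-CF stats. Each job is given as {cfName, localData, remoteData}.
    static Map<String, Integer> repairSession(List<String[]> jobs, int concurrency) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            Map<String, CompletableFuture<Integer>> pending = new LinkedHashMap<>();
            for (String[] job : jobs) {
                pending.put(job[0], repairJob(job[1], job[2], pool));
            }
            Map<String, Integer> stats = new LinkedHashMap<>();
            pending.forEach((cf, future) -> stats.put(cf, future.join()));
            return stats;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String[]> jobs = Arrays.asList(
                new String[] {"users", "abcd", "abXd"},
                new String[] {"events", "xyz", "xyz"});
        repairSession(jobs, 2).forEach(
                (cf, diff) -> System.out.println(cf + ": " + diff + " differing leaves"));
    }
}
```

With Guava, the same shape would presumably be expressed with a ListeningExecutorService and the transformations in Futures; the point of the sketch is only that each stage is chained onto the previous future instead of blocking a single-threaded stage.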