[ https://issues.apache.org/jira/browse/CASSANDRA-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938595#comment-13938595 ]

Yuki Morishita commented on CASSANDRA-6455:
-------------------------------------------

Here's my work in progress: https://github.com/yukim/cassandra/tree/6455-1
(against the cassandra-2.1 branch)

To improve concurrency, I reworked the repair execution flow around Guava's
[ListenableFuture/ListeningExecutorService|https://code.google.com/p/guava-libraries/wiki/ListenableFutureExplained],
so that we can build a promise pipeline over the various asynchronous and
distributed tasks (the main execution flow is here:
https://github.com/yukim/cassandra/blob/6455-1/src/java/org/apache/cassandra/repair/RepairJob.java#L75).
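A minimal sketch of such a pipeline, for illustration only. To keep it runnable without Guava, it uses the JDK's CompletableFuture in place of ListenableFuture. The class and method names (RepairJobSketch, requestTree, countDifferences) and the representation of a Merkle tree as a plain list of per-range hashes are hypothetical and not taken from the branch. The snapshot step comes first, the two tree requests then run in parallel, and the comparison is chained onto both without any thread blocking in between.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: CompletableFuture stands in for Guava's ListenableFuture,
// and a "Merkle tree" is simplified to a list of per-range hashes.
final class RepairJobSketch {

    // Simulates asking one replica to build its tree asynchronously.
    static CompletableFuture<List<Integer>> requestTree(String replica, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> {
            List<Integer> hashes = new ArrayList<>();
            for (int range = 0; range < 4; range++) {
                // Replica "b" disagrees on range 2 so the example finds one difference.
                hashes.add(replica.equals("b") && range == 2 ? -1 : range);
            }
            return hashes;
        }, pool);
    }

    // Counts ranges whose hashes differ between two trees of equal size.
    static int countDifferences(List<Integer> a, List<Integer> b) {
        int diffs = 0;
        for (int i = 0; i < a.size(); i++) {
            if (!a.get(i).equals(b.get(i))) {
                diffs++;
            }
        }
        return diffs;
    }

    // snapshot -> request trees from both replicas in parallel -> compare, fully chained.
    static CompletableFuture<Integer> run(ExecutorService pool) {
        // Hypothetical snapshot step; a real job would snapshot the column family here.
        CompletableFuture<Void> snapshot = CompletableFuture.runAsync(() -> { }, pool);
        return snapshot.thenCompose(ignored -> {
            CompletableFuture<List<Integer>> treeA = requestTree("a", pool);
            CompletableFuture<List<Integer>> treeB = requestTree("b", pool);
            return treeA.thenCombine(treeB, RepairJobSketch::countDifferences);
        });
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println("differences: " + run(pool).join());
        } finally {
            pool.shutdown();
        }
    }
}
```

Running main prints `differences: 1`, since the simulated replica "b" disagrees on exactly one range.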

I also replaced the AntiEntropySessions thread pool, which runs RepairSessions,
with a one-time ListeningExecutorService that runs RepairJobs. This allows
RepairJobs to be processed concurrently and makes progress easier to track
through nodetool tpstats.
(This version still uses a hard-coded single thread for the pool, so the pool
size needs to be made configurable through a nodetool command.)
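A rough sketch of the idea of running one job per column family on a short-lived pool whose size comes from a concurrency setting, assuming that is how the thread count would eventually be exposed. CompletableFuture again stands in for ListeningExecutorService, and RepairSessionRunner, repairAll and the sleep that stands in for the real repair work are hypothetical. The method reports the peak number of jobs that ran at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: one short-lived pool per repair, one job per column family.
final class RepairSessionRunner {

    // Runs one job per column family on a pool of 'concurrency' threads and returns
    // the highest number of jobs observed running at the same time.
    static int repairAll(List<String> columnFamilies, int concurrency) {
        ExecutorService jobs = Executors.newFixedThreadPool(concurrency);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (String cf : columnFamilies) {
                futures.add(CompletableFuture.runAsync(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        // Stand-in for snapshot, tree exchange and streaming for this CF.
                        Thread.sleep(50);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                }, jobs));
            }
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        } finally {
            // The pool only lives as long as this repair.
            jobs.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        List<String> cfs = List.of("cf1", "cf2", "cf3", "cf4");
        System.out.println("peak with 1 thread: " + repairAll(cfs, 1));
        System.out.println("peak with 4 threads: " + repairAll(cfs, 4));
    }
}
```

With a concurrency of 1 the jobs run strictly one after another, matching the current hard-coded pool; larger values let jobs overlap, though the exact peak depends on scheduling.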

Other improvements include:

* RepairSession is now a ListenableFuture that returns repair stats. This lets
the repair command print progress, with more information, as each RepairSession
finishes. (However, the only stat collected so far is the number of differences
between Merkle trees, and this WIP version does not change any output messages yet.)
* Removed unnecessary local messaging on the coordinator node.
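To illustrate progress reporting per session, here is a hypothetical sketch in which each session completes a future with its stats and a callback records a progress line as soon as that session finishes, independent of the order in which sessions were started. CompletableFuture stands in for ListenableFuture, and SessionStats, startSession and the message format are invented for the example; as in the WIP patch, the stats carry only a difference count.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: sessions complete with stats, and progress is reported per session.
final class RepairProgressSketch {

    // Hypothetical stats type: only the number of Merkle tree differences.
    static final class SessionStats {
        final String range;
        final int differences;

        SessionStats(String range, int differences) {
            this.range = range;
            this.differences = differences;
        }
    }

    // Hypothetical session that completes asynchronously with simulated stats.
    static CompletableFuture<SessionStats> startSession(String range, int differences, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> new SessionStats(range, differences), pool);
    }

    // Starts all sessions, records a message as each one finishes, and returns the messages.
    static List<String> repair(List<String> ranges, List<Integer> simulatedDifferences, ExecutorService pool) {
        List<String> progress = Collections.synchronizedList(new ArrayList<>());
        List<CompletableFuture<Void>> reported = new ArrayList<>();
        for (int i = 0; i < ranges.size(); i++) {
            CompletableFuture<SessionStats> session =
                    startSession(ranges.get(i), simulatedDifferences.get(i), pool);
            // The callback runs when this particular session completes.
            reported.add(session.thenAccept(stats -> progress.add(
                    "Repair session for " + stats.range + " finished: " + stats.differences + " differences")));
        }
        CompletableFuture.allOf(reported.toArray(new CompletableFuture[0])).join();
        return new ArrayList<>(progress);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            repair(List.of("(0,100]", "(100,200]"), List.of(0, 3), pool).forEach(System.out::println);
        } finally {
            pool.shutdown();
        }
    }
}
```

The order of the printed lines follows completion order, which is the point of reporting through callbacks rather than waiting for all sessions first.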

To finalize the patch, I will add a repair command option for setting the
concurrency.
Comments or concerns are welcome.

> Improve concurrency of repair process
> -------------------------------------
>
>                 Key: CASSANDRA-6455
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6455
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Yuki Morishita
>            Assignee: Yuki Morishita
>            Priority: Minor
>
> Currently, most of the repair tasks (taking snapshots, sending/receiving Merkle
> trees, computing Merkle tree differences, etc.) are done on the single-threaded
> AntiEntropyStage.
> This causes problems like CASSANDRA-6415 and is likely to cause unnecessary
> waiting.
> Also, repair is done one CF at a time. I think we can parallelize
> this (with concurrency configurable by the user based on the number of CFs and
> the load on the nodes) for faster processing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
