[jira] [Commented] (CASSANDRA-6455) Improve concurrency of repair process

Yuki Morishita (JIRA) Tue, 01 Jul 2014 08:56:52 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048988#comment-14048988 ]


Yuki Morishita commented on CASSANDRA-6455:
-------------------------------------------

Pushed latest version: https://github.com/yukim/cassandra/tree/6455-v3

bq. Seems the rebase lost CASSANDRA-3569 - we need to unregister from the FD 
once all validation messages have arrived.

Added.

bq. We should probably cap how big X we can have in -j X - really easy to OOM 
the nodes involved if you put a big X in.

Right. I considered what the cap should be and settled on 4, since we don't 
want to push the nodes too hard anyway. I also updated the command option 
description to clarify.
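
A minimal sketch of such a cap, assuming the -j value arrives as a plain int. The class, constant, and method names are hypothetical and not taken from the patch. Whether the real option clamps an oversized value or rejects it is not stated here, and the lower bound of 1 is also an assumption.

```java
// Hypothetical helper; names are illustrative, not from the 6455 branch.
final class RepairJobThreads {
    // Upper bound on concurrent repair jobs, per the comment above.
    static final int MAX_JOB_THREADS = 4;

    // Clamp the user-supplied -j value into [1, MAX_JOB_THREADS].
    // Clamping (rather than rejecting) is an assumption of this sketch.
    static int cap(int requested) {
        return Math.max(1, Math.min(requested, MAX_JOB_THREADS));
    }
}
```

With this sketch, a request such as -j 10 would run with 4 job threads, and -j 2 would be left unchanged.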

bq. Should we make the taskExecutor in RepairSession static?

I made taskExecutor local to each RepairSession instance so that all submitted 
tasks can be cancelled when the session fails. I left this as is in the latest 
version.
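
The trade-off can be sketched with a plain java.util.concurrent executor. This is an illustration only: SessionSketch and its methods are hypothetical stand-ins, not the RepairSession code, and the pool size of 1 is chosen just to keep the example small.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a repair session that owns its executor.
final class SessionSketch {
    // Instance field, not static: shutting it down affects this session only.
    private final ExecutorService taskExecutor = Executors.newFixedThreadPool(1);

    void submit(Runnable task) {
        taskExecutor.execute(task);
    }

    // On failure, interrupt the running task and discard everything still queued.
    // Returns the number of queued tasks that never started.
    int fail() {
        List<Runnable> neverStarted = taskExecutor.shutdownNow();
        return neverStarted.size();
    }

    boolean isShutdown() {
        return taskExecutor.isShutdown();
    }
}
```

With a static executor shared by all sessions, shutdownNow() would also cancel the tasks of unrelated sessions, so a failed session would have to track and cancel its own futures individually.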

bq. Why do we add ourselves as a no-op StreamEventHandler in 
LocalSyncTask/StreamingRepairTask when creating the StreamPlan?

handleStreamEvent is part of the StreamEventHandler interface, along with 
onSuccess/onFailure. We only need to handle success and failure, not other 
events like progress, so handleStreamEvent is left as a no-op.
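
As a rough illustration of that shape, the sketch below uses a simplified stand-in interface. StreamEventHandlerSketch and SyncTaskSketch are hypothetical names, and the String parameters are placeholders; the real StreamEventHandler uses different types.

```java
// Simplified stand-in for the streaming callback interface discussed above.
// Parameter types are placeholders, not the real Cassandra types.
interface StreamEventHandlerSketch {
    void handleStreamEvent(String event);
    void onSuccess(String finalState);
    void onFailure(Throwable t);
}

// Hypothetical sync task that only cares about the final outcome.
final class SyncTaskSketch implements StreamEventHandlerSketch {
    String outcome = "pending";

    @Override
    public void handleStreamEvent(String event) {
        // Intentionally empty: progress and other intermediate events are ignored.
    }

    @Override
    public void onSuccess(String finalState) {
        outcome = "synced";
    }

    @Override
    public void onFailure(Throwable t) {
        outcome = "failed: " + t.getMessage();
    }
}
```

The task registers as a handler only to receive the two terminal callbacks; the empty method exists because the interface requires it.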

> Improve concurrency of repair process
> -------------------------------------
>
>                 Key: CASSANDRA-6455
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6455
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Yuki Morishita
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 3.0
>
>         Attachments: 6455-3.0.txt, 6455.txt
>
>
> Currently, most repair tasks (taking snapshots, sending/receiving Merkle 
> trees, computing MT differences, etc.) are done on the single-threaded 
> AntiEntropyStage.
> This causes problems like CASSANDRA-6415 and is likely to cause unnecessary 
> waits.
> Also, repair is done one CF at a time. I think we can parallelize this 
> (with concurrency configurable by the user based on the number of CFs and 
> the load on the nodes) for faster processing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
