[jira] [Commented] (CASSANDRA-12245) initial view build can be parallel

JIRA Fri, 01 Dec 2017 03:09:34 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-12245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274261#comment-16274261 ]


Andrés de la Peña commented on CASSANDRA-12245:
-----------------------------------------------

bq. One thing I noticed is that even though the builder task and the view 
builder are aborted, the other tasks of the same builder keep running. At least 
until we have the ability to start and stop view builders, I think that 
stopping a subtask should also abort the other subtasks of the same view 
builder, since the view builder will not complete anyway. What do you think? 
I've done this 
[here|https://github.com/pauloricardomg/cassandra/commit/81853218eee702b778ba801426ba19d48336cf77]
 and the tests didn't need any change. I've also extended {{SplitterTest}} with 
a couple more test cases here.

Makes sense. I started with a similar solution but changed it to stop only the 
specified task, to better match the {{nodetool stop}} 
documentation. But it's true that stopping all the tasks is more useful, 
especially without another way to stop the view build.
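A minimal Python sketch of the abort-all behaviour agreed on above, assuming a builder whose ranges are processed by a thread pool and whose subtasks all poll one shared stop flag. {{ViewBuilderSketch}} and its methods are hypothetical names for illustration, not Cassandra's view builder API, and the "ranges" are plain integer intervals rather than real token ranges.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ViewBuilderSketch:
    """Hypothetical model of a view build split into parallel subtasks.

    Every subtask checks the same stop flag, so stopping the build (or any
    one of its subtasks) aborts all of them. Not Cassandra's real class.
    """

    def __init__(self, ranges):
        self.ranges = ranges
        self.stopped = threading.Event()  # shared by every subtask
        self.built = {}                   # range -> number of keys processed
        self.lock = threading.Lock()

    def stop(self):
        # Setting the shared flag aborts every subtask of this builder.
        self.stopped.set()

    def _build_range(self, rng):
        start, end = rng
        done = 0
        for _key in range(start, end):
            if self.stopped.is_set():
                return False              # aborted: range is not recorded
            done += 1                     # stand-in for building one partition
        with self.lock:
            self.built[rng] = done
        return True

    def run(self, workers=4):
        # Returns True only if every subtask completed its range.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(self._build_range, self.ranges))
        return all(results)
```

Because the flag is shared, a single stop call is enough to make every in-flight subtask return early; an aborted build reports `False` and leaves its unfinished ranges unrecorded in `built`.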

bq. The tests look good, but sometimes they were failing on my machine because 
the view builder task finished on some nodes before it was stopped, and 
{{_wait_for_view_build_start}} did not guarantee that the view builder had started on 
all nodes before issuing {{nodetool stop VIEW_BUILD}}, so I fixed this [on this 
commit|https://github.com/pauloricardomg/cassandra-dtest/commit/667315e42bd2b7d04ac038e79149f1b0e63ba0f2].
 I also extended {{test_resume_stopped_build}} to verify that the view was not built 
after the abort 
([here|https://github.com/pauloricardomg/cassandra-dtest/commit/f4c3ad7ac9e4ea64576d669a1cf30b0ef4e02a3f]).

Nice fix, thanks!
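The dtest race described above comes down to issuing {{nodetool stop VIEW_BUILD}} before every node has actually started building. A sketch of the waiting pattern in Python, assuming a hypothetical {{wait_until_all}} helper and a caller-supplied predicate (for example, one that checks whether a node reports a running view build); it is not the actual {{_wait_for_view_build_start}} code from the dtest commit.

```python
import time


def wait_until_all(nodes, predicate, timeout=30.0, interval=0.5):
    """Poll until predicate(node) is true for every node, or time out.

    Hypothetical helper: `predicate` is any callable taking a node.
    Returns True if all nodes satisfied it before the deadline.
    """
    deadline = time.monotonic() + timeout
    pending = list(nodes)
    while True:
        # Drop the nodes that have reached the desired state.
        pending = [n for n in pending if not predicate(n)]
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Nodes that already satisfied the predicate are not polled again, which assumes that a build observed as started can be treated as started; the other race mentioned above, where a build finishes before the stop arrives, is not addressed by this helper.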

I have merged your changes and rebased again. I'll commit if a final CI round 
looks good:
||[cassandra|https://github.com/adelapena/cassandra/tree/12245-trunk]||[cassandra-dtest|https://github.com/adelapena/cassandra-dtest/tree/12245]||
||[utest|http://jenkins-cassandra.datastax.lan/view/Dev/view/adelapena/job/adelapena-12245-trunk-testall/]||[dtest|http://jenkins-cassandra.datastax.lan/view/Dev/view/adelapena/job/adelapena-12245-trunk-dtest/]||


> initial view build can be parallel
> ----------------------------------
>
>                 Key: CASSANDRA-12245
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12245
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Materialized Views
>            Reporter: Tom van der Woerdt
>            Assignee: Andrés de la Peña
>             Fix For: 4.x
>
>
> On a node with lots of data (~3TB), building a materialized view takes several 
> weeks, which is not ideal. It's doing this in a single thread.
> There are several potential ways this can be optimized:
>  * do vnodes in parallel, instead of going through the entire range in one 
> thread
>  * just iterate through sstables, not worrying about duplicates, and include 
> the timestamp of the original write in the MV mutation. Since this doesn't 
> exclude duplicates, it does increase the amount of work and could temporarily 
> surface ghost rows (yikes), but I guess that's why they call it eventual 
> consistency. Doing it this way avoids holding references to all tables on 
> disk, allows parallelization, and removes the need to check other sstables 
> for existing data. This is essentially the 'do a full repair' path.
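The first suggestion, building sub-ranges in parallel instead of walking the whole range in one thread, can be sketched as follows. This is a simplified Python illustration, assuming the local range is a half-open integer interval; real token ranges wrap around the ring and are split by Cassandra's own splitter code (exercised by {{SplitterTest}}), whose API is not reproduced here. {{split_range}} and {{build_in_parallel}} are hypothetical helpers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(start, end, parts):
    """Split [start, end) into up to `parts` contiguous sub-ranges whose
    sizes differ by at most one. Empty pieces are dropped."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    if end < start:
        raise ValueError("end must not be smaller than start")
    size, extra = divmod(end - start, parts)
    ranges, lo = [], start
    for i in range(parts):
        # The first `extra` pieces get one additional element.
        hi = lo + size + (1 if i < extra else 0)
        if hi > lo:
            ranges.append((lo, hi))
        lo = hi
    return ranges


def build_in_parallel(start, end, parts, build_subrange, workers=4):
    """Run build_subrange(lo, hi) on each sub-range concurrently.
    Results are returned in sub-range order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: build_subrange(*r),
                             split_range(start, end, parts)))
```

For example, `split_range(0, 10, 3)` yields `[(0, 4), (4, 7), (7, 10)]`. Splitting into contiguous, non-overlapping pieces keeps each subtask independent, so a range can be built, stopped, or resumed on its own.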



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
