[ https://issues.apache.org/jira/browse/CASSANDRA-12245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138233#comment-16138233 ]
Andrés de la Peña commented on CASSANDRA-12245:
-----------------------------------------------

[Here|https://github.com/apache/cassandra/compare/trunk...adelapena:12245-trunk] is a new version of the patch addressing the review comments. The updated dtests are [here|https://github.com/apache/cassandra-dtest/compare/master...adelapena:12245].

bq. It would be nice to maybe try to reuse the {{Splitter}} methods if possible, so we can reuse tests, or if that's not straightforward maybe put the methods on splitter and add some tests to make sure it's working correctly.

I have moved the [methods to split the ranges|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/src/java/org/apache/cassandra/dht/Splitter.java#L137-L187] to the {{Splitter}}, reusing its [{{valueForToken}}|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/src/java/org/apache/cassandra/dht/Splitter.java#L47] method. Tests are [here|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/test/unit/org/apache/cassandra/dht/SplitterTest.java#L162-L224].

bq. Can probably remove the generation field from the builds in progress table and remove this comment

Removed.

bq. {{views_builds_in_progress_v2}} sounds a bit hacky, so perhaps we should call it {{system.view_builds_in_progress}} (remove the s) and also add a NOTICE entry informing that the previous table was replaced and its data files can be removed.

Renamed to {{system.view_builds_in_progress}}. Added a [NEWS.txt entry|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/NEWS.txt#L183-L185] informing about the replacement.

bq. I'm a bit concerned about starving the compaction executor for a long period during view build of large base tables, so we should probably have another option like {{concurrent_view_builders}} with a conservative default and perhaps control the concurrency at the {{ViewBuilderController}}. WDYT?

Agreed.
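For reference, the even range splitting that {{Splitter}} performs can be sketched roughly as follows. This is a simplified, hypothetical model over a plain integer token space ({{RangeSplitSketch}} and its method names are made up for illustration; this is not the actual Cassandra code):

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of Splitter-style range splitting: divide a numeric
// token range (left, right] into `parts` contiguous subranges of roughly
// equal token span, so each subrange can be built by a separate task.
public class RangeSplitSketch
{
    public static List<BigInteger[]> split(BigInteger left, BigInteger right, int parts)
    {
        List<BigInteger[]> subranges = new ArrayList<>();
        BigInteger span = right.subtract(left);
        BigInteger start = left;
        for (int i = 1; i <= parts; i++)
        {
            // End of the i-th subrange: left + span * i / parts.
            // Using the cumulative fraction avoids drift from integer division.
            BigInteger end = left.add(span.multiply(BigInteger.valueOf(i))
                                          .divide(BigInteger.valueOf(parts)));
            subranges.add(new BigInteger[]{ start, end });
            start = end;
        }
        return subranges;
    }

    public static void main(String[] args)
    {
        // Split the token range (0, 100] into three roughly equal parts.
        for (BigInteger[] r : split(BigInteger.ZERO, BigInteger.valueOf(100), 3))
            System.out.println("(" + r[0] + ", " + r[1] + "]");
    }
}
```

Each resulting subrange can then be handed to a separate build task, which is what lets the initial view build proceed in parallel instead of walking the whole range in one thread.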
I have added a new dedicated [executor|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L126] in the {{CompactionManager}}, similar to the executors used for validation and cache cleanup. The concurrency of this executor is determined by the new config property [{{concurrent_materialized_view_builders}}|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/conf/cassandra.yaml#L756], which defaults to a perhaps overly conservative value of {{1}}. This property can be modified both through JMX and through the new [{{setconcurrentviewbuilders}}|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/src/java/org/apache/cassandra/tools/nodetool/SetConcurrentViewBuilders.java] and [{{getconcurrentviewbuilders}}|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/src/java/org/apache/cassandra/tools/nodetool/GetConcurrentViewBuilders.java] nodetool commands. These commands are tested [here|https://github.com/adelapena/cassandra-dtest/blob/cf893982c361fd7b6018b2570d3a5a33badd5424/nodetool_test.py#L162-L190].

I'm not sure whether it still makes sense for the builder task to extend {{CompactionInfo.Holder}}, and if it does, I'm also not sure how to use {{prevToken.size(range.right)}} (which returns a {{double}}) to create {{CompactionInfo}} objects. WDYT?

bq. ViewBuilder seems to be reimplementing some of the logic of PartitionRangeReadCommand, so I wonder if we should take this chance to simplify and use that instead of manually constructing the commands via ReducingKeyIterator and multiple SinglePartitionReadCommands? We can totally do this in other ticket if you prefer.

I would prefer to do this in another ticket.
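As a rough illustration of how a dedicated executor with runtime-adjustable concurrency can work, here is a minimal sketch using a plain {{ThreadPoolExecutor}}. The class and method names ({{ViewBuildExecutorSketch}}, {{setConcurrentViewBuilders}}) are hypothetical stand-ins for illustration, not Cassandra's actual executor classes:

```java
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: a dedicated view-build executor whose concurrency can be
// changed at runtime, as a JMX/nodetool setter would do.
public class ViewBuildExecutorSketch
{
    private final ThreadPoolExecutor executor;

    public ViewBuildExecutorSketch(int concurrentViewBuilders)
    {
        // A conservative default of 1 keeps view building from
        // starving other work sharing the machine's resources.
        executor = new ThreadPoolExecutor(concurrentViewBuilders,
                                          concurrentViewBuilders,
                                          60, TimeUnit.SECONDS,
                                          new LinkedBlockingQueue<>());
    }

    // Equivalent of what nodetool setconcurrentviewbuilders would trigger.
    public void setConcurrentViewBuilders(int n)
    {
        if (n <= 0)
            throw new IllegalArgumentException("Number of concurrent view builders should be greater than 0");
        // Grow the maximum first so core <= max always holds,
        // whether we are growing or shrinking the pool.
        executor.setMaximumPoolSize(Math.max(n, executor.getCorePoolSize()));
        executor.setCorePoolSize(n);
        executor.setMaximumPoolSize(n);
    }

    public int getConcurrentViewBuilders()
    {
        return executor.getCorePoolSize();
    }

    public Future<?> submit(Runnable buildTask)
    {
        return executor.submit(buildTask);
    }
}
```

With this shape, queued build tasks (one per subrange) simply wait until a worker frees up, so raising the setting mid-build immediately increases parallelism without resubmitting anything.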
bq. Perform view marking on ViewBuilderController instead of ViewBuilder

I have moved the marking of system tables (and the retries in case of failure) from the {{ViewBuilderTask}} to the {{ViewBuilder}}, using a callback to do the marking. I think the code is clearer this way.

bq. Updating the view built status at every key is perhaps a bit inefficient and unnecessary, so perhaps we should update it every 1000 keys or so.

Done, it is updated [every 1000 keys|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/src/java/org/apache/cassandra/db/view/ViewBuilderTask.java#L62]. It doesn't seem to make a great difference in the small benchmarks that I have run.

bq. Would be nice to update the {{interrupt_build_process_test}} to stop halfway through the build (instead of at the start of the build) and verify that it's being resumed correctly with the new changes.

Updated [here|https://github.com/adelapena/cassandra-dtest/blob/cf893982c361fd7b6018b2570d3a5a33badd5424/materialized_views_test.py#L935-L969]. It also uses [a byteman script|https://github.com/adelapena/cassandra-dtest/blob/cf893982c361fd7b6018b2570d3a5a33badd5424/byteman/skip_view_build_finalization.btm] to make sure that the MV build doesn't finish before the cluster is stopped, which otherwise would be likely to happen.

CI results seem OK: there are no failures in the unit tests, and the failing dtests seem unrelated.

> initial view build can be parallel
> ----------------------------------
>
>                 Key: CASSANDRA-12245
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12245
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Materialized Views
>            Reporter: Tom van der Woerdt
>            Assignee: Andrés de la Peña
>             Fix For: 4.x
>
>
> On a node with lots of data (~3TB) building a materialized view takes several
> weeks, which is not ideal. It's doing this in a single thread.
> There are several potential ways this can be optimized:
> * do vnodes in parallel, instead of going through the entire range in one
> thread
> * just iterate through sstables, not worrying about duplicates, and include
> the timestamp of the original write in the MV mutation. since this doesn't
> exclude duplicates it does increase the amount of work and could temporarily
> surface ghost rows (yikes) but I guess that's why they call it eventual
> consistency. doing it this way can avoid holding references to all tables on
> disk, allows parallelization, and removes the need to check other sstables
> for existing data. this is essentially the 'do a full repair' path



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)