[ https://issues.apache.org/jira/browse/CASSANDRA-12245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138233#comment-16138233 ]
Andrés de la Peña commented on CASSANDRA-12245:
-----------------------------------------------

[Here|https://github.com/apache/cassandra/compare/trunk...adelapena:12245-trunk] is a new version of the patch addressing the review comments. The updated dtests are [here|https://github.com/apache/cassandra-dtest/compare/master...adelapena:12245].

bq. It would be nice to maybe try to reuse the {{Splitter}} methods if possible, so we can reuse tests, or if that's not straightforward maybe put the methods on splitter and add some tests to make sure it's working correctly.

I have moved the [methods to split the ranges|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/src/java/org/apache/cassandra/dht/Splitter.java#L137-L187] to the {{Splitter}}, reusing its [{{valueForToken}}|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/src/java/org/apache/cassandra/dht/Splitter.java#L47] method. Tests are [here|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/test/unit/org/apache/cassandra/dht/SplitterTest.java#L162-L224].

bq. Can probably remove the generation field from the builds in progress table and remove this comment

Removed.

bq. {{views_builds_in_progress_v2}} sounds a bit hacky, so perhaps we should call it {{system.view_builds_in_progress}} (remove the s) and also add a NOTICE entry informing that the previous table was replaced and its data files can be removed.

Renamed to {{system.view_builds_in_progress}}. Added a [NEWS.txt entry|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/NEWS.txt#L183-L185] informing about the replacement.

bq. I'm a bit concerned about starving the compaction executor for a long period during view build of large base tables, so we should probably have another option like {{concurrent_view_builders}} with a conservative default and perhaps control the concurrency at the {{ViewBuilderController}}. WDYT?

Agreed.
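For reference, the even range splitting that {{Splitter}} performs can be sketched roughly as follows. This is a simplified, hypothetical model over a plain integer token space ({{RangeSplitSketch}} and its method names are made up for illustration; this is not the actual Cassandra code):

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of Splitter-style range splitting: divide a numeric
// token range (left, right] into `parts` contiguous subranges of roughly
// equal token span, so each subrange can be built by a separate task.
public class RangeSplitSketch
{
    public static List<BigInteger[]> split(BigInteger left, BigInteger right, int parts)
    {
        List<BigInteger[]> subranges = new ArrayList<>();
        BigInteger span = right.subtract(left);
        BigInteger start = left;
        for (int i = 1; i <= parts; i++)
        {
            // End of the i-th subrange: left + span * i / parts.
            // Using the cumulative fraction avoids drift from integer division.
            BigInteger end = left.add(span.multiply(BigInteger.valueOf(i))
                                          .divide(BigInteger.valueOf(parts)));
            subranges.add(new BigInteger[]{ start, end });
            start = end;
        }
        return subranges;
    }

    public static void main(String[] args)
    {
        // Split the token range (0, 100] into three roughly equal parts.
        for (BigInteger[] r : split(BigInteger.ZERO, BigInteger.valueOf(100), 3))
            System.out.println("(" + r[0] + ", " + r[1] + "]");
    }
}
```

Each resulting subrange can then be handed to a separate build task, which is what lets the initial view build proceed in parallel instead of walking the whole range in one thread.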
I have added a new dedicated [executor|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L126] in the {{CompactionManager}}, similar to the executors used for validation and cache cleanup. The concurrency of this executor is determined by the new config property [{{concurrent_materialized_view_builders}}|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/conf/cassandra.yaml#L756], which defaults to a perhaps overly conservative value of {{1}}. This property can be modified both through JMX and through the new [{{setconcurrentviewbuilders}}|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/src/java/org/apache/cassandra/tools/nodetool/SetConcurrentViewBuilders.java] and [{{getconcurrentviewbuilders}}|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/src/java/org/apache/cassandra/tools/nodetool/GetConcurrentViewBuilders.java] nodetool commands. These commands are tested [here|https://github.com/adelapena/cassandra-dtest/blob/cf893982c361fd7b6018b2570d3a5a33badd5424/nodetool_test.py#L162-L190].

I'm not sure whether it still makes sense for the builder task to extend {{CompactionInfo.Holder}}, and if it does, I'm also not sure how to use {{prevToken.size(range.right)}} (which returns a {{double}}) to create {{CompactionInfo}} objects. WDYT?

bq. ViewBuilder seems to be reimplementing some of the logic of PartitionRangeReadCommand, so I wonder if we should take this chance to simplify and use that instead of manually constructing the commands via ReducingKeyIterator and multiple SinglePartitionReadCommands? We can totally do this in other ticket if you prefer.

I would prefer to do this in another ticket.
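As a rough illustration of how a dedicated executor with runtime-adjustable concurrency can work, here is a minimal sketch using a plain {{ThreadPoolExecutor}}. The class and method names ({{ViewBuildExecutorSketch}}, {{setConcurrentViewBuilders}}) are hypothetical stand-ins for illustration, not Cassandra's actual executor classes:

```java
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: a dedicated view-build executor whose concurrency can be
// changed at runtime, as a JMX/nodetool setter would do.
public class ViewBuildExecutorSketch
{
    private final ThreadPoolExecutor executor;

    public ViewBuildExecutorSketch(int concurrentViewBuilders)
    {
        // A conservative default of 1 keeps view building from
        // starving other work sharing the machine's resources.
        executor = new ThreadPoolExecutor(concurrentViewBuilders,
                                          concurrentViewBuilders,
                                          60, TimeUnit.SECONDS,
                                          new LinkedBlockingQueue<>());
    }

    // Equivalent of what nodetool setconcurrentviewbuilders would trigger.
    public void setConcurrentViewBuilders(int n)
    {
        if (n <= 0)
            throw new IllegalArgumentException("Number of concurrent view builders should be greater than 0");
        // Grow the maximum first so core <= max always holds,
        // whether we are growing or shrinking the pool.
        executor.setMaximumPoolSize(Math.max(n, executor.getCorePoolSize()));
        executor.setCorePoolSize(n);
        executor.setMaximumPoolSize(n);
    }

    public int getConcurrentViewBuilders()
    {
        return executor.getCorePoolSize();
    }

    public Future<?> submit(Runnable buildTask)
    {
        return executor.submit(buildTask);
    }
}
```

With this shape, queued build tasks (one per subrange) simply wait until a worker frees up, so raising the setting mid-build immediately increases parallelism without resubmitting anything.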
bq. Perform view marking on ViewBuilderController instead of ViewBuilder

I have moved the marking of system tables (and the retries in case of failure) from the {{ViewBuilderTask}} to the {{ViewBuilder}}, using a callback to do the marking. I think the code is clearer this way.

bq. Updating the view built status at every key is perhaps a bit inefficient and unnecessary, so perhaps we should update it every 1000 keys or so.

Done, it is updated [every 1000 keys|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/src/java/org/apache/cassandra/db/view/ViewBuilderTask.java#L62]. It doesn't seem to make a great difference in the small benchmarks that I have run.

bq. Would be nice to update the {{interrupt_build_process_test}} to stop halfway through the build (instead of at the start of the build) and verify that it's being resumed correctly with the new changes.

Updated [here|https://github.com/adelapena/cassandra-dtest/blob/cf893982c361fd7b6018b2570d3a5a33badd5424/materialized_views_test.py#L935-L969]. It also uses [a byteman script|https://github.com/adelapena/cassandra-dtest/blob/cf893982c361fd7b6018b2570d3a5a33badd5424/byteman/skip_view_build_finalization.btm] to make sure that the MV build doesn't finish before the cluster is stopped, which otherwise would be likely to happen.

CI results seem OK: there are no failures in the unit tests, and the failing dtests seem unrelated.

> initial view build can be parallel
> ----------------------------------
>
>                 Key: CASSANDRA-12245
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12245
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Materialized Views
>            Reporter: Tom van der Woerdt
>            Assignee: Andrés de la Peña
>             Fix For: 4.x
>
>
> On a node with lots of data (~3TB) building a materialized view takes several
> weeks, which is not ideal. It's doing this in a single thread.
> There are several potential ways this can be optimized:
> * do vnodes in parallel, instead of going through the entire range in one
> thread
> * just iterate through sstables, not worrying about duplicates, and include
> the timestamp of the original write in the MV mutation. since this doesn't
> exclude duplicates it does increase the amount of work and could temporarily
> surface ghost rows (yikes) but I guess that's why they call it eventual
> consistency. doing it this way can avoid holding references to all tables on
> disk, allows parallelization, and removes the need to check other sstables
> for existing data. this is essentially the 'do a full repair' path



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)