[ 
https://issues.apache.org/jira/browse/CASSANDRA-12245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080228#comment-16080228
 ] 

Andrés de la Peña commented on CASSANDRA-12245:
-----------------------------------------------

[Here|https://github.com/apache/cassandra/compare/trunk...adelapena:12245-trunk]
 is a first draft of the patch.

At first, it seems that the current implementation is not properly retrieving 
the view build status because of a mistake at 
[{{SystemKeyspace.getViewBuildStatus}}|https://github.com/apache/cassandra/blob/9c6f87c35f364ec6a88775cb3d0cf143e36635e7/src/java/org/apache/cassandra/db/SystemKeyspace.java#L501-L521].
 This makes the view build to always start from the beginning.

The provided patch does the following:
* {{ViewBuilder}} changes to be responsible for the building of a view for a 
single given token range.
* The newly created {{ViewBuilderController}} reads the local token ranges, 
splits them to satisfy a concurrency factor, and runs a {{ViewBuilder}} for 
each of them.
* Each {{ViewBuilder}} task calls to {{ViewBuilderController}} at termination 
to know if it is the last pending build task. If it is, it marks the view as 
built.
* When {{ViewBuilderController}} receives the finalization signal of the last 
{{ViewBuilder}}, it double-checks if there are new local ranges that weren't 
considered at the beginning of the build. If there are new ranges, new 
{{ViewBuilder}}s are created for them.
* Given that we have a {{ViewBuilder}} per local range, the key of the table 
{{system.views_builds_in_progress}} is modified to include the bounds of the 
token range. So, we will have an entry in the table per each {{ViewBuilder}}. 
The number of covered keys per range is also recorded in the table.

Some considerations about the implementation:
* {{ViewBuilder}} and {{ViewBuilderController}} are probably not the best 
names. Maybe we could rename {{ViewBuilder}} to something like 
{{ViewBuilderTask}} or {{ViewBuilderForRange}}, and rename 
{{ViewBuilderController}} to {{ViewBuilder}}.
* The concurrency factor is based on {{conf.concurrent_compactors}} because the 
views are built on the {{CompactionManager}}, but we may be interested in a 
different value.
* The patch tries to evenly split the token ranges in the minimum number of 
parts to satisfy the concurrency factor, and it never merges ranges. So, with 
the default 256 virtual nodes (and a lesser concurrency factor) we create 256 
build tasks. We might be interested in a different planning. If we want the 
number of tasks to be lesser than the number of local ranges we should modify 
the {{ViewBuilder}} task to be responsible for several ranges, although it will 
complicate the status tracking.
* Probably there is a better way of implementing 
{{ViewBuilder.getCompactionInfo}}. The patch uses 
{{keysBuilt}}/{{ColumnFamilyStore.estimatedKeysForRange}} to estimate the 
completion, which could deal to have task completion status over 100%, 
depending on the estimation.

[~slebresne], [~tjake], what do you think?

> initial view build can be parallel
> ----------------------------------
>
>                 Key: CASSANDRA-12245
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12245
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Tom van der Woerdt
>            Assignee: Andrés de la Peña
>
> On a node with lots of data (~3TB) building a materialized view takes several 
> weeks, which is not ideal. It's doing this in a single thread.
> There are several potential ways this can be optimized :
>  * do vnodes in parallel, instead of going through the entire range in one 
> thread
>  * just iterate through sstables, not worrying about duplicates, and include 
> the timestamp of the original write in the MV mutation. since this doesn't 
> exclude duplicates it does increase the amount of work and could temporarily 
> surface ghost rows (yikes) but I guess that's why they call it eventual 
> consistency. doing it this way can avoid holding references to all tables on 
> disk, allows parallelization, and removes the need to check other sstables 
> for existing data. this is essentially the 'do a full repair' path



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to