[ https://issues.apache.org/jira/browse/CASSANDRA-12245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paulo Motta updated CASSANDRA-12245: ------------------------------------ Reviewer: Paulo Motta > initial view build can be parallel > ---------------------------------- > > Key: CASSANDRA-12245 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12245 > Project: Cassandra > Issue Type: Improvement > Components: Materialized Views > Reporter: Tom van der Woerdt > Assignee: Andrés de la Peña > Fix For: 4.x > > > On a node with lots of data (~3TB) building a materialized view takes several > weeks, which is not ideal. It's doing this in a single thread. > There are several potential ways this can be optimized : > * do vnodes in parallel, instead of going through the entire range in one > thread > * just iterate through sstables, not worrying about duplicates, and include > the timestamp of the original write in the MV mutation. since this doesn't > exclude duplicates it does increase the amount of work and could temporarily > surface ghost rows (yikes) but I guess that's why they call it eventual > consistency. doing it this way can avoid holding references to all tables on > disk, allows parallelization, and removes the need to check other sstables > for existing data. this is essentially the 'do a full repair' path -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org