[ https://issues.apache.org/jira/browse/CASSANDRA-12268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388481#comment-15388481 ]
DOAN DuyHai commented on CASSANDRA-12268:
-----------------------------------------

Additional remark:

We can allow the user to define the thresholds:

* either a maximum number of mutations per logged batch to send to the view paired-replica
* and/or a maximum size (in bytes) of the mutations per logged batch to send to the view paired-replica

At runtime, either threshold will apply, depending on the situation.

> Make MV Index creation robust for wide referent rows
> ----------------------------------------------------
>
>                 Key: CASSANDRA-12268
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12268
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Shook
>
> When creating an index for a materialized view for extant data, heap pressure
> is very dependent on the cardinality of rows associated with each index
> value. With the way that per-index value rows are created within the index,
> this can cause unbounded heap pressure, which can cause OOM. This appears to
> be a side-effect of how each index row is applied atomically as with batches.
> The commit logs can accumulate enough during the process to prevent the node
> from being restarted. Given that this occurs during global index creation,
> this can happen on multiple nodes, making stable recovery of a node set
> difficult, as co-replicas become unavailable to assist in back-filling data
> from commitlogs.
> While it is understandable that you want to avoid having relatively wide rows
> even in materialized views, this scenario represents a particularly difficult
> case for triage.
> The basic recommendation for improving this is to sub-group the index
> creation into smaller chunks internally, providing a maximal bound against
> the heap pressure when it is needed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
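
For illustration only, the two thresholds proposed in the comment above (a maximum number of mutations per batch and/or a maximum batch size in bytes) could be sketched roughly as follows. This is not Cassandra code: `chunk_mutations`, `max_count`, and `max_bytes` are hypothetical names, mutations are represented only by their sizes in bytes, and the rule that a batch is closed as soon as either configured limit would be exceeded is an assumption about how "either threshold will apply" might work.

```python
from typing import Iterable, Iterator, List, Optional


def chunk_mutations(
    mutation_sizes: Iterable[int],
    max_count: Optional[int] = None,
    max_bytes: Optional[int] = None,
) -> Iterator[List[int]]:
    """Group mutation sizes (bytes) into batches bounded by count and/or bytes.

    Hypothetical helper: a threshold set to None is simply not enforced.
    """
    if max_count is not None and max_count < 1:
        raise ValueError("max_count must be at least 1")

    batch: List[int] = []
    batch_bytes = 0
    for size in mutation_sizes:
        # Close the current batch if adding this mutation would break a limit.
        over_count = max_count is not None and len(batch) >= max_count
        over_bytes = (
            max_bytes is not None
            and len(batch) > 0
            and batch_bytes + size > max_bytes
        )
        if over_count or over_bytes:
            yield batch
            batch, batch_bytes = [], 0
        # A single mutation larger than max_bytes still goes out, alone.
        batch.append(size)
        batch_bytes += size
    if batch:
        yield batch
```

With sizes `[10, 20, 30, 40, 50]`, a count limit of 2 alone yields three batches, while adding a 60-byte limit splits them further, so heap use per batch stays bounded regardless of how wide the referent row is.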