[ 
https://issues.apache.org/jira/browse/CASSANDRA-12268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Shook updated CASSANDRA-12268:
---------------------------------------
    Description: 
When creating an index for a materialized view for extant data, heap pressure 
is very dependent on the cardinality of rows associated with each index 
value. With the way that per-index value rows are created within the index, 
this can cause unbounded heap pressure, which can cause OOM. This appears to be 
a side-effect of how each index row is applied atomically as with batches.

The commit logs can accumulate enough during the process to prevent the node 
from being restarted. Given that this occurs during global index creation, this 
can happen on multiple nodes, making stable recovery of a node set difficult, 
as co-replicas become unavailable to assist in back-filling data from 
commitlogs.

While it is understandable that you want to avoid having relatively wide rows  
even in materialized views, this represents a particularly difficult scenario 
for triage.

The basic recommendation for improving this is to sub-group the index creation 
into smaller chunks internally, providing a maximal bound against the heap 
pressure when it is needed.

  was:
When creating an index for a materialized view for extant data, heap pressure 
is very dependent on the cardinality of of rows associated with each index 
value. With the way that per-index value rows are created within the index, 
this can cause unbounded heap pressure, which can cause OOM. This appears to be 
a side-effect of how each index row is applied atomically as with batches.

The commit logs can accumulate enough during the process to prevent the node 
from being restarted. Given that this occurs during global index creation, this 
can happen on multiple nodes, making stable recovery of a node set difficult, 
as co-replicas become unavailable to assist in back-filling data from 
commitlogs.

While it is understandable that you want to avoid having relatively wide rows  
even in materialized views, this scenario represent a particularly difficult 
scenario for triage.

The basic recommendation for improving this is to sub-group the index creation 
into smaller chunks internally, providing a maximal bound against the heap 
pressure when it is needed.


> Make MV Index creation robust for wide referent rows
> ----------------------------------------------------
>
>                 Key: CASSANDRA-12268
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12268
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Shook
>
> When creating an index for a materialized view for extant data, heap pressure 
> is very dependent on the cardinality of rows associated with each index 
> value. With the way that per-index value rows are created within the index, 
> this can cause unbounded heap pressure, which can cause OOM. This appears to 
> be a side-effect of how each index row is applied atomically as with batches.
> The commit logs can accumulate enough during the process to prevent the node 
> from being restarted. Given that this occurs during global index creation, 
> this can happen on multiple nodes, making stable recovery of a node set 
> difficult, as co-replicas become unavailable to assist in back-filling data 
> from commitlogs.
> While it is understandable that you want to avoid having relatively wide rows 
> even in materialized views, this represents a particularly difficult 
> scenario for triage.
> The basic recommendation for improving this is to sub-group the index 
> creation into smaller chunks internally, providing a maximal bound against 
> the heap pressure when it is needed.
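
The sub-grouping idea in the recommendation above could look roughly like the sketch below. It is not Cassandra code: the class name ChunkedViewBuild, the method applyInChunks, and the maxChunkSize parameter are all hypothetical, and the applyChunk callback stands in for whatever actually writes the view updates. It only shows the bounding technique: buffer at most maxChunkSize rows, apply them, then start a fresh buffer, so the memory held at any moment does not grow with the number of rows behind a single index value.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; none of these names exist in Cassandra.
final class ChunkedViewBuild {

    private ChunkedViewBuild() {}

    /**
     * Drains {@code rows} and hands them to {@code applyChunk} in groups of
     * at most {@code maxChunkSize}, so only one group is buffered at a time.
     *
     * @return the number of chunks that were applied
     */
    static <T> int applyInChunks(Iterator<T> rows,
                                 int maxChunkSize,
                                 Consumer<List<T>> applyChunk) {
        if (maxChunkSize <= 0) {
            throw new IllegalArgumentException("maxChunkSize must be positive");
        }
        int chunksApplied = 0;
        List<T> chunk = new ArrayList<>(maxChunkSize);
        while (rows.hasNext()) {
            chunk.add(rows.next());
            if (chunk.size() == maxChunkSize) {
                applyChunk.accept(chunk);
                chunksApplied++;
                // Start a fresh buffer so the applied chunk can be collected.
                chunk = new ArrayList<>(maxChunkSize);
            }
        }
        // Apply the final, possibly smaller, chunk.
        if (!chunk.isEmpty()) {
            applyChunk.accept(chunk);
            chunksApplied++;
        }
        return chunksApplied;
    }
}
```

With seven rows and a maxChunkSize of 3, the callback would be invoked with groups of 3, 3 and 1 rows. The trade-off is that the rows for one index value are no longer applied as a single atomic unit, so a real change would need to decide whether that is acceptable during an initial build.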



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)