[ https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653743#comment-15653743 ]
ASF GitHub Bot commented on CASSANDRA-12796:
--------------------------------------------

Github user mmajercik closed the pull request at:

    https://github.com/apache/cassandra/pull/82

> Heap exhaustion when rebuilding secondary index over a table with wide partitions
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-12796
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12796
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Milan Majercik
>            Priority: Critical
>
> We have a table with rather wide partitions and a secondary index defined
> over it. As soon as we tried to rebuild the index, we observed exhaustion of
> the Java heap and an eventual OOM error. After a lengthy investigation we
> managed to find the culprit, which appears to be the wrong granularity of
> barrier issuance in method {{org.apache.cassandra.db.Keyspace.indexRow}}:
> {code}
>         try (OpOrder.Group opGroup = cfs.keyspace.writeOrder.start())
>         {
>             Set<SecondaryIndex> indexes = cfs.indexManager.getIndexesByNames(idxNames);
>             Iterator<ColumnFamily> pager = QueryPagers.pageRowLocally(cfs, key.getKey(), DEFAULT_PAGE_SIZE);
>             while (pager.hasNext())
>             {
>                 ColumnFamily cf = pager.next();
>                 ColumnFamily cf2 = cf.cloneMeShallow();
>                 for (Cell cell : cf)
>                 {
>                     if (cfs.indexManager.indexes(cell.name(), indexes))
>                         cf2.addColumn(cell);
>                 }
>                 cfs.indexManager.indexRow(key.getKey(), cf2, opGroup);
>             }
>         }
> {code}
> Please note that the operation group granularity is a partition of the
> source table. This poses a problem for tables with wide partitions, because
> the flush runnable ({{org.apache.cassandra.db.ColumnFamilyStore.Flush.run()}})
> won't proceed with flushing the secondary index memtable until all
> operations started before the most recently issued barrier have completed.
> In our situation the flush runnable waits until the whole wide partition has
> been indexed into the secondary index memtable before flushing it. This
> causes exhaustion of the heap and an eventual OOM error.
>
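To illustrate why the granularity of the operation group matters, here is a minimal toy model. It is not Cassandra code: the class `BarrierGranularityDemo`, the method `peakBufferedCells`, and all of its parameters are hypothetical names invented for this sketch. It only assumes, as the description above states, that a requested flush cannot drain the index memtable until the operation group that was open when the flush was requested has been closed.

```java
// Toy model only: none of these names exist in Cassandra.
final class BarrierGranularityDemo {

    /**
     * Simulates indexing one partition page by page and returns the peak
     * number of cells held in the (toy) index memtable.
     *
     * @param pages          number of query pages in the partition
     * @param cellsPerPage   cells indexed per page
     * @param flushThreshold buffered cells at which a flush is requested
     * @param groupPerPage   true: one operation group per page;
     *                       false: one operation group for the whole partition
     */
    static int peakBufferedCells(int pages, int cellsPerPage,
                                 int flushThreshold, boolean groupPerPage) {
        int buffered = 0;
        int peak = 0;
        boolean flushRequested = false;

        for (int page = 0; page < pages; page++) {
            // Index one page into the memtable while a group is open.
            buffered += cellsPerPage;
            peak = Math.max(peak, buffered);

            if (buffered >= flushThreshold) {
                flushRequested = true;
            }

            // The group closes after every page, or only after the last one.
            boolean groupClosedHere = groupPerPage || page == pages - 1;

            // Assumption of the model: the flush can only complete once the
            // group that was open at request time has been closed.
            if (flushRequested && groupClosedHere) {
                buffered = 0;
                flushRequested = false;
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        // 100 pages of 1000 cells, flush requested at 5000 buffered cells.
        System.out.println(peakBufferedCells(100, 1000, 5000, false)); // prints 100000
        System.out.println(peakBufferedCells(100, 1000, 5000, true));  // prints 5000
    }
}
```

In this model, a group that spans the whole partition lets the buffer grow with the partition size, while a group per page bounds it near the flush threshold.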
> After we changed the granularity of barrier issuance in method
> {{org.apache.cassandra.db.Keyspace.indexRow}} from the table partition to
> the query page (see
> [https://github.com/mmajercik/cassandra/commit/7e10e5aa97f1de483c2a5faf867315ecbf65f3d6?diff=unified]),
> the secondary index rebuild started to work without heap exhaustion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)