Goodness Ayinmode created CASSANDRA-19959: ---------------------------------------------
Summary: Out of memory (OOM) risks due to unbound growth in collections
Key: CASSANDRA-19959
URL: https://issues.apache.org/jira/browse/CASSANDRA-19959
Project: Cassandra
Issue Type: Improvement
Reporter: Goodness Ayinmode

I noticed some methods with collections that could cause OOM issues. For example, [Keyspace.getValidColumnFamilies|https://github.com/apache/cassandra/blob/02f38208b15b119b3038482c5e36f05c14e2a4cf/src/java/org/apache/cassandra/db/Keyspace.java#L707] retrieves a set of valid ColumnFamilyStore objects based on the provided column family names. When cfNames.length == 0, it iterates over all the column family stores returned by getColumnFamilyStores() and adds each one to the set named valid. For each cfStore, if autoAddIndexes is true, getIndexColumnFamilyStores(cfStore) is called and its index column family stores are added to valid as well. Because the set grows with the number of column families and indexes, adding a large number of them at once could consume significant memory and increase the risk of OOM errors.
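To make the accumulation pattern concrete, here is a minimal, self-contained sketch. It is not the Cassandra code: ValidStoresSketch, collectValid and the map of table names to index names are hypothetical stand-ins, and only the cfNames.length == 0 path described above is modelled.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: each table name maps to the names of its index stores.
final class ValidStoresSketch {

    // Mirrors the described flow: with no names given, every table is added,
    // and (optionally) every index store of every table.
    static Set<String> collectValid(Map<String, List<String>> indexesByTable,
                                    boolean autoAddIndexes,
                                    String... cfNames) {
        Set<String> valid = new LinkedHashSet<>();
        if (cfNames.length == 0) {
            for (Map.Entry<String, List<String>> entry : indexesByTable.entrySet()) {
                valid.add(entry.getKey());
                if (autoAddIndexes) {
                    valid.addAll(entry.getValue());
                }
            }
        }
        // The branch for explicitly named tables is intentionally omitted.
        return valid;
    }

    public static void main(String[] args) {
        Map<String, List<String>> schema = new LinkedHashMap<>();
        schema.put("users", List.of("users.by_email", "users.by_age"));
        schema.put("events", List.of());

        // The result size is (number of tables) + (number of indexes) when
        // autoAddIndexes is true, so it scales with the schema size.
        System.out.println(collectValid(schema, true).size());
        System.out.println(collectValid(schema, false).size());
    }
}
```

In this model the set holds one entry per table plus one per index, so its size follows the size of the schema rather than any fixed limit.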
The same risk appears in [Sets$Literal.prepare|https://github.com/apache/cassandra/blob/662ce36a7be5a03560bb0395a4bced09d3c34a0c/src/java/org/apache/cassandra/cql3/Sets.java#L136], [PendingAntiCompaction$AcquisitionCallback.apply|https://github.com/apache/cassandra/blob/662ce36a7be5a03560bb0395a4bced09d3c34a0c/src/java/org/apache/cassandra/db/repair/PendingAntiCompaction.java#L291], [RepairSession.start|https://github.com/apache/cassandra/blob/662ce36a7be5a03560bb0395a4bced09d3c34a0c/src/java/org/apache/cassandra/repair/RepairSession.java#L272], [RepairedState.addAll|https://github.com/apache/cassandra/blob/02f38208b15b119b3038482c5e36f05c14e2a4cf/src/java/org/apache/cassandra/repair/consistent/RepairedState.java#L208], [SEPExecutor.addTask|https://github.com/apache/cassandra/blob/662ce36a7be5a03560bb0395a4bced09d3c34a0c/src/java/org/apache/cassandra/concurrent/SEPExecutor.java#L119], [SystemDistributedKeyspace.startRepairs|https://github.com/apache/cassandra/blob/662ce36a7be5a03560bb0395a4bced09d3c34a0c/src/java/org/apache/cassandra/schema/SystemDistributedKeyspace.java#L226], [SingleTableUpdatesCollector.toMutations|https://github.com/apache/cassandra/blob/02f38208b15b119b3038482c5e36f05c14e2a4cf/src/java/org/apache/cassandra/cql3/statements/SingleTableUpdatesCollector.java#L95], [AbstractReplicaCollection.filter|https://github.com/apache/cassandra/blob/02f38208b15b119b3038482c5e36f05c14e2a4cf/src/java/org/apache/cassandra/locator/AbstractReplicaCollection.java#L504], [BatchMessage.execute|https://github.com/apache/cassandra/blob/02f38208b15b119b3038482c5e36f05c14e2a4cf/src/java/org/apache/cassandra/transport/messages/BatchMessage.java#L173] and [SystemKeyspace.tokensAsSet|https://github.com/apache/cassandra/blob/ea801625f64bdebf78cf03634e30a1fde037f965/src/java/org/apache/cassandra/db/SystemKeyspace.java#L887]. Each of these methods builds a collection whose growth appears to be unbounded and could therefore cause OOM issues.
If processing all elements at once is not essential, one optimization would be to process them in batches: split the elements into smaller chunks and accumulate the results per batch. Another would be to assign fixed sizes to these collections when initializing them. Please let me know if my analysis is wrong, or if you have any comments on the suggested optimizations.

Thank you
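The batching idea suggested in this report could look roughly like the sketch below. It is only an illustration under stated assumptions: BatchingSketch, partition and processInBatches are hypothetical names, not existing Cassandra APIs, and the handler is assumed to reduce each batch to a single long value so that no per-element results need to be retained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical helper illustrating batch-wise processing of a large list.
final class BatchingSketch {

    // Splits items into consecutive sublist views of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // The number of batches is known up front, so the list is pre-sized.
        List<List<T>> batches = new ArrayList<>((items.size() + batchSize - 1) / batchSize);
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(items.subList(start, end)); // a view, not a copy
        }
        return batches;
    }

    // Hands one batch at a time to the handler and keeps only a running total.
    static <T> long processInBatches(List<T> items, int batchSize,
                                     ToLongFunction<List<T>> handler) {
        long total = 0;
        for (List<T> batch : partition(items, batchSize)) {
            total += handler.applyAsLong(batch);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3, 4, 5, 6, 7);
        long sum = processInBatches(ids, 3,
                batch -> batch.stream().mapToLong(Integer::longValue).sum());
        System.out.println(sum);
    }
}
```

Whether this helps in a given method depends on whether callers really need the whole collection at once. Where they do, batching only moves the allocation elsewhere.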