[ https://issues.apache.org/jira/browse/CASSANDRA-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jordan West updated CASSANDRA-15392: ------------------------------------ Status: Changes Suggested (was: Review In Progress) MergeIteratorPool: * #initialValue() isn’t used or at a minimum doesn’t need {{throws Exception}} MergeIterator: * {{#returnToPool()}}: Instead of hard-coding each concrete class to their specific pools, consider adding a pool reference to MergeIterator and defining {{#returnToPool()}} as {{(if (pool != null) pool.put(this))}}. The only downside I can see here is the additional field will add to the memory overhead of MergeIterator which is antithesis to this patch and it already adds a field. On the plus side it could make future tests easier, reduces reliance on more statics, and cleans up the code a bit. It would also allow for a version where we don’t rely on type erasure in the future. * Really minor nit: the naming of {{#getSimple()}} made me think of {{TrivialOneToOne}} iterators since they are synonymous. Consider {{getInternal() }}/ {{getFromPool() }}/ {{getPooled()}} instead. * ManyToOne.DEFAULT_SIZE could be tunable so if we do find a pathological workload in the future it doesn’t require a re-compile to address * If my understanding is correct, we keep the {{candidates}} array in {{ManyToOne}} so we can {{#close()}} them all so they can be recycled. If thats true, I think it may be possible to get rid of the {{candidates}} array in {{ManyToOne}} by doing the following: 1) Change Candidate#advance to return a boolean. 2) Pass that boolean to {{ManyToOne#replaceAndSink}} to use in the first if block — where {{Candidate#close()}} is called. This would allow the remaining heap to be iterated over, instead of the {{candidates}} array in {{ManyToOne#close()}}. SASI Changes: * While I don’t see anything wrong with these changes they might detract from the efficacy of this patch when SASI is being used. In particular, the MergeIterator created by OnDiskToken#iterator() is almost always of size 1 (its only >=2 if there is a DecoratedKey hash collision). This means we are almost always wasting a {{TrivialOneToOne}} iterator when we don’t need one and in the case where there is a collision we will use a {{ManyToOne}} whose {{candidates}} array is 30 elements too large. I’d add a {{if (keys.size() == 1) returns keys.get(0)}} and bypass the pool for the rare case a collision causes a {{ManyToOne}} to be instantiated. Other: * I can imagine an uncommon workloads / configuration that compacted and read infrequently that could benefit from the queues being able to shrink. Do you think its worth exploring or benchmarking? > Pool Merge Iterators > -------------------- > > Key: CASSANDRA-15392 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15392 > Project: Cassandra > Issue Type: Sub-task > Components: Local/Compaction > Reporter: Blake Eggleston > Assignee: Blake Eggleston > Priority: Normal > Fix For: 4.0 > > > By pooling merge iterators, instead of creating new ones each time we need > them, we can reduce garbage on the compaction and read paths under relevant > workloads by ~4% in many cases. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org