[ 
https://issues.apache.org/jira/browse/CASSANDRA-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-15392:
------------------------------------
    Status: Changes Suggested  (was: Review In Progress)

MergeIteratorPool:
 *  #initialValue() isn’t used or at a minimum doesn’t need {{throws 
Exception}}


MergeIterator:
 * {{#returnToPool()}}: Instead of hard-coding each concrete class to their 
specific pools, consider adding a pool reference to MergeIterator and defining 
{{#returnToPool()}} as {{(if (pool != null) pool.put(this))}}. The only 
downside I can see here is the additional field will add to the memory overhead 
of MergeIterator which is antithesis to this patch and it already adds a field. 
On the plus side it could make future tests easier, reduces reliance on more 
statics, and cleans up the code a bit. It would also allow for a version where 
we don’t rely on type erasure in the future.
 * Really minor nit: the naming of {{#getSimple()}} made me think of 
{{TrivialOneToOne}} iterators since they are synonymous. Consider 
{{getInternal() }}/ {{getFromPool() }}/ {{getPooled()}} instead.
 * ManyToOne.DEFAULT_SIZE could be tunable so if we do find a pathological 
workload in the future it doesn’t require a re-compile to address
 * If my understanding is correct, we keep the {{candidates}} array in 
{{ManyToOne}} so we can {{#close()}} them all so they can be recycled. If thats 
true, I think it may be possible to get rid of the {{candidates}} array in 
{{ManyToOne}} by doing the following: 1) Change Candidate#advance to return a 
boolean. 2) Pass that boolean to {{ManyToOne#replaceAndSink}} to use in the 
first if block — where {{Candidate#close()}} is called. This would allow the 
remaining heap to be iterated over, instead of the {{candidates}} array in 
{{ManyToOne#close()}}.

SASI Changes:
 * While I don’t see anything wrong with these changes they might detract from 
the efficacy of this patch when SASI is being used. In particular, the 
MergeIterator created by OnDiskToken#iterator() is almost always of size 1 (its 
only >=2 if there is a DecoratedKey hash collision). This means we are almost 
always wasting a {{TrivialOneToOne}} iterator when we don’t need one and in the 
case where there is a collision we will use a {{ManyToOne}} whose 
{{candidates}} array is 30 elements too large. I’d add a {{if (keys.size() == 
1) returns keys.get(0)}} and bypass the pool for the rare case a collision 
causes a {{ManyToOne}} to be instantiated.

Other:
 * I can imagine an uncommon workloads / configuration that compacted and read 
infrequently that could benefit from the queues being able to shrink. Do you 
think its worth exploring or benchmarking? 

> Pool Merge Iterators
> --------------------
>
>                 Key: CASSANDRA-15392
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15392
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Local/Compaction
>            Reporter: Blake Eggleston
>            Assignee: Blake Eggleston
>            Priority: Normal
>             Fix For: 4.0
>
>
> By pooling merge iterators, instead of creating new ones each time we need 
> them, we can reduce garbage on the compaction and read paths under relevant 
> workloads by ~4% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to