[ https://issues.apache.org/jira/browse/CASSANDRA-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010853#comment-17010853 ]
Benedict Elliott Smith edited comment on CASSANDRA-15392 at 1/8/20 5:34 PM: ---------------------------------------------------------------------------- MergeIteratorPool: * As Jordan suggests, storing the pool inside the \{{MergeIterator}} makes sense - more so because it can avoid incurring the cost of \{{ThreadLocal.get()}} on \{{returnToPool}}, by stashing a link to an inner object (that guards put on the identity of the \{{Thread}} owning the queue) * Perhaps an \{{ArrayList}} is a better idea for collection type, as it will be generally garbage-free, and occupy less space than a \{{LinkedList}}? An \{{ArrayDeque}} could maintain the current behaviour, or you could pop from the back * Perhaps the queue would be better with fixed size, permitting at most two or three items to be stored? * As it happens, I’ve been working on very similar parts of the codebase, and I have implemented my own \{{TinyThreadLocalPool}} in the patches I hope to land soon, that does the above but has lower fixed-overhead and less indirection by permitting at most 3 items, stored in in-line properties (i.e. \{{val0}}, \{{val1}} and \{{val2}}). It might be worthwhile introducing that class here for consideration/comparison, and perhaps some amalgamation of the two used in whichever patch lands first (which I fully expect to be this one)? {\{MergeIterator.OneToOne}}: * While we’re here optimising, it probably makes sense to merge \{{OneToOne}} and \{{TrivialOneToOne}}, simply permitting the reducer in \{{OneToOne}} to be \{{null}}, and to switch behaviour based on a null check. This should permit call-sites to become bimorphic that are presently megamorphic, and the branch should be near perfectly predicted. {\{MergeIterator.ManyToOne}}: * Instead of using a separate \{{candidates}} array for reusing a \{{Candidate}}, could we avoid setting the last heap element to null by instead placing the now-defunct item there? * Do we need \{{active}} when it is implied by \{{MergeIterator.iterators != null}} and \{{Candidate.iter != null}}? * Do we need to set primitives to \{{-1}} or \{{false}} on \{{close}}, given we will throw \{{NullPointerException}} if we haven’t invoked \{{reset}}? Benchmarking: * It’s annoying work, but alongside the garbage benchmarks, it would be nice to see some simple JMH throughput benchmarks introduced, to see what the overall impact of the change is. I don’t mind helping out here. nits: * \{{ManyToOne.DEFAULT_SIZE}} -> \{{ManyToOne.POOLED_MERGE_LIMIT}}? * +1 Jordan’s nit about naming of {{getSimple}} * \{{MergeIteratorPool.queues.initialValue}} should not declare {{throw Exception}} * {{MergeIteratorPool.initialValue}} looks to be a duplicate * {{TokenTree:380}} errant new line was (Author: benedict): MergeIteratorPool: * As Jordan suggests, storing the pool inside the \{{MergeIterator}} makes sense - more so because it can avoid incurring the cost of \{{ThreadLocal.get()}} on \{{returnToPool}}, by stashing a link to an inner object (that guards put on the identity of the \{{Thread}} owning the queue) * Perhaps an \{{ArrayList}} is a better idea for collection type, as it will be generally garbage-free, and occupy less space than a \{{LinkedList}}? An \{{ArrayDeque}} could maintain the current behaviour, or you could pop from the back * Perhaps the queue would be better with fixed size, permitting at most two or three items to be stored? * As it happens, I’ve been working on very similar parts of the codebase, and I have implemented my own \{{TinyThreadLocalPool}} in the patches I hope to land soon, that does the above but has lower fixed-overhead and less indirection by permitting at most 3 items, stored in in-line properties (i.e. \{{val0}}, \{{val1}} and \{{val2}}). It might be worthwhile introducing that class here for consideration/comparison, and perhaps some amalgamation of the two used in whichever patch lands first (which I fully expect to be this one)? {\{MergeIterator.OneToOne}}: * While we’re here optimising, it probably makes sense to merge \{{OneToOne}} and \{{TrivialOneToOne}}, simply permitting the reducer in \{{OneToOne}} to be \{{null}}, and to switch behaviour based on a null check. This should permit call-sites to become bimorphic that are presently megamorphic, and the branch should be near perfectly predicted. {\{MergeIterator.ManyToOne}}: * Instead of using a separate \{{candidates}} array for reusing a \{{Candidate}}, could we avoid setting the last heap element to null by instead placing the now-defunct item there? * Do we need \{{active}} when it is implied by \{{MergeIterator.iterators != null}} and \{{Candidate.iter != null}}? * Do we need to set primitives to \{{-1}} or \{{false}} on \{{close}}, given we will throw \{{NullPointerException}} if we haven’t invoked \{{reset}}? Benchmarking: * It’s annoying work, but alongside the garbage benchmarks, it would be nice to see some simple JMH throughput benchmarks introduced, to see what the overall impact of the change is. I don’t mind helping out here. nits: * {\{ManyToOne.DEFAULT_SIZE}} -> \{{ManyToOne.POOLED_MERGE_LIMIT}}? * +1 Jordan’s nit about naming of \{{getSimple}} * {\{MergeIteratorPool.queues.initialValue}} should not declare \{{throw Exception}} * {\{MergeIteratorPool.initialValue}} looks to be a duplicate * {\{TokenTree:380}} errant new line > Pool Merge Iterators > -------------------- > > Key: CASSANDRA-15392 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15392 > Project: Cassandra > Issue Type: Sub-task > Components: Local/Compaction > Reporter: Blake Eggleston > Assignee: Blake Eggleston > Priority: Normal > Fix For: 4.0 > > > By pooling merge iterators, instead of creating new ones each time we need > them, we can reduce garbage on the compaction and read paths under relevant > workloads by ~4% in many cases. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org