[jira] [Commented] (CASSANDRA-15392) Pool Merge Iterators

2020-01-23 Thread Blake Eggleston (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022492#comment-17022492 ]

Blake Eggleston commented on CASSANDRA-15392:
---------------------------------------------

[~jrwest] sure, that would be great. Tag me for review if you'd like.

> Pool Merge Iterators
> 
>
> Key: CASSANDRA-15392
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15392
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> By pooling merge iterators, instead of creating new ones each time we need 
> them, we can reduce garbage on the compaction and read paths under relevant 
> workloads by ~4% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15392) Pool Merge Iterators

2020-01-23 Thread Jordan West (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022051#comment-17022051 ]

Jordan West commented on CASSANDRA-15392:
---------------------------------------------

{quote}As this relates to SASI, this wouldn’t be a regression, right? I'd guess we're already wasting TrivialOneToOne iterators?
{quote}
Correct. This wouldn't break SASI, but it would waste TrivialOneToOnes most of 
the time, and rarely it would waste a ManyToOne that is oversized for its needs. 
I haven't looked at FQL, but for SASI I think it's easy enough to just skip using 
MergeIterators when they aren't needed / skip the pool when they are – happy to 
submit a patch if it's helpful.
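A minimal sketch of that idea (illustrative names, not Cassandra's actual API): when a caller like SASI has only one source iterator and nothing to reduce, return the lone iterator untouched instead of taking a TrivialOneToOne wrapper (or an oversized ManyToOne) from the pool.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical short-circuit helper: only falls back to the real merge
// machinery (and the pool) when there is actually something to merge.
final class MergeShortCircuit
{
    static <T> Iterator<T> getOrMerge(List<Iterator<T>> sources,
                                      Function<List<Iterator<T>>, Iterator<T>> mergeFactory)
    {
        if (sources.size() == 1)
            return sources.get(0); // nothing to merge: no wrapper allocated, nothing pooled
        return mergeFactory.apply(sources); // real merge path, e.g. MergeIterator.get(...)
    }
}
```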




[jira] [Commented] (CASSANDRA-15392) Pool Merge Iterators

2020-01-22 Thread Benedict Elliott Smith (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021552#comment-17021552 ]

Benedict Elliott Smith commented on CASSANDRA-15392:


bq. In cases where there are that many sstables to merge the node is usually 
also under significant gc pressure, so it would not be a great time to abandon 
this optimization. 

Perhaps the size limit should be larger, and an intrusive linked-list stack is 
certainly a good data structure to use here.  But we probably want to impose 
_some_ coarse size limit?  Though you make a very good point that three is 
much too few in this case.  Otherwise we have the problem that infrequent 
storms of sstable creation slowly cause threads to accumulate "huge" pools - 
and we can have a lot of these threads, since read-parallelism is gated by the 
number of threads.

It's far from crazy to have a system with > 200 threads, and if they each 
managed to hold on to 1k iterators at a time (noting we need iterators for 
the partitions, their rows, and any complex columns, so this seems readily 
achievable), we'd be looking at 50-100MiB+ of heap utilised by these pools.
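One way to impose such a coarse limit is a per-thread pool that simply drops returns once a cap is reached, so a burst of sstable merges can't leave hundreds of read threads each holding a huge pool. A rough sketch (not the patch's code; the cap value here is an assumption for illustration):

```java
import java.util.ArrayDeque;

// Hypothetical size-capped per-thread pool: returns beyond MAX_POOLED are
// dropped on the floor and reclaimed by GC, bounding worst-case retention.
final class BoundedPool<T>
{
    private static final int MAX_POOLED = 64; // assumed cap: larger than 3, but still bounded

    private final ThreadLocal<ArrayDeque<T>> pools = ThreadLocal.withInitial(ArrayDeque::new);

    T poll()
    {
        return pools.get().pollFirst(); // null when empty -> caller allocates a fresh one
    }

    void offer(T item)
    {
        ArrayDeque<T> pool = pools.get();
        if (pool.size() < MAX_POOLED)
            pool.addFirst(item); // LIFO keeps recently used (cache-warm) items on top
        // else: drop the item; the pool never grows past MAX_POOLED entries
    }
}
```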




[jira] [Commented] (CASSANDRA-15392) Pool Merge Iterators

2020-01-22 Thread Blake Eggleston (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021455#comment-17021455 ]

Blake Eggleston commented on CASSANDRA-15392:
---------------------------------------------

Thanks for the feedback. I just pushed up some commits addressing most points.

Getting rid of linked list queues is a good idea, but I don’t think 
TinyThreadLocalPool will be sufficient in some cases. If {{MergeIterator#get}} 
is called with more than {{DEFAULT_SIZE}} iterators, we’ll nest pooled 
iterators. In cases where there _are_ that many sstables to merge, the node is 
usually also under significant gc pressure, so it would not be a great time to 
abandon this optimization. An intrusive linked stack is the best I could come 
up with. It's unbounded, doesn't do any allocations, and has very simple logic, 
which was important for preventing oneToOne performance from regressing too 
much.
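The intrusive linked stack can be sketched roughly as follows (illustrative names; the actual patch may differ). The "next" link lives on the pooled object itself, so push and pop allocate nothing and the stack is just a head pointer:

```java
// Sketch of an intrusive linked stack: no wrapper cells, no bound, O(1) ops.
final class IntrusiveStack
{
    static class Node
    {
        Node next; // intrusive link: stored on the pooled object, not in a container
    }

    private Node head;

    void push(Node n)
    {
        n.next = head;
        head = n;
    }

    Node pop()
    {
        Node n = head;
        if (n != null)
        {
            head = n.next;
            n.next = null; // unlink so the popped object doesn't pin the stack
        }
        return n; // null when the stack is empty
    }
}
```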

@Jordan, short-circuiting the merge iterator in that case is a really good 
idea. However, FQL will need a bit of (probably straightforward) rework before 
{{MergeIterator#get}} can return a closeable iterator, since it relies on 
having separate input and output types. As this relates to SASI, this wouldn’t 
be a regression, right? I'd guess we're already wasting TrivialOneToOne 
iterators?

I’ve also put together a very simple JMH benchmark. After a few iterations of 
additional optimization, the manyToOne path saw a ~15% increase in throughput in 
most cases, and the oneToOne path saw a 2-6% reduction in most cases. Numbers 
below:
{code:java}
Baseline:
Benchmark                                           Mode       Cnt         Score    Error  Units
MergeIteratorGetBench.manyToOne                     sample  59417991      418.721 ± 22.706  ns/op
MergeIteratorGetBench.manyToOne:manyToOne·p0.00     sample                101.000           ns/op
MergeIteratorGetBench.manyToOne:manyToOne·p0.50     sample                148.000           ns/op
MergeIteratorGetBench.manyToOne:manyToOne·p0.90     sample                171.000           ns/op
MergeIteratorGetBench.manyToOne:manyToOne·p0.95     sample                176.000           ns/op
MergeIteratorGetBench.manyToOne:manyToOne·p0.99     sample                188.000           ns/op
MergeIteratorGetBench.manyToOne:manyToOne·p0.999    sample               1456.000           ns/op
MergeIteratorGetBench.manyToOne:manyToOne·p0.9999   sample               9171.213           ns/op
MergeIteratorGetBench.manyToOne:manyToOne·p1.00     sample           38862848.000           ns/op
MergeIteratorGetBench.one                           sample  53514991       83.452 ±  4.005  ns/op
MergeIteratorGetBench.one:one·p0.00                 sample                 47.000           ns/op
MergeIteratorGetBench.one:one·p0.50                 sample                 73.000           ns/op
MergeIteratorGetBench.one:one·p0.90                 sample                 90.000           ns/op
MergeIteratorGetBench.one:one·p0.95                 sample                 95.000           ns/op
MergeIteratorGetBench.one:one·p0.99                 sample                103.000           ns/op
MergeIteratorGetBench.one:one·p0.999                sample                119.000           ns/op
MergeIteratorGetBench.one:one·p0.9999               sample               3830.003           ns/op
MergeIteratorGetBench.one:one·p1.00                 sample           21987328.000           ns/op

Pooled:
Benchmark                                           Mode        Cnt        Score   Error  Units
MergeIteratorGetBench.manyToOne                     sample  100352304     133.534 ± 1.662  ns/op
MergeIteratorGetBench.manyToOne:manyToOne·p0.00     sample                 86.000          ns/op
MergeIteratorGetBench.manyToOne:manyToOne·p0.50     sample                128.000          ns/op
MergeIteratorGetBench.manyToOne:manyToOne·p0.90     sample                143.000          ns/op
MergeIteratorGetBench.manyToOne:manyToOne·p0.95     sample                148.000          ns/op
MergeIteratorGetBench.manyToOne:manyToOne·p0.99     sample                157.000          ns/op
MergeIteratorGetBench.manyToOne:manyToOne·p0.999    sample                172.000          ns/op
MergeIteratorGetBench.manyToOne:manyToOne·p0.9999   sample               3744.000          ns/op
MergeIteratorGetBench.manyToOne:manyToOne·p1.00     sample           13008896.000          ns/op
MergeIteratorGetBench.one                           sample   97007239      87.340 ± 3.063  ns/op
MergeIteratorGetBench.one:one·p0.00                 sample                 51.000          ns/op
MergeIteratorGetBench.one:one·p0.50                 sample                 78.000          ns/op
MergeIteratorGetBench.one:one·p0.90                 sample                 92.000          ns/op
MergeIteratorGetBench.one:one·p0.95                 sample                 97

[jira] [Commented] (CASSANDRA-15392) Pool Merge Iterators

2020-01-08 Thread Benedict Elliott Smith (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010853#comment-17010853 ]

Benedict Elliott Smith commented on CASSANDRA-15392:


MergeIteratorPool:
 * As Jordan suggests, storing the pool inside the {{MergeIterator}} makes 
sense - more so because it can avoid incurring the cost of 
{{ThreadLocal.get()}} on {{returnToPool}}, by stashing a link to an inner 
object (that guards put on the identity of the {{Thread}} owning the queue)
 * Perhaps an {{ArrayList}} is a better choice of collection type, as it will 
be generally garbage-free, and occupy less space than a {{LinkedList}}? An 
{{ArrayDeque}} could maintain the current behaviour, or you could pop from the 
back
 * Perhaps the queue would be better with a fixed size, permitting at most two or 
three items to be stored?
 * As it happens, I’ve been working on very similar parts of the codebase, and 
I have implemented my own {{TinyThreadLocalPool}} in the patches I hope to 
land soon, that does the above but has lower fixed overhead and less 
indirection by permitting at most 3 items, stored in in-line properties (i.e. 
{{val0}}, {{val1}} and {{val2}}).  It might be worthwhile introducing that 
class here for consideration/comparison, and perhaps some amalgamation of the 
two used in whichever patch lands first (which I fully expect to be this one)?
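The "at most 3 items, stored in in-line properties" idea can be sketched roughly like this (a hedged sketch; the real TinyThreadLocalPool in those patches may differ in detail). There is no collection object at all, just three fields checked in order:

```java
// Illustrative tiny pool: fixed capacity of 3, zero indirection, no container.
final class TinyPool<T>
{
    private T val0, val1, val2; // in-line slots instead of a queue

    T poll()
    {
        T v;
        if ((v = val0) != null) { val0 = null; return v; }
        if ((v = val1) != null) { val1 = null; return v; }
        if ((v = val2) != null) { val2 = null; return v; }
        return null; // empty: caller allocates
    }

    void offer(T v)
    {
        if (val0 == null)      val0 = v;
        else if (val1 == null) val1 = v;
        else if (val2 == null) val2 = v;
        // else: all three slots occupied, drop the item
    }
}
```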

 

{{MergeIterator.OneToOne}}: 
 * While we’re here optimising, it probably makes sense to merge {{OneToOne}} 
and {{TrivialOneToOne}}, simply permitting the reducer in {{OneToOne}} to be 
{{null}}, and switching behaviour based on a null check.  This should permit 
call-sites that are presently megamorphic to become bimorphic, and the branch 
should be near perfectly predicted.
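A sketch of that merged class (simplified types, not Cassandra's actual reducer interface): a null reducer means "pass values through", so call sites see one fewer concrete type, and the branch never changes for a given instance, making it highly predictable.

```java
import java.util.Iterator;

// Illustrative OneToOne that absorbs the trivial case via a nullable reducer.
final class OneToOne<In, Out> implements Iterator<Out>
{
    interface Reducer<In, Out> { Out reduce(In in); }

    private final Iterator<In> source;
    private final Reducer<In, Out> reducer; // null => trivial pass-through

    OneToOne(Iterator<In> source, Reducer<In, Out> reducer)
    {
        this.source = source;
        this.reducer = reducer;
    }

    public boolean hasNext() { return source.hasNext(); }

    @SuppressWarnings("unchecked")
    public Out next()
    {
        In in = source.next();
        // In the trivial case In == Out, so the cast is safe by construction.
        return reducer == null ? (Out) in : reducer.reduce(in);
    }
}
```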

 

{{MergeIterator.ManyToOne}}: 
 * Instead of using a separate {{candidates}} array for reusing a 
{{Candidate}}, could we avoid setting the last heap element to null by instead 
placing the now-defunct item there?
 * Do we need {{active}} when it is implied by {{MergeIterator.iterators != 
null}} and {{Candidate.iter != null}}?
 * Do we need to set primitives to {{-1}} or {{false}} on {{close}}, given 
we will throw {{NullPointerException}} if we haven’t invoked {{reset}}?

 

Benchmarking:
 * It’s annoying work, but alongside the garbage benchmarks, it would be nice 
to see some simple JMH throughput benchmarks introduced, to see what the 
overall impact of the change is.  I don’t mind helping out here.

 

nits:
 * {{ManyToOne.DEFAULT_SIZE}} -> {{ManyToOne.POOLED_MERGE_LIMIT}}?
 * +1 Jordan’s nit about the naming of {{getSimple}}
 * {{MergeIteratorPool.queues.initialValue}} should not declare {{throws 
Exception}}
 * {{MergeIteratorPool.initialValue}} looks to be a duplicate
 * {{TokenTree:380}} errant new line

 
