[ 
https://issues.apache.org/jira/browse/CASSANDRA-12153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15370492#comment-15370492
 ] 

Benjamin Lerer commented on CASSANDRA-12153:
--------------------------------------------

I profiled trunk with Mission Control with the 
{{-XX:+UnlockDiagnosticVMOptions}} and {{-XX:+DebugNonSafepoints}} options.
To be sure that the problem was not hidden by the results processing part of 
the code, I used a program that fired queries on an empty table.
On my initial run, {{RestrictionSet.hasIN()}} was second on the hot 
methods list, with 3.14% of the time in the call tree. Looking at the other hot 
methods, I found that {{HashMap.putVal}}, {{HashMap.resize()}}, and 
{{HashSet.init()}} were also pretty high on the list.
In the allocation tab, {{HashMap$Node[]}}, {{LinkedHashMap}} and 
{{LinkedHashMap$Entry}} accounted for a total allocation pressure of 11.81%. 
All of those came from the {{LinkedHashSet}} creation in the 
{{RestrictionSet.stream()}} and {{RestrictionSet.iterator()}} methods.

Replacing {{RestrictionSet.stream()}} and the lambdas with simple loops 
over {{RestrictionSet.iterator()}} reduced {{RestrictionSet.hasIN()}} to 0.61% 
in the call tree. 
Replacing {{new LinkedHashSet()}} with a custom filtering iterator brought 
{{RestrictionSet.hasIN()}} down to 0.10% and removed all the allocations caused 
by the use of {{LinkedHashSet}}.
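
To illustrate the filtering-iterator idea, here is a minimal sketch. It is not the code from the branch: the class name {{AdjacentDistinctIterator}} is hypothetical, and it assumes that duplicates always appear next to each other in iteration order and that elements are never null. Under those assumptions, remembering the previously returned element is enough, so no set has to be allocated.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch (not the actual patch): wraps an iterator and skips
// elements equal to the one returned just before. Assumes duplicates are
// adjacent and elements are non-null.
final class AdjacentDistinctIterator<E> implements Iterator<E> {
    private final Iterator<E> source;
    private E previous;      // last element handed out, null before the first call
    private E lookahead;     // next element to hand out, valid only if hasLookahead
    private boolean hasLookahead;

    AdjacentDistinctIterator(Iterator<E> source) {
        this.source = source;
        advance();
    }

    // Pulls from the source until an element differs from the previous one.
    private void advance() {
        while (source.hasNext()) {
            E candidate = source.next();
            if (!candidate.equals(previous)) {
                lookahead = candidate;
                hasLookahead = true;
                return;
            }
        }
        lookahead = null;
        hasLookahead = false;
    }

    @Override
    public boolean hasNext() {
        return hasLookahead;
    }

    @Override
    public E next() {
        if (!hasLookahead) {
            throw new NoSuchElementException();
        }
        E result = lookahead;
        previous = result;
        advance();
        return result;
    }
}
```

Wrapping an iterator over "a", "b", "b", "c" yields "a", "b", "c". The only extra state is two references and a boolean, instead of a hash table per call.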

As we can only have duplicates when multi-column restrictions are used, I 
added a variable to keep track of that and only filter duplicates when needed. 
That has a strong effect everywhere {{RestrictionSet.iterator()}} is 
used and changed the complete picture. {{RestrictionSet.hasIN()}} came back up 
to 0.51%, just because other parts of the code got faster :-).
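
A rough sketch of the "only filter when needed" idea follows. Everything in it is an assumption made for illustration: the class {{TrackedRestrictions}}, the use of plain strings for columns and restrictions, and the flag name are hypothetical and do not come from the branch. The sketch records at insertion time whether any restriction spans more than one column, and hands out the plain values iterator when none does.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a restriction set; strings replace the real types.
final class TrackedRestrictions implements Iterable<String> {
    // column name -> restriction; a multi-column restriction is stored once per column
    private final Map<String, String> byColumn = new TreeMap<>();
    // set once a restriction covering several columns has been added
    private boolean hasMultiColumn;

    void add(String restriction, String... columns) {
        for (String column : columns) {
            byColumn.put(column, restriction);
        }
        if (columns.length > 1) {
            hasMultiColumn = true;
        }
    }

    @Override
    public Iterator<String> iterator() {
        Iterator<String> values = byColumn.values().iterator();
        if (!hasMultiColumn) {
            return values; // fast path: no duplicates possible, nothing to filter
        }
        // Slow path: drop adjacent duplicates. Kept deliberately simple here;
        // a lazy filtering iterator would avoid this list allocation too.
        List<String> distinct = new ArrayList<>();
        String previous = null;
        while (values.hasNext()) {
            String current = values.next();
            if (!current.equals(previous)) {
                distinct.add(current);
            }
            previous = current;
        }
        return distinct.iterator();
    }
}
```

With only single-column entries the loop over the set touches the map's own iterator and nothing else, which is the common case the flag is meant to speed up.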

I have pushed my experimental branch 
[here|https://github.com/apache/cassandra/compare/trunk...blerer:12153-trunk].
Waiting for your feedback.

PS: To be honest, I was not expecting the use of {{new LinkedHashSet()}}, 
{{stream()}} and lambdas to be so bad from a performance point of view. 


> RestrictionSet.hasIN() is slow
> ------------------------------
>
>                 Key: CASSANDRA-12153
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12153
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination
>            Reporter: Tyler Hobbs
>            Assignee: Tyler Hobbs
>            Priority: Minor
>             Fix For: 3.x
>
>
> While profiling local in-memory reads for CASSANDRA-10993, I noticed that 
> {{RestrictionSet.hasIN()}} was responsible for about 1% of the time.  It 
> looks like it's mostly slow because it creates a new LinkedHashSet (which is 
> expensive to init) and uses streams.  This can be replaced with a simple for 
> loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
