[ https://issues.apache.org/jira/browse/CASSANDRA-12153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15370492#comment-15370492 ]
Benjamin Lerer commented on CASSANDRA-12153:
--------------------------------------------

I profiled trunk with Mission Control, using the {{-XX:+UnlockDiagnosticVMOptions}} and {{-XX:+DebugNonSafepoints}} options. To make sure the problem was not hidden by the result-processing part of the code, I used a program that fired queries at an empty table.

On my initial run, {{RestrictionSet.hasIN()}} was second on the hot methods list, with 3.14% of the time in the call tree. Looking at the other hot methods, I found that {{HashMap.putVal()}}, {{HashMap.resize()}} and {{HashSet.init()}} were also high on the list. In the allocation tab, {{HashMap$Node[]}}, {{LinkedHashMap}} and {{LinkedHashMap$Entry}} accounted for a total allocation pressure of 11.81%. All of those came from the {{LinkedHashSet}} creation in the {{RestrictionSet.stream()}} and {{RestrictionSet.iterator()}} methods.

Replacing {{RestrictionSet.stream()}} and the lambdas with simple loops over {{RestrictionSet.iterator()}} reduced {{RestrictionSet.hasIN()}} to 0.61% of the call tree. Replacing {{new LinkedHashSet()}} with a custom filtering iterator brought {{RestrictionSet.hasIN()}} down to 0.10% and removed all the allocations caused by the {{LinkedHashSet}}.

As duplicates can only occur when multi-column restrictions are used, I added a variable to keep track of that and only filter duplicates when needed. That had a strong effect everywhere {{RestrictionSet.iterator()}} was used and changed the complete picture: {{RestrictionSet.hasIN()}} went back up to 0.51%, simply because other parts of the code were now faster :-).

I have pushed my experimental branch [here|https://github.com/apache/cassandra/compare/trunk...blerer:12153-trunk]. Waiting for your feedback.

PS: To be honest, I did not expect {{new LinkedHashSet()}}, {{stream()}} and lambdas to be so costly from a performance point of view.
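For illustration, the two changes described in the comment (plain loops instead of {{stream()}} and lambdas, and a filtering iterator that is only used when multi-column restrictions are present) could look roughly like the sketch below. It is not the code from the linked branch: {{Restriction}}, {{SimpleRestriction}}, {{RestrictionSetSketch}} and the per-column list layout are hypothetical. The duplicate filter also assumes that a multi-column restriction is stored once per column it covers, in adjacent slots, so comparing against the previously returned element is enough and no backing {{Set}} is needed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for Cassandra's restriction type (not the real API).
interface Restriction {
    boolean isIN();
    boolean isMultiColumn();
}

// Hypothetical trivial implementation, only used to exercise the sketch.
final class SimpleRestriction implements Restriction {
    private final boolean in;
    private final boolean multiColumn;
    SimpleRestriction(boolean in, boolean multiColumn) { this.in = in; this.multiColumn = multiColumn; }
    public boolean isIN() { return in; }
    public boolean isMultiColumn() { return multiColumn; }
}

final class RestrictionSetSketch implements Iterable<Restriction> {
    // One slot per restricted column. Assumption: a multi-column restriction
    // is added once per covered column, so its duplicates are adjacent.
    private final List<Restriction> perColumn = new ArrayList<>();
    // Duplicates can only exist once a multi-column restriction was added.
    private boolean hasMultiColumnRestrictions;

    void add(Restriction restriction, int columnCount) {
        for (int i = 0; i < columnCount; i++)
            perColumn.add(restriction);
        hasMultiColumnRestrictions |= restriction.isMultiColumn();
    }

    @Override
    public Iterator<Restriction> iterator() {
        Iterator<Restriction> it = perColumn.iterator();
        // Only pay for duplicate filtering when duplicates are possible.
        return hasMultiColumnRestrictions ? new DistinctAdjacentIterator<>(it) : it;
    }

    // Plain loop instead of stream().anyMatch(...): no stream pipeline or lambda.
    boolean hasIN() {
        for (Restriction restriction : this)
            if (restriction.isIN())
                return true;
        return false;
    }

    // Skips elements identical (==) to the one returned just before.
    // Assumes the source never yields null elements.
    private static final class DistinctAdjacentIterator<E> implements Iterator<E> {
        private final Iterator<E> source;
        private E previous; // last element handed out
        private E next;     // look-ahead element, null if not computed yet

        DistinctAdjacentIterator(Iterator<E> source) { this.source = source; }

        @Override
        public boolean hasNext() {
            while (next == null && source.hasNext()) {
                E candidate = source.next();
                if (candidate != previous)
                    next = candidate;
            }
            return next != null;
        }

        @Override
        public E next() {
            if (!hasNext())
                throw new NoSuchElementException();
            previous = next;
            next = null;
            return previous;
        }
    }
}
```

With only single-column restrictions, {{iterator()}} returns the list's own iterator unchanged; a set holding a two-column IN restriction plus one equality restriction iterates over two distinct elements instead of three.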
> RestrictionSet.hasIN() is slow
> ------------------------------
>
>                 Key: CASSANDRA-12153
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12153
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination
>            Reporter: Tyler Hobbs
>            Assignee: Tyler Hobbs
>            Priority: Minor
>             Fix For: 3.x
>
>
> While profiling local in-memory reads for CASSANDRA-10993, I noticed that
> {{RestrictionSet.hasIN()}} was responsible for about 1% of the time. It
> looks like it's mostly slow because it creates a new LinkedHashSet (which is
> expensive to init) and uses streams. This can be replaced with a simple for
> loop.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)