[ https://issues.apache.org/jira/browse/CASSANDRA-12153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368229#comment-15368229 ]
Benjamin Lerer edited comment on CASSANDRA-12153 at 7/8/16 7:28 PM:
--------------------------------------------------------------------

I am the one to blame for the {{stream()}} method. My main concern when I created it was just to simplify the code. If we are really looking for speed, I think that we should have some field variables for {{hasIN}}, {{hasEq}} ... That would move the computation to preparation time rather than execution time, and it would be performed only once (if my memory is correct, {{hasIN()}} is called multiple times).

bq. Then remove RestrictionSet stream() to discourage this from being reintroduced?

There are two problems associated with the {{stream()}} method: the creation of the {{LinkedHashSet}}, which is used to remove the {{MultiColumnRestriction}} duplicates, and the lambda expressions. The {{LinkedHashSet}} is unfortunately also created in {{iterator()}}, so removing {{stream()}} will not solve that problem.
I think we could keep track of whether multi-column restrictions are used and avoid creating the {{LinkedHashSet}} if they are not. I have no idea of the cost associated with the use of the lambdas.

> RestrictionSet.hasIN() is slow
> ------------------------------
>
>                 Key: CASSANDRA-12153
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12153
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination
>            Reporter: Tyler Hobbs
>            Assignee: Tyler Hobbs
>            Priority: Minor
>             Fix For: 3.x
>
>
> While profiling local in-memory reads for CASSANDRA-10993, I noticed that
> {{RestrictionSet.hasIN()}} was responsible for about 1% of the time.  It
> looks like it's mostly slow because it creates a new LinkedHashSet (which is
> expensive to init) and uses streams.  This can be replaced with a simple for
> loop.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
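
To make the two suggestions in the comment above concrete, here is a minimal Java sketch. It is not the actual Cassandra code: {{RestrictionSetSketch}} and its nested {{Restriction}} class are hypothetical, simplified stand-ins, and the sketch assumes that a multi-column restriction is stored once per column it covers (which is why duplicates need removing). The flags are computed once with a plain for loop at construction time, and {{iterator()}} only builds a {{LinkedHashSet}} when a multi-column restriction is present.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical, simplified stand-in; NOT the real Cassandra RestrictionSet.
final class RestrictionSetSketch implements Iterable<RestrictionSetSketch.Restriction> {

    /** Hypothetical minimal restriction covering one or more columns. */
    static final class Restriction {
        final String name;
        final boolean isIN;
        final boolean isEQ;
        final int columnCount;

        Restriction(String name, boolean isIN, boolean isEQ, int columnCount) {
            this.name = name;
            this.isIN = isIN;
            this.isEQ = isEQ;
            this.columnCount = columnCount;
        }

        boolean isMultiColumn() {
            return columnCount > 1;
        }
    }

    // Assumption: one entry per restricted column, so the same multi-column
    // restriction instance can appear several times in this list.
    private final List<Restriction> perColumn;

    // Computed once, at construction ("preparation") time.
    private final boolean hasIN;
    private final boolean hasEQ;
    private final boolean hasMultiColumn;

    RestrictionSetSketch(List<Restriction> perColumn) {
        this.perColumn = Collections.unmodifiableList(new ArrayList<>(perColumn));
        boolean in = false;
        boolean eq = false;
        boolean multi = false;
        // A simple loop: no stream, no lambda, no intermediate set.
        for (Restriction r : this.perColumn) {
            in |= r.isIN;
            eq |= r.isEQ;
            multi |= r.isMultiColumn();
        }
        this.hasIN = in;
        this.hasEQ = eq;
        this.hasMultiColumn = multi;
    }

    // Repeated calls at execution time are now plain field reads.
    boolean hasIN() {
        return hasIN;
    }

    boolean hasEQ() {
        return hasEQ;
    }

    @Override
    public Iterator<Restriction> iterator() {
        if (!hasMultiColumn) {
            // Only single-column restrictions: no duplicates are possible,
            // so the LinkedHashSet allocation can be skipped.
            return perColumn.iterator();
        }
        // Deduplicate (by identity here) while preserving insertion order.
        return new LinkedHashSet<>(perColumn).iterator();
    }
}
```

With this shape, a set holding only single-column restrictions never allocates the {{LinkedHashSet}}, while a set in which one multi-column restriction is registered for two columns still yields that restriction only once during iteration. Whether caching the flags is safe in the real class depends on the set being immutable after preparation, which this sketch simply assumes.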