[jira] [Comment Edited] (CASSANDRA-12153) RestrictionSet.hasIN() is slow

Benjamin Lerer (JIRA) Fri, 08 Jul 2016 12:29:46 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-12153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368229#comment-15368229
 ]


Benjamin Lerer edited comment on CASSANDRA-12153 at 7/8/16 7:28 PM:
--------------------------------------------------------------------

I am the one to blame for the {{stream()}} method. My main concern, when I 
created it, was just to simplify the code.
If we are really looking for speed, I think that we should have some field 
variables for {{hasIN}}, {{hasEq}} ...
That would move the computation to preparation time rather than execution time 
and would perform it only once (if my memory is correct, {{hasIN()}} is called 
multiple times).
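
A minimal sketch of the field-variable idea, not the actual Cassandra code: 
{{SketchRestriction}} and {{FlaggedRestrictionSet}} are hypothetical names, and 
the sketch assumes the set is immutable and built up once while the statement is 
prepared. The flags are updated when a restriction is added, so the getters are 
plain field reads at execution time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a restriction; not Cassandra's actual interface.
final class SketchRestriction {
    final boolean in;
    final boolean eq;

    SketchRestriction(boolean in, boolean eq) {
        this.in = in;
        this.eq = eq;
    }
}

// Hypothetical immutable set that keeps its flags up to date as it is built.
final class FlaggedRestrictionSet {
    private final List<SketchRestriction> restrictions;
    private final boolean hasIN;
    private final boolean hasEq;

    private FlaggedRestrictionSet(List<SketchRestriction> restrictions,
                                  boolean hasIN,
                                  boolean hasEq) {
        this.restrictions = restrictions;
        this.hasIN = hasIN;
        this.hasEq = hasEq;
    }

    static FlaggedRestrictionSet empty() {
        return new FlaggedRestrictionSet(
                Collections.<SketchRestriction>emptyList(), false, false);
    }

    // Returns a new set; the flags are computed here, once per added restriction.
    FlaggedRestrictionSet addRestriction(SketchRestriction r) {
        List<SketchRestriction> copy = new ArrayList<>(restrictions.size() + 1);
        copy.addAll(restrictions);
        copy.add(r);
        return new FlaggedRestrictionSet(copy, hasIN || r.in, hasEq || r.eq);
    }

    // No iteration and no allocation when the query is executed.
    boolean hasIN() { return hasIN; }

    boolean hasEq() { return hasEq; }

    int size() { return restrictions.size(); }
}
```

With this shape, calling {{hasIN()}} several times per request costs the same as 
calling it once.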

bq. Then remove RestrictionSet stream() to discourage this from being 
reintroduced?

There are 2 problems associated with the {{stream()}} method: the creation of 
the {{LinkedHashSet}}, which is used to remove the {{MultiColumnRestriction}} 
duplicates, and the lambda expressions.
The {{LinkedHashSet}} is unfortunately also created in {{iterator()}}, so 
removing {{stream()}} will not solve that problem. 
I think we could keep track of whether multicolumn restrictions are used 
and avoid creating the {{LinkedHashSet}} if they are not.
I have no idea of the cost associated with the use of the lambdas.
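
A rough sketch of that tracking, under stated assumptions: {{ColumnRestriction}} 
and {{TrackedRestrictions}} are hypothetical names, and the model assumes a 
multicolumn restriction is stored once per column it covers, which is where the 
duplicates come from. The {{LinkedHashSet}} is only built when such a 
restriction has been added, and {{hasIN()}} uses a plain loop instead of a 
stream.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical restriction model; columnCount > 1 means multicolumn.
final class ColumnRestriction {
    final String label;
    final int columnCount;
    final boolean in;

    ColumnRestriction(String label, int columnCount, boolean in) {
        this.label = label;
        this.columnCount = columnCount;
        this.in = in;
    }
}

final class TrackedRestrictions {
    // One entry per restricted column, so a multicolumn restriction repeats.
    private final List<ColumnRestriction> perColumn = new ArrayList<>();
    private boolean hasMultiColumn;

    void add(ColumnRestriction r) {
        for (int i = 0; i < r.columnCount; i++) {
            perColumn.add(r);
        }
        hasMultiColumn |= r.columnCount > 1;
    }

    // Deduplicated view in insertion order.
    Collection<ColumnRestriction> distinct() {
        if (!hasMultiColumn) {
            // No duplicates are possible, so skip the LinkedHashSet entirely.
            return Collections.unmodifiableList(perColumn);
        }
        return new LinkedHashSet<>(perColumn);
    }

    // Plain loop: duplicates do not change the answer, so no dedup is needed.
    boolean hasIN() {
        for (ColumnRestriction r : perColumn) {
            if (r.in) {
                return true;
            }
        }
        return false;
    }
}
```

Whether this is a measurable win would need profiling; the sketch only shows 
that the set allocation can be skipped in the common single-column case.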


> RestrictionSet.hasIN() is slow
> ------------------------------
>
>                 Key: CASSANDRA-12153
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12153
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination
>            Reporter: Tyler Hobbs
>            Assignee: Tyler Hobbs
>            Priority: Minor
>             Fix For: 3.x
>
>
> While profiling local in-memory reads for CASSANDRA-10993, I noticed that 
> {{RestrictionSet.hasIN()}} was responsible for about 1% of the time.  It 
> looks like it's mostly slow because it creates a new LinkedHashSet (which is 
> expensive to init) and uses streams.  This can be replaced with a simple for 
> loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
