[ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967715#comment-16967715
 ] 

Benedict Elliott Smith commented on CASSANDRA-15397:
----------------------------------------------------

Hi [~cthumuluru], this sounds like a plausible optimisation (without having 
thought about it much myself yet).  Unfortunately it's not a very _pressing_ 
optimisation, but I will try to find time within the next couple of weeks to 
give you some feedback.

If possible, we generally prefer links to GitHub branches.  Could you push your 
fork with these changes somewhere to look at?

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> ------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15397
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/SSTable
>            Reporter: Chandrasekhar Thumuluru
>            Assignee: Chandrasekhar Thumuluru
>            Priority: Low
>         Attachments: 95p_10000_SSTable_with_5000_Searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_20000_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_30000_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 99p_10000_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_20000_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_30000_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_10000_SSTable_with_5000_Searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_20000_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_30000_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree or out performs IntervalTree based 
> search. The cost of IntervalTree construction is also substantial and 
> produces lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. The x-axis in the graphs is the search interval 
> coverage. 10p means the search interval covered 10% of the intervals. The 
> y-axis is the time the search took in nanos. 
> PS: 
> # For the purpose of test, I simplified the IntervalTree by removing the data 
> portion of the interval.  Modified the template version (Java generics) to a 
> specialized version. 
> # I used the code from Cassandra version _3.11_.
> # Time in the graph is in nanos. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to