[jira] [Commented] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2020-03-08 Thread Chandrasekhar Thumuluru (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054593#comment-17054593
 ] 

Chandrasekhar Thumuluru commented on CASSANDRA-15397:
-

[~benedict] — Sorry for the delayed update on this ticket. 

I made necessary changes to IntervalList implementation based on your feedback. 
The previous submission was moved to IntervalList2. I also refactored the other 
code to use the IntervalList instead of IntervalTree. You suggested to use 4 
long[] arrays but I couldn't do so since the code uses a generic that's 
comparable. I'm not sure if assuming long will be a good idea. Instead I 
changed the implementation to use two lists one to store the interval points 
and the other to store the data. Based on my performance comparison, I see the 
previous submission performs better. It could be partly because the previous 
implementation packs all the relevant items together. I also added the 
performance comparison for 100k, 250k, 500k, 750k and 1 million SSTables based 
on 1000 searches. 

Based on my analysis, the performance of Interval list with (250k and above) 
huge number of SSTables falls behind Interval tree by a small margin but the 
trade-off is less construction cost and less garbage creation during 
construction. Please take a look and let me know what you think. 

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/SSTable
>Reporter: Chandrasekhar Thumuluru
>Assignee: Chandrasekhar Thumuluru
>Priority: Low
>  Labels: pull-request-available
> Attachments: 90p_100k_sstables_with_1000_searches.png, 
> 90p_1million_sstables_with_1000_searches.png, 
> 90p_250k_sstables_with_1000_searches.png, 
> 90p_500k_sstables_with_1000_searches.png, 
> 90p_750k_sstables_with_1000_searches.png, 
> 95p_1_SSTable_with_5000_Searches.png, 
> 95p_100k_sstables_with_1000_searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_1million_sstables_with_1000_searches.png, 
> 95p_2_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_250k_sstables_with_1000_searches.png, 
> 95p_3_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 95p_500k_sstables_with_1000_searches.png, 
> 95p_750k_sstables_with_1000_searches.png, 
> 99p_1_SSTable_with_5000_Searches.png, 
> 99p_100k_sstables_with_1000_searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_1million_sstables_with_1000_searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_250k_sstables_with_1000_searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, 
> 99p_500k_sstables_with_1000_searches.png, 
> 99p_750k_sstables_with_1000_searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_1_SSTable_with_5000_Searches.png, 
> Mean_100k_sstables_with_1000_searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_1million_sstables_with_1000_searches.png, 
> Mean_2_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_250k_sstables_with_1000_searches.png, 
> Mean_3_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png, 
> Mean_500k_sstables_with_1000_searches.png, 
> Mean_750k_sstables_with_1000_searches.png, TESTS-TestSuites.xml.lz4, 
> replace_intervaltree_with_intervallist.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree or out performs IntervalTree based 
> search. The cost of IntervalTree construction is also substantial and 
> produces lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. The x-axis in the graphs is the 

[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2020-03-08 Thread Chandrasekhar Thumuluru (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar Thumuluru updated CASSANDRA-15397:

Attachment: 90p_1million_sstables_with_1000_searches.png
Mean_1million_sstables_with_1000_searches.png
95p_1million_sstables_with_1000_searches.png
99p_1million_sstables_with_1000_searches.png
90p_750k_sstables_with_1000_searches.png
95p_750k_sstables_with_1000_searches.png
99p_750k_sstables_with_1000_searches.png
Mean_750k_sstables_with_1000_searches.png
90p_500k_sstables_with_1000_searches.png
95p_500k_sstables_with_1000_searches.png
99p_500k_sstables_with_1000_searches.png
Mean_500k_sstables_with_1000_searches.png
90p_250k_sstables_with_1000_searches.png
95p_250k_sstables_with_1000_searches.png
99p_250k_sstables_with_1000_searches.png
Mean_250k_sstables_with_1000_searches.png
90p_100k_sstables_with_1000_searches.png
95p_100k_sstables_with_1000_searches.png
99p_100k_sstables_with_1000_searches.png
Mean_100k_sstables_with_1000_searches.png

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/SSTable
>Reporter: Chandrasekhar Thumuluru
>Assignee: Chandrasekhar Thumuluru
>Priority: Low
>  Labels: pull-request-available
> Attachments: 90p_100k_sstables_with_1000_searches.png, 
> 90p_1million_sstables_with_1000_searches.png, 
> 90p_250k_sstables_with_1000_searches.png, 
> 90p_500k_sstables_with_1000_searches.png, 
> 90p_750k_sstables_with_1000_searches.png, 
> 95p_1_SSTable_with_5000_Searches.png, 
> 95p_100k_sstables_with_1000_searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_1million_sstables_with_1000_searches.png, 
> 95p_2_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_250k_sstables_with_1000_searches.png, 
> 95p_3_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 95p_500k_sstables_with_1000_searches.png, 
> 95p_750k_sstables_with_1000_searches.png, 
> 99p_1_SSTable_with_5000_Searches.png, 
> 99p_100k_sstables_with_1000_searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_1million_sstables_with_1000_searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_250k_sstables_with_1000_searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, 
> 99p_500k_sstables_with_1000_searches.png, 
> 99p_750k_sstables_with_1000_searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_1_SSTable_with_5000_Searches.png, 
> Mean_100k_sstables_with_1000_searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_1million_sstables_with_1000_searches.png, 
> Mean_2_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_250k_sstables_with_1000_searches.png, 
> Mean_3_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png, 
> Mean_500k_sstables_with_1000_searches.png, 
> Mean_750k_sstables_with_1000_searches.png, TESTS-TestSuites.xml.lz4, 
> replace_intervaltree_with_intervallist.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree or out performs IntervalTree based 
> search. The cost of IntervalTree construction is also substantial and 
> produces lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. The x-axis in the graphs is the search interval 
> coverage.