[jira] [Commented] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2020-03-09 Thread Chandrasekhar Thumuluru (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055108#comment-17055108
 ] 

Chandrasekhar Thumuluru commented on CASSANDRA-15397:
-

{quote}
I'm not sure if assuming long will be a good idea.
{quote}
I meant in the context of generics and about the performance.  

I'll make necessary changes, compare it again and post the results. 

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/SSTable
>    Reporter: Chandrasekhar Thumuluru
>    Assignee: Chandrasekhar Thumuluru
>Priority: Low
>  Labels: pull-request-available
> Attachments: 90p_100k_sstables_with_1000_searches.png, 
> 90p_1million_sstables_with_1000_searches.png, 
> 90p_250k_sstables_with_1000_searches.png, 
> 90p_500k_sstables_with_1000_searches.png, 
> 90p_750k_sstables_with_1000_searches.png, 
> 95p_1_SSTable_with_5000_Searches.png, 
> 95p_100k_sstables_with_1000_searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_1million_sstables_with_1000_searches.png, 
> 95p_2_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_250k_sstables_with_1000_searches.png, 
> 95p_3_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 95p_500k_sstables_with_1000_searches.png, 
> 95p_750k_sstables_with_1000_searches.png, 
> 99p_1_SSTable_with_5000_Searches.png, 
> 99p_100k_sstables_with_1000_searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_1million_sstables_with_1000_searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_250k_sstables_with_1000_searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, 
> 99p_500k_sstables_with_1000_searches.png, 
> 99p_750k_sstables_with_1000_searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_1_SSTable_with_5000_Searches.png, 
> Mean_100k_sstables_with_1000_searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_1million_sstables_with_1000_searches.png, 
> Mean_2_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_250k_sstables_with_1000_searches.png, 
> Mean_3_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png, 
> Mean_500k_sstables_with_1000_searches.png, 
> Mean_750k_sstables_with_1000_searches.png, TESTS-TestSuites.xml.lz4, 
> replace_intervaltree_with_intervallist.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree or out performs IntervalTree based 
> search. The cost of IntervalTree construction is also substantial and 
> produces lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. The x-axis in the graphs is the search interval 
> coverage. 10p means the search interval covered 10% of the intervals. The 
> y-axis is the time the search took in nanos. 
> PS: 
> # For the purpose of test, I simplified the IntervalTree by removing the data 
> portion of the interval.  Modified the template version (Java generics) to a 
> specialized version. 
> # I used the code from Cassandra version _3.11_.
> # Time in the graph is in nanos. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2020-03-09 Thread Chandrasekhar Thumuluru (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055108#comment-17055108
 ] 

Chandrasekhar Thumuluru edited comment on CASSANDRA-15397 at 3/9/20, 3:50 PM:
--

{quote}
I'm not sure if assuming long will be a good idea.
{quote}
I meant in the context of generics and not about the performance.  I'll make 
necessary changes, compare it again and post the results. 


was (Author: cthumuluru):
{quote}
I'm not sure if assuming long will be a good idea.
{quote}
I meant in the context of generics and about the performance.  

I'll make necessary changes, compare it again and post the results. 

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/SSTable
>    Reporter: Chandrasekhar Thumuluru
>    Assignee: Chandrasekhar Thumuluru
>Priority: Low
>  Labels: pull-request-available
> Attachments: 90p_100k_sstables_with_1000_searches.png, 
> 90p_1million_sstables_with_1000_searches.png, 
> 90p_250k_sstables_with_1000_searches.png, 
> 90p_500k_sstables_with_1000_searches.png, 
> 90p_750k_sstables_with_1000_searches.png, 
> 95p_1_SSTable_with_5000_Searches.png, 
> 95p_100k_sstables_with_1000_searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_1million_sstables_with_1000_searches.png, 
> 95p_2_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_250k_sstables_with_1000_searches.png, 
> 95p_3_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 95p_500k_sstables_with_1000_searches.png, 
> 95p_750k_sstables_with_1000_searches.png, 
> 99p_1_SSTable_with_5000_Searches.png, 
> 99p_100k_sstables_with_1000_searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_1million_sstables_with_1000_searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_250k_sstables_with_1000_searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, 
> 99p_500k_sstables_with_1000_searches.png, 
> 99p_750k_sstables_with_1000_searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_1_SSTable_with_5000_Searches.png, 
> Mean_100k_sstables_with_1000_searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_1million_sstables_with_1000_searches.png, 
> Mean_2_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_250k_sstables_with_1000_searches.png, 
> Mean_3_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png, 
> Mean_500k_sstables_with_1000_searches.png, 
> Mean_750k_sstables_with_1000_searches.png, TESTS-TestSuites.xml.lz4, 
> replace_intervaltree_with_intervallist.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree or out performs IntervalTree based 
> search. The cost of IntervalTree construction is also substantial and 
> produces lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. The x-axis in the graphs is the search interval 
> coverage. 10p means the search interval covered 10% of the intervals. The 
> y-axis is the time the search took in nanos. 
> PS: 
> # For the purpose of test, I simplified the IntervalTree by removing the data 
> portion of the interval.  Modified the template version (Java generics) to a 
> specialized version. 
> # I used the code from Cassandra version _3.11_.
> # Time in the graph is in nanos

[jira] [Commented] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2020-03-08 Thread Chandrasekhar Thumuluru (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054593#comment-17054593
 ] 

Chandrasekhar Thumuluru commented on CASSANDRA-15397:
-

[~benedict] — Sorry for the delayed update on this ticket. 

I made necessary changes to IntervalList implementation based on your feedback. 
The previous submission was moved to IntervalList2. I also refactored the other 
code to use the IntervalList instead of IntervalTree. You suggested to use 4 
long[] arrays but I couldn't do so since the code uses a generic that's 
comparable. I'm not sure if assuming long will be a good idea. Instead I 
changed the implementation to use two lists one to store the interval points 
and the other to store the data. Based on my performance comparison, I see the 
previous submission performs better. It could be partly because the previous 
implementation packs all the relevant items together. I also added the 
performance comparison for 100k, 250k, 500k, 750k and 1 million SSTables based 
on 1000 searches. 

Based on my analysis, the performance of Interval list with (250k and above) 
huge number of SSTables falls behind Interval tree by a small margin but the 
trade-off is less construction cost and less garbage creation during 
construction. Please take a look and let me know what you think. 

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/SSTable
>    Reporter: Chandrasekhar Thumuluru
>    Assignee: Chandrasekhar Thumuluru
>Priority: Low
>  Labels: pull-request-available
> Attachments: 90p_100k_sstables_with_1000_searches.png, 
> 90p_1million_sstables_with_1000_searches.png, 
> 90p_250k_sstables_with_1000_searches.png, 
> 90p_500k_sstables_with_1000_searches.png, 
> 90p_750k_sstables_with_1000_searches.png, 
> 95p_1_SSTable_with_5000_Searches.png, 
> 95p_100k_sstables_with_1000_searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_1million_sstables_with_1000_searches.png, 
> 95p_2_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_250k_sstables_with_1000_searches.png, 
> 95p_3_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 95p_500k_sstables_with_1000_searches.png, 
> 95p_750k_sstables_with_1000_searches.png, 
> 99p_1_SSTable_with_5000_Searches.png, 
> 99p_100k_sstables_with_1000_searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_1million_sstables_with_1000_searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_250k_sstables_with_1000_searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, 
> 99p_500k_sstables_with_1000_searches.png, 
> 99p_750k_sstables_with_1000_searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_1_SSTable_with_5000_Searches.png, 
> Mean_100k_sstables_with_1000_searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_1million_sstables_with_1000_searches.png, 
> Mean_2_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_250k_sstables_with_1000_searches.png, 
> Mean_3_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png, 
> Mean_500k_sstables_with_1000_searches.png, 
> Mean_750k_sstables_with_1000_searches.png, TESTS-TestSuites.xml.lz4, 
> replace_intervaltree_with_intervallist.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree or out performs IntervalTree based 
> search. The cost of IntervalTree construction is also substantial and 
> produces lot of garbage during rep

[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2020-03-08 Thread Chandrasekhar Thumuluru (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar Thumuluru updated CASSANDRA-15397:

Attachment: 90p_1million_sstables_with_1000_searches.png
Mean_1million_sstables_with_1000_searches.png
95p_1million_sstables_with_1000_searches.png
99p_1million_sstables_with_1000_searches.png
90p_750k_sstables_with_1000_searches.png
95p_750k_sstables_with_1000_searches.png
99p_750k_sstables_with_1000_searches.png
Mean_750k_sstables_with_1000_searches.png
90p_500k_sstables_with_1000_searches.png
95p_500k_sstables_with_1000_searches.png
99p_500k_sstables_with_1000_searches.png
Mean_500k_sstables_with_1000_searches.png
90p_250k_sstables_with_1000_searches.png
95p_250k_sstables_with_1000_searches.png
99p_250k_sstables_with_1000_searches.png
Mean_250k_sstables_with_1000_searches.png
90p_100k_sstables_with_1000_searches.png
95p_100k_sstables_with_1000_searches.png
99p_100k_sstables_with_1000_searches.png
Mean_100k_sstables_with_1000_searches.png

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/SSTable
>    Reporter: Chandrasekhar Thumuluru
>    Assignee: Chandrasekhar Thumuluru
>Priority: Low
>  Labels: pull-request-available
> Attachments: 90p_100k_sstables_with_1000_searches.png, 
> 90p_1million_sstables_with_1000_searches.png, 
> 90p_250k_sstables_with_1000_searches.png, 
> 90p_500k_sstables_with_1000_searches.png, 
> 90p_750k_sstables_with_1000_searches.png, 
> 95p_1_SSTable_with_5000_Searches.png, 
> 95p_100k_sstables_with_1000_searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_1million_sstables_with_1000_searches.png, 
> 95p_2_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_250k_sstables_with_1000_searches.png, 
> 95p_3_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 95p_500k_sstables_with_1000_searches.png, 
> 95p_750k_sstables_with_1000_searches.png, 
> 99p_1_SSTable_with_5000_Searches.png, 
> 99p_100k_sstables_with_1000_searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_1million_sstables_with_1000_searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_250k_sstables_with_1000_searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, 
> 99p_500k_sstables_with_1000_searches.png, 
> 99p_750k_sstables_with_1000_searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_1_SSTable_with_5000_Searches.png, 
> Mean_100k_sstables_with_1000_searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_1million_sstables_with_1000_searches.png, 
> Mean_2_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_250k_sstables_with_1000_searches.png, 
> Mean_3_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png, 
> Mean_500k_sstables_with_1000_searches.png, 
> Mean_750k_sstables_with_1000_searches.png, TESTS-TestSuites.xml.lz4, 
> replace_intervaltree_with_intervallist.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree or out performs IntervalTree based 
> search. The cost of IntervalTree construction is also substantial and 
> produces lot of garbage during repairs. 
> I ran the tests using r

[jira] [Comment Edited] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-12-20 Thread Chandrasekhar Thumuluru (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001037#comment-17001037
 ] 

Chandrasekhar Thumuluru edited comment on CASSANDRA-15397 at 12/20/19 5:50 PM:
---

[~benedict] — Thanks for your inputs. 
 * I'll rename the class. I intentionally didn't do it in the first version of 
PR so it looks less distracting. 
 * I'll definitely do the performance comparison with million+ SSTables. Please 
note, my previous tests were not produced from read SSTables. The SSTable 
metadata was generated with random distributions. You can refer to the test 
files attached and let me know if you have any suggestions. I guess not using 
the real SSTables is fair to compare the performance of IntervalTree?
 *  I definitely share your concern on potential slowness due to linear scan, 
but I shared some code references in this 
[doc|https://docs.google.com/document/d/1vwo9ArZbtgWUwJcvZGes_69YVh4yiP9c7NQFgI0iynQ/edit?usp=sharing]
  which makes me believe we are still good. Let me know your thought on that 
too. 
* I'm willing to try the improvement proposed to the algorithm. I'll talk to my 
team to gather context around what you are suggesting and get back to you if 
I've any questions. 
* I'm definitely willing to try the proposed changes and don't mind even it the 
assumption turns out to be wrong. 


was (Author: cthumuluru):
[~benedict] — Thanks for your inputs. 
 * I'll rename the class. I intentionally didn't do it in the first version of 
PR so it looks less distracting. 
 * I'll definitely do the performance comparison with million+ SSTables. Please 
note, my previous tests were not produced from read SSTables. The SSTable 
metadata was generated with random distributions. You can refer to the test 
files attached and let me know if you have any suggestions. I guess not using 
the real SSTables is fair to compare the performance of IntervalTree?
 *  I definitely share your concern on potential slowness due to linear scan, 
but I shared some code reference in this 
[doc|https://docs.google.com/document/d/1vwo9ArZbtgWUwJcvZGes_69YVh4yiP9c7NQFgI0iynQ/edit?usp=sharing]
  which makes me believe we are still good. Let me know your thought on that 
too. 
* I'm willing to try the improvement proposed to the algorithm. I'll talk to my 
team to gather context around what you are suggesting and get back to you if 
I've any questions. 
* I'm definitely willing to try the proposed changes and don't mind even it the 
assumption turns out to be wrong. 

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/SSTable
>    Reporter: Chandrasekhar Thumuluru
>    Assignee: Chandrasekhar Thumuluru
>Priority: Low
>  Labels: pull-request-available
> Attachments: 95p_1_SSTable_with_5000_Searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_2_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_3_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_1_SSTable_with_5000_Searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_2_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_3_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png, TESTS-TestSuites.xml.lz4, 
> replace_intervaltree_with_intervallist.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Sear

[jira] [Comment Edited] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-12-20 Thread Chandrasekhar Thumuluru (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001037#comment-17001037
 ] 

Chandrasekhar Thumuluru edited comment on CASSANDRA-15397 at 12/20/19 5:50 PM:
---

[~benedict] — Thanks for your inputs. 
 * I'll rename the class. I intentionally didn't do it in the first version of 
PR so it looks less distracting. 
 * I'll definitely do the performance comparison with million+ SSTables. Please 
note, my previous tests were not produced from read SSTables. The SSTable 
metadata was generated with random distributions. You can refer to the test 
files attached and let me know if you have any suggestions. I guess not using 
the real SSTables is fair to compare the performance of IntervalTree?
 *  I definitely share your concern on potential slowness due to linear scan, 
but I shared some code references in this 
[doc|https://docs.google.com/document/d/1vwo9ArZbtgWUwJcvZGes_69YVh4yiP9c7NQFgI0iynQ/edit?usp=sharing]
  which makes me believe we are still good. Let me know your thought on that 
too. 
* I'm willing to try the improvement proposed to the algorithm. I'll talk to my 
team to gather context around what you are suggesting and get back to you if 
I've any questions. 
* I'm definitely willing to try the proposed changes and don't mind even if the 
assumption turns out to be wrong. 


was (Author: cthumuluru):
[~benedict] — Thanks for your inputs. 
 * I'll rename the class. I intentionally didn't do it in the first version of 
PR so it looks less distracting. 
 * I'll definitely do the performance comparison with million+ SSTables. Please 
note, my previous tests were not produced from read SSTables. The SSTable 
metadata was generated with random distributions. You can refer to the test 
files attached and let me know if you have any suggestions. I guess not using 
the real SSTables is fair to compare the performance of IntervalTree?
 *  I definitely share your concern on potential slowness due to linear scan, 
but I shared some code references in this 
[doc|https://docs.google.com/document/d/1vwo9ArZbtgWUwJcvZGes_69YVh4yiP9c7NQFgI0iynQ/edit?usp=sharing]
  which makes me believe we are still good. Let me know your thought on that 
too. 
* I'm willing to try the improvement proposed to the algorithm. I'll talk to my 
team to gather context around what you are suggesting and get back to you if 
I've any questions. 
* I'm definitely willing to try the proposed changes and don't mind even it the 
assumption turns out to be wrong. 

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/SSTable
>    Reporter: Chandrasekhar Thumuluru
>    Assignee: Chandrasekhar Thumuluru
>Priority: Low
>  Labels: pull-request-available
> Attachments: 95p_1_SSTable_with_5000_Searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_2_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_3_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_1_SSTable_with_5000_Searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_2_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_3_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png, TESTS-TestSuites.xml.lz4, 
> replace_intervaltree_with_intervallist.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Sear

[jira] [Commented] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-12-20 Thread Chandrasekhar Thumuluru (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001037#comment-17001037
 ] 

Chandrasekhar Thumuluru commented on CASSANDRA-15397:
-

[~benedict] — Thanks for your inputs. 
 * I'll rename the class. I intentionally didn't do it in the first version of 
PR so it looks less distracting. 
 * I'll definitely do the performance comparison with million+ SSTables. Please 
note, my previous tests were not produced from read SSTables. The SSTable 
metadata was generated with random distributions. You can refer to the test 
files attached and let me know if you have any suggestions. I guess not using 
the real SSTables is fair to compare the performance of IntervalTree?
 *  I definitely share your concern on potential slowness due to linear scan, 
but I shared some code reference in this 
[doc|https://docs.google.com/document/d/1vwo9ArZbtgWUwJcvZGes_69YVh4yiP9c7NQFgI0iynQ/edit?usp=sharing]
  which makes me believe we are still good. Let me know your thought on that 
too. 
* I'm willing to try the improvement proposed to the algorithm. I'll talk to my 
team to gather context around what you are suggesting and get back to you if 
I've any questions. 
* I'm definitely willing to try the proposed changes and don't mind even it the 
assumption turns out to be wrong. 

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/SSTable
>    Reporter: Chandrasekhar Thumuluru
>    Assignee: Chandrasekhar Thumuluru
>Priority: Low
>  Labels: pull-request-available
> Attachments: 95p_1_SSTable_with_5000_Searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_2_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_3_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_1_SSTable_with_5000_Searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_2_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_3_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png, TESTS-TestSuites.xml.lz4, 
> replace_intervaltree_with_intervallist.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree or out performs IntervalTree based 
> search. The cost of IntervalTree construction is also substantial and 
> produces lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. The x-axis in the graphs is the search interval 
> coverage. 10p means the search interval covered 10% of the intervals. The 
> y-axis is the time the search took in nanos. 
> PS: 
> # For the purpose of test, I simplified the IntervalTree by removing the data 
> portion of the interval.  Modified the template version (Java generics) to a 
> specialized version. 
> # I used the code from Cassandra version _3.11_.
> # Time in the graph is in nanos. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-12-19 Thread Chandrasekhar Thumuluru (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000219#comment-17000219
 ] 

Chandrasekhar Thumuluru commented on CASSANDRA-15397:
-

[~benedict] — I posted the changes to my branch and created a 
[PR|https://github.com/apache/cassandra/pull/400]. Please provide your comments 
when you find free time. Thanks. 

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/SSTable
>    Reporter: Chandrasekhar Thumuluru
>    Assignee: Chandrasekhar Thumuluru
>Priority: Low
>  Labels: pull-request-available
> Attachments: 95p_1_SSTable_with_5000_Searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_2_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_3_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_1_SSTable_with_5000_Searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_2_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_3_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png, TESTS-TestSuites.xml.lz4, 
> replace_intervaltree_with_intervallist.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree or out performs IntervalTree based 
> search. The cost of IntervalTree construction is also substantial and 
> produces lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. The x-axis in the graphs is the search interval 
> coverage. 10p means the search interval covered 10% of the intervals. The 
> y-axis is the time the search took in nanos. 
> PS: 
> # For the purpose of test, I simplified the IntervalTree by removing the data 
> portion of the interval.  Modified the template version (Java generics) to a 
> specialized version. 
> # I used the code from Cassandra version _3.11_.
> # Time in the graph is in nanos. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-12-10 Thread Chandrasekhar Thumuluru (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993222#comment-16993222
 ] 

Chandrasekhar Thumuluru commented on CASSANDRA-15397:
-

[~benedict] — When you get a chance, can you review the changes and let me know 
your feedback?

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/SSTable
>    Reporter: Chandrasekhar Thumuluru
>    Assignee: Chandrasekhar Thumuluru
>Priority: Low
> Attachments: 95p_1_SSTable_with_5000_Searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_2_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_3_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_1_SSTable_with_5000_Searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_2_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_3_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png, TESTS-TestSuites.xml.lz4, 
> replace_intervaltree_with_intervallist.patch
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree or out performs IntervalTree based 
> search. The cost of IntervalTree construction is also substantial and 
> produces lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. The x-axis in the graphs is the search interval 
> coverage. 10p means the search interval covered 10% of the intervals. The 
> y-axis is the time the search took in nanos. 
> PS: 
> # For the purpose of test, I simplified the IntervalTree by removing the data 
> portion of the interval.  Modified the template version (Java generics) to a 
> specialized version. 
> # I used the code from Cassandra version _3.11_.
> # Time in the graph is in nanos. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-14 Thread Chandrasekhar Thumuluru (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar Thumuluru updated CASSANDRA-15397:

Attachment: TESTS-TestSuites.xml.lz4

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/SSTable
>    Reporter: Chandrasekhar Thumuluru
>    Assignee: Chandrasekhar Thumuluru
>Priority: Low
> Attachments: 95p_1_SSTable_with_5000_Searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_2_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_3_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_1_SSTable_with_5000_Searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_2_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_3_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png, TESTS-TestSuites.xml.lz4, 
> replace_intervaltree_with_intervallist.patch
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree or out performs IntervalTree based 
> search. The cost of IntervalTree construction is also substantial and 
> produces lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. The x-axis in the graphs is the search interval 
> coverage. 10p means the search interval covered 10% of the intervals. The 
> y-axis is the time the search took in nanos. 
> PS: 
> # For the purpose of test, I simplified the IntervalTree by removing the data 
> portion of the interval.  Modified the template version (Java generics) to a 
> specialized version. 
> # I used the code from Cassandra version _3.11_.
> # Time in the graph is in nanos. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-14 Thread Chandrasekhar Thumuluru (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974574#comment-16974574
 ] 

Chandrasekhar Thumuluru commented on CASSANDRA-15397:
-

[~benedict] —
 * Attached the patch file with my changes 
[^replace_intervaltree_with_intervallist.patch]. I used trunk for the patch.
 * Attached the unit test run results [^TESTS-TestSuites.xml.lz4] .
 * 
[Link|https://docs.google.com/document/d/1vwo9ArZbtgWUwJcvZGes_69YVh4yiP9c7NQFgI0iynQ/edit?usp=sharing]
 to google doc with some details about the proposed algorithm.

Please review the changes when you get a chance and let me know your feedback.

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/SSTable
>    Reporter: Chandrasekhar Thumuluru
>    Assignee: Chandrasekhar Thumuluru
>Priority: Low
> Attachments: 95p_1_SSTable_with_5000_Searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_2_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_3_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_1_SSTable_with_5000_Searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_2_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_3_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png, TESTS-TestSuites.xml.lz4, 
> replace_intervaltree_with_intervallist.patch
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree or out performs IntervalTree based 
> search. The cost of IntervalTree construction is also substantial and 
> produces lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. The x-axis in the graphs is the search interval 
> coverage. 10p means the search interval covered 10% of the intervals. The 
> y-axis is the time the search took in nanos. 
> PS: 
> # For the purpose of test, I simplified the IntervalTree by removing the data 
> portion of the interval.  Modified the template version (Java generics) to a 
> specialized version. 
> # I used the code from Cassandra version _3.11_.
> # Time in the graph is in nanos. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-14 Thread Chandrasekhar Thumuluru (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar Thumuluru updated CASSANDRA-15397:

Attachment: replace_intervaltree_with_intervallist.patch

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/SSTable
>    Reporter: Chandrasekhar Thumuluru
>    Assignee: Chandrasekhar Thumuluru
>Priority: Low
> Attachments: 95p_1_SSTable_with_5000_Searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_2_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_3_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_1_SSTable_with_5000_Searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_2_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_3_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png, 
> replace_intervaltree_with_intervallist.patch
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree or out performs IntervalTree based 
> search. The cost of IntervalTree construction is also substantial and 
> produces lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. The x-axis in the graphs is the search interval 
> coverage. 10p means the search interval covered 10% of the intervals. The 
> y-axis is the time the search took in nanos. 
> PS: 
> # For the purpose of test, I simplified the IntervalTree by removing the data 
> portion of the interval.  Modified the template version (Java generics) to a 
> specialized version. 
> # I used the code from Cassandra version _3.11_.
> # Time in the graph is in nanos. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-05 Thread Chandrasekhar Thumuluru (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967732#comment-16967732
 ] 

Chandrasekhar Thumuluru commented on CASSANDRA-15397:
-

Sure [~benedict]. I can make the changes and update the ticket with Github 
links. As you can see I simplified the IntervalTree implementation for 
comparison purposes. I'll make the final changes with tests and push them to my 
fork by weekend.

I completely agree with you it's not a pressing change but given the 
construction cost and immutable nature of IntervalTree usage I felt it's worth 
a shot. 

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/SSTable
>    Reporter: Chandrasekhar Thumuluru
>    Assignee: Chandrasekhar Thumuluru
>Priority: Low
> Attachments: 95p_1_SSTable_with_5000_Searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_2_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_3_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_1_SSTable_with_5000_Searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_2_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_3_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree or out performs IntervalTree based 
> search. The cost of IntervalTree construction is also substantial and 
> produces lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. The x-axis in the graphs is the search interval 
> coverage. 10p means the search interval covered 10% of the intervals. The 
> y-axis is the time the search took in nanos. 
> PS: 
> # For the purpose of test, I simplified the IntervalTree by removing the data 
> portion of the interval.  Modified the template version (Java generics) to a 
> specialized version. 
> # I used the code from Cassandra version _3.11_.
> # Time in the graph is in nanos. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-05 Thread Chandrasekhar Thumuluru (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar Thumuluru updated CASSANDRA-15397:

Attachment: Mean_3_SSTable_with_5000_Searches.png
Mean_25000_SSTable_with_5000_Searches.png
Mean_2_SSTable_with_5000_Searches.png
Mean_15000_SSTable_with_5000_Searches.png
Mean_1_SSTable_with_5000_Searches.png
Mean_5000_SSTable_with_5000_Searches.png

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>    Reporter: Chandrasekhar Thumuluru
>Priority: Normal
> Attachments: 95p_1_SSTable_with_5000_Searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_2_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_3_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_1_SSTable_with_5000_Searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_2_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_3_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree or out performs IntervalTree based 
> search. The cost of IntervalTree construction is also substantial and 
> produces lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. The x-axis in the graphs is the search interval 
> coverage. 10p means the search interval covered 10% of the intervals. The 
> y-axis is the time the search took in nanos. 
> PS: 
> # For the purpose of test, I simplified the IntervalTree by removing the data 
> portion of the interval.  Modified the template version (Java generics) to a 
> specialized version. 
> # I used the code from Cassandra version _3.11_.
> # Time in the graph is in nanos. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-05 Thread Chandrasekhar Thumuluru (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar Thumuluru updated CASSANDRA-15397:

Attachment: 95p_5000_SSTable_with_5000_Searches.png
95p_1_SSTable_with_5000_Searches.png
95p_15000_SSTable_with_5000_Searches.png
95p_2_SSTable_with_5000_Searches.png
95p_25000_SSTable_with_5000_Searches.png
95p_3_SSTable_with_5000_Searches.png

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>    Reporter: Chandrasekhar Thumuluru
>Priority: Normal
> Attachments: 95p_1_SSTable_with_5000_Searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_2_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_3_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree or out performs IntervalTree based 
> search. The cost of IntervalTree construction is also substantial and 
> produces lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. The x-axis in the graphs is the search interval 
> coverage. 10p means the search interval covered 10% of the intervals. The 
> y-axis is the time the search took in nanos. 
> PS: 
> # For the purpose of test, I simplified the IntervalTree by removing the data 
> portion of the interval.  Modified the template version (Java generics) to a 
> specialized version. 
> # I used the code from Cassandra version _3.11_.
> # Time in the graph is in nanos. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-05 Thread Chandrasekhar Thumuluru (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar Thumuluru updated CASSANDRA-15397:

Description: 
Cassandra uses IntervalTrees to identify the SSTables that overlap with search 
interval. In Cassandra, IntervalTrees are not mutated. They are recreated each 
time a mutation is required. This can be an issue during repairs. In fact we 
noticed such issues during repair. 

Since lists are cache friendly compared to linked lists and trees, I decided to 
compare the search performance with:
* Linear Walk.
* Elimination using Binary Search (idea is to eliminate intervals using start 
and end points of search interval). 

Based on the tests I ran, I noticed Binary Search based elimination almost 
always performs similar to IntervalTree or out performs IntervalTree based 
search. The cost of IntervalTree construction is also substantial and produces 
lot of garbage during repairs. 

I ran the tests using random intervals to build the tree/lists and another 
randomly generated search interval with 5000 iterations. I'm attaching all the 
relevant graphs. The x-axis in the graphs is the search interval coverage. 10p 
means the search interval covered 10% of the intervals. The y-axis is the time 
the search took in nanos. 

PS: 
# For the purpose of test, I simplified the IntervalTree by removing the data 
portion of the interval.  Modified the template version (Java generics) to a 
specialized version. 
# I used the code from Cassandra version _3.11_.
# Time in the graph is in nanos. 

  was:
Cassandra uses IntervalTrees to identify the SSTables that overlap with search 
interval. In Cassandra, IntervalTrees are not mutated. They are recreated each 
time a mutation is required. This can be an issue during repairs. In fact we 
noticed such issues during repair. 

Since lists are cache friendly compared to linked lists and trees, I decided to 
compare the search performance with:
* Linear Walk.
* Elimination using Binary Search (idea is to eliminate intervals using start 
and end points of search interval). 

Based on the tests I ran, I noticed Binary Search based elimination almost 
always performs similar to IntervalTree or out performs IntervalTree based 
search. The cost of IntervalTree construction is also substantial and produces 
lot of garbage during repairs. 

I ran the tests using random intervals to build the tree/lists and another 
randomly generated search interval with 5000 iterations. I'm attaching all the 
relevant graphs. The x-axis in the graphs is the search interval coverage. 10p 
means the search interval covered 10% of the intervals. The y-axis is the time 
the search took in nanos. 

PS: 
# For the purpose of test, I simplified the IntervalTree code by making it 
non-generic and removing the data portion of the interval.  
# I used the code from Cassandra version _3.11_.
# Time in the graph is in nanos. 


> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>    Reporter: Chandrasekhar Thumuluru
>Priority: Normal
> Attachments: 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree or out performs IntervalTree based 
> search. The cost of IntervalTree construction is also substantial and 
> produces lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. The x-axis in the graphs is the search interval 
>

[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-05 Thread Chandrasekhar Thumuluru (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar Thumuluru updated CASSANDRA-15397:

Description: 
Cassandra uses IntervalTrees to identify the SSTables that overlap with search 
interval. In Cassandra, IntervalTrees are not mutated. They are recreated each 
time a mutation is required. This can be an issue during repairs. In fact we 
noticed such issues during repair. 

Since lists are cache friendly compared to linked lists and trees, I decided to 
compare the search performance with:
* Linear Walk.
* Elimination using Binary Search (idea is to eliminate intervals using start 
and end points of search interval). 

Based on the tests I ran, I noticed Binary Search based elimination almost 
always performs similar to IntervalTree or out performs IntervalTree based 
search. The cost of IntervalTree construction is also substantial and produces 
lot of garbage during repairs. 

I ran the tests using random intervals to build the tree/lists and another 
randomly generated search interval with 5000 iterations. I'm attaching all the 
relevant graphs. The x-axis in the graphs is the search interval coverage. 10p 
means the search interval covered 10% of the intervals. The y-axis is the time 
the search took in nanos. 

PS: 
# For the purpose of test, I simplified the IntervalTree code by making it 
non-generic and removing the data portion of the interval.  
# I used the code from Cassandra version _3.11_.
# Time in the graph is in nanos. 

  was:
Cassandra uses IntervalTrees to identify the SSTables that overlap with search 
interval. In Cassandra, IntervalTrees are not mutated. They are recreated each 
time a mutation is required. This can be an issue during repairs. In fact we 
noticed such issues during repair. 

Since lists are cache friendly compared to linked lists and trees, I decided to 
compare the search performance with:
* Linear Walk.
* Elimination using Binary Search (idea is to eliminate intervals using start 
and end points of search interval). 

Based on the tests I ran, I noticed Binary Search based elimination almost 
always performs similar to IntervalTree performance or out performs 
IntervalTree based search. 

I ran the tests using random intervals to build the tree/lists and another 
randomly generated search interval with 5000 iterations. I'm attaching all the 
relevant graphs. 


PS: 
# For the purpose of test, I simplified the IntervalTree code by making it 
non-generic and removing the data portion of the interval.  
# I used the code from Cassandra version _3.11_.
# Time in the graph is in nanos. 


> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>    Reporter: Chandrasekhar Thumuluru
>Priority: Normal
> Attachments: 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree or out performs IntervalTree based 
> search. The cost of IntervalTree construction is also substantial and 
> produces lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. The x-axis in the graphs is the search interval 
> coverage. 10p means the search interval covered 10% of the intervals. The 
> y-axis is the time the search took in nanos. 
> PS: 
> # For the purpose of test, I simplified the IntervalTree code by making it 
> non-generic and removing the data portion of the interval.  
> # I use

[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-04 Thread Chandrasekhar Thumuluru (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar Thumuluru updated CASSANDRA-15397:

Description: 
Cassandra uses IntervalTrees to identify the SSTables that overlap with search 
interval. In Cassandra, IntervalTrees are not mutated. They are recreated each 
time a mutation is required. This can be an issue during repairs. In fact we 
noticed such issues during repair. 

Since lists are cache friendly compared to linked lists and trees, I decided to 
compare the search performance with:
* Linear Walk.
* Elimination using Binary Search (idea is to eliminate intervals using start 
and end points of search interval). 

Based on the tests I ran, I noticed Binary Search based elimination almost 
always performs similar to IntervalTree performance or out performs 
IntervalTree based search. 

I ran the tests using random intervals to build the tree/lists and another 
randomly generated search interval with 5000 iterations. I'm attaching all the 
relevant graphs. 


PS: 
# For the purpose of test, I simplified the IntervalTree code by making it 
non-generic and removing the data portion of the interval.  
# I used the code from Cassandra version _3.11_.
# Time in the graph is in nanos. 

  was:
Cassandra uses IntervalTrees to identify the SSTables that overlap with search 
interval. In Cassandra, IntervalTrees are not mutated. They are recreated each 
time a mutation is required. This can be an issue during repairs. In fact we 
noticed such issues during repair. 

Since lists are cache friendly compared to linked lists and trees, I decided to 
compare the search performance with:
* Linear Walk.
* Elimination using Binary Search (idea is to eliminate intervals using start 
and end points of search interval). 

Based on the tests I ran, I noticed Binary Search based elimination almost 
always performs similar to IntervalTree performance or out performs 
IntervalTree based search. 

I ran the tests using random intervals to build the tree/lists and another 
randomly generated search interval with 5000 iterations. I'm attaching all the 
relevant graphs. 

PS: 
# For the purpose of test, I simplified the IntervalTree code by making it 
non-generic and removing the data portion of the interval.  
# I used the code from Cassandra version _3.11_.
# Time in the graph is in nanos. 


> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>    Reporter: Chandrasekhar Thumuluru
>Priority: Normal
> Attachments: 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree performance or out performs 
> IntervalTree based search. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. 
> PS: 
> # For the purpose of test, I simplified the IntervalTree code by making it 
> non-generic and removing the data portion of the interval.  
> # I used the code from Cassandra version _3.11_.
> # Time in the graph is in nanos. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-04 Thread Chandrasekhar Thumuluru (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar Thumuluru updated CASSANDRA-15397:

Description: 
Cassandra uses IntervalTrees to identify the SSTables that overlap with search 
interval. In Cassandra, IntervalTrees are not mutated. They are recreated each 
time a mutation is required. This can be an issue during repairs. In fact we 
noticed such issues during repair. 

Since lists are cache friendly compared to linked lists and trees, I decided to 
compare the search performance with:
* Linear Walk.
* Elimination using Binary Search (idea is to eliminate intervals using start 
and end points of search interval). 

Based on the tests I ran, I noticed Binary Search based elimination almost 
always performs similar to IntervalTree performance or out performs 
IntervalTree based search. 

I ran the tests using random intervals to build the tree/lists and another 
randomly generated search interval with 5000 iterations. I'm attaching all the 
relevant graphs. 

PS: 
# For the purpose of test, I simplified the IntervalTree code by making it 
non-generic and removing the data portion of the interval.  
# I used the code from Cassandra version _3.11_.
# Time in the graph is in nanos. 

  was:
Cassandra uses IntervalTrees to identify the SSTables that overlap with search 
interval. In Cassandra, IntervalTrees are not mutated. They are recreated each 
time a mutation is required. This can be an issue during repairs. In fact we 
noticed such issues during repair. 

Since lists are cache friendly compared to linked lists and trees, I decided to 
compare the search performance with:
* Linear Walk.
* Elimination using Binary Search (idea is to eliminate intervals using start 
and end points of search interval). 

Based on the tests I ran, I noticed Binary Search based elimination almost 
always performs similar to IntervalTree performance or out performs 
IntervalTree based search. 

I ran the tests using random intervals to build the tree/lists and another 
randomly generated search interval with 5000 iterations. I'm attaching all the 
relevant graphs. 

PS: For the purpose of test, I simplified the IntervalTree code by making it 
non-generic and removing the data portion of the interval.  I used the code 
from version _3.11_.


> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>    Reporter: Chandrasekhar Thumuluru
>Priority: Normal
> Attachments: 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree performance or out performs 
> IntervalTree based search. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. 
> PS: 
> # For the purpose of test, I simplified the IntervalTree code by making it 
> non-generic and removing the data portion of the interval.  
> # I used the code from Cassandra version _3.11_.
> # Time in the graph is in nanos. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-04 Thread Chandrasekhar Thumuluru (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar Thumuluru updated CASSANDRA-15397:

Attachment: IntervalTreeSimplified.java

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>    Reporter: Chandrasekhar Thumuluru
>Priority: Normal
> Attachments: 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree performance or out performs 
> IntervalTree based search. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. 
> PS: For the purpose of test, I simplified the IntervalTree code by making it 
> non-generic and removing the data portion of the interval.  I used the code 
> from version _3.11_.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-04 Thread Chandrasekhar Thumuluru (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar Thumuluru updated CASSANDRA-15397:

Attachment: IntervalList.java

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>    Reporter: Chandrasekhar Thumuluru
>Priority: Normal
> Attachments: 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree performance or out performs 
> IntervalTree based search. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. 
> PS: For the purpose of test, I simplified the IntervalTree code by making it 
> non-generic and removing the data portion of the interval.  I used the code 
> from version _3.11_.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-04 Thread Chandrasekhar Thumuluru (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar Thumuluru updated CASSANDRA-15397:

Attachment: IntervalListWithElimination.java

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>    Reporter: Chandrasekhar Thumuluru
>Priority: Normal
> Attachments: 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree performance or out performs 
> IntervalTree based search. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. 
> PS: For the purpose of test, I simplified the IntervalTree code by making it 
> non-generic and removing the data portion of the interval.  I used the code 
> from version _3.11_.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-04 Thread Chandrasekhar Thumuluru (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar Thumuluru updated CASSANDRA-15397:

Description: 
Cassandra uses IntervalTrees to identify the SSTables that overlap with search 
interval. In Cassandra, IntervalTrees are not mutated. They are recreated each 
time a mutation is required. This can be an issue during repairs. In fact we 
noticed such issues during repair. 

Since lists are cache friendly compared to linked lists and trees, I decided to 
compare the search performance with:
* Linear Walk.
* Elimination using Binary Search (idea is to eliminate intervals using start 
and end points of search interval). 

Based on the tests I ran, I noticed Binary Search based elimination almost 
always performs similar to IntervalTree performance or out performs 
IntervalTree based search. 

I ran the tests using random intervals to build the tree/lists and another 
randomly generated search interval with 5000 iterations. I'm attaching all the 
relevant graphs. 

PS: For the purpose of test, I simplified the IntervalTree code by making it 
non-generic and removing the data portion of the interval.  I used the code 
from version _3.11_.

  was:
Cassandra uses IntervalTrees to identify the SSTables that overlap with search 
interval. In Cassandra, IntervalTrees are not mutated. They are recreated each 
time a mutation is required. This can be an issue during repairs. In fact we 
noticed such issues during repair. 

Since lists are cache friendly compared to linked lists and trees, I decided to 
compare the search performance with:
* Linear Walk.
* Elimination using Binary Search (idea is to eliminate intervals using start 
and end points of search interval). 

Based on the tests I ran, I noticed Binary Search based elimination almost 
always performs similar to IntervalTree performance or out performs 
IntervalTree based search. 

I ran the tests using random intervals to build the tree/lists and another 
randomly generated search interval with 5000 iterations. I'm attaching all the 
relevant graphs. 

PS: For the purpose of test, I simplified the IntervalTree code by making it 
non-generic and removing the data portion of the interval.  


> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>    Reporter: Chandrasekhar Thumuluru
>Priority: Normal
> Attachments: 99p_1_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree performance or out performs 
> IntervalTree based search. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. 
> PS: For the purpose of test, I simplified the IntervalTree code by making it 
> non-generic and removing the data portion of the interval.  I used the code 
> from version _3.11_.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2019-11-04 Thread Chandrasekhar Thumuluru (Jira)
Chandrasekhar Thumuluru created CASSANDRA-15397:
---

 Summary: IntervalTree performance comparison with Linear Walk and 
Binary Search based Elimination. 
 Key: CASSANDRA-15397
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chandrasekhar Thumuluru
 Attachments: 99p_1_SSTable_with_5000_Searches.png, 
99p_15000_SSTable_with_5000_Searches.png, 
99p_2_SSTable_with_5000_Searches.png, 
99p_25000_SSTable_with_5000_Searches.png, 
99p_3_SSTable_with_5000_Searches.png, 
99p_5000_SSTable_with_5000_Searches.png

Cassandra uses IntervalTrees to identify the SSTables that overlap with search 
interval. In Cassandra, IntervalTrees are not mutated. They are recreated each 
time a mutation is required. This can be an issue during repairs. In fact we 
noticed such issues during repair. 

Since lists are cache friendly compared to linked lists and trees, I decided to 
compare the search performance with:
* Linear Walk.
* Elimination using Binary Search (idea is to eliminate intervals using start 
and end points of search interval). 

Based on the tests I ran, I noticed Binary Search based elimination almost 
always performs similar to IntervalTree performance or out performs 
IntervalTree based search. 

I ran the tests using random intervals to build the tree/lists and another 
randomly generated search interval with 5000 iterations. I'm attaching all the 
relevant graphs. 

PS: For the purpose of test, I simplified the IntervalTree code by making it 
non-generic and removing the data portion of the interval.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: Feedback wanted for Knowledge base for all things cassandra (cassandra.link)

2019-02-25 Thread Chandrasekhar Thumuluru
Wow, this is great! Thanks!

On Mon, Feb 25, 2019 at 7:05 AM Rahul Singh 
wrote:

> Folks,
>
> I've been scrounging time to work on a knowledge resource for all things
> Cassandra ( Cassandra, DSE, Scylla, YugaByte, Elassandra)
>
> I feel like the Cassandra core community still has the most knowledge even
> though people are fragmenting into their brands.
>
> Would love to get your feedback on what you guys would want as a go to
> resource for Cassandra development, administration, architecture, etc.
> resources.
>
> *MVP  1*
> https://anant.github.io/awesome-cassandra
>
>
> *MVP  2*
> https://cassandra.netlify.com/
>
> *MVP  3*
>
> https://leaves-search.netlify.com/documents.html#/q=*:*=tags:(cassandra)=*=20&=
>   -
>
> Each of these were iterated with feedback from the community, so would
> love to get your feedback to make it better.
>
> Up next is to add the RSS feeds from the major Cassandra folks like on
> https://cassandra.alteroot.org
>
> Thanks for your feedback in advance.
>
>