[ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandrasekhar Thumuluru updated CASSANDRA-15397:
------------------------------------------------
    Description: 
Cassandra uses IntervalTrees to identify the SSTables that overlap with search 
interval. In Cassandra, IntervalTrees are not mutated. They are recreated each 
time a mutation is required. This can be an issue during repairs. In fact we 
noticed such issues during repair. 

Since lists are cache friendly compared to linked lists and trees, I decided to 
compare the search performance with:
* Linear Walk.
* Elimination using Binary Search (idea is to eliminate intervals using start 
and end points of search interval). 

Based on the tests I ran, I noticed Binary Search based elimination almost 
always performs similar to IntervalTree performance or out performs 
IntervalTree based search. 

I ran the tests using random intervals to build the tree/lists and another 
randomly generated search interval with 5000 iterations. I'm attaching all the 
relevant graphs. 

PS: For the purpose of test, I simplified the IntervalTree code by making it 
non-generic and removing the data portion of the interval.  I used the code 
from version _3.11_.

  was:
Cassandra uses IntervalTrees to identify the SSTables that overlap with search 
interval. In Cassandra, IntervalTrees are not mutated. They are recreated each 
time a mutation is required. This can be an issue during repairs. In fact we 
noticed such issues during repair. 

Since lists are cache friendly compared to linked lists and trees, I decided to 
compare the search performance with:
* Linear Walk.
* Elimination using Binary Search (idea is to eliminate intervals using start 
and end points of search interval). 

Based on the tests I ran, I noticed Binary Search based elimination almost 
always performs similar to IntervalTree performance or out performs 
IntervalTree based search. 

I ran the tests using random intervals to build the tree/lists and another 
randomly generated search interval with 5000 iterations. I'm attaching all the 
relevant graphs. 

PS: For the purpose of test, I simplified the IntervalTree code by making it 
non-generic and removing the data portion of the interval.  


> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> ------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15397
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Chandrasekhar Thumuluru
>            Priority: Normal
>         Attachments: 99p_10000_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_20000_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_30000_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png
>
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree performance or out performs 
> IntervalTree based search. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. 
> PS: For the purpose of test, I simplified the IntervalTree code by making it 
> non-generic and removing the data portion of the interval.  I used the code 
> from version _3.11_.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to