[ https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054593#comment-17054593 ]
Chandrasekhar Thumuluru commented on CASSANDRA-15397: ----------------------------------------------------- [~benedict] — Sorry for the delayed update on this ticket. I made necessary changes to IntervalList implementation based on your feedback. The previous submission was moved to IntervalList2. I also refactored the other code to use the IntervalList instead of IntervalTree. You suggested to use 4 long[] arrays but I couldn't do so since the code uses a generic that's comparable. I'm not sure if assuming long will be a good idea. Instead I changed the implementation to use two lists one to store the interval points and the other to store the data. Based on my performance comparison, I see the previous submission performs better. It could be partly because the previous implementation packs all the relevant items together. I also added the performance comparison for 100k, 250k, 500k, 750k and 1 million SSTables based on 1000 searches. Based on my analysis, the performance of Interval list with (250k and above) huge number of SSTables falls behind Interval tree by a small margin but the trade-off is less construction cost and less garbage creation during construction. Please take a look and let me know what you think. > IntervalTree performance comparison with Linear Walk and Binary Search based > Elimination. > ------------------------------------------------------------------------------------------ > > Key: CASSANDRA-15397 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15397 > Project: Cassandra > Issue Type: Improvement > Components: Local/SSTable > Reporter: Chandrasekhar Thumuluru > Assignee: Chandrasekhar Thumuluru > Priority: Low > Labels: pull-request-available > Attachments: 90p_100k_sstables_with_1000_searches.png, > 90p_1million_sstables_with_1000_searches.png, > 90p_250k_sstables_with_1000_searches.png, > 90p_500k_sstables_with_1000_searches.png, > 90p_750k_sstables_with_1000_searches.png, > 95p_10000_SSTable_with_5000_Searches.png, > 95p_100k_sstables_with_1000_searches.png, > 95p_15000_SSTable_with_5000_Searches.png, > 95p_1million_sstables_with_1000_searches.png, > 95p_20000_SSTable_with_5000_Searches.png, > 95p_25000_SSTable_with_5000_Searches.png, > 95p_250k_sstables_with_1000_searches.png, > 95p_30000_SSTable_with_5000_Searches.png, > 95p_5000_SSTable_with_5000_Searches.png, > 95p_500k_sstables_with_1000_searches.png, > 95p_750k_sstables_with_1000_searches.png, > 99p_10000_SSTable_with_5000_Searches.png, > 99p_100k_sstables_with_1000_searches.png, > 99p_15000_SSTable_with_5000_Searches.png, > 99p_1million_sstables_with_1000_searches.png, > 99p_20000_SSTable_with_5000_Searches.png, > 99p_25000_SSTable_with_5000_Searches.png, > 99p_250k_sstables_with_1000_searches.png, > 99p_30000_SSTable_with_5000_Searches.png, > 99p_5000_SSTable_with_5000_Searches.png, > 99p_500k_sstables_with_1000_searches.png, > 99p_750k_sstables_with_1000_searches.png, IntervalList.java, > IntervalListWithElimination.java, IntervalTreeSimplified.java, > Mean_10000_SSTable_with_5000_Searches.png, > Mean_100k_sstables_with_1000_searches.png, > Mean_15000_SSTable_with_5000_Searches.png, > Mean_1million_sstables_with_1000_searches.png, > Mean_20000_SSTable_with_5000_Searches.png, > Mean_25000_SSTable_with_5000_Searches.png, > Mean_250k_sstables_with_1000_searches.png, > Mean_30000_SSTable_with_5000_Searches.png, > Mean_5000_SSTable_with_5000_Searches.png, > Mean_500k_sstables_with_1000_searches.png, > Mean_750k_sstables_with_1000_searches.png, TESTS-TestSuites.xml.lz4, > replace_intervaltree_with_intervallist.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > Cassandra uses IntervalTrees to identify the SSTables that overlap with > search interval. In Cassandra, IntervalTrees are not mutated. They are > recreated each time a mutation is required. This can be an issue during > repairs. In fact we noticed such issues during repair. > Since lists are cache friendly compared to linked lists and trees, I decided > to compare the search performance with: > * Linear Walk. > * Elimination using Binary Search (idea is to eliminate intervals using start > and end points of search interval). > Based on the tests I ran, I noticed Binary Search based elimination almost > always performs similar to IntervalTree or out performs IntervalTree based > search. The cost of IntervalTree construction is also substantial and > produces lot of garbage during repairs. > I ran the tests using random intervals to build the tree/lists and another > randomly generated search interval with 5000 iterations. I'm attaching all > the relevant graphs. The x-axis in the graphs is the search interval > coverage. 10p means the search interval covered 10% of the intervals. The > y-axis is the time the search took in nanos. > PS: > # For the purpose of test, I simplified the IntervalTree by removing the data > portion of the interval. Modified the template version (Java generics) to a > specialized version. > # I used the code from Cassandra version _3.11_. > # Time in the graph is in nanos. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org