[jira] [Commented] (CASSANDRA-4011) range-based log(n) elimination of sstables in read path

Jonathan Ellis (JIRA) Fri, 25 Jan 2013 15:51:15 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563179#comment-13563179 ]


Jonathan Ellis commented on CASSANDRA-4011:
-------------------------------------------

DataTracker.intervalTree is used on the read path regardless of the 
compaction strategy.
                
> range-based log(n) elimination of sstables in read path
> -------------------------------------------------------
>
>                 Key: CASSANDRA-4011
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4011
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>
> If the read path were able to eliminate sstables based on token ranges, we 
> would avoid {{O(n)}} bloom filter checks ({{n}} being the number of sstables).
> Contributing motivation:
> * For maximally efficient bulk-import, you tend to want a lot of small 
> sstables to avoid having to build up huge ones during the bulk creation 
> process.
> * To avoid having to keep duplicate data when switching a data set (in a 
> periodic bulk replace import process), keeping sstables partitioned on token 
> range (similarly to leveled compaction) allows in-place replacement of 
> sstables one sstable at a time.
> Those two in combination would mean that you can run a bulk-import based 
> total-dataset-replacement cluster with zero compaction and with zero disk 
> space overhead stemming from having to have overhead for compaction.
> In addition:
> * For e.g. leveled compaction, where we have range-based partitioning anyway, 
> {{O(log n)}} is preferable to {{O(n)}}, especially if it would allow us to have 
> more than 10 "partitions" per level. I'm not sure yet whether there are other 
> reasons to have "only" 10, but if we can make them smaller by eliminating the 
> {{O(n)}} behavior in the read path, individual compactions can be even 
> smaller with leveled compaction, and you would scale even more easily with 
> large data sets while avoiding build-up in L0.
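
For illustration only, the range-based elimination described in the issue can
be sketched as below. This is a minimal sketch, not Cassandra's
DataTracker.intervalTree: it assumes the sstables cover non-overlapping token
ranges (as within one leveled-compaction level), models tokens as plain longs,
and uses a sorted map in place of an interval tree. The names
SSTableRangeIndex, Range, add and candidateFor are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical index over sstables whose token ranges do not overlap. */
final class SSTableRangeIndex {

    /** Hypothetical stand-in for an sstable: a name plus an inclusive token range. */
    private static final class Range {
        final String name;
        final long first;
        final long last;

        Range(String name, long first, long last) {
            this.name = name;
            this.first = first;
            this.last = last;
        }
    }

    // Keyed by the first token of each range; TreeMap lookups are O(log n).
    private final TreeMap<Long, Range> byFirstToken = new TreeMap<>();

    /** Registers an sstable covering [first, last]; rejects overlapping ranges. */
    void add(String name, long first, long last) {
        if (first > last) {
            throw new IllegalArgumentException("first token is greater than last token");
        }
        // Because stored ranges never overlap, only the range starting closest
        // below 'last' can intersect the new one.
        Map.Entry<Long, Range> below = byFirstToken.floorEntry(last);
        if (below != null && below.getValue().last >= first) {
            throw new IllegalArgumentException("range overlaps " + below.getValue().name);
        }
        byFirstToken.put(first, new Range(name, first, last));
    }

    /**
     * Returns the single sstable that could contain the token, if any.
     * Every other sstable is eliminated without touching its bloom filter.
     */
    Optional<String> candidateFor(long token) {
        Map.Entry<Long, Range> entry = byFirstToken.floorEntry(token);
        if (entry == null || entry.getValue().last < token) {
            return Optional.empty();
        }
        return Optional.of(entry.getValue().name);
    }
}
```

With ranges [0, 99], [100, 199] and [300, 399] registered, a lookup for token
150 yields only the second sstable, and a lookup for token 250 yields none, so
no bloom filter needs to be consulted in that case. Overlapping sstables (for
example in L0 or under size-tiered compaction) would need a real interval tree
rather than this sorted map.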

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
