[ 
https://issues.apache.org/jira/browse/CASSANDRA-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699273#comment-13699273
 ] 

Sylvain Lebresne commented on CASSANDRA-5677:
---------------------------------------------

bq. Why would we want to retain the Centered implementation?

The centered implementation is faster at lookups in general when there is 
ranges overlapping, which is exactly what we care about for sstables lookup in 
DataTracker so we probably want to keep it for that.

In fact, I think we should specialize things further for range tombstones, 
because in practice there is no real reason to keep overlapping range 
tombstones, so all we should keep is a list of non-overlapping ranges. 
Furthermore, when we read, we actually read range tombstones in sorted order 
(of their start), so we can do the same kind of optimization that we do in 
ArrayBackedSortedColumns. Also, we do more than once something like:
{noformat}
for (Column c : cf)
  if (cf.deletionInfo().isDeleted(c) && ...)
     ...
{noformat}
which for every column ends up looking up the whole deletionInfo, even though 
we know we will call isDeleted() in sorted order, so we could optimize that.
Lastly, our current way of handling range tombstone isn't particularly cheap in 
term of allocated object, and I think we can do better :)

But anyway, I'll post what I have in mind tomorrow (need to clean up the patch).
                
> Performance improvements of RangeTombstones/IntervalTree
> --------------------------------------------------------
>
>                 Key: CASSANDRA-5677
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5677
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.2.0
>            Reporter: Fabien Rousseau
>            Priority: Minor
>         Attachments: 5677-new-IntervalTree-implementation.patch
>
>
> Using massively range tombstones leads to bad response time (ie 100-500 
> ranges tombstones per row).
> After investigation, it seems that the culprit is how the DeletionInfo are 
> merged. Each time a RangeTombstone is added into the DeletionInfo, the whole 
> IntervalTree is rebuilt (thus, if you have 100 tombstones in one row, then 
> 100 instances of IntervalTree are created, the first one having one interval, 
> the second one 2 intervals,... the 100th one : 100 intervals...)
> It seems that once the IntervalTree is built, it is not possible to add a new 
> Interval. Idea is to change the implementation of the IntervalTree by another 
> one which support "insert interval".
> Attached is a proposed patch which :
>  - renames the IntervalTree implementation to IntervalTreeCentered (the 
> renaming is inspired from : http://en.wikipedia.org/wiki/Interval_tree)
>  - adds a new implementation IntervalTreeAvl (which is described here : 
> http://en.wikipedia.org/wiki/Interval_tree#Augmented_tree and here : 
> http://en.wikipedia.org/wiki/AVL_tree )
>  - adds a new interface IIntervalTree to abstract the implementation
>  - adds a new configuration option (interval_tree_provider) which allows to 
> choose between the two implementations (defaults to previous 
> IntervalTreeCentered)
>  - updates IntervalTreeTest unit tests to test both implementations
>  - creates a mini benchmark between the two implementations (tree creation, 
> point lookup, interval lookup)
>  - creates a mini benchmark between the two implementations when merging 
> DeletionInfo (which shows a big performance improvement when using 500 
> tombstones for a row)
> This patch applies for 1.2 branch...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to