[ 
https://issues.apache.org/jira/browse/CASSANDRA-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845362#comment-13845362
 ] 

Oleg Anastasyev commented on CASSANDRA-6446:
--------------------------------------------

On write path:
Your patch looks good to me. 

On MAX_COLUMNS_TO_APPLY_RANGE_TOMBSTONE: we did not do thorough testing of 
what the best value for this could be. It is a kind of guess, based on 
comparing the complexities of the 2 code paths. All we know at the moment is 
that setting it to 10000 did not degrade performance for small memtables and 
solved the write timeout problems with large ones. Of course, running some 
perf tests is the best option here.
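
As a rough illustration of the 2 code paths being compared, here is a minimal 
Java sketch. The class, the Range type and the integer column names are 
hypothetical simplifications and not the code from the patch; only the 
constant name and the value 10000 come from the discussion above.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class RangeTombstoneApplySketch {
    // Value mentioned in this thread; it is a guess, not a tuned number.
    static final int MAX_COLUMNS_TO_APPLY_RANGE_TOMBSTONE = 10000;

    // Hypothetical, simplified tombstone: deletes column names in [start, end].
    static final class Range {
        final int start;
        final int end;
        Range(int start, int end) { this.start = start; this.end = end; }
    }

    // Path 1: visit every column and test it against every tombstone.
    // Cost grows with (columns x tombstones).
    static int purgeByColumns(NavigableMap<Integer, String> columns, List<Range> tombstones) {
        int removed = 0;
        Iterator<Integer> it = columns.keySet().iterator();
        while (it.hasNext()) {
            int name = it.next();
            for (Range r : tombstones) {
                if (name >= r.start && name <= r.end) {
                    it.remove();
                    removed++;
                    break;
                }
            }
        }
        return removed;
    }

    // Path 2: visit every tombstone and remove only the columns it covers.
    // Cost grows with (tombstones x log(columns)) plus the columns actually removed.
    static int purgeByTombstones(NavigableMap<Integer, String> columns, List<Range> tombstones) {
        int removed = 0;
        for (Range r : tombstones) {
            NavigableMap<Integer, String> covered = columns.subMap(r.start, true, r.end, true);
            removed += covered.size();
            covered.clear(); // clearing the view removes the entries from the backing map
        }
        return removed;
    }

    // Pick a path by partition size, as the threshold is meant to do.
    static int purge(NavigableMap<Integer, String> columns, List<Range> tombstones) {
        return columns.size() > MAX_COLUMNS_TO_APPLY_RANGE_TOMBSTONE
                ? purgeByTombstones(columns, tombstones)
                : purgeByColumns(columns, tombstones);
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> columns = new TreeMap<>();
        for (int i = 0; i < 10; i++) columns.put(i, "v" + i);
        int removed = purge(columns, List.of(new Range(2, 3), new Range(7, 20)));
        System.out.println(removed + " removed, remaining: " + columns.keySet());
    }
}
```

Both paths remove the same columns; they differ only in what they iterate 
over, which is why the best threshold depends on the ratio of columns to 
tombstones and is worth measuring.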

On read path:
1. If we make 2 calls to the existing searchInternal, we could have a 
situation where neither start nor end falls into any range, while there are 
ranges between start and end. For example, we select 1-4 and we have deleted 
range 2-3. We will get -1 from both calls and will erroneously drop the range.
2, 3. You are right, the filter implementation is not optimized for multiple 
Names/Slices. Done the proposed way, the code would be more complicated and 
probably error prone. Since multiple Names/Slices queries are rare and the 
gain would be very low, we decided not to complicate things.
4. >> And if the rational is that it's faster to add a few tombstone blindly 
than to call getRangeTombstoneIterator,
Yes, this was the original intent. Anyway, it does not make much sense, so 
let's just remove this check.
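
To make point 1 concrete, here is a minimal Java sketch of the 1-4 vs 2-3 
example. It assumes a sorted, non-overlapping list of deleted ranges stored in 
2 int arrays; search() is a simplified stand-in for a point lookup that 
returns -1 on a miss, not the real searchInternal, and all names are 
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class RangeOverlapSketch {
    // Point lookup: index of the range [starts[i], ends[i]] containing name, or -1.
    static int search(int[] starts, int[] ends, int name) {
        int lo = 0, hi = starts.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (name < starts[mid]) hi = mid - 1;
            else if (name > ends[mid]) lo = mid + 1;
            else return mid;
        }
        return -1;
    }

    // Flawed check: looks only at the 2 endpoints of the query.
    static boolean endpointsHitTombstone(int[] starts, int[] ends, int qStart, int qEnd) {
        return search(starts, ends, qStart) != -1 || search(starts, ends, qEnd) != -1;
    }

    // Overlap search: indices of every range that intersects [qStart, qEnd].
    static List<Integer> overlapping(int[] starts, int[] ends, int qStart, int qEnd) {
        // Binary search for the first range that ends at or after qStart.
        int lo = 0, hi = starts.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (ends[mid] < qStart) lo = mid + 1;
            else hi = mid;
        }
        // Walk forward while ranges still start at or before qEnd.
        List<Integer> result = new ArrayList<>();
        for (int i = lo; i < starts.length && starts[i] <= qEnd; i++) {
            result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] starts = {2};
        int[] ends = {3};
        // Query 1-4 against deleted range 2-3.
        System.out.println(endpointsHitTombstone(starts, ends, 1, 4));
        System.out.println(overlapping(starts, ends, 1, 4));
    }
}
```

Here both endpoint lookups miss, so the first check reports no tombstone, 
while the overlap search still finds the range at index 0.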


> Faster range tombstones on wide partitions
> ------------------------------------------
>
>                 Key: CASSANDRA-6446
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6446
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Oleg Anastasyev
>            Assignee: Oleg Anastasyev
>             Fix For: 2.1
>
>         Attachments: 6446-write-path-v2.txt, 
> RangeTombstonesReadOptimization.diff, RangeTombstonesWriteOptimization.diff
>
>
> With wide partitions (~1M CQL rows in a single partition), after deleting 
> some of the rows we found inefficiencies in the handling of range tombstones 
> on both the write and read paths.
> I attached 2 patches here, one for write path 
> (RangeTombstonesWriteOptimization.diff) and another on read 
> (RangeTombstonesReadOptimization.diff).
> On the write path, when you delete CQL rows by primary key, each deletion 
> is represented by a range tombstone. When this tombstone is put into the 
> memtable, the original code takes all of the partition's columns from the 
> memtable and checks DeletionInfo.isDeleted in a brute-force for loop to 
> decide whether each column should stay in the memtable or was deleted by the 
> new tombstone. Needless to say, the more columns a partition has, the slower 
> deletions become, burning CPU on brute-force range tombstone checks.
> For partitions with more than 10000 columns, the 
> RangeTombstonesWriteOptimization.diff patch loops over the tombstones 
> instead and checks for the existence of columns covered by each of them. It 
> also copies the whole memtable range tombstone list only if there are 
> changes to be made to it (the original code copies the range tombstone list 
> on every write).
> On the read path, the original code scans the whole range tombstone list of 
> a partition to match sstable columns to their range tombstones. The 
> RangeTombstonesReadOptimization.diff patch scans only the necessary range of 
> tombstones, according to the filter used for the read.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
