[ 
https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593843#comment-13593843
 ] 

Rick Branson commented on CASSANDRA-3929:
-----------------------------------------

[~liqusha]: What I mean is that in order to DELETE only the tail, Cassandra 
will have to read the entire row. For instance, if your minimum retention 
requirement is ~500 columns, then in order to find any columns after the 
500th, the following operations must be performed:

 * All of the columns are read from the SSTable files that contain columns for 
that row
 * These row fragments are "merged" (re-sorting by Comparator, tombstone 
removal, etc)
 * Tombstones must be inserted for each column "after" the 500th.
 * As time goes on and tombstones build up (before GC grace), this operation 
gets more and more expensive and compaction perf also suffers.
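
To make the cost of those steps concrete, here is a rough toy model in Python. 
It is only a sketch, not Cassandra's actual read or compaction path: the Cell 
type, merge_fragments and tail_tombstones are hypothetical names, the column 
comparator is assumed to be plain integer order, and each SSTable fragment is 
just a sorted in-memory list.

```python
import heapq
from itertools import groupby
from typing import List, NamedTuple, Optional


class Cell(NamedTuple):
    name: int              # column name; assumed comparator is integer order
    timestamp: int         # write timestamp used to reconcile duplicates
    value: Optional[str]   # None marks a tombstone


def merge_fragments(fragments: List[List[Cell]]) -> List[Cell]:
    """Merge per-SSTable row fragments (each sorted by name).

    When the same column appears in several fragments, the cell with the
    newest timestamp wins.
    """
    by_name = heapq.merge(*fragments, key=lambda c: c.name)
    return [
        max(group, key=lambda c: c.timestamp)
        for _, group in groupby(by_name, key=lambda c: c.name)
    ]


def tail_tombstones(fragments: List[List[Cell]], keep: int, now: int) -> List[Cell]:
    """Return the tombstones needed to delete every live column after the first `keep`.

    Note that every fragment is read in full, even though only the tail
    is being deleted.
    """
    live = [c for c in merge_fragments(fragments) if c.value is not None]
    return [Cell(c.name, now, None) for c in live[keep:]]


sstable_a = [Cell(1, 10, "a"), Cell(3, 10, "c"), Cell(5, 10, "e")]
sstable_b = [Cell(2, 20, "b"), Cell(3, 30, None), Cell(4, 20, "d")]
for tombstone in tail_tombstones([sstable_a, sstable_b], keep=2, now=100):
    print(tombstone)
```

Even in this toy version, every call scans all fragments of the row and adds 
one new tombstone per trimmed column, which is the overhead described above.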

What I mean by "free" is not just avoiding the need to perform the DELETE 
operation, but that supporting this feature doesn't add an extra cost burden.

As far as use case, it varies quite a bit. There are many use cases I can 
imagine for persistent storage with a quota for each user that auto-evicts old 
data over time for a low cost. Even for "big data" scenarios, the cost of 
computing still goes up as the data size grows. For instance, a database used 
to store objects a user interacted with for performing collaborative filtering 
only needs a sample. In real world use cases, these types of algorithms really 
need a relatively bounded set of data, and user taste might change over time, 
so only taking into consideration the most recent 90 objects makes sense. 
TTL'ing this data also doesn't make sense, because there is a wide range of 
frequencies at which users might generate this data.
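
A quick illustration of the TTL point, as a sketch with made-up numbers: the 
two users, their write schedules (in hours) and the 7-day TTL are all 
assumptions chosen for the example, and retained_by_ttl / retained_by_count 
are hypothetical helpers, not Cassandra APIs. Only the cap of 90 comes from 
the example above.

```python
from typing import List


def retained_by_ttl(write_times: List[int], now: int, ttl: int) -> List[int]:
    """Keep only writes that are younger than `ttl` at time `now`."""
    return [t for t in write_times if now - t < ttl]


def retained_by_count(write_times: List[int], n: int) -> List[int]:
    """Keep only the `n` most recent writes, regardless of age."""
    return sorted(write_times)[-n:]


NOW = 720          # 30 days, measured in hours
TTL = 7 * 24       # assumed 7-day TTL
CAP = 90           # most recent 90 objects

heavy_user = list(range(0, 720))   # assumed: one write every hour
light_user = [0, 240, 480]         # assumed: one write every 10 days

for label, writes in (("heavy", heavy_user), ("light", light_user)):
    by_ttl = len(retained_by_ttl(writes, NOW, TTL))
    by_count = len(retained_by_count(writes, CAP))
    print(label, "ttl:", by_ttl, "count cap:", by_count)
```

With a single TTL, the frequent user keeps more than the sample needs while 
the infrequent user loses everything; a per-row count cap bounds the first 
without wiping out the second.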

[~slebresne]: I spent a few hours digging through the compaction source and it's 
going to be messy to do this, probably involving a lot of copy+paste, so I'm 
even more +1 on disaggregating that massive Runnable method in CompactionTask 
into something more pluggable / extensible.
                
> Support row size limits
> -----------------------
>
>                 Key: CASSANDRA-3929
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3929
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>              Labels: ponies
>             Fix For: 2.0
>
>         Attachments: 3929_b.txt, 3929_c.txt, 3929_d.txt, 3929_e.txt, 
> 3929_f.txt, 3929_g_tests.txt, 3929_g.txt, 3929.txt
>
>
> We currently support expiring columns by time-to-live; we've also had 
> requests for keeping the most recent N columns in a row.

