[ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593843#comment-13593843 ]
Rick Branson commented on CASSANDRA-3929:
-----------------------------------------

[~liqusha]: What I mean is that in order to DELETE only the tail, Cassandra will have to read the entire row. For instance, if your minimum retention requirement is ~500 columns, then to find any columns after the 500th, the following operations must be performed:

* All of the columns are read from the SSTable files that contain columns for that row.
* These row fragments are "merged" (re-sorted by Comparator, tombstones removed, etc.).
* Tombstones must be inserted for each column "after" the 500th.
* As time goes on and tombstones build up (before GC grace), this operation gets more and more expensive, and compaction performance also suffers.

By "free" I mean not actually needing to perform the DELETE operation, so supporting this feature doesn't add an extra cost burden.

As far as use cases go, they vary quite a bit. I can imagine many use cases for persistent storage with a per-user quota that auto-evicts old data over time at low cost. Even in "big data" scenarios, the cost of computation still goes up as the data size grows. For instance, a database that stores the objects a user interacted with, used for collaborative filtering, only needs a sample. In real-world use cases these algorithms really need a relatively bounded set of data, and user taste may change over time, so only taking the most recent 90 objects into consideration makes sense. TTL'ing this data doesn't make sense either, because users generate this data at a wide range of frequencies.

[~slebresne]: I spent a few hours digging through the compaction source, and doing this is going to be messy, probably involving a lot of copy+paste, so I'm even more +1 on disaggregating that massive Runnable method in CompactionTask into something more pluggable/extensible.
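To make the cost argument concrete, here is a minimal Python sketch of the read-merge-tombstone sequence for deleting only the tail of a row. It is a toy model, not Cassandra code: the `Column` class and the `merge_fragments` and `delete_tail` functions are hypothetical, and it assumes column names are time-ordered integers read newest-first (roughly what a reversed time-based comparator would give), with last-write-wins merging on timestamp. It shows that each tail DELETE has to read every fragment, including tombstones from earlier deletes that have not yet passed GC grace.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Column:
    name: int                # hypothetical time-ordered name: larger = newer
    timestamp: int           # write timestamp; the higher one wins on merge
    tombstone: bool = False  # True marks a deletion marker


def merge_fragments(fragments: List[List[Column]]) -> List[Column]:
    """Merge per-SSTable fragments of one row (last write wins per name) and
    re-sort them newest-first, mimicking a reversed comparator."""
    latest: Dict[int, Column] = {}
    for fragment in fragments:
        for col in fragment:
            cur = latest.get(col.name)
            if cur is None or col.timestamp > cur.timestamp:
                latest[col.name] = col
    return sorted(latest.values(), key=lambda c: c.name, reverse=True)


def delete_tail(fragments: List[List[Column]], keep: int,
                now: int) -> Tuple[List[Column], int]:
    """Return (tombstones for every live column after the first `keep`,
    number of column entries that had to be read)."""
    scanned = sum(len(f) for f in fragments)  # every fragment is read in full
    live = [c for c in merge_fragments(fragments) if not c.tombstone]
    tombstones = [Column(c.name, now, tombstone=True) for c in live[keep:]]
    return tombstones, scanned


# Round 1: one SSTable holding columns 0..7; keep the 5 newest.
sstable1 = [Column(n, 1) for n in range(8)]
tombs1, scanned1 = delete_tail([sstable1], keep=5, now=2)

# Round 2: three newer columns arrive. The round-1 tombstones are still
# present (not yet past GC grace), so they must be read and merged too.
sstable2 = [Column(n, 3) for n in range(8, 11)]
tombs2, scanned2 = delete_tail([sstable1, tombs1, sstable2], keep=5, now=4)

print([c.name for c in tombs1], scanned1)  # [2, 1, 0] 8
print([c.name for c in tombs2], scanned2)  # [5, 4, 3] 14
```

Both rounds write the same number of tombstones, but the second has to read 14 entries instead of 8, which is the growing cost described in the last bullet.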
> Support row size limits
> -----------------------
>
>                 Key: CASSANDRA-3929
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3929
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>              Labels: ponies
>             Fix For: 2.0
>
>         Attachments: 3929_b.txt, 3929_c.txt, 3929_d.txt, 3929_e.txt,
> 3929_f.txt, 3929_g_tests.txt, 3929_g.txt, 3929.txt
>
>
> We currently support expiring columns by time-to-live; we've also had
> requests for keeping the most recent N columns in a row.
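For contrast, a compaction-time version of the requested "keep the most recent N columns" limit needs no read-before-delete and writes no new tombstones. The following is a hypothetical Python sketch, not derived from the attached patches: `compact_with_limit` and the toy `Column` model (time-ordered integer names, last-write-wins on timestamp) are assumptions, and it assumes every fragment of the row takes part in the compaction, as in a major compaction. If only some SSTables were involved, the merged view could be incomplete, and a real implementation would have to account for that.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Column:
    name: int                # hypothetical time-ordered name: larger = newer
    timestamp: int           # write timestamp; the higher one wins on merge
    tombstone: bool = False  # True marks a deletion marker


def compact_with_limit(fragments: List[List[Column]],
                       max_live: int) -> List[Column]:
    """Merge all fragments of one row and keep at most `max_live` live
    columns, newest first. Assumes the merged view is complete (every
    fragment participates). Older live columns past the limit are simply
    not written out; no new tombstones are created."""
    latest: Dict[int, Column] = {}
    for fragment in fragments:
        for col in fragment:
            cur = latest.get(col.name)
            if cur is None or col.timestamp > cur.timestamp:
                latest[col.name] = col

    out: List[Column] = []
    live_kept = 0
    for col in sorted(latest.values(), key=lambda c: c.name, reverse=True):
        if col.tombstone:
            out.append(col)  # carried forward; GC-grace purging not modelled
        elif live_kept < max_live:
            out.append(col)
            live_kept += 1
        # otherwise: a live column older than the newest max_live is dropped
    return out


sstable1 = [Column(n, 1) for n in range(8)]      # columns 0..7
sstable2 = [Column(n, 3) for n in range(8, 11)]  # columns 8..10
sstable3 = [Column(9, 5, tombstone=True)]        # column 9 deleted later
row = compact_with_limit([sstable1, sstable2, sstable3], max_live=5)
print([(c.name, c.tombstone) for c in row])
# [(10, False), (9, True), (8, False), (7, False), (6, False), (5, False)]
```

Existing tombstones are still carried forward so the delete can reach replicas that missed it before GC grace; only live columns beyond the limit are discarded, which is what keeps the feature from adding tombstone cost.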