Tupshin Harper created CASSANDRA-5561:
-----------------------------------------

             Summary: Compaction strategy that minimizes re-compaction of 
old/frozen data
                 Key: CASSANDRA-5561
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5561
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 1.2.3
            Reporter: Tupshin Harper
             Fix For: 2.0


Neither LCS nor STCS is a good fit for data that becomes immutable over time. The 
most obvious case is time-series data where the application can guarantee 
that out-of-order delivery (to Cassandra) of events can't take place more than 
N minutes/seconds/hours/days after the real (wall-clock) event time.

There are various approaches that could involve paying attention to the row 
keys (if they include a time component) and/or the column name (if they are 
TimeUUID or Integer based and are inherently time-ordered), but it might be 
sufficient to just look at the timestamp of the columns themselves.
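Using only the per-SSTable maximum column timestamp, the "frozen" test could be as simple as the following sketch (Python for brevity; the setting name and one-hour window are hypothetical illustrations of the per-CF max-out-of-order value):

```python
import time

MAX_OUT_OF_ORDER_SECONDS = 3600  # hypothetical per-CF setting: 1-hour window


def is_frozen(max_column_timestamp, now=None):
    """True if every column in the SSTable is older than the out-of-order
    window, i.e. no late write can still arrive for this time range."""
    now = time.time() if now is None else now
    return max_column_timestamp < now - MAX_OUT_OF_ORDER_SECONDS


# An SSTable whose newest column is 2 hours old is safe to hand to TWCS:
print(is_frozen(max_column_timestamp=1_000_000, now=1_000_000 + 7200))  # True
```

The point is that no row-key or column-name inspection is needed; the newest column timestamp alone decides which strategy owns the SSTable.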

A possible approach:
1) Define an optional max-out-of-order window on a per-CF basis.
2) Use the normal (LCS or STCS) compaction strategy for any SSTables that 
include any columns younger than max-out-of-order-delivery.
3) Use an alternate compaction strategy (call it TWCS, time window compaction 
strategy, for now) for any SSTables that only contain columns older than 
max-out-of-order-delivery.
4) TWCS will only compact sstables containing data older than 
max-out-of-order-delivery.
5) TWCS will only perform compaction to reduce row fragmentation (if any 
remains by the time data reaches TWCS) or to reduce the number of small sstables.
6) To minimize re-compaction in TWCS, it should aggressively try to compact as 
many small sstables as possible into one large sstable that would never have to 
be recompacted. 
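Steps 2 through 6 could be sketched as follows (Python for brevity; the thresholds and the SSTable fields are hypothetical, and a real implementation would be a Java compaction strategy inside Cassandra):

```python
MAX_OUT_OF_ORDER_SECONDS = 3600      # hypothetical per-CF window (step 1)
SMALL_SSTABLE_BYTES = 50 * 1024**2   # hypothetical "small sstable" threshold


def split_by_strategy(sstables, now):
    """Steps 2/3: route each SSTable to the normal strategy if it still
    holds columns inside the out-of-order window, else to TWCS."""
    frozen_cutoff = now - MAX_OUT_OF_ORDER_SECONDS
    normal = [s for s in sstables if s["max_ts"] >= frozen_cutoff]
    frozen = [s for s in sstables if s["max_ts"] < frozen_cutoff]
    return normal, frozen


def twcs_candidates(frozen):
    """Steps 5/6: greedily pick all small frozen SSTables so a single
    compaction merges them into one large SSTable that never needs to
    be recompacted."""
    small = [s for s in frozen if s["size"] < SMALL_SSTABLE_BYTES]
    return small if len(small) >= 2 else []
```

For example, with `now = 100_000`, an SSTable whose `max_ts` is 99_000 stays with LCS/STCS, while ones with `max_ts` of 1_000 and 2_000 fall past the cutoff and become a TWCS merge candidate pair.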

In the case of large datasets (e.g. 5TB per node) with LCS, there would be on 
the order of seven levels, and hence seven separate writes of the same data 
over time. With this approach, it should be possible to get down to roughly 3 
writes per column (2 during the initial LCS/STCS compactions and one more once 
the data hits TWCS) in most cases, cutting the write workload by a factor of 
two or more for high-volume time-series applications.

Note that the only workaround I can currently suggest to minimize compaction 
for these workloads is to programmatically shard your data across time-window 
ranges (e.g. a new CF per week), but that pushes unnecessary writing and querying 
logic out to the user and is neither as convenient nor as flexible.
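The manual workaround amounts to deriving a table (CF) name from the event time on every write and read, something like this sketch (names hypothetical; this is client-side logic, not a server feature):

```python
from datetime import datetime


def weekly_table(base_name, event_time):
    """Derive a per-week CF name, e.g. 'events_2013_w20', so each week's
    data lives in its own table and old tables are never compacted
    against new writes. Queries must compute the same name(s)."""
    iso_year, iso_week, _ = event_time.isocalendar()
    return f"{base_name}_{iso_year}_w{iso_week:02d}"


print(weekly_table("events", datetime(2013, 5, 13)))  # events_2013_w20
```

Every reader and writer has to agree on this naming scheme, and range queries spanning a window boundary must fan out across tables, which is exactly the inconvenience the proposed strategy would remove.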

Also note that I am not convinced that the approach I've suggested above is the 
best/most general way to solve the problem, but it does appear to be a 
relatively easy one to implement.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
