[ https://issues.apache.org/jira/browse/CASSANDRA-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Jirsa updated CASSANDRA-11407:
-----------------------------------
    Labels: dtcs  (was: )

> Proposal for simplified DTCS
> ----------------------------
>
>                 Key: CASSANDRA-11407
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11407
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Anubhav Kale
>              Labels: dtcs
>         Attachments: 0001-Simple-DTCS.patch
>
>
> Today's DTCS implementation has been discussed and debated in a few JIRAs
> already (most notably https://issues.apache.org/jira/browse/CASSANDRA-9666).
> One of the main challenges with the current approach is that it is very
> difficult to reason about how the "Target" class makes buckets, and therefore
> about the expected file layout on disk.
>
> I am proposing a simplification to the current approach that keeps intact
> most of the DTCS properties that make it a great fit for time-series data.
> The simplification is as follows.
>
> Given the min and max timestamps across all SSTables in question, start from
> min and make windows based on base and min_threshold. The logic in GetWindow
> simply tries to fit maximum-sized windows from min to max.
>
> This keeps the DTCS properties intact, except that we don't need to wait for
> min_threshold windows before making a bigger one. I would argue this
> simplifies the algorithm to a great extent, is easy to reason about, and the
> end result isn't drastically different from the original DTCS in most cases.
> We give up on the "alignment" logic that exists in the current implementation,
> but I honestly don't think it buys us much besides complexity.
>
> The implementation can obviously be optimized and cleaned up further if folks
> think this is a good idea.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
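The windowing rule in the proposal can be read in more than one way, so here is a minimal Python sketch of one reading. It is not the attached 0001-Simple-DTCS.patch. It assumes that window sizes are `base * min_threshold**k`, that windows are half-open and anchored at the min timestamp, and that "fit maximum-sized windows" means greedily taking the largest such size that still fits in the span left to cover. The function name `get_windows` is hypothetical.

```python
from typing import List, Tuple


def get_windows(min_ts: int, max_ts: int, base: int, min_threshold: int) -> List[Tuple[int, int]]:
    """Hypothetical sketch (not the patch's GetWindow): greedily cover
    [min_ts, max_ts] with half-open windows [start, end) whose sizes are
    base * min_threshold**k, always choosing the largest size that fits
    in the span still left to cover."""
    if base <= 0 or min_threshold < 2 or max_ts < min_ts:
        raise ValueError("need base > 0, min_threshold >= 2 and min_ts <= max_ts")
    windows = []
    cur = min_ts  # windows are anchored at the oldest timestamp (assumption)
    while cur <= max_ts:
        remaining = max_ts - cur + 1  # timestamps not yet covered
        size = base                   # smallest window allowed
        # Multiply by min_threshold while the larger window still fits.
        while size * min_threshold <= remaining:
            size *= min_threshold
        # If fewer than `base` timestamps remain, the last window overshoots max_ts.
        windows.append((cur, cur + size))
        cur += size
    return windows


# base = 10, min_threshold = 4, timestamps 0..99
print(get_windows(0, 99, 10, 4))
```

With these inputs the sketch yields `[(0, 40), (40, 80), (80, 90), (90, 100)]`: older data lands in larger windows and newer data in smaller ones, and a larger window is used as soon as it fits rather than after min_threshold smaller windows have accumulated. Because the windows are anchored at the min timestamp rather than at fixed time boundaries, their edges move whenever that minimum changes, which is the trade-off of dropping the alignment logic.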