[ https://issues.apache.org/jira/browse/HBASE-14477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Rodionov resolved HBASE-14477.
---------------------------------------
    Resolution: Duplicate

Duplicate of HBASE-15181

> Compaction improvements: Date tiered compaction policy
> ------------------------------------------------------
>
>                 Key: HBASE-14477
>                 URL: https://issues.apache.org/jira/browse/HBASE-14477
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>             Fix For: 2.0.0
>
>
> For immutable and mostly immutable data, the current SizeTiered-based
> compaction policy is not efficient.
> # There is no need to compact all files into one, because the data is
> (mostly) immutable and we do not need to collect garbage. (The performance
> reasons will be discussed later.)
> # Size-tiered compaction is not suitable for applications where the most
> recent data is the most important, and it prevents efficient caching of
> that data.
> The idea is pretty similar to DateTieredCompaction in Cassandra:
> http://www.datastax.com/dev/blog/datetieredcompactionstrategy
> http://www.datastax.com/dev/blog/dtcs-notes-from-the-field
> From Cassandra's own blog:
> {quote}
> Since DTCS can be used with any table, it is important to know when it is a
> good idea, and when it is not. I’ll try to explain the spectrum and
> trade-offs here:
> 1. Perfect Fit: Time Series Fact Data, Deletes by Default TTL: When you
> ingest fact data that is ordered in time, with no deletes or overwrites. This
> is the standard “time series” use case.
> 2. OK Fit: Time-Ordered, with limited updates across whole data set, or only
> updates to recent data: When you ingest data that is (mostly) ordered in
> time, but revise or delete a very small proportion of the overall data across
> the whole timeline.
> 3. Not a Good Fit: many partial row updates or deletions over time: When you
> need to partially revise or delete fields for rows that you read together.
> Also, when you revise or delete rows within clustered reads.
> {quote}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)