[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153385#comment-15153385 ]
Clara Xiong commented on HBASE-15181:
-------------------------------------

[~enis][~vrodionov] I appreciate all the callouts and suggestions greatly. I agree we have to guarantee correctness, and I think the first proposal carries the smaller trade-off. We want to construct the tiers to maximize performance for time-range scans on time-series data. The scan API is typically driven by a look-back window over data timestamps, so we want to use the timestamp instead of the creation time to align the tiers with the scans. Creation time may also not be as reliable a monotonic indicator as the sequence id.

[~vrodionov] I carefully thought about Enis' first proposal. It should work, since we know which tier and compaction window a store file belongs to as long as we know the current time and the file's maxTimestamp; we don't need the sequenceId to build the tiers.

But I want to add a tweak to this proposal on how to handle late-arriving data: I want to compact the out-of-order data with newer files rather than older ones. Since we don't write future data, the worst case is that a file on a lower tier has a long tail, instead of the data going to a higher tier. The additional cost of a long tail is the cost of scanning newer, smaller files. Given the tiered design, we only need to scan additional data of at most the tail size plus the current window size. This also reduces the chance of re-compacting small files into an out-of-proportion file.

For the bulk-load scenarios, a bulk-loaded file currently carries 0 as its sequenceId, which lands it at the highest tier. It is configurable to use the sequenceId at the time of creation instead, which lands it at a lower tier. We will need to call out that the user should decide based on the data timestamps relative to the tiers and the access pattern.

Please let me know what you think.
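To make the point concrete, here is a minimal sketch (not the actual patch) of how a store file's compaction window could be derived from the current time and the file's maxTimestamp alone, with no sequenceId involved. The class name, the base window size, and the windows-per-tier growth factor are illustrative assumptions, not values taken from HBASE-15181:

```java
// Sketch: derive a store file's compaction window from the current time and
// the file's maxTimestamp only. Windows grow by a factor of windowsPerTier
// as files age, so older data lands in progressively larger (higher-tier)
// windows. All names and defaults here are hypothetical.
public class TierSketch {

    /**
     * Returns the start of the compaction window the file falls into,
     * given the current time and the file's newest cell timestamp.
     */
    static long windowStart(long now, long maxTimestamp,
                            long baseWindowMillis, int windowsPerTier) {
        long windowSize = baseWindowMillis;
        long age = now - maxTimestamp;
        // Grow the window until the file's age fits within one tier.
        while (age >= windowSize * windowsPerTier) {
            windowSize *= windowsPerTier;
        }
        // Align the file's newest timestamp to its window boundary.
        return maxTimestamp - (maxTimestamp % windowSize);
    }

    public static void main(String[] args) {
        long hour = 3600_000L;
        // A file half an hour old falls in the current 1-hour base window.
        System.out.println(windowStart(10 * hour, 10 * hour - hour / 2, hour, 4));
        // A file five hours old falls in an older, larger 4-hour window.
        System.out.println(windowStart(10 * hour, 5 * hour, hour, 4));
    }
}
```

A late-arriving cell only lowers a file's minTimestamp, not its maxTimestamp, so under this scheme the file stays in its newer window and grows a tail, which matches the trade-off described above.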
[~enis] We do have performance results from production on a very large cluster, replicated across multiple DCs and serving many concurrent time-range scans with different look-back windows. We are collecting more, and I will share them externally once they are ready, most likely next week. We have observed a drastic IO reduction for the scans.

> A simple implementation of date based tiered compaction
> -------------------------------------------------------
>
>                 Key: HBASE-15181
>                 URL: https://issues.apache.org/jira/browse/HBASE-15181
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>             Fix For: 2.0.0, 1.3.0, 0.98.19
>
>         Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction, similar to Cassandra's, for the following benefits:
> 1. Improve date-range-based scans by structuring store files in a date-based tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> It is a perfect fit for use cases that:
> 1. have mostly date-based data writes and scans, with a focus on the most recent data.
> 2. never or rarely delete data.
> Out-of-order writes are handled gracefully, so the data will still get to the right store file for time-range scans, and re-compaction with existing store files in the same time window is handled by ExploringCompactionPolicy.
> Time-range overlap among store files is tolerated and the performance impact is minimized.
> Configuration can be set in hbase-site or overridden at the per-table or per-column-family level via the hbase shell.
> The design spec is at https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)