[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153385#comment-15153385 ]

Clara Xiong commented on HBASE-15181:
-------------------------------------

[~enis][~vrodionov] I appreciate all the callouts and suggestions greatly. I 
agree we have to guarantee correctness, and I think the first proposal carries 
the smaller trade-off.

We want to construct the tiers to maximize performance for time-range scans on 
time-series data. The scan API is typically driven by a look-back window over 
data timestamps, so we want to use the timestamp instead of creation time to 
align the tiers with the scans. Creation time may also not be as reliable a 
monotonic indicator as sequence id.

[~vrodionov] I thought carefully about Enis' first proposal. It should work, 
since we know which tier and compaction window a store file belongs to as long 
as we know the current time and the file's maxTimestamp. We don't need the 
sequenceId to build the tiers.
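
To make that concrete, here is a rough Java sketch of deriving the tier from 
the current time and a file's maxTimestamp alone. The base window size and 
per-tier growth factor are illustrative assumptions, not the patch's actual 
classes or parameters:

{code:java}
/**
 * Illustrative sketch only: derives the tier of the compaction window
 * containing maxTimestamp, given only the current time. The base window
 * size and windows-per-tier growth factor are assumed values.
 */
public final class TierSketch {
  private static final long BASE_WINDOW_MS = 6L * 60 * 60 * 1000; // assumed: 6 hours
  private static final int WINDOWS_PER_TIER = 4;                  // assumed growth factor

  /** Returns the tier (0 = newest) whose windows cover the file's maxTimestamp. */
  public static int tierOf(long now, long maxTimestamp) {
    long age = Math.max(0L, now - maxTimestamp);
    long windowMs = BASE_WINDOW_MS;
    int tier = 0;
    // Each tier holds WINDOWS_PER_TIER windows, and the window size grows by
    // the same factor per tier, so coverage is exponential in the tier index.
    while (age >= windowMs * WINDOWS_PER_TIER) {
      age -= windowMs * WINDOWS_PER_TIER;
      windowMs *= WINDOWS_PER_TIER;
      tier++;
    }
    return tier;
  }
}
{code}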

But I want to add a tweak to this proposal for handling late-arriving data: I 
want to compact the out-of-order data with newer files rather than older ones. 
Since we don't write future data, the worst case is that a file on a lower tier 
has a long tail of old timestamps, instead of the late data going to a higher 
tier. The additional cost of the long tail is the cost of scanning newer and 
smaller files: given the tiered design, we only need to scan additional data of 
at most the tail size plus the current window size. This will also reduce the 
chance of recompacting small files into an out-of-proportion file.
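
As a sketch of why the long tail stays cheap: a time-range scan only needs 
files whose [minTimestamp, maxTimestamp] range overlaps the scan's range, so a 
late-arriving cell only widens one newer file's range. Illustrative Java, not 
the patch's code:

{code:java}
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: time-range pruning of store files for a scan. */
final class ScanPruneSketch {
  static final class FileRange {
    final String name;
    final long minTs;
    final long maxTs;
    FileRange(String name, long minTs, long maxTs) {
      this.name = name;
      this.minTs = minTs;
      this.maxTs = maxTs;
    }
  }

  /** Keeps only files whose [minTs, maxTs] overlaps [scanMin, scanMax]. */
  static List<FileRange> filesToScan(List<FileRange> files, long scanMin, long scanMax) {
    List<FileRange> selected = new ArrayList<FileRange>();
    for (FileRange f : files) {
      // A long-tailed newer file (small minTs) gets selected for scans over
      // old ranges, but it is small; older tiers are still skipped entirely.
      if (f.maxTs >= scanMin && f.minTs <= scanMax) {
        selected.add(f);
      }
    }
    return selected;
  }
}
{code}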

For the bulk load scenarios: currently a bulk-loaded file carries 0 as its 
sequenceId, which will land it in the highest tier. It is configurable to use 
the sequenceId at the time of creation instead, which will land it in the lower 
tiers. We will need to call out that the user should decide based on the data 
timestamps relative to the tiers and the access pattern.
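
For reference, the toggle looks roughly like this. The property name is the 
one I believe LoadIncrementalHFiles reads; please treat it as an assumption 
and verify against your version:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class BulkLoadSeqIdSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Assumed property name: when true, bulk-loaded HFiles are assigned a
    // sequenceId at load time and land in the lower (newer) tiers; when
    // false they keep sequenceId 0 and land in the highest (oldest) tier.
    conf.setBoolean("hbase.mapreduce.bulkload.assign.sequenceNumbers", true);
  }
}
{code}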

Please let me know what you think.
 
[~enis] We do have performance results from production on a very large cluster, 
replicated across multiple DCs, serving many concurrent time-range scans with 
different look-back windows. We are collecting more, and I will share them 
externally once they are ready, most likely next week. We have observed a 
drastic IO reduction for the scans.





> A simple implementation of date based tiered compaction
> -------------------------------------------------------
>
>                 Key: HBASE-15181
>                 URL: https://issues.apache.org/jira/browse/HBASE-15181
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>             Fix For: 2.0.0, 1.3.0, 0.98.19
>
>         Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction, similar to 
> Cassandra's, for the following benefits:
> 1. Improve date-range-based scans by structuring store files in a date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> It is a perfect fit for use cases that:
> 1. have mostly date-based data writes and scans, with a focus on the most 
> recent data;
> 2. never or rarely delete data.
> Out-of-order writes are handled gracefully so the data will still get to 
> the right store file for time-range scans, and re-compaction with existing 
> store files in the same time window is handled by ExploringCompactionPolicy.
> Time-range overlap among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set in hbase-site or overridden at the per-table or 
> per-column-family level via the hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing


