[ https://issues.apache.org/jira/browse/HBASE-15400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192261#comment-15192261 ]
Hadoop QA commented on HBASE-15400:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 2s {color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.2.0/precommit-patchnames for instructions. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 14s {color} | {color:red} Docker failed to build yetus/hbase:date2016-03-13. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12793187/HBASE-15400-v1.pa |
| JIRA Issue | HBASE-15400 |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/945/console |
| Powered by | Apache Yetus 0.2.0 http://yetus.apache.org |

This message was automatically generated.

> Use DateTieredCompactor for Date Tiered Compaction
> --------------------------------------------------
>
>                 Key: HBASE-15400
>                 URL: https://issues.apache.org/jira/browse/HBASE-15400
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Compaction
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15400-v1.pa, HBASE-15400.patch
>
>
> When we compact, we can output multiple files along the current window
> boundaries. There are two use cases:
> 1. Major compaction: We want to output date tiered store files, with data
> older than max age archived in chunks of the window size on the higher tier.
> 2. Bulk load files and the old file generated by major compaction before
> upgrading to DTCP.
> Pros:
> 1. Restore locality, process versioning, updates and deletes while
> maintaining the tiered layout.
> 2. The best way to fix a skewed layout.
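To make "output multiple files along the current window boundaries" concrete, here is a minimal Java sketch. It is not the HBASE-15400 patch and does not use any HBase API: the class name `WindowSplitSketch`, the method `splitByBoundaries`, and the choice to represent cells as bare timestamps are all assumptions made for illustration. Each window is identified by its lower boundary, and the first boundary is assumed to be `Long.MIN_VALUE` so that data older than every real window still has a place to go.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of HBase.
class WindowSplitSketch {

    // Groups timestamps by window, where a window is keyed by its lower
    // boundary and holds every timestamp >= that boundary and < the next one.
    // Each resulting group stands in for one output file of the compaction.
    static TreeMap<Long, List<Long>> splitByBoundaries(List<Long> boundaries,
                                                       List<Long> timestamps) {
        TreeMap<Long, List<Long>> outputs = new TreeMap<>();
        for (long boundary : boundaries) {
            outputs.put(boundary, new ArrayList<>());
        }
        for (long ts : timestamps) {
            // floorEntry finds the window with the greatest lower boundary <= ts.
            Map.Entry<Long, List<Long>> window = outputs.floorEntry(ts);
            if (window == null) {
                throw new IllegalArgumentException(
                    "No window covers timestamp " + ts
                    + "; include Long.MIN_VALUE as the first boundary");
            }
            window.getValue().add(ts);
        }
        return outputs;
    }
}
```

With boundaries `Long.MIN_VALUE, 100, 200`, a timestamp of 150 lands in the window keyed by 100, and anything below 100 lands in the catch-all window keyed by `Long.MIN_VALUE`.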
>
> This work is based on a prototype of DateTieredCompactor from HBASE-15389 and
> focuses on meeting the needs of these two use cases while supporting
> others. I have to call out a few design decisions:
> 1. We only want to output the files along all windows for major compaction.
> And we want to output multiple files older than max age in chunks of the
> maximum tier window size determined by base window size, windows per tier and
> max age.
> 2. For minor compaction, we don't want to output too many files, which will
> remain around because of the current restriction of contiguous compaction by
> seq id. I will only output two files if all the files in the windows are being
> combined, one for the data within the window and the other for the
> out-of-window tail. If any file in the window is excluded from compaction,
> only one file will be output from compaction. When the windows are promoted,
> the situation of out-of-order data will gradually improve. For the incoming
> window, we need to accommodate the case with user-specified future data.
> 3. We have to pass the boundaries with the list of store files as a complete
> time snapshot instead of two separate calls because window layout is
> determined by the time the computation is called. So we will need a new type
> of compaction request.
> 4. Since we will assign the same seq id for all output files, we need to sort
> by maxTimestamp subsequently. Right now all compaction policies get the files
> sorted from StoreFileManager, which sorts by seq id and other criteria. I will
> use this order for DTCP only, to avoid impacting other compaction policies.
> 5. We need some cleanup of the current design of StoreEngine and
> CompactionPolicy.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
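The ordering described in design decision 4 of the issue (seq id first, then maxTimestamp to break ties among output files that share a seq id) can be sketched as a plain Java comparator. This is an illustration under assumptions, not the patch's code: `FileInfo` is a hypothetical stand-in rather than the HBase `StoreFile` class, and the "other criteria" mentioned in the issue are left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of HBase.
class SeqIdThenMaxTimestampSketch {

    // Minimal stand-in for a store file: a name, a seq id, and the largest
    // cell timestamp the file contains.
    static final class FileInfo {
        final String name;
        final long seqId;
        final long maxTimestamp;

        FileInfo(String name, long seqId, long maxTimestamp) {
            this.name = name;
            this.seqId = seqId;
            this.maxTimestamp = maxTimestamp;
        }
    }

    // Orders by seq id; files with equal seq ids are ordered by maxTimestamp.
    static final Comparator<FileInfo> ORDER =
        Comparator.comparingLong((FileInfo f) -> f.seqId)
                  .thenComparingLong(f -> f.maxTimestamp);

    // Returns the file names in sorted order without modifying the input list.
    static List<String> sortedNames(List<FileInfo> files) {
        List<FileInfo> copy = new ArrayList<>(files);
        copy.sort(ORDER);
        List<String> names = new ArrayList<>();
        for (FileInfo f : copy) {
            names.add(f.name);
        }
        return names;
    }
}
```

Without the maxTimestamp tiebreak, files written by the same compaction would compare as equal and their relative order would depend only on input order.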