[ https://issues.apache.org/jira/browse/HBASE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chunhui shen updated HBASE-7403:
--------------------------------
    Description:
Features of this online merge:
1. Online: there is no need to disable the table.
2. Few changes to the current code; it could be applied to trunk, 0.94, 0.92, or 0.90.
3. Merge requests are easy to issue: there is no need to enter a long region name; the encoded name is enough.
4. No restrictions during the operation: you do not need to take care of events such as Server Dead, Balance, Split, or Disabling/Enabling the table, nor worry about sending a wrong merge request; these cases are already handled for you.
5. The two merging regions are offline only briefly.

Usage:
1. Tool: bin/hbase org.apache.hadoop.hbase.util.OnlineMerge [-force] [-async] [-show] <table-name> <region-encodedname-1> <region-encodedname-2>
2. API: static void MergeManager#createMergeRequest

We need merge in the following cases:
1. A region hole or region overlap that can't be fixed by hbck.
2. A region becomes empty because of TTL and an unreasonable rowkey design.
3. A region is always empty or very small because of pre-splitting when the table was created.
4. Too many empty or small regions reduce system performance (e.g. mslab).

The current merge tools only work offline and cannot redo the merge if an exception is thrown during merging, which leaves dirty data behind.

An online system needs an online merge.

The implementation logic of this patch for Online Merge is as follows. For example, to merge regionA and regionB into regionC:
1. Offline the two regions A and B.
2. Merge the two regions in HDFS (create regionC's directory, move regionA's and regionB's files to regionC's directory, delete regionA's and regionB's directories).
3. Add the merged regionC to .META.
4. Assign the merged regionC.

By design of this patch, once we do the merge work in HDFS, we can redo it until it succeeds if it throws an exception, aborts, or the server restarts, but it cannot be rolled back.
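The redo-only behaviour described above can be illustrated with a minimal Python sketch. It is not the patch's code: the step names, the `journal` dict (standing in for a durable journal such as the ZooKeeper one the patch uses), the `cluster` dict, and the local directories standing in for HDFS are all hypothetical. The idea shown is that progress is recorded only after a step succeeds, and that each step is safe to re-run, so a restart resumes forward instead of rolling back.

```python
import tempfile
from pathlib import Path

# Hypothetical journal states, one per step of the description above.
STEPS = ("OFFLINE_REGIONS", "MERGE_IN_HDFS", "ADD_TO_META", "ASSIGN_MERGED")


def merge_files(root: Path, sources, target: str) -> None:
    """Idempotent stand-in for step 2: safe to re-run after a partial attempt."""
    target_dir = root / target
    target_dir.mkdir(exist_ok=True)
    for name in sources:
        source_dir = root / name
        if not source_dir.exists():  # already moved and deleted by an earlier attempt
            continue
        for store_file in sorted(source_dir.iterdir()):
            store_file.rename(target_dir / store_file.name)
        source_dir.rmdir()


def run_merge(journal: dict, root: Path, cluster: dict, fail_at=None) -> None:
    """Run the steps in order, skipping those the journal marks as done."""
    a, b, c = "regionA", "regionB", "regionC"
    actions = {
        "OFFLINE_REGIONS": lambda: cluster["online"].difference_update({a, b}),
        "MERGE_IN_HDFS": lambda: merge_files(root, (a, b), c),
        "ADD_TO_META": lambda: cluster["meta"].add(c),
        "ASSIGN_MERGED": lambda: cluster["online"].add(c),
    }
    start = STEPS.index(journal["last_done"]) + 1 if journal.get("last_done") else 0
    for step in STEPS[start:]:
        if step == fail_at:
            raise RuntimeError(f"simulated failure during {step}")
        actions[step]()
        journal["last_done"] = step  # record progress only after the step succeeds


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for region, file_name in (("regionA", "a1.hfile"), ("regionB", "b1.hfile")):
            (root / region).mkdir()
            (root / region / file_name).write_text("data")
        cluster = {"online": {"regionA", "regionB"}, "meta": set()}
        journal = {}
        try:
            run_merge(journal, root, cluster, fail_at="ADD_TO_META")
        except RuntimeError:
            pass  # simulated abort; the journal survives it
        after_crash = journal["last_done"]
        run_merge(journal, root, cluster)  # redo: resumes with ADD_TO_META
        files = sorted(p.name for p in (root / "regionC").iterdir())
        return after_crash, journal["last_done"], files, cluster
```

Calling `demo()` simulates an abort before the .META. update and then a redo. Because the journal entry is written after each step rather than before it, a crash in the middle of a step causes that step to run again, which is why the file-moving step in this sketch tolerates directories that are already gone.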
It depends on:
Use ZooKeeper to record the transaction journal state, which makes redo easier
Use ZooKeeper to send/receive merge requests
The merge transaction is executed on the master
Support issuing merge requests through the API or the shell tool

For details of the merge process, please see the attachment and the patch.

> Online Merge
> ------------
>
>                 Key: HBASE-7403
>                 URL: https://issues.apache.org/jira/browse/HBASE-7403
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 0.94.3
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.96.0, 0.94.5
>
>         Attachments: 7403-trunkv5.patch, 7403-trunkv6.patch, 7403v5.diff,
> 7403-v5.txt, 7403v5.txt, hbase-7403-94v1.patch, hbase-7403-trunkv10.patch,
> hbase-7403-trunkv1.patch, hbase-7403-trunkv5.patch, hbase-7403-trunkv6.patch,
> hbase-7403-trunkv7.patch, hbase-7403-trunkv8.patch, hbase-7403-trunkv9.patch,
> merge region.pdf