[ https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889062#comment-16889062 ]
stack commented on HBASE-21745:
-------------------------------

A few thoughts on remaining items:

* Fix region holes, overlaps, and other errors in the region chain
* Fix failed split and merge transactions that have failed to roll back due to some bug (related to previous)

There are holes and overlaps in hbase:meta and then there are holes and overlaps in the filesystem (hdfs). In the past, hbck1 would fix 'holes and overlaps' in hdfs.... then hbase:meta would be consulted and adjusted to pick up the hdfs changes. Let's not do it this way for hbck2 (caveat: HBASE-22567, which finds hbase:meta holes and, if there is a corresponding hdfs region, hoists it up into hbase:meta).

In hbck2, perhaps the Master itself can see 'holes' and 'overlaps' in hbase:meta. The Master already runs a periodic process called CatalogJanitor (CJ) to ‘check’ hbase:meta. It could minimally report holes and overlaps (as well as unknown servers, etc.). I was going to have a look at doing this. CJ could report its findings to the UI (after the [~zghaobac] new tendency).

What about leftover directories in hdfs? Orphans and broken regions or broken tables? In hdfs, hbck1 used to have the notion of 'adoption', where a new region was created in a target table and the 'orphan' region's content was copied into the new location. Thereafter, there'd be machinations to get the new region up into hbase:meta. What if we ran an 'adoption service' in the Master, where hbck2 would pass the Master a list of directories and tell the Master to 'adopt' the content, whether files or dropped regions, overlapping dirs, or even tables? The Master's hbase:meta would have to be healthy first so new data had a home to go to.

On fixing split and merge transactions: we should roll this category of issues up into the general Master fix described above, where something like CJ recognizes any problem (it already does a bunch of the heavy lifting for splits/merges).
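To make the 'holes and overlaps in hbase:meta' idea concrete, here is a minimal sketch of the kind of region-chain scan a CatalogJanitor-style report might run. It is not CatalogJanitor's actual code: the `RegionChainCheck` and `Region` names are hypothetical, keys are plain strings rather than HBase's byte arrays, and an empty string is assumed to stand for an open-ended start or end key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RegionChainCheck {
    /** Hypothetical stand-in for one region row; "" is assumed to mean an open-ended key. */
    static final class Region {
        final String name;
        final String startKey;
        final String endKey;

        Region(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /** Reports holes and overlaps found while walking the regions in start-key order. */
    static List<String> findProblems(List<Region> regions) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparing((Region r) -> r.startKey));

        List<String> problems = new ArrayList<>();
        if (sorted.isEmpty()) {
            return problems;
        }

        Region first = sorted.get(0);
        if (!first.startKey.isEmpty()) {
            problems.add("HOLE before " + first.name);
        }

        // Region whose end key reaches furthest among those seen so far.
        Region furthest = first;
        for (int i = 1; i < sorted.size(); i++) {
            Region r = sorted.get(i);
            boolean coveredToEnd = furthest.endKey.isEmpty();
            if (coveredToEnd || r.startKey.compareTo(furthest.endKey) < 0) {
                // r starts before the covered range ends.
                problems.add("OVERLAP " + furthest.name + " / " + r.name);
            } else if (r.startKey.compareTo(furthest.endKey) > 0) {
                // Nothing covers the keys between furthest.endKey and r.startKey.
                problems.add("HOLE " + furthest.name + " -> " + r.name);
            }
            if (!coveredToEnd
                    && (r.endKey.isEmpty() || r.endKey.compareTo(furthest.endKey) > 0)) {
                furthest = r;
            }
        }

        if (!furthest.endKey.isEmpty()) {
            problems.add("HOLE after " + furthest.name);
        }
        return problems;
    }
}
```

A scan like this only reports; deciding how to repair a hole or overlap (for example via the 'adoption' idea above) would be a separate step.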
The 'HBASE-21965 Fix failed split and merge transactions that have failed to roll back' "fix" above has actually been undone for now in favor of "HBASE-22709 Add a web ui to show the failed splited/merged regions", whose intent is to list failed splits/merges in the UI along with recipes for fixing them.

And then perhaps a release of hbase-operator-tools?

> Make HBCK2 be able to fix issues other than region assignment
> -------------------------------------------------------------
>
>                 Key: HBASE-21745
>                 URL: https://issues.apache.org/jira/browse/HBASE-21745
>             Project: HBase
>          Issue Type: Umbrella
>          Components: hbase-operator-tools, hbck2
>            Reporter: Duo Zhang
>            Assignee: stack
>            Priority: Critical
>
> This is what [~apurtell] posted on mailing-list, HBCK2 should support
> * -Rebuild meta from region metadata in the filesystem, aka offline meta
> rebuild.-
> * -Fix assignment errors (undeployed regions, double assignments (yes,
> should not be possible), etc)- (See
> https://issues.apache.org/jira/browse/HBASE-21745?focusedCommentId=16888302&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16888302)
> * Fix region holes, overlaps, and other errors in the region chain
> * Fix failed split and merge transactions that have failed to roll back due
> to some bug (related to previous)
> * -Enumerate store files to determine file level corruption and sideline
> corrupt files-
> * -Fix hfile link problems (dangling / broken)-

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)