[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889062#comment-16889062
 ] 

stack commented on HBASE-21745:
-------------------------------

A few thoughts on remaining items:

 * Fix region holes, overlaps, and other errors in the region chain
 * Fix failed split and merge transactions that have failed to roll back due to 
some bug (related to previous)

There are holes and overlaps in hbase:meta and then there are holes and 
overlaps in the filesystem (hdfs). In the past, hbck1 would fix 'holes and 
overlaps' in hdfs first; hbase:meta would then be consulted and adjusted to 
pick up the hdfs changes. Let's not do it this way for hbck2 (caveat: 
HBASE-22567, which finds hbase:meta holes and, if a matching region exists in 
hdfs, hoists it up into hbase:meta). In hbck2, perhaps the Master itself can 
see 'holes' and 'overlaps' in hbase:meta. The Master already runs a periodic 
process, CatalogJanitor (CJ), to 'check' hbase:meta. It could minimally report 
holes and overlaps (as well as unknown servers, etc.). I was going to have a 
look at doing this. CJ could report its findings to the UI (following 
[~zghaobac]'s recent approach).
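
To make 'holes' and 'overlaps' concrete, here is a minimal sketch of a 
region-chain check. It is not CatalogJanitor's code and reads nothing from 
hbase:meta; the Region tuple and find_holes_and_overlaps are hypothetical names 
made up for illustration. It assumes each region is a [start, end) key range, 
that an empty start key means start of table, and that an empty end key means 
end of table.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    """Hypothetical stand-in for one hbase:meta row (illustration only)."""
    name: str
    start: bytes  # inclusive start key; b"" means start of table
    end: bytes    # exclusive end key; b"" means end of table


def find_holes_and_overlaps(
    regions: List[Region],
) -> Tuple[List[Tuple[bytes, bytes]], List[Tuple[str, str]]]:
    """Walk one table's regions in start-key order and report chain errors.

    Returns (holes, overlaps). A hole is an uncovered (start, end) key range;
    an overlap is a pair of region names whose key ranges intersect.
    """
    holes: List[Tuple[bytes, bytes]] = []
    overlaps: List[Tuple[str, str]] = []

    # Sort by start key; for equal starts, put the end-of-table region last.
    ordered = sorted(regions, key=lambda r: (r.start, r.end == b"", r.end))

    covered_to = b""      # chain is contiguous from table start up to this key
    reached_end = False   # True once some region claims the end of the table
    furthest = None       # region whose end key currently reaches furthest

    for r in ordered:
        if reached_end or r.start < covered_to:
            # r starts inside a range that an earlier region already covers.
            overlaps.append((furthest.name, r.name))
        elif r.start > covered_to:
            # Nothing covers the keys between covered_to and r.start.
            holes.append((covered_to, r.start))

        if not reached_end:
            if r.end == b"":
                reached_end = True
                furthest = r
            elif r.end > covered_to:
                covered_to = r.end
                furthest = r

    if not reached_end:
        # The last region stops short of the end of the table.
        holes.append((covered_to, b""))

    return holes, overlaps
```

A report-only pass like this changes nothing; it only produces findings that a 
UI page or an operator could then act on.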

What about leftover directories in hdfs: orphans, broken regions, or broken 
tables? hbck1 used to have the notion of 'adoption' in hdfs, where a new region 
was created in a target table and the 'orphan' region's content was copied into 
the new location. Thereafter, there'd be machinations to get the new region up 
into hbase:meta. What if we ran an 'adoption service' in the Master, where 
hbck2 would pass the Master a list of directories and tell the Master to 
'adopt' the content, whether files, dropped regions, overlapping dirs, or even 
tables? The Master's hbase:meta would have to be healthy first so the new data 
has a home to go to.

On fixing split and merge transactions: we should roll this category of issues 
up into the general Master fix described above, where something like CJ 
recognizes any problem (it already does a bunch of the heavy lifting for 
splits/merges). The "fix" from 'HBASE-21965 Fix failed split and merge 
transactions that have failed to roll back' has actually been undone for now in 
favor of "HBASE-22709 Add a web ui to show the failed splited/merged regions", 
whose intent is to list failed splits/merges in the UI along with recipes for 
fixing them.

And then perhaps a release of hbase-operator-tools?



> Make HBCK2 be able to fix issues other than region assignment
> -------------------------------------------------------------
>
>                 Key: HBASE-21745
>                 URL: https://issues.apache.org/jira/browse/HBASE-21745
>             Project: HBase
>          Issue Type: Umbrella
>          Components: hbase-operator-tools, hbck2
>            Reporter: Duo Zhang
>            Assignee: stack
>            Priority: Critical
>
> This is what [~apurtell] posted on mailing-list, HBCK2 should support
>  * -Rebuild meta from region metadata in the filesystem, aka offline meta 
> rebuild.-
>  * -Fix assignment errors (undeployed regions, double assignments (yes, 
> should not be possible), etc)- (See 
> https://issues.apache.org/jira/browse/HBASE-21745?focusedCommentId=16888302&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16888302)
>  * Fix region holes, overlaps, and other errors in the region chain
>  * Fix failed split and merge transactions that have failed to roll back due 
> to some bug (related to previous)
>  *  -Enumerate store files to determine file level corruption and sideline 
> corrupt files-
>  * -Fix hfile link problems (dangling / broken)-



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
