[ https://issues.apache.org/jira/browse/HBASE-13576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527635#comment-14527635 ]
Enis Soztutar commented on HBASE-13576:
---------------------------------------

LGTM, except that we should re-throw the exception from the first region (if possible) so that we do not lose the exception context if the limit is configured as 0.

> HBCK enhancement: Failure in checking one region should not fail the entire HBCK operation.
> -------------------------------------------------------------------------------------------
>
>                 Key: HBASE-13576
>                 URL: https://issues.apache.org/jira/browse/HBASE-13576
>             Project: HBase
>          Issue Type: Bug
>          Components: hbck
>    Affects Versions: 2.0.0, 1.1.0, 1.2.0
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>         Attachments: HBASE-13576.v1-master.patch, HBASE-13576.v2-master.patch, HBASE-13576.v3-master.patch
>
>
> HBaseFsck#checkRegionConsistency() checks region consistency and repairs the
> corruption if requested. However, some exceptions are expected from this function.
> For example, in one aspect of region repair, it calls
> HBaseFsckRepair#waitUntilAssigned(); if a region is in transition for over
> 120 seconds (the default value of the "hbase.hbck.assign.timeout" configuration),
> an IOException is thrown.
> The problem is that one exception in checkRegionConsistency() kills the
> entire hbck operation, because the exception propagates up.
> The proposal is that if the region is not the META region (or a system table
> region, if we prefer), we can skip the region when
> HBaseFsck#checkRegionConsistency() fails. We could print the skipped regions in
> the summary section so that users know to either re-run hbck or investigate a
> potential issue with that region.
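
For readers following the thread, here is a minimal sketch of the behavior discussed above: skip a failing non-META region, remember it for the summary, and re-throw the original exception unchanged once the skip limit is reached (so a limit of 0 surfaces the first failure with its full context). This is not the code from the attached patches. The class RegionCheckSketch, the RegionChecker interface, the maxSkipped parameter, and the use of the string "hbase:meta" to identify the META region are all assumptions made for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the HBASE-13576 patches.
final class RegionCheckSketch {

    /** Stand-in for the per-region work done by HBaseFsck#checkRegionConsistency(). */
    interface RegionChecker {
        void check(String regionName) throws IOException;
    }

    /** Assumed identifier for the META region, which is never skipped. */
    static final String META_REGION = "hbase:meta";

    /**
     * Checks every region and returns the names of the regions that were skipped.
     * A failure is re-thrown as-is when it hits the META region or when
     * maxSkipped regions have already been skipped.
     */
    static List<String> checkAll(List<String> regions, RegionChecker checker, int maxSkipped)
            throws IOException {
        List<String> skipped = new ArrayList<>();
        for (String region : regions) {
            try {
                checker.check(region);
            } catch (IOException e) {
                if (META_REGION.equals(region) || skipped.size() >= maxSkipped) {
                    // Re-throw the original exception so its message and stack trace survive.
                    throw e;
                }
                // Record the region so it can be listed in the summary section.
                skipped.add(region);
            }
        }
        return skipped;
    }
}
```

With maxSkipped set to 0, the first failing region takes the re-throw branch immediately, which is the case raised in the comment. A caller would print the returned list in the summary so users know which regions to re-run or investigate.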