[ https://issues.apache.org/jira/browse/HBASE-13576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525467#comment-14525467 ]
Ted Yu commented on HBASE-13576:
--------------------------------

Looks good overall.

{code}
int terminateThreshold = getConf().getInt("hbase.hbck.skippedregionslimit", 0);
{code}
To be consistent with other hbck parameters, please use "hbase.hbck.skipped.regions.limit".

{code}
+ (skippedRegions.containsKey(tInfo.getName()) ? " (with some skipped regions)." : "."));
{code}
Please add the number of skipped regions to the above message.

> HBCK enhancement: Failure in checking one region should not fail the entire HBCK operation.
> -------------------------------------------------------------------------------------------
>
>                 Key: HBASE-13576
>                 URL: https://issues.apache.org/jira/browse/HBASE-13576
>             Project: HBase
>          Issue Type: Bug
>          Components: hbck
>    Affects Versions: 2.0.0, 1.1.0, 1.2.0
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>         Attachments: HBASE-13576.v1-master.patch, HBASE-13576.v2-master.patch
>
>
> HBaseFsck#checkRegionConsistency() checks region consistency and repairs the
> corruption if requested. However, this function is expected to throw
> exceptions in some cases. For example, as part of region repair it calls
> HBaseFsckRepair#waitUntilAssigned(); if a region stays in transition for over
> 120 seconds (the default value of the "hbase.hbck.assign.timeout"
> configuration), an IOException is thrown.
> The problem is that a single exception in checkRegionConsistency() kills the
> entire hbck operation, because the exception propagates up.
> The proposal is that if the region is not a META region (or a system table
> region, if we prefer), we skip the region when
> HBaseFsck#checkRegionConsistency() fails. We could print the skipped regions
> in the summary section so that users know to either re-run hbck or
> investigate a potential issue with that region.
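
For illustration, the skip-and-report behaviour discussed in this thread could be sketched roughly as below. This is not the code from the attached patches: the class SkippedRegionsSketch, the RegionCheck callback, the summary wording and the exact meaning of the limit (abort once that many regions have already been skipped) are all assumptions made for the example. Only the idea comes from the issue: skip a failing non-META region, abort on a META failure, and report the number of skipped regions per table.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the behaviour proposed in HBASE-13576; not HBaseFsck itself. */
final class SkippedRegionsSketch {

  /** Hypothetical callback standing in for HBaseFsck#checkRegionConsistency(). */
  interface RegionCheck {
    void check(String table, String region) throws IOException;
  }

  // Table name -> regions whose consistency check threw and were skipped.
  private final Map<String, List<String>> skippedRegions = new LinkedHashMap<>();
  // Assumed semantics: once this many regions have been skipped, the next failure aborts.
  private final int skippedRegionsLimit;
  private int totalSkipped;

  SkippedRegionsSketch(int skippedRegionsLimit) {
    this.skippedRegionsLimit = skippedRegionsLimit;
  }

  /** Runs one check; a failure is recorded and skipped unless it is META or the limit is hit. */
  void checkRegion(String table, String region, boolean isMeta, RegionCheck check) {
    try {
      check.check(table, region);
    } catch (IOException e) {
      if (isMeta || totalSkipped >= skippedRegionsLimit) {
        // Unchecked wrapper keeps the sketch short; the real code propagates the exception.
        throw new UncheckedIOException("Aborting: cannot skip region " + region, e);
      }
      skippedRegions.computeIfAbsent(table, k -> new ArrayList<>()).add(region);
      totalSkipped++;
    }
  }

  /** Number of regions skipped for the given table (0 if none). */
  int skippedCount(String table) {
    List<String> skipped = skippedRegions.get(table);
    return skipped == null ? 0 : skipped.size();
  }

  /** Summary line that includes the skipped-region count, as requested in the review. */
  String tableSummary(String table) {
    int n = skippedCount(table);
    return "Table " + table + " is okay"
        + (n > 0 ? " (with " + n + " skipped regions)." : ".");
  }

  public static void main(String[] args) {
    SkippedRegionsSketch fsck = new SkippedRegionsSketch(2);
    // First region fails (for example an assignment timeout); second one passes.
    fsck.checkRegion("t1", "r1", false, (t, r) -> { throw new IOException("assign timeout"); });
    fsck.checkRegion("t1", "r2", false, (t, r) -> { });
    System.out.println(fsck.tableSummary("t1"));
  }
}
```

Keeping the skipped regions keyed by table makes it cheap to print the count in the per-table summary line, which is what the second review comment asks for.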