[ https://issues.apache.org/jira/browse/HBASE-23044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17278417#comment-17278417 ]
Karthik Palanisamy commented on HBASE-23044:
--------------------------------------------

Yes [~rsanwal]. Sometimes this causes silent data loss. Two users reported this issue recently, but we were able to recover the data from the archive. We asked the users to disable NORMALIZATION and to avoid any aggressive manual region merges.

> CatalogJanitor#cleanMergeQualifier may clean wrong parent regions
> -----------------------------------------------------------------
>
>                 Key: HBASE-23044
>                 URL: https://issues.apache.org/jira/browse/HBASE-23044
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.0.6, 2.2.1, 2.1.6
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>            Priority: Critical
>             Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.7, 2.2.2
>
>
> 2019-09-17,19:42:40,539 INFO [PEWorker-1] org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=1223589, state=SUCCESS; GCMultipleMergedRegionsProcedure child={color:red}647600d28633bb2fe06b40682bab0593{color}, parents:[81b6fc3c560a00692bc7c3cd266a626a], [472500358997b0dc8f0002ec86593dcf] in 2.6470sec
> 2019-09-17,19:59:54,179 INFO [PEWorker-6] org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=1223651, state=SUCCESS; GCMultipleMergedRegionsProcedure child={color:red}647600d28633bb2fe06b40682bab0593{color}, parents:[9c52f24e0a9cc9b4959c1ebdfea29d64], [a623f298870df5581bcfae7f83311b33] in 1.0340sec
>
> The child is the same region, {color:red}647600d28633bb2fe06b40682bab0593{color}, in both log lines, but the parent regions are different.
> MergeTableRegionProcedure#prepareMergeRegion will try to cleanMergeQualifier for the regions to merge.
> {code:java}
>     for (RegionInfo ri: this.regionsToMerge) {
>       if (!catalogJanitor.cleanMergeQualifier(ri)) {
>         String msg = "Skip merging " + RegionInfo.getShortNameToLog(regionsToMerge) +
>             ", because parent " + RegionInfo.getShortNameToLog(ri) + " has a merge qualifier";
>         LOG.warn(msg);
>         throw new MergeRegionException(msg);
>       }
>     }
> {code}
> Suppose regions A and B merge into C, and regions D and E merge into F. When C and F are merged, the procedure tries to cleanMergeQualifier for both C and F.
> catalogJanitor.cleanMergeQualifier succeeds for region C, but fails for region F because region F still holds references.
> When C and F are merged again, the procedure tries to cleanMergeQualifier for C and F again, but MetaTableAccessor.getMergeRegions now returns the wrong parents. It uses a scan with a filter to read the result, and region C's merge qualifiers were already deleted by the first attempt. Because the scan only sets a start row, it does not stop at C's row, so it returns a wrong result, possibly the merge qualifiers of another region.
> {code:java}
>   public boolean cleanMergeQualifier(final RegionInfo region) throws IOException {
>     // Get merge regions if it is a merged region and already has merge qualifier
>     List<RegionInfo> parents = MetaTableAccessor.getMergeRegions(this.services.getConnection(),
>         region.getRegionName());
>     if (parents == null || parents.isEmpty()) {
>       // It doesn't have merge qualifier, no need to clean
>       return true;
>     }
>     return cleanMergeRegion(region, parents);
>   }
>
>   public static List<RegionInfo> getMergeRegions(Connection connection, byte[] regionName)
>       throws IOException {
>     return getMergeRegions(getMergeRegionsRaw(connection, regionName));
>   }
>
>   private static Cell [] getMergeRegionsRaw(Connection connection, byte [] regionName)
>       throws IOException {
>     Scan scan = new Scan().withStartRow(regionName).
>         setOneRowLimit().
>         readVersions(1).
>         addFamily(HConstants.CATALOG_FAMILY).
>         setFilter(new QualifierFilter(CompareOperator.EQUAL,
>             new RegexStringComparator(HConstants.MERGE_QUALIFIER_PREFIX_STR + ".*")));
>     try (Table m = getMetaHTable(connection); ResultScanner scanner = m.getScanner(scan)) {
>       // Should be only one result in this scanner if any.
>       Result result = scanner.next();
>       if (result == null) {
>         return null;
>       }
>       // Should be safe to just return all Cells found since we had filter in place.
>       // All values should be RegionInfos or something wrong.
>       return result.rawCells();
>     }
>   }
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
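
The failure mode described in the issue can be reproduced in miniature without an HBase cluster. The sketch below is not HBase code: it models hbase:meta as a sorted in-memory map, uses single letters as stand-ins for region names, and uses a hypothetical `merge` qualifier prefix. `scanFirstMatch` imitates a one-row scan that starts at a row and filters on the qualifier, and `scanExactRow` shows one possible guard (an assumption for illustration, not necessarily the fix that was committed) that rejects a result whose row differs from the requested one.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy stand-in for hbase:meta: row key -> (qualifier -> value). Not HBase code.
class MergeQualifierScanSketch {
    // Hypothetical prefix used only in this sketch.
    static final String MERGE_PREFIX = "merge";

    // True if the row holds at least one qualifier starting with the merge prefix.
    static boolean hasMergeQualifier(Map<String, String> row) {
        for (String qualifier : row.keySet()) {
            if (qualifier.startsWith(MERGE_PREFIX)) {
                return true;
            }
        }
        return false;
    }

    // Imitates a one-row scan from startRow with a qualifier filter: rows without
    // a matching cell are skipped, so the first matching row at or after startRow wins.
    static String scanFirstMatch(NavigableMap<String, Map<String, String>> meta, String startRow) {
        for (Map.Entry<String, Map<String, String>> e : meta.tailMap(startRow, true).entrySet()) {
            if (hasMergeQualifier(e.getValue())) {
                return e.getKey();
            }
        }
        return null;
    }

    // Hypothetical guard: accept the scan result only if it is the requested row.
    static String scanExactRow(NavigableMap<String, Map<String, String>> meta, String row) {
        String found = scanFirstMatch(meta, row);
        return row.equals(found) ? found : null;
    }

    // Region C has already had its merge qualifiers cleaned; region F has not.
    static NavigableMap<String, Map<String, String>> sampleMeta() {
        NavigableMap<String, Map<String, String>> meta = new TreeMap<>();
        Map<String, String> c = new TreeMap<>();
        c.put("regioninfo", "C");
        Map<String, String> f = new TreeMap<>();
        f.put("regioninfo", "F");
        f.put("merge0000", "D");
        f.put("merge0001", "E");
        meta.put("C", c);
        meta.put("F", f);
        return meta;
    }

    public static void main(String[] args) {
        NavigableMap<String, Map<String, String>> meta = sampleMeta();
        System.out.println(scanFirstMatch(meta, "C")); // prints F: the lookup for C lands on F's row
        System.out.println(scanExactRow(meta, "C"));   // prints null: C has no merge qualifier
    }
}
```

In this model the lookup for C returns F's row, so a caller would treat D and E as C's parents, which matches the symptom in the log lines where one child is garbage-collected with two different parent sets.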