Stephen Yuan Jiang created HBASE-15940:
------------------------------------------

             Summary: HBCK unnecessary moves reference files when a table has split region to fix non-existing overlap regions
                 Key: HBASE-15940
                 URL: https://issues.apache.org/jira/browse/HBASE-15940
             Project: HBase
          Issue Type: Bug
          Components: hbck
    Affects Versions: 1.0.0
            Reporter: Stephen Yuan Jiang
            Assignee: Stephen Yuan Jiang
          Attachments: org.apache.hadoop.hbase.util.TestHBaseFsck-output.txt, repro-hbck-repair-healthy-splitted=region.patch

When the repair option (specifically -fixHdfsOverlaps) is specified against a 
table that has split regions (both the parent region and the daughter regions 
exist, with reference files), HBCK wrongly concludes that the regions overlap 
and tries to fix this by merging them.

This is by design: the current implementation of HBCK treats HDFS as the trusted 
source and does not consult the META table.
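To make the misdiagnosis concrete, here is a minimal sketch of an overlap check that, like HBCK, looks only at the key ranges of region directories found on HDFS. It is a hypothetical model, not HBCK code; the class, the Region type and the method names are illustrative assumptions. Because the lingering parent [a, c) covers both daughters, it is reported as overlapping each of them, even though META records the parent as an offline split parent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not HBase code: HBCK's HDFS-only view of a region is just its
// key range [startKey, endKey). An empty startKey/endKey means start/end of the table.
final class HdfsOverlapDemo {

    static final class Region {
        final String name;
        final String startKey;
        final String endKey;

        Region(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // Half-open ranges intersect unless one ends at or before the other starts.
    static boolean overlaps(Region a, Region b) {
        boolean aEndsFirst = !a.endKey.isEmpty() && a.endKey.compareTo(b.startKey) <= 0;
        boolean bEndsFirst = !b.endKey.isEmpty() && b.endKey.compareTo(a.startKey) <= 0;
        return !aEndsFirst && !bEndsFirst;
    }

    // Reports every overlapping pair, with no knowledge of split parents recorded in META.
    static List<String> overlappingPairs(List<Region> regions) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < regions.size(); i++) {
            for (int j = i + 1; j < regions.size(); j++) {
                Region a = regions.get(i);
                Region b = regions.get(j);
                if (overlaps(a, b)) {
                    pairs.add(a.name + " <-> " + b.name);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Parent [a, c) was split into [a, b) and [b, c); its directory lingers on HDFS
        // until the daughters' reference files are compacted away.
        List<Region> onHdfs = List.of(
                new Region("parent", "a", "c"),
                new Region("daughterA", "a", "b"),
                new Region("daughterB", "b", "c"));
        overlappingPairs(onHdfs).forEach(System.out::println);
    }
}
```

Running main prints "parent <-> daughterA" and "parent <-> daughterB"; the two daughters are adjacent, not overlapping, so only the lingering parent triggers the false report.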

Here is the comment from one of the unit tests:
{code}
      // TODO: fixHdfsHoles does not work against splits, since the parent dir lingers on
      // for some time until children references are deleted. HBCK erroneously sees this as
      // overlapping regions
{code}

However, this behavior is harmful. Once the reference files are moved to a new 
region, the parent region no longer has daughter regions that reference it, so 
CatalogJanitor can clean it up. This creates a real inconsistency: lingering 
reference files that still point at the deleted parent's store files.
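The sequence can be illustrated with a heavily simplified model. This is a hypothetical sketch, not HBase code: it assumes that CatalogJanitor only checks the two daughters for references before removing a split parent, and the map of "which region's reference files point at which parent" is an illustrative stand-in for the files on HDFS.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, heavily simplified model (not HBase code). refsByRegion maps a region
// directory to the set of parent regions that its reference files point at.
final class LingeringReferenceDemo {

    // Simplified CatalogJanitor rule: the split parent may be removed once neither
    // daughter holds a reference to it. Other region directories are not consulted.
    static boolean parentCleanable(String parent, List<String> daughters,
                                   Map<String, Set<String>> refsByRegion) {
        for (String daughter : daughters) {
            if (refsByRegion.getOrDefault(daughter, Set.of()).contains(parent)) {
                return false;
            }
        }
        return true;
    }

    // Regions left holding references to the parent after it is cleaned up (sorted).
    static List<String> danglingAfterCleanup(String parent, List<String> daughters,
                                             Map<String, Set<String>> refsByRegion) {
        List<String> dangling = new ArrayList<>();
        if (!parentCleanable(parent, daughters, refsByRegion)) {
            return dangling; // parent is kept, so nothing dangles yet
        }
        for (Map.Entry<String, Set<String>> entry : refsByRegion.entrySet()) {
            if (entry.getValue().contains(parent)) {
                dangling.add(entry.getKey());
            }
        }
        Collections.sort(dangling);
        return dangling;
    }

    public static void main(String[] args) {
        List<String> daughters = List.of("daughterA", "daughterB");
        // Healthy split: the daughters hold the references, so the parent is kept.
        Map<String, Set<String>> beforeHbck = Map.of(
                "daughterA", Set.of("parent"),
                "daughterB", Set.of("parent"));
        // After the overlap "fix": the reference files now live in a new merged region.
        Map<String, Set<String>> afterHbck = Map.of("merged", Set.of("parent"));
        System.out.println(danglingAfterCleanup("parent", daughters, beforeHbck)); // []
        System.out.println(danglingAfterCleanup("parent", daughters, afterHbck));  // [merged]
    }
}
```

The point of the model is that the cleanup decision looks only at the daughters, so moving the reference files anywhere else makes the parent look unreferenced.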

Another bad consequence is that the split regions are merged back into one. 
Although undesirable, this at least does not cause further inconsistency. This 
JIRA does not try to solve this un-split issue, as doing so requires a bigger 
design change in HBCK.
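Why the split is undone can be sketched as follows, assuming (as a simplification) that the overlap fix creates one region spanning the smallest start key and the largest end key of the overlap group. The class and method names are hypothetical, not HBCK code.

```java
import java.util.List;

// Hypothetical sketch (not HBase code): the region created for an overlap group spans
// the smallest start key and largest end key in the group. Empty keys mean table start/end.
final class OverlapMergeDemo {

    static final class Range {
        final String start;
        final String end;

        Range(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    static Range union(List<Range> group) {
        String start = group.get(0).start;
        String end = group.get(0).end;
        for (Range r : group) {
            if (r.start.compareTo(start) < 0) {
                start = r.start;
            }
            if (end.isEmpty() || r.end.isEmpty()) {
                end = ""; // an open-ended member makes the merged region open-ended
            } else if (r.end.compareTo(end) > 0) {
                end = r.end;
            }
        }
        return new Range(start, end);
    }

    public static void main(String[] args) {
        // The lingering parent and both daughters form one overlap group...
        List<Range> group = List.of(new Range("a", "c"), new Range("a", "b"), new Range("b", "c"));
        // ...so the "fixed" table ends up with a single region again.
        System.out.println(union(group)); // [a, c)
    }
}
```

Under this assumption the merged range equals the original parent's range, which is exactly the un-split outcome described above.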

This JIRA addresses the potential lingering-reference-files issue, as multiple 
customers using branch-1 have hit it in production. (The workaround is to run a 
major compaction on all split regions before running HBCK; this can take a long 
time and can affect production.)

Attached are the log and the modified unit test that reproduce the issue.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
