Liu Shaohui created HBASE-10370:
-----------------------------------

             Summary: Compaction in out-of-date Store causes region split failed
                 Key: HBASE-10370
                 URL: https://issues.apache.org/jira/browse/HBASE-10370
             Project: HBase
          Issue Type: Bug
          Components: Compaction
            Reporter: Liu Shaohui
            Priority: Critical


In our production cluster, we encountered a problem where two daughter regions
could not be opened because of a FileNotFoundException.
{quote}
2014-01-14,20:12:46,927 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of user_profile,xxxxxxxxx,1389671863815.99e016485b0bc142d67ae07a884f6966.; Failed lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98
java.io.IOException: Failed lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98
        at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:375)
        at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:467)
        at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:69)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File does not exist: /hbase/lgprc-xiaomi/user_profile/99e016485b0bc142d67ae07a884f6966/A/5e05d706e4a84f34acc2cf00f089a4cf
....
{quote}
The cause is that a compaction running in an out-of-date Store deletes hfiles
that the daughter regions reference after the split. As a result, the daughter
regions can never be opened.

The timeline is as follows.

Assumption: there are two hfiles, a and b, in Store A of Region R.

t0: A compaction request for Store A(a+b) in Region R is submitted.

t1: A split of Region R starts, but it times out and is rolled back. During the
rollback, the region reinitializes all of its store objects (see
SplitTransaction #824). The store in Region R is now A'(a+b).

t2: The compaction runs (a + b -> c) against the out-of-date store object:
A(a+b) -> A(c). Hfiles a and b are archived.

t3: Region R is split again, this time into two regions, R.0 and R.1, which
create hfile references to hfiles a and b from Store A'(a+b).

t4: Because hfiles a and b have been deleted, opening regions R.0 and R.1
fails with FileNotFoundException.
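
To make the interleaving in the timeline above easier to follow, here is a
small toy model of it in Python. It is only a sketch: FileSystem, Store,
Region, open_daughter and run_timeline are hypothetical names invented for
illustration and are not HBase classes or APIs. The model assumes that the
queued compaction keeps a reference to the old store object, while the
rollback gives the region a new one.

```python
# Toy model of the t0..t4 timeline. All names are hypothetical and only
# illustrate the ordering of events; this is not HBase code.

class FileSystem:
    """Set of hfile names that currently exist on disk."""
    def __init__(self, files):
        self.files = set(files)

    def archive(self, name):
        self.files.discard(name)

    def exists(self, name):
        return name in self.files


class Store:
    """In-memory view of a store's hfiles; this view can go stale."""
    def __init__(self, fs, hfiles):
        self.fs = fs
        self.hfiles = list(hfiles)

    def compact(self, output):
        # Write the merged file, then archive the input files.
        self.fs.files.add(output)
        for name in self.hfiles:
            self.fs.archive(name)
        self.hfiles = [output]


class Region:
    def __init__(self, fs, hfiles):
        self.fs = fs
        self.store = Store(fs, hfiles)

    def rollback_split(self):
        # The rollback replaces the store object with a fresh one (A -> A').
        self.store = Store(self.fs, sorted(self.fs.files))

    def split(self):
        # Each daughter references the hfiles known to the current store.
        refs = list(self.store.hfiles)
        return refs, list(refs)


def open_daughter(fs, refs):
    missing = [r for r in refs if not fs.exists(r)]
    if missing:
        raise FileNotFoundError("File does not exist: " + ", ".join(missing))


def run_timeline():
    fs = FileSystem(["a", "b"])
    region = Region(fs, ["a", "b"])
    stale_store = region.store       # t0: compaction request holds Store A
    region.rollback_split()          # t1: split rolled back, region now uses A'
    stale_store.compact("c")         # t2: compaction on A archives a and b
    refs0, refs1 = region.split()    # t3: daughters reference a and b via A'
    try:                             # t4: opening a daughter fails
        open_daughter(fs, refs0)
    except FileNotFoundError as e:
        return str(e)
    return "opened"
```

In this model the failure disappears if the compaction at t2 runs against
region.store instead of stale_store, because the store the region splits from
then sees hfile c.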

I have added a test to identify this problem.

After searching JIRA, I think HBASE-8502 may be the same problem. [~goldin]




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
