[ 
https://issues.apache.org/jira/browse/HBASE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463494#comment-13463494
 ] 

Devaraj Das commented on HBASE-6679:
------------------------------------

Okay, did some digging into the logs (attached to the jira earlier) and the 
code. This doesn't look like a race between compaction and split (apologies 
for any confusion I may have created). The two are sequential: at the end of 
a compaction, a split is requested. Note, though, that the split happens in 
a separate thread.

The problem is that the daughter tries to open a reader to a file that doesn't 
exist. 
{noformat}
java.io.IOException: Failed ip-10-4-197-133.ec2.internal,60020,1346119706203-daughterOpener=4efb1c92918bbf3c54d0ead3345bb735
        at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:368)
        at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:456)
        at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: File does not exist: /apps/hbase/data/TestLoadAndVerify_1346120615716/5689a8785bbc9a8aa8e526cd7ef1542a/f1/5a55df83829f401993d95ecf2e539ba1
{noformat}

The method SplitTransaction.createDaughters creates the reference files (via a 
call to the method SplitTransaction.splitStoreFiles) that the daughter then 
tries to open. The list of files to create references to is the set of entries 
in the storeFiles field in Store.java (obtained via the call to 
this.parent.close in createDaughters). The storeFiles field is last updated 
(in the thread doing the compaction) in the method Store.completeCompaction.

My suspicion is that the problem is that accesses to storeFiles are not 
synchronized, and the field is not volatile either. This can lead to 
inconsistencies between the compaction thread and the split thread: the split 
thread doesn't see the last updated value of the field, and so creates 
references to files that the compaction has already removed.

If the above theory is right (and it is only a theory so far), then the 
solution could be to make the storeFiles field volatile.
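
To make the proposal concrete, here is a minimal sketch of the idea. It is not the actual Store.java code: the class StoreSketch and its String file names are hypothetical stand-ins, and it assumes the field is replaced wholesale with a new immutable list rather than mutated in place.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for Store; NOT the real HBase class. */
final class StoreSketch {
    // volatile: a write to this field by the compaction thread happens-before
    // any later read of it by the split thread, so the reader cannot be
    // handed a stale reference.
    private volatile List<String> storeFiles;

    StoreSketch(List<String> initialFiles) {
        this.storeFiles = Collections.unmodifiableList(new ArrayList<String>(initialFiles));
    }

    /** Compaction thread: publish the post-compaction file list in one write. */
    void completeCompaction(List<String> compactedFiles) {
        // Build the new list fully, then swap the reference (assumption:
        // the list is never modified after publication).
        this.storeFiles = Collections.unmodifiableList(new ArrayList<String>(compactedFiles));
    }

    /** Split thread: return the file list that references would be created for. */
    List<String> close() {
        return storeFiles;
    }
}
```

With the volatile modifier, a call to close() that follows completeCompaction() on another thread is guaranteed to return the post-compaction list. Note that volatile only covers visibility of the reference; if the list were mutated in place, or if several fields had to change together, a lock would be needed instead.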

Thoughts?
                
> RegionServer aborts due to race between compaction and split
> ------------------------------------------------------------
>
>                 Key: HBASE-6679
>                 URL: https://issues.apache.org/jira/browse/HBASE-6679
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.92.3
>
>         Attachments: rs-crash-parallel-compact-split.log
>
>
> In our nightlies, we have seen RS aborts due to compaction and split racing. 
> Original parent file gets deleted after the compaction, and hence, the 
> daughters don't find the parent data file. The RS kills itself when this 
> happens. Will attach a snippet of the relevant RS logs.

