[ https://issues.apache.org/jira/browse/HBASE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463494#comment-13463494 ]
Devaraj Das commented on HBASE-6679:
------------------------------------

Okay, did some digging into the logs (attached to the JIRA earlier) and the code. This doesn't seem to be a race between compaction and split (apologies for any confusion I might have created). The two are sequential: at the end of a compaction, a split is requested. But I'll note that the split happens in a separate thread. The problem is that the daughter tries to open a reader on a file that doesn't exist.

{noformat}
java.io.IOException: Failed ip-10-4-197-133.ec2.internal,60020,1346119706203-daughterOpener=4efb1c92918bbf3c54d0ead3345bb735
	at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:368)
	at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:456)
	at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: File does not exist: /apps/hbase/data/TestLoadAndVerify_1346120615716/5689a8785bbc9a8aa8e526cd7ef1542a/f1/5a55df83829f401993d95ecf2e539ba1
{noformat}

The method SplitTransaction.createDaughters creates the reference files (via a call to SplitTransaction.splitStoreFiles) that the daughter then tries to open. The list of files to create references to is the set of entries in the storeFiles field in Store.java (obtained via the call to this.parent.close in createDaughters). The storeFiles field is last updated, in the thread doing the compaction, in the method Store.completeCompaction.

My suspicion is that the problem is due to the fact that access to storeFiles is not synchronized, and the field is not volatile either.
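To make the suspected visibility problem concrete, here is a minimal sketch of what a volatile storeFiles field could look like. StoreSketch is a hypothetical, heavily simplified stand-in for Store.java, not the actual HBase class: the method bodies, the use of String file names, and the single-writer assumption are all illustrative.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Store.java; only the field name storeFiles and
// the method names completeCompaction/close come from the discussion.
final class StoreSketch {
    // volatile: a write to this field by one thread happens-before every
    // later read of the field by another thread (Java Memory Model).
    private volatile List<String> storeFiles;

    StoreSketch(Collection<String> initialFiles) {
        this.storeFiles =
            Collections.unmodifiableList(new ArrayList<String>(initialFiles));
    }

    // Runs on the compaction thread. Assumes a single writer at a time;
    // concurrent writers would still need a lock around this method.
    void completeCompaction(Collection<String> compactedFiles, String resultFile) {
        List<String> updated = new ArrayList<String>(storeFiles);
        updated.removeAll(compactedFiles); // inputs are deleted after compaction
        updated.add(resultFile);           // replaced by the compacted file
        // Publish a fresh immutable list with one volatile write.
        storeFiles = Collections.unmodifiableList(updated);
    }

    // Runs on the split thread; stands in for the list that parent.close()
    // hands back to createDaughters.
    List<String> close() {
        return storeFiles; // volatile read sees the latest published list
    }
}
```

With a plain (non-volatile, unsynchronized) field, the Java Memory Model gives no guarantee that the split thread observes the list written by the compaction thread, so it could build references to files that have already been deleted. Publishing an immutable list through a volatile field addresses only that visibility question; whether it is sufficient for the real Store.java is an assumption here.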
Without synchronization or a volatile declaration, the compaction thread and the split thread can end up with inconsistent views of the field, and the split thread doesn't see its last updated value. If the above theory is right (and it is only a theory at this point), then the solution could be to make the storeFiles field volatile. Thoughts?

> RegionServer aborts due to race between compaction and split
> ------------------------------------------------------------
>
>                 Key: HBASE-6679
>                 URL: https://issues.apache.org/jira/browse/HBASE-6679
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.92.3
>
>         Attachments: rs-crash-parallel-compact-split.log
>
>
> In our nightlies, we have seen RS aborts due to compaction and split racing.
> Original parent file gets deleted after the compaction, and hence, the
> daughters don't find the parent data file. The RS kills itself when this
> happens. Will attach a snippet of the relevant RS logs.