[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044134#comment-16044134 ]

ramkrishna.s.vasudevan commented on HBASE-18186:
------------------------------------------------

Looking at this and HBASE-17406, I can see another way this could happen, 
mainly for the stack trace of FileNotFoundException at the time of opening the 
scanners:
->Thread 1 creates a scan. It acquires the HStore's read lock and collects all 
the store files over which the StoreFileScanners have to be created. It then 
releases the read lock, but the scanners are not yet created.
->Thread 2 completes a compaction in which the store files collected by 
Thread 1 are actually compacted. It acquires the write lock to update the 
compacted files, and those files are marked compactedAway.
->Thread 3 (the CompactedHFilesDischargeHandler) starts and acquires the read 
lock to find the list of compacted files. It gets all the files compacted by 
Thread 2. At this point the refCount on those store files is still 0, so it 
sees files that are marked compactedAway with refCount 0 and proceeds to 
remove them for archiving.

->Now Thread 1 continues and increments the refCount to 1 on the store files 
used for scanning, but by this time the files could already have been archived.
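The window above can be shown with a minimal, self-contained model. All class, field and method names here are illustrative stand-ins, not the actual HStore internals:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model of the race: the read lock is dropped between collecting the
// store files and bumping their refCount, leaving a window for archiving.
class RacyScannerOpen {
    static class MockStoreFile {
        final AtomicInteger refCount = new AtomicInteger(0);
        volatile boolean compactedAway = false;
        volatile boolean archived = false;
    }

    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    final List<MockStoreFile> storeFiles = new ArrayList<>();

    // Thread 1, step A: snapshot the store files under the read lock.
    List<MockStoreFile> snapshotFiles() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(storeFiles);
        } finally {
            lock.readLock().unlock();  // released before any scanner exists
        }
    }

    // Thread 1, step B: refCount is only bumped here, after the lock is gone.
    void openScanners(List<MockStoreFile> snapshot) {
        for (MockStoreFile f : snapshot) {
            f.refCount.incrementAndGet();
        }
    }

    // Thread 2: compaction completion marks the files compacted away
    // (under the write lock, as in the scenario above).
    void markCompactedAway() {
        lock.writeLock().lock();
        try {
            for (MockStoreFile f : storeFiles) {
                f.compactedAway = true;
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Thread 3: the discharge handler archives compacted files whose
    // refCount is still 0.
    void dischargeCompacted() {
        lock.readLock().lock();
        try {
            for (MockStoreFile f : storeFiles) {
                if (f.compactedAway && f.refCount.get() == 0) {
                    f.archived = true;
                }
            }
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Running the steps in the interleaving described above (snapshot, then compaction plus discharge, then openScanners) ends with the scanner holding a refCount on an already-archived file.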
I am not sure a test case is possible here, but let me check by adding some 
debug points whether the above can be reproduced. However, the solution is 
simple: we could extend the read lock in Thread 1 to be held until the 
StoreFileScanners are created, because the only costly step there is the 
creation of readers, and that is already done when the store files are 
actually compacted/flushed.
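The proposed fix can be sketched against the same toy model: hold the read lock until the refCounts are bumped. Since compaction completion needs the write lock, it can no longer slip into the window, and the discharge handler then sees refCount > 0. Again, the names are illustrative, not the actual patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Same toy model with the proposed fix: the snapshot and the refCount bump
// happen inside one read-lock critical section.
class SafeScannerOpen {
    static class MockStoreFile {
        final AtomicInteger refCount = new AtomicInteger(0);
        volatile boolean compactedAway = false;
        volatile boolean archived = false;
    }

    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    final List<MockStoreFile> storeFiles = new ArrayList<>();

    // Thread 1: keep the read lock across snapshot AND refCount increment,
    // i.e. until the StoreFileScanners could have been created.
    List<MockStoreFile> snapshotAndPin() {
        lock.readLock().lock();
        try {
            List<MockStoreFile> snapshot = new ArrayList<>(storeFiles);
            for (MockStoreFile f : snapshot) {
                f.refCount.incrementAndGet();
            }
            return snapshot;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Thread 2: blocked by Thread 1's read lock until the files are pinned.
    void markCompactedAway() {
        lock.writeLock().lock();
        try {
            for (MockStoreFile f : storeFiles) {
                f.compactedAway = true;
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Thread 3: now sees refCount > 0 and skips archiving.
    void dischargeCompacted() {
        lock.readLock().lock();
        try {
            for (MockStoreFile f : storeFiles) {
                if (f.compactedAway && f.refCount.get() == 0) {
                    f.archived = true;
                }
            }
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

With this ordering, even if the compaction and the discharge handler run immediately after the pinning step, the file survives until the scanner releases its reference.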

> Frequent FileNotFoundExceptions in region server logs
> -----------------------------------------------------
>
>                 Key: HBASE-18186
>                 URL: https://issues.apache.org/jira/browse/HBASE-18186
>             Project: HBase
>          Issue Type: Bug
>          Components: Compaction, Scanners
>    Affects Versions: 1.3.1
>            Reporter: Ashu Pachauri
>
> We see frequent FileNotFoundExceptions in region server logs on multiple 
> code paths trying to reference non-existent store files. I know that there 
> have been multiple bugs in store file accounting of compacted store files; 
> examples include HBASE-16964, HBASE-16754 and HBASE-16788.
> Observations:
> 1. The issue mentioned here also seems to bear a similar flavor, because we 
> are not seeing rampant data loss given the frequency of these exceptions in 
> the logs. So it's more likely an accounting issue, but I could be wrong. 
> 2. The frequency with which this happens on a scan-heavy workload is at 
> least one order of magnitude higher than on a mixed workload.
> Stack traces:
> {Code}
> WARN backup.HFileArchiver: Failed to archive class 
> org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, 
> file:hdfs://<region_dir>/<cf>/<store_file> because it does not exist! 
> Skipping and continuing on.
> java.io.FileNotFoundException: File/Directory <region_dir>/<cf>/<store_file> 
> does not exist.
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setTimes(FSDirAttrOp.java:121)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setTimes(FSNamesystem.java:1910)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setTimes(NameNodeRpcServer.java:1223)
>       at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setTimes(ClientNamenodeProtocolServerSideTranslatorPB.java:915)
>       at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)
>       at sun.reflect.GeneratedConstructorAccessor55.newInstance(Unknown 
> Source)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>       at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>       at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>       at org.apache.hadoop.hdfs.DFSClient.setTimes(DFSClient.java:3115)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$30.doCall(DistributedFileSystem.java:1520)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$30.doCall(DistributedFileSystem.java:1516)
>       at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem.setTimes(DistributedFileSystem.java:1530)
>       at 
> org.apache.hadoop.fs.FilterFileSystem.setTimes(FilterFileSystem.java:496)
>       at 
> org.apache.hadoop.hbase.util.FSUtils.renameAndSetModifyTime(FSUtils.java:1805)
>       at 
> org.apache.hadoop.hbase.backup.HFileArchiver$File.moveAndClose(HFileArchiver.java:575)
>       at 
> org.apache.hadoop.hbase.backup.HFileArchiver.resolveAndArchiveFile(HFileArchiver.java:410)
>       at 
> org.apache.hadoop.hbase.backup.HFileArchiver.resolveAndArchive(HFileArchiver.java:320)
>       at 
> org.apache.hadoop.hbase.backup.HFileArchiver.archiveStoreFiles(HFileArchiver.java:242)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegionFileSystem.removeStoreFiles(HRegionFileSystem.java:433)
>       at 
> org.apache.hadoop.hbase.regionserver.HStore.removeCompactedfiles(HStore.java:2723)
>       at 
> org.apache.hadoop.hbase.regionserver.HStore.closeAndArchiveCompactedFiles(HStore.java:2672)
>       at 
> org.apache.hadoop.hbase.regionserver.CompactedHFilesDischargeHandler.process(CompactedHFilesDischargeHandler.java:43)
>       at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): 
> File/Directory <region_dir>/<cf>/<store_file> does not exist.
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setTimes(FSDirAttrOp.java:121)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setTimes(FSNamesystem.java:1910)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setTimes(NameNodeRpcServer.java:1223)
>       at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setTimes(ClientNamenodeProtocolServerSideTranslatorPB.java:915)
>       at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1476)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1413)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>       at com.sun.proxy.$Proxy16.setTimes(Unknown Source)
>       at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setTimes(ClientNamenodeProtocolTranslatorPB.java:854)
>       at sun.reflect.GeneratedMethodAccessor138.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>       at com.sun.proxy.$Proxy17.setTimes(Unknown Source)
>       at sun.reflect.GeneratedMethodAccessor138.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:302)
>       at com.sun.proxy.$Proxy18.setTimes(Unknown Source)
>       at org.apache.hadoop.hdfs.DFSClient.setTimes(DFSClient.java:3113)
>       ... 18 more
> {Code}
> {Code}
> ERROR regionserver.CompactSplitThread: Compaction selection failed Store = 
> <cf>, pri = 289
> java.io.FileNotFoundException: File does not exist: 
> hdfs://<region_dir>/<cf>/<store_file>
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
>       at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
>       at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
>       at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
>       at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
>       at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
>       at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:321)
>       at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
>       at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:64)
>       at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
>       at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
>       at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1661)
>       at 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread.selectCompaction(CompactSplitThread.java:369)
>       at 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread.access$100(CompactSplitThread.java:59)
>       at 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:494)
>       at 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:564)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> {Code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
