[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16042994#comment-16042994 ]

ramkrishna.s.vasudevan commented on HBASE-18186:
------------------------------------------------

The second stack trace is because we read the file status to get the 
modification time (StoreFileInfo#getModificationTime ends up in 
FileSystem#getFileStatus), so there is a chance the file has already been 
archived by the time we look it up.
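
For illustration, a minimal sketch (hypothetical class, not the HBase source) of the read that races with the discharge chore: the modification time is resolved through FileSystem#getFileStatus, which throws FileNotFoundException once the file has been moved to the archive.
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch, not HBase code: StoreFileInfo#getModificationTime
// ultimately resolves the store file's status through the FileSystem, so a
// concurrent archive (rename) of the file makes this call fail.
public class ModificationTimeRaceSketch {
  static long getModificationTime(FileSystem fs, Path storeFilePath)
      throws IOException {
    // getFileStatus issues a NameNode lookup against the current path; if the
    // discharge handler already archived the file, the path no longer exists
    // and HDFS throws FileNotFoundException.
    return fs.getFileStatus(storeFilePath).getModificationTime();
  }
}
{code}
On the archiving side, the handling in HStore is this block: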
{code}
catch (FailedArchiveException fae) {
  // Even if archiving some files failed, we still need to clear out any of the
  // files which were successfully archived. Otherwise we will receive a
  // FileNotFoundException when we attempt to re-archive them in the next go around.
  Collection<Path> failedFiles = fae.getFailedFiles();
  Iterator<StoreFile> iter = filesToRemove.iterator();
  while (iter.hasNext()) {
    if (failedFiles.contains(iter.next().getPath())) {
      iter.remove();
    }
  }
  if (!filesToRemove.isEmpty()) {
    clearCompactedfiles(filesToRemove);
  }
  throw fae;
}
{code}
This code is in HStore#removeCompactedFiles(): if we get a FileNotFoundException 
for a set of files, then we should simply clear the compacted files with all of 
filesToRemove; if we remove some of the items from that list 'filesToRemove', 
then next time they will still be picked up, right? Am I missing something?
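
For illustration, a minimal sketch of the variant being suggested (hypothetical method body, not a patch; names follow the snippet above): on a failed archive, clear the compacted-file accounting for the whole batch instead of first pruning out the failed files.
{code}
// Hypothetical sketch of the suggestion, not actual HBase code.
private void removeCompactedfiles(Collection<StoreFile> filesToRemove)
    throws IOException {
  try {
    // Ask the region filesystem to archive the compacted store files.
    this.fs.removeStoreFiles(getColumnFamilyName(), filesToRemove);
  } catch (FailedArchiveException fae) {
    // Files that failed with FileNotFoundException are already gone from the
    // store directory; clearing the whole batch keeps them from being handed
    // to the archiver again on the next discharge run.
    clearCompactedfiles(filesToRemove);
    throw fae;
  }
}
{code}
Whether this is safe when the archive failed for a reason other than the file already being gone is exactly the question above.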

> Frequent FileNotFoundExceptions in region server logs
> -----------------------------------------------------
>
>                 Key: HBASE-18186
>                 URL: https://issues.apache.org/jira/browse/HBASE-18186
>             Project: HBase
>          Issue Type: Bug
>          Components: Compaction, Scanners
>    Affects Versions: 1.3.1
>            Reporter: Ashu Pachauri
>
> We see frequent FileNotFoundExceptions in region server logs on multiple code 
> paths that try to reference non-existent store files. I know that there have 
> been multiple bugs in the store file accounting of compacted store files; 
> examples include HBASE-16964, HBASE-16754 and HBASE-16788.
> Observations:
> 1. The issue mentioned here also seems to bear a similar flavor, because we 
> are not seeing rampant data loss given the frequency of these exceptions in 
> the logs. So it's more likely an accounting issue, but I could be wrong.
> 2. The frequency with which this happens on a scan-heavy workload is at least 
> one order of magnitude higher than on a mixed workload.
> Stack traces:
> {code}
> WARN backup.HFileArchiver: Failed to archive class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://<region_dir>/<cf>/<store_file> because it does not exist! Skipping and continuing on.
> java.io.FileNotFoundException: File/Directory <region_dir>/<cf>/<store_file> does not exist.
>       at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setTimes(FSDirAttrOp.java:121)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setTimes(FSNamesystem.java:1910)
>       at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setTimes(NameNodeRpcServer.java:1223)
>       at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setTimes(ClientNamenodeProtocolServerSideTranslatorPB.java:915)
>       at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)
>       at sun.reflect.GeneratedConstructorAccessor55.newInstance(Unknown Source)
>       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>       at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>       at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>       at org.apache.hadoop.hdfs.DFSClient.setTimes(DFSClient.java:3115)
>       at org.apache.hadoop.hdfs.DistributedFileSystem$30.doCall(DistributedFileSystem.java:1520)
>       at org.apache.hadoop.hdfs.DistributedFileSystem$30.doCall(DistributedFileSystem.java:1516)
>       at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>       at org.apache.hadoop.hdfs.DistributedFileSystem.setTimes(DistributedFileSystem.java:1530)
>       at org.apache.hadoop.fs.FilterFileSystem.setTimes(FilterFileSystem.java:496)
>       at org.apache.hadoop.hbase.util.FSUtils.renameAndSetModifyTime(FSUtils.java:1805)
>       at org.apache.hadoop.hbase.backup.HFileArchiver$File.moveAndClose(HFileArchiver.java:575)
>       at org.apache.hadoop.hbase.backup.HFileArchiver.resolveAndArchiveFile(HFileArchiver.java:410)
>       at org.apache.hadoop.hbase.backup.HFileArchiver.resolveAndArchive(HFileArchiver.java:320)
>       at org.apache.hadoop.hbase.backup.HFileArchiver.archiveStoreFiles(HFileArchiver.java:242)
>       at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.removeStoreFiles(HRegionFileSystem.java:433)
>       at org.apache.hadoop.hbase.regionserver.HStore.removeCompactedfiles(HStore.java:2723)
>       at org.apache.hadoop.hbase.regionserver.HStore.closeAndArchiveCompactedFiles(HStore.java:2672)
>       at org.apache.hadoop.hbase.regionserver.CompactedHFilesDischargeHandler.process(CompactedHFilesDischargeHandler.java:43)
>       at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File/Directory <region_dir>/<cf>/<store_file> does not exist.
>       at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setTimes(FSDirAttrOp.java:121)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setTimes(FSNamesystem.java:1910)
>       at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setTimes(NameNodeRpcServer.java:1223)
>       at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setTimes(ClientNamenodeProtocolServerSideTranslatorPB.java:915)
>       at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1476)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1413)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>       at com.sun.proxy.$Proxy16.setTimes(Unknown Source)
>       at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setTimes(ClientNamenodeProtocolTranslatorPB.java:854)
>       at sun.reflect.GeneratedMethodAccessor138.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>       at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>       at com.sun.proxy.$Proxy17.setTimes(Unknown Source)
>       at sun.reflect.GeneratedMethodAccessor138.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:302)
>       at com.sun.proxy.$Proxy18.setTimes(Unknown Source)
>       at org.apache.hadoop.hdfs.DFSClient.setTimes(DFSClient.java:3113)
>       ... 18 more
> {code}
> {code}
> ERROR regionserver.CompactSplitThread: Compaction selection failed Store = <cf>, pri = 289
> java.io.FileNotFoundException: File does not exist: hdfs://<region_dir>/<cf>/<store_file>
>       at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
>       at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
>       at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>       at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
>       at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
>       at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
>       at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
>       at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
>       at org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:321)
>       at org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
>       at org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:64)
>       at org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
>       at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
>       at org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1661)
>       at org.apache.hadoop.hbase.regionserver.CompactSplitThread.selectCompaction(CompactSplitThread.java:369)
>       at org.apache.hadoop.hbase.regionserver.CompactSplitThread.access$100(CompactSplitThread.java:59)
>       at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:494)
>       at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:564)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> {code}



