[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119384#comment-16119384 ]

ramkrishna.s.vasudevan commented on HBASE-18186:

bq. Split fails if we write out a link to a file in parent region that either doesn't exist or gets incorrectly removed (by compaction discharger/archiver?) after the link was created, so during the split you can't open daughter regions due to IO error.

I will go through the split code once again, and how the reference files are compacted. I believe this area was looked into when the store file accounting was done; however, there is a chance that obvious cases were missed. Will get back here. Thanks for the update [~mantonov].

> Frequent FileNotFoundExceptions in region server logs
> -----------------------------------------------------
>
>                 Key: HBASE-18186
>                 URL: https://issues.apache.org/jira/browse/HBASE-18186
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Compaction, Scanners
>    Affects Versions: 1.3.1
>            Reporter: Ashu Pachauri
>
> We see frequent FileNotFoundException in regionserver logs on multiple code
> paths trying to reference non-existing store files. I know that there have
> been multiple bugs in store file accounting of compacted store files.
> Examples include: HBASE-16964, HBASE-16754 and HBASE-16788.
> Observations:
> 1. The issue mentioned here also seems to bear a similar flavor, because we
> are not seeing rampant data loss given the frequency of these exceptions in
> the logs. So, it's more likely an accounting issue, but I could be wrong.
> 2. The frequency with which this happens on a scan-heavy workload is at least
> one order of magnitude higher than on a mixed workload.
> Stack traces:
> {code}
> WARN backup.HFileArchiver: Failed to archive class
> org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile,
> file:hdfs: because it does not exist!
> Skipping and continuing on.
> java.io.FileNotFoundException: File/Directory // does not exist.
>         at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setTimes(FSDirAttrOp.java:121)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setTimes(FSNamesystem.java:1910)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setTimes(NameNodeRpcServer.java:1223)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setTimes(ClientNamenodeProtocolServerSideTranslatorPB.java:915)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)
>         at sun.reflect.GeneratedConstructorAccessor55.newInstance(Unknown Source)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>         at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>         at org.apache.hadoop.hdfs.DFSClient.setTimes(DFSClient.java:3115)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$30.doCall(DistributedFileSystem.java:1520)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$30.doCall(DistributedFileSystem.java:1516)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.setTimes(DistributedFileSystem.java:1530)
>         at org.apache.hadoop.fs.FilterFileSystem.setTimes(FilterFileSystem.java:496)
>         at org.apache.hadoop.hbase.util.FSUtils.renameAndSetModifyTime(FSUtils.java:1805)
>         at org.apache.hadoop.hbase.backup.HFileArchiver$File.moveAndClose(HFileArchiver.java:575)
>         at org.apache.hadoop.hbase.backup.HFileArchiver.resolveAndArchiveFile(HFileArchiver.java:410)
>         at org.apache.hadoop.hbase.backup.HFileArchiver.resolveAndArchive(HFileArchiver.java:320)
>         at org.apache.hadoop.hbase.backup.HFileArchiver.archiveStoreFiles(HFileArchiver.java:242)
[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118881#comment-16118881 ]

Mikhail Antonov commented on HBASE-18186:

Yes, I saw that on the cluster with the HBASE-16788 patch applied. There's a more subtle problem somewhere here.
[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118879#comment-16118879 ]

Mikhail Antonov commented on HBASE-18186:

Regarding failed splits - those are more a result of the bug than a scenario. A split fails if we write out a link to a file in the parent region that either doesn't exist or gets incorrectly removed (by the compaction discharger/archiver?) after the link was created, so during the split you can't open the daughter regions due to an IO error.
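The failure mode described above - a daughter region holding a link whose target file has already been archived out from under it - can be illustrated with a minimal toy model. This is not HBase code: `LinkValidator`, `canOpenLink`, and the in-memory sets standing in for the parent-region store directory and the archive are all hypothetical, purely to show the invariant that a link is only openable while its target is still reachable in one of the two locations.

```java
import java.util.*;

// Toy model of the split/link invariant. In HBase the referenced store file
// must still exist either in the parent region's store dir or in the archive
// dir; here two plain sets stand in for those filesystem listings.
class LinkValidator {
    private final Set<String> parentFiles;   // store files still in the parent region
    private final Set<String> archivedFiles; // files the archiver has moved aside

    LinkValidator(Set<String> parentFiles, Set<String> archivedFiles) {
        this.parentFiles = parentFiles;
        this.archivedFiles = archivedFiles;
    }

    // A daughter region can open a link only if the target is still reachable
    // in either location; otherwise opening it surfaces as an FNFE/IO error.
    boolean canOpenLink(String target) {
        return parentFiles.contains(target) || archivedFiles.contains(target);
    }
}
```

Under this model, the bug is precisely a target that lands in neither set - it was removed without being archived, so the daughter open fails.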
[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118877#comment-16118877 ]

Anoop Sam John commented on HBASE-18186:

Do you see these FNFE (take case 1) after fixing HBASE-16788? Or did this happen in a test where HBASE-16788 is not in? Before HBASE-16788, there was a chance that a slow-running CompactedHFilesDischarger would run in parallel with a new one; both would have the same file path to be archived, so one would succeed and the other would get an FNFE. HBASE-16788 added synchronized for closeAndArchiveCompactedFiles. Maybe at the top level itself (at the beginning of the chore run) we should check and not allow two parallel CompactedHFilesDischarger chores?
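The top-level guard suggested above - skip a chore cycle entirely if a previous run is still in flight - could be sketched roughly as follows. This is a hypothetical `DischargerChoreGuard`, not the actual HBase chore code; it only demonstrates the skip-if-running pattern using an `AtomicBoolean` compare-and-set.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard: at most one chore body runs at a time, so a slow
// discharger can never overlap with the next scheduled run.
class DischargerChoreGuard {
    private final AtomicBoolean running = new AtomicBoolean(false);

    void chore(Runnable archiveWork) {
        // compareAndSet succeeds for exactly one caller; losers skip the cycle
        if (!running.compareAndSet(false, true)) {
            return; // a previous run is still archiving; try again next period
        }
        try {
            archiveWork.run();
        } finally {
            running.set(false); // allow the next scheduled run
        }
    }
}
```

Compared with HBASE-16788's per-store synchronization, a guard at this level would stop the two chores from even enumerating the same compacted files, rather than serializing them once they collide.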
[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118876#comment-16118876 ]

Mikhail Antonov commented on HBASE-18186:

[~ram_krish] This should be most visible with a workload that has a high write rate (frequent flushes/compactions) and a reasonably high rate of *long* scans (I can't say whether the same scan never succeeds, because at some point the client will stop the operation and return an error, or a retry will be made at the application level). I definitely think this is one of the most critical things to fix for 1.3 and 1.4, so any attention here would be great! Appreciate your help.
[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118716#comment-16118716 ]

ramkrishna.s.vasudevan commented on HBASE-18186:

[~mantonov] Is there any specific scenario where you observe these failures? Does it happen every time after a split? You said in one of the comments on the mailing list that it was for long-running scans - so does the same scan keep failing and never succeed, or are these random in nature and just pop up sometimes? Since you mentioned that the stability of 1.3 is not good because of all this store file accounting, I am planning to put some more effort into tracking this down. (So this problem should be in 1.3 and above.) [~chia7712] did excellent work with respect to AcidGuarantees; however, those issues are not related to FileNotFoundExceptions.
[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092292#comment-16092292 ]

Mikhail Antonov commented on HBASE-18186:

[~ram_krish] Oh I see, I had missed that part; I thought this was a slightly different scenario. That means we're back to square one on that..
[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091401#comment-16091401 ] ramkrishna.s.vasudevan commented on HBASE-18186:

[~mantonov] I was referring to exactly the same set of steps in my previous comment here, but later found that when we actually create a scanner for a user scan/get:

{code}
@Override
public KeyValueScanner getScanner(Scan scan, final NavigableSet<byte[]> targetCols, long readPt)
    throws IOException {
  lock.readLock().lock();
  try {
    KeyValueScanner scanner = null;
    if (this.getCoprocessorHost() != null) {
      scanner = this.getCoprocessorHost().preStoreScannerOpen(this, scan, targetCols);
    }
    scanner = createScanner(scan, targetCols, readPt, scanner);
    return scanner;
  } finally {
    lock.readLock().unlock();
  }
}
{code}

we hold the read lock for this entire, longer duration, and hence the above problem does not happen. In fact, when I posted my last comment some time back I tried debugging for that exact case and it just did not happen. Maybe there is a path where we directly call

{code}
public List<KeyValueScanner> getScanners(boolean cacheBlocks, boolean isGet, boolean usePread,
    boolean isCompaction, ScanQueryMatcher matcher, byte[] startRow, byte[] stopRow, long readPt)
    throws IOException {
{code}

and then we may hit that problem, like I recently saw in the case of HBASE-18221 (however, that feature is not in branch-1.3).
[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090588#comment-16090588 ] Mikhail Antonov commented on HBASE-18186:

[~ram_krish] I suspect there's a possible race in the following code; I would be happy to be proven wrong on it. See the HStore#getScanners() call (https://github.com/apache/hbase/blob/ee0f148c730e0ae1cb616406166487fba78a2298/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java#L1179). Here we seem to:

1. Get the readLock
2. Get the list of store files to create StoreScanners for
3. Release the readLock
4. Actually create the store scanners per store file

It appears we can have a race here where a compaction changes the list of active files between steps 3 and 4, which would leave scanners pointing to a missing file. Am I missing something?
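The suspected interleaving can be replayed with a minimal, single-threaded sketch that executes the four steps in the racy order. This is an illustrative toy model, not the actual HStore implementation; the class and method names here are invented for the sketch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model of the suspected race: the store-file list is snapshotted under
// the read lock, the lock is released, and a compaction swaps the files
// before the scanners are actually created over the snapshot.
public class RaceSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> storeFiles =
            new ArrayList<>(List.of("hfile-1", "hfile-2"));

    // Steps 1-3: take the read lock, copy the file list, release the lock.
    public List<String> snapshotFiles() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(storeFiles);
        } finally {
            lock.readLock().unlock(); // released BEFORE scanners are built
        }
    }

    // Compaction: under the write lock, replace the inputs with the output.
    public void compact() {
        lock.writeLock().lock();
        try {
            storeFiles.clear();
            storeFiles.add("hfile-compacted");
        } finally {
            lock.writeLock().unlock();
        }
    }

    public boolean exists(String file) {
        lock.readLock().lock();
        try {
            return storeFiles.contains(file);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        RaceSketch store = new RaceSketch();
        List<String> snapshot = store.snapshotFiles(); // scanner thread, steps 1-3
        store.compact();                               // compaction interleaves in the window
        // Step 4: the scanner thread now tries to open the snapshotted files,
        // which is where the FileNotFoundException would surface.
        for (String f : snapshot) {
            System.out.println(f + " exists=" + store.exists(f));
        }
    }
}
```

Both snapshotted files are gone by step 4, which is exactly the stale-scanner situation described above.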
[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045273#comment-16045273 ] Ashu Pachauri commented on HBASE-18186:

bq. Can you confirm if this FileNotFoundException is happening repeatedly for the same set of files?

I picked a couple of region servers facing this; it's not happening repeatedly for the same set of files (or at least I could not find such a case).
[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044291#comment-16044291 ] ramkrishna.s.vasudevan commented on HBASE-18186:

Will try the above sequence in a different way. Will be back.
[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044283#comment-16044283 ] ramkrishna.s.vasudevan commented on HBASE-18186:

After debugging I found that this case is not possible, because HStore#getScanner(Scan scan, final NavigableSet<byte[]> targetCols, long readPt) takes a read lock until the StoreScanner is created; that is a lock held for a considerably longer duration, so the above theory does not apply. If the lock had not been there, the theory would be valid and could cause data loss. So let's see why we still get that FileNotFoundException.
[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044164#comment-16044164 ] Anoop Sam John commented on HBASE-18186:

So if the flow happens as you said above, then the read op will be missing some data from being read, because at step #1 the compaction output file was not present?
[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044134#comment-16044134 ] ramkrishna.s.vasudevan commented on HBASE-18186:

This and HBASE-17406. I can see another way this could happen, mainly matching the stack trace of the file-not-found at the time of opening the scanners:

-> Thread 1 creates a scan; it acquires the HStore's readLock and gets all the store files over which the StoreFileScanners have to be created. It has just come out of the read lock and the scanners are not yet created.
-> Thread 2: a compaction completes in which the store files involved in Thread 1's scan are compacted. It acquires the write lock to update the compacted files, and those files are marked compactedAway.
-> Thread 3 (the CompactedHFilesDischargeHandler) starts and acquires the read lock to find the list of compacted files. It sees all the files compacted by Thread 2 and proceeds to archive them. At this point the refCount is still 0 for those store files, so the handler sees files that are marked compactedAway with refCount 0 and removes them for archiving.
-> Now Thread 1 continues by incrementing the refCount to 1 on the store files used for scanning, but by this time the files could already have been archived.

I am not sure a test case will be possible here, but let me check with some debug points whether the above can be reproduced. However, the solution is simple: we could extend the read lock in Thread 1 to be held until the StoreFileScanners are created, because the only costly step there is the creation of the readers, and that is already done when the store files are compacted/flushed.
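The proposed fix (keeping the read lock until the StoreFileScanners are created, so the refCount is bumped before the discharger can observe the file) can be sketched with a toy model of the compacted-file accounting. The field and method names here are illustrative, not the actual StoreFile API:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of compacted-file accounting: a file is archivable only if it
// is marked compacted-away AND no scanner holds a reference to it.
public class RefCountSketch {
    static class FileModel {
        volatile boolean compactedAway = false;
        final AtomicInteger refCount = new AtomicInteger();

        boolean archivable() {
            return compactedAway && refCount.get() == 0;
        }
    }

    public static void main(String[] args) {
        FileModel file = new FileModel();

        // Fixed ordering: Thread 1 bumps the refCount while still holding
        // the store's read lock, before any compaction can interleave.
        file.refCount.incrementAndGet();

        // Thread 2: compaction finishes and marks the file compacted-away.
        file.compactedAway = true;

        // Thread 3: the discharge handler checks eligibility. The file is
        // pinned by the scanner, so it is NOT archived and the scanner
        // never hits a FileNotFoundException.
        System.out.println("archivable=" + file.archivable());

        // Once the scanner closes and drops its reference, archiving proceeds.
        file.refCount.decrementAndGet();
        System.out.println("archivable=" + file.archivable());
    }
}
```

With the buggy ordering, the refCount increment happens after the discharger's check, so `archivable()` would briefly return true for a file a scanner is about to open.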
[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044072#comment-16044072 ] ramkrishna.s.vasudevan commented on HBASE-18186: bq. java.io.FileNotFoundException: File/Directory // does not exist. [~ashu210890] - Can you confirm if this FileNotFoundException is happening repeatedly for the same set of files?
[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043940#comment-16043940 ] ramkrishna.s.vasudevan commented on HBASE-18186: I verified the test case TestCompactionArchiveIOException. It seems to be doing the right thing, but if we specifically get a FileNotFoundException, I think we can be sure the file has already been archived, and we should remove it from the compactedFiles list as well. In fact, to confirm this, shall we do a file.exists() check in the 'archive' path?
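The exists() pre-check proposed above could look roughly like the following. This is a hedged, self-contained sketch only: it uses plain java.nio.file instead of HBase's FileSystem/HFileArchiver API, and the class and method names (`ArchiveExistsSketch`, `archiveIfPresent`) are hypothetical, not part of the HBase codebase.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ArchiveExistsSketch {
    /**
     * Hypothetical exists() guard on the archive path: a store file that is
     * already gone is treated as previously archived and skipped, instead of
     * surfacing a FileNotFoundException. Returns true if the file was moved,
     * false if it was already absent.
     */
    public static boolean archiveIfPresent(Path storeFile, Path archiveDir) throws IOException {
        if (!Files.exists(storeFile)) {
            return false; // already archived (or removed by another actor)
        }
        Files.createDirectories(archiveDir);
        Files.move(storeFile, archiveDir.resolve(storeFile.getFileName()));
        return true;
    }
}
```

Note this check is inherently racy on a distributed filesystem (the file can disappear between the exists() call and the move), so it narrows the window for the warning rather than eliminating it.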
[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043190#comment-16043190 ] Ashu Pachauri commented on HBASE-18186: --- bq. If you are not looking into this I can take this up Not looking into this at the moment. But I think [~mantonov] is digging into HBASE-17406, which may overlap with this.
[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042994#comment-16042994 ] ramkrishna.s.vasudevan commented on HBASE-18186: The second stack trace is because we do FileStatus.list() to get the modification time, so there could be a chance the file got archived.
{code}
catch (FailedArchiveException fae) {
  // Even if archiving some files failed, we still need to clear out any of the
  // files which were successfully archived. Otherwise we will receive a
  // FileNotFoundException when we attempt to re-archive them in the next go around.
  Collection<Path> failedFiles = fae.getFailedFiles();
  Iterator<StoreFile> iter = filesToRemove.iterator();
  while (iter.hasNext()) {
    if (failedFiles.contains(iter.next().getPath())) {
      iter.remove();
    }
  }
  if (!filesToRemove.isEmpty()) {
    clearCompactedfiles(filesToRemove);
  }
  throw fae;
{code}
This code is in HStore#removeCompactedfiles(): if we get a FileNotFoundException for a set of files, shouldn't we simply clear the compacted files with all of filesToRemove? If we remove some of the items from the 'filesToRemove' list, then next time they will still be picked up, right? Am I missing something?
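To make the question above concrete, here is a self-contained simulation of what the catch block does, using plain strings instead of HBase's StoreFile and Path types; the class and method names are hypothetical, not HBase code. Only the paths that archived successfully stay in the to-clear list, while the failed ones remain in the store's compacted-file accounting and are retried on the next discharger pass.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class DischargeSketch {
    /**
     * Mimics the filtering loop in the catch block: drop the paths that
     * failed to archive from filesToRemove, so clearCompactedfiles() only
     * removes the successfully archived files from the store's accounting.
     */
    public static List<String> successfullyArchived(List<String> filesToRemove,
                                                    Collection<String> failedFiles) {
        List<String> cleared = new ArrayList<>(filesToRemove);
        // Equivalent to the Iterator-based removal in the snippet above.
        cleared.removeIf(failedFiles::contains);
        return cleared;
    }
}
```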
[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042193#comment-16042193 ] ramkrishna.s.vasudevan commented on HBASE-18186: [~ashu210890] If you are not looking into this, I can take it up. I could probably share my analysis here by the beginning of next week.
[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs
[ https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041500#comment-16041500 ] Ashu Pachauri commented on HBASE-18186: --- Linking HBASE-13082 due to suspicion based on similarity of code paths affected by this in 1.3 releases.