[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-08-08 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119384#comment-16119384
 ] 

ramkrishna.s.vasudevan commented on HBASE-18186:


bq. Split fails if we write out a link to a file in parent region that either 
doesn't exist or gets incorrectly removed (by compaction discharger/archiver?) 
after the link was created, so during the split you can't open daughter regions 
due to IO error.
I will go through the split code once again and see how the reference files 
are compacted. I believe this area was looked into when the store file 
accounting was done; however, there is a chance that obvious cases were 
missed. Will get back here. Thanks for the update [~mantonov].

> Frequent FileNotFoundExceptions in region server logs
> -----------------------------------------------------
>
>                 Key: HBASE-18186
>                 URL: https://issues.apache.org/jira/browse/HBASE-18186
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Compaction, Scanners
>    Affects Versions: 1.3.1
>            Reporter: Ashu Pachauri
>
> We see frequent FileNotFoundException in regionserver logs on multiple code 
> paths trying to reference non-existing store files. I know that there have 
> been multiple bugs in store file accounting of compacted store files. 
> Examples include HBASE-16964, HBASE-16754 and HBASE-16788.
> Observations:
> 1. The issue mentioned here also seems to bear a similar flavor, because we 
> are not seeing rampant data loss given the frequency of these exceptions in 
> the logs. So, it's more likely an accounting issue, but I could be wrong.
> 2. The frequency with which this happens on a scan-heavy workload is at 
> least one order of magnitude higher than on a mixed workload.
> Stack traces:
> {Code}
> WARN backup.HFileArchiver: Failed to archive class 
> org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, 
> file:hdfs: because it does not exist! Skipping and continuing on.
> java.io.FileNotFoundException: File/Directory // does not exist.
>   at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setTimes(FSDirAttrOp.java:121)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setTimes(FSNamesystem.java:1910)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setTimes(NameNodeRpcServer.java:1223)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setTimes(ClientNamenodeProtocolServerSideTranslatorPB.java:915)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)
>   at sun.reflect.GeneratedConstructorAccessor55.newInstance(Unknown Source)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>   at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>   at org.apache.hadoop.hdfs.DFSClient.setTimes(DFSClient.java:3115)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$30.doCall(DistributedFileSystem.java:1520)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$30.doCall(DistributedFileSystem.java:1516)
>   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.setTimes(DistributedFileSystem.java:1530)
>   at org.apache.hadoop.fs.FilterFileSystem.setTimes(FilterFileSystem.java:496)
>   at org.apache.hadoop.hbase.util.FSUtils.renameAndSetModifyTime(FSUtils.java:1805)
>   at org.apache.hadoop.hbase.backup.HFileArchiver$File.moveAndClose(HFileArchiver.java:575)
>   at org.apache.hadoop.hbase.backup.HFileArchiver.resolveAndArchiveFile(HFileArchiver.java:410)
>   at org.apache.hadoop.hbase.backup.HFileArchiver.resolveAndArchive(HFileArchiver.java:320)
>   at org.apache.hadoop.hbase.backup.HFileArchiver.archiveStoreFiles(HFileArchiver.java:242)
>   at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.removeStoreFiles(HRegionFileSystem.java:433)
>   at org.apache.hadoop.hbase.regionserver.HStore.removeCompactedfiles(HStore.java:2723)
>   at org.apache.hadoop.hbase.regionserver.HStore.closeAndArchiveCompactedFiles(HStore.java:2672)
>   at org.apache.hadoop.hbase.regionserver.CompactedHFilesDischargeHandler.process(CompactedHFilesDischargeHandler.java:43)
>   at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)

[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-08-08 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118881#comment-16118881
 ] 

Mikhail Antonov commented on HBASE-18186:
-----------------------------------------

Yes, I saw that on the cluster with the HBASE-16788 patch applied. There's a 
more (subtle) problem somewhere here.

[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-08-08 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118879#comment-16118879
 ] 

Mikhail Antonov commented on HBASE-18186:
-----------------------------------------

Regarding failed splits - those are more a result of the bug rather than a 
scenario. A split fails if we write out a link to a file in the parent region 
that either doesn't exist or gets incorrectly removed (by the compaction 
discharger/archiver?) after the link was created, so during the split you 
can't open the daughter regions due to an IO error.
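
To make that failure mode concrete, here is a tiny standalone sketch (plain 
java.nio, not HBase code; file names are made up) of a dangling link: a 
pointer file is written for the split daughter, the referenced parent file is 
removed afterwards, and resolving the link at region-open time then fails.
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DanglingLinkSketch {
  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("split-sketch");
    Path parentFile = Files.createFile(dir.resolve("parent-hfile"));

    // The daughter region gets a "link": a small file naming the parent's file.
    Path link = dir.resolve("daughter-link");
    Files.writeString(link, parentFile.toString());

    // The archiver incorrectly removes the parent file after the link was created.
    Files.delete(parentFile);

    // Opening the daughter region resolves the link -> FileNotFoundException.
    Path target = Path.of(Files.readString(link));
    if (!Files.exists(target)) {
      throw new IOException("FileNotFoundException: " + target);
    }
  }
}
{code}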

[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-08-08 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118877#comment-16118877
 ] 

Anoop Sam John commented on HBASE-18186:


Do you see these FNFE (take case 1) after fixing HBASE-16788? Or did this 
happen in a test where HBASE-16788 is not in?
Before HBASE-16788, there was a chance that a slow-running 
CompactedHFilesDischarger would run in parallel with a new one; both would 
have the same file path to be archived, and one would succeed while the other 
would get an FNFE. HBASE-16788 added synchronized for 
closeAndArchiveCompactedFiles. Maybe at the top level itself (at the beginning 
of the chore run) we should check and not allow 2 parallel 
CompactedHFilesDischarger chores?
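
A minimal sketch of that idea, assuming a plain Java chore class (the names 
below are illustrative, not actual HBase APIs): let the chore skip its run 
entirely when a previous run is still in flight.
{code}
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard: at the beginning of each chore run, atomically check
// whether another run is still active and bail out if so.
public class NonOverlappingDischargerChore {
  private final AtomicBoolean running = new AtomicBoolean(false);

  public void chore() { // invoked periodically by a scheduler
    if (!running.compareAndSet(false, true)) {
      return; // a previous discharger run is still in progress; skip this one
    }
    try {
      dischargeCompactedFiles();
    } finally {
      running.set(false);
    }
  }

  private void dischargeCompactedFiles() {
    // stand-in for the per-store closeAndArchiveCompactedFiles() work
  }
}
{code}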

[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-08-08 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118876#comment-16118876
 ] 

Mikhail Antonov commented on HBASE-18186:
-----------------------------------------

[~ram_krish]

This should be most visible with a workload with a high write rate (frequent 
flushes/compactions) and a reasonably high rate of *long* scans (I don't know 
whether I can say the same scans never succeed, because at some point the 
client will stop the operation and return an error, or a retry would be made 
at the application level).

I definitely think it's one of the most critical things to fix for 1.3 
and 1.4, so any attention here would be great! Appreciate your help.

[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-08-08 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118716#comment-16118716
 ] 

ramkrishna.s.vasudevan commented on HBASE-18186:


[~mantonov]
Is there any specific scenario where you observe these failures? Does it 
happen every time after a split? You said in one of the comments on the 
mailing list that it is for long-running scans - so does the same scan keep 
failing and never succeed, or are these random in nature and just pop up 
sometimes? Since you had mentioned that the stability of 1.3 is not good 
because of all this store file accounting, I am planning to put some more 
effort into tracking this. (So this problem should be in 1.3 and above.)
[~chia7712] did excellent work with respect to AcidGuarantees; however, those 
issues are not related to the FileNotFoundExceptions.

[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-07-18 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092292#comment-16092292
 ] 

Mikhail Antonov commented on HBASE-18186:
-----------------------------------------

[~ram_krish] oh I see, I had missed that part and thought this was a slightly 
different scenario. That means we're back to square one on that...

[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-07-18 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091401#comment-16091401
 ] 

ramkrishna.s.vasudevan commented on HBASE-18186:


[~mantonov]
I was exactly referring to the same set of steps in my previous comment here, 
but later found that when we actually create a scanner for a user scan/get
{code}
@Override
public KeyValueScanner getScanner(Scan scan,
    final NavigableSet<byte[]> targetCols, long readPt) throws IOException {
  lock.readLock().lock();
  try {
    KeyValueScanner scanner = null;
    if (this.getCoprocessorHost() != null) {
      scanner = this.getCoprocessorHost().preStoreScannerOpen(this, scan, targetCols);
    }
    // The store scanner is created while the read lock is still held.
    scanner = createScanner(scan, targetCols, readPt, scanner);
    return scanner;
  } finally {
    lock.readLock().unlock();
  }
}
{code}
we hold this entire read lock for a longer duration, and hence the above 
problem does not happen. In fact, when I put up the last comment some time 
back, I tried debugging for the exact case and it just did not happen. Maybe 
there is a path where we directly call
{code}
public List<KeyValueScanner> getScanners(boolean cacheBlocks, boolean isGet,
    boolean usePread, boolean isCompaction, ScanQueryMatcher matcher,
    byte[] startRow, byte[] stopRow, long readPt) throws IOException {
{code}
and then we may have that problem, like I recently saw in the case of 
HBASE-18221 (however, that feature is not in branch-1.3).

[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-07-17 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090588#comment-16090588
 ] 

Mikhail Antonov commented on HBASE-18186:
-----------------------------------------

[~ram_krish]

I suspect there's a possible race in the following code; I would be happy to 
be proven wrong on it:

See the HStore#getScanners() call 
(https://github.com/apache/hbase/blob/ee0f148c730e0ae1cb616406166487fba78a2298/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java#L1179);
here we seem to:
* Get the readLock
* Get the list of store files to create StoreScanners for
* Release the readLock
* Actually create store scanners per store file

It appears we can have a race here: compaction can change the list of active 
files between the 3rd and 4th steps, which would lead to scanners that point 
to a missing file. Am I missing something?
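
A minimal, self-contained sketch of that suspected check-then-act 
interleaving (deliberately simplified, not actual HBase code; file names and 
helper methods are made up): the scanner thread snapshots the file list under 
the read lock, the lock is released, and a compaction plus archival runs 
before the files are actually opened.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ScannerRaceSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private List<String> activeFiles = new ArrayList<>(Arrays.asList("hfile-1", "hfile-2"));
  private final List<String> onDisk = new ArrayList<>(Arrays.asList("hfile-1", "hfile-2"));

  // Steps 1-3: snapshot the store file list under the read lock, then release it.
  List<String> snapshotFiles() {
    lock.readLock().lock();
    try {
      return new ArrayList<>(activeFiles);
    } finally {
      lock.readLock().unlock(); // lock released before scanners are created
    }
  }

  // Step 4: open a "scanner" per file, after the lock is gone.
  void openScanners(List<String> files) {
    for (String f : files) {
      if (!onDisk.contains(f)) {
        // This models the FileNotFoundException seen in the region server logs.
        throw new RuntimeException("FileNotFoundException: " + f);
      }
    }
  }

  // Compaction swaps the active file list, then the discharger archives the inputs.
  void compactAndArchive() {
    lock.writeLock().lock();
    try {
      activeFiles = new ArrayList<>(Arrays.asList("hfile-3"));
    } finally {
      lock.writeLock().unlock();
    }
    onDisk.clear();
    onDisk.add("hfile-3");
  }

  public static void main(String[] args) {
    ScannerRaceSketch store = new ScannerRaceSketch();
    List<String> snapshot = store.snapshotFiles(); // [hfile-1, hfile-2]
    store.compactAndArchive();                     // runs in the window after unlock
    store.openScanners(snapshot);                  // fails: snapshot points at archived files
  }
}
{code}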

[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-06-09 Thread Ashu Pachauri (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045273#comment-16045273
 ] 

Ashu Pachauri commented on HBASE-18186:
---------------------------------------

bq. Can you confirm if this FileNotFoundException is happening repeatedly for 
the same set of files?
I picked a couple of regionservers facing this... it's not happening 
repeatedly for the same set of files (or at least I could not find such a case).

[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-06-09 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044291#comment-16044291
 ] 

ramkrishna.s.vasudevan commented on HBASE-18186:


Will try the above sequence in a different way. Will be back.

[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-06-09 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044283#comment-16044283
 ] 

ramkrishna.s.vasudevan commented on HBASE-18186:


After debugging, I found that this case is not possible, because 
HStore#getScanner(Scan scan, final NavigableSet<byte[]> targetCols, long readPt) 
takes a read lock until the StoreScanner is created. That lock is held for a 
considerably longer duration, so the above theory does not apply. If the lock 
had not been there, the above theory would be valid and could cause data loss. 
So let's see why we still get that FileNotFoundException.
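
As a minimal illustration (plain Java, not HBase code) of why holding the 
read lock across scanner creation closes the window: a writer that needs the 
write lock to swap or archive store files must wait until the reader 
releases it.
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadLockWindow {
  public static void main(String[] args) throws InterruptedException {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Scanner thread: hold the read lock while the scanner is being created.
    lock.readLock().lock();

    Thread archiver = new Thread(() -> {
      lock.writeLock().lock(); // blocks until the read lock is released
      try {
        System.out.println("file list swapped/archived only after scanner creation");
      } finally {
        lock.writeLock().unlock();
      }
    });
    archiver.start();

    Thread.sleep(100); // during this window the archiver cannot proceed
    System.out.println("scanner created; releasing read lock");
    lock.readLock().unlock();
    archiver.join();
  }
}
{code}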

[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-06-09 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044164#comment-16044164
 ] 

Anoop Sam John commented on HBASE-18186:


So if the flow happens as you said above, will the read op be missing some 
data? Because at Step #1 the compaction output file was not present.


[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-06-09 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044134#comment-16044134
 ] 

ramkrishna.s.vasudevan commented on HBASE-18186:


Between this and HBASE-17406 I can see another way this could happen - mainly 
the stack trace of FileNotFoundException at the time of opening the scanners:
-> Thread 1 creates a scan; it acquires the HStore's read lock and collects 
all the store files over which the StoreFileScanners have to be created. It 
then releases the read lock, but the scanners are not yet created.
-> Thread 2 - a compaction completes in which the store files involved in 
Thread 1's scan are compacted. It acquires the write lock to update the 
compacted files, and those files are marked compactedAway.
-> Thread 3 (the CompactedHFilesDischargeHandler) starts and acquires the read 
lock to find the list of compacted files. It sees all the files compacted away 
by Thread 2 and proceeds to archive them. At this point the refCount on those 
store files is still 0, so the discharger sees files marked compactedAway with 
refCount 0 and removes them for archiving.

-> Now Thread 1 continues and increments the refCount to 1 on the store files 
it will use for scanning. But by this time the files could already have been 
archived.
I am not sure a test case is possible here, but let me check with some debug 
points whether the above can be reproduced. However the solution is simple: we 
could extend the read lock in Thread 1 to be held until the StoreFileScanners 
are created (see the sketch below), because the only costly step there is the 
creation of readers, and that is already done when the store files are 
actually compacted/flushed.
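
For illustration, a minimal sketch of that proposal, with hypothetical, 
simplified signatures (the real HStore/StoreFileScanner methods take more 
parameters and differ across versions):
{code}
// Hypothetical, simplified sketch inside HStore -- not the actual method.
// The essential change: hold the store's read lock from the moment the
// store files are collected until the StoreFileScanners (and hence their
// refCounts) exist. A compaction cannot take the write lock to mark files
// compactedAway in between, so by the time the discharger runs, any file a
// scanner uses already has refCount > 0 and is skipped for archiving.
List<StoreFileScanner> getScanners(boolean cacheBlocks, boolean usePread,
    boolean isCompaction, long readPt) throws IOException {
  this.lock.readLock().lock();
  try {
    Collection<StoreFile> filesToScan =
        this.storeEngine.getStoreFileManager().getStorefiles();
    // Creating the scanners increments each file's refCount while the read
    // lock is still held, closing the window described above.
    return StoreFileScanner.getScannersForStoreFiles(filesToScan, cacheBlocks,
        usePread, isCompaction, readPt);
  } finally {
    this.lock.readLock().unlock();
  }
}
{code}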


[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-06-09 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044072#comment-16044072
 ] 

ramkrishna.s.vasudevan commented on HBASE-18186:


bq. java.io.FileNotFoundException: File/Directory // does not exist.
[~ashu210890] - Can you confirm if this FileNotFoundException is happening 
repeatedly for the same set of files? 


[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-06-08 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043940#comment-16043940
 ] 

ramkrishna.s.vasudevan commented on HBASE-18186:


I verified the test case TestCompactionArchiveIOException. It seems to be 
doing the right thing, but if we specifically get a FileNotFoundException I 
think we can be sure the file has already been archived, and we can remove it 
from the compactedFiles list as well.
In fact, to confirm this, shall we do a file.exists() check in the 'archive' 
path?
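
For illustration, a hedged sketch of that exists() check (a hypothetical 
helper; the real HFileArchiver archive path is more involved):
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch of the suggested pre-check, not the actual
// HFileArchiver code: skip files that are already gone instead of letting
// the archive attempt fail with FileNotFoundException.
public final class ArchiveCheck {
  static void archiveIfPresent(FileSystem fs, Path storeFilePath,
      Path archivePath) throws IOException {
    if (!fs.exists(storeFilePath)) {
      // Already archived (or removed) by someone else; treat as done and
      // drop it from the compactedFiles list rather than retrying forever.
      return;
    }
    if (!fs.rename(storeFilePath, archivePath)) {
      throw new IOException("Failed to archive " + storeFilePath);
    }
  }
}
{code}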


[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-06-08 Thread Ashu Pachauri (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043190#comment-16043190
 ] 

Ashu Pachauri commented on HBASE-18186:
---

bq. If you are not looking into this I can take this up
Not looking into this at the moment. But I think [~mantonov] is digging into 
HBASE-17406, which may overlap with this.


[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-06-08 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042994#comment-16042994
 ] 

ramkrishna.s.vasudevan commented on HBASE-18186:


The second stack trace is because we do FileStatus.list() to get the 
modification time, so there is a chance the file was archived in between.
{code}
catch (FailedArchiveException fae) {
  // Even if archiving some files failed, we still need to clear out any of the
  // files which were successfully archived. Otherwise we will receive a
  // FileNotFoundException when we attempt to re-archive them in the next go around.
  Collection<Path> failedFiles = fae.getFailedFiles();
  Iterator<StoreFile> iter = filesToRemove.iterator();
  while (iter.hasNext()) {
    if (failedFiles.contains(iter.next().getPath())) {
      iter.remove();
    }
  }
  if (!filesToRemove.isEmpty()) {
    clearCompactedfiles(filesToRemove);
  }
  throw fae;
{code}
This code is in HStore#removeCompactedfiles(): if we get a 
FileNotFoundException for a set of files, then we should simply clear the 
compacted-file state with the full filesToRemove list. If we drop some items 
from 'filesToRemove', those files will still be picked up again next time, 
right? Am I missing something? (A sketch of the alternative follows.)
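
For illustration, a hedged sketch of the alternative being suggested 
(hypothetical and simplified; the real HStore catch block differs):
{code}
catch (FailedArchiveException fae) {
  // Hypothetical alternative, not the actual HStore code: when the archive
  // failures are FileNotFoundExceptions the files are already gone, so clear
  // the compacted-file state for everything in filesToRemove instead of
  // leaving the "failed" entries to be picked up (and to fail) again on the
  // next discharger run.
  if (!filesToRemove.isEmpty()) {
    clearCompactedfiles(filesToRemove);
  }
  throw fae;
}
{code}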


[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-06-07 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042193#comment-16042193
 ] 

ramkrishna.s.vasudevan commented on HBASE-18186:


[~ashu210890]
If you are not looking into this, I can take it up. I could probably share my 
analysis here by the beginning of next week.


[jira] [Commented] (HBASE-18186) Frequent FileNotFoundExceptions in region server logs

2017-06-07 Thread Ashu Pachauri (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041500#comment-16041500
 ] 

Ashu Pachauri commented on HBASE-18186:
---

Linking HBASE-13082 on suspicion, based on the similarity of the code paths 
affected by this in the 1.3 releases.
