[ https://issues.apache.org/jira/browse/HBASE-19681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anirban Roy updated HBASE-19681:
--------------------------------
    Attachment: region-server-missing file-log.doc

> Online snapshot creation failing with missing store file
> --------------------------------------------------------
>
>                 Key: HBASE-19681
>                 URL: https://issues.apache.org/jira/browse/HBASE-19681
>             Project: HBase
>          Issue Type: Bug
>          Components: backup&restore, snapshots
>    Affects Versions: 1.3.0
>         Environment: Hadoop - 2.7.3
> HBase 1.3.0
> OS - GNU/Linux x86_64
> Cluster - Amazon Elastic MapReduce
>            Reporter: Anirban Roy
>         Attachments: region-server-missing file-log.doc
>
>
> We are having trouble creating an online snapshot of our HBase table. The table
> contains 20 TB of data and receives ~10,000 writes per second. Snapshot creation
> fails intermittently with an error saying that an HFile is missing (see the detailed
> output below). If we locate the region server hosting the affected region and restart
> it, snapshot creation succeeds. It appears that a minor compaction removed the missing
> HFile, but the region server still holds a reference to the file.
> [hadoop@ip-10-0-12-164 ~]$ hbase shell
> HBase Shell; enter 'help<RETURN>' for list of supported commands.
> Type "exit<RETURN>" to leave the HBase Shell
> Version 1.3.0, rUnknown, Fri Feb 17 18:15:07 UTC 2017
>
> hbase(main):001:0> snapshot 'x_table', 'x_snapshot'
>
> ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot {
> ss=x_snapshot table=x_table type=FLUSH } had an error.
> Procedure x_snapshot
> { waiting=[] done=[ip-10-0-9-31.ec2.internal,16020,1508372578254,
> ip-10-0-0-32.ec2.internal,16020,1508372591059,
> ip-10-0-14-221.ec2.internal,16020,1508372580873,
> ip-10-0-15-185.ec2.internal,16020,1508372588507,
> ip-10-0-9-43.ec2.internal,16020,1508372569107,
> ip-10-0-10-62.ec2.internal,16020,1512885921693,
> ip-10-0-8-216.ec2.internal,16020,1508372584133,
> ip-10-0-1-207.ec2.internal,16020,1508372580144,
> ip-10-0-0-173.ec2.internal,16020,1508372584969,
> ip-10-0-4-79.ec2.internal,16020,1508372587161,
> ip-10-0-3-165.ec2.internal,16020,1508372593566,
> ip-10-0-14-137.ec2.internal,16020,1508372583225,
> ip-10-0-6-33.ec2.internal,16020,1508372581587,
> ip-10-0-15-199.ec2.internal,16020,1508372587478,
> ip-10-0-5-253.ec2.internal,16020,1508372581243,
> ip-10-0-1-99.ec2.internal,16020,1508372609684] }
>     at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:354)
>     at org.apache.hadoop.hbase.master.MasterRpcServices.isSnapshotDone(MasterRpcServices.java:1058)
>     at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:61089)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2328)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via ip-10-0-3-13.ec2.internal,16020,1508372563772:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: java.io.FileNotFoundException: File does not exist: hdfs://ip-10-0-12-164.ec2.internal:8020/user/hbase/data/default/x_table/ecbb3aeaf7c5b1f65742deab5812362c/d/f76d8827c29244b99bf9344982956523
>     at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
>     at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:315)
>     at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:344)
>     ... 6 more
> Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: java.io.FileNotFoundException: File does not exist: hdfs://ip-10-0-12-164.ec2.internal:8020/user/hbase/data/default/x_table/ecbb3aeaf7c5b1f65742deab5812362c/d/f76d8827c29244b99bf9344982956523
>     at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:347)
>     at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:140)
>     at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:160)
>     at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:187)
>     at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:53)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
>
> Here is some help for this command:
> Take a snapshot of specified table. Examples:
>
>   hbase> snapshot 'sourceTable', 'snapshotName'
>   hbase> snapshot 'namespace:sourceTable', 'snapshotName', {SKIP_FLUSH => true}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
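The reporter suspects that a minor compaction deleted the HFile while the region server kept a stale reference to it. The Java sketch below is a toy model of that hypothesis only. It uses local temp files instead of HDFS and is not HBase code. The names `Store`, `compact`, `snapshot` and `run` are hypothetical. The model shows that a snapshot step walking a cached file list fails once a compaction has deleted those files and the list was not refreshed. NIO reports a missing file as `NoSuchFileException`, whereas the HDFS client in the trace above raises `java.io.FileNotFoundException`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Toy model only (NOT HBase code): a store caches its file paths, a compaction
// deletes the old files, and a snapshot step then reads every cached path.
class StaleReferenceDemo {

    // Hypothetical stand-in for a store: only a cached list of its file paths.
    static final class Store {
        final List<Path> cachedFiles = new ArrayList<>();
    }

    // Merges every cached file into one new file, then deletes the inputs.
    // With refreshCache == false the store keeps pointing at the deleted inputs.
    static void compact(Store store, Path dir, boolean refreshCache) throws IOException {
        Path merged = Files.createTempFile(dir, "compacted-", ".hfile");
        for (Path p : store.cachedFiles) {
            Files.write(merged, Files.readAllBytes(p), StandardOpenOption.APPEND);
            Files.delete(p);
        }
        if (refreshCache) {
            store.cachedFiles.clear();
            store.cachedFiles.add(merged);
        }
    }

    // Mimics a snapshot step that must stat every referenced file.
    static int snapshot(Store store) throws IOException {
        int count = 0;
        for (Path p : store.cachedFiles) {
            Files.size(p); // throws NoSuchFileException if p was deleted
            count++;
        }
        return count;
    }

    // Runs one scenario and returns "ok:<files>" or "missing".
    // Temp files are left in the system temp directory for simplicity.
    static String run(boolean refreshCache) {
        try {
            Path dir = Files.createTempDirectory("store-");
            Store store = new Store();
            for (int i = 0; i < 3; i++) {
                Path f = Files.createTempFile(dir, "hfile-", ".hfile");
                Files.write(f, new byte[] {(byte) i});
                store.cachedFiles.add(f);
            }
            compact(store, dir, refreshCache);
            try {
                return "ok:" + snapshot(store);
            } catch (NoSuchFileException e) {
                return "missing";
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(true));  // references refreshed after compaction
        System.out.println(run(false)); // stale references to deleted files
    }
}
```

Running `main` prints `ok:1` and then `missing`. In this model, rebuilding the cached list from the files that actually exist clears the error. That is consistent with the reported restart workaround, but the report does not confirm that this is the real cause.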