[ 
https://issues.apache.org/jira/browse/HBASE-19681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirban Roy updated HBASE-19681:
--------------------------------
    Attachment: region-server-snapshot-exception-log.doc

Attached is a log snippet from the region server where the exception is thrown while 
creating the online snapshot. Can someone explain what is going on here? 

> Online snapshot creation failing with missing store file
> --------------------------------------------------------
>
>                 Key: HBASE-19681
>                 URL: https://issues.apache.org/jira/browse/HBASE-19681
>             Project: HBase
>          Issue Type: Bug
>          Components: backup&restore, Performance, scaling, snapshots
>    Affects Versions: 1.3.0
>         Environment: Hadoop - 2.7.3
> HBase 1.3.0
> OS - GNU/Linux x86_64
> Cluster - Amazon Elastic Mapreduce
>            Reporter: Anirban Roy
>         Attachments: region-server-missing file-log.doc, 
> region-server-snapshot-exception-log.doc
>
>
> We are facing a problem creating an online snapshot of our HBase table. The table 
> contains 20 TB of data and receives ~10,000 writes per second. Snapshot creation fails 
> intermittently with an error that an hfile is missing; see the detailed output below. 
> Once we locate the region server hosting the affected region and restart that region 
> server, snapshot creation succeeds. It seems the missing hfile was removed by a minor 
> compaction, but the region server still holds a reference to the file. (A rough retry 
> sketch against the Java client API is included after the shell output below.)
> [hadoop@ip-10-0-12-164 ~]$ hbase shell
> HBase Shell; enter 'help<RETURN>' for list of supported commands.
> Type "exit<RETURN>" to leave the HBase Shell
> Version 1.3.0, rUnknown, Fri Feb 17 18:15:07 UTC 2017
>  
> hbase(main):001:0> snapshot 'x_table', 'x_snapshot'
>  
> ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { 
> ss=x_snapshot table=x_table type=FLUSH } had an error.  Procedure x_snapshot 
> { waiting=[] done=[ip-10-0-9-31.ec2.internal,16020,1508372578254, 
> ip-10-0-0-32.ec2.internal,16020,1508372591059, 
> ip-10-0-14-221.ec2.internal,16020,1508372580873, 
> ip-10-0-15-185.ec2.internal,16020,1508372588507, 
> ip-10-0-9-43.ec2.internal,16020,1508372569107, 
> ip-10-0-10-62.ec2.internal,16020,1512885921693, 
> ip-10-0-8-216.ec2.internal,16020,1508372584133, 
> ip-10-0-1-207.ec2.internal,16020,1508372580144, 
> ip-10-0-0-173.ec2.internal,16020,1508372584969, 
> ip-10-0-4-79.ec2.internal,16020,1508372587161, 
> ip-10-0-3-165.ec2.internal,16020,1508372593566, 
> ip-10-0-14-137.ec2.internal,16020,1508372583225, 
> ip-10-0-6-33.ec2.internal,16020,1508372581587, 
> ip-10-0-15-199.ec2.internal,16020,1508372587478, 
> ip-10-0-5-253.ec2.internal,16020,1508372581243, 
> ip-10-0-1-99.ec2.internal,16020,1508372609684] }
>         at 
> org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:354)
>         at 
> org.apache.hadoop.hbase.master.MasterRpcServices.isSnapshotDone(MasterRpcServices.java:1058)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:61089)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2328)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> Caused by: 
> org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via 
> ip-10-0-3-13.ec2.internal,16020,1508372563772:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable:
>  java.io.FileNotFoundException: File does not exist: 
> hdfs://ip-10-0-12-164.ec2.internal:8020/user/hbase/data/default/x_table/ecbb3aeaf7c5b1f65742deab5812362c/d/f76d8827c29244b99bf9344982956523
>         at 
> org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
>         at 
> org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:315)
>         at 
> org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:344)
>         ... 6 more
> Caused by: 
> org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: 
> java.io.FileNotFoundException: File does not exist: 
> hdfs://ip-10-0-12-164.ec2.internal:8020/user/hbase/data/default/x_table/ecbb3aeaf7c5b1f65742deab5812362c/d/f76d8827c29244b99bf9344982956523
>         at 
> org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:347)
>         at 
> org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:140)
>         at 
> org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:160)
>         at 
> org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:187)
>         at 
> org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:53)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>  
> Here is some help for this command:
> Take a snapshot of specified table. Examples:
>  
>   hbase> snapshot 'sourceTable', 'snapshotName'
>   hbase> snapshot 'namespace:sourceTable', 'snapshotName', {SKIP_FLUSH => 
> true}
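>
> Below is a minimal retry sketch of the workaround against the plain HBase 1.3 Java 
> client (Admin API). It is an illustration only: the names x_table/x_snapshot mirror the 
> masked names above, and the retry count and back-off are assumptions, not values we 
> actually run with.
>
>   import java.io.IOException;
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.hbase.HBaseConfiguration;
>   import org.apache.hadoop.hbase.TableName;
>   import org.apache.hadoop.hbase.client.Admin;
>   import org.apache.hadoop.hbase.client.Connection;
>   import org.apache.hadoop.hbase.client.ConnectionFactory;
>   import org.apache.hadoop.hbase.snapshot.HBaseSnapshotException;
>
>   public class SnapshotWithRetry {
>     public static void main(String[] args) throws Exception {
>       Configuration conf = HBaseConfiguration.create();
>       try (Connection conn = ConnectionFactory.createConnection(conf);
>            Admin admin = conn.getAdmin()) {
>         int maxAttempts = 3;  // assumption: the failure is transient and clears on retry
>         for (int attempt = 1; attempt <= maxAttempts; attempt++) {
>           try {
>             // FLUSH-type snapshot, same as the shell's snapshot 'x_table', 'x_snapshot'
>             admin.snapshot("x_snapshot", TableName.valueOf("x_table"));
>             System.out.println("Snapshot succeeded on attempt " + attempt);
>             return;
>           } catch (HBaseSnapshotException e) {
>             System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
>             // Drop any partially registered snapshot so the name can be reused.
>             try {
>               admin.deleteSnapshot("x_snapshot");
>             } catch (IOException ignored) {
>               // nothing was registered under that name; safe to ignore
>             }
>             if (attempt == maxAttempts) {
>               throw e;
>             }
>             Thread.sleep(30000L);  // back off before retrying
>           }
>         }
>       }
>     }
>   }
>
> The deleteSnapshot call before each retry is just defensive cleanup; if the master never 
> registered the snapshot, the call throws and is ignored.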



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
