[ 
https://issues.apache.org/jira/browse/HBASE-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-7651:
----------------------------------

    Attachment: hbase-7651.patch

First cut of the patch.  Still need to add a unit test.  Something very similar 
to this has been tested on a 20-node cluster for several hours. Not all 
snapshots succeed, but the snapshotting mechanism no longer gets stuck.
                
> RegionServerSnapshotManager fails with CancellationException if previous 
> snapshot fails in per region task
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7651
>                 URL: https://issues.apache.org/jira/browse/HBASE-7651
>             Project: HBase
>          Issue Type: Sub-task
>          Components: snapshots
>    Affects Versions: hbase-7290
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>            Priority: Blocker
>         Attachments: hbase-7651.patch
>
>
> I've reproduced this problem consistently on a 20-node cluster.
> In the first run, one node (jon-snapshots-2 in this case) fails to take the 
> snapshot due to a NotServingRegionException (this is acceptable):
> {code}
> 2013-01-23 13:32:48,631 DEBUG 
> org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher:  accepting 
> received exception
> org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via 
> jon-snapshots-2.ent.cloudera.com,22101,1358976524369:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable:
>  org.apache.hadoop.hbase.NotServingRegionException: 
> TestTable,0002493652,1358976652443.b858147ad87a7812ac9a73dd8fef36ad. is 
> closing
>         at 
> org.apache.hadoop.hbase.errorhandling.ForeignException.deserialize(ForeignException.java:184)
>         at 
> org.apache.hadoop.hbase.procedure.ZKProcedureCoordinatorRpcs.abort(ZKProcedureCoordinatorRpcs.java:240)
>         at 
> org.apache.hadoop.hbase.procedure.ZKProcedureCoordinatorRpcs$1.nodeCreated(ZKProcedureCoordinatorRpcs.java:182)
>         at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:294)
>         at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>         at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> Caused by: 
> org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: 
> org.apache.hadoop.hbase.NotServingRegionException: 
> TestTable,0002493652,1358976652443.b858147ad87a7812ac9a73dd8fef36ad. is 
> closing
>         at 
> org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:343)
>         at 
> org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:107)
>         at 
> org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:123)
>         at 
> org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181)
>         at 
> org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:52)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2013-01-23 13:32:48,631 DEBUG 
> org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher:  Recieved 
> error, notifying listeners...
> 2013-01-23 13:32:48,730 ERROR org.apache.hadoop.hbase.procedure.Procedure: 
> Procedure 'pe-6' execution failed!
> org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via 
> jon-snapshots-2.ent.cloudera.com,22101,1358976524369:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable:
>  org.apache.hadoop.hbase.NotServingRegionException: 
> TestTable,0002493652,1358976652443.b858147ad87a7812ac9a73dd8fef36ad. is 
> closing
>         at 
> org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:84)
>         at 
> org.apache.hadoop.hbase.procedure.Procedure.waitForLatch(Procedure.java:357)
>         at 
> org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:203)
>         at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:68)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: 
> org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: 
> org.apache.hadoop.hbase.NotServingRegionException: 
> TestTable,0002493652,1358976652443.b858147ad87a7812ac9a73dd8fef36ad. is 
> closing
>         at 
> org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:343)
>         at 
> org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:107)
>         at 
> org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:123)
>         at 
> org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181)
>         at 
> org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:52)
>         ... 5 more
> {code}
> Subsequent snapshot attempts that require jon-snapshots-2 to participate fail 
> like this:
> {code}
> org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via 
> jon-snapshots-2.ent.cloudera.com,22101,1358976524369:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable:
>  java.util.concurrent.CancellationException
>         at 
> org.apache.hadoop.hbase.errorhandling.ForeignException.deserialize(ForeignException.java:184)
>         at 
> org.apache.hadoop.hbase.procedure.ZKProcedureCoordinatorRpcs.abort(ZKProcedureCoordinatorRpcs.java:240)
>         at 
> org.apache.hadoop.hbase.procedure.ZKProcedureCoordinatorRpcs$1.nodeCreated(ZKProcedureCoordinatorRpcs.java:182)
>         at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:294)
>         at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>         at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> Caused by: 
> org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: 
> java.util.concurrent.CancellationException
>         at 
> org.apache.hadoop.hbase.procedure.Subprocedure.cancel(Subprocedure.java:270)
>         at 
> org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:202)
>         at 
> org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:52)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2013-01-23 13:32:59,557 DEBUG 
> org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher:  Recieved 
> error, notifying listeners...
> 2013-01-23 13:32:59,810 ERROR org.apache.hadoop.hbase.procedure.Procedure: 
> Procedure 'pe-7' execution failed!
> org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via 
> jon-snapshots-2.ent.cloudera.com,22101,1358976524369:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable:
>  java.util.concurrent.CancellationException
>         at 
> org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:84)
>         at 
> org.apache.hadoop.hbase.procedure.Procedure.waitForLatch(Procedure.java:357)
>         at 
> org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:203)
>         at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:68)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: 
> org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: 
> java.util.concurrent.CancellationException
>         at 
> org.apache.hadoop.hbase.procedure.Subprocedure.cancel(Subprocedure.java:270)
>         at 
> org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:202)
>         at 
> org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:52)
>         ... 5 more
> {code}
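
The pattern in the logs above (one run fails in a per-region task, later runs on the same server fail with CancellationException) is the kind of symptom a long-lived task pool can produce when futures cancelled in one run are still queued for the next. The sketch below only illustrates that general mechanism with plain java.util.concurrent classes. It is an assumption for illustration, not code from HBase or from the attached patch, and the names StaleFutureSketch, ReusedPool, drainOnFailure and secondRunOutcome are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch (Java 8+); none of these names come from HBase.
class StaleFutureSketch {

    /** A task pool that is reused across runs, as a long-lived per-server pool would be. */
    static class ReusedPool {
        // A single worker thread keeps the demo deterministic.
        private final ExecutorService executor = Executors.newSingleThreadExecutor();
        private final ExecutorCompletionService<String> done =
            new ExecutorCompletionService<>(executor);
        private final List<Future<String>> outstanding = new ArrayList<>();
        private final boolean drainOnFailure;

        ReusedPool(boolean drainOnFailure) {
            this.drainOnFailure = drainOnFailure;
        }

        void submit(Callable<String> task) {
            outstanding.add(done.submit(task));
        }

        /** Waits for all tasks submitted since the last call; fails on the first error. */
        void waitForOutstandingTasks() throws InterruptedException, ExecutionException {
            int expected = outstanding.size();
            int taken = 0;
            try {
                while (taken < expected) {
                    Future<String> f = done.take();
                    taken++;
                    f.get(); // throws ExecutionException, or CancellationException if stale
                }
            } catch (ExecutionException e) {
                // One task failed: cancel the siblings from this run.
                for (Future<String> f : outstanding) {
                    f.cancel(true);
                }
                if (drainOnFailure) {
                    // Cancelled futures still reach the completion queue; remove them
                    // here so that the next run does not pick them up.
                    for (; taken < expected; taken++) {
                        done.take();
                    }
                }
                throw e;
            } finally {
                outstanding.clear();
            }
        }

        void shutdown() {
            executor.shutdownNow();
        }
    }

    /** Runs a failing batch, then a healthy one, and reports how the healthy one ended. */
    static String secondRunOutcome(boolean drainOnFailure) throws InterruptedException {
        ReusedPool pool = new ReusedPool(drainOnFailure);
        try {
            // Run 1: the first task fails, so its slow sibling gets cancelled.
            pool.submit(() -> { throw new IllegalStateException("region is closing"); });
            pool.submit(() -> { Thread.sleep(10_000); return "slow"; });
            try {
                pool.waitForOutstandingTasks();
            } catch (ExecutionException expectedFailure) {
                // Run 1 is allowed to fail.
            }
            // Run 2: a single healthy task.
            pool.submit(() -> "ok");
            try {
                pool.waitForOutstandingTasks();
                return "ok";
            } catch (CancellationException e) {
                return "CancellationException";
            } catch (ExecutionException e) {
                return "ExecutionException";
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("without drain: " + secondRunOutcome(false));
        System.out.println("with drain:    " + secondRunOutcome(true));
    }
}
```

Without the drain step, the healthy second run takes the future that was cancelled during the first run and reports CancellationException. With the drain step, the second run only sees its own task. Whether the patch takes this approach or a different one (for example, a fresh pool per snapshot) is not something this sketch claims.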

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
