[ 
https://issues.apache.org/jira/browse/HBASE-12377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-12377:
---------------------------------------
    Attachment: HBASE-12377.v1-2.0.patch

HBaseAdmin.deleteTable implemented its own logic to obtain all regions of a 
table. That logic is not robust: NotServingRegionException was not handled 
for retry, a stale meta cache was used, etc.  This patch replaces it with the 
proven MetaScanner.listTableRegionLocations method to find all non-archived 
regions of a table.
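
For readers following along, here is a minimal, self-contained sketch of the 
retry pattern the issue calls for. It is not the code in the patch and it 
does not use HBase classes: MetaRetrySketch, MetaLocator, 
RegionNotServingException and callWithRetries are hypothetical stand-ins. It 
only illustrates the two points from the description: the exception must not 
be rethrown before the retry can happen, and each retry should bypass the 
possibly stale cached location.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class MetaRetrySketch {
    /** Hypothetical stand-in for NotServingRegionException (not the HBase class). */
    public static class RegionNotServingException extends RuntimeException {
        public RegionNotServingException(String msg) { super(msg); }
    }

    /** Hypothetical locator: a cached answer may be stale, a non-cached one is fresh. */
    public interface MetaLocator {
        String locate(boolean useCache);
    }

    /**
     * Runs the call against the located server. On RegionNotServingException
     * the failure is remembered rather than rethrown, and the next attempt
     * looks the location up again without the cache.
     */
    public static <T> T callWithRetries(MetaLocator locator,
                                        Function<String, T> call,
                                        int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RegionNotServingException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Only the first attempt may trust the cache.
            String server = locator.locate(attempt == 0);
            try {
                return call.apply(server);
            } catch (RegionNotServingException e) {
                last = e; // keep the error and retry with a fresh location
            }
        }
        throw last; // all attempts failed
    }

    public static void main(String[] args) {
        // Assumed scenario: the cache still points at rs1, meta now lives on rs2.
        MetaLocator locator = useCache -> useCache ? "rs1" : "rs2";
        List<String> tried = new ArrayList<>();
        String result = callWithRetries(locator, server -> {
            tried.add(server);
            if (!server.equals("rs2")) {
                throw new RegionNotServingException("not online on " + server);
            }
            return "scanned via " + server;
        }, 3);
        System.out.println(result + " after trying " + tried);
    }
}
```

In the scenario from the report (meta online on RS1 at the start, moved to 
RS2 during the delete), the first attempt fails against the cached server 
and the second attempt reaches the new one.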

> HBaseAdmin#deleteTable fails when META region is moved around the same 
> timeframe
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-12377
>                 URL: https://issues.apache.org/jira/browse/HBASE-12377
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 0.98.4
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>             Fix For: 2.0.0, 0.98.8, 0.99.2
>
>         Attachments: HBASE-12377.v1-2.0.patch
>
>
> This is the same issue that HBASE-10809 tried to address.  The fix for 
> HBASE-10809 refetches the latest meta location in the retry loop.  However, 
> there are 2 problems: (1) inside the retry loop, there is another try-catch 
> block that throws the exception before the retry can kick in; (2) it looks 
> like HBaseAdmin::getFirstMetaServerForTable() always reads meta data from 
> the meta cache, so if the meta cache is stale, retries cannot solve the 
> problem because they keep fetching from the stale cache.
> Here is the call stack of the issue:
> {noformat}
> 2014-10-27 10:11:58,495|beaver.machine|INFO|18218|140065036261120|MainThread|org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region hbase:meta,,1 is not online on ip-172-31-0-48.ec2.internal,60020,1414403435009
> 2014-10-27 10:11:58,496|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2774)
> 2014-10-27 10:11:58,496|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4257)
> 2014-10-27 10:11:58,497|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3156)
> 2014-10-27 10:11:58,497|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29994)
> 2014-10-27 10:11:58,498|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2078)
> 2014-10-27 10:11:58,498|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> 2014-10-27 10:11:58,499|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> 2014-10-27 10:11:58,499|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> 2014-10-27 10:11:58,499|beaver.machine|INFO|18218|140065036261120|MainThread|at java.lang.Thread.run(Thread.java:745)
> 2014-10-27 10:11:58,500|beaver.machine|INFO|18218|140065036261120|MainThread|
> 2014-10-27 10:11:58,500|beaver.machine|INFO|18218|140065036261120|MainThread|at sun.reflect.GeneratedConstructorAccessor12.newInstance(Unknown Source)
> 2014-10-27 10:11:58,500|beaver.machine|INFO|18218|140065036261120|MainThread|at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 2014-10-27 10:11:58,501|beaver.machine|INFO|18218|140065036261120|MainThread|at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> 2014-10-27 10:11:58,501|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
> 2014-10-27 10:11:58,502|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
> 2014-10-27 10:11:58,502|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:306)
> 2014-10-27 10:11:58,502|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.client.HBaseAdmin.deleteTable(HBaseAdmin.java:699)
> 2014-10-27 10:11:58,503|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.client.HBaseAdmin.deleteTable(HBaseAdmin.java:654)
> 2014-10-27 10:11:58,503|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.IntegrationTestManyRegions.tearDown(IntegrationTestManyRegions.java:99)
> {noformat}
> The META region was online on RS1 when the delete table operation started, 
> and it was moved to RS2 during the operation, which triggered the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
