[ 
https://issues.apache.org/jira/browse/HBASE-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Honghua updated HBASE-10595:
---------------------------------

    Attachment: HBASE-10595-trunk_v2.patch

New patch that 'fixes' the previously failing TestMasterObserver case.

The cause of the TestMasterObserver failure is similar to that of 
TestAssignmentManagerOnCluster#testMoveRegionOfDeletedTable: 
HBaseAdmin.deleteTable is 'synchronous' to the client only in that, after 
asking the master to delete a table, it returns once the table descriptor can 
no longer be retrieved from the master. But DeleteTableHandler is processed 
asynchronously in the master, and steps such as 'clearing the table descriptor 
cache', 'removing regions from RegionStates' and 'calling all coprocessors' 
postDeleteTableHandler' are all done *after* removing the table dir. (With this 
patch, it is the removal of the table dir, not the clearing of the table 
descriptor cache, that makes the client unable to get the table descriptor and 
so believe the table is deleted.)
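
To illustrate the idea that a missing table dir should win over a cached 
descriptor, here is a minimal sketch. It is not the patch itself and not the 
FSTableDescriptors code: the class DirCheckedDescriptorCache, its methods, the 
use of plain strings as descriptors, and the use of java.nio.file on a local 
directory instead of hdfs are all assumptions made for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a descriptor cache; NOT HBase's FSTableDescriptors. */
public final class DirCheckedDescriptorCache {
    private final Path rootDir;                       // assumed layout: <rootDir>/<tableName>
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    public DirCheckedDescriptorCache(Path rootDir) {
        this.rootDir = rootDir;
    }

    /** Stores a descriptor (a plain string here, for illustration only). */
    public void put(String table, String descriptor) {
        cache.put(table, descriptor);
    }

    /**
     * Returns the cached descriptor only while the table dir still exists.
     * If the dir is gone, the stale entry is evicted and nothing is returned.
     */
    public Optional<String> get(String table) {
        if (!Files.isDirectory(rootDir.resolve(table))) {
            cache.remove(table);
            return Optional.empty();
        }
        return Optional.ofNullable(cache.get(table));
    }
}
```

With a check like this, a lookup stops succeeding as soon as the dir 
disappears, regardless of when the cache entry would otherwise be cleared.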

Before this patch, the client could still get a valid table descriptor after 
the master removed the table dir (first rename it, then remove all region data 
dirs, and finally remove the table dir itself), until the table descriptor was 
removed from the table descriptor cache. With this patch, the client can't get 
the table descriptor once the master renames the table dir. That lengthens the 
window between "client can't get table descriptor" and "regions are removed 
from RegionStates" / "coprocessors' postDeleteTableHandler are called", so 
cases that assume either of the latter immediately after 
HBaseAdmin.deleteTable() returns are now much more likely to fail.

In short, we can't assume "regions are removed from RegionStates" or 
"coprocessors' postDeleteTableHandler are called" after 
HBaseAdmin.deleteTable() returns, though HBaseAdmin.deleteTable() is seemingly 
synchronous.
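
One way a test can avoid that assumption is to poll for the post-delete 
condition with a timeout instead of asserting it right after 
HBaseAdmin.deleteTable() returns. The helper below is only a sketch of that 
pattern: the class WaitFor and its waitFor method are hypothetical names, and 
the condition to poll (for example "the coprocessor hook has been called") is 
left to the caller.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper for tests; not part of HBase. */
public final class WaitFor {
    private WaitFor() {}

    /**
     * Polls the condition every intervalMs until it holds or timeoutMs elapses.
     * Returns true if the condition became true, false on timeout.
     */
    public static boolean waitFor(long timeoutMs, long intervalMs,
                                  BooleanSupplier condition)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Subtraction-based comparison stays correct if nanoTime wraps.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(intervalMs);
        }
    }
}
```

A test would call the delete first and then assert on the result of 
waitFor(...), passing a condition that checks whatever the asynchronous 
handler is expected to have done.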

> HBaseAdmin.getTableDescriptor can wrongly get the previous table's 
> TableDescriptor even after the table dir in hdfs is removed
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10595
>                 URL: https://issues.apache.org/jira/browse/HBASE-10595
>             Project: HBase
>          Issue Type: Bug
>          Components: master, util
>            Reporter: Feng Honghua
>            Assignee: Feng Honghua
>         Attachments: HBASE-10595-trunk_v1.patch, HBASE-10595-trunk_v2.patch
>
>
> When a table dir (in hdfs) is removed externally, HMaster will still return 
> the cached TableDescriptor to the client for a getTableDescriptor request.
> By contrast, HBaseAdmin.listTables() is handled correctly in the current 
> implementation. For a table whose table dir in hdfs has been removed 
> externally, getTableDescriptor can still retrieve a valid (old) table 
> descriptor, while listTables says the table doesn't exist. This is 
> inconsistent.
> The reason for this bug is that HMaster (via FSTableDescriptors) doesn't 
> check whether the table dir exists for a getTableDescriptor() request, 
> whereas for a listTables() request it lists all existing table dirs (without 
> consulting the cache first) and returns accordingly.
> When a table is deleted via deleteTable, the cache is cleared after the 
> table dir and tableInfo file are removed, so the 
> listTables/getTableDescriptor inconsistency should be transient (though it 
> still exists while the table dir is removed but the cache is not yet 
> cleared) and harder to expose.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
