[ https://issues.apache.org/jira/browse/HBASE-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Feng Honghua updated HBASE-10595:
---------------------------------
    Attachment: HBASE-10595-trunk_v2.patch

New patch 'fixing' the previously failing TestMasterObserver case.

The cause of the TestMasterObserver failure is similar to that of TestAssignmentManagerOnCluster#testMoveRegionOfDeletedTable: HBaseAdmin.deleteTable is 'synchronous' to the client only in that, after asking the master to delete a table, it returns once the table descriptor can no longer be retrieved from the master. But DeleteTableHandler is processed asynchronously in the master, and steps such as 'clearing the table descriptor cache', 'removing regions from RegionStates' and 'calling all coprocessors' postDeleteTableHandler' are all done *after* removing the table dir. (After this patch, it is 'removing the table dir', not clearing the table descriptor cache, that stops the client from getting the table descriptor and makes it believe the table is deleted.)

Before this patch, the client could still get a valid table descriptor after the master removed the table dir (first rename, then remove all region data dirs, and finally remove the table dir), until the table descriptor was removed from the table descriptor cache. After this patch, the client can't get the table descriptor once the master renames the table dir. This widens the window between "client can't get table descriptor" and "regions are removed from RegionStates" / "coprocessors' postDeleteTableHandler are called", so test cases that assume either of the latter are much more likely to fail when their checks run immediately after HBaseAdmin.deleteTable().

In short, we can't assume "regions are removed from RegionStates" or "coprocessors' postDeleteTableHandler are called" after HBaseAdmin.deleteTable() returns, even though HBaseAdmin.deleteTable() is seemingly synchronous.
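To illustrate the point, here is a minimal sketch (not code from the patch) of how a test could poll for the asynchronous post-delete state instead of asserting it right after deleteTable() returns. The class DeleteTableWait, the helper waitFor and the flag postDeleteCalled are hypothetical names; the background thread only simulates DeleteTableHandler finishing later, and no HBase API is called.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class DeleteTableWait {

    /**
     * Hypothetical helper: polls the condition until it holds or the timeout
     * elapses. Returns true if the condition became true in time.
     */
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        long start = System.nanoTime();
        long timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - start >= timeoutNanos) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Stands in for "coprocessors' postDeleteTableHandler are called".
        AtomicBoolean postDeleteCalled = new AtomicBoolean(false);

        // Simulates the master finishing DeleteTableHandler some time after
        // the client-side delete call has already returned.
        Thread handler = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            postDeleteCalled.set(true);
        });
        handler.start();

        // Checking here, right after the "delete" returns, is the racy pattern.
        System.out.println("immediately after delete: " + postDeleteCalled.get());

        // Polling with a timeout tolerates the asynchronous handler.
        boolean seen = waitFor(5000, 50, postDeleteCalled::get);
        System.out.println("after waiting: " + seen);
        handler.join();
    }
}
```

In a real test the condition would be whatever the test cares about, for example a check that the observer's post-delete hook has fired or that RegionStates no longer holds the table's regions.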
> HBaseAdmin.getTableDescriptor can wrongly get the previous table's
> TableDescriptor even after the table dir in hdfs is removed
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10595
>                 URL: https://issues.apache.org/jira/browse/HBASE-10595
>             Project: HBase
>          Issue Type: Bug
>          Components: master, util
>            Reporter: Feng Honghua
>            Assignee: Feng Honghua
>         Attachments: HBASE-10595-trunk_v1.patch, HBASE-10595-trunk_v2.patch
>
>
> When a table dir (in hdfs) is removed(by outside), HMaster will still return
> the cached TableDescriptor to client for getTableDescriptor request.
> On the contrary, HBaseAdmin.listTables() is handled correctly in current
> implementation, for a table whose table dir in hdfs is removed by outside,
> getTableDescriptor can still retrieve back a valid (old) table descriptor,
> while listTables says it doesn't exist, this is inconsistent
> The reason for this bug is because HMaster (via FSTableDescriptors) doesn't
> check if the table dir exists for getTableDescriptor() request, (while it
> lists all existing table dirs(not firstly respects cache) and returns
> accordingly for listTables() request)
> When a table is deleted via deleteTable, the cache will be cleared after the
> table dir and tableInfo file is removed, listTables/getTableDescriptor
> inconsistency should be transient(though still exists, when table dir is
> removed while cache is not cleared) and harder to expose

--
This message was sent by Atlassian JIRA (v6.1.5#6160)