[ https://issues.apache.org/jira/browse/HBASE-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915909#comment-13915909 ]
Feng Honghua commented on HBASE-10595:
--------------------------------------

To align with listTables(), which treats a table as nonexistent whenever its table dir does not exist, getTableDescriptor now throws TableNotFoundException whenever the table dir does not exist, regardless of the table descriptor cache.

When a table is deleted, the steps are:
  table dir is renamed (moved) to the tmp dir
  => all region data is archived
  => the table dir is removed
  => the table descriptor cache is cleared
  => the table is removed from RegionStates
  => the table is removed from ZKTable
  => the postDeleteTable coprocessor hook is executed

With this patch, the client considers the delete successful as soon as the table dir has been renamed (and therefore no longer exists), rather than after the table descriptor cache has been cleared. As a result, unit tests that assume later states have been reached, such as the regions having been removed from RegionStates or the postDeleteTable coprocessor having been executed, are now more likely to fail, because archiving the region data and removing the table dir from the tmp dir take extra time. That is why this patch adds Threads.sleep() to some unit tests.

These tests passed before this patch by chance, not by design: the time from clearing the table descriptor cache to removing the table from RegionStates and executing the postDeleteTable coprocessor is much shorter, since it does not include archiving table data or removing the table dir. Without this patch, the same tests do fail when I add an extra sleep after clearing the table descriptor cache (equivalent to a scenario where HMaster suddenly runs slowly).

The root cause of the test failures above is a separate bug: HBaseAdmin.deleteTable is not "really" synchronous (some cleanups in HMaster are likely not done yet *after* HBaseAdmin.deleteTable() returns). HBASE-10636 has been created for this bug. We can remove the added Threads.sleep() calls once HBASE-10636 is done; personally, I think this patch can be resolved independently. Any opinion?
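The behavior change described in the comment (table-dir existence takes precedence over the descriptor cache, so getTableDescriptor agrees with listTables) can be illustrated with a small model. This is a hedged sketch only, not the actual FSTableDescriptors code: `CachedDescriptorStore` and its nested `TableNotFoundException` are hypothetical names, a local directory stands in for the HBase root dir in HDFS, and a descriptor is reduced to a String.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical, simplified model: NOT the real FSTableDescriptors API.
// A local directory stands in for the HBase root dir in HDFS.
final class CachedDescriptorStore {

    // Hypothetical stand-in for HBase's TableNotFoundException.
    static final class TableNotFoundException extends IOException {
        TableNotFoundException(String table) {
            super("Table not found: " + table);
        }
    }

    private final Path rootDir;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachedDescriptorStore(Path rootDir) {
        this.rootDir = rootDir;
    }

    // Records a descriptor, e.g. after a table is created.
    void put(String table, String descriptor) {
        cache.put(table, descriptor);
    }

    // listTables() view: a table exists iff its directory exists; the cache is not consulted.
    List<String> listTables() throws IOException {
        try (Stream<Path> entries = Files.list(rootDir)) {
            return entries.filter(Files::isDirectory)
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // getTableDescriptor() view: check the directory first, so a stale cache
    // entry can never make a removed table look alive.
    String getTableDescriptor(String table) throws TableNotFoundException {
        if (!Files.isDirectory(rootDir.resolve(table))) {
            cache.remove(table); // drop the stale entry, if any
            throw new TableNotFoundException(table);
        }
        String descriptor = cache.get(table);
        if (descriptor == null) {
            // A real store would read the tableinfo file here; this sketch
            // keeps descriptors only in memory.
            throw new TableNotFoundException(table);
        }
        return descriptor;
    }
}
```

Checking the directory before the cache costs one filesystem lookup per call, but it makes both views derive "does the table exist?" from the same source of truth, which is the consistency the patch is after.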
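Until deleteTable is fully synchronous, a fixed Threads.sleep() in a test is a guess about how long the remaining HMaster cleanup takes. One common alternative is to poll for the actual postcondition with a timeout. The helper below is a hypothetical sketch (`PollingWait.waitFor` is not an HBase API), shown only to illustrate the technique.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical test helper: polls a condition instead of sleeping a fixed time.
final class PollingWait {
    private PollingWait() {}

    // Returns true as soon as the condition holds, or false if it still
    // does not hold once timeoutMs has elapsed.
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(intervalMs);
        }
    }
}
```

In a test, the condition would check the state the test actually depends on, for example that no region of the deleted table remains in RegionStates. How to query that depends on the test harness and is left out here. The test then passes as soon as the cleanup finishes and fails only if it exceeds a generous timeout.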
> HBaseAdmin.getTableDescriptor can wrongly get the previous table's TableDescriptor even after the table dir in hdfs is removed
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10595
>                 URL: https://issues.apache.org/jira/browse/HBASE-10595
>             Project: HBase
>          Issue Type: Sub-task
>          Components: master, util
>            Reporter: Feng Honghua
>            Assignee: Feng Honghua
>         Attachments: HBASE-10595-trunk_v1.patch, HBASE-10595-trunk_v2.patch,
> HBASE-10595-trunk_v3.patch, HBASE-10595-trunk_v4.patch
>
>
> When a table dir (in hdfs) is removed from outside HBase, HMaster still returns
> the cached TableDescriptor to the client for a getTableDescriptor request.
> In contrast, HBaseAdmin.listTables() is handled correctly in the current
> implementation. For a table whose table dir in hdfs has been removed from outside,
> getTableDescriptor can still retrieve a valid (old) table descriptor,
> while listTables says the table doesn't exist. This is inconsistent.
> The cause of this bug is that HMaster (via FSTableDescriptors) doesn't
> check whether the table dir exists for a getTableDescriptor() request, whereas for a
> listTables() request it lists all existing table dirs (without consulting the cache
> first) and returns accordingly.
> When a table is deleted via deleteTable, the cache is cleared after the
> table dir and tableInfo file are removed, so the listTables/getTableDescriptor
> inconsistency should be transient (though it still exists while the table dir is
> removed but the cache is not yet cleared) and harder to expose.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)