[ 
https://issues.apache.org/jira/browse/HBASE-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915909#comment-13915909
 ] 

Feng Honghua commented on HBASE-10595:
--------------------------------------

To align with listTables(), which deems a table nonexistent whenever its 
table dir is nonexistent, getTableDescriptor now throws TableNotFoundException 
whenever the table dir is nonexistent, regardless of the table descriptor 
cache.
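
A minimal sketch of the lookup order described above, not the actual patch: 
the class name TableDescriptorLookupSketch, the String descriptors and the use 
of the local filesystem (java.nio.file) instead of HDFS are all assumptions made 
to keep the example self-contained. Only the idea "check the table dir first, 
consult the cache second" comes from the comment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the descriptor lookup in FSTableDescriptors.
// Descriptors are plain Strings and the "table dir" is a local directory.
class TableDescriptorLookupSketch {

    // Local stand-in for HBase's TableNotFoundException (an IOException there too).
    static class TableNotFoundException extends IOException {
        TableNotFoundException(String table) {
            super("Table dir does not exist: " + table);
        }
    }

    private final Path rootDir;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    TableDescriptorLookupSketch(Path rootDir) {
        this.rootDir = rootDir;
    }

    void cacheDescriptor(String table, String descriptor) {
        cache.put(table, descriptor);
    }

    String getTableDescriptor(String table) throws TableNotFoundException {
        // The directory check comes first, so a stale cache entry can never
        // be returned for a table whose dir is already gone.
        if (!Files.isDirectory(rootDir.resolve(table))) {
            throw new TableNotFoundException(table);
        }
        // Real code would fall back to reading the descriptor from the
        // filesystem on a cache miss; the sketch only consults the cache.
        return cache.get(table);
    }
}
```

With this ordering, a cached descriptor for "t1" is returned only while the 
directory rootDir/t1 exists; once that directory is removed, the same call 
throws TableNotFoundException even though the cache entry is still present.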

Table deletion proceeds as follows: table dir is renamed (moved) to tmp dir => 
archive all region data => remove table dir => clear table descriptor cache => 
remove from RegionStates => remove from ZKTable => execute postDeleteTable 
coprocessor

With this patch, the client considers the table deletion successful as soon as 
the table dir is renamed (and thus nonexistent), rather than after the table 
descriptor cache is cleared. Unit tests that assume, for example, that the 
regions have already been removed from RegionStates or that the postDeleteTable 
coprocessor has already executed are therefore more likely to fail, since 
archiving region data and removing the table dir in the tmp dir take extra 
time. That's why I add Threads.sleep() to some unit tests in this patch.

These tests passed before this patch by chance, not by design: much less time 
elapses between clearing the table descriptor cache and removing from 
RegionStates / executing the postDeleteTable coprocessor, because archiving 
table data and removing the table dir are already done by then. Without this 
patch, the tests do fail when I add some extra sleep after clearing the table 
descriptor cache (which is equivalent to a scenario where HMaster suddenly runs 
slowly).
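
To make the timing argument concrete, here is a small sketch that models the 
deletion sequence listed above as an ordered enum and computes which master-side 
steps are still pending at the point where the client sees the deletion as 
done. The class and enum names are hypothetical; only the step order is taken 
from the sequence above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the delete-table sequence; not HBase code.
class DeleteTableTimelineSketch {

    // Steps in the order given in the comment.
    enum Step {
        RENAME_TABLE_DIR_TO_TMP,
        ARCHIVE_REGION_DATA,
        REMOVE_TABLE_DIR,
        CLEAR_DESCRIPTOR_CACHE,
        REMOVE_FROM_REGION_STATES,
        REMOVE_FROM_ZK_TABLE,
        POST_DELETE_TABLE_COPROCESSOR
    }

    // Steps the master still has to run once the client has observed
    // the given step as the "deletion succeeded" point.
    static List<Step> pendingAfter(Step observed) {
        Step[] all = Step.values();
        return Arrays.asList(all).subList(observed.ordinal() + 1, all.length);
    }
}
```

Before the patch the client's success point is CLEAR_DESCRIPTOR_CACHE, which 
leaves three quick steps pending; with the patch it is RENAME_TABLE_DIR_TO_TMP, 
which leaves six pending, including the slow archiving and dir removal. Tests 
that read RegionStates or coprocessor state right after deleteTable returns 
were racing in both cases; the window is just wider now.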

The root cause of the above test failures is another bug: HBaseAdmin.deleteTable 
is not "really" synchronous (some cleanups in HMaster may not be done yet 
*after* HBaseAdmin.deleteTable() returns). HBASE-10636 has been created for this 
bug. We can remove the added Threads.sleep() once HBASE-10636 is done, and 
personally I think this patch can be resolved independently.
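
As an aside, and not something this patch does: a fixed sleep could in 
principle be replaced by polling for the expected state with a timeout. The 
helper below is a generic sketch under that assumption; WaitForSketch and 
waitFor are hypothetical names, and the condition passed in would be whatever 
the test is waiting for (e.g. "regions are gone from RegionStates").

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper; polls a condition instead of sleeping a fixed time.
class WaitForSketch {

    // Returns true as soon as the condition holds, or false once timeoutMs
    // has elapsed without the condition becoming true.
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(intervalMs);
        }
    }
}
```

A test would then assert on the return value of waitFor rather than assume 
that a fixed delay was long enough.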

Any opinion?

> HBaseAdmin.getTableDescriptor can wrongly get the previous table's 
> TableDescriptor even after the table dir in hdfs is removed
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10595
>                 URL: https://issues.apache.org/jira/browse/HBASE-10595
>             Project: HBase
>          Issue Type: Sub-task
>          Components: master, util
>            Reporter: Feng Honghua
>            Assignee: Feng Honghua
>         Attachments: HBASE-10595-trunk_v1.patch, HBASE-10595-trunk_v2.patch, 
> HBASE-10595-trunk_v3.patch, HBASE-10595-trunk_v4.patch
>
>
> When a table dir (in hdfs) is removed externally, HMaster still returns 
> the cached TableDescriptor to the client for a getTableDescriptor request.
> In contrast, HBaseAdmin.listTables() is handled correctly in the current 
> implementation. For a table whose table dir in hdfs has been removed 
> externally, getTableDescriptor can still retrieve a valid (old) table 
> descriptor, while listTables says the table doesn't exist. This is 
> inconsistent.
> The reason for this bug is that HMaster (via FSTableDescriptors) doesn't 
> check whether the table dir exists for a getTableDescriptor() request, while 
> for a listTables() request it lists all existing table dirs (without 
> consulting the cache first) and returns accordingly.
> When a table is deleted via deleteTable, the cache is cleared after the 
> table dir and tableInfo file are removed, so the listTables/getTableDescriptor 
> inconsistency should be transient (though it still exists while the table dir 
> is removed but the cache is not yet cleared) and harder to expose.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
