[ 
https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983695#comment-13983695
 ] 

Selina Zhang commented on HIVE-6980:
------------------------------------

[~sershe]
Thanks for the comments! 

Sorry, I linked to the wrong JIRA. It should be HIVE-6256. :)

1. Protection check:
The work flows of drop table and drop partitions are different. Drop partitions 
need to have the Partition object. So the protection check is free. In drop 
table, we do not need to check partition by partition. The following statement 
will return all the partitions has protect mode on in one shot
{code}
    String queryText =  "select PARTITIONS.PART_NAME, "
        + "  PARTITION_PARAMS.PARAM_VALUE from PARTITIONS"
        + "  inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID "
        + "  inner join DBS on TBLS.DB_ID = DBS.DB_ID "
        + "  inner join PARTITION_PARAMS on PARTITIONS.PART_ID = 
PARTITION_PARAMS.PART_ID "
        + "  where TBLS.TBL_NAME = ? and DBS.NAME = ? and PARAM_KEY = '" + 
ProtectMode.PARAMETER_NAME + "'";
{code}

2. Construct the Partition object itself is expensive. Even with direct sql, 
you need to query bunch of tables and do loop join for the Partitions objects. 

3. I have the same concern as you in HIVE-6397(Hope I did not type wrong this 
time :)). But after research this page (session "Deletion, using Foreign Keys 
(RDBMS)"), I think turn on delete cascade is a good idea. It will save trips 
for DN. 
http://www.datanucleus.org/products/datanucleus/jdo/orm/cascading.html

Very appreciate your previous work on meta store. It gets better and better!

> Drop table by using direct sql
> ------------------------------
>
>                 Key: HIVE-6980
>                 URL: https://issues.apache.org/jira/browse/HIVE-6980
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 0.12.0
>            Reporter: Selina Zhang
>            Assignee: Selina Zhang
>
> Dropping table which has lots of partitions is slow. Even after applying the 
> patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 
> The fixes come with two parts:
> 1. use directSQL to query the partitions protect mode;
> the current implementation needs to transfer the Partition object to client 
> and check the protect mode for each partition. I'd like to move this part of 
> logic to metastore. The check will be done by direct sql (if direct sql is 
> disabled, execute the same logic in the ObjectStore);
> 2. use directSQL to drop partitions for table;
> there maybe two solutions here:
> 1. add "DELETE CASCADE" in the schema. In this way we only need to delete 
> entries from partitions table use direct sql. May need to change 
> datanucleus.deletionPolicy = DataNucleus. 
> 2. clean up the dependent tables by issue DELETE statement. This also needs 
> to turn on datanucleus.query.sql.allowAll
> Both of above solutions should be able to fix the problem. The DELETE CASCADE 
> has to change schemas and prepare upgrade scripts. The second solutions added 
> maintenance cost if new tables added in the future releases.
> Please advice. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to