[
https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474141#comment-16474141
]
Peter Vary commented on HIVE-6980:
----------------------------------
The test failures are not related to the patch.
I reran TestTxnCommands2 and TestTxnCommands just to be sure; they produced the
same failures before and after the patch.
I also did some additional manual testing:
* Different DBs: Postgres / MySQL / MSSQL / Oracle (Derby is already used by the
HMS API tests)
* Changed the batch size manually to a smaller number, to confirm that the
batching works too
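For readers unfamiliar with the batching mentioned above, here is a minimal sketch of the general idea. It is not the code from the patch. The class and method names are hypothetical, and the table and column names (PARTITIONS, PART_ID) are used only as an illustrative assumption. A long list of IDs is split into batches of a configurable size, and one DELETE statement is built per batch, so lowering the batch size makes it easy to see several statements being generated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; this is not Hive code.
class BatchedDeleteSketch {

    // Splits ids into consecutive sublists of at most batchSize elements.
    static List<List<Long>> toBatches(List<Long> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<Long>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }

    // Builds one DELETE statement per batch. The IDs are numeric, so they are
    // inlined here for readability; real code may prefer bind parameters.
    static List<String> buildDeletes(String table, String idColumn,
                                     List<Long> ids, int batchSize) {
        List<String> statements = new ArrayList<>();
        for (List<Long> batch : toBatches(ids, batchSize)) {
            String inList = batch.stream()
                    .map(String::valueOf)
                    .collect(Collectors.joining(","));
            statements.add("DELETE FROM " + table
                    + " WHERE " + idColumn + " IN (" + inList + ")");
        }
        return statements;
    }

    public static void main(String[] args) {
        // Table and column names are assumptions made for this example.
        List<Long> ids = Arrays.asList(1L, 2L, 3L, 4L, 5L);
        for (String sql : buildDeletes("PARTITIONS", "PART_ID", ids, 2)) {
            System.out.println(sql);
        }
    }
}
```

With a batch size of 2 and five IDs, the sketch produces three statements, the last one covering a single ID.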
Patch description:
* Split the existing getPartitionIdsViaSqlFilter so that the part of the query
that gets the PartitionIds by the partition names can be reused
* Created a specific directSql method for removing the rows directly connected to
the Partition object
* Created directSql methods for dropping the embedded objects:
** StorageDescriptor
** Serde
** ColumnDescriptor
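To illustrate why the deletes are split up this way, here is a rough sketch of how a safe delete order can be derived from foreign-key references. It is not the implementation in the patch. The class and method names are hypothetical, and the table names and references (PARTITION_PARAMS and PARTITION_KEY_VALS pointing to PARTITIONS, PARTITIONS to SDS, SDS to SERDES and CDS) are an assumption about the metastore schema made for the example. Rows that reference another table have to be removed before the rows they point to, so the sketch orders the tables with a depth-first topological sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; this is not Hive code.
class DeleteOrderSketch {

    // Example foreign-key references (assumed, simplified): each key
    // references the tables listed in its value.
    static Map<String, List<String>> exampleReferences() {
        Map<String, List<String>> refs = new LinkedHashMap<>();
        refs.put("PARTITION_PARAMS", Arrays.asList("PARTITIONS"));
        refs.put("PARTITION_KEY_VALS", Arrays.asList("PARTITIONS"));
        refs.put("PARTITIONS", Arrays.asList("SDS"));
        refs.put("SDS", Arrays.asList("SERDES", "CDS"));
        return refs;
    }

    // Returns an order in which every table appears before the tables it
    // references, so deleting in this order does not violate foreign keys.
    static List<String> deleteOrder(Map<String, List<String>> references) {
        Set<String> all = new LinkedHashSet<>(references.keySet());
        for (List<String> targets : references.values()) {
            all.addAll(targets);
        }
        List<String> postOrder = new ArrayList<>(); // referenced tables first
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String table : all) {
            visit(table, references, done, inProgress, postOrder);
        }
        Collections.reverse(postOrder); // referencing tables first
        return postOrder;
    }

    private static void visit(String table, Map<String, List<String>> references,
                              Set<String> done, Set<String> inProgress,
                              List<String> postOrder) {
        if (done.contains(table)) {
            return;
        }
        if (!inProgress.add(table)) {
            throw new IllegalStateException("Cyclic reference at " + table);
        }
        for (String target : references.getOrDefault(table, Collections.emptyList())) {
            visit(target, references, done, inProgress, postOrder);
        }
        inProgress.remove(table);
        done.add(table);
        postOrder.add(table);
    }

    public static void main(String[] args) {
        for (String table : deleteOrder(exampleReferences())) {
            System.out.println("delete from " + table);
        }
    }
}
```

The ordering alone is not enough when embedded objects can be shared between several owners. In that case a delete would also have to check that nothing else still references the row; the sketch leaves that out.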
[~sershe]: Do you still have concerns about the DataNucleus caching?
[~vihangk1]: Could you please review?
Thanks,
Peter
> Drop table by using direct sql
> ------------------------------
>
> Key: HIVE-6980
> URL: https://issues.apache.org/jira/browse/HIVE-6980
> Project: Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.12.0
> Reporter: Selina Zhang
> Assignee: Peter Vary
> Priority: Major
> Attachments: HIVE-6980.2.patch, HIVE-6980.3.patch, HIVE-6980.4.patch,
> HIVE-6980.patch, drop_table_after.png, drop_table_before.png
>
>
> Dropping a table which has lots of partitions is slow. Even after applying the
> patch from HIVE-6265, the drop table still takes hours (100K+ partitions).
> The fix comes in two parts:
> 1. use directSQL to query the protect mode of the partitions;
> the current implementation needs to transfer the Partition objects to the client
> and check the protect mode for each partition. I'd like to move this part of the
> logic to the metastore. The check will be done by direct sql (if direct sql is
> disabled, the same logic is executed in the ObjectStore);
> 2. use directSQL to drop the partitions of the table;
> there may be two solutions here:
> 1. add "DELETE CASCADE" to the schema. This way we only need to delete the
> entries from the partitions table using direct sql. May need to change
> datanucleus.deletionPolicy = DataNucleus.
> 2. clean up the dependent tables by issuing DELETE statements. This also needs
> datanucleus.query.sql.allowAll to be turned on
> Both of the above solutions should be able to fix the problem. The DELETE CASCADE
> solution requires changing the schemas and preparing upgrade scripts. The second
> solution adds maintenance cost if new tables are added in future releases.
> Please advise.