[jira] [Updated] (HIVE-6980) Drop table by using direct sql
[ https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-6980: - Resolution: Fixed Fix Version/s: 4.0.0 Status: Resolved (was: Patch Available) Pushed to master. Thanks [~akolb] and [~vihangk1] for the review! > Drop table by using direct sql > -- > > Key: HIVE-6980 > URL: https://issues.apache.org/jira/browse/HIVE-6980 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 0.12.0 >Reporter: Selina Zhang >Assignee: Peter Vary >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-6980.2.patch, HIVE-6980.3.patch, HIVE-6980.4.patch, > HIVE-6980.5.patch, HIVE-6980.6.patch, HIVE-6980.7.patch, HIVE-6980.patch, > drop_table_after.png, drop_table_before.png > > > Dropping table which has lots of partitions is slow. Even after applying the > patch of HIVE-6265, the drop table still takes hours (100K+ partitions). > The fixes come with two parts: > 1. use directSQL to query the partitions protect mode; > the current implementation needs to transfer the Partition object to client > and check the protect mode for each partition. I'd like to move this part of > logic to metastore. The check will be done by direct sql (if direct sql is > disabled, execute the same logic in the ObjectStore); > 2. use directSQL to drop partitions for table; > there maybe two solutions here: > 1. add "DELETE CASCADE" in the schema. In this way we only need to delete > entries from partitions table use direct sql. May need to change > datanucleus.deletionPolicy = DataNucleus. > 2. clean up the dependent tables by issue DELETE statement. This also needs > to turn on datanucleus.query.sql.allowAll > Both of above solutions should be able to fix the problem. The DELETE CASCADE > has to change schemas and prepare upgrade scripts. The second solutions added > maintenance cost if new tables added in the future releases. > Please advice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-6980) Drop table by using direct sql
[ https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-6980: - Attachment: HIVE-6980.7.patch > Drop table by using direct sql > -- > > Key: HIVE-6980 > URL: https://issues.apache.org/jira/browse/HIVE-6980 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 0.12.0 >Reporter: Selina Zhang >Assignee: Peter Vary >Priority: Major > Attachments: HIVE-6980.2.patch, HIVE-6980.3.patch, HIVE-6980.4.patch, > HIVE-6980.5.patch, HIVE-6980.6.patch, HIVE-6980.7.patch, HIVE-6980.patch, > drop_table_after.png, drop_table_before.png > > > Dropping table which has lots of partitions is slow. Even after applying the > patch of HIVE-6265, the drop table still takes hours (100K+ partitions). > The fixes come with two parts: > 1. use directSQL to query the partitions protect mode; > the current implementation needs to transfer the Partition object to client > and check the protect mode for each partition. I'd like to move this part of > logic to metastore. The check will be done by direct sql (if direct sql is > disabled, execute the same logic in the ObjectStore); > 2. use directSQL to drop partitions for table; > there maybe two solutions here: > 1. add "DELETE CASCADE" in the schema. In this way we only need to delete > entries from partitions table use direct sql. May need to change > datanucleus.deletionPolicy = DataNucleus. > 2. clean up the dependent tables by issue DELETE statement. This also needs > to turn on datanucleus.query.sql.allowAll > Both of above solutions should be able to fix the problem. The DELETE CASCADE > has to change schemas and prepare upgrade scripts. The second solutions added > maintenance cost if new tables added in the future releases. > Please advice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-6980) Drop table by using direct sql
[ https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-6980: - Attachment: HIVE-6980.6.patch > Drop table by using direct sql > -- > > Key: HIVE-6980 > URL: https://issues.apache.org/jira/browse/HIVE-6980 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 0.12.0 >Reporter: Selina Zhang >Assignee: Peter Vary >Priority: Major > Attachments: HIVE-6980.2.patch, HIVE-6980.3.patch, HIVE-6980.4.patch, > HIVE-6980.5.patch, HIVE-6980.6.patch, HIVE-6980.patch, drop_table_after.png, > drop_table_before.png > > > Dropping table which has lots of partitions is slow. Even after applying the > patch of HIVE-6265, the drop table still takes hours (100K+ partitions). > The fixes come with two parts: > 1. use directSQL to query the partitions protect mode; > the current implementation needs to transfer the Partition object to client > and check the protect mode for each partition. I'd like to move this part of > logic to metastore. The check will be done by direct sql (if direct sql is > disabled, execute the same logic in the ObjectStore); > 2. use directSQL to drop partitions for table; > there maybe two solutions here: > 1. add "DELETE CASCADE" in the schema. In this way we only need to delete > entries from partitions table use direct sql. May need to change > datanucleus.deletionPolicy = DataNucleus. > 2. clean up the dependent tables by issue DELETE statement. This also needs > to turn on datanucleus.query.sql.allowAll > Both of above solutions should be able to fix the problem. The DELETE CASCADE > has to change schemas and prepare upgrade scripts. The second solutions added > maintenance cost if new tables added in the future releases. > Please advice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-6980) Drop table by using direct sql
[ https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-6980: - Attachment: HIVE-6980.5.patch > Drop table by using direct sql > -- > > Key: HIVE-6980 > URL: https://issues.apache.org/jira/browse/HIVE-6980 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 0.12.0 >Reporter: Selina Zhang >Assignee: Peter Vary >Priority: Major > Attachments: HIVE-6980.2.patch, HIVE-6980.3.patch, HIVE-6980.4.patch, > HIVE-6980.5.patch, HIVE-6980.patch, drop_table_after.png, > drop_table_before.png > > > Dropping table which has lots of partitions is slow. Even after applying the > patch of HIVE-6265, the drop table still takes hours (100K+ partitions). > The fixes come with two parts: > 1. use directSQL to query the partitions protect mode; > the current implementation needs to transfer the Partition object to client > and check the protect mode for each partition. I'd like to move this part of > logic to metastore. The check will be done by direct sql (if direct sql is > disabled, execute the same logic in the ObjectStore); > 2. use directSQL to drop partitions for table; > there maybe two solutions here: > 1. add "DELETE CASCADE" in the schema. In this way we only need to delete > entries from partitions table use direct sql. May need to change > datanucleus.deletionPolicy = DataNucleus. > 2. clean up the dependent tables by issue DELETE statement. This also needs > to turn on datanucleus.query.sql.allowAll > Both of above solutions should be able to fix the problem. The DELETE CASCADE > has to change schemas and prepare upgrade scripts. The second solutions added > maintenance cost if new tables added in the future releases. > Please advice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-6980) Drop table by using direct sql
[ https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-6980: - Attachment: HIVE-6980.4.patch > Drop table by using direct sql > -- > > Key: HIVE-6980 > URL: https://issues.apache.org/jira/browse/HIVE-6980 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 0.12.0 >Reporter: Selina Zhang >Assignee: Peter Vary >Priority: Major > Attachments: HIVE-6980.2.patch, HIVE-6980.3.patch, HIVE-6980.4.patch, > HIVE-6980.patch, drop_table_after.png, drop_table_before.png > > > Dropping table which has lots of partitions is slow. Even after applying the > patch of HIVE-6265, the drop table still takes hours (100K+ partitions). > The fixes come with two parts: > 1. use directSQL to query the partitions protect mode; > the current implementation needs to transfer the Partition object to client > and check the protect mode for each partition. I'd like to move this part of > logic to metastore. The check will be done by direct sql (if direct sql is > disabled, execute the same logic in the ObjectStore); > 2. use directSQL to drop partitions for table; > there maybe two solutions here: > 1. add "DELETE CASCADE" in the schema. In this way we only need to delete > entries from partitions table use direct sql. May need to change > datanucleus.deletionPolicy = DataNucleus. > 2. clean up the dependent tables by issue DELETE statement. This also needs > to turn on datanucleus.query.sql.allowAll > Both of above solutions should be able to fix the problem. The DELETE CASCADE > has to change schemas and prepare upgrade scripts. The second solutions added > maintenance cost if new tables added in the future releases. > Please advice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-6980) Drop table by using direct sql
[ https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-6980: - Attachment: HIVE-6980.3.patch > Drop table by using direct sql > -- > > Key: HIVE-6980 > URL: https://issues.apache.org/jira/browse/HIVE-6980 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 0.12.0 >Reporter: Selina Zhang >Assignee: Peter Vary >Priority: Major > Attachments: HIVE-6980.2.patch, HIVE-6980.3.patch, HIVE-6980.patch, > drop_table_after.png, drop_table_before.png > > > Dropping table which has lots of partitions is slow. Even after applying the > patch of HIVE-6265, the drop table still takes hours (100K+ partitions). > The fixes come with two parts: > 1. use directSQL to query the partitions protect mode; > the current implementation needs to transfer the Partition object to client > and check the protect mode for each partition. I'd like to move this part of > logic to metastore. The check will be done by direct sql (if direct sql is > disabled, execute the same logic in the ObjectStore); > 2. use directSQL to drop partitions for table; > there maybe two solutions here: > 1. add "DELETE CASCADE" in the schema. In this way we only need to delete > entries from partitions table use direct sql. May need to change > datanucleus.deletionPolicy = DataNucleus. > 2. clean up the dependent tables by issue DELETE statement. This also needs > to turn on datanucleus.query.sql.allowAll > Both of above solutions should be able to fix the problem. The DELETE CASCADE > has to change schemas and prepare upgrade scripts. The second solutions added > maintenance cost if new tables added in the future releases. > Please advice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-6980) Drop table by using direct sql
[ https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-6980: - Attachment: (was: HIVE-6980.3.patch) > Drop table by using direct sql > -- > > Key: HIVE-6980 > URL: https://issues.apache.org/jira/browse/HIVE-6980 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 0.12.0 >Reporter: Selina Zhang >Assignee: Peter Vary >Priority: Major > Attachments: HIVE-6980.2.patch, HIVE-6980.3.patch, HIVE-6980.patch, > drop_table_after.png, drop_table_before.png > > > Dropping table which has lots of partitions is slow. Even after applying the > patch of HIVE-6265, the drop table still takes hours (100K+ partitions). > The fixes come with two parts: > 1. use directSQL to query the partitions protect mode; > the current implementation needs to transfer the Partition object to client > and check the protect mode for each partition. I'd like to move this part of > logic to metastore. The check will be done by direct sql (if direct sql is > disabled, execute the same logic in the ObjectStore); > 2. use directSQL to drop partitions for table; > there maybe two solutions here: > 1. add "DELETE CASCADE" in the schema. In this way we only need to delete > entries from partitions table use direct sql. May need to change > datanucleus.deletionPolicy = DataNucleus. > 2. clean up the dependent tables by issue DELETE statement. This also needs > to turn on datanucleus.query.sql.allowAll > Both of above solutions should be able to fix the problem. The DELETE CASCADE > has to change schemas and prepare upgrade scripts. The second solutions added > maintenance cost if new tables added in the future releases. > Please advice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-6980) Drop table by using direct sql
[ https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-6980: - Attachment: (was: HIVE-6980.3.patch) > Drop table by using direct sql > -- > > Key: HIVE-6980 > URL: https://issues.apache.org/jira/browse/HIVE-6980 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 0.12.0 >Reporter: Selina Zhang >Assignee: Peter Vary >Priority: Major > Attachments: HIVE-6980.2.patch, HIVE-6980.3.patch, HIVE-6980.patch, > drop_table_after.png, drop_table_before.png > > > Dropping table which has lots of partitions is slow. Even after applying the > patch of HIVE-6265, the drop table still takes hours (100K+ partitions). > The fixes come with two parts: > 1. use directSQL to query the partitions protect mode; > the current implementation needs to transfer the Partition object to client > and check the protect mode for each partition. I'd like to move this part of > logic to metastore. The check will be done by direct sql (if direct sql is > disabled, execute the same logic in the ObjectStore); > 2. use directSQL to drop partitions for table; > there maybe two solutions here: > 1. add "DELETE CASCADE" in the schema. In this way we only need to delete > entries from partitions table use direct sql. May need to change > datanucleus.deletionPolicy = DataNucleus. > 2. clean up the dependent tables by issue DELETE statement. This also needs > to turn on datanucleus.query.sql.allowAll > Both of above solutions should be able to fix the problem. The DELETE CASCADE > has to change schemas and prepare upgrade scripts. The second solutions added > maintenance cost if new tables added in the future releases. > Please advice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-6980) Drop table by using direct sql
[ https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-6980: - Attachment: HIVE-6980.3.patch > Drop table by using direct sql > -- > > Key: HIVE-6980 > URL: https://issues.apache.org/jira/browse/HIVE-6980 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 0.12.0 >Reporter: Selina Zhang >Assignee: Peter Vary >Priority: Major > Attachments: HIVE-6980.2.patch, HIVE-6980.3.patch, HIVE-6980.3.patch, > HIVE-6980.patch, drop_table_after.png, drop_table_before.png > > > Dropping table which has lots of partitions is slow. Even after applying the > patch of HIVE-6265, the drop table still takes hours (100K+ partitions). > The fixes come with two parts: > 1. use directSQL to query the partitions protect mode; > the current implementation needs to transfer the Partition object to client > and check the protect mode for each partition. I'd like to move this part of > logic to metastore. The check will be done by direct sql (if direct sql is > disabled, execute the same logic in the ObjectStore); > 2. use directSQL to drop partitions for table; > there maybe two solutions here: > 1. add "DELETE CASCADE" in the schema. In this way we only need to delete > entries from partitions table use direct sql. May need to change > datanucleus.deletionPolicy = DataNucleus. > 2. clean up the dependent tables by issue DELETE statement. This also needs > to turn on datanucleus.query.sql.allowAll > Both of above solutions should be able to fix the problem. The DELETE CASCADE > has to change schemas and prepare upgrade scripts. The second solutions added > maintenance cost if new tables added in the future releases. > Please advice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-6980) Drop table by using direct sql
[ https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-6980: - Attachment: HIVE-6980.3.patch > Drop table by using direct sql > -- > > Key: HIVE-6980 > URL: https://issues.apache.org/jira/browse/HIVE-6980 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 0.12.0 >Reporter: Selina Zhang >Assignee: Peter Vary >Priority: Major > Attachments: HIVE-6980.2.patch, HIVE-6980.3.patch, HIVE-6980.patch, > drop_table_after.png, drop_table_before.png > > > Dropping table which has lots of partitions is slow. Even after applying the > patch of HIVE-6265, the drop table still takes hours (100K+ partitions). > The fixes come with two parts: > 1. use directSQL to query the partitions protect mode; > the current implementation needs to transfer the Partition object to client > and check the protect mode for each partition. I'd like to move this part of > logic to metastore. The check will be done by direct sql (if direct sql is > disabled, execute the same logic in the ObjectStore); > 2. use directSQL to drop partitions for table; > there maybe two solutions here: > 1. add "DELETE CASCADE" in the schema. In this way we only need to delete > entries from partitions table use direct sql. May need to change > datanucleus.deletionPolicy = DataNucleus. > 2. clean up the dependent tables by issue DELETE statement. This also needs > to turn on datanucleus.query.sql.allowAll > Both of above solutions should be able to fix the problem. The DELETE CASCADE > has to change schemas and prepare upgrade scripts. The second solutions added > maintenance cost if new tables added in the future releases. > Please advice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-6980) Drop table by using direct sql
[ https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-6980: - Attachment: drop_table_after.png > Drop table by using direct sql > -- > > Key: HIVE-6980 > URL: https://issues.apache.org/jira/browse/HIVE-6980 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 0.12.0 >Reporter: Selina Zhang >Assignee: Peter Vary >Priority: Major > Attachments: HIVE-6980.2.patch, HIVE-6980.patch, > drop_table_after.png, drop_table_before.png > > > Dropping table which has lots of partitions is slow. Even after applying the > patch of HIVE-6265, the drop table still takes hours (100K+ partitions). > The fixes come with two parts: > 1. use directSQL to query the partitions protect mode; > the current implementation needs to transfer the Partition object to client > and check the protect mode for each partition. I'd like to move this part of > logic to metastore. The check will be done by direct sql (if direct sql is > disabled, execute the same logic in the ObjectStore); > 2. use directSQL to drop partitions for table; > there maybe two solutions here: > 1. add "DELETE CASCADE" in the schema. In this way we only need to delete > entries from partitions table use direct sql. May need to change > datanucleus.deletionPolicy = DataNucleus. > 2. clean up the dependent tables by issue DELETE statement. This also needs > to turn on datanucleus.query.sql.allowAll > Both of above solutions should be able to fix the problem. The DELETE CASCADE > has to change schemas and prepare upgrade scripts. The second solutions added > maintenance cost if new tables added in the future releases. > Please advice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-6980) Drop table by using direct sql
[ https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-6980: - Attachment: drop_table_before.png > Drop table by using direct sql > -- > > Key: HIVE-6980 > URL: https://issues.apache.org/jira/browse/HIVE-6980 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 0.12.0 >Reporter: Selina Zhang >Assignee: Peter Vary >Priority: Major > Attachments: HIVE-6980.2.patch, HIVE-6980.patch, drop_table_before.png > > > Dropping table which has lots of partitions is slow. Even after applying the > patch of HIVE-6265, the drop table still takes hours (100K+ partitions). > The fixes come with two parts: > 1. use directSQL to query the partitions protect mode; > the current implementation needs to transfer the Partition object to client > and check the protect mode for each partition. I'd like to move this part of > logic to metastore. The check will be done by direct sql (if direct sql is > disabled, execute the same logic in the ObjectStore); > 2. use directSQL to drop partitions for table; > there maybe two solutions here: > 1. add "DELETE CASCADE" in the schema. In this way we only need to delete > entries from partitions table use direct sql. May need to change > datanucleus.deletionPolicy = DataNucleus. > 2. clean up the dependent tables by issue DELETE statement. This also needs > to turn on datanucleus.query.sql.allowAll > Both of above solutions should be able to fix the problem. The DELETE CASCADE > has to change schemas and prepare upgrade scripts. The second solutions added > maintenance cost if new tables added in the future releases. > Please advice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-6980) Drop table by using direct sql
[ https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-6980: - Attachment: HIVE-6980.2.patch > Drop table by using direct sql > -- > > Key: HIVE-6980 > URL: https://issues.apache.org/jira/browse/HIVE-6980 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 0.12.0 >Reporter: Selina Zhang >Assignee: Peter Vary >Priority: Major > Attachments: HIVE-6980.2.patch, HIVE-6980.patch > > > Dropping table which has lots of partitions is slow. Even after applying the > patch of HIVE-6265, the drop table still takes hours (100K+ partitions). > The fixes come with two parts: > 1. use directSQL to query the partitions protect mode; > the current implementation needs to transfer the Partition object to client > and check the protect mode for each partition. I'd like to move this part of > logic to metastore. The check will be done by direct sql (if direct sql is > disabled, execute the same logic in the ObjectStore); > 2. use directSQL to drop partitions for table; > there maybe two solutions here: > 1. add "DELETE CASCADE" in the schema. In this way we only need to delete > entries from partitions table use direct sql. May need to change > datanucleus.deletionPolicy = DataNucleus. > 2. clean up the dependent tables by issue DELETE statement. This also needs > to turn on datanucleus.query.sql.allowAll > Both of above solutions should be able to fix the problem. The DELETE CASCADE > has to change schemas and prepare upgrade scripts. The second solutions added > maintenance cost if new tables added in the future releases. > Please advice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-6980) Drop table by using direct sql
[ https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-6980: - Status: Patch Available (was: Open) First version of the patch. Splits {{getPartitionsViaSqlFilterInternal}} to: * getPartitionIdsViaSqlFilter - which returns the partition ids * getPartitionsFromPartitionIds - which returns the partition data for the partitions Creates {{dropPartitionsByPartitionIds}} which drops the partitions by directSQL commands Creates a {{dropPartitionsViaSqlFilter}} using {{getPartitionIdsViaSqlFilter}} and {{dropPartitionsByPartitionIds}}. Modifies the ObjectStore to drop partitions with directsql if possible. > Drop table by using direct sql > -- > > Key: HIVE-6980 > URL: https://issues.apache.org/jira/browse/HIVE-6980 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 0.12.0 >Reporter: Selina Zhang >Assignee: Peter Vary >Priority: Major > Attachments: HIVE-6980.patch > > > Dropping table which has lots of partitions is slow. Even after applying the > patch of HIVE-6265, the drop table still takes hours (100K+ partitions). > The fixes come with two parts: > 1. use directSQL to query the partitions protect mode; > the current implementation needs to transfer the Partition object to client > and check the protect mode for each partition. I'd like to move this part of > logic to metastore. The check will be done by direct sql (if direct sql is > disabled, execute the same logic in the ObjectStore); > 2. use directSQL to drop partitions for table; > there maybe two solutions here: > 1. add "DELETE CASCADE" in the schema. In this way we only need to delete > entries from partitions table use direct sql. May need to change > datanucleus.deletionPolicy = DataNucleus. > 2. clean up the dependent tables by issue DELETE statement. This also needs > to turn on datanucleus.query.sql.allowAll > Both of above solutions should be able to fix the problem. The DELETE CASCADE > has to change schemas and prepare upgrade scripts. The second solutions added > maintenance cost if new tables added in the future releases. > Please advice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-6980) Drop table by using direct sql
[ https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-6980: - Attachment: HIVE-6980.patch > Drop table by using direct sql > -- > > Key: HIVE-6980 > URL: https://issues.apache.org/jira/browse/HIVE-6980 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 0.12.0 >Reporter: Selina Zhang >Assignee: Peter Vary >Priority: Major > Attachments: HIVE-6980.patch > > > Dropping table which has lots of partitions is slow. Even after applying the > patch of HIVE-6265, the drop table still takes hours (100K+ partitions). > The fixes come with two parts: > 1. use directSQL to query the partitions protect mode; > the current implementation needs to transfer the Partition object to client > and check the protect mode for each partition. I'd like to move this part of > logic to metastore. The check will be done by direct sql (if direct sql is > disabled, execute the same logic in the ObjectStore); > 2. use directSQL to drop partitions for table; > there maybe two solutions here: > 1. add "DELETE CASCADE" in the schema. In this way we only need to delete > entries from partitions table use direct sql. May need to change > datanucleus.deletionPolicy = DataNucleus. > 2. clean up the dependent tables by issue DELETE statement. This also needs > to turn on datanucleus.query.sql.allowAll > Both of above solutions should be able to fix the problem. The DELETE CASCADE > has to change schemas and prepare upgrade scripts. The second solutions added > maintenance cost if new tables added in the future releases. > Please advice. -- This message was sent by Atlassian JIRA (v7.6.3#76005)