[jira] [Updated] (HIVE-6980) Drop table by using direct sql

2018-05-25 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-6980:
-
   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master.

Thanks [~akolb] and [~vihangk1] for the review!

> Drop table by using direct sql
> --
>
> Key: HIVE-6980
> URL: https://issues.apache.org/jira/browse/HIVE-6980
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.12.0
>Reporter: Selina Zhang
>Assignee: Peter Vary
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-6980.2.patch, HIVE-6980.3.patch, HIVE-6980.4.patch, 
> HIVE-6980.5.patch, HIVE-6980.6.patch, HIVE-6980.7.patch, HIVE-6980.patch, 
> drop_table_after.png, drop_table_before.png
>
>
> Dropping table which has lots of partitions is slow. Even after applying the 
> patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 
> The fixes come with two parts:
> 1. use directSQL to query the partitions protect mode;
> the current implementation needs to transfer the Partition object to client 
> and check the protect mode for each partition. I'd like to move this part of 
> logic to metastore. The check will be done by direct sql (if direct sql is 
> disabled, execute the same logic in the ObjectStore);
> 2. use directSQL to drop partitions for table;
> there maybe two solutions here:
> 1. add "DELETE CASCADE" in the schema. In this way we only need to delete 
> entries from partitions table use direct sql. May need to change 
> datanucleus.deletionPolicy = DataNucleus. 
> 2. clean up the dependent tables by issue DELETE statement. This also needs 
> to turn on datanucleus.query.sql.allowAll
> Both of above solutions should be able to fix the problem. The DELETE CASCADE 
> has to change schemas and prepare upgrade scripts. The second solutions added 
> maintenance cost if new tables added in the future releases.
> Please advice. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-6980) Drop table by using direct sql

2018-05-23 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-6980:
-
Attachment: HIVE-6980.7.patch

> Drop table by using direct sql
> --
>
> Key: HIVE-6980
> URL: https://issues.apache.org/jira/browse/HIVE-6980
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.12.0
>Reporter: Selina Zhang
>Assignee: Peter Vary
>Priority: Major
> Attachments: HIVE-6980.2.patch, HIVE-6980.3.patch, HIVE-6980.4.patch, 
> HIVE-6980.5.patch, HIVE-6980.6.patch, HIVE-6980.7.patch, HIVE-6980.patch, 
> drop_table_after.png, drop_table_before.png
>
>
> Dropping table which has lots of partitions is slow. Even after applying the 
> patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 
> The fixes come with two parts:
> 1. use directSQL to query the partitions protect mode;
> the current implementation needs to transfer the Partition object to client 
> and check the protect mode for each partition. I'd like to move this part of 
> logic to metastore. The check will be done by direct sql (if direct sql is 
> disabled, execute the same logic in the ObjectStore);
> 2. use directSQL to drop partitions for table;
> there maybe two solutions here:
> 1. add "DELETE CASCADE" in the schema. In this way we only need to delete 
> entries from partitions table use direct sql. May need to change 
> datanucleus.deletionPolicy = DataNucleus. 
> 2. clean up the dependent tables by issue DELETE statement. This also needs 
> to turn on datanucleus.query.sql.allowAll
> Both of above solutions should be able to fix the problem. The DELETE CASCADE 
> has to change schemas and prepare upgrade scripts. The second solutions added 
> maintenance cost if new tables added in the future releases.
> Please advice. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-6980) Drop table by using direct sql

2018-05-22 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-6980:
-
Attachment: HIVE-6980.6.patch

> Drop table by using direct sql
> --
>
> Key: HIVE-6980
> URL: https://issues.apache.org/jira/browse/HIVE-6980
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.12.0
>Reporter: Selina Zhang
>Assignee: Peter Vary
>Priority: Major
> Attachments: HIVE-6980.2.patch, HIVE-6980.3.patch, HIVE-6980.4.patch, 
> HIVE-6980.5.patch, HIVE-6980.6.patch, HIVE-6980.patch, drop_table_after.png, 
> drop_table_before.png
>
>
> Dropping table which has lots of partitions is slow. Even after applying the 
> patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 
> The fixes come with two parts:
> 1. use directSQL to query the partitions protect mode;
> the current implementation needs to transfer the Partition object to client 
> and check the protect mode for each partition. I'd like to move this part of 
> logic to metastore. The check will be done by direct sql (if direct sql is 
> disabled, execute the same logic in the ObjectStore);
> 2. use directSQL to drop partitions for table;
> there maybe two solutions here:
> 1. add "DELETE CASCADE" in the schema. In this way we only need to delete 
> entries from partitions table use direct sql. May need to change 
> datanucleus.deletionPolicy = DataNucleus. 
> 2. clean up the dependent tables by issue DELETE statement. This also needs 
> to turn on datanucleus.query.sql.allowAll
> Both of above solutions should be able to fix the problem. The DELETE CASCADE 
> has to change schemas and prepare upgrade scripts. The second solutions added 
> maintenance cost if new tables added in the future releases.
> Please advice. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-6980) Drop table by using direct sql

2018-05-17 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-6980:
-
Attachment: HIVE-6980.5.patch

> Drop table by using direct sql
> --
>
> Key: HIVE-6980
> URL: https://issues.apache.org/jira/browse/HIVE-6980
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.12.0
>Reporter: Selina Zhang
>Assignee: Peter Vary
>Priority: Major
> Attachments: HIVE-6980.2.patch, HIVE-6980.3.patch, HIVE-6980.4.patch, 
> HIVE-6980.5.patch, HIVE-6980.patch, drop_table_after.png, 
> drop_table_before.png
>
>
> Dropping table which has lots of partitions is slow. Even after applying the 
> patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 
> The fixes come with two parts:
> 1. use directSQL to query the partitions protect mode;
> the current implementation needs to transfer the Partition object to client 
> and check the protect mode for each partition. I'd like to move this part of 
> logic to metastore. The check will be done by direct sql (if direct sql is 
> disabled, execute the same logic in the ObjectStore);
> 2. use directSQL to drop partitions for table;
> there maybe two solutions here:
> 1. add "DELETE CASCADE" in the schema. In this way we only need to delete 
> entries from partitions table use direct sql. May need to change 
> datanucleus.deletionPolicy = DataNucleus. 
> 2. clean up the dependent tables by issue DELETE statement. This also needs 
> to turn on datanucleus.query.sql.allowAll
> Both of above solutions should be able to fix the problem. The DELETE CASCADE 
> has to change schemas and prepare upgrade scripts. The second solutions added 
> maintenance cost if new tables added in the future releases.
> Please advice. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-6980) Drop table by using direct sql

2018-05-13 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-6980:
-
Attachment: HIVE-6980.4.patch

> Drop table by using direct sql
> --
>
> Key: HIVE-6980
> URL: https://issues.apache.org/jira/browse/HIVE-6980
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.12.0
>Reporter: Selina Zhang
>Assignee: Peter Vary
>Priority: Major
> Attachments: HIVE-6980.2.patch, HIVE-6980.3.patch, HIVE-6980.4.patch, 
> HIVE-6980.patch, drop_table_after.png, drop_table_before.png
>
>
> Dropping table which has lots of partitions is slow. Even after applying the 
> patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 
> The fixes come with two parts:
> 1. use directSQL to query the partitions protect mode;
> the current implementation needs to transfer the Partition object to client 
> and check the protect mode for each partition. I'd like to move this part of 
> logic to metastore. The check will be done by direct sql (if direct sql is 
> disabled, execute the same logic in the ObjectStore);
> 2. use directSQL to drop partitions for table;
> there maybe two solutions here:
> 1. add "DELETE CASCADE" in the schema. In this way we only need to delete 
> entries from partitions table use direct sql. May need to change 
> datanucleus.deletionPolicy = DataNucleus. 
> 2. clean up the dependent tables by issue DELETE statement. This also needs 
> to turn on datanucleus.query.sql.allowAll
> Both of above solutions should be able to fix the problem. The DELETE CASCADE 
> has to change schemas and prepare upgrade scripts. The second solutions added 
> maintenance cost if new tables added in the future releases.
> Please advice. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-6980) Drop table by using direct sql

2018-05-02 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-6980:
-
Attachment: HIVE-6980.3.patch

> Drop table by using direct sql
> --
>
> Key: HIVE-6980
> URL: https://issues.apache.org/jira/browse/HIVE-6980
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.12.0
>Reporter: Selina Zhang
>Assignee: Peter Vary
>Priority: Major
> Attachments: HIVE-6980.2.patch, HIVE-6980.3.patch, HIVE-6980.patch, 
> drop_table_after.png, drop_table_before.png
>
>
> Dropping table which has lots of partitions is slow. Even after applying the 
> patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 
> The fixes come with two parts:
> 1. use directSQL to query the partitions protect mode;
> the current implementation needs to transfer the Partition object to client 
> and check the protect mode for each partition. I'd like to move this part of 
> logic to metastore. The check will be done by direct sql (if direct sql is 
> disabled, execute the same logic in the ObjectStore);
> 2. use directSQL to drop partitions for table;
> there maybe two solutions here:
> 1. add "DELETE CASCADE" in the schema. In this way we only need to delete 
> entries from partitions table use direct sql. May need to change 
> datanucleus.deletionPolicy = DataNucleus. 
> 2. clean up the dependent tables by issue DELETE statement. This also needs 
> to turn on datanucleus.query.sql.allowAll
> Both of above solutions should be able to fix the problem. The DELETE CASCADE 
> has to change schemas and prepare upgrade scripts. The second solutions added 
> maintenance cost if new tables added in the future releases.
> Please advice. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-6980) Drop table by using direct sql

2018-05-02 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-6980:
-
Attachment: (was: HIVE-6980.3.patch)

> Drop table by using direct sql
> --
>
> Key: HIVE-6980
> URL: https://issues.apache.org/jira/browse/HIVE-6980
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.12.0
>Reporter: Selina Zhang
>Assignee: Peter Vary
>Priority: Major
> Attachments: HIVE-6980.2.patch, HIVE-6980.3.patch, HIVE-6980.patch, 
> drop_table_after.png, drop_table_before.png
>
>
> Dropping table which has lots of partitions is slow. Even after applying the 
> patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 
> The fixes come with two parts:
> 1. use directSQL to query the partitions protect mode;
> the current implementation needs to transfer the Partition object to client 
> and check the protect mode for each partition. I'd like to move this part of 
> logic to metastore. The check will be done by direct sql (if direct sql is 
> disabled, execute the same logic in the ObjectStore);
> 2. use directSQL to drop partitions for table;
> there maybe two solutions here:
> 1. add "DELETE CASCADE" in the schema. In this way we only need to delete 
> entries from partitions table use direct sql. May need to change 
> datanucleus.deletionPolicy = DataNucleus. 
> 2. clean up the dependent tables by issue DELETE statement. This also needs 
> to turn on datanucleus.query.sql.allowAll
> Both of above solutions should be able to fix the problem. The DELETE CASCADE 
> has to change schemas and prepare upgrade scripts. The second solutions added 
> maintenance cost if new tables added in the future releases.
> Please advice. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-6980) Drop table by using direct sql

2018-05-02 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-6980:
-
Attachment: (was: HIVE-6980.3.patch)

> Drop table by using direct sql
> --
>
> Key: HIVE-6980
> URL: https://issues.apache.org/jira/browse/HIVE-6980
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.12.0
>Reporter: Selina Zhang
>Assignee: Peter Vary
>Priority: Major
> Attachments: HIVE-6980.2.patch, HIVE-6980.3.patch, HIVE-6980.patch, 
> drop_table_after.png, drop_table_before.png
>
>
> Dropping table which has lots of partitions is slow. Even after applying the 
> patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 
> The fixes come with two parts:
> 1. use directSQL to query the partitions protect mode;
> the current implementation needs to transfer the Partition object to client 
> and check the protect mode for each partition. I'd like to move this part of 
> logic to metastore. The check will be done by direct sql (if direct sql is 
> disabled, execute the same logic in the ObjectStore);
> 2. use directSQL to drop partitions for table;
> there maybe two solutions here:
> 1. add "DELETE CASCADE" in the schema. In this way we only need to delete 
> entries from partitions table use direct sql. May need to change 
> datanucleus.deletionPolicy = DataNucleus. 
> 2. clean up the dependent tables by issue DELETE statement. This also needs 
> to turn on datanucleus.query.sql.allowAll
> Both of above solutions should be able to fix the problem. The DELETE CASCADE 
> has to change schemas and prepare upgrade scripts. The second solutions added 
> maintenance cost if new tables added in the future releases.
> Please advice. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-6980) Drop table by using direct sql

2018-05-02 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-6980:
-
Attachment: HIVE-6980.3.patch

> Drop table by using direct sql
> --
>
> Key: HIVE-6980
> URL: https://issues.apache.org/jira/browse/HIVE-6980
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.12.0
>Reporter: Selina Zhang
>Assignee: Peter Vary
>Priority: Major
> Attachments: HIVE-6980.2.patch, HIVE-6980.3.patch, HIVE-6980.3.patch, 
> HIVE-6980.patch, drop_table_after.png, drop_table_before.png
>
>
> Dropping table which has lots of partitions is slow. Even after applying the 
> patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 
> The fixes come with two parts:
> 1. use directSQL to query the partitions protect mode;
> the current implementation needs to transfer the Partition object to client 
> and check the protect mode for each partition. I'd like to move this part of 
> logic to metastore. The check will be done by direct sql (if direct sql is 
> disabled, execute the same logic in the ObjectStore);
> 2. use directSQL to drop partitions for table;
> there maybe two solutions here:
> 1. add "DELETE CASCADE" in the schema. In this way we only need to delete 
> entries from partitions table use direct sql. May need to change 
> datanucleus.deletionPolicy = DataNucleus. 
> 2. clean up the dependent tables by issue DELETE statement. This also needs 
> to turn on datanucleus.query.sql.allowAll
> Both of above solutions should be able to fix the problem. The DELETE CASCADE 
> has to change schemas and prepare upgrade scripts. The second solutions added 
> maintenance cost if new tables added in the future releases.
> Please advice. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-6980) Drop table by using direct sql

2018-05-02 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-6980:
-
Attachment: HIVE-6980.3.patch

> Drop table by using direct sql
> --
>
> Key: HIVE-6980
> URL: https://issues.apache.org/jira/browse/HIVE-6980
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.12.0
>Reporter: Selina Zhang
>Assignee: Peter Vary
>Priority: Major
> Attachments: HIVE-6980.2.patch, HIVE-6980.3.patch, HIVE-6980.patch, 
> drop_table_after.png, drop_table_before.png
>
>
> Dropping table which has lots of partitions is slow. Even after applying the 
> patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 
> The fixes come with two parts:
> 1. use directSQL to query the partitions protect mode;
> the current implementation needs to transfer the Partition object to client 
> and check the protect mode for each partition. I'd like to move this part of 
> logic to metastore. The check will be done by direct sql (if direct sql is 
> disabled, execute the same logic in the ObjectStore);
> 2. use directSQL to drop partitions for table;
> there maybe two solutions here:
> 1. add "DELETE CASCADE" in the schema. In this way we only need to delete 
> entries from partitions table use direct sql. May need to change 
> datanucleus.deletionPolicy = DataNucleus. 
> 2. clean up the dependent tables by issue DELETE statement. This also needs 
> to turn on datanucleus.query.sql.allowAll
> Both of above solutions should be able to fix the problem. The DELETE CASCADE 
> has to change schemas and prepare upgrade scripts. The second solutions added 
> maintenance cost if new tables added in the future releases.
> Please advice. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-6980) Drop table by using direct sql

2018-05-02 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-6980:
-
Attachment: drop_table_after.png

> Drop table by using direct sql
> --
>
> Key: HIVE-6980
> URL: https://issues.apache.org/jira/browse/HIVE-6980
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.12.0
>Reporter: Selina Zhang
>Assignee: Peter Vary
>Priority: Major
> Attachments: HIVE-6980.2.patch, HIVE-6980.patch, 
> drop_table_after.png, drop_table_before.png
>
>
> Dropping table which has lots of partitions is slow. Even after applying the 
> patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 
> The fixes come with two parts:
> 1. use directSQL to query the partitions protect mode;
> the current implementation needs to transfer the Partition object to client 
> and check the protect mode for each partition. I'd like to move this part of 
> logic to metastore. The check will be done by direct sql (if direct sql is 
> disabled, execute the same logic in the ObjectStore);
> 2. use directSQL to drop partitions for table;
> there maybe two solutions here:
> 1. add "DELETE CASCADE" in the schema. In this way we only need to delete 
> entries from partitions table use direct sql. May need to change 
> datanucleus.deletionPolicy = DataNucleus. 
> 2. clean up the dependent tables by issue DELETE statement. This also needs 
> to turn on datanucleus.query.sql.allowAll
> Both of above solutions should be able to fix the problem. The DELETE CASCADE 
> has to change schemas and prepare upgrade scripts. The second solutions added 
> maintenance cost if new tables added in the future releases.
> Please advice. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-6980) Drop table by using direct sql

2018-05-02 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-6980:
-
Attachment: drop_table_before.png

> Drop table by using direct sql
> --
>
> Key: HIVE-6980
> URL: https://issues.apache.org/jira/browse/HIVE-6980
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.12.0
>Reporter: Selina Zhang
>Assignee: Peter Vary
>Priority: Major
> Attachments: HIVE-6980.2.patch, HIVE-6980.patch, drop_table_before.png
>
>
> Dropping table which has lots of partitions is slow. Even after applying the 
> patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 
> The fixes come with two parts:
> 1. use directSQL to query the partitions protect mode;
> the current implementation needs to transfer the Partition object to client 
> and check the protect mode for each partition. I'd like to move this part of 
> logic to metastore. The check will be done by direct sql (if direct sql is 
> disabled, execute the same logic in the ObjectStore);
> 2. use directSQL to drop partitions for table;
> there maybe two solutions here:
> 1. add "DELETE CASCADE" in the schema. In this way we only need to delete 
> entries from partitions table use direct sql. May need to change 
> datanucleus.deletionPolicy = DataNucleus. 
> 2. clean up the dependent tables by issue DELETE statement. This also needs 
> to turn on datanucleus.query.sql.allowAll
> Both of above solutions should be able to fix the problem. The DELETE CASCADE 
> has to change schemas and prepare upgrade scripts. The second solutions added 
> maintenance cost if new tables added in the future releases.
> Please advice. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-6980) Drop table by using direct sql

2018-04-26 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-6980:
-
Attachment: HIVE-6980.2.patch

> Drop table by using direct sql
> --
>
> Key: HIVE-6980
> URL: https://issues.apache.org/jira/browse/HIVE-6980
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.12.0
>Reporter: Selina Zhang
>Assignee: Peter Vary
>Priority: Major
> Attachments: HIVE-6980.2.patch, HIVE-6980.patch
>
>
> Dropping table which has lots of partitions is slow. Even after applying the 
> patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 
> The fixes come with two parts:
> 1. use directSQL to query the partitions protect mode;
> the current implementation needs to transfer the Partition object to client 
> and check the protect mode for each partition. I'd like to move this part of 
> logic to metastore. The check will be done by direct sql (if direct sql is 
> disabled, execute the same logic in the ObjectStore);
> 2. use directSQL to drop partitions for table;
> there maybe two solutions here:
> 1. add "DELETE CASCADE" in the schema. In this way we only need to delete 
> entries from partitions table use direct sql. May need to change 
> datanucleus.deletionPolicy = DataNucleus. 
> 2. clean up the dependent tables by issue DELETE statement. This also needs 
> to turn on datanucleus.query.sql.allowAll
> Both of above solutions should be able to fix the problem. The DELETE CASCADE 
> has to change schemas and prepare upgrade scripts. The second solutions added 
> maintenance cost if new tables added in the future releases.
> Please advice. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-6980) Drop table by using direct sql

2018-04-25 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-6980:
-
Status: Patch Available  (was: Open)

First version of the patch.

Splits {{getPartitionsViaSqlFilterInternal}} to:
* getPartitionIdsViaSqlFilter - which returns the partition ids
* getPartitionsFromPartitionIds - which returns the partition data for the 
partitions

Creates {{dropPartitionsByPartitionIds}} which drops the partitions by 
directSQL commands

Creates a {{dropPartitionsViaSqlFilter}} using {{getPartitionIdsViaSqlFilter}} 
and {{dropPartitionsByPartitionIds}}.

Modifies the ObjectStore to drop partitions with directsql if possible.


> Drop table by using direct sql
> --
>
> Key: HIVE-6980
> URL: https://issues.apache.org/jira/browse/HIVE-6980
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.12.0
>Reporter: Selina Zhang
>Assignee: Peter Vary
>Priority: Major
> Attachments: HIVE-6980.patch
>
>
> Dropping table which has lots of partitions is slow. Even after applying the 
> patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 
> The fixes come with two parts:
> 1. use directSQL to query the partitions protect mode;
> the current implementation needs to transfer the Partition object to client 
> and check the protect mode for each partition. I'd like to move this part of 
> logic to metastore. The check will be done by direct sql (if direct sql is 
> disabled, execute the same logic in the ObjectStore);
> 2. use directSQL to drop partitions for table;
> there maybe two solutions here:
> 1. add "DELETE CASCADE" in the schema. In this way we only need to delete 
> entries from partitions table use direct sql. May need to change 
> datanucleus.deletionPolicy = DataNucleus. 
> 2. clean up the dependent tables by issue DELETE statement. This also needs 
> to turn on datanucleus.query.sql.allowAll
> Both of above solutions should be able to fix the problem. The DELETE CASCADE 
> has to change schemas and prepare upgrade scripts. The second solutions added 
> maintenance cost if new tables added in the future releases.
> Please advice. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-6980) Drop table by using direct sql

2018-04-25 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-6980:
-
Attachment: HIVE-6980.patch

> Drop table by using direct sql
> --
>
> Key: HIVE-6980
> URL: https://issues.apache.org/jira/browse/HIVE-6980
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.12.0
>Reporter: Selina Zhang
>Assignee: Peter Vary
>Priority: Major
> Attachments: HIVE-6980.patch
>
>
> Dropping table which has lots of partitions is slow. Even after applying the 
> patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 
> The fixes come with two parts:
> 1. use directSQL to query the partitions protect mode;
> the current implementation needs to transfer the Partition object to client 
> and check the protect mode for each partition. I'd like to move this part of 
> logic to metastore. The check will be done by direct sql (if direct sql is 
> disabled, execute the same logic in the ObjectStore);
> 2. use directSQL to drop partitions for table;
> there maybe two solutions here:
> 1. add "DELETE CASCADE" in the schema. In this way we only need to delete 
> entries from partitions table use direct sql. May need to change 
> datanucleus.deletionPolicy = DataNucleus. 
> 2. clean up the dependent tables by issue DELETE statement. This also needs 
> to turn on datanucleus.query.sql.allowAll
> Both of above solutions should be able to fix the problem. The DELETE CASCADE 
> has to change schemas and prepare upgrade scripts. The second solutions added 
> maintenance cost if new tables added in the future releases.
> Please advice. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)