[jira] [Commented] (HIVE-23959) Provide an option to wipe out column stats for partitioned tables in case of column removal
[ https://issues.apache.org/jira/browse/HIVE-23959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17477841#comment-17477841 ]

Stamatis Zampetakis commented on HIVE-23959:
---

Before this change, a DDL statement updating a column in a partitioned table would remove the statistics for the updated column from every partition but would leave the stats for other columns intact. After this change, if the appropriate configuration property is set, updating a column removes *all* partition statistics (for all columns of the table).

[~kgyrtkirk] is my understanding correct or did I miss something?

> Provide an option to wipe out column stats for partitioned tables in case of column removal
> ---
>
>                 Key: HIVE-23959
>                 URL: https://issues.apache.org/jira/browse/HIVE-23959
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> In case of column removal / replacement, an update for each partition is necessary, which could take a while.
> The goal here is to provide an option to switch to the bulk removal of column statistics instead of working hard to retain as much as possible from the old stats.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Commented] (HIVE-23959) Provide an option to wipe out column stats for partitioned tables in case of column removal
[ https://issues.apache.org/jira/browse/HIVE-23959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177586#comment-17177586 ]

Yushi Hayasaka commented on HIVE-23959:
---

[~kgyrtkirk] Thanks for the reply! I understand the key point now. I also confirmed the performance improvement in my environment by replacing columns on a table with 50+ partitions: without the patch the operation timed out on HS2, and with the patch applied it finished in 200 seconds.

I was also concerned about the call to `getPartitions` (a slow function), but I found that it is not called when the direct SQL feature is enabled with this patch. Good!
[jira] [Commented] (HIVE-23959) Provide an option to wipe out column stats for partitioned tables in case of column removal
[ https://issues.apache.org/jira/browse/HIVE-23959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175578#comment-17175578 ]

Zoltan Haindrich commented on HIVE-23959:
---

[~yhaya]: sorry, I missed your comment.

Yes, the key difference when this feature is enabled is that the careful one-by-one partition update is skipped; instead, it removes all column statistics for all partitions of the table. It executes only a few queries, independent of the number of partitions, so it will be quick.
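
To make the trade-off described above concrete, here is a minimal sketch of the two strategies against a toy in-memory statistics store. It is not Hive metastore code: the class `ColumnStatsSketch` and its methods are hypothetical names invented for illustration, and the bulk path is modeled as a single statement although the comment above only says "a few queries".

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory model of partition column statistics. Hypothetical; not Hive metastore code. */
final class ColumnStatsSketch {
    // partition name -> (column name -> opaque statistics value)
    private final Map<String, Map<String, String>> statsByPartition = new HashMap<>();

    ColumnStatsSketch(int partitionCount, String... columns) {
        for (int i = 0; i < partitionCount; i++) {
            Map<String, String> columnStats = new HashMap<>();
            for (String column : columns) {
                columnStats.put(column, "stats");
            }
            statsByPartition.put("part=" + i, columnStats);
        }
    }

    /** Careful path: one simulated update per partition, dropping only the given column. */
    int removeColumnPerPartition(String column) {
        int statements = 0;
        for (Map<String, String> columnStats : statsByPartition.values()) {
            columnStats.remove(column);
            statements++; // cost grows with the number of partitions
        }
        return statements;
    }

    /** Bulk path: every column's statistics are dropped; modeled as one table-wide statement. */
    int wipeAllColumnStats() {
        for (Map<String, String> columnStats : statsByPartition.values()) {
            columnStats.clear();
        }
        return 1; // assumption: constant cost, independent of the partition count
    }

    /** Number of (partition, column) statistics entries still stored. */
    int remainingEntries() {
        int total = 0;
        for (Map<String, String> columnStats : statsByPartition.values()) {
            total += columnStats.size();
        }
        return total;
    }

    public static void main(String[] args) {
        ColumnStatsSketch careful = new ColumnStatsSketch(50, "a", "b", "c");
        System.out.println("per-partition statements: " + careful.removeColumnPerPartition("a"));
        System.out.println("entries kept: " + careful.remainingEntries());

        ColumnStatsSketch bulk = new ColumnStatsSketch(50, "a", "b", "c");
        System.out.println("bulk statements: " + bulk.wipeAllColumnStats());
        System.out.println("entries kept: " + bulk.remainingEntries());
    }
}
```

In this model the careful path keeps the statistics of the untouched columns but pays one update per partition, while the bulk path pays a constant cost and leaves no column statistics behind, so they would have to be recomputed afterwards.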
[jira] [Commented] (HIVE-23959) Provide an option to wipe out column stats for partitioned tables in case of column removal
[ https://issues.apache.org/jira/browse/HIVE-23959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170498#comment-17170498 ]

Yushi Hayasaka commented on HIVE-23959:
---

Hello, I'm interested in this issue since we are having difficulty with it. Just curious, how does it affect performance?

Also, the patch seems to call `clearColumnStatsState` instead of `updateOrGetPartitionColumnStats` for partitions. I think this is where the performance improvement comes from. Is that correct? Or does it bring any other improvement too?