[jira] [Commented] (HIVE-23959) Provide an option to wipe out column stats for partitioned tables in case of column removal

2022-01-18 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17477841#comment-17477841
 ] 

Stamatis Zampetakis commented on HIVE-23959:


Before this change, a DDL statement updating a column in a partitioned table
would remove the statistics for the updated column from every partition but
would leave the stats for other columns intact.

After this change, if the appropriate configuration property is set, updating a 
column removes *all* partition statistics (for all columns of the table).

[~kgyrtkirk] Is my understanding correct, or did I miss something?
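
If the reading above is right, the difference between the two behaviours can be modeled with a small sketch. This is only an illustration under stated assumptions: the dictionary layout, the function name `drop_stats_on_column_change`, and the `wipe_all` flag are hypothetical stand-ins for the metastore state and the configuration property, not Hive APIs.

```python
from typing import Dict

# Hypothetical in-memory model: partition name -> {column name -> stats}.
PartitionStats = Dict[str, Dict[str, dict]]


def drop_stats_on_column_change(
    stats: PartitionStats, changed_column: str, wipe_all: bool
) -> PartitionStats:
    """Return the stats left after a DDL statement changes `changed_column`.

    `wipe_all` stands in for the configuration property discussed in this
    issue; it is an assumption of this sketch, not a real Hive setting name.
    """
    if wipe_all:
        # Option enabled: every partition loses the stats of every column.
        return {partition: {} for partition in stats}
    # Option disabled: only the changed column's stats are dropped.
    return {
        partition: {col: s for col, s in cols.items() if col != changed_column}
        for partition, cols in stats.items()
    }


stats = {
    "ds=2020-08-01": {"a": {"ndv": 10}, "b": {"ndv": 3}},
    "ds=2020-08-02": {"a": {"ndv": 12}, "b": {"ndv": 4}},
}
retained = drop_stats_on_column_change(stats, "a", wipe_all=False)
wiped = drop_stats_on_column_change(stats, "a", wipe_all=True)
```

In this model `retained` still holds the stats of column `b` in both partitions, while `wiped` holds no column stats at all.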

> Provide an option to wipe out column stats for partitioned tables in case of 
> column removal
> ---
>
> Key: HIVE-23959
> URL: https://issues.apache.org/jira/browse/HIVE-23959
> Project: Hive
> Issue Type: Improvement
> Reporter: Zoltan Haindrich
> Assignee: Zoltan Haindrich
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 2h
> Remaining Estimate: 0h
>
> In case of column removal / replacement, an update for each partition is
> necessary, which could take a while.
> The goal here is to provide an option to switch to the bulk removal of column
> statistics instead of working hard to retain as much as possible from the old
> stats.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-23959) Provide an option to wipe out column stats for partitioned tables in case of column removal

2020-08-14 Thread Yushi Hayasaka (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177586#comment-17177586
 ] 

Yushi Hayasaka commented on HIVE-23959:
---

[~kgyrtkirk]
Thanks for the reply! I understand the key point now. I also confirmed the
performance improvement in my environment with a test that replaces columns on
a table with 50+ partitions: the operation timed out on HS2 before the patch
and took 200 seconds after applying it. I was also concerned about calls to
`getPartitions` (a slow function), but I found that with this patch it is not
called when the direct SQL feature is enabled. Good!



[jira] [Commented] (HIVE-23959) Provide an option to wipe out column stats for partitioned tables in case of column removal

2020-08-11 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175578#comment-17175578
 ] 

Zoltan Haindrich commented on HIVE-23959:
-

[~yhaya]: sorry, I missed your comment.

Yes, the key difference when this feature is enabled is that the careful 1-by-1
partition update is skipped; instead, it will remove all column statistics for
all partitions of the table. It will only execute a few queries, independent
of the number of partitions, so it will be quick.



[jira] [Commented] (HIVE-23959) Provide an option to wipe out column stats for partitioned tables in case of column removal

2020-08-03 Thread Yushi Hayasaka (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170498#comment-17170498
 ] 

Yushi Hayasaka commented on HIVE-23959:
---

Hello, I'm interested in this issue because we have been having difficulty
with it.
Just curious, how does it affect performance?
Also, it seems to call `clearColumnStatsState` instead of
`updateOrGetPartitionColumnStats` for partitions. I think that is where the
performance improvement comes from.
Is that correct? Or does it bring any other improvement too?
