[jira] [Commented] (HIVE-23959) Provide an option to wipe out column stats for partitioned tables in case of column removal

2020-08-14 Thread Yushi Hayasaka (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177586#comment-17177586
 ] 

Yushi Hayasaka commented on HIVE-23959:
---

[~kgyrtkirk]
Thanks for the reply! I understand the key point, and I also confirmed the 
performance improvement with a test in my environment: replacing columns on a 
table with 50+ partitions used to time out on HS2 and now completes in about 
200 seconds with the patch applied. I was also concerned about the call to 
`getPartitions` (a slow function), but I found that with this patch it is not 
called when the direct SQL feature is enabled. Good!

> Provide an option to wipe out column stats for partitioned tables in case of 
> column removal
> ---
>
> Key: HIVE-23959
> URL: https://issues.apache.org/jira/browse/HIVE-23959
> Project: Hive
> Issue Type: Improvement
> Reporter: Zoltan Haindrich
> Assignee: Zoltan Haindrich
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 2h
> Remaining Estimate: 0h
>
> In case of column removal / replacement, an update for each partition is 
> necessary, which could take a while.
> The goal here is to provide an option to switch to bulk removal of column 
> statistics instead of working hard to retain as much as possible from the old 
> stats.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23959) Provide an option to wipe out column stats for partitioned tables in case of column removal

2020-08-03 Thread Yushi Hayasaka (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170498#comment-17170498
 ] 

Yushi Hayasaka commented on HIVE-23959:
---

Hello, I'm interested in this issue since we are having difficulty with it.
Just curious, how does it affect performance?
Also, it seems to call `clearColumnStatsState` instead of 
`updateOrGetPartitionColumnStats` for partitions. I think this is where the 
performance improvement comes from.
Is that correct? Or does it bring any other improvements too?

> Provide an option to wipe out column stats for partitioned tables in case of 
> column removal
> ---
>
> Key: HIVE-23959
> URL: https://issues.apache.org/jira/browse/HIVE-23959
> Project: Hive
> Issue Type: Improvement
> Reporter: Zoltan Haindrich
> Assignee: Zoltan Haindrich
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> In case of column removal / replacement, an update for each partition is 
> necessary, which could take a while.
> The goal here is to provide an option to switch to bulk removal of column 
> statistics instead of working hard to retain as much as possible from the old 
> stats.





[jira] [Commented] (HIVE-23806) Avoid clearing column stat states in all partition in case schema is extended

2020-07-07 Thread Yushi Hayasaka (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152623#comment-17152623
 ] 

Yushi Hayasaka commented on HIVE-23806:
---

Hello, I think this change does not improve performance when dropping a 
column, right?

> Avoid clearing column stat states in all partition in case schema is extended
> -
>
> Key: HIVE-23806
> URL: https://issues.apache.org/jira/browse/HIVE-23806
> Project: Hive
> Issue Type: Improvement
> Reporter: Zoltan Haindrich
> Assignee: Zoltan Haindrich
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In case there are many partitions, adding a new column without cascade may 
> take a while, because we want to make sure in schema evolution cases that we 
> don't reuse stats later on by mistake.
> However, this is not necessary in case the schema is extended.


