[ 
https://issues.apache.org/jira/browse/HIVE-23781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-23781:
------------------------------------------


> Incomplete partition column stats in CachedStore may lead to wrong aggregate 
> stats
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-23781
>                 URL: https://issues.apache.org/jira/browse/HIVE-23781
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>
> Requesting aggregate stats from the Metastore 
> ({{RawStore#get_aggr_stats_for}}) may return wrong results when the backing 
> implementation is CachedStore and column statistics are missing from the 
> cache.
>  
> The suspicious code is in {{CachedStore#mergeColStatsForPartitions}}, 
> which returns an [empty 
> object|https://github.com/apache/hive/blob/31ee14644bf6105360d6266baa8c6c8060d38ea3/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java#L2267]
>  when no stats are found in the cache. The consumer treats this as a valid 
> value, so no additional lookup is performed in the rawstore to fetch 
> the actual values.
> Moreover, when the cache holds values for only some of the requested 
> partitions, the result will be wrong, assuming that the underlying 
> rawstore has information about the requested partitions.
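
The two failure modes described above can be illustrated with a minimal sketch. This is a simplified, hypothetical model and not Hive's actual code: the class {{AggrStatsSketch}}, its methods, and the use of a single {{Long}} per partition in place of real column statistics are all assumptions made for illustration. The guard shown is only one possible way to avoid the problem, not necessarily the fix adopted for this issue.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the reported behavior.
// None of these names belong to Hive's real API.
class AggrStatsSketch {

    // Mirrors the reported problem: aggregate whatever the cache holds and
    // return it, even when some (or all) requested partitions are missing.
    static long cacheOnlyAggregate(Map<String, Long> cache, List<String> partNames) {
        long sum = 0;
        for (String part : partNames) {
            Long value = cache.get(part);
            if (value != null) {
                sum += value;
            }
        }
        return sum; // 0 when nothing is cached, yet the caller sees a "valid" value
    }

    // Aggregates directly from the backing store (the source of truth here).
    static long rawAggregate(Map<String, Long> rawStore, List<String> partNames) {
        long sum = 0;
        for (String part : partNames) {
            sum += rawStore.getOrDefault(part, 0L);
        }
        return sum;
    }

    // One possible guard (an assumption, not the confirmed fix): trust the
    // cache only if it covers every requested partition, else fall back.
    static long guardedAggregate(Map<String, Long> cache,
                                 Map<String, Long> rawStore,
                                 List<String> partNames) {
        if (!cache.keySet().containsAll(partNames)) {
            return rawAggregate(rawStore, partNames);
        }
        return cacheOnlyAggregate(cache, partNames);
    }

    public static void main(String[] args) {
        Map<String, Long> rawStore = Map.of("p=1", 10L, "p=2", 20L);
        Map<String, Long> cache = Map.of("p=1", 10L); // "p=2" is not cached
        List<String> requested = List.of("p=1", "p=2");

        System.out.println("cache only: " + cacheOnlyAggregate(cache, requested));
        System.out.println("guarded:    " + guardedAggregate(cache, rawStore, requested));
    }
}
```

In this model the cache-only path silently under-counts whenever a requested partition is absent from the cache, whereas the guarded path defers to the backing store in that case.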



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
