[ https://issues.apache.org/jira/browse/HIVE-23781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stamatis Zampetakis reassigned HIVE-23781:
------------------------------------------
> Incomplete partition column stats in CachedStore may lead to wrong aggregate stats
> ----------------------------------------------------------------------------------
>
> Key: HIVE-23781
> URL: https://issues.apache.org/jira/browse/HIVE-23781
> Project: Hive
> Issue Type: Bug
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
>
> Requesting aggregate stats from the Metastore
> ({{RawStore#get_aggr_stats_for}}) may return wrong results when the backing
> implementation is CachedStore and column statistics are missing from the
> cache.
>
> The suspicious code lies in {{CachedStore#mergeColStatsForPartitions}},
> which returns an [empty object|https://github.com/apache/hive/blob/31ee14644bf6105360d6266baa8c6c8060d38ea3/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java#L2267]
> when no stats are found in the cache. The consumer treats this empty object
> as a valid result, so no additional lookup is performed in the rawstore to
> fetch the actual values.
> Moreover, when the cache holds stats for only some of the requested
> partitions, the result will be wrong (assuming the underlying rawstore has
> stats for the remaining ones), since the partitions missing from the cache
> are not fetched from the rawstore either.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)