Stamatis Zampetakis created HIVE-23781:
------------------------------------------
Summary: Incomplete partition column stats in CachedStore may lead
to wrong aggregate stats
Key: HIVE-23781
URL: https://issues.apache.org/jira/browse/HIVE-23781
Project: Hive
Issue Type: Bug
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis
Requesting aggregate stats from the Metastore ({{RawStore#get_aggr_stats_for}})
may return wrong results when the backing implementation is CachedStore and
column statistics are missing from the cache.
The suspicious code lies inside {{CachedStore#mergeColStatsForPartitions}}, which
returns an [empty
object|https://github.com/apache/hive/blob/31ee14644bf6105360d6266baa8c6c8060d38ea3/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java#L2267]
when no stats are found in the cache. The consumer treats this as a valid value,
so no additional lookup is performed in the rawstore to fetch the actual
values.
Moreover, when the cache holds values for some of the requested partitions but
not for all of them, the result will be wrong, assuming that the underlying
rawstore has information about the requested partitions.
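
To make both failure modes concrete, here is a minimal, self-contained sketch. It does not use the real Hive classes: the maps standing in for the cache and the rawstore, the per-partition {{Long}} statistic (think of something like a null count that is summed across partitions), and all method names are hypothetical. The fallback shown is only one possible way to avoid the problem and is not necessarily how the issue will be fixed in {{CachedStore}}.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class AggrStatsFallbackSketch {

    /**
     * Mirrors the problematic behaviour: aggregates whatever the cache holds.
     * Missing partitions are silently skipped, so an empty cache yields 0,
     * which the caller cannot distinguish from a real aggregate.
     */
    static long aggregateFromCacheOnly(Map<String, Long> cache, List<String> partitions) {
        long sum = 0;
        for (String partition : partitions) {
            Long value = cache.get(partition);
            if (value != null) {
                sum += value;
            }
        }
        return sum;
    }

    /**
     * Returns an aggregate only if the cache covers every requested partition;
     * otherwise returns Optional.empty() so the caller knows to look elsewhere.
     */
    static Optional<Long> aggregateIfComplete(Map<String, Long> cache, List<String> partitions) {
        long sum = 0;
        for (String partition : partitions) {
            Long value = cache.get(partition);
            if (value == null) {
                return Optional.empty();
            }
            sum += value;
        }
        return Optional.of(sum);
    }

    /** Hypothetical rawstore lookup: assumed to know about all partitions it is asked for. */
    static long aggregateFromRawStore(Map<String, Long> rawStore, List<String> partitions) {
        long sum = 0;
        for (String partition : partitions) {
            sum += rawStore.getOrDefault(partition, 0L);
        }
        return sum;
    }

    /** Uses the cache when it is complete and falls back to the rawstore otherwise. */
    static long aggregateWithFallback(Map<String, Long> cache, Map<String, Long> rawStore,
                                      List<String> partitions) {
        return aggregateIfComplete(cache, partitions)
                .orElseGet(() -> aggregateFromRawStore(rawStore, partitions));
    }

    public static void main(String[] args) {
        Map<String, Long> rawStore = Map.of("p=1", 10L, "p=2", 5L);
        Map<String, Long> partialCache = Map.of("p=1", 10L); // stats for p=2 are not cached
        List<String> requested = List.of("p=1", "p=2");

        // Partial cache: the cache-only aggregate misses p=2.
        System.out.println(aggregateFromCacheOnly(partialCache, requested)); // prints 10
        // Empty cache: the "empty" aggregate looks like a valid value.
        System.out.println(aggregateFromCacheOnly(Map.of(), requested)); // prints 0
        // With the completeness check, the rawstore is consulted instead.
        System.out.println(aggregateWithFallback(partialCache, rawStore, requested)); // prints 15
    }
}
```

In this sketch the expected aggregate is 15 in both scenarios; the cache-only path returns 10 with a partially populated cache and 0 with an empty one.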
--
This message was sent by Atlassian Jira
(v8.3.4#803005)