[
https://issues.apache.org/jira/browse/HIVE-29432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thomas Rebele updated HIVE-29432:
---------------------------------
Summary: Autogather column statistics missing for tables containing a
column with an unsupported type (was: Statistics missing for tables with a
TIMESTAMP WITH LOCAL TIME ZONE)
> Autogather column statistics missing for tables containing a column with an
> unsupported type
> --------------------------------------------------------------------------------------------
>
> Key: HIVE-29432
> URL: https://issues.apache.org/jira/browse/HIVE-29432
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.3.0
> Reporter: Thomas Rebele
> Priority: Major
>
> Given the following qfile:
> {code:sql}
> set hive.stats.kll.enable=true;
> set metastore.stats.fetch.bitvector=true;
> set metastore.stats.fetch.kll=true;
> set hive.stats.autogather=true;
> set hive.stats.column.autogather=true;
> CREATE TABLE test_stats0 (a int, b timestamp) STORED AS TEXTFILE;
> CREATE TABLE test_stats1 (a int, b timestamp with local time zone) STORED AS TEXTFILE;
> INSERT INTO test_stats0 (a, b) VALUES (1, "2020-11-02 00:00:00");
> INSERT INTO test_stats1 (a, b) VALUES (1, "2020-11-02 00:00:00");
> DESCRIBE FORMATTED test_stats0 a;
> DESCRIBE FORMATTED test_stats0 b;
> DESCRIBE FORMATTED test_stats1 a;
> DESCRIBE FORMATTED test_stats1 b;
> {code}
> The statistics for test_stats0 column a are computed successfully:
> {code:java}
> POSTHOOK: Input: default@test_stats0
> col_name a
> data_type int
> min 1
> max 1
> num_nulls 0
> distinct_count 1
> avg_col_len
> max_col_len
> num_trues
> num_falses
> bit_vector HL
> histogram Q1: 1, Q2: 1, Q3: 1
> {code}
> However, the statistics for test_stats1 column a are missing:
> {code:java}
> POSTHOOK: Input: default@test_stats1
> col_name a
> data_type int
> min
> max
> num_nulls
> distinct_count
> avg_col_len
> max_col_len
> num_trues
> num_falses
> bit_vector
> histogram
> {code}
> The same applies to column b: statistics are available for test_stats0 but
> not for test_stats1.
> Even if statistics for a TIMESTAMP WITH LOCAL TIME ZONE column cannot be
> computed, that should not prevent statistics from being gathered for the
> other columns of the table.
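>
> As a possible cross-check (not part of the original report), column
> statistics could be requested explicitly for the supported column only and
> compared with the autogather result. This is a sketch that assumes explicit
> ANALYZE is not affected by this issue, which has not been verified here:
> {code:sql}
> -- Explicitly gather column statistics for column a only, leaving out the
> -- TIMESTAMP WITH LOCAL TIME ZONE column b.
> ANALYZE TABLE test_stats1 COMPUTE STATISTICS FOR COLUMNS a;
> -- Populated min, max, and distinct_count here would suggest that the gap
> -- is specific to the autogather path.
> DESCRIBE FORMATTED test_stats1 a;
> {code}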