Zhihua Deng created HIVE-29265:
----------------------------------

             Summary: UnsupportedDoubleException could leave a stale column 
marker in COLUMN_STATS_ACCURATE
                 Key: HIVE-29265
                 URL: https://issues.apache.org/jira/browse/HIVE-29265
             Project: Hive
          Issue Type: Bug
          Components: Statistics
            Reporter: Zhihua Deng


Take schema_evol_orc_nonvec_part.q as an example:
{code:java}
CREATE TABLE part_change_lower_to_higher_numeric_group_decimal_to_float_n7 (
    insert_num int,
    c1 decimal(38,18),
    c2 decimal(38,18),
    c3 float,
    b STRING)
PARTITIONED BY (part INT);

INSERT INTO TABLE part_change_lower_to_higher_numeric_group_decimal_to_float_n7
PARTITION (part=1)
SELECT insert_num, decimal1, decimal1, float1, 'original'
FROM schema_evolution_data_n25;
{code}
For column c3, the above query throws UnsupportedDoubleException while 
gathering the column stats. As a result, the stats for this column are ignored 
and no entry for it can be found in part_col_stats. In partition_params, 
however, the column stats for c3 are nevertheless marked as true: 
{"BASIC_STATS":"true","COLUMN_STATS":{"b":"true","c1":"true","c2":"true","c3":"true","insert_num":"true"}}

If a valid insert happens afterwards, the new column stats for c3 will take 
over while the marker still claims the stats are accurate, which makes the c3 
stats incorrect.
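
The mismatch described above can be illustrated with a small standalone check. 
This is only a sketch, not Hive code: the class and method names 
(StaleColumnMarkerCheck, markedColumns, staleMarkers) are hypothetical, the 
marker value is parsed with a simple regular expression instead of a JSON 
library, and the set of columns that actually have a stats entry is assumed to 
have been read from part_col_stats beforehand.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive: finds columns whose marker is "true"
// in the COLUMN_STATS_ACCURATE value but which have no stored stats entry.
class StaleColumnMarkerCheck {

    // Matches entries such as "c3":"true" inside the COLUMN_STATS object.
    private static final Pattern TRUE_ENTRY =
        Pattern.compile("\"([^\"]+)\"\\s*:\\s*\"true\"");

    /** Returns the columns marked "true" under COLUMN_STATS in the marker value. */
    static Set<String> markedColumns(String columnStatsAccurate) {
        Set<String> marked = new TreeSet<>();
        int start = columnStatsAccurate.indexOf("\"COLUMN_STATS\"");
        if (start < 0) {
            return marked; // no per-column markers at all
        }
        int open = columnStatsAccurate.indexOf('{', start);
        int close = open < 0 ? -1 : columnStatsAccurate.indexOf('}', open);
        if (open < 0 || close < 0) {
            return marked; // malformed value, treat as no markers
        }
        Matcher m = TRUE_ENTRY.matcher(columnStatsAccurate.substring(open + 1, close));
        while (m.find()) {
            marked.add(m.group(1));
        }
        return marked;
    }

    /** Returns marked columns that are missing from the stored stats entries. */
    static Set<String> staleMarkers(String columnStatsAccurate, Set<String> columnsWithStats) {
        Set<String> stale = markedColumns(columnStatsAccurate);
        stale.removeAll(columnsWithStats);
        return stale;
    }

    public static void main(String[] args) {
        // Marker value from the partition parameters shown in this report.
        String param = "{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":"
            + "{\"b\":\"true\",\"c1\":\"true\",\"c2\":\"true\","
            + "\"c3\":\"true\",\"insert_num\":\"true\"}}";
        // Assumed content of part_col_stats: every column except c3.
        Set<String> stored = Set.of("b", "c1", "c2", "insert_num");
        System.out.println(staleMarkers(param, stored)); // prints [c3]
    }
}
```

With the values from this report, the only stale marker found is c3, which is 
the column whose stats were dropped because of UnsupportedDoubleException.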

 


