dengzhhu653 commented on code in PR #4228:
URL: https://github.com/apache/hive/pull/4228#discussion_r1192192469
##########
iceberg/iceberg-handler/src/test/results/positive/col_stats.q.out:
##########
@@ -339,17 +339,16 @@ POSTHOOK: type: DESCTABLE
POSTHOOK: Input: default@tbl_ice_puffin
col_name a
data_type int
-min 1
-max 333
-num_nulls 0
-distinct_count 7
+min
+max
+num_nulls
+distinct_count
Review Comment:
No, `desc formatted tbl_ice_puffin a` doesn't fetch the stats from the
Puffin files, even with `hive.iceberg.stats.source=iceberg`; instead it goes
to the metastore for the stats.
`tbl_ice_puffin` is an external table that is recreated (and inserted into)
multiple times before the `desc` query. When the table is created this time,
the legacy data files left behind make HMS believe that the column stats are
stale, so the stats of the subsequent insertion ("values (1, 'one', 50),
(2, 'two', 51),(2, 'two', 51),(2, 'two', 51), (3, 'three', 52),
(4, 'four', 53)") can't be merged.
There is an `explain select * from tbl_ice_puffin order by a, b, c;` before
the `desc`, and as we can see from it, the stats stored in the Puffin files
are not removed.
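
For reference, the sequence I am describing looks roughly like the sketch
below. It is assembled from the statements quoted in this comment, not copied
from the .q file; the `insert into` form is my assumption, and the earlier
create/insert rounds of the table are left out.

```sql
-- Sketch only: assumes tbl_ice_puffin already exists at this point
-- and that legacy data files from an earlier round are still present.
set hive.iceberg.stats.source=iceberg;

-- The insertion whose column stats cannot be merged in HMS
-- (the "insert into" form is an assumption).
insert into tbl_ice_puffin values
  (1, 'one', 50), (2, 'two', 51), (2, 'two', 51),
  (2, 'two', 51), (3, 'three', 52), (4, 'four', 53);

-- The plan here still reflects the stats stored in the Puffin files.
explain select * from tbl_ice_puffin order by a, b, c;

-- This goes to the metastore for the column stats, not to the Puffin files.
desc formatted tbl_ice_puffin a;
```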