dengzhhu653 commented on code in PR #4228:
URL: https://github.com/apache/hive/pull/4228#discussion_r1192192469


##########
iceberg/iceberg-handler/src/test/results/positive/col_stats.q.out:
##########
@@ -339,17 +339,16 @@ POSTHOOK: type: DESCTABLE
 POSTHOOK: Input: default@tbl_ice_puffin
 col_name               a                   
 data_type              int                 
-min                    1                   
-max                    333                 
-num_nulls              0                   
-distinct_count         7                   
+min                                        
+max                                        
+num_nulls                                  
+distinct_count                             

Review Comment:
   `desc formatted tbl_ice_puffin a` doesn't fetch the stats from the Puffin 
files, even with `hive.iceberg.stats.source=iceberg`; instead it goes to the 
metastore for the stats.
   
   `tbl_ice_puffin` is an external table that is recreated (and inserted into) 
multiple times before the desc. When the table is created this time, the legacy 
data files left behind make HMS treat the column stats as stale (e.g., it can't 
assume that the row count is 0, or know the min/max of column a).
   As a result, the stats from the subsequent insertion ("values (1, 'one', 50), 
(2, 'two', 51), (2, 'two', 51), (2, 'two', 51), (3, 'three', 52), (4, 'four', 
53)") can't be merged.
   
   There is an `explain select * from tbl_ice_puffin order by a, b, c;` before 
the desc, and its output shows that the stats stored in the Puffin files have 
not been removed.
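   
   A rough sketch of the sequence described above, in case it helps to reproduce 
it. The column types and the `stored by iceberg` clause are assumptions inferred 
from the inserted values, not copied from the .q file, and the earlier 
create/insert rounds are elided:

   ```sql
   -- Setting named in the comment above: ask Hive to read column stats from Iceberg (Puffin).
   set hive.iceberg.stats.source=iceberg;

   -- Assumed schema (a int, b string, c int); the real test definition may differ.
   -- The table is external, so data files from earlier rounds can remain in place.
   drop table if exists tbl_ice_puffin;
   create external table tbl_ice_puffin (a int, b string, c int) stored by iceberg;

   -- The insertion whose stats cannot be merged into the HMS column stats.
   insert into tbl_ice_puffin values
     (1, 'one', 50), (2, 'two', 51), (2, 'two', 51), (2, 'two', 51),
     (3, 'three', 52), (4, 'four', 53);

   -- The plan still reflects the stats stored in the Puffin files ...
   explain select * from tbl_ice_puffin order by a, b, c;

   -- ... while this goes to the metastore and shows empty min/max/num_nulls/distinct_count.
   desc formatted tbl_ice_puffin a;
   ```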



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

