[ 
https://issues.apache.org/jira/browse/HIVE-29235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18024161#comment-18024161
 ] 

Jaeho Yoo edited comment on HIVE-29235 at 10/1/25 11:15 PM:
------------------------------------------------------------

[~dkuzmenko] 
It's basicStats. Hive estimates the stats in the absence of statistics.
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L275]

 

Also, when the optimizer fetches the row count, it sums the row counts of 
all partitions, which matches the reported value: 343662 + 2513459 = 2857121.
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/StatsOptimizer.java#L933-L958
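
To make the suspected behavior concrete, here is a minimal sketch. It is not the actual StatsOptimizer code: the class and method names are hypothetical, and the per-partition counts are the ones from the GROUP BY output in the issue description. It only contrasts summing every partition with counting the partition that the filter selects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Hive.
public class PartitionCountSketch {

    // Per-partition row counts, copied from the GROUP BY result in the issue.
    static Map<String, Long> rowsByLogDate() {
        Map<String, Long> rows = new LinkedHashMap<>();
        rows.put("2025-09-29", 343662L);
        rows.put("2025-09-30", 2513459L);
        return rows;
    }

    // Suspected behavior: every partition is summed and the filter is ignored.
    static long countIgnoringFilter(Map<String, Long> rows) {
        long total = 0;
        for (long n : rows.values()) {
            total += n;
        }
        return total;
    }

    // Expected behavior: only the partition matching the predicate contributes.
    static long countForPartition(Map<String, Long> rows, String logDate) {
        return rows.getOrDefault(logDate, 0L);
    }

    public static void main(String[] args) {
        Map<String, Long> rows = rowsByLogDate();
        System.out.println(countIgnoringFilter(rows));             // prints 2857121
        System.out.println(countForPartition(rows, "2025-09-29")); // prints 343662
    }
}
```

The first value is what both filtered queries return today; the second is what the query for log_date = '2025-09-29' should return.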


was (Author: jyoo94):
[~dkuzmenko] 
It's basicStats. It estimates stats in absence of statistics. 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L275

> Iceberg returns incorrect count value
> -------------------------------------
>
>                 Key: HIVE-29235
>                 URL: https://issues.apache.org/jira/browse/HIVE-29235
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Jaeho Yoo
>            Priority: Major
>
> For Iceberg tables, Hive tries to read partitionStatistics.
> If the table has none, Hive computes the count from default statistics, 
> which gives an incorrect result.
> We are using Hive 4.1.0.
> SELECT count(*), log_date FROM db1.tbl1 GROUP BY 2;
> +----------+-------------+
> | _c0      | log_date    |
> +----------+-------------+
> | 343662   | 2025-09-29  |
> | 2513459  | 2025-09-30  |
> +----------+-------------+
>
> SELECT count(*) FROM db1.tb1 WHERE log_date = '2025-09-29'; -- returns 2857121, expected 343662
> SELECT count(*) FROM db1.tb1 WHERE log_date = '2025-09-30'; -- returns 2857121, expected 2513459
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
