[ 
https://issues.apache.org/jira/browse/HIVE-27190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18052022#comment-18052022
 ] 

Denys Kuzmenko edited comment on HIVE-27190 at 1/15/26 9:42 AM:
----------------------------------------------------------------

[~lisoda], is this a read-path performance issue, with no writes involved and 
therefore no statistics updates?

Basic stats:
https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L475-L517

Table column stats:
https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L706-L728

Partition column stats (Hive 4.1+):
https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L731-L759

1. I don't see where this would call 
[BaseMetastoreTableOperations#refreshFromMetadataLocation|https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java#L184]

2. Not sure whether any of these functions are invoked multiple times during 
planning, but if they are, caching would definitely help with column statistics 
retrieval.

3. Iceberg API used to retrieve basic partition stats: 
https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L572-L573
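
To make point 2 concrete, here is a rough sketch of what such a cache could look like. It is not the HiveIcebergStorageHandler code and not a proposed patch: the class name SnapshotStatsCache, the "table@snapshotId" key format, and the loader callback are all hypothetical. It assumes that on a pure read path the statistics for a given table snapshot do not change, so they can be memoized per snapshot and loaded only once even if the lookup is invoked several times during planning.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * Hypothetical memoizing cache for column statistics.
 * Assumption: stats are immutable for a given (table, snapshot id) pair.
 */
final class SnapshotStatsCache<V> {

  // Key format "table@snapshotId" is an illustrative choice, not an existing convention.
  private final Map<String, V> cache = new ConcurrentHashMap<>();

  /** Returns the cached stats, invoking the loader only on the first request for this snapshot. */
  V get(String tableName, long snapshotId, Supplier<V> loader) {
    String key = tableName + "@" + snapshotId;
    // computeIfAbsent runs the loader at most once per key, even under concurrent access.
    return cache.computeIfAbsent(key, k -> loader.get());
  }

  /** Number of cached entries; useful for tests and for deciding on eviction. */
  int size() {
    return cache.size();
  }
}
```

A new snapshot id produces a new key, so a commit to the table naturally bypasses stale entries. A real implementation would also need eviction (size or time based), which is left out here.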



> Implement  col stats cache for hive iceberg table
> -------------------------------------------------
>
>                 Key: HIVE-27190
>                 URL: https://issues.apache.org/jira/browse/HIVE-27190
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Simhadri Govindappa
>            Assignee: Simhadri Govindappa
>            Priority: Major
>




