[ 
https://issues.apache.org/jira/browse/HIVE-27190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18052022#comment-18052022
 ] 

Denys Kuzmenko edited comment on HIVE-27190 at 1/15/26 10:34 AM:
-----------------------------------------------------------------

[~lisoda], is this a read-path performance issue, with no writes involved and 
therefore no statistics updates?

Basic stats:
https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L475-L517

Table column stats:
https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L706-L728

Partition column stats (Hive-4.1+):
https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L731-L759

1. I don't see where any of these code paths would call 
[BaseMetastoreTableOperations#refreshFromMetadataLocation|https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java#L184]

2. Iceberg API used to retrieve basic partition stats (Hive-4.1+): 
https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L572-L573

3. Not sure whether any of these functions are invoked multiple times during 
planning, but if they are, caching would definitely help with column statistics 
retrieval.
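
A minimal sketch of the kind of caching meant in point 3, assuming column stats can be keyed by table name plus Iceberg snapshot id, so that an entry never goes stale once the snapshot changes. Everything here (ColStatsCacheSketch, the key format, the loader callback) is hypothetical and is not existing Hive or Iceberg code; the loader stands in for whatever actually reads the stats.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names exist in Hive or Iceberg.
final class ColStatsCacheSketch<S> {

  // Key is "<table>@<snapshotId>" (an assumed format), so a new snapshot
  // naturally misses the cache instead of returning outdated stats.
  private final Map<String, S> cache = new ConcurrentHashMap<>();

  // Returns the cached stats for (table, snapshot); runs the loader only on a miss.
  S get(String tableName, long snapshotId, Supplier<S> loader) {
    return cache.computeIfAbsent(tableName + "@" + snapshotId, k -> loader.get());
  }

  public static void main(String[] args) {
    AtomicInteger loads = new AtomicInteger();
    ColStatsCacheSketch<List<String>> cache = new ColStatsCacheSketch<>();
    Supplier<List<String>> loader = () -> {
      loads.incrementAndGet(); // stands in for the expensive stats read
      return List.of("col_a", "col_b");
    };
    cache.get("db.tbl", 1L, loader);
    cache.get("db.tbl", 1L, loader); // same snapshot: served from the cache
    cache.get("db.tbl", 2L, loader); // new snapshot: loader runs again
    System.out.println("loads = " + loads.get()); // prints "loads = 2"
  }
}
```

This only pays off if the same stats are requested more than once per snapshot during planning, which is the open question above. The sketch has no eviction, so a real cache would also need a size or time bound.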







> Implement col stats cache for hive iceberg table
> ------------------------------------------------
>
>                 Key: HIVE-27190
>                 URL: https://issues.apache.org/jira/browse/HIVE-27190
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Simhadri Govindappa
>            Assignee: Simhadri Govindappa
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
