[ https://issues.apache.org/jira/browse/HIVE-27190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18052022#comment-18052022 ]

Denys Kuzmenko edited comment on HIVE-27190 at 1/15/26 10:34 AM:
-----------------------------------------------------------------

[~lisoda], is this a read-path performance issue, with no writes involved and therefore no statistics updates?

Basic stats: https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L475-L517
Table column stats: https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L706-L728
Partition column stats (Hive-4.1+): https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L731-L759

1. I don't see where this would call [BaseMetastoreTableOperations#refreshFromMetadataLocation|https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java#L184]
2. Iceberg API used to retrieve basic partition stats (Hive-4.1+): https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L572-L573
3. Not sure whether any of these functions are invoked multiple times during planning, but if they are, caching would definitely help with column statistics retrieval.

was (Author: dkuzmenko):
[~lisoda], is this a read-path performance issue, with no writes involved and therefore no statistics updates?

Basic stats: https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L475-L517
Table column stats: https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L706-L728
Partition column stats (Hive-4.1+): https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L731-L759

1. I don't see where this would call [BaseMetastoreTableOperations#refreshFromMetadataLocation|https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java#L184]
2. Iceberg API used to retrieve basic partition stats: https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L572-L573
3. Not sure whether any of these functions are invoked multiple times during planning, but if they are, caching would definitely help with column statistics retrieval.

> Implement col stats cache for hive iceberg table
> -------------------------------------------------
>
>                 Key: HIVE-27190
>                 URL: https://issues.apache.org/jira/browse/HIVE-27190
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Simhadri Govindappa
>            Assignee: Simhadri Govindappa
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
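
On point 3 of the comment: a minimal sketch of how repeated column-stats lookups during planning could be memoized, if it turns out they are invoked more than once. This is not Hive or Iceberg code. The class name ColStatsCache, the "table@snapshotId" key format, and the loader callback are all hypothetical, and keying on the snapshot id assumes that stats for a given snapshot do not change, which fits the read-only case raised in the comment but is not confirmed here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * Hypothetical sketch, not part of Hive: memoizes column statistics
 * per (table name, snapshot id) so repeated lookups during planning
 * reach the underlying stats reader at most once per snapshot.
 */
final class ColStatsCache<S> {
  private final Map<String, S> cache = new ConcurrentHashMap<>();

  /**
   * Returns the cached stats for the given table and snapshot, calling
   * the loader only when no entry exists yet. A loader that returns
   * null leaves the cache untouched, so missing stats are retried.
   */
  S get(String tableName, long snapshotId, Supplier<S> loader) {
    // Assumption: a new snapshot id implies stats may differ, so it is part of the key.
    String key = tableName + "@" + snapshotId;
    return cache.computeIfAbsent(key, k -> loader.get());
  }

  /** Number of cached (table, snapshot) entries. */
  int size() {
    return cache.size();
  }
}
```

A second call with the same table name and snapshot id returns the cached value without invoking the loader. A write that produces a new snapshot id misses the cache and loads fresh stats. Eviction of entries for old snapshots is left out of the sketch.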
