[ https://issues.apache.org/jira/browse/HIVE-27190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18051800#comment-18051800 ]
Denys Kuzmenko edited comment on HIVE-27190 at 1/14/26 12:46 PM:
-----------------------------------------------------------------
Even better would be if you could provide a flamegraph, so we can pinpoint
where CPU cycles are being spent inefficiently.
Currently, for unpartitioned tables, each column stats object is stored in a
separate Puffin blob. For partitioned tables, a blob holds the whole
ColumnStatistics object.
We read and aggregate the blobs for the relevant partitions, then discard any
columns that are not part of the projection.
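
To make the read path above concrete, here is a rough sketch of the
"aggregate first, project afterwards" order for partitioned tables. It is
not the Hive or Iceberg code: ColStat, aggregateThenProject and the two
counters are hypothetical stand-ins, and the merge rule (summing counts) is
only an assumption chosen to keep the example small.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ColStatsProjectionSketch {

  // Hypothetical, simplified stand-in for one column's statistics.
  static final class ColStat {
    final String column;
    final long numNulls;
    final long rowCount;

    ColStat(String column, long numNulls, long rowCount) {
      this.column = column;
      this.numNulls = numNulls;
      this.rowCount = rowCount;
    }
  }

  // Each inner list stands for one partition's blob, which carries stats
  // for every column of that partition.
  static Map<String, ColStat> aggregateThenProject(
      List<List<ColStat>> partitionBlobs, Set<String> projectedColumns) {
    Map<String, ColStat> merged = new LinkedHashMap<>();
    for (List<ColStat> blob : partitionBlobs) {
      for (ColStat s : blob) {
        // Assumed merge rule: sum the counters across partitions.
        merged.merge(s.column, s, (a, b) ->
            new ColStat(a.column, a.numNulls + b.numNulls, a.rowCount + b.rowCount));
      }
    }
    // Non-projected columns are dropped only after all of them were merged.
    merged.keySet().retainAll(projectedColumns);
    return merged;
  }

  public static void main(String[] args) {
    List<List<ColStat>> blobs = Arrays.asList(
        Arrays.asList(new ColStat("a", 1, 10), new ColStat("b", 2, 10)),
        Arrays.asList(new ColStat("a", 3, 20), new ColStat("b", 0, 20)));
    Set<String> projection = new HashSet<>(Arrays.asList("a"));

    Map<String, ColStat> result = aggregateThenProject(blobs, projection);
    for (ColStat s : result.values()) {
      System.out.println(s.column + " nulls=" + s.numNulls + " rows=" + s.rowCount);
    }
  }
}
```

In this sketch the stats for column "b" are merged for every partition and
then thrown away, which is the kind of wasted work a flamegraph should make
visible if it matters in practice.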
was (Author: dkuzmenko):
Even better would be if you could provide a flamegraph, so we can pinpoint
where CPU cycles are being spent inefficiently.
Currently, for unpartitioned tables, each column stats object is stored in a
separate Puffin blob. For partitioned tables, a blob holds the whole
ColumnStatistics object.
We read and aggregate the blobs for the relevant partitions, then deserialize
the data and discard any columns that are not part of the projection.
> Implement col stats cache for hive iceberg table
> -------------------------------------------------
>
> Key: HIVE-27190
> URL: https://issues.apache.org/jira/browse/HIVE-27190
> Project: Hive
> Issue Type: Improvement
> Reporter: Simhadri Govindappa
> Assignee: Simhadri Govindappa
> Priority: Major
>