[ https://issues.apache.org/jira/browse/HIVE-27190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18051800#comment-18051800 ]
Denys Kuzmenko edited comment on HIVE-27190 at 1/14/26 12:47 PM:
-----------------------------------------------------------------
It would be even better if you could provide a flamegraph, so we can pinpoint
where CPU cycles are being spent inefficiently.
Currently, for unpartitioned tables, each column stats object is stored in a
separate Puffin blob. For partitioned tables, a blob holds the whole
ColumnStatistics object. We read and aggregate the blobs for the relevant
partitions, then discard any columns that are not part of the projection
(Puffin has no secondary indexes, so we cannot look up individual columns).
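To make the read path concrete, here is a rough sketch of the "read everything, aggregate, then prune" flow for partitioned tables. It is only an illustration, not the actual Hive code: ColStats, PartitionStatsSketch and aggregate are hypothetical names, the stats fields are simplified, and it assumes one blob per partition modelled as an in-memory map rather than real Puffin I/O.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PartitionStatsSketch {

    // Hypothetical, simplified stand-in for one column's statistics.
    // This is NOT Hive's ColumnStatisticsObj.
    static final class ColStats {
        final long numNulls;
        final long min;
        final long max;

        ColStats(long numNulls, long min, long max) {
            this.numNulls = numNulls;
            this.min = min;
            this.max = max;
        }

        // Combine stats of the same column from two partitions.
        ColStats merge(ColStats other) {
            return new ColStats(numNulls + other.numNulls,
                    Math.min(min, other.min), Math.max(max, other.max));
        }
    }

    // blobsByPartition models (as an assumption) one blob per partition,
    // each carrying stats for every column: partition -> (column -> stats).
    static Map<String, ColStats> aggregate(
            Map<String, Map<String, ColStats>> blobsByPartition,
            List<String> relevantPartitions,
            Set<String> projectedColumns) {
        Map<String, ColStats> merged = new HashMap<>();
        for (String partition : relevantPartitions) {
            Map<String, ColStats> blob = blobsByPartition.get(partition);
            if (blob == null) {
                continue; // no stats recorded for this partition
            }
            // The whole blob is processed, including columns nobody asked for.
            blob.forEach((column, stats) -> merged.merge(column, stats, ColStats::merge));
        }
        // Only now are the non-projected columns thrown away.
        merged.keySet().retainAll(projectedColumns);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Map<String, ColStats>> blobs = new HashMap<>();
        blobs.put("p1", Map.of("a", new ColStats(1, 0, 10), "b", new ColStats(0, 5, 7)));
        blobs.put("p2", Map.of("a", new ColStats(2, -3, 4), "b", new ColStats(1, 6, 9)));
        blobs.put("p3", Map.of("a", new ColStats(9, -100, 100)));

        Map<String, ColStats> result = aggregate(blobs, List.of("p1", "p2"), Set.of("a"));
        ColStats a = result.get("a");
        System.out.println("a: nulls=" + a.numNulls + " min=" + a.min + " max=" + a.max);
    }
}
```

In this sketch column "b" is still merged for every relevant partition before being dropped, which is the kind of wasted work a flamegraph should make visible if it matters in practice.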
was (Author: dkuzmenko):
Even better would be if you could provide a flamegraph, so we can pinpoint
where CPU cycles are being spent inefficiently.
Currently, for unpartitioned tables, each column stats object is stored in a
separate Puffin blob. For partitioned tables - it's the whole ColumnStatistics.
We read and aggregate the blobs for the relevant partitions, then discard any
columns that are not part of the projection.
> Implement col stats cache for hive iceberg table
> -------------------------------------------------
>
> Key: HIVE-27190
> URL: https://issues.apache.org/jira/browse/HIVE-27190
> Project: Hive
> Issue Type: Improvement
> Reporter: Simhadri Govindappa
> Assignee: Simhadri Govindappa
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)