[ 
https://issues.apache.org/jira/browse/HIVE-27190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18051800#comment-18051800
 ] 

Denys Kuzmenko edited comment on HIVE-27190 at 1/14/26 12:46 PM:
-----------------------------------------------------------------

Even better would be if you could provide a flamegraph, so we can pinpoint 
where CPU cycles are being spent inefficiently. 
Currently, for unpartitioned tables, each column stats object is stored in a 
separate Puffin blob. For partitioned tables, a blob holds the whole 
ColumnStatistics object. 
We read and aggregate the blobs for the relevant partitions, then discard any 
columns that are not part of the projection.
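
To make the read path for partitioned tables concrete, here is a minimal 
Python sketch of "aggregate the blobs for the relevant partitions, then 
discard the non-projected columns". It is not Hive's actual code: ColStats, 
merge and aggregate_then_project are hypothetical names, the blobs are 
modelled as plain in-memory dicts rather than real Puffin files, and only 
null counts and min/max are merged (real statistics such as distinct-value 
counts need more careful merging).

```python
from dataclasses import dataclass


@dataclass
class ColStats:
    """Hypothetical, simplified per-column statistics (not Hive's class)."""
    num_nulls: int
    min_value: int
    max_value: int


def merge(a: ColStats, b: ColStats) -> ColStats:
    """Combine the stats of one column coming from two partitions."""
    return ColStats(
        num_nulls=a.num_nulls + b.num_nulls,
        min_value=min(a.min_value, b.min_value),
        max_value=max(a.max_value, b.max_value),
    )


def aggregate_then_project(blobs, partitions, projection):
    """Aggregate the blobs of the given partitions, then drop unused columns.

    blobs: dict mapping partition name -> {column name -> ColStats};
           each inner dict stands in for one blob that carries the stats
           of every column of that partition (an assumption of this sketch).
    """
    merged = {}
    for part in partitions:
        for col, stats in blobs[part].items():
            # Every column is merged here, including ones dropped later.
            merged[col] = merge(merged[col], stats) if col in merged else stats
    # Projection is applied only after aggregation.
    return {col: s for col, s in merged.items() if col in projection}


blobs = {
    "p=1": {"id": ColStats(0, 1, 10), "price": ColStats(2, 5, 50)},
    "p=2": {"id": ColStats(1, 11, 20), "price": ColStats(0, 7, 90)},
    "p=3": {"id": ColStats(4, 21, 30), "price": ColStats(1, 1, 9)},
}

# Query touches partitions p=1 and p=2 and projects only the "id" column.
result = aggregate_then_project(blobs, ["p=1", "p=2"], {"id"})
```

In this sketch the "price" column is still merged for every partition before 
being thrown away, which is the kind of work a flamegraph might surface if 
it turns out to be significant.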


was (Author: dkuzmenko):
Even better would be if you could provide a flamegraph, so we can pinpoint 
where CPU cycles are being spent inefficiently. 
Currently, for unpartitioned tables, each column stats object is stored in a 
separate Puffin blob. For partitioned tables - it's the whole ColumnStatistics. 
We read and aggregate the blobs for the relevant partitions, then deserialize 
the data and discard any columns that are not part of the projection.

> Implement col stats cache for hive iceberg table
> ------------------------------------------------
>
>                 Key: HIVE-27190
>                 URL: https://issues.apache.org/jira/browse/HIVE-27190
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Simhadri Govindappa
>            Assignee: Simhadri Govindappa
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
