[ 
https://issues.apache.org/jira/browse/IMPALA-10879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Jeges resolved IMPALA-10879.
-----------------------------------
    Resolution: Implemented

> Add parquet stats to iceberg manifest
> -------------------------------------
>
>                 Key: IMPALA-10879
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10879
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend, Frontend
>    Affects Versions: Impala 4.0.0
>            Reporter: Attila Jeges
>            Assignee: Attila Jeges
>            Priority: Major
>              Labels: impala-iceberg
>
> Parquet stats should be written to the Iceberg manifest as per-data-file
> metrics. This task specifically covers the following metrics:
> - column_sizes : Map from column id to the total size on disk of all regions
> that store the column. Does not include bytes necessary to read other
> columns, like footers. Leave null for row-oriented formats.
> - null_value_counts : Map from column id to number of null values in the
> column.
> - lower_bounds : Map from column id to lower bound in the column serialized
> as binary. Each value must be less than or equal to all non-null, non-NaN
> values in the column for the file.
> - upper_bounds : Map from column id to upper bound in the column serialized
> as binary. Each value must be greater than or equal to all non-null, non-NaN
> values in the column for the file.
> Iceberg manifest doc:
> https://iceberg.apache.org/spec/#manifests
> lower_bounds and upper_bounds values should be serialized to binary using
> the single-value serialization defined in the Iceberg spec:
> https://iceberg.apache.org/spec/#appendix-d-single-value-serialization
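
For illustration only, here is a minimal Python sketch of how the bounds and
null counts described above could be derived and serialized. It is not the
Impala implementation: the function names (serialize_single_value,
build_file_metrics) and the in-memory input layout are hypothetical, only a
few primitive types are covered, and column_sizes is omitted because it
depends on the on-disk Parquet layout. The byte layouts follow Appendix D of
the Iceberg spec (little-endian fixed-width numbers, UTF-8 strings without a
length prefix).

```python
import math
import struct

# Little-endian struct formats for the fixed-width primitive types,
# as described in Appendix D of the Iceberg spec.
_FORMATS = {"int": "<i", "long": "<q", "float": "<f", "double": "<d"}


def serialize_single_value(type_name, value):
    """Serialize one non-null value to Iceberg single-value binary form.

    Hypothetical helper; handles only a subset of Iceberg primitive types.
    """
    if type_name == "boolean":
        return b"\x01" if value else b"\x00"
    if type_name in _FORMATS:
        return struct.pack(_FORMATS[type_name], value)
    if type_name == "string":
        return value.encode("utf-8")  # raw UTF-8 bytes, no length prefix
    raise ValueError("unsupported type in this sketch: " + type_name)


def _is_nan(value):
    return isinstance(value, float) and math.isnan(value)


def build_file_metrics(columns):
    """Build per-data-file metric maps keyed by column id.

    `columns` maps column id -> (type name, list of values); this input
    layout is an assumption made for the example.
    """
    metrics = {"null_value_counts": {}, "lower_bounds": {}, "upper_bounds": {}}
    for col_id, (type_name, values) in columns.items():
        metrics["null_value_counts"][col_id] = sum(v is None for v in values)
        # Bounds only consider non-null, non-NaN values.
        usable = [v for v in values if v is not None and not _is_nan(v)]
        if usable:
            metrics["lower_bounds"][col_id] = serialize_single_value(
                type_name, min(usable))
            metrics["upper_bounds"][col_id] = serialize_single_value(
                type_name, max(usable))
    return metrics


example = build_file_metrics({
    1: ("int", [3, None, -7, 42]),
    2: ("string", ["pear", "apple", None]),
    3: ("double", [1.5, float("nan"), 0.25]),
})
```

In this sketch a column with no usable values simply gets no entry in
lower_bounds or upper_bounds, while its null count is still recorded.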



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
