Attila Jeges created IMPALA-10879:
-------------------------------------

             Summary: Add parquet stats to iceberg manifest
                 Key: IMPALA-10879
                 URL: https://issues.apache.org/jira/browse/IMPALA-10879
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend, Frontend
    Affects Versions: Impala 4.0.0
            Reporter: Attila Jeges
            Assignee: Attila Jeges


Parquet stats should be written to the Iceberg manifest as per-data-file metrics.

This task is specifically about the following metrics:
- column_sizes : Map from column id to the total size on disk of all regions 
that store the column. Does not include bytes necessary to read other columns, 
like footers. Leave null for row-oriented formats.
- null_value_counts : Map from column id to number of null values in the column.
- lower_bounds : Map from column id to lower bound in the column serialized as 
binary. Each value must be less than or equal to all non-null, non-NaN values 
in the column for the file.
- upper_bounds : Map from column id to upper bound in the column serialized as 
binary. Each value must be greater than or equal to all non-null, non-NaN
values in the column for the file.
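
As a rough illustration of how the four metrics above could be derived, the
sketch below folds per-row-group column chunk statistics into per-file maps
keyed by column id. It is not Impala's implementation. ColumnChunkStats,
DataFileMetrics and aggregate_metrics are hypothetical names, and the sketch
assumes a chunk reports min/max as None only when it holds nothing but nulls
and that the reported min/max already exclude NaN. Bounds are kept as typed
values here; the manifest stores them serialized as binary.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass
class ColumnChunkStats:
    """Hypothetical stats for one column chunk in one Parquet row group."""
    column_id: int
    size_on_disk: int          # bytes occupied by this column chunk
    null_count: int
    min_value: Optional[Any]   # assumed None only for all-null chunks
    max_value: Optional[Any]


@dataclass
class DataFileMetrics:
    """Per-data-file metrics, each a map from column id to a value."""
    column_sizes: Dict[int, int]
    null_value_counts: Dict[int, int]
    lower_bounds: Dict[int, Any]
    upper_bounds: Dict[int, Any]


def aggregate_metrics(chunks: Iterable[ColumnChunkStats]) -> DataFileMetrics:
    """Fold the stats of every column chunk in a file into per-file metrics."""
    m = DataFileMetrics({}, {}, {}, {})
    for c in chunks:
        cid = c.column_id
        # Sizes and null counts add up across row groups.
        m.column_sizes[cid] = m.column_sizes.get(cid, 0) + c.size_on_disk
        m.null_value_counts[cid] = m.null_value_counts.get(cid, 0) + c.null_count
        # The file-level lower bound is the smallest chunk minimum.
        if c.min_value is not None:
            if cid in m.lower_bounds:
                m.lower_bounds[cid] = min(m.lower_bounds[cid], c.min_value)
            else:
                m.lower_bounds[cid] = c.min_value
        # The file-level upper bound is the largest chunk maximum.
        if c.max_value is not None:
            if cid in m.upper_bounds:
                m.upper_bounds[cid] = max(m.upper_bounds[cid], c.max_value)
            else:
                m.upper_bounds[cid] = c.max_value
    return m
```

A column whose chunks are all null gets a size and a null count but no entry
in lower_bounds or upper_bounds.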

Iceberg manifest doc: 
https://iceberg.apache.org/spec/#manifests

lower_bounds and upper_bounds values should be serialized to binary using
the Iceberg single-value serialization:
https://iceberg.apache.org/spec/#appendix-d-single-value-serialization
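
To make the binary form concrete, here is a minimal sketch of single-value
serialization for a few primitive types, based on my reading of Appendix D:
fixed-width numbers are little-endian, and strings are raw UTF-8 bytes with
no length prefix. serialize_bound and its type-name strings are hypothetical
helpers for illustration only. Types such as decimal, uuid, date and
timestamp are left out.

```python
import struct


def serialize_bound(type_name: str, value) -> bytes:
    """Serialize one bound value to bytes (sketch, primitive types only)."""
    if type_name == "boolean":
        return b"\x01" if value else b"\x00"
    if type_name == "int":
        return struct.pack("<i", value)    # 4-byte little-endian
    if type_name == "long":
        return struct.pack("<q", value)    # 8-byte little-endian
    if type_name == "float":
        return struct.pack("<f", value)    # 4-byte little-endian IEEE 754
    if type_name == "double":
        return struct.pack("<d", value)    # 8-byte little-endian IEEE 754
    if type_name == "string":
        return value.encode("utf-8")       # no length prefix
    if type_name == "binary":
        return bytes(value)                # no length prefix
    raise ValueError(f"unsupported type in this sketch: {type_name}")
```

For example, serialize_bound("int", 34).hex() gives "22000000".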


