[ https://issues.apache.org/jira/browse/PARQUET-411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776580#comment-17776580 ]
Raunaq Morarka commented on PARQUET-411:
----------------------------------------
I believe this issue is addressed by the changes made for
https://issues.apache.org/jira/browse/PARQUET-2352
cc: [~wgtmac]
> Format: Add a flag when min/max are truncated
> ---------------------------------------------
>
> Key: PARQUET-411
> URL: https://issues.apache.org/jira/browse/PARQUET-411
> Project: Parquet
> Issue Type: Bug
> Components: parquet-format
> Affects Versions: format-2.3.1
> Reporter: Ryan Blue
> Priority: Major
>
> PARQUET-372 drops page and column chunk stats when values are larger than 4k
> to avoid storing very large values in page headers and the file footer. An
> alternative approach is to truncate the values, which would still allow
> filtering on page stats. The problem with truncating values is that the value
> in stats may not be the true min or max, so engines that use these values as
> the result of aggregations like {{min(col)}} would return incorrect data. We
> should consider adding metadata that records when values have been truncated,
> so that truncated values can still be used for filtering.
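
To make the trade-off in the description concrete, here is a minimal sketch of
truncating min/max values while recording whether each bound is still exact.
It assumes binary values compared as unsigned bytes in lexicographic order.
The names Bound, truncate_min, truncate_max, may_contain and exact_min are
hypothetical and are not taken from parquet-format or any Parquet
implementation.

```python
from typing import NamedTuple, Optional


class Bound(NamedTuple):
    """A stored statistic plus a flag saying whether it is the true min/max."""
    value: bytes
    is_exact: bool


def truncate_min(value: bytes, limit: int) -> Bound:
    """Shorten a minimum to at most `limit` bytes.

    A prefix always sorts at or before the full value, so it is still a
    valid lower bound.
    """
    if len(value) <= limit:
        return Bound(value, True)
    return Bound(value[:limit], False)


def truncate_max(value: bytes, limit: int) -> Bound:
    """Shorten a maximum to at most `limit` bytes.

    A plain prefix would sort before the full value, so the last byte that
    is not 0xFF is incremented to obtain a valid upper bound.
    """
    if len(value) <= limit:
        return Bound(value, True)
    prefix = bytearray(value[:limit])
    while prefix and prefix[-1] == 0xFF:
        prefix.pop()  # 0xFF cannot be incremented; drop it and carry left
    if not prefix:
        # Every kept byte was 0xFF, so no shorter upper bound exists.
        # This sketch keeps the full value; a writer could drop the stat instead.
        return Bound(value, True)
    prefix[-1] += 1
    return Bound(bytes(prefix), False)


def may_contain(lo: Bound, hi: Bound, target: bytes) -> bool:
    """Filtering only needs valid bounds, so truncated values are fine here."""
    return lo.value <= target <= hi.value


def exact_min(lo: Bound) -> Optional[bytes]:
    """Answer min(col) from stats only when the bound is the true minimum."""
    return lo.value if lo.is_exact else None
```

With this shape, a reader can keep using truncated bounds to skip pages, and
falls back to scanning the data for min(col) or max(col) whenever the
corresponding flag says the stored value is not exact.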