[ https://issues.apache.org/jira/browse/PARQUET-411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776580#comment-17776580 ]

Raunaq Morarka commented on PARQUET-411:
----------------------------------------

I believe this issue is addressed by the changes made for 
https://issues.apache.org/jira/browse/PARQUET-2352

cc: [~wgtmac] 

> Format: Add a flag when min/max are truncated
> ---------------------------------------------
>
>                 Key: PARQUET-411
>                 URL: https://issues.apache.org/jira/browse/PARQUET-411
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-format
>    Affects Versions: format-2.3.1
>            Reporter: Ryan Blue
>            Priority: Major
>
> PARQUET-372 drops page and column chunk stats when values are larger than 4k 
> to avoid storing very large values in page headers and the file footer. An 
> alternative approach is to truncate the values, which would still allow 
> filtering on page stats. The problem with truncating values is that the value 
> in stats may not be the true min or max, so engines that use these values as 
> the result of aggregations like {{min(col)}} would return incorrect data. We 
> should consider adding metadata to allow truncating values for filtering that 
> captures the fact that the values have been modified.
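
To illustrate the trade-off described in the quoted issue, here is a minimal Python sketch of truncating min/max statistics while recording that they are no longer exact. The function names, the is_exact flag, and the unsigned byte-wise ordering are assumptions made for illustration; this is not the parquet-format or parquet-mr API.

```python
from typing import Optional, Tuple


def truncate_min(value: bytes, length: int) -> Tuple[bytes, bool]:
    """Return (lower_bound, is_exact) for a min statistic.

    Hypothetical helper. Under unsigned byte-wise ordering, a prefix
    always sorts at or before the full value, so it is a valid lower bound.
    """
    if len(value) <= length:
        return value, True
    return value[:length], False


def truncate_max(value: bytes, length: int) -> Tuple[Optional[bytes], bool]:
    """Return (upper_bound, is_exact) for a max statistic.

    Hypothetical helper. A plain prefix would sort *before* the full value,
    so the prefix is bumped: trailing 0xFF bytes are dropped and the last
    remaining byte is incremented. If every byte is 0xFF, no shorter upper
    bound exists and None is returned (the stat would have to be dropped).
    """
    if len(value) <= length:
        return value, True
    prefix = bytearray(value[:length])
    while prefix and prefix[-1] == 0xFF:
        prefix.pop()
    if not prefix:
        return None, False
    prefix[-1] += 1
    return bytes(prefix), False


def can_skip_for_equality(x: bytes, lower: bytes, upper: Optional[bytes]) -> bool:
    """Filtering stays safe with inexact bounds: skip only if x is outside them."""
    return x < lower or (upper is not None and x > upper)
```

A filter such as {{col = x}} can still skip a page whenever x falls outside the truncated bounds, but an engine should answer {{min(col)}} or {{max(col)}} from the statistics only when the corresponding flag is true.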



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
