[
https://issues.apache.org/jira/browse/PARQUET-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759079#comment-17759079
]
ASF GitHub Bot commented on PARQUET-2261:
-----------------------------------------
m29498 commented on PR #197:
URL: https://github.com/apache/parquet-format/pull/197#issuecomment-1693533716
Thanks @GregoryKimball and @etseidl. We would also find this change very
useful. As @GregoryKimball mentioned, we can use the extra size statistics in
the page footer to more accurately predict the memory usage of decompressed
pages in files.
We have a use case based on rapidsai/cudf that would greatly benefit from the
chunked Parquet reader working in the manner described above. Currently, we
have to go to a lot of work in our GPU-based Parquet reader to ensure that we
don't try to read more of a Parquet file than we have room to decompress in GPU
memory. With this change, that information would be available in the file and
no prediction of sizes would be necessary.
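As a rough illustration of how a reader could use such statistics, here is a
minimal sketch that groups consecutive pages into chunks whose total decoded
size fits a memory budget. The function name `plan_chunks`, the
`decoded_sizes` list, and `budget_bytes` are hypothetical names made up for
this example; this is not the cudf API or the proposed metadata layout, and a
real reader would also need to account for decompression scratch space.

```python
from typing import List


def plan_chunks(decoded_sizes: List[int], budget_bytes: int) -> List[List[int]]:
    """Group consecutive page indices so each group fits in budget_bytes.

    decoded_sizes[i] is assumed to be the decoded size in bytes of page i,
    as it might be reported by size statistics stored in the file.
    """
    chunks: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for index, size in enumerate(decoded_sizes):
        if size > budget_bytes:
            # A single page larger than the budget can never be scheduled.
            raise ValueError(f"page {index} needs {size} bytes, budget is {budget_bytes}")
        if current and current_bytes + size > budget_bytes:
            # Adding this page would overflow the budget: close the chunk.
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks
```

With the sizes known up front, the reader can plan each pass before
decompressing anything, instead of estimating sizes and retrying on overflow.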
We would really like to see this implemented!
> [Format] Add statistics that reflect decoded size to metadata
> -------------------------------------------------------------
>
> Key: PARQUET-2261
> URL: https://issues.apache.org/jira/browse/PARQUET-2261
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-format
> Reporter: Micah Kornfield
> Assignee: Micah Kornfield
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)