[ 
https://issues.apache.org/jira/browse/PARQUET-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17707453#comment-17707453
 ] 

ASF GitHub Bot commented on PARQUET-2261:
-----------------------------------------

emkornfield commented on PR #197:
URL: https://github.com/apache/parquet-format/pull/197#issuecomment-1492748667

   > Do we want to include these statistics at both row group (column chunk)
   > and page level? For the latter I am not sure it is the right approach. We
   > implemented column indexes so one would not need to read the page header to get
   > the related statistics. We even stopped writing `Statistics` into page headers
   > in parquet-mr. If we only want these for the column chunk level then I would
   > suggest having it under `ColumnMetaData` directly.
   
   @gszadovsky
   Is there an argument against flexibility here? I believe parquet-cpp still
   writes statistics into page headers. One argument for page-level statistics is
   that they give readers better incremental estimates of the memory needed as they
   progress (although taking an average size per cell at the column chunk level
   may be sufficient here).
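
To make the trade-off concrete, here is a rough sketch comparing the two ways a reader might estimate per-page memory. It is illustrative only: `PageInfo`, `decoded_bytes`, and both function names are hypothetical and are not part of parquet-format or any Parquet implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageInfo:
    """Hypothetical per-page metadata; not an actual Parquet struct."""
    num_values: int
    decoded_bytes: int  # assumed page-level "decoded size" statistic


def estimate_from_chunk_average(pages: List[PageInfo],
                                chunk_decoded_bytes: int) -> List[float]:
    """Estimate each page's size from the column chunk's average bytes per value."""
    total_values = sum(p.num_values for p in pages)
    if total_values == 0:
        return [0.0 for _ in pages]
    avg_bytes_per_value = chunk_decoded_bytes / total_values
    return [p.num_values * avg_bytes_per_value for p in pages]


def estimate_from_page_stats(pages: List[PageInfo]) -> List[float]:
    """Use the (assumed) page-level statistic directly."""
    return [float(p.decoded_bytes) for p in pages]


# Made-up example: a skewed column chunk where one page holds much larger values.
pages = [PageInfo(100, 400), PageInfo(100, 4000), PageInfo(100, 600)]
chunk_total = sum(p.decoded_bytes for p in pages)

avg_estimates = estimate_from_chunk_average(pages, chunk_total)
page_estimates = estimate_from_page_stats(pages)

for avg, exact in zip(avg_estimates, page_estimates):
    print(f"chunk-average estimate: {avg:8.1f}   page-level value: {exact:8.1f}")
```

Both approaches agree on the total for the column chunk. They differ only in how that total is spread across pages, so the chunk-level average is likely good enough when value sizes are fairly uniform and less so when they are skewed.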




> [Format] Add statistics that reflect decoded size to metadata
> -------------------------------------------------------------
>
>                 Key: PARQUET-2261
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2261
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-format
>            Reporter: Micah Kornfield
>            Assignee: Micah Kornfield
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
