[jira] [Commented] (ARROW-15759) [C++] Investigate scanning parquet files at sub-row-group resolution

Joris Van den Bossche (Jira) Wed, 23 Feb 2022 00:40:09 -0800


    [ https://issues.apache.org/jira/browse/ARROW-15759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496570#comment-17496570 ]


Joris Van den Bossche commented on ARROW-15759:
-----------------------------------------------

Maybe not strictly needed in order to move towards reading data per page, but I suppose 
it would then also be nice to support filtering at that level: ARROW-10158.

> [C++] Investigate scanning parquet files at sub-row-group resolution
> --------------------------------------------------------------------
>
>                 Key: ARROW-15759
>                 URL: https://issues.apache.org/jira/browse/ARROW-15759
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>
> Most of the Arrow APIs read from a Parquet file one entire row group at a 
> time. The Parquet reader should allow us to read a single page at a time. 
> When scanning a dataset we often want to read relatively small batches 
> (e.g. 1M rows) to increase parallelism, decrease memory usage, and 
> decrease latency.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
