[ https://issues.apache.org/jira/browse/ARROW-15759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496570#comment-17496570 ]
Joris Van den Bossche commented on ARROW-15759:
-----------------------------------------------
This may not be strictly needed to move towards reading data per page, but I
suppose it would then also be nice to support filtering at that level: see
ARROW-10158
> [C++] Investigate scanning parquet files at sub-row-group resolution
> --------------------------------------------------------------------
>
> Key: ARROW-15759
> URL: https://issues.apache.org/jira/browse/ARROW-15759
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Weston Pace
> Priority: Major
>
> Most of the Arrow APIs read from a parquet file one entire row group at a
> time. The Parquet reader should allow us to read a single page at a time.
> When scanning a dataset we often want to read in relatively small batches
> (e.g. 1M rows) to increase parallelism, decrease memory usage, and
> decrease latency.
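
To make the idea in the description concrete, here is a minimal sketch of how a scanner might plan sub-row-group reads. It is not the Arrow or Parquet API: `BatchSpan` and `plan_batches` are hypothetical names, and the sketch assumes that row counts per row group are already known from the file metadata. A real implementation would presumably also need to align spans with page boundaries, which this sketch ignores.

```python
from typing import Iterator, List, NamedTuple


class BatchSpan(NamedTuple):
    """A contiguous run of rows inside one row group (hypothetical type)."""
    row_group: int  # index of the row group the span belongs to
    offset: int     # first row of the span, relative to the row group start
    length: int     # number of rows in the span


def plan_batches(row_group_sizes: List[int], batch_rows: int) -> Iterator[BatchSpan]:
    """Split each row group into spans of at most `batch_rows` rows.

    Spans never cross a row group boundary, so the last span of a row
    group may be shorter than `batch_rows`.
    """
    if batch_rows <= 0:
        raise ValueError("batch_rows must be positive")
    for rg_index, rg_rows in enumerate(row_group_sizes):
        for offset in range(0, rg_rows, batch_rows):
            yield BatchSpan(rg_index, offset, min(batch_rows, rg_rows - offset))


if __name__ == "__main__":
    # Two row groups of 2.5M and 1M rows, scanned in batches of up to 1M rows.
    for span in plan_batches([2_500_000, 1_000_000], 1_000_000):
        print(span)
```

Each span could then be handed to a separate task, which is where the parallelism, memory, and latency benefits mentioned above would come from.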