Weston Pace created ARROW-15759:
-----------------------------------
Summary: [C++] Investigate scanning parquet files at sub-row-group
resolution
Key: ARROW-15759
URL: https://issues.apache.org/jira/browse/ARROW-15759
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Weston Pace
Most of the Arrow APIs read from a Parquet file one entire row group at a time.
The underlying Parquet reader should allow us to read a single page at a time.
When scanning a dataset we often want to read relatively small batches (e.g.
1M rows) to increase parallelism, decrease memory usage, and decrease latency.
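As a rough illustration of what sub-row-group scanning could mean, here is a minimal sketch. It is not the Arrow or Parquet API. It assumes a single column whose per-page row counts are known up front. In a real file these would presumably come from page headers or the optional page index. The names `ColumnChunk`, `BatchPlan`, and `PlanBatches` are hypothetical. The sketch splits each row group into batches of at most `batch_size` rows and records, for each batch, which pages overlap its row range. Only those pages would need decoding, so the reader does not need the whole row group in memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model, not the Arrow/Parquet API: one column chunk per row
// group, described only by the number of rows in each of its data pages.
struct ColumnChunk {
  std::vector<int64_t> page_rows;
};

// One output batch: a row range inside a row group plus the half-open range
// of pages [first_page, last_page) that must be decoded to produce it.
struct BatchPlan {
  std::size_t row_group;
  int64_t first_row;  // offset within the row group
  int64_t num_rows;
  std::size_t first_page;
  std::size_t last_page;  // exclusive
};

// Split every row group into batches of at most batch_size rows. Each batch
// references only the pages that overlap its row range, so a reader following
// this plan never has to decode a whole row group just to emit one batch.
std::vector<BatchPlan> PlanBatches(const std::vector<ColumnChunk>& row_groups,
                                   int64_t batch_size) {
  assert(batch_size > 0);  // a non-positive size would never advance
  std::vector<BatchPlan> plans;
  for (std::size_t rg = 0; rg < row_groups.size(); ++rg) {
    const std::vector<int64_t>& pages = row_groups[rg].page_rows;
    // page_start[p] is the first row of page p; page_start.back() is the total.
    std::vector<int64_t> page_start(pages.size() + 1, 0);
    for (std::size_t p = 0; p < pages.size(); ++p) {
      page_start[p + 1] = page_start[p] + pages[p];
    }
    const int64_t total_rows = page_start.back();

    std::size_t page = 0;
    for (int64_t start = 0; start < total_rows; start += batch_size) {
      const int64_t end = std::min(start + batch_size, total_rows);
      // Skip pages that end at or before this batch's first row.
      while (page_start[page + 1] <= start) ++page;
      // Include every page that begins before this batch's (exclusive) end.
      std::size_t last = page;
      while (last < pages.size() && page_start[last] < end) ++last;
      plans.push_back({rg, start, end - start, page, last});
    }
  }
  return plans;
}
```

Consider a single row group with three 400-row pages and `batch_size = 500`. This yields three batches:

- rows [0, 500) from pages [0, 2)
- rows [500, 1000) from pages [1, 3)
- rows [1000, 1200) from page 2 alone

Page 1 straddles a batch boundary, so both of the first two batches need it. A real reader would probably keep that decoded page around rather than decode it twice. With several columns, page boundaries generally differ from column to column, so a plan like this would have to be computed per column chunk.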
--
This message was sent by Atlassian Jira
(v8.20.1#820001)