Weston Pace created ARROW-15759:
-----------------------------------

             Summary: [C++] Investigate scanning parquet files at sub-row-group resolution
                 Key: ARROW-15759
                 URL: https://issues.apache.org/jira/browse/ARROW-15759
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace

Most of the Arrow APIs read from a Parquet file one entire row group at a time. The Parquet reader should allow us to read a single page at a time. When scanning a dataset, we often want to read in relatively small batches (e.g. 1M rows) to increase parallelism, decrease memory usage, and decrease latency.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)