Weston Pace created ARROW-15759:
-----------------------------------

             Summary: [C++] Investigate scanning parquet files at sub-row-group 
resolution
                 Key: ARROW-15759
                 URL: https://issues.apache.org/jira/browse/ARROW-15759
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace


Most of the Arrow APIs read a Parquet file one entire row group at a time.
The Parquet reader should allow us to read a single page at a time. When
scanning a dataset we often want to read relatively small batches (e.g. 1M
rows) to increase parallelism, decrease memory usage, and decrease latency.
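
To make the goal concrete, here is a minimal sketch of what scanning at
sub-row-group resolution could mean for a scanner: each row group is split
into slices of at most a target number of rows. It uses only the C++ standard
library; RowSlice and PlanSlices are hypothetical names for illustration and
are not part of the Arrow or Parquet APIs. Page boundaries are not considered
here, so this is a planning sketch under stated assumptions, not a reader
implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type: one contiguous run of rows inside a single row group.
struct RowSlice {
  int row_group;   // index of the row group within the file
  int64_t offset;  // first row of the slice, relative to the row group start
  int64_t length;  // number of rows in the slice
};

// Hypothetical helper: split every row group into slices of at most
// `batch_rows` rows. Slices never span two row groups, so the last slice
// of a row group may be shorter than `batch_rows`.
std::vector<RowSlice> PlanSlices(const std::vector<int64_t>& row_group_rows,
                                 int64_t batch_rows) {
  assert(batch_rows > 0);
  std::vector<RowSlice> slices;
  for (std::size_t i = 0; i < row_group_rows.size(); ++i) {
    const int64_t rows = row_group_rows[i];
    for (int64_t offset = 0; offset < rows; offset += batch_rows) {
      slices.push_back(
          {static_cast<int>(i), offset, std::min(batch_rows, rows - offset)});
    }
  }
  return slices;
}
```

For example, PlanSlices({2500000, 1000000}, 1000000) yields four slices: three
from the first row group (the last one holding 500000 rows) and one from the
second. Each slice could then be handed to a separate task, which is where the
parallelism and memory benefits would come from.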



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
