Rollo Konig-brock created ARROW-7800:
----------------------------------------
Summary: [Python] Expose GetRecordBatchReader API in PyArrow
Key: ARROW-7800
URL: https://issues.apache.org/jira/browse/ARROW-7800
Project: Apache Arrow
Issue Type: Bug
Components: Python
Reporter: Rollo Konig-brock
Fix For: 1.0.0
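The C++ GetRecordBatchReader API is very useful for streaming Parquet files with
lots of RLE-encoded data: such columns can decode to far more memory than their
on-disk size, and reading in batches avoids materializing the whole file at once.
I propose exposing this API in PyArrow as follows: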
{code}
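from pyarrow.parquet import ParquetFile

# Proposed API: the batch_size argument and get_batches() method do not exist yet
file_ = ParquetFile('file/path.parquet', batch_size=100)
for batch in file_.get_batches():
    pass  # each batch would be a RecordBatch of up to 100 rows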
{code}
(If anyone has better ideas, hit me up; I'm not 100% sold on exposing it
this way.)
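Until something like this exists, a rough approximation of the proposed get_batches() can be built from existing ParquetFile methods. This is a sketch assuming a recent PyArrow (where Table.to_batches takes max_chunksize); iter_record_batches is a hypothetical helper, not part of PyArrow. Unlike a reader backed by GetRecordBatchReader, it decodes one whole row group at a time, so peak memory is bounded by the largest row group rather than by batch_size.

{code:python}
import tempfile
from pathlib import Path
from typing import Iterator

import pyarrow as pa
import pyarrow.parquet as pq


def iter_record_batches(path: str, batch_size: int) -> Iterator[pa.RecordBatch]:
    """Hypothetical helper approximating the proposed ParquetFile.get_batches().

    Decodes one row group at a time and re-chunks it into RecordBatches of
    at most batch_size rows.
    """
    parquet_file = pq.ParquetFile(path)
    for i in range(parquet_file.num_row_groups):
        # Decode a single row group into a Table.
        row_group = parquet_file.read_row_group(i)
        # Split that Table into RecordBatches of at most batch_size rows.
        yield from row_group.to_batches(max_chunksize=batch_size)


# Usage: write a small file with 4 row groups of 250 rows, then stream it.
with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "example.parquet")
    pq.write_table(pa.table({"x": list(range(1000))}), path, row_group_size=250)

    batch_sizes = []
    values = []
    for batch in iter_record_batches(path, batch_size=100):
        batch_sizes.append(batch.num_rows)
        values.extend(batch.column(0).to_pylist())

print(batch_sizes)
{code}

Later PyArrow releases (3.0 and later) added ParquetFile.iter_batches(batch_size=...), which covers this use case directly.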
--
This message was sent by Atlassian Jira
(v8.3.4#803005)