[ 
https://issues.apache.org/jira/browse/ARROW-15969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lubo Slivka updated ARROW-15969:
--------------------------------
    Description: 
The suggested improvement is to introduce a conversion/adapter so that all 
batches from RecordBatchFileReader can be read one-by-one using 
RecordBatchReader.

Perhaps a new instance method RecordBatchFileReader.to_reader()? This would 
follow suit with, for instance, pyarrow.flight.MetadataRecordBatchReader, 
which also has to_reader().
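
A minimal sketch of what such an adapter might look like in Python today, 
assuming it is built on the existing pyarrow.RecordBatchReader.from_batches(). 
The function names iter_file_batches and file_reader_to_reader are 
hypothetical and not part of pyarrow. Because the batches are pulled through 
a Python generator, this does not give the fully-in-C++ path requested here; 
it only illustrates the intended interface.

```python
from typing import Any, Iterator


def iter_file_batches(file_reader: Any) -> Iterator[Any]:
    """Yield every record batch of an IPC file reader, in file order."""
    # RecordBatchFileReader is random-access: batches are fetched by index.
    for index in range(file_reader.num_record_batches):
        yield file_reader.get_batch(index)


def file_reader_to_reader(file_reader: Any) -> Any:
    """Hypothetical adapter: expose a RecordBatchFileReader as a RecordBatchReader."""
    # Imported here so the generator above stays usable without pyarrow.
    import pyarrow as pa

    # from_batches() takes a schema plus an iterable of record batches.
    # The iteration is driven from Python, not from C++.
    return pa.RecordBatchReader.from_batches(
        file_reader.schema, iter_file_batches(file_reader)
    )
```

With a file reader obtained from pyarrow.ipc.open_file(), the returned object 
could then be passed wherever a RecordBatchReader is accepted, for example to 
pyarrow.flight.RecordBatchStream.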

*Motivation*

Record batches serialized into the IPC file format can be read using 
RecordBatchFileReader. The interface of this reader is incompatible with 
RecordBatchReader.

This affects, for instance, Flight RPC DoGet, where it is not possible to 
send out all data efficiently (e.g. fully in C++) using 
pyarrow.flight.RecordBatchStream. There may also be other use cases where 
client code wants to read data batch-by-batch transparently, without caring 
about the serialization format.

Further background is here: 
[https://lists.apache.org/thread/b9jwk103fgxfo4kct12t00ymdft7bklb]

 

  was:
The suggested improvement is to introduce a conversion/adapter so that all 
batches from RecordBatchFileReader can be read one-by-one, once using 
RecordBatchReader.

Perhaps a new instance method RecordBatchFileReader.to_reader()? This would 
follow the suit of for instance the pyarrow.flight.MetadataRecordBatchReader 
which also has to_reader().

*Motivation*

Record Batches serialized into IPC file format can be read using 
RecordBatchFileReader. The interface of this reader is incompatible with 
RecordBatchReader.

This impacts for instance the Flight RPC DoGet, where it is not possible to 
efficiently (e.g. fully in C++) send out all data by using 
pyarrow.flight.RecordBatchStream. However, there may be other use cases where 
client code wants to read data batch-by-batch transparently, without caring 
about the serialization format.

Further background is here: 
[https://lists.apache.org/thread/b9jwk103fgxfo4kct12t00ymdft7bklb]

 


> [Python] Add conversion from RecordBatchFileReader to RecordBatchReader
> -----------------------------------------------------------------------
>
>                 Key: ARROW-15969
>                 URL: https://issues.apache.org/jira/browse/ARROW-15969
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Lubo Slivka
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
