[ 
https://issues.apache.org/jira/browse/ARROW-15969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511786#comment-17511786
 ] 

Lubo Slivka commented on ARROW-15969:
-------------------------------------

Sorry for the confusion - I did not express myself properly - I meant exactly
what you drew :)

The reason I see for not doing the class hierarchy as you propose is repeated 
streaming.

As I gather - and please correct me if I'm wrong, I'm very new to Arrow and
may be basing my argument on a wrong assumption - one can keep a
RecordBatchFileReader open as long as feasible and read from it at will.
Keeping the file open cuts down on IO overhead, so it is a good idea to reuse
it.

Having RecordBatchFileReader extend RecordBatchReader and implement the
necessary methods means a client can stream the file only once. To stream again,
either a new instance of RecordBatchFileReader has to be created, or some kind
of Reset() function has to be added to allow streaming the whole file again.
IMHO an adapter on top of RecordBatchFileReader is a cleaner way that
'naturally' allows for repeated streaming.

> [C++][Python] Add conversion from RecordBatchFileReader to RecordBatchReader
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-15969
>                 URL: https://issues.apache.org/jira/browse/ARROW-15969
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>            Reporter: Lubo Slivka
>            Priority: Major
>
> The suggested improvement is to introduce a conversion/adapter so that all 
> batches from RecordBatchFileReader can be read one-by-one using 
> RecordBatchReader.
> Perhaps a new instance method RecordBatchFileReader.to_reader()? This would
> follow the example of, for instance, pyarrow.flight.MetadataRecordBatchReader,
> which also has to_reader().
> *Motivation*
> Record Batches serialized into IPC file format can be read using 
> RecordBatchFileReader. The interface of this reader is incompatible with 
> RecordBatchReader.
> This impacts for instance the Flight RPC DoGet, where it is not possible to 
> efficiently (e.g. fully in C++) send out all data by using 
> pyarrow.flight.RecordBatchStream. However, there may be other use cases where 
> client code wants to read data batch-by-batch transparently, without caring 
> about the serialization format.
> Further background is here: 
> [https://lists.apache.org/thread/b9jwk103fgxfo4kct12t00ymdft7bklb]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
