[ https://issues.apache.org/jira/browse/ARROW-15969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511724#comment-17511724 ]
Alessandro Molina edited comment on ARROW-15969 at 3/24/22, 9:31 AM:
---------------------------------------------------------------------

{quote}Imho due to this difference, implementing RBR on the file reader would lead to 'awkwardness' down the road - what if user wants to consume contents of the file using RBR multiple times?
{quote}
I wasn't suggesting that {{RecordBatchReader}} should be implemented on top of {{RecordBatchFileReader}}; I was mostly wondering whether it could be the other way around. A random access reader should be able to reproduce all the features of a stream reader, plus some additional capabilities like seeking.

The reason I was wondering is that if we exposed a class hierarchy like
{code:java}
             RecordBatchReader
               /           \
              /             \
RecordBatchStreamReader   RecordBatchFileReader
{code}
we wouldn't have to deal with "converting" file readers; they could be used directly wherever a {{RecordBatchReader}} was needed.

In the end, I don't see a reason why {{RecordBatchFileReader}} can't have a {{get_next_batch}} method: when reading a file, you have a current position for the ongoing read.

> [C++][Python] Add conversion from RecordBatchFileReader to RecordBatchReader
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-15969
>                 URL: https://issues.apache.org/jira/browse/ARROW-15969
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>            Reporter: Lubo Slivka
>            Priority: Major
>
> The suggested improvement is to introduce a conversion/adapter so that all
> batches from RecordBatchFileReader can be read one-by-one using
> RecordBatchReader.
> Perhaps a new instance method RecordBatchFileReader.to_reader()? This would
> follow the example of, for instance, pyarrow.flight.MetadataRecordBatchReader,
> which also has to_reader().
> *Motivation*
> Record Batches serialized into IPC file format can be read using
> RecordBatchFileReader. The interface of this reader is incompatible with
> RecordBatchReader.
> This impacts for instance the Flight RPC DoGet, where it is not possible to
> efficiently (e.g. fully in C++) send out all data by using
> pyarrow.flight.RecordBatchStream. However, there may be other use cases where
> client code wants to read data batch-by-batch transparently, without caring
> about the serialization format.
> Further background is here:
> [https://lists.apache.org/thread/b9jwk103fgxfo4kct12t00ymdft7bklb]
>

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
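
For illustration only, the class hierarchy proposed in the comment could be sketched in plain Python roughly as follows. This is not the pyarrow or Arrow C++ implementation: the classes only borrow the names from the discussion, batches are stand-in Python lists, and the {{seek}} method and {{total_rows}} helper are hypothetical additions. {{seek}} shows one way the "consume the file multiple times" concern might be handled.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, List, Sequence

# Stand-in for an Arrow record batch; a plain list of values keeps the
# sketch free of any pyarrow dependency.
Batch = List[int]


class RecordBatchReader(ABC):
    """Sequential interface: hands out batches one at a time."""

    @abstractmethod
    def get_next_batch(self) -> Batch:
        """Return the next batch, or raise StopIteration when exhausted."""

    def __iter__(self) -> Iterator[Batch]:
        return self

    def __next__(self) -> Batch:
        return self.get_next_batch()


class RecordBatchStreamReader(RecordBatchReader):
    """Forward-only reader over a stream of batches."""

    def __init__(self, batches: Iterable[Batch]) -> None:
        self._batches = iter(batches)

    def get_next_batch(self) -> Batch:
        return next(self._batches)


class RecordBatchFileReader(RecordBatchReader):
    """Random-access reader that also keeps a cursor for sequential reads."""

    def __init__(self, batches: Sequence[Batch]) -> None:
        self._batches = list(batches)
        self._position = 0  # index of the next batch for sequential reads

    @property
    def num_record_batches(self) -> int:
        return len(self._batches)

    def get_batch(self, index: int) -> Batch:
        # Random access leaves the sequential cursor untouched.
        return self._batches[index]

    def seek(self, index: int) -> None:
        # Hypothetical helper: repositions the cursor so that the file
        # can be consumed sequentially more than once.
        if not 0 <= index <= len(self._batches):
            raise IndexError("batch index out of range")
        self._position = index

    def get_next_batch(self) -> Batch:
        if self._position >= len(self._batches):
            raise StopIteration
        batch = self._batches[self._position]
        self._position += 1
        return batch


def total_rows(reader: RecordBatchReader) -> int:
    """Consumer written against the base class only (hypothetical helper)."""
    return sum(len(batch) for batch in reader)
```

With such a hierarchy, a function like {{total_rows}} accepts either reader without any conversion step, and a file reader can be rewound with {{seek(0)}} before being passed in a second time.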