[ https://issues.apache.org/jira/browse/ARROW-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16747321#comment-16747321 ]
Paul Taylor commented on ARROW-4283:
------------------------------------

[~pitrou] Thanks for the feedback. I want to clarify: my Python skills aren't sharp, and I'm not familiar with the pyarrow API or Python's asyncio/async-iterable primitives, so filter my comments through the lens of a beginner.

The little experience I do have is using the RecordBatchStreamReader to read from stdin (via {{sys.stdin.buffer}}) and named file descriptors (via {{os.fdopen()}}). Since Python's so friendly (and I have no idea how the Python IO primitives work), I thought maybe I could pass aiohttp's {{Request.stream}} to the RecordBatchStreamReader constructor, and quickly learned that no, I can't ;).

In the JS implementation we have two main entry points for reading RecordBatch streams:
# a static [{{RecordBatchReader.from(source)}}|https://github.com/apache/arrow/blob/cc1ce6194b905768b1a6d9f0e209270f62dc558a/js/src/ipc/reader.ts#L142], which accepts heterogeneous source types, returns a RecordBatchReader for the underlying Arrow type (file, stream, or JSON), and conforms to the sync/async semantics of the source input type
# methods that create [through/transform streams|https://github.com/apache/arrow/blob/cc1ce6194b905768b1a6d9f0e209270f62dc558a/js/bin/file-to-stream.js#L33] from the RecordBatchReader and RecordBatchWriter, for use with node's native stream primitives

Each link in the streaming pipeline is a sort of transform stream, and a significant amount of effort went into supporting all the different node/browser IO primitives, so I understand if that's too much to ask at this point.

As an alternative, would it be possible to add a method that accepts a Python byte stream and returns a zero-copy AsyncIterable of RecordBatches? Or maybe add an example to the [python/ipc|https://arrow.apache.org/docs/python/ipc.html#writing-and-reading-streams] docs page showing how to do that?

> Should RecordBatchStreamReader/Writer be AsyncIterable?
> -------------------------------------------------------
>
>                 Key: ARROW-4283
>                 URL: https://issues.apache.org/jira/browse/ARROW-4283
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Paul Taylor
>            Priority: Minor
>             Fix For: 0.13.0
>
>
> Filing this issue after a discussion today with [~xhochy] about how to
> implement streaming pyarrow http services. I had attempted to use both Flask
> and [aiohttp|https://aiohttp.readthedocs.io/en/stable/streams.html]'s
> streaming interfaces because they seemed familiar, but no dice. I have no
> idea how hard this would be to add -- supporting all the asynciterable
> primitives in JS was non-trivial.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
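
For illustration only, here is a minimal sketch of the "AsyncIterable of RecordBatches" idea raised in the comment, using nothing but the Python standard library. It is not a pyarrow API and not a proposed implementation: the helper names {{iterate_in_executor}} and {{collect}} are hypothetical, and the sketch assumes the reader is an ordinary blocking iterable, so each blocking {{next()}} call is pushed onto a worker thread. It says nothing about zero-copy behaviour, and it does not make an async source such as an aiohttp stream acceptable to the reader.

```python
import asyncio
from typing import AsyncIterator, Iterable, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel returned by next() once the iterator is exhausted


async def iterate_in_executor(blocking_iterable: Iterable[T]) -> AsyncIterator[T]:
    """Yield items from a blocking iterable without stalling the event loop.

    Hypothetical helper: the assumption is that a synchronous record batch
    reader could be passed here, since it only needs to support iter()/next().
    """
    loop = asyncio.get_running_loop()
    iterator = iter(blocking_iterable)
    while True:
        # next() may block on IO, so run it on the default thread pool.
        item = await loop.run_in_executor(None, next, iterator, _DONE)
        if item is _DONE:
            break
        yield item


async def collect(blocking_iterable: Iterable[T]) -> list:
    """Drain the async iterator into a list (stand-in for real consumer code)."""
    return [item async for item in iterate_in_executor(blocking_iterable)]


if __name__ == "__main__":
    # A plain list stands in for a reader yielding record batches.
    print(asyncio.run(collect(["batch-0", "batch-1", "batch-2"])))
```

The trade-off in this approach is that the reader itself stays synchronous: it still needs a blocking file-like source, and the event loop is only kept free while a worker thread waits on it.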