[ 
https://issues.apache.org/jira/browse/ARROW-228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-228:
-----------------------------
    External issue URL: https://github.com/apache/arrow/issues/15573

> [Python] Create an Arrow-cpp-compatible interface for reading bytes from 
> Python file-like objects 
> --------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-228
>                 URL: https://issues.apache.org/jira/browse/ARROW-228
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>            Priority: Major
>             Fix For: 0.2.0
>
>
> In practice, IO interfaces in PyArrow will need to be bidirectional:
> - Exposing internal IO interfaces written purely in C++ to Python users as 
> file-like objects
> - Exposing Python file-like objects to the C++ IO subsystem
> To do this efficiently, we may want to introduce an arrow::Buffer subclass 
> that manages the lifetime of a PyBytes object in a GIL-safe way (i.e., on 
> destruction, the GIL is acquired and the object's refcount is decremented). 
> We can still implement a Read method that copies bytes into some other 
> buffer, after which the PyBytes is immediately destroyed.
> Outside of these byte buffer management issues, wrapping a file-like object 
> (having read() -> bytes, seek(), tell(), and other basic file methods) is 
> fairly straightforward, and will allow any of the current or upcoming IO 
> adapters to read either from native classes (file system, HDFS, etc.) or 
> arbitrary Python streams.
> To give a concrete example: consider the output of an HTTP GET request -- this 
> can be put in an {{io.BytesIO}} object and then treated as a first-class 
> citizen alongside the native (C++) IO classes. 
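
The copying Read mentioned in the description can be illustrated in plain Python. This is only a sketch of the copy semantics, not the proposed arrow::Buffer subclass and not PyArrow code: the helper name read_into and its signature are hypothetical, and the GIL handling that a C++ implementation would need is not modeled here.

```python
import io


def read_into(handle, dest, nbytes):
    """Copy up to nbytes from handle.read() into dest; return the count copied.

    Hypothetical helper for illustration only (not part of PyArrow).
    """
    view = memoryview(dest)
    nbytes = min(nbytes, len(view))  # never read more than dest can hold
    chunk = handle.read(nbytes)      # temporary bytes object from Python
    count = len(chunk)
    view[:count] = chunk             # copy into caller-owned memory
    del chunk                        # drop our reference to the temporary
    return count


dest = bytearray(8)
source = io.BytesIO(b"0123456789")
first = read_into(source, dest, 8)   # fills all 8 bytes of dest
second = read_into(source, dest, 8)  # only 2 bytes remain in the stream
```

The second call overwrites only the first two bytes of dest, so the caller has to rely on the returned count rather than on the buffer length.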

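The wrapping of a file-like object described above might look roughly like the following in pure Python. This is a minimal sketch under stated assumptions: the class name FileLikeReader and its size() and read_at() methods are hypothetical and are not taken from PyArrow, and a literal bytes payload stands in for the body of an HTTP GET response so that no network access is needed.

```python
import io


class FileLikeReader:
    """Hypothetical adapter exposing the small surface a native reader needs."""

    def __init__(self, handle):
        # Require only the basic methods named in the issue description.
        for name in ("read", "seek", "tell"):
            if not callable(getattr(handle, name, None)):
                raise TypeError(f"handle lacks a callable {name}()")
        self._handle = handle

    def tell(self):
        return self._handle.tell()

    def seek(self, position, whence=io.SEEK_SET):
        self._handle.seek(position, whence)
        return self._handle.tell()

    def size(self):
        # Measure by seeking to the end, then restore the original position.
        current = self._handle.tell()
        self._handle.seek(0, io.SEEK_END)
        end = self._handle.tell()
        self._handle.seek(current)
        return end

    def read(self, nbytes):
        data = self._handle.read(nbytes)
        if not isinstance(data, (bytes, bytearray)):
            raise TypeError("read() must return bytes")
        return bytes(data)

    def read_at(self, position, nbytes):
        # Random-access read: reposition, then read.
        self.seek(position)
        return self.read(nbytes)


# Stand-in for the body of an HTTP GET response.
payload = b"ARROW-228 example payload"
reader = FileLikeReader(io.BytesIO(payload))
total = reader.size()
head = reader.read(5)
word = reader.read_at(10, 7)
```

Any object with read(), seek(), and tell() can be passed in the same way, which is the point of the proposal: the consumer does not need to know whether the bytes come from a native class or from an arbitrary Python stream.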


