Adding dev@

This is one purpose of the Arrow C data interface, which was developed after the __arrow_array__ protocol and is worth investigating:
https://github.com/apache/arrow/blob/master/docs/source/format/CDataInterface.rst

On Sat, Sep 12, 2020 at 2:16 PM Marc Garcia <garcia.m...@gmail.com> wrote:
>
> Hi there,
>
> I'm writing a document analyzing different options for a Python dataframe
> exchange protocol. And I wanted to ask a question regarding the
> __arrow_array__ protocol.
>
> I checked the code, and looks like the producer is expected to be sending an
> Arrow array, and the consumer just receives it. This is the code I'm
> checking, I guess it's the right one:
> https://github.com/apache/arrow/blob/master/python/pyarrow/array.pxi#L110
>
> Compared to the array interface (the NumPy buffer protocol), it works a bit
> differently. In the NumPy one, the producer exposes the pointer, the size...
> So, the producer doesn't need to depend on NumPy or any other library, and
> then the consumer can simply use `numpy.array(obj)` and generate the NumPy
> array. Or if other implementations support the protocol (not sure if they
> do), they could call something like `tensorflow.Tensor(obj)`, and NumPy would
> not be used at all.
>
> Am I understanding correctly the `__arrow_array__` protocol? And if I am, is
> there anything else similar to the NumPy protocol that can be used to
> exchange data without relying on a particular implementation?
>
> Thanks in advance!
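
To make the NumPy side of the comparison in the quoted message concrete, here is a minimal sketch of a producer that exposes the pointer, shape and type through `__array_interface__` without importing NumPy, plus a consumer that reads that description using only `ctypes`. The names `Producer` and `consume` are hypothetical and only for illustration, and the sketch assumes a one-dimensional, contiguous, native-endian int64 buffer. The Arrow C data interface linked above aims at the same kind of library-independent exchange for Arrow data, but it is not shown here.

```python
import ctypes
import sys


class Producer:
    """Hypothetical producer: owns a C buffer and describes it, with no NumPy import."""

    def __init__(self, values):
        # The ctypes array owns the memory; it stays alive as long as this object does.
        self._buf = (ctypes.c_int64 * len(values))(*values)

    @property
    def __array_interface__(self):
        # Version 3 of the NumPy array interface: a plain dict describing the memory.
        byte_order = "<" if sys.byteorder == "little" else ">"
        return {
            "version": 3,
            "shape": (len(self._buf),),
            "typestr": byte_order + "i8",  # 8-byte signed integer
            "data": (ctypes.addressof(self._buf), False),  # (pointer, read_only)
        }


def consume(obj):
    """Hypothetical consumer: reads the interface dict without using NumPy."""
    iface = obj.__array_interface__
    if iface["typestr"][1:] != "i8" or len(iface["shape"]) != 1:
        raise TypeError("this sketch only handles one-dimensional int64 data")
    (length,) = iface["shape"]
    address, _read_only = iface["data"]
    # Reinterpret the producer's memory in place, then copy the values out
    # while the producer (and therefore the buffer) is still alive.
    view = (ctypes.c_int64 * length).from_address(address)
    return list(view)


producer = Producer([1, 2, 3])
values = consume(producer)
```

If NumPy is installed, `numpy.asarray(producer)` should accept the same object, since NumPy looks for `__array_interface__` on arbitrary objects; the producer itself never imports it.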