jorisvandenbossche opened a new issue, #39195:
URL: https://github.com/apache/arrow/issues/39195

   https://github.com/apache/arrow/pull/37797 added a Python layer for working 
with the C Data Interface through capsules and defined dunder methods, 
described at 
https://arrow.apache.org/docs/dev/format/CDataInterface/PyCapsuleInterface.html.
   
   Currently, libraries use pyarrow's semi-private `_export_to_c` / 
`_import_from_c` methods to get the C Data Interface structs or to construct 
pyarrow objects from those structs. 
   With the formalized PyCapsule protocol, the structs are instead exchanged 
through the defined dunder methods `__arrow_c_schema__` / 
`__arrow_c_array__` / `__arrow_c_stream__`, which return the data as a C Data 
Interface struct wrapped in a capsule instead of as a raw integer (export). In 
addition, the pyarrow constructors (`pa.array(..)`, `pa.table(..)`, ..) will 
now also accept any object implementing the relevant dunder method (import). 
   
   This new Python-level interface has two benefits: 1) it is more robust (no 
pointers are passed around as integers, and the capsule ensures the data is 
released if an error happens midway), and 2) it is not tied to the pyarrow 
library. This means that other libraries can also start accepting any 
Arrow-compatible data structure, instead of hardcoding support for pyarrow and 
using pyarrow's `_export_to_c`.
   
   We want to encourage other libraries to start supporting this protocol as 
well, which can mean:
   
   - Adding the dunders to their own objects where appropriate (so that other 
libraries will recognize those data structures as holding Arrow-compatible data)
   - When ingesting data (e.g. where there is pyarrow-specific support 
right now), recognizing objects that implement the dunder methods
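
   On the ingesting side, recognizing such objects can be as simple as a 
duck-typing check. The sketch below uses only the standard library; 
`arrow_protocols` and `FakeStream` are hypothetical names made up for this 
example, and `FakeStream` is a placeholder that does not produce a real capsule.

```python
# The three dunder methods defined by the Arrow PyCapsule protocol.
ARROW_DUNDERS = ("__arrow_c_schema__", "__arrow_c_array__", "__arrow_c_stream__")


def arrow_protocols(obj):
    """Return the names of the Arrow PyCapsule dunder methods that obj implements."""
    return [name for name in ARROW_DUNDERS if callable(getattr(obj, name, None))]


class FakeStream:
    """Placeholder used only to demonstrate detection."""

    def __arrow_c_stream__(self, requested_schema=None):
        # A real implementation would return a PyCapsule wrapping an
        # ArrowArrayStream struct; this stub deliberately does not.
        raise NotImplementedError


print(arrow_protocols(FakeStream()))
print(arrow_protocols([1, 2, 3]))
```

   A library could branch on the result, for example by preferring the stream 
interface when it is present, and fall back to its existing pyarrow-specific 
path otherwise.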
   
   In addition, there are also some steps we can take within the Arrow project 
itself to further promote this:
   
   - [ ] [Python][Docs] Document the protocol in the "extending pyarrow" 
section of the Python docs 
(https://arrow.apache.org/docs/dev/python/extending_types.html)
   - [ ] [Python][Java][Docs] Update the "Integrating PyArrow with Java" 
documentation section to not use `_import_from_c`/`_export_to_c` 
(https://arrow.apache.org/docs/dev/python/integration/python_java.html)
   - [ ] [Python][R][Docs] Update the "Integrating PyArrow with R" 
documentation section to not use `_import_from_c`/`_export_to_c` 
(https://arrow.apache.org/docs/dev/python/integration/python_r.html)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
