jorisvandenbossche opened a new issue, #39195: URL: https://github.com/apache/arrow/issues/39195
https://github.com/apache/arrow/pull/37797 added a Python layer for working with the C Data Interface through capsules and defined dunder methods, described at https://arrow.apache.org/docs/dev/format/CDataInterface/PyCapsuleInterface.html.

Currently, libraries are using pyarrow's semi-private `_export_to_c`/`_import_from_c` methods to get the C Data Interface structs or to construct pyarrow objects from those structs. With the formalized PyCapsule protocol, the idea is that the structs are now exchanged through the defined dunder methods `__arrow_c_schema__` / `__arrow_c_array__` / `__arrow_c_stream__`, which return the data as a C Data Interface struct wrapped in a capsule instead of as a raw integer (export). In addition, the pyarrow constructors (`pa.array(..)`, `pa.table(..)`, ..) will now also accept any object implementing the relevant dunder method (import).

This new Python-level interface has the benefit of being 1) more robust (pointers are not passed around as integers, and the capsule ensures the data is released if an error occurs partway through), and 2) not tied to the pyarrow library. This means that other libraries can also start accepting any Arrow-compatible data structure, instead of hardcoding support for pyarrow and using pyarrow's `_export_to_c`.
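
As a rough illustration of the import side, a consuming library could check for the dunder methods instead of checking for pyarrow types. The sketch below uses only the standard library. The helper names `arrow_protocols` and `export_arrow` and the `FakeColumn` class are hypothetical, and `FakeColumn` returns placeholder strings rather than real PyCapsules, so this shows only the dispatch logic, not an actual data exchange.

```python
from typing import Any

# Dunder names defined by the Arrow PyCapsule Interface,
# listed in the order this sketch prefers them.
ARROW_DUNDERS = ("__arrow_c_stream__", "__arrow_c_array__", "__arrow_c_schema__")


def arrow_protocols(obj: Any) -> list:
    """Return the PyCapsule Interface dunders that `obj` implements."""
    return [name for name in ARROW_DUNDERS if callable(getattr(obj, name, None))]


def export_arrow(obj: Any):
    """Hypothetical consumer helper: request the C Data Interface capsule(s).

    Returns a (kind, payload) pair. The payload is whatever the dunder
    returned; handing it to C code is out of scope for this sketch.
    """
    if callable(getattr(obj, "__arrow_c_stream__", None)):
        return "stream", obj.__arrow_c_stream__()
    if callable(getattr(obj, "__arrow_c_array__", None)):
        # Per the protocol, this returns a (schema capsule, array capsule) pair.
        return "array", obj.__arrow_c_array__()
    if callable(getattr(obj, "__arrow_c_schema__", None)):
        return "schema", obj.__arrow_c_schema__()
    raise TypeError(f"{type(obj).__name__} does not expose Arrow data")


class FakeColumn:
    """Stand-in producer; returns placeholders, NOT real PyCapsules."""

    def __arrow_c_array__(self, requested_schema=None):
        return ("<schema capsule>", "<array capsule>")
```

With a pyarrow version that includes the PR above, a real producer such as a `pa.Array` could presumably be passed to `export_arrow` in place of `FakeColumn`, since dispatch depends only on the presence of the dunder methods.
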
We want to promote other libraries to start supporting this protocol as well, which can mean:

- Add the dunders on their own objects where appropriate (so that other libraries will recognize your data structures as holding Arrow-compatible data)
- Where ingesting data (e.g. where there might be specific pyarrow support right now), recognize objects that implement the dunder methods

In addition, there are also some steps we can take within the Arrow project itself to further promote this:

- [ ] [Python][Docs] Document the protocol in the "extending pyarrow" section of the Python docs (https://arrow.apache.org/docs/dev/python/extending_types.html)
- [ ] [Python][Java][Docs] Update the "Integrating PyArrow with Java" documentation section to not use `_import_from_c`/`_export_to_c` (https://arrow.apache.org/docs/dev/python/integration/python_java.html)
- [ ] [Python][R][Docs] Update the "Integrating PyArrow with R" documentation section to not use `_import_from_c`/`_export_to_c` (https://arrow.apache.org/docs/dev/python/integration/python_r.html)
