[arrow] branch main updated: GH-35531: [Python] C Data Interface PyCapsule Protocol (#37797)

apitrou Wed, 18 Oct 2023 04:45:20 -0700

This is an automated email from the ASF dual-hosted git repository.

apitrou pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git



The following commit(s) were added to refs/heads/main by this push:
     new d68f8e2164 GH-35531: [Python] C Data Interface PyCapsule Protocol 
(#37797)
d68f8e2164 is described below

commit d68f8e21643e8aec9ed253094edfd15c0a08f1c1
Author: Will Jones <[email protected]>
AuthorDate: Wed Oct 18 04:44:50 2023 -0700

    GH-35531: [Python] C Data Interface PyCapsule Protocol (#37797)
    
    ### Rationale for this change
    
    ### What changes are included in this PR?
    
    * A new specification for Arrow PyCapsules and related dunder methods
    * Implementing the dunder methods for `DataType`, `Field`, `Schema`, 
`Array`, `RecordBatch`, `Table`, and `RecordBatchReader`.
    
    ### Are these changes tested?
    
    Yes, I've added various roundtrip tests for each of the types.
    
    ### Are there any user-facing changes?
    
    This introduces some new APIs and documents them.
    
    * Closes: #34031
    * Closes: #35531
    
    Authored-by: Joris Van den Bossche <[email protected]>
    Signed-off-by: Antoine Pitrou <[email protected]>
---
 docs/source/format/CDataInterface.rst              |  11 +
 .../format/CDataInterface/PyCapsuleInterface.rst   | 433 +++++++++++++++++++++
 docs/source/python/extending_types.rst             |   9 +
 docs/source/python/interchange_protocol.rst        |   2 +
 python/pyarrow/array.pxi                           |  81 +++-
 python/pyarrow/includes/libarrow.pxd               |   6 +-
 python/pyarrow/ipc.pxi                             |  68 +++-
 python/pyarrow/table.pxi                           | 148 ++++++-
 python/pyarrow/tests/test_array.py                 |  18 +
 python/pyarrow/tests/test_cffi.py                  | 126 +++++-
 python/pyarrow/tests/test_table.py                 |  87 +++++
 python/pyarrow/tests/test_types.py                 |  14 +
 python/pyarrow/types.pxi                           | 185 ++++++++-
 13 files changed, 1177 insertions(+), 11 deletions(-)

diff --git a/docs/source/format/CDataInterface.rst 
b/docs/source/format/CDataInterface.rst
index 8f49147096..e0884686ac 100644
--- a/docs/source/format/CDataInterface.rst
+++ b/docs/source/format/CDataInterface.rst
@@ -990,3 +990,14 @@ adaptation cost.
 
 
 .. _Python buffer protocol: https://www.python.org/dev/peps/pep-3118/
+
+Language-specific protocols
+===========================
+
+Some languages may define additional protocols on top of the Arrow C data
+interface.
+
+.. toctree::
+   :maxdepth: 1
+
+   CDataInterface/PyCapsuleInterface
diff --git a/docs/source/format/CDataInterface/PyCapsuleInterface.rst 
b/docs/source/format/CDataInterface/PyCapsuleInterface.rst
new file mode 100644
index 0000000000..263c428c1e
--- /dev/null
+++ b/docs/source/format/CDataInterface/PyCapsuleInterface.rst
@@ -0,0 +1,433 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+
+=============================
+The Arrow PyCapsule Interface
+=============================
+
+.. warning:: The Arrow PyCapsule Interface should be considered experimental
+
+Rationale
+=========
+
+The :ref:`C data interface <c-data-interface>` and
+:ref:`C stream interface <c-stream-interface>` allow moving Arrow data between
+different implementations of Arrow. However, these interfaces don't specify how
+Python libraries should expose these structs to other libraries. Prior to this,
+many libraries simply provided export to PyArrow data structures, using the
+``_import_from_c`` and ``_export_from_c`` methods. However, this always 
required
+PyArrow to be installed. In addition, those APIs could cause memory leaks if
+handled improperly.
+
+This interface allows any library to export Arrow data structures to other
+libraries that understand the same protocol.
+
+Goals
+-----
+
+* Standardize the `PyCapsule`_ objects that represent ``ArrowSchema``, 
``ArrowArray``,
+  and ``ArrowArrayStream``.
+* Define standard methods that export Arrow data into such capsule objects,
+  so that any Python library wanting to accept Arrow data as input can call the
+  corresponding method instead of hardcoding support for specific Arrow
+  producers.
+
+
+Non-goals
+---------
+
+* Standardize what public APIs should be used for import. This is left up to
+  individual libraries.
+
+PyCapsule Standard
+==================
+
+When exporting Arrow data through Python, the C Data Interface / C Stream 
Interface
+structures should be wrapped in capsules. Capsules avoid invalid access by
+attaching a name to the pointer and avoid memory leaks by attaching a 
destructor.
+Thus, they are much safer than passing pointers as integers.
+
+`PyCapsule`_ allows for a ``name`` to be associated with the capsule, allowing 
+consumers to verify that the capsule contains the expected kind of data. To 
make sure
+Arrow structures are recognized, the following names must be used:
+
+.. list-table::
+   :widths: 25 25
+   :header-rows: 1
+
+   * - C Interface Type
+     - PyCapsule Name
+   * - ArrowSchema
+     - ``arrow_schema``
+   * - ArrowArray
+     - ``arrow_array``
+   * - ArrowArrayStream
+     - ``arrow_array_stream``
+
+
+Lifetime Semantics
+------------------
+
+The exported PyCapsules should have a destructor that calls the
+:ref:`release callback <c-data-interface-released>`
+of the Arrow struct, if it is not already null. This prevents a memory leak in
+case the capsule was never passed to another consumer.
+
+If the capsule has been passed to a consumer, the consumer should have moved
+the data and marked the release callback as null, so there isn’t a risk of
+releasing data the consumer is using.
+:ref:`Read more in the C Data Interface specification 
<c-data-interface-released>`.
+
+Just like in the C Data Interface, the PyCapsule objects defined here can only
+be consumed once.
+
+For an example of a PyCapsule with a destructor, see `Create a PyCapsule`_.
+
+
+Export Protocol
+===============
+
+The interface consists of three separate protocols:
+
+* ``ArrowSchemaExportable``, which defines the ``__arrow_c_schema__`` method.
+* ``ArrowArrayExportable``, which defines the ``__arrow_c_array__`` method.
+* ``ArrowStreamExportable``, which defines the ``__arrow_c_stream__`` method.
+
+ArrowSchema Export
+------------------
+
+Schemas, fields, and data types can implement the method 
``__arrow_c_schema__``.
+
+.. py:method:: __arrow_c_schema__(self) -> object
+
+    Export the object as an ArrowSchema.
+
+    :return: A PyCapsule containing a C ArrowSchema representation of the
+        object. The capsule must have a name of ``"arrow_schema"``.
+
+
+ArrowArray Export
+-----------------
+
+Arrays and record batches (contiguous tables) can implement the method
+``__arrow_c_array__``.
+
+.. py:method:: __arrow_c_array__(self, requested_schema: object | None = None) 
-> Tuple[object, object]
+
+    Export the object as a pair of ArrowSchema and ArrowArray structures.
+
+    :param requested_schema: A PyCapsule containing a C ArrowSchema 
representation 
+        of a requested schema. Conversion to this schema is best-effort. See 
+        `Schema Requests`_.
+    :type requested_schema: PyCapsule or None
+
+    :return: A pair of PyCapsules containing a C ArrowSchema and ArrowArray,
+        respectively. The schema capsule should have the name 
``"arrow_schema"``
+        and the array capsule should have the name ``"arrow_array"``.
+
+
+ArrowStream Export
+------------------
+
+Tables / DataFrames and streams can implement the method 
``__arrow_c_stream__``.
+
+.. py:method:: __arrow_c_stream__(self, requested_schema: object | None = 
None) -> object
+
+    Export the object as an ArrowArrayStream.
+
+    :param requested_schema: A PyCapsule containing a C ArrowSchema 
representation 
+        of a requested schema. Conversion to this schema is best-effort. See 
+        `Schema Requests`_.
+    :type requested_schema: PyCapsule or None
+
+    :return: A PyCapsule containing a C ArrowArrayStream representation of the
+        object. The capsule must have a name of ``"arrow_array_stream"``.
+
+Schema Requests
+---------------
+
+In some cases, there might be multiple possible Arrow representations of the
+same data. For example, a library might have a single integer type, but Arrow
+has multiple integer types with different sizes and sign. As another example,
+Arrow has several possible encodings for an array of strings: 32-bit offsets,
+64-bit offsets, string view, and dictionary-encoded. A sequence of strings 
could
+export to any one of these Arrow representations.
+
+In order to allow the caller to request a specific representation, the
+:meth:`__arrow_c_array__` and :meth:`__arrow_c_stream__` methods take an 
optional
+``requested_schema`` parameter. This parameter is a PyCapsule containing an
+``ArrowSchema``.
+
+The callee should attempt to provide the data in the requested schema. However,
+if the callee cannot provide the data in the requested schema, they may return
+with the same schema as if ``None`` were passed to ``requested_schema``.
+
+If the caller requests a schema that is not compatible with the data,
+say requesting a schema with a different number of fields, the callee should
+raise an exception. The requested schema mechanism is only meant to negotiate
+between different representations of the same data and not to allow arbitrary
+schema transformations.
+
+
+.. _PyCapsule: https://docs.python.org/3/c-api/capsule.html
+
+
+Protocol Typehints
+------------------
+
+The following typehints can be copied into your library to annotate that a 
+function accepts an object implementing one of these protocols.
+
+.. code-block:: python
+
+    from typing import Tuple, Protocol
+    from typing_extensions import Self
+
+    class ArrowSchemaExportable(Protocol):
+        def __arrow_c_schema__(self) -> object: ...
+
+    class ArrowArrayExportable(Protocol):
+        def __arrow_c_array__(
+            self,
+            requested_schema: object | None = None
+        ) -> Tuple[object, object]:
+            ...
+
+    class ArrowStreamExportable(Protocol):
+        def __arrow_c_stream__(
+            self,
+            requested_schema: object | None = None
+        ) -> object:
+            ...
+
+Examples
+========
+
+Create a PyCapsule
+------------------
+
+
+To create a PyCapsule, use the `PyCapsule_New 
<https://docs.python.org/3/c-api/capsule.html#c.PyCapsule_New>`_
+function. The function must be passed a destructor function that will be called
+to release the data the capsule points to. It must first call the release
+callback if it is not null, then free the struct.
+
+Below is the code to create a PyCapsule for an ``ArrowSchema``. The code for
+``ArrowArray`` and ``ArrowArrayStream`` is similar.
+
+.. tab-set::
+
+   .. tab-item:: C
+
+      .. code-block:: c
+
+         #include <Python.h>
+
+         void ReleaseArrowSchemaPyCapsule(PyObject* capsule) {
+             struct ArrowSchema* schema =
+                 (struct ArrowSchema*)PyCapsule_GetPointer(capsule, 
"arrow_schema");
+             if (schema->release != NULL) {
+                 schema->release(schema);
+             }
+             free(schema);
+         }
+         
+         PyObject* ExportArrowSchemaPyCapsule() {
+             struct ArrowSchema* schema =
+                 (struct ArrowSchema*)malloc(sizeof(struct ArrowSchema));
+             // Fill in ArrowSchema fields
+             // ...
+             return PyCapsule_New(schema, "arrow_schema", 
ReleaseArrowSchemaPyCapsule);
+         }
+
+   .. tab-item:: Cython
+
+      .. code-block:: cython
+
+         cimport cpython
+         from libc.stdlib cimport malloc, free
+
+         cdef void release_arrow_schema_py_capsule(object schema_capsule):
+             cdef ArrowSchema* schema = 
<ArrowSchema*>cpython.PyCapsule_GetPointer(
+                 schema_capsule, 'arrow_schema'
+             )
+             if schema.release != NULL:
+                 schema.release(schema)
+         
+             free(schema)
+         
+         cdef object export_arrow_schema_py_capsule():
+             cdef ArrowSchema* schema = 
<ArrowSchema*>malloc(sizeof(ArrowSchema))
+             # It's recommended to immediately wrap the struct in a capsule, so
+             # if subsequent lines raise an exception memory will not be 
leaked.
+             schema.release = NULL
+             capsule = cpython.PyCapsule_New(
+                 <void*>schema, 'arrow_schema', release_arrow_schema_py_capsule
+             )
+             # Fill in ArrowSchema fields:
+             # schema.format = ...
+             # ...
+             return capsule
+
+
+Consume a PyCapsule
+-------------------
+
+To consume a PyCapsule, use the `PyCapsule_GetPointer 
<https://docs.python.org/3/c-api/capsule.html#c.PyCapsule_GetPointer>`_ function
+to get the pointer to the underlying struct. Import the struct using your
+system's Arrow C Data Interface import function. Only after that should the
+capsule be freed.
+
+The below example shows how to consume a PyCapsule for an ``ArrowSchema``. The
+code for ``ArrowArray`` and ``ArrowArrayStream`` is similar.
+
+.. tab-set::
+
+   .. tab-item:: C
+
+      .. code-block:: c
+
+         #include <Python.h>
+         
+         // If the capsule is not an ArrowSchema, will return NULL and set an 
exception.
+         struct ArrowSchema* GetArrowSchemaPyCapsule(PyObject* capsule) {
+           return PyCapsule_GetPointer(capsule, "arrow_schema");
+         }
+
+   .. tab-item:: Cython
+
+      .. code-block:: cython
+
+         cimport cpython
+        
+         cdef ArrowSchema* get_arrow_schema_py_capsule(object capsule) except 
NULL:
+             return <ArrowSchema*>cpython.PyCapsule_GetPointer(capsule, 
'arrow_schema')
+
+Backwards Compatibility with PyArrow
+------------------------------------
+
+When interacting with PyArrow, the PyCapsule interface should be preferred over
+the ``_export_to_c`` and ``_import_from_c`` methods. However, many libraries 
will
+want to support a range of PyArrow versions. This can be done via Duck typing.
+
+For example, if your library had an import method such as:
+
+.. code-block:: python
+
+   # OLD METHOD
+   def from_arrow(arr: pa.Array)
+       array_import_ptr = make_array_import_ptr()
+       schema_import_ptr = make_schema_import_ptr()
+       arr._export_to_c(array_import_ptr, schema_import_ptr)
+       return import_c_data(array_import_ptr, schema_import_ptr)
+
+You can rewrite this method to support both PyArrow and other libraries that
+implement the PyCapsule interface:
+
+.. code-block:: python
+
+   # NEW METHOD
+   def from_arrow(arr)
+       # Newer versions of PyArrow as well as other libraries with Arrow data
+       # implement this method, so prefer it over _export_to_c.
+       if hasattr(arr, "__arrow_c_array__"):
+            schema_ptr, array_ptr = arr.__arrow_c_array__()
+            return import_c_capsule_data(schema_ptr, array_ptr)
+       elif isinstance(arr, pa.Array):
+            # Deprecated method, used for older versions of PyArrow
+            array_import_ptr = make_array_import_ptr()
+            schema_import_ptr = make_schema_import_ptr()
+            arr._export_to_c(array_import_ptr, schema_import_ptr)
+            return import_c_data(array_import_ptr, schema_import_ptr)
+       else:
+           raise TypeError(f"Cannot import {type(arr)} as Arrow array data.")
+
+You may also wish to accept objects implementing the protocol in your
+constructors. For example, in PyArrow, the :func:`array` and 
:func:`record_batch`
+constructors accept any object that implements the :meth:`__arrow_c_array__` 
method
+protocol. Similarly, the PyArrow's :func:`schema` constructor accepts any 
object
+that implements the :meth:`__arrow_c_schema__` method.
+
+Now if your library has an export to PyArrow function, such as:
+
+.. code-block:: python
+
+   # OLD METHOD
+   def to_arrow(self) -> pa.Array:
+       array_export_ptr = make_array_export_ptr()
+       schema_export_ptr = make_schema_export_ptr()
+       self.export_c_data(array_export_ptr, schema_export_ptr)
+       return pa.Array._import_from_c(array_export_ptr, schema_export_ptr)
+
+You can rewrite this function to use the PyCapsule interface by passing your
+object to the :py:func:`array` constructor, which accepts any object that
+implements the protocol. An easy way to check if the PyArrow version is new
+enough to support this is to check whether ``pa.Array`` has the
+``__arrow_c_array__`` method.
+
+.. code-block:: python
+
+  import warnings
+
+  # NEW METHOD
+  def to_arrow(self) -> pa.Array:
+      # PyArrow added support for constructing arrays from objects implementing
+      # __arrow_c_array__ in the same version it added the method for it's own
+      # arrays. So we can use hasattr to check if the method is available as
+      # a proxy for checking the PyArrow version.
+      if hasattr(pa.Array, "__arrow_c_array__"):
+          return pa.array(self)
+      else:
+          array_export_ptr = make_array_export_ptr()
+          schema_export_ptr = make_schema_export_ptr()
+          self.export_c_data(array_export_ptr, schema_export_ptr)
+          return pa.Array._import_from_c(array_export_ptr, schema_export_ptr)
+
+
+Comparison with Other Protocols
+===============================
+
+Comparison to DataFrame Interchange Protocol
+--------------------------------------------
+
+`The DataFrame Interchange Protocol 
<https://data-apis.org/dataframe-protocol/latest/>`_
+is another protocol in Python that allows for the sharing of data between 
libraries.
+This protocol is complementary to the DataFrame Interchange Protocol. Many of
+the objects that implement this protocol will also implement the DataFrame
+Interchange Protocol.
+
+This protocol is specific to Arrow-based data structures, while the DataFrame
+Interchange Protocol allows non-Arrow data frames and arrays to be shared as 
well.
+Because of this, these PyCapsules can support Arrow-specific features such as
+nested columns.
+
+This protocol is also much more minimal than the DataFrame Interchange 
Protocol.
+It just handles data export, rather than defining accessors for details like
+number of rows or columns.
+
+In summary, if you are implementing this protocol, you should also consider
+implementing the DataFrame Interchange Protocol.
+
+
+Comparison to ``__arrow_array__`` protocol
+------------------------------------------
+
+The :ref:`arrow_array_protocol` protocol is a dunder method that 
+defines how PyArrow should import an object as an Arrow array. Unlike this
+protocol, it is specific to PyArrow and isn't used by other libraries. It is
+also limited to arrays and does not support schemas, tabular structures, or 
streams.
\ No newline at end of file
diff --git a/docs/source/python/extending_types.rst 
b/docs/source/python/extending_types.rst
index 87f04f37dc..b9e875ceeb 100644
--- a/docs/source/python/extending_types.rst
+++ b/docs/source/python/extending_types.rst
@@ -21,6 +21,8 @@
 Extending pyarrow
 =================
 
+.. _arrow_array_protocol:
+
 Controlling conversion to pyarrow.Array with the ``__arrow_array__`` protocol
 -----------------------------------------------------------------------------
 
@@ -46,6 +48,13 @@ The ``__arrow_array__`` method takes an optional `type` 
keyword which is passed
 through from :func:`pyarrow.array`. The method is allowed to return either
 a :class:`~pyarrow.Array` or a :class:`~pyarrow.ChunkedArray`.
 
+.. note::
+
+    For a more general way to control the conversion of Python objects to Arrow
+    data consider the :doc:`/format/CDataInterface/PyCapsuleInterface`. It is
+    not specific to PyArrow and supports converting other objects such as 
tables
+    and schemas.
+
 
 Defining extension types ("user-defined types")
 -----------------------------------------------
diff --git a/docs/source/python/interchange_protocol.rst 
b/docs/source/python/interchange_protocol.rst
index 7784d78619..e293699220 100644
--- a/docs/source/python/interchange_protocol.rst
+++ b/docs/source/python/interchange_protocol.rst
@@ -15,6 +15,8 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
+.. _pyarrow-dataframe-interchange-protocol:
+
 Dataframe Interchange Protocol
 ==============================
 
diff --git a/python/pyarrow/array.pxi b/python/pyarrow/array.pxi
index e36d8b2f04..2e97503822 100644
--- a/python/pyarrow/array.pxi
+++ b/python/pyarrow/array.pxi
@@ -15,8 +15,11 @@
 # specific language governing permissions and limitations
 # under the License.
 
+from cpython.pycapsule cimport PyCapsule_CheckExact, PyCapsule_GetPointer, 
PyCapsule_New
+
 import os
 import warnings
+from cython import sizeof
 
 
 cdef _sequence_to_array(object sequence, object mask, object size,
@@ -123,9 +126,11 @@ def array(object obj, type=None, mask=None, size=None, 
from_pandas=None,
 
     Parameters
     ----------
-    obj : sequence, iterable, ndarray or pandas.Series
+    obj : sequence, iterable, ndarray, pandas.Series, Arrow-compatible array
         If both type and size are specified may be a single use iterable. If
         not strongly-typed, Arrow type will be inferred for resulting array.
+        Any Arrow-compatible array that implements the Arrow PyCapsule Protocol
+        (has an ``__arrow_c_array__`` method) can be passed as well.
     type : pyarrow.DataType
         Explicit type to attempt to coerce to, otherwise will be inferred from
         the data.
@@ -241,6 +246,18 @@ def array(object obj, type=None, mask=None, size=None, 
from_pandas=None,
 
     if hasattr(obj, '__arrow_array__'):
         return _handle_arrow_array_protocol(obj, type, mask, size)
+    elif hasattr(obj, '__arrow_c_array__'):
+        if type is not None:
+            requested_type = type.__arrow_c_schema__()
+        else:
+            requested_type = None
+        schema_capsule, array_capsule = obj.__arrow_c_array__(requested_type)
+        out_array = Array._import_from_c_capsule(schema_capsule, array_capsule)
+        if type is not None and out_array.type != type:
+            # PyCapsule interface type coersion is best effort, so we need to
+            # check the type of the returned array and cast if necessary
+            out_array = array.cast(type, safe=safe, memory_pool=memory_pool)
+        return out_array
     elif _is_array_like(obj):
         if mask is not None:
             if _is_array_like(mask):
@@ -1699,6 +1716,68 @@ cdef class Array(_PandasConvertible):
                                                      c_type))
         return pyarrow_wrap_array(c_array)
 
+    def __arrow_c_array__(self, requested_schema=None):
+        """
+        Get a pair of PyCapsules containing a C ArrowArray representation of 
the object.
+
+        Parameters
+        ----------
+        requested_schema : PyCapsule | None
+            A PyCapsule containing a C ArrowSchema representation of a 
requested
+            schema. PyArrow will attempt to cast the array to this data type.
+            If None, the array will be returned as-is, with a type matching the
+            one returned by :meth:`__arrow_c_schema__()`.
+
+        Returns
+        -------
+        Tuple[PyCapsule, PyCapsule]
+            A pair of PyCapsules containing a C ArrowSchema and ArrowArray,
+            respectively.
+        """
+        cdef:
+            ArrowArray* c_array
+            ArrowSchema* c_schema
+            shared_ptr[CArray] inner_array
+
+        if requested_schema is not None:
+            target_type = DataType._import_from_c_capsule(requested_schema)
+
+            if target_type != self.type:
+                try:
+                    casted_array = _pc().cast(self, target_type, safe=True)
+                    inner_array = pyarrow_unwrap_array(casted_array)
+                except ArrowInvalid as e:
+                    raise ValueError(
+                        f"Could not cast {self.type} to requested type 
{target_type}: {e}"
+                    )
+            else:
+                inner_array = self.sp_array
+        else:
+            inner_array = self.sp_array
+
+        schema_capsule = alloc_c_schema(&c_schema)
+        array_capsule = alloc_c_array(&c_array)
+
+        with nogil:
+            check_status(ExportArray(deref(inner_array), c_array, c_schema))
+
+        return schema_capsule, array_capsule
+
+    @staticmethod
+    def _import_from_c_capsule(schema_capsule, array_capsule):
+        cdef:
+            ArrowSchema* c_schema
+            ArrowArray* c_array
+            shared_ptr[CArray] array
+
+        c_schema = <ArrowSchema*> PyCapsule_GetPointer(schema_capsule, 
'arrow_schema')
+        c_array = <ArrowArray*> PyCapsule_GetPointer(array_capsule, 
'arrow_array')
+
+        with nogil:
+            array = GetResultValue(ImportArray(c_array, c_schema))
+
+        return pyarrow_wrap_array(array)
+
 
 cdef _array_like_to_pandas(obj, options, types_mapper):
     cdef:
diff --git a/python/pyarrow/includes/libarrow.pxd 
b/python/pyarrow/includes/libarrow.pxd
index ad79c0edcd..fda9d44497 100644
--- a/python/pyarrow/includes/libarrow.pxd
+++ b/python/pyarrow/includes/libarrow.pxd
@@ -2743,13 +2743,13 @@ cdef extern from "arrow/array/concatenate.h" namespace 
"arrow" nogil:
 
 cdef extern from "arrow/c/abi.h":
     cdef struct ArrowSchema:
-        pass
+        void (*release)(ArrowSchema*) noexcept nogil
 
     cdef struct ArrowArray:
-        pass
+        void (*release)(ArrowArray*) noexcept nogil
 
     cdef struct ArrowArrayStream:
-        pass
+        void (*release)(ArrowArrayStream*) noexcept nogil
 
 cdef extern from "arrow/c/bridge.h" namespace "arrow" nogil:
     CStatus ExportType(CDataType&, ArrowSchema* out)
diff --git a/python/pyarrow/ipc.pxi b/python/pyarrow/ipc.pxi
index 53e521fc11..deb3bb728a 100644
--- a/python/pyarrow/ipc.pxi
+++ b/python/pyarrow/ipc.pxi
@@ -15,9 +15,11 @@
 # specific language governing permissions and limitations
 # under the License.
 
+from cpython.pycapsule cimport PyCapsule_CheckExact, PyCapsule_GetPointer, 
PyCapsule_New
+
 from collections import namedtuple
 import warnings
-
+from cython import sizeof
 
 cpdef enum MetadataVersion:
     V1 = <char> CMetadataVersion_V1
@@ -815,6 +817,70 @@ cdef class RecordBatchReader(_Weakrefable):
         self.reader = c_reader
         return self
 
+    def __arrow_c_stream__(self, requested_schema=None):
+        """
+        Export to a C ArrowArrayStream PyCapsule.
+
+        Parameters
+        ----------
+        requested_schema: Schema, default None
+            The schema to which the stream should be casted. Currently, this is
+            not supported and will raise a NotImplementedError if the schema 
+            doesn't match the current schema.
+
+        Returns
+        -------
+        PyCapsule
+            A capsule containing a C ArrowArrayStream struct.
+        """
+        cdef:
+            ArrowArrayStream* c_stream
+
+        if requested_schema is not None:
+            out_schema = Schema._import_from_c_capsule(requested_schema)
+            # TODO: figure out a way to check if one schema is castable to
+            # another. Once we have that, we can perform validation here and
+            # if successful creating a wrapping reader that casts each batch.
+            if self.schema != out_schema:
+                raise NotImplementedError("Casting to requested_schema")
+
+        stream_capsule = alloc_c_stream(&c_stream)
+
+        with nogil:
+            check_status(ExportRecordBatchReader(self.reader, c_stream))
+
+        return stream_capsule
+
+    @staticmethod
+    def _import_from_c_capsule(stream):
+        """
+        Import RecordBatchReader from a C ArrowArrayStream PyCapsule.
+
+        Parameters
+        ----------
+        stream: PyCapsule
+            A capsule containing a C ArrowArrayStream PyCapsule.
+
+        Returns
+        -------
+        RecordBatchReader
+        """
+        cdef:
+            ArrowArrayStream* c_stream
+            shared_ptr[CRecordBatchReader] c_reader
+            RecordBatchReader self
+
+        c_stream = <ArrowArrayStream*>PyCapsule_GetPointer(
+            stream, 'arrow_array_stream'
+        )
+
+        with nogil:
+            c_reader = GetResultValue(ImportRecordBatchReader(c_stream))
+
+        self = RecordBatchReader.__new__(RecordBatchReader)
+        self.reader = c_reader
+        return self
+
     @staticmethod
     def from_batches(Schema schema not None, batches):
         """
diff --git a/python/pyarrow/table.pxi b/python/pyarrow/table.pxi
index 72af5a2dee..bbf60416de 100644
--- a/python/pyarrow/table.pxi
+++ b/python/pyarrow/table.pxi
@@ -15,8 +15,10 @@
 # specific language governing permissions and limitations
 # under the License.
 
-import warnings
+from cpython.pycapsule cimport PyCapsule_CheckExact, PyCapsule_GetPointer, 
PyCapsule_New
 
+import warnings
+from cython import sizeof
 
 cdef class ChunkedArray(_PandasConvertible):
     """
@@ -2983,6 +2985,100 @@ cdef class RecordBatch(_Tabular):
                     <ArrowArray*> c_ptr, c_schema))
         return pyarrow_wrap_batch(c_batch)
 
+    def __arrow_c_array__(self, requested_schema=None):
+        """
+        Get a pair of PyCapsules containing a C ArrowArray representation of 
the object.
+
+        Parameters
+        ----------
+        requested_schema : PyCapsule | None
+            A PyCapsule containing a C ArrowSchema representation of a 
requested
+            schema. PyArrow will attempt to cast the batch to this schema.
+            If None, the schema will be returned as-is, with a schema matching 
the
+            one returned by :meth:`__arrow_c_schema__()`.
+
+        Returns
+        -------
+        Tuple[PyCapsule, PyCapsule]
+            A pair of PyCapsules containing a C ArrowSchema and ArrowArray,
+            respectively.
+        """
+        cdef:
+            ArrowArray* c_array
+            ArrowSchema* c_schema
+
+        if requested_schema is not None:
+            target_schema = Schema._import_from_c_capsule(requested_schema)
+
+            if target_schema != self.schema:
+                try:
+                    # We don't expose .cast() on RecordBatch, only on Table.
+                    casted_batch = Table.from_batches([self]).cast(
+                        target_schema, safe=True).to_batches()[0]
+                    inner_batch = pyarrow_unwrap_batch(casted_batch)
+                except ArrowInvalid as e:
+                    raise ValueError(
+                        f"Could not cast {self.schema} to requested schema 
{target_schema}: {e}"
+                    )
+            else:
+                inner_batch = self.sp_batch
+        else:
+            inner_batch = self.sp_batch
+
+        schema_capsule = alloc_c_schema(&c_schema)
+        array_capsule = alloc_c_array(&c_array)
+
+        with nogil:
+            check_status(ExportRecordBatch(deref(inner_batch), c_array, 
c_schema))
+
+        return schema_capsule, array_capsule
+
+    def __arrow_c_stream__(self, requested_schema=None):
+        """
+        Export the batch as an Arrow C stream PyCapsule.
+
+        Parameters
+        ----------
+        requested_schema : pyarrow.lib.Schema, default None
+            A schema to attempt to cast the streamed data to. This is currently
+            unsupported and will raise an error.
+
+        Returns
+        -------
+        PyCapsule
+        """
+        return Table.from_batches([self]).__arrow_c_stream__(requested_schema)
+
+    @staticmethod
+    def _import_from_c_capsule(schema_capsule, array_capsule):
+        """
+        Import RecordBatch from a pair of PyCapsules containing a C ArrowArray
+        and ArrowSchema, respectively.
+
+        Parameters
+        ----------
+        schema_capsule : PyCapsule
+            A PyCapsule containing a C ArrowSchema representation of the 
schema.
+        array_capsule : PyCapsule
+            A PyCapsule containing a C ArrowArray representation of the array.
+
+        Returns
+        -------
+        pyarrow.RecordBatch
+        """
+        cdef:
+            ArrowSchema* c_schema
+            ArrowArray* c_array
+            shared_ptr[CRecordBatch] c_batch
+
+        c_schema = <ArrowSchema*> PyCapsule_GetPointer(schema_capsule, 
'arrow_schema')
+        c_array = <ArrowArray*> PyCapsule_GetPointer(array_capsule, 
'arrow_array')
+
+        with nogil:
+            c_batch = GetResultValue(ImportRecordBatch(c_array, c_schema))
+
+        return pyarrow_wrap_batch(c_batch)
+
 
 def _reconstruct_record_batch(columns, schema):
     """
@@ -4757,6 +4853,22 @@ cdef class Table(_Tabular):
             output_type=Table
         )
 
+    def __arrow_c_stream__(self, requested_schema=None):
+        """
+        Export the table as an Arrow C stream PyCapsule.
+
+        Parameters
+        ----------
+        requested_schema : pyarrow.lib.Schema, default None
+            A schema to attempt to cast the streamed data to. This is currently
+            unsupported and will raise an error.
+
+        Returns
+        -------
+        PyCapsule
+        """
+        return self.to_reader().__arrow_c_stream__(requested_schema)
+
 
 def _reconstruct_table(arrays, schema):
     """
@@ -4772,8 +4884,10 @@ def record_batch(data, names=None, schema=None, 
metadata=None):
 
     Parameters
     ----------
-    data : pandas.DataFrame, list
-        A DataFrame or list of arrays or chunked arrays.
+    data : pandas.DataFrame, list, Arrow-compatible table
+        A DataFrame, list of arrays or chunked arrays, or a tabular object
+        implementing the Arrow PyCapsule Protocol (has an
+        ``__arrow_c_array__`` method).
     names : list, default None
         Column names if list of arrays passed as data. Mutually exclusive with
         'schema' argument.
@@ -4892,6 +5006,18 @@ def record_batch(data, names=None, schema=None, 
metadata=None):
     if isinstance(data, (list, tuple)):
         return RecordBatch.from_arrays(data, names=names, schema=schema,
                                        metadata=metadata)
+    elif hasattr(data, "__arrow_c_array__"):
+        if schema is not None:
+            requested_schema = schema.__arrow_c_schema__()
+        else:
+            requested_schema = None
+        schema_capsule, array_capsule = 
data.__arrow_c_array__(requested_schema)
+        batch = RecordBatch._import_from_c_capsule(schema_capsule, 
array_capsule)
+        if schema is not None and batch.schema != schema:
+            # __arrow_c_array__ coerces schema with best effort, so we might
+            # need to cast it if the producer wasn't able to cast to exact 
schema.
+            batch = Table.from_batches([batch]).cast(schema).to_batches()[0]
+        return batch
     elif _pandas_api.is_data_frame(data):
         return RecordBatch.from_pandas(data, schema=schema)
     else:
@@ -5013,6 +5139,22 @@ def table(data, names=None, schema=None, metadata=None, 
nthreads=None):
             raise ValueError(
                 "The 'names' argument is not valid when passing a dictionary")
         return Table.from_pydict(data, schema=schema, metadata=metadata)
+    elif hasattr(data, "__arrow_c_stream__"):
+        if schema is not None:
+            requested = schema.__arrow_c_schema__()
+        else:
+            requested = None
+        capsule = data.__arrow_c_stream__(requested)
+        reader = RecordBatchReader._import_from_c_capsule(capsule)
+        table = reader.read_all()
+        if schema is not None and table.schema != schema:
+            # __arrow_c_array__ coerces schema with best effort, so we might
+            # need to cast it if the producer wasn't able to cast to exact 
schema.
+            table = table.cast(schema)
+        return table
+    elif hasattr(data, "__arrow_c_array__"):
+        batch = record_batch(data, schema)
+        return Table.from_batches([batch])
     elif _pandas_api.is_data_frame(data):
         if names is not None or metadata is not None:
             raise ValueError(
diff --git a/python/pyarrow/tests/test_array.py 
b/python/pyarrow/tests/test_array.py
index cd565a72bc..2f9727922b 100644
--- a/python/pyarrow/tests/test_array.py
+++ b/python/pyarrow/tests/test_array.py
@@ -3336,6 +3336,24 @@ def test_array_protocol():
     assert result.equals(expected)
 
 
+def test_c_array_protocol():
+    class ArrayWrapper:
+        def __init__(self, data):
+            self.data = data
+
+        def __arrow_c_array__(self, requested_type=None):
+            return self.data.__arrow_c_array__(requested_type)
+
+    # Can roundtrip through the C array protocol
+    arr = ArrayWrapper(pa.array([1, 2, 3], type=pa.int64()))
+    result = pa.array(arr)
+    assert result == arr.data
+
+    # Will case to requested type
+    result = pa.array(arr, type=pa.int32())
+    assert result == pa.array([1, 2, 3], type=pa.int32())
+
+
 def test_concat_array():
     concatenated = pa.concat_arrays(
         [pa.array([1, 2]), pa.array([3, 4])])
diff --git a/python/pyarrow/tests/test_cffi.py 
b/python/pyarrow/tests/test_cffi.py
index 9b0e24fa46..55bab4359b 100644
--- a/python/pyarrow/tests/test_cffi.py
+++ b/python/pyarrow/tests/test_cffi.py
@@ -16,6 +16,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
+import ctypes
 import gc
 
 import pyarrow as pa
@@ -36,7 +37,6 @@ except ImportError:
 needs_cffi = pytest.mark.skipif(ffi is None,
                                 reason="test needs cffi package installed")
 
-
 assert_schema_released = pytest.raises(
     ValueError, match="Cannot import released ArrowSchema")
 
@@ -47,6 +47,10 @@ assert_stream_released = pytest.raises(
     ValueError, match="Cannot import released ArrowArrayStream")
 
 
+def PyCapsule_IsValid(capsule, name):
+    return ctypes.pythonapi.PyCapsule_IsValid(ctypes.py_object(capsule), name) 
== 1
+
+
 class ParamExtType(pa.PyExtensionType):
 
     def __init__(self, width):
@@ -411,3 +415,123 @@ def test_imported_batch_reader_error():
                        match="Expected to be able to read 16 bytes "
                              "for message body, got 8"):
         reader_new.read_all()
+
+
[email protected]('obj', [pa.int32(), pa.field('foo', pa.int32()),
+                                 pa.schema({'foo': pa.int32()})],
+                         ids=['type', 'field', 'schema'])
+def test_roundtrip_schema_capsule(obj):
+    gc.collect()  # Make sure no Arrow data dangles in a ref cycle
+    old_allocated = pa.total_allocated_bytes()
+
+    capsule = obj.__arrow_c_schema__()
+    assert PyCapsule_IsValid(capsule, b"arrow_schema") == 1
+    assert pa.total_allocated_bytes() > old_allocated
+    obj_out = type(obj)._import_from_c_capsule(capsule)
+    assert obj_out == obj
+
+    assert pa.total_allocated_bytes() == old_allocated
+
+    capsule = obj.__arrow_c_schema__()
+
+    assert pa.total_allocated_bytes() > old_allocated
+    del capsule
+    assert pa.total_allocated_bytes() == old_allocated
+
+
[email protected]('arr,schema_accessor,bad_type,good_type', [
+    (pa.array(['a', 'b', 'c']), lambda x: x.type, pa.int32(), pa.string()),
+    (
+        pa.record_batch([pa.array(['a', 'b', 'c'])], names=['x']),
+        lambda x: x.schema,
+        pa.schema({'x': pa.int32()}),
+        pa.schema({'x': pa.string()})
+    ),
+], ids=['array', 'record_batch'])
+def test_roundtrip_array_capsule(arr, schema_accessor, bad_type, good_type):
+    gc.collect()  # Make sure no Arrow data dangles in a ref cycle
+    old_allocated = pa.total_allocated_bytes()
+
+    import_array = type(arr)._import_from_c_capsule
+
+    schema_capsule, capsule = arr.__arrow_c_array__()
+    assert PyCapsule_IsValid(schema_capsule, b"arrow_schema") == 1
+    assert PyCapsule_IsValid(capsule, b"arrow_array") == 1
+    arr_out = import_array(schema_capsule, capsule)
+    assert arr_out.equals(arr)
+
+    assert pa.total_allocated_bytes() > old_allocated
+    del arr_out
+
+    assert pa.total_allocated_bytes() == old_allocated
+
+    capsule = arr.__arrow_c_array__()
+
+    assert pa.total_allocated_bytes() > old_allocated
+    del capsule
+    assert pa.total_allocated_bytes() == old_allocated
+
+    with pytest.raises(ValueError,
+                       match=r"Could not cast.* string to requested .* int32"):
+        arr.__arrow_c_array__(bad_type.__arrow_c_schema__())
+
+    schema_capsule, array_capsule = arr.__arrow_c_array__(
+        good_type.__arrow_c_schema__())
+    arr_out = import_array(schema_capsule, array_capsule)
+    assert schema_accessor(arr_out) == good_type
+
+
+# TODO: implement requested_schema for stream
[email protected]('constructor', [
+    pa.RecordBatchReader.from_batches,
+    # Use a lambda because we need to re-order the parameters
+    lambda schema, batches: pa.Table.from_batches(batches, schema),
+], ids=['recordbatchreader', 'table'])
+def test_roundtrip_reader_capsule(constructor):
+    batches = make_batches()
+    schema = batches[0].schema
+
+    gc.collect()  # Make sure no Arrow data dangles in a ref cycle
+    old_allocated = pa.total_allocated_bytes()
+
+    obj = constructor(schema, batches)
+
+    capsule = obj.__arrow_c_stream__()
+    assert PyCapsule_IsValid(capsule, b"arrow_array_stream") == 1
+    imported_reader = pa.RecordBatchReader._import_from_c_capsule(capsule)
+    assert imported_reader.schema == schema
+    imported_batches = list(imported_reader)
+    assert len(imported_batches) == len(batches)
+    for batch, expected in zip(imported_batches, batches):
+        assert batch.equals(expected)
+
+    del obj, imported_reader, batch, expected, imported_batches
+
+    assert pa.total_allocated_bytes() == old_allocated
+
+    obj = constructor(schema, batches)
+
+    # TODO: turn this to ValueError once we implement validation.
+    bad_schema = pa.schema({'ints': pa.int32()})
+    with pytest.raises(NotImplementedError):
+        obj.__arrow_c_stream__(bad_schema.__arrow_c_schema__())
+
+    # Can work with matching schema
+    matching_schema = pa.schema({'ints': pa.list_(pa.int32())})
+    capsule = obj.__arrow_c_stream__(matching_schema.__arrow_c_schema__())
+    imported_reader = pa.RecordBatchReader._import_from_c_capsule(capsule)
+    assert imported_reader.schema == matching_schema
+    for batch, expected in zip(imported_reader, batches):
+        assert batch.equals(expected)
+
+
+def test_roundtrip_batch_reader_capsule():
+    batch = make_batch()
+
+    capsule = batch.__arrow_c_stream__()
+    assert PyCapsule_IsValid(capsule, b"arrow_array_stream") == 1
+    imported_reader = pa.RecordBatchReader._import_from_c_capsule(capsule)
+    assert imported_reader.schema == batch.schema
+    assert imported_reader.read_next_batch().equals(batch)
+    with pytest.raises(StopIteration):
+        imported_reader.read_next_batch()
diff --git a/python/pyarrow/tests/test_table.py 
b/python/pyarrow/tests/test_table.py
index 6b48633b91..a678f521e3 100644
--- a/python/pyarrow/tests/test_table.py
+++ b/python/pyarrow/tests/test_table.py
@@ -553,6 +553,93 @@ def test_recordbatch_dunder_init():
         pa.RecordBatch()
 
 
+def test_recordbatch_c_array_interface():
+    class BatchWrapper:
+        def __init__(self, batch):
+            self.batch = batch
+
+        def __arrow_c_array__(self, requested_type=None):
+            return self.batch.__arrow_c_array__(requested_type)
+
+    data = pa.record_batch([
+        pa.array([1, 2, 3], type=pa.int64())
+    ], names=['a'])
+    wrapper = BatchWrapper(data)
+
+    # Can roundtrip through the wrapper.
+    result = pa.record_batch(wrapper)
+    assert result == data
+
+    # Can also import with a schema that implementer can cast to.
+    castable_schema = pa.schema([
+        pa.field('a', pa.int32())
+    ])
+    result = pa.record_batch(wrapper, schema=castable_schema)
+    expected = pa.record_batch([
+        pa.array([1, 2, 3], type=pa.int32())
+    ], names=['a'])
+    assert result == expected
+
+
+def test_table_c_array_interface():
+    class BatchWrapper:
+        def __init__(self, batch):
+            self.batch = batch
+
+        def __arrow_c_array__(self, requested_type=None):
+            return self.batch.__arrow_c_array__(requested_type)
+
+    data = pa.record_batch([
+        pa.array([1, 2, 3], type=pa.int64())
+    ], names=['a'])
+    wrapper = BatchWrapper(data)
+
+    # Can roundtrip through the wrapper.
+    result = pa.table(wrapper)
+    expected = pa.Table.from_batches([data])
+    assert result == expected
+
+    # Can also import with a schema that implementer can cast to.
+    castable_schema = pa.schema([
+        pa.field('a', pa.int32())
+    ])
+    result = pa.table(wrapper, schema=castable_schema)
+    expected = pa.table({
+        'a': pa.array([1, 2, 3], type=pa.int32())
+    })
+    assert result == expected
+
+
+def test_table_c_stream_interface():
+    class StreamWrapper:
+        def __init__(self, batches):
+            self.batches = batches
+
+        def __arrow_c_stream__(self, requested_type=None):
+            reader = pa.RecordBatchReader.from_batches(
+                self.batches[0].schema, self.batches)
+            return reader.__arrow_c_stream__(requested_type)
+
+    data = [
+        pa.record_batch([pa.array([1, 2, 3], type=pa.int64())], names=['a']),
+        pa.record_batch([pa.array([4, 5, 6], type=pa.int64())], names=['a'])
+    ]
+    wrapper = StreamWrapper(data)
+
+    # Can roundtrip through the wrapper.
+    result = pa.table(wrapper)
+    expected = pa.Table.from_batches(data)
+    assert result == expected
+
+    # Passing schema works if already that schema
+    result = pa.table(wrapper, schema=data[0].schema)
+    assert result == expected
+
+    # If schema doesn't match, raises NotImplementedError
+    with pytest.raises(NotImplementedError):
+        pa.table(wrapper, schema=pa.schema([pa.field('a', pa.int32())]))
+
+
 def test_recordbatch_itercolumns():
     data = [
         pa.array(range(5), type='int16'),
diff --git a/python/pyarrow/tests/test_types.py 
b/python/pyarrow/tests/test_types.py
index 660765f336..16343eae61 100644
--- a/python/pyarrow/tests/test_types.py
+++ b/python/pyarrow/tests/test_types.py
@@ -1225,3 +1225,17 @@ def test_types_come_back_with_specific_type():
         schema = pa.schema([pa.field("field_name", arrow_type)])
         type_back = schema.field("field_name").type
         assert type(type_back) is type(arrow_type)
+
+
+def test_schema_import_c_schema_interface():
+    class Wrapper:
+        def __init__(self, schema):
+            self.schema = schema
+
+        def __arrow_c_schema__(self):
+            return self.schema.__arrow_c_schema__()
+
+    schema = pa.schema([pa.field("field_name", pa.int32())])
+    wrapped_schema = Wrapper(schema)
+
+    assert pa.schema(wrapped_schema) == schema
diff --git a/python/pyarrow/types.pxi b/python/pyarrow/types.pxi
index 764cb8e7b5..d394b803e7 100644
--- a/python/pyarrow/types.pxi
+++ b/python/pyarrow/types.pxi
@@ -15,7 +15,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-from cpython.pycapsule cimport PyCapsule_CheckExact, PyCapsule_GetPointer
+from cpython.pycapsule cimport PyCapsule_CheckExact, PyCapsule_GetPointer, 
PyCapsule_New, PyCapsule_IsValid
 
 import atexit
 from collections.abc import Mapping
@@ -23,7 +23,7 @@ import pickle
 import re
 import sys
 import warnings
-
+from cython import sizeof
 
 # These are imprecise because the type (in pandas 0.x) depends on the presence
 # of nulls
@@ -357,6 +357,46 @@ cdef class DataType(_Weakrefable):
                                            _as_c_pointer(in_ptr)))
         return pyarrow_wrap_data_type(result)
 
+    def __arrow_c_schema__(self):
+        """
+        Export to a ArrowSchema PyCapsule
+
+        Unlike _export_to_c, this will not leak memory if the capsule is not 
used.
+        """
+        cdef ArrowSchema* c_schema
+        capsule = alloc_c_schema(&c_schema)
+
+        with nogil:
+            check_status(ExportType(deref(self.type), c_schema))
+
+        return capsule
+
+    @staticmethod
+    def _import_from_c_capsule(schema):
+        """
+        Import a DataType from a ArrowSchema PyCapsule
+
+        Parameters
+        ----------
+        schema : PyCapsule
+            A valid PyCapsule with name 'arrow_schema' containing an
+            ArrowSchema pointer.
+        """
+        cdef:
+            ArrowSchema* c_schema
+            shared_ptr[CDataType] c_type
+
+        if not PyCapsule_IsValid(schema, 'arrow_schema'):
+            raise TypeError(
+                "Not an ArrowSchema object"
+            )
+        c_schema = <ArrowSchema*> PyCapsule_GetPointer(schema, 'arrow_schema')
+
+        with nogil:
+            c_type = GetResultValue(ImportType(c_schema))
+
+        return pyarrow_wrap_data_type(c_type)
+
 
 cdef class DictionaryMemo(_Weakrefable):
     """
@@ -2369,6 +2409,46 @@ cdef class Field(_Weakrefable):
             result = GetResultValue(ImportField(<ArrowSchema*> c_ptr))
         return pyarrow_wrap_field(result)
 
+    def __arrow_c_schema__(self):
+        """
+        Export to a ArrowSchema PyCapsule
+
+        Unlike _export_to_c, this will not leak memory if the capsule is not 
used.
+        """
+        cdef ArrowSchema* c_schema
+        capsule = alloc_c_schema(&c_schema)
+
+        with nogil:
+            check_status(ExportField(deref(self.field), c_schema))
+
+        return capsule
+
+    @staticmethod
+    def _import_from_c_capsule(schema):
+        """
+        Import a Field from a ArrowSchema PyCapsule
+
+        Parameters
+        ----------
+        schema : PyCapsule
+            A valid PyCapsule with name 'arrow_schema' containing an
+            ArrowSchema pointer.
+        """
+        cdef:
+            ArrowSchema* c_schema
+            shared_ptr[CField] c_field
+
+        if not PyCapsule_IsValid(schema, 'arrow_schema'):
+            raise ValueError(
+                "Not an ArrowSchema object"
+            )
+        c_schema = <ArrowSchema*> PyCapsule_GetPointer(schema, 'arrow_schema')
+
+        with nogil:
+            c_field = GetResultValue(ImportField(c_schema))
+
+        return pyarrow_wrap_field(c_field)
+
 
 cdef class Schema(_Weakrefable):
     """
@@ -3153,6 +3233,45 @@ cdef class Schema(_Weakrefable):
     def __repr__(self):
         return self.__str__()
 
+    def __arrow_c_schema__(self):
+        """
+        Export to a ArrowSchema PyCapsule
+
+        Unlike _export_to_c, this will not leak memory if the capsule is not 
used.
+        """
+        cdef ArrowSchema* c_schema
+        capsule = alloc_c_schema(&c_schema)
+
+        with nogil:
+            check_status(ExportSchema(deref(self.schema), c_schema))
+
+        return capsule
+
+    @staticmethod
+    def _import_from_c_capsule(schema):
+        """
+        Import a Schema from a ArrowSchema PyCapsule
+
+        Parameters
+        ----------
+        schema : PyCapsule
+            A valid PyCapsule with name 'arrow_schema' containing an
+            ArrowSchema pointer.
+        """
+        cdef:
+            ArrowSchema* c_schema
+
+        if not PyCapsule_IsValid(schema, 'arrow_schema'):
+            raise ValueError(
+                "Not an ArrowSchema object"
+            )
+        c_schema = <ArrowSchema*> PyCapsule_GetPointer(schema, 'arrow_schema')
+
+        with nogil:
+            result = GetResultValue(ImportSchema(c_schema))
+
+        return pyarrow_wrap_schema(result)
+
 
 def unify_schemas(schemas, *, promote_options="default"):
     """
@@ -4930,6 +5049,8 @@ def schema(fields, metadata=None):
     Parameters
     ----------
     fields : iterable of Fields or tuples, or mapping of strings to DataTypes
+        Can also pass an object that implements the Arrow PyCapsule Protocol
+        for schemas (has an ``__arrow_c_schema__`` method).
     metadata : dict, default None
         Keys and values must be coercible to bytes.
 
@@ -4969,6 +5090,8 @@ def schema(fields, metadata=None):
 
     if isinstance(fields, Mapping):
         fields = fields.items()
+    elif hasattr(fields, "__arrow_c_schema__"):
+        return Schema._import_from_c_capsule(fields.__arrow_c_schema__())
 
     for item in fields:
         if isinstance(item, tuple):
@@ -5103,3 +5226,61 @@ def _unregister_py_extension_types():
 
 _register_py_extension_type()
 atexit.register(_unregister_py_extension_types)
+
+
+#
+# PyCapsule export utilities
+#
+
+cdef void pycapsule_schema_deleter(object schema_capsule) noexcept:
+    cdef ArrowSchema* schema = <ArrowSchema*>PyCapsule_GetPointer(
+        schema_capsule, 'arrow_schema'
+    )
+    if schema.release != NULL:
+        schema.release(schema)
+
+    free(schema)
+
+cdef object alloc_c_schema(ArrowSchema** c_schema) noexcept:
+    c_schema[0] = <ArrowSchema*> malloc(sizeof(ArrowSchema))
+    # Ensure the capsule destructor doesn't call a random release pointer
+    c_schema[0].release = NULL
+    return PyCapsule_New(c_schema[0], 'arrow_schema', 
&pycapsule_schema_deleter)
+
+
+cdef void pycapsule_array_deleter(object array_capsule) noexcept:
+    cdef:
+        ArrowArray* array
+    # Do not invoke the deleter on a used/moved capsule
+    array = <ArrowArray*>cpython.PyCapsule_GetPointer(
+        array_capsule, 'arrow_array'
+    )
+    if array.release != NULL:
+        array.release(array)
+
+    free(array)
+
+cdef object alloc_c_array(ArrowArray** c_array) noexcept:
+    c_array[0] = <ArrowArray*> malloc(sizeof(ArrowArray))
+    # Ensure the capsule destructor doesn't call a random release pointer
+    c_array[0].release = NULL
+    return PyCapsule_New(c_array[0], 'arrow_array', &pycapsule_array_deleter)
+
+
+cdef void pycapsule_stream_deleter(object stream_capsule) noexcept:
+    cdef:
+        ArrowArrayStream* stream
+    # Do not invoke the deleter on a used/moved capsule
+    stream = <ArrowArrayStream*>PyCapsule_GetPointer(
+        stream_capsule, 'arrow_array_stream'
+    )
+    if stream.release != NULL:
+        stream.release(stream)
+
+    free(stream)
+
+cdef object alloc_c_stream(ArrowArrayStream** c_stream) noexcept:
+    c_stream[0] = <ArrowArrayStream*> malloc(sizeof(ArrowArrayStream))
+    # Ensure the capsule destructor doesn't call a random release pointer
+    c_stream[0].release = NULL
+    return PyCapsule_New(c_stream[0], 'arrow_array_stream', 
&pycapsule_stream_deleter)

[arrow] branch main updated: GH-35531: [Python] C Data Interface PyCapsule Protocol (#37797)

Reply via email to