paleolimbot commented on code in PR #378:
URL: https://github.com/apache/arrow-nanoarrow/pull/378#discussion_r1491559597
##########
python/src/nanoarrow/c_lib.py:
##########
@@ -120,15 +157,134 @@ def c_array(obj=None, requested_schema=None) -> CArray:
*obj.__arrow_c_array__(requested_schema=requested_schema_capsule)
)
- # for pyarrow < 14.0
- if hasattr(obj, "_export_to_c"):
+ # Try buffer protocol (e.g., numpy arrays or a c_buffer())
+ if _obj_is_buffer(obj):
+ return _c_array_from_pybuffer(obj)
+
+ # Try import of bare capsule
+ if _obj_is_capsule(obj, "arrow_array"):
+ if requested_schema is None:
+ requested_schema_capsule = CSchema.allocate()._capsule
+ else:
+ requested_schema_capsule = requested_schema.__arrow_c_schema__()
+
+ return CArray._import_from_c_capsule(requested_schema_capsule, obj)
+
+ # Try _export_to_c for Array/RecordBatch objects if pyarrow < 14.0
+ if _obj_is_pyarrow_array(obj):
out = CArray.allocate(CSchema.allocate())
obj._export_to_c(out._addr(), out.schema._addr())
return out
- else:
- raise TypeError(
- f"Can't convert object of type {type(obj).__name__} to
nanoarrow.c_array"
- )
+
+ # Try import of iterable
+ if _obj_is_iterable(obj):
+ return _c_array_from_iterable(obj, requested_schema)
+
+ raise TypeError(
+ f"Can't convert object of type {type(obj).__name__} to
nanoarrow.c_array"
+ )
+
+
+def c_array_from_buffers(
+ schema,
+ length: int,
+ buffers: Iterable[Any],
+ null_count: int = -1,
+ offset: int = 0,
+ children: Iterable[Any] = (),
+ validation_level: Literal["full", "default", "minimal", "none"] = "default",
+) -> CArray:
+ """Create an ArrowArray wrapper from components
+
+ Given a schema, build an ArrowArray buffer-wise. This allows almost any array
+ to be assembled; however, it requires some knowledge of the Arrow Columnar
+ specification. This function will do its best to validate the sizes and
+ content of buffers according to ``validation_level``, which can be set
+ to ``"full"`` for maximum safety.
+
+ Parameters
+ ----------
+
+ schema : schema-like
+ The data type of the desired array as sanitized by :func:`c_schema`.
+ length : int
+ The length of the output array.
+ buffers : Iterable of buffer-like or None
+ An iterable of buffers as sanitized by :func:`c_buffer`. Any object
+ supporting the Python Buffer protocol is accepted. Buffer data types
+ are not checked. A buffer value of ``None`` will skip setting a buffer
+ (i.e., that buffer will be of length zero and its pointer will
+ be ``NULL``).
+ null_count : int, optional
+ The number of null values, if known in advance. If -1 (the default),
+ the null count will be calculated based on the validity bitmap. If
+ the validity bitmap was set to ``None``, the calculated null count
+ will be zero.
+ offset : int, optional
+ The logical offset from the start of the array.
+ children : Iterable of array-like
+ An iterable of arrays used to set child fields of the array. Can contain
+ any object accepted by :func:`c_array`. Must contain the exact number of
+ required children as specified by ``schema``.
+ validation_level: str, optional
+ One of "none" (no check), "minimal" (check buffer sizes that do not
require
+ dereferencing buffer content), "default" (check all buffer sizes), or
"full"
+ (check all buffer sizes and all buffer content).
+
+ Examples
+ --------
+
+ >>> import nanoarrow as na
+ >>> c_array = na.c_array_from_buffers(na.uint8(), 5, [None, b"12345"])
+ >>> na.c_array_view(c_array)
+ <nanoarrow.c_lib.CArrayView>
+ - storage_type: 'uint8'
+ - length: 5
+ - offset: 0
+ - null_count: 0
+ - buffers[2]:
+ - validity <bool[0 b] >
+ - data <uint8[5 b] 49 50 51 52 53>
+ - dictionary: NULL
+ - children[0]:
+ """
+ schema = c_schema(schema)
+ builder = CArrayBuilder.allocate()
+
+ # This is slightly wasteful: it will allocate arrays recursively and we are about
Review Comment:
I clarified this comment (yes, the child arrays). For every child there's an
"allocate a released `ArrowArray`" step plus an "initialize it with nanoarrow's
release callback so you can work with its buffers" step. Technically we don't need
the second half here, but the validation is pretty nice for avoiding memory leaks,
and if you have a gazillion columns you probably don't want to use this function
anyway.
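
As a rough usage sketch of how `buffers`, `null_count`, and `validation_level`
interact (this assumes only what the docstring above already shows:
`na.c_array_from_buffers()`, `na.uint8()`, and `na.c_array_view()`; the bitmap
layout is the Arrow columnar one, least-significant bit first):

    import nanoarrow as na

    # A nullable uint8 array of length 5 where element 2 is null.
    # First buffer: validity bitmap (bits 0, 1, 3, 4 set -> 0b00011011).
    # Second buffer: the data buffer; the byte at the null slot is arbitrary.
    array = na.c_array_from_buffers(
        na.uint8(),
        5,
        [bytes([0b00011011]), b"\x01\x02\x00\x04\x05"],
        null_count=1,
        validation_level="full",
    )
    print(na.c_array_view(array))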