Re: [PR] feat(python): Add array creation/building from buffers [arrow-nanoarrow]

via GitHub Thu, 08 Feb 2024 13:37:04 -0800


danepitkin commented on code in PR #378:
URL: https://github.com/apache/arrow-nanoarrow/pull/378#discussion_r1483602790



##########
python/src/nanoarrow/c_lib.py:
##########
@@ -125,10 +138,205 @@ def c_array(obj=None, requested_schema=None) -> CArray:
         out = CArray.allocate(CSchema.allocate())
         obj._export_to_c(out._addr(), out.schema._addr())
         return out
-    else:
+
+    # Try buffer protocol (e.g., numpy arrays)
+    try:
+        return c_array_from_pybuffer(obj)
+    except Exception as e:
         raise TypeError(
             f"Can't convert object of type {type(obj).__name__} to nanoarrow.c_array"
+        ) from e
+
+
+def c_array_from_pybuffer(obj) -> CArray:

Review Comment:
   (Optional) It might be nice to put these into a class, e.g.:
   ```
   class CArray:
       @staticmethod
       def from_pybuffers():
           ...

       @staticmethod
       def from_buffers():
           ...
   ```



##########
python/src/nanoarrow/c_lib.py:
##########
@@ -125,10 +138,205 @@ def c_array(obj=None, requested_schema=None) -> CArray:
         out = CArray.allocate(CSchema.allocate())
         obj._export_to_c(out._addr(), out.schema._addr())
         return out
-    else:
+
+    # Try buffer protocol (e.g., numpy arrays)
+    try:
+        return c_array_from_pybuffer(obj)
+    except Exception as e:
         raise TypeError(
             f"Can't convert object of type {type(obj).__name__} to nanoarrow.c_array"
+        ) from e
+
+
+def c_array_from_pybuffer(obj) -> CArray:
+    """Create an ArrowArray wrapper from the Python buffer protocol
+
+    Invokes the Python buffer protocol to wrap the buffer represented by obj
+    if possible.
+
+    Examples
+    --------
+
+    >>> import nanoarrow as na
+    >>> from nanoarrow.c_lib import c_array_from_pybuffer
+    >>> na.c_array_view(c_array_from_pybuffer(b"1234"))
+    <nanoarrow.c_lib.CArrayView>
+    - storage_type: 'uint8'
+    - length: 4
+    - offset: 0
+    - null_count: 0
+    - buffers[2]:
+      - validity <bool[0 b] >
+      - data <uint8[4 b] 49 50 51 52>
+    - dictionary: NULL
+    - children[0]:
+    """
+
+    buffer = CBuffer().set_pybuffer(obj)
+    view = buffer.data
+    type_id = view.data_type_id
+    element_size_bits = view.element_size_bits
+
+    builder = CArrayBuilder.allocate()
+
+    # Fixed-size binary needs a schema
+    if type_id == CArrowType.BINARY and element_size_bits != 0:
+        c_schema = (
+            CSchemaBuilder.allocate()
+            .set_type_fixed_size(CArrowType.FIXED_SIZE_BINARY, element_size_bits // 8)
+            .finish()
         )
+        builder.init_from_schema(c_schema)
+    elif type_id == CArrowType.STRING:
+        builder.init_from_type(int(CArrowType.INT8))
+    elif type_id == CArrowType.BINARY:
+        builder.init_from_type(int(CArrowType.UINT8))
+    else:
+        builder.init_from_type(int(type_id))
+
+    # Set the length
+    builder.set_length(len(view))
+
+    # Move ownership of the ArrowBuffer wrapped by buffer to builder.buffer(1)
+    builder.set_buffer(1, buffer)
+
+    # No nulls or offset from a PyBuffer
+    builder.set_null_count(0)
+    builder.set_offset(0)
+
+    return builder.finish()
+
+
+def c_array_empty(schema) -> CArray:

Review Comment:
   As a user, I would typically assume that calling `c_array(schema)` gives me an empty array, too. Would it be more intuitive to consolidate the APIs?
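
One way to picture the consolidation suggested here is a single entry point that branches on the input type. The sketch below uses stand-in `Schema` and `Array` classes and a `make_array()` function, all hypothetical and not part of nanoarrow, purely to show the dispatch shape. Whether `c_array()` itself should behave this way is the open question in this comment.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Hypothetical stand-in for a schema object (not nanoarrow's CSchema)."""

    type_name: str


@dataclass
class Array:
    """Hypothetical stand-in for an array object (not nanoarrow's CArray)."""

    schema: Schema
    values: list = field(default_factory=list)

    def __len__(self):
        return len(self.values)


def make_array(obj):
    """Single entry point: a schema yields an empty array of that type."""
    if isinstance(obj, Array):
        return obj
    if isinstance(obj, Schema):
        return Array(schema=obj)

    # Fall back to the Python buffer protocol, mirroring the try/except
    # that the diff above adds to c_array()
    try:
        view = memoryview(obj)
    except TypeError as e:
        raise TypeError(
            f"Can't convert object of type {type(obj).__name__} to Array"
        ) from e
    return Array(schema=Schema(view.format), values=view.tolist())
```

With this shape, `make_array(Schema("int32"))` returns a zero-length array and `make_array(b"1234")` wraps the four bytes. The trade-off is that the meaning of the call now depends on the argument's type, which a dedicated function such as `c_array_empty()` avoids.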



##########
python/src/nanoarrow/c_lib.py:
##########
@@ -257,7 +465,74 @@ def c_array_view(obj, requested_schema=None) -> CArrayView:
     return CArrayView.from_cpu_array(c_array(obj, requested_schema))
 
 
-def allocate_c_schema():
+def c_buffer(obj) -> CBuffer:
+    """Owning, read-only ArrowBuffer wrapper
+
+    Wraps obj in nanoarrow's owning buffer structure, the ArrowBuffer,

Review Comment:
   Could this maybe be handled in `CBuffer.__init__()` instead of using a separate API?
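
For comparison, here is a rough sketch of what constructor-based wrapping could look like. `Buffer` is a hypothetical stand-in built on the standard-library `memoryview`, not nanoarrow's `CBuffer`, whose real constructor and ownership rules may make this approach impractical.

```python
class Buffer:
    """Hypothetical read-only buffer wrapper (not nanoarrow's CBuffer)."""

    def __init__(self, obj=None):
        # With no argument, behave like an empty buffer
        if obj is None:
            obj = b""
        # memoryview() raises TypeError if obj lacks the buffer protocol;
        # toreadonly() (Python 3.8+) prevents writes through this wrapper
        self._view = memoryview(obj).toreadonly()

    @property
    def format(self):
        # struct-style format string of each element, e.g. "B" for bytes
        return self._view.format

    @property
    def size_bytes(self):
        return self._view.nbytes

    def __len__(self):
        return len(self._view)

    def __getitem__(self, i):
        return self._view[i]
```

Under these assumptions, `Buffer(b"1234")` plays the role that `c_buffer(b"1234")` plays in the PR, and `Buffer()` gives an empty buffer. A separate factory function still has one advantage: it can return an existing wrapper unchanged when handed one, which `__init__()` cannot do.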



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
