jorisvandenbossche commented on code in PR #43488:
URL: https://github.com/apache/arrow/pull/43488#discussion_r1705431221


##########
python/pyarrow/array.pxi:
##########
@@ -4447,6 +4447,69 @@ cdef class FixedShapeTensorArray(ExtensionArray):
             FixedSizeListArray.from_arrays(values, shape[1:].prod())
         )
 
+cdef class Bool8Array(ExtensionArray):
+    """
+    Concrete class for bool8 extension arrays.
+    Examples
+    --------
+    Define the extension type for an bool8 array
+    >>> import pyarrow as pa
+    >>> bool8_type = pa.bool8()
+    Create an extension array
+    >>> arr = [-1, 0, 1, 2, None]

Review Comment:
   ```suggestion
       Define the extension type for an bool8 array
   
       >>> import pyarrow as pa
       >>> bool8_type = pa.bool8()
   
       Create an extension array
   
       >>> arr = [-1, 0, 1, 2, None]
   ```
   
   (for correct rendering, it needs blank lines between the normal text and 
code blocks (and it also looks better in plain text IMO))



##########
python/pyarrow/array.pxi:
##########
@@ -4447,6 +4447,69 @@ cdef class FixedShapeTensorArray(ExtensionArray):
             FixedSizeListArray.from_arrays(values, shape[1:].prod())
         )
 
+cdef class Bool8Array(ExtensionArray):
+    """
+    Concrete class for bool8 extension arrays.
+    Examples
+    --------
+    Define the extension type for an bool8 array
+    >>> import pyarrow as pa
+    >>> bool8_type = pa.bool8()
+    Create an extension array
+    >>> arr = [-1, 0, 1, 2, None]
+    >>> storage = pa.array(arr, pa.int8())
+    >>> pa.ExtensionArray.from_storage(bool8_type, storage)
+    <pyarrow.lib.Bool8Array object at ...>
+    [
+      -1,
+      0,
+      1,
+      2,
+      null
+    ]
+    """
+
+    def to_numpy(self, zero_copy_only=True, writable=False):
+        try:
+            return self.storage.to_numpy().view(np.bool_)
+        except ArrowInvalid as e:
+            if zero_copy_only:
+                raise e
+
+        return _pc().not_equal(self.storage, 
0).to_numpy(zero_copy_only=zero_copy_only, writable=writable)
+
+    @staticmethod
+    def from_numpy(obj):
+        """
+        Convert numpy array to a bool8 extension array without making a copy.
+        The input array must be 1-dimensional, with either bool_ or int8 dtype.
+
+        Parameters
+        ----------
+        obj : numpy.ndarray
+
+        Examples
+        --------
+        >>> import pyarrow as pa
+        >>> import numpy as np
+        >>> arr = np.array([True, False, True], dtype=np.bool_)
+        >>> pa.Bool8Array.from_numpy(arr)
+        <pyarrow.lib.Bool8Array object at ...>
+        [
+          1,
+          0,
+          1
+        ]
+        """
+
+        if obj.ndim != 1:
+            raise ValueError(f"Cannot convert {obj.ndim}-D array to bool8 
array")
+        
+        if obj.dtype not in [np.bool_, np.int8]:
+            raise TypeError(f"Array dtype {obj.dtype} incompatible with bool8 
storage")
+
+        buf = foreign_buffer(obj.ctypes.data, obj.size)
+        return Array.from_buffers(bool8(), obj.size, [None, buf])

Review Comment:
   This would loose track of the buffer owner (the numpy array `obj`), so you 
would need to pass that to the `foreign_buffer` function as `base` argument.
   
   However, I think we could also simplify this by first creating a pyarrow 
storage array of int8, and then using `self.from_storage()` instead of using 
`from_buffers()` ?



##########
python/pyarrow/array.pxi:
##########
@@ -4447,6 +4447,69 @@ cdef class FixedShapeTensorArray(ExtensionArray):
             FixedSizeListArray.from_arrays(values, shape[1:].prod())
         )
 
+cdef class Bool8Array(ExtensionArray):
+    """
+    Concrete class for bool8 extension arrays.
+    Examples

Review Comment:
   ```suggestion
       Concrete class for bool8 extension arrays.
   
       Examples
   ```



##########
python/pyarrow/scalar.pxi:
##########
@@ -1084,6 +1084,10 @@ cdef class FixedShapeTensorScalar(ExtensionScalar):
             ctensor = GetResultValue(c_type.MakeTensor(scalar))
         return pyarrow_wrap_tensor(ctensor)
 
+cdef class Bool8Scalar(ExtensionScalar):
+    """
+    Concrete class for bool8 extension scalar.
+    """

Review Comment:
   We could probably override the `as_py()` method here, which by default uses 
the storage value. But in this case we could convert that integer to a bool?



##########
python/pyarrow/types.pxi:
##########
@@ -1809,6 +1809,32 @@ cdef class FixedShapeTensorType(BaseExtensionType):
     def __arrow_ext_scalar_class__(self):
         return FixedShapeTensorScalar
 
+cdef class Bool8Type(BaseExtensionType):
+    """
+    Concrete class for bool8 extension type.
+    Bool8 is an alternate representation for boolean
+    arrays using 8 bits instead of 1 bit per value. The underlying
+    storage type is int8.
+    Examples

Review Comment:
   Same comments here about blank lines (I assume this might have been caused 
by copy pasting from the Opaque PR, I have noticed that blank lines sometimes 
get removed when copying from web interfaces)



##########
python/pyarrow/types.pxi:
##########
@@ -5206,6 +5232,41 @@ def fixed_shape_tensor(DataType value_type, shape, 
dim_names=None, permutation=N
 
     return out
 
+def bool8():
+    """
+    Create instance of bool8 extension type.
+    Examples

Review Comment:
   And same here



##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -1661,3 +1661,105 @@ def test_legacy_int_type():
             batch = ipc_read_batch(buf)
             assert isinstance(batch.column(0).type, LegacyIntType)
             assert batch.column(0) == ext_arr
+

Review Comment:
   ```suggestion
   
   
   ```
   
   (2 blank lines between top-level functions. I suppose the linting build will 
fail about this, but I would recommend to enable the pre-commit checks that 
also catch some of those issues)



##########
python/pyarrow/includes/libarrow.pxd:
##########
@@ -2882,6 +2882,17 @@ cdef extern from "arrow/extension/fixed_shape_tensor.h" 
namespace "arrow::extens
             " arrow::extension::FixedShapeTensorArray"(CExtensionArray):
         const CResult[shared_ptr[CTensor]] ToTensor() const
 
+cdef extern from "arrow/extension/bool8.h" namespace "arrow::extension" nogil:
+    cdef cppclass CBool8Type \
+            " arrow::extension::Bool8Type"(CExtensionType):

Review Comment:
   ```suggestion
       cdef cppclass CBool8Type" arrow::extension::Bool8Type"(CExtensionType):
   ```



##########
python/pyarrow/array.pxi:
##########
@@ -4447,6 +4447,69 @@ cdef class FixedShapeTensorArray(ExtensionArray):
             FixedSizeListArray.from_arrays(values, shape[1:].prod())
         )
 
+cdef class Bool8Array(ExtensionArray):
+    """
+    Concrete class for bool8 extension arrays.
+    Examples
+    --------
+    Define the extension type for an bool8 array
+    >>> import pyarrow as pa
+    >>> bool8_type = pa.bool8()
+    Create an extension array
+    >>> arr = [-1, 0, 1, 2, None]
+    >>> storage = pa.array(arr, pa.int8())
+    >>> pa.ExtensionArray.from_storage(bool8_type, storage)
+    <pyarrow.lib.Bool8Array object at ...>
+    [
+      -1,
+      0,
+      1,
+      2,
+      null
+    ]
+    """
+
+    def to_numpy(self, zero_copy_only=True, writable=False):
+        try:
+            return self.storage.to_numpy().view(np.bool_)
+        except ArrowInvalid as e:
+            if zero_copy_only:
+                raise e
+
+        return _pc().not_equal(self.storage, 
0).to_numpy(zero_copy_only=zero_copy_only, writable=writable)

Review Comment:
   And in practice, this code path gets reached if there are missing values?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to