lllangWV opened a new issue, #45208:
URL: https://github.com/apache/arrow/issues/45208
### Describe the enhancement requested
### Enhancement Request: Custom Operator Support for PyArrow Extension Types in Compute Functions
Hello, pyarrow devs!
I have been using the PyArrow extension capability to define custom types,
which is extremely useful for extending Arrow's functionality. However, a
significant limitation arises when using these custom types with compute
functions.
For example, `FixedShapeTensorType`, the extension type designed for
`ndarrays`, raises an error when two such arrays are compared with
`pc.equal`:
#### Example Code
```python
import pyarrow as pa
import pyarrow.compute as pc
tensor_type = pa.fixed_shape_tensor(pa.int32(), (2, 2))
arr_1 = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
storage_1 = pa.array(arr_1, pa.list_(pa.int32(), 4))
tensor_array_1 = pa.ExtensionArray.from_storage(tensor_type, storage_1)
arr_2 = [[1, 3, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
storage_2 = pa.array(arr_2, pa.list_(pa.int32(), 4))
tensor_array_2 = pa.ExtensionArray.from_storage(tensor_type, storage_2)
# This triggers an error
print(pc.equal(tensor_array_1, tensor_array_2))
```
#### Error Message
```bash
    return func.call(args, None, memory_pool)
  File "pyarrow\\_compute.pyx", line 385, in pyarrow._compute.Function.call
  File "pyarrow\\error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow\\error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Function 'equal' has no kernel matching input types (extension<arrow.fixed_shape_tensor[value_type=int32, shape=[2,2]]>, extension<arrow.fixed_shape_tensor[value_type=int32, shape=[2,2]]>)
```
### Proposed Solution
I believe it would be highly useful for PyArrow to allow users to define
custom operator support for extension types, similar to how [Pandas enables
operator support for
`ExtensionArray`](https://pandas.pydata.org/pandas-docs/stable/development/extending.html#extensionarray-operator-support).
#### Suggested Implementation
Here’s an example for the interface:
```python
import pandas as pd
import pyarrow as pa

# NOTE: data_utils, mp_utils, and PythonObjectPandasDtype are not defined in
# this snippet; they are assumed to be helpers from the surrounding project.


class PythonObjectArrowType(pa.ExtensionType):
    def __init__(self):
        super().__init__(pa.binary(), "parquetdb.PythonObjectArrow")

    def __arrow_ext_serialize__(self):
        # The type has no parameters, so there is nothing to serialize.
        return b""

    @classmethod
    def __arrow_ext_deserialize__(cls, storage_type, serialized):
        return PythonObjectArrowType()

    def __arrow_ext_class__(self):
        return PythonObjectArrowArray

    def to_pandas_dtype(self):
        return PythonObjectPandasDtype()

    def __arrow_ext_scalar_class__(self):
        return PythonObjectArrowScalar


pa.register_extension_type(PythonObjectArrowType())


class PythonObjectArrowScalar(pa.ExtensionScalar):
    def as_py(self):
        # Deserialize the stored bytes back into a Python object.
        return data_utils.load_python_object(self.value.as_py())

    def __eq__(self, other):
        return self.value == other.value


class PythonObjectArrowArray(pa.ExtensionArray):
    def to_pandas(self, **kwargs):
        values = self.storage.to_numpy(zero_copy_only=False)
        results = mp_utils.parallel_apply(data_utils.load_python_object, values)
        return pd.Series(results)

    def to_values(self, **kwargs):
        values = self.storage.to_pandas(**kwargs).values
        results = mp_utils.parallel_apply(data_utils.load_python_object, values)
        return results
```
In this example, the `PythonObjectArrowScalar` class defines an `__eq__`
method, enabling custom equality comparisons for the scalar elements.
Similarly, the `PythonObjectArrowArray` class can provide custom
implementations for data conversion and manipulation.
### Challenges
While defining `__eq__` in the scalar class is straightforward, I am
uncertain how this would integrate into compute functions like `pc.equal`. It
may require exposing additional hooks or mechanisms in PyArrow to allow users
to register their operator implementations.
Please let me know if additional details or examples are needed.
Best,
Logan Lang
### Component(s)
C++, Python