On Fri, Mar 29, 2024 at 9:11 AM Steven G. Johnson <stev...@mit.edu> wrote:

> Should a dtype=object array be treated more like Python lists for type
> detection/coercion reasons?   Currently, they are treated quite differently:
>
> >>> import numpy as np
> >>> np.isfinite([1,2,3])
> array([ True,  True,  True])
> >>> np.isfinite(np.asarray([1,2,3], dtype=object))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: ufunc 'isfinite' not supported for the input types, and the
> inputs could not be safely coerced to any supported types according to the
> casting rule ''safe''
>
> The reason I ask is that we ran into something similar when trying to pass
> wrappers around Julia arrays to Python.  A Julia `Any[]` array is much like
> a Python list or a Numpy `object` array, and exposed in Python as a subtype
> of MutableSequence, but we found that it was treated by NumPy as more like
> a Numpy `object` array than a Python list (
> https://github.com/JuliaPy/PythonCall.jl/issues/486).
>
> Would it be desirable to treat a 1d Numpy `object` array more like a
> Python `list`?   Or is there a way for externally defined types to opt-in
> to the `list` behavior?  (I couldn't figure out in the numpy source code
> where `list` is being special-cased?)
>

`list` isn't special-cased, per se. Most numpy functions work on `ndarray`
objects and accept "array-like" objects such as `list`s by coercing them to
`ndarray` objects as early as possible using `np.asarray()`. `asarray` leaves
existing `ndarray` instances alone, whatever their dtype. When you pass
`[1,2,3]` to a numpy function, `np.asarray([1,2,3])` takes that list and
infers the dtype and shape from its contents using just the `Sequence` API.
Since the elements are `int` objects of reasonable size, the result is
`np.array([1, 2, 3], dtype=np.int64)` (on most platforms; the default
integer dtype is platform-dependent). The `isfinite` ufunc has a loop
defined for `dtype=np.int64` but none for `dtype=object`, which is why the
second call in your example raises `TypeError`.
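
To make the difference concrete, here is a small sketch of the two paths
described above. The variable names are only for illustration, and the exact
integer dtype that gets inferred depends on the platform and NumPy version,
so the code only checks that it is some integer dtype.

```python
import numpy as np

# A plain list: asarray inspects the elements and infers an integer dtype.
inferred = np.asarray([1, 2, 3])

# An explicit object array: the dtype is already decided, nothing is inferred.
as_object = np.asarray([1, 2, 3], dtype=object)

print(inferred.dtype.kind)   # 'i' (a native integer dtype)
print(as_object.dtype)       # object

# isfinite has a loop for the integer dtype, so this works.
finite = np.isfinite(inferred)

# There is no isfinite loop for dtype=object, so this raises TypeError.
try:
    np.isfinite(as_object)
    object_error = None
except TypeError as exc:
    object_error = exc

# asarray hands back an existing ndarray unchanged (no copy, no re-inference).
same_object = np.asarray(as_object) is as_object
```

Note that the last line is the crux: once something already is (or declares
itself to be) a `dtype=object` array, `asarray` does not look at the elements
again.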

`np.asarray()` will also check for special methods and properties like
`__array__()` and `__array_interface__` to allow objects to customize how
they should be interpreted as `ndarray` objects, in particular, allowing
memory to be shared efficiently as pointers, if the layouts are compatible,
but also to control the dtype of the final array. PythonCall.jl implements
`__array__()` and `__array_interface__` for its array objects for these
purposes and, if I am not mistaken, explicitly makes `Any[]` convert to a
`dtype=object` `ndarray`
<https://github.com/JuliaPy/PythonCall.jl/blob/f586f2494432f2e5366ff1e2876b8aa532630b54/src/JlWrap/objectarray.jl#L61-L69>
(the `typestr = "|O"` is equivalent to `dtype=object` in that context). It's
*possible* that you'd really rather have this `PyObjectArray` *not*
implement those interfaces and just let the `np.asarray()` inference work
its magic through the `Sequence` API, but that is expensive. You could also
do the type inference yourself inside `PyObjectArray.__array__()` and avoid
implementing `__array_interface__` for that object. Then `np.asarray()` will
just delegate to `PyObjectArray.__array__()`.
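
As a rough sketch of that last option, written in pure Python rather than in
PythonCall.jl: `InferringWrapper` below is a hypothetical stand-in for
`PyObjectArray`, not its actual implementation. It defines `__array__()` but
no `__array_interface__`, and it simply reuses NumPy's own list inference; a
real wrapper could do cheaper inference on the Julia side instead. The
optional `copy` argument is accepted because newer NumPy versions may pass
it.

```python
import numpy as np


class InferringWrapper:
    """Hypothetical wrapper around a foreign `Any[]`-like container."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def __array__(self, dtype=None, copy=None):
        # This sketch always builds a new array, so it cannot honor copy=False.
        if copy is False:
            raise ValueError("InferringWrapper cannot be converted without a copy")
        # Let NumPy infer the dtype from the elements, as it would for a list,
        # unless the caller asked for a specific dtype.
        return np.asarray(self._items, dtype=dtype)


ints = InferringWrapper([1, 2, 3])
mixed = InferringWrapper([1, 2.5])

ints_array = np.asarray(ints)      # delegates to InferringWrapper.__array__()
mixed_array = np.asarray(mixed)    # elements promote to a floating-point dtype

# Because conversion now yields a numeric dtype, ufuncs such as isfinite work.
finite = np.isfinite(ints)
```

The trade-off is the one mentioned above: inference costs a pass over the
elements, and the resulting array no longer shares memory with the original
container.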

-- 
Robert Kern
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org