On Fri, Mar 29, 2024 at 9:11 AM Steven G. Johnson <stev...@mit.edu> wrote:
> Should a dtype=object array be treated more like Python lists for type
> detection/coercion reasons? Currently, they are treated quite differently:
>
> >>> import numpy as np
> >>> np.isfinite([1,2,3])
> array([ True,  True,  True])
> >>> np.isfinite(np.asarray([1,2,3], dtype=object))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: ufunc 'isfinite' not supported for the input types, and the
> inputs could not be safely coerced to any supported types according to the
> casting rule ''safe''
>
> The reason I ask is that we ran into something similar when trying to pass
> wrappers around Julia arrays to Python. A Julia `Any[]` array is much like
> a Python list or a Numpy `object` array, and exposed in Python as a subtype
> of MutableSequence, but we found that it was treated by NumPy as more like
> a Numpy `object` array than a Python list (
> https://github.com/JuliaPy/PythonCall.jl/issues/486).
>
> Would it be desirable to treat a 1d Numpy `object` array more like a
> Python `list`? Or is there a way for externally defined types to opt-in
> to the `list` behavior? (I couldn't figure out in the numpy source code
> where `list` is being special-cased?)
>

`list` isn't special-cased, per se. Most numpy functions work on `ndarray` objects and accept "array-like" objects like `list`s by coercing them to `ndarray` objects ASAP using `np.asarray()`. `asarray` will leave existing `ndarray` instances alone.

When you pass `[1,2,3]` to a numpy function, `np.asarray([1,2,3])` takes that list and infers the dtype and shape from its contents using just the `Sequence` API. Since they are `int` objects of reasonable size, the result is `np.array([1, 2, 3], dtype=np.int64)`. The `isfinite` ufunc has a loop defined for `dtype=np.int64`.
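A minimal sketch of the behaviour described above, using only public NumPy calls. The variable names are illustrative, and the exact integer dtype is an assumption that depends on platform and NumPy version (it is `int64` on most 64-bit systems):

```python
import numpy as np

# A plain list goes through np.asarray() dtype inference.
from_list = np.asarray([1, 2, 3])
print(from_list.dtype)  # a native integer dtype, int64 on most platforms

# An existing ndarray is returned as-is, so dtype=object is preserved.
obj = np.asarray([1, 2, 3], dtype=object)
same_object = np.asarray(obj) is obj
print(same_object, obj.dtype)

# isfinite has no loop for object arrays, so this raises TypeError.
raised = False
try:
    np.isfinite(obj)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Converting back to a list re-runs the inference and yields an integer array.
reinferred = np.asarray(obj.tolist())
finite = np.isfinite(reinferred)
print(reinferred.dtype, finite)
```

Going through `tolist()` is just one way to re-run the inference; `obj.astype(np.int64)` would also work when the target dtype is already known.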
`np.asarray()` will also check for special methods and properties like `__array__()` and `__array_interface__` to allow objects to customize how they should be interpreted as `ndarray` objects: in particular, to allow memory to be shared efficiently as pointers if the layouts are compatible, but also to control the dtype of the final array. PythonCall.jl implements `__array__()` and `__array_interface__` for its array objects for these purposes, and if I am not mistaken, explicitly makes `Any[]` convert to a `dtype=object` `ndarray` <https://github.com/JuliaPy/PythonCall.jl/blob/f586f2494432f2e5366ff1e2876b8aa532630b54/src/JlWrap/objectarray.jl#L61-L69> (the `typestr = "|O"` is equivalent to `dtype=object` in that context).

It's *possible* that you'd really rather have this `PyObjectArray` *not* implement those interfaces and just let the `np.asarray()` inference work its magic through the `Sequence` API, but that is expensive. You could also try doing the type inference yourself in the `PyObjectArray.__array__()` implementation and avoid implementing `__array_interface__` for that object. Then `np.asarray()` will just delegate to `PyObjectArray.__array__()`.

-- 
Robert Kern
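
A minimal sketch of the last suggestion above: do the inference inside `__array__()` and leave `__array_interface__` undefined. `ObjectArrayWrapper` is a hypothetical stand-in written for illustration, not PythonCall.jl's actual `PyObjectArray`. Backing it with a Python list and delegating the inference to `np.array()` are both assumptions made to keep the example short.

```python
from collections.abc import MutableSequence

import numpy as np


class ObjectArrayWrapper(MutableSequence):
    """Hypothetical wrapper around a heterogeneous container."""

    def __init__(self, items):
        self._items = list(items)

    # Minimal MutableSequence plumbing.
    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        self._items[index] = value

    def __delitem__(self, index):
        del self._items[index]

    def __len__(self):
        return len(self._items)

    def insert(self, index, value):
        self._items.insert(index, value)

    # No __array_interface__ here, so np.asarray() falls through to __array__().
    def __array__(self, dtype=None, copy=None):
        if copy is False:
            # The conversion always allocates, so a no-copy request cannot be met.
            raise ValueError("a copy is required to convert this wrapper")
        # Let NumPy infer the dtype from the elements, as it would for a list,
        # unless the caller asked for a specific dtype.
        return np.array(self._items, dtype=dtype)


ints = ObjectArrayWrapper([1, 2, 3])
mixed = ObjectArrayWrapper([1, 2.5])

ints_finite = np.isfinite(ints)
mixed_dtype = np.asarray(mixed).dtype
forced_dtype = np.asarray(ints, dtype=object).dtype
print(ints_finite, mixed_dtype, forced_dtype)
```

The trade-off is the one mentioned above: inference walks every element and the result is a copy, so memory is no longer shared with the underlying container.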
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/