Hello NumPy-ers!
The __array__ method is a great little tool to allow interoperability with
NumPy. Briefly, calling `np.array()` or `np.asarray()` on an object with an
`__array__` method, one can get a NumPy representation of that object, which
may or may not involve data copying (this is up to the object’s implementation
of `__array__`). Some references:
https://numpy.org/devdocs/user/basics.dispatch.html
<https://numpy.org/devdocs/user/basics.dispatch.html>
https://docs.scipy.org/doc/numpy/reference/arrays.classes.html#numpy.class.__array__
<https://docs.scipy.org/doc/numpy/reference/arrays.classes.html#numpy.class.__array__>
https://numpy.org/devdocs/reference/generated/numpy.array.html
<https://numpy.org/devdocs/reference/generated/numpy.array.html>
https://numpy.org/devdocs/reference/generated/numpy.asarray.html
<https://numpy.org/devdocs/reference/generated/numpy.asarray.html>
(I couldn’t find an authoritative guide on good and bad practices with
`__array__`, btw.)
For people writing e.g. visualisation libraries, this is a wonderful thing,
because if we know how to visualise NumPy arrays, we can suddenly visualise
anything with an `__array__` method. As an example, napari, while not being
aware of dask, can visualise large dask arrays out of the box, which allows us
to view 100GB out-of-core datasets easily.
However, in many cases, instantiating a NumPy array is an expensive operation,
for example copying an array from GPU to CPU memory, or involves substantial
loss of information. Some library authors are reluctant to allow implicit
execution of such an operation, such as PyOpenCL [1], PyTorch [2], or even
scipy.sparse.
My proposal is to add an optional argument to `__array__` that would signal to
the downstream library that we *really* want a NumPy array and are willing to
wait for it. In the PyTorch issue I proposed `force=True`, and they are
somewhat receptive of this, but, reading more about the existing NumPy APIs, I
think `copy=True` would be a nice alternative:
- np.array already has a copy= keyword argument. Under this proposal, it would
attempt to pass it to the downstream library, and, if that failed, it would try
again without it and run its own copy.
- np.asarray could get a new copy= keyword argument that would match np.array’s.
- It would neatly express the idea that the array is going to e.g. get passed
around between devices.
Or, we could just go with `force=`.
One bit of expressivity we would miss is “copy if necessary, but otherwise
don’t bother”, but there are workarounds to this.
What do people think? I would be happy to write a PR and/or NEP for this if
there is general consensus that this would be useful.
Thanks,
Juan.
Refs:
[1]: https://github.com/inducer/pyopencl/pull/301
[2]: https://github.com/pytorch/pytorch/issues/36560
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion