Hello NumPy-ers!

The `__array__` method is a great little tool for interoperability with 
NumPy. Briefly, calling `np.array()` or `np.asarray()` on an object with an 
`__array__` method returns a NumPy representation of that object, which 
may or may not involve copying data (this is up to the object’s implementation 
of `__array__`). Some references:

https://numpy.org/devdocs/user/basics.dispatch.html
https://docs.scipy.org/doc/numpy/reference/arrays.classes.html#numpy.class.__array__
https://numpy.org/devdocs/reference/generated/numpy.array.html
https://numpy.org/devdocs/reference/generated/numpy.asarray.html


(I couldn’t find an authoritative guide on good and bad practices with 
`__array__`, btw.)
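
For concreteness, here is a minimal sketch of the protocol as described above. 
The `Celsius` class and its contents are made up purely for illustration, and 
the catch-all `**kwargs` is my own assumption, there only so the sketch 
tolerates any extra keyword a given NumPy version might pass:

```python
import numpy as np


class Celsius:
    """Hypothetical container; it is not a NumPy array itself."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, **kwargs):
        # Build a fresh NumPy array from the internal list.
        # Whether this copies data is entirely up to the implementation.
        return np.array(self._values, dtype=dtype)


temps = Celsius([20.5, 21.0, 19.5])

# np.asarray() finds __array__ and uses its return value.
arr = np.asarray(temps)
```

Anything that already accepts NumPy arrays can then consume `temps` by calling 
`np.asarray()` on its input.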

For people writing e.g. visualisation libraries, this is a wonderful thing, 
because if we know how to visualise NumPy arrays, we can suddenly visualise 
anything with an `__array__` method. As an example, napari, while not being 
aware of dask, can visualise large dask arrays out of the box, which allows us 
to view 100GB out-of-core datasets easily.

However, in many cases, instantiating a NumPy array is an expensive operation 
(for example, copying an array from GPU to CPU memory) or involves substantial 
loss of information. The authors of some libraries, such as PyOpenCL [1], 
PyTorch [2], or even scipy.sparse, are reluctant to allow such an operation to 
run implicitly.

My proposal is to add an optional argument to `__array__` that would signal to 
the downstream library that we *really* want a NumPy array and are willing to 
wait for it. In the PyTorch issue I proposed `force=True`, and they are 
somewhat receptive to this, but, having read more about the existing NumPy 
APIs, I think `copy=True` would be a nice alternative:

- `np.array` already has a `copy=` keyword argument. Under this proposal, it 
would attempt to pass it to the downstream library, and, if that failed, it 
would try again without it and make its own copy.
- `np.asarray` could get a new `copy=` keyword argument that would match that 
of `np.array`.
- It would neatly express the idea that the array is going to e.g. get passed 
around between devices.
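
To make the first bullet concrete, here is a rough sketch of the fallback 
logic I have in mind. Everything in it is hypothetical: `to_numpy` stands in 
for what `np.array` would do internally, `DeviceArray` stands in for a library 
that refuses implicit conversion, and `LegacyArray` stands in for a library 
whose `__array__` knows nothing about `copy=`. It is not how NumPy is 
implemented.

```python
import numpy as np


class DeviceArray:
    """Hypothetical array living on another device (e.g. a GPU)."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=False):
        if not copy:
            # Refuse the expensive transfer unless explicitly requested.
            raise RuntimeError("pass copy=True to request a NumPy copy")
        return np.array(self._values, dtype=dtype)


class LegacyArray:
    """Hypothetical library whose __array__ predates the copy= argument."""

    def __init__(self, values):
        self._data = np.array(values)

    def __array__(self, dtype=None):
        return self._data if dtype is None else self._data.astype(dtype)


def to_numpy(obj, copy=True):
    """Hypothetical helper mimicking the proposed np.array behaviour."""
    try:
        # First, try to hand the copy= request to the object itself.
        return obj.__array__(copy=copy)
    except TypeError:
        # The object does not accept copy=: retry without it, then copy here.
        # (A real implementation would need to tell this case apart from a
        # TypeError raised for unrelated reasons inside __array__.)
        result = obj.__array__()
        return result.copy() if copy else result


legacy = LegacyArray([4, 5])
from_device = to_numpy(DeviceArray([1, 2, 3]))  # object handles copy=True
from_legacy = to_numpy(legacy)                  # fallback path, copied here
```

In this sketch, existing libraries keep working unchanged through the fallback 
path, while libraries that opt in get to decide what an explicit `copy=True` 
means for them.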

Or, we could just go with `force=`.

One bit of expressivity we would miss is “copy if necessary, but otherwise 
don’t bother”, but there are workarounds to this.

What do people think? I would be happy to write a PR and/or NEP for this if 
there is general consensus that this would be useful.

Thanks,

Juan.

Refs:

[1]: https://github.com/inducer/pyopencl/pull/301
[2]: https://github.com/pytorch/pytorch/issues/36560
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
