Hi Nathaniel,

Thanks for starting the discussion!

As Marten says, I think it would be useful to more clearly define what it
means to be an abstract array. ndarray has lots of methods/properties that
expose internal implementation (e.g., view, strides) that presumably we
don't want to require as part of this interface. On the other hand, dtype
and shape are almost assuredly part of this interface.
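
A minimal sketch of what "dtype and shape, but not view or strides" could
look like as a structural check. The names `DuckArray` and `ToyArray` are
hypothetical and exist only for illustration; nothing here is a proposed or
existing NumPy API.

```python
from typing import Any, Protocol, Tuple, runtime_checkable


@runtime_checkable
class DuckArray(Protocol):
    """Hypothetical minimal interface: only dtype and shape are required."""

    dtype: Any
    shape: Tuple[int, ...]


class ToyArray:
    """Toy container that stores nested lists and exposes only shape/dtype."""

    def __init__(self, rows):
        self._rows = [list(r) for r in rows]
        ncols = len(self._rows[0]) if self._rows else 0
        self.shape = (len(self._rows), ncols)
        self.dtype = float  # assumption: a plain Python type stands in for a dtype


# Has shape and dtype, so it satisfies the hypothetical interface.
is_duck = isinstance(ToyArray([[1.0, 2.0], [3.0, 4.0]]), DuckArray)

# A plain list exposes neither attribute, so it does not.
not_duck = isinstance([1.0, 2.0], DuckArray)
```

Note that `ToyArray` has no `view` or `strides`, which is the point: the
check says nothing about how the data is stored.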

To help guide the discussion, it would be good to identify concrete
examples of types that should and should not satisfy this interface, e.g.,

Marten's case 1: works exactly like ndarray, but stores data differently:
parallel arrays (e.g., dask.array), sparse arrays (e.g.,
https://github.com/pydata/sparse), hypothetical non-strided arrays (e.g.,
always C ordered).

Marten's case 2: same methods as ndarray, but gives different results:
np.ma.MaskedArray, arrays with units (quantities), maybe labeled arrays
like xarray.DataArray.
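
To make case 2 concrete, here is a small illustration using
np.ma.MaskedArray: the method names match ndarray, but the results differ
because masked elements are skipped. It also shows the existing behaviour
of `np.asarray` versus `np.asanyarray` for this type.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0])
masked = np.ma.MaskedArray(data, mask=[False, True, False])

# Same method, different answer: the masked element is left out of the sum.
plain_sum = data.sum()
masked_sum = masked.sum()

# asanyarray lets the subclass pass through; asarray coerces to a base ndarray.
passed_through = type(np.asanyarray(masked))
coerced = type(np.asarray(masked))
```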

I don't think we have a hope of making a single base class for case 2 work
with everything in NumPy, but we can define interfaces with different
levels of functionality.

Because there is such a gradation of "duck array" types, I agree with
Marten that we should not deprecate NDArrayOperatorsMixin. It's useful for
types like xarray.Dataset that define __array_ufunc__ but cannot satisfy
the full abstract array interface.
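
As a rough sketch of that use case, the class below defines
__array_ufunc__ and inherits its operators from NDArrayOperatorsMixin, yet
has no shape or dtype of its own. `Wrapped` is a hypothetical stand-in, far
simpler than xarray.Dataset, and it deliberately handles only single-output
ufunc calls without `out`.

```python
import numbers

import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class Wrapped(NDArrayOperatorsMixin):
    """Hypothetical container: supports ufuncs but exposes no shape or dtype."""

    def __init__(self, value):
        self.value = np.asarray(value)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Keep the sketch small: plain calls to single-output ufuncs only.
        if method != "__call__" or "out" in kwargs or ufunc.nout != 1:
            return NotImplemented
        unwrapped = []
        for x in inputs:
            if isinstance(x, Wrapped):
                unwrapped.append(x.value)
            elif isinstance(x, (np.ndarray, numbers.Number)):
                unwrapped.append(x)
            else:
                return NotImplemented
        return Wrapped(ufunc(*unwrapped, **kwargs))


w = Wrapped([1, 2, 3])
shifted = w + 1                      # operator supplied by the mixin
doubled = 2 * w                      # reflected operator, also from the mixin
roots = np.sqrt(Wrapped([4.0, 9.0]))  # direct ufunc call
```

The mixin routes every arithmetic operator through __array_ufunc__, so the
class gets `+`, `*`, and friends without being array-like in any other
sense.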

Finally, for the name, what about `asduckarray`? Though perhaps that could
be a source of confusion, given the gradation of duck-array-like types.
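
Purely to illustrate the naming question, a function with that name might
behave roughly as sketched below. This is not a NumPy function, and the
pass-through criterion (having both shape and dtype) is an assumption made
for the example; where exactly to draw that line is the open question.

```python
import numpy as np


def asduckarray(obj):
    """Hypothetical: pass through objects with shape and dtype, else coerce."""
    if hasattr(obj, "shape") and hasattr(obj, "dtype"):
        return obj
    return np.asarray(obj)


m = np.ma.MaskedArray([1, 2], mask=[False, True])
same_object = asduckarray(m) is m          # a case 2 type passes through as-is
from_list = type(asduckarray([1, 2]))      # a plain list is coerced
```

With this criterion a case 2 type such as MaskedArray passes through
untouched, which shows how one name could end up covering quite different
kinds of duck arrays.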

Cheers,
Stephan

On Thu, Mar 8, 2018 at 7:07 AM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Hi Nathaniel,
>
> Overall, hugely in favour!  For detailed comments, it would be good to
> have a link to a PR; could you put that up?
>
> A larger comment: you state that you think `np.asanyarray` is a
> mistake since `np.matrix` and `np.ma.MaskedArray` would pass through
> and that those do not strictly mimic `NDArray`. Here, I agree with
> `matrix` (but since we're deprecating it, let's remove that from the
> discussion), but I do not see how your proposed interface would not
> let `MaskedArray` pass through, nor really that one would necessarily
> want that.
>
> I think it may be good to distinguish two separate cases:
> 1. Everything has exactly the same meaning as for `ndarray` but the
> data is stored differently (i.e., only `view` does not work). One can
> thus expect that for `output = function(inputs)`, at the end all
> `duck_output == ndarray_output`.
> 2. Everything is implemented but operations may give different output
> (depending on masks for masked arrays, units for quantities, etc.), so
> generally `duck_output != ndarray_output`.
>
> Which one of these are you aiming at? By including
> `NDArrayOperatorsMixin`, it would seem option (2), but perhaps not? Is
> there a case for both separately?
>
> Smaller general comment: at least in the NEP I would not worry about
> deprecating `NDArrayOperatorsMixin` - this may well be handy in itself
> (for things that implement `__array_ufunc__` but do not have shape,
> etc.; I have been doing some work on creating ufunc chains that would
> use this -- but they definitely are not array-like). Similarly, I
> think there is room for an `NDArrayShapeMixin` which might help with
> `concatenate` and friends.
>
> Finally, on the name: `asarray` and `asanyarray` are just shims over
> `array`, so one option would be to add an argument in `array` (or
> broaden the scope of `subok`).
>
> As an explicit suggestion, one could introduce a `duck` or `abstract`
> argument to `array` which is used in `asarray` and `asanyarray` as
> well (corresponding to options 1 and 2), and eventually default to
> something sensible (I would think `False` for `asarray` and `True` for
> `asanyarray`).
>
> All the best,
>
> Marten
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>