[Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-08 Thread Nathaniel Smith
Hi all,

Here's a more substantive NEP: an attempt to define a standard way
for functions to say that they can accept any "duck array".

Biggest open question for me: the name "asabstractarray" kinda sucks
(for reasons described in the NEP), and I'd love to have something
better. Any ideas?

Rendered version:
https://github.com/njsmith/numpy/blob/nep-16-abstract-array/doc/neps/nep-0016-abstract-array.rst

-n




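====================================================
An abstract base class for identifying "duck arrays"
====================================================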


:Author: Nathaniel J. Smith 
:Status: Draft
:Type: Standards Track
:Created: 2018-03-06


Abstract
--------


We propose to add an abstract base class ``AbstractArray`` so that
third-party classes can declare their ability to "quack like" an
``ndarray``, and an ``asabstractarray`` function that performs
similarly to ``asarray`` except that it passes through
``AbstractArray`` instances unchanged.


Detailed description
--------------------


Many functions, in NumPy and in third-party packages, start with some
code like::

   def myfunc(a, b):
       a = np.asarray(a)
       b = np.asarray(b)
       ...

This ensures that ``a`` and ``b`` are ``np.ndarray`` objects, so
``myfunc`` can carry on assuming that they'll act like ndarrays both
semantically (at the Python level), and also in terms of how they're
stored in memory (at the C level). But many of these functions only
work with arrays at the Python level, which means that they don't
actually need ``ndarray`` objects *per se*: they could work just as
well with any Python object that "quacks like" an ndarray, such as
sparse arrays, dask's lazy arrays, or xarray's labeled arrays.

However, currently, there's no way for these libraries to express that
their objects can quack like an ndarray, and there's no way for
functions like ``myfunc`` to express that they'd be happy with
anything that quacks like an ndarray. The purpose of this NEP is to
provide those two features.
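
As a rough sketch of how those two features might fit together: the empty ``AbstractArray`` body, the ``DuckArray`` stand-in type, and the fallback to ``np.asarray`` below are illustrative assumptions, not the proposed implementation.

```python
import abc

import numpy as np


class AbstractArray(abc.ABC):
    """Hypothetical marker ABC for types that quack like an ndarray."""


def asabstractarray(obj):
    """Pass declared duck arrays through unchanged; coerce everything else."""
    if isinstance(obj, AbstractArray):
        return obj
    # Plain ndarrays come back unchanged; lists, matrices, etc. are converted.
    return np.asarray(obj)


class DuckArray(AbstractArray):
    """Stand-in for a third-party array type (e.g. a sparse or lazy array)."""

    def __init__(self, data):
        self._data = list(data)


def myfunc(a, b):
    a = asabstractarray(a)
    b = asabstractarray(b)
    return a, b


duck = DuckArray([1, 2, 3])
a, b = myfunc(duck, [4, 5, 6])
print(a is duck)   # True: the duck array was not converted
print(type(b))     # <class 'numpy.ndarray'>
```

The library declares its capability once (here by subclassing), and ``myfunc`` opts in by calling ``asabstractarray`` instead of ``asarray``.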

Sometimes people suggest using ``np.asanyarray`` for this purpose, but
unfortunately its semantics are exactly backwards: it guarantees that
the object it returns uses the same memory layout as an ``ndarray``,
but tells you nothing at all about its semantics, which makes it
essentially impossible to use safely in practice. Indeed, the two
``ndarray`` subclasses distributed with NumPy – ``np.matrix`` and
``np.ma.masked_array`` – do have incompatible semantics, and if they
were passed to a function like ``myfunc`` that doesn't check for them
as a special-case, then it may silently return incorrect results.
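
A small illustration of the ``np.matrix`` case, assuming a hypothetical helper ``elementwise_square`` that is written with plain ``ndarray`` semantics in mind:

```python
import warnings

import numpy as np


def elementwise_square(a):
    # asanyarray lets ndarray subclasses such as np.matrix through unchanged.
    a = np.asanyarray(a)
    return a * a


data = [[1, 2], [3, 4]]

with warnings.catch_warnings():
    # Constructing np.matrix may emit a PendingDeprecationWarning.
    warnings.simplefilter("ignore")
    from_matrix = elementwise_square(np.matrix(data))

from_list = elementwise_square(data)

print(from_list.tolist())    # [[1, 4], [9, 16]]: elementwise product
print(from_matrix.tolist())  # [[7, 10], [15, 22]]: matrix product
```

No error is raised in either case; the function simply computes something different when handed a matrix.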


Declaring that an object can quack like an array


There are two basic approaches we could use for checking whether an
object quacks like an array. We could check for a special attribute on
the class::

  def quacks_like_array(obj):
      return bool(getattr(type(obj), "__quacks_like_array__", False))

Or, we could define an abstract base class (ABC)::

  def quacks_like_array(obj):
      return isinstance(obj, AbstractArray)

If you look at how ABCs work, this is essentially equivalent to
keeping a global set of types that have been declared to implement the
``AbstractArray`` interface, and then checking it for membership.
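
To make the "global set of types" picture concrete, here is a minimal standard-library sketch. The empty ``AbstractArray`` class and the two example types are placeholders, not part of any real API.

```python
import abc


class AbstractArray(abc.ABC):
    """Hypothetical marker ABC; the real interface is not defined here."""


# Option 1: declare conformance by inheriting from the ABC.
class InheritingArray(AbstractArray):
    pass


# Option 2: declare conformance after the fact, without inheritance.
class RegisteredArray:
    pass


AbstractArray.register(RegisteredArray)


def quacks_like_array(obj):
    return isinstance(obj, AbstractArray)


print(quacks_like_array(InheritingArray()))      # True
print(quacks_like_array(RegisteredArray()))      # True
print(quacks_like_array([1, 2, 3]))              # False
# Registration only records the type; it does not change the class's MRO.
print(AbstractArray in RegisteredArray.__mro__)  # False
```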

Between these, the ABC approach seems to have a number of advantages:

* It's Python's standard, "one obvious way" of doing this.

* ABCs can be introspected (e.g. ``help(np.AbstractArray)`` does
  something useful).

* ABCs can provide useful mixin methods.

* ABCs integrate with other features like mypy type-checking,
  ``functools.singledispatch``, etc.
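
As an example of the last point, ``functools.singledispatch`` can dispatch on an ABC, including types that were only registered with it. The names ``AbstractArray``, ``MyDuckArray`` and ``describe`` below are hypothetical.

```python
import abc
import functools


class AbstractArray(abc.ABC):
    """Hypothetical marker ABC standing in for np.AbstractArray."""


class MyDuckArray:
    """Hypothetical third-party type, registered rather than subclassed."""


AbstractArray.register(MyDuckArray)


@functools.singledispatch
def describe(obj):
    # Fallback for anything that has not declared itself a duck array.
    return "needs conversion"


@describe.register(AbstractArray)
def _(obj):
    return "duck array, use as-is"


print(describe(MyDuckArray()))  # duck array, use as-is
print(describe([1, 2, 3]))      # needs conversion
```

An attribute-based check would need a hand-written dispatcher to get the same effect.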

One obvious thing to check is whether this choice affects speed. Using
the attached benchmark script on a CPython 3.7 prerelease (revision
c4d77a661138d, self-compiled, no PGO), on a Thinkpad T450s running
Linux, we find::

  np.asarray(ndarray_obj)        330 ns
  np.asarray([])                1400 ns

  Attribute check, success        80 ns
  Attribute check, failure        80 ns

  ABC, success via subclass      340 ns
  ABC, success via register()    700 ns
  ABC, failure                   370 ns

Notes:

* The first two lines are included to put the other lines in context.

* This used 3.7 because both ``getattr`` and ABCs are receiving
  substantial optimizations in this release, and it's more
  representative of the long-term future of Python. (Failed
  ``getattr`` doesn't necessarily construct an exception object
  anymore, and ABCs were reimplemented in C.)

* The "success" lines refer to cases where ``quacks_like_array`` would
  return True. The "failure" lines are cases where it would return
  False.

* The first measurement for ABCs is subclasses defined like::

      class MyArray(AbstractArray):
          ...

  The second is for subclasses defined like::

      class MyArray:
          ...

      AbstractArray.register(

Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-08 Thread Marten van Kerkwijk
Hi Nathaniel,

Overall, hugely in favour!  For detailed comments, it would be good to
have a link to a PR; could you put that up?

A larger comment: you state that you think `np.asanyarray` is a
mistake since `np.matrix` and `np.ma.MaskedArray` would pass through
and that those do not strictly mimic `NDArray`. Here, I agree with
`matrix` (but since we're deprecating it, let's remove that from the
discussion), but I do not see how your proposed interface would not
let `MaskedArray` pass through, nor really that one would necessarily
want that.

I think it may be good to distinguish two separate cases:
1. Everything has exactly the same meaning as for `ndarray` but the
data is stored differently (i.e., only `view` does not work). One can
thus expect that for `output = function(inputs)`, at the end all
`duck_output == ndarray_output`.
2. Everything is implemented but operations may give different output
(depending on masks for masked arrays, units for quantities, etc.), so
generally `duck_output != ndarray_output`.

Which one of these are you aiming at? By including
`NDArrayOperatorsMixin`, it would seem option (2), but perhaps not? Is
there a case for both separately?

Smaller general comment: at least in the NEP I would not worry about
deprecating `NDArrayOperatorsMixin` - this may well be handy in itself
(for things that implement `__array_ufunc__` but do not have shape,
etc. (I have been doing some work on creating ufunc chains that would
use this -- but they definitely are not array-like). Similarly, I
think there is room for an `NDArrayShapeMixin` which might help with
`concatenate` and friends.

Finally, on the name: `asarray` and `asanyarray` are just shims over
`array`, so one option would be to add an argument in `array` (or
broaden the scope of `subok`).

As an explicit suggestion, one could introduce a `duck` or `abstract`
argument to `array` which is used in `asarray` and `asanyarray` as
well (corresponding to options 1 and 2), and eventually default to
something sensible (I would think `False` for `asarray` and `True` for
`asanyarray`).

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-08 Thread Stephan Hoyer
Hi Nathaniel,

Thanks for starting the discussion!

Like Marten says, I think it would be useful to more clearly define what it
means to be an abstract array. ndarray has lots of methods/properties that
expose internal implementation (e.g., view, strides) that presumably we
don't want to require as part of this interface. On the other hand, dtype
and shape are almost assuredly part of this interface.

To help guide the discussion, it would be good to identify concrete
examples of types that should and should not satisfy this interface, e.g.,
Marten's case 1: works exactly like ndarray, but stores data differently:
parallel arrays (e.g., dask.array), sparse arrays (e.g.,
https://github.com/pydata/sparse), hypothetical non-strided arrays (e.g.,
always C ordered).
Marten's case 2: same methods as ndarray, but gives different results:
np.ma.MaskedArray, arrays with units (quantities), maybe labeled arrays
like xarray.DataArray

I don't think we have a hope of making a single base class for case 2 work
with everything in NumPy, but we can define interfaces with different
levels of functionality.

Because there is such a gradation of "duck array" types, I agree with
Marten that we should not deprecate NDArrayOperatorsMixin. It's useful for
types like xarray.Dataset that define __array_ufunc__ but cannot satisfy
the full abstract array interface.

Finally, for the name, what about `asduckarray`? Though perhaps that could
be a source of confusion, given the gradation of duck-array-like types.

Cheers,
Stephan



Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-08 Thread Juan Nunez-Iglesias
On Fri, Mar 9, 2018, at 5:56 AM, Stephan Hoyer wrote:
> Marten's case 1: works exactly like ndarray, but stores data
> differently: parallel arrays (e.g., dask.array), sparse arrays (e.g.,
> https://github.com/pydata/sparse), hypothetical non-strided arrays
> (e.g., always C ordered).
Two other "hypotheticals" that would fit nicely in this space:
- the Open Connectome folks (https://neurodata.io) proposed linearising
  indices using space-filling curves, which minimizes cache misses (or
  IO reads) for giant volumes. I believe they implemented this but can't
  find it currently.
- the N5 format for chunked arrays on disk:
  https://github.com/saalfeldlab/n5
> Finally for the name, what about `asduckarray`? Thought perhaps that
> could be a source of confusion, and given the gradation of duck array
> like types.
I suggest that the name should *not* use programmer lingo, so neither
"abstract" nor "duck" should be in there. My humble proposal is
"arraylike". (I know that this term has included things like "list-of-
list" before but only in text, not code, as far as I know.)


Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-08 Thread Stephan Hoyer
On Thu, Mar 8, 2018 at 5:54 PM Juan Nunez-Iglesias 
wrote:

> On Fri, Mar 9, 2018, at 5:56 AM, Stephan Hoyer wrote:
>
> Marten's case 1: works exactly like ndarray, but stores data differently:
> parallel arrays (e.g., dask.array), sparse arrays (e.g.,
> https://github.com/pydata/sparse), hypothetical non-strided arrays (e.g.,
> always C ordered).
>
>
> Two other "hypotheticals" that would fit nicely in this space:
> - the Open Connectome folks (https://neurodata.io) proposed linearising
> indices using space-filling curves, which minimizes cache misses (or IO
> reads) for giant volumes. I believe they implemented this but can't find it
> currently.
> - the N5 format for chunked arrays on disk:
> https://github.com/saalfeldlab/n5
>

I think these fall into another important category of duck arrays:
"indexable" arrays that serve as storage, but that don't support
computation. These sorts of arrays typically support operations like
indexing and define a handful of array-like properties (e.g., dtype and
shape), but not arithmetic, reductions or reshaping.

This means you can't quite use them as a drop-in replacement for NumPy
arrays in all cases, but that's OK. In contrast, both dask.array and sparse
do aspire to fill out nearly the full numpy.ndarray API.


Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-08 Thread Hameer Abbasi
Not that I’m against different “levels” of ndarray granularity, but I just
don’t want it to introduce complexity for the end-user. For example, it
would be unreasonable to expect the end-user to check for all parts of the
interface that they need support for separately.

With that in mind, different levels only make sense if they are strict
sub/supersets of each other, so the user can just check for the highest
level of compatibility they require; but even then they would need to
learn about the different “levels”.

PS, thanks for putting this together! I was thinking of doing it this
weekend but you beat me to it and covered aspects I wouldn’t have thought
of.

The name “asarraylike” appeals to me, as does a “custom=” kwarg for
asanyarray.




Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-09 Thread Nathaniel Smith
On Thu, Mar 8, 2018 at 7:06 AM, Marten van Kerkwijk
 wrote:
> A larger comment: you state that you think `np.asanyarray` is a
> mistake since `np.matrix` and `np.ma.MaskedArray` would pass through
> and that those do not strictly mimic `NDArray`. Here, I agree with
> `matrix` (but since we're deprecating it, let's remove that from the
> discussion), but I do not see how your proposed interface would not
> let `MaskedArray` pass through, nor really that one would necessarily
> want that.

We can discuss whether MaskedArray should be an AbstractArray.
Conceptually it probably should be; I think that was a goal of the
MaskedArray authors (even if they wouldn't have put it that way). In
practice there are a lot of funny quirks in MaskedArray, so I'd want
to look more carefully in case there are weird incompatibilities that
would cause problems. Note that we can figure this out after the NEP
is finished, too.

I wonder if the matplotlib folks have any thoughts on this? I know
they're one of the more prominent libraries that tries to handle both
regular and masked arrays, so maybe they could comment on how often
they run

> I think it may be good to distinguish two separate cases:
> 1. Everything has exactly the same meaning as for `ndarray` but the
> data is stored differently (i.e., only `view` does not work). One can
> thus expect that for `output = function(inputs)`, at the end all
> `duck_output == ndarray_output`.
> 2. Everything is implemented but operations may give different output
> (depending on masks for masked arrays, units for quantities, etc.), so
> generally `duck_output != ndarray_output`.
>
> Which one of these are you aiming at? By including
> `NDArrayOperatorsMixin`, it would seem option (2), but perhaps not? Is
> there a case for both separately?

Well, (1) is much easier to design around, because it's well-defined
:-). And I'm not sure that there's a principled difference between
regular arrays and masked arrays/quantity arrays; these *could* be
ndarray objects with special dtypes and extra methods, neither of
which would disqualify you from being a "case 1" array.

(I guess one issue is that because MaskedArray ignores the mask by
default, you could get weird results from things like mean
calculations: np.sum(masked_arr) / np.prod(masked_arr.shape) does not
give the right result. This isn't an issue for quantities, though, or
for an R-style NA that propagated by default.)
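
A small worked case of that mean calculation, with made-up data in which the last element is masked:

```python
import numpy as np

masked_arr = np.ma.array(
    [1.0, 2.0, 3.0, 100.0],
    mask=[False, False, False, True],
)

# Generic "duck" code that derives a mean from the sum and the shape.
naive_mean = np.sum(masked_arr) / np.prod(masked_arr.shape)

# MaskedArray's own mean divides by the number of unmasked elements.
masked_mean = masked_arr.mean()

print(naive_mean)   # 1.5: the sum skips the masked 100.0, but the count is still 4
print(masked_mean)  # 2.0
```

The sum silently ignores the masked entry while the shape still counts it, so the two pieces of the generic formula disagree about which elements exist.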

> Smaller general comment: at least in the NEP I would not worry about
> deprecating `NDArrayOperatorsMixin` - this may well be handy in itself
> (for things that implement `__array_ufunc__` but do not have shape,
> etc. (I have been doing some work on creating ufunc chains that would
> use this -- but they definitely are not array-like). Similarly, I
> think there is room for an `NDArrayShapeMixin` which might help with
> `concatenate` and friends.

Fair enough.

> Finally, on the name: `asarray` and `asanyarray` are just shims over
> `array`, so one option would be to add an argument in `array` (or
> broaden the scope of `subok`).

We definitely don't want to broaden the scope of 'subok', because one
of the goals here is to have something that projects like sklearn can
use, and they won't use subok :-). (In particular, np.matrix is
definitely not a duck array of any kind.)

And supporting array() is tricky, because then you have to figure out
what to do with the copy=, order=, subok=, ndmin= arguments. copy= in
particular is tricky given that we don't know the object's type! I
guess we could call obj.copy() or something... but for this first
iteration it seemed simplest to make a new function that just has the
most important stuff for writing generic functions that accept duck
arrays.

What we could do is, in addition to adding some kind of
asabstractarray() function, *also* make it so asanyarray() starts
accepting abstract/duck arrays, on the theory that anyone who's
willing to put up with asanyarray()'s weak guarantees won't notice if
we weaken them a bit more. Honestly though I'd rather just not touch
asanyarray at all, and maybe even deprecate it someday.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-09 Thread Sebastian Berg
On Thu, 2018-03-08 at 18:56 +, Stephan Hoyer wrote:
> Hi Nathaniel,
> 
> Thanks for starting the discussion!
> 
> Like Marten says, I think it would be useful to more clearly define
> what it means to be an abstract array. ndarray has lots of
> methods/properties that expose internal implementation (e.g., view,
> strides) that presumably we don't want to require as part of this
> interfaces. On the other hand, dtype and shape are almost assuredly
> part of this interface.
> 
> To help guide the discussion, it would be good to identify concrete
> examples of types that should and should not satisfy this interface,
> e.g.,
> Marten's case 1: works exactly like ndarray, but stores data
> differently: parallel arrays (e.g., dask.array), sparse arrays (e.g.,
> https://github.com/pydata/sparse), hypothetical non-strided arrays
> (e.g., always C ordered).
> Marten's case 2: same methods as ndarray, but gives different
> results: np.ma.MaskedArray, arrays with units (quantities), maybe
> labeled arrays like xarray.DataArray
> 
> I don't think we have a hope of making a single base class for case 2
> work with everything in NumPy, but we can define interfaces with
> different levels of functionality.


True, but I guess the aim is not to care at all about how things are
implemented (so only 2)? I agree that we can aim to be as close as
possible, but should not expect to reach it.
My personal opinion:

1. To do this, we should start it "experimentally"

2. We need something like a reference implementation. First, because it
allows testing whether a function e.g. in numpy is actually abstract-
safe and second because it will be the only way to find out what our
minimal abstract interface actually is (assuming we have started 3).

3. Go ahead with putting it into numpy functions and see how much you
need to make them work. In the end, my guess is, everything that works
for MaskedArrays and xarray is a pretty safe bet.

I disagree with the statement that we do not need to define the minimal
reference. In practice we do as soon as we use it for numpy functions.

- Sebastian



Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-09 Thread Ryan May
On Fri, Mar 9, 2018 at 2:29 AM, Nathaniel Smith  wrote:

> On Thu, Mar 8, 2018 at 7:06 AM, Marten van Kerkwijk
>  wrote:
> > A larger comment: you state that you think `np.asanyarray` is a
> > mistake since `np.matrix` and `np.ma.MaskedArray` would pass through
> > and that those do not strictly mimic `NDArray`. Here, I agree with
> > `matrix` (but since we're deprecating it, let's remove that from the
> > discussion), but I do not see how your proposed interface would not
> > let `MaskedArray` pass through, nor really that one would necessarily
> > want that.
>
> We can discuss whether MaskedArray should be an AbstractArray.
> Conceptually it probably should be; I think that was a goal of the
> MaskedArray authors (even if they wouldn't have put it that way). In
> practice there are a lot of funny quirks in MaskedArray, so I'd want
> to look more carefully in case there are weird incompatibilities that
> would cause problems. Note that we can figure this out after the NEP
> is finished, too.
>
> I wonder if the matplotlib folks have any thoughts on this? I know
> they're one of the more prominent libraries that tries to handle both
> regular and masked arrays, so maybe they could comment on how often
> they run


There's a lot of places in matplotlib where this could simplify our checks,
though probably more from a standpoint of "does this thing we've been given
need converting?"

There are also a lot of places where matplotlib needs to know if we have
actually been given a MaskedArray so that we can handle it specially.

Ryan

-- 
Ryan May


Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-09 Thread Ryan May
On Fri, Mar 9, 2018 at 12:21 AM, Hameer Abbasi 
wrote:

> Not that I’m against different “levels” of ndarray granularity, but I just
> don’t want it to introduce complexity for the end-user. For example, it
> would be unreasonable to expect the end-user to check for all parts of the
> interface that they need support for separately.
>

I wouldn't necessarily want all of the granularity exposed in something
like "asarraylike"--that should be kept really simple. But I think there's
value in numpy providing multiple ABCs for portions of the interface (and
one big one that combines them all). That way, people who want the
finer-grained checking (say for a more limited array-like) can use a
common, shared, existing ABC, rather than having everyone re-invent it.
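
One way this could look, sketched with the standard library only. All class names here (``HasShape``, ``Indexable``, ``Arithmetic``, ``FullArray`` and the two example types) are invented for illustration and are not proposed NumPy API.

```python
import abc


class HasShape(abc.ABC):
    """Hypothetical partial interface: provides dtype and shape."""


class Indexable(abc.ABC):
    """Hypothetical partial interface: supports indexing."""


class Arithmetic(abc.ABC):
    """Hypothetical partial interface: supports arithmetic and reductions."""


class FullArray(HasShape, Indexable, Arithmetic):
    """The one big ABC that combines all of the pieces."""


class ChunkedStore:
    """Stand-in for an on-disk storage array: indexable, no computation."""


class LazyArray:
    """Stand-in for a full-featured duck array."""


HasShape.register(ChunkedStore)
Indexable.register(ChunkedStore)
FullArray.register(LazyArray)

# Registering with the combined ABC implies every partial interface.
print(isinstance(LazyArray(), Indexable))      # True
print(isinstance(LazyArray(), Arithmetic))     # True
# The storage type only satisfies the pieces it registered for.
print(isinstance(ChunkedStore(), Indexable))   # True
print(isinstance(ChunkedStore(), FullArray))   # False
```

A simple ``asarraylike``-style function would only ever check ``FullArray``; the finer-grained ABCs are there for code that can get by with less.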

Ryan

-- 
Ryan May


Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-09 Thread Marten van Kerkwijk
We may be getting a bit distracted by the naming -- though I'll throw
out `asarraymimic` as another non-programmer-lingo option that doesn't
reuse `arraylike` and might describe what the duck array is attempting
to do more closely.

But more to the point: I think in essence, we're trying to create a
function that does the equivalent of:
```
def ...(arraylike, ...):
    if isinstance(arraylike, NDAbstractArray):
        return arraylike
    else:
        return np.array(arraylike, ...)
```

Given that one possibly might want to check for partial compatibility,
maybe the new or old function should just expose what compatibility is
desired, via something like:
```
input = np.as...(input, ..., mimicok='shape|operator|...')
```
Where one could have `mimicok=True` to indicate the highest level
(maybe not including being viewable?), `False` to not allow any
mimics.

This might even work for np.array itself:
- dtype - any mimic must provide `astype` (which can error if not
possible; this could be the ABC default)
- copy - can't one just use `copy.copy`? I think this defaults to `__copy__`.
- order - can be passed to `astype` as well; up to code to error if
not possible.
- subok - meaningless
- ndmin - requirement of mimicok='shape' would be to provide a shape
attribute and reshape method.
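
A very rough sketch of how a `mimicok` argument might behave. The function name `as_mimic`, the two level ABCs, and the `ShapeOnly` example type are placeholders invented here; only the `'shape|operator'` string form and the `True`/`False` values come from the suggestion above.

```python
import abc

import numpy as np


class ShapeMimic(abc.ABC):
    """Hypothetical level: provides a shape attribute and reshape method."""


class OperatorMimic(abc.ABC):
    """Hypothetical level: provides arithmetic operators."""


MIMIC_LEVELS = {"shape": ShapeMimic, "operator": OperatorMimic}


def as_mimic(obj, mimicok=False):
    """Return obj unchanged if it declares every requested level."""
    if mimicok is False:
        # No mimics allowed: always coerce to a real ndarray.
        return np.asarray(obj)
    names = list(MIMIC_LEVELS) if mimicok is True else mimicok.split("|")
    required = [MIMIC_LEVELS[name] for name in names]
    if all(isinstance(obj, level) for level in required):
        return obj
    return np.asarray(obj)


class ShapeOnly:
    """Example mimic that only declares the 'shape' level."""

    def __init__(self, shape):
        self.shape = shape

    def reshape(self, shape):
        return ShapeOnly(shape)

    def __array__(self, dtype=None, copy=None):
        # Conversion hook used when the mimic is not acceptable as-is.
        return np.zeros(self.shape, dtype=dtype)


ShapeMimic.register(ShapeOnly)

s = ShapeOnly((2, 3))
kept = as_mimic(s, mimicok="shape")                # passed through
converted = as_mimic(s, mimicok="shape|operator")  # coerced to ndarray
print(kept is s, type(converted).__name__)
```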

-- Marten


Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-09 Thread Nathaniel Smith
On Thu, Mar 8, 2018 at 9:45 PM, Stephan Hoyer  wrote:
> On Thu, Mar 8, 2018 at 5:54 PM Juan Nunez-Iglesias 
> wrote:
>>
>> On Fri, Mar 9, 2018, at 5:56 AM, Stephan Hoyer wrote:
>>
>> Marten's case 1: works exactly like ndarray, but stores data differently:
>> parallel arrays (e.g., dask.array), sparse arrays (e.g.,
>> https://github.com/pydata/sparse), hypothetical non-strided arrays (e.g.,
>> always C ordered).
>>
>>
>> Two other "hypotheticals" that would fit nicely in this space:
>> - the Open Connectome folks (https://neurodata.io) proposed linearising
>> indices using space-filling curves, which minimizes cache misses (or IO
>> reads) for giant volumes. I believe they implemented this but can't find it
>> currently.
>> - the N5 format for chunked arrays on disk:
>> https://github.com/saalfeldlab/n5
>
>
> I think these fall into another important category of duck arrays.
> "Indexable" arrays the serve as storage, but that don't support computation.
> These sorts of arrays typically support operations like indexing and define
> handful of array-like properties (e.g., dtype and shape), but not
> arithmetic, reductions or reshaping.
>
> This means you can't quite use them as a drop-in replacement for NumPy
> arrays in all cases, but that's OK. In contrast, both dask.array and sparse
> do aspire to do fill out nearly the full numpy.ndarray API.

I'm not sure if these particular formats fall into that category or
not (isn't the point of the space-filling curves to support
cache-efficient computation?). But I suppose you're also thinking of
things like h5py.Dataset? My impression is that these are mostly
handled pretty well already by defining __array__ and/or providing
array operations that implicitly convert to ndarray -- do you agree?

This does raise an interesting point: maybe we'll eventually want an
__abstract_array__ method that asabstractarray tries calling if
defined, so e.g. if your object isn't itself an array but can be
efficiently converted into a *sparse* array, you have a way to declare
that? I think this is something to file under "worry about later,
after we have the basic infrastructure", but it's not something I'd
thought of before so mentioning here.
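
To make the idea concrete, here is a rough sketch of how such a hook might be consulted. Nothing here exists in NumPy: `AbstractArray`, `asabstractarray`, and `__abstract_array__` are the hypothetical names from this thread, and `ToySparse` and `Triplets` are made-up classes for illustration only.

```python
import numpy as np


class AbstractArray:
    """Stand-in for the proposed np.AbstractArray marker class (hypothetical)."""


def asabstractarray(obj):
    """Sketch: pass duck arrays through, then try the hook, else coerce."""
    if isinstance(obj, (np.ndarray, AbstractArray)):
        return obj
    # Look the hook up on the type, as Python does for special methods.
    hook = getattr(type(obj), "__abstract_array__", None)
    if hook is not None:
        result = hook(obj)
        if isinstance(result, (np.ndarray, AbstractArray)):
            return result
        raise TypeError("__abstract_array__ must return a duck array")
    return np.asarray(obj)


class ToySparse(AbstractArray):
    """Toy duck array holding only the nonzero entries of a 1-D vector."""

    def __init__(self, length, entries):
        self.shape = (length,)
        self.entries = dict(entries)


class Triplets:
    """Not an array itself, but cheaply convertible to a ToySparse."""

    def __init__(self, length, entries):
        self.length = length
        self.entries = entries

    def __abstract_array__(self):
        return ToySparse(self.length, self.entries)
```

With this, `asabstractarray(Triplets(5, {1: 2.0}))` would come back as a `ToySparse` rather than being densified, while plain lists would still go through `np.asarray`.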

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-09 Thread Nathaniel Smith
On Thu, Mar 8, 2018 at 5:51 PM, Juan Nunez-Iglesias  wrote:
>> Finally for the name, what about `asduckarray`? Though perhaps that could
>> be a source of confusion, given the gradation of duck-array-like types.
>
> I suggest that the name should *not* use programmer lingo, so neither
> "abstract" nor "duck" should be in there. My humble proposal is "arraylike".
> (I know that this term has included things like "list-of-list" before but
> only in text, not code, as far as I know.)

I agree with your point about avoiding programmer lingo. My first
draft actually used 'asduckarray', but that's like an in-joke; it
works fine for us, but it's not really something I want teachers to
have to explain on day 1...

Array-like is problematic too though, because we still need a way to
say "thing that can be coerced to an array", which is what array-like
has been used to mean historically. And with the new type hints stuff,
it is actually becoming code. E.g. what should the type hints here be:

asabstractarray(a: X) -> Y

Right now "X" is "ArrayLike", but if we make "Y" be "ArrayLike" then
we'll need to come up with some other name for "X" :-).

Maybe we can call duck arrays "py arrays", since the idea is that they
implement the standard Python array API (but not necessarily the
C-level array API)? np.PyArray, np.aspyarray()?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-10 Thread Matthew Rocklin
I'm very glad to see this discussion.

I think that coming up with a single definition of array-like may be
difficult, and that we might end up wanting to embrace duck typing instead.

It seems to me that different array-like classes will implement different
mixtures of features.  It may be difficult to pin down a single definition
that includes anything except for the most basic attributes (shape and
dtype?).  Consider two extreme cases of restrictive functionality:

   1. LinearOperators (support dot in a numpy-like way)
   2. Storage objects like h5py (support getitem in a numpy-like way)

I can imagine authors of both groups saying that they should qualify as
array-like because downstream projects that consume them should not convert
them to numpy arrays in important contexts.

The name "duck arrays" that we sometimes use doesn't necessarily mean
"quack like an ndarray" but might actually mean a number of different
things in different contexts.  Making a single class or predicate for duck
arrays may not be as effective as we want.  Instead, it might be that we
need a number of different protocols like `__array_mat_vec__` or
`__array_slice__`
that downstream projects can check instead.  I can imagine cases where I
want to check only "can I use this thing to multiply against arrays" or
"can I get numpy arrays out of this thing with numpy slicing" rather than
"is this thing array-like" because I may genuinely not care about most of
the functionality in a blessed definition of "array-like".
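
As a rough illustration of what checking per-feature protocols could look like (a sketch only: `__array_mat_vec__` and `__array_slice__` are the hypothetical names from above, their signatures are my assumption, and the toy classes are invented for the example):

```python
def supports(obj, protocol):
    """Return True if type(obj) defines the named protocol method."""
    return callable(getattr(type(obj), protocol, None))


class ToyLinearOperator:
    """Scales a vector; supports matrix-vector products but not slicing."""

    def __init__(self, scale):
        self.scale = scale

    def __array_mat_vec__(self, vec):
        return [self.scale * v for v in vec]


class ToyStorage:
    """Wraps a list; supports slicing but not arithmetic."""

    def __init__(self, data):
        self.data = list(data)

    def __array_slice__(self, key):
        return self.data[key]


def apply_operator(op, vec):
    """Multiply only if the object advertises the mat-vec protocol."""
    if not supports(op, "__array_mat_vec__"):
        raise TypeError(f"{type(op).__name__} cannot multiply against arrays")
    return op.__array_mat_vec__(vec)
```

A consumer then asks only the question it cares about, e.g. `supports(x, "__array_slice__")`, without needing a verdict on whether `x` is "array-like" overall.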



Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-10 Thread Chris Barker
On Sat, Mar 10, 2018 at 1:27 PM, Matthew Rocklin  wrote:

> I'm very glad to see this discussion.
>

me too, but


> I think that coming up with a single definition of array-like may be
> difficult, and that we might end up wanting to embrace duck typing instead.
>

exactly -- I think there is a clear line between "uses the numpy memory
layout" and the Python API. But the Python API is pretty darn big, and many
"array_ish" classes implement only part of it, and may even implement some
parts a bit differently. So it's really hard to have "one" definition, except
"Python API exactly like an ndarray" -- and I'm wondering how useful that is.

> It seems to me that different array-like classes will implement different
> mixtures of features.  It may be difficult to pin down a single definition
> that includes anything except for the most basic attributes (shape and
> dtype?).
>

or a minimum set -- but again, how useful??


> Storage objects like h5py (support getitem in a numpy-like way)
>

Exactly -- I don't know about h5py, but netCDF4 variables support a
useful subset of ndarray, but do "fancy indexing" differently -- so are they
ndarray_ish? -- sorry to coin yet another term :-)


> I can imagine authors of both groups saying that they should qualify as
> array-like because downstream projects that consume them should not convert
> them to numpy arrays in important contexts.
>

indeed. My solution so far is to define my own duck types "asarraylike"
that checks for the actual methods I need:

https://github.com/NOAA-ORR-ERD/gridded/blob/master/gridded/utilities.py

which has:

    import numpy as np

    # Attributes an object must have to be treated as array-like here.
    must_have = ['dtype', 'shape', 'ndim', '__len__', '__getitem__',
                 '__getattribute__']

    def isarraylike(obj):
        """
        tests if obj acts enough like an array to be used in gridded.
        This should catch netCDF4 variables and numpy arrays, at least, etc.
        Note: these won't check if the attributes required actually work right.
        """
        for attr in must_have:
            if not hasattr(obj, attr):
                return False
        return True

    def asarraylike(obj):
        """
        If it satisfies the requirements of pyugrid the object is returned as is.
        If not, then numpy's array() will be called on it.
        :param obj: The object to check if it's like an array
        """
        return obj if isarraylike(obj) else np.array(obj)

It's possible that we could come up with semi-standard "groupings" of
attributes to produce "levels" of compatibility, or maybe not levels, but
independent groupings, so you could specify which groupings you need in this
instance.
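
Something like the following is what I have in mind for independent groupings. This is only a sketch: the grouping names and the attributes listed in each are made up for illustration, not a proposed standard.

```python
# Hypothetical groupings; the names and their contents are illustrative only.
GROUPINGS = {
    "metadata": ("dtype", "shape", "ndim"),
    "indexing": ("__len__", "__getitem__"),
    "arithmetic": ("__add__", "__mul__"),
}


def has_groupings(obj, *names):
    """True if obj has every attribute in each requested grouping."""
    return all(hasattr(obj, attr) for name in names for attr in GROUPINGS[name])


class ToyVariable:
    """Storage-like object: metadata and indexing, but no arithmetic."""

    dtype = "float64"

    def __init__(self, data):
        self.data = list(data)
        self.shape = (len(self.data),)
        self.ndim = 1

    def __len__(self):
        return len(self.data)

    def __getitem__(self, key):
        return self.data[key]
```

A caller that only reads data would ask for `has_groupings(obj, "metadata", "indexing")` and never care whether arithmetic is there. As with `isarraylike` above, this only checks that the attributes exist, not that they behave like numpy's.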


> The name "duck arrays" that we sometimes use doesn't necessarily mean
> "quack like an ndarray" but might actually mean a number of different
> things in different contexts.  Making a single class or predicate for duck
> arrays may not be as effective as we want.  Instead, it might be that we
> need a number of different protocols like `__array_mat_vec__` or 
> `__array_slice__`
> that downstream projects can check instead.  I can imagine cases where I
> want to check only "can I use this thing to multiply against arrays" or
> "can I get numpy arrays out of this thing with numpy slicing" rather than
> "is this thing array-like" because I may genuinely not care about most of
> the functionality in a blessed definition of "array-like".
>

exactly.

but maybe we won't know until we try.

-CHB



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-10 Thread Marten van Kerkwijk
I think we don't have to make it sound like there are *that* many types
of compatibility: really there is just array organisation
(indexing/reshaping) and array arithmetic. These correspond roughly to
ShapedLikeNDArray in astropy and NDArrayOperatorsMixin in numpy (missing so
far is concatenation). The advantage of the ABC classes is that they can
supply missing methods (say, size, isscalar, __len__, and ndim given shape;
__iter__ given __getitem__, ravel, squeeze, flatten given reshape; etc.).
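
A minimal sketch of that last point, assuming a hypothetical ABC (`ShapedMixin` is an invented name, not astropy's or numpy's class) whose subclasses define only `shape` and `__getitem__`:

```python
import math
from abc import ABC, abstractmethod


class ShapedMixin(ABC):
    """Hypothetical ABC: subclasses supply shape and __getitem__ only."""

    @property
    @abstractmethod
    def shape(self):
        ...

    @abstractmethod
    def __getitem__(self, key):
        ...

    # Everything below is derived from the two abstract members.
    @property
    def ndim(self):
        return len(self.shape)

    @property
    def size(self):
        return math.prod(self.shape)

    def __len__(self):
        if not self.shape:
            raise TypeError("len() of unsized object")
        return self.shape[0]

    def __iter__(self):
        return (self[i] for i in range(len(self)))


class Rows(ShapedMixin):
    """Toy 2-D container backed by a list of equal-length lists."""

    def __init__(self, rows):
        self._rows = rows

    @property
    def shape(self):
        return (len(self._rows), len(self._rows[0]))

    def __getitem__(self, key):
        return self._rows[key]
```

`Rows` never mentions `ndim`, `size`, `__len__` or `__iter__`, yet gets all four, which is the practical advantage over a bare `hasattr` check.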

-- Marten


Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-17 Thread Thomas Caswell
It would be nice if there was an IntEnum [1] that was taken as an input to
`np.asarrayish` and `np.isarrayish` to require a combination of the groups
of attributes/methods/semantics.

Tom

[1] https://docs.python.org/3/library/enum.html#intenum



Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-17 Thread Hameer Abbasi
> It would be nice if there was an IntEnum [1] that was taken as an input to
> `np.asarrayish` and `np.isarrayish` to require a combination of the groups
> of attributes/methods/semantics.

Don’t you mean IntFlag?

I like Marten’s idea of “grouping together” related functionality via ABCs
and implementing different parts via ABCs (for example, in pydata/sparse we
use NDArrayOperatorsMixin for exactly this), but I believe that separate
ABCs should be provided for different parts of the interface.

Then we can either:

   1. Check with isinstance for the ABCs, or
   2. Check with hasattr.

I like the IntFlag idea most (it seems to be designed for use-cases like
these), but a string-based (np.aspyarray(x,
functionality='arithmetic|reductions')) or list-based (np.aspyarray(x,
functionality=['arithmetic', 'reductions'])) interface is also fine.

It might help to have some sort of a “dry-run” interface that (given a run
of code) figures out which parts you need.
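
For what it's worth, a rough sketch of the IntFlag variant. `enum.IntFlag` is real (Python 3.6+); the flag names, the attribute lists behind them, and `isarrayish` itself are assumptions made up for this example.

```python
from enum import IntFlag


class ArrayFeature(IntFlag):
    """Hypothetical capability flags; names and groupings are illustrative."""

    METADATA = 1
    INDEXING = 2
    ARITHMETIC = 4
    REDUCTIONS = 8


# Attributes standing in for each capability (an assumption of this sketch).
_REQUIRED = {
    ArrayFeature.METADATA: ("shape", "dtype"),
    ArrayFeature.INDEXING: ("__getitem__",),
    ArrayFeature.ARITHMETIC: ("__add__", "__mul__"),
    ArrayFeature.REDUCTIONS: ("sum", "mean"),
}


def isarrayish(obj, features):
    """True if obj has the attributes for every flag set in features."""
    return all(
        hasattr(obj, attr)
        for flag, attrs in _REQUIRED.items()
        if flag & features
        for attr in attrs
    )
```

Flags combine with `|`, so a caller would write `isarrayish(x, ArrayFeature.ARITHMETIC | ArrayFeature.REDUCTIONS)`.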


Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-17 Thread Thomas Caswell
Yes, meant IntFlag :sheep:



Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-17 Thread Eric Wieser
I would have thought that a simple tuple of types would be more appropriate
than using integer flags, since that means that isinstance can be used on
the individual elements. Ideally there’d be a typing.Intersection[TraitA,
TraitB] for this kind of thing.
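
A small sketch of the tuple-of-types idea, using made-up trait names (`Indexable` and `Arithmetic` are hypothetical; `ABC.register` is the standard `abc` mechanism):

```python
from abc import ABC


class Indexable(ABC):
    """Hypothetical trait: supports numpy-style __getitem__."""


class Arithmetic(ABC):
    """Hypothetical trait: supports elementwise arithmetic."""


def has_traits(obj, traits):
    """True only if obj is an instance of every trait in the tuple."""
    return all(isinstance(obj, trait) for trait in traits)


class ToyStorage:
    def __getitem__(self, key):
        return key


# Third-party classes can opt in without inheriting from the trait.
Indexable.register(ToyStorage)
```

Note that `isinstance(obj, (A, B))` on its own means "A or B", which is why the intersection needs the explicit `all(...)` over the individual elements.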



Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-18 Thread Marten van Kerkwijk
Yes, a tuple of types would make more sense, given `isinstance` --
string abbreviations for those could be there for convenience.
-- Marten




Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-22 Thread Nathaniel Smith
On Sat, Mar 10, 2018 at 4:27 AM, Matthew Rocklin  wrote:
> I'm very glad to see this discussion.
>
> I think that coming up with a single definition of array-like may be
> difficult, and that we might end up wanting to embrace duck typing instead.
>
> It seems to me that different array-like classes will implement different
> mixtures of features.  It may be difficult to pin down a single definition
> that includes anything except for the most basic attributes (shape and
> dtype?).  Consider two extreme cases of restrictive functionality:
>
> LinearOperators (support dot in a numpy-like way)
> Storage objects like h5py (support getitem in a numpy-like way)
>
> I can imagine authors of both groups saying that they should qualify as
> array-like because downstream projects that consume them should not convert
> them to numpy arrays in important contexts.

I think this is an important point -- there are a lot of subtleties in
the interfaces that different objects might want to provide. Some
interesting ones that haven't been mentioned:

- a "duck array" that has everything except fancy indexing
- xarray's arrays are just like numpy arrays in most ways, but they
have incompatible broadcasting semantics
- immutable vs. mutable arrays

When faced with this kind of situation, it's always tempting to try to
write down some classification system to capture every possible
configuration of interesting behavior. In fact, this is one of the
most classic nerd snipes; it's been catching people for literally
thousands of years [1]. Most of these attempts fail though :-).

So let's back up -- I probably erred in not making this more clear in
the NEP, but I actually have a fairly concrete use case in mind here.
What happened is, I started working on a NEP for
__array_concatenate__, and my thought pattern went as follows:

1) Cool, this should work for np.concatenate.
2) But what about all the other variants, like np.row_stack. We don't
want __array_row_stack__; we want to express row_stack in terms of
concatenate.
3) Ok, what's row_stack? It's:
  np.concatenate([np.atleast_2d(arr) for arr in arrs], axis=0)
4) So I need to make atleast_2d work on duck arrays. What's
atleast_2d? It's: asarray + some shape checks and indexing with
newaxis
5) Okay, so I need something atleast_2d can call instead of asarray [2].
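
Steps 3-5 can be sketched roughly as follows. This is a simplified, single-argument `atleast_2d`, not numpy's actual implementation, and `AbstractArray` / `asabstractarray` are stand-ins for the proposed (hypothetical) names:

```python
import numpy as np


class AbstractArray:
    """Stand-in for the proposed marker class (hypothetical)."""


def asabstractarray(a):
    """Pass ndarrays and declared duck arrays through; coerce the rest."""
    if isinstance(a, (np.ndarray, AbstractArray)):
        return a
    return np.asarray(a)


def atleast_2d(a):
    """Simplified single-argument atleast_2d that never forces an ndarray."""
    a = asabstractarray(a)
    if a.ndim == 0:
        return a.reshape(1, 1)
    if a.ndim == 1:
        return a[np.newaxis, :]
    return a


def row_stack(arrs):
    """row_stack expressed via concatenate, as in step 3."""
    return np.concatenate([atleast_2d(a) for a in arrs], axis=0)
```

The only change from the asarray-based version is the first line of `atleast_2d`; everything after it uses only `ndim`, `reshape` and indexing. `np.concatenate` would still need its own override before `row_stack` works end to end on duck arrays.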

And this kind of pattern shows up everywhere inside numpy, e.g. it's
the first thing inside lots of functions in np.linalg b/c they do some
futzing with dtypes and shape before delegating to ufuncs, it's the
first thing the mean() function does b/c it needs to check arr.dtype
before proceeding, etc. etc.

So, we need something we can use in these functions as a first step
towards unlocking the use of duck arrays in general. But we can't
realistically go through each of these functions, make an exact list
of all the operations/attributes it cares about, and then come up with
exactly the right type constraint for it to impose at the top. And
these functions aren't generally going to work on LinearOperators or
h5py datasets anyway.

We also don't want to go through every function in numpy and add new
arguments to control this coercion behavior.

What we can do, at least to start, is to have a mechanism that passes
through objects that aspire to be "complete" duck arrays, like dask
arrays or sparse arrays or astropy's unit arrays, and then if it turns
out that in practice people find uses for finer-grained distinctions,
we can iteratively add those as a second pass. Notice that if a
function starts out requiring a "complete" duck array, and then later
relaxes that to accept "partial" duck arrays, that's actually
increasing the domain of objects that it can act on, so it's a
backwards-compatible change that we can do later.

So I think we should start out with a concept of "duck array" that's
fairly strong but a bit vague on the exact details (e.g.,
dask.array.Array is currently missing some weird things like arr.ptp()
and arr.tolist(), I guess because no-one has ever noticed or cared?).



Thinking things through like this, I also realized that this proposal
jumps through hoops to avoid changing np.asarray itself, because I was
nervous about changing the rule that its output is always an
ndarray... but actually, this is currently the rule for most functions
in numpy, and the whole point of this proposal is to relax that rule
for most functions, in cases where the user is explicitly passing in a
duck-array object. So maybe I'm being overparanoid? I'm genuinely
unsure here.

Instead of messing about with ABCs, an alternative mechanism would be
to add a new method __arrayish__ (hat tip to Tom Caswell for the name
:-)), that essentially acts as an override for Python-level calls to
np.array / np.asarray, in much the same way that __array_ufunc__
overrides ufuncs, etc. (C level calls to PyArray_FromAny and similar
would of course continue to return ndarray objects, and I assume we'd
add some argument like require_nda

Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-22 Thread Hameer Abbasi
I think that with your comments in mind, it may just be best to embrace
duck typing, like Matthew suggested. I propose the following workflow:

   - __array_concatenate__ and similar "protocol" functions return
   NotImplemented if they won't work.
   - "Base functions" that can be called directly like __getitem__ raise
   NotImplementedError if they won't work.
   - __arrayish__ = True

Then, something like np.concatenate would do the following:

   - Call __array_concatenate__ following the same order as ufunc arguments.
   - If everything fails, raise NotImplementedError (or convert everything
   to ndarray).

Overloaded functions would do something like this (perhaps a simple
decorator will do for the repetitive work?):

   - Try with np.arrayish
   - Catch NotImplementedError
      - Try with np.array

Then, we use abstract classes just to overload functionality or implement
things in terms of others. If something fails, we have a decent fallback.
We don't need to do anything special in order to "check" functionality.

Feel free to propose changes, but this is the best I could come up with
that would require the smallest incremental changes to Numpy while also
supporting everything right from the start.
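
A rough sketch of the decorator I mean. `np.arrayish` does not exist, so `arrayish` below is a stand-in I made up that honours `__arrayish__ = True`; treating every positional argument as an array is also a simplification.

```python
import functools

import numpy as np


def arrayish(obj):
    """Stand-in for the hypothetical np.arrayish: honour __arrayish__ = True."""
    if isinstance(obj, np.ndarray) or getattr(obj, "__arrayish__", False):
        return obj
    return np.asarray(obj)


def with_ndarray_fallback(func):
    """Try func on duck arrays first; on NotImplementedError retry on ndarrays."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*(arrayish(a) for a in args), **kwargs)
        except NotImplementedError:
            # The duck array could not do it: convert everything to ndarray.
            return func(*(np.array(a) for a in args), **kwargs)

    return wrapper


@with_ndarray_fallback
def first_row(a):
    return a[0]
```

A duck array whose `__getitem__` raises NotImplementedError then falls back to whatever its `__array__` produces, and no up-front capability check is needed.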

On Thu, Mar 22, 2018 at 9:14 AM, Nathaniel Smith  wrote:

> On Sat, Mar 10, 2018 at 4:27 AM, Matthew Rocklin 
> wrote:
> > I'm very glad to see this discussion.
> >
> > I think that coming up with a single definition of array-like may be
> > difficult, and that we might end up wanting to embrace duck typing
> instead.
> >
> > It seems to me that different array-like classes will implement different
> > mixtures of features.  It may be difficult to pin down a single
> definition
> > that includes anything except for the most basic attributes (shape and
> > dtype?).  Consider two extreme cases of restrictive functionality:
> >
> > LinearOperators (support dot in a numpy-like way)
> > Storage objects like h5py (support getitem in a numpy-like way)
> >
> > I can imagine authors of both groups saying that they should qualify as
> > array-like because downstream projects that consume them should not
> convert
> > them to numpy arrays in important contexts.
>
> I think this is an important point -- there are a lot of subtleties in
> the interfaces that different objects might want to provide. Some
> interesting ones that haven't been mentioned:
>
> - a "duck array" that has everything except fancy indexing
> - xarray's arrays are just like numpy arrays in most ways, but they
> have incompatible broadcasting semantics
> - immutable vs. mutable arrays
>
> When faced with this kind of situation, it's always tempting to try to
> write down some classification system to capture every possible
> configuration of interesting behavior. In fact, this is one of the
> most classic nerd snipes; it's been catching people for literally
> thousands of years [1]. Most of these attempts fail though :-).
>
> So let's back up -- I probably erred in not making this more clear in
> the NEP, but I actually have a fairly concrete use case in mind here.
> What happened is, I started working on a NEP for
> __array_concatenate__, and my thought pattern went as follows:
>
> 1) Cool, this should work for np.concatenate.
> 2) But what about all the other variants, like np.row_stack. We don't
> want __array_row_stack__; we want to express row_stack in terms of
> concatenate.
> 3) Ok, what's row_stack? It's:
>   np.concatenate([np.atleast_2d(arr) for arr in arrs], axis=0)
> 4) So I need to make atleast_2d work on duck arrays. What's
> atleast_2d? It's: asarray + some shape checks and indexing with
> newaxis
> 5) Okay, so I need something atleast_2d can call instead of asarray [2].
>
> And this kind of pattern shows up everywhere inside numpy, e.g. it's
> the first thing inside lots of functions in np.linalg b/c they do some
> futzing with dtypes and shape before delegating to ufuncs, it's the
> first thing the mean() function does b/c it needs to check arr.dtype
> before proceeding, etc. etc.
>
> So, we need something we can use in these functions as a first step
> towards unlocking the use of duck arrays in general. But we can't
> realistically go through each of these functions, make an exact list
> of all the operations/attributes it cares about, and then come up with
> exactly the right type constraint for it to impose at the top. And
> these functions aren't generally going to work on LinearOperators or
> h5py datasets anyway.
>
> We also don't want to go through every function in numpy and add new
> arguments to control this coercion behavior.
>
> What we can do, at least to start, is to have a mechanism that passes
> through objects that aspire to be "complete" duck arrays, like dask
> arrays or sparse arrays or astropy's unit arrays, and then if it turns
> out that in practice people find uses for finer-grained distinctions,
> we can iteratively add those as a second pass. Notice that if a
> function starts out requiring a "comple