Re: [Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-11 Thread Stephan Hoyer
On Sat, Nov 10, 2018 at 10:45 PM Eric Firing wrote:

> On 2018/11/10 12:39 PM, Stephan Hoyer wrote:
> > On Sat, Nov 10, 2018 at 2:22 PM Hameer Abbasi wrote:
> >
> > To summarize, I think these are our options:
> >
> > 1. Change the behavior of np.anyarray() to check for an
> > __anyarray__() protocol. Change np.matrix.__anyarray__() to
> > return a base numpy array (this is a minor backwards
> > compatibility break, but probably for the best). Start issuing a
> > FutureWarning for any MaskedArray operations that violate Liskov
> > and add a skipna argument that in the future will default to
> > skipna=False.
> >
> > 2. Introduce a new coercion function, e.g., np.duckarray(). This
> > is the easiest option because we don't need to cleanup NumPy's
> > existing ndarray subclasses.
> >
> >
> > My vote is still for 1. I don’t have an issue for PyData/Sparse
> > depending on recent-ish NumPy versions — It’ll need a lot of the
> > recent protocols anyway, although I could be convinced otherwise if
> > major package devs (scikits, SciPy, Dask) were to weigh in and say
> > they’ll jump on it (which seems unlikely given SciPy’s policy to
> > support old NumPy versions).
> >
> >
> > I agree that option (1) is fine for PyData/sparse. The bigger issue is
> > that this change should be conditional on making breaking changes (at
> > least raising FutureWarning for now) to np.ma.MaskedArray.
> >
> > I don't know how people who currently use MaskedArray would feel about
> > that. I would love to hear their thoughts.
>
> Thank you.  I am a user of masked arrays, and have been since pre-numpy
> days.  I introduced their extensive use in matplotlib long ago.  I have
> been a bit concerned, indeed, that all of the discussion of modifying
> masked arrays seems to be by people who don't actually use them
> explicitly (though they might be using them without knowing it via
> internal operations in matplotlib, or they might be quickly getting rid
> of them after they are yielded by netCDF4.Dataset()).
>
> I think that those of us who do use masked arrays recognize that they
> are not perfect; they have some quirks and gotchas, and one has to be
> careful to use numpy.ma functions instead of numpy functions in most
> cases.  But we use them because they have real advantages over the
> alternatives, which are using nans and/or manually tracking independent
> masks throughout calculations.  These advantages are largely because
> masked values *don't* behave like nan, *don't* propagate.  This is
> fundamental to the design, and motivated by real-life use cases.
>
> The proposal to add a skipna kwarg to MaskedArray looks to me like it is
> giving purity priority over practicality.  It will force ma users to
> insert skipna kwargs all over the place--because the default will be
> contrary to the primary purposes of using masked arrays, in most cases.
> How many people will it actually benefit?  How many people are being
> bitten, and how badly, by masked array behavior?
>
> If there were a prospect of truly integrating missing/masked value
> handling into numpy, simplifying or phasing out numpy.ma, I would be
> delighted--I think it is the biggest single fundamental improvement that
> could be made, from the user's standpoint.  I was sad to see Mark
> Wiebe's work in that direction come to grief.
>
> If there are ways of gradually improving numpy.ma and its
> interoperability with the rest of numpy and with the proliferation of
> duck arrays, I'm all in favor--so long as they don't effectively wreck
> numpy.ma for its present intended purposes.


Eric -- thank you for sharing your perspective! I guess it should not be
surprising that the semantics of MaskedArray intentionally deviate from the
semantics of base NumPy arrays.

This deviation is fortunately less severe than the deviations in the
behavior of np.matrix, but it still presents some difficulties for duck
typing. We're in a position to reduce (but still not eliminate) these
differences with new protocols like __array_function__.

I think Nathaniel actually summarized these issues pretty well in NEP 16 (
http://www.numpy.org/neps/nep-0016-abstract-array.html). If we want a
coercion function that guarantees an object is a "full duck array", then it
can't pass on either np.matrix or MaskedArray in their current state.
Anything less than full compatibility provides a shaky foundation for use
in downstream projects or inside NumPy itself.

In theory (certainly if we were starting from scratch) it would make sense
to make asabstractarray() pass on any ndarray subclass, but this would
require willingness to make breaking changes to both np.matrix and
MaskedArray.

I would suggest adopting a variation of the proposal in NEP 16, except
using a protocol rather than an abstract base class, per NEP 22, e.g.,

# names still to be determined
def 

Re: [Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-11 Thread Marten van Kerkwijk
Hi Eric,

Thanks very much for the detailed response; it is good to be reminded that
`MaskedArray` is used in a package that, indeed, (nearly?) all of us use!

But I do think that those of us who have been trying to change MaskedArray
are generally good at making sure the tests continue to pass, i.e., that
the behaviour does not change (the main exception in the last few years was
that views should be taken of masks too, not just the data).

I also think that between __array_ufunc__ and __array_function__, it has
become quite easy to ensure that one no longer has to rely on `np.ma`
functions, i.e., that the regular numpy functions will do the right thing.
But it will need work to actually implement that.
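
As a rough illustration of the `__array_function__` route, here is a minimal
sketch. The class `SkippingArray` is hypothetical (it is not `np.ma` and not a
proposed API); it only shows how a container can make the regular `np.sum`
skip masked entries. It assumes NumPy 1.17 or later, where the protocol is
enabled by default.

```python
import numpy as np


class SkippingArray:
    """Hypothetical duck array: data plus a boolean mask (True = masked)."""

    def __init__(self, data, mask):
        self.data = np.asarray(data)
        self.mask = np.asarray(mask, dtype=bool)

    def __array_function__(self, func, types, args, kwargs):
        # Only np.sum is overridden in this sketch; it ignores masked entries
        # (and, for brevity, any axis/dtype keyword arguments).
        if func is np.sum:
            return self.data[~self.mask].sum()
        # Everything else is declared unsupported, so NumPy raises TypeError.
        return NotImplemented


# The regular NumPy function dispatches to the override above.
total = np.sum(SkippingArray([1.0, 2.0, 3.0], [False, True, False]))
```

Returning `NotImplemented` for everything else is what makes the "it will need
work" part visible: each function has to be opted in explicitly.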

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-10 Thread Eric Firing

On 2018/11/10 12:39 PM, Stephan Hoyer wrote:
> On Sat, Nov 10, 2018 at 2:22 PM Hameer Abbasi wrote:
>
>>> To summarize, I think these are our options:
>>>
>>> 1. Change the behavior of np.anyarray() to check for an
>>> __anyarray__() protocol. Change np.matrix.__anyarray__() to
>>> return a base numpy array (this is a minor backwards
>>> compatibility break, but probably for the best). Start issuing a
>>> FutureWarning for any MaskedArray operations that violate Liskov
>>> and add a skipna argument that in the future will default to
>>> skipna=False.
>>>
>>> 2. Introduce a new coercion function, e.g., np.duckarray(). This
>>> is the easiest option because we don't need to cleanup NumPy's
>>> existing ndarray subclasses.
>>
>> My vote is still for 1. I don’t have an issue for PyData/Sparse
>> depending on recent-ish NumPy versions — It’ll need a lot of the
>> recent protocols anyway, although I could be convinced otherwise if
>> major package devs (scikits, SciPy, Dask) were to weigh in and say
>> they’ll jump on it (which seems unlikely given SciPy’s policy to
>> support old NumPy versions).
>
> I agree that option (1) is fine for PyData/sparse. The bigger issue is
> that this change should be conditional on making breaking changes (at
> least raising FutureWarning for now) to np.ma.MaskedArray.
>
> I don't know how people who currently use MaskedArray would feel about
> that. I would love to hear their thoughts.


Thank you.  I am a user of masked arrays, and have been since pre-numpy 
days.  I introduced their extensive use in matplotlib long ago.  I have 
been a bit concerned, indeed, that all of the discussion of modifying 
masked arrays seems to be by people who don't actually use them 
explicitly (though they might be using them without knowing it via 
internal operations in matplotlib, or they might be quickly getting rid 
of them after they are yielded by netCDF4.Dataset()).


I think that those of us who do use masked arrays recognize that they 
are not perfect; they have some quirks and gotchas, and one has to be 
careful to use numpy.ma functions instead of numpy functions in most 
cases.  But we use them because they have real advantages over the 
alternatives, which are using nans and/or manually tracking independent 
masks throughout calculations.  These advantages are largely because 
masked values *don't* behave like nan, *don't* propagate.  This is 
fundamental to the design, and motivated by real-life use cases.


The proposal to add a skipna kwarg to MaskedArray looks to me like it is 
giving purity priority over practicality.  It will force ma users to 
insert skipna kwargs all over the place--because the default will be 
contrary to the primary purposes of using masked arrays, in most cases. 
How many people will it actually benefit?  How many people are being 
bitten, and how badly, by masked array behavior?


If there were a prospect of truly integrating missing/masked value 
handling into numpy, simplifying or phasing out numpy.ma, I would be 
delighted--I think it is the biggest single fundamental improvement that 
could be made, from the user's standpoint.  I was sad to see Mark 
Wiebe's work in that direction come to grief.


If there are ways of gradually improving numpy.ma and its 
interoperability with the rest of numpy and with the proliferation of 
duck arrays, I'm all in favor--so long as they don't effectively wreck 
numpy.ma for its present intended purposes.


Eric





Re: [Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-10 Thread Charles R Harris
On Sat, Nov 10, 2018 at 2:15 PM Eric Wieser wrote:

> > If the only way MaskedArray violates Liskov is in terms of NA skipping
> > aggregations by default, then this might be viable
>
> One of the ways to fix these liskov substitution problems is just to
> introduce more base classes - for instance, if we had an `NDContainer` base
> class with only slicing support, then masked arrays would be an exact
> liskov substitution, but np.matrix would not.
>
> Eric
>

I've had the same thought and wouldn't be surprised if others have
considered that possibility. Travis would be a good guy to ask about that.



Chuck


Re: [Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-10 Thread Marten van Kerkwijk
On Sat, Nov 10, 2018 at 5:39 PM Stephan Hoyer wrote:

> On Sat, Nov 10, 2018 at 2:22 PM Hameer Abbasi wrote:
>
>> To summarize, I think these are our options:
>>
>> 1. Change the behavior of np.anyarray() to check for an __anyarray__()
>> protocol. Change np.matrix.__anyarray__() to return a base numpy array
>> (this is a minor backwards compatibility break, but probably for the best).
>> Start issuing a FutureWarning for any MaskedArray operations that violate
>> Liskov and add a skipna argument that in the future will default to
>> skipna=False.
>>
>> 2. Introduce a new coercion function, e.g., np.duckarray(). This is the
>> easiest option because we don't need to cleanup NumPy's existing ndarray
>> subclasses.
>>
>>
>> My vote is still for 1. I don’t have an issue for PyData/Sparse depending
>> on recent-ish NumPy versions — It’ll need a lot of the recent protocols
>> anyway, although I could be convinced otherwise if major package devs
>> (scikits, SciPy, Dask) were to weigh in and say they’ll jump on it (which
>> seems unlikely given SciPy’s policy to support old NumPy versions).
>>
>
> I agree that option (1) is fine for PyData/sparse. The bigger issue is
> that this change should be conditional on making breaking changes (at least
> raising FutureWarning for now) to np.ma.MaskedArray.
>

Might be good to try before worrying too much - MaskedArray already
overrides *a lot*; it is not at all obvious to me that things wouldn't
"just work" if we bulk-replaced `asarray` with `asanyarray`.  And with
`__array_function__` we now have the option to fix code paths that do not
work immediately.
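
For reference, a small sketch of what the `asarray` versus `asanyarray`
difference looks like for a masked array today (the values are made up for
illustration):

```python
import numpy as np

masked = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

# asarray coerces to a base ndarray, so the mask is silently dropped.
stripped = np.asarray(masked)

# asanyarray lets ndarray subclasses pass through, so the mask survives.
kept = np.asanyarray(masked)

stripped_total = stripped.sum()  # includes the masked 2.0
kept_total = kept.sum()          # skips the masked 2.0
```

Here `stripped_total` is 6.0 and `kept_total` is 4.0, which is exactly the
kind of difference a bulk replacement would surface in existing code paths.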

-- Marten


Re: [Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-10 Thread Stephan Hoyer
On Sat, Nov 10, 2018 at 2:22 PM Hameer Abbasi wrote:

> To summarize, I think these are our options:
>
> 1. Change the behavior of np.anyarray() to check for an __anyarray__()
> protocol. Change np.matrix.__anyarray__() to return a base numpy array
> (this is a minor backwards compatibility break, but probably for the best).
> Start issuing a FutureWarning for any MaskedArray operations that violate
> Liskov and add a skipna argument that in the future will default to
> skipna=False.
>
> 2. Introduce a new coercion function, e.g., np.duckarray(). This is the
> easiest option because we don't need to cleanup NumPy's existing ndarray
> subclasses.
>
>
> My vote is still for 1. I don’t have an issue for PyData/Sparse depending
> on recent-ish NumPy versions — It’ll need a lot of the recent protocols
> anyway, although I could be convinced otherwise if major package devs
> (scikits, SciPy, Dask) were to weigh in and say they’ll jump on it (which
> seems unlikely given SciPy’s policy to support old NumPy versions).
>

I agree that option (1) is fine for PyData/sparse. The bigger issue is that
this change should be conditional on making breaking changes (at least
raising FutureWarning for now) to np.ma.MaskedArray.

I don't know how people who currently use MaskedArray would feel about
that. I would love to hear their thoughts.


Re: [Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-10 Thread Hameer Abbasi

> On Saturday, Nov 10, 2018 at 9:16 PM, Stephan Hoyer wrote:
> > On Sat, Nov 10, 2018 at 9:49 AM Marten van Kerkwijk wrote:
> > Hi Hameer,
> >
> > I do not think we should change `asanyarray` itself to special-case matrix; 
> > rather, we could start converting `asarray` to `asanyarray` and solve the 
> > problems that produces for matrices in `matrix` itself (e.g., by overriding 
> > the relevant function with `__array_function__`).
> >
> > I think the idea of providing an `__anyarray__` method (in analogy with 
> > `__array__`) might work. Indeed, the default in `ndarray` (and thus all its 
> > subclasses) could be to let it return `self` and to override it for 
> > `matrix` to return an ndarray view.
>
> Yes, we certainly would rather implement a matrix.__anyarray__ method (if 
> we're already doing a new protocol) rather than special case np.matrix 
> explicitly.
>
> Unfortunately, per Nathaniel's comments about NA skipping behavior, it seems 
> like we will also need MaskedArray.__anyarray__ to return something other 
> than itself. In principle, we should probably write new version of 
> MaskedArray that doesn't deviate from ndarray semantics, but that's a rather 
> large project (we'd also probably want to stop subclassing ndarray).
>
> Changing the default aggregation behavior for the existing MaskedArray is 
> also an option but that would be a serious annoyance to users and backwards 
> compatibility break. If the only way MaskedArray violates Liskov is in terms 
> of NA skipping aggregations by default, then this might be viable. In 
> practice, this would require adding an explicit skipna argument so 
> FutureWarnings could be silenced. The plus side of this option is that it 
> would make it easier to use np.anyarray() or any new coercion function 
> throughout the internal NumPy code base.
>
> To summarize, I think these are our options:
> 1. Change the behavior of np.anyarray() to check for an __anyarray__() 
> protocol. Change np.matrix.__anyarray__() to return a base numpy array (this 
> is a minor backwards compatibility break, but probably for the best). Start 
> issuing a FutureWarning for any MaskedArray operations that violate Liskov 
> and add a skipna argument that in the future will default to skipna=False.
>
> 2. Introduce a new coercion function, e.g., np.duckarray(). This is the
> easiest option because we don't need to cleanup NumPy's existing ndarray
> subclasses.

My vote is still for 1. I don’t have an issue for PyData/Sparse depending on 
recent-ish NumPy versions — It’ll need a lot of the recent protocols anyway, 
although I could be convinced otherwise if major package devs (scikits, SciPy, 
Dask) were to weigh in and say they’ll jump on it (which seems unlikely given 
SciPy’s policy to support old NumPy versions).

>
>
> P.S. I'm just glad pandas stopped subclassing ndarray a while ago -- there's 
> no way pandas.Series() could be fixed up to not violate Liskov :). 


Re: [Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-10 Thread Eric Wieser
> If the only way MaskedArray violates Liskov is in terms of NA skipping
> aggregations by default, then this might be viable

One of the ways to fix these liskov substitution problems is just to
introduce more base classes - for instance, if we had an `NDContainer` base
class with only slicing support, then masked arrays would be an exact
liskov substitution, but np.matrix would not.
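
A quick sketch of why slicing alone separates the two subclasses (the array
contents are arbitrary, and the warning filter is only there because
constructing `np.matrix` emits a PendingDeprecationWarning):

```python
import warnings

import numpy as np

base = np.arange(6).reshape(2, 3)
masked = np.ma.masked_array(base, mask=np.zeros((2, 3), dtype=bool))
with warnings.catch_warnings():
    warnings.simplefilter("ignore", PendingDeprecationWarning)
    mat = np.matrix(base)

# Indexing a row drops a dimension for ndarray and MaskedArray,
# but np.matrix always stays two-dimensional.
row_shapes = {
    "ndarray": base[0].shape,
    "MaskedArray": masked[0].shape,
    "matrix": mat[0].shape,
}
```

`row_shapes` comes out as `(3,)` for ndarray and MaskedArray but `(1, 3)` for
matrix, so code that relies only on indexing already behaves differently for
np.matrix.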

Eric

On Sat, 10 Nov 2018 at 12:17 Stephan Hoyer wrote:

> On Sat, Nov 10, 2018 at 9:49 AM Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
>> Hi Hameer,
>>
>> I do not think we should change `asanyarray` itself to special-case
>> matrix; rather, we could start converting `asarray` to `asanyarray` and
>> solve the problems that produces for matrices in `matrix` itself (e.g., by
>> overriding the relevant function with `__array_function__`).
>>
>> I think the idea of providing an `__anyarray__` method (in analogy with
>> `__array__`) might work. Indeed, the default in `ndarray` (and thus all its
>> subclasses) could be to let it return `self`  and to override it for
>> `matrix` to return an ndarray view.
>>
>
> Yes, we certainly would rather implement a matrix.__anyarray__ method (if
> we're already doing a new protocol) rather than special case np.matrix
> explicitly.
>
> Unfortunately, per Nathaniel's comments about NA skipping behavior, it
> seems like we will also need MaskedArray.__anyarray__ to return something
> other than itself. In principle, we should probably write new version of
> MaskedArray that doesn't deviate from ndarray semantics, but that's a
> rather large project (we'd also probably want to stop subclassing ndarray).
>
> Changing the default aggregation behavior for the existing MaskedArray is
> also an option but that would be a serious annoyance to users and backwards
> compatibility break. If the only way MaskedArray violates Liskov is in
> terms of NA skipping aggregations by default, then this might be viable. In
> practice, this would require adding an explicit skipna argument so
> FutureWarnings could be silenced. The plus side of this option is that it
> would make it easier to use np.anyarray() or any new coercion function
> throughout the internal NumPy code base.
>
> To summarize, I think these are our options:
> 1. Change the behavior of np.anyarray() to check for an __anyarray__()
> protocol. Change np.matrix.__anyarray__() to return a base numpy array
> (this is a minor backwards compatibility break, but probably for the best).
> Start issuing a FutureWarning for any MaskedArray operations that violate
> Liskov and add a skipna argument that in the future will default to
> skipna=False.
> 2. Introduce a new coercion function, e.g., np.duckarray(). This is the
> easiest option because we don't need to cleanup NumPy's existing ndarray
> subclasses.
>
> P.S. I'm just glad pandas stopped subclassing ndarray a while ago --
> there's no way pandas.Series() could be fixed up to not violate Liskov :).


Re: [Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-10 Thread Stephan Hoyer
On Sat, Nov 10, 2018 at 9:49 AM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Hi Hameer,
>
> I do not think we should change `asanyarray` itself to special-case
> matrix; rather, we could start converting `asarray` to `asanyarray` and
> solve the problems that produces for matrices in `matrix` itself (e.g., by
> overriding the relevant function with `__array_function__`).
>
> I think the idea of providing an `__anyarray__` method (in analogy with
> `__array__`) might work. Indeed, the default in `ndarray` (and thus all its
> subclasses) could be to let it return `self`  and to override it for
> `matrix` to return an ndarray view.
>

Yes, we certainly would rather implement a matrix.__anyarray__ method (if
we're already doing a new protocol) rather than special case np.matrix
explicitly.

Unfortunately, per Nathaniel's comments about NA skipping behavior, it
seems like we will also need MaskedArray.__anyarray__ to return something
other than itself. In principle, we should probably write a new version of
MaskedArray that doesn't deviate from ndarray semantics, but that's a
rather large project (we'd also probably want to stop subclassing ndarray).

Changing the default aggregation behavior for the existing MaskedArray is
also an option, but that would be a serious annoyance to users and a backwards
compatibility break. If the only way MaskedArray violates Liskov is in
terms of NA skipping aggregations by default, then this might be viable. In
practice, this would require adding an explicit skipna argument so
FutureWarnings could be silenced. The plus side of this option is that it
would make it easier to use np.asanyarray() or any new coercion function
throughout the internal NumPy code base.
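
To make the skipna idea concrete, here is a rough sketch of how such a
deprecation could look from the user's side. The function `masked_sum`, its
`skipna` parameter, and the warning text are all hypothetical; nothing like
this exists in `np.ma` today.

```python
import warnings

import numpy as np

_UNSET = object()  # sentinel: the caller did not pass skipna explicitly


def masked_sum(arr, skipna=_UNSET):
    """Hypothetical sum over a MaskedArray with an explicit skipna switch."""
    if skipna is _UNSET:
        # Current behaviour is kept, but callers are told it will change.
        warnings.warn(
            "skipna will default to False in the future; pass skipna=True "
            "to keep skipping masked values",
            FutureWarning,
            stacklevel=2,
        )
        skipna = True
    if skipna or not np.ma.is_masked(arr):
        return arr.sum()
    # Propagating semantics: any masked element makes the result masked.
    return np.ma.masked


values = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
skipped = masked_sum(values, skipna=True)      # 1.0 + 3.0
propagated = masked_sum(values, skipna=False)  # the np.ma.masked constant
```

Passing `skipna` explicitly is what silences the FutureWarning.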

To summarize, I think these are our options:
1. Change the behavior of np.asanyarray() to check for an __anyarray__()
protocol. Change np.matrix.__anyarray__() to return a base numpy array
(this is a minor backwards compatibility break, but probably for the best).
Start issuing a FutureWarning for any MaskedArray operations that violate
Liskov and add a skipna argument that in the future will default to
skipna=False.
2. Introduce a new coercion function, e.g., np.duckarray(). This is the
easiest option because we don't need to clean up NumPy's existing ndarray
subclasses.

P.S. I'm just glad pandas stopped subclassing ndarray a while ago --
there's no way pandas.Series() could be fixed up to not violate Liskov :).


Re: [Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-10 Thread Marten van Kerkwijk
Hi Hameer,

I do not think we should change `asanyarray` itself to special-case matrix;
rather, we could start converting `asarray` to `asanyarray` and solve the
problems that produces for matrices in `matrix` itself (e.g., by overriding
the relevant function with `__array_function__`).

I think the idea of providing an `__anyarray__` method (in analogy with
`__array__`) might work. Indeed, the default in `ndarray` (and thus all its
subclasses) could be to let it return `self`  and to override it for
`matrix` to return an ndarray view.
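
A minimal sketch of how that could work. The `__anyarray__` method does not
exist in NumPy; `coerce_anyarray`, `WellBehaved` and `MatrixLike` are
hypothetical names used only to illustrate the idea, with `MatrixLike`
standing in for `matrix`.

```python
import numpy as np


class WellBehaved(np.ndarray):
    """Hypothetical subclass that follows ndarray semantics."""

    def __anyarray__(self):
        return self  # the proposed default: pass through unchanged


class MatrixLike(np.ndarray):
    """Hypothetical stand-in for np.matrix that opts out of pass-through."""

    def __anyarray__(self):
        return self.view(np.ndarray)  # O(1) view, shares memory


def coerce_anyarray(obj):
    """Hypothetical coercion function that honours __anyarray__."""
    hook = getattr(type(obj), "__anyarray__", None)
    if hook is not None:
        return hook(obj)
    return np.asarray(obj)


good = np.arange(4).reshape(2, 2).view(WellBehaved)
odd = np.arange(4).reshape(2, 2).view(MatrixLike)

kept = coerce_anyarray(good)     # same object, subclass preserved
demoted = coerce_anyarray(odd)   # base ndarray view onto the same data
```

In this sketch the special-casing lives in the subclass rather than in the
coercion function, which is the point of using a protocol.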

All the best,

Marten

p.s. Note that we are already giving PendingDeprecationWarning for matrix;
https://github.com/numpy/numpy/pull/10142.



On Sat, Nov 10, 2018 at 11:02 AM Matti Picus wrote:

> On 9/11/18 5:09 pm, Nathaniel Smith wrote:
> > On Fri, Nov 9, 2018 at 4:59 PM, Stephan Hoyer wrote:
> >> On Fri, Nov 9, 2018 at 6:46 PM Nathaniel Smith wrote:
> >>> But matrix isn't the only problem with asanyarray. np.ma also violates
> >>> Liskov. No doubt there are other problematic ndarray subclasses out
> >>> there too...
> >>
> >> Please forgive my ignorance (I don't really use mask arrays), but how
> >> specifically do masked arrays violate Liskov? In most cases shouldn't they
> >> work the same as base numpy arrays, except with operations keeping track of
> >> masks?
> > Since many operations silently skip over masked values, the
> > computation semantics are different. For example, in a regular array,
> > sum()/size() == mean(), but with a masked array these are totally
> > different operations. So if you have code that was written for regular
> > arrays, but pass in a masked array, there's a solid chance that it
> > will silently return nonsensical results.
> >
> > (This is why it's better for NAs to propagate by default.)
> >
> > -n
>
>
> Echos of the discussions in neps 12, 24, 25, 26. http://www.numpy.org/neps
>
>
> Matti
>


Re: [Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-10 Thread Matti Picus

On 9/11/18 5:09 pm, Nathaniel Smith wrote:
> On Fri, Nov 9, 2018 at 4:59 PM, Stephan Hoyer wrote:
>> On Fri, Nov 9, 2018 at 6:46 PM Nathaniel Smith wrote:
>>> But matrix isn't the only problem with asanyarray. np.ma also violates
>>> Liskov. No doubt there are other problematic ndarray subclasses out
>>> there too...
>>
>> Please forgive my ignorance (I don't really use mask arrays), but how
>> specifically do masked arrays violate Liskov? In most cases shouldn't they
>> work the same as base numpy arrays, except with operations keeping track of
>> masks?
>
> Since many operations silently skip over masked values, the
> computation semantics are different. For example, in a regular array,
> sum()/size() == mean(), but with a masked array these are totally
> different operations. So if you have code that was written for regular
> arrays, but pass in a masked array, there's a solid chance that it
> will silently return nonsensical results.
>
> (This is why it's better for NAs to propagate by default.)
>
> -n



Echoes of the discussions in NEPs 12, 24, 25, 26. http://www.numpy.org/neps


Matti



Re: [Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-09 Thread Nathaniel Smith
On Fri, Nov 9, 2018 at 4:59 PM, Stephan Hoyer wrote:
> On Fri, Nov 9, 2018 at 6:46 PM Nathaniel Smith wrote:
>>
>> But matrix isn't the only problem with asanyarray. np.ma also violates
>> Liskov. No doubt there are other problematic ndarray subclasses out
>> there too...
>
>
> Please forgive my ignorance (I don't really use mask arrays), but how
> specifically do masked arrays violate Liskov? In most cases shouldn't they
> work the same as base numpy arrays, except with operations keeping track of
> masks?

Since many operations silently skip over masked values, the
computation semantics are different. For example, in a regular array,
sum()/size() == mean(), but with a masked array these are totally
different operations. So if you have code that was written for regular
arrays, but pass in a masked array, there's a solid chance that it
will silently return nonsensical results.

(This is why it's better for NAs to propagate by default.)
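
To make the sum()/size() point concrete, here is a small sketch (the values
and the helper name `naive_mean` are made up for illustration):

```python
import numpy as np


def naive_mean(a):
    """Mean written for regular arrays: total divided by element count."""
    return a.sum() / a.size


data = np.array([1.0, 2.0, 3.0, 4.0])
masked = np.ma.masked_array(data, mask=[False, False, True, False])

plain_result = naive_mean(data)     # 10.0 / 4
masked_result = naive_mean(masked)  # sum skips 3.0, size still counts it: 7.0 / 4
masked_mean = masked.mean()         # np.ma's own mean: 7.0 / 3
```

The helper returns 1.75 for the masked input, which is neither the mean of
all four values (2.5) nor the mean of the three unmasked ones (about 2.33).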

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-09 Thread Stephan Hoyer
On Fri, Nov 9, 2018 at 6:46 PM Nathaniel Smith wrote:

> But matrix isn't the only problem with asanyarray. np.ma also violates
> Liskov. No doubt there are other problematic ndarray subclasses out
> there too...
>

Please forgive my ignorance (I don't really use masked arrays), but how
specifically do masked arrays violate Liskov? In most cases shouldn't they
work the same as base numpy arrays, except with operations keeping track of
masks?

I'm sure there are some cases where masked arrays have different semantics
than NumPy arrays, but are any of these intentional?

I would guess that the worst current violation is that there is a risk of
losing mask information in some operations, but implementing
__array_function__ would presumably make it possible to fix most of these.


Re: [Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-09 Thread Nathaniel Smith
But matrix isn't the only problem with asanyarray. np.ma also violates
Liskov. No doubt there are other problematic ndarray subclasses out
there too...

If we were going to try to reuse asanyarray through some deprecation
mechanism, I think we'd need to deprecate allowing asanyarray to
return *any* ndarray subclass, unless they explicitly provided an
__asanyarray__ dunder. But at that point I'm not sure what the point
would be of reusing it.

On Fri, Nov 9, 2018 at 7:15 AM, Hameer Abbasi wrote:
> Begin forwarded message:
>
> From: Stephan Hoyer
> Date: Friday, Nov 09, 2018 at 3:19 PM
> To: Hameer Abbasi
> Cc: Stefan van der Walt, Marten van Kerkwijk
> Subject: asarray/anyarray; matrix/subclass
>
> This is a great discussion, but let's try to have it in public (e.g., on the
> NumPy mailing list).
> On Fri, Nov 9, 2018 at 8:42 AM Hameer Abbasi wrote:
>>
>> Hi Stephan,
>>
>> The issue I have with writing another function is that asarray/asanyarray
>> are so widely used that it’d be a huge maintenance task to update them
>> throughout NumPy, not to mention in other codebases, which would also have
>> to rely on newer NumPy versions for this. In short, it would dramatically
>> reduce the adoptability of this function.
>>
>> One path we can take is to allow asarray/asanyarray to be overridable via
>> __array_function__ (the former is debatable). This solves most of our
>> duck-array related issues without introducing another protocol.
>>
>> Regardless of what path we choose, I would recommend changing asanyarray
>> to not pass through np.matrix, instead passing through
>> mat.view(type=np.ndarray), which has O(1) cost and memory. In the
>> vast majority of contexts, it’s used to ensure an array-ish structure for
>> another operation, and usually there’s no guarantee that what comes out will
>> be a matrix anyway. I suggest we raise a FutureWarning and then change this
>> behaviour.
>>
>> There have been a number of discussions about deprecating np.matrix (and a
>> few about MaskedArray as well, though there are less compelling reasons for
>> that one). I suggest we start down that path as soon as possible. The
>> biggest (only?) user I know of blocking that is scipy.sparse, and we’re on
>> our way to replacing that with PyData/Sparse.
>>
>> Best Regards,
>> Hameer Abbasi
>>
>> On Friday, Nov 09, 2018 at 1:26 AM, Stephan Hoyer wrote:
>> Hi Hameer,
>>
>> I'd love to talk about this in more detail. I agree that something like
>> this is needed.
>>
>> The challenge with reusing an existing function like asanyarray() is that
>> there is at least one (somewhat?) widely used ndarray subclass that badly
>> violates the Liskov Substitution Principle: np.matrix.
>>
>> NumPy can't really use np.asanyarray() widely for internal purposes until
>> we don't have to worry about np.matrix. We might special case np.matrix in
>> some way, but then asanyarray() would do totally opposite things on
>> different versions of NumPy. It's almost certainly a better idea to just
>> write a new function with the desired semantics, and "soft deprecate"
>> asanyarray(). The new function can explicitly black list np.matrix, as well
>> as any other subclasses we know of that badly violate LSP.
>>
>> Cheers,
>> Stephan
>> On Thu, Nov 8, 2018 at 5:06 PM Hameer Abbasi 
>> wrote:
>>>
>>> No, Stefan, I’ll do that now. Putting you in the cc.
>>>
>>> It slipped my mind among the million other things I had in mind — Namely:
>>> My job visa. It was only done this Monday.
>>>
>>> Hi, Marten, Stephan:
>>>
>>> Stefan wants me to write up a NEP that allows a given object to specify
>>> that it is a duck array — Namely, that it follows duck-array semantics.
>>>
>>> We were thinking of changing asanyarray to pass through
>>> anything that implements the duck-array protocol along with ndarray
>>> subclasses. I’m sure this would help XArray and Quantity work better with
>>> existing codebases, along with PyData/Sparse arrays.
>>>
>>> Would you be interested?
>>>
>>> Best Regards,
>>> Hameer Abbasi
>>>
>>> On Thursday, Nov 08, 2018 at 9:09 PM, Stefan van der Walt
>>>  wrote:
>>> Hi Hameer,
>>>
>>> In last week's meeting, we had the following in the notes:
>>>
>>> Hameer is contacting Marten & Stephan and write up a draft NEP for
>>> clarifying the asarray/asanyarray and matrix/subclass path forward.
>>>
>>>
>>> Did any of that happen that you could share?
>>>
>>> Thanks and best regards,
>>> Stéfan
>
>
> Hello, everyone,
>
> Stefan van der Walt, Stephan Hoyer, Marten van Kerkwijk and I were having a
> discussion about the state of matrix, asarray and asanyarray. Our thoughts
> are summarised above (in the quoted text that I’m forwarding).
>
> Basically, this grew out of a discussion relating to asanyarray/asarray
> inconsistencies in NumPy about which to use where. Historically, asarray was
> used in many libraries/places instead of asanyarray usually because
> np.matrix caused problems due to its special 

Re: [Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-09 Thread Stephan Hoyer
I’m still not sure I agree with the advantages of reusing asanyarray(),
even if matrix did not exist. Yes, asanyarray will exist in old NumPy
versions, but you can’t use it with sparse arrays anyways because it will
have the wrong semantics. I expect this would be a bug magnet, with
inadvertent loading of sparse arrays into memory if you’re accidentally
using old NumPy.
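
To illustrate the failure mode, here is a minimal sketch. TinySparse is a
hypothetical stand-in for a sparse duck array (it is not the PyData/Sparse
API) that implements only __array__. Without a duck-array protocol,
asanyarray() falls back to __array__ and silently hands back a dense ndarray:

```python
import numpy as np


class TinySparse:
    """Hypothetical sparse container: stores only the nonzero entries."""

    def __init__(self, shape, coords, values):
        self.shape = shape
        self.coords = coords
        self.values = values
        self.densified = 0  # counts how often we were expanded to dense

    def __array__(self, dtype=None, copy=None):
        # Called by NumPy's coercion functions; builds the full dense array.
        self.densified += 1
        dense = np.zeros(self.shape, dtype=float if dtype is None else dtype)
        for (i, j), v in zip(self.coords, self.values):
            dense[i, j] = v
        return dense


s = TinySparse((3, 3), [(0, 0), (2, 1)], [1.0, 5.0])
out = np.asanyarray(s)  # no error, no warning: the sparse structure is gone
print(type(out), out.shape, s.densified)
```

With a 3x3 array this is harmless, but the same call on a large sparse array
would allocate the whole dense array without any indication to the caller.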

With regards to the protocol, I would suggest a dedicated method, e.g.,
__asanyarray__ (or something similar based on the final chosen name of the
function). Coercing to arrays is special enough to have its own dedicated
protocol, and it could be useful for libraries like xarray to check for
__asanyarray__ attributes before deciding which coercion mechanism to use.
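
As a rough sketch of what such a check could look like: duckarray() and
__asanyarray__ below are placeholder names, not an existing NumPy API, and
blacklisting np.matrix by returning a base-ndarray view is only one possible
choice.

```python
import numpy as np


def duckarray(obj):
    """Hypothetical coercion function (the name is an assumption)."""
    # Look the method up on the type, as Python does for special methods.
    method = getattr(type(obj), "__asanyarray__", None)
    if method is not None:
        return method(obj)
    # Explicitly blacklist np.matrix: return a base ndarray view (no copy).
    if isinstance(obj, np.matrix):
        return obj.view(np.ndarray)
    # Everything else keeps today's asanyarray() behaviour.
    return np.asanyarray(obj)


class Duck:
    """Hypothetical duck array that opts in to being passed through."""

    def __asanyarray__(self):
        return self


m = np.matrix([[1, 2], [3, 4]])
v = duckarray(m)
d = Duck()

print(type(v).__name__, np.shares_memory(m, v))  # view, not a copy
print(m[0].shape, v[0].shape)  # matrix indexing stays 2-D, ndarray drops a dim
print(duckarray(d) is d)
```

A library like xarray could use the same getattr() check on its own to decide
which coercion path to take before calling into NumPy.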
On Fri, Nov 9, 2018 at 10:17 AM Hameer Abbasi 
wrote:

> Begin forwarded message:
>
> From: Stephan Hoyer
> Date: Friday, Nov 09, 2018 at 3:19 PM
> To: Hameer Abbasi
> Cc: Stefan van der Walt , Marten van Kerkwijk
> Subject: asarray/anyarray; matrix/subclass
>
> This is a great discussion, but let's try to have it in public (e.g., on
> the NumPy mailing list).
> On Fri, Nov 9, 2018 at 8:42 AM Hameer Abbasi 
> wrote:
>
>> Hi Stephan,
>>
>> The issue I have with writing another function is that asarray/asanyarray
>> are so widely used that it’d be a huge maintenance task to update them
>> throughout NumPy, not to mention other codebases, not to mention other
>> codebases having to rely on newer NumPy versions for this. In short, it
>> would dramatically reduce adaptability of this function.
>>
>> One path we can take is to allow asarray/asanyarray to be overridable via
>> __array_function__ (the former is debatable). This solves most of our
>> duck-array related issues without introducing another protocol.
>>
>> Regardless of what path we choose, I would recommend changing asanyarray
>> so that it no longer passes np.matrix through, instead returning
>> mat.view(type=np.ndarray), which has O(1) cost and memory. In the
>> vast majority of contexts, it’s used to ensure an array-ish structure for
>> another operation, and usually there’s no guarantee that what comes out
>> will be a matrix anyway. I suggest we raise a FutureWarning and then change
>> this behaviour.
>>
>> There have been a number of discussions about deprecating np.matrix (and
>> a few about MaskedArray as well, though there are less compelling reasons
>> for that one). I suggest we start down that path as soon as possible. The
>> biggest (only?) user I know of blocking that is scipy.sparse, and we’re on
>> our way to replacing that with PyData/Sparse.
>>
>> Best Regards,
>> Hameer Abbasi
>>
>> On Friday, Nov 09, 2018 at 1:26 AM, Stephan Hoyer 
>> wrote:
>> Hi Hameer,
>>
>> I'd love to talk about this in more detail. I agree that something like
>> this is needed.
>>
>> The challenge with reusing an existing function like asanyarray() is that
>> there is at least one (somewhat?) widely used ndarray subclass that badly
>> violates the Liskov Substitution Principle: np.matrix.
>>
>> NumPy can't really use np.asanyarray() widely for internal purposes until
>> we don't have to worry about np.matrix. We might special case np.matrix in
>> some way, but then asanyarray() would do totally opposite things on
>> different versions of NumPy. It's almost certainly a better idea to just
>> write a new function with the desired semantics, and "soft deprecate"
>> asanyarray(). The new function can explicitly black list np.matrix, as well
>> as any other subclasses we know of that badly violate LSP.
>>
>> Cheers,
>> Stephan
>> On Thu, Nov 8, 2018 at 5:06 PM Hameer Abbasi 
>> wrote:
>>
>>> No, Stefan, I’ll do that now. Putting you in the cc.
>>>
>>> It slipped my mind among the million other things I had in mind —
>>> Namely: My job visa. It was only done this Monday.
>>>
>>> Hi, Marten, Stephan:
>>>
>>> Stefan wants me to write up a NEP that allows a given object to specify
>>> that it is a duck array — Namely, that it follows duck-array semantics.
>>>
>>> We were thinking of changing asanyarray to pass through
>>> anything that implements the duck-array protocol along with ndarray
>>> subclasses. I’m sure this would help XArray and Quantity work better with
>>> existing codebases, along with PyData/Sparse arrays.
>>>
>>> Would you be interested?
>>>
>>> Best Regards,
>>> Hameer Abbasi
>>>
>>> On Thursday, Nov 08, 2018 at 9:09 PM, Stefan van der Walt <
>>> stef...@berkeley.edu> wrote:
>>> Hi Hameer,
>>>
>>> In last week's meeting, we had the following in the notes:
>>>
>>> Hameer is contacting Marten & Stephan and write up a draft NEP for
>>> clarifying the asarray/asanyarray and matrix/subclass path forward.
>>>
>>>
>>> Did any of that happen that you could share?
>>>
>>> Thanks and best regards,
>>> Stéfan
>>>
>>>
> Hello, everyone,
>
> Stefan van der Walt, Stephan Hoyer, Marten van Kerkwijk and I were having
> a discussion about the state of matrix, asarray and asanyarray. Our
> thoughts are summarised 

[Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-09 Thread Hameer Abbasi

> Begin forwarded message:
>
> From: Stephan Hoyer
> Date: Friday, Nov 09, 2018 at 3:19 PM
> To: Hameer Abbasi
> Cc: Stefan van der Walt , Marten van Kerkwijk
> Subject: asarray/anyarray; matrix/subclass
>
> This is a great discussion, but let's try to have it in public (e.g., on the 
> NumPy mailing list).
> On Fri, Nov 9, 2018 at 8:42 AM Hameer Abbasi wrote:
> > Hi Stephan,
> >
> > The issue I have with writing another function is that asarray/asanyarray 
> > are so widely used that it’d be a huge maintenance task to update them 
> > throughout NumPy, not to mention other codebases, not to mention other 
> > codebases having to rely on newer NumPy versions for this. In short, it 
> > would dramatically reduce adaptability of this function.
> >
> > One path we can take is to allow asarray/asanyarray to be overridable via 
> > __array_function__ (the former is debatable). This solves most of our 
> > duck-array related issues without introducing another protocol.
> >
> > Regardless of what path we choose, I would recommend changing asanyarray
> > so that it no longer passes np.matrix through, instead returning
> > mat.view(type=np.ndarray), which has O(1) cost and memory. In the
> > vast majority of contexts, it’s used to ensure an array-ish structure for 
> > another operation, and usually there’s no guarantee that what comes out 
> > will be a matrix anyway. I suggest we raise a FutureWarning and then change 
> > this behaviour.
> >
> > There have been a number of discussions about deprecating np.matrix (and a 
> > few about MaskedArray as well, though there are less compelling reasons for 
> > that one). I suggest we start down that path as soon as possible. The 
> > biggest (only?) user I know of blocking that is scipy.sparse, and we’re on 
> > our way to replacing that with PyData/Sparse.
> >
> > Best Regards,
> > Hameer Abbasi
> >
> >
> > > On Friday, Nov 09, 2018 at 1:26 AM, Stephan Hoyer wrote:
> > > Hi Hameer,
> > >
> > > I'd love to talk about this in more detail. I agree that something like 
> > > this is needed.
> > >
> > > The challenge with reusing an existing function like asanyarray() is that 
> > > there is at least one (somewhat?) widely used ndarray subclass that badly 
> > > violates the Liskov Substitution Principle: np.matrix.
> > >
> > > NumPy can't really use np.asanyarray() widely for internal purposes until 
> > > > we don't have to worry about np.matrix. We might special case np.matrix
> > > in some way, but then asanyarray() would do totally opposite things on 
> > > different versions of NumPy. It's almost certainly a better idea to just 
> > > write a new function with the desired semantics, and "soft deprecate" 
> > > asanyarray(). The new function can explicitly black list np.matrix, as 
> > > well as any other subclasses we know of that badly violate LSP.
> > >
> > > Cheers,
> > > Stephan
> > > > On Thu, Nov 8, 2018 at 5:06 PM Hameer Abbasi wrote:
> > > > No, Stefan, I’ll do that now. Putting you in the cc.
> > > >
> > > > It slipped my mind among the million other things I had in mind — 
> > > > Namely: My job visa. It was only done this Monday.
> > > >
> > > > Hi, Marten, Stephan:
> > > >
> > > > Stefan wants me to write up a NEP that allows a given object to specify 
> > > > that it is a duck array — Namely, that it follows duck-array semantics.
> > > >
> > > > > We were thinking of changing asanyarray to pass through
> > > > anything that implements the duck-array protocol along with ndarray 
> > > > subclasses. I’m sure this would help XArray and Quantity work better 
> > > > with existing codebases, along with PyData/Sparse arrays.
> > > >
> > > > Would you be interested?
> > > >
> > > > Best Regards,
> > > > Hameer Abbasi
> > > >
> > > >
> > > > > > On Thursday, Nov 08, 2018 at 9:09 PM, Stefan van der Walt wrote:
> > > > > Hi Hameer,
> > > > >
> > > > > In last week's meeting, we had the following in the notes:
> > > > >
> > > > > > Hameer is contacting Marten & Stephan and write up a draft NEP for
> > > > > > clarifying the asarray/asanyarray and matrix/subclass path forward.
> > > > >
> > > > > Did any of that happen that you could share?
> > > > >
> > > > > Thanks and best regards,
> > > > > Stéfan

Hello, everyone,

Stefan van der Walt, Stephan Hoyer, Marten van Kerkwijk and I were having a
discussion about the state of matrix, asarray and asanyarray. Our thoughts are
summarised above (in the quoted text that I’m forwarding).

Basically, this grew out of a discussion relating to asanyarray/asarray 
inconsistencies in NumPy about which to use where. Historically, asarray was 
used in many libraries/places instead of asanyarray usually because np.matrix 
caused problems due to its special behaviour with regard to indexing (it always 
returns a 2-D object when eliminating one dimension, but a