On 2018/11/10 12:39 PM, Stephan Hoyer wrote:
On Sat, Nov 10, 2018 at 2:22 PM Hameer Abbasi <einstein.edi...@gmail.com> wrote:
To summarize, I think these are our options:
1. Change the behavior of np.asanyarray() to check for an
__anyarray__() protocol. Change np.matrix.__anyarray__() to
return a base numpy array (this is a minor backwards
compatibility break, but probably for the best). Start issuing a
FutureWarning for any MaskedArray operations that violate Liskov
and add a skipna argument that in the future will default to
skipna=False.
2. Introduce a new coercion function, e.g., np.duckarray(). This
is the easiest option because we don't need to clean up NumPy's
existing ndarray subclasses.
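
As a rough illustration of option 1, the sketch below shows how a coercion function could consult an __anyarray__() protocol before falling back to the existing np.asanyarray(). Both the helper name asanyarray_with_protocol and the DuckArray class are hypothetical, and __anyarray__() is only a proposal in this thread, not a NumPy API.

```python
import numpy as np


def asanyarray_with_protocol(obj):
    """Hypothetical stand-in for the proposed coercion behavior."""
    # Look the method up on the type, as Python does for special methods.
    method = getattr(type(obj), "__anyarray__", None)
    if method is not None:
        # The object opts in and decides what array-like to hand back.
        return method(obj)
    # Everything else goes through the existing NumPy coercion.
    return np.asanyarray(obj)


class DuckArray:
    """Toy duck array that opts in via the proposed (hypothetical) protocol."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __anyarray__(self):
        # A well-behaved duck array passes through unchanged.
        return self


duck = DuckArray([1, 2, 3])
result_duck = asanyarray_with_protocol(duck)       # same DuckArray object
result_list = asanyarray_with_protocol([1, 2, 3])  # plain ndarray
```

Under this sketch, np.matrix would implement __anyarray__() to return a base ndarray, which is the backwards-compatibility break mentioned in option 1.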
My vote is still for 1. I don’t have an issue with PyData/Sparse
depending on recent-ish NumPy versions; it’ll need a lot of the
recent protocols anyway, although I could be convinced otherwise if
major package devs (scikits, SciPy, Dask) were to weigh in and say
they’ll jump on it (which seems unlikely given SciPy’s policy to
support old NumPy versions).
I agree that option (1) is fine for PyData/sparse. The bigger issue is
that this change should be conditional on making breaking changes (at
least raising FutureWarning for now) to np.ma.MaskedArray.
I don't know how people who currently use MaskedArray would feel about
that. I would love to hear their thoughts.
Thank you. I am a user of masked arrays, and have been since pre-numpy
days. I introduced their extensive use in matplotlib long ago. I have
been a bit concerned, indeed, that all of the discussion of modifying
masked arrays seems to be by people who don't actually use them
explicitly (though they might be using them without knowing it via
internal operations in matplotlib, or they might be quickly getting rid
of them after they are yielded by netCDF4.Dataset()).
I think that those of us who do use masked arrays recognize that they
are not perfect; they have some quirks and gotchas, and one has to be
careful to use numpy.ma functions instead of numpy functions in most
cases. But we use them because they have real advantages over the
alternatives, which are using nans and/or manually tracking independent
masks throughout calculations. These advantages are largely because
masked values *don't* behave like nan, *don't* propagate. This is
fundamental to the design, and motivated by real-life use cases.
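
The contrast described above can be seen in a small example. This is a minimal sketch with made-up data; it uses only existing numpy and numpy.ma calls.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])
bad = np.array([False, True, False, False])  # second sample is invalid

# NaN approach: the missing value propagates through a plain reduction.
with_nan = values.copy()
with_nan[bad] = np.nan
nan_sum = np.sum(with_nan)       # nan
nan_skip = np.nansum(with_nan)   # must switch to the nan-aware function

# Masked-array approach: masked values are skipped by reductions.
masked = np.ma.masked_array(values, mask=bad)
masked_sum = masked.sum()        # sum over the three unmasked values
masked_mean = masked.mean()      # mean over the three unmasked values

# Elementwise arithmetic carries the mask along with the data.
shifted = masked + 10
```

Here the mask travels with the array, so there is no separate boolean array to track by hand through each step of a calculation.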
The proposal to add a skipna kwarg to MaskedArray looks to me like it is
giving purity priority over practicality. It will force ma users to
insert skipna kwargs all over the place--because the default will be
contrary to the primary purposes of using masked arrays, in most cases.
How many people will it actually benefit? How many people are being
bitten, and how badly, by masked array behavior?
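
To make the concern concrete, here is a sketch of what a skipna=False default might mean for a reduction. The helper ma_sum is hypothetical, and the semantics are an assumption: the thread does not spell them out, so this sketch guesses that skipna=False would return the masked constant whenever any input is masked.

```python
import numpy as np


def ma_sum(arr, skipna=False):
    """Hypothetical reduction with the proposed future default."""
    # Assumed semantics: without skipna, any masked input masks the result.
    if not skipna and np.ma.is_masked(arr):
        return np.ma.masked
    # With skipna=True this is today's MaskedArray behavior.
    return arr.sum()


data = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

strict = ma_sum(data)                # masked constant under the assumed default
lenient = ma_sum(data, skipna=True)  # sum of the unmasked values
```

Under that assumption, code that relies on today's behavior would need skipna=True added to every such call.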
If there were a prospect of truly integrating missing/masked value
handling into numpy, simplifying or phasing out numpy.ma, I would be
delighted--I think it is the biggest single fundamental improvement that
could be made, from the user's standpoint. I was sad to see Mark
Wiebe's work in that direction come to grief.
If there are ways of gradually improving numpy.ma and its
interoperability with the rest of numpy and with the proliferation of
duck arrays, I'm all in favor--so long as they don't effectively wreck
numpy.ma for its present intended purposes.
Eric
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion