On Wed, 2018-05-23 at 17:33 -0400, Allan Haldane wrote:
> On 05/23/2018 04:02 PM, Eric Firing wrote:
> > Bad or missing values (and situations where one wants to use a mask
> > to operate on a subset of an array) are found in many domains of
> > real life; do you really want python users in those domains to have
> > to fall back on Matlab-style reliance on nans and/or manual mask
> > manipulations, as the new maskedarray package is sidelined?
>
> I also think that missing value support is important to include inside
> numpy, just as it is included in other numerical packages like R and
> Julia.
>
> The time is ripe to write a new and better MaskedArray, because
> __array_ufunc__ exists now. With some other numpy devs a few months ago
> we also played with rewriting MA using __array_ufunc__ and fixing up all
> the bugs and inconsistencies we have discovered over time (eg, getting
> rid of the Masked constant). Both Eric and I started working on some
> code changes, but never submitted PRs. See a little bit of discussion
> here (there was some more elsewhere I can't find now):
>
> https://github.com/numpy/numpy/pull/9792#issuecomment-333346420
>
> As I say there, numpy's current MA support is pretty poor compared to R
> - Wes McKinney partly justified his desire to move pandas away from
> numpy because of it. We have a lot to gain by implementing it nicely.
>
> We already have an NEP discussing possible ways forward:
> https://docs.scipy.org/doc/numpy-1.14.0/neps/missing-data.html
>
> I was pretty excited by discussion above, and still am. I want to get
> back to it after I finish more immediate priorities - finishing
> printing/loading/saving fixes and structured array fixes.
>
> But Masked-Array-2 is on my list of desired long-term enhancements for
> numpy.

Well, if we plan to replace it within numpy, I think we should wait
until then for any move on deprecation (after which it seems like the
obviously right choice)? If we do not plan to replace it within numpy,
we need to discuss a bit how it might affect infrastructure (multiple
implementations....).

There is the other discussion about how to replace it: either by opening
up/creating new masked dtypes or similar (cool, but unclear how
complex/long term), or `__array_ufunc__` based (relatively simple, and
it would get rid of the nastier hacks that are currently needed). Or
even both, just on different time scales?

My first gut feeling about the proposal is: I love the idea of getting
rid of it... but let's not do it, since it feels like it leaves too much
infrastructure unclear.

- Sebastian

>
> Allan
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
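
To make the `__array_ufunc__` based option above a bit more concrete,
here is a minimal sketch of the mechanism. It is not the code Allan or
Eric worked on, and it is not a proposed design: the class name
`MaskedSketch` and the rule "result is masked wherever any input is
masked" are assumptions for illustration only. It handles plain
elementwise, single-output ufunc calls and nothing else (no reductions,
no `out=`, no fill values).

```python
import numpy as np


class MaskedSketch(np.lib.mixins.NDArrayOperatorsMixin):
    """Hypothetical toy container: data plus a boolean mask (True = missing)."""

    def __init__(self, data, mask):
        self.data = np.asarray(data)
        self.mask = np.asarray(mask, dtype=bool)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Sketch only: decline anything but a plain elementwise call
        # with a single output and no keyword arguments.
        if method != "__call__" or kwargs or ufunc.nout != 1:
            return NotImplemented

        # Unwrap masked inputs; pass everything else through unchanged.
        datas = [x.data if isinstance(x, MaskedSketch) else x for x in inputs]
        masks = [x.mask for x in inputs if isinstance(x, MaskedSketch)]

        # Compute on the raw data; values under the mask may be garbage,
        # so floating-point warnings are silenced here.
        with np.errstate(all="ignore"):
            result = ufunc(*datas)

        # Assumed rule: an element is masked if it is masked in any input.
        mask = np.zeros(np.shape(result), dtype=bool)
        for m in masks:
            mask = mask | m
        return MaskedSketch(result, mask)


a = MaskedSketch([1.0, 2.0, 3.0], [False, True, False])
b = MaskedSketch([10.0, 20.0, 30.0], [False, False, True])
c = a + b        # operator provided by the mixin, routed via np.add
d = np.sqrt(a)   # direct ufunc call goes through the same hook
```

The operator mixin is what lets `a + b` reach `__array_ufunc__` without
writing each `__add__`, `__mul__`, etc. by hand; a real replacement would
also have to decide what reductions, `out=` and non-ufunc functions do.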