On 2018/05/23 9:06 AM, Matti Picus wrote:
MaskedArray is a strange but useful creature. This NEP proposes to distribute it as a separate package under the NumPy brand.

As I understand the process, a proposed NEP should first be discussed here to gauge general acceptance; after that, the details should be discussed on the pull request itself: https://github.com/numpy/numpy/pull/11146.

Here is the motivation section from the NEP:

MaskedArray is a subclass of the NumPy ``ndarray`` that adds
masking capabilities, i.e. the ability to ignore or hide certain array
values during computation.
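
For readers unfamiliar with the class, here is a minimal sketch of what "ignoring values during computation" looks like in practice. It is not part of the NEP; the sample data and the -999.0 sentinel are assumptions chosen purely for illustration.

```python
import numpy as np

# Hypothetical sample data; -999.0 stands in for a "bad value" sentinel.
data = np.array([1.0, 2.0, -999.0, 4.0])

# Build a MaskedArray in which every entry equal to the sentinel is masked.
masked = np.ma.masked_values(data, -999.0)

# Reductions skip the masked entry instead of including it.
total = masked.sum()
mean = masked.mean()
valid_count = masked.count()  # number of unmasked entries
```

The underlying data is left untouched; only the mask decides which entries take part in the reductions.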

While historically convenient to distribute this class inside of NumPy,
improved packaging has made it possible to distribute it separately
without difficulty.

Motivations for this move include:

 * Focus: the NumPy package should strive to only include the
   `ndarray` object, and the essential utilities needed to manipulate
   such arrays.
 * Complexity: the MaskedArray implementation is non-trivial, and imposes
   a significant maintenance burden.
 * Compatibility: MaskedArray objects, being subclasses of `ndarrays`,
   often cause complications when being used with other packages.
   Fixing these issues is outside the scope of NumPy development.

This NEP proposes a deprecation pathway through which MaskedArrays
would still be accessible to users, but no longer as part of the core
package.

Any thoughts?

Matti and Stefan

I understand at least some of the motivation and potential advantages, but as it stands, I find this NEP highly alarming. Masked arrays are critical to my numpy usage, and I suspect they are critical for many other use cases as well. In fact, I would prefer that a high priority for major numpy development be the more complete integration of masked array capabilities into numpy, not their removal to a separate package. I was unhappy to see the effort in that direction a few years ago being killed. I didn't agree with every design decision, but overall I thought it was going in the right direction.

Bad or missing values (and situations where one wants to use a mask to operate on a subset of an array) are found in many domains of real life. Do you really want Python users in those domains to have to fall back on Matlab-style reliance on NaNs and/or manual mask manipulations as the new maskedarray package is sidelined?
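
To make the comparison concrete, here is a rough sketch of the three approaches applied to the same integer data. The array contents and the -1 sentinel are made up for illustration; this is only one way each approach might be written.

```python
import numpy as np

# Hypothetical integer readings; -1 marks a bad value (an assumption).
readings = np.array([3, -1, 5, 8])
bad = readings == -1

# NaN approach: integer arrays cannot hold NaN, so convert to float first.
as_float = readings.astype(float)
as_float[bad] = np.nan
nan_mean = np.nanmean(as_float)

# Manual mask approach: carry the boolean mask around and index by hand.
manual_mean = readings[~bad].mean()

# Masked array approach: the mask travels with the data, dtype is preserved.
marr = np.ma.masked_array(readings, mask=bad)
ma_mean = marr.mean()
```

All three give the same mean here, but only the masked array keeps the integer dtype and the mask together in one object.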

Or is there any realistic prospect for maintenance and improvement of the package after it is separated out? Or of mask/missing value handling being integrated into numpy? Is the latter option on the table in any form, or is it DOA?

Side question: does your proposed purification of numpy include elimination of linalg and random? Based on the criteria in the NEP, I would expect it does; so maybe you should have a more ambitious NEP, and do the purification all in one step as a numpy version 2.0. (Surely if masked arrays are purged, the matrix class should be booted out at the same time.)

Eric