Hi all, Chuck suggested we think about a MaskedArray replacement for 1.18.
A few months ago I did some work on a MaskedArray replacement using `__array_function__`, which I got mostly working. It seems like a good time to bring it up for discussion now. See it at:

https://github.com/ahaldane/ndarray_ducktypes

It should be very usable, it has docs you can read, and it passes a pytest suite with 1024 tests adapted from numpy's MaskedArray tests. What is missing? It needs even more tests for new functionality, and a couple of numpy API functions are missing, in particular `np.median`, `np.percentile`, `np.cov`, and `np.corrcoef`. I'm sure other devs could find many things to improve too.

Besides fixing many little annoyances from MaskedArray, and simplifying the logic by always storing the mask in full, it also has new features. For instance, it allows the use of an "X" variable to mark masked locations during array construction, and I solve the issue of how to mask individual fields of a structured array differently.

At this point I would be happy to get some feedback on the design and what seems good or bad. If it seems like a good start, I'd be happy to move it into a numpy repo of some sort for further collaboration & discussion, and maybe into 1.18. At the least I hope it can serve as a design study of what we could do.

Let me also drop here two more interesting detailed issues:

First, the issue of what to do about .real and .imag of complex arrays, and similarly about field-assignment of structured arrays. The problem is that we have a single mask bool per element of a complex array, but what if someone does `arr.imag = MaskedArray([1,X,1])`? How should the mask of the original array change? Should we make .real and .imag readonly?

Second, a more general issue of how to ducktype scalars when using `__array_function__`, which I think many ducktype implementors will have to face. For MaskedArray, I created an associated "MaskedScalar" type.
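To make the scalar question concrete, here is a minimal sketch of how a standalone scalar ducktype looks to numpy's usual scalar checks. `ToyMaskedScalar` is a hypothetical stand-in written only for this illustration; it is not the MaskedScalar class from the repo, and its `value`, `mask` and `dtype` attributes are assumptions.

```python
import numpy as np


class ToyMaskedScalar:
    """Hypothetical masked scalar: one value plus one mask bit.

    Illustrative only; not the MaskedScalar class from ndarray_ducktypes.
    """

    def __init__(self, value, mask=False):
        self.value = np.float64(value)
        self.mask = bool(mask)
        # Carry the dtype as an attribute, the way arrays and numpy scalars do.
        self.dtype = self.value.dtype


plain = np.float64(1.5)
s = ToyMaskedScalar(1.5)

# A real numpy scalar passes the usual checks.
print(isinstance(plain, np.floating))  # True
print(np.isscalar(plain))              # True

# The ducktype scalar is outside the numpy scalar hierarchy, so both fail.
print(isinstance(s, np.floating))      # False
print(np.isscalar(s))                  # False

# The numeric dtype is still recoverable from the dtype attribute.
print(issubclass(s.dtype.type, np.floating))      # True
print(issubclass(plain.dtype.type, np.floating))  # True
```

The last two checks only rely on a `dtype` attribute, so they give the same answer for a numpy scalar and for the toy ducktype scalar.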
However, MaskedScalar has to behave differently from normal numpy scalars in a number of ways: it is not part of the numpy scalar hierarchy, it fails checks like `isinstance(var, np.floating)`, and `np.isscalar` returns False. Numpy scalar types cannot be subclassed. We have discussed before the need to have a distinction between 0d arrays and scalars, so we shouldn't just use a 0d array (in fact, this makes printing very difficult).

This leads me to think that in future dtype-overhaul plans, we should consider creating a subclassable `np.scalar` base type to wrap all numpy scalar variables, and code like `isinstance(var, np.floating)` should be replaced by `issubclass(var.dtype.type, np.floating)` or similar. That is, the numeric dtype of the scalar is no longer encoded in `type(var)` but in `var.dtype`: the fact that the variable is a numpy scalar is decoupled from its numeric dtype.

This is useful because scalars share many "associated" properties with arrays which have nothing to do with the dtype, and which ducktype implementors want to touch. I imagine this will come up a lot: in that repo I also have an "ArrayCollection" ducktype which required a "CollectionScalar" scalar, and similarly I imagine people implementing units want the units attached to the scalar, independently of the dtype.

Cheers,
Allan