Hi Allan,

Thanks for the message and link! In astropy, we've been struggling with masking a lot, and one of the main conclusions I have reached is that ideally one has a more abstract `Masked` class that can take any type of data (including `ndarray`, of course) and behaves like that data as much as possible. For example, if I create a `Masked(Quantity(..., unit), mask=...)`, the instance will have a `.unit` attribute and perhaps even `isinstance(..., Quantity)` will hold. The same goes for `Masked(Time(...), mask=...)`, `Masked(SkyCoord(...), mask=...)`, etc. In a way, `Masked` would be a kind of mixin class that just tracks a mask attribute.
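As a rough illustration of the "behaves like the data" idea, here is a minimal sketch that forwards unknown attributes to the wrapped object. Both `FakeQuantity` and this `Masked` are hypothetical stand-ins written for this example; they are not astropy's `Quantity` or any existing implementation, and the sketch does not attempt to make `isinstance(..., Quantity)` hold.

```python
import numpy as np


class FakeQuantity:
    """Hypothetical stand-in for a unit-carrying array (not astropy's Quantity)."""

    def __init__(self, value, unit):
        self.value = np.asarray(value)
        self.unit = unit

    @property
    def shape(self):
        return self.value.shape


class Masked:
    """Hypothetical wrapper: any data with a `.shape`, plus a full boolean mask."""

    def __init__(self, data, mask=False):
        self.data = data
        # Store the mask in full, with the same shape as the data.
        self.mask = np.broadcast_to(np.asarray(mask, dtype=bool), data.shape).copy()

    def __getattr__(self, name):
        # Only reached when normal lookup fails. Guard the two instance
        # attributes so a half-initialised object cannot recurse forever.
        if name in ("data", "mask"):
            raise AttributeError(name)
        # Everything else (e.g. `.unit`, `.dtype`) is forwarded to the data.
        return getattr(self.data, name)


mq = Masked(FakeQuantity([1.0, 2.0, 3.0], unit="m"), mask=[False, True, False])
ma = Masked(np.arange(4))  # a plain ndarray works the same way

print(mq.unit)   # forwarded to the wrapped FakeQuantity
print(ma.dtype)  # forwarded to the wrapped ndarray
```

Plain delegation like this only covers attribute access; a mixin-style subclass would be needed for `isinstance` checks to pass.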
This may be too much to ask from the initializer, but, if so, it still seems most useful if it is made as easy as possible to do, say, `class MaskedQuantity(Masked, Quantity): <very few overrides>`.

Even if this is impossible, I think it is conceptually useful to think about what the masking class should do. My sense is that, e.g., it should not attempt to decide when an operation succeeds or not, but just "or together" input masks for regular, multiple-input functions, and let the underlying arrays skip elements for reductions by using `where` (hey, I did implement that for a reason... ;-). In particular, it suggests one should not have things like domains and all that (I never understood why `MaskedArray` did that). If one wants more, the class should provide a method that updates the mask (a sensible default might be `mask |= ~np.isfinite(result)` - here, the class being masked should logically support ufuncs and functions, so it can decide what "isfinite" means).

In any case, I would think that a basic truth should be that everything has a mask with a shape consistent with the data, so

1. Each complex number has just one mask, and setting `a.imag` with a masked array should definitely propagate the mask.
2. For a masked array with structured dtype, I'd similarly say that the default is for a mask to have the same shape as the array. But something like your collection makes sense for the case where one wants to mask items in a structure.

All the best,

Marten

p.s. I started trying to implement the above "Mixin" class; will try to clean that up a bit so that at least it uses `where` and push it up.

On Mon, Jun 17, 2019 at 6:43 PM Allan Haldane <allanhald...@gmail.com> wrote:

> Hi all,
>
> Chuck suggested we think about a MaskedArray replacement for 1.18.
>
> A few months ago I did some work on a MaskedArray replacement using
> `__array_function__`, which I got mostly working. It seems like a good
> time to bring it up for discussion now.
> See it at:
>
> https://github.com/ahaldane/ndarray_ducktypes
>
> It should be very usable, it has docs you can read, and it passes a
> pytest-suite with 1024 tests adapted from numpy's MaskedArray tests.
> What is missing? It needs even more tests for new functionality, and a
> couple numpy-API functions are missing, in particular `np.median`,
> `np.percentile`, `np.cov`, and `np.corrcoef`. I'm sure other devs could
> also find many things to improve too.
>
> Besides fixing many little annoyances from MaskedArray, and simplifying
> the logic by always storing the mask in full, it also has new features.
> For instance it allows the use of an "X" variable to mark masked
> locations during array construction, and I solve the issue of how to
> mask individual fields of a structured array differently.
>
> At this point I would be happy to get some feedback on the design and
> what seems good or bad. If it seems like a good start, I'd be happy to
> move it into a numpy repo of some sort for further collaboration &
> discussion, and maybe into 1.18. At the least I hope it can serve as a
> design study of what we could do.
>
> Let me also drop here two more interesting detailed issues:
>
> First, the issue of what to do about .real and .imag of complex arrays,
> and similarly about field-assignment of structured arrays. The problem
> is that we have a single mask bool per element of a complex array, but
> what if someone does `arr.imag = MaskedArray([1,X,1])`? How should the
> mask of the original array change? Should we make .real and .imag readonly?
>
> Second, a more general issue of how to ducktype scalars when using
> `__array_function__` which I think many ducktype implementors will have
> to face. For MaskedArray, I created an associated "MaskedScalar" type.
> However, MaskedScalar has to behave differently from normal numpy
> scalars in a number of ways: It is not part of the numpy scalar
> hierarchy, it fails checks `isinstance(var, np.floating)`, and
> np.isscalar returns false. Numpy scalar types cannot be subclassed. We
> have discussed before the need to have distinction between 0d-arrays and
> scalars, so we shouldn't just use a 0d (in fact, this makes printing
> very difficult). This leads me to think that in future dtype-overhaul
> plans, we should consider creating a subclassable `np.scalar` base type
> to wrap all numpy scalar variables, and code like `isinstance(var,
> np.floating)` should be replaced by `isinstance(var.dtype.type,
> np.floating)` or similar. That is, the numeric dtype of the scalar is no
> longer encoded in `type(var)` but in `var.dtype`: The fact that the
> variable is a numpy scalar is decoupled from its numeric dtype.
>
> This is useful because there are many "associated" properties of scalars
> in common with arrays which have nothing to do with the dtype, which
> ducktype implementors want to touch. I imagine this will come up a lot:
> In that repo I also have an "ArrayCollection" ducktype which required a
> "CollectionScalar" scalar, and similarly I imagine people implementing
> units want the units attached to the scalar, independently of the dtype.
>
> Cheers,
> Allan
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
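To make the scalar issue in the quoted message concrete, the sketch below contrasts a type-based check with a dtype-based one. `DuckScalar` and `is_floating` are hypothetical names invented for this example (they are not taken from the linked repo), and `np.issubdtype` is used here as one possible reading of the "or similar" in the quoted text.

```python
import numpy as np


class DuckScalar:
    """Hypothetical duck-typed scalar: a value plus a `.dtype`, nothing more."""

    def __init__(self, value, dtype):
        self.value = value
        self.dtype = np.dtype(dtype)


def is_floating(var):
    # Dtype-based check: works for anything that exposes a `.dtype`.
    return np.issubdtype(var.dtype, np.floating)


x = np.float64(1.5)           # a NumPy scalar
z = np.array(1.5)             # a 0-d array holding the same value
d = DuckScalar(1.5, "float64")

# Type-based checks only recognise genuine NumPy scalars.
print(isinstance(x, np.floating), isinstance(z, np.floating), isinstance(d, np.floating))
print(np.isscalar(x), np.isscalar(z), np.isscalar(d))

# The dtype-based check treats all three alike.
print(is_floating(x), is_floating(z), is_floating(d))
```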
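The mask handling suggested in the reply at the top of this message (or together the input masks, skip masked elements in reductions with `where`, and optionally mask non-finite results) can be sketched with plain NumPy calls. The function names below are hypothetical, and the rule used for the mask of a reduction result (masked only if every contributing element was masked) is an assumption made for this sketch; the message does not specify it.

```python
import numpy as np


def masked_binary(ufunc, a, a_mask, b, b_mask):
    """Apply a two-input ufunc and 'or together' the input masks."""
    result = ufunc(a, b)
    mask = np.logical_or(a_mask, b_mask)
    return result, mask


def masked_sum(a, a_mask, axis=None):
    """Sum while skipping masked elements via the ufunc `where` argument."""
    a_mask = np.asarray(a_mask, dtype=bool)
    result = np.add.reduce(a, axis=axis, where=~a_mask)
    # Assumption: the output is masked only if all inputs to it were masked.
    mask = np.logical_and.reduce(a_mask, axis=axis)
    return result, mask


def mask_nonfinite(result, mask):
    """Optional extra step: also mask elements that came out non-finite."""
    return mask | ~np.isfinite(result)


a = np.array([1.0, 4.0, 0.0])
a_mask = np.array([False, True, False])
b = np.array([2.0, 2.0, 0.0])
b_mask = np.array([False, False, False])

# 0/0 yields nan; the operation itself does not decide what gets masked.
with np.errstate(divide="ignore", invalid="ignore"):
    q, q_mask = masked_binary(np.divide, a, a_mask, b, b_mask)

print(q_mask)                     # the or-ed input masks
print(mask_nonfinite(q, q_mask))  # additionally masks the nan element

total, total_mask = masked_sum(a, a_mask)
print(total, total_mask)          # the masked 4.0 is skipped
```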