On 6/18/19 10:06 AM, Marten van Kerkwijk wrote: > Hi Allen, > > Thanks for the message and link! In astropy, we've been struggling with > masking a lot, and one of the main conclusions I have reached is that > ideally one has a more abstract `Masked` class that can take any type of > data (including `ndarray`, of course), and behaves like that data as > much as possible, to the extent that if, e.g., I create a > `Masked(Quantity(..., unit), mask=...)`, the instance will have a > `.unit` attribute and perhaps even `isinstance(..., Quantity)` will > hold. And similarly for `Masked(Time(...), mask=...)`, > `Masked(SkyCoord(...), mask=...)`, etc. In a way, `Masked` would be a > kind of Mixin-class that just tracks a mask attribute. > > This may be too much to ask from the initializer, but, if so, it still > seems most useful if it is made as easy as possible to do, say, `class > MaskedQuantity(Masked, Quantity): <very few overrides>`.
Currently MaskedArray does not accept ducktypes as underlying arrays, but I think it shouldn't be too hard to modify it to do so. Good idea! I already partly navigated this mixin-issue in the "MaskedArrayCollection" class, which essentially does ArrayCollection(MaskedArray(array)), and only takes about 30 lines of boilerplate. That's the backwards encapsulation order from what you want though. > Even if this impossible, I think it is conceptually useful to think > about what the masking class should do. My sense is that, e.g., it > should not attempt to decide when an operation succeeds or not, but just > "or together" input masks for regular, multiple-input functions, and let > the underlying arrays skip elements for reductions by using `where` > (hey, I did implement that for a reason... ;-). In particular, it > suggests one should not have things like domains and all that (I never > understood why `MaskedArray` did that). If one wants more, the class > should provide a method that updates the mask (a sensible default might > be `mask |= ~np.isfinite(result)` - here, the class being masked should > logically support ufuncs and functions, so it can decide what "isfinite" > means). I agree it would be nice to remove domains. It would make life easier, and I could remove a lot of twiddly code! I kept it in for now to minimize the behavior changes from the old MaskedArray. > In any case, I would think that a basic truth should be that everything > has a mask with a shape consistent with the data, so > 1. Each complex numbers has just one mask, and setting `a.imag` with a > masked array should definitely propagate the mask. > 2. For a masked array with structured dtype, I'd similarly say that the > default is for a mask to have the same shape as the array. But that > something like your collection makes sense for the case where one wants > to mask items in a structure. Agreed that we should have a single bool per complex or structured element, and the mask shape is the same as the array shape. That's how I implemented it. But there is still a problem with complex.imag assignment: >>> a = MaskedArray([1j, 2, X]) >>> i = a.imag >>> i[:] = MaskedArray([1, X, 1]) If we make the last line copy the mask to the original array, what should the real part of a[2] be? Conversely, if we don't copy the mask, what should the imag part of a[1] be? It seems like we might "want" the masks to be OR'd instead, but then should i[2] be masked after we just set it to 1? > All the best, > > Marten > > p.s. I started trying to implement the above "Mixin" class; will try to > clean that up a bit so that at least it uses `where` and push it up. I played with "where", but didn't include it since 1.17 is not released. To avoid duplication of effort, I've attached a diff of what I tried. I actually get a slight slowdown of about 10% by using where... If you make progress with the mixin, a push is welcome. I imagine a problem is going to be that np.isscalar doesn't work to detect duck scalars. Cheers, Allan > On Mon, Jun 17, 2019 at 6:43 PM Allan Haldane <allanhald...@gmail.com > <mailto:allanhald...@gmail.com>> wrote: > > Hi all, > > Chuck suggested we think about a MaskedArray replacement for 1.18. > > A few months ago I did some work on a MaskedArray replacement using > `__array_function__`, which I got mostly working. It seems like a good > time to bring it up for discussion now. See it at: > > https://github.com/ahaldane/ndarray_ducktypes > > It should be very usable, it has docs you can read, and it passes a > pytest-suite with 1024 tests adapted from numpy's MaskedArray tests. > What is missing? It needs even more tests for new functionality, and a > couple numpy-API functions are missing, in particular `np.median`, > `np.percentile`, `np.cov`, and `np.corrcoef`. I'm sure other devs could > also find many things to improve too. > > Besides fixing many little annoyances from MaskedArray, and simplifying > the logic by always storing the mask in full, it also has new features. > For instance it allows the use of a "X" variable to mark masked > locations during array construction, and I solve the issue of how to > mask individual fields of a structured array differently. > > At this point I would by happy to get some feedback on the design and > what seems good or bad. If it seems like a good start, I'd be happy to > move it into a numpy repo of some sort for further collaboration & > discussion, and maybe into 1.18. At the least I hope it can serve as a > design study of what we could do. > > > > > > Let me also drop here two more interesting detailed issues: > > First, the issue of what to do about .real and .imag of complex arrays, > and similarly about field-assignment of structured arrays. The problem > is that we have a single mask bool per element of a complex array, but > what if someone does `arr.imag = MaskedArray([1,X,1])`? How should the > mask of the original array change? Should we make .real and .imag > readonly? > > Second, a more general issue of how to ducktype scalars when using > `__array_function__` which I think many ducktype implementors will have > to face. For MaskedArray, I created an associated "MaskedScalar" type. > However, MaskedScalar has to behave differently from normal numpy > scalars in a number of ways: It is not part of the numpy scalar > hierarchy, it fails checks `isinstance(var, np.floating)`, and > np.isscalar returns false. Numpy scalar types cannot be subclassed. We > have discussed before the need to have distinction between 0d-arrays and > scalars, so we shouldn't just use a 0d (in fact, this makes printing > very difficult). This leads me to think that in future dtype-overhaul > plans, we should consider creating a subclassable `np.scalar` base type > to wrap all numpy scalar variables, and code like `isinstance(var, > np.floating)` should be replaced by `isinstance(var.dtype.type, > np.floating)` or similar. That is, the numeric dtype of the scalar is no > longer encoded in `type(var)` but in `var.dtype`: The fact that the > variable is a numpy scalar is decoupled from its numeric dtype. > > This is useful because there are many "associated" properties of scalars > in common with arrays which have nothing to do with the dtype, which > ducktype implementors want to touch. I imagine this will come up a lot: > In that repo I also have an "ArrayCollection" ducktype which required a > "CollectionScalar" scalar, and similarly I imagine people implementing > units want the units attached to the scalar, independently of the dtype. > > Cheers, > Allan > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@python.org <mailto:NumPy-Discussion@python.org> > https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@python.org > https://mail.python.org/mailman/listinfo/numpy-discussion >
diff --git a/MaskedArray.py b/MaskedArray.py index 2df12b4..272699f 100755 --- a/MaskedArray.py +++ b/MaskedArray.py @@ -904,12 +904,23 @@ class _Masked_BinOp(_Masked_UFunc): if isinstance(initial, (MaskedScalar, MaskedX)): raise ValueError("initial should not be masked") - if not np.isscalar(da): - da[ma] = self.reduce_fill(da.dtype) - # if da is a scalar, we get correct result no matter fill + if 1: # two different implementations, investigate performance + wheremask = ~ma + if 'where' in kwargs: + wheremask |= kwargs['where'] + kwargs['where'] = wheremask + if 'initial' not in kwargs: + kwargs['initial'] = self.reduce_fill(da.dtype) + + result = self.f.reduce(da, **kwargs) + m = np.logical_and.reduce(ma, **mkwargs) + else: + if not np.isscalar(da): + da[ma] = self.reduce_fill(da.dtype) + # if da is a scalar, we get correct result no matter fill - result = self.f.reduce(da, **kwargs) - m = np.logical_and.reduce(ma, **mkwargs) + result = self.f.reduce(da, **kwargs) + m = np.logical_and.reduce(ma, **mkwargs) ## Code that might be used to support domained ufuncs. WIP #with np.errstate(divide='ignore', invalid='ignore'):
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion