Re: [Numpy-discussion] new MaskedArray class

Marten van Kerkwijk Tue, 18 Jun 2019 11:05:06 -0700

On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane <allanhald...@gmail.com>
wrote:
<snip>


> > This may be too much to ask from the initializer, but, if so, it still
> > seems most useful if it is made as easy as possible to do, say, `class
> > MaskedQuantity(Masked, Quantity): <very few overrides>`.
>
> Currently MaskedArray does not accept ducktypes as underlying arrays,
> but I think it shouldn't be too hard to modify it to do so. Good idea!
>

Looking back at my trial, I see that I also never got to duck arrays - only
ndarray subclasses - though I tried to make the code as agnostic as
possible.

(Trial at
https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1
)

I already partly navigated this mixin-issue in the
> "MaskedArrayCollection" class, which essentially does
> ArrayCollection(MaskedArray(array)), and only takes about 30 lines of
> boilerplate. That's the backwards encapsulation order from what you want
> though.
>

Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) * u.m,
mask=[True, False, False])` does indeed not have a `.unit` attribute (and
cannot represent itself...); I'm not at all sure that my method of just
creating a mixed class is anything but a recipe for disaster, though!


> > Even if this impossible, I think it is conceptually useful to think
> > about what the masking class should do. My sense is that, e.g., it
> > should not attempt to decide when an operation succeeds or not, but just
> > "or together" input masks for regular, multiple-input functions, and let
> > the underlying arrays skip elements for reductions by using `where`
> > (hey, I did implement that for a reason... ;-). In particular, it
> > suggests one should not have things like domains and all that (I never
> > understood why `MaskedArray` did that). If one wants more, the class
> > should provide a method that updates the mask (a sensible default might
> > be `mask |= ~np.isfinite(result)` - here, the class being masked should
> > logically support ufuncs and functions, so it can decide what "isfinite"
> > means).
>
> I agree it would be nice to remove domains. It would make life easier,
> and I could remove a lot of twiddly code! I kept it in for now to
> minimize the behavior changes from the old MaskedArray.
>

That makes sense. Could be separated out to a backwards-compatibility class
later.


> > In any case, I would think that a basic truth should be that everything
> > has a mask with a shape consistent with the data, so
> > 1. Each complex numbers has just one mask, and setting `a.imag` with a
> > masked array should definitely propagate the mask.
> > 2. For a masked array with structured dtype, I'd similarly say that the
> > default is for a mask to have the same shape as the array. But that
> > something like your collection makes sense for the case where one wants
> > to mask items in a structure.
>
> Agreed that we should have a single bool per complex or structured
> element, and the mask shape is the same as the array shape. That's how I
> implemented it. But there is still a problem with complex.imag assignment:
>
>     >>> a = MaskedArray([1j, 2, X])
>     >>> i = a.imag
>     >>> i[:] = MaskedArray([1, X, 1])
>
> If we make the last line copy the mask to the original array, what
> should the real part of a[2] be? Conversely, if we don't copy the mask,
> what should the imag part of a[1] be? It seems like we might "want" the
> masks to be OR'd instead, but then should i[2] be masked after we just
> set it to 1?
>
> Ah, I see the issue now... Easiest to implement and closest in analogy to
a regular view would be to just let it unmask a[2] (with whatever is in
real; user beware!).

Perhaps better would be to special-case such that `imag` returns a
read-only view of the mask. Making `imag` itself read-only would prevent
possibly reasonable things like `i[np.isclose(i, 0)] = 0` - but there is no
reason this should update the mask.

Still, neither is really satisfactory...


>
> > p.s. I started trying to implement the above "Mixin" class; will try to
> > clean that up a bit so that at least it uses `where` and push it up.
>
> I played with "where", but didn't include it since 1.17 is not released.
> To avoid duplication of effort, I've attached a diff of what I tried. I
> actually get a slight slowdown of about 10% by using where...
>

Your implementation is indeed quite similar to what I got in
__array_ufunc__ (though one should "&" the where with ~mask).

I think the main benefit is not to presume that whatever is underneath
understands 0 or 1, i.e., avoid filling.


> If you make progress with the mixin, a push is welcome. I imagine a
> problem is going to be that np.isscalar doesn't work to detect duck
> scalars.
>
> I fear that in my attempts I've simply decided that only array scalars
exist...

-- Marten

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] new MaskedArray class

Reply via email to