Re: [Numpy-discussion] new MaskedArray class

Eric Wieser Sun, 23 Jun 2019 15:59:53 -0700

I think we’d need to consider separately the operation on the mask and on
the data. In my proposal, the data would always do np.sum(array,
where=~mask), while how the mask would propagate might depend on the mask
itself,


I quite like this idea, and I think Stephan’s strawman design is actually
plausible, where MaskedArray.mask is either an InvalidMask or an IgnoreMask
instance to pick between the different propagation types. Both classes
could simply have an underlying ._array attribute pointing to a duck-array
of some kind that backs their boolean data.
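A minimal sketch of that design, under stated assumptions: the class names and the ._array attribute come from the discussion above, but the reduce() method and the masked_sum() helper are hypothetical names made up for illustration, and True is assumed to mean "masked". The data is always summed with where=~mask, and only the mask class decides how the mask propagates (logical_and.reduce for "ignore", logical_or.reduce for "invalid", as suggested in the quoted proposal).

```python
import numpy as np


class IgnoreMask:
    """Hypothetical mask type: masked elements are skipped in reductions."""

    def __init__(self, array):
        # Boolean backing data; True marks a masked element (assumption).
        self._array = np.asarray(array, dtype=bool)

    def reduce(self, axis=None):
        # The result is masked only if every input element was masked.
        return np.logical_and.reduce(self._array, axis=axis)


class InvalidMask:
    """Hypothetical mask type: masked elements are contagious in reductions."""

    def __init__(self, array):
        self._array = np.asarray(array, dtype=bool)

    def reduce(self, axis=None):
        # The result is masked if any input element was masked.
        return np.logical_or.reduce(self._array, axis=axis)


def masked_sum(data, mask):
    """Sum the unmasked data; let the mask object decide its own propagation."""
    total = np.sum(data, where=~mask._array)
    return total, mask.reduce()


data = np.array([1.0, 2.0, 3.0])
flags = [False, True, False]
total_ignore, masked_ignore = masked_sum(data, IgnoreMask(flags))
total_invalid, masked_invalid = masked_sum(data, InvalidMask(flags))
```

In both cases the data side computes the same number; the two mask types differ only in whether the result is itself flagged as masked.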

The second version requires that you *also* know how Mask classes work, and
how they implement +

I remain unconvinced that Mask classes should behave differently on
different ufuncs. I don’t think np.minimum(ignore_na, b) is any different
to np.add(ignore_na, b) - either both should produce b, or both should
produce ignore_na. I would lean towards producing ignore_na, and
propagation behavior differing between “ignore” and “invalid” only for
reduce / accumulate operations, where the concept of skipping an
application is well-defined.
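To make the elementwise half of that concrete, here is a rough sketch, assuming masks are plain boolean arrays with True meaning "masked". The helper name apply_binary is hypothetical. The point is only that the output mask is computed the same way whichever ufunc is applied, so np.add and np.minimum cannot disagree about whether an element is masked.

```python
import numpy as np


def apply_binary(ufunc, a_data, a_mask, b_data, b_mask):
    """Apply a binary ufunc elementwise; the mask rule ignores which ufunc it is."""
    out = ufunc(a_data, b_data)
    # An output element is masked whenever either input element was masked.
    out_mask = a_mask | b_mask
    return out, out_mask


a_data = np.array([1, 5])
a_mask = np.array([True, False])   # first element plays the role of ignore_na
b_data = np.array([3, 2])
b_mask = np.array([False, False])

sum_data, sum_mask = apply_binary(np.add, a_data, a_mask, b_data, b_mask)
min_data, min_mask = apply_binary(np.minimum, a_data, a_mask, b_data, b_mask)
```

The values stored under masked positions are whatever the ufunc computed on the underlying data; this sketch makes no claim about what they should be.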

Some possible follow-up questions that having two distinct masked types
raises:

   - what if I want my data to support both invalid and skip fields at the
   same time? sum([invalid, skip, 1]) == invalid
   - is there a use case for more than these two types of mask?
   invalid_due_to_reason_A, invalid_due_to_reason_B would be interesting
   things to track through a calculation, possibly a dictionary of named masks.
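One way these two questions could fit together is sketched below. It is purely illustrative: the function name sum_with_named_masks, the dictionary layout, and the reason names are assumptions, not an existing or proposed API. Skipped and invalid elements are both left out of the data sum, and each named invalid mask is tracked separately, so the result is invalid if any reason fired.

```python
import numpy as np


def sum_with_named_masks(data, invalid_masks, skip_mask):
    """Sum data, skipping masked elements, and report which invalid reasons fired.

    invalid_masks maps a reason name to a boolean array (True = invalid).
    skip_mask is a boolean array (True = skip this element).
    """
    any_invalid = np.zeros(data.shape, dtype=bool)
    for mask in invalid_masks.values():
        any_invalid |= mask
    # Neither skipped nor invalid elements contribute to the data result.
    total = np.sum(data, where=~(any_invalid | skip_mask))
    # "Invalid" is contagious: a reason is reported if it appears anywhere.
    reasons = {name: bool(mask.any()) for name, mask in invalid_masks.items()}
    return total, reasons


data = np.array([10.0, 20.0, 1.0])
invalid_masks = {
    "reason_A": np.array([True, False, False]),
    "reason_B": np.array([False, False, False]),
}
skip_mask = np.array([False, True, False])

total, reasons = sum_with_named_masks(data, invalid_masks, skip_mask)
result_is_invalid = any(reasons.values())
```

This mirrors sum([invalid, skip, 1]) == invalid: the skipped element is dropped silently, while the invalid one marks the whole result.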

Eric

On Sun, 23 Jun 2019 at 15:28, Stephan Hoyer <sho...@gmail.com> wrote:

> On Sun, Jun 23, 2019 at 11:55 PM Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
>> Your proposal would be something like np.sum(array,
>>> where=np.ones_like(array))? This seems rather verbose for a common
>>> operation. Perhaps np.sum(array, where=True) would work, making use of
>>> broadcasting? (I haven't actually checked whether this is well-defined yet.)
>>>
>>> I think we'd need to consider separately the operation on the mask and
>> on the data. In my proposal, the data would always do `np.sum(array,
>> where=~mask)`, while how the mask would propagate might depend on the mask
>> itself, i.e., we'd have different mask types for `skipna=True` (default)
>> and `False` ("contagious") reductions, which differed in doing
>> `logical_and.reduce` or `logical_or.reduce` on the mask.
>>
>
> OK, I think I finally understand what you're getting at. So suppose this
> is how we implement it internally. Would we really insist on a user
> creating a new MaskedArray with a new mask object, e.g., with a GreedyMask?
> We could add sugar for this, but certainly array.greedy_masked().sum() is
> significantly less clear than array.sum(skipna=False).
>
> I'm also a little concerned about a proliferation of MaskedArray/Mask
> types. New types are significantly harder to understand than new functions
> (or new arguments on existing functions). I don't know if we have enough
> distinct use cases for this many types.
>
> Are there use-cases for propagating masks separately from data? If not, it
>>> might make sense to only define mask operations along with data, which
>>> could be much simpler.
>>>
>>
>> I had only thought about separating out the concern of mask propagation
>> from the "MaskedArray" class to the mask proper, but it might indeed make
>> things easier if the mask also did any required preparation for passing
>> things on to the data (such as adjusting the "where" argument in a
>> reduction). I also like that this way the mask can determine even before
>> the data what functionality is available (i.e., it could be the place from
>> which to return `NotImplemented` for a ufunc.at call with a masked index
>> argument).
>>
>
> You're going to have to come up with something more compelling than
> "separation of concerns" to convince me that this extra Mask abstraction is
> worthwhile. On its own, I think a separate Mask class would only obfuscate
> MaskedArray functions.
>
> For example, compare these two implementations of add:
>
> def add1(x, y):
>     return MaskedArray(x.data + y.data, x.mask | y.mask)
>
> def add2(x, y):
>     return MaskedArray(x.data + y.data, x.mask + y.mask)
>
> The second version requires that you *also* know how Mask classes work,
> and how they implement +. So now you need to look in at least twice as many
> places to understand add() for MaskedArray objects.
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>

