I think we’d need to consider separately the operation on the mask and on the data. In my proposal, the data would always do np.sum(array, where=~mask), while how the mask would propagate might depend on the mask itself,
I quite like this idea, and I think Stephan’s strawman design is actually plausible, where MaskedArray.mask is either an InvalidMask or a IgnoreMask instance to pick between the different propagation types. Both classes could simply have an underlying ._array attribute pointing to a duck-array of some kind that backs their boolean data. The second version requires that you *also* know how Mask classes work, and how they implement + I remain unconvinced that Mask classes should behave differently on different ufuncs. I don’t think np.minimum(ignore_na, b) is any different to np.add(ignore_na, b) - either both should produce b, or both should produce ignore_na. I would lean towards produxing ignore_na, and propagation behavior differing between “ignore” and “invalid” only for reduce / accumulate operations, where the concept of skipping an application is well-defined. Some possible follow-up questions that having two distinct masked types raise: - what if I want my data to support both invalid and skip fields at the same time? sum([invalid, skip, 1]) == invalid - is there a use case for more that these two types of mask? invalid_due_to_reason_A, invalid_due_to_reason_B would be interesting things to track through a calculation, possibly a dictionary of named masks. Eric On Sun, 23 Jun 2019 at 15:28, Stephan Hoyer <sho...@gmail.com> wrote: > On Sun, Jun 23, 2019 at 11:55 PM Marten van Kerkwijk < > m.h.vankerkw...@gmail.com> wrote: > >> Your proposal would be something like np.sum(array, >>> where=np.ones_like(array))? This seems rather verbose for a common >>> operation. Perhaps np.sum(array, where=True) would work, making use of >>> broadcasting? (I haven't actually checked whether this is well-defined yet.) >>> >>> I think we'd need to consider separately the operation on the mask and >> on the data. In my proposal, the data would always do `np.sum(array, >> where=~mask)`, while how the mask would propagate might depend on the mask >> itself, i.e., we'd have different mask types for `skipna=True` (default) >> and `False` ("contagious") reductions, which differed in doing >> `logical_and.reduce` or `logical_or.reduce` on the mask. >> > > OK, I think I finally understand what you're getting at. So suppose this > this how we implement it internally. Would we really insist on a user > creating a new MaskedArray with a new mask object, e.g., with a GreedyMask? > We could add sugar for this, but certainly array.greedy_masked().sum() is > significantly less clear than array.sum(skipna=False). > > I'm also a little concerned about a proliferation of MaskedArray/Mask > types. New types are significantly harder to understand than new functions > (or new arguments on existing functions). I don't know if we have enough > distinct use cases for this many types. > > Are there use-cases for propagating masks separately from data? If not, it >>> might make sense to only define mask operations along with data, which >>> could be much simpler. >>> >> >> I had only thought about separating out the concern of mask propagation >> from the "MaskedArray" class to the mask proper, but it might indeed make >> things easier if the mask also did any required preparation for passing >> things on to the data (such as adjusting the "where" argument in a >> reduction). I also like that this way the mask can determine even before >> the data what functionality is available (i.e., it could be the place from >> which to return `NotImplemented` for a ufunc.at call with a masked index >> argument). >> > > You're going to have to come up with something more compelling than > "separation of concerns" to convince me that this extra Mask abstraction is > worthwhile. On its own, I think a separate Mask class would only obfuscate > MaskedArray functions. > > For example, compare these two implementations of add: > > def add1(x, y): > return MaskedArray(x.data + y.data, x.mask | y.mask) > > def add2(x, y): > return MaskedArray(x.data + y.data, x.mask + y.mask) > > The second version requires that you *also* know how Mask classes work, > and how they implement +. So now you need to look in at least twice as many > places to understand add() for MaskedArray objects. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@python.org > https://mail.python.org/mailman/listinfo/numpy-discussion >
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion