On 6/23/19 6:58 PM, Eric Wieser wrote:
>     I think we’d need to consider separately the operation on the mask
>     and on the data. In my proposal, the data would always do
>     |np.sum(array, where=~mask)|, while how the mask would propagate
>     might depend on the mask itself,
> 
> I quite like this idea, and I think Stephan’s strawman design is
> actually plausible, where |MaskedArray.mask| is either an |InvalidMask|
> or a |IgnoreMask| instance to pick between the different propagation
> types. Both classes could simply have an underlying |._array| attribute
> pointing to a duck-array of some kind that backs their boolean data.
> 
>     The second version requires that you /also/ know how Mask classes
>     work, and how they implement +
> 
> I remain unconvinced that Mask classes should behave differently on
> different ufuncs. I don’t think |np.minimum(ignore_na, b)| is any
> different to |np.add(ignore_na, b)| - either both should produce |b|, or
> both should produce |ignore_na|. I would lean towards produxing
> |ignore_na|, and propagation behavior differing between “ignore” and
> “invalid” only for |reduce| / |accumulate| operations, where the concept
> of skipping an application is well-defined.
> 
> Some possible follow-up questions that having two distinct masked types
> raise:
> 
>   * what if I want my data to support both invalid and skip fields at
>     the same time? sum([invalid, skip, 1]) == invalid
>   * is there a use case for more that these two types of mask?
>     |invalid_due_to_reason_A|, |invalid_due_to_reason_B| would be
>     interesting things to track through a calculation, possibly a
>     dictionary of named masks.
> 
> Eric

General comments on the last few emails:

For now I intentionally decided not to worry about NA masks in my
implementation. I want to get a first working implementation finished
for IGNORE style.

I agree it would be nice to have some support for NA style later, either
by a new MaskedArray subclass, a ducktype'd .mask attribute, or by some
other modification. In the latter category, consider that currently the
mask is stored as a boolean (1 byte) mask. One idea I have not put much
thought into is that internally we could make the mask a uint8, so
unmasked would be "0" IGNORE mask would be 1, and NA mask would be 2.
That allows mixing of mask types. Not sure how messy it would be to
implement.

For the idea of replacing the mask by a ducktype for NA style, my
instinct is that would be tricky. Many of the MaskedArray
__array_function__ method implementations, like sort and dot and many
others, do "complicated" computations using the mask that I don't think
you could easily get to work right by simply substituting a mask
ducktype. I think those would need to be reimplemented for NA style, in
other words you would need to make a MaskedArray subclass anyway.

Cheers,
Allan


> On Sun, 23 Jun 2019 at 15:28, Stephan Hoyer <sho...@gmail.com
> <mailto:sho...@gmail.com>> wrote:
> 
>     On Sun, Jun 23, 2019 at 11:55 PM Marten van Kerkwijk
>     <m.h.vankerkw...@gmail.com <mailto:m.h.vankerkw...@gmail.com>> wrote:
> 
>             Your proposal would be something like np.sum(array,
>             where=np.ones_like(array))? This seems rather verbose for a
>             common operation. Perhaps np.sum(array, where=True) would
>             work, making use of broadcasting? (I haven't actually
>             checked whether this is well-defined yet.)
> 
>         I think we'd need to consider separately the operation on the
>         mask and on the data. In my proposal, the data would always do
>         `np.sum(array, where=~mask)`, while how the mask would propagate
>         might depend on the mask itself, i.e., we'd have different mask
>         types for `skipna=True` (default) and `False` ("contagious")
>         reductions, which differed in doing `logical_and.reduce` or
>         `logical_or.reduce` on the mask.
> 
> 
>     OK, I think I finally understand what you're getting at. So suppose
>     this this how we implement it internally. Would we really insist on
>     a user creating a new MaskedArray with a new mask object, e.g., with
>     a GreedyMask? We could add sugar for this, but certainly
>     array.greedy_masked().sum() is significantly less clear than
>     array.sum(skipna=False).
> 
>     I'm also a little concerned about a proliferation of
>     MaskedArray/Mask types. New types are significantly harder to
>     understand than new functions (or new arguments on existing
>     functions). I don't know if we have enough distinct use cases for
>     this many types.
> 
>             Are there use-cases for propagating masks separately from
>             data? If not, it might make sense to only define mask
>             operations along with data, which could be much simpler.
> 
> 
>         I had only thought about separating out the concern of mask
>         propagation from the "MaskedArray" class to the mask proper, but
>         it might indeed make things easier if the mask also did any
>         required preparation for passing things on to the data (such as
>         adjusting the "where" argument in a reduction). I also like that
>         this way the mask can determine even before the data what
>         functionality is available (i.e., it could be the place from
>         which to return `NotImplemented` for a ufunc.at
>         <http://ufunc.at> call with a masked index argument).
> 
> 
>     You're going to have to come up with something more compelling than
>     "separation of concerns" to convince me that this extra Mask
>     abstraction is worthwhile. On its own, I think a separate Mask class
>     would only obfuscate MaskedArray functions.
> 
>     For example, compare these two implementations of add:
> 
>     def  add1(x, y):
>         return MaskedArray(x.data + y.data,  x.mask | y.mask)
> 
>     def  add2(x, y):
>         return MaskedArray(x.data + y.data,  x.mask + y.mask)
> 
>     The second version requires that you *also* know how Mask classes
>     work, and how they implement +. So now you need to look in at least
>     twice as many places to understand add() for MaskedArray objects.
>     _______________________________________________
>     NumPy-Discussion mailing list
>     NumPy-Discussion@python.org <mailto:NumPy-Discussion@python.org>
>     https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

Reply via email to