Re: [Numpy-discussion] new MaskedArray class

2019-06-19 Thread Allan Haldane
On 6/18/19 2:04 PM, Marten van Kerkwijk wrote:
> 
> 
> On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane  > wrote:
> 
> 
> > This may be too much to ask from the initializer, but, if so, it still
> > seems most useful if it is made as easy as possible to do, say, `class
> > MaskedQuantity(Masked, Quantity): `.
> 
> Currently MaskedArray does not accept ducktypes as underlying arrays,
> but I think it shouldn't be too hard to modify it to do so. Good idea!
> 
> 
> Looking back at my trial, I see that I also never got to duck arrays -
> only ndarray subclasses - though I tried to make the code as agnostic as
> possible.
> 
> (Trial at
> https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1)
> 
> I already partly navigated this mixin-issue in the
> "MaskedArrayCollection" class, which essentially does
> ArrayCollection(MaskedArray(array)), and only takes about 30 lines of
> boilerplate. That's the backwards encapsulation order from what you want
> though.
> 
> 
> Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) * u.m,
> mask=[True, False, False])` does indeed not have a `.unit` attribute
> (and cannot represent itself...); I'm not at all sure that my method of
> just creating a mixed class is anything but a recipe for disaster, though!

Based on your suggestion I worked on this a little today, and now my
MaskedArray more easily encapsulates both ducktypes and ndarray
subclasses (pushed to repo). Here's an example I got working with masked
units using unyt:

[1]: from MaskedArray import X, MaskedArray, MaskedScalar

[2]: from unyt import m, km

[3]: import numpy as np

[4]: uarr = MaskedArray([1., 2., 3.]*km, mask=[0,1,0])

[5]: uarr

MaskedArray([1., X , 3.])
[6]: uarr + 1*m

MaskedArray([1.001, X, 3.001])
[7]: uarr.filled()

unyt_array([1., 0., 3.], 'km')
[8]: np.concatenate([uarr, 2*uarr]).filled()
unyt_array([1., 0., 3., 2., 0., 6.], '(dimensionless)')

The catch is the ducktype/subclass has to rigorously follow numpy's
indexing rules, including distinguishing 0d arrays from scalars. For now
only I used unyt in the example above since it happens to be less strict
 about dimensionless operations than astropy.units which trips up my
repr code. (see below for example with astropy.units). Note in the last
line I lost the dimensions, but that is because unyt does not handle
np.concatenate. To get that to work we need a true ducktype for units.

The example above doesn't expose the ".units" attribute outside the
MaskedArray, and it doesn't print the units in the repr. But you can
access them using "filled".

While I could make MaskedArray forward unknown attribute accesses to the
encapsulated array, that seems a bit dangerous/bug-prone at first
glance, so probably I want to require the user to make a MaskedArray
subclass to do so. I've just started playing with that (probably buggy),
and Ive attached subclass examples for astropy.unit and unyt, with some
example output below.

Cheers,
Allan



Example using the attached astropy unit subclass:

>>> from astropy.units import m, km, s
>>> uarr = MaskedQ(np.ones(3), units=km, mask=[0,1,0])
>>> uarr
MaskedQ([1., X , 1.], units=km)
>>> uarr.units
km
>>> uarr + (1*m)
MaskedQ([1.001, X, 1.001], units=km)
>>> uarr/(1*s)
MaskedQ([1., X , 1.], units=km / s)
>>> (uarr*(1*m))[1:]
MaskedQ([X , 1.], units=km m)
>>> np.add.outer(uarr, uarr)
MaskedQ([[2., X , 2.],
 [X , X , X ],
 [2., X , 2.]], units=km)
>>> print(uarr)
[1. X  1.] km m

Cheers,
Allan


> > Even if this impossible, I think it is conceptually useful to think
> > about what the masking class should do. My sense is that, e.g., it
> > should not attempt to decide when an operation succeeds or not,
> but just
> > "or together" input masks for regular, multiple-input functions,
> and let
> > the underlying arrays skip elements for reductions by using `where`
> > (hey, I did implement that for a reason... ;-). In particular, it
> > suggests one should not have things like domains and all that (I never
> > understood why `MaskedArray` did that). If one wants more, the class
> > should provide a method that updates the mask (a sensible default
> might
> > be `mask |= ~np.isfinite(result)` - here, the class being masked
> should
> > logically support ufuncs and functions, so it can decide what
> "isfinite"
> > means).
> 
> I agree it would be nice to remove domains. It would make life easier,
> and I could remove a lot of twiddly code! I kept it in for now to
> minimize the behavior changes from the old MaskedArray.
> 
> 
> That makes sense. Could be separated out to a backwards-compatibility
> class later.
> 
> 
> > In any case, I would think that a basic truth should be that
> everything
> > has a mask with a shape consistent wit

Re: [Numpy-discussion] new MaskedArray class

2019-06-19 Thread Marten van Kerkwijk
Hi Allan,

This is very impressive! I could get the tests that I wrote for my class
pass with yours using Quantity with what I would consider very minimal
changes. I only could not find a good way to unmask data (I like the idea
of setting the mask on some elements via `ma[item] = X`); is this on
purpose?

Anyway, it would seem easily at the point where I should comment on your
repository rather than in the mailing list!

All the best,

Marten


On Wed, Jun 19, 2019 at 5:45 PM Allan Haldane 
wrote:

> On 6/18/19 2:04 PM, Marten van Kerkwijk wrote:
> >
> >
> > On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane  > > wrote:
> > 
> >
> > > This may be too much to ask from the initializer, but, if so, it
> still
> > > seems most useful if it is made as easy as possible to do, say,
> `class
> > > MaskedQuantity(Masked, Quantity): `.
> >
> > Currently MaskedArray does not accept ducktypes as underlying arrays,
> > but I think it shouldn't be too hard to modify it to do so. Good
> idea!
> >
> >
> > Looking back at my trial, I see that I also never got to duck arrays -
> > only ndarray subclasses - though I tried to make the code as agnostic as
> > possible.
> >
> > (Trial at
> >
> https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1
> )
> >
> > I already partly navigated this mixin-issue in the
> > "MaskedArrayCollection" class, which essentially does
> > ArrayCollection(MaskedArray(array)), and only takes about 30 lines of
> > boilerplate. That's the backwards encapsulation order from what you
> want
> > though.
> >
> >
> > Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) * u.m,
> > mask=[True, False, False])` does indeed not have a `.unit` attribute
> > (and cannot represent itself...); I'm not at all sure that my method of
> > just creating a mixed class is anything but a recipe for disaster,
> though!
>
> Based on your suggestion I worked on this a little today, and now my
> MaskedArray more easily encapsulates both ducktypes and ndarray
> subclasses (pushed to repo). Here's an example I got working with masked
> units using unyt:
>
> [1]: from MaskedArray import X, MaskedArray, MaskedScalar
>
> [2]: from unyt import m, km
>
> [3]: import numpy as np
>
> [4]: uarr = MaskedArray([1., 2., 3.]*km, mask=[0,1,0])
>
> [5]: uarr
>
> MaskedArray([1., X , 3.])
> [6]: uarr + 1*m
>
> MaskedArray([1.001, X, 3.001])
> [7]: uarr.filled()
>
> unyt_array([1., 0., 3.], 'km')
> [8]: np.concatenate([uarr, 2*uarr]).filled()
> unyt_array([1., 0., 3., 2., 0., 6.], '(dimensionless)')
>
> The catch is the ducktype/subclass has to rigorously follow numpy's
> indexing rules, including distinguishing 0d arrays from scalars. For now
> only I used unyt in the example above since it happens to be less strict
>  about dimensionless operations than astropy.units which trips up my
> repr code. (see below for example with astropy.units). Note in the last
> line I lost the dimensions, but that is because unyt does not handle
> np.concatenate. To get that to work we need a true ducktype for units.
>
> The example above doesn't expose the ".units" attribute outside the
> MaskedArray, and it doesn't print the units in the repr. But you can
> access them using "filled".
>
> While I could make MaskedArray forward unknown attribute accesses to the
> encapsulated array, that seems a bit dangerous/bug-prone at first
> glance, so probably I want to require the user to make a MaskedArray
> subclass to do so. I've just started playing with that (probably buggy),
> and Ive attached subclass examples for astropy.unit and unyt, with some
> example output below.
>
> Cheers,
> Allan
>
>
>
> Example using the attached astropy unit subclass:
>
> >>> from astropy.units import m, km, s
> >>> uarr = MaskedQ(np.ones(3), units=km, mask=[0,1,0])
> >>> uarr
> MaskedQ([1., X , 1.], units=km)
> >>> uarr.units
> km
> >>> uarr + (1*m)
> MaskedQ([1.001, X, 1.001], units=km)
> >>> uarr/(1*s)
> MaskedQ([1., X , 1.], units=km / s)
> >>> (uarr*(1*m))[1:]
> MaskedQ([X , 1.], units=km m)
> >>> np.add.outer(uarr, uarr)
> MaskedQ([[2., X , 2.],
>  [X , X , X ],
>  [2., X , 2.]], units=km)
> >>> print(uarr)
> [1. X  1.] km m
>
> Cheers,
> Allan
>
>
> > > Even if this impossible, I think it is conceptually useful to think
> > > about what the masking class should do. My sense is that, e.g., it
> > > should not attempt to decide when an operation succeeds or not,
> > but just
> > > "or together" input masks for regular, multiple-input functions,
> > and let
> > > the underlying arrays skip elements for reductions by using `where`
> > > (hey, I did implement that for a reason... ;-). In particular, it
> > > suggests one should not have things like domains and all that (I
> never
> > > understood why `MaskedArray` did that). If one wants more, th