On Tue, Apr 17, 2012 at 6:44 AM, Travis Oliphant <tra...@continuum.io> wrote:
> Basically, there are two sets of changes as far as I understand right now:
>
> 1) ufunc infrastructure understands masked arrays
> 2) ndarray grew attributes to represent masked arrays
>
> I am proposing that we keep 1) but change 2) so that only certain kinds of
> NumPy arrays actually have the extra function pointers (effectively a
> sub-type). In essence, what I'm proposing is that the NumPy 1.6
> PyArrayObject become a base-object, but the other members of the C-structure
> are not even present unless the Masked flag is set. Such changes would not
> require ripping code out --- just altering the presentation a bit. Yet,
> they could have large long-term implications, that we should explore before
> they get fixed.
>
> Whether masked arrays should be a formal sub-class is actually an un-related
> question and I generally lean in the direction of not encouraging sub-classes
> of the ndarray. The big questions are does this object work in the
> calculation infrastructure. Can I add an array to a masked array. Does it
> have a sum method? I think it could be argued that a masked array does have
> a "is a" relationship with an array. It can also be argued that it is
> better to have a "has a" relationship with an array and be-it's own-object.
> Either way, this object could still have it's first-part be binary compatible
> with a NumPy Array, and that is what I'm really suggesting.

It sounds like the main implementation issue here is that this masked array class needs some way to coordinate with the ufunc infrastructure to efficiently and reliably handle the mask in calculations. The core ufunc code now knows how to handle masks, and this functionality is needed for where= and NA-dtypes, so obviously it's staying, independent of what we decide to do with masked arrays. So the question is just: how do we get the masked array and the ufuncs talking to each other so they can do the right thing?

Perhaps we should focus, then, on how to create a better hooking mechanism for ufuncs? Something along these lines?
  http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html
If done in a solid enough way, this would also solve other problems, e.g. we could make ufuncs work reliably on sparse matrices, which seems to trip people up on scipy-user every month or two. Of course, it's very tricky to get right :-(

As far as the masked array API goes: I'm still not convinced we know how we want these things to behave. The masked arrays in master currently implement MISSING semantics, but AFAICT everyone who wants MISSING semantics prefers NA-dtypes or even plain old NaNs over a masked implementation. And some of the current implementation's biggest backers, like Chuck, have argued that they should switch to skipNA=True, which is more of an IGNORED-style semantic. OTOH, there's still disagreement over how IGNORED-style semantics should even work (I'm thinking of that discussion about commutativity).

The best existing model is numpy.ma -- but the numpy.ma API is quite different from the NEP, in more ways than just the default setting for skipNA. numpy.ma uses the opposite convention for mask values, it has additional concepts like the fill_value and hardmask versus softmask, and then there's the whole way the NEP uses views to manage the mask.
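
For readers less familiar with it, here is a minimal sketch of the numpy.ma conventions mentioned above (mask polarity, fill values, hard versus soft masks), next to the ufunc where= argument. It assumes a reasonably recent NumPy release; the array values are arbitrary illustrations, not anything taken from this thread.

```python
import numpy as np
import numpy.ma as ma

# numpy.ma convention: mask=True means "this element is masked out".
a = ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, False])

total = a.sum()          # reductions skip masked elements
filled = a.filled(-1.0)  # plain ndarray with masked slots replaced by a fill value

# Soft mask (the default): assigning to a masked element unmasks it.
soft = a.copy()
soft[1] = 20.0

# Hard mask: assignments to masked elements are ignored.
hard = a.copy()
hard.harden_mask()
hard[1] = 20.0

# The ufunc where= argument has the opposite sense: True means "compute here".
out = np.zeros(4)
np.add(a.data, 10.0, out=out, where=~a.mask)
```

After this runs, soft has no masked elements left, while hard still has element 1 masked with its underlying data unchanged.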
And I don't know which of these numpy.ma features are useful, which are extraneous, and which are currently useful but will become extraneous once the users who really wanted something more like NA-dtypes switch to those.

So we all agree that masked arrays can be useful, and that numpy.ma has problems. But straightforwardly porting numpy.ma to C doesn't seem like it would help much, and neither does simply declaring that numpy.ma has been deprecated in favour of a new NEP-like API.

So, I dunno. It seems like it might make the most sense to:
1) take the mask fields out of the core ndarray (while leaving the rest of Mark's infrastructure, as per above)
2) make sure we have the hooks needed so that numpy.ma, and NEP-like APIs, and whatever other experiments people want to try, can all integrate well with ufuncs, and make any other extensions that are generally useful and required so that they can work well
3) once we've experimented, move the winner into the core.
Or whatever else makes sense to do once we understand what we're trying to accomplish.

-- Nathaniel
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion