On Friday, November 4, 2011, Nathaniel Smith <n...@pobox.com> wrote:
> On Thu, Nov 3, 2011 at 7:54 PM, Gary Strangman
> <str...@nmr.mgh.harvard.edu> wrote:
>> For the non-destructive+propagating case, do I understand correctly that
>> this would mean I (as a user) could temporarily decide to IGNORE certain
>> portions of my data, perform a series of computations on that data, and the
>> IGNORED flag (or however it is implemented) would be propagated from
>> computation to computation? If that's the case, I suspect I'd use it all
>> the time ... to effectively perform data subsetting without generating
>> (partial) copies of large datasets. But maybe I misunderstand the
>> intended notion of propagation ...
>
> I *think* it's more subtle than that, but I admit I'm somewhat
> confused about how exactly people would want IGNORED to work in
> various corner cases. (This is another part of why figuring out our
> audience/use-cases seems like an important first step to me...
> fortunately the semantics for MISSING are, I think, much more clear.)
>
> Say we have
> >>> a = np.array([1, IGNORED(2), 3])
> >>> b = np.array([10, 20, 30])
> (Here I'm using IGNORED(2) to mean a value that is currently
> ignored, but if you unmasked it it would have the value 2.)
>
> Then we have:
>
> # non-propagating *or* propagating, doesn't matter:
> >>> a + 2
> [3, IGNORED(2), 5]
>
> # non-propagating:
> >>> a + b
> One of these, I don't know which:
> [11, IGNORED(2), 33] # numpy.ma chooses this
> [11, 20, 33]
> "Error: shape mismatch"
>
> (An error is maybe the most *consistent* option; the suggestion in the
> alterNEP was that masks had to match on all axes that were *not*
> broadcast, so a + 2 and a + a are okay, but a + b is an error. I
> assume the numpy.ma approach is also useful, but note that it has the
> surprising effect that addition is not commutative: IGNORED(x) +
> IGNORED(y) = IGNORED(x). Try it:
> masked1 = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
> masked2 = np.ma.masked_array([10, 20, 30], mask=[False, True, False])
> np.asarray(masked1 + masked2) # [11, 2, 33]
> np.asarray(masked2 + masked1) # [11, 20, 33]
> I don't really know what people would prefer.)
>
> # propagating:
> >>> a + b
> One of these, I don't know which:
> [11, IGNORED(2), 33] # same as numpy.ma, again
> [11, IGNORED(22), 33]
>
> # non-propagating:
> >>> np.sum(a)
> 4
>
> # propagating:
> >>> np.sum(a)
> One of these, I don't know which:
> IGNORED(4)
> IGNORED(6)
>
> So from your description, I wouldn't say that you necessarily want
> non-destructive+propagating -- it really depends on exactly what
> computations you want to perform, and how you expect them to work. The
> main difference is how reduction operations are treated. I kind of
> feel like the non-propagating version makes more sense overall, but I
> don't know if there's any consensus on that.
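
To make the np.sum(a) alternatives quoted above concrete, here is a toy model in plain Python. It is only a sketch: the Ignored class and the three sum_* helpers are hypothetical names invented for illustration, not part of NumPy or of either NEP, and the lists stand in for arrays.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Ignored:
    """Hypothetical stand-in for IGNORED(x): ignored, but the payload is kept."""
    payload: int


Element = Union[int, Ignored]


def sum_non_propagating(values: List[Element]) -> int:
    """Skip ignored elements; the result is an ordinary number."""
    return sum(v for v in values if not isinstance(v, Ignored))


def sum_propagating_visible(values: List[Element]) -> Element:
    """Any ignored input makes the result ignored; payload sums visible values only."""
    visible = sum(v for v in values if not isinstance(v, Ignored))
    if any(isinstance(v, Ignored) for v in values):
        return Ignored(visible)
    return visible


def sum_propagating_all(values: List[Element]) -> Element:
    """Any ignored input makes the result ignored; payload sums every stored value."""
    total = sum(v.payload if isinstance(v, Ignored) else v for v in values)
    if any(isinstance(v, Ignored) for v in values):
        return Ignored(total)
    return total


a = [1, Ignored(2), 3]
print(sum_non_propagating(a))      # 4
print(sum_propagating_visible(a))  # Ignored(payload=4)
print(sum_propagating_all(a))      # Ignored(payload=6)
```

The two propagating variants differ only in what sits underneath the ignored result, which is exactly the IGNORED(4) versus IGNORED(6) question in the quoted message.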
I think this is further evidence for my idea that a mask should be
non-destructive but should not be undone. If you want to be able to
access the values after masking, keep a view of the unmasked data, or
apply the mask only to a view. Reduction ufuncs make a lot of sense
because they have a basis in mathematics when there are no values.
Reduction ufuncs are covered in great detail in Mark's NEP.

Ben Root
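
One possible reading of "apply the mask only to a view", sketched with the existing numpy.ma module rather than the API proposed in the NEP. It assumes the default copy=False behaviour of np.ma.masked_array, under which the masked array shares its data buffer with the original array; the variable names are illustrative only.

```python
import numpy as np

# The original data stays accessible under its own name.
data = np.array([1, 2, 3])

# A masked view over the same buffer (copy=False is the default),
# so no second copy of the values is made.
masked = np.ma.masked_array(data, mask=[False, True, False])

# Reductions on the masked view skip the masked element.
total = masked.sum()
print(total)  # 4

# The underlying values are untouched and still readable via `data`.
print(data.tolist())  # [1, 2, 3]

# Both objects refer to the same memory.
print(np.shares_memory(masked.data, data))  # True
```

Here the mask is never "undone": code that needs the hidden value reads it from data, while code that should ignore it works with masked.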