On Thu, Nov 3, 2011 at 7:54 PM, Gary Strangman <str...@nmr.mgh.harvard.edu> wrote:
> For the non-destructive+propagating case, do I understand correctly that
> this would mean I (as a user) could temporarily decide to IGNORE certain
> portions of my data, perform a series of computation on that data, and the
> IGNORED flag (or however it is implemented) would be propagated from
> computation to computation? If that's the case, I suspect I'd use it all
> the time ... to effectively perform data subsetting without generating
> (partial) copies of large datasets. But maybe I misunderstand the
> intended notion of propagation ...

I *think* it's more subtle than that, but I admit I'm somewhat confused about how exactly people would want IGNORED to work in various corner cases. (This is another part of why figuring out our audience/use-cases seems like an important first step to me... fortunately the semantics for MISSING are, I think, much more clear.)

Say we have

  >>> a = np.array([1, IGNORED(2), 3])
  >>> b = np.array([10, 20, 30])

(Here I'm using IGNORED(2) to mean a value that is currently ignored, but if you unmasked it, it would have the value 2.)

Then we have:

  # non-propagating *or* propagating, doesn't matter:
  >>> a + 2
  [3, IGNORED(2), 5]

  # non-propagating:
  >>> a + b
  One of these, I don't know which:
    [11, IGNORED(2), 33]  # numpy.ma chooses this
    [11, 20, 33]
    "Error: shape mismatch"

(An error is maybe the most *consistent* option; the suggestion in the alterNEP was that masks had to match on all axes that were *not* broadcast, so a + 2 and a + a are okay, but a + b is an error. I assume the numpy.ma approach is also useful, but note that it has the surprising effect that addition is not commutative: IGNORED(x) + IGNORED(y) = IGNORED(x). Try it:

  masked1 = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
  masked2 = np.ma.masked_array([10, 20, 30], mask=[False, True, False])
  np.asarray(masked1 + masked2)  # [11, 2, 33]
  np.asarray(masked2 + masked1)  # [11, 20, 33]

I don't really know what people would prefer.)

  # propagating:
  >>> a + b
  One of these, I don't know which:
    [11, IGNORED(2), 33]  # same as numpy.ma, again
    [11, IGNORED(22), 33]

  # non-propagating:
  >>> np.sum(a)
  4

  # propagating:
  >>> np.sum(a)
  One of these, I don't know which:
    IGNORED(4)
    IGNORED(6)

So from your description, I wouldn't say that you necessarily want non-destructive+propagating -- it really depends on exactly what computations you want to perform, and how you expect them to work. The main difference is how reduction operations are treated.

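To make the reduction difference concrete, here is a rough sketch using tools that exist today. It assumes numpy.ma's mask as a stand-in for the IGNORED flag (non-propagating), and emulates propagation with NaN, which is closer to MISSING than to IGNORED; neither is the proposed IGNORED implementation, and the variable names are just for illustration.

```python
import numpy as np

# Stand-in for a = [1, IGNORED(2), 3]: numpy.ma's mask plays the IGNORED flag.
a = np.ma.masked_array([1, 2, 3], mask=[False, True, False])

# Non-propagating reduction: the ignored element is simply skipped.
nonpropagating_sum = a.sum()        # 1 + 3

# The hidden payload is still stored under the mask, which is why a
# propagating sum could plausibly report either IGNORED(4) or IGNORED(6).
payload_sum = a.data.sum()          # 1 + 2 + 3

# Propagating reduction, emulated (as an assumption) with NaN as the marker:
a_nan = np.array([1.0, np.nan, 3.0])
propagating_sum = np.sum(a_nan)     # the marker "poisons" the result
skipping_sum = np.nansum(a_nan)     # explicit opt-out of propagation

print(nonpropagating_sum, payload_sum, propagating_sum, skipping_sum)
```

With the mask, the reduction quietly gives the subset answer; with the NaN emulation, you have to ask for that explicitly via np.nansum.
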
I kind of feel like the non-propagating version makes more sense overall, but I don't know if there's any consensus on that.

(You also have the option of just using the new where= argument to your ufuncs, which avoids some of this confusion because it gives a single mask that would apply to the whole operation. The ambiguities here arise because it's not clear what to do when applying a binary operation to two arrays that have different masks.)

Maybe you could give some examples of the kinds of computations you're thinking of?

-- Nathaniel
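
For reference, a minimal sketch of the where= approach mentioned above. The array names and the `keep` mask are made up for illustration, and passing where= to a ufunc reduction assumes a reasonably recent NumPy (1.17 or later).

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])
keep = np.array([True, False, True])   # False marks the element to ignore

# Binary ufunc: positions where keep is False are not computed at all and
# retain whatever was already in `out`, so pre-fill it with a known value.
out = np.zeros_like(a)
np.add(a, b, out=out, where=keep)

# Reduction: only the selected elements contribute to the sum.
total = np.add.reduce(a, where=keep)

print(out, total)
```

Because one mask governs the whole call, the question of how to combine two different masks never comes up.
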