On Fri, Nov 4, 2011 at 5:26 AM, Pierre GM <pgmdevl...@gmail.com> wrote:
>
> On Nov 03, 2011, at 23:07 , Joe Kington wrote:
>
> > I'm not sure if this is exactly a bug, per se, but it's a very confusing
> > consequence of the current design of masked arrays…
>
> I would just add a "I think" between the "but" and "it's" before I could
> agree.
>
> > Consider the following example:
> >
> > import numpy as np
> >
> > x = np.ma.masked_all(10, dtype=np.float32)
> > print x
> > x[x > 0] = 5
> > print x
> >
> > The exact results will vary depending the contents of the empty memory
> > the array was initialized from.
>
> Not a surprise. But isn't mentioned in the doc somewhere that using a
> masked array as index is a very bad idea ? And that you should always fill
> it before you use it as an array ? (Actually, using a MaskedArray as index
> used to raise an IndexError. But I thought it was a bit too harsh, so I
> dropped it).
>

Not that I can find in the docs (Perhaps I just missed it?). At any rate,
it's not mentioned in the numpy.ma section on indexing:
http://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html#indexing-and-slicing

The only mention of it is a comment in MaskedArray.__setitem__ where the
IndexError is commented out.

> ma.masked_all is an empty array with all its elements masked. Ie, you have
> an uninitialized ndarray as data, and a bool array of the same size, full
> of True. The operative word is here "uninitialized".
>
> > This wreaks havoc when filtering the contents of masked arrays (and
> > leads to hard-to-find bugs!). The mask of the array in question is altered
> > at random (or, rather, based on the masked values as well as the masked
> > ones).
>
> Once again, you're working on an *uninitialized* array. What you should
> really do is to initialize it first, e.g. by 0, or whatever would make
> sense in your field, and then work from that.
>

Sure, I shouldn't have used that as the example. My point was that it's
counter-intuitive that something like "x[x > 0] = 0" alters the mask of x
based on the values of _masked_ elements. How it's initialized is irrelevant
(though, of course, it wouldn't be semi-random if it were initialized in
another way).

> > I can see the reasoning behind the way it works. It makes sense that "x >
> > 0" returns a masked boolean array with potentially several elements
> > masked, as well as the unmasked elements greater than 0.
>
> Well, "x > 0" is also a masked array, with its mask full of True. Not very
> usable by itself, and especially *not* for indexing.
>
> > However, wouldn't it make more sense to have MaskedArray.__setitem__
> > only operate on the unmasked elements of the "indx" passed in (at least in
> > the case where the assigned "value" isn't a masked array)?
>
> Normally, that should be the case. But you're not working in "normal"
> conditions, here. A bit like trying to boil water on a stove with a plastic
> pan.
>

"x[x > threshold] = something" is a very common idiom for ndarrays. I think
most people would find it surprising that this operation doesn't ignore the
masked values.

I noticed this because one of my coworkers was complaining that a piece of my
code was "messing up" their masked arrays. I'd never tested it with masked
arrays, but it took me ages to find, just because I wasn't looking in places
where I was just using common idioms. In this particular case, they'd
initialized it with "masked_all", so it effectively altered the mask of the
array at random. Regardless of how it was initialized, though, it is
surprising that the mask of "x" is changed based on masked values.

I just think it would be useful for it to be documented.

Cheers,
-Joe
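
For anyone hitting the same thing, here is a minimal sketch of the "fill it before you use it as an index" advice quoted above. The helper name set_where_unmasked and the sample values are only illustrative assumptions, not anything in numpy.ma, and the behaviour of indexing directly with a masked condition may differ between NumPy versions. The idea is to turn the masked boolean condition into a plain boolean ndarray in which masked entries count as False, so the assignment only touches unmasked elements:

```python
import numpy as np


def set_where_unmasked(arr, cond, value):
    """Hypothetical helper: assign `value` where `cond` is True and not masked."""
    # np.ma.filled replaces masked entries of the condition with False and
    # returns a plain boolean ndarray, which is safe to use as an index.
    index = np.ma.filled(cond, False)
    arr[index] = value
    return arr


# Illustrative data: the third element is masked, but its underlying value
# (3.0) would satisfy "x > 0" if the mask were ignored.
x = np.ma.array([1.0, -2.0, 3.0, 4.0], mask=[False, False, True, False])
set_where_unmasked(x, x > 0, 5.0)
print(x)
```

With the filled index, the masked element keeps both its mask and its underlying data; only the unmasked elements greater than 0 are overwritten.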