Hi,

On Thu, Jun 30, 2011 at 2:58 PM, Pierre GM <pgmdevl...@gmail.com> wrote:
>
> On Jun 30, 2011, at 3:31 PM, Matthew Brett wrote:
>
>> ################################################
>> An alternative NEP on masking and missing values
>> ################################################
>
> I like the idea of two different special values, np.NA for missing values,
> np.IGNORE for masked values. np.NA values in an array define what was
> implemented in numpy.ma as a 'hard mask' (where you can't unmask data), while
> np.IGNOREs correspond to the .mask in numpy.ma. It looks fairly unambiguous
> that way.
>
>
>> **************
>> Initialization
>> **************
>>
>> First, missing values can be set and displayed as ``np.NA, NA``::
>>
>>     >>> np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]')
>>     array([1., 2., NA, 7.], dtype='NA[<f8]')
>>
>> As the initialization is not ambiguous, this can be written without the NA
>> dtype::
>>
>>     >>> np.array([1.0, 2.0, np.NA, 7.0])
>>     array([1., 2., NA, 7.], dtype='NA[<f8]')
>>
>> Masked values can be set and displayed as ``np.MASKED, MASKED``::
>>
>>     >>> np.array([1.0, 2.0, np.MASKED, 7.0], masked=True)
>>     array([1., 2., MASKED, 7.], masked=True)
>>
>> As the initialization is not ambiguous, this can be written without
>> ``masked=True``::
>>
>>     >>> np.array([1.0, 2.0, np.MASKED, 7.0])
>>     array([1., 2., MASKED, 7.], masked=True)
>
> I'm not happy with this 'masked' parameter at all. What's the point? Either
> you have np.NAs and/or np.IGNOREs or you don't. I'm probably missing
> something here.

If I put np.MASKED (I agree, I prefer np.IGNORE) in the init, then obviously I
mean it should be masked, so yes, I agree the 'masked=True' here is completely
redundant. In fact:

    np.array([1.0, 2.0, np.MASKED, 7.0], masked=False)

should raise an error. On the other hand, if I make a normal array:

    arr = np.array([1.0, 2.0, 7.0])

and then do this:

    arr.visible[2] = False

then either we should raise an error (it's not a masked array) or, more
magically, construct a mask on the fly. The latter somewhat breaks
expectations, though, because you might have just allocated a largish mask
array without having any clue that it had happened.

>
>> ******
>> Ufuncs
>> ******
>
> All fine.
>
>> **********
>> Assignment
>> **********
>>
>> Assignment is obvious in the NA case::
>>
>>     >>> arr = np.array([1.0, 2.0, 7.0])
>>     >>> arr[2] = np.NA
>>     TypeError('dtype does not support NA')
>>     >>> na_arr = np.array([1.0, 2.0, 7.0], dtype='NA[f8]')
>>     >>> na_arr[2] = np.NA
>>     >>> na_arr
>>     array([1., 2., NA], dtype='NA[<f8]')
>
> OK
>
>
>>
>> Direct assignment in the masked case is magic and confusing, and so happens
>> only via the mask::
>>
>>     >>> masked_arr = np.array([1.0, 2.0, 7.0], masked=True)
>>     >>> masked_arr[2] = np.NA
>>     TypeError('dtype does not support NA')
>>     >>> masked_arr[2] = np.MASKED
>>     TypeError('float() argument must be a string or a number')
>>     >>> masked_arr.visible[2] = False
>>     >>> masked_arr
>>     array([1., 2., MASKED], masked=True)
>
> What about the reverse case? What happens when you assign a regular value to
> an np.NA/np.IGNORE item?

Well, for the np.NA case, this is straightforward:

    na_arr[2] = 3

It's just assignment. For ``masked_arr[2] = 3``, I don't know; I guess
whatever we are used to. What do you think?

Best,

Matthew
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
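For comparison, the existing numpy.ma module already shows two of the behaviours discussed in this thread: it allocates a mask lazily, the first time an element is masked, and its soft/hard mask setting (the 'hard mask' Pierre mentions) decides what happens when a regular value is assigned to a masked element. The sketch below uses only numpy.ma; the proposed np.NA, np.MASKED, dtype='NA[f8]', masked= and .visible names are not a real API, so this illustrates current numpy.ma semantics, not what either NEP specifies.

```python
import numpy as np

# numpy.ma only: the NEP's np.NA / np.MASKED / .visible do not exist.
# A masked array built from plain data starts without a mask array;
# numpy.ma stores the shared sentinel np.ma.nomask instead.
arr = np.ma.array([1.0, 2.0, 7.0])
mask_before = np.ma.getmask(arr)
print(mask_before is np.ma.nomask)  # True

# Masking a single element makes numpy.ma allocate a full boolean mask
# (one entry per element) on the fly.
arr[2] = np.ma.masked
print(arr.mask.tolist())  # [False, False, True]

# Soft mask (the default): assigning a regular value unmasks the element.
soft = np.ma.array([1.0, 2.0, 7.0], mask=[False, False, True])
soft[2] = 3.0
print(soft.tolist())  # [1.0, 2.0, 3.0]

# Hard mask: the assignment is ignored, the element stays masked and the
# underlying data keeps its old value.
hard = np.ma.array([1.0, 2.0, 7.0], mask=[False, False, True], hard_mask=True)
hard[2] = 3.0
print(hard.tolist())       # [1.0, 2.0, None]
print(hard.data.tolist())  # [1.0, 2.0, 7.0]
```

With the default soft mask, assigning 3.0 silently unmasks the element, which is one possible reading of "whatever we are used to" for ``masked_arr[2] = 3``; a hard mask instead keeps the element hidden, closer to the non-removable np.NA that Pierre describes.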