On Thu, Jun 30, 2011 at 7:31 AM, Matthew Brett <matthew.br...@gmail.com> wrote:
> Hi,
>
> On Tue, Jun 28, 2011 at 4:06 PM, Nathaniel Smith <n...@pobox.com> wrote:
> > Anyway, it's pretty clear that in this particular case, there are two
> > distinct features that different people want: the missing data
> > feature, and the masked array feature. The more I think about it, the
> > less I see how they can be combined into one dessert topping + floor
> > wax solution. Here are three particular points where they seem to
> > contradict each other:
> ...
> [some proposals]
>
> In the interest of making the discussion as concrete as possible, here
> is my draft of an alternative proposal for NAs and masking, based on
> Nathaniel's comments. Writing it, it seemed to me that Nathaniel is
> right, that the ideas become much clearer when the NA idea and the
> MASK idea are separate. Please do pitch in for things I may have
> missed or misunderstood:
>
> ################################################
> An alternative-NEP on masking and missing values
> ################################################
>
> The principle of this aNEP is to separate the APIs for masking and for
> missing values, according to
>
> * The current implementation of masked arrays
> * Nathaniel Smith's proposal.
>
> This discussion is only of the API, and not of the implementation.
>
> **************
> Initialization
> **************
>
> First, missing values can be set and be displayed as ``np.NA, NA``::
>
>     >>> np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]')
>     array([1., 2., NA, 7.], dtype='NA[<f8]')
>
> As the initialization is not ambiguous, this can be written without the
> NA dtype::
>
>     >>> np.array([1.0, 2.0, np.NA, 7.0])
>     array([1., 2., NA, 7.], dtype='NA[<f8]')
>
> Masked values can be set and be displayed as ``np.MASKED, MASKED``::
>
>     >>> np.array([1.0, 2.0, np.MASKED, 7.0], masked=True)
>     array([1., 2., MASKED, 7.], masked=True)
>
> As the initialization is not ambiguous, this can be written without
> ``masked=True``::
>
>     >>> np.array([1.0, 2.0, np.MASKED, 7.0])
>     array([1., 2., MASKED, 7.], masked=True)
>
> ******
> Ufuncs
> ******
>
> By default, NA values propagate::
>
>     >>> na_arr = np.array([1.0, 2.0, np.NA, 7.0])
>     >>> np.sum(na_arr)
>     NA('float64')
>
> unless the ``skipna`` flag is set::
>
>     >>> np.sum(na_arr, skipna=True)
>     10.0
>
> By default, masking does not propagate::
>
>     >>> masked_arr = np.array([1.0, 2.0, np.MASKED, 7.0])
>     >>> np.sum(masked_arr)
>     10.0
>
> unless the ``propmsk`` flag is set::
>
>     >>> np.sum(masked_arr, propmsk=True)
>     MASKED
>
> An array can be masked, and contain NA values::
>
>     >>> both_arr = np.array([1.0, 2.0, np.MASKED, np.NA, 7.0])
>
> In the default case, the behavior is obvious::
>
>     >>> np.sum(both_arr)
>     NA('float64')
>
> It's also obvious what to do with ``skipna=True``::
>
>     >>> np.sum(both_arr, skipna=True)
>     10.0
>     >>> np.sum(both_arr, skipna=True, propmsk=True)
>     MASKED
>
> To break the tie between NA and MSK, NAs propagate harder::
>
>     >>> np.sum(both_arr, propmsk=True)
>     NA('float64')
>
> **********
> Assignment
> **********
>
> is obvious in the NA case::
>
>     >>> arr = np.array([1.0, 2.0, 7.0])
>     >>> arr[2] = np.NA
>     TypeError('dtype does not support NA')
>     >>> na_arr = np.array([1.0, 2.0, 7.0], dtype='NA[f8]')
>     >>> na_arr[2] = np.NA
>     >>> na_arr
>     array([1., 2., NA], dtype='NA[<f8]')
>
> Direct assignment in the masked case is magic and confusing, and so
> happens only via the mask::
>
>     >>> masked_arr = np.array([1.0, 2.0, 7.0], masked=True)
>     >>> masked_arr[2] = np.NA
>     TypeError('dtype does not support NA')
>     >>> masked_arr[2] = np.MASKED
>     TypeError('float() argument must be a string or a number')
>     >>> masked_arr.visible[2] = False
>     >>> masked_arr
>     array([1., 2., MASKED], masked=True)
>
> See y'all,
>

I honestly don't see the problem here. The difference isn't between
masked_values/missing_values, it is between masked arrays and masked
views of unmasked arrays. I think the view concept is central to what is
going on. It may not be what folks are used to, but it strikes me as a
clarifying advance rather than a mixed-up confusion. Admittedly, it
depends on the numpy-centric ability to have views, but views are a
wonderful thing.

Chuck
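
A minimal sketch of the "masked view of an unmasked array" idea, using
only what NumPy ships today. The names ``np.NA``, ``np.MASKED``,
``skipna`` and ``propmsk`` in the draft above are proposals, not existing
API, so this assumes NaN as a stand-in for NA and ``numpy.ma`` as a
stand-in for the masked case:

```python
import numpy as np

# Stand-in for NA: NaN propagates through a plain sum, and has to be
# skipped explicitly (np.nansum plays the role of the proposed skipna=True).
nan_arr = np.array([1.0, 2.0, np.nan, 7.0])
propagated = np.sum(nan_arr)   # NaN: the missing value poisons the result
skipped = np.nansum(nan_arr)   # 10.0: the missing value is ignored

# Stand-in for MASKED: a masked array built on top of an ordinary array.
# By default numpy.ma does not copy an ndarray passed as data, so `view`
# shares its data buffer with `data`.
data = np.array([1.0, 2.0, 3.0, 7.0])
view = np.ma.masked_array(data, mask=[False, False, True, False])

masked_total = view.sum()      # 10.0: the masked element is skipped

# Writing through the masked view changes the underlying array ...
view[0] = 5.0
# ... while the value hidden behind the mask is still there in `data`.
hidden_value = data[2]
```

The point of the sketch is only that masking hides ``data[2]`` without
destroying it, whereas a NaN (or NA) replaces the value outright.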
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion