Hi,

On Tue, Oct 25, 2011 at 7:56 PM, Travis Oliphant <oliph...@enthought.com> wrote:
> So, I am very interested in making sure I remember the details of the
> counterproposal. What I recall is that you wanted to be able to
> differentiate between a "bit-pattern" mask and a boolean-array mask in the
> API. I believe currently even when bit-pattern masks are implemented the
> difference will be "hidden" from the user on the Python level.
>
> I am sure to be missing other parts of the discussion as I have been in and
> out of it.

The ideas
---------

The question that we were addressing in the alter-NEP was: should missing
values implemented as bitpatterns appear to be the same as missing values
implemented with masks? We said no, and Mark said yes.

To restate the argument in brief: Nathaniel and I and some others thought
that there were two separable ideas in play:

1) A value that is finally and completely missing == ABSENT
2) A value that we would like to ignore for the moment but might want
   back at some future time == IGNORED

(I'm using the adjectives ABSENT and IGNORED here as shorthand for the
objects 'absent value' and 'ignored value'. This is to distinguish them
from the verbs below.)

We thought bitpatterns were a good match for the former, and masking was a
good match for the latter.

We all agreed there were two things you might like to do with values that
were missing in either of the senses above:

A) PROPAGATE; V + 1 == V
B) SKIP; K + 1 == 1

(Note the verbs for the behaviors.)

I believe the original np.ma masked arrays always SKIP.

In [3]: a = np.ma.masked_array([99, 2], mask=[True, False])

In [4]: a
Out[4]:
masked_array(data = [-- 2],
             mask = [ True False],
       fill_value = 999999)

In [5]: a.sum()
Out[5]: 2

There was some discussion as to whether there was a reason to think that
ABSENT should always or by default PROPAGATE, and IGNORED should always or
by default SKIP. Chuck was referring to this idea when he said further up
this thread:

> For instance, I'm thinking skipna=1 is the natural default for the masked
> arrays.

The current implementation
--------------------------

What we have now is an implementation of masked arrays, but more tightly
integrated into the numpy core. In our language, we have an implementation
of IGNORED that is tuned to be nearly indistinguishable from the behavior
we are expecting of ABSENT.
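
As a side illustration that does not need the development branch: in
released NumPy, a float NaN behaves roughly like a bitpattern value that
PROPAGATEs, while np.ma masks SKIP. This is only an analogy for the two
ideas above, not the maskna code, and the variable names are made up for
the example.

```python
import numpy as np

# PROPAGATE: a NaN bit pattern in a float array carries through the sum.
absent_like = np.array([np.nan, 2.0])
propagated = absent_like.sum()

# SKIP: np.ma leaves masked elements out of the reduction.
ignored_like = np.ma.masked_array([99, 2], mask=[True, False])
skipped = ignored_like.sum()

# The mask hides the value without destroying it, so clearing the mask
# brings the 99 back. That is the "might want back" part of IGNORED.
ignored_like.mask = [False, False]
restored = ignored_like.sum()

# SKIP is also available for the NaN case, but only on request.
nan_skipped = np.nansum(absent_like)

print(propagated, skipped, restored, nan_skipped)
```

There is no equivalent "restore" for the NaN case: the original value was
overwritten by the bit pattern. The maskna arrays in the development branch
are a separate, third thing.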

Specifically, once you have done this:

In [9]: a = np.array([99, 2], maskna=True)

you can get something representing the mask:

In [11]: np.isna(a)
Out[11]: array([False, False], dtype=bool)

but I believe there is no way of setting the mask directly. In order to
set the mask, you have to do what looks like an assignment:

In [12]: a[0] = np.NA

In [14]: a
Out[14]: array([NA, 2])

In fact, what has happened is the mask has changed, but the underlying
value has not:

In [18]: orig = np.array([99, 2])

In [19]: a = orig.view(maskna=True)

In [20]: a[0] = np.NA

In [21]: a
Out[21]: array([NA, 2])

In [22]: orig
Out[22]: array([99, 2])

This is different from real assignment:

In [23]: a[0] = 0

In [24]: a
Out[24]: array([0, 2], maskna=True)

In [25]: orig
Out[25]: array([0, 2])

Some effort has gone into making it difficult to pull off the mask:

In [30]: a.view(np.int64)
Out[30]: array([NA, 2])

In [31]: a.view(np.int64).flags
Out[31]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  MASKNA : True
  OWNMASKNA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [32]: a.astype(np.int64)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/home/mb312/<ipython-input-32-e7f3381c9692> in <module>()
----> 1 a.astype(np.int64)

ValueError: Cannot assign NA to an array which does not support NAs

The default behavior of the masked values is PROPAGATE, but they can be
individually made to SKIP:

In [28]: a.sum() # PROPAGATE
Out[28]: NA(dtype='int64')

In [29]: a.sum(skipna=True) # SKIP
Out[29]: 2

Where's the beef?
-----------------

I personally still think that it is confusing to fuse the concepts of:

1) Masked arrays
2) Arrays with bitpattern codes for missing

and the concepts of

A) ABSENT and
B) IGNORED

Consequences for current code
-----------------------------

Specifically, it still seems to me to make sense to prefer this:

>> a = np.array([99, 2], masking=True)
>> a.mask
[ True, True ]
>> a.sum()
101
>> a.mask[0] = False
>> a.sum()
2

It might make sense, as Chuck suggests, to change the default to
'skipna=True', and I'd further suggest renaming np.NA to np.IGNORED and
'skipna' to 'skipignored' for clarity.

I still think the pseudo-assignment:

In [20]: a[0] = np.NA

is confusing, and should be removed.

Later, should we ever have bitpatterns, there would be something like
np.ABSENT. This of course would make sense for assignment:

In [20]: a[0] = np.ABSENT

There would be another keyword argument 'skipabsent=False' such that, when
this is False, the ABSENT values propagate.

Honestly, I think that NA should be a synonym for ABSENT, and so should be
removed until the dust has settled, and then restored as
(np.NA == np.ABSENT).

And I think these two ideas, of masking / IGNORED and bitpattern / ABSENT,
would then be much easier to explain.

That's my best shot.

Matthew
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion