I haven't actually tested the code, but AFAIK the following is a short overview, with examples, of how the two orthogonal feature axes (ABSENT/IGNORE and PROPAGATE/SKIP) are related and how it is all supposed to work.
I have never talked to Mark or anybody else on this list (that is, outside of this list), so I may well be mistaken. Thus, sorry if there are any inaccuracies and/or if you are already aware of what I'm describing here.

So please tell me if this has helped clarify why I (and I hope others) think the implementation mechanism is independent of the semantics.

Lluis


ABSENT vs IGNORE
================

Travis Oliphant writes:
> As I mentioned. I find the ability to separate an ABSENT idea from an
> IGNORED idea convincing. In other words, I think distinguishing between
> masks and bit-patterns is not just an implementation detail, but provides
> a useful concept for multiple use-cases.

I think it's an implementation detail as long as you have two clear ways of separating them.

Summarizing: let's forget for a moment that "mask" has a meaning in English:

- "maskna" corresponds to ABSENT
- "ownmaskna" corresponds to IGNORED

The problem here is that of the two implementation mechanisms (masks and bit-patterns), only the first can provide both semantics.

Let's start with an array that already supports NAs:

    In [1]: a = np.array([1, 2, 3], maskna = True)


ABSENT (destructive NA assignment)
----------------------------------

Once you assign NA, even if you're using NA masks, the value seems to be lost forever (i.e., the assignment is destructive regardless of the value):

    In [2]: b = a.view()
    In [3]: c = a.view(maskna = True)
    In [4]: b[0] = np.NA
    In [5]: a
    Out[5]: array([NA, 2, 3])
    In [6]: b
    Out[6]: array([NA, 2, 3])
    In [7]: c
    Out[7]: array([NA, 2, 3])

This is the default behaviour, and is probably what the regular user expects from previous uses of the "view" method.

Note that here "maskna" acts as an idempotent operation. Once an array has the "maskna" property, all its views will transitively (and destructively) use it.

Also note that an array copy will make a copy of both "regular" data and NA values, as expected.
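A toy model may make the ABSENT case easier to follow. The sketch below is plain Python, not NumPy code: the `ToyArray` class, its methods and the `NA` sentinel are made up for illustration, and the only assumption is that an ABSENT view shares both the data and the NA mask with its base. Under that assumption an NA assigned through any view shows up in all of them, while a copy is independent.

```python
NA = object()  # stand-in for np.NA (hypothetical sentinel)


class ToyArray:
    """Toy 1-D array: a data list plus a boolean NA mask (True = NA)."""

    def __init__(self, data, mask=None):
        self.data = data
        self.mask = [False] * len(data) if mask is None else mask

    def view(self):
        # ABSENT: the view shares both the data and the mask with its base.
        return ToyArray(self.data, self.mask)

    def copy(self):
        # A copy duplicates both the data and the NA mask.
        return ToyArray(list(self.data), list(self.mask))

    def __setitem__(self, i, value):
        if value is NA:
            # The old value stays in self.data, but no view can reach it.
            self.mask[i] = True
        else:
            self.data[i] = value
            self.mask[i] = False

    def tolist(self):
        return ["NA" if m else d for d, m in zip(self.data, self.mask)]


a = ToyArray([1, 2, 3])
b = a.view()
c = a.view()
b[0] = NA          # assigned through one view ...
print(a.tolist())  # ['NA', 2, 3]  ... visible in the base
print(c.tolist())  # ['NA', 2, 3]  ... and in every other view

d = a.copy()
d[1] = NA          # the copy has its own data and mask
print(a.tolist())  # ['NA', 2, 3]
print(d.tolist())  # ['NA', 'NA', 3]
```

In this model the old value is still stored in the data list, but since every view shares the same mask nothing can observe it, which is what makes the assignment look destructive.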


IGNORED (non-destructive NA assignment)
---------------------------------------

But you can also have non-destructive NA assignments, although *only* if you explicitly (and thus purposefully) ask for them with "ownmaskna":

    In [8]: b = a.view(ownmaskna = True)
    In [9]: b[1] = np.NA
    In [10]: a
    Out[10]: array([NA, 2, 3])
    In [11]: b
    Out[11]: array([NA, NA, 3])
    In [12]: a[2] = np.NA
    In [13]: a
    Out[13]: array([NA, 2, NA])
    In [14]: b
    Out[14]: array([NA, NA, 3])

The mask is a copy:

    In [15]: a[0] = 1
    In [16]: a
    Out[16]: array([1, 2, 3], maskna = True)
    In [17]: b
    Out[17]: array([NA, NA, 3])

But the data itself is not (i.e., assignments of non-NA values are *always* destructive, but I think this is out of the scope of this discussion):

    In [17]: a[0] = -10
    In [18]: a[2] = -30
    In [19]: a
    Out[19]: array([-10, 2, -30], maskna = True)
    In [20]: b
    Out[20]: array([NA, NA, -30])


The dark corner
---------------

The only potential misunderstanding can be the creation of an NA-masked array from a "regular" array. This is precisely why I put this case at the end, as it seems to break the intuition some people have about assignment being always destructive (unless you explicitly ask for IGNORED, which is not the case here):

    In [21]: a = np.array([1, 2, 3])
    Out[21]: array([1, 2, 3])
    In [22]: b = a.view(maskna = True)
    In [23]: b[0] = np.NA
    In [24]: a
    Out[24]: array([1, 2, 3])
    In [25]: b
    Out[25]: array([NA, 2, 3])

This is in fact a corner case, and there is no obvious (and efficient!) way to handle it. As "a" is just a "regular" array, and has no support for any type of NA values (neither masks nor bit-patterns), assignments to any of its views cannot, in any case, be destructive.

Note that the previous holds true because it currently is a design decision to forbid the in-flight conversion from "regular" to "NA-enabled" arrays.
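
The difference between IGNORED and the dark corner can be sketched with the same kind of toy model. Again this is plain Python rather than NumPy code: `ToyArray`, its `view` keywords and the `NA` sentinel are hypothetical, and the assumptions are that an "ownmaskna" view shares the data but gets a private copy of the mask, and that a "maskna" view of a regular array has no base mask to share and therefore also ends up with a private one.

```python
NA = object()  # stand-in for np.NA (hypothetical sentinel)


class ToyArray:
    """Toy 1-D array: a shared data list plus an optional NA mask."""

    def __init__(self, data, mask=None):
        self.data = data
        self.mask = mask  # None means a "regular" array without NA support

    def view(self, maskna=False, ownmaskna=False):
        if ownmaskna or (maskna and self.mask is None):
            # IGNORED, or the dark corner: the view gets a private mask.
            if self.mask is None:
                private = [False] * len(self.data)
            else:
                private = list(self.mask)
            return ToyArray(self.data, private)
        # ABSENT, or a plain view of a regular array: share the base's mask.
        return ToyArray(self.data, self.mask)

    def __setitem__(self, i, value):
        if value is NA:
            if self.mask is None:
                raise ValueError("array has no NA support")
            self.mask[i] = True
        else:
            self.data[i] = value  # non-NA values always hit the shared data
            if self.mask is not None:
                self.mask[i] = False

    def tolist(self):
        mask = self.mask if self.mask is not None else [False] * len(self.data)
        return ["NA" if m else d for d, m in zip(self.data, mask)]


# IGNORED: the NA-enabled base and its view have independent masks.
a = ToyArray([1, 2, 3], [False, False, False])
b = a.view(ownmaskna=True)
b[1] = NA
a[2] = NA
a[0] = -10         # the data is still shared
print(a.tolist())  # [-10, 2, 'NA']
print(b.tolist())  # [-10, 'NA', 3]

# Dark corner: a regular base cannot record the NA at all.
r = ToyArray([1, 2, 3])
v = r.view(maskna=True)
v[0] = NA
print(r.tolist())  # [1, 2, 3]
print(v.tolist())  # ['NA', 2, 3]
```

In this sketch the dark corner falls out of the same code path as IGNORED, which is one way to see why the NA assignment cannot be destructive there.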

In fact I forgot that, when reading the docs in [1], I thought that a slight change could make it all feel more consistent: the view of a regular array can have NA values only if "ownmaskna" is used (IGNORED/non-destructive NA assignments), and will give an error if "maskna" is used, as in entry number 22.

[1] http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html#creating-na-masked-views


PROPAGATE vs SKIP
=================

I've also read some comments regarding this. Maybe I didn't explain myself correctly in previous mails, or maybe I just misunderstood other people's mails (which might not be about this at all).


PROPAGATE
---------

All ufuncs in ndarray propagate NA values.

Note that ABSENT (destructive NA assignment) is also a default, so we could say that the default is R-like behaviour (AFAIK).


SKIP
----

You have a different array type (let's call it skip_array), where all ufuncs do *not* propagate NA values.


Middle-ground
-------------

For the sake of code maintainability (and the specific needs one might have on a per-ufunc basis), in fact you only have one type of ndarray that supports both PROPAGATE and SKIP with the very same NA values.

This can be controlled on a per-ufunc basis through the "skipna" argument that is present on all ufuncs, so that ndarray defaults to "skipna = False" and skip_array defaults to "skipna = True".

The latter is done by simply defining an ndarray subclass that provides a ufunc wrapper like this (fake code):

    import numpy as np

    class skip_array(np.ndarray):
        ...
        # "__ufunc_wrap__" is a made-up hook name, used only for illustration
        def __ufunc_wrap__(self, ufunc, *args, **kwargs):
            kwargs["skipna"] = True  # force SKIP for every ufunc call
            return ufunc(*args, **kwargs)

There are other ways of doing it, but IMHO how it can be done doesn't matter right now.

--
"And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth