Travis et al,

This isn't a reply to anything specific in your email, and I apologize if there is a better thread or place to share this information. I've been meaning to participate in the discussion for a long time and never got around to it. The main thing I'd like to do is convey my typical use of the numpy.ma module as an environmental engineer analyzing censored datasets: contaminant concentrations that are either at well-understood values (not masked) or at some unknown value below an upper bound (masked).
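A minimal sketch of that layout in numpy.ma, using made-up concentrations. The names `values`, `non_detect` and `conc` are hypothetical, and this is not the code from the gist linked below. For non-detects the stored value is the detection limit (the known upper bound) and the entry is masked. The ordering step uses `np.lexsort` as one possible way to put non-detects first and then detects, each group ascending.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical concentrations. For non-detects, the stored number is the
# detection limit (an upper bound on the true value), and the entry is masked.
values = np.array([1.5, 0.5, 3.2, 1.0, 0.8, 2.1])
non_detect = np.array([False, True, False, True, True, False])
conc = ma.masked_array(values, mask=non_detect)

# Detected values only: masked entries are dropped.
detected = conc.compressed()  # -> [1.5, 3.2, 2.1]

# The masked detection limits remain available in the underlying data.
mask = ma.getmaskarray(conc)
detection_limits = conc.data[mask]  # -> [0.5, 1.0, 0.8]

# Non-detects first, then detects, each group ascending by value.
# np.lexsort treats the LAST key as the primary key; ~mask is False for
# non-detects, so they sort ahead of the detected values.
order = np.lexsort((conc.data, ~mask))
ranked = conc[order]  # the mask travels with the data under fancy indexing
```

Because the detection limits live in the data buffer rather than being discarded, the same array can feed both the "detected only" statistics and any step that needs the upper bounds of the censored values.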
My basic understanding is that this discussion revolved around how to treat masked data (ignored vs. missing) and how to implement one, both, or some middle ground between those two concepts. If I'm off-base, just ignore all of the following.

For my purposes, numpy.ma is implemented in a way that is very well suited to my needs. Here's a gist of something that was *really* hard for me before I discovered numpy.ma and numpy in general (it's a bit much; see below for the highlights):

https://gist.github.com/2361814

The main message here is that I include the upper bounds of the unknown values (detection limits) in my array and use them to statistically estimate the unknown values. I must be able to retrieve the masked detection limits throughout this process. Additionally, the masks as currently implemented allow me to sort the undetected values first, then the detected values (see __rosRanks in the gist).

As a boots-on-the-ground user of numpy, I'm ecstatic that this tool exists. I'm also pretty flexible and don't anticipate any major snags in my work if things change dramatically as the masked/missing/ignored functionality evolves.

Thanks to everyone for the hard work and great tools,
-Paul Hobson

On Mon, Apr 9, 2012 at 9:52 PM, Travis Oliphant <tra...@continuum.io> wrote:
> Hey all,
>
> I've been waiting for Mark Wiebe to arrive in Austin where he will spend
> several weeks, but I also know that masked arrays will be only one of the
> things he and I are hoping to make head-way on while he is in Austin.
> Nevertheless, we need to make progress on the masked array discussion and if
> we want to finalize the masked array implementation we will need to finish
> the design.
>
> I've caught up on most of the discussion including Mark's NEP, Nathaniel's
> NEP and other writings and the very-nice mailing list discussion that
> included a somewhat detailed discussion on the algebra of IGNORED. I think
> there are some things still to be decided.
> However, I think some things are
> pretty clear:
>
> 1) Masked arrays are going to be fundamental in NumPy and these should
> replace most people's use of numpy.ma. The numpy.ma code will remain as a
> compatibility layer
>
> 2) The reality of #1 and NumPy's general philosophy to date means that
> masked arrays in NumPy should support the common use-cases of masked arrays
> (including getting and setting of the mask from the Python and C-layers).
> However, the semantic of what the mask implies may change from what numpy.ma
> uses to having a True value meaning selected.
>
> 3) There will be missing-data dtypes in NumPy. Likely only a limited
> sub-set (string, bytes, int64, int32, float32, float64, complex64, complex32,
> and object) with an API that allows more to be defined if desired. These
> will most likely use Mark's nice machinery for managing the calculation
> structure without requiring new C-level loops to be defined.
>
> 4) I'm still not sure about whether the IGNORED concept is necessary
> or not. I really like the separation that was emphasized between
> implementation (masks versus bit-patterns) and operations (propagating versus
> non-propagating). Pauli even created another dimension which I don't
> totally grok and therefore can't remember. Pauli? Do you still feel that
> is a necessary construction? But, do we need the IGNORED concept to indicate
> what amounts to different default key-word arguments to functions that
> operate on NumPy arrays containing missing data (however that is
> represented)? My current weak view is that it is not really necessary.
> But, I could be convinced otherwise.
>
> I think the good news is that given Mark's hard-work and Nathaniel's
> follow-up we are really quite far along. I would love to get Nathaniel's
> opinion about what remains un-done in the current NumPy code-base. I would
> also appreciate knowing (from anyone with an interest) opinions of items 1-4
> above and anything else I've left out.
>
> Thanks,
>
> -Travis
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
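Points 2 and 4 of the quoted message touch on two separable questions: which way the mask's booleans point, and whether a missing value propagates through a reduction or is skipped. The sketch below illustrates both with current numpy.ma and a NaN sentinel in a plain float array; it does not use any of the APIs proposed in the NEPs, and the array values are made up. The "True means selected" mask is shown simply as the inverse of the numpy.ma mask.

```python
import numpy as np
import numpy.ma as ma

data = np.array([1.0, 2.0, 3.0, 4.0])

# numpy.ma convention: True in the mask means "masked" (excluded).
arr = ma.masked_array(data, mask=[False, True, False, False])
ma_mask = ma.getmaskarray(arr)

# A "True means selected" convention is just the boolean inverse.
selected = ~ma_mask

# Mask-based reduction: masked entries are skipped (non-propagating).
skip_total = arr.sum()  # 1 + 3 + 4 = 8.0

# Bit-pattern style missing value: NaN used as a sentinel in float64 data.
with_nan = data.copy()
with_nan[1] = np.nan
propagate_total = with_nan.sum()      # NaN propagates through the sum
skip_nan_total = np.nansum(with_nan)  # a separate function skips it: 8.0
```

The same missing entry gives either a NaN or 8.0 depending only on which function (or, in a unified design, which default keyword argument) is used, which is the distinction point 4 asks about.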