On 03/07/2012 09:26 AM, Nathaniel Smith wrote:
> On Wed, Mar 7, 2012 at 5:17 PM, Charles R Harris
> <charlesr.har...@gmail.com> wrote:
>> On Wed, Mar 7, 2012 at 9:35 AM, Pierre Haessig<pierre.haes...@crans.org>
>>> Coming back to Travis proposition "bit-pattern approaches to missing
>>> data (*at least* for float64 and int32) need to be implemented.", I
>>> wonder what is the amount of extra work to go from nafloat64 to
>>> nafloat32/16 ? Is there an hardware support NaN payloads with these
>>> smaller floats ? If not, or if it is too complicated, I feel it is
>>> acceptable to say "it's too complicated" and fall back to mask. One may
>>> have to choose between fancy types and fancy NAs...
>>
>> I'm in agreement here, and that was a major consideration in making a
>> 'masked' implementation first.
>
> When it comes to "missing data", bitpatterns can do everything that
> masks can do, are no more complicated to implement, and have better
> performance characteristics.
>
>> Also, different folks adopt different values
>> for 'missing' data, and distributing one or several masks along with the
>> data is another common practice.
>
> True, but not really relevant to the current debate, because you have
> to handle such issues as part of your general data import workflow
> anyway, and none of these is any more complicated no matter which
> implementations are available.
>
>> One inconvenience I have run into with the current API is that is should be
>> easier to clear the mask from an "ignored" value without taking a new view
>> or assigning known data. So maybe two types of masks (different payloads),
>> or an additional flag could be helpful. The process of assigning masks could
>> also be made a bit easier than using fancy indexing.
>
> So this, uh... this was actually the whole goal of the "alterNEP"
> design for masks -- making all this stuff easy for people (like you,
> apparently?) that want support for ignored values, separately from
> missing data, and want a nice clean API for it. Basically having a
> separate .mask attribute which was an ordinary, assignable array
> broadcastable to the attached array's shape. Nobody seemed interested
> in talking about it much then but maybe there's interest now?
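
For reference, here is a rough sketch of the "ignored value" workflow being
discussed, written against the existing numpy.ma module rather than the
proposed alterNEP API (which this does not attempt to reproduce). The array
contents and variable names are made up for illustration. The point is only
that a mask stored separately from the data lets a value be ignored and later
un-ignored without the stored number being lost.

```python
import numpy as np

# Hypothetical readings; the second one is to be ignored for now.
values = np.array([1.0, 20.0, 3.0, 4.0])

# numpy.ma stores the mask as a separate boolean array next to the data.
a = np.ma.masked_array(values, mask=[False, True, False, False])

# Reductions skip the masked entry: 1 + 3 + 4.
sum_ignoring = a.sum()

# "Un-ignore" the entry by assigning a new mask. The stored value was
# never overwritten, so no known data has to be re-assigned.
new_mask = np.ma.getmaskarray(a).copy()
new_mask[1] = False
a.mask = new_mask

# The entry participates again: 1 + 20 + 3 + 4.
sum_all = a.sum()
```

With a bit-pattern NA, by contrast, the NA marker replaces the stored value,
so there is nothing left to recover when the element is un-marked.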

In other words, good low-level support for numpy.ma functionality? With
a migration path so that a separate numpy.ma might wither away? Yes,
there is interest; this is exactly what I think is needed for my own
style of applications (which I think are common at least in geoscience),
and for matplotlib.

The question is how to achieve it as simply and cleanly as possible
while also satisfying the needs of the R users, and while making it easy
for matplotlib, for example, to handle *any* reasonable input: ma, other
masking, nan, or NA-bitpattern.

It may be that a rather pragmatic approach to implementation will prove
better than a highly idealized set of data models. Or, it may be that a
dual approach is best, in which the flag value missing data
implementation is tightly bound to the R model and the mask
implementation is explicitly designed for the numpy.ma model. In any
case, a reasonable level of agreement on the goals is needed.

I presume Travis's involvement will facilitate a clarification of the
goals and of the implementation; and I expect that much of Mark's work
will end up serving well, even if much needs to be added and the API
evolves considerably.

Eric

>
> -- Nathaniel
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion