On 10/29/2011 12:57 PM, Charles R Harris wrote:
>
> On Sat, Oct 29, 2011 at 4:47 PM, Eric Firing <efir...@hawaii.edu> wrote:
>
> > On 10/29/2011 12:02 PM, Olivier Delalleau wrote:
> > >
> > > I haven't been following the discussion closely, but wouldn't it
> > > be instead:
> > > a.mask[0:2] = True?
> >
> > That would be consistent with numpy.ma and the opposite of Mark's
> > implementation.
> >
> > I can live with either, but I much prefer the numpy.ma version because
> > it fits with the use of bit-flags for editing data; set bit 1 if it
> > fails check A, set bit 2 if it fails check B, etc. So, if it evaluates
> > as True, there is a problem, and the value is masked *out*.
> >
> > Similarly, in Mark's implementation, 7 bits are available for a payload
> > to describe what kind of masking is meant. This seems more consistent
> > with True as masked (or NA) than with False as masked.
>
> I wouldn't rely on the 7 bits yet. Mark left them available to keep open
> possible future use, but didn't implement anything using them yet. If
> memory use turns out to exclude whole sectors of application we will
> have to go to bit masks.

Right; I was only commenting on a subjective sense of internal
consistency. A minor point.

The larger context of all this is how users end up being able to work
with all the different types and specifications of "NA" (in the most
general sense) data:

1) nans
2) numpy.ma
3) masks in the core (Mark's new code)
4) bit patterns

Substantial code now in place--including matplotlib--relies on numpy.ma.
It has some rough edges, it can be slow, it is a pain having it as a
bolted-on module, it may be more complicated than it needs to be, but it
fits a lot of use cases pretty well. There are many users. Everyone
using matplotlib is using it, whether they know it or not.

The ideal from my numpy.ma-user's standpoint would be an NA-handling
implementation in the core that would do two things:

(1) allow a gradual transition away from numpy.ma, so that the latter
would become redundant.
(2) allow numpy.ma to be reasonably easily modified to use the in-core
facilities for greater efficiency during the long transition. Implicit
is the hope that someone (most likely not me, although I might be able
to help a bit) would actually perform this modification.

Mark's mission, paid for by Enthought, was not to please numpy.ma users,
but to add NA-handling that would be comfortable for R-users. He chose
to do so with the idea that two possible implementations (masks and
bit patterns) were desirable, each with strengths and weaknesses, and
that, so as to get *something* done in the very short time he had left,
he would start with the mask implementation. We now have the result,
incomplete, but not breaking anything. Additional development (coding
as well as designing) will be needed.

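To make the "True means masked out" convention concrete, here is a
minimal sketch using only numpy.ma and nans (items 1 and 2 in the list
above). The flag names and data values are made up for illustration;
they are not from Mark's code or from any real quality-control scheme.

```python
import numpy as np

# Hypothetical quality-control flags, one bit per failed check.
FAILED_CHECK_A = 0b01
FAILED_CHECK_B = 0b10

data = np.array([1.0, 2.0, 3.0, 4.0])
flags = np.array([0, FAILED_CHECK_A, FAILED_CHECK_B, 0], dtype=np.uint8)

# numpy.ma convention: a True mask entry means the value is masked *out*,
# so any nonzero flag word maps directly onto the mask.
masked = np.ma.masked_array(data, mask=(flags != 0))
ma_mean = masked.mean()  # averages only the unflagged values

# The nan approach carries the same information in the data itself,
# at the cost of losing the original values and the reason for masking.
with_nans = np.where(flags != 0, np.nan, data)
nan_mean = np.nanmean(with_nans)

print(masked.mask, ma_mean, nan_mean)
```

With a False-means-masked convention the flag test would have to be
inverted before it could be used as a mask, which is the small
inconsistency I was referring to.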
The main question raised by Matthew and Nathaniel is, I think, whether
Mark's code should develop in a direction away from the R-compatibility
model, with the idea that the latter would be handled via a bit-pattern
implementation, some day, when someone codes it; or whether it should
remain as the prototype and first implementation of an API to handle the
R-compatible use case, minimizing any divergence from any eventual
bit-pattern implementation.

The answer to this depends on several questions, including:

1) Who is available to do how much implementation of any of the
possibilities? My reading of Travis's blog and rare posts to this list
suggests that he hopes and expects to be able to free up coding time.
Perhaps he will clarify that soon.

2) What sorts of changes would actually be needed to make the present
implementation good enough for the R use case? Evolutionary, or
revolutionary?

3) What sorts of changes would help with the numpy.ma use case?
Evolutionary, or revolutionary?

4) Given available resources, how can we maximize progress: making
numpy more capable, easier to use, etc.?

Unless the answers to questions 2 *and* 3 are "revolutionary", I don't
see the point in pulling Mark's changes out of master. At most, the
documentation might be changed to mark the NA API as "experimental" for
a release or two.

Overall, I think that the differences between the R use case and the ma
use case have been overstated and over-emphasized.

Eric

>
> Chuck
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion