On Thursday, October 27, 2011, Charles R Harris <charlesr.har...@gmail.com> wrote:
>
>
> On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant <oliph...@enthought.com> wrote:
>>
>> That is a pretty good explanation. I find myself convinced by Matthew's arguments. I think that being able to separate ABSENT from IGNORED is a good idea. I also like being able to control SKIP and PROPAGATE (but I think the current implementation allows this already).
>>
>> What is the counter-argument to this proposal?
>>
>
> What exactly do you find convincing? The current masks propagate by default:
>
> In [1]: a = ones(5, maskna=1)
>
> In [2]: a[2] = NA
>
> In [3]: a
> Out[3]: array([ 1., 1., NA, 1., 1.])
>
> In [4]: a + 1
> Out[4]: array([ 2., 2., NA, 2., 2.])
>
> In [5]: a[2] = 10
>
> In [5]: a
> Out[5]: array([ 1., 1., 10., 1., 1.], maskna=True)
>
>
> I don't see an essential difference between the implementation using masks and one using bit patterns, the mask when attached to the original array just adds a bit pattern by extending all the types by one byte, an approach that easily extends to all existing and future types, which is why Mark went that way for the first implementation given the time available. The masks are hidden because folks wanted something that behaved more like R and also because of the desire to combine the missing, ignore, and later possibly bit patterns in a unified manner. Note that the pseudo assignment was also meant to look like R. Adding true bit patterns to numpy isn't trivial and I believe Mark was thinking of parametrized types for that.
>
> The main problems I see with masks are unified storage and possibly memory use. The rest is just behavor and desired API and that can be adjusted within the current implementation. There is nothing essentially masky about masks.
>
> Chuck
>
>
I think Chuck sums it up quite nicely. The implementation detail about using masks versus bit patterns can still be discussed and addressed. Personally, I just don't see how parameterized dtypes would be easier to use than the pseudo assignment.

The elegance of Mark's solution was to consider the treatment of missing data in a unified manner. This puts missing data in a more prominent spot for extension builders, which should greatly improve support throughout the ecosystem. By letting there be a single missing data framework (instead of two), all that users need to figure out is whether they want NaN-like behavior (propagate) or mask-like behavior (skip). NumPy takes care of the rest. This is the reason I like using masked arrays: I don't have to use nansum in my library functions to guard against the possibility of receiving NaNs. Duck-typing is a good thing.

My argument against separating IGNORE and PROPAGATE is that it becomes too tempting to want to mix these in an array, but the desired behavior would likely become ambiguous.

There is one other problem that I just thought of that I don't think has been outlined in either NEP. What if I perform an operation between an array set up with propagate NAs and an array with skip NAs?

Cheers,
Ben Root
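
To make the propagate/skip distinction concrete, here is a rough sketch using only what is already in released NumPy (plain NaN for propagate, numpy.ma for skip). It does not use the maskna=/NA interface shown in the quoted session; the total() helper is a made-up stand-in for a library function, and the final lines only show what numpy.ma happens to do today when the two kinds of array meet, not what either NEP specifies.

```python
import numpy as np

# PROPAGATE-like: a NaN poisons any reduction that touches it.
a_nan = np.array([1.0, 1.0, np.nan, 1.0, 1.0])
print(np.sum(a_nan))     # nan
print(np.nansum(a_nan))  # 4.0, but only because the caller opted in

# SKIP-like: a masked element is simply left out of the reduction.
a_masked = np.ma.masked_array(
    [1.0, 1.0, 10.0, 1.0, 1.0],
    mask=[False, False, True, False, False],
)
print(a_masked.sum())    # 4.0


def total(values):
    """Hypothetical library function: no nansum guard, relies on duck-typing."""
    return values.sum()


print(total(a_masked))   # 4.0, the mask decides what is skipped
print(total(a_nan))      # nan, the NaN propagates

# The open question: combine a "skip" array with a "propagate" array.
# With numpy.ma today the mask is carried over and the NaN stays in the data.
mixed = a_masked + np.array([np.nan, 1.0, 1.0, 1.0, 1.0])
print(np.ma.getmaskarray(mixed))  # only index 2 is masked
print(mixed.sum())                # nan: the unmasked NaN still propagates
```

The same total() works on both inputs, which is the duck-typing point above; the last two lines show why mixing the two behaviors in one operation needs an explicit rule.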