On 27 April 2012 17:42, Travis Oliphant <tra...@continuum.io> wrote:
>
> 1) There is a lot of code out there that does not know anything about
> masks and is not used to checking for masks. It enlarges the basic
> abstraction in a way that is not backwards compatible *conceptually*.
> This smells fishy to me and I could see a lot of downstream problems from
> libraries that rely on NumPy.
>
That's exactly why I'd love to see plain arrays remain functionally unchanged.

It's just a small, random sample, but here's how a few routines from NumPy and SciPy sanitise their inputs...

numpy.trapz (aka scipy.integrate.trapz) - numpy.asanyarray
scipy.spatial.KDTree - numpy.asarray
scipy.spatial.cKDTree - numpy.ascontiguousarray
scipy.integrate.odeint - PyArray_ContiguousFromObject
scipy.interpolate.interp1d - numpy.array
scipy.interpolate.griddata - numpy.asanyarray & numpy.ascontiguousarray

So, assuming numpy.ndarray became a strict subclass of some new masked array, it looks plausible that adding just a few checks to numpy.ndarray to exclude the masked superclass would prevent much downstream code from accidentally operating on masked arrays.

> 2) We cannot agree on how masks should be handled and consequently don't
> have a real plan for migrating numpy.ma to use these masks. So, we are
> just growing the API and introducing uncertainty for unclear benefit ---
> especially for the person that does not want to use masks.
>
>
I've not yet looked at how numpy.ma users could be migrated. But if we make masked arrays a strict superclass and leave the numpy/ndarray interface and behaviour unchanged, API growth shouldn't be an issue. End-users will be able to completely ignore the existence of masked arrays (except for the minority(?) for whom the ABI/re-compile issue would be relevant).

> 3) Subclassing in C in Python requires that C-structures are *binary*
> compatible. This implies that all subclasses have *more* attributes than
> the superclass. The way it is currently implemented, that means that POAs
> would have these extra pointers they don't need sitting there to satisfy
> that requirement. From a C-struct perspective it therefore makes more
> sense for MAs to inherit from POAs.
> Ideally, that shouldn't drive the
> design, but it's part of the landscape in NumPy 1.X
>
>
I'd hate to see the logical class hierarchy inverted (or collapsed to a single class) just to save a pointer or two from the struct. Now seems like a golden opportunity to fix the relationship between masked and plain arrays.

I'm assuming (and implicitly checking that assumption with this statement!) that there's far more code using the Python interface to NumPy, than there is code using the C interface. So I'm urging that the logical consistency of the Python interface (and even the C and Cython interfaces) takes precedence over the C-struct memory saving.

I'm not sure I agree with "extra pointers they don't need". If we make plain arrays a subclass of masked arrays, aren't these pointers essential to ensure masked array methods can continue to work on plain arrays without requiring special code paths?

> I have some ideas about how to move forward, but I'm anxiously awaiting
> the write-up that Mark and Nathaniel are working on to inform and enhance
> those ideas.
>
+1

As an aside, the implication of preserving the behaviour of the numpy/ndarray interface is that masked arrays will need a *new* interface.

For example:

>>> import mumpy # Yes - I know it's a terrible name! But I had to write *something* ... sorry! ;-)
>>> import numpy
>>> a = mumpy.array(...) # makes a masked array
>>> b = numpy.array(...) # makes a plain array
>>> isinstance(a, mumpy.ndarray)
True
>>> isinstance(b, mumpy.ndarray)
True
>>> isinstance(a, numpy.ndarray)
False
>>> isinstance(b, numpy.ndarray)
True

Richard Hattersley
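
P.S. For anyone who wants to poke at the proposed relationship, here is a minimal pure-Python sketch of it. Everything in it is an assumption made for illustration: MaskedArray stands in for the hypothetical mumpy.ndarray, PlainArray stands in for numpy.ndarray, and trapz_like is a toy downstream routine. None of these are real NumPy or SciPy names, and this says nothing about how the C structs would actually be laid out.

```python
# Hypothetical stand-ins, not real NumPy classes: a pure-Python sketch of
# the proposed hierarchy, with the plain array as a strict subclass of the
# masked array.


class MaskedArray:
    """Stand-in for the proposed mumpy.ndarray (hypothetical name)."""

    def __init__(self, data, mask=None):
        self.data = list(data)
        # No mask supplied means that nothing is masked.
        self.mask = list(mask) if mask is not None else [False] * len(self.data)
        if len(self.mask) != len(self.data):
            raise ValueError("mask and data must have the same length")

    def sum(self):
        # Masked-array method: skips the masked elements.
        return sum(x for x, m in zip(self.data, self.mask) if not m)


class PlainArray(MaskedArray):
    """Stand-in for numpy.ndarray: a masked array whose mask is never set."""

    def __init__(self, data):
        super().__init__(data)


def trapz_like(y):
    """Toy downstream routine that refuses anything but plain arrays."""
    if not isinstance(y, PlainArray):
        raise TypeError("plain array required")
    # Trapezoidal rule with unit spacing.
    return sum((p + q) / 2.0 for p, q in zip(y.data, y.data[1:]))


a = MaskedArray([1, 2, 3], mask=[False, True, False])  # a masked array
b = PlainArray([1, 2, 3])                              # a plain array
```

In this sketch the inherited sum() works on the plain array b with no special code path because b still carries an (all-False) mask, and the single isinstance check in trapz_like is enough to stop the masked array a from slipping into code that knows nothing about masks.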
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion