On Apr 16, 2012, at 11:01 PM, Charles R Harris wrote:

> On Mon, Apr 16, 2012 at 8:46 PM, Travis Oliphant <tra...@continuum.io> wrote:
>
> On Apr 16, 2012, at 8:03 PM, Matthew Brett wrote:
>
> > Hi,
> >
> > On Mon, Apr 16, 2012 at 3:06 PM, Travis Oliphant <tra...@continuum.io>
> > wrote:
> >
> >> I have heard from a few people that they are not excited by the growth of
> >> the NumPy data-structure by the 3 pointers needed to hold the masked-array
> >> storage. This is especially true when there is talk to potentially add
> >> additional attributes to the NumPy array (for labels and other
> >> meta-information). If you are willing to let us know how you feel about
> >> this, please speak up.
> >
> > I guess there are two questions here
> >
> > 1) Will something like the current version of masked arrays have a
> > long term future in numpy, regardless of eventual API? Most likely
> > answer - yes?
>
> I think the answer to this is yes, but it could be as a feature-filled
> sub-class (like the current numpy.ma, except in C).
>
> I think making numpy.ma a subclass of ndarray has caused all sorts of
> trouble. It doesn't satisfy 'is a', rather it tries to use inheritance from
> ndarray for implementation of various parts. The upshot is that almost
> everything has to be overridden, so it didn't buy much.

This is a valid point. One could create a new object that is binary compatible with the NumPy array and provides the array interface, but is not actually a sub-class. We could keep Mark's modifications to the array interface as well so that it can communicate a mask.

-Travis

> >
> > 2) Will likely changes to the masked array API make any difference to
> > the number of extra pointers? Likely answer no?
> >
> > Is that right?
>
> The answer to this is very likely no on the Python side. But, on the C-side,
> there could be some differences (i.e. are masked arrays a sub-class of the
> ndarray or not).
>
> >
> > I have the impression that the masked array API discussion still has
> > not come out fully into the unforgiving light of discussion day, but
> > if the answer to 2) is No, then I suppose the API discussion is not
> > relevant to the 3 pointers change.
>
> You are correct that the API discussion is separate from this one.
> Overall, I was surprised at how fervently people would oppose ABI changes.
> As has been pointed out, NumPy and Numeric before it were not really designed
> to prevent having to recompile when changes were made. I'm still not sure
> that a better overall solution is not to promote better availability of
> downstream binary packages than excessively worry about ABI changes in NumPy.
> But, that is the current climate.
>
> In that climate, my concern is that we haven't finalized the API but are
> rapidly cementing the *structure* of NumPy arrays into a modified form that
> has real downstream implications. Two other people I have talked to share
> this concern (nobody who has posted on this list before but who are heavy
> users of NumPy). I may have missed the threads where it was discussed, but
> have these structure changes and their implications been fully discussed?
> Is there anyone else who is concerned about adding 3 more pointers (12 bytes
> or 24 bytes) to the NumPy structure?
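
The "12 bytes or 24 bytes" figure is three pointer fields on a 32-bit or 64-bit build. As a rough sketch of that arithmetic, using only the standard-library ctypes module: the helper names `extra_bytes` and `relative_overhead` are made up for illustration, the 8-byte item size is an assumption, and nothing here measures the actual NumPy array struct.

```python
import ctypes

# Width of one pointer on this interpreter: 4 on 32-bit builds, 8 on 64-bit.
PTR = ctypes.sizeof(ctypes.c_void_p)


def extra_bytes(n_pointers):
    """Bytes added to every array header by n_pointers extra pointer fields."""
    return n_pointers * PTR


def relative_overhead(n_pointers, n_elements, itemsize=8):
    """Extra header bytes as a fraction of the data buffer size.

    itemsize=8 assumes float64-sized elements (an illustrative choice).
    """
    return extra_bytes(n_pointers) / (n_elements * itemsize)


print(extra_bytes(3))                   # 12 or 24, depending on pointer width
print(relative_overhead(3, 4))          # small array: a large fraction
print(relative_overhead(3, 1_000_000))  # large array: negligible
```

The per-array cost is fixed, so it only matters when a program holds many small arrays.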
>
> As Chuck points out, 3 more pointers is not necessarily that big of a deal if
> you are talking about a large array (though for small arrays it could
> matter). But, I personally know of half-written NEPs that propose to add
> more pointers to the NumPy array:
>
> * to allow meta-information to be attached to a NumPy array
> * to allow labels to be attached to a NumPy array (à la data-array)
> * to allow multiple chunks for an array.
>
> Are people O.K. with 5 or 6 more pointers on every NumPy array? We could
> also think about adding just one more pointer to a new "enhanced" structure
> that contains multiple enhancements to the NumPy array.
>
>
> Yes, this whole thing could get out of hand with too many extras. One of the
> things you could discuss with Mark is how to deal with this, or limit the
> modifications. At some point the ndarray class could become cumbersome,
> complicated, and difficult to maintain. We need to be careful that it doesn't
> go that way. I'd like to keep it as simple as possible, the question is what
> is fundamental. The main long term advantage of having masks part of the base
> is the possibility of adapted loops in ufuncs, which would give the advantage
> of speed. But that is just how it looks from where I stand, no doubt others
> have different priorities.
>
> But, this whole line of discussion sounds a lot like a true sub-class of the
> NumPy array at the C-level. It has the benefit that only people that use
> the features of the sub-class have to worry about using the extra space.
>
> Mark and I will talk about this long and hard. Mark has ideas about where he
> wants to see NumPy go, but I don't think we have fully accounted for where
> NumPy and its user base *is* and there may be better ways to approach this
> evolution. If others are interested in the outcome of the discussion
> please speak up (either on the list or privately) and we will make sure your
> views get heard and accounted for.
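
The "one more pointer to an enhanced structure" idea can be sketched with ctypes layouts. Everything below is hypothetical: the struct names, the field names, and the choice of three mask fields plus meta, labels, and chunks are illustrative assumptions, and the real array header has many more members. The sketch only compares the per-array cost of the two layouts.

```python
import ctypes

PTR = ctypes.sizeof(ctypes.c_void_p)


class InlineHeader(ctypes.Structure):
    """Hypothetical header where every enhancement adds its own pointer."""
    _fields_ = [
        ("data", ctypes.c_void_p),
        ("mask_a", ctypes.c_void_p),   # three mask-related pointers
        ("mask_b", ctypes.c_void_p),
        ("mask_c", ctypes.c_void_p),
        ("meta", ctypes.c_void_p),     # meta-information
        ("labels", ctypes.c_void_p),   # axis labels
        ("chunks", ctypes.c_void_p),   # multiple chunks
    ]


class Extensions(ctypes.Structure):
    """Hypothetical side structure, allocated only when a feature is used."""
    _fields_ = [
        ("mask_a", ctypes.c_void_p),
        ("mask_b", ctypes.c_void_p),
        ("mask_c", ctypes.c_void_p),
        ("meta", ctypes.c_void_p),
        ("labels", ctypes.c_void_p),
        ("chunks", ctypes.c_void_p),
    ]


class SlimHeader(ctypes.Structure):
    """Hypothetical header with a single pointer to the optional extensions."""
    _fields_ = [
        ("data", ctypes.c_void_p),
        ("ext", ctypes.POINTER(Extensions)),  # NULL for plain arrays
    ]


plain = SlimHeader()              # ext defaults to NULL
enhanced = SlimHeader()
ext = Extensions()                # keep a reference so the memory stays alive
enhanced.ext = ctypes.pointer(ext)

print(ctypes.sizeof(InlineHeader) // PTR)  # 7 pointer-sized slots per array
print(ctypes.sizeof(SlimHeader) // PTR)    # 2 pointer-sized slots per array
print(bool(plain.ext), bool(enhanced.ext))
```

In this layout a plain array pays for one extra pointer, and only arrays that use masks, labels, or chunks pay for the side structure, which is the same trade-off as the C-level sub-class mentioned above.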
>
> Chuck
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion