On Apr 16, 2012, at 11:59 PM, Matthew Brett wrote:

> Hi,
>
> On Mon, Apr 16, 2012 at 8:40 PM, Travis Oliphant <tra...@continuum.io> wrote:
>>>>
>>>> I think the answer to this is yes, but it could be as a feature-filled
>>>> sub-class (like the current numpy.ma, except in C).
>>>
>>> I'd love to hear that argument fleshed out in more detail - do you have
>>> time?
>>
>>
>> My proposal here is to basically take the current github NumPy
>> data-structure and make this a sub-type (in C) of the NumPy 1.6
>> data-structure which is unchanged in NumPy 1.7.
>>
>> This would not require removing code but would require another PyTypeObject
>> and associated structures. I expect Mark could do this work in 2-4 weeks.
>> We also have other developers who could help in order to get the sub-type in
>> NumPy 1.7. What kind of details would you like to see?
>
> I was dimly thinking of the same questions that Chuck had - about how
> subclassing would relate to the ufunc changes.

Basically, there are two sets of changes as far as I understand right now:

1) ufunc infrastructure understands masked arrays
2) ndarray grew attributes to represent masked arrays

I am proposing that we keep 1) but change 2) so that only certain kinds of NumPy arrays actually have the extra function pointers (effectively a sub-type).

In essence, what I'm proposing is that the NumPy 1.6 PyArrayObject become a base-object, but the other members of the C-structure are not even present unless the Masked flag is set. Such changes would not require ripping code out --- just altering the presentation a bit. Yet, they could have large long-term implications that we should explore before they get fixed.

Whether masked arrays should be a formal sub-class is actually an unrelated question, and I generally lean in the direction of not encouraging sub-classes of the ndarray. The big questions are: does this object work in the calculation infrastructure? Can I add an array to a masked array? Does it have a sum method? I think it could be argued that a masked array does have an "is a" relationship with an array. It can also be argued that it is better to have a "has a" relationship with an array and be its own object. Either way, this object could still have its first part be binary compatible with a NumPy array, and that is what I'm really suggesting.

-Travis

>
>> I just think we need more data and uses and this would provide a way to get
>> that without making a forced decision one way or another.
>
> Is the proposal that this would be an alternative API to numpy.ma?
> Is numpy.ma not itself satisfactory as a test of these uses, because
> of performance or some other reason?
>
>>>>> 2) Will likely changes to the masked array API make any difference to
>>>>> the number of extra pointers? Likely answer no?
>>>>>
>>>>> Is that right?
>>>>
>>>> The answer to this is very likely no on the Python side. But, on the
>>>> C-side, there could be some differences (i.e. are masked arrays a
>>>> sub-class of the ndarray or not).
>>>>
>>>>>
>>>>> I have the impression that the masked array API discussion still has
>>>>> not come out fully into the unforgiving light of discussion day, but
>>>>> if the answer to 2) is No, then I suppose the API discussion is not
>>>>> relevant to the 3 pointers change.
>>>>
>>>> You are correct that the API discussion is separate from this one.
>>>> Overall, I was surprised at how fervently people would oppose ABI
>>>> changes. As has been pointed out, NumPy and Numeric before it were not
>>>> really designed to prevent having to recompile when changes were made.
>>>> I'm still not sure that a better overall solution is not to promote better
>>>> availability of downstream binary packages than excessively worry about
>>>> ABI changes in NumPy. But, that is the current climate.
>>>
>>> The objectors object to any binary ABI change, but not specifically
>>> three pointers rather than two or one?
>>
>> Adding pointers is not really an ABI change (but removing them after they
>> were there would be...) It's really just the addition of data to the NumPy
>> array structure that they aren't going to use. Most of the time it would
>> not be a real problem (the number of use-cases where you have a lot of small
>> NumPy arrays is small), but when it is a problem it is very annoying.
>>
>>>
>>> Is their point then about ABI breakage? Because that seems like a
>>> different point again.
>>
>> Yes, it's not that.
>>
>>>
>>> Or is it possible that they are in fact worried about the masked array API?
>>
>> I don't think most people whose opinion would be helpful are really tuned in
>> to the discussion at this point. I think they just want us to come up with
>> an answer and then move forward. But, they will judge us based on the
>> solution we come up with.
>>
>>>
>>>> Mark and I will talk about this long and hard. Mark has ideas about where
>>>> he wants to see NumPy go, but I don't think we have fully accounted for
>>>> where NumPy and its user base *is* and there may be better ways to
>>>> approach this evolution. If others are interested in the outcome of the
>>>> discussion please speak up (either on the list or privately) and we will
>>>> make sure your views get heard and accounted for.
>>>
>>> I started writing something about this but I guess you'd know what I'd
>>> write, so I only humbly ask that you consider whether it might be
>>> doing real damage to allow substantial discussion that is not
>>> documented or argued out in public.
>>
>> It will be documented and argued in public. We are just going to have
>> one off-list conversation to try and speed up the process. You make a
>> valid point, and I appreciate the perspective. Please speak up again
>> after hearing the report if something is not clear. I don't want this to
>> even have the appearance of a "back-room" deal.
>>
>> Mark and I will have conversations about NumPy while he is in Austin.
>> There are many other active stake-holders whose opinions and views are
>> essential for major changes. Mark and I are working on other things
>> besides just NumPy and all NumPy changes will be discussed on list and
>> require consensus or super-majority for NumPy itself to change. I'm not
>> sure if that helps. Is there more we can do?
>
> As you might have heard me say before, my concern is that it has not
> been easy to have good discussions on this list. I think the problem
> has been that it has not been clear what the culture was, and how
> decisions got made, and that had led to some uncomfortable and
> unhelpful discussions. My plea would be for you as BDF$N to strongly
> encourage on-list discussions and discourage off-list discussions as
> far as possible, and to help us make the difficult public effort to
> bash out the arguments to clarity and consensus. I know that's a big
> ask.
>
> See you,
>
> Matthew
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
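
The layout idea in Travis's reply above (the NumPy 1.6 PyArrayObject as a base object, with the extra members present only when a Masked flag is set, and the first part staying binary compatible) can be illustrated with a minimal C sketch. This is not NumPy code: BaseArray, MaskedArray, ARRAY_MASKED, and array_sum are hypothetical names, the real PyArrayObject has many more fields, and the mask convention (1 = exposed) is an assumption made only for the example.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t, NULL */

/* Hypothetical, heavily simplified stand-in for the NumPy 1.6 array struct.
   Only the layout idea matters here, not the actual fields. */
enum { ARRAY_MASKED = 0x1 };   /* made-up flag bit for this sketch */

typedef struct {
    double *data;     /* element buffer */
    size_t  size;     /* number of elements */
    int     flags;    /* ARRAY_MASKED is set only for the extended layout */
} BaseArray;

/* Extended layout: the base struct comes first, the mask member follows.
   Plain arrays never allocate or carry the extra member. */
typedef struct {
    BaseArray base;                /* must stay the first member */
    const unsigned char *mask;     /* assumed convention: 1 = exposed, 0 = masked */
} MaskedArray;

static_assert(offsetof(MaskedArray, base) == 0,
              "base must be the first member");

/* Sums any array through the base view. The mask is read only when the
   flag says the object really has the extended layout. */
static double array_sum(const BaseArray *arr)
{
    const unsigned char *mask = NULL;
    if (arr->flags & ARRAY_MASKED) {
        mask = ((const MaskedArray *)arr)->mask;
    }
    double total = 0.0;
    for (size_t i = 0; i < arr->size; i++) {
        if (mask == NULL || mask[i]) {
            total += arr->data[i];
        }
    }
    return total;
}
```

Because the base struct is the first member, code that knows only about BaseArray can accept either kind of object unchanged, which is the "binary compatible first part" property. Whether the extended object is also exposed as a formal Python sub-class is a separate decision that this layout does not force.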