On Apr 16, 2012, at 11:59 PM, Matthew Brett wrote:
> Hi,
>
> On Mon, Apr 16, 2012 at 8:40 PM, Travis Oliphant <[email protected]> wrote:
>>>>
>>>> I think the answer to this is yes, but it could be as a feature-filled
>>>> sub-class (like the current numpy.ma, except in C).
>>>
>>> I'd love to hear that argument fleshed out in more detail - do you have
>>> time?
>>
>>
>> My proposal here is to basically take the current github NumPy
>> data-structure and make this a sub-type (in C) of the NumPy 1.6
>> data-structure which is unchanged in NumPy 1.7.
>>
>> This would not require removing code but would require another PyTypeObject
>> and associated structures. I expect Mark could do this work in 2-4 weeks.
>> We also have other developers who could help in order to get the sub-type in
>> NumPy 1.7. What kind of details would you like to see?
>
> I was dimly thinking of the same questions that Chuck had - about how
> subclassing would relate to the ufunc changes.

Basically, there are two sets of changes as far as I understand right now:

1) ufunc infrastructure understands masked arrays
2) ndarray grew attributes to represent masked arrays

I am proposing that we keep 1) but change 2) so that only certain kinds of
NumPy arrays actually have the extra function pointers (effectively a
sub-type). In essence, what I'm proposing is that the NumPy 1.6 PyArrayObject
become a base object, and that the other members of the C structure not even
be present unless the Masked flag is set. Such changes would not require
ripping code out, just altering the presentation a bit. Yet they could have
large long-term implications that we should explore before they get fixed.

Whether masked arrays should be a formal sub-class is actually an unrelated
question, and I generally lean in the direction of not encouraging sub-classes
of the ndarray. The big questions are: does this object work in the
calculation infrastructure? Can I add an array to a masked array? Does it
have a sum method? I think it could be argued that a masked array does have an
"is a" relationship with an array. It can also be argued that it is better to
have a "has a" relationship with an array and be its own object. Either way,
this object could still have its first part be binary compatible with a NumPy
array, and that is what I'm really suggesting.

-Travis
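
As a rough illustration of the "first part is binary compatible" idea above,
here is a minimal sketch in C. All names in it (BaseArray, MaskedArray,
ARRAY_MASKED, array_ndim, array_mask, and the three mask_* members) are
hypothetical stand-ins chosen for this example; they are not the actual NumPy
structures or API. The sketch only assumes that the extended struct embeds
the base struct as its first member and that a flag bit says whether the
extra members exist.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the NumPy 1.6 array struct; NOT the real
 * PyArrayObject, just enough fields to show the layout idea. */
typedef struct {
    char *data;   /* pointer to the element buffer */
    int nd;       /* number of dimensions */
    int flags;    /* bit flags describing the array */
} BaseArray;

/* Hypothetical flag bit marking an array that carries mask members. */
#define ARRAY_MASKED 0x1

/* Hypothetical extended struct: the base struct comes first, so a
 * pointer to a MaskedArray is also a valid pointer to a BaseArray. */
typedef struct {
    BaseArray base;
    char *mask_data;      /* three extra pointers, present only here */
    void *mask_dtype;
    long *mask_strides;
} MaskedArray;

/* The binary-compatibility requirement, checked at compile time. */
_Static_assert(offsetof(MaskedArray, base) == 0,
               "base struct must be the first member");

/* Code written against the base layout works on both kinds of array. */
int array_ndim(const BaseArray *a)
{
    assert(a != NULL);
    return a->nd;
}

/* Mask-aware code checks the flag before touching the extra members. */
char *array_mask(BaseArray *a)
{
    assert(a != NULL);
    if (!(a->flags & ARRAY_MASKED)) {
        return NULL;  /* plain array: the extra members do not exist */
    }
    return ((MaskedArray *)a)->mask_data;
}
```

With this layout, code compiled against the base struct never sees the extra
pointers, and a plain array does not pay for them. Only code that checks the
flag reaches past the base part.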
>
>> I just think we need more data and uses and this would provide a way to get
>> that without making a forced decision one way or another.
>
> Is the proposal that this would be an alternative API to numpy.ma?
> Is numpy.ma not itself satisfactory as a test of these uses, because
> of performance or some other reason?
>
>>>>> 2) Will likely changes to the masked array API make any difference to
>>>>> the number of extra pointers? Likely answer no?
>>>>>
>>>>> Is that right?
>>>>
>>>> The answer to this is very likely no on the Python side. But, on the
>>>> C-side, there could be some differences (i.e. are masked arrays a
>>>> sub-class of the ndarray or not).
>>>>
>>>>>
>>>>> I have the impression that the masked array API discussion still has
>>>>> not come out fully into the unforgiving light of discussion day, but
>>>>> if the answer to 2) is No, then I suppose the API discussion is not
>>>>> relevant to the 3 pointers change.
>>>>
>>>> You are correct that the API discussion is separate from this one.
>>>> Overall, I was surprised at how fervently people would oppose ABI
>>>> changes. As has been pointed out, NumPy and Numeric before it were not
>>>> really designed to prevent having to recompile when changes were made.
>>>> I'm still not sure that promoting better availability of downstream binary
>>>> packages is not a better overall solution than worrying excessively about
>>>> ABI changes in NumPy. But, that is the current climate.
>>>
>>> The objectors object to any binary ABI change, but not specifically
>>> three pointers rather than two or one?
>>
>> Adding pointers is not really an ABI change (but removing them after they
>> were there would be...) It's really just the addition of data to the NumPy
>> array structure that they aren't going to use. Most of the time it would
>> not be a real problem (the number of use-cases where you have a lot of small
>> NumPy arrays is small), but when it is a problem it is very annoying.
>>
>>>
>>> Is their point then about ABI breakage? Because that seems like a
>>> different point again.
>>
>> Yes, it's not that.
>>
>>>
>>> Or is it possible that they are in fact worried about the masked array API?
>>
>> I don't think most people whose opinion would be helpful are really tuned in
>> to the discussion at this point. I think they just want us to come up with
>> an answer and then move forward. But, they will judge us based on the
>> solution we come up with.
>>
>>>
>>>> Mark and I will talk about this long and hard. Mark has ideas about where
>>>> he wants to see NumPy go, but I don't think we have fully accounted for
>>>> where NumPy and its user base *is* and there may be better ways to
>>>> approach this evolution. If others are interested in the outcome of the
>>>> discussion please speak up (either on the list or privately) and we will
>>>> make sure your views get heard and accounted for.
>>>
>>> I started writing something about this but I guess you'd know what I'd
>>> write, so I only humbly ask that you consider whether it might be
>>> doing real damage to allow substantial discussion that is not
>>> documented or argued out in public.
>>
>> It will be documented and argued in public. We are just going to have
>> one off-list conversation to try and speed up the process. You make a
>> valid point, and I appreciate the perspective. Please speak up again
>> after hearing the report if something is not clear. I don't want this to
>> even have the appearance of a "back-room" deal.
>>
>> Mark and I will have conversations about NumPy while he is in Austin.
>> There are many other active stake-holders whose opinions and views are
>> essential for major changes. Mark and I are working on other things
>> besides just NumPy and all NumPy changes will be discussed on list and
>> require consensus or super-majority for NumPy itself to change. I'm not
>> sure if that helps. Is there more we can do?
>
> As you might have heard me say before, my concern is that it has not
> been easy to have good discussions on this list. I think the problem
> has been that it has not been clear what the culture was, and how
> decisions got made, and that has led to some uncomfortable and
> unhelpful discussions. My plea would be for you as BDF$N to strongly
> encourage on-list discussions and discourage off-list discussions as
> far as possible, and to help us make the difficult public effort to
> bash out the arguments to clarity and consensus. I know that's a big
> ask.
>
> See you,
>
> Matthew
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion