On Apr 16, 2012, at 11:01 PM, Charles R Harris wrote:

> 
> 
> On Mon, Apr 16, 2012 at 8:46 PM, Travis Oliphant <tra...@continuum.io> wrote:
> 
> On Apr 16, 2012, at 8:03 PM, Matthew Brett wrote:
> 
> > Hi,
> >
> > On Mon, Apr 16, 2012 at 3:06 PM, Travis Oliphant <tra...@continuum.io> wrote:
> >
> >> I have heard from a few people that they are not excited by the growth of
> >> the NumPy data-structure by the 3 pointers needed to hold the masked-array
> >> storage.  This is especially true when there is talk of potentially adding
> >> additional attributes to the NumPy array (for labels and other
> >> meta-information).  If you are willing to let us know how you feel about
> >> this, please speak up.
> >
> > I guess there are two questions here
> >
> > 1) Will something like the current version of masked arrays have a
> > long term future in numpy, regardless of eventual API? Most likely
> > answer - yes?
> 
> I think the answer to this is yes, but it could be as a feature-filled 
> sub-class (like the current numpy.ma, except in C).
> 
> I think making numpy.ma a subclass of ndarray has caused all sorts of 
> trouble. It doesn't satisfy 'is a', rather it tries to use inheritance from 
> ndarray for implementation of various parts. The upshot is that almost 
> everything has to be overridden, so it didn't buy much.

This is a valid point.  One could create a new object that is binary 
compatible with the NumPy array and provides the array interface without 
actually being a sub-class.  We could keep Mark's modifications to the array 
interface as well, so that it can communicate a mask.
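
As a rough sketch of what "binary compatible but not a sub-class" could mean 
at the struct level, the Python/ctypes snippet below lays out two unrelated 
structures that share the same leading fields.  The layouts and every name in 
it (BaseArray, MaskedArray, mask_dtype, mask_data, mask_strides) are 
illustrative assumptions, not the real NumPy array struct and not Mark's 
actual changes.

```python
import ctypes

# Hypothetical, heavily simplified array header. NOT the real NumPy struct.
BASE_FIELDS = [
    ("data", ctypes.c_void_p),
    ("nd", ctypes.c_int),
    ("dimensions", ctypes.POINTER(ctypes.c_ssize_t)),
    ("strides", ctypes.POINTER(ctypes.c_ssize_t)),
]

# Hypothetical mask-related pointers appended after the shared prefix.
MASK_FIELDS = [
    ("mask_dtype", ctypes.c_void_p),
    ("mask_data", ctypes.c_void_p),
    ("mask_strides", ctypes.POINTER(ctypes.c_ssize_t)),
]


class BaseArray(ctypes.Structure):
    _fields_ = BASE_FIELDS


class MaskedArray(ctypes.Structure):
    # Deliberately not a subclass of BaseArray: it only repeats the same
    # leading fields, so the first bytes of both structs have the same layout.
    _fields_ = BASE_FIELDS + MASK_FIELDS


masked = MaskedArray()
masked.nd = 2

# Code that only knows about BaseArray can read the shared prefix through
# a pointer cast, which is what binary compatibility means here.
base_view = ctypes.cast(
    ctypes.pointer(masked), ctypes.POINTER(BaseArray)
).contents

# Extra bytes carried only by the masked struct.
extra_bytes = ctypes.sizeof(MaskedArray) - ctypes.sizeof(BaseArray)

print("nd seen through the base view:", base_view.nd)
print("extra bytes for the mask pointers:", extra_bytes)
```

In this toy layout, only objects of the masked type carry the three extra 
pointers, while anything written against the base layout still reads the 
shared fields correctly.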

-Travis




>  
> 
> > 2) Will likely changes to the masked array API make any difference to
> > the number of extra pointers?  Likely answer no?
> >
> > Is that right?
> 
> The answer to this is very likely no on the Python side.  But on the C side, 
> there could be some differences (e.g., whether masked arrays are a sub-class 
> of the ndarray or not).
> 
> >
> > I have the impression that the masked array API discussion still has
> > not come out fully into the unforgiving light of discussion day, but
> > if the answer to 2) is No, then I suppose the API discussion is not
> > relevant to the 3 pointers change.
> 
> You are correct that the API discussion is separate from this one.  
> Overall, I was surprised at how fervently people would oppose ABI changes.  
> As has been pointed out, NumPy and Numeric before it were not really designed 
> to prevent having to recompile when changes were made.  I'm still not 
> convinced that promoting better availability of downstream binary packages 
> wouldn't be a better overall solution than worrying excessively about ABI 
> changes in NumPy.  But that is the current climate.
> 
> In that climate, my concern is that we haven't finalized the API but are 
> rapidly cementing the *structure* of NumPy arrays into a modified form that 
> has real downstream implications.  Two other people I have talked to share 
> this concern (neither has posted on this list before, but both are heavy 
> users of NumPy).  I may have missed the threads where it was discussed, but 
> have these structure changes and their implications been fully discussed?  
> Is there anyone else who is concerned about adding 3 more pointers (12 bytes 
> on 32-bit or 24 bytes on 64-bit platforms) to the NumPy structure?
> 
> As Chuck points out, 3 more pointers is not necessarily that big of a deal if 
> you are talking about a large array (though for small arrays it could 
> matter).   But, I personally know of half-written NEPs that propose to add 
> more pointers to the NumPy array:
> 
>        * to allow meta-information to be attached to a NumPy array
>        * to allow labels to be attached to a NumPy array (à la data-array)
>        * to allow multiple chunks for an array.
> 
> Are people O.K. with 5 or 6 more pointers on every NumPy array?    We could 
> also think about adding just one more pointer to a new "enhanced" structure 
> that contains multiple enhancements to the NumPy array.
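
To make the trade-off concrete, here is a minimal Python/ctypes sketch that 
compares the per-array cost of six inline pointers with the cost of a single 
pointer to a separately allocated "enhanced" structure.  The field names 
(mask_dtype, mask_data, mask_strides, meta, labels, chunks) and the struct 
names are made up for illustration and assume one pointer per proposed 
feature; none of them come from an actual NEP.

```python
import ctypes

# Pointer size depends on the build: 4 bytes on 32-bit, 8 bytes on 64-bit.
PTR_SIZE = ctypes.sizeof(ctypes.c_void_p)

# Hypothetical extras: three mask pointers plus one pointer each for
# meta-information, labels, and chunks.
EXTRA_FIELDS = [
    (name, ctypes.c_void_p)
    for name in (
        "mask_dtype", "mask_data", "mask_strides",
        "meta", "labels", "chunks",
    )
]


class InlineExtras(ctypes.Structure):
    # Option A: every array carries all six pointers, used or not.
    _fields_ = EXTRA_FIELDS


class Extension(ctypes.Structure):
    # Option B, part 1: the extras live in a separate structure that is
    # allocated only for arrays that actually use them.
    _fields_ = EXTRA_FIELDS


class OnePointer(ctypes.Structure):
    # Option B, part 2: every array carries just one pointer, which would
    # be NULL when no enhancement is in use.
    _fields_ = [("ext", ctypes.POINTER(Extension))]


def inline_overhead():
    """Bytes added to every array under option A."""
    return ctypes.sizeof(InlineExtras)


def indirect_overhead(uses_extras):
    """Bytes added under option B, with or without an allocated Extension."""
    cost = ctypes.sizeof(OnePointer)
    if uses_extras:
        cost += ctypes.sizeof(Extension)
    return cost


print("pointer size:", PTR_SIZE)
print("inline, every array:", inline_overhead())
print("indirect, plain array:", indirect_overhead(False))
print("indirect, enhanced array:", indirect_overhead(True))
```

Under these assumptions a plain array pays for one pointer instead of six, 
while an array that uses the enhancements pays for seven plus an extra 
indirection on every access to them.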
> 
> 
> Yes, this whole thing could get out of hand with too many extras. One of the 
> things you could discuss with Mark is how to deal with this, or limit the 
> modifications. At some point the ndarray class could become cumbersome, 
> complicated, and difficult to maintain. We need to be careful that it doesn't 
> go that way. I'd like to keep it as simple as possible; the question is what 
> is fundamental. The main long-term advantage of having masks as part of the 
> base is the possibility of adapted loops in ufuncs, which would give the 
> advantage of speed. But that is just how it looks from where I stand; no 
> doubt others have different priorities.
> 
> But, this whole line of discussion sounds a lot like a true sub-class of the 
> NumPy array at the C-level.    It has the benefit that only people that use 
> the features of the sub-class have to worry about using the extra space.
> 
> Mark and I will talk about this long and hard.  Mark has ideas about where he 
> wants to see NumPy go, but I don't think we have fully accounted for where 
> NumPy and its user base *are*, and there may be better ways to approach this 
> evolution.  If others are interested in the outcome of the discussion, 
> please speak up (either on the list or privately) and we will make sure your 
> views get heard and accounted for.
> 
> 
> Chuck 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
