Re: [Numpy-discussion] Missing data wrap-up and request for comments

Travis Oliphant Thu, 10 May 2012 22:14:33 -0700

On May 10, 2012, at 12:21 AM, Charles R Harris wrote:

> 
> 
> On Wed, May 9, 2012 at 11:05 PM, Benjamin Root <ben.r...@ou.edu> wrote:
> 
> 
> On Wednesday, May 9, 2012, Nathaniel Smith wrote:
> 
> 
> My only objection to this proposal is that committing to this approach
> seems premature. The existing masked array objects act quite
> differently from numpy.ma, so why do you believe that they're a good
> foundation for numpy.ma, and why will users want to switch to their
> semantics over numpy.ma's semantics? These aren't rhetorical
> questions; it seems like they must have concrete answers, but I don't
> know what they are.
> 
> Based on the design decisions made in the original NEP, a re-made numpy.ma 
> would have to lose _some_ features, particularly the ability to share masks. 
> Save for that and some very obscure behaviors that are undocumented, it is 
> possible to remake numpy.ma as a compatibility layer.
> 
> That being said, I think that there are some fundamental questions that have 
> caused concern. If I recall, there were unresolved questions about behaviors 
> surrounding assignments to elements of a view.
> 
> I see the project as broken down like this:
> 1.) internal architecture (largely abi issues)
> 2.) external architecture (hooks throughout numpy to utilize the new features 
> where possible such as where= argument)
> 3.) getter/setter semantics
> 4.) mathematical semantics
> 
> At this moment, I think we have pieces of 2 and they are fairly 
> non-controversial. It is 1 that I see as being the immediate hold-up here. 3 
> & 4 are non-trivial, but because they are mostly about interfaces, I think we 
> can be willing to accept some very basic, fundamental, barebones components 
> here in order to lay the groundwork for a more complete API later.
> 
> As for Travis's proposal, doing nothing is a no-go. Not moving forward 
> would dishearten the community. Making an ndmasked type is very intriguing. I 
> see it as a step towards eventually deprecating ndarray? Also, how would it 
> behave with np.asarray() and np.asanyarray()? My other concern is a possible 
> violation of DRY. How difficult would it be to maintain two ndarrays in 
> parallel?  
> 
> As for the flag approach, this still doesn't solve the problem of legacy code 
> (or did I misunderstand?)
> 
> My understanding of the flag is to allow the code to stay in and get reworked 
> and experimented with while keeping it from contaminating conventional use.
> 
> The whole point of putting the code in was to experiment and adjust. The 
> rather bizarre idea that it needs to be perfect from the get go is 
> disheartening, and is seldom how new things get developed. Sure, there is a 
> plan up front, but there needs to be feedback and change. And in fact, I 
> haven't seen much feedback about the actual code, I don't even know that the 
> people complaining have tried using it to see where it hurts. I'd like that 
> sort of feedback.
>


I don't think anyone is saying it needs to be perfect from the get go.    What 
I am saying is that this is fundamental enough to downstream users that this 
kind of thing is best done as a separate object.  The flag could still be used 
to make all Python-level array constructors build ndmasked objects.  

But this doesn't address the C-level story: quite a bit of downstream code 
treats the NumPy array as just a pointer to memory, without considering that 
there might be a mask attached that should be inspected as well. 

The NEP addresses this a little bit for those C or C++ consumers of the ndarray 
who always use PyArray_FromAny, which can fail if the array has non-NULL mask 
contents. However, it is *not* true that all downstream users use 
PyArray_FromAny. 

A large number of users just use something like PyArray_Check and then 
PyArray_DATA to get the pointer to the data buffer and then go from there 
thinking of their data as a strided memory chunk only (no extra mask).    The 
NEP fundamentally changes this simple invariant that has been in NumPy and 
Numeric before it for a long, long time. 
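
As a rough Python-level illustration of the same hazard (a sketch using 
today's numpy.ma, not the NEP's masked arrays and not the C API itself; the 
function name naive_total is made up for the example), a consumer that only 
looks at the data buffer silently includes values the mask was meant to hide:

```python
import numpy as np

# Three readings; the middle one is marked invalid by the mask.
readings = np.ma.masked_array([1.0, 999.0, 3.0], mask=[False, True, False])


def naive_total(arr):
    """Hypothetical consumer that treats its input as a plain strided buffer."""
    raw = np.asarray(arr)  # drops the MaskedArray subclass, and with it the mask
    return float(raw.sum())


# numpy.ma skips the masked element when summing.
mask_aware_total = float(readings.sum())

# The buffer-only consumer adds the masked 999.0 as if it were valid data.
buffer_only_total = naive_total(readings)

print(mask_aware_total, buffer_only_total)
```

The two totals differ, and nothing in the buffer-only path signals that a 
mask was ignored.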

I really don't see how we can do this in a 1.7 release. It has too many 
unknown, and I think unknowable, downstream effects. But I think we could 
introduce another array object that is the masked_array, with a Python-level 
flag that makes it the default array in Python. 

There are a few more subtleties. PyArray_Check will pass sub-classes by 
default, so if the new ndmasked array were a sub-class it would pass that 
check (just like current numpy.ma arrays and matrices pass it today). 
However, there is a PyArray_CheckExact macro which could be used to ensure the 
object is actually of PyArray_Type. There is also the PyArg_ParseTuple 
command with "O!" that I have seen used many times to ensure an exact NumPy 
array.  
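
The sub-class distinction can be sketched from Python, assuming isinstance() 
and an exact type() comparison as loose stand-ins for PyArray_Check and 
PyArray_CheckExact (the helper names below are invented for the example, and 
current numpy.ma stands in for the proposed ndmasked type):

```python
import numpy as np

plain = np.zeros(3)
masked = np.ma.masked_array(np.zeros(3), mask=[True, False, False])


def passes_check(obj):
    # Loose analogue of PyArray_Check: sub-classes of ndarray are accepted.
    return isinstance(obj, np.ndarray)


def passes_check_exact(obj):
    # Loose analogue of PyArray_CheckExact: only the base ndarray type itself.
    return type(obj) is np.ndarray


results = {
    "plain": (passes_check(plain), passes_check_exact(plain)),
    "masked": (passes_check(masked), passes_check_exact(masked)),
}

# asanyarray lets the sub-class through; asarray converts to a base ndarray.
kept_type = type(np.asanyarray(masked))
base_type = type(np.asarray(masked))

print(results, kept_type.__name__, base_type.__name__)
```

Code guarded only by the permissive check would accept the masked array, 
while the exact check would reject it.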

-Travis






> Chuck
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
