[Numpy-discussion] Missing data wrap-up and request for comments

Travis Oliphant Wed, 09 May 2012 09:47:03 -0700

Hey all, 

Nathaniel and Mark have worked very hard on a joint document that tries to explain 
the current status of the missing-data debate.   I think they've done an 
amazing job of providing some context, articulating their views, and suggesting 
ways forward in a mutually respectful manner.   This is an exemplary 
collaboration and is at the core of why open source is valuable.


The document is available here: 
   https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst

After reading that document, it appears to me that there are some fundamentally 
different views on how things should move forward.   I'm also reading the 
document through the lens of my own understanding of NumPy's history and of the 
many users I've met and interacted with.  That gives me a perspective that is not 
necessarily captured in the document but that informs my recommendations.   I'm 
not sure we can reach full consensus on this.   We are also well past the time 
for settling on a resolution (perhaps we can all agree on that).

I would like one more thread where the technical discussion can take place. 
I ask that we keep this discussion as free from logical fallacies 
(http://en.wikipedia.org/wiki/Logical_fallacy) as we can.   I can't guarantee 
that I personally will succeed at that, but I can tell you that I will try. 
That's all I'm asking of anyone else.   I recognize that there are a lot of 
other issues at play here besides *just* the technical questions, but we are 
not going to resolve every community issue in this technical thread. 

We need concrete proposals, so I will start with three.   Please feel free to 
comment on these proposals or add your own during the discussion.   I will stop 
paying attention to this thread next Wednesday, May 16th (or earlier if the 
thread dies), and I hope that by then we can agree on a way forward.   If we 
don't have agreement, I will move forward with what I think is the right 
approach.   I will either write the code myself or convince someone else to 
write it. 

In all cases, we agree that bit-pattern dtypes should be added to NumPy.   We 
should start with int32, float64, complex64, str, and bool.   The three 
proposals are therefore independent of this work; they are all about the extra 
mask part:  

My three proposals: 

        * do nothing and leave things as is 

        * add a global flag that turns off masked array support by default but 
otherwise leaves things unchanged (I'm still unclear how this would work 
exactly)

        * move Mark's "masked ndarray objects" into a new fundamental type 
(ndmasked), leaving the actual ndarray type unchanged.  The array_interface 
keeps the masked-array notions, and the ufuncs keep the ability to handle 
ndmasked-like arrays.   Ideally, numpy.ma would be changed to use ndmasked 
objects as its core. 
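
To make the difference between the two representations concrete, here is a 
minimal sketch using only existing NumPy APIs.  It is an illustration, not the 
NA implementation under discussion: NaN stands in for a bit-pattern NA (a real 
bit-pattern dtype would reserve its own value, including for integer types), 
and numpy.ma.MaskedArray stands in for the separate-mask approach that the 
proposed ndmasked type would provide.

```python
import numpy as np

# Bit-pattern style, approximated here with NaN (an assumption: NaN is only a
# stand-in for a dedicated NA bit pattern).  The missing marker is stored in
# the data itself, so no extra memory is needed, but the original value is
# overwritten and the dtype must have a spare value to use as the marker.
bitpattern = np.array([1.0, np.nan, 3.0])
bp_mean = np.nanmean(bitpattern)          # NaN skipped: 2.0

# Mask style: a separate boolean array flags missing entries, so it works for
# any dtype (int32 here) and the underlying value is kept.
data = np.array([1, 5, 3], dtype=np.int32)
masked = np.ma.masked_array(data, mask=[False, True, False])
ma_mean = masked.mean()                   # masked 5 ignored: 2.0
hidden_value = masked.data[1]             # still 5 under the mask

# Clearing the mask exposes the original value again, which a bit-pattern NA
# cannot do because the value was replaced by the marker.
masked.mask = False
full_mean = masked.mean()                 # all three values: 3.0
```

The trade-off shown is the heart of the debate: a bit pattern costs no extra 
memory but destroys the value it replaces, while a mask costs an extra boolean 
per element but lets the value be recovered by unmasking.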

For the record, I'm currently in favor of the third proposal.   Feel free to 
comment on these proposals (or provide your own). 

Best regards,

-Travis

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion