On 05/10/2012 01:01 AM, Matthew Brett wrote:
> Hi,
>
> On Wed, May 9, 2012 at 12:44 PM, Dag Sverre Seljebotn
> <d.s.seljeb...@astro.uio.no> wrote:
>> On 05/09/2012 06:46 PM, Travis Oliphant wrote:
>>> Hey all,
>>>
>>> Nathaniel and Mark have worked very hard on a joint document to try and
>>> explain the current status of the missing-data debate. I think they've
>>> done an amazing job at providing some context, articulating their views
>>> and suggesting ways forward in a mutually respectful manner. This is an
>>> exemplary collaboration and is at the core of why open source is valuable.
>>>
>>> The document is available here:
>>> https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst
>>>
>>> After reading that document, it appears to me that there are some
>>> fundamentally different views on how things should move forward. I'm
>>> also reading the document incorporating my understanding of the history
>>> of NumPy as well as all of the users I've met and interacted with, which
>>> means I have my own perspective that is not necessarily incorporated
>>> into that document but informs my recommendations. I'm not sure we can
>>> reach full consensus on this. We are also well past time for moving
>>> forward with a resolution on this (perhaps we can all agree on that).
>>>
>>> I would like one more discussion thread where the technical discussion
>>> can take place. I will make a plea that we keep this discussion as free
>>> from logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as
>>> we can. I can't guarantee that I personally will succeed at that, but I
>>> can tell you that I will try. That's all I'm asking of anyone else. I
>>> recognize that there are a lot of other issues at play here besides
>>> *just* the technical questions, but we are not going to resolve every
>>> community issue in this technical thread.
>>>
>>> We need concrete proposals and so I will start with three.
>>> Please feel free to comment on these proposals or add your own during
>>> the discussion. I will stop paying attention to this thread next
>>> Wednesday (May 16th) (or earlier if the thread dies) and hope that by
>>> that time we can agree on a way forward. If we don't have agreement,
>>> then I will move forward with what I think is the right approach. I
>>> will either write the code myself or convince someone else to write it.
>>>
>>> In all cases, we have agreement that bit-pattern dtypes should be added
>>> to NumPy. We should work on these (int32, float64, complex64, str, bool)
>>> to start. So, the three proposals are independent of this way forward.
>>> The proposals are all about the extra mask part:
>>>
>>> My three proposals:
>>>
>>> * do nothing and leave things as is
>>>
>>> * add a global flag that turns off masked array support by default but
>>> otherwise leaves things unchanged (I'm still unclear how this would work
>>> exactly)
>>>
>>> * move Mark's "masked ndarray objects" into a new fundamental type
>>> (ndmasked), leaving the actual ndarray type unchanged. The
>>> array_interface keeps the masked array notions and the ufuncs keep the
>>> ability to handle arrays like ndmasked. Ideally, numpy.ma would be
>>> changed to use ndmasked objects as its core.
>>>
>>> For the record, I'm currently in favor of the third proposal. Feel free
>>> to comment on these proposals (or provide your own).
>>>
>>
>> Bravo! NA-overview.rst was an excellent read. Thanks Nathaniel and Mark!
>
> Yes, it is very well written, my compliments to the chefs.
>
>> The third proposal is certainly the best one from Cython's perspective,
>> and, I imagine, for those writing C extensions against the C API too.
>> Having PyType_Check fail for ndmasked is a very good way of having code
>> fail that is not written to take masks into account.
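The type-check argument for the third proposal can be illustrated with a minimal pure-Python sketch. Everything in it is hypothetical: `NDArray`, `NDMaskedSubclass`, `NDMasked` and `pass_to_c` are illustrative stand-ins, not NumPy's API. `isinstance` plays the role of a C-level check such as `PyArray_Check`, which (like `isinstance`) also accepts subclasses. A masked array that still passes the plain type check reaches mask-unaware code unnoticed, while a separate, unrelated type is rejected up front.

```python
# Hypothetical stand-ins for illustration only; these are not NumPy classes.
class NDArray:
    """Plain array: just a flat buffer of values."""
    def __init__(self, data):
        self.data = list(data)


class NDMaskedSubclass(NDArray):
    """Masked array that still passes the plain type check (as an ndarray
    subclass such as numpy.ma.MaskedArray does)."""
    def __init__(self, data, mask):
        super().__init__(data)
        self.mask = list(mask)


class NDMasked:
    """Masked array as a separate fundamental type (the third proposal):
    deliberately NOT a subclass of NDArray."""
    def __init__(self, data, mask):
        self.data = list(data)
        self.mask = list(mask)


def pass_to_c(arr):
    """Mimics mask-unaware extension code: check the type, then read the
    raw buffer. It never looks at a mask, so every element is used."""
    if not isinstance(arr, NDArray):
        raise TypeError("expected NDArray, got %s" % type(arr).__name__)
    return sum(arr.data)


plain = NDArray([1.0, 2.0, 3.0])
sub = NDMaskedSubclass([1.0, 2.0, 3.0], mask=[False, True, False])
separate = NDMasked([1.0, 2.0, 3.0], mask=[False, True, False])

print(pass_to_c(plain))  # 6.0
print(pass_to_c(sub))    # 6.0: the masked element is silently included
try:
    pass_to_c(separate)
except TypeError as exc:
    print(exc)           # expected NDArray, got NDMasked
```

The design point is that failing loudly is only possible when the masked type is not a subtype of the plain one; any subtype (or a mask stored inside the plain type) passes the check by construction.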

I want to make something clearer: there are two Cython cases. In the
case of "cdef np.ndarray[double]" there is no problem, as PEP 3118 access
will raise an exception for masked arrays. But there's also the case where
you do "cdef np.ndarray" and then proceed to use PyArray_DATA. I myself
do this more often than PEP 3118 access, usually because I pass the data
pointer to some C or C++ code. It'd be great to have such code be
forward-compatible in the sense that it raises an exception when it
meets a masked array. Having PyType_Check fail seems like the only way?
Am I wrong?

> Mark, Nathaniel - can you comment on how your chosen approaches would
> interact with extension code?
>
> I'm guessing the bitpattern dtypes would be expected to cause
> extension code to choke if the type is not supported?

The proposal, as I understand it, is to use that with new dtypes (?). So
things will often be fine for that reason:

    # c_function_32bit stands for any C routine expecting a float32 buffer;
    # a new NA-bitpattern dtype would not compare equal to np.float32
    if arr.dtype == np.float32:
        c_function_32bit(np.PyArray_DATA(arr), ...)
    else:
        raise ValueError("need 32-bit float array")

>
> Mark - in:
>
> https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst#cython
>
> - do I understand correctly that you think that Cython and other
> extension writers should use the numpy API to access the data rather
> than accessing it directly via the data pointer and strides?

That's not really fleshed out (for all the different use cases etc.); I
read that as "let's discuss Cython later, when this is actively used in
NumPy". That sounds reasonable to me.

Dag
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion