On 05/11/2012 07:36 AM, Travis Oliphant wrote:
>>>
>>> I guess this mixture of Python-API and C-API is different from the way
>>> the API tries to protect incorrect access. From the Python API, it
>>> should let everything through, because it's for Python code to use. From
>>> the C API, it should default to not letting things through, because
>>> special NA-mask aware code needs to be written. I'm not sure if there is
>>> a reasonable approach here which works for everything.
>>
>> Does that mean you consider changing ob_type for masked arrays
>> unreasonable? They can still use the same object struct...
>>
>>>
>>> But in general, I will often be lazy and just do
>>>
>>> def f(np.ndarray arr):
>>>     c_func(np.PyArray_DATA(arr))
>>>
>>> It's an exception if you don't provide an array -- so who cares. (I
>>> guess the odds of somebody feeding a masked array to code like that,
>>> which doesn't try to be friendly, is relatively smaller though.)
>>>
>>>
>>> This code would already fail with non-contiguous strides or byte-swapped
>>> data, so the additional NA mask case seems to fit in an already-failing
>>> category.
>>
>> Honestly! I hope you didn't think I provided a full-fledged example?
>> Perhaps you'd like to point out to me that "c_func" is a bad name for a
>> function as well?
>>
>> One would of course check that things are contiguous (or pass on the
>> strides), check the dtype and dispatch to different C functions in each
>> case, etc.
>>
>> But that isn't the point. Scientific code most of the time does fall in
>> the "already-failing" category. That doesn't mean it doesn't count.
>> Let's focus on the number of code lines written and developer hours that
>> will be spent cleaning up the mess -- not the "validity" of the code in
>> question.
>>
>>>
>>>
>>> If you know the datatype, you can really do
>>>
>>> def f(np.ndarray[double] arr):
>>>     c_func(&arr[0])
>>>
>>> which works with PEP 3118.
>>> But I use PyArray_DATA out of habit (and
>>> since it works in the cases without dtype).
>>>
>>> Frankly, I don't expect any Cython code to do the right thing here;
>>> calling PyArray_FromAny is much more typing. And really, nobody ever
>>> questioned that if we had an actual ndarray instance, we'd be allowed to
>>> call PyArray_DATA.
>>>
>>> I don't know how much Cython code is out there in the wild for which
>>> this is a problem. Either way, it would cause something of a reeducation
>>> challenge for Cython users.
>>>
>>>
>>> Since this style of coding already has known problems, do you think the
>>> case with NA-masks deserves more attention here? What will happen is
>>> access to array element data without consideration of the mask, which
>>> seems similar in nature to accessing array data with the wrong stride or
>>> byte order.
>>
>> I don't agree with the premise of that paragraph. There's no reason to
>> assume that just because code doesn't call FromAny, it has problems.
>> (And I'll continue to assume that whatever array is returned from
>> "np.ascontiguousarray" is really contiguous...)
>>
>> Whether it requires attention or not is a different issue though. I'm
>> not sure. I think other people should weigh in on that -- I mostly write
>> code for my own consumption.
>>
>> One should at least check pandas, scikits-image, scikits-learn, mpi4py,
>> petsc4py, and so on. And ask on the Cython users list. Hopefully it will
>> usually be PEP 3118. But now I need to turn in.
>>
>> Travis, would such a survey be likely to affect the outcome of your
>> decision in any way? Or should we just leave this for now?
>>
>
> This dialog gets at the heart of the matter, I think. The NEP seems to want
> NumPy to have a "better" API that always protects downstream users from
> understanding what is actually under the covers. It would prefer to push
> NumPy in the direction of an array object that is fundamentally more opaque.
> However, the world NumPy lives in is decidedly not opaque. There has been
> significant education and shared understanding of what a NumPy array actually
> *is* (a strided view of memory of a particular "dtype"). This shared
> understanding has even been pushed into Python as the buffer protocol. It
> is very common for extension modules to go directly to the data they want by
> using this understanding.
>
> This is very different from the traditional "shield your users from how
> things are actually done" view of most object APIs. It was actually
> intentional. I'm not saying that different choices could not have been
> made or that some amount of shielding should never be contemplated. I'm
> just saying that NumPy has been used as a nice bridge between the world of
> scientific computing codes that have chunks of memory allocated for
> processing and high-level code. Part of the reason for this bridge has been
> the simple object model.
>
> I just don't think the NEP fully appreciates just how fundamental of a shift
> this is in the wider NumPy community and it is not something that can be done
> immediately or without careful attention.
>
> Dag is an *active* member in that larger group of C-consumers of NumPy
> arrays. As a long-time member of that group myself, this is where my
> concerns are coming from. So far I am not hearing anything to alleviate
> those concerns.
>
> See my post in the other thread for my proposal to add a flag that allows
> users to switch between the Python side default being ndarrays or ndmasked,
> but they are different types at the C-level. The proposal so far does not
> specify whether or not ndarray or ndmasked is a subclass of the other.
> Given the history of numpy.ma and the fact that it makes sense on the
> C-level, I would lean toward ndmasked being a sub-class of ndarray --- thus a
> C-user would have to do a PyArray_CheckExact to ensure they are getting a
> base Python Array Object --- which they would have to do anyway because
> numpy.ma arrays also pass PyArray_Check.
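
The difference between the two checks mentioned above can be sketched in plain Python. This is only an illustration under stated assumptions: `NDArray` and `NDMasked` are hypothetical stand-in classes, not NumPy types. `isinstance` plays the role of a subclass-tolerant check such as PyArray_Check or PyObject_TypeCheck, and an exact `type(...) is` test plays the role of PyArray_CheckExact.

```python
# Hypothetical stand-ins for illustration only; these are NOT NumPy types.
class NDArray:
    """Plays the role of the base array type."""

    def __init__(self, data):
        self.data = list(data)


class NDMasked(NDArray):
    """Hypothetical masked array declared as a subclass of NDArray."""

    def __init__(self, data, mask):
        super().__init__(data)
        self.mask = list(mask)


def check(obj):
    """Subclass-tolerant test, analogous to PyArray_Check / PyObject_TypeCheck."""
    return isinstance(obj, NDArray)


def check_exact(obj):
    """Exact-type test, analogous to PyArray_CheckExact."""
    return type(obj) is NDArray


def raw_sum(arr):
    """Goes straight to the data after a subclass-tolerant check; ignores any mask."""
    if not check(arr):
        raise TypeError("expected an NDArray")
    return sum(arr.data)


def raw_sum_exact(arr):
    """Same operation, but refuses anything that is not exactly an NDArray."""
    if not check_exact(arr):
        raise TypeError("expected exactly an NDArray")
    return sum(arr.data)


plain = NDArray([1.0, 2.0, 3.0])
masked = NDMasked([1.0, 2.0, 3.0], mask=[False, True, False])
```

With these stand-ins, `raw_sum(masked)` accepts the masked instance and silently sums the masked element too, while `raw_sum_exact(masked)` raises `TypeError`. Code written against the subclass-tolerant check is the case under discussion.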
Making it a subclass means existing Cython code is not catered for, as
PyObject_TypeCheck is used. Is there an advantage for users in making it
a subclass? Nobody is saying you couldn't 'inherit' the struct (make the
ndmask struct castable to a PyArrayObject*) even if that is not declared
in the Python type object.

Dag
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion