On 05/11/2012 12:28 AM, Mark Wiebe wrote:
> I did some searching for typical Cython and C code which accesses numpy
> arrays, and added a section to the NEP describing how they behave in the
> current implementation. Cython code which uses either straight Python
> access or the buffer protocol is fine (after a bugfix in numpy; before
> the fix it wasn't failing as it should in the pep3118 case). C code which
> follows the recommended practice of using PyArray_FromAny or one of the
> related macros is also fine, because these functions have been made to
> fail on NA-masked arrays unless the flag NPY_ARRAY_ALLOWNA is provided.
>
> In general, code which follows the recommended numpy practices will
> raise exceptions when encountering NA-masked arrays. This means
> programmers don't have to worry about the NA unless they want to support
> it. Having things go through PyArray_FromAny also provides a place where
> lazy evaluation arrays could be evaluated, and which other similar
> potential future extensions can use to provide compatibility.
>
> Here's the section I added to the NEP:
>
> Interaction With Pre-existing C API Usage
> =========================================
>
> Making sure existing code using the C API, whether it's written in C, C++,
> or Cython, does something reasonable is an important goal of this
> implementation. The general strategy is to make existing code which does
> not explicitly tell numpy it supports NA masks fail with an exception
> saying so. There are a few different access patterns people use to get
> ahold of the numpy array data; here we examine a few of them to see what
> numpy can do. These examples are found from doing google searches of
> numpy C API array access.
>
> Numpy Documentation - How to extend NumPy
> -----------------------------------------
>
> http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html#dealing-with-array-objects
>
> This page has a section "Dealing with array objects" which has some
> advice for how to access numpy arrays from C. When accepting arrays, the
> first step it suggests is to use PyArray_FromAny or a macro built on that
> function, so code following this advice will properly fail when given an
> NA-masked array it doesn't know how to handle.
>
> The way this is handled is that PyArray_FromAny requires a special flag,
> NPY_ARRAY_ALLOWNA, before it will allow NA-masked arrays to flow through.
>
> http://docs.scipy.org/doc/numpy/reference/c-api.array.html#NPY_ARRAY_ALLOWNA
>
> Code which does not follow this advice, and instead just calls
> PyArray_Check() to verify it's an ndarray and checks some flags, will
> silently produce incorrect results. This style of code does not provide
> any opportunity for numpy to say "hey, this array is special", so it is
> also not compatible with future ideas of lazy evaluation, derived dtypes,
> etc.
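[Editorial sketch] The flag-gated conversion described in the quoted section can be illustrated in plain Python. This is only a toy model of the strategy, not numpy code: ArraySketch, from_any, ALLOW_NA, legacy_sum and na_aware_sum are all hypothetical names standing in for an NA-masked ndarray, PyArray_FromAny and NPY_ARRAY_ALLOWNA.

```python
# Hypothetical stand-ins: none of these names are numpy APIs.
ALLOW_NA = 0x1  # plays the role of NPY_ARRAY_ALLOWNA


class ArraySketch:
    """Toy array: a list of values plus an optional NA mask."""

    def __init__(self, data, na_mask=None):
        self.data = list(data)
        self.na_mask = na_mask  # None means "no NA mask"


def from_any(obj, flags=0):
    """Single entry point, playing the role of PyArray_FromAny."""
    arr = obj if isinstance(obj, ArraySketch) else ArraySketch(obj)
    # Callers that did not opt in never see an NA-masked array.
    if arr.na_mask is not None and not (flags & ALLOW_NA):
        raise ValueError("this operation does not support NA-masked arrays")
    return arr


def legacy_sum(obj):
    """NA-unaware code: it never passes ALLOW_NA, so masked input raises."""
    return sum(from_any(obj).data)


def na_aware_sum(obj):
    """NA-aware code: opts in with the flag and skips masked elements."""
    arr = from_any(obj, flags=ALLOW_NA)
    mask = arr.na_mask or [False] * len(arr.data)
    return sum(v for v, is_na in zip(arr.data, mask) if not is_na)
```

The point of the design is that the check lives in one place: code that grabs the data without going through the entry point gets no such protection.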
This doesn't really cover the Cython code I write that interfaces with C
(and probably the code others write in Cython).

Often I'd do:

    def f(arg):
        cdef np.ndarray arr = np.asarray(arg)
        c_func(np.PyArray_DATA(arr))

So I mix Python np.asarray with C PyArray_DATA. In general, I think you
use PyArray_FromAny if you're very concerned about performance or need
some special flag, but it's certainly not the first thing you try.

But in general, I will often be lazy and just do

    def f(np.ndarray arr):
        c_func(np.PyArray_DATA(arr))

It's an exception if you don't provide an array -- so who cares. (I
guess the odds of somebody feeding a masked array to code like that,
which doesn't try to be friendly, are relatively smaller though.)

If you know the datatype, you can really do

    def f(np.ndarray[double] arr):
        c_func(&arr[0])

which works with PEP 3118. But I use PyArray_DATA out of habit (and
since it works in the cases without dtype).

Frankly, I don't expect any Cython code to do the right thing here;
calling PyArray_FromAny is much more typing. And really, nobody ever
questioned that if we had an actual ndarray instance, we'd be allowed to
call PyArray_DATA.

I don't know how much Cython code is out there in the wild for which
this is a problem. Either way, it would cause something of a reeducation
challenge for Cython users.

Dag

>
> Tutorial From Cython Website
> ----------------------------
>
> http://docs.cython.org/src/tutorial/numpy.html
>
> This tutorial gives a convolution example, and all the examples fail with
> Python exceptions when given inputs that contain NA values.
>
> Before any Cython type annotation is introduced, the code functions just
> as equivalent Python would in the interpreter.
>
> When the type information is introduced, it is done via numpy.pxd which
> defines a mapping between an ndarray declaration and PyArrayObject \*.
> Under the hood, this maps to __Pyx_ArgTypeTest, which does a direct
> comparison of Py_TYPE(obj) against the PyTypeObject for the ndarray.
>
> Then the code does some dtype comparisons, and uses regular python indexing
> to access the array elements. This python indexing still goes through the
> Python API, so the NA handling and error checking in numpy still can work
> like normal and fail if the inputs have NAs which cannot fit in the output
> array. In this case it fails when trying to convert the NA into an integer
> to set in the output.
>
> The next version of the code introduces more efficient indexing. This
> operates based on Python's buffer protocol. This causes Cython to call
> __Pyx_GetBufferAndValidate, which calls __Pyx_GetBuffer, which calls
> PyObject_GetBuffer. This call gives numpy the opportunity to raise an
> exception if the inputs are arrays with NA-masks, something not supported
> by the Python buffer protocol.
>
> Numerical Python - JPL website
> ------------------------------
>
> http://dsnra.jpl.nasa.gov/software/Python/numpydoc/numpy-13.html
>
> This document is from 2001, so does not reflect recent numpy, but it is the
> second hit when searching for "numpy c api example" on google.
>
> The first example, heading "A simple example", is in fact already invalid
> for recent numpy even without the NA support. In particular, if the data
> is misaligned or in a different byteorder, it may crash or produce
> incorrect results.
>
> The next thing the document does is introduce
> PyArray_ContiguousFromObject, which gives numpy an opportunity to raise an
> exception when NA-masked arrays are used, so the later code will raise
> exceptions as desired.
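[Editorial sketch] The byte-order hazard mentioned for the 2001 example can be shown without numpy at all. The snippet below is an assumption-labelled illustration using only Python's standard struct module: it reads the same four bytes under two byte orders, which is roughly what happens when C code dereferences an array's data pointer without checking how the array stores its values.

```python
import struct

# Four raw bytes holding the 32-bit integer 1 in big-endian order.
raw = struct.pack(">i", 1)

# Interpreting the bytes with the matching byte order recovers 1.
(as_big_endian,) = struct.unpack(">i", raw)

# Interpreting them as little-endian silently yields a different number,
# which is the failure mode of code that ignores byte order.
(as_little_endian,) = struct.unpack("<i", raw)

print(as_big_endian, as_little_endian)  # prints: 1 16777216
```

No exception is raised in the mismatched case, which is why this kind of bug tends to surface as wrong results rather than as a crash.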
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion