On 05/11/2012 12:47 AM, Dag Sverre Seljebotn wrote:
> On 05/11/2012 12:28 AM, Mark Wiebe wrote:
>> I did some searching for typical Cython and C code which accesses numpy
>> arrays, and added a section to the NEP describing how they behave in the
>> current implementation. Cython code which uses either straight Python
>> access or the buffer protocol is fine (after a bugfix in numpy;
>> previously it wasn't failing as it should in the pep3118 case). C code
>> which follows the recommended practice of using PyArray_FromAny or one of
>> the related macros is also fine, because these functions have been made
>> to fail on NA-masked arrays unless the flag NPY_ARRAY_ALLOWNA is provided.
>>
>> In general, code which follows the recommended numpy practices will
>> raise exceptions when encountering NA-masked arrays. This means
>> programmers don't have to worry about the NA unless they want to support
>> it. Having things go through PyArray_FromAny also provides a place where
>> lazy evaluation arrays could be evaluated, and which other similar
>> potential future extensions can use to provide compatibility.
>>
>> Here's the section I added to the NEP:
>>
>> Interaction With Pre-existing C API Usage
>> =========================================
>>
>> Making sure existing code using the C API, whether it's written in C, C++,
>> or Cython, does something reasonable is an important goal of this
>> implementation.
>> The general strategy is to make existing code which does not explicitly
>> tell numpy it supports NA masks fail with an exception saying so. There are
>> a few different access patterns people use to get ahold of the numpy
>> array data; here we examine a few of them to see what numpy can do. These
>> examples were found through Google searches for numpy C API array access.
>>
>> Numpy Documentation - How to extend NumPy
>> -----------------------------------------
>>
>> http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html#dealing-with-array-objects
>>
>> This page has a section "Dealing with array objects" which has some
>> advice for how to access numpy arrays from C. When accepting arrays, the
>> first step it suggests is to use PyArray_FromAny or a macro built on that
>> function, so code following this advice will properly fail when given an
>> NA-masked array it doesn't know how to handle.
>>
>> The way this is handled is that PyArray_FromAny requires a special flag,
>> NPY_ARRAY_ALLOWNA, before it will allow NA-masked arrays to flow through.
>>
>> http://docs.scipy.org/doc/numpy/reference/c-api.array.html#NPY_ARRAY_ALLOWNA
>>
>> Code which does not follow this advice, and instead just calls
>> PyArray_Check() to verify it's an ndarray and checks some flags, will
>> silently produce incorrect results. This style of code does not provide
>> any opportunity for numpy to say "hey, this array is special", so it is
>> also not compatible with future ideas of lazy evaluation, derived
>> dtypes, etc.
>
> This doesn't really cover the Cython code I write that interfaces with C
> (and probably the code others write in Cython).
>
> Often I'd do:
>
> def f(arg):
>     cdef np.ndarray arr = np.asarray(arg)
>     c_func(np.PyArray_DATA(arr))
>
> So I mix Python np.asarray with C PyArray_DATA. In general, I think you
> use PyArray_FromAny if you're very concerned about performance or need
> some special flag, but it's certainly not the first thing you try.
>
> But in general, I will often be lazy and just do
>
> def f(np.ndarray arr):
>     c_func(np.PyArray_DATA(arr))
>
> It's an exception if you don't provide an array -- so who cares. (I
> guess the odds of somebody feeding a masked array to code like that,
> which doesn't try to be friendly, are relatively smaller though.)
>
> If you know the datatype, you can really do
>
> def f(np.ndarray[double] arr):
>     c_func(&arr[0])
>
> which works with PEP 3118. But I use PyArray_DATA out of habit (and
> since it works in the cases without dtype).
>
> Frankly, I don't expect any Cython code to do the right thing here;
> calling PyArray_FromAny is much more typing. And really, nobody ever
> questioned that if we had an actual ndarray instance, we'd be allowed to
> call PyArray_DATA.
>
> I don't know how much Cython code is out there in the wild for which
> this is a problem. Either way, it would cause something of a reeducation
> challenge for Cython users.
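
To make the difference between the two access patterns quoted above concrete, here is a toy sketch in plain Python. It is not NumPy or Cython code: PlainArray, NAMaskedArray, from_any and both sum functions are hypothetical names made up for illustration, and from_any only models the idea of an opt-in gate like PyArray_FromAny with NPY_ARRAY_ALLOWNA, not its real behaviour.

```python
class PlainArray:
    """Toy stand-in for an ndarray: just a list of values."""

    def __init__(self, data):
        self.data = list(data)


class NAMaskedArray(PlainArray):
    """Toy stand-in for an NA-masked array; mask[i] True means data[i] is NA."""

    def __init__(self, data, mask):
        super().__init__(data)
        self.mask = list(mask)


def from_any(obj, allow_na=False):
    """Hypothetical gate: refuse NA-masked input unless the caller opts in."""
    if isinstance(obj, NAMaskedArray) and not allow_na:
        raise ValueError("this code does not support NA-masked arrays")
    return obj


def sum_via_gate(obj):
    """Pattern 1: go through the gate first, so unsupported input raises."""
    arr = from_any(obj)
    return sum(arr.data)


def sum_via_raw_data(obj):
    """Pattern 2: only a type check, then read the raw data directly.

    This mirrors `def f(np.ndarray arr)` followed by PyArray_DATA: the mask
    is never consulted, so NA entries are silently treated as real values.
    """
    if not isinstance(obj, PlainArray):
        raise TypeError("expected an array")
    return sum(obj.data)
```

With a masked input such as NAMaskedArray([1, 2, 3], [False, True, False]), sum_via_gate raises, while sum_via_raw_data quietly includes the masked value in its result. That second case is the one I'd expect most existing Cython code to fall into.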

Also note that Cython users are in the habit of accessing "arr.data"
(which is the char*, not the buffer object) directly, just in case you had
the idea of grepping for PyArray_DATA in Cython code.

Our plan there is to eventually put out a Cython version which
special-cases np.ndarray and turns ".data" into a call to PyArray_DATA
(and the same for shape, strides, ...). It's an ugly hack, but it avoids
breaking existing Cython code if NumPy removes the field access.

Dag

>
> Dag
>
>>
>> Tutorial From Cython Website
>> ----------------------------
>>
>> http://docs.cython.org/src/tutorial/numpy.html
>>
>> This tutorial gives a convolution example, and all the examples fail with
>> Python exceptions when given inputs that contain NA values.
>>
>> Before any Cython type annotation is introduced, the code functions just
>> as equivalent Python would in the interpreter.
>>
>> When the type information is introduced, it is done via numpy.pxd which
>> defines a mapping between an ndarray declaration and PyArrayObject \*.
>> Under the hood, this maps to __Pyx_ArgTypeTest, which does a direct
>> comparison of Py_TYPE(obj) against the PyTypeObject for the ndarray.
>>
>> Then the code does some dtype comparisons, and uses regular python indexing
>> to access the array elements. This python indexing still goes through the
>> Python API, so the NA handling and error checking in numpy still can work
>> like normal and fail if the inputs have NAs which cannot fit in the output
>> array. In this case it fails when trying to convert the NA into an integer
>> to set in the output.
>>
>> The next version of the code introduces more efficient indexing. This
>> operates based on Python's buffer protocol. This causes Cython to call
>> __Pyx_GetBufferAndValidate, which calls __Pyx_GetBuffer, which calls
>> PyObject_GetBuffer. This call gives numpy the opportunity to raise an
>> exception if the inputs are arrays with NA-masks, something not supported
>> by the Python buffer protocol.
>>
>> Numerical Python - JPL website
>> ------------------------------
>>
>> http://dsnra.jpl.nasa.gov/software/Python/numpydoc/numpy-13.html
>>
>> This document is from 2001, so does not reflect recent numpy, but it is the
>> second hit when searching for "numpy c api example" on Google.
>>
>> The first example, heading "A simple example", is in fact already
>> invalid for recent numpy even without the NA support. In particular, if
>> the data is misaligned or in a different byteorder, it may crash or
>> produce incorrect results.
>>
>> The next thing the document does is introduce
>> PyArray_ContiguousFromObject, which gives numpy an opportunity to raise
>> an exception when NA-masked arrays are used, so the later code will raise
>> exceptions as desired.
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion