Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:
>On 05/11/2012 01:06 AM, Mark Wiebe wrote:
>> On Thu, May 10, 2012 at 5:47 PM, Dag Sverre Seljebotn
>> <d.s.seljeb...@astro.uio.no> wrote:
>>
>>     On 05/11/2012 12:28 AM, Mark Wiebe wrote:
>>     > I did some searching for typical Cython and C code which accesses
>>     > numpy arrays, and added a section to the NEP describing how they
>>     > behave in the current implementation. Cython code which uses either
>>     > straight Python access or the buffer protocol is fine (after a
>>     > bugfix in numpy, it wasn't failing currently as it should in the
>>     > pep3118 case). C code which follows the recommended practice of
>>     > using PyArray_FromAny or one of the related macros is also fine,
>>     > because these functions have been made to fail on NA-masked arrays
>>     > unless the flag NPY_ARRAY_ALLOWNA is provided.
>>     >
>>     > In general, code which follows the recommended numpy practices will
>>     > raise exceptions when encountering NA-masked arrays. This means
>>     > programmers don't have to worry about the NA unless they want to
>>     > support it. Having things go through PyArray_FromAny also provides a
>>     > place where lazy evaluation arrays could be evaluated, and other
>>     > similar potential future extensions can use to provide
>>     > compatibility.
>>     >
>>     > Here's the section I added to the NEP:
>>     >
>>     > Interaction With Pre-existing C API Usage
>>     > =========================================
>>     >
>>     > Making sure existing code using the C API, whether it's written in
>>     > C, C++, or Cython, does something reasonable is an important goal of
>>     > this implementation.
>>     > The general strategy is to make existing code which does not
>>     > explicitly tell numpy it supports NA masks fail with an exception
>>     > saying so. There are a few different access patterns people use to
>>     > get ahold of the numpy array data,
>>     > here we examine a few of them to see what numpy can do.
>>     > These examples are found from doing google searches of numpy C API
>>     > array access.
>>     >
>>     > Numpy Documentation - How to extend NumPy
>>     > -----------------------------------------
>>     >
>>     > http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html#dealing-with-array-objects
>>     >
>>     > This page has a section "Dealing with array objects" which has some
>>     > advice for how to access numpy arrays from C. When accepting arrays,
>>     > the first step it suggests is to use PyArray_FromAny or a macro
>>     > built on that function, so code following this advice will properly
>>     > fail when given an NA-masked array it doesn't know how to handle.
>>     >
>>     > The way this is handled is that PyArray_FromAny requires a special
>>     > flag, NPY_ARRAY_ALLOWNA, before it will allow NA-masked arrays to
>>     > flow through.
>>     >
>>     > http://docs.scipy.org/doc/numpy/reference/c-api.array.html#NPY_ARRAY_ALLOWNA
>>     >
>>     > Code which does not follow this advice, and instead just calls
>>     > PyArray_Check() to verify it's an ndarray and checks some flags,
>>     > will silently produce incorrect results. This style of code does not
>>     > provide any opportunity for numpy to say "hey, this array is
>>     > special", so also is not compatible with future ideas of lazy
>>     > evaluation, derived dtypes, etc.
>>
>>     This doesn't really cover the Cython code I write that interfaces
>>     with C (and probably the code others write in Cython).
>>
>>     Often I'd do:
>>
>>     def f(arg):
>>         cdef np.ndarray arr = np.asarray(arg)
>>         c_func(np.PyArray_DATA(arr))
>>
>>     So I mix Python np.asarray with C PyArray_DATA. In general, I think
>>     you use PyArray_FromAny if you're very concerned about performance or
>>     need some special flag, but it's certainly not the first thing you
>>     try.
>>
>>
>> I guess this mixture of Python-API and C-API is different from the way
>> the API tries to protect incorrect access. From the Python API, it
>> should let everything through, because it's for Python code to use. From
>> the C API, it should default to not letting things through, because
>> special NA-mask aware code needs to be written. I'm not sure if there is
>> a reasonable approach here which works for everything.
>
>Does that mean you consider changing ob_type for masked arrays
>unreasonable? They can still use the same object struct...
>
>>
>>     But in general, I will often be lazy and just do
>>
>>     def f(np.ndarray arr):
>>         c_func(np.PyArray_DATA(arr))
>>
>>     It's an exception if you don't provide an array -- so who cares. (I
>>     guess the odds of somebody feeding a masked array to code like that,
>>     which doesn't try to be friendly, are relatively smaller though.)
>>
>>
>> This code would already fail with non-contiguous strides or byte-swapped
>> data, so the additional NA mask case seems to fit in an already-failing
>> category.
>
>Honestly! I hope you didn't think I provided a full-fledged example?
>Perhaps you'd like to point out to me that "c_func" is a bad name for a
>function as well?
>
>One would of course check that things are contiguous (or pass on the
>strides), check the dtype and dispatch to different C functions in each
>case, etc.
>
>But that isn't the point. Scientific code most of the time does fall in
>the "already-failing" category. That doesn't mean it doesn't count.
>Let's focus on the number of code lines written and developer hours that
>will be spent cleaning up the mess -- not the "validity" of the code in
>question.
>
>>
>>
>>     If you know the datatype, you can really do
>>
>>     def f(np.ndarray[double] arr):
>>         c_func(&arr[0])
>>
>>     which works with PEP 3118. But I use PyArray_DATA out of habit (and
>>     since it works in the cases without dtype).
>>
>>     Frankly, I don't expect any Cython code to do the right thing here;
>>     calling PyArray_FromAny is much more typing.
>>     And really, nobody ever questioned that if we had an actual ndarray
>>     instance, we'd be allowed to call PyArray_DATA.
>>
>>     I don't know how much Cython code is out there in the wild for which
>>     this is a problem. Either way, it would cause something of a
>>     reeducation challenge for Cython users.
>>
>>
>> Since this style of coding already has known problems, do you think the
>> case with NA-masks deserves more attention here? What will happen is
>> access to array element data without consideration of the mask, which
>> seems similar in nature to accessing array data with the wrong stride or
>> byte order.

I realized something -- I think this is not the most important question
to ask.

The question to ask is: what will create a nice, seamless NA-experience
for a NumPy user? Can he/she just try to call a function (which may call
other functions, which may call...) with a masked array and trust that it
either gives a correct result or barfs?

It's not a question of how much code needs fixing, but of the uncertainty,
and the delay in adoption, created by the fact that code needs to be
verified. With ndmasked, you get a *guarantee* against old code.

(crazy thought: look into whether ob_type can be reassigned after object
creation? I wouldn't put it past CPython to pull off a hack like that.)

Dag

>
>I don't agree with the premise of that paragraph. There's no reason to
>assume that just because code doesn't call FromAny, it has problems.
>(And I'll continue to assume that whatever array is returned from
>"np.ascontiguousarray" is really contiguous...)
>
>Whether it requires attention or not is a different issue though. I'm
>not sure. I think other people should weigh in on that -- I mostly write
>code for my own consumption.
>
>One should at least check pandas, scikits-image, scikits-learn, mpi4py,
>petsc4py, and so on. And ask on the Cython users list. Hopefully it will
>usually be PEP 3118. But now I need to turn in.
>
>Travis, would such a survey be likely to affect the outcome of your
>decision in any way? Or should we just leave this for now?
>
>Dag
>
>>
>> Cheers,
>> Mark
>>
>>     Dag
>>
>>     >
>>     > Tutorial From Cython Website
>>     > ----------------------------
>>     >
>>     > http://docs.cython.org/src/tutorial/numpy.html
>>     >
>>     > This tutorial gives a convolution example, and all the examples fail
>>     > with Python exceptions when given inputs that contain NA values.
>>     >
>>     > Before any Cython type annotation is introduced, the code functions
>>     > just as equivalent Python would in the interpreter.
>>     >
>>     > When the type information is introduced, it is done via numpy.pxd
>>     > which defines a mapping between an ndarray declaration and
>>     > PyArrayObject \*. Under the hood, this maps to __Pyx_ArgTypeTest,
>>     > which does a direct comparison of Py_TYPE(obj) against the
>>     > PyTypeObject for the ndarray.
>>     >
>>     > Then the code does some dtype comparisons, and uses regular python
>>     > indexing to access the array elements. This python indexing still
>>     > goes through the Python API, so the NA handling and error checking
>>     > in numpy still can work like normal and fail if the inputs have NAs
>>     > which cannot fit in the output array. In this case it fails when
>>     > trying to convert the NA into an integer to set it in the output.
>>     >
>>     > The next version of the code introduces more efficient indexing.
>>     > This operates based on Python's buffer protocol. This causes Cython
>>     > to call __Pyx_GetBufferAndValidate, which calls __Pyx_GetBuffer,
>>     > which calls PyObject_GetBuffer. This call gives numpy the
>>     > opportunity to raise an exception if the inputs are arrays with
>>     > NA-masks, something not supported by the Python buffer protocol.
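
The refusal path described in that last quoted paragraph can be sketched
from plain Python, since memoryview() also goes through PyObject_GetBuffer.
This is only a minimal sketch, not the NEP's implementation: the helper name
exports_buffer is hypothetical, and a datetime64 array is used purely as a
stand-in for "an array the buffer protocol cannot describe", on the
assumption that an NA-masked array would be refused in the same way.

```python
import numpy as np


def exports_buffer(arr):
    """Return True if `arr` can be exported through the buffer protocol.

    memoryview() ends up in PyObject_GetBuffer, the same entry point that
    Cython's typed-buffer access reaches, so the exporter gets a chance
    to refuse before any raw data is handed out.
    """
    try:
        memoryview(arr)
    except (ValueError, BufferError, TypeError):
        # The exporter declined: it has no faithful PEP 3118 description.
        return False
    return True


# An ordinary float64 array has a PEP 3118 format and is exported.
plain = np.arange(4, dtype=np.float64)

# Stand-in only (assumption): this is NOT an NA-masked array, just a dtype
# that NumPy declines to export, to illustrate the refusal path.
dates = np.array(["2012-05-11"], dtype="datetime64[D]")

print(exports_buffer(plain))
print(exports_buffer(dates))
```

On current NumPy releases the first call is expected to report True and the
second False, so code that takes the buffer route gets an exception up front
rather than silently reading raw data. Code that grabs the pointer with
PyArray_DATA never passes through this check.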
>>     >
>>     > Numerical Python - JPL website
>>     > ------------------------------
>>     >
>>     > http://dsnra.jpl.nasa.gov/software/Python/numpydoc/numpy-13.html
>>     >
>>     > This document is from 2001, so does not reflect recent numpy, but it
>>     > is the second hit when searching for "numpy c api example" on
>>     > google.
>>     >
>>     > The first example, heading "A simple example", is in fact already
>>     > invalid for recent numpy even without the NA support. In particular,
>>     > if the data is misaligned or in a different byteorder, it may crash
>>     > or produce incorrect results.
>>     >
>>     > The next thing the document does is introduce
>>     > PyArray_ContiguousFromObject, which gives numpy an opportunity to
>>     > raise an exception when NA-masked arrays are used, so the later code
>>     > will raise exceptions as desired.
>>     >
>>     > _______________________________________________
>>     > NumPy-Discussion mailing list
>>     > NumPy-Discussion@scipy.org
>>     > http://mail.scipy.org/mailman/listinfo/numpy-discussion

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.