Hi, On Tue, Mar 15, 2011 at 5:55 PM, Matthew Brett <matthew.br...@gmail.com> wrote: > Hi, > > On Tue, Mar 15, 2011 at 5:30 PM, Christoph Gohlke <cgoh...@uci.edu> wrote: >> >> >> On 3/15/2011 5:13 PM, Matthew Brett wrote: >>> Hi, >>> >>> On Tue, Mar 15, 2011 at 10:23 AM, Matthew Brett<matthew.br...@gmail.com> >>> wrote: >>>> Hi, >>>> >>>> On Tue, Mar 15, 2011 at 10:12 AM, Pauli Virtanen<p...@iki.fi> wrote: >>>>> Tue, 15 Mar 2011 10:06:09 -0700, Matthew Brett wrote: >>>>>> Sorry to ask, and I ask partly because I'm in the middle of a py3k port, >>>>>> but is this the right fix to this problem? I was confused by the >>>>>> presence of the old PyString_AsString function. >>>>> >>>>> It's not a correct fix. The original code seems also wrong ("index" can >>>>> either be Unicode or Bytes/String object), and will probably bomb when >>>>> indexing with Unicode strings on Python 2. The right thing to do is to >>>>> make it show the repr of the "index" object. >>>> >>>> OK - I realize I'm being very lazy here but, do you mean: >>>> >>>> PyErr_Format(PyExc_ValueError, >>>>>> "field named %s not found.", >>>>>> PyString_AsString(PyObject_Repr(index))); >>> >>> Being less lazy, and having read the cpython source, and read >>> Christoph's mail more carefully, I believe Christoph's patch is >>> correct... >>> >>> Unicode indexing of structured array fields doesn't raise an error on >>> python 2.x; I assume because PyString_AsString is returning a char* >>> using the Unicode default encoding, as per the docs. >>> >> >> I think the patch is correct for Python 3 but, as Pauli pointed out, the >> original code can crash also under Python 2.x when indexing with an >> unicode string that contains non-ascii7 characters, which seems much >> less likely and apparently has been undetected for quite a while. For >> example, this crashes for me on Python 2.7: >> >> import numpy as np >> a = np.zeros((1,), dtype=[('f1', 'f')]) >> a[u's'] = 1 # works >> a[u'µ'] = 1 # crash >> >> So, the proposed patch works for Python 3, but there could be a better >> patch fixing also the corner case on Python 2. > > Thanks. How about something like the check further up the file: > > if (PyUnicode_Check(op)) { > temp = PyUnicode_AsUnicodeEscapeString(op); > } > PyErr_Format(PyExc_ValueError, > "field named %s not found.", > PyBytes_AsString(temp)); > > ? I'm happy to submit a pull request with tests if that kind of thing > looks sensible to y'all,
That fix works for 2.6 but crashes 3.2 when it is rejecting (by design I think) field indexing with byte strings. I've put a test function in this branch which exercises the routes I could think of. It will crash current 2.x because of the unicode error that Pauli pointed out, but I think it corresponds to the expected behavior: https://github.com/matthew-brett/numpy/compare/numpy:master...matthew-brett:struct-arr-fields Do these tests look correct? I'd be happy to hear why the code above does not work, it wasn't obvious to me. See you, Matthew _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion