On Sat, Jul 28, 2012 at 3:31 PM, Ondřej Čertík <ondrej.cer...@gmail.com> wrote:
> On Sat, Jul 28, 2012 at 3:04 PM, Ondřej Čertík <ondrej.cer...@gmail.com> wrote:
>> Many of the failures in
>> https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71
>> are of the type:
>>
>> ======================================================================
>> FAIL: Check byteorder of single-dimensional objects
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>>   File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_unicode.py",
>>     line 286, in test_valuesSD
>>     self.assertTrue(ua[0] != ua2[0])
>> AssertionError: False is not true
>>
>> and those are caused by the following minimal example:
>>
>> Python 3.2:
>>
>>>>> from numpy import array
>>>>> a = array(["abc"])
>>>>> b = a.newbyteorder()
>>>>> a.dtype
>> dtype('<U3')
>>>>> b.dtype
>> dtype('>U3')
>>>>> a[0].dtype
>> dtype('<U3')
>>>>> b[0].dtype
>> dtype('<U6')
>>>>> a[0] == b[0]
>> False
>>>>> a[0]
>> 'abc'
>>>>> b[0]
>> 'ៀ\udc00埀\udc00韀\udc00'
>>
>> Python 3.3:
>>
>>>>> from numpy import array
>>>>> a = array(["abc"])
>>>>> b = a.newbyteorder()
>>>>> a.dtype
>> dtype('<U3')
>>>>> b.dtype
>> dtype('>U3')
>>>>> a[0].dtype
>> dtype('<U3')
>>>>> b[0].dtype
>> dtype('<U3')
>>>>> a[0] == b[0]
>> True
>>>>> a[0]
>> 'abc'
>>>>> b[0]
>> 'abc'
>>
>> So somehow the newbyteorder() method doesn't change the dtype of the
>> elements in our new code. This method is implemented in
>> numpy/core/src/multiarray/descriptor.c (I think), but so far I don't
>> see where the problem could be.
>>
>> Any ideas?
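[For readers following along: the difference between the two byte orders above comes down to how UTF-32 (UCS-4, NumPy's unicode storage) lays out each character. A minimal pure-Python sketch, independent of NumPy:

```python
s = "abc"
le = s.encode("utf-32-le")  # little-endian UTF-32: 4 bytes per character, LSB first
be = s.encode("utf-32-be")  # big-endian UTF-32: same code points, MSB first

# Each character's 4 bytes are simply reversed between the two orders:
assert be == b"".join(le[i:i + 4][::-1] for i in range(0, len(le), 4))

# Both round-trip to the same text when decoded with the matching order:
assert le.decode("utf-32-le") == be.decode("utf-32-be") == "abc"
```

So a '>U3' scalar holds the same code points as its '<U3' counterpart, just with each 4-byte word reversed; the scalar constructor has to account for that when it builds a Python str from the raw buffer.]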
>
> Ok, after some investigating, I think we need to do something along these
> lines:
>
> diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/scalarapi.c
> index c134aed..daf7fc4 100644
> --- a/numpy/core/src/multiarray/scalarapi.c
> +++ b/numpy/core/src/multiarray/scalarapi.c
> @@ -644,7 +644,20 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
>  #if PY_VERSION_HEX >= 0x03030000
>      if (type_num == NPY_UNICODE) {
>          PyObject *b, *args;
> -        b = PyBytes_FromStringAndSize(data, itemsize);
> +        if (swap) {
> +            char *buffer;
> +            buffer = malloc(itemsize);
> +            if (buffer == NULL) {
> +                PyErr_NoMemory();
> +            }
> +            memcpy(buffer, data, itemsize);
> +            byte_swap_vector(buffer, itemsize, 4);
> +            b = PyBytes_FromStringAndSize(buffer, itemsize);
> +            // We have to deallocate this later, otherwise we get a segfault...
> +            //free(buffer);
> +        } else {
> +            b = PyBytes_FromStringAndSize(data, itemsize);
> +        }
>          if (b == NULL) {
>              return NULL;
>          }
>
> This particular implementation still fails though:
>
>>>> from numpy import array
>>>> a = array(["abc"])
>>>> b = a.newbyteorder()
>>>> a.dtype
> dtype('<U3')
>>>> b.dtype
> dtype('>U3')
>>>> a[0].dtype
> dtype('<U3')
>>>> b[0].dtype
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
> codepoint not in range(0x110000)
>>>> a[0] == b[0]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
> codepoint not in range(0x110000)
>>>> a[0]
> 'abc'
>>>> b[0]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
> codepoint not in range(0x110000)
>
> But I think that we simply need to take into account the "swap" flag.
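[The intent of the swap branch in that patch can be sketched in pure Python. NumPy's byte_swap_vector reverses each size-byte word of a buffer in place, and its second argument is the number of words, not the number of bytes, which is why the later corrected patch passes itemsize >> 2 rather than itemsize. The Python function below mirrors the C helper's name and argument order for illustration only:

```python
def byte_swap_vector(buf: bytearray, n: int, size: int) -> None:
    # Reverse each of the n size-byte words of buf in place,
    # mirroring what NumPy's C helper of the same name does.
    for i in range(n):
        word = buf[i * size:(i + 1) * size]
        buf[i * size:(i + 1) * size] = word[::-1]

itemsize = 12                            # 3 UCS-4 characters, 4 bytes each
data = bytearray("abc".encode("utf-32-be"))
byte_swap_vector(data, itemsize >> 2, 4)  # itemsize >> 2 == number of 4-byte words
assert bytes(data) == "abc".encode("utf-32-le")
```

Passing itemsize as the word count, as the first patch does, would walk four times past the end of the buffer.]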
Ok, so first of all, I tried to disable the swapping in Python 3.2:

    if (swap) {
        byte_swap_vector(buffer, itemsize >> 2, 4);
    }

And then it behaves *exactly* as in Python 3.3. So I am pretty sure that the
problem is right there, and something along the lines of my patch above should
fix it. I had a few bugs there; here is the corrected version:

diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/scalarapi.c
index c134aed..bed73f7 100644
--- a/numpy/core/src/multiarray/scalarapi.c
+++ b/numpy/core/src/multiarray/scalarapi.c
@@ -644,7 +644,19 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
 #if PY_VERSION_HEX >= 0x03030000
     if (type_num == NPY_UNICODE) {
         PyObject *b, *args;
-        b = PyBytes_FromStringAndSize(data, itemsize);
+        if (swap) {
+            char *buffer;
+            buffer = malloc(itemsize);
+            if (buffer == NULL) {
+                PyErr_NoMemory();
+            }
+            memcpy(buffer, data, itemsize);
+            byte_swap_vector(buffer, itemsize >> 2, 4);
+            b = PyBytes_FromStringAndSize(buffer, itemsize);
+            free(buffer);
+        } else {
+            b = PyBytes_FromStringAndSize(data, itemsize);
+        }
         if (b == NULL) {
             return NULL;
         }

That works well, except that it still gives the UnicodeDecodeError:

>>> b[0].dtype
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
codepoint not in range(0x110000)

This error is actually triggered by this line:

    obj = type->tp_new(type, args, NULL);

in the patch by Stefan above. So I think what is happening is that it simply
tries to convert the bytes to a string and fails. That makes sense. The
question is why it doesn't fail in exactly the same way in Python 3.2; I think
it's because the conversion check is bypassed there somehow.

Stefan, I think we need to swap it after the object is created. I am still
experimenting with this.
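[The failure mode can be reproduced outside NumPy: tp_new for the unicode scalar effectively decodes the raw buffer as native-order UTF-32, so if the bytes it receives are in the wrong order (or have been swapped one time too many), 'a' (U+0061) is read as code point 0x61000000, which is outside the Unicode range. A sketch, using sys.byteorder to stand in for the native decode:

```python
import sys

# Codec names for the machine's native order and the opposite ("foreign") order.
native = "utf-32-le" if sys.byteorder == "little" else "utf-32-be"
foreign = "utf-32-be" if sys.byteorder == "little" else "utf-32-le"

raw = "abc".encode(foreign)  # what a newbyteorder() array stores on this machine

# Decoding the non-native buffer directly, as tp_new effectively does,
# trips the code-point range check:
try:
    raw.decode(native)
    failed = False
except UnicodeDecodeError:
    failed = True
assert failed

# Reversing each 4-byte word back to native order first makes it decode cleanly:
swapped = b"".join(raw[i:i + 4][::-1] for i in range(0, len(raw), 4))
assert swapped.decode(native) == "abc"
```

Which is consistent with the observation above that the raw bytes need to be in native order by the time the Python str is constructed, whether the swap happens before or after object creation.]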
Ondrej
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion