On Sat, Jul 28, 2012 at 5:09 PM, Ondřej Čertík <ondrej.cer...@gmail.com> wrote:
> On Sat, Jul 28, 2012 at 3:31 PM, Ondřej Čertík <ondrej.cer...@gmail.com> wrote:
>> On Sat, Jul 28, 2012 at 3:04 PM, Ondřej Čertík <ondrej.cer...@gmail.com> wrote:
>>> Many of the failures in
>>> https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71
>>> are of the type:
>>>
>>> ======================================================================
>>> FAIL: Check byteorder of single-dimensional objects
>>> ----------------------------------------------------------------------
>>> Traceback (most recent call last):
>>>   File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_unicode.py", line 286, in test_valuesSD
>>>     self.assertTrue(ua[0] != ua2[0])
>>> AssertionError: False is not true
>>>
>>>
>>> and those are caused by the following minimal example:
>>>
>>> Python 3.2:
>>>
>>> >>> from numpy import array
>>> >>> a = array(["abc"])
>>> >>> b = a.newbyteorder()
>>> >>> a.dtype
>>> dtype('<U3')
>>> >>> b.dtype
>>> dtype('>U3')
>>> >>> a[0].dtype
>>> dtype('<U3')
>>> >>> b[0].dtype
>>> dtype('<U6')
>>> >>> a[0] == b[0]
>>> False
>>> >>> a[0]
>>> 'abc'
>>> >>> b[0]
>>> 'ៀ\udc00埀\udc00韀\udc00'
>>>
>>>
>>> Python 3.3:
>>>
>>>
>>> >>> from numpy import array
>>> >>> a = array(["abc"])
>>> >>> b = a.newbyteorder()
>>> >>> a.dtype
>>> dtype('<U3')
>>> >>> b.dtype
>>> dtype('>U3')
>>> >>> a[0].dtype
>>> dtype('<U3')
>>> >>> b[0].dtype
>>> dtype('<U3')
>>> >>> a[0] == b[0]
>>> True
>>> >>> a[0]
>>> 'abc'
>>> >>> b[0]
>>> 'abc'
>>>
>>>
>>> So somehow the newbyteorder() method doesn't change the dtype of the
>>> elements in our new code.
>>> This method is implemented in numpy/core/src/multiarray/descriptor.c
>>> (I think), but so far I don't see where the problem could be.
>>>
>>> Any ideas?
>>
>> Ok, after some investigating, I think we need to do something along
>> these lines:
>>
>> diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/s
>> index c134aed..daf7fc4 100644
>> --- a/numpy/core/src/multiarray/scalarapi.c
>> +++ b/numpy/core/src/multiarray/scalarapi.c
>> @@ -644,7 +644,20 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
>>  #if PY_VERSION_HEX >= 0x03030000
>>      if (type_num == NPY_UNICODE) {
>>          PyObject *b, *args;
>> -        b = PyBytes_FromStringAndSize(data, itemsize);
>> +        if (swap) {
>> +            char *buffer;
>> +            buffer = malloc(itemsize);
>> +            if (buffer == NULL) {
>> +                PyErr_NoMemory();
>> +            }
>> +            memcpy(buffer, data, itemsize);
>> +            byte_swap_vector(buffer, itemsize, 4);
>> +            b = PyBytes_FromStringAndSize(buffer, itemsize);
>> +            // We have to deallocate this later, otherwise we get a segfault...
>> +            //free(buffer);
>> +        } else {
>> +            b = PyBytes_FromStringAndSize(data, itemsize);
>> +        }
>>          if (b == NULL) {
>>              return NULL;
>>          }
>>
>> This particular implementation still fails though:
>>
>>
>> >>> from numpy import array
>> >>> a = array(["abc"])
>> >>> b = a.newbyteorder()
>> >>> a.dtype
>> dtype('<U3')
>> >>> b.dtype
>> dtype('>U3')
>> >>> a[0].dtype
>> dtype('<U3')
>> >>> b[0].dtype
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
>> >>> a[0] == b[0]
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
>> >>> a[0]
>> 'abc'
>> >>> b[0]
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
>>
>>
>>
>> But I think that we simply need to take into account the "swap" flag.
>
> Ok, so first of all, I tried to disable the swapping in Python 3.2:
>
>     if (swap) {
>         byte_swap_vector(buffer, itemsize >> 2, 4);
>     }
>
> And then it behaves *exactly* as in Python 3.3. So I am pretty sure
> that the problem is right there and something along the lines of my
> patch above should fix it. I had a few bugs there, here is the
> correct version:
>
> diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/s
> index c134aed..bed73f7 100644
> --- a/numpy/core/src/multiarray/scalarapi.c
> +++ b/numpy/core/src/multiarray/scalarapi.c
> @@ -644,7 +644,19 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
>  #if PY_VERSION_HEX >= 0x03030000
>      if (type_num == NPY_UNICODE) {
>          PyObject *b, *args;
> -        b = PyBytes_FromStringAndSize(data, itemsize);
> +        if (swap) {
> +            char *buffer;
> +            buffer = malloc(itemsize);
> +            if (buffer == NULL) {
> +                PyErr_NoMemory();
> +            }
> +            memcpy(buffer, data, itemsize);
> +            byte_swap_vector(buffer, itemsize >> 2, 4);
> +            b = PyBytes_FromStringAndSize(buffer, itemsize);
> +            free(buffer);
> +        } else {
> +            b = PyBytes_FromStringAndSize(data, itemsize);
> +        }
>          if (b == NULL) {
>              return NULL;
>          }
>
>
> That works well, except that it gives the UnicodeDecodeError:
>
> >>> b[0].dtype
> NULL
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
>
> This error is actually triggered by this line:
>
>
>     obj = type->tp_new(type, args, NULL);
>
> in the patch by Stefan above. So I think what is happening is that it
> simply tries to convert it from bytes to a string and fails. That
> makes great sense. The question is why doesn't it fail in exactly the
> same way in Python 3.2? I think it's because the conversion check is
> bypassed somehow. Stefan, I think we need to swap it after the object
> is created. I am still experimenting with this.
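
For anyone following along, the decode failure quoted above can be
reproduced without NumPy. The following is only a minimal sketch in pure
Python, assuming a little-endian machine and 4-byte (UCS4) characters;
swap_ucs4 is a made-up helper for illustration, not a NumPy function. It
swaps each 4-byte unit of the raw "abc" data and then tries to decode the
result in the original byte order, which is roughly what I believe happens
when the swapped bytes are handed to the string constructor.

```python
def swap_ucs4(raw: bytes) -> bytes:
    """Reverse the bytes of every 4-byte unit in raw (hypothetical helper)."""
    if len(raw) % 4:
        raise ValueError("length must be a multiple of 4")
    count = len(raw) // 4  # number of 4-byte units, i.e. itemsize >> 2
    return b"".join(raw[4 * i:4 * i + 4][::-1] for i in range(count))


# The raw data behind array(["abc"]) on a little-endian machine (assumption).
little = "abc".encode("utf-32-le")

# newbyteorder() relabels the dtype as '>U3' but leaves these bytes alone,
# so honouring the swap flag turns them into the big-endian byte pattern.
swapped = swap_ucs4(little)
assert swapped == "abc".encode("utf-32-be")

# Read in the original byte order, the first unit is now 0x61000000, far
# above the largest code point 0x10FFFF, so the decoder rejects it.
first_unit = int.from_bytes(swapped[:4], "little")
try:
    swapped.decode("utf-32-le")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(hex(first_unit), decode_failed)
```

Note that the unit count passed to the swap is the byte length divided by
four, which mirrors the itemsize >> 2 argument in the corrected patch.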

Well, I simply went to the Python sources and then implemented a
solution that works with this patch:

https://github.com/certik/numpy/commit/36fcd1327746a3d0ad346ce58ffbe00506e27654

So now the PR actually seems to work. The rest of the failures are here:

https://gist.github.com/3195520

and they seem to be unrelated. Can somebody please review this PR?

https://github.com/numpy/numpy/pull/366

I will squash the commits after it's reviewed (I want to keep the
history there for now).

Ondrej
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion