On 7/28/2012 6:09 PM, Ondřej Čertík wrote:
> On Sat, Jul 28, 2012 at 5:09 PM, Ondřej Čertík <ondrej.cer...@gmail.com> wrote:
>> On Sat, Jul 28, 2012 at 3:31 PM, Ondřej Čertík <ondrej.cer...@gmail.com> wrote:
>>> On Sat, Jul 28, 2012 at 3:04 PM, Ondřej Čertík <ondrej.cer...@gmail.com> wrote:
>>>> Many of the failures in
>>>> https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71
>>>> are of the type:
>>>>
>>>> ======================================================================
>>>> FAIL: Check byteorder of single-dimensional objects
>>>> ----------------------------------------------------------------------
>>>> Traceback (most recent call last):
>>>>   File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_unicode.py", line 286, in test_valuesSD
>>>>     self.assertTrue(ua[0] != ua2[0])
>>>> AssertionError: False is not true
>>>>
>>>>
>>>> and those are caused by the following minimal example:
>>>>
>>>> Python 3.2:
>>>>
>>>> >>> from numpy import array
>>>> >>> a = array(["abc"])
>>>> >>> b = a.newbyteorder()
>>>> >>> a.dtype
>>>> dtype('<U3')
>>>> >>> b.dtype
>>>> dtype('>U3')
>>>> >>> a[0].dtype
>>>> dtype('<U3')
>>>> >>> b[0].dtype
>>>> dtype('<U6')
>>>> >>> a[0] == b[0]
>>>> False
>>>> >>> a[0]
>>>> 'abc'
>>>> >>> b[0]
>>>> 'ៀ\udc00埀\udc00韀\udc00'
>>>>
>>>>
>>>> Python 3.3:
>>>>
>>>> >>> from numpy import array
>>>> >>> a = array(["abc"])
>>>> >>> b = a.newbyteorder()
>>>> >>> a.dtype
>>>> dtype('<U3')
>>>> >>> b.dtype
>>>> dtype('>U3')
>>>> >>> a[0].dtype
>>>> dtype('<U3')
>>>> >>> b[0].dtype
>>>> dtype('<U3')
>>>> >>> a[0] == b[0]
>>>> True
>>>> >>> a[0]
>>>> 'abc'
>>>> >>> b[0]
>>>> 'abc'
>>>>
>>>>
>>>> So somehow the newbyteorder() method doesn't change the dtype of the
>>>> elements in our new code.
>>>> This method is implemented in numpy/core/src/multiarray/descriptor.c
>>>> (I think), but so far I don't see
>>>> where the problem could be.
>>>>
>>>> Any ideas?
>>>
>>> Ok, after some investigating, I think we need to do something along these
>>> lines:
>>>
>>> diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/s
>>> index c134aed..daf7fc4 100644
>>> --- a/numpy/core/src/multiarray/scalarapi.c
>>> +++ b/numpy/core/src/multiarray/scalarapi.c
>>> @@ -644,7 +644,20 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
>>>  #if PY_VERSION_HEX >= 0x03030000
>>>      if (type_num == NPY_UNICODE) {
>>>          PyObject *b, *args;
>>> -        b = PyBytes_FromStringAndSize(data, itemsize);
>>> +        if (swap) {
>>> +            char *buffer;
>>> +            buffer = malloc(itemsize);
>>> +            if (buffer == NULL) {
>>> +                PyErr_NoMemory();
>>> +            }
>>> +            memcpy(buffer, data, itemsize);
>>> +            byte_swap_vector(buffer, itemsize, 4);
>>> +            b = PyBytes_FromStringAndSize(buffer, itemsize);
>>> +            // We have to deallocate this later, otherwise we get a segfault...
>>> +            //free(buffer);
>>> +        } else {
>>> +            b = PyBytes_FromStringAndSize(data, itemsize);
>>> +        }
>>>          if (b == NULL) {
>>>              return NULL;
>>>          }
>>>
>>> This particular implementation still fails though:
>>>
>>> >>> from numpy import array
>>> >>> a = array(["abc"])
>>> >>> b = a.newbyteorder()
>>> >>> a.dtype
>>> dtype('<U3')
>>> >>> b.dtype
>>> dtype('>U3')
>>> >>> a[0].dtype
>>> dtype('<U3')
>>> >>> b[0].dtype
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
>>> >>> a[0] == b[0]
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
>>> >>> a[0]
>>> 'abc'
>>> >>> b[0]
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
>>>
>>>
>>>
>>> But I think that we simply need to take into
>>> account the "swap" flag.
>>
>> Ok, so first of all, I tried to disable the swapping in Python 3.2:
>>
>>         if (swap) {
>>             byte_swap_vector(buffer, itemsize >> 2, 4);
>>         }
>>
>> And then it behaves *exactly* as in Python 3.3. So I am pretty sure
>> that the problem is right there and something
>> along the lines of my patch above should fix it. I had a few bugs
>> there, here is the correct version:
>>
>> diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/s
>> index c134aed..bed73f7 100644
>> --- a/numpy/core/src/multiarray/scalarapi.c
>> +++ b/numpy/core/src/multiarray/scalarapi.c
>> @@ -644,7 +644,19 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
>>  #if PY_VERSION_HEX >= 0x03030000
>>      if (type_num == NPY_UNICODE) {
>>          PyObject *b, *args;
>> -        b = PyBytes_FromStringAndSize(data, itemsize);
>> +        if (swap) {
>> +            char *buffer;
>> +            buffer = malloc(itemsize);
>> +            if (buffer == NULL) {
>> +                PyErr_NoMemory();
>> +            }
>> +            memcpy(buffer, data, itemsize);
>> +            byte_swap_vector(buffer, itemsize >> 2, 4);
>> +            b = PyBytes_FromStringAndSize(buffer, itemsize);
>> +            free(buffer);
>> +        } else {
>> +            b = PyBytes_FromStringAndSize(data, itemsize);
>> +        }
>>          if (b == NULL) {
>>              return NULL;
>>          }
>>
>>
>> That works well, except that it gives the UnicodeDecodeError:
>>
>> >>> b[0].dtype
>> NULL
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
>>
>> This error is actually triggered by this line:
>>
>>
>>     obj = type->tp_new(type, args, NULL);
>>
>> in the patch by Stefan above. So I think what is happening is that it
>> simply tries to convert it from bytes
>> to a string and fails. That makes great sense. The question is why
>> doesn't it fail in exactly the same way
>> in Python 3.2? I think it's because the conversion check is bypassed
>> somehow.
>> Stefan, I think
>> we need to swap it after the object is created. I am still
>> experimenting with this.
>
> Well, I simply went to the Python sources and then implemented a
> solution that works with this patch:
>
> https://github.com/certik/numpy/commit/36fcd1327746a3d0ad346ce58ffbe00506e27654
>
> So now the PR actually seems to work. The rest of the failures are here:
>
> https://gist.github.com/3195520
>
> and they seem to be unrelated. Can somebody please review this PR?
>
> https://github.com/numpy/numpy/pull/366
>
>
> I will squash the commits after it's reviewed (I want to keep the
> history there for now).
>
>
> Ondrej

Thank you. I backported the PR to numpy 1.6.2 and it works for me on
win-amd64-py3.3 with the msvc10 compiler. I get the same 5 test failures
of the kind:

AssertionError:
Items are not equal:
 ACTUAL: ()
 DESIRED: None

Christoph
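
Archive note: the byte-order issue discussed in the quoted messages can be illustrated in pure Python, without NumPy. The sketch below is only an illustration under stated assumptions, not the code from the PR. The helper name swap_ucs4 is hypothetical, and it assumes each character is stored as one 4-byte unit, as the width argument 4 in the byte_swap_vector call suggests. It reverses every 4-byte unit of a buffer, using an element count of len(buf) >> 2 (compare itemsize >> 2 in the corrected patch, as opposed to the byte count itemsize in the first attempt), and then shows one way a decode error like the one in the tracebacks can arise when the data and the assumed byte order disagree.

```python
import struct


def swap_ucs4(buf: bytes) -> bytes:
    """Reverse the byte order of every 4-byte unit in buf.

    Hypothetical helper for illustration only; it is not a NumPy function.
    """
    if len(buf) % 4:
        raise ValueError("buffer length must be a multiple of 4")
    count = len(buf) >> 2  # number of 4-byte units, not number of bytes
    units = struct.unpack("<%dI" % count, buf)   # read as little-endian
    return struct.pack(">%dI" % count, *units)   # write as big-endian


little = "abc".encode("utf-32-le")  # 12 bytes, little-endian units
big = swap_ucs4(little)             # same text, big-endian units

# Decoding with the matching byte order recovers the text.
assert big.decode("utf-32-be") == "abc"

# Decoding with the wrong byte order reads 'a' (0x61) as 0x61000000,
# which is above the largest code point 0x10FFFF, so decoding fails.
error = None
try:
    big.decode("utf-32-le")
except UnicodeDecodeError as exc:
    error = exc

print(type(error).__name__)
```

Swapping twice returns the original buffer, so swap_ucs4(big) == little. Whichever side performs the swap, the bytes handed to the decoder have to be in the byte order the decoder expects.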