Stefan,

On Sat, Jul 28, 2012 at 2:36 AM, Stefan Krah <stefan-use...@bytereef.org> wrote:
> Ondřej Čertík <ondrej.cer...@gmail.com> wrote:
>> >> I took a brief look at it, and from the errors I have seen, one is
>> >> cosmetic, the other one is a bit more involved (rewriting
>> >> PyArray_Scalar unicode support). While it is not difficult in nature,
>> >> the current code has multiple #ifdef of Py_UNICODE_WIDE, meaning it
>> >> would require multiple configurations on multiple Python versions to
>> >> be tested.
>
> The cleanest way might be to leave the existing code in place and write
> completely new and independent code for Python 3.3.
>
>
>> https://github.com/numpy/numpy/pull/366
>>
>> It's a work in progress, and I still have some small issues; see the
>> PR for up-to-date details.
>
> I'm not a Unicode expert, but I think it's best to avoid Py_UNICODE
> altogether.
I think so too.

>
> What should matter in 3.3 is the maximum character in a Unicode string that
> determines the kind of the string:
>
> PyUnicode_1BYTE_KIND -> Py_UCS1
> PyUnicode_2BYTE_KIND -> Py_UCS2
> PyUnicode_4BYTE_KIND -> Py_UCS4
>
>
> So Py_UNICODE_WIDE should not matter as all builds support
> PyUnicode_4BYTE_KIND.
> That's why I /think/ it's possible to drop Py_UNICODE altogether. For
> instance, the line in
> https://github.com/certik/numpy/commit/d02e36e5c85d5ee444614254643037aafc8deccc
> should probably be:
>
> itemsize = PyUnicode_GetLength(robj) * PyUnicode_KIND(robj)

Yes, I think that's it. I've changed it and pushed the change to the PR.

I am now seeing failures like these:

======================================================================
ERROR: test_rmul (test_defchararray.TestOperations)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_defchararray.py", line 592, in test_rmul
    Ar = np.array([[A[0,0]*r, A[0,1]*r],
  File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/defchararray.py", line 1916, in __getitem__
    if issubclass(val.dtype.type, character) and not _len(val) == 0:
AttributeError: 'str' object has no attribute 'dtype'

Here is the code in defchararray.py:

1911            if not _globalvar and self.dtype.char not in 'SUbc':
1912                raise ValueError("Can only create a chararray from string data.")
1913
1914        def __getitem__(self, obj):
1915            val = ndarray.__getitem__(self, obj)
1916  ->        if issubclass(val.dtype.type, character) and not _len(val) == 0:
1917                temp = val.rstrip()
1918                if _len(temp) == 0:
1919                    val = ''
1920                else:
1921                    val = temp

and here is some debugging info:

(Pdb) p self
(Pdb) p obj
(0, 0)
(Pdb) p val
'abc'
(Pdb) p type(val)
<class 'str'>

So "val" is a Python string, which of course doesn't have .dtype.
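To make the kind rule Stefan describes concrete, here is a pure-Python sketch (an illustration only, not CPython or NumPy code) of how the PEP 393 storage kind follows from the largest code point, and what the proposed PyUnicode_GetLength(robj) * PyUnicode_KIND(robj) product works out to. The helper names pep393_kind and pep393_itemsize are hypothetical; len(s) stands in for PyUnicode_GetLength.

```python
def pep393_kind(s: str) -> int:
    """Hypothetical helper: bytes per code point CPython >= 3.3 (PEP 393)
    uses to store s, chosen from the largest code point in the string."""
    max_char = max(map(ord, s), default=0)  # empty string -> 1-byte kind
    if max_char < 0x100:
        return 1  # PyUnicode_1BYTE_KIND -> Py_UCS1
    if max_char < 0x10000:
        return 2  # PyUnicode_2BYTE_KIND -> Py_UCS2
    return 4      # PyUnicode_4BYTE_KIND -> Py_UCS4


def pep393_itemsize(s: str) -> int:
    """Hypothetical Python analogue of
    PyUnicode_GetLength(robj) * PyUnicode_KIND(robj)."""
    return len(s) * pep393_kind(s)


for sample in ["abc", "caf\u00e9", "\u20ac10", "\U0001F600!"]:
    print(repr(sample), pep393_kind(sample), pep393_itemsize(sample))
```

Nothing in the sketch consults Py_UNICODE_WIDE; under PEP 393 the kind depends only on the string's contents, which is the point being made about dropping Py_UNICODE.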
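For comparison, here is a minimal check (a sketch, assuming a working NumPy build under Python 3) of what ndarray.__getitem__ returns for a scalar index into a unicode array. On a healthy build the result is a NumPy string scalar (np.str_), which subclasses str but also carries .dtype, so the issubclass(val.dtype.type, character) test in chararray.__getitem__ succeeds; in the failing run above the same call returned a bare str. The array contents and the describe helper are made up for illustration.

```python
import numpy as np

# Illustrative 2-D unicode array; not the A used in test_rmul.
arr = np.array([["abc", "de"], ["f", "ghij"]])

# The same call chararray.__getitem__ makes before inspecting val.dtype.
val = np.ndarray.__getitem__(arr, (0, 0))


def describe(v):
    """Hypothetical helper: say whether v is a NumPy scalar or a bare str."""
    if isinstance(v, np.generic):
        return "NumPy scalar %s with dtype %s" % (type(v).__name__, v.dtype)
    return "plain %s without .dtype" % type(v).__name__


print(describe(val))
print(describe("abc"))  # what the failing build effectively handed back
```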
What I don't understand yet is why

val = ndarray.__getitem__(self, obj)

returns a Python string. I spent a few hours debugging it yesterday, but so far no luck.

Then there are failures in test_unicode.py of the following type:

======================================================================
FAIL: Check byteorder of single-dimensional objects
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_unicode.py", line 286, in test_valuesSD
    self.assertTrue(ua[0] != ua2[0])
AssertionError: False is not true

I haven't dug into those yet. If anyone has any ideas, let me know.

Ondrej

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion