On Apr 21, 2010, at 10:37 PM, Stefan Behnel wrote:

> Lisandro Dalcin, 21.04.2010 23:26:
>> What do you think?
>>
>> diff -r 2701901737d4 Cython/Compiler/PyrexTypes.py
>> --- a/Cython/Compiler/PyrexTypes.py Wed Apr 21 15:36:27 2010 +0200
>> +++ b/Cython/Compiler/PyrexTypes.py Wed Apr 21 18:25:42 2010 -0300
>> @@ -871,7 +871,7 @@
>>      # to integers here.  The maximum value for a Py_UNICODE is
>>      # 1114111, so PyInt_FromLong() will do just fine here.
>>
>> -    to_py_function = "PyInt_FromLong"
>> +    to_py_function = "PyUnicode_FromOrdinal"
>>
>>      def sign_and_name(self):
>>          return "Py_UNICODE"
>
> I didn't know about that function, even though I had looked for it in the
> CPython docs. It's available in all relevant CPython versions, and it's
> pretty efficient, too.
>
> This would let Py_UNICODE values turn into a single character unicode
> string when coercing to a Python object. I had also thought about this, and
> wasn't sure what I wanted. In current Cython, 'char' doesn't coerce to a
> single character 'bytes' object but to an integer. My thinking was that
> Py_UNICODE should behave the same.
>
> This is a bit inconsistent in itself, given that single character strings
> can coerce to their C ordinal value, e.g. on comparison with
> char/Py_UNICODE, but not so much of an inconsistency to break backwards
> compatibility. I'm really not sure what the 'expected' behaviour is here,
> although I'm leaning slightly towards the char/bytes and Py_UNICODE/unicode
> coercion.
>
> It's certainly easier to write
>
>     cdef Py_UNICODE cval = some_c_integer
>
>     py_object = <long>cval
>
> to get a Python integer value, than to find, import and call
> PyUnicode_FromOrdinal() to get a unicode string. There doesn't seem to be
> an equivalent PyBytes function, so I guess the PyBytes conversion would use
>
>     py_bytes = PyBytes_FromStringAndSize(&char_val, 1)
>
> which isn't exactly beautiful either, and certainly less so than the
> opposite
>
>     py_integer = <int>char_val
>
> This would also speak in favour of letting char and Py_UNICODE coerce to
> Python strings by default, although the above would go away if we special
> cased the builtin chr() function to output exactly the above code for each
> input type.
>
> Another option is to consider Py_UNICODE more special (and more specific)
> than the somewhat generic 'char', and to accept the inconsistency of
> coercing one to a unicode string and the other to an integer.
>
> What do the others think?

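For reference, the two coercion choices under discussion can be modelled in plain Python. This is only a rough sketch, assuming Python 3 (where chr() returns a one-character str); the helper names below are hypothetical and are not part of Cython or CPython.

```python
# Hypothetical helpers that model the two coercion behaviours discussed
# in this thread. Assumes Python 3; none of these names exist in Cython.

MAX_CODE_POINT = 0x10FFFF  # 1114111, the maximum quoted in the patch comment


def coerce_as_integer(ordinal):
    """Model of the existing behaviour: the C value becomes a Python int."""
    return int(ordinal)


def coerce_as_unicode(ordinal):
    """Model of the proposed behaviour: a one-character string, roughly
    what PyUnicode_FromOrdinal() would produce at the C level."""
    if not 0 <= ordinal <= MAX_CODE_POINT:
        raise ValueError("ordinal out of range: %r" % (ordinal,))
    return chr(ordinal)


def coerce_char_as_bytes(value):
    """Model of the analogous 'char' -> bytes coercion: a one-byte bytes
    object. Masking with 0xFF maps a negative signed char to its byte."""
    return bytes([value & 0xFF])
```

Under either choice the other representation stays one call away: ord(coerce_as_unicode(n)) gives back the integer n, which mirrors the <long>cval cast mentioned above.
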
I think char -> bytes and Py_UNICODE -> unicode make a lot of sense; my only concern would be backwards incompatibility.

- Robert
_______________________________________________
Cython-dev mailing list
http://codespeak.net/mailman/listinfo/cython-dev
