Re: [Cython] coercion of char/Py_UNICODE to Python objects - string or integer?

Robert Bradshaw Sat, 24 Apr 2010 22:06:29 -0700

On Apr 21, 2010, at 10:37 PM, Stefan Behnel wrote:

> Lisandro Dalcin, 21.04.2010 23:26:
>> What do you think?
>>
>> diff -r 2701901737d4 Cython/Compiler/PyrexTypes.py
>> --- a/Cython/Compiler/PyrexTypes.py  Wed Apr 21 15:36:27 2010 +0200
>> +++ b/Cython/Compiler/PyrexTypes.py  Wed Apr 21 18:25:42 2010 -0300
>> @@ -871,7 +871,7 @@
>>      # to integers here.  The maximum value for a Py_UNICODE is
>>      # 1114111, so PyInt_FromLong() will do just fine here.
>>
>> -    to_py_function = "PyInt_FromLong"
>> +    to_py_function = "PyUnicode_FromOrdinal"
>>
>>      def sign_and_name(self):
>>          return "Py_UNICODE"
>
> I didn't know about that function, even though I had looked for it in the
> CPython docs. It's available in all relevant CPython versions, and it's
> pretty efficient, too.
>
> This would let Py_UNICODE values turn into a single character unicode
> string when coercing to a Python object. I had also thought about this, and
> wasn't sure what I wanted. In current Cython, 'char' doesn't coerce to a
> single character 'bytes' object but to an integer. My thinking was that
> Py_UNICODE should behave the same.
>
> This is a bit inconsistent in itself, given that single character strings
> can coerce to their C ordinal value, e.g. on comparison with
> char/Py_UNICODE, but not so much of an inconsistency to break backwards
> compatibility. I'm really not sure what the 'expected' behaviour is here,
> although I'm leaning slightly towards the char/bytes and Py_UNICODE/unicode
> coercion.
>
> It's certainly easier to write
>
>     cdef Py_UNICODE cval = some_c_integer
>
>     py_object = <long>cval
>
> to get a Python integer value, than to find, import and call
> PyUnicode_FromOrdinal() to get a unicode string. There doesn't seem to be
> an equivalent PyBytes function, so I guess the PyBytes conversion would use
>
>     py_bytes = PyBytes_FromStringAndSize(&char_val, 1)
>
> which isn't exactly beautiful either, and certainly less so than the opposite
>
>     py_integer = <int>char_val
>
> This would also speak in favour of letting char and Py_UNICODE coerce to
> Python strings by default, although the above would go away if we special
> cased the builtin chr() function to output exactly the above code for each
> input type.
>
> Another option is to consider Py_UNICODE more special (and more specific)
> than the somewhat generic 'char', and to accept the inconsistency of
> coercing one to a unicode string and the other to an integer.
>
> What do the others think?



I think char -> bytes and Py_UNICODE -> unicode make a lot of sense;
my only concern would be backwards incompatibility.
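
To make the compatibility concern concrete, here is a minimal pure-Python
sketch of the two coercion rules. It assumes Python 3 semantics, where chr()
returns a unicode string. The helper names are hypothetical stand-ins for what
the generated C code would do; they are not Cython or CPython APIs.

```python
# Pure-Python model of the coercion rules discussed in this thread.
# All helper names below are hypothetical, for illustration only.

def coerce_char_as_int(c_value):
    """Current behaviour: a C char / Py_UNICODE becomes a Python integer."""
    return int(c_value)


def coerce_char_as_bytes(c_value):
    """Proposed behaviour for char: a one-byte bytes object."""
    return bytes([c_value & 0xFF])


def coerce_py_unicode_as_str(c_value):
    """Proposed behaviour for Py_UNICODE: a one-character unicode string."""
    return chr(c_value)


c_value = 65  # the ordinal of 'A'

old = coerce_char_as_int(c_value)
new_bytes = coerce_char_as_bytes(c_value)
new_text = coerce_py_unicode_as_str(c_value)

# Code written against the integer coercion can do arithmetic on the result.
next_ordinal = old + 1

# The same expression raises TypeError once the value arrives as a string.
try:
    new_text + 1
    breaks_existing_code = False
except TypeError:
    breaks_existing_code = True
```

Any user code that relies on the integer result, as in the arithmetic above,
would need an explicit cast such as <long>cval after the change.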

- Robert

