On Dec 12, 2009, at 11:15 PM, Stefan Behnel wrote:

> Robert Bradshaw, 12.12.2009 20:27:
>> However, I agree with your assessment of backwards incompatibility.
>> Consider
>>
>>     len("\xc3\xbf")
>>
>> In both Python 2 and Python 3 this gives 2, but in Cython it gives 2
>> when compiled against 2.x and 1 when compiled against 3.x. That seems
>> inconsistent.
>
> The inconsistent thing here is that the string changes semantics *after*
> being parsed, whereas Python simply parses it differently in Py2 and Py3.
>
> This could be worked around in Cython by parsing the string literal twice
> (potentially in parallel), once with byte string semantics and once with
> unicode string semantics, and then generating two C string literals in the
> C code, one of which gets converted back into a Python string depending on
> the Python version at C compile time. (Note that simple recoding isn't
> possible, as there may not be an encoding that maps the unicode string
> literal to the byte string literal if character escapes are used.)

Yeah, it wouldn't be trivial to change (though it wouldn't be that hard
either...)
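
To make the two points above concrete, here is a minimal sketch in plain
Python 3. It is not Cython's parser, and all variable names are made up for
illustration. The first part shows the two readings of the literal and one
plausible way a length of 1 can arise (decoding the byte form as UTF-8 after
parsing). The second part assumes a UTF-8 encoded source file containing the
literal "ÿ\xff" and shows why neither UTF-8 nor Latin-1 recodes the unicode
reading into the byte reading.

```python
# Sketch only (Python 3). Names are illustrative, not Cython internals.

# The escape-only literal "\xc3\xbf" under the two parsing rules.
as_bytes = b"\xc3\xbf"   # byte string semantics: each \xNN escape is one byte
as_text = "\xc3\xbf"     # unicode semantics: each \xNN escape is one code point

print(len(as_bytes))                  # 2
print(len(as_text))                   # 2
# Decoding the byte form as UTF-8 *after* parsing yields a single character.
print(len(as_bytes.decode("utf-8")))  # 1

# Assumed example: the literal "ÿ\xff" in a UTF-8 encoded source file.
# "\u00ff" stands in for the non-ASCII source character ÿ.
mixed_bytes = "\u00ff".encode("utf-8") + b"\xff"  # bytes C3 BF FF
mixed_text = "\u00ff" + "\xff"                    # code points U+00FF U+00FF

# Neither candidate encoding maps the text form onto the byte form.
for encoding in ("utf-8", "latin-1"):
    print(encoding, mixed_text.encode(encoding) == mixed_bytes)
```

Both comparisons in the loop print False, which is why the two readings
would have to be stored as separate C literals rather than derived from
each other.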

> This whole 'str' semantics business is really getting hard to understand
> by now. If we're having a hard time getting it right, how is a user ever
> going to understand the semantics once we're done?


In my mind, the guiding principle should be that string literals behave in
a .pyx file as similarly as possible to the way they would behave in a .py
file, and where there are differences we document and justify them. The
fewer the differences, the easier it is for the user to understand. (Of
course, we do things in .pyx files that don't make sense in Python, so it
can be a bit more complicated.)

- Robert

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
