Robert Bradshaw, 05.09.2010 07:06:
> On Sat, Sep 4, 2010 at 9:24 PM, Stefan Behnel wrote:
>> Robert Bradshaw, 04.09.2010 22:04:
>>> How about we parse the literals as unicode strings, and if used in a
>>> bytes context we raise a compile time error if any characters are
>>> larger than a char?
>>
>> Can't work because you cannot recover the original byte sequence from a
>> decoded Unicode string. It may have used escapes or not, and it may or may
>> not be encodable using the source code encoding.
>
> I'm saying we shouldn't care about using escapes, and should raise a
> compile time error if it's not encodable using the source encoding.

In that case, you'd break most code that actually uses escapes. If the byte 
values were representable in the source encoding, the escapes wouldn't be 
needed in the first place.
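
A minimal Python 3 sketch of the problem follows. It uses `ast.literal_eval` as a stand-in for a compiler's literal parser; that is an assumption for illustration, not how Cython parses literals. The helper `reencode` is hypothetical: it models the proposal "parse as unicode, then encode with the source encoding in a bytes context", and returns None where the proposal would report a compile time error.

```python
import ast

# The literal body exactly as spelled in the source file:
# a, b, c, space, backslash, x, e, 4.
LITERAL_BODY = r"abc \xe4"

# What the programmer meant by b'abc \xe4': the escape is the single byte 0xE4.
intended = ast.literal_eval("b'" + LITERAL_BODY + "'")

# The proposal: parse the literal as unicode first.
as_unicode = ast.literal_eval("u'" + LITERAL_BODY + "'")

# The same unicode string results if the character is written out directly,
# so the original spelling (escape or not) cannot be recovered from it.
spelled_out = ast.literal_eval("u'abc \u00e4'")


def reencode(text, source_encoding):
    """Hypothetical model of the proposal: encode the parsed literal with
    the source encoding; None stands for a compile time error."""
    try:
        return text.encode(source_encoding)
    except UnicodeEncodeError:
        return None


for enc in ("ascii", "utf-8", "latin-1"):
    print(enc, reencode(as_unicode, enc))
# ascii:   None, i.e. an error for code that is valid today
# utf-8:   b'abc \xc3\xa4', two bytes where the escape meant one
# latin-1: b'abc \xe4', correct only because latin-1 maps every
#          code point below 256 to the byte of the same value
```

Only in the case where the escape was unnecessary anyway (latin-1) does the round trip reproduce the intended bytes.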


> In other words, I'm not a fan of
>
>      foo("abc \u0001")
>
> behaving (in my opinion) very differently depending on whether foo
> takes a char* or object argument.

It's Python compatible, though:

     Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
     [GCC 4.4.3] on linux2
     Type "help", "copyright", "credits" or "license" for more information.
     >>> 'abc \u0001'
     'abc \\u0001'
     >>> len('abc \u0001')
     10
     >>> u'abc \u0001'
     u'abc \x01'
     >>> len(u'abc \u0001')
     5

The same holds for Python 3 when the byte string examples use the 'b' prefix.
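
A small Python 3 sketch of the session above, again using `ast.literal_eval` on the literal's source text (an illustrative choice, not part of the original fix). Inside a bytes literal `\u` is not an escape, so the backslash is kept; recent Python 3 releases warn about the unknown escape, and the sketch silences that warning. The helper name `eval_literal` is hypothetical.

```python
import ast
import warnings


def eval_literal(source):
    """Hypothetical helper: evaluate the source text of a str/bytes literal."""
    with warnings.catch_warnings():
        # '\u' is not a recognised escape in a bytes literal; newer Python 3
        # versions emit a warning for it but still keep the backslash.
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


BODY = r"abc \u0001"

as_bytes = eval_literal("b'" + BODY + "'")  # byte string: '\u' stays literal
as_text = eval_literal("'" + BODY + "'")    # str: '\u0001' is one character

print(repr(as_bytes), len(as_bytes))  # b'abc \\u0001' 10
print(repr(as_text), len(as_text))    # 'abc \x01' 5
```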

The fix I committed mimics this behaviour.

Stefan
_______________________________________________
Cython-dev mailing list
http://codespeak.net/mailman/listinfo/cython-dev
