Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Lino Mastrodomenico Tue, 28 Apr 2009 06:14:54 -0700

2009/4/28 Hrvoje Niksic:
> Lino Mastrodomenico wrote:
>>
>> Since this byte sequence [b'\xed\xb3\xbf'] doesn't represent a valid
>> character when decoded with UTF-8, it should simply be considered an
>> invalid UTF-8 sequence of three bytes and decoded to
>> '\udced\udcb3\udcbf' (*not* '\udcff').
>
> "Should be considered" or "will be considered"?  Python 3.0's UTF-8 decoder
> happily accepts it and returns u'\udcff':
>
> >>> b'\xed\xb3\xbf'.decode('utf-8')
> '\udcff'


"Should be", and only for the new utf-8b encoding (if Martin agrees);
the existing utf-8 codec is fine as is (or at least waaay outside the
scope of this PEP).
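
To make the difference concrete, here is a minimal sketch. It assumes a
later Python 3 release in which the utf-8b idea is available as the
'surrogateescape' error handler on the ordinary utf-8 codec, and in
which the strict utf-8 decoder rejects encoded surrogates. Neither
assumption holds for Python 3.0, and the variable names are only
illustrative.

```python
# The three bytes under discussion: a UTF-8-style encoding of the
# lone surrogate U+DCFF, which is not a valid character.
raw = b'\xed\xb3\xbf'

# Assumption: a strict utf-8 decoder that refuses encoded surrogates.
try:
    raw.decode('utf-8')
    strict_accepts = True
except UnicodeDecodeError:
    strict_accepts = False

# utf-8b-style decoding: each undecodable byte 0xNN becomes U+DCNN,
# so three invalid bytes give three lone surrogates.
escaped = raw.decode('utf-8', 'surrogateescape')
print(ascii(escaped))    # '\udced\udcb3\udcbf'

# Encoding with the same handler restores the original bytes.
roundtrip = escaped.encode('utf-8', 'surrogateescape')
print(roundtrip == raw)  # True

# For contrast, 'surrogatepass' decodes the bytes as one surrogate,
# which is the '\udcff' result quoted above.
passed = raw.decode('utf-8', 'surrogatepass')
print(ascii(passed))     # '\udcff'
```

Decoding to three escaped bytes rather than to '\udcff' is what keeps
the round trip unambiguous: every lone surrogate in the decoded string
then stands for exactly one original byte.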

-- 
Lino Mastrodomenico