Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Thomas Breuel Wed, 29 Apr 2009 23:29:24 -0700

On Wed, Apr 29, 2009 at 23:03, Terry Reedy wrote:

> Thomas Breuel wrote:
>
>>
>>    Sure. However, that requires you to provide meaningful, reproducible
>>    counter-examples, rather than a stenographic formulation that might
>>    hint some problem you apparently see (which I believe is just not
>>    there).
>>
>>
>> Well, here's another one: PEP 383 would disallow UTF-8 encodings of half
>> surrogates.
>>
>
> By my reading, the current Unicode 5.1 definition of 'UTF-8' disallows
> that.



If we use conformance to Unicode 5.1 as the basis for our discussion, then
PEP 383 is off the table anyway.  I'm all for strict Unicode compliance.
But apparently, the Python community doesn't care.

CESU-8 is described in Unicode Technical Report #26, so it at least has some
official recognition.  More importantly, it's also widely used.  So, my
question: what are the implications of PEP 383 for CESU-8 encoded data in
Python?
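
To make the question concrete, here is a minimal sketch.  It assumes the
error handler proposed in PEP 383 is available under the name
"surrogateescape", as it is in Python 3.1 and later, and it uses the
separate "surrogatepass" handler for comparison.  The helper name
strict_utf8_accepts and the sample bytes are illustrative only; they are
not taken from the PEP.

```python
# CESU-8 encodes a supplementary character as two 3-byte sequences, one per
# UTF-16 surrogate.  U+10000 is D800 DC00 in UTF-16, so its CESU-8 form is
# ED A0 80 ED B0 80 (strict UTF-8 would use F0 90 80 80 instead).
cesu8 = b"\xed\xa0\x80\xed\xb0\x80"


def strict_utf8_accepts(data):
    """Hypothetical helper: True if the strict UTF-8 codec decodes data."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Strict UTF-8 rejects encoded surrogates, so CESU-8 input is "non-decodable".
print(strict_utf8_accepts(cesu8))

# Assumption: PEP 383 behaviour as exposed by "surrogateescape".  Each
# undecodable byte is mapped to a lone surrogate in U+DC80..U+DCFF, and
# encoding with the same handler restores the original bytes.
escaped = cesu8.decode("utf-8", "surrogateescape")
round_tripped = escaped.encode("utf-8", "surrogateescape")
print([hex(ord(c)) for c in escaped])
print(round_tripped == cesu8)

# For comparison, "surrogatepass" decodes the surrogates themselves rather
# than escaping the individual bytes.
passed = cesu8.decode("utf-8", "surrogatepass")
print(passed.encode("utf-8", "surrogatepass") == cesu8)
```

Under these assumptions the CESU-8 bytes survive a round trip, but the
decoded string holds escaped bytes rather than the character U+10000, so
code that expects CESU-8 data to decode as text would still need to handle
it explicitly.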

My meta-point is: there are probably many more such issues hidden away and
it is a really bad idea to rush something like PEP 383 out.  Unicode is hard
anyway, and tinkering with its semantics requires a lot of thought.

Tom

