Steve Dower added the comment:

New patch attached (1602_2.patch - hopefully the review will work this time 
too).

I discovered while researching for the PEP that a decent amount of code expects 
to be able to write ASCII to sys.stdout.buffer (or sys.stdout.buffer.raw). As 
my first patch required utf-16-le at this point, it was going to cause havoc.
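
To illustrate the kind of caller at risk, here is a minimal sketch. The helper name emit_status is hypothetical and not taken from any particular project; it just writes ASCII bytes straight to a binary stream, which is the pattern described above.

```python
import io


def emit_status(binary_stream, message):
    """Write an ASCII message directly to a binary stream.

    Hypothetical helper: it bypasses the text layer, so it only works
    if the binary layer accepts ASCII-compatible bytes.
    """
    data = message.encode("ascii") + b"\n"
    binary_stream.write(data)
    binary_stream.flush()
    return len(data)


# Typical use in existing code would look like:
#     import sys
#     emit_status(sys.stdout.buffer, "done")
```

If the binary layer expected utf-16-le instead, these bytes would be misinterpreted, which is the breakage the new patch avoids.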

Rather than break that compatibility, I decided that exposing utf-8 and doing 
the reencoding at the latest possible stage was better. This is also more 
consistent with how other encoding issues are likely to be resolved, and 
shouldn't be any less performant, given that previously we were decoding to 
utf-16 anyway.
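
As a rough sketch of "reencoding at the latest possible stage" (not the code in the patch): the class TranscodingWriter and its sink argument are hypothetical stand-ins, and the sketch assumes every write() receives complete, valid UTF-8 sequences.

```python
import io


class TranscodingWriter(io.RawIOBase):
    """Hypothetical raw stream: accepts UTF-8 bytes, forwards UTF-16-LE."""

    def __init__(self, sink):
        # Any binary file-like object, standing in for the console here.
        self._sink = sink

    def writable(self):
        return True

    def write(self, b):
        data = bytes(b)
        # Re-encode only at the last step before handing off to the sink.
        self._sink.write(data.decode("utf-8").encode("utf-16-le"))
        # Report UTF-8 bytes consumed, since that is what the caller passed.
        return len(data)
```

Callers keep seeing a utf-8 byte stream, and only the final hand-off deals with utf-16-le.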

The downsides are that read(n) can now only read up to n/4 characters, and 
write(n) has a much more complicated time dealing with large buffers: we need 
to cap the number of utf-16-le bytes but return the number of utf-8 bytes 
written. It's not a direct relationship, so there's more work and a little bit 
of guessing in some cases.
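
The byte-count mismatch can be sketched as follows. This is an illustration only, not the logic in the patch; both function names are hypothetical, and the second assumes the input is complete, valid UTF-8.

```python
def max_chars_for_read(n_bytes: int) -> int:
    """Characters that are guaranteed to fit in an n_bytes UTF-8 buffer.

    A single code point needs at most 4 bytes in UTF-8, so the safe
    worst-case bound is n_bytes // 4.
    """
    return n_bytes // 4


def utf8_bytes_within_utf16_cap(data: bytes, max_utf16_bytes: int) -> int:
    """Return how many leading UTF-8 bytes of data can be written
    without the UTF-16-LE form exceeding max_utf16_bytes."""
    text = data.decode("utf-8")
    utf16_used = 0
    utf8_used = 0
    for ch in text:
        # Code points above U+FFFF need a surrogate pair (4 bytes) in UTF-16.
        cost16 = 4 if ord(ch) > 0xFFFF else 2
        if utf16_used + cost16 > max_utf16_bytes:
            break
        utf16_used += cost16
        utf8_used += len(ch.encode("utf-8"))
    return utf8_used
```

One character can cost 1 to 4 bytes in utf-8 but 2 or 4 bytes in utf-16-le, so the count returned to the caller cannot be derived from the utf-16-le cap by simple arithmetic.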

On the upside, the readline handling is simpler, as utf-8 is compatible with 
the existing interface, and sys.stdin.encoding is now accurate. I've rolled 
that fix into this patch (just the myreadline.c change), as the two changes 
really ought to go in together.

----------
Added file: http://bugs.python.org/file44290/1602_2.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue1602>
_______________________________________