Re: [Python-Dev] PEP 528: Change Windows console encoding to UTF-8

Steve Dower Mon, 05 Sep 2016 14:48:06 -0700

On 05Sep2016 1308, Paul Moore wrote:

On 5 September 2016 at 20:30, Steve Dower <steve.do...@python.org> wrote:

The only case we can reasonably handle at the raw layer is "n / 4" is zero
but n != 0, in which case we can read and cache up to 4 bytes (one wchar_t)
and then return those in future calls. If we try to cache any more than that
we're substituting for the buffered reader, which I don't want to do.


Does caching up to one (Unicode) character at a time sound reasonable? I
think that won't be much trouble, since there's no interference between
system calls in that case and it will be consistent with POSIX behaviour.


Caching a single character sounds perfectly OK. As I noted previously,
my use case probably won't need to work at the raw level anyway, so I
no longer expect to have code that will break, but I think that a
1-character buffer ensuring that we avoid surprises for code that was
written for POSIX is a good trade-off.

So it works, though the behaviour is a little strange when you do it from the interactive prompt:


>>> sys.stdin.buffer.raw.read(1)
ɒprint('hi')
b'\xc9'
>>> hi
>>> sys.stdin.buffer.raw.read(1)
b'\x92'
>>>

What happens here is the raw.read(1) rounds one byte up to one character, reads the turned alpha, returns a single byte of the two-byte encoded form and caches the second byte. Then interactive mode reads from stdin and gets the rest of the characters, starting from the print(), and executes that. Finally the next call to raw.read(1) returns the cached second byte of the turned alpha.
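
To make that sequence concrete, here is a rough sketch that reproduces the same interleaving away from a real console. It is only a model under stated assumptions: OneCharCacheReader is a hypothetical class invented for illustration (it is not the actual CPython implementation), and an io.StringIO stands in for the console, with a direct readline() on it playing the part of the separate readline path that does not know about the cache.

```python
import io


class OneCharCacheReader:
    """Hypothetical model of a raw reader with a one-character cache.

    Illustration only; this is not CPython's console reader.
    """

    def __init__(self, console):
        self._console = console  # text source standing in for the console
        self._pending = b""      # leftover bytes of one encoded character

    def read(self, n):
        if n <= 0:
            return b""
        if not self._pending:
            # A character may need up to 4 UTF-8 bytes, so request n // 4
            # characters, rounded up to one character when n < 4.
            chars = max(n // 4, 1)
            self._pending = self._console.read(chars).encode("utf-8")
        # Hand back at most n bytes and keep whatever is left for later.
        data, self._pending = self._pending[:n], self._pending[n:]
        return data


# U+0252 (turned alpha) encodes to the two bytes C9 92 in UTF-8.
console = io.StringIO("\u0252print('hi')\n")
raw = OneCharCacheReader(console)

first = raw.read(1)        # first byte of the turned alpha; second is cached
line = console.readline()  # separate reader bypasses the cache entirely
second = raw.read(1)       # cached second byte comes back only now

print(first, repr(line), second)
```

In this model the two halves of the character end up on either side of the line consumed by the other reader, which is the same ordering as in the transcript above.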

This is basically only a problem because the readline implementation is totally separate from the stdin object and doesn't know about the small cache (and for now, I think it's going to stay that way - merging readline and stdin would be great, but is a fairly significant task that won't make 3.6 at this stage).

I feel like this is an acceptable edge case, as it will only show up when interleaving calls to raw.read(n < 4) with multibyte characters and input()/interactive prompts. We've gone from 99% compatible to 99.99% compatible, and I feel like going any further is practically certain to introduce bugs (I'm being very careful with the single-character buffering, but even that feels risky). Hopefully others agree with my risk assessment here, but speak up if you think it's worthwhile trying to deal with this final case.


Cheers,
Steve

