On 05Sep2016 1308, Paul Moore wrote:
On 5 September 2016 at 20:30, Steve Dower <steve.do...@python.org> wrote:
The only case we can reasonably handle at the raw layer is when "n / 4" is
zero but n != 0, in which case we can read and cache up to 4 bytes (one wchar_t)
and then return those in future calls. If we try to cache any more than that,
we're substituting for the buffered reader, which I don't want to do.

Does caching up to one (Unicode) character at a time sound reasonable? I
think that won't be much trouble, since there's no interference between
system calls in that case and it will be consistent with POSIX behaviour.

Caching a single character sounds perfectly OK. As I noted previously,
my use case probably won't need to work at the raw level anyway, so I
no longer expect to have code that will break, but I think that a
1-character buffer ensuring that we avoid surprises for code that was
written for POSIX is a good trade-off.

So it works, though the behaviour is a little strange when you do it from the interactive prompt:

>>> sys.stdin.buffer.raw.read(1)
ɒprint('hi')
b'\xc9'
>>> hi
>>> sys.stdin.buffer.raw.read(1)
b'\x92'
>>>

What happens here is that raw.read(1) rounds the one-byte request up to one character, reads the turned alpha, returns the first byte of its two-byte encoded form, and caches the second byte. Interactive mode then reads from stdin, gets the rest of the characters starting from the print(), and executes that. Finally, the next call to raw.read(1) returns the cached second byte of the turned alpha.
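To make that sequence concrete, here is a rough pure-Python model of it. This is only a sketch and not the actual implementation: the class name OneCharCacheReader, the StringIO standing in for the console, and the direct readline() call standing in for the separate readline path are all made up for illustration.

```python
import io


class OneCharCacheReader:
    """Hypothetical model of a raw reader with a one-character cache."""

    def __init__(self, text):
        # Stands in for the console, which hands out whole characters.
        self._chars = io.StringIO(text)
        # Leftover bytes of at most one encoded character.
        self._cache = b""

    def read(self, n):
        if n <= 0:
            return b""
        # Bytes left over from a previous short read are returned first.
        if self._cache:
            out, self._cache = self._cache[:n], self._cache[n:]
            return out
        # A UTF-8 character needs at most 4 bytes, so n // 4 characters
        # always fit in n bytes. When n // 4 is zero, round up to one
        # character and cache whatever does not fit.
        nchars = max(n // 4, 1)
        data = self._chars.read(nchars).encode("utf-8")
        out, self._cache = data[:n], data[n:]
        return out


raw = OneCharCacheReader("\u0252print('hi')\n")  # U+0252 is the turned alpha
first = raw.read(1)            # first byte of the turned alpha
line = raw._chars.readline()   # bypasses the cache, like the separate readline path
second = raw.read(1)           # the cached second byte
print(first, repr(line), second)
```

In this model the reader that bypasses the cache sees the input starting from print(), and the second read(1) returns the stale byte, which matches the transcript above.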

This is basically only a problem because the readline implementation is totally separate from the stdin object and doesn't know about the small cache (and for now, I think it's going to stay that way - merging readline and stdin would be great, but is a fairly significant task that won't make 3.6 at this stage).

I feel like this is an acceptable edge case, as it will only show up when interleaving calls to raw.read(n < 4) with multibyte characters and input()/interactive prompts. We've gone from 99% compatible to 99.99% compatible, and I feel like going any further is practically certain to introduce bugs (I'm being very careful with the single-character buffering, but even that feels risky). Hopefully others agree with my risk assessment here, but speak up if you think it's worthwhile trying to deal with this final case.

Cheers,
Steve
