Eryk Sun <[email protected]> added the comment:
Apparently handling non-BMP codes is broken in recent builds of the new console
in Windows 10. I see this problem in build 18362 as well. It seems there have
been updates that have changed the naive way the console used to handle
surrogate codes as just regular UCS-2 codes, and this has disrupted the UTF-16
wide-character API in several ways. This is probably related to the new support
for virtual-terminal emulation and pseudoconsoles, since supporting a UTF-8
stream interface has required significant redesign of the console backend.
Low-level ReadConsoleInputW and WriteConsoleInputW still work, but high-level
ReadConsoleW now fails if it encounters a non-BMP surrogate pair, i.e. at least
two key-event records with the non-BMP character encoded as a UTF-16 surrogate
pair. It can be more than two input records depending on the source of input --
WriteConsoleInputW vs pasting from the clipboard -- in terms of KeyDown/KeyUp
events or an Alt+Numpad sequence.
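For reference, a minimal sketch of what "encoded as a UTF-16 surrogate pair" means here. The helper name utf16_code_units is hypothetical, and U+1F40D is just an arbitrary non-BMP test character. Each code unit below would correspond to at least one key-event record in the console input buffer.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units that encode text, as a list of ints."""
    data = text.encode('utf-16-le')
    return [int.from_bytes(data[i:i + 2], 'little')
            for i in range(0, len(data), 2)]

# A BMP character needs one code unit.
print([hex(u) for u in utf16_code_units('a')])           # ['0x61']

# A non-BMP character needs two: a high (lead) surrogate in D800-DBFF
# followed by a low (trail) surrogate in DC00-DFFF.
print([hex(u) for u in utf16_code_units('\U0001F40D')])  # ['0xd83d', '0xdc0d']
```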
There are issues with reading from screen buffers as well. WriteConsoleW can
still successfully write non-BMP characters, and these can be copied from the
console fine. But ReadConsoleOutputCharacterW can no longer read them. This
used to work, but now it 'succeeds' with 0 characters read if the screen-buffer
region contains a non-BMP character. I checked the lower-level
ReadConsoleOutputW function, and it's behaving differently now. It used to read
a non-BMP character as two CHAR_INFO records containing the surrogate pair
codes, but now it reads a non-BMP character as a single CHAR_INFO record
containing a replacement character U+FFFD.
I suppose we need to skip testing non-BMP and surrogate codes if the Windows
version is (10, 0, 18362) or above.
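A possible shape for that check, as a sketch only. The names BROKEN_CONSOLE_BUILD and console_breaks_non_bmp are hypothetical, and the build threshold is just the one observed above, not a confirmed lower bound. It assumes the first three fields of sys.getwindowsversion() (major, minor, build) reflect the real OS version.

```python
import sys

# Assumption: 18362 is the earliest build where the breakage was observed.
BROKEN_CONSOLE_BUILD = (10, 0, 18362)

def console_breaks_non_bmp(version=None):
    """Return True if the console is expected to mishandle non-BMP input.

    version is an optional (major, minor, build) tuple; by default it is
    taken from sys.getwindowsversion(), which exists only on Windows.
    """
    if version is None:
        getver = getattr(sys, 'getwindowsversion', None)
        if getver is None:
            return False  # not running on Windows
        version = getver()[:3]
    return tuple(version) >= BROKEN_CONSOLE_BUILD
```

A test could then be decorated with something like unittest.skipIf(console_breaks_non_bmp(), 'console cannot read non-BMP characters').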
Also, _testconsole needs to support FlushConsoleInputBuffer. Every test that
calls _testconsole.write_input should be isolated with a try/finally that
flushes the input buffer at the end. For example:
    write_input(raw, 'spam')
    try:
        actual = input()
    finally:
        flush_input(raw)
If reading fails, 'spam' will be flushed from the input buffer.
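If many tests need this, the pattern could be wrapped in a context manager. This is only a sketch: pending_input is a hypothetical name, and write_input and flush_input are passed in as parameters because the flush function does not exist in _testconsole yet.

```python
import contextlib

@contextlib.contextmanager
def pending_input(write_input, flush_input, raw, text):
    """Queue text in the console input buffer, then always flush it.

    write_input and flush_input are supplied by the caller; they stand in
    for _testconsole.write_input and the proposed flush function.
    """
    write_input(raw, text)
    try:
        yield
    finally:
        # Runs even if the body raises, so leftover input cannot leak
        # into the next test.
        flush_input(raw)
```

Usage would look like "with pending_input(write_input, flush_input, raw, 'spam'): actual = input()".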
----------
nosy: +eryksun
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue38325>
_______________________________________