Eryk Sun added the comment:

For breaking out of the readall while loop, you only need to check if the current read is empty:

    /* when the read is empty we break */
    if (n == 0)
        break;

Also, the logic is wrong here:

    if (len == 0 || buf[0] == '\x1a' && _buflen(self) == 0) {
        /* when the result starts with ^Z we return an empty buffer */
        PyMem_Free(buf);
        return PyBytes_FromStringAndSize(NULL, 0);
    }

Because && binds more tightly than ||, this is true when len is 0, or when buf[0] is Ctrl+Z and _buflen(self) is 0. Since buf[0] shouldn't ever be Ctrl+Z here (low-level EOF handling is abstracted in read_console_w), the internal buffer is never checked. We can easily see this going wrong here:

    >>> a = sys.stdin.buffer.raw.read(1); b = sys.stdin.buffer.raw.read()
    Ā^Z
    >>> a
    b'\xc4'
    >>> b
    b''

It misses the remaining byte in the internal buffer. This check can be simplified as follows:

    rn = _buflen(self);

    if (len == 0 && rn == 0) {
        /* return an empty buffer */
        PyMem_Free(buf);
        return PyBytes_FromStringAndSize(NULL, 0);
    }

After this the code assumes that len isn't 0, which leads to more WideCharToMultiByte failure cases.

In the last conversion it overwrites bytes_size without including rn. I'm not sure what's going on with _PyBytes_Resize(&bytes, n * sizeof(wchar_t)). ISTM it should be resized to bytes_size, making sure that this includes rn.

Finally, _copyfrombuf is repeatedly overwriting buf[0] instead of writing to buf[n].

With the attached patch, the behavior seems correct now:

    >>> sys.stdin.buffer.raw.read()
    ^Z
    b''
    >>> sys.stdin.buffer.raw.read()
    abc^Z
    ^Z
    b'abc\x1a\r\n'

Split U+0100:

    >>> a = sys.stdin.buffer.raw.read(1); b = sys.stdin.buffer.raw.read()
    Ā^Z
    >>> a
    b'\xc4'
    >>> b
    b'\x80'

Split U+1234:

    >>> a = sys.stdin.buffer.raw.read(1); b = sys.stdin.buffer.raw.read()
    ሴ^Z
    >>> a
    b'\xe1'
    >>> b
    b'\x88\xb4'

The buffer still can't handle splitting an initial non-BMP character, stored as a surrogate pair. Both code units end up as replacement characters because they aren't transcoded as a unit.
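To illustrate why the two halves of a surrogate pair must be transcoded together, here is a minimal Python sketch. It is only a model, not the C code from the patch: the helper names utf16_units and transcode are hypothetical, and the codec's errors="replace" handling is assumed to stand in for the replacement behavior seen in the console conversion.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def transcode(units):
    """Convert UTF-16 code units to UTF-8.

    A lone surrogate cannot be decoded, so errors="replace" turns it
    into U+FFFD. This is an assumed stand-in for the console path.
    """
    raw = b"".join(u.to_bytes(2, "little") for u in units)
    return raw.decode("utf-16-le", errors="replace").encode("utf-8")


units = utf16_units("\U00010000")   # two code units: 0xD800, 0xDC00

# Transcoded as a unit: one 4-byte UTF-8 sequence.
as_unit = transcode(units)

# Transcoded one code unit at a time: each half becomes U+FFFD.
split = b"".join(transcode([u]) for u in units)

print([hex(u) for u in units])
print(as_unit)
print(split)
```

In this model, the split result is two copies of b'\xef\xbf\xbd', which is the same byte pattern that shows up across a and b in the U+00010000 session. BMP characters such as U+0100 and U+1234 are a single code unit each, so splitting their UTF-8 bytes across reads loses nothing as long as the leftover bytes are kept in the internal buffer.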
Split U+00010000:

    >>> a = sys.stdin.buffer.raw.read(1); b = sys.stdin.buffer.raw.read()
    𐀀^Z
    ^Z
    >>> a
    b'\xef'
    >>> b
    b'\xbf\xbd\xef\xbf\xbd\x1a\r\n'

----------
keywords: +patch
status: closed -> open
Added file: http://bugs.python.org/file44766/issue_28162_01.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28162>
_______________________________________