Martin v. Löwis <mar...@v.loewis.de> added the comment:

> UTF-16 units are 16-bit words, not bytes, so '\ufffd' sounds correct to
> me. You resynchronize on the word boundary: the invalid word is skipped.

I agree. The only odd case is when the number of bytes is not even (pun intended). In that case, it is anybody's guess which of the bytes is the extra one. The most natural assumption (IMO) is that the data is truncated, so the last byte would be the extra one.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue14579>
_______________________________________
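
For illustration, here is a minimal sketch of the two cases discussed above: an invalid 16-bit word in the middle of the data, and an odd byte count where the trailing byte is treated as the extra one. It assumes a CPython version that includes the fix for this issue; the names `truncated`, `lone_surrogate` and `decode_lenient` are made up for the example and do not come from the patch.

```python
# Sketch only: exercises the built-in UTF-16 codec, not the patch itself.

# 'A' (one complete 16-bit word) followed by a single stray byte.
truncated = b"A\x00Z"

# An unpaired high surrogate (U+D800) followed by a valid 'A'.
lone_surrogate = b"\x00\xd8A\x00"


def decode_lenient(data: bytes) -> str:
    """Decode little-endian UTF-16, replacing undecodable input with U+FFFD."""
    return data.decode("utf-16-le", errors="replace")


# In strict mode the truncated input is rejected outright.
try:
    truncated.decode("utf-16-le")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(repr(decode_lenient(truncated)))
print(repr(decode_lenient(lone_surrogate)))
```

With the "last byte is extra" reading, the stray trailing byte should become a single U+FFFD after the 'A'. In the surrogate case the invalid word should be replaced as a whole and decoding should resume at the next word boundary, so the 'A' survives.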