New submission from Mark Dickinson <dicki...@gmail.com>:

The Unicode HOWTO currently contains this text in the "Files in an Unknown Encoding" section [1]:

> The surrogateescape error handler will decode any non-ASCII bytes as code
> points in the Unicode Private Use Area ranging from U+DC80 to U+DCFF. These
> private code points will then be turned back into the same bytes when the
> surrogateescape error handler is used when encoding the data and writing it
> back out.

Unless I'm missing something, the subrange U+DC80 to U+DCFF of the low surrogates is *not* a Private Use Area. There *is* a kinda-sorta PUA in the high surrogates from U+DB80 to U+DBFF (because the only valid codepoints that use these surrogates in their UTF-16 encoding are the codepoints in planes 15 and 16, which are almost entirely PUA codepoints), but that's not what the surrogateescape handler is using.

[1] https://docs.python.org/3/howto/unicode.html#files-in-an-unknown-encoding

----------
assignee: docs@python
components: Documentation
messages: 323976
nosy: docs@python, mark.dickinson
priority: normal
severity: normal
status: open
title: Unicode HOWTO incorrectly refers to Private Use Area for surrogateescape
versions: Python 3.6, Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34484>
_______________________________________
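
The claim about the U+DC80 to U+DCFF range can be checked from the interpreter. The following is a minimal sketch, assuming the "ascii" codec as the decoder (any codec that rejects these bytes would do); the variable names are illustrative only. It decodes every non-ASCII byte with surrogateescape, looks up the Unicode general category of the results ("Cs" is surrogate, "Co" is private use), and confirms the round trip back to the original bytes.

```python
import unicodedata

# Every byte value that is not valid ASCII.
raw = bytes(range(0x80, 0x100))

# surrogateescape maps each undecodable byte 0xNN to the code point U+DCNN.
decoded = raw.decode("ascii", errors="surrogateescape")

code_points = [ord(ch) for ch in decoded]
lowest, highest = min(code_points), max(code_points)

# General categories of the resulting code points:
# "Cs" means surrogate, "Co" means private use.
categories = {unicodedata.category(ch) for ch in decoded}

# Encoding with the same handler restores the original bytes.
round_tripped = decoded.encode("ascii", errors="surrogateescape")

print(hex(lowest), hex(highest), categories, round_tripped == raw)
# prints: 0xdc80 0xdcff {'Cs'} True
```

For comparison, `unicodedata.category("\ue000")` returns "Co", which is the category a genuine Private Use Area code point reports.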