[issue34484] Unicode HOWTO incorrectly refers to Private Use Area for surrogateescape

Mark Dickinson Thu, 23 Aug 2018 14:15:02 -0700


New submission from Mark Dickinson <[email protected]>:


The Unicode HOWTO currently contains this text in the "Files in an Unknown 
Encoding" section [1]:

> The surrogateescape error handler will decode any non-ASCII bytes as code
> points in the Unicode Private Use Area ranging from U+DC80 to U+DCFF. These
> private code points will then be turned back into the same bytes when the
> surrogateescape error handler is used when encoding the data and writing it
> back out.

Unless I'm missing something, the subrange U+DC80 to U+DCFF of the low 
surrogates is *not* a Private Use Area. There *is* a kinda-sorta PUA in the 
high surrogates from U+DB80 to U+DBFF (because the only valid codepoints that 
use these surrogates in their UTF-16 encoding are the codepoints in planes 15 
and 16, which are almost entirely PUA codepoints), but that's not what the 
surrogateescape handler is using.
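
A minimal sketch for checking both claims from the interpreter, using only 
the standard library. The sample bytes and the helper name high_surrogate 
are made up for illustration and are not part of the HOWTO or of CPython; 
the helper just applies the standard UTF-16 high-surrogate arithmetic.

```python
import unicodedata

# Bytes 0x80 and 0xFF are not valid ASCII, so the handler has to escape them.
raw = bytes([0x41, 0x80, 0xFF])
text = raw.decode("ascii", errors="surrogateescape")

# Each undecodable byte 0xNN becomes the lone low surrogate U+DCNN.
escaped = [f"U+{ord(ch):04X}" for ch in text]
# General category: "Cs" is Surrogate, "Co" is Private Use.
categories = [unicodedata.category(ch) for ch in text]

# Encoding with the same handler restores the original bytes.
round_trip = text.encode("ascii", errors="surrogateescape")

# A genuine BMP Private Use Area code point, for comparison.
pua_category = unicodedata.category("\ue000")


def high_surrogate(code_point):
    """Return the UTF-16 high surrogate for a code point above U+FFFF."""
    return 0xD800 + ((code_point - 0x10000) >> 10)


# Planes 15 and 16 span U+F0000..U+10FFFF, giving high surrogates DB80..DBFF.
plane_15_16_range = (high_surrogate(0xF0000), high_surrogate(0x10FFFF))

print(escaped, categories, round_trip == raw, pua_category)
print([f"U+{cp:04X}" for cp in plane_15_16_range])
```

The escaped characters report category Cs (Surrogate) rather than Co 
(Private Use), which is the distinction the HOWTO text gets wrong.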


[1] https://docs.python.org/3/howto/unicode.html#files-in-an-unknown-encoding

----------
assignee: docs@python
components: Documentation
messages: 323976
nosy: docs@python, mark.dickinson
priority: normal
severity: normal
status: open
title: Unicode HOWTO incorrectly refers to Private Use Area for surrogateescape
versions: Python 3.6, Python 3.7

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue34484>
_______________________________________
