Martin Panter added the comment:
I don’t have much to do with Windows, but I understand we don’t support
surrogate-escaped bytes there. E.g. see
<https://docs.python.org/dev/library/os.html#os.fsdecode> and
sys.getfilesystemencoding(). However, I suspect your first patch would have
failed on Windows when calling os.fsencode(TESTFN_UNENCODABLE); apparently
Windows cannot represent all possible file names in bytes. The second patch
doesn’t call fsencode(), so this shouldn’t be a problem.
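For readers unfamiliar with surrogate-escaped names, here is a minimal sketch of the round trip. It uses an explicit UTF-8 codec instead of os.fsdecode()/os.fsencode(), whose behaviour depends on the platform, and the byte string is a made-up example rather than anything from the patches.

```python
# A file name containing a byte (0xFF) that is not valid UTF-8.
# The name itself is a hypothetical example.
raw = b"cpy\xffthon"

# "surrogateescape" maps each undecodable byte 0xNN to the lone
# surrogate U+DCNN, so the name can be carried around as a str.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same error handler restores the original bytes.
round_tripped = name.encode("utf-8", "surrogateescape")

print(ascii(name))
print(round_tripped == raw)
```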
Your tests do not check that the output is valid Unicode without surrogates.
With your first patch applied, when pydoc wrote the HTML to a UTF-8 disk file,
I got this error:
  File "/media/disk/home/proj/python/cpy\udcffthon/Lib/pydoc.py", line 2626, in cli
    writedoc(arg)
  File "/media/disk/home/proj/python/cpy\udcffthon/Lib/pydoc.py", line 1659, in writedoc
    file.write(page)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' in position 674: surrogates not allowed
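The failure above can be reproduced without pydoc. The sketch below is only an illustration: the page text and the helper has_surrogates() are hypothetical, not taken from either patch. It shows the kind of check a test could make on the generated HTML.

```python
def has_surrogates(text):
    """Return True if text contains any code point in U+D800..U+DFFF."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


# Hypothetical HTML fragment embedding a surrogate-escaped path.
page = '<a href="file:/proj/cpy\udcffthon/Lib/pydoc.py">pydoc.py</a>'

# A strict UTF-8 encode rejects lone surrogates, which is what happens
# when such a page is written to a UTF-8 file.
try:
    page.encode("utf-8")
except UnicodeEncodeError as exc:
    reason = exc.reason
else:
    reason = None

print(has_surrogates(page), reason)
```

A test along these lines would assert that has_surrogates() is false for the page pydoc produces, rather than only checking that no exception is raised while building it.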
I have been working on an alternative patch using my IRI (Unicode URLs)
proposal for “file:” links, and “surrogatepass” for HTTP links. But I am also
trying to fix some related problems with the built-in HTTP server, and the unit
tests are a bit tricky.
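As a rough illustration of how the two error handlers differ when a surrogate-escaped name is percent-encoded for a link, here is a sketch using urllib.parse.quote(). It is not the alternative patch or the IRI proposal mentioned above, and the name is a made-up example.

```python
from urllib.parse import quote

# Hypothetical surrogate-escaped name (the original byte was 0xFF).
name = "cpy\udcffthon"

# "surrogateescape" turns the lone surrogate back into the original
# byte before percent-encoding it.
escaped = quote(name, errors="surrogateescape")

# "surrogatepass" encodes the surrogate code point itself as three
# UTF-8-style bytes, so the str can be recovered from the URL later.
passed = quote(name, errors="surrogatepass")

print(escaped)
print(passed)
```

Either result is plain ASCII, so it can be written to a UTF-8 file without triggering the UnicodeEncodeError shown earlier.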
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue25184>
_______________________________________