Martin Panter added the comment: Serhiy’s patch essentially uses the local filesystem encoding and then percent encoding, rather than the current behaviour of strict UTF-8 encoding and percent encoding. This is similar to what the “pathlib” make_uri() methods do, so maybe we could let “pathlib” do the work instead.
This draft RFC discusses encoding “file:” URLs: https://tools.ietf.org/html/draft-ietf-appsawg-file-scheme-03#section-4 It suggests leaving Unicode characters alone (in IRIs) if possible, or using UTF-8 and percent encoding even if the filesystem uses a non-UTF-8 encoding. Perhaps we could leave the filename in the HTML as Unicode characters without percent encoding, and only percent encode the undecodable (surrogate-escaped) bytes. This “IRI” scheme is also recommended by <http://blogs.msdn.com/b/ie/archive/2006/12/06/file-uris-in-windows.aspx>, which says on Windows, “in file URIs, percent-encoded octets are interpreted as a byte in the user’s current codepage”. This contradicts the draft RFC and the “pathlib” implementation, which both use UTF-8. ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue25184> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com