Martin Panter added the comment:

Serhiy’s patch essentially encodes the filename with the local filesystem 
encoding and then percent-encodes the result, rather than the current 
behaviour of strict UTF-8 encoding followed by percent encoding. This is 
similar to what the “pathlib” make_uri() methods do, so maybe we could let 
“pathlib” do the work instead.
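
To illustrate the difference between the two approaches, here is a minimal 
sketch using only urllib.parse. The filename and the ISO-8859-1 filesystem 
encoding are assumptions picked for the example; this is not the code from 
the patch.

```python
from urllib.parse import quote, quote_from_bytes

name = "caf\u00e9.txt"  # "café.txt"

# Current behaviour: strict UTF-8 encoding, then percent encoding.
utf8_quoted = quote(name, encoding="utf-8", errors="strict")
# → 'caf%C3%A9.txt'

# Filesystem-encoding approach, assuming (for illustration only) that the
# filesystem encoding is ISO-8859-1.
fs_quoted = quote_from_bytes(name.encode("iso-8859-1"))
# → 'caf%E9.txt'

# A byte that cannot be decoded is carried in the str as a lone surrogate
# when the "surrogateescape" error handler is used.
raw = b"caf\xe9.txt"
escaped = raw.decode("utf-8", "surrogateescape")  # 'caf\udce9.txt'

# Strict UTF-8 quoting cannot encode the lone surrogate.
try:
    quote(escaped)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Encoding back with the same error handler recovers the original byte,
# which can then be percent-encoded.
roundtrip_quoted = quote_from_bytes(escaped.encode("utf-8", "surrogateescape"))
# → 'caf%E9.txt'
```

The last step is roughly what encoding with the filesystem encoding achieves 
for undecodable names: the original bytes are recovered before quoting.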

This draft RFC discusses encoding “file:” URLs:

https://tools.ietf.org/html/draft-ietf-appsawg-file-scheme-03#section-4

It suggests leaving Unicode characters alone (in IRIs) if possible, or using 
UTF-8 and percent encoding even if the filesystem uses a non-UTF-8 encoding. 
Perhaps we could leave the filename in the HTML as Unicode characters without 
percent encoding, and only percent encode the undecodable (surrogate-escaped) 
bytes.
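
A rough sketch of that idea follows. The helper name quote_iri_path() is 
hypothetical, and passing ASCII characters through urllib.parse.quote() is my 
own assumption about how reserved characters would be handled; the draft RFC 
does not prescribe this code.

```python
from urllib.parse import quote


def quote_iri_path(name):
    """Percent-encode only what must be encoded in an IRI-style path.

    Hypothetical helper: decodable non-ASCII characters are left alone,
    surrogate-escaped bytes become %XX, and ASCII goes through quote().
    """
    parts = []
    for ch in name:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            # surrogateescape maps byte 0xNN (0x80-0xFF) to U+DCNN.
            parts.append("%{:02X}".format(code - 0xDC00))
        elif code < 0x80:
            # Reserved ASCII such as " " and "%" is still percent-encoded.
            parts.append(quote(ch))
        else:
            # Ordinary Unicode characters stay readable in the HTML.
            parts.append(ch)
    return "".join(parts)


# "caf" + undecodable byte 0xE9 + " " + "é" + "%.txt"
result = quote_iri_path("caf\udce9 \u00e9%.txt")
# → 'caf%E9%20é%25.txt'
```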

This “IRI” scheme is also recommended by 
<http://blogs.msdn.com/b/ie/archive/2006/12/06/file-uris-in-windows.aspx>, 
which says that on Windows, “in file URIs, percent-encoded octets are 
interpreted as a byte in the user’s current codepage”. That Windows behaviour 
contradicts the draft RFC and the “pathlib” implementation, which both use 
UTF-8.
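
The practical effect of that disagreement can be sketched with 
urllib.parse.unquote(). Using cp1252 as the “current codepage” is an 
assumption for the example; the actual codepage depends on the user’s system.

```python
from urllib.parse import unquote

utf8_uri_path = "caf%C3%A9.txt"      # "café.txt" percent-encoded as UTF-8
codepage_uri_path = "caf%E9.txt"     # "café.txt" percent-encoded as cp1252

# A UTF-8 consumer reads the first form correctly...
utf8_as_utf8 = unquote(utf8_uri_path, encoding="utf-8")
# → 'café.txt'

# ...but a consumer that assumes the codepage sees two wrong characters.
utf8_as_cp1252 = unquote(utf8_uri_path, encoding="cp1252")
# → 'cafÃ©.txt'

# The reverse holds for the codepage-encoded form.
cp_as_cp1252 = unquote(codepage_uri_path, encoding="cp1252")
# → 'café.txt'
cp_as_utf8 = unquote(codepage_uri_path, encoding="utf-8", errors="replace")
# → 'caf\ufffd.txt' (the lone byte 0xE9 is not valid UTF-8)
```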

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25184>
_______________________________________