And Clover added the comment:

WSGI's usage of ISO-8859-1 for all HTTP-byte-originated strings is very much 
deliberate; we needed a way to preserve the original input bytes whilst still 
using unicode strings, and at the time surrogateescape was not available. The 
result is counter-intuitive but at least it is finally consistent; the 
expectation is that most web authors will be using some kind of web framework 
or input-reading library that will hide away the unpleasant details.

See http://mail.python.org/pipermail/web-sig/2007-December/thread.html#3002 and 
http://mail.python.org/pipermail/web-sig/2010-July/thread.html#4473 for the 
background discussion.
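
As a rough illustration of the byte-preserving round trip described above, here is a minimal Python 3 sketch. The helper names bytes_to_wsgi_str and wsgi_str_to_bytes are hypothetical, made up for this example; they are not part of WSGI or wsgiref. The choice of UTF-8 in the last step is an assumption the application makes, not something the string itself guarantees.

```python
def bytes_to_wsgi_str(raw: bytes) -> str:
    """Decode HTTP-originated bytes the way a WSGI server does.

    ISO-8859-1 maps each byte 0x00-0xFF to the code point of the same
    value, so this step can never fail and never loses information.
    """
    return raw.decode("iso-8859-1")


def wsgi_str_to_bytes(value: str) -> bytes:
    """Recover the original bytes from a WSGI-style string."""
    return value.encode("iso-8859-1")


# Bytes for "/caf\u00e9" (a path ending in e-acute) encoded as UTF-8.
raw_path = "/caf\u00e9".encode("utf-8")

# What the application sees in environ["PATH_INFO"]: counter-intuitive,
# because each UTF-8 byte shows up as a separate Latin-1 character.
path_info = bytes_to_wsgi_str(raw_path)

# The original bytes are still fully recoverable...
recovered = wsgi_str_to_bytes(path_info)

# ...and can be decoded with whatever encoding the application
# believes the client used (assumed to be UTF-8 in this sketch).
decoded = recovered.decode("utf-8")
```

A framework or input-reading library would normally do the last two steps on the author's behalf.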

In any case we cannot assume a path is UTF-8 - not every URI is known to have 
come from an IRI so RFC 3987 does not necessarily apply. 
UTF-8-with-Latin1-fallback is also undesirable in itself as it adds ambiguity - 
an ISO-8859-1 byte sequence that by coincidence happens to be a valid UTF-8 
byte sequence will get mangled.
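
To make the ambiguity concrete, here is a small Python 3 sketch of such a fallback decoder. The function decode_with_fallback is a hypothetical name for illustration only, not an existing or proposed API.

```python
def decode_with_fallback(raw: bytes) -> str:
    """Try UTF-8 first, then fall back to ISO-8859-1 (illustration only)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


# A lone e-acute in ISO-8859-1 is byte 0xE9, which is not valid UTF-8,
# so the fallback kicks in and the text survives.
safe = decode_with_fallback("\u00e9".encode("iso-8859-1"))

# The two Latin-1 characters U+00C3 U+00A9 (A-tilde, copyright sign)
# encode to bytes C3 A9, which also happen to be the UTF-8 encoding of
# e-acute. The decoder cannot tell the difference and picks UTF-8.
intended = "\u00c3\u00a9"
mangled = decode_with_fallback(intended.encode("iso-8859-1"))
```

Here safe comes back as intended, but mangled is a single e-acute rather than the two characters that were sent, and nothing in the byte sequence signals that anything went wrong.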

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16679>
_______________________________________