Graham Dumpleton added the comment:

You can't try UTF-8 and then fall back to ISO-8859-1. PEP 3333 requires it 
always be ISO-8859-1. If an application needs it as something else, it is the 
web application's job to do the conversion.

The relevant part of the PEP is:

"""On Python platforms where the str or StringType type is in fact 
Unicode-based (e.g. Jython, IronPython, Python 3, etc.), all "strings" referred 
to in this specification must contain only code points representable in 
ISO-8859-1 encoding (\u0000 through \u00FF, inclusive). It is a fatal error for 
an application to supply strings containing any other Unicode character or code 
point. Similarly, servers and gateways must not supply strings to an 
application containing any other Unicode characters."""

By converting as UTF-8 you would be breaking the requirement that only code 
points representable in ISO-8859-1 encoding (\u0000 through \u00FF, inclusive) 
are passed through.
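To make the requirement concrete, here is a small sketch (the header value is hypothetical, not from the issue): decoding request bytes as UTF-8 can produce code points above U+00FF, which PEP 3333 forbids, whereas decoding as ISO-8859-1 never can.

```python
# Hypothetical request bytes containing a character outside Latin-1.
raw = "price: €10".encode("utf-8")

# Decoding as UTF-8 yields U+20AC ('€'), outside the \u0000-\u00FF
# range PEP 3333 permits in native strings.
as_utf8 = raw.decode("utf-8")
assert any(ord(c) > 0xFF for c in as_utf8)

# Decoding as ISO-8859-1 maps each byte to exactly one code point
# in \u0000-\u00FF, so the result always satisfies the PEP.
as_latin1 = raw.decode("iso-8859-1")
assert all(ord(c) <= 0xFF for c in as_latin1)
```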

So it is inconvenient if your expectation is that it will always be UTF-8, but 
that is how it has to work. The bytes could be in an encoding other than UTF-8 
and yet still decode successfully as UTF-8. In that case the application would 
get something totally different from the original, which is wrong.
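A minimal illustration of that failure mode (the byte values here are a contrived example, not taken from the issue): the same two bytes are valid in both encodings but decode to different strings, so a server that guesses UTF-8 silently corrupts data that was meant as something else.

```python
# Two bytes that happen to be a valid UTF-8 sequence *and*
# valid ISO-8859-1 text, yet mean different things in each.
raw = b"\xc3\xa9"

print(raw.decode("utf-8"))       # 'é'   (one character)
print(raw.decode("iso-8859-1"))  # 'Ã©'  (two characters)
```

If the client actually sent ISO-8859-1 text, a server that speculatively decoded as UTF-8 would hand the application 'é' instead of 'Ã©', with no error raised and no way to detect the corruption afterwards.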

So the WSGI server can never make any assumptions, and the WSGI application 
always has to be the one that converts the value to the correct Unicode string. 
The only way that can be done while still passing through a native string is to 
decode it as ISO-8859-1 (which is byte-preserving), allowing the application to 
go back to bytes and then to Unicode in the correct encoding.
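The round-trip described above can be sketched like this (the value is a made-up example; a real server would be decoding actual request data):

```python
# Client sends UTF-8 bytes; only the application knows that.
raw = "日本語".encode("utf-8")

# Server side: decode byte-preservingly as ISO-8859-1 to produce
# the native str that PEP 3333 mandates. Every code point stays
# within \u0000-\u00FF.
environ_value = raw.decode("iso-8859-1")
assert all(ord(c) <= 0xFF for c in environ_value)

# Application side: recover the original bytes, then decode with
# the encoding the application knows is correct.
original = environ_value.encode("iso-8859-1").decode("utf-8")
assert original == "日本語"
```

Because ISO-8859-1 maps bytes 0x00-0xFF one-to-one onto code points U+0000-U+00FF, the `decode`/`encode` pair is lossless for any input, which is exactly why the PEP picked it.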

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16679>
_______________________________________