Glenn Linderman <v+pyt...@g.nevcal.com> added the comment:

Aha!

Found a page <http://htmlpurifier.org/docs/enduser-utf8.html#whyutf8-support> 
which links to another page 
<http://web.archive.org/web/20060427015200/ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html>
 that explains the behavior.

The synopsis is that browsers (all modern browsers) generally return form data
in the same character encoding as the Form page itself was sent to the client.
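
A minimal sketch of that behavior, using urllib.parse as a stand-in for both
the browser and the server.  The helper names browser_submit and server_decode
are made up for illustration; they are not part of any library.

```python
from urllib.parse import urlencode, parse_qs


def browser_submit(fields, page_encoding):
    """Percent-encode form fields the way a browser would for a page
    that was served in page_encoding (illustrative stand-in only)."""
    return urlencode(fields, encoding=page_encoding)


def server_decode(query, assumed_encoding):
    """Decode the submitted query string using the encoding the server
    assumes the form data is in."""
    return parse_qs(query, encoding=assumed_encoding, errors="replace")


# The form page was served as CP-1252, so the data comes back as CP-1252.
sent = browser_submit({"name": "caf\u00e9"}, "cp1252")

matching = server_decode(sent, "cp1252")    # server guesses correctly
mismatched = server_decode(sent, "utf-8")   # server guesses wrongly
```

With the matching encoding the value round-trips; with the wrong one the
accented character is lost, which is the kind of difference being reported.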

I suspect this explains the differences between what Pierre and I are 
reporting.  I suspect (but would appreciate confirmation from Pierre), that his 
web pages use 
<meta http-equiv="Content-Type" content="text/html; charset=CP-1252" />
or else do not use such a meta tag, and his server is configured (or defaults) 
to send HTTP headers:
Content-Type: text/html; charset=CP-1252

Whereas, I do know that all my web pages are coded in UTF-8, have no meta tags, 
and my CGI scripts are sending 
Content-Type: text/html; charset=UTF-8
for all served form pages... and thus getting back UTF-8 also, per the above 
explanation.

What does this mean for Python support for http.server and cgi?
Well, http.server, by default, sends Content-Type without charset, except for 
directory listings, where it supplies charset= the result of 
sys.getfilesystemencoding().  So it is up to META tags to define the coding, or 
for the browser to guess.  That's probably OK: for a single machine 
environment, it is likely that the data files are coded in the default file 
system encoding, and it is likely the browser will guess that.  But it quickly 
breaks when going to a multiple machine or internet environment with different 
default encodings on different machines.  So if using http.server in such an 
environment, it is necessary to inform the client of the page encoding using 
META tags, or generating the Content-Type: HTTP header in the CGI script (which 
latter is what I'm doing for the forms and data of interest).
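
Roughly, generating that header from a CGI script could look like the sketch
below.  The function name write_form_page and the form markup are assumptions
for illustration; a real script would write to sys.stdout.buffer rather than
an in-memory buffer.

```python
import io


def write_form_page(out, encoding="utf-8"):
    """Write a CGI response whose Content-Type header names the same
    encoding that the page body is actually encoded in."""
    body = (
        "<html><body>"
        '<form method="post" action="">'
        '<input type="text" name="name" value="caf\u00e9">'
        '<input type="submit">'
        "</form></body></html>"
    )
    header = "Content-Type: text/html; charset={}\r\n\r\n".format(encoding)
    out.write(header.encode("ascii"))   # CGI headers, then a blank line
    out.write(body.encode(encoding))    # body bytes in the declared encoding


# Stand-in for sys.stdout.buffer so the sketch can run anywhere.
buf = io.BytesIO()
write_form_page(buf, "utf-8")
response = buf.getvalue()
```

The point is only that the declared charset and the bytes actually written
agree, so the browser has nothing to guess.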

What does it mean for cgi.py's FieldStorage?

Well, use of the default encoding can work in the single machine environment... 
so I guess there would be worse things than doing so, as Pierre has been 
doing.  But clearly, that isn't the complete solution.  The new parameter he 
proposes to FieldStorage can be used, if the application can properly determine 
the likeliest encoding for the form data, before calling it.

On a single machine system, that could be the default, as mentioned above.  On 
a single application web server, it could be some constant encoding used for 
all pages (like I use UTF-8 for all my pages).  For a multiple application web 
server, as long as each application uses a consistent encoding, that 
application could properly guess the encoding to pass to FieldStorage.  Or, if 
the application wishes to allow multiple encodings, as long as it can keep 
track of them, and use the right ones at the right time, it is welcome to.
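
One way an application might keep track of them is sketched below.  The
PAGE_ENCODINGS table, the paths in it, and decode_submission are all
hypothetical, and urllib.parse.parse_qs stands in here for whatever decoding
step the application uses; the encoding chosen this way is what would be
handed to FieldStorage through the proposed parameter.

```python
from urllib.parse import parse_qs

# Hypothetical table: the encoding each form page was served in.
PAGE_ENCODINGS = {
    "/legacy/order": "cp1252",
    "/signup": "utf-8",
}


def decode_submission(form_path, raw_query, default="utf-8"):
    """Decode submitted form data using the encoding that the
    originating form page was sent in."""
    encoding = PAGE_ENCODINGS.get(form_path, default)
    return parse_qs(raw_query, encoding=encoding, errors="replace")


# The same city name, submitted from pages served in different encodings.
legacy = decode_submission("/legacy/order", "city=Z%FCrich")
modern = decode_submission("/signup", "city=Z%C3%BCrich")
```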

How does this affect email?  Not at all, directly.

How does this affect cgi.py's use of email?
It means that cgi.py cannot use BytesFeedParser, in spite of what the standards 
say, so Pierre's approach of predecoding the headers is the correct one, since 
email doesn't offer an encoding parameter.  Since email doesn't support disk 
storage for file uploads, but buffers everything in memory, cgi.py can only 
pass headers to FeedParser.  So it has to detect end-of-headers itself, since 
email provides no feedback to indicate that end-of-headers was reached, and 
that means that cgi.py must parse the MIME parts itself, so it can put the 
large parts on disk.  It means that the email package provides extremely 
little value to cgi.py.  Since web browsers and multipart/form-data use simple 
subsets of the full power of RFC822 headers, email could be replaced with the 
use of cgi.py's existing parse_header function, but that should be 
deprecated.  A copy could be moved inside the FieldStorage class and fixed a bit.
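
To give an idea of how little is needed, here is a rough sketch of detecting
end-of-headers and parsing the simple header subset without the email
package.  The function names split_part, parse_headers and parse_params are
invented for this sketch and are not the existing cgi.py code.  It ignores
header folding and semicolons inside quoted values, which a real replacement
would have to handle.

```python
def split_part(raw_part):
    """Split one MIME part into (header_bytes, body_bytes) at the first
    blank line, i.e. detect end-of-headers ourselves."""
    head, sep, body = raw_part.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no end-of-headers marker found")
    return head, body


def parse_headers(head, encoding):
    """Predecode the header bytes with the form's encoding and return a
    dict of lower-cased header names to values."""
    headers = {}
    for line in head.decode(encoding).split("\r\n"):
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def parse_params(value):
    """Parse 'form-data; name="x"; filename="y"' into (main value, params).
    Simplification: assumes no semicolons inside quoted values."""
    pieces = [p.strip() for p in value.split(";")]
    params = {}
    for piece in pieces[1:]:
        key, _, val = piece.partition("=")
        params[key.strip().lower()] = val.strip().strip('"')
    return pieces[0].lower(), params


part = (
    'Content-Disposition: form-data; name="upload"; filename="caf\u00e9.txt"\r\n'
    "Content-Type: text/plain\r\n"
    "\r\n"
    "file body"
).encode("utf-8")

head, body = split_part(part)
headers = parse_headers(head, "utf-8")
kind, params = parse_params(headers["content-disposition"])
```

Only the headers are ever decoded; the body stays as bytes, so a large upload
could be streamed to disk instead of being buffered in memory.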

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4953>
_______________________________________