On 8/2/07, ludvig.ericson <[EMAIL PROTECTED]> wrote:
> On Aug 2, 11:02 pm, Gábor Farkas <[EMAIL PROTECTED]> wrote:
> > Jacob Kaplan-Moss wrote:
> > > On 8/2/07, Daniel Brandt <[EMAIL PROTECTED]> wrote:
> > >> I am receiving POST-data that is submitted to my application not via a
> > >> form or a browser, but from other web applications, according to a
> > >> known protocol. This data may or may not have the charset of the data
> > >> set in the Content-Type header.
> >
> There's also the possibility that Firefox looks for HTTP headers
> telling it what charsets are acceptable -- though I forgot the name of
> said header, it's one of the Accept-* ones.
>

Browser behavior is actually very inconsistent when it comes to
letting an app know the charset of a submitted form.

I hit this when developing a bookmarklet that would submit forms from
web pages on arbitrary servers.  From my experience, here is the short
version:

1. There is no reliable way, across all browsers, to determine the
charset of a url-encoded form from the content of the submission
alone. In practice, though, a browser submits the form in the same
charset that was used to render the page containing it.

2. You can use a hidden field _charset_ on IE and Firefox (but not
Safari) to reliably get the charset.

3. W3C now recommends using multipart/form-data for non-ASCII data
(essentially all forms) [1]:

"The content type "application/x-www-form-urlencoded" is inefficient
for sending large quantities of binary data or text containing
non-ASCII characters. The content type "multipart/form-data" should be
used for submitting forms that contain files, non-ASCII data, and
binary data."

(For a decent, if slightly dated, overview of the issue, see [2].)

IMO, the best route forward for Django would be to assume that the
decoding should be done using the same charset the site is using to
render pages.  If the developer has special needs, they can use
_charset_ or other means to determine the charset and handle the
decoding themselves.
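
To make that concrete, here is a minimal Python sketch of the idea. It
is not Django's actual code: the function name decode_form and the
default_charset parameter are made up for illustration, and it assumes
the request body is available as the raw url-encoded string.

```python
import codecs
from urllib.parse import parse_qsl


def decode_form(body: str, default_charset: str = "utf-8") -> dict:
    """Decode a url-encoded body, honoring a _charset_ field if present.

    default_charset stands in for the charset the site renders pages in.
    """
    # First pass: latin-1 maps every byte to a code point, so this never
    # fails and lets us read the (ASCII) value of _charset_ safely.
    fields = parse_qsl(body, keep_blank_values=True, encoding="latin-1")

    charset = default_charset
    for name, value in fields:
        if name == "_charset_" and value:
            try:
                codecs.lookup(value)  # only accept charsets Python knows
                charset = value
            except LookupError:
                pass  # unknown label: keep the site default
            break

    # Second pass: decode every field with the chosen charset.
    decoded = parse_qsl(body, keep_blank_values=True,
                        encoding=charset, errors="replace")
    return dict(decoded)
```

With this, decode_form("_charset_=windows-1252&name=caf%E9") decodes
name using Windows-1252, while a body without _charset_ falls back to
default_charset.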

As an aside, I also found that virtually all browsers actually use
Windows-1252 when they say they are using Latin-1 (on Windows, Mac
and Linux at least).  The easiest test for this is the trademark
symbol (tm), which doesn't exist in ISO-8859-1.  This is described in
Wikipedia [3] and can be seen by setting the encoding in your browser
while viewing this page [4] for the Palm Treo, which has a literal
(tm) in it that renders fine when the browser is set to ISO-8859-1.
For the greatest compatibility with browsers, Django would also treat
ISO-8859-1 as Windows-1252.
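
The difference is easy to see in Python, whose codecs follow the
standards strictly. The sketch below also shows one possible way to
apply the "treat ISO-8859-1 as Windows-1252" rule; browser_codec is a
hypothetical helper name, not an existing Django or Python API.

```python
import codecs


def browser_codec(label: str) -> str:
    """Return the codec name browsers effectively use for a charset label."""
    # codecs.lookup() normalizes aliases such as "latin-1" or "ISO-8859-1".
    name = codecs.lookup(label).name
    # Assumption from the observation above: browsers that announce
    # ISO-8859-1 really send and expect Windows-1252.
    return "cp1252" if name == "iso8859-1" else name


raw = b"Treo\x99"  # 0x99 is the trademark sign in Windows-1252

strict = raw.decode("iso-8859-1")                  # 0x99 becomes a C1 control character
lenient = raw.decode(browser_codec("iso-8859-1"))  # 0x99 becomes U+2122, the (tm) sign
```

Here strict ends in the invisible control character U+0099, while
lenient ends in the actual trademark sign.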

I am new to django and this list, so I hope this email is constructive
and helpful.

Craig

[1] http://www.w3.org/TR/html40/interact/forms.html#submit-format
[2] http://www.crazysquirrel.com/computing/general/form-encoding.jspx
[3] http://en.wikipedia.org/wiki/ISO_8859-1
[4] http://www.palm.com/us/products/smartphones/treo650/
