On 8/2/07, ludvig.ericson <[EMAIL PROTECTED]> wrote:
> On Aug 2, 11:02 pm, Gábor Farkas <[EMAIL PROTECTED]> wrote:
> > Jacob Kaplan-Moss wrote:
> > > On 8/2/07, Daniel Brandt <[EMAIL PROTECTED]> wrote:
> > >> I am recieving POST-data that is submitted to my application not via a
> > >> form or a browser, but from other web applications, according to a
> > >> known protocol. This data may or may not have the charset of the data
> > >> set in the Content-Type header.
> >
>
> There's also the possibility that Firefox looks for HTTP headers
> telling it what charsets are acceptable -- though I forgot the name of
> said header, it's one of the Accept-* ones.
>

Browser behavior is actually very inconsistent when it comes to letting
an app know the charset of a form. I hit this when developing a
bookmarklet that would submit forms from web pages on arbitrary servers.
From my experience, here is the short version:

1. There is no way to reliably determine the charset of a url-encoded
   form from the content of the submission across all browsers, but the
   submitted form will use the same charset that was used to render the
   form.

2. You can use a hidden field named _charset_ on IE and Firefox (but not
   Safari) to reliably get the charset.

3. W3C now recommends using multipart/form-data for non-ASCII data
   (essentially all forms) [1]:

     "The content type "application/x-www-form-urlencoded" is
     inefficient for sending large quantities of binary data or text
     containing non-ASCII characters. The content type
     "multipart/form-data" should be used for submitting forms that
     contain files, non-ASCII data, and binary data."

(For a decent, if slightly dated, overview of the issue, see [2].)

IMO, the best route forward for Django would be to assume that decoding
should be done using the same charset the site uses to render pages. If
developers have special needs, they can use _charset_ or other means to
determine the charset and handle the decoding themselves.

As an aside, I also found that virtually all browsers actually use
Windows-1252 when they say they are using Latin-1 (on Windows, Mac and
Linux at least). The easiest test for this is the trademark symbol (tm),
which doesn't exist in ISO-8859-1. This is described in Wikipedia [3]
and can be seen by changing the encoding in your browser while viewing
the Palm Treo page [4], which has a literal (tm) in it that renders fine
when the browser is set to ISO-8859-1. For the greatest compatibility
with browsers, Django should also treat ISO-8859-1 as Windows-1252.

I am new to Django and this list, so I hope this email is constructive
and helpful.

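
To make the suggestion above concrete, here is a rough Python sketch of
the decoding order I have in mind: honor a _charset_ field if one was
submitted, otherwise fall back to the charset the site renders pages in,
and treat ISO-8859-1 as Windows-1252 either way. This is not Django
code; SITE_CHARSET, effective_charset and decode_urlencoded are
hypothetical names made up for illustration, and it assumes the request
body is plain percent-encoded ASCII.

```python
import codecs
from urllib.parse import parse_qsl

# Assumption: the charset the site uses to render its pages.
SITE_CHARSET = "utf-8"


def effective_charset(label, default=SITE_CHARSET):
    """Resolve a charset label, treating ISO-8859-1 as Windows-1252."""
    try:
        canonical = codecs.lookup(label.strip()).name
    except LookupError:
        # Unknown label: fall back to the site's rendering charset.
        canonical = codecs.lookup(default).name
    return "cp1252" if canonical == "iso8859-1" else canonical


def decode_urlencoded(body, default=SITE_CHARSET):
    """Decode an application/x-www-form-urlencoded body into a dict."""
    text = body.decode("ascii", errors="replace")
    # First pass: Latin-1 maps every byte to a character, so this cannot
    # fail. It is only used to read the ASCII-valued _charset_ field.
    probe = dict(parse_qsl(text, keep_blank_values=True, encoding="latin-1"))
    charset = effective_charset(probe.get("_charset_") or default, default)
    # Second pass: decode the percent-encoded bytes with the chosen charset.
    pairs = parse_qsl(text, keep_blank_values=True,
                      encoding=charset, errors="replace")
    return dict(pairs)
```

For example, decode_urlencoded(b"name=Treo%99&_charset_=ISO-8859-1")
decodes byte 0x99 as the trademark sign, which only works because the
Latin-1 label is mapped to Windows-1252. A dict keeps only the last
value of a repeated field, so a real implementation would want a
multi-value structure instead.
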
Craig

[1] http://www.w3.org/TR/html40/interact/forms.html#submit-format
[2] http://www.crazysquirrel.com/computing/general/form-encoding.jspx
[3] http://en.wikipedia.org/wiki/ISO_8859-1
[4] http://www.palm.com/us/products/smartphones/treo650/