On Thu, 2007-08-02 at 15:14 -0700, Craig Ogg wrote:
> On 8/2/07, ludvig.ericson <[EMAIL PROTECTED]> wrote:
> > On Aug 2, 11:02 pm, Gábor Farkas <[EMAIL PROTECTED]> wrote:
> > > Jacob Kaplan-Moss wrote:
> > > > On 8/2/07, Daniel Brandt <[EMAIL PROTECTED]> wrote:
> > > >> I am receiving POST-data that is submitted to my application not via a
> > > >> form or a browser, but from other web applications, according to a
> > > >> known protocol. This data may or may not have the charset of the data
> > > >> set in the Content-Type header.
> > >
> > There's also the possibility that Firefox looks for HTTP headers
> > telling it what charsets are acceptable -- though I forgot the name of
> > said header, it's one of the Accept-* ones.
> >
> 
> There is actually very inconsistent browser behavior when it comes to
> letting an app know the charset of a form.
> 
> I hit this when developing a bookmarklet that would submit forms from
> web pages on arbitrary servers.  From my experience, here is the short
> version:
> 
> 1. There is no way to reliably know the charset of a url-encoded form
> across all browsers from the content of the submission, but the
> charset of the submitted form will be the same charset used to render
> the form.
> 
> 2. You can use a hidden field _charset_ on IE and Firefox (but not
> Safari) to reliably get the charset.
> 
> 3. W3C now recommends using multipart/form-data for non-ASCII data
> (essentially all forms) [1]:
> 
> "The content type "application/x-www-form-urlencoded" is inefficient
> for sending large quantities of binary data or text containing
> non-ASCII characters. The content type "multipart/form-data" should be
> used for submitting forms that contain files, non-ASCII data, and
> binary data."
> 
> (For a decent overview of the issue, but a little dated, see [2])
> 
> IMO, the best route forward for django would be to assume that the
> decoding should be done using the same charset the site is using to
> render pages.  If the developer has special needs, they can use
> _charset_ or other means to determine the charset and handle the
> encoding.

I realise this is now an old thread, but I wanted to point out that the
paragraphs above are precisely the reasoning behind the current behaviour
in Django. Because there is no reliable way to know the submission
encoding, we assume it is Django's default and provide a way to override
it on a per-view basis by setting request.encoding inside the view. Doing
it inside the view matters, because that puts the decision in the client
code, which is in a much better position than Django to know what the
encoding is when it's talking to a legacy application.
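
To make the "default plus override" idea concrete, here is a minimal
sketch using only the Python standard library rather than Django itself.
The helper name decode_form and its parameters are made up for
illustration; in a real view the override would be the request.encoding
assignment mentioned above.

```python
from urllib.parse import parse_qs, quote


def decode_form(body, default_encoding="utf-8", override=None):
    """Decode a url-encoded form body (hypothetical helper).

    Uses the site-wide default unless the caller, who knows the peer
    application, names a different charset.
    """
    encoding = override or default_encoding
    return parse_qs(body, encoding=encoding, errors="replace")


# A legacy client submits "café" percent-encoded as ISO-8859-1.
legacy_body = "name=" + quote("café", encoding="iso-8859-1")

# Decoded with the default, the non-ASCII byte cannot be interpreted.
guessed = decode_form(legacy_body)

# Decoded with the charset the view author knows to be right.
correct = decode_form(legacy_body, override="iso-8859-1")
```

The bytes on the wire carry no charset label, so only the caller's
knowledge of the legacy application turns them back into the right text.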

When I wrote the docs for form encoding handling, I made it simple and
wrote "there is no reliable way to tell", but Craig has laid out all the
reasoning behind the decision.
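
For developers who do want the _charset_ route from Craig's point 2, a
rough sketch of the "other means" might look like the following. This is
not something Django does for you; the function name and the fallback
parameter are assumptions for illustration, and it only helps with
browsers that actually fill in the hidden field.

```python
import codecs
from urllib.parse import parse_qs


def decode_with_charset_field(body, fallback="utf-8"):
    """Decode a url-encoded body, honouring a _charset_ field if present."""
    # Charset names are plain ASCII, so a first pass with a codec that
    # accepts every byte is enough to read the field.
    probe = parse_qs(body, encoding="latin-1")
    declared = probe.get("_charset_", [fallback])[0]
    try:
        codecs.lookup(declared)
    except LookupError:
        # Unknown or bogus charset name: fall back to the site default.
        declared = fallback
    return parse_qs(body, encoding=declared, errors="replace")


fields = decode_with_charset_field("_charset_=windows-1252&name=caf%E9")
bogus = decode_with_charset_field("_charset_=no-such-charset&name=abc")
```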

Regards,
Malcolm

-- 
How many of you believe in telekinesis? Raise my hand... 
http://www.pointy-stick.com/blog/

