Re: Silent removed parameters when posting UTF8

Bill Moseley Thu, 25 Sep 2008 06:27:51 -0700

On Thu, Sep 25, 2008 at 11:35:40AM +0200, Gisle Aas wrote:
> >  Do the url-encoded post parameters have to be of
> > a given character encoding or is that just an agreement between the
> > sender and receiver?
> 
> There certainly has to be agreement between the sender and the
> receiver.  I thought the normal behaviour was to encode using the same
> encoding as the document the form was embedded in uses.


Yes, that's why I use accept-charset=utf8 on my forms.  BTW, I looked
at a Firefox post with Wireshark and there's no charset added to the
urlencoded content type.  Seems like I've seen examples of adding a
charset to that content type.

In the absence of a charset in the post I guess the server just has to
assume it's encoded as requested (with accept-charset).  That's what I
do.


Now, in the case that triggered this question there is no form -- just
documentation that says "post to this url", and the url has
?encoding=utf8 if the content is utf8.



> > If that's the case then it would seem like query_param should die if
> > it receives any strings with the utf8 flag on.  You can't encode_utf8
> > or utf8::downgrade because we don't know what (octet) encoding that
> > the sender and receiver agreed on.
> 
> I basically agree with that view.  It can still be convenient to have
> it assume UTF-8 encoding in this case, and there is the potential that
> introducing this strictness breaks code.

With something as widely used as URI I don't think you can make that
change.  But a warning would probably be safe.  Maybe a package
variable could be used to enable exceptions.
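
To make the idea concrete, here is a minimal sketch of a check that
warns by default and dies when a package variable is set.  The package
name My::ParamCheck, the $STRICT variable and check_param() are all
made up for illustration; nothing like this exists in URI today.

```perl
use strict;
use warnings;

package My::ParamCheck;    # hypothetical package, not part of URI

our $STRICT = 0;           # set to 1 to turn the warning into an exception

# Complain when a value has the utf8 flag on, i.e. it is a character
# string that has not been encoded to octets yet.
sub check_param {
    my ($value) = @_;
    return $value unless utf8::is_utf8($value);
    my $msg = "parameter is a character string; encode it to octets first";
    die "$msg\n" if $STRICT;
    warn "$msg\n";
    return $value;
}

package main;

My::ParamCheck::check_param("plain");       # silent: flag is off
My::ParamCheck::check_param("\x{231A}");    # warns: flag is on

{
    # Opt in to exceptions for this scope only.
    local $My::ParamCheck::STRICT = 1;
    eval { My::ParamCheck::check_param("\x{231A}") };
    print "died: $@" if $@;
}
```

Existing code would keep working and only see the warning; callers who
want the strict behaviour opt in with local.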


> $u->query_form(foo => "bål");
> $u->query_form(foo => "bål", bar => "\N{WATCH}");
> 
> which prints:
> 
>   http://www.example.com?foo=b%E5l
>   http://www.example.com?foo=b%C3%A5l&bar=%E2%8C%9A
> 
> Here the encoding of the first parameter depends on the presence of
> the second parameter which is clearly not a good thing.

Right, the 8859-1 encoded bål was upgraded to a utf8 character string
when combined with the utf8 character string.  It might have been
different if your script was encoded in utf8 and had the use utf8
pragma.
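
A small sketch of that upgrade, outside of URI.  It assumes "b\xE5l"
stands in for a Latin-1 "bål" typed in a script without use utf8, and
the hand-rolled escaping is only there to show the octets; it is not
how URI does it internally.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $latin1 = "b\xE5l";      # "bål" as it looks without "use utf8"
my $watch  = "\x{231A}";    # U+231A WATCH, a wide character

# Joining with a wide-character string upgrades the whole result.
my $joined = $latin1 . "&" . $watch;

printf "%d\n", utf8::is_utf8($latin1) ? 1 : 0;    # prints 0
printf "%d\n", utf8::is_utf8($joined) ? 1 : 0;    # prints 1

# Encoding the upgraded string as UTF-8 turns the single character
# 0xE5 into the two octets C3 A5.
my $octets = encode("UTF-8", $joined);
(my $escaped = $octets) =~ s/([^A-Za-z0-9&])/sprintf("%%%02X", ord $1)/ge;
print "$escaped\n";    # prints b%C3%A5l&%E2%8C%9A
```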

I kind of wish the utf8 flag in Perl was called "character" instead.
It's not utf8 (ok, it is), but it really should be thought of as an
abstract representation of characters without any encoding.

When the utf8 flag is set, the string is a Perl character string, and
to use it outside of Perl (e.g. sending it to another server) it
really needs to be converted to octets.  Most of the time it's up to
the user to decide which encoding to use; that is not something the
module can guess.
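
Something along these lines is what I mean by the user deciding.  This
is only a sketch: escape_param() is a made-up helper, and the only real
API used is Encode::encode with its FB_CROAK and LEAVE_SRC check flags.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: the caller names the encoding, nothing is guessed.
sub escape_param {
    my ($chars, $encoding) = @_;
    # Croak instead of silently substituting characters that the
    # chosen encoding cannot represent.
    my $octets = encode($encoding, $chars,
                        Encode::FB_CROAK() | Encode::LEAVE_SRC());
    # Percent-escape every octet outside the unreserved set.
    $octets =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord $1)/ge;
    return $octets;
}

my $name = "b\x{E5}l";    # the character string "bål"

print escape_param($name, "UTF-8"),      "\n";    # prints b%C3%A5l
print escape_param($name, "ISO-8859-1"), "\n";    # prints b%E5l
```

Passing "\x{231A}" with ISO-8859-1 would die here, which seems better
than quietly sending something the receiver did not agree to.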

-- 
Bill Moseley
[EMAIL PROTECTED]
Sent from my iMutt
