On Sun, Sep 21, 2008 at 08:36:32AM +0200, Gisle Aas wrote:
> The issue with dropped chars has been fixed so I don't worry about
> that.  Just upgrade the URI module.
> 
> The remaining issue is if $url->query_form should accept Unicode data
> and automatically UTF-8 encode it as it does now.  When I accepted
> that patch I thought it would be harmless as this provides a convenience
> for some at the same time as it does not change anything for users
> that properly encode their data before passing it to this API. What's
> problematic is that this strengthens the idea that the UTF-8 flag has
> semantic meaning at the Perl level.  Strings with chars in the range
> 128-255 behave differently depending on the internal representation.
> I'm not happy about that.  It's certainly not my idea of a sane
> Unicode model.
> 
> To me that leaves 2 options; either make the URI API strict and only
> accept args that are bytes (strings that can be utf8::downgraded) or
> just live with the ugliness of an inconsistent Unicode model and try to
> document the issues better over time. I'm leaning towards the latter.

Sorry, kind of got stuck behind work here.

So, in my situation I need to post some utf8 characters.  The service
I'm using requires an ?encoding=utf8 query parameter to say what
encoding the text is encoded in.  The post doesn't include
a charset:

    Content-Type: application/x-www-form-urlencoded

So it seems the server needs to be explicitly told.


The problem I had was if I passed in a character string (utf8 flag on)
then the url-encoding process dropped chars.  You say that has been
fixed.  I fixed it on my side by simply calling encode_utf8 to convert my
character string into octets.  Then all octets were url-encoded and
passed to the server and all works fine.
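
A minimal sketch of that workaround, assuming Encode's encode_utf8 and
using URI::Escape's uri_escape as a stand-in for the url-encoding step
(the sample string is made up for illustration):

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);
use URI::Escape qw(uri_escape);

# A character string containing U+00E9 and U+263A (hypothetical sample data).
my $text = "caf\x{e9} \x{263a}";

# Convert the characters to UTF-8 octets first, then percent-encode the octets.
my $octets  = encode_utf8($text);
my $escaped = uri_escape($octets);

print "$escaped\n";   # caf%C3%A9%20%E2%98%BA
```

Every octet outside the unreserved set ends up as a %XX escape, so
nothing depends on how Perl stored the original string internally.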

Now, here's my question.  Could I pass in any byte (octet) string and
have it url-encoded?  Do the url-encoded post parameters have to be of
a given character encoding or is that just an agreement between the
sender and receiver?

That is, can I encode my character string into any character
encoding and send it url-encoded?  Then as long as the server
receiving the post knows how to decode it (using the same encoding I used)
then it would be fine?
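
To make the question concrete, here is a rough sketch (again using
URI::Escape rather than query_form, with made-up sample text) showing
that the same characters produce different escapes under different
encodings, and that the receiver only recovers them by decoding with
the encoding the sender used:

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use URI::Escape qw(uri_escape uri_unescape);

my $text = "caf\x{e9}";   # hypothetical sample text

# Sender: the same characters give different octets per encoding.
my $as_utf8   = uri_escape(encode('UTF-8', $text));        # caf%C3%A9
my $as_latin1 = uri_escape(encode('ISO-8859-1', $text));   # caf%E9

# Receiver: unescape to octets, then decode with the agreed encoding.
my $round_trip = decode('ISO-8859-1', uri_unescape($as_latin1));
print $round_trip eq $text ? "match\n" : "mismatch\n";
```

The escaped form itself carries no encoding information, which is why
the sketch has to name the encoding on both sides.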

If that's the case then it would seem like query_param should die if
it receives any strings with the utf8 flag on.  You can't encode_utf8
or utf8::downgrade because we don't know which (octet) encoding the
sender and receiver agreed on.
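
The strict behaviour proposed above could look roughly like the
following. The assert_octets helper is hypothetical and not part of the
URI module; it only illustrates the idea of rejecting flagged strings:

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper, not part of the URI module: refuse character strings
# so the caller has to pick the octet encoding explicitly.
sub assert_octets {
    my ($value) = @_;
    die "expected octets, got a character string (utf8 flag on)\n"
        if utf8::is_utf8($value);
    return $value;
}

my $chars = "\x{263a}";                            # code point above 255, flag is on
my $ok    = assert_octets(encode_utf8($chars));    # passes: already octets
my $died  = !eval { assert_octets($chars); 1 };    # character string is rejected
print $died ? "rejected\n" : "accepted\n";
```

Note that a check on the flag alone would also reject strings that
happen to be stored upgraded but contain only chars below 256, which is
the case the quoted "can be utf8::downgraded" test would still accept.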

-- 
Bill Moseley
[EMAIL PROTECTED]
Sent from my iMutt
