On Sun, Sep 21, 2008 at 08:36:32AM +0200, Gisle Aas wrote:
> The issue with dropped chars has been fixed so I don't worry about
> that. Just upgrade the URI module.
>
> The remaining issue is if $url->query_form should accept Unicode data
> and automatically UTF-8 encode it as it does now. When I accepted
> that patch I thought it would be harmless as this provides a
> convenience for some at the same time as it does not change anything
> for users that properly encode their data before passing it to this
> API. What's problematic is that this strengthens the idea that the
> UTF-8 flag has semantic meaning at the Perl level. Strings with chars
> in the range 128-255 behave differently depending on the internal
> representation. I'm not happy about that. It's certainly not my idea
> of a sane Unicode model.
>
> To me that leaves 2 options; either make the URI API strict and only
> accept args that are bytes (strings that can be utf8::downgraded) or
> just live with the ugliness of inconsistent Unicode model and try to
> document the issues better over time. I'm leaning towards the latter.

Sorry, kind of got stuck behind work here.

So, in my situation I need to post some utf8 characters. The service
I'm using requires an ?encoding=utf8 query parameter to say what
encoding the text is encoded in. The post doesn't include a charset:

    Content-Type: application/x-www-form-urlencoded

So it seems the server needs to be explicitly told.

The problem I had was that if I passed in a character string (utf8 flag
on) then the url-encoding process dropped chars. You say that has been
fixed.

I fixed it on my side by simply calling encode_utf8 to convert my
character string into octets. Then all octets were url-encoded and
passed to the server, and all works fine.

Now, here's my question. Could I pass in any byte (octet) string and
have it url-encoded? Do the url-encoded post parameters have to be in
a given character encoding, or is that just an agreement between the
sender and receiver? That is, can I encode my character string into
any character encoding and send it url-encoded? Then as long as the
server receiving the post knows how to decode it (using the same
encoding I used), it would be fine?

If that's the case then it would seem like query_param should die if it
receives any strings with the utf8 flag on. You can't encode_utf8 or
utf8::downgrade because we don't know which (octet) encoding the sender
and receiver agreed on.

-- 
Bill Moseley
[EMAIL PROTECTED]
Sent from my iMutt
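
P.S. To make the question concrete, here is a minimal sketch of what I
mean, using only the core Encode module. The escape_octets and
unescape_octets helpers are hypothetical, written just for this
illustration; they are not part of URI and are not how URI does its
escaping. The sketch assumes the sender and receiver have agreed on the
encoding out of band.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper (not part of URI): percent-encode an octet string,
# emitting one %XX per byte outside the unreserved set.
# Dies on characters above 255, since those cannot be octets.
sub escape_octets {
    my ($octets) = @_;
    die "wide character in octet string\n" if $octets =~ /[^\x00-\xFF]/;
    $octets =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/ge;
    return $octets;
}

# Hypothetical helper: reverse of escape_octets, returns raw octets.
sub unescape_octets {
    my ($escaped) = @_;
    $escaped =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $escaped;
}

my $text = "caf\x{e9}";    # character string: c, a, f, e-acute

# Same characters, two different agreed encodings, two different payloads.
my $as_utf8   = escape_octets(encode('UTF-8',      $text));
my $as_latin1 = escape_octets(encode('ISO-8859-1', $text));

print "$as_utf8\n";      # caf%C3%A9
print "$as_latin1\n";    # caf%E9

# The receiver recovers the text only by decoding with the same encoding.
my $round_trip = decode('ISO-8859-1', unescape_octets($as_latin1));
print "round trip ok\n" if $round_trip eq $text;
```

Note that escape_octets cannot tell whether "caf\x{e9}" passed in
directly is Latin-1 octets or unencoded characters, which I think is the
128-255 ambiguity you describe.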