Dear Niko,

> > > > Function url(-path-info=>1) does not work well if I have ISO-8859-2
> > > > accented chars in the URL. Utility function CGI::Util::escape()
> > > > unconditionally forces an ISO-8859-1 -> UTF-8 conversion:
> > > >
> > > > # force bytes while preserving backward compatibility -- dankogai
> > > > $toencode = pack("C*", unpack("U0C*", $toencode));
> > > Unfortunately 3.38 does not work.
> > OK, thanks.
>
> I must admit I'm a bit confused about the problem. Could you please
> give a simple test case (either a command-line version or a CGI script)
> with the current result and the one you'd expect?

See below.

> As far as I can see (looking at 3.29), url(-path-info=>1) will unescape()
> the PATH_INFO variable into 8-bit characters and then encode those manually
> into URL encoding with sprintf() as the last thing in the url() function.
>
> I can't see CGI::Util::escape() being called here - are you calling
> that manually?

url() calls query_string(), which calls escape().

> I do get your point about the idempotency of course:
>
> % perl -MCGI::Util=escape,unescape -E 'say escape(unescape("%E4"))'
> %C3%A4
>
> but it's not clear to me what this breaks, particularly as those aren't
> public subroutines.

This is the scenario: my CGI program runs and produces an ISO-8859-2
encoded HTML page with a form that is submitted with the GET method. The
user enters some accented chars (e.g. "ä") in the form, then clicks the
submit button. The browser honors the page encoding and sends a Latin-2
encoded URL back to the server, like
http://www.example.com/sample.cgi?search=%E4 . The CGI program unescapes
the query string and stores it internally as {search=>"\xe4"}.

The program then calls url() in order to place a self-pointing URL on the
next HTML page. url() calls query_string(), which uses CGI::Util::escape()
and produces http://www.example.com/sample.cgi?search=%C3%A4 . This is
because escape() assumes that the HTML page is encoded in UTF-8. If the
user follows this link, the browser sends back the wrong URL: after
unescaping, the program stores {search=>"\xc3\xa4"} and prints
http://www.example.com/sample.cgi?search=%C3%83%C2%A4 in the next round,
and so on.

I could not demonstrate this behavior off-line. But I set up a short demo
program that you can test with your browser if necessary. The script below
shows no more than your one-liner above.

------------------8<---------------------8<---------------
#!/usr/bin/perl
use strict;
use CGI::Util;
use Dumpvalue;

# Quote 8-bit characters so the raw bytes are visible in the dump.
my $dumper=Dumpvalue->new(quoteHighBit=>1);

my $latin2_string = "a\341e\351i\355o\363\366\365u\372\374\373"; #aáeéiíoóöőuúüű
$dumper->dumpValue($latin2_string);

# Escape the Latin-2 bytes, then unescape the result again.
my $escaped_string = CGI::Util::escape($latin2_string);
$dumper->dumpValue($escaped_string);

my $unescaped_string = CGI::Util::unescape($escaped_string);
$dumper->dumpValue($unescaped_string);
------------------8<---------------------8<---------------

Oops! Stop the press. I've just noticed in the 3.29 source that
CGI::Util::escape is substantially changed. It seems to be good for my
purposes:

  $toencode = pack("C*", unpack("C*", $toencode));

Note: this line can be omitted. :-) (However, it may cause problems if
someone wants to use UTF-8.) Unfortunately the latest version (3.42) is
confused again. :-(

Actually I defined my own MyCGI subclass that overrides
CGI::query_string() and CGI::Util::escape(). This works for me but is not
a simple and elegant solution.

Cheers

Gabor
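
The runaway re-encoding in the scenario above can be modelled without a browser. What follows is only a minimal sketch, assuming that escape() treats each 8-bit character as ISO-8859-1 and converts it to UTF-8 before percent-encoding, as described in this thread. The helpers escape_as_utf8(), escape_bytes() and unescape_bytes() are hypothetical names used for illustration and are not part of CGI.pm; only the core Encode module is used.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: percent-encode every byte outside the unreserved
# set, with no charset conversion at all.
sub escape_bytes {
    my ($bytes) = @_;
    $bytes =~ s/([^A-Za-z0-9_.~-])/sprintf("%%%02X", ord($1))/ge;
    return $bytes;
}

# Hypothetical helper modelling the behaviour described in this thread:
# each 8-bit character is taken as ISO-8859-1 and converted to UTF-8
# before percent-encoding. This is an assumption, not CGI.pm's code.
sub escape_as_utf8 {
    my ($chars) = @_;
    return escape_bytes(encode('UTF-8', $chars));
}

# Decode %XX sequences back into raw 8-bit characters.
sub unescape_bytes {
    my ($escaped) = @_;
    $escaped =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $escaped;
}

# "ä" as the single byte a browser sends for an ISO-8859-2 page.
my $value = "\xe4";

# Each round stands for one "render link, follow link" cycle.
for my $round (1 .. 3) {
    my $escaped = escape_as_utf8($value);
    print "round $round: search=$escaped\n";
    $value = unescape_bytes($escaped);
}
# round 1: search=%C3%A4
# round 2: search=%C3%83%C2%A4
# round 3: search=%C3%83%C2%83%C3%82%C2%A4

# Byte-preserving escaping is stable across round trips.
print "bytes only: search=", escape_bytes(unescape_bytes("%E4")), "\n";
# bytes only: search=%E4
```

The first two rounds match the URLs quoted in the scenario, and the value keeps growing with every further round. The byte-preserving variant returns the same %E4 that the browser originally sent, which is the property a self-pointing URL on a non-UTF-8 page needs.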