I started with some very simple (I thought) tests, but got completely confused very quickly. Here is the short program that I was using:
>>>> test.pl
use utf8;
use URI;
use URI::Escape;
print (uri_escape("\xFD") [snip]
With this, on perl, v5.6.1 built for MSWin32-x86-multi-thread (with 1 registered patch, see perl -V for more detail), I get
>>>> %FD %C3%BD
[snip] However, on perl, v5.8.4 built for i386-linux-thread-multi, I get:
>>>> %FD [snip] Nothing seems to work anymore, although (or because?) 5.8 has better Unicode support.
The (easiest|new canonical) way to go is to use uri_escape_utf8() instead of uri_escape(). Note that as of version 3.28, uri_escape_utf8() is NOT exported by default, so you have to ask for it explicitly when you load URI::Escape:
% perl -MURI::Escape -le 'print uri_escape("\xFD")'
%FD
% perl -MURI::Escape=uri_escape_utf8 -le 'print uri_escape_utf8("\xFD")'
%C3%BD

perldoc URI::Escape
uri_escape_utf8( $string )
uri_escape_utf8( $string, $unsafe )
Works like uri_escape(), but will encode chars as UTF-8 before
escaping them. This makes this function able to deal with characters
with code above 255 in $string. Note that chars in the 128 ..
255 range will be escaped differently by this function compared to
what uri_escape() would. For chars in the 0 .. 127 range there is
no difference.
The call:
$uri = uri_escape_utf8($string);
will be the same as:
use Encode qw(encode);
$uri = uri_escape(encode("UTF-8", $string));
but will even work for perl-5.6 for chars in the 128 .. 255 range.
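To put the pieces above together, here is a small sketch rather than anything official. It assumes a perl 5.8 or later with Encode and a URI::Escape recent enough to provide uri_escape_utf8(). The variable names ($latin1, $wide) and the choice of U+263A as a sample character above 255 are mine, picked only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);
# Import uri_escape_utf8 explicitly, since it may not be exported by default.
use URI::Escape qw(uri_escape uri_escape_utf8);

my $latin1 = "\xFD";       # code point in the 128 .. 255 range
my $wide   = "\x{263A}";   # code point above 255 (illustrative choice)

# Same character, two different escapes:
print uri_escape($latin1), "\n";        # %FD
print uri_escape_utf8($latin1), "\n";   # %C3%BD

# Above 255, uri_escape_utf8() and the Encode spelling agree:
print uri_escape_utf8($wide), "\n";                 # %E2%98%BA
print uri_escape(encode("UTF-8", $wide)), "\n";     # %E2%98%BA

# In the 0 .. 127 range both functions give the same result:
print uri_escape("a b"), " ", uri_escape_utf8("a b"), "\n";   # a%20b a%20b
```

If the output matches the comments, your installation behaves as the documentation describes, and which function you pick only matters for characters above 127.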
Dan the Encode Maintainer
