On May 30, 5:58 pm, [EMAIL PROTECTED] (Chas Owens) wrote:
> On 30 May 2007 06:07:55 -0700, cc96ai <[EMAIL PROTECTED]> wrote:
> snip
> > I have a UTF8 input
> > $value = "%23%C2%A9%C2%AE%C3%98%C2%A5%C2%BC%C3%A9%C3%8B
> > %C3%B1%C3%A0%C3%A6%3F%23";
>
> > the HTML output should be
> > ">#©®Ø¥¼éËñàæ?#";
>
> > but I cannot find a way to convert it
>
> snip
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use URI::Escape;
>
> my $s = '%C3%A9';
>
> print uri_unescape($s), "\n";
>
> This prints
> é
> for me.
But Perl doesn't actually print an e-acute character! It prints the _byte_ sequence "\xC3\xA9". If you happen to print that byte sequence to a device that expects UTF-8, then it'll be rendered as an e-acute.

Remember, in Perl there are two types of string: Unicode strings (unfortunately known as "utf8 strings") and byte strings. I suspect the OP wants to decode '%C3%A9' into a one-character string containing e-acute, not the two-byte byte string "\xC3\xA9".

Oddly, URI::Escape provides uri_escape_utf8 but no uri_unescape_utf8. However, combining URI::Escape::uri_unescape() and Encode::decode_utf8() in one statement is not overly taxing:

    use Encode qw(decode_utf8);
    use URI::Escape qw(uri_unescape);

    my $e_acute = decode_utf8 uri_unescape '%C3%A9';
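To see the byte-string vs. character-string difference for yourself, here is a minimal sketch using only core modules. The helper percent_decode() is hypothetical: it stands in for URI::Escape::uri_unescape (it does the same basic %XX substitution) purely so the example runs without the URI distribution installed. It checks the length of each kind of string, then decodes the OP's full input.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Encode qw(decode_utf8);

    # Hypothetical stand-in for uri_unescape: turns each %XX into
    # one byte, so the result is a *byte* string.
    sub percent_decode {
        my ($s) = @_;
        $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        return $s;
    }

    my $bytes = percent_decode('%C3%A9');   # two bytes: "\xC3\xA9"
    my $chars = decode_utf8($bytes);        # one character: U+00E9

    printf "byte string: length %d\n", length $bytes;
    printf "char string: length %d, code point U+%04X\n",
        length $chars, ord $chars;

    # The OP's full input, decoded to a character string
    my $value = "%23%C2%A9%C2%AE%C3%98%C2%A5%C2%BC%C3%A9%C3%8B"
              . "%C3%B1%C3%A0%C3%A6%3F%23";
    my $text = decode_utf8(percent_decode($value));
    printf "%d characters\n", length $text;

    # A character string has to be encoded again on the way out,
    # here by putting a UTF-8 encoding layer on STDOUT.
    binmode STDOUT, ':encoding(UTF-8)';
    print $text, "\n";

The byte string reports length 2 and the decoded string length 1; the whole input comes out as 13 characters. Without the binmode line Perl would print wide or Latin-1 characters with no UTF-8 encoding applied, which is the same confusion in reverse.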