On May 30, 5:58 pm, [EMAIL PROTECTED] (Chas Owens) wrote:
> On 30 May 2007 06:07:55 -0700, cc96ai <[EMAIL PROTECTED]> wrote:
> snip
> > I have a UTF8 input
> > $value = "%23%C2%A9%C2%AE%C3%98%C2%A5%C2%BC%C3%A9%C3%8B
> > %C3%B1%C3%A0%C3%A6%3F%23";
>
> > the HTML output should be
> > ">#©®Ø¥¼éËñàæ?#";
>
> > but I cannot find a way to convert it
>
> snip
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use URI::Escape;
>
> my $s = '%C3%A9';
>
> print uri_unescape($s), "\n";
>
> This prints
> é
> for me.

But Perl doesn't actually print an e-acute character!

It prints the _byte_ sequence "\xC3\xA9".

Now if you happen to print this byte sequence to a device that's
expecting UTF-8 then it'll be rendered as an e-acute.

Remember, in Perl there are two types of string, Unicode strings
(unfortunately known as "utf8 strings") and byte strings. I suspect
the OP wants to decode '%C3%A9' into a single character string
containing e-acute, not the two-byte byte string "\xC3\xA9".
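
To make the distinction concrete, here is a minimal sketch (assuming
Encode and URI::Escape are both installed; the variable names are mine)
that compares the two kinds of string by length and content:

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);
use URI::Escape qw(uri_unescape);

# Byte string: uri_unescape() only undoes the percent-encoding,
# so this holds the two octets 0xC3 and 0xA9.
my $bytes = uri_unescape('%C3%A9');

# Character string: decoding those octets as UTF-8 yields the
# single character U+00E9.
my $chars = decode_utf8($bytes);

printf "bytes: length %d, octets %s\n",
    length($bytes),
    join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
printf "chars: length %d, code point U+%04X\n",
    length($chars), ord($chars);
```

The length() results differ because the first string is counted in
octets and the second in characters.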

Oddly, there's a uri_escape_utf8 but no uri_unescape_utf8 provided
by URI::Escape.

However, combining URI::Escape::uri_unescape() and
Encode::decode_utf8() in one statement is not overly taxing.

use Encode;
use URI::Escape qw(uri_unescape);
# Percent-decode to bytes, then decode the bytes as UTF-8.
my $e_acute = decode_utf8(uri_unescape('%C3%A9'));
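
Applied to the OP's whole input, the same two calls might look like the
sketch below. I'm assuming the wrapped string was meant to be a single
line, and the two output options are only my guess at what "HTML
output" means here: either numeric character references, or the
characters themselves written out as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);
use URI::Escape qw(uri_unescape);

# The OP's input, joined back onto one line (an assumption).
my $value = '%23%C2%A9%C2%AE%C3%98%C2%A5%C2%BC%C3%A9%C3%8B'
          . '%C3%B1%C3%A0%C3%A6%3F%23';

# Percent-decode to bytes, then decode those bytes as UTF-8.
my $text = decode_utf8(uri_unescape($value));

# Option 1: ASCII-only output, replacing each non-ASCII character
# with a numeric character reference such as &#233;.
(my $html = $text) =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord $1)/ge;
print "$html\n";

# Option 2: print the characters themselves, letting an output
# layer encode them as UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";
```

Option 1 does not escape &, < or >, so real HTML output would need
that handled as well.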


