Anthony Ettinger wrote:
> There must be a UTF-8 HTML entity module on CPAN?
Thanks. That got me thinking in the right direction.
HTML::Entities will translate entities to Unicode characters.
Unfortunately, if every character in the decoded string has an ordinal
value below 256, the string returned by decode_entities doesn't have the
UTF-8 flag on, but I can force it to UTF-8 by using decode_utf8 from the
Encode module.
For example,
use HTML::Entities;
use Encode qw(decode_utf8);
$string1 = "Test string with &iquest;";
$string2 = decode_utf8(decode_entities($string1));
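One caveat worth checking with the decode_utf8 call above: it treats its argument as UTF-8 octets, and a lone byte 0xBF is not valid UTF-8, so the character may come back as U+FFFD rather than as the inverted question mark. The sketch below is only an illustration, not a tested fix for this case. It assumes the goal is just to turn the UTF-8 flag on without changing any characters, uses the built-in utf8::upgrade for that, and uses a literal "\xBF" as a stand-in for what decode_entities returns so that it runs with core Perl only.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Stand-in (assumption) for decode_entities("Test string with &iquest;"):
# every character is below 256 and the UTF-8 flag is off.
my $decoded = "Test string with \xBF";

# utf8::upgrade switches the internal representation to UTF-8 in place;
# the characters themselves are unchanged.
my $upgraded = $decoded;
utf8::upgrade($upgraded);

# decode_utf8 interprets its argument as UTF-8 octets instead. A lone 0xBF
# is malformed UTF-8, so the default behaviour substitutes U+FFFD.
my $redecoded = decode_utf8($decoded);

printf "flag before upgrade: %d\n", utf8::is_utf8($decoded)  ? 1 : 0;
printf "flag after upgrade:  %d\n", utf8::is_utf8($upgraded) ? 1 : 0;
printf "last char after upgrade:     U+%04X\n", ord(substr($upgraded,  -1));
printf "last char after decode_utf8: U+%04X\n", ord(substr($redecoded, -1));
```

utf8::is_utf8 is used here only to inspect the flag; the two strings $decoded and $upgraded still compare equal with eq.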
If the original string contains an entity whose ordinal value is greater
than 255, or if I append a character with an ordinal value greater than
255 to the end of the string, the string will become UTF-8.
use HTML::Entities;
$utf8_chr = chr(256);
$string1 = "Test string with &iquest;";
$string2 = decode_entities($string1);
# $string2 is still not flagged as UTF-8
$string2 .= $utf8_chr;
# now $string2 is UTF-8 encoded and the UTF-8 flag is on
# strip the helper character again; the flag stays on
$string2 =~ s/$utf8_chr$//;
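To watch the flag change at each step of the append-and-strip trick, something like the following could be used. It is a rough sketch under the same assumption as before: a literal "\xBF" stands in for the decode_entities result so that no CPAN module is needed, and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Stand-in (assumption) for the string returned by decode_entities.
my $string = "Test string with \xBF";
my $wide   = chr(256);    # any character above 255 will do

my $flag_before = utf8::is_utf8($string) ? 1 : 0;

# Appending a character above 255 makes Perl upgrade the whole string.
$string .= $wide;
my $flag_appended = utf8::is_utf8($string) ? 1 : 0;

# Removing the helper character does not downgrade the string again.
$string =~ s/$wide\z//;
my $flag_stripped = utf8::is_utf8($string) ? 1 : 0;

printf "before=%d appended=%d stripped=%d length=%d\n",
    $flag_before, $flag_appended, $flag_stripped, length($string);
```

The \z anchor is used instead of $ so that only the very end of the string matches, even if the string happens to end in a newline.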
Thanks for your suggestion.
Carl.