Anthony Ettinger wrote:

> There must be a UTF-8 HTML entity module on CPAN?

Thanks. That got me thinking in the right direction.

HTML::Entities will translate entities to Unicode characters. Unfortunately,
if every character in the decoded string has an ordinal value below 256,
the string returned by HTML::Entities doesn't have the UTF-8 flag on. I can
force the flag on by passing the result through decode_utf8 from the Encode
module.

For example,

use HTML::Entities qw(decode_entities);
use Encode qw(decode_utf8);
$string1 = "Test string with ¿";
$string2 = decode_utf8(decode_entities($string1));
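A core-only sketch of the flag behaviour, assuming the entity decodes to U+00BF (INVERTED QUESTION MARK). It stands in chr(0xBF) for the decode_entities() result so HTML::Entities isn't needed, and contrasts decode_utf8, which reads its argument as UTF-8 *bytes*, with utf8::upgrade, which only switches the internal representation:

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Stand-in for decode_entities() output (assumption: the entity was U+00BF).
my $latin = "Test string with " . chr(0xBF);
print utf8::is_utf8($latin) ? "flagged\n" : "not flagged\n";

# utf8::upgrade turns the flag on without changing the characters.
my $upgraded = $latin;
utf8::upgrade($upgraded);
print utf8::is_utf8($upgraded) ? "flagged\n" : "not flagged\n";
print $upgraded eq $latin ? "same text\n" : "different text\n";

# decode_utf8 treats its argument as UTF-8 bytes. A lone 0xBF byte is not
# valid UTF-8, so with the default CHECK it becomes U+FFFD and the text changes.
my $decoded = decode_utf8($latin);
printf "last char: U+%04X\n", ord(substr($decoded, -1));
```

So decode_utf8() is the right call when the string really holds UTF-8 bytes (for example a literal ¿ typed into a UTF-8 source file without `use utf8`); when it already holds decoded characters, utf8::upgrade() sets the flag without altering them.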


If the original string contains entities with an ordinal value greater than 255, or if I append a character with an ordinal value greater than 255, the string does get the UTF-8 flag.


$utf8_chr = chr(256);

$string1 = "Test string with ¿";
$string2 = decode_entities($string1);

# $string2 is still not flagged as UTF-8

$string2 .= $utf8_chr;
# now $string2 is UTF-8 encoded and the UTF-8 flag is on

$string2 =~ s/$utf8_chr$//;
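For comparison, a core-only sketch of the append-and-strip workaround next to utf8::upgrade(), again standing in chr(0xBF) for the decode_entities() result (an assumption, so HTML::Entities isn't required). Both should end up with the same 18 characters and the flag on; the upgrade version needs no temporary character or regex:

```perl
use strict;
use warnings;

# Stand-in for decode_entities() output: every character is below 256.
my $text = "Test string with " . chr(0xBF);

# Workaround from the message: append chr(256), which upgrades the whole
# string, then strip that character again. The flag stays on afterwards.
my $via_concat = $text . chr(256);
$via_concat =~ s/\x{100}\z//;

# Core alternative: upgrade in place without a temporary character.
my $via_upgrade = $text;
utf8::upgrade($via_upgrade);

for my $s ($via_concat, $via_upgrade) {
    printf "%s, length %d, same text: %s\n",
        (utf8::is_utf8($s) ? 'flagged' : 'not flagged'),
        length($s),
        ($s eq $text ? 'yes' : 'no');
}
```

Perl does not downgrade a string automatically when wide characters are removed, which is why the stripped string keeps its flag.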



Thanks for your suggestion.
Carl.



_______________________________________________
Perl-Unix-Users mailing list
Perl-Unix-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
