"Octavian Rasnita" schreef:
> How can I compare a certain string which is UTF-8 encoded with the
> same string which is ISO-8859-2?
Let's use "ä" as an example. In Unicode you can express it either as
the single precomposed character "\x{E4}", or as the two-character
sequence "\x{61}\x{308}", i.e. "a\x{308}" (a followed by a combining
diaeresis).
See http://www.unicode.org/faq/char_combmark.html
For example:
perl -MUnicode::Normalize -e 'printf "%x\n", ord for split "", NFKD(qq<\x{E4}>)'
perl -MUnicode::Normalize -le 'print NFKC(qq<\x{61}\x{308}>)'
perl -MUnicode::Normalize -le 'print NFKC(qq<a\x{308}u\x{308}>)'
So, before you compare Unicode strings, you need to decode them to
characters and 'normalize' both sides to the same form.
perl -MEncode -MUnicode::Normalize -le '
$s = encode q<iso-8859-2>, chr 0xE4;
print( (decode(q<iso-8859-2>, $s) eq NFKC(qq<a\x{308}>))
? "Yes" : "No" )'
perl -MUnicode::Normalize -le 'use encoding q<iso-8859-2>;
print( chr(0xE4) eq NFKC(qq<a\x{308}>)
? "Yes" : "No" )'
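
To come back to the original question, here is a minimal sketch of the
same idea as a script: decode each byte string from its own encoding,
normalize both results, then compare. The helper name same_text and the
sample byte strings are only my illustration, and I picked NFC here;
NFKC as in the one-liners above gives the same result for this example.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFC);

# Hypothetical helper (not part of any module): decode each byte string
# from its own encoding, normalize both to NFC, compare as characters.
sub same_text {
    my ($bytes_x, $enc_x, $bytes_y, $enc_y) = @_;
    my $x = NFC(decode($enc_x, $bytes_x));
    my $y = NFC(decode($enc_y, $bytes_y));
    return $x eq $y;
}

my $utf8   = "a\xCC\x88";   # UTF-8 bytes for "a" + U+0308 (decomposed)
my $latin2 = "\xE4";        # ISO-8859-2 byte for the precomposed "ä"

print same_text($utf8, 'UTF-8', $latin2, 'iso-8859-2') ? "Yes" : "No", "\n";
```

This prints "Yes". Comparing the raw bytes with eq would say "No",
because the two encodings store the same text as different bytes.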
--
Affijn, Ruud
"Gewoon is een tijger."