Re: Compare a UTF-8 string with an ISO-8859-2 one

Dr.Ruud Sat, 06 May 2006 05:38:25 -0700

"Octavian Rasnita" wrote:

> How can I compare a certain string which is UTF-8 encoded with the
> same string which is ISO-8859-2?


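Let's use "ä" (HTML "&auml;") as an example. In Unicode you can express
it either as the single precomposed code point "\x{E4}", or as the
sequence "\x{61}\x{308}", i.e. "a\x{308}" (a followed by a combining
diaeresis). Both render the same, but they are different strings.
See http://www.unicode.org/faq/char_combmark.html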

For example:

  perl -MUnicode::Normalize -e '
    printf "%x\n", ord for split "", NFKD(qq<\x{E4}>)'

  perl -MUnicode::Normalize -le 'print NFKC(qq<\x{61}\x{308}>)'
  perl -MUnicode::Normalize -le 'print NFKC(qq<a\x{308}u\x{308}>)'
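To see why this matters for a comparison, here is a minimal sketch (the
variable names are mine, chosen only for illustration). It shows that eq
treats the two spellings as different strings until both sides are
brought to the same normalization form:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

my $precomposed = "\x{E4}";     # a with diaeresis as one code point
my $decomposed  = "a\x{308}";   # a followed by COMBINING DIAERESIS

# eq compares code point by code point, so the raw strings differ.
my $raw_equal = $precomposed eq $decomposed ? "Yes" : "No";

# Normalizing both sides to the same form makes them compare equal.
my $nfc_equal = NFC($precomposed) eq NFC($decomposed) ? "Yes" : "No";
my $nfd_equal = NFD($precomposed) eq NFD($decomposed) ? "Yes" : "No";

print "raw: $raw_equal, NFC: $nfc_equal, NFD: $nfd_equal\n";
printf "lengths: %d vs %d\n", length $precomposed, length $decomposed;
```

Which form you pick matters less than applying the same one to both
strings.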


So, before you compare Unicode data, you need to 'normalize' it: decode
both strings into Perl character strings, then bring both to the same
normalization form.

  perl -MEncode -MUnicode::Normalize -le '
    $s = encode q<iso-8859-2>, chr 0xE4;
    print( (decode(q<iso-8859-2>, $s) eq NFKC(qq<a\x{308}>))
    ? "Yes" : "No" )'

  perl -MUnicode::Normalize -le 'use encoding q<iso-8859-2>;
    print( chr(0xE4) eq NFKC(qq<a\x{308}>)
    ? "Yes" : "No" )'
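Applied to the original question, where one string arrives as UTF-8
octets and the other as ISO-8859-2 octets, the same idea might look like
the sketch below. The helper name same_text and the sample byte strings
are my own assumptions, and I use NFC here rather than NFKC; NFKC would
additionally fold compatibility characters, which you may or may not
want.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFC);

# Hypothetical helper: decode two byte strings from their respective
# encodings, normalize both to NFC, and compare the character strings.
sub same_text {
    my ($bytes_a, $enc_a, $bytes_b, $enc_b) = @_;
    my $text_a = NFC(decode($enc_a, $bytes_a));
    my $text_b = NFC(decode($enc_b, $bytes_b));
    return $text_a eq $text_b;
}

my $utf8_bytes      = "\xC3\xA4";    # U+00E4 as UTF-8 octets
my $utf8_decomposed = "a\xCC\x88";   # a + U+0308 as UTF-8 octets
my $latin2_bytes    = "\xE4";        # U+00E4 as ISO-8859-2 octets

# Comparing the raw octets fails; comparing decoded text succeeds.
print "octets:     ", ($utf8_bytes eq $latin2_bytes ? "Yes" : "No"), "\n";
print "decoded:    ",
    (same_text($utf8_bytes, "UTF-8", $latin2_bytes, "iso-8859-2")
        ? "Yes" : "No"), "\n";
print "decomposed: ",
    (same_text($utf8_decomposed, "UTF-8", $latin2_bytes, "iso-8859-2")
        ? "Yes" : "No"), "\n";
```

The decode step is what bridges the two encodings; the normalization
step only matters when one side may contain combining sequences.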

-- 
Affijn, Ruud

"Gewoon is een tijger."


