On Fri, 2 Jan 2004 11:56:12 +0100 Eric Cholet <[EMAIL PROTECTED]> wrote:
> Thanks for your detailed reply. I looked into this and found that I
> can use Unicode::Normalize to decompose a string in NFD form and then
> remove the accents with a regex removing /\pM/. I wonder if I overlooked
> a shortcoming in this approach since you didn't recommend it although
> you are the author of Unicode::Normalize.

I'm afraid that taking NFD and then removing \pM characters
(remove_accent() below) removes too much: it strips marks other
than accents as well.

For example, it replaces '≠' (U+2260, <NOT EQUAL TO>) with '='
(U+003D, <EQUALS SIGN>), because NFD decomposes U+2260 into '='
followed by U+0338 <COMBINING LONG SOLIDUS OVERLAY>, the mathematical
"negation slash", and that combining mark is then removed.

    use Unicode::Normalize;   # exports NFD() by default

    # Decompose to NFD, then delete every character with the
    # Unicode Mark property (\pM): accents, but also other marks.
    sub remove_accent {
        my $s = NFD(shift);
        $s =~ s/\pM//g;
        return $s;
    }

Regards,
SADAHIRO Tomoyuki
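A quick check of the NFD-then-strip-\pM behaviour discussed in this message, as a minimal sketch assuming Perl 5.8 or later with Unicode::Normalize available. The two sample strings are illustrative choices, not taken from the original thread: one shows the intended effect (an accent removed), the other the unintended one (U+2260 collapsing to '=').

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Same approach as remove_accent() in the message: NFD, then drop \pM.
sub remove_accent {
    my $s = NFD(shift);
    $s =~ s/\pM//g;
    return $s;
}

# Intended case: U+00E9 decomposes to 'e' + U+0301; the accent is dropped.
print remove_accent("caf\x{e9}"), "\n";    # prints "cafe"

# Unintended case: U+2260 decomposes to '=' + U+0338; the negation slash is dropped.
print remove_accent("\x{2260}"), "\n";     # prints "="
```

Both outputs are plain ASCII, so no output encoding layer is needed for this check.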