On Sat, 27 Dec 2003 13:30:19 +0100 Eric Cholet <[EMAIL PROTECTED]> wrote:
> Here's another naive question from a unicode newbie:
> Is there a way, using perl's unicode support, to remove
> accents from a string? I looked at \pM but can't figure
> out how it works, I wasn't able to match anything with it.
>
> Thanks,
> --
> Eric Cholet

Hello.

There are some earlier threads on this issue. The ones I found are as follows.

* http://www.xray.mpe.mpg.de/mailing-lists/perl-unicode/2003-05/msg00016.html
* http://www.xray.mpe.mpg.de/mailing-lists/perl-unicode/2001-12/msg00004.html

I hope something there can help you.

==
P.S.
UTR #30, Character Foldings, describes two concepts for removing accents.
[cf. http://www.unicode.org/reports/tr30/ ]
One is "accent removal", and the other is
"diacritic removal (includes stroke, hook, descender)".

Accent removal uses canonical decomposition, so characters that have
no canonical decomposition, including
Eth ("Ð", U+00D0), O with stroke ("Ø", U+00D8),
c with curl (U+0255, cf. http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=0255 ),
d with hook (U+0257, cf. http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=0257 ),
will not be transformed.

Though "diacritic removal" is provisional and its definition has not
been specified yet, I suppose it would map "Ø" to "O", etc.

Regards,
SADAHIRO Tomoyuki
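
A minimal sketch of the "accent removal" approach described in the P.S.
(canonical decomposition, then dropping combining marks) might look like
the following. It assumes the Unicode::Normalize module is available; the
helper name remove_accents is hypothetical and does not come from the
threads linked above. It may also explain why \pM matched nothing in the
original attempt: a precomposed letter such as "é" is a single letter
character, and a separate combining mark only appears after decomposition.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper (name chosen for illustration only).
sub remove_accents {
    my ($text) = @_;

    # Canonical decomposition: U+00E9 becomes "e" followed by U+0301.
    my $decomposed = NFD($text);

    # \pM matches any character in the Mark category,
    # so this deletes the combining accents exposed by NFD.
    $decomposed =~ s/\pM//g;

    return $decomposed;
}

binmode STDOUT, ':encoding(UTF-8)';

# U+00E9 and U+00EF lose their accents; U+00D8 (O with stroke) has no
# canonical decomposition and is left unchanged.
print remove_accents("\x{E9}t\x{E9} na\x{EF}ve \x{D8}"), "\n";
```

Note that s/\pM//g removes every combining mark, not only Latin accents,
so text in scripts that rely on combining marks would be altered as well.
Mapping "Ø" to "O" would need an explicit table, since it falls under the
provisional "diacritic removal" rather than accent removal.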