Re: removing accents

SADAHIRO Tomoyuki Sat, 27 Dec 2003 20:48:28 -0800

On Sat, 27 Dec 2003 13:30:19 +0100
Eric Cholet <[EMAIL PROTECTED]> wrote:


> Here's another naive question from a unicode newbie:
> Is there a way, using perl's unicode support, to remove
> accents from a string? I looked at \pM but can't figure
> out how it works; I wasn't able to match anything with it.
> 
> Thanks,
> --
> Eric Cholet

Hello.
There have been some threads on this issue.
Here are the ones I found:

* http://www.xray.mpe.mpg.de/mailing-lists/perl-unicode/2003-05/msg00016.html
* http://www.xray.mpe.mpg.de/mailing-lists/perl-unicode/2001-12/msg00004.html

I hope something there can help you.

==
P.S. UTR #30, Character Foldings, defines two foldings that remove accents.
[cf. http://www.unicode.org/reports/tr30/ ]

One is "accent removal", and
the other is "diacritic removal (includes stroke, hook, descender)".

Accent removal relies on canonical decomposition, so characters that
have no canonical decomposition are not transformed. These include
Eth ("Ð", U+00D0), O with stroke ("Ø", U+00D8), c with curl (U+0255,
cf. http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=0255 ),
and d with hook (U+0257,
cf. http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=0257 ).
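A minimal sketch of accent removal by canonical decomposition follows. It is written in Python's `unicodedata` for illustration and is not the UTR #30 reference procedure. It also suggests why `\pM` seemed to match nothing: a precomposed "é" (U+00E9) is a single code point of category Ll, and the combining acute (U+0301, category Mn) only appears after decomposition. In Perl the same idea is `NFD()` from Unicode::Normalize followed by `s/\p{Mn}//g`. The sketch strips only nonspacing marks (Mn) rather than all of `\pM`, on the assumption that spacing marks (Mc) in some scripts are part of the letters themselves. The function name `remove_accents` is hypothetical.

```python
import unicodedata

def remove_accents(s: str) -> str:
    """Hypothetical helper: decompose canonically (NFD), drop nonspacing
    marks (category Mn), then recompose the remainder (NFC)."""
    decomposed = unicodedata.normalize("NFD", s)
    # Combining accents such as U+0300..U+0302 have category Mn.
    stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return unicodedata.normalize("NFC", stripped)

# Precomposed e-acute is a letter, not a mark, until it is decomposed.
print(unicodedata.category("\u00e9"))  # Ll
print([unicodedata.category(c) for c in unicodedata.normalize("NFD", "\u00e9")])  # ['Ll', 'Mn']

print(remove_accents("Cr\u00e8me br\u00fbl\u00e9e"))  # Creme brulee
# Characters with no canonical decomposition pass through unchanged.
print(remove_accents("\u00d0 \u00d8 \u0255 \u0257") == "\u00d0 \u00d8 \u0255 \u0257")  # True
```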

Although "diacritic removal" is provisional and its definition has not
been specified yet, I expect it to map "Ø" to "O", and so on.
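Until diacritic removal is specified, one possible approach is to run accent removal first and then pass the letters that survive through a small supplementary table. The sketch below assumes that approach. The table `EXTRA_FOLDINGS` and the function `remove_diacritics` are hypothetical. The table entries are illustrative guesses for letters with a stroke, hook, or curl and are not taken from UTR #30.

```python
import unicodedata

# Hypothetical, illustrative table: letters with a stroke, hook, or curl
# that have no canonical decomposition. NOT defined by UTR #30.
EXTRA_FOLDINGS = {
    "\u00d8": "O",  # LATIN CAPITAL LETTER O WITH STROKE
    "\u00f8": "o",  # LATIN SMALL LETTER O WITH STROKE
    "\u0141": "L",  # LATIN CAPITAL LETTER L WITH STROKE
    "\u0142": "l",  # LATIN SMALL LETTER L WITH STROKE
    "\u0255": "c",  # LATIN SMALL LETTER C WITH CURL
    "\u0257": "d",  # LATIN SMALL LETTER D WITH HOOK
}

def remove_diacritics(s: str) -> str:
    # Step 1: accent removal (NFD, then drop nonspacing marks).
    decomposed = unicodedata.normalize("NFD", s)
    no_marks = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    # Step 2: fold the remaining non-decomposable letters via the table.
    folded = "".join(EXTRA_FOLDINGS.get(c, c) for c in no_marks)
    return unicodedata.normalize("NFC", folded)

print(remove_diacritics("\u00d8resund \u0141\u00f3d\u017a"))  # Oresund Lodz
```

Keeping the table separate from the decomposition step makes it easy to adjust once the folding is actually specified.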

Regards,
SADAHIRO Tomoyuki
