RE: Stripping out Unicode combining characters (diacritics)

Doran, Michael D Mon, 05 May 2008 19:12:58 -0700

Hi Mike,

I appreciate the quick reply.  I am familiar with the Unicode::Normalize module 
(and will also be using that), but I left it out of this question because it's 
not relevant to the problem I'm currently trying to solve.  The text I'm trying 
to strip diacritics out of does not have precomposed accented characters.


-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 cell
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/



-----Original Message-----
From: Mike Rylander [mailto:[EMAIL PROTECTED]]
Sent: Mon 5/5/2008 8:52 PM
To: Doran, Michael D
Cc: [EMAIL PROTECTED]; Perl4lib
Subject: Re: Stripping out Unicode combining characters (diacritics)
 
On Mon, May 5, 2008 at 8:26 PM, Doran, Michael D <[EMAIL PROTECTED]> wrote:
[snip]
>
>  I'm pulling my hair out on this... so any help would be appreciated.  If 
> there's any other info I can provide, let me know.
>

You'll want to transform the text to NFD form (nominally, base
characters plus combining marks) instead of NFC (precomposed
characters) using Unicode::Normalize:

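 use Unicode::Normalize;

 # Decompose to NFD so each accent becomes a separate combining mark
 my $text = NFD($original);
 # Remove every combining mark (Unicode general category M)
 $text =~ s/\pM+//g;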
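
For anyone who wants to try the approach above in isolation, here is a minimal self-contained sketch. The helper name strip_marks and the sample strings are made up for illustration and are not from this thread. It assumes the input is already a decoded Perl character string; if it arrives as raw UTF-8 bytes it would need decoding first (for example with Encode::decode), otherwise \p{M} will not match anything. The second sample is already decomposed, in which case NFD should leave it unchanged and the substitution alone does the work.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper (not from the thread): decompose the text,
# then delete every combining mark.
sub strip_marks {
    my ($text) = @_;
    my $decomposed = NFD($text);    # leaves already-decomposed text as it is
    $decomposed =~ s/\p{M}+//g;     # \p{M} = any character in general category Mark
    return $decomposed;
}

# Sample inputs (assumed, for illustration only)
my $precomposed = "caf\x{E9}";      # U+00E9, a single precomposed character
my $decomposed  = "cafe\x{301}";    # plain "e" followed by U+0301 COMBINING ACUTE ACCENT

binmode(STDOUT, ':encoding(UTF-8)');
print strip_marks($precomposed), "\n";
print strip_marks($decomposed),  "\n";
```

Both calls go through the same code path, so the same routine should cover text with precomposed characters and text that only contains base characters plus combining marks.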

Hope that helps.

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: [EMAIL PROTECTED]
 | web: http://www.esilibrary.com
