Dan Sugalski <[EMAIL PROTECTED]> writes:

> Should perl's regexes and other character comparison bits have an option
> to consider different characters for the same thing as identical beasts? 
> I'm thinking in particular of the Katakana/Hiragana bits of japanese,
> but other languages may have the same concepts.

I think canonicalization gets you that if that's what you want.  I
definitely think that Perl should be able to do all of NFD, NFC, NFKD, and
NFKC canonicalization.

NFC will collapse canonically equivalent spellings of the same thing (a
precomposed character versus a base character plus combining marks, say)
to a single form.  NFKC will go further and also fold the compatibility
characters, doing stuff like getting rid of superscripts and the like.
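
To make the difference between the forms concrete, here is a minimal
sketch.  It is in Python rather than Perl, purely because its standard
unicodedata module makes for a short, self-contained illustration; the
helper name equal_under and the sample strings are made up for this
example.  Note that the plain hiragana/katakana pair stays distinct under
every form, so normalization alone only covers things like half-width
katakana.

```python
import unicodedata

FORMS = ("NFD", "NFC", "NFKD", "NFKC")

def equal_under(form, a, b):
    """Return True if a and b are identical after normalizing both to form.

    Hypothetical helper for this example, not part of any library.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# Illustrative sample pairs (chosen for this sketch).
PAIRS = {
    # precomposed e-acute vs. "e" + COMBINING ACUTE ACCENT
    "composed vs. combining": ("\u00e9", "e\u0301"),
    # SUPERSCRIPT TWO vs. plain digit two
    "superscript vs. digit": ("x\u00b2", "x2"),
    # HALFWIDTH KATAKANA LETTER KA vs. KATAKANA LETTER KA
    "half-width vs. full-width katakana": ("\uff76", "\u30ab"),
    # HIRAGANA LETTER KA vs. KATAKANA LETTER KA
    "hiragana vs. katakana": ("\u304b", "\u30ab"),
}

if __name__ == "__main__":
    for label, (a, b) in PAIRS.items():
        matches = [form for form in FORMS if equal_under(form, a, b)]
        print(f"{label}: equal under {matches or 'no form'}")
```

The canonical forms (NFD, NFC) are enough for the first pair, the
compatibility forms (NFKD, NFKC) are needed for the second and third, and
the last pair would need a separate kana-folding step on top.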

-- 
Russ Allbery ([EMAIL PROTECTED])             <http://www.eyrie.org/~eagle/>
