Dan Sugalski <[EMAIL PROTECTED]> writes:
> Should perl's regexes and other character comparison bits have an option
> to consider different characters for the same thing as identical beasts?
> I'm thinking in particular of the Katakana/Hiragana bits of Japanese,
> but other languages may have the same concepts.
I think canonicalization gets you that if that's what you want. I
definitely think that Perl should be able to do all of NFD, NFC, NFKD, and
NFKC canonicalization.
NFC will collapse most canonically equivalent spellings of the same thing
into a single character. NFKC will go further and also fold the
compatibility characters, doing stuff like getting rid of superscripts and
the like.
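
For what it's worth, here is a rough sketch of how the four forms behave,
assuming the Unicode::Normalize module (not named in the question above; its
NFC/NFD/NFKC/NFKD functions stand in for whatever interface Perl ends up
with). The sample code points and variable names are my own picks for
illustration. The last check suggests that the normalization forms fold
halfwidth kana to fullwidth but keep Katakana and Hiragana distinct, so
folding those together would probably need a separate mapping.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD NFKC NFKD);

# Two spellings of e-acute: one precomposed code point, or "e" followed
# by a combining acute accent.
my $precomposed = "\x{00E9}";
my $decomposed  = "e\x{0301}";

# Without normalization the strings differ; the canonical forms (NFC, NFD)
# make them compare equal.
my $raw_equal = $precomposed eq $decomposed           ? 1 : 0;
my $nfc_equal = NFC($precomposed) eq NFC($decomposed) ? 1 : 0;
my $nfd_equal = NFD($precomposed) eq NFD($decomposed) ? 1 : 0;

# SUPERSCRIPT TWO is only a compatibility equivalent of "2", so the
# canonical forms keep it and the K forms fold it.
my $sup2           = "\x{00B2}";
my $nfc_keeps_sup  = NFC($sup2) eq $sup2 ? 1 : 0;
my $nfkc_folds_sup = NFKC($sup2) eq "2"  ? 1 : 0;
my $nfkd_folds_sup = NFKD($sup2) eq "2"  ? 1 : 0;

# Kana: halfwidth KA, fullwidth katakana KA, hiragana KA.
my ($half_ka, $kata_ka, $hira_ka) = ("\x{FF76}", "\x{30AB}", "\x{304B}");
my $nfkc_folds_width = NFKC($half_ka) eq $kata_ka       ? 1 : 0;
my $nfkc_folds_kana  = NFKC($kata_ka) eq NFKC($hira_ka) ? 1 : 0;

# Report each check as 1 (true) or 0 (false).
my @report = (
    [ 'raw strings equal'         => $raw_equal ],
    [ 'equal after NFC'           => $nfc_equal ],
    [ 'equal after NFD'           => $nfd_equal ],
    [ 'NFC keeps superscript'     => $nfc_keeps_sup ],
    [ 'NFKC folds superscript'    => $nfkc_folds_sup ],
    [ 'NFKD folds superscript'    => $nfkd_folds_sup ],
    [ 'NFKC folds halfwidth kana' => $nfkc_folds_width ],
    [ 'NFKC folds kata/hiragana'  => $nfkc_folds_kana ],
);
printf "%-28s %d\n", @$_ for @report;
```

So comparing NFC (or NFD) forms is enough for "same character, different
encoding", and the K forms are the ones to reach for when compatibility
variants should also match.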
--
Russ Allbery ([EMAIL PROTECTED]) <http://www.eyrie.org/~eagle/>