> >I recommend to use 'u' flag, which indicates all operations are performed
> >against unicode grapheme/glyph. By default re is performed on codepoint.
>
> U doesn't really signal "glyph" to me, but we are sort of limited in what
> we have left. We still need a zero-width assertion for glyph boundary
> within regexes themselves.
The 'u' flag means "advanced Unicode feature(s)", which includes "always
match against glyphs/graphemes, not codepoints". What exactly it covers is
up for discussion. I think we will probably still need a "glyph" or
"grapheme" boundary assertion in some cases.
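
To make the codepoint/grapheme distinction concrete, here is a rough sketch
in Python (used only for illustration; nothing here is proposed Perl 6
syntax). The helper name `clusters` is hypothetical, and the rule it applies
(a base character plus any trailing combining marks) is a simplifying
assumption, not the full Unicode grapheme cluster algorithm.

```python
import re
import unicodedata


def clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplified assumption: a "grapheme" is one base character plus any
    immediately following characters with a nonzero combining class.
    """
    out = []
    for ch in text:
        if out and unicodedata.combining(ch):
            out[-1] += ch  # attach the mark to the preceding base character
        else:
            out.append(ch)  # start a new cluster
    return out


# 'e' + COMBINING ACUTE ACCENT + 'x': three codepoints, two graphemes.
s = "e\u0301x"

codepoint_count = len(s)
grapheme_count = len(clusters(s))

# A codepoint-based '.' matches only the bare 'e', leaving the accent behind.
first_codepoint = re.match(".", s).group()
# A grapheme-based '.' would be expected to match 'e' plus its accent.
first_grapheme = clusters(s)[0]
```

With a codepoint-based engine, '.' splits the accented letter in two, which
is the behaviour the 'u' flag would be meant to change. A boundary assertion
would correspond to the positions between the entries that `clusters`
returns.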
> >We need the character equivalence construct, such as [[=a=]], which
> >matches "a", "A ACUTE".
>
> Yeah, we really need a big list of these. PDD anyone?
I don't think we need a big list here. [[=a=]] is part of the POSIX 1003.2
regex syntax, as is [[.ch.]]. Perl 5 does not support either syntax. We can
implement them in Perl 6.
For even more advanced equivalence, we can offload the job to a collation
library.
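
As a rough illustration of why no big list is needed, the sketch below
(Python, for illustration only) derives the equivalence from Unicode
decomposition data instead of a hand-written table. The names `primary_key`
and `equivalent` are hypothetical. Treating "same base letter, ignoring
accents and case" as the equivalence is my assumption, chosen to match the
"a" / "A ACUTE" example; POSIX itself leaves equivalence classes to the
locale's collation definition.

```python
import unicodedata


def primary_key(ch):
    """Reduce a character to its base letter, ignoring accents and case.

    Assumption: equivalence means "same base letter". A real collation
    library would compare locale-specific primary weights instead.
    """
    decomposed = unicodedata.normalize("NFD", ch)  # 'A ACUTE' -> 'A' + accent
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.casefold()


def equivalent(a, b):
    """Return True if a and b fall in the same (assumed) equivalence class."""
    return primary_key(a) == primary_key(b)


a_matches_a_acute = equivalent("a", "\u00c1")  # LATIN CAPITAL LETTER A WITH ACUTE
a_matches_b = equivalent("a", "b")
```

Under these assumptions [[=a=]] would accept any character whose key equals
that of "a". Anything subtler, such as locale-specific rules, is where the
collation library would take over.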
Hong