> >I recommend to use 'u' flag, which indicates all operations are performed
> >against unicode grapheme/glyph. By default re is performed on codepoint.
>
> U doesn't really signal "glyph" to me, but we are sort of limited in what
> we have left. We still need a zero-width assertion for glyph boundary
> within regexes themselves.

The 'u' flag means "advanced Unicode feature(s)", which includes "always
match against glyphs/graphemes, not codepoints". What exactly it should
mean is up for discussion. I think we will probably still need a "glyph" or
"grapheme" boundary assertion in some cases.
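
To make the codepoint/grapheme distinction concrete, here is a rough sketch
in Python (not Perl 6 syntax, which is still undecided). The helper name
graphemes() is made up for illustration, and it only attaches combining
marks to the preceding base character; real grapheme segmentation has more
rules than this.

```python
import unicodedata

def graphemes(text):
    """Group each base character with the combining marks that follow it.

    Simplified assumption: a grapheme is one base codepoint plus any
    codepoints with a nonzero canonical combining class.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            # Combining mark: attach it to the current cluster.
            clusters[-1] += ch
        else:
            # Base character: start a new cluster.
            clusters.append(ch)
    return clusters

s = "a\u0301b"            # 'a', COMBINING ACUTE ACCENT, 'b'
codepoints = list(s)      # what a codepoint-level match iterates over
clusters = graphemes(s)   # what a grapheme-level match would iterate over
```

Here a codepoint-level "." sees three items, while a grapheme-level "."
would see two. A grapheme boundary assertion would match only between the
entries of clusters.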

> >We need the character equivalence construct, such as [[=a=]], which
> >matches "a", "A ACUTE".
>
> Yeah, we really need a big list of these. PDD anyone?

I don't think we need a big list here. [[=a=]] is part of the POSIX 1003.2
regex syntax, as is [[.ch.]]. Perl 5 does not support either construct. We
can implement them in Perl 6.
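
As a rough illustration of what [[=a=]] could do without a hand-written
list, here is a Python sketch. The function names are hypothetical, and I
am assuming that "equivalent" means "same base letter after dropping
accents and case", which is what the "a" / "A ACUTE" example suggests. In
POSIX the actual classes are defined by the locale.

```python
import unicodedata

def equivalence_key(ch):
    """Reduce a character to its base letter, ignoring accents and case.

    Assumption: equivalence = canonical decomposition (NFD), minus
    combining marks, then case-folded.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.casefold()

def in_equivalence_class(ch, representative):
    """Return True if ch would match [[=representative=]] under the assumption above."""
    return equivalence_key(ch) == equivalence_key(representative)

matches_plain = in_equivalence_class("a", "a")
matches_acute = in_equivalence_class("\u00c1", "a")   # LATIN CAPITAL LETTER A WITH ACUTE
matches_other = in_equivalence_class("b", "a")
```

The point is that the Unicode decomposition data already carries most of
the information, so the class can be computed rather than enumerated.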

For even more advanced equivalence, we can offload the job to a collation
library.

Hong
