Re: Unicode handling

Hong Zhang Thu, 22 Mar 2001 13:12:44 -0800

> 6) There will be a glyph boundary/non-glyph boundary pair of regex
> characters to match the word/non-word boundary ones we already have.
> (While I'd personally like \g and \G, that won't work as \G is already
> taken)
>
> I also realize that the decomposition flag on regexes would mean that
> s/A/B/D would turn A ACUTE to B ACUTE, which is meaningless. See the
> previous paragraph.

I recommend using a 'u' flag to indicate that all operations are
performed against Unicode graphemes/glyphs. By default, regex matching
is performed on codepoints. We also need a character equivalence
construct, such as [[=a=]], which matches both "a" and "A ACUTE".
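
To make the codepoint/grapheme distinction and the equivalence idea
concrete, here is a rough sketch in Python (not Perl, and not a proposed
implementation). The helper names graphemes, base_key and
find_equivalent are hypothetical. It assumes a grapheme is a base
character plus any following combining marks, which is only an
approximation of real grapheme segmentation, and it assumes that
"equivalent" means "same base letter after decomposition, ignoring
case", which is one possible reading of [[=a=]].

```python
import unicodedata


def graphemes(text):
    """Approximate grapheme split: a base character followed by its
    combining marks. This is a simplification, not full segmentation."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            # Attach the combining mark to the preceding base character.
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def base_key(grapheme):
    """Assumed equivalence key: decompose (NFD), drop combining marks,
    and fold case, so "a", "A" and "A ACUTE" share the key "a"."""
    decomposed = unicodedata.normalize("NFD", grapheme)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def find_equivalent(text, target):
    """Return the graphemes of text that fall in the same assumed
    equivalence class as target, roughly what [[=a=]] would match."""
    key = base_key(target)
    return [g for g in graphemes(text) if base_key(g) == key]


# "A" followed by COMBINING ACUTE ACCENT: two codepoints, one grapheme.
decomposed_a_acute = "A\u0301"
print(len(decomposed_a_acute), len(graphemes(decomposed_a_acute)))

# Precomposed and decomposed A ACUTE both land in the class of "a".
print(find_equivalent("a\u00c1" + decomposed_a_acute + "b", "a"))
```

Matching per grapheme rather than per codepoint is what keeps the
accent attached to its base letter in this sketch, so a match on "a"
never splits a decomposed A ACUTE in two.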

Hong
