Re: Unicode handling

Bryan C. Warnock Fri, 23 Mar 2001 11:13:49 -0800
On Friday 23 March 2001 14:18, Dan Sugalski wrote:
> At 01:30 PM 3/22/2001 -0800, Hong Zhang wrote:
> > > 6) There will be a glyph boundary/non-glyph boundary pair of regex
> > > characters to match the word/non-word boundary ones we already have.
> > > (While I'd personally like \g and \G, that won't work as \G is already
> > > taken)
> > >
> > > I also realize that the decomposition flag on regexes would mean that
> > > s/A/B/D would turn A ACUTE to B ACUTE, which is meaningless. See the
> > > previous paragraph.
> >
> >I recommend to use 'u' flag, which indicates all operations are performed
> >against unicode grapheme/glyph. By default re is performed on codepoint.
>
> U doesn't really signal "glyph" to me, but we are sort of limited in what
> we have left. We still need a zero-width assertion for glyph boundary
> within regexes themselves.
>
> >We need the character equivalence construct, such as [[=a=]], which
> >matches "a", "A ACUTE".
>
> Yeah, we really need a big list of these. PDD anyone?
>

But surely this is a locale issue, and not an encoding one?  Not every 
language recognizes the same character equivalences.
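
To make the locale point concrete, here is a rough Python sketch, not
anything from the proposal. It assumes equivalence is decided by stripping
combining marks after canonical decomposition, with a per-locale exception
table. The names DISTINCT_LETTERS, base_key, and equivalent are hypothetical,
and the table lists only two well-known cases (Swedish treats å, ä, ö as
separate letters; Spanish does the same for ñ).

```python
import unicodedata

# Hypothetical per-locale exceptions: characters that a locale treats as
# letters in their own right rather than as variants of a base letter.
DISTINCT_LETTERS = {
    "sv": {"\u00e5", "\u00e4", "\u00f6"},  # Swedish: a-ring, a-diaeresis, o-diaeresis
    "es": {"\u00f1"},                      # Spanish: n-tilde
}


def base_key(ch, locale=None):
    """Return the equivalence-class key for a single character."""
    # Compose and lowercase first so the exception lookup sees one code point.
    ch = unicodedata.normalize("NFC", ch).lower()
    if ch in DISTINCT_LETTERS.get(locale, set()):
        return ch
    # Otherwise decompose and drop the combining marks (accents).
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def equivalent(a, b, locale=None):
    """True if a and b fall in the same equivalence class for this locale."""
    return base_key(a, locale) == base_key(b, locale)
```

With no locale given, "a" and A ACUTE compare as equivalent, and so do "a"
and a-diaeresis. Pass "sv" and the second pair stops matching, which is why
a single global list of equivalences may not be enough.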


-- 
Bryan C. Warnock
[EMAIL PROTECTED]