At 02:31 PM 3/23/2001 -0500, Bryan C. Warnock wrote:
>On Friday 23 March 2001 14:18, Dan Sugalski wrote:
> > At 01:30 PM 3/22/2001 -0800, Hong Zhang wrote:
> > > > 6) There will be a glyph boundary/non-glyph boundary pair of regex
> > > > characters to match the word/non-word boundary ones we already have.
> > > > (While I'd personally like \g and \G, that won't work as \G is already
> > > > taken.)
> > > >
> > > > I also realize that the decomposition flag on regexes would mean that
> > > > s/A/B/D would turn A ACUTE into B ACUTE, which is meaningless. See the
> > > > previous paragraph.
> > >
> > >I recommend using a 'u' flag, which indicates that all operations are
> > >performed against Unicode graphemes/glyphs. By default, a regex operates
> > >on code points.
> >
> > U doesn't really signal "glyph" to me, but we are sort of limited in what
> > we have left. We still need a zero-width assertion for glyph boundary
> > within regexes themselves.
> >
> > >We need a character equivalence construct, such as [[=a=]], which
> > >matches both "a" and "A ACUTE".
> >
> > Yeah, we really need a big list of these. PDD anyone?
> >
>
>But surely this is a locale issue, and not an encoding one?  Not every
>language recognizes the same character equivalences.

In Unicode, there's theoretically no locale. Theoretically...

                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
