On Tue, Jun 29, 2004 at 08:34:16AM -0700, Austin Hastings wrote:
> This has no direct bearing on p6l, since performance is a p6i issue.
> But perhaps in the interests of performance as well as hackery we
> should explicitly provide some sort of variant regex behavior:
>
> /a./ :bytes
> /a./ :graphemes
>
> where the first would recognize 0x61 followed by any single byte, while
> the second would recognize 'a' followed by any number of bytes
> composing a single grapheme.

Isn't that what :u0, :u1, :u2, and :u3 are for?

:u0 # use bytes (. is a byte)
:u1 # level 1 support (. is a codepoint)
:u2 # level 2 support (. is a grapheme)
:u3 # level 3 support (. is language dependent)

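
To make the levels concrete, here is a rough Python emulation of what "." in /a./ would consume at the first three levels. It is only an analogy. Python's re module stands in for the Perl 6 engine. The helper names (graphemes, match_a_dot) are hypothetical. The grapheme splitter is a simplification of UAX #29 that treats a grapheme as a base codepoint plus any following combining marks. :u3 is not modeled.

```python
import re
import unicodedata


def graphemes(text: str) -> list[str]:
    """Hypothetical, simplified grapheme splitter: a base codepoint plus
    any following combining marks (unicodedata.combining(ch) != 0).
    This is NOT a full UAX #29 implementation."""
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the mark to the preceding base
        else:
            clusters.append(ch)
    return clusters


def match_a_dot(text: str, level: str):
    """Hypothetical emulation of /a./ at support level u0, u1 or u2."""
    if level == "u0":  # bytes: . is one byte of the UTF-8 encoding
        m = re.match(rb"a.", text.encode("utf-8"), re.DOTALL)
        return m.group(0) if m else None
    if level == "u1":  # codepoints: . is one codepoint
        m = re.match(r"a.", text, re.DOTALL)
        return m.group(0) if m else None
    if level == "u2":  # graphemes: . is a base codepoint plus its marks
        g = graphemes(text)
        return g[0] + g[1] if len(g) >= 2 and g[0] == "a" else None
    raise ValueError(f"unsupported level: {level}")


# 'a', GREEK SMALL LETTER ALPHA, COMBINING ACUTE ACCENT
sample = "a\u03b1\u0301"
for level in ("u0", "u1", "u2"):
    print(level, repr(match_a_dot(sample, level)))
```

With this sample, u0 stops after the first byte of the alpha's two-byte UTF-8 encoding. u1 takes the bare alpha. u2 takes the alpha together with its accent.
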
These modifiers say nothing about the state of the data, but in
general internal Perl data will already be in Normalization Form
C, so even under :u1, the precomposed characters will usually do
the right thing. Note that these modifiers are for overriding
the default support level, which was probably set by a pragma at
the top of the file.

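
A small Python illustration of the NFC point follows. The same caveat applies: Python's re and unicodedata modules stand in for Perl, and codepoint_match is a hypothetical helper. With a decomposed é, a codepoint-level "." grabs the bare "e" and strands the accent, while the NFC form matches the whole character. A sequence with no precomposed form (q plus an acute accent) survives NFC as two codepoints, which is why NFC only "usually" does the right thing.

```python
import re
import unicodedata


def codepoint_match(text: str):
    """Hypothetical helper: /a./ with . matching exactly one codepoint (roughly :u1)."""
    m = re.match(r"a.", text, re.DOTALL)
    return m.group(0) if m else None


decomposed = "ae\u0301"  # 'a', 'e', U+0301 COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # 'a', U+00E9

print(codepoint_match(decomposed) == "ae")     # True: accent is stranded
print(codepoint_match(composed) == "a\u00e9")  # True: whole character matched

# Unicode has no precomposed "q with acute", so NFC leaves two codepoints.
no_precomposed = unicodedata.normalize("NFC", "aq\u0301")
print(codepoint_match(no_precomposed) == "aq")  # True: accent stranded again
```
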
Or was that meant to imply that a literal "a" in the RE would be
interpreted as a "grapheme a" when :u2 is active?

-Scott
--
Jonathan Scott Duff Division of Nearshore Research
[EMAIL PROTECTED] Senior Systems Analyst II