On Wed, Jul 07, 2004 at 08:09:51PM -0700, Larry Wall wrote:
: On Tue, Jun 29, 2004 at 10:52:34AM -0500, Jonathan Scott Duff wrote:
: : On Tue, Jun 29, 2004 at 08:34:16AM -0700, Austin Hastings wrote:
: : > This has no direct bearing on p6l, since performance is a p6i issue.
: : > But perhaps in the interests of performance as well as hackery we
: : > should explicitly provide some sort of variant regex behavior:
: : >
: : >     /a./ :bytes
: : >     /a./ :graphemes
: : >
: : > where the first would recognize 0x61 followed by any single byte, while
: : > the second would recognize 'a' followed by any number of bytes
: : > composing a single grapheme.
: :
: : Isn't that what :u0, :u1, :u2, and :u3 are for?
: :
: :     :u0     # use bytes (. is byte)
: :     :u1     # level 1 support (. is codepoint)
: :     :u2     # level 2 support (. is grapheme)
: :     :u3     # level 3 support (. is language dependent)
:
: These modifiers might get renamed to match whatever b/c/g/w convention
: we come up with for pragmas. The levels aren't all that intuitive, though
: there is a kind of progression of semantic complexity that would get
: lost with ordinary names.
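The following sketch shows concretely what `/a./` would match at each level. It is written in Python rather than Perl 6 because these modifiers are only proposals in this thread. The level names in the comments map to `:u0`, `:u1`, and `:u2` only loosely. `graphemes` and `match_a_dot` are hypothetical helpers. The grapheme split is a rough approximation of Unicode UAX #29: a base character plus any following combining marks. It is not a full segmentation algorithm.

```python
import re
import unicodedata


def graphemes(s: str) -> list[str]:
    # Approximation only: a cluster is a non-combining character followed by
    # any combining marks. Real UAX #29 segmentation handles much more
    # (Hangul syllables, ZWJ emoji sequences, regional indicators, ...).
    clusters: list[str] = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def match_a_dot(text: str, level: str):
    """Return what /a./ would match at the start of text at a given level."""
    if level == "bytes":  # roughly :u0 -- '.' is one byte of the UTF-8 encoding
        m = re.match(rb"a.", text.encode("utf-8"), re.DOTALL)
        return m.group() if m else None
    if level == "codepoints":  # roughly :u1 -- '.' is one code point
        m = re.match(r"a.", text, re.DOTALL)
        return m.group() if m else None
    if level == "graphemes":  # roughly :u2 -- '.' is one (approximate) grapheme
        g = graphemes(text)
        return g[0] + g[1] if len(g) >= 2 and g[0] == "a" else None
    raise ValueError(f"unknown level: {level}")


# 'a', 'e', U+0301 COMBINING ACUTE ACCENT, 'x'
text = "ae\u0301x"

for level in ("bytes", "codepoints", "graphemes"):
    print(level, repr(match_a_dot(text, level)))

# The same code points take a different number of bytes in each encoding,
# so a level number says nothing about how many bytes '.' consumes.
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    print(enc, len(text.encode(enc)), "bytes")
print("code points:", len(text), "graphemes:", len(graphemes(text)))
```

At the byte and code point levels, `.` stops after the bare `e`. At the grapheme level it also takes the combining accent. The byte counts (5, 8, and 16) vary with the encoding, while the code point count stays at 4.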
On the flip side, a good reason to get rid of the numeric values is that
in all likelihood people will continually make the mistake of thinking
:u1 means "one byte at a time" and :u2 means "two bytes at a time".
And then they'll wonder why :u4 doesn't give them UTF-32...

Larry