A couple of nights ago I read RFC93, as discussed in Apocalypse 5, and got fired up- it reminded me of some ideas I had back when I was hacking on Henry Spencer's regexp package: how to further generalize regular expression input. It's a bit orthogonal- a properly implemented RFC93 makes some difficult things easier, whether it's done as binding to a sub, or as overloading =~, or whatever.
Very generally, a regular expression is a program that seeks a match within a string of letters. In perl4 the string of letters was a string of bytes, and in perl6 it's a string of Unicode characters (most of the time). It might as well be a string of *anythings*. Binding a match against a sub is a natural way to get the anythings you want to match.

Now, I'm a newbie to perl6, so be patient with my hacked-up examples below; they won't work in any language. For the first one I tweaked RFC93: when the match is finished, the subroutine is called one final time and passed more than one argument (a flag set to 1, followed by the "unused" elements). I admit that's a poor interface, but it lets me write:

    # Looking for luck: find a run of 3 numbers divisible by 7 or 13.
    # "sub numerology" is simply an interface to an array of integers:
    # asked for N elements, it splices them off the front of @::nums;
    # on the final call (flag + unused elements) it puts the unused ones back.
    sub numerology { $#_ ? (shift, unshift @::nums, @_) : splice @::nums, 0, $_[0] }

    &numerology =~ / <( !($_[0] % 7 and $_[0] % 13) )><3> /;

True, it's easy to join the integers with spaces and write an equivalent regexp against the result, but why stringify when you don't have to?

I'm running into trouble here, though: when using <( code )> to match a single "atom" (a number), the syntax should be more "character classy". Assertions are flexible enough to match all sorts of non-letter atoms, and a grammar could make this more readable, maybe something like:

    &numerology =~ / < <divisible(7)><divisible(13)> ><3> /;

Another example: say there's a class that deals with colors, with an operator that returns true if two colors look about the same. Given a list of color objects, is there a regexp to find a rainbow, even if the color class doesn't support stringification?

A less fanciful example: scanning a sound.
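Before turning to sound, here is a minimal sketch of the numerology match, written in Python only because the examples above won't run anywhere. It assumes a pull interface modeled loosely on the tweaked RFC93 described above: `fetch(n)` is the ordinary call that hands out up to n elements, and `give_back(unused)` plays the role of the final call with the unused elements. `Numerology`, `fetch`, `give_back`, `divisible`, `any_of` and `find_run` are all hypothetical names, and `find_run` handles only one pattern shape (a fixed-length run of a single atom class), not general regexps.

```python
from typing import Callable, List, Optional

Pred = Callable[[int], bool]


class Numerology:
    """Hypothetical stand-in for "sub numerology": an RFC93-style source
    that hands out integers on request and takes unused ones back."""

    def __init__(self, nums: List[int]) -> None:
        self.nums = list(nums)

    def fetch(self, n: int) -> List[int]:
        # Ordinary call: remove and return up to n elements from the front.
        taken, self.nums = self.nums[:n], self.nums[n:]
        return taken

    def give_back(self, unused: List[int]) -> None:
        # "Final call": put elements that were fetched but not matched back.
        self.nums = list(unused) + self.nums


def divisible(k: int) -> Pred:
    # An "atom class": matches a single integer divisible by k.
    return lambda x: x % k == 0


def any_of(*preds: Pred) -> Pred:
    # Alternation of atom classes, like < <divisible(7)><divisible(13)> >.
    return lambda x: any(p(x) for p in preds)


def find_run(source: Numerology, pred: Pred, count: int,
             chunk: int = 4) -> Optional[List[int]]:
    """Return the leftmost run of `count` consecutive elements that satisfy
    `pred`, pulling input `chunk` elements at a time; None if there is none."""
    run: List[int] = []
    while True:
        batch = source.fetch(chunk)
        if not batch:
            return None  # input exhausted without a match
        for i, x in enumerate(batch):
            if not pred(x):
                run = []  # the run is broken; start over
                continue
            run.append(x)
            if len(run) == count:
                source.give_back(batch[i + 1:])  # hand back the unread tail
                return run


if __name__ == "__main__":
    src = Numerology([1, 14, 26, 5, 21, 39, 49, 8, 10])
    print(find_run(src, any_of(divisible(7), divisible(13)), 3))  # [21, 39, 49]
    print(src.nums)  # [8, 10]
```

Fetching in chunks is what makes the final "give back" call necessary: the matcher reads ahead, so whatever it read past the end of the match goes back to the source for the next match. The integers are tested with `%` directly, so nothing is ever stringified.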
A very crude beat-finding regexp:

    &fetch_sound_frames =~ /
        (                                # store soundclip (array of frames) in $1
            (<volume(-40db)><50,1500>)   # quietish section, 50-1500 frames
            (<volume(-15db)>+)           # followed by some loud frame(s)
        )                                # end capture of the first beat
        <before                          # make sure the loud/quiet pattern repeats,
            [                            # but don't require the exact same frames
                <volume(-40db)><$2.length*.95,$2.length*1.05>
                <volume(-15db)><$3.length*.95,$3.length*1.05>
            ]<3>
        >
    /;

The point I'm trying to make: a regexp is already able to consume different kinds of characters from a string (:u0, :u1, :u2, :u3), and with RFC93 it can be fed anything a sub can return. Those things can be characters, or strings, or stringified if the regexp requires it; but if the regexp doesn't have any strings to match against, don't bother. Let the assertions get the atoms raw.

There's plenty of brilliance on this list, and I know I'm not brilliant, especially when drowsy... I did some research before posting, but if this has been covered already (or is completely daft), please point me in the right direction and shoo me along gently.

-y
~~~~~
The Moon is New