Re: Ruminating RFC 93- alphabet-blind pattern matching

Austin Hastings Wed, 02 Apr 2003 09:11:53 -0800

--- Yary Hluchan <[EMAIL PROTECTED]> wrote:
> A couple nights ago I read RFC93 as discussed in Apoc. 5 and got
> fired up- it reminded me of some ideas from when I was hacking
> Henry Spencer's regexp package. How to further generalize regular
> expression input.  It's a bit orthogonal- a properly implemented
> RFC93 makes some difficult things easier- whether it's done as
> binding to a sub, or as overloading =~, or whatever.
> 
> A very general description of a regular expression, is a program
> that seeks a match within a string of letters.  In perl4 the string
> of letters was a string of bytes, and in perl6 it's a string of
> Unicode (most of the time).
> 
> It might as well be a string of *anythings*.  Binding a match against
> a sub is a natural way to get the anythings you want to match.  Now,
> I'm a newbie to perl6, so be patient with my hacked-up examples
> below. They won't work in any language. And, for the first I tweaked
> RFC93:
> 
>   When the match is finished, the subroutine would be called one
>   final time, and passed >1 arguments: a flag set to 1, and a list
>   containing the "unused" elements
> 
> which I admit is a poor interface- but it lets me write:
> 
>   # Looking for luck- find a run of 3 numbers divisible by 7 or 13
>   # "sub numerology" is simply an interface to an array of integers
>   sub numerology { $#_ ? shift,unshift @::nums,@_ : splice @::nums,0,@_ }
>   &numerology =~ / <( !($_[0] % 7 and $_[0] % 13) )><3> /;
> 
grammar Numerology;


rule number { \b \d+ \b }
rule lucky  { (<number>)
              { fail unless ($1 % 7 == 0) || ($1 % 13 == 0); }
            }

rule lucky_strike  { :3x <lucky> } ## Is this right?


> True, it's easy to join integers with spaces and write an equivalent
> regexp on the result- but why stringify when you don't have to?
> 
> I'm running into trouble here- using <( code )> to match against a
> single "atom" (a number), it should be more "character classy".  
> Assertions are flexible enough to match all sorts of non-letter 
> atoms; one can write a grammar to make it more readable- maybe
> something like
>   &numerology =~ / < <divisible(7)><divisible(13)> ><3> /;

Actually, this is a good argument for nested rules, and thereby for
nested subs:

my &numerology = rx/
   rule number { \b \d+ \b }
   rule divisible($by) { (<number>) :: { fail if ($1 % $by); }}
   
   :3x <any(divisible(7), divisible(13))>
/;
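
For anyone who wants to poke at the idea today, here is a rough sketch in
Python rather than Perl 6. It is not RFC 93 and not anything from A5; it only
shows the "why stringify?" point: the run of three lucky numbers is found
directly on a list of integers, with a predicate standing in for the rule.
The names divisible, lucky and find_run are made up for this illustration.

```python
from typing import Callable, Optional, Sequence, Tuple


def divisible(by: int) -> Callable[[int], bool]:
    """Predicate factory; a stand-in for a parameterised divisible($by) rule."""
    return lambda n: n % by == 0


def lucky(n: int) -> bool:
    """True when n is divisible by 7 or by 13."""
    return divisible(7)(n) or divisible(13)(n)


def find_run(items: Sequence[int],
             pred: Callable[[int], bool],
             length: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first run of `length` consecutive items
    that all satisfy `pred`, or None if there is no such run."""
    run = 0
    for i, item in enumerate(items):
        run = run + 1 if pred(item) else 0
        if run == length:
            return (i - length + 1, i + 1)
    return None


# Hypothetical sample data: the integers are matched as-is, never joined
# into a string.
nums = [3, 14, 26, 5, 49, 13, 91, 8]
span = find_run(nums, lucky, 3)
```

The atoms here are plain integers and the "character class" is just a
function, which is roughly what binding a match to a sub would buy you.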


> Another example.  Let's say there's a class that deals with colors.
> It has an operator that returns true if two colors look about the 
> same. Given a list of color objects, is there a regexp to find a
> rainbow? Even if the color class doesn't support stringification? 

Yes.

grammar Rainbow;

rule color {...};  # this one's on you.

rule same_color($color is Colorific)
{
  (<color>) ::: { fail unless $1.looks_like($color); }
}

rule band($color is Colorific)
{
  <same_color($color)>+
}

rule Rainbow
{
  <band(new Color("red"))>
  <band(new Color("orange"))>
  <band(new Color("yellow"))>
  <band(new Color("green"))>
  <band(new Color("blue"))>
  <band(new Color("indigo"))>
  <band(new Color("violet"))>
  <pot_o_gold>?
}
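
The same grammar can be mimicked in Python over a list of objects that never
get stringified. This is only a sketch under assumptions: the Color class, its
RGB fields, the tolerance in looks_like and the reference values in RAINBOW
are all invented here, and <pot_o_gold>? is left out. The matcher is greedy
with no backtracking, which is enough because the chosen tolerance keeps
neighbouring bands from overlapping.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Color:
    """Hypothetical color object; it deliberately has no string form."""
    r: int
    g: int
    b: int

    def looks_like(self, other: "Color", tolerance: int = 40) -> bool:
        # Assumed similarity test: sum of per-channel differences.
        diff = abs(self.r - other.r) + abs(self.g - other.g) + abs(self.b - other.b)
        return diff <= tolerance


# Assumed reference colors for the seven bands, in order.
RAINBOW = [
    Color(255, 0, 0),      # red
    Color(255, 165, 0),    # orange
    Color(255, 255, 0),    # yellow
    Color(0, 128, 0),      # green
    Color(0, 0, 255),      # blue
    Color(75, 0, 130),     # indigo
    Color(238, 130, 238),  # violet
]


def match_rainbow(colors: Sequence[Color], start: int = 0) -> Optional[int]:
    """Match one or more look-alikes of each band, in order, from `start`.
    Return the index just past the match, or None on failure."""
    pos = start
    for band in RAINBOW:
        band_start = pos
        while pos < len(colors) and colors[pos].looks_like(band):
            pos += 1
        if pos == band_start:  # the band needs at least one element
            return None
    return pos


def find_rainbow(colors: Sequence[Color]) -> Optional[Tuple[int, int]]:
    """Try every start position, like an unanchored match."""
    for start in range(len(colors)):
        end = match_rainbow(colors, start)
        if end is not None:
            return (start, end)
    return None


# Hypothetical sample: noise, two reds, one of each other band, noise.
sky = [
    Color(10, 10, 10), Color(250, 5, 0), Color(255, 0, 0),
    Color(255, 160, 10), Color(250, 250, 0), Color(0, 130, 0),
    Color(0, 0, 250), Color(75, 0, 130), Color(238, 130, 238),
    Color(200, 200, 200),
]
found = find_rainbow(sky)
```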

> A less fanciful example- scan a sound. A very crude beat-finding
> regexp- 
>  &fetch_sound_frames =~
>   / (                           # store soundclip (array of frames) in $1
>      (<volume(-40db)><50,1500>) # quietish section, 50-1500 frames
>      (<volume(-15db)>+)         # Followed by some loud frame(s)
>     )                           # End capture of the first beat
> 
>     <before                     # Make sure the loud/quiet pattern repeats,
>      [                          # but don't require the exact same frames
>       <volume(-40db)><$2.length*.95,$2.length*1.05> 
>       <volume(-15db)><$3.length*.95,$3.length*1.05>
>      ]{3}
>     >
>   /
> 

You're just about there. Only the syntax needs work.

http://dev.perl.org/perl6/exegesis/5
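
As a rough illustration of the beat-finding idea in something runnable, here
is a Python sketch. It makes several assumptions that are not in the original:
frames are plain dB values, "quietish" means at or below -40 dB, "loud" means
at or above -15 dB, and the names label, segments, close and find_beat are
invented. It also simplifies the pattern by working on whole runs of frames
(run-length encoding) instead of backtracking inside a run.

```python
from itertools import groupby
from typing import List, Optional, Sequence, Tuple

QUIET_DB = -40.0  # assumed threshold: at or below counts as "quietish"
LOUD_DB = -15.0   # assumed threshold: at or above counts as "loud"


def label(db: float) -> str:
    """Classify one frame by its volume."""
    if db <= QUIET_DB:
        return "quiet"
    if db >= LOUD_DB:
        return "loud"
    return "other"


def segments(frames: Sequence[float]) -> List[Tuple[str, int]]:
    """Run-length encode the frames into (label, length) pairs."""
    return [(kind, len(list(run))) for kind, run in groupby(frames, key=label)]


def close(n: int, target: int, slack: float = 0.05) -> bool:
    """True when n is within +/- slack of target (the .95 to 1.05 window)."""
    return target * (1 - slack) <= n <= target * (1 + slack)


def find_beat(frames: Sequence[float], repeats: int = 3,
              min_quiet: int = 50, max_quiet: int = 1500
              ) -> Optional[Tuple[int, int, int]]:
    """Find a quiet run followed by a loud run whose lengths repeat
    `repeats` more times. Return (start_frame, quiet_len, loud_len)."""
    segs = segments(frames)
    offset = 0
    for i in range(len(segs) - 1):
        (kind_q, len_q), (kind_l, len_l) = segs[i], segs[i + 1]
        start = offset
        offset += len_q  # frame index where the next segment begins
        if kind_q != "quiet" or kind_l != "loud":
            continue
        if not (min_quiet <= len_q <= max_quiet):
            continue
        rest = segs[i + 2 : i + 2 + 2 * repeats]  # the lookahead window
        if len(rest) < 2 * repeats:
            continue
        pairs = zip(rest[0::2], rest[1::2])
        if all(kq == "quiet" and kl == "loud"
               and close(nq, len_q) and close(nl, len_l)
               for (kq, nq), (kl, nl) in pairs):
            return (start, len_q, len_l)
    return None


# Hypothetical clip: 7 in-between frames, then four identical beats.
clip = [-30.0] * 7 + ([-50.0] * 100 + [-10.0] * 10) * 4
beat = find_beat(clip)
```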

> The point I'm trying to make:
> A regexp is already able to consume different kinds of characters
> from a string- :u0, :u1, :u2, :u3- and with RFC93 it can be fed
> anything a sub can return.  Those things can be characters- or
> strings- or stringified if the regexp requires- but if the regexp
> doesn't have any strings to match against, don't bother. Let the
> assertions get the atoms raw.
> 
> Plenty of brilliance on this list, I know I'm not brilliant,
> especially when drowsy... did some research before posting but if
> this has been covered already (or is completely daft) please face me
> in the right direction and shoo me along gently.

What I think you're looking for is the fact that they're not regexes
any more. They are "rexen", but in horrifying-secret-reality, what has
happened is that Larry's decided to move Fortran out of core, and
replace it with yacc. 

It's funny, but I try to describe this to people (gently) and they
immediately fall into three classes:

People who never got it, regex-wise, just kind of screw up their faces
and say "Huh?"

People (a very small number) who go "Oh. Cool!" and their eyes light
up.

And finally the majority of coders, who look as though they opened a
door expecting to find a bookstore, and instead a 250-pound tuna fell
on them.

> -y

=Austin
