Re: Perl 6 regex parser

Jeff 'japhy' Pinyan Sun, 27 Jun 2004 17:17:48 -0700

On Jun 27, Luke Palmer said:

>Jeff 'japhy' Pinyan writes:
>> I am currently completing work on an extensible regex-specific parsing
>> module, Regexp::Parser.  It should appear on CPAN by early July
>> (hopefully under my *new* CPAN ID "JAPHY").
>>
>> Once it is completed, I will be starting work on writing a subclass
>> that matches Perl 6 regexes, Regexp::Perl6 (or Perl6::Regexp, or
>> Perl6::Regexp::Parser).
>
>Or Regexp::Parser::Perl6 :-)


I wasn't sure where in the hierarchy of modules it should go.

>The grammar for Perl 6 is going to be specified with Perl 6 patterns.
>That presents us a little bootstrapping problem.  So the original goal
>of Damian's Perl6::Rules was to transform this grammar back into Perl 5
>patterns so they can parse the simplified Perl 6 code for Perl 6 and
>compile a bootstrap.
>
>My personal, nondivine plan would be to use your module to create a
>driver-based parser.  That could then be used for the bootstrap instead.

If you mean what I think you mean by driver-based, then my module is a
perfect fit.  To subclass it, you do this:

  package Regexp::NoCode;
  use base 'Regexp::Parser';

  sub init {
    my $self = shift;

    $self->SUPER::init();

    $self->del_handler('(?{');
    $self->del_handler('(??{');
    $self->del_handler('(?p{');
  }

  1;

Now you have a parser that refuses to acknowledge (?{ ... }), (??{ ... }),
and (?p{ ... }) assertions; encountering one results in an error.
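
To make the mechanism concrete, here is a minimal, self-contained sketch
of a handler table with add/delete methods.  ToyParser, ToyNoCode, and
handle() are hypothetical names made up for this illustration; this is
not Regexp::Parser's actual internals, only the general shape of the
idea.

```perl
use strict;
use warnings;

# Toy stand-in for a handler table; NOT Regexp::Parser's real internals.
package ToyParser;

sub new {
    my ($class) = @_;
    my $self = bless { handlers => {} }, $class;
    $self->init;
    return $self;
}

sub init {
    my ($self) = @_;
    # Each handler returns a label for the construct it recognised.
    $self->add_handler('(?{'  => sub { 'eval' });
    $self->add_handler('(??{' => sub { 'logical' });
    $self->add_handler('|'    => sub { 'or' });
}

sub add_handler { my ($self, $tok, $code) = @_; $self->{handlers}{$tok} = $code }
sub del_handler { my ($self, $tok) = @_; delete $self->{handlers}{$tok} }

# Dispatch on a token; a token without a handler is an error.
sub handle {
    my ($self, $tok) = @_;
    my $code = $self->{handlers}{$tok}
        or die "no handler for token $tok\n";
    return $code->($self);
}

# Subclass that removes the code-executing constructs.
package ToyNoCode;
our @ISA = ('ToyParser');

sub init {
    my ($self) = @_;
    $self->SUPER::init();
    $self->del_handler($_) for '(?{', '(??{';
}

package main;

my $p = ToyNoCode->new;
print $p->handle('|'), "\n";                       # '|' is still handled
print eval { $p->handle('(?{') } // "error: $@";   # handler was deleted
```

Because the subclass only edits the table, everything else about the
parent's behaviour is left alone.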

Another example would be to support the '&' metacharacter, which is the
"AND" equivalent of '|':

  package Regexp::AndBranch;
  use base 'Regexp::Parser';

  sub init {
    my $self = shift;

    $self->SUPER::init();

    $self->add_handler('&' => sub {
      my ($S) = @_;
      return $S->object('and');
    });
  }

Then you create Regexp::AndBranch::and like so:

  package Regexp::AndBranch::and;
  our @ISA = ('Regexp::Parser::or');  # it behaves like an 'or' branch...

  sub new {
    my ($class, $rx, $lhs, $rhs) = @_;
    my $self = bless {
      rx => $rx,
      flags => $rx->{flags}[-1],
      class => 'branch',
      type => 'and',
      data => [$lhs, $rhs],
      raw => '&',
    }, $class;
    return $self;
  }

It'll inherit the other methods it needs from the 'or' class.  It can then
be converted to an existing construct (specifically, /x&y&z/ would become
/(?=x)(?=y)z/, like in vim).
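
As a rough illustration of that rewrite, here is a string-level sketch.
The helper name and_to_lookahead is hypothetical, and the splitting is
deliberately naive: it assumes '&' only appears at the top level and is
never escaped or inside a group or character class.  A real version
would work on the parsed tree instead.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite top-level '&' branches as look-aheads.
# Naive on purpose: it does not understand groups, classes, or escapes.
sub and_to_lookahead {
    my ($pattern) = @_;
    my @parts = split /&/, $pattern, -1;   # -1 keeps trailing empty parts
    my $last  = pop @parts;                # the final branch does the matching
    return join('', map { "(?=$_)" } @parts) . $last;
}

my $rewritten = and_to_lookahead('x&y&z');
print "$rewritten\n";    # (?=x)(?=y)z
```

Every branch but the last becomes a zero-width look-ahead, so all of
them have to match at the same starting position, and only the last
branch consumes text.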

>A driver-based parser has a couple of advantages over regexes and even
>Parse::RecDescent.  First, the parsing algorithm can be easily
>customized, so we can play with hybrid models and see how the time
>complexity works out.  Also, you can suspend the parsing in the middle
>of execution, go somewhere else, and continue, which the Perl 6 parser
>might just want to do (something like simulated coroutines).

Yeah, you can parse node-by-node:

  my $p = Regexp::Parser->new($regex);
  while (my $n = $p->next) {
    # ...
  }
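
To show what suspending and resuming can look like, here is a toy
closure-based tokenizer.  make_tokenizer and its three token types are
hypothetical and far simpler than a real regex parser; the point is only
that the position lives in the closure, so the caller decides when to
ask for the next node.

```perl
use strict;
use warnings;

# Hypothetical toy: returns a closure that yields one token per call.
# It only knows escapes, a few metacharacters, and literal characters.
sub make_tokenizer {
    my ($regex) = @_;
    my $pos = 0;                              # saved between calls
    return sub {
        return if $pos >= length $regex;      # nothing left to parse
        pos($regex) = $pos;
        my @node;
        if    ($regex =~ /\G(\\.)/gcs)          { @node = (escape  => $1) }
        elsif ($regex =~ /\G([\$^|()*+?.])/gc)  { @node = (meta    => $1) }
        elsif ($regex =~ /\G(.)/gcs)            { @node = (literal => $1) }
        $pos = pos($regex);
        return { type => $node[0], raw => $node[1] };
    };
}

my $next = make_tokenizer('a|b*');
while (my $node = $next->()) {
    print "$node->{type}: $node->{raw}\n";
    # The caller may stop here, do other work, and call $next->() later.
}
```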

When I finish writing it to work with the current set, I'll post it to
CPAN and alert the group.

-- 
Jeff "japhy" Pinyan      [EMAIL PROTECTED]      http://www.pobox.com/~japhy/
RPI Acacia brother #734   http://www.perlmonks.org/   http://www.cpan.org/
CPAN ID: PINYAN    [Need a programmer?  If you like my work, let me know.]
<stu> what does y/// stand for?  <tenderpuss> why, yansliterate of course.
