Re: Regexp capture group list

2009-11-14 Thread Torsten Knorr
On Tue, 10 Nov 2009, at 14:37:59, Philip Newton <philip.new...@gmail.com> wrote:

> But you could try this:
>
> sub parse
> {
>   my ( $text, $re ) = @_;
>   my @matches = $_[0] =~ /^$re/ or die "Expected $re in $text...\n";
>   $_[0] =~ s/^$re//;
>
>   return @matches
> }
>
> at the cost of running the regexp twice (once for matching and
> capturing, then once for substituting).
>
> Cheers,
> Philip

Capturing and substituting in one step.

#-
 sub Parse
 {
    my @matches = ();
    use re 'eval';
    $_[0] =~ s/($_[1])(?{push(@matches, $1)})//g;
    return @matches;
 }
#-

 Torsten




Regexp capture group list

2009-11-10 Thread Paul LeoNerd Evans
I'm writing an attempt at a simple recursive-descent parser with no
backtracking or alternation, for parsing a really simple grammar.

My usual method is to write a collection of functions that eat a prefix
from the string they're passed as $_[0] (mutably so), and return any
interesting data. A basic primitive to start with is something like:


 sub parse
 {
   my ( $text, $re ) = @_;
   $_[0] =~ s/^$re// or die "Expected $re in $text...\n";
 }

 sub parse_idspec
 {
   parse $_[0], qr/ID\s+(\d+)/ and return $1;
 }
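The scoping problem can be reproduced in a few lines. This is a minimal sketch, not code from the thread: eat_prefix() is a hypothetical stand-in for parse(), and the sample input is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for parse(): strips a leading match of $re from
# the caller's string in place (through the @_ alias), true on success.
sub eat_prefix {
    my $re = $_[1];
    return $_[0] =~ s/^$re//;
}

my $input = "ID 42 rest";

# Give the caller's scope its own "last successful match" first.
"outer" =~ /(outer)/;

eat_prefix( $input, qr/ID\s+(\d+)\s*/ ) or die "no match\n";

# The s/// inside eat_prefix() set $1 to "42", but that match state was
# restored when the sub returned, so $1 still comes from /(outer)/ here.
my $seen = $1;
print "input is now '$input'; \$1 is '$seen'\n";
```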


I was rather annoyed to find that the regexp capture buffers $1, $2,
etc... are in fact dynamically scoped. This means that $1 can't escape
from parse(). It behaves as if 'local $1' was present in parse(); $1 in
parse_idspec() contains whatever it used to.

After some headscratching I decided instead to have parse() return a
list of the capture groups. I so far haven't found a neater expression
than


 sub parse
 {
   my ( $text, $re ) = @_;
   $_[0] =~ s/^$re// or die "Expected $re in $text...\n";

   return map { substr $text, $-[$_], $+[$_]-$-[$_] } 1 .. $#+
 }
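A self-contained version of this idiom, for reference. It is a sketch: the `defined` check for groups that did not take part in the match, and the sample pattern with an optional `(x)?` group, are additions that are not in the post.

```perl
use strict;
use warnings;

sub parse {
    my ( $text, $re ) = @_;    # $text keeps the unmodified input
    $_[0] =~ s/^$re// or die "Expected $re in $text...\n";

    # $-[$n] and $+[$n] are the start and end offsets of group $n in the
    # string as it was before the substitution, i.e. in $text.
    # $#+ counts every group in the pattern; a group that did not take
    # part in the match has undefined offsets, so return undef for it.
    return map {
        my $n = $_;
        defined $-[$n] ? substr( $text, $-[$n], $+[$n] - $-[$n] ) : undef;
    } 1 .. $#+;
}

my $input  = "ID 42 name=foo";
my @groups = parse( $input, qr/ID\s+(\d+)(x)?\s*/ );
# @groups is ("42", undef); $input is now "name=foo"
```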


This seems a common-enough idiom that perhaps there's a neater solution
- I find there's no @{^MATCHGROUPS} or similar present in perl...

Can anyone offer any neater suggestions?
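Later Perl releases (5.26 onwards) added @{^CAPTURE}, which is close to the @{^MATCHGROUPS} asked about here. A minimal sketch, assuming Perl 5.26 or newer. Like $1, it reflects the last successful match in the current dynamic scope, so it still has to be read inside parse() and handed back as a list:

```perl
use strict;
use warnings;
use 5.026;    # @{^CAPTURE} is only available from Perl 5.26

sub parse {
    my ( $text, $re ) = @_;
    $_[0] =~ s/^$re// or die "Expected $re in $text...\n";

    # @{^CAPTURE} is ($1, $2, ...) for the last successful match, which is
    # the s/// above. Returning a copy lets the values outlive the sub.
    return @{^CAPTURE};
}

my $input = "ID 42 name=foo";
my ($id)  = parse( $input, qr/ID\s+(\d+)\s*/ );
# $id is "42"; $input is now "name=foo"
```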

-- 
Paul LeoNerd Evans

leon...@leonerd.org.uk
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/




Re: Regexp capture group list

2009-11-10 Thread Paul LeoNerd Evans
On Tue, Nov 10, 2009 at 02:37:59PM +0100, Philip Newton wrote:
> But you could try this:
>
> sub parse
> {
>   my ( $text, $re ) = @_;
>   my @matches = $_[0] =~ /^$re/ or die "Expected $re in $text...\n";
>   $_[0] =~ s/^$re//;
>
>   return @matches
> }
>
> at the cost of running the regexp twice (once for matching and
> capturing, then once for substituting).

Ooh, but wait a moment... we can do better. $+[0] contains the string
index of the end of the match, and the leading ^ means the match must
have started at index 0.

So how about

  my @matches = $_[0] =~ m/^$re/ or die "Expected $re in $text...\n";
  substr( $_[0], 0, $+[0] ) = "";

  return @matches;

I think I like that...
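Put together, the match-then-cut version of parse() might look like this. A sketch: the die message is carried over from the earlier parse(), and the sample input is made up. One caveat: when the pattern has no capture groups, m// in list context returns (1) on success, so parse() would return (1) rather than an empty list.

```perl
use strict;
use warnings;

sub parse {
    my ( $text, $re ) = @_;

    # List-context m// returns the capture groups; the list assignment in
    # boolean context (its element count) is false only if the match failed.
    my @matches = $_[0] =~ m/^$re/ or die "Expected $re in $text...\n";

    # $+[0] is the end offset of the whole match; because of ^ the match
    # starts at 0, so this deletes exactly the matched prefix.
    substr( $_[0], 0, $+[0] ) = "";

    return @matches;
}

my $input = "ID 42 name=foo";
my ($id)  = parse( $input, qr/ID\s+(\d+)\s*/ );
my @kv    = parse( $input, qr/(\w+)=(\w+)/ );
# $id is "42", @kv is ("name", "foo"), and $input is now ""
```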





Re: Regexp capture group list

2009-11-10 Thread Philip Newton
On Tue, Nov 10, 2009 at 14:51, Paul LeoNerd Evans
<leon...@leonerd.org.uk> wrote:
> So how about
>
>   my @matches = $_[0] =~ m/^$re/ or die "Expected $re in $text...\n";
>   substr( $_[0], 0, $+[0] ) = "";
>
>   return @matches;
>
> I think I like that...

Ooh, yes, it does have a certain charm. And it may even involve less
string copying -- I don't know whether s/^.// is optimised to do
that, but AFAIK substr( ..., 0, ... ) = "" will simply set the
internal OFFSET flag in the SV.

Cheers,
Philip
-- 
Philip Newton philip.new...@gmail.com



Re: Regexp capture group list

2009-11-10 Thread Paul LeoNerd Evans
On Tue, Nov 10, 2009 at 01:59:24PM +, Jasper wrote:
> return map $$_, 1..$#-
>
> too hideous? (I would think it was fine...)

That isn't going to work under strict... Surely you mean...?

  return map { no strict 'refs'; $$_ } 1 .. $#-;

;)
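A runnable version of the symbolic-reference idea, for comparison. A sketch: the sub name parse_symref() and the sample pattern are made up. It counts up to $#- (the last group that actually matched) rather than $#+ (the number of groups in the pattern), so trailing groups that did not participate are dropped instead of being returned as undef:

```perl
use strict;
use warnings;

sub parse_symref {
    my ( $text, $re ) = @_;
    $_[0] =~ s/^$re// or die "Expected $re in $text...\n";

    # With $_ = 1, 2, ..., $$_ is a symbolic reference to $1, $2, ...,
    # which strict refs forbids, hence the block-scoped "no strict".
    return map { no strict 'refs'; $$_ } 1 .. $#-;
}

my $input  = "ax";
my @groups = parse_symref( $input, qr/(a)(b)?/ );
# @groups is ("a"): $#- is 1 here, while $#+ would be 2
```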

In any case, I think I prefer the idea of matching with m// and then
cutting off the matched prefix, as suggested by Philip Newton.





Re: Regexp capture group list

2009-11-10 Thread Philip Newton
On Tue, Nov 10, 2009 at 14:11, Paul LeoNerd Evans
<leon...@leonerd.org.uk> wrote:
> After some headscratching I decided instead to have parse() return a
> list of the capture groups. I so far haven't found a neater expression
> than
>
>
>  sub parse
>  {
>    my ( $text, $re ) = @_;
>    $_[0] =~ s/^$re// or die "Expected $re in $text...\n";
>
>    return map { substr $text, $-[$_], $+[$_]-$-[$_] } 1 .. $#+
>  }
>
>
> This seems a common-enough idiom that perhaps there's a neater solution
> - I find there's no @{^MATCHGROUPS} or similar present in perl...
>
> Can anyone offer any neater suggestions?

For matches, you can use list context assignment, which will give you
the groups. Unfortunately, that doesn't work for substitutions, which
always return a count of substitutions made.
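To illustrate the difference (the sample string and variable names are made up):

```perl
use strict;
use warnings;

my $s = "ID 42 rest";

# m// in list context returns the capture groups.
my @from_match = $s =~ m/^ID\s+(\d+)/;          # ("42")

# s/// returns the number of substitutions (1 here, since there is no /g),
# even in list context; the groups are not returned.
my $copy       = $s;
my @from_subst = $copy =~ s/^ID\s+(\d+)\s*//;   # (1); $copy is now "rest"
```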

But you could try this:

sub parse
{
  my ( $text, $re ) = @_;
  my @matches = $_[0] =~ /^$re/ or die "Expected $re in $text...\n";
  $_[0] =~ s/^$re//;

  return @matches
}

at the cost of running the regexp twice (once for matching and
capturing, then once for substituting).
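With parse() returning the groups, the caller from the original post no longer depends on $1. A sketch of how parse_idspec() might sit on top of this two-pass parse(); the trailing \s* in the pattern and the sample input are additions:

```perl
use strict;
use warnings;

# Two-pass parse(): m// in list context collects the groups, then s///
# re-runs the same pattern to remove the prefix.
sub parse {
    my ( $text, $re ) = @_;
    my @matches = $_[0] =~ /^$re/ or die "Expected $re in $text...\n";
    $_[0] =~ s/^$re//;
    return @matches;
}

# The groups come back as an ordinary list, so nothing relies on $1
# surviving the return from parse(). $_[0] stays aliased to the caller's
# variable all the way down.
sub parse_idspec {
    my ( $id ) = parse( $_[0], qr/ID\s+(\d+)\s*/ );
    return $id;
}

my $input = "ID 42 name=foo";
my $id    = parse_idspec( $input );
# $id is "42"; $input is now "name=foo"
```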

Cheers,
Philip



Re: Regexp capture group list

2009-11-10 Thread Jasper
2009/11/10 Philip Newton <philip.new...@gmail.com>:
> On Tue, Nov 10, 2009 at 14:11, Paul LeoNerd Evans
> <leon...@leonerd.org.uk> wrote:
>> After some headscratching I decided instead to have parse() return a
>> list of the capture groups. I so far haven't found a neater expression
>> than
>>
>>
>>  sub parse
>>  {
>>    my ( $text, $re ) = @_;
>>    $_[0] =~ s/^$re// or die "Expected $re in $text...\n";
>>
>>    return map { substr $text, $-[$_], $+[$_]-$-[$_] } 1 .. $#+
>>  }

return map $$_, 1..$#-

too hideous? (I would think it was fine...)


-- 
Jasper



Re: Regexp capture group list

2009-11-10 Thread Paul LeoNerd Evans
On Tue, Nov 10, 2009 at 03:11:04PM +0100, Philip Newton wrote:
> On Tue, Nov 10, 2009 at 14:51, Paul LeoNerd Evans
> <leon...@leonerd.org.uk> wrote:
>> So how about
>>
>>   my @matches = $_[0] =~ m/^$re/ or die "Expected $re in $text...\n";
>>   substr( $_[0], 0, $+[0] ) = "";
>>
>>   return @matches;
>>
>> I think I like that...
>
> Ooh, yes, it does have a certain charm. And it may even involve less
> string copying -- I don't know whether s/^.// is optimised to do
> that, but AFAIK substr( ..., 0, ... ) = "" will simply set the
> internal OFFSET flag in the SV.

Seems to:

$ perl -MDevel::Peek -e 'my $foo = "abcde"; substr( $foo, 0, 3 ) = ""; Dump $foo'
SV = PVIV(0xf370d0) at 0xf24e38
  REFCNT = 2
  FLAGS = (PADMY,POK,OOK,pPOK)
  IV = 3  (OFFSET)
  PV = 0xf3c143 ( "abc" . ) "de"\0
  CUR = 2
  LEN = 5






Re: Regexp capture group list

2009-11-10 Thread Paul LeoNerd Evans
(apologies for the semi-duplicate)

On Tue, Nov 10, 2009 at 03:11:04PM +0100, Philip Newton wrote:
> On Tue, Nov 10, 2009 at 14:51, Paul LeoNerd Evans
> <leon...@leonerd.org.uk> wrote:
>> So how about
>>
>>   my @matches = $_[0] =~ m/^$re/ or die "Expected $re in $text...\n";
>>   substr( $_[0], 0, $+[0] ) = "";
>>
>>   return @matches;
>>
>> I think I like that...
>
> Ooh, yes, it does have a certain charm. And it may even involve less
> string copying -- I don't know whether s/^.// is optimised to do
> that, but AFAIK substr( ..., 0, ... ) = "" will simply set the
> internal OFFSET flag in the SV.

In fact, they seem to behave quite similarly:

$ perl -MDevel::Peek -e 'my $foo = "abcde"; substr( $foo, 0, 3 ) = ""; Dump $foo'
SV = PVIV(0x8e50d0) at 0x8d2e38
  REFCNT = 2
  FLAGS = (PADMY,POK,OOK,pPOK)
  IV = 3  (OFFSET)
  PV = 0x8ea143 ( "abc" . ) "de"\0
  CUR = 2
  LEN = 5

$ perl -MDevel::Peek -e 'my $foo = "abcde"; $foo =~ s/^abc//; Dump $foo'
SV = PVIV(0x1bc90d0) at 0x1bb6e38
  REFCNT = 1
  FLAGS = (PADMY,POK,OOK,pPOK)
  IV = 3  (OFFSET)
  PV = 0x1bce133 ( "abc" . ) "de"\0
  CUR = 2
  LEN = 5







Re: Regexp capture group list

2009-11-10 Thread Philip Newton
On Tue, Nov 10, 2009 at 15:53, Paul LeoNerd Evans
<leon...@leonerd.org.uk> wrote:
> On Tue, Nov 10, 2009 at 03:11:04PM +0100, Philip Newton wrote:
>> Ooh, yes, it does have a certain charm. And it may even involve less
>> string copying
>
> In fact, they seem to behave quite similarly:

Ah, poo :) Well, at least it's no slower in that respect -- and may
still be a bit quicker since you won't have to run the regexp machine
twice.

Cheers,
Philip


Re: Regexp capture group list

2009-11-10 Thread Uri Guttman
>>>>> "PLE" == Paul LeoNerd Evans <leon...@leonerd.org.uk> writes:

  PLE>   substr( $_[0], 0, $+[0] ) = "";

4 arg substr is faster than lvalue substr.

substr( $_[0], 0, $+[0], '' ) ;
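Whether the 4-arg form is faster on a given perl can be checked with the core Benchmark module. A sketch only: the string and the run time are arbitrary choices, and the numbers will vary between perl versions and machines.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $orig = "abcde" x 20;

# Each sub removes the first 3 characters from a fresh copy of $orig.
my %impl = (
    lvalue   => sub { my $s = $orig; substr( $s, 0, 3 ) = ""; return $s },
    four_arg => sub { my $s = $orig; substr( $s, 0, 3, "" );  return $s },
);

# Check that both forms give the same result before timing them.
$impl{lvalue}->() eq $impl{four_arg}->()
    or die "lvalue and 4-arg substr disagree\n";

# Run each for at least 1 CPU second and print a comparison table.
cmpthese( -1, \%impl );
```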

i do a very similar recursive parse in Template::Simple and i also use
$1 and $2 in s/// in the basic rendering. the compiler variation does
m// and then a 4 arg substr to chop off the matched leading text. you
can see the basic code on cpan and the latest unreleased version with
the compiler code at:

http://perlhunter.com/git/template
http://perlhunter.com/git/gitweb/gitweb.cgi

there also may be a way to use \G in the regex to start parsing from
where you last parsed. i couldn't get that to work but maybe ask damian
for help. :)
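One way the \G idea might work, as a sketch rather than anything taken from Template::Simple: leave the string intact and let pos() track progress, matching with m/\G.../gc in scalar context so that a failed match does not reset pos(). The name parse_at() is hypothetical; the captures are recovered with the same @-/@+ offsets used earlier in the thread.

```perl
use strict;
use warnings;

# Match $re at the current pos() of the caller's string (aliased as $_[0])
# without modifying it. /g anchors \G at pos() and advances it on success;
# /c keeps pos() where it was if the match fails.
sub parse_at {
    my $re = $_[1];
    $_[0] =~ m/\G$re/gc
        or die "Expected $re at offset " . ( pos( $_[0] ) // 0 ) . "\n";

    # Offsets in @- and @+ index into the (unmodified) string itself.
    return map {
        my $n = $_;
        defined $-[$n] ? substr( $_[0], $-[$n], $+[$n] - $-[$n] ) : undef;
    } 1 .. $#+;
}

my $input    = "ID 42 name=foo";
my ($id)     = parse_at( $input, qr/ID\s+(\d+)\s*/ );
my ($k, $v)  = parse_at( $input, qr/(\w+)=(\w+)/ );
# $id is "42", ($k, $v) is ("name", "foo"); $input is unchanged and
# pos($input) is 14, the end of the string
```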

uri

-- 
Uri Guttman  --  u...@stemsystems.com    http://www.sysarch.com --
-  Perl Code Review , Architecture, Development, Training, Support --
-  Gourmet Hot Cocoa Mix    http://bestfriendscocoa.com -