Re: Regexp capture group list
On Tue, 10 Nov 2009, at 14:37:59, Philip Newton philip.new...@gmail.com wrote:
> But you could try this:
>
>     sub parse {
>         my ( $text, $re ) = @_;
>         my @matches = $_[0] =~ /^$re/ or die "Expected $re in $text...\n";
>         $_[0] =~ s/^$re//;
>         return @matches
>     }
>
> at the cost of running the regexp twice (once for matching and
> capturing, then once for substituting).
>
> Cheers,
> Philip

Capturing and substituting in one step:

    #-
    sub Parse {
        my @matches = ();
        use re 'eval';
        $_[0] =~ s/($_[1])(?{push(@matches, $1)})//g;
        return @matches;
    }
    #-

Torsten
Regexp capture group list
I'm writing an attempt at a simple recursive-descent parser with no
backtracking or alternation, for parsing a really simple grammar. My usual
method is to write a collection of functions that eat a prefix from the
string they're passed as $_[0] (mutably so), and return any interesting
data.

A basic primitive to start with is something like:

    sub parse {
        my ( $text, $re ) = @_;
        $_[0] =~ s/^$re// or die "Expected $re in $text...\n";
    }

    sub parse_idspec {
        parse $_[0], qr/ID\s+(\d+)/ and return $1;
    }

I was rather annoyed to find that the regexp capture buffers $1, $2, etc...
are in fact dynamically scoped. This means that $1 can't escape from
parse(). It behaves as if 'local $1' was present in parse(); $1 in
parse_idspec() contains whatever it used to.

After some headscratching I decided instead to have parse() return a list
of the capture groups. I so far haven't found a neater expression than

    sub parse {
        my ( $text, $re ) = @_;
        $_[0] =~ s/^$re// or die "Expected $re in $text...\n";
        return map { substr $text, $-[$_], $+[$_]-$-[$_] } 1 .. $#+
    }

This seems a common-enough idiom that perhaps there's a neater solution - I
find there's no @{^MATCHGROUPS} or similar present in perl...

Can anyone offer any neater suggestions?

--
Paul LeoNerd Evans
leon...@leonerd.org.uk
ICQ# 4135350 | Registered Linux# 179460
http://www.leonerd.org.uk/
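To illustrate the scoping behaviour described above, here is a small sketch. The helper name match_inside and the sample strings are made up for illustration. It suggests that a capture made inside a sub has to be copied out explicitly, because after the sub returns the caller sees its own earlier $1 again.

```perl
use strict;
use warnings;

# Hypothetical helper: performs a match inside its own scope.
sub match_inside {
    my ($str) = @_;
    $str =~ /ID\s+(\d+)/ or return;
    return $1;    # copy the capture out while the match is still in scope
}

# The caller's own match sets $1 in the caller's scope.
"outer 7" =~ /(\d+)/ or die "outer match failed\n";

my $returned = match_inside("ID 42");
my $after    = $1;    # the sub's match is no longer visible here

print "returned=$returned after=$after\n";
```

The value handed back through return is the only way the inner capture reaches the caller, which is why returning a list of groups from parse() is a reasonable workaround.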
Re: Regexp capture group list
On Tue, Nov 10, 2009 at 02:37:59PM +0100, Philip Newton wrote:
> But you could try this:
>
>     sub parse {
>         my ( $text, $re ) = @_;
>         my @matches = $_[0] =~ /^$re/ or die "Expected $re in $text...\n";
>         $_[0] =~ s/^$re//;
>         return @matches
>     }
>
> at the cost of running the regexp twice (once for matching and
> capturing, then once for substituting).

Ooh; but wait a moment.. we can do better...

$+[0] contains the string index of the end of the match. The leading ^
means it must have been at the start. So how about

    my @matches = $_[0] =~ m/^$re/ or die;
    substr( $_[0], 0, $+[0] ) = "";
    return @matches;

I think I like that...

--
Paul LeoNerd Evans
leon...@leonerd.org.uk
http://www.leonerd.org.uk/
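As a self-contained sketch of the match-then-trim idea above: the sample input, the ID pattern, and the wording of the die message are assumptions added for illustration and are not part of the original snippet.

```perl
use strict;
use warnings;

sub parse {
    my ( undef, $re ) = @_;

    # List assignment captures the groups; it is false if the match fails.
    my @matches = $_[0] =~ m/^$re/
        or die "Expected $re at start of '$_[0]'\n";    # message is illustrative

    # $+[0] is the end offset of the whole match. Because of the leading ^
    # the match starts at offset 0, so this removes exactly the matched prefix.
    substr( $_[0], 0, $+[0] ) = "";

    return @matches;
}

my $input = "ID 42 rest";
my ($id) = parse( $input, qr/ID\s+(\d+)/ );

print "id=$id remaining='$input'\n";
```

Note that a pattern with no capture groups makes a successful list-context match return (1), so callers that only want to consume a token should ignore the return value.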
Re: Regexp capture group list
On Tue, Nov 10, 2009 at 14:51, Paul LeoNerd Evans leon...@leonerd.org.uk wrote:
> So how about
>
>     my @matches = $_[0] =~ m/^$re/ or die;
>     substr( $_[0], 0, $+[0] ) = "";
>     return @matches;
>
> I think I like that...

Ooh, yes, it does have a certain charm.

And it may even involve less string copying -- I don't know whether s/^.//
is optimised to do that, but AFAIK substr( ..., 0, ... ) = "" will simply
set the internal OFFSET flag in the SV.

Cheers,
Philip
--
Philip Newton philip.new...@gmail.com
Re: Regexp capture group list
On Tue, Nov 10, 2009 at 01:59:24PM +, Jasper wrote:
>     return map $$_, 1..$#-
>
> too hideous? (I would think it was fine...)

That isn't going to work under strict... Surely you mean..?

    return map { no strict 'refs'; $$_ } 1 .. $#-;

;)

In any case, I think I prefer the "match in m// then cut off matching
prefix" idea, as suggested by Philip Newton..

--
Paul LeoNerd Evans
leon...@leonerd.org.uk
http://www.leonerd.org.uk/
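A short sketch of the strictness point made above, with a made-up sample string and pattern. It tries the symbolic-reference form twice: once under strict, trapped with eval, and once with strict refs relaxed inside the map block.

```perl
use strict;
use warnings;

"ID 42 bob" =~ /ID\s+(\d+)\s+(\w+)/ or die "no match\n";

# Under strict refs, using the number 1 as a variable name is a runtime error.
my @strict_result = eval { map $$_, 1 .. $#- };
my $strict_error  = $@;

# With strict refs relaxed for the block, $$_ resolves to $1, $2, ...
my @groups = map { no strict 'refs'; $$_ } 1 .. $#-;

print "error under strict: $strict_error";
print "groups: @groups\n";
```

Relaxing the pragma only inside the block keeps the rest of the file under strict.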
Re: Regexp capture group list
On Tue, Nov 10, 2009 at 14:11, Paul LeoNerd Evans leon...@leonerd.org.uk wrote:
> After some headscratching I decided instead to have parse() return a
> list of the capture groups. I so far haven't found a neater expression
> than
>
>     sub parse {
>         my ( $text, $re ) = @_;
>         $_[0] =~ s/^$re// or die "Expected $re in $text...\n";
>         return map { substr $text, $-[$_], $+[$_]-$-[$_] } 1 .. $#+
>     }
>
> This seems a common-enough idiom that perhaps there's a neater solution -
> I find there's no @{^MATCHGROUPS} or similar present in perl...
>
> Can anyone offer any neater suggestions?

For matches, you can use list context assignment, which will give you the
groups.

Unfortunately, that doesn't work for substitutions, which always return a
count of substitutions made.

But you could try this:

    sub parse {
        my ( $text, $re ) = @_;
        my @matches = $_[0] =~ /^$re/ or die "Expected $re in $text...\n";
        $_[0] =~ s/^$re//;
        return @matches
    }

at the cost of running the regexp twice (once for matching and capturing,
then once for substituting).

Cheers,
Philip
--
Philip Newton philip.new...@gmail.com
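The difference between the two operators in list context can be seen in a few lines. This is only a sketch, and the sample text and pattern are invented for the purpose.

```perl
use strict;
use warnings;

my $text = "ID 42 rest";

# A match in list context returns the capture groups.
my @groups = $text =~ /^(\w+)\s+(\d+)/;

# A substitution returns the number of substitutions made,
# even when it is evaluated in list context.
my $copy       = $text;
my @from_subst = ( $copy =~ s/^(\w+)\s+(\d+)// );

print "match: @groups\n";
print "subst: @from_subst\n";
print "left:  '$copy'\n";
```

This is why the suggested parse() needs a separate m// to collect the groups before the s/// removes the prefix.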
Re: Regexp capture group list
2009/11/10 Philip Newton philip.new...@gmail.com:
> On Tue, Nov 10, 2009 at 14:11, Paul LeoNerd Evans leon...@leonerd.org.uk wrote:
> > After some headscratching I decided instead to have parse() return a
> > list of the capture groups. I so far haven't found a neater expression
> > than
> >
> >     sub parse {
> >         my ( $text, $re ) = @_;
> >         $_[0] =~ s/^$re// or die "Expected $re in $text...\n";
> >         return map { substr $text, $-[$_], $+[$_]-$-[$_] } 1 .. $#+
> >     }

    return map $$_, 1..$#-

too hideous? (I would think it was fine...)

--
Jasper
Re: Regexp capture group list
On Tue, Nov 10, 2009 at 03:11:04PM +0100, Philip Newton wrote:
> On Tue, Nov 10, 2009 at 14:51, Paul LeoNerd Evans leon...@leonerd.org.uk wrote:
> > So how about
> >
> >     my @matches = $_[0] =~ m/^$re/ or die;
> >     substr( $_[0], 0, $+[0] ) = "";
> >     return @matches;
> >
> > I think I like that...
>
> Ooh, yes, it does have a certain charm. And it may even involve less
> string copying -- I don't know whether s/^.// is optimised to do that,
> but AFAIK substr( ..., 0, ... ) = "" will simply set the internal OFFSET
> flag in the SV.

Seems to:

    $ perl -MDevel::Peek -e 'my $foo = "abcde"; substr( $foo, 0, 3 ) = ""; Dump $foo'
    SV = PVIV(0xf370d0) at 0xf24e38
      REFCNT = 2
      FLAGS = (PADMY,POK,OOK,pPOK)
      IV = 3  (OFFSET)
      PV = 0xf3c143 ( "abc" . ) "de"\0
      CUR = 2
      LEN = 5

--
Paul LeoNerd Evans
leon...@leonerd.org.uk
http://www.leonerd.org.uk/
Re: Regexp capture group list
(apologies for semi-duplicate)

On Tue, Nov 10, 2009 at 03:11:04PM +0100, Philip Newton wrote:
> On Tue, Nov 10, 2009 at 14:51, Paul LeoNerd Evans leon...@leonerd.org.uk wrote:
> > So how about
> >
> >     my @matches = $_[0] =~ m/^$re/ or die;
> >     substr( $_[0], 0, $+[0] ) = "";
> >     return @matches;
> >
> > I think I like that...
>
> Ooh, yes, it does have a certain charm. And it may even involve less
> string copying -- I don't know whether s/^.// is optimised to do that,
> but AFAIK substr( ..., 0, ... ) = "" will simply set the internal OFFSET
> flag in the SV.

In fact, they seem to behave quite similarly:

    $ perl -MDevel::Peek -e 'my $foo = "abcde"; substr( $foo, 0, 3 ) = ""; Dump $foo'
    SV = PVIV(0x8e50d0) at 0x8d2e38
      REFCNT = 2
      FLAGS = (PADMY,POK,OOK,pPOK)
      IV = 3  (OFFSET)
      PV = 0x8ea143 ( "abc" . ) "de"\0
      CUR = 2
      LEN = 5

    $ perl -MDevel::Peek -e 'my $foo = "abcde"; $foo =~ s/^abc//; Dump $foo'
    SV = PVIV(0x1bc90d0) at 0x1bb6e38
      REFCNT = 1
      FLAGS = (PADMY,POK,OOK,pPOK)
      IV = 3  (OFFSET)
      PV = 0x1bce133 ( "abc" . ) "de"\0
      CUR = 2
      LEN = 5

--
Paul LeoNerd Evans
leon...@leonerd.org.uk
http://www.leonerd.org.uk/
Re: Regexp capture group list
On Tue, Nov 10, 2009 at 15:53, Paul LeoNerd Evans leon...@leonerd.org.uk wrote:
> On Tue, Nov 10, 2009 at 03:11:04PM +0100, Philip Newton wrote:
> > Ooh, yes, it does have a certain charm. And it may even involve less
> > string copying
>
> In fact, they seem to behave quite similarly:

Ah, poo :)

Well, at least it's no slower in that respect -- and may still be a bit
quicker since you won't have to run the regexp machine twice.

Cheers,
Philip
--
Philip Newton philip.new...@gmail.com
Re: Regexp capture group list
>>>>> "PLE" == Paul LeoNerd Evans leon...@leonerd.org.uk writes:

  PLE> substr( $_[0], 0, $+[0] ) = "";

4 arg substr is faster than lvalue substr.

    substr( $_[0], 0, $+[0], '' ) ;

i do a very similar recursive parse in Template::Simple and i also use
$1 and $2 in s/// in the basic rendering. the compiler variation does
m// and then a 4 arg substr to chop off the matched leading text. you
can see the basic code on cpan and the latest unreleased version with
the compiler code at:

    http://perlhunter.com/git/template
    http://perlhunter.com/git/gitweb/gitweb.cgi

there also may be a way to use \G in the regex to start parsing from
where you last parsed. i couldn't get that to work but maybe ask damian
for help. :)

uri

--
Uri Guttman -- u...@stemsystems.com -- http://www.sysarch.com
- Perl Code Review, Architecture, Development, Training, Support -
- Gourmet Hot Cocoa Mix -- http://bestfriendscocoa.com -
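One possible reading of the \G suggestion is sketched below. It is not taken from Template::Simple; the name parse_at_pos, the sample input, the patterns, and the error message are all assumptions made for illustration. Instead of chopping the prefix off, it leaves the string intact and lets pos() remember where the last match ended.

```perl
use strict;
use warnings;

# Hypothetical variant of parse(): advances pos() instead of modifying the string.
sub parse_at_pos {
    my $re = $_[1];

    # The left side of "or" is in scalar context, so /g performs a single
    # match attempt. \G anchors it at pos($_[0]) (offset 0 if pos is unset),
    # and /c keeps pos() where it was if the match fails.
    $_[0] =~ m/\G$re/gc
        or die "Expected $re at offset " . ( pos( $_[0] ) // 0 ) . "\n";

    # Collect the capture groups from the offset arrays. This assumes every
    # group took part in the match.
    return map { substr $_[0], $-[$_], $+[$_] - $-[$_] } 1 .. $#-;
}

my $input = "ID 42 NAME bob";
my ($id)   = parse_at_pos( $input, qr/ID\s+(\d+)\s*/ );
my ($name) = parse_at_pos( $input, qr/NAME\s+(\w+)/ );

print "id=$id name=$name pos=", pos($input), "\n";
```

Because pos() belongs to the scalar itself and $_[0] aliases the caller's variable, the position carries over between calls. Assigning a new value to the string resets it, so this approach only fits parsers that never modify their input.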