On Nov 1, 2012, at 12:44 AM, Thomas Smith wrote:

> Hi,
> 
> I'm trying to search a file for several matching blocks of text. A sample
> of what I'm searching through is below.
> 
> What I want to do is match "##### START block #####" through to the next
> "##### END block #####" and repeat that throughout the file without
> matching any of the text that falls between each matched block (that is,
> the "ok: some text" lines should not be matched). Here is the one-liner I'm
> using:
> 
> perl -p -e '/^##### START block #####.*##### END block #####$/s' file.txt
> 
> I've tried a few variations of this but with the same result--a match is
> being made from the first "##### START block #####" to the last "##### END
> block #####", and everything in between... I believe that the ".*",
> combined with the "s" modifier, in the regex is causing this match to be
> made.

The '*' is what's called a "greedy" quantifier: it matches as many characters
as possible. When the regular expression engine encounters the pattern '.*',
it immediately consumes as many characters as it can. Since your regular
expression includes the 's' modifier, that includes newlines as well. Because
there are more pattern characters after the '.*', the engine then backtracks:
it gives back characters one at a time from the end of the substring matched
by the '.*' until the rest of the pattern also matches. If it gets all the way
down to the '.*' matching nothing and the rest still does not match, the
match fails at that starting position.

The result of all this is that for your pattern, the last '##### END block 
#####' substring is the one that will be matched, and the '.*' pattern will 
match everything between the first '##### START block #####' and the last 
'##### END block #####'.
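
To see the greedy behavior in isolation, here is a minimal sketch. The sample
text is a hypothetical stand-in for your file, and it assumes the whole file
is held in one string (with -p Perl hands the pattern one line at a time).
The 'm' modifier is added so that '^' and '$' can match at line boundaries
inside that string.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for file.txt, held as one string.
my $text = join '',
    "##### START block #####\n",
    "first\n",
    "##### END block #####\n",
    "ok: some text\n",
    "##### START block #####\n",
    "second\n",
    "##### END block #####\n";

# Greedy '.*' with /s: the engine grabs everything up to the end of the
# string, then backs off only as far as the LAST END marker.
my @greedy = $text =~ /^##### START block #####.*##### END block #####$/msg;

print scalar(@greedy), " match(es)\n";    # prints "1 match(es)"
print "the match includes the 'ok: some text' line\n"
    if $greedy[0] =~ /ok: some text/;
```

Even with the 'g' modifier there is only one match, because that single match
has already consumed both blocks and the line between them.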

The way to fix this is to make the '*' quantifier "non-greedy" by putting a '?'
directly after it, giving '.*?'. With that pattern, the RE engine will match as
few characters as possible, so the first START block will pair up with the
first subsequent END block. Adding the 'g' modifier tells the RE engine to
resume searching after each match, so it finds every block in the string.
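
A minimal sketch of the non-greedy version follows. As before, the sample
text is hypothetical and the whole input is assumed to be in one string; the
'm' modifier is my addition so that '^' and '$' match at the start and end of
each line rather than only at the ends of the string.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for file.txt, held as one string.
my $text = join '',
    "##### START block #####\n",
    "first\n",
    "##### END block #####\n",
    "ok: some text\n",
    "##### START block #####\n",
    "second\n",
    "##### END block #####\n";

# Non-greedy '.*?' stops at the first END marker after each START marker.
# In list context, /g with no capture groups returns every whole match.
my @blocks = $text =~ /^##### START block #####.*?##### END block #####$/msg;

print scalar(@blocks), " blocks found\n";    # prints "2 blocks found"
print "$_\n---\n" for @blocks;
```

From the command line, one way to get the whole file into a single string is
the -0777 switch, for example:

perl -0777 -ne 'print "$&\n" while /^##### START block #####.*?##### END block #####$/msg' file.txt

This is only one possible arrangement; adjust the pattern if your markers
carry trailing whitespace or other variations.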


