From: Robert Citek <[EMAIL PROTECTED]>
> On Tuesday, Sep 21, 2004, at 17:17 US/Central, Jenda Krynicky wrote:
> >
> > How about this:
> > $s = "sasas dfgfgh asasas asedsase";
> > while ($s =~ /\G.*?(?=sas)./g) {
> >     print "pos=",pos($s)-1, " = '",substr($s,pos($s)-1,3),"'\n";
> > }
>
> Thanks. Seems to work, although I'm still trying to grok it. I'll
> probably have questions later.

Let me try again then :-)

If you use the /g option with a match evaluated in scalar context, the first evaluation finds the first match, the next evaluation finds the next one, and so forth:

    $s = "foo brkshr frt ty fgh fss";
    while ($s =~ /f(..)/g) {
        print "$1\n";
    }

Each time it starts looking for the next match where the last one left off:

    $s = "foo brkshr fftr ty fgh fss";
    while ($s =~ /f(..)/g) {
        print "$1\n";
    }

As you can see it found foo, fft, fgh and fss (printing the captured oo, ft, gh and ss), but skipped ftr because it starts before the end of the previous match.

That's why I need the (?=). This instructs the regexp engine to check that the regexp inside the parentheses matches at that point, but to keep the pointer in the same place:

    $s = "faaf bar fbbfccf";
    while ($s =~ /f(..)(?=f)/g) {
        print "$1\n";
    }

vs.

    $s = "faaf bar fbbfccf";
    while ($s =~ /f(..)f/g) {
        print "$1\n";
    }

The first loop prints aa, bb and cc. The second prints only aa and bb, because the f that ends fbbf has already been consumed and cannot start fccf.

The regexp I gave you was unnecessarily complex. With /g the regexp automatically starts where it left off the last time, so I do not need the \G.*? and can write it as:

    while ($s =~ /(?=sas)./g) {
        print "pos=",pos($s)-1, " = '",substr($s,pos($s)-1,3),"'\n";
    }

and it will mean exactly the same. And it seems the . at the end of the regexp and the -1 subtracted from pos($s) are not needed either. Which means it's actually much easier than I had you believe:

    $s = "sasas dfgfgh asasas asedsase";
    while ($s =~ /(?=sas)/g) {
        print "pos=",pos($s), " = '",substr($s,pos($s),3),"'\n";
    }

With the \G.*? I had to use the . at the end of the regexp to make sure every match consumes at least one character, so that the pointer ends up just after the first character of the "sas" it found. Without the \G.*? Perl moves on to the next match automatically. Try

    $s = "sasas dfgfgh asasas asedsase";
    while ($s =~ /\G.*?(?=sas)/g) {
        print "pos=",pos($s), " = '",substr($s,pos($s),3),"'\n";
    }

As you can see it returns most matches twice.
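
To compare the two loops without reading through the printed lines, the positions can be collected into lists. This is only a sketch: match_positions is a helper name made up for this example, and the lists quoted after the code are what Perl's documented handling of empty matches should give, so treat them as something to verify on your own perl.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): run $re against $s
# with /g in scalar context and record pos() after every successful match.
sub match_positions {
    my ($s, $re) = @_;
    my @found;
    while ($s =~ /$re/g) {
        push @found, pos($s);
    }
    return @found;
}

my $s = "sasas dfgfgh asasas asedsase";

# Lookahead only: every match is empty and sits right before a "sas".
my @plain = match_positions($s, qr/(?=sas)/);

# With \G.*? in front: empty matches alternate with non-empty ones.
my @with_g = match_positions($s, qr/\G.*?(?=sas)/);

print "plain:   @plain\n";
print "with \\G: @with_g\n";
```

The first list should come out as 0 2 14 16 24, one entry per "sas" including the overlapping ones. The second should come out as 0 2 2 14 14 16 16 24 24, which is the doubling mentioned above.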

The reason is the way Perl handles zero-length matches (keep in mind that the stuff in (?=) doesn't count towards the length of the match!). After an ordinary match the pointer sits just after the matched text. After a zero-length match the pointer stays where it is, but the next match is not allowed to be empty at that same spot, so the engine has to find something longer or something further on.

So in the string the first match was "" at the very beginning, and the pointer stayed there:

    ^sasas dfgfgh asasas asedsase

The next match could not be empty again, so the .*? had to eat "sa", the characters preceding the second "sas", and the pointer was moved to

    sa^sas dfgfgh asasas asedsase

The next match was empty (which is allowed, since the previous one was not), so the same position gets reported a second time:

    sa^sas dfgfgh asasas asedsase

The next match was "sas dfgfgh a" and the pointer was moved to:

    sasas dfgfgh a^sasas asedsase

The next match is again empty and the pointer stays at:

    sasas dfgfgh a^sasas asedsase

and so forth.

If we do not include the \G.*? we do not match the strings in between, so the match is always empty and sits just before the searched stuff. Since two empty matches in a row at the same place are not allowed, each search simply moves on to the next occurrence.

Humpf, not sure I'm still making sense.

HTH, Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
    -- Terry Pratchett in Sourcery