starting position of RE match

2004-09-21 Thread Robert Citek
Is there an equivalent of "index" that uses regular expressions 
instead of an exact string?

I've been looking at index, pos, m//, and the corresponding "$" 
variables but nothing I've found so far does what I'm looking for.  
Specifically, what I'm trying to do is find all the starting locations 
of a RE match.  For example, using an exact string match:

$ perl -e '$foo="baaaab"; $re="aa" ;
   for ($bar=index($foo, $re); $bar >= 0 ; $bar=index($foo, $re, $bar+1))
 { print $bar, "\t" }
   print "\n" ; '
1   2   3

I'd like to do the same except use a regular expression.  BTW, notice 
that matches can overlap.

Any thoughts or ideas?
Regards,
- Robert
http://www.cwelug.org
--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
 



Re: starting position of RE match

2004-09-21 Thread Jenda Krynicky
From: Robert Citek <[EMAIL PROTECTED]>
> Is there an equivalent for "index" that uses regular expressions
> instead of exact string?
> 
> I've been looking at index, pos, m//, and the corresponding "$" 
> variables but nothing I've found so far does what I'm looking for. 
> Specifically, what I'm trying to do is find all the starting locations
> of a RE match.  For example, using an exact string match:
> 
> $ perl -e '$foo="baaaab"; $re="aa" ;
> for ($bar=index($foo, $re); $bar >= 0 ; $bar=index($foo, $re, $bar+1))
>   { print $bar, "\t" }
> print "\n" ; '
> 1   2   3
> 
> I'd like to do the same except use a regular expression.  BTW, notice
> that matches can overlap.
> 
> Any thoughts or ideas?

How about this:

$s = "sasas dfgfgh asasas asedsase";

while ($s =~ /\G.*?(?=sas)./g) {
print "pos=",pos($s)-1, " = '",substr($s,pos($s)-1,3),"'\n";
}

The "sas" is the regexp being matched.

The \G matches where the last match left off, the .*? skips as few 
characters as possible, the (?=) makes sure the regexp matches at 
that place but doesn't move the position in the string, and the . at 
the end moves the position so that the next round doesn't find the 
same occurrence. That's also why I have to subtract 1 from pos($s).

You could also do this:

while ($s =~ /\G.+?(?=sas)/g) {
print "pos=",pos($s), " = '",substr($s,pos($s),3),"'\n";
}

Which looks a bit nicer, but since .+? has to consume at least one 
character it would miss the match at the very beginning of the string.
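
A small sketch for comparing the two loops above side by side. It collects the reported offsets into arrays instead of printing them one by one, so the difference at offset 0 is easy to see. The array names @with_dot and @with_plus are made up for this illustration.

```perl
use strict;
use warnings;

my $s = "sasas dfgfgh asasas asedsase";

# Variant 1: lazy skip, lookahead, then consume one character.
my @with_dot;
while ($s =~ /\G.*?(?=sas)./g) {
    push @with_dot, pos($s) - 1;    # pos() is one past the start of "sas"
}

# Variant 2: .+? must consume at least one character before the lookahead,
# so a match at offset 0 can never be reported.
my @with_plus;
while ($s =~ /\G.+?(?=sas)/g) {
    push @with_plus, pos($s);
}

print "with .*? and trailing dot: @with_dot\n";
print "with .+?:                  @with_plus\n";
```

On the sample string the first list should hold 0, 2, 14, 16 and 24, and the second the same offsets without the leading 0.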

HTH, Jenda
= [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =
When it comes to wine, women and song, wizards are allowed 
to get drunk and croon as much as they like.
-- Terry Pratchett in Sourcery


 




Re: starting position of RE match

2004-09-21 Thread Robert Citek
On Tuesday, Sep 21, 2004, at 17:17 US/Central, Jenda Krynicky wrote:
> How about this:
> $s = "sasas dfgfgh asasas asedsase";
> while ($s =~ /\G.*?(?=sas)./g) {
> print "pos=",pos($s)-1, " = '",substr($s,pos($s)-1,3),"'\n";
> }

Thanks.  Seems to work, although I'm still trying to grok it.  I'll 
probably have questions later.

Regards,
- Robert
http://www.cwelug.org
 



Re: starting position of RE match

2004-09-22 Thread Jenda Krynicky
From: Robert Citek <[EMAIL PROTECTED]>
> On Tuesday, Sep 21, 2004, at 17:17 US/Central, Jenda Krynicky wrote: 
> >
> > How about this: 
> > $s = "sasas dfgfgh asasas asedsase"; 
> > while ($s =~ /\G.*?(?=sas)./g) { 
> > print "pos=",pos($s)-1, " = '",substr($s,pos($s)-1,3),"'\n"; 
> > }
> 
> Thanks.  Seems to work, although I'm still trying to grok it.  I'll
> probably have questions later.

Let me try again then :-)

If you use the /g option on a match evaluated in scalar context, the 
first evaluation finds only the first match, the next evaluation 
finds the second one, and so forth:

$s = "foo brkshr frt ty fgh fss";
while ($s =~ /f(..)/g) {
print "$1\n";
}

Each time it starts looking for the next match where the last one 
left off:

$s = "foo brkshr fftr ty fgh fss";
while ($s =~ /f(..)/g) {
print "$1\n";
} 

As you can see, it found foo, fft, fgh and fss (printing oo, ft, gh 
and ss), but skipped ftr because it starts before the end of the 
previous match.
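
To see where each match starts and where the next search resumes, one can record $-[0] (start of the last match) and pos() (where the next /g search begins) inside the loop. This is only a sketch; the array name @found and the "start-end:text" format are assumptions of the example, not something from the thread.

```perl
use strict;
use warnings;

my $s = "foo brkshr fftr ty fgh fss";

my @found;
while ($s =~ /f(..)/g) {
    # $-[0] is the offset where the whole match began,
    # pos($s) is the offset where the next search will start.
    push @found, $-[0] . "-" . pos($s) . ":f" . $1;
}

print "$_\n" for @found;
```

The "fft" match runs from offset 11 to 14, so the search resumes at the "r" and the "ftr" that begins at offset 12 is never tried.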

That's why I need the (?=). This instructs the regexp engine to check 
that the regexp inside the parentheses matches at this point, but to 
keep the pointer at the same place:

$s = "faaf bar fbbfccf";
while ($s =~ /f(..)(?=f)/g) {
print "$1\n";
} 
vs.
$s = "faaf bar fbbfccf";
while ($s =~ /f(..)f/g) {
print "$1\n";
} 
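
Run side by side, the two patterns differ in how many captures they return. The sketch below uses list context, where /g returns all captures at once; the variable names are invented for the example.

```perl
use strict;
use warnings;

my $s = "faaf bar fbbfccf";

# The closing f is only checked, not consumed, so it can start the next match.
my @lookahead = $s =~ /f(..)(?=f)/g;

# The closing f is consumed, so "fccf" cannot reuse the f that ended "fbbf".
my @consuming = $s =~ /f(..)f/g;

print "with (?=f): @lookahead\n";
print "with f:     @consuming\n";
```

The lookahead version should report aa, bb and cc, while the consuming version stops after aa and bb.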

The regexp I gave you was unnecessarily complex. With /g the regexp 
automatically starts where it left off the last time, so I do not 
need the \G.*? and can write it as:

while ($s =~ /(?=sas)./g) { 
print "pos=",pos($s)-1, " = '",substr($s,pos($s)-1,3),"'\n"; 
}

and it will find exactly the same positions.

And it seems the . at the end of the regexp and the -1 subtracted 
from the pos($s) are not needed either.

Which means it's actually much easier than I had you believe:

$s = "sasas dfgfgh asasas asedsase"; 
while ($s =~ /(?=sas)/g) { 
print "pos=",pos($s), " = '",substr($s,pos($s),3),"'\n"; 
}
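
The substr($s,pos($s),3) in the loop above works because "sas" is always three characters long. For a pattern of variable length, one possible approach (an assumption of this sketch, not something proposed in the thread) is to put a capture group inside the lookahead: the match itself stays empty, but $1 holds the matched text and $-[1] its start offset. The pattern s\w*?s is only an illustrative stand-in.

```perl
use strict;
use warnings;

my $s = "sasas dfgfgh asasas asedsase";

my @hits;
# The capture group sits inside the lookahead, so nothing is consumed
# and overlapping candidates are still tried at every offset.
while ($s =~ /(?=(s\w*?s))/g) {
    push @hits, $-[1] . ":" . $1;    # start offset and captured text
}

print "$_\n" for @hits;
```

Besides the five "sas" hits this also picks up the longer "seds" at offset 21, which a fixed length of 3 would have cut short.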

With the \G.*? I had to use the . at the end of the regexp to make 
sure the pointer gets moved just after the first character matched by 
the regexp; without the \G.*? the pointer gets moved automatically.

Try

$s = "sasas dfgfgh asasas asedsase"; 
while ($s =~ /\G.*?(?=sas)/g) { 
print "pos=",pos($s), " = '",substr($s,pos($s),3),"'\n"; 
}

As you can see, it returns most matches twice. The reason is that Perl 
moves the pointer by as many characters as were matched by the 
complete regexp, or by one character if the match was zero length 
(keep in mind that the stuff in (?=) doesn't count!).

So in the string the first match was "" at the very beginning of the 
string and the pointer was moved one char:
s^asas dfgfgh asasas asedsase
next match was "a" preceding the second "sas" and the pointer was 
moved one character to
sa^sas dfgfgh asasas asedsase
next match was empty and the pointer was moved one char:
sas^as dfgfgh asasas asedsase
the next match was "as dfgfgh a" and the pointer was moved to:
sasas dfgfgh a^sasas asedsase
the next match is again empty and the pointer is moved to:
sasas dfgfgh as^asas asedsase
and so forth.

If we do not include the \G.*? we do not match the strings in 
between, so the match is always empty and sits just before the stuff 
we looked for, and we always move the pointer just after the first 
character of that stuff.
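
The doubled positions can be checked by collecting pos() for both patterns. This is a sketch with made-up array names (@with_G, @plain). Note that perlre words the rule a little differently from the description above: after a zero-length match, the next match is not allowed to be zero-length at the same position. The positions reported by pos() come out the same under either reading.

```perl
use strict;
use warnings;

my $s = "sasas dfgfgh asasas asedsase";

# With \G.*? each "sas" after the first is reached once by a non-empty
# match (the skipped text) and once more by an empty match at the same spot.
my @with_G;
while ($s =~ /\G.*?(?=sas)/g) {
    push @with_G, pos($s);
}

# Without \G.*? every match is empty, so each offset is reported once.
my @plain;
while ($s =~ /(?=sas)/g) {
    push @plain, pos($s);
}

print "with \\G.*?: @with_G\n";
print "plain:      @plain\n";
```

Only offset 0 shows up once in the first list, since nothing precedes it that could be matched as skipped text.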

Humpf, not sure I'm still making sense.

HTH, Jenda







 




Re: starting position of RE match

2004-09-23 Thread Robert Citek
On Wednesday, Sep 22, 2004, at 08:05 US/Central, Jenda Krynicky wrote:
> Which means it's actually much easier than I had you believe:
> $s = "sasas dfgfgh asasas asedsase";
> while ($s =~ /(?=sas)/g) {
> print "pos=",pos($s), " = '",substr($s,pos($s),3),"'\n";
> }

Based on your example, I was able to transform this:
$ perl -e '$foo="baaaab"; $re="aa" ;
   for ($bar=index($foo, $re); $bar >= 0 ; $bar=index($foo, $re, $bar+1))
 { print $bar, "\t" }
   print "\n" ; '

into this:
$ perl -e '$foo="baaaab"; $re=qr/(?=aa)/ ;
   while($foo =~ /$re/g) { print pos($foo), "\t" }
   print "\n" ; '
Works exactly as I had hoped, and I understand this one.
Will study your other examples with \G.  I still don't understand 
those.  Will probably just take a little time and experimenting.
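
For reuse, the one-liner could be wrapped in a small subroutine that behaves like a regex-aware "index" returning every start offset. This is a sketch only: the name match_starts is hypothetical, and the sample string "baaaab" is assumed for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns every offset at which
# $re matches in $str, including overlapping matches.
sub match_starts {
    my ($str, $re) = @_;
    my @starts;
    # Wrapping $re in a lookahead keeps each match empty, so the search
    # resumes one character further on rather than after the matched text.
    while ($str =~ /(?=$re)/g) {
        push @starts, pos($str);
    }
    return @starts;
}

my @exact = match_starts("baaaab", qr/aa/);     # same job as the index() loop
my @regex = match_starts("baaaab", qr/a+b/);    # a real regular expression

print "aa:  @exact\n";
print "a+b: @regex\n";
```

Because the caller passes a plain qr// object, the lookahead wrapper stays an internal detail of the helper.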

Thanks for your help.
Regards,
- Robert
OpenSource for Windows, Linux, and Mac OS/X
http://www.cwelug.org/downloads