Sharan Basappa wrote:
Hi All,
I have some background working with scanners built from Flex, and I have
used the lookahead capability of Flex many times. But I don't understand the
meaning of ZERO in the zero-width lookahead match rule, i.e. (?=pattern).
For example, to capture overlapping 3-digit patterns from the string
$str = 123456, I use the regex
@store = $str =~ m/(?=(\d\d\d))/g;
So here the regex engine actually looks ahead by three digits.
As far as lookahead expressions are concerned, Perl functions identically to
Flex. It is called zero-width lookahead because it matches a zero-width
/position/ in the string instead of a sequence of characters. If I write
'123456' =~ /\d\d\d(...)/
then '456' will be captured, because the first three characters were consumed
by the preceding pattern. However, if I write
'123456' =~ /(?=\d\d\d)(...)/
then '123' will be captured instead, because the lookahead pattern has zero
width: it checks that three digits follow but consumes none of them.
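
To make the difference concrete, here is a small sketch of the two matches
above run side by side. The variable names are only illustrative; the last
check uses the built-in @- and @+ arrays to show that the lookahead on its own
matches a span of width zero.

```perl
use strict;
use warnings;

# Consuming pattern: \d\d\d eats '123', so (...) captures the next three characters.
my ($after_consuming) = '123456' =~ /\d\d\d(...)/;

# Zero-width lookahead: (?=\d\d\d) only checks that three digits follow,
# so (...) still starts at the same position.
my ($after_lookahead) = '123456' =~ /(?=\d\d\d)(...)/;

# The lookahead alone matches a position, not characters:
# the match starts ($-[0]) and ends ($+[0]) at the same offset.
'123456' =~ /(?=\d\d\d)/ or die "no match";
my $width = $+[0] - $-[0];

print "consuming: $after_consuming\n";   # 456
print "lookahead: $after_lookahead\n";   # 123
print "width:     $width\n";             # 0
```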
The other question I have is: how does the regex engine decide that it has to
move its scanner forward by one character each time? I get the output
123 234 345 456
when I run this script.
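The engine moves as far through your target string as it needs in order to
find a new match. After a zero-width match it will not match again at the same
position, so it retries from the next character. If I write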
'1B3D5F' =~ /(?=(.\d.))/g;
then the engine will find a match only at every second character (capturing
'B3D' and 'D5F'). If I use a much simpler zero-width match, just
'ABCDEF' =~ //g
then the regex will match seven times: at the beginning, at the end, and
between every pair of characters. (This assumes no earlier pattern has matched,
since Perl reuses the last successful pattern when given an empty one.) So the
more complex zero-width match you have written will match at all of those
places, as long as there are three digits following.
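
A short sketch that runs these cases together may help. It writes the empty
match as /(?:)/ rather than //, because by this point in the script earlier
patterns have already matched and a literally empty pattern would reuse the
last successful one. The while loop and the variable names are additions for
illustration; pos() reports where each zero-width match was found.

```perl
use strict;
use warnings;

# Overlapping three-digit windows: the lookahead succeeds at offsets 0..3.
my @windows = '123456' =~ /(?=(\d\d\d))/g;

# Only offsets 1 and 3 have a digit as the middle character.
my @sparse = '1B3D5F' =~ /(?=(.\d.))/g;

# An empty pattern matches at every position in the string.
# (?:) is used instead of // to avoid Perl's "reuse last pattern" rule.
my $count = () = 'ABCDEF' =~ /(?:)/g;

# Record the offset of each zero-width match. After an empty match
# the engine will not match again at the same offset, so it moves on.
my @positions;
my $str = '123456';
while ($str =~ /(?=\d\d\d)/g) {
    push @positions, pos($str);
}

print "@windows\n";     # 123 234 345 456
print "@sparse\n";      # B3D D5F
print "$count\n";       # 7
print "@positions\n";   # 0 1 2 3
```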
HTH,
Rob