Thanks Rob and Chas ..

On 5/30/07, Rob Dixon <[EMAIL PROTECTED]> wrote:

Sharan Basappa wrote:
>
> Hi All,
>
> I have some background working with scanners built from Flex, and I have
> used the lookahead capability of Flex many times. But I don't understand
> the meaning of ZERO in the zero lookahead match rule, i.e. (?=pattern)
>
> For example, to capture overlapping 3 digit patterns from string
> $str = 123456 I use the regex @store = $str =~ m/(?=(\d\d\d))/g;
> So here the regex engine actually looks ahead by three digits.

As far as lookahead expressions are concerned, Perl behaves much like
Flex's trailing context. It is called zero-width lookahead because it
matches a zero-width /position/ in the string instead of a sequence of
characters. If I write

'123456' =~ /\d\d\d(...)/

then '456' will be captured, as the first three characters were consumed
by the preceding pattern. However, if I write

'123456' =~ /(?=\d\d\d)(...)/

then '123' will be captured instead, because the lookahead pattern has
zero width.
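
A minimal sketch of the two matches above, run side by side. The variable
names ($str, $after_consume, $after_lookahead) are only illustrative and
are not from the original thread.

```perl
use strict;
use warnings;

my $str = '123456';

# Consuming match: \d\d\d eats '123', so the capture group sees '456'.
my ($after_consume) = $str =~ /\d\d\d(...)/;

# Lookahead: (?=\d\d\d) only checks that three digits follow and consumes
# nothing, so the capture group starts at the same position and sees '123'.
my ($after_lookahead) = $str =~ /(?=\d\d\d)(...)/;

print "consuming: $after_consume\n";    # consuming: 456
print "lookahead: $after_lookahead\n";  # lookahead: 123
```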

> The other question I have is: how does the regex engine decide that it
> has to move its scanner forward by 1 character every time, since I get
> the output 123 234 345 456 when I run this script?

The engine moves as far through your target string as it needs to in order
to find a new match. If I write

'1B3D5F' =~ /(?=(.\d.))/g;

then the engine will find a match at only every second character, and if I
use a much simpler zero-width match, just

'ABCDEF' =~ //g

then the regex will match seven times - at the beginning and end and
between every pair of characters - so the more complex zero-width match
you have written will match at all of those places as long as there are
three digits following.
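
As a rough sketch, the match positions described above can be printed with
pos(). The variable names are only illustrative. One assumption to flag:
the empty match is written here as /(?:)/ rather than //, because Perl
treats a literally empty pattern as "reuse the last successfully matched
pattern", so // only behaves as an empty match when no earlier match has
succeeded.

```perl
use strict;
use warnings;

# Overlapping three-digit captures from the original question.
my @triples = '123456' =~ /(?=(\d\d\d))/g;
print "@triples\n";    # 123 234 345 456

# Record the offset of each zero-width match along with its capture.
my $mixed = '1B3D5F';
my @positions;
while ($mixed =~ /(?=(.\d.))/g) {
    push @positions, pos($mixed) . ":$1";
}
print "@positions\n";  # 1:B3D 3:D5F

# An explicit empty pattern matches at every position, including both ends.
my @empty = 'ABCDEF' =~ /(?:)/g;
print scalar(@empty), "\n";  # 7
```

After each zero-width match, the engine refuses to match an empty string
at the same position again, which is what forces it to step forward.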

HTH,

Rob

