Thanks Rob and Chas ..

On 5/30/07, Rob Dixon <[EMAIL PROTECTED]> wrote:
Sharan Basappa wrote:
>
> Hi All,
>
> I have some background working with scanners built from Flex, and I have
> used the lookahead capability of Flex many times. But I don't understand the
> meaning of ZERO in the zero-width lookahead match rule, i.e. (?=pattern)
>
> For example, to capture overlapping 3 digit patterns from string $str =
> 123456
> I use the regex @store = $str =~ m/(?=(\d\d\d))/g;
> So here the regex engine actually looks ahead by three digits.

As far as lookahead expressions are concerned, Perl works the same way as
Flex. It is called zero-width lookahead because it matches a zero-width
/position/ in the string instead of a sequence of characters. If I write

    '123456' =~ /\d\d\d(...)/

then '456' will be captured, as the first three characters were consumed by
the preceding pattern. However, if I write

    '123456' =~ /(?=\d\d\d)(...)/

then '123' will be captured instead, because the lookahead pattern has zero
width and consumes nothing.

> The other question I have is - how does the regex engine decide that it has to
> move its scanner further by 1 character every time, since I get output 123
> 234
> 345 456
> when I run this script ?

The engine moves as far through your target string as it needs to in order
to find a new match. If I write

    '1B3D5F' =~ /(?=(.\d.))/g;

then the engine will find a match at only every second character, and if I
use a much simpler zero-width match, just

    'ABCDEF' =~ //g

then the regex will match seven times - at the beginning and end and between
every pair of characters - so the more complex zero-width match you have
written will match at all of those places as long as there are three digits
following.

HTH,

Rob
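
For anyone who wants to check the examples above, here is a minimal sketch that puts them into one script. The variable names ($consumed, $peeked, @positions, $count) are only illustrative. One assumption to flag: I have written the empty match as /(?:)/g rather than //g, because in Perl a literally empty pattern reuses the last successfully matched regex, so //g would not act as a plain empty pattern once earlier matches in the same script have succeeded.

```perl
use strict;
use warnings;

# Overlapping 3-digit captures: the capture group sits inside the
# lookahead, so each match has zero width and the engine can retry
# from the next position.
my $str   = '123456';
my @store = $str =~ /(?=(\d\d\d))/g;
print "@store\n";

# A consuming prefix versus a zero-width prefix.
my ($consumed) = '123456' =~ /\d\d\d(...)/;      # prefix eats '123'
my ($peeked)   = '123456' =~ /(?=\d\d\d)(...)/;  # prefix eats nothing
print "$consumed $peeked\n";

# Record the offsets at which the zero-width match succeeds.
# $-[0] is the start offset of the most recent successful match.
my $mixed = '1B3D5F';
my @positions;
while ($mixed =~ /(?=(.\d.))/g) {
    push @positions, $-[0];
}
print "@positions\n";

# An always-matching zero-width pattern: count how often it matches.
# (?:) is used instead of // to avoid the "last successful pattern" rule.
my $count = () = 'ABCDEF' =~ /(?:)/g;
print "$count\n";
```

The loop over $mixed shows the point about the engine moving only as far as it needs to: nothing forces a step of one character, the matches simply land wherever the lookahead succeeds.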