>>   $dna =~ m{
>>     (?=
>>       tag
>>       (?:
>>         .*? tag
>>         # the substr(...) is there to avoid using $&
>>         (?{ push @matches, substr($dna, $-[0], $+[0] - $-[0]) })
>>       )+
>>     )
>>     (?!)
>>   }x;

First of all, I haven't benchmarked this, and I had considered using the
index() and substr() approach that J. Krahn demonstrated.

The regex uses (?= ... ) to look ahead, so it can match stuff without
consuming it.  Here's an example of what I mean:  if I have a string
"ABCADEFA", and I want all chunks of "A...A", and the regex actually
CONSUMES the "ABCADEFA", then the next match has to start after the last A,
meaning I've missed the embedded "ADEFA" chunk.  By using a look-ahead, I can
match text while staying where I am in the string.  Compare:

  print "japhy" =~ /(..)/g;

with

  print "japhy" =~ /(?=(..))/g;
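
To make the difference easier to see, here is a small sketch that stores
the results of both matches in arrays before printing them.  The array
names @consumed and @overlapping are made up for this illustration.

```perl
use strict;
use warnings;

# Without a look-ahead, each match consumes its two characters,
# so the next match starts two characters further along.
my @consumed = "japhy" =~ /(..)/g;          # ("ja", "ph")

# Inside a look-ahead nothing is consumed, so the engine only
# advances one character between matches.
my @overlapping = "japhy" =~ /(?=(..))/g;   # ("ja", "ap", "ph", "hy")

print "@consumed\n";
print "@overlapping\n";
```

The second list contains the overlapping pairs that the first one skips.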

Next, to get all the "tag...tag" chunks of varying lengths, I use

  /tag(?:.*?tag)+/

which matches "tagAtag", "tagAtagBtag", "tagAtagBtagCtag", and so on.
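
A single ordinary match only ever reports one of those candidates, though.
As a rough sketch (the sample string and the names $longest and $shortest
are assumptions for illustration), the greedy + reports the longest chunk
from a given starting point, and a non-greedy +? reports the shortest:

```perl
use strict;
use warnings;

my $text = "tagAtagBtagCtag";

# The greedy + takes as many ".*? tag" repetitions as it can,
# so the match runs to the last "tag".
my ($longest) = $text =~ /(tag(?:.*?tag)+)/;

# A non-greedy +? stops after the first repetition instead.
my ($shortest) = $text =~ /(tag(?:.*?tag)+?)/;

print "$longest\n";    # tagAtagBtagCtag
print "$shortest\n";   # tagAtag
```

Neither form reports the in-between "tagAtagBtag", which is why something
more is needed to collect every chunk.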

The real magic is the code block (?{ ... }) that does the dirty work.
First of all, substr($dna, $-[0], $+[0] - $-[0]) is just a way of
accessing $& without incurring the penalties associated with it.  So let's
just use $& for now.  The code (push @matches, $&) is executed at every
point where the regex has matched up to an occurrence of "tag", so in

  tagTHIStagTHATtagTHOSEtag

it'll happen at:

  tagTHIStag X
  tagTHIStagTHATtag X
  tagTHIStagTHATtagTHOSEtag X
         tagTHATtag X
         tagTHATtagTHOSEtag X
                tagTHOSEtag X

those six locations.  The last thing in the regex is the (?!), which is a
negative look-ahead for nothing, which ALWAYS fails.  This forces the
match to fail at every starting position, so the engine keeps moving on to
the next one and I get all the matches.
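
Putting the pieces together, here is a self-contained sketch that runs the
regex against the sample string from the walkthrough.  It assumes a
reasonably recent perl (one where a literal code block can see lexical
variables), and the overall match is expected to fail, so only the side
effect on @matches matters.

```perl
use strict;
use warnings;

my $dna = "tagTHIStagTHATtagTHOSEtag";
my @matches;

$dna =~ m{
  (?=
    tag
    (?:
      .*? tag
      # record the text matched so far
      (?{ push @matches, substr($dna, $-[0], $+[0] - $-[0]) })
    )+
  )
  (?!)   # always fails, so every starting position gets tried
}x;

print "$_\n" for @matches;
```

This should collect the six chunks listed above, in that order.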

-- 
Jeff "japhy" Pinyan      [EMAIL PROTECTED]      http://www.pobox.com/~japhy/
RPI Acacia brother #734   http://www.perlmonks.org/   http://www.cpan.org/
** Look for "Regular Expressions in Perl" published by Manning, in 2002 **
<stu> what does y/// stand for?  <tenderpuss> why, yansliterate of course.
[  I'm looking for programming work.  If you like my work, let me know.  ]

