>> $dna =~ m{
>>   (?=
>>     tag
>>     (?:
>>       .*? tag
>>       # the substr(...) is there to avoid using $&
>>       (?{ push @matches, substr($dna, $-[0], $+[0] - $-[0]) })
>>     )+
>>   )
>>   (?!)
>> }x;
First of all, I haven't benchmarked this, and I had thought of using the
index() and substr() approach that J. Krahn demonstrated.
The regex uses (?= ... ) to look ahead, so it can match stuff without
consuming it. Here's an example of what I mean: if I have a string
"ABCADEFA", and I want all chunks of "A...A", if the regex actually
CONSUMES the "ABCADEFA", then the next match has to start after the last
A, meaning I've missed the embedded "ADEFA" chunk. By using a look-ahead,
I can match text while staying where I am in the string. Compare:
  print "japhy" =~ /(..)/g;
with
  print "japhy" =~ /(?=(..))/g;
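To make the difference concrete, here is a small sketch that collects both
result lists instead of printing them directly. The array names are only
for illustration, and the second pair of matches assumes that "A...A" is
written as A.*?A.

```perl
use strict;
use warnings;

# Consuming match: each match eats two characters, so the next
# attempt starts after them.
my @consumed = "japhy" =~ /(..)/g;
print "@consumed\n";          # ja ph

# Look-ahead: the capture sits inside (?=...), so nothing is consumed
# and the next attempt starts one character further along.
my @overlapping = "japhy" =~ /(?=(..))/g;
print "@overlapping\n";       # ja ap ph hy

# The same idea applied to the "ABCADEFA" string.
my @chunks_consumed = "ABCADEFA" =~ /(A.*?A)/g;
print "@chunks_consumed\n";   # ABCA

my @chunks_lookahead = "ABCADEFA" =~ /(?=(A.*?A))/g;
print "@chunks_lookahead\n";  # ABCA ADEFA
```

Note that even the look-ahead version reports only one chunk per starting
position, so the full "ABCADEFA" is still missing from the last list.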
Next, to get all the "tag...tag" chunks of varying lengths, I use
  /tag(?:.*?tag)+/
which matches "tagAtag", "tagAtagBtag", "tagAtagBtagCtag", and so on.
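A quick sketch of that pattern on its own, using a made-up sample string:
because the + is greedy, a plain match reports only the longest chunk, and
changing it to +? reports only the shortest. The in-between lengths are
valid matches too, but an ordinary match never reports them.

```perl
use strict;
use warnings;

my $sample = "tagAtagBtagCtag";    # hypothetical input

# Greedy +: as many ".*? tag" repetitions as possible.
my ($longest) = $sample =~ /(tag(?:.*?tag)+)/;
print "$longest\n";     # tagAtagBtagCtag

# Non-greedy +?: stop after the first repetition.
my ($shortest) = $sample =~ /(tag(?:.*?tag)+?)/;
print "$shortest\n";    # tagAtag

# Each of the three lengths is accepted when tested by itself.
my @accepted = grep { /\Atag(?:.*?tag)+\z/ }
    ("tagAtag", "tagAtagBtag", "tagAtagBtagCtag");
print scalar(@accepted), "\n";    # 3
```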
The real magic is the code block (?{ ... }) that does the dirty work.
First of all, substr($dna, $-[0], $+[0] - $-[0]) is just a way of
accessing $& without incurring the performance penalty associated with it.
So let's just use $& for now. The code (push @matches, $&) is executed
every time the regex has matched up to an occurrence of "tag", so in
  tagTHIStagTHATtagTHOSEtag
it'll happen at:
  tagTHIStag X
  tagTHIStagTHATtag X
  tagTHIStagTHATtagTHOSEtag X
         tagTHATtag X
         tagTHATtagTHOSEtag X
                tagTHOSEtag X
those six locations. The last thing in the regex is the (?!), which is a
negative look-ahead for nothing, which ALWAYS fails. This forces the
regex to give up at each starting position and retry from the next one,
so I get all the matches.
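Putting the pieces together, a runnable sketch on the example string might
look like the following. The value of $dna is the sample from the
walkthrough above, and the sketch relies on $-[0] and $+[0] being usable
inside the code block, as the original snippet does. The comments inside
the pattern deliberately avoid sigils, since text in an /x comment is
still scanned for variables.

```perl
use strict;
use warnings;

my $dna = "tagTHIStagTHATtagTHOSEtag";
my @matches;

$dna =~ m{
    (?=
        tag
        (?:
            .*? tag
            # record the text from the start of this attempt up to here
            (?{ push @matches, substr($dna, $-[0], $+[0] - $-[0]) })
        )+
    )
    (?!)    # always fails, so every starting position gets tried
}x;

print "$_\n" for @matches;
# Should print the six chunks, in the order of the list above:
#   tagTHIStag
#   tagTHIStagTHATtag
#   tagTHIStagTHATtagTHOSEtag
#   tagTHATtag
#   tagTHATtagTHOSEtag
#   tagTHOSEtag
```

The overall match always fails, so the return value of the m{...}x is not
useful; everything of interest ends up in @matches.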
--
Jeff "japhy" Pinyan [EMAIL PROTECTED] http://www.pobox.com/~japhy/
RPI Acacia brother #734 http://www.perlmonks.org/ http://www.cpan.org/
** Look for "Regular Expressions in Perl" published by Manning, in 2002 **
<stu> what does y/// stand for? <tenderpuss> why, yansliterate of course.
[ I'm looking for programming work. If you like my work, let me know. ]