Well, some of it depends upon how consistent your markers are:
> $temp= "XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB"
> I need to write a regex for filtering out the string between.
> AAA
> BBB
> CCC
> so in the above case I should have the output as:
> AAAZZZZBBB
> BBBSSSSCCC
> CCCGGGGBBB
> BBBVVVVVBBB
> meaning all combinations of start and end for AAA BBB CCC.
So you want the markers and what's between them - will there always be a
begin/end set of markers, but just of different content?
> I have the regex for one of them but how do I do it simultaneously for
> all 3 of them.
> $temp='XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
> @t = ($temp =~/(AAA)(.*?)(BBB)/g);
> foreach (@t)
> {
>     print $_;
> }
So, use alternation to create marker sets (note: you need to add "\n"
to the end of your print statements or it'll all run together, which makes
it seem like it's working ... sort of):
my $temp='XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
my @t = ($temp =~/(AAA|BBB|CCC)(.*?)(AAA|BBB|CCC)/g);
foreach (@t) {
    print "Got: ", $_, "\n";
}
That sort of works - it gets:
Got: AAA
Got: ZZZZ
Got: BBB
Got: CCC
Got: GGGG
Got: BBB
But you want to capture the whole shebang, so we use both the capture
parens and, because we're using the alternation pipe "|", the non-capturing
parens (which are "(?:...)") to group our alternatives:
my $temp='XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
my @t = ($temp =~/((?:AAA|BBB|CCC).*?(?:AAA|BBB|CCC))/g);
foreach (@t) {
    print "Got: ", $_, "\n";
}
Got: AAAZZZZBBB
Got: CCCGGGGBBB
But this isn't quite right, as it's not 'reusing' the end marker of one
match as the beginning marker of the next. This gets trickier: you want to
restart the match at the end marker of the previous match, not just after
it. First, let's go to the cool
    while ( /.../g ) {
loop - note the change to '$1' in the print:
my $temp='XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
while( $temp =~/((?:AAA|BBB|CCC).*?(?:AAA|BBB|CCC))/g) {
    print "Got: ", $1, "\n";
}
Got: AAAZZZZBBB
Got: CCCGGGGBBB
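To see why the BBB...CCC span gets skipped, it may help to print where each
match ends. This is only a sketch: match_offsets is a made-up helper name,
and it relies on Perl's built-in pos(), which reports the offset where the
next /g search on that string will start.

```perl
use strict;
use warnings;

# Return [matched text, pos() after the match] for each non-overlapping span.
# match_offsets is a hypothetical helper name used only for this sketch.
sub match_offsets {
    my ($text) = @_;
    my @found;
    while ( $text =~ /((?:AAA|BBB|CCC).*?(?:AAA|BBB|CCC))/g ) {
        push @found, [ $1, pos($text) ];
    }
    return @found;
}

for my $m ( match_offsets('XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB') ) {
    print "Got: $m->[0] (next search starts at offset $m->[1])\n";
}
```

The first match ends at offset 14, just past the BBB at offsets 11-13, so
that BBB is already consumed and can't open the next span.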
Er, I have to go here, but I think the proper bump-along/reset code might
be in this article:
http://www.samag.com/documents/s=10118/sam0703i/0703i.htm
Nope. Dang. I'll have to find it. The \G anchor marks the point where the
last match ended when you're doing a global "/g" match. The pos()
function returns the location of the current \G, and you can assign to it
to reset that. Something like:
my $temp='XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
while( $temp =~/((?:AAA|BBB|CCC).*?(?:AAA|BBB|CCC))/g) {
    my $pos = pos $temp;    # offset where this match ended
    print "Got ($pos):", $1, "\n";
    pos($temp) -= 3;        # back up over the 3-char end marker
}
Got (14):AAAZZZZBBB
Got (21):BBBSSSSCCC
Got (28):CCCGGGGBBB
Got (36):BBBVVVVVBBB
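The "-= 3" above only works because every marker happens to be three
characters long. If the markers could differ in length, one way around it
(a sketch, not tested beyond this example) is to capture the end marker and
back up by its length. The helper name overlapping_spans and its argument
list are made up for illustration, and it assumes the markers are literal,
non-empty strings:

```perl
use strict;
use warnings;

# overlapping_spans is a hypothetical helper name for this sketch.
# $text is the string to scan; @markers are literal marker strings.
sub overlapping_spans {
    my ( $text, @markers ) = @_;
    # Escape each marker so it is matched literally, then join with "|".
    my $alt = join '|', map { quotemeta($_) } @markers;
    my @spans;
    while ( $text =~ /((?:$alt).*?($alt))/g ) {
        push @spans, $1;
        # Back up over the end marker ($2) so it can begin the next span.
        pos($text) -= length $2;
    }
    return @spans;
}

print "Got: $_\n" for overlapping_spans(
    'XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB', qw(AAA BBB CCC) );
```

With the sample string this prints the same four spans as the loop above.
If one marker is a prefix of another, list the longer one first, since the
alternation tries them left to right.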
a
Andy Bach
Systems Mangler