Well, some of it depends upon how consistent your markers are:

$temp= "XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB"

> I need to write a regex for filterin out the string between.
AAA
BBB
CCC

> so in the above case i should have the output as:
AAAZZZZZBBB
BBBSSSSSSCCC
CCCGGGGBBB
BBBVVVVVBBB
> meaning all combinations of start and end for AAA BBB CCC.

So you want the markers and what's between them - will there always be a 
begin/end set of markers, but just of different content?


> I have the regex for one of them but how do i do it simultaneously for
> all 3 of them.

$temp='XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';

 @t = ($temp =~/(AAA)(.*?)(BBB)/g);
 foreach (@t)
 {

 print $_;

 }

So, use alternation to create marker sets (note: you need to add "\n" 
to the end of your print statements or it'll all run together, which makes 
it seem like it's working ... sort of):

my $temp='XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
my @t = ($temp =~/(AAA|BBB|CCC)(.*?)(AAA|BBB|CCC)/g);
foreach (@t) {
     print "Got: ", $_, "\n";
} 

Sort of works - it gets:
Got: AAA
Got: ZZZZ
Got: BBB
Got: CCC
Got: GGGG
Got: BBB
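
The pieces come out on separate lines because a /g match in list context 
hands back every capture group from every match as one flat list. A rough 
sketch (the names $start, $middle and $end are just mine) that pulls them 
back off three at a time:

```perl
use strict;
use warnings;

my $temp = 'XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';

# List context + /g: returns ($1, $2, $3) for each match, all in one flat list.
my @t = ($temp =~ /(AAA|BBB|CCC)(.*?)(AAA|BBB|CCC)/g);
my @joined;

# Take the captures back off the list in groups of three.
while (my ($start, $middle, $end) = splice(@t, 0, 3)) {
    push @joined, $start . $middle . $end;
}

print "Got: $_\n" for @joined;
```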

You want to capture the whole shebang - so we use both the capture parens 
and, because we're using the alternation pipe "|", the non-capturing 
parens (which are "(?:....)") to group our alternatives:
my $temp='XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
my @t = ($temp =~/((?:AAA|BBB|CCC).*?(?:AAA|BBB|CCC))/g);
foreach (@t) {
     print "Got: ", $_, "\n";
} 

Got: AAAZZZZBBB
Got: CCCGGGGBBB
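
If your marker list is likely to change, one option is to build the 
alternation from an array rather than typing it twice. A minimal sketch, 
assuming the markers are plain literal strings; marker_regex() is a made-up 
helper name, not anything built in:

```perl
use strict;
use warnings;

# Hypothetical helper: build one compiled pattern from a list of literal markers.
sub marker_regex {
    my @markers = @_;
    # quotemeta escapes any regex metacharacters a marker might contain.
    my $alt = join '|', map { quotemeta } @markers;
    # One capturing group around "marker, shortest stuff, marker".
    return qr/((?:$alt).*?(?:$alt))/;
}

my $temp = 'XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
my $re   = marker_regex(qw(AAA BBB CCC));
my @t    = ($temp =~ /$re/g);
print "Got: $_\n" for @t;
```

This has the same non-overlapping behaviour as the hard-coded pattern. If 
one marker can be a prefix of another, you'd probably want to sort the 
longer ones first.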

But this isn't quite right, as it's not 'reusing' the end marker of one 
match as the beginning of the next.  This gets trickier: you want to restart 
the match at the end marker of the previous match, not just after it. First, 
let's go to the cool "while ( /.../g ) { ... }" loop - note the change to 
'$1' in the print:
my $temp='XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
while( $temp =~/((?:AAA|BBB|CCC).*?(?:AAA|BBB|CCC))/g) {
     print "Got: ", $1, "\n";
} 

Got: AAAZZZZBBB
Got: CCCGGGGBBB

Er, I have to go here but I think the proper bump along/reset code might 
be in this article:

http://www.samag.com/documents/s=10118/sam0703i/0703i.htm

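Nope. Dang. I'll have to find it.  The \G assertion matches at the point 
where the last match left off when you're doing a global "/g" match in 
scalar context. The "pos()" function returns that offset, and you can 
assign to it to reset it. Something like this (the "- 3" backs up over the 
3-character end marker so it can start the next match):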
my $temp='XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
while( $temp =~/((?:AAA|BBB|CCC).*?(?:AAA|BBB|CCC))/g) {
     my $pos = pos $temp;    # offset just past the end of this match
     print "Got ($pos):", $1, "\n";
     pos($temp) -= 3;        # back up over the 3-char end marker
}

Got (14):AAAZZZZBBB
Got (21):BBBSSSSCCC
Got (28):CCCGGGGBBB
Got (36):BBBVVVVVBBB
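
The "-= 3" only works because every marker happens to be 3 characters long. 
Another way to get the same effect, without touching pos() by hand, is to 
check for the end marker with a zero-width lookahead so it's never consumed 
in the first place. A sketch under that assumption; marker_spans() is a 
made-up helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: return every "marker ... marker" span, letting the
# end marker of one span also serve as the start marker of the next.
sub marker_spans {
    my ($text, @markers) = @_;
    my $alt = join '|', map { quotemeta } @markers;
    my @spans;
    # (?=...) peeks at the end marker without consuming it, so the next
    # /g iteration begins right at that marker. $2 holds what was peeked.
    while ($text =~ /((?:$alt).*?)(?=($alt))/g) {
        push @spans, $1 . $2;
    }
    return @spans;
}

my $temp = 'XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
print "Got: $_\n" for marker_spans($temp, qw(AAA BBB CCC));
```

For the sample string this should give the same four spans as the pos() 
loop, and it shouldn't care whether the markers are all the same length.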

a

Andy Bach
Systems Mangler
Internet: [EMAIL PROTECTED]
VOICE: (608) 261-5738  FAX 264-5932

"Procrastination is like putting lots and lots of commas in the sentence 
of your life."
Ze Frank 
http://lifehacker.com/software/procrastination/ze-frank-on-procrastination-235859.php