Hi all,
Suppose I have a string ($str) and also the number of maximum mismatch
position is given ($d).
The string may come in two types first is normal string the other is
ambiguous.
$str = 'TTTCGG'; # (length 6)
$str_amb = '[TA]TTCGG'; # Ambiguous (length 6)
$d = 2;
What I intend to do is to find all the string of of length 6 that has
maximum Hamming Distance
(number of mismatches) is less or equal to $d. These strings are
constructed with bases [ATCG].
I already have a brute-force way to do it. That is to pre-generate as many
as 4^l, all the strings of length $l, and then find the neighbours from
there.
But this way is way too time consuming to do it. Since there are many many
strings to test. And also the length of the string is around 12-20
characters. Can anybody advice what's the best way to go about it?
Here is the bruteforce way to do it.
__BEGIN__
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use Carp;
my $str = '[TA]TTCGG';
$str_amb = '[TA]TTCGG'; # Ambiguous (length 6)
my $e = 2;
find_nbe($str,$e);
sub find_nbe {
my ($str,$d) = @_;
my $l = 6;
my @nucs = qw/A T C G/;
my @enum = enum( $l, [EMAIL PROTECTED] );
foreach my $oligo (@enum) {
my $mm_val =(mm_and_len( $str, $oligo))[0];
# Neighbours shouldn't be itself
if ( $mm_val <= $d && $mm_val > 0 ) {
print "$oligo - $mm_val\n";
}
}
return;
}
sub mm_and_len {
# Compute mismatch count with and without ambigous
# position and return length of source string
# Target is assumed to be never ambigous
my ( $source_, $target ) = @_;
my $source = $source_;
$source =~ s/N/\[ATCG\]/g;
my @sparts = ( $source =~ /(\[.*?\]|.)/g );
my $mismatch_count
= scalar grep substr( $target, $_, 1 ) !~ /^$sparts[$_]/,
0 .. $#sparts;
return ( $mismatch_count, scalar(@sparts) );
}
sub enum {
return @{ $_[1] } unless --$_[0];
map {
my $nuc = $_;
map { $nuc . $_ } @{ $_[1] }
} enum( $_[0], $_[1] );
}
__END__
--
Regards,
Edward WIJAYA
SINGAPORE
--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>