Hi all,

Suppose I have a string ($str) and also the number of maximum mismatch position is given ($d). The string may come in two types first is normal string the other is ambiguous.


$str     = 'TTTCGG'; # (length 6)
$str_amb = '[TA]TTCGG'; # Ambiguous (length 6)
$d   = 2;

What I intend to do is to find all the string of of length 6 that has maximum Hamming Distance (number of mismatches) is less or equal to $d. These strings are constructed with bases [ATCG].

I already have a brute-force way to do it. That is to pre-generate as many as 4^l, all the strings of length $l, and then find the neighbours from there.

But this way is way too time consuming to do it. Since there are many many strings to test. And also the length of the string is around 12-20 characters. Can anybody advice what's the best way to go about it?

Here is the bruteforce way to do it.

__BEGIN__
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use Carp;
my $str = '[TA]TTCGG';
$str_amb = '[TA]TTCGG'; # Ambiguous (length 6)
my $e    = 2;

find_nbe($str,$e);

sub find_nbe {

    my ($str,$d) = @_;
    my $l = 6;
    my @nucs = qw/A T C G/;
    my @enum = enum( $l, [EMAIL PROTECTED] );

    foreach my $oligo (@enum) {
        my $mm_val =(mm_and_len( $str, $oligo))[0];

        # Neighbours shouldn't be itself
        if ( $mm_val <= $d && $mm_val > 0 ) {
            print "$oligo - $mm_val\n";
        }
    }

    return;
}

sub mm_and_len {

    # Compute mismatch count with and without ambigous
    # position and return length of source string
    # Target is assumed to be never ambigous

    my ( $source_, $target ) = @_;
    my $source = $source_;
    $source =~ s/N/\[ATCG\]/g;
    my @sparts = ( $source =~ /(\[.*?\]|.)/g );
    my $mismatch_count
        = scalar grep substr( $target, $_, 1 ) !~ /^$sparts[$_]/,
        0 .. $#sparts;
    return ( $mismatch_count, scalar(@sparts) );
}



sub enum {
    return @{ $_[1] } unless --$_[0];
    map {
        my $nuc = $_;
        map { $nuc . $_ } @{ $_[1] }
    } enum( $_[0], $_[1] );
}

__END__

--
Regards,
Edward WIJAYA
SINGAPORE

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>


Reply via email to