I haven't yet had the chance to thoroughly test and benchmark this against Ron Kimball's solution, so there's no summary to report yet (besides the fact that you and Ron Kimball both provided apparently reasonable solutions). I will also, as originally promised, package the "winning" solution into BioPerl's Bio::Tools::* namespace for general biosequence pattern matching. But not today, and probably not tomorrow. Beyond that, I cannot say.
-Aaron
On Mar 17, 2004, at 5:51 PM, John Douglas Porter wrote:
Here's my solution, which I sent to the OP off list. Funny how he never reported back with a summary, as is the custom....
# operates on $_
sub find_subsequences {
    #local $_ = shift; # if you'd rather pass it.
    my( $chars, $min_occurrences, $max_subseq_length, $results_ar ) = @_;
    $results_ar ||= [];
    my @p; # a sliding window of positions at which pattern matches.
    while ( /[$chars]/g ) {
        push @p, pos() - 1; # pos() is the offset just past the matched char.
        shift @p while @p > $min_occurrences; # slide the window
        if ( @p == $min_occurrences ) {
            my $len = $p[-1] - $p[0] + 1;
            if ( $len <= $max_subseq_length ) {
                my $subseq = substr $_, $p[0], $len;
                push @$results_ar, [ $p[0], $subseq ];
            }
        }
    }
    $results_ar
}
# test
my @chars = qw( A T C G );
$_ = '';
for ( my $i = 0; $i < 10_000; $i++ ) {
    $_ .= $chars[ rand @chars ];
}
use Tie::Array;
@Tie::ArrayPrint::ISA = qw( Tie::StdArray );
sub Tie::ArrayPrint::PUSH {
    my $self = shift;
    @_ == 1 && ref($_[0]) eq 'ARRAY'
        and print "\@ $_[0][0] ($_[0][1])\n";
    push @$self, @_;
}
my @subseqs;
#tie @subseqs, 'Tie::ArrayPrint';
find_subsequences( join('',@chars[0,1]), 7, 12, \@subseqs );
print scalar(@subseqs), " hits found.\n";
-- John Douglas Porter
