On Wednesday 25 February 2004 17:35, Henry Todd generously enriched virtual 
reality by making up this one:

Hi, 

> I'm having trouble counting the number of specific substrings within a
> string. I'm working on a bioinformatics coursework at the moment, so my
> string looks like this:
>
> $sequence = "caggaacttcccctcggaagaccatgta";
>
> I want to count the number of occurrences of each pair of letters, for
> example:
>
> Number of occurrences of "aa"
> Number of occurrences of "gt"
> Number of occurrences of "cc"
>
> This is how I'm counting the number of "cc" pairs at the moment ($cc is
> my counter variable):
>
> $cc++ while $sequence =~ /cc/gi;
>

As I understand biology, there are 4 nucleotides, which gives 4**2 = 16
combinations for doublets. So you would need 16 variables to count the
occurrences of all doublets. Worse for triplets (4**3 = 64).
As I understand genetics, triplets are what matters, since the ribosome
reads the mRNA in triplets (codons), each coding for an amino acid.
You might give me updates on my biol. knowledge:-)
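
A quick check of that arithmetic, as a minimal sketch: over an alphabet of 4
letters there are 4**k possible words of length k. The helper name all_words
is made up for this illustration and does not come from any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build every word of length $k over the given alphabet.
sub all_words {
    my ($k, @alphabet) = @_;
    my @words = ('');
    for my $round (1 .. $k) {
        my @longer;
        # Extend each word built so far by every letter of the alphabet.
        for my $word (@words) {
            for my $letter (@alphabet) {
                push @longer, $word . $letter;
            }
        }
        @words = @longer;
    }
    return @words;
}

my @pairs    = all_words(2, qw(a c g t));
my @triplets = all_words(3, qw(a c g t));
print scalar(@pairs), " pairs, ", scalar(@triplets), " triplets\n";
# prints "16 pairs, 64 triplets"
```

So a hash keyed by the word scales much better than one counter variable per
word.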

To make your code reusable in upcoming coursework I suggest:

---snip---

#! /usr/bin/perl

use strict;
use warnings;


my %wmers;
my $sequence = "caggaacttcccctcggaagaccatgta";
my $wordsize = 2;

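# Slide a window of $wordsize over the sequence. Note the "<=": the last
# window starts at length - $wordsize and must be counted too.
for (my $i=0;$i <= length($sequence) - $wordsize;$i++){
  $wmers{substr($sequence,$i,$wordsize)}++;
}

# Hash order is arbitrary; use "sort keys" for a stable listing.
foreach (keys %wmers) {
 print "$_ => $wmers{$_}\n";
}

---snap---

prints on my box:

---snip---

#~> ./gataca.pl
at => 1
ct => 2
ag => 2
tt => 1
cc => 4
aa => 2
gt => 1
ga => 3
tg => 1
ca => 2
tc => 2
gg => 2
cg => 1
ac => 2
ta => 1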

---snap---

The idea is simple: loosely imitate the ribosome (I know you are talking
about DNA, but does that matter?) by sliding a $wordsize window over the
sequence.
For each window content, increment the value of the corresponding hash
field, creating it if necessary.
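
To make the word size a real parameter, the sliding window can be wrapped in
a subroutine. This is only a sketch: the name count_words is my own choice,
and folding the input to lower case is an assumption that mirrors the /i in
your regex.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count every overlapping word of length $wordsize.
# Returns a reference to a hash mapping word => number of occurrences.
sub count_words {
    my ($sequence, $wordsize) = @_;
    $sequence = lc $sequence;    # assumption: case does not matter
    my %count;
    # The last window starts at length - $wordsize, hence "<=".
    for (my $i = 0; $i <= length($sequence) - $wordsize; $i++) {
        # The hash field springs into existence on first increment.
        $count{ substr($sequence, $i, $wordsize) }++;
    }
    return \%count;
}

my $pairs = count_words("caggaacttcccctcggaagaccatgta", 2);
print "$_ => $pairs->{$_}\n" for sort keys %$pairs;
```

Calling count_words($sequence, 3) then gives the triplet counts without
touching the loop.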

I bet there is a smarter solution using pos and regexes and a character
class [gatc]{$wordsize} - that would even make the thing usable for proteins
by changing the character class to the protein alphabet...
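
Here is one way the regex idea might look, as a sketch rather than the
definitive answer. Instead of juggling pos by hand it uses a zero-width
lookahead, so the match position advances one character at a time and
overlapping words are all seen. (A plain /cc/g loop skips overlaps: in
"cccc" it finds 2 matches, not 3.) The sub name count_words_re and its
$alphabet parameter are my assumptions; the alphabet is interpolated into a
character class as-is, so it must not contain characters like ] or ^.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count overlapping words of length $wordsize that
# consist only of letters from $alphabet (default: the DNA letters).
sub count_words_re {
    my ($sequence, $wordsize, $alphabet) = @_;
    $alphabet = 'gatc' unless defined $alphabet;
    my %count;
    # (?=...) matches without consuming anything, so after each hit the
    # regex engine moves on by one character and no overlap is lost.
    while ($sequence =~ /(?=([$alphabet]{$wordsize}))/gi) {
        $count{ lc $1 }++;
    }
    return \%count;
}

my $dna = count_words_re("caggaacttcccctcggaagaccatgta", 2);
print "$_ => $dna->{$_}\n" for sort keys %$dna;

# Same sub on a made-up protein fragment, with the 20-letter alphabet.
my $protein = count_words_re("MKKLL", 2, 'ACDEFGHIKLMNPQRSTVWY');
print "$_ => $protein->{$_}\n" for sort keys %$protein;
```

Windows containing a letter outside the alphabet (an "n" in a DNA sequence,
say) are silently skipped, which may or may not be what you want.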

But I'm getting OT here - maybe I should have done something else for a
living:-)

Enjoy (and reproduce), Wolf

