On Wednesday 25 February 2004 17:35, Henry Todd generously enriched virtual reality by making up this one:
Hi,

> I'm having trouble counting the number of specific substrings within a
> string. I'm working on a bioinformatics coursework at the moment, so my
> string looks like this:
>
> $sequence = "caggaacttcccctcggaagaccatgta";
>
> I want to count the number of occurrences of each pair of letters, for
> example:
>
> Number of occurrences of "aa"
> Number of occurrences of "gt"
> Number of occurrences of "cc"
>
> This is how I'm counting the number of "cc" pairs at the moment ($cc is
> my counter variable):
>
> $cc++ while $sequence =~ /cc/gi;
>

As I understand biology, there are 4 nucleotides, which gives 4**2 = 16
combinations for doublets. So you would need 16 variables to count the
occurrences of all doublets. Worse for triplets (4**3 = 64).

As I understand genetics, triplets are what matters, since the ribosome
reads the messenger RNA in triplets (codons), each coding for an amino
acid. You might give me updates on my biology knowledge :-)

To make your code reusable in upcoming classwork I suggest:

---snip---
#! /usr/bin/perl
use strict;
use warnings;

my %wmers;
my $sequence = "caggaacttcccctcggaagaccatgta";
my $wordsize = 2;

# Slide a window of $wordsize characters over the sequence. The test is
# <= so that the last window (here "ta") is counted as well.
for (my $i = 0; $i <= length($sequence) - $wordsize; $i++) {
    # ++ on a missing hash entry creates it (undef counts as 0)
    $wmers{substr($sequence, $i, $wordsize)}++;
}

# Hash order is arbitrary, so the lines may come out in any order
foreach (keys %wmers) {
    print "$_ => $wmers{$_}\n";
}
---snap---

prints on my box:

---snip---
#~> ./gataca.pl
at => 1
ct => 2
ag => 2
tt => 1
cc => 4
aa => 2
gt => 1
ga => 3
tg => 1
ca => 2
tc => 2
gg => 2
cg => 1
ac => 2
ta => 1
---snap---

The idea is simple: imitate the ribosome (I know you are talking about DNA,
but does that matter?) by sliding a $wordsize window over the sequence. For
each window, increment the value of the corresponding hash entry, creating
it if necessary. I bet there is a smarter solution using pos and regexes and
a character class [gatc]{$wordsize} - that would even make the thing usable
for proteins by changing the character class to the protein alphabet....
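Taking up that bet, here is a minimal sketch of the regex variant. It assumes a zero-width lookahead (?=...) is used, so that overlapping words are counted the same way as in the substr() loop. $alphabet is a variable introduced here for illustration only. For proteins you would set it to the amino-acid letters instead of 'gatc'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sequence = "caggaacttcccctcggaagaccatgta";
my $wordsize = 2;
my $alphabet = 'gatc';   # hypothetical knob: use the protein alphabet for proteins

my %wmers;
# The lookahead (?=...) matches with zero width and only captures the word.
# Perl will not accept a second zero-length match at the same position, so
# after each match the engine retries one character further on, and every
# overlapping window of $wordsize letters is seen exactly once.
while ($sequence =~ /(?=([$alphabet]{$wordsize}))/gi) {
    $wmers{lc $1}++;    # lc because /i also accepts upper-case input
}

# Sorted output, so runs are easy to compare
foreach my $word (sort keys %wmers) {
    print "$word => $wmers{$word}\n";
}
```

With the example sequence this should report the same counts as the corrected substr() loop (for instance cc => 4 and ga => 3), just sorted by word.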
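One detail is worth checking: the /cc/gi loop from the question and the window approach do not count the same thing. A /g match resumes after the end of the previous match, so the run "cccc" gives only two matches, while a sliding window sees three. Here is a small comparison sketch (the counter names are made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sequence = "caggaacttcccctcggaagaccatgta";

# The approach from the question: each match consumes its two characters,
# so overlapping pairs are skipped (non-overlapping count).
my $non_overlapping = 0;
$non_overlapping++ while $sequence =~ /cc/gi;

# A zero-width lookahead consumes nothing, so the matches may overlap.
my $overlapping = 0;
$overlapping++ while $sequence =~ /(?=cc)/gi;

print "non-overlapping cc: $non_overlapping\n";   # -> 3
print "overlapping cc:     $overlapping\n";       # -> 4
```

Which count is "right" depends on the coursework; the hash-based window count above corresponds to the overlapping one.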
But I'm getting OT here - maybe I should have done something else for a living :-)

Enjoy (and reproduce),
Wolf

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>