On Thu, 14 Oct 2004 23:23:48 +0200, Paul Johnson <[EMAIL PROTECTED]> wrote:
> On Thu, Oct 14, 2004 at 11:02:06AM -0600, Michael Robeson wrote:
> 
> > I have a set of data that looks something like the following:
> 
> > So, my problem is that I think I know some of the bits of code to put
> > into place the problem is I am getting lost on how to structure it all
> > together.

  Hi Paul,
  I think you missed a critical part of Mike's post:

> For now I am just trying to get my output to look like this:
>
>Human
>number of 3 base pair gaps:             2
>                      at positions:           6, 16
>number of 5 base pair gaps:             1
>                       at positions:           25
>
>Chimp
>.... and so on ...

I've put together something that gets the first part done (counting
the gaps by length, which I take to be the point).  The code is as
follows:

use warnings;
use strict;

use Data::Dumper;

my %sequences = (
        'human' => "acgtt---cgatacg---acgact-----t",
        'chimp' => "acgtt---cgatacg---acgact-----t",
        'mouse' => "acgata---acgatcg----acgt",
);
my %results;

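foreach my $species( keys %sequences ) {
        my $base_pair = 0;            # true while we are inside a run of '-'
        my $base_pair_value = '';     # the run of '-' collected so far
        # split // breaks the sequence into single characters
        foreach( split( //, $sequences{$species} )) {
                if( /-/ ) {
                        $base_pair = 1;
                        $base_pair_value .= $_;
                } elsif( $base_pair ) {
                        # a base ends the run: count the gap by its length
                        $results{$species}{length($base_pair_value)} += 1;
                        $base_pair = 0;
                        $base_pair_value = '';
                }
        }
        # count a gap that runs to the very end of the sequence
        $results{$species}{length($base_pair_value)} += 1 if $base_pair;
}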
 
foreach my $species( keys %results ) {
        print "$species = $sequences{$species}\n";
        foreach my $base_pair( keys %{$results{$species}} ) {
                print "   Number of $base_pair base pair gaps:\t$results{$species}{$base_pair}\n";
        }
}

This will produce the following output:

# dnatest
human = acgtt---cgatacg---acgact-----t
   Number of 3 base pair gaps:  2
   Number of 5 base pair gaps:  1
chimp = acgtt---cgatacg---acgact-----t
   Number of 3 base pair gaps:  2
   Number of 5 base pair gaps:  1
mouse = acgata---acgatcg----acgt
   Number of 4 base pair gaps:  1
   Number of 3 base pair gaps:  1

I put the sequence in the output for easy troubleshooting and
checking.  I'm still working on figuring out the positional data.
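
For what it's worth, one way the positions could be collected is to
let a global regex match walk over each run of '-' and read the start
offset from the @- array.  This is only a rough sketch, not a finished
answer: find_gaps is a made-up helper name, and it assumes positions
are counted from 1, which is what Mike's example (6, 16 and 25 for
the human sequence) suggests.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not from the thread).
# Returns a hash ref: gap length => array ref of 1-based start positions.
sub find_gaps {
    my ($sequence) = @_;
    my %gaps;
    # Each /g iteration matches the next complete run of '-' characters.
    while ( $sequence =~ /(-+)/g ) {
        # $-[1] is the 0-based offset where capture group 1 starts.
        push @{ $gaps{ length $1 } }, $-[1] + 1;
    }
    return \%gaps;
}

my $gaps = find_gaps("acgtt---cgatacg---acgact-----t");
foreach my $length ( sort { $a <=> $b } keys %$gaps ) {
    my @positions = @{ $gaps->{$length} };
    print "number of $length base pair gaps:\t", scalar @positions, "\n";
    print "\tat positions:\t", join( ', ', @positions ), "\n";
}
```

Because the regex grabs a whole run at once, a gap at the very end of
a sequence is picked up without any special handling.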

This IS fun.  I'll post when I've got it figured out.
--Errin
