On Thu, 14 Oct 2004 16:11:42 -0600, Michael Robeson <[EMAIL PROTECTED]> wrote:
> Yeah, I have just submitted that same question verbatim to the bio-perl
> list. I am still running through some ideas though. I have both
> Bioinformatics perl books. They are not very effective teaching books.
> 
> The books spend too much time on using modules. Though while I
> understand the usefulness of not having to re-write code, it is a bad
> idea for beginners like me. Because re-writing code at first gives me a
> lot of practice. Some of the scripts in the books use like 3-5 modules,
> so it gets confusing on what is going on.
> 
> I mean the books are not useless, but they definitely are structured
> for a class with a teacher.
> 
> :-)
> 
> -Mike
> 

Hi again, Mike!

I've thrown together the following code.  I have not commented this!
If you have some questions, just ask.  I hard-coded the sequences for
my own ease of use.  It looked to me like you have already figured out
how to grab the sequences out of a file and throw them in a hash.  This
code uses some deeply nested references, and therefore some crazy
dereferences.  Have fun with it, I know I did!  Things that might look
weird: check out perldoc -f split for info on splitting with a pattern
that matches the null string (that's where I found it!) and of course
perldoc perlref for all the deeply nested references and dereferencing
stuff!  I'm currently reading "Learning Perl Objects, References &
Modules" by Randal Schwartz.  I highly recommend it.  It helped a lot
in this exercise.  Here's the code:

use warnings;
use strict;

my %sequences = (
        'Human' => "acgtt---cgatacg---acgact-----t",
        'Chimp' => "acgtt---cgatacg---acgact-----t",
        'Mouse' => "acgata---acgatcg----acgt",
);
my %results;

foreach my $species( keys %sequences ) {
        my $is_base_pair_gap = 0;
        my $base_pair_gap;
        my $base_pair_gap_pos;
        my $position = 1;
        foreach( split( / */, $sequences{$species} )) {
                if( /-/ ) {
                        unless( $is_base_pair_gap ) {
                                $base_pair_gap_pos = $position;
                        }
                        $is_base_pair_gap = 1;
                        $base_pair_gap .= $_;
                } elsif( $is_base_pair_gap ) {
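                        push @{$results{$species}{length($base_pair_gap)}},
                                $base_pair_gap_pos;
                        $is_base_pair_gap = 0;
                        $base_pair_gap = undef;
                }
                $position++;
        }
        if( $is_base_pair_gap ) {
                push @{$results{$species}{length($base_pair_gap)}},
                        $base_pair_gap_pos;
        }
}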

foreach my $species( keys %results ) {
        print "$species:\n";
        foreach my $base_pair_gap( keys %{$results{$species}} ) {
                print "   Number of $base_pair_gap base pair gaps:\t",
                        scalar( @{$results{$species}{$base_pair_gap}} ), "\n";
                print "     at position(s) ",
                        join( ',', @{$results{$species}{$base_pair_gap}} ), ".\n";
        }
        print "\n";
}
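
If the split call in the inner loop looks odd, here is a minimal
sketch of what it does, using a made-up five-character string rather
than the real sequences.  A pattern that can match the empty string
makes split break the string between every character:

```perl
use strict;
use warnings;

# An empty pattern splits the string into single characters.
my @chars = split( //, "ac--t" );
print join( ':', @chars ), "\n";    # prints a:c:-:-:t

# / */ can also match the empty string, so it gives the same result
# here; it would additionally swallow any spaces in the input.
my @same = split( / */, "ac--t" );
print scalar(@same), "\n";          # prints 5
```

Either pattern works for sequences that contain no spaces.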




The heart of this code is this line:
push @{$results{$species}{length($base_pair_gap)}}, $base_pair_gap_pos;

There is a %results hash which has keys that are the different
species, and values that point to another hash.  THAT hash (the inner
hash) has keys that are the lengths of the base-pair gaps, and values
that point to an array.  The array holds a list of the positions of
those base-pair gaps!  The first base-pair gap in the human sequence
is '---' at the 6th character.  That looks like this (warning: pseudo
code for clarity!)
  %results->{'Human'}->{ 3 }->[6]
When we find the second '---' gap, we add its position to the array:
  %results->{'Human'}->{ 3 }->[6,16]
Then, we find a new base-pair gap ('-----') so we add a new key to the
inner hash:
  %results->{'Human'}->{ 3 }->[6,16]
                     ->{ 5 }->[25]
Next, we move on to another species (Mouse, say; the order of hash
keys is not fixed) ...
  %results->{'Human'}->{ 3 }->[6,16]
                     ->{ 5 }->[25]
          ->{'Mouse'}->{ 3 }->[7]
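
The pseudo code above can be written as real Perl.  This is only a
small sketch with the Human positions typed in by hand (nothing is
computed from the sequences, and $first_pos, $gap_count and @lengths
are names made up for the example).  It shows that push creates the
inner hash and the array on first use, and how to read the values
back out:

```perl
use strict;
use warnings;

my %results;

# push autovivifies the inner hash and the array the first time
# a species / gap-length pair is seen.
push @{ $results{'Human'}{3} }, 6;
push @{ $results{'Human'}{3} }, 16;
push @{ $results{'Human'}{5} }, 25;

# The arrows between subscripts are optional; both styles reach
# the same data.
my $first_pos = $results{'Human'}->{3}->[0];
my $gap_count = scalar @{ $results{'Human'}{3} };
my @lengths   = sort { $a <=> $b } keys %{ $results{'Human'} };

print "first 3-gap at $first_pos, $gap_count gaps of length 3\n";
print "gap lengths seen: @lengths\n";
```

This prints "first 3-gap at 6, 2 gaps of length 3" and then
"gap lengths seen: 3 5".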

So, finally, with Data::Dumper, we can see the %results hash when the
code is done processing all the sequences:

%results = {
          'Human' => {
                       '3' => [
                                6,
                                16
                              ],
                       '5' => [
                                25
                              ]
                     },
          'Mouse' => {
                       '4' => [
                                17
                              ],
                       '3' => [
                                7
                              ]
                     },
          'Chimp' => {
                       '3' => [
                                6,
                                16
                              ],
                       '5' => [
                                25
                              ]
                     }
        };
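
For reference, a dump like the one above can be produced roughly as
follows.  This sketch fills in a cut-down %results by hand instead of
running the full script.  Data::Dumper labels its output $VAR1 rather
than %results, and setting $Data::Dumper::Sortkeys is optional; it
only keeps the key order stable from run to run:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hand-filled stand-in for the hash built by the script.
my %results = (
        'Human' => { 3 => [ 6, 16 ], 5 => [ 25 ] },
        'Mouse' => { 3 => [ 7 ],     4 => [ 17 ] },
);

$Data::Dumper::Sortkeys = 1;    # sort hash keys in the output

# Pass a reference so the whole hash is dumped as one structure.
my $dump = Dumper( \%results );
print $dump;
```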
                 

I hope this is helpful!  This really was a lot of fun.

--Errin


