On Thursday 14 October 2004 21:44, Michael Robeson wrote:
> Here is as far as I can get with some real code and pseudo code. This
> is just to show you that I am actually trying. :-)
Hi Michael,
this looks quite promising - I'll try to sketch a solution without giving
everything away ;-)
>
> Pseudo - code
This is really important: Always start your scripts with
use warnings;
use strict;
This will help you avoiding errors.
>
> ################################
> # open DNA sequence file
> ################################
>
> print "Enter in the name of the DNA sequence file:\n";
> chomp (my $dna_seq = <STDIN>);
> open(DNA_SEQ, $dna_seq)
> or die "Can't open sequence file for input: $!";
>
> ########################################################
> # read sequence data into a hash - not sure if this is how I should do
> it?
> ########################################################
>
> my %sequences;
>
> $/ = '>'; # set to read in paragraph mode
>
> print "\n***Discovered the following DNA sequences:***\n";
>
> while ( <DNA_SEQ> ) {
> chomp;
> next unless s/^\s*(.+)//;
> my $name = $1;
> s/\s//g;
> $sequences{$name} = $_;
> print "$name found!\n";
> }
> close DNA_SEQ;
This part is not working completely for me - I get the first sequence only.
I've never got used to working with the $/ operator, therefore I'd rewrite
this part like this (without wanting to say that this would be easier than
your approach):
print "\n***Discovered the following DNA sequences:***\n";
# each line holds either a caption or a sequence
my $is_sequence_line = 0; # first line holds a caption
my $name;
while ( <DNA_SEQ> ) {
chomp;
if ($is_sequence_line) {
# this line holds a sequence
$sequences{$name} = $_;
print "sequence for $name found!\n";
}
else {
# this line holds a name - remember it
print "new name : $_\n";
$name = $_;
}
$is_sequence_line = ! $is_sequence_line;
}
close DNA_SEQ;
print "\n";
>
>
> ###################################
> # search for and characterize gaps
> ###################################
>
> <<somehow get data from hash and present it to a loop>>
>
> %gaptype;
>
> <<major pseudo code below>>
> foreach /\D(-+)\D/ found in each sequece # searches for gaps flanked by
> sequence
> $position = pos($1);
> $gaplength = $1;
> $gaplength =~ tr/-//g; # count the number of '-' for that particular
> # gap being processed
> $gaptype{gaplength}++; # count the number of times each gap type
> appears
>
> <<somehow get information from loop an print as seen below>>
>
# first, we need to iterate over the sequences
while (my ($name, $sequence) = each(%sequences)) {
# now we can work on each sequence
print "working on sequence $name\n";
print "sequence : $sequence\n";
# find all gaps in the sequence
# (the g modifier means to search for all occurences)
while ($sequence =~ /(-+)/g) {
print "\n";
# the special variables $`, $& and $' give you the
# text left of the match, the match itself, and the
# text right of the match.
# Using those variables and the length() function,
# you can print something like this:
print "found the following gap : '" . $& . "'\n";
print "the text before the gap : '" . $` . "'\n";
print "the text after the gap : '" . $' . "'\n";
print "the text before the gap is " . length ($`) . " chars long.\n";
}
print "\n";
}
I hope this might give you an approach for handling this - with the variables
I printed, you should be able to fill your %gaptype hash like you sketched
above (you find the length of the gap in length($&), and the gap's position
is the same as the "length of the text left of the gap" + 1).
> ____OUTPUT_____
>
> Human
> number of 3 base pair gaps: 2
> at positions: 6, 16
> number of 5 base pair gaps: 1
> at positions: 25
> .
> .. and so on...
You could either (easy variant) print this information while running through
the while loop or (a bit harder, but might be easier in the long run if you
need to process the data further) store the amount and position of the gaps
inside a two-dimensional hash of arrays - something like:
my %gaps =
( human =>
{ 3 => [6, 16],
5 => [25]
},
chimp =>
...
)
It's a bit tricky to deal with structures like this, but once you got used to
them, you must love them ;-)
Hope this helps.
Philipp
--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>