On Thursday 14 October 2004 21:44, Michael Robeson wrote: > Here is as far as I can get with some real code and pseudo code. This > is just to show you that I am actually trying. :-)
Hi Michael, this looks quite promising - I'll try to sketch a solution without giving everything away ;-) > > Pseudo - code This is really important: Always start your scripts with use warnings; use strict; This will help you avoiding errors. > > ################################ > # open DNA sequence file > ################################ > > print "Enter in the name of the DNA sequence file:\n"; > chomp (my $dna_seq = <STDIN>); > open(DNA_SEQ, $dna_seq) > or die "Can't open sequence file for input: $!"; > > ######################################################## > # read sequence data into a hash - not sure if this is how I should do > it? > ######################################################## > > my %sequences; > > $/ = '>'; # set to read in paragraph mode > > print "\n***Discovered the following DNA sequences:***\n"; > > while ( <DNA_SEQ> ) { > chomp; > next unless s/^\s*(.+)//; > my $name = $1; > s/\s//g; > $sequences{$name} = $_; > print "$name found!\n"; > } > close DNA_SEQ; This part is not working completely for me - I get the first sequence only. I've never got used to working with the $/ operator, therefore I'd rewrite this part like this (without wanting to say that this would be easier than your approach): print "\n***Discovered the following DNA sequences:***\n"; # each line holds either a caption or a sequence my $is_sequence_line = 0; # first line holds a caption my $name; while ( <DNA_SEQ> ) { chomp; if ($is_sequence_line) { # this line holds a sequence $sequences{$name} = $_; print "sequence for $name found!\n"; } else { # this line holds a name - remember it print "new name : $_\n"; $name = $_; } $is_sequence_line = ! $is_sequence_line; } close DNA_SEQ; print "\n"; > > > ################################### > # search for and characterize gaps > ################################### > > <<somehow get data from hash and present it to a loop>> > > %gaptype; > > <<major pseudo code below>> > foreach /\D(-+)\D/ found in each sequece # searches for gaps flanked by > sequence > $position = pos($1); > $gaplength = $1; > $gaplength =~ tr/-//g; # count the number of '-' for that particular > # gap being processed > $gaptype{gaplength}++; # count the number of times each gap type > appears > > <<somehow get information from loop an print as seen below>> > # first, we need to iterate over the sequences while (my ($name, $sequence) = each(%sequences)) { # now we can work on each sequence print "working on sequence $name\n"; print "sequence : $sequence\n"; # find all gaps in the sequence # (the g modifier means to search for all occurences) while ($sequence =~ /(-+)/g) { print "\n"; # the special variables $`, $& and $' give you the # text left of the match, the match itself, and the # text right of the match. # Using those variables and the length() function, # you can print something like this: print "found the following gap : '" . $& . "'\n"; print "the text before the gap : '" . $` . "'\n"; print "the text after the gap : '" . $' . "'\n"; print "the text before the gap is " . length ($`) . " chars long.\n"; } print "\n"; } I hope this might give you an approach for handling this - with the variables I printed, you should be able to fill your %gaptype hash like you sketched above (you find the length of the gap in length($&), and the gap's position is the same as the "length of the text left of the gap" + 1). > ____OUTPUT_____ > > Human > number of 3 base pair gaps: 2 > at positions: 6, 16 > number of 5 base pair gaps: 1 > at positions: 25 > . > .. and so on... You could either (easy variant) print this information while running through the while loop or (a bit harder, but might be easier in the long run if you need to process the data further) store the amount and position of the gaps inside a two-dimensional hash of arrays - something like: my %gaps = ( human => { 3 => [6, 16], 5 => [25] }, chimp => ... ) It's a bit tricky to deal with structures like this, but once you got used to them, you must love them ;-) Hope this helps. Philipp -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] <http://learn.perl.org/> <http://learn.perl.org/first-response>