I show in order: the format of the input data, the output of the input data, and finally the working script. I have yet to add comments into the code but I am sure many of you veterans will figure it out.
-Thanks again all! As always comments, questions or suggestions are welcome!
-Mike
-------- INPUT -------- >dog atcg--acgat---act-ca---- >cat acgt-acgt----acgt-gt-agct- >mouse ---acgtacg-atcg---actgac-
------- OUTPUT --------- ***Discovered the following DNA sequences:*** dog found! cat found! mouse found! >>>>>> mouse <<<<<< Indel size: 1 Times found: 2 Positions: 11 25
Indel size: 3 Times found: 2 Positions: 1 16
>>>>>> cat <<<<<< Indel size: 1 Times found: 4 Positions: 5 18 21 26
Indel size: 4 Times found: 1 Positions: 10
>>>>>> dog <<<<<< Indel size: 1 Times found: 1 Positions: 18
Indel size: 2 Times found: 1 Positions: 5
Indel size: 3 Times found: 1 Positions: 12
Indel size: 4 Times found: 1 Positions: 21
------- Script -------
#!usr/bin/perl
# By Michael S. Robeson II, with the help of friends at lernperl.org and bioperl.org! :-)
# 10/16/2004
use warnings; use strict;
############################### # Open Sequence Data & OUTFILE ###############################
print "Enter in the name of the DNA sequence file:\n"; chomp (my $dna_seq = <STDIN>);
open(DNA_SEQ, $dna_seq)
or die "Can't open file: $!\n";open(OUTFILE, ">indel_list_"."$dna_seq")
or die "Can't open outfile: $!\n";############################ # Read sequence data into a hash ############################
my %sequences;
$/ = '>';
print "\n***Discovered the following DNA sequences:***\n";
while ( <DNA_SEQ> ) {
chomp;
next unless s/^\s*(.+)//;
my $name = $1;
s/\s//g;
$sequences{$name} = $_;
print "$name found!\n";
}
close DNA_SEQ;###################################### # iterate over gaps and write to file ######################################
foreach (keys %sequences) {
print "\t\t\>\>\>\>\>\> $_ \<\<\<\<\<\<\n";
print OUTFILE "\>\>\>\>\>\> $_ \<\<\<\<\<\<\n";
my $dna = $sequences{$_};
my %gap_data;
my %position;
while ($dna =~ /(\-+)/g) {
my $gap_pos = pos ($dna) - length($&) + 1;
my $gap_length = length $1; #$1 =~ tr/\-+//
$gap_data{$gap_length}++;
$position{$gap_length} .= "$gap_pos"." \n";
}
my @indels = keys (%gap_data);
my @keys = sort { $a <=> $b} @indels;
foreach my $key (@keys) {
print "Indel size:\t$key\tTimes found:\t$gap_data{$key}\n";
print OUTFILE "Indel size:\t$key\tTimes found:\t$gap_data{$key}\n";
print "Positions:\n";
print OUTFILE "Positions:\n";
print "$position{$key}";
print OUTFILE "$position{$key}";
print "\n";
print OUTFILE "\n";
}
# Can replace the last "foreach loop" above with the while loop
# below to do the same thing. Only Gap sizes will not be sorted.
# nor is it set up to print to a file
#
# while (my ($key, $vlaue) = each (%gap_data)) {
# print "Indel size:\t$key\tTimes found:\t$gap_data{$key}\n";
# print "Positions:\n";
# print "$position{$key}";
# print "\n\n";
# }}
-- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] <http://learn.perl.org/> <http://learn.perl.org/first-response>
