Good morning!
Thx for your help..

The problem seems to lay in "filling the hash".
But I can't see why.

I want to compare two fasta-files, more precisely the IDs of two sets of sequences.
Each file looks like:


>gi|13699918|dbj|BAB41217.1|.....
MSEKEIWEKVLEIAQEKLSAVSYSTFLKDTELYTIKDGEAIVLSSIPFNANWLNQQYAEIIQAILFDVVG
YEVKPHFITTEELANYSNNETATPKETTKPSTETTEDNHVLGREQFNAHNTFDTFVIGPGNRFPHAASLA
VAEAPAKAYNPLFIYGGVGLGKTHLMHAIGHHVLDNNPDA......
>gi|13699919|dbj|BAB41218.1|....
QFQTLITSGHSEFNLSGLDPDQYPLLPQVSRDDAIQLSVKVLKNVIAQTNFAVSTSETRPVLTGVNWLIQ
ENELICTATDSHRLAVRKLQLEDVSENKNVIIPGKALAE....
>gi|13699920|dbj|BAB41219.1|.....
MIILVQEVVVEGDINLGQFLKTEGIIESGGQAKWFLQDVEVLINGVRETRRGKKLEHQDRIDIPELPEDA
GSFLIIHQGEQ
>gi|13699921|dbj|BAB41220.1|...
MKLNTLQLENYRNYDEVTLKCHPDVNILIGENAQGKTNLLESIYTLALAKSHRTSNDKELIRFNADYAKI
EGELSYRHGTMPLTMFITKKGKQVKVNHLEQSRLTQYIGHLNVVLFAPEDLNIVKGSPQIRRRFIDMELG
QISAVYLNDLAQYQRILKQKNNYLKQLQLGQ


The whole code is as follows:


#!/usr/local/bin/perl
#
###############################################################################
# add_IDs_by_pattern_matching
#
# Input: Two fastafiles.
# (Some sequences are in both files, but labeled with different IDS)
# Output: One fasta-file, sequences labeled with both IDs.
###############################################################################

use strict;
my $fastafilename1 = '';
my $fastafilename2 = '';
my %hash_fasta1 = '';
my %hash_fasta2 = '';
my $key_fasta1 = '';
my $key_fasta2 = '';
my $value_fasta1 = '';
my $value_fasta2 = '';
my %hash_pattern = '';
my $key_pattern = '';
my $value_pattern = '';


print "Please type the name of the first file:\n"; chomp ($fastafilename1 = <STDIN>); open(FASTAFILE1, $fastafilename1) || die("Cannot open file for reading: $!");

#read in the first file
#and filling two hashes
#values in hash_pattern later used for pattern-matching

my $k = 0;
my $i = 0;
while (<FASTAFILE1>) {
    if(/^>/) {
        chomp;
        $i =1;
        if ($k==1) {
            chomp;
            $hash_fasta1{$key_fasta1} = $value_fasta1;
            $value_fasta1 ='';
        }
        else {
        $key_fasta1 = $_;
        $key_pattern = $_;
        }
    }
    else {
        if ($i==0)  {
                chomp;
                $value_fasta1 = $value_fasta1 . $_;
                $k=1;
        }
        else    {
        $i = 0;
        chomp;
        $hash_pattern{$key_pattern} = $_;
        $value_fasta1 = $value_fasta1 . $_;
        $k=1;
        }
    }
}


$hash_fasta1{$key_fasta1} = $value_fasta1; delete $hash_fasta1 {''}; delete $hash_pattern {''};


close(FASTAFILE1) || die("Can't close in file: $!") ;


#read in the second file

print "Please type the name of the first file:\n\n";
chomp ($fastafilename2 = <STDIN>);
open(FASTAFILE2, $fastafilename2) ||
        die("Cannot open file for reading: $!");


my $j = 0; while (<FASTAFILE2>) { if(/^>/) { chomp; if ($j==1) { chomp; $hash_fasta2{$key_fasta2} = $value_fasta2; $value_fasta2 =''; } else { $key_fasta2 = $_; } } else { chomp; $value_fasta2 = $value_fasta2 . $_; $j=1; } }

$hash_fasta2{$key_fasta2} = $value_fasta2;
delete $hash_fasta2 {''};
close(FASTAFILE2) || die("Can't close in file: $!") ;

my $outputfile = '';

#open outputfile
$outputfile = "both_IDs";
unless (open(BOTH_IDS, ">$outputfile") ) {
print "Cannot open file \"$outputfile\" to write to!!\n\n";
exit;
}

my @array1 = keys %hash_fasta1;
my @array2 = keys %hash_fasta2;

##################################################################
# Because I only found one sequence, which is in both fasta-files,
# I tried to find out, if the hashes are correctly filled.
# So here I put the code as described** and got the different
# output. Rest of code:


my $key_hash1 = ''; my $key_hash2 = '';

#pattern-matching

foreach (@array1) {
$key_hash1 = $_;
foreach (@array2) {
$key_hash2 = $_;
if ($hash_fasta2{$key_hash2} =~ $hash_pattern{$key_hash1}) {
print "Pattern $hash_pattern{$key_hash1} is in $hash_fasta2{$key_hash2} \n";
print BOTH_IDS $key_hash1 . $key_hash2 . $hash_fasta2{$key_hash2};
}
}
}


#close outputfile
close (BOTH_IDS)  || die("Can't close in file: $!");



**
I get all four values with::

my @array = keys %hash;

print $hash{$array[0]};
print"\n";
print $hash{$array[1]};
print"\n";
print $hash{$array[2]};
print"\n";
print $hash{$array[3]};

With the code
foreach (keys %hash) {
    print $hash{$_};
    print "\n";}
I only get the value corresponding to $hash{$array[3]}.


James Edward Gray II wrote:


If you would like to post more of your code, I would be happy to take a look at it.

James

On Nov 11, 2003, at 2:27 PM, Christiane Nerz wrote:

jepp - all four are there..
I really don't understand it.

thx so far - I have to finish for today - my little baby-son is crying :-(

Jane


...


As near as I can tell, the above two chunks of code have identical effects. If you put the first chunk in the program EXACTLY where the foreach() loop is you see all of them?
James





-- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to