Good morning! Thanks for your help.
The problem seems to lie in "filling the hash", but I can't see why.
I want to compare two FASTA files, more precisely the IDs of two sets of sequences.
Each file looks like:
>gi|13699918|dbj|BAB41217.1|.....
MSEKEIWEKVLEIAQEKLSAVSYSTFLKDTELYTIKDGEAIVLSSIPFNANWLNQQYAEIIQAILFDVVG
YEVKPHFITTEELANYSNNETATPKETTKPSTETTEDNHVLGREQFNAHNTFDTFVIGPGNRFPHAASLA
VAEAPAKAYNPLFIYGGVGLGKTHLMHAIGHHVLDNNPDA......
>gi|13699919|dbj|BAB41218.1|....
QFQTLITSGHSEFNLSGLDPDQYPLLPQVSRDDAIQLSVKVLKNVIAQTNFAVSTSETRPVLTGVNWLIQ
ENELICTATDSHRLAVRKLQLEDVSENKNVIIPGKALAE....
>gi|13699920|dbj|BAB41219.1|.....
MIILVQEVVVEGDINLGQFLKTEGIIESGGQAKWFLQDVEVLINGVRETRRGKKLEHQDRIDIPELPEDA
GSFLIIHQGEQ
>gi|13699921|dbj|BAB41220.1|...
MKLNTLQLENYRNYDEVTLKCHPDVNILIGENAQGKTNLLESIYTLALAKSHRTSNDKELIRFNADYAKI
EGELSYRHGTMPLTMFITKKGKQVKVNHLEQSRLTQYIGHLNVVLFAPEDLNIVKGSPQIRRRFIDMELG
QISAVYLNDLAQYQRILKQKNNYLKQLQLGQ
The whole code is as follows:
#!/usr/local/bin/perl
#
###############################################################################
# add_IDs_by_pattern_matching
#
# Input: Two fastafiles.
# (Some sequences are in both files, but labeled with different IDS)
# Output: One fasta-file, sequences labeled with both IDs.
###############################################################################

use strict;

my $fastafilename1 = '';
my $fastafilename2 = '';
my %hash_fasta1 = '';
my %hash_fasta2 = '';
my $key_fasta1 = '';
my $key_fasta2 = '';
my $value_fasta1 = '';
my $value_fasta2 = '';
my %hash_pattern = '';
my $key_pattern = '';
my $value_pattern = '';
print "Please type the name of the first file:\n";
chomp ($fastafilename1 = <STDIN>);
open(FASTAFILE1, $fastafilename1) || die("Cannot open file for reading: $!");

#read in the first file
#and filling two hashes
#values in hash_pattern later used for pattern-matching

my $k = 0;
my $i = 0;
while (<FASTAFILE1>) {
    if (/^>/) {
        chomp;
        $i = 1;
        if ($k == 1) {
            chomp;
            $hash_fasta1{$key_fasta1} = $value_fasta1;
            $value_fasta1 = '';
        }
        else {
            $key_fasta1 = $_;
            $key_pattern = $_;
        }
    }
    else {
        if ($i == 0) {
            chomp;
            $value_fasta1 = $value_fasta1 . $_;
            $k = 1;
        }
        else {
            $i = 0;
            chomp;
            $hash_pattern{$key_pattern} = $_;
            $value_fasta1 = $value_fasta1 . $_;
            $k = 1;
        }
    }
}

$hash_fasta1{$key_fasta1} = $value_fasta1;
delete $hash_fasta1{''};
delete $hash_pattern{''};

close(FASTAFILE1) || die("Can't close in file: $!");
#read in the second file
print "Please type the name of the second file:\n\n";
chomp ($fastafilename2 = <STDIN>);
open(FASTAFILE2, $fastafilename2) || die("Cannot open file for reading: $!");

my $j = 0;
while (<FASTAFILE2>) {
    if (/^>/) {
        chomp;
        if ($j == 1) {
            chomp;
            $hash_fasta2{$key_fasta2} = $value_fasta2;
            $value_fasta2 = '';
        }
        else {
            $key_fasta2 = $_;
        }
    }
    else {
        chomp;
        $value_fasta2 = $value_fasta2 . $_;
        $j = 1;
    }
}

$hash_fasta2{$key_fasta2} = $value_fasta2;
delete $hash_fasta2{''};
close(FASTAFILE2) || die("Can't close in file: $!");
my $outputfile = '';
#open outputfile
$outputfile = "both_IDs";
unless (open(BOTH_IDS, ">$outputfile")) {
    print "Cannot open file \"$outputfile\" to write to!!\n\n";
    exit;
}

my @array1 = keys %hash_fasta1;
my @array2 = keys %hash_fasta2;
##################################################################
# Because I only found one sequence, which is in both fasta-files,
# I tried to find out, if the hashes are correctly filled.
# So here I put the code as described** and got the different
# output.
# Rest of code:

my $key_hash1 = '';
my $key_hash2 = '';
#pattern-matching
foreach (@array1) {
    $key_hash1 = $_;
    foreach (@array2) {
        $key_hash2 = $_;
        if ($hash_fasta2{$key_hash2} =~ $hash_pattern{$key_hash1}) {
            print "Pattern $hash_pattern{$key_hash1} is in $hash_fasta2{$key_hash2} \n";
            print BOTH_IDS $key_hash1 . $key_hash2 . $hash_fasta2{$key_hash2};
        }
    }
}
#close outputfile
close (BOTH_IDS) || die("Can't close in file: $!");
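One thing that may be worth checking in the two reading loops: once $k (or $j) has been set to 1, the branch that stores the finished record does not appear to assign the new header line to $key_fasta1 (or $key_fasta2), so later records could end up stored under the first header. For comparison, here is a minimal sketch of a reading loop that switches to a new key on every header line. The sub name read_fasta, the variable names and the two-record sample are made up for illustration and are not part of the script above; the sample is read from an in-memory filehandle so the sketch runs without any input files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): read FASTA records
# from an open filehandle into a hash of header line => sequence.
sub read_fasta {
    my ($fh) = @_;
    my %seq_for;
    my $header;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        if ($line =~ /^>/) {
            $header = $line;               # a new record starts: switch keys
            $seq_for{$header} = '';
        }
        elsif (defined $header) {
            $seq_for{$header} .= $line;    # append to the current record
        }
    }
    return %seq_for;
}

# Made-up two-record sample, read from an in-memory filehandle.
my $sample = ">id_A\nMSEKEI\nWEKVLE\n>id_B\nQFQTLI\n";
open(my $fh, '<', \$sample) or die "Cannot open in-memory sample: $!";
my %hash_fasta = read_fasta($fh);
close($fh) or die "Cannot close in-memory sample: $!";

# Print every record, one per line, in sorted key order.
foreach my $id (sort keys %hash_fasta) {
    print "$id => $hash_fasta{$id}\n";
}
```

With this shape the hash gets one entry per header line, and a sequence that spans several lines is joined into a single string.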
** I get all four values with:
my @array = keys %hash;
print $hash{$array[0]};
print "\n";
print $hash{$array[1]};
print "\n";
print $hash{$array[2]};
print "\n";
print $hash{$array[3]};

With the code

foreach (keys %hash) {
    print $hash{$_};
    print "\n";
}

I only get the value corresponding to $hash{$array[3]}.
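As a cross-check, here is a small self-contained sketch that runs both ways of printing against the same hash and compares what they yield. The hash contents and the names @by_index, @by_loop and $same are made up for illustration; on an unmodified hash the two styles would be expected to produce the same set of values.

```perl
use strict;
use warnings;

# Made-up data standing in for the real hash.
my %hash = (id1 => 'AAA', id2 => 'CCC', id3 => 'GGG', id4 => 'TTT');

# Style 1: copy the keys into an array, then look up each index.
my @array = keys %hash;
my @by_index = map { $hash{ $array[$_] } } 0 .. $#array;

# Style 2: loop over the keys directly.
my @by_loop;
foreach (keys %hash) {
    push @by_loop, $hash{$_};
}

# The order returned by keys() is arbitrary, so sort before comparing.
my $same = join(',', sort @by_index) eq join(',', sort @by_loop);
print $same ? "same values\n" : "different values\n";
```

If the two styles give different results inside the full program, the hash has probably changed between the two places where they run.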
James Edward Gray II wrote:
If you would like to post more of your code, I would be happy to take a look at it.
James
On Nov 11, 2003, at 2:27 PM, Christiane Nerz wrote:
Yep, all four are there. I really don't understand it.
Thanks so far. I have to finish for today, my little baby son is crying :-(
Jane
...
As near as I can tell, the above two chunks of code have identical effects. If you put the first chunk in the program EXACTLY where the foreach() loop is, do you see all of them?
James