subroutine problem
I have written a rather simplistic script so I can get used to LWP::Simple etc. Anyway, I am using a subroutine to get and print data from a website. I have gotten it to work, except that the first iteration of the subroutine uses no data at all, yet after that it works fine. I know it has to do with how I am passing the data into the subroutine. The output is as follows (the Perl code that was used is below too):

Begin OutPut

    Fetching                                          # See it is fetching nothing
    Appending to fetched_ncbi_sequences.txt.          # Appending nothing
    Fetching cv889431                                 # It works fine from this point on
    Appending cv889431 to fetched_ncbi_sequences.txt.
    Fetching cv889432
    Appending cv889432 to fetched_ncbi_sequences.txt.
    Fetching cv889433
    Appending cv889433 to fetched_ncbi_sequences.txt.
    Fetching cv889434
    Appending cv889434 to fetched_ncbi_sequences.txt.
    Fetching cv889435
    Appending cv889435 to fetched_ncbi_sequences.txt.
    Fetching cv889436
    Appending cv889436 to fetched_ncbi_sequences.txt.
    Fetching cv889437
    Appending cv889437 to fetched_ncbi_sequences.txt.
    Fetching cv889438
    Appending cv889438 to fetched_ncbi_sequences.txt.
    Fetching cv889439
    Appending cv889439 to fetched_ncbi_sequences.txt.
    Fetching cv889440
    Appending cv889440 to fetched_ncbi_sequences.txt.
    Fetching cv889441
    Appending cv889441 to fetched_ncbi_sequences.txt.

    **Finished**

/End OutPut

The script is as follows:

Begin code

    #!/usr/bin/perl -w
    use strict;
    use LWP::Simple;

    open(FASTA, ">>fetched_ncbi_sequences.txt") or die "Cannot open FASTA file: $!";

    print "\n\t**Welcome to Mike Robeson's NCBI-fetch Script!**\n
    A - Just enter in the accession numbers of the sequence data you wish to pull from genbank individually (e.g. cv889410) or by defining a range (cv889431-cv889441). Hit enter after each entry or entry range.\n
    B - When finished, hit enter one last time and press ctrl-d.\n
    C - All sequence data will be downloaded into one file in FASTA format (e.g. fetched_ncbi_sequences.txt).
    \n\n";

    print "Enter a list of Sequence IDs to fetch:\n";
    chomp (my @list = <ARGV>);

    &printSequence;

    foreach my $id (@list) {
        if ($id =~ s/([a-z]*)(\d+)-[a-z]*(\d+)//) {
            my @range = split(/-/,$id);
            my $init_range_letters = $1;
            my $init_range_num = $2;
            my $term_range_num = $3;
            for (my $count = $init_range_num; $count <= $term_range_num; $count++) {
                my $genbank = $init_range_letters.$count;
                printSequence($genbank);
            }
        } else {
            printSequence($id);
        }
    }

    print "\n\n**Finished**\n\n";

    sub printSequence {
        my $accession = "@_";
        print "Fetching $accession \n";
        my $data = get("http://www.ncbi.nlm.nih.gov/entrez/batchseq.cgi?cmd=&txt=on&save=&cfm=&term=&list_uids=$accession&db=nucleotide&extrafeat=16&view=fasta&dispmax=20&SendTo=t&__from=&__to=&__strand=");
        print FASTA $data;
        print "Appending \"$accession\" to \"fetched_ncbi_sequences.txt\".\n";
    }

\End Code

I have been trying to figure out why this is occurring and have remained stumped for 3 hours now. Any suggestions?

-Mike

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/ http://learn.perl.org/first-response
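Judging only from the posted code, the empty first round most likely comes from the call to printSequence that stands on its own line before the foreach loop: it passes no argument, so the subroutine runs once with an empty accession. Below is a minimal sketch of one way to guard against that. The names print_sequence, expand_ids and @fetched are made up for illustration, and the download itself is replaced by a stand-in so the example runs without network access.

```perl
use strict;
use warnings;

my @fetched;    # hypothetical stand-in for the real LWP::Simple download

sub print_sequence {
    my ($accession) = @_;    # list assignment: takes the first argument

    # A call without an argument would otherwise "fetch" an empty ID.
    return unless defined($accession) && length($accession);

    push @fetched, $accession;
    print "Fetching $accession\n";
}

# Turn "cv889431-cv889433" into the individual accession numbers.
# Assumes the numeric part has no leading zeros.
sub expand_ids {
    my ($id) = @_;
    if ( $id =~ /^([a-z]*)(\d+)-[a-z]*(\d+)$/i ) {
        my ( $letters, $first, $last ) = ( $1, $2, $3 );
        return map { $letters . $_ } $first .. $last;
    }
    return $id;
}

print_sequence();    # the stray call is now ignored instead of fetching nothing
print_sequence($_) for map { expand_ids($_) } 'cv889431-cv889433', 'cv889410';
```

Simply deleting the stray call would have the same effect; the guard only makes the subroutine robust against empty input lines as well.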
Re: counting gaps in sequence data
Errin,

Thanks so much! I will spend the weekend going over what you've posted. Looks like I will learn a lot from this post alone. This stuff is so addictive. I can spend hours doing this and not realize it. Whether I am successful or not is another story! :-)

I'll definitely let you know if I have any trouble.

-Cheers!
-Mike

On Oct 15, 2004, at 7:22 AM, Errin Larsen wrote:

> On Thu, 14 Oct 2004 16:11:42 -0600, Michael Robeson [EMAIL PROTECTED] wrote:
>> Yeah, I have just submitted that same question verbatim to the
>> bio-perl list. I am still running through some ideas though. I have
>> both Bioinformatics perl books. They are not very effective teaching
>> books. The books spend too much time on using modules. Though while I
>> understand the usefulness of not having to re-write code, it is a bad
>> idea for beginners like me. Because re-writing code at first gives me
>> a lot of practice. Some of the scripts in the books use like 3-5
>> modules, so it gets confusing on what is going on. I mean the books
>> are not useless, but they definitely are structured for a class with
>> a teacher. :-)
>>
>> -Mike
>
> Hi again, Mike!
>
> I've thrown together the following code. I have not commented this! If
> you have some questions, just ask. I hard coded the sequences for my
> ease-of-use. It looked to me like you have figured out how to grab the
> sequences out of a file and throw them in a hash. This code uses some
> deep nested references, and therefore, some crazy dereferences. Have
> fun with it, I know I did!
>
> Things that might look weird: check out perldoc -f split for info on
> using a null-string to split with (that's where I found it!) and of
> course perldoc perlref for all the deep nested references and
> dereferencing stuff! I'm currently reading "Learning Perl Objects,
> References & Modules" by Randal Schwartz. I highly recommend it. It
> helped a lot in this exercise.
>
> Here's the code:
>
>     use warnings;
>     use strict;
>
>     my %sequences = (
>         'Human' => "acgtt---cgatacg---acgact-----t",
>         'Chimp' => "acgtt---cgatacg---acgact-----t",
>         'Mouse' => "acgata---acgatcg----acgt",
>     );
>     my %results;
>
>     foreach my $species( keys %sequences ) {
>         my $is_base_pair_gap = 0;
>         my $base_pair_gap;
>         my $base_pair_gap_pos;
>         my $position = 1;
>         foreach( split( / */, $sequences{$species} )) {
>             if( /-/ ) {
>                 unless( $is_base_pair_gap ) {
>                     $base_pair_gap_pos = $position;
>                 }
>                 $is_base_pair_gap = 1;
>                 $base_pair_gap .= $_;
>             } elsif( $is_base_pair_gap ) {
>                 push @{$results{$species}{length($base_pair_gap)}}, $base_pair_gap_pos;
>                 $is_base_pair_gap = 0;
>                 $base_pair_gap = undef;
>             }
>             $position++;
>         }
>     }
>
>     foreach my $species( keys %results ) {
>         print "$species:\n";
>         foreach my $base_pair_gap( keys %{$results{$species}} ) {
>             print "Number of $base_pair_gap base pair gaps:\t", scalar( @{$results{$species}{$base_pair_gap}}), "\n";
>             print "  at position(s) ", join( ',', @{$results{$species}{$base_pair_gap}} ), ".\n";
>         }
>         print "\n";
>     }
>
> The heart of this code is this line:
>
>     push @{$results{$species}{length($base_pair_gap)}}, $base_pair_gap_pos;
>
> There is a %results hash which has keys that are the different
> species, and values that point to another hash. THAT hash (the inner
> hash) has keys that are the length of the base-pair-gaps, and values
> that point to an array. The array holds a list of the positions of
> those base-pair gaps!
>
> The first base pair gap in the human sequence is '---' at the 6th
> character. That looks like this (warning: pseudo code for clarity!)
>
>     %results->{'Human'}->{ 3 }->[6]
>
> When we find the second '---' gap, we add its position to the array:
>
>     %results->{'Human'}->{ 3 }->[6,16]
>
> Then, we find a new base-pair-gap ('-----') so we add a new key to
> the inner hash:
>
>     %results->{'Human'}->{ 3 }->[6,16]
>                        ->{ 5 }->[25]
>
> Next, we move on to the next species ...
>
>     %results->{'Human'}->{ 3 }->[6,16]
>                        ->{ 5 }->[25]
>             ->{'Mouse'}->{ 3 }->[7]
>
> So, finally, with Data::Dumper, we can see the %results hash when the
> code is done processing the sequence:
>
>     %results = {
>         'Human' => {
>             '3' => [ 6, 16 ],
>             '5' => [ 25 ]
>         },
>         'Mouse' => {
>             '4' => [ 17 ],
>             '3' => [ 7 ]
>         },
>         'Chimp' => {
>             '3' => [ 6, 16
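The nested structure described above can be tried out in isolation. This is only a small sketch of the idea, not Errin's full program: it pushes the three human gap positions by hand to show that the inner hash and the arrays spring into existence on first use (autovivification), and then reads them back. The variable names @three_gaps and $dump are illustrative.

```perl
use strict;
use warnings;
use Data::Dumper;

my %results;

# Each push creates the inner hash and the array on first use.
push @{ $results{Human}{3} }, 6;     # first '---' gap, at position 6
push @{ $results{Human}{3} }, 16;    # second '---' gap, at position 16
push @{ $results{Human}{5} }, 25;    # the five-dash gap, at position 25

# Dereference the array again to count and list the positions.
my @three_gaps = @{ $results{Human}{3} };
print "3 base pair gaps: ", scalar(@three_gaps),
      " at ", join( ', ', @three_gaps ), "\n";

$Data::Dumper::Sortkeys = 1;         # stable key order in the dump
my $dump = Dumper( \%results );
print $dump;
```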
counting gaps in sequence data
I have a set of data that looks something like the following:

    human
    acgtt---cgatacg---acgact-----t

    chimp
    acgtacgatac---actgca---ac

    mouse
    acgata---acgatcg----acgt

I am having trouble setting up a hash etc., to count the number and types of continuous gaps. For example the 'human' sequence above has 2 sets of 3 gaps and 1 set of 5 gaps. The 'chimp' has 2 sets of 3 gaps and finally the 'mouse' has 1 set of 3 gaps and 1 set of 4 gaps.

So, I am having trouble being able to assign a dynamic variable (i.e. gap length) and place that in a pattern match so that it can count how many gaps of that length are in that particular sequence. I know how to set up a hash to count the number of times a gap appears: '$gaptype{$gap}++' or something. The problem is: what is the best way (and how) can I set '$gap' to be dynamic? I need to know the length of each consecutive string of gaps. I know how to count the gaps by using the 'tr' function. But it gets confusing when I need to add counts to every instance of that gap length.

I also need to know the position of each gap (denoted by the position of the first gap in that particular instance). I know that I can use the 'pos()' command for this.

So, my problem is that I think I know some of the bits of code to put into place; the problem is I am getting lost on how to structure it all together. For now I am just trying to get my output to look like this:

    Human
    number of 3 base pair gaps: 2
      at positions: 6, 16
    number of 5 base pair gaps: 1
      at positions: 25

    Chimp
    and so on ...

So, any suggestions would be greatly appreciated. If anyone can help me out with all or even just bits of this I would greatly appreciate it. This should help me get started on some more advanced parsing I need to do after this. I like to try and figure things out on my own if I can, so even pseudo code would be of great help!

-Thanks
-Mike
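One possible way to make the gap length "dynamic" is to let a regex capture each run of dashes and use the length of the capture as the hash key. The sketch below assumes the sequences are already in a hash and are written with the gap lengths the post describes (3, 3 and 5 for human); find_gaps is a made-up helper name. The start of each run is taken from the built-in @- array rather than pos(), since $-[1] holds the offset where the first capture group began.

```perl
use strict;
use warnings;

# Return a hash ref: gap length => [ 1-based start positions ].
sub find_gaps {
    my ($seq) = @_;
    my %gaps;
    while ( $seq =~ /(-+)/g ) {
        # $-[1] is the 0-based offset where capture group 1 started.
        push @{ $gaps{ length $1 } }, $-[1] + 1;
    }
    return \%gaps;
}

my %sequences = (
    human => 'acgtt---cgatacg---acgact-----t',
    chimp => 'acgtacgatac---actgca---ac',
    mouse => 'acgata---acgatcg----acgt',
);

for my $name ( sort keys %sequences ) {
    my $gaps = find_gaps( $sequences{$name} );
    print ucfirst($name), "\n";
    for my $len ( sort { $a <=> $b } keys %$gaps ) {
        my @positions = @{ $gaps->{$len} };
        printf "number of %d base pair gaps: %d\n  at positions: %s\n",
            $len, scalar @positions, join( ', ', @positions );
    }
    print "\n";
}
```

Unlike a character-by-character scan, this also records a gap that runs to the very end of a sequence.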
Re: counting gaps in sequence data
Here is as far as I can get with some real code and pseudo code. This is just to show you that I am actually trying. :-)

Pseudo - code

    # open DNA sequence file
    print "Enter in the name of the DNA sequence file:\n";
    chomp (my $dna_seq = <STDIN>);
    open(DNA_SEQ, $dna_seq) or die "Can't open sequence file for input: $!";

    # read sequence data into a hash - not sure if this is how I should do it?
    my %sequences;
    $/ = '';  # set to read in paragraph mode
    print "\n***Discovered the following DNA sequences:***\n";
    while ( <DNA_SEQ> ) {
        chomp;
        next unless s/^\s*(.+)//;
        my $name = $1;
        s/\s//g;
        $sequences{$name} = $_;
        print "$name found!\n";
    }
    close DNA_SEQ;

    ###
    # search for and characterize gaps
    ###

    somehow get data from hash and present it to a loop

    %gaptype;

    major pseudo code below

    foreach /\D(-+)\D/ found in each sequence   # searches for gaps flanked by sequence
        $position = pos($1);
        $gaplength = $1;
        $gaplength =~ tr/-//g;     # count the number of '-' for that particular
                                   # gap being processed
        $gaptype{$gaplength}++;    # count the number of times each gap type appears

    somehow get information from loop and print as seen below

OUTPUT_

    Human
    number of 3 base pair gaps: 2
      at positions: 6, 16
    number of 5 base pair gaps: 1
      at positions: 25
    .
    .. and so on...
    .

    __DATA__
    human
    acgtt---cgatacg---acgact-----t

    chimp
    acgtacgatac---actgca---ac

    mouse
    acgata---acgatcg----acgt
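The reading step above can be checked without a file on disk. The following is a rough sketch, assuming each record is a name line followed by one or more sequence lines, with records separated by blank lines; read_records is an illustrative name, and the in-memory filehandle merely stands in for the real sequence file. Setting $/ with local keeps paragraph mode from leaking into the rest of the program.

```perl
use strict;
use warnings;

# Read blank-line-separated records into a hash: name => sequence.
sub read_records {
    my ($fh) = @_;
    local $/ = '';    # paragraph mode, restored when the sub returns
    my %sequences;
    while ( my $record = <$fh> ) {
        chomp $record;                    # strips all trailing newlines here
        my ( $name, @lines ) = split /\n/, $record;
        next unless defined $name;
        $name =~ s/^\s+|\s+$//g;          # trim the name line
        my $seq = join '', @lines;
        $seq =~ s/\s//g;                  # join wrapped sequence lines
        $sequences{$name} = $seq;
    }
    return %sequences;
}

# Sample data standing in for the sequence file (an assumption).
my $data = "human\nacgtt---cgatacg---acgact-----t\n\n"
         . "chimp\nacgtacgatac---actgca---ac\n\n"
         . "mouse\nacgata---acgatcg----acgt\n";

open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
my %sequences = read_records($fh);
close $fh;

print "$_ found!\n" for sort keys %sequences;
```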
Re: counting gaps in sequence data
Yeah, I have just submitted that same question verbatim to the bio-perl list. I am still running through some ideas though. I have both Bioinformatics perl books. They are not very effective teaching books. The books spend too much time on using modules. Though while I understand the usefulness of not having to re-write code, it is a bad idea for beginners like me. Because re-writing code at first gives me a lot of practice. Some of the scripts in the books use like 3-5 modules, so it gets confusing on what is going on. I mean the books are not useless, but they definitely are structured for a class with a teacher. :-) -Mike
Moving between hashes 2.
Gunnar,

Thanks so much for the help and the links! They help quite a bit. I decided to use the if statement you posted:

    if ( $aa eq '-' ) {
        $hash3{$_} .= '---';
    } else {
        $hash3{$_} .= substr $dna,0,3,'';
    }

instead of:

    $hash3{$_} .= $aa eq '-' ? '---' : substr $dna,0,3,'';

only because I had to add a $count++ within the else block (shown below) to accomplish another task within my larger script:

    if ( $aa eq '-' ) {
        $hash3{$_} .= '---';
    } else {
        $hash3{$_} .= substr $dna,0,3,'';
        $count++
    }

I couldn't figure out if it was possible to add $count++ within the ?: statement above. I tried but could not get it to work. However, everything works well at this point. Again, I really appreciate the help!

-Mike

On Sep 20, 2004, at 6:55 PM, [EMAIL PROTECTED] wrote:

> From: Gunnar Hjalmarsson [EMAIL PROTECTED]
> Date: September 19, 2004 9:12:32 PM MDT
> To: [EMAIL PROTECTED]
> Subject: Re: Moving between hashes 2.
>
> Michael S. Robeson II wrote:
>> Ok, well I think I can see the forest but I have little idea as to
>> what is actually going on here. I spent a few hours looking things up
>> and I have a general sense of what is actually occurring but I am
>> getting lost in the details that were posted in the last digest.
>
> Well, before an attempt to explain and/or point you to the applicable
> docs, I'd like to change my mind once again. :) This is my latest
> idea:
>
>     my %hash3;
>     for ( keys %hash1 ) {
>         my $dna = $hash2{$_};
>         for my $aa ( split //, $hash1{$_} ) {
>             $hash3{$_} .= $aa eq '-' ? '---' : substr $dna,0,3,'';
>         }
>     }
>
> I'll assume that you don't have a problem with the outer loop, which
> simply iterates over the hash keys. As a first step in each iteration
> I copy the DNA sequence to the $dna variable, so as not to destroy
> %hash2.
>
> Over to the 'tricky' part. The inner loop iterates over each
> character in the amino-acid sequence data, and each character in turn
> is assigned to $aa. For that I use the split() function:
>
>     http://www.perldoc.com/perl5.8.4/pod/func/split.html
>
>>     $hash3{$_} .= $aa eq '-' ? '---' : substr $hash2{$_},0,3,'';
>>
>> This is something new to me. I think I follow your use of the ?:
>> pattern feature. However, none of the perl books I have discuss its
>> use in this fashion.
>
> That sounds strange to me, because that's how it should be used...
> Read about the conditional operator in
>
>     http://www.perldoc.com/perl5.8.4/pod/perlop.html
>
> OTOH, that notation is basically the same as:
>
>     if ( $aa eq '-' ) {
>         $hash3{$_} .= '---';
>     } else {
>         $hash3{$_} .= substr $dna,0,3,'';
>     }
>
> which is a little more intuitive (at least I think it is).
>
>> So, as far as I can tell, you are saying: hey, if you find '-' in
>> $aa then append a '---' in $hash3, otherwise append the next three
>> DNA letters.
>
> Precisely.
>
>> However, I do not understand the syntax of how perl is actually
>> doing this.
>
> Hopefully the if/else statement makes it easier to grasp, and the
> '.=' operator is used just for appending something to a string.
>
> Finally we have my use of the substr() function.
>
>     http://www.perldoc.com/perl5.8.4/pod/func/substr.html
>
> It returns the first three characters in $dna, and since I also pass
> the null string as the fourth argument, it changes the content of
> $dna at the same time, i.e. it replaces the first three characters
> with nothing.
>
> HTH. If you need further explanations, you'll have to ask specific
> questions.
>
> --
> Gunnar Hjalmarsson
> Email: http://www.gunnar.cc/cgi-bin/contact.pl
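On the open question of whether $count++ can live inside the ?: form: one option, offered here only as a sketch, is to wrap the false branch in a do { } block, whose value is that of its last statement. The helper name align_dna and the sample strings are assumptions for illustration; the if/else version remains arguably easier to read.

```perl
use strict;
use warnings;

# Expand an amino-acid alignment into a codon alignment and count
# how many codons were consumed from the DNA string.
sub align_dna {
    my ( $aa_seq, $dna ) = @_;
    my $aligned = '';
    my $count   = 0;
    for my $aa ( split //, $aa_seq ) {
        $aligned .= $aa eq '-'
            ? '---'
            : do { $count++; substr $dna, 0, 3, '' };   # 4-arg substr removes the codon
    }
    return ( $aligned, $count );
}

my ( $aligned, $count ) = align_dna( 'mf-d-f', 'agtcatcactcg' );
print "$aligned ($count codons consumed)\n";
```

The do block runs only when the character is not a dash, so the counter advances exactly once per consumed codon.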
Moving between hashes 2.
**Sorry, if this is a repeat. Wasn't sure if the mail went through. If you already replied can you re-send it to my e-mail address above as well? Thanks!***

I have two sets of data that have been stored in hashes. The first hash has amino-acid (protein) sequence data. The second hash has the corresponding DNA sequence of those amino-acids:

    Hash 1
    key:     value:
    cat   => mfgdhf
    dog   => mfg--f
    mouse => mf-d-f

    Hash 2
    key:     value:
    cat   => agtcatgcacactgatcg
    dog   => agtcatgcatcg
    mouse => agtcatcactcg

And I need to insert gaps (missing or absent data) proportionally into the DNA sequence (Hash 2) so that the output is as follows:

    Hash 3
    key:     value:
    cat   => agtcatgcacactgatcg
    dog   => agtcatgca------tcg
    mouse => agtcat---cac---tcg

It doesn't look right here, but all the lines should end up being the same length with courier font.

Basically, I am having trouble scanning through, say... hash1{cat} and for every dash found there being finally represented as three dashes in hash3{cat}. Also, every amino-acid is represented by 3 DNA letters. This is why I need to move in increments of 3 and add in increments of 3 for my final data to appear as it does in Hash 3.

Example of relationship:

     M   F   -   D   -   F   = amino-acid
    agt tca --- act --- tcg  = dna

I have everything else set up; I just need a few suggestions on how to do the above. Any help will be greatly appreciated.
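As a rough illustration of "moving in increments of 3", one approach is to cut the DNA string into codons up front and then walk the amino-acid string, emitting either '---' or the next codon. This is a sketch under assumptions, not the method settled on in the thread: %protein, %dna, %aligned and insert_gaps are made-up names, and the data is the three-entry example with the spelling 'dog' used for both hashes.

```perl
use strict;
use warnings;

my %protein = ( cat => 'mfgdhf', dog => 'mfg--f', mouse => 'mf-d-f' );
my %dna     = (
    cat   => 'agtcatgcacactgatcg',
    dog   => 'agtcatgcatcg',
    mouse => 'agtcatcactcg',
);

# Build the gapped DNA string for one amino-acid / DNA pair.
sub insert_gaps {
    my ( $aa_seq, $dna_seq ) = @_;
    my @codons = unpack '(A3)*', $dna_seq;    # split into 3-letter chunks
    return join '',
        map { $_ eq '-' ? '---' : shift @codons } split //, $aa_seq;
}

my %aligned;
for my $name ( sort keys %protein ) {
    $aligned{$name} = insert_gaps( $protein{$name}, $dna{$name} );
    printf "%-5s => %s\n", $name, $aligned{$name};
}
```

Because the codons are shifted off a copy, the original DNA hash is left untouched.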
Re: combining data from more than one file...
Thanks to those that helped. The code works great. Now I will practice honing it down to the bare essentials. Below is the final code you all helped with.

-Thanks a million!
-Mike

Begin PERL Code

    #! /usr/bin/perl -w
    use strict;
    use FileHandle;

    my %organisms;

    print "Enter in a list of files to be processed:\n";
    # For example:
    # CytB.fasta
    # NADH1.fasta
    #
    chomp (my @infiles = <STDIN>);
    # TODO we should make this nice later
    #my @infiles = ('genetics.txt');

    print "Enter in the name of the OUTFILE:\n";
    chomp (my $outfile = <STDIN>);
    open(OUTFILE, ">$outfile") or die "Can't open OUTFILE: $!";

    foreach my $infile (@infiles) {
        my $FASTA = new FileHandle;
        open ($FASTA, $infile) or die "Can't open INFILE: $!";

        # I moved this variable outside the while-loop
        # in order to be able to assign the data in
        # the next line to the organism it belongs to
        # (we're keeping track of the last start line
        # that we came across here)
        my $orgID;

        while (defined($_ = <$FASTA>)) {
            chomp;
            print "\nWorking on $_\n";

            # see if this line is the start of an
            # organism; the thing we're searching for
            # looks like this:
            #   >dog
            # so try to match something like
            #   \s*  zero-to-many characters of
            #        optional whitespace
            #   >    the bigger-than sign
            #   \w+  one-to-many (word) characters
            # the parentheses around the \w+ mean that
            # we want to access this value later using $1
            if (/\s*>(\w+)/) {
                $orgID = $1;
                print "Found a new organism start line ('$orgID')\n";
            }
            # or just some data belonging to the last
            # organism we found
            else {
                print "Sequence data found: $_\n";
                print "Appending data to $orgID\n";
                # let's check if we've got data for this entry
                if (exists ($organisms{$orgID})) {
                    # append the data to the existing hash entry
                    $organisms{$orgID} .= $_;
                } else {
                    # create a new hash entry for this data
                    $organisms{$orgID} = $_;
                }
            }
        }

        # do not forget to close the input file
        close ($FASTA) or die "could not close INFILE : $!";
    }

    # we've processed all input files...print the resulting hash
    print "\n\n";
    while (my ($orgID, $sequence) = each(%organisms)) {
        print OUTFILE ">$orgID\n$sequence\n\n";
    }

END PERL CODE
RE: combining data from more than one file...
Sorry, I meant to upload this script (see below). However, I have one last question. Why can't I use

    s/\n//g;

instead of

    tr/A-Za-z-//cd;

in the script below? I thought it would be simpler to remove the newline characters from $_, which is all I really want to do. However, most of the time all I will see are - and letters, which is why I set the tr function the way I did. I just couldn't figure out why the substitution function wouldn't work in this case. How am I setting it up wrong?

-Thanks
-Mike

BEGIN PERL SCRIPT

    #! /usr/bin/perl -w
    use strict;
    use FileHandle;

    my %organisms;

    print "Enter in a list of files to be processed:\n";
    # For example:
    # CytB.fasta
    # NADH1.fasta
    #
    chomp (my @infiles = <STDIN>);

    print "Enter in the name of the OUTFILE:\n";
    chomp (my $outfile = <STDIN>);
    open(OUTFILE, ">$outfile") or die "Can't open OUTFILE: $!";

    foreach my $infile (@infiles) {
        my $FASTA = new FileHandle;
        open ($FASTA, $infile) or die "Can't open INFILE: $!";
        my $orgID;

        while (defined($_ = <$FASTA>)) {
            chomp;
            print "\n Processing $_\n";
            if (/\s*>(\w+)/) {
                $orgID = $1;
                print "Found a new organism start line ('$orgID')\n";
            } else {
                tr/A-Za-z-//cd;    # originally tried s/\n//g;
                print "Sequence data found: $_\n";
                print "Appending data to $orgID\n";
                $organisms{$orgID} .= $_;
            }
        }

        # do not forget to close the input file
        close ($FASTA) or die "could not close INFILE : $!";
    }

    # we've processed all input files...print the resulting hash
    print "\n\n";
    while (my ($orgID, $sequence) = each(%organisms)) {
        print OUTFILE ">$orgID\n$sequence\n\n";
    }

END PERL SCRIPT
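The post does not say what went wrong with s/\n//g, so the following is only one possible explanation, assuming the input files carry carriage returns (for example files saved with DOS or classic Mac line endings). After chomp the "\n" is already gone, so s/\n//g has nothing left to remove, while a leftover "\r" survives; tr/A-Za-z-//cd deletes every character that is not a letter or a dash, including the "\r". The sample line below is made up to show the difference.

```perl
use strict;
use warnings;

my $line = "acg---tt\r\n";    # hypothetical line with a DOS line ending
chomp $line;                  # removes the "\n" but leaves the "\r"

( my $via_s  = $line ) =~ s/\n//g;           # no "\n" left: string unchanged
( my $via_tr = $line ) =~ tr/A-Za-z-//cd;    # deletes all but letters and '-'

printf "after s/\\n//g:        %d characters\n", length $via_s;
printf "after tr/A-Za-z-//cd: %d characters\n", length $via_tr;
```

If this is indeed the cause, s/\s+//g or s/[\r\n]//g would also clean the line while leaving any other unexpected characters visible.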
Re: combining data from more than one file...
Ok great. Most of what you show does make sense. However, there are some bits of code that I need further clarification with. Some bits I am able to tell what they are doing, but I do not quite know how or why they work the way they do. I'll state these areas in the code we've got together at this point. Hopefully, I have copied over the bits you wrote correctly.

I find this is like learning Spanish. I can read and (roughly) get the gist of the code. But writing the original code on my own is when I have trouble. I am sure this will go away when I practice more. :-)

I didn't finish everything because I just need some code explained / clarified.

Start PERL code

    #!/usr/bin/perl -w
    use strict;
    use FileHandle;
    # I am unsure of what this module is. I've tried looking it up
    # in the Camel and Llama book to no avail, not enough description.
    # I guess I have to figure out the whole object thing?

    my %organisms;

    print "Enter in a list of files to be processed:\n";
    # For example:
    # Cytb.fasta
    # NADH1.fasta
    # ...
    # chomp (my @infiles = <STDIN>);
    # TODO we should make this nicer later
    my @infiles = ('genetics.txt');

    foreach my $infile (@infiles) {
        my $FASTA = new FileHandle;
        # Does the above statement tell PERL to create a new
        # filehandle for each file it finds? I guess I need to understand
        # what "new" and the module "FileHandle" are doing.
        open ($FASTA, $infile) or die "Can't open INFILE: $!";
        #$/='' #Set input operator
        my $orgID;

        while (defined($_ = <$FASTA>)) {
            # Above I am unsure of why the "defined" function
            # helps us here? I know it has something to do with an
            # expression containing a valid string, but I am unsure
            # of its function here. This is something I would have
            # never thought to do. :-)
            chomp;
            print "\nworking on $_\n";
            if (/\s*>(\w+)/) {
                $orgID = $1;
                print "Found a new organism start line ('$orgID')\n";
                # The above regex makes complete sense. Actually, I was going to put
                # something similar to that in my original post but wasn't sure
                # if this was appropriate at the time. I guess it was!
            } else {
                print "This is just some data: $_\n";
                print "This data needs to be appended to the hash entry for $orgID\n";
                # okay, in the above you are taking the left over
                # sequence ($_) and linking it as a value to $orgID ?
                if (exists ($organisms{$orgID})) {
                    #TODO append the data to the hash here
                    # I guess I would put the following to append to
                    # the already existing hash:
                    # $organisms{$orgID} .= $_;
                } else {
                    #create new hash entry for this data
                    $organisms{$orgID} = $_;
                }
            }
        }

        # Do not forget to close the input file
        close ($FASTA) or die "Could not close INFILE: $!";
    }

    # We've processed all input files... print the resulting hash
    print "\n*\n";
    while (my($orgID, $sequence) = each(%organisms)) {
        # since I want the output as:
        # >cat
        # actgac---cgatc-ag-cttag---acg
        # >dog
        # actatc---actat-at-accta---atc
        # I would change the print statement to:
        print ">" . "$orgID\n$sequence\n";
    }

end PERL code

Thanks for all your help so far! Most of this is starting to help my thinking. I will be doing a lot more of this multi-file parsing, as most of my work entails manipulating data in several files or folders at once.

-Mike
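On the two questions in the comments above, a short sketch may help. `new FileHandle` creates a fresh filehandle object on each pass through the loop; a commonly used alternative that needs no module is a lexical handle from `open my $fh, ...`. The `defined` test makes the loop stop only at end of file (where the read returns undef) rather than on a line that happens to be false, such as a final "0" without a newline. The example below assumes the header lines look like ">cat"; read_fasta and @files are illustrative names, and in-memory strings stand in for the real input files.

```perl
use strict;
use warnings;

# Append the sequence lines of one FASTA-style handle to %$organisms.
sub read_fasta {
    my ( $fh, $organisms ) = @_;
    my $org_id;
    while ( defined( my $line = <$fh> ) ) {    # undef only at end of file
        chomp $line;
        if ( $line =~ /^\s*>(\w+)/ ) {
            $org_id = $1;                      # remember the current organism
        }
        elsif ( defined $org_id ) {
            $line =~ tr/A-Za-z-//cd;           # keep letters and gaps only
            $organisms->{$org_id} .= $line;    # .= also creates a new entry
        }
    }
}

# Two in-memory "files" standing in for e.g. CytB.fasta and NADH1.fasta.
my @files = ( ">cat\nactg\n-ac\n>dog\nttga\n", ">cat\nggg\n" );

my %organisms;
for my $text (@files) {
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    read_fasta( $fh, \%organisms );
    close $fh;
}

print ">$_\n$organisms{$_}\n" for sort keys %organisms;
```

Since `.=` on a missing hash entry simply starts a new string, the separate exists check is not strictly needed.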