Ok great. Most of what you show does make sense. However, there are some bits of code that I need further clarification on. For some bits I can tell what they are doing, but I do not quite know how or why they work the way they do. I'll point out these areas in the code we've put together so far.

Hopefully, I have copied over the bits you wrote correctly. I find this is like learning Spanish. I can read and (roughly) get the gist of the code, but writing original code on my own is where I have trouble. I am sure this will go away with more practice. :-)

I didn't finish everything because I just need some code explained / clarified.

>>>Start PERL code<<<<<

#!/usr/bin/perl -w

use strict;
use FileHandle;

# I am unsure of what this module is. I've tried looking it up
# in the Camel and Llama book to no avail, not enough description.
# I guess I have to figure out the whole object thing?

my %organisms;

print "Enter in a list of files to be processed:\n";

# For example:
# Cytb.fasta
# NADH1.fasta
# ...

# chomp (my @infiles = <STDIN>);
# TODO we should make this nicer later
my @infiles = ('genetics.txt');

foreach my $infile(@infiles) {
        my $FASTA = new FileHandle;
        
        # Does the above statement tell PERL to create a new
        # filehandle for each file it finds? I guess I need to understand
        # what "new" and the module "FileHandle" are doing.

        open ($FASTA, '<', $infile)
                or die "Can't open $infile: $!";
                
#$/='>' #Set input operator

my $orgID;

while (defined($_ = <$FASTA>)) {
        
        # Above I am unsure of why the "defined" function
        # helps us here. I know it has something to do with an
        # expression containing a valid string, but I am unsure
        # of its function here. This is something I would have
        # never thought to do.  :-)
        
        chomp;
        print "\nworking on >>$_<<\n";
        
        if (/^\s*>(\w+)/) {
                $orgID=$1;
                print "Found a new organism start line ('$orgID')\n";
        
        # The above regex makes complete sense. Actually, I was going to put
        # something similar to that in my original post but wasn't sure
        # if this was appropriate at the time. I guess it was!
                
        } else {
                print "This is just some data: $_\n";
                print "This data needs to be appended to the hash entry for $orgID\n";

                # okay, in the above you are taking the left over
                # sequence ($_) and linking it as a "value" to "$orgID" ?
                
                if (exists ($organisms{$orgID})) {
                #TODO append the data to the hash here
                
                # I guess I would put the following to append to
                # the already existing hash entry:
                # $organisms{$orgID} .= $_;
                
                } else {
                        #create new hash entry for this data
                        $organisms{$orgID} = $_;
                        }
                }       
        }
        
# Do not forget to close the input file
close ($FASTA)
        or die "Could not close $infile: $!";
} # end of foreach over @infiles

# We've processed all input files... print the resulting hash

print "\n*****************************************************\n";

while (my($orgID, $sequence) = each(%organisms)) {
        # since I want the output as:
        # >cat
        # actgac---cgatc-ag-cttag---acg
        # >dog
        # actatc---actat-at-accta---atc
        # I would change the print statement to:
        print ">$orgID\n$sequence\n";
}

exit;

>>>end PERL code<<<
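
To check my guess about appending with ".=", I also put together a small separate sketch. It is only a test under my own assumptions, not the finished script: the sample data is made up, the two "files" are strings held in memory, and parse_fasta is a helper name I invented. As far as I can tell, "defined" keeps the loop going even when a line is just "0", and ".=" creates the hash entry when it does not exist yet, so the exists check may not be needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (my own name): read FASTA-style lines from an open
# filehandle and append each sequence line to the entry for its organism.
sub parse_fasta {
    my ($fh, $organisms) = @_;
    my $orgID;
    # defined() tests whether a line was read at all, so a line that is
    # just "0" does not end the loop early.
    while (defined(my $line = <$fh>)) {
        chomp $line;
        if ($line =~ /^\s*>(\w+)/) {
            $orgID = $1;                      # header line: remember the organism
        } elsif (defined $orgID) {
            $organisms->{$orgID} .= $line;    # .= creates the entry if it is missing
        }
    }
    return;
}

# Made-up sample data: two "files" kept in memory so the sketch is self-contained.
my @samples = (
    ">cat\nactgac\n>dog\nactatc\n",
    ">cat\n---cgatc\n>dog\n---actat\n",
);

my %organisms;
foreach my $text (@samples) {
    # Opening a reference to a string gives an in-memory filehandle.
    open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
    parse_fasta($fh, \%organisms);
    close($fh) or die "Could not close in-memory file: $!";
}

# Print one record per organism, sorted so the order is predictable.
foreach my $orgID (sort keys %organisms) {
    print ">$orgID\n$organisms{$orgID}\n";
}
```

Here the sequence pieces for each organism from both samples end up joined under one key, which is the behaviour I am after for the real files.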

Thanks for all your help so far! Most of this is starting to help my thinking. I will be doing a lot more of this multi-file parsing, as most of my work entails manipulating data in several files or folders at once.

-Mike

