At 10:48 AM +0200 6/9/11, venkates wrote:
Hi,

data snippet:


I need to retrieve all the gene entries and add them to a hash ref. My code does that for the first record, but for the second record it also pulls in the REFERENCE information. I have provided the code below. If someone could tell me where exactly I am going wrong (is it the regex, or something else?), I would be glad!

code :

use strict;
use warnings;
use Carp;
use Data::Dumper;


my $set = parse("/home/venkates/workspace/KEGG_Parser/data/ko");

sub parse {

    my $kegg_file_path = shift;
    my $keggData; # Hash ref

Please simplify your program for posting by using a hash instead of a hash reference. Your goal should be to make it as easy as possible for people to help you. Once you learn how to solve your problems, you can use the solution in your actual program with whatever complexity is necessary.


    open my $fh, '<', $kegg_file_path or croak("Cannot open file '$kegg_file_path': $!");
    local $/ = "\n///\n";
    while (<$fh>){
        chomp;
        my $record = $_;


Why don't you just read into $record in the first place:

    while( my $record = <$fh> ) {


        $record =~ m/^ENTRY\s{7}(.+?)\s+/xms;
        my $entries = $1;
        if ($record =~ m/^GENES\s{7}(.+)$/xms){


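You are capturing everything from just after GENES to the end of the record, because with the /s modifier '.' also matches newlines. Try stopping the match at REFERENCE, with a non-greedy capture so that it stops at the first REFERENCE rather than the last (this only matches records that actually contain a REFERENCE line):

        if ($record =~ m/^GENES\s{7}(.+?)REFERENCE/xms){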


            my $gene = $1;
            ${$keggData}{$entries}{'GENE'} = $gene;
            my @genes = split ('\s{13}', $gene);
            foreach my $gene_element (@genes){
                my $taxon_label = substr($gene_element, 0, 3);
                my $gene_label = substr($gene_element, 5);
                my @gene_label_array = split '\s', $gene_label;
                push @{${$keggData}{$entries}{'GENES'}{$taxon_label}}, @gene_label_array;
            }
        }

    }
    print Dumper($keggData);
    close $fh;
}
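
To see the capture problem in isolation, here is a minimal sketch that compares the original greedy pattern with one possible bounded alternative. The record text is made up for illustration and only roughly follows the KEGG layout (12-column label field, continuation lines indented by 12 spaces). The lookahead is my own suggestion, not something from your code: it assumes every new section starts with a non-space character in column 1, which means it does not depend on REFERENCE being present.

```perl
use strict;
use warnings;

# Made-up record, assumed layout: 12-column label, then the value.
my $record = join "\n",
    'ENTRY       K00001                      KO',
    'GENES       HSA: 124 125',
    '            MMU: 11522',
    'REFERENCE   PMID:0000001',
    '  AUTHORS   Example A.',
    'REFERENCE   PMID:0000002',
    '';

# Greedy: with /s, '.' also matches newlines, so (.+) runs to the
# end of the record and swallows the REFERENCE sections.
my ($greedy) = $record =~ m/^GENES\s{7}(.+)$/xms;

# Bounded: non-greedy capture that stops before the next line that
# starts with a non-space character, or at the end of the string.
my ($bounded) = $record =~ m/^GENES\s{7}(.+?)(?=^\S|\z)/xms;

print "greedy:\n$greedy---\nbounded:\n$bounded";
```

The bounded capture still includes the continuation lines, so the later split on the 13-character indent should keep working as before.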

Please use the <DATA> file handle to make it easier to run your program. Put your file data at the end of the program after the line

__DATA__

then use <DATA> to read the data lines.
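
As a rough sketch of what such a posting could look like, the program below reads straight into $record from <DATA>, uses a plain hash, and keeps the sample data in the same file. The two records are invented sample data, the name %kegg_data is just my choice, and the GENES pattern assumes that each new section starts with a non-space character in column 1.

```perl
use strict;
use warnings;
use Data::Dumper;

my %kegg_data;    # plain hash instead of a hash reference

{
    local $/ = "\n///\n";    # read one record at a time
    while ( my $record = <DATA> ) {
        chomp $record;       # removes the "\n///\n" terminator

        # Skip records without an ENTRY or GENES section rather than
        # reusing a stale $1 from an earlier match.
        my ($entry) = $record =~ m/^ENTRY\s+(\S+)/xms                or next;
        my ($genes) = $record =~ m/^GENES\s+(.+?)(?=^\S|\z)/xms      or next;

        for my $line ( split /\n/, $genes ) {
            # split ' ' ignores the leading indent of continuation lines
            my ( $taxon, @ids ) = split ' ', $line;
            next unless defined $taxon;
            $taxon =~ s/:\z//;    # "HSA:" becomes "HSA"
            push @{ $kegg_data{$entry}{$taxon} }, @ids;
        }
    }
}

$Data::Dumper::Sortkeys = 1;
print Dumper( \%kegg_data );

__DATA__
ENTRY       K00001                      KO
GENES       HSA: 124 125
            MMU: 11522
///
ENTRY       K00002                      KO
GENES       ECO: b0001
REFERENCE   PMID:0000000
///
```

Anyone on the list can then copy the message into a file and run it without needing your ko file.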

Thanks.

--
Jim Gibson
j...@gibson.org

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/

