venkates wrote:
Hi,

Hello,

This is a snippet of the data

ENTRY       K00001 KO
NAME        E1.1.1.1, adh
DEFINITION  alcohol dehydrogenase [EC:1.1.1.1]
PATHWAY     ko00010 Glycolysis / Gluconeogenesis
            ko00071 Fatty acid metabolism
            ko00350 Tyrosine metabolism
            ko00625 Chloroalkane and chloroalkene degradation
            ko00626 Naphthalene degradation
            ko00830 Retinol metabolism
            ko00980 Metabolism of xenobiotics by cytochrome P450
            ko00982 Drug metabolism - cytochrome P450
///
ENTRY       K14865 KO
NAME        U14snoRNA, snR128
DEFINITION  U14 small nucleolar RNA
CLASS       Genetic Information Processing; Translation; Ribosome Biogenesis
            [BR:ko03009]
///

I am trying to store this in the following data structure by splitting
the file on "///" and storing each record in a hash, with the ENTRY number
as the primary key and all the other information stored under that key:

$VAR1 = {
    'K00001' => {
        'NAME' => [
            'E1.1.1.1',
            'adh'
        ],
        'DEFINITION' => 'alcohol dehydrogenase [EC:1.1.1.1]',
        'PATHWAY' => {
            'ko00010' => 'Glycolysis / Gluconeogenesis',
            'ko00071' => 'Fatty acid metabolism'
        }
    }
};
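For reference, here is a small sketch of how a hash of hashes like the one described above can be written as a Perl literal and read back. The data are copied from the snippet; %keggData and $rec are just illustrative names.

```perl
use strict;
use warnings;

# Top-level key: the ENTRY number; value: a hash of that record's fields.
my %keggData = (
    K00001 => {
        NAME       => [ 'E1.1.1.1', 'adh' ],                    # array ref
        DEFINITION => 'alcohol dehydrogenase [EC:1.1.1.1]',     # plain string
        PATHWAY    => {                                         # hash ref: id => description
            ko00010 => 'Glycolysis / Gluconeogenesis',
            ko00071 => 'Fatty acid metabolism',
        },
    },
);

my $rec = $keggData{K00001};
print "Names: @{ $rec->{NAME} }\n";
for my $id ( sort keys %{ $rec->{PATHWAY} } ) {
    print "$id => $rec->{PATHWAY}{$id}\n";
}
```

Each field can have its own shape (string, array ref, hash ref) because the values under 'K00001' are independent scalars.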

I have started off with the following code:

sub parse {
    my $kegg_file_path = shift;
    my %keggData;
    open my $fh, '<', $kegg_file_path || croak ("Cannot open file '$kegg_file_path': $!");

Because of the high precedence of the || operator, that will only croak() if the value of $kegg_file_path is false, not if the file cannot be opened: Perl parses it as open( my $fh, '<', ( $kegg_file_path || croak(...) ) ). You need to either use parentheses with open:

open( my $fh, '<', $kegg_file_path ) || croak( "Cannot open file '$kegg_file_path': $!" );

Or use the low-precedence 'or' operator:

open my $fh, '<', $kegg_file_path or croak( "Cannot open file '$kegg_file_path': $!" );
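A quick way to convince yourself of the difference is to try both forms against a file that does not exist. This sketch builds a guaranteed-missing path inside a fresh temporary directory; the file name 'missing.txt' is hypothetical.

```perl
use strict;
use warnings;
use Carp qw(croak);
use File::Spec;
use File::Temp qw(tempdir);

# A path that cannot exist: a made-up name inside a brand-new temp directory.
my $missing = File::Spec->catfile( tempdir( CLEANUP => 1 ), 'missing.txt' );

# Buggy form: || binds to $missing, so croak() only runs if the path is false.
my $buggy_died = !eval {
    open my $fh, '<', $missing || croak("Cannot open file '$missing': $!");
    1;
};

# Fixed form: low-precedence 'or' tests the return value of open() itself.
my $fixed_died = !eval {
    open my $fh, '<', $missing or croak("Cannot open file '$missing': $!");
    1;
};

print 'buggy form died: ', ( $buggy_died ? 'yes' : 'no' ), "\n";
print 'fixed form died: ', ( $fixed_died ? 'yes' : 'no' ), "\n";
```

The buggy form silently carries on with an unopened filehandle, which is exactly the failure you want croak() to catch.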


    my $contents = do{local $/, <$fh>};
    my @dataArray = split ('///', $contents);
    foreach my $currentLine (@dataArray){

That would probably be better as:

    local $/ = "///\n";
    while ( <$fh> ) {

Why read the whole file in when you are only processing one record at a time?
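Here is a minimal sketch of the record-at-a-time loop. It reads from an in-memory filehandle (an abbreviated copy of the sample data) only so that it runs on its own; with a real file you would open the path instead. Note that chomp removes whatever $/ currently holds, so it strips the trailing "///\n" from each record.

```perl
use strict;
use warnings;

my $data = <<'END';
ENTRY       K00001 KO
NAME        E1.1.1.1, adh
///
ENTRY       K14865 KO
NAME        U14snoRNA, snR128
///
END

open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my @records;
{
    local $/ = "///\n";    # each read returns one whole record
    while ( my $record = <$fh> ) {
        chomp $record;     # chomp removes the current $/, i.e. "///\n"
        push @records, $record;
    }
}
close $fh;

printf "%d records\n", scalar @records;
```

Because the separator includes the newline, every record, not just the first, starts directly with "ENTRY".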


        if ($currentLine =~ /^ENTRY\s{7}(.+?)\s+/){

Because you are splitting on '///', every record after the first will start with "\nENTRY", and /^ENTRY/ only matches if 'ENTRY' is at the beginning of the string, not after a newline as in "\nENTRY".
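The sketch below shows the anchor problem on a made-up record string with a leading newline, plus one possible fix if you keep split(): the /m modifier, which lets ^ also match right after an embedded newline. (\S+) is a simplified stand-in for the original (.+?)\s+.

```perl
use strict;
use warnings;

# Hypothetical record as produced by split('///', ...): note the leading "\n".
# ENTRY plus exactly 7 spaces, so \s{7} in the pattern lines up.
my $record = "\n" . 'ENTRY' . ( ' ' x 7 ) . "K14865 KO\nNAME        U14snoRNA, snR128\n";

# Without /m, ^ anchors only at the very start of the string.
my ($plain) = $record =~ /^ENTRY\s{7}(\S+)/;

# With /m, ^ can also match just after any newline inside the string.
my ($multi) = $record =~ /^ENTRY\s{7}(\S+)/m;

print defined $plain ? "plain: $plain\n" : "plain: no match\n";
print "multi: $multi\n";
```

Reading with $/ set to "///\n" avoids the leading newline altogether, so plain ^ works there.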


            my $value = $1;
            $keggData{'ENTRY'} = $value;

You don't show a key of 'ENTRY' in your desired data structure.
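As written, every record also writes to the same 'ENTRY' key, so only the last entry number survives. A tiny sketch of the difference, using the two entry numbers from the sample data:

```perl
use strict;
use warnings;

my @entries = ( 'K00001', 'K14865' );    # entry numbers from the sample data

# As in the original loop: every record writes to the same 'ENTRY' key.
my %overwritten;
$overwritten{'ENTRY'} = $_ for @entries;

# Keyed by entry number: each record gets its own slot for its fields.
my %keggData;
$keggData{$_} = {} for @entries;

printf "%d key(s) vs %d key(s)\n", scalar( keys %overwritten ), scalar( keys %keggData );
print "ENTRY holds only: $overwritten{ENTRY}\n";
```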


        }
    }
    print Dumper(%keggData);

That is usually written as:

    print Dumper( \%keggData );
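The reason is that a hash passed to a sub is flattened into a key/value list, so Dumper() sees separate scalars. A short sketch comparing the two calls ($Data::Dumper::Sortkeys is set only to make the output order stable):

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order in the output

my %keggData = ( K00001 => { DEFINITION => 'alcohol dehydrogenase [EC:1.1.1.1]' } );

# Hash flattened into a list: $VAR1 is the key, $VAR2 is the value.
my $flat = Dumper(%keggData);

# Reference: the whole structure is dumped as a single $VAR1.
my $ref = Dumper( \%keggData );

print $flat, "----\n", $ref;
```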


    close $fh;
}
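Putting these points together, one possible sketch of the whole sub follows. It is not an official KEGG parser. It assumes that a field name (ENTRY, NAME, ...) starts in column 1 and that continuation lines (extra PATHWAY rows, wrapped CLASS text) do not. NAME is split on commas, each PATHWAY line becomes id => description, and other fields are kept as strings with wrapped lines joined by a space. The sample data is an abbreviated copy of the snippet, written to a temporary file only so the example runs on its own.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Data::Dumper;
use File::Temp qw(tempfile);

sub parse {
    my $kegg_file_path = shift;
    my %keggData;

    open my $fh, '<', $kegg_file_path
        or croak("Cannot open file '$kegg_file_path': $!");

    local $/ = "///\n";    # one record per read
    while ( my $record = <$fh> ) {
        chomp $record;     # removes the trailing "///\n"
        my ( $entry, $field, %rec );
        for my $line ( split /\n/, $record ) {
            next unless $line =~ /\S/;    # skip blank lines
            my $value;
            if ( $line =~ /^([A-Z]+)\s+(.*)$/ ) {
                ( $field, $value ) = ( $1, $2 );    # a new field begins
            }
            else {
                next unless defined $field;         # text before any field name
                ( $value = $line ) =~ s/^\s+//;     # continuation of current field
            }

            if ( $field eq 'ENTRY' ) {
                ($entry) = split ' ', $value;       # "K00001 KO" -> "K00001"
            }
            elsif ( $field eq 'NAME' ) {
                push @{ $rec{NAME} }, split /,\s*/, $value;
            }
            elsif ( $field eq 'PATHWAY' ) {
                my ( $id, $desc ) = split ' ', $value, 2;
                $rec{PATHWAY}{$id} = $desc;
            }
            elsif ( exists $rec{$field} ) {
                $rec{$field} .= " $value";          # join wrapped lines
            }
            else {
                $rec{$field} = $value;
            }
        }
        $keggData{$entry} = \%rec if defined $entry;
    }
    close $fh;
    return \%keggData;
}

# Demo: write sample records to a temporary file and parse them.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
print {$tmp} <<'END';
ENTRY       K00001 KO
NAME        E1.1.1.1, adh
DEFINITION  alcohol dehydrogenase [EC:1.1.1.1]
PATHWAY     ko00010 Glycolysis / Gluconeogenesis
            ko00071 Fatty acid metabolism
///
ENTRY       K14865 KO
NAME        U14snoRNA, snR128
DEFINITION  U14 small nucleolar RNA
CLASS       Genetic Information Processing; Translation; Ribosome Biogenesis
            [BR:ko03009]
///
END
close $tmp;

my $kegg = parse($path);
$Data::Dumper::Sortkeys = 1;
print Dumper($kegg);
```

Returning a reference keeps the caller in charge of what to do with the data, instead of printing from inside parse().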



John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction.                   -- Albert Einstein
