venkates wrote:
Hi,
Hello,
This is a snippet of the data
ENTRY K00001 KO
NAME E1.1.1.1, adh
DEFINITION alcohol dehydrogenase [EC:1.1.1.1]
PATHWAY ko00010 Glycolysis / Gluconeogenesis
ko00071 Fatty acid metabolism
ko00350 Tyrosine metabolism
ko00625 Chloroalkane and chloroalkene degradation
ko00626 Naphthalene degradation
ko00830 Retinol metabolism
ko00980 Metabolism of xenobiotics by cytochrome P450
ko00982 Drug metabolism - cytochrome P450
///
ENTRY K14865 KO
NAME U14snoRNA, snR128
DEFINITION U14 small nucleolar RNA
CLASS Genetic Information Processing; Translation; Ribosome Biogenesis
[BR:ko03009]
///
I am trying to store this in the following data structure by splitting
the file along the "///" and have each record in a hash with primary key
as the ENTRY number and storing all the other info under that key :
$VAR1 = {
K00001 => {
'NAME' => [
'E1.1.1.1',
'adh'
],
'DEFINITION' => 'alcohol dehydrogenase [EC:1.1.1.1]',
'PATHWAY' => {
'ko00010' => 'Glycolysis / Gluconeogenesis',
'ko00071' => 'Fatty acid metabolism'
}
I have started off with the following code:
sub parse{
my $kegg_file_path = shift;
my %keggData;
open my $fh, '<', $kegg_file_path || croak ("Cannot open file '$kegg_file_path': $!");
Because of the high precedence of the || operator, that is parsed as
open( my $fh, '<', ( $kegg_file_path || croak(...) ) ), so it will only
croak() if the value of $kegg_file_path is FALSE, not if the file cannot
be opened. You need to either use parentheses with open:
open( my $fh, '<', $kegg_file_path ) || croak( "Cannot open file '$kegg_file_path': $!" );
Or use the low precedence or operator:
open my $fh, '<', $kegg_file_path or croak( "Cannot open file '$kegg_file_path': $!" );
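If you want to see the difference for yourself, here is a small sketch. The path is made up and assumed not to exist, and I use die instead of croak so that it runs without loading Carp:

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist on your system.
my $path = '/nonexistent/kegg_example.txt';

# '||' binds tighter than the list operator open, so the die belongs to
# the last argument. $path is true, so die never runs and the failed
# open goes unnoticed.
my $ok_high = eval { open my $fh1, '<', $path || die "not reached\n"; 1 };

# 'or' has very low precedence, so it tests the return value of open.
my $ok_low = eval { open my $fh2, '<', $path or die "Cannot open '$path': $!\n"; 1 };
my $error  = $@;

print "with ||: ", ( $ok_high ? "no error raised" : "died" ), "\n";
print "with or: ", ( $ok_low  ? "no error raised" : "died: $error" );
```

Only the second form should report the missing file.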
my $contents = do{local $/, <$fh>};
my @dataArray = split ('///', $contents);
foreach my $currentLine (@dataArray){
That would probably be better as:
local $/ = "///\n";
while ( <$fh> ) {
Why read the whole file in when you are only processing one record at a
time?
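As a rough sketch of that approach, something like the following could build the structure you describe. It reads from an in-memory filehandle holding records copied from your snippet so that it runs as-is. The field patterns (single-line NAME and DEFINITION, pathway ids of the form ko plus five digits) are my assumptions about the layout, not a complete parser for the format:

```perl
use strict;
use warnings;

# Sample records copied from the post; a real file would be opened by path.
my $data = <<'END';
ENTRY K00001 KO
NAME E1.1.1.1, adh
DEFINITION alcohol dehydrogenase [EC:1.1.1.1]
PATHWAY ko00010 Glycolysis / Gluconeogenesis
ko00071 Fatty acid metabolism
///
ENTRY K14865 KO
NAME U14snoRNA, snR128
DEFINITION U14 small nucleolar RNA
///
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my %kegg;
{
    local $/ = "///\n";    # read one record per iteration
    while ( my $record = <$fh> ) {
        chomp $record;     # strips the trailing "///\n"
        my ($entry) = $record =~ /^ENTRY\s+(\S+)/m or next;
        if ( $record =~ /^NAME\s+(.+)$/m ) {
            $kegg{$entry}{NAME} = [ split /,\s*/, $1 ];
        }
        if ( $record =~ /^DEFINITION\s+(.+)$/m ) {
            $kegg{$entry}{DEFINITION} = $1;
        }
        # Pathway lines: an optional PATHWAY label, then a koNNNNN id.
        while ( $record =~ /^(?:PATHWAY)?\s*(ko\d{5})\s+(.+)$/mg ) {
            $kegg{$entry}{PATHWAY}{$1} = $2;
        }
    }
}
close $fh;
```

Each ENTRY id becomes a top-level key, so no separate 'ENTRY' key is needed.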
if ($currentLine =~ /^ENTRY\s{7}(.+?)\s+/){
Because you are splitting on '///', every record after the first will
start with "\nENTRY", and /^ENTRY/ will only match if 'ENTRY' is at the
very beginning of the string, not after a leading newline (unless you
add the /m modifier).
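A quick way to check this is to count how many of the split pieces match with and without /m. The two-record string below is a cut-down stand-in for your file:

```perl
use strict;
use warnings;

# Cut-down stand-in for the file contents.
my $contents = "ENTRY K00001 KO\n///\nENTRY K14865 KO\n///\n";
my @records  = split m{///}, $contents;

# Without /m, ^ matches only at the very start of the string.
my @plain = grep { /^ENTRY\s+(\S+)/ } @records;

# With /m, ^ also matches just after an embedded newline.
my @multi = grep { /^ENTRY\s+(\S+)/m } @records;

printf "without /m: %d, with /m: %d\n", scalar @plain, scalar @multi;
# prints "without /m: 1, with /m: 2"
```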
my $value = $1;
$keggData{'ENTRY'} = $value;
You don't show a key of 'ENTRY' in your desired data structure.
}
}
print Dumper(%keggData);
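Dumper() takes a list of values, so passing a hash flattens it into
separate keys and values. That is usually written with a reference: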
print Dumper( \%keggData );
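To compare the two calls side by side, here is a small sketch with a one-entry hash (the contents are just taken from your sample data):

```perl
use strict;
use warnings;
use Data::Dumper;

my %keggData = ( K00001 => { DEFINITION => 'alcohol dehydrogenase [EC:1.1.1.1]' } );

# The hash is flattened into a list, so the key and the value are
# dumped as two separate variables, $VAR1 and $VAR2.
my $flattened = Dumper(%keggData);

# A single reference is dumped as one nested structure in $VAR1.
my $as_ref = Dumper( \%keggData );

print $flattened, $as_ref;
```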
close $fh;
}
John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction. -- Albert Einstein