It's hard to suggest improvements when seeing only a fragment of the code, but I would expect it to be slow because you have five nested loops:

    while ( my @data_files = grep(/\.csv$/,readdir(DH)) ) ...
      foreach my $file ( @data_files ) ...
        while ( my $data = <$FH> ) ...
          foreach my $up_acs ( keys %{$up_map} ) ...
            foreach my $ensemble_id (@{$up_map->{$up_acs}{'Ensembl_TRS'}} ) ...

I am not entirely sure how big keys %{$up_map} and @{$up_map->{$up_acs}{'Ensembl_TRS'}} can get, but nesting this deeply clearly increases the complexity and dramatically increases the total number of iterations: the two innermost loops run in full for every line of every file.

I found a good explanation here (see the section on nested loops):
http://pages.cs.wisc.edu/~vernon/cs367/notes/3.COMPLEXITY.html

Try to simplify your code to reduce the number of nested loops if possible.

On 11/06/2012 15:31, "venkates" <venka...@nt.ntnu.no> wrote:

>Hi all,
>
>I am trying to filter files from a directory (code provided below) by
>comparing the contents of each file with a hash ref (a parsed id map
>file provided as an argument). The code is working however, is extremely
>slow. The .csv files (81 files) that I am reading are not very large
>(largest file is 183,258 bytes). I would appreciate if you could
>suggest improvements to the code.
>
>sub filter {
>    my ( $pazar_dir_path, $up_map, $output ) = @_;
>    croak "Not enough arguments!\n" if ( @_ < 3 );
>
>    my $accepted = 0;
>    my $rejected = 0;
>
>    opendir DH, $pazar_dir_path or croak ("Error in opening directory '$pazar_dir_path': $!");
>    open my $OUT, '>', $output or croak ("Cannot open file for writing '$output': $!");
>    while ( my @data_files = grep(/\.csv$/,readdir(DH)) ) {
>        my @records;
>        foreach my $file ( @data_files ) {
>            open my $FH, '<', "$pazar_dir_path/$file" or croak ("Cannot open file '$file': $!");
>            while ( my $data = <$FH> ) {
>                chomp $data;
>                my $record_output;
>                @records = split /\t/, $data;
>                foreach my $up_acs ( keys %{$up_map} ) {
>                    foreach my $ensemble_id ( @{$up_map->{$up_acs}{'Ensembl_TRS'}} ){
>                        if ( $records[1] eq $ensemble_id ) {
>                            $record_output = join( "\t", @records );
>                            print $OUT "$record_output\n";
>                            $accepted++;
>                        }
>                        else {
>                            $rejected++;
>                            next;
>                        }
>                    }
>                }
>            }
>            close $FH;
>        }
>    }
>    close $OUT;
>    closedir (DH);
>    print "accepted records: $accepted\n, rejected records: $rejected\n";
>    return $output;
>}
>
>__DATA__
>
>TF0000210 ENSMUST00000001326 SP1_MOUSE GS0000422
>ENSMUSG00000037974 7 148974877 149005136 Mus musculus
>MUC5AC 14570593 ELECTROPHORETIC MOBILITY SHIFT ASSAY
>(EMSA)::SUPERSHIFT
>TF0000211 ENSMUST00000066003 SP3_MOUSE GS0000422
>ENSMUSG00000037974 7 148974877 149005136 Mus musculus
>MUC5AC 14570593 ELECTROPHORETIC MOBILITY SHIFT ASSAY
>(EMSA)::SUPERSHIFT
>
>
>Thanks a lot,
>
>Aravind
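
To make the suggestion about reducing nested loops more concrete, here is a minimal sketch of one possible approach: collect every Ensembl transcript ID from the map into a lookup hash once, then test the second column of each record with a single hash lookup instead of the two innermost loops. The helper names build_ensembl_lookup and filter_lines, the accession keys ACC_A and ACC_B, and the ID ENSMUST_MADE_UP are made up for illustration; the shape of $up_map is assumed from the posted code, and the sample records are shortened versions of the __DATA__ lines.

```perl
use strict;
use warnings;

# Build a set of every Ensembl transcript ID found in the map, once.
# Assumed shape, taken from the posted code:
#   { $up_acs => { Ensembl_TRS => [ 'ENSMUST...', ... ] }, ... }
sub build_ensembl_lookup {
    my ($up_map) = @_;
    my %wanted;
    for my $up_acs ( keys %{$up_map} ) {
        # "|| []" guards entries without an Ensembl_TRS list (an assumption).
        $wanted{$_} = 1 for @{ $up_map->{$up_acs}{'Ensembl_TRS'} || [] };
    }
    return \%wanted;
}

# Return the tab-separated lines whose second column is a wanted ID.
sub filter_lines {
    my ( $wanted, @lines ) = @_;
    my @accepted;
    for my $line (@lines) {
        chomp( my $copy = $line );
        my @records = split /\t/, $copy;
        next unless defined $records[1];    # skip lines with no second column
        push @accepted, $copy if exists $wanted->{ $records[1] };
    }
    return @accepted;
}

# Hypothetical map: the keys and the second ID are invented for the example.
my $up_map = {
    ACC_A => { Ensembl_TRS => ['ENSMUST00000001326'] },
    ACC_B => { Ensembl_TRS => ['ENSMUST_MADE_UP'] },
};

my @lines = (
    "TF0000210\tENSMUST00000001326\tSP1_MOUSE\n",
    "TF0000211\tENSMUST00000066003\tSP3_MOUSE\n",
);

my $wanted   = build_ensembl_lookup($up_map);
my @accepted = filter_lines( $wanted, @lines );
print "$_\n" for @accepted;
printf "accepted records: %d, rejected records: %d\n",
    scalar @accepted, @lines - @accepted;
```

With this shape the map is walked once up front, and each line then costs one hash lookup regardless of how many IDs the map holds. Note that the counts differ from the posted code: there $rejected is incremented for every non-matching ID comparison, and a record is printed once per matching entry, whereas this sketch counts and prints each record once.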