So I have a Perl application that, upon startup, loads about ten Perl
hashes (some of them complex) from files. This takes a few GB of memory
and about 5 minutes. It then iterates through our cases, reading from
(never writing to) these hashes; processing all of them (millions of
cases) takes about 3 hours. We would like to speed this up. I am
thinking this is an ideal application for mod_perl, because it would
allow multiple processes that share the memory.

The scheme would be to load the hashes at Apache startup and have a
master program send a request for each case; the Apache children would
use the shared hashes.

I just want to verify some of the details about variable sharing.  Would
the following setup work (oversimplified, but you get the idea…):

In a file Data.pm, which I would use() in my Apache startup.pl, I would
load the Perl hashes and expose references to them through class
methods:

package Data;

use strict;
use warnings;

my %big_hash;

open( my $fh, '<', 'file.txt' ) or die "Cannot open file.txt: $!";

while ( my $line = <$fh> ) {

      my ($key, $value);   # ... parsing code here sets $key and $value ...

      $big_hash{$key} = $value;
}

close($fh);

sub get_big_hashref { return \%big_hash; }

<snip>
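For context, the startup.pl side would only be a few lines; here is a
minimal sketch, assuming Data.pm lives somewhere on the module path (the
paths are placeholders). The point is that Data is loaded once in the
Apache parent process, before the children are forked, so the pages
holding the hashes can be shared copy-on-write:

# startup.pl, pulled in via "PerlRequire /path/to/startup.pl" in httpd.conf
use lib '/path/to/modules';   # placeholder: wherever Data.pm lives

use Data;   # builds %big_hash once, in the parent, before the fork

1;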

And so in the Apache request handler, the code would be something like:

use Data;

my $hashref = Data::get_big_hashref();

... code to access $hashref data with request parameters ...

<snip>
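To make that concrete, here is a minimal sketch of what such a mod_perl
2 response handler might look like. The package name and the single
'case' parameter are illustrative, and it assumes Apache2::Request
(libapreq2) for parameter parsing:

package MyApp::CaseHandler;

use strict;
use warnings;

use Apache2::RequestRec ();
use Apache2::RequestIO ();
use Apache2::Request ();
use Apache2::Const -compile => qw(OK);

use Data;

sub handler {
    my $r   = shift;
    my $req = Apache2::Request->new($r);

    my $case    = $req->param('case');        # illustrative parameter name
    my $hashref = Data::get_big_hashref();

    # Read-only lookup against the preloaded hash.
    my $result = exists $hashref->{$case} ? $hashref->{$case} : '';

    $r->content_type('text/plain');
    $r->print($result);

    return Apache2::Const::OK;
}

1;

Note the handler only reads from the hash: a write in a child would
force the kernel to copy the touched pages, and the memory sharing would
be lost for those pages.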

The idea is that the HTTP request/response will carry the relevant
input/output for each case, and the master client program will collect
these and concatenate the final output from all the requests.
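
On the master side, the client loop might look roughly like the
following sketch, assuming LWP::UserAgent, a placeholder URL, and a
stand-in get_next_case() for however the cases are actually read. To
actually benefit from multiple Apache children, it would need to issue
requests in parallel, e.g. by running several of these loops (one per
forked worker) over disjoint slices of the cases:

#!/usr/bin/perl
use strict;
use warnings;

use LWP::UserAgent;

my $ua  = LWP::UserAgent->new;
my $url = 'http://localhost/process_case';   # placeholder handler URL

my @output;
while ( defined( my $case = get_next_case() ) ) {   # get_next_case() is a stand-in
    my $res = $ua->post( $url, { case => $case } );
    die "Case '$case' failed: " . $res->status_line unless $res->is_success;
    push @output, $res->decoded_content;
}

print @output;   # concatenated final output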

So, any issues/suggestions with this approach? I am facing a non-trivial
task of refactoring the existing code to work in this framework, so I
just wanted to get some feedback before investing more time in it...

I am planning on using mod_perl 2.07 on a Linux machine.

Thanks in advance, Alan
