Re: mod_perl for multi-process file processing?

2015-02-03 Thread Alan Raetz
Thanks for all the input. I was considering threads, but according to how I
read perlthrtut (
http://search.cpan.org/~rjbs/perl-5.18.4/pod/perlthrtut.pod#Threads_And_Data),
quote: "When a new Perl thread is created, all the data associated with the
current thread is copied to the new thread, and is subsequently private to
that new thread." So in my application, each thread would get the entire
memory footprint copied. Although the data is shared in terms of application
usage, in terms of physical memory I would quickly exhaust the machine. If
the work were mostly reading/writing files this might not make a significant
difference, but with the way this app is structured now, I think it does.
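For contrast, here is a rough standalone sketch (made-up data, not mod_perl
code) of why forked processes behave differently: fork() lets the OS share
the parent's pages copy-on-write, so read-only access in the children does
not duplicate the footprint the way a per-thread data copy does:

#!/usr/bin/perl
use strict;
use warnings;

# Parent builds the large read-only structure once.
my %big_hash = map { $_ => "value_$_" } 1 .. 1_000_000;

for my $child (1 .. 4) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    next if $pid;    # parent continues forking

    # Child: these pages are shared copy-on-write with the parent, so
    # lookups here do not copy the whole hash (Perl's refcounting can
    # dirty some pages, but nothing like a full per-thread copy).
    print "child $child sees: $big_hash{$child}\n";
    exit 0;
}
wait() for 1 .. 4;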

One of my thoughts for reducing the impact of request overhead was to
bundle requests, so that a single request carries multiple tasks, roughly
like the sketch below.
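To make that concrete (the URL and bundle size here are placeholders, not
settled choices):

use strict;
use warnings;
use LWP::UserAgent;

my @cases = (1 .. 1_000_000);              # case IDs (placeholder)
my $ua    = LWP::UserAgent->new;
my $url   = 'http://localhost/process';    # hypothetical handler URL

while ( my @bundle = splice(@cases, 0, 500) ) {
    # One POST carries 500 cases; the handler splits the body and
    # processes each case against the shared hashes.
    my $res = $ua->post($url, Content => join("\n", @bundle));
    die $res->status_line unless $res->is_success;
    print $res->decoded_content;           # collect the results
}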
I will look into some of these suggestions more, thanks again.


mod_perl for multi-process file processing?

2015-02-02 Thread Alan Raetz
So I have a perl application that upon startup loads about ten perl hashes
(some of them complex) from files. This takes up a few GB of memory and
about 5 minutes. It then iterates through some cases and reads from (never
writes to) these hashes. Processing all our cases (millions of them) takes
about 3 hours, and we would like to speed that up. I am thinking this is an
ideal application for mod_perl because it would allow multiple processes
that share memory.

The scheme would be to load the hashes at Apache startup and have a master
program send a request for each case; the Apache children would then answer
the requests using the shared hashes.

I just want to verify some of the details about variable sharing.  Would
the following setup work (oversimplified, but you get the idea…):

In a file Data.pm, which I would use() in my Apache startup.pl, I would
load the perl hashes and have hash references that would be retrieved with
class methods:

package Data;

my %big_hash;

open(FILE, '<', 'file.txt') or die "Cannot open file.txt: $!";

while ( my $line = <FILE> ) {

  … code ….

  $big_hash{ $key } = $value;
}

close(FILE);

sub get_big_hashref {   return \%big_hash; }

snip
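For the startup side, I am picturing something like this (paths are
placeholders); the point is that Data.pm gets compiled, and the hashes
built, in the parent httpd before any children are forked:

# startup.pl -- pulled in once by the parent httpd via a
# "PerlRequire /path/to/startup.pl" line in httpd.conf
use strict;
use warnings;

use lib '/path/to/perl/libs';   # wherever Data.pm lives (placeholder)
use Data ();                    # builds %big_hash in the parent

1;

With the prefork MPM, each child then inherits those pages copy-on-write,
so the multi-GB load happens once rather than once per process.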

And so in the apache request handler, the code would be something like:

use Data;

my $hashref = Data::get_big_hashref();

…. code to access $hashref data with request parameters…..

snip
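Fleshed out a bit, I am imagining a mod_perl 2 response handler along these
lines (the package name, URL mapping, and one-case-per-line body format are
my own assumptions, not settled details):

package MyApp::CaseHandler;

use strict;
use warnings;

use Apache2::RequestRec ();    # $r->content_type
use Apache2::RequestIO  ();    # $r->read, $r->print
use Apache2::Const -compile => qw(OK);

use Data;    # already compiled at server startup; effectively a no-op here

sub handler {
    my $r = shift;

    # Read-only data built before the children were forked.
    my $hashref = Data::get_big_hashref();

    # Slurp the request body: one case ID per line (assumed format).
    my ($body, $buf) = ('', '');
    $body .= $buf while $r->read($buf, 8192);

    $r->content_type('text/plain');
    for my $case (split /\n/, $body) {
        # Real code would do the per-case work here; this just shows
        # reading the shared hash.
        my $result = exists $hashref->{$case} ? $hashref->{$case} : '';
        $r->print("$case\t$result\n");
    }

    return Apache2::Const::OK;
}

1;

mapped in httpd.conf with something like:

<Location /process>
    SetHandler modperl
    PerlResponseHandler MyApp::CaseHandler
</Location>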

The idea is that the HTTP request/response will carry the relevant
input/output for each case, and the master client program will collect
these and concatenate the final output from all the requests, along the
lines of the sketch below.
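For that master side, here is a sketch of the collect-and-concatenate step
(Parallel::ForkManager, the bundle files, and the worker count are my
assumptions; the worker count would roughly match the Apache child count):

use strict;
use warnings;
use LWP::UserAgent;
use Parallel::ForkManager;

my $url     = 'http://localhost/process';    # hypothetical handler URL
my @bundles = glob 'bundles/*.txt';          # one file of case IDs per bundle
my $pm      = Parallel::ForkManager->new(8); # ~ number of Apache children

for my $i (0 .. $#bundles) {
    $pm->start and next;    # parent keeps looping; child does the work
    my $ua = LWP::UserAgent->new;
    open my $in, '<', $bundles[$i] or die $!;
    my $res = $ua->post($url, Content => do { local $/; <$in> });
    die $res->status_line unless $res->is_success;
    open my $out, '>', "result.$i" or die $!;
    print {$out} $res->decoded_content;
    $pm->finish;
}
$pm->wait_all_children;

# Concatenate the per-bundle results in order.
open my $final, '>', 'final_output.txt' or die $!;
for my $i (0 .. $#bundles) {
    open my $part, '<', "result.$i" or die $!;
    print {$final} do { local $/; <$part> };
}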

So any issues/suggestions with this approach? I am facing a non-trivial
task of refactoring the existing code to work in this framework, so just
wanted to get some feedback before I invest more time into this...

I am planning on using mod_perl 2.0.7 on a linux machine.

Thanks in advance, Alan