Re: mod_perl for multi-process file processing?
Thanks for all the input. I was considering threads, but as I read perlthrtut (http://search.cpan.org/~rjbs/perl-5.18.4/pod/perlthrtut.pod#Threads_And_Data): "When a new Perl thread is created, all the data associated with the current thread is copied to the new thread, and is subsequently private to that new thread." So in my application, each thread would get the entire multi-GB memory footprint copied. Although the data is "shared" in terms of application usage, in terms of physical memory each thread pays for its own copy, and I would quickly exhaust the machine's memory. If the workers were mostly reading/writing files this might not make a significant difference, but with the way this app is structured now, I think it would.

One thought I had for reducing the impact of per-request overhead was to "bundle" requests, so that a single request carries multiple cases.

I will look into some of these suggestions more, thanks again.
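For reference, here is a small, untested sketch of what explicit sharing with threads::shared would look like: only variables marked :shared escape the per-thread copy, and nested structures have to be rebuilt with shared_clone(), which is part of why retrofitting it onto these existing hashes looks unattractive to me.

    #!/usr/bin/perl
    # Illustrative only -- shows which data gets copied per thread
    # and which stays shared.
    use strict;
    use warnings;
    use threads;
    use threads::shared;

    # A plain hash: every new thread gets its own private copy at spawn.
    my %private = (a => 1);

    # A shared hash: one copy visible to all threads. Nested structures
    # must be built with shared_clone().
    my %shared :shared;
    $shared{big} = shared_clone({ key1 => [1, 2, 3] });

    my $t = threads->create(sub {
        $private{a}++;                  # increments this thread's own copy
        return $shared{big}{key1}[0];   # reads the single shared copy
    });

    print $t->join,   "\n";   # 1, read from the shared structure
    print $private{a}, "\n";  # still 1 in the parent

And as I understand it, even :shared data carries per-access overhead (it is tied under the hood), so it is not free either.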
mod_perl for multi-process file processing?
So I have a Perl application that upon startup loads about ten Perl hashes (some of them complex) from files. This takes up a few GB of memory and about 5 minutes. It then iterates through some cases and reads from (never writes to) these hashes. Processing all our cases (millions of them) takes about 3 hours, and we would like to speed this up.

I am thinking this is an ideal application for mod_perl because it would allow multiple processes while sharing memory. The scheme would be to load the hashes at Apache startup and have a master program send a request for each case; the Apache children would then use the shared hashes. I just want to verify some of the details about variable sharing. Would the following setup work (oversimplified, but you get the idea)?

In a file Data.pm, which I would use() in my Apache startup.pl, I would load the hashes and expose hash references through class methods:

    package Data;

    my %big_hash;

    open(FILE, "file.txt") or die "cannot open file.txt: $!";
    while (<FILE>) {
        # … code …
        $big_hash{$key} = $value;
    }

    sub get_big_hashref { return \%big_hash; }

And so in the Apache request handler, the code would be something like:

    use Data;

    my $hashref = Data::get_big_hashref();
    # … code to access $hashref data with request parameters …

The idea is that the HTTP request/response will carry the relevant input/output for each case, and the master client program will collect the responses and concatenate the final output from all the requests.

So, any issues or suggestions with this approach? I am facing a non-trivial task of refactoring the existing code to work in this framework, so I just wanted to get some feedback before I invest more time into this...

I am planning on using mod_perl 2.0.7 on a Linux machine.

Thanks in advance,
Alan
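In case it helps clarify what I have in mind, here is a rough, untested sketch of the wiring (the paths, the /case location, the case= query parameter, and the MyApp::Handler name are all made up):

    # httpd.conf (relevant bits):
    #   PerlRequire /path/to/startup.pl
    #   <Location /case>
    #       SetHandler perl-script
    #       PerlResponseHandler MyApp::Handler
    #   </Location>

    # startup.pl -- runs in the Apache parent before the children are
    # forked, so the hashes loaded by Data.pm should be inherited
    # copy-on-write by every child.
    use strict;
    use warnings;
    use lib '/path/to/lib';   # made-up path to Data.pm
    use Data ();
    1;

    # MyApp/Handler.pm -- a made-up mod_perl 2 response handler.
    package MyApp::Handler;

    use strict;
    use warnings;
    use Apache2::RequestRec ();
    use Apache2::RequestIO ();
    use Apache2::Const -compile => qw(OK);
    use Data ();

    sub handler {
        my $r = shift;

        # Pull the case id out of the query string (?case=...).
        my ($case) = ($r->args || '') =~ /case=([^&;]+)/;

        # Read-only lookups keep the copy-on-write pages shared; any
        # write to the hash would fault private copies into this child.
        my $hashref = Data::get_big_hashref();
        my $result  = defined $case ? $hashref->{$case} : undef;

        $r->content_type('text/plain');
        $r->print(defined $result ? "$case\t$result\n" : "no result\n");

        return Apache2::Const::OK;
    }

    1;

From what I have read, the copy-on-write sharing can degrade over time because Perl touches flags and reference counts even on read-mostly data, but for this access pattern it should still be far better than a full copy per worker.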