Unless I’m entirely wrong, it appears that you want to use shared threaded
memory. That would let you stay out of Apache altogether.
Here is an example of using threads with shared memory that I worked out.
We took a 4 hour serial task and turned it into 5 minutes with threads.
This worked extremely well for me. I wasn’t using large hashes, but I was
processing hundreds of files per thread, with 30k lines per file.
#!/usr/bin/env perl
use strict;
use warnings;

use Data::Dumper;
$Data::Dumper::Indent   = 1;
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Deepcopy = 1;

use threads;
use threads::shared;

use constant MAX_TRIES => 5;

sub sub_threads($$$);

my $switch = undef;
my $hash   = undef;
my $gsx    = undef;
my $cnt    = 5;

my %switches = (
    'A' => { 'b' => undef, 'c' => undef, 'd' => undef },
    'B' => { 'b' => undef, 'c' => undef, 'd' => undef },
    'C' => { 'b' => undef, 'c' => undef, 'd' => undef },
    'D' => { 'b' => undef, 'c' => undef, 'd' => undef },
    'E' => { 'b' => undef, 'c' => undef, 'd' => undef },
);

# Shared structure the threads push their status messages into.
my %threads : shared = ();

######
## create the threads
######
while ( ($switch, $hash) = each %switches ) {
    unless ( exists $threads{$switch} ) {
        my %h : shared;
        $threads{$switch} = \%h;
    }
    while ( ($gsx, $_) = each %$hash ) {
        unless ( exists $threads{$switch}{$gsx} ) {
            my %h : shared;
            $threads{$switch}{$gsx} = \%h;
        }
        unless ( exists $threads{$switch}{$gsx}{'messages'} ) {
            my @h : shared;
            $threads{$switch}{$gsx}{'messages'} = \@h;
        }
        $hash->{$gsx}{'thread'} =
            threads->create( \&sub_threads, \$switch, \$gsx, \$cnt );
        $hash->{$gsx}{'tries'} = 1;
        $cnt += 5;    # stagger each thread's sleep time
    }
}

#print Dumper \%threads;
#print Dumper \%switches;

######
## poll until no threads are still running
######
$cnt = 1;
while ($cnt) {
    $cnt = 0;
    while ( ($switch, $hash) = each %switches ) {
        while ( ($gsx, $_) = each %$hash ) {
            my $thr = $hash->{$gsx}{'thread'};
            next unless defined $thr;    # already reaped, nothing to do
            if ( $thr->is_running() ) {
                $cnt = 1;
                # print "$switch->$gsx is running\n";
            }
            else {
                # print "$switch->$gsx is NOT running\n";
                # print " Reason for failure : \n";
                # print '  ' . join( "\n", @{ $threads{$switch}{$gsx}{'messages'} } ) . "\n";
                $thr->join();    # reap the finished thread
                if ( $hash->{$gsx}{'tries'} < MAX_TRIES ) {
                    # print " max tries not reached for $switch->$gsx, will be trying again!\n";
                    $hash->{$gsx}{'tries'}++;
                    $hash->{$gsx}{'thread'} =
                        threads->create( \&sub_threads, \$switch, \$gsx, \$cnt );
                    $cnt = 1;    # a fresh thread is now running
                }
                else {
                    print "send email! $switch->$gsx\n";
                    $hash->{$gsx}{'thread'} = undef;    # give up on this one
                }
            }
        }
    }
    sleep 2;
}

#print Dumper \%threads;
#print Dumper \%switches;

sub sub_threads($$$) {
    my $ptr_switch = shift;
    my $ptr_gsx    = shift;
    my $ptr_tNum   = shift;

    sleep $$ptr_tNum;    # stand-in for the real per-thread work

    {
        lock(%threads);
        push @{ $threads{$$ptr_switch}{$$ptr_gsx}{'messages'} },
            "Leaving thread $$ptr_switch->$$ptr_gsx";
        # lock freed at end of scope
    }
    return 0;
}
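For the read-only case in the original question, the same pattern boils
down to something like the sketch below: populate one shared hash in the
main thread before spawning the workers, then let every thread read from
it, locking only the shared result list. This is only a sketch; the case
list, worker count, and example data are all made up.

#!/usr/bin/env perl
use strict;
use warnings;

use threads;
use threads::shared;

my %big_hash : shared;    # one copy, visible to every thread
my @results  : shared;

# Load the hash once, before any workers exist.
$big_hash{example_key} = 'example_value';    # stand-in for the real load

my @cases     = ( 1 .. 100 );    # made-up case list
my $n_workers = 4;               # made-up worker count

my @workers;
for my $w ( 0 .. $n_workers - 1 ) {
    push @workers, threads->create( sub {
        # Each worker takes every $n_workers-th case.
        for ( my $i = $w ; $i < @cases ; $i += $n_workers ) {
            my $out = "case $cases[$i]: $big_hash{example_key}";
            lock(@results);    # only the result list needs locking
            push @results, $out;
        }
    } );
}
$_->join() for @workers;

print scalar(@results), " cases processed\n";

One caveat: threads::shared keeps shared data in a separate interpreter,
so reading %big_hash is slower than reading an ordinary hash. With a few
GB of data, that overhead is worth measuring before committing.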
On Feb 2, 2015, at 10:11 PM, Alan Raetz <[email protected]> wrote:
So I have a perl application that upon startup loads about ten perl hashes
(some of them complex) from files. This takes a few GB of memory and about 5
minutes. It then iterates through some cases and reads from (never writes to)
these perl hashes. Processing all our cases (millions of them) takes about 3
hours, and we would like to speed that up. I am thinking this is an ideal
application for mod_perl because it would allow multiple processes that share
memory.
The scheme would be to load the hashes on Apache startup and have a master
program send a request for each case; the Apache children would use the shared
hashes.
I just want to verify some of the details about variable sharing. Would the
following setup work (oversimplified, but you get the idea…):
In a file Data.pm, which I would use() in my Apache startup.pl, I would load
the perl hashes and have hash references that would be retrieved with class
methods:
package Data;

my %big_hash;

open( my $fh, '<', 'file.txt' ) or die "Cannot open file.txt: $!";
while ( <$fh> ) {
    # … code ….
    $big_hash{ $key } = $value;
}
close $fh;

sub get_big_hashref { return \%big_hash; }

1;    # a module must return a true value for use() to succeed
<snip>
And so in the Apache request handler, the code would be something like:

use Data;    # not "use Data.pm" -- use() takes the bare package name

my $hashref = Data::get_big_hashref();
…. code to access $hashref data with request parameters…..
<snip>
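The startup.pl the question refers to could be as small as the sketch
below. The PerlRequire line and the paths are assumptions about a typical
mod_perl 2 setup, not something from the original post:

# startup.pl -- pulled in once by the Apache parent via a line like
#   PerlRequire /path/to/startup.pl
# in httpd.conf (the path is an assumption; adjust to your install).
use strict;
use warnings;

use lib '/path/to/your/modules';    # assumption: wherever Data.pm lives
use Data ();    # builds the big hashes in the parent process

1;

Because the hashes are built before Apache forks its children, the children
share those memory pages copy-on-write. One caveat worth knowing: Perl's
reference counting writes to the pages it reads, so some of that sharing
erodes as the children touch the data.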
The idea is that the HTTP request/response will contain the relevant
input/output for each case… and the master client program will collect these
and concatenate the final output from all the requests.
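That master program could be little more than a loop that posts each case
to the handler and concatenates the response bodies. A minimal sketch using
HTTP::Tiny follows; the URL, parameter name, and case list are made up:

#!/usr/bin/env perl
use strict;
use warnings;

use HTTP::Tiny;

my $http = HTTP::Tiny->new;
my $url  = 'http://localhost/process';    # made-up handler URL

my @cases  = ( 1 .. 10 );                 # made-up case list
my $output = '';
for my $case (@cases) {
    my $res = $http->post_form( $url, { case => $case } );
    die "case $case failed: $res->{status} $res->{reason}"
        unless $res->{success};
    $output .= $res->{content};
}
print $output;

Note that a serial loop like this only demonstrates the plumbing; to get the
speedup, the master has to keep several requests in flight at once (for
example, by forking a handful of these clients), so that multiple Apache
children are busy simultaneously.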
So any issues/suggestions with this approach? I am facing a non-trivial task of
refactoring the existing code to work in this framework, so just wanted to get
some feedback before I invest more time into this...
I am planning on using mod_perl 2.07 on a linux machine.
Thanks in advance, Alan