I agree: either threads or Parallel::ForkManager, depending on your platform and your perl, will be a lot faster than mod_perl for this. Of course, there might be other reasons to use mod_perl, e.g. it's useful to have this available as a remote service, or you may want to call this frequently for small jobs throughout the day without needing to reload the data.
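As a minimal sketch of the fork-based route (using core Perl's forking `open` rather than Parallel::ForkManager itself, so it needs no CPAN modules): the parent builds the hash once, each forked child inherits it copy-on-write, works through its share of the cases read-only, and streams its results back to the parent over a pipe. The hash contents, the `key$n` naming, and the four-worker split are all invented for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Stand-in for the multi-GB hashes loaded once at startup.
my %big_hash = map { ( "key$_" => $_ * 10 ) } 1 .. 100;

my @cases   = 1 .. 100;
my $workers = 4;

# Deal the cases round-robin into one chunk per worker.
my @chunks;
push @{ $chunks[ $_ % $workers ] }, $cases[$_] for 0 .. $#cases;

# Fork one child per chunk; each child's STDOUT is piped to the parent.
my @readers;
for my $chunk (@chunks) {
    my $pid = open my $fh, '-|';      # forking open: child writes, parent reads
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {                # child: read-only use of the inherited hash
        for my $case (@$chunk) {
            my $val = $big_hash{"key$case"};
            print "$case=$val\n";
        }
        exit 0;
    }
    push @readers, $fh;               # parent keeps the read end
}

# Parent: merge every child's results.
my %results;
for my $fh (@readers) {
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $case, $val ) = split /=/, $line, 2;
        $results{$case} = $val;
    }
    close $fh;                        # also reaps the child
}
print scalar( keys %results ), " cases processed\n";
```

Parallel::ForkManager wraps the same fork/wait bookkeeping in a tidier API (`start`/`finish` plus a `run_on_finish` callback for returning data), but the memory behaviour is identical: children share the parent's pages until they write to them.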
- Perrin

On Tue, Feb 3, 2015 at 7:42 AM, Patton, Billy N <billy.pat...@h3net.com> wrote:

> Unless I'm entirely wrong, it appears that you want to use shared threaded
> memory. This would allow you to stay out of Apache altogether. Here is an
> example using threads and shared memory that I worked out. We took a
> 4-hour serial task and turned it into 5 minutes with threads. This worked
> extremely well for me. I wasn't using large hashes, but I was processing
> hundreds of files per thread, with 30k lines per file.
>
> #!/usr/bin/env perl
> use strict;
> use warnings;
> use Data::Dumper;
> $Data::Dumper::Indent   = 1;
> $Data::Dumper::Sortkeys = 1;
> $Data::Dumper::Deepcopy = 1;
> use threads;
> use threads::shared;
> use constant MAX_TRIES => 5;
>
> sub sub_threads($$$);
>
> my $switch = undef;
> my $hash   = undef;
> my $gsx    = undef;
> my $cnt    = 5;
> my %switches = (
>   'A' => { 'b' => undef, 'c' => undef, 'd' => undef },
>   'B' => { 'b' => undef, 'c' => undef, 'd' => undef },
>   'C' => { 'b' => undef, 'c' => undef, 'd' => undef },
>   'D' => { 'b' => undef, 'c' => undef, 'd' => undef },
>   'E' => { 'b' => undef, 'c' => undef, 'd' => undef },
> );
> my %threads : shared = ();
>
> ######
> ## create the threads
> ######
> while (($switch, $hash) = each %switches) {
>   unless (exists $threads{$switch}) {
>     my %h : shared;
>     $threads{$switch} = \%h;
>   }
>   while (($gsx, $_) = each %$hash) {
>     unless (exists $threads{$switch}{$gsx}) {
>       my %h : shared;
>       $threads{$switch}{$gsx} = \%h;
>     }
>     unless (exists $threads{$switch}{$gsx}{'messages'}) {
>       my @h : shared;
>       $threads{$switch}->{$gsx}->{'messages'} = \@h;
>     }
>     $hash->{$gsx}->{'thread'} =
>       threads->create(\&sub_threads, \$switch, \$gsx, \$cnt);
>     $hash->{$gsx}->{'tries'} = 1;
>     $cnt += 5;
>   }
> }
>
> #print Dumper \%threads;
> #print Dumper \%switches;
>
> ######
> ## loop until no thread is still running
> ######
> $cnt = 1;
> while ($cnt) {
>   $cnt = 0;
>   while (($switch, $hash) = each %switches) {
>     while (($gsx, $_) = each %$hash) {
>       if ($hash->{$gsx}->{'thread'}->is_running()) {
>         $cnt = 1;
>         # print "$switch->$gsx is running\n";
>       } else {
>         # print "$switch->$gsx is NOT running\n";
>         # print "  Reason for failure :\n";
>         # print '  ' . join('\n', @{$threads{$switch}->{$gsx}->{'messages'}}) . "\n";
>         if ($hash->{$gsx}->{'tries'} < MAX_TRIES) {
>           # print "  max tries not reached for $switch->$gsx, will try again!\n";
>           $hash->{$gsx}->{'tries'}++;
>           $hash->{$gsx}->{'thread'} =
>             threads->create(\&sub_threads, \$switch, \$gsx, \$cnt);
>         } else {
>           print "send email! $switch->$gsx\n";
>         }
>       }
>     }
>     sleep 2;
>   }
> }
>
> #print Dumper \%threads;
> #print Dumper \%switches;
>
> sub sub_threads($$$) {
>   my $ptr_switch = shift;
>   my $ptr_gsx    = shift;
>   my $ptr_tNum   = shift;
>   sleep $$ptr_tNum;
>   {
>     lock(%threads);
>     push @{$threads{$$ptr_switch}->{$$ptr_gsx}->{'messages'}},
>       "Leaving thread $$ptr_switch->$$ptr_gsx";
>     #$threads{$$ptr_switch}->{$$ptr_gsx}->{'messages'} = "Leaving thread $$ptr_switch->$$ptr_gsx";
>     # lock freed at end of scope
>   }
>   return 0;
> }
>
> On Feb 2, 2015, at 10:11 PM, Alan Raetz <alanra...@gmail.com> wrote:
>
> So I have a perl application that upon startup loads about ten perl hashes
> (some of them complex) from files. This takes up a few GB of memory and
> about 5 minutes. It then iterates through some cases and reads from (never
> writes to) these perl hashes. Processing all our cases (millions of them)
> takes about 3 hours. We would like to speed this up. I am thinking this is
> an ideal application for mod_perl because it would allow multiple
> processes to share memory.
>
> The scheme would be to load the hashes on apache startup and have a master
> program send a request per case; the apache children would use the shared
> hashes.
>
> I just want to verify some of the details about variable sharing.
> Would the following setup work (oversimplified, but you get the idea…):
>
> In a file Data.pm, which I would use() in my Apache startup.pl, I would
> load the perl hashes and have hash references that would be retrieved
> with class methods:
>
> package Data;
>
> my %big_hash;
>
> open(FILE, "file.txt");
>
> while ( <FILE> ) {
>   … code ….
>   $big_hash{ $key } = $value;
> }
>
> sub get_big_hashref { return \%big_hash; }
>
> <snip>
>
> And so in the apache request handler, the code would be something like:
>
> use Data;
>
> my $hashref = Data::get_big_hashref();
>
> …. code to access $hashref data with request parameters ….
>
> <snip>
>
> The idea is that the HTTP request/response will contain the relevant
> input/output for each case, and the master client program will collect
> these and concatenate the final output from all the requests.
>
> So any issues/suggestions with this approach? I am facing a non-trivial
> task of refactoring the existing code to work in this framework, so I just
> wanted to get some feedback before I invest more time into this...
>
> I am planning on using mod_perl 2.07 on a linux machine.
>
> Thanks in advance, Alan
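To make the handler side of this plan concrete, here is a hedged sketch of a mod_perl 2 response handler using the preloaded module. It only runs inside Apache with mod_perl loaded; the package name My::CaseHandler, the `id` query parameter, and the plain-text response format are all invented for illustration, while Data::get_big_hashref() is taken from Alan's sketch above.

```perl
package My::CaseHandler;
use strict;
use warnings;

use Apache2::RequestRec ();   # $r->args, $r->content_type
use Apache2::RequestIO  ();   # $r->print
use Apache2::Const -compile => qw(OK);

# Data.pm is already in memory: startup.pl use()d it in the parent
# process, so every Apache child shares those pages copy-on-write.
use Data ();

sub handler {
    my $r = shift;

    # e.g. GET /case?id=12345 -- the "id" parameter name is made up here.
    my %args = map { split /=/, $_, 2 } split /&/, ( $r->args // '' );

    my $href = Data::get_big_hashref();   # just a reference: no copying

    $r->content_type('text/plain');
    $r->print( $href->{ $args{id} // '' } // "unknown case" );

    return Apache2::Const::OK;
}

1;
```

One caveat: keep the handler strictly read-only on the shared hash. Any write, including accidental autovivification from a nested dereference like `$href->{a}{b}` when `a` is missing, dirties memory pages and forces the child to copy them, eroding the shared-memory win.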