I agree, either threads or Parallel::ForkManager, depending on your
platform and your perl, will be a lot faster than mod_perl for this. Of
course there might be other reasons to use mod_perl, e.g. it's useful to
have this available as a remote service, or you want to call this
frequently for small jobs throughout the day without needing to reload the
data.
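
For example, something along these lines with Parallel::ForkManager
(load_data, split_cases and process_case here are placeholders for your own
routines, just to sketch the shape of it):

use strict;
use warnings;
use Parallel::ForkManager;

# Load the big read-only hashes once in the parent. On Linux/Unix the
# forked children see that data copy-on-write, so it isn't duplicated as
# long as they only read it (Perl's refcounting will still dirty some pages).
my $data = load_data();

my $pm = Parallel::ForkManager->new(8);   # number of worker processes

for my $chunk (split_cases()) {           # split the millions of cases into chunks
  $pm->start and next;                    # parent: go start the next chunk
  process_case($_, $data) for @$chunk;    # child: read-only access to $data
  $pm->finish;                            # child exits here
}
$pm->wait_all_children;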

- Perrin

On Tue, Feb 3, 2015 at 7:42 AM, Patton, Billy N <billy.pat...@h3net.com>
wrote:

> Unless I’m entirely wrong, it appears that you want to use shared threaded
> memory.
> This would let you stay out of Apache altogether.
> Here is an example of using threads that I worked out using shared memory.
> We took a 4 hour task, serial, and turned it into 5 minutes with threads.
> This worked extremely well for me.  I wasn’t using large hashes, but I was
> processing hundreds of files per thread, with 30k lines per file.
> #!/usr/bin/env perl
> use strict;
> use warnings;
> use Data::Dumper;
> $Data::Dumper::Indent = 1;
> $Data::Dumper::Sortkeys = 1;
> $Data::Dumper::Deepcopy = 1;
> use threads;
> use threads::shared;
> use constant  MAX_TRIES => 5;
>
> sub sub_threads($$$);
>
> my $switch            = undef;
> my $hash              = undef;
> my $gsx               = undef;
> my $cnt               = 5;
> my %switches    = ( 'A' => { 'b' => undef , 'c' => undef, 'd' => undef },
>                     'B' => { 'b' => undef , 'c' => undef, 'd' => undef },
>                     'C' => { 'b' => undef , 'c' => undef, 'd' => undef },
>                     'D' => { 'b' => undef , 'c' => undef, 'd' => undef },
>                     'E' => { 'b' => undef , 'c' => undef, 'd' => undef },
>                    );
> my %threads : shared  = ();
>
> ######
> ## create the threads
> #####
> while (($switch,$hash) = each %switches) {
>   unless (exists $threads{$switch}) {
>     my %h : shared;
>     $threads{$switch} = \%h;
>   }
>   while (($gsx, $_) = each %$hash) {
>     unless (exists $threads{$switch}{$gsx}) {
>       my %h : shared;
>       $threads{$switch}{$gsx} = \%h;
>     }
>     unless (exists $threads{$switch}{$gsx}{'messages'}) {
>       my @h : shared;
>       $threads{$switch}->{$gsx}->{'messages'} = \@h;
>     }
>     $hash->{$gsx}->{'thread'} = threads->create(\&sub_threads,\$switch,\$gsx,\$cnt);
>     $hash->{$gsx}->{'tries'}  = 1;
>     $cnt += 5;
>   }
> }
>
> #print Dumper \%threads;
> #print Dumper \%switches;
>
> ######
> ## endless loop to run while threads are still running
> ######
> $cnt = 1;
> while ($cnt) {
>   $cnt = 0;
>   while (($switch,$hash) = each %switches) {
>     while (($gsx, $_) = each %$hash) {
>       if ($hash->{$gsx}->{'thread'}->is_running()) {
>         $cnt = 1;
> #        print "$switch->$gsx is running\n";
>       } else {
> #        print "$switch->$gsx is NOT running\n";
> #        print "  Reason for failure : \n";
> #        print '     ' .  join("\n", @{$threads{$switch}->{$gsx}->{'messages'}}) . "\n";
>         if ($hash->{$gsx}->{'tries'} < MAX_TRIES) {
> #          print "  max tries not reached for $switch->$gsx, will be trying again!\n";
>           $hash->{$gsx}->{'tries'}++;
>           $hash->{$gsx}->{'thread'} = threads->create(\&sub_threads,\$switch,\$gsx,\$cnt);
>         } else {
>           print "send email! $switch->$gsx\n";
>         }
>       }
>     }
>     sleep 2;
>   }
> }
>
> #print Dumper \%threads;
> #print Dumper \%switches;
>
>
> sub sub_threads($$$) {
>   my $ptr_switch  = shift;
>   my $ptr_gsx     = shift;
>   my $ptr_tNum    = shift;
>   sleep $$ptr_tNum;
>   {
>     lock(%threads);
>     push @{$threads{$$ptr_switch}->{$$ptr_gsx}->{'messages'}}, "Leaving thread $$ptr_switch->$$ptr_gsx";
>     #$threads{$$ptr_switch}->{$$ptr_gsx}->{'messages'} = "Leaving thread $$ptr_switch->$$ptr_gsx";
>     # lock freed at end of scope
>   }
>   return 0;
> }
>
> On Feb 2, 2015, at 10:11 PM, Alan Raetz <alanra...@gmail.com> wrote:
>
> So I have a perl application that upon startup loads about ten perl hashes
> (some of them complex) from files. This takes up a few GB of memory and
> about 5 minutes. It then iterates through some cases and reads from (never
> writes) these perl hashes. To process all our cases, it takes about 3 hours
> (millions of cases). We would like to speed up this process. I am thinking
> this is an ideal application of mod_perl because it would allow multiple
> processes that share memory.
>
> The scheme would be to load the hashes on apache startup and have a master
> program send requests with each case and apache children will use the
> shared hashes.
>
> I just want to verify some of the details about variable sharing.  Would
> the following setup work (oversimplified, but you get the idea…):
>
> In a file Data.pm, which I would use() in my Apache startup.pl, I would
> load the perl hashes and have hash references that would be retrieved with
> class methods:
>
> package Data;
>
> my %big_hash;
>
> open(my $fh, '<', 'file.txt') or die "cannot open file.txt: $!";
>
> while ( <$fh> ) {
>
>       … code ….
>
>       $big_hash{ $key } = $value;
> }
>
> sub get_big_hashref {   return \%big_hash; }
>
> <snip>
>
> And so in the apache request handler, the code would be something like:
>
> use Data;
>
> my $hashref = Data::get_big_hashref();
>
> …. code to access $hashref data with request parameters…..
>
> <snip>
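>
> Fleshed out a bit, I imagine the handler package would look roughly like
> this (MyApp::CaseHandler and the lookup line are just placeholders):
>
> package MyApp::CaseHandler;
>
> use strict;
> use warnings;
> use Apache2::RequestRec ();
> use Apache2::RequestIO ();
> use Apache2::Const -compile => qw(OK);
> use Data;   # %big_hash is populated when startup.pl loads Data.pm
>
> sub handler {
>   my $r = shift;
>   my $case    = $r->args;                   # case parameters from the query string
>   my $hashref = Data::get_big_hashref();    # read-only access to the preloaded hash
>   my $result  = $hashref->{$case} // '';    # placeholder lookup for this case
>   $r->content_type('text/plain');
>   $r->print($result);
>   return Apache2::Const::OK;
> }
>
> 1;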
>
> The idea is the HTTP request/response will contain the relevant
> input/output for each case… and the master client program will collect
> these and concatenate the final output from all the requests.
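>
> The master client program would then be something along these lines (the
> URL and the case loop are just a rough sketch):
>
> use strict;
> use warnings;
> use LWP::UserAgent;
>
> my $ua    = LWP::UserAgent->new;
> my @cases = load_cases();        # placeholder for however the cases are produced
> my @output;
>
> # In practice the requests would have to go out in parallel (several
> # client processes, or an async client) to get any speedup from the
> # multiple apache children.
> for my $case (@cases) {
>   my $res = $ua->get("http://localhost/case?case=$case");
>   push @output, $res->decoded_content if $res->is_success;
> }
>
> print join('', @output);         # concatenate the final output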
>
> So any issues/suggestions with this approach? I am facing a non-trivial
> task of refactoring the existing code to work in this framework, so just
> wanted to get some feedback before I invest more time into this...
>
> I am planning on using mod_perl 2.07 on a linux machine.
>
> Thanks in advance, Alan
>
>
