Re: Reducing memory usage using fewer cgi programs

André Warnier Sat, 25 Oct 2008 04:39:59 -0700

Michael Lackhoff wrote:

On 24.10.2008 15:03 Michael Peters wrote:
This is only true if those structures were created during run time and go out of scope at run time. If they are generated at compile time or attached to global variables or package level variables, they will not be re-used by Perl.
Wait a minute, I would like to do exactly that: use a config module in
startup.pl that loads some massive config hashes in the hope that the
memory they use will be shared:

package MyConfig;
our $aHugeConfigHash = load_data_from_config_file();

then in my mod_perl module:
my $conf = $MyConfig::aHugeConfigHash;

(well sort of, it is actually wrapped in an accessor but that gets its
data from the package variable)
Are you saying, I cannot share the memory this way?

Yes, he is saying that.
You cannot share memory between Apache "children" (independently of
whether we are talking about perl, mod_perl, or whatever else).  Each
child is a separate process, with its separate copy of mod_perl, the
perl interpreter, global variables, everything.

What happens when you are using a mod_perl startup script is :
Apache will load mod_perl and perl, and compile and execute this script
(and all that it "use"'s) *before* it forks into multiple children.

So when Apache has finished its initialisation, and forks into multiple
children, each one of those will have its own copy of what was compiled
and run and initialised there, without needing to recompile and execute
them itself.
The same happens when in the future Apache creates a new child (by
forking again) : this new child will also get that same initial copy of
the modules and structures you compile/create at startup time.

To a certain extent, this can save memory under modern operating
systems, because a piece of memory that is identical for a number of
processes, can be in memory only once, and shared between processes, *as
long as nothing in it is modified*. (That's the "copy-on-write" thing).
But as soon as one of the processes modifies something in that memory
area, the OS will copy the affected memory pages and give the new copy
to that process to modify, and after that the process keeps this
"personal copy".  So any changes made to this table are invisible to the other
processes (Apache children), because they are still using the unmodified
"shared" original copy.

That can still be a huge time saving though.  Imagine that loading this
table initially takes 2 minutes, and that you have 30 Apache children.
If you load it in your startup script, it will be done once and take 2
minutes.  If you don't, it will be done in each new Apache child, and
take in total 60 minutes, plus 2 minutes each time a child dies and a
new one is started.

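In your case, what that means is : if you allocate your huge hashtable
once at the beginning, and later you never modify it, then yes you can
probably consider that it will be loaded and present in memory only once
(but even that depends on how perl internally handles it : perl may
write to its own bookkeeping, such as reference counts, even when your
code only reads the data, and that un-shares the pages it touches).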
But as soon as one of the Apache children modifies this hashtable, then
it is 100% sure that this process now has its own copy forever after.
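
The "own copy" behaviour can be seen without Apache at all.  What follows
is only a minimal sketch, assuming a Unix-like system where fork() is
available; %table and child_change_visible_to_parent() are made-up names
for illustration, and the forked child stands in for an Apache child.

```perl
use strict;
use warnings;
use POSIX ();

# Stand-in for a table that was loaded before Apache forked.
our %table = (colour => 'red');

# Fork a child, let it modify the table, then report what the parent sees.
sub child_change_visible_to_parent {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        $table{colour} = 'blue';   # the write lands in the child's private copy
        POSIX::_exit(0);           # leave at once, skipping END blocks
    }
    waitpid($pid, 0);              # wait until the child has made its change
    return $table{colour};         # the parent's copy is untouched
}

print "parent sees: ", child_change_visible_to_parent(), "\n";
# prints "parent sees: red"
```

The child changed its table, but the parent (and any other child) still
sees the original value.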

Now, one of the characteristics of running things under mod_perl, is
that mod_perl and the perl interpreter are "persistent" within that
Apache child.  In other words, it is the same mod_perl and perl
interpreter that execute many modules or scripts one after the other,
and they never themselves terminate.
And they do "remember" some things between consecutive runs of scripts
or modules.  That is usually undesirable, because it can give nasty
errors : a global variable, or a file-level "my $var" that is used
inside a subroutine of a registry script, that you expect
to be "undef", might not be, if a previous run of the same script or
module (in the same Apache child) has left something in it.
But if you use this carefully, it may also be very useful, because it
might "remember" your hashtable between one call and the
next, and avoid you having to reload the table from scratch.
Just be careful about this, and remember always that when you find
something already in the table, it is due to a previous run of something
in this particular Apache child, not in Apache in general.  You are
still not sharing this table with other Apache children and other
mod_perl and perl instances.
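
As a rough illustration of this "remembering", here is a sketch of a
table kept in a package variable and loaded only on first use.  The
names $table, $loads, load_table() and table() are invented for the
example, and load_table() merely stands in for the real, slow load.

```perl
use strict;
use warnings;

our $table;        # survives between requests inside one process
our $loads = 0;    # counts how often this process did the slow load

# Hypothetical stand-in for the expensive load from the original data.
sub load_table {
    $loads++;
    return { answer => 42 };
}

# Return the table, loading it only on the first call in this process.
sub table {
    $table = load_table() unless defined $table;
    return $table;
}

# Three "requests" handled by the same process trigger a single load.
table() for 1 .. 3;
print "loads in this process: $loads\n";
# prints "loads in this process: 1"
```

Under mod_perl each Apache child would keep its own $table and its own
$loads; nothing here is shared between children.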

And if so, is there an alternative?

There are several, which depend on what you really do with this data,
how often it is modified etc.
One alternative goes somewhat like this :
- the table is loaded from the original data in the startup script, and
a reference to it put in a global variable (our $hashtable)
- the startup script then writes the loaded table into a file, as a
Storable object, and initialises another global variable $stamp with the
current time.
- each time your application module/script starts, it compares its
global $stamp variable with the Storable file's timestamp.  If they are
different (and only then), it reloads the table from the Storable file.
That, hopefully, is a lot faster than having to rebuild the table from
scratch.
If the table is mostly used read-only, and modifications to it are
infrequent, that may be your best bet.

Of course if one process modifies the table, and the changes have to
become visible to the others, it needs to rewrite the Storable object,
with an appropriate inter-process locking mechanism.
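
The Storable scheme described above could look roughly like this.  It is
a sketch under a few assumptions, not a finished implementation :
save_table() and current_table() are invented names, the caller chooses
the file path, and $stamp holds the file's modification time as seen at
the last load (a small variation on "the current time").  lock_store and
lock_retrieve are the Storable functions that take an advisory lock
around the write and the read.

```perl
use strict;
use warnings;
use Storable qw(lock_store lock_retrieve);

our $hashtable;    # this process's cached copy of the table
our $stamp = 0;    # mtime of the Storable file when we last loaded it

# Write the table to the file and remember the file's timestamp.
# Meant for the startup script, or for a process that changed the table.
sub save_table {
    my ($file, $table) = @_;
    lock_store($table, $file);       # exclusive advisory lock while writing
    $hashtable = $table;
    $stamp     = (stat $file)[9];
    return;
}

# Return the table, re-reading the file only if its timestamp changed.
sub current_table {
    my ($file) = @_;
    my $mtime = (stat $file)[9];
    die "cannot stat $file: $!" unless defined $mtime;
    if (!defined $hashtable || $mtime != $stamp) {
        $hashtable = lock_retrieve($file);   # shared advisory lock while reading
        $stamp     = $mtime;
    }
    return $hashtable;
}
```

The application module would call current_table($file) at the start of
each request.  Note that file timestamps usually have a resolution of
one second, so two rewrites within the same second may go unnoticed.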


The thing is that if a table is in a global variable, it will be kept in
the memory *of that Apache child* across separate invocations of the
application modules over time *executed by that same child*.
So if it does not change often, you may run the script hundreds of times
before it needs to reload the table.

Another alternative is to have this huge data structure loaded/created
by a totally independent "server" process, and have all your application
modules/scripts access this separate process through TCP/IP to read or
modify the table.

There exists a module like that somewhere in CPAN, I believe it is
called "daemon"-something.
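
To give an idea of the separate "server" approach, here is a bare-bones
sketch using only IO::Socket::INET from the standard distribution.  It is
not the CPAN module mentioned above; start_table_server(), lookup() and
the one-key-per-line protocol are all invented for the example, and the
server handles one client at a time and supports reads only.

```perl
use strict;
use warnings;
use POSIX ();
use IO::Socket::INET;

# Fork a server process that owns the table and answers lookups.
# Returns the server's pid and the TCP port it listens on.
sub start_table_server {
    my ($table) = @_;
    my $listener = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => 0,              # let the OS pick a free port
        Listen    => 5,
        Proto     => 'tcp',
    ) or die "cannot listen: $!";
    my $port = $listener->sockport;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Server process: one key per line in, one value per line out.
        while (my $client = $listener->accept) {
            while (defined(my $key = <$client>)) {
                $key =~ s/\r?\n\z//;
                my $value = exists $table->{$key} ? $table->{$key} : '';
                print {$client} "$value\n";
            }
            close $client;
        }
        POSIX::_exit(0);
    }
    close $listener;                 # the parent does not accept connections
    return ($pid, $port);
}

# Client side: ask the server for the value stored under one key.
sub lookup {
    my ($port, $key) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => '127.0.0.1',
        PeerPort => $port,
        Proto    => 'tcp',
    ) or die "cannot connect: $!";
    print {$sock} "$key\n";
    my $value = <$sock>;
    close $sock;
    chomp $value if defined $value;
    return $value;
}
```

Each Apache child would then call something like lookup($port, 'colour')
instead of holding its own copy of the table, at the price of a network
round trip per access.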

IPC-based modules also exist, but they work only under Unix/Linux.
