Perrin-
On Sat, Apr 15, 2000 at 11:33:15AM -0700, Perrin Harkins wrote:
> > Each process of apache has
> > its registry which holds the compiled perl scripts... a copy of
> > each for each process.  This has become an issue for one of the
> > companies that I work for, and I noted from monitoring the list
> > that some people have apache processes that are upwards of 25MB,
> > which is frankly ridiculous.
> 
> I have processes that large, but more than 50% of that is shared through
> copy-on-write.
> 
> > I wrote a very small perl engine
> > for phhttpd that worked within its threaded paradigm, that sucked
> > up a negligible amount of memory, and that used a very basic
> > version of Apache's registry.
> 
> Can you explain how this uses less memory than mod_perl doing the same
> thing?  Was it just that you were using fewer perl interpreters?  If so, you
> need to improve your use of apache with a multi-server setup.  The only way
> I could see phhttpd really using less memory to do the same work is if you
> somehow managed to get perl to share more of its internals in memory.  Did
> you?

Yep, very handily I might add ;-).  Basically phhttpd is not process
based, it's thread based, which means that everything is running
inside of the same address space.  That means 100% sharing except for
the present local stack of variables... which is very minimal.  As
for the perl side of things: when you look at your processes and see
all that non-shared memory, most of that is stack variables.  Now
most webservers are running on single processor machines, so they get
no benefit from having 10s or even 100s of copies of these perl stack
variables.  It's much more efficient to have a single process handle
all the perl requests.  On a multiprocessor box that single process
could have multiple threads in order to take advantage of the
processors.  See, mod_perl stores the stack state of every script it
runs in the apache process... for every script... copies of it, many
many copies of it.  This is not efficient.  What would be efficient
is to have as many threads/processes as you have processors for the
mod_perl engine.  In other words, separate the engine from the apache
process so that unnecessary stack variables are never being tracked.
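
To make that concrete, here's a rough sketch in C of what I mean by
"as many threads as you have processors".  run_perl_request() is made
up, of course; it just stands in for handing a request to the one
shared perl engine:

/* Sketch: one worker thread per processor, all sharing a single
 * address space.  run_perl_request() is hypothetical; it stands in
 * for the embedded perl engine. */
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

static void run_perl_request(void)
{
    /* ... run the compiled registry script for one request ... */
}

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        /* ... block until a request arrives, then ... */
        run_perl_request();
    }
    return NULL;
}

int main(void)
{
    long i, ncpus = sysconf(_SC_NPROCESSORS_ONLN);
    if (ncpus < 1)
        ncpus = 1;

    pthread_t *tids = malloc(ncpus * sizeof(*tids));
    for (i = 0; i < ncpus; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    for (i = 0; i < ncpus; i++)
        pthread_join(tids[i], NULL);

    free(tids);
    return 0;
}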

Hmm... can I explain this better?  Let me try.  Okay: for every
apache process there is an entire perl engine, with all the stack
variables for every script you run recorded there.  What I'm
proposing is a system whereby there would be a separate process that
would have only a perl engine in it... you would make as many of
these processes as you have processors.  (Or multithread them... it
doesn't really matter.)  Now your apache processes would not have a
bunch of junk memory in them.  Your apache processes would be the
size of a stock apache process, like 4-6MB or so, and you would have
one process of 25MB or so that would have your entire registry in it.
For a high capacity box this would be an incredible boon to
increasing capacity.  (I'm trying to explain clearly, but I'd be the
first to admit this isn't one of my strong points.)
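
The apache side would then only need a thin stub.  Here's a sketch of
roughly what I have in mind, over a Unix-domain socket; the socket
path and the newline-terminated "protocol" are invented for the
example (shared memory pages would work just as well):

/* Sketch of the apache-side stub: forward the request to the
 * separate perl-engine process and relay whatever it produces.
 * ENGINE_SOCK and the one-line protocol are made up for the
 * example. */
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

#define ENGINE_SOCK "/tmp/perl-engine.sock"

int forward_to_perl_engine(const char *script, int client_fd)
{
    struct sockaddr_un addr;
    char buf[4096];
    ssize_t n;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, ENGINE_SOCK, sizeof(addr.sun_path) - 1);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }

    /* Tell the engine which registry script to run... */
    write(fd, script, strlen(script));
    write(fd, "\n", 1);

    /* ...and copy its output straight back to the client. */
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        write(client_fd, buf, n);

    close(fd);
    return 0;
}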

As to how the multithreaded phhttpd can handle tons of load, well...
that's a separate issue, and frankly a question much better handled
by Zach.  I understand it very well, but I don't feel that I could
adequately explain it.  It's based on realtime signal queue
(sigqueue) technology... for a "decent" reference on this you can
take a look at the O'Reilly book "POSIX.4: Programming for the Real
World".  I should say that this book doesn't go into enough depth...
but it's the only book I could find that goes into any depth at all.
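
That said, the heart of it on Linux looks something like this: you
tie a queued realtime signal to each descriptor, then pull I/O events
off the signal queue with sigwaitinfo().  This is my own rough sketch
of the mechanism, not phhttpd's actual code:

/* Rough sketch of the realtime signal queue I/O model on Linux.
 * F_SETSIG is Linux-specific (needs _GNU_SOURCE).  Not phhttpd's
 * code, just the mechanism. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

#define IO_SIG (SIGRTMIN + 1)

void watch_fd(int fd)
{
    fcntl(fd, F_SETOWN, getpid());             /* deliver events to us */
    fcntl(fd, F_SETSIG, IO_SIG);               /* ...as IO_SIG, queued */
    fcntl(fd, F_SETFL, O_NONBLOCK | O_ASYNC);  /* async notification   */
}

void event_loop(void)
{
    sigset_t set;
    siginfo_t info;

    sigemptyset(&set);
    sigaddset(&set, IO_SIG);
    sigprocmask(SIG_BLOCK, &set, NULL);  /* queue, don't deliver */

    for (;;) {
        sigwaitinfo(&set, &info);   /* dequeue the next I/O event */
        /* info.si_fd says which descriptor is ready, and
         * info.si_band says how (POLLIN/POLLOUT bits); service it
         * without ever blocking. */
    }
}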

> 
> > What I'm
> > thinking is essentially we take the perl engine which has the apache
> > registry and all the perl symbols etc., and separate it into its own
> > process, which could be multithreaded (via pthreads) for multiple
> > processor boxes.  (above 2 this would be beneficial probably)  On the
> > front side the apache module API would just connect into this other
> > process via shared memory pages (shmget et. al), or Unix pipes or
> > something like that.
> 
> This is how FastCGI, and all the Java servlet runners (JServ, Resin, etc.)
> work.  The thing is, even if you run the perl interpreters in a
> multi-threaded process, it still needs one interpreter per perl thread and I
> don't know how much you'd be able to share between them.  It might not be
> any smaller at all.

But there is no need to have more than one perl thread per processor.
Right now we have a perl "thread" (er... engine is a better term) per
process.  Since most boxes start up 10 processes or so of Apache,
we'd be talking about a memory savings something like this.  Assume a
6MB stock apache process, and a 25MB (we'll say that's average)
mod_perl apache process that is 50% shared, leaving 12.5MB non-shared
per process.

The way it works now:
  12.5MB non-shared * 10 processes = 125MB, plus the 12.5MB shared
  bit (one instance) = 137.5MB total.
The suggested way:
  A stock process has about 3MB shared or so, leaving 3MB non-shared.
  3MB * 10 processes = 30MB, plus 3MB shared (one instance), plus the
  25MB mod_perl process = 58MB total.

That would be an overall difference of 137.5 - 58... roughly 80MB of
memory.  I have no idea how accurate this is, but I'd put my money on
it being not too far from the expected result in a high load
environment with lots of apache scripts.
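
If you want to play with the assumptions, the back-of-the-envelope
math is just this (every figure is a guess, tweak to taste):

/* Back-of-the-envelope version of the numbers above; all the
 * figures are guesses. */
#include <stdio.h>

int main(void)
{
    double nprocs    = 10.0;  /* apache processes                 */
    double unshared  = 12.5;  /* MB non-shared per mod_perl proc  */
    double shared    = 12.5;  /* MB shared, counted once          */
    double now       = nprocs * unshared + shared;

    double stock_uns = 3.0;   /* MB non-shared per stock proc     */
    double stock_shr = 3.0;   /* MB shared, counted once          */
    double engine    = 25.0;  /* MB for the single perl process   */
    double suggested = nprocs * stock_uns + stock_shr + engine;

    printf("now %.1fMB, suggested %.1fMB, savings %.1fMB\n",
           now, suggested, now - suggested);
    return 0;
}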

> 
> My suggestion would be to look at the two-server approach for mod_perl, and
> if that doesn't work for you look at FastCGI, and if that doesn't work for
> you join the effort to get mod_perl working on Apache 2.0 with a
> multi-threaded model.  Or just skip the preliminaries and go straight for
> the hack value...

Well... the second option certainly has a lot of merit.  Maybe I
should get involved in that... actually that has a lot of appeal to
me.  Hmm... I guess it's time to pick up the apache 2.0 stuff and do
some tinkering! :)  As for the present problem... I'm not all that
concerned about it.  It actually falls outside the area of my
responsibilities at our site...  I'm thinking of the other people in
the community, mostly.

Thanks!
Shane
> 
> - Perrin
> 
