On Tue, Sep 18, 2001 at 11:08:22PM -0700, Justin Erenkrantz wrote:
> FWIW, in my tests on the Solaris 8/MP box, worker MPM is close enough
> to prefork where you can call it a dead heat.
>
> Flood results are at:
>
> http://www.apache.org/~jerenkrantz/mpm-flood-relative.tar.bz2
I compiled this data into a nifty little graph (a histogram):
http://www.clove.org/~aaron/workers_comp.png
Details (from what Justin told me):
225000 total hits, 25 simultaneous clients, 9 URLs (hits) per keepalive connection
prefork-relative - prefork MPM with default config
-- total avg. 1037.703 r/s
worker-relative - worker MPM with 3 children, 35 threads each
-- total avg. 1050.023 r/s
worker-relative-one - worker MPM with 1 child, 35 threads
-- total avg. 1035.983 r/s
Observation:
Don't be deceived by this graph: the overall requests/second figures
are almost identical. I deliberately zoomed in on this vertical range
to make an observation. You'll notice that the single-child worker
starts out the slowest, and after about 20 seconds (really 20,000
requests, only ~1/9th of which were new socket connections due to the
keepalive setting) it catches up to the 3-child version of worker. This
is probably because the LWP creation agent (the one Justin and I talked
about so tirelessly in that awful setconcurrency() thread) hasn't yet
created enough LWPs for the threads to get scheduled on the second CPU.
It remains to be seen whether this "ramp-up" is more dramatic or
negligible on larger MP machines.
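
For anyone who didn't follow that setconcurrency() thread, here is a
minimal sketch of the kind of hint we were talking about. This is not
the actual worker MPM code; THREADS_PER_CHILD is just a stand-in for
the ThreadsPerChild value used in these tests:

#include <pthread.h>
#include <stdio.h>

#define THREADS_PER_CHILD 35  /* stand-in for ThreadsPerChild */

static void *worker_thread(void *arg)
{
    (void) arg;  /* unused in this sketch */
    /* ... accept and handle requests ... */
    return NULL;
}

int main(void)
{
    pthread_t tid[THREADS_PER_CHILD];
    int i, rc;

    /* Hint the two-level threads library to create enough LWPs up
     * front so every thread can be scheduled across the CPUs right
     * away; otherwise the LWP creation agent only adds LWPs as it
     * notices runnable-but-unscheduled threads, which looks like the
     * ramp-up in the graph. */
    rc = pthread_setconcurrency(THREADS_PER_CHILD);
    if (rc != 0)
        fprintf(stderr, "pthread_setconcurrency failed: %d\n", rc);

    for (i = 0; i < THREADS_PER_CHILD; i++)
        pthread_create(&tid[i], NULL, worker_thread, NULL);
    for (i = 0; i < THREADS_PER_CHILD; i++)
        pthread_join(tid[i], NULL);

    return 0;
}

The call is only a hint, so it shouldn't hurt on a threads library
that already schedules threads 1:1 onto LWPs.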
General Questions:
- Child processes may have threads running across multiple CPUs. The
children share memory between processes, and even more significantly
between the threads within each process. Given this scenario and my
relative naivete with SMP architectures, how worried should we be about
cache invalidation for these heavily accessed regions of shared memory?
We don't want to end up in a scenario where the threads of a single
process are spread across multiple CPUs, and those CPUs go around
invalidating each other's caches whenever they update this shared
memory space (there's a small sketch of what I mean below, after these
questions). I'm thinking the best way to avoid this is to hope that the
kernel is smart enough to keep a process's threads together on the same
CPU when possible, and to purposefully keep the number of child
processes at a multiple of the number of CPUs in the system. Is this
even close to a valid argument?
- Justin mentioned that he observed a much higher load average on the
worker tests compared to the prefork tests: something on the order of
~8 for the worker case versus ~3-4 for prefork. Can someone explain to
us why this might be happening? It obviously didn't hurt performance,
since worker performed slightly better.
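
To make the cache question more concrete, here is the sketch I
mentioned above. It's an illustration only, not Apache code; the
per-worker counter and the 64-byte line size are assumptions I picked
for the example.

#include <stdio.h>

#define NUM_WORKERS     35   /* stand-in for ThreadsPerChild */
#define CACHE_LINE_SIZE 64   /* assumed line size; hardware-dependent */

/* Bad: counters for different worker threads get packed onto the same
 * cache line, so an update by a thread on one CPU invalidates that
 * line in every other CPU's cache (false sharing). */
struct worker_stats_packed {
    unsigned long requests;
};

/* Better: pad each slot out to a full cache line so one thread's
 * updates never touch a line that another thread is writing. */
struct worker_stats_padded {
    unsigned long requests;
    char pad[CACHE_LINE_SIZE - sizeof(unsigned long)];
};

int main(void)
{
    struct worker_stats_packed packed[NUM_WORKERS];
    struct worker_stats_padded padded[NUM_WORKERS];

    printf("packed table: %lu bytes, padded table: %lu bytes\n",
           (unsigned long) sizeof(packed),
           (unsigned long) sizeof(padded));
    return 0;
}

Padding trades a little memory for a guarantee, instead of hoping the
scheduler keeps a process's threads on one CPU. Whether the shared
regions we actually hammer are laid out like the "packed" case is the
part I don't know.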
-aaron