On Tue, Sep 18, 2001 at 11:08:22PM -0700, Justin Erenkrantz wrote:
> FWIW, in my tests on the Solaris 8/MP box, worker MPM is close enough
> to prefork where you can call it a dead heat.
>
> Flood results are at:
>
> http://www.apache.org/~jerenkrantz/mpm-flood-relative.tar.bz2
I compiled this data into a nifty little graph (a histogram):
http://www.clove.org/~aaron/workers_comp.png
Details (from what Justin told me):
225000 total hits, 25 simultaneous clients, 9 URLs (hits) per keepalive connection
prefork-relative - prefork MPM with default config
-- total avg. 1037.703 r/s
worker-relative - worker MPM with 3 children, 35 threads each
-- total avg. 1050.023 r/s
worker-relative-one - worker MPM with 1 child, 35 threads
-- total avg. 1035.983 r/s
Observation:
Don't be deceived by this graph: the overall requests/second figures
are almost identical. I deliberately zoomed in on this vertical range
to make an observation. You'll notice that the single-child worker
starts out the slowest, and after about 20 seconds (really 20,000
requests, only ~1/9th of which were new socket connections due to the
keepalive setting) it catches up to the 3-child version of worker. This
is probably because the LWP creation agent (the one Justin and I talked
about so tirelessly in that awful setconcurrency() thread) hasn't yet
created enough LWPs for the threads to get scheduled on the second CPU.
It remains to be seen whether this "ramp-up" is more dramatic or
negligible on larger MP machines.
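
For anyone who didn't follow that setconcurrency() thread, here is a
minimal sketch of the kind of hint we were talking about. This is not
the actual worker MPM code; THREADS_PER_CHILD is just a stand-in for
the ThreadsPerChild value used in these tests:

#include <pthread.h>
#include <stdio.h>

#define THREADS_PER_CHILD 35  /* stand-in for ThreadsPerChild */

static void *worker_thread(void *arg)
{
    (void) arg;  /* unused in this sketch */
    /* ... accept and handle requests ... */
    return NULL;
}

int main(void)
{
    pthread_t tid[THREADS_PER_CHILD];
    int i, rc;

    /* Hint the two-level threads library to create enough LWPs up
     * front so every thread can be scheduled across the CPUs right
     * away; otherwise the LWP creation agent only adds LWPs as it
     * notices runnable-but-unscheduled threads, which looks like the
     * ramp-up in the graph. */
    rc = pthread_setconcurrency(THREADS_PER_CHILD);
    if (rc != 0)
        fprintf(stderr, "pthread_setconcurrency failed: %d\n", rc);

    for (i = 0; i < THREADS_PER_CHILD; i++)
        pthread_create(&tid[i], NULL, worker_thread, NULL);
    for (i = 0; i < THREADS_PER_CHILD; i++)
        pthread_join(tid[i], NULL);

    return 0;
}

The call is only a hint, so it shouldn't hurt on a threads library
that already schedules threads 1:1 onto LWPs.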
General Questions:
- Child processes may have threads running across multiple CPUs. The
children share memory between processes, and even more significantly
between the threads within each process. Given this scenario and my
relative naivete with SMP architectures, how worried should we be about
cache invalidation for these heavily accessed regions of shared memory?
We don't want to end up in a scenario where the threads of a single
process are spread across multiple CPUs, and those CPUs go around
invalidating each other's caches whenever they update this shared
memory space (there's a small sketch of what I mean below, after these
questions). I'm thinking the best way to avoid this is to hope that the
kernel is smart enough to keep a process's threads together on the same
CPU when possible, and to purposefully keep the number of child
processes at a multiple of the number of CPUs in the system. Is this
even close to a valid argument?
- Justin mentioned that he observed a much higher load average on the
worker tests compared to the prefork tests: something on the order of
~8 for the worker case versus ~3-4 for prefork. Can someone explain to
us why this might be happening? It obviously didn't hurt performance,
since worker performed slightly better.
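
To make the cache question more concrete, here is the sketch I
mentioned above. It's an illustration only, not Apache code; the
per-worker counter and the 64-byte line size are assumptions I picked
for the example.

#include <stdio.h>

#define NUM_WORKERS     35   /* stand-in for ThreadsPerChild */
#define CACHE_LINE_SIZE 64   /* assumed line size; hardware-dependent */

/* Bad: counters for different worker threads get packed onto the same
 * cache line, so an update by a thread on one CPU invalidates that
 * line in every other CPU's cache (false sharing). */
struct worker_stats_packed {
    unsigned long requests;
};

/* Better: pad each slot out to a full cache line so one thread's
 * updates never touch a line that another thread is writing. */
struct worker_stats_padded {
    unsigned long requests;
    char pad[CACHE_LINE_SIZE - sizeof(unsigned long)];
};

int main(void)
{
    struct worker_stats_packed packed[NUM_WORKERS];
    struct worker_stats_padded padded[NUM_WORKERS];

    printf("packed table: %lu bytes, padded table: %lu bytes\n",
           (unsigned long) sizeof(packed),
           (unsigned long) sizeof(padded));
    return 0;
}

Padding trades a little memory for a guarantee, instead of hoping the
scheduler keeps a process's threads on one CPU. Whether the shared
regions we actually hammer are laid out like the "packed" case is the
part I don't know.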
-aaron