Aaron Bannert wrote:

[...]

>General Questions:
>- Child processes may have threads running across multiple CPUs. They
>  each share memory between processes, and even more significantly
>  between threads in the same process. Given this scenario and my relative
>  naivete with SMP architectures, how worried should we be about cache
>  invalidation for these heavily accessed regions of shared memory? We
>  don't want to end up in a scenario where the threads of a single
>  process are spread across multiple CPUs, and those CPUs go around
>  invalidating each other's caches when they update this shared memory
>  space. I'm thinking that the best way to avoid this is to hope that
>  the kernel is smart enough to keep threads together on the same CPU
>  if possible, and by purposefully keeping the number of child processes
>  at a multiple of the number of CPUs in the system. Is this even close
>  to a valid argument?
>

I think it's reasonable to assume that Solaris will make smart decisions
about LWP locality.  If you want to validate this assumption empirically,
though, I think it's possible to get a snapshot of which CPU is running
each LWP, via Solaris-specific ioctls on the corresponding /proc file.
(I know this was possible in 5.6; it's probably still supported in 5.8.)

This issue, by the way, makes me favor letting the OS figure out the
thread concurrency for itself: assuming that the number of child processes
is greater than or equal to the number of CPUs, it might be more effective
to have a small number of LWPs per process, with a large number of threads
per LWP for cache locality purposes.

>- Justin mentioned that he observed much higher load on the worker tests
>  compared to the prefork tests. Something on the order of ~8 for the
>  worker case, and ~3-4 for prefork was discussed. Can someone explain
>  to us why this might be happening? It obviously didn't impact
>  performance negatively, since the worker performed slightly better.
>

I'm surprised about this result.  In Ian's tests comparing worker and
prefork on an 8-CPU Solaris box, the load average was about the same
for both MPMs.  Justin, do you remember how the usr+sys CPU utilization
for worker compared to that of prefork in your test?

--Brian

