Aaron Bannert wrote:
[...]
>General Questions:
>- Child processes may have threads running across multiple CPUs. They
> each share memory between processes, and even more significantly
> between threads in the same process. Given this scenario and my relative
> naiveté with SMP architectures, how worried should we be about cache
> invalidation for these heavily accessed regions of shared memory? We
> don't want to end up in a scenario where the threads of a single
> process are spread across multiple CPUs, and those CPUs go around
> invalidating each other's caches when they update this shared memory
> space. I'm thinking that the best way to avoid this is to hope that
> the kernel is smart enough to keep threads together on the same CPU
> if possible, and to purposefully keep the number of child processes
> at a multiple of the number of CPUs in the system. Is this even close
> to a valid argument?
>
I think it's reasonable to assume that Solaris will make smart decisions
about LWP locality. If you want to validate this assumption empirically,
though, I think it's possible to get a snapshot of which CPU is running
each LWP via Solaris-specific ioctls on the corresponding /proc file.
(I know this was possible in 5.6; it's probably still supported in 5.8.)
This issue, by the way, makes me favor letting the OS figure out the
thread concurrency for itself: assuming that the number of child processes
is greater than or equal to the number of CPUs, it might be more effective
to have a small number of LWPs per process, with a large number of threads
per LWP, for the sake of cache locality.
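A minimal sketch of the empirical check mentioned above, in C. Instead of
the ioctls, it reads the structured /proc file
/proc/<pid>/lwp/<lwpid>/lwpsinfo (available since Solaris 2.6, i.e. SunOS
5.6) and reports pr_onpro, the processor that last ran the LWP. The helper
name lwp_last_cpu is hypothetical, and the Linux branch (field 39 of
/proc/<pid>/task/<tid>/stat) is an assumed analogue included only so the
sketch can be tried off Solaris.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#if defined(__sun)
#include <procfs.h>  /* lwpsinfo_t: structured /proc, Solaris 2.6 and later */
#endif

/*
 * Hypothetical helper: return the CPU that last ran LWP `lwpid` of process
 * `pid`, or -1 if that cannot be determined. The answer is only a snapshot;
 * the scheduler is free to move the LWP right after it is read.
 */
int lwp_last_cpu(pid_t pid, long lwpid)
{
    char path[64];

    assert(pid > 0 && lwpid > 0);

#if defined(__sun)
    /* Solaris: pr_onpro is the processor that last ran this LWP. */
    lwpsinfo_t info;
    ssize_t n;
    int fd;

    snprintf(path, sizeof path, "/proc/%d/lwp/%ld/lwpsinfo", (int)pid, lwpid);
    if ((fd = open(path, O_RDONLY)) < 0)
        return -1;
    n = read(fd, &info, sizeof info);
    close(fd);
    return n == (ssize_t)sizeof info ? (int)info.pr_onpro : -1;
#elif defined(__linux__)
    /* Assumed Linux analogue: field 39 ("processor") of the task's stat file. */
    char buf[4096];
    char *p;
    int field = 2;  /* the last ')' closes field 2, the command name */
    FILE *f;

    snprintf(path, sizeof path, "/proc/%d/task/%ld/stat", (int)pid, lwpid);
    if ((f = fopen(path, "r")) == NULL)
        return -1;
    p = fgets(buf, sizeof buf, f);
    fclose(f);
    if (p == NULL || (p = strrchr(buf, ')')) == NULL)
        return -1;
    for (p++; *p != '\0';) {
        while (*p == ' ')
            p++;                        /* skip separators */
        if (*p == '\0')
            break;
        if (++field == 39)
            return atoi(p);             /* atoi stops at the next space */
        while (*p != ' ' && *p != '\0')
            p++;                        /* skip the current field */
    }
    return -1;
#else
    (void)path;
    return -1;  /* no per-LWP CPU report assumed on other platforms */
#endif
}
```

On Solaris the LWP ids of a running child can be listed from the
/proc/<pid>/lwp directory; sampling every LWP of a worker child a few
times during a benchmark would show whether its threads tend to stay on
the same CPU.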
>- Justin mentioned that he observed much higher load on the worker tests
> compared to the prefork tests. Figures of roughly 8 for the
> worker case and 3-4 for prefork were discussed. Can someone explain
> to us why this might be happening? It obviously didn't impact
> performance negatively, since the worker performed slightly better.
>
I'm surprised by this result. In Ian's tests comparing worker and
prefork on an 8-CPU Solaris box, the load average was about the same
for both MPMs. Justin, do you remember how the usr+sys CPU utilization
for worker compared to that of prefork in your test?
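The load average reflects how many threads are running or waiting to
run, not how much CPU time they use, so usr+sys utilization is a more
direct basis for comparing the two MPMs. A minimal sketch of that
arithmetic, assuming two samples of cumulative per-state CPU ticks summed
over all CPUs; the struct cpu_ticks and the function usr_sys_percent are
hypothetical, and collecting the counters (for example, from kstat on
Solaris) is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of cumulative per-state CPU ticks, summed over all CPUs. */
struct cpu_ticks {
    uint64_t usr;   /* time in user mode */
    uint64_t sys;   /* time in kernel mode */
    uint64_t wait;  /* idle while waiting for I/O */
    uint64_t idle;  /* otherwise idle */
};

/* Percentage of elapsed CPU time spent in usr+sys between two samples. */
double usr_sys_percent(const struct cpu_ticks *before,
                       const struct cpu_ticks *after)
{
    uint64_t busy, total;

    /* Cumulative counters must not go backwards between samples. */
    assert(after->usr >= before->usr && after->sys >= before->sys &&
           after->wait >= before->wait && after->idle >= before->idle);

    busy  = (after->usr - before->usr) + (after->sys - before->sys);
    total = busy + (after->wait - before->wait) + (after->idle - before->idle);
    return total == 0 ? 0.0 : 100.0 * (double)busy / (double)total;
}
```

Taking both samples over the same benchmark interval for each MPM would
give directly comparable numbers: similar usr+sys percentages alongside
different load averages would suggest longer run queues rather than
extra CPU work.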
--Brian