On Thu, Apr 10, 2008 at 01:02:34PM -0700, Alexander Kolbasov wrote:
>
> It may be useful to observe prstat -mL which will report micro-state
> accounting data for each thread.
>
prstat -mL takes a *lot* of time with 16k threads. Nevertheless, I have some
further data: rerunning dtrace gave me the top 10 consumers of CPU time for
both CPUs. The winner is "idle" on CPU0 (671 of 2029 samples; the next
highest has 114 samples), and again "idle" on CPU1 (776 of 3891 samples; the
next highest has 106 samples). Ordinary prstat shows that my process is often
in the sleep state.
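To put those sample counts in proportion, here is a small sketch that turns
them into percentages. The counts are the ones quoted above; the variable and
function names are only illustrative.

```python
# Sample counts copied from the message: (idle, total, next highest consumer).
samples = {
    "CPU0": (671, 2029, 114),
    "CPU1": (776, 3891, 106),
}

def share(part, total):
    """Return part as a percentage of total."""
    return 100.0 * part / total

for cpu, (idle, total, runner_up) in samples.items():
    print(f"{cpu}: idle {share(idle, total):.1f}% of samples, "
          f"next highest {share(runner_up, total):.1f}%")
```

On both CPUs "idle" accounts for several times more samples than the next
highest consumer.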
Furthermore, I do not think that the problem lies in TLB thrashing. Here
are three different runs:
2^28 B total block size (256 MB), 2^14 B chunk size (= 2^(28-14) = 2^14 threads),
2^7 repetitions (= 2^35 B (32 GB) encrypted in total): 33.6 seconds
2^24 B total block size (16 MB), 2^10 B chunk size (= again 2^14 threads),
2^7 repetitions (= 2^31 B (2 GB) encrypted in total): 25.4 seconds
A 16 MB block size is only twice the TLB capacity (2048 entries x 4 kB = 8 MB).
Lowering the block size to 4 MB (half the TLB capacity) gives the following:
2^22 B total block size (4 MB), 2^8 B chunk size (= 2^14 threads),
2^7 repetitions (= 2^29 B (512 MB) encrypted in total): 24.5 seconds
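The arithmetic behind the three runs can be checked with a short sketch. It
uses only the sizes and timings quoted above; the names are illustrative, and
the "time per chunk" figure is simply wall-clock time divided by the number of
chunk encryptions (chunks x repetitions), not a measured per-thread cost.

```python
# (log2 block size, log2 chunk size, log2 repetitions, seconds) per run.
runs = [(28, 14, 7, 33.6), (24, 10, 7, 25.4), (22, 8, 7, 24.5)]

# TLB capacity as stated in the message: 2048 entries x 4 kB pages = 8 MB.
TLB_BYTES = 2048 * 4096

def summarize(block_log2, chunk_log2, reps_log2, seconds):
    """Derive thread count, data volume and rates for one run."""
    threads = 2 ** (block_log2 - chunk_log2)       # one thread per chunk
    total_bytes = 2 ** (block_log2 + reps_log2)    # block size x repetitions
    chunk_ops = threads * 2 ** reps_log2           # chunk encryptions overall
    return {
        "threads": threads,
        "total_bytes": total_bytes,
        "block_vs_tlb": 2 ** block_log2 / TLB_BYTES,
        "mib_per_s": total_bytes / 2 ** 20 / seconds,
        "us_per_chunk": seconds / chunk_ops * 1e6,
    }

for run in runs:
    s = summarize(*run)
    print(f"threads={s['threads']}  block/TLB={s['block_vs_tlb']:g}  "
          f"{s['mib_per_s']:.1f} MiB/s  {s['us_per_chunk']:.1f} us per chunk")
```

The data volume shrinks by a factor of 64 between the first and the last run
while the run time drops by only about a quarter, so the time per chunk stays
in the same range whether or not the block fits in the TLB.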
==
Are there any backoff heuristics in the mutex/CV/whatever code that put a
thread to sleep under high contention? Adaptive mutexes? I'm off to browse the
OpenSolaris code on the net.
==
_______________________________________________
perf-discuss mailing list
[email protected]