so, i've done a little bit more work characterizing the performance
of the scheduler correctness changes, and i now have some understanding
of why e.g. ping times are a bit slower.

the old code essentially let processor 0 spin in runproc, while the other
processors called halt.  the new code uses monmwait to wait for a change
on all processors.
this has some significant impacts on performance and power use.  for example,
on my test box with 4c/8t:

        spin/halt   monmwait    spin/monmwait
ping    8µs         14µs        8µs             # ip/ping -n10 $sysname
mk      6.26s       3.98s       3.80s           # make nix kernel
fans    audible     silent      audible
δpower  -           -24w        0               # resolution = .1A = 12w @ 120v

this seems to indicate that the latency is all in runproc(), and that not
waiting for things to be ready, but assuming they will be, gives a big
performance boost.

(the third column, spin on mach 0 plus monmwait on the others, was done
to tell whether monmwait has high latency or not.)

i'd really be interested to see what this does on 24c/48t machines.  something
tells me the performance impacts would be huge, and different.

- erik

---
ps. hzsched in the distribution is 10% off for HZ=100, since
schedticks = m->ticks + HZ/10, and delaysched tests
for > not the expected >=.
