Hi,

Erik's thread about a 16-processor x86 machine convinced me to try something
related to spinlocks.

Plan 9's current spinlocks are portable code, calling an arch-provided tas()
in a loop to do their thing. On i386, Intel recommends 'PAUSE' in the core of
a spin-lock loop; I modified tas() to PAUSE (0xF3 0x90 if you prefer) if the
lock-acquire attempt failed.
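
For anyone who wants to play with the idea outside the kernel, here is a
rough user-space sketch in C11 of a tas-style primitive that issues PAUSE
only when the acquire attempt fails. The names SpinLock, cpu_relax,
tas_pause, spin_lock and spin_unlock are made up for illustration, and the
inline asm assumes a GCC- or Clang-style compiler; this is not the code in
taslock.c or l.s.

```c
#include <stdatomic.h>

/* Hypothetical lock type for illustration; not the kernel's Lock. */
typedef struct {
	atomic_flag held;
} SpinLock;

/* Spin-wait hint. On x86 this is PAUSE (0xF3 0x90); elsewhere a no-op. */
static inline void
cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
	__asm__ __volatile__("pause");
#endif
}

/*
 * Returns nonzero if the lock was already held, zero if we just took it.
 * PAUSE is issued only on the failure path.
 */
static inline int
tas_pause(SpinLock *l)
{
	if(atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)){
		cpu_relax();
		return 1;
	}
	return 0;
}

/* Portable-style spin loop built on the primitive above. */
static void
spin_lock(SpinLock *l)
{
	while(tas_pause(l))
		;
}

static void
spin_unlock(SpinLock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A lock would be declared as 'SpinLock l = { ATOMIC_FLAG_INIT };'. Keeping the
PAUSE on the failure path means the uncontended acquire costs the same as
before.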

In a crude test on a 1.5GHz P4 Willamette with a local fossil/venti and
256MB of RAM, 'time mk 'CONF=pcf' > /dev/null' in /sys/src/9/pc, on a
fully-built source tree, adding the PAUSE reduced times from an average of
18.97s to 18.84s (across ten runs).

I tinkered a bit further. Removing the increments of glare, inglare and
lockstat.locks, coupled with the PAUSE addition, reduced the average real
time to 18.16s, again across 10 runs.

If taslock.c were arch-specific, we could almost certainly do better: i386
doesn't need the coherence() call in unlock, and we could safely test-and-tas
rather than use a raw tas().
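
To make the test-and-tas idea concrete, here is a small C11 sketch. It
spins on a plain read and only attempts the atomic exchange once the lock
looks free, so waiters don't keep hammering the cache line with locked
writes. The names TtasLock, ttas_trylock, ttas_lock and ttas_unlock are
hypothetical, and this is a user-space illustration rather than a proposed
taslock.c.

```c
#include <stdatomic.h>

/* Hypothetical lock type for illustration; 0 = free, 1 = held. */
typedef struct {
	atomic_int held;
} TtasLock;

/* One attempt: returns 1 if we took the lock, 0 if it was busy. */
static int
ttas_trylock(TtasLock *l)
{
	/* Test first with an ordinary load; skip the tas if clearly busy. */
	if(atomic_load_explicit(&l->held, memory_order_relaxed) != 0)
		return 0;
	/* Looks free: now do the real test-and-set. */
	return atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0;
}

static void
ttas_lock(TtasLock *l)
{
	for(;;){
		/* Spin on the read until the lock looks free. */
		while(atomic_load_explicit(&l->held, memory_order_relaxed) != 0)
			;
		/* Someone may have beaten us to it; retry if so. */
		if(atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0)
			return;
	}
}

/* Release store only; no separate fence call. */
static void
ttas_unlock(TtasLock *l)
{
	atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

On x86, compilers typically turn that release store into an ordinary MOV
with no fence instruction, which is the same property that makes the
coherence() call unnecessary there.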

There are other places to look at too, wrt application of arch-specific
bits; see
http://code.google.com/p/inferno-npe/source/detail?r=b83540e1e77e62a19cbd21d2eb54d43d338716a5
for what XADD can do for incref/decref. Similarly, pc/l.s:_xdec could be
much shorter, again using XADD.
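
As a rough illustration of the XADD approach (not the code from the commit
linked above, and not the kernel's incref/decref), a reference count can be
bumped with a single atomic fetch-and-add instead of a lock/modify/unlock
sequence. The names RefCount, ref_inc and ref_dec are hypothetical. On x86,
compilers usually emit LOCK XADD for atomic_fetch_add when the old value is
used, though that is up to the compiler.

```c
#include <stdatomic.h>

/* Hypothetical reference-count type for illustration. */
typedef struct {
	atomic_long ref;
} RefCount;

/* Atomically add one and return the new count. */
static long
ref_inc(RefCount *r)
{
	/* fetch_add returns the old value, so add 1 to get the new one. */
	return atomic_fetch_add(&r->ref, 1) + 1;
}

/* Atomically subtract one and return the new count. */
static long
ref_dec(RefCount *r)
{
	return atomic_fetch_sub(&r->ref, 1) - 1;
}
```

A caller would free the object when ref_dec() returns 0, the same way the
lock-based version is used.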

None of these are a huge deal; just thought they might be interesting.

Take care,
-- vs
