On 02/14/12 00:38, Alexander Motin wrote:
> I don't see much point in committing them sequentially, as they are quite
> orthogonal. I need to make one decision. I am going on a short vacation
> next week. It will give my thoughts time to settle. Maybe I will indeed
> just clean up the previous patch a bit and commit it when I get back. I've
> spent too much time trying to make these things formal, and so far the
> results are not bad, but also not as brilliant as I would like. Maybe
> it is indeed time to step back and try a simpler solution.
I've decided to stop those cache black magic practices and focus on
things that really exist in this world -- SMT and CPU load. I've dropped
most of the cache-related things from the patch and made the rest
stricter and more predictable:
http://people.freebsd.org/~mav/sched.htt34.patch
This patch adds a check to skip the fast previous-CPU selection if its SMT
neighbor is in use, rather than depending only on whether SMT is present,
as in the previous patches.
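
As a rough illustration of that check, here is a toy Python model (not the
patch itself). The names can_reuse_prev_cpu, load and smt_siblings are made
up for this sketch, and requiring the previous CPU itself to be idle is an
assumption:

```python
def can_reuse_prev_cpu(prev_cpu, load, smt_siblings):
    """Return True if the fast path may put the thread back on prev_cpu.

    load:         dict, CPU id -> number of runnable threads (hypothetical)
    smt_siblings: dict, CPU id -> list of CPU ids sharing the same core
    """
    # Assumption: the fast path only applies when the previous CPU is idle.
    if load[prev_cpu] != 0:
        return False
    # The added condition: every SMT neighbor of prev_cpu must be idle too.
    return all(load[s] == 0 for s in smt_siblings.get(prev_cpu, []))
```

With two cores of two threads each (CPUs 0/1 and 2/3) and only CPU 1 busy,
this model rejects CPU 0 and accepts CPU 2.
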
I've taken the affinity/preference algorithm from the first patch and
improved it. It makes pickcpu() prefer the previous core or its
neighbors in case of equal load. It is very simple, but should still
give cache hits.
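
The preference rule can be sketched as a two-part sort key, with load first
and affinity only as a tie-breaker. This is a toy Python model, not the
pickcpu() code; pick_cpu, the neighbors mapping and the 0/1/2 ranking are
assumptions made for illustration:

```python
def pick_cpu(candidates, load, prev_cpu, neighbors):
    """Pick the least loaded CPU; on equal load prefer prev_cpu, then its neighbors.

    neighbors: dict, CPU id -> CPUs considered close to it (hypothetical)
    """
    def key(cpu):
        if cpu == prev_cpu:
            pref = 0          # best: the CPU the thread ran on last
        elif cpu in neighbors.get(prev_cpu, ()):
            pref = 1          # next best: a neighbor of that CPU
        else:
            pref = 2          # no affinity at all
        # Load always dominates; preference only breaks ties.
        return (load[cpu], pref)

    return min(candidates, key=key)
```

Because load comes first in the key, a less loaded CPU always wins over the
previous one; affinity matters only among equally loaded candidates.
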
I've changed the general algorithm of topology tree processing. First I
look for an idle core on the same last-level cache as before, with
affinity to the previous core or its neighbors on higher-level caches.
The original code could put an additional thread on an already busy core
while the next socket was completely idle. Now, if there is no idle core
on this cache, all other CPUs are checked.
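
A minimal sketch of that two-step search, as a toy Python model working on
whole cores. The names find_core, core_load and llc_of are hypothetical,
and the real code walks the topology tree rather than flat dictionaries:

```python
def find_core(prev_core, core_load, llc_of):
    """Two-step search: idle core on the previous LLC first, then everything.

    core_load: dict, core id -> number of runnable threads (hypothetical)
    llc_of:    dict, core id -> id of the last-level cache it sits on
    """
    # Step 1: idle cores sharing the last-level cache with the previous core.
    idle_here = [c for c in core_load
                 if llc_of[c] == llc_of[prev_core] and core_load[c] == 0]
    if idle_here:
        # Keep affinity to the previous core when it is itself idle.
        return prev_core if prev_core in idle_here else idle_here[0]
    # Step 2: nothing idle on this cache, so consider every core and take
    # the least loaded one (ties go to the first core in dict order).
    return min(core_load, key=lambda c: core_load[c])
```

With cores 0 and 1 on one cache both busy and cores 2 and 3 on another
cache idle, a thread that last ran on core 0 lands on core 2 instead of
being stacked on a busy core.
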
CPU group comparison is now done in two steps: first, same as before,
the summary load of all cores is compared; but now, if it is equal, I
compare the load of the least/most loaded cores. That should make it
possible to tell whether a load of 2 really means 1+1 or 2+0. In that
case the group with 2+0 is treated as more loaded than the one with 1+1,
making the group choice better grounded and more predictable.
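
The 1+1 versus 2+0 case can be written as a lexicographic comparison. This
is a toy Python model; group_key and least_loaded_group are hypothetical
names, and using the most loaded core as the second key is an assumption
about the search for the least loaded group:

```python
def group_key(core_loads):
    """Comparison key for a CPU group: total load, then its busiest core."""
    return (sum(core_loads), max(core_loads))


def least_loaded_group(groups):
    """Return the name of the group with the smallest key.

    groups: dict, group name -> list of per-core loads (hypothetical)
    """
    return min(groups, key=lambda name: group_key(groups[name]))
```

Both [1, 1] and [2, 0] sum to 2, so the first step cannot tell them apart;
the second element of the key (1 versus 2) makes the 2+0 group count as
more loaded.
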
I've added randomization for the case when all of the above factors are
equal.
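
One way to picture the random tie-break is the toy Python model below. The
helper pick_with_random_tiebreak is hypothetical, and the patch may well
draw its randomness differently:

```python
import random


def pick_with_random_tiebreak(candidates, key, rng=random):
    """Pick a candidate with the smallest key; choose randomly among ties."""
    best = min(key(c) for c in candidates)
    tied = [c for c in candidates if key(c) == best]
    # Only candidates that are equal on every factor take part in the draw.
    return rng.choice(tied)
```

Here key stands for whatever combined comparison precedes the draw (load,
affinity and so on), so randomness never overrides a real difference.
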
As before, I've tested this on a Core i7-870 with 4 physical and 8 logical
cores and an Atom D525 with 2 physical and 4 logical cores. On the Core i7
I got a speedup of up to 10-15% in super-smack MySQL and PostgreSQL indexed
select for 2-8 threads and no penalty in other cases. pbzip2 shows up to a
13% performance increase for 2-5 threads and no penalty in other cases.
Tests on the Atom show mostly about the same performance as before in
database benchmarks: faster for 1 thread, slower for 2-3 and about the
same for other cases. Single-stream network performance improved the same
as with the first patch. That CPU is quite difficult to handle: with its
mix of effective SMT and lack of L3 cache, different scheduling approaches
give different results in different situations.
Specific performance numbers can be found here:
http://people.freebsd.org/~mav/bench.ods
Every point there includes at least 5 samples, and all are statistically
valid except the pbzip2 test, which is quite unstable with the previous
sources.
Florian is now running an alternative set of benchmarks on dual-socket
hardware without SMT.
--
Alexander Motin