On 07/06/2015 02:36 PM, Mike Galbraith wrote:
> On Mon, 2015-07-06 at 10:34 -0400, Josef Bacik wrote:
>> On 07/06/2015 01:13 AM, Mike Galbraith wrote:
>>> Hm.  Piddling with pgbench, which doesn't seem to collapse into a
>>> quivering heap when load exceeds cores these days, the deltas weren't
>>> all that impressive, but it does appreciate the extra effort a bit,
>>> and a bit more when the clients receive it as well.
>>>
>>> If you test, and have time to piddle, you could try letting wake_wide()
>>> return 1 + sched_feat(WAKE_WIDE_IDLE) instead of adding the extra only
>>> when the wakee is the dispatcher.
>>> Numbers from my little desktop box:
>>>
>>> NO_WAKE_WIDE_IDLE
>>> postgres@homer:~> pgbench.sh
>>> clients 8 tps = 116697.697662
>>> clients 12 tps = 115160.230523
>>> clients 16 tps = 115569.804548
>>> clients 20 tps = 117879.230514
>>> clients 24 tps = 118281.753040
>>> clients 28 tps = 116974.796627
>>> clients 32 tps = 119082.163998 avg 117092.239 1.000
>>>
>>> WAKE_WIDE_IDLE
>>> postgres@homer:~> pgbench.sh
>>> clients 8 tps = 124351.735754
>>> clients 12 tps = 124419.673135
>>> clients 16 tps = 125050.716498
>>> clients 20 tps = 124813.042352
>>> clients 24 tps = 126047.442307
>>> clients 28 tps = 125373.719401
>>> clients 32 tps = 126711.243383 avg 125252.510 1.069 1.000
>>>
>>> WAKE_WIDE_IDLE (clients as well as server)
>>> postgres@homer:~> pgbench.sh
>>> clients 8 tps = 130539.795246
>>> clients 12 tps = 128984.648554
>>> clients 16 tps = 130564.386447
>>> clients 20 tps = 129149.693118
>>> clients 24 tps = 130211.119780
>>> clients 28 tps = 130325.355433
>>> clients 32 tps = 129585.656963 avg 129908.665 1.109 1.037
> I had a typo in my script, so those desktop box numbers were all run
> with the same number of clients.  It doesn't invalidate anything, but
> the individual deltas are just run-to-run variance... not to mention
> that a single-cache box is not all that interesting for this anyway.
> Things only get interesting when the interconnect becomes a player.
>> I have time for twiddling.  We're carrying ye olde WAKE_IDLE until we
>> get this solved upstream; then I'll rip out the old and put in the new.
>> I'm happy to screw around until we're all happy.  I'll throw this into
>> a kernel this morning and run stuff today.  Barring any issues with the
>> testing infrastructure I should have results today.  Thanks,
> I'll be interested in your results.  Taking pgbench to a little NUMA
> box, I'm seeing _nada_ outside of variance with master (crap).  I have
> a way to win significantly for _older_ kernels, and that win over
> master _may_ provide some useful insight, but I don't trust
> postgres/pgbench as far as I can toss the planet, so I don't have a
> warm fuzzy about trying to use it to approximate your real-world load.
>
> BTW, what does your topology look like (numactl --hardware)?
So the NO_WAKE_WIDE_IDLE results are very good, almost the same as the
baseline, with a slight regression at lower RPS and a slight improvement
at high RPS.  I'm running with WAKE_WIDE_IDLE set now; that should be
done soonish, and then I'll do the 1 + sched_feat(WAKE_WIDE_IDLE) thing
next, so those results should come in the morning.  Here is the NUMA
information from one of the boxes in the test cluster:
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 20 21 22 23 24 25 26 27 28 29
node 0 size: 15890 MB
node 0 free: 2651 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19 30 31 32 33 34 35 36 37 38 39
node 1 size: 16125 MB
node 1 free: 2063 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
Thanks,
Josef