Re: [patch] CFS scheduler, -v12

Ingo Molnar Mon, 21 May 2007 01:58:18 -0700

* William Lee Irwin III <[EMAIL PROTECTED]> wrote:

> cfs should probably consider aggregate lag as opposed to aggregate 
> weighted load. Mainline's convergence to proper CPU bandwidth 
> distributions on SMP (e.g. N+1 tasks of equal nice on N cpus) is 
> incredibly slow and probably also fragile in the presence of arrivals 
> and departures partly because of this. [...]


hm, have you actually tested CFS before coming to this conclusion?

CFS is fair even on SMP. Consider for example the worst-case 
3-tasks-on-2-CPUs workload on a 2-CPU box:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2658 mingo     20   0  1580  248  200 R   67  0.0   0:56.30 loop
 2656 mingo     20   0  1580  252  200 R   66  0.0   0:55.55 loop
 2657 mingo     20   0  1576  248  200 R   66  0.0   0:55.24 loop

66% of CPU time for each task. The 'TIME+' column shows a 2% spread 
between the slowest and the fastest loop after just 1 minute of runtime 
(and the spread gets narrower with time). Mainline does a 50% / 50% / 
100% split:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3121 mingo     25   0  1584  252  204 R  100  0.0   0:13.11 loop
 3120 mingo     25   0  1584  256  204 R   50  0.0   0:06.68 loop
 3119 mingo     25   0  1584  252  204 R   50  0.0   0:06.64 loop

and i fixed that in CFS.

or consider a sleepy workload like massive_intr, 3-tasks-on-2-CPUs:

  europe:~> head -1 /proc/interrupts
             CPU0       CPU1

  europe:~> ./massive_intr 3 10
  002623  00000722
  002621  00000720
  002622  00000721

Or a 5-tasks-on-2-CPS workload:

  europe:~> ./massive_intr 5 50
  002649  00002519
  002653  00002492
  002651  00002478
  002652  00002510
  002650  00002478

that's around 1% of spread.

load-balancing is a performance vs. fairness tradeoff so we wont be able 
to make it precisely fair because that's hideously expensive on SMP 
(barring someone showing a working patch of course) - but in CFS i got 
quite close to having it very fair in practice.

> [...] Tong Li's DWRR repairs the deficit in mainline by synchronizing 
> epochs or otherwise bounding epoch dispersion. This doesn't directly 
> translate to cfs. In cfs cpu should probably try to figure out if its 
> aggregate lag (e.g. via minimax) is above or below average, and push 
> to or pull from the other half accordingly.

i'd first like to see a demonstration of a problem to solve, before 
thinking about more complex solutions ;-)

        Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] CFS scheduler, -v12

Reply via email to