* William Lee Irwin III <[EMAIL PROTECTED]> wrote: > cfs should probably consider aggregate lag as opposed to aggregate > weighted load. Mainline's convergence to proper CPU bandwidth > distributions on SMP (e.g. N+1 tasks of equal nice on N cpus) is > incredibly slow and probably also fragile in the presence of arrivals > and departures partly because of this. [...]
hm, have you actually tested CFS before coming to this conclusion? CFS is fair even on SMP. Consider for example the worst-case 3-tasks-on-2-CPUs workload on a 2-CPU box: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2658 mingo 20 0 1580 248 200 R 67 0.0 0:56.30 loop 2656 mingo 20 0 1580 252 200 R 66 0.0 0:55.55 loop 2657 mingo 20 0 1576 248 200 R 66 0.0 0:55.24 loop 66% of CPU time for each task. The 'TIME+' column shows a 2% spread between the slowest and the fastest loop after just 1 minute of runtime (and the spread gets narrower with time). Mainline does a 50% / 50% / 100% split: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3121 mingo 25 0 1584 252 204 R 100 0.0 0:13.11 loop 3120 mingo 25 0 1584 256 204 R 50 0.0 0:06.68 loop 3119 mingo 25 0 1584 252 204 R 50 0.0 0:06.64 loop and i fixed that in CFS. or consider a sleepy workload like massive_intr, 3-tasks-on-2-CPUs: europe:~> head -1 /proc/interrupts CPU0 CPU1 europe:~> ./massive_intr 3 10 002623 00000722 002621 00000720 002622 00000721 Or a 5-tasks-on-2-CPS workload: europe:~> ./massive_intr 5 50 002649 00002519 002653 00002492 002651 00002478 002652 00002510 002650 00002478 that's around 1% of spread. load-balancing is a performance vs. fairness tradeoff so we wont be able to make it precisely fair because that's hideously expensive on SMP (barring someone showing a working patch of course) - but in CFS i got quite close to having it very fair in practice. > [...] Tong Li's DWRR repairs the deficit in mainline by synchronizing > epochs or otherwise bounding epoch dispersion. This doesn't directly > translate to cfs. In cfs cpu should probably try to figure out if its > aggregate lag (e.g. via minimax) is above or below average, and push > to or pull from the other half accordingly. i'd first like to see a demonstration of a problem to solve, before thinking about more complex solutions ;-) Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/