* Ingo Molnar <[EMAIL PROTECTED]> wrote: > * Roman Zippel <[EMAIL PROTECTED]> wrote: > > > Well, I've sent him the stuff now... > > received it - thanks alot, looking at it!
everything looks good in your debug output and the TSC dump data, except for the wait_runtime values, they are quite out of balance - and that balance cannot be explained with jiffies granularity or with any sort of sched_clock() artifact. So this clearly looks like a CFS regression that should be fixed. the only relevant thing that comes to mind at the moment is that last week Peter noticed a buggy aspect of sleeper bonuses (in that we do not rate-limit their output, hence we 'waste' them instead of redistributing them), and i've got the small patch below in my queue to fix that - could you give it a try? this is just a blind stab into the dark - i couldnt see any real impact from that patch in various workloads (and it's not upstream yet), so it might not make a big difference. The trace you did (could you send the source for that?) seems to implicate sleeper bonuses though. if this patch doesnt help, could you check the general theory whether it's related to sleeper-fairness, via turning it off: echo 30 > /proc/sys/kernel/sched_features does the bug go away if you do that? If sleeper bonuses are showing too many artifacts then we could turn it off for final .23. Ingo ---------------------> Subject: sched: fix sleeper bonus From: Ingo Molnar <[EMAIL PROTECTED]> Peter Ziljstra noticed that the sleeper bonus deduction code was not properly rate-limited: a task that scheduled more frequently would get a disproportionately large deduction. So limit the deduction to delta_exec and limit production to runtime_limit. Not-Yet-Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- kernel/sched_fair.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) Index: linux/kernel/sched_fair.c =================================================================== --- linux.orig/kernel/sched_fair.c +++ linux/kernel/sched_fair.c @@ -75,7 +75,7 @@ enum { unsigned int sysctl_sched_features __read_mostly = SCHED_FEAT_FAIR_SLEEPERS *1 | - SCHED_FEAT_SLEEPER_AVG *1 | + SCHED_FEAT_SLEEPER_AVG *0 | SCHED_FEAT_SLEEPER_LOAD_AVG *1 | SCHED_FEAT_PRECISE_CPU_LOAD *1 | SCHED_FEAT_START_DEBIT *1 | @@ -304,11 +304,9 @@ __update_curr(struct cfs_rq *cfs_rq, str delta_mine = calc_delta_mine(delta_exec, curr->load.weight, lw); if (cfs_rq->sleeper_bonus > sysctl_sched_granularity) { - delta = calc_delta_mine(cfs_rq->sleeper_bonus, - curr->load.weight, lw); - if (unlikely(delta > cfs_rq->sleeper_bonus)) - delta = cfs_rq->sleeper_bonus; - + delta = min(cfs_rq->sleeper_bonus, (u64)delta_exec); + delta = calc_delta_mine(delta, curr->load.weight, lw); + delta = min((u64)delta, cfs_rq->sleeper_bonus); cfs_rq->sleeper_bonus -= delta; delta_mine -= delta; } @@ -521,6 +519,8 @@ static void __enqueue_sleeper(struct cfs * Track the amount of bonus we've given to sleepers: */ cfs_rq->sleeper_bonus += delta_fair; + if (unlikely(cfs_rq->sleeper_bonus > sysctl_sched_runtime_limit)) + cfs_rq->sleeper_bonus = sysctl_sched_runtime_limit; schedstat_add(cfs_rq, wait_runtime, se->wait_runtime); } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/