On Wed, 14 May 2014 06:08:09 +0200 Mike Galbraith <umgwanakikb...@gmail.com> wrote:

> On Tue, 2014-05-13 at 10:08 -0400, Rik van Riel wrote:
> > 1) If the node_distance between nodes on a NUMA system is
> >    <= RECLAIM_DISTANCE, we will call select_idle_sibling for
> >    a wakeup of a previously existing task (SD_BALANCE_WAKE)
> >
> > 2) If the node distance exceeds RECLAIM_DISTANCE, we will
> >    wake up a task on prev_cpu, even if it is not currently
> >    idle
> >
> >    This behaviour only happens on certain large NUMA systems,
> >    and is different from the behaviour on small systems.
> >    I suspect we will want to call select_idle_sibling with
> >    prev_cpu in case target and prev_cpu are not in the same
> >    SD_WAKE_AFFINE domain.
>
> Sometimes.  It's the same can of worms remote as it is local.. latency
> gain may or may not outweigh cache miss pain.

Ahh, but it is a DIFFERENT can of worms.

If the distance between cpu and prev_cpu exceeds RECLAIM_DISTANCE,
we will not look for an idle sibling in the same LLC domain as
prev_cpu.

If the distance is smaller, and we decide not to do an affine wakeup,
then we do look for an idle sibling of prev_cpu.

This patch makes sure that both types of systems have the same can
of worms :)

---8<---

Subject: sched: call select_idle_sibling when not affine_sd

On smaller systems, the top level sched domain will be an affine
domain, and select_idle_sibling is invoked for every SD_WAKE_AFFINE
wakeup. This seems to be working well.

On larger systems, with the node distance between far away NUMA nodes
being > RECLAIM_DISTANCE, select_idle_sibling is only called if the
waker and the wakee are on nodes less than RECLAIM_DISTANCE apart.

This patch leaves in place the policy of not pulling the task across
nodes on such systems, while fixing the issue that select_idle_sibling
is not called at all in certain circumstances.

The code will look for an idle CPU in the same CPU package as the
CPU where the task ran previously.

Signed-off-by: Rik van Riel <r...@redhat.com>
---
 kernel/sched/fair.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 39b63d0..1e58159 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4423,10 +4423,10 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 			sd = tmp;
 	}
 
-	if (affine_sd) {
-		if (cpu != prev_cpu && wake_affine(affine_sd, p, sync))
-			prev_cpu = cpu;
+	if (affine_sd && cpu != prev_cpu && wake_affine(affine_sd, p, sync))
+		prev_cpu = cpu;
 
+	if (sd_flag & SD_WAKE_AFFINE) {
 		new_cpu = select_idle_sibling(p, prev_cpu);
 		goto unlock;
 	}
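
For readers who want to see the control-flow change in isolation, here is
a rough user-space model of the two code paths. It is only a sketch and
not kernel code: every toy_* name, NR_TOY_CPUS, LLC_SIZE and the 8-CPU,
two-LLC layout are made up for illustration. toy_select_idle_sibling() is
a crude stand-in for the real select_idle_sibling(), and the "stay on
prev_cpu" fallback is a simplification of what the description above says
happens when no affine domain is found.

```c
/* Toy user-space model of the wakeup fast path. NOT kernel code.
 * All toy_* names, NR_TOY_CPUS and LLC_SIZE are hypothetical. */
#include <assert.h>
#include <stdbool.h>

#define NR_TOY_CPUS 8
#define LLC_SIZE    4	/* assumed layout: CPUs 0-3 share one LLC, 4-7 another */

/* Stand-in for select_idle_sibling(): keep target if it is idle, else
 * take the first idle CPU in target's LLC, else fall back to target. */
static int toy_select_idle_sibling(const bool idle[NR_TOY_CPUS], int target)
{
	int first, i;

	assert(target >= 0 && target < NR_TOY_CPUS);
	if (idle[target])
		return target;

	first = (target / LLC_SIZE) * LLC_SIZE;
	for (i = first; i < first + LLC_SIZE; i++)
		if (idle[i])
			return i;

	return target;
}

struct toy_wakeup {
	int cpu;		/* CPU the waker is running on */
	int prev_cpu;		/* CPU where the wakee last ran */
	bool have_affine_sd;	/* an SD_WAKE_AFFINE domain spans both CPUs */
	bool wake_affine_ok;	/* stand-in for wake_affine() returning true */
	bool flag_test;		/* stand-in for the sd_flag test in the patch */
};

/* Before the patch: the idle-sibling search is nested under affine_sd,
 * so it is skipped entirely when no affine domain was found. */
static int toy_select_before(const struct toy_wakeup *w,
			     const bool idle[NR_TOY_CPUS])
{
	int prev_cpu = w->prev_cpu;

	if (w->have_affine_sd) {
		if (w->cpu != prev_cpu && w->wake_affine_ok)
			prev_cpu = w->cpu;
		return toy_select_idle_sibling(idle, prev_cpu);
	}

	/* Simplification: the task stays on prev_cpu, idle or not. */
	return prev_cpu;
}

/* After the patch: the affine decision only picks the target; the
 * idle-sibling search no longer depends on affine_sd being set. */
static int toy_select_after(const struct toy_wakeup *w,
			    const bool idle[NR_TOY_CPUS])
{
	int prev_cpu = w->prev_cpu;

	if (w->have_affine_sd && w->cpu != prev_cpu && w->wake_affine_ok)
		prev_cpu = w->cpu;

	if (w->flag_test)
		return toy_select_idle_sibling(idle, prev_cpu);

	return prev_cpu;
}
```

With only CPU 6 idle, a waker on CPU 0 and a wakee that last ran on CPU 5,
the "before" model leaves the task on busy CPU 5 when there is no affine
domain, while the "after" model moves it to CPU 6, which shares an LLC
with CPU 5 in this toy layout. When an affine domain exists, both models
pick the same CPU.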