In task_numa_migrate(), env.dst_nid points to either a preferred node or a node that has free capacity and has more task weight than the current node.

Currently in such a scenario, there are checks to see if tasks in the
numa_group have previously run on the node that has free capacity before
updating the preferred node. Commit c1ceac62 ("sched/numa: Reduce
conflict between fbq_classify_rq() and migration") gives preference to
the preferred node while load balancing. Hence, if the preferred node is
not set after this evaluation, the task might miss the opportunity to
move to the preferred node later, at load balancing time.

In such a scenario, it makes sense to unconditionally set env.dst_nid as
the preferred node, unless that node is already the preferred node.

While here, update env.dst_nid only when both the task and its group
benefit. This matches the comment in the code.

Signed-off-by: Srikar Dronamraju <sri...@linux.vnet.ibm.com>
---
 kernel/sched/fair.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7b23efa..d1aa374 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1503,7 +1503,7 @@ static int task_numa_migrate(struct task_struct *p)
 			/* Only consider nodes where both task and groups benefit */
 			taskimp = task_weight(p, nid, dist) - taskweight;
 			groupimp = group_weight(p, nid, dist) - groupweight;
-			if (taskimp < 0 && groupimp < 0)
+			if (taskimp < 0 || groupimp < 0)
 				continue;
 
 			env.dist = dist;
@@ -1519,16 +1519,9 @@ static int task_numa_migrate(struct task_struct *p)
 	 * and is migrating into one of the workload's active nodes, remember
 	 * this node as the task's preferred numa node, so the workload can
 	 * settle down.
-	 * A task that migrated to a second choice node will be better off
-	 * trying for a better one later. Do not set the preferred node here.
 	 */
 	if (p->numa_group) {
-		if (env.best_cpu == -1)
-			nid = env.src_nid;
-		else
-			nid = env.dst_nid;
-
-		if (node_isset(nid, p->numa_group->active_nodes))
+		if (env.dst_nid != p->numa_preferred_nid)
 			sched_setnuma(p, env.dst_nid);
 	}
 
-- 
1.8.3.1
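
For illustration only, and not part of the patch: a minimal user-space
sketch of the two behavioural changes described above. It assumes plain
integers as stand-ins for the kernel state. The helpers skip_node_old(),
skip_node_new() and should_set_preferred() are hypothetical names and do
not exist in kernel/sched/fair.c; they only mirror the conditions in the
diff.

```c
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the old filter: a candidate node is
 * skipped only when BOTH the task and the group would lose weight.
 */
static bool skip_node_old(long taskimp, long groupimp)
{
	return taskimp < 0 && groupimp < 0;
}

/*
 * Hypothetical helper mirroring the new filter: a candidate node is
 * skipped when EITHER the task or the group would lose weight, so only
 * nodes where both benefit (or break even) are considered.
 */
static bool skip_node_new(long taskimp, long groupimp)
{
	return taskimp < 0 || groupimp < 0;
}

/*
 * Hypothetical helper mirroring the new preferred-node update: the
 * destination node is recorded as preferred unless it already is.
 * In the patch this check only runs for tasks in a numa_group.
 */
static bool should_set_preferred(int dst_nid, int preferred_nid)
{
	return dst_nid != preferred_nid;
}
```

With taskimp = 5 and groupimp = -3, skip_node_old() returns false (the
node is still considered) while skip_node_new() returns true (the node
is skipped), which is the case the one-line change to the filter
addresses.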