On Tue, 2014-05-13 at 10:08 -0400, Rik van Riel wrote:

> OK, after doing some other NUMA stuff, and then looking at the scheduler
> again with a fresh mind, I have drawn some more conclusions about what
> the scheduler does, and how it breaks NUMA locality :)
> 
> 1) If the node_distance between nodes on a NUMA system is
>    <= RECLAIM_DISTANCE, we will call select_idle_sibling for
>    a wakeup of a previously existing task (SD_BALANCE_WAKE)
> 
> 2) If the node distance exceeds RECLAIM_DISTANCE, we will
>    wake up a task on prev_cpu, even if it is not currently
>    idle
> 
>    This behaviour only happens on certain large NUMA systems,
>    and is different from the behaviour on small systems.
>    I suspect we will want to call select_idle_sibling with
>    prev_cpu in case target and prev_cpu are not in the same
>    SD_WAKE_AFFINE domain.

Sometimes.  It's the same can of worms remote as it is local: the
latency gain may or may not outweigh the cache-miss pain.
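
To make the distance rule in points 1 and 2 concrete, here is a toy
userspace model (explicitly not the kernel code: wants_affine_wakeup()
and the distance table are made-up names for illustration, and 30 for
RECLAIM_DISTANCE is the usual default but an assumption here):

	/* Toy model (not kernel code): whether a wakeup between two nodes
	 * is allowed to take the affine/select_idle_sibling path, per the
	 * node-distance rule described in points 1 and 2 above. */
	#include <stdio.h>
	#include <stdbool.h>

	#define LOCAL_DISTANCE   10   /* usual SLIT value for "same node"  */
	#define RECLAIM_DISTANCE 30   /* common default; assumed here      */

	/* Hypothetical SLIT-style table for a 4-node machine where nodes
	 * {0,1} and {2,3} are close, but the two pairs are far apart. */
	static const int numa_distance[4][4] = {
		{ 10, 20, 40, 40 },
		{ 20, 10, 40, 40 },
		{ 40, 40, 10, 20 },
		{ 40, 40, 20, 10 },
	};

	/* The affine wakeup path is only considered when the inter-node
	 * distance is within RECLAIM_DISTANCE; otherwise the task is woken
	 * on prev_cpu even if that CPU is currently busy. */
	static bool wants_affine_wakeup(int waker_node, int prev_node)
	{
		return numa_distance[waker_node][prev_node] <= RECLAIM_DISTANCE;
	}

	int main(void)
	{
		printf("node 0 -> node 1: affine path? %d\n", wants_affine_wakeup(0, 1));
		printf("node 0 -> node 3: affine path? %d\n", wants_affine_wakeup(0, 3));
		return 0;
	}

The only point of the sketch is that the distance comparison decides
whether the affine path is considered at all; on the far pair of nodes
the wakeup falls back to prev_cpu regardless of idleness.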

> 3) If wake_wide is false, we call select_idle_sibling with
>    the CPU number of the waker (the CPU running the code that is
>    waking up the task)
> 
> 4) If wake_wide is true, we call select_idle_sibling with
>    the CPU number the task was previously running on (prev_cpu)
> 
>    In effect, the "wake task on waking task's CPU" behaviour
>    is the default, regardless of how frequently a task wakes up
>    its wakee, and regardless of impact on NUMA locality.
> 
>    This may need to be changed.

That behavior also improves the odds of communicating tasks sharing a
cache, though.
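
A similarly minimal sketch of points 3 and 4, i.e. which CPU ends up
being handed to select_idle_sibling(); struct wakeup and
select_idle_sibling_target() are illustrative stand-ins, not the real
select_task_rq_fair() logic:

	/* Toy model (not kernel code) of points 3 and 4: which CPU gets
	 * passed to select_idle_sibling() on a wakeup. */
	#include <stdio.h>
	#include <stdbool.h>

	struct wakeup {
		int  waker_cpu;   /* CPU the waking task is running on */
		int  prev_cpu;    /* CPU the wakee last ran on         */
		bool wake_wide;   /* wakee fans out to many partners?  */
	};

	/* wake_wide == false: search around the waker's CPU (the default),
	 * regardless of where prev_cpu sits in the NUMA topology.
	 * wake_wide == true:  search around prev_cpu instead. */
	static int select_idle_sibling_target(const struct wakeup *w)
	{
		return w->wake_wide ? w->prev_cpu : w->waker_cpu;
	}

	int main(void)
	{
		struct wakeup w = { .waker_cpu = 2, .prev_cpu = 17, .wake_wide = false };

		printf("target = %d\n", select_idle_sibling_target(&w));  /* 2  */
		w.wake_wide = true;
		printf("target = %d\n", select_idle_sibling_target(&w));  /* 17 */
		return 0;
	}

With wake_wide false (the common case) the search is centered on the
waker's CPU, which is what gives communicating tasks a shot at sharing
a cache, but can also drag the wakee away from its NUMA node.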

> Am I overlooking anything?

No, I think you're seeing where the worms live. 

> What benchmarks should I run to test any changes I make?

Mixed bag; it'll affect them all: bursty, static, ramp up/down.

-Mike


