On Tue, 2014-05-13 at 10:08 -0400, Rik van Riel wrote:
> OK, after doing some other NUMA stuff, and then looking at the scheduler
> again with a fresh mind, I have drawn some more conclusions about what
> the scheduler does, and how it breaks NUMA locality :)
>
> 1) If the node_distance between nodes on a NUMA system is
>    <= RECLAIM_DISTANCE, we will call select_idle_sibling for
>    a wakeup of a previously existing task (SD_BALANCE_WAKE)
>
> 2) If the node distance exceeds RECLAIM_DISTANCE, we will
>    wake up a task on prev_cpu, even if it is not currently
>    idle
>
> This behaviour only happens on certain large NUMA systems,
> and is different from the behaviour on small systems.
> I suspect we will want to call select_idle_sibling with
> prev_cpu in case target and prev_cpu are not in the same
> SD_WAKE_AFFINE domain.
Sometimes.  It's the same can of worms remote as it is local: the latency
gain may or may not outweigh the cache-miss pain.

> 3) If wake_wide is false, we call select_idle_sibling with
>    the CPU number of the core that is waking up the task
>
> 4) If wake_wide is true, we call select_idle_sibling with
>    the CPU number the task was previously running on (prev_cpu)
>
> In effect, the "wake task on waking task's CPU" behaviour
> is the default, regardless of how frequently a task wakes up
> its wakee, and regardless of impact on NUMA locality.
>
> This may need to be changed.

That behavior also improves the odds of communicating tasks sharing a
cache, though.

> Am I overlooking anything?

No, I think you're seeing where the worms live.

> What benchmarks should I run to test any changes I make?

Mixed bag; it'll affect all of them: bursty, static, ramp up/down.

	-Mike
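
P.S. In case it helps anyone reading along, below is a rough userspace
sketch of the wakeup-target decision path described in points 1-4 above.
It is only a toy model, not the actual select_task_rq_fair() /
select_idle_sibling() code: the topology, node distances and the
wake_wide() threshold are made-up stand-ins, and only the overall
if/else shape is meant to mirror the description in the quoted text.

/*
 * Toy userspace model of the wakeup-target decision in points 1-4.
 * Nothing here is real kernel code; helpers below are stand-ins.
 */
#include <stdbool.h>
#include <stdio.h>

#define RECLAIM_DISTANCE 30	/* common kernel default; arches may override */

struct wake_decision {
	int  target_cpu;	/* CPU we start from */
	bool search_idle;	/* do a select_idle_sibling()-style search? */
};

/* stand-in: NUMA distance between the nodes of two CPUs (8 CPUs/node) */
static int cpu_node_distance(int a, int b)
{
	return (a / 8 == b / 8) ? 10 : 40;
}

/* stand-in for the wake_wide() heuristic, with a made-up threshold */
static bool wake_wide(int wakee_flips)
{
	return wakee_flips > 8;
}

static struct wake_decision pick_wakeup_target(int waking_cpu, int prev_cpu,
					       int wakee_flips)
{
	struct wake_decision d;

	/* (2) nodes too far apart: go back to prev_cpu, idle or not */
	if (cpu_node_distance(waking_cpu, prev_cpu) > RECLAIM_DISTANCE) {
		d.target_cpu = prev_cpu;
		d.search_idle = false;
		return d;
	}

	/* (1) close enough: we will search for an idle sibling ... */
	d.search_idle = true;
	/* (3)/(4) ... starting from the waker's CPU unless wake_wide says no */
	d.target_cpu = wake_wide(wakee_flips) ? prev_cpu : waking_cpu;
	return d;
}

int main(void)
{
	struct wake_decision d;

	d = pick_wakeup_target(0, 3, 1);
	printf("near node, not wide: cpu %d, idle search %d\n", d.target_cpu, d.search_idle);

	d = pick_wakeup_target(0, 3, 42);
	printf("near node, wide:     cpu %d, idle search %d\n", d.target_cpu, d.search_idle);

	d = pick_wakeup_target(0, 17, 1);
	printf("remote node:         cpu %d, idle search %d\n", d.target_cpu, d.search_idle);

	return 0;
}

It compiles standalone with gcc; the three printf lines in main() just
exercise the near/not-wide, near/wide and remote-node cases.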