On Fri, 2013-02-22 at 10:36 +0800, Michael Wang wrote:
> On 02/21/2013 05:43 PM, Mike Galbraith wrote:
> > On Thu, 2013-02-21 at 17:08 +0800, Michael Wang wrote:
> > 
> >> But does this patch set really cause a regression on your Q6600? It
> >> may sacrifice some things, but I still think it will benefit far
> >> more, especially on huge systems.
> > 
> > We spread on FORK/EXEC, and will no longer pull communicating tasks
> > back to a shared cache, with the new logic preferring to leave the
> > wakee remote, so while no, I haven't tested (will try to find a round
> > tuit), it seems it _must_ hurt.  Dragging data from one llc to the
> > other on Q6600 hurts a LOT.  Every time a client and server are cross
> > llc, it's a huge hit.  The previous logic pulled communicating tasks
> > together right when it matters the most: intermittent load... or
> > interactive use.
> 
> I agree that this is a problem that needs to be solved, but I don't
> agree that wake_affine() is the solution.
It's not perfect, but it's better than no countering force at all.  It's
a relic of the dark ages, when affine meant L2, ie this cpu.  Nowadays,
affine has a whole new meaning, L3, so it could be done differently, but
_some_ kind of opposing force is required.

> According to my understanding, in the old world, wake_affine() will
> only be used if curr_cpu and prev_cpu share cache, which means they are
> in one package; whether we search the llc sd of curr_cpu or of
> prev_cpu, we won't have the chance to spread the task out of that
> package.

?  affine_sd is the first domain spanning both cpus; that may be NODE.
True, we won't ever spread in the wakeup path, unless SD_WAKE_BALANCE is
set that is.  Would be nice to be able to do that without shredding
performance.

Off the top of my pointy head, I can think of a way to _maybe_ improve
the "affine" wakeup criteria: add a small (package size? and very fast)
FIFO queue to the task struct, recording waker/wakee relationships.  If
the relationship exists in that queue (rbtree?), try to wake local; if
not, wake remote.

The thought is to identify situations ala 1:N pgbench where you really
need to keep the load spread.  That need arises when the sum of wakees +
waker won't fit in one cache.  True buddies would always hit (hm, hit
rate), and would always try to become affine, where they thrive.  1:N
stuff starts missing when the client count exceeds the package size, and
so starts expanding its horizons.

'Course you would still need to NAK if imbalanced too badly, and let the
NUMA stuff NAK touching lard-balls and whatnot.  With a little more
smarts, we could have happy 1:N, and buddies wouldn't have to chat
through 2m thick walls, letting 1:N scale as well as it can before it
dies of stupidity.

	-Mike
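To make the FIFO idea above concrete, here is a rough userspace sketch
of one possible shape: a small per-waker FIFO remembering recent wakees,
consulted at wakeup time.  All names (buddy_fifo, buddy_record,
buddy_hit, want_affine, BUDDY_FIFO_SIZE) are invented for illustration;
nothing like this exists in the kernel tree, and the real thing would of
course live in task_struct and be lockless.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for "package size": how many buddies one waker can track. */
#define BUDDY_FIFO_SIZE 4

struct buddy_fifo {
	int pid[BUDDY_FIFO_SIZE];  /* recently woken pids */
	int head;                  /* next slot to overwrite */
};

static void buddy_init(struct buddy_fifo *f)
{
	for (int i = 0; i < BUDDY_FIFO_SIZE; i++)
		f->pid[i] = -1;    /* -1: empty slot, matches no pid */
	f->head = 0;
}

/* Remember that this task just woke @wakee_pid, evicting the oldest entry. */
static void buddy_record(struct buddy_fifo *f, int wakee_pid)
{
	f->pid[f->head] = wakee_pid;
	f->head = (f->head + 1) % BUDDY_FIFO_SIZE;
}

/* Is @wakee_pid a recent wakee of this task? */
static bool buddy_hit(const struct buddy_fifo *f, int wakee_pid)
{
	for (int i = 0; i < BUDDY_FIFO_SIZE; i++)
		if (f->pid[i] == wakee_pid)
			return true;
	return false;
}

/*
 * Wakeup policy sketch: a hit means an established buddy, so try an
 * affine (local) wakeup; a miss means wake remote and leave the load
 * spread.  Recording on every wakeup is what makes 1:N loads start
 * missing once N exceeds the FIFO size.
 */
static bool want_affine(struct buddy_fifo *f, int wakee_pid)
{
	bool hit = buddy_hit(f, wakee_pid);
	buddy_record(f, wakee_pid);
	return hit;
}
```

The FIFO eviction is doing the interesting work: a 1:1 pair always hits
after the first wakeup and stays affine, while a 1:N waker whose N
exceeds the FIFO size evicts each wakee before rewaking it, so every
wakeup misses and the load stays spread.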