Hi Mike,

On Sat, Jul 29, 2017 at 1:19 PM, Joel Fernandes <joe...@google.com> wrote:
<snip>
>
>>> To explain the second condition above, Michael Wang said the following in
>>> [1]:
>>>
>>> "Furthermore, if waker also has a high 'nr_wakee_switch', imply that
>>> multiple tasks rely on it, then waker's higher latency will damage all
>>> of them, pull wakee seems to be a bad deal."
>>
>> Yes, "Furthermore". To detect 1:N, Michael chose llc_size as his N. Is
>> the one flipping partners at least N/s, and the other about N times as
>> often? If so, the two may be part of a too big to wisely pull 1:N.
>>
>> If you have a better idea, by all means, pull it out. Nobody is
>
> Sure yeah, first I'm trying to understand the heuristic itself, which
> I'm glad to be making progress with thanks to yours and others' help!
>
>> attached to wake_wide(), in fact, I suspect Peter hates it. I'm not
>> fond of it either, it having obvious holes. The only thing it has
>> going for it is simplicity. Bend it up, replace it, fire away.
>>
>
> Ok, it makes much more sense to me now. Also, for the N:N case,
> wouldn't excessive wake-affine increase latency, so that spreading
> might be better? Say slave and master flips are much greater than
> factor (llc_size); then "slave > factor && master < slave * factor"
> would probably be true a lot (and we would return 0, causing an
> affine wakeup). That's probably a bad thing, right, as it could
> overload the waker's CPU quickly? I guess the heuristic tries to
> maximize cache hits more than reduce latency?
>
>>> Again I didn't follow why the second condition couldn't just be:
>>> waker->nr_wakee_switch > factor, or (waker->nr_wakee_switch +
>>> wakee->nr_wakee_switch) > factor, based on the above explanation from
>>> Michael Wang that I quoted, and why he's instead doing the whole
>>> multiplication thing there that I was talking about earlier:
>>> "factor * wakee->nr_wakee_switch".
>>>
>>> Rephrasing my question another way: why are we taking the ratio of
>>> master/slave instead of the sum when comparing whether it's > factor?
>>> I am surely missing something here.
>>
>> Because the heuristic tries to not demolish 1:1 buddies. Big partner
>> flip delta means the pair are unlikely to be a communicating pair,
>> perhaps at high frequency where misses hurt like hell.
>
> But it does seem to me to demolish the N:N communicating pairs from a
> latency/load-balancing standpoint. For the case of N readers and N
> writers, the ratio (master/slave) comes down to 1:1 and we wake
> affine. Hopefully I didn't miss something too obvious about that.
I think wake_affine() should correctly handle the overloading case I bring up here, where wake_wide() is too conservative and does affine wakeups a lot. (I don't have any data for this; it's just from code reading.) So I take this comment back.

thanks,

-Joel