On 13/07/20 14:43, Peter Zijlstra wrote: > On Mon, Jul 13, 2020 at 02:28:29PM +0100, Valentin Schneider wrote: >> >> On 13/07/20 13:55, Peter Zijlstra wrote: >> > On Wed, Jul 01, 2020 at 08:06:55PM +0100, Valentin Schneider wrote: >> >> Leverage SD_DEGENERATE_GROUPS_MASK in sd_degenerate() and >> >> sd_degenerate_parent(). >> >> >> >> Note that this changes sd_degenerate() somewhat: I'm using the negation of >> >> SD_DEGENERATE_GROUPS_MASK as the mask of flags not requiring groups, which >> >> is equivalent to: >> >> >> >> SD_WAKE_AFFINE | SD_SERIALIZE | SD_NUMA >> >> >> >> whereas the current mask for that is simply >> >> >> >> SD_WAKE_AFFINE >> >> >> >> I played with a few toy NUMA topologies on QEMU and couldn't cause a >> >> different degeneration than what mainline does currently. If that is >> >> deemed >> >> too risky, we can go back to using SD_WAKE_AFFINE explicitly. >> > >> > Arguably SD_SERIALIZE needs groups, note how we're only having that >> > effective for machines with at least 2 nodes. It's a bit shit how we end >> > up there, but IIRC that's what it ends up as. >> > >> >> Right, AFAICT we get SD_SERIALIZE wherever we have SD_NUMA, which is any >> level above NODE. > > Oh, right, I forgot we have NODE, d'0h. But in that case these lines: > > if (nr_node_ids == 1) > pflags &= ~SD_SERIALIZE; > > are dead code, right? >
Uuuuh right, that does sound like something made obsolete by having NODE; we didn't have it back then: 5436499e6098 ("sched: fix sd_parent_degenerate on non-numa smp machine") I'll fold that in (and test it, just to be sure :-)). >> > SD_NUMA is descriptive, and not marking a group as degenerates because >> > it has SD_NUMA seems a bit silly. >> >> It does, although we can still degenerate it, see below. >> >> > But then, it would be the top domain >> > and would survive anyway? >> >> So from what I've tested we still get rid of those via >> sd_parent_degenerate(): child and parent have the same flags and same span, >> so parent goes out. >> >> That happens in the middle of the NUMA topology levels on that borked >> topology with weird distances, aka >> >> node distances: >> node 0 1 2 3 >> 0: 10 12 20 22 >> 1: 12 10 22 24 >> 2: 20 22 10 12 >> 3: 22 24 12 10 >> >> which ought to look something like (+local distance to end result) >> >> 2 10 2 >> 1 <---> 0 <---> 2 <---> 3 >> >> We end up with the following NUMA levels (i.e. deduplicated distances) >> NUMA (<= 12) >> NUMA (<= 20) >> NUMA (<= 22) >> NUMA (<= 24) >> >> For e.g. any CPU of node1, NUMA(<=20) is gonna have the same span as >> NUMA(<=12), so we'll degenerate it. > > Man, that's horrible :-) It is :( > OK, fair enough, keep it as is, we'll see what > if anything breaks.