On Fri, Sep 14, 2018 at 06:25:44PM +0200, Jan H. Schönherr wrote:
> On 09/14/2018 01:12 PM, Peter Zijlstra wrote:
> >> 1. Execute parallel applications that rely on active waiting or synchronous
> >> execution concurrently with other applications.
> >>
> >> The prime example in this class are probably virtual machines. Here,
> >> coscheduling is an alternative to paravirtualized spinlocks, pause loop
> >> exiting, and other techniques with its own set of advantages and
> >> disadvantages over the other approaches.
> >
> > Note that in order to avoid PLE and paravirt spinlocks and paravirt
> > tlb-invalidate you have to gang-schedule the _entire_ VM, not just SMT
> > siblings.
> >
> > Now explain to me how you're going to gang-schedule a VM with a good
> > number of vCPU threads (say spanning a number of nodes) and preserving
> > the rest of CFS without it turning into a massive trainwreck?
>
> You probably don't -- for the same reason, why it is a bad idea to give
> an endless loop realtime priority. It's just a bad idea. As I said in the
> text you quoted: coscheduling comes with its own set of advantages and
> disadvantages. Just because you find one example, where it is a bad idea,
> doesn't make it a bad thing in general.
> >
> > Such things (gang scheduling VMs) _are_ possible, but not within the
> > confines of something like CFS, they are also fairly inefficient
> > because, as you do note, you will have to explicitly schedule idle time
> > for idle vCPUs.
>
> With gang scheduling as defined by Feitelson and Rudolph [6], you'd have to
> explicitly schedule idle time. With coscheduling as defined by Ousterhout [7],
> you don't. In this patch set, the scheduling of idle time is "merely" a quirk
> of the implementation. And even with this implementation, there's nothing
> stopping you from down-sizing the width of the coscheduled set to take out
> the idle vCPUs dynamically, cutting down on fragmentation.

The thing is, if you drop the full width gang scheduling, you instantly
require the paravirt spinlock / tlb-invalidate stuff again.

Of course, the constraints of L1TF itself require the explicit
scheduling of idle time under a bunch of conditions.

I did not read your [7] in much detail (also very bad quality scan that
:-/), but I don't get how they leap from 'thrashing' to co-scheduling.

Their initial problem, where A generates data that B needs, and the 3
scenarios:

  1) A has to wait for B
  2) B has to wait for A
  3) the data gets buffered

seems fairly straightforward and is indeed quite common. That it needs
co-scheduling, I'm not convinced.

We have of course added all sorts of adaptive wait loops in the kernel
to deal with just that issue.

With co-scheduling you 'ensure' B is running when A is, but that doesn't
mean you can actually make more progress; you could just be burning a
lot of CPU cycles (which could've been spent doing other work).

I'm also not convinced co-scheduling makes _any_ sense outside SMT --
does one of the many papers you cite make a good case for !SMT
co-scheduling? It just doesn't make sense to co-schedule the LLC domain,
that's 16+ cores on recent chips.
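
For readers unfamiliar with the term, the "adaptive wait loop" idea
mentioned above can be sketched in user space roughly as follows. This is
only an illustration in Python, not kernel code and not how the kernel
implements it; `adaptive_wait` and `spin_budget_s` are made-up names, and
the spin budget is an arbitrary assumption. The waiter spins for a short,
bounded time in the hope that the other side is about to finish, and
gives up the CPU if it is not:

```python
import threading
import time


def adaptive_wait(event, spin_budget_s=0.001):
    """Wait for `event`: spin for a bounded time, then block.

    Returns "spun" if the event was observed without blocking,
    "blocked" if the spin budget ran out and we had to sleep.
    `spin_budget_s` is a hypothetical tuning knob for this sketch.
    """
    # Fast path: the producer is already done, no waiting needed.
    if event.is_set():
        return "spun"

    # Bounded spin: cheap if the producer finishes soon, and the cost
    # (burned CPU cycles) is capped by the budget if it does not.
    deadline = time.monotonic() + spin_budget_s
    while time.monotonic() < deadline:
        if event.is_set():
            return "spun"

    # Slow path: stop burning cycles and let something else run.
    event.wait()
    return "blocked"


if __name__ == "__main__":
    done = threading.Event()
    # Producer "A" finishes 50 ms from now; consumer "B" waits for it.
    producer = threading.Timer(0.05, done.set)
    producer.start()
    print(adaptive_wait(done))
    producer.join()
```

The point of the bounded spin is that the waiter copes with either
outcome on its own: no scheduler-level guarantee that the producer is
running at the same time is needed for correctness, only for latency.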