On Wed, 2010-12-01 at 21:42 +0530, Srivatsa Vaddagiri wrote:
> Not if yield() remembers what timeslice was given up and adds that back when
> thread is finally ready to run. Figure below illustrates this idea:
>
>
>      A0/4    C0/4 D0/4 A0/4   C0/4 D0/4 A0/4   C0/4 D0/4 A0/4
> p0 |----|-L|----|----|----|L|----|----|----|L|----|----|----|--------------|
>           \                \                \                      \
>            B0/2[2]          B0/0[6]          B0/0[10]               B0/14[0]
>
>
> where,
>     p0      -> physical cpu0
>     L       -> denotes period of lock contention
>     A0/4    -> means vcpu A0 (of guest A) ran for 4 ms
>     B0/2[6] -> means vcpu B0 (of guest B) ran for 2 ms (and has given up
>                6ms worth of its timeslice so far). In reality, we should
>                not see too much of "given up" timeslice for a vcpu.

/me fails to parse

> > >Regarding directed yield, do we have any reliable mechanism to find
> > >target of directed yield in this (unmodified/non-paravirtualized guest)
> > >case? IOW how do we determine the vcpu thread to which cycles need to be
> > >yielded upon contention?
> >
> > My idea was to yield to a random starved vcpu of the same guest.
> > There are several cases to consider:
> >
> > - we hit the right vcpu; lock is released, party.
> > - we hit some vcpu that is doing unrelated work. yielding thread
> >   doesn't make progress, but we're not wasting cpu time.
> > - we hit another waiter for the same lock. it will also PLE exit
> >   and trigger a directed yield. this increases the cost of directed
> >   yield by a factor of count_of_runnable_but_not_running_vcpus, which
> >   could be large, but not disastrously so (i.e. don't run a 64-vcpu
> >   guest on a uniprocessor host with this)
> >
> > >> So if you were to test something similar running with a 20% vcpu
> > >> cap, I'm sure you'd run into similar issues. It may show with fewer
> > >> vcpus (I've only tested 64).
> > >>
> > >> >Are you assuming the existence of a directed yield and the
> > >> >specific concern is what happens when a directed yield happens
> > >> >after a PLE and the target of the yield has been capped?
> > >>
> > >> Yes. My concern is that we will see the same kind of problems
> > >> directed yield was designed to fix, but without allowing directed
> > >> yield to fix them. Directed yield was designed to fix lock holder
> > >> preemption under contention,
> > >
> > >For modified guests, something like [2] seems to be the best approach
> > >to fix lock-holder preemption (LHP) problem, which does not require any
> > >sort of directed yield support. Essentially upon contention, a vcpu
> > >registers its lock of interest and goes to sleep (via hypercall) waiting
> > >for lock-owner to wake it up (again via another hypercall).
> >
> > Right.
>
> We don't have these hypercalls for KVM atm, which I am working on now.
>
> > >For unmodified guests, IMHO a plain yield (or slightly enhanced yield
> > >[1]) should fix the LHP problem.
> >
> > A plain yield (ignoring no-opiness on Linux) will penalize the
> > running guest wrt other guests. We need to maintain fairness.
>
> Agreed on the need to maintain fairness.

Directed yield and fairness don't mix well either. You can end up
feeding the other tasks more time than you'll ever get back.

> > >Fyi, Xen folks also seem to be avoiding a directed yield for some of
> > >the same reasons [3].
> >
> > I think that fails for unmodified guests, where you don't know when
> > the lock is released and so you don't have a wake_up notification.
> > You lost a large timeslice and you can't gain it back, whereas with
> > pv the wakeup means you only lose as much time as the lock was held.
> >
> > >Given this line of thinking, hard-limiting guests (either in user-space
> > >or kernel-space, latter being what I prefer) should not have adverse
> > >interactions with LHP-related solutions.
> >
> > If you hard-limit a vcpu that holds a lock, any waiting vcpus are
> > also halted.
>
> This can happen in normal case when lock-holders are preempted as well. So
> not a new problem that hard-limits is introducing!

No, but hard limits make it _much_ worse.

> > With directed yield you can let the lock holder make
> > some progress at the expense of another vcpu. A regular yield()
> > will simply stall the waiter.
>
> Agreed. Do you see any problems with slightly enhanced version of yield
> described above (rather than directed yield)? It has some advantage over
> directed yield in that it preserves not only fairness between VMs but also
> fairness between VCPUs of a VM. Also it avoids the need for a guessing game
> mentioned above and bad interactions with hard-limits.
>
> CCing other scheduler experts for their opinion of proposed yield()
> extensions.

sys_yield() usage for anything other than two FIFO threads of the same
priority goes to /dev/null.

The Xen paravirt spinlock solution is relatively sane, use that.
Unmodified guests suck anyway, there's really nothing much sane you can
do there as you don't know who owns what lock.
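
For readers who, like the reply above, have trouble parsing the figure
quoted at the top of this message, here is a minimal sketch of the
bookkeeping it seems to describe. It is not kernel code and not anyone's
actual patch. The names Vcpu, run_turn, simulate and SLICE_MS are
hypothetical, the 4 ms slice is read off the figure, and repaying the whole
credit in one lump is an assumption based on the last segment (B0/14[0]).
Only vcpu B0 is modelled; A0, C0 and D0 simply run their 4 ms in between.

```python
from dataclasses import dataclass

SLICE_MS = 4  # per-turn timeslice; assumed from the 4 ms runs in the figure


@dataclass
class Vcpu:
    name: str
    given_up_ms: int = 0  # timeslice yielded so far and not yet repaid


def run_turn(vcpu, useful_ms):
    """Run one scheduling turn and return how many ms the vcpu ran.

    useful_ms is how long the vcpu runs before it hits lock contention
    and yields; None means it is not contending in this turn.
    """
    if useful_ms is None:
        # Not contending: normal slice plus everything given up earlier.
        ran = SLICE_MS + vcpu.given_up_ms
        vcpu.given_up_ms = 0
    else:
        # Contending: run until the yield, remember the unused remainder.
        ran = min(useful_ms, SLICE_MS)
        vcpu.given_up_ms += SLICE_MS - ran
    return ran


def simulate(contention):
    """Return one "name/ran[given_up]" label per turn, as in the figure."""
    b0 = Vcpu("B0")
    trace = []
    for useful_ms in contention:
        ran = run_turn(b0, useful_ms)
        trace.append(f"{b0.name}/{ran}[{b0.given_up_ms}]")
    return trace


if __name__ == "__main__":
    # B0 yields after 2 ms, then immediately twice, then runs uncontended.
    print(simulate([2, 0, 0, None]))
```

With the input [2, 0, 0, None] the labels come out as B0/2[2], B0/0[6],
B0/0[10], B0/14[0], which are the four B0 annotations in the figure. The
sketch says nothing about the fairness objection raised in the replies; it
only shows where the numbers in the brackets come from.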