Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting
* Rik van Riel [2010-12-13 12:02:51]:

> On 12/11/2010 08:57 AM, Balbir Singh wrote:
> > If the vcpu holding the lock runs more and is capped, the timeslice
> > transfer is a heuristic that will not help.
>
> That indicates you really need the cap to be per guest, and
> not per VCPU.

Yes, I personally think so too, but I suspect there needs to be broader
agreement on the semantics.  Today the cap semantics, in terms of CPU
power, apply to each VCPU rather than to the entire system (per guest).

> Having one VCPU spin on a lock (and achieve nothing), because
> the other one cannot give up the lock due to hitting its CPU
> cap could lead to showstoppingly bad performance.

Yes, that seems right!

-- 
	Three Cheers,
	Balbir
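To make the per-guest cap concrete, here is one way the host side could express it: one CPU cgroup per guest holding all of the guest's VCPU threads, with a single quota on the group, so a lock-holding VCPU can use budget its siblings leave idle.  This is only a sketch: it uses the CFS bandwidth knobs (cpu.cfs_period_us / cpu.cfs_quota_us) that reached mainline after this thread, and the cgroup path is hypothetical.

#include <stdio.h>

/*
 * Sketch: cap the guest as a whole instead of each VCPU.  Uses the
 * CFS bandwidth interface (cpu.cfs_period_us / cpu.cfs_quota_us),
 * which was still out-of-tree when this thread was written; the
 * cgroup path is hypothetical.  The guest's VCPU thread IDs would
 * additionally be written into the group's "tasks" file.
 */
static void cap_guest(const char *cgroup, long period_us, long quota_us)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), "%s/cpu.cfs_period_us", cgroup);
	if ((f = fopen(path, "w"))) {
		fprintf(f, "%ld\n", period_us);
		fclose(f);
	}

	snprintf(path, sizeof(path), "%s/cpu.cfs_quota_us", cgroup);
	if ((f = fopen(path, "w"))) {
		fprintf(f, "%ld\n", quota_us);
		fclose(f);
	}
}

int main(void)
{
	/*
	 * Two VCPUs capped at 50% overall: one guest-wide quota of
	 * 100 ms per 100 ms period (one full CPU shared between two
	 * VCPUs), rather than a private 50 ms quota per VCPU.
	 */
	cap_guest("/sys/fs/cgroup/cpu/guest1", 100000, 100000);
	return 0;
}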
Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting
On 12/11/2010 08:57 AM, Balbir Singh wrote:
> If the vcpu holding the lock runs more and is capped, the timeslice
> transfer is a heuristic that will not help.

That indicates you really need the cap to be per guest, and
not per VCPU.

Having one VCPU spin on a lock (and achieve nothing), because
the other one cannot give up the lock due to hitting its CPU
cap could lead to showstoppingly bad performance.

-- 
All rights reversed
Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting
On 12/13/2010 02:39 PM, Balbir Singh wrote: * Avi Kivity [2010-12-13 13:57:37]: > On 12/11/2010 03:57 PM, Balbir Singh wrote: > >* Avi Kivity [2010-12-11 09:31:24]: > > > >> On 12/10/2010 07:03 AM, Balbir Singh wrote: > >> >> > >> >>Scheduler people, please flame me with anything I may have done > >> >>wrong, so I can do it right for a next version :) > >> >> > >> > > >> >This is a good problem statement, there are other things to consider > >> >as well > >> > > >> >1. If a hard limit feature is enabled underneath, donating the > >> >timeslice would probably not make too much sense in that case > >> > >> What's the alternative? > >> > >> Consider a two vcpu guest with a 50% hard cap. Suppose the workload > >> involves ping-ponging within the guest. If the scheduler decides to > >> schedule the vcpus without any overlap, then the throughput will be > >> dictated by the time slice. If we allow donation, throughput is > >> limited by context switch latency. > >> > > > >If the vpcu holding the lock runs more and capped, the timeslice > >transfer is a heuristic that will not help. > > Why not? as long as we shift the cap as well. > Shifting the cap would break it, no? The total cap for the guest would remain. Anyway, that is something for us to keep track of as we add additional heuristics, not a show stopper. Sure, as long as we see a way to fix it eventually. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting
* Avi Kivity [2010-12-13 13:57:37]:

> On 12/11/2010 03:57 PM, Balbir Singh wrote:
> > * Avi Kivity [2010-12-11 09:31:24]:
> >
> > > On 12/10/2010 07:03 AM, Balbir Singh wrote:
> > > > > Scheduler people, please flame me with anything I may have done
> > > > > wrong, so I can do it right for a next version :)
> > > >
> > > > This is a good problem statement; there are other things to consider
> > > > as well:
> > > >
> > > > 1. If a hard limit feature is enabled underneath, donating the
> > > > timeslice would probably not make too much sense in that case.
> > >
> > > What's the alternative?
> > >
> > > Consider a two vcpu guest with a 50% hard cap.  Suppose the workload
> > > involves ping-ponging within the guest.  If the scheduler decides to
> > > schedule the vcpus without any overlap, then the throughput will be
> > > dictated by the time slice.  If we allow donation, throughput is
> > > limited by context switch latency.
> >
> > If the vcpu holding the lock runs more and is capped, the timeslice
> > transfer is a heuristic that will not help.
>
> Why not?  As long as we shift the cap as well.

Shifting the cap would break it, no?  Anyway, that is something for us
to keep track of as we add additional heuristics, not a show stopper.

-- 
	Three Cheers,
	Balbir
Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting
On 12/11/2010 03:57 PM, Balbir Singh wrote:
> * Avi Kivity [2010-12-11 09:31:24]:
>
> > On 12/10/2010 07:03 AM, Balbir Singh wrote:
> > > > Scheduler people, please flame me with anything I may have done
> > > > wrong, so I can do it right for a next version :)
> > >
> > > This is a good problem statement; there are other things to consider
> > > as well:
> > >
> > > 1. If a hard limit feature is enabled underneath, donating the
> > > timeslice would probably not make too much sense in that case.
> >
> > What's the alternative?
> >
> > Consider a two vcpu guest with a 50% hard cap.  Suppose the workload
> > involves ping-ponging within the guest.  If the scheduler decides to
> > schedule the vcpus without any overlap, then the throughput will be
> > dictated by the time slice.  If we allow donation, throughput is
> > limited by context switch latency.
>
> If the vcpu holding the lock runs more and is capped, the timeslice
> transfer is a heuristic that will not help.

Why not?  As long as we shift the cap as well.

-- 
error compiling committee.c: too many arguments to function
Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting
* Avi Kivity [2010-12-11 09:31:24]:

> On 12/10/2010 07:03 AM, Balbir Singh wrote:
> > > Scheduler people, please flame me with anything I may have done
> > > wrong, so I can do it right for a next version :)
> >
> > This is a good problem statement; there are other things to consider
> > as well:
> >
> > 1. If a hard limit feature is enabled underneath, donating the
> > timeslice would probably not make too much sense in that case.
>
> What's the alternative?
>
> Consider a two vcpu guest with a 50% hard cap.  Suppose the workload
> involves ping-ponging within the guest.  If the scheduler decides to
> schedule the vcpus without any overlap, then the throughput will be
> dictated by the time slice.  If we allow donation, throughput is
> limited by context switch latency.

If the vcpu holding the lock runs more and is capped, the timeslice
transfer is a heuristic that will not help.

-- 
	Three Cheers,
	Balbir
Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting
On 12/10/2010 07:03 AM, Balbir Singh wrote:
> > Scheduler people, please flame me with anything I may have done
> > wrong, so I can do it right for a next version :)
>
> This is a good problem statement; there are other things to consider
> as well:
>
> 1. If a hard limit feature is enabled underneath, donating the
> timeslice would probably not make too much sense in that case.

What's the alternative?

Consider a two vcpu guest with a 50% hard cap.  Suppose the workload
involves ping-ponging within the guest.  If the scheduler decides to
schedule the vcpus without any overlap, then the throughput will be
dictated by the time slice.  If we allow donation, throughput is
limited by context switch latency.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
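Rough numbers behind the ping-pong argument (illustrative guesses, not measurements): if each lock hand-off can stall for up to a scheduler timeslice, the round-trip rate is capped in the hundreds per second; if donation cuts the stall to a context switch, the ceiling rises by roughly three orders of magnitude.

#include <stdio.h>

/*
 * Back-of-the-envelope bounds for the ping-pong workload above.
 * The 3 ms timeslice and 5 us context switch are illustrative
 * guesses, not measurements.
 */
int main(void)
{
	double timeslice_us = 3000.0;	/* host scheduler timeslice */
	double ctx_switch_us = 5.0;	/* host context switch cost */

	/* Without donation, each hand-off may wait out a full slice. */
	printf("no donation:   <= %.0f hand-offs/sec\n", 1e6 / timeslice_us);

	/* With donation, each hand-off costs about one context switch. */
	printf("with donation: <= %.0f hand-offs/sec\n", 1e6 / ctx_switch_us);
	return 0;
}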
Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting
On 12/10/2010 12:03 AM, Balbir Singh wrote:
> This is a good problem statement; there are other things to consider
> as well:
>
> 1. If a hard limit feature is enabled underneath, donating the
> timeslice would probably not make too much sense in that case.

The idea is to get the VCPU that is holding the lock to run ASAP, so
it can release the lock.

> 2. The implicit assumption is that spinning is bad, but for locks held
> for short durations, the assumption is not true.  I presume, from the
> problem statement above, that the h/w does the detection of when to
> pause, but that is not always correct as you suggest above.

The hardware waits a certain number of spins before it traps to the
host.  Short-held locks, held by a virtual CPU that is running, will
not trigger the trap.

> 3. With respect to donating timeslices, don't scheduler cgroups and
> job isolation address that problem today?

No.

-- 
All rights reversed
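For reference, a rough model of the Intel-side detection: ple_gap and ple_window below mirror the kvm-intel module parameters (defaults at the time: 128 and 4096 TSC cycles), and the logic is an illustration of the documented heuristic, not the actual hardware.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative model of VMX pause-loop exiting: if two executions of
 * PAUSE in guest mode are separated by more than ple_gap TSC cycles,
 * the CPU assumes a new spin loop has started; once a single loop has
 * been spinning for more than ple_window cycles, the guest takes a
 * VM exit.
 */
static const uint64_t ple_gap = 128;
static const uint64_t ple_window = 4096;

struct ple_state {
	uint64_t first_pause;	/* TSC at the first PAUSE of this loop */
	uint64_t last_pause;	/* TSC at the most recent PAUSE */
};

/* Called (conceptually) on every guest PAUSE; true means VM exit. */
static bool pause_loop_exit(struct ple_state *s, uint64_t tsc)
{
	if (tsc - s->last_pause > ple_gap)
		s->first_pause = tsc;	/* gap too big: new spin loop */
	s->last_pause = tsc;

	return tsc - s->first_pause > ple_window;	/* spun too long */
}

int main(void)
{
	struct ple_state s = { 0, 0 };
	uint64_t tsc;

	/* A tight spin loop, one PAUSE every 50 cycles, soon exits. */
	for (tsc = 50; tsc < 10000; tsc += 50) {
		if (pause_loop_exit(&s, tsc)) {
			printf("VM exit after %llu cycles of spinning\n",
			       (unsigned long long)(tsc - s.first_pause));
			break;
		}
	}
	return 0;
}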
Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting
* Rik van Riel [2010-12-02 14:41:29]:

> When running SMP virtual machines, it is possible for one VCPU to be
> spinning on a spinlock, while the VCPU that holds the spinlock is not
> currently running, because the host scheduler preempted it to run
> something else.
>
> Both Intel and AMD CPUs have a feature that detects when a virtual
> CPU is spinning on a lock and will trap to the host.
>
> The current KVM code sleeps for a bit whenever that happens, which
> results in e.g. a 64 VCPU Windows guest taking forever and a bit to
> boot up.  This is because the VCPU holding the lock is actually
> running and not sleeping, so the pause is counter-productive.
>
> In other workloads a pause can also be counter-productive, with
> spinlock detection resulting in one guest giving up its CPU time
> to the others.  Instead of spinning, it ends up simply not running
> much at all.
>
> This patch series aims to fix that, by having a VCPU that spins
> give the remainder of its timeslice to another VCPU in the same
> guest before yielding the CPU - one that is runnable but got
> preempted, hopefully the lock holder.
>
> Scheduler people, please flame me with anything I may have done
> wrong, so I can do it right for a next version :)

This is a good problem statement; there are other things to consider
as well:

1. If a hard limit feature is enabled underneath, donating the
timeslice would probably not make too much sense in that case.

2. The implicit assumption is that spinning is bad, but for locks held
for short durations, the assumption is not true.  I presume, from the
problem statement above, that the h/w does the detection of when to
pause, but that is not always correct as you suggest above.

3. With respect to donating timeslices, don't scheduler cgroups and
job isolation address that problem today?

-- 
	Three Cheers,
	Balbir
Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting
On 12/03/2010 12:41 AM, Chris Wright wrote:
> * Rik van Riel (r...@redhat.com) wrote:
> > When running SMP virtual machines, it is possible for one VCPU to be
> > spinning on a spinlock, while the VCPU that holds the spinlock is not
> > currently running, because the host scheduler preempted it to run
> > something else.
> >
> > Both Intel and AMD CPUs have a feature that detects when a virtual
> > CPU is spinning on a lock and will trap to the host.
> >
> > The current KVM code sleeps for a bit whenever that happens, which
> > results in e.g. a 64 VCPU Windows guest taking forever and a bit to
> > boot up.  This is because the VCPU holding the lock is actually
> > running and not sleeping, so the pause is counter-productive.
>
> Seems like simply increasing the spin window would help in that case?
> Or is it just too contended a lock?  (I think they use MCS locks, so I
> can see a single wrong sleep causing real contention problems.)

It may, but that just pushes the problem to a more contended lock or
to a higher vcpu count.  We want something that works after PLE
threshold tuning has failed.

-- 
error compiling committee.c: too many arguments to function
Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting
* Rik van Riel (r...@redhat.com) wrote:
> When running SMP virtual machines, it is possible for one VCPU to be
> spinning on a spinlock, while the VCPU that holds the spinlock is not
> currently running, because the host scheduler preempted it to run
> something else.
>
> Both Intel and AMD CPUs have a feature that detects when a virtual
> CPU is spinning on a lock and will trap to the host.
>
> The current KVM code sleeps for a bit whenever that happens, which
> results in e.g. a 64 VCPU Windows guest taking forever and a bit to
> boot up.  This is because the VCPU holding the lock is actually
> running and not sleeping, so the pause is counter-productive.

Seems like simply increasing the spin window would help in that case?
Or is it just too contended a lock?  (I think they use MCS locks, so I
can see a single wrong sleep causing real contention problems.)

> In other workloads a pause can also be counter-productive, with
> spinlock detection resulting in one guest giving up its CPU time
> to the others.  Instead of spinning, it ends up simply not running
> much at all.
>
> This patch series aims to fix that, by having a VCPU that spins
> give the remainder of its timeslice to another VCPU in the same
> guest before yielding the CPU - one that is runnable but got
> preempted, hopefully the lock holder.
[RFC PATCH 0/3] directed yield for Pause Loop Exiting
When running SMP virtual machines, it is possible for one VCPU to be
spinning on a spinlock, while the VCPU that holds the spinlock is not
currently running, because the host scheduler preempted it to run
something else.

Both Intel and AMD CPUs have a feature that detects when a virtual
CPU is spinning on a lock and will trap to the host.

The current KVM code sleeps for a bit whenever that happens, which
results in e.g. a 64 VCPU Windows guest taking forever and a bit to
boot up.  This is because the VCPU holding the lock is actually
running and not sleeping, so the pause is counter-productive.

In other workloads a pause can also be counter-productive, with
spinlock detection resulting in one guest giving up its CPU time
to the others.  Instead of spinning, it ends up simply not running
much at all.

This patch series aims to fix that, by having a VCPU that spins
give the remainder of its timeslice to another VCPU in the same
guest before yielding the CPU - one that is runnable but got
preempted, hopefully the lock holder.

Scheduler people, please flame me with anything I may have done
wrong, so I can do it right for a next version :)

-- 
All rights reversed.
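A minimal sketch of the directed yield described in this cover letter, not the actual patches: on a pause-loop exit, look for another VCPU of the same guest that is runnable but preempted and hand it the remainder of the timeslice.  The two extern primitives are hypothetical stand-ins for whatever the scheduler would need to expose.

#include <linux/kvm_host.h>
#include <linux/sched.h>

/* Hypothetical scheduler primitives assumed by this sketch. */
extern bool vcpu_runnable_but_preempted(struct kvm_vcpu *vcpu);
extern bool kvm_sched_yield_to(struct kvm_vcpu *vcpu);

static void kvm_vcpu_directed_yield(struct kvm_vcpu *me)
{
	struct kvm *kvm = me->kvm;
	struct kvm_vcpu *vcpu;
	int i;

	kvm_for_each_vcpu(i, vcpu, kvm) {
		if (vcpu == me)
			continue;
		/*
		 * Only consider VCPUs that want to run but were
		 * preempted -- one of them is hopefully the lock
		 * holder.  Sleeping VCPUs are not spinning on a lock.
		 */
		if (!vcpu_runnable_but_preempted(vcpu))
			continue;
		/* Donate the remainder of our timeslice and stop. */
		if (kvm_sched_yield_to(vcpu))
			return;
	}

	/* No suitable target: fall back to giving up the CPU. */
	yield();
}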