On Wed, Oct 9, 2013 at 8:20 PM, Anup Patel <a...@brainfault.org> wrote:
> On Wed, Oct 9, 2013 at 7:48 PM, Marc Zyngier <marc.zyng...@arm.com> wrote:
>> On 09/10/13 14:26, Gleb Natapov wrote:
>>> On Wed, Oct 09, 2013 at 03:09:54PM +0200, Alexander Graf wrote:
>>>>
>>>> On 07.10.2013, at 18:53, Gleb Natapov <g...@redhat.com> wrote:
>>>>
>>>>> On Mon, Oct 07, 2013 at 06:30:04PM +0200, Alexander Graf wrote:
>>>>>>
>>>>>> On 07.10.2013, at 18:16, Marc Zyngier <marc.zyng...@arm.com> wrote:
>>>>>>
>>>>>>> On 07/10/13 17:04, Alexander Graf wrote:
>>>>>>>>
>>>>>>>> On 07.10.2013, at 17:40, Marc Zyngier <marc.zyng...@arm.com> wrote:
>>>>>>>>
>>>>>>>>> On an (even slightly) oversubscribed system, spinlocks are quickly
>>>>>>>>> becoming a bottleneck, as some vcpus are spinning, waiting for a
>>>>>>>>> lock to be released, while the vcpu holding the lock may not be
>>>>>>>>> running at all.
>>>>>>>>>
>>>>>>>>> This creates contention, and the observed slowdown is 40x for
>>>>>>>>> hackbench. No, this isn't a typo.
>>>>>>>>>
>>>>>>>>> The solution is to trap blocking WFEs and tell KVM that we're now
>>>>>>>>> spinning. This ensures that other vcpus will get a scheduling boost,
>>>>>>>>> allowing the lock to be released more quickly.
>>>>>>>>>
>>>>>>>>> From a performance point of view: hackbench 1 process 1000
>>>>>>>>>
>>>>>>>>> 2xA15 host (baseline):  1.843s
>>>>>>>>>
>>>>>>>>> 2xA15 guest w/o patch:  2.083s
>>>>>>>>> 4xA15 guest w/o patch:  80.212s
>>>>>>>>>
>>>>>>>>> 2xA15 guest w/ patch:   2.072s
>>>>>>>>> 4xA15 guest w/ patch:   3.202s
>>>>>>>>
>>>>>>>> I'm confused. You went from 2.083s when not exiting on spin locks to
>>>>>>>> 2.072s when exiting on _every_ spin lock that didn't immediately
>>>>>>>> succeed. I would've expected the second number to be worse rather than
>>>>>>>> better. I assume it's within jitter, but I'm still puzzled why you don't
>>>>>>>> see any significant drop in performance.
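The hackbench figures above can be checked with a few lines of C. This is only a sketch. The variable names are made up, and the timings are the ones quoted in the patch description. It assumes "4xA15 guest" means a 4-vcpu guest on the 2-CPU host. Measured against the 2-vcpu guest, the 4-vcpu slowdown without the patch comes out at about 38.5x, which matches the "40x" above. The 2-vcpu difference is about half a percent, which is consistent with Alex's guess that it is within jitter.

```c
#include <assert.h>
#include <stdio.h>

/* Wall-clock seconds for "hackbench 1 process 1000", as quoted above. */
static const double host_2cpu       = 1.843;
static const double guest2_no_patch = 2.083;
static const double guest4_no_patch = 80.212;
static const double guest2_patch    = 2.072;
static const double guest4_patch    = 3.202;

/* How many times larger a is than b (e.g. a slowdown factor). */
static double ratio(double a, double b)
{
    assert(b > 0.0);
    return a / b;
}

/* Relative change from 'before' to 'after', in percent. */
static double pct_change(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (after - before) / before;
}

static void print_summary(void)
{
    printf("2-vcpu guest overhead vs host:        %+.2f%%\n",
           pct_change(host_2cpu, guest2_no_patch));
    printf("4-vcpu vs 2-vcpu guest, no patch:     %.1fx slower\n",
           ratio(guest4_no_patch, guest2_no_patch));   /* about 38.5x */
    printf("4-vcpu guest speedup from the patch:  %.1fx\n",
           ratio(guest4_no_patch, guest4_patch));      /* about 25.1x */
    printf("2-vcpu guest change from the patch:   %+.2f%%\n",
           pct_change(guest2_no_patch, guest2_patch)); /* about -0.53% */
}
```

Calling print_summary() prints the four comparisons. Only the 4-vcpu rows show a large effect, from the lock contention and from the patch.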
>>>>>>>
>>>>>>> The key is in the ARM ARM:
>>>>>>>
>>>>>>> B1.14.9: "When HCR.TWE is set to 1, and the processor is in a Non-secure
>>>>>>> mode other than Hyp mode, execution of a WFE instruction generates a Hyp
>>>>>>> Trap exception if, ignoring the value of the HCR.TWE bit, conditions
>>>>>>> permit the processor to suspend execution."
>>>>>>>
>>>>>>> So, on a non-overcommitted system, you rarely hit a blocking spinlock,
>>>>>>> hence not trapping. Otherwise, performance would go down the drain very
>>>>>>> quickly.
>>>>>>
>>>>>> Well, it's the same as pause/loop exiting on x86, but there we have
>>>>>> special hardware features to only ever exit after n
>>>>>> turnarounds. I wonder why we have those when we could just as easily
>>>>>> exit on every blocking path.
>>>>>>
>>>>> It will hurt performance if the vcpu that holds the lock is running.
>>>>
>>>> Apparently not so on ARM. At least that's what Marc's numbers are showing.
>>>> I'm not sure what exactly that means. Basically his logic is "if we spin,
>>>> the holder must have been preempted". And it seems to work out
>>>> surprisingly well.
>>
>> Yes. I basically assume that contention should be rare, and that ending
>> up in a *blocking* WFE is a sign that we're in thrashing mode already
>> (no event is pending).
>>
>>>>
>>> For uncontended locks it makes sense. We need to recheck if the x86
>>> assumption is still true there, but the x86 lock is a ticket lock, which
>>> has not only the lock holder preemption problem but also the lock waiter
>>> preemption problem, which makes overcommit even worse.
>>
>> Locks are ticket-based on ARM as well. But there is one key difference here
>> with x86 (or at least what I understand of it, which is very close to
>> none): we only trap if we would have blocked anyway. In our case, it is
>> almost always better to give up the CPU to someone else rather than
>> waiting for some event to take the CPU out of sleep.
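The B1.14.9 rule, and Marc's point that a WFE traps only when the vcpu would have blocked anyway, can be made concrete with a small C model. This is not KVM code, and every name in it is made up for illustration. The model assumes that a pending event in the event register is the only thing that lets WFE complete without suspending, and it ignores interrupts. It also reduces "tell KVM that we're now spinning" to a yield decision. In a real kernel that would presumably go through KVM's generic kvm_vcpu_on_spin() helper, but the patch itself is not quoted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Outcome of a guest WFE in this simplified model. */
enum wfe_outcome {
    WFE_COMPLETES,    /* an event was pending: it is consumed, no suspend, no trap */
    WFE_SUSPENDS,     /* no event and HCR.TWE clear: the CPU waits inside the guest */
    WFE_TRAPS_TO_HYP  /* no event and HCR.TWE set: Hyp Trap exception to the host */
};

/* What the host does afterwards, per the patch description above. */
enum host_action {
    HOST_NOTHING,     /* no exit happened, the host is not involved */
    HOST_YIELD        /* treat the vcpu as spinning and let another vcpu run */
};

/*
 * Simplified reading of B1.14.9: WFE traps only if, ignoring HCR.TWE,
 * the processor would otherwise suspend. "Would suspend" is reduced here
 * to "the event register is clear"; other wake-up sources are left out.
 */
static enum wfe_outcome guest_wfe(bool *event_register, bool hcr_twe)
{
    assert(event_register != NULL);
    if (*event_register) {
        *event_register = false;  /* WFE clears the event register and completes */
        return WFE_COMPLETES;
    }
    return hcr_twe ? WFE_TRAPS_TO_HYP : WFE_SUSPENDS;
}

/* Only a trapped (i.e. blocking) WFE reaches the host, which yields
 * instead of putting the vcpu to sleep. */
static enum host_action host_handle_wfe(enum wfe_outcome outcome)
{
    return outcome == WFE_TRAPS_TO_HYP ? HOST_YIELD : HOST_NOTHING;
}
```

In this model only the third outcome costs an exit. A WFE that finds an event already pending never leaves the guest, which is why the common, uncontended case is unaffected.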
>
> Benefits of "Yield CPU when vcpu executes a WFE" seems to depend on:
> 1. How spin lock is implemented in Guest OS? we cannot assume
> that underlying Guest OS is always Linux.
> 2. How bad/good is spin
>
> It will be good if we can enable/disable "Yield CPU when vcpu executes a WFE
(Please ignore my previous incomplete reply.)

The benefits of "Yield CPU when vcpu executes a WFE" seem to depend on:
1. How is the spin lock implemented in the Guest OS? (Note: we cannot assume that the underlying Guest OS is always Linux.)
2. How bad or good is spin lock contention in the Guest? (Note: here too we cannot make assumptions about the workloads running in the Guest.)

It would be good if we could enable/disable "Yield CPU when vcpu executes a WFE" via Kconfig.

--Anup

>
>>
>> M.
>> --
>> Jazz is not dead. It just smells funny...
>>
>> _______________________________________________
>> kvmarm mailing list
>> kvm...@lists.cs.columbia.edu
>> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of
a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
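Anup's first point, that the benefit depends on how the guest implements its spin locks, can be illustrated with a minimal C11 ticket lock. This is a sketch and not the Linux implementation. All names are hypothetical, and the `wait_hint` callback stands in for whatever the guest does while it waits. In ARM Linux's ticket lock, for example, the waiter executes WFE at that point. A guest whose lock only busy-polls never executes WFE, so trapping WFE gives the host nothing to act on. The strict FIFO hand-off also shows the lock waiter preemption problem Gleb mentions: after a release, only the next ticket holder may enter, even if its vcpu is not currently running.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ticket lock: 'next' hands out tickets, 'owner' is the
 * ticket currently allowed into the critical section. */
struct ticket_spinlock {
    atomic_uint next;
    atomic_uint owner;
};

/* Called after every failed check while waiting. This is where an ARM
 * guest would typically execute WFE; NULL models a lock that busy-polls. */
typedef void (*wait_hint_fn)(void);

static void ticket_spin_init(struct ticket_spinlock *l)
{
    atomic_init(&l->next, 0u);
    atomic_init(&l->owner, 0u);
}

/* Take a ticket; tickets are served strictly in the order they are taken. */
static unsigned int ticket_take(struct ticket_spinlock *l)
{
    return atomic_fetch_add_explicit(&l->next, 1u, memory_order_relaxed);
}

/* True when ticket t may enter; acquire pairs with the release in unlock. */
static bool ticket_is_turn(struct ticket_spinlock *l, unsigned int t)
{
    return atomic_load_explicit(&l->owner, memory_order_acquire) == t;
}

static void ticket_spin_lock(struct ticket_spinlock *l, wait_hint_fn wait_hint)
{
    unsigned int me = ticket_take(l);

    while (!ticket_is_turn(l, me)) {
        if (wait_hint != NULL)
            wait_hint();
    }
}

/* Only the current holder writes 'owner', so a load/store pair suffices. */
static void ticket_spin_unlock(struct ticket_spinlock *l)
{
    unsigned int cur = atomic_load_explicit(&l->owner, memory_order_relaxed);

    /* Sanity check: at least one ticket is outstanding, i.e. the lock is held. */
    assert(cur != atomic_load_explicit(&l->next, memory_order_relaxed));
    atomic_store_explicit(&l->owner, cur + 1u, memory_order_release);
}
```

Suppose a holder, waiter A and waiter B take tickets in that order. After the holder unlocks, only A's ticket is served. If A's vcpu has been preempted, B keeps waiting even though nobody is inside the critical section.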