Re: [RFC PATCH 3/3] kvm: use yield_to instead of sleep in kvm_vcpu_on_spin
On 12/09/2010 07:07 PM, Rik van Riel wrote:
>> Right.  May be clearer by using a for () loop instead of the goto.
>
> And open coding kvm_for_each_vcpu?  Somehow I suspect
> that won't add to clarity...

No, I meant having a for (pass = 0; pass < 2; ++pass) and nesting
kvm_for_each_vcpu() in it.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
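[Editorial note: Avi's two-pass suggestion can be modeled outside the kernel. The sketch below is illustrative only; NR_VCPUS, boostable[] and pick_vcpu_to_boost() are made-up names standing in for the KVM structures, and the inner loop stands in for kvm_for_each_vcpu().]

```c
#include <assert.h>

/*
 * Standalone model of the two-pass loop Avi suggests in place of the
 * "goto again" retry: scan all VCPUs at most twice, starting just
 * after the last boosted VCPU.  Pass 0 covers the VCPUs after
 * last_boosted_vcpu; pass 1 wraps around to those at or before it,
 * so every VCPU is considered at most once and termination follows
 * by construction.
 */
#define NR_VCPUS 4

static int last_boosted_vcpu;

static int pick_vcpu_to_boost(const int *boostable, int me)
{
	int pass, i;

	for (pass = 0; pass < 2; ++pass) {
		for (i = 0; i < NR_VCPUS; ++i) {	/* kvm_for_each_vcpu() */
			if (pass == 0 && i <= last_boosted_vcpu)
				continue;	/* first pass: only after last boosted */
			if (pass == 1 && i > last_boosted_vcpu)
				break;		/* second pass: stop where we started */
			if (i == me || !boostable[i])
				continue;
			last_boosted_vcpu = i;
			return i;
		}
	}
	return -1;	/* nobody worth yielding to */
}
```

The structure mirrors the goto version's behaviour (including the N+1 walk bound Rik describes) while making the "at most two passes" property visible in the loop header.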
Re: [RFC PATCH 3/3] kvm: use yield_to instead of sleep in kvm_vcpu_on_spin
On 12/09/2010 12:38 AM, Rik van Riel wrote:
>>> -	/* Sleep for 100 us, and hope lock-holder got scheduled */
>>> -	expires = ktime_add_ns(ktime_get(), 100000UL);
>>> -	schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
>>> +	if (first_round && last_boosted_vcpu == kvm->last_boosted_vcpu) {
>>> +		/* We have not found anyone yet. */
>>> +		first_round = 0;
>>> +		goto again;
>>
>> Need to guarantee termination.
>
> We do that by setting first_round to 0 :)
>
> We at most walk N+1 VCPUs in a VM with N VCPUs, with this patch.

Right.  May be clearer by using a for () loop instead of the goto.

-- 
error compiling committee.c: too many arguments to function
Re: [RFC PATCH 3/3] kvm: use yield_to instead of sleep in kvm_vcpu_on_spin
On 12/09/2010 05:28 AM, Avi Kivity wrote:
> On 12/09/2010 12:38 AM, Rik van Riel wrote:
>>>> -	/* Sleep for 100 us, and hope lock-holder got scheduled */
>>>> -	expires = ktime_add_ns(ktime_get(), 100000UL);
>>>> -	schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
>>>> +	if (first_round && last_boosted_vcpu == kvm->last_boosted_vcpu) {
>>>> +		/* We have not found anyone yet. */
>>>> +		first_round = 0;
>>>> +		goto again;
>>>
>>> Need to guarantee termination.
>>
>> We do that by setting first_round to 0 :)
>>
>> We at most walk N+1 VCPUs in a VM with N VCPUs, with this patch.
>
> Right.  May be clearer by using a for () loop instead of the goto.

And open coding kvm_for_each_vcpu?  Somehow I suspect
that won't add to clarity...

-- 
All rights reversed
Re: [RFC PATCH 3/3] kvm: use yield_to instead of sleep in kvm_vcpu_on_spin
On 12/05/2010 07:56 AM, Avi Kivity wrote:
>> +		if (vcpu == me)
>> +			continue;
>> +		if (vcpu->spinning)
>> +			continue;
>
> You may well want to wake up a spinner.  Suppose
>
> A takes a lock
> B preempts A
> B grabs a ticket, starts spinning, yields to A
> A releases lock
> A grabs ticket, starts spinning
>
> at this point, we want A to yield to B, but it won't because of this
> check.

That's a good point.  I guess we'll have to benchmark both with
and without the vcpu->spinning logic.

>> +		if (!task)
>> +			continue;
>> +		if (waitqueue_active(&vcpu->wq))
>> +			continue;
>> +		if (task->flags & PF_VCPU)
>> +			continue;
>> +		kvm->last_boosted_vcpu = i;
>> +		yield_to(task);
>> +		break;
>> +	}
>
> I think a random selection algorithm will be a better fit against
> special guest behaviour.

Possibly, though I suspect we'd have to hit very heavy overcommit
ratios with very large VMs before round robin stops working.

>> -	/* Sleep for 100 us, and hope lock-holder got scheduled */
>> -	expires = ktime_add_ns(ktime_get(), 100000UL);
>> -	schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
>> +	if (first_round && last_boosted_vcpu == kvm->last_boosted_vcpu) {
>> +		/* We have not found anyone yet. */
>> +		first_round = 0;
>> +		goto again;
>
> Need to guarantee termination.

We do that by setting first_round to 0 :)

We at most walk N+1 VCPUs in a VM with N VCPUs, with this patch.

-- 
All rights reversed
Re: [RFC PATCH 3/3] kvm: use yield_to instead of sleep in kvm_vcpu_on_spin
On 12/03/2010 04:24 AM, Chris Wright wrote:
> * Rik van Riel (r...@redhat.com) wrote:
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -1880,18 +1880,53 @@ void kvm_resched(struct kvm_vcpu *vcpu)
>>  }
>>  EXPORT_SYMBOL_GPL(kvm_resched);
>>
>> -void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu)
>> +void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>>  {
>> -	ktime_t expires;
>> -	DEFINE_WAIT(wait);
>> +	struct kvm *kvm = me->kvm;
>> +	struct kvm_vcpu *vcpu;
>> +	int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
>
> s/me->//
>
>> +	int first_round = 1;
>> +	int i;
>>
>> -	prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
>> +	me->spinning = 1;
>> +
>> +	/*
>> +	 * We boost the priority of a VCPU that is runnable but not
>> +	 * currently running, because it got preempted by something
>> +	 * else and called schedule in __vcpu_run.  Hopefully that
>> +	 * VCPU is holding the lock that we need and will release it.
>> +	 * We approximate round-robin by starting at the last boosted VCPU.
>> +	 */
>> + again:
>> +	kvm_for_each_vcpu(i, vcpu, kvm) {
>> +		struct task_struct *task = vcpu->task;
>> +		if (first_round && i < last_boosted_vcpu) {
>> +			i = last_boosted_vcpu;
>> +			continue;
>> +		} else if (!first_round && i > last_boosted_vcpu)
>> +			break;
>> +		if (vcpu == me)
>> +			continue;
>> +		if (vcpu->spinning)
>> +			continue;
>> +		if (!task)
>> +			continue;
>> +		if (waitqueue_active(&vcpu->wq))
>> +			continue;
>> +		if (task->flags & PF_VCPU)
>> +			continue;
>
> I wonder if you set vcpu->task in sched_out and then NULL it in sched_in
> whether you'd get what you want; you could simplify the checks.  Basically
> that would be only the preempted runnable vcpus.

They may be sleeping due to some other reason (HLT, major page fault).

Better check is that the task is runnable but not running.  Can we get
this information from a task?

-- 
error compiling committee.c: too many arguments to function
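[Editorial note: the "runnable but not running" criterion Avi asks about can be modeled outside the kernel. The struct below only imitates the relevant scheduler state conceptually; it is not the real task_struct layout, and good_yield_target() is an invented name.]

```c
#include <assert.h>

/*
 * Userspace model of Avi's proposed check.  A preempted-but-runnable
 * task is the only useful yield target: a task that is running needs
 * no boost, and a task that is asleep (HLT, major page fault) cannot
 * be holding the spinlock we are waiting for.
 */
struct vtask {
	int on_rq;	/* queued on a runqueue, i.e. runnable */
	int running;	/* currently executing on some CPU */
};

static int good_yield_target(const struct vtask *t)
{
	return t->on_rq && !t->running;
}
```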
Re: [RFC PATCH 3/3] kvm: use yield_to instead of sleep in kvm_vcpu_on_spin
* Rik van Riel (r...@redhat.com) wrote:
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1880,18 +1880,53 @@ void kvm_resched(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvm_resched);
>
> -void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu)
> +void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>  {
> -	ktime_t expires;
> -	DEFINE_WAIT(wait);
> +	struct kvm *kvm = me->kvm;
> +	struct kvm_vcpu *vcpu;
> +	int last_boosted_vcpu = me->kvm->last_boosted_vcpu;

s/me->//

> +	int first_round = 1;
> +	int i;
>
> -	prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
> +	me->spinning = 1;
> +
> +	/*
> +	 * We boost the priority of a VCPU that is runnable but not
> +	 * currently running, because it got preempted by something
> +	 * else and called schedule in __vcpu_run.  Hopefully that
> +	 * VCPU is holding the lock that we need and will release it.
> +	 * We approximate round-robin by starting at the last boosted VCPU.
> +	 */
> + again:
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		struct task_struct *task = vcpu->task;
> +		if (first_round && i < last_boosted_vcpu) {
> +			i = last_boosted_vcpu;
> +			continue;
> +		} else if (!first_round && i > last_boosted_vcpu)
> +			break;
> +		if (vcpu == me)
> +			continue;
> +		if (vcpu->spinning)
> +			continue;
> +		if (!task)
> +			continue;
> +		if (waitqueue_active(&vcpu->wq))
> +			continue;
> +		if (task->flags & PF_VCPU)
> +			continue;

I wonder if you set vcpu->task in sched_out and then NULL it in sched_in
whether you'd get what you want; you could simplify the checks.  Basically
that would be only the preempted runnable vcpus.

> +		kvm->last_boosted_vcpu = i;
> +		yield_to(task);

Just trying to think of ways to be sure this doesn't become just
yield() (although I think PF_VCPU is enough to catch that).
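[Editorial note: Chris's bookkeeping idea can be sketched, outside the kernel, roughly as follows. In KVM this would hang off the sched_in/sched_out (preempt notifier) hooks; the types and function names below are stand-ins, not KVM code.]

```c
#include <assert.h>
#include <stddef.h>

/*
 * Model of the suggested bookkeeping: record the VCPU's task when its
 * thread is scheduled out and clear it when it runs again, so that a
 * non-NULL vcpu->task marks exactly the preempted VCPUs and the scan
 * in kvm_vcpu_on_spin() needs fewer checks.
 */
struct task { int pid; };

struct vcpu {
	struct task *task;	/* non-NULL only while scheduled out */
};

static void vcpu_sched_out(struct vcpu *v, struct task *t)
{
	v->task = t;		/* lost the CPU: now a yield candidate */
}

static void vcpu_sched_in(struct vcpu *v)
{
	v->task = NULL;		/* running again: no longer a candidate */
}
```

As Avi notes in his reply, this alone still conflates preempted VCPUs with ones sleeping for other reasons (HLT, major page fault), so it would complement rather than replace a runnability check.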