Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes
Il 08/10/2014 12:06, Radim Krčmář ha scritto: > KVM: x86: fix deadline tsc interrupt injection > > The check in kvm_set_lapic_tscdeadline_msr() was trying to prevent a > situation where we lose a pending deadline timer in a MSR write. > Losing it is fine, because it effectively occurs before the timer fired, > so we should be able to cancel or postpone it. > > Another problem comes from interaction with QEMU, or other userspace > that can set deadline MSR without a good reason, when timer is already > pending: one guest's deadline request results in more than one > interrupt because one is injected immediately on MSR write from > userspace and one through hrtimer later. > > The solution is to remove the injection when replacing a pending timer > and to improve the usual QEMU path, we inject without a hrtimer when the > deadline has already passed. > > Signed-off-by: Radim Krčmář > Reported-by: Nadav Amit > --- > arch/x86/kvm/lapic.c | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index b8345dd..51428dd 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -1096,9 +1096,12 @@ static void start_apic_timer(struct kvm_lapic *apic) > if (likely(tscdeadline > guest_tsc)) { > ns = (tscdeadline - guest_tsc) * 100ULL; > do_div(ns, this_tsc_khz); > + hrtimer_start(&apic->lapic_timer.timer, > + ktime_add_ns(now, ns), HRTIMER_MODE_ABS); > + } else { > + atomic_inc(&ktimer->pending); > + kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu); > } > - hrtimer_start(&apic->lapic_timer.timer, > - ktime_add_ns(now, ns), HRTIMER_MODE_ABS); > > local_irq_restore(flags); > } > @@ -1355,9 +1358,6 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu > *vcpu, u64 data) > return; > > hrtimer_cancel(&apic->lapic_timer.timer); > - /* Inject here so clearing tscdeadline won't override new value */ > - if (apic_has_pending_timer(vcpu)) > - kvm_inject_apic_timer_irqs(vcpu); > apic->lapic_timer.tscdeadline = data; > start_apic_timer(apic); > } Radim, the patch looks good but please extract this code: /* * There is a race window between reading and incrementing, but we do * not care about potentially losing timer events in the !reinject * case anyway. Note: KVM_REQ_PENDING_TIMER is implicitly checked * in vcpu_enter_guest. */ if (!atomic_read(&ktimer->pending)) { atomic_inc(&ktimer->pending); /* FIXME: this code should not know anything about vcpus */ kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu); } if (waitqueue_active(q)) wake_up_interruptible(q); to a new "static void apic_timer_expired(struct kvm_lapic *apic)" function, and call it from both apic_timer_fn and start_apic_timer. Also, we should not need to do wake_up_interruptible() unless we have changed ktimer->pending from zero to non-zero. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes
Il 10/10/2014 14:51, Nadav Amit ha scritto: >>> Second, I think that the solution I proposed would perform better. >>> Currently, there are many unnecessary cancellations and setups of the >>> timer. This solution does not resolve this problem. >> >> I think it does. You do not get an hrtimer_start if tscdeadline <= >> guest_tsc. To avoid useless cancels, either check hrtimer_is_enqueued >> before calling hrtimer_cancel, or go straight to the source and avoid >> taking the lock in the easy cases: >> >> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c >> index 1c2fe7de2842..6ce725007424 100644 >> --- a/kernel/time/hrtimer.c >> +++ b/kernel/time/hrtimer.c >> @@ -1043,10 +1043,17 @@ int hrtimer_try_to_cancel(struct hrtimer *timer) >> { >> struct hrtimer_clock_base *base; >> unsigned long flags; >> -int ret = -1; >> +unsigned long state = timer->state; >> +int ret; >> + >> +if (state & HRTIMER_STATE_ENQUEUED) >> +return 0; >> +if (state & HRTIMER_STATE_CALLBACK) >> +return -1; >> >> base = lock_hrtimer_base(timer, &flags); >> >> +ret = -1; >> if (!hrtimer_callback_running(timer)) >> ret = remove_hrtimer(timer, base); > Wouldn’t this change would cause cancellations never to succeed (the first > check would always be true if the timer is active)? Ehm, there is a missing ! in that first "if". >>> Last, I think that having less interrupts on deadline changes is not >>> completely according to the SDM which says: "If software disarms the >>> timer or postpones the deadline, race conditions may result in the >>> delivery of a spurious timer interrupt.” It never says interrupts may >>> be lost if you reprogram the deadline before you check it expired. >> >> But the case when you rewrite the same value to the MSR is neither >> disarming nor postponing. You would be getting two interrupts for the >> same event. That is why I agree with Radim that checking host_initiated >> is wrong. > > I understand, and Radim's solution seems functionally fine, now that I am > fully awake and understand it. > I still think that if tscdeadline > guest_tsc, then reprogramming the > deadline with the same value, as QEMU does, would result in unwarranted > overhead. The overhead is about two atomic operations (70 clock cycles?). Still, there are plenty of other micro-optimizations possible: 1) instead of incrementing timer->pending, set it to 1 2) change it to test_and_set_bit and only set PENDING_TIMER if the result was zero 3) non-atomically test PENDING_TIMER before (atomically) clearing it 4) return bool from kvm_inject_apic_timer_irqs and only clear PENDING_TIMER if a timer interrupt was actually injected. (1) or (2) would remove one atomic operation when reprogramming a passed deadline with the same value. (3) or (4) would remove one atomic operation in the case where the cause of the exit is not an expired timer. Any takers? > Perhaps it would be enough not to reprogram the timer if tscdeadline value > does not change (by either guest or host). Yes, and that would just be your patch without host_initiated. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes
On Oct 10, 2014, at 12:45 PM, Paolo Bonzini wrote: > Il 10/10/2014 03:55, Nadav Amit ha scritto: >>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c >>> index b8345dd..51428dd 100644 >>> --- a/arch/x86/kvm/lapic.c >>> +++ b/arch/x86/kvm/lapic.c >>> @@ -1096,9 +1096,12 @@ static void start_apic_timer(struct kvm_lapic *apic) >>> if (likely(tscdeadline > guest_tsc)) { >>> ns = (tscdeadline - guest_tsc) * 100ULL; >>> do_div(ns, this_tsc_khz); >>> + hrtimer_start(&apic->lapic_timer.timer, >>> + ktime_add_ns(now, ns), HRTIMER_MODE_ABS); >>> + } else { >>> + atomic_inc(&ktimer->pending); >>> + kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu); >>> } >>> - hrtimer_start(&apic->lapic_timer.timer, >>> - ktime_add_ns(now, ns), HRTIMER_MODE_ABS); >>> >>> local_irq_restore(flags); >>> } >>> @@ -1355,9 +1358,6 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu >>> *vcpu, u64 data) >>> return; >>> >>> hrtimer_cancel(&apic->lapic_timer.timer); >>> - /* Inject here so clearing tscdeadline won't override new value */ >>> - if (apic_has_pending_timer(vcpu)) >>> - kvm_inject_apic_timer_irqs(vcpu); >>> apic->lapic_timer.tscdeadline = data; >>> start_apic_timer(apic); >>> } >> >> Perhaps I am missing something, but I don’t see how it solves the problem I >> encountered. >> Recall the scenario: >> 1. A TSC deadline timer interrupt is pending. >> 2. TSC deadline was still not cleared (which happens during vcpu_run). >> 3. Userspace uses KVM_GET_MSRS/KVM_SET_MSRS to load the same deadline msr. >> >> It appears that in such scenario the guest would still get spurious >> interrupt for no reason, as ktimer->pending may already be increased in >> apic_timer_fn. > > If ktimer->pending == 2 you still get only one interrupt, not two. Am I > misunderstanding your objection? You understood me, and I was wrong... > >> Second, I think that the solution I proposed would perform better. >> Currently, there are many unnecessary cancellations and setups of the >> timer. This solution does not resolve this problem. > > I think it does. You do not get an hrtimer_start if tscdeadline <= > guest_tsc. To avoid useless cancels, either check hrtimer_is_enqueued > before calling hrtimer_cancel, or go straight to the source and avoid > taking the lock in the easy cases: > > diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c > index 1c2fe7de2842..6ce725007424 100644 > --- a/kernel/time/hrtimer.c > +++ b/kernel/time/hrtimer.c > @@ -1043,10 +1043,17 @@ int hrtimer_try_to_cancel(struct hrtimer *timer) > { > struct hrtimer_clock_base *base; > unsigned long flags; > - int ret = -1; > + unsigned long state = timer->state; > + int ret; > + > + if (state & HRTIMER_STATE_ENQUEUED) > + return 0; > + if (state & HRTIMER_STATE_CALLBACK) > + return -1; > > base = lock_hrtimer_base(timer, &flags); > > + ret = -1; > if (!hrtimer_callback_running(timer)) > ret = remove_hrtimer(timer, base); Wouldn’t this change would cause cancellations never to succeed (the first check would always be true if the timer is active)? > > >> Last, I think that having less interrupts on deadline changes is not >> completely according to the SDM which says: "If software disarms the >> timer or postpones the deadline, race conditions may result in the >> delivery of a spurious timer interrupt.” It never says interrupts may >> be lost if you reprogram the deadline before you check it expired. > > But the case when you rewrite the same value to the MSR is neither > disarming nor postponing. You would be getting two interrupts for the > same event. That is why I agree with Radim that checking host_initiated > is wrong. I understand, and Radim's solution seems functionally fine, now that I am fully awake and understand it. I still think that if tscdeadline > guest_tsc, then reprogramming the deadline with the same value, as QEMU does, would result in unwarranted overhead. Perhaps it would be enough not to reprogram the timer if tscdeadline value does not change (by either guest or host). Thanks, Nadav -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes
2014-10-10 11:45+0200, Paolo Bonzini: > Il 10/10/2014 03:55, Nadav Amit ha scritto: > >> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > >> index b8345dd..51428dd 100644 > >> --- a/arch/x86/kvm/lapic.c > >> +++ b/arch/x86/kvm/lapic.c > >> @@ -1096,9 +1096,12 @@ static void start_apic_timer(struct kvm_lapic *apic) > >>if (likely(tscdeadline > guest_tsc)) { > >>ns = (tscdeadline - guest_tsc) * 100ULL; > >>do_div(ns, this_tsc_khz); > >> + hrtimer_start(&apic->lapic_timer.timer, > >> + ktime_add_ns(now, ns), HRTIMER_MODE_ABS); > >> + } else { > >> + atomic_inc(&ktimer->pending); > >> + kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu); > >>} > >> - hrtimer_start(&apic->lapic_timer.timer, > >> - ktime_add_ns(now, ns), HRTIMER_MODE_ABS); > >> > >>local_irq_restore(flags); > >>} > >> @@ -1355,9 +1358,6 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu > >> *vcpu, u64 data) > >>return; > >> > >>hrtimer_cancel(&apic->lapic_timer.timer); > >> - /* Inject here so clearing tscdeadline won't override new value */ > >> - if (apic_has_pending_timer(vcpu)) > >> - kvm_inject_apic_timer_irqs(vcpu); > >>apic->lapic_timer.tscdeadline = data; > >>start_apic_timer(apic); > >> } > > > > Perhaps I am missing something, but I don’t see how it solves the problem I > > encountered. > > Recall the scenario: > > 1. A TSC deadline timer interrupt is pending. > > 2. TSC deadline was still not cleared (which happens during vcpu_run). > > 3. Userspace uses KVM_GET_MSRS/KVM_SET_MSRS to load the same deadline msr. > > > > It appears that in such scenario the guest would still get spurious > > interrupt for no reason, as ktimer->pending may already be increased in > > apic_timer_fn. > > If ktimer->pending == 2 you still get only one interrupt, not two. Am I > misunderstanding your objection? (I don't see it either, the patch removed kvm_inject_apic_timer_irqs(), so we inject as if the MSR write didn't happen.) > > Second, I think that the solution I proposed would perform better. > > Currently, there are many unnecessary cancellations and setups of the > > timer. This solution does not resolve this problem. > > I think it does. You do not get an hrtimer_start if tscdeadline <= > guest_tsc. To avoid useless cancels, either check hrtimer_is_enqueued > before calling hrtimer_cancel, or go straight to the source and avoid > taking the lock in the easy cases: I think the useless cancels were when the timer is set to the same value in the future. Which makes sense to optimize, but I wasn't sure how it would affect the guest if the TSC offset was be changing or TSC itself wasn't stable ... so I chose not to do it. > diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c > index 1c2fe7de2842..6ce725007424 100644 > --- a/kernel/time/hrtimer.c > +++ b/kernel/time/hrtimer.c > @@ -1043,10 +1043,17 @@ int hrtimer_try_to_cancel(struct hrtimer *timer) > { > struct hrtimer_clock_base *base; > unsigned long flags; > - int ret = -1; > + unsigned long state = timer->state; > + int ret; > + > + if (state & HRTIMER_STATE_ENQUEUED) > + return 0; It doesn't try very hard to cancel it ;) > + if (state & HRTIMER_STATE_CALLBACK) > + return -1; > > base = lock_hrtimer_base(timer, &flags); > > + ret = -1; > if (!hrtimer_callback_running(timer)) > ret = remove_hrtimer(timer, base); > > > Last, I think that having less interrupts on deadline changes is not > > completely according to the SDM which says: "If software disarms the > > timer or postpones the deadline, race conditions may result in the > > delivery of a spurious timer interrupt.” It never says interrupts may > > be lost if you reprogram the deadline before you check it expired. Yeah, too bad it wasn't written in a formally defined language ... They could be just reserving design space for concurrent implementation, not making race conditions mandatory. Our race windows are much bigger -- a disarm that is hundreds of cycles in the future can easily be hit after a msr write vm-exit, but it would be very unlikely to get an interrupt on bare metal. And thinking about the meaning of postpone or disarm -- software doesn't want the old interrupt anymore, so it would just add useless work. (And I liked the code better without it, but it'd be ok if we fixed all the cases.) > But the case when you rewrite the same value to the MSR is neither > disarming nor postponing. You would be getting two interrupts for the > same event. That is why I agree with Radim that checking host_initiated > is wrong. Current implementation also injects two interrupts when the timer is set to a lower nonzero value, which should give us just one under both interpretations. -- To unsubscr
Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes
Il 10/10/2014 03:55, Nadav Amit ha scritto: >> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c >> index b8345dd..51428dd 100644 >> --- a/arch/x86/kvm/lapic.c >> +++ b/arch/x86/kvm/lapic.c >> @@ -1096,9 +1096,12 @@ static void start_apic_timer(struct kvm_lapic *apic) >> if (likely(tscdeadline > guest_tsc)) { >> ns = (tscdeadline - guest_tsc) * 100ULL; >> do_div(ns, this_tsc_khz); >> +hrtimer_start(&apic->lapic_timer.timer, >> +ktime_add_ns(now, ns), HRTIMER_MODE_ABS); >> +} else { >> +atomic_inc(&ktimer->pending); >> +kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu); >> } >> -hrtimer_start(&apic->lapic_timer.timer, >> -ktime_add_ns(now, ns), HRTIMER_MODE_ABS); >> >> local_irq_restore(flags); >> } >> @@ -1355,9 +1358,6 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu >> *vcpu, u64 data) >> return; >> >> hrtimer_cancel(&apic->lapic_timer.timer); >> -/* Inject here so clearing tscdeadline won't override new value */ >> -if (apic_has_pending_timer(vcpu)) >> -kvm_inject_apic_timer_irqs(vcpu); >> apic->lapic_timer.tscdeadline = data; >> start_apic_timer(apic); >> } > > Perhaps I am missing something, but I don’t see how it solves the problem I > encountered. > Recall the scenario: > 1. A TSC deadline timer interrupt is pending. > 2. TSC deadline was still not cleared (which happens during vcpu_run). > 3. Userspace uses KVM_GET_MSRS/KVM_SET_MSRS to load the same deadline msr. > > It appears that in such scenario the guest would still get spurious > interrupt for no reason, as ktimer->pending may already be increased in > apic_timer_fn. If ktimer->pending == 2 you still get only one interrupt, not two. Am I misunderstanding your objection? > Second, I think that the solution I proposed would perform better. > Currently, there are many unnecessary cancellations and setups of the > timer. This solution does not resolve this problem. I think it does. You do not get an hrtimer_start if tscdeadline <= guest_tsc. To avoid useless cancels, either check hrtimer_is_enqueued before calling hrtimer_cancel, or go straight to the source and avoid taking the lock in the easy cases: diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 1c2fe7de2842..6ce725007424 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -1043,10 +1043,17 @@ int hrtimer_try_to_cancel(struct hrtimer *timer) { struct hrtimer_clock_base *base; unsigned long flags; - int ret = -1; + unsigned long state = timer->state; + int ret; + + if (state & HRTIMER_STATE_ENQUEUED) + return 0; + if (state & HRTIMER_STATE_CALLBACK) + return -1; base = lock_hrtimer_base(timer, &flags); + ret = -1; if (!hrtimer_callback_running(timer)) ret = remove_hrtimer(timer, base); > Last, I think that having less interrupts on deadline changes is not > completely according to the SDM which says: "If software disarms the > timer or postpones the deadline, race conditions may result in the > delivery of a spurious timer interrupt.” It never says interrupts may > be lost if you reprogram the deadline before you check it expired. But the case when you rewrite the same value to the MSR is neither disarming nor postponing. You would be getting two interrupts for the same event. That is why I agree with Radim that checking host_initiated is wrong. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes
On Oct 8, 2014, at 1:06 PM, Radim Krčmář wrote: > > And it would get one from the currently pending timer. > > What about the following patch? > (The introduced else branch could use some abstractions.) > > --8<--- > KVM: x86: fix deadline tsc interrupt injection > > The check in kvm_set_lapic_tscdeadline_msr() was trying to prevent a > situation where we lose a pending deadline timer in a MSR write. > Losing it is fine, because it effectively occurs before the timer fired, > so we should be able to cancel or postpone it. > > Another problem comes from interaction with QEMU, or other userspace > that can set deadline MSR without a good reason, when timer is already > pending: one guest's deadline request results in more than one > interrupt because one is injected immediately on MSR write from > userspace and one through hrtimer later. > > The solution is to remove the injection when replacing a pending timer > and to improve the usual QEMU path, we inject without a hrtimer when the > deadline has already passed. > > Signed-off-by: Radim Krčmář > Reported-by: Nadav Amit > --- > arch/x86/kvm/lapic.c | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index b8345dd..51428dd 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -1096,9 +1096,12 @@ static void start_apic_timer(struct kvm_lapic *apic) > if (likely(tscdeadline > guest_tsc)) { > ns = (tscdeadline - guest_tsc) * 100ULL; > do_div(ns, this_tsc_khz); > + hrtimer_start(&apic->lapic_timer.timer, > + ktime_add_ns(now, ns), HRTIMER_MODE_ABS); > + } else { > + atomic_inc(&ktimer->pending); > + kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu); > } > - hrtimer_start(&apic->lapic_timer.timer, > - ktime_add_ns(now, ns), HRTIMER_MODE_ABS); > > local_irq_restore(flags); > } > @@ -1355,9 +1358,6 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu > *vcpu, u64 data) > return; > > hrtimer_cancel(&apic->lapic_timer.timer); > - /* Inject here so clearing tscdeadline won't override new value */ > - if (apic_has_pending_timer(vcpu)) > - kvm_inject_apic_timer_irqs(vcpu); > apic->lapic_timer.tscdeadline = data; > start_apic_timer(apic); > } Perhaps I am missing something, but I don’t see how it solves the problem I encountered. Recall the scenario: 1. A TSC deadline timer interrupt is pending. 2. TSC deadline was still not cleared (which happens during vcpu_run). 3. Userspace uses KVM_GET_MSRS/KVM_SET_MSRS to load the same deadline msr. It appears that in such scenario the guest would still get spurious interrupt for no reason, as ktimer->pending may already be increased in apic_timer_fn. Second, I think that the solution I proposed would perform better. Currently, there are many unnecessary cancellations and setups of the timer. This solution does not resolve this problem. Last, I think that having less interrupts on deadline changes is not completely according to the SDM which says: "If software disarms the timer or postpones the deadline, race conditions may result in the delivery of a spurious timer interrupt.” It never says interrupts may be lost if you reprogram the deadline before you check it expired. Thanks, Nadav -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes
Il 08/10/2014 12:06, Radim Krčmář ha scritto: > > > - why is host_initiated required? > > > Since if the guest writes to the MSR, it means it wants to rearm the TSC > > deadline. Even if the deadline passed, interrupt should be triggered. > > MSR isn't 0, so the deadline hasn't passed for the guest yet. > > > If the guest writes the same value on the deadline MSR twice, it might > > expect two interrupts. > > When guest writes to it without getting an interrupt first, it might > expect just one. (Which it better IMO.) Indeed that was my doubt as well. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes
2014-10-07 12:35+0300, Nadav Amit: > Thanks for reviewing this patch and the rest of the gang. Happy to do so, I've learned a lot. > On Oct 6, 2014, at 11:57 PM, Radim Krčmář wrote: > > 2014-10-03 01:10+0300, Nadav Amit: > >> Setting the TSC deadline MSR that are initiated by the host (using > >> ioctl's) may > >> cause superfluous interrupt. This occurs in the following case: > >> > >> 1. A TSC deadline timer interrupt is pending. > >> 2. TSC deadline was still not cleared (which happens during vcpu_run). > >> 3. Userspace uses KVM_GET_MSRS/KVM_SET_MSRS to load the same deadline msr. > >> > >> To solve this situation, ignore host initiated TSC deadline writes that do > >> not > >> change the deadline value. > > > > I find this change slightly dubious … > Why? I see similar handling of MSR_TSC_ADJUST. In other modes, we don't inject pending timer when writing to APIC_TMICT. (Which, sadly, is inconsistent with APIC_LVTT.) Adding a workaround is usually worse than removing the reason ... > > - why does the userspace do that? > It seems qemu’s kvm_cpu_exec does so when it calls kvm_arch_put_registers. > It is pretty much done after every exit to userspace. Thanks, it really doesn't do much checking of what is needed. > > - why is host_initiated required? > Since if the guest writes to the MSR, it means it wants to rearm the TSC > deadline. Even if the deadline passed, interrupt should be triggered. MSR isn't 0, so the deadline hasn't passed for the guest yet. > If the guest writes the same value on the deadline MSR twice, it might expect > two interrupts. When guest writes to it without getting an interrupt first, it might expect just one. (Which it better IMO.) > > > > Thanks. > > > > It seems like an performance improvement, so why shouldn't return when > > 'data <= tscdeadline && pending()' > > or even > > 'data <= now() && pending()' > > > > (Sorry, I ran out of time to search for answers today.) > The bug I encountered is not a performance issue, but a superfluous interrupt > (functional bug). True. > As I said, the guest may write a new deadline MSR value which is in the past > and expect an interrupt. And it would get one from the currently pending timer. What about the following patch? (The introduced else branch could use some abstractions.) --8<--- KVM: x86: fix deadline tsc interrupt injection The check in kvm_set_lapic_tscdeadline_msr() was trying to prevent a situation where we lose a pending deadline timer in a MSR write. Losing it is fine, because it effectively occurs before the timer fired, so we should be able to cancel or postpone it. Another problem comes from interaction with QEMU, or other userspace that can set deadline MSR without a good reason, when timer is already pending: one guest's deadline request results in more than one interrupt because one is injected immediately on MSR write from userspace and one through hrtimer later. The solution is to remove the injection when replacing a pending timer and to improve the usual QEMU path, we inject without a hrtimer when the deadline has already passed. Signed-off-by: Radim Krčmář Reported-by: Nadav Amit --- arch/x86/kvm/lapic.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index b8345dd..51428dd 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1096,9 +1096,12 @@ static void start_apic_timer(struct kvm_lapic *apic) if (likely(tscdeadline > guest_tsc)) { ns = (tscdeadline - guest_tsc) * 100ULL; do_div(ns, this_tsc_khz); + hrtimer_start(&apic->lapic_timer.timer, + ktime_add_ns(now, ns), HRTIMER_MODE_ABS); + } else { + atomic_inc(&ktimer->pending); + kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu); } - hrtimer_start(&apic->lapic_timer.timer, - ktime_add_ns(now, ns), HRTIMER_MODE_ABS); local_irq_restore(flags); } @@ -1355,9 +1358,6 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data) return; hrtimer_cancel(&apic->lapic_timer.timer); - /* Inject here so clearing tscdeadline won't override new value */ - if (apic_has_pending_timer(vcpu)) - kvm_inject_apic_timer_irqs(vcpu); apic->lapic_timer.tscdeadline = data; start_apic_timer(apic); } -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes
Il 03/10/2014 00:10, Nadav Amit ha scritto: > To solve this situation, ignore host initiated TSC deadline writes that do not > change the deadline value. > > Signed-off-by: Nadav Amit > --- > arch/x86/kvm/lapic.c | 7 ++- > arch/x86/kvm/lapic.h | 3 ++- > arch/x86/kvm/x86.c | 2 +- > 3 files changed, 9 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index b8345dd..0bcf2e1 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -1346,14 +1346,19 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu > *vcpu) > return apic->lapic_timer.tscdeadline; > } > > -void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data) > +void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, > +struct msr_data *msr_info) > { > struct kvm_lapic *apic = vcpu->arch.apic; > + u64 data = msr_info->data; > > if (!kvm_vcpu_has_lapic(vcpu) || apic_lvtt_oneshot(apic) || > apic_lvtt_period(apic)) > return; > > + if (msr_info->host_initiated && apic->lapic_timer.tscdeadline == data) > + return; > + Why do we have to check host_initiated? Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes
Thanks for reviewing this patch and the rest of the gang. On Oct 6, 2014, at 11:57 PM, Radim Krčmář wrote: > 2014-10-03 01:10+0300, Nadav Amit: >> Setting the TSC deadline MSR that are initiated by the host (using ioctl's) >> may >> cause superfluous interrupt. This occurs in the following case: >> >> 1. A TSC deadline timer interrupt is pending. >> 2. TSC deadline was still not cleared (which happens during vcpu_run). >> 3. Userspace uses KVM_GET_MSRS/KVM_SET_MSRS to load the same deadline msr. >> >> To solve this situation, ignore host initiated TSC deadline writes that do >> not >> change the deadline value. > > I find this change slightly dubious … Why? I see similar handling of MSR_TSC_ADJUST. > - why does the userspace do that? It seems qemu’s kvm_cpu_exec does so when it calls kvm_arch_put_registers. It is pretty much done after every exit to userspace. > - why is host_initiated required? Since if the guest writes to the MSR, it means it wants to rearm the TSC deadline. Even if the deadline passed, interrupt should be triggered. If the guest writes the same value on the deadline MSR twice, it might expect two interrupts. > > Thanks. > > It seems like an performance improvement, so why shouldn't return when > 'data <= tscdeadline && pending()' > or even > 'data <= now() && pending()' > > (Sorry, I ran out of time to search for answers today.) The bug I encountered is not a performance issue, but a superfluous interrupt (functional bug). As I said, the guest may write a new deadline MSR value which is in the past and expect an interrupt. Nadav-- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes
2014-10-03 01:10+0300, Nadav Amit: > Setting the TSC deadline MSR that are initiated by the host (using ioctl's) > may > cause superfluous interrupt. This occurs in the following case: > > 1. A TSC deadline timer interrupt is pending. > 2. TSC deadline was still not cleared (which happens during vcpu_run). > 3. Userspace uses KVM_GET_MSRS/KVM_SET_MSRS to load the same deadline msr. > > To solve this situation, ignore host initiated TSC deadline writes that do not > change the deadline value. I find this change slightly dubious ... - why does the userspace do that? - why is host_initiated required? Thanks. It seems like an performance improvement, so why shouldn't return when 'data <= tscdeadline && pending()' or even 'data <= now() && pending()' (Sorry, I ran out of time to search for answers today.) > Signed-off-by: Nadav Amit > --- > arch/x86/kvm/lapic.c | 7 ++- > arch/x86/kvm/lapic.h | 3 ++- > arch/x86/kvm/x86.c | 2 +- > 3 files changed, 9 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index b8345dd..0bcf2e1 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -1346,14 +1346,19 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu > *vcpu) > return apic->lapic_timer.tscdeadline; > } > > -void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data) > +void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, > +struct msr_data *msr_info) > { > struct kvm_lapic *apic = vcpu->arch.apic; > + u64 data = msr_info->data; > > if (!kvm_vcpu_has_lapic(vcpu) || apic_lvtt_oneshot(apic) || > apic_lvtt_period(apic)) > return; > > + if (msr_info->host_initiated && apic->lapic_timer.tscdeadline == data) > + return; > + > hrtimer_cancel(&apic->lapic_timer.timer); > /* Inject here so clearing tscdeadline won't override new value */ > if (apic_has_pending_timer(vcpu)) > diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h > index 6a11845..5bfa3db 100644 > --- a/arch/x86/kvm/lapic.h > +++ b/arch/x86/kvm/lapic.h > @@ -71,7 +71,8 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu, > int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu); > > u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu); > -void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data); > +void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, > + struct msr_data *msr_info); > > void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset); > void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector); > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index e903167..8c2745c 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -2093,7 +2093,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct > msr_data *msr_info) > case APIC_BASE_MSR ... APIC_BASE_MSR + 0x3ff: > return kvm_x2apic_msr_write(vcpu, msr, data); > case MSR_IA32_TSCDEADLINE: > - kvm_set_lapic_tscdeadline_msr(vcpu, data); > + kvm_set_lapic_tscdeadline_msr(vcpu, msr_info); > break; > case MSR_IA32_TSC_ADJUST: > if (guest_cpuid_has_tsc_adjust(vcpu)) { > -- > 1.9.1 > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes
Setting the TSC deadline MSR that are initiated by the host (using ioctl's) may cause superfluous interrupt. This occurs in the following case: 1. A TSC deadline timer interrupt is pending. 2. TSC deadline was still not cleared (which happens during vcpu_run). 3. Userspace uses KVM_GET_MSRS/KVM_SET_MSRS to load the same deadline msr. To solve this situation, ignore host initiated TSC deadline writes that do not change the deadline value. Signed-off-by: Nadav Amit --- arch/x86/kvm/lapic.c | 7 ++- arch/x86/kvm/lapic.h | 3 ++- arch/x86/kvm/x86.c | 2 +- 3 files changed, 9 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index b8345dd..0bcf2e1 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1346,14 +1346,19 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu) return apic->lapic_timer.tscdeadline; } -void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data) +void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, + struct msr_data *msr_info) { struct kvm_lapic *apic = vcpu->arch.apic; + u64 data = msr_info->data; if (!kvm_vcpu_has_lapic(vcpu) || apic_lvtt_oneshot(apic) || apic_lvtt_period(apic)) return; + if (msr_info->host_initiated && apic->lapic_timer.tscdeadline == data) + return; + hrtimer_cancel(&apic->lapic_timer.timer); /* Inject here so clearing tscdeadline won't override new value */ if (apic_has_pending_timer(vcpu)) diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index 6a11845..5bfa3db 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -71,7 +71,8 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu, int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu); u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu); -void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data); +void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, + struct msr_data *msr_info); void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset); void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e903167..8c2745c 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2093,7 +2093,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case APIC_BASE_MSR ... APIC_BASE_MSR + 0x3ff: return kvm_x2apic_msr_write(vcpu, msr, data); case MSR_IA32_TSCDEADLINE: - kvm_set_lapic_tscdeadline_msr(vcpu, data); + kvm_set_lapic_tscdeadline_msr(vcpu, msr_info); break; case MSR_IA32_TSC_ADJUST: if (guest_cpuid_has_tsc_adjust(vcpu)) { -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html