Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes

2014-10-10 Thread Paolo Bonzini
Il 08/10/2014 12:06, Radim Krčmář ha scritto:
> KVM: x86: fix deadline tsc interrupt injection
> 
> The check in kvm_set_lapic_tscdeadline_msr() was trying to prevent a
> situation where we lose a pending deadline timer in a MSR write.
> Losing it is fine, because it effectively occurs before the timer fired,
> so we should be able to cancel or postpone it.
> 
> Another problem comes from interaction with QEMU, or other userspace
> that can set deadline MSR without a good reason, when timer is already
> pending:  one guest's deadline request results in more than one
> interrupt because one is injected immediately on MSR write from
> userspace and one through hrtimer later.
> 
> The solution is to remove the injection when replacing a pending timer
> and to improve the usual QEMU path, we inject without a hrtimer when the
> deadline has already passed.
> 
> Signed-off-by: Radim Krčmář 
> Reported-by: Nadav Amit 
> ---
>  arch/x86/kvm/lapic.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index b8345dd..51428dd 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1096,9 +1096,12 @@ static void start_apic_timer(struct kvm_lapic *apic)
>   if (likely(tscdeadline > guest_tsc)) {
>   ns = (tscdeadline - guest_tsc) * 100ULL;
>   do_div(ns, this_tsc_khz);
> + hrtimer_start(&apic->lapic_timer.timer,
> + ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
> + } else {
> + atomic_inc(&ktimer->pending);
> + kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
>   }
> - hrtimer_start(&apic->lapic_timer.timer,
> - ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
>  
>   local_irq_restore(flags);
>   }
> @@ -1355,9 +1358,6 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu 
> *vcpu, u64 data)
>   return;
>  
>   hrtimer_cancel(&apic->lapic_timer.timer);
> - /* Inject here so clearing tscdeadline won't override new value */
> - if (apic_has_pending_timer(vcpu))
> - kvm_inject_apic_timer_irqs(vcpu);
>   apic->lapic_timer.tscdeadline = data;
>   start_apic_timer(apic);
>  }

Radim, the patch looks good but please extract this code:

/*
 * There is a race window between reading and incrementing, but we do
 * not care about potentially losing timer events in the !reinject
 * case anyway. Note: KVM_REQ_PENDING_TIMER is implicitly checked
 * in vcpu_enter_guest.
 */
if (!atomic_read(&ktimer->pending)) {
atomic_inc(&ktimer->pending);
/* FIXME: this code should not know anything about vcpus */
kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
}

if (waitqueue_active(q))
wake_up_interruptible(q);

to a new "static void apic_timer_expired(struct kvm_lapic *apic)" function,
and call it from both apic_timer_fn and start_apic_timer.

Also, we should not need to do wake_up_interruptible() unless we have
changed ktimer->pending from zero to non-zero.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes

2014-10-10 Thread Paolo Bonzini
Il 10/10/2014 14:51, Nadav Amit ha scritto:
>>> Second, I think that the solution I proposed would perform better.
>>> Currently, there are many unnecessary cancellations and setups of the
>>> timer. This solution does not resolve this problem.
>>
>> I think it does.  You do not get an hrtimer_start if tscdeadline <=
>> guest_tsc.  To avoid useless cancels, either check hrtimer_is_enqueued
>> before calling hrtimer_cancel, or go straight to the source and avoid
>> taking the lock in the easy cases:
>>
>> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
>> index 1c2fe7de2842..6ce725007424 100644
>> --- a/kernel/time/hrtimer.c
>> +++ b/kernel/time/hrtimer.c
>> @@ -1043,10 +1043,17 @@ int hrtimer_try_to_cancel(struct hrtimer *timer)
>> {
>>  struct hrtimer_clock_base *base;
>>  unsigned long flags;
>> -int ret = -1;
>> +unsigned long state = timer->state;
>> +int ret;
>> +
>> +if (state & HRTIMER_STATE_ENQUEUED)
>> +return 0;
>> +if (state & HRTIMER_STATE_CALLBACK)
>> +return -1;
>>
>>  base = lock_hrtimer_base(timer, &flags);
>>
>> +ret = -1;
>>  if (!hrtimer_callback_running(timer))
>>  ret = remove_hrtimer(timer, base);
> Wouldn’t this change would cause cancellations never to succeed (the first 
> check would always be true if the timer is active)?

Ehm, there is a missing ! in that first "if".

>>> Last, I think that having less interrupts on deadline changes is not
>>> completely according to the SDM which says: "If software disarms the
>>> timer or postpones the deadline, race conditions may result in the
>>> delivery of a spurious timer interrupt.” It never says interrupts may
>>> be lost if you reprogram the deadline before you check it expired.
>>
>> But the case when you rewrite the same value to the MSR is neither
>> disarming nor postponing.  You would be getting two interrupts for the
>> same event.  That is why I agree with Radim that checking host_initiated
>> is wrong.
> 
> I understand, and Radim's solution seems functionally fine, now that I am 
> fully awake and understand it.
> I still think that if tscdeadline > guest_tsc, then reprogramming the 
> deadline with the same value, as QEMU does, would result in unwarranted 
> overhead.

The overhead is about two atomic operations (70 clock cycles?).  Still,
there are plenty of other micro-optimizations possible:

1) instead of incrementing timer->pending, set it to 1

2) change it to test_and_set_bit and only set PENDING_TIMER if the
result was zero

3) non-atomically test PENDING_TIMER before (atomically) clearing it

4) return bool from kvm_inject_apic_timer_irqs and only clear
PENDING_TIMER if a timer interrupt was actually injected.

(1) or (2) would remove one atomic operation when reprogramming a passed
deadline with the same value.  (3) or (4) would remove one atomic
operation in the case where the cause of the exit is not an expired
timer.  Any takers?

> Perhaps it would be enough not to reprogram the timer if tscdeadline value 
> does not change (by either guest or host).

Yes, and that would just be your patch without host_initiated.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes

2014-10-10 Thread Nadav Amit

On Oct 10, 2014, at 12:45 PM, Paolo Bonzini  wrote:

> Il 10/10/2014 03:55, Nadav Amit ha scritto:
>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>> index b8345dd..51428dd 100644
>>> --- a/arch/x86/kvm/lapic.c
>>> +++ b/arch/x86/kvm/lapic.c
>>> @@ -1096,9 +1096,12 @@ static void start_apic_timer(struct kvm_lapic *apic)
>>> if (likely(tscdeadline > guest_tsc)) {
>>> ns = (tscdeadline - guest_tsc) * 100ULL;
>>> do_div(ns, this_tsc_khz);
>>> +   hrtimer_start(&apic->lapic_timer.timer,
>>> +   ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
>>> +   } else {
>>> +   atomic_inc(&ktimer->pending);
>>> +   kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
>>> }
>>> -   hrtimer_start(&apic->lapic_timer.timer,
>>> -   ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
>>> 
>>> local_irq_restore(flags);
>>> }
>>> @@ -1355,9 +1358,6 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu 
>>> *vcpu, u64 data)
>>> return;
>>> 
>>> hrtimer_cancel(&apic->lapic_timer.timer);
>>> -   /* Inject here so clearing tscdeadline won't override new value */
>>> -   if (apic_has_pending_timer(vcpu))
>>> -   kvm_inject_apic_timer_irqs(vcpu);
>>> apic->lapic_timer.tscdeadline = data;
>>> start_apic_timer(apic);
>>> }
>> 
>> Perhaps I am missing something, but I don’t see how it solves the problem I 
>> encountered.
>> Recall the scenario:
>> 1. A TSC deadline timer interrupt is pending.
>> 2. TSC deadline was still not cleared (which happens during vcpu_run).
>> 3. Userspace uses KVM_GET_MSRS/KVM_SET_MSRS to load the same deadline msr.
>> 
>> It appears that in such scenario the guest would still get spurious
>> interrupt for no reason, as ktimer->pending may already be increased in
>> apic_timer_fn.
> 
> If ktimer->pending == 2 you still get only one interrupt, not two.  Am I
> misunderstanding your objection?
You understood me, and I was wrong...

> 
>> Second, I think that the solution I proposed would perform better.
>> Currently, there are many unnecessary cancellations and setups of the
>> timer. This solution does not resolve this problem.
> 
> I think it does.  You do not get an hrtimer_start if tscdeadline <=
> guest_tsc.  To avoid useless cancels, either check hrtimer_is_enqueued
> before calling hrtimer_cancel, or go straight to the source and avoid
> taking the lock in the easy cases:
> 
> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> index 1c2fe7de2842..6ce725007424 100644
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -1043,10 +1043,17 @@ int hrtimer_try_to_cancel(struct hrtimer *timer)
> {
>   struct hrtimer_clock_base *base;
>   unsigned long flags;
> - int ret = -1;
> + unsigned long state = timer->state;
> + int ret;
> +
> + if (state & HRTIMER_STATE_ENQUEUED)
> + return 0;
> + if (state & HRTIMER_STATE_CALLBACK)
> + return -1;
> 
>   base = lock_hrtimer_base(timer, &flags);
> 
> + ret = -1;
>   if (!hrtimer_callback_running(timer))
>   ret = remove_hrtimer(timer, base);
Wouldn’t this change would cause cancellations never to succeed (the first 
check would always be true if the timer is active)?

> 
> 
>> Last, I think that having less interrupts on deadline changes is not
>> completely according to the SDM which says: "If software disarms the
>> timer or postpones the deadline, race conditions may result in the
>> delivery of a spurious timer interrupt.” It never says interrupts may
>> be lost if you reprogram the deadline before you check it expired.
> 
> But the case when you rewrite the same value to the MSR is neither
> disarming nor postponing.  You would be getting two interrupts for the
> same event.  That is why I agree with Radim that checking host_initiated
> is wrong.
I understand, and Radim's solution seems functionally fine, now that I am fully 
awake and understand it.
I still think that if tscdeadline > guest_tsc, then reprogramming the deadline 
with the same value, as QEMU does, would result in unwarranted overhead.
Perhaps it would be enough not to reprogram the timer if tscdeadline value does 
not change (by either guest or host).

Thanks,
Nadav

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes

2014-10-10 Thread Radim Krčmář
2014-10-10 11:45+0200, Paolo Bonzini:
> Il 10/10/2014 03:55, Nadav Amit ha scritto:
> >> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> >> index b8345dd..51428dd 100644
> >> --- a/arch/x86/kvm/lapic.c
> >> +++ b/arch/x86/kvm/lapic.c
> >> @@ -1096,9 +1096,12 @@ static void start_apic_timer(struct kvm_lapic *apic)
> >>if (likely(tscdeadline > guest_tsc)) {
> >>ns = (tscdeadline - guest_tsc) * 100ULL;
> >>do_div(ns, this_tsc_khz);
> >> +  hrtimer_start(&apic->lapic_timer.timer,
> >> +  ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
> >> +  } else {
> >> +  atomic_inc(&ktimer->pending);
> >> +  kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
> >>}
> >> -  hrtimer_start(&apic->lapic_timer.timer,
> >> -  ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
> >>
> >>local_irq_restore(flags);
> >>}
> >> @@ -1355,9 +1358,6 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu 
> >> *vcpu, u64 data)
> >>return;
> >>
> >>hrtimer_cancel(&apic->lapic_timer.timer);
> >> -  /* Inject here so clearing tscdeadline won't override new value */
> >> -  if (apic_has_pending_timer(vcpu))
> >> -  kvm_inject_apic_timer_irqs(vcpu);
> >>apic->lapic_timer.tscdeadline = data;
> >>start_apic_timer(apic);
> >> }
> > 
> > Perhaps I am missing something, but I don’t see how it solves the problem I 
> > encountered.
> > Recall the scenario:
> > 1. A TSC deadline timer interrupt is pending.
> > 2. TSC deadline was still not cleared (which happens during vcpu_run).
> > 3. Userspace uses KVM_GET_MSRS/KVM_SET_MSRS to load the same deadline msr.
> > 
> > It appears that in such scenario the guest would still get spurious
> > interrupt for no reason, as ktimer->pending may already be increased in
> > apic_timer_fn.
> 
> If ktimer->pending == 2 you still get only one interrupt, not two.  Am I
> misunderstanding your objection?

(I don't see it either, the patch removed kvm_inject_apic_timer_irqs(),
 so we inject as if the MSR write didn't happen.)

> > Second, I think that the solution I proposed would perform better.
> > Currently, there are many unnecessary cancellations and setups of the
> > timer. This solution does not resolve this problem.
> 
> I think it does.  You do not get an hrtimer_start if tscdeadline <=
> guest_tsc.  To avoid useless cancels, either check hrtimer_is_enqueued
> before calling hrtimer_cancel, or go straight to the source and avoid
> taking the lock in the easy cases:

I think the useless cancels were when the timer is set to the same value
in the future.

Which makes sense to optimize, but I wasn't sure how it would affect the
guest if the TSC offset was be changing or TSC itself wasn't stable ...
so I chose not to do it.

> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> index 1c2fe7de2842..6ce725007424 100644
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -1043,10 +1043,17 @@ int hrtimer_try_to_cancel(struct hrtimer *timer)
>  {
>   struct hrtimer_clock_base *base;
>   unsigned long flags;
> - int ret = -1;
> + unsigned long state = timer->state;
> + int ret;
> +
> + if (state & HRTIMER_STATE_ENQUEUED)
> + return 0;

It doesn't try very hard to cancel it ;)

> + if (state & HRTIMER_STATE_CALLBACK)
> + return -1;
> 
>   base = lock_hrtimer_base(timer, &flags);
> 
> + ret = -1;
>   if (!hrtimer_callback_running(timer))
>   ret = remove_hrtimer(timer, base);
> 
> > Last, I think that having less interrupts on deadline changes is not
> > completely according to the SDM which says: "If software disarms the
> > timer or postpones the deadline, race conditions may result in the
> > delivery of a spurious timer interrupt.” It never says interrupts may
> > be lost if you reprogram the deadline before you check it expired.

Yeah, too bad it wasn't written in a formally defined language ...
They could be just reserving design space for concurrent implementation,
not making race conditions mandatory.

Our race windows are much bigger -- a disarm that is hundreds of cycles
in the future can easily be hit after a msr write vm-exit, but it would
be very unlikely to get an interrupt on bare metal.

And thinking about the meaning of postpone or disarm -- software doesn't
want the old interrupt anymore, so it would just add useless work.

(And I liked the code better without it, but it'd be ok if we fixed all
 the cases.)

> But the case when you rewrite the same value to the MSR is neither
> disarming nor postponing.  You would be getting two interrupts for the
> same event.  That is why I agree with Radim that checking host_initiated
> is wrong.

Current implementation also injects two interrupts when the timer is set
to a lower nonzero value, which should give us just one under both
interpretations.
--
To unsubscr

Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes

2014-10-10 Thread Paolo Bonzini
Il 10/10/2014 03:55, Nadav Amit ha scritto:
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index b8345dd..51428dd 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -1096,9 +1096,12 @@ static void start_apic_timer(struct kvm_lapic *apic)
>>  if (likely(tscdeadline > guest_tsc)) {
>>  ns = (tscdeadline - guest_tsc) * 100ULL;
>>  do_div(ns, this_tsc_khz);
>> +hrtimer_start(&apic->lapic_timer.timer,
>> +ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
>> +} else {
>> +atomic_inc(&ktimer->pending);
>> +kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
>>  }
>> -hrtimer_start(&apic->lapic_timer.timer,
>> -ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
>>
>>  local_irq_restore(flags);
>>  }
>> @@ -1355,9 +1358,6 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu 
>> *vcpu, u64 data)
>>  return;
>>
>>  hrtimer_cancel(&apic->lapic_timer.timer);
>> -/* Inject here so clearing tscdeadline won't override new value */
>> -if (apic_has_pending_timer(vcpu))
>> -kvm_inject_apic_timer_irqs(vcpu);
>>  apic->lapic_timer.tscdeadline = data;
>>  start_apic_timer(apic);
>> }
> 
> Perhaps I am missing something, but I don’t see how it solves the problem I 
> encountered.
> Recall the scenario:
> 1. A TSC deadline timer interrupt is pending.
> 2. TSC deadline was still not cleared (which happens during vcpu_run).
> 3. Userspace uses KVM_GET_MSRS/KVM_SET_MSRS to load the same deadline msr.
> 
> It appears that in such scenario the guest would still get spurious
> interrupt for no reason, as ktimer->pending may already be increased in
> apic_timer_fn.

If ktimer->pending == 2 you still get only one interrupt, not two.  Am I
misunderstanding your objection?

> Second, I think that the solution I proposed would perform better.
> Currently, there are many unnecessary cancellations and setups of the
> timer. This solution does not resolve this problem.

I think it does.  You do not get an hrtimer_start if tscdeadline <=
guest_tsc.  To avoid useless cancels, either check hrtimer_is_enqueued
before calling hrtimer_cancel, or go straight to the source and avoid
taking the lock in the easy cases:

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 1c2fe7de2842..6ce725007424 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1043,10 +1043,17 @@ int hrtimer_try_to_cancel(struct hrtimer *timer)
 {
struct hrtimer_clock_base *base;
unsigned long flags;
-   int ret = -1;
+   unsigned long state = timer->state;
+   int ret;
+
+   if (state & HRTIMER_STATE_ENQUEUED)
+   return 0;
+   if (state & HRTIMER_STATE_CALLBACK)
+   return -1;

base = lock_hrtimer_base(timer, &flags);

+   ret = -1;
if (!hrtimer_callback_running(timer))
ret = remove_hrtimer(timer, base);


> Last, I think that having less interrupts on deadline changes is not
> completely according to the SDM which says: "If software disarms the
> timer or postpones the deadline, race conditions may result in the
> delivery of a spurious timer interrupt.” It never says interrupts may
> be lost if you reprogram the deadline before you check it expired.

But the case when you rewrite the same value to the MSR is neither
disarming nor postponing.  You would be getting two interrupts for the
same event.  That is why I agree with Radim that checking host_initiated
is wrong.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes

2014-10-09 Thread Nadav Amit

On Oct 8, 2014, at 1:06 PM, Radim Krčmář  wrote:

> 
> And it would get one from the currently pending timer.
> 
> What about the following patch?
> (The introduced else branch could use some abstractions.)
> 
> --8<---
> KVM: x86: fix deadline tsc interrupt injection
> 
> The check in kvm_set_lapic_tscdeadline_msr() was trying to prevent a
> situation where we lose a pending deadline timer in a MSR write.
> Losing it is fine, because it effectively occurs before the timer fired,
> so we should be able to cancel or postpone it.
> 
> Another problem comes from interaction with QEMU, or other userspace
> that can set deadline MSR without a good reason, when timer is already
> pending:  one guest's deadline request results in more than one
> interrupt because one is injected immediately on MSR write from
> userspace and one through hrtimer later.
> 
> The solution is to remove the injection when replacing a pending timer
> and to improve the usual QEMU path, we inject without a hrtimer when the
> deadline has already passed.
> 
> Signed-off-by: Radim Krčmář 
> Reported-by: Nadav Amit 
> ---
> arch/x86/kvm/lapic.c | 10 +-
> 1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index b8345dd..51428dd 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1096,9 +1096,12 @@ static void start_apic_timer(struct kvm_lapic *apic)
>   if (likely(tscdeadline > guest_tsc)) {
>   ns = (tscdeadline - guest_tsc) * 100ULL;
>   do_div(ns, this_tsc_khz);
> + hrtimer_start(&apic->lapic_timer.timer,
> + ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
> + } else {
> + atomic_inc(&ktimer->pending);
> + kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
>   }
> - hrtimer_start(&apic->lapic_timer.timer,
> - ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
> 
>   local_irq_restore(flags);
>   }
> @@ -1355,9 +1358,6 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu 
> *vcpu, u64 data)
>   return;
> 
>   hrtimer_cancel(&apic->lapic_timer.timer);
> - /* Inject here so clearing tscdeadline won't override new value */
> - if (apic_has_pending_timer(vcpu))
> - kvm_inject_apic_timer_irqs(vcpu);
>   apic->lapic_timer.tscdeadline = data;
>   start_apic_timer(apic);
> }

Perhaps I am missing something, but I don’t see how it solves the problem I 
encountered.
Recall the scenario:
1. A TSC deadline timer interrupt is pending.
2. TSC deadline was still not cleared (which happens during vcpu_run).
3. Userspace uses KVM_GET_MSRS/KVM_SET_MSRS to load the same deadline msr.

It appears that in such scenario the guest would still get spurious interrupt 
for no reason, as ktimer->pending may already be increased in apic_timer_fn.

Second, I think that the solution I proposed would perform better. Currently, 
there are many unnecessary cancellations and setups of the timer. This solution 
does not resolve this problem.

Last, I think that having less interrupts on deadline changes is not completely 
according to the SDM which says: "If software disarms the timer or postpones 
the deadline, race conditions may result in the delivery of a spurious timer 
interrupt.” It never says interrupts may be lost if you reprogram the deadline 
before you check it expired.

Thanks,
Nadav

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes

2014-10-08 Thread Paolo Bonzini
Il 08/10/2014 12:06, Radim Krčmář ha scritto:
> > > - why is host_initiated required?
> 
> > Since if the guest writes to the MSR, it means it wants to rearm the TSC 
> > deadline. Even if the deadline passed, interrupt should be triggered.
> 
> MSR isn't 0, so the deadline hasn't passed for the guest yet.
> 
> > If the guest writes the same value on the deadline MSR twice, it might 
> > expect two interrupts.
> 
> When guest writes to it without getting an interrupt first, it might
> expect just one.  (Which it better IMO.)

Indeed that was my doubt as well.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes

2014-10-08 Thread Radim Krčmář
2014-10-07 12:35+0300, Nadav Amit:
> Thanks for reviewing this patch and the rest of the gang.

Happy to do so, I've learned a lot.

> On Oct 6, 2014, at 11:57 PM, Radim Krčmář  wrote:
> > 2014-10-03 01:10+0300, Nadav Amit:
> >> Setting the TSC deadline MSR that are initiated by the host (using 
> >> ioctl's) may
> >> cause superfluous interrupt.  This occurs in the following case:
> >> 
> >> 1. A TSC deadline timer interrupt is pending.
> >> 2. TSC deadline was still not cleared (which happens during vcpu_run).
> >> 3. Userspace uses KVM_GET_MSRS/KVM_SET_MSRS to load the same deadline msr.
> >> 
> >> To solve this situation, ignore host initiated TSC deadline writes that do 
> >> not
> >> change the deadline value.
> > 
> > I find this change slightly dubious …
> Why? I see similar handling of MSR_TSC_ADJUST.

In other modes, we don't inject pending timer when writing to
APIC_TMICT.  (Which, sadly, is inconsistent with APIC_LVTT.)

Adding a workaround is usually worse than removing the reason ...

> > - why does the userspace do that?
> It seems qemu’s kvm_cpu_exec does so when it calls kvm_arch_put_registers.
> It is pretty much done after every exit to userspace.

Thanks, it really doesn't do much checking of what is needed.

> > - why is host_initiated required?
> Since if the guest writes to the MSR, it means it wants to rearm the TSC 
> deadline. Even if the deadline passed, interrupt should be triggered.

MSR isn't 0, so the deadline hasn't passed for the guest yet.

> If the guest writes the same value on the deadline MSR twice, it might expect 
> two interrupts.

When guest writes to it without getting an interrupt first, it might
expect just one.  (Which it better IMO.)

> > 
> > Thanks.
> > 
> > It seems like an performance improvement, so why shouldn't return when
> >  'data <= tscdeadline && pending()'
> > or even
> >  'data <= now() && pending()'
> > 
> > (Sorry, I ran out of time to search for answers today.)
> The bug I encountered is not a performance issue, but a superfluous interrupt 
> (functional bug).

True.

> As I said, the guest may write a new deadline MSR value which is in the past 
> and expect an interrupt.

And it would get one from the currently pending timer.

What about the following patch?
(The introduced else branch could use some abstractions.)

--8<---
KVM: x86: fix deadline tsc interrupt injection

The check in kvm_set_lapic_tscdeadline_msr() was trying to prevent a
situation where we lose a pending deadline timer in a MSR write.
Losing it is fine, because it effectively occurs before the timer fired,
so we should be able to cancel or postpone it.

Another problem comes from interaction with QEMU, or other userspace
that can set deadline MSR without a good reason, when timer is already
pending:  one guest's deadline request results in more than one
interrupt because one is injected immediately on MSR write from
userspace and one through hrtimer later.

The solution is to remove the injection when replacing a pending timer
and to improve the usual QEMU path, we inject without a hrtimer when the
deadline has already passed.

Signed-off-by: Radim Krčmář 
Reported-by: Nadav Amit 
---
 arch/x86/kvm/lapic.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index b8345dd..51428dd 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1096,9 +1096,12 @@ static void start_apic_timer(struct kvm_lapic *apic)
if (likely(tscdeadline > guest_tsc)) {
ns = (tscdeadline - guest_tsc) * 100ULL;
do_div(ns, this_tsc_khz);
+   hrtimer_start(&apic->lapic_timer.timer,
+   ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
+   } else {
+   atomic_inc(&ktimer->pending);
+   kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
}
-   hrtimer_start(&apic->lapic_timer.timer,
-   ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
 
local_irq_restore(flags);
}
@@ -1355,9 +1358,6 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, 
u64 data)
return;
 
hrtimer_cancel(&apic->lapic_timer.timer);
-   /* Inject here so clearing tscdeadline won't override new value */
-   if (apic_has_pending_timer(vcpu))
-   kvm_inject_apic_timer_irqs(vcpu);
apic->lapic_timer.tscdeadline = data;
start_apic_timer(apic);
 }
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes

2014-10-08 Thread Paolo Bonzini
Il 03/10/2014 00:10, Nadav Amit ha scritto:
> To solve this situation, ignore host initiated TSC deadline writes that do not
> change the deadline value.
> 
> Signed-off-by: Nadav Amit 
> ---
>  arch/x86/kvm/lapic.c | 7 ++-
>  arch/x86/kvm/lapic.h | 3 ++-
>  arch/x86/kvm/x86.c   | 2 +-
>  3 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index b8345dd..0bcf2e1 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1346,14 +1346,19 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu 
> *vcpu)
>   return apic->lapic_timer.tscdeadline;
>  }
>  
> -void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data)
> +void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu,
> +struct msr_data *msr_info)
>  {
>   struct kvm_lapic *apic = vcpu->arch.apic;
> + u64 data = msr_info->data;
>  
>   if (!kvm_vcpu_has_lapic(vcpu) || apic_lvtt_oneshot(apic) ||
>   apic_lvtt_period(apic))
>   return;
>  
> + if (msr_info->host_initiated && apic->lapic_timer.tscdeadline == data)
> + return;
> +

Why do we have to check host_initiated?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes

2014-10-07 Thread Nadav Amit
Thanks for reviewing this patch and the rest of the gang.

On Oct 6, 2014, at 11:57 PM, Radim Krčmář  wrote:

> 2014-10-03 01:10+0300, Nadav Amit:
>> Setting the TSC deadline MSR that are initiated by the host (using ioctl's) 
>> may
>> cause superfluous interrupt.  This occurs in the following case:
>> 
>> 1. A TSC deadline timer interrupt is pending.
>> 2. TSC deadline was still not cleared (which happens during vcpu_run).
>> 3. Userspace uses KVM_GET_MSRS/KVM_SET_MSRS to load the same deadline msr.
>> 
>> To solve this situation, ignore host initiated TSC deadline writes that do 
>> not
>> change the deadline value.
> 
> I find this change slightly dubious …
Why? I see similar handling of MSR_TSC_ADJUST.

> - why does the userspace do that?
It seems qemu’s kvm_cpu_exec does so when it calls kvm_arch_put_registers.
It is pretty much done after every exit to userspace.

> - why is host_initiated required?
Since if the guest writes to the MSR, it means it wants to rearm the TSC 
deadline. Even if the deadline passed, interrupt should be triggered.
If the guest writes the same value on the deadline MSR twice, it might expect 
two interrupts.


> 
> Thanks.
> 
> It seems like an performance improvement, so why shouldn't return when
>  'data <= tscdeadline && pending()'
> or even
>  'data <= now() && pending()'
> 
> (Sorry, I ran out of time to search for answers today.)
The bug I encountered is not a performance issue, but a superfluous interrupt 
(functional bug).
As I said, the guest may write a new deadline MSR value which is in the past 
and expect an interrupt.

Nadav--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes

2014-10-06 Thread Radim Krčmář
2014-10-03 01:10+0300, Nadav Amit:
> Setting the TSC deadline MSR that are initiated by the host (using ioctl's) 
> may
> cause superfluous interrupt.  This occurs in the following case:
> 
> 1. A TSC deadline timer interrupt is pending.
> 2. TSC deadline was still not cleared (which happens during vcpu_run).
> 3. Userspace uses KVM_GET_MSRS/KVM_SET_MSRS to load the same deadline msr.
> 
> To solve this situation, ignore host initiated TSC deadline writes that do not
> change the deadline value.

I find this change slightly dubious ...
 - why does the userspace do that?
 - why is host_initiated required?

Thanks.

It seems like an performance improvement, so why shouldn't return when
  'data <= tscdeadline && pending()'
or even
  'data <= now() && pending()'

(Sorry, I ran out of time to search for answers today.)

> Signed-off-by: Nadav Amit 
> ---
>  arch/x86/kvm/lapic.c | 7 ++-
>  arch/x86/kvm/lapic.h | 3 ++-
>  arch/x86/kvm/x86.c   | 2 +-
>  3 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index b8345dd..0bcf2e1 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1346,14 +1346,19 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu 
> *vcpu)
>   return apic->lapic_timer.tscdeadline;
>  }
>  
> -void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data)
> +void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu,
> +struct msr_data *msr_info)
>  {
>   struct kvm_lapic *apic = vcpu->arch.apic;
> + u64 data = msr_info->data;
>  
>   if (!kvm_vcpu_has_lapic(vcpu) || apic_lvtt_oneshot(apic) ||
>   apic_lvtt_period(apic))
>   return;
>  
> + if (msr_info->host_initiated && apic->lapic_timer.tscdeadline == data)
> + return;
> +
>   hrtimer_cancel(&apic->lapic_timer.timer);
>   /* Inject here so clearing tscdeadline won't override new value */
>   if (apic_has_pending_timer(vcpu))
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 6a11845..5bfa3db 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -71,7 +71,8 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
>  int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
>  
>  u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
> -void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data);
> +void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu,
> + struct msr_data *msr_info);
>  
>  void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset);
>  void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e903167..8c2745c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2093,7 +2093,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
> msr_data *msr_info)
>   case APIC_BASE_MSR ... APIC_BASE_MSR + 0x3ff:
>   return kvm_x2apic_msr_write(vcpu, msr, data);
>   case MSR_IA32_TSCDEADLINE:
> - kvm_set_lapic_tscdeadline_msr(vcpu, data);
> + kvm_set_lapic_tscdeadline_msr(vcpu, msr_info);
>   break;
>   case MSR_IA32_TSC_ADJUST:
>   if (guest_cpuid_has_tsc_adjust(vcpu)) {
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html