Re: [PATCH 0/3] KVM: Fix lost IRQ acks for RTC

2016-03-01 Thread Steve Rutherford
This issue seems generic to level-triggered interrupts as well as RTC
interrupts. It looks like KVM hacks around the issue with
level-triggered interrupts by clearing the remote IRR when an IRQ is
reconfigured. One (admittedly lossy) way to handle this issue with
the RTC-IRQ would be to follow the lead of level-triggered
interrupts, and clear the pending EOIs when reconfiguring the RTC-IRQ.
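
To illustrate the idea, here is a minimal user-space sketch of "clear
the pending EOIs on reconfiguration". It is a toy model and not the KVM
code: struct alt_rtc_status and the alt_* helpers are names made up for
this example, and the VCPU limit is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ALT_MAX_VCPUS 8 /* arbitrary limit for this sketch */

/* Hypothetical stand-in for the RTC EOI tracking state. */
struct alt_rtc_status {
	int  pending_eoi;             /* VCPUs that still owe an EOI */
	bool dest_map[ALT_MAX_VCPUS]; /* VCPUs that received the RTC-IRQ */
};

/* Record that the RTC-IRQ was sent to a VCPU. */
void alt_rtc_sent(struct alt_rtc_status *rtc, unsigned int vcpu)
{
	assert(vcpu < ALT_MAX_VCPUS);
	if (!rtc->dest_map[vcpu]) {
		rtc->dest_map[vcpu] = true;
		rtc->pending_eoi++;
	}
}

/* Guest rewrote the RTC redirection entry: forget outstanding EOIs. */
void alt_rtc_reconfigure(struct alt_rtc_status *rtc)
{
	rtc->pending_eoi = 0;
	memset(rtc->dest_map, 0, sizeof(rtc->dest_map));
}

/* A new RTC-IRQ may only be injected when no EOI is outstanding. */
bool alt_rtc_can_inject(const struct alt_rtc_status *rtc)
{
	return rtc->pending_eoi == 0;
}
```

The lossy part is visible in alt_rtc_reconfigure(): an EOI that is
still in flight from the old destination is simply dropped from the
accounting, so a new RTC-IRQ can be injected while the old one is still
being handled.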

[Given that we are already talking about this, this could be viewed as
a good time to go back and fix the issues with the remote IRR in the
IOAPIC.]

On Mon, Feb 29, 2016 at 7:30 AM, Joerg Roedel wrote:
> On Mon, Feb 29, 2016 at 04:12:42PM +0100, Paolo Bonzini wrote:
>> > This information is then used to match EOI signals from the
>> > guest to the RTC. This explicit back-tracking fixes the
>> > issue.
>>
>> Nice patches, really.  Ok to wait until 4.6?
>
> Thanks. Putting them into v4.6 is fine for me.
>
>
> Joerg


Re: [PATCH 0/3] KVM: Fix lost IRQ acks for RTC

2016-02-29 Thread Joerg Roedel
On Mon, Feb 29, 2016 at 04:12:42PM +0100, Paolo Bonzini wrote:
> > This information is then used to match EOI signals from the
> > guest to the RTC. This explicit back-tracking fixes the
> > issue.
> 
> Nice patches, really.  Ok to wait until 4.6?

Thanks. Putting them into v4.6 is fine for me.


Joerg



[PATCH 0/3] KVM: Fix lost IRQ acks for RTC

2016-02-29 Thread Joerg Roedel
Hi,

here is a small patch-set to fix a race condition which
happens when an RTC-IRQ is migrated to another VCPU while it
is being handled by the guest.

The RTC-EOI handling in KVM requires that all interrupt
messages sent to the VCPUs are acked before another
RTC-IRQ can be sent. When an EOI signal from the guest is
lost, the guest will never see an RTC interrupt again (until
it reboots).

This is easily reproducible with a Linux guest executing
this loop:

  $ while true; do time hwclock --show --test --debug; done

When the guest has multiple vcpus and the RTC-IRQ is
regularly migrated (e.g. by irqbalance), the race condition
will be hit after some time and the hwclock tool will fail
with:

  select() to /dev/rtc to wait for clock tick timed out...synchronization failed

The race condition happens because of the way the EOI
backtracking between local APIC and IOAPIC works in KVM. The
destination VCPU and vector are part of the IOAPIC state.
When the guest sends an EOI to the local APIC the vector is
matched against the destinations stored in the IOAPIC and
ACKed there too if it matches.

The problem begins when a VCPU handles an RTC interrupt and
at the same time another VCPU migrates the RTC-IRQ away from
that VCPU. This updates the IOAPIC state in KVM to
the new destination, so that the EOI sent from the first
VCPU does not match anymore in the IOAPIC, hence losing the
RTC-EOI.
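
The sequence above can be sketched as a small user-space model. This is
only an illustration under simplifying assumptions, not the KVM code:
struct model_ioapic and the model_* helpers are invented names, there is
a single destination VCPU, the vector numbers are arbitrary, and the EOI
is matched against the current redirection entry only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_VCPUS 8 /* arbitrary limit for this sketch */

/* Hypothetical, heavily simplified IOAPIC state for the RTC pin. */
struct model_ioapic {
	uint8_t  rtc_dest_vcpu;             /* destination in the current entry */
	uint8_t  rtc_vector;                /* vector in the current entry */
	uint32_t pending_eoi;               /* VCPUs that still owe an EOI */
	bool     dest_map[MODEL_MAX_VCPUS]; /* VCPUs that received the IRQ */
};

/* Send an RTC-IRQ; refused while an earlier one has not been acked. */
bool model_rtc_deliver(struct model_ioapic *io)
{
	if (io->pending_eoi > 0)
		return false;
	assert(io->rtc_dest_vcpu < MODEL_MAX_VCPUS);
	io->dest_map[io->rtc_dest_vcpu] = true;
	io->pending_eoi = 1;
	return true;
}

/* Guest reprograms the redirection entry (e.g. via irqbalance). */
void model_rtc_migrate(struct model_ioapic *io, uint8_t vcpu, uint8_t vector)
{
	io->rtc_dest_vcpu = vcpu;
	io->rtc_vector = vector;
}

/* EOI from a VCPU, matched against the *current* entry only. */
bool model_rtc_eoi(struct model_ioapic *io, uint8_t vcpu, uint8_t vector)
{
	assert(vcpu < MODEL_MAX_VCPUS);
	if (vcpu != io->rtc_dest_vcpu || vector != io->rtc_vector)
		return false; /* no match: the EOI is lost */
	if (!io->dest_map[vcpu])
		return false;
	io->dest_map[vcpu] = false;
	io->pending_eoi--;
	return true;
}

/* Returns true if the migration race leaves the RTC-IRQ blocked. */
bool model_demo_lost_eoi(void)
{
	struct model_ioapic io = { .rtc_dest_vcpu = 0, .rtc_vector = 0x28 };
	bool acked, next;

	model_rtc_deliver(&io);              /* VCPU 0 starts handling */
	model_rtc_migrate(&io, 1, 0x31);     /* IRQ moved away meanwhile */
	acked = model_rtc_eoi(&io, 0, 0x28); /* EOI from VCPU 0 */
	next = model_rtc_deliver(&io);       /* next tick is refused */
	return !acked && !next;
}
```

In this model the EOI from VCPU 0 no longer matches after the
migration, pending_eoi stays at 1, and every later delivery attempt is
refused.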

This patch-set fixes the race-condition by adding explicit
back-tracking information for RTC-IRQs. The rtc_status
struct already holds a dest_map bitmap to store which VCPUs
received an RTC-IRQ. This is extended to also hold the
vector that was sent to this VCPU.

This information is then used to match EOI signals from the
guest to the RTC. This explicit back-tracking fixes the
issue.
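
A rough sketch of that idea, again as a user-space model rather than the
actual patches: the names sketch_dest_map, sketch_rtc_status and the
sketch_* helpers, as well as the field layout, are assumptions made for
this example. The point is only that the EOI is matched against what was
recorded at delivery time, independent of the current IOAPIC entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_VCPUS 8 /* arbitrary limit for this sketch */

/* Hypothetical dest_map: who got the IRQ, and with which vector. */
struct sketch_dest_map {
	bool    sent[SKETCH_MAX_VCPUS];
	uint8_t vectors[SKETCH_MAX_VCPUS]; /* valid only where sent[] is set */
};

struct sketch_rtc_status {
	int pending_eoi;
	struct sketch_dest_map dest_map;
};

/* Record the vector at delivery time, per destination VCPU. */
void sketch_rtc_record(struct sketch_rtc_status *rtc,
		       unsigned int vcpu, uint8_t vector)
{
	assert(vcpu < SKETCH_MAX_VCPUS);
	if (!rtc->dest_map.sent[vcpu])
		rtc->pending_eoi++;
	rtc->dest_map.sent[vcpu] = true;
	rtc->dest_map.vectors[vcpu] = vector;
}

/* Match an EOI against the recorded vector, not the current entry. */
bool sketch_rtc_eoi(struct sketch_rtc_status *rtc,
		    unsigned int vcpu, uint8_t vector)
{
	assert(vcpu < SKETCH_MAX_VCPUS);
	if (!rtc->dest_map.sent[vcpu] ||
	    rtc->dest_map.vectors[vcpu] != vector)
		return false;
	rtc->dest_map.sent[vcpu] = false;
	rtc->pending_eoi--;
	return true;
}
```

Because nothing in sketch_rtc_eoi() looks at the redirection entry, a
migration between delivery and EOI cannot make the match fail.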

Regards,

Joerg

Joerg Roedel (3):
  kvm: x86: Convert ioapic->rtc_status.dest_map to a struct
  kvm: x86: Track irq vectors in ioapic->rtc_status.dest_map
  kvm: x86: Check dest_map->vector to match eoi signals for rtc

 arch/x86/kvm/ioapic.c   | 30 +-
 arch/x86/kvm/ioapic.h   | 17 +++--
 arch/x86/kvm/irq_comm.c |  2 +-
 arch/x86/kvm/lapic.c    | 14 --
 arch/x86/kvm/lapic.h    |  7 +--
 5 files changed, 50 insertions(+), 20 deletions(-)

-- 
1.9.1



Re: [PATCH 0/3] KVM: Fix lost IRQ acks for RTC

2016-02-29 Thread Paolo Bonzini


On 29/02/2016 16:04, Joerg Roedel wrote:
> Hi,
> 
> here is a small patch-set to fix a race condition which
> happens when an RTC-IRQ is migrated to another VCPU while it
> is being handled by the guest.
> 
> The RTC-EOI handling in KVM requires that all interrupt
> messages sent to the VCPUs are acked before another
> RTC-IRQ can be sent. When an EOI signal from the guest is
> lost, the guest will never see an RTC interrupt again (until
> it reboots).
> 
> This is easily reproducible with a Linux guest executing
> this loop:
> 
>   $ while true; do time hwclock --show --test --debug; done
> 
> When the guest has multiple vcpus and the RTC-IRQ is
> regularly migrated (e.g. by irqbalance), the race condition
> will be hit after some time and the hwclock tool will fail
> with:
> 
>   select() to /dev/rtc to wait for clock tick timed out...synchronization failed
> 
> The race condition happens because of the way the EOI
> backtracking between local APIC and IOAPIC works in KVM. The
> destination VCPU and vector are part of the IOAPIC state.
> When the guest sends an EOI to the local APIC the vector is
> matched against the destinations stored in the IOAPIC and
> ACKed there too if it matches.
> 
> The problem begins when a VCPU handles an RTC interrupt and
> at the same time another VCPU migrates the RTC-IRQ away from
> that VCPU. This updates the IOAPIC state in KVM to
> the new destination, so that the EOI sent from the first
> VCPU does not match anymore in the IOAPIC, hence losing the
> RTC-EOI.
> 
> This patch-set fixes the race-condition by adding explicit
> back-tracking information for RTC-IRQs. The rtc_status
> struct already holds a dest_map bitmap to store which VCPUs
> received an RTC-IRQ. This is extended to also hold the
> vector that was sent to this VCPU.
> 
> This information is then used to match EOI signals from the
> guest to the RTC. This explicit back-tracking fixes the
> issue.
> 
> Regards,

Nice patches, really.  Ok to wait until 4.6?

Paolo