Re: [PATCH 0/3] KVM: Fix lost IRQ acks for RTC
This issue seems generic to level-triggered interrupts as well as RTC
interrupts. It looks like KVM hacks around the issue with level-triggered
interrupts by clearing the remote IRR when an IRQ is reconfigured. An
(admittedly lossy) way to handle this issue with the RTC-IRQ would be to
follow the lead of level-triggered interrupts and clear the pending EOIs
when reconfiguring the RTC-IRQ.

[Given that we are already talking about this, this could be a good time
to go back and fix the issues with the remote IRR in the IOAPIC.]

On Mon, Feb 29, 2016 at 7:30 AM, Joerg Roedel wrote:
> On Mon, Feb 29, 2016 at 04:12:42PM +0100, Paolo Bonzini wrote:
>> > This information is then used to match EOI signals from the
>> > guest to the RTC. This explicit back-tracking fixes the
>> > issue.
>>
>> Nice patches, really. Ok to wait until 4.6?
>
> Thanks. Putting them into v4.6 is fine for me.
>
>
>         Joerg
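The suggestion in this reply, dropping the pending EOIs whenever the RTC-IRQ is reconfigured, could be sketched roughly as below. This is only a toy model under stated assumptions: `struct rtc_pending`, `rtc_pending_mark()`, `rtc_pending_reset_on_reconfigure()` and `rtc_pending_can_inject()` are hypothetical names invented for illustration and are not KVM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_VCPUS 8	/* illustrative value, not KVM's real limit */

/* Hypothetical, simplified stand-in for the RTC EOI bookkeeping. */
struct rtc_pending {
	int pending_eoi;			/* EOIs still outstanding */
	bool delivered[SKETCH_MAX_VCPUS];	/* VCPUs that got the IRQ */
};

/* Record that an RTC interrupt was delivered to one VCPU. */
static void rtc_pending_mark(struct rtc_pending *p, int vcpu)
{
	assert(vcpu >= 0 && vcpu < SKETCH_MAX_VCPUS);
	if (!p->delivered[vcpu]) {
		p->delivered[vcpu] = true;
		p->pending_eoi++;
	}
}

/*
 * Assumed hook: called when the guest rewrites the RTC routing.
 * All outstanding EOIs are forgotten, so an EOI that can no longer
 * be matched cannot block later RTC interrupts. This is the lossy
 * part: an EOI that was still on its way is silently dropped.
 */
static void rtc_pending_reset_on_reconfigure(struct rtc_pending *p)
{
	p->pending_eoi = 0;
	memset(p->delivered, 0, sizeof(p->delivered));
}

/* A new RTC interrupt may only be sent when nothing is outstanding. */
static bool rtc_pending_can_inject(const struct rtc_pending *p)
{
	return p->pending_eoi == 0;
}
```

In this model, a reconfiguration after `rtc_pending_mark()` makes `rtc_pending_can_inject()` return true again even though no EOI ever arrived, which is the trade-off the reply calls lossy.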
Re: [PATCH 0/3] KVM: Fix lost IRQ acks for RTC
On Mon, Feb 29, 2016 at 04:12:42PM +0100, Paolo Bonzini wrote:
> > This information is then used to match EOI signals from the
> > guest to the RTC. This explicit back-tracking fixes the
> > issue.
>
> Nice patches, really. Ok to wait until 4.6?

Thanks. Putting them into v4.6 is fine for me.


        Joerg
[PATCH 0/3] KVM: Fix lost IRQ acks for RTC
Hi,

here is a small patch-set to fix a race condition which
happens when an RTC-IRQ is migrated to another VCPU while it
is being handled by the guest.

The RTC-EOI handling in KVM requires that all interrupt
messages sent to the VCPUs be acked before another
RTC-IRQ can be sent. When an EOI signal from the guest is
lost, the guest will never see an RTC interrupt again (until
it reboots).

This is easily reproducible with a Linux guest executing
this loop:

	$ while true;do time hwclock --show --test --debug;done

When the guest has multiple vcpus and the RTC-IRQ is
regularly migrated (e.g. by irqbalance), the race condition
will be hit after some time and the hwclock tool will fail
with:

	select() to /dev/rtc to wait for clock tick timed out...synchronization failed

The race condition happens because of the way the EOI
backtracking between local APIC and IOAPIC works in KVM. The
destination VCPU and vector are part of the IOAPIC state.
When the guest sends an EOI to the local APIC, the vector is
matched against the destinations stored in the IOAPIC and
ACKed there too if it matches.

The problem begins when a VCPU handles an RTC interrupt and
at the same time another VCPU migrates the RTC-IRQ away from
that VCPU. This updates the IOAPIC state in KVM to
the new destination, so that the EOI sent from the first
VCPU no longer matches in the IOAPIC, hence losing the
RTC-EOI.

This patch-set fixes the race condition by adding explicit
back-tracking information for RTC-IRQs. The rtc_status
struct already holds a dest_map bitmap to store which VCPUs
received an RTC-IRQ. This is extended to also hold the
vector that was sent to each VCPU.

This information is then used to match EOI signals from the
guest to the RTC. This explicit back-tracking fixes the
issue.
Regards,

	Joerg

Joerg Roedel (3):
  kvm: x86: Convert ioapic->rtc_status.dest_map to a struct
  kvm: x86: Track irq vectors in ioapic->rtc_status.dest_map
  kvm: x86: Check dest_map->vector to match eoi signals for rtc

 arch/x86/kvm/ioapic.c   | 30 +-
 arch/x86/kvm/ioapic.h   | 17 +++--
 arch/x86/kvm/irq_comm.c |  2 +-
 arch/x86/kvm/lapic.c    | 14 --
 arch/x86/kvm/lapic.h    |  7 +--
 5 files changed, 50 insertions(+), 20 deletions(-)

-- 
1.9.1
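To make the race and the fix described in this cover letter easier to follow, here is a minimal user-space sketch. It is not the code from the patches: `struct toy_ioapic`, `struct toy_rtc_status`, and the `toy_rtc_*` functions are hypothetical names, and the model assumes a single destination VCPU per interrupt. It contrasts matching an EOI against the current routing (which loses the EOI after a migration) with matching it against the vector recorded at delivery time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_VCPUS 8		/* illustrative value, not KVM's real limit */

/* Simplified stand-in for rtc_status with the extended dest_map. */
struct toy_rtc_status {
	int pending_eoi;			/* EOIs still outstanding */
	bool dest_map[TOY_MAX_VCPUS];		/* VCPUs that received the IRQ */
	uint8_t vectors[TOY_MAX_VCPUS];		/* vector sent to each VCPU */
};

/* Simplified stand-in for the IOAPIC state relevant to the RTC-IRQ. */
struct toy_ioapic {
	int dest_vcpu;		/* current routing: destination VCPU */
	uint8_t vector;		/* current routing: vector */
	struct toy_rtc_status rtc;
};

/* Deliver an RTC-IRQ; refused while an earlier one is still un-acked. */
static bool toy_rtc_inject(struct toy_ioapic *io)
{
	int vcpu = io->dest_vcpu;

	assert(vcpu >= 0 && vcpu < TOY_MAX_VCPUS);
	if (io->rtc.pending_eoi > 0)
		return false;
	io->rtc.dest_map[vcpu] = true;
	io->rtc.vectors[vcpu] = io->vector;	/* remember what was sent */
	io->rtc.pending_eoi = 1;
	return true;
}

/* The guest reprograms the routing, i.e. migrates the RTC-IRQ. */
static void toy_rtc_migrate(struct toy_ioapic *io, int vcpu, uint8_t vector)
{
	io->dest_vcpu = vcpu;
	io->vector = vector;
}

/* Racy variant: the EOI is matched against the *current* routing. */
static bool toy_rtc_eoi_by_route(struct toy_ioapic *io, int vcpu,
				 uint8_t vector)
{
	if (io->dest_vcpu != vcpu || io->vector != vector)
		return false;			/* EOI is lost */
	io->rtc.dest_map[vcpu] = false;
	io->rtc.pending_eoi--;
	return true;
}

/* Fixed variant: the EOI is matched against what was recorded on delivery. */
static bool toy_rtc_eoi_by_dest_map(struct toy_ioapic *io, int vcpu,
				    uint8_t vector)
{
	assert(vcpu >= 0 && vcpu < TOY_MAX_VCPUS);
	if (!io->rtc.dest_map[vcpu] || io->rtc.vectors[vcpu] != vector)
		return false;
	io->rtc.dest_map[vcpu] = false;
	io->rtc.pending_eoi--;
	return true;
}
```

In this model, injecting on VCPU 0, migrating to VCPU 1, and then sending the EOI from VCPU 0 fails with `toy_rtc_eoi_by_route()` and leaves `pending_eoi` stuck at 1, so every later `toy_rtc_inject()` is refused. The same EOI succeeds with `toy_rtc_eoi_by_dest_map()`, after which injection works again.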
Re: [PATCH 0/3] KVM: Fix lost IRQ acks for RTC
On 29/02/2016 16:04, Joerg Roedel wrote:
> Hi,
>
> here is a small patch-set to fix a race condition which
> happens when an RTC-IRQ is migrated to another VCPU while it
> is being handled by the guest.
>
> The RTC-EOI handling in KVM requires that all interrupt
> messages sent to the VCPUs be acked before another
> RTC-IRQ can be sent. When an EOI signal from the guest is
> lost, the guest will never see an RTC interrupt again (until
> it reboots).
>
> This is easily reproducible with a Linux guest executing
> this loop:
>
> 	$ while true;do time hwclock --show --test --debug;done
>
> When the guest has multiple vcpus and the RTC-IRQ is
> regularly migrated (e.g. by irqbalance), the race condition
> will be hit after some time and the hwclock tool will fail
> with:
>
> 	select() to /dev/rtc to wait for clock tick timed out...synchronization failed
>
> The race condition happens because of the way the EOI
> backtracking between local APIC and IOAPIC works in KVM. The
> destination VCPU and vector are part of the IOAPIC state.
> When the guest sends an EOI to the local APIC, the vector is
> matched against the destinations stored in the IOAPIC and
> ACKed there too if it matches.
>
> The problem begins when a VCPU handles an RTC interrupt and
> at the same time another VCPU migrates the RTC-IRQ away from
> that VCPU. This updates the IOAPIC state in KVM to
> the new destination, so that the EOI sent from the first
> VCPU no longer matches in the IOAPIC, hence losing the
> RTC-EOI.
>
> This patch-set fixes the race condition by adding explicit
> back-tracking information for RTC-IRQs. The rtc_status
> struct already holds a dest_map bitmap to store which VCPUs
> received an RTC-IRQ. This is extended to also hold the
> vector that was sent to each VCPU.
>
> This information is then used to match EOI signals from the
> guest to the RTC. This explicit back-tracking fixes the
> issue.
>
> Regards,

Nice patches, really. Ok to wait until 4.6?

Paolo