On Thu, Aug 28, 2014 at 08:55:18PM +0800, Zhang Haoyu wrote: > Hi Jason, > I tested below patch, it's okay, the e1000 interrupt storm disappeared. > But I am going to make a bit change on it, could you help review it? > > >Currently, we call ioapic_service() immediately when we find the irq is still > >active during eoi broadcast. But for real hardware, there's some dealy > >between > >the EOI writing and irq delivery (system bus latency?). So we need to emulate > >this behavior. Otherwise, for a guest who haven't register a proper irq > >handler > >, it would stay in the interrupt routine as this irq would be re-injected > >immediately after guest enables interrupt. This would lead guest can't move > >forward and may miss the possibility to get proper irq handler registered > >(one > >example is windows guest resuming from hibernation). > > > >As there's no way to differ the unhandled irq from new raised ones, this > >patch > >solve this problems by scheduling a delayed work when the count of irq > >injected > >during eoi broadcast exceeds a threshold value. After this patch, the guest > >can > >move a little forward when there's no suitable irq handler in case it may > >register one very soon and for guest who has a bad irq detection routine ( > >such > >as note_interrupt() in linux ), this bad irq would be recognized soon as in > >the > >past. > > > >Signed-off-by: Jason Wang <jasowang <at> redhat.com> > >--- > > virt/kvm/ioapic.c | 47 +++++++++++++++++++++++++++++++++++++++++++++-- > > virt/kvm/ioapic.h | 2 ++ > > 2 files changed, 47 insertions(+), 2 deletions(-) > > > >diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c > >index dcaf272..892253e 100644 > >--- a/virt/kvm/ioapic.c > >+++ b/virt/kvm/ioapic.c > > <at> <at> -221,6 +221,24 <at> <at> int kvm_ioapic_set_irq(struct > > kvm_ioapic *ioapic, int irq, int level) > > return ret; > > } > > > >+static void kvm_ioapic_eoi_inject_work(struct work_struct *work) > >+{ > >+ int i, ret; > >+ struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic, > >+ eoi_inject.work); > >+ spin_lock(&ioapic->lock); > >+ for (i = 0; i < IOAPIC_NUM_PINS; i++) { > >+ union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i]; > >+ > >+ if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG) > >+ continue; > >+ > >+ if (ioapic->irr & (1 << i) && !ent->fields.remote_irr) > >+ ret = ioapic_service(ioapic, i); > >+ } > >+ spin_unlock(&ioapic->lock); > >+} > >+ > > static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector, > > int trigger_mode) > > { > > <at> <at> -249,8 +267,29 <at> <at> static void > > __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector, > > > > ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); > > ent->fields.remote_irr = 0; > >- if (!ent->fields.mask && (ioapic->irr & (1 << i))) > >- ioapic_service(ioapic, i); > >+ if (!ent->fields.mask && (ioapic->irr & (1 << i))) { > >+ ++ioapic->irq_eoi; > -+ ++ioapic->irq_eoi; > ++ ++ioapic->irq_eoi[i]; > >+ if (ioapic->irq_eoi == 100) { > -+ if (ioapic->irq_eoi == 100) { > ++ if (ioapic->irq_eoi[i] == 100) { > >+ /* > >+ * Real hardware does not deliver the irq so > >+ * immediately during eoi broadcast, so we need > >+ * to emulate this behavior. Otherwise, for > >+ * guests who has not registered handler of a > >+ * level irq, this irq would be injected > >+ * immediately after guest enables interrupt > >+ * (which happens usually at the end of the > >+ * common interrupt routine). This would lead > >+ * guest can't move forward and may miss the > >+ * possibility to get proper irq handler > >+ * registered. So we need to give some breath to > >+ * guest. TODO: 1 is too long? > >+ */ > >+ schedule_delayed_work(&ioapic->eoi_inject, 1); > >+ ioapic->irq_eoi = 0; > -+ ioapic->irq_eoi = 0; > ++ ioapic->irq_eoi[i] = 0; > >+ } else { > >+ ioapic_service(ioapic, i); > >+ } > >+ } > ++ else { > ++ ioapic->irq_eoi[i] = 0; > ++ } > > } > > } > I think ioapic->irq_eoi is prone to reach to 100, because during a eoi > broadcast, > it's possible that another interrupt's (not current eoi's corresponding > interrupt) irr is set, so the ioapic->irq_eoi will grow continually, > and not too long, ioapic->irq_eoi will reach to 100. > I want to add "u32 irq_eoi[IOAPIC_NUM_PINS];" instead of "u32 irq_eoi;". > Any ideas? > > Zhang Haoyu
I'm a bit concerned how this will affect realtime guests. Worth adding a flag to enable this, so that e.g. virtio is not affected? > > > > <at> <at> -375,12 +414,14 <at> <at> void kvm_ioapic_reset(struct > > kvm_ioapic *ioapic) > > { > > int i; > > > >+ cancel_delayed_work_sync(&ioapic->eoi_inject); > > for (i = 0; i < IOAPIC_NUM_PINS; i++) > > ioapic->redirtbl[i].fields.mask = 1; > > ioapic->base_address = IOAPIC_DEFAULT_BASE_ADDRESS; > > ioapic->ioregsel = 0; > > ioapic->irr = 0; > > ioapic->id = 0; > >+ ioapic->irq_eoi = 0; > -+ ioapic->irq_eoi = 0; > ++ memset(ioapic->irq_eoi, 0x00, IOAPIC_NUM_PINS); > > update_handled_vectors(ioapic); > > } > > > > <at> <at> -398,6 +439,7 <at> <at> int kvm_ioapic_init(struct kvm *kvm) > > if (!ioapic) > > return -ENOMEM; > > spin_lock_init(&ioapic->lock); > >+ INIT_DELAYED_WORK(&ioapic->eoi_inject, kvm_ioapic_eoi_inject_work); > > kvm->arch.vioapic = ioapic; > > kvm_ioapic_reset(ioapic); > > kvm_iodevice_init(&ioapic->dev, &ioapic_mmio_ops); > > <at> <at> -418,6 +460,7 <at> <at> void kvm_ioapic_destroy(struct kvm > > *kvm) > > { > > struct kvm_ioapic *ioapic = kvm->arch.vioapic; > > > >+ cancel_delayed_work_sync(&ioapic->eoi_inject); > > if (ioapic) { > > kvm_io_bus_unregister_dev(kvm, KVM_MMIO_BUS, &ioapic->dev); > > kvm->arch.vioapic = NULL; > >diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h > >index 0b190c3..8938e66 100644 > >--- a/virt/kvm/ioapic.h > >+++ b/virt/kvm/ioapic.h > > <at> <at> -47,6 +47,8 <at> <at> struct kvm_ioapic { > > void (*ack_notifier)(void *opaque, int irq); > > spinlock_t lock; > > DECLARE_BITMAP(handled_vectors, 256); > >+ struct delayed_work eoi_inject; > >+ u32 irq_eoi; > -+ u32 irq_eoi; > ++ u32 irq_eoi[IOAPIC_NUM_PINS]; > > }; > > > > #ifdef DEBUG >