On 08/28/2014 08:55 PM, Zhang Haoyu wrote: > Hi Jason, > I tested below patch, it's okay, the e1000 interrupt storm disappeared. > But I am going to make a bit change on it, could you help review it? > >> Currently, we call ioapic_service() immediately when we find the irq is still >> active during eoi broadcast. But for real hardware, there's some dealy >> between >> the EOI writing and irq delivery (system bus latency?). So we need to emulate >> this behavior. Otherwise, for a guest who haven't register a proper irq >> handler >> , it would stay in the interrupt routine as this irq would be re-injected >> immediately after guest enables interrupt. This would lead guest can't move >> forward and may miss the possibility to get proper irq handler registered >> (one >> example is windows guest resuming from hibernation). >> >> As there's no way to differ the unhandled irq from new raised ones, this >> patch >> solve this problems by scheduling a delayed work when the count of irq >> injected >> during eoi broadcast exceeds a threshold value. After this patch, the guest >> can >> move a little forward when there's no suitable irq handler in case it may >> register one very soon and for guest who has a bad irq detection routine ( >> such >> as note_interrupt() in linux ), this bad irq would be recognized soon as in >> the >> past. >> >> Signed-off-by: Jason Wang <jasowang <at> redhat.com> >> --- >> virt/kvm/ioapic.c | 47 +++++++++++++++++++++++++++++++++++++++++++++-- >> virt/kvm/ioapic.h | 2 ++ >> 2 files changed, 47 insertions(+), 2 deletions(-) >> >> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c >> index dcaf272..892253e 100644 >> --- a/virt/kvm/ioapic.c >> +++ b/virt/kvm/ioapic.c >> <at> <at> -221,6 +221,24 <at> <at> int kvm_ioapic_set_irq(struct >> kvm_ioapic *ioapic, int irq, int level) >> return ret; >> } >> >> +static void kvm_ioapic_eoi_inject_work(struct work_struct *work) >> +{ >> + int i, ret; >> + struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic, >> + eoi_inject.work); >> + spin_lock(&ioapic->lock); >> + for (i = 0; i < IOAPIC_NUM_PINS; i++) { >> + union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i]; >> + >> + if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG) >> + continue; >> + >> + if (ioapic->irr & (1 << i) && !ent->fields.remote_irr) >> + ret = ioapic_service(ioapic, i); >> + } >> + spin_unlock(&ioapic->lock); >> +} >> + >> static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector, >> int trigger_mode) >> { >> <at> <at> -249,8 +267,29 <at> <at> static void >> __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector, >> >> ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); >> ent->fields.remote_irr = 0; >> - if (!ent->fields.mask && (ioapic->irr & (1 << i))) >> - ioapic_service(ioapic, i); >> + if (!ent->fields.mask && (ioapic->irr & (1 << i))) { >> + ++ioapic->irq_eoi; > -+ ++ioapic->irq_eoi; > ++ ++ioapic->irq_eoi[i]; >> + if (ioapic->irq_eoi == 100) { > -+ if (ioapic->irq_eoi == 100) { > ++ if (ioapic->irq_eoi[i] == 100) { >> + /* >> + * Real hardware does not deliver the irq so >> + * immediately during eoi broadcast, so we need >> + * to emulate this behavior. Otherwise, for >> + * guests who has not registered handler of a >> + * level irq, this irq would be injected >> + * immediately after guest enables interrupt >> + * (which happens usually at the end of the >> + * common interrupt routine). This would lead >> + * guest can't move forward and may miss the >> + * possibility to get proper irq handler >> + * registered. So we need to give some breath to >> + * guest. TODO: 1 is too long? >> + */ >> + schedule_delayed_work(&ioapic->eoi_inject, 1); >> + ioapic->irq_eoi = 0; > -+ ioapic->irq_eoi = 0; > ++ ioapic->irq_eoi[i] = 0; >> + } else { >> + ioapic_service(ioapic, i); >> + } >> + } > ++ else { > ++ ioapic->irq_eoi[i] = 0; > ++ } >> } >> } > I think ioapic->irq_eoi is prone to reach to 100, because during a eoi > broadcast, > it's possible that another interrupt's (not current eoi's corresponding > interrupt) irr is set, so the ioapic->irq_eoi will grow continually, > and not too long, ioapic->irq_eoi will reach to 100. > I want to add "u32 irq_eoi[IOAPIC_NUM_PINS];" instead of "u32 irq_eoi;". > Any ideas? > > Zhang Haoyu
I don't have objection to this per irq counter, but I don't see obvious difference. Btw, the patch has been rejected in the past since we're not sure whether this behaviour is architectural (or ask Intel?). You need convince the Gleb and other kvm maintainers for this idea first :).