On Mon, Apr 25, 2016 at 09:24:12AM +0200, Jan Kiszka wrote:
> On 2016-04-25 09:18, Peter Xu wrote:
> > On Mon, Apr 25, 2016 at 07:16:19AM +0200, Jan Kiszka wrote:
> >> On 2016-04-19 10:38, Peter Xu wrote:
> >
> > [...]
> >
> >>> By default, IR is disabled to be better compatible with current
> >>> QEMU. To enable IR, we can using the following command to boot a
> >>> IR-supported VM with virtio-net device with vhost (still do not
> >>> support kvm-ioapic, so we need to specify kernel-irqchip={split|off}
> >>> here):
> >>>
> >>> $ qemu-system-x86_64 -M q35,iommu=on,intr=on,kernel-irqchip=split \
> >>
> >> "intr" sounds a bit too much like "interrupt", not "interrupt
> >> remapping". Why not use the kernel's form, "intremap"?
> >
> > Sure. It sounds nice to be aligned with the kernel one. Let me take
> > it in v5.
> >
> >>
> >>> -enable-kvm -m 1024 \
> >>> -netdev tap,id=net0,vhost=on \
> >>> -device virtio-net-pci,netdev=user.0 \
> >>> -monitor telnet::3333,server,nowait \
> >>> /var/lib/libvirt/images/vm1.qcow2
> >>>
> >>> When guest boots, we can verify whether IR enabled by grepping the
> >>> dmesg like:
> >>>
> >>> [root@localhost ~]# journalctl -k | grep "DMAR-IR"
> >>> Feb 19 11:21:23 localhost.localdomain kernel: DMAR-IR: IOAPIC id 0 under
> >>> DRHD base 0xfed90000 IOMMU 0
> >>> Feb 19 11:21:23 localhost.localdomain kernel: DMAR-IR: Enabled IRQ
> >>> remapping in xapic mode
> >>>
> >>> Currently supported devices:
> >>>
> >>> - Emulated/Splitted irqchip
> >>> - Generic PCI Devices
> >>> - vhost devices
> >>> - pass through device support? Not tested, but suppose it should work.
> >>
> >> I've tested this series against my Jailhouse setup, and it works pretty
> >> well! Actually considering to move my test setup over this branch.
> >
> > This is really encouraging feedback! Btw, thanks for all kinds of
> > help on this patchset. :-)
> >
> >>
> >> However, split irqchip still has some issues: When I boot a q35 machine
> >> with Linux, the e1000 network adapter only gets a single IRQ delivered.
> >> Interestingly, other IOAPIC IRQs like the keyboard work all the time. I
> >> didn't debug this in details yet.
> >
> > I reproduced this problem. It seems that it fails even with
> > kernel-irqchip=off. Will try to dig it out.
>
> Very good. Hope it can be easily fixed.

Hi, Jan,

The above issue is probably caused by missing EOIs for level-triggered
interrupts. Until now I had only tested with edge-triggered interrupts,
so I did not hit this one.

Could you please try the patch below? It applies directly on top of the
series and should solve the issue (it works on my test VM, and I'll take
it into v5 as well if it also works for you):

-------------------------

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index b41ab89..de6a8cf 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -281,6 +281,36 @@ ioapic_mem_read(void *opaque, hwaddr addr, unsigned int size)
     return val;
 }
 
+/*
+ * This is to satisfy the hack in Linux kernel. One hack of it is to
+ * simulate clearing the Remote IRR bit of IOAPIC entry using the
+ * following:
+ *
+ * "For IO-APIC's with EOI register, we use that to do an explicit EOI.
+ * Otherwise, we simulate the EOI message manually by changing the trigger
+ * mode to edge and then back to level, with RTE being masked during
+ * this."
+ *
+ * (See linux kernel __eoi_ioapic_pin() comment in commit c0205701)
+ *
+ * This is based on the assumption that, Remote IRR bit will be
+ * cleared by IOAPIC hardware for edge-triggered interrupts (I
+ * believe that's what the IOAPIC version 0x1X hardware does). So
+ * if we are emulating it, we'd better do it the same here, so that
+ * the guest kernel hack will work as well on QEMU.
+ *
+ * Without this, level-triggered interrupts in IR mode might fail to
+ * work correctly.
+ */
+static inline void
+ioapic_fix_edge_remote_irr(uint64_t *entry)
+{
+    if (*entry & IOAPIC_LVT_TRIGGER_MODE) {
+        /* Level triggered interrupts, make sure remote IRR is zero */
+        *entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);
+    }
+}
+
 static void
 ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
                  unsigned int size)
@@ -314,6 +344,7 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
                     s->ioredtbl[index] &= ~0xffffffffULL;
                     s->ioredtbl[index] |= val;
                 }
+                ioapic_fix_edge_remote_irr(&s->ioredtbl[index]);
                 ioapic_service(s);
             }
         }

------------------------

I am still looking into the guest-side code. Although the above patch
should solve the issue, there are still problems in the guest code when
IR is enabled:

- The "vector" in the IOAPIC entry and the one in the IRTE do not match
  (matching is required by VT-d spec 5.1.5.1, and I guess it is also
  required to deliver EOI broadcasts correctly). See
  intel_irq_remapping_prepare_irte():

    ...
    /*
     * IO-APIC RTE will be configured with virtual vector.
     * irq handler will do the explicit EOI to the io-apic.
     */
    entry->vector = info->ioapic_pin;
    ...

- Level-triggered entries in the IOAPIC end up marked as edge-triggered
  interrupts in the APIC (which is strange)... This also affects correct
  delivery of EOI broadcasts. I still need time to figure out why.

If EOI broadcast worked, the e1000 issue would be solved as well, even
without the above patch.

[...]

> >
> >>
> >>> - IR fault reporting
> >>
> >> Would be welcome! I found a "test case" yesterday: misconfigured IOAPIC
> >> ID blocked its IRQs under Jailhouse, and I first had to enable tracing
> >> to realize it ;).
> >
> > Yes, it sounds nice to have guest side feedback on IR faults. Will
> > do more reading, and see whether I can add one more patch in v5 to
> > do this.
>
> It's not a must-have for getting things merged. In fact, any additional
> feature that could now delay the merge of what you have should rather
> wait. Stabilizing, addressing style and structure comments is more
> important IMO.

Okay, then let me add this to my todo list, and I will pick it up when
I get time.

Thanks,

-- peterx