Gleb Natapov <g...@redhat.com> writes:

> On Sun, Jun 23, 2013 at 10:06:05AM -0500, Anthony Liguori wrote:
>> On Thu, Jun 20, 2013 at 11:46 PM, Alex Williamson
>> <alex.william...@redhat.com> wrote:
>> > On Fri, 2013-06-21 at 12:49 +1000, Alexey Kardashevskiy wrote:
>> >> On 06/21/2013 12:34 PM, Alex Williamson wrote:
>> >>
>> >> Do not follow you, sorry. For x86, is it that MSI routing table which is
>> >> updated via KVM_SET_GSI_ROUTING in KVM? When there is no KVM, what piece of
>> >> code responds on msi_notify() in qemu-x86 and does qemu_irq_pulse()?
>> >
>> > vfio_msi_interrupt->msi[x]_notify->stl_le_phys(msg.address, msg.data)
>> >
>> > This writes directly to the interrupt block on the vCPU. With KVM, the
>> > in-kernel APIC does the same write, where the pin to MSIMessage is setup
>> > by kvm_irqchip_add_msi_route and the pin is pulled by an irqfd.
>>
>> What is this "interrupt block on the vCPU" you speak of? I reviewed
> FEE00000H address as seen from PCI bus is a special address range (see
> 10.11.1 in SDM).

Ack.

> Any write by a PCI device to that address range is
> interpreted as MSI. We do not model this correctly in QEMU yet since
> all devices, including vcpus, see exactly same memory map.

This should be a per-device mapping, yes. But I'm not sure that VCPUs
should even see anything. I don't think a VCPU can generate an MSI
interrupt by writing to this location.

>> the SDM and see nothing in the APIC protocol or the brief description
>> of MSI as a PCI concept that would indicate anything except that the
>> PHB handles MSI writes and feeds them to the I/O APIC.
>>
> I/O APIC? Did you mean APIC, but even that will probably be incorrect.
> I'd say it translates the data to APIC bus message. And with interrupt
> remapping there is more magic happens between MSI and APIC bus.

I think the wording in the SDM allows either.

>> In fact, the wikipedia article on MSI has:
>>
>> "A common misconception with Message Signaled Interrupts is that they
>> allow the device to send data to a processor as part of the interrupt.
>> The data that is sent as part of the write is used by the chipset to
>> determine which interrupt to trigger on which processor; it is not
>> available for the device to communicate additional information to the
>> interrupt handler."
>>
> Not sure who claimed otherwise.

So to summarize:

1) MSI writes are intercepted by the PHB, which generates an appropriate
   IRQ.

2) The PHB has a tuple of (src device, address, data) plus whatever
   information it maintains to do the translation.

3) On Power, we can have multiple PHBs.

4) The kernel interface assumes a single flat table mapping (address,
   data) to interrupts. We try to keep that table up-to-date in QEMU.

5) The reason the kernel has MSI info at all is to allow for IRQFDs to
   generate MSI interrupts.

Is there anything that prevents us from using IRQFDs corresponding to
the target of an MSI mapping and getting rid of the MSI info in the
kernel?

It seems like the only sane way to actually support (2) and (3).
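
To make the mismatch between (2)/(3) and (4) concrete, here is a rough
sketch in C. It is only an illustration: none of these structs,
functions, or values exist in QEMU or KVM, and the requester IDs,
address, and IRQ numbers are made up. It assumes two PHBs whose devices
happen to be programmed with the same (address, data) pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model for discussion only; not QEMU or KVM code. */

/* Point (4): one flat table keyed only by (address, data). */
struct flat_route {
    uint64_t address;
    uint32_t data;
    int irq;
};

/* Point (2): a per-PHB entry is also keyed by the source device. */
struct phb_route {
    uint16_t src_dev;   /* made-up requester ID encoding */
    uint64_t address;
    uint32_t data;
    int irq;
};

/* Point (3): each PHB owns its own translation state. */
struct phb {
    const struct phb_route *routes;
    size_t nr_routes;
};

/* Flat lookup: returns the IRQ for (address, data), or -1 if unmapped. */
static int flat_lookup(const struct flat_route *table, size_t n,
                       uint64_t address, uint32_t data)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].address == address && table[i].data == data)
            return table[i].irq;
    }
    return -1;
}

/* Per-PHB lookup: the source device takes part in the match. */
static int phb_lookup(const struct phb *phb, uint16_t src_dev,
                      uint64_t address, uint32_t data)
{
    for (size_t i = 0; i < phb->nr_routes; i++) {
        const struct phb_route *r = &phb->routes[i];
        if (r->src_dev == src_dev && r->address == address &&
            r->data == data)
            return r->irq;
    }
    return -1;
}

/* Made-up example data: same (address, data) behind two different PHBs. */
static const struct phb_route phb0_routes[] = { { 0x0100, 0x1000ull, 0x41, 5 } };
static const struct phb_route phb1_routes[] = { { 0x0200, 0x1000ull, 0x41, 9 } };
static const struct phb phb0 = { phb0_routes, 1 };
static const struct phb phb1 = { phb1_routes, 1 };

/* The flat table has room for only one meaning of (0x1000, 0x41). */
static const struct flat_route flat_table[] = { { 0x1000ull, 0x41, 5 } };

void msi_route_selfcheck(void)
{
    /* Each PHB resolves the same write to its own IRQ. */
    assert(phb_lookup(&phb0, 0x0100, 0x1000ull, 0x41) == 5);
    assert(phb_lookup(&phb1, 0x0200, 0x1000ull, 0x41) == 9);
    /* The flat table cannot tell the two sources apart. */
    assert(flat_lookup(flat_table, 1, 0x1000ull, 0x41) == 5);
}
```

In this toy model the flat table can only ever hand back one IRQ for
the shared (address, data) pair, which is the ambiguity that going
through an IRQFD per MSI target would sidestep.
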

Regards,

Anthony Liguori