Alex, Michael,

Thank you for the clarification.
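
To check my understanding of the flow Alex describes in the quoted message below, here is a toy Python model of a single level-triggered line. It is not KVM code, and the class and method names are made up for illustration. The only idea taken from the thread is that the line stays asserted until the guest's EOI is observed, and only then can it be de-asserted.

```python
class LevelTriggeredLine:
    """Toy model of one level-triggered interrupt line (hypothetical, not a KVM API)."""

    def __init__(self):
        self.asserted = False    # level currently driven on the line
        self.in_service = False  # guest took the interrupt and has not sent EOI yet
        self.delivered = 0       # how many times the interrupt reached the guest

    def _maybe_inject(self):
        # Level semantics: deliver whenever the line is high and the guest
        # is not already servicing a previous delivery.
        if self.asserted and not self.in_service:
            self.in_service = True
            self.delivered += 1

    def assert_line(self):
        self.asserted = True
        self._maybe_inject()

    def deassert_line(self):
        self.asserted = False

    def guest_eoi(self, device_still_pending):
        # Models the EOI notification coming back from the interrupt controller.
        # Whoever asserted the line needs this event to know when to lower it.
        self.in_service = False
        if not device_still_pending:
            self.deassert_line()
        # If the line is still high, a level-triggered interrupt fires again.
        self._maybe_inject()
```

In this model, calling assert_line() twice before an EOI delivers only one interrupt, and an EOI while the device is still pending delivers it again. If nothing ever calls deassert_line(), every EOI re-injects the interrupt, which is how I read the need for the EOI hook.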
On Tue, Mar 15, 2011 at 1:01 AM, Alex Williamson <alex.william...@redhat.com> wrote:
> On Mon, 2011-03-14 at 21:00 +0200, Michael S. Tsirkin wrote:
> > On Mon, Mar 14, 2011 at 10:35:08PM +0530, rukhsana ansari wrote:
> > > Seeking clarification to the original question I posted:
> > >
> > > > This may be a novice question - Would appreciate it if you can
> > > > provide a pointer to documentation or relevant code that explains
> > > > what is the limitation in supporting level irq support in kvm irqfd.
> > >
> > > After browsing the KVM kernel code, it does look like direct assignment
> > > of PCI devices allows support for level-triggered interrupts to be
> > > injected to the guest from the kernel (as opposed to not supporting it
> > > for the vhost irqfd mechanism).
> > > This occurs when the guest device supports INTX.
> > > Reference: kvm_assigned_dev_interrupt_work_handler() in assigned-dev.c
> > > calls kvm_set_irq() with the guest_irq.
> > > This function in turn invokes the assigned set function (either
> > > kvm_set_pic_irq or kvm_set_ioapic_irq) which was set up at kvm_irq_chip
> > > creation time when kvm_setup_default_irq_routing() was called for
> > > handling ioctl KVM_CREATE_IRQCHIP.
> > >
> > > So, it isn't clear why level-triggered interrupts aren't supported for
> > > the irqfd mechanism.
> > > Would greatly appreciate clarification here.
> > >
> > > Thanks
> > > -Rukhsana
> >
> > Mostly, no one came up with an implementation so far.
> >
> > If the point is to use irqfd with vhost-net, there's also
> > a question of adding interfaces to
> > 1. pass IO read transactions directly to another kernel module
> > 2. add an interface to clear the irq level
> >
> > Maybe the right thing is to combine the two somehow:
> > irqfd might get an option to set a bit in memory,
> > ioeventfd might get an option to read and clear from memory
> > and clear irqfd line at the same time.
>
> I had wanted this for VFIO too and it gets pretty complicated.  The
> first problem with level triggered interrupts is that you need to know
> which GSI your device triggers.  This means translating PCI INTA through
> bridge swizzles and chipset mapping to an IOAPIC.  Current device
> assignment does this through a complete hack in qemu.  Then you can set
> the IRQ, but being level triggered, we need to know when the guest has
> serviced the IRQ so we can de-assert it.  This requires a hook into the
> in-kernel APIC to send the EOI back out to userspace.
>
> I posted RFC patches for doing all this a while back, but they didn't go
> anywhere.  I think the feeling was that it was too intrusive for "slow"
> interrupts.  The current thinking for VFIO based device assignment is to
> use qemu for level interrupts until we find something that actually
> needs low latency in this path.  We generally consider INTx to be like
> supporting i/o port space or non-4k BARs, i.e. necessary for
> compatibility, but not necessarily a performance path.  High performance
> devices should always be using some kind of MSI because it bypasses all
> of the APIC complications and slowness.  Thanks,
>
> Alex

--
-Rukhsana