Re: [RFC PATCH 15/17] kvm: add dynamic IRQ support
Gregory Haskins wrote:
>> Won't this have scaling issues?  One IRQ means one target vcpu.
>> Whereas I'd like virtio devices to span multiple queues, each queue
>> with its own MSI IRQ.
>
> Hmm... you know I hadn't really thought of it that way, but you have
> a point.
>
> To clarify, my design actually uses one IRQ per "eventq", where we
> can have an arbitrary number of eventqs defined (note: today I only
> define one eventq, however).  An eventq is actually a shm-ring
> construct where I can pass events up to the guest like "device added"
> or "ring X signaled".  Each individual device-based virtio-ring would
> then aggregate "signal" events onto this eventq mechanism to actually
> inject events to the guest.  Only the eventq itself injects an actual
> IRQ to the assigned vcpu.

You will get cachelines bounced around when events from different
devices are added to the queue.

On the plus side, a single injection can contain interrupts for
multiple devices.  I'm not sure how useful this coalescing is;
certainly you will never see it on microbenchmarks, but that doesn't
mean it's not useful.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
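To make the shm-ring idea concrete, here is a minimal guest-side
consumer sketch.  All names, fields, and the ring layout are invented
for illustration (they are not from the vbus patches), and memory
barriers are omitted for brevity:

    #include <stdint.h>

    /* One entry per device-level "signal" event. */
    struct eventq_entry {
            uint32_t type;      /* e.g. "device added", "ring X signaled" */
            uint32_t source;    /* which device/ring raised it */
    };

    /* Shared-memory ring; host produces, guest consumes. */
    struct eventq {
            volatile uint32_t head;     /* producer index (host) */
            uint32_t tail;              /* consumer index (guest) */
            uint32_t size;              /* entry count, power of two */
            struct eventq_entry ring[];
    };

    extern void dispatch_event(uint32_t type, uint32_t source);

    /*
     * Guest handler for the eventq's single IRQ: drain everything
     * pending, so one injection can coalesce signals from many
     * devices and no per-device scanning is needed.
     */
    void eventq_irq_handler(struct eventq *evq)
    {
            while (evq->tail != evq->head) {
                    struct eventq_entry *e =
                            &evq->ring[evq->tail & (evq->size - 1)];

                    dispatch_event(e->type, e->source);
                    evq->tail++;
            }
    }

Because the handler drains the ring until tail catches up with head, a
single injected IRQ can deliver signals queued by any number of
devices, which is the coalescing behavior discussed above.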
Re: [RFC PATCH 15/17] kvm: add dynamic IRQ support
Avi Kivity wrote:
> Gregory Haskins wrote:
>>> - works with all guests
>>> - supports hotplug/hotunplug, udev, sysfs, module autoloading, ...
>>> - supported in all OSes
>>> - someone else maintains it
>>>
>> These points are all valid, and I really struggled with this
>> particular part of the design.  The entire vbus design only requires
>> one IRQ for the entire guest,
>
> Won't this have scaling issues?  One IRQ means one target vcpu.
> Whereas I'd like virtio devices to span multiple queues, each queue
> with its own MSI IRQ.

Hmm... you know I hadn't really thought of it that way, but you have a
point.

To clarify, my design actually uses one IRQ per "eventq", where we can
have an arbitrary number of eventqs defined (note: today I only define
one eventq, however).  An eventq is actually a shm-ring construct where
I can pass events up to the guest like "device added" or "ring X
signaled".  Each individual device-based virtio-ring would then
aggregate "signal" events onto this eventq mechanism to actually inject
events to the guest.  Only the eventq itself injects an actual IRQ to
the assigned vcpu.

My intended use of multiple eventqs was for prioritization of different
rings.  For instance, we could define 8 priority levels, each with its
own ring/irq.  That way, a virtio-net that supports something like
802.1p could define 8 virtio-rings, one for each priority level.

But this scheme is more targeted at prioritization than per-vcpu
irq-balancing.  I suppose the eventq construct I proposed could still
be used in this fashion since each has its own routable IRQ.  However,
I would have to think about that some more because it is beyond the
design spec.

The good news is that the decision to use the "eventq+irq" approach is
completely contained in the kvm-host+guest.patch.  We could easily
switch to a 1:1 irq:shm-signal if we wanted to, and the device/drivers
would work exactly the same without modification.

> Also, the single IRQ handler will need to scan for all potential IRQ
> sources.  Even if implemented carefully, this will cause many
> cacheline bounces.

Well, no, I think this part is covered.  As mentioned above, we use a
queuing technique so there is no scanning needed.  Ultimately I would
love to adapt a similar technique to optionally replace the LAPIC.
That way we can avoid the EOI trap and just consume the next interrupt
(if applicable) from the shm-ring.

>> so it's conceivable that I could present a simple "dummy" PCI device
>> with some "VBUS" type PCI-ID, just to piggyback on the IRQ routing
>> logic.  Then userspace could simply pass the IRQ routing info down
>> to the kernel with an ioctl, or something similar.
>
> Xen does something similar, I believe.
>
>> I think ultimately I was trying to stay away from PCI in general
>> because I want to support environments that do not have PCI.
>> However, for the kvm-transport case (at least on x86) this isn't
>> really a constraint.
>
> s/PCI/the native IRQ solution for your platform/.  virtio has the
> same problem; on s390 we use the native (if that word ever applies to
> s390) interrupt and device discovery mechanism.

Yeah, I agree.  We can contain the "exposure" of PCI to just platforms
within KVM that care about it.

-Greg
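As a concrete reading of the prioritization scheme described above
(only a sketch that assumes one eventq per priority level; every name
here is invented, building on the eventq structure sketched earlier):

    #include <stdint.h>

    #define EVQ_NR_PRIO 8       /* one eventq per 802.1p priority level */

    struct eventq;              /* shm-ring, as sketched earlier */

    /* Hypothetical per-guest table: one eventq -- and therefore one
     * routable IRQ -- per priority level. */
    struct eventq_table {
            struct eventq *evq[EVQ_NR_PRIO];
    };

    /* A virtio-net-style driver would signal the eventq matching the
     * packet's 802.1p priority (pcp = 0..7), so high-priority traffic
     * never queues behind bulk traffic on a shared ring. */
    struct eventq *eventq_for_priority(struct eventq_table *t, uint8_t pcp)
    {
            return t->evq[pcp & (EVQ_NR_PRIO - 1)];
    }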
Re: [RFC PATCH 15/17] kvm: add dynamic IRQ support
Gregory Haskins wrote:
>> - works with all guests
>> - supports hotplug/hotunplug, udev, sysfs, module autoloading, ...
>> - supported in all OSes
>> - someone else maintains it
>
> These points are all valid, and I really struggled with this
> particular part of the design.  The entire vbus design only requires
> one IRQ for the entire guest,

Won't this have scaling issues?  One IRQ means one target vcpu.
Whereas I'd like virtio devices to span multiple queues, each queue
with its own MSI IRQ.

Also, the single IRQ handler will need to scan for all potential IRQ
sources.  Even if implemented carefully, this will cause many
cacheline bounces.

> so it's conceivable that I could present a simple "dummy" PCI device
> with some "VBUS" type PCI-ID, just to piggyback on the IRQ routing
> logic.  Then userspace could simply pass the IRQ routing info down to
> the kernel with an ioctl, or something similar.

Xen does something similar, I believe.

> I think ultimately I was trying to stay away from PCI in general
> because I want to support environments that do not have PCI.
> However, for the kvm-transport case (at least on x86) this isn't
> really a constraint.

s/PCI/the native IRQ solution for your platform/.  virtio has the same
problem; on s390 we use the native (if that word ever applies to s390)
interrupt and device discovery mechanism.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: [RFC PATCH 15/17] kvm: add dynamic IRQ support
Avi Kivity wrote:
> Gregory Haskins wrote:
>> This patch provides the ability to dynamically declare and map an
>> interrupt-request handle to an x86 8-bit vector.
>>
>> Problem Statement: Emulated devices (such as PCI, ISA, etc) have
>> interrupt routing done via standard PC mechanisms (MP-table, ACPI,
>> etc).  However, we also want to support a new class of devices
>> which exist in a new virtualized namespace and therefore should
>> not try to piggyback on these emulated mechanisms.  Rather, we
>> create a way to dynamically register interrupt resources that
>> acts independent of the emulated counterpart.
>>
>> On x86, a simplistic view of the interrupt model is that each core
>> has a local-APIC which can receive messages from APIC-compliant
>> routing devices (such as IO-APIC and MSI) regarding details about
>> an interrupt (such as which vector to raise).  These routing devices
>> are controlled by the OS so they may translate a physical event
>> (such as "e1000: raise an RX interrupt") to a logical destination
>> (such as "inject IDT vector 46 on core 3").  A dynirq is a virtual
>> implementation of such a router (think of it as a virtual-MSI, but
>> without the coupling to an existing standard, such as PCI).
>>
>> The model is simple: A guest OS can allocate the mapping of "IRQ"
>> handle to "vector/core" in any way it sees fit, and provide this
>> information to the dynirq module running in the host.  The assigned
>> IRQ then becomes the sole handle needed to inject an IDT vector
>> to the guest from a host.  A host entity that wishes to raise an
>> interrupt simply needs to call kvm_inject_dynirq(irq) and the
>> routing is performed transparently.
>
> A major disadvantage of dynirq is that it will only work on guests
> which have been ported to it.  So this will only be useful on newer
> Linux, and will likely never work with Windows guests.
>
> Why is having an emulated PCI device so bad?  We found that it has
> several advantages:
> - works with all guests
> - supports hotplug/hotunplug, udev, sysfs, module autoloading, ...
> - supported in all OSes
> - someone else maintains it

These points are all valid, and I really struggled with this particular
part of the design.  The entire vbus design only requires one IRQ for
the entire guest, so it's conceivable that I could present a simple
"dummy" PCI device with some "VBUS" type PCI-ID, just to piggyback on
the IRQ routing logic.  Then userspace could simply pass the IRQ
routing info down to the kernel with an ioctl, or something similar.

Ultimately I wasn't sure whether I wanted all that goo just to get an
IRQ assignment... but on the other hand, we have all this goo to build
one in the first place, and it's half on the guest side which has the
disadvantages you mention.  So perhaps this should go in favor of a
PCI-esque type solution, as I think you are suggesting.

I think ultimately I was trying to stay away from PCI in general
because I want to support environments that do not have PCI.  However,
for the kvm-transport case (at least on x86) this isn't really a
constraint.

> See also the kvm irq routing work, merged into 2.6.30, which does a
> small part of what you're describing (the "sole handle" part,
> specifically).

I will take a look, thanks!

(I wish you had accepted those irq patches I wrote a while back.  It
had the foundation for this type of stuff all built in.  But alas, I
think it was before its time, and I didn't do a good job of explaining
my future plans) ;)

Regards,
-Greg
Re: [RFC PATCH 15/17] kvm: add dynamic IRQ support
Gregory Haskins wrote:
> This patch provides the ability to dynamically declare and map an
> interrupt-request handle to an x86 8-bit vector.
>
> Problem Statement: Emulated devices (such as PCI, ISA, etc) have
> interrupt routing done via standard PC mechanisms (MP-table, ACPI,
> etc).  However, we also want to support a new class of devices
> which exist in a new virtualized namespace and therefore should
> not try to piggyback on these emulated mechanisms.  Rather, we
> create a way to dynamically register interrupt resources that
> acts independent of the emulated counterpart.
>
> On x86, a simplistic view of the interrupt model is that each core
> has a local-APIC which can receive messages from APIC-compliant
> routing devices (such as IO-APIC and MSI) regarding details about
> an interrupt (such as which vector to raise).  These routing devices
> are controlled by the OS so they may translate a physical event
> (such as "e1000: raise an RX interrupt") to a logical destination
> (such as "inject IDT vector 46 on core 3").  A dynirq is a virtual
> implementation of such a router (think of it as a virtual-MSI, but
> without the coupling to an existing standard, such as PCI).
>
> The model is simple: A guest OS can allocate the mapping of "IRQ"
> handle to "vector/core" in any way it sees fit, and provide this
> information to the dynirq module running in the host.  The assigned
> IRQ then becomes the sole handle needed to inject an IDT vector
> to the guest from a host.  A host entity that wishes to raise an
> interrupt simply needs to call kvm_inject_dynirq(irq) and the routing
> is performed transparently.

A major disadvantage of dynirq is that it will only work on guests
which have been ported to it.  So this will only be useful on newer
Linux, and will likely never work with Windows guests.

Why is having an emulated PCI device so bad?  We found that it has
several advantages:
- works with all guests
- supports hotplug/hotunplug, udev, sysfs, module autoloading, ...
- supported in all OSes
- someone else maintains it

See also the kvm irq routing work, merged into 2.6.30, which does a
small part of what you're describing (the "sole handle" part,
specifically).

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
[RFC PATCH 15/17] kvm: add dynamic IRQ support
This patch provides the ability to dynamically declare and map an
interrupt-request handle to an x86 8-bit vector.

Problem Statement: Emulated devices (such as PCI, ISA, etc) have
interrupt routing done via standard PC mechanisms (MP-table, ACPI,
etc).  However, we also want to support a new class of devices
which exist in a new virtualized namespace and therefore should
not try to piggyback on these emulated mechanisms.  Rather, we
create a way to dynamically register interrupt resources that
acts independent of the emulated counterpart.

On x86, a simplistic view of the interrupt model is that each core
has a local-APIC which can receive messages from APIC-compliant
routing devices (such as IO-APIC and MSI) regarding details about
an interrupt (such as which vector to raise).  These routing devices
are controlled by the OS so they may translate a physical event
(such as "e1000: raise an RX interrupt") to a logical destination
(such as "inject IDT vector 46 on core 3").  A dynirq is a virtual
implementation of such a router (think of it as a virtual-MSI, but
without the coupling to an existing standard, such as PCI).

The model is simple: A guest OS can allocate the mapping of "IRQ"
handle to "vector/core" in any way it sees fit, and provide this
information to the dynirq module running in the host.  The assigned
IRQ then becomes the sole handle needed to inject an IDT vector
to the guest from a host.  A host entity that wishes to raise an
interrupt simply needs to call kvm_inject_dynirq(irq) and the routing
is performed transparently.

Signed-off-by: Gregory Haskins
---
 arch/x86/Kconfig                |    5 +
 arch/x86/Makefile               |    3 
 arch/x86/include/asm/kvm_host.h |    9 +
 arch/x86/include/asm/kvm_para.h |   11 +
 arch/x86/kvm/Makefile           |    3 
 arch/x86/kvm/dynirq.c           |  329 +++
 arch/x86/kvm/guest/Makefile     |    2 
 arch/x86/kvm/guest/dynirq.c     |   95 +++
 arch/x86/kvm/x86.c              |    6 +
 include/linux/kvm.h             |    1 
 include/linux/kvm_guest.h       |    7 +
 include/linux/kvm_host.h        |    1 
 include/linux/kvm_para.h        |    1 
 13 files changed, 472 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/kvm/dynirq.c
 create mode 100644 arch/x86/kvm/guest/Makefile
 create mode 100644 arch/x86/kvm/guest/dynirq.c
 create mode 100644 include/linux/kvm_guest.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3fca247..91fefd5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -446,6 +446,11 @@ config KVM_GUEST
 	  This option enables various optimizations for running under
 	  the KVM hypervisor.
 
+config KVM_GUEST_DYNIRQ
+	bool "KVM Dynamic IRQ support"
+	depends on KVM_GUEST
+	default y
+
 source "arch/x86/lguest/Kconfig"
 
 config PARAVIRT

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index d1a47ad..d788815 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -147,6 +147,9 @@ core-$(CONFIG_XEN) += arch/x86/xen/
 # lguest paravirtualization support
 core-$(CONFIG_LGUEST_GUEST) += arch/x86/lguest/
 
+# kvm paravirtualization support
+core-$(CONFIG_KVM_GUEST) += arch/x86/kvm/guest/
+
 core-y += arch/x86/kernel/
 core-y += arch/x86/mm/

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 730843d..9ae398a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -346,6 +346,12 @@ struct kvm_mem_alias {
 	gfn_t target_gfn;
 };
 
+struct kvm_dynirq {
+	spinlock_t lock;
+	struct rb_root map;
+	struct kvm *kvm;
+};
+
 struct kvm_arch{
 	int naliases;
 	struct kvm_mem_alias aliases[KVM_ALIAS_SLOTS];
@@ -363,6 +369,7 @@ struct kvm_arch{
 	struct iommu_domain *iommu_domain;
 	struct kvm_pic *vpic;
 	struct kvm_ioapic *vioapic;
+	struct kvm_dynirq *dynirq;
 	struct kvm_pit *vpit;
 	struct hlist_head irq_ack_notifier_list;
 	int vapics_in_nmi_mode;
@@ -519,6 +526,8 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 			const void *val, int bytes);
 int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
 		  gpa_t addr, unsigned long *ret);
+int kvm_dynirq_hc(struct kvm_vcpu *vcpu, int nr, gpa_t gpa, size_t len);
+void kvm_free_dynirq(struct kvm *kvm);
 
 extern bool tdp_enabled;

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index b8a3305..fba210e 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -13,6 +13,7 @@
 #define KVM_FEATURE_CLOCKSOURCE		0
 #define KVM_FEATURE_NOP_IO_DELAY	1
 #define KVM_FEATURE_MMU_OP		2
+#define KVM_FEATURE_DYNIRQ		3
 
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
@@ -45,6 +46,16 @@ struct kvm_mmu_op_release_pt {
 	__u64 pt_phys;
 };
 
+/* Operations for KVM_HC_DYNIRQ */
+#define KVM_DYNIRQ_OP_SE
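The diff is cut off above mid-definition, so the exact hypercall ABI is
unknown.  Purely as a sketch of how the pieces the changelog names
could fit together, a guest might bind a dynirq handle with something
like the following; the kvm_dynirq_set layout and the KVM_DYNIRQ_OP_SET
constant are guesses (only KVM_HC_DYNIRQ, KVM_FEATURE_DYNIRQ, and the
kvm_dynirq_hc() host hook are visible in the patch):

    #include <linux/types.h>
    #include <linux/errno.h>
    #include <asm/page.h>           /* __pa() */
    #include <asm/kvm_para.h>       /* kvm_para_has_feature(), kvm_hypercall3() */

    /*
     * Hypothetical guest->host descriptor binding a dynirq handle to
     * a vector/vcpu pair; the real operand layout is in the truncated
     * KVM_DYNIRQ_OP_* section above.
     */
    struct kvm_dynirq_set {
            __u32 irq;      /* dynirq handle the guest chose */
            __u32 vec;      /* guest IDT vector to raise */
            __u32 dest;     /* target vcpu */
    };

    #define KVM_DYNIRQ_OP_SET 0     /* hypothetical op number */

    static int dynirq_map(u32 irq, u32 vec, u32 cpu)
    {
            struct kvm_dynirq_set op = {
                    .irq  = irq,
                    .vec  = vec,
                    .dest = cpu,
            };

            if (!kvm_para_has_feature(KVM_FEATURE_DYNIRQ))
                    return -ENODEV;

            /*
             * Mirrors the host-side hook declared in the diff:
             *   kvm_dynirq_hc(vcpu, nr, gpa, len)
             */
            return kvm_hypercall3(KVM_HC_DYNIRQ, KVM_DYNIRQ_OP_SET,
                                  __pa(&op), sizeof(op));
    }

On the host side, once such a mapping exists, raising the interrupt
reduces to the kvm_inject_dynirq(irq) call described in the changelog.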