Re: [RFC PATCH 15/17] kvm: add dynamic IRQ support

2009-03-31 Thread Avi Kivity

Gregory Haskins wrote:
>> Won't this have scaling issues?  One IRQ means one target vcpu.
>> Whereas I'd like virtio devices to span multiple queues, each queue
>> with its own MSI IRQ.
>
> Hmm..you know I hadn't really thought of it that way, but you have a
> point.  To clarify, my design actually uses one IRQ per "eventq", where
> we can have an arbitrary number of eventqs defined (note: today I only
> define one eventq, however).  An eventq is actually a shm-ring construct
> where I can pass events up to the host like "device added" or "ring X
> signaled".  Each individual device-based virtio-ring would then
> aggregate "signal" events onto this eventq mechanism to actually inject
> events to the host.  Only the eventq itself injects an actual IRQ to the
> assigned vcpu.

You will get cachelines bounced around when events from different
devices are added to the queue.  On the plus side, a single injection
can contain interrupts for multiple devices.


I'm not sure how useful this coalescing is; certainly you will never see 
it on microbenchmarks, but that doesn't mean it's not useful.



--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 15/17] kvm: add dynamic IRQ support

2009-03-31 Thread Gregory Haskins
Avi Kivity wrote:
> Gregory Haskins wrote:
>>> - works with all guests
>>> - supports hotplug/hotunplug, udev, sysfs, module autoloading, ...
>>> - supported in all OSes
>>> - someone else maintains it
>>> 
>> These points are all valid, and I really struggled with this particular
>> part of the design.  The entire vbus design only requires one IRQ for
>> the entire guest,
>
> Won't this have scaling issues?  One IRQ means one target vcpu. 
> Whereas I'd like virtio devices to span multiple queues, each queue
> with its own MSI IRQ.
Hmm..you know I hadn't really thought of it that way, but you have a
point.  To clarify, my design actually uses one IRQ per "eventq", where
we can have an arbitrary number of eventqs defined (note: today I only
define one eventq, however).  An eventq is actually a shm-ring construct
where I can pass events up to the host like "device added" or "ring X
signaled".  Each individual device-based virtio-ring would then
aggregate "signal" events onto this eventq mechanism to actually inject
events to the host.  Only the eventq itself injects an actual IRQ to the
assigned vcpu.
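
The aggregation described above might look roughly like the sketch below. It is a minimal, single-threaded illustration and not code from the vbus patches: the names (eventq_push, eventq_pop, EVENTQ_SIZE, the irq_count counter that stands in for a real injection) are all hypothetical, and a real shared-memory ring would also need memory barriers and a race-free way to re-arm the interrupt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EVENTQ_SIZE 8u   /* hypothetical ring size; must be a power of two */

enum eventq_type {
	EVENTQ_DEVADD = 1,   /* "device added" */
	EVENTQ_SIGNAL = 2,   /* "ring X signaled" */
};

struct eventq_entry {
	uint32_t type;
	uint32_t id;         /* device or ring identifier */
};

struct eventq {
	struct eventq_entry ring[EVENTQ_SIZE];
	uint32_t head;          /* producer index, free-running */
	uint32_t tail;          /* consumer index, free-running */
	int irq_pending;        /* an IRQ was raised and not yet fully serviced */
	unsigned int irq_count; /* IRQs raised so far (stand-in for injection) */
};

/* Producer: queue one event; raise the IRQ only if none is outstanding. */
static int eventq_push(struct eventq *q, uint32_t type, uint32_t id)
{
	assert(q != NULL);
	if (q->head - q->tail == EVENTQ_SIZE)
		return -1;                   /* ring full */
	q->ring[q->head % EVENTQ_SIZE] = (struct eventq_entry){ type, id };
	q->head++;
	if (!q->irq_pending) {
		q->irq_pending = 1;
		q->irq_count++;
	}
	return 0;
}

/* Consumer: take the next event in order; returns 1 if one was copied out. */
static int eventq_pop(struct eventq *q, struct eventq_entry *out)
{
	assert(q != NULL && out != NULL);
	if (q->tail == q->head) {
		q->irq_pending = 0;          /* drained: next push raises a new IRQ */
		return 0;
	}
	*out = q->ring[q->tail % EVENTQ_SIZE];
	q->tail++;
	return 1;
}
```

In this sketch the handler simply calls eventq_pop() until it returns 0, so events from several rings queued before the handler runs share a single interrupt, and nothing has to be scanned.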

My intended use of multiple eventqs was for prioritization of different
rings.  For instance, we could define 8 priority levels, each with its
own ring/irq.  That way, a virtio-net that supports something like
802.1p could define 8 virtio-rings, one for each priority level.
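
As a rough illustration of the 8-level idea, the sketch below picks a ring from the 802.1p priority bits of a frame's VLAN tag. The struct and function names are hypothetical, and the identity mapping from priority value to ring index is an assumption; a real driver might remap the levels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_PRIO 8   /* one ring/IRQ pair per 802.1p priority value */

struct prio_ring {
	int irq;                 /* hypothetical IRQ routed for this level */
	unsigned long signals;   /* events accounted to this level */
};

/* The 802.1Q tag control field carries the priority in its top 3 bits. */
static unsigned int tci_to_prio(uint16_t tci)
{
	return (tci >> 13) & 0x7u;
}

/* Pick the ring for a frame; priority value == ring index is an assumption. */
static struct prio_ring *ring_for_tci(struct prio_ring *rings, uint16_t tci)
{
	assert(rings != NULL);
	return &rings[tci_to_prio(tci)];
}
```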

But this scheme is more targeted at prioritization than per-vcpu
irq-balancing.  I suppose the eventq construct I proposed could still be
used in this fashion since each has its own routable IRQ.  However, I
would have to think about that some more because it is beyond the design
spec.

The good news is that the decision to use the "eventq+irq" approach is
completely contained in the kvm-host+guest.patch.  We could easily
switch to a 1:1 irq:shm-signal if we wanted to, and the device/drivers
would work exactly the same without modification.

>   Also, the single IRQ handler will need to scan for all potential IRQ
> sources.  Even if implemented carefully, this will cause many
> cacheline bounces.
Well, no, I think this part is covered.  As mentioned above, we use a
queuing technique so there is no scanning needed.  Ultimately I would
love to adapt a similar technique to optionally replace the LAPIC.  That
way we can avoid the EOI trap and just consume the next interrupt (if
applicable) from the shm-ring.

>
>>  so it's conceivable that I could present a simple
>> "dummy" PCI device with some "VBUS" type PCI-ID, just to piggy back on
>> the IRQ routing logic.  Then userspace could simply pass the IRQ routing
>> info down to the kernel with an ioctl, or something similar.
>>   
>
> Xen does something similar, I believe.
>
>> I think ultimately I was trying to stay away from PCI in general because
>> I want to support environments that do not have PCI.  However, for the
>> kvm-transport case (at least on x86) this isn't really a constraint.
>>
>>   
>
> s/PCI/the native IRQ solution for your platform/. virtio has the same
> problem; on s390 we use the native (if that word ever applies to s390)
> interrupt and device discovery mechanism.

Yeah, I agree.  We can contain the "exposure" of PCI to just platforms
within KVM that care about it.

-Greg






Re: [RFC PATCH 15/17] kvm: add dynamic IRQ support

2009-03-31 Thread Avi Kivity

Gregory Haskins wrote:
>> - works with all guests
>> - supports hotplug/hotunplug, udev, sysfs, module autoloading, ...
>> - supported in all OSes
>> - someone else maintains it
>
> These points are all valid, and I really struggled with this particular
> part of the design.  The entire vbus design only requires one IRQ for
> the entire guest,

Won't this have scaling issues?  One IRQ means one target vcpu.  Whereas
I'd like virtio devices to span multiple queues, each queue with its own
MSI IRQ.  Also, the single IRQ handler will need to scan for all
potential IRQ sources.  Even if implemented carefully, this will cause
many cacheline bounces.

> so it's conceivable that I could present a simple
> "dummy" PCI device with some "VBUS" type PCI-ID, just to piggy back on
> the IRQ routing logic.  Then userspace could simply pass the IRQ routing
> info down to the kernel with an ioctl, or something similar.

Xen does something similar, I believe.

> I think ultimately I was trying to stay away from PCI in general because
> I want to support environments that do not have PCI.  However, for the
> kvm-transport case (at least on x86) this isn't really a constraint.

s/PCI/the native IRQ solution for your platform/. virtio has the same
problem; on s390 we use the native (if that word ever applies to s390)
interrupt and device discovery mechanism.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC PATCH 15/17] kvm: add dynamic IRQ support

2009-03-31 Thread Gregory Haskins
Avi Kivity wrote:
> Gregory Haskins wrote:
>> This patch provides the ability to dynamically declare and map an
>> interrupt-request handle to an x86 8-bit vector.
>>
>> Problem Statement: Emulated devices (such as PCI, ISA, etc) have
>> interrupt routing done via standard PC mechanisms (MP-table, ACPI,
>> etc).  However, we also want to support a new class of devices
>> which exist in a new virtualized namespace and therefore should
>> not try to piggyback on these emulated mechanisms.  Rather, we
>> create a way to dynamically register interrupt resources that
>> acts independently of the emulated counterpart.
>>
>> On x86, a simplistic view of the interrupt model is that each core
>> has a local-APIC which can receive messages from APIC-compliant
>> routing devices (such as IO-APIC and MSI) regarding details about
>> an interrupt (such as which vector to raise).  These routing devices
>> are controlled by the OS so they may translate a physical event
>> (such as "e1000: raise an RX interrupt") to a logical destination
>> (such as "inject IDT vector 46 on core 3").  A dynirq is a virtual
>> implementation of such a router (think of it as a virtual-MSI, but
>> without the coupling to an existing standard, such as PCI).
>>
>> The model is simple: A guest OS can allocate the mapping of "IRQ"
>> handle to "vector/core" in any way it sees fit, and provide this
>> information to the dynirq module running in the host.  The assigned
>> IRQ then becomes the sole handle needed to inject an IDT vector
>> to the guest from a host.  A host entity that wishes to raise an
>> interrupt simply needs to call kvm_inject_dynirq(irq) and the routing
>> is performed transparently.
>>   
>
> A major disadvantage of dynirq is that it will only work on guests
> which have been ported to it.  So this will only be useful on newer
> Linux, and will likely never work with Windows guests.
>
> Why is having an emulated PCI device so bad?  We found that it has
> several advantages:
> - works with all guests
> - supports hotplug/hotunplug, udev, sysfs, module autoloading, ...
> - supported in all OSes
> - someone else maintains it
These points are all valid, and I really struggled with this particular
part of the design.  The entire vbus design only requires one IRQ for
the entire guest, so it's conceivable that I could present a simple
"dummy" PCI device with some "VBUS" type PCI-ID, just to piggy back on
the IRQ routing logic.  Then userspace could simply pass the IRQ routing
info down to the kernel with an ioctl, or something similar.

Ultimately I wasn't sure whether I wanted all that goo just to get an
IRQ assignment...but on the other hand, we have all this goo to build
one in the first place, and it's half on the guest side which has the
disadvantages you mention.  So perhaps this should go in favor of a
PCI-esque type solution, as I think you are suggesting.

I think ultimately I was trying to stay away from PCI in general because
I want to support environments that do not have PCI.  However, for the
kvm-transport case (at least on x86) this isn't really a constraint.

>
> See also the kvm irq routing work, merged into 2.6.30, which does a
> small part of what you're describing (the "sole handle" part,
> specifically).

I will take a look, thanks!

(I wish you had accepted those irq patches I wrote a while back.
It had the foundation for this type of stuff all built in.  But alas, I
think it was before its time, and I didn't do a good job of explaining
my future plans) ;)

Regards,
-Greg








Re: [RFC PATCH 15/17] kvm: add dynamic IRQ support

2009-03-31 Thread Avi Kivity

Gregory Haskins wrote:
> This patch provides the ability to dynamically declare and map an
> interrupt-request handle to an x86 8-bit vector.
>
> Problem Statement: Emulated devices (such as PCI, ISA, etc) have
> interrupt routing done via standard PC mechanisms (MP-table, ACPI,
> etc).  However, we also want to support a new class of devices
> which exist in a new virtualized namespace and therefore should
> not try to piggyback on these emulated mechanisms.  Rather, we
> create a way to dynamically register interrupt resources that
> acts independently of the emulated counterpart.
>
> On x86, a simplistic view of the interrupt model is that each core
> has a local-APIC which can receive messages from APIC-compliant
> routing devices (such as IO-APIC and MSI) regarding details about
> an interrupt (such as which vector to raise).  These routing devices
> are controlled by the OS so they may translate a physical event
> (such as "e1000: raise an RX interrupt") to a logical destination
> (such as "inject IDT vector 46 on core 3").  A dynirq is a virtual
> implementation of such a router (think of it as a virtual-MSI, but
> without the coupling to an existing standard, such as PCI).
>
> The model is simple: A guest OS can allocate the mapping of "IRQ"
> handle to "vector/core" in any way it sees fit, and provide this
> information to the dynirq module running in the host.  The assigned
> IRQ then becomes the sole handle needed to inject an IDT vector
> to the guest from a host.  A host entity that wishes to raise an
> interrupt simply needs to call kvm_inject_dynirq(irq) and the routing
> is performed transparently.


A major disadvantage of dynirq is that it will only work on guests which 
have been ported to it.  So this will only be useful on newer Linux, and 
will likely never work with Windows guests.


Why is having an emulated PCI device so bad?  We found that it has 
several advantages:

- works with all guests
- supports hotplug/hotunplug, udev, sysfs, module autoloading, ...
- supported in all OSes
- someone else maintains it

See also the kvm irq routing work, merged into 2.6.30, which does a 
small part of what you're describing (the "sole handle" part, specifically).


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



[RFC PATCH 15/17] kvm: add dynamic IRQ support

2009-03-31 Thread Gregory Haskins
This patch provides the ability to dynamically declare and map an
interrupt-request handle to an x86 8-bit vector.

Problem Statement: Emulated devices (such as PCI, ISA, etc) have
interrupt routing done via standard PC mechanisms (MP-table, ACPI,
etc).  However, we also want to support a new class of devices
which exist in a new virtualized namespace and therefore should
not try to piggyback on these emulated mechanisms.  Rather, we
create a way to dynamically register interrupt resources that
acts independently of the emulated counterpart.

On x86, a simplistic view of the interrupt model is that each core
has a local-APIC which can receive messages from APIC-compliant
routing devices (such as IO-APIC and MSI) regarding details about
an interrupt (such as which vector to raise).  These routing devices
are controlled by the OS so they may translate a physical event
(such as "e1000: raise an RX interrupt") to a logical destination
(such as "inject IDT vector 46 on core 3").  A dynirq is a virtual
implementation of such a router (think of it as a virtual-MSI, but
without the coupling to an existing standard, such as PCI).

The model is simple: A guest OS can allocate the mapping of "IRQ"
handle to "vector/core" in any way it sees fit, and provide this
information to the dynirq module running in the host.  The assigned
IRQ then becomes the sole handle needed to inject an IDT vector
to the guest from a host.  A host entity that wishes to raise an
interrupt simply needs to call kvm_inject_dynirq(irq) and the routing
is performed transparently.
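
The model above can be pictured with a small standalone sketch. This is an illustration only and not the code in this patch: the patch keeps its map in an rb-tree under a spinlock (see struct kvm_dynirq), while the sketch assumes a fixed-size table, and dynirq_set_route() and dynirq_resolve() are hypothetical names. The only call named in this description is kvm_inject_dynirq(irq), which would perform a lookup like dynirq_resolve() before delivering the vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DYNIRQ_MAX 32u   /* hypothetical table size, for illustration only */

struct dynirq_route {
	int valid;
	uint8_t vector;      /* IDT vector the guest chose */
	unsigned int cpu;    /* destination core the guest chose */
};

struct dynirq_table {
	struct dynirq_route routes[DYNIRQ_MAX];
};

/* Guest request: bind an IRQ handle to a vector/core pair. */
static int dynirq_set_route(struct dynirq_table *t, unsigned int irq,
			    uint8_t vector, unsigned int cpu)
{
	assert(t != NULL);
	/* x86 reserves vectors 0-31 for exceptions */
	if (irq >= DYNIRQ_MAX || vector < 32)
		return -1;
	t->routes[irq] = (struct dynirq_route){
		.valid = 1, .vector = vector, .cpu = cpu,
	};
	return 0;
}

/* Host side: the IRQ handle alone is enough to find where to inject. */
static int dynirq_resolve(const struct dynirq_table *t, unsigned int irq,
			  struct dynirq_route *out)
{
	assert(t != NULL && out != NULL);
	if (irq >= DYNIRQ_MAX || !t->routes[irq].valid)
		return -1;
	*out = t->routes[irq];
	return 0;
}
```

With the example from the text, a guest would register something like dynirq_set_route(&table, irq, 46, 3), and a later injection of that irq resolves to vector 46 on core 3 without the host entity knowing either value.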

Signed-off-by: Gregory Haskins 
---

 arch/x86/Kconfig                |    5 +
 arch/x86/Makefile               |    3
 arch/x86/include/asm/kvm_host.h |    9 +
 arch/x86/include/asm/kvm_para.h |   11 +
 arch/x86/kvm/Makefile           |    3
 arch/x86/kvm/dynirq.c           |  329 +++
 arch/x86/kvm/guest/Makefile     |    2
 arch/x86/kvm/guest/dynirq.c     |   95 +++
 arch/x86/kvm/x86.c              |    6 +
 include/linux/kvm.h             |    1
 include/linux/kvm_guest.h       |    7 +
 include/linux/kvm_host.h        |    1
 include/linux/kvm_para.h        |    1
 13 files changed, 472 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/kvm/dynirq.c
 create mode 100644 arch/x86/kvm/guest/Makefile
 create mode 100644 arch/x86/kvm/guest/dynirq.c
 create mode 100644 include/linux/kvm_guest.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3fca247..91fefd5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -446,6 +446,11 @@ config KVM_GUEST
 This option enables various optimizations for running under the KVM
 hypervisor.
 
+config KVM_GUEST_DYNIRQ
+   bool "KVM Dynamic IRQ support"
+   depends on KVM_GUEST
+   default y
+
 source "arch/x86/lguest/Kconfig"
 
 config PARAVIRT
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index d1a47ad..d788815 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -147,6 +147,9 @@ core-$(CONFIG_XEN) += arch/x86/xen/
 # lguest paravirtualization support
 core-$(CONFIG_LGUEST_GUEST) += arch/x86/lguest/
 
+# kvm paravirtualization support
+core-$(CONFIG_KVM_GUEST) += arch/x86/kvm/guest/
+
 core-y += arch/x86/kernel/
 core-y += arch/x86/mm/
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 730843d..9ae398a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -346,6 +346,12 @@ struct kvm_mem_alias {
gfn_t target_gfn;
 };
 
+struct kvm_dynirq {
+   spinlock_t lock;
+   struct rb_root map;
+   struct kvm *kvm;
+};
+
 struct kvm_arch{
int naliases;
struct kvm_mem_alias aliases[KVM_ALIAS_SLOTS];
@@ -363,6 +369,7 @@ struct kvm_arch{
struct iommu_domain *iommu_domain;
struct kvm_pic *vpic;
struct kvm_ioapic *vioapic;
+   struct kvm_dynirq *dynirq;
struct kvm_pit *vpit;
struct hlist_head irq_ack_notifier_list;
int vapics_in_nmi_mode;
@@ -519,6 +526,8 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
  const void *val, int bytes);
 int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
  gpa_t addr, unsigned long *ret);
+int kvm_dynirq_hc(struct kvm_vcpu *vcpu, int nr, gpa_t gpa, size_t len);
+void kvm_free_dynirq(struct kvm *kvm);
 
 extern bool tdp_enabled;
 
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index b8a3305..fba210e 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -13,6 +13,7 @@
 #define KVM_FEATURE_CLOCKSOURCE0
 #define KVM_FEATURE_NOP_IO_DELAY   1
 #define KVM_FEATURE_MMU_OP 2
+#define KVM_FEATURE_DYNIRQ 3
 
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
@@ -45,6 +46,16 @@ struct kvm_mmu_op_release_pt {
__u64 pt_phys;
 };
 
+/* Operations for KVM_HC_DYNIRQ */
+#define KVM_DYNIRQ_OP_SE