Re: [PATCH 2/4] KVM: pci device assignment

2008-07-28 Thread Ben-Ami Yassour
On Mon, 2008-07-28 at 15:27 +0800, Yang, Sheng wrote:
> On Tuesday 22 July 2008 20:13:53 Ben-Ami Yassour wrote:
> >
> > -int kvm_pic_read_irq(struct kvm_pic *s)
> > +int kvm_pic_read_irq(struct kvm *kvm)
> >  {
> > int irq, irq2, intno;
> > +   struct kvm_pic *s = pic_irqchip(kvm);
> >
> > irq = pic_get_irq(&s->pics[0]);
> > if (irq >= 0) {
> > @@ -186,6 +187,8 @@ int kvm_pic_read_irq(struct kvm_pic *s)
> > irq = 7;
> > intno = s->pics[0].irq_base + irq;
> > }
> > +   kvm_notify_acked_irq(kvm, irq);
> > +
> > pic_update_irq(s);
> >
> > return intno;
> 
> One coding style suggestion,
> 
> struct kvm_pic has *irq_request_opaque which is struct kvm indeed. How 
> about contain kvm in kvm_pic explicitly? (seems cleaner, though needs 
> more modification).

Your suggestion is applied in the next set of patches that I sent.

> 
> 
> Another thing is about host share IRQ: Do we want follow the straight 
> forward way to add it? That's it, return IRQ_HANDLED from irq handler 
> and wait for EOI of guest?

That's what the code in the current patches doing now. It also disables
the irq until the EOI of the guest. If we don't do that, then the
interrupt handler keeps getting called (which causes many guest exit)
until the guest finally interacts with the device to release the
interrupt line.

Thanks,
Ben

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] KVM: pci device assignment

2008-07-28 Thread Amit Shah
* On Tuesday 22 Jul 2008 17:43:53 Ben-Ami Yassour wrote:
> Based on a patch from: Amit Shah <[EMAIL PROTECTED]>
>
> This patch adds support for handling PCI devices that are assigned to
> the guest.
>
> The device to be assigned to the guest is registered in the host kernel
> and interrupt delivery is handled. If a device is already assigned, or
> the device driver for it is still loaded on the host, the device
> assignment
> is failed by conveying a -EBUSY reply to the userspace.
>
> Devices that share their interrupt line are not supported at the moment.
>
> By itself, this patch will not make devices work within the guest.
> The VT-d extension is required to enable the device to perform DMA.
> Another alternative is PVDMA.
>
> Signed-off-by: Amit Shah <[EMAIL PROTECTED]>
> Signed-off-by: Ben-Ami Yassour <[EMAIL PROTECTED]>
> Signed-off-by: Weidong Han <[EMAIL PROTECTED]>


> + if (pci_enable_device(dev)) {
> + printk(KERN_INFO "%s: Could not enable PCI device\n", __func__);
> + r = -EBUSY;
> + goto out_put;
> + }
> + r = pci_request_regions(dev, "kvm_assigned_device");
> + if (r) {
> + printk(KERN_INFO "%s: Could not get access to device regions\n",
> +__func__);
> + goto out_disable;
> + }

A driver might already be loaded for this device. If we fail to get the 
regions, it might mean the device was enabled and in use. That's the reason I 
didn't originally have a pci_disable_device() when pci_request_regions fails.

> +out:
> + mutex_unlock(&kvm->lock);
> + return r;
> +out_list_del:
> + list_del(&match->list);
> + pci_release_regions(dev);
> +out_disable:
> + pci_disable_device(dev);
> +out_put:
> + pci_dev_put(dev);
> +out_free:
> + kfree(match);
> + mutex_unlock(&kvm->lock);
> + return r;
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] KVM: pci device assignment

2008-07-28 Thread Yang, Sheng
On Tuesday 22 July 2008 20:13:53 Ben-Ami Yassour wrote:
>
> -int kvm_pic_read_irq(struct kvm_pic *s)
> +int kvm_pic_read_irq(struct kvm *kvm)
>  {
>   int irq, irq2, intno;
> + struct kvm_pic *s = pic_irqchip(kvm);
>
>   irq = pic_get_irq(&s->pics[0]);
>   if (irq >= 0) {
> @@ -186,6 +187,8 @@ int kvm_pic_read_irq(struct kvm_pic *s)
>   irq = 7;
>   intno = s->pics[0].irq_base + irq;
>   }
> + kvm_notify_acked_irq(kvm, irq);
> +
>   pic_update_irq(s);
>
>   return intno;

One coding style suggestion,

struct kvm_pic has *irq_request_opaque which is struct kvm indeed. How 
about contain kvm in kvm_pic explicitly? (seems cleaner, though needs 
more modification).


Another thing is about host share IRQ: Do we want follow the straight 
forward way to add it? That's it, return IRQ_HANDLED from irq handler 
and wait for EOI of guest?

I found it's not easy to provide support for MSI without shared IRQ 
support. We don't know when guest want to enable MSI or it decided to 
shared IRQ, maybe we can intercept the unmask of IOAPIC to know that, 
but it's a little tricky.

-- 
regards
Yang, Sheng
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] KVM: pci device assignment

2008-07-26 Thread Avi Kivity

Ben-Ami Yassour wrote:

Based on a patch from: Amit Shah <[EMAIL PROTECTED]>

This patch adds support for handling PCI devices that are assigned to
the guest.

The device to be assigned to the guest is registered in the host kernel
and interrupt delivery is handled. If a device is already assigned, or
the device driver for it is still loaded on the host, the device
assignment
is failed by conveying a -EBUSY reply to the userspace.

Devices that share their interrupt line are not supported at the moment.

By itself, this patch will not make devices work within the guest.
The VT-d extension is required to enable the device to perform DMA.
Another alternative is PVDMA.

 
+struct kvm_assigned_dev_kernel

+*kvm_find_assigned_dev(struct list_head *head,
  


Keep these two on the same line.


+
+static int
+kvm_vm_ioctl_device_assignment(struct kvm *kvm,
  


Ditto.


+  struct kvm_assigned_dev *assigned_dev)
+{
+   int r = 0;
+   struct kvm_assigned_dev_kernel *match;
+   struct pci_dev *dev;
+
+   if (assigned_dev->host.num_valid_irqs != 1) {
+   printk(KERN_INFO "%s: Unsupported number of irqs %d\n",
+  __func__, assigned_dev->host.num_valid_irqs);
+   return -EINVAL;
+   }
  


We also support zero irqs.  While this patch doesn't do much with the 
information (only claim the device), this is still useful.




diff --git a/include/asm-x86/kvm.h b/include/asm-x86/kvm.h
index 8f13749..12b4b25 100644
--- a/include/asm-x86/kvm.h
+++ b/include/asm-x86/kvm.h
@@ -208,4 +208,5 @@ struct kvm_pit_channel_state {
 struct kvm_pit_state {
struct kvm_pit_channel_state channels[3];
 };
+
 #endif
  


Unrelated.


diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index e2864e6..34eb3e7 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -325,6 +325,25 @@ struct kvm_irq_ack_notifier {
void (*irq_acked)(struct kvm_irq_ack_notifier *kian);
 };
 
+/* For assigned devices, we schedule work in the system workqueue to

+ * inject interrupts into the guest when an interrupt occurs on the
+ * physical device and also when the guest acks the interrupt.
+ */
+struct kvm_assigned_dev_work {
+   struct work_struct work;
+   struct kvm_assigned_dev_kernel *assigned_dev;
+};
+
+struct kvm_assigned_dev_kernel {
+   struct kvm_irq_ack_notifier ack_notifier;
+   struct list_head list;
+   struct kvm_assigned_dev_info guest;
+   struct kvm_assigned_dev_info host;
+   struct kvm_assigned_dev_work int_work;
  


Just put the work_struct here, and use container_of() instead of 
->assigned_dev.



 struct kvm_arch{
int naliases;
struct kvm_mem_alias aliases[KVM_ALIAS_SLOTS];
@@ -337,6 +356,7 @@ struct kvm_arch{
 * Hash table of struct kvm_mmu_page.
 */
struct list_head active_mmu_pages;
+   struct list_head assigned_dev_head;
  


This is useful for non-x86 (except s390).  It's okay to leave it to the 
ia64/ppc maintainers, though.



struct kvm_pic *vpic;
struct kvm_ioapic *vioapic;
struct kvm_pit *vpit;
diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h
index 76f3921..3aa1731 100644
--- a/include/asm-x86/kvm_para.h
+++ b/include/asm-x86/kvm_para.h
@@ -143,5 +143,4 @@ static inline unsigned int kvm_arch_para_features(void)
 }
 
 #endif

-
  


Spurious.


 #endif
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 6edba45..c436c08 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -382,6 +382,7 @@ struct kvm_trace_rec {
 #define KVM_CAP_PV_MMU 13
 #define KVM_CAP_MP_STATE 14
 #define KVM_CAP_COALESCED_MMIO 15
+#define KVM_CAP_DEVICE_ASSIGNMENT 16
 
 /*

  * ioctls for VM fds
@@ -411,6 +412,8 @@ struct kvm_trace_rec {
_IOW(KVMIO,  0x67, struct kvm_coalesced_mmio_zone)
 #define KVM_UNREGISTER_COALESCED_MMIO \
_IOW(KVMIO,  0x68, struct kvm_coalesced_mmio_zone)
+#define KVM_UPDATE_ASSIGNED_DEVICE _IOR(KVMIO, 0x69,   \
+   struct kvm_assigned_dev)
 
 /*

  * ioctls for vcpu fds
@@ -475,4 +478,22 @@ struct kvm_trace_rec {
 #define KVM_TRC_STLB_INVAL   (KVM_TRC_HANDLER + 0x18)
 #define KVM_TRC_PPC_INSTR(KVM_TRC_HANDLER + 0x19)
 
+#define ASSIGNED_DEV_MAX_IRQ 16

+
+/* Stores information for identifying host PCI devices assigned to the
+ * guest: this is used in the host kernel and in the userspace.
+ */
+struct kvm_assigned_dev_info {
+   __u32 busnr;
+   __u32 devfn;
+   __u32 irq[ASSIGNED_DEV_MAX_IRQ];
+   __u32 num_valid_irqs; /* currently only 1 is supported */
+};
  


Move num_valid_irqs before the array.  Add padding so the structure is a 
multiple of 64 bits (needed so that the i386 and x86_64 ABIs are identical).




+
+/* Mapping between host and guest PCI device */
+struct kvm_assigned_dev {
+   struct kvm_assigned_dev_info guest;
+   

[PATCH 2/4] KVM: pci device assignment

2008-07-22 Thread Ben-Ami Yassour
Based on a patch from: Amit Shah <[EMAIL PROTECTED]>

This patch adds support for handling PCI devices that are assigned to
the guest.

The device to be assigned to the guest is registered in the host kernel
and interrupt delivery is handled. If a device is already assigned, or
the device driver for it is still loaded on the host, the device
assignment
is failed by conveying a -EBUSY reply to the userspace.

Devices that share their interrupt line are not supported at the moment.

By itself, this patch will not make devices work within the guest.
The VT-d extension is required to enable the device to perform DMA.
Another alternative is PVDMA.

Signed-off-by: Amit Shah <[EMAIL PROTECTED]>
Signed-off-by: Ben-Ami Yassour <[EMAIL PROTECTED]>
Signed-off-by: Weidong Han <[EMAIL PROTECTED]>
---
 arch/x86/kvm/i8259.c   |5 +-
 arch/x86/kvm/irq.c |2 +-
 arch/x86/kvm/irq.h |3 +-
 arch/x86/kvm/x86.c |  215 
 include/asm-x86/kvm.h  |1 +
 include/asm-x86/kvm_host.h |   20 
 include/asm-x86/kvm_para.h |1 -
 include/linux/kvm.h|   21 +
 virt/kvm/ioapic.c  |4 +
 virt/kvm/ioapic.h  |1 +
 10 files changed, 269 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 55e179a..d6793f0 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -159,9 +159,10 @@ static inline void pic_intack(struct kvm_kpic_state *s, 
int irq)
s->irr &= ~(1 << irq);
 }
 
-int kvm_pic_read_irq(struct kvm_pic *s)
+int kvm_pic_read_irq(struct kvm *kvm)
 {
int irq, irq2, intno;
+   struct kvm_pic *s = pic_irqchip(kvm);
 
irq = pic_get_irq(&s->pics[0]);
if (irq >= 0) {
@@ -186,6 +187,8 @@ int kvm_pic_read_irq(struct kvm_pic *s)
irq = 7;
intno = s->pics[0].irq_base + irq;
}
+   kvm_notify_acked_irq(kvm, irq);
+
pic_update_irq(s);
 
return intno;
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 9091195..3c508af 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -72,7 +72,7 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
if (kvm_apic_accept_pic_intr(v)) {
s = pic_irqchip(v->kvm);
s->output = 0;  /* PIC */
-   vector = kvm_pic_read_irq(s);
+   vector = kvm_pic_read_irq(v->kvm);
}
}
return vector;
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index 95fe718..479a3d2 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -63,11 +63,12 @@ struct kvm_pic {
void *irq_request_opaque;
int output; /* intr from master PIC */
struct kvm_io_device dev;
+   void (*ack_notifier)(void *opaque, int irq);
 };
 
 struct kvm_pic *kvm_create_pic(struct kvm *kvm);
 void kvm_pic_set_irq(void *opaque, int irq, int level);
-int kvm_pic_read_irq(struct kvm_pic *s);
+int kvm_pic_read_irq(struct kvm *kvm);
 void kvm_pic_update_irq(struct kvm_pic *s);
 
 static inline struct kvm_pic *pic_irqchip(struct kvm *kvm)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 376ef73..d9aa931 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4,10 +4,12 @@
  * derived from drivers/kvm/kvm_main.c
  *
  * Copyright (C) 2006 Qumranet, Inc.
+ * Copyright (C) 2008 Qumranet, Inc.
  *
  * Authors:
  *   Avi Kivity   <[EMAIL PROTECTED]>
  *   Yaniv Kamay  <[EMAIL PROTECTED]>
+ *   Amit Shah<[EMAIL PROTECTED]>
  *
  * This work is licensed under the terms of the GNU GPL, version 2.  See
  * the COPYING file in the top-level directory.
@@ -23,8 +25,10 @@
 #include "x86.h"
 
 #include 
+#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -98,6 +102,204 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ NULL }
 };
 
+struct kvm_assigned_dev_kernel
+*kvm_find_assigned_dev(struct list_head *head,
+struct kvm_assigned_dev_info *assigned_dev_info)
+{
+   struct list_head *ptr;
+   struct kvm_assigned_dev_kernel *match;
+
+   list_for_each(ptr, head) {
+   match = list_entry(ptr, struct kvm_assigned_dev_kernel, list);
+   if ((match->host.busnr == assigned_dev_info->busnr) &&
+   (match->host.devfn == assigned_dev_info->devfn))
+   return match;
+   }
+   return NULL;
+}
+
+static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work)
+{
+   struct kvm_assigned_dev_work *int_work;
+
+   int_work = container_of(work, struct kvm_assigned_dev_work, work);
+
+   /* This is taken to safely inject irq inside the guest. When
+* the interrupt injection (or the ioapic code) uses a
+* finer-grained lock, update this
+*/
+   mutex_lock(&int_work->assigned_dev->kvm->lock);
+   kvm_set_irq(int_work->assigned_dev->kvm,