Re: [PATCH 5/6] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests

2008-10-28 Thread Muli Ben-Yehuda
On Fri, Oct 24, 2008 at 11:22:48AM -0500, Anthony Liguori wrote:

 Amit Shah wrote:

 +#include linux/kvm_para.h
   

 Is this header really necessary?

No, removed.


 +#include device-assignment.h
 +
 +/* From linux/ioport.h */
 +#define IORESOURCE_IO   0x0100  /* Resource type */
 +#define IORESOURCE_MEM  0x0200
 +#define IORESOURCE_IRQ  0x0400
 +#define IORESOURCE_DMA  0x0800
 +#define IORESOURCE_PREFETCH 0x1000  /* No side effects */
 +
 +/* #define DEVICE_ASSIGNMENT_DEBUG 1 */
 +
 +#ifdef DEVICE_ASSIGNMENT_DEBUG
 +#define DEBUG(fmt, args...)   \
   

 Please use C99 style varidacs.

Done.


 +do {  \
 +  fprintf(stderr, %s:  fmt, __func__ , ## args);\
 +} while (0)
 +#else
 +#define DEBUG(fmt, args...) do { } while(0)
 +#endif
 +
 +static void assigned_dev_ioport_writeb(void *opaque, uint32_t addr,
 +   uint32_t value)
 +{
 +AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
   

 Cast is unnecessary.

Removed.


 +uint32_t r_pio = (unsigned long)r_access-r_virtbase
 ++ (addr - r_access-e_physbase);
   

 It would be nice to make this a function to make it more obvious that you 
 were translated from guest to host regions.  The cast to unsigned long 
 should probably be target_ulong too.

Done.


 +DEBUG(stderr, %s: r_pio=%08x e_physbase=%08x
 +   r_virtbase=%08lx value=%08x\n,
 +  __func__, r_pio, (int)r_access-e_physbase,
 +  (unsigned long)r_access-r_virtbase, value);
   

 This debug statement looks wrong to me.  You're passing stderr.
 It's true for all of these functions.

Fixed.


 +static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
 +   uint32_t e_phys, uint32_t e_size, int 
 type)
 +{
 +AssignedDevice *r_dev = (AssignedDevice *) pci_dev;
 +AssignedDevRegion *region = r_dev-v_addrs[region_num];
 +int first_map = (region-e_size == 0);
 +int ret = 0;
 +
 +DEBUG(%s: e_phys=%08x r_virt=%x type=%d len=%08x region_num=%d \n,
 +  __func__, e_phys, (uint32_t)region-r_virtbase, type, e_size,
 +  region_num);
   

 You already have __func__ in your debug printf().

Fixed.


 +region-e_physbase = e_phys;
 +region-e_size = e_size;
 +
 +/* FIXME: Add support for emulated MMIO for non-kvm guests */
 +if (kvm_enabled()) {
   

 I don't think having a kvm_enabled() check here is very useful.  I
 think device-assignment.c should be conditional on USE_KVM, and the
 only kvm_enabled() check should be when creating the initial device
 assignment.  Practically speaking, QEMU is never going to support
 device assignment outside of the context of KVM because I strongly
 doubt anything like irqhook will make it upstream.

Reworked along your suggestions, please let me know if you have
further comments.

 +if (!first_map)
 +kvm_destroy_phys_mem(kvm_context, e_phys, e_size);
 +if (e_size  0)
 +ret = kvm_register_phys_mem(kvm_context, e_phys,
 +region-r_virtbase, e_size, 0);
 +if (ret != 0)
 +fprintf(stderr, %s: Error: create new mapping failed\n, 
 __func__);
   

 If we do get an error here, we shouldn't keep going.  This error is
 probably going to happen in practice if a guest tries to pass
 through too many devices and we run out of slots.

Fixed, we exit(1) now (is there a more graceful to bail out?).

 +}
 +}
 +
 +static void assigned_dev_ioport_map(PCIDevice *pci_dev, int region_num,
 +uint32_t addr, uint32_t size, int 
 type)
 +{
 +AssignedDevice *r_dev = (AssignedDevice *) pci_dev;
 +AssignedDevRegion *region = r_dev-v_addrs[region_num];
 +int r;
 +
 +region-e_physbase = addr;
 +region-e_size = size;
 +
 +DEBUG(%s: e_phys=0x%x r_virt=%x type=0x%x len=%d region_num=%d \n,
 +  __func__, addr, (uint32_t)region-r_virtbase, type, size, 
 region_num);
   

 Need to fix this DEBUG().

Fixed.


 +r = ioperm((uint32_t)region-r_virtbase, size, 1);
   

 I don't think this is enough for KVM.  This will only do the ioperm
 in the thread that triggered the IO.  If you have an SMP guest,
 ioperm needs to be done on each VCPU's thread.

Fixed.

 +if (r  0) {
 +perror(assigned_dev_ioport_map: ioperm);
 +return;
 +}
   

 Again, if we fail, we have to exit QEMU gracefully.

Fixed.


 +register_ioport_read(addr, size, 1, assigned_dev_ioport_readb,
 + (void *) (r_dev-v_addrs + region_num));
 +register_ioport_read(addr, size, 2, assigned_dev_ioport_readw,
 + (void *) (r_dev-v_addrs + region_num));
 +register_ioport_read(addr, size, 4, assigned_dev_ioport_readl,
 + (void *) (r_dev-v_addrs + region_num));
 +

Re: [PATCH 5/6] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests

2008-10-28 Thread Muli Ben-Yehuda
On Mon, Oct 27, 2008 at 02:32:48PM +0800, Han, Weidong wrote:

 Yes, it's buggy. It should like:
 
 uint32_t old_ephys = region-e_physbase;
 uint32_t old_esize = region-e_size;
 
 ...
 
 kvm_destroy_phys_mem(kvm_context, old_ephys, old_esize);

Fixed in v8. Thanks!

Cheers,
Muli
-- 
The First Workshop on I/O Virtualization (WIOV '08)
Dec 2008, San Diego, CA, http://www.usenix.org/wiov08/
   -
SYSTOR 2009---The Israeli Experimental Systems Conference
http://www.haifa.il.ibm.com/conferences/systor2009/
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 5/6] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests

2008-10-27 Thread Han, Weidong
Su, Disheng wrote:
 Amit Shah wrote:
 This patch has been contributed to by the following people:
 
 From: Or Sagi [EMAIL PROTECTED]
 From: Nir Peleg [EMAIL PROTECTED]
 From: Amit Shah [EMAIL PROTECTED]
 From: Ben-Ami Yassour [EMAIL PROTECTED]
 From: Weidong Han [EMAIL PROTECTED]
 From: Glauber de Oliveira Costa [EMAIL PROTECTED]
 
 With this patch, we can assign a device on the host machine to a
 guest. 
 
 A new command-line option, -pcidevice is added.
 To invoke it for a device sitting at PCI bus:dev.fn 04:08.0, use
 this: 
 
 -pcidevice host=04:08.0
 
 * The host driver for the device, if any, is to be removed before
 assigning the device (else device assignment will fail).
 
 * A device that shares IRQ with another host device cannot currently
 be assigned. 
 
 * The RAW_IO capability is needed for this to work
 
 This works only with the in-kernel irqchip method; to use the
 userspace irqchip, a kernel module (irqhook) and some extra changes
 are needed. 
 
 Signed-off-by: Amit Shah [EMAIL PROTECTED]
 ---
  qemu/Makefile.target|1 +
  qemu/hw/device-assignment.c |  619
  +++
  qemu/hw/device-assignment.h |   98 +++ qemu/hw/pc.c
  |6 + qemu/hw/pci.c   |7 +
  qemu/vl.c   |   18 ++
  6 files changed, 749 insertions(+), 0 deletions(-)
  create mode 100644 qemu/hw/device-assignment.c
  create mode 100644 qemu/hw/device-assignment.h
 
 diff --git a/qemu/Makefile.target b/qemu/Makefile.target
 index d9bdeca..05a1d84 100644
 --- a/qemu/Makefile.target
 +++ b/qemu/Makefile.target
 @@ -621,6 +621,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW)
  dma.o OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o
  pc.o OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
  OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o +OBJS+=
  device-assignment.o ifeq ($(USE_KVM_PIT), 1)
  OBJS+= i8254-kvm.o
  endif
 diff --git a/qemu/hw/device-assignment.c
 b/qemu/hw/device-assignment.c new file mode 100644 index
 000..5ba21a0 --- /dev/null
 +++ b/qemu/hw/device-assignment.c
 @@ -0,0 +1,619 @@
 +/*
 + * Copyright (c) 2007, Neocleus Corporation.
 + *
 + * This program is free software; you can redistribute it and/or
 modify it + * under the terms and conditions of the GNU General
 Public License, + * version 2, as published by the Free Software
 Foundation. + * + * This program is distributed in the hope it will
 be useful, but WITHOUT + * ANY WARRANTY; without even the implied
 warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE.
 See the GNU General Public License for + * more details.
 + *
 + * You should have received a copy of the GNU General Public License
 along with + * this program; if not, write to the Free Software
 Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA
 02111-1307 USA. + * + *
 + *  Assign a PCI device from the host to a guest VM. + *
 + *  Adapted for KVM by Qumranet.
 + *
 + *  Copyright (c) 2007, Neocleus, Alex Novik ([EMAIL PROTECTED])
 + *  Copyright (c) 2007, Neocleus, Guy Zana ([EMAIL PROTECTED])
 + *  Copyright (C) 2008, Qumranet, Amit Shah ([EMAIL PROTECTED])
 + *  Copyright (C) 2008, Red Hat, Amit Shah ([EMAIL PROTECTED]) +
 */ +#include stdio.h
 +#include sys/io.h
 +#include qemu-kvm.h
 +#include hw.h
 +#include pc.h
 +#include sysemu.h
 +#include console.h
 +#include linux/kvm_para.h
 +#include device-assignment.h
 +
 +/* From linux/ioport.h */
 +#define IORESOURCE_IO   0x0100  /* Resource type */
 +#define IORESOURCE_MEM  0x0200
 +#define IORESOURCE_IRQ  0x0400
 +#define IORESOURCE_DMA  0x0800
 +#define IORESOURCE_PREFETCH 0x1000  /* No side effects */ +
 +/* #define DEVICE_ASSIGNMENT_DEBUG 1 */
 +
 +#ifdef DEVICE_ASSIGNMENT_DEBUG
 +#define DEBUG(fmt, args...)   \
 +do {  \
 +  fprintf(stderr, %s:  fmt, __func__ , ## args);\ +}
 while (0) +#else
 +#define DEBUG(fmt, args...) do { } while(0)
 +#endif
 +
 +static void assigned_dev_ioport_writeb(void *opaque, uint32_t addr,
 +   uint32_t value) +{
 +AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
 +uint32_t r_pio = (unsigned long)r_access-r_virtbase
 ++ (addr - r_access-e_physbase);
 +
 +DEBUG(stderr, %s: r_pio=%08x e_physbase=%08x
 +   r_virtbase=%08lx value=%08x\n,
 +  __func__, r_pio, (int)r_access-e_physbase,
 +  (unsigned long)r_access-r_virtbase, value); +   
 outb(value, r_pio); +}
 +
 +static void assigned_dev_ioport_writew(void *opaque, uint32_t addr,
 +   uint32_t value) +{
 +AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
 +uint32_t r_pio = (unsigned long)r_access-r_virtbase
 ++ (addr - r_access-e_physbase);
 +
 +DEBUG(stderr, %s: r_pio=%08x e_physbase=%08x
 +   r_virtbase=%08lx value=%08x\n,
 +  

RE: [PATCH 5/6] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests

2008-10-26 Thread Su, Disheng
Amit Shah wrote:
 This patch has been contributed to by the following people:

 From: Or Sagi [EMAIL PROTECTED]
 From: Nir Peleg [EMAIL PROTECTED]
 From: Amit Shah [EMAIL PROTECTED]
 From: Ben-Ami Yassour [EMAIL PROTECTED]
 From: Weidong Han [EMAIL PROTECTED]
 From: Glauber de Oliveira Costa [EMAIL PROTECTED]

 With this patch, we can assign a device on the host machine to a
 guest.

 A new command-line option, -pcidevice is added.
 To invoke it for a device sitting at PCI bus:dev.fn 04:08.0, use this:

 -pcidevice host=04:08.0

 * The host driver for the device, if any, is to be removed before
 assigning the device (else device assignment will fail).

 * A device that shares IRQ with another host device cannot currently
 be assigned.

 * The RAW_IO capability is needed for this to work

 This works only with the in-kernel irqchip method; to use the
 userspace irqchip, a kernel module (irqhook) and some extra changes
 are needed.

 Signed-off-by: Amit Shah [EMAIL PROTECTED]
 ---
  qemu/Makefile.target|1 +
  qemu/hw/device-assignment.c |  619
  +++
  qemu/hw/device-assignment.h |   98 +++ qemu/hw/pc.c
  |6 + qemu/hw/pci.c   |7 +
  qemu/vl.c   |   18 ++
  6 files changed, 749 insertions(+), 0 deletions(-)
  create mode 100644 qemu/hw/device-assignment.c
  create mode 100644 qemu/hw/device-assignment.h

 diff --git a/qemu/Makefile.target b/qemu/Makefile.target
 index d9bdeca..05a1d84 100644
 --- a/qemu/Makefile.target
 +++ b/qemu/Makefile.target
 @@ -621,6 +621,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
  OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
  OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
  OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o
 +OBJS+= device-assignment.o
  ifeq ($(USE_KVM_PIT), 1)
  OBJS+= i8254-kvm.o
  endif
 diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
 new file mode 100644
 index 000..5ba21a0
 --- /dev/null
 +++ b/qemu/hw/device-assignment.c
 @@ -0,0 +1,619 @@
 +/*
 + * Copyright (c) 2007, Neocleus Corporation.
 + *
 + * This program is free software; you can redistribute it and/or
 modify it + * under the terms and conditions of the GNU General
 Public License, + * version 2, as published by the Free Software
 Foundation. + *
 + * This program is distributed in the hope it will be useful, but
 WITHOUT + * ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 General Public License for + * more details.
 + *
 + * You should have received a copy of the GNU General Public License
 along with + * this program; if not, write to the Free Software
 Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA
 02111-1307 USA. + *
 + *
 + *  Assign a PCI device from the host to a guest VM.
 + *
 + *  Adapted for KVM by Qumranet.
 + *
 + *  Copyright (c) 2007, Neocleus, Alex Novik ([EMAIL PROTECTED])
 + *  Copyright (c) 2007, Neocleus, Guy Zana ([EMAIL PROTECTED])
 + *  Copyright (C) 2008, Qumranet, Amit Shah ([EMAIL PROTECTED])
 + *  Copyright (C) 2008, Red Hat, Amit Shah ([EMAIL PROTECTED])
 + */
 +#include stdio.h
 +#include sys/io.h
 +#include qemu-kvm.h
 +#include hw.h
 +#include pc.h
 +#include sysemu.h
 +#include console.h
 +#include linux/kvm_para.h
 +#include device-assignment.h
 +
 +/* From linux/ioport.h */
 +#define IORESOURCE_IO   0x0100  /* Resource type */
 +#define IORESOURCE_MEM  0x0200
 +#define IORESOURCE_IRQ  0x0400
 +#define IORESOURCE_DMA  0x0800
 +#define IORESOURCE_PREFETCH 0x1000  /* No side effects */
 +
 +/* #define DEVICE_ASSIGNMENT_DEBUG 1 */
 +
 +#ifdef DEVICE_ASSIGNMENT_DEBUG
 +#define DEBUG(fmt, args...)   \
 +do {  \
 +  fprintf(stderr, %s:  fmt, __func__ , ## args);\
 +} while (0)
 +#else
 +#define DEBUG(fmt, args...) do { } while(0)
 +#endif
 +
 +static void assigned_dev_ioport_writeb(void *opaque, uint32_t addr,
 +   uint32_t value)
 +{
 +AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
 +uint32_t r_pio = (unsigned long)r_access-r_virtbase
 ++ (addr - r_access-e_physbase);
 +
 +DEBUG(stderr, %s: r_pio=%08x e_physbase=%08x
 +   r_virtbase=%08lx value=%08x\n,
 +  __func__, r_pio, (int)r_access-e_physbase,
 +  (unsigned long)r_access-r_virtbase, value);
 +outb(value, r_pio);
 +}
 +
 +static void assigned_dev_ioport_writew(void *opaque, uint32_t addr,
 +   uint32_t value)
 +{
 +AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
 +uint32_t r_pio = (unsigned long)r_access-r_virtbase
 ++ (addr - r_access-e_physbase);
 +
 +DEBUG(stderr, %s: r_pio=%08x e_physbase=%08x
 +   r_virtbase=%08lx value=%08x\n,
 +  __func__, 

Re: [PATCH 5/6] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests

2008-10-24 Thread Anthony Liguori

Amit Shah wrote:

This patch has been contributed to by the following people:

From: Or Sagi [EMAIL PROTECTED]
From: Nir Peleg [EMAIL PROTECTED]
From: Amit Shah [EMAIL PROTECTED]
From: Ben-Ami Yassour [EMAIL PROTECTED]
From: Weidong Han [EMAIL PROTECTED]
From: Glauber de Oliveira Costa [EMAIL PROTECTED]

With this patch, we can assign a device on the host machine to a
guest.

A new command-line option, -pcidevice is added.
To invoke it for a device sitting at PCI bus:dev.fn 04:08.0, use this:

-pcidevice host=04:08.0

* The host driver for the device, if any, is to be removed before
assigning the device (else device assignment will fail).

* A device that shares IRQ with another host device cannot currently
be assigned.

* The RAW_IO capability is needed for this to work

This works only with the in-kernel irqchip method; to use the
userspace irqchip, a kernel module (irqhook) and some extra changes
are needed.

Signed-off-by: Amit Shah [EMAIL PROTECTED]
---
 qemu/Makefile.target|1 +
 qemu/hw/device-assignment.c |  619 +++
 qemu/hw/device-assignment.h |   98 +++
 qemu/hw/pc.c|6 +
 qemu/hw/pci.c   |7 +
 qemu/vl.c   |   18 ++
 6 files changed, 749 insertions(+), 0 deletions(-)
 create mode 100644 qemu/hw/device-assignment.c
 create mode 100644 qemu/hw/device-assignment.h

diff --git a/qemu/Makefile.target b/qemu/Makefile.target
index d9bdeca..05a1d84 100644
--- a/qemu/Makefile.target
+++ b/qemu/Makefile.target
@@ -621,6 +621,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
 OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
 OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
 OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o
+OBJS+= device-assignment.o
 ifeq ($(USE_KVM_PIT), 1)
 OBJS+= i8254-kvm.o
 endif
diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
new file mode 100644
index 000..5ba21a0
--- /dev/null
+++ b/qemu/hw/device-assignment.c
@@ -0,0 +1,619 @@
+/*
+ * Copyright (c) 2007, Neocleus Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ *
+ *  Assign a PCI device from the host to a guest VM.
+ *
+ *  Adapted for KVM by Qumranet.
+ *
+ *  Copyright (c) 2007, Neocleus, Alex Novik ([EMAIL PROTECTED])
+ *  Copyright (c) 2007, Neocleus, Guy Zana ([EMAIL PROTECTED])
+ *  Copyright (C) 2008, Qumranet, Amit Shah ([EMAIL PROTECTED])
+ *  Copyright (C) 2008, Red Hat, Amit Shah ([EMAIL PROTECTED])
+ */
+#include stdio.h
+#include sys/io.h
+#include qemu-kvm.h
+#include hw.h
+#include pc.h
+#include sysemu.h
+#include console.h
+#include linux/kvm_para.h
  


Is this header really necessary?


+#include device-assignment.h
+
+/* From linux/ioport.h */
+#define IORESOURCE_IO   0x0100  /* Resource type */
+#define IORESOURCE_MEM  0x0200
+#define IORESOURCE_IRQ  0x0400
+#define IORESOURCE_DMA  0x0800
+#define IORESOURCE_PREFETCH 0x1000  /* No side effects */
+
+/* #define DEVICE_ASSIGNMENT_DEBUG 1 */
+
+#ifdef DEVICE_ASSIGNMENT_DEBUG
+#define DEBUG(fmt, args...)   \
  


Please use C99 style varidacs.


+do {  \
+  fprintf(stderr, %s:  fmt, __func__ , ## args);\
+} while (0)
+#else
+#define DEBUG(fmt, args...) do { } while(0)
+#endif
+
+static void assigned_dev_ioport_writeb(void *opaque, uint32_t addr,
+   uint32_t value)
+{
+AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
  


Cast is unnecessary.


+uint32_t r_pio = (unsigned long)r_access-r_virtbase
++ (addr - r_access-e_physbase);
  


It would be nice to make this a function to make it more obvious that 
you were translated from guest to host regions.  The cast to unsigned 
long should probably be target_ulong too.



+DEBUG(stderr, %s: r_pio=%08x e_physbase=%08x
+   r_virtbase=%08lx value=%08x\n,
+  __func__, r_pio, (int)r_access-e_physbase,
+  (unsigned long)r_access-r_virtbase, value);
  


This debug statement looks wrong to me.  You're passing stderr.  It's 
true for all of these functions.



+static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
+   uint32_t e_phys, uint32_t e_size,