[SR-IOV driver example 2/3 resend] PF driver: integrate with SR-IOV core

2008-12-01 Thread Yu Zhao
This patch integrates the IGB driver with the SR-IOV core. It shows how
the SR-IOV API is used to support the capability. As the patch shows,
integrating a PF driver with the SR-IOV core takes little effort: all
standard SR-IOV handling is done by the SR-IOV core, and the PF driver
only handles device-specific resource allocation and deallocation once it
receives the necessary information (i.e. the number of Virtual Functions)
from the callback function.
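The division of labour described above can be modelled in user space. The SR-IOV core calls the driver's `.virtual` hook with the number of VFs to enable (0 to disable), and the driver only does device-specific work. The sketch below is a hypothetical illustration that mirrors `igb_virtual()` from the diff that follows. `struct pf_sketch`, `pf_virtual()` and the `SKETCH_*` constants are made-up names, and copying MAC addresses into an array stands in for `igb_set_vf_mac()`/`igb_set_vf_vmolr()`.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define SKETCH_MAX_NUM_VFS 8   /* mirrors MAX_NUM_VFS from patch 1/3 */
#define SKETCH_ETH_ALEN    6

/* Hypothetical user-space stand-in for the PF adapter state. */
struct pf_sketch {
	int vfs_allocated_count;
	unsigned char vf_mac[SKETCH_MAX_NUM_VFS][SKETCH_ETH_ALEN];
};

/*
 * Hypothetical model of the .virtual callback: the SR-IOV core passes
 * the number of VFs being enabled, or 0 when SR-IOV is disabled.
 */
static int pf_virtual(struct pf_sketch *pf, int nr_virtfn)
{
	/* Same hard-coded base MAC as igb_virtual(); byte 5 becomes the VF index. */
	unsigned char mac[SKETCH_ETH_ALEN] = {0x00, 0xDE, 0xAD, 0xBE, 0xEF, 0xFF};
	int i;

	assert(pf != NULL);

	/* 1 PF + 7 VF mode keeps one pool for the PF; the < 0 check is an addition. */
	if (nr_virtfn < 0 || nr_virtfn > SKETCH_MAX_NUM_VFS - 1)
		return -EINVAL;

	/* Resources are pre-allocated, so enabling only programs per-VF MACs. */
	for (i = 0; i < nr_virtfn; i++) {
		mac[5] = (unsigned char)i;
		memcpy(pf->vf_mac[i], mac, SKETCH_ETH_ALEN);
	}

	/* Disabling frees nothing; only the count is reset. */
	pf->vfs_allocated_count = nr_virtfn;
	return 0;
}
```

With this model, `pf_virtual(&pf, 8)` fails with `-EINVAL` because only 7 VFs fit alongside the PF, while `pf_virtual(&pf, 7)` assigns 00:DE:AD:BE:EF:00 through 00:DE:AD:BE:EF:06.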

From: Intel Corporation, LAN Access Division <[EMAIL PROTECTED]>
Signed-off-by: Yu Zhao <[EMAIL PROTECTED]>

---
 drivers/net/igb/igb_main.c |   46 
 1 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index f0361ef..78bda11 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -138,6 +138,7 @@ void igb_set_mc_list_pools(struct igb_adapter *, struct e1000_hw *, int, u16);
 static int igb_vmm_control(struct igb_adapter *, bool);
 static int igb_set_vf_mac(struct net_device *, int, u8*);
 static void igb_mbox_handler(struct igb_adapter *);
+static int igb_virtual(struct pci_dev *, int);
 
 static int igb_suspend(struct pci_dev *, pm_message_t);
 #ifdef CONFIG_PM
@@ -182,6 +183,7 @@ static struct pci_driver igb_driver = {
 #endif
.shutdown = igb_shutdown,
.err_handler = &igb_err_handler,
+   .virtual = igb_virtual
 };
 
 static int global_quad_port_a; /* global quad port a indication */
@@ -5066,4 +5068,48 @@ void igb_set_mc_list_pools(struct igb_adapter *adapter,
wr32(E1000_VMOLR(pool), reg_data);
 }
 
+static int
+igb_virtual(struct pci_dev *pdev, int nr_virtfn)
+{
+	int i;
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct igb_adapter *adapter = netdev_priv(netdev);
+	/* the VFs' MAC addresses are hard-coded */
+	unsigned char my_mac_addr[6] = {0x00, 0xDE, 0xAD, 0xBE, 0xEF, 0xFF};
+
+	/*
+	 * the 82576 NIC supports 1-PF NIC + 7-VF NICs mode and 8-VF NICs
+	 * mode. In the 8-VF NICs mode, the PF can't tx/rx packets -- it
+	 * only behaves as 'VF supervisor'. For now we use the 1-PF NIC +
+	 * 7-VF NICs mode to preserve PF's tx/rx capability for debugging
+	 * purposes.
+	 */
+	if (nr_virtfn > (MAX_NUM_VFS - 1))
+		return -EINVAL;
+
+	if (nr_virtfn) {
+		dev_info(&pdev->dev, "SR-IOV is enabled\n");
+		/*
+		 * VF resources are currently pre-allocated, so just set
+		 * the MAC address of each VF here.
+		 */
+		for (i = 0; i < nr_virtfn; i++) {
+			my_mac_addr[5] = (unsigned char)i;
+			igb_set_vf_mac(netdev, i, my_mac_addr);
+			igb_set_vf_vmolr(adapter, i);
+		}
+	} else {
+		/*
+		 * Since we statically allocate tx/rx queues for the PF
+		 * and the VFs, we don't need to free any VF-related
+		 * resources here.
+		 */
+		dev_info(&pdev->dev, "SR-IOV is disabled\n");
+	}
+
+	adapter->vfs_allocated_count = nr_virtfn;
+
+	return 0;
+}
+
 /* igb_main.c */
-- 
1.5.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[SR-IOV driver example 1/3 resend] PF driver: hardware specific operations

2008-12-01 Thread Yu Zhao
This patch makes the IGB driver allocate hardware resources (rx/tx queues)
for Virtual Functions. All operations in this patch are hardware specific.

From: Intel Corporation, LAN Access Division <[EMAIL PROTECTED]>
Signed-off-by: Yu Zhao <[EMAIL PROTECTED]>

---
 drivers/net/igb/Makefile        |    2 +-
 drivers/net/igb/e1000_82575.c   |    1 +
 drivers/net/igb/e1000_82575.h   |   61 +
 drivers/net/igb/e1000_defines.h |    7 +
 drivers/net/igb/e1000_hw.h      |    2 +
 drivers/net/igb/e1000_regs.h    |   13 +
 drivers/net/igb/igb.h           |    8 +
 drivers/net/igb/igb_main.c      |  567 +-
 drivers/pci/iov.c               |    6 +-
 9 files changed, 649 insertions(+), 18 deletions(-)

diff --git a/drivers/net/igb/Makefile b/drivers/net/igb/Makefile
index 1927b3f..ab3944c 100644
--- a/drivers/net/igb/Makefile
+++ b/drivers/net/igb/Makefile
@@ -33,5 +33,5 @@
 obj-$(CONFIG_IGB) += igb.o
 
 igb-objs := igb_main.o igb_ethtool.o e1000_82575.o \
-   e1000_mac.o e1000_nvm.o e1000_phy.o
+   e1000_mac.o e1000_nvm.o e1000_phy.o e1000_vf.o
 
diff --git a/drivers/net/igb/e1000_82575.c b/drivers/net/igb/e1000_82575.c
index f5e2e72..bb823ac 100644
--- a/drivers/net/igb/e1000_82575.c
+++ b/drivers/net/igb/e1000_82575.c
@@ -87,6 +87,7 @@ static s32 igb_get_invariants_82575(struct e1000_hw *hw)
case E1000_DEV_ID_82576:
case E1000_DEV_ID_82576_FIBER:
case E1000_DEV_ID_82576_SERDES:
+   case E1000_DEV_ID_82576_QUAD_COPPER:
mac->type = e1000_82576;
break;
default:
diff --git a/drivers/net/igb/e1000_82575.h b/drivers/net/igb/e1000_82575.h
index c1928b5..8c488ab 100644
--- a/drivers/net/igb/e1000_82575.h
+++ b/drivers/net/igb/e1000_82575.h
@@ -170,4 +170,65 @@ struct e1000_adv_tx_context_desc {
 #define E1000_DCA_TXCTRL_CPUID_SHIFT 24 /* Tx CPUID now in the last byte */
 #define E1000_DCA_RXCTRL_CPUID_SHIFT 24 /* Rx CPUID now in the last byte */
 
+#define MAX_NUM_VFS   8
+
+#define E1000_DTXSWC_VMDQ_LOOPBACK_EN (1 << 31)  /* global VF LB enable */
+
+/* Easy defines for setting default pool, would normally be left a zero */
+#define E1000_VT_CTL_DEFAULT_POOL_SHIFT 7
+#define E1000_VT_CTL_DEFAULT_POOL_MASK  (0x7 << E1000_VT_CTL_DEFAULT_POOL_SHIFT)
+
+/* Other useful VMD_CTL register defines */
+#define E1000_VT_CTL_DISABLE_DEF_POOL   (1 << 29)
+#define E1000_VT_CTL_VM_REPL_EN (1 << 30)
+
+/* Per VM Offload register setup */
+#define E1000_VMOLR_LPE     0x0001 /* Accept Long packet */
+#define E1000_VMOLR_AUPE    0x0100 /* Accept untagged packets */
+#define E1000_VMOLR_BAM     0x0800 /* Accept Broadcast packets */
+#define E1000_VMOLR_MPME    0x1000 /* Multicast promiscuous mode */
+#define E1000_VMOLR_STRVLAN 0x4000 /* Vlan stripping enable */
+
+#define E1000_P2VMAILBOX_STS   0x0001 /* Initiate message send to VF */
+#define E1000_P2VMAILBOX_ACK   0x0002 /* Ack message recv'd from VF */
+#define E1000_P2VMAILBOX_VFU   0x0004 /* VF owns the mailbox buffer */
+#define E1000_P2VMAILBOX_PFU   0x0008 /* PF owns the mailbox buffer */
+
+#define E1000_VLVF_ARRAY_SIZE 32
+#define E1000_VLVF_VLANID_MASK    0x0FFF
+#define E1000_VLVF_POOLSEL_SHIFT  12
+#define E1000_VLVF_POOLSEL_MASK   (0xFF << E1000_VLVF_POOLSEL_SHIFT)
+#define E1000_VLVF_VLANID_ENABLE  0x8000
+
+#define E1000_VFMAILBOX_SIZE   16 /* 16 32 bit words - 64 bytes */
+
+/* If it's an E1000_VF_* msg then it originates in the VF and is sent to the
+ * PF.  The reverse is true if it is E1000_PF_*.
+ * Message ACK's are the value or'd with 0xF000
+ */
+#define E1000_VT_MSGTYPE_ACK  0xF000  /* Messages below or'd with
+   * this are the ACK */
+#define E1000_VT_MSGTYPE_NACK 0xFF00  /* Messages below or'd with
+   * this are the NACK */
+#define E1000_VT_MSGINFO_SHIFT 16
+/* bits 23:16 are used for extra info for certain messages */
+#define E1000_VT_MSGINFO_MASK (0xFF << E1000_VT_MSGINFO_SHIFT)
+
+#define E1000_VF_MSGTYPE_REQ_MAC  1 /* VF needs to know its MAC */
+#define E1000_VF_MSGTYPE_VFLR 2 /* VF notifies VFLR to PF */
+#define E1000_VF_SET_MULTICAST    3 /* VF requests PF to set MC addr */
+#define E1000_VF_SET_VLAN 4 /* VF requests PF to set VLAN */
+#define E1000_VF_SET_LPE  5 /* VF requests PF to set VMOLR.LPE */
+
+s32  e1000_send_mail_to_vf(struct e1000_hw *hw, u32 *msg,
+   u32 vf_number, s16 size);
+s32  e1000_receive_mail_from_vf(struct e1000_hw *hw, u32 *msg,
+u32 vf_number, s16 size);
+void e1000_vmdq_loopback_enable_vf(struct e1000_hw *hw);
+void e1000_vmdq_loopback_disable_vf(struct e1000_hw *hw);
+void e1000_vmdq_replication_enable_vf(struct e1000_hw *hw, u32 enables);
+void e1000_vmdq_replication_disable_vf(struct e1000_hw *hw);
+bool e1000_check_for_pf_ack_vf(s
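
The header comment above describes the PF/VF mailbox message word: a message type from the E1000_VF_* list, extra info in bits 23:16, and replies formed by OR-ing in E1000_VT_MSGTYPE_ACK or E1000_VT_MSGTYPE_NACK. A minimal sketch of encoding and decoding such a word follows. The helper names, the enum and the assumption that the type occupies the low byte are illustrative, not from the patch; the constant values are copied as they appear in this archive. Because the NACK pattern (0xFF00) contains every ACK bit (0xF000), a decoder has to test for NACK first.

```c
#include <assert.h>
#include <stdint.h>

/* Values as they appear in the e1000_82575.h hunk of this archive copy. */
#define E1000_VT_MSGTYPE_ACK    0xF000
#define E1000_VT_MSGTYPE_NACK   0xFF00
#define E1000_VT_MSGINFO_SHIFT  16
#define E1000_VT_MSGINFO_MASK   (0xFF << E1000_VT_MSGINFO_SHIFT)
#define E1000_VF_SET_MULTICAST  3

/* Assumption: the message type lives in the low byte, below the ACK/NACK bits. */
#define SKETCH_MSGTYPE_MASK     0x00FF

/* Hypothetical reply classification, not part of the patch. */
enum vt_reply { VT_REPLY_NONE, VT_REPLY_ACK, VT_REPLY_NACK };

/* Hypothetical helper: build a message word from a type and bits 23:16 info. */
static uint32_t vt_msg_make(uint32_t type, uint32_t info)
{
	assert(type <= SKETCH_MSGTYPE_MASK && info <= 0xFF);
	return type | (info << E1000_VT_MSGINFO_SHIFT);
}

static uint32_t vt_msg_type(uint32_t msg)
{
	return msg & SKETCH_MSGTYPE_MASK;
}

static uint32_t vt_msg_info(uint32_t msg)
{
	return (msg & E1000_VT_MSGINFO_MASK) >> E1000_VT_MSGINFO_SHIFT;
}

/* NACK's bit pattern includes ACK's, so check NACK first. */
static enum vt_reply vt_msg_reply(uint32_t msg)
{
	if ((msg & E1000_VT_MSGTYPE_NACK) == E1000_VT_MSGTYPE_NACK)
		return VT_REPLY_NACK;
	if ((msg & E1000_VT_MSGTYPE_ACK) == E1000_VT_MSGTYPE_ACK)
		return VT_REPLY_ACK;
	return VT_REPLY_NONE;
}
```

For example, `vt_msg_make(E1000_VF_SET_MULTICAST, 2)` yields 0x00020003, and OR-ing in E1000_VT_MSGTYPE_ACK leaves the type and info fields readable.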

RE: [PATCH] Kvm: Qemu: save nvram

2008-12-01 Thread Zhang, Yang
Hi:
Please drop the previous one.

From 2fd0c2746a2d07813ad16700ee31c7f6ae78c40a Mon Sep 17 00:00:00 2001
From: Yang Zhang <[EMAIL PROTECTED]>
Date: Tue, 2 Dec 2008 13:05:55 +0800
Subject: [PATCH] KVM: Qemu: save nvram

Support saving the NVRAM to a file.

Signed-off-by: Yang Zhang <[EMAIL PROTECTED]>
---
 qemu/hw/ipf.c   |   19 -
 qemu/target-ia64/firmware.c |   94 --
 qemu/target-ia64/firmware.h |   22 +-
 3 files changed, 126 insertions(+), 9 deletions(-)

diff --git a/qemu/hw/ipf.c b/qemu/hw/ipf.c
index 337c854..2300ba9 100644
--- a/qemu/hw/ipf.c
+++ b/qemu/hw/ipf.c
@@ -51,6 +51,7 @@
 static fdctrl_t *floppy_controller;
 static RTCState *rtc_state;
 static PCIDevice *i440fx_state;
+uint8_t *g_fw_start;

 static uint32_t ipf_to_legacy_io(target_phys_addr_t addr)
 {
@@ -454,9 +455,13 @@ static void ipf_init1(ram_addr_t ram_size, int vga_ram_size,
 unsigned long  image_size;
 char *image = NULL;
 uint8_t *fw_image_start;
+unsigned long nvram_addr = 0;
+unsigned long nvram_fd = 0;
+unsigned long i = 0;
 ram_addr_t fw_offset = qemu_ram_alloc(GFW_SIZE);
 uint8_t *fw_start = phys_ram_base + fw_offset;

+g_fw_start = fw_start;
 snprintf(buf, sizeof(buf), "%s/%s", bios_dir, FW_FILENAME);
 image = read_image(buf, &image_size );
 if (NULL == image || !image_size) {
@@ -472,7 +477,19 @@ static void ipf_init1(ram_addr_t ram_size, int vga_ram_size,
 free(image);
 flush_icache_range((unsigned long)fw_image_start,
(unsigned long)fw_image_start + image_size);
-kvm_ia64_build_hob(ram_size + above_4g_mem_size, smp_cpus, fw_start);
+if (qemu_name) {
+nvram_addr = NVRAM_START;
+nvram_fd = kvm_ia64_nvram_init();
+if (nvram_fd != -1) {
+kvm_ia64_copy_from_nvram_to_GFW(nvram_fd, g_fw_start);
+close(nvram_fd);
+}
+i = atexit(kvm_ia64_copy_from_GFW_to_nvram);
+if (i != 0)
+fprintf(stderr, "cannot set exit function\n");
+}
+kvm_ia64_build_hob(ram_size + above_4g_mem_size,smp_cpus,fw_start,
+   nvram_addr);
 }

 /*Register legacy io address space, size:64M*/
diff --git a/qemu/target-ia64/firmware.c b/qemu/target-ia64/firmware.c
index bac2721..6729cb5 100644
--- a/qemu/target-ia64/firmware.c
+++ b/qemu/target-ia64/firmware.c
@@ -31,6 +31,9 @@

 #include "firmware.h"

+#include "qemu-common.h"
+#include "sysemu.h"
+
 typedef struct {
 unsigned long signature;
 unsigned int  type;
@@ -85,14 +88,16 @@ static int hob_init(void  *buffer ,unsigned long buf_size);
 static int add_pal_hob(void* hob_buf);
 static int add_mem_hob(void* hob_buf, unsigned long dom_mem_size);
 static int add_vcpus_hob(void* hob_buf, unsigned long nr_vcpu);
+static int add_nvram_hob(void *hob_buf, unsigned long nvram_addr);
 static int build_hob(void* hob_buf, unsigned long hob_buf_size,
- unsigned long dom_mem_size, unsigned long vcpus);
+ unsigned long dom_mem_size, unsigned long vcpus,
+ unsigned long nvram_addr);
 static int load_hob(void *hob_buf,
 unsigned long dom_mem_size, void* hob_start);

 int
-kvm_ia64_build_hob(unsigned long memsize,
-   unsigned long vcpus, uint8_t* fw_start)
+kvm_ia64_build_hob(unsigned long memsize, unsigned long vcpus,
+   uint8_t* fw_start, unsigned long nvram_addr)
 {
 char   *hob_buf;

@@ -102,7 +107,7 @@ kvm_ia64_build_hob(unsigned long memsize,
 return -1;
 }

-if (build_hob(hob_buf, GFW_HOB_SIZE, memsize, vcpus) < 0) {
+if (build_hob(hob_buf, GFW_HOB_SIZE, memsize, vcpus, nvram_addr) < 0) {
 free(hob_buf);
 Hob_Output("Could not build hob");
 return -1;
@@ -206,7 +211,8 @@ add_max_hob_entry(void* hob_buf)

 static int
 build_hob(void* hob_buf, unsigned long hob_buf_size,
-  unsigned long dom_mem_size, unsigned long vcpus)
+  unsigned long dom_mem_size, unsigned long vcpus,
+  unsigned long nvram_addr)
 {
 //Init HOB List
 if (hob_init(hob_buf, hob_buf_size) < 0) {
@@ -229,6 +235,11 @@ build_hob(void* hob_buf, unsigned long hob_buf_size,
 goto err_out;
 }

+if (add_nvram_hob(hob_buf, nvram_addr) < 0) {
+   Hob_Output("Add nvram hob failed, buffer too small");
+   goto err_out;
+   }
+
 if (add_max_hob_entry(hob_buf) < 0) {
 Hob_Output("Add max hob entry failed, buffer too small");
 goto err_out;
@@ -285,6 +296,12 @@ add_vcpus_hob(void* hob_buf, unsigned long vcpus)
 return hob_add(hob_buf, HOB_TYPE_NR_VCPU, &vcpus, sizeof(vcpus));
 }

+static int
+add_nvram_hob(void *hob_buf, unsigned long nvram_addr)
+{
+return hob_add(hob_buf, HOB_TYPE_NR_NVRAM, &nvram_addr, sizeof(nvram_addr)
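
The resend passes nvram_addr to the guest firmware as one more HOB entry, appended by add_nvram_hob() through hob_add() with type HOB_TYPE_NR_NVRAM; build_hob() fails if the buffer is too small. The real HOB header layout lives in firmware.c and is not shown here, so the following is only a generic type-length-value sketch of that append-and-lookup idea. `struct tlv_buf`, `struct tlv_hdr`, `tlv_add()` and `tlv_find()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the HOB buffer. */
struct tlv_buf {
	unsigned char data[64];
	size_t used;
};

/* Hypothetical entry header: type plus payload length in bytes. */
struct tlv_hdr {
	uint32_t type;
	uint32_t len;
};

/* Append one entry; returns -1 if the buffer is too small, like build_hob(). */
static int tlv_add(struct tlv_buf *b, uint32_t type, const void *payload,
		   uint32_t len)
{
	struct tlv_hdr h = { type, len };

	assert(b != NULL);
	if (sizeof(b->data) - b->used < sizeof(h) + len)
		return -1;
	memcpy(b->data + b->used, &h, sizeof(h));
	memcpy(b->data + b->used + sizeof(h), payload, len);
	b->used += sizeof(h) + len;
	return 0;
}

/* Return a pointer to the first payload of the given type, or NULL. */
static const void *tlv_find(const struct tlv_buf *b, uint32_t type,
			    uint32_t *len_out)
{
	size_t off = 0;

	while (off + sizeof(struct tlv_hdr) <= b->used) {
		struct tlv_hdr h;

		memcpy(&h, b->data + off, sizeof(h));
		if (h.type == type) {
			if (len_out)
				*len_out = h.len;
			return b->data + off + sizeof(h);
		}
		off += sizeof(h) + h.len;
	}
	return NULL;
}
```

Copying the header with memcpy() instead of casting a pointer into the byte buffer avoids unaligned access when payload sizes vary.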

[SR-IOV driver example 0/3 resend] introduction

2008-12-01 Thread Yu Zhao
SR-IOV drivers for the Intel 82576 NIC are available. There are two
parts: a Physical Function (PF) driver and a Virtual Function (VF) driver.
The PF driver is based on the IGB driver; it controls the PF, allocates
hardware-specific resources and interfaces with the SR-IOV core.
The VF driver is a new NIC driver that works like a traditional PCI
device driver. It runs in both host and guest (Xen and KVM)
environments.

These two drivers are test versions and are *only* intended to
show how to use the SR-IOV API.

Intel 82576 NIC specification can be found at:
http://download.intel.com/design/network/datashts/82576_Datasheet_v2p1.pdf

[SR-IOV driver example 0/3 resend] introduction
[SR-IOV driver example 1/3 resend] PF driver: hardware specific operations
[SR-IOV driver example 2/3 resend] PF driver: integrate with SR-IOV core
[SR-IOV driver example 3/3 resend] VF driver: an independent PCI NIC driver


[PATCH] Kvm: Qemu: save nvram

2008-12-01 Thread Zhang, Yang
This patch saves the NVRAM to a file. The NVRAM is saved only when the
-name argument is specified, and the saved file is named after that
argument. If -name is not specified, the NVRAM is not saved.
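
The mechanism described above (load a previously saved NVRAM into the firmware area at start-up when -name is given, and write it back from an atexit() handler) can be sketched in plain user-space C. This is a hypothetical illustration, not the patch's code: `SKETCH_NVRAM_SIZE`, `guest_nvram`, `nvram_setup()`, `nvram_load()` and `nvram_save()` are made-up names, the buffer stands in for the NVRAM region at NVRAM_START inside the firmware image, stdio replaces the patch's file-descriptor calls, and the caller is assumed to have already turned the -name argument into a file path.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_NVRAM_SIZE 64   /* illustrative size, not the real NVRAM size */

/* Hypothetical stand-in for the NVRAM area inside the guest firmware image. */
static unsigned char guest_nvram[SKETCH_NVRAM_SIZE];
static char nvram_path[256];

/* Copy a saved NVRAM file into the guest area; returns bytes read or -1. */
static int nvram_load(const char *path)
{
	FILE *f = fopen(path, "rb");
	size_t n;

	if (f == NULL)
		return -1;	/* no saved file yet: keep the fresh NVRAM */
	n = fread(guest_nvram, 1, sizeof(guest_nvram), f);
	fclose(f);
	return (int)n;
}

/* atexit() handler: write the guest NVRAM back to the file. */
static void nvram_save(void)
{
	FILE *f = fopen(nvram_path, "wb");

	if (f == NULL) {
		perror("nvram_save");
		return;
	}
	if (fwrite(guest_nvram, 1, sizeof(guest_nvram), f) != sizeof(guest_nvram))
		perror("nvram_save");
	fclose(f);
}

/* Called only when a VM name was given, like the if (qemu_name) branch. */
static int nvram_setup(const char *path)
{
	assert(path != NULL);
	snprintf(nvram_path, sizeof(nvram_path), "%s", path);
	(void)nvram_load(nvram_path);
	if (atexit(nvram_save) != 0) {
		fprintf(stderr, "cannot set exit function\n");
		return -1;
	}
	return 0;
}
```

Registering the write-back with atexit() means it runs on a normal exit() but not if the process is killed by a signal, so NVRAM changes can be lost in that case.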

From d3e31cda03ef67efc860eaec2f93153e5535d744 Mon Sep 17 00:00:00 2001
From: Yang Zhang <[EMAIL PROTECTED]>
Date: Tue, 2 Dec 2008 10:02:00 +0800
Subject: [PATCH] Kvm: Qemu: save nvram

Support saving the NVRAM to a file.

Signed-off-by: Yang Zhang <[EMAIL PROTECTED]>
---
 qemu/hw/ipf.c   |   15 ++-
 qemu/target-ia64/firmware.c |  107 +--
 qemu/target-ia64/firmware.h |   22 -
 3 files changed, 135 insertions(+), 9 deletions(-)

diff --git a/qemu/hw/ipf.c b/qemu/hw/ipf.c
index 337c854..cdbd4e0 100644
--- a/qemu/hw/ipf.c
+++ b/qemu/hw/ipf.c
@@ -51,6 +51,7 @@
 static fdctrl_t *floppy_controller;
 static RTCState *rtc_state;
 static PCIDevice *i440fx_state;
+uint8_t *g_fw_start;
 
 static uint32_t ipf_to_legacy_io(target_phys_addr_t addr)
 {
@@ -454,9 +455,12 @@ static void ipf_init1(ram_addr_t ram_size, int vga_ram_size,
 unsigned long  image_size;
 char *image = NULL;
 uint8_t *fw_image_start;
+unsigned long nvram_addr = 0;
+unsigned long nvram_fd = 0;
 ram_addr_t fw_offset = qemu_ram_alloc(GFW_SIZE);
 uint8_t *fw_start = phys_ram_base + fw_offset;
 
+g_fw_start = fw_start;
 snprintf(buf, sizeof(buf), "%s/%s", bios_dir, FW_FILENAME);
 image = read_image(buf, &image_size );
 if (NULL == image || !image_size) {
@@ -472,7 +476,16 @@ static void ipf_init1(ram_addr_t ram_size, int vga_ram_size,
 free(image);
 flush_icache_range((unsigned long)fw_image_start,
(unsigned long)fw_image_start + image_size);
-kvm_ia64_build_hob(ram_size + above_4g_mem_size, smp_cpus, fw_start);
+if (qemu_name) {
+nvram_addr = NVRAM_START;
+if((nvram_fd = kvm_ia64_nvram_init()) != -1) {
+kvm_ia64_copy_from_nvram_to_GFW(nvram_fd,g_fw_start);
+close(nvram_fd);
+}
+atexit(kvm_ia64_copy_from_GFW_to_nvram);
+}
+kvm_ia64_build_hob(ram_size + above_4g_mem_size,smp_cpus,fw_start,
+   nvram_addr);
 }
 
 /*Register legacy io address space, size:64M*/
diff --git a/qemu/target-ia64/firmware.c b/qemu/target-ia64/firmware.c
index bac2721..39c8361 100644
--- a/qemu/target-ia64/firmware.c
+++ b/qemu/target-ia64/firmware.c
@@ -31,6 +31,8 @@
 
 #include "firmware.h"
 
+#include "qemu-common.h"
+#include "sysemu.h"
 typedef struct {
 unsigned long signature;
 unsigned int  type;
@@ -85,14 +87,16 @@ static int hob_init(void  *buffer ,unsigned long buf_size);
 static int add_pal_hob(void* hob_buf);
 static int add_mem_hob(void* hob_buf, unsigned long dom_mem_size);
 static int add_vcpus_hob(void* hob_buf, unsigned long nr_vcpu);
+static int add_nvram_hob(void *hob_buf, unsigned long nvram_addr);
 static int build_hob(void* hob_buf, unsigned long hob_buf_size,
- unsigned long dom_mem_size, unsigned long vcpus);
+ unsigned long dom_mem_size, unsigned long vcpus,
+ unsigned long nvram_addr);
 static int load_hob(void *hob_buf,
 unsigned long dom_mem_size, void* hob_start);
 
 int
-kvm_ia64_build_hob(unsigned long memsize,
-   unsigned long vcpus, uint8_t* fw_start)
+kvm_ia64_build_hob(unsigned long memsize, unsigned long vcpus,
+   uint8_t* fw_start,unsigned long nvram_addr)
 {
 char   *hob_buf;
 
@@ -102,7 +106,7 @@ kvm_ia64_build_hob(unsigned long memsize,
 return -1;
 }
 
-if (build_hob(hob_buf, GFW_HOB_SIZE, memsize, vcpus) < 0) {
+if (build_hob(hob_buf, GFW_HOB_SIZE, memsize, vcpus,nvram_addr) < 0) {
 free(hob_buf);
 Hob_Output("Could not build hob");
 return -1;
@@ -206,7 +210,7 @@ add_max_hob_entry(void* hob_buf)
 
 static int
 build_hob(void* hob_buf, unsigned long hob_buf_size,
-  unsigned long dom_mem_size, unsigned long vcpus)
+  unsigned long dom_mem_size, unsigned long vcpus,unsigned long nvram_addr)
 {
 //Init HOB List
 if (hob_init(hob_buf, hob_buf_size) < 0) {
@@ -229,6 +233,11 @@ build_hob(void* hob_buf, unsigned long hob_buf_size,
 goto err_out;
 }
 
+if (add_nvram_hob(hob_buf, nvram_addr) < 0) {
+   Hob_Output("Add nvram hob failed, buffer too small");
+   goto err_out;
+   }
+
 if (add_max_hob_entry(hob_buf) < 0) {
 Hob_Output("Add max hob entry failed, buffer too small");
 goto err_out;
@@ -285,6 +294,12 @@ add_vcpus_hob(void* hob_buf, unsigned long vcpus)
 return hob_add(hob_buf, HOB_TYPE_NR_VCPU, &vcpus, sizeof(vcpus));
 }
 
+static int
+add_nvram_hob(void *hob_buf, unsigned long nvram_addr)
+{
+return hob_add(hob_buf, HOB_TYPE_NR_NVRAM, &nvram_addr, sizeof(

RE: [PATCH] KVM: Qemu: push_nmi should be only used by I386 Arch.

2008-12-01 Thread Zhang, Xiantao
Oops, it seems we introduced the issue together.

Acked-by: Xiantao Zhang <[EMAIL PROTECTED]>

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 02, 2008 7:03 AM
To: Hollis Blanchard
Cc: Avi Kivity; Zhang, Xiantao; kvm@vger.kernel.org; [EMAIL PROTECTED]
Subject: Re: [PATCH] KVM: Qemu: push_nmi should be only used by I386 Arch.

Hollis Blanchard wrote:
> On Fri, 2008-11-28 at 10:26 +0100, Jan Kiszka wrote:
>> Zhang, Xiantao wrote:
>>> From c25fa2e4de40e500bd364c3267d5be89a9cfbb4d Mon Sep 17 00:00:00 2001
>>> From: Xiantao Zhang <[EMAIL PROTECTED]>
>>> Date: Fri, 28 Nov 2008 09:38:46 +0800
>>> Subject: [PATCH] KVM: Qemu: push_nmi should be only used by I386 Arch.
>>>
>>> Use TARGET_I386 to exclude other archs.
>>> Signed-off-by: Xiantao Zhang <[EMAIL PROTECTED]>
>>> ---
>>>  libkvm/libkvm.c |    4 ++--
>>>  qemu/qemu-kvm.c |    4 ++++
>>>  2 files changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
>>> index 40c95ce..851a93a 100644
>>> --- a/libkvm/libkvm.c
>>> +++ b/libkvm/libkvm.c
>>> @@ -868,7 +868,7 @@ int kvm_run(kvm_context_t kvm, int vcpu, void *env)
>>> struct kvm_run *run = kvm->run[vcpu];
>>>  
>>>  again:
>>> -#ifdef KVM_CAP_NMI
>>> +#ifdef TARGET_I386
>>> push_nmi(kvm);
>>>  #endif
>>>  #if !defined(__s390__)
>>> @@ -1032,7 +1032,7 @@ int kvm_has_sync_mmu(kvm_context_t kvm)
>>>  
>>>  int kvm_inject_nmi(kvm_context_t kvm, int vcpu)
>>>  {
>>> -#ifdef KVM_CAP_NMI
>>> +#ifdef TARGET_I386
>>> return ioctl(kvm->vcpu_fd[vcpu], KVM_NMI);
>>>  #else
>>> return -ENOSYS;
>>> diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
>>> index cf0e85d..b6c8288 100644
>>> --- a/qemu/qemu-kvm.c
>>> +++ b/qemu/qemu-kvm.c
>>> @@ -154,10 +154,12 @@ static int try_push_interrupts(void *opaque)
>>>  return kvm_arch_try_push_interrupts(opaque);
>>>  }
>>>  
>>> +#ifdef TARGET_I386
>>>  static void push_nmi(void *opaque)
>>>  {
>>>  kvm_arch_push_nmi(opaque);
>>>  }
>>> +#endif
>>>  
>>>  static void post_kvm_run(void *opaque, void *data)
>>>  {
>>> @@ -742,7 +744,9 @@ static struct kvm_callbacks qemu_kvm_ops = {
>>>  .shutdown = kvm_shutdown,
>>>  .io_window = kvm_io_window,
>>>  .try_push_interrupts = try_push_interrupts,
>>> +#ifdef TARGET_I386
>>>  .push_nmi = push_nmi,
>>> +#endif
>>>  .post_kvm_run = post_kvm_run,
>>>  .pre_kvm_run = pre_kvm_run,
>>>  #ifdef TARGET_I386
>> This will now break when KVM_CAP_NMI is undefined, ie. when there is no
>> KVM_NMI IOCTL (=> older kvm module sets).
> 
> Guys, we already have stubs for this (although they've been turned into
> dead code). Jan broke IA64 and PowerPC builds when he renamed
> "kvm_arch_try_push_nmi" to "kvm_arch_push_nmi", and the obvious fix is
> to update the stubs to match. That avoids all these ifdefs and
> associated problems.

Ouch - I'm sorry.

> 
> Avi, could you revert a8d12f98755be9330fcde055134511f76ecaa538 please?
> 

Here is a patch that reverts that change and fixes the root of the issue.

---

Subject: Fix non-x86 NMI hooks

My previous x86-only change to the NMI push hook broke PPC and IA64.
This is a proper fix plus a cleanup of the #ifdef-based approach to
solve the breakage.
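
Hollis's suggestion, which this patch implements, is to keep generic code free of #ifdefs by giving every architecture a kvm_arch_push_nmi() with the same signature, empty where NMIs are not supported. A minimal single-file sketch of that pattern follows. The call counter and `struct kvm_callbacks_sketch` are hypothetical stand-ins for an architecture's real hook body and for the real qemu_kvm_ops table.

```c
#include <assert.h>
#include <stddef.h>

/* One prototype shared by every architecture. */
void kvm_arch_push_nmi(void *opaque);

/* Hypothetical: counts calls so the dispatch is observable. */
static int nmi_hook_calls;

/*
 * Stand-in for one architecture's hook. On IA64 and PowerPC the patch
 * turns this into an empty stub; x86 does the real NMI injection.
 */
void kvm_arch_push_nmi(void *opaque)
{
	(void)opaque;
	nmi_hook_calls++;
}

/* Generic wrapper: no #ifdef TARGET_I386 needed here... */
static void push_nmi(void *opaque)
{
	kvm_arch_push_nmi(opaque);
}

/* ...or around the callback-table entry (hypothetical table type). */
struct kvm_callbacks_sketch {
	void (*push_nmi)(void *opaque);
};

static const struct kvm_callbacks_sketch qemu_kvm_ops_sketch = {
	.push_nmi = push_nmi,
};
```

The trade-off is one empty function per architecture in exchange for generic code that compiles identically everywhere, which is what keeps a rename like kvm_arch_try_push_nmi to kvm_arch_push_nmi from silently breaking other architectures' builds.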

Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
---

 qemu/qemu-kvm-ia64.c    |    3 +--
 qemu/qemu-kvm-powerpc.c |    3 +--
 qemu/qemu-kvm.c         |    4 ----
 3 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/qemu/qemu-kvm-ia64.c b/qemu/qemu-kvm-ia64.c
index 8380f39..a6b17af 100644
--- a/qemu/qemu-kvm-ia64.c
+++ b/qemu/qemu-kvm-ia64.c
@@ -57,9 +57,8 @@ int kvm_arch_try_push_interrupts(void *opaque)
 return 1;
 }
 
-int kvm_arch_try_push_nmi(void *opaque)
+void kvm_arch_push_nmi(void *opaque)
 {
-return 1;
 }
 
 void kvm_arch_update_regs_for_sipi(CPUState *env)
diff --git a/qemu/qemu-kvm-powerpc.c b/qemu/qemu-kvm-powerpc.c
index 19fde40..fa534ed 100644
--- a/qemu/qemu-kvm-powerpc.c
+++ b/qemu/qemu-kvm-powerpc.c
@@ -188,12 +188,11 @@ int kvm_arch_try_push_interrupts(void *opaque)
 return 0;
 }
 
-int kvm_arch_try_push_nmi(void *opaque)
+void kvm_arch_push_nmi(void *opaque)
 {
/* no nmi irq, so discard that call for now and return success.
 * This might later get mapped to something on powerpc too if we want
 *  to support the nmi monitor command somwhow */
-   return 0;
 }
 
 void kvm_arch_update_regs_for_sipi(CPUState *env)
diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
index b6c8288..cf0e85d 100644
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -154,12 +154,10 @@ static int try_push_interrupts(void *opaque)
 return kvm_arch_try_push_interrupts(opaque);
 }
 
-#ifdef TARGET_I386
 static void push_nmi(void *opaque)
 {
 kvm_arch_push_nmi(opaque);
 }
-#endif
 
 static void post_kvm_run(void *opaque, void *data)
 {
@@ -744,9 +742,7 @@ static struct kvm_callbacks qemu_kvm_ops = {
 .shutdown = kvm_shutdown,
 .io_window = kvm_io_window,
 .try_push_interrupts = try_push_in

Re: [Qemu-devel] qemu-img commit -- is there a limit on file sizes?

2008-12-01 Thread walt


On Mon, 1 Dec 2008, Avi Kivity wrote:

> Anthony Liguori wrote:
> >
> > We've started getting some reports of corruption on "commit" in KVM.  There
> > is a long standing disk corruption issue too that is very difficult to
> > reproduce.  The thinking is that there is a bug somewhere in the qcow2 code.
> >
> > Is anyone actively looking into this?
> >
>
> I am, though my activity is a lot less than could be desired.  Additional eyes
> would be welcome.

FWIW, I must apologize for giving you incorrect data.  I'm seeing problems
now that have nothing to do with the size of the commit, and I'm beginning
to suspect that the commit step has nothing to do with the problem.  I'll
summarize my evidence because it seems potentially very important:

I installed WinXP on qcow2, which went perfectly.  I rebooted multiple times
with no problems and changed settings for my desktop and taskbar, rebooted
again with no problems.

Now, I make a new, fresh [what's the opposite of a backing file?] like this:
$ qemu-img create -f qcow2 -b kvmXP kvmXP.delta

All I do is boot XP again like this:
$ qemu-system-x86_64 -m 1000 kvmXP.delta

My shock is that one of the taskbar settings I changed has disappeared!
I can boot the original kvmXP qcow2 image and verify that my changes are
still there in the original, but not when I boot kvmXP.delta.

In case someone wants to try to reproduce it, the specific change I made
to the task bar is to display the animated network activity icon in the
system tray.  That icon never appears when I boot from kvmXP.delta.

In other words, from the instant I boot kvmXP.delta, the image of XP that
gets loaded into memory is not an accurate reflection of what's in the
backing file.  If that's true, then it's not surprising that the commit
step causes trouble.

Does my reasoning seem reasonable? :o)






[patch 4/4] KVM: MMU: prepopulate the shadow on invlpg

2008-12-01 Thread Marcelo Tosatti
If the guest executes invlpg, peek into the pagetable and attempt to
prepopulate the shadow entry.

Also stop dirty fault updates from interfering with the fork detector.

2% improvement on RHEL3/AIM7.
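
To prepopulate, the invlpg walker in this patch records the guest physical address of the guest PTE that the zapped shadow entry mirrors: the guest page-table frame (sp->gfn) shifted by PAGE_SHIFT, plus the shadow entry's index within sp->spt times sizeof(pt_element_t). The arithmetic in isolation, as a hypothetical helper assuming 4 KiB pages; this simple form holds when guest and shadow tables have the same number of entries, as with 64-bit and PAE guests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/*
 * Hypothetical helper mirroring the two sw->pte_gpa lines in
 * shadow_invlpg_entry(): gfn is the guest page-table frame (sp->gfn),
 * index is the shadow entry's position (sptep - sp->spt), and
 * entry_size is sizeof(pt_element_t).
 */
static uint64_t pte_gpa_of(uint64_t gfn, uint64_t index, size_t entry_size)
{
	assert(entry_size == 4 || entry_size == 8);
	return (gfn << SKETCH_PAGE_SHIFT) + index * entry_size;
}
```

For example, entry 5 of a guest page table at frame 0x1234 with 8-byte entries lives at guest physical address 0x1234028.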

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -2441,7 +2441,8 @@ static void kvm_mmu_access_page(struct k
 }
 
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-  const u8 *new, int bytes)
+  const u8 *new, int bytes,
+  bool guest_initiated)
 {
gfn_t gfn = gpa >> PAGE_SHIFT;
struct kvm_mmu_page *sp;
@@ -2467,15 +2468,17 @@ void kvm_mmu_pte_write(struct kvm_vcpu *
kvm_mmu_free_some_pages(vcpu);
++vcpu->kvm->stat.mmu_pte_write;
kvm_mmu_audit(vcpu, "pre pte write");
-   if (gfn == vcpu->arch.last_pt_write_gfn
-   && !last_updated_pte_accessed(vcpu)) {
-   ++vcpu->arch.last_pt_write_count;
-   if (vcpu->arch.last_pt_write_count >= 3)
-   flooded = 1;
-   } else {
-   vcpu->arch.last_pt_write_gfn = gfn;
-   vcpu->arch.last_pt_write_count = 1;
-   vcpu->arch.last_pte_updated = NULL;
+   if (guest_initiated) {
+   if (gfn == vcpu->arch.last_pt_write_gfn
+   && !last_updated_pte_accessed(vcpu)) {
+   ++vcpu->arch.last_pt_write_count;
+   if (vcpu->arch.last_pt_write_count >= 3)
+   flooded = 1;
+   } else {
+   vcpu->arch.last_pt_write_gfn = gfn;
+   vcpu->arch.last_pt_write_count = 1;
+   vcpu->arch.last_pte_updated = NULL;
+   }
}
index = kvm_page_table_hashfn(gfn);
bucket = &vcpu->kvm->arch.mmu_page_hash[index];
@@ -2615,9 +2618,7 @@ EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);
 
 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
 {
-   spin_lock(&vcpu->kvm->mmu_lock);
vcpu->arch.mmu.invlpg(vcpu, gva);
-   spin_unlock(&vcpu->kvm->mmu_lock);
kvm_mmu_flush_tlb(vcpu);
++vcpu->stat.invlpg;
 }
Index: kvm/arch/x86/kvm/paging_tmpl.h
===
--- kvm.orig/arch/x86/kvm/paging_tmpl.h
+++ kvm/arch/x86/kvm/paging_tmpl.h
@@ -82,6 +82,7 @@ struct shadow_walker {
int *ptwrite;
pfn_t pfn;
u64 *sptep;
+   gpa_t pte_gpa;
 };
 
 static gfn_t gpte_to_gfn(pt_element_t gpte)
@@ -222,7 +223,7 @@ walk:
if (ret)
goto walk;
pte |= PT_DIRTY_MASK;
-   kvm_mmu_pte_write(vcpu, pte_gpa, (u8 *)&pte, sizeof(pte));
+   kvm_mmu_pte_write(vcpu, pte_gpa, (u8 *)&pte, sizeof(pte), 0);
walker->ptes[walker->level - 1] = pte;
}
 
@@ -468,8 +469,15 @@ static int FNAME(shadow_invlpg_entry)(st
  struct kvm_vcpu *vcpu, u64 addr,
  u64 *sptep, int level)
 {
+   struct shadow_walker *sw =
+   container_of(_sw, struct shadow_walker, walker);
 
if (level == PT_PAGE_TABLE_LEVEL) {
+   struct kvm_mmu_page *sp = page_header(__pa(sptep));
+
+   sw->pte_gpa = (sp->gfn << PAGE_SHIFT);
+   sw->pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t);
+
if (is_shadow_present_pte(*sptep))
rmap_remove(vcpu->kvm, sptep);
set_shadow_pte(sptep, shadow_trap_nonpresent_pte);
@@ -482,11 +490,26 @@ static int FNAME(shadow_invlpg_entry)(st
 
 static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 {
+   pt_element_t gpte;
struct shadow_walker walker = {
.walker = { .entry = FNAME(shadow_invlpg_entry), },
+   .pte_gpa = -1,
};
 
+   spin_lock(&vcpu->kvm->mmu_lock);
walk_shadow(&walker.walker, vcpu, gva);
+   spin_unlock(&vcpu->kvm->mmu_lock);
+   if (walker.pte_gpa == -1)
+   return;
+   if (kvm_read_guest_atomic(vcpu->kvm, walker.pte_gpa, &gpte,
+ sizeof(pt_element_t)))
+   return;
+   if (is_present_pte(gpte) && (gpte & PT_ACCESSED_MASK)) {
+   if (mmu_topup_memory_caches(vcpu))
+   return;
+   kvm_mmu_pte_write(vcpu, walker.pte_gpa, (const u8 *)&gpte,
+ sizeof(pt_element_t), 0);
+   }
 }
 
 static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr)
Index: kvm/arch/x86/kvm/x86.c
===
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -2046,7 +2046,7 @@ int emulator_write_phys(struct kvm_vcpu 
ret = kvm_write_gue

[patch 3/4] KVM: MMU: skip global pgtables on sync due to cr3 switch

2008-12-01 Thread Marcelo Tosatti
Skip syncing global pages on cr3 switch (but not on cr4/cr0). This is
important for Linux 32-bit guests with PAE, where the kmap page is
marked as global.
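
In this patch a shadow page starts out global and loses that status as soon as set_spte() installs a non-global mapping in it. Unsync global pages go onto oos_global_pages instead of marking their parents unsync, so a cr3-triggered root sync does not reach them, and kvm_mmu_sync_global() handles them separately (its cr0/cr4 call sites are not in the visible part of the patch). A hypothetical model of that policy, with made-up names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a shadow page's sync state. */
struct sp_sketch {
	bool global;   /* starts true; cleared once any non-global spte lands */
	bool unsync;
};

/* Mirrors the set_spte() rule: one non-global mapping makes the page non-global. */
static void sp_map_spte(struct sp_sketch *sp, bool spte_global)
{
	if (!spte_global)
		sp->global = false;
}

/*
 * cr3 switch: include_global = false, so unsync global pages are skipped.
 * Global sync (cr0/cr4 path): include_global = true.
 * Returns how many pages were synced.
 */
static size_t sync_unsync_pages(struct sp_sketch *sps, size_t n,
				bool include_global)
{
	size_t i, synced = 0;

	assert(sps != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (sps[i].unsync && (include_global || !sps[i].global)) {
			sps[i].unsync = false;
			synced++;
		}
	}
	return synced;
}
```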

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>

Index: kvm/arch/x86/include/asm/kvm_host.h
===
--- kvm.orig/arch/x86/include/asm/kvm_host.h
+++ kvm/arch/x86/include/asm/kvm_host.h
@@ -182,6 +182,8 @@ struct kvm_mmu_page {
struct list_head link;
struct hlist_node hash_link;
 
+   struct list_head oos_link;
+
/*
 * The following two entries are used to key the shadow page in the
 * hash table.
@@ -200,6 +202,7 @@ struct kvm_mmu_page {
int multimapped; /* More than one parent_pte? */
int root_count;  /* Currently serving as active root */
bool unsync;
+   bool global;
unsigned int unsync_children;
union {
u64 *parent_pte;   /* !multimapped */
@@ -356,6 +359,7 @@ struct kvm_arch{
 */
struct list_head active_mmu_pages;
struct list_head assigned_dev_head;
+   struct list_head oos_global_pages;
struct dmar_domain *intel_iommu_domain;
struct kvm_pic *vpic;
struct kvm_ioapic *vioapic;
@@ -385,6 +389,7 @@ struct kvm_vm_stat {
u32 mmu_recycled;
u32 mmu_cache_miss;
u32 mmu_unsync;
+   u32 mmu_unsync_global;
u32 remote_tlb_flush;
u32 lpages;
 };
@@ -603,6 +608,7 @@ void __kvm_mmu_free_some_pages(struct kv
 int kvm_mmu_load(struct kvm_vcpu *vcpu);
 void kvm_mmu_unload(struct kvm_vcpu *vcpu);
 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
+void kvm_mmu_sync_global(struct kvm_vcpu *vcpu);
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
 
Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -793,9 +793,11 @@ static struct kvm_mmu_page *kvm_mmu_allo
sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE);
set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
+   INIT_LIST_HEAD(&sp->oos_link);
ASSERT(is_empty_shadow_page(sp->spt));
bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
sp->multimapped = 0;
+   sp->global = 1;
sp->parent_pte = parent_pte;
--vcpu->kvm->arch.n_free_mmu_pages;
return sp;
@@ -1066,10 +1068,18 @@ static struct kvm_mmu_page *kvm_mmu_look
return NULL;
 }
 
+static void kvm_unlink_unsync_global(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+   list_del(&sp->oos_link);
+   --kvm->stat.mmu_unsync_global;
+}
+
 static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
WARN_ON(!sp->unsync);
sp->unsync = 0;
+   if (sp->global)
+   kvm_unlink_unsync_global(kvm, sp);
--kvm->stat.mmu_unsync;
 }
 
@@ -1615,9 +1625,15 @@ static int kvm_unsync_page(struct kvm_vc
if (s->role.word != sp->role.word)
return 1;
}
-   kvm_mmu_mark_parents_unsync(vcpu, sp);
++vcpu->kvm->stat.mmu_unsync;
sp->unsync = 1;
+
+   if (sp->global) {
+   list_add(&sp->oos_link, &vcpu->kvm->arch.oos_global_pages);
+   ++vcpu->kvm->stat.mmu_unsync_global;
+   } else
+   kvm_mmu_mark_parents_unsync(vcpu, sp);
+
mmu_convert_notrap(sp);
return 0;
 }
@@ -1643,12 +1659,21 @@ static int mmu_need_write_protect(struct
 static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
unsigned pte_access, int user_fault,
int write_fault, int dirty, int largepage,
-   gfn_t gfn, pfn_t pfn, bool speculative,
+   int global, gfn_t gfn, pfn_t pfn, bool speculative,
bool can_unsync)
 {
u64 spte;
int ret = 0;
u64 mt_mask = shadow_mt_mask;
+   struct kvm_mmu_page *sp = page_header(__pa(shadow_pte));
+
+   if (!global && sp->global) {
+   sp->global = 0;
+   if (sp->unsync) {
+   kvm_unlink_unsync_global(vcpu->kvm, sp);
+   kvm_mmu_mark_parents_unsync(vcpu, sp);
+   }
+   }
 
/*
 * We don't set the accessed bit, since we sometimes want to see
@@ -1717,8 +1742,8 @@ set_pte:
 static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 unsigned pt_access, unsigned pte_access,
 int user_fault, int write_fault, int dirty,
-int *ptwrite, int largepage, gfn_t gfn,
-pfn_t pfn, bool speculative)
+int *ptwrite, int largepage, int global,
+gfn_t gfn, pfn_t pfn, bool speculative)
 {
   

[patch 2/4] KVM: MMU: collapse remote TLB flushes on root sync

2008-12-01 Thread Marcelo Tosatti
Collapse remote TLB flushes on root sync.

kernbench is 2.7% faster on a 4-way guest. Improvements have also been
seen with other loads, such as AIM7.
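A minimal C sketch of the idea follows. It assumes simplified stand-ins for the KVM structures: `struct shadow_page`, `write_protect()` and `flush_remote_tlbs()` are hypothetical and are not the kernel's API. The sketch write-protects the whole batch first, ORs together whether anything changed, and then issues at most one remote flush instead of one per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a shadow page: does it still have
 * writable mappings that must be write-protected before syncing? */
struct shadow_page {
	bool has_writable_spte;
};

/* Counts simulated kvm_flush_remote_tlbs() calls; each real flush has
 * to interrupt the other vcpus, which is what makes it expensive. */
static int remote_flushes;

static void flush_remote_tlbs(void)
{
	remote_flushes++;
}

/* Like the patched rmap_write_protect(): report whether anything was
 * write-protected and leave the flushing to the caller. */
static int write_protect(struct shadow_page *sp)
{
	int was_writable = sp->has_writable_spte;

	sp->has_writable_spte = false;
	return was_writable;
}

/* Before: one remote flush per page that needed protecting. */
static void sync_flush_per_page(struct shadow_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (write_protect(&pages[i]))
			flush_remote_tlbs();
}

/* After: protect the whole batch, then flush at most once. */
static void sync_flush_collapsed(struct shadow_page *pages, size_t n)
{
	int protected = 0;

	for (size_t i = 0; i < n; i++)
		protected |= write_protect(&pages[i]);
	if (protected)
		flush_remote_tlbs();
}
```

With three of four pages writable, the per-page variant flushes three times and the collapsed variant flushes once. A second collapsed pass flushes zero times because nothing is left to protect.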

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -621,7 +621,7 @@ static u64 *rmap_next(struct kvm *kvm, u
return NULL;
 }
 
-static void rmap_write_protect(struct kvm *kvm, u64 gfn)
+static int rmap_write_protect(struct kvm *kvm, u64 gfn)
 {
unsigned long *rmapp;
u64 *spte;
@@ -667,8 +667,7 @@ static void rmap_write_protect(struct kv
spte = rmap_next(kvm, rmapp, spte);
}
 
-   if (write_protected)
-   kvm_flush_remote_tlbs(kvm);
+   return write_protected;
 }
 
 static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
@@ -1083,7 +1082,8 @@ static int kvm_sync_page(struct kvm_vcpu
return 1;
}
 
-   rmap_write_protect(vcpu->kvm, sp->gfn);
+   if (rmap_write_protect(vcpu->kvm, sp->gfn))
+   kvm_flush_remote_tlbs(vcpu->kvm);
kvm_unlink_unsync_page(vcpu->kvm, sp);
if (vcpu->arch.mmu.sync_page(vcpu, sp)) {
kvm_mmu_zap_page(vcpu->kvm, sp);
@@ -1162,6 +1162,14 @@ static void mmu_sync_children(struct kvm
 
kvm_mmu_pages_init(parent, &parents, &pages);
while (mmu_unsync_walk(parent, &pages)) {
+   int protected = 0;
+
+   for_each_sp(pages, sp, parents, i)
+   protected |= rmap_write_protect(vcpu->kvm, sp->gfn);
+
+   if (protected)
+   kvm_flush_remote_tlbs(vcpu->kvm);
+
for_each_sp(pages, sp, parents, i) {
kvm_sync_page(vcpu, sp);
mmu_pages_clear_parents(&parents);
@@ -1226,7 +1234,8 @@ static struct kvm_mmu_page *kvm_mmu_get_
sp->role = role;
hlist_add_head(&sp->hash_link, bucket);
if (!metaphysical) {
-   rmap_write_protect(vcpu->kvm, gfn);
+   if (rmap_write_protect(vcpu->kvm, gfn))
+   kvm_flush_remote_tlbs(vcpu->kvm);
account_shadowed(vcpu->kvm, gfn);
}
if (shadow_trap_nonpresent_pte != shadow_notrap_nonpresent_pte)

-- 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 1/4] KVM: MMU: use page array in unsync walk

2008-12-01 Thread Marcelo Tosatti
Instead of invoking the handler directly, collect pages into an array
so the caller can work with it.

This simplifies TLB flush collapsing.
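Below is a simplified sketch of the collecting step, modelled on `struct kvm_mmu_pages` and `mmu_pages_add()` from the diff. The names `struct mmu_page`, `struct mmu_pages` and `pages_add()` are hypothetical stand-ins. The walker appends pages until the fixed-size array fills. The nonzero return value tells it to stop, so the caller can process the batch, for example by write-protecting all of the pages and flushing once.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_ARRAY_NR 16	/* same capacity as KVM_PAGE_ARRAY_NR */

struct mmu_page {
	bool unsync;	/* the page itself is out of sync with the guest */
};

struct mmu_pages {
	struct {
		struct mmu_page *sp;
		unsigned int idx;	/* slot of sp in its parent's table */
	} page[PAGE_ARRAY_NR];
	unsigned int nr;
};

static int pages_add(struct mmu_pages *pvec, struct mmu_page *sp,
		     unsigned int idx)
{
	/* The walker must stop on a "full" return, so this cannot overflow. */
	assert(pvec->nr < PAGE_ARRAY_NR);

	/* An unsync page may be found more than once in one walk;
	 * keep only the first entry. */
	if (sp->unsync)
		for (unsigned int i = 0; i < pvec->nr; i++)
			if (pvec->page[i].sp == sp)
				return 0;

	pvec->page[pvec->nr].sp = sp;
	pvec->page[pvec->nr].idx = idx;
	pvec->nr++;
	return pvec->nr == PAGE_ARRAY_NR;	/* nonzero: array is full */
}
```

A caller would loop: fill the array, process the batch, reset `nr` to 0, and walk again until nothing is left.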

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -908,8 +908,9 @@ static void kvm_mmu_update_unsync_bitmap
struct kvm_mmu_page *sp = page_header(__pa(spte));
 
index = spte - sp->spt;
-   __set_bit(index, sp->unsync_child_bitmap);
-   sp->unsync_children = 1;
+   if (!__test_and_set_bit(index, sp->unsync_child_bitmap))
+   sp->unsync_children++;
+   WARN_ON(!sp->unsync_children);
 }
 
 static void kvm_mmu_update_parents_unsync(struct kvm_mmu_page *sp)
@@ -936,7 +937,6 @@ static void kvm_mmu_update_parents_unsyn
 
 static int unsync_walk_fn(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
-   sp->unsync_children = 1;
kvm_mmu_update_parents_unsync(sp);
return 1;
 }
@@ -967,18 +967,41 @@ static void nonpaging_invlpg(struct kvm_
 {
 }
 
+#define KVM_PAGE_ARRAY_NR 16
+
+struct kvm_mmu_pages {
+   struct mmu_page_and_offset {
+   struct kvm_mmu_page *sp;
+   unsigned int idx;
+   } page[KVM_PAGE_ARRAY_NR];
+   unsigned int nr;
+};
+
 #define for_each_unsync_children(bitmap, idx)  \
for (idx = find_first_bit(bitmap, 512); \
 idx < 512; \
 idx = find_next_bit(bitmap, 512, idx+1))
 
-static int mmu_unsync_walk(struct kvm_mmu_page *sp,
-  struct kvm_unsync_walk *walker)
+int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
+  int idx)
 {
-   int i, ret;
+   int i;
 
-   if (!sp->unsync_children)
-   return 0;
+   if (sp->unsync)
+   for (i=0; i < pvec->nr; i++)
+   if (pvec->page[i].sp == sp)
+   return 0;
+
+   pvec->page[pvec->nr].sp = sp;
+   pvec->page[pvec->nr].idx = idx;
+   pvec->nr++;
+   return (pvec->nr == KVM_PAGE_ARRAY_NR);
+}
+
+static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
+  struct kvm_mmu_pages *pvec)
+{
+   int i, ret, nr_unsync_leaf = 0;
 
for_each_unsync_children(sp->unsync_child_bitmap, i) {
u64 ent = sp->spt[i];
@@ -988,17 +1011,22 @@ static int mmu_unsync_walk(struct kvm_mm
child = page_header(ent & PT64_BASE_ADDR_MASK);
 
if (child->unsync_children) {
-   ret = mmu_unsync_walk(child, walker);
-   if (ret)
+   if (mmu_pages_add(pvec, child, i))
+   return -ENOSPC;
+
+   ret = __mmu_unsync_walk(child, pvec);
+   if (!ret)
+   __clear_bit(i, sp->unsync_child_bitmap);
+   else if (ret > 0)
+   nr_unsync_leaf += ret;
+   else
return ret;
-   __clear_bit(i, sp->unsync_child_bitmap);
}
 
if (child->unsync) {
-   ret = walker->entry(child, walker);
-   __clear_bit(i, sp->unsync_child_bitmap);
-   if (ret)
-   return ret;
+   nr_unsync_leaf++;
+   if (mmu_pages_add(pvec, child, i))
+   return -ENOSPC;
}
}
}
@@ -1006,7 +1034,17 @@ static int mmu_unsync_walk(struct kvm_mm
if (find_first_bit(sp->unsync_child_bitmap, 512) == 512)
sp->unsync_children = 0;
 
-   return 0;
+   return nr_unsync_leaf;
+}
+
+static int mmu_unsync_walk(struct kvm_mmu_page *sp,
+  struct kvm_mmu_pages *pvec)
+{
+   if (!sp->unsync_children)
+   return 0;
+
+   mmu_pages_add(pvec, sp, 0);
+   return __mmu_unsync_walk(sp, pvec);
 }
 
 static struct kvm_mmu_page *kvm_mmu_lookup_page(struct kvm *kvm, gfn_t gfn)
@@ -1056,30 +1094,81 @@ static int kvm_sync_page(struct kvm_vcpu
return 0;
 }
 
-struct sync_walker {
-   struct kvm_vcpu *vcpu;
-   struct kvm_unsync_walk walker;
+struct mmu_page_path {
+   struct kvm_mmu_page *parent[PT64_ROOT_LEVEL-1];
+   unsigned int idx[PT64_ROOT_LEVEL-1];
 };
 
-static int mmu_sync_fn(struct kvm_mmu_page *sp, struct kvm_unsync_walk *walk)
+#define for_each_sp(pvec, sp, parents, i)  \
+   for (i = mmu_pages_next(&pvec, &parents, -1),   \
+   sp = 

[patch 0/4] oos shadow optimizations v2

2008-12-01 Thread Marcelo Tosatti
Addressing comments from previous version.

-- 



RE: [PATCH 1/2] [v2] VT-d: Support multiple device assignment for KVM

2008-12-01 Thread Han, Weidong
It's fine. You only need to change the APIs to generic APIs. I will update it
soon.

Regards,
Weidong

Joerg Roedel wrote:
> Ok, I got them to apply. I also did the checkpatch cleanups. To speed
> things up a bit I would suggest that I rebase my patchset on your
> patches and send it out in a single series. Any problems with this
> approach?
> 
> Joerg
> 
> On Mon, Dec 01, 2008 at 09:22:42PM +0800, Han, Weidong wrote:
>> Sorry, this patch has style problems. I will update it and also split
>> it into smaller patches for easier reviewing. 
>> 
>> Regards,
>> Weidong
>> 
>> 'Joerg Roedel' wrote:
>>> Hmm, I get these errors using git-am:
>>> 
>>> Applying VT-d: Support multiple device assignment for KVM
>>> .dotest/patch:1344: space before tab in indent.
>>> clflush_cache_range(addr, size);
>>> .dotest/patch:1350: space before tab in indent.
>>> clflush_cache_range(addr, size);
>>> .dotest/patch:1907: trailing whitespace.
>>> 
>>> .dotest/patch:1946: trailing whitespace.
>>>  * owned by this domain, clear this iommu in iommu_bmp
>>> .dotest/patch:2300: trailing whitespace.
>>> 
>>> error: patch failed: drivers/pci/dmar.c:484
>>> error: drivers/pci/dmar.c: patch does not apply
>>> error: patch failed: drivers/pci/intel-iommu.c:50
>>> error: drivers/pci/intel-iommu.c: patch does not apply
>>> error: patch failed: include/linux/dma_remapping.h:111
>>> error: include/linux/dma_remapping.h: patch does not apply
>>> error: patch failed: include/linux/intel-iommu.h:219
>>> error: include/linux/intel-iommu.h: patch does not apply
>>> Patch failed at 0001.
>>> 
>>> Joerg
>>> 



Re: [PATCH] KVM: Qemu: push_nmi should be only used by I386 Arch.

2008-12-01 Thread Hollis Blanchard
On Tue, 2008-12-02 at 00:02 +0100, Jan Kiszka wrote:
> > 
> > Guys, we already have stubs for this (although they've been turned into
> > dead code). Jan broke IA64 and PowerPC builds when he renamed
> > "kvm_arch_try_push_nmi" to "kvm_arch_push_nmi", and the obvious fix is
> > to update the stubs to match. That avoids all these ifdefs and
> > associated problems.
> 
> Ouch - I'm sorry.

Well, it happens, but I do wish that more people would use cscope or
even grep to find all users of a symbol.

I also wish that Avi would get his PPC box working so he could catch
build breaks like these. Cross-compilers would do as well.

I would also like a pony.

> > Avi, could you revert a8d12f98755be9330fcde055134511f76ecaa538 please?
> > 
> 
> Here is a patch that reverts the change and fixes the root of the issue.

Acked-by: Hollis Blanchard <[EMAIL PROTECTED]>

-- 
Hollis Blanchard
IBM Linux Technology Center



Re: [PATCH] KVM: Qemu: push_nmi should be only used by I386 Arch.

2008-12-01 Thread Jan Kiszka
Hollis Blanchard wrote:
> On Fri, 2008-11-28 at 10:26 +0100, Jan Kiszka wrote:
>> Zhang, Xiantao wrote:
>>> >From c25fa2e4de40e500bd364c3267d5be89a9cfbb4d Mon Sep 17 00:00:00 2001
>>> From: Xiantao Zhang <[EMAIL PROTECTED]>
>>> Date: Fri, 28 Nov 2008 09:38:46 +0800
>>> Subject: [PATCH] KVM: Qemu: push_nmi should be only used by I386 Arch.
>>>
>>> Use TARGET_I386 to exclude other archs.
>>> Signed-off-by: Xiantao Zhang <[EMAIL PROTECTED]>
>>> ---
>>>  libkvm/libkvm.c |4 ++--
>>>  qemu/qemu-kvm.c |4 
>>>  2 files changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
>>> index 40c95ce..851a93a 100644
>>> --- a/libkvm/libkvm.c
>>> +++ b/libkvm/libkvm.c
>>> @@ -868,7 +868,7 @@ int kvm_run(kvm_context_t kvm, int vcpu, void *env)
>>> struct kvm_run *run = kvm->run[vcpu];
>>>  
>>>  again:
>>> -#ifdef KVM_CAP_NMI
>>> +#ifdef TARGET_I386
>>> push_nmi(kvm);
>>>  #endif
>>>  #if !defined(__s390__)
>>> @@ -1032,7 +1032,7 @@ int kvm_has_sync_mmu(kvm_context_t kvm)
>>>  
>>>  int kvm_inject_nmi(kvm_context_t kvm, int vcpu)
>>>  {
>>> -#ifdef KVM_CAP_NMI
>>> +#ifdef TARGET_I386
>>> return ioctl(kvm->vcpu_fd[vcpu], KVM_NMI);
>>>  #else
>>> return -ENOSYS;
>>> diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
>>> index cf0e85d..b6c8288 100644
>>> --- a/qemu/qemu-kvm.c
>>> +++ b/qemu/qemu-kvm.c
>>> @@ -154,10 +154,12 @@ static int try_push_interrupts(void *opaque)
>>>  return kvm_arch_try_push_interrupts(opaque);
>>>  }
>>>  
>>> +#ifdef TARGET_I386
>>>  static void push_nmi(void *opaque)
>>>  {
>>>  kvm_arch_push_nmi(opaque);
>>>  }
>>> +#endif
>>>  
>>>  static void post_kvm_run(void *opaque, void *data)
>>>  {
>>> @@ -742,7 +744,9 @@ static struct kvm_callbacks qemu_kvm_ops = {
>>>  .shutdown = kvm_shutdown,
>>>  .io_window = kvm_io_window,
>>>  .try_push_interrupts = try_push_interrupts,
>>> +#ifdef TARGET_I386
>>>  .push_nmi = push_nmi,
>>> +#endif
>>>  .post_kvm_run = post_kvm_run,
>>>  .pre_kvm_run = pre_kvm_run,
>>>  #ifdef TARGET_I386
>> This will now break when KVM_CAP_NMI is undefined, ie. when there is no
>> KVM_NMI IOCTL (=> older kvm module sets).
> 
> Guys, we already have stubs for this (although they've been turned into
> dead code). Jan broke IA64 and PowerPC builds when he renamed
> "kvm_arch_try_push_nmi" to "kvm_arch_push_nmi", and the obvious fix is
> to update the stubs to match. That avoids all these ifdefs and
> associated problems.

Ouch - I'm sorry.

> 
> Avi, could you revert a8d12f98755be9330fcde055134511f76ecaa538 please?
> 

Here is a patch that reverts the change and fixes the root of the issue.

---

Subject: Fix non-x86 NMI hooks

My previous x86-only change to the NMI push hook broke PPC and IA64.
This is a proper fix plus a cleanup of the #ifdef-based approach to
solve the breakage.
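The stub approach can be sketched as follows. The callback table and both hook implementations are hypothetical simplifications, not the actual qemu-kvm code. Every architecture provides a hook with the same signature. Architectures without NMIs supply an empty one, so the generic code needs no `#ifdef TARGET_I386`.

```c
#include <assert.h>

/* Simplified callback table in the style of libkvm's struct kvm_callbacks;
 * only the NMI hook is shown. */
struct callbacks {
	void (*push_nmi)(void *opaque);
};

/* x86-style hook: consume one pending NMI (the counter is a stand-in
 * for real injection). */
static void push_nmi_x86(void *opaque)
{
	int *pending = opaque;

	if (*pending > 0)
		(*pending)--;
}

/* Stub for architectures without NMIs: same signature, does nothing,
 * so generic code can call the hook unconditionally. */
static void push_nmi_stub(void *opaque)
{
	(void)opaque;
}

/* Generic code: no #ifdef TARGET_I386 around the call. */
static void before_vcpu_run(const struct callbacks *cb, void *opaque)
{
	cb->push_nmi(opaque);
}
```

Because the stub changes signature together with the x86 hook, a rename like the one that broke IA64 and PowerPC fails to compile in every architecture's file at once. It does not silently drop out behind an `#ifdef`.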

Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
---

 qemu/qemu-kvm-ia64.c|3 +--
 qemu/qemu-kvm-powerpc.c |3 +--
 qemu/qemu-kvm.c |4 
 3 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/qemu/qemu-kvm-ia64.c b/qemu/qemu-kvm-ia64.c
index 8380f39..a6b17af 100644
--- a/qemu/qemu-kvm-ia64.c
+++ b/qemu/qemu-kvm-ia64.c
@@ -57,9 +57,8 @@ int kvm_arch_try_push_interrupts(void *opaque)
 return 1;
 }
 
-int kvm_arch_try_push_nmi(void *opaque)
+void kvm_arch_push_nmi(void *opaque)
 {
-return 1;
 }
 
 void kvm_arch_update_regs_for_sipi(CPUState *env)
diff --git a/qemu/qemu-kvm-powerpc.c b/qemu/qemu-kvm-powerpc.c
index 19fde40..fa534ed 100644
--- a/qemu/qemu-kvm-powerpc.c
+++ b/qemu/qemu-kvm-powerpc.c
@@ -188,12 +188,11 @@ int kvm_arch_try_push_interrupts(void *opaque)
 return 0;
 }
 
-int kvm_arch_try_push_nmi(void *opaque)
+void kvm_arch_push_nmi(void *opaque)
 {
/* no nmi irq, so discard that call for now and return success.
 * This might later get mapped to something on powerpc too if we want
 *  to support the nmi monitor command somwhow */
-   return 0;
 }
 
 void kvm_arch_update_regs_for_sipi(CPUState *env)
diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
index b6c8288..cf0e85d 100644
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -154,12 +154,10 @@ static int try_push_interrupts(void *opaque)
 return kvm_arch_try_push_interrupts(opaque);
 }
 
-#ifdef TARGET_I386
 static void push_nmi(void *opaque)
 {
 kvm_arch_push_nmi(opaque);
 }
-#endif
 
 static void post_kvm_run(void *opaque, void *data)
 {
@@ -744,9 +742,7 @@ static struct kvm_callbacks qemu_kvm_ops = {
 .shutdown = kvm_shutdown,
 .io_window = kvm_io_window,
 .try_push_interrupts = try_push_interrupts,
-#ifdef TARGET_I386
 .push_nmi = push_nmi,
-#endif
 .post_kvm_run = post_kvm_run,
 .pre_kvm_run = pre_kvm_run,
 #ifdef TARGET_I386





Re: 1-1 mapping of devices without VT-d

2008-12-01 Thread Dor Laor

Michael Tokarev wrote:
> Dor Laor wrote:
> []
>> Although it had worked for us out of tree, there is no immediate need to
>> pursue it.
>> If anyone would like to nurture these patches, he is more than welcome.
>> ps: you also have the pv-dma option for Linux guests (same status, though).
>> As time goes by, most hosts will have either vt-d or amd iommu.
>
> Hmm.  Well, as time goes by, most hosts will be 64 bit or more.
> But it does not mean that there's no need to maintain 32bits
> arch anymore...  i hope anyway :)

But of course.

> Are you saying that PCI passthrough without hardware support will
> not be available in (standard) kvm, even if patches exist for that?

No, it just might take some time to get into mainline. The patches need
further polishing, and we also need wider demand for them.
Actually, pvdma can help vt-d so that we won't have to make all of the
guest memory unswappable.

> /mjt




Re: STOP error with virtio on KVM-79/2.6.18/Win2k3 x64 guest

2008-12-01 Thread Dor Laor

Adrian Schmitz wrote:
> Sorry for the repost.. I forgot the subject line!
> Hi, I'm having problems with STOP errors (0x00d1) under
> KVM-79/2.6.18 whenever I try to use the virtio drivers. This post
> (http://marc.info/?l=kvm&m=121089259211638&w=2) describes the issue
> exactly, except that I'm using a Win2k3 x64 guest with the x64
> paravirtual drivers instead of 32-bit guest/drivers. I am able to
> reproduce the problem reliably using iperf, the same as in the above
> post. When I disable virtio, the guest is very stable. Any suggestions
> are greatly appreciated.

What driver version are you using? Version 2 is obsolete.
I posted version 3 a few months ago; Avi, can you please upload it to
SourceForge? My old public space was blocked, so I'll send you a private
attachment to test.

Dor.

> -Adrian




Re: 1-1 mapping of devices without VT-d

2008-12-01 Thread Michael Tokarev
Dor Laor wrote:
[]
> Although it had worked for us out of tree, there is no immediate need to
> pursue it.
> If anyone would like to nurture these patches, he is more than welcome.
> ps: you also have the pv-dma option for Linux guests (same status, though).
> As time goes by, most hosts will have either vt-d or amd iommu.

Hmm.  Well, as time goes by, most hosts will be 64 bit or more.
But it does not mean that there's no need to maintain 32bits
arch anymore...  i hope anyway :)

Are you saying that PCI passthrough without hardware support will
not be available in (standard) kvm, even if patches exist for that?

/mjt


Re: [PATCH 1/2] PCI: allow pci driver to support only dynids

2008-12-01 Thread Jesse Barnes
On Tuesday, November 25, 2008 7:36 pm Chris Wright wrote:
> commit b41d6cf38e27 (PCI: Check dynids driver_data value for validity)
> requires all drivers to include an id table to try and match
> driver_data.  Before validating driver_data check driver has an id
> table.
>
> Cc: Jean Delvare <[EMAIL PROTECTED]>
> Cc: Milton Miller <[EMAIL PROTECTED]>
> Signed-off-by: Chris Wright <[EMAIL PROTECTED]>

Applied these to my linux-next branch, thanks Chris.
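The check described in Chris's patch can be illustrated with a hypothetical sketch. The struct and function names are assumptions, not the PCI core's code. The driver_data supplied for a new dynamic id is validated against the driver's id table only when the driver has one, so a driver that relies solely on dynids is not rejected.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified id-table entry; an all-zero vendor/device entry ends the
 * table, loosely following the PCI id table convention. */
struct id_entry {
	unsigned int vendor;
	unsigned int device;
	unsigned long driver_data;
};

/* Return 0 if driver_data may be used for a new dynamic id, -EINVAL if
 * it matches no entry of the driver's static table. A driver with no
 * table at all (dynids only) has nothing to check against. */
static int check_new_id_driver_data(const struct id_entry *ids,
				    unsigned long driver_data)
{
	if (ids == NULL)
		return 0;

	for (; ids->vendor || ids->device; ids++)
		if (ids->driver_data == driver_data)
			return 0;
	return -EINVAL;
}
```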

-- 
Jesse Barnes, Intel Open Source Technology Center



Re: [Qemu-devel] [PATCH 2/2] Virtio block device support

2008-12-01 Thread Anthony Liguori

Hollis Blanchard wrote:
> On Tue, 2008-11-25 at 15:57 -0600, Anthony Liguori wrote:
>> diff --git a/hw/pc.h b/hw/pc.h
>> index f156b9e..bbfa2d6 100644
>> --- a/hw/pc.h
>> +++ b/hw/pc.h
>> @@ -152,4 +152,8 @@ void pci_piix4_ide_init(PCIBus *bus, BlockDriverState **hd_table, int devfn,
>>
>>  void isa_ne2000_init(int base, qemu_irq irq, NICInfo *nd);
>>
>> +/* virtio-blk.c */
>> +void *virtio_blk_init(PCIBus *bus, uint16_t vendor, uint16_t device,
>> +  BlockDriverState *bs);
>> +
>>  #endif
>
> This shouldn't be in pc.h.

I don't disagree.

> I don't know if you'd consider virtio.h to be
> a layering violation, but the virtio layers are already being compressed
> in these patches...

Yeah, I think the virtio stuff could use some love, but I'd like to avoid
that until we have something in tree and merged against kvm-userspace.

Regards,

Anthony Liguori




Re: [PATCH] extboot: properly set int 0x13 return value

2008-12-01 Thread Anthony Liguori

Glauber Costa wrote:
> Callers of int 0x13 usually rely on the carry flag being
> clear/set to indicate the status of the interrupt execution.
>
> However, our current code clears or sets the flags register directly,
> which is useless: whatever value it holds will be overwritten by the
> flags value saved _before_ the interrupt, because iret restores it
> from the stack.
>
> This fixes a bug that prevents Slackware (and possibly Win2k, untested)
> from booting.

Good catch!

> Signed-off-by: Glauber Costa <[EMAIL PROTECTED]>

Acked-by: Anthony Liguori <[EMAIL PROTECTED]>

Regards,

Anthony Liguori


Re: [Qemu-devel] [PATCH 2/2] Virtio block device support

2008-12-01 Thread Hollis Blanchard
On Tue, 2008-11-25 at 15:57 -0600, Anthony Liguori wrote:
> diff --git a/hw/pc.h b/hw/pc.h
> index f156b9e..bbfa2d6 100644
> --- a/hw/pc.h
> +++ b/hw/pc.h
> @@ -152,4 +152,8 @@ void pci_piix4_ide_init(PCIBus *bus, BlockDriverState **hd_table, int devfn,
> 
>  void isa_ne2000_init(int base, qemu_irq irq, NICInfo *nd);
> 
> +/* virtio-blk.c */
> +void *virtio_blk_init(PCIBus *bus, uint16_t vendor, uint16_t device,
> +  BlockDriverState *bs);
> +
>  #endif

This shouldn't be in pc.h. I don't know if you'd consider virtio.h to be
a layering violation, but the virtio layers are already being compressed
in these patches...

-- 
Hollis Blanchard
IBM Linux Technology Center



Re: 1-1 mapping of devices without VT-d

2008-12-01 Thread Dor Laor

Passera, Pablo R wrote:
> Hi everyone,
> I want to assign a PCI device directly to a VM (PCI passthrough) on a
> machine that does not have VT-d. I found something related to this, called
> 1-1 mapping, in a presentation given at the 2008 KVM Forum, and a patch for
> it at http://thread.gmane.org/gmane.comp.emulators.kvm.devel/18722/focus=18753.
> I am wondering whether this is included, or whether there are plans to
> include it in the latest KVM version.

Although it had worked for us out of tree, there is no immediate need to
pursue it.
If anyone would like to nurture these patches, he is more than welcome.
ps: you also have the pv-dma option for Linux guests (same status, though).
As time goes by, most hosts will have either vt-d or amd iommu.

Regards,
Dor

> Thanks in advance,
>
> Pablo Pássera





[PATCH] extboot: properly set int 0x13 return value

2008-12-01 Thread Glauber Costa
Callers of int 0x13 usually rely on the carry flag being
clear/set to indicate the status of the interrupt execution.

However, our current code clears or sets the flags register directly,
which is useless: whatever value it holds will be overwritten by the
flags value saved _before_ the interrupt, because iret restores it
from the stack.

This fixes a bug that prevents Slackware (and possibly Win2k, untested)
from booting.
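The following small C model shows why the carry flag has to be written into the stack frame rather than into the live flags register. It assumes three things. First, the helper is reached by a near `call` from the int 0x13 handler, with nothing else pushed in between. Second, the helper runs `push %bp; mov %sp, %bp` before touching `8(%bp)`. Third, the stack layout and function names here are purely illustrative. Under those assumptions, the saved FLAGS word sits 8 bytes above %bp: 2 bytes of saved %bp, 2 bytes of near return address, then the interrupt frame's IP and CS (4 bytes).

```c
#include <assert.h>
#include <stdint.h>

#define FLAGS_CF	0x01
#define STACK_WORDS	64

/* Toy 16-bit stack: sp is a byte offset that grows downward, as on x86. */
struct stack16 {
	uint16_t mem[STACK_WORDS];
	unsigned int sp;
};

static void push16(struct stack16 *s, uint16_t v)
{
	s->sp -= 2;
	s->mem[s->sp / 2] = v;
}

static uint16_t pop16(struct stack16 *s)
{
	uint16_t v = s->mem[s->sp / 2];

	s->sp += 2;
	return v;
}

/* Model of: caller executes int 0x13 -> handler calls a helper -> helper
 * sets or clears CF in the saved frame -> ret -> iret.
 * Returns the FLAGS value the caller sees after iret. */
static uint16_t int13_round_trip(uint16_t caller_flags, int report_error)
{
	struct stack16 s = { .sp = STACK_WORDS * 2 };
	unsigned int bp;

	push16(&s, caller_flags);	/* int: FLAGS */
	push16(&s, 0xF000);		/* int: CS (arbitrary) */
	push16(&s, 0x1234);		/* int: IP (arbitrary) */
	push16(&s, 0x0100);		/* call: near return IP (arbitrary) */

	push16(&s, 0xBEEF);		/* push %bp */
	bp = s.sp;			/* mov %sp, %bp */
	if (report_error)
		s.mem[(bp + 8) / 2] |= FLAGS_CF;		/* stc_stack */
	else
		s.mem[(bp + 8) / 2] &= (uint16_t)~FLAGS_CF;	/* clc_stack */
	(void)pop16(&s);		/* pop %bp */

	(void)pop16(&s);		/* ret */
	(void)pop16(&s);		/* iret pops IP */
	(void)pop16(&s);		/* ... then CS */
	return pop16(&s);		/* ... then FLAGS, with the CF written above */
}
```

Whatever the helper does to the live flags, iret reloads the word at that slot. This is why a plain clc/stc has no visible effect for the caller.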

Signed-off-by: Glauber Costa <[EMAIL PROTECTED]>
---
 extboot/extboot.S |   52 ++--
 1 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/extboot/extboot.S b/extboot/extboot.S
index 2630abb..e3d1adf 100644
--- a/extboot/extboot.S
+++ b/extboot/extboot.S
@@ -99,24 +99,24 @@ int19_handler:
 
 #define FLAGS_CF   0x01
 
-.macro clc
-   push %ax
-   pushf
-   pop %ax
-   and $(~FLAGS_CF), %ax
-   push %ax
-   popf
-   pop %ax
+/* The two macro below clear/set the carry flag to indicate the status
+ * of the interrupt execution. It is not enough to issue a clc/stc instruction,
+ * since the value of the flags register will be overwritten by whatever is
+ * in the stack frame
+ */
+.macro clc_stack
+   push %bp
+   mov %sp, %bp
+   /* 8 = 2 (bp, just pushed) + 2 (ip) + 3 (real mode interrupt frame) */
+   and $(~FLAGS_CF), 8(%bp)
+   pop %bp
 .endm
 
-.macro stc
-   push %ax
-   pushf
-   pop %ax
-   or $(FLAGS_CF), %ax
-   push %ax
-   popf
-   pop %ax
+.macro stc_stack
+   push %bp
+   /* 8 = 2 (bp, just pushed) + 2 (ip) + 3 (real mode interrupt frame) */
+   or $(FLAGS_CF), 8(%bp)
+   pop %bp
 .endm
 
 /* we clobber %bx */
@@ -292,7 +292,7 @@ mul32:  /* lo,  hi, lo, hi */
 
 disk_reset:
movb $0, %ah
-   clc
+   clc_stack
ret
 
 /* this really should be a function, not a macro but i'm lazy */
@@ -395,7 +395,7 @@ disk_reset:
pop %ax
 
mov $0, %ah
-   clc
+   clc_stack
ret
 .endm
 
@@ -454,12 +454,12 @@ read_disk_drive_parameters:
pop %bx
 
/* do this last since it's the most sensitive */
-   clc
+   clc_stack
ret
 
 alternate_disk_reset:
movb $0, %ah
-   clc
+   clc_stack
ret
 
 read_disk_drive_size:
@@ -498,21 +498,21 @@ read_disk_drive_size:
freea
pop %bx
 
-   clc
+   clc_stack
ret
 
 check_if_extensions_present:
mov $0x30, %ah
mov $0xAA55, %bx
mov $0x07, %cx
-   clc
+   clc_stack
ret
 
 .macro extended_read_write_sectors cmd
cmpb $10, 0(%si)
jg 1f
mov $1, %ah
-   stc
+   stc_stack
ret
 1:
push %ax
@@ -544,7 +544,7 @@ check_if_extensions_present:
pop %ax
 
mov $0, %ah
-   clc
+   clc_stack
ret
 .endm
 
@@ -612,12 +612,12 @@ get_extended_drive_parameters:
pop %ax
 
mov $0, %ah
-   clc
+   clc_stack
ret
 
 terminate_disk_emulation:
mov $1, %ah
-   stc
+   stc_stack
ret
 
 int13_handler:
-- 
1.5.6.5



[PATCH] extboot: properly set int 0x13 return value

2008-12-01 Thread Glauber Costa
Callers of int 0x13 usually rely on the carry flag being
clear/set to indicate the status of the interrupt execution.

However, our current code clears or sets the flags register directly,
which is useless: whatever value it holds will be overwritten by the
flags value saved _before_ the interrupt, because iret restores it
from the stack.

This fixes a bug that prevents Slackware (and possibly Win2k, untested)
from booting.

Signed-off-by: Glauber Costa <[EMAIL PROTECTED]>
---
 extboot/extboot.S |   52 ++--
 1 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/extboot/extboot.S b/extboot/extboot.S
index 2630abb..4cbfe11 100644
--- a/extboot/extboot.S
+++ b/extboot/extboot.S
@@ -99,24 +99,24 @@ int19_handler:
 
 #define FLAGS_CF   0x01
 
-.macro clc
-   push %ax
-   pushf
-   pop %ax
-   and $(~FLAGS_CF), %ax
-   push %ax
-   popf
-   pop %ax
+/* The two macro below clear/set the carry flag to indicate the status
+ * of the interrupt execution. It is not enough to issue a clc/stc instruction, 
+ * since the value of the flags register will be overwritten by whatever is
+ * in the stack frame
+ */
+.macro clc_stack
+   push %bp
+   mov %sp, %bp
+   /* 8 = 2 (bp, just pushed) + 2 (ip) + 3 (real mode interrupt frame)
+   and $(~FLAGS_CF), 8(%bp)
+   pop %bp
 .endm
 
-.macro stc
-   push %ax
-   pushf
-   pop %ax
-   or $(FLAGS_CF), %ax
-   push %ax
-   popf
-   pop %ax
+.macro stc_stack
+   push %bp
+   /* 8 = 2 (bp, just pushed) + 2 (ip) + 3 (real mode interrupt frame)
+   or $(FLAGS_CF), 8(%bp)
+   pop %bp
 .endm
 
 /* we clobber %bx */
@@ -292,7 +292,7 @@ mul32:  /* lo,  hi, lo, hi */
 
 disk_reset:
movb $0, %ah
-   clc
+   clc_stack
ret
 
 /* this really should be a function, not a macro but i'm lazy */
@@ -395,7 +395,7 @@ disk_reset:
pop %ax
 
mov $0, %ah
-   clc
+   clc_stack
ret
 .endm
 
@@ -454,12 +454,12 @@ read_disk_drive_parameters:
pop %bx
 
/* do this last since it's the most sensitive */
-   clc
+   clc_stack
ret
 
 alternate_disk_reset:
movb $0, %ah
-   clc
+   clc_stack
ret
 
 read_disk_drive_size:
@@ -498,21 +498,21 @@ read_disk_drive_size:
freea
pop %bx
 
-   clc
+   clc_stack
ret
 
 check_if_extensions_present:
mov $0x30, %ah
mov $0xAA55, %bx
mov $0x07, %cx
-   clc
+   clc_stack
ret
 
 .macro extended_read_write_sectors cmd
cmpb $10, 0(%si)
jg 1f
mov $1, %ah
-   stc
+   stc_stack
ret
 1:
push %ax
@@ -544,7 +544,7 @@ check_if_extensions_present:
pop %ax
 
mov $0, %ah
-   clc
+   clc_stack
ret
 .endm
 
@@ -612,12 +612,12 @@ get_extended_drive_parameters:
pop %ax
 
mov $0, %ah
-   clc
+   clc_stack
ret
 
 terminate_disk_emulation:
mov $1, %ah
-   stc
+   stc_stack
ret
 
 int13_handler:
-- 
1.5.6.5



splice() based interguest networking

2008-12-01 Thread Anthony Liguori
Here's a random thought I had after seeing the new Xen netchannel2 tree 
had fast-path support for guest<=>guest communication.


With virtio, we could do really fast interguest networking in 
userspace.  We have a few requirements though:


1) There should be a minimal number of copies, just one in almost all cases.
2) The copy should occur on the receiving end since the receiver is most 
likely going to be accessing the data in the future
3) The copy should be done in the kernel so that in the future it could 
be accelerated with a generic DMA engine.


So far, all the approaches required mmap()'ing the guest memory in both 
QEMU instances, which makes them much less useful.  I think splice solves 
this problem, though, and gets us most of the above for free.


If we have two shared pipes() between the two QEMU processes, then:

1) On TX, we vmsplice() from the sg buffer to one pipe.  This will end 
up being vmsplice_to_pipe() in the kernel which is zero-copy.


2) The pipe becomes readable, which results in an RX notification in 
the other process; we then check whether any buffers are available in the 
receive queue.  If so, we vmsplice() from the pipe to the sg buffer.  
This will result in a copy via vmsplice_to_user().  In the future, 
vmsplice_to_user() would be an obvious candidate for IO-AT acceleration.


Since the copy is happening in the kernel, assuming you're not in a 
highmem situation, no page table manipulation is required.
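A minimal Linux sketch of the two vmsplice() calls follows. It assumes the two ends of the pipe are already shared between the sending and receiving QEMU processes; the demo below uses a single process for brevity. `tx_vmsplice()` and `rx_vmsplice()` are hypothetical helper names. One design caveat applies: the TX side maps the sender's pages into the pipe without copying, so the sender must not reuse those buffers until the receiver has consumed the data.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* TX: map the sender's scatter-gather buffers into the pipe. On the
 * write end, vmsplice() takes references to the user pages rather than
 * copying them (vmsplice_to_pipe() in the kernel). */
static ssize_t tx_vmsplice(int pipe_wr, const struct iovec *sg, size_t nr)
{
	return vmsplice(pipe_wr, sg, nr, 0);
}

/* RX: on the read end, vmsplice() copies the pipe contents into the
 * receiver's buffers (vmsplice_to_user() in the kernel). This is the
 * single copy, performed in the kernel on the receiving side.
 * Like read(), it may return less than requested; a real receiver
 * would loop or re-arm its notification. */
static ssize_t rx_vmsplice(int pipe_rd, struct iovec *sg, size_t nr)
{
	return vmsplice(pipe_rd, sg, nr, 0);
}
```

For example, after `pipe(p)`, calling `tx_vmsplice(p[1], ...)` with two segments holding "hello, " and "guest", and then `rx_vmsplice(p[0], ...)` with a 32-byte buffer, delivers the 12 bytes "hello, guest" into the receive buffer.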


We still have to address feature negotiation and such.

Regards,

Anthony Liguori


[ kvm-Bugs-2351676 ] Guests hang periodically on Ubuntu-8.10

2008-12-01 Thread SourceForge.net
Bugs item #2351676, was opened at 2008-11-26 12:59
Message generated for change (Comment added) made by c_jones
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2351676&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Chris Jones (c_jones)
Assigned to: Nobody/Anonymous (nobody)
Summary: Guests hang periodically on Ubuntu-8.10

Initial Comment:
I'm seeing periodic hangs on my guests.  I've been unable so far to find a 
trigger - they always boot fine, but after anywhere from 10 minutes to 24 hours 
they eventually hang completely.

My setup:
  * AMD Athlon X2 4850e (2500 MHz dual core)
  * 4Gig memory
  * Ubuntu 8.10 server, 64-bit
  * KVMs tried:
      - kvm-72 (shipped with Ubuntu)
      - kvm-79 (built myself, --patched-kernel option)
  * Kernels tried:
      - 2.6.27.7 (kernel.org, self-built)
      - 2.6.27-7-server from the Ubuntu 8.10 distribution

In guests:
  * Ubuntu 8.10 server, 64-bit (virtual machine install)
  * kernel 2.6.27-7-server from Ubuntu 8.10

I'm running the guests like:
  sudo /usr/local/bin/qemu-system-x86_64\
 -daemonize \
 -no-kvm-irqchip\
 -hda Imgs/ndev_root.img\
 -m 1024\
 -cdrom ISOs/ubuntu-8.10-server-amd64.iso   \
 -vnc :4\
 -net nic,macaddr=DE:AD:BE:EF:04:04,model=e1000 \
 -net tap,ifname=tap4,script=/home/chris/kvm/qemu-ifup.sh 

The problem does not happen if I use -no-kvm.

I've tried some other options that have no effect:
  -no-kvm-pit
  -no-acpi

The disk images are raw format.

When the guests hang, I cannot ping them, and the vnc console is hung.  The
qemu monitor is still accessible, and the guests recover if I issue a
system_reset command from the monitor.  However, the console often does not
accept keyboard input after doing so.

When the guest is hung, kvm_stat shows all 0s for the counters:

efer_relo  exits      fpu_reloa  halt_exit  halt_wake  host_stat  hypercall
0          0          0          0          0          0          0
insn_emul  insn_emul  invlpg     io_exits   irq_exits  irq_windo  largepage
0          0          0          0          0          0          0
mmio_exit  mmu_cache  mmu_flood  mmu_pde_z  mmu_pte_u  mmu_pte_w  mmu_recyc
0          0          0          0          0          0          0
mmu_shado  nmi_windo  pf_fixed   pf_guest   remote_tl  request_i  signal_ex
0          0          0          0          0          0          0
tlb_flush
0

gdb shows two threads - both waiting:

(gdb) info threads
  2 Thread 0x414f1950 (LWP 422)  0x7f36f07a03e1 in sigtimedwait ()
   from /lib/libc.so.6
  1 Thread 0x7f36f1f306e0 (LWP 414)  0x7f36f084b482 in select ()
   from /lib/libc.so.6
(gdb) thread 1
[Switching to thread 1 (Thread 0x7f36f1f306e0 (LWP 414))]#0  0x7f36f084b482 in select () from /lib/libc.so.6
(gdb) bt
#0  0x7f36f084b482 in select () from /lib/libc.so.6
#1  0x004094cb in main_loop_wait (timeout=0)
at /home/chris/pkgs/kvm/kvm-79/qemu/vl.c:4719
#2  0x0050a7ea in kvm_main_loop ()
at /home/chris/pkgs/kvm/kvm-79/qemu/qemu-kvm.c:619
#3  0x0040fafc in main (argc=,
argv=0x79f41948) at /home/chris/pkgs/kvm/kvm-79/qemu/vl.c:4871
(gdb) thread 2
[Switching to thread 2 (Thread 0x414f1950 (LWP 422))]#0  0x7f36f07a03e1 in sigtimedwait () from /lib/libc.so.6
(gdb) bt
#0  0x7f36f07a03e1 in sigtimedwait () from /lib/libc.so.6
#1  0x0050a560 in kvm_main_loop_wait (env=0xc319e0, timeout=0)
at /home/chris/pkgs/kvm/kvm-79/qemu/qemu-kvm.c:284
#2  0x0050aaf7 in ap_main_loop (_env=)
at /home/chris/pkgs/kvm/kvm-79/qemu/qemu-kvm.c:425
#3  0x7f36f11ba3ea in start_thread () from /lib/libpthread.so.0
#4  0x7f36f0852c6d in clone () from /lib/libc.so.6
#5  0x in ?? ()


Any clues to help me resolve this would be much appreciated.


--

>Comment By: Chris Jones (c_jones)
Date: 2008-12-01 14:09

Message:
Alexey,

Thanks for the response.  As you advised, I tried a Fedora 8 guest, and it
does seem to be much more stable.  However, I really need a Debian base
system for my application.  Not necessarily Ubuntu 8.10, but I haven't had
much luck with others either.  Do you have any recommendations on one that
is particularly stable?

Over the weekend I tried:
  Fedora 8   : Seems very stable, but I really need a debian base.
  Ubuntu 8.04LTS : Same periodic hangs I was seeing on 8.10
  Debian 4.0 Etch: Seems stable in the guest, but on the host the qemu
                   process runs at 100% CPU while the guest is idle.

STOP error with virtio on KVM-79/2.6.18/Win2k3 x64 guest

2008-12-01 Thread Adrian Schmitz
Sorry for the repost; I forgot the subject line!
Hi, I'm having problems with STOP errors (0x00d1) under
KVM-79/2.6.18 whenever I try to use the virtio drivers. This post
(http://marc.info/?l=kvm&m=121089259211638&w=2) describes the issue
exactly, except that I'm using a Win2k3 x64 guest with the x64
paravirtual drivers instead of 32-bit guest/drivers. I am able to
reproduce the problem reliably using iperf, the same as in the above
post. When I disable virtio, the guest is very stable. Any suggestions
are greatly appreciated.

-Adrian


[PATCH] [v2] Remove TARGET_PAGE_SIZE from virtio interface

2008-12-01 Thread Hollis Blanchard
TARGET_PAGE_SIZE should only be used internally within qemu, not in guest/host
interfaces. The virtio frontend code in Linux uses two constants (PFN shift
and vring alignment) for the interface, so update qemu to match.

I've tested this with PowerPC KVM and confirmed that it fixes virtio problems
when using non-TARGET_PAGE_SIZE pages in the guest.

Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]>
---
Corrects a silly bug in v1.

Paul Brook doesn't like the idea of a generic align() macro, so vring_align()
is correct.
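To illustrate that the ring layout now depends only on the two ABI constants and not on TARGET_PAGE_SIZE, here is a standalone sketch that recomputes the three ring addresses for a queue. The VRingDesc/VRingAvail layouts follow the legacy virtio ring; VRingLayout and vring_layout() are hypothetical names, and widening the mask to 64 bits before inverting it is a choice made in this sketch, not something the patch does.

```c
#include <stddef.h>
#include <stdint.h>

/* The two interface constants introduced by the patch. */
#define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12
#define VIRTIO_PCI_VRING_ALIGN      4096

/* Legacy virtio ring descriptor: 16 bytes, no padding. */
typedef struct {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
} VRingDesc;

/* Available ring: two 16-bit header fields followed by num entries. */
typedef struct {
    uint16_t flags;
    uint16_t idx;
    uint16_t ring[];
} VRingAvail;

/* Round addr up to a power-of-two alignment; the mask is built in 64 bits. */
static inline uint64_t vring_align(uint64_t addr, unsigned long align)
{
    return (addr + align - 1) & ~((uint64_t)align - 1);
}

typedef struct {
    uint64_t desc, avail, used;
} VRingLayout;

/* Guest-physical addresses of each ring part, given the QUEUE_PFN value. */
static VRingLayout vring_layout(uint32_t pfn, unsigned int num)
{
    VRingLayout l;

    l.desc  = (uint64_t)pfn << VIRTIO_PCI_QUEUE_ADDR_SHIFT;
    l.avail = l.desc + num * sizeof(VRingDesc);
    l.used  = vring_align(l.avail + offsetof(VRingAvail, ring)
                          + num * sizeof(uint16_t),
                          VIRTIO_PCI_VRING_ALIGN);
    return l;
}
```

For a 256-entry queue at PFN 0x10, the descriptors start at 0x10000, the available ring at 0x11000, and the used ring is rounded up from 0x11204 to 0x12000, regardless of the guest's page size.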
---
 hw/virtio.c |   16 +++++++++++++---
 hw/virtio.h |    6 ++++++
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index e4224ab..0134b0b 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -51,6 +51,14 @@
 /* Virtio ABI version, if we increment this, we break the guest driver. */
 #define VIRTIO_PCI_ABI_VERSION   0
 
+/* How many bits to shift physical queue address written to QUEUE_PFN.
+ * 12 is historical, and due to x86 page size. */
+#define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12
+
+/* The alignment to use between consumer and producer parts of vring.
+ * x86 pagesize again. */
+#define VIRTIO_PCI_VRING_ALIGN 4096
+
 /* QEMU doesn't strictly need write barriers since everything runs in
  * lock-step.  We'll leave the calls to wmb() in though to make it obvious for
  * KVM or if kqemu gets SMP support.
@@ -110,7 +118,9 @@ static void virtqueue_init(VirtQueue *vq, target_phys_addr_t pa)
 {
 vq->vring.desc = pa;
 vq->vring.avail = pa + vq->vring.num * sizeof(VRingDesc);
-vq->vring.used = TARGET_PAGE_ALIGN(vq->vring.avail + offsetof(VRingAvail, ring[vq->vring.num]));
+vq->vring.used = vring_align(vq->vring.avail +
+ offsetof(VRingAvail, ring[vq->vring.num]),
+ VIRTIO_PCI_VRING_ALIGN);
 }
 
 static inline uint64_t vring_desc_addr(VirtQueue *vq, int i)
@@ -386,7 +396,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 vdev->features = val;
 break;
 case VIRTIO_PCI_QUEUE_PFN:
-pa = (ram_addr_t)val << TARGET_PAGE_BITS;
+pa = (ram_addr_t)val << VIRTIO_PCI_QUEUE_ADDR_SHIFT;
 vdev->vq[vdev->queue_sel].pfn = val;
 if (pa == 0)
 virtio_reset(vdev);
@@ -660,7 +670,7 @@ void virtio_load(VirtIODevice *vdev, QEMUFile *f)
 if (vdev->vq[i].pfn) {
 target_phys_addr_t pa;
 
-pa = (ram_addr_t)vdev->vq[i].pfn << TARGET_PAGE_BITS;
+pa = (ram_addr_t)vdev->vq[i].pfn << VIRTIO_PCI_QUEUE_ADDR_SHIFT;
 virtqueue_init(&vdev->vq[i], pa);
 }
 }
diff --git a/hw/virtio.h b/hw/virtio.h
index 1df8f83..ae92ece 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -47,6 +47,12 @@
 /* This means don't interrupt guest when buffer consumed. */
 #define VRING_AVAIL_F_NO_INTERRUPT 1
 
+static inline target_phys_addr_t vring_align(target_phys_addr_t addr,
+ unsigned long align)
+{
+return (addr + align - 1) & ~(align - 1);
+}
+
 typedef struct VirtQueue VirtQueue;
 typedef struct VirtIODevice VirtIODevice;
 
-- 
1.5.6.5



Re: [PATCH 3/9] add frontend implementation for the IOMMU API

2008-12-01 Thread 'Joerg Roedel'
On Fri, Nov 28, 2008 at 10:50:36AM +0800, Han, Weidong wrote:
> Joerg Roedel wrote:
> > +struct iommu_domain *iommu_domain_alloc(struct device *dev)
> > +{
> > +   struct iommu_domain *domain;
> > +   int ret;
> > +
> > +   domain = kmalloc(sizeof(*domain), GFP_KERNEL);
> > +   if (!domain)
> > +   return NULL;
> > +
> > +   ret = iommu_ops->domain_init(domain, dev);
> > +   if (ret)
> > +   goto out_free;
> > +
> > +   return domain;
> > +
> > +out_free:
> > +   kfree(domain);
> > +
> > +   return NULL;
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_domain_alloc);
> 
> remove the parameter dev.

[x] Done.

> > +
> > +void iommu_domain_free(struct iommu_domain *domain)
> > +{
> > +   iommu_ops->domain_destroy(domain);
> > +   kfree(domain);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_domain_free);
> > +
> > +int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
> > +{
> > +   return iommu_ops->attach_dev(domain, dev);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_attach_device);
> > +
> > +void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
> > +{
> > +   iommu_ops->detach_dev(domain, dev);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_detach_device);
> > +
> > +int iommu_map_address(struct iommu_domain *domain,
> > + dma_addr_t iova, phys_addr_t paddr,
> > + size_t size, int prot)
> > +{
> > +   return iommu_ops->map(domain, iova, paddr, size, prot);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_map_address);
> 
> change to:
> int iommu_map_pages(struct iommu_domain *domain, unsigned long gfn,
>   unsigned long pfn, unsigned long npages, int prot)
> {
>   return iommu_ops->map(domain, gfn, pfn, npages, prot);
> }
> EXPORT_SYMBOL_GPL(iommu_map_pages);
> 
> int iommu_unmap_pages(struct iommu_domain *domain, unsigned long gfn, 
> unsigned long npages)
> {
>   return iommu_ops->unmap(domain, gfn, npages);
> }
> EXPORT_SYMBOL_GPL(iommu_unmap_pages);

Ok, I added the unmap function. But I think this API should work with
addresses instead of page numbers. This way the IO page size is
transparent for the user.
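A rough sketch of the address-based interface argued for here: the caller passes byte addresses and a size, and only the front end knows the IO page size it has to split the range into. IO_PAGE_SHIFT, driver_map_page() and map_address_range() are hypothetical names, not the functions from this patch series, and the counter merely stands in for writing page-table entries.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IO page size of one particular IOMMU; callers never see it. */
#define IO_PAGE_SHIFT 12
#define IO_PAGE_SIZE  ((uint64_t)1 << IO_PAGE_SHIFT)
#define IO_PAGE_MASK  (IO_PAGE_SIZE - 1)

/* Hypothetical page-granular driver hook (stands in for a per-IOMMU map op). */
static unsigned long pages_mapped;

static int driver_map_page(uint64_t io_pfn, uint64_t phys_pfn, int prot)
{
    (void)io_pfn; (void)phys_pfn; (void)prot;
    pages_mapped++;          /* a real driver would write a page-table entry */
    return 0;
}

/*
 * Address-based front end: maps every IO page touched by
 * [iova, iova + size) to the corresponding physical page.
 */
static int map_address_range(uint64_t iova, uint64_t paddr, size_t size, int prot)
{
    uint64_t off = iova & IO_PAGE_MASK;

    if (size == 0 || off != (paddr & IO_PAGE_MASK))
        return -1;           /* both addresses must share the in-page offset */

    uint64_t first = iova - off;
    uint64_t end = (iova + size + IO_PAGE_MASK) & ~IO_PAGE_MASK;
    uint64_t phys = paddr - off;

    for (uint64_t a = first; a < end; a += IO_PAGE_SIZE, phys += IO_PAGE_SIZE) {
        int ret = driver_map_page(a >> IO_PAGE_SHIFT, phys >> IO_PAGE_SHIFT, prot);
        if (ret)
            return ret;
    }
    return 0;
}
```

A 4 KiB request starting in the middle of a page touches two IO pages; the caller does not need to know that, which is the point of taking addresses rather than page numbers.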

-- 
   |   AMD Saxony Limited Liability Company & Co. KG
 Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System|  Register Court Dresden: HRA 4896
 Research  |  General Partner authorized to represent:
 Center| AMD Saxony LLC (Wilmington, Delaware, US)
   | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy



Re: [Qemu-devel] qemu-img commit -- is there a limit on file sizes?

2008-12-01 Thread Avi Kivity

Anthony Liguori wrote:


We've started getting some reports of corruption on "commit" in KVM.  
There is a long standing disk corruption issue too that is very 
difficult to reproduce.  The thinking is that there is a bug somewhere 
in the qcow2 code.


Is anyone actively looking into this?



I am, though my actively is a lot less than could be desired.  
Additional eyes would be welcome.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: Compiling error : ppc440_bamboo.o fails

2008-12-01 Thread Hollis Blanchard
Hi Giuseppe, thanks for your mail. Feel free to CC
[EMAIL PROTECTED] in the future, too... :)

On Mon, 2008-12-01 at 15:08 +0100, Giuseppe Falsetti wrote:
> Error messages:
> gcc -I. -I.. -I/root/kvm-userspace/qemu/target-ppc 
> -I/root/kvm-userspace/qemu -MMD -MT ppc440_bamboo.o -MP -DNEED_CPU_H 
> -D__powerpc__ -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE 
> -D__user= -I/root/kvm-userspace/qemu/tcg 
> -I/root/kvm-userspace/qemu/tcg/ppc -I/root/kvm-userspace/qemu/fpu 
> -DHAS_AUDIO -DHAS_AUDIO_CHOICE -I/root/kvm-userspace/qemu/slirp -I 
> /root/kvm-userspace/qemu/../libkvm  -I /root/kvm-userspace/libfdt -O2 -g 
> -fno-strict-aliasing -Wall -Wundef -Wendif-labels -Wwrite-strings   -I 
> /root/kvm-userspace/kernel/include -c -o ppc440_bamboo.o 
> /root/kvm-userspace/qemu/hw/ppc440_bamboo.c
> /root/kvm-userspace/qemu/hw/ppc440_bamboo.c: In function 'bamboo_init':
> /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:108: warning: passing 
> argument 2 of 'load_uimage' from incompatible pointer type
> /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:108: warning: passing 
> argument 3 of 'load_uimage' from incompatible pointer type
> /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:108: error: too many 
> arguments to function 'load_uimage'

Sorry about that... I'm currently in the process of merging PowerPC KVM
support into upstream qemu, and due to this the kvm qemu fork has
broken.

> /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:139: warning: passing 
> argument 1 of 'read_proc_dt_prop_cell' discards qualifiers from pointer 
> target type
> /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:140: warning: passing 
> argument 1 of 'read_proc_dt_prop_cell' discards qualifiers from pointer 
> target type
> /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:173: warning: passing 
> argument 2 of 'dt_cell' discards qualifiers from pointer target type
> /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:173: warning: passing 
> argument 3 of 'dt_cell' discards qualifiers from pointer target type
> /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:174: warning: passing 
> argument 2 of 'dt_cell' discards qualifiers from pointer target type
> /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:174: warning: passing 
> argument 3 of 'dt_cell' discards qualifiers from pointer target type
> /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:176: warning: passing 
> argument 2 of 'dt_cell_multi' discards qualifiers from pointer target type
> /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:176: warning: passing 
> argument 3 of 'dt_cell_multi' discards qualifiers from pointer target type
> /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:177: warning: passing 
> argument 2 of 'dt_cell' discards qualifiers from pointer target type
> /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:177: warning: passing 
> argument 3 of 'dt_cell' discards qualifiers from pointer target type
> /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:179: warning: passing 
> argument 2 of 'dt_cell' discards qualifiers from pointer target type
> /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:179: warning: passing 
> argument 3 of 'dt_cell' discards qualifiers from pointer target type
> /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:180: warning: passing 
> argument 2 of 'dt_string' discards qualifiers from pointer target type
> /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:180: warning: passing 
> argument 3 of 'dt_string' discards qualifiers from pointer target type

These are annoying, but just warnings so we can ignore them for now.
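These warnings typically mean that a const-qualified pointer is being passed where the callee declares a plain `char *`. The real signatures of read_proc_dt_prop_cell(), dt_cell() and dt_string() are not visible in the build log, so this sketch uses a made-up helper, dt_path_depth(), to show the usual fix: const-qualify parameters that are only read.

```c
#include <stddef.h>

/*
 * Hypothetical device-tree helper that only reads its argument.  Had the
 * parameter been declared `char *path`, the call in memory_node_depth()
 * would draw GCC's "discards qualifiers from pointer target type" warning;
 * declaring it const removes the warning without a cast.
 */
static size_t dt_path_depth(const char *path)
{
    size_t depth = 0;

    for (const char *p = path; *p != '\0'; p++)
        if (*p == '/')
            depth++;
    return depth;
}

static size_t memory_node_depth(void)
{
    const char *node = "/memory";   /* const-qualified pointer, as in the warnings */
    return dt_path_depth(node);
}
```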

I can provide you a patch to get you going again right now, but just to
clarify: do you have a 440 system you're going to be running KVM on, and
the G5 is just your build host? There currently is no KVM support for
970...

-- 
Hollis Blanchard
IBM Linux Technology Center



Re: [PATCH 3/9] add frontend implementation for the IOMMU API

2008-12-01 Thread Joerg Roedel
On Tue, Dec 02, 2008 at 12:58:29AM +0900, FUJITA Tomonori wrote:
> On Mon, 01 Dec 2008 16:33:11 +0200
> Avi Kivity <[EMAIL PROTECTED]> wrote:
> 
> > Joerg Roedel wrote:
> > > Hmm, is there any hardware IOMMU with which we can't emulate domains by
> > > partitioning the IO address space? This concept works for GART and
> > > Calgary.
> > >
> > >   
> > 
> > Is partitioning secure?  Domain X's user could program its hardware to 
> > dma to domain Y's addresses, zapping away Domain Y's user's memory.
> 
> It can't be secure. So what's the point to emulate the domain
> partitioning in many traditional hardware IOMMUs that doesn't support
> it.

Btw, if you use the k8-agp driver the GART space is already partitioned
today. So this concept is not entirely new.
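A toy model of emulating domains by partitioning one shared IO address space into equal windows, assuming a single aperture. It is hypothetical (the Aperture type and both functions are made up) and is not how the k8-agp driver actually divides the GART; its comments note why the per-domain check is enforced only in software.

```c
#include <stdint.h>

/* Hypothetical aperture split into nr_domains equal windows. */
typedef struct {
    uint64_t base;            /* first IO address of the aperture */
    uint64_t size;            /* total aperture size in bytes */
    unsigned int nr_domains;
} Aperture;

/* Window [*start, *start + *len) assigned to domain id; 0 on success. */
static int domain_window(const Aperture *ap, unsigned int id,
                         uint64_t *start, uint64_t *len)
{
    if (ap->nr_domains == 0 || id >= ap->nr_domains)
        return -1;
    *len = ap->size / ap->nr_domains;
    *start = ap->base + (uint64_t)id * *len;
    return 0;
}

/*
 * Software check applied when a domain requests a mapping.  It only limits
 * what the OS maps: the hardware translates the whole aperture, so a device
 * can still be programmed to DMA into another domain's window, which is the
 * isolation concern raised earlier in the thread.
 */
static int iova_in_domain(const Aperture *ap, unsigned int id, uint64_t iova)
{
    uint64_t start, len;

    if (domain_window(ap, id, &start, &len) != 0)
        return 0;
    return iova >= start && iova - start < len;
}
```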

Joerg



Re: [Qemu-devel] qemu-img commit -- is there a limit on file sizes?

2008-12-01 Thread Anthony Liguori

walt wrote:

Some background for my question:  I've been trying to install
and then update Windows Vista using kvm.  Everything works great
until I use 'qemu-img commit' to apply all the Windows Updates
to my original base install of Vista.

After doing the qemu-img commit step, the backing file is now
corrupt, 100% reproducibly.  I don't have the same problem with
Windows XP, however, and I wondered if the problem is caused by
the sheer size of the commit that Vista requires.

When I install XP, then windows-update, and then qemu-img commit
the updates, I'm committing about 1GB of updates to a 3GB backing
file.

When I install Vista and then later commit the Vista updates, I'm
committing a 3GB file to a 6GB backing file, and that's when the
corruption happens every time.

So I tried an experiment with Vista -- I deliberately limit the
number of windows updates I allow at any one time, and then use
qemu-img commit after each small update.  Voila, everything now
works perfectly -- no file corruption!


We've started getting some reports of corruption on "commit" in KVM.  
There is a long standing disk corruption issue too that is very 
difficult to reproduce.  The thinking is that there is a bug somewhere 
in the qcow2 code.


Is anyone actively looking into this?

Regards,

Anthony Liguori


And that's why I suspect there is a functional limit to the size
of each commit I can do with qemu-img.

Any thoughts or possible diagnostic maneuvers to be tried?

Thanks!

(BTW, I get the same results using 32-bit linux and 64-bit linux
on the same amd64 machine, using both gcc3 and gcc4.)







Re: [PATCH 3/9] add frontend implementation for the IOMMU API

2008-12-01 Thread Joerg Roedel
On Tue, Dec 02, 2008 at 12:58:29AM +0900, FUJITA Tomonori wrote:
> On Mon, 01 Dec 2008 16:33:11 +0200
> Avi Kivity <[EMAIL PROTECTED]> wrote:
> 
> > Joerg Roedel wrote:
> > > Hmm, is there any hardware IOMMU with which we can't emulate domains by
> > > partitioning the IO address space? This concept works for GART and
> > > Calgary.
> > >
> > >   
> > 
> > Is partitioning secure?  Domain X's user could program its hardware to 
> > dma to domain Y's addresses, zapping away Domain Y's user's memory.
> 
> It can't be secure. So what's the point to emulate the domain
> partitioning in many traditional hardware IOMMUs that doesn't support
> it.

It's a generic way to make non-contiguous host memory io-contiguous. I
already pointed out some potential users for this.
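A minimal sketch of what "making non-contiguous host memory io-contiguous" amounts to: scattered host page frames receive consecutive IO page frames, so a device sees one linear buffer. IoMapping, IO_PAGE_SHIFT and make_io_contiguous() are hypothetical; a real IOMMU driver would write these pairs into its page tables instead of an array.

```c
#include <stddef.h>
#include <stdint.h>

#define IO_PAGE_SHIFT 12

/* One hypothetical IO page-table entry: IO page frame -> host page frame. */
typedef struct {
    uint64_t io_pfn;
    uint64_t host_pfn;
} IoMapping;

/*
 * Map n scattered host page frames to consecutive IO page frames starting
 * at io_base, so the device sees one contiguous buffer of n pages.
 * Returns 0 on success, -1 if io_base is not IO-page aligned.
 */
static int make_io_contiguous(uint64_t io_base, const uint64_t *host_pfns,
                              size_t n, IoMapping *table)
{
    if (io_base & ((UINT64_C(1) << IO_PAGE_SHIFT) - 1))
        return -1;
    for (size_t i = 0; i < n; i++) {
        table[i].io_pfn = (io_base >> IO_PAGE_SHIFT) + i;
        table[i].host_pfn = host_pfns[i];
    }
    return 0;
}
```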



Re: [PATCH] KVM: Qemu: push_nmi should be only used by I386 Arch.

2008-12-01 Thread Hollis Blanchard
On Fri, 2008-11-28 at 10:26 +0100, Jan Kiszka wrote:
> Zhang, Xiantao wrote:
> >>From c25fa2e4de40e500bd364c3267d5be89a9cfbb4d Mon Sep 17 00:00:00 2001
> > From: Xiantao Zhang <[EMAIL PROTECTED]>
> > Date: Fri, 28 Nov 2008 09:38:46 +0800
> > Subject: [PATCH] KVM: Qemu: push_nmi should be only used by I386 Arch.
> > 
> > Use TARGET_I386 to exclude other archs.
> > Signed-off-by: Xiantao Zhang <[EMAIL PROTECTED]>
> > ---
> >  libkvm/libkvm.c |    4 ++--
> >  qemu/qemu-kvm.c |    4 ++++
> >  2 files changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
> > index 40c95ce..851a93a 100644
> > --- a/libkvm/libkvm.c
> > +++ b/libkvm/libkvm.c
> > @@ -868,7 +868,7 @@ int kvm_run(kvm_context_t kvm, int vcpu, void *env)
> > struct kvm_run *run = kvm->run[vcpu];
> >  
> >  again:
> > -#ifdef KVM_CAP_NMI
> > +#ifdef TARGET_I386
> > push_nmi(kvm);
> >  #endif
> >  #if !defined(__s390__)
> > @@ -1032,7 +1032,7 @@ int kvm_has_sync_mmu(kvm_context_t kvm)
> >  
> >  int kvm_inject_nmi(kvm_context_t kvm, int vcpu)
> >  {
> > -#ifdef KVM_CAP_NMI
> > +#ifdef TARGET_I386
> > return ioctl(kvm->vcpu_fd[vcpu], KVM_NMI);
> >  #else
> > return -ENOSYS;
> > diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
> > index cf0e85d..b6c8288 100644
> > --- a/qemu/qemu-kvm.c
> > +++ b/qemu/qemu-kvm.c
> > @@ -154,10 +154,12 @@ static int try_push_interrupts(void *opaque)
> >  return kvm_arch_try_push_interrupts(opaque);
> >  }
> >  
> > +#ifdef TARGET_I386
> >  static void push_nmi(void *opaque)
> >  {
> >  kvm_arch_push_nmi(opaque);
> >  }
> > +#endif
> >  
> >  static void post_kvm_run(void *opaque, void *data)
> >  {
> > @@ -742,7 +744,9 @@ static struct kvm_callbacks qemu_kvm_ops = {
> >  .shutdown = kvm_shutdown,
> >  .io_window = kvm_io_window,
> >  .try_push_interrupts = try_push_interrupts,
> > +#ifdef TARGET_I386
> >  .push_nmi = push_nmi,
> > +#endif
> >  .post_kvm_run = post_kvm_run,
> >  .pre_kvm_run = pre_kvm_run,
> >  #ifdef TARGET_I386
> 
> This will now break when KVM_CAP_NMI is undefined, ie. when there is no
> KVM_NMI IOCTL (=> older kvm module sets).

Guys, we already have stubs for this (although they've been turned into
dead code). Jan broke IA64 and PowerPC builds when he renamed
"kvm_arch_try_push_nmi" to "kvm_arch_push_nmi", and the obvious fix is
to update the stubs to match. That avoids all these ifdefs and
associated problems.
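A sketch of the stub approach, collapsed into one file for illustration. In qemu-kvm the generic caller and each architecture's kvm_arch_push_nmi() would live in separate files; struct kvm_callbacks_sketch and qemu_kvm_ops_sketch are hypothetical stand-ins for the real callback table, and the stub shown is what a target without NMI injection might provide.

```c
#include <stddef.h>

/* Generic code: the callback table always carries .push_nmi, no #ifdef. */
struct kvm_callbacks_sketch {
    void (*push_nmi)(void *opaque);
};

/* Implemented once per architecture. */
void kvm_arch_push_nmi(void *opaque);

static void push_nmi(void *opaque)
{
    kvm_arch_push_nmi(opaque);
}

static const struct kvm_callbacks_sketch qemu_kvm_ops_sketch = {
    .push_nmi = push_nmi,
};

/*
 * Stub for a target without NMI injection (e.g. IA64 or PowerPC).  It must
 * carry the same name as the x86 hook, so renaming the hook also means
 * renaming every stub; missing that rename is what broke those builds.
 */
void kvm_arch_push_nmi(void *opaque)
{
    (void)opaque;
}
```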

Avi, could you revert a8d12f98755be9330fcde055134511f76ecaa538 please?

-- 
Hollis Blanchard
IBM Linux Technology Center



Re: [PATCH 3/9] add frontend implementation for the IOMMU API

2008-12-01 Thread FUJITA Tomonori
On Mon, 01 Dec 2008 16:33:11 +0200
Avi Kivity <[EMAIL PROTECTED]> wrote:

> Joerg Roedel wrote:
> > Hmm, is there any hardware IOMMU with which we can't emulate domains by
> > partitioning the IO address space? This concept works for GART and
> > Calgary.
> >
> >   
> 
> Is partitioning secure?  Domain X's user could program its hardware to 
> dma to domain Y's addresses, zapping away Domain Y's user's memory.

It can't be secure. So what's the point of emulating domain
partitioning in the many traditional hardware IOMMUs that don't support
it?

The emulated domain support with the DMA mapping debugging feature
might be useful to debug drivers but it doesn't mean that we need to
add the emulated domain support to every hardware IOMMU. If you add it
to swiotlb, everyone can enjoy the debugging.


Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests

2008-12-01 Thread Anthony Liguori

Avi Kivity wrote:

Anthony Liguori wrote:


I see no compelling reason to do cpu placement internally.  It can be 
done quite effectively externally.


Memory allocation is tough, but I don't think it's out of reach.  
Looking at the numactl man page, you can do:


numactl  --offset=1G  --length=1G --membind=1 --file /dev/shm/A --touch
  Bind the second gigabyte in the tmpfs file /dev/shm/A to node 1.


Since we can already create VM's with the -mem-path argument, if you 
create a 2GB guest and want it to span two numa nodes, you could do:


numactl  --offset=0G  --length=1G --membind=0 --file /dev/shm/A --touch
numactl  --offset=1G  --length=1G --membind=1 --file /dev/shm/A --touch

And then create the VM with:

qemu-system-x86_64 -mem-path /dev/shm/A -mem 2G ...

What's best about this approach is that you get full access to what
numactl is capable of: interleaving, rebalancing, etc.


It looks horribly difficult and unintuitive.  It forces you to use 
-mem-path (which is an abomination; the only reason it lives is that 
we can't allocate large pages with it).


As opposed to inventing new options for QEMU that convey all of the same 
information a slightly different way?  We're stuck with -mem-path so we 
might as well make good use of it.


The proposed syntax is:

qemu -numanode node=1,cpu=2,cpu=3,start=1G,size=1G,hostnode=3

The new syntax would be:

qemu -smp 4 -numa nodes=2,cpus=1:2:3:4,mem=1G:1G -mem-path 
/dev/hugetlbfs/foo


Then you would have to look up the thread ids, and do

taskset 
taskset 
taskset 
taskset 
numactl -o 1G -l 1G -m 0 -f /dev/hugetlbfs/foo
numactl -o 1G -l 1G -m 1 -f /dev/hugetlbfs/foo

This may look like a lot more, but it's not going to be nearly enough to 
specify a NUMA placement on startup.  What if you have a very large NUMA 
system and want to rebalance virtual machines?  You need a mechanism to 
do this that now has to be exposed through the monitor.  In fact, you'll 
almost certainly introduce a taskset-like monitor command and a 
numactl-like monitor command.


Why reinvent the wheel?  Plus, taskset and numactl give you a lot of
flexibility.  All we're going to do by cooking this stuff into QEMU is 
artificially limit ourselves.
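A small helper, assuming the colon-separated mem= list proposed above, that turns per-node sizes into the (offset, length) pairs one would pass to numactl --offset/--length for each node of a -mem-path file. numa_mem_plan() is a hypothetical name; only K, M and G suffixes are handled, and a bare number is taken as bytes.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse per-node sizes such as "1G:1G" and compute each node's
 * (offset, length) inside the -mem-path file.  Returns the number of
 * nodes, or -1 on a malformed spec or more than max_nodes entries.
 */
static int numa_mem_plan(const char *spec, uint64_t *offset, uint64_t *length,
                         int max_nodes)
{
    uint64_t next = 0;   /* offset where the next node's range starts */
    int n = 0;
    const char *p = spec;

    while (*p) {
        char *end;
        uint64_t v;

        if (*p < '0' || *p > '9' || n == max_nodes)
            return -1;
        v = strtoull(p, &end, 10);
        switch (*end) {
        case 'G': v <<= 30; end++; break;
        case 'M': v <<= 20; end++; break;
        case 'K': v <<= 10; end++; break;
        default:  break;
        }
        offset[n] = next;
        length[n] = v;
        next += v;
        n++;

        if (*end == ':') {
            end++;
            if (*end == '\0')
                return -1;   /* trailing ':' with no size after it */
        } else if (*end != '\0') {
            return -1;       /* unexpected character */
        }
        p = end;
    }
    return n;
}
```

For "1G:1G" this yields offsets 0 and 1G, each 1G long, which are the same two ranges as in the --offset=0G / --offset=1G numactl example.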


Regards,

Anthony Liguori


Re: [PATCH 3/9] add frontend implementation for the IOMMU API

2008-12-01 Thread Joerg Roedel
On Mon, Dec 01, 2008 at 04:33:11PM +0200, Avi Kivity wrote:
> Joerg Roedel wrote:
> > Hmm, is there any hardware IOMMU with which we can't emulate domains by
> > partitioning the IO address space? This concept works for GART and
> > Calgary.
> >
> >   
> 
> Is partitioning secure?  Domain X's user could program its hardware to 
> dma to domain Y's addresses, zapping away Domain Y's user's memory.

No, it's not secure. But this problem exists with pv-dma without iommu
too.

Joerg



Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests

2008-12-01 Thread Avi Kivity

Anthony Liguori wrote:

Avi Kivity wrote:

Andre Przywara wrote:

Any other useful commands for the monitor? Maybe (temporary) VCPU 
migration without page migration?


Right now vcpu migration is done externally (we export the thread IDs 
so management can pin them as it wishes).  If we add numa support, I 
think it makes sense to do it internally as well.  I suggest using the
same syntax for the monitor as for the command line; that's simplest 
to learn and to implement.
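For the external approach, pinning an exported vcpu thread ID needs only sched_setaffinity(), which is roughly what `taskset -p -c CPU TID` does from the shell. The helper names are hypothetical, and the sketch assumes Linux with glibc's CPU_SET macros.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>

/* Pin thread tid (0 = calling thread) to a single host CPU; 0 on success. */
static int pin_vcpu_thread(pid_t tid, int host_cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(host_cpu, &set);
    return sched_setaffinity(tid, sizeof(set), &set);
}

/* Lowest CPU the calling thread is currently allowed to run on, or -1. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

A management tool would call pin_vcpu_thread() once per exported vcpu thread ID, choosing host CPUs from the desired NUMA node.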


I see no compelling reason to do cpu placement internally.  It can be 
done quite effectively externally.


Memory allocation is tough, but I don't think it's out of reach.  
Looking at the numactl man page, you can do:


numactl  --offset=1G  --length=1G --membind=1 --file /dev/shm/A --touch
  Bind the second gigabyte in the tmpfs file /dev/shm/A to node 1.


Since we can already create VM's with the -mem-path argument, if you 
create a 2GB guest and want it to span two numa nodes, you could do:


numactl  --offset=0G  --length=1G --membind=0 --file /dev/shm/A --touch
numactl  --offset=1G  --length=1G --membind=1 --file /dev/shm/A --touch

And then create the VM with:

qemu-system-x86_64 -mem-path /dev/shm/A -mem 2G ...

What's best about this approach is that you get full access to what
numactl is capable of: interleaving, rebalancing, etc.


It looks horribly difficult and unintuitive.  It forces you to use 
-mem-path (which is an abomination; the only reason it lives is that we 
can't allocate large pages with it).


--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests

2008-12-01 Thread Avi Kivity

Anthony Liguori wrote:

Andre Przywara wrote:

Hi,

this patch series introduces support for multiple NUMA nodes within KVM
guests.
This will improve the performance of guests which are bigger than one
node (number of VCPUs and/or amount of memory) and also allows better
balancing by making better use of each node's memory.

It also improves the one-node case by pinning a guest to this node and
avoiding remote memory accesses from a VCPU.


Could you please post this to qemu-devel?  There's really nothing KVM 
specific here.




It's almost useless to qemu until it can run vcpus on host threads.  I 
agree it should be posted there though.




I think the dependency on libnuma is a bad idea.  It's mixing a 
mechanism (emulating NUMA layout) with a policy (how to do memory/VCPU 
placement).


If you split the NUMA emulation bits into a separate patch series
that has no dependency on the host NUMA topology, I think we can look at
the existing mechanisms we have to see if they're sufficient to do
static placement on NUMA boundaries.  vcpu pinning is easy enough; I
think the only place we're lacking is memory layout.  Note that this is
totally independent of the guest's NUMA characteristics, though.  You
may still want half of memory to be pinned between two nodes even if
the guest has no SRAT tables.


You can do that easily with numactl.  Fine grained control of host numa 
layout and guest numa emulation are only useful together (one could 
argue that guest numa emulation is useful by itself, for debugging the 
guest OS numa algorithms).


--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests

2008-12-01 Thread Anthony Liguori

Anthony Liguori wrote:


numactl  --offset=0G  --length=1G --membind=0 --file /dev/shm/A --touch
numactl  --offset=1G  --length=1G --membind=1 --file /dev/shm/A --touch

And then create the VM with:

qemu-system-x86_64 -mem-path /dev/shm/A -mem 2G ...

What's best about this approach is that you get full access to what
numactl is capable of: interleaving, rebalancing, etc.


Prefaulting, generating an error when NUMA placement can't be 
satisfied, hugetlbfs support, yeah, this very much seems like the right 
thing to do to me.


If you care enough about performance to do NUMA placement, you almost 
certainly are going to be doing hugetlbfs anyway so you get it 
practically for free.


Regards,

Anthony Liguori






1-1 mapping of devices without VT-d

2008-12-01 Thread Passera, Pablo R
Hi everyone,
I want to assign a PCI device directly to a VM (PCI passthrough) on a 
machine that does not have VT-d. I found something related to this in a 
presentation at the 2008 KVM Forum, called 1-1 mapping, and a patch for it 
at http://thread.gmane.org/gmane.comp.emulators.kvm.devel/18722/focus=18753. 
Is this included in the latest KVM version, or are there plans to include it?

Thanks in advance,

Pablo Pássera



Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests

2008-12-01 Thread Anthony Liguori

Avi Kivity wrote:

Andre Przywara wrote:

Any other useful commands for the monitor? Maybe (temporary) VCPU 
migration without page migration?


Right now vcpu migration is done externally (we export the thread IDs 
so management can pin them as it wishes).  If we add numa support, I 
think it makes sense to do it internally as well.  I suggest using the 
same syntax for the monitor as for the command line; that's simplest 
to learn and to implement.


I see no compelling reason to do cpu placement internally.  It can be 
done quite effectively externally.


Memory allocation is tough, but I don't think it's out of reach.  
Looking at the numactl man page, you can do:


numactl  --offset=1G  --length=1G --membind=1 --file /dev/shm/A --touch
  Bind the second gigabyte in the tmpfs file /dev/shm/A to node 1.


Since we can already create VMs with the -mem-path argument, if you 
create a 2GB guest and want it to span two numa nodes, you could do:


numactl  --offset=0G  --length=1G --membind=0 --file /dev/shm/A --touch
numactl  --offset=1G  --length=1G --membind=1 --file /dev/shm/A --touch

And then create the VM with:

qemu-system-x86_64 -mem-path /dev/shm/A -mem 2G ...

What's best about this approach is that you get full access to what 
numactl is capable of.  Interleaving, rebalancing, etc.
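For a guest split across more than two nodes, the same numactl recipe generalises mechanically. The sketch below is illustrative only: plan_slices() and format_numactl() are hypothetical helpers (not part of numactl or qemu), and it assumes equal slices in whole gigabytes, with the last slice taking any remainder.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One numactl call: bind [offset_gb, offset_gb + length_gb) of the shared
 * backing file to a single host node. Hypothetical type. */
struct numa_slice {
    unsigned offset_gb;
    unsigned length_gb;
    int host_node;
};

/* Split a guest of total_gb gigabytes across nr_nodes host nodes; the
 * last slice absorbs any remainder when total_gb % nr_nodes != 0. */
static void plan_slices(unsigned total_gb, const int *host_nodes,
                        int nr_nodes, struct numa_slice *out)
{
    unsigned per_node, offset = 0;
    int i;

    assert(nr_nodes > 0);
    per_node = total_gb / (unsigned)nr_nodes;
    for (i = 0; i < nr_nodes; i++) {
        out[i].offset_gb = offset;
        out[i].length_gb = (i == nr_nodes - 1) ? total_gb - offset : per_node;
        out[i].host_node = host_nodes[i];
        offset += out[i].length_gb;
    }
}

/* Render the numactl command line for one slice; returns snprintf()'s
 * result (the full length, even if buf was too small). */
static int format_numactl(const struct numa_slice *s, const char *file,
                          char *buf, size_t len)
{
    assert(file != NULL && strlen(file) > 0);
    return snprintf(buf, len,
                    "numactl --offset=%uG --length=%uG --membind=%d --file %s --touch",
                    s->offset_gb, s->length_gb, s->host_node, file);
}
```

Planning a 2G guest over host nodes 0 and 1 and formatting each slice yields commands equivalent to the two above; the placement policy stays entirely outside qemu, which is the point of this approach.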


Regards,

Anthony Liguori


Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests

2008-12-01 Thread Anthony Liguori

Andre Przywara wrote:

Hi,

this patch series introduces multiple NUMA nodes support within KVM 
guests.
This will improve the performance of guests which are bigger than one 
node (number of VCPUs and/or amount of memory) and also allows better 
balancing by making better use of each node's memory.

It also improves the one node case by pinning a guest to this node and
avoiding access of remote memory from one VCPU.


Could you please post this to qemu-devel?  There's really nothing KVM 
specific here.



The user (or better: management application) specifies the host nodes
the guest should use: -nodes 2,3 would create a two-node guest mapped to
nodes 2 and 3 on the host. These numbers are handed over to libnuma:
VCPUs are pinned to the nodes and the allocated guest memory is bound to
its respective node. Since libnuma seems not to be installed
everywhere, the user has to enable this via configure --enable-numa.
In the BIOS code an ACPI SRAT table was added, which describes the NUMA
topology to the guest. The number of nodes is communicated via the CMOS
RAM (offset 0x3E). If someone thinks of this as a bad idea, tell me.


I think the dependency on libnuma is a bad idea.  It's mixing a 
mechanism (emulating NUMA layout) with a policy (how to do memory/VCPU 
placement).


If you split the NUMA emulation bits into a separate patch series, that 
has no dependency on the host NUMA topology, I think we can look at the 
existing mechanisms we have to see if they're sufficient to do static 
placement on NUMA boundaries.  vcpu pinning is easy enough, I think the 
only place we're lacking is memory layout.  Note, that's totally 
independent of the guest's NUMA characteristics though.  You may still 
want half of memory to be pinned between two nodes even if the guest has 
no SRAT tables.


Regards,

Anthony Liguori


To make use of the new BIOS, install the iasl compiler
(http://acpica.org/downloads/) and type "make bios" before installing,
so the default BIOS will be replaced with the modified one.
Node over-committing is allowed (-nodes 0,0,0,0), omitting the -nodes
parameter reverts to the old behavior.

Please apply.

Regards,
Andre.

Patch 1/3: introduce a command line parameter
Patch 2/3: allocate guest resources from different host nodes
Patch 3/3: generate an appropriate SRAT ACPI table

Signed-off-by: Andre Przywara <[EMAIL PROTECTED]>





Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests

2008-12-01 Thread Avi Kivity

Daniel P. Berrange wrote:
The only problem is the default option for the host side, as libnuma 
requires the nodes to be named explicitly. Maybe make the pin: part _not_ 
optional? I would at least want to pin the memory, one could discuss 
about the VCPUs...



I think keeping it optional makes things more flexible for people
invoking KVM. If omitted, then query current CPU pinning to determine
which host NUMA nodes to allocate from. 
  


Well, -numa itself is optional.  But yes, we could use the default cpu 
affinity mask to derive the default host numa nodes.



The topology exposed to a guest will likely be the same every time
you launch a particular VM, while the guest <-> host pinning is a 
point in time decision according to current available resources.

Thus some apps / users may find it more convenient to have a fixed set
of args they always use to invoke the KVM process, and instead control
placement during the fork/exec'ing of KVM by explicitly calling 
sched_setaffinity or using numactl to launch.  It should be easy enough
to use sched_getaffinity to query current pinning and from that determine
appropriate NUMA nodes, if they leave out the pin= arg.
  


I agree, nice idea.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests

2008-12-01 Thread Daniel P. Berrange
On Mon, Dec 01, 2008 at 03:15:19PM +0100, Andre Przywara wrote:
> Avi Kivity wrote:
> >>Node over-committing is allowed (-nodes 0,0,0,0), omitting the -nodes
> >>parameter reverts to the old behavior.
> >
> >'-nodes' is too generic a name ('node' could also mean a host).  Suggest 
> >-numanode.
> >
> >Need more flexibility: specify the range of memory per node, which cpus 
> >are in the node, relative weights for the SRAT table:
> >
> >  -numanode node=1,cpu=2,cpu=3,start=1G,size=1G,hostnode=3
> 
> I converted my code to use the new firmware interface. This also makes 
> it possible to pass more information between qemu and BIOS (which 
> prevented a more flexible command line in the first version).
> So I would opt for the following:
> - use numanode (or simply numa?) instead of the misleading -nodes
> - allow passing memory sizes, VCPU subsets and host CPU pin info
> I would prefer Daniel's version:
> -numa [,mem:[;...]]
> [,cpu:[;...]]
> [,pin:[;...]]
> 
> That would allow easy things like -numa 2 (for a two-node guest); options
> not given would fall back to defaults (equally split-up resources).
> 
> The only problem is the default option for the host side, as libnuma 
> requires the nodes to be named explicitly. Maybe make the pin: part _not_ 
> optional? I would at least want to pin the memory, one could discuss 
> about the VCPUs...

I think keeping it optional makes things more flexible for people
invoking KVM. If omitted, then query current CPU pinning to determine
which host NUMA nodes to allocate from. 

The topology exposed to a guest will likely be the same every time
you launch a particular VM, while the guest <-> host pinning is a 
point in time decision according to current available resources.
Thus some apps / users may find it more convenient to have a fixed set
of args they always use to invoke the KVM process, and instead control
placement during the fork/exec'ing of KVM by explicitly calling 
sched_setaffinity or using numactl to launch.  It should be easy enough
to use sched_getaffinity to query current pinning and from that determine
appropriate NUMA nodes, if they leave out the pin= arg.
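A rough sketch of the sched_getaffinity() idea, assuming the CPU-to-node mapping is already known: cpu_to_node[] is a hypothetical table (a real implementation would read it from sysfs or libnuma), and the result is a plain bitmask with bit n set for host node n. Only sched_getaffinity() and the CPU_* macros are real glibc interfaces here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Collapse a CPU affinity mask into a host NUMA node bitmask (bit n set
 * for node n; assumes fewer than 64 nodes). cpu_to_node[] is a
 * hypothetical lookup table; entries < 0 mean "unknown node". */
static unsigned long nodes_from_affinity(const cpu_set_t *mask,
                                         const int *cpu_to_node, int nr_cpus)
{
    unsigned long nodes = 0;
    int cpu;

    assert(nr_cpus <= CPU_SETSIZE);
    for (cpu = 0; cpu < nr_cpus; cpu++)
        if (CPU_ISSET(cpu, mask) && cpu_to_node[cpu] >= 0)
            nodes |= 1UL << cpu_to_node[cpu];
    return nodes;
}

/* Default for a missing pin= argument: read the affinity this process
 * inherited (from taskset, numactl, or a parent that called
 * sched_setaffinity() before exec) and derive the host nodes from it. */
static int default_host_nodes(const int *cpu_to_node, int nr_cpus,
                              unsigned long *nodes)
{
    cpu_set_t mask;

    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    *nodes = nodes_from_affinity(&mask, cpu_to_node, nr_cpus);
    return 0;
}
```

A process started under, say, taskset or numactl --cpunodebind inherits that mask, so with no pin= argument the guest memory would follow whatever nodes the VCPUs were already restricted to.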

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


Re: [PATCH 3/9] add frontend implementation for the IOMMU API

2008-12-01 Thread Avi Kivity

Joerg Roedel wrote:

Hmm, is there any hardware IOMMU with which we can't emulate domains by
partitioning the IO address space? This concept works for GART and
Calgary.

  


Is partitioning secure?  Domain X's user could program its hardware to 
dma to domain Y's addresses, zapping away Domain Y's user's memory.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests

2008-12-01 Thread Avi Kivity

Andre Przywara wrote:

Avi Kivity wrote:

Andre Przywara wrote:

The user (or better: management application) specifies the host nodes
the guest should use: -nodes 2,3 would create a two-node guest mapped to
nodes 2 and 3 on the host. These numbers are handed over to libnuma:
VCPUs are pinned to the nodes and the allocated guest memory is bound to
its respective node. Since libnuma seems not to be installed
everywhere, the user has to enable this via configure --enable-numa.
In the BIOS code an ACPI SRAT table was added, which describes the NUMA
topology to the guest. The number of nodes is communicated via the CMOS
RAM (offset 0x3E). If someone thinks of this as a bad idea, tell me.


There exists now a firmware interface in qemu for this kind of 
communications.
Oh, right you are, I missed that (it was well hidden). I was looking at 
how the BIOS detects memory size and CPU numbers and these methods are 
quite cumbersome. Why not convert them to the FW_CFG methods (which 
the qemu side already sets)? To not diverge too much from the original 
BOCHS BIOS?




Mostly.  Also, no one felt the urge.


Node over-committing is allowed (-nodes 0,0,0,0), omitting the -nodes
parameter reverts to the old behavior.


'-nodes' is too generic a name ('node' could also mean a host).  
Suggest -numanode.


Need more flexibility: specify the range of memory per node, which 
cpus are in the node, relative weights for the SRAT table:


  -numanode node=1,cpu=2,cpu=3,start=1G,size=1G,hostnode=3


I converted my code to use the new firmware interface. This also makes 
it possible to pass more information between qemu and BIOS (which 
prevented a more flexible command line in the first version).

So I would opt for the following:
- use numanode (or simply numa?) instead of the misleading -nodes
- allow passing memory sizes, VCPU subsets and host CPU pin info
I would prefer Daniel's version:
-numa [,mem:[;...]]
[,cpu:[;...]]
[,pin:[;...]]

That would allow easy things like -numa 2 (for a two-node guest); options 
not given would fall back to defaults (equally split-up resources).




Yes, that looks good.

The only problem is the default option for the host side, as libnuma 
requires the nodes to be named explicitly. Maybe make the pin: part _not_ 
optional? I would at least want to pin the memory, one could discuss 
about the VCPUs...




If you can bench it, that would be best.  My guess is that we would need 
to pin the vcpus.



Also need a monitor command to change host nodes dynamically:

Implementing a monitor interface is a good idea.

(qemu) numanode 1 0
Does that include page migration? That would be easily possible with 
mbind(MPOL_MF_MOVE), but would take some time and resources (which I 
think is OK if explicitly triggered in the monitor).


Yes, that's the main interest.  Allow management to load balance numa 
nodes (as Linux doesn't do so automatically for long running processes).


Any other useful commands for the monitor? Maybe (temporary) VCPU 
migration without page migration?


Right now vcpu migration is done externally (we export the thread IDs so 
management can pin them as it wishes).  If we add numa support, I think 
it makes sense to do it internally as well.  I suggest using the same 
syntax for the monitor as for the command line; that's simplest to learn 
and to implement.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] kvm-userspace: fix module build with --kerneldir

2008-12-01 Thread Maik Hentsche
Please find my reworked patch attached. Support for pre-f1d28fb04
kernels was tested with 2.6.16.1. I CC-ed everyone who contributed to
this thread, thanks for your help. I hope the "bureaucracy" is correct.
I'm not a kernel developer, so all I know about the contribution
process is what I found in the documentation.

so long
Maik



When kvm-userspace is built against a different kernel version than the
running kernel, the depmod at the end fails. This patch fixes the
problem.

Signed-off-by: Maik Hentsche <[EMAIL PROTECTED]>
Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]>


-- 
   \   AMD Saxony Limited Liability Company & Co. KG
 Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
  System   |  Register Court Dresden: HRA 4896
 Research  |  General Partner authorized to represent:
  Center   | AMD Saxony LLC (Wilmington, Delaware, US)
   / General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe,
Thomas McCoy
diff --git a/configure b/configure
index 63f956c..97a7cb7 100755
--- a/configure
+++ b/configure
@@ -15,6 +15,12 @@ qemu_opts=()
 cross_prefix=
 arch=`uname -m`
 target_exec=
+# don't use uname if kerneldir is set
+no_uname=
+depmod_version=
+if [ -z "$TMPDIR" ] ; then 
+TMPDIR=.
+fi
 
 usage() {
 cat <<-EOF
@@ -56,6 +62,7 @@ while [[ "$1" = -* ]]; do
 	;;
 	--kerneldir)
 	kerneldir="$arg"
+no_uname=1
 	;;
 	--with-patched-kernel)
 	want_module=
@@ -112,6 +119,21 @@ if [ -d "$kerneldir/include2" ]; then
 kernelsourcedir=${kerneldir%/*}/source
 fi
 
+if [ -n "$no_uname" ]; then
+if [ -e "$kerneldir/.kernelrelease" ]; then
+depmod_version=`cat "$kerneldir/.kernelrelease"`
+
+elif [ -e "$kerneldir/include/config/kernel.release" ]; then
+depmod_version=`cat "$kerneldir/include/config/kernel.release"`
+else
+echo 
+echo "Error: kernelversion not found"
+echo "Please make sure your kernel is configured"
+echo
+exit 1
+fi
+fi
+
 #configure user dir
 (cd user; ./configure --prefix="$prefix" --kerneldir="$libkvm_kerneldir" \
   --arch="$arch" --processor="$processor" \
@@ -143,6 +165,7 @@ CC=$cross_prefix$cc
 LD=$cross_prefix$ld
 OBJCOPY=$cross_prefix$objcopy
 AR=$cross_prefix$ar
+DEPMOD_VERSION=$depmod_version
 EOF
 
 cat < kernel/config.kbuild
diff --git a/kernel/Makefile b/kernel/Makefile
index 41449d6..8315e3d 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -107,7 +107,7 @@ install:
 		 $(ORIGMODDIR)/arch/$(ARCH_DIR)/kvm/*.ko; do \
 		if [ -f "$$i" ]; then mv "$$i" "$$i.orig"; fi; \
 	done
-	/sbin/depmod -a
+	/sbin/depmod -a $(DEPMOD_VERSION)
 
 tmpspec = .tmp.kvm-kmod.spec
 




Re: [PATCH 3/9] add frontend implementation for the IOMMU API

2008-12-01 Thread Joerg Roedel
On Mon, Dec 01, 2008 at 11:18:39PM +0900, FUJITA Tomonori wrote:
> On Mon, 1 Dec 2008 15:02:09 +0200
> Muli Ben-Yehuda <[EMAIL PROTECTED]> wrote:
> 
> > On Mon, Dec 01, 2008 at 01:00:26PM +0100, Joerg Roedel wrote:
> > 
> > > > > > The majority of the names (include/linux/iommu.h, iommu.c,
> > > > > > iommu_ops, etc) look too generic? We already have lots of
> > > > > > similar things (e.g. arch/{x86,ia64}/asm/iommu.h, several
> > > > > > archs' iommu.c, etc). Such names are expected to be used by
> > > > > > all the IOMMUs.
> > > > > 
> > > > > The API is already useful for more than KVM. I also plan to
> > > > > extend it to support more types of IOMMUs than VT-d and AMD
> > > > > IOMMU in the future. But these changes are more intrusive than
> > > > > this patchset and need more discussion. I prefer to do small
> > > > > steps into this direction.
> > > > 
> > > > Can you be more specific? What IOMMU could use this? For example,
> > > > how could GART use this? I think that people expect the name 'struct
> > > > iommu_ops' to be an abstraction for all the IOMMUs (or the majority
> > > > at least). If this works like that, the name is a good choice, I
> > > > think.
> > > 
> > > GART can't use exactly this. But with some extensions we can make it
> > > useful for GART and GART-like IOMMUs too. For example we can emulate
> > > domains in GART by partitioning the GART aperture space.
> > 
> > That would only work with a pvdma API, since GART doesn't support
> > multiple address spaces, and you don't get the isolation properties of
> > a real IOMMU, so... why would you want to do that?
> 
> If this only works for IOMMUs that support some kind of domain concept,
> then I think a name like iommu_domain_ops is more appropriate.

Hmm, is there any hardware IOMMU with which we can't emulate domains by
partitioning the IO address space? This concept works for GART and
Calgary.
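To make the isolation concern raised elsewhere in this thread concrete, here is a minimal model of "domains by partitioning": one shared aperture carved into equal windows, one per emulated domain. All names are hypothetical. The range check is pure software bookkeeping; with a single translation table, nothing in the hardware stops a device from targeting another domain's window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A single shared IO aperture (GART-style) split into equal windows,
 * one per emulated domain. Hypothetical type. */
struct aperture {
    uint64_t base;        /* bus address of the aperture start */
    uint64_t size;        /* total aperture size in bytes */
    unsigned nr_domains;
};

/* Start of the window reserved for domain id. */
static uint64_t domain_window_start(const struct aperture *ap, unsigned id)
{
    assert(id < ap->nr_domains);
    return ap->base + (ap->size / ap->nr_domains) * id;
}

/* True if [addr, addr + len) lies inside domain id's window. Only the
 * allocator can honour this; the device itself is not constrained. */
static bool in_domain_window(const struct aperture *ap, unsigned id,
                             uint64_t addr, uint64_t len)
{
    uint64_t start = domain_window_start(ap, id);
    uint64_t win = ap->size / ap->nr_domains;

    return addr >= start && len <= win && addr - start <= win - len;
}
```

This is why the scheme can still be useful for address-space management (e.g. a pvdma-style cooperating guest) while giving none of the protection of a real multi-domain IOMMU.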

Joerg



Re: [PATCH 3/9] add frontend implementation for the IOMMU API

2008-12-01 Thread FUJITA Tomonori
On Mon, 1 Dec 2008 15:02:09 +0200
Muli Ben-Yehuda <[EMAIL PROTECTED]> wrote:

> On Mon, Dec 01, 2008 at 01:00:26PM +0100, Joerg Roedel wrote:
> 
> > > > > The majority of the names (include/linux/iommu.h, iommu.c,
> > > > > iommu_ops, etc) look too generic? We already have lots of
> > > > > similar things (e.g. arch/{x86,ia64}/asm/iommu.h, several
> > > > > archs' iommu.c, etc). Such names are expected to be used by
> > > > > all the IOMMUs.
> > > > 
> > > > The API is already useful for more than KVM. I also plan to
> > > > extend it to support more types of IOMMUs than VT-d and AMD
> > > > IOMMU in the future. But these changes are more intrusive than
> > > > this patchset and need more discussion. I prefer to do small
> > > > steps into this direction.
> > > 
> > > Can you be more specific? What IOMMU could use this? For example,
> > > how could GART use this? I think that people expect the name 'struct
> > > iommu_ops' to be an abstraction for all the IOMMUs (or the majority
> > > at least). If this works like that, the name is a good choice, I
> > > think.
> > 
> > GART can't use exactly this. But with some extensions we can make it
> > useful for GART and GART-like IOMMUs too. For example we can emulate
> > domains in GART by partitioning the GART aperture space.
> 
> That would only work with a pvdma API, since GART doesn't support
> multiple address spaces, and you don't get the isolation properties of
> a real IOMMU, so... why would you want to do that?

If this only works for IOMMUs that support some kind of domain concept,
then I think a name like iommu_domain_ops is more appropriate.


Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests

2008-12-01 Thread Andre Przywara

Avi Kivity wrote:

Andre Przywara wrote:

The user (or better: management application) specifies the host nodes
the guest should use: -nodes 2,3 would create a two-node guest mapped to
nodes 2 and 3 on the host. These numbers are handed over to libnuma:
VCPUs are pinned to the nodes and the allocated guest memory is bound to
its respective node. Since libnuma seems not to be installed
everywhere, the user has to enable this via configure --enable-numa.
In the BIOS code an ACPI SRAT table was added, which describes the NUMA
topology to the guest. The number of nodes is communicated via the CMOS
RAM (offset 0x3E). If someone thinks of this as a bad idea, tell me.


There exists now a firmware interface in qemu for this kind of 
communications.
Oh, right you are, I missed that (it was well hidden). I was looking at how 
the BIOS detects memory size and CPU numbers and these methods are quite 
cumbersome. Why not convert them to the FW_CFG methods (which the qemu 
side already sets)? To not diverge too much from the original BOCHS BIOS?



Node over-committing is allowed (-nodes 0,0,0,0), omitting the -nodes
parameter reverts to the old behavior.


'-nodes' is too generic a name ('node' could also mean a host).  Suggest 
-numanode.


Need more flexibility: specify the range of memory per node, which cpus 
are in the node, relative weights for the SRAT table:


  -numanode node=1,cpu=2,cpu=3,start=1G,size=1G,hostnode=3


I converted my code to use the new firmware interface. This also makes 
it possible to pass more information between qemu and BIOS (which 
prevented a more flexible command line in the first version).

So I would opt for the following:
- use numanode (or simply numa?) instead of the misleading -nodes
- allow passing memory sizes, VCPU subsets and host CPU pin info
I would prefer Daniel's version:
-numa [,mem:[;...]]
[,cpu:[;...]]
[,pin:[;...]]

That would allow easy things like -numa 2 (for a two-node guest); options 
not given would fall back to defaults (equally split-up resources).
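The "equally split-up resources" default could look roughly like this. It is a sketch under assumptions, not the patch's code: guest_node and default_numa_split() are hypothetical names, memory is counted in MB, and when the division is uneven the first (total % N) nodes each take one extra MB or VCPU.

```c
#include <assert.h>

/* Hypothetical per-guest-node layout produced by "-numa N" when no
 * mem: or cpu: options are given. */
struct guest_node {
    unsigned long mem_mb;
    unsigned first_cpu;
    unsigned nr_cpus;
};

static void default_numa_split(unsigned nr_nodes, unsigned long mem_mb,
                               unsigned nr_vcpus, struct guest_node *out)
{
    unsigned i, cpu = 0;

    assert(nr_nodes > 0);
    for (i = 0; i < nr_nodes; i++) {
        /* Even share, plus one leftover unit for the first nodes. */
        out[i].mem_mb = mem_mb / nr_nodes + (i < mem_mb % nr_nodes);
        out[i].nr_cpus = nr_vcpus / nr_nodes + (i < nr_vcpus % nr_nodes);
        out[i].first_cpu = cpu;   /* VCPUs are assigned contiguously */
        cpu += out[i].nr_cpus;
    }
}
```

With -numa 2, 2048 MB and 3 VCPUs, this gives node 0 1024 MB with VCPUs 0-1 and node 1 1024 MB with VCPU 2.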


The only problem is the default option for the host side, as libnuma 
requires the nodes to be named explicitly. Maybe make the pin: part _not_ 
optional? I would at least want to pin the memory, one could discuss 
about the VCPUs...




Also need a monitor command to change host nodes dynamically:

Implementing a monitor interface is a good idea.

(qemu) numanode 1 0
Does that include page migration? That would be easily possible with 
mbind(MPOL_MF_MOVE), but would take some time and resources (which I 
think is OK if explicitly triggered in the monitor).
Any other useful commands for the monitor? Maybe (temporary) VCPU 
migration without page migration?


Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
to satisfy European Law for business letters:
AMD Saxony Limited Liability Company & Co. KG,
Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register Court Dresden: HRA 4896, General Partner authorized
to represent: AMD Saxony LLC (Wilmington, Delaware, US)
General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy



Re: [PATCH 3/9] add frontend implementation for the IOMMU API

2008-12-01 Thread Joerg Roedel
On Mon, Dec 01, 2008 at 03:02:09PM +0200, Muli Ben-Yehuda wrote:
> On Mon, Dec 01, 2008 at 01:00:26PM +0100, Joerg Roedel wrote:
> 
> > > > > The majority of the names (include/linux/iommu.h, iommu.c,
> > > > > iommu_ops, etc) look too generic? We already have lots of
> > > > > similar things (e.g. arch/{x86,ia64}/asm/iommu.h, several
> > > > > archs' iommu.c, etc). Such names are expected to be used by
> > > > > all the IOMMUs.
> > > > 
> > > > The API is already useful for more than KVM. I also plan to
> > > > extend it to support more types of IOMMUs than VT-d and AMD
> > > > IOMMU in the future. But these changes are more intrusive than
> > > > this patchset and need more discussion. I prefer to do small
> > > > steps into this direction.
> > > 
> > > Can you be more specific? What IOMMU could use this? For example,
> > > how could GART use this? I think that people expect the name 'struct
> > > iommu_ops' to be an abstraction for all the IOMMUs (or the majority
> > > at least). If this works like that, the name is a good choice, I
> > > think.
> > 
> > GART can't use exactly this. But with some extensions we can make it
> > useful for GART and GART-like IOMMUs too. For example we can emulate
> > domains in GART by partitioning the GART aperture space.
> 
> That would only work with a pvdma API, since GART doesn't support
> multiple address spaces, and you don't get the isolation properties of
> a real IOMMU, so... why would you want to do that?

Yes, this cannot be used for non-PV device passthrough. But I think it
can speed up the pvdma case. Besides that, it can be used for UIO and
for devices which perform badly with sg.

Joerg

-- 
   |   AMD Saxony Limited Liability Company & Co. KG
 Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System|  Register Court Dresden: HRA 4896
 Research  |  General Partner authorized to represent:
 Center| AMD Saxony LLC (Wilmington, Delaware, US)
   | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy



Re: [PATCH 1/2] [v2] VT-d: Support multiple device assignment for KVM

2008-12-01 Thread Joerg Roedel
Ok, I got them to apply. I also did the checkpatch cleanups. To speed
things up a bit I would suggest that I rebase my patchset on your
patches and send it out in a single series. Any problems with this
approach?

Joerg

On Mon, Dec 01, 2008 at 09:22:42PM +0800, Han, Weidong wrote:
> Sorry, this patch has style problems. I will update it and also split it into 
> smaller patches for easy reviewing.
> 
> Regards,
> Weidong
> 
> 'Joerg Roedel' wrote:
> > Hmm, I get these errors using git-am:
> > 
> > Applying VT-d: Support multiple device assignment for KVM
> > .dotest/patch:1344: space before tab in indent.
> > clflush_cache_range(addr, size);
> > .dotest/patch:1350: space before tab in indent.
> > clflush_cache_range(addr, size);
> > .dotest/patch:1907: trailing whitespace.
> > 
> > .dotest/patch:1946: trailing whitespace.
> >  * owned by this domain, clear this iommu in iommu_bmp
> > .dotest/patch:2300: trailing whitespace.
> > 
> > error: patch failed: drivers/pci/dmar.c:484
> > error: drivers/pci/dmar.c: patch does not apply
> > error: patch failed: drivers/pci/intel-iommu.c:50
> > error: drivers/pci/intel-iommu.c: patch does not apply
> > error: patch failed: include/linux/dma_remapping.h:111
> > error: include/linux/dma_remapping.h: patch does not apply
> > error: patch failed: include/linux/intel-iommu.h:219
> > error: include/linux/intel-iommu.h: patch does not apply
> > Patch failed at 0001.
> > 
> > Joerg
> > 


[PATCH 3/5] KVM: don't free an unallocated irq source id

2008-12-01 Thread Mark McLoughlin
Set assigned_dev->irq_source_id to -1 so that we can avoid freeing
a source ID which we never allocated.

Signed-off-by: Mark McLoughlin <[EMAIL PROTECTED]>
---
 virt/kvm/kvm_main.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8dab7ce..63fd882 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -210,7 +210,10 @@ static void kvm_free_assigned_device(struct kvm *kvm,
pci_disable_msi(assigned_dev->dev);
 
kvm_unregister_irq_ack_notifier(&assigned_dev->ack_notifier);
-   kvm_free_irq_source_id(kvm, assigned_dev->irq_source_id);
+
+   if (assigned_dev->irq_source_id != -1)
+   kvm_free_irq_source_id(kvm, assigned_dev->irq_source_id);
+   assigned_dev->irq_source_id = -1;
 
if (cancel_work_sync(&assigned_dev->interrupt_work))
/* We had pending work. That means we will have to take
@@ -466,7 +469,7 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
match->host_busnr = assigned_dev->busnr;
match->host_devfn = assigned_dev->devfn;
match->dev = dev;
-
+   match->irq_source_id = -1;
match->kvm = kvm;
 
list_add(&match->list, &kvm->arch.assigned_dev_head);
-- 
1.5.4.3
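The sentinel pattern in this patch, modelled in userspace: -1 means "no source ID allocated yet", so the teardown path is safe whether or not allocation ever happened, and resetting to -1 afterwards makes a second teardown a no-op. fake_assigned_dev and fake_free_irq_source_id() are hypothetical stand-ins for the KVM structure and kvm_free_irq_source_id().

```c
#include <assert.h>

/* Hypothetical stand-in for the assigned-device structure. */
struct fake_assigned_dev {
    int irq_source_id;
};

static int frees;   /* counts real frees, for illustration */

/* Hypothetical stand-in for kvm_free_irq_source_id(). */
static void fake_free_irq_source_id(int id)
{
    assert(id >= 0);    /* freeing an unallocated ID would be a bug */
    frees++;
}

static void dev_init(struct fake_assigned_dev *d)
{
    d->irq_source_id = -1;      /* nothing allocated yet */
}

static void dev_free(struct fake_assigned_dev *d)
{
    if (d->irq_source_id != -1)
        fake_free_irq_source_id(d->irq_source_id);
    d->irq_source_id = -1;      /* a second teardown is now a no-op */
}
```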



[PATCH 2/5] KVM: make kvm_unregister_irq_ack_notifier() safe

2008-12-01 Thread Mark McLoughlin
We never pass a NULL notifier pointer here, but we may well
pass a notifier struct which hasn't previously been
registered.

Guard against this by using hlist_del_init() which will
not do anything if the node hasn't been added to the list
and, when removing the node, will ensure that a subsequent
call to hlist_del_init() will be fine too.

Fixes an oops seen when an assigned device is freed before
an IRQ is assigned to it.

Signed-off-by: Mark McLoughlin <[EMAIL PROTECTED]>
---
 virt/kvm/irq_comm.c |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 973df99..db75045 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -63,9 +63,7 @@ void kvm_register_irq_ack_notifier(struct kvm *kvm,
 
 void kvm_unregister_irq_ack_notifier(struct kvm_irq_ack_notifier *kian)
 {
-   if (!kian)
-   return;
-   hlist_del(&kian->link);
+   hlist_del_init(&kian->link);
 }
 
 /* The caller must hold kvm->lock mutex */
-- 
1.5.4.3
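Why hlist_del_init() helps: plain hlist_del() writes through n->pprev unconditionally, and that pointer is NULL on a node that was never added, assuming the notifier lives in zero-initialised memory (e.g. from kzalloc()). The userspace copy below is simplified from the hlist helpers in include/linux/list.h (the kernel's hlist_del() additionally poisons the pointers). It shows that hlist_del_init() is a no-op on an unhashed node and leaves the node unhashed after removal.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace copy of the kernel's hlist types and helpers. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static void INIT_HLIST_NODE(struct hlist_node *n)
{
    n->next = NULL;
    n->pprev = NULL;
}

static int hlist_unhashed(const struct hlist_node *n)
{
    return !n->pprev;
}

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

static void hlist_del_init(struct hlist_node *n)
{
    if (hlist_unhashed(n))
        return;                 /* never added, or already removed */
    *n->pprev = n->next;        /* unlink from predecessor */
    if (n->next)
        n->next->pprev = n->pprev;
    INIT_HLIST_NODE(n);         /* so a later call is a no-op too */
}
```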



[PATCH 1/5] KVM: remove the IRQ ACK notifier assertions

2008-12-01 Thread Mark McLoughlin
We will obviously never pass a NULL struct kvm_irq_ack_notifier* to
these functions. They are always embedded in the assigned device
structure, so the assertion adds nothing.

The irqchip_in_kernel() assertion is very out of place - clearly
this little abstraction needs to know nothing about the upper
layer details.

Signed-off-by: Mark McLoughlin <[EMAIL PROTECTED]>
---
 virt/kvm/irq_comm.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 9fbbdea..973df99 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -58,9 +58,6 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi)
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
   struct kvm_irq_ack_notifier *kian)
 {
-   /* Must be called with in-kernel IRQ chip, otherwise it's nonsense */
-   ASSERT(irqchip_in_kernel(kvm));
-   ASSERT(kian);
    hlist_add_head(&kian->link, &kvm->arch.irq_ack_notifier_list);
 }
 
-- 
1.5.4.3



[PATCH 4/5] KVM: add KVM_USERSPACE_IRQ_SOURCE_ID assertions

2008-12-01 Thread Mark McLoughlin
Make sure kvm_request_irq_source_id() never returns
KVM_USERSPACE_IRQ_SOURCE_ID.

Likewise, check that kvm_free_irq_source_id() never accepts
KVM_USERSPACE_IRQ_SOURCE_ID.

Signed-off-by: Mark McLoughlin <[EMAIL PROTECTED]>
---
 virt/kvm/irq_comm.c |   14 ++++++++++----
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index db75045..aa5d1e5 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -72,11 +72,15 @@ int kvm_request_irq_source_id(struct kvm *kvm)
    unsigned long *bitmap = &kvm->arch.irq_sources_bitmap;
    int irq_source_id = find_first_zero_bit(bitmap,
                sizeof(kvm->arch.irq_sources_bitmap));
+
    if (irq_source_id >= sizeof(kvm->arch.irq_sources_bitmap)) {
        printk(KERN_WARNING "kvm: exhaust allocatable IRQ sources!\n");
-       irq_source_id = -EFAULT;
-   } else
-       set_bit(irq_source_id, bitmap);
+       return -EFAULT;
+   }
+
+   ASSERT(irq_source_id != KVM_USERSPACE_IRQ_SOURCE_ID);
+   set_bit(irq_source_id, bitmap);
+
    return irq_source_id;
 }
 
@@ -84,7 +88,9 @@ void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id)
 {
    int i;
 
-   if (irq_source_id <= 0 ||
+   ASSERT(irq_source_id != KVM_USERSPACE_IRQ_SOURCE_ID);
+
+   if (irq_source_id < 0 ||
        irq_source_id >= sizeof(kvm->arch.irq_sources_bitmap)) {
        printk(KERN_ERR "kvm: IRQ source ID out of range!\n");
        return;
-- 
1.5.4.3
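The allocator logic in this patch can be modelled in userspace as below. This is a sketch, not the KVM code: struct irq_sources, USERSPACE_IRQ_SOURCE_ID and the helper names are hypothetical, the linear scan stands in for find_first_zero_bit(), and reserving ID 0 at initialization is an assumption about how the reserved ID is kept out of the pool. One detail worth flagging: the patch bounds the search with sizeof(kvm->arch.irq_sources_bitmap), which is a size in bytes (8 on a 64-bit host), so as written only the first 8 bits of the word appear to be usable; the sketch bounds the search by the number of bits instead.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>

/* Number of bits in the bitmap word (not its size in bytes). */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Assumption: mirrors KVM_USERSPACE_IRQ_SOURCE_ID being 0. */
#define USERSPACE_IRQ_SOURCE_ID 0

/* Hypothetical stand-in for the irq_sources_bitmap in kvm->arch. */
struct irq_sources {
    unsigned long bitmap;
};

static void irq_sources_init(struct irq_sources *s)
{
    /* Reserve the userspace ID up front so it is never handed out. */
    s->bitmap = 1UL << USERSPACE_IRQ_SOURCE_ID;
}

static int request_irq_source_id(struct irq_sources *s)
{
    unsigned int id;

    /* Linear stand-in for find_first_zero_bit(), bounded by bits. */
    for (id = 0; id < BITS_PER_LONG; id++)
        if (!(s->bitmap & (1UL << id)))
            break;

    if (id >= BITS_PER_LONG) {
        fprintf(stderr, "exhausted allocatable IRQ sources\n");
        return -EFAULT;
    }

    assert(id != USERSPACE_IRQ_SOURCE_ID);
    s->bitmap |= 1UL << id;
    return (int)id;
}

static void free_irq_source_id(struct irq_sources *s, int id)
{
    assert(id != USERSPACE_IRQ_SOURCE_ID);

    if (id < 0 || (unsigned int)id >= BITS_PER_LONG) {
        fprintf(stderr, "IRQ source ID out of range\n");
        return;
    }
    s->bitmap &= ~(1UL << id);
}
```

Because ID 0 is set before the first request, the assertion in request_irq_source_id() can only fire if that reservation is lost, which is exactly the invariant the patch wants to check.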



[PATCH 5/5] KVM: split out kvm_free_assigned_irq()

2008-12-01 Thread Mark McLoughlin
Split out the logic corresponding to undoing assign_irq() and
clean it up a bit.

Signed-off-by: Mark McLoughlin <[EMAIL PROTECTED]>
---
 virt/kvm/kvm_main.c |   29 ++++++++++++++++++++++-------
 1 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 63fd882..e41d39d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -200,14 +200,11 @@ static void kvm_assigned_dev_ack_irq(struct kvm_irq_ack_notifier *kian)
    enable_irq(dev->host_irq);
 }
 
-static void kvm_free_assigned_device(struct kvm *kvm,
-                                     struct kvm_assigned_dev_kernel
-                                     *assigned_dev)
+static void kvm_free_assigned_irq(struct kvm *kvm,
+                                  struct kvm_assigned_dev_kernel *assigned_dev)
 {
-   if (irqchip_in_kernel(kvm) && assigned_dev->irq_requested_type)
-       free_irq(assigned_dev->host_irq, (void *)assigned_dev);
-   if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_HOST_MSI)
-       pci_disable_msi(assigned_dev->dev);
+   if (!irqchip_in_kernel(kvm))
+       return;
 
    kvm_unregister_irq_ack_notifier(&assigned_dev->ack_notifier);
 
@@ -215,12 +212,30 @@ static void kvm_free_assigned_device(struct kvm *kvm,
kvm_free_irq_source_id(kvm, assigned_dev->irq_source_id);
    assigned_dev->irq_source_id = -1;
 
+   if (!assigned_dev->irq_requested_type)
+       return;
+
    if (cancel_work_sync(&assigned_dev->interrupt_work))
        /* We had pending work. That means we will have to take
         * care of kvm_put_kvm.
         */
        kvm_put_kvm(kvm);
 
+   free_irq(assigned_dev->host_irq, (void *)assigned_dev);
+
+   if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_HOST_MSI)
+       pci_disable_msi(assigned_dev->dev);
+
+   assigned_dev->irq_requested_type = 0;
+}
+
+
+static void kvm_free_assigned_device(struct kvm *kvm,
+                                     struct kvm_assigned_dev_kernel
+                                     *assigned_dev)
+{
+   kvm_free_assigned_irq(kvm, assigned_dev);
+
    pci_reset_function(assigned_dev->dev);
 
    pci_release_regions(assigned_dev->dev);
-- 
1.5.4.3
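The new helper follows a common teardown pattern: return early when there is nothing to undo, release resources in order, then reset the state flag so that a second call is harmless. A minimal userspace sketch of that pattern follows; struct assigned_dev, the HOST_* flags and the counters are hypothetical stand-ins rather than KVM's types, and the comments mark where the real function does its notifier, source-ID and work-cancellation steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values standing in for KVM_ASSIGNED_DEV_HOST_* bits. */
#define HOST_INTX 0x1u
#define HOST_MSI  0x2u

/* Hypothetical stand-in for struct kvm_assigned_dev_kernel. */
struct assigned_dev {
    unsigned int irq_requested_type;  /* 0 == no host IRQ requested */
    int irq_freed;                    /* counts free_irq() stand-in calls */
    int msi_disabled;                 /* counts pci_disable_msi() stand-in calls */
};

static void free_assigned_irq(bool irqchip_in_kernel, struct assigned_dev *dev)
{
    assert(dev != NULL);

    if (!irqchip_in_kernel)
        return;

    /* (ack-notifier and IRQ source ID cleanup would go here) */

    if (!dev->irq_requested_type)
        return;                 /* never requested, or already torn down */

    /* (the real code cancels pending interrupt work before this point) */
    dev->irq_freed++;
    if (dev->irq_requested_type & HOST_MSI)
        dev->msi_disabled++;

    /* Clearing the type is what makes a repeated call a no-op. */
    dev->irq_requested_type = 0;
}
```

Splitting this out also lets kvm_free_assigned_device() call it unconditionally, since the helper itself decides whether any IRQ state exists.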



Re: [PATCH 3/4] KVM: gracefully handle zero in kvm_free_irq_source_id()

2008-12-01 Thread Mark McLoughlin
On Sun, 2008-11-30 at 12:28 +0200, Avi Kivity wrote:
> Mark McLoughlin wrote:
> > Allow kvm_free_irq_source_id() to be called with a zero ID.
> >
> > Zero is reserved for KVM_USERSPACE_IRQ_SOURCE_ID, so we can
> > guarantee that kvm_request_irq_source_id() will never return
> > zero and use zero to indicate "no source ID allocated".
> >
> >   
> 
> Zero is a legal value for irq source ids, overloading it as something 
> else is confusing.

Fair enough; I chose zero because it's naturally initialised to that by
the kzalloc(). But I prefer explicit initialisation anyway, so ...

> Things should continue to work if we #define it to 17.

Okay, let's try with -1 then.

> > +   ASSERT(irq_source_id != 0); /* KVM_USERSPACE_IRQ_SOURCE_ID reserved */
> >   
> 
> Why not replace 0 with the actual symbolic constant?

Because I was giving 0 two meanings :-)

Respin of the patches coming up.

Cheers,
Mark.
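The point of this exchange is that a zero-filled allocation gives every field the value 0, so a legal ID of 0 cannot also mean "nothing allocated"; an explicit sentinel removes the ambiguity. A minimal sketch, assuming a hypothetical struct dev_state and NO_IRQ_SOURCE_ID name, with calloc() playing the role of kzalloc():

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical name for the "no source ID allocated" sentinel. */
#define NO_IRQ_SOURCE_ID (-1)

/* Hypothetical per-device state. */
struct dev_state {
    int irq_source_id;
};

static struct dev_state *dev_state_alloc(void)
{
    /* calloc() zero-fills like kzalloc(), but 0 is a valid source ID,
     * so the sentinel has to be set explicitly. */
    struct dev_state *s = calloc(1, sizeof(*s));

    if (s)
        s->irq_source_id = NO_IRQ_SOURCE_ID;
    return s;
}

static int dev_state_has_source_id(const struct dev_state *s)
{
    return s->irq_source_id != NO_IRQ_SOURCE_ID;
}
```

A negative sentinel also cannot collide with any ID the allocator hands out, since valid IDs are bit indexes.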



error "could not open disk image" and snapshot=on (if off it works)

2008-12-01 Thread paolo pedaletti
Hi,
I have this strange problem:
Ubuntu 8.10, kvm 72, kernel 2.6.27-7-server x86_64 GNU/Linux

vdeq kvm -name proxy_UBUNTU_8.04 \
-net nic,macaddr=00:16:3e:00:a0:00 -net tap,ifname=tap1,script=no,downscript=no \
-net nic,macaddr=00:16:3e:00:a1:01,vlan=1 -net vde,vlan=1,sock=/var/run/vde2/tun1.ctl \
-drive file=./ubuntu-server-8.04_proxy.root,if=scsi,index=0,snapshot=off,cache=on,boot=on \
-drive file=./ubuntu-server-8.04_proxy.home,if=scsi,index=1,snapshot=off,cache=on \
-drive file=./linux.swap,if=scsi,index=2,cache=on,snapshot=on \
-smp 1 -M pc -cpu pentium3 -m 512 -k en-us -localtime

qemu: could not open disk image ./linux.swap

but the file exists:
$ ls -l linux.swap
-rw-rw-r-- 1 paolop virtual 1073741824 2008-12-01 13:21 linux.swap

If I set "snapshot=off" on linux.swap, kvm boots without problems
(the same happened with other file-backed devices).
So the error message seems misleading: it's not a filesystem problem
but an option problem (snapshot=on doesn't work, snapshot=off works).

Any suggestions?
Thank you.

-- 
Paolo Pedaletti



RE: [PATCH 1/2] [v2] VT-d: Support multiple device assignment for KVM

2008-12-01 Thread Han, Weidong
Sorry, this patch has style problems. I will update it and also split it into
smaller patches for easier review.

Regards,
Weidong

'Joerg Roedel' wrote:
> Hmm, I get these errors using git-am:
> 
> Applying VT-d: Support multiple device assignment for KVM
> .dotest/patch:1344: space before tab in indent.
> clflush_cache_range(addr, size);
> .dotest/patch:1350: space before tab in indent.
> clflush_cache_range(addr, size);
> .dotest/patch:1907: trailing whitespace.
> 
> .dotest/patch:1946: trailing whitespace.
>  * owned by this domain, clear this iommu in iommu_bmp
> .dotest/patch:2300: trailing whitespace.
> 
> error: patch failed: drivers/pci/dmar.c:484
> error: drivers/pci/dmar.c: patch does not apply
> error: patch failed: drivers/pci/intel-iommu.c:50
> error: drivers/pci/intel-iommu.c: patch does not apply
> error: patch failed: include/linux/dma_remapping.h:111
> error: include/linux/dma_remapping.h: patch does not apply
> error: patch failed: include/linux/intel-iommu.h:219
> error: include/linux/intel-iommu.h: patch does not apply
> Patch failed at 0001.
> 
> Joerg
> 


Re: [PATCH 3/9] add frontend implementation for the IOMMU API

2008-12-01 Thread Muli Ben-Yehuda
On Mon, Dec 01, 2008 at 01:00:26PM +0100, Joerg Roedel wrote:

> > > > The majority of the names (include/linux/iommu.h, iommu.c,
> > > > iommu_ops, etc) looks too generic? We already have lots of
> > > > similar things (e.g. arch/{x86,ia64}/asm/iommu.h, several
> > > > archs' iommu.c, etc). Such names are expected to be used by
> > > > all the IOMMUs.
> > > 
> > > The API is already useful for more than KVM. I also plan to
> > > extend it to support more types of IOMMUs than VT-d and AMD
> > > IOMMU in the future. But these changes are more intrusive than
> > > this patchset and need more discussion. I prefer to do small
> > > steps into this direction.
> > 
> > Can you be more specific? What IOMMU could use this? For example,
> > how could GART use this? I think that people expect the name 'struct
> > iommu_ops' to be an abstraction for all the IOMMUs (or the majority
> > at least). If this works like that, the name is a good choice, I
> > think.
> 
> GART can't use exactly this. But with some extensions we can make it
> useful for GART and GART-like IOMMUs too. For example we can emulate
> domains in GART by partitioning the GART aperture space.

That would only work with a pvdma API, since GART doesn't support
multiple address spaces, and you don't get the isolation properties of
a real IOMMU, so... why would you want to do that?

Cheers,
Muli
-- 
The First Workshop on I/O Virtualization (WIOV '08)
Dec 2008, San Diego, CA, http://www.usenix.org/wiov08/
   <->
SYSTOR 2009---The Israeli Experimental Systems Conference
http://www.haifa.il.ibm.com/conferences/systor2009/
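Joerg's idea of emulating domains by partitioning the GART aperture can be pictured with the sketch below, which is purely illustrative: struct aperture, domain_page_to_bus() and the 4 KiB page size are assumptions, not part of the proposed IOMMU API. It also makes Muli's objection concrete: the range check lives entirely in software, and since GART has a single translation table shared by all devices, nothing in hardware stops a device from reaching another domain's slice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APERTURE_PAGE_SHIFT 12   /* assumed 4 KiB GART pages */

/* Hypothetical model of one GART aperture shared by n_domains "domains". */
struct aperture {
    uint64_t base;          /* bus address where the aperture starts */
    uint64_t n_pages;       /* total pages in the aperture */
    unsigned int n_domains; /* how many slices it is split into */
};

/* Map a domain-relative page index to a bus address inside that domain's
 * slice. Returns false if the domain or page is out of range. */
static bool domain_page_to_bus(const struct aperture *ap, unsigned int domain,
                               uint64_t page, uint64_t *bus)
{
    uint64_t slice;

    if (ap->n_domains == 0 || domain >= ap->n_domains)
        return false;

    slice = ap->n_pages / ap->n_domains;   /* pages per domain */
    if (page >= slice)
        return false;

    *bus = ap->base + ((domain * slice + page) << APERTURE_PAGE_SHIFT);
    return true;
}
```

Each "domain" only gets 1/n of an aperture that is already small, which is another practical cost of this emulation.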


Re: [SR-IOV driver example 0/3] introduction

2008-12-01 Thread Yu Zhao
On Thu, Nov 27, 2008 at 12:59:33AM +0800, Greg KH wrote:
> On Wed, Nov 26, 2008 at 10:03:03PM +0800, Yu Zhao wrote:
> > SR-IOV drivers of Intel 82576 NIC are available. There are two parts
> > of the drivers: Physical Function driver and Virtual Function driver.
> > The PF driver is based on the IGB driver and is used to control PF to
> > allocate hardware specific resources and interface with the SR-IOV core.
> > The VF driver is a new NIC driver that is same as the traditional PCI
> > device driver. It works in both the host and the guest (Xen and KVM)
> > environment.
> > 
> > These two drivers are testing versions and they are *only* intended to
> > show how to use SR-IOV API.
> 
> That's funny, as some distros are already shipping this driver.  You
> might want to tell them that this is an "example only" driver and not to
> be used "for real"... :(

Maybe they are shipping another version, not this one. This one really is
an experimental patch; it was created only a week ago...


Re: [SR-IOV driver example 2/3] PF driver: integrate with SR-IOV core

2008-12-01 Thread Yu Zhao
On Thu, Nov 27, 2008 at 01:54:27AM +0800, Chris Wright wrote:
> * Greg KH ([EMAIL PROTECTED]) wrote:
> > > +static   int
> > > +igb_virtual(struct pci_dev *pdev, int nr_virtfn)
> > > +{
> > > + unsigned char my_mac_addr[6] = {0x00, 0xDE, 0xAD, 0xBE, 0xEF, 0xFF};
> > > + struct net_device *netdev = pci_get_drvdata(pdev);
> > > + struct igb_adapter *adapter = netdev_priv(netdev);
> > > + int i;
> > > +
> > > + if (nr_virtfn > 7)
> > > + return -EINVAL;
> > 
> > Why the check for 7?  Is that the max virtual functions for this card?
> > Shouldn't that be a define somewhere so it's easier to fix in future
> > versions of this hardware?  :)
> 
> IIRC it's 8 for the card, 1 reserved for PF.  I think both notions
> should be captured w/ commented constants.

You remember correctly! I'll put some comments there as suggested.

Thanks,
Yu
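Following Chris's suggestion, the bare 7 could be expressed through commented constants. A sketch with hypothetical macro names (not taken from the igb driver), with values following the thread: resources for 8 functions on the 82576, one set kept for the PF. The lower-bound check is an addition for completeness.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names; values as described in the thread. */
#define IGB_82576_TOTAL_FUNCS  8   /* function resource sets on the 82576 */
#define IGB_82576_PF_RESERVED  1   /* sets kept for the PF itself */
#define IGB_82576_MAX_VFS      (IGB_82576_TOTAL_FUNCS - IGB_82576_PF_RESERVED)

/* Validate the VF count passed to the SR-IOV callback. */
static int igb_check_nr_virtfn(int nr_virtfn)
{
    if (nr_virtfn < 0 || nr_virtfn > IGB_82576_MAX_VFS)
        return -EINVAL;
    return 0;
}
```

If a future part changes either number, only the constant needs to move, and the derived limit stays correct.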


Re: [SR-IOV driver example 2/3] PF driver: integrate with SR-IOV core

2008-12-01 Thread Yu Zhao
On Thu, Nov 27, 2008 at 12:58:59AM +0800, Greg KH wrote:
> On Wed, Nov 26, 2008 at 10:21:56PM +0800, Yu Zhao wrote:
> > +   my_mac_addr[5] = (unsigned char)i;
> > +   igb_set_vf_mac(netdev, i, my_mac_addr);
> > +   igb_set_vf_vmolr(adapter, i);
> > +   }
> > +   } else
> > +   printk(KERN_INFO "SR-IOV is disabled\n");
> 
> Is that really true?  (oh, use dev_info as well.)  What happens if you
> had called this with "5" and then later with "0", you never destroyed
> those existing virtual functions, yet the code does:
> 
> > +   adapter->vfs_allocated_count = nr_virtfn;
> 
> Which makes the driver think they are not present.  What happens when
> the driver later goes to shut down?  Are those resources freed up
> properly?

For now the Tx/Rx queue allocation is hard-coded, so this doesn't
matter. Eventually it will become dynamic allocation: when the number
of VFs changes, the corresponding resources will need to be freed.

I'll put more comments here.

Thanks,
Yu
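Greg's concern is that lowering the VF count only updates vfs_allocated_count without releasing anything. Once queue allocation becomes dynamic, the callback would need to free the resources of VFs that go away. A minimal sketch of that bookkeeping, with a hypothetical struct pf_adapter, MAX_VFS limit and helper names; printf() stands in for dev_info(), which Greg suggests using in the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define MAX_VFS 7   /* hypothetical limit for this sketch */

/* Hypothetical stand-in for the relevant part of struct igb_adapter. */
struct pf_adapter {
    int vfs_allocated_count;
    int vf_has_queues[MAX_VFS];   /* 1 while VF i owns Tx/Rx queues */
};

static int alloc_vf_resources(struct pf_adapter *a, int vf)
{
    a->vf_has_queues[vf] = 1;     /* real code would set up queues here */
    return 0;
}

static void free_vf_resources(struct pf_adapter *a, int vf)
{
    a->vf_has_queues[vf] = 0;     /* real code would tear queues down */
}

/* Change the number of VFs, releasing resources of VFs that go away. */
static int pf_set_num_vfs(struct pf_adapter *a, int nr_virtfn)
{
    int i, err;

    if (nr_virtfn < 0 || nr_virtfn > MAX_VFS)
        return -EINVAL;

    /* Shrinking: free from the highest VF index down. */
    for (i = a->vfs_allocated_count - 1; i >= nr_virtfn; i--)
        free_vf_resources(a, i);

    /* Growing: allocate new VFs, rolling back this call's work on error. */
    for (i = a->vfs_allocated_count; i < nr_virtfn; i++) {
        err = alloc_vf_resources(a, i);
        if (err) {
            while (--i >= a->vfs_allocated_count)
                free_vf_resources(a, i);
            return err;
        }
    }

    /* Update the count only once resources match it. */
    a->vfs_allocated_count = nr_virtfn;
    if (nr_virtfn == 0)
        printf("SR-IOV is disabled\n");
    return 0;
}
```

Updating the count last means a shutdown path that walks vfs_allocated_count always sees exactly the VFs that still hold resources, which addresses the 5-then-0 scenario Greg describes.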
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [SR-IOV driver example 0/3] introduction

2008-12-01 Thread Yu Zhao
On Thu, Nov 27, 2008 at 04:14:48AM +0800, Jeff Garzik wrote:
> Yu Zhao wrote:
> > SR-IOV drivers of Intel 82576 NIC are available. There are two parts
> > of the drivers: Physical Function driver and Virtual Function driver.
> > The PF driver is based on the IGB driver and is used to control PF to
> > allocate hardware specific resources and interface with the SR-IOV core.
> > The VF driver is a new NIC driver that is same as the traditional PCI
> > device driver. It works in both the host and the guest (Xen and KVM)
> > environment.
> > 
> > These two drivers are testing versions and they are *only* intended to
> > show how to use SR-IOV API.
> > 
> > Intel 82576 NIC specification can be found at:
> > http://download.intel.com/design/network/datashts/82576_Datasheet_v2p1.pdf
> > 
> > [SR-IOV driver example 1/3] PF driver: allocate hardware specific resource
> > [SR-IOV driver example 2/3] PF driver: integrate with SR-IOV core
> > [SR-IOV driver example 3/3] VF driver tar ball
> 
> Please copy [EMAIL PROTECTED] on all network-related patches.  This 
> is where the network developers live, and all patches on this list are 
> automatically archived for review and handling at 
> http://patchwork.ozlabs.org/project/netdev/list/

Will do.

Thanks,
Yu


Re: [PATCH 3/9] add frontend implementation for the IOMMU API

2008-12-01 Thread Joerg Roedel
On Mon, Dec 01, 2008 at 05:38:11PM +0900, FUJITA Tomonori wrote:
> On Fri, 28 Nov 2008 12:31:29 +0100
> Joerg Roedel <[EMAIL PROTECTED]> wrote:
> 
> > On Fri, Nov 28, 2008 at 06:40:41PM +0900, FUJITA Tomonori wrote:
> > > On Thu, 27 Nov 2008 16:40:48 +0100
> > > Joerg Roedel <[EMAIL PROTECTED]> wrote:
> > > 
> > > > Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]>
> > > > ---
> > > >  drivers/base/iommu.c |   94 ++
> > > >  1 files changed, 94 insertions(+), 0 deletions(-)
> > > >  create mode 100644 drivers/base/iommu.c
> > > > 
> > > > diff --git a/drivers/base/iommu.c b/drivers/base/iommu.c
> > > > new file mode 100644
> > > > index 000..7250b9c
> > > > --- /dev/null
> > > > +++ b/drivers/base/iommu.c
> > > 
> > > Hmm, why is this at drivers/base/? Anyone except for kvm could use
> > > this? If so, under virt/ is more appropriate?
> > 
> > I don't see a reason why this should be KVM specific. KVM is the only
> > user for now. But it can be used for e.g. UIO too. Or in drivers to
> > speed up devices which have bad performance when they do scatter-gather
> > IO.
> 
> If there are users other than KVM that could use this, it should be
> fine, I guess.
> 
> Can you add such information (e.g. who could use this) to the patch
> description? It should be in the git log if the patch is merged.

Ok, I will add it.

> > > The majority of the names (include/linux/iommu.h, iommu.c, iommu_ops,
> > > etc) looks too generic? We already have lots of similar things
> > > (e.g. arch/{x86,ia64}/asm/iommu.h, several archs' iommu.c, etc). Such
> > > names are expected to be used by all the IOMMUs.
> > 
> > The API is already useful for more than KVM. I also plan to extend it to
> > support more types of IOMMUs than VT-d and AMD IOMMU in the future. But
> > these changes are more intrusive than this patchset and need more
> > discussion. I prefer to do small steps into this direction.
> 
> Can you be more specific? What IOMMU could use this? For example, how
> could GART use this? I think that people expect the name 'struct
> iommu_ops' to be an abstraction for all the IOMMUs (or the majority at
> least). If this works like that, the name is a good choice, I think.

GART can't use exactly this. But with some extensions we can make it
useful for GART and GART-like IOMMUs too. For example we can emulate
domains in GART by partitioning the GART aperture space.

Joerg

-- 
   |   AMD Saxony Limited Liability Company & Co. KG
 Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System|  Register Court Dresden: HRA 4896
 Research  |  General Partner authorized to represent:
 Center| AMD Saxony LLC (Wilmington, Delaware, US)
   | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy



Re: [PATCH 1/2] [v2] VT-d: Support multiple device assignment for KVM

2008-12-01 Thread 'Joerg Roedel'
Hmm, I get these errors using git-am:

Applying VT-d: Support multiple device assignment for KVM
.dotest/patch:1344: space before tab in indent.
clflush_cache_range(addr, size);
.dotest/patch:1350: space before tab in indent.
clflush_cache_range(addr, size);
.dotest/patch:1907: trailing whitespace.

.dotest/patch:1946: trailing whitespace.
 * owned by this domain, clear this iommu in iommu_bmp 
.dotest/patch:2300: trailing whitespace.

error: patch failed: drivers/pci/dmar.c:484
error: drivers/pci/dmar.c: patch does not apply
error: patch failed: drivers/pci/intel-iommu.c:50
error: drivers/pci/intel-iommu.c: patch does not apply
error: patch failed: include/linux/dma_remapping.h:111
error: include/linux/dma_remapping.h: patch does not apply
error: patch failed: include/linux/intel-iommu.h:219
error: include/linux/intel-iommu.h: patch does not apply
Patch failed at 0001.

Joerg

On Mon, Dec 01, 2008 at 02:17:38PM +0800, Han, Weidong wrote:
> It was developed against commit 0f7d3ee6 on avi/master, but it can still be
> applied to the latest avi/master (commit 90755652).
> 
> Regards,
> Weidong
> 
> Joerg Roedel wrote:
> > Hmm, I tried to apply this patch against avi/master and linus/master
> > but get merge conflicts. Where do these patches apply cleanly?
> >
> > Joerg
> >
> > On Thu, Nov 27, 2008 at 09:49:04PM +0800, Han, Weidong wrote:
> >> In order to support multiple device assignment for KVM, this patch
> >> makes the following main changes:
> >>    - extend dmar_domain to own multiple devices from different
> >>      iommus, use a bitmap of iommus to replace iommu pointer in
> >>      dmar_domain.
> >>    - implement independent low level functions for kvm, then won't
> >>      impact native VT-d.
> >>    - "SAGAW" capability may be different across iommus, that's to
> >>      say the VT-d page table levels may be different among iommus. This
> >>      patch uses a default agaw, and skips top levels of page tables for
> >>      iommus which have smaller agaw than default.
> >>    - rename the APIs for kvm VT-d, make it more readable.
> >>
> >>
> >> Signed-off-by: Weidong Han <[EMAIL PROTECTED]>
> >> ---
> >>  drivers/pci/dmar.c            |   15 +
> >>  drivers/pci/intel-iommu.c     |  698 ++--
> >>  include/linux/dma_remapping.h |   21 +-
> >>  include/linux/intel-iommu.h   |   21 +-
> >>  4 files changed, 637 insertions(+), 118 deletions(-)
> >>
> >> diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
> >> index 691b3ad..d6bdced 100644
> >> --- a/drivers/pci/dmar.c
> >> +++ b/drivers/pci/dmar.c
> >> @@ -484,6 +484,7 @@ void __init detect_intel_iommu(void)
> >> dmar_tbl = NULL;
> >>  }
> >>
> >> +extern int width_to_agaw(int width);
> >>
> >>  int alloc_iommu(struct dmar_drhd_unit *drhd)
> >>  {
> >> @@ -491,6 +492,8 @@ int alloc_iommu(struct dmar_drhd_unit *drhd)
> >> int map_size;
> >> u32 ver;
> >> static int iommu_allocated = 0;
> >> +   unsigned long sagaw;
> >> +   int agaw;
> >>
> >> iommu = kzalloc(sizeof(*iommu), GFP_KERNEL);
> >> if (!iommu)
> >> @@ -506,6 +509,18 @@ int alloc_iommu(struct dmar_drhd_unit *drhd)
> >> iommu->cap = dmar_readq(iommu->reg + DMAR_CAP_REG);
> >> iommu->ecap = dmar_readq(iommu->reg + DMAR_ECAP_REG);
> >>
> >> +   /* set agaw, "SAGAW" may be different across iommus */
> >> +   sagaw = cap_sagaw(iommu->cap);
> >> +   for (agaw = width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH);
> >> +        agaw >= 0; agaw--)
> >> +       if (test_bit(agaw, &sagaw))
> >> +           break;
> >> +   if (agaw < 0) {
> >> +       printk(KERN_ERR "IOMMU: unsupported sagaw %lx\n", sagaw);
> >> +       goto error;
> >> +   }
> >> +   iommu->agaw = agaw;
> >> +
> >> /* the registers might be more than one page */
> >> map_size = max_t(int, ecap_max_iotlb_offset(iommu->ecap),
> >> cap_max_fault_reg_offset(iommu->cap));
> >> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> >> index 5c8baa4..55b96c4 100644
> >> --- a/drivers/pci/intel-iommu.c
> >> +++ b/drivers/pci/intel-iommu.c
> >> @@ -50,8 +50,6 @@
> >>  #define IOAPIC_RANGE_END   (0xfeefffff)
> >>  #define IOVA_START_ADDR    (0x1000)
> >>
> >> -#define DEFAULT_DOMAIN_ADDRESS_WIDTH 48
> >> -
> >>  #define DOMAIN_MAX_ADDR(gaw) ((((u64)1) << gaw) - 1)
> >>
> >>
> >> @@ -64,6 +62,7 @@ struct deferred_flush_tables {
> >> int next;
> >> struct iova *iova[HIGH_WATER_MARK];
> >> struct dmar_domain *domain[HIGH_WATER_MARK];
> >> +   struct intel_iommu *iommu;
> >>  };
> >>
> >>  static struct deferred_flush_tables *deferred_flush;
> >> @@ -184,6 +183,69 @@ void free_iova_mem(struct iova *iova)
> >> kmem_cache_free(iommu_iova_cache, iova);
> >>  }
> >>
> >> +/* in native case, each domain is related to only one iommu */
> >> +static struct intel_iommu *domain_get_only_iommu(struct dmar_dom
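The SAGAW change in the quoted patch picks, per IOMMU, the largest supported adjusted guest address width (AGAW) that does not exceed the default domain width. A self-contained sketch of that search follows. width_to_agaw() assumes the intel-iommu.c convention of 9 address bits per page-table level starting from 30 bits (AGAW 0, 1, 2, 3 for 30, 39, 48, 57 bits), and pick_agaw() is a hypothetical helper, not a function from the patch.

```c
#include <assert.h>

#define LEVEL_STRIDE 9                    /* assumed: 9 address bits per level */
#define DEFAULT_DOMAIN_ADDRESS_WIDTH 48

/* Assumed convention: AGAW 0, 1, 2, 3 <-> 30, 39, 48, 57-bit widths. */
static int width_to_agaw(int width)
{
    return (width - 30) / LEVEL_STRIDE;
}

/* Largest AGAW advertised in the SAGAW bitmask (bit n set => AGAW n
 * supported) that does not exceed the default width; -1 if none. */
static int pick_agaw(unsigned long sagaw)
{
    int agaw;

    for (agaw = width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH); agaw >= 0; agaw--)
        if (sagaw & (1UL << agaw))
            break;
    return agaw;   /* the loop leaves -1 when nothing matched */
}
```

For example, a unit whose SAGAW only advertises 3-level (39-bit) tables gets AGAW 1, so its page tables use one level fewer than the 48-bit default; this is the "skip top levels" case described in the patch.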

Re: [PATCH v2]: check for fops->owner in anon_inode_getfd

2008-12-01 Thread Christian Borntraeger
On Thursday, 27 November 2008, Davide Libenzi wrote:
> > ===
> > --- kvm.orig/fs/anon_inodes.c
> > +++ kvm/fs/anon_inodes.c
> > @@ -79,9 +79,12 @@ int anon_inode_getfd(const char *name, c
> >     if (IS_ERR(anon_inode_inode))
> >         return -ENODEV;
> >  
> > +   if (fops->owner && !try_module_get(fops->owner))
> > +       return -ENOENT;
> > +
> >     error = get_unused_fd_flags(flags);
> >     if (error < 0)
> > -       return error;
> > +       goto err_module;
> >     fd = error;
> >  
> >     /*
> > @@ -128,6 +131,8 @@ err_dput:
> >     dput(dentry);
> >  err_put_unused_fd:
> >     put_unused_fd(fd);
> > +err_module:
> > +   module_put(fops->owner);
> >     return error;
> >  }
> >  EXPORT_SYMBOL_GPL(anon_inode_getfd);
> 
> Looks OK to me.


Ok. Thanks. I will push this to Avi. 
Can I add a
Reviewed-by: Davide Libenzi <[EMAIL PROTECTED]>
to the patch?

Christian
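The patch applies the usual "acquire first, unwind with goto" pattern to the module reference. A userspace sketch under stated assumptions: struct owner, owner_try_get(), owner_put() and get_fd() are hypothetical stand-ins for struct module, try_module_get(), module_put() and get_unused_fd_flags(). Like module_put(), which returns early on a NULL module, owner_put() tolerates NULL, which is why the error label can drop the reference unconditionally even when fops->owner is NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct module. */
struct owner {
    int refcnt;
    int unloading;   /* while set, new references are refused */
};

static int owner_try_get(struct owner *o)
{
    if (o->unloading)
        return 0;
    o->refcnt++;
    return 1;
}

/* Like module_put(): a NULL owner is silently ignored. */
static void owner_put(struct owner *o)
{
    if (o)
        o->refcnt--;
}

/* Hypothetical resource step that can fail (stands in for fd allocation). */
static int get_fd(int should_fail)
{
    return should_fail ? -EMFILE : 3;
}

static int open_anon(struct owner *o, int fail_fd)
{
    int error;

    /* Take the reference before anything that needs unwinding. */
    if (o && !owner_try_get(o))
        return -ENOENT;

    error = get_fd(fail_fd);
    if (error < 0)
        goto err_owner;

    return error;   /* success: the reference stays with the new fd */

err_owner:
    owner_put(o);
    return error;
}
```

On success the reference is deliberately kept; presumably it is released later, when the file using these fops is finally closed.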


Re: [PATCH 3/9] add frontend implementation for the IOMMU API

2008-12-01 Thread FUJITA Tomonori
On Fri, 28 Nov 2008 12:31:29 +0100
Joerg Roedel <[EMAIL PROTECTED]> wrote:

> On Fri, Nov 28, 2008 at 06:40:41PM +0900, FUJITA Tomonori wrote:
> > On Thu, 27 Nov 2008 16:40:48 +0100
> > Joerg Roedel <[EMAIL PROTECTED]> wrote:
> > 
> > > Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]>
> > > ---
> > >  drivers/base/iommu.c |   94 ++
> > >  1 files changed, 94 insertions(+), 0 deletions(-)
> > >  create mode 100644 drivers/base/iommu.c
> > > 
> > > diff --git a/drivers/base/iommu.c b/drivers/base/iommu.c
> > > new file mode 100644
> > > index 000..7250b9c
> > > --- /dev/null
> > > +++ b/drivers/base/iommu.c
> > 
> > Hmm, why is this at drivers/base/? Anyone except for kvm could use
> > this? If so, under virt/ is more appropriate?
> 
> I don't see a reason why this should be KVM specific. KVM is the only
> user for now. But it can be used for e.g. UIO too. Or in drivers to
> speed up devices which have bad performance when they do scatter-gather
> IO.

If there are users other than KVM that could use this, it should be
fine, I guess.

Can you add such information (e.g. who could use this) to the patch
description? It should be in the git log if the patch is merged.


> > The majority of the names (include/linux/iommu.h, iommu.c, iommu_ops,
> > etc) looks too generic? We already have lots of similar things
> > (e.g. arch/{x86,ia64}/asm/iommu.h, several archs' iommu.c, etc). Such
> > names are expected to be used by all the IOMMUs.
> 
> The API is already useful for more than KVM. I also plan to extend it to
> support more types of IOMMUs than VT-d and AMD IOMMU in the future. But
> these changes are more intrusive than this patchset and need more
> discussion. I prefer to do small steps into this direction.

Can you be more specific? What IOMMU could use this? For example, how
could GART use this? I think that people expect the name 'struct
iommu_ops' to be an abstraction for all the IOMMUs (or the majority at
least). If this works like that, the name is a good choice, I think.