[SR-IOV driver example 2/3 resend] PF driver: integrate with SR-IOV core
This patch integrates the IGB driver with the SR-IOV core and shows how
the SR-IOV API is used to support the capability. Integrating a PF driver
with the SR-IOV core takes little effort: the core handles everything
defined by the SR-IOV standard, and the PF driver only has to allocate and
release device-specific resources once it receives the necessary
information (i.e. the number of Virtual Functions) through the callback
function.

From: Intel Corporation, LAN Access Division <[EMAIL PROTECTED]>
Signed-off-by: Yu Zhao <[EMAIL PROTECTED]>
---
 drivers/net/igb/igb_main.c |   46
 1 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index f0361ef..78bda11 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -138,6 +138,7 @@ void igb_set_mc_list_pools(struct igb_adapter *, struct e1000_hw *, int, u16);
 static int igb_vmm_control(struct igb_adapter *, bool);
 static int igb_set_vf_mac(struct net_device *, int, u8*);
 static void igb_mbox_handler(struct igb_adapter *);
+static int igb_virtual(struct pci_dev *, int);
 
 static int igb_suspend(struct pci_dev *, pm_message_t);
 #ifdef CONFIG_PM
@@ -182,6 +183,7 @@ static struct pci_driver igb_driver = {
 #endif
 	.shutdown = igb_shutdown,
 	.err_handler = &igb_err_handler,
+	.virtual = igb_virtual
 };
 
 static int global_quad_port_a; /* global quad port a indication */
@@ -5066,4 +5068,48 @@ void igb_set_mc_list_pools(struct igb_adapter *adapter,
 	wr32(E1000_VMOLR(pool), reg_data);
 }
 
+static int
+igb_virtual(struct pci_dev *pdev, int nr_virtfn)
+{
+	int i;
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct igb_adapter *adapter = netdev_priv(netdev);
+	/* the VFs' MAC addresses are hard-coded */
+	unsigned char my_mac_addr[6] = {0x00, 0xDE, 0xAD, 0xBE, 0xEF, 0xFF};
+
+	/*
+	 * the 82576 NIC supports 1-PF NIC + 7-VF NICs mode and 8-VF NICs
+	 * mode. In the 8-VF NICs mode, the PF can't tx/rx packets -- it
+	 * only behaves as 'VF supervisor'. For now we use the 1-PF NIC +
+	 * 7-VF NICs mode to preserve PF's tx/rx capability for the debug
+	 * purpose.
+	 */
+	if (nr_virtfn > (MAX_NUM_VFS - 1))
+		return -EINVAL;
+
+	if (nr_virtfn) {
+		dev_info(&pdev->dev, "SR-IOV is enabled\n");
+		/*
+		 * Currently VFs resources are pre-allocated, so just set
+		 * the MAC addresses of each VF here.
+		 */
+		for (i = 0; i < nr_virtfn; i++) {
+			my_mac_addr[5] = (unsigned char)i;
+			igb_set_vf_mac(netdev, i, my_mac_addr);
+			igb_set_vf_vmolr(adapter, i);
+		}
+	} else {
+		/*
+		 * Since we statically allocate tx/rx queues for the PF
+		 * and the VFs, so we don't need to free any VF related
+		 * resources here.
+		 */
+		dev_info(&pdev->dev, "SR-IOV is disabled\n");
+	}
+
+	adapter->vfs_allocated_count = nr_virtfn;
+
+	return 0;
+}
+
 /* igb_main.c */
-- 
1.5.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
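The callback in the 2/3 patch rejects requests for more than
MAX_NUM_VFS - 1 VFs and then gives each VF a MAC address by overwriting
the last byte of a hard-coded base address with the VF index. A minimal
user-space sketch of just that logic follows. The names assign_vf_macs
and VF_MAC_LEN are hypothetical and not part of the IGB driver, and the
check for a negative count is an addition the patch itself does not make.

```c
#include <errno.h>
#include <string.h>

#define MAX_NUM_VFS 8   /* same value the 1/3 patch defines */
#define VF_MAC_LEN  6   /* hypothetical name for the MAC address length */

/*
 * Hypothetical helper: fill macs[0..nr_virtfn-1] from a fixed base
 * address, using the VF index as the last byte.
 */
static int assign_vf_macs(int nr_virtfn, unsigned char macs[][VF_MAC_LEN])
{
    static const unsigned char base[VF_MAC_LEN] = {
        0x00, 0xDE, 0xAD, 0xBE, 0xEF, 0xFF
    };
    int i;

    /* 1-PF + 7-VF mode: at most MAX_NUM_VFS - 1 VFs (negative check added) */
    if (nr_virtfn < 0 || nr_virtfn > MAX_NUM_VFS - 1)
        return -EINVAL;

    for (i = 0; i < nr_virtfn; i++) {
        memcpy(macs[i], base, VF_MAC_LEN);
        macs[i][VF_MAC_LEN - 1] = (unsigned char)i;
    }
    return 0;
}
```

With this sketch, assign_vf_macs(3, macs) fills three entries whose last
byte is 0, 1 and 2, while a request for 8 VFs returns -EINVAL, mirroring
the 1-PF + 7-VF limit described in the patch comment.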
[SR-IOV driver example 1/3 resend] PF driver: hardware specific operations
This patch makes the IGB driver allocate hardware resource (rx/tx queues) for Virtual Functions. All operations in this patch are hardware specific. From: Intel Corporation, LAN Access Division <[EMAIL PROTECTED]> Signed-off-by: Yu Zhao <[EMAIL PROTECTED]> --- drivers/net/igb/Makefile|2 +- drivers/net/igb/e1000_82575.c |1 + drivers/net/igb/e1000_82575.h | 61 + drivers/net/igb/e1000_defines.h |7 + drivers/net/igb/e1000_hw.h |2 + drivers/net/igb/e1000_regs.h| 13 + drivers/net/igb/igb.h |8 + drivers/net/igb/igb_main.c | 567 +- drivers/pci/iov.c |6 +- 9 files changed, 649 insertions(+), 18 deletions(-) diff --git a/drivers/net/igb/Makefile b/drivers/net/igb/Makefile index 1927b3f..ab3944c 100644 --- a/drivers/net/igb/Makefile +++ b/drivers/net/igb/Makefile @@ -33,5 +33,5 @@ obj-$(CONFIG_IGB) += igb.o igb-objs := igb_main.o igb_ethtool.o e1000_82575.o \ - e1000_mac.o e1000_nvm.o e1000_phy.o + e1000_mac.o e1000_nvm.o e1000_phy.o e1000_vf.o diff --git a/drivers/net/igb/e1000_82575.c b/drivers/net/igb/e1000_82575.c index f5e2e72..bb823ac 100644 --- a/drivers/net/igb/e1000_82575.c +++ b/drivers/net/igb/e1000_82575.c @@ -87,6 +87,7 @@ static s32 igb_get_invariants_82575(struct e1000_hw *hw) case E1000_DEV_ID_82576: case E1000_DEV_ID_82576_FIBER: case E1000_DEV_ID_82576_SERDES: + case E1000_DEV_ID_82576_QUAD_COPPER: mac->type = e1000_82576; break; default: diff --git a/drivers/net/igb/e1000_82575.h b/drivers/net/igb/e1000_82575.h index c1928b5..8c488ab 100644 --- a/drivers/net/igb/e1000_82575.h +++ b/drivers/net/igb/e1000_82575.h @@ -170,4 +170,65 @@ struct e1000_adv_tx_context_desc { #define E1000_DCA_TXCTRL_CPUID_SHIFT 24 /* Tx CPUID now in the last byte */ #define E1000_DCA_RXCTRL_CPUID_SHIFT 24 /* Rx CPUID now in the last byte */ +#define MAX_NUM_VFS 8 + +#define E1000_DTXSWC_VMDQ_LOOPBACK_EN (1 << 31) /* global VF LB enable */ + +/* Easy defines for setting default pool, would normally be left a zero */ +#define E1000_VT_CTL_DEFAULT_POOL_SHIFT 7 +#define 
E1000_VT_CTL_DEFAULT_POOL_MASK (0x7 << E1000_VT_CTL_DEFAULT_POOL_SHIFT) + +/* Other useful VMD_CTL register defines */ +#define E1000_VT_CTL_DISABLE_DEF_POOL (1 << 29) +#define E1000_VT_CTL_VM_REPL_EN (1 << 30) + +/* Per VM Offload register setup */ +#define E1000_VMOLR_LPE0x0001 /* Accept Long packet */ +#define E1000_VMOLR_AUPE 0x0100 /* Accept untagged packets */ +#define E1000_VMOLR_BAM0x0800 /* Accept Broadcast packets */ +#define E1000_VMOLR_MPME 0x1000 /* Multicast promiscuous mode */ +#define E1000_VMOLR_STRVLAN0x4000 /* Vlan stripping enable */ + +#define E1000_P2VMAILBOX_STS 0x0001 /* Initiate message send to VF */ +#define E1000_P2VMAILBOX_ACK 0x0002 /* Ack message recv'd from VF */ +#define E1000_P2VMAILBOX_VFU 0x0004 /* VF owns the mailbox buffer */ +#define E1000_P2VMAILBOX_PFU 0x0008 /* PF owns the mailbox buffer */ + +#define E1000_VLVF_ARRAY_SIZE 32 +#define E1000_VLVF_VLANID_MASK0x0FFF +#define E1000_VLVF_POOLSEL_SHIFT 12 +#define E1000_VLVF_POOLSEL_MASK (0xFF << E1000_VLVF_POOLSEL_SHIFT) +#define E1000_VLVF_VLANID_ENABLE 0x8000 + +#define E1000_VFMAILBOX_SIZE 16 /* 16 32 bit words - 64 bytes */ + +/* If it's a E1000_VF_* msg then it originates in the VF and is sent to the + * PF. The reverse is true if it is E1000_PF_*. 
+ * Message ACK's are the value or'd with 0xF000 + */ +#define E1000_VT_MSGTYPE_ACK 0xF000 /* Messages below or'd with + * this are the ACK */ +#define E1000_VT_MSGTYPE_NACK 0xFF00 /* Messages below or'd with + * this are the NACK */ +#define E1000_VT_MSGINFO_SHIFT16 +/* bits 23:16 are used for exra info for certain messages */ +#define E1000_VT_MSGINFO_MASK (0xFF << E1000_VT_MSGINFO_SHIFT) + +#define E1000_VF_MSGTYPE_REQ_MAC 1 /* VF needs to know its MAC */ +#define E1000_VF_MSGTYPE_VFLR 2 /* VF notifies VFLR to PF */ +#define E1000_VF_SET_MULTICAST3 /* VF requests PF to set MC addr */ +#define E1000_VF_SET_VLAN 4 /* VF requests PF to set VLAN */ +#define E1000_VF_SET_LPE 5 /* VF requests PF to set VMOLR.LPE */ + +s32 e1000_send_mail_to_vf(struct e1000_hw *hw, u32 *msg, + u32 vf_number, s16 size); +s32 e1000_receive_mail_from_vf(struct e1000_hw *hw, u32 *msg, +u32 vf_number, s16 size); +void e1000_vmdq_loopback_enable_vf(struct e1000_hw *hw); +void e1000_vmdq_loopback_disable_vf(struct e1000_hw *hw); +void e1000_vmdq_replication_enable_vf(struct e1000_hw *hw, u32 enables); +void e1000_vmdq_replication_disable_vf(struct e1000_hw *hw); +bool e1000_check_for_pf_ack_vf(s
RE: [PATCH] Kvm: Qemu: save nvram
Hi: Please drop the previous one. From 2fd0c2746a2d07813ad16700ee31c7f6ae78c40a Mon Sep 17 00:00:00 2001 From: Yang Zhang <[EMAIL PROTECTED]> Date: Tue, 2 Dec 2008 13:05:55 +0800 Subject: [PATCH] KVM: Qemu: save nvram support to save nvram to the file Signed-off-by: Yang Zhang <[EMAIL PROTECTED]> --- qemu/hw/ipf.c | 19 - qemu/target-ia64/firmware.c | 94 -- qemu/target-ia64/firmware.h | 22 +- 3 files changed, 126 insertions(+), 9 deletions(-) diff --git a/qemu/hw/ipf.c b/qemu/hw/ipf.c index 337c854..2300ba9 100644 --- a/qemu/hw/ipf.c +++ b/qemu/hw/ipf.c @@ -51,6 +51,7 @@ static fdctrl_t *floppy_controller; static RTCState *rtc_state; static PCIDevice *i440fx_state; +uint8_t *g_fw_start; static uint32_t ipf_to_legacy_io(target_phys_addr_t addr) { @@ -454,9 +455,13 @@ static void ipf_init1(ram_addr_t ram_size, int vga_ram_size, unsigned long image_size; char *image = NULL; uint8_t *fw_image_start; +unsigned long nvram_addr = 0; +unsigned long nvram_fd = 0; +unsigned long i = 0; ram_addr_t fw_offset = qemu_ram_alloc(GFW_SIZE); uint8_t *fw_start = phys_ram_base + fw_offset; +g_fw_start = fw_start; snprintf(buf, sizeof(buf), "%s/%s", bios_dir, FW_FILENAME); image = read_image(buf, &image_size ); if (NULL == image || !image_size) { @@ -472,7 +477,19 @@ static void ipf_init1(ram_addr_t ram_size, int vga_ram_size, free(image); flush_icache_range((unsigned long)fw_image_start, (unsigned long)fw_image_start + image_size); -kvm_ia64_build_hob(ram_size + above_4g_mem_size, smp_cpus, fw_start); +if (qemu_name) { +nvram_addr = NVRAM_START; +nvram_fd = kvm_ia64_nvram_init(); +if (nvram_fd != -1) { +kvm_ia64_copy_from_nvram_to_GFW(nvram_fd, g_fw_start); +close(nvram_fd); +} +i = atexit(kvm_ia64_copy_from_GFW_to_nvram); +if (i != 0) +fprintf(stderr, "cannot set exit function\n"); +} +kvm_ia64_build_hob(ram_size + above_4g_mem_size,smp_cpus,fw_start, + nvram_addr); } /*Register legacy io address space, size:64M*/ diff --git a/qemu/target-ia64/firmware.c b/qemu/target-ia64/firmware.c 
index bac2721..6729cb5 100644 --- a/qemu/target-ia64/firmware.c +++ b/qemu/target-ia64/firmware.c @@ -31,6 +31,9 @@ #include "firmware.h" +#include "qemu-common.h" +#include "sysemu.h" + typedef struct { unsigned long signature; unsigned int type; @@ -85,14 +88,16 @@ static int hob_init(void *buffer ,unsigned long buf_size); static int add_pal_hob(void* hob_buf); static int add_mem_hob(void* hob_buf, unsigned long dom_mem_size); static int add_vcpus_hob(void* hob_buf, unsigned long nr_vcpu); +static int add_nvram_hob(void *hob_buf, unsigned long nvram_addr); static int build_hob(void* hob_buf, unsigned long hob_buf_size, - unsigned long dom_mem_size, unsigned long vcpus); + unsigned long dom_mem_size, unsigned long vcpus, + unsigned long nvram_addr); static int load_hob(void *hob_buf, unsigned long dom_mem_size, void* hob_start); int -kvm_ia64_build_hob(unsigned long memsize, - unsigned long vcpus, uint8_t* fw_start) +kvm_ia64_build_hob(unsigned long memsize, unsigned long vcpus, + uint8_t* fw_start, unsigned long nvram_addr) { char *hob_buf; @@ -102,7 +107,7 @@ kvm_ia64_build_hob(unsigned long memsize, return -1; } -if (build_hob(hob_buf, GFW_HOB_SIZE, memsize, vcpus) < 0) { +if (build_hob(hob_buf, GFW_HOB_SIZE, memsize, vcpus, nvram_addr) < 0) { free(hob_buf); Hob_Output("Could not build hob"); return -1; @@ -206,7 +211,8 @@ add_max_hob_entry(void* hob_buf) static int build_hob(void* hob_buf, unsigned long hob_buf_size, - unsigned long dom_mem_size, unsigned long vcpus) + unsigned long dom_mem_size, unsigned long vcpus, + unsigned long nvram_addr) { //Init HOB List if (hob_init(hob_buf, hob_buf_size) < 0) { @@ -229,6 +235,11 @@ build_hob(void* hob_buf, unsigned long hob_buf_size, goto err_out; } +if (add_nvram_hob(hob_buf, nvram_addr) < 0) { + Hob_Output("Add nvram hob failed, buffer too small"); + goto err_out; + } + if (add_max_hob_entry(hob_buf) < 0) { Hob_Output("Add max hob entry failed, buffer too small"); goto err_out; @@ -285,6 +296,12 @@ 
add_vcpus_hob(void* hob_buf, unsigned long vcpus) return hob_add(hob_buf, HOB_TYPE_NR_VCPU, &vcpus, sizeof(vcpus)); } +static int +add_nvram_hob(void *hob_buf, unsigned long nvram_addr) +{ +return hob_add(hob_buf, HOB_TYPE_NR_NVRAM, &nvram_addr, sizeof(nvram_addr)
[SR-IOV driver example 0/3 resend] introduction
SR-IOV drivers for the Intel 82576 NIC are available. They come in two
parts: a Physical Function (PF) driver and a Virtual Function (VF) driver.
The PF driver is based on the IGB driver; it controls the PF, allocates
hardware-specific resources and interfaces with the SR-IOV core. The VF
driver is a new NIC driver that works like a traditional PCI device
driver, in both the host and the guest (Xen and KVM) environments.

These two drivers are test versions and are *only* intended to show how
to use the SR-IOV API.

The Intel 82576 NIC specification can be found at:
http://download.intel.com/design/network/datashts/82576_Datasheet_v2p1.pdf

[SR-IOV driver example 0/3 resend] introduction
[SR-IOV driver example 1/3 resend] PF driver: hardware specific operations
[SR-IOV driver example 2/3 resend] PF driver: integrate with SR-IOV core
[SR-IOV driver example 3/3 resend] VF driver: an independent PCI NIC driver
[PATCH] Kvm: Qemu: save nvram
This patch adds support for saving the NVRAM. The NVRAM is saved only
when the -name argument is given, and the file it is saved to is named
after that argument. If -name is not specified, the NVRAM is not saved.

From d3e31cda03ef67efc860eaec2f93153e5535d744 Mon Sep 17 00:00:00 2001
From: Yang Zhang <[EMAIL PROTECTED]>
Date: Tue, 2 Dec 2008 10:02:00 +0800
Subject: [PATCH] Kvm: Qemu: save nvram

support to save nvram to the file

Signed-off-by: Yang Zhang <[EMAIL PROTECTED]>
---
 qemu/hw/ipf.c               |   15 ++-
 qemu/target-ia64/firmware.c |  107 +--
 qemu/target-ia64/firmware.h |   22 -
 3 files changed, 135 insertions(+), 9 deletions(-)

diff --git a/qemu/hw/ipf.c b/qemu/hw/ipf.c
index 337c854..cdbd4e0 100644
--- a/qemu/hw/ipf.c
+++ b/qemu/hw/ipf.c
@@ -51,6 +51,7 @@ static fdctrl_t *floppy_controller;
 static RTCState *rtc_state;
 static PCIDevice *i440fx_state;
 
+uint8_t *g_fw_start;
 static uint32_t ipf_to_legacy_io(target_phys_addr_t addr)
 {
@@ -454,9 +455,12 @@ static void ipf_init1(ram_addr_t ram_size, int vga_ram_size,
     unsigned long image_size;
     char *image = NULL;
     uint8_t *fw_image_start;
+    unsigned long nvram_addr = 0;
+    unsigned long nvram_fd = 0;
     ram_addr_t fw_offset = qemu_ram_alloc(GFW_SIZE);
     uint8_t *fw_start = phys_ram_base + fw_offset;
 
+    g_fw_start = fw_start;
     snprintf(buf, sizeof(buf), "%s/%s", bios_dir, FW_FILENAME);
     image = read_image(buf, &image_size );
     if (NULL == image || !image_size) {
@@ -472,7 +476,16 @@ static void ipf_init1(ram_addr_t ram_size, int vga_ram_size,
     free(image);
     flush_icache_range((unsigned long)fw_image_start,
                        (unsigned long)fw_image_start + image_size);
-    kvm_ia64_build_hob(ram_size + above_4g_mem_size, smp_cpus, fw_start);
+    if (qemu_name) {
+        nvram_addr = NVRAM_START;
+        if((nvram_fd = kvm_ia64_nvram_init()) != -1) {
+            kvm_ia64_copy_from_nvram_to_GFW(nvram_fd,g_fw_start);
+            close(nvram_fd);
+        }
+        atexit(kvm_ia64_copy_from_GFW_to_nvram);
+    }
+    kvm_ia64_build_hob(ram_size + above_4g_mem_size,smp_cpus,fw_start,
+                       nvram_addr);
 }
 
 /*Register legacy io address space, size:64M*/
diff --git a/qemu/target-ia64/firmware.c b/qemu/target-ia64/firmware.c
index bac2721..39c8361 100644
--- a/qemu/target-ia64/firmware.c
+++ b/qemu/target-ia64/firmware.c
@@ -31,6 +31,8 @@
 #include "firmware.h"
 
+#include "qemu-common.h"
+#include "sysemu.h"
 typedef struct {
     unsigned long signature;
     unsigned int type;
@@ -85,14 +87,16 @@ static int hob_init(void *buffer ,unsigned long buf_size);
 static int add_pal_hob(void* hob_buf);
 static int add_mem_hob(void* hob_buf, unsigned long dom_mem_size);
 static int add_vcpus_hob(void* hob_buf, unsigned long nr_vcpu);
+static int add_nvram_hob(void *hob_buf, unsigned long nvram_addr);
 static int build_hob(void* hob_buf, unsigned long hob_buf_size,
-                     unsigned long dom_mem_size, unsigned long vcpus);
+                     unsigned long dom_mem_size, unsigned long vcpus
+, unsigned long nvram_addr);
 static int load_hob(void *hob_buf,
                     unsigned long dom_mem_size, void* hob_start);
 
 int
-kvm_ia64_build_hob(unsigned long memsize,
-                   unsigned long vcpus, uint8_t* fw_start)
+kvm_ia64_build_hob(unsigned long memsize, unsigned long vcpus,
+                   uint8_t* fw_start,unsigned long nvram_addr)
 {
     char *hob_buf;
 
@@ -102,7 +106,7 @@ kvm_ia64_build_hob(unsigned long memsize,
         return -1;
     }
 
-    if (build_hob(hob_buf, GFW_HOB_SIZE, memsize, vcpus) < 0) {
+    if (build_hob(hob_buf, GFW_HOB_SIZE, memsize, vcpus,nvram_addr) < 0) {
         free(hob_buf);
         Hob_Output("Could not build hob");
         return -1;
@@ -206,7 +210,7 @@ add_max_hob_entry(void* hob_buf)
 static int
 build_hob(void* hob_buf, unsigned long hob_buf_size,
-          unsigned long dom_mem_size, unsigned long vcpus)
+          unsigned long dom_mem_size, unsigned long vcpus,unsigned long nvram_addr)
 {
     //Init HOB List
     if (hob_init(hob_buf, hob_buf_size) < 0) {
@@ -229,6 +233,11 @@ build_hob(void* hob_buf, unsigned long hob_buf_size,
         goto err_out;
     }
 
+    if (add_nvram_hob(hob_buf, nvram_addr) < 0) {
+        Hob_Output("Add nvram hob failed, buffer too small");
+        goto err_out;
+    }
+
     if (add_max_hob_entry(hob_buf) < 0) {
         Hob_Output("Add max hob entry failed, buffer too small");
         goto err_out;
@@ -285,6 +294,12 @@ add_vcpus_hob(void* hob_buf, unsigned long vcpus)
     return hob_add(hob_buf, HOB_TYPE_NR_VCPU, &vcpus, sizeof(vcpus));
 }
 
+static int
+add_nvram_hob(void *hob_buf, unsigned long nvram_addr)
+{
+    return hob_add(hob_buf, HOB_TYPE_NR_NVRAM, &nvram_addr, sizeof(
RE: [PATCH] KVM: Qemu: push_nmi should be only used by I386 Arch.
Oops, seems we introduced the issue together. Acked-by Xiantao Zhang <[EMAIL PROTECTED]> -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Tuesday, December 02, 2008 7:03 AM To: Hollis Blanchard Cc: Avi Kivity; Zhang, Xiantao; kvm@vger.kernel.org; [EMAIL PROTECTED] Subject: Re: [PATCH] KVM: Qemu: push_nmi should be only used by I386 Arch. Hollis Blanchard wrote: > On Fri, 2008-11-28 at 10:26 +0100, Jan Kiszka wrote: >> Zhang, Xiantao wrote: >>> >From c25fa2e4de40e500bd364c3267d5be89a9cfbb4d Mon Sep 17 00:00:00 2001 >>> From: Xiantao Zhang <[EMAIL PROTECTED]> >>> Date: Fri, 28 Nov 2008 09:38:46 +0800 >>> Subject: [PATCH] KVM: Qemu: push_nmi should be only used by I386 Arch. >>> >>> Use TARGET_I386 to exclude other archs. >>> Signed-off-by: Xiantao Zhang <[EMAIL PROTECTED]> >>> --- >>> libkvm/libkvm.c |4 ++-- >>> qemu/qemu-kvm.c |4 >>> 2 files changed, 6 insertions(+), 2 deletions(-) >>> >>> diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c >>> index 40c95ce..851a93a 100644 >>> --- a/libkvm/libkvm.c >>> +++ b/libkvm/libkvm.c >>> @@ -868,7 +868,7 @@ int kvm_run(kvm_context_t kvm, int vcpu, void *env) >>> struct kvm_run *run = kvm->run[vcpu]; >>> >>> again: >>> -#ifdef KVM_CAP_NMI >>> +#ifdef TARGET_I386 >>> push_nmi(kvm); >>> #endif >>> #if !defined(__s390__) >>> @@ -1032,7 +1032,7 @@ int kvm_has_sync_mmu(kvm_context_t kvm) >>> >>> int kvm_inject_nmi(kvm_context_t kvm, int vcpu) >>> { >>> -#ifdef KVM_CAP_NMI >>> +#ifdef TARGET_I386 >>> return ioctl(kvm->vcpu_fd[vcpu], KVM_NMI); >>> #else >>> return -ENOSYS; >>> diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c >>> index cf0e85d..b6c8288 100644 >>> --- a/qemu/qemu-kvm.c >>> +++ b/qemu/qemu-kvm.c >>> @@ -154,10 +154,12 @@ static int try_push_interrupts(void *opaque) >>> return kvm_arch_try_push_interrupts(opaque); >>> } >>> >>> +#ifdef TARGET_I386 >>> static void push_nmi(void *opaque) >>> { >>> kvm_arch_push_nmi(opaque); >>> } >>> +#endif >>> >>> static void post_kvm_run(void *opaque, void *data) >>> 
{ >>> @@ -742,7 +744,9 @@ static struct kvm_callbacks qemu_kvm_ops = { >>> .shutdown = kvm_shutdown, >>> .io_window = kvm_io_window, >>> .try_push_interrupts = try_push_interrupts, >>> +#ifdef TARGET_I386 >>> .push_nmi = push_nmi, >>> +#endif >>> .post_kvm_run = post_kvm_run, >>> .pre_kvm_run = pre_kvm_run, >>> #ifdef TARGET_I386 >> This will now break when KVM_CAP_NMI is undefined, ie. when there is no >> KVM_NMI IOCTL (=> older kvm module sets). > > Guys, we already have stubs for this (although they've been turned into > dead code). Jan broke IA64 and PowerPC builds when he renamed > "kvm_arch_try_push_nmi" to "kvm_arch_push_nmi", and the obvious fix is > to update the stubs to match. That avoids all these ifdefs and > associated problems. Ouch - I'm sorry. > > Avi, could you revert a8d12f98755be9330fcde055134511f76ecaa538 please? > Here is a patch that reverts change and fixes the root of the issue. --- Subject: Fix non-x86 NMI hooks My previous x86-only change to the NMI push hook broke PPC and IA64. This is a proper fix plus a cleanup of the #ifdef-based approach to solve the breakage. 
Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]> --- qemu/qemu-kvm-ia64.c|3 +-- qemu/qemu-kvm-powerpc.c |3 +-- qemu/qemu-kvm.c |4 3 files changed, 2 insertions(+), 8 deletions(-) diff --git a/qemu/qemu-kvm-ia64.c b/qemu/qemu-kvm-ia64.c index 8380f39..a6b17af 100644 --- a/qemu/qemu-kvm-ia64.c +++ b/qemu/qemu-kvm-ia64.c @@ -57,9 +57,8 @@ int kvm_arch_try_push_interrupts(void *opaque) return 1; } -int kvm_arch_try_push_nmi(void *opaque) +void kvm_arch_push_nmi(void *opaque) { -return 1; } void kvm_arch_update_regs_for_sipi(CPUState *env) diff --git a/qemu/qemu-kvm-powerpc.c b/qemu/qemu-kvm-powerpc.c index 19fde40..fa534ed 100644 --- a/qemu/qemu-kvm-powerpc.c +++ b/qemu/qemu-kvm-powerpc.c @@ -188,12 +188,11 @@ int kvm_arch_try_push_interrupts(void *opaque) return 0; } -int kvm_arch_try_push_nmi(void *opaque) +void kvm_arch_push_nmi(void *opaque) { /* no nmi irq, so discard that call for now and return success. * This might later get mapped to something on powerpc too if we want * to support the nmi monitor command somwhow */ - return 0; } void kvm_arch_update_regs_for_sipi(CPUState *env) diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c index b6c8288..cf0e85d 100644 --- a/qemu/qemu-kvm.c +++ b/qemu/qemu-kvm.c @@ -154,12 +154,10 @@ static int try_push_interrupts(void *opaque) return kvm_arch_try_push_interrupts(opaque); } -#ifdef TARGET_I386 static void push_nmi(void *opaque) { kvm_arch_push_nmi(opaque); } -#endif static void post_kvm_run(void *opaque, void *data) { @@ -744,9 +742,7 @@ static struct kvm_callbacks qemu_kvm_ops = { .shutdown = kvm_shutdown, .io_window = kvm_io_window, .try_push_interrupts = try_push_in
Re: [Qemu-devel] qemu-img commit -- is there a limit on file sizes?
On Mon, 1 Dec 2008, Avi Kivity wrote:

> Anthony Liguori wrote:
> >
> > We've started getting some reports of corruption on "commit" in KVM.  There
> > is a long standing disk corruption issue too that is very difficult to
> > reproduce.  The thinking is that there is a bug somewhere in the qcow2 code.
> >
> > Is anyone actively looking into this?
> >
>
> I am, though my actively is a lot less than could be desired.  Additional eyes
> would be welcome.

FWIW, I must apologize for giving you incorrect data.  I'm seeing problems
now that have nothing to do with the size of the commit, and I'm beginning
to suspect that the commit step has nothing to do with the problem.

I'll summarize my evidence because it seems potentially very important:

I installed WinXP on qcow2, which went perfectly.  I rebooted multiple
times with no problems and changed settings for my desktop and taskbar,
rebooted again with no problems.

Now, I make a new, fresh [what's the opposite of a backing file?] like
this:

$ qemu-img create -f qcow2 -b kvmXP kvmXP.delta

All I do is boot XP again like this:

$ qemu-system-x86_64 -m 1000 kvmXP.delta

My shock is that one of the taskbar settings I changed has disappeared!
I can boot the original kvmXP qcow2 image and verify that my changes are
still there in the original, but not when I boot kvmXP.delta.

In case someone wants to try to reproduce it, the specific change I made
to the task bar is to display the animated network activity icon in the
system tray.  That icon never appears when I boot from kvmXP.delta.

In other words, from the instant I boot kvmXP.delta, the image of XP that
gets loaded into memory is not an accurate reflection of what's in the
backing file.  If that's true, then it's not surprising that the commit
step causes trouble.

Does my reasoning seem reasonable?  :o)
[patch 4/4] KVM: MMU: prepopulate the shadow on invlpg
If the guest executes invlpg, peek into the pagetable and attempt to prepopulate the shadow entry. Also stop dirty fault updates from interfering with the fork detector. 2% improvement on RHEL3/AIM7. Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]> Index: kvm/arch/x86/kvm/mmu.c === --- kvm.orig/arch/x86/kvm/mmu.c +++ kvm/arch/x86/kvm/mmu.c @@ -2441,7 +2441,8 @@ static void kvm_mmu_access_page(struct k } void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, - const u8 *new, int bytes) + const u8 *new, int bytes, + bool guest_initiated) { gfn_t gfn = gpa >> PAGE_SHIFT; struct kvm_mmu_page *sp; @@ -2467,15 +2468,17 @@ void kvm_mmu_pte_write(struct kvm_vcpu * kvm_mmu_free_some_pages(vcpu); ++vcpu->kvm->stat.mmu_pte_write; kvm_mmu_audit(vcpu, "pre pte write"); - if (gfn == vcpu->arch.last_pt_write_gfn - && !last_updated_pte_accessed(vcpu)) { - ++vcpu->arch.last_pt_write_count; - if (vcpu->arch.last_pt_write_count >= 3) - flooded = 1; - } else { - vcpu->arch.last_pt_write_gfn = gfn; - vcpu->arch.last_pt_write_count = 1; - vcpu->arch.last_pte_updated = NULL; + if (guest_initiated) { + if (gfn == vcpu->arch.last_pt_write_gfn + && !last_updated_pte_accessed(vcpu)) { + ++vcpu->arch.last_pt_write_count; + if (vcpu->arch.last_pt_write_count >= 3) + flooded = 1; + } else { + vcpu->arch.last_pt_write_gfn = gfn; + vcpu->arch.last_pt_write_count = 1; + vcpu->arch.last_pte_updated = NULL; + } } index = kvm_page_table_hashfn(gfn); bucket = &vcpu->kvm->arch.mmu_page_hash[index]; @@ -2615,9 +2618,7 @@ EXPORT_SYMBOL_GPL(kvm_mmu_page_fault); void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva) { - spin_lock(&vcpu->kvm->mmu_lock); vcpu->arch.mmu.invlpg(vcpu, gva); - spin_unlock(&vcpu->kvm->mmu_lock); kvm_mmu_flush_tlb(vcpu); ++vcpu->stat.invlpg; } Index: kvm/arch/x86/kvm/paging_tmpl.h === --- kvm.orig/arch/x86/kvm/paging_tmpl.h +++ kvm/arch/x86/kvm/paging_tmpl.h @@ -82,6 +82,7 @@ struct shadow_walker { int *ptwrite; pfn_t pfn; u64 *sptep; + gpa_t pte_gpa; }; static gfn_t 
gpte_to_gfn(pt_element_t gpte) @@ -222,7 +223,7 @@ walk: if (ret) goto walk; pte |= PT_DIRTY_MASK; - kvm_mmu_pte_write(vcpu, pte_gpa, (u8 *)&pte, sizeof(pte)); + kvm_mmu_pte_write(vcpu, pte_gpa, (u8 *)&pte, sizeof(pte), 0); walker->ptes[walker->level - 1] = pte; } @@ -468,8 +469,15 @@ static int FNAME(shadow_invlpg_entry)(st struct kvm_vcpu *vcpu, u64 addr, u64 *sptep, int level) { + struct shadow_walker *sw = + container_of(_sw, struct shadow_walker, walker); if (level == PT_PAGE_TABLE_LEVEL) { + struct kvm_mmu_page *sp = page_header(__pa(sptep)); + + sw->pte_gpa = (sp->gfn << PAGE_SHIFT); + sw->pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t); + if (is_shadow_present_pte(*sptep)) rmap_remove(vcpu->kvm, sptep); set_shadow_pte(sptep, shadow_trap_nonpresent_pte); @@ -482,11 +490,26 @@ static int FNAME(shadow_invlpg_entry)(st static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva) { + pt_element_t gpte; struct shadow_walker walker = { .walker = { .entry = FNAME(shadow_invlpg_entry), }, + .pte_gpa = -1, }; + spin_lock(&vcpu->kvm->mmu_lock); walk_shadow(&walker.walker, vcpu, gva); + spin_unlock(&vcpu->kvm->mmu_lock); + if (walker.pte_gpa == -1) + return; + if (kvm_read_guest_atomic(vcpu->kvm, walker.pte_gpa, &gpte, + sizeof(pt_element_t))) + return; + if (is_present_pte(gpte) && (gpte & PT_ACCESSED_MASK)) { + if (mmu_topup_memory_caches(vcpu)) + return; + kvm_mmu_pte_write(vcpu, walker.pte_gpa, (const u8 *)&gpte, + sizeof(pt_element_t), 0); + } } static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr) Index: kvm/arch/x86/kvm/x86.c === --- kvm.orig/arch/x86/kvm/x86.c +++ kvm/arch/x86/kvm/x86.c @@ -2046,7 +2046,7 @@ int emulator_write_phys(struct kvm_vcpu ret = kvm_write_gue
[patch 3/4] KVM: MMU: skip global pgtables on sync due to cr3 switch
Skip syncing global pages on cr3 switch (but not on cr4/cr0). This is important for Linux 32-bit guests with PAE, where the kmap page is marked as global. Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]> Index: kvm/arch/x86/include/asm/kvm_host.h === --- kvm.orig/arch/x86/include/asm/kvm_host.h +++ kvm/arch/x86/include/asm/kvm_host.h @@ -182,6 +182,8 @@ struct kvm_mmu_page { struct list_head link; struct hlist_node hash_link; + struct list_head oos_link; + /* * The following two entries are used to key the shadow page in the * hash table. @@ -200,6 +202,7 @@ struct kvm_mmu_page { int multimapped; /* More than one parent_pte? */ int root_count; /* Currently serving as active root */ bool unsync; + bool global; unsigned int unsync_children; union { u64 *parent_pte; /* !multimapped */ @@ -356,6 +359,7 @@ struct kvm_arch{ */ struct list_head active_mmu_pages; struct list_head assigned_dev_head; + struct list_head oos_global_pages; struct dmar_domain *intel_iommu_domain; struct kvm_pic *vpic; struct kvm_ioapic *vioapic; @@ -385,6 +389,7 @@ struct kvm_vm_stat { u32 mmu_recycled; u32 mmu_cache_miss; u32 mmu_unsync; + u32 mmu_unsync_global; u32 remote_tlb_flush; u32 lpages; }; @@ -603,6 +608,7 @@ void __kvm_mmu_free_some_pages(struct kv int kvm_mmu_load(struct kvm_vcpu *vcpu); void kvm_mmu_unload(struct kvm_vcpu *vcpu); void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu); +void kvm_mmu_sync_global(struct kvm_vcpu *vcpu); int kvm_emulate_hypercall(struct kvm_vcpu *vcpu); Index: kvm/arch/x86/kvm/mmu.c === --- kvm.orig/arch/x86/kvm/mmu.c +++ kvm/arch/x86/kvm/mmu.c @@ -793,9 +793,11 @@ static struct kvm_mmu_page *kvm_mmu_allo sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE); set_page_private(virt_to_page(sp->spt), (unsigned long)sp); list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages); + INIT_LIST_HEAD(&sp->oos_link); ASSERT(is_empty_shadow_page(sp->spt)); bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS); sp->multimapped = 0; + 
sp->global = 1; sp->parent_pte = parent_pte; --vcpu->kvm->arch.n_free_mmu_pages; return sp; @@ -1066,10 +1068,18 @@ static struct kvm_mmu_page *kvm_mmu_look return NULL; } +static void kvm_unlink_unsync_global(struct kvm *kvm, struct kvm_mmu_page *sp) +{ + list_del(&sp->oos_link); + --kvm->stat.mmu_unsync_global; +} + static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp) { WARN_ON(!sp->unsync); sp->unsync = 0; + if (sp->global) + kvm_unlink_unsync_global(kvm, sp); --kvm->stat.mmu_unsync; } @@ -1615,9 +1625,15 @@ static int kvm_unsync_page(struct kvm_vc if (s->role.word != sp->role.word) return 1; } - kvm_mmu_mark_parents_unsync(vcpu, sp); ++vcpu->kvm->stat.mmu_unsync; sp->unsync = 1; + + if (sp->global) { + list_add(&sp->oos_link, &vcpu->kvm->arch.oos_global_pages); + ++vcpu->kvm->stat.mmu_unsync_global; + } else + kvm_mmu_mark_parents_unsync(vcpu, sp); + mmu_convert_notrap(sp); return 0; } @@ -1643,12 +1659,21 @@ static int mmu_need_write_protect(struct static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte, unsigned pte_access, int user_fault, int write_fault, int dirty, int largepage, - gfn_t gfn, pfn_t pfn, bool speculative, + int global, gfn_t gfn, pfn_t pfn, bool speculative, bool can_unsync) { u64 spte; int ret = 0; u64 mt_mask = shadow_mt_mask; + struct kvm_mmu_page *sp = page_header(__pa(shadow_pte)); + + if (!global && sp->global) { + sp->global = 0; + if (sp->unsync) { + kvm_unlink_unsync_global(vcpu->kvm, sp); + kvm_mmu_mark_parents_unsync(vcpu, sp); + } + } /* * We don't set the accessed bit, since we sometimes want to see @@ -1717,8 +1742,8 @@ set_pte: static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte, unsigned pt_access, unsigned pte_access, int user_fault, int write_fault, int dirty, -int *ptwrite, int largepage, gfn_t gfn, -pfn_t pfn, bool speculative) +int *ptwrite, int largepage, int global, +gfn_t gfn, pfn_t pfn, bool speculative) {
[patch 2/4] KVM: MMU: collapse remote TLB flushes on root sync
Collapse remote TLB flushes on root sync. kernbench is 2.7% faster on 4-way guest. Improvements have been seen with other loads such as AIM7. Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]> Index: kvm/arch/x86/kvm/mmu.c === --- kvm.orig/arch/x86/kvm/mmu.c +++ kvm/arch/x86/kvm/mmu.c @@ -621,7 +621,7 @@ static u64 *rmap_next(struct kvm *kvm, u return NULL; } -static void rmap_write_protect(struct kvm *kvm, u64 gfn) +static int rmap_write_protect(struct kvm *kvm, u64 gfn) { unsigned long *rmapp; u64 *spte; @@ -667,8 +667,7 @@ static void rmap_write_protect(struct kv spte = rmap_next(kvm, rmapp, spte); } - if (write_protected) - kvm_flush_remote_tlbs(kvm); + return write_protected; } static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp) @@ -1083,7 +1082,8 @@ static int kvm_sync_page(struct kvm_vcpu return 1; } - rmap_write_protect(vcpu->kvm, sp->gfn); + if (rmap_write_protect(vcpu->kvm, sp->gfn)) + kvm_flush_remote_tlbs(vcpu->kvm); kvm_unlink_unsync_page(vcpu->kvm, sp); if (vcpu->arch.mmu.sync_page(vcpu, sp)) { kvm_mmu_zap_page(vcpu->kvm, sp); @@ -1162,6 +1162,14 @@ static void mmu_sync_children(struct kvm kvm_mmu_pages_init(parent, &parents, &pages); while (mmu_unsync_walk(parent, &pages)) { + int protected = 0; + + for_each_sp(pages, sp, parents, i) + protected |= rmap_write_protect(vcpu->kvm, sp->gfn); + + if (protected) + kvm_flush_remote_tlbs(vcpu->kvm); + for_each_sp(pages, sp, parents, i) { kvm_sync_page(vcpu, sp); mmu_pages_clear_parents(&parents); @@ -1226,7 +1234,8 @@ static struct kvm_mmu_page *kvm_mmu_get_ sp->role = role; hlist_add_head(&sp->hash_link, bucket); if (!metaphysical) { - rmap_write_protect(vcpu->kvm, gfn); + if (rmap_write_protect(vcpu->kvm, gfn)) + kvm_flush_remote_tlbs(vcpu->kvm); account_shadowed(vcpu->kvm, gfn); } if (shadow_trap_nonpresent_pte != shadow_notrap_nonpresent_pte) -- -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at 
http://vger.kernel.org/majordomo-info.html
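The effect of collapsing the flushes can be illustrated with a toy model outside the kernel. This is only a sketch of the batching idea, not KVM code: `toy_write_protect`, `toy_flush_remote_tlbs` and the two `sync_*` helpers are hypothetical names, and a "page" is reduced to a single writable flag.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: counts how many "remote TLB flushes" would be issued.
 * None of these names exist in KVM; they are hypothetical stand-ins. */
static int toy_flush_count;

static void toy_flush_remote_tlbs(void)
{
    toy_flush_count++;
}

/* Pretend to write-protect one page. Returns 1 if a writable mapping
 * was actually removed, 0 if there was nothing to do. */
static int toy_write_protect(int *writable)
{
    if (!*writable)
        return 0;
    *writable = 0;
    return 1;
}

/* Per-page scheme: flush after every page that was write-protected. */
static void sync_flush_per_page(int *pages, size_t n)
{
    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (toy_write_protect(&pages[i]))
            toy_flush_remote_tlbs();
}

/* Collapsed scheme: protect the whole batch, then flush at most once. */
static void sync_flush_collapsed(int *pages, size_t n)
{
    int any_protected = 0;

    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        any_protected |= toy_write_protect(&pages[i]);

    if (any_protected)
        toy_flush_remote_tlbs();
}
```

With a batch in which three of four pages are writable, the per-page variant issues three flushes and the collapsed variant issues one.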
[patch 1/4] KVM: MMU: use page array in unsync walk
Instead of invoking the handler directly collect pages into an array so the caller can work with it. Simplifies TLB flush collapsing. Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]> Index: kvm/arch/x86/kvm/mmu.c === --- kvm.orig/arch/x86/kvm/mmu.c +++ kvm/arch/x86/kvm/mmu.c @@ -908,8 +908,9 @@ static void kvm_mmu_update_unsync_bitmap struct kvm_mmu_page *sp = page_header(__pa(spte)); index = spte - sp->spt; - __set_bit(index, sp->unsync_child_bitmap); - sp->unsync_children = 1; + if (!__test_and_set_bit(index, sp->unsync_child_bitmap)) + sp->unsync_children++; + WARN_ON(!sp->unsync_children); } static void kvm_mmu_update_parents_unsync(struct kvm_mmu_page *sp) @@ -936,7 +937,6 @@ static void kvm_mmu_update_parents_unsyn static int unsync_walk_fn(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) { - sp->unsync_children = 1; kvm_mmu_update_parents_unsync(sp); return 1; } @@ -967,18 +967,41 @@ static void nonpaging_invlpg(struct kvm_ { } +#define KVM_PAGE_ARRAY_NR 16 + +struct kvm_mmu_pages { + struct mmu_page_and_offset { + struct kvm_mmu_page *sp; + unsigned int idx; + } page[KVM_PAGE_ARRAY_NR]; + unsigned int nr; +}; + #define for_each_unsync_children(bitmap, idx) \ for (idx = find_first_bit(bitmap, 512); \ idx < 512; \ idx = find_next_bit(bitmap, 512, idx+1)) -static int mmu_unsync_walk(struct kvm_mmu_page *sp, - struct kvm_unsync_walk *walker) +int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp, + int idx) { - int i, ret; + int i; - if (!sp->unsync_children) - return 0; + if (sp->unsync) + for (i=0; i < pvec->nr; i++) + if (pvec->page[i].sp == sp) + return 0; + + pvec->page[pvec->nr].sp = sp; + pvec->page[pvec->nr].idx = idx; + pvec->nr++; + return (pvec->nr == KVM_PAGE_ARRAY_NR); +} + +static int __mmu_unsync_walk(struct kvm_mmu_page *sp, + struct kvm_mmu_pages *pvec) +{ + int i, ret, nr_unsync_leaf = 0; for_each_unsync_children(sp->unsync_child_bitmap, i) { u64 ent = sp->spt[i]; @@ -988,17 +1011,22 @@ static int mmu_unsync_walk(struct kvm_mm 
child = page_header(ent & PT64_BASE_ADDR_MASK); if (child->unsync_children) { - ret = mmu_unsync_walk(child, walker); - if (ret) + if (mmu_pages_add(pvec, child, i)) + return -ENOSPC; + + ret = __mmu_unsync_walk(child, pvec); + if (!ret) + __clear_bit(i, sp->unsync_child_bitmap); + else if (ret > 0) + nr_unsync_leaf += ret; + else return ret; - __clear_bit(i, sp->unsync_child_bitmap); } if (child->unsync) { - ret = walker->entry(child, walker); - __clear_bit(i, sp->unsync_child_bitmap); - if (ret) - return ret; + nr_unsync_leaf++; + if (mmu_pages_add(pvec, child, i)) + return -ENOSPC; } } } @@ -1006,7 +1034,17 @@ static int mmu_unsync_walk(struct kvm_mm if (find_first_bit(sp->unsync_child_bitmap, 512) == 512) sp->unsync_children = 0; - return 0; + return nr_unsync_leaf; +} + +static int mmu_unsync_walk(struct kvm_mmu_page *sp, + struct kvm_mmu_pages *pvec) +{ + if (!sp->unsync_children) + return 0; + + mmu_pages_add(pvec, sp, 0); + return __mmu_unsync_walk(sp, pvec); } static struct kvm_mmu_page *kvm_mmu_lookup_page(struct kvm *kvm, gfn_t gfn) @@ -1056,30 +1094,81 @@ static int kvm_sync_page(struct kvm_vcpu return 0; } -struct sync_walker { - struct kvm_vcpu *vcpu; - struct kvm_unsync_walk walker; +struct mmu_page_path { + struct kvm_mmu_page *parent[PT64_ROOT_LEVEL-1]; + unsigned int idx[PT64_ROOT_LEVEL-1]; }; -static int mmu_sync_fn(struct kvm_mmu_page *sp, struct kvm_unsync_walk *walk) +#define for_each_sp(pvec, sp, parents, i) \ + for (i = mmu_pages_next(&pvec, &parents, -1), \ + sp =
[patch 0/4] oos shadow optimizations v2
Addressing comments from previous version.
RE: [PATCH 1/2] [v2] VT-d: Support multiple device assignment for KVM
It's fine. You only need to change the APIs to generic APIs. I will update it soon.

Regards,
Weidong

Joerg Roedel wrote:
> Ok, I got them to apply. I also did the checkpatch cleanups. To speed
> things up a bit I would suggest that I rebase my patchset on your
> patches and send it out in a single series. Any problems with this
> approach?
>
> Joerg
>
> On Mon, Dec 01, 2008 at 09:22:42PM +0800, Han, Weidong wrote:
>> Sorry, this patch has style problem. I will update it and also split
>> it to smaller patches for easy reviewing.
>>
>> Regards,
>> Weidong
>>
>> 'Joerg Roedel' wrote:
>>> Hmm, I get these errors using git-am:
>>>
>>> Applying VT-d: Support multiple device assignment for KVM
>>> .dotest/patch:1344: space before tab in indent.
>>> clflush_cache_range(addr, size);
>>> .dotest/patch:1350: space before tab in indent.
>>> clflush_cache_range(addr, size);
>>> .dotest/patch:1907: trailing whitespace.
>>>
>>> .dotest/patch:1946: trailing whitespace.
>>> * owned by this domain, clear this iommu in iommu_bmp
>>> .dotest/patch:2300: trailing whitespace.
>>>
>>> error: patch failed: drivers/pci/dmar.c:484
>>> error: drivers/pci/dmar.c: patch does not apply
>>> error: patch failed: drivers/pci/intel-iommu.c:50
>>> error: drivers/pci/intel-iommu.c: patch does not apply
>>> error: patch failed: include/linux/dma_remapping.h:111
>>> error: include/linux/dma_remapping.h: patch does not apply
>>> error: patch failed: include/linux/intel-iommu.h:219
>>> error: include/linux/intel-iommu.h: patch does not apply
>>> Patch failed at 0001.
>>>
>>> Joerg
Re: [PATCH] KVM: Qemu: push_nmi should be only used by I386 Arch.
On Tue, 2008-12-02 at 00:02 +0100, Jan Kiszka wrote:
> >
> > Guys, we already have stubs for this (although they've been turned into
> > dead code). Jan broke IA64 and PowerPC builds when he renamed
> > "kvm_arch_try_push_nmi" to "kvm_arch_push_nmi", and the obvious fix is
> > to update the stubs to match. That avoids all these ifdefs and
> > associated problems.
>
> Ouch - I'm sorry.

Well, it happens, but I do wish that more people would use cscope or even
grep to find all users of a symbol. I also wish that Avi would get his
PPC box working so he could catch build breaks like these.
Cross-compilers would do as well. I would also like a pony.

> > Avi, could you revert a8d12f98755be9330fcde055134511f76ecaa538 please?
> >
>
> Here is a patch that reverts change and fixes the root of the issue.

Acked-by: Hollis Blanchard <[EMAIL PROTECTED]>

--
Hollis Blanchard
IBM Linux Technology Center
Re: [PATCH] KVM: Qemu: push_nmi should be only used by I386 Arch.
Hollis Blanchard wrote: > On Fri, 2008-11-28 at 10:26 +0100, Jan Kiszka wrote: >> Zhang, Xiantao wrote: >>> >From c25fa2e4de40e500bd364c3267d5be89a9cfbb4d Mon Sep 17 00:00:00 2001 >>> From: Xiantao Zhang <[EMAIL PROTECTED]> >>> Date: Fri, 28 Nov 2008 09:38:46 +0800 >>> Subject: [PATCH] KVM: Qemu: push_nmi should be only used by I386 Arch. >>> >>> Use TARGET_I386 to exclude other archs. >>> Signed-off-by: Xiantao Zhang <[EMAIL PROTECTED]> >>> --- >>> libkvm/libkvm.c |4 ++-- >>> qemu/qemu-kvm.c |4 >>> 2 files changed, 6 insertions(+), 2 deletions(-) >>> >>> diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c >>> index 40c95ce..851a93a 100644 >>> --- a/libkvm/libkvm.c >>> +++ b/libkvm/libkvm.c >>> @@ -868,7 +868,7 @@ int kvm_run(kvm_context_t kvm, int vcpu, void *env) >>> struct kvm_run *run = kvm->run[vcpu]; >>> >>> again: >>> -#ifdef KVM_CAP_NMI >>> +#ifdef TARGET_I386 >>> push_nmi(kvm); >>> #endif >>> #if !defined(__s390__) >>> @@ -1032,7 +1032,7 @@ int kvm_has_sync_mmu(kvm_context_t kvm) >>> >>> int kvm_inject_nmi(kvm_context_t kvm, int vcpu) >>> { >>> -#ifdef KVM_CAP_NMI >>> +#ifdef TARGET_I386 >>> return ioctl(kvm->vcpu_fd[vcpu], KVM_NMI); >>> #else >>> return -ENOSYS; >>> diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c >>> index cf0e85d..b6c8288 100644 >>> --- a/qemu/qemu-kvm.c >>> +++ b/qemu/qemu-kvm.c >>> @@ -154,10 +154,12 @@ static int try_push_interrupts(void *opaque) >>> return kvm_arch_try_push_interrupts(opaque); >>> } >>> >>> +#ifdef TARGET_I386 >>> static void push_nmi(void *opaque) >>> { >>> kvm_arch_push_nmi(opaque); >>> } >>> +#endif >>> >>> static void post_kvm_run(void *opaque, void *data) >>> { >>> @@ -742,7 +744,9 @@ static struct kvm_callbacks qemu_kvm_ops = { >>> .shutdown = kvm_shutdown, >>> .io_window = kvm_io_window, >>> .try_push_interrupts = try_push_interrupts, >>> +#ifdef TARGET_I386 >>> .push_nmi = push_nmi, >>> +#endif >>> .post_kvm_run = post_kvm_run, >>> .pre_kvm_run = pre_kvm_run, >>> #ifdef TARGET_I386 >> This will now break when 
KVM_CAP_NMI is undefined, ie. when there is no >> KVM_NMI IOCTL (=> older kvm module sets). > > Guys, we already have stubs for this (although they've been turned into > dead code). Jan broke IA64 and PowerPC builds when he renamed > "kvm_arch_try_push_nmi" to "kvm_arch_push_nmi", and the obvious fix is > to update the stubs to match. That avoids all these ifdefs and > associated problems. Ouch - I'm sorry. > > Avi, could you revert a8d12f98755be9330fcde055134511f76ecaa538 please? > Here is a patch that reverts change and fixes the root of the issue. --- Subject: Fix non-x86 NMI hooks My previous x86-only change to the NMI push hook broke PPC and IA64. This is a proper fix plus a cleanup of the #ifdef-based approach to solve the breakage. Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]> --- qemu/qemu-kvm-ia64.c|3 +-- qemu/qemu-kvm-powerpc.c |3 +-- qemu/qemu-kvm.c |4 3 files changed, 2 insertions(+), 8 deletions(-) diff --git a/qemu/qemu-kvm-ia64.c b/qemu/qemu-kvm-ia64.c index 8380f39..a6b17af 100644 --- a/qemu/qemu-kvm-ia64.c +++ b/qemu/qemu-kvm-ia64.c @@ -57,9 +57,8 @@ int kvm_arch_try_push_interrupts(void *opaque) return 1; } -int kvm_arch_try_push_nmi(void *opaque) +void kvm_arch_push_nmi(void *opaque) { -return 1; } void kvm_arch_update_regs_for_sipi(CPUState *env) diff --git a/qemu/qemu-kvm-powerpc.c b/qemu/qemu-kvm-powerpc.c index 19fde40..fa534ed 100644 --- a/qemu/qemu-kvm-powerpc.c +++ b/qemu/qemu-kvm-powerpc.c @@ -188,12 +188,11 @@ int kvm_arch_try_push_interrupts(void *opaque) return 0; } -int kvm_arch_try_push_nmi(void *opaque) +void kvm_arch_push_nmi(void *opaque) { /* no nmi irq, so discard that call for now and return success. 
* This might later get mapped to something on powerpc too if we want * to support the nmi monitor command somwhow */ - return 0; } void kvm_arch_update_regs_for_sipi(CPUState *env) diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c index b6c8288..cf0e85d 100644 --- a/qemu/qemu-kvm.c +++ b/qemu/qemu-kvm.c @@ -154,12 +154,10 @@ static int try_push_interrupts(void *opaque) return kvm_arch_try_push_interrupts(opaque); } -#ifdef TARGET_I386 static void push_nmi(void *opaque) { kvm_arch_push_nmi(opaque); } -#endif static void post_kvm_run(void *opaque, void *data) { @@ -744,9 +742,7 @@ static struct kvm_callbacks qemu_kvm_ops = { .shutdown = kvm_shutdown, .io_window = kvm_io_window, .try_push_interrupts = try_push_interrupts, -#ifdef TARGET_I386 .push_nmi = push_nmi, -#endif .post_kvm_run = post_kvm_run, .pre_kvm_run = pre_kvm_run, #ifdef TARGET_I386
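The stub-based approach discussed in this thread, where every architecture provides a hook with the same signature and the common code calls it unconditionally, can be sketched in a single file. This is a simplified model under stated assumptions, not the qemu-kvm source: `toy_callbacks`, `toy_x86_push_nmi`, `toy_stub_push_nmi` and `toy_run_once` are hypothetical names, and in the real tree each architecture file would supply its own definition of the hook.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a callback table with an NMI hook. */
struct toy_callbacks {
    void (*push_nmi)(void *opaque);
};

static int nmi_pushed;   /* counts NMIs delivered by the x86-like hook */

/* x86-like implementation: actually does something. */
static void toy_x86_push_nmi(void *opaque)
{
    (void)opaque;
    nmi_pushed++;
}

/* Stub for an architecture without NMI support: same signature,
 * empty body, so the common code needs no #ifdef. */
static void toy_stub_push_nmi(void *opaque)
{
    (void)opaque;
}

/* Common code: always calls the hook, whatever the architecture. */
static void toy_run_once(const struct toy_callbacks *cb, void *opaque)
{
    assert(cb != NULL && cb->push_nmi != NULL);
    cb->push_nmi(opaque);
}
```

The trade-off is one trivial function per architecture in exchange for a common code path that compiles everywhere and fails loudly at link time if a hook is renamed without updating the stubs.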
Re: 1-1 mapping of devices without VT-d
Michael Tokarev wrote:
> Dor Laor wrote:
> []
>> Although it had worked for us out of tree, there is no immediate need to
>> pursue it.
>> If anyone would like to nurture these patches he is more than welcome.
>> ps: you also have pv-dma option for Linux guests (same status though).
>> As time goes by most host will have either vt-d or amd iommu.
>
> Hmm. Well, as time goes by, most hosts will be 64 bit or more.
> But it does not mean that there's no need to maintain 32bits
> arch anymore... i hope anyway :)

But of course

> Are you saying that PCI passthrough without hardware support
> will not be available in (standard) kvm, even if patches exists
> for that?

No, it just might take some time to go to mainline. Patches need further polishing and we also need wider demand for it.
Actually pvdma can help vt-d so we won't have to make all the guest memory unswappable.

> /mjt
Re: STOP error with virtio on KVM-79/2.6.18/Win2k3 x64 guest
Adrian Schmitz wrote:
> Sorry for the repost.. I forgot the subject line!
>
> Hi,
>
> I'm having problems with STOP errors (0x00d1) under KVM-79/2.6.18
> whenever I try to use the virtio drivers. This post
> (http://marc.info/?l=kvm&m=121089259211638&w=2) describes the issue
> exactly, except that I'm using a Win2k3 x64 guest with the x64
> paravirtual drivers instead of 32-bit guest/drivers. I am able to
> reproduce the problem reliably using iperf, the same as in the above
> post. When I disable virtio, the guest is very stable.
>
> Any suggestions are greatly appreciated.

What driver version are you using? Version 2 is obsolete. I posted version 3 a few months ago; Avi, can you please upload it to sourceforge?
My old public space was blocked, so I'll send you a private attachment to test.

Dor.

> -Adrian
Re: 1-1 mapping of devices without VT-d
Dor Laor wrote:
[]
> Although it had worked for us out of tree, there is no immediate need to
> pursue it.
> If anyone would like to nurture these patches he is more than welcome.
> ps: you also have pv-dma option for Linux guests (same status though).
> As time goes by most host will have either vt-d or amd iommu.

Hmm. Well, as time goes by, most hosts will be 64 bit or more.
But it does not mean that there's no need to maintain 32-bit
arch anymore... I hope anyway :)

Are you saying that PCI passthrough without hardware support
will not be available in (standard) kvm, even if patches exist
for that?

/mjt
Re: [PATCH 1/2] PCI: allow pci driver to support only dynids
On Tuesday, November 25, 2008 7:36 pm Chris Wright wrote:
> commit b41d6cf38e27 (PCI: Check dynids driver_data value for validity)
> requires all drivers to include an id table to try and match
> driver_data. Before validating driver_data check driver has an id
> table.
>
> Cc: Jean Delvare <[EMAIL PROTECTED]>
> Cc: Milton Miller <[EMAIL PROTECTED]>
> Signed-off-by: Chris Wright <[EMAIL PROTECTED]>

Applied these to my linux-next branch, thanks Chris.

--
Jesse Barnes, Intel Open Source Technology Center
Re: [Qemu-devel] [PATCH 2/2] Virtio block device support
Hollis Blanchard wrote:
> On Tue, 2008-11-25 at 15:57 -0600, Anthony Liguori wrote:
>> diff --git a/hw/pc.h b/hw/pc.h
>> index f156b9e..bbfa2d6 100644
>> --- a/hw/pc.h
>> +++ b/hw/pc.h
>> @@ -152,4 +152,8 @@ void pci_piix4_ide_init(PCIBus *bus, BlockDriverState **hd_table, int devfn,
>>
>> void isa_ne2000_init(int base, qemu_irq irq, NICInfo *nd);
>>
>> +/* virtio-blk.c */
>> +void *virtio_blk_init(PCIBus *bus, uint16_t vendor, uint16_t device,
>> + BlockDriverState *bs);
>> +
>> #endif
>
> This shouldn't be in pc.h.

I don't disagree.

> I don't know if you'd consider virtio.h to be a layering violation, but
> the virtio layers are already being compressed in these patches...

Yeah, I think the virtio stuff could use some love but I'd like to avoid that until we have something in tree and merged against kvm-userspace.

Regards,

Anthony Liguori
Re: [PATCH] extboot: properly set int 0x13 return value
Glauber Costa wrote:
> Callers of int 0x13 usually rely on the carry flag being clear/set
> to indicate the status of the interrupt execution. However, our
> current code clear or set the flags register, which is totally useless.
> Whichever value it has, will be overwritten by the flags value _before_
> the interrupt, due to the iret instruction.
>
> This fixes a bug that prevents slackware (and possibly win2k, untested)
> to boot.

Good catch!

> Signed-off-by: Glauber Costa <[EMAIL PROTECTED]>

Acked-by: Anthony Liguori <[EMAIL PROTECTED]>

Regards,

Anthony Liguori
Re: [Qemu-devel] [PATCH 2/2] Virtio block device support
On Tue, 2008-11-25 at 15:57 -0600, Anthony Liguori wrote:
> diff --git a/hw/pc.h b/hw/pc.h
> index f156b9e..bbfa2d6 100644
> --- a/hw/pc.h
> +++ b/hw/pc.h
> @@ -152,4 +152,8 @@ void pci_piix4_ide_init(PCIBus *bus,
> BlockDriverState **hd_table, int devfn,
>
> void isa_ne2000_init(int base, qemu_irq irq, NICInfo *nd);
>
> +/* virtio-blk.c */
> +void *virtio_blk_init(PCIBus *bus, uint16_t vendor, uint16_t device,
> + BlockDriverState *bs);
> +
> #endif

This shouldn't be in pc.h.

I don't know if you'd consider virtio.h to be a layering violation, but
the virtio layers are already being compressed in these patches...

--
Hollis Blanchard
IBM Linux Technology Center
Re: 1-1 mapping of devices without VT-d
Passera, Pablo R wrote:
> Hi everyone,
> I want to assign a PCI device directly to a VM (PCI passthrough) in a
> machine that does not have VT-d. I found something related with this in
> a presentation done at the 2008 KVM Forum called 1-1 mapping and a patch
> for this at
> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/18722/focus=18753.
> I am wondering if this is included or are there plans to include it in
> the latest KVM version?

Although it had worked for us out of tree, there is no immediate need to pursue it.
If anyone would like to nurture these patches he is more than welcome.
ps: you also have the pv-dma option for Linux guests (same status though).
As time goes by most hosts will have either vt-d or amd iommu.

Regards,
Dor

> Thanks in advance,
> Pablo Pássera
[PATCH] extboot: properly set int 0x13 return value
Callers of int 0x13 usually rely on the carry flag being clear/set to indicate the status of the interrupt execution. However, our current code clear or set the flags register, which is totally useless. Whichever value it has, will be overwritten by the flags value _before_ the interrupt, due to the iret instruction. This fixes a bug that prevents slackware (and possibly win2k, untested) to boot. Signed-off-by: Glauber Costa <[EMAIL PROTECTED]> --- extboot/extboot.S | 52 ++-- 1 files changed, 26 insertions(+), 26 deletions(-) diff --git a/extboot/extboot.S b/extboot/extboot.S index 2630abb..e3d1adf 100644 --- a/extboot/extboot.S +++ b/extboot/extboot.S @@ -99,24 +99,24 @@ int19_handler: #define FLAGS_CF 0x01 -.macro clc - push %ax - pushf - pop %ax - and $(~FLAGS_CF), %ax - push %ax - popf - pop %ax +/* The two macro below clear/set the carry flag to indicate the status + * of the interrupt execution. It is not enough to issue a clc/stc instruction, + * since the value of the flags register will be overwritten by whatever is + * in the stack frame + */ +.macro clc_stack + push %bp + mov %sp, %bp + /* 8 = 2 (bp, just pushed) + 2 (ip) + 3 (real mode interrupt frame) */ + and $(~FLAGS_CF), 8(%bp) + pop %bp .endm -.macro stc - push %ax - pushf - pop %ax - or $(FLAGS_CF), %ax - push %ax - popf - pop %ax +.macro stc_stack + push %bp + /* 8 = 2 (bp, just pushed) + 2 (ip) + 3 (real mode interrupt frame) */ + or $(FLAGS_CF), 8(%bp) + pop %bp .endm /* we clobber %bx */ @@ -292,7 +292,7 @@ mul32: /* lo, hi, lo, hi */ disk_reset: movb $0, %ah - clc + clc_stack ret /* this really should be a function, not a macro but i'm lazy */ @@ -395,7 +395,7 @@ disk_reset: pop %ax mov $0, %ah - clc + clc_stack ret .endm @@ -454,12 +454,12 @@ read_disk_drive_parameters: pop %bx /* do this last since it's the most sensitive */ - clc + clc_stack ret alternate_disk_reset: movb $0, %ah - clc + clc_stack ret read_disk_drive_size: @@ -498,21 +498,21 @@ read_disk_drive_size: freea pop %bx - clc + 
clc_stack ret check_if_extensions_present: mov $0x30, %ah mov $0xAA55, %bx mov $0x07, %cx - clc + clc_stack ret .macro extended_read_write_sectors cmd cmpb $10, 0(%si) jg 1f mov $1, %ah - stc + stc_stack ret 1: push %ax @@ -544,7 +544,7 @@ check_if_extensions_present: pop %ax mov $0, %ah - clc + clc_stack ret .endm @@ -612,12 +612,12 @@ get_extended_drive_parameters: pop %ax mov $0, %ah - clc + clc_stack ret terminate_disk_emulation: mov $1, %ah - stc + stc_stack ret int13_handler: -- 1.5.6.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
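The reasoning in the commit message (iret reloads FLAGS from the stack, so only the saved copy matters) can be checked with a small model. The following is a toy sketch in C, not the extboot code: `toy_cpu`, `toy_int`, `toy_iret` and the two handlers are hypothetical, the stack is word-addressed, and the handler runs directly on the interrupt frame. In the patch the byte offset is larger because the macros run inside a subroutine with `%bp` pushed.

```c
#include <assert.h>
#include <stdint.h>

#define FLAGS_CF 0x0001u

/* Toy real-mode CPU: only FLAGS and a small word-addressed stack. */
struct toy_cpu {
    uint16_t flags;
    uint16_t stack[16];
    int sp;               /* index of the top-of-stack word, grows downward */
};

static void toy_push(struct toy_cpu *c, uint16_t v)
{
    assert(c->sp > 0);
    c->stack[--c->sp] = v;
}

static uint16_t toy_pop(struct toy_cpu *c)
{
    assert(c->sp < 16);
    return c->stack[c->sp++];
}

/* "int n": push FLAGS, CS, IP (the CS/IP values are irrelevant here). */
static void toy_int(struct toy_cpu *c)
{
    toy_push(c, c->flags);
    toy_push(c, 0);  /* CS */
    toy_push(c, 0);  /* IP */
}

/* "iret": pop IP, CS, then FLAGS, discarding whatever FLAGS held before. */
static void toy_iret(struct toy_cpu *c)
{
    (void)toy_pop(c);      /* IP */
    (void)toy_pop(c);      /* CS */
    c->flags = toy_pop(c); /* saved FLAGS replaces the live register */
}

/* Sets CF only in the live register, like a plain stc. */
static void handler_live_flags(struct toy_cpu *c)
{
    c->flags |= FLAGS_CF;
}

/* Sets CF in the saved FLAGS word of the interrupt frame.
 * With nothing else pushed, that word sits two words above sp. */
static void handler_saved_flags(struct toy_cpu *c)
{
    c->stack[c->sp + 2] |= FLAGS_CF;
}

/* Run one software interrupt through the handler and return the
 * carry flag the caller observes afterwards (0 or 1). */
static int toy_call(void (*handler)(struct toy_cpu *))
{
    struct toy_cpu c = { .flags = 0, .sp = 16 };

    toy_int(&c);
    handler(&c);
    toy_iret(&c);
    return (c.flags & FLAGS_CF) != 0;
}
```

In this model the caller sees the carry flag only when the handler edits the saved frame; a change to the live register is lost at `toy_iret`.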
[PATCH] extboot: properly set int 0x13 return value
Callers of int 0x13 usually rely on the carry flag being clear/set to indicate the status of the interrupt execution. However, our current code clear or set the flags register, which is totally useless. Whichever value it has, will be overwritten by the flags value _before_ the interrupt, due to the iret instruction. This fixes a bug that prevents slackware (and possibly win2k, untested) to boot. Signed-off-by: Glauber Costa <[EMAIL PROTECTED]> --- extboot/extboot.S | 52 ++-- 1 files changed, 26 insertions(+), 26 deletions(-) diff --git a/extboot/extboot.S b/extboot/extboot.S index 2630abb..4cbfe11 100644 --- a/extboot/extboot.S +++ b/extboot/extboot.S @@ -99,24 +99,24 @@ int19_handler: #define FLAGS_CF 0x01 -.macro clc - push %ax - pushf - pop %ax - and $(~FLAGS_CF), %ax - push %ax - popf - pop %ax +/* The two macro below clear/set the carry flag to indicate the status + * of the interrupt execution. It is not enough to issue a clc/stc instruction, + * since the value of the flags register will be overwritten by whatever is + * in the stack frame + */ +.macro clc_stack + push %bp + mov %sp, %bp + /* 8 = 2 (bp, just pushed) + 2 (ip) + 3 (real mode interrupt frame) + and $(~FLAGS_CF), 8(%bp) + pop %bp .endm -.macro stc - push %ax - pushf - pop %ax - or $(FLAGS_CF), %ax - push %ax - popf - pop %ax +.macro stc_stack + push %bp + /* 8 = 2 (bp, just pushed) + 2 (ip) + 3 (real mode interrupt frame) + or $(FLAGS_CF), 8(%bp) + pop %bp .endm /* we clobber %bx */ @@ -292,7 +292,7 @@ mul32: /* lo, hi, lo, hi */ disk_reset: movb $0, %ah - clc + clc_stack ret /* this really should be a function, not a macro but i'm lazy */ @@ -395,7 +395,7 @@ disk_reset: pop %ax mov $0, %ah - clc + clc_stack ret .endm @@ -454,12 +454,12 @@ read_disk_drive_parameters: pop %bx /* do this last since it's the most sensitive */ - clc + clc_stack ret alternate_disk_reset: movb $0, %ah - clc + clc_stack ret read_disk_drive_size: @@ -498,21 +498,21 @@ read_disk_drive_size: freea pop %bx - clc + 
clc_stack ret check_if_extensions_present: mov $0x30, %ah mov $0xAA55, %bx mov $0x07, %cx - clc + clc_stack ret .macro extended_read_write_sectors cmd cmpb $10, 0(%si) jg 1f mov $1, %ah - stc + stc_stack ret 1: push %ax @@ -544,7 +544,7 @@ check_if_extensions_present: pop %ax mov $0, %ah - clc + clc_stack ret .endm @@ -612,12 +612,12 @@ get_extended_drive_parameters: pop %ax mov $0, %ah - clc + clc_stack ret terminate_disk_emulation: mov $1, %ah - stc + stc_stack ret int13_handler: -- 1.5.6.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
splice() based interguest networking
Here's a random thought I had after seeing the new Xen netchannel2 tree had fast-path support for guest<=>guest communication.

With virtio, we could do really fast interguest networking in userspace. We have a few requirements though:

1) There should be a minimal number of copies, just one in almost all cases.

2) The copy should occur on the receiving end since the receiver is most likely going to be accessing the data in the future.

3) The copy should be done in the kernel so that in the future it could be accelerated with a generic DMA engine.

So far, all the approaches required mmap()'ing the guest memory in both QEMU instances which makes it much less useful. I think splice solves this problem though and gets us most of the above for free. If we have two shared pipes between the two QEMU processes, then:

1) On TX, we vmsplice() from the sg buffer to one pipe. This will end up being vmsplice_to_pipe() in the kernel which is zero-copy.

2) The pipe becomes readable, which will result in an RX notification in the other process. We then see if we have any buffers available in the receive queue. If so, we vmsplice() from the pipe to the sg buffer. This will result in a copy via vmsplice_to_user().

In the future, vmsplice_to_user() would be an obvious candidate for IO-AT acceleration. Since the copy is happening in the kernel, assuming you're not in a highmem situation, no page table manipulation is required.

We still have to address feature negotiation and such.

Regards,

Anthony Liguori
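The TX and RX steps described in this proposal can be sketched with the Linux `vmsplice(2)` system call. This is a minimal, Linux-only illustration that loops one buffer through a pipe inside a single process. `toy_tx`, `toy_rx` and `toy_loopback` are hypothetical helper names, there is no virtio ring or second QEMU process involved, and the sketch assumes the kernel copies pipe data to user memory when `vmsplice()` is called on the read end of a pipe.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* TX side: hand a scatter/gather list to the write end of the pipe.
 * The pages are referenced by the pipe rather than copied, so the
 * sender must not modify them until the receiver has consumed them.
 * Returns the number of bytes queued, or -1 on error. */
static ssize_t toy_tx(int pipe_wr, const struct iovec *sg, size_t nr)
{
    return vmsplice(pipe_wr, sg, nr, 0);
}

/* RX side: drain the read end of the pipe into the receiver's
 * scatter/gather list. This is where the single copy happens.
 * SPLICE_F_NONBLOCK avoids blocking if the pipe is empty. */
static ssize_t toy_rx(int pipe_rd, const struct iovec *sg, size_t nr)
{
    return vmsplice(pipe_rd, sg, nr, SPLICE_F_NONBLOCK);
}

/* Loop one message through a pipe within this process.
 * Returns the number of bytes received, or -1 on any error. */
static ssize_t toy_loopback(const char *msg, size_t len,
                            char *out, size_t out_len)
{
    int fds[2];
    ssize_t sent, got = -1;

    assert(out_len >= len);
    if (pipe(fds) < 0)
        return -1;

    struct iovec tx = { .iov_base = (void *)msg, .iov_len = len };
    sent = toy_tx(fds[1], &tx, 1);
    if (sent > 0) {
        /* Ask only for what was queued so the call cannot wait for more. */
        struct iovec rx = { .iov_base = out, .iov_len = (size_t)sent };
        got = toy_rx(fds[0], &rx, 1);
    }

    close(fds[0]);
    close(fds[1]);
    return got;
}
```

In a real two-process setup the pipe file descriptors would have to be shared between the QEMU instances, and the receiver would wait for the pipe to become readable before calling the RX helper; neither part is shown here.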
[ kvm-Bugs-2351676 ] Guests hang periodically on Ubuntu-8.10
Bugs item #2351676, was opened at 2008-11-26 12:59
Message generated for change (Comment added) made by c_jones
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2351676&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Chris Jones (c_jones)
Assigned to: Nobody/Anonymous (nobody)
Summary: Guests hang periodically on Ubuntu-8.10

Initial Comment:
I'm seeing periodic hangs on my guests. I've been unable so far to find a trigger - they always boot fine, but after anywhere from 10 minutes to 24 hours they eventually hang completely.

My setup:
* AMD Athlon X2 4850e (2500 MHz dual core)
* 4Gig memory
* Ubuntu 8.10 server, 64-bit
* KVMs tried:
  : kvm-72 (shipped with ubuntu)
  : kvm-79 (built myself, --patched-kernel option)
* Kernels tried:
  : 2.6.27.7 (kernel.org, self built)
  : 2.6.27-7-server from Ubuntu 8.10 distribution

In guests
* Ubuntu 8.10 server, 64-bit (virtual machine install)
* kernel 2.6.27-7-server from Ubuntu 8.10

I'm running the guests like:

sudo /usr/local/bin/qemu-system-x86_64 \
     -daemonize \
     -no-kvm-irqchip \
     -hda Imgs/ndev_root.img \
     -m 1024 \
     -cdrom ISOs/ubuntu-8.10-server-amd64.iso \
     -vnc :4 \
     -net nic,macaddr=DE:AD:BE:EF:04:04,model=e1000 \
     -net tap,ifname=tap4,script=/home/chris/kvm/qemu-ifup.sh

The problem does not happen if I use -no-kvm.
I've tried some other options that have no effect:
     -no-kvm-pit
     -no-acpi

The disk images are raw format.

When the guests hang, I cannot ping them, and the vnc console is hung. The qemu monitor is still accessible, and the guests recover if I issue a system_reset command from the monitor. However, often, the console will not take keyboard input after doing so.
When the guest is hung, kvm_stat shows all 0s for the counters:

efer_relo exits fpu_reloa halt_exit halt_wake host_stat hypercall
+insn_emul insn_emul invlpg io_exits irq_exits irq_windo largepage
+mmio_exit mmu_cache mmu_flood mmu_pde_z mmu_pte_u mmu_pte_w mmu_recyc
+mmu_shado nmi_windo pf_fixed pf_guest remote_tl request_i signal_ex
+tlb_flush
> 0 0 0 0 0 0 0
+0 0 0 0 0 0 0 0
+0 0 0 0 0 0 0 0
+0 0 0 0 0 0

gdb shows two threads - both waiting:

c(gdb) info threads
  2 Thread 0x414f1950 (LWP 422) 0x7f36f07a03e1 in sigtimedwait () from /lib/libc.so.6
  1 Thread 0x7f36f1f306e0 (LWP 414) 0x7f36f084b482 in select () from /lib/libc.so.6
(gdb) thread 1
[Switching to thread 1 (Thread 0x7f36f1f306e0 (LWP 414))]#0 0x7f36f084b482
+in select () from /lib/libc.so.6
(gdb) bt
#0 0x7f36f084b482 in select () from /lib/libc.so.6
#1 0x004094cb in main_loop_wait (timeout=0) at /home/chris/pkgs/kvm/kvm-79/qemu/vl.c:4719
#2 0x0050a7ea in kvm_main_loop () at /home/chris/pkgs/kvm/kvm-79/qemu/qemu-kvm.c:619
#3 0x0040fafc in main (argc=, argv=0x79f41948) at /home/chris/pkgs/kvm/kvm-79/qemu/vl.c:4871
(gdb) thread 2
[Switching to thread 2 (Thread 0x414f1950 (LWP 422))]#0 0x7f36f07a03e1 in
+sigtimedwait () from /lib/libc.so.6
(gdb) bt
#0 0x7f36f07a03e1 in sigtimedwait () from /lib/libc.so.6
#1 0x0050a560 in kvm_main_loop_wait (env=0xc319e0, timeout=0) at /home/chris/pkgs/kvm/kvm-79/qemu/qemu-kvm.c:284
#2 0x0050aaf7 in ap_main_loop (_env=) at /home/chris/pkgs/kvm/kvm-79/qemu/qemu-kvm.c:425
#3 0x7f36f11ba3ea in start_thread () from /lib/libpthread.so.0
#4 0x7f36f0852c6d in clone () from /lib/libc.so.6
#5 0x in ?? ()

Any clues to help me resolve this would be much appreciated.

--

>Comment By: Chris Jones (c_jones)
Date: 2008-12-01 14:09

Message:
Alexey,

Thanks for the response. As you advised, I tried a Fedora 8 guest, and it does seem to be much more stable. However, I really need a Debian base system for my application. Not necessarily Ubuntu 8.10, but I haven't had much luck with others either.
Do you have any recommendations on one that is particularly stable? Over the weekend I tried:

Fedora 8        : Seems very stable, but I really need a Debian base.
Ubuntu 8.04LTS  : Same periodic hangs I was seeing on 8.10.
Debian 4.0 Etch : Seems stable on the guest, but on the host the qemu process is running 100% busy while the guest is idle.
STOP error with virtio on KVM-79/2.6.18/Win2k3 x64 guest
Sorry for the repost.. I forgot the subject line!

Hi,

I'm having problems with STOP errors (0x00d1) under KVM-79/2.6.18 whenever I try to use the virtio drivers. This post (http://marc.info/?l=kvm&m=121089259211638&w=2) describes the issue exactly, except that I'm using a Win2k3 x64 guest with the x64 paravirtual drivers instead of 32-bit guest/drivers. I am able to reproduce the problem reliably using iperf, the same as in the above post. When I disable virtio, the guest is very stable.

Any suggestions are greatly appreciated.

-Adrian
[no subject]
Hi,

I'm having problems with STOP errors (0x00d1) under KVM-79/2.6.18 whenever I try to use the virtio drivers. This post (http://marc.info/?l=kvm&m=121089259211638&w=2) describes the issue exactly, except that I'm using a Win2k3 x64 guest with the x64 paravirtual drivers instead of 32-bit guest/drivers. I am able to reproduce the problem reliably using iperf, the same as in the above post. When I disable virtio, the guest is very stable.

Any suggestions are greatly appreciated.

-Adrian
[PATCH] [v2] Remove TARGET_PAGE_SIZE from virtio interface
TARGET_PAGE_SIZE should only be used internal to qemu, not in guest/host
interfaces. The virtio frontend code in Linux uses two constants (PFN shift
and vring alignment) for the interface, so update qemu to match.

I've tested this with PowerPC KVM and confirmed that it fixes virtio problems
when using non-TARGET_PAGE_SIZE pages in the guest.

Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]>
---
Corrects a silly bug in v1. Paul Brook doesn't like the idea of a generic
align() macro, so vring_align() is correct.
---
 hw/virtio.c | 16 +---
 hw/virtio.h |  6 ++
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index e4224ab..0134b0b 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -51,6 +51,14 @@
 /* Virtio ABI version, if we increment this, we break the guest driver. */
 #define VIRTIO_PCI_ABI_VERSION 0

+/* How many bits to shift physical queue address written to QUEUE_PFN.
+ * 12 is historical, and due to x86 page size. */
+#define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12
+
+/* The alignment to use between consumer and producer parts of vring.
+ * x86 pagesize again. */
+#define VIRTIO_PCI_VRING_ALIGN 4096
+
 /* QEMU doesn't strictly need write barriers since everything runs in
  * lock-step. We'll leave the calls to wmb() in though to make it obvious for
  * KVM or if kqemu gets SMP support.
@@ -110,7 +118,9 @@ static void virtqueue_init(VirtQueue *vq, target_phys_addr_t pa)
 {
     vq->vring.desc = pa;
     vq->vring.avail = pa + vq->vring.num * sizeof(VRingDesc);
-    vq->vring.used = TARGET_PAGE_ALIGN(vq->vring.avail + offsetof(VRingAvail, ring[vq->vring.num]));
+    vq->vring.used = vring_align(vq->vring.avail +
+                                 offsetof(VRingAvail, ring[vq->vring.num]),
+                                 VIRTIO_PCI_VRING_ALIGN);
 }

 static inline uint64_t vring_desc_addr(VirtQueue *vq, int i)
@@ -386,7 +396,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
         vdev->features = val;
         break;
     case VIRTIO_PCI_QUEUE_PFN:
-        pa = (ram_addr_t)val << TARGET_PAGE_BITS;
+        pa = (ram_addr_t)val << VIRTIO_PCI_QUEUE_ADDR_SHIFT;
         vdev->vq[vdev->queue_sel].pfn = val;
         if (pa == 0)
             virtio_reset(vdev);
@@ -660,7 +670,7 @@ void virtio_load(VirtIODevice *vdev, QEMUFile *f)
         if (vdev->vq[i].pfn) {
             target_phys_addr_t pa;

-            pa = (ram_addr_t)vdev->vq[i].pfn << TARGET_PAGE_BITS;
+            pa = (ram_addr_t)vdev->vq[i].pfn << VIRTIO_PCI_QUEUE_ADDR_SHIFT;
             virtqueue_init(&vdev->vq[i], pa);
         }
     }
diff --git a/hw/virtio.h b/hw/virtio.h
index 1df8f83..ae92ece 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -47,6 +47,12 @@
 /* This means don't interrupt guest when buffer consumed. */
 #define VRING_AVAIL_F_NO_INTERRUPT 1

+static inline target_phys_addr_t vring_align(target_phys_addr_t addr,
+                                             unsigned long align)
+{
+    return (addr + align - 1) & ~(align - 1);
+}
+
 typedef struct VirtQueue VirtQueue;
 typedef struct VirtIODevice VirtIODevice;

--
1.5.6.5
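As a worked illustration of the two constants, the following is a minimal standalone sketch of how a ring's three regions could be placed from a PFN written by the guest. It is not the qemu code: the `Sketch*` structures are assumed stand-ins for qemu's VRingDesc/VRingAvail, and the PFN and queue size used in the note below are made-up example values. Only the shift, the alignment, and the rounding expression are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants as introduced by the patch. */
#define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12
#define VIRTIO_PCI_VRING_ALIGN      4096

/* Assumed stand-ins for qemu's ring structures (illustration only). */
typedef struct SketchVRingDesc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
} SketchVRingDesc;

typedef struct SketchVRingAvail {
    uint16_t flags;
    uint16_t idx;
    uint16_t ring[];
} SketchVRingAvail;

typedef struct SketchVRing {
    unsigned int num;
    uint64_t desc;
    uint64_t avail;
    uint64_t used;
} SketchVRing;

/* Same rounding as vring_align() in the patch; align must be a power of two. */
static inline uint64_t sketch_vring_align(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Place desc, avail and used for a queue of `num` entries at the given PFN. */
static inline void sketch_vring_layout(SketchVRing *vr, uint32_t pfn,
                                       unsigned int num)
{
    /* The guest writes a PFN; the shift is fixed by the interface,
     * independent of the target's page size. */
    uint64_t pa = (uint64_t)pfn << VIRTIO_PCI_QUEUE_ADDR_SHIFT;
    uint64_t avail_end;

    assert(vr != NULL);
    vr->num = num;
    vr->desc = pa;
    vr->avail = pa + (uint64_t)num * sizeof(SketchVRingDesc);
    avail_end = vr->avail + offsetof(SketchVRingAvail, ring)
                + (uint64_t)num * sizeof(uint16_t);
    vr->used = sketch_vring_align(avail_end, VIRTIO_PCI_VRING_ALIGN);
}
```

With the example values pfn = 0x10 and num = 256, the descriptor table starts at 0x10000, the avail ring at 0x11000, and the used ring is rounded up from 0x11204 to 0x12000, regardless of what TARGET_PAGE_SIZE is on the target.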
Re: [PATCH 3/9] add frontend implementation for the IOMMU API
On Fri, Nov 28, 2008 at 10:50:36AM +0800, Han, Weidong wrote:
> Joerg Roedel wrote:
> > +struct iommu_domain *iommu_domain_alloc(struct device *dev)
> > +{
> > +    struct iommu_domain *domain;
> > +    int ret;
> > +
> > +    domain = kmalloc(sizeof(*domain), GFP_KERNEL);
> > +    if (!domain)
> > +        return NULL;
> > +
> > +    ret = iommu_ops->domain_init(domain, dev);
> > +    if (ret)
> > +        goto out_free;
> > +
> > +    return domain;
> > +
> > +out_free:
> > +    kfree(domain);
> > +
> > +    return NULL;
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_domain_alloc);
>
> remove the parameter dev.

[x] Done.

> > +
> > +void iommu_domain_free(struct iommu_domain *domain)
> > +{
> > +    iommu_ops->domain_destroy(domain);
> > +    kfree(domain);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_domain_free);
> > +
> > +int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
> > +{
> > +    return iommu_ops->attach_dev(domain, dev);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_attach_device);
> > +
> > +void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
> > +{
> > +    iommu_ops->detach_dev(domain, dev);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_detach_device);
> > +
> > +int iommu_map_address(struct iommu_domain *domain,
> > +                      dma_addr_t iova, phys_addr_t paddr,
> > +                      size_t size, int prot)
> > +{
> > +    return iommu_ops->map(domain, iova, paddr, size, prot);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_map_address);
>
> change to:
> int iommu_map_pages(struct iommu_domain *domain, unsigned long gfn,
>                     unsigned long pfn, unsigned long npages, int prot)
> {
>     return iommu_ops->map(domain, gfn, pfn, npages, prot);
> }
> EXPORT_SYMBOL_GPL(iommu_map_pages);
>
> int iommu_unmap_pages(struct iommu_domain *domain, unsigned long gfn,
>                       unsigned long npages)
> {
>     return iommu_ops->map(domain, gfn, npages);
> }
> EXPORT_SYMBOL_GPL(iommu_unmap_pages);

Ok, I added the unmap function. But I think this API should work with addresses instead of page numbers. This way the IO page size is transparent for the user.

--
           | AMD Saxony Limited Liability Company & Co. KG
 Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    | Register Court Dresden: HRA 4896
 Research  | General Partner authorized to represent:
 Center    | AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
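To make the address-versus-page-number point concrete, here is a small userspace sketch, not kernel code, in which a page-number style helper is layered on top of an address-based entry point. Everything named `sketch_*` is hypothetical, and `SKETCH_PAGE_SHIFT` is an assumed 4 KiB page size; the real iommu_ops callbacks are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed CPU page size for the page-number wrapper (4 KiB). */
#define SKETCH_PAGE_SHIFT 12u

/* Hypothetical record of one requested mapping. */
struct sketch_mapping {
    uint64_t iova;   /* IO virtual address */
    uint64_t paddr;  /* physical address */
    uint64_t size;   /* length in bytes */
};

/* Address-based entry point: no page size appears in the interface, so
 * the backend is free to pick whatever IO page size it supports. */
static inline int sketch_map_address(struct sketch_mapping *out, uint64_t iova,
                                     uint64_t paddr, uint64_t size)
{
    assert(out != NULL);
    out->iova = iova;
    out->paddr = paddr;
    out->size = size;
    return 0;
}

/* A page-number caller can still be served by converting at the edge. */
static inline int sketch_map_pages(struct sketch_mapping *out,
                                   unsigned long gfn, unsigned long pfn,
                                   unsigned long npages)
{
    return sketch_map_address(out,
                              (uint64_t)gfn << SKETCH_PAGE_SHIFT,
                              (uint64_t)pfn << SKETCH_PAGE_SHIFT,
                              (uint64_t)npages << SKETCH_PAGE_SHIFT);
}
```

The conversion only goes one way without loss: a page-number interface bakes one page size into every caller, while the address-based one leaves that choice to the implementation.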
Re: [Qemu-devel] qemu-img commit -- is there a limit on file sizes?
Anthony Liguori wrote: We've started getting some reports of corruption on "commit" in KVM. There is a long standing disk corruption issue too that is very difficult to reproduce. The thinking is that there is a bug somewhere in the qcow2 code. Is anyone actively looking into this? I am, though my actively is a lot less than could be desired. Additional eyes would be welcome. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Compiling error : ppc440_bamboo.o fails
Hi Giuseppe, thanks for your mail. Feel free to CC [EMAIL PROTECTED] in the future, too... :) On Mon, 2008-12-01 at 15:08 +0100, Giuseppe Falsetti wrote: > Error messages: > gcc -I. -I.. -I/root/kvm-userspace/qemu/target-ppc > -I/root/kvm-userspace/qemu -MMD -MT ppc440_bamboo.o -MP -DNEED_CPU_H > -D__powerpc__ -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE > -D__user= -I/root/kvm-userspace/qemu/tcg > -I/root/kvm-userspace/qemu/tcg/ppc -I/root/kvm-userspace/qemu/fpu > -DHAS_AUDIO -DHAS_AUDIO_CHOICE -I/root/kvm-userspace/qemu/slirp -I > /root/kvm-userspace/qemu/../libkvm -I /root/kvm-userspace/libfdt -O2 -g > -fno-strict-aliasing -Wall -Wundef -Wendif-labels -Wwrite-strings -I > /root/kvm-userspace/kernel/include -c -o ppc440_bamboo.o > /root/kvm-userspace/qemu/hw/ppc440_bamboo.c > /root/kvm-userspace/qemu/hw/ppc440_bamboo.c: In function 'bamboo_init': > /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:108: warning: passing > argument 2 of 'load_uimage' from incompatible pointer type > /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:108: warning: passing > argument 3 of 'load_uimage' from incompatible pointer type > /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:108: error: too many > arguments to function 'load_uimage' Sorry about that... I'm currently in the process of merging PowerPC KVM support into upstream qemu, and due to this the kvm qemu fork has broken. 
> /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:139: warning: passing > argument 1 of 'read_proc_dt_prop_cell' discards qualifiers from pointer > target type > /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:140: warning: passing > argument 1 of 'read_proc_dt_prop_cell' discards qualifiers from pointer > target type > /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:173: warning: passing > argument 2 of 'dt_cell' discards qualifiers from pointer target type > /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:173: warning: passing > argument 3 of 'dt_cell' discards qualifiers from pointer target type > /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:174: warning: passing > argument 2 of 'dt_cell' discards qualifiers from pointer target type > /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:174: warning: passing > argument 3 of 'dt_cell' discards qualifiers from pointer target type > /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:176: warning: passing > argument 2 of 'dt_cell_multi' discards qualifiers from pointer target type > /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:176: warning: passing > argument 3 of 'dt_cell_multi' discards qualifiers from pointer target type > /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:177: warning: passing > argument 2 of 'dt_cell' discards qualifiers from pointer target type > /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:177: warning: passing > argument 3 of 'dt_cell' discards qualifiers from pointer target type > /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:179: warning: passing > argument 2 of 'dt_cell' discards qualifiers from pointer target type > /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:179: warning: passing > argument 3 of 'dt_cell' discards qualifiers from pointer target type > /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:180: warning: passing > argument 2 of 'dt_string' discards qualifiers from pointer target type > /root/kvm-userspace/qemu/hw/ppc440_bamboo.c:180: warning: passing > argument 3 of 'dt_string' discards qualifiers from pointer target type 
These are annoying, but just warnings so we can ignore them for now.

I can provide you a patch to get you going again right now, but just to clarify: do you have a 440 system you're going to be running KVM on, and the G5 is just your build host? There currently is no KVM support for 970...

--
Hollis Blanchard
IBM Linux Technology Center
Re: [PATCH 3/9] add frontend implementation for the IOMMU API
On Tue, Dec 02, 2008 at 12:58:29AM +0900, FUJITA Tomonori wrote: > On Mon, 01 Dec 2008 16:33:11 +0200 > Avi Kivity <[EMAIL PROTECTED]> wrote: > > > Joerg Roedel wrote: > > > Hmm, is there any hardware IOMMU with which we can't emulate domains by > > > partitioning the IO address space? This concept works for GART and > > > Calgary. > > > > > > > > > > Is partitioning secure? Domain X's user could program its hardware to > > dma to domain Y's addresses, zapping away Domain Y's user's memory. > > It can't be secure. So what's the point to emulate the domain > partitioning in many traditional hardware IOMMUs that doesn't support > it. Btw, if you use the k8-agp driver the GART space is already partitioned today. So this concept is not entirely new. Joerg -- | AMD Saxony Limited Liability Company & Co. KG Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany System| Register Court Dresden: HRA 4896 Research | General Partner authorized to represent: Center| AMD Saxony LLC (Wilmington, Delaware, US) | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] qemu-img commit -- is there a limit on file sizes?
walt wrote: Some background for my question: I've been trying to install and then update Windows Vista using kvm. Everything works great until I use 'qemu-img commit' to apply all the Windows Updates to my original base install of Vista. After doing the qemu-img commit step, the backing file is now corrupt, 100% reproducibly. I don't have the same problem with Windows XP, however, and I wondered if the problem is caused by the sheer size of the commit that Vista requires. When I install XP, then windows-update, and then qemu-img commit the updates, I'm committing about 1GB of updates to a 3GB backing file. When I install Vista and then later commit the Vista updates, I'm committing a 3GB file to a 6GB backing file, and that's when the corruption happens every time. So I tried an experiment with Vista -- I deliberately limit the number of windows updates I allow at any one time, and then use qemu-img commit after each small update. Voila, everything now works perfectly -- no file corruption! We've started getting some reports of corruption on "commit" in KVM. There is a long standing disk corruption issue too that is very difficult to reproduce. The thinking is that there is a bug somewhere in the qcow2 code. Is anyone actively looking into this? Regards, Anthony Liguori And that's why I suspect there is a functional limit to the size of each commit I can do with qemu-img. Any thoughts or possible diagnostic maneuvers to be tried? Thanks! (BTW, I get the same results using 32-bit linux and 64-bit linux on the same amd64 machine, using both gcc3 and gcc4.) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/9] add frontend implementation for the IOMMU API
On Tue, Dec 02, 2008 at 12:58:29AM +0900, FUJITA Tomonori wrote:
> On Mon, 01 Dec 2008 16:33:11 +0200
> Avi Kivity <[EMAIL PROTECTED]> wrote:
>
> > Joerg Roedel wrote:
> > > Hmm, is there any hardware IOMMU with which we can't emulate domains by
> > > partitioning the IO address space? This concept works for GART and
> > > Calgary.
> >
> > Is partitioning secure? Domain X's user could program its hardware to
> > dma to domain Y's addresses, zapping away Domain Y's user's memory.
>
> It can't be secure. So what's the point to emulate the domain
> partitioning in many traditional hardware IOMMUs that doesn't support
> it.

It's a generic way to make non-contiguous host memory io-contiguous. I already pointed out some potential users for this.

--
           | AMD Saxony Limited Liability Company & Co. KG
 Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    | Register Court Dresden: HRA 4896
 Research  | General Partner authorized to represent:
 Center    | AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
Re: [PATCH] KVM: Qemu: push_nmi should be only used by I386 Arch.
On Fri, 2008-11-28 at 10:26 +0100, Jan Kiszka wrote: > Zhang, Xiantao wrote: > >>From c25fa2e4de40e500bd364c3267d5be89a9cfbb4d Mon Sep 17 00:00:00 2001 > > From: Xiantao Zhang <[EMAIL PROTECTED]> > > Date: Fri, 28 Nov 2008 09:38:46 +0800 > > Subject: [PATCH] KVM: Qemu: push_nmi should be only used by I386 Arch. > > > > Use TARGET_I386 to exclude other archs. > > Signed-off-by: Xiantao Zhang <[EMAIL PROTECTED]> > > --- > > libkvm/libkvm.c |4 ++-- > > qemu/qemu-kvm.c |4 > > 2 files changed, 6 insertions(+), 2 deletions(-) > > > > diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c > > index 40c95ce..851a93a 100644 > > --- a/libkvm/libkvm.c > > +++ b/libkvm/libkvm.c > > @@ -868,7 +868,7 @@ int kvm_run(kvm_context_t kvm, int vcpu, void *env) > > struct kvm_run *run = kvm->run[vcpu]; > > > > again: > > -#ifdef KVM_CAP_NMI > > +#ifdef TARGET_I386 > > push_nmi(kvm); > > #endif > > #if !defined(__s390__) > > @@ -1032,7 +1032,7 @@ int kvm_has_sync_mmu(kvm_context_t kvm) > > > > int kvm_inject_nmi(kvm_context_t kvm, int vcpu) > > { > > -#ifdef KVM_CAP_NMI > > +#ifdef TARGET_I386 > > return ioctl(kvm->vcpu_fd[vcpu], KVM_NMI); > > #else > > return -ENOSYS; > > diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c > > index cf0e85d..b6c8288 100644 > > --- a/qemu/qemu-kvm.c > > +++ b/qemu/qemu-kvm.c > > @@ -154,10 +154,12 @@ static int try_push_interrupts(void *opaque) > > return kvm_arch_try_push_interrupts(opaque); > > } > > > > +#ifdef TARGET_I386 > > static void push_nmi(void *opaque) > > { > > kvm_arch_push_nmi(opaque); > > } > > +#endif > > > > static void post_kvm_run(void *opaque, void *data) > > { > > @@ -742,7 +744,9 @@ static struct kvm_callbacks qemu_kvm_ops = { > > .shutdown = kvm_shutdown, > > .io_window = kvm_io_window, > > .try_push_interrupts = try_push_interrupts, > > +#ifdef TARGET_I386 > > .push_nmi = push_nmi, > > +#endif > > .post_kvm_run = post_kvm_run, > > .pre_kvm_run = pre_kvm_run, > > #ifdef TARGET_I386 > > This will now break when KVM_CAP_NMI is undefined, ie. 
when there is no
> KVM_NMI IOCTL (=> older kvm module sets).

Guys, we already have stubs for this (although they've been turned into dead code). Jan broke IA64 and PowerPC builds when he renamed "kvm_arch_try_push_nmi" to "kvm_arch_push_nmi", and the obvious fix is to update the stubs to match. That avoids all these ifdefs and associated problems.

Avi, could you revert a8d12f98755be9330fcde055134511f76ecaa538 please?

--
Hollis Blanchard
IBM Linux Technology Center
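The stub approach can be sketched roughly as follows. This is a standalone illustration and not the qemu-kvm source: the `sketch_*` names and the `SKETCH_ARCH_HAS_NMI` macro are made up, and the macro merely stands in for what would be separate per-architecture source files in the real tree. The point is that the common code and the callback table carry no #ifdef at all.

```c
#include <assert.h>
#include <stddef.h>

/* Counts NMIs actually delivered by the arch hook (for illustration). */
static int sketch_nmi_pushed;

/* Per-arch hook. In a real tree each definition would live in its own
 * arch file; SKETCH_ARCH_HAS_NMI is a hypothetical stand-in for that. */
#ifdef SKETCH_ARCH_HAS_NMI
static void sketch_arch_push_nmi(void *opaque)
{
    (void)opaque;
    sketch_nmi_pushed++;
}
#else
/* Stub for architectures without NMI injection: same name, no-op body. */
static void sketch_arch_push_nmi(void *opaque)
{
    (void)opaque;
}
#endif

/* Common code: identical on every architecture. */
static void sketch_push_nmi(void *opaque)
{
    sketch_arch_push_nmi(opaque);
}

struct sketch_callbacks {
    void (*push_nmi)(void *opaque);
};

static const struct sketch_callbacks sketch_ops = {
    .push_nmi = sketch_push_nmi,
};

/* Caller side: always invokes the callback, the stub absorbs the call. */
static inline void sketch_run_once(const struct sketch_callbacks *ops,
                                   void *opaque)
{
    assert(ops != NULL && ops->push_nmi != NULL);
    ops->push_nmi(opaque);
}
```

Built without the macro, calling `sketch_run_once(&sketch_ops, NULL)` goes through the stub and delivers nothing, which is the behaviour wanted on architectures that have no NMI to push.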
Re: [PATCH 3/9] add frontend implementation for the IOMMU API
On Mon, 01 Dec 2008 16:33:11 +0200
Avi Kivity <[EMAIL PROTECTED]> wrote:

> Joerg Roedel wrote:
> > Hmm, is there any hardware IOMMU with which we can't emulate domains by
> > partitioning the IO address space? This concept works for GART and
> > Calgary.
>
> Is partitioning secure? Domain X's user could program its hardware to
> dma to domain Y's addresses, zapping away Domain Y's user's memory.

It can't be secure. So what's the point of emulating domain partitioning in the many traditional hardware IOMMUs that don't support it?

The emulated domain support with the DMA mapping debugging feature might be useful to debug drivers, but that doesn't mean we need to add the emulated domain support to every hardware IOMMU. If you add it to swiotlb, everyone can enjoy the debugging.
Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests
Avi Kivity wrote: Anthony Liguori wrote: I see no compelling reason to do cpu placement internally. It can be done quite effectively externally. Memory allocation is tough, but I don't think it's out of reach. Looking at the numactl man page, you can do: numactl --offset=1G --length=1G --membind=1 --file /dev/shm/A --touch Bind the second gigabyte in the tmpfs file /dev/shm/A to node 1. Since we can already create VM's with the -mem-path argument, if you create a 2GB guest and want it to span two numa nodes, you could do: numactl --offset=0G --length=1G --membind=0 --file /dev/shm/A --touch numactl --offset=1G --length=1G --membind=1 --file /dev/shm/A --touch And then create the VM with: qemu-system-x86_64 -mem-path /dev/shm/A -mem 2G ... What's best about this approach, is that you get full access to what numactl is capable of. Interleaving, rebalancing, etc. It looks horribly difficult and unintuitive. It forces you to use -mem-path (which is an abomination; the only reason it lives is that we can't allocate large pages with it). As opposed to inventing new options for QEMU that convey all of the same information a slightly different way? We're stuck with -mem-path so we might as well make good use of it. The proposed syntax is: qemu -numanode node=1,cpu=2,cpu=3,start=1G,size=1G,hostnode=3 The new syntax would be: qemu -smp 4 -numa nodes=2,cpus=1:2:3:4,mem=1G:1G -mem-path /dev/hugetlbfs/foo Then you would have to look up the thread ids, and do taskset taskset taskset taskset numactl -o 1G -l 1G -m 0 -f /dev/hugetlbfs/foo numactl -o 1G -l 1G -m 1 -f /dev/hugetlbfs/foo This may look like a lot more, but it's not going to be nearly enough to specify a NUMA placement on startup. What if you have a very large NUMA system and want to rebalance virtual machines? You need a mechanism to do this that now has to be exposed through the monitor. In fact, you'll almost certainly introduce a taskset-like monitor command and a numactl-like monitor command. 
Why reinvent the wheel? Plus, taskset and numactl give you a lot of flexibility. All we're going to do by cooking this stuff into QEMU is artificially limit ourselves.

Regards,

Anthony Liguori
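For readers who have not used taskset, the external pinning described here boils down to one system call per vcpu thread. The sketch below assumes a Linux host with glibc; `sched_setaffinity()` and the `CPU_*` macros are the real interfaces, while the `sketch_*` helpers and the way a thread ID is obtained are illustrative assumptions only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <sys/types.h>

/* Build a CPU mask from a list of CPU numbers. Returns -1 if any CPU
 * number does not fit into a cpu_set_t. */
static inline int sketch_build_cpu_set(const int *cpus, size_t ncpus,
                                       cpu_set_t *out)
{
    size_t i;

    assert(cpus != NULL && out != NULL);
    CPU_ZERO(out);
    for (i = 0; i < ncpus; i++) {
        if (cpus[i] < 0 || cpus[i] >= CPU_SETSIZE)
            return -1;
        CPU_SET(cpus[i], out);
    }
    return 0;
}

/* Roughly what taskset does for an existing thread: restrict thread
 * `tid` (as exported by the VM process) to the listed host CPUs.
 * Returns 0 on success, -1 on error. */
static inline int sketch_pin_thread(pid_t tid, const int *cpus, size_t ncpus)
{
    cpu_set_t set;

    if (sketch_build_cpu_set(cpus, ncpus, &set) != 0)
        return -1;
    return sched_setaffinity(tid, sizeof(set), &set);
}
```

A management tool would call something like `sketch_pin_thread()` once per exported vcpu thread ID and can call it again later to rebalance, with no involvement from the VM process itself.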
Re: [PATCH 3/9] add frontend implementation for the IOMMU API
On Mon, Dec 01, 2008 at 04:33:11PM +0200, Avi Kivity wrote:
> Joerg Roedel wrote:
> > Hmm, is there any hardware IOMMU with which we can't emulate domains by
> > partitioning the IO address space? This concept works for GART and
> > Calgary.
>
> Is partitioning secure? Domain X's user could program its hardware to
> dma to domain Y's addresses, zapping away Domain Y's user's memory.

No, it's not secure. But this problem exists with pv-dma without iommu too.

Joerg

--
           | AMD Saxony Limited Liability Company & Co. KG
 Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    | Register Court Dresden: HRA 4896
 Research  | General Partner authorized to represent:
 Center    | AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests
Anthony Liguori wrote: Avi Kivity wrote: Andre Przywara wrote: Any other useful commands for the monitor? Maybe (temporary) VCPU migration without page migration? Right now vcpu migration is done externally (we export the thread IDs so management can pin them as it wishes). If we add numa support, I think it makes sense do it internally as well. I suggest using the same syntax for the monitor as for the command line; that's simplest to learn and to implement. I see no compelling reason to do cpu placement internally. It can be done quite effectively externally. Memory allocation is tough, but I don't think it's out of reach. Looking at the numactl man page, you can do: numactl --offset=1G --length=1G --membind=1 --file /dev/shm/A --touch Bind the second gigabyte in the tmpfs file /dev/shm/A to node 1. Since we can already create VM's with the -mem-path argument, if you create a 2GB guest and want it to span two numa nodes, you could do: numactl --offset=0G --length=1G --membind=0 --file /dev/shm/A --touch numactl --offset=1G --length=1G --membind=1 --file /dev/shm/A --touch And then create the VM with: qemu-system-x86_64 -mem-path /dev/shm/A -mem 2G ... What's best about this approach, is that you get full access to what numactl is capable of. Interleaving, rebalancing, etc. It looks horribly difficult and unintuitive. It forces you to use -mem-path (which is an abomination; the only reason it lives is that we can't allocate large pages with it). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests
Anthony Liguori wrote: Andre Przywara wrote: Hi, this patch series introduces multiple NUMA nodes support within KVM guests. This will improve the performance of guests which are bigger than one node (number of VCPUs and/or amount of memory) and also allows better balancing by taking better usage of each node's memory. It also improves the one node case by pinning a guest to this node and avoiding access of remote memory from one VCPU. Could you please post this to qemu-devel? There's really nothing KVM specific here. It's almost useless to qemu until it can run vcpus on host threads. I agree it should be posted there though. I think the dependency on libnuma is a bad idea. It's mixing a mechanism (emulating NUMA layout) with a policy (how to do memory/VCPU placement). If you split the NUMA emulation bits into a separate patch series, that has no dependency on the host NUMA topology, I think we look at the existing mechanisms we have to see if they're sufficient to do static placement on NUMA boundaries. vcpu pinning is easy enough, I think the only place we're lacking is memory layout. Note, that's totally independent of the guest's NUMA characteristics though. You may still want half of memory to be pinned between two nodes even if the guest has no SRAT tables. You can do that easily with numactl. Fine grained control of host numa layout and guest numa emulation are only useful together (one could argue that guest numa emulation is useful by itself, for debugging the guest OS numa algorithms). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests
Anthony Liguori wrote:
> numactl --offset=0G --length=1G --membind=0 --file /dev/shm/A --touch
> numactl --offset=1G --length=1G --membind=1 --file /dev/shm/A --touch
>
> And then create the VM with:
>
> qemu-system-x86_64 -mem-path /dev/shm/A -mem 2G ...
>
> What's best about this approach, is that you get full access to what
> numactl is capable of. Interleaving, rebalancing, etc.

Prefaulting, generating an error when NUMA placement can't be satisfied, hugetlbfs support, yeah, this very much seems like the right thing to do to me.

If you care enough about performance to do NUMA placement, you almost certainly are going to be doing hugetlbfs anyway so you get it practically for free.

> Regards,
>
> Anthony Liguori

Regards,

Anthony Liguori
1-1 mapping of devices without VT-d
Hi everyone,

I want to assign a PCI device directly to a VM (PCI passthrough) on a machine that does not have VT-d. I found something related to this, called 1-1 mapping, in a presentation given at the 2008 KVM Forum, and a patch for it at http://thread.gmane.org/gmane.comp.emulators.kvm.devel/18722/focus=18753. Is this included in the latest KVM version, or are there plans to include it?

Thanks in advance,

Pablo Pássera
Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests
Avi Kivity wrote: Andre Przywara wrote: Any other useful commands for the monitor? Maybe (temporary) VCPU migration without page migration? Right now vcpu migration is done externally (we export the thread IDs so management can pin them as it wishes). If we add numa support, I think it makes sense do it internally as well. I suggest using the same syntax for the monitor as for the command line; that's simplest to learn and to implement. I see no compelling reason to do cpu placement internally. It can be done quite effectively externally. Memory allocation is tough, but I don't think it's out of reach. Looking at the numactl man page, you can do: numactl --offset=1G --length=1G --membind=1 --file /dev/shm/A --touch Bind the second gigabyte in the tmpfs file /dev/shm/A to node 1. Since we can already create VM's with the -mem-path argument, if you create a 2GB guest and want it to span two numa nodes, you could do: numactl --offset=0G --length=1G --membind=0 --file /dev/shm/A --touch numactl --offset=1G --length=1G --membind=1 --file /dev/shm/A --touch And then create the VM with: qemu-system-x86_64 -mem-path /dev/shm/A -mem 2G ... What's best about this approach, is that you get full access to what numactl is capable of. Interleaving, rebalancing, etc. Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests
Andre Przywara wrote:
> Hi,
>
> this patch series introduces multiple NUMA nodes support within KVM guests.
> This will improve the performance of guests which are bigger than one node
> (number of VCPUs and/or amount of memory) and also allows better balancing
> by taking better usage of each node's memory. It also improves the one node
> case by pinning a guest to this node and avoiding access of remote memory
> from one VCPU.

Could you please post this to qemu-devel? There's really nothing KVM specific here.

> The user (or better: management application) specifies the host nodes the
> guest should use: -nodes 2,3 would create a two node guest mapped to node 2
> and 3 on the host. These numbers are handed over to libnuma: VCPUs are
> pinned to the nodes and the allocated guest memory is bound to it's
> respective node. Since libnuma seems not to be installed everywhere, the
> user has to enable this via configure --enable-numa
> In the BIOS code an ACPI SRAT table was added, which describes the NUMA
> topology to the guest. The number of nodes is communicated via the CMOS RAM
> (offset 0x3E). If someone thinks of this as a bad idea, tell me.

I think the dependency on libnuma is a bad idea. It's mixing a mechanism (emulating NUMA layout) with a policy (how to do memory/VCPU placement).

If you split the NUMA emulation bits into a separate patch series, that has no dependency on the host NUMA topology, I think we look at the existing mechanisms we have to see if they're sufficient to do static placement on NUMA boundaries. vcpu pinning is easy enough, I think the only place we're lacking is memory layout. Note, that's totally independent of the guest's NUMA characteristics though. You may still want half of memory to be pinned between two nodes even if the guest has no SRAT tables.

Regards,

Anthony Liguori

> To take use of the new BIOS, install the iasl compiler
> (http://acpica.org/downloads/) and type "make bios" before installing, so
> the default BIOS will be replaced with the modified one.
> Node over-committing is allowed (-nodes 0,0,0,0), omitting the -nodes
> parameter reverts to the old behavior.
>
> Please apply.
>
> Regards,
> Andre.
>
> Patch 1/3: introduce a command line parameter
> Patch 2/3: allocate guests resources from different host nodes
> Patch 3/3: generate an appropriate SRAT ACPI table
>
> Signed-off-by: Andre Przywara <[EMAIL PROTECTED]>
Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests
Daniel P. Berrange wrote: The only problem is the default option for the host side, as libnuma requires to explicitly name the nodes. Maybe make the pin: part _not_ optional? I would at least want to pin the memory, one could discuss about the VCPUs... I think keeping it optional makes things more flexible for people invoking KVM. If omitted, then query current CPU pinning to determine which host NUMA nodes to allocate from. Well, -numa itself is optional. But yes, we could use the default cpu affinity mask to derive the default host numa nodes. The topology exposed to a guest will likely be the same every time you launch a particular VM, while the guest<-> host pinning is a point in time decision according to current available resources. Thus some apps / users may find it more convenient to have a fixed set of args they always use to invoke the KVM process, and instead control placement during the fork/exec'ing of KVM by explicitly calling sched_setaffinity or using numactl to launch. It should be easy enough to use sched_getaffinity to query current pining and from that determine appropriate NUMA nodes, if they leave out the pin= arg. I agree, nice idea. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
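The default being agreed on here (no pin= argument, so derive the host nodes from the process's current CPU affinity) could look roughly like the sketch below. It assumes a Linux host with glibc; `sched_getaffinity()` and the `CPU_*` macros are real, but the `sketch_*` functions and the `cpu_to_node[]` table are hypothetical. A real implementation would obtain the CPU-to-node mapping from the host, for instance via libnuma, which is deliberately left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Return a bitmask of host NUMA nodes covered by the CPUs in *set.
 * cpu_to_node[] is a caller-supplied table (hypothetical here) giving
 * the node of each of the first ncpus host CPUs. */
static inline unsigned long sketch_nodes_from_cpuset(const cpu_set_t *set,
                                                     const int *cpu_to_node,
                                                     int ncpus)
{
    unsigned long nodes = 0;
    int cpu;

    assert(set != NULL && cpu_to_node != NULL);
    assert(ncpus >= 0 && ncpus <= CPU_SETSIZE);
    for (cpu = 0; cpu < ncpus; cpu++) {
        if (CPU_ISSET(cpu, set)) {
            assert(cpu_to_node[cpu] >= 0 &&
                   cpu_to_node[cpu] < (int)(8 * sizeof(nodes)));
            nodes |= 1UL << cpu_to_node[cpu];
        }
    }
    return nodes;
}

/* Default when no pinning was given on the command line: use whatever
 * affinity the launcher (numactl, taskset, sched_setaffinity before
 * exec) left in place. Returns 0 on success, -1 on error. */
static inline int sketch_default_host_nodes(const int *cpu_to_node, int ncpus,
                                            unsigned long *nodes)
{
    cpu_set_t set;

    assert(nodes != NULL);
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    *nodes = sketch_nodes_from_cpuset(&set, cpu_to_node, ncpus);
    return 0;
}
```

With a four-CPU table of {0, 0, 1, 1} and an inherited affinity of CPUs 2 and 3, the result is a mask with only node 1 set, so guest memory would be taken from that node alone.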
Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests
On Mon, Dec 01, 2008 at 03:15:19PM +0100, Andre Przywara wrote:
> Avi Kivity wrote:
> >>Node over-committing is allowed (-nodes 0,0,0,0), omitting the -nodes
> >>parameter reverts to the old behavior.
> >
> >'-nodes' is too generic a name ('node' could also mean a host). Suggest
> >-numanode.
> >
> >Need more flexibility: specify the range of memory per node, which cpus
> >are in the node, relative weights for the SRAT table:
> >
> >  -numanode node=1,cpu=2,cpu=3,start=1G,size=1G,hostnode=3
>
> I converted my code to use the new firmware interface. This also makes
> it possible to pass more information between qemu and BIOS (which
> prevented a more flexible command line in the first version).
> So I would opt for the following:
> - use numanode (or simply numa?) instead of the misleading -nodes
> - allow passing memory sizes, VCPU subsets and host CPU pin info
> I would prefer Daniel's version:
> -numa [,mem:[;...]]
>  [,cpu:[;...]]
>  [,pin:[;...]]
>
> That would allow easy things like -numa 2 (for a two guest node), not
> given options would result in defaults (equally split-up resources).
>
> The only problem is the default option for the host side, as libnuma
> requires to explicitly name the nodes. Maybe make the pin: part _not_
> optional? I would at least want to pin the memory, one could discuss
> about the VCPUs...

I think keeping it optional makes things more flexible for people
invoking KVM. If omitted, then query current CPU pinning to determine
which host NUMA nodes to allocate from.

The topology exposed to a guest will likely be the same every time
you launch a particular VM, while the guest <-> host pinning is a
point in time decision according to current available resources.
Thus some apps / users may find it more convenient to have a fixed
set of args they always use to invoke the KVM process, and instead
control placement during the fork/exec'ing of KVM by explicitly
calling sched_setaffinity or using numactl to launch.

It should be easy enough to use sched_getaffinity to query current
pinning and from that determine appropriate NUMA nodes, if they leave
out the pin= arg.

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
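The fallback suggested in this message (no pin= argument, so derive the host nodes from the current CPU pinning) might be sketched roughly as below. This is only an illustration under stated assumptions: nodes_from_affinity() and current_affinity() are hypothetical helper names, not part of any patch in this thread, and the cpu -> node table is supplied by the caller because the thread does not say how it would be obtained (libnuma or sysfs would be candidates).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdint.h>

/*
 * Hypothetical helper: given an affinity mask and a caller-supplied
 * cpu -> host node table, return a bitmask of the host nodes that the
 * allowed CPUs belong to. Filling cpu_to_node[] is left to the caller.
 */
static uint64_t nodes_from_affinity(const cpu_set_t *mask,
                                    const int *cpu_to_node, int ncpus)
{
    uint64_t nodes = 0;

    assert(ncpus <= CPU_SETSIZE);
    for (int cpu = 0; cpu < ncpus; cpu++) {
        int node = cpu_to_node[cpu];

        /* Only CPUs we are allowed to run on contribute a node. */
        if (CPU_ISSET(cpu, mask) && node >= 0 && node < 64)
            nodes |= UINT64_C(1) << node;
    }
    return nodes;
}

/* Query the calling process's current pinning; returns 0 on success. */
static int current_affinity(cpu_set_t *mask)
{
    CPU_ZERO(mask);
    return sched_getaffinity(0, sizeof(*mask), mask);
}
```

A launcher that was started under numactl or after sched_setaffinity() would call current_affinity() first and feed the result into nodes_from_affinity(); whether memory should then be bound to exactly those nodes is the open question discussed in the thread.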
Re: [PATCH 3/9] add frontend implementation for the IOMMU API
Joerg Roedel wrote:
> Hmm, is there any hardware IOMMU with which we can't emulate domains by
> partitioning the IO address space? This concept works for GART and
> Calgary.

Is partitioning secure? Domain X's user could program its hardware to
dma to domain Y's addresses, zapping away Domain Y's user's memory.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests
Andre Przywara wrote:
> Avi Kivity wrote:
>> Andre Przywara wrote:
>>> The user (or better: management application) specifies the host nodes
>>> the guest should use: -nodes 2,3 would create a two node guest mapped to
>>> node 2 and 3 on the host. These numbers are handed over to libnuma:
>>> VCPUs are pinned to the nodes and the allocated guest memory is bound to
>>> its respective node. Since libnuma seems not to be installed
>>> everywhere, the user has to enable this via configure --enable-numa
>>> In the BIOS code an ACPI SRAT table was added, which describes the NUMA
>>> topology to the guest. The number of nodes is communicated via the CMOS
>>> RAM (offset 0x3E). If someone thinks of this as a bad idea, tell me.
>>
>> There exists now a firmware interface in qemu for this kind of
>> communications.
>
> Oh, right you are, I missed that (was well hidden). I was looking at how
> the BIOS detects memory size and CPU numbers and these methods are quite
> cumbersome. Why not convert them to the FW_CFG methods (which the qemu
> side already sets)? To not diverge too much from the original BOCHS BIOS?

Mostly. Also, no one felt the urge.

>>> Node over-committing is allowed (-nodes 0,0,0,0), omitting the -nodes
>>> parameter reverts to the old behavior.
>>
>> '-nodes' is too generic a name ('node' could also mean a host). Suggest
>> -numanode.
>>
>> Need more flexibility: specify the range of memory per node, which cpus
>> are in the node, relative weights for the SRAT table:
>>
>>   -numanode node=1,cpu=2,cpu=3,start=1G,size=1G,hostnode=3
>
> I converted my code to use the new firmware interface. This also makes
> it possible to pass more information between qemu and BIOS (which
> prevented a more flexible command line in the first version).
> So I would opt for the following:
> - use numanode (or simply numa?) instead of the misleading -nodes
> - allow passing memory sizes, VCPU subsets and host CPU pin info
> I would prefer Daniel's version:
> -numa [,mem:[;...]]
>  [,cpu:[;...]]
>  [,pin:[;...]]
>
> That would allow easy things like -numa 2 (for a two guest node), not
> given options would result in defaults (equally split-up resources).

Yes, that looks good.

> The only problem is the default option for the host side, as libnuma
> requires to explicitly name the nodes. Maybe make the pin: part _not_
> optional? I would at least want to pin the memory, one could discuss
> about the VCPUs...

If you can bench it, that would be best. My guess is that we would need
to pin the vcpus.

>> Also need a monitor command to change host nodes dynamically:
>
> Implementing a monitor interface is a good idea.
>
>> (qemu) numanode 1 0
>
> Does that include page migration? That would be easily possible with
> mbind(MPOL_MF_MOVE), but would take some time and resources (which I
> think is OK if explicitly triggered in the monitor).

Yes, that's the main interest. Allow management to load balance numa
nodes (as Linux doesn't do so automatically for long running processes).

> Any other useful commands for the monitor? Maybe (temporary) VCPU
> migration without page migration?

Right now vcpu migration is done externally (we export the thread IDs so
management can pin them as it wishes). If we add numa support, I think
it makes sense to do it internally as well. I suggest using the same
syntax for the monitor as for the command line; that's simplest to
learn and to implement.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH] kvm-userspace: fix module build with --kerneldir
Please find my reworked patch attached. Support for pre-f1d28fb04
kernels was tested with 2.6.16.1. I CC-ed everyone who contributed to
this thread, thanks for your help. I hope the "bureaucracy" is correct.
I'm not a kernel developer, so all I know about the contribution
process is what I found in the documentation.

so long
Maik

When kvm-userspace is built against a kernel version different from the
running kernel, the depmod at the end will fail. This patch fixes the
problem.

Signed-off-by: Maik Hentsche <[EMAIL PROTECTED]>
Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]>
-- 
           | AMD Saxony Limited Liability Company & Co. KG
 Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    | Register Court Dresden: HRA 4896
 Research  | General Partner authorized to represent:
 Center    | AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy

diff --git a/configure b/configure
index 63f956c..97a7cb7 100755
--- a/configure
+++ b/configure
@@ -15,6 +15,12 @@ qemu_opts=()
 cross_prefix=
 arch=`uname -m`
 target_exec=
+# don't use uname if kerneldir is set
+no_uname=
+depmod_version=
+if [ -z "$TMPDIR" ] ; then
+    TMPDIR=.
+fi
 
 usage() {
     cat <<-EOF
@@ -56,6 +62,7 @@ while [[ "$1" = -* ]]; do
 	    ;;
 	--kerneldir)
 	    kerneldir="$arg"
+	    no_uname=1
 	    ;;
 	--with-patched-kernel)
 	    want_module=
@@ -112,6 +119,21 @@ if [ -d "$kerneldir/include2" ]; then
     kernelsourcedir=${kerneldir%/*}/source
 fi
 
+if [ -n "$no_uname" ]; then
+    if [ -e "$kerneldir/.kernelrelease" ]; then
+        depmod_version=`cat "$kerneldir/.kernelrelease"`
+
+    elif [ -e "$kerneldir/include/config/kernel.release" ]; then
+        depmod_version=`cat "$kerneldir/include/config/kernel.release"`
+    else
+        echo
+        echo "Error: kernelversion not found"
+        echo "Please make sure your kernel is configured"
+        echo
+        exit 1
+    fi
+fi
+
 #configure user dir
 (cd user; ./configure --prefix="$prefix" --kerneldir="$libkvm_kerneldir" \
           --arch="$arch" --processor="$processor" \
@@ -143,6 +165,7 @@ CC=$cross_prefix$cc
 LD=$cross_prefix$ld
 OBJCOPY=$cross_prefix$objcopy
 AR=$cross_prefix$ar
+DEPMOD_VERSION=$depmod_version
 EOF
 
 cat < kernel/config.kbuild
diff --git a/kernel/Makefile b/kernel/Makefile
index 41449d6..8315e3d 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -107,7 +107,7 @@ install:
 		$(ORIGMODDIR)/arch/$(ARCH_DIR)/kvm/*.ko; do \
 		if [ -f "$$i" ]; then mv "$$i" "$$i.orig"; fi; \
 	done
-	/sbin/depmod -a
+	/sbin/depmod -a $(DEPMOD_VERSION)
 
 tmpspec = .tmp.kvm-kmod.spec
 
Re: [PATCH 3/9] add frontend implementation for the IOMMU API
On Mon, Dec 01, 2008 at 11:18:39PM +0900, FUJITA Tomonori wrote:
> On Mon, 1 Dec 2008 15:02:09 +0200
> Muli Ben-Yehuda <[EMAIL PROTECTED]> wrote:
>
> > On Mon, Dec 01, 2008 at 01:00:26PM +0100, Joerg Roedel wrote:
> >
> > > > > > The majority of the names (include/linux/iommu.h, iommu.c,
> > > > > > iommu_ops, etc) looks too generic? We already have lots of
> > > > > > similar things (e.g. arch/{x86,ia64}/asm/iommu.h, several
> > > > > > archs' iommu.c, etc). Such names are expected to be used by
> > > > > > all the IOMMUs.
> > > > >
> > > > > The API is already useful for more than KVM. I also plan to
> > > > > extend it to support more types of IOMMUs than VT-d and AMD
> > > > > IOMMU in the future. But these changes are more intrusive than
> > > > > this patchset and need more discussion. I prefer to do small
> > > > > steps into this direction.
> > > >
> > > > Can you be more specific? What IOMMU could use this? For example,
> > > > how GART can use this? I think that people expect the name 'struct
> > > > iommu_ops' to be an abstract for all the IOMMUs (or the majority
> > > > at least). If this works like that, the name is a good choice, I
> > > > think.
> > >
> > > GART can't use exactly this. But with some extensions we can make it
> > > useful for GART and GART-like IOMMUs too. For example we can emulate
> > > domains in GART by partitioning the GART aperture space.
> >
> > That would only work with a pvdma API, since GART doesn't support
> > multiple address spaces, and you don't get the isolation properties of
> > a real IOMMU, so... why would you want to do that?
>
> If this works for only IOMMUs that support kinda domain concept, then
> I think that a name like iommu_domain_ops is more appropriate.

Hmm, is there any hardware IOMMU with which we can't emulate domains by
partitioning the IO address space? This concept works for GART and
Calgary.

Joerg
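To make the "emulate domains by partitioning the IO address space" idea concrete, here is a minimal bookkeeping sketch. It assumes the simplest possible scheme, equal fixed-size slices of one aperture; struct io_range, domain_slice() and slice_contains() are hypothetical names and nothing like them appears in the patchset under discussion. As others in the thread point out, this is software bookkeeping only: a GART-style IOMMU would not stop a device from DMAing outside its slice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one emulated domain's slice. */
struct io_range {
    uint64_t start; /* first IO address of the slice */
    uint64_t size;  /* slice length in bytes */
};

/*
 * Split [base, base + size) into ndomains equal slices and return
 * slice idx. Any remainder at the end of the aperture stays unused.
 */
static struct io_range domain_slice(uint64_t base, uint64_t size,
                                    unsigned ndomains, unsigned idx)
{
    assert(ndomains > 0 && idx < ndomains);

    uint64_t per = size / ndomains;
    struct io_range r = { base + per * idx, per };
    return r;
}

/* True if [addr, addr + len) lies entirely inside the slice. */
static bool slice_contains(const struct io_range *r,
                           uint64_t addr, uint64_t len)
{
    return addr >= r->start && len <= r->size &&
           addr - r->start <= r->size - len;
}
```

A mapping request for a domain would be accepted only if slice_contains() holds for that domain's slice; the containment test is written with subtractions so that it cannot overflow near the top of the address space.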
Re: [PATCH 3/9] add frontend implementation for the IOMMU API
On Mon, 1 Dec 2008 15:02:09 +0200
Muli Ben-Yehuda <[EMAIL PROTECTED]> wrote:

> On Mon, Dec 01, 2008 at 01:00:26PM +0100, Joerg Roedel wrote:
>
> > > > > The majority of the names (include/linux/iommu.h, iommu.c,
> > > > > iommu_ops, etc) looks too generic? We already have lots of
> > > > > similar things (e.g. arch/{x86,ia64}/asm/iommu.h, several
> > > > > archs' iommu.c, etc). Such names are expected to be used by
> > > > > all the IOMMUs.
> > > >
> > > > The API is already useful for more than KVM. I also plan to
> > > > extend it to support more types of IOMMUs than VT-d and AMD
> > > > IOMMU in the future. But these changes are more intrusive than
> > > > this patchset and need more discussion. I prefer to do small
> > > > steps into this direction.
> > >
> > > Can you be more specific? What IOMMU could use this? For example,
> > > how GART can use this? I think that people expect the name 'struct
> > > iommu_ops' to be an abstract for all the IOMMUs (or the majority
> > > at least). If this works like that, the name is a good choice, I
> > > think.
> >
> > GART can't use exactly this. But with some extensions we can make it
> > useful for GART and GART-like IOMMUs too. For example we can emulate
> > domains in GART by partitioning the GART aperture space.
>
> That would only work with a pvdma API, since GART doesn't support
> multiple address spaces, and you don't get the isolation properties of
> a real IOMMU, so... why would you want to do that?

If this works for only IOMMUs that support kinda domain concept, then
I think that a name like iommu_domain_ops is more appropriate.
Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests
Avi Kivity wrote:
> Andre Przywara wrote:
>> The user (or better: management application) specifies the host nodes
>> the guest should use: -nodes 2,3 would create a two node guest mapped to
>> node 2 and 3 on the host. These numbers are handed over to libnuma:
>> VCPUs are pinned to the nodes and the allocated guest memory is bound to
>> its respective node. Since libnuma seems not to be installed
>> everywhere, the user has to enable this via configure --enable-numa
>> In the BIOS code an ACPI SRAT table was added, which describes the NUMA
>> topology to the guest. The number of nodes is communicated via the CMOS
>> RAM (offset 0x3E). If someone thinks of this as a bad idea, tell me.
>
> There exists now a firmware interface in qemu for this kind of
> communications.

Oh, right you are, I missed that (was well hidden). I was looking at how
the BIOS detects memory size and CPU numbers and these methods are quite
cumbersome. Why not convert them to the FW_CFG methods (which the qemu
side already sets)? To not diverge too much from the original BOCHS BIOS?

>> Node over-committing is allowed (-nodes 0,0,0,0), omitting the -nodes
>> parameter reverts to the old behavior.
>
> '-nodes' is too generic a name ('node' could also mean a host). Suggest
> -numanode.
>
> Need more flexibility: specify the range of memory per node, which cpus
> are in the node, relative weights for the SRAT table:
>
>   -numanode node=1,cpu=2,cpu=3,start=1G,size=1G,hostnode=3

I converted my code to use the new firmware interface. This also makes
it possible to pass more information between qemu and BIOS (which
prevented a more flexible command line in the first version).
So I would opt for the following:
- use numanode (or simply numa?) instead of the misleading -nodes
- allow passing memory sizes, VCPU subsets and host CPU pin info
I would prefer Daniel's version:
-numa [,mem:[;...]]
 [,cpu:[;...]]
 [,pin:[;...]]

That would allow easy things like -numa 2 (for a two guest node), not
given options would result in defaults (equally split-up resources).

The only problem is the default option for the host side, as libnuma
requires to explicitly name the nodes. Maybe make the pin: part _not_
optional? I would at least want to pin the memory, one could discuss
about the VCPUs...

> Also need a monitor command to change host nodes dynamically:

Implementing a monitor interface is a good idea.

> (qemu) numanode 1 0

Does that include page migration? That would be easily possible with
mbind(MPOL_MF_MOVE), but would take some time and resources (which I
think is OK if explicitly triggered in the monitor).
Any other useful commands for the monitor? Maybe (temporary) VCPU
migration without page migration?

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
to satisfy European Law for business letters:
AMD Saxony Limited Liability Company & Co. KG,
Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register Court Dresden: HRA 4896, General Partner authorized
to represent: AMD Saxony LLC (Wilmington, Delaware, US)
General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
Re: [PATCH 3/9] add frontend implementation for the IOMMU API
On Mon, Dec 01, 2008 at 03:02:09PM +0200, Muli Ben-Yehuda wrote:
> On Mon, Dec 01, 2008 at 01:00:26PM +0100, Joerg Roedel wrote:
>
> > > > > The majority of the names (include/linux/iommu.h, iommu.c,
> > > > > iommu_ops, etc) looks too generic? We already have lots of
> > > > > similar things (e.g. arch/{x86,ia64}/asm/iommu.h, several
> > > > > archs' iommu.c, etc). Such names are expected to be used by
> > > > > all the IOMMUs.
> > > >
> > > > The API is already useful for more than KVM. I also plan to
> > > > extend it to support more types of IOMMUs than VT-d and AMD
> > > > IOMMU in the future. But these changes are more intrusive than
> > > > this patchset and need more discussion. I prefer to do small
> > > > steps into this direction.
> > >
> > > Can you be more specific? What IOMMU could use this? For example,
> > > how GART can use this? I think that people expect the name 'struct
> > > iommu_ops' to be an abstract for all the IOMMUs (or the majority
> > > at least). If this works like that, the name is a good choice, I
> > > think.
> >
> > GART can't use exactly this. But with some extensions we can make it
> > useful for GART and GART-like IOMMUs too. For example we can emulate
> > domains in GART by partitioning the GART aperture space.
>
> That would only work with a pvdma API, since GART doesn't support
> multiple address spaces, and you don't get the isolation properties of
> a real IOMMU, so... why would you want to do that?

Yes, this cannot be used for non-pv device passthrough. But I think it
can speed up the pvdma case. Besides that, it can be used for UIO and
for devices which perform badly with sg.

Joerg

-- 
           | AMD Saxony Limited Liability Company & Co. KG
 Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    | Register Court Dresden: HRA 4896
 Research  | General Partner authorized to represent:
 Center    | AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
Re: [PATCH 1/2] [v2] VT-d: Support multiple device assignment for KVM
Ok, I got them to apply. I also did the checkpatch cleanups. To speed
things up a bit I would suggest that I rebase my patchset on your
patches and send it out in a single series. Any problems with this
approach?

Joerg

On Mon, Dec 01, 2008 at 09:22:42PM +0800, Han, Weidong wrote:
> Sorry, this patch has a style problem. I will update it and also split it
> into smaller patches for easy reviewing.
>
> Regards,
> Weidong
>
> 'Joerg Roedel' wrote:
> > Hmm, I get these errors using git-am:
> >
> > Applying VT-d: Support multiple device assignment for KVM
> > .dotest/patch:1344: space before tab in indent.
> >     clflush_cache_range(addr, size);
> > .dotest/patch:1350: space before tab in indent.
> >     clflush_cache_range(addr, size);
> > .dotest/patch:1907: trailing whitespace.
> >
> > .dotest/patch:1946: trailing whitespace.
> >  * owned by this domain, clear this iommu in iommu_bmp
> > .dotest/patch:2300: trailing whitespace.
> >
> > error: patch failed: drivers/pci/dmar.c:484
> > error: drivers/pci/dmar.c: patch does not apply
> > error: patch failed: drivers/pci/intel-iommu.c:50
> > error: drivers/pci/intel-iommu.c: patch does not apply
> > error: patch failed: include/linux/dma_remapping.h:111
> > error: include/linux/dma_remapping.h: patch does not apply
> > error: patch failed: include/linux/intel-iommu.h:219
> > error: include/linux/intel-iommu.h: patch does not apply
> > Patch failed at 0001.
> >
> > Joerg
[PATCH 3/5] KVM: don't free an unallocated irq source id
Set assigned_dev->irq_source_id to -1 so that we can avoid freeing
a source ID which we never allocated.

Signed-off-by: Mark McLoughlin <[EMAIL PROTECTED]>
---
 virt/kvm/kvm_main.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8dab7ce..63fd882 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -210,7 +210,10 @@ static void kvm_free_assigned_device(struct kvm *kvm,
 		pci_disable_msi(assigned_dev->dev);
 
 	kvm_unregister_irq_ack_notifier(&assigned_dev->ack_notifier);
-	kvm_free_irq_source_id(kvm, assigned_dev->irq_source_id);
+
+	if (assigned_dev->irq_source_id != -1)
+		kvm_free_irq_source_id(kvm, assigned_dev->irq_source_id);
+	assigned_dev->irq_source_id = -1;
 
 	if (cancel_work_sync(&assigned_dev->interrupt_work))
 		/* We had pending work. That means we will have to take
@@ -466,7 +469,7 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 	match->host_busnr = assigned_dev->busnr;
 	match->host_devfn = assigned_dev->devfn;
 	match->dev = dev;
-
+	match->irq_source_id = -1;
 	match->kvm = kvm;
 
 	list_add(&match->list, &kvm->arch.assigned_dev_head);
-- 
1.5.4.3
[PATCH 2/5] KVM: make kvm_unregister_irq_ack_notifier() safe
We never pass a NULL notifier pointer here, but we may well
pass a notifier struct which hasn't previously been
registered.

Guard against this by using hlist_del_init() which will
not do anything if the node hasn't been added to the list
and, when removing the node, will ensure that a subsequent
call to hlist_del_init() will be fine too.

Fixes an oops seen when an assigned device is freed before
an IRQ is assigned to it.

Signed-off-by: Mark McLoughlin <[EMAIL PROTECTED]>
---
 virt/kvm/irq_comm.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 973df99..db75045 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -63,9 +63,7 @@ void kvm_register_irq_ack_notifier(struct kvm *kvm,
 
 void kvm_unregister_irq_ack_notifier(struct kvm_irq_ack_notifier *kian)
 {
-	if (!kian)
-		return;
-	hlist_del(&kian->link);
+	hlist_del_init(&kian->link);
 }
 
 /* The caller must hold kvm->lock mutex */
-- 
1.5.4.3
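Why hlist_del_init() tolerates a node that was never added can be seen from a small userspace model of the list. The sketch below mirrors the shape of the kernel's hlist helpers but is a simplified stand-in, not the kernel source, and it assumes the node starts out zeroed (for instance because the containing structure was allocated zero-filled).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace model of the kernel's hlist types. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    assert(!n->pprev); /* the node must not already be on a list */
    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/*
 * Unlink the node if it is on a list, then reset it so that a second
 * call (or a call on a never-added, zeroed node) is a harmless no-op.
 */
static void hlist_del_init(struct hlist_node *n)
{
    if (!n->pprev)
        return; /* not on any list: nothing to do */

    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
    n->next = NULL;
    n->pprev = NULL;
}
```

A plain delete without the pprev check would dereference a NULL pprev for a never-registered node, which is consistent with the oops the commit message describes.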
[PATCH 1/5] KVM: remove the IRQ ACK notifier assertions
We will obviously never pass a NULL struct kvm_irq_ack_notifier* to
these functions. They are always embedded in the assigned device
structure, so the assertions add nothing.

The irqchip_in_kernel() assertion is very out of place - clearly
this little abstraction needs to know nothing about the upper
layer details.

Signed-off-by: Mark McLoughlin <[EMAIL PROTECTED]>
---
 virt/kvm/irq_comm.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 9fbbdea..973df99 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -58,9 +58,6 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi)
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
 				   struct kvm_irq_ack_notifier *kian)
 {
-	/* Must be called with in-kernel IRQ chip, otherwise it's nonsense */
-	ASSERT(irqchip_in_kernel(kvm));
-	ASSERT(kian);
 	hlist_add_head(&kian->link, &kvm->arch.irq_ack_notifier_list);
 }
 
-- 
1.5.4.3
[PATCH 4/5] KVM: add KVM_USERSPACE_IRQ_SOURCE_ID assertions
Make sure kvm_request_irq_source_id() never returns
KVM_USERSPACE_IRQ_SOURCE_ID.

Likewise, check that kvm_free_irq_source_id() never accepts
KVM_USERSPACE_IRQ_SOURCE_ID.

Signed-off-by: Mark McLoughlin <[EMAIL PROTECTED]>
---
 virt/kvm/irq_comm.c |   14 ++++++++++----
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index db75045..aa5d1e5 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -72,11 +72,15 @@ int kvm_request_irq_source_id(struct kvm *kvm)
 	unsigned long *bitmap = &kvm->arch.irq_sources_bitmap;
 	int irq_source_id = find_first_zero_bit(bitmap,
 				sizeof(kvm->arch.irq_sources_bitmap));
+
 	if (irq_source_id >= sizeof(kvm->arch.irq_sources_bitmap)) {
 		printk(KERN_WARNING "kvm: exhaust allocatable IRQ sources!\n");
-		irq_source_id = -EFAULT;
-	} else
-		set_bit(irq_source_id, bitmap);
+		return -EFAULT;
+	}
+
+	ASSERT(irq_source_id != KVM_USERSPACE_IRQ_SOURCE_ID);
+	set_bit(irq_source_id, bitmap);
+
 	return irq_source_id;
 }
 
@@ -84,7 +88,9 @@ void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id)
 {
 	int i;
 
-	if (irq_source_id <= 0 ||
+	ASSERT(irq_source_id != KVM_USERSPACE_IRQ_SOURCE_ID);
+
+	if (irq_source_id < 0 ||
 	    irq_source_id >= sizeof(kvm->arch.irq_sources_bitmap)) {
 		printk(KERN_ERR "kvm: IRQ source ID out of range!\n");
 		return;
-- 
1.5.4.3
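The allocation scheme this series converges on (ID 0 reserved for userspace, -1 meaning "nothing allocated") can be modelled in a few lines of userspace C. The sketch is an illustration only: the macro and function names are stand-ins rather than the KVM ones, it assumes the reserved bit is set when the bitmap is initialised, and on exhaustion it returns the sentinel where the patch returns -EFAULT.

```c
#include <assert.h>
#include <limits.h>

#define USERSPACE_SOURCE_ID 0    /* stands in for KVM_USERSPACE_IRQ_SOURCE_ID */
#define NO_SOURCE_ID        (-1) /* "nothing allocated" sentinel */
#define NR_SOURCE_IDS       ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Assumption: bit 0 starts out set, so it is never handed out. */
static unsigned long source_bitmap_init(void)
{
    return 1UL << USERSPACE_SOURCE_ID;
}

/* Return the lowest free ID, or NO_SOURCE_ID if the bitmap is full. */
static int request_source_id(unsigned long *bitmap)
{
    for (int id = 0; id < NR_SOURCE_IDS; id++) {
        if (!(*bitmap & (1UL << id))) {
            assert(id != USERSPACE_SOURCE_ID);
            *bitmap |= 1UL << id;
            return id;
        }
    }
    return NO_SOURCE_ID;
}

/* Freeing the sentinel is a no-op; freeing the reserved ID is a bug. */
static void free_source_id(unsigned long *bitmap, int id)
{
    if (id == NO_SOURCE_ID)
        return;
    assert(id != USERSPACE_SOURCE_ID);
    assert(id > 0 && id < NR_SOURCE_IDS);
    *bitmap &= ~(1UL << id);
}
```

Note that the sketch bounds the search by the number of bits in the word, whereas the quoted code passes sizeof(), a byte count, where find_first_zero_bit() takes a size in bits; that difference may be worth a second look in the real code.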
[PATCH 5/5] KVM: split out kvm_free_assigned_irq()
Split out the logic corresponding to undoing assign_irq() and
clean it up a bit.

Signed-off-by: Mark McLoughlin <[EMAIL PROTECTED]>
---
 virt/kvm/kvm_main.c |   29 ++++++++++++++++++++++-------
 1 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 63fd882..e41d39d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -200,14 +200,11 @@ static void kvm_assigned_dev_ack_irq(struct kvm_irq_ack_notifier *kian)
 	enable_irq(dev->host_irq);
 }
 
-static void kvm_free_assigned_device(struct kvm *kvm,
-				     struct kvm_assigned_dev_kernel
-				     *assigned_dev)
+static void kvm_free_assigned_irq(struct kvm *kvm,
+				  struct kvm_assigned_dev_kernel *assigned_dev)
 {
-	if (irqchip_in_kernel(kvm) && assigned_dev->irq_requested_type)
-		free_irq(assigned_dev->host_irq, (void *)assigned_dev);
-	if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_HOST_MSI)
-		pci_disable_msi(assigned_dev->dev);
+	if (!irqchip_in_kernel(kvm))
+		return;
 
 	kvm_unregister_irq_ack_notifier(&assigned_dev->ack_notifier);
 
@@ -215,12 +212,30 @@ static void kvm_free_assigned_device(struct kvm *kvm,
 		kvm_free_irq_source_id(kvm, assigned_dev->irq_source_id);
 	assigned_dev->irq_source_id = -1;
 
+	if (!assigned_dev->irq_requested_type)
+		return;
+
 	if (cancel_work_sync(&assigned_dev->interrupt_work))
 		/* We had pending work. That means we will have to take
 		 * care of kvm_put_kvm.
 		 */
 		kvm_put_kvm(kvm);
 
+	free_irq(assigned_dev->host_irq, (void *)assigned_dev);
+
+	if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_HOST_MSI)
+		pci_disable_msi(assigned_dev->dev);
+
+	assigned_dev->irq_requested_type = 0;
+}
+
+
+static void kvm_free_assigned_device(struct kvm *kvm,
+				     struct kvm_assigned_dev_kernel
+				     *assigned_dev)
+{
+	kvm_free_assigned_irq(kvm, assigned_dev);
+
 	pci_reset_function(assigned_dev->dev);
 
 	pci_release_regions(assigned_dev->dev);
-- 
1.5.4.3
Re: [PATCH 3/4] KVM: gracefully handle zero in kvm_free_irq_source_id()
On Sun, 2008-11-30 at 12:28 +0200, Avi Kivity wrote:
> Mark McLoughlin wrote:
> > Allow kvm_free_irq_source_id() to be called with a zero ID.
> >
> > Zero is reserved for KVM_USERSPACE_IRQ_SOURCE_ID, so we can
> > guarantee that kvm_request_irq_source_id() will never return
> > zero and use zero to indicate "no source ID allocated".
> >
>
> Zero is a legal value for irq source ids, overloading it as something
> else is confusing.

Fair enough; I chose zero because it's naturally initialised to that by
the kzalloc(). But I prefer explicit initialisation anyway, so ...

> Things should continue to work if we #define it to 17.

Okay, let's try with -1 then.

> > +	ASSERT(irq_source_id != 0); /* KVM_USERSPACE_IRQ_SOURCE_ID reserved */
> >
>
> Why not replace 0 with the actual symbolic constant?

Because I was giving 0 two meanings :-)

Respin of the patches coming up.

Cheers,
Mark.
error "could not open disk image" and snapshot=on (if off it works)
Ciao,
I have this strange problem:

ubuntu 8.10, kvm 72
2.6.27-7-server x86_64 GNU/Linux

vdeq kvm -name proxy_UBUNTU_8.04 \
 -net nic,macaddr=00:16:3e:00:a0:00 -net tap,ifname=tap1,script=no,downscript=no \
 -net nic,macaddr=00:16:3e:00:a1:01,vlan=1 -net vde,vlan=1,sock=/var/run/vde2/tun1.ctl \
 -drive file=./ubuntu-server-8.04_proxy.root,if=scsi,index=0,snapshot=off,cache=on,boot=on \
 -drive file=./ubuntu-server-8.04_proxy.home,if=scsi,index=1,snapshot=off,cache=on \
 -drive file=./linux.swap,if=scsi,index=2,cache=on,snapshot=on \
 -smp 1 -M pc -cpu pentium3 -m 512 -k en-us -localtime
qemu: could not open disk image ./linux.swap

but the file exists:

$ ls -l linux.swap
-rw-rw-r-- 1 paolop virtual 1073741824 2008-12-01 13:21 linux.swap

If I set "snapshot=off" on linux.swap, kvm boots without problems
(the same happened with other file-devices).

So the error message seems to be misleading: it is not a filesystem
problem but an option problem (snapshot=on doesn't work, snapshot=off
works).

Any suggestions?
Thank you.

-- 
Paolo Pedaletti
RE: [PATCH 1/2] [v2] VT-d: Support multiple device assignment for KVM
Sorry, this patch has a style problem. I will update it and also split it
into smaller patches for easy reviewing.

Regards,
Weidong

'Joerg Roedel' wrote:
> Hmm, I get these errors using git-am:
>
> Applying VT-d: Support multiple device assignment for KVM
> .dotest/patch:1344: space before tab in indent.
>     clflush_cache_range(addr, size);
> .dotest/patch:1350: space before tab in indent.
>     clflush_cache_range(addr, size);
> .dotest/patch:1907: trailing whitespace.
>
> .dotest/patch:1946: trailing whitespace.
>  * owned by this domain, clear this iommu in iommu_bmp
> .dotest/patch:2300: trailing whitespace.
>
> error: patch failed: drivers/pci/dmar.c:484
> error: drivers/pci/dmar.c: patch does not apply
> error: patch failed: drivers/pci/intel-iommu.c:50
> error: drivers/pci/intel-iommu.c: patch does not apply
> error: patch failed: include/linux/dma_remapping.h:111
> error: include/linux/dma_remapping.h: patch does not apply
> error: patch failed: include/linux/intel-iommu.h:219
> error: include/linux/intel-iommu.h: patch does not apply
> Patch failed at 0001.
>
> Joerg
Re: [PATCH 3/9] add frontend implementation for the IOMMU API
On Mon, Dec 01, 2008 at 01:00:26PM +0100, Joerg Roedel wrote:

> > > > The majority of the names (include/linux/iommu.h, iommu.c,
> > > > iommu_ops, etc) looks too generic? We already have lots of
> > > > similar things (e.g. arch/{x86,ia64}/asm/iommu.h, several
> > > > archs' iommu.c, etc). Such names are expected to be used by
> > > > all the IOMMUs.
> > >
> > > The API is already useful for more than KVM. I also plan to
> > > extend it to support more types of IOMMUs than VT-d and AMD
> > > IOMMU in the future. But these changes are more intrusive than
> > > this patchset and need more discussion. I prefer to do small
> > > steps into this direction.
> >
> > Can you be more specific? What IOMMU could use this? For example,
> > how GART can use this? I think that people expect the name 'struct
> > iommu_ops' to be an abstract for all the IOMMUs (or the majority
> > at least). If this works like that, the name is a good choice, I
> > think.
>
> GART can't use exactly this. But with some extensions we can make it
> useful for GART and GART-like IOMMUs too. For example we can emulate
> domains in GART by partitioning the GART aperture space.

That would only work with a pvdma API, since GART doesn't support
multiple address spaces, and you don't get the isolation properties of
a real IOMMU, so... why would you want to do that?

Cheers,
Muli
-- 
The First Workshop on I/O Virtualization (WIOV '08)
Dec 2008, San Diego, CA, http://www.usenix.org/wiov08/
                       <->
SYSTOR 2009---The Israeli Experimental Systems Conference
http://www.haifa.il.ibm.com/conferences/systor2009/
Re: [SR-IOV driver example 0/3] introduction
On Thu, Nov 27, 2008 at 12:59:33AM +0800, Greg KH wrote:
> On Wed, Nov 26, 2008 at 10:03:03PM +0800, Yu Zhao wrote:
> > SR-IOV drivers of Intel 82576 NIC are available. There are two parts
> > of the drivers: Physical Function driver and Virtual Function driver.
> > The PF driver is based on the IGB driver and is used to control PF to
> > allocate hardware specific resources and interface with the SR-IOV core.
> > The VF driver is a new NIC driver that is same as the traditional PCI
> > device driver. It works in both the host and the guest (Xen and KVM)
> > environment.
> >
> > These two drivers are testing versions and they are *only* intended to
> > show how to use SR-IOV API.
>
> That's funny, as some distros are already shipping this driver. You
> might want to tell them that this is an "example only" driver and not to
> be used "for real"... :(

Maybe they are shipping another version, not this one. This one really is
an experimental patch; it was created only a week ago...
Re: [SR-IOV driver example 2/3] PF driver: integrate with SR-IOV core
On Thu, Nov 27, 2008 at 01:54:27AM +0800, Chris Wright wrote:
> * Greg KH ([EMAIL PROTECTED]) wrote:
> > > +static int
> > > +igb_virtual(struct pci_dev *pdev, int nr_virtfn)
> > > +{
> > > +	unsigned char my_mac_addr[6] = {0x00, 0xDE, 0xAD, 0xBE, 0xEF, 0xFF};
> > > +	struct net_device *netdev = pci_get_drvdata(pdev);
> > > +	struct igb_adapter *adapter = netdev_priv(netdev);
> > > +	int i;
> > > +
> > > +	if (nr_virtfn > 7)
> > > +		return -EINVAL;
> >
> > Why the check for 7? Is that the max virtual functions for this card?
> > Shouldn't that be a define somewhere so it's easier to fix in future
> > versions of this hardware? :)
>
> IIRC it's 8 for the card, 1 reserved for PF. I think both notions
> should be captured w/ commented constants.

You remember correctly! I'll put some comments there as suggested.

Thanks,
Yu
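[Editorial sketch, not part of the thread.] The "commented constants" suggested above might look roughly like the following. The macro and function names are hypothetical; only the numbers (8 functions on the card, 1 reserved for the PF) come from the discussion, and the extra rejection of negative counts is an assumption, not something the posted patch does.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names; the thread only states "8 for the card, 1 reserved for PF". */
#define IGB_SKETCH_MAX_FUNCTIONS  8  /* function slots the 82576 offers in total */
#define IGB_SKETCH_PF_RESERVED    1  /* slot kept back so the PF can still tx/rx */
#define IGB_SKETCH_MAX_USABLE_VFS (IGB_SKETCH_MAX_FUNCTIONS - IGB_SKETCH_PF_RESERVED)

/* Returns 0 if the requested VF count is acceptable, -EINVAL otherwise. */
static int igb_sketch_check_nr_virtfn(int nr_virtfn)
{
	if (nr_virtfn < 0 || nr_virtfn > IGB_SKETCH_MAX_USABLE_VFS)
		return -EINVAL;
	return 0;
}
```

With the limits named this way, a future part with a different VF count would only need the two base constants changed.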
Re: [SR-IOV driver example 2/3] PF driver: integrate with SR-IOV core
On Thu, Nov 27, 2008 at 12:58:59AM +0800, Greg KH wrote:
> On Wed, Nov 26, 2008 at 10:21:56PM +0800, Yu Zhao wrote:
> > +			my_mac_addr[5] = (unsigned char)i;
> > +			igb_set_vf_mac(netdev, i, my_mac_addr);
> > +			igb_set_vf_vmolr(adapter, i);
> > +		}
> > +	} else
> > +		printk(KERN_INFO "SR-IOV is disabled\n");
>
> Is that really true? (oh, use dev_info as well.) What happens if you
> had called this with "5" and then later with "0", you never destroyed
> those existing virtual functions, yet the code does:
>
> > +	adapter->vfs_allocated_count = nr_virtfn;
>
> Which makes the driver think they are not present. What happens when
> the driver later goes to shut down? Are those resources freed up
> properly?

For now the tx/rx queue allocation is hard-coded, so this doesn't matter.
Eventually the allocation will become dynamic: when the number of VFs
changes, the corresponding resources will need to be freed. I'll put more
comments here.

Thanks,
Yu
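[Editorial sketch, not part of the thread.] The "5, then 0" case raised above can be illustrated with a small user-space model. Everything here is hypothetical (struct sketch_adapter, the vf_in_use flags, the function name); it only shows one way a callback could release per-VF state before overwriting the count once allocation becomes dynamic, and it is not the igb driver's actual bookkeeping.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_MAX_VFS 7 /* assumed limit: 8 functions minus 1 for the PF */

/* Hypothetical stand-in for the per-adapter state. */
struct sketch_adapter {
	int vfs_allocated_count;
	int vf_in_use[SKETCH_MAX_VFS]; /* 1 while resources are held for VF i */
};

/* Grow or shrink the VF set; VFs that disappear are released first. */
static int sketch_set_nr_virtfn(struct sketch_adapter *a, int nr_virtfn)
{
	int i;

	if (nr_virtfn < 0 || nr_virtfn > SKETCH_MAX_VFS)
		return -EINVAL;

	/* Shrinking: release everything beyond the new count before forgetting it. */
	for (i = nr_virtfn; i < a->vfs_allocated_count; i++)
		a->vf_in_use[i] = 0;

	/* Growing: set up only the newly added VFs. */
	for (i = a->vfs_allocated_count; i < nr_virtfn; i++)
		a->vf_in_use[i] = 1;

	a->vfs_allocated_count = nr_virtfn;
	return 0;
}
```

Calling this with 5 and then with 0 leaves no VF marked as in use, which is the property the review comment asks about.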
Re: [SR-IOV driver example 0/3] introduction
On Thu, Nov 27, 2008 at 04:14:48AM +0800, Jeff Garzik wrote:
> Yu Zhao wrote:
> > SR-IOV drivers of Intel 82576 NIC are available. There are two parts
> > of the drivers: Physical Function driver and Virtual Function driver.
> > The PF driver is based on the IGB driver and is used to control PF to
> > allocate hardware specific resources and interface with the SR-IOV core.
> > The VF driver is a new NIC driver that is same as the traditional PCI
> > device driver. It works in both the host and the guest (Xen and KVM)
> > environment.
> >
> > These two drivers are testing versions and they are *only* intended to
> > show how to use SR-IOV API.
> >
> > Intel 82576 NIC specification can be found at:
> > http://download.intel.com/design/network/datashts/82576_Datasheet_v2p1.pdf
> >
> > [SR-IOV driver example 1/3] PF driver: allocate hardware specific resource
> > [SR-IOV driver example 2/3] PF driver: integrate with SR-IOV core
> > [SR-IOV driver example 3/3] VF driver tar ball
>
> Please copy [EMAIL PROTECTED] on all network-related patches. This
> is where the network developers live, and all patches on this list are
> automatically archived for review and handling at
> http://patchwork.ozlabs.org/project/netdev/list/

Will do.

Thanks,
Yu
Re: [PATCH 3/9] add frontend implementation for the IOMMU API
On Mon, Dec 01, 2008 at 05:38:11PM +0900, FUJITA Tomonori wrote:
> On Fri, 28 Nov 2008 12:31:29 +0100
> Joerg Roedel <[EMAIL PROTECTED]> wrote:
>
> > On Fri, Nov 28, 2008 at 06:40:41PM +0900, FUJITA Tomonori wrote:
> > > On Thu, 27 Nov 2008 16:40:48 +0100
> > > Joerg Roedel <[EMAIL PROTECTED]> wrote:
> > >
> > > > Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]>
> > > > ---
> > > > drivers/base/iommu.c | 94
> > > > ++
> > > > 1 files changed, 94 insertions(+), 0 deletions(-)
> > > > create mode 100644 drivers/base/iommu.c
> > > >
> > > > diff --git a/drivers/base/iommu.c b/drivers/base/iommu.c
> > > > new file mode 100644
> > > > index 000..7250b9c
> > > > --- /dev/null
> > > > +++ b/drivers/base/iommu.c
> > >
> > > Hmm, why is this at drivers/base/? Anyone except for kvm could use
> > > this? If so, under virt/ is more appropriate?
> >
> > I don't see a reason why this should be KVM specific. KVM is the only
> > user for now. But it can be used for e.g. UIO too. Or in drivers to
> > speed up devices which have bad performance when they do scatter-gather
> > IO.
>
> If there are some except for kvm that could use this, it should be
> fine, I guess.
>
> Can you add such information (e.g. who could use this) to the patch
> description? It should be in the git log if the patch is merged.

Ok, I will add it.

> > > The majority of the names (include/linux/iommu.h, iommu.c, iommu_ops,
> > > etc) looks too generic? We already have lots of similar things
> > > (e.g. arch/{x86,ia64}/asm/iommu.h, several archs' iommu.c, etc). Such
> > > names are expected to be used by all the IOMMUs.
> >
> > The API is already useful for more than KVM. I also plan to extend it to
> > support more types of IOMMUs than VT-d and AMD IOMMU in the future. But
> > these changes are more intrusive than this patchset and need more
> > discussion. I prefer to do small steps into this direction.
>
> Can you be more specific? What IOMMU could use this? For example, how
> GART can use this? I think that people expect the name 'struct
> iommu_ops' to be an abstract for all the IOMMUs (or the majority at
> least). If this works like that, the name is a good choice, I think.

GART can't use exactly this. But with some extensions we can make it
useful for GART and GART-like IOMMUs too. For example we can emulate
domains in GART by partitioning the GART aperture space.

Joerg

--
           |           AMD Saxony Limited Liability Company & Co. KG
 Operating |         Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    |                  Register Court Dresden: HRA 4896
 Research  |              General Partner authorized to represent:
 Center    |             AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
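[Editorial sketch, not part of the thread.] The idea of emulating domains by partitioning the GART aperture can be pictured as plain address arithmetic. The struct and function below are hypothetical and assume equal-sized slices; a real implementation would also need page alignment. As noted elsewhere in this thread, splitting the address range this way does not by itself give the isolation of a real IOMMU.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical description of one domain's share of the aperture. */
struct aperture_slice {
	uint64_t base;
	uint64_t size;
};

/*
 * Split an aperture of aper_size bytes starting at aper_base into
 * nr_domains equal slices and report the slice for one domain.
 */
static int gart_sketch_slice(uint64_t aper_base, uint64_t aper_size,
			     unsigned int nr_domains, unsigned int domain,
			     struct aperture_slice *out)
{
	uint64_t slice;

	if (nr_domains == 0 || domain >= nr_domains)
		return -EINVAL;

	slice = aper_size / nr_domains;
	if (slice == 0)
		return -EINVAL;

	out->base = aper_base + (uint64_t)domain * slice;
	out->size = slice;
	return 0;
}
```

For example, a 64 MB aperture split four ways gives each emulated domain a 16 MB window.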
Re: [PATCH 1/2] [v2] VT-d: Support multiple device assignment for KVM
Hmm, I get these errors using git-am: Applying VT-d: Support multiple device assignment for KVM .dotest/patch:1344: space before tab in indent. clflush_cache_range(addr, size); .dotest/patch:1350: space before tab in indent. clflush_cache_range(addr, size); .dotest/patch:1907: trailing whitespace. .dotest/patch:1946: trailing whitespace. * owned by this domain, clear this iommu in iommu_bmp .dotest/patch:2300: trailing whitespace. error: patch failed: drivers/pci/dmar.c:484 error: drivers/pci/dmar.c: patch does not apply error: patch failed: drivers/pci/intel-iommu.c:50 error: drivers/pci/intel-iommu.c: patch does not apply error: patch failed: include/linux/dma_remapping.h:111 error: include/linux/dma_remapping.h: patch does not apply error: patch failed: include/linux/intel-iommu.h:219 error: include/linux/intel-iommu.h: patch does not apply Patch failed at 0001. Joerg On Mon, Dec 01, 2008 at 02:17:38PM +0800, Han, Weidong wrote: > It's developed based on commit 0f7d3ee6 on avi/master, but it still can be > applied on latest avi/master (commit 90755652). > > Regards, > Weidong > > Joerg Roedel wrote: > > Hmm, I tried to apply this patch against avi/master and linus/master > > but get merge conflicts. Where do these patches apply cleanly? > > > > Joerg > > > > On Thu, Nov 27, 2008 at 09:49:04PM +0800, Han, Weidong wrote: > >> In order to support multiple device assignment for KVM, this patch > >> does following main changes: > >>- extend dmar_domain to own multiple devices from different > >> iommus, use a bitmap of iommus to replace iommu pointer in > >> dmar_domain. > >>- implement independent low level functions for kvm, then won't > >> impact native VT-d. > >>- "SAGAW" capability may be different across iommus, that's to > >> say the VT-d page table levels may be different among iommus. This > >> patch uses a defaut agaw, and skip top levels of page tables for > >> iommus which have smaller agaw than default. 
> >>- rename the APIs for kvm VT-d, make it more readable. > >> > >> > >> Signed-off-by: Weidong Han <[EMAIL PROTECTED]> > >> --- > >> drivers/pci/dmar.c| 15 + > >> drivers/pci/intel-iommu.c | 698 > >> ++-- > >> include/linux/dma_remapping.h | 21 +- include/linux/intel-iommu.h > >> | 21 +- 4 files changed, 637 insertions(+), 118 deletions(-) > >> > >> diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c > >> index 691b3ad..d6bdced 100644 > >> --- a/drivers/pci/dmar.c > >> +++ b/drivers/pci/dmar.c > >> @@ -484,6 +484,7 @@ void __init detect_intel_iommu(void) > >> dmar_tbl = NULL; } > >> > >> +extern int width_to_agaw(int width); > >> > >> int alloc_iommu(struct dmar_drhd_unit *drhd) > >> { > >> @@ -491,6 +492,8 @@ int alloc_iommu(struct dmar_drhd_unit *drhd) > >> int map_size; u32 ver; > >> static int iommu_allocated = 0; > >> + unsigned long sagaw; > >> + int agaw; > >> > >> iommu = kzalloc(sizeof(*iommu), GFP_KERNEL); if > >> (!iommu) @@ -506,6 +509,18 @@ int alloc_iommu(struct dmar_drhd_unit > >> *drhd) iommu->cap = dmar_readq(iommu->reg + DMAR_CAP_REG); > >> iommu->ecap = dmar_readq(iommu->reg + DMAR_ECAP_REG); > >> > >> + /* set agaw, "SAGAW" may be different across iommus */ > >> + sagaw = cap_sagaw(iommu->cap); > >> + for (agaw = width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH); > >> +agaw >= 0; agaw--) > >> + if (test_bit(agaw, &sagaw)) > >> + break; > >> + if (agaw < 0) { > >> + printk(KERN_ERR "IOMMU: unsupported sagaw %lx\n", > >> sagaw); + goto error; + } > >> + iommu->agaw = agaw; > >> + > >> /* the registers might be more than one page */ > >> map_size = max_t(int, ecap_max_iotlb_offset(iommu->ecap), > >> cap_max_fault_reg_offset(iommu->cap)); > >> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c > >> index 5c8baa4..55b96c4 100644 > >> --- a/drivers/pci/intel-iommu.c > >> +++ b/drivers/pci/intel-iommu.c > >> @@ -50,8 +50,6 @@ > >> #define IOAPIC_RANGE_END (0xfeef) > >> #define IOVA_START_ADDR(0x1000) > >> > >> -#define 
DEFAULT_DOMAIN_ADDRESS_WIDTH 48 > >> - > >> #define DOMAIN_MAX_ADDR(gaw) u64)1) << gaw) - 1) > >> > >> > >> @@ -64,6 +62,7 @@ struct deferred_flush_tables { > >> int next; > >> struct iova *iova[HIGH_WATER_MARK]; > >> struct dmar_domain *domain[HIGH_WATER_MARK]; > >> + struct intel_iommu *iommu; > >> }; > >> > >> static struct deferred_flush_tables *deferred_flush; > >> @@ -184,6 +183,69 @@ void free_iova_mem(struct iova *iova) > >> kmem_cache_free(iommu_iova_cache, iova); > >> } > >> > >> +/* in native case, each domain is related to only one iommu */ > >> +static struct intel_iommu *domain_get_only_iommu(struct dmar_dom
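[Editorial sketch, not part of the thread.] The agaw selection added to alloc_iommu() in the quoted patch walks down from the default width until it finds a level the hardware reports in SAGAW. A stand-alone model of that loop is below. The sketch_ names are hypothetical, and the width-to-agaw formula (width minus 30, divided by a 9-bit level stride) is an assumption about the intel-iommu helper, which is not shown in the quoted text.

```c
#include <assert.h>

#define SKETCH_LEVEL_STRIDE                  9  /* assumed bits per page-table level */
#define SKETCH_DEFAULT_DOMAIN_ADDRESS_WIDTH 48  /* value taken from the quoted patch */

/* Assumed mapping from address width to agaw index. */
static int sketch_width_to_agaw(int width)
{
	return (width - 30) / SKETCH_LEVEL_STRIDE;
}

/*
 * sagaw is a bitmask: bit n set means the IOMMU supports agaw n.
 * Returns the largest supported agaw not above the default, or -1
 * if none is supported (the patch treats that as an error).
 */
static int sketch_pick_agaw(unsigned long sagaw)
{
	int agaw;

	for (agaw = sketch_width_to_agaw(SKETCH_DEFAULT_DOMAIN_ADDRESS_WIDTH);
	     agaw >= 0; agaw--)
		if (sagaw & (1UL << agaw))
			return agaw;

	return -1;
}
```

An IOMMU that ends up with a smaller agaw than the default is the case where, per the patch description, the top page-table levels are skipped.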
Re: [PATCH v2]: check for fops->owner in anon_inode_getfd
On Thursday, 27 November 2008, Davide Libenzi wrote:
> > ===
> > --- kvm.orig/fs/anon_inodes.c
> > +++ kvm/fs/anon_inodes.c
> > @@ -79,9 +79,12 @@ int anon_inode_getfd(const char *name, c
> > 	if (IS_ERR(anon_inode_inode))
> > 		return -ENODEV;
> >
> > +	if (fops->owner && !try_module_get(fops->owner))
> > +		return -ENOENT;
> > +
> > 	error = get_unused_fd_flags(flags);
> > 	if (error < 0)
> > -		return error;
> > +		goto err_module;
> > 	fd = error;
> >
> > 	/*
> > @@ -128,6 +131,8 @@ err_dput:
> > 	dput(dentry);
> > err_put_unused_fd:
> > 	put_unused_fd(fd);
> > +err_module:
> > +	module_put(fops->owner);
> > 	return error;
> > }
> > EXPORT_SYMBOL_GPL(anon_inode_getfd);
>
> Looks OK to me.

Ok. Thanks. I will push this to Avi. Can I add a
Reviewed-by: Davide Libenzi <[EMAIL PROTECTED]>
to the patch?

Christian
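[Editorial sketch, not part of the thread.] The pattern in the patch above is "take a reference on the owner first, and drop it again on every later failure path". A user-space model of that control flow is below. All names are hypothetical stand-ins (sketch_try_get for try_module_get, sketch_put for module_put), and the fd_available flag and the -EMFILE error value merely simulate a failing get_unused_fd_flags().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a module with a reference count. */
struct sketch_module {
	int refcount;
	int unloading; /* nonzero: new references must be refused */
};

/* Models try_module_get(): returns 1 on success, 0 if the owner is going away. */
static int sketch_try_get(struct sketch_module *m)
{
	if (m->unloading)
		return 0;
	m->refcount++;
	return 1;
}

/* Models module_put(): tolerates a NULL owner. */
static void sketch_put(struct sketch_module *m)
{
	if (m)
		m->refcount--;
}

/* Returns a pretend fd (3) on success or a negative errno on failure. */
static int sketch_getfd(struct sketch_module *owner, int fd_available)
{
	int error;

	if (owner && !sketch_try_get(owner))
		return -ENOENT;

	if (!fd_available) {
		error = -EMFILE;
		goto err_module;
	}

	return 3; /* success: the reference stays held for the fd's lifetime */

err_module:
	sketch_put(owner); /* undo the reference taken above */
	return error;
}
```

The point of the ordering is that the failure path leaves the reference count exactly where it started, while the success path keeps one reference held.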
Re: [PATCH 3/9] add frontend implementation for the IOMMU API
On Fri, 28 Nov 2008 12:31:29 +0100
Joerg Roedel <[EMAIL PROTECTED]> wrote:

> On Fri, Nov 28, 2008 at 06:40:41PM +0900, FUJITA Tomonori wrote:
> > On Thu, 27 Nov 2008 16:40:48 +0100
> > Joerg Roedel <[EMAIL PROTECTED]> wrote:
> >
> > > Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]>
> > > ---
> > > drivers/base/iommu.c | 94
> > > ++
> > > 1 files changed, 94 insertions(+), 0 deletions(-)
> > > create mode 100644 drivers/base/iommu.c
> > >
> > > diff --git a/drivers/base/iommu.c b/drivers/base/iommu.c
> > > new file mode 100644
> > > index 000..7250b9c
> > > --- /dev/null
> > > +++ b/drivers/base/iommu.c
> >
> > Hmm, why is this at drivers/base/? Anyone except for kvm could use
> > this? If so, under virt/ is more appropriate?
>
> I don't see a reason why this should be KVM specific. KVM is the only
> user for now. But it can be used for e.g. UIO too. Or in drivers to
> speed up devices which have bad performance when they do scatter-gather
> IO.

If there are some except for kvm that could use this, it should be
fine, I guess.

Can you add such information (e.g. who could use this) to the patch
description? It should be in the git log if the patch is merged.

> > The majority of the names (include/linux/iommu.h, iommu.c, iommu_ops,
> > etc) looks too generic? We already have lots of similar things
> > (e.g. arch/{x86,ia64}/asm/iommu.h, several archs' iommu.c, etc). Such
> > names are expected to be used by all the IOMMUs.
>
> The API is already useful for more than KVM. I also plan to extend it to
> support more types of IOMMUs than VT-d and AMD IOMMU in the future. But
> these changes are more intrusive than this patchset and need more
> discussion. I prefer to do small steps into this direction.

Can you be more specific? What IOMMU could use this? For example, how
GART can use this? I think that people expect the name 'struct
iommu_ops' to be an abstract for all the IOMMUs (or the majority at
least). If this works like that, the name is a good choice, I think.