Re: [PATCH 1/1] uio_pci_generic: extensions to allow access for non-privileged processes
Btw. This patch posting is broken. It suffers from line-wraps which make it impossible to apply as-is. I was able to fix it but please consider this in your next posting. On Wed, Mar 31, 2010 at 05:12:35PM -0700, Tom Lyon wrote: --- linux-2.6.33/drivers/uio/uio_pci_generic.c2010-02-24 10:52:17.0 -0800 ^ Unexpected line-wrap. I also got some whitespace warnings when trying to apply it. Please make sure you fix this in the next version too. Thanks, Joerg -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Autotest] [PATCH] KVM test: Memory ballooning test for KVM guest
Hi Lucas,

Thanks for your comments. Please find the patch with the suggested changes.

Thanks,
Pradeep

Signed-off-by: Pradeep Kumar Surisetty psuri...@linux.vnet.ibm.com
---
diff -uprN autotest-old/client/tests/kvm/tests/balloon_check.py autotest/client/tests/kvm/tests/balloon_check.py
--- autotest-old/client/tests/kvm/tests/balloon_check.py	1969-12-31 19:00:00.000000000 -0500
+++ autotest/client/tests/kvm/tests/balloon_check.py	2010-04-09 12:33:34.000000000 -0400
@@ -0,0 +1,47 @@
+import re, string, logging, random, time
+from autotest_lib.client.common_lib import error
+import kvm_test_utils, kvm_utils
+
+def run_balloon_check(test, params, env):
+    """
+    Check memory ballooning:
+    1) Boot a guest
+    2) Increase and decrease the memory of the guest using the balloon
+       command from the monitor
+    3) Check memory info
+
+    @param test: kvm test object
+    @param params: Dictionary with the test parameters
+    @param env: Dictionary with test environment.
+    """
+    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+    session = kvm_test_utils.wait_for_login(vm)
+    fail = 0
+
+    # Check memory size
+    logging.info("Memory size check")
+    expected_mem = int(params.get("mem"))
+    actual_mem = vm.get_memory_size()
+    if actual_mem != expected_mem:
+        logging.error("Memory size mismatch:")
+        logging.error("Assigned to VM: %s" % expected_mem)
+        logging.error("Reported by OS: %s" % actual_mem)
+
+    # Change memory to a random size between 60% and 95% of actual memory
+    percent = random.uniform(0.6, 0.95)
+    new_mem = int(percent * expected_mem)
+    vm.send_monitor_cmd("balloon %s" % new_mem)
+    time.sleep(20)
+    status, output = vm.send_monitor_cmd("info balloon")
+    if status != 0:
+        logging.error("qemu monitor command failed: info balloon")
+
+    balloon_cmd_mem = int(re.findall("\d+", output)[0])
+    if balloon_cmd_mem != new_mem:
+        logging.error("memory ballooning failed while changing memory "
+                      "to %s" % balloon_cmd_mem)
+        fail += 1
+
+    # Check the test result
+    if fail != 0:
+        raise error.TestFail("Memory ballooning test failed")
+    session.close()
diff -uprN autotest-old/client/tests/kvm/tests_base.cfg.sample autotest/client/tests/kvm/tests_base.cfg.sample
--- autotest-old/client/tests/kvm/tests_base.cfg.sample	2010-04-09 12:32:50.000000000 -0400
+++ autotest/client/tests/kvm/tests_base.cfg.sample	2010-04-09 12:53:27.000000000 -0400
@@ -185,6 +185,10 @@ variants:
         drift_threshold = 10
         drift_threshold_single = 3
 
+    - balloon_check:  install setup unattended_install boot
+        type = balloon_check
+        extra_params += " -balloon virtio"
+
     - stress_boot:  install setup unattended_install
         type = stress_boot
         max_vms = 5
---
[PATCH RFC 0/5] KVM: Moving dirty bitmaps to userspace: double buffering approach
Hi, this is the first version! We have implemented the x86-specific parts first, without introducing new APIs, so this code works with current qemu-kvm. Although we still have many things to do, we would like to get some comments to see whether we are going in the right direction. Note: we are still testing this, and we are starting to think we may be able to improve performance in some areas, especially migration and graphics; sorry, but this is not confirmed yet. Thanks in advance, Takuya
[PATCH RFC 2/5] KVM: use a wrapper function to calculate the sizes of dirty bitmaps
We will use this later in other parts.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp
---
 arch/powerpc/kvm/book3s.c | 2 +-
 arch/x86/kvm/x86.c        | 2 +-
 include/linux/kvm_host.h  | 5 +++++
 virt/kvm/kvm_main.c       | 4 ++--
 4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index a7ab2ea..3ca857b 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1136,7 +1136,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 		kvm_for_each_vcpu(n, vcpu, kvm)
 			kvmppc_mmu_pte_pflush(vcpu, ga, ga_end);
 
-		n = ALIGN(memslot->npages, BITS_PER_LONG) / 8;
+		n = kvm_dirty_bitmap_bytes(memslot);
 		memset(memslot->dirty_bitmap, 0, n);
 	}
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fd5c3d3..450ecfe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2664,7 +2664,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	if (!memslot->dirty_bitmap)
 		goto out;
 
-	n = ALIGN(memslot->npages, BITS_PER_LONG) / 8;
+	n = kvm_dirty_bitmap_bytes(memslot);
 
 	r = -ENOMEM;
 	dirty_bitmap = vmalloc(n);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8e91fa7..dd6bcf4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -119,6 +119,11 @@ struct kvm_memory_slot {
 	int user_alloc;
 };
 
+static inline int kvm_dirty_bitmap_bytes(struct kvm_memory_slot *memslot)
+{
+	return ALIGN(memslot->npages, BITS_PER_LONG) / 8;
+}
+
 struct kvm_kernel_irq_routing_entry {
 	u32 gsi;
 	u32 type;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9379533..5ab581e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -645,7 +645,7 @@ skip_lpage:
 
 	/* Allocate page dirty bitmap if needed */
 	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
-		unsigned dirty_bytes = ALIGN(npages, BITS_PER_LONG) / 8;
+		int dirty_bytes = kvm_dirty_bitmap_bytes(&new);
 
 		new.dirty_bitmap = vmalloc(dirty_bytes);
 		if (!new.dirty_bitmap)
@@ -777,7 +777,7 @@ int kvm_get_dirty_log(struct kvm *kvm,
 	if (!memslot->dirty_bitmap)
 		goto out;
 
-	n = ALIGN(memslot->npages, BITS_PER_LONG) / 8;
+	n = kvm_dirty_bitmap_bytes(memslot);
 
 	for (i = 0; !any && i < n/sizeof(long); ++i)
 		any = memslot->dirty_bitmap[i];
-- 
1.6.3.3
[PATCH RFC 3/5] KVM: Use wrapper functions to create and destroy dirty bitmaps
For x86, we will later change the allocation and free parts to do_mmap() and do_munmap(). This patch makes that change cleaner.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp
---
 virt/kvm/kvm_main.c | 27 ++++++++++++++++++---------
 1 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5ab581e..f919bd1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -431,6 +431,12 @@ out_err_nodisable:
 	return ERR_PTR(r);
 }
 
+static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+	vfree(memslot->dirty_bitmap);
+	memslot->dirty_bitmap = NULL;
+}
+
 /*
  * Free any memory in @free but not in @dont.
  */
@@ -443,7 +449,7 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
 		vfree(free->rmap);
 
 	if (!dont || free->dirty_bitmap != dont->dirty_bitmap)
-		vfree(free->dirty_bitmap);
+		kvm_destroy_dirty_bitmap(free);
 
 	for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
@@ -454,7 +460,6 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
 	}
 
 	free->npages = 0;
-	free->dirty_bitmap = NULL;
 	free->rmap = NULL;
 }
 
@@ -516,6 +521,18 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
+static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+	int dirty_bytes = kvm_dirty_bitmap_bytes(memslot);
+
+	memslot->dirty_bitmap = vmalloc(dirty_bytes);
+	if (!memslot->dirty_bitmap)
+		return -ENOMEM;
+
+	memset(memslot->dirty_bitmap, 0, dirty_bytes);
+	return 0;
+}
+
 /*
  * Allocate some memory and give it an address in the guest physical address
  * space.
@@ -645,12 +662,8 @@ skip_lpage:
 
 	/* Allocate page dirty bitmap if needed */
 	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
-		int dirty_bytes = kvm_dirty_bitmap_bytes(&new);
-
-		new.dirty_bitmap = vmalloc(dirty_bytes);
-		if (!new.dirty_bitmap)
+		if (kvm_create_dirty_bitmap(&new) < 0)
 			goto out_free;
-		memset(new.dirty_bitmap, 0, dirty_bytes);
 
 		/* destroy any largepage mappings for dirty tracking */
 		if (old.npages)
 			flush_shadow = 1;
-- 
1.6.3.3
[PATCH RFC 4/5] KVM: add new members to the memory slot for double buffering of bitmaps
Currently, x86 vmalloc()s a dirty bitmap every time we switch to the next dirty bitmap. To avoid this, we use the double-buffering technique: we also move the bitmaps to userspace, so that the extra bitmaps do not consume precious kernel resources. This idea is based on Avi's suggestion.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp
---
 arch/x86/include/asm/kvm_host.h | 3 +++
 include/linux/kvm_host.h        | 6 ++++++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0c49c88..b502bca 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -25,6 +25,9 @@
 #include <asm/mtrr.h>
 #include <asm/msr-index.h>
 
+/* Select x86 specific features in <linux/kvm_host.h> */
+#define __KVM_HAVE_USER_DIRTYBITMAP
+
 #define KVM_MAX_VCPUS 64
 #define KVM_MEMORY_SLOTS 32
 /* memory slots that does not exposed to userspace */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index dd6bcf4..07092d6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -110,7 +110,13 @@ struct kvm_memory_slot {
 	unsigned long npages;
 	unsigned long flags;
 	unsigned long *rmap;
+#ifndef __KVM_HAVE_USER_DIRTYBITMAP
 	unsigned long *dirty_bitmap;
+#else
+	unsigned long __user *dirty_bitmap;
+	unsigned long __user *dirty_bitmap_old;
+	bool is_dirty;
+#endif
 	struct {
 		unsigned long rmap_pde;
 		int write_count;
-- 
1.6.3.3
[PATCH RFC 5/5] KVM: This is the main part of the moving dirty bitmaps to user space
With this patch, bitmap allocation is replaced with do_mmap() and bitmap manipulation is replaced with *_user() functions. Note that this does not change the APIs between kernel and user space. To get more advantage from this hack, we need to add a new interface for triggering the bitmap switch and getting the bitmap addresses: the addresses are in user space, so we can export them to qemu.

TODO:
1. We want to use copy_in_user() for the 32-bit case too. Note that this is only a compatibility issue: in the future, we hope, qemu will not need to use this ioctl.
2. We have to implement test_bit_user() to avoid an extra set_bit.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp
---
 arch/x86/kvm/x86.c       | 118 ++++++++++++++++++++++++++++++++++++++-------
 include/linux/kvm_host.h |   4 ++
 virt/kvm/kvm_main.c      |  30 +++++++++++-
 3 files changed, 130 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 450ecfe..995b970 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2642,16 +2642,99 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 	return 0;
 }
 
+int kvm_arch_create_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+	unsigned long user_addr1;
+	unsigned long user_addr2;
+	int dirty_bytes = kvm_dirty_bitmap_bytes(memslot);
+
+	down_write(&current->mm->mmap_sem);
+	user_addr1 = do_mmap(NULL, 0, dirty_bytes,
+			     PROT_READ | PROT_WRITE,
+			     MAP_PRIVATE | MAP_ANONYMOUS, 0);
+	if (IS_ERR((void *)user_addr1)) {
+		up_write(&current->mm->mmap_sem);
+		return PTR_ERR((void *)user_addr1);
+	}
+	user_addr2 = do_mmap(NULL, 0, dirty_bytes,
+			     PROT_READ | PROT_WRITE,
+			     MAP_PRIVATE | MAP_ANONYMOUS, 0);
+	if (IS_ERR((void *)user_addr2)) {
+		do_munmap(current->mm, user_addr1, dirty_bytes);
+		up_write(&current->mm->mmap_sem);
+		return PTR_ERR((void *)user_addr2);
+	}
+	up_write(&current->mm->mmap_sem);
+
+	memslot->dirty_bitmap = (unsigned long __user *)user_addr1;
+	memslot->dirty_bitmap_old = (unsigned long __user *)user_addr2;
+
+	clear_user(memslot->dirty_bitmap, dirty_bytes);
+	clear_user(memslot->dirty_bitmap_old, dirty_bytes);
+
+	return 0;
+}
+
+void kvm_arch_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+	int n = kvm_dirty_bitmap_bytes(memslot);
+
+	if (!memslot->dirty_bitmap)
+		return;
+
+	down_write(&current->mm->mmap_sem);
+	do_munmap(current->mm, (unsigned long)memslot->dirty_bitmap, n);
+	do_munmap(current->mm, (unsigned long)memslot->dirty_bitmap_old, n);
+	up_write(&current->mm->mmap_sem);
+
+	memslot->dirty_bitmap = NULL;
+	memslot->dirty_bitmap_old = NULL;
+}
+
+static int kvm_copy_dirty_bitmap(unsigned long __user *to,
+				 const unsigned long __user *from, int n)
+{
+#ifdef CONFIG_X86_64
+	if (copy_in_user(to, from, n) < 0) {
+		printk(KERN_WARNING "%s: copy_in_user failed\n", __func__);
+		return -EFAULT;
+	}
+	return 0;
+#else
+	int ret = 0;
+	void *p = vmalloc(n);
+
+	if (!p) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	if (copy_from_user(p, from, n) < 0) {
+		printk(KERN_WARNING "%s: copy_from_user failed\n", __func__);
+		ret = -EFAULT;
+		goto out_free;
+	}
+	if (copy_to_user(to, p, n) < 0) {
+		printk(KERN_WARNING "%s: copy_to_user failed\n", __func__);
+		ret = -EFAULT;
+		goto out_free;
+	}
+
+out_free:
+	vfree(p);
+out:
+	return ret;
+#endif
+}
+
 /*
  * Get (and clear) the dirty memory log for a memory slot.
  */
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 				      struct kvm_dirty_log *log)
 {
-	int r, n, i;
+	int r, n;
 	struct kvm_memory_slot *memslot;
-	unsigned long is_dirty = 0;
-	unsigned long *dirty_bitmap = NULL;
+	unsigned long __user *dirty_bitmap;
+	unsigned long __user *dirty_bitmap_old;
 
 	mutex_lock(&kvm->slots_lock);
 
@@ -2664,44 +2747,37 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	if (!memslot->dirty_bitmap)
 		goto out;
 
-	n = kvm_dirty_bitmap_bytes(memslot);
-
-	r = -ENOMEM;
-	dirty_bitmap = vmalloc(n);
-	if (!dirty_bitmap)
-		goto out;
-	memset(dirty_bitmap, 0, n);
+	dirty_bitmap = memslot->dirty_bitmap;
+	dirty_bitmap_old = memslot->dirty_bitmap_old;
 
-	for (i = 0; !is_dirty && i < n/sizeof(long); i++)
-		is_dirty = memslot->dirty_bitmap[i];
+	n = kvm_dirty_bitmap_bytes(memslot);
+	clear_user(dirty_bitmap_old, n);
[RFC][PATCH v3 0/3] Provide a zero-copy method on KVM virtio-net.
The idea is simple: pin the guest VM's user space, and then let the host NIC driver have the chance to DMA to it directly. The patches are based on the vhost-net backend driver. We add a device which provides proto_ops such as sendmsg/recvmsg to vhost-net, to send/recv directly to/from the NIC driver. A KVM guest that uses the vhost-net backend may bind any ethX interface on the host side to get copyless data transfer through the guest virtio-net frontend.

The scenario is like this: the guest virtio-net driver submits multiple requests through the vhost-net backend driver to the kernel. The requests are queued and then completed after the corresponding actions in h/w are done.

For read, user-space buffers are dispensed to the NIC driver for rx when a page constructor API is invoked; that is, NICs can allocate user buffers from a page constructor. We add a hook in the netif_receive_skb() function to intercept the incoming packets and notify the zero-copy device.

For write, the zero-copy device may allocate a new host skb, put the payload on skb_shinfo(skb)->frags, and copy the header to skb->data. The request remains pending until the skb is transmitted by h/w.

We have considered two ways to utilize the page constructor API to dispense the user buffers.

One: Modify the __alloc_skb() function a bit, so that it only allocates a struct sk_buff, with the data pointer pointing to a user buffer that comes from a page constructor API. The shinfo of the skb is then also from the guest. When a packet is received from hardware, skb->data is filled directly by h/w. This is what we have done.

Pros: We can avoid any copy here.
Cons: The guest virtio-net driver needs to allocate the skb in almost the same way as the host NIC drivers, say the size of netdev_alloc_skb() and the same reserved space in the head of the skb. Many NIC drivers match the guest and are fine with this, but some of the latest NIC drivers reserve special room in the skb head. To deal with that, we suggest providing a method in the guest virtio-net driver to ask the NIC driver for the parameters we are interested in, once we know which device we have bound for zero-copy, and then have the guest follow them. Is that reasonable?

Two: Modify the driver to get user buffers allocated from a page constructor API (substituting alloc_page()); the user buffers are used as payload buffers and filled directly by h/w when a packet is received. The driver should associate the pages with the skb (skb_shinfo(skb)->frags). For the head buffer, let the host allocate the skb and h/w fill it. After that, the data filled into the host skb header is copied into the guest header buffer, which is submitted together with the payload buffer.

Pros: We care less about how the guest or the host allocates their buffers.
Cons: We still need a small copy here for the skb header.

We are not sure which way is better; this is the first thing we want comments on from the community. We also wish the modifications to the network part to be generic, not used only by the vhost-net backend: a user application may use them as well when the zero-copy device provides async read/write operations later. Please give comments, especially on the network part modifications.

We provide multiple submits and asynchronous notification to vhost-net too. Our goal is to improve the bandwidth and reduce the CPU usage. Exact performance data will be provided later, but in a simple test with netperf we found both bandwidth and CPU % went up, with the bandwidth increase much larger than the CPU % increase.

What we have not done yet:
- packet split support
- GRO support
- performance tuning

What we have done in v1:
- polish the RCU usage
- deal with write logging in asynchronous mode in vhost
- add a notifier block for the mp device
- rename page_ctor to mp_port in netdevice.h to make it look generic
- add mp_dev_change_flags() for the mp device to change the NIC state
- add CONFIG_VHOST_MPASSTHRU to limit the usage when the module is not loaded
- a small fix for a missing dev_put on failure
- use a dynamic minor instead of a static minor number
- a __KERNEL__ guard for mp_get_sock()

What we have done in v2:
- remove most of the RCU usage, since the ctor pointer is only changed by the BIND/UNBIND ioctl, and during that time the NIC is stopped to get a clean shutdown (all outstanding requests are finished), so the ctor pointer cannot race into a wrong situation
- replace struct vhost_notifier with struct kiocb; let the vhost-net backend alloc/free the kiocbs and transfer them via sendmsg/recvmsg
- use get_user_pages_fast() and set_page_dirty_lock()
[RFC][PATCH v3 1/3] A device for zero-copy based on KVM virtio-net.
From: Xin Xiaohui xiaohui@intel.com Add a device to utilize the vhost-net backend driver for copy-less data transfer between guest FE and host NIC. It pins the guest user space to the host memory and provides proto_ops as sendmsg/recvmsg to vhost-net. Signed-off-by: Xin Xiaohui xiaohui@intel.com Signed-off-by: Zhao Yu yzha...@gmail.com Reviewed-by: Jeff Dike jd...@linux.intel.com --- memory leak fixed, kconfig made, do_unbind() made, mp_chr_ioctl() cleaned up and some other cleanups made by Jeff Dike jd...@linux.intel.com drivers/vhost/Kconfig |5 + drivers/vhost/Makefile|2 + drivers/vhost/mpassthru.c | 1264 + include/linux/mpassthru.h | 29 + 4 files changed, 1300 insertions(+), 0 deletions(-) create mode 100644 drivers/vhost/mpassthru.c create mode 100644 include/linux/mpassthru.h diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig index 9f409f4..ee32a3b 100644 --- a/drivers/vhost/Kconfig +++ b/drivers/vhost/Kconfig @@ -9,3 +9,8 @@ config VHOST_NET To compile this driver as a module, choose M here: the module will be called vhost_net. +config VHOST_PASSTHRU + tristate Zerocopy network driver (EXPERIMENTAL) + depends on VHOST_NET + ---help--- + zerocopy network I/O support diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile index 72dd020..3f79c79 100644 --- a/drivers/vhost/Makefile +++ b/drivers/vhost/Makefile @@ -1,2 +1,4 @@ obj-$(CONFIG_VHOST_NET) += vhost_net.o vhost_net-y := vhost.o net.o + +obj-$(CONFIG_VHOST_PASSTHRU) += mpassthru.o diff --git a/drivers/vhost/mpassthru.c b/drivers/vhost/mpassthru.c new file mode 100644 index 000..86d2525 --- /dev/null +++ b/drivers/vhost/mpassthru.c @@ -0,0 +1,1264 @@ +/* + * MPASSTHRU - Mediate passthrough device. 
+ * Copyright (C) 2009 ZhaoYu, XinXiaohui, Dike, Jeffery G + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#define DRV_NAMEmpassthru +#define DRV_DESCRIPTION Mediate passthru device driver +#define DRV_COPYRIGHT (C) 2009 ZhaoYu, XinXiaohui, Dike, Jeffery G + +#include linux/module.h +#include linux/errno.h +#include linux/kernel.h +#include linux/major.h +#include linux/slab.h +#include linux/smp_lock.h +#include linux/poll.h +#include linux/fcntl.h +#include linux/init.h +#include linux/aio.h + +#include linux/skbuff.h +#include linux/netdevice.h +#include linux/etherdevice.h +#include linux/miscdevice.h +#include linux/ethtool.h +#include linux/rtnetlink.h +#include linux/if.h +#include linux/if_arp.h +#include linux/if_ether.h +#include linux/crc32.h +#include linux/nsproxy.h +#include linux/uaccess.h +#include linux/virtio_net.h +#include linux/mpassthru.h +#include net/net_namespace.h +#include net/netns/generic.h +#include net/rtnetlink.h +#include net/sock.h + +#include asm/system.h + +#include vhost.h + +/* Uncomment to enable debugging */ +/* #define MPASSTHRU_DEBUG 1 */ + +#ifdef MPASSTHRU_DEBUG +static int debug; + +#define DBG if (mp-debug) printk +#define DBG1 if (debug == 2) printk +#else +#define DBG(a...) +#define DBG1(a...) +#endif + +#define COPY_THRESHOLD (L1_CACHE_BYTES * 4) +#define COPY_HDR_LEN (L1_CACHE_BYTES 64 ? 
64 : L1_CACHE_BYTES) + +struct frag { + u16 offset; + u16 size; +}; + +struct page_ctor { + struct list_headreadq; + int w_len; + int r_len; + spinlock_t read_lock; + struct kmem_cache *cache; + /* record the locked pages */ + int lock_pages; + struct rlimit o_rlim; + struct net_device *dev; + struct mpassthru_port port; +}; + +struct page_info { + void*ctrl; + struct list_headlist; + int header; + /* indicate the actual length of bytes +* send/recv in the user space buffers +*/ + int total; + int offset; + struct page *pages[MAX_SKB_FRAGS+1]; + struct skb_frag_struct frag[MAX_SKB_FRAGS+1]; + struct sk_buff *skb; + struct page_ctor*ctor; + + /* The pointer relayed to skb, to indicate +* it's a user space allocated skb or kernel +*/ + struct skb_user_pageuser; + struct skb_shared_info ushinfo; + +#define INFO_READ
[RFC][PATCH v3 3/3] Let host NIC driver to DMA to guest user space.
From: Xin Xiaohui xiaohui@intel.com The patch let host NIC driver to receive user space skb, then the driver has chance to directly DMA to guest user space buffers thru single ethX interface. Signed-off-by: Xin Xiaohui xiaohui@intel.com Signed-off-by: Zhao Yu yzha...@gmail.com Reviewed-by: Jeff Dike jd...@linux.intel.com --- alloc_skb() is cleanup by Jeff Dike jd...@linux.intel.com include/linux/netdevice.h | 69 - include/linux/skbuff.h| 30 -- net/core/dev.c| 63 ++ net/core/skbuff.c | 74 4 files changed, 224 insertions(+), 12 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 94958c1..ba48eb0 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -485,6 +485,17 @@ struct netdev_queue { unsigned long tx_dropped; } cacheline_aligned_in_smp; +#if defined(CONFIG_VHOST_PASSTHRU) || defined(CONFIG_VHOST_PASSTHRU_MODULE) +struct mpassthru_port { + int hdr_len; + int data_len; + int npages; + unsignedflags; + struct socket *sock; + struct skb_user_page*(*ctor)(struct mpassthru_port *, + struct sk_buff *, int); +}; +#endif /* * This structure defines the management hooks for network devices. 
@@ -636,6 +647,10 @@ struct net_device_ops { int (*ndo_fcoe_ddp_done)(struct net_device *dev, u16 xid); #endif +#if defined(CONFIG_VHOST_PASSTHRU) || defined(CONFIG_VHOST_PASSTHRU_MODULE) + int (*ndo_mp_port_prep)(struct net_device *dev, + struct mpassthru_port *port); +#endif }; /* @@ -891,7 +906,8 @@ struct net_device struct macvlan_port *macvlan_port; /* GARP */ struct garp_port*garp_port; - + /* mpassthru */ + struct mpassthru_port *mp_port; /* class/net/name entry */ struct device dev; /* space for optional statistics and wireless sysfs groups */ @@ -2013,6 +2029,55 @@ static inline u32 dev_ethtool_get_flags(struct net_device *dev) return 0; return dev-ethtool_ops-get_flags(dev); } -#endif /* __KERNEL__ */ +/* To support zero-copy between user space application and NIC driver, + * we'd better ask NIC driver for the capability it can provide, especially + * for packet split mode, now we only ask for the header size, and the + * payload once a descriptor may carry. + */ + +#if defined(CONFIG_VHOST_PASSTHRU) || defined(CONFIG_VHOST_PASSTHRU_MODULE) +static inline int netdev_mp_port_prep(struct net_device *dev, + struct mpassthru_port *port) +{ + int rc; + int npages, data_len; + const struct net_device_ops *ops = dev-netdev_ops; + + /* needed by packet split */ + if (ops-ndo_mp_port_prep) { + rc = ops-ndo_mp_port_prep(dev, port); + if (rc) + return rc; + } else { + /* If the NIC driver did not report this, +* then we try to use it as igb driver. 
+*/ + port-hdr_len = 128; + port-data_len = 2048; + port-npages = 1; + } + + if (port-hdr_len = 0) + goto err; + + npages = port-npages; + data_len = port-data_len; + if (npages = 0 || npages MAX_SKB_FRAGS || + (data_len PAGE_SIZE * (npages - 1) || +data_len PAGE_SIZE * npages)) + goto err; + + return 0; +err: + dev_warn(dev-dev, invalid page constructor parameters\n); + + return -EINVAL; +} + +extern int netdev_mp_port_attach(struct net_device *dev, + struct mpassthru_port *port); +extern void netdev_mp_port_detach(struct net_device *dev); +#endif /* CONFIG_VHOST_PASSTHRU */ +#endif /* __KERNEL__ */ #endif /* _LINUX_NETDEVICE_H */ diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index df7b23a..e59fa57 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -209,6 +209,13 @@ struct skb_shared_info { void * destructor_arg; }; +struct skb_user_page { + u8 *start; + int size; + struct skb_frag_struct *frags; + struct skb_shared_info *ushinfo; + void(*dtor)(struct skb_user_page *); +}; /* We divide dataref into two halves. The higher 16 bits hold references * to the payload part of skb-data. The lower 16 bits hold references to * the entire skb-data. A clone of a headerless skb holds the length of @@ -441,17 +448,18 @@ extern void kfree_skb(struct sk_buff *skb); extern void
[RFC][PATCH v3 2/3] Provides multiple submits and asynchronous notifications.
From: Xin Xiaohui xiaohui@intel.com The vhost-net backend now only supports synchronous send/recv operations. The patch provides multiple submits and asynchronous notifications. This is needed for zero-copy case. Signed-off-by: Xin Xiaohui xiaohui@intel.com --- drivers/vhost/net.c | 203 +++-- drivers/vhost/vhost.c | 115 drivers/vhost/vhost.h | 15 3 files changed, 278 insertions(+), 55 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 22d5fef..d3fb3fc 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -17,11 +17,13 @@ #include linux/workqueue.h #include linux/rcupdate.h #include linux/file.h +#include linux/aio.h #include linux/net.h #include linux/if_packet.h #include linux/if_arp.h #include linux/if_tun.h +#include linux/mpassthru.h #include net/sock.h @@ -47,6 +49,7 @@ struct vhost_net { struct vhost_dev dev; struct vhost_virtqueue vqs[VHOST_NET_VQ_MAX]; struct vhost_poll poll[VHOST_NET_VQ_MAX]; + struct kmem_cache *cache; /* Tells us whether we are polling a socket for TX. * We only do this when socket buffer fills up. * Protected by tx vq lock. */ @@ -91,11 +94,100 @@ static void tx_poll_start(struct vhost_net *net, struct socket *sock) net-tx_poll_state = VHOST_NET_POLL_STARTED; } +struct kiocb *notify_dequeue(struct vhost_virtqueue *vq) +{ + struct kiocb *iocb = NULL; + unsigned long flags; + + spin_lock_irqsave(vq-notify_lock, flags); + if (!list_empty(vq-notifier)) { + iocb = list_first_entry(vq-notifier, + struct kiocb, ki_list); + list_del(iocb-ki_list); + } + spin_unlock_irqrestore(vq-notify_lock, flags); + return iocb; +} + +static void handle_async_rx_events_notify(struct vhost_net *net, + struct vhost_virtqueue *vq) +{ + struct kiocb *iocb = NULL; + struct vhost_log *vq_log = NULL; + int rx_total_len = 0; + unsigned int head, log, in, out; + int size; + + if (vq-link_state != VHOST_VQ_LINK_ASYNC) + return; + + if (vq-receiver) + vq-receiver(vq); + + vq_log = unlikely(vhost_has_feature( + net-dev, VHOST_F_LOG_ALL)) ? 
vq-log : NULL; + while ((iocb = notify_dequeue(vq)) != NULL) { + vhost_add_used_and_signal(net-dev, vq, + iocb-ki_pos, iocb-ki_nbytes); + log = (int)iocb-ki_user_data; + size = iocb-ki_nbytes; + head = iocb-ki_pos; + rx_total_len += iocb-ki_nbytes; + + if (iocb-ki_dtor) + iocb-ki_dtor(iocb); + kmem_cache_free(net-cache, iocb); + + /* when log is enabled, recomputing the log info is needed, +* since these buffers are in async queue, and may not get +* the log info before. +*/ + if (unlikely(vq_log)) { + if (!log) + __vhost_get_vq_desc(net-dev, vq, vq-iov, + ARRAY_SIZE(vq-iov), + out, in, vq_log, + log, head); + vhost_log_write(vq, vq_log, log, size); + } + if (unlikely(rx_total_len = VHOST_NET_WEIGHT)) { + vhost_poll_queue(vq-poll); + break; + } + } +} + +static void handle_async_tx_events_notify(struct vhost_net *net, + struct vhost_virtqueue *vq) +{ + struct kiocb *iocb = NULL; + int tx_total_len = 0; + + if (vq-link_state != VHOST_VQ_LINK_ASYNC) + return; + + while ((iocb = notify_dequeue(vq)) != NULL) { + vhost_add_used_and_signal(net-dev, vq, + iocb-ki_pos, 0); + tx_total_len += iocb-ki_nbytes; + + if (iocb-ki_dtor) + iocb-ki_dtor(iocb); + + kmem_cache_free(net-cache, iocb); + if (unlikely(tx_total_len = VHOST_NET_WEIGHT)) { + vhost_poll_queue(vq-poll); + break; + } + } +} + /* Expects to be always run from workqueue - which acts as * read-size critical section for our kind of RCU. */ static void handle_tx(struct vhost_net *net) { struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX]; + struct kiocb *iocb = NULL; unsigned head, out, in, s; struct msghdr msg = { .msg_name = NULL, @@ -124,6 +216,8 @@ static void handle_tx(struct vhost_net *net)
Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes
On 04/02/2010 08:05 PM, Greg KH wrote:

Currently kvm does device assignment with its own code; I'd like to unify it with uio, not split it off. Separate notifications for msi-x interrupts are just as useful for uio as they are for kvm.

I agree, there should not be a difference here for KVM vs. the normal version. Just so you know what you got into, here are the kvm requirements:

- msi interrupts delivered via eventfd (these allow us to inject interrupts from uio to a guest without going through userspace)
- nonlinear iommu mapping (i.e. map discontiguous ranges of the device address space into ranges of the virtual address space)
- dynamic iommu mapping (support guest memory hotplug)
- unprivileged operation once an admin has assigned a device (my preferred implementation is to have all operations go through an fd, which can be passed via SCM_RIGHTS from a privileged application that opens the file)
- access to all config space, but BARs must be translated so userspace cannot attack the host
- some mechanism which allows us to affine device interrupts with their target vcpus (eventually; this is vague)
- anything mst might add
- a pony

-- 
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Autotest] [PATCH] KVM test: Memory ballooning test for KVM guest
On Fri, Apr 9, 2010 at 2:40 PM, pradeep psuri...@linux.vnet.ibm.com wrote:

 Hi Lucas
 Thanks for your comments. Please find the patch, with suggested changes.
 Thanks
 Pradeep

 Signed-off-by: Pradeep Kumar Surisetty psuri...@linux.vnet.ibm.com
 ---
 diff -uprN autotest-old/client/tests/kvm/tests/balloon_check.py autotest/client/tests/kvm/tests/balloon_check.py
 --- autotest-old/client/tests/kvm/tests/balloon_check.py 1969-12-31 19:00:00.0 -0500
 +++ autotest/client/tests/kvm/tests/balloon_check.py 2010-04-09 12:33:34.0 -0400
 @@ -0,0 +1,47 @@
 +import re, string, logging, random, time
 +from autotest_lib.client.common_lib import error
 +import kvm_test_utils, kvm_utils
 +
 +def run_balloon_check(test, params, env):
 +    """
 +    Check Memory ballooning:
 +    1) Boot a guest
 +    2) Increase and decrease the memory of guest using balloon command from monitor

Better replace this description by "Change the guest memory between X and Y values". Also, instead of using 0.6 and 0.95 below, better use two variables and take their values from the config file. This will give the user the flexibility to narrow or widen the ballooning range.

 +    3) check memory info
 +
 +    @param test: kvm test object
 +    @param params: Dictionary with the test parameters
 +    @param env: Dictionary with test environment.
 +    """
 +    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
 +    session = kvm_test_utils.wait_for_login(vm)
 +    fail = 0
 +
 +    # Check memory size
 +    logging.info("Memory size check")
 +    expected_mem = int(params.get("mem"))
 +    actual_mem = vm.get_memory_size()
 +    if actual_mem != expected_mem:
 +        logging.error("Memory size mismatch:")
 +        logging.error("Assigned to VM: %s" % expected_mem)
 +        logging.error("Reported by OS: %s" % actual_mem)
 +
 +    # change memory to random size between 60% to 95% of actual memory
 +    percent = random.uniform(0.6, 0.95)
 +    new_mem = int(percent * expected_mem)
 +    vm.send_monitor_cmd("balloon %s" % new_mem)

You may want to check if the command passed/failed. Older versions might not support ballooning.
 +    time.sleep(20)

Why a 20 second sleep, and why the magic number?

 +    status, output = vm.send_monitor_cmd("info balloon")

You might want to put this check before changing the memory.

 +    if status != 0:
 +        logging.error("qemu monitor command failed: info balloon")
 +
 +    balloon_cmd_mem = int(re.findall("\d+", output)[0])

A better variable name I can think of is ballooned_mem.

 +    if balloon_cmd_mem != new_mem:
 +        logging.error("memory ballooning failed while changing memory to %s" % balloon_cmd_mem)
 +        fail += 1
 +
 +    # Checking for test result
 +    if fail != 0:

In case you are running multiple iterations and the 2nd iteration fails you will always miss this condition.

 +        raise error.TestFail("Memory ballooning test failed")
 +    session.close()
 diff -uprN autotest-old/client/tests/kvm/tests_base.cfg.sample autotest/client/tests/kvm/tests_base.cfg.sample
 --- autotest-old/client/tests/kvm/tests_base.cfg.sample 2010-04-09 12:32:50.0 -0400
 +++ autotest/client/tests/kvm/tests_base.cfg.sample 2010-04-09 12:53:27.0 -0400
 @@ -185,6 +185,10 @@ variants:
             drift_threshold = 10
             drift_threshold_single = 3
 +    - balloon_check: install setup unattended_install boot
 +        type = balloon_check
 +        extra_params += " -balloon virtio"
 +
     - stress_boot: install setup unattended_install
         type = stress_boot
         max_vms = 5
 ---

Rest all looks good.

___
Autotest mailing list
autot...@test.kernel.org
http://test.kernel.org/cgi-bin/mailman/listinfo/autotest

--
Regards
Sudhir Kumar
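The reviewer's suggestions above (a configurable balloon range instead of the hard-coded 0.6/0.95, and validating the monitor's reported value) could be folded in along the lines of this sketch; the `balloon_min`/`balloon_max` key names and the dict-style `params` are assumptions for illustration, not the final autotest API:

```python
import random
import re

def pick_balloon_target(params, expected_mem):
    """Choose a ballooning target between two configurable bounds.

    balloon_min/balloon_max would come from the test config file, as
    the reviewer suggests, instead of the hard-coded 0.6/0.95.
    """
    lo = float(params.get("balloon_min", 0.6))
    hi = float(params.get("balloon_max", 0.95))
    percent = random.uniform(lo, hi)
    return int(percent * expected_mem)

def parse_ballooned_mem(info_balloon_output):
    """Extract the reported memory size from 'info balloon' output."""
    match = re.search(r"\d+", info_balloon_output)
    if match is None:
        raise ValueError("unexpected 'info balloon' output: %r"
                         % info_balloon_output)
    return int(match.group())

# With both bounds pinned to 0.5, the target is deterministic:
print(pick_balloon_target({"balloon_min": "0.5", "balloon_max": "0.5"}, 1024))  # 512
print(parse_ballooned_mem("balloon: actual=512"))  # 512
```

Raising `ValueError` on unexpected monitor output also addresses the reviewer's point about older QEMU versions that lack balloon support, where `info balloon` would not return a number at all.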
Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.
On Thu, 8 Apr 2010 18:01:01 +0200 Mohammed Gamal m.gamal...@gmail.com wrote:

Hi,
Now that Cam is almost done with his ivshmem patches, I was thinking of another idea for GSoC, which is improving the pass-through filesystems. I've got some questions on that:

1- What does the community prefer to use and improve? CIFS, 9p, or both? And which is better taken up for GSoC?
2- With respect to CIFS, I wonder how the shares are supposed to be exposed to the guest. Should the Samba server be modified to be able to use unix domain sockets instead of TCP ports, with QEMU communicating on these sockets? With that approach, how would the guest see the exposed share? And what is the problem of using Samba with TCP ports?
3- In addition, I see the idea mentions that some Windows code needs to be written to use network shares on a special interface. What's that interface? And what's the nature of that Windows code? (a driver a la guest additions?)

CC'ing Aneesh as he's working on that.
Re: [PATCH] vhost: Make it more scalable by creating a vhost thread per device.
On Thu, 2010-04-08 at 17:14 -0700, Rick Jones wrote:

Here are the results with netperf TCP_STREAM 64K guest to host on a 8-cpu Nehalem system.

I presume you mean 8 core Nehalem-EP, or did you mean 8 processor Nehalem-EX?

Yes. It is a 2 socket quad-core Nehalem, so I guess it is an 8 core Nehalem-EP.

Don't get me wrong, I *like* the netperf 64K TCP_STREAM test, I like it a lot!-) but I find it incomplete, and also like to run things like single-instance TCP_RR and multiple-instance, multiple-transaction (./configure --enable-burst) TCP_RR tests, particularly when concerned with scaling issues.

Can we run multiple instance and multiple transaction tests with a single netperf commandline? Is there any easy way to get consolidated throughput when a netserver on the host is servicing netperf clients from multiple guests?

Thanks
Sridhar

happy benchmarking,
rick jones

It shows cumulative bandwidth in Mbps and host CPU utilization.

Current default, single vhost thread:
1 guest:  12500  37%
2 guests: 12800  46%
3 guests: 12600  47%
4 guests: 12200  47%
5 guests: 12000  47%
6 guests: 11700  47%
7 guests: 11340  47%
8 guests: 11200  48%

vhost thread per cpu:
1 guest:   4900  25%
2 guests: 10800  49%
3 guests: 17100  67%
4 guests: 20400  84%
5 guests: 21000  90%
6 guests: 22500  92%
7 guests: 23500  96%
8 guests: 24500  99%

vhost thread per guest interface:
1 guest:  12500  37%
2 guests: 21000  72%
3 guests: 21600  79%
4 guests: 21600  85%
5 guests: 22500  89%
6 guests: 22800  94%
7 guests: 24500  98%
8 guests: 26400  99%

Thanks
Sridhar
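Lacking a built-in consolidation mode in netperf, one common answer to Sridhar's question is to capture each guest's reported throughput and sum it in a small post-processing step; a minimal sketch, where the "<guest> <Mbps>" line format is a hypothetical intermediate produced by whatever launches the per-guest runs:

```python
def consolidate(result_lines):
    """Sum throughput figures from lines of the form '<guest> <Mbps>'.

    The line format is an assumption for illustration; real runs would
    parse netperf's own output, or its CSV (omni) format, instead.
    """
    total = 0.0
    for line in result_lines:
        fields = line.split()
        if len(fields) >= 2:
            total += float(fields[1])
    return total

lines = ["guest1 12500", "guest2 8500"]
print(consolidate(lines))  # 21000.0
```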
Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes
On Friday 09 April 2010 02:58:19 am Avi Kivity wrote: On 04/02/2010 08:05 PM, Greg KH wrote: Currently kvm does device assignment with its own code, I'd like to unify it with uio, not split it off. Separate notifications for msi-x interrupts are just as useful for uio as they are for kvm. I agree, there should not be a difference here for KVM vs. the normal version. Just so you know what you got into, here are the kvm requirements:

- msi interrupts delivered via eventfd (these allow us to inject interrupts from uio to a guest without going through userspace)

Check.

- nonlinear iommu mapping (i.e. map discontiguous ranges of the device address space into ranges of the virtual address space)

Check.

- dynamic iommu mapping (support guest memory hotplug)

Check.

- unprivileged operation once an admin has assigned a device (my preferred implementation is to have all operations go through an fd, which can be passed via SCM_RIGHTS from a privileged application that opens the file)

Check.

- access to all config space, but BARs must be translated so userspace cannot attack the host

Please elaborate. All of PCI config? All of PCIe config? Seems like a huge mess.

- some mechanism which allows us to affine device interrupts with their target vcpus (eventually, this is vague)

Do-able.

- anything mst might add

mst?

- a pony

Rainbow or glitter?

The 'check' items are already done, not fully tested; probably available next week. Can we leave the others for future patches? Please? And I definitely need help with the PCI config stuff.
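The SCM_RIGHTS hand-off described above, where a privileged process opens the device and passes the file descriptor to an unprivileged one, is a standard Unix mechanism. Here is a self-contained illustration of fd passing using Python's `socket.send_fds`/`recv_fds` wrappers (Python 3.9+), with a pipe standing in for the device fd; this shows only the mechanism, not any uio API:

```python
import os
import socket

# A privileged parent opens a resource and hands the fd to an
# unprivileged peer over a Unix socket; send_fds/recv_fds wrap the
# SCM_RIGHTS ancillary message.
parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

r, w = os.pipe()                       # stand-in for a device fd
os.write(w, b"hello from the privileged side")
socket.send_fds(parent, [b"fd"], [r])  # SCM_RIGHTS transfer

msg, fds, flags, addr = socket.recv_fds(child, 16, 1)
received = os.read(fds[0], 64)         # the peer reads via its own copy
print(received.decode())
```

In the real scenario the two endpoints would be separate processes with different privileges; a socketpair in one process is used here only to keep the example self-contained.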
Re: [PATCH 1/1] uio_pci_generic: extensions to allow access for non-privileged processes
Mea culpa. On Friday 09 April 2010 02:08:55 am Joerg Roedel wrote: Btw. This patch posting is broken. It suffers from line-wraps which make it impossible to apply as-is. I was able to fix it but please consider this in your next posting. On Wed, Mar 31, 2010 at 05:12:35PM -0700, Tom Lyon wrote: --- linux-2.6.33/drivers/uio/uio_pci_generic.c 2010-02-24 10:52:17.0 -0800 ^ Unexpected line-wrap. I also got some whitespace warnings when trying to apply it. Please make sure you fix this in the next version too. Thanks, Joerg
Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for?non-privileged processes
On Fri, Apr 09, 2010 at 09:34:16AM -0700, Tom Lyon wrote: The 'check' items are already done, not fully tested; probably available next week. Can we leave the others for future patches? Please? And I definitely need help with the PCI config stuff. Yeah, go in small steps forward. Just post again what you have next week. We can add more functionality step by step. Joerg
Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.
Luiz Capitulino wrote: On Thu, 8 Apr 2010 18:01:01 +0200 Mohammed Gamal m.gamal...@gmail.com wrote: Hi, Now that Cam is almost done with his ivshmem patches, I was thinking of another idea for GSoC which is improving the pass-through filesystems. I've got some questions on that: 1- What does the community prefer to use and improve? CIFS, 9p, or both? And which is better taken up for GSoC. Please look at our recent set of patches. We are developing a 9P server for QEMU, and the client is already part of mainline Linux. Our goal is to optimize it for virtualization environments; it will work as an FS pass-through mechanism between the host and the guest. Here is the latest set of patches: http://www.mail-archive.com/qemu-de...@nongnu.org/msg29267.html Please let us know if you are interested ... we can coordinate. Thanks, JV 2- With respect to CIFS. I wonder how the shares are supposed to be exposed to the guest. Should the Samba server be modified to be able to use unix domain sockets instead of TCP ports and then QEMU communicating on these sockets. With that approach, how should the guest be able to see the exposed share? And what is the problem of using Samba with TCP ports? 3- In addition, I see the idea mentions that some Windows code needs to be written to use network shares on a special interface. What's that interface? And what's the nature of that Windows code? (a driver a la guest additions?) CC'ing Aneesh as he's working on that.
Re: [PATCH] vhost: Make it more scalable by creating a vhost thread per device.
Sridhar Samudrala wrote: On Thu, 2010-04-08 at 17:14 -0700, Rick Jones wrote: Here are the results with netperf TCP_STREAM 64K guest to host on a 8-cpu Nehalem system. I presume you mean 8 core Nehalem-EP, or did you mean 8 processor Nehalem-EX? Yes. It is a 2 socket quad-core Nehalem. so i guess it is a 8 core Nehalem-EP. Don't get me wrong, I *like* the netperf 64K TCP_STREAM test, I like it a lot!-) but I find it incomplete and also like to run things like single-instance TCP_RR and multiple-instance, multiple transaction (./configure --enable-burst) TCP_RR tests, particularly when concerned with scaling issues. Can we run multiple instance and multiple transaction tests with a single netperf commandline? Do you count a shell for loop as a single command line? Is there any easy way to get consolidated throughput when a netserver on the host is servicing netperf clients from multiple guests? I tend to use a script such as: ftp://ftp.netperf.org/netperf/misc/runemomniagg2.sh which presumes that netperf/netserver have been built with: ./configure --enable-omni --enable-burst ... and uses the CSV output format of the omni tests. When I want sums I then turn to a spreadsheet, or I suppose I could turn to awk etc. The TCP_RR test can be flipped around, request size for response size etc., so when I have a single system under test, I initiate the netperf commands on it, targeting netservers on the clients. If I want inbound bulk throughput I use the TCP_MAERTS test rather than the TCP_STREAM test. happy benchmarking, rick jones
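For the "sums" step Rick mentions, the omni CSV output can be totalled without reaching for a spreadsheet or awk; a small sketch, assuming a `THROUGHPUT` column name (the actual header depends on the output selectors chosen at run time):

```python
import csv
import io

def sum_throughput(csv_text, column="THROUGHPUT"):
    """Sum one column across netperf omni CSV result rows.

    The column name is an assumption; netperf's omni CSV header
    reflects whatever output selectors were requested.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(float(row[column]) for row in reader)

sample = ("THROUGHPUT,THROUGHPUT_UNITS\n"
          "9416.5,10^6bits/s\n"
          "9021.5,10^6bits/s\n")
print(sum_throughput(sample))  # 18438.0
```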
Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.
On Fri, Apr 9, 2010 at 7:11 PM, jvrao jv...@linux.vnet.ibm.com wrote: Luiz Capitulino wrote: On Thu, 8 Apr 2010 18:01:01 +0200 Mohammed Gamal m.gamal...@gmail.com wrote: Hi, Now that Cam is almost done with his ivshmem patches, I was thinking of another idea for GSoC which is improving the pass-through filesystems. I've got some questions on that: 1- What does the community prefer to use and improve? CIFS, 9p, or both? And which is better taken up for GSoC. Please look at our recent set of patches. We are developing a 9P server for QEMU, and the client is already part of mainline Linux. Our goal is to optimize it for virtualization environments; it will work as an FS pass-through mechanism between the host and the guest. Here is the latest set of patches: http://www.mail-archive.com/qemu-de...@nongnu.org/msg29267.html Please let us know if you are interested ... we can coordinate. Thanks, JV I'd be interested indeed. 2- With respect to CIFS. I wonder how the shares are supposed to be exposed to the guest. Should the Samba server be modified to be able to use unix domain sockets instead of TCP ports and then QEMU communicating on these sockets. With that approach, how should the guest be able to see the exposed share? And what is the problem of using Samba with TCP ports? 3- In addition, I see the idea mentions that some Windows code needs to be written to use network shares on a special interface. What's that interface? And what's the nature of that Windows code? (a driver a la guest additions?) CC'ing Aneesh as he's working on that.
Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes
On 04/09/2010 07:34 PM, Tom Lyon wrote: - access to all config space, but BARs must be translated so userspace cannot attack the host Please elaborate. All of PCI config? All of PCIe config? Seems like a huge mess. Yes. Anything a guest's device driver may want to access. The 'check' items are already done, not fully tested; probably available next week. Can we leave the others for future patches? Please? Hey, I was expecting we'd have to do all of this. The requirements list was to get the uio maintainers confirmation that this is going in an acceptable direction. We can definitely proceed incrementally. And I definitely need help with the PCI config stuff. Sure. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: hugetlbfs and KSM
* Bernhard Schmidt (be...@birkenwald.de) wrote: * KSM seems to be largely ineffective (100MB saved - 1.3MB saved) Am I doing something wrong? Is this a bug? Is this generally impossible with large pages (which might explain the lower load on the host, if large pages are not scanned)? Or is it just way less likely to have identical pages at that size? KSM only scans and merges 4k pages.
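That one-line answer also explains why identical content becomes much rarer at larger granularity: a huge page could only be deduplicated as a whole, i.e. if all 512 of its 4 kB subpages matched. A back-of-the-envelope sketch, under the simplifying (and hypothetical) assumption that subpages match independently:

```python
def mergeable_fraction(p_match_4k, pages_per_hugepage=512):
    """Probability that an entire 2 MB huge page is identical, assuming
    each of its 512 4 kB subpages independently matches with
    probability p_match_4k.
    """
    return p_match_4k ** pages_per_hugepage

# Even if 99% of individual 4 kB pages would merge, a whole 2 MB page
# almost never does -- the fraction drops well below 1%:
print(mergeable_fraction(0.99))
```

The independence assumption is crude, but it captures why savings collapse from the ~100 MB seen with 4 kB pages to almost nothing with hugetlbfs.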
Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes
* Avi Kivity (a...@redhat.com) wrote: On 04/02/2010 08:05 PM, Greg KH wrote: - access to all config space, but BARs must be translated so userspace cannot attack the host Specifically, intermediated access to config space. For example, need to know about MSI/MSI-X updates in config space. thanks, -chris
Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for?non-privileged processes
* Tom Lyon (p...@lyon-about.com) wrote: On Friday 09 April 2010 02:58:19 am Avi Kivity wrote: - access to all config space, but BARs must be translated so userspace cannot attack the host Please elaborate. All of PCI config? All of PCIe config? Seems like a huge mess. All of config space, but not raw access to all bits. So the MSI/MSI-X capability writes need to be intermediated. There's bits in the header too. And it's not just PCI, it's extended config space as well; drivers may care about finding their whizzybang PCIe capability and doing something with it (and worse... they are allowed to put device specific registers there, and worse yet... they do!). thanks, -chris
Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.
Mohammed Gamal wrote: 2- With respect to CIFS. I wonder how the shares are supposed to be exposed to the guest. Should the Samba server be modified to be able to use unix domain sockets instead of TCP ports and then QEMU communicating on these sockets. With that approach, how should the guest be able to see the exposed share? And what is the problem of using Samba with TCP ports? One problem with TCP ports is it only works when the guest's network is up :) You can't boot from that. It also makes things fragile or difficult if the guest work you are doing involves fiddling with the network settings. Doing it over virtio-serial would have many benefits. On the other hand, Samba+TCP+CIFS does have the advantage of working with virtually all guest OSes, including Linux / BSDs / Windows / MacOSX / Solaris etc. 9P only works with Linux as far as I know. A big problem with Samba at the moment is that it's not possible to instantiate multiple instances of Samba any more, nor to run it as a non-root user. That's because it contains some hard-coded paths to directories of run-time state, at least on Debian/Ubuntu hosts where I have tried and failed to use qemu's smb option, and there is no config file option to disable that or even change all the paths. Patching Samba to make per-user instantiations possible again would go a long way to making it useful for filesystem passthrough. Patching it so you can turn off all the fancy features and have it _just_ serve a filesystem with the most basic necessary authentication would be even better. -- Jamie
Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.
On Fri, Apr 9, 2010 at 11:22 PM, Jamie Lokier ja...@shareable.org wrote: Mohammed Gamal wrote: 2- With respect to CIFS. I wonder how the shares are supposed to be exposed to the guest. Should the Samba server be modified to be able to use unix domain sockets instead of TCP ports and then QEMU communicating on these sockets. With that approach, how should the guest be able to see the exposed share? And what is the problem of using Samba with TCP ports? One problem with TCP ports is it only works when the guest's network is up :) You can't boot from that. It also makes things fragile or difficult if the guest work you are doing involves fiddling with the network settings. Doing it over virtio-serial would have many benefits. On the other hand, Samba+TCP+CIFS does have the advantage of working with virtually all guest OSes, including Linux / BSDs / Windows / MacOSX / Solaris etc. 9P only works with Linux as far as I know. I big problem with Samba at the moment is it's not possible to instantiate multiple instances of Samba any more, and not as a non-root user. That's because it contains some hard-coded paths to directories of run-time state, at least on Debian/Ubuntu hosts where I have tried and failed to use qemu's smb option, and there is no config file option to disable that or even change all the paths. Patching Samba to make per-user instantiations possible again would go a long way to making it useful for filesystem passthrough. Patching it so you can turn off all the fancy features and have it _just_ serve a filesystem with the most basic necessary authentication would be even better. -- Jamie Hi Jamie, Thanks for your input. That's all good and well. The question now is which direction would the community prefer to go. Would everyone be just happy with virtio-9p passthrough? Would it support multiple OSs (Windows comes to mind here)? Or would we eventually need to patch Samba for passthrough filesystems? 
Regards, Mohammed
Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.
On Fri, Apr 9, 2010 at 5:17 PM, Mohammed Gamal m.gamal...@gmail.com wrote: That's all good and well. The question now is which direction would the community prefer to go. Would everyone be just happy with virtio-9p passthrough? Would it support multiple OSs (Windows comes to mind here)? Or would we eventually need to patch Samba for passthrough filesystems? found this: http://code.google.com/p/ninefs/ it's a BSD-licensed 9p client for windows i have no idea of how stable / complete / trustable it is; but might be some start -- Javier
Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.
On Sat, Apr 10, 2010 at 12:22 AM, Javier Guerra Giraldez jav...@guerrag.com wrote: On Fri, Apr 9, 2010 at 5:17 PM, Mohammed Gamal m.gamal...@gmail.com wrote: That's all good and well. The question now is which direction would the community prefer to go. Would everyone be just happy with virtio-9p passthrough? Would it support multiple OSs (Windows comes to mind here)? Or would we eventually need to patch Samba for passthrough filesystems? found this: http://code.google.com/p/ninefs/ it's a BSD-licensed 9p client for windows i have no idea of how stable / complete / trustable it is; but might be some start -- Javier Hi Javier, Thanks for the link. However, I'm still concerned with interoperability with other operating systems, including non-Windows ones. I am not sure how many operating systems actually support 9p, but I'm almost certain that CIFS would be more widely supported. I am still a newbie as far as all this is concerned, so if anyone has any arguments as to which approach should be taken, I'd be enlightened to hear them. Regards, Mohammed
Re: Setting nx bit in virtual CPU
Richard Simpson wrote: On 08/04/10 09:52, Andre Przywara wrote: Can you try to boot the attached multiboot kernel, which just outputs a brief CPUID dump? $ qemu-kvm -kernel cpuid_mb -vnc :0 (Unfortunately I have no serial console support in there yet, so you either have to write the values down or screenshot it). In the 4th line from the bottom it should print NX (after SYSCALL). OK, that was fun! Resulting screen shots are attached. ...default.png With command line above. ...cpu_host.png With -cpu host option added. ...no_kvm.png With -no-kvm option added. I hope that helps! OK, AFAIK there are several flags missing. I dimly remember there was a bug with masking the CPUID bits in older kernels, so I guess you have to celebrate your uptime for the last time and then give it a reboot with a more up-to-date host kernel. (I also rebooted my desktop after it made the one year mark, and have now gone green, turning it off over night ;-) Maybe you can get around it by rebuilding fixed versions of kvm.ko and kvm_amd.ko; I can provide a fix for you if you wish (please point me to a way to get the actual kernel source you use). The userspace was up-to-date (qemu-kvm 0.12.3)? Regards, Andre. -- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany Tel: +49 351 488-3567-12
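For reference when reading such a dump: NX lives at bit 20 of EDX in CPUID leaf 0x80000001, and SYSCALL at bit 11 of the same register, which is why NX prints right after SYSCALL in the output Andre describes. A quick decode, given a raw EDX value from the dump:

```python
def has_nx(edx_80000001):
    """Decode the NX flag from CPUID leaf 0x80000001, register EDX.

    NX (no-execute) is bit 20; SYSCALL, by comparison, is bit 11.
    """
    NX_BIT = 20
    return bool(edx_80000001 & (1 << NX_BIT))

# An EDX value with only NX set, then one with NX clear:
print(has_nx(1 << 20))  # True
print(has_nx(0))        # False
```

On a live Linux host the same information shows up as the `nx` entry in the `flags` line of /proc/cpuinfo.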