Live migration makes VM unusable
Hi,

I am experiencing a very weird and, for me, hard-to-debug issue with live migration. After a VM has been migrated successfully, it is not usable. It responds to echo requests (for some time). When I try to 'ping' another host, only the first packet appears on the network interface (I am able to receive one echo response). The 'sleep' command hangs forever, and htop shows a black screen.

The problem appears randomly. I have already tried kvm 1.6.0 and 1.5.0. The kernel on host and guest is 3.10.5 (I tried 3.10.7 as well). All VMs are run and live-migrated through libvirt 1.1.2 (I tried previous versions as well). Guest and host OS is Debian jessie/sid.

Here is the command line used by libvirt to run the VM:

qemu-system-x86_64
  -machine accel=kvm:tcg
  -name instance-08a0
  -S
  -machine pc-i440fx-1.5,accel=kvm,usb=off
  -cpu SandyBridge,+erms,+smep,+fsgsbase,+rdrand,+f16c,+osxsave,+pcid,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme
  -m 1024
  -realtime mlock=off
  -smp 1,sockets=1,cores=1,threads=1
  -uuid 573ab654-324c-4a5a-baf5-48d573c43a7d
  -smbios type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=2013.1.3,serial=40181e1b-dad7-dd11-bfb4-10bf487fde32,uuid=573ab654-324c-4a5a-baf5-48d573c43a7d
  -no-user-config
  -nodefaults
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-08a0.monitor,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control
  -rtc base=utc,driftfix=slew
  -no-kvm-pit-reinjection
  -no-shutdown
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
  -drive file=rbd:cinder_volumes/volume-cb489a8b-7af5-4c2d-91ee-9e26de3c23cd:id=cinder_volumes:key=mykey8CgeK34UmGR/oWjLwnjnw==:auth_supported=cephx\;none,if=none,id=drive-virtio-disk0,format=raw,serial=cb489a8b-7af5-4c2d-91ee-9e26de3c23cd,cache=writeback
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
  -drive file=rbd:cinder_volumes/volume-bcc2d707-aedf-4b0d-bfbd-66d18f39d63e:id=cinder_volumes:key=mykey8CgeK34UmGR/oWjLwnjnw==:auth_supported=cephx\;none,if=none,id=drive-virtio-disk1,format=raw,serial=bcc2d707-aedf-4b0d-bfbd-66d18f39d63e,cache=writeback
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1
  -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:e0:36:84,bus=pci.0,addr=0x3
  -chardev file,id=charserial0,path=/var/lib/nova/instances/573ab654-324c-4a5a-baf5-48d573c43a7d/console.log
  -device isa-serial,chardev=charserial0,id=serial0
  -chardev pty,id=charserial1
  -device isa-serial,chardev=charserial1,id=serial1
  -device usb-tablet,id=input0
  -vnc 127.0.0.1:1
  -k en-us
  -vga cirrus
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6

Not using rbd does not help either. There is nothing interesting in the logs, or at least I am not able to collect them properly. Could you please give me some hints on how to provide more useful information?

regards
-- 
Maciej Gałkiewicz
Shelly Cloud Sp. z o. o., Sysadmin
http://shellycloud.com/, mac...@shellycloud.com
KRS: 440358 REGON: 101504426
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V2] Documentation/kvm: Update cpuid documentation for steal time and pv eoi
Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
---
Changes in V2:
 Correction in the description of steal time and added msr info (Michael S Tsirkin)

 Documentation/virtual/kvm/cpuid.txt | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 22ff659..6c4fb20 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -43,6 +43,16 @@ KVM_FEATURE_CLOCKSOURCE2           ||     3 || kvmclock available at msrs
 KVM_FEATURE_ASYNC_PF               ||     4 || async pf can be enabled by
                                    ||       || writing to msr 0x4b564d02
 ------------------------------------------------------------------------------
+KVM_FEATURE_STEAL_TIME             ||     5 || Steal time available at msr
+                                   ||       || 0x4b564d03. The feature is enabled
+                                   ||       || by guest when host has schedstat
+                                   ||       || or task delay accounting support.
+------------------------------------------------------------------------------
+KVM_FEATURE_PV_EOI                 ||     6 || overrides the generic EOI
+                                   ||       || implementation with a
+                                   ||       || paravirtualized version. Available
+                                   ||       || at msr 0x4b564d04.
+------------------------------------------------------------------------------
 KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
-- 
1.7.11.7
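For readers who want to see how a guest might consume the bits documented in this table, here is a minimal sketch. It assumes the guest has already read EAX from the KVM features CPUID leaf (0x40000001, KVM_CPUID_FEATURES). The helper name kvm_feature_present() is hypothetical and only illustrates the bit test; the three bit numbers are the ones listed in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as listed in the cpuid.txt table of the patch. */
#define KVM_FEATURE_STEAL_TIME	5
#define KVM_FEATURE_PV_EOI	6
#define KVM_FEATURE_PV_UNHALT	7

/*
 * Hypothetical helper (not a kernel API): test one feature bit in the EAX
 * value that a guest read from the KVM features CPUID leaf.
 */
static bool kvm_feature_present(uint32_t features_eax, unsigned int bit)
{
	assert(bit < 32);
	return (features_eax >> bit) & 1u;
}
```

A guest would typically check the bit first and only then touch the corresponding MSR (0x4b564d03 for steal time, 0x4b564d04 for PV EOI, per the table).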
Re: Mapping guest memory from another process?
On Tue, Sep 03, 2013 at 07:56:33PM -0400, Cutter 409 wrote:
> I'm working on a tool that needs the ability to map the physical memory of a
> virtual machine into its own address space. With Xen, I can simply call
> xc_map_foreign_pages(). Is there something similar for KVM?
>
> So far, I can only figure out how to do it if I were the process that
> created the VM (then I could mmap() the handle of the virtual machine). Is
> there a way for an outside process to do this?

You can get QEMU to do a shared mapping of a file as guest RAM using
-mem-path and -mem-prealloc, see man qemu.

Stefan
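To illustrate the outside-process half of that suggestion, here is a minimal sketch of mapping a RAM backing file with MAP_SHARED. It assumes the tool can obtain a path to the file that QEMU uses as guest RAM; how to locate that file is not covered here and depends on the QEMU version and setup. The helper name map_ram_file() is hypothetical and not part of any QEMU or KVM API.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: map an existing file read-only and shared, so that
 * writes made by another process through its own MAP_SHARED mapping of the
 * same file become visible here.
 * Returns the mapping and stores its length in *len, or returns NULL.
 */
static const unsigned char *map_ram_file(const char *path, size_t *len)
{
	struct stat st;
	void *p;
	int fd;

	assert(path != NULL && len != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return NULL;

	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return NULL;
	}

	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close() */
	if (p == MAP_FAILED)
		return NULL;

	*len = (size_t)st.st_size;
	return p;
}
```

Offsets into such a mapping are offsets into the RAM backing file, which need not be identical to guest physical addresses; treat any address translation as something to verify for the machine type in use.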
[RFC qom-cpu 15/41] cpu: Move watchpoint fields from CPU_COMMON to CPUState
Signed-off-by: Andreas Färber <afaer...@suse.de>
---
 cpu-exec.c              |  5 +++--
 exec.c                  | 33 -
 gdbstub.c               |  8
 include/exec/cpu-defs.h | 10 --
 include/qom/cpu.h       | 10 ++
 linux-user/main.c       |  5 +++--
 target-i386/cpu.h       |  2 +-
 target-i386/helper.c    |  7 ---
 target-i386/kvm.c       |  8
 target-xtensa/cpu.h     |  2 +-
 target-xtensa/helper.c  |  8 +---
 11 files changed, 55 insertions(+), 43 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 0081eaf..209380d 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -183,10 +183,11 @@ void cpu_set_debug_excp_handler(CPUDebugExcpHandler *handler)
 
 static void cpu_handle_debug_exception(CPUArchState *env)
 {
+    CPUState *cpu = ENV_GET_CPU(env);
     CPUWatchpoint *wp;
 
-    if (!env->watchpoint_hit) {
-        QTAILQ_FOREACH(wp, &env->watchpoints, entry) {
+    if (!cpu->watchpoint_hit) {
+        QTAILQ_FOREACH(wp, &cpu->watchpoints, entry) {
             wp->flags &= ~BP_WATCHPOINT_HIT;
         }
     }
diff --git a/exec.c b/exec.c
index 93958c3..5b70bf8 100644
--- a/exec.c
+++ b/exec.c
@@ -379,7 +379,7 @@ void cpu_exec_init(CPUArchState *env)
     cpu->cpu_index = cpu_index;
     cpu->numa_node = 0;
     QTAILQ_INIT(&env->breakpoints);
-    QTAILQ_INIT(&env->watchpoints);
+    QTAILQ_INIT(&cpu->watchpoints);
 #ifndef CONFIG_USER_ONLY
     cpu->thread_id = qemu_get_thread_id();
 #endif
@@ -432,6 +432,7 @@ int cpu_watchpoint_insert(CPUArchState *env, target_ulong addr, target_ulong len
 int cpu_watchpoint_insert(CPUArchState *env, target_ulong addr, target_ulong len,
                           int flags, CPUWatchpoint **watchpoint)
 {
+    CPUState *cpu = ENV_GET_CPU(env);
     target_ulong len_mask = ~(len - 1);
     CPUWatchpoint *wp;
 
@@ -449,10 +450,11 @@ int cpu_watchpoint_insert(CPUArchState *env, target_ulong addr, target_ulong len
     wp->flags = flags;
 
     /* keep all GDB-injected watchpoints in front */
-    if (flags & BP_GDB)
-        QTAILQ_INSERT_HEAD(&env->watchpoints, wp, entry);
-    else
-        QTAILQ_INSERT_TAIL(&env->watchpoints, wp, entry);
+    if (flags & BP_GDB) {
+        QTAILQ_INSERT_HEAD(&cpu->watchpoints, wp, entry);
+    } else {
+        QTAILQ_INSERT_TAIL(&cpu->watchpoints, wp, entry);
+    }
 
     tlb_flush_page(env, addr);
 
@@ -465,10 +467,11 @@ int cpu_watchpoint_insert(CPUArchState *env, target_ulong addr, target_ulong len
 int cpu_watchpoint_remove(CPUArchState *env, target_ulong addr, target_ulong len,
                           int flags)
 {
+    CPUState *cpu = ENV_GET_CPU(env);
     target_ulong len_mask = ~(len - 1);
     CPUWatchpoint *wp;
 
-    QTAILQ_FOREACH(wp, &env->watchpoints, entry) {
+    QTAILQ_FOREACH(wp, &cpu->watchpoints, entry) {
         if (addr == wp->vaddr && len_mask == wp->len_mask
                 && flags == (wp->flags & ~BP_WATCHPOINT_HIT)) {
             cpu_watchpoint_remove_by_ref(env, wp);
@@ -481,7 +484,9 @@ int cpu_watchpoint_remove(CPUArchState *env, target_ulong addr, target_ulong len
 /* Remove a specific watchpoint by reference. */
 void cpu_watchpoint_remove_by_ref(CPUArchState *env, CPUWatchpoint *watchpoint)
 {
-    QTAILQ_REMOVE(&env->watchpoints, watchpoint, entry);
+    CPUState *cpu = ENV_GET_CPU(env);
+
+    QTAILQ_REMOVE(&cpu->watchpoints, watchpoint, entry);
 
     tlb_flush_page(env, watchpoint->vaddr);
 
@@ -491,9 +496,10 @@ void cpu_watchpoint_remove_by_ref(CPUArchState *env, CPUWatchpoint *watchpoint)
 /* Remove all matching watchpoints. */
 void cpu_watchpoint_remove_all(CPUArchState *env, int mask)
 {
+    CPUState *cpu = ENV_GET_CPU(env);
     CPUWatchpoint *wp, *next;
 
-    QTAILQ_FOREACH_SAFE(wp, &env->watchpoints, entry, next) {
+    QTAILQ_FOREACH_SAFE(wp, &cpu->watchpoints, entry, next) {
         if (wp->flags & mask)
             cpu_watchpoint_remove_by_ref(env, wp);
     }
@@ -677,6 +683,7 @@ hwaddr memory_region_section_get_iotlb(CPUArchState *env,
                                        int prot,
                                        target_ulong *address)
 {
+    CPUState *cpu = ENV_GET_CPU(env);
     hwaddr iotlb;
     CPUWatchpoint *wp;
 
@@ -696,7 +703,7 @@ hwaddr memory_region_section_get_iotlb(CPUArchState *env,
 
     /* Make accesses to pages with watchpoints go via the
        watchpoint trap routines. */
-    QTAILQ_FOREACH(wp, &env->watchpoints, entry) {
+    QTAILQ_FOREACH(wp, &cpu->watchpoints, entry) {
         if (vaddr == (wp->vaddr & TARGET_PAGE_MASK)) {
             /* Avoid trapping reads of pages with a write breakpoint. */
             if ((prot & PAGE_WRITE) || (wp->flags & BP_MEM_READ)) {
@@ -1454,7 +1461,7 @@ static void check_watchpoint(int offset, int len_mask, int flags)
     CPUWatchpoint *wp;
     int cpu_flags;
 
-    if (env->watchpoint_hit) {
+    if (cpu->watchpoint_hit) {
         /* We re-entered the check after replacing the TB. Now raise
          * the debug interrupt so that is will trigger after the
          * current
Re: Live migration makes VM unusable
Hello,

On Wednesday 04 September 2013 09:43:55 Maciej Gałkiewicz wrote:
> I am experiencing a very weird and, for me, hard-to-debug issue with live
> migration. After a VM has been migrated successfully, it is not usable. It
> responds to echo requests (for some time). When I try to 'ping' another
> host, only the first packet appears on the network interface (I am able to
> receive one echo response). The 'sleep' command hangs forever, and htop
> shows a black screen.

In the past I had a similar problem with a buggy KVM/Xen, when the CPU time stamp counters (TSC) were not synchronized between different hosts (this is not expected): for the VM, the TSC jumped forward/backward and the kernel decided to wait in a busy-loop until the TSC was right again. Migrating the VM back often solved the problem temporarily, as the problem only occurred when migrating from A to B, but not from B to A. You might check that as well.

> The problem appears randomly.

I never noticed the problem when I started my servers at the same time. I only noticed it when I booted the servers (several) minutes apart, so it looked random to me at first too.

> I have already tried kvm 1.6.0 and 1.5.0. The kernel on host and guest is
> 3.10.5 (I tried 3.10.7 as well). All VMs are run and live-migrated through
> libvirt 1.1.2 (I tried previous versions as well). Guest and host OS is
> Debian jessie/sid.

That looks new enough, so this might be a completely different problem. So just for reference, here's the link to our German Bugzilla entry, where you can find my past findings:
https://forge.univention.org/bugzilla/show_bug.cgi?id=23258#c6

Sincerely
Philipp
-- 
Philipp Hahn
Open Source Software Engineer    h...@univention.de
Univention GmbH                  be open.       fon: +49 421 22 232- 0
Mary-Somerville-Str.1            D-28359 Bremen fax: +49 421 22 232-99
http://www.univention.de/
Director: Peter H. Ganten  HRB 20755 Amtsgericht Bremen  UID: DE 220 051 310
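One related thing that may be worth checking, in addition to TSC synchronization between the hosts, is which clocksource the guest kernel is actually using, since that decides whether the guest reads the TSC directly. The sketch below is only an illustration: read_first_line() is a hypothetical helper that reads a small text attribute, and it makes no claim about what the affected guest reports.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read the first line of a small text file (such as a
 * sysfs attribute) into buf, without the trailing newline.
 * Returns 0 on success, -1 on failure.
 */
static int read_first_line(const char *path, char *buf, size_t size)
{
	FILE *f;

	assert(path != NULL && buf != NULL && size > 0);

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(buf, (int)size, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

Inside a Linux guest the attribute to pass would be /sys/devices/system/clocksource/clocksource0/current_clocksource; typical values are "kvm-clock" or "tsc". Comparing the value before and after a migration is a cheap first data point.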
Re: [PATCH] kvm tools: virtio-mmio: init_ioeventfd should use MMIO for ioeventfd__add_event()
On 8/30/13 4:58 PM, Ying-Shiuan Pan wrote:
> From: Ying-Shiuan Pan <yingshiuan@gmail.com>
>
> This patch fixes a bug where virtio_mmio_init_ioeventfd() passed the wrong
> value when it invoked ioeventfd__add_event(). A true value for the 2nd
> parameter indicates that the eventfd uses the PIO bus, which is used by
> virtio-pci; for virtio-mmio, however, the value should be false.
>
> Signed-off-by: Ying-Shiuan Pan <ys...@itri.org.tw>

Will, Marc? It would probably be good to change the two boolean
arguments into one flags argument to avoid future bugs.

> ---
>  tools/kvm/virtio/mmio.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/kvm/virtio/mmio.c b/tools/kvm/virtio/mmio.c
> index afa2692..3838774 100644
> --- a/tools/kvm/virtio/mmio.c
> +++ b/tools/kvm/virtio/mmio.c
> @@ -55,10 +55,10 @@ static int virtio_mmio_init_ioeventfd(struct kvm *kvm,
>  		 * Vhost will poll the eventfd in host kernel side,
>  		 * no need to poll in userspace.
>  		 */
> -		err = ioeventfd__add_event(ioevent, true, false);
> +		err = ioeventfd__add_event(ioevent, false, false);
>  	else
>  		/* Need to poll in userspace. */
> -		err = ioeventfd__add_event(ioevent, true, true);
> +		err = ioeventfd__add_event(ioevent, false, true);
>  	if (err)
>  		return err;
> 
Re: [PATCH 0/3] kvm tools: remove periodic tick
On 9/3/13 9:10 PM, Jonathan Austin wrote:
> This patch series removes kvm tool's periodic tick function in favour of a
> thread that blocks waiting for input. The paths used for handling input are
> the same as when using a periodic tick, but they're not called unless there
> is actually input to be processed.
>
> On extremely slow platforms (eg FPGAs) the overhead involved in handling the
> timer tick means it is impossible to make progress at all inside the VM!
> This patch addresses this problem.
>
> In doing this there are a number of small tidyups/cleanups that made sense,
> too:
> - Use a #define for maximum number of term devices
> - Refactor the method by which the virtio console handles input in order
>   not to
>     - handle input too early
>     - handle input multiple times if the worker thread didn't immediately
>       start work.
> - Rename the periodic_poll function to reflect the functional change
>
> Jonathan Austin (3):
>   kvm tools: use #define for maximum number of terminal devices
>   kvm tools: remove periodic tick in favour of a polling thread
>   kvm tools: stop virtio console doing unnecessary input handling
>
>  tools/kvm/arm/kvm.c         |  2 +-
>  tools/kvm/builtin-run.c     | 13 ---
>  tools/kvm/include/kvm/kvm.h |  2 +-
>  tools/kvm/kvm.c             | 50 ---
>  tools/kvm/powerpc/kvm.c     |  2 +-
>  tools/kvm/term.c            | 38 +---
>  tools/kvm/virtio/console.c  | 23 +---
>  tools/kvm/x86/kvm.c         |  2 +-
>  8 files changed, 59 insertions(+), 73 deletions(-)

Seems reasonable to me. Marc, Will?

Pekka
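The idea described in the cover letter, a thread that sleeps until input arrives instead of a timer that fires whether or not there is input, can be sketched in plain POSIX C as below. This is not the kvmtool implementation: wait_readable() and input_thread() are hypothetical names, and the byte counter merely stands in for whatever the tool does with console input.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Sleep in poll() until fd is readable (or has hit EOF/hangup). No timer
 * fires while nothing is happening. Returns 0 when read() will not block,
 * -1 on error.
 */
static int wait_readable(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int r;

	do {
		r = poll(&pfd, 1, -1);	/* -1: block with no timeout */
	} while (r < 0 && errno == EINTR);

	return r < 0 ? -1 : 0;
}

/*
 * Thread body with the pthread start-routine signature. arg points to the
 * file descriptor to watch. Returns the number of bytes consumed, cast to a
 * pointer, once the descriptor reports end of file.
 */
static void *input_thread(void *arg)
{
	int fd = *(int *)arg;
	size_t handled = 0;
	char buf[64];

	assert(fd >= 0);

	while (wait_readable(fd) == 0) {
		ssize_t n = read(fd, buf, sizeof(buf));

		if (n <= 0)
			break;		/* EOF or error: stop the thread */
		handled += (size_t)n;	/* a real tool would feed the console here */
	}

	return (void *)(uintptr_t)handled;
}
```

The function can be handed to pthread_create() with a pointer to the descriptor (for example STDIN_FILENO stored in an int). The point of the design is that the host does no work at all while the user is not typing, which is what matters on very slow platforms.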
Re: [PATCH] drivers/vhost/scsi.c: avoid a 10-order allocation
On Wed, Sep 04, 2013 at 12:02:01PM +0300, Michael S. Tsirkin wrote:
> On Sun, Aug 18, 2013 at 12:18:38PM +0300, Michael S. Tsirkin wrote:
> > On Sun, Aug 18, 2013 at 11:48:56AM +0300, Dan Aloni wrote:
> > > On 3.10.7 and x86_64, as a result of sizeof(struct vhost_scsi) being
> > > 2152960 bytes the allocation failed once on my development machine.
> > >
> > > Saw it would be prudent to split the bulk of it, which is the vqs array
> > > into separately allocated parts. sizeof(struct vhost_virtqueue) is
> > > currently 16816 bytes.
> > >
> > > Signed-off-by: Dan Aloni <alo...@stratoscale.com>
> >
> > This extra indirection is likely to have measurable cost though.
> >
> > net core saw a similar problem, it was fixed in
> > patch "net: allow large number of tx queues"
> >
> > So let's do it in a similar way: try to allocate with
> > GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT
> > and if that fails, do vmalloc.
> >
> > To free, we can do
> > if (is_vmalloc_addr())
> >         vfree();
> > else
> >         kfree();
>
> Hi Dan, were you going to make this change?
> Or prefer me to do it?

Hey Michael,

I'd prefer that you go ahead and do it as you suggested. I got distracted with other matters in the meantime.

-- 
Dan Aloni
Re: [PATCH 2/3] kvm tools: remove periodic tick in favour of a polling thread
Hi Jonny,

Just a couple of nits, see below:

On 03/09/13 19:10, Jonathan Austin wrote:
> Currently the only use of the periodic timer tick in kvmtool is to
> handle reading from stdin. Though functional, this periodic tick can be
> problematic on slow (eg FPGA) platforms and can cause low interactivity or
> even stop the execution from progressing at all.
>
> This patch removes the periodic tick in favour of a dedicated thread blocked
> waiting for input from the console. In order to reflect the new behaviour,
> the old 'kvm__arch_periodic_tick' function is renamed to 'kvm__arm_read_term'.

s/kvm__arm_read_term/kvm__arch_read_term/

> Signed-off-by: Jonathan Austin <jonathan.aus...@arm.com>
> ---
>  tools/kvm/arm/kvm.c         |  2 +-
>  tools/kvm/builtin-run.c     | 13 ---
>  tools/kvm/include/kvm/kvm.h |  2 +-
>  tools/kvm/kvm.c             | 50 ---
>  tools/kvm/powerpc/kvm.c     |  2 +-
>  tools/kvm/term.c            | 31 +++
>  tools/kvm/x86/kvm.c         |  2 +-
>  7 files changed, 35 insertions(+), 67 deletions(-)
>
> diff --git a/tools/kvm/arm/kvm.c b/tools/kvm/arm/kvm.c
> index 27e6cf4..008b7fe 100644
> --- a/tools/kvm/arm/kvm.c
> +++ b/tools/kvm/arm/kvm.c
> @@ -46,7 +46,7 @@ void kvm__arch_delete_ram(struct kvm *kvm)
>  	munmap(kvm->arch.ram_alloc_start, kvm->arch.ram_alloc_size);
>  }
> 
> -void kvm__arch_periodic_poll(struct kvm *kvm)
> +void kvm__arch_read_term(struct kvm *kvm)
>  {
>  	if (term_readable(0)) {
>  		serial8250__update_consoles(kvm);
> diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
> index 4d7fbf9d..da95d71 100644
> --- a/tools/kvm/builtin-run.c
> +++ b/tools/kvm/builtin-run.c
> @@ -165,13 +165,6 @@ void kvm_run_set_wrapper_sandbox(void)
>  	OPT_END()							\
>  	};
> 
> -static void handle_sigalrm(int sig, siginfo_t *si, void *uc)
> -{
> -	struct kvm *kvm = si->si_value.sival_ptr;
> -
> -	kvm__arch_periodic_poll(kvm);
> -}
> -
>  static void *kvm_cpu_thread(void *arg)
>  {
>  	char name[16];
> @@ -487,17 +480,11 @@ static struct kvm *kvm_cmd_run_init(int argc, const char **argv)
>  {
>  	static char real_cmdline[2048], default_name[20];
>  	unsigned int nr_online_cpus;
> -	struct sigaction sa;
>  	struct kvm *kvm = kvm__new();
> 
>  	if (IS_ERR(kvm))
>  		return kvm;
> 
> -	sa.sa_flags = SA_SIGINFO;
> -	sa.sa_sigaction = handle_sigalrm;
> -	sigemptyset(&sa.sa_mask);
> -	sigaction(SIGALRM, &sa, NULL);
> -
>  	nr_online_cpus = sysconf(_SC_NPROCESSORS_ONLN);
>  	kvm->cfg.custom_rootfs_name = "default";
> 
> diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
> index ad53ca7..d05b936 100644
> --- a/tools/kvm/include/kvm/kvm.h
> +++ b/tools/kvm/include/kvm/kvm.h
> @@ -103,7 +103,7 @@ void kvm__arch_delete_ram(struct kvm *kvm);
>  int kvm__arch_setup_firmware(struct kvm *kvm);
>  int kvm__arch_free_firmware(struct kvm *kvm);
>  bool kvm__arch_cpu_supports_vm(void);
> -void kvm__arch_periodic_poll(struct kvm *kvm);
> +void kvm__arch_read_term(struct kvm *kvm);
> 
>  void *guest_flat_to_host(struct kvm *kvm, u64 offset);
>  u64 host_to_guest_flat(struct kvm *kvm, void *ptr);
> diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
> index cfd30dd..d7d2e84 100644
> --- a/tools/kvm/kvm.c
> +++ b/tools/kvm/kvm.c
> @@ -393,56 +393,6 @@ found_kernel:
>  	return ret;
>  }
> 
> -#define TIMER_INTERVAL_NS 1000000	/* 1 msec */
> -
> -/*
> - * This function sets up a timer that's used to inject interrupts from the
> - * userspace hypervisor into the guest at periodical intervals. Please note
> - * that clock interrupt, for example, is not handled here.
> - */
> -int kvm_timer__init(struct kvm *kvm)
> -{
> -	struct itimerspec its;
> -	struct sigevent sev;
> -	int r;
> -
> -	memset(&sev, 0, sizeof(struct sigevent));
> -	sev.sigev_value.sival_int	= 0;
> -	sev.sigev_notify		= SIGEV_THREAD_ID;
> -	sev.sigev_signo			= SIGALRM;
> -	sev.sigev_value.sival_ptr	= kvm;
> -	sev._sigev_un._tid		= syscall(__NR_gettid);
> -
> -	r = timer_create(CLOCK_REALTIME, &sev, &kvm->timerid);
> -	if (r < 0)
> -		return r;
> -
> -	its.it_value.tv_sec	= TIMER_INTERVAL_NS / 1000000000;
> -	its.it_value.tv_nsec	= TIMER_INTERVAL_NS % 1000000000;
> -	its.it_interval.tv_sec	= its.it_value.tv_sec;
> -	its.it_interval.tv_nsec	= its.it_value.tv_nsec;
> -
> -	r = timer_settime(kvm->timerid, 0, &its, NULL);
> -	if (r < 0) {
> -		timer_delete(kvm->timerid);
> -		return r;
> -	}
> -
> -	return 0;
> -}
> -firmware_init(kvm_timer__init);
> -
> -int kvm_timer__exit(struct kvm *kvm)
> -{
> -	if (kvm->timerid)
> -		if (timer_delete(kvm->timerid) < 0)
> -			die("timer_delete()");
> -
> -	kvm->timerid = 0;
> -
> -	return 0;
> -}
> -firmware_exit(kvm_timer__exit);
> 
>  void kvm__dump_mem(struct
Re: [PATCH 0/3] kvm tools: remove periodic tick
On 04/09/13 10:23, Pekka Enberg wrote:

Hi Pekka,

> On 9/3/13 9:10 PM, Jonathan Austin wrote:
> > This patch series removes kvm tool's periodic tick function in favour of a
> > thread that blocks waiting for input. The paths used for handling input are
> > the same as when using a periodic tick, but they're not called unless there
> > is actually input to be processed.
> >
> > On extremely slow platforms (eg FPGAs) the overhead involved in handling the
> > timer tick means it is impossible to make progress at all inside the VM!
> > This patch addresses this problem.
> >
> > In doing this there are a number of small tidyups/cleanups that made sense,
> > too:
> > - Use a #define for maximum number of term devices
> > - Refactor the method by which the virtio console handles input in order
> >   not to
> >     - handle input too early
> >     - handle input multiple times if the worker thread didn't immediately
> >       start work.
> > - Rename the periodic_poll function to reflect the functional change
> >
> > Jonathan Austin (3):
> >   kvm tools: use #define for maximum number of terminal devices
> >   kvm tools: remove periodic tick in favour of a polling thread
> >   kvm tools: stop virtio console doing unnecessary input handling
> >
> >  tools/kvm/arm/kvm.c         |  2 +-
> >  tools/kvm/builtin-run.c     | 13 ---
> >  tools/kvm/include/kvm/kvm.h |  2 +-
> >  tools/kvm/kvm.c             | 50 ---
> >  tools/kvm/powerpc/kvm.c     |  2 +-
> >  tools/kvm/term.c            | 38 +---
> >  tools/kvm/virtio/console.c  | 23 +---
> >  tools/kvm/x86/kvm.c         |  2 +-
> >  8 files changed, 59 insertions(+), 73 deletions(-)
>
> Seems reasonable to me. Marc, Will?

With the nits I mentioned earlier addressed, I'm happy to give my
Acked-by: Marc Zyngier <marc.zyng...@arm.com>.

I must also mention that I've been using an earlier version of this
patch series, and that my test rig has been much happier since... ;-)

Cheers,

	M.
-- 
Jazz is not dead. It just smells funny...
Re: [PATCH] kvm tools: virtio-mmio: init_ioeventfd should use MMIO for ioeventfd__add_event()
On Wed, Sep 04, 2013 at 10:21:26AM +0100, Pekka Enberg wrote:
> On 8/30/13 4:58 PM, Ying-Shiuan Pan wrote:
> > From: Ying-Shiuan Pan <yingshiuan@gmail.com>
> >
> > This patch fixes a bug where virtio_mmio_init_ioeventfd() passed the wrong
> > value when it invoked ioeventfd__add_event(). A true value for the 2nd
> > parameter indicates that the eventfd uses the PIO bus, which is used by
> > virtio-pci; for virtio-mmio, however, the value should be false.
> >
> > Signed-off-by: Ying-Shiuan Pan <ys...@itri.org.tw>
>
> Will, Marc? It would probably be good to change the two boolean
> arguments into one flags argument to avoid future bugs.

Like this? It gets a bit confusing, because there is a KVM_IOEVENTFD_FLAG_*
namespace as part of the kernel KVM API, but which doesn't have the flags
we need (e.g. userspace polling).

Will

--->8

diff --git a/tools/kvm/include/kvm/ioeventfd.h b/tools/kvm/include/kvm/ioeventfd.h
index d71fa40..bb1f78d 100644
--- a/tools/kvm/include/kvm/ioeventfd.h
+++ b/tools/kvm/include/kvm/ioeventfd.h
@@ -20,9 +20,12 @@ struct ioevent {
 	struct list_head	list;
 };
 
+#define IOEVENTFD_FLAG_PIO		(1 << 0)
+#define IOEVENTFD_FLAG_USER_POLL	(1 << 1)
+
 int ioeventfd__init(struct kvm *kvm);
 int ioeventfd__exit(struct kvm *kvm);
-int ioeventfd__add_event(struct ioevent *ioevent, bool is_pio, bool poll_in_userspace);
+int ioeventfd__add_event(struct ioevent *ioevent, int flags);
 int ioeventfd__del_event(u64 addr, u64 datamatch);
 
 #endif
diff --git a/tools/kvm/ioeventfd.c b/tools/kvm/ioeventfd.c
index ff665d4..bce6861 100644
--- a/tools/kvm/ioeventfd.c
+++ b/tools/kvm/ioeventfd.c
@@ -120,7 +120,7 @@ int ioeventfd__exit(struct kvm *kvm)
 }
 base_exit(ioeventfd__exit);
 
-int ioeventfd__add_event(struct ioevent *ioevent, bool is_pio, bool poll_in_userspace)
+int ioeventfd__add_event(struct ioevent *ioevent, int flags)
 {
 	struct kvm_ioeventfd kvm_ioevent;
 	struct epoll_event epoll_event;
@@ -145,7 +145,7 @@ int ioeventfd__add_event(struct ioevent *ioevent, bool is_pio, bool poll_in_user
 		.flags			= KVM_IOEVENTFD_FLAG_DATAMATCH,
 	};
 
-	if (is_pio)
+	if (flags & IOEVENTFD_FLAG_PIO)
 		kvm_ioevent.flags |= KVM_IOEVENTFD_FLAG_PIO;
 
 	r = ioctl(ioevent->fn_kvm->vm_fd, KVM_IOEVENTFD, &kvm_ioevent);
@@ -154,7 +154,7 @@ int ioeventfd__add_event(struct ioevent *ioevent, bool is_pio, bool poll_in_user
 		goto cleanup;
 	}
 
-	if (!poll_in_userspace)
+	if (!(flags & IOEVENTFD_FLAG_USER_POLL))
 		return 0;
 
 	epoll_event = (struct epoll_event) {
diff --git a/tools/kvm/virtio/mmio.c b/tools/kvm/virtio/mmio.c
index afa2692..afae6a7 100644
--- a/tools/kvm/virtio/mmio.c
+++ b/tools/kvm/virtio/mmio.c
@@ -55,10 +55,10 @@ static int virtio_mmio_init_ioeventfd(struct kvm *kvm,
 		 * Vhost will poll the eventfd in host kernel side,
 		 * no need to poll in userspace.
 		 */
-		err = ioeventfd__add_event(ioevent, true, false);
+		err = ioeventfd__add_event(ioevent, 0);
 	else
 		/* Need to poll in userspace. */
-		err = ioeventfd__add_event(ioevent, true, true);
+		err = ioeventfd__add_event(ioevent, IOEVENTFD_FLAG_USER_POLL);
 	if (err)
 		return err;
 
diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c
index fec8ce0..bb6e7c4 100644
--- a/tools/kvm/virtio/pci.c
+++ b/tools/kvm/virtio/pci.c
@@ -46,10 +46,11 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde
 		 * Vhost will poll the eventfd in host kernel side,
 		 * no need to poll in userspace.
 		 */
-		r = ioeventfd__add_event(ioevent, true, false);
+		r = ioeventfd__add_event(ioevent, IOEVENTFD_FLAG_PIO);
 	else
 		/* Need to poll in userspace. */
-		r = ioeventfd__add_event(ioevent, true, true);
+		r = ioeventfd__add_event(ioevent, IOEVENTFD_FLAG_PIO |
+						  IOEVENTFD_FLAG_USER_POLL);
 	if (r)
 		return r;
 
Re: [PATCH] kvm tools: virtio-mmio: init_ioeventfd should use MMIO for ioeventfd__add_event()
On Wed, Sep 4, 2013 at 1:07 PM, Will Deacon <will.dea...@arm.com> wrote:
> Like this? It gets a bit confusing, because there is a KVM_IOEVENTFD_FLAG_*
> namespace as part of the kernel KVM API, but which doesn't have the flags
> we need (e.g. userspace polling).

Looks good. I applied the fix so can you please redo this on top of tip?
Re: [GIT PULL] KVM changes for 3.12
On Tue, Sep 03, 2013 at 03:10:46PM +0300, Gleb Natapov wrote:
[...]
> Aneesh Kumar K.V (5):
>       mm/cma: Move dma contiguous changes into a seperate config

Hi Gleb,

This commit is going to cause runtime regressions on various ARM
platforms because it renames a symbol but fails to update all default
configurations that select the symbol. A quick grep shows that three ARM
platforms are affected:

	$ git grep CONFIG_CMA=y
	arch/arm/configs/keystone_defconfig:CONFIG_CMA=y
	arch/arm/configs/omap2plus_defconfig:CONFIG_CMA=y
	arch/arm/configs/tegra_defconfig:CONFIG_CMA=y

I've been digging around a bit and it seems like the original patch from
Aneesh had the defconfig changes but they were dropped because they "...
require separate handling to avoid pointless merge conflicts."[0]

While I can't speak for Keystone or OMAP, at least on Tegra this causes
issues because we use CMA for framebuffer allocation. Since we only have
CMA selected but not the new DMA_CMA, large DMA allocations will fail.

Can we have the defconfig changes added back to this patch, please? I
suspect that Linus can handle any resulting merge conflicts.

Thierry

[0]: http://permalink.gmane.org/gmane.linux.kernel.mm/102707
Re: [PATCH] kvm tools: virtio-mmio: init_ioeventfd should use MMIO for ioeventfd__add_event()
On Wed, Sep 04, 2013 at 11:13:55AM +0100, Pekka Enberg wrote:
> On Wed, Sep 4, 2013 at 1:07 PM, Will Deacon <will.dea...@arm.com> wrote:
> > Like this? It gets a bit confusing, because there is a KVM_IOEVENTFD_FLAG_*
> > namespace as part of the kernel KVM API, but which doesn't have the flags
> > we need (e.g. userspace polling).
>
> Looks good. I applied the fix so can you please redo this on top of tip?

Sure, I'll add a commit message too and send as a new thread.

Will
[PATCH] kvm tools: ioeventfd: replace bool parameters to __add_event with flags
A recent fix to virtio MMIO (72a7541ce305 [kvm tools: virtio-mmio:
init_ioeventfd should use MMIO for ioeventfd__add_event()]) highlighted
the confusing parameters expected by ioeventfd__add_event.

As per Pekka's suggestion, replace the bool parameters to this function
with a single `flags' argument instead.

Cc: Ying-Shiuan Pan <yingshiuan@gmail.com>
Signed-off-by: Will Deacon <will.dea...@arm.com>
---
 tools/kvm/include/kvm/ioeventfd.h | 5 ++++-
 tools/kvm/ioeventfd.c             | 6 +++---
 tools/kvm/virtio/mmio.c           | 4 ++--
 tools/kvm/virtio/pci.c            | 5 +++--
 4 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/tools/kvm/include/kvm/ioeventfd.h b/tools/kvm/include/kvm/ioeventfd.h
index d71fa40..bb1f78d 100644
--- a/tools/kvm/include/kvm/ioeventfd.h
+++ b/tools/kvm/include/kvm/ioeventfd.h
@@ -20,9 +20,12 @@ struct ioevent {
 	struct list_head	list;
 };
 
+#define IOEVENTFD_FLAG_PIO		(1 << 0)
+#define IOEVENTFD_FLAG_USER_POLL	(1 << 1)
+
 int ioeventfd__init(struct kvm *kvm);
 int ioeventfd__exit(struct kvm *kvm);
-int ioeventfd__add_event(struct ioevent *ioevent, bool is_pio, bool poll_in_userspace);
+int ioeventfd__add_event(struct ioevent *ioevent, int flags);
 int ioeventfd__del_event(u64 addr, u64 datamatch);
 
 #endif
diff --git a/tools/kvm/ioeventfd.c b/tools/kvm/ioeventfd.c
index ff665d4..bce6861 100644
--- a/tools/kvm/ioeventfd.c
+++ b/tools/kvm/ioeventfd.c
@@ -120,7 +120,7 @@ int ioeventfd__exit(struct kvm *kvm)
 }
 base_exit(ioeventfd__exit);
 
-int ioeventfd__add_event(struct ioevent *ioevent, bool is_pio, bool poll_in_userspace)
+int ioeventfd__add_event(struct ioevent *ioevent, int flags)
 {
 	struct kvm_ioeventfd kvm_ioevent;
 	struct epoll_event epoll_event;
@@ -145,7 +145,7 @@ int ioeventfd__add_event(struct ioevent *ioevent, bool is_pio, bool poll_in_user
 		.flags			= KVM_IOEVENTFD_FLAG_DATAMATCH,
 	};
 
-	if (is_pio)
+	if (flags & IOEVENTFD_FLAG_PIO)
 		kvm_ioevent.flags |= KVM_IOEVENTFD_FLAG_PIO;
 
 	r = ioctl(ioevent->fn_kvm->vm_fd, KVM_IOEVENTFD, &kvm_ioevent);
@@ -154,7 +154,7 @@ int ioeventfd__add_event(struct ioevent *ioevent, bool is_pio, bool poll_in_user
 		goto cleanup;
 	}
 
-	if (!poll_in_userspace)
+	if (!(flags & IOEVENTFD_FLAG_USER_POLL))
 		return 0;
 
 	epoll_event = (struct epoll_event) {
diff --git a/tools/kvm/virtio/mmio.c b/tools/kvm/virtio/mmio.c
index 3838774..afae6a7 100644
--- a/tools/kvm/virtio/mmio.c
+++ b/tools/kvm/virtio/mmio.c
@@ -55,10 +55,10 @@ static int virtio_mmio_init_ioeventfd(struct kvm *kvm,
 		 * Vhost will poll the eventfd in host kernel side,
 		 * no need to poll in userspace.
 		 */
-		err = ioeventfd__add_event(ioevent, false, false);
+		err = ioeventfd__add_event(ioevent, 0);
 	else
 		/* Need to poll in userspace. */
-		err = ioeventfd__add_event(ioevent, false, true);
+		err = ioeventfd__add_event(ioevent, IOEVENTFD_FLAG_USER_POLL);
 	if (err)
 		return err;
 
diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c
index fec8ce0..bb6e7c4 100644
--- a/tools/kvm/virtio/pci.c
+++ b/tools/kvm/virtio/pci.c
@@ -46,10 +46,11 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde
 		 * Vhost will poll the eventfd in host kernel side,
 		 * no need to poll in userspace.
 		 */
-		r = ioeventfd__add_event(ioevent, true, false);
+		r = ioeventfd__add_event(ioevent, IOEVENTFD_FLAG_PIO);
 	else
 		/* Need to poll in userspace. */
-		r = ioeventfd__add_event(ioevent, true, true);
+		r = ioeventfd__add_event(ioevent, IOEVENTFD_FLAG_PIO |
+						  IOEVENTFD_FLAG_USER_POLL);
 	if (r)
 		return r;
 
-- 
1.8.2.2
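As a standalone illustration of why a single flags word is harder to misuse than two positional booleans, here is a small sketch. The two IOEVENTFD_FLAG_* values match the ones the patch defines; struct ioevent_plan and decode_ioeventfd_flags() are hypothetical and exist only to make the decoding visible, they are not part of kvmtool.

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit values as the definitions added by the patch. */
#define IOEVENTFD_FLAG_PIO		(1 << 0)
#define IOEVENTFD_FLAG_USER_POLL	(1 << 1)

/* Hypothetical stand-in for the two decisions ioeventfd__add_event() makes. */
struct ioevent_plan {
	bool use_pio_bus;	/* register on the PIO bus rather than MMIO */
	bool poll_in_userspace;	/* add the eventfd to the userspace epoll set */
};

static struct ioevent_plan decode_ioeventfd_flags(int flags)
{
	struct ioevent_plan plan;

	/* Reject bits that neither flag defines. */
	assert((flags & ~(IOEVENTFD_FLAG_PIO | IOEVENTFD_FLAG_USER_POLL)) == 0);

	plan.use_pio_bus = (flags & IOEVENTFD_FLAG_PIO) != 0;
	plan.poll_in_userspace = (flags & IOEVENTFD_FLAG_USER_POLL) != 0;
	return plan;
}
```

With named flags, the virtio-mmio call sites read as 0 or IOEVENTFD_FLAG_USER_POLL, so the PIO bit that caused the original bug can no longer be set by passing true in the wrong position.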
Re: [GIT PULL] KVM changes for 3.12
Copying Marek, Aneesh and Alex since this came through the PPC kvm tree.

On Wed, Sep 04, 2013 at 12:18:28PM +0200, Thierry Reding wrote:
> On Tue, Sep 03, 2013 at 03:10:46PM +0300, Gleb Natapov wrote:
> [...]
> > Aneesh Kumar K.V (5):
> >       mm/cma: Move dma contiguous changes into a seperate config
>
> Hi Gleb,
>
> This commit is going to cause runtime regressions on various ARM
> platforms because it renames a symbol but fails to update all default
> configurations that select the symbol. A quick grep shows that three
> ARM platforms are affected:
>
> 	$ git grep CONFIG_CMA=y
> 	arch/arm/configs/keystone_defconfig:CONFIG_CMA=y
> 	arch/arm/configs/omap2plus_defconfig:CONFIG_CMA=y
> 	arch/arm/configs/tegra_defconfig:CONFIG_CMA=y
>
> I've been digging around a bit and it seems like the original patch from
> Aneesh had the defconfig changes but they were dropped because they
> "... require separate handling to avoid pointless merge conflicts."[0]

Marek, those are your words. What do you think about the ARM problem?

> While I can't speak for Keystone or OMAP, at least on Tegra this causes
> issues because we use CMA for framebuffer allocation. Since we only have
> CMA selected but not the new DMA_CMA, large DMA allocations will fail.

Make config is supposed to ask you about the new option though, isn't it?

> Can we have the defconfig changes added back to this patch, please? I
> suspect that Linus can handle any resulting merge conflicts.
>
> Thierry
>
> [0]: http://permalink.gmane.org/gmane.linux.kernel.mm/102707

--
			Gleb.
Re: [PATCH] kvm tools: ioeventfd: replace bool parameters to __add_event with flags
On Wed, Sep 4, 2013 at 1:27 PM, Will Deacon will.dea...@arm.com wrote:
> A recent fix to virtio MMIO (72a7541ce305 [kvm tools: virtio-mmio:
> init_ioeventfd should use MMIO for ioeventfd__add_event()]) highlighted
> the confusing parameters expected by ioeventfd__add_event.
>
> As per Pekka's suggestion, replace the bool parameters to this function
> with a single `flags' argument instead.
>
> Cc: Ying-Shiuan Pan yingshiuan@gmail.com
> Signed-off-by: Will Deacon will.dea...@arm.com

Applied, thanks a lot!
Re: [PATCH] KVM: PPC: fix couple of memory leaks in MPIC/XICS devices
On 01.09.2013, at 14:53, Gleb Natapov wrote:
> XICS failed to free xics structure on error path. MPIC destroy handler
> forgot to delete kvm_device structure.
>
> Signed-off-by: Gleb Natapov g...@redhat.com

Paul, please ack :).

Alex
Re: [PATCH V3 4/6] vhost_net: determine whether or not to use zerocopy at one time
On Mon, Sep 02, 2013 at 04:40:59PM +0800, Jason Wang wrote:
> Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if
> upend_idx != done_idx we still set zcopy_used to true and rollback this choice
> later. This could be avoided by determining zerocopy once by checking all
> conditions at one time before.
>
> Signed-off-by: Jason Wang jasow...@redhat.com
> ---
>  drivers/vhost/net.c | 47 ---
>  1 files changed, 20 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 8a6dd0d..3f89dea 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -404,43 +404,36 @@ static void handle_tx(struct vhost_net *net)
>  			       iov_length(nvq->hdr, s), hdr_size);
>  			break;
>  		}
> -		zcopy_used = zcopy && (len >= VHOST_GOODCOPY_LEN ||
> -				       nvq->upend_idx != nvq->done_idx);
> +
> +		zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
> +				   && (nvq->upend_idx + 1) % UIO_MAXIOV !=
> +				      nvq->done_idx

Thinking about this, this looks strange.
The original idea was that once we start doing zcopy, we keep
using the heads ring even for short packets until no zcopy is outstanding.

What's the logic behind (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx
here?

> +				   && vhost_net_tx_select_zcopy(net);
>  
>  		/* use msg_control to pass vhost zerocopy ubuf info to skb */
>  		if (zcopy_used) {
> +			struct ubuf_info *ubuf;
> +			ubuf = nvq->ubuf_info + nvq->upend_idx;
> +
>  			vq->heads[nvq->upend_idx].id = head;
> -			if (!vhost_net_tx_select_zcopy(net) ||
> -			    len < VHOST_GOODCOPY_LEN) {
> -				/* copy don't need to wait for DMA done */
> -				vq->heads[nvq->upend_idx].len =
> -							VHOST_DMA_DONE_LEN;
> -				msg.msg_control = NULL;
> -				msg.msg_controllen = 0;
> -				ubufs = NULL;
> -			} else {
> -				struct ubuf_info *ubuf;
> -				ubuf = nvq->ubuf_info + nvq->upend_idx;
> -
> -				vq->heads[nvq->upend_idx].len =
> -					VHOST_DMA_IN_PROGRESS;
> -				ubuf->callback = vhost_zerocopy_callback;
> -				ubuf->ctx = nvq->ubufs;
> -				ubuf->desc = nvq->upend_idx;
> -				msg.msg_control = ubuf;
> -				msg.msg_controllen = sizeof(ubuf);
> -				ubufs = nvq->ubufs;
> -				kref_get(&ubufs->kref);
> -			}
> +			vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
> +			ubuf->callback = vhost_zerocopy_callback;
> +			ubuf->ctx = nvq->ubufs;
> +			ubuf->desc = nvq->upend_idx;
> +			msg.msg_control = ubuf;
> +			msg.msg_controllen = sizeof(ubuf);
> +			ubufs = nvq->ubufs;
> +			kref_get(&ubufs->kref);
>  			nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
> -		} else
> +		} else {
>  			msg.msg_control = NULL;
> +			ubufs = NULL;
> +		}
>  		/* TODO: Check specific error and bomb out unless ENOBUFS? */
>  		err = sock->ops->sendmsg(NULL, sock, &msg, len);
>  		if (unlikely(err < 0)) {
>  			if (zcopy_used) {
> -				if (ubufs)
> -					vhost_net_ubuf_put(ubufs);
> +				vhost_net_ubuf_put(ubufs);
>  				nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
>  					% UIO_MAXIOV;
>  			}
> --
> 1.7.1
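For readers following the question above, the two conditions can be compared as plain ring-buffer predicates. This is a standalone sketch under assumptions, not vhost code: `RING_SIZE` stands in for UIO_MAXIOV, and both helper names are made up for illustration. It does not claim to describe the patch author's intent.

```c
#include <stdbool.h>

#define RING_SIZE 1024	/* stands in for UIO_MAXIOV in this sketch */

/*
 * True when a producer at 'upend_idx' can take one more slot without
 * catching up with a consumer at 'done_idx'. One slot stays unused so
 * that "empty" (indices equal) and "full" can be told apart.
 */
static bool ring_has_room(unsigned int upend_idx, unsigned int done_idx)
{
	return (upend_idx + 1) % RING_SIZE != done_idx;
}

/* True when at least one entry lies between done_idx and upend_idx. */
static bool ring_has_outstanding(unsigned int upend_idx, unsigned int done_idx)
{
	return upend_idx != done_idx;
}
```

The removed condition has the shape of `ring_has_outstanding()`, while the added one has the shape of `ring_has_room()`. The two answer different questions, which is the point the review raises.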
Re: [PATCH] KVM: PPC: fix couple of memory leaks in MPIC/XICS devices
On Sun, Sep 01, 2013 at 03:53:46PM +0300, Gleb Natapov wrote:
> XICS failed to free xics structure on error path. MPIC destroy handler
> forgot to delete kvm_device structure.
>
> Signed-off-by: Gleb Natapov g...@redhat.com

Acked-by: Paul Mackerras pau...@samba.org
[PATCH v2 3/3] kvm tools: stop virtio console doing unnecessary input handling
The asynchronous nature of the virtio input handling (using a job queue)
can result in unnecessary jobs being created if there is some delay in
handling input (the original function to handle the input returns immediately
without the file having been read, and hence poll returns immediately
informing us of data to read).

This patch adds synchronisation to the threads so that we don't start
polling input files again until we've read from the console.

Signed-off-by: Jonathan Austin jonathan.aus...@arm.com
Acked-by: Marc Zyngier marc.zyng...@arm.com
---
 tools/kvm/virtio/console.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/virtio/console.c b/tools/kvm/virtio/console.c
index 83c58bf..f982dab7 100644
--- a/tools/kvm/virtio/console.c
+++ b/tools/kvm/virtio/console.c
@@ -36,12 +36,17 @@ struct con_dev {
 	struct virtio_console_config	config;
 	u32				features;
 
+	pthread_cond_t			poll_cond;
+	int				vq_ready;
+
 	struct thread_pool__job		jobs[VIRTIO_CONSOLE_NUM_QUEUES];
 };
 
 static struct con_dev cdev = {
 	.mutex				= MUTEX_INITIALIZER,
 
+	.vq_ready			= 0,
+
 	.config = {
 		.cols			= 80,
 		.rows			= 24,
@@ -69,6 +74,9 @@ static void virtio_console__inject_interrupt_callback(struct kvm *kvm, void *par
 
 	vq = param;
 
+	if (!cdev.vq_ready)
+		pthread_cond_wait(&cdev.poll_cond, &cdev.mutex.mutex);
+
 	if (term_readable(0) && virt_queue__available(vq)) {
 		head = virt_queue__get_iov(vq, iov, &out, &in, kvm);
 		len = term_getc_iov(kvm, iov, in, 0);
@@ -81,7 +89,8 @@ static void virtio_console__inject_interrupt_callback(struct kvm *kvm, void *par
 
 void virtio_console__inject_interrupt(struct kvm *kvm)
 {
-	thread_pool__do_job(&cdev.jobs[VIRTIO_CONSOLE_RX_QUEUE]);
+	virtio_console__inject_interrupt_callback(kvm,
+					&cdev.vqs[VIRTIO_CONSOLE_RX_QUEUE]);
 }
 
 static void virtio_console_handle_callback(struct kvm *kvm, void *param)
@@ -141,10 +150,16 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
 
 	vring_init(&queue->vring, VIRTIO_CONSOLE_QUEUE_SIZE, p, align);
 
-	if (vq == VIRTIO_CONSOLE_TX_QUEUE)
+	if (vq == VIRTIO_CONSOLE_TX_QUEUE) {
 		thread_pool__init_job(&cdev.jobs[vq], kvm, virtio_console_handle_callback, queue);
-	else if (vq == VIRTIO_CONSOLE_RX_QUEUE)
+	} else if (vq == VIRTIO_CONSOLE_RX_QUEUE) {
 		thread_pool__init_job(&cdev.jobs[vq], kvm, virtio_console__inject_interrupt_callback, queue);
+		/* Tell the waiting poll thread that we're ready to go */
+		mutex_lock(&cdev.mutex);
+		cdev.vq_ready = 1;
+		pthread_cond_signal(&cdev.poll_cond);
+		mutex_unlock(&cdev.mutex);
+	}
 
 	return 0;
 }
@@ -192,6 +207,8 @@ int virtio_console__init(struct kvm *kvm)
 	if (kvm->cfg.active_console != CONSOLE_VIRTIO)
 		return 0;
 
+	pthread_cond_init(&cdev.poll_cond, NULL);
+
 	virtio_init(kvm, &cdev, &cdev.vdev, &con_dev_virtio_ops, VIRTIO_DEFAULT_TRANS,
 		    PCI_DEVICE_ID_VIRTIO_CONSOLE, VIRTIO_ID_CONSOLE, PCI_CLASS_CONSOLE);
 
-- 
1.7.9.5
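The synchronisation added here follows the usual "ready flag plus condition variable" shape. Below is a minimal standalone sketch of that pattern using plain POSIX calls; `struct ready_gate`, `gate_open()` and `gate_wait()` are hypothetical names, not part of kvmtool. One deliberate difference from the patch as posted: the sketch re-checks the flag in a `while` loop, since POSIX allows `pthread_cond_wait()` to return spuriously.

```c
#include <pthread.h>

/* Hypothetical stand-in for the relevant fields of struct con_dev. */
struct ready_gate {
	pthread_mutex_t mutex;
	pthread_cond_t cond;
	int ready;
};

static struct ready_gate gate = {
	.mutex = PTHREAD_MUTEX_INITIALIZER,
	.cond = PTHREAD_COND_INITIALIZER,
	.ready = 0,
};

/* Called by the thread that finishes setup (init_vq in the patch). */
static void gate_open(struct ready_gate *g)
{
	pthread_mutex_lock(&g->mutex);
	g->ready = 1;
	pthread_cond_broadcast(&g->cond);
	pthread_mutex_unlock(&g->mutex);
}

/* Called by the consumer before it touches the queue. */
static void gate_wait(struct ready_gate *g)
{
	pthread_mutex_lock(&g->mutex);
	/* Re-check the predicate: a wakeup does not guarantee readiness. */
	while (!g->ready)
		pthread_cond_wait(&g->cond, &g->mutex);
	pthread_mutex_unlock(&g->mutex);
}
```

If `gate_open()` runs before `gate_wait()`, the waiter sees `ready` already set and returns without blocking, so the order in which the two threads arrive does not matter.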
[PATCH v2 0/3] kvm tools: remove periodic tick
This patch series removes kvm tool's periodic tick function in favour of a
thread that blocks waiting for input. The paths used for handling input are
the same as when using a periodic tick, but they're not called unless there
is actually input to be processed.

On extremely slow platforms (eg FPGAs) the overhead involved in handling the
timer tick means it is not possible to make progress at all inside the VM!
This patch series addresses this problem.

In doing this there are a number of small tidyups/cleanups that made sense, too:
- Use a #define for maximum number of term devices
- Refactor the method by which the virtio console handles input in order not to
    - handle input too early
    - handle input multiple times if the worker thread didn't immediately
      start work.
- Rename the periodic_poll function to reflect the functional change

---
Changes since V1
- s/kvm__arm_read_term/kvm__arch_read_term/ in patch2's coverletter
- make term_poll_thread static
- Added Marc's ack

Jonathan Austin (3):
  kvm tools: use #define for maximum number of terminal devices
  kvm tools: remove periodic tick in favour of a polling thread
  kvm tools: stop virtio console doing unnecessary input handling

 tools/kvm/arm/kvm.c         |  2 +-
 tools/kvm/builtin-run.c     | 13 ---
 tools/kvm/include/kvm/kvm.h |  2 +-
 tools/kvm/kvm.c             | 50 ---
 tools/kvm/powerpc/kvm.c     |  2 +-
 tools/kvm/term.c            | 38 +---
 tools/kvm/virtio/console.c  | 23 +---
 tools/kvm/x86/kvm.c         |  2 +-
 8 files changed, 59 insertions(+), 73 deletions(-)

-- 
1.7.9.5
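The core idea of the series, blocking until input arrives rather than waking on a timer, can be sketched with plain poll(2). This is a minimal illustration, not the series' code: `wait_readable()` is a hypothetical helper, and the thread that would call it in a loop is left out.

```c
#include <poll.h>

/*
 * Blocks until 'fd' has data to read or 'timeout_ms' elapses
 * (-1 blocks indefinitely). Returns 1 if readable, 0 on timeout,
 * -1 on error.
 */
static int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int r = poll(&pfd, 1, timeout_ms);

	if (r < 0)
		return -1;
	return (r > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

A dedicated thread calling `wait_readable(fd, -1)` consumes no CPU time while the terminal is idle, which is the property a slow host needs; a periodic tick pays its cost on every interval whether or not there is input.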
[PATCH v2 1/3] kvm tools: use #define for maximum number of terminal devices
Though there may be no near-term plans to change the number of terminal
devices in the future, using TERM_MAX_DEVS instead of '4' makes reading
some of the loops over terminal devices clearer.

This patch makes this substitution where required.

Signed-off-by: Jonathan Austin jonathan.aus...@arm.com
Acked-by: Marc Zyngier marc.zyng...@arm.com
---
 tools/kvm/term.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/term.c b/tools/kvm/term.c
index fa85e4a..ac9c7cc 100644
--- a/tools/kvm/term.c
+++ b/tools/kvm/term.c
@@ -16,13 +16,14 @@
 
 #define TERM_FD_IN      0
 #define TERM_FD_OUT     1
+#define TERM_MAX_DEVS	4
 
 static struct termios	orig_term;
 
 int term_escape_char	= 0x01; /* ctrl-a is used for escape */
 bool term_got_escape	= false;
 
-int term_fds[4][2];
+int term_fds[TERM_MAX_DEVS][2];
 
 int term_getc(struct kvm *kvm, int term)
 {
@@ -94,7 +95,7 @@ static void term_cleanup(void)
 {
 	int i;
 
-	for (i = 0; i < 4; i++)
+	for (i = 0; i < TERM_MAX_DEVS; i++)
 		tcsetattr(term_fds[i][TERM_FD_IN], TCSANOW, &orig_term);
 }
 
@@ -140,7 +141,7 @@ int term_init(struct kvm *kvm)
 	struct termios term;
 	int i, r;
 
-	for (i = 0; i < 4; i++)
+	for (i = 0; i < TERM_MAX_DEVS; i++)
 		if (term_fds[i][TERM_FD_IN] == 0) {
 			term_fds[i][TERM_FD_IN] = STDIN_FILENO;
 			term_fds[i][TERM_FD_OUT] = STDOUT_FILENO;
-- 
1.7.9.5
[PATCH v2 2/3] kvm tools: remove periodic tick in favour of a polling thread
Currently the only use of the periodic timer tick in kvmtool is to
handle reading from stdin. Though functional, this periodic tick can be
problematic on slow (eg FPGA) platforms and can cause low interactivity or
even stop the execution from progressing at all.

This patch removes the periodic tick in favour of a dedicated thread blocked
waiting for input from the console. In order to reflect the new behaviour,
the old 'kvm__arch_periodic_poll' function is renamed to 'kvm__arch_read_term'.

Signed-off-by: Jonathan Austin jonathan.aus...@arm.com
Acked-by: Marc Zyngier marc.zyng...@arm.com
---
 tools/kvm/arm/kvm.c         |  2 +-
 tools/kvm/builtin-run.c     | 13 ---
 tools/kvm/include/kvm/kvm.h |  2 +-
 tools/kvm/kvm.c             | 50 ---
 tools/kvm/powerpc/kvm.c     |  2 +-
 tools/kvm/term.c            | 31 +++
 tools/kvm/x86/kvm.c         |  2 +-
 7 files changed, 35 insertions(+), 67 deletions(-)

diff --git a/tools/kvm/arm/kvm.c b/tools/kvm/arm/kvm.c
index 27e6cf4..008b7fe 100644
--- a/tools/kvm/arm/kvm.c
+++ b/tools/kvm/arm/kvm.c
@@ -46,7 +46,7 @@ void kvm__arch_delete_ram(struct kvm *kvm)
 	munmap(kvm->arch.ram_alloc_start, kvm->arch.ram_alloc_size);
 }
 
-void kvm__arch_periodic_poll(struct kvm *kvm)
+void kvm__arch_read_term(struct kvm *kvm)
 {
 	if (term_readable(0)) {
 		serial8250__update_consoles(kvm);
diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 4d7fbf9d..da95d71 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -165,13 +165,6 @@ void kvm_run_set_wrapper_sandbox(void)
 	OPT_END()							\
 	};
 
-static void handle_sigalrm(int sig, siginfo_t *si, void *uc)
-{
-	struct kvm *kvm = si->si_value.sival_ptr;
-
-	kvm__arch_periodic_poll(kvm);
-}
-
 static void *kvm_cpu_thread(void *arg)
 {
 	char name[16];
@@ -487,17 +480,11 @@ static struct kvm *kvm_cmd_run_init(int argc, const char **argv)
 {
 	static char real_cmdline[2048], default_name[20];
 	unsigned int nr_online_cpus;
-	struct sigaction sa;
 	struct kvm *kvm = kvm__new();
 
 	if (IS_ERR(kvm))
 		return kvm;
 
-	sa.sa_flags = SA_SIGINFO;
-	sa.sa_sigaction = handle_sigalrm;
-	sigemptyset(&sa.sa_mask);
-	sigaction(SIGALRM, &sa, NULL);
-
 	nr_online_cpus = sysconf(_SC_NPROCESSORS_ONLN);
 	kvm->cfg.custom_rootfs_name = "default";
 
diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index ad53ca7..d05b936 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -103,7 +103,7 @@ void kvm__arch_delete_ram(struct kvm *kvm);
 int kvm__arch_setup_firmware(struct kvm *kvm);
 int kvm__arch_free_firmware(struct kvm *kvm);
 bool kvm__arch_cpu_supports_vm(void);
-void kvm__arch_periodic_poll(struct kvm *kvm);
+void kvm__arch_read_term(struct kvm *kvm);
 
 void *guest_flat_to_host(struct kvm *kvm, u64 offset);
 u64 host_to_guest_flat(struct kvm *kvm, void *ptr);
diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index cfd30dd..d7d2e84 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -393,56 +393,6 @@ found_kernel:
 	return ret;
 }
 
-#define TIMER_INTERVAL_NS 1000000	/* 1 msec */
-
-/*
- * This function sets up a timer that's used to inject interrupts from the
- * userspace hypervisor into the guest at periodical intervals. Please note
- * that clock interrupt, for example, is not handled here.
- */
-int kvm_timer__init(struct kvm *kvm)
-{
-	struct itimerspec its;
-	struct sigevent sev;
-	int r;
-
-	memset(&sev, 0, sizeof(struct sigevent));
-	sev.sigev_value.sival_int	= 0;
-	sev.sigev_notify		= SIGEV_THREAD_ID;
-	sev.sigev_signo			= SIGALRM;
-	sev.sigev_value.sival_ptr	= kvm;
-	sev._sigev_un._tid		= syscall(__NR_gettid);
-
-	r = timer_create(CLOCK_REALTIME, &sev, &kvm->timerid);
-	if (r < 0)
-		return r;
-
-	its.it_value.tv_sec		= TIMER_INTERVAL_NS / 1000000000;
-	its.it_value.tv_nsec		= TIMER_INTERVAL_NS % 1000000000;
-	its.it_interval.tv_sec		= its.it_value.tv_sec;
-	its.it_interval.tv_nsec		= its.it_value.tv_nsec;
-
-	r = timer_settime(kvm->timerid, 0, &its, NULL);
-	if (r < 0) {
-		timer_delete(kvm->timerid);
-		return r;
-	}
-
-	return 0;
-}
-firmware_init(kvm_timer__init);
-
-int kvm_timer__exit(struct kvm *kvm)
-{
-	if (kvm->timerid)
-		if (timer_delete(kvm->timerid) < 0)
-			die("timer_delete()");
-
-	kvm->timerid = 0;
-
-	return 0;
-}
-firmware_exit(kvm_timer__exit);
 
 void kvm__dump_mem(struct kvm *kvm, unsigned long addr, unsigned long size, int debug_fd)
 {
diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
[PATCH v3] KVM: nVMX: Fully support of nested VMX preemption timer
This patch contains the following two changes:
1. Fix the bug in nested preemption timer support. If a vmexit L2->L0
happens for a reason not emulated by L1, the preemption timer value should
be saved in such exits.
2. Add support of "Save VMX-preemption timer value" VM-Exit controls
to nVMX.

With this patch, nested VMX preemption timer features are fully
supported.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
This series depends on queue.

 arch/x86/include/uapi/asm/msr-index.h |  1 +
 arch/x86/kvm/vmx.c                    | 51 ++---
 2 files changed, 48 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
index bb04650..b93e09a 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -536,6 +536,7 @@
 
 /* MSR_IA32_VMX_MISC bits */
 #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
+#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE   0x1F
 /* AMD-V MSRs */
 
 #define MSR_VM_CR                       0xc0010114
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1f1da43..870caa8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2204,7 +2204,14 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 #ifdef CONFIG_X86_64
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
-		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
+		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
+		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+	if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER))
+		nested_vmx_exit_ctls_high &=
+			(~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
+	if (!(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
+		nested_vmx_pinbased_ctls_high &=
+			(~PIN_BASED_VMX_PREEMPTION_TIMER);
 	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 				      VM_EXIT_LOAD_IA32_EFER);
 
@@ -6707,6 +6714,23 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
 	*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
 }
 
+static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu)
+{
+	u64 delta_tsc_l1;
+	u32 preempt_val_l1, preempt_val_l2, preempt_scale;
+
+	preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
+			MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
+	preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
+	delta_tsc_l1 = kvm_x86_ops->read_l1_tsc(vcpu,
+			native_read_tsc()) - vcpu->arch.last_guest_tsc;
+	preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
+	if (preempt_val_l2 - preempt_val_l1 < 0)
+		preempt_val_l2 = 0;
+	else
+		preempt_val_l2 -= preempt_val_l1;
+	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2);
+}
 /*
  * The guest has exited.  See if we can fix it or if we need userspace
  * assistance.
@@ -6716,6 +6740,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 exit_reason = vmx->exit_reason;
 	u32 vectoring_info = vmx->idt_vectoring_info;
+	int ret;
 
 	/* If guest state is invalid, start emulating */
 	if (vmx->emulation_required)
@@ -6795,12 +6820,15 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 
 	if (exit_reason < kvm_vmx_max_exit_handlers
 	    && kvm_vmx_exit_handlers[exit_reason])
-		return kvm_vmx_exit_handlers[exit_reason](vcpu);
+		ret = kvm_vmx_exit_handlers[exit_reason](vcpu);
 	else {
 		vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
 		vcpu->run->hw.hardware_exit_reason = exit_reason;
+		ret = 0;
 	}
-	return 0;
+	if (is_guest_mode(vcpu))
+		nested_adjust_preemption_timer(vcpu);
+	return ret;
 }
 
 static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
@@ -7518,6 +7546,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 exec_control;
+	u32 exit_control;
 
 	vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
 	vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
@@ -7691,7 +7720,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	 * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
 	 * bits are further modified by vmx_set_efer() below.
 	 */
-	vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
+	exit_control = vmcs_config.vmexit_ctrl;
+	if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER)
+		exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+	vmcs_write32(VM_EXIT_CONTROLS, exit_control);
 
 	/* vmcs12's VM_ENTRY_LOAD_IA32_EFER and
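The arithmetic in the timer adjustment can be checked on its own. The sketch below is standalone and illustrative, not KVM code: `preempt_timer_remaining()` is a hypothetical name, and the inputs that the patch reads from the VMCS and from MSR_IA32_VMX_MISC are passed in as plain parameters. It compares before subtracting because, with unsigned 32-bit operands, a test of the form `a - b < 0` is never true in C.

```c
#include <stdint.h>

/*
 * Returns the preemption timer value left for L2 after 'delta_tsc' TSC
 * cycles have elapsed. The timer counts down once every 2^scale TSC
 * cycles; in the patch 'scale' is the low five bits of MSR_IA32_VMX_MISC.
 */
static uint32_t preempt_timer_remaining(uint32_t timer_val, uint64_t delta_tsc,
					unsigned int scale)
{
	uint64_t elapsed = delta_tsc >> scale;

	/* Clamp at zero instead of letting the subtraction wrap around. */
	if (elapsed >= timer_val)
		return 0;
	return timer_val - (uint32_t)elapsed;
}
```

For example, with a scale of 5 each timer tick covers 32 TSC cycles, so 320 elapsed cycles reduce a timer value of 1000 to 990.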
[PATCH] kvm-unit-tests: VMX: Test suite for preemption timer
Test cases for preemption timer in nested VMX. Two aspects are tested:
1. Save preemption timer on VMEXIT if relevant bit set in EXIT_CONTROL
2. Test a relevant bug of KVM. The bug will not save preemption timer
value if exit L2->L0 for some reason and enter L0->L2. Thus preemption
timer will never trigger if the value is large enough.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 x86/vmx.h       |   3 +++
 x86/vmx_tests.c | 117 +++
 2 files changed, 120 insertions(+)

diff --git a/x86/vmx.h b/x86/vmx.h
index 28595d8..ebc8cfd 100644
--- a/x86/vmx.h
+++ b/x86/vmx.h
@@ -210,6 +210,7 @@ enum Encoding {
 	GUEST_ACTV_STATE	= 0x4826ul,
 	GUEST_SMBASE		= 0x4828ul,
 	GUEST_SYSENTER_CS	= 0x482aul,
+	PREEMPT_TIMER_VALUE	= 0x482eul,
 
 	/* 32-Bit Host State Fields */
 	HOST_SYSENTER_CS	= 0x4c00ul,
@@ -331,6 +332,7 @@ enum Ctrl_exi {
 	EXI_LOAD_PERF		= 1UL << 12,
 	EXI_INTA		= 1UL << 15,
 	EXI_LOAD_EFER		= 1UL << 21,
+	EXI_SAVE_PREEMPT	= 1UL << 22,
 };
 
 enum Ctrl_ent {
@@ -342,6 +344,7 @@ enum Ctrl_pin {
 	PIN_EXTINT		= 1ul << 0,
 	PIN_NMI			= 1ul << 3,
 	PIN_VIRT_NMI		= 1ul << 5,
+	PIN_PREEMPT		= 1ul << 6,
 };
 
 enum Ctrl0 {
diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index c1b39f4..d358148 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -1,4 +1,30 @@
 #include "vmx.h"
+#include "msr.h"
+#include "processor.h"
+
+volatile u32 stage;
+
+static inline void vmcall()
+{
+	asm volatile("vmcall");
+}
+
+static inline void set_stage(u32 s)
+{
+	barrier();
+	stage = s;
+	barrier();
+}
+
+static inline u32 get_stage()
+{
+	u32 s;
+
+	barrier();
+	s = stage;
+	barrier();
+	return s;
+}
 
 void basic_init()
 {
@@ -76,6 +102,95 @@ int vmenter_exit_handler()
 	return VMX_TEST_VMEXIT;
 }
 
+u32 preempt_scale;
+volatile unsigned long long tsc_val;
+volatile u32 preempt_val;
+
+void preemption_timer_init()
+{
+	u32 ctrl_pin;
+
+	ctrl_pin = vmcs_read(PIN_CONTROLS) | PIN_PREEMPT;
+	ctrl_pin &= ctrl_pin_rev.clr;
+	vmcs_write(PIN_CONTROLS, ctrl_pin);
+	preempt_val = 1000;
+	vmcs_write(PREEMPT_TIMER_VALUE, preempt_val);
+	preempt_scale = rdmsr(MSR_IA32_VMX_MISC) & 0x1F;
+}
+
+void preemption_timer_main()
+{
+	tsc_val = rdtsc();
+	if (!(ctrl_pin_rev.clr & PIN_PREEMPT)) {
+		printf("\tPreemption timer is not supported\n");
+		return;
+	}
+	if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
+		printf("\tSave preemption value is not supported\n");
+	else {
+		set_stage(0);
+		vmcall();
+		if (get_stage() == 1)
+			vmcall();
+	}
+	while (1) {
+		if (((rdtsc() - tsc_val) >> preempt_scale)
+				> 10 * preempt_val) {
+			report("Preemption timer", 0);
+			break;
+		}
+	}
+}
+
+int preemption_timer_exit_handler()
+{
+	u64 guest_rip;
+	ulong reason;
+	u32 insn_len;
+	u32 ctrl_exit;
+
+	guest_rip = vmcs_read(GUEST_RIP);
+	reason = vmcs_read(EXI_REASON) & 0xff;
+	insn_len = vmcs_read(EXI_INST_LEN);
+	switch (reason) {
+	case VMX_PREEMPT:
+		if (((rdtsc() - tsc_val) >> preempt_scale) < preempt_val)
+			report("Preemption timer", 0);
+		else
+			report("Preemption timer", 1);
+		return VMX_TEST_VMEXIT;
+	case VMX_VMCALL:
+		switch (get_stage()) {
+		case 0:
+			if (vmcs_read(PREEMPT_TIMER_VALUE) != preempt_val)
+				report("Save preemption value", 0);
+			else {
+				set_stage(get_stage() + 1);
+				ctrl_exit = (vmcs_read(EXI_CONTROLS) |
+					EXI_SAVE_PREEMPT) & ctrl_exit_rev.clr;
+				vmcs_write(EXI_CONTROLS, ctrl_exit);
+			}
+			break;
+		case 1:
+			if (vmcs_read(PREEMPT_TIMER_VALUE) >= preempt_val)
+				report("Save preemption value", 0);
+			else
+				report("Save preemption value", 1);
+			break;
+		default:
+			printf("Invalid stage.\n");
+			print_vmexit_info();
+			return VMX_TEST_VMEXIT;
+		}
+		vmcs_write(GUEST_RIP, guest_rip + insn_len);
+		return VMX_TEST_RESUME;
+	default:
+		printf("Unknown exit reason, %d\n", reason);
+
Re: [PATCH v2 2/3] kvm tools: remove periodic tick in favour of a polling thread
Hi Jonathan,

On Wed, Sep 4, 2013 at 4:25 PM, Jonathan Austin jonathan.aus...@arm.com wrote:
> Currently the only use of the periodic timer tick in kvmtool is to
> handle reading from stdin. Though functional, this periodic tick can be
> problematic on slow (eg FPGA) platforms and can cause low interactivity or
> even stop the execution from progressing at all.
>
> This patch removes the periodic tick in favour of a dedicated thread blocked
> waiting for input from the console. In order to reflect the new behaviour,
> the old 'kvm__arch_periodic_tick' function is renamed to 'kvm__arch_read_term'.
>
> Signed-off-by: Jonathan Austin jonathan.aus...@arm.com
> Acked-by: Marc Zyngier marc.zyng...@arm.com

I'm afraid this breaks top on x86. Does it work on arm?

When I start it up, it seems as if it's stuck but whenever I press a key, it
prints part of the screen.

			Pekka
Re: [PATCH] kvm-unit-tests: VMX: Add the framework of EPT
Hi Xiao Guangrong, Jun Nakajima, Yang Zhang, Gleb and Paolo,

If you have any ideas about how nested EPT should be tested and which
aspects to cover, please tell me and I will write the relevant test cases.
Besides, I would be very happy if you could help me review this patch or
propose other suggestions.

Thanks very much,
Arthur

On Mon, Sep 2, 2013 at 5:38 PM, Arthur Chunqi Li yzt...@gmail.com wrote:
> There must have some minor revisions to be done in this patch, so this
> is mainly a RFC mail.
>
> Besides, I'm not quite clear what we should test in nested EPT
> modules, and I bet writers of nested EPT must have ideas to continue
> and refine this testing part. Any suggestions of which part and how to
> test nested EPT is welcome.
>
> Please help me CC EPT-related guys if anyone knows.
>
> Thanks,
> Arthur
>
> On Mon, Sep 2, 2013 at 5:26 PM, Arthur Chunqi Li yzt...@gmail.com wrote:
>> Add a framework of EPT in nested VMX testing, including a set of
>> functions to construct and read EPT paging structures and a simple
>> read/write test of EPT remapping from guest to host.
>>
>> Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
>> ---
>>  x86/vmx.c       | 132 --
>>  x86/vmx.h       |  76 +++
>>  x86/vmx_tests.c | 156 +++
>>  3 files changed, 360 insertions(+), 4 deletions(-)
>>
>> diff --git a/x86/vmx.c b/x86/vmx.c
>> index ca36d35..a156b71 100644
>> --- a/x86/vmx.c
>> +++ b/x86/vmx.c
>> @@ -143,6 +143,132 @@ asm(
>>  	"	call hypercall\n\t"
>>  );
>>
>> +/* EPT paging structure related functions */
>> +/* install_ept_entry : Install a page to a given level in EPT
>> +		@pml4 : addr of pml4 table
>> +		@pte_level : level of PTE to set
>> +		@guest_addr : physical address of guest
>> +		@pte : pte value to set
>> +		@pt_page : address of page table, NULL for a new page
>> + */
>> +void install_ept_entry(unsigned long *pml4,
>> +		int pte_level,
>> +		unsigned long guest_addr,
>> +		unsigned long pte,
>> +		unsigned long *pt_page)
>> +{
>> +	int level;
>> +	unsigned long *pt = pml4;
>> +	unsigned offset;
>> +
>> +	for (level = EPT_PAGE_LEVEL; level > pte_level; --level) {
>> +		offset = (guest_addr >> ((level-1) * EPT_PGDIR_WIDTH + 12))
>> +				& EPT_PGDIR_MASK;
>> +		if (!(pt[offset] & (EPT_RA | EPT_WA | EPT_EA))) {
>> +			unsigned long *new_pt = pt_page;
>> +			if (!new_pt)
>> +				new_pt = alloc_page();
>> +			else
>> +				pt_page = 0;
>> +			memset(new_pt, 0, PAGE_SIZE);
>> +			pt[offset] = virt_to_phys(new_pt)
>> +					| EPT_RA | EPT_WA | EPT_EA;
>> +		}
>> +		pt = phys_to_virt(pt[offset] & 0xff000ull);
>> +	}
>> +	offset = ((unsigned long)guest_addr >> ((level-1) *
>> +			EPT_PGDIR_WIDTH + 12)) & EPT_PGDIR_MASK;
>> +	pt[offset] = pte;
>> +}
>> +
>> +/* Map a page, @perm is the permission of the page */
>> +void install_ept(unsigned long *pml4,
>> +		unsigned long phys,
>> +		unsigned long guest_addr,
>> +		u64 perm)
>> +{
>> +	install_ept_entry(pml4, 1, guest_addr, (phys & PAGE_MASK) | perm, 0);
>> +}
>> +
>> +/* Map a 1G-size page */
>> +void install_1g_ept(unsigned long *pml4,
>> +		unsigned long phys,
>> +		unsigned long guest_addr,
>> +		u64 perm)
>> +{
>> +	install_ept_entry(pml4, 3, guest_addr,
>> +			(phys & PAGE_MASK) | perm | EPT_LARGE_PAGE, 0);
>> +}
>> +
>> +/* Map a 2M-size page */
>> +void install_2m_ept(unsigned long *pml4,
>> +		unsigned long phys,
>> +		unsigned long guest_addr,
>> +		u64 perm)
>> +{
>> +	install_ept_entry(pml4, 2, guest_addr,
>> +			(phys & PAGE_MASK) | perm | EPT_LARGE_PAGE, 0);
>> +}
>> +
>> +/* setup_ept_range : Setup a range of 1:1 mapped page to EPT paging structure.
>> +		@start : start address of guest page
>> +		@len : length of address to be mapped
>> +		@map_1g : whether 1G page map is used
>> +		@map_2m : whether 2M page map is used
>> +		@perm : permission for every page
>> + */
>> +int setup_ept_range(unsigned long *pml4, unsigned long start,
>> +		unsigned long len, int map_1g, int map_2m, u64 perm)
>> +{
>> +	u64 phys = start;
>> +	u64 max = (u64)len + (u64)start;
>> +
>> +	if (map_1g) {
>> +		while (phys + PAGE_SIZE_1G <= max) {
>> +			install_1g_ept(pml4, phys, phys, perm);
>> +			phys += PAGE_SIZE_1G;
>> +		}
>> +	}
>> +	if (map_2m) {
>> +		while (phys + PAGE_SIZE_2M <= max) {
>> +			install_2m_ept(pml4, phys, phys, perm);
>> +			phys +=
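The table walk in install_ept_entry() depends on one index computation per level. Here it is as a standalone sketch. The constants are assumptions chosen to match standard 4-level x86 paging (9 index bits per level above a 12-bit page offset); the values of EPT_PGDIR_WIDTH and EPT_PGDIR_MASK themselves are defined in a part of the patch not shown above, and `ept_index()` is a hypothetical helper name.

```c
#include <stdint.h>

#define PGDIR_WIDTH	9	/* index bits per level (assumed EPT_PGDIR_WIDTH) */
#define PGDIR_MASK	0x1ffu	/* 2^9 - 1 (assumed EPT_PGDIR_MASK) */
#define PAGE_SHIFT_4K	12	/* offset bits within a 4K page */

/* Index into the table at 'level' (4 = PML4 down to 1 = page table). */
static unsigned int ept_index(uint64_t guest_addr, int level)
{
	unsigned int shift = (level - 1) * PGDIR_WIDTH + PAGE_SHIFT_4K;

	return (unsigned int)((guest_addr >> shift) & PGDIR_MASK);
}
```

Under these assumptions a level-1 entry covers 4K, a level-2 entry 2M and a level-3 entry 1G, which is why the 2M and 1G helpers in the patch stop the walk at levels 2 and 3.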
[stable-3.4] possibly revert KVM: X86 emulator: fix source operand decoding...
Hi Greg,

The 3.4.44+ cherry pick:

  commit 5b5b30580218eae22609989546bac6e44d0eda6e
  Author: Gleb Natapov g...@redhat.com
  Date:   Wed Apr 24 13:38:36 2013 +0300

      KVM: X86 emulator: fix source operand decoding for 8bit mov[zs]x instructions

      commit 660696d1d16a71e15549ce1bf74953be1592bcd3 upstream.

      Source operand for one byte mov[zs]x is decoded incorrectly if it is in
      high byte register. Fix that.

      Signed-off-by: Gleb Natapov g...@redhat.com
      Signed-off-by: Greg Kroah-Hartman gre...@linuxfoundation.org

introduces the following:

  arch/x86/kvm/emulate.c: In function ‘decode_operand’:
  arch/x86/kvm/emulate.c:3974:4: warning: passing argument 1 of ‘decode_register’ makes integer from pointer without a cast [enabled by default]
  arch/x86/kvm/emulate.c:789:14: note: expected ‘u8’ but argument is of type ‘struct x86_emulate_ctxt *’
  arch/x86/kvm/emulate.c:3974:4: warning: passing argument 2 of ‘decode_register’ makes pointer from integer without a cast [enabled by default]
  arch/x86/kvm/emulate.c:789:14: note: expected ‘long unsigned int *’ but argument is of type ‘u8’

Based on the severity of the warnings above, I'm reasonably sure there will
be some kind of runtime regressions due to this, but I stopped to investigate
the warnings as soon as I saw them, before any run time testing.

It happens because mainline v3.7-rc1~113^2~40 (dd856efafe60) does this:

  -static void *decode_register(u8 modrm_reg, unsigned long *regs,
  +static void *decode_register(struct x86_emulate_ctxt *ctxt, u8 modrm_reg,

Since 660696d1d16a71e1 was only applied to stable 3.4, 3.8, and 3.9 -- and
the prerequisite above is in 3.7+, the issue should be limited to 3.4.44+

Thanks,
Paul.
Re: Mapping guest memory from another process?
Thanks, I'll try that. Do you know of any way to get at the VCPU structure
from another process? I'm looking to have an event triggered from the guest
which will notify my application. In Xen I use an event channel, and then I
can call a function to retrieve the relevant VCPU context.

On Wed, Sep 4, 2013 at 10:35 AM, Cutter 409 cutter...@gmail.com wrote:
> Thanks, I'll try that. Do you know of any way to get at the VCPU structure
> from another process? I'm looking to have an event triggered from the guest
> which will notify my application. In Xen I use an event channel, and then I
> can call a function to retrieve the relevant VCPU context.
>
> On Wed, Sep 4, 2013 at 4:47 AM, Stefan Hajnoczi stefa...@gmail.com wrote:
>> On Tue, Sep 03, 2013 at 07:56:33PM -0400, Cutter 409 wrote:
>>> I'm working on a tool that needs the ability to map the physical memory
>>> of a virtual machine into its own address space. With Xen, I can simply
>>> call xc_map_foreign_pages(). Is there something similar for KVM?
>>>
>>> So far, I can only figure out how to do it if I were the process that
>>> created the VM (then I could mmap() the handle of the virtual machine).
>>> Is there a way for an outside process to do this?
>>
>> You can get QEMU to do a shared mapping of a file as guest RAM using
>> -mem-path and -mem-prealloc, see man qemu.
>>
>> Stefan
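The suggestion in the quoted reply relies on an ordinary shared file mapping. The following is a rough sketch of the tool side only, under an assumption the thread does not confirm: that the file backing guest RAM can be opened by path from the second process. How QEMU names that file, and whether it stays reachable by path, depends on the QEMU version and setup. `map_shared_file()` is a hypothetical helper, not a QEMU or KVM API.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Maps 'len' bytes of the file at 'path' with MAP_SHARED, so that writes
 * made by any process mapping the same file become visible here.
 * Returns NULL on error.
 */
static void *map_shared_file(const char *path, size_t len)
{
	void *p;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return NULL;
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping remains valid after the fd is closed */
	return p == MAP_FAILED ? NULL : p;
}
```

Offsets into such a mapping would correspond to offsets into the backing file rather than directly to guest physical addresses, so a real tool still needs to know how guest RAM is laid out in the file.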
Re: [Qemu-devel] [PATCH V3] target-i386: forward CPUID cache leaves when -cpu host is used
On Mon, Sep 02, 2013 at 07:09:47PM +0200, Benoît Canet wrote:
>>> Signed-off-by: Benoit Canet <ben...@irqsave.net>
>>
>> Reviewed-by: Eduardo Habkost <ehabk...@redhat.com>
>
> Thanks.
>
> Do you have an idea on how QEMU could reflect the real host clock frequency to the guest when the host cpu scaling governor kicks in ?
> Giving a false value to cloud customers is mildly annoying.

Probably you will need changes on KVM, SeaBIOS and QEMU to implement the interfaces to let the system notify the OS about CPU frequency changes. I don't know much about ACPI and power management, so I can't say how much of that is already implemented and how much is missing.

-- 
Eduardo
Re: [PATCH v2 2/3] kvm tools: remove periodic tick in favour of a polling thread
Hi Pekka,

On 04/09/13 16:58, Pekka Enberg wrote:
> Hi Jonathan,
>
> On Wed, Sep 4, 2013 at 4:25 PM, Jonathan Austin <jonathan.aus...@arm.com> wrote:
>> Currently the only use of the periodic timer tick in kvmtool is to handle reading from stdin. Though functional, this periodic tick can be problematic on slow (eg FPGA) platforms and can cause low interactivity or even stop the execution from progressing at all.
>>
>> This patch removes the periodic tick in favour of a dedicated thread blocked waiting for input from the console. In order to reflect the new behaviour, the old 'kvm__arch_periodic_tick' function is renamed to 'kvm__arch_read_term'.
>>
>> Signed-off-by: Jonathan Austin <jonathan.aus...@arm.com>
>> Acked-by: Marc Zyngier <marc.zyng...@arm.com>
>
> I'm afraid this breaks top on x86. Does it work on arm?

Sorry about that... 'top' works on ARM with virtio console. I've just done some new testing with the serial console emulation and I see the same as you're reporting.

Previously with the 8250 emulation I'd booted to a prompt but didn't actually test top...

I'm looking in to fixing this now... Looks like I need to find the right place from which to call serial8250_flush_tx now that it isn't getting called every tick.

I've done the following and it fixes 'top' with serial8250:
---8<---
diff --git a/tools/kvm/hw/serial.c b/tools/kvm/hw/serial.c
index 931067f..a71e68d 100644
--- a/tools/kvm/hw/serial.c
+++ b/tools/kvm/hw/serial.c
@@ -260,6 +260,7 @@ static bool serial8250_out(struct ioport *ioport, struct kvm *kvm, u16 port,
                         dev->lsr &= ~UART_LSR_TEMT;
                         if (dev->txcnt == FIFO_LEN / 2)
                                 dev->lsr &= ~UART_LSR_THRE;
+                        serial8250_flush_tx(kvm, dev);
                 } else {
                         /* Should never happpen */
                         dev->lsr &= ~(UART_LSR_TEMT | UART_LSR_THRE);
--->8---

I guess it's a shame that we'll be printing each character (admittedly the rate will always be relatively low...) rather than flushing the buffer in a batch. Without a timer, though, I'm not sure I see a better option - every N chars doesn't seem like a good one to me.

If you think that looks about right then I'll fold that in to the patch series, probably also removing the call to serial8250_flush_tx() in serial8250__receive.

Thanks,
Jonny
Re: [PATCH v2 2/3] kvm tools: remove periodic tick in favour of a polling thread
On Wed, Sep 4, 2013 at 8:40 PM, Jonathan Austin <jonathan.aus...@arm.com> wrote:
> 'top' works on ARM with virtio console. I've just done some new testing and with the serial console emulation and I see the same as you're reporting.
>
> Previously with the 8250 emulation I'd booted to a prompt but didn't actually test top...
>
> I'm looking in to fixing this now... Looks like I need to find the right place from which to call serial8250_flush_tx now that it isn't getting called every tick.
>
> I've done the following and it works fixes 'top' with serial8250:
> ---8<---
> diff --git a/tools/kvm/hw/serial.c b/tools/kvm/hw/serial.c
> index 931067f..a71e68d 100644
> --- a/tools/kvm/hw/serial.c
> +++ b/tools/kvm/hw/serial.c
> @@ -260,6 +260,7 @@ static bool serial8250_out(struct ioport *ioport, struct kvm *kvm, u16 port,
>                          dev->lsr &= ~UART_LSR_TEMT;
>                          if (dev->txcnt == FIFO_LEN / 2)
>                                  dev->lsr &= ~UART_LSR_THRE;
> +                        serial8250_flush_tx(kvm, dev);
>                  } else {
>                          /* Should never happpen */
>                          dev->lsr &= ~(UART_LSR_TEMT | UART_LSR_THRE);
> --->8---
>
> I guess it's a shame that we'll be printing each character (admittedly the rate will always be relatively low...) rather than flushing the buffer in a batch. Without a timer, though, I'm not sure I see a better option - every N chars doesn't seem like a good one to me.
>
> If you think that looks about right then I'll fold that in to the patch series, probably also removing the call to serial8250_flush_tx() in serial8250__receive.

Yeah, looks good to me and makes top work again.

Pekka
Re: [PATCH v2 2/3] kvm tools: remove periodic tick in favour of a polling thread
On 09/04/2013 01:48 PM, Pekka Enberg wrote:
> On Wed, Sep 4, 2013 at 8:40 PM, Jonathan Austin <jonathan.aus...@arm.com> wrote:
>> 'top' works on ARM with virtio console. I've just done some new testing and with the serial console emulation and I see the same as you're reporting.
>>
>> Previously with the 8250 emulation I'd booted to a prompt but didn't actually test top...
>>
>> I'm looking in to fixing this now... Looks like I need to find the right place from which to call serial8250_flush_tx now that it isn't getting called every tick.
>>
>> I've done the following and it works fixes 'top' with serial8250:
>> ---8<---
>> diff --git a/tools/kvm/hw/serial.c b/tools/kvm/hw/serial.c
>> index 931067f..a71e68d 100644
>> --- a/tools/kvm/hw/serial.c
>> +++ b/tools/kvm/hw/serial.c
>> @@ -260,6 +260,7 @@ static bool serial8250_out(struct ioport *ioport, struct kvm *kvm, u16 port,
>>                          dev->lsr &= ~UART_LSR_TEMT;
>>                          if (dev->txcnt == FIFO_LEN / 2)
>>                                  dev->lsr &= ~UART_LSR_THRE;
>> +                        serial8250_flush_tx(kvm, dev);
>>                  } else {
>>                          /* Should never happpen */
>>                          dev->lsr &= ~(UART_LSR_TEMT | UART_LSR_THRE);
>> --->8---
>>
>> I guess it's a shame that we'll be printing each character (admittedly the rate will always be relatively low...) rather than flushing the buffer in a batch. Without a timer, though, I'm not sure I see a better option - every N chars doesn't seem like a good one to me.
>>
>> If you think that looks about right then I'll fold that in to the patch series, probably also removing the call to serial8250_flush_tx() in serial8250__receive.
>
> Yeah, looks good to me and makes top work again.

We might want to make sure performance isn't hit with stuff that's intensive on the serial console.

Thanks,
Sasha
Re: [GIT PULL] KVM changes for 3.12
On 09/04/2013 04:38 AM, Gleb Natapov wrote:
> Copying Marek, Aneesh and Alex since this came through PPC kvm tree.
>
> On Wed, Sep 04, 2013 at 12:18:28PM +0200, Thierry Reding wrote:
>> On Tue, Sep 03, 2013 at 03:10:46PM +0300, Gleb Natapov wrote:
>> [...]
>>> Aneesh Kumar K.V (5):
>>>       mm/cma: Move dma contiguous changes into a seperate config
>>
>> Hi Gleb,
>>
>> This commit is going to cause runtime regressions on various ARM platforms because it renames a symbol but fails to update all default configurations that select the symbol. A quick grep shows that three ARM platforms are affected:
>>
>>   $ git grep 'CONFIG_CMA=y'
>>   arch/arm/configs/keystone_defconfig:CONFIG_CMA=y
>>   arch/arm/configs/omap2plus_defconfig:CONFIG_CMA=y
>>   arch/arm/configs/tegra_defconfig:CONFIG_CMA=y
>>
>> I've been digging around a bit and it seems like the original patch from Aneesh had the defconfig changes but they were dropped because they "... require separate handling to avoid pointless merge conflicts."[0]
>
> Marek, that's your words. What do you think about ARM problem?
>
>> While I can't speak for Keystone or OMAP, at least on Tegra this causes issues because we use CMA for framebuffer allocation. Since we only have CMA selected but not the new DMA_CMA, large DMA allocations will fail.
>
> Make config suppose to ask you about new option though, does it?

"make oldconfig" quite possibly might, but "make tegra_defconfig" doesn't, and "make tegra_defconfig; make zImage" is a workflow that has historically generated a perfectly working kernel for Tegra, and hence people use that flow.
[PATCH 1/2] kvm: free resources after canceling async_pf
When we cancel 'async_pf_execute()', we should behave as if the work was
never scheduled in 'kvm_setup_async_pf()'.
Fixes a bug when we can't unload module because the vm wasn't destroyed.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 virt/kvm/async_pf.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index b44cea0..f30aa1c 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -102,8 +102,11 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
                                    typeof(*work), queue);
                 cancel_work_sync(&work->work);
                 list_del(&work->queue);
-                if (!work->done) /* work was canceled */
+                if (!work->done) { /* work was canceled */
+                        mmdrop(work->mm);
+                        kvm_put_kvm(vcpu->kvm); /* == work->vcpu->kvm */
                         kmem_cache_free(async_pf_cache, work);
+                }
         }
 
         spin_lock(&vcpu->async_pf.lock);
-- 
1.8.3.1
[PATCH 0/2] kvm: fix a bug and remove a redundancy in async_pf
I did not reproduce the bug fixed in [1/2], but there are not that many
reasons why we could not unload a module, so the spot is quite obvious.

Radim Krčmář (2):
  kvm: free resources after canceling async_pf
  kvm: remove .done from struct kvm_async_pf

 include/linux/kvm_host.h | 1 -
 virt/kvm/async_pf.c      | 8 ++++----
 2 files changed, 4 insertions(+), 5 deletions(-)

-- 
1.8.3.1
[PATCH 2/2] kvm: remove .done from struct kvm_async_pf
'.done' is used to mark the completion of 'async_pf_execute()', but
'cancel_work_sync()' returns true when the work was canceled, so we
use it instead.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 include/linux/kvm_host.h | 1 -
 virt/kvm/async_pf.c      | 5 +----
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ca645a0..c7a5e08 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -190,7 +190,6 @@ struct kvm_async_pf {
         unsigned long addr;
         struct kvm_arch_async_pf arch;
         struct page *page;
-        bool done;
 };
 
 void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index f30aa1c..89acf41 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -76,7 +76,6 @@ static void async_pf_execute(struct work_struct *work)
         spin_lock(&vcpu->async_pf.lock);
         list_add_tail(&apf->link, &vcpu->async_pf.done);
         apf->page = page;
-        apf->done = true;
         spin_unlock(&vcpu->async_pf.lock);
 
         /*
@@ -100,9 +99,8 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
                 struct kvm_async_pf *work =
                         list_entry(vcpu->async_pf.queue.next,
                                    typeof(*work), queue);
-                cancel_work_sync(&work->work);
                 list_del(&work->queue);
-                if (!work->done) { /* work was canceled */
+                if (cancel_work_sync(&work->work)) {
                         mmdrop(work->mm);
                         kvm_put_kvm(vcpu->kvm); /* == work->vcpu->kvm */
                         kmem_cache_free(async_pf_cache, work);
@@ -167,7 +165,6 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
                 return 0;
 
         work->page = NULL;
-        work->done = false;
         work->vcpu = vcpu;
         work->gva = gva;
         work->addr = gfn_to_hva(vcpu->kvm, gfn);
-- 
1.8.3.1
[Bug 60850] New: BUG: Bad page state in process libvirtd pfn:76000
https://bugzilla.kernel.org/show_bug.cgi?id=60850

Bug ID: 60850
Summary: BUG: Bad page state in process libvirtd pfn:76000
Product: Virtualization
Version: unspecified
Kernel Version: 3.11
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: kvm
Assignee: virtualization_...@kernel-bugs.osdl.org
Reporter: alexande...@gmail.com
Regression: No

Created attachment 107419
  --> https://bugzilla.kernel.org/attachment.cgi?id=107419&action=edit
The part of dmesg (3.11)

This bug reproduces on kernel 3.11 and earlier.

Steps to reproduce:
1. Add intel_iommu=on to the kernel boot cmdline
2. Detach some NIC from the host, for example:
   # virsh nodedev-dettach pci__02_00_1

I attached the dmesg of kernel 3.11. More info is available in https://bugs.gentoo.org/show_bug.cgi?id=477258 and in https://bugs.launchpad.net/ubuntu/+source/ipxe/+bug/1181777, post #23.
[PATCH kvm-unit-tests] realmode: test RETF imm
Signed-off-by: Bruce Rogers <brog...@suse.com>
---
 x86/realmode.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/x86/realmode.c b/x86/realmode.c
index 3546771..c57e033 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -481,6 +481,9 @@ void test_io(void)
 asm ("retf: lretw");
 extern void retf();
 
+asm ("retf_imm: lretw $10");
+extern void retf_imm();
+
 void test_call(void)
 {
         u32 esp[16];
@@ -503,6 +506,7 @@ void test_call(void)
         MK_INSN(call_far1, "lcallw *(%ebx)\n\t");
         MK_INSN(call_far2, "lcallw $0, $retf\n\t");
         MK_INSN(ret_imm,"sub $10, %sp; jmp 2f; 1: retw $10; 2: callw 1b");
+        MK_INSN(retf_imm, "sub $10, %sp; lcallw $0, $retf_imm");
 
         exec_in_big_real_mode(&insn_call1);
         report("call 1", R_AX, outregs.eax == 0x1234);
@@ -523,6 +527,9 @@ void test_call(void)
 
         exec_in_big_real_mode(&insn_ret_imm);
         report("ret imm 1", 0, 1);
+
+        exec_in_big_real_mode(&insn_retf_imm);
+        report("retf imm 1", 0, 1);
 }
 
 void test_jcc_short(void)
-- 
1.7.7
OpenBSD 5.3 guest on KVM
Hi all!

I recently tested OpenBSD 5.3 and was pleasantly surprised to notice that it implements VirtIO for block devices, network and memory ballooning. It is an important step for those who contribute to the project.

What I'm seeing now is some sort of problem with shutting down the VM through ACPI. I remember that at one time this was not working, then it was corrected, and now there seem to be new problems with these messages. I tried turning off the VM from libvirt (virsh) and also from the QEMU monitor after booting the VM manually, and in either case the result is the same: the VM freezes.

# sysctl hw
hw.machine=amd64
hw.model=QEMU Virtual CPU version 1.1.2
hw.ncpu=1
hw.byteorder=1234
hw.pagesize=4096
hw.disknames=cd0:,sd0:be0e0f1c0cdc4dae,fd0:,fd1:
hw.diskcount=4
hw.cpuspeed=2009
hw.vendor=Bochs
hw.product=Bochs
hw.uuid=501ef229-2337-165f-8da3-905b12832049
hw.physmem=535814144
hw.usermem=535801856
hw.ncpufound=1
hw.allowpowerdown=1

hw.allowpowerdown set to 1 (the default) allows a power button shutdown.

Has anyone had this problem and managed to solve it? Is there any debug information I can provide to help solve this?

Thanks in advance for your reply.

Regards,
Daniel
-- 
Fingerprint: BFB3 08D6 B4D1 31B2 72B9 29CE 6696 BF1B 14E6 1D37
Powered by Debian GNU/Linux - Linux user #188.598
Re: [GIT PULL] KVM changes for 3.12
On Tue, Sep 3, 2013 at 5:10 AM, Gleb Natapov <g...@redhat.com> wrote:
>
> This pull request adds tlb_gather_mmu() caller in S390 code, but 2b047252 in your tree added another parameter to the function, so the patch bellow have to be applied during merge to resolve the conflicts. The patch was used in linux-next for awhile.

Hmm. Fine. Except:

>         /* Reallocate the page tables with pgstes */
>         mm->context.has_pgste = 1;
> -       tlb_gather_mmu(&tlb, mm, 0);
> +       tlb_gather_mmu(&tlb, mm, 0, TASK_SIZE);
>         page_table_realloc(&tlb, mm, 0, TASK_SIZE);
>         tlb_finish_mmu(&tlb, 0, -1);
>         up_write(&mm->mmap_sem);

Realistically, the begin/end arguments to tlb_gather_mmu() and tlb_finish_mmu() should match. In fact, I considered getting rid of the ones to tlb_finish_mmu() because they are kind of pointless these days (but didn't, because I wanted to keep the patches minimal).

And in your case they don't. Which implies a certain amount of confusion.

It looks like it's not really a full-mm invalidate (it's not the final TLB flush before getting rid of the VM), so I think 0, TASK_SIZE is correct. I just think I'm going to also change that tlb_finish_mmu() to have the same 0, TASK_SIZE range, so that it's all consistent.

It appears that s390 doesn't actually care about the range to tlb_finish_mmu(), so this is pretty academic, but I thought I'd mention it so that it doesn't come as a surprise that my merge resolution looks different from your suggested one.

Linus
Re: [PATCH V3 4/6] vhost_net: determine whether or not to use zerocopy at one time
On 09/04/2013 07:59 PM, Michael S. Tsirkin wrote:
> On Mon, Sep 02, 2013 at 04:40:59PM +0800, Jason Wang wrote:
>> Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if upend_idx != done_idx we still set zcopy_used to true and rollback this choice later. This could be avoided by determining zerocopy once by checking all conditions at one time before.
>>
>> Signed-off-by: Jason Wang <jasow...@redhat.com>
>> ---
>>  drivers/vhost/net.c |   47 ++++++++++++++++++++---------------------------
>>  1 files changed, 20 insertions(+), 27 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 8a6dd0d..3f89dea 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -404,43 +404,36 @@ static void handle_tx(struct vhost_net *net)
>>                                 iov_length(nvq->hdr, s), hdr_size);
>>                          break;
>>                  }
>> -                zcopy_used = zcopy && (len >= VHOST_GOODCOPY_LEN ||
>> -                                       nvq->upend_idx != nvq->done_idx);
>> +
>> +                zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
>> +                                   && (nvq->upend_idx + 1) % UIO_MAXIOV !=
>> +                                      nvq->done_idx
>
> Thinking about this, this looks strange. The original idea was that once we start doing zcopy, we keep using the heads ring even for short packets until no zcopy is outstanding.

What's the reason for keep using the heads ring?

> What's the logic behind (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx here?

Because we initialize both upend_idx and done_idx to zero, so upend_idx != done_idx could not be used to check whether or not the heads ring were full.

>> +                                   && vhost_net_tx_select_zcopy(net);
>>
>>                  /* use msg_control to pass vhost zerocopy ubuf info to skb */
>>                  if (zcopy_used) {
>> +                        struct ubuf_info *ubuf;
>> +                        ubuf = nvq->ubuf_info + nvq->upend_idx;
>> +
>>                          vq->heads[nvq->upend_idx].id = head;
>> -                        if (!vhost_net_tx_select_zcopy(net) ||
>> -                            len < VHOST_GOODCOPY_LEN) {
>> -                                /* copy don't need to wait for DMA done */
>> -                                vq->heads[nvq->upend_idx].len =
>> -                                                        VHOST_DMA_DONE_LEN;
>> -                                msg.msg_control = NULL;
>> -                                msg.msg_controllen = 0;
>> -                                ubufs = NULL;
>> -                        } else {
>> -                                struct ubuf_info *ubuf;
>> -                                ubuf = nvq->ubuf_info + nvq->upend_idx;
>> -
>> -                                vq->heads[nvq->upend_idx].len =
>> -                                        VHOST_DMA_IN_PROGRESS;
>> -                                ubuf->callback = vhost_zerocopy_callback;
>> -                                ubuf->ctx = nvq->ubufs;
>> -                                ubuf->desc = nvq->upend_idx;
>> -                                msg.msg_control = ubuf;
>> -                                msg.msg_controllen = sizeof(ubuf);
>> -                                ubufs = nvq->ubufs;
>> -                                kref_get(&ubufs->kref);
>> -                        }
>> +                        vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
>> +                        ubuf->callback = vhost_zerocopy_callback;
>> +                        ubuf->ctx = nvq->ubufs;
>> +                        ubuf->desc = nvq->upend_idx;
>> +                        msg.msg_control = ubuf;
>> +                        msg.msg_controllen = sizeof(ubuf);
>> +                        ubufs = nvq->ubufs;
>> +                        kref_get(&ubufs->kref);
>>                          nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
>> -                } else
>> +                } else {
>>                          msg.msg_control = NULL;
>> +                        ubufs = NULL;
>> +                }
>>                  /* TODO: Check specific error and bomb out unless ENOBUFS? */
>>                  err = sock->ops->sendmsg(NULL, sock, &msg, len);
>>                  if (unlikely(err < 0)) {
>>                          if (zcopy_used) {
>> -                                if (ubufs)
>> -                                        vhost_net_ubuf_put(ubufs);
>> +                                vhost_net_ubuf_put(ubufs);
>>                                  nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
>>                                          % UIO_MAXIOV;
>>                          }
>> --
>> 1.7.1
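The index arithmetic discussed in this reply can be tried out in a small userspace model. This is only a sketch under assumptions: RING_SIZE stands in for UIO_MAXIOV and is kept tiny, and struct ring, ring_empty(), ring_has_room(), ring_push() and ring_complete() are hypothetical names, not vhost code. It shows why "upend_idx != done_idx" only distinguishes empty from non-empty when both indices start at zero, while "(upend_idx + 1) % size != done_idx" tells whether one more slot can be used.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8 /* stand-in for UIO_MAXIOV, kept small for illustration */

/* Hypothetical model of the two indices; both start at zero. */
struct ring {
    unsigned int upend_idx; /* next slot to hand out */
    unsigned int done_idx;  /* oldest slot not yet completed */
};

/* Equal indices mean nothing is outstanding. */
bool ring_empty(const struct ring *r)
{
    return r->upend_idx == r->done_idx;
}

/* One slot is always left unused so that "full" differs from "empty". */
bool ring_has_room(const struct ring *r)
{
    return (r->upend_idx + 1) % RING_SIZE != r->done_idx;
}

/* Take one slot if the ring is not full; returns false when it is full. */
bool ring_push(struct ring *r)
{
    if (!ring_has_room(r))
        return false;
    r->upend_idx = (r->upend_idx + 1) % RING_SIZE;
    return true;
}

/* Retire the oldest outstanding slot. */
void ring_complete(struct ring *r)
{
    assert(!ring_empty(r));
    r->done_idx = (r->done_idx + 1) % RING_SIZE;
}
```

In this model a ring of RING_SIZE slots accepts RING_SIZE - 1 pushes before ring_has_room() turns false, and at that point upend_idx still differs from done_idx, so the plain inequality cannot serve as a full check.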
Re: [Qemu-devel] [PATCH] linux-headers: update to 3.11
On 09/04/2013 01:35 AM, Paolo Bonzini wrote:
> Il 03/09/2013 17:28, Alexey Kardashevskiy ha scritto:
>> On 09/03/2013 08:42 PM, Jan Kiszka wrote:
>>> On 2013-09-03 11:32, Alexey Kardashevskiy wrote:
>>>> On 09/03/2013 07:29 PM, Peter Maydell wrote:
>>>>> On 3 September 2013 09:27, Alexey Kardashevskiy <a...@ozlabs.ru> wrote:
>>>>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>>>>>> ---
>>>>>>
>>>>>> I need this update as VFIO on PPC64/pseries got in upstream kernel and this is required by VFIO-SPAPR bits in QEMU.
>>>>>>
>>>>>> Others may find this update useful too :)
>>>>>>
>>>>>> ---
>>>>>>  linux-headers/asm-arm64/kvm.h       | 168
>>>>>>  linux-headers/asm-arm64/kvm_para.h  |   1 +
>>>>>>  linux-headers/asm-mips/kvm.h        |  81
>>>>>>  linux-headers/linux/kvm.h           |   3
>>>>>>  linux-headers/linux/vfio.h          |  42
>>>>>>  linux-headers/linux/virtio_config.h |   3
>>>>>>  6 files changed, 254 insertions(+), 44 deletions(-)
>>>>>>  create mode 100644 linux-headers/asm-arm64/kvm.h
>>>>>>  create mode 100644 linux-headers/asm-arm64/kvm_para.h
>>>>>
>>>>> I think this should go in via the KVM tree, not trivial.
>>>>
>>>> I do not mind, it just went through the trivial tree last time, that's it.
>>>
>>> This shouldn't be routed through trivial in general as things broke too often in this area.
>>
>> Sorry for my ignorance, but this is The Kernel, it is already there, broken or not, even if it is broken, qemu cannot stay isolated, no? This is a mechanical change, no more.
>
> It's a matter of keeping things bisectable. If we can detect a breakage, we can first work around it, and then apply the header update. And if we don't detect it, maintainers usually send pull requests when they have time to work on breakage caused by their patches.

I can see the discussion but I do not see if anyone is going to pull this through any tree. Please, somebody, pull. Thanks.

-- 
Alexey
Re: [PATCH v2 net-next] pkt_sched: fq: Fair Queue packet scheduler
On 09/04/2013 07:59 PM, Daniel Borkmann wrote:
> On 09/04/2013 01:27 PM, Eric Dumazet wrote:
>> On Wed, 2013-09-04 at 03:30 -0700, Eric Dumazet wrote:
>>> On Wed, 2013-09-04 at 14:30 +0800, Jason Wang wrote:
>>>>> And tcpdump would certainly help ;)
>>>>
>>>> See attachment.
>>>
>>> Nothing obvious on tcpdump (only that lot of frames are missing)
>>>
>>> 1) Are you capturing part of the payload only (like tcpdump -s 128)
>>> 2) What is the setup.
>>> 3) tc -s -d qdisc
>>
>> If you use FQ in the guest, then it could be that high resolution timers have high latency ?
>
> Probably they internally switch to a lower resolution clock event source if there's no hardware support available:
>
>   The [source event] management layer provides interfaces for hrtimers to implement high resolution timers [...] [and it] supports these more advanced functions only when appropriate clock event sources have been registered, otherwise the traditional periodic tick based behaviour is retained. [1]
>
> [1] https://www.kernel.org/doc/ols/2006/ols2006v1-pages-333-346.pdf

Maybe. AFAIK, kvm-clock does not provide a clock event; only a pv clocksource is provided.
Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling
On Tue, 2013-09-03 at 13:53 +0300, Gleb Natapov wrote:
>> Or supporting all IOMMU links (and leaving emulated stuff as is) in on device is the last thing I have to do and then you'll ack the patch?
>
> I am concerned more about API here. Internal implementation details I leave to powerpc experts :)

So Gleb, I want to step in for a bit here.

While I understand that the new KVM device API is all nice and shiny and that this whole thing should probably have been KVM devices in the first place (had they existed or had we been told back then), the point is, the API for handling HW IOMMUs that Alexey is trying to add is an extension of an existing mechanism used for emulated IOMMUs.

The internal data structure is shared, and fundamentally, by forcing him to use that new KVM device for the new stuff, we create an oddball API with an ioctl for one type of iommu and a KVM device for the other, which makes the implementation a complete mess in the kernel (and you should care :-)

So for something completely new, I would tend to agree with you. However, I still think that for this specific case, we should just plonk-in the original ioctl proposed by Alexey and be done with it.

Cheers,
Ben.
Re: [GIT PULL] KVM changes for 3.12
On Wed, Sep 04, 2013 at 06:08:08PM -0700, Linus Torvalds wrote:
> On Tue, Sep 3, 2013 at 5:10 AM, Gleb Natapov <g...@redhat.com> wrote:
>>
>> This pull request adds tlb_gather_mmu() caller in S390 code, but 2b047252 in your tree added another parameter to the function, so the patch bellow have to be applied during merge to resolve the conflicts. The patch was used in linux-next for awhile.
>
> Hmm. Fine. Except:
>
>>         /* Reallocate the page tables with pgstes */
>>         mm->context.has_pgste = 1;
>> -       tlb_gather_mmu(&tlb, mm, 0);
>> +       tlb_gather_mmu(&tlb, mm, 0, TASK_SIZE);
>>         page_table_realloc(&tlb, mm, 0, TASK_SIZE);
>>         tlb_finish_mmu(&tlb, 0, -1);
>>         up_write(&mm->mmap_sem);
>
> Realistically, the begin/end arguments to tlb_gather_mmu() and tlb_finish_mmu() should match. In fact, I considered getting rid of the ones to tlb_finish_mmu() because they are kind of pointless these days (but didn't, because I wanted to keep the patches minimal).
>
> And in your case they don't. Which implies a certain amount of confusion.

Actually they do match in our internal version of the merge conflict. It was just a copy-paste error from me when sending the merge resolution patch. Since the fix contained two changed lines within the same hunk it was hard to get right.. oh well.. :)

Thanks for fixing it!
Re: [PATCH] KVM: PPC: fix couple of memory leaks in MPIC/XICS devices
On 01.09.2013, at 14:53, Gleb Natapov wrote:
> XICS failed to free xics structure on error path. MPIC destroy handler forgot to delete kvm_device structure.
>
> Signed-off-by: Gleb Natapov <g...@redhat.com>

Paul, please ack :).

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: fix couple of memory leaks in MPIC/XICS devices
On Sun, Sep 01, 2013 at 03:53:46PM +0300, Gleb Natapov wrote:
> XICS failed to free xics structure on error path. MPIC destroy handler forgot to delete kvm_device structure.
>
> Signed-off-by: Gleb Natapov <g...@redhat.com>

Acked-by: Paul Mackerras <pau...@samba.org>