RE: [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's -numa option
Thanks Eduardo!

Hi Anthony, if you are OK with this patch, could you please pull these changes
into upstream, or suggest whom I should talk to to get these changes in?

Thanks!
Vinod

-----Original Message-----
From: Eduardo Habkost [mailto:ehabk...@redhat.com]
Sent: Wednesday, July 18, 2012 10:15 AM
To: Vinod, Chegu
Cc: qemu-de...@nongnu.org; aligu...@us.ibm.com; kvm@vger.kernel.org
Subject: Re: [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's -numa option

On Mon, Jul 16, 2012 at 09:31:30PM -0700, Chegu Vinod wrote:
> Changes since v3:
>    - using bitmap_set() instead of set_bit() in the numa_add() routine.
>    - removed the call to bitmap_zero() since bitmap_new() also zeros the bitmap.
>    - Rebased to the latest qemu.

Tested-by: Eduardo Habkost
Reviewed-by: Eduardo Habkost

> Changes since v2:
>    - Using "unsigned long *" for the node_cpumask[].
>    - Use bitmap_new() instead of g_malloc0() for allocation.
>    - Don't rely on "max_cpus" since it may not be initialized
>      before the numa related qemu options are parsed & processed.
>
> Note: Continuing to use a new constant for allocation of
>       the mask (this constant is currently set to 255 since,
>       with an 8-bit APIC ID, VCPUs can range from 0-254 in a
>       guest; the APIC ID 255 (0xFF) is reserved for broadcast).
>
> Changes since v1:
>    - Use bitmap functions that are already in qemu (instead
>      of the cpu_set_t macros from sched.h)
>    - Added a check for endvalue >= max_cpus.
>    - Fix to address the round-robin assignment when
>      cpus are not explicitly specified.
> ---
>
> v1:
> --
>
> The -numa option to qemu is used to create [fake] numa nodes and
> expose them to the guest OS instance.
>
> There are a couple of issues with the -numa option:
>
> a) The max VCPUs that can be specified for a guest while using
>    qemu's -numa option is 64. Due to a typecasting issue, when
>    the number of VCPUs is > 32 the VCPUs don't show up under
>    the specified [fake] numa nodes.
>
> b) KVM currently has support for 160 VCPUs per guest. qemu's
>    -numa option only supports up to 64 VCPUs per guest.
>
> This patch addresses these two issues.
>
> Below are examples of (a) and (b):
>
> a) > 32 VCPUs specified with the -numa option:
>
> /usr/local/bin/qemu-system-x86_64 \
>  -enable-kvm \
>  ... \
>  -net tap,ifname=tap0,script=no,downscript=no \
>  -vnc :4
> ...
>
> Upstream qemu:
> --------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 32 33 34 35 36 37 38 39 40 41
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51
> node 1 size: 131072 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59
> node 2 size: 131072 MB
> node 3 cpus: 30
> node 3 size: 131072 MB
> node 4 cpus:
> node 4 size: 131072 MB
> node 5 cpus: 31
> node 5 size: 131072 MB
>
> With the patch applied:
> -----------------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19
> node 1 size: 131072 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29
> node 2 size: 131072 MB
> node 3 cpus: 30 31 32 33 34 35 36 37 38 39
> node 3 size: 131072 MB
> node 4 cpus: 40 41 42 43 44 45 46 47 48 49
> node 4 size: 131072 MB
> node 5 cpus: 50 51 52 53 54 55 56 57 58 59
> node 5 size: 131072 MB
>
> b) > 64 VCPUs specified with the -numa option:
>
> /usr/local/bin/qemu-system-x86_64 \
>  -enable-kvm \
>  -cpu Westmere,+rdtscp,+pdpe1gb,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,... \
>  -vnc :4
> ...
>
> Upstream qemu:
> --------------
>
> only 63 CPUs in NUMA mode supported.
> only 64 CPUs in NUMA mode supported.
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 6 7 8 9 38 39 40 41 70 71 72 73
> node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51 74 75 76 77 78 79
> node 1 size: 65536 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59 60 61
> node 2 size: 65536 MB
> node 3 cpus: 30 62
> node 3 size: 65536 MB
> node 4 cpus:
> node 4 size: 65536 MB
> node 5 cpus:
> node 5 size: 65536 MB
> node 6 cpus: 31 63
> node 6 size: 65536 MB
> node 7 cpus: 0 1 2 3 4 5 32 33 34 35 36 37 64 65 66 67 68 69
> node 7 size: 65
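For readers catching up on why a fixed-width mask breaks here: the pre-patch code (visible as removed lines in the v2 review later in this digest) built each node's CPU mask in a single 64-bit integer as (2ULL << endvalue) - (1ULL << value). A small shell sketch of that arithmetic, with illustrative values:

```shell
# Old fixed-width node CPU mask: cpus=10-19 becomes (2 << 19) - (1 << 10).
start=10
end=19
mask=$(( (2 << end) - (1 << start) ))   # shell arithmetic here is 64-bit
printf 'mask for cpus %d-%d: %#x\n' "$start" "$end" "$mask"
# prints: mask for cpus 10-19: 0xffc00
# A 64-bit mask cannot represent VCPU 64 and above at all (issue b), and
# narrower int shifts in the original C misbehaved past VCPU 31
# (presumably the "typecasting issue" of a) -- hence the patch's move to a
# dynamically sized bitmap via bitmap_new()/bitmap_set().
```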
RE: [Qemu-devel] KVM call agenda for Tuesday, June 19th
-----Original Message-----
From: Dor Laor [mailto:dl...@redhat.com]
Sent: Wednesday, July 11, 2012 2:59 AM
To: Vinod, Chegu
Cc: kvm@vger.kernel.org
Subject: Re: [Qemu-devel] KVM call agenda for Tuesday, June 19th

On 06/19/2012 06:42 PM, Chegu Vinod wrote:
> Hello,
>
> Wanted to share some preliminary data from live migration experiments
> on a setup that is perhaps one of the larger ones.
>
> We used Juan's "huge_memory" patches (without the separate migration
> thread) and measured the total migration time and the time taken for
> stage 3 ("downtime").
> Note: we didn't change the default "downtime" (30ms?). We had a
> private 10Gig back-to-back link between the two hosts, and we set the
> migration speed to 10Gig.
>
> The "workloads" chosen were ones that we could easily set up. All
> experiments were done without using virsh/virt-manager (i.e. direct
> interaction with the qemu monitor prompt). Please see the data below.
>
> As the guest size increased (and for busier workloads) we observed
> that network connections were getting dropped, not only during the
> "downtime" (i.e. stage 3) but also at times during the iterative
> pre-copy phase (i.e. stage 2). Perhaps some of this will get fixed
> when we have the migration thread implemented.
>
> We had also briefly tried the proposed delta compression changes
> (easier to say than XBZRLE :)) on a smaller configuration. For the
> simple workloads (perhaps there was not much temporal locality in
> them) it didn't seem to show improvements and instead took a much
> longer time to migrate (high cache miss penalty?). Waiting for the
> updated version of XBZRLE for further experiments, to see how well it
> scales on this larger setup...
> FYI
> Vinod
>
> ---------------
> 10VCPUs/128G
> ---------------
>
> 1) Idle guest
>    Total migration time : 124585 ms,
>    Stage_3_time : 941 ms,
>    Total MB transferred : 2720
>
> 2) AIM7-compute (2000 users)
>    Total migration time : 123540 ms,
>    Stage_3_time : 726 ms,
>    Total MB transferred : 3580
>
> 3) SpecJBB (modified to run 10 warehouse threads for a long duration of time)
>    Total migration time : 165720 ms,
>    Stage_3_time : 6851 ms,
>    Total MB transferred : 19656

6.8s downtime may be unacceptable for some applications. Does it
converge with a maximum downtime of 1 sec? In theory this is where post
copy can shine.

But what we're missing in the (good) performance data is how the
application performs during live migration. This is exactly where the
live migration thread and dirty-bit optimization should help us.

Our 'friends' have nice old analyses of live migration performance:
- http://www.cl.cam.ac.uk/research/srg/netos/papers/2005-migration-nsdi-pre.pdf
- http://www.vmware.com/files/pdf/techpaper/VMW_Netioc_BestPractices.pdf

Cheers,
Dor

There have been some recent fixes (from Juan) that are supposed to honor
the user-requested downtime. I am in the middle of redoing some of my
experiments and will share them when they are ready (in about 3-4 days).
Initial observations are that the total migration time increases
considerably, but there are no observed stalls or ping timeouts etc.
Will know more after I finish my experiments (i.e. the non-XBZRLE ones).

As expected, the 10G [back-to-back] connection is not really getting
saturated with the migration traffic, so there is some other layer that
is consuming time (possibly the overhead of tracking dirty pages). I
haven't yet had the time to try to quantify the performance degradation
of the workload during the live migration (stage 2)... need to look at
that next.

Thanks for the pointers to the old articles.
Thanks
Vinod

> 4) Google SAT (-s 3600 -C 5 -i 5)
>    Total migration time : 411827 ms,
>    Stage_3_time : 77807 ms,
>    Total MB transferred : 142136
>
> ---------------
> 20VCPUs/256G
> ---------------
>
> 1) Idle guest
>    Total migration time : 259938 ms,
>    Stage_3_time : 1998 ms,
>    Total MB transferred : 5114
>
> 2) AIM7-compute (2000 users)
>    Total migration time : 261336 ms,
>    Stage_3_time : 2107 ms,
>    Total MB transferred : 5473
>
> 3) SpecJBB (modified to run 20 warehouse threads for a long duration of time)
>    Total migration time : 390548 ms,
>    Stage_3_time : 19596 ms,
>    Total MB transferred : 48109
>
> 4) Google SAT (-s 3600 -C 10 -i 10)
>    Total migration time : 780150 ms,
>    Stage_3_time : 90346 ms,
>    Total MB transferred : 251287
>
> ---------------
> 30VCPUs/384G
> ---------------
>
> 1) Idle guest
>    (qemu) Total migration time : 501704 ms,
>    Stage_3_time : 2835 ms,
>    Total MB transferred : 15731
>
> 2) AIM7-compute (2000 users)
>    Total migration time : 496001 ms,
>    Stage_
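A quick back-of-the-envelope check, using only the 10VCPUs/128G idle-guest numbers reported in this thread, supports the observation that the 10G link is far from saturated:

```shell
# Effective migration throughput: 2720 MB moved in 124585 ms.
mb=2720
ms=124585
rate=$(( mb * 1000 / ms ))
echo "approx ${rate} MB/s"    # prints: approx 21 MB/s
# ~21 MB/s is a tiny fraction of a 10GbE link (~1250 MB/s theoretical),
# consistent with the claim that the wire is not the bottleneck and that
# something else (e.g. dirty-page tracking) is consuming the time.
```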
RE: [PATCH] kvm: handle last_boosted_vcpu = 0 case
Hello,

I am just catching up on this email thread... Perhaps one of you may be
able to help answer this query, preferably along with some data.

[BTW, I do understand the basic intent behind PLE in a typical [sweet
spot] use case where there is oversubscription etc., and the need to
optimize the PLE handler in the host etc.]

In a use case where the host has fewer but much larger guests (say 40
VCPUs and higher) and there is no oversubscription (i.e. the # of vcpus
across guests <= physical cpus in the host, and perhaps each guest has
its vcpus pinned to specific physical cpus for other reasons), I would
like to understand if/how PLE really helps. For these use cases, would
it be OK to turn PLE off (ple_gap=0), since there is no real need to
take an exit and find some other VCPU to yield to?

Thanks
Vinod

-----Original Message-----
From: Raghavendra K T [mailto:raghavendra...@linux.vnet.ibm.com]
Sent: Thursday, June 28, 2012 9:22 AM
To: Andrew Jones
Cc: Rik van Riel; Marcelo Tosatti; Srikar; Srivatsa Vaddagiri; Peter Zijlstra; Nikunj A. Dadhania; KVM; LKML; Gleb Natapov; Vinod, Chegu; Jeremy Fitzhardinge; Avi Kivity; Ingo Molnar
Subject: Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case

On 06/28/2012 09:30 PM, Andrew Jones wrote:
> ----- Original Message -----
>> In summary, current PV has huge benefit on non-PLE machine.
>>
>> On PLE machine, the results become very sensitive to load, type of
>> workload and SPIN_THRESHOLD. Also PLE interference has significant
>> effect on them. But still it has slight edge over non PV.
>
> Hi Raghu,
>
> sorry for my slow response. I'm on vacation right now (until the 9th
> of July) and I have limited access to mail.

Ok. Happy Vacation :)

> Also, thanks for continuing the benchmarking. Question, when you
> compare PLE vs. non-PLE, are you using different machines (one with
> and one without), or are you disabling its use by loading the kvm
> module with the ple_gap=0 modparam as I did?
Yes, when I say "with PLE disabled" I am doing the same and comparing
the benchmarks (i.e. loading the kvm module with ple_gap=0). But the
older non-PLE results were on a different machine altogether (I had
limited access to the PLE machine).
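For reference, the ple_gap=0 approach discussed above is a module load-time setting; a minimal sketch for an Intel host (so the kvm_intel module -- requires root and that no guests are running while the module is reloaded):

```shell
# Reload kvm_intel with pause-loop exiting disabled entirely.
modprobe -r kvm_intel
modprobe kvm_intel ple_gap=0

# Confirm the parameter took effect (expect 0):
cat /sys/module/kvm_intel/parameters/ple_gap
```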
RE: [Qemu-devel] [PATCH v2] Fixes related to processing of qemu's -numa option
-----Original Message-----
From: Eduardo Habkost [mailto:ehabk...@redhat.com]
Sent: Monday, June 25, 2012 1:01 PM
To: Vinod, Chegu
Cc: Hada, Craig M; Hull, Jim; qemu-de...@nongnu.org; kvm@vger.kernel.org
Subject: Re: [Qemu-devel] [PATCH v2] Fixes related to processing of qemu's -numa option

Just found another issue:

On Wed, Jun 20, 2012 at 05:33:29PM -0300, Eduardo Habkost wrote:
[...]
> > @@ -970,27 +974,24 @@ static void numa_add(const char *optarg)
> >          }
> >          node_mem[nodenr] = sval;
> >      }
> > -    if (get_param_value(option, 128, "cpus", optarg) == 0) {
> > -        node_cpumask[nodenr] = 0;
> > -    } else {
> > +    if (get_param_value(option, 128, "cpus", optarg) != 0) {
> >          value = strtoull(option, &endptr, 10);
> > -        if (value >= 64) {
> > -            value = 63;
> > -            fprintf(stderr, "only 64 CPUs in NUMA mode supported.\n");
> > +        if (*endptr == '-') {
> > +            endvalue = strtoull(endptr+1, &endptr, 10);
> >          } else {
> > -            if (*endptr == '-') {
> > -                endvalue = strtoull(endptr+1, &endptr, 10);
> > -                if (endvalue >= 63) {
> > -                    endvalue = 62;
> > -                    fprintf(stderr,
> > -                            "only 63 CPUs in NUMA mode supported.\n");
> > -                }
> > -                value = (2ULL << endvalue) - (1ULL << value);
> > -            } else {
> > -                value = 1ULL << value;
> > -            }
> > +            endvalue = value;
> > +        }
> > +
> > +        if (endvalue >= max_cpus) {
> > +            endvalue = max_cpus - 1;
> > +            fprintf(stderr,
> > +                    "A max of %d CPUs are supported in a guest on this host\n",
> > +                    max_cpus);
> > +        }
>
> This makes the following command segfault:
>
>     $ qemu-system-x86_64 -numa 'node,cpus=1-3' -smp 100
>
> max_cpus may not be initialized yet at the call to numa_add(), as you
> are still parsing the command-line options.

Yes, I am aware of this issue too... and shall post another version of
the patch next week (along with another fix).

Thanks
Vinod

--
Eduardo

--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
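The clamping behavior of the added hunk quoted above can be mimicked in shell for illustration (the values below are made up; 160 matches the KVM per-guest VCPU limit mentioned in the v4 thread):

```shell
# Mimic the patch's range clamp: a cpus=150-200 request on a host where
# max_cpus is 160 is trimmed to 150-159, with the same warning text.
max_cpus=160
value=150
endvalue=200
if [ "$endvalue" -ge "$max_cpus" ]; then
    endvalue=$(( max_cpus - 1 ))
    echo "A max of $max_cpus CPUs are supported in a guest on this host"
fi
echo "cpus ${value}-${endvalue}"
# prints:
# A max of 160 CPUs are supported in a guest on this host
# cpus 150-159
```

The segfault Eduardo points out is orthogonal to this logic: the clamp is fine once max_cpus is known, the bug is that numa_add() can run before -smp has been parsed.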
performance of vhost-net over SRIOV+macvtap
Hello,

RHEL 6.2 had a technology evaluation version of vhost-net over SRIOV
using macvtap. Wondering if there were any preliminary performance
studies done (i.e. I/O performance in the guest and also host CPU cycle
consumption) between the following configurations:

1) vhost-net over SRIOV using macvtap (vs.) generic vhost-net
2) vhost-net over SRIOV using macvtap (vs.) direct assignment of the VF (SRIOV)

Thanks!
Vinod
vCPU hotplug
Dear All,

I am using version 1.0 of qemu-kvm along with 3.2.x host+guest kernels
on an x86_64 server.

I was unable to trigger a hotplug of a vCPU from qemu's monitor prompt
(tripped over an assertion in qemu). I tried to look through the recent
archives and noticed a couple of proposed fixes: one for this assertion
and another to trigger the actual SCI notification. Not sure if they
were official fixes or not... I tried it out with these 2 fixes.

I noticed that the cpuX showed up in the /sys/devices/system/cpu/
subdir. I was however not able to activate the newly hotplugged vCPU
via "echo 1 > cpuX/online". It looks like the vCPU isn't waking up in
the guest.

If there is any [proposed|official] fix available that I could try out
for this issue, please point me to the same.

Thanks in anticipation!
Vinod
Re: Unable to start a guest using qemu-kvm directly
Michael Tokarev <...@tls.msk.ru> writes:
>
> You need to understand how to (pre-)configure networking for qemu
> (pre- if you want to run it as non-root); this is described in the
> users guide and in a lot of howtos all around the 'net.
>
> /mjt

Thanks! I was able to find the needed info to bring the guest up and
connect to it etc.

Vinod
Unable to start a guest using qemu-kvm directly
Dear All,

I am using RHEL 6.2 + KVM on an x86_64 server. I have been able to
create Linux guests using virt-install (using virtio and/or PCI
passthrough) and am able to manage the guests using virsh and/or
virt-manager for doing some basic stuff.

Here is a sample guest that I was able to create using virt-install and
boot fine using either virsh or virt-manager:

[libvirt domain XML garbled in the archive; recoverable details: name
testvm2, uuid d44e8618-e48c-531b-01c4-80fc2a026a25, memory 4194304,
1 vcpu, hvm, on_poweroff destroy / on_reboot restart / on_crash restart,
emulator /usr/libexec/qemu-kvm, svirt SELinux labels
system_u:system_r:svirt_t:s0:c290,c578 and
system_u:object_r:svirt_image_t:s0:c290,c578]

Tried to start the above guest using qemu-kvm directly (so that I can in
future specify different options etc.), but I am getting some errors
(please see below).

# /usr/libexec/qemu-kvm -version
QEMU PC emulator version 0.12.1 (qemu-kvm-0.12.1.2), Copyright (c) 2003-2008 Fabrice Bellard

# /usr/libexec/qemu-kvm -S -M rhel6.2.0 -enable-kvm -m 4096 \
    -smp 1,sockets=1,cores=1,threads=1 -name testvm2 \
    -uuid d44e8618-e48c-531b-01c4-80fc2a026a25 -nodefconfig -nodefaults \
    -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/testvm2.monitor,server,nowait \
    -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown \
    -drive file=/var/lib/libvirt/images/vmStorage/vm2.img,if=none,id=drive-virtio-disk0,format=qcow2,cache=none \
    -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
    -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
    -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
    -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:7b:dc:d3,bus=pci.0,addr=0x3 \
    -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \
    -usb -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga cirrus \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
char device redirected to /dev/pts/5
qemu-system-x86_64: -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24: TUNGETIFF ioctl() failed: Bad file descriptor
TUNSETOFFLOAD ioctl() failed: Bad file descriptor
qemu-system-x86_64: -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24: vhost-net requested but could not be initialized
qemu-system-x86_64: -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24: Device 'tap' could not be initialized

Can someone please tell me what this means, and any tips on how I can
solve it? Do I have to do some other setup to make qemu-kvm happy
w.r.t. the -netdev options?

Thank you in anticipation!!
Vinod
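One likely reading of the "Bad file descriptor" errors: -netdev tap,fd=23,...,vhostfd=24 names file descriptors that libvirt had opened and passed to the qemu process it spawned; those descriptors do not exist in a shell session started by hand. A hedged sketch of running qemu directly with a tap device it opens itself (the bridge name br0 is an assumption about the host setup; requires root):

```shell
# Create a tap device, attach it to an existing bridge, and let qemu open
# the tap by name instead of inheriting fd=23/vhostfd=24 from libvirt.
ip tuntap add dev tap0 mode tap
brctl addif br0 tap0          # br0 is assumed to already exist
ip link set tap0 up

# Then replace the fd-based options in the qemu command line with:
#   -netdev tap,id=hostnet0,ifname=tap0,script=no,downscript=no,vhost=on \
#   -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:7b:dc:d3
```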
cpu_set online causing a guest to hang.
Hello,

Wanted to check on the current status of CPU hotplug support in KVM
guests. Please excuse me if the following is a known issue (and point
me to the appropriate status/issue/bug report if it is).

I have a RHEL 6.2 (x86_64) + KVM host with a KVM guest running RHEL
6.2. The guest is up and configured with 8 vCPUs (0-7):

[root@testvm1 cpu]# ls
cpu0  cpu2  cpu4  cpu6  cpufreq  kernel_max  online   present
cpu1  cpu3  cpu5  cpu7  cpuidle  offline     possible

(I had set the maximum vcpus to 12.)

I tried to hotplug-add a new CPU with the following command from my
host:

[root@host ~]# virsh qemu-monitor-command testvm1 --hmp cpu_set 8 online

The guest hung and was non-responsive. Did I miss any step, or is some
functionality missing? Any help/pointers would be appreciated.

Thanks
Vinod
Re: Kemari
Thanks for the pointers, Mitsuru!

Vinod
Kemari
Hello,

[I am very new to KVM, so I am not sure if this is the right forum to
ask this question. If not, kindly point me to the right forum.]

I would like to get some info on the current status of Kemari for KVM.
Does Kemari support SMP guests, or do the guests have to have only 1
VCPU? Is there a version [& code] available that one can try out in a
RHEL 6.2 + KVM + x86_64 environment?

Thanks
Vinod