RE: [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's -numa option

2012-07-18 Thread Vinod, Chegu
Thanks Eduardo !

Hi Anthony, if you are OK with this patch, could you please pull these changes 
into upstream, or suggest whom I should talk to to get these changes in?

Thanks!
Vinod

-Original Message-
From: Eduardo Habkost [mailto:ehabk...@redhat.com] 
Sent: Wednesday, July 18, 2012 10:15 AM
To: Vinod, Chegu
Cc: qemu-de...@nongnu.org; aligu...@us.ibm.com; kvm@vger.kernel.org
Subject: Re: [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's 
-numa option

On Mon, Jul 16, 2012 at 09:31:30PM -0700, Chegu Vinod wrote:
> Changes since v3:
>- using bitmap_set() instead of set_bit() in the numa_add() routine.
>- removed the call to bitmap_zero() since bitmap_new() also zeroes the bitmap.
>- Rebased to the latest qemu.

Tested-by: Eduardo Habkost 
Reviewed-by: Eduardo Habkost 


> 
> Changes since v2:
>- Using "unsigned long *" for the node_cpumask[].
>- Use bitmap_new() instead of g_malloc0() for allocation.
>- Don't rely on "max_cpus" since it may not be initialized
>  before the numa related qemu options are parsed & processed.
> 
> Note: Continuing to use a new constant for allocation of
>   the mask. (This constant is currently set to 255 since,
>   with an 8-bit APIC ID, VCPUs can range from 0-254 in a
>   guest. The APIC ID 255 (0xFF) is reserved for broadcast.)
> 
> Changes since v1:
> 
>- Use bitmap functions that are already in qemu (instead
>  of cpu_set_t macro's from sched.h)
>- Added a check for endvalue >= max_cpus.
>- Fix to address the round-robin assignment when
>  CPUs are not explicitly specified.
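The bitmap approach this changelog describes is sketched below as a small stand-alone
C program: each node's CPU set becomes a bitmap wide enough for all 255 possible APIC
IDs, so a cpus=A-B range simply sets bits A..B and is no longer limited by the width
of a 32- or 64-bit integer. In qemu itself this uses bitmap_new()/bitmap_set() from
bitmap.h; the constant and helper names below are illustrative only, not the patch itself.

    /* Stand-alone sketch of the per-node CPU bitmap described above; NOT qemu code. */
    #include <limits.h>
    #include <stdio.h>

    #define MAX_CPUMASK_BITS 255    /* 8-bit APIC IDs: 0-254 usable, 255 reserved for broadcast */
    #define BITS_PER_LONG    ((int)(CHAR_BIT * sizeof(unsigned long)))
    #define MASK_LONGS       ((MAX_CPUMASK_BITS + BITS_PER_LONG - 1) / BITS_PER_LONG)

    static unsigned long node_cpumask[MASK_LONGS];   /* one such mask per NUMA node */

    /* mark CPUs start..end as belonging to this node ("-numa node,cpus=start-end") */
    static void cpumask_set_range(unsigned long *mask, int start, int end)
    {
        for (int cpu = start; cpu <= end && cpu < MAX_CPUMASK_BITS; cpu++) {
            mask[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
        }
    }

    int main(void)
    {
        cpumask_set_range(node_cpumask, 10, 19);     /* e.g. -numa node,cpus=10-19 */

        printf("node 1 cpus:");
        for (int cpu = 0; cpu < MAX_CPUMASK_BITS; cpu++) {
            if (node_cpumask[cpu / BITS_PER_LONG] & (1UL << (cpu % BITS_PER_LONG))) {
                printf(" %d", cpu);
            }
        }
        printf("\n");       /* prints: node 1 cpus: 10 11 12 13 14 15 16 17 18 19 */
        return 0;
    }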
> ---
> 
> v1:
> --
> 
> The -numa option to qemu is used to create [fake] numa nodes and 
> expose them to the guest OS instance.
> 
> There are a couple of issues with the -numa option:
> 
> a) The max VCPUs that can be specified for a guest while using
>    qemu's -numa option is 64. Due to a typecasting issue,
>    when the number of VCPUs is > 32 the VCPUs don't show up
>    under the specified [fake] numa nodes.
> 
> b) KVM currently supports 160 VCPUs per guest, but qemu's
>    -numa option only supports up to 64 VCPUs per guest.
> 
> This patch addresses these two issues.
> 
> Below are examples of (a) and (b)
> 
> a) >32 VCPUs are specified with the -numa option:
> 
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> 71:01:01 \
> -net tap,ifname=tap0,script=no,downscript=no \
> -vnc :4
> 
> ...
> Upstream qemu :
> --
> 
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 32 33 34 35 36 37 38 39 40 41
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51
> node 1 size: 131072 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59
> node 2 size: 131072 MB
> node 3 cpus: 30
> node 3 size: 131072 MB
> node 4 cpus:
> node 4 size: 131072 MB
> node 5 cpus: 31
> node 5 size: 131072 MB
> 
> With the patch applied :
> ---
> 
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19
> node 1 size: 131072 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29
> node 2 size: 131072 MB
> node 3 cpus: 30 31 32 33 34 35 36 37 38 39
> node 3 size: 131072 MB
> node 4 cpus: 40 41 42 43 44 45 46 47 48 49
> node 4 size: 131072 MB
> node 5 cpus: 50 51 52 53 54 55 56 57 58 59
> node 5 size: 131072 MB
> 
> b) >64 VCPUs specified with -numa option:
> 
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> -cpu Westmere,+rdtscp,+pdpe1gb,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+d \
> -vnc :4
> 
> ...
> 
> Upstream qemu :
> --
> 
> only 63 CPUs in NUMA mode supported.
> only 64 CPUs in NUMA mode supported.
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 6 7 8 9 38 39 40 41 70 71 72 73
> node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51 74 75 76 77 78 79
> node 1 size: 65536 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59 60 61
> node 2 size: 65536 MB
> node 3 cpus: 30 62
> node 3 size: 65536 MB
> node 4 cpus:
> node 4 size: 65536 MB
> node 5 cpus:
> node 5 size: 65536 MB
> node 6 cpus: 31 63
> node 6 size: 65536 MB
> node 7 cpus: 0 1 2 3 4 5 32 33 34 35 36 37 64 65 66 67 68 69
> node 7 size: 65536 MB

RE: [Qemu-devel] KVM call agenda for Tuesday, June 19th

2012-07-11 Thread Vinod, Chegu


-Original Message-
From: Dor Laor [mailto:dl...@redhat.com] 
Sent: Wednesday, July 11, 2012 2:59 AM
To: Vinod, Chegu
Cc: kvm@vger.kernel.org
Subject: Re: [Qemu-devel] KVM call agenda for Tuesday, June 19th

On 06/19/2012 06:42 PM, Chegu Vinod wrote:
> Hello,
>
> Wanted to share some preliminary data from live migration experiments 
> on a setup that is perhaps one of the larger ones.
>
> We used Juan's "huge_memory" patches (without the separate migration 
> thread) and measured the total migration time and the time taken for stage 3 
> ("downtime").
> Note: We didn't change the default "downtime" (30ms?). We had a 
> private 10Gig back-to-back link between the two hosts, and we set the 
> migration speed to 10Gig.
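For reference, these knobs are set from the qemu monitor (HMP). A rough sketch of the
commands involved is below; the destination URI is illustrative, and the downtime cap
discussed further down is controlled by migrate_set_downtime, which takes a value in
seconds (e.g. 1 for the 1 sec suggested later in this thread):

    (qemu) migrate_set_speed 10G
    (qemu) migrate -d tcp:<dest-host>:4444
    (qemu) info migrate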
>
> The "workloads" chosen were ones that we could easily setup. All 
> experiments were done without using virsh/virt-manager (i.e. direct 
> interaction with the qemu monitor prompt).  Pl. see the data below.
>
> As the guest size increased (and for busier workloads) we observed 
> that network connections were getting dropped not only during the "downtime" 
> (i.e. stage 3) but also at times during the iterative pre-copy phase 
> (i.e. stage 2).  Perhaps some of this will get fixed when we have the 
> migration thread implemented.
>
> We had also briefly tried the proposed delta compression changes 
> (easier to say than XBZRLE :)) on a smaller configuration. For the 
> simple workloads (perhaps there was not much temporal locality in 
> them) it didn't seem to show improvements; instead it took much longer 
> to migrate (high cache-miss penalty?). Waiting for the updated 
> version of XBZRLE for further experiments to see how well it scales on 
> this larger setup...
>
> FYI
> Vinod
>
> ---
> 10VCPUs/128G
> ---
> 1) Idle guest
> Total migration time : 124585 ms,
> Stage_3_time : 941 ms ,
> Total MB transferred : 2720
>
>
> 2) AIM7-compute (2000 users)
> Total migration time : 123540 ms,
> Stage_3_time : 726 ms ,
> Total MB transferred : 3580
>
> 3) SpecJBB (modified to run 10 warehouse threads for a long duration of time)
> Total migration time : 165720 ms,
> Stage_3_time : 6851 ms ,
> Total MB transferred : 19656

6.8s downtime may be unacceptable for some applications. Does it converge with a 
maximum downtime of 1 sec?
In theory this is where post-copy can shine. But what we're missing in the 
(good) performance data is how the application performs during live migration. 
This is exactly where the live migration thread and dirty-bit optimization 
should help us.

Our 'friends' have some nice old analyses of live migration performance:
  -
http://www.cl.cam.ac.uk/research/srg/netos/papers/2005-migration-nsdi-pre.pdf
  - http://www.vmware.com/files/pdf/techpaper/VMW_Netioc_BestPractices.pdf

Cheers,
Dor
>
>
>


There have been some recent fixes (from Juan) that are supposed to honor the 
user-requested downtime. I am in the middle of redoing some of my 
experiments...and will share them when they are ready (in about 3-4 days).  Initial 
observations are that the time taken for the total migration considerably 
increases, but there are no observed stalls or ping timeouts etc. Will know more 
after I finish my experiments (i.e. the non-XBZRLE ones).

As expected, the 10G [back-to-back] connection is not really getting saturated 
with the migration traffic... so there is some other layer that is 
consuming time (possibly the overhead of tracking dirty pages).  

I haven't yet had the time to try to quantify the performance degradation on 
the workload during the live migration (stage 2)... need to look at that next. 

Thanks for the pointers to the old articles. 

Thanks
Vinod



> 4) Google SAT  (-s 3600 -C 5 -i 5)
> Total migration time : 411827 ms,
> Stage_3_time : 77807 ms ,
> Total MB transferred : 142136
>
>
>
> ---
> 20VCPUs/256G
> ---
>
> 1) Idle  guest
> Total migration time : 259938 ms,
> Stage_3_time : 1998 ms ,
> Total MB transferred : 5114
>
> 2) AIM7-compute (2000 users)
> Total migration time : 261336 ms,
> Stage_3_time : 2107 ms ,
> Total MB transferred : 5473
>
> 3) SpecJBB (modified to run 20 warehouse threads for a long duration of time)
> Total migration time : 390548 ms,
> Stage_3_time : 19596 ms ,
> Total MB transferred : 48109
>
> 4) Google SAT  (-s 3600 -C 10 -i 10)
> Total migration time : 780150 ms,
> Stage_3_time : 90346 ms ,
> Total MB transferred : 251287
>
> 
> 30VCPUs/384G
> ---
>
> 1) Idle guest
> (qemu) Total migration time : 501704 ms,
> Stage_3_time : 2835 ms ,
> Total MB transferred : 15731
>
>
> 2) AIM7-compute (2000 users)
> Total migration time : 496001 ms,
> Stage_

RE: [PATCH] kvm: handle last_boosted_vcpu = 0 case

2012-06-28 Thread Vinod, Chegu
Hello,

I am just catching up on this email thread... 

Perhaps one of you may be able to help answer this query, preferably along 
with some data.  [BTW, I do understand the basic intent behind PLE in a typical 
[sweet spot] use case where there is oversubscription etc. and the need to 
optimize the PLE handler in the host etc.]

In a use case where the host has fewer but much larger guests (say 40 VCPUs and 
higher) and there is no oversubscription (i.e. the # of vCPUs across guests <= 
physical CPUs in the host, and perhaps each guest has its vCPUs pinned to 
specific physical CPUs for other reasons), I would like to understand if/how 
PLE really helps.  For these use cases, would it be OK to turn PLE off 
(ple_gap=0), since there is no real need to take an exit and find some other 
VCPU to yield to? 

Thanks
Vinod

-Original Message-
From: Raghavendra K T [mailto:raghavendra...@linux.vnet.ibm.com] 
Sent: Thursday, June 28, 2012 9:22 AM
To: Andrew Jones
Cc: Rik van Riel; Marcelo Tosatti; Srikar; Srivatsa Vaddagiri; Peter Zijlstra; 
Nikunj A. Dadhania; KVM; LKML; Gleb Natapov; Vinod, Chegu; Jeremy Fitzhardinge; 
Avi Kivity; Ingo Molnar
Subject: Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case

On 06/28/2012 09:30 PM, Andrew Jones wrote:
>
>
> - Original Message -
>> In summary, current PV has huge benefit on non-PLE machine.
>>
>> On PLE machine, the results become very sensitive to load, type of 
>> workload and SPIN_THRESHOLD. Also PLE interference has significant 
>> effect on them. But still it has slight edge over non PV.
>>
>
> Hi Raghu,
>
> sorry for my slow response. I'm on vacation right now (until the 9th 
> of July) and I have limited access to mail.

Ok. Happy Vacation :)

> Also, thanks for continuing the benchmarking. Question, when you compare PLE vs.
> non-PLE, are you using different machines (one with and one without), 
> or are you disabling its use by loading the kvm module with the 
> ple_gap=0 modparam as I did?

Yes, I am doing the same when I say "with PLE disabled" and comparing the 
benchmarks (i.e. loading the kvm module with ple_gap=0).

But older non-PLE results were on a different machine altogether. (I had 
limited access to PLE machine).
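For reference, disabling PLE as discussed above amounts to reloading the kvm module
with ple_gap=0. A sketch for an Intel host (all guests must be shut down first so
the module can be unloaded):

    # rmmod kvm_intel
    # modprobe kvm_intel ple_gap=0
    # cat /sys/module/kvm_intel/parameters/ple_gap
    0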




RE: [Qemu-devel] [PATCH v2] Fixes related to processing of qemu's -numa option

2012-06-25 Thread Vinod, Chegu


-Original Message-
From: Eduardo Habkost [mailto:ehabk...@redhat.com] 
Sent: Monday, June 25, 2012 1:01 PM
To: Vinod, Chegu
Cc: Hada, Craig M; Hull, Jim; qemu-de...@nongnu.org; kvm@vger.kernel.org
Subject: Re: [Qemu-devel] [PATCH v2] Fixes related to processing of qemu's 
-numa option

Just found another issue:

On Wed, Jun 20, 2012 at 05:33:29PM -0300, Eduardo Habkost wrote:
[...]
> > @@ -970,27 +974,24 @@ static void numa_add(const char *optarg)
> >  }
> >  node_mem[nodenr] = sval;
> >  }
> > -if (get_param_value(option, 128, "cpus", optarg) == 0) {
> > -node_cpumask[nodenr] = 0;
> > -} else {
> > +if (get_param_value(option, 128, "cpus", optarg) != 0) {
> >  value = strtoull(option, &endptr, 10);
> > -if (value >= 64) {
> > -value = 63;
> > -fprintf(stderr, "only 64 CPUs in NUMA mode supported.\n");
> > +if (*endptr == '-') {
> > +endvalue = strtoull(endptr+1, &endptr, 10);
> >  } else {
> > -if (*endptr == '-') {
> > -endvalue = strtoull(endptr+1, &endptr, 10);
> > -if (endvalue >= 63) {
> > -endvalue = 62;
> > -fprintf(stderr,
> > -"only 63 CPUs in NUMA mode supported.\n");
> > -}
> > -value = (2ULL << endvalue) - (1ULL << value);
> > -} else {
> > -value = 1ULL << value;
> > -}
> > +endvalue = value;
> > +}
> > +
> > +if (endvalue >= max_cpus) {
> > +endvalue = max_cpus - 1;
> > +fprintf(stderr,
> > +"A max of %d CPUs are supported in a guest on this 
> > host\n",
> > +   max_cpus);
> > +}

>This makes the following command segfault:
>
>$ qemu-system-x86_64 -numa 'node,cpus=1-3' -smp 100
>
>max_cpus may be not initialized yet at the call to numa_add(), as you are 
>still parsing the command-line options.

Yes. I am aware of this issue too... and shall post another version of the 
patch next week (along with another fix).

Thanks
Vinod
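A stand-alone sketch of the two-phase handling that avoids this ordering problem is
below (this is the direction the later v3/v4 changelog describes: don't consult
max_cpus while options are still being parsed). It is illustrative only; none of the
names are qemu's.

    /* Sketch: record the requested range at -numa parse time, validate it
     * against max_cpus only after the whole command line has been parsed.  */
    #include <stdio.h>

    #define MAX_MASK_CPU 254              /* widest CPU the per-node mask can describe */

    static int max_cpus;                  /* filled in by the (possibly later) -smp option */
    static int req_start = -1, req_end = -1;

    /* phase 1: while parsing "-numa node,cpus=start-end", only record the range */
    static void numa_parse_cpus(int start, int end)
    {
        req_start = start;
        req_end = end > MAX_MASK_CPU ? MAX_MASK_CPU : end;  /* clamp to the mask, not to max_cpus */
    }

    /* phase 2: after all options are parsed, max_cpus is valid and can be checked */
    static void numa_validate(void)
    {
        if (req_end >= max_cpus) {
            fprintf(stderr, "A max of %d CPUs are supported in a guest on this host\n",
                    max_cpus);
            req_end = max_cpus - 1;
        }
    }

    int main(void)
    {
        numa_parse_cpus(1, 3);            /* "-numa node,cpus=1-3" seen first ... */
        max_cpus = 100;                   /* ... "-smp 100" parsed afterwards     */
        numa_validate();
        printf("node cpus: %d-%d (max_cpus=%d)\n", req_start, req_end, max_cpus);
        return 0;
    }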



--
Eduardo


performance of vhost-net over SRIOV+macvtap

2012-03-29 Thread Vinod Chegu

Hello,

RHEL 6.2 had a technology evaluation version of vhost-net over SRIOV using 
macvtap. Wondering if there were any preliminary performance studies done (i.e. 
I/O performance in the guest and also host CPU cycle consumption) between the 
following configurations:

1) vhost-net over SRIOV using macvtap vs. generic vhost-net 

2) vhost-net over SRIOV using macvtap vs. direct assignment of the VF (SRIOV).

Thanks!
Vinod



vCPU hotplug

2012-01-31 Thread Vinod Chegu

Dear All,

I am using version 1.0 of qemu-kvm along with 3.2.x host+guest kernels on an 
x86_64 server. I was unable to trigger a hotplug of a vCPU from qemu's 
monitor prompt (tripped over an assertion in qemu). 

Tried to look through the recent archives and noticed a couple of proposed 
fixes. One for this assertion and another to trigger the actual 
SCI notification. Not sure if they were official fixes or not...

I tried it out with these two fixes. I noticed that the new cpuX directory showed 
up in the /sys/devices/system/cpu/ subdir. 

I was however not able to activate the newly hotplug'd vCPU via 
"echo 1 > cpuX/online".  Looks like the vCPU isn't waking up in the guest.

If there is any [proposed|official] fix available that I could try out 
for this issue, please point me to the same.
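For anyone reproducing this, the sequence in question is roughly the following
(vCPU number illustrative); the guest-side echo is the step that was failing here:

    (qemu) cpu_set 8 online

    # then, inside the guest, once the new cpu8 directory appears:
    echo 1 > /sys/devices/system/cpu/cpu8/online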

Thanks in anticipation!

Vinod  



Re: Unable to start a guest using qemu-kvm directly

2012-01-31 Thread Vinod Chegu
Michael Tokarev  tls.msk.ru> writes:

> 
> You need to understand how to (pre-)configure networking
> for qemu (pre- if you want to run it as non-root), this
> is described in the users guide and in a lot of howtos
> all around the 'net.
> 
> /mjt


Thanks! I was able to find the needed info to bring the guest up and connect
to it etc.

Vinod





Unable to start a guest using qemu-kvm directly

2012-01-29 Thread Vinod Chegu

Dear All,

I am using RHEL 6.2 + KVM on an x86_64 server.  I have been able to create Linux 
guests using virt-install (using virtio and/or PCI passthrough) and am able to 
manage the guests using virsh and/or virt-manager for doing some basic stuff. 

Here is a sample guest that I was able to create using virt-install and am able 
to boot fine using either virsh or virt-manager:



[ The libvirt domain XML for this guest was included here, but its markup has 
been stripped by the archive. The values that survive show: name testvm2, uuid 
d44e8618-e48c-531b-01c4-80fc2a026a25, 4194304 KiB of memory, 1 vCPU, OS type 
"hvm", lifecycle actions destroy/restart/restart, emulator /usr/libexec/qemu-kvm, 
and SELinux labels system_u:system_r:svirt_t:s0:c290,c578 and 
system_u:object_r:svirt_image_t:s0:c290,c578. ]

Tried to start the above guest using qemu-kvm directly (so that I can specify 
different options in future, etc.), but I am getting some errors (please see below).

# /usr/libexec/qemu-kvm -version
QEMU PC emulator version 0.12.1 (qemu-kvm-0.12.1.2), Copyright (c) 2003-2008 
Fabrice Bellard

# /usr/libexec/qemu-kvm -S -M rhel6.2.0 -enable-kvm -m 4096 \
-smp 1,sockets=1,cores=1,threads=1 -name testvm2 \
-uuid d44e8618-e48c-531b-01c4-80fc2a026a25 -nodefconfig -nodefaults \
-chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/testvm2.monitor,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown \
-drive file=/var/lib/libvirt/images/vmStorage/vm2.img,if=none,id=drive-virtio-disk0,format=qcow2,cache=none \
-device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
-netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:7b:dc:d3,bus=pci.0,addr=0x3 \
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \
-usb -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga cirrus \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5


char device redirected to /dev/pts/5
qemu-system-x86_64: -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24: TUNGETIFF ioctl() failed: Bad file descriptor
TUNSETOFFLOAD ioctl() failed: Bad file descriptor
qemu-system-x86_64: -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24: vhost-net requested but could not be initialized
qemu-system-x86_64: -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24: Device 'tap' could not be initialized




Can someone please tell me what this means, and give me any tips on how I can 
solve this? Do I have to do some other setup to make qemu-kvm happy w.r.t. the 
-netdev options?
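For context: the fd=23/vhostfd=24 arguments in the libvirt-generated command refer to
tap and vhost-net file descriptors that libvirt had already opened and passed to qemu,
so when the command is rerun by hand those descriptor numbers point at nothing, hence
the "Bad file descriptor" errors. A hand-run invocation would normally point -netdev
at a pre-configured tap device instead, along these lines (a sketch; tap0 must already
exist and be attached to a bridge):

    -netdev tap,id=hostnet0,ifname=tap0,script=no,downscript=no,vhost=on \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:7b:dc:d3,bus=pci.0,addr=0x3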


Thank you in anticipation!!
Vinod




cpu_set online causing a guest to hang.

2012-01-27 Thread Vinod Chegu

Hello,

Wanted to check about the current status of the cpu hotplug support in KVM 
guests.

Pl. excuse me if the following is a known issue (pl. point me to 
the appropriate status/issue/bug-report if it is). 

I have an RHEL 6.2 (x86_64) + KVM host with a KVM guest running RHEL 6.2.

The guest is up and configured with 8 vCPUs (0-7) 

[root@testvm1 cpu]# ls
cpu0  cpu2  cpu4  cpu6  cpufreq  kernel_max  online    present
cpu1  cpu3  cpu5  cpu7  cpuidle  offline     possible


(I had set the maximum vcpus to 12).

I tried to hot-add a new CPU, and so I issued the following command from my 
host. 

[root@host ~]# virsh qemu-monitor-command testvm1 --hmp cpu_set 8 online


The guest hung and was non-responsive. 

Did I miss any step or is some functionality missing?

Any help/pointers would be appreciated. 

Thanks
Vinod



Re: Kemari

2012-01-27 Thread Vinod Chegu


Thanks for the pointers Mitsuru !

Vinod





Kemari

2012-01-24 Thread Vinod Chegu

Hello,

[ I am very new to KVM... so I am not sure if this is the right forum to ask this
question. If not, kindly point me to the right forum. ]

I would like to get some info. on the current status of Kemari for KVM.

Does Kemari support SMP guests or do the guests have to have only 1 VCPU ?

Is there a version [& code] that is available that one can try out on an
RHEL6.2 + KVM + X86_64 env.? 

Thanks
Vinod
