On Wed, Jul 29, 2015 at 11:38:44AM +0100, Dr. David Alan Gilbert wrote:
> * Eduardo Otubo (eduardo.ot...@profitbricks.com) wrote:
> > On Wed, Jul 29, 2015 at 10:32:59AM +0100, Dr. David Alan Gilbert wrote:
> > > * Eduardo Otubo (eduardo.ot...@profitbricks.com) wrote:
> > > > On Wed, Jul 29, 2015 at 09:11:21AM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Eduardo Otubo (eduardo.ot...@profitbricks.com) wrote:
> > > > > > On Tue, Jul 28, 2015 at 04:19:46PM +0100, Dr. David Alan Gilbert wrote:
> > > > > > > * Eduardo Otubo (eduardo.ot...@profitbricks.com) wrote:
> > > > > > > > Hello all,
> > > > > > > >
> > > > > > > > I'm facing a weird behavior on my tests: I am able to live migrate
> > > > > > > > between two virtual machines on my localhost, but not to another
> > > > > > > > machine, both using tcp.
> > > > > > > >
> > > > > > > > * I am using the same arguments on the command line;
> > > > > > > > * Both virtual machines use the same qcow2 file visible through NFS;
> > > > > > > > * Both machines are in the same subnet;
> > > > > > > > * Migration is being done from intel to intel;
> > > > > > > > * Same version of Qemu (github master - f8787f8723);
> > > > > > > >
> > > > > > > > Using all above I am able to live migrate on the same host: between two
> > > > > > > > vms on local host or between two vms in the remote host; but when
> > > > > > > > migrating from local to remote, the guest hangs. I still can access its
> > > > > > > > console via ctrl+alt+2, though, and everything seems to be normal. If I
> > > > > > > > issue a reboot via console on the remote, the guest gets back to
> > > > > > > > normal.
> > > > > > > >
> > > > > > > > Am I missing something here?
> > > > > > >
> > > > > > > Just checking, but are you saying that as far as qemu is concerned,
> > > > > > > the migration is happy, it's just the guest that's hung?
> > > > > >
> > > > > > That's exactly the case. The console (via ctrl+alt+2) is active and
> > > > > > responding to all commands normally, but the screen (ctrl+alt+1) is
> > > > > > frozen and I can't interact with it at all.
> > > > >
> > > > > Are you driving this via libvirt or using qemu monitor directly?
> > > > > If the latter, can you please get an 'info migrate' from the source
> > > > > and an 'info status' from the destination at the end of migrate.
> > > >
> > > > I'm using qemu command line directly. And I got the problem :) See
> > > > below.
> > > >
> > > > > > > Are the host clocks on the two hosts very close (there are lots of
> > > > > > > weird corner cases with mismatched clocks) - same time zone?
> > > > > >
> > > > > > Yep. Both machines are in the same room and have the clock sync'ed.
> > > > >
> > > > > OK, good.
> > > > >
> > > > > > > Are you using cache=none (given that it's NFS shared)
> > > > > >
> > > > > > I wasn't. But I tried again with cache=none and I got exactly the same
> > > > > > thing.
> > > > >
> > > > > OK, and this pair of machines, have you tried both directions - i.e.
> > > > > going a->b and b->a - do both directions fail?
> > > > > Is the NFS server one of the two machines? If it is, and you're
> > > > > using libvirt, make sure that the directory the disks are on is an
> > > > > NFS mount on both machines; e.g. don't migrate directly from the
> > > > > NFS export.
> > > > >
> > > > > > Also, I tried with stable-2.2 branch and got the same behavior. I really
> > > > > > think that's very unlikely to have unstable code of such an important
> > > > > > feature upstream, or on a stable- branch.
> > > > > > Most probable thing is that
> > > > > > I have something wrong on my environment.
> > > > >
> > > > > Yes, the challenge is to find what; and if it's something common
> > > > > we should try and find a way of spotting it.
> > > > >
> > > > > > Anyway, I'll keep testing different stable- branches until I find
> > > > > > something that works for me. I'll keep the mailing list posted.
> > > > >
> > > > > Could you share the qemu command line so we can see if we can
> > > > > spot anything?
> > > >
> > > > Got the problem! I tried to simplify my qemu command line to the
> > > > smallest possible, excluding things I thought could cause the issue.
> > > > Without further ado, this is the argument:
> > > >
> > > >     -cpu 'Opteron_G4'
> > > >
> > > > Without this argument everything works as it should, console responsive
> > > > and guest active :)
> > >
> > > Can you show cat /proc/cpuinfo off the two hosts?
> > > (Only one CPU, but please include the whole entry)
> >
> > Intel host:
> > processor : 7
> > vendor_id : GenuineIntel
> > cpu family : 6
> > model : 60
> > model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
> > stepping : 3
> > microcode : 0x1c
> > cpu MHz : 883.468
> > cache size : 8192 KB
> > physical id : 0
> > siblings : 8
> > core id : 3
> > cpu cores : 4
> > apicid : 7
> > initial apicid : 7
> > fpu : yes
> > fpu_exception : yes
> > cpuid level : 13
> > wp : yes
> > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> > cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
> > pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
> > xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor
> > ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic
> > movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida
> > arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase
> > tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt
> > bugs :
> > bogomips : 6784.87
> > clflush size : 64
> > cache_alignment : 64
> > address sizes : 39 bits physical, 48 bits virtual
> > power management:
> >
> > AMD host:
> > processor : 5
> > vendor_id : AuthenticAMD
> > cpu family : 16
> > model : 10
> > model name : AMD Phenom(tm) II X6 1075T Processor
> > stepping : 0
> > microcode : 0x10000bf
> > cpu MHz : 800.000
> > cache size : 512 KB
> > physical id : 0
> > siblings : 6
> > core id : 5
> > cpu cores : 6
> > apicid : 5
> > initial apicid : 5
> > fpu : yes
> > fpu_exception : yes
> > cpuid level : 6
> > wp : yes
> > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> > cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
> > pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc
> > extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm cmp_legacy svm
> > extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt
> > cpb hw_pstate npt lbrv svm_lock nrip_save pausefilter vmmcall
> > bogomips : 6027.25
> > TLB size : 1024 4K pages
> > clflush size : 64
> > cache_alignment : 64
> > address sizes : 48 bits physical, 48 bits virtual
> > power management: ts ttp tm stc 100mhzsteps hwpstate cpb
>
> OK, very different CPUs. My guess is that one or both of them don't support
> some feature of the Opteron_G4. When specifying -cpu it's often best
> to use the enforce option.
>
> What happens if you try:
>
> qemu-system-x86_64 -machine pc,accel=kvm -cpu Opteron_G4,enforce=on -nographic

This is the script I'm using right now on both hosts:

otubo@vader ~ # cat startvm.sh
#/bin/bash
/home/otubo/develop/qemu/github/x86_64-softmmu/qemu-system-x86_64 \
    -machine pc,accel=kvm -cpu Opteron_G4,enforce=on \
    -name 'virt-tests-vm1' \
    -sandbox off \
    -display sdl \
    -drive id=drive_image1,cache=none,if=none,file=$1 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
    -device virtio-net-pci,mac=9a:22:23:24:25:26,id=idqE7Ggl,vectors=4,netdev=idjYAneH,bus=pci.0,addr=05 \
    -netdev user,id=idjYAneH,hostfwd=tcp::5001-:22 \
    -m 2G,slots=32,maxmem=10G \
    -smp 2,maxcpus=10,cores=1,threads=1,sockets=2 \
    -boot order=cdn,once=c,menu=off \
    -enable-kvm

>
> on both hosts?

The output follows, Intel host:

otubo@vader ~ # ./startvm.sh /media/virt_images/pb-debian-7-server-latest.qcow2
warning: host doesn't support requested feature: CPUID.80000001H:ECX.sse4a [bit 6]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.misalignsse [bit 7]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.3dnowprefetch [bit 8]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.xop [bit 11]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.fma4 [bit 16]
qemu-system-x86_64: Host doesn't support requested features

AMD host:

otubo@qemu-kvm-testrunner [2015-07-29 14:41:40] ~ # ./startvm-incoming.sh /media/virt_images/pb-debian-7-server-latest.qcow2
warning: host doesn't support requested feature: CPUID.01H:ECX.pclmulqdq|pclmuldq [bit 1]
warning: host doesn't support requested feature: CPUID.01H:ECX.ssse3 [bit 9]
warning: host doesn't support requested feature: CPUID.01H:ECX.sse4.1|sse4_1 [bit 19]
warning: host doesn't support requested feature: CPUID.01H:ECX.sse4.2|sse4_2 [bit 20]
warning: host doesn't support requested feature: CPUID.01H:ECX.aes [bit 25]
warning: host doesn't support requested feature: CPUID.01H:ECX.xsave [bit 26]
warning: host doesn't support requested feature: CPUID.01H:ECX.avx [bit 28]
warning: host doesn't support requested feature: CPUID.80000001H:EDX.rdtscp [bit 27]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.xop [bit 11]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.fma4 [bit 16]
qemu-system-x86_64: Host doesn't support requested features

> You need to pick a CPU option that works with that on both of the hosts.

So you think it's just a matter of fine-tuning which CPU option is best for
live migration on each platform? Or should it be handled inside Qemu itself?

Regards,

-- 
Eduardo Otubo
ProfitBricks GmbH
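
A rough first step toward picking a CPU option that both hosts accept is to intersect the flags lines of the two /proc/cpuinfo dumps quoted in this thread. The sketch below is only an illustration under stated assumptions: the function name common_flags and the file names intel.cpuinfo and amd.cpuinfo are hypothetical, and the kernel's flag names do not map one-to-one onto QEMU feature names, so starting QEMU with enforce on each host remains the real check.

```shell
#!/bin/sh
# common_flags FILE_A FILE_B
# Hypothetical helper: print, one per line and sorted, the CPU flags that
# appear in the first "flags" line of BOTH /proc/cpuinfo dumps.
# Assumes FILE_A and FILE_B are two different paths.
common_flags() {
    awk '
        # Only look at the first "flags" line of each file
        # (cpuinfo repeats it once per logical CPU).
        /^flags[ \t]*:/ && !seen[FILENAME]++ {
            sub(/^[^:]*:[ \t]*/, "")        # strip the "flags :" prefix
            n = split($0, f, " ")
            for (i = 1; i <= n; i++) {
                if (FILENAME == ARGV[1])
                    first[f[i]] = 1         # remember flags of FILE_A
                else if (f[i] in first)
                    print f[i]              # flag is also in FILE_B
            }
        }
    ' "$1" "$2" | sort
}

# Example (file names are assumptions):
#   cat /proc/cpuinfo > intel.cpuinfo    # on the Intel host
#   cat /proc/cpuinfo > amd.cpuinfo      # on the AMD host
#   common_flags intel.cpuinfo amd.cpuinfo
```

Any feature a CPU model needs that is missing from this intersection (for example sse4a on the Intel host, or ssse3 and avx on the AMD host, as the warnings above show) will make enforce refuse to start on at least one side.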