* Gonglei (Arei) (arei.gong...@huawei.com) wrote:
> Hi,
>
> >
> > * Gonglei (Arei) (arei.gong...@huawei.com) wrote:
> > > Hi Dave,
> > >
> > > We discussed some live migration fallback scenarios at this year's
> > > KVM forum, and now I can provide another scenario; perhaps upstream
> > > should consider rolling back for this situation.
> > >
> > > Environment information:
> > >
> > > host A: CPU E5620 (model WestmereEP, without the xsave flag)
> > > host B: CPU E5-2643 (model SandyBridgeEP, with the xsave flag)
> > >
> > > The steps to reproduce are:
> > > 1. Start a Windows 2008 VM with -cpu host (i.e. host-passthrough).
> >
> > Well, we don't guarantee migration across -cpu host - does this
> > problem go away if both QEMUs are started with matching CPU flags
> > (corresponding to the Westmere)?
> >
> Sorry, we didn't test other CPU model scenarios, since we need to
> ensure that live migration works from lower-generation CPUs to
> higher-generation CPUs. :(
>
> > > 2. Migrate the VM to host B while CR4.OSXSAVE=0.
> > > 3. The VM runs on host B for a while, so that CR4.OSXSAVE changes
> > >    to 1.
> > > 4. Then migrate the VM back to host A; the migration completes, but
> > >    the VM is paused, and QEMU prints the following log:
> > >
> > > KVM: entry failed, hardware error 0x80000021
> > >
> > > If you're running a guest on an Intel machine without unrestricted
> > > mode support, the failure can be most likely due to the guest
> > > entering an invalid state for Intel VT. For example, the guest may
> > > be running in big real mode, which is not supported on less recent
> > > Intel processors.
> > >
> > > EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=00000000
> > > ESI=01a62000 EDI=00000000 EBP=00000000 ESP=01718b20
> > > EIP=0185d982 EFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> > > ES =0000 00000000 0000ffff 00009300
> > > CS =f000 ffff0000 0000ffff 00009b00
> > > SS =0000 00000000 0000ffff 00009300
> > > DS =0000 00000000 0000ffff 00009300
> > > FS =0000 00000000 0000ffff 00009300
> > > GS =0000 00000000 0000ffff 00009300
> > > LDT=0000 00000000 0000ffff 00008200
> > > TR =0000 00000000 0000ffff 00008b00
> > > GDT= 00000000 0000ffff
> > > IDT= 00000000 0000ffff
> > > CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
> > > DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> > > DR6=00000000ffff0ff0 DR7=0000000000000400
> > > EFER=0000000000000000
> > > Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >
> > > The problem happens when kvm_put_sregs returns error -22 (called by
> > > kvm_arch_put_registers in QEMU).
> > >
> > > That is because kvm_arch_vcpu_ioctl_set_sregs (in the kvm module)
> > > checks that guest_cpuid_has reports no X86_FEATURE_XSAVE while
> > > CR4.OSXSAVE=1. We should cancel the migration if
> > > kvm_arch_put_registers returns an error.
> >
> > Do you have a backtrace of when kvm_arch_put_registers is called
> > when it fails?
>
> The main backtrace is below:
>
> qemu_loadvm_state
>   cpu_synchronize_all_post_init             --> w/o return value
>     cpu_synchronize_post_init               --> w/o return value
>       kvm_cpu_synchronize_post_init         --> w/o return value
>         run_on_cpu                          --> w/o return value
>           do_kvm_cpu_synchronize_post_init  --> w/o return value
>             kvm_arch_put_registers          --> w/ return value
>
> The root cause is that some of those functions have no return value,
> so the migration thread can't detect the failure. Paolo?
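[For reference, the kvm-side consistency check described above can be sketched as below. This is an illustrative simplification under my own assumptions, not the actual kernel code: the struct and function names (`fake_vcpu`, `check_cr4`, `cpuid_has_xsave`) are invented for the example; only the rule itself (reject CR4.OSXSAVE=1 when CPUID lacks XSAVE, returning -EINVAL, i.e. -22) comes from the report.]

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative simplification of the kvm-module check described above:
 * setting sregs is rejected when the incoming CR4 has OSXSAVE set but
 * the vCPU's CPUID does not advertise XSAVE.  All names here are
 * invented for the sketch; this is not the real kernel code. */

#define X86_CR4_OSXSAVE (1ULL << 18)

struct fake_vcpu {
    bool cpuid_has_xsave;   /* what guest_cpuid_has(X86_FEATURE_XSAVE) would say */
};

static int check_cr4(const struct fake_vcpu *vcpu, uint64_t cr4)
{
    /* CR4.OSXSAVE may only be set if CPUID advertises XSAVE */
    if ((cr4 & X86_CR4_OSXSAVE) && !vcpu->cpuid_has_xsave)
        return -EINVAL;     /* the -22 that kvm_put_sregs reports to QEMU */
    return 0;
}
```

[This also illustrates why the bug only bites after step 3: while CR4.OSXSAVE is still 0, the incoming register state passes the check on host A.]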
OK, so yes, it would be great to get return values and get them up to
qemu_loadvm_state; I guess the tricky one is getting the return value
through 'run_on_cpu'.

> > If it's called during the loading of the device state then we should
> > be able to detect it and fail the migration; however if it's only
> > failing after the CPU is restarted after the migration then it's a
> > bit too late.
> >
> Actually the CPUs haven't started in this scenario.

OK, then yes, it's worth trying to fail the migrate.

Dave

> Thanks,
> -Gonglei

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
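[One way the return value could be threaded through a run_on_cpu-style dispatcher, sketched below: the callback records its status in the argument struct it is handed, so the caller in the load path can observe the failure even though the dispatcher itself returns void. All names (`sync_data`, `run_on_cpu_stub`, `fake_put_registers`, `synchronize_post_init_with_ret`) are hypothetical stand-ins invented for this example; this is not QEMU's actual API.]

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical sketch: propagate an error code out of a void
 * run_on_cpu-style callback by storing it in the argument struct.
 * The names below are invented; they stand in for the QEMU call
 * chain in the backtrace above. */

struct sync_data {
    int ret;                        /* filled in by the callback */
};

/* stands in for kvm_arch_put_registers() failing with -EINVAL (-22) */
static int fake_put_registers(void)
{
    return -EINVAL;
}

static void do_synchronize_post_init(void *arg)
{
    struct sync_data *d = arg;
    d->ret = fake_put_registers(); /* capture the failure */
}

/* stands in for run_on_cpu(): runs the work item, returns nothing */
static void run_on_cpu_stub(void (*func)(void *), void *arg)
{
    func(arg);
}

/* a post-init synchronization that can report failure to its caller */
static int synchronize_post_init_with_ret(void)
{
    struct sync_data d = { .ret = 0 };
    run_on_cpu_stub(do_synchronize_post_init, &d);
    return d.ret;                   /* now visible up the load path */
}
```

[With something like this at each level of the backtrace, qemu_loadvm_state could see the -22 and fail the migration instead of resuming a broken guest.]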