Hi,

I'm seeing live migration of a VM fail when that VM is running a
nested VM. I'm using the latest Linux kernel (v5.3) and QEMU (v4.1.0).
I also tried v5.2, but the result was the same. The kernels in the L1
and L2 VMs are both v4.18, but I don't think that matters.

The symptom is that the L2 VM kernel crashes in different places after
migration, but the call stacks are mostly related to memory
management, as in [1] and [2]. The crash happens almost every time.
While the L2 VM panics, the L1 VM keeps running fine after the
migration. Both the L1 and L2 VMs were idle during migration.

I found a few clues about this issue.
1) It happens when L1 has a relatively large amount of memory (24G),
but not with a smaller size (3G).

2) Non-live ("dead") migration works: when I first ran the "stop"
command in the QEMU monitor for L1 and then migrated, the migration
always succeeded. It also succeeded when I stopped only the L2 VM and
kept L1 live during the migration.
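
For reference, the stop-then-migrate sequence from clue 2 could be
scripted roughly as below. This is only a sketch: I used the
human monitor by hand, so the QMP form here is an assumed equivalent,
the helper names are made up for illustration, and the destination URI
is a placeholder. It only builds the messages and does not open a
socket.

```python
import json


def stopped_migration_commands(dest_uri):
    """Return the QMP commands for a stop-then-migrate run, in order.

    dest_uri is a placeholder such as "tcp:HOST:PORT"; adjust as needed.
    """
    return [
        {"execute": "qmp_capabilities"},  # leave capability negotiation mode
        {"execute": "stop"},              # pause the L1 vCPUs before migrating
        {"execute": "migrate", "arguments": {"uri": dest_uri}},
        {"execute": "query-migrate"},     # poll this for the migration status
    ]


def to_wire(commands):
    """Serialize the commands as one JSON object per line."""
    return "".join(json.dumps(cmd) + "\r\n" for cmd in commands)


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address, not my real destination.
    print(to_wire(stopped_migration_commands("tcp:192.0.2.10:4444")), end="")
```

Dropping the "stop" entry from the list gives the live case that fails
for me.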

With those two clues, my guess is that some pages dirtied by L2 are
not transferred to the destination correctly, but I'm not really sure.
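
To make that guess concrete, here is a toy model of pre-copy
migration. It is not how QEMU or KVM actually implement dirty
tracking; the "writer" tags and the `logged` set are hypothetical and
exist only to show what would happen if writes from one source never
reached the dirty log.

```python
def precopy_migrate(source, writes_per_round, logged):
    """Toy pre-copy migration over a list of page values.

    source           -- page contents on the source (mutated by guest writes)
    writes_per_round -- one list of (page, value, writer) tuples per round
    logged           -- writers whose writes are recorded in the dirty log
    Returns the page contents as seen on the destination.
    """
    dest = list(source)  # first pass: copy every page once
    for writes in writes_per_round:
        dirty = set()
        for page, value, writer in writes:
            source[page] = value   # the guest modifies its memory
            if writer in logged:   # only logged writes mark the page dirty
                dirty.add(page)
        for page in dirty:         # re-send only pages reported as dirty
            dest[page] = source[page]
    return dest


if __name__ == "__main__":
    src = [0, 0, 0, 0]
    rounds = [[(1, 7, "L1"), (2, 9, "L2")]]
    dest = precopy_migrate(src, rounds, logged={"L1"})
    print("source:     ", src)
    print("destination:", dest)
```

In this model a stopped guest issues no writes, so the destination
always matches the source, which would be consistent with clue 2. If
L2's writes are missing from the log, page 2 arrives stale.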

3) It happens on an Intel(R) Xeon(R) Silver 4114 CPU, but not on an
Intel(R) Xeon(R) CPU E5-2630 v3.

This confuses me because I thought migrating nested state didn't
depend on the underlying hardware. In any case, L1-only migration
with the large memory size (24G) works on both CPUs without any
problem.

I would appreciate any comments/suggestions to fix this problem.

Thanks,
Jintack


[1] https://paste.ubuntu.com/p/XGDKH45yt4/
[2] https://paste.ubuntu.com/p/CpbVTXJCyc/
