I looked up the VM-instruction error in the Intel manual. Error number 7 means "VM entry with invalid control field(s)", i.e. some VM-entry control fields were not configured properly when the VM entry was attempted.
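For reference, here is a minimal sketch in C of how the value printed as "hardware error 0x7" can be decoded. The helper names are hypothetical (they are not KVM functions), the bit-31 mask is the VMX_EXIT_REASONS_FAILED_VMENTRY value mentioned in the quoted mail, and the strings are a small subset of the Intel SDM table of VM-instruction error numbers as I read it, so please check them against the manual.

```c
#include <stdint.h>

/* Bit 31 of the exit reason marks a failed VM entry; 0x80000000 is the
 * value quoted in this thread for VMX_EXIT_REASONS_FAILED_VMENTRY. */
#define FAILED_VMENTRY_BIT 0x80000000u

/* Hypothetical helper: nonzero if the reported value is an exit reason
 * with the "VM-entry failure" bit set (first branch in vmx_handle_exit()). */
static int is_failed_vmentry_exit_reason(uint32_t value)
{
    return (value & FAILED_VMENTRY_BIT) != 0;
}

/* Hypothetical helper: map a VM-instruction error number (second branch,
 * read from the VM_INSTRUCTION_ERROR field) to its description.
 * Only a few entries are listed; the rest fall through to the default. */
static const char *vm_instruction_error_name(uint32_t err)
{
    switch (err) {
    case 4:  return "VMLAUNCH with non-clear VMCS";
    case 5:  return "VMRESUME with non-launched VMCS";
    case 7:  return "VM entry with invalid control field(s)";
    case 8:  return "VM entry with invalid host-state field(s)";
    default: return "not listed here (see the Intel manual)";
    }
}
```

With the value 0x7 from the error message, is_failed_vmentry_exit_reason(0x7) is 0, so the number comes from the VM-instruction error field, and vm_instruction_error_name(7) gives the description above.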
I wonder why some emulated CPUs (e.g. Nehalem) can run properly without nested VMCS MSR support?

Besides, this bug has also been reported in the Red Hat Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=892240
And with some specific kernels (e.g. kernel 3.8.4-202.fc18.x86_64 on Fedora 18) it works well.

On Tue, Apr 16, 2013 at 3:03 PM, Jan Kiszka <jan.kis...@web.de> wrote:

> On 2013-04-16 05:49, 李春奇 <Arthur Chunqi Li> wrote:
> > I changed to the latest version of kvm kernel but the bug also occurred.
> >
> > On the startup of L1 VM on the host, the host kern.log will output:
> > Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458090] kvm [2808]: vcpu0 unhandled rdmsr: 0x345
> > Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458166] kvm_set_msr_common: 22 callbacks suppressed
> > Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458169] kvm [2808]: vcpu0 unhandled wrmsr: 0x40 data 0
> > Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458176] kvm [2808]: vcpu0 unhandled wrmsr: 0x60 data 0
> > Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458182] kvm [2808]: vcpu0 unhandled wrmsr: 0x41 data 0
> > Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458188] kvm [2808]: vcpu0 unhandled wrmsr: 0x61 data 0
> > Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458194] kvm [2808]: vcpu0 unhandled wrmsr: 0x42 data 0
> > Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458200] kvm [2808]: vcpu0 unhandled wrmsr: 0x62 data 0
> > Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458206] kvm [2808]: vcpu0 unhandled wrmsr: 0x43 data 0
> > Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458211] kvm [2808]: vcpu0 unhandled wrmsr: 0x63 data 0
> > Apr 16 11:28:23 Blade1-02 kernel: [ 4908.471014] kvm [2808]: vcpu1 unhandled wrmsr: 0x40 data 0
> > Apr 16 11:28:23 Blade1-02 kernel: [ 4908.471024] kvm [2808]: vcpu1 unhandled wrmsr: 0x60 data 0
> >
> > When L1 VM starts and crashes, its kern.log will output:
> > Apr 16 11:28:55 kvm1 kernel: [ 33.590101] device tap0 entered promiscuous mode
> > Apr 16 11:28:55 kvm1 kernel: [ 33.590140] br0: port 2(tap0) entered forwarding state
> > Apr 16 11:28:55 kvm1 kernel: [ 33.590146] br0: port 2(tap0) entered forwarding state
> > Apr 16 11:29:04 kvm1 kernel: [ 42.592103] br0: port 2(tap0) entered forwarding state
> > Apr 16 11:29:19 kvm1 kernel: [ 57.752731] kvm [1673]: vcpu0 unhandled rdmsr: 0x345
> > Apr 16 11:29:19 kvm1 kernel: [ 57.797261] kvm [1673]: vcpu0 unhandled wrmsr: 0x40 data 0
> > Apr 16 11:29:19 kvm1 kernel: [ 57.797315] kvm [1673]: vcpu0 unhandled wrmsr: 0x60 data 0
> > Apr 16 11:29:19 kvm1 kernel: [ 57.797366] kvm [1673]: vcpu0 unhandled wrmsr: 0x41 data 0
> > Apr 16 11:29:19 kvm1 kernel: [ 57.797416] kvm [1673]: vcpu0 unhandled wrmsr: 0x61 data 0
> > Apr 16 11:29:19 kvm1 kernel: [ 57.797466] kvm [1673]: vcpu0 unhandled wrmsr: 0x42 data 0
> > Apr 16 11:29:19 kvm1 kernel: [ 57.797516] kvm [1673]: vcpu0 unhandled wrmsr: 0x62 data 0
> > Apr 16 11:29:19 kvm1 kernel: [ 57.797566] kvm [1673]: vcpu0 unhandled wrmsr: 0x43 data 0
> > Apr 16 11:29:19 kvm1 kernel: [ 57.797616] kvm [1673]: vcpu0 unhandled wrmsr: 0x63 data 0
> >
> > The host will output simultaneously:
> > Apr 16 11:29:20 Blade1-02 kernel: [ 4966.314742] nested_vmx_run: VMCS MSR_{LOAD,STORE} unsupported
>
> That's an important information. KVM is not yet implementing this
> feature, but L1 is using it - doomed to fail. This feature gap of nested
> VMX needs to be closed at some point.
>
> > And the callback trace displayed on the console is the same as the previous
> > mail.
> >
> > Besides, the L1 and L2 guest may sometimes crash and output nothing, while
> > sometimes it will output as above.
> >
> > So this indicates that the msr controls may fail for core2duo CPU emulator.
>
> Maybe varying the CPU type (try e.g. -cpu kvm64,+vmx) reduces the
> likeliness of this scenario with KVM as guest.
>
> > For Jan,
> > I have traced the code of qemu and KVM and found the relevant code of errno
> > "KVM: entry failed, hardware error 0x7". The relevant code is in kernel
> > arch/x86/kvm/vmx.c, function vmx_handle_exit():
> >
> >         if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
> >                 vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
> >                 vcpu->run->fail_entry.hardware_entry_failure_reason
> >                         = exit_reason;
> >                 return 0;
> >         }
> >
> >         if (unlikely(vmx->fail)) {
> >                 vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
> >                 vcpu->run->fail_entry.hardware_entry_failure_reason
> >                         = vmcs_read32(VM_INSTRUCTION_ERROR);
> >                 return 0;
> >         }
> >
> > The entry failed hardware error may be caused from these two points, both
> > are caused by VMENTRY failed. Because macro VMX_EXIT_REASONS_FAILED_VMENTRY
> > is 0x80000000 and the output errno is 0x7, so this error is caused by the
> > second branch. I'm not very clear what the result of
> > vmcs_read32(VM_INSTRUCTION_ERROR) refers to.
>
> Try to look this up in the Intel manual. It explains what instruction
> error 7 means. You will also find it when tracing down the error message
> of L0.
>
> Jan

-- 
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China