On 07/06/16 15:32, Paolo Bonzini wrote:
> 
> 
> On 06/07/2016 15:26, Haozhong Zhang wrote:
> > On 07/06/16 19:42, Wanpeng Li wrote:
> >> From: Wanpeng Li <wanpeng...@hotmail.com>
> >>
> >> BUG: unable to handle kernel NULL pointer dereference at           (null)
> >> IP: [<          (null)>]           (null)
> >> PGD 0 
> >> Oops: 0010 [#1] SMP
> >> Call Trace:
> >>  ? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
> >>  handle_preemption_timer+0xe/0x20 [kvm_intel]
> >>  vmx_handle_exit+0x169/0x15a0 [kvm_intel]
> >>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
> >>  kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
> >>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
> >>  ? vcpu_load+0x1c/0x60 [kvm]
> >>  ? kvm_arch_vcpu_load+0x57/0x260 [kvm]
> >>  kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
> >>  do_vfs_ioctl+0x96/0x6a0
> >>  ? __fget_light+0x2a/0x90
> >>  SyS_ioctl+0x79/0x90
> >>  do_syscall_64+0x68/0x180
> >>  entry_SYSCALL64_slow_path+0x25/0x25
> >> Code:  Bad RIP value.
> >> RIP  [<          (null)>]           (null)
> >>  RSP <ffff8800b5263c48>
> >> CR2: 0000000000000000
> >> ---[ end trace 9c70c48b1a2bc66e ]---
> >>
> >> This can be reproduced readily by preemption timer enabled on L0 and 
> >> disabled 
> >> on L1.
> >>
> >> Preemption timer for nested VMX is emulated by hrtimer which is started on 
> >> L2
> >> entry, stopped on L2 exit and evaluated via the check_nested_events hook. 
> >> However,
> >> nested_vmx_exit_handled is always return true for preemption timer vmexit, 
> >> then 
> >> the L1 preemption timer vmexit is captured and be treated as a L2 
> >> preemption 
> >> timer vmexit, incurr a nested vmexit dereference NULL pointer.
> >>
> >> This patch fix it by depending on check_nested_events to capture L2 
> >> preemption 
> >> timer(emulated hrtimer) expire and nested vmexit.
> >>
> >> Cc: Paolo Bonzini <pbonz...@redhat.com>
> >> Cc: Radim Krčmář <rkrc...@redhat.com>
> >> Cc: Yunhong Jiang <yunhong.ji...@intel.com>
> >> Cc: Jan Kiszka <jan.kis...@siemens.com>
> >> Cc: Haozhong Zhang <haozhong.zh...@intel.com>
> >> Signed-off-by: Wanpeng Li <wanpeng...@hotmail.com>
> >> ---
> >> v2 -> v3:
> >>  * update patch subject
> >> v1 -> v2:
> >>  * fix typo in patch description
> >>
> >>  arch/x86/kvm/vmx.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> index 85e2f0a..29c16a8 100644
> >> --- a/arch/x86/kvm/vmx.c
> >> +++ b/arch/x86/kvm/vmx.c
> >> @@ -8041,6 +8041,8 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu 
> >> *vcpu)
> >>            return nested_cpu_has2(vmcs12, SECONDARY_EXEC_XSAVES);
> >>    case EXIT_REASON_PCOMMIT:
> >>            return nested_cpu_has2(vmcs12, SECONDARY_EXEC_PCOMMIT);
> >> +  case EXIT_REASON_PREEMPTION_TIMER:
> >> +          return false;
> >>    default:
> >>            return true;
> >>    }
> >> -- 
> >> 1.9.1
> >>
> > 
> > This patch also fixed the crash of L1 Xen with L2 HVM guest. Xen does
> > not enable preemption timer for HVM guests, and will get panic if it
> > receives a preemption timer vmexit.
> 
> Thanks!  I'm still not sure why the bit is set in the vmcs02 though...
> 

Yes, it looks really weird.

I replaced "return false" in Wanpeng's patch by

    pr_info("VMCS: preemption timer enabled = %d\n",
            !!(vmcs_read32(PIN_BASED_VM_EXEC_CONTROL) & 
PIN_BASED_VMX_PREEMPTION_TIMER));

and redid my test. As expected, L1 Xen crashed due to the unexpected
preemption timer vmexit. I got a log from above statement just before crash:

    VMCS: preemption timer enabled = 1

which is expected to be 0, because preemption timer is disabled in
vmcs02. I also modified L1 Xen to dump VMCS at crash, and it says
preemption timer is disabled.

I noticed Jim Mattson recently sent a patch "KVM: nVMX: Fix memory
corruption when using VMCS shadowing" to fix the inconsistency between
vmcs12 and its shadow. Is it relevant here? I'll test his patch
tomorrow.

Thanks,
Haozhong

Reply via email to