Re: [PATCH v2 2/3] KVM: nVMX: add kvm_nested_vmlaunch_resume tracepoint
On Thu, 2021-01-14 at 16:14 -0800, Sean Christopherson wrote:
> On Thu, Jan 14, 2021, Maxim Levitsky wrote:
> > This is very helpful for debugging nested VMX issues.
> >
> > Signed-off-by: Maxim Levitsky
> > ---
> >  arch/x86/kvm/trace.h      | 30 ++
> >  arch/x86/kvm/vmx/nested.c |  6 ++
> >  arch/x86/kvm/x86.c        |  1 +
> >  3 files changed, 37 insertions(+)
> >
> > diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> > index 2de30c20bc264..663d1b1d8bf64 100644
> > --- a/arch/x86/kvm/trace.h
> > +++ b/arch/x86/kvm/trace.h
> > @@ -554,6 +554,36 @@ TRACE_EVENT(kvm_nested_vmrun,
> >  		  __entry->npt ? "on" : "off")
> >  );
> >
> > +
> > +/*
> > + * Tracepoint for nested VMLAUNCH/VMRESUME
>
> VM-Enter, as below.

Will do

> > + */
> > +TRACE_EVENT(kvm_nested_vmlaunch_resume,
>
> s/vmlaunch_resume/vmenter, both for consistency with other code and so that it
> can sanely be reused by SVM.  IMO, trace_kvm_entry is wrong :-).

SVM already has trace_kvm_nested_vmrun, and it contains some SVM-specific
stuff that doesn't exist on VMX and vice versa, so I do want to keep these
tracepoints separate.

> > +	TP_PROTO(__u64 rip, __u64 vmcs, __u64 nested_rip,
> > +		 __u32 entry_intr_info),
> > +	TP_ARGS(rip, vmcs, nested_rip, entry_intr_info),
> > +
> > +	TP_STRUCT__entry(
> > +		__field(__u64, rip)
> > +		__field(__u64, vmcs)
> > +		__field(__u64, nested_rip)
> > +		__field(__u32, entry_intr_info)
> > +	),
> > +
> > +	TP_fast_assign(
> > +		__entry->rip = rip;
> > +		__entry->vmcs = vmcs;
> > +		__entry->nested_rip = nested_rip;
> > +		__entry->entry_intr_info = entry_intr_info;
> > +	),
> > +
> > +	TP_printk("rip: 0x%016llx vmcs: 0x%016llx nrip: 0x%016llx "
> > +		  "entry_intr_info: 0x%08x",
> > +		  __entry->rip, __entry->vmcs, __entry->nested_rip,
> > +		  __entry->entry_intr_info)
> > +);
> > +
> > +
> >  TRACE_EVENT(kvm_nested_intercepts,
> >  	TP_PROTO(__u16 cr_read, __u16 cr_write, __u32 exceptions,
> >  		 __u32 intercept1, __u32 intercept2, __u32 intercept3),
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index 776688f9d1017..cd51b66480d52 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -3327,6 +3327,12 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
> >  	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
> >  		vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
> >
> > +	trace_kvm_nested_vmlaunch_resume(kvm_rip_read(vcpu),
>
> Hmm, won't this RIP be wrong for the migration case?  I.e. it'll be L2, not L1
> as is the case for the "true" nested VM-Enter path.
>
> > +					 vmx->nested.current_vmptr,
> > +					 vmcs12->guest_rip,
> > +					 vmcs12->vm_entry_intr_info_field);
>
> The placement is a bit funky.  I assume you put it here so that calls from
> vmx_set_nested_state() also get traced.  But, that also means
> vmx_pre_leave_smm() will get traced, and it also creates some weirdness where
> some nested VM-Enters that VM-Fail will get traced, but others will not.
>
> Tracing vmx_pre_leave_smm() isn't necessarily bad, but it could be confusing,
> especially if the debugger looks up the RIP and sees RSM.  Ditto for the
> migration case.
>
> Not sure what would be a good answer.
>
> > +
> > +
> >  	/*
> >  	 * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and*
> >  	 * nested early checks are disabled.  In the event of a "late" VM-Fail,
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index a480804ae27a3..7c6e94e32100e 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -11562,6 +11562,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr);
> > +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmlaunch_resume);
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun);
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit);
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit_inject);
> > --
> > 2.26.2
> >
Re: [PATCH v2 2/3] KVM: nVMX: add kvm_nested_vmlaunch_resume tracepoint
On Fri, 2021-01-15 at 08:30 -0800, Sean Christopherson wrote:
> On Fri, Jan 15, 2021, Paolo Bonzini wrote:
> > On 15/01/21 01:14, Sean Christopherson wrote:
> > > > +	trace_kvm_nested_vmlaunch_resume(kvm_rip_read(vcpu),
> > > Hmm, won't this RIP be wrong for the migration case?  I.e. it'll be L2, not L1
> > > as is the case for the "true" nested VM-Enter path.
> >
> > It will be the previous RIP---might as well be 0xfff0 depending on what
> > userspace does.  I don't think you can do much better than that, using
> > vmcs12->host_rip would be confusing in the SMM case.
> >
> > > > +					 vmx->nested.current_vmptr,
> > > > +					 vmcs12->guest_rip,
> > > > +					 vmcs12->vm_entry_intr_info_field);
> > > The placement is a bit funky.  I assume you put it here so that calls from
> > > vmx_set_nested_state() also get traced.  But, that also means
> > > vmx_pre_leave_smm() will get traced, and it also creates some weirdness where
> > > some nested VM-Enters that VM-Fail will get traced, but others will not.
> > >
> > > Tracing vmx_pre_leave_smm() isn't necessarily bad, but it could be confusing,
> > > especially if the debugger looks up the RIP and sees RSM.  Ditto for the
> > > migration case.
> >
> > Actually tracing vmx_pre_leave_smm() is good, and pointing to RSM makes
> > sense so I'm not worried about that.
>
> Ideally there would be something in the tracepoint to differentiate the various
> cases.  Not that the RSM/migration cases will pop up often, but I think it's an
> easily solved problem that could avoid confusion.
>
> What if we captured vmx->nested.smm.guest_mode and from_vmentry, and explicitly
> record what triggered the entry?
>
> 	TP_printk("from: %s rip: 0x%016llx vmcs: 0x%016llx nrip: 0x%016llx intr_info: 0x%08x",
> 		  __entry->vmenter ? "VM-Enter" : __entry->smm ? "RSM" : "SET_STATE",
> 		  __entry->rip, __entry->vmcs, __entry->nested_rip,
> 		  __entry->entry_intr_info

I think that this is a good idea, but should be done in a separate patch.

> Side topic, can we have an "official" ruling on whether KVM tracepoints should
> use colons and/or commas?  And probably same question for whether or not to
> prepend zeros.  E.g. kvm_entry has "vcpu %u, rip 0x%lx" versus "rip: 0x%016llx
> vmcs: 0x%016llx".  It bugs me that we're so inconsistent.

As I said, KVM tracing has a lot of things that can be improved, and since it
is often the only way to figure out complex bugs such as those I had to deal
with recently, I will do more improvements in this area as time permits.

Best regards,
	Maxim Levitsky
Re: [PATCH v2 2/3] KVM: nVMX: add kvm_nested_vmlaunch_resume tracepoint
On Fri, 2021-01-15 at 14:48 +0100, Paolo Bonzini wrote:
> On 15/01/21 01:14, Sean Christopherson wrote:
> > > +	trace_kvm_nested_vmlaunch_resume(kvm_rip_read(vcpu),
> > Hmm, won't this RIP be wrong for the migration case?  I.e. it'll be L2, not L1
> > as is the case for the "true" nested VM-Enter path.

Actually in this case the initial RIP of 0xfff0 will be printed, which isn't
that bad.  A tracepoint in the nested state load function would be very nice
to add, to mark this case explicitly.  I'll do this later.

> It will be the previous RIP---might as well be 0xfff0 depending on
> what userspace does.  I don't think you can do much better than that,
> using vmcs12->host_rip would be confusing in the SMM case.
>
> > > +					 vmx->nested.current_vmptr,
> > > +					 vmcs12->guest_rip,
> > > +					 vmcs12->vm_entry_intr_info_field);
> > The placement is a bit funky.  I assume you put it here so that calls from
> > vmx_set_nested_state() also get traced.  But, that also means
> > vmx_pre_leave_smm() will get traced, and it also creates some weirdness where
> > some nested VM-Enters that VM-Fail will get traced, but others will not.
> >
> > Tracing vmx_pre_leave_smm() isn't necessarily bad, but it could be confusing,
> > especially if the debugger looks up the RIP and sees RSM.  Ditto for the
> > migration case.
>
> Actually tracing vmx_pre_leave_smm() is good, and pointing to RSM makes
> sense so I'm not worried about that.
>
> Paolo

I agree with that, and indeed this was my intention.  In fact I will change
SVM's tracepoint to behave the same way in the next patch series (I'll move
it to enter_svm_guest_mode).  (When I wrote this patch I somehow thought that
this is what SVM already does.)

Best regards,
	Maxim Levitsky
Re: [PATCH v2 2/3] KVM: nVMX: add kvm_nested_vmlaunch_resume tracepoint
On Fri, Jan 15, 2021, Paolo Bonzini wrote:
> On 15/01/21 01:14, Sean Christopherson wrote:
> > > +	trace_kvm_nested_vmlaunch_resume(kvm_rip_read(vcpu),
> > Hmm, won't this RIP be wrong for the migration case?  I.e. it'll be L2, not L1
> > as is the case for the "true" nested VM-Enter path.
>
> It will be the previous RIP---might as well be 0xfff0 depending on what
> userspace does.  I don't think you can do much better than that, using
> vmcs12->host_rip would be confusing in the SMM case.
>
> > > +					 vmx->nested.current_vmptr,
> > > +					 vmcs12->guest_rip,
> > > +					 vmcs12->vm_entry_intr_info_field);
> > The placement is a bit funky.  I assume you put it here so that calls from
> > vmx_set_nested_state() also get traced.  But, that also means
> > vmx_pre_leave_smm() will get traced, and it also creates some weirdness where
> > some nested VM-Enters that VM-Fail will get traced, but others will not.
> >
> > Tracing vmx_pre_leave_smm() isn't necessarily bad, but it could be confusing,
> > especially if the debugger looks up the RIP and sees RSM.  Ditto for the
> > migration case.
>
> Actually tracing vmx_pre_leave_smm() is good, and pointing to RSM makes
> sense so I'm not worried about that.

Ideally there would be something in the tracepoint to differentiate the various
cases.  Not that the RSM/migration cases will pop up often, but I think it's an
easily solved problem that could avoid confusion.

What if we captured vmx->nested.smm.guest_mode and from_vmentry, and explicitly
record what triggered the entry?

	TP_printk("from: %s rip: 0x%016llx vmcs: 0x%016llx nrip: 0x%016llx intr_info: 0x%08x",
		  __entry->vmenter ? "VM-Enter" : __entry->smm ? "RSM" : "SET_STATE",
		  __entry->rip, __entry->vmcs, __entry->nested_rip,
		  __entry->entry_intr_info

Side topic, can we have an "official" ruling on whether KVM tracepoints should
use colons and/or commas?  And probably same question for whether or not to
prepend zeros.  E.g. kvm_entry has "vcpu %u, rip 0x%lx" versus "rip: 0x%016llx
vmcs: 0x%016llx".  It bugs me that we're so inconsistent.
Re: [PATCH v2 2/3] KVM: nVMX: add kvm_nested_vmlaunch_resume tracepoint
On 15/01/21 01:14, Sean Christopherson wrote:
> > +	trace_kvm_nested_vmlaunch_resume(kvm_rip_read(vcpu),
>
> Hmm, won't this RIP be wrong for the migration case?  I.e. it'll be L2, not L1
> as is the case for the "true" nested VM-Enter path.

It will be the previous RIP---might as well be 0xfff0 depending on what
userspace does.  I don't think you can do much better than that, using
vmcs12->host_rip would be confusing in the SMM case.

> > +					 vmx->nested.current_vmptr,
> > +					 vmcs12->guest_rip,
> > +					 vmcs12->vm_entry_intr_info_field);
>
> The placement is a bit funky.  I assume you put it here so that calls from
> vmx_set_nested_state() also get traced.  But, that also means
> vmx_pre_leave_smm() will get traced, and it also creates some weirdness where
> some nested VM-Enters that VM-Fail will get traced, but others will not.
>
> Tracing vmx_pre_leave_smm() isn't necessarily bad, but it could be confusing,
> especially if the debugger looks up the RIP and sees RSM.  Ditto for the
> migration case.

Actually tracing vmx_pre_leave_smm() is good, and pointing to RSM makes
sense so I'm not worried about that.

Paolo
Re: [PATCH v2 2/3] KVM: nVMX: add kvm_nested_vmlaunch_resume tracepoint
On Thu, Jan 14, 2021, Maxim Levitsky wrote:
> This is very helpful for debugging nested VMX issues.
>
> Signed-off-by: Maxim Levitsky
> ---
>  arch/x86/kvm/trace.h      | 30 ++
>  arch/x86/kvm/vmx/nested.c |  6 ++
>  arch/x86/kvm/x86.c        |  1 +
>  3 files changed, 37 insertions(+)
>
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index 2de30c20bc264..663d1b1d8bf64 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -554,6 +554,36 @@ TRACE_EVENT(kvm_nested_vmrun,
>  		  __entry->npt ? "on" : "off")
>  );
>
> +
> +/*
> + * Tracepoint for nested VMLAUNCH/VMRESUME

VM-Enter, as below.

> + */
> +TRACE_EVENT(kvm_nested_vmlaunch_resume,

s/vmlaunch_resume/vmenter, both for consistency with other code and so that it
can sanely be reused by SVM.  IMO, trace_kvm_entry is wrong :-).

> +	TP_PROTO(__u64 rip, __u64 vmcs, __u64 nested_rip,
> +		 __u32 entry_intr_info),
> +	TP_ARGS(rip, vmcs, nested_rip, entry_intr_info),
> +
> +	TP_STRUCT__entry(
> +		__field(__u64, rip)
> +		__field(__u64, vmcs)
> +		__field(__u64, nested_rip)
> +		__field(__u32, entry_intr_info)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->rip = rip;
> +		__entry->vmcs = vmcs;
> +		__entry->nested_rip = nested_rip;
> +		__entry->entry_intr_info = entry_intr_info;
> +	),
> +
> +	TP_printk("rip: 0x%016llx vmcs: 0x%016llx nrip: 0x%016llx "
> +		  "entry_intr_info: 0x%08x",
> +		  __entry->rip, __entry->vmcs, __entry->nested_rip,
> +		  __entry->entry_intr_info)
> +);
> +
> +
>  TRACE_EVENT(kvm_nested_intercepts,
>  	TP_PROTO(__u16 cr_read, __u16 cr_write, __u32 exceptions,
>  		 __u32 intercept1, __u32 intercept2, __u32 intercept3),
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 776688f9d1017..cd51b66480d52 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3327,6 +3327,12 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
>  	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
>  		vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
>
> +	trace_kvm_nested_vmlaunch_resume(kvm_rip_read(vcpu),

Hmm, won't this RIP be wrong for the migration case?  I.e. it'll be L2, not L1
as is the case for the "true" nested VM-Enter path.

> +					 vmx->nested.current_vmptr,
> +					 vmcs12->guest_rip,
> +					 vmcs12->vm_entry_intr_info_field);

The placement is a bit funky.  I assume you put it here so that calls from
vmx_set_nested_state() also get traced.  But, that also means
vmx_pre_leave_smm() will get traced, and it also creates some weirdness where
some nested VM-Enters that VM-Fail will get traced, but others will not.

Tracing vmx_pre_leave_smm() isn't necessarily bad, but it could be confusing,
especially if the debugger looks up the RIP and sees RSM.  Ditto for the
migration case.

Not sure what would be a good answer.

> +
> +
>  	/*
>  	 * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and*
>  	 * nested early checks are disabled.  In the event of a "late" VM-Fail,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a480804ae27a3..7c6e94e32100e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11562,6 +11562,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmlaunch_resume);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit_inject);
> --
> 2.26.2
>
[PATCH v2 2/3] KVM: nVMX: add kvm_nested_vmlaunch_resume tracepoint
This is very helpful for debugging nested VMX issues.

Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/trace.h      | 30 ++
 arch/x86/kvm/vmx/nested.c |  6 ++
 arch/x86/kvm/x86.c        |  1 +
 3 files changed, 37 insertions(+)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 2de30c20bc264..663d1b1d8bf64 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -554,6 +554,36 @@ TRACE_EVENT(kvm_nested_vmrun,
 		  __entry->npt ? "on" : "off")
 );

+
+/*
+ * Tracepoint for nested VMLAUNCH/VMRESUME
+ */
+TRACE_EVENT(kvm_nested_vmlaunch_resume,
+	TP_PROTO(__u64 rip, __u64 vmcs, __u64 nested_rip,
+		 __u32 entry_intr_info),
+	TP_ARGS(rip, vmcs, nested_rip, entry_intr_info),
+
+	TP_STRUCT__entry(
+		__field(__u64, rip)
+		__field(__u64, vmcs)
+		__field(__u64, nested_rip)
+		__field(__u32, entry_intr_info)
+	),
+
+	TP_fast_assign(
+		__entry->rip = rip;
+		__entry->vmcs = vmcs;
+		__entry->nested_rip = nested_rip;
+		__entry->entry_intr_info = entry_intr_info;
+	),
+
+	TP_printk("rip: 0x%016llx vmcs: 0x%016llx nrip: 0x%016llx "
+		  "entry_intr_info: 0x%08x",
+		  __entry->rip, __entry->vmcs, __entry->nested_rip,
+		  __entry->entry_intr_info)
+);
+
+
 TRACE_EVENT(kvm_nested_intercepts,
 	TP_PROTO(__u16 cr_read, __u16 cr_write, __u32 exceptions,
 		 __u32 intercept1, __u32 intercept2, __u32 intercept3),
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 776688f9d1017..cd51b66480d52 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3327,6 +3327,12 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
 		vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);

+	trace_kvm_nested_vmlaunch_resume(kvm_rip_read(vcpu),
+					 vmx->nested.current_vmptr,
+					 vmcs12->guest_rip,
+					 vmcs12->vm_entry_intr_info_field);
+
+
 	/*
 	 * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and*
 	 * nested early checks are disabled.  In the event of a "late" VM-Fail,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a480804ae27a3..7c6e94e32100e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11562,6 +11562,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmlaunch_resume);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit_inject);
--
2.26.2