On 2020/5/19 19:15, Peter Zijlstra wrote:
> On Thu, May 14, 2020 at 04:30:53PM +0800, Like Xu wrote:
>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>> index ea4faae56473..db185dca903d 100644
>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>> @@ -646,6 +646,43 @@ static void intel_pmu_lbr_cleanup(struct kvm_vcpu *vcpu)
>>  		intel_pmu_free_lbr_event(vcpu);
>>  }
>>
>> +static bool intel_pmu_lbr_is_availabile(struct kvm_vcpu *vcpu)
>> +{
>> +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> +
>> +	if (!pmu->lbr_event)
>> +		return false;
>> +
>> +	if (event_is_oncpu(pmu->lbr_event)) {
>> +		intel_pmu_intercept_lbr_msrs(vcpu, false);
>> +	} else {
>> +		intel_pmu_intercept_lbr_msrs(vcpu, true);
>> +		return false;
>> +	}
>> +
>> +	return true;
>> +}
>
> This is unreadable gunk, what?
Abstractly, it is saying "KVM would pass the LBR stack MSRs through to the
vCPU if event_is_oncpu() is true, otherwise cancel the passthrough state,
if any."

I'm using 'event->oncpu != -1' rather than 'event->state ==
PERF_EVENT_STATE_ERROR' to represent that the guest LBR event is scheduled
on. For intel_pmu_intercept_lbr_msrs(), false means passing the LBR stack
MSRs through to the vCPU, and true means cancelling the passthrough state
so that LBR MSR accesses are trapped by KVM.
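For reference, a minimal sketch of the scheduled-on test assumed above,
treating event_is_oncpu() as a trivial wrapper around perf_event::oncpu
(a sketch only, not necessarily the exact helper from this series):

#include <linux/perf_event.h>

static inline bool event_is_oncpu(struct perf_event *event)
{
	/* perf sets ->oncpu to the owning CPU on sched-in, -1 on sched-out */
	return event && event->oncpu != -1;
}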
>> +/*
>> + * Higher priority host perf events (e.g. cpu pinned) could reclaim the
>> + * pmu resources (e.g. LBR) that were assigned to the guest. This is
>> + * usually done via ipi calls (more details in perf_install_in_context).
>> + *
>> + * Before entering the non-root mode (with irq disabled here), double
>> + * confirm that the pmu features enabled to the guest are not reclaimed
>> + * by higher priority host events. Otherwise, disallow vcpu's access to
>> + * the reclaimed features.
>> + */
>> +static void intel_pmu_availability_check(struct kvm_vcpu *vcpu)
>> +{
>> +	lockdep_assert_irqs_disabled();
>> +
>> +	if (lbr_is_enabled(vcpu) && !intel_pmu_lbr_is_availabile(vcpu) &&
>> +	    (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR))
>> +		pr_warn_ratelimited("kvm: vcpu-%d: LBR is temporarily unavailable.\n",
>> +				    vcpu->vcpu_id);
>
> More unreadable nonsense; when the events go into ERROR state, it's a
> permanent fail, they'll not come back.

That's not true. A guest LBR event in 'ERROR state' or with 'oncpu != -1'
is lazily released and re-created the next time intel_pmu_create_lbr_event()
is called, at which point it is supposed to be re-scheduled and to redo the
availability_check() as well.
From the perspective of a guest user, the guest LBR is only temporarily
unavailable: it comes back once the host no longer reclaims the LBR.
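As a rough illustration of that lazy release/re-create path,
intel_pmu_free_lbr_event() and intel_pmu_create_lbr_event() are the
helpers named in this series, but the composition below and the function
name intel_pmu_recreate_lbr_event() are hypothetical; the real trigger
and ordering live in the series' LBR MSR trap handling:

/*
 * Hypothetical sketch: drop a stale guest LBR event and build a fresh
 * one, so perf re-schedules it and the next vcpu entry redoes
 * intel_pmu_availability_check().
 */
static void intel_pmu_recreate_lbr_event(struct kvm_vcpu *vcpu)
{
	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);

	/* Lazily release the stale event (e.g. after a host reclaim). */
	if (pmu->lbr_event)
		intel_pmu_free_lbr_event(vcpu);

	/* Re-create the event; perf will try to schedule it again. */
	intel_pmu_create_lbr_event(vcpu);
}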
>> +}
>> +
>>  struct kvm_pmu_ops intel_pmu_ops = {
>>  	.find_arch_event = intel_find_arch_event,
>>  	.find_fixed_event = intel_find_fixed_event,
>> @@ -662,4 +699,5 @@ struct kvm_pmu_ops intel_pmu_ops = {
>>  	.reset = intel_pmu_reset,
>>  	.deliver_pmi = intel_pmu_deliver_pmi,
>>  	.lbr_cleanup = intel_pmu_lbr_cleanup,
>> +	.availability_check = intel_pmu_availability_check,
>>  };
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 9969d663826a..80d036c5f64a 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -6696,8 +6696,10 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>
>>  	pt_guest_enter(vmx);
>>
>> -	if (vcpu_to_pmu(vcpu)->version)
>> +	if (vcpu_to_pmu(vcpu)->version) {
>>  		atomic_switch_perf_msrs(vmx);
>> +		kvm_x86_ops.pmu_ops->availability_check(vcpu);
>> +	}
>
> AFAICT you just did a call out to the kvm_pmu crud in
> atomic_switch_perf_msrs(), why do another call?
In fact, availability_check() is only called here, and only this once;
it is not a second call out of atomic_switch_perf_msrs(). The call chain
looks like:

- vmx_vcpu_run()
  - kvm_x86_ops.pmu_ops->availability_check()
    - intel_pmu_availability_check()
      - intel_pmu_lbr_is_availabile()
        - event_is_oncpu()
        ...
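To make that single call site explicit, here is a hypothetical wrapper
(kvm_pmu_availability_check() is an invented name; the quoted diff calls
the op directly from vmx_vcpu_run()):

/*
 * Hypothetical sketch, for illustration only. Either way, the check runs
 * once per VM entry, after atomic_switch_perf_msrs() and with IRQs still
 * disabled, so no host IPI can reclaim the LBR between this check and
 * the actual VM entry.
 */
static void kvm_pmu_availability_check(struct kvm_vcpu *vcpu)
{
	lockdep_assert_irqs_disabled();

	if (vcpu_to_pmu(vcpu)->version)
		kvm_x86_ops.pmu_ops->availability_check(vcpu);
}

Thanks,
Like Xu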

