On 5/15/2025 8:19 AM, Sean Christopherson wrote: > The shortlog is wildly inaccurate. KVM is not simply checking, KVM is > actively > disabling RDPMC interception. *That* needs to be the focus of the shortlog > and > changelog.
Sure. > >> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c >> index 92c742ead663..6ad71752be4b 100644 >> --- a/arch/x86/kvm/pmu.c >> +++ b/arch/x86/kvm/pmu.c >> @@ -604,6 +604,40 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, >> u64 *data) >> return 0; >> } >> >> +inline bool kvm_rdpmc_in_guest(struct kvm_vcpu *vcpu) > Strongly prefer kvm_need_rdpmc_intercept(), e.g. to follow > vmx_need_pf_intercept(), > and because it makes the users more obviously correct. The "in_guest" > terminology > from kvm_{hlt,mwait,pause,cstate}_in_guest() isn't great, but at least in > those > flows it's not awful because they are very direct reflections of knobs that > control > interception, whereas this helper is making a variety of runtime checks. Sure. > >> +{ >> + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); >> + >> + if (!kvm_mediated_pmu_enabled(vcpu)) >> + return false; >> + >> + /* >> + * VMware allows access to these Pseduo-PMCs even when read via RDPMC >> + * in Ring3 when CR4.PCE=0. >> + */ >> + if (enable_vmware_backdoor) >> + return false; >> + >> + /* >> + * FIXME: In theory, perf metrics is always combined with fixed >> + * counter 3. it's fair enough to compare the guest and host >> + * fixed counter number and don't need to check perf metrics >> + * explicitly. However kvm_pmu_cap.num_counters_fixed is limited >> + * KVM_MAX_NR_FIXED_COUNTERS (3) as fixed counter 3 is not >> + * supported now. perf metrics is still needed to be checked >> + * explicitly here. Once fixed counter 3 is supported, the perf >> + * metrics checking can be removed. >> + */ > And then what happens when hardware supported fixed counter #4? KVM has the > same > problem, and we can't check for features that KVM doesn't know about. > > The entire problem is that this code is checking for *KVM* support, but what > the > guest can see and access needs to be checked against *hardware* support. > Handling > that is simple, just take a snapshot of the host PMU capabilities before KVM > generates kvm_pmu_cap, and use the unadulterated snapshot here (and everywhere > else with similar checks). Yes. That's correct. Whether disabling intercept should check against HW instead of KVM PMU capability since host perf subsystem may hide some PMU features. > >> + return pmu->nr_arch_gp_counters == kvm_pmu_cap.num_counters_gp && >> + pmu->nr_arch_fixed_counters == kvm_pmu_cap.num_counters_fixed && >> + vcpu_has_perf_metrics(vcpu) == kvm_host_has_perf_metrics() && >> + pmu->counter_bitmask[KVM_PMC_GP] == >> + (BIT_ULL(kvm_pmu_cap.bit_width_gp) - 1) && >> + pmu->counter_bitmask[KVM_PMC_FIXED] == >> + (BIT_ULL(kvm_pmu_cap.bit_width_fixed) - 1); >> +} >> @@ -212,6 +212,18 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu) >> bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters); >> } >> >> +static void amd_pmu_refresh(struct kvm_vcpu *vcpu) >> +{ >> + struct vcpu_svm *svm = to_svm(vcpu); >> + >> + __amd_pmu_refresh(vcpu); > To better communicate the roles of the two paths to refresh(): > > amd_pmu_refresh_capabilities(vcpu); > > amd_pmu_refresh_controls(vcpu); > > Ditto for Intel. Sure.