vmx_flush_pml_buffer repeatedly calls kvm_vcpu_mark_page_dirty, which SRCU-dereferences kvm->memslots. In order to give the compiler more freedom to optimize the function, SRCU-dereference the pointer kvm->memslots only once.

Reviewed-by: Makarand Sonare <makarandson...@google.com>
Signed-off-by: Ben Gardon <bgar...@google.com>
---

Tested by running the dirty_log_perf_test selftest on a dual socket Intel
Skylake machine:
./dirty_log_perf_test -v 4 -b 30G -i 5

The test was run 5 times with and without this patch and the dirty
memory time for iterations 2-5 was averaged across the 5 runs.
Iteration 1 was discarded for this analysis because it is still
dominated by the time spent populating memory.

The average time for each run demonstrated a strange bimodal
distribution, with clusters around 2 seconds and 2.5 seconds. This may
have been a result of vCPU migration between NUMA nodes.

In any case, the get dirty times with this patch averaged to 2.07
seconds, a 7% savings from the 2.22 second average without this patch.

While these savings may be partly a result of the patched runs having
one more run in the 2 second cluster, the patched runs in the higher
cluster were also 7-8% shorter than those in the unpatched case.

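The averages quoted above can be re-derived from the raw data in this mail. The following is a minimal sketch of that arithmetic, not part of the patch; the array and helper names (baseline, patched, mean, savings_pct) are made up for illustration, and the values are the iteration 2-5 times of the 5 runs in seconds.

```c
#include <assert.h>
#include <stddef.h>

/* Times in seconds, iterations 2-5 of each of the 5 runs (raw data). */
const double baseline[] = {
	2.038562907, 2.037363248, 2.037176331, 1.999891981,
	2.569526298, 2.579110209, 2.585883731, 2.588692727,
	2.01191437,  2.012972236, 1.968836017, 1.967915154,
	2.037533296, 2.501480125, 2.454382587, 2.461046772,
	2.036991484, 2.002954418, 2.053342717, 2.522539759,
};

const double patched[] = {
	2.045226614, 2.045033709, 1.999783966, 2.007849104,
	2.001252504, 2.008541897, 2.005317983, 2.007100987,
	2.006953735, 2.04540153,  2.005035246, 2.003859551,
	1.991275846, 2.391886691, 2.391904789, 2.398767963,
	2.011331436, 2.002635687, 2.006769959, 2.006470059,
};

#define N_SAMPLES (sizeof(baseline) / sizeof(baseline[0]))

/* Arithmetic mean of n samples. */
double mean(const double *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative savings of "after" versus "before", in percent. */
double savings_pct(double before, double after)
{
	return 100.0 * (before - after) / before;
}
```

With these inputs, mean(baseline, N_SAMPLES) is about 2.2234 and mean(patched, N_SAMPLES) is about 2.0691, so savings_pct() of the two is about 6.9%, which matches the rounded 7% figure.
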
Below is the raw data for anyone interested in visualizing the results
with a graph:
Iteration       Baseline        Patched
2               2.038562907     2.045226614
3               2.037363248     2.045033709
4               2.037176331     1.999783966
5               1.999891981     2.007849104
2               2.569526298     2.001252504
3               2.579110209     2.008541897
4               2.585883731     2.005317983
5               2.588692727     2.007100987
2               2.01191437      2.006953735
3               2.012972236     2.04540153
4               1.968836017     2.005035246
5               1.967915154     2.003859551
2               2.037533296     1.991275846
3               2.501480125     2.391886691
4               2.454382587     2.391904789
5               2.461046772     2.398767963
2               2.036991484     2.011331436
3               2.002954418     2.002635687
4               2.053342717     2.006769959
5               2.522539759     2.006470059
Average         2.223405818     2.069119963

 arch/x86/kvm/vmx/vmx.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cc60b1fc3ee7..46c54802dfdb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5692,6 +5692,7 @@ static void vmx_destroy_pml_buffer(struct vcpu_vmx *vmx)
 static void vmx_flush_pml_buffer(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	struct kvm_memslots *memslots;
 	u64 *pml_buf;
 	u16 pml_idx;
 
@@ -5707,13 +5708,18 @@ static void vmx_flush_pml_buffer(struct kvm_vcpu *vcpu)
 	else
 		pml_idx++;
 
+	memslots = kvm_vcpu_memslots(vcpu);
+
 	pml_buf = page_address(vmx->pml_pg);
 	for (; pml_idx < PML_ENTITY_NUM; pml_idx++) {
+		struct kvm_memory_slot *memslot;
 		u64 gpa;
 
 		gpa = pml_buf[pml_idx];
 		WARN_ON(gpa & (PAGE_SIZE - 1));
-		kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
+
+		memslot = __gfn_to_memslot(memslots, gpa >> PAGE_SHIFT);
+		mark_page_dirty_in_slot(vcpu->kvm, memslot, gpa >> PAGE_SHIFT);
 	}
 
 	/* reset PML index */
-- 
2.30.0.365.g02bc693789-goog
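
For readers who want to see the shape of the change outside the kernel, here is a userspace sketch of the same transformation. It is only an analogy: every type and function in it (sketch_vm, sketch_slots, sketch_deref, and so on) is hypothetical, a volatile pointer read stands in for the SRCU dereference, and a single 64-bit bitmap stands in for the memslot dirty bitmaps. It assumes the pointer read before the loop stays valid for the whole loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* 4 KiB pages, as on x86 */
#define SKETCH_NPAGES     64	/* hypothetical guest size, in pages */

/* Hypothetical stand-in for struct kvm_memslots: one dirty bit per page. */
struct sketch_slots {
	uint64_t dirty_bitmap;
};

/* Hypothetical stand-in for struct kvm: a published pointer to the slots. */
struct sketch_vm {
	struct sketch_slots *volatile slots;
};

/* Volatile load: the compiler has to redo it on every call. */
struct sketch_slots *sketch_deref(struct sketch_vm *vm)
{
	return vm->slots;
}

/* Set the dirty bit for one guest frame number. */
void sketch_mark_dirty(struct sketch_slots *s, uint64_t gfn)
{
	assert(gfn < SKETCH_NPAGES);
	s->dirty_bitmap |= UINT64_C(1) << gfn;
}

/* Before: the pointer is re-read for every logged GPA. */
void flush_reload(struct sketch_vm *vm, const uint64_t *gpas, size_t n)
{
	for (size_t i = 0; i < n; i++)
		sketch_mark_dirty(sketch_deref(vm),
				  gpas[i] >> SKETCH_PAGE_SHIFT);
}

/* After: the pointer is read once and reused inside the loop. */
void flush_hoisted(struct sketch_vm *vm, const uint64_t *gpas, size_t n)
{
	struct sketch_slots *s = sketch_deref(vm);

	for (size_t i = 0; i < n; i++)
		sketch_mark_dirty(s, gpas[i] >> SKETCH_PAGE_SHIFT);
}
```

Both functions mark the same pages dirty for the same input. The second one leaves the compiler free to keep the pointer in a register across iterations, which is the freedom the commit message refers to.
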