vmx_flush_pml_buffer repeatedly calls kvm_vcpu_mark_page_dirty, which
SRCU-dereferences kvm->memslots on every call. In order to give the
compiler more freedom to optimize the function, SRCU-dereference the
kvm->memslots pointer only once.

Reviewed-by: Makarand Sonare <makarandson...@google.com>
Signed-off-by: Ben Gardon <bgar...@google.com>

---
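
For context on where the repeated cost comes from: kvm_vcpu_mark_page_dirty
re-resolves the memslot on every call. Roughly (a simplified paraphrase of
the generic KVM helpers, not the verbatim kernel source), the per-call path
looks like this:

void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn)
{
	/* Re-derives kvm->memslots (an SRCU dereference) on every call. */
	struct kvm_memory_slot *memslot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);

	mark_page_dirty_in_slot(vcpu->kvm, memslot, gfn);
}

struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu,
						gfn_t gfn)
{
	/* kvm_vcpu_memslots() boils down to srcu_dereference(kvm->memslots). */
	return __gfn_to_memslot(kvm_vcpu_memslots(vcpu), gfn);
}

Hoisting the kvm_vcpu_memslots() call out of the PML loop, as the diff
below does, performs that dereference once per buffer flush instead of
once per entry.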

Tested by running the dirty_log_perf_test selftest on a dual-socket
Intel Skylake machine:
./dirty_log_perf_test -v 4 -b 30G -i 5

The test was run 5 times with and without this patch, and the dirty
memory time for iterations 2-5 was averaged across the 5 runs.
Iteration 1 was discarded from this analysis because it is still
dominated by the time spent populating memory.

The per-run average times showed a strange bimodal distribution, with
clusters around 2 seconds and 2.5 seconds. This may have been a result
of vCPU migration between NUMA nodes.

In any case, the dirty memory times with this patch averaged 2.07
seconds, a 7% savings over the 2.22 second average without this patch.

While these savings may be partly a result of the patched runs having
one more run in the 2 second cluster, the patched runs in the higher
cluster were also 7-8% shorter than those in the unpatched case.

Below is the raw data for anyone interested in visualizing the results
with a graph; a short sketch for re-deriving the averages follows the
table:
Iteration       Baseline        Patched
2               2.038562907     2.045226614
3               2.037363248     2.045033709
4               2.037176331     1.999783966
5               1.999891981     2.007849104
2               2.569526298     2.001252504
3               2.579110209     2.008541897
4               2.585883731     2.005317983
5               2.588692727     2.007100987
2               2.01191437      2.006953735
3               2.012972236     2.04540153
4               1.968836017     2.005035246
5               1.967915154     2.003859551
2               2.037533296     1.991275846
3               2.501480125     2.391886691
4               2.454382587     2.391904789
5               2.461046772     2.398767963
2               2.036991484     2.011331436
3               2.002954418     2.002635687
4               2.053342717     2.006769959
5               2.522539759     2.006470059
Average         2.223405818     2.069119963
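
For anyone who wants to re-derive the averages, here is a minimal
standalone sketch (the file name results.txt is hypothetical; it is
assumed to hold the 20 data rows above, without the header or the
Average line):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	FILE *f = fopen("results.txt", "r");
	double base, patched, base_sum = 0.0, patched_sum = 0.0;
	int iter, n = 0;

	if (!f) {
		perror("results.txt");
		return EXIT_FAILURE;
	}

	/* Each row: iteration number, baseline time, patched time. */
	while (fscanf(f, "%d %lf %lf", &iter, &base, &patched) == 3) {
		base_sum += base;
		patched_sum += patched;
		n++;
	}
	fclose(f);

	if (n) {
		printf("Baseline average: %.9f\n", base_sum / n);
		printf("Patched average:  %.9f\n", patched_sum / n);
		/*
		 * Both columns have the same row count, so the ratio of
		 * sums equals the ratio of means.
		 */
		printf("Savings: %.1f%%\n",
		       100.0 * (1.0 - patched_sum / base_sum));
	}
	return 0;
}

With the table above this prints averages of ~2.223 and ~2.069 seconds,
i.e. the ~7% savings cited above.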

 arch/x86/kvm/vmx/vmx.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cc60b1fc3ee7..46c54802dfdb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5692,6 +5692,7 @@ static void vmx_destroy_pml_buffer(struct vcpu_vmx *vmx)
 static void vmx_flush_pml_buffer(struct kvm_vcpu *vcpu)
 {
        struct vcpu_vmx *vmx = to_vmx(vcpu);
+       struct kvm_memslots *memslots;
        u64 *pml_buf;
        u16 pml_idx;
 
@@ -5707,13 +5708,18 @@ static void vmx_flush_pml_buffer(struct kvm_vcpu *vcpu)
        else
                pml_idx++;
 
+       memslots = kvm_vcpu_memslots(vcpu);
+
        pml_buf = page_address(vmx->pml_pg);
        for (; pml_idx < PML_ENTITY_NUM; pml_idx++) {
+               struct kvm_memory_slot *memslot;
                u64 gpa;
 
                gpa = pml_buf[pml_idx];
                WARN_ON(gpa & (PAGE_SIZE - 1));
-               kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
+
+               memslot = __gfn_to_memslot(memslots, gpa >> PAGE_SHIFT);
+               mark_page_dirty_in_slot(vcpu->kvm, memslot, gpa >> PAGE_SHIFT);
        }
 
        /* reset PML index */
-- 
2.30.0.365.g02bc693789-goog
