smp_load_acquire() is enough here, and it is cheaper than smp_mb(). Also add a comment explaining that we reuse the memory barrier in kvm_make_all_cpus_request() to order modifications to the page tables against the read of vcpu->mode.
Signed-off-by: Lan Tianyu <tianyu....@intel.com>
---
 virt/kvm/kvm_main.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ec5aa8d..39ebee9a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -191,9 +191,23 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
 #ifndef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
 void kvm_flush_remote_tlbs(struct kvm *kvm)
 {
-	long dirty_count = kvm->tlbs_dirty;
+	/*
+	 * Read tlbs_dirty before setting KVM_REQ_TLB_FLUSH in
+	 * kvm_make_all_cpus_request.
+	 */
+	long dirty_count = smp_load_acquire(&kvm->tlbs_dirty);
 
-	smp_mb();
+	/*
+	 * We want to publish modifications to the page tables before reading
+	 * mode. Pairs with a memory barrier in arch-specific code.
+	 * - x86: smp_mb__after_srcu_read_unlock in vcpu_enter_guest
+	 *   and smp_mb in walk_shadow_page_lockless_begin/end.
+	 * - powerpc: smp_mb in kvmppc_prepare_to_enter.
+	 *
+	 * There is already an smp_mb__after_atomic() before
+	 * kvm_make_all_cpus_request() reads vcpu->mode. We reuse that
+	 * barrier here.
+	 */
 	if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
 		++kvm->stat.remote_tlb_flush;
 	cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
-- 
1.8.4.rc0.1.g8f6a3e5.dirty
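
For reference only (not part of the patch): below is a minimal userspace sketch of the two-sided full-barrier pattern the new comment describes, using C11 atomics and pthreads. Names such as page_table, vcpu_mode and IN_GUEST_MODE are stand-ins, not the real KVM data structures. The flusher publishes its update and then reads the vCPU's mode, while the vCPU publishes its mode and then reads the update, so at least one side is guaranteed to observe the other (the classic store-buffering litmus test).

/*
 * Sketch of the barrier pairing relied on in kvm_flush_remote_tlbs().
 * Build with: cc -std=c11 -pthread sketch.c
 * All names here are illustrative; they only mirror the roles played by
 * the shadow page tables, vcpu->mode and IN_GUEST_MODE in the kernel.
 */
#include <stdatomic.h>
#include <stdio.h>
#include <pthread.h>

static atomic_long page_table;   /* stands in for the shadow page tables */
static atomic_int  vcpu_mode;    /* stands in for vcpu->mode             */

#define OUTSIDE_GUEST_MODE 0
#define IN_GUEST_MODE      1

/* Flusher side: analogous to kvm_flush_remote_tlbs(). */
static void *flusher(void *arg)
{
	(void)arg;

	/* Publish the page-table update... */
	atomic_store_explicit(&page_table, 1, memory_order_relaxed);

	/*
	 * ...then a full barrier before reading the vCPU's mode.  In the
	 * kernel this is the smp_mb__after_atomic() that follows setting
	 * KVM_REQ_TLB_FLUSH inside kvm_make_all_cpus_request().
	 */
	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load_explicit(&vcpu_mode, memory_order_relaxed) == IN_GUEST_MODE)
		printf("flusher: vCPU in guest mode, would kick it\n");
	else
		printf("flusher: vCPU will see the update before next entry\n");
	return NULL;
}

/* vCPU side: analogous to vcpu_enter_guest() / kvmppc_prepare_to_enter(). */
static void *vcpu(void *arg)
{
	(void)arg;

	/* Enter guest mode... */
	atomic_store_explicit(&vcpu_mode, IN_GUEST_MODE, memory_order_relaxed);

	/*
	 * ...then a full barrier before touching the page tables, pairing
	 * with the flusher's barrier (smp_mb__after_srcu_read_unlock() on
	 * x86, smp_mb() in kvmppc_prepare_to_enter() on powerpc).
	 */
	atomic_thread_fence(memory_order_seq_cst);

	long pt = atomic_load_explicit(&page_table, memory_order_relaxed);
	printf("vcpu: sees page_table = %ld\n", pt);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, flusher, NULL);
	pthread_create(&b, NULL, vcpu, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

With both fences in place, the outcome where the flusher reads OUTSIDE_GUEST_MODE and the vCPU simultaneously reads page_table == 0 is forbidden, which is exactly the property the patch relies on when it drops the explicit smp_mb() and reuses the barrier inside kvm_make_all_cpus_request().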