When freeing or unsharing page tables, we send an IPI to synchronize with
concurrent lockless page table walkers (e.g. GUP-fast). Today we broadcast
that IPI to all CPUs, which is costly on large machines and hurts RT
workloads[1].

This series makes those IPIs targeted. We track which CPUs are currently
doing a lockless page table walk for a given mm (per-CPU
active_lockless_pt_walk_mm). When we need to sync, we only IPI those CPUs.
GUP-fast and perf_get_page_size() set/clear the tracker around their walk;
tlb_remove_table_sync_mm() uses it and replaces the previous broadcast in
the free/unshare paths.

On x86, when the TLB flush path already sends IPIs (native without INVLPGB,
or KVM), the extra sync IPI is redundant. We add a property on pv_mmu_ops
so each backend can declare whether its flush_tlb_multi() sends real IPIs;
if so, tlb_remove_table_sync_mm() is a no-op.

We also have tlb_flush() pass both freed_tables and unshared_tables, so
lazy-TLB CPUs still get IPIs during hugetlb unshare.
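To make the tracking part concrete, here is a minimal sketch, not the
actual patch: active_lockless_pt_walk_mm and tlb_remove_table_sync_mm()
are named as in this cover letter, while lockless_pt_walk_begin()/_end()
are placeholder helpers, and the real series may handle memory ordering
and hotplug differently:

	/*
	 * Sketch only. Walkers publish the mm they are about to walk;
	 * the sync side IPIs only CPUs whose published mm matches.
	 * GUP-fast runs with IRQs disabled, so once the IPI has been
	 * acknowledged the walker has left its critical section.
	 */
	static DEFINE_PER_CPU(struct mm_struct *, active_lockless_pt_walk_mm);

	static inline void lockless_pt_walk_begin(struct mm_struct *mm)
	{
		this_cpu_write(active_lockless_pt_walk_mm, mm);
		smp_mb();	/* publish before reading page tables */
	}

	static inline void lockless_pt_walk_end(void)
	{
		smp_mb();	/* finish the walk before unpublishing */
		this_cpu_write(active_lockless_pt_walk_mm, NULL);
	}

	static void tlb_remove_table_smp_sync(void *arg)
	{
		/* The interrupt itself is the synchronization point. */
	}

	void tlb_remove_table_sync_mm(struct mm_struct *mm)
	{
		int cpu;

		cpus_read_lock();
		for_each_online_cpu(cpu) {
			if (per_cpu(active_lockless_pt_walk_mm, cpu) == mm)
				smp_call_function_single(cpu,
						tlb_remove_table_smp_sync,
						NULL, 1);
		}
		cpus_read_unlock();
	}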
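Similarly, a rough sketch of the x86 side; the field name
flush_tlb_multi_sends_ipi is illustrative, not the name the patches use:

	/* Each backend declares whether flush_tlb_multi() sends IPIs. */
	struct pv_mmu_ops {
		/* ... existing ops ... */
		void (*flush_tlb_multi)(const struct cpumask *cpus,
					const struct flush_tlb_info *info);
		bool flush_tlb_multi_sends_ipi;	/* illustrative name */
	};

	/* e.g. native setup at init: real IPIs unless INVLPGB does the
	 * invalidation as a hardware broadcast */
	pv_ops.mmu.flush_tlb_multi_sends_ipi =
			!boot_cpu_has(X86_FEATURE_INVLPGB);

	/* in tlb_remove_table_sync_mm(): the flush already interrupted
	 * every CPU that could be mid-walk, so skip the extra IPIs */
	if (pv_ops.mmu.flush_tlb_multi_sends_ipi)
		return;

Xen and Hyper-V flush via hypercall rather than IPI, so they leave the
property unset and keep the explicit sync.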
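Finally, the lazy-TLB part boils down to the x86 tlb_flush() also
honouring the unshare flag, roughly as below (field names as used in this
cover letter; the exact plumbing may differ):

	static inline void tlb_flush(struct mmu_gather *tlb)
	{
		unsigned long start = 0UL, end = TLB_FLUSH_ALL;
		unsigned int stride_shift = tlb_get_unmap_shift(tlb);

		if (!tlb->fullmm && !tlb->need_flush_all) {
			start = tlb->start;
			end = tlb->end;
		}

		/*
		 * Lazy-TLB CPUs skip flush IPIs unless page tables were
		 * freed; treat unsharing the same way so they cannot
		 * keep walking a detached shared page table.
		 */
		flush_tlb_mm_range(tlb->mm, start, end, stride_shift,
				   tlb->freed_tables || tlb->unshared_tables);
	}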
David Hildenbrand did the initial implementation. I built on his work and
relied on off-list discussions to push it further - thanks a lot, David!

[1] https://lore.kernel.org/linux-mm/[email protected]/

v3 -> v4:
- Rework based on David's two-step direction and per-CPU idea:
  1) Targeted IPIs: per-CPU variable when entering/leaving lockless page
     table walk; tlb_remove_table_sync_mm() IPIs only those CPUs.
  2) On x86, pv_mmu_ops property set at init to skip the extra sync when
     flush_tlb_multi() already sends IPIs.
  https://lore.kernel.org/linux-mm/[email protected]/
- https://lore.kernel.org/linux-mm/[email protected]/

v2 -> v3:
- Complete rewrite: use dynamic IPI tracking instead of static checks
  (per Dave Hansen, thanks!)
- Track IPIs via mmu_gather: native_flush_tlb_multi() sets a flag when
  actually sending IPIs
- Motivation for skipping redundant IPIs explained by David:
  https://lore.kernel.org/linux-mm/[email protected]/
- https://lore.kernel.org/linux-mm/[email protected]/

v1 -> v2:
- Fix cover letter encoding to resolve send-email issues. Apologies for
  any email flood caused by the failed send attempts :(

RFC -> v1:
- Use a callback function in pv_mmu_ops instead of comparing function
  pointers (per David)
- Embed the check directly in tlb_remove_table_sync_one() instead of
  requiring every caller to check explicitly (per David)
- Move tlb_table_flush_implies_ipi_broadcast() outside of
  CONFIG_MMU_GATHER_RCU_TABLE_FREE to fix a build error on architectures
  that don't enable this config.
  https://lore.kernel.org/oe-kbuild-all/[email protected]/
- https://lore.kernel.org/linux-mm/[email protected]/

Lance Yang (3):
  mm: use targeted IPIs for TLB sync with lockless page table walkers
  mm: switch callers to tlb_remove_table_sync_mm()
  x86/tlb: add architecture-specific TLB IPI optimization support

 arch/x86/hyperv/mmu.c                 |  5 ++
 arch/x86/include/asm/paravirt.h       |  5 ++
 arch/x86/include/asm/paravirt_types.h |  6 +++
 arch/x86/include/asm/tlb.h            | 20 +++++++-
 arch/x86/kernel/kvm.c                 |  6 +++
 arch/x86/kernel/paravirt.c            | 18 +++++++
 arch/x86/kernel/smpboot.c             |  1 +
 arch/x86/xen/mmu_pv.c                 |  2 +
 include/asm-generic/tlb.h             | 28 +++++++++--
 include/linux/mm.h                    | 34 +++++++++++++
 kernel/events/core.c                  |  2 +
 mm/gup.c                              |  2 +
 mm/khugepaged.c                       |  2 +-
 mm/mmu_gather.c                       | 69 ++++++++++++++++++++++++---
 14 files changed, 187 insertions(+), 13 deletions(-)

--
2.49.0

