On 2026/2/2 20:50, Peter Zijlstra wrote:
On Mon, Feb 02, 2026 at 07:00:16PM +0800, Lance Yang wrote:
On Mon, 2 Feb 2026 10:54:14 +0100, Peter Zijlstra wrote:
On Mon, Feb 02, 2026 at 03:45:54PM +0800, Lance Yang wrote:
When freeing or unsharing page tables we send an IPI to synchronize with
concurrent lockless page table walkers (e.g. GUP-fast). Today we broadcast
that IPI to all CPUs, which is costly on large machines and hurts RT
workloads[1].
This series makes those IPIs targeted. We track which CPUs are currently
doing a lockless page table walk for a given mm (per-CPU
active_lockless_pt_walk_mm). When we need to sync, we only IPI those CPUs.
GUP-fast and perf_get_page_size() set/clear the tracker around their walk;
tlb_remove_table_sync_mm() uses it and replaces the previous broadcast in
the free/unshare paths.
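
For illustration only, the tracking idea described above can be modelled
in plain userspace C. This is a sketch under stated assumptions, not the
code from the series: the array stands in for the per-CPU
active_lockless_pt_walk_mm variable, and walk_begin(), walk_end() and
sync_targets() are hypothetical helper names. Real kernel code would also
need per-CPU accessors and memory ordering, which this model omits.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 8   /* assumption: small fixed CPU count for the model */

struct mm { int id; };  /* stand-in for struct mm_struct */

/* Models the per-CPU tracker: the mm a CPU is walking locklessly, or NULL. */
static struct mm *active_lockless_pt_walk_mm[NR_CPUS];

/* Hypothetical helper: called before a lockless walk (e.g. GUP-fast). */
static void walk_begin(int cpu, struct mm *mm)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    active_lockless_pt_walk_mm[cpu] = mm;
}

/* Hypothetical helper: called once the lockless walk has finished. */
static void walk_end(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    active_lockless_pt_walk_mm[cpu] = NULL;
}

/*
 * Hypothetical helper: returns a bitmask of the CPUs that would receive
 * the sync IPI for this mm. A broadcast would target every CPU instead.
 */
static unsigned int sync_targets(const struct mm *mm)
{
    unsigned int mask = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (active_lockless_pt_walk_mm[cpu] == mm)
            mask |= 1u << cpu;
    return mask;
}
```

With walkers for one mm on CPUs 1 and 5 and a walker for another mm on
CPU 3, sync_targets() for the first mm selects only CPUs 1 and 5; idle
CPUs and CPUs walking other address spaces are left alone.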
I'm confused. This only happens when !PT_RECLAIM, because with PT_RECLAIM,
__tlb_remove_table_one() actually uses RCU.
So why are you making things more expensive for no reason?
You're right that when CONFIG_PT_RECLAIM is set, __tlb_remove_table_one()
uses call_rcu() and we never call any sync there; this series doesn't
touch that path.
In the !PT_RECLAIM table-free path (the __tlb_remove_table_one() branch
that calls tlb_remove_table_sync_mm(tlb->mm) before __tlb_remove_table()),
we're not adding any new sync; we're replacing the existing broadcast IPI
(tlb_remove_table_sync_one()) with targeted IPIs (tlb_remove_table_sync_mm()).
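
To spell out the two cases, here is a small decision-table sketch in
userspace C. It only restates the behaviour described in this thread; the
enum and the table_free_action() function are hypothetical names invented
for the illustration and do not exist in the kernel.

```c
#include <stdbool.h>

/* Hypothetical labels for how a page table ends up being freed. */
enum free_action {
    FREE_VIA_RCU,              /* call_rcu(), no sync IPI */
    FREE_AFTER_BROADCAST_IPI,  /* tlb_remove_table_sync_one() */
    FREE_AFTER_TARGETED_IPI,   /* tlb_remove_table_sync_mm(tlb->mm) */
};

/*
 * Models the table-free path as discussed above:
 *  - with PT_RECLAIM the free goes through RCU and the series changes nothing;
 *  - without PT_RECLAIM a sync IPI precedes the free, and the series only
 *    narrows that IPI from broadcast to targeted.
 */
static enum free_action table_free_action(bool pt_reclaim, bool series_applied)
{
    if (pt_reclaim)
        return FREE_VIA_RCU;
    return series_applied ? FREE_AFTER_TARGETED_IPI
                          : FREE_AFTER_BROADCAST_IPI;
}
```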
Right, but if we can use full RCU for PT_RECLAIM, why can't we do so
unconditionally and not add overhead?
The sync (IPI) is mainly needed for unshare (e.g. hugetlb) and collapse
(khugepaged) paths, regardless of whether table free uses RCU, IIUC.