On Thu, Apr 23, 2026 at 02:57:34PM -0400, Peter Xu wrote:
> On Thu, Apr 23, 2026 at 07:08:00PM +0100, Kiryl Shutsemau wrote:
> > > - Whether read protection is required for an userspace swap system
> > >   (e.g. did you get time to have a look at umap?)
> >
> > I looked at it briefly, so I can miss details.
> >
> > IIUC, in absence of read tracking it doesn't collect hotness information
> > at all. The eviction is based on fault-in time: the oldest faulted-in
>
> For example, let's imagine if we can have a per-mm idle page tracker, would
> it work for you to collect hotness info?
>
> The other idea is, no matter whether we use MGLRU or legacy LRU, if we can
> expose a better interface to share hotness info from kernel to userspace,
> would it be possible?

I don't see how either fits our problem. Both page_idle and the LRUs
(legacy or MGLRU) track accesses on physical memory. We need visibility
in the virtual address space domain.

We don't care which physical page backs a given guest address at any
moment. We want to know which piece of the user's dataset is cold, and
the answer has to be indifferent to kernel actions underneath: the
tracking must survive migration and swap-out.

RWP gives us that: the uffd-wp bit is preserved across swap PTEs and
migration entries, so the "this VA was declared cold" marker stays
attached to the VA. A physical-side tracker loses its state the moment
the folio is freed or replaced: a refaulted folio is a fresh object with
no history.

Scaling goes the same way. Per-mm tracking of the form RWP does can
scale with the working set. A physical-side tracker scales with all
folios on the LRU/memcg, then needs an rmap walk per folio to map back
to a VA, which is exactly the reason page_idle doesn't scale for this
use case today.

There is also a cgroup-level confound: memcg hotness mixes guest memory
with the VMM's own (worker threads, I/O buffers, vhost-user rings).
VMA-scoped tracking is the natural unit regardless of the migration
story.

-- 
Kiryl Shutsemau / Kirill A. Shutemov

