On Thu, Apr 23, 2026 at 07:08:00PM +0100, Kiryl Shutsemau wrote:
> On Thu, Apr 23, 2026 at 10:50:06AM -0400, Peter Xu wrote:
> > Hello, Kiryl,
> >
> > On Thu, Apr 23, 2026 at 03:27:11PM +0100, Kiryl Shutsemau wrote:
> > > The patchset is in pretty good shape in my eyes and will probably drop RFC
> > > tag.
> >
> > I still have some high level questions not yet got answered.  Do you want
> > to answer them?
> >
> > https://lore.kernel.org/all/[email protected]/
>
> Sorry, reply to this got lost in my TODO list.

No worries.

>
> > In summary, it's about:
> >
> > - Whether we have explored other approaches on page hotness tracking
>
> So, for read/write tracking we have clear_refs=1, page_idle and DAMON.
> Did I miss something?
>
> clear_refs is process-wide hammer. And you can miss a hot page if it
> races with LRU rotation.
>
> page_idle needs rmap. It will not scale.

Yes.  If you would benefit from a per-mm page_idle, then it may apply to us
too if we are forced to implement full-userspace swap in QEMU.  That's
also why I suggested (in my previous reply) that we split the requirement:
one is for hotness tracking, the other is about read-inclusive trapping
(vs. wr-protect only traps).

>
> DAMON is built around sampling. It is good for working set estimation,
> but I don't think it is directly useful for eviction decision. It can
> miss hot pages. LRU rotation will also lose info.

Exactly.  If we need to collect the ACCESS bit (or anything similar) for
eviction accuracy purposes, IIUC we need per-page info; we can't estimate
by sampling.

>
> None of them gives comparable capabilities.

I want to see if some of your work can be generalized so we can use it too,
and we can also work together.

>
> We also need a mechanism to atomically evict pages.

Yes, this is the 2nd question below, and btw uffd-wp can also achieve this.

>
> > - Whether read protection is required for a userspace swap system
> >   (e.g. did you get time to have a look at umap?)
>
> I looked at it briefly, so I can miss details.
>
> IIUC, in absence of read tracking it doesn't collect hotness information
> at all. The eviction is based on fault-in time: the oldest faulted-in

For example, let's imagine if we can have a per-mm idle page tracker, would
it work for you to collect hotness info?  The other idea is, no matter
whether we use MGLRU or legacy LRU, if we can expose a better interface to
share hotness info from kernel to userspace, would it be possible?

> page gets evicted first.
> I guess it is fine if you don't care much about
> refault cost. Like, if your workload fits into memory completely and
> refaults are rare.

One thing to mention is, if we have any hotness tracking facility ready
above (e.g. per-mm idle page tracking) we _will_ trap read faults too; it's
just that it'll be much faster (when it's the hardware ACCESS bit).  So if
I'm not wrong, what I am trying to discuss as a full userspace swap system
will always trap reads too for most of the cases.  The difference is only
about that 5ms (in case of the 30s+5ms example I gave in the other email).
Your RW protection will also trap that 5ms, what I described won't: when a
decision is made, we wr-protect the page, any read on top of it will still
go through so it will trigger a refault.

My point is, that 5ms missing over a 30s (in reality maybe more than 30s)
sampling window (which covered read accesses) isn't a major issue, and IMHO
it's not a strong enough reason to include the whole RW feature.

The other thing is, as I mentioned in the other email, I still don't know
how the current RW protection would work for anonymous memory.  I don't yet
think the user swapper can read the anon page with RW-protected pgtables.
So far my understanding is maybe you only care about shmem so it's fine,
but it'll always be great to confirm with you.

Thanks,

>
> That's not my case.
>
> --
> Kiryl Shutsemau / Kirill A. Shutemov
>

-- 
Peter Xu
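
For a rough sense of scale on the 30s+5ms point above, here is a minimal
sketch of the arithmetic.  It assumes one cycle is a 30s sampling window in
which reads are observed, followed by a 5ms gap in which the page is only
wr-protected and reads go unobserved.  That reading of the example, and the
function name untracked_read_fraction, are assumptions for illustration and
not taken from the patchset or the other email.

```python
def untracked_read_fraction(sampling_window_s: float, gap_s: float) -> float:
    """Return the share of one sample+evict cycle in which reads go unobserved.

    sampling_window_s: time during which read accesses are tracked
                       (assumed: via the ACCESS bit or similar).
    gap_s:             time during which the page is only write-protected,
                       so reads pass through untracked (assumed: 5ms).
    """
    if sampling_window_s < 0 or gap_s < 0:
        raise ValueError("durations must be non-negative")
    total = sampling_window_s + gap_s
    if total == 0:
        raise ValueError("cycle length must be positive")
    return gap_s / total


if __name__ == "__main__":
    # Numbers from the thread: 30s sampling window, 5ms unprotected gap.
    frac = untracked_read_fraction(30.0, 0.005)
    print(f"reads untracked for {frac:.4%} of the cycle")
```

With a longer sampling window (the "maybe more than 30s" case) the fraction
only gets smaller, since the gap stays fixed while the cycle grows.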

