On Mon, Feb 4, 2019 at 8:43 PM Hugh Dickins <hu...@google.com> wrote:
>
> Something I shall not be doing, is verifying the correctness of the
> low-level get_page_unless_zero() versus page_ref_freeze() protocol
> on arm64 and power - nobody has reported on x86, and I do wonder if
> there's a barrier missing somewhere, that could manifest in this way -
> but I'm unlikely to be the one to find that (and also think that any
> weakness there should have shown up long before now).

Remind me what the page_ref_freeze() rules even _are_?

It's a very special thing, setting the page count down to zero if it
matches the "expected" count.

Now, if another CPU does a put_page() at that point, that certainly
will hit the "oops, we dropped the ref to something that was zero".

So the "expected" count had better be only references we have and own
100%, but some of those references aren't really necessarily private
to our thread.

For example, what happens if

 (a) one CPU is doing migration_entry_wait() (counting expected page
     refs etc, before doing page_ref_freeze)

 (b) another CPU is dirtying a page that was in the swap cache and
     takes a reference to it, but drops it from the swap cache

Note how (b) does not change the refcount on the page at all, because
it just moves the ref-count from "swap cache entry" to "I own the page
in my page tables". Which means that when (a) does the "count expected
count, and match it", it happily matches, and the page_ref_freeze()
succeeds and makes the page count be zero.

But now (b) has a private reference to that page, and can drop it, so
the "freeze" isn't a freeze at all.

Ok, so clearly the above cannot happen, and there's something I'm
missing with the freezing. I think we hold the page lock while this is
going on, which means those two things cannot happen at the same time.
But maybe there is something else that does the above kind of "move
page ref from one owner to another"?

The page_ref_freeze() rules don't seem to be documented anywhere.

              Linus