On 14 May 2025, at 15:51, David Hildenbrand wrote: > Thanks a bucnh for the review! > >>> + >>> + if (PageOffline(page) && PageOfflineSkippable(page)) >>> + continue; >>> + >> >> Some comment like "Skippable PageOffline() pages are not migratable but are >> skipped during memory offline" might help understand the change. > > I can add a comment like for the other cases.
Thanks. >> >> Some thoughts after reading the related code: >> >> offline_pages() is a little convoluted, since it has two steps to make >> sure a range of memory can be offlined: >> 1. start_isolate_page_range() checks unmovable (maybe not migratable >> is more precise) pages using has_unmovable_pages(), but leaves unmovable >> PageOffline() page handling to the driver; > > Right, it's best-effort. For ZONE_MOVABLE we're skipping the checks > completely. > > I was wondering for a longer time that -- with the isolate flag being a > separate bit soon :) -- we could simply isolate the whole range, and then > fail if we stumble over Talking about that, I will need your input on my change to move_pfn_range_to_zone() in online_pages()[1]. Making MIGRATE_ISOLATE a separate bit means if you isolate a pageblock without a migratetype, unisolating it will give an unpredictable migratetype. [1] https://lore.kernel.org/linux-mm/20250509200111.3372279-3-...@nvidia.com/ > an unmovable page during migration. That is, get rid of has_unmovable_pages() > entirely. Un-doing the isolation would then properly preserve the migratetype > -- no harm done? > > Certainly worth a look. What do you think about that? In principle, the method should work and simplifies the code. But it suffers more penalty (pages are migrated) when an unmovable page is encountered after the isolation, since before nothing will be migrated. To mitigate this, we probably would want some retry mechanism. For example, register a callback to each unmovable page and once the unmovable page is deallocated, alloc_contig_range() can move forward progress. > >> 2. scan_movable_pages() and do_migrate_range() migrate pages and handle >> different types of PageOffline() pages. > > Right, migrate what we can, skip these special once, and bail out on any > others (at least in this patch, patch #2 restores the pre-virtio-mem > behavior). > >> >> It might make the logic cleaner if start_isolate_page_range() can >> have a callback to allow driver to handle PageOffline() pages. > > We have to identify them repeadetly, so start_isolate_page_range() would not > be sufficient. Also, callbacks are rather tricky for the case where we cannot > really stabilize the page. During start_isolate_page_range(), all pageblocks are isolated, so free pages cannot be used by anyone else, meaning no new PageOffline() pages or any other unmovable pages, right? > >> >> Hmm, Skippable PageOffline() pages are virtio-mem specific, I wonder >> why offline_pages() takes care of them. Shouldn't virtio-mem driver >> handle them? >> I also realize that the two steps in offline_pages()> are very similar to >> alloc_contig_range() and virtio-mem is using >> alloc_contig_range(). I wonder if the two steps in offline_pages() >> can be abstracted out and shared with virtio-mem.Yes, offline_pages()> >> operates at memory section granularity, different from the granularity, >> pageblock size, of alloc_contig_range() used in virtio-mem, but >> both are trying to check unmovable pages and migrate movable pages. > > Not sure I get completely what you mean. virtio-mem really wants to use > alloc_contig_range() to allocate ranges it wants to unplug, because it will > fail fast and allows for smaller ranges. > > offline_pages() is primarily also about offlining the memory section, which > virtio-mem must do in order to remove the Linux memory block. To clarify, I mean the steps of start_isolate_page_range(), scan_movable_pages(), do_migrate_range(), dissolve_free_hugetlb_folios() and test_pages_isolated() in offline_pages() is very similar to the steps of start_isolate_page_range(), __alloc_contig_migrate_range(), replace_free_hugepage_folios(), and test_pages_isolate() in alloc_contig_range(). So I wonder if a common function routine can be shared. > >> >> >>> folio = page_folio(page); >>> >>> if (!folio_try_get(folio)) >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>> index a6fe1e9b95941..325b51c905076 100644 >>> --- a/mm/page_alloc.c >>> +++ b/mm/page_alloc.c >>> @@ -7023,12 +7023,12 @@ unsigned long __offline_isolated_pages(unsigned >>> long start_pfn, >>> continue; >>> } >>> /* >>> - * At this point all remaining PageOffline() pages have a >>> - * reference count of 0 and can simply be skipped. >>> + * At this point all remaining PageOffline() pages must be >>> + * "skippable" and have exactly one reference. >>> */ >>> if (PageOffline(page)) { >>> - BUG_ON(page_count(page)); >>> - BUG_ON(PageBuddy(page)); >>> + WARN_ON_ONCE(!PageOfflineSkippable(page)); >>> + WARN_ON_ONCE(page_count(page) != 1); >>> already_offline++; >>> pfn++; >>> continue; >>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c >>> index b2fc5266e3d26..debd898b794ea 100644 >>> --- a/mm/page_isolation.c >>> +++ b/mm/page_isolation.c >>> @@ -121,16 +121,11 @@ static struct page *has_unmovable_pages(unsigned long >>> start_pfn, unsigned long e >>> continue; >>> >>> /* >>> - * We treat all PageOffline() pages as movable when offlining >>> - * to give drivers a chance to decrement their reference count >>> - * in MEM_GOING_OFFLINE in order to indicate that these pages >>> - * can be offlined as there are no direct references anymore. >>> - * For actually unmovable PageOffline() where the driver does >>> - * not support this, we will fail later when trying to actually >>> - * move these pages that still have a reference count > 0. >>> - * (false negatives in this function only) >>> + * PageOffline() pages that are marked as "skippable" >>> + * are treated like movable pages for memory offlining. >>> */ >>> - if ((flags & MEMORY_OFFLINE) && PageOffline(page)) >>> + if ((flags & MEMORY_OFFLINE) && PageOffline(page) && >>> + PageOfflineSkippable(page)) >>> continue; >> >> With this change, we are no longer give non-virtio-mem driver a chance >> to decrease PageOffline(page) refcount? Or virtio-mem is the only driver >> doing this? > > The only in-tree driver making use of this so far, yes. Got it. Thanks. > > > There is one tweak I'll have to perform in the virtio-mem driver to cover one > corner case: when force-unloading the virtio-mem driver, we have to make sure > that memory blocks with fake-offline pages cannot get offlined (re-onlining > would be bad). I'll fix that up. -- Best Regards, Yan, Zi