On 14 May 2025, at 13:28, David Hildenbrand wrote:

>>>
>>> Note that PageOffline() is a bit confusing because it's "Memory block 
>>> online but page is logically offline (e.g., has a memmap that can be 
>>> touched, but the page content should not be touched)".
>>
>> So PageOffline() is before memory block offline, which is the first phase of
>> memory hotunplug.
>
> Yes.
>
>>
>>>
>>> (memory block offline -> all pages offline and have effectively no state 
>>> because the memmap is stale)
>>
>> What do you mean by memmap is stale? When a memory block is offline, the
>> memmap is still present, so a pfn scanner can see these pages. The pfn
>> scanner checks the memmap to know that it should not touch these pages,
>> right?
>
> See pfn_to_online_page() for exactly that use case.
>
> For an offline memory section (either because it was just added or because it 
> was just offlined), the memmap is assumed to contain garbage and should not 
> be touched.
>
> See remove_pfn_range_from_zone() -> page_init_poison().
>
>>
>>>
>>>> removed from page allocator.
>>>
>>> Usually, all pages are freed back to the buddy (isolated pageblock -> put 
>>> onto the isolated list). Memory offlining code can then simply grab these 
>>> "free" pages from the buddy -- no PageOffline involved.
>>>
>>> If something fails during memory offlining, these isolated pages are simply 
>>> put back on the appropriate migratetype list and become ordinary free pages 
>>> that can be allocated immediately.
>>
>> I am familiar with this part. Then, when is PageOffline used?
>>
>> From the comment in page-flags.h, I see two examples: pages inflated by
>> the balloon driver and pages not onlined when onlining the section. These
>> are two different operations: 1) inflated pages are going to be offline,
>> 2) not-onlined pages are going to be online. But you mentioned above that
>> memory offlining code does not involve PageOffline, so inflating pages
>> with the balloon driver is not part of memory offlining code, but a
>> different way of offlining pages. Am I getting it right?
>
> Yes. PageOffline means logically offline, for whatever reason someone decides 
> to turn pages logically offline.
>
> Memory ballooning and virtio-mem are two users; there are more.
>
>>
>> I read a little bit more on memory ballooning and virtio-mem and understand
>> that memory ballooning still keeps the inflated page but the guest cannot
>> allocate and use it, whereas virtio-mem and memory hotunplug remove the
>> page from Linux completely (i.e., Linux no longer sees the memory).
>
> In virtio-mem terms, they are considered "fake offline" -- memory behaves as 
> if it would never have been onlined, but there is a memmap for it. Like a 
> (current) memory hole.
>
>>
>> It seems that I am mixing up memory offlining and memory hotunplug. IIUC,
>> memory offlining means no one can allocate and use the offlined memory, but
>> Linux still sees it; memory hotunplug means Linux no longer sees it (no
>> related memmap and other metadata). Am I getting it right?
>
> The doc has this "Phases of Memory Hotplug" description, where it is roughly 
> divided into that, yes.
>
>>
>>>
>>> Some PageOffline pages can be migrated using the non-folio migration: this 
>>> is done for memory ballooning (memory compaction). As they get migrated,
>>> they are freed back to the buddy, PageOffline() is cleared -- they become 
>>> PageBuddy() -- and the above applies.
>>
>> After a PageOffline page is migrated, the destination page becomes
>> PageOffline, right? OK, I see it in balloon_page_insert().
>
> Yes.
>
>>
>>>
>>> Other PageOffline pages can be skipped during memory offlining (virtio-mem 
>>> use case, what we are doing here). We don't want them to ever go through the
>>> buddy, especially because if memory offlining fails they must definitely 
>>> not be treated like free pages that can be allocated immediately.
>>
>> What do you mean by "skipped during memory offlining"? Are you implying
>> that when virtio-mem is offlining some pages by marking them PageOffline
>> and PG_offline_skippable, someone else can do memory offlining in parallel?
>
> It could happen (e.g., manually offline a Linux memory block using sysfs), 
> but that is not the primary use case.
>
> virtio-mem unplugs memory in the following sequence:
>
> 1) alloc_contig_range() small blocks (e.g., 2 MiB)
>
> 2) Report the blocks to the hypervisor
>
> 3) Mark them fake-offline: PageOffline (+ PageOfflineSkippable now)
>
> Once all small blocks that comprise a Linux memory block (e.g., 128 MiB) are 
> fake-offline, offline the memory block and remove the memory using 
> offline_and_remove_memory().
>
> In that operation -- offline_and_remove_memory() -- memory offlining code 
> must be able to skip these PageOffline pages, otherwise 
> offline_and_remove_memory() will just fail, saying that there are unmovable 
> pages in there.
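
The sequence above can be sketched as a toy userspace model. This is not
kernel code: struct toy_memory_block and both toy_* helpers are hypothetical
names, the 128 MiB and 2 MiB sizes are only the examples quoted above, and
steps 1-3 are collapsed into setting one flag per small block. It only shows
when the whole memory block becomes eligible for the
offline_and_remove_memory() step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. Sizes are the examples from the mail. */
#define MEMORY_BLOCK_SIZE_MIB 128u
#define SUBBLOCK_SIZE_MIB       2u
#define SUBBLOCKS_PER_BLOCK (MEMORY_BLOCK_SIZE_MIB / SUBBLOCK_SIZE_MIB)

/* Hypothetical bookkeeping for one Linux memory block. */
struct toy_memory_block {
	bool fake_offline[SUBBLOCKS_PER_BLOCK];	/* one flag per small block */
	size_t nr_fake_offline;			/* how many flags are set */
};

/*
 * Steps 1-3 for one small block (allocate, report to the hypervisor,
 * mark fake-offline), modelled as setting a flag. Repeating the call
 * for the same index has no further effect.
 */
static void toy_unplug_subblock(struct toy_memory_block *mb, size_t idx)
{
	assert(idx < SUBBLOCKS_PER_BLOCK);
	if (!mb->fake_offline[idx]) {
		mb->fake_offline[idx] = true;
		mb->nr_fake_offline++;
	}
}

/* True once every small block in the memory block is fake-offline. */
static bool toy_block_ready_for_removal(const struct toy_memory_block *mb)
{
	return mb->nr_fake_offline == SUBBLOCKS_PER_BLOCK;
}
```

With these example sizes a memory block holds 64 small blocks, so the final
step is only attempted after the 64th one has been marked fake-offline.
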
>
>>
>>>
>>>> Next, the page is removed from its memory block. When will
>>>> PG_offline_skippable be used? The second phase when the page is being
>>>> removed from its memory block?
>>>
>>> PG_offline_skippable is used during memory offlining, while we look for any 
>>> pages that are not PageBuddy (... or hwpoisoned ...), to migrate them off 
>>> the memory so they get converted to PageBuddy.
>>>
>>> PageOffline + PageOfflineSkippable are checked in that phase, such that
>>> they don't require any migration.
>>
>> Hmm, if you just do not want PageOffline pages to be migrated, not setting
>> __PageMovable would work, right? PageOffline + __PageMovable is used by
>> ballooning, as these inflated pages can be migrated. PageOffline without
>> __PageMovable should be virtio-mem. Am I missing any other user?
>
> Sure. Just imagine !CONFIG_BALLOON_COMPACTION.
>
> In summary, we have
>
> 1) Migratable PageOffline pages (balloon compaction)
>
> 2) Unmigratable PageOffline pages (e.g., Xen balloon, Hyper-V balloon,
>    memtrace, in the future likely some memory holes, ... )
>
> 3) Skippable PageOffline pages (virtio-mem)
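
The three classes can be written down as a toy userspace model of what an
offlining scan would do with each page. This is not kernel code: struct
toy_page, enum toy_action and both functions are hypothetical names, and the
four booleans only stand in for PageBuddy, PageOffline, __PageMovable and
PageOfflineSkippable as described in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code: all names here are hypothetical. */
struct toy_page {
	bool buddy;	/* free page sitting in the buddy allocator */
	bool offline;	/* logically offline (stands in for PageOffline) */
	bool movable;	/* migratable, e.g. with balloon compaction */
	bool skippable;	/* offlining may skip it (the virtio-mem case) */
};

enum toy_action { TOY_TAKE_FREE, TOY_MIGRATE, TOY_SKIP, TOY_FAIL };

/* What the modelled offlining scan does with a single page. */
static enum toy_action toy_offline_action(const struct toy_page *p)
{
	if (p->buddy)
		return TOY_TAKE_FREE;	/* already free: just grab it */
	if (p->offline && p->skippable)
		return TOY_SKIP;	/* class 3: no migration needed */
	if (p->movable)
		return TOY_MIGRATE;	/* class 1, or an ordinary movable page */
	return TOY_FAIL;		/* class 2, or any other unmovable page */
}

/* The modelled range can be offlined only if no page is unmovable. */
static bool toy_can_offline_range(const struct toy_page *pages, size_t n)
{
	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (toy_offline_action(&pages[i]) == TOY_FAIL)
			return false;
	}
	return true;
}
```

In this model a range that contains a class 2 page can never be offlined,
which mirrors the "unmovable pages" failure described earlier, while class 3
pages do not block it.
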

Thank you for all the explanations. Now I understand how memory offlining
and memory hotunplug work and shall begin to check the patches. :)


--
Best Regards,
Yan, Zi
