On 12 Jan 2026, at 8:45, Jason Gunthorpe wrote:

> On Sun, Jan 11, 2026 at 07:51:01PM -0500, Zi Yan wrote:
>> On 11 Jan 2026, at 19:19, Balbir Singh wrote:
>>
>>> On 1/12/26 08:35, Matthew Wilcox wrote:
>>>> On Sun, Jan 11, 2026 at 09:55:40PM +0100, Francois Dugast wrote:
>>>>> The core MM splits the folio before calling folio_free, restoring the
>>>>> zone pages associated with the folio to an initialized state (e.g.,
>>>>> non-compound, pgmap valid, etc...). The order argument represents the
>>>>> folio’s order prior to the split which can be used driver side to know
>>>>> how many pages are being freed.
>>>>
>>>> This really feels like the wrong way to fix this problem.
>>>>
>>
>> Hi Matthew,
>>
>> I think the wording is confusing, since the actual issue is that:
>>
>> 1. zone_device_page_init() calls prep_compound_page() to form a large folio,
>> 2. but free_zone_device_folio() never reverses the course,
>> 3. the undo of prep_compound_page() in free_zone_device_folio() needs to
>>    be done before the driver callback ->folio_free(), since once
>>    ->folio_free() is called, the folio can be reallocated immediately,
>> 4. after the undo of prep_compound_page(), folio_order() can no longer
>>    provide the original order information, so ->folio_free() needs the
>>    order passed in for proper device-side ref manipulation.
>
> There is something wrong with the driver if the "folio can be
> reallocated immediately".
>
> The flow generally expects there to be a driver allocator linked to
> folio_free():
>
> 1) Allocator finds free memory.
> 2) zone_device_page_init() allocates the memory and makes refcount=1.
> 3) __folio_put() knows the refcount is 0.
> 4) free_zone_device_folio() calls folio_free(), but it doesn't
>    actually need to undo prep_compound_page() because *NOTHING* can
>    use the page pointer at this point.
> 5) Driver puts the memory back into the allocator and now #1 can
>    happen. It knows how much memory to put back because folio->order
>    is valid from #2.
> 6) #1 happens again, then #2 happens again and the folio is in the
>    right state for use. The successor #2 fully undoes the work of the
>    predecessor #2.

But how can a successor #2 undo the work if the second #1 only allocates
half of the original folio? For example, an order-9 folio at PFN 0 is
allocated and freed, then an order-8 folio at PFN 0 and another order-8
folio at PFN 256 are allocated. How can the two #2s undo the same order-9
folio without corrupting each other’s data?

>
> If you have races where #1 can happen immediately after #3 then the
> driver design is fundamentally broken and passing around order isn't
> going to help anything.
>
> If the allocator is using the struct page memory then step #5 should
> also clean up the struct page with the allocator data before returning
> it to the allocator.

Do you mean the ->folio_free() callback should undo prep_compound_page()
instead?

>
> I vaguely remember talking about this before in the context of the Xe
> driver. You can't just take an existing VRAM allocator and layer it
> on top of the folios and have it broadly ignore the folio_free
> callback.

Best Regards,
Yan, Zi
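The ordering in points 3 and 4 of the thread (split first, then call ->folio_free() with the saved order) can be sketched as a toy user-space model. Everything in it is hypothetical: `struct sim_page`, `sim_prep_compound()`, `sim_folio_order()`, `sim_free_folio()` and the driver callback are illustrative stand-ins, not kernel APIs, and the model tracks only head/tail/order metadata rather than the real struct page layout. It assumes the free path described in the patch text: read the order, reset every page to order-0, and only then hand the saved order to the driver, because the split head no longer reports it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: 512 device pages, enough for one order-9 folio. */
#define SIM_NR_PAGES 512

/*
 * Hypothetical, greatly simplified stand-in for struct page. It models
 * only compound (large folio) metadata, not the real kernel layout.
 */
struct sim_page {
	bool head;    /* true if this page heads a compound folio (order > 0) */
	int head_pfn; /* for tail pages: PFN of the head page; otherwise -1 */
	int order;    /* folio order, meaningful only on a head page */
};

static struct sim_page sim_pages[SIM_NR_PAGES];

/* Values recorded by the hypothetical driver callback. */
static int sim_freed_pfn = -1;
static int sim_freed_order = -1;

/* Return one page to the initialized, non-compound state. */
static void sim_reset_page(int pfn)
{
	sim_pages[pfn].head = false;
	sim_pages[pfn].head_pfn = -1;
	sim_pages[pfn].order = 0;
}

void sim_init(void)
{
	for (int pfn = 0; pfn < SIM_NR_PAGES; pfn++)
		sim_reset_page(pfn);
	sim_freed_pfn = -1;
	sim_freed_order = -1;
}

/* Loose analogue of prep_compound_page(): form a folio of 2^order pages. */
void sim_prep_compound(int pfn, int order)
{
	int nr = 1 << order;

	assert(pfn >= 0 && pfn + nr <= SIM_NR_PAGES);
	sim_pages[pfn].head = order > 0;
	sim_pages[pfn].head_pfn = -1;
	sim_pages[pfn].order = order;
	for (int i = 1; i < nr; i++) {
		sim_pages[pfn + i].head = false;
		sim_pages[pfn + i].head_pfn = pfn;
		sim_pages[pfn + i].order = 0;
	}
}

/* Loose analogue of folio_order(): 0 unless pfn is a compound head. */
int sim_folio_order(int pfn)
{
	return sim_pages[pfn].head ? sim_pages[pfn].order : 0;
}

/* Hypothetical driver callback: it only records what it was told. */
static void sim_driver_folio_free(int pfn, int order)
{
	sim_freed_pfn = pfn;
	sim_freed_order = order;
}

/*
 * Free path as described in the patch: save the order, split the folio
 * back into order-0 pages, then pass the saved order to the driver.
 */
void sim_free_folio(int pfn)
{
	int order = sim_folio_order(pfn); /* must be read before the split */

	for (int i = 0; i < (1 << order); i++)
		sim_reset_page(pfn + i);
	/* From here on, sim_folio_order(pfn) returns 0. */
	sim_driver_folio_free(pfn, order);
}

/* True if no page in [pfn, pfn + 2^order) carries compound state. */
bool sim_range_clean(int pfn, int order)
{
	for (int i = 0; i < (1 << order); i++) {
		const struct sim_page *p = &sim_pages[pfn + i];

		if (p->head || p->head_pfn != -1 || p->order != 0)
			return false;
	}
	return true;
}
```

A caller could replay the order-9/order-8 example from the reply: `sim_prep_compound(0, 9)`, `sim_free_folio(0)`, then `sim_prep_compound(0, 8)` and `sim_prep_compound(256, 8)`. Because the free path already split the order-9 folio, each order-8 initialization starts from clean pages and touches only its own range. The model does not settle whether that split belongs in core MM or in ->folio_free(), which is the open question in the thread.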
