On 1/10/26 06:15, Zi Yan wrote:
> On 9 Jan 2026, at 15:03, Matthew Brost wrote:
>
>> On Fri, Jan 09, 2026 at 02:23:49PM -0500, Zi Yan wrote:
>>> On 9 Jan 2026, at 14:08, Matthew Brost wrote:
>>>
>>>> On Fri, Jan 09, 2026 at 01:53:33PM -0500, Zi Yan wrote:
>>>>> On 9 Jan 2026, at 13:26, Matthew Brost wrote:
>>>>>
>>>>>> On Fri, Jan 09, 2026 at 12:28:22PM -0500, Zi Yan wrote:
>>>>>>> On 9 Jan 2026, at 6:09, Mika Penttilä wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 1/9/26 10:54, Francois Dugast wrote:
>>>>>>>>
>>>>>>>>> From: Matthew Brost <[email protected]>
>>>>>>>>>
>>>>>>>>> Split device-private and coherent folios into individual pages before
>>>>>>>>> freeing so that any order folio can be formed upon the next use of the
>>>>>>>>> pages.
>>>>>>>>>
>>>>>>>>> Cc: Balbir Singh <[email protected]>
>>>>>>>>> Cc: Alistair Popple <[email protected]>
>>>>>>>>> Cc: Zi Yan <[email protected]>
>>>>>>>>> Cc: David Hildenbrand <[email protected]>
>>>>>>>>> Cc: Oscar Salvador <[email protected]>
>>>>>>>>> Cc: Andrew Morton <[email protected]>
>>>>>>>>> Cc: [email protected]
>>>>>>>>> Cc: [email protected]
>>>>>>>>> Cc: [email protected]
>>>>>>>>> Signed-off-by: Matthew Brost <[email protected]>
>>>>>>>>> Signed-off-by: Francois Dugast <[email protected]>
>>>>>>>>> ---
>>>>>>>>>  mm/memremap.c | 2 ++
>>>>>>>>>  1 file changed, 2 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>>>>>>>> index 63c6ab4fdf08..7289cdd6862f 100644
>>>>>>>>> --- a/mm/memremap.c
>>>>>>>>> +++ b/mm/memremap.c
>>>>>>>>> @@ -453,6 +453,8 @@ void free_zone_device_folio(struct folio *folio)
>>>>>>>>>      case MEMORY_DEVICE_COHERENT:
>>>>>>>>>          if (WARN_ON_ONCE(!pgmap->ops ||
>>>>>>>>>                           !pgmap->ops->folio_free))
>>>>>>>>>              break;
>>>>>>>>> +
>>>>>>>>> +        folio_split_unref(folio);
>>>>>>>>>          pgmap->ops->folio_free(folio);
>>>>>>>>>          percpu_ref_put_many(&folio->pgmap->ref, nr);
>>>>>>>>>          break;
>>>>>>>>
>>>>>>>> This breaks folio_free implementations like nouveau_dmem_folio_free
>>>>>>>> which checks the folio order and acts upon that.
>>>>>>>> Maybe add an order parameter to folio_free or let the driver handle
>>>>>>>> the split?
>>>>>>
>>>>>> 'let the driver handle the split?' - I had considered this as an option.
>>>>>>
>>>>>>>
>>>>>>> Passing an order parameter might be better to avoid exposing core MM
>>>>>>> internals by asking drivers to undo compound pages.
>>>>>>>
>>>>>>
>>>>>> It looks like Nouveau tracks free folios and free pages - something Xe’s
>>>>>> device memory allocator (DRM Buddy) cannot do. I guess this answers my
>>>>>> earlier question of how Nouveau avoids hitting the same bug as Xe / GPU
>>>>>> SVM with respect to reusing folios. It appears Nouveau prefers not to
>>>>>> split the folio, so I’m leaning toward moving this call into the
>>>>>> driver’s folio_free function.
>>>>>
>>>>> No, that creates asymmetric page handling and is error prone.
>>>>>
>>>>
>>>> I agree it is asymmetric and symmetric is likely better.
>>>>
>>>>> In addition, looking at nouveau’s implementation in
>>>>> nouveau_dmem_page_alloc_locked(), it gets a folio from
>>>>> drm->dmem->free_folios, which is never split, and passes it to
>>>>> zone_device_folio_init(). This is wrong, since if the folio is large,
>>>>> it will go through prep_compound_page() again. The bug has not
>>>>> manifested because there are only order-9 large folios.
>>>>> Once mTHP support is added, how is nouveau going to allocate an order-4
>>>>> folio from a free order-9 folio? Maintain a per-order free folio list and
>>>>> reimplement a buddy allocator? Nevertheless, nouveau’s implementation
>>>>
>>>> The way Nouveau handles memory allocations here looks wrong to me - it
>>>> should probably use DRM Buddy and convert a block buddy to pages rather
>>>> than tracking a free folio list and free page list. But this is not my
>>>> driver.
>>>>
>>>>> is wrong by calling prep_compound_page() on a folio (already compound
>>>>> page).
>>>>>
>>>>
>>>> I don’t disagree that this implementation is questionable.
>>>>
>>>> So what’s the suggestion here - add folio order to folio_free just to
>>>> accommodate Nouveau’s rather odd memory allocation algorithm? That
>>>> doesn’t seem right to me either.
>>>
>>> Splitting the folio in free_zone_device_folio() and passing folio order
>>> to folio_free() make sense to me, since after the split, the folio passed
>>
>> If this is the consensus / direction - I can do this, but it is a
>> tree-wide change.
>>
>> I do have another question for everyone here - do we think this splitting
>> implementation should be considered a Fixes so this can go into 6.19?
>
> IMHO, this should be a fix, since it is wrong to call prep_compound_page()
> on a large folio. IIUC this seems to only affect nouveau now, I will let
> them decide.
>
Agreed, free_zone_device_folio() needs to split the folio on put.

>>
>>> to folio_free() contains no order information, but just the used-to-be
>>> head page and the remaining 511 pages are free. How does Intel Xe driver
>>> handle it without knowing folio order?
>>>
>>
>> It’s a bit convoluted, but folio/page->zone_device_data points to a
>> reference-counted object in GPU SVM. When the object’s reference count
>> drops to zero, we callback into the driver layer to release the memory.
>> In Xe, this is a TTM BO that resolves to a DRM Buddy allocation, which
>> is then released. If it’s not clear, our original allocation size
>> determines the granularity at which we free the backing store.
>>
>>> Do we really need the order info in ->folio_free() if the folio is split
>>> in free_zone_device_folio()? free_zone_device_folio() should just call
>>> ->folio_free() 2^order times to free individual page.
>>>
>>
>> No. If it’s a higher-order folio - let’s say a 2MB folio - we have one
>> reference to our GPU SVM object, so we can free the backing in a single
>> ->folio_free call.
>>
>> Now, if that folio gets split at some point into 4KB pages, then we’d
>> have 512 references to this object set up in the ->folio_split calls.
>> We’d then expect 512 ->folio_free() calls. Same case here: if, for
>> whatever reason, we can’t create a 2MB device page during a 2MB
>> migration and need to create 512 4KB pages, we'd have 512 references
>> to our GPU SVM object.
>

I still don't follow why the folio_order does not capture the order of the
folio. If the folio is split, we should now have 512 split folios for THP.

> Thank you for the explanation. Adding folio order to ->folio_free() makes
> sense to me now.
>

Balbir
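
For reference, the direction discussed above (split in
free_zone_device_folio(), then pass the original order to ->folio_free())
can be sketched as a small user-space model. This is only an illustration
under assumptions, not kernel code: every name below (struct model_folio,
struct model_backing, model_folio_free(), model_free_zone_device_folio())
is hypothetical, the split is modelled by simply resetting the order field,
and the refcounted object stands in for whatever zone_device_data points to.

```c
/* User-space model only; none of these names are kernel APIs. */
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the refcounted object behind zone_device_data. */
struct model_backing {
	int refcount;	/* one reference per folio pointing at this object */
	int released;	/* set once the backing store has been given back */
};

/* Hypothetical stand-in for a device folio. */
struct model_folio {
	unsigned int order;		/* 0 for a 4KB page, 9 for a 2MB folio */
	struct model_backing *backing;	/* models folio->zone_device_data */
};

static unsigned int free_calls;		/* times the driver callback ran */
static unsigned long pages_freed;	/* pages covered by those calls */

/* Driver side: the order arrives as an argument, not via the folio. */
static void model_folio_free(struct model_folio *folio, unsigned int order)
{
	struct model_backing *b = folio->backing;

	free_calls++;
	pages_freed += 1UL << order;
	assert(b->refcount > 0);
	if (--b->refcount == 0)
		b->released = 1;
}

/* Core side: record the order, split, then hand the order to the driver. */
static void model_free_zone_device_folio(struct model_folio *folio)
{
	unsigned int order = folio->order;

	folio->order = 0;	/* models the split: head page is now order-0 */
	model_folio_free(folio, order);
}

/* Case 1: one 2MB folio, one reference, one callback. */
static int model_free_large(void)
{
	struct model_backing b = { .refcount = 1, .released = 0 };
	struct model_folio f = { .order = 9, .backing = &b };

	free_calls = 0;
	pages_freed = 0;
	model_free_zone_device_folio(&f);
	return b.released && free_calls == 1 && pages_freed == 512 &&
	       f.order == 0;
}

/* Case 2: the same 2MB range held as 512 order-0 pages, 512 references. */
static int model_free_small(void)
{
	struct model_backing b = { .refcount = 512, .released = 0 };
	struct model_folio pages[512];
	size_t i;

	free_calls = 0;
	pages_freed = 0;
	for (i = 0; i < 512; i++) {
		pages[i].order = 0;
		pages[i].backing = &b;
		model_free_zone_device_folio(&pages[i]);
	}
	return b.released && free_calls == 512 && pages_freed == 512;
}
```

Calling model_free_large() and model_free_small() from a main() should both
return 1 in this model: the backing object is released after a single
callback in the 2MB case and after the 512th callback in the 4KB case, while
the folio handed to the driver is already order-0 in both.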
