Re: [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing

Zi Yan Fri, 09 Jan 2026 14:14:37 -0800

On 9 Jan 2026, at 17:11, Balbir Singh wrote:

> On 1/10/26 07:43, Zi Yan wrote:
>> On 9 Jan 2026, at 16:34, Balbir Singh wrote:
>>
>>> On 1/10/26 06:15, Zi Yan wrote:
>>>> On 9 Jan 2026, at 15:03, Matthew Brost wrote:
>>>>
>>>>> On Fri, Jan 09, 2026 at 02:23:49PM -0500, Zi Yan wrote:
>>>>>> On 9 Jan 2026, at 14:08, Matthew Brost wrote:
>>>>>>
>>>>>>> On Fri, Jan 09, 2026 at 01:53:33PM -0500, Zi Yan wrote:
>>>>>>>> On 9 Jan 2026, at 13:26, Matthew Brost wrote:
>>>>>>>>
>>>>>>>>> On Fri, Jan 09, 2026 at 12:28:22PM -0500, Zi Yan wrote:
>>>>>>>>>> On 9 Jan 2026, at 6:09, Mika Penttilä wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> On 1/9/26 10:54, Francois Dugast wrote:
>>>>>>>>>>>
>>>>>>>>>>>> From: Matthew Brost <[email protected]>
>>>>>>>>>>>>
>>>>>>>>>>>> Split device-private and coherent folios into individual pages before
>>>>>>>>>>>> freeing so that any order folio can be formed upon the next use of the
>>>>>>>>>>>> pages.
>>>>>>>>>>>>
>>>>>>>>>>>> Cc: Balbir Singh <[email protected]>
>>>>>>>>>>>> Cc: Alistair Popple <[email protected]>
>>>>>>>>>>>> Cc: Zi Yan <[email protected]>
>>>>>>>>>>>> Cc: David Hildenbrand <[email protected]>
>>>>>>>>>>>> Cc: Oscar Salvador <[email protected]>
>>>>>>>>>>>> Cc: Andrew Morton <[email protected]>
>>>>>>>>>>>> Cc: [email protected]
>>>>>>>>>>>> Cc: [email protected]
>>>>>>>>>>>> Cc: [email protected]
>>>>>>>>>>>> Signed-off-by: Matthew Brost <[email protected]>
>>>>>>>>>>>> Signed-off-by: Francois Dugast <[email protected]>
>>>>>>>>>>>> ---
>>>>>>>>>>>>  mm/memremap.c | 2 ++
>>>>>>>>>>>>  1 file changed, 2 insertions(+)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>>>>>>>>>>> index 63c6ab4fdf08..7289cdd6862f 100644
>>>>>>>>>>>> --- a/mm/memremap.c
>>>>>>>>>>>> +++ b/mm/memremap.c
>>>>>>>>>>>> @@ -453,6 +453,8 @@ void free_zone_device_folio(struct folio *folio)
>>>>>>>>>>>>    case MEMORY_DEVICE_COHERENT:
>>>>>>>>>>>>            if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
>>>>>>>>>>>>                    break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +          folio_split_unref(folio);
>>>>>>>>>>>>            pgmap->ops->folio_free(folio);
>>>>>>>>>>>>            percpu_ref_put_many(&folio->pgmap->ref, nr);
>>>>>>>>>>>>            break;
>>>>>>>>>>>
>>>>>>>>>>> This breaks folio_free implementations like nouveau_dmem_folio_free,
>>>>>>>>>>> which checks the folio order and acts upon it.
>>>>>>>>>>> Maybe add an order parameter to folio_free, or let the driver handle
>>>>>>>>>>> the split?
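A minimal userspace sketch of the order-parameter suggestion, assuming a hypothetical `folio_free(folio, order)` callback (the real `pgmap->ops->folio_free()` in the patch takes only the folio). `struct model_folio`, `model_split()` and the other names are stand-ins, not kernel APIs. The point it illustrates: the core must read the order before splitting, because afterwards the folio reports order 0.

```c
#include <assert.h>

/* Hypothetical stand-in for a folio: only the field this sketch needs. */
struct model_folio {
	unsigned int order;	/* reads 0 once the folio has been split */
};

/* Hypothetical driver callback that receives the order explicitly. */
typedef void (*model_folio_free_t)(struct model_folio *folio,
				   unsigned int order);

/* Stand-in for the split: afterwards the old head page reports order 0. */
static void model_split(struct model_folio *folio)
{
	folio->order = 0;
}

/* Core free path: capture the order, split, then tell the driver. */
static void model_free_zone_device_folio(struct model_folio *folio,
					 model_folio_free_t folio_free)
{
	unsigned int order = folio->order;	/* must be read before the split */

	model_split(folio);
	folio_free(folio, order);
}

/* Example order-aware driver callback; it just records what it saw. */
static unsigned int last_freed_order;
static unsigned int last_seen_folio_order;

static void example_driver_folio_free(struct model_folio *folio,
				      unsigned int order)
{
	last_seen_folio_order = folio->order;	/* already 0 at this point */
	last_freed_order = order;		/* the original order survives */
}
```

An order-aware driver such as the one Mika describes would then branch on `order` rather than on the folio's own (now zero) order.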
>>>>>>>>>
>>>>>>>>> 'let the driver handle the split?' - I had considered this as an option.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Passing an order parameter might be better to avoid exposing core MM
>>>>>>>>>> internals by asking drivers to undo compound pages.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It looks like Nouveau tracks free folios and free pages, something Xe’s
>>>>>>>>> device memory allocator (DRM Buddy) cannot do. I guess this answers my
>>>>>>>>> earlier question of how Nouveau avoids hitting the same bug as Xe / GPU
>>>>>>>>> SVM with respect to reusing folios. It appears Nouveau prefers not to
>>>>>>>>> split the folio, so I’m leaning toward moving this call into the
>>>>>>>>> driver’s folio_free function.
>>>>>>>>
>>>>>>>> No, that creates asymmetric page handling and is error prone.
>>>>>>>>
>>>>>>>
>>>>>>> I agree it is asymmetric and symmetric is likely better.
>>>>>>>
>>>>>>>> In addition, looking at nouveau’s implementation in
>>>>>>>> nouveau_dmem_page_alloc_locked(), it gets a folio from
>>>>>>>> drm->dmem->free_folios, which is never split, and passes it to
>>>>>>>> zone_device_folio_init(). This is wrong, since if the folio is large,
>>>>>>>> it will go through prep_compound_page() again. The bug has not
>>>>>>>> manifested because there are only order-9 large folios. Once mTHP
>>>>>>>> support is added, how is nouveau going to allocate an order-4 folio
>>>>>>>> from a free order-9 folio? Maintain a per-order free folio list and
>>>>>>>> reimplement a buddy allocator? Nevertheless, nouveau’s implementation
>>>>>>>
>>>>>>> The way Nouveau handles memory allocations here looks wrong to me—it
>>>>>>> should probably use DRM Buddy and convert a block buddy to pages rather
>>>>>>> than tracking a free folio list and free page list. But this is not my
>>>>>>> driver.
>>>>>>>
>>>>>>>> is wrong in calling prep_compound_page() on a folio (already a compound
>>>>>>>> page).
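To make the order-4-from-order-9 question concrete: serving the request means halving the free block repeatedly and parking each unused half on a per-order free list, which is what a buddy allocator does. A minimal sketch, assuming blocks are identified by starting page frame offsets (pfns) and using fixed-size arrays instead of real lists; none of this is nouveau or DRM Buddy code, and all names are hypothetical.

```c
#include <assert.h>

#define MODEL_MAX_ORDER 9	/* 2MB / 4KB = 512 = 1 << 9 pages */
#define MODEL_MAX_BLOCKS 64

/* Per-order free lists, kept as stacks of starting pfns. */
static unsigned long free_list[MODEL_MAX_ORDER + 1][MODEL_MAX_BLOCKS];
static int free_count[MODEL_MAX_ORDER + 1];

static void push_free(unsigned int order, unsigned long pfn)
{
	assert(free_count[order] < MODEL_MAX_BLOCKS);
	free_list[order][free_count[order]++] = pfn;
}

/* Allocate a 2^order page block; returns its pfn, or -1 if nothing fits. */
static long buddy_alloc(unsigned int order)
{
	unsigned int cur = order;

	/* Find the smallest non-empty free list at or above the request. */
	while (cur <= MODEL_MAX_ORDER && free_count[cur] == 0)
		cur++;
	if (cur > MODEL_MAX_ORDER)
		return -1;

	unsigned long pfn = free_list[cur][--free_count[cur]];

	/* Split down: keep the lower half, free the upper half (the buddy). */
	while (cur > order) {
		cur--;
		push_free(cur, pfn + (1UL << cur));
	}
	return (long)pfn;
}

/* Total free pages across all orders, to check the bookkeeping. */
static unsigned long free_pages(void)
{
	unsigned long total = 0;

	for (unsigned int o = 0; o <= MODEL_MAX_ORDER; o++)
		total += (unsigned long)free_count[o] << o;
	return total;
}
```

Starting from one free order-9 block at pfn 0, an order-4 request returns pfn 0 and leaves one free block at each of orders 4 through 8 (496 pages). Freeing would need the reverse step, merging a block with its buddy at `pfn ^ (1UL << order)`, which is omitted here.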
>>>>>>>>
>>>>>>>
>>>>>>> I don’t disagree that this implementation is questionable.
>>>>>>>
>>>>>>> So what’s the suggestion here—add folio order to folio_free just to
>>>>>>> accommodate Nouveau’s rather odd memory allocation algorithm? That
>>>>>>> doesn’t seem right to me either.
>>>>>>
>>>>>> Splitting the folio in free_zone_device_folio() and passing folio order
>>>>>> to folio_free() make sense to me, since after the split, the folio passed
>>>>>
>>>>> If this is the consensus / direction, I can do this, but it is a tree-wide
>>>>> change.
>>>>>
>>>>> I do have another question for everyone here: do we think this splitting
>>>>> implementation should be considered a fix (with a Fixes: tag) so this can
>>>>> go into 6.19?
>>>>
>>>> IMHO, this should be a fix, since it is wrong to call prep_compound_page()
>>>> on a large folio. IIUC, this seems to only affect nouveau now, so I will
>>>> let them decide.
>>>>
>>>
>>> Agreed, free_zone_device_folio() needs to split the folio on put.
>>>
>>>
>>>>>
>>>>>> to folio_free() contains no order information, but just the used-to-be
>>>>>> head page and the remaining 511 pages are free. How does Intel Xe driver
>>>>>> handle it without knowing folio order?
>>>>>>
>>>>>
>>>>> It’s a bit convoluted, but folio/page->zone_device_data points to a
>>>>> reference-counted object in GPU SVM. When the object’s reference count
>>>>> drops to zero, we callback into the driver layer to release the memory.
>>>>> In Xe, this is a TTM BO that resolves to a DRM Buddy allocation, which
>>>>> is then released. If it’s not clear, our original allocation size
>>>>> determines the granularity at which we free the backing store.
>>>>>
>>>>>> Do we really need the order info in ->folio_free() if the folio is split
>>>>>> in free_zone_device_folio()? free_zone_device_folio() should just call
>>>>>> ->folio_free() 2^order times to free individual pages.
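The per-page alternative can be sketched like this: once the folio is split, each former tail page is an independent order-0 folio, so the core calls a per-page callback 1 << order times. Deriving the count from the order captured before the split, rather than hard-coding 512, keeps it correct for mTHP orders as well. The callback type and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical per-page free callback; pfn identifies the freed page. */
typedef void (*model_page_free_t)(unsigned long pfn);

/*
 * After the folio at head_pfn has been split, every page is its own
 * order-0 folio, so the driver is called once per page.
 */
static void model_free_split_folio(unsigned long head_pfn, unsigned int order,
				   model_page_free_t page_free)
{
	unsigned long nr = 1UL << order;	/* 512 for order 9, 16 for order 4 */

	for (unsigned long i = 0; i < nr; i++)
		page_free(head_pfn + i);
}

/* Example callback that counts calls and remembers the last pfn it saw. */
static unsigned long calls;
static unsigned long last_pfn;

static void counting_page_free(unsigned long pfn)
{
	calls++;
	last_pfn = pfn;
}
```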
>>>>>>
>>>>>
>>>>> No. If it’s a higher-order folio—let’s say a 2MB folio—we have one
>>>>> reference to our GPU SVM object, so we can free the backing in a single
>>>>> ->folio_free call.
>>>>>
>>>>> Now, if that folio gets split at some point into 4KB pages, then we’d
>>>>> have 512 references to this object set up in the ->folio_split calls.
>>>>> We’d then expect 512 ->folio_free() calls. Same case here: if, for
>>>>> whatever reason, we can’t create a 2MB device page during a 2MB
>>>>> migration and need to create 512 4KB pages, we'd have 512 references
>>>>> to our GPU SVM object.
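A minimal model of the scheme described above, assuming a plain integer reference count. Each page's zone_device_data points at one shared object; a 2MB folio holds one reference, 512 4KB pages hold 512, and the backing store is released only when the last reference is dropped. `struct svm_backing`, `svm_get()`, `svm_put()` and `svm_populate()` are hypothetical names, not the GPU SVM API, and kernel code would use an atomic count such as `kref`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the object zone_device_data points to. */
struct svm_backing {
	unsigned int refs;
	bool released;		/* set once the backing store is freed */
};

static void svm_get(struct svm_backing *b)
{
	b->refs++;
}

/* Drop one reference; release the backing store on the last put. */
static void svm_put(struct svm_backing *b)
{
	assert(b->refs > 0);
	if (--b->refs == 0)
		b->released = true;	/* stand-in for freeing the allocation */
}

/* One reference per device folio, whatever that folio's size is. */
static void svm_populate(struct svm_backing *b, unsigned int nr_folios)
{
	for (unsigned int i = 0; i < nr_folios; i++)
		svm_get(b);
}
```

With one 2MB folio, a single ->folio_free() call drops the only reference; with 512 4KB folios, the backing survives 511 puts and is released on the 512th. The free granularity is fixed by the original allocation, which is why this scheme does not need the order in ->folio_free().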
>>>>
>>>
>>> I still don't follow why folio_order() does not capture the order of the
>>> folio. If the folio is split, we should now have 512 split folios for THP.
>>
>> folio_order() should return 0 after the folio is split.
>>
>> In terms of the number of after-split folios, it is 512 for the current code
>> base, since THP is only 2MB in zone devices, but that is not future-proof if
>> mTHP support is added. It also causes confusion in core MM, where folios can
>> have all kinds of orders.
>>
>>
>
> I checked folio_split_unref() and see that there is no driver
> callback during the split. Patch 3 controls the order of
>
> +             folio_split_unref(folio);
>               pgmap->ops->folio_free(folio);
>
> @Matthew, is there a reason to do the split prior to free?
> pgmap->ops->folio_free(folio) shouldn't impact the folio itself; the backing
> memory can be freed and then the folio split?


Quoting Matthew from [1]:

... this step must be done before calling folio_free and include a barrier,
as the page can be immediately reallocated.
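Why the ordering matters can be sketched with C11 atomics: once the driver's free path makes the page available, another thread may reallocate it at once, so the split (the metadata reset) has to be complete and ordered before that publication. The sketch assumes a single `available` flag as the publication point and pairs a release store with an acquire compare-and-swap; kernel code would use its own primitives (for example `smp_store_release()` / `smp_load_acquire()`), and the struct and field names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical page metadata that the split resets. */
struct model_page {
	unsigned int order;		/* compound order; 0 after the split */
	atomic_bool available;		/* published by the driver's free path */
};

/* Free path: split first, then publish the page with release ordering. */
static void model_free(struct model_page *p)
{
	p->order = 0;	/* the split: plain store to per-page metadata */
	/* Release: a thread whose acquire sees 'true' also sees order == 0. */
	atomic_store_explicit(&p->available, true, memory_order_release);
}

/* Allocation path: claim the page, then read metadata after the acquire. */
static bool model_try_alloc(struct model_page *p, unsigned int *order_seen)
{
	bool expected = true;

	if (!atomic_compare_exchange_strong_explicit(&p->available, &expected,
						     false,
						     memory_order_acquire,
						     memory_order_relaxed))
		return false;
	*order_seen = p->order;	/* observes the split (0) */
	return true;
}
```

If the two statements in `model_free()` were swapped, a concurrent allocator could claim the page and read, or start re-initializing, metadata that the freeing thread is still about to overwrite, which is the hazard described in the quote.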

[1] https://lore.kernel.org/all/[email protected]/

Best Regards,
Yan, Zi
