On 1/10/26 06:15, Zi Yan wrote:
> On 9 Jan 2026, at 15:03, Matthew Brost wrote:
> 
>> On Fri, Jan 09, 2026 at 02:23:49PM -0500, Zi Yan wrote:
>>> On 9 Jan 2026, at 14:08, Matthew Brost wrote:
>>>
>>>> On Fri, Jan 09, 2026 at 01:53:33PM -0500, Zi Yan wrote:
>>>>> On 9 Jan 2026, at 13:26, Matthew Brost wrote:
>>>>>
>>>>>> On Fri, Jan 09, 2026 at 12:28:22PM -0500, Zi Yan wrote:
>>>>>>> On 9 Jan 2026, at 6:09, Mika Penttilä wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 1/9/26 10:54, Francois Dugast wrote:
>>>>>>>>
>>>>>>>>> From: Matthew Brost <[email protected]>
>>>>>>>>>
>>>>>>>>> Split device-private and coherent folios into individual pages before
>>>>>>>>> freeing so that any order folio can be formed upon the next use of the
>>>>>>>>> pages.
>>>>>>>>>
>>>>>>>>> Cc: Balbir Singh <[email protected]>
>>>>>>>>> Cc: Alistair Popple <[email protected]>
>>>>>>>>> Cc: Zi Yan <[email protected]>
>>>>>>>>> Cc: David Hildenbrand <[email protected]>
>>>>>>>>> Cc: Oscar Salvador <[email protected]>
>>>>>>>>> Cc: Andrew Morton <[email protected]>
>>>>>>>>> Cc: [email protected]
>>>>>>>>> Cc: [email protected]
>>>>>>>>> Cc: [email protected]
>>>>>>>>> Signed-off-by: Matthew Brost <[email protected]>
>>>>>>>>> Signed-off-by: Francois Dugast <[email protected]>
>>>>>>>>> ---
>>>>>>>>>  mm/memremap.c | 2 ++
>>>>>>>>>  1 file changed, 2 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>>>>>>>> index 63c6ab4fdf08..7289cdd6862f 100644
>>>>>>>>> --- a/mm/memremap.c
>>>>>>>>> +++ b/mm/memremap.c
>>>>>>>>> @@ -453,6 +453,8 @@ void free_zone_device_folio(struct folio *folio)
>>>>>>>>>       case MEMORY_DEVICE_COHERENT:
>>>>>>>>>               if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
>>>>>>>>>                       break;
>>>>>>>>> +
>>>>>>>>> +             folio_split_unref(folio);
>>>>>>>>>               pgmap->ops->folio_free(folio);
>>>>>>>>>               percpu_ref_put_many(&folio->pgmap->ref, nr);
>>>>>>>>>               break;
>>>>>>>>
>>>>>>>> This breaks folio_free implementations like nouveau_dmem_folio_free,
>>>>>>>> which checks the folio order and acts upon it.
>>>>>>>> Maybe add an order parameter to folio_free or let the driver handle
>>>>>>>> the split?
>>>>>>
>>>>>> 'let the driver handle the split?' - I had considered this as an option.
>>>>>>
>>>>>>>
>>>>>>> Passing an order parameter might be better to avoid exposing core MM
>>>>>>> internals by asking drivers to undo compound pages.
>>>>>>>
>>>>>>
>>>>>> It looks like Nouveau tracks free folios and free pages—something Xe’s
>>>>>> device memory allocator (DRM Buddy) cannot do. I guess this answers my
>>>>>> earlier question of how Nouveau avoids hitting the same bug as Xe / GPU
>>>>>> SVM with respect to reusing folios. It appears Nouveau prefers not to
>>>>>> split the folio, so I’m leaning toward moving this call into the
>>>>>> driver’s folio_free function.
>>>>>
>>>>> No, that creates asymmetric page handling and is error prone.
>>>>>
>>>>
>>>> I agree it is asymmetric and symmetric is likely better.
>>>>
>>>>> In addition, looking at nouveau’s implementation in
>>>>> nouveau_dmem_page_alloc_locked(), it gets a folio from
>>>>> drm->dmem->free_folios, which is never split, and passes it to
>>>>> zone_device_folio_init(). This is wrong, since if the folio is large,
>>>>> it will go through prep_compound_page() again. The bug has not
>>>>> manifested because there are only order-9 large folios. Once mTHP
>>>>> support is added, how is nouveau going to allocate an order-4 folio
>>>>> from a free order-9 folio? Maintain a per-order free folio list and
>>>>> reimplement a buddy allocator? Nevertheless, nouveau’s implementation
>>>>
>>>> The way Nouveau handles memory allocations here looks wrong to me—it
>>>> should probably use DRM Buddy and convert a block buddy to pages rather
>>>> than tracking a free folio list and free page list. But this is not my
>>>> driver.
>>>>
>>>>> is wrong by calling prep_compound_page() on a folio (already a compound
>>>>> page).
>>>>>
>>>>
>>>> I don’t disagree that this implementation is questionable.
>>>>
>>>> So what’s the suggestion here—add folio order to folio_free just to
>>>> accommodate Nouveau’s rather odd memory allocation algorithm? That
>>>> doesn’t seem right to me either.
>>>
>>> Splitting the folio in free_zone_device_folio() and passing folio order
>>> to folio_free() make sense to me, since after the split, the folio passed
>>
>> If this is the consensus / direction, I can do this, but it is a
>> tree-wide change.
>>
>> I do have another question for everyone here - do we think this splitting
>> implementation should be considered a Fixes so this can go into 6.19?
> 
> IMHO, this should be a fix, since it is wrong to call prep_compound_page()
> on a large folio. IIUC this seems to only affect nouveau now, so I will
> let them decide.
> 

Agreed, free_zone_device_folio() needs to split the folio on put.
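
To make the ordering concrete, here is a rough userspace model of "split on
put, then tell the driver the original order". It is only a sketch: every
model_* name is made up for illustration, the structs are not the kernel
layout, and the three-argument callback is the shape proposed in this thread,
not the current ->folio_free() signature.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_ORDER 9

/* Hypothetical stand-in for struct page; not the kernel layout. */
struct model_page {
	unsigned int order;	/* meaningful on the head page only */
	int is_tail;		/* 1 while the page is a tail of a large folio */
};

/* Hypothetical driver state recording what the free callback was told. */
struct model_driver {
	unsigned int last_order;
	size_t pages_returned;
};

/* Form one large folio out of 2^order pages. */
static void model_prep_compound(struct model_page *pages, unsigned int order)
{
	size_t i, nr = (size_t)1 << order;

	assert(order <= MODEL_MAX_ORDER);
	pages[0].order = order;
	pages[0].is_tail = 0;
	for (i = 1; i < nr; i++) {
		pages[i].order = 0;
		pages[i].is_tail = 1;
	}
}

/* Undo the compound state so every page is an order-0 folio again. */
static void model_split(struct model_page *pages, unsigned int order)
{
	size_t i, nr = (size_t)1 << order;

	for (i = 0; i < nr; i++) {
		pages[i].order = 0;
		pages[i].is_tail = 0;
	}
}

/* Assumed callback shape: the order is passed in explicitly. */
static void model_folio_free(struct model_driver *drv,
			     struct model_page *head, unsigned int order)
{
	(void)head;
	drv->last_order = order;
	drv->pages_returned += (size_t)1 << order;
}

/* Model of the core path: read the order, split, then call the driver. */
static void model_free_zone_device_folio(struct model_driver *drv,
					 struct model_page *head)
{
	unsigned int order = head->order;	/* must be read before the split */

	model_split(head, order);
	model_folio_free(drv, head, order);
}
```

The point of the model is only that the order has to be captured before the
split, because afterwards the head page no longer carries it.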


>>
>>> to folio_free() contains no order information, but just the used-to-be
>>> head page and the remaining 511 pages are free. How does the Intel Xe
>>> driver handle it without knowing the folio order?
>>>
>>
>> It’s a bit convoluted, but folio/page->zone_device_data points to a
>> reference-counted object in GPU SVM. When the object’s reference count
>> drops to zero, we call back into the driver layer to release the memory.
>> In Xe, this is a TTM BO that resolves to a DRM Buddy allocation, which
>> is then released. If it’s not clear, our original allocation size
>> determines the granularity at which we free the backing store.
>>
>>> Do we really need the order info in ->folio_free() if the folio is split
>>> in free_zone_device_folio()? free_zone_device_folio() should just call
>>> ->folio_free() 2^order times to free the individual pages.
>>>
>>
>> No. If it’s a higher-order folio—let’s say a 2MB folio—we have one
>> reference to our GPU SVM object, so we can free the backing in a single
>> ->folio_free call.
>>
>> Now, if that folio gets split at some point into 4KB pages, then we’d
>> have 512 references to this object set up in the ->folio_split calls.
>> We’d then expect 512 ->folio_free() calls. Same case here: if, for
>> whatever reason, we can’t create a 2MB device page during a 2MB
>> migration and need to create 512 4KB pages, we'd have 512 references
>> to our GPU SVM object.
> 

I still don't follow why the folio_order does not capture the order of the
folio. If the folio is split, we should now have 512 split folios for THP.
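
For reference, the accounting Matthew describes above can be modelled in
userspace roughly as below. This is a sketch under assumptions, not the GPU SVM
code: model_svm_obj and the model_* helpers are hypothetical names, and the
only behaviour taken from the thread is "one reference per folio, one extra
reference per new folio created by a split, backing released at zero".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the reference-counted GPU SVM object. */
struct model_svm_obj {
	size_t refcount;	/* one reference per folio pointing at the object */
	int released;		/* set once the backing store has been freed */
};

static void model_obj_get(struct model_svm_obj *obj)
{
	obj->refcount++;
}

static void model_obj_put(struct model_svm_obj *obj)
{
	assert(obj->refcount > 0);
	if (--obj->refcount == 0)
		obj->released = 1;	/* stands in for releasing the allocation */
}

/*
 * Models a split of one order-N folio: the original folio keeps its
 * reference and each of the 2^N - 1 new folios takes its own.
 */
static void model_folio_split(struct model_svm_obj *obj, unsigned int order)
{
	size_t i, nr = (size_t)1 << order;

	for (i = 1; i < nr; i++)
		model_obj_get(obj);
}

/* Models the free callback for a single folio of any order. */
static void model_folio_free(struct model_svm_obj *obj)
{
	model_obj_put(obj);
}
```

In this model an unsplit 2MB folio releases the backing after a single free
call, while a folio that was split earlier needs 512 calls, so calling the
callback 2^order times for an unsplit folio would drop too many references.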

> Thank you for the explanation. Adding folio order to ->folio_free() makes
> sense to me now.
> 


Balbir
