On Tue, Sep 23, 2025 at 01:44:20PM +1000, Balbir Singh wrote:
> On 9/23/25 12:23, Zi Yan wrote:
> > On 16 Sep 2025, at 8:21, Balbir Singh wrote:
> >
> >> Extend migrate_vma_collect_pmd() to handle partially mapped large folios
> >> that require splitting before migration can proceed.
> >>
> >> During PTE walk in the collection phase, if a large folio is only
> >> partially mapped in the migration range, it must be split to ensure the
> >> folio is correctly migrated.
> >>
> >> Signed-off-by: Balbir Singh <[email protected]>
> >> Cc: David Hildenbrand <[email protected]>
> >> Cc: Zi Yan <[email protected]>
> >> Cc: Joshua Hahn <[email protected]>
> >> Cc: Rakie Kim <[email protected]>
> >> Cc: Byungchul Park <[email protected]>
> >> Cc: Gregory Price <[email protected]>
> >> Cc: Ying Huang <[email protected]>
> >> Cc: Alistair Popple <[email protected]>
> >> Cc: Oscar Salvador <[email protected]>
> >> Cc: Lorenzo Stoakes <[email protected]>
> >> Cc: Baolin Wang <[email protected]>
> >> Cc: "Liam R. Howlett" <[email protected]>
> >> Cc: Nico Pache <[email protected]>
> >> Cc: Ryan Roberts <[email protected]>
> >> Cc: Dev Jain <[email protected]>
> >> Cc: Barry Song <[email protected]>
> >> Cc: Lyude Paul <[email protected]>
> >> Cc: Danilo Krummrich <[email protected]>
> >> Cc: David Airlie <[email protected]>
> >> Cc: Simona Vetter <[email protected]>
> >> Cc: Ralph Campbell <[email protected]>
> >> Cc: Mika Penttilä <[email protected]>
> >> Cc: Matthew Brost <[email protected]>
> >> Cc: Francois Dugast <[email protected]>
> >> ---
> >>  mm/migrate_device.c | 82 +++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 82 insertions(+)
> >>
> >> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> >> index abd9f6850db6..70c0601f70ea 100644
> >> --- a/mm/migrate_device.c
> >> +++ b/mm/migrate_device.c
> >> @@ -54,6 +54,53 @@ static int migrate_vma_collect_hole(unsigned long start,
> >>  	return 0;
> >>  }
> >>
> >> +/**
> >> + * migrate_vma_split_folio() - Helper function to split a THP folio
> >> + * @folio: the folio to split
> >> + * @fault_page: struct page associated with the fault if any
> >> + *
> >> + * Returns 0 on success
> >> + */
> >> +static int migrate_vma_split_folio(struct folio *folio,
> >> +				   struct page *fault_page)
> >> +{
> >> +	int ret;
> >> +	struct folio *fault_folio = fault_page ? page_folio(fault_page) : NULL;
> >> +	struct folio *new_fault_folio = NULL;
> >> +
> >> +	if (folio != fault_folio) {
> >> +		folio_get(folio);
> >> +		folio_lock(folio);
> >> +	}
> >> +
> >> +	ret = split_folio(folio);
> >> +	if (ret) {
> >> +		if (folio != fault_folio) {
> >> +			folio_unlock(folio);
> >> +			folio_put(folio);
> >> +		}
> >> +		return ret;
> >> +	}
> >> +
> >> +	new_fault_folio = fault_page ? page_folio(fault_page) : NULL;
> >> +
> >> +	/*
> >> +	 * Ensure the lock is held on the correct
> >> +	 * folio after the split
> >> +	 */
> >> +	if (!new_fault_folio) {
> >> +		folio_unlock(folio);
> >> +		folio_put(folio);
> >> +	} else if (folio != new_fault_folio) {
> >> +		folio_get(new_fault_folio);
> >> +		folio_lock(new_fault_folio);
> >> +		folio_unlock(folio);
> >> +		folio_put(folio);
> >> +	}
> >> +
> >> +	return 0;
> >> +}
> >> +
> >>  static int migrate_vma_collect_pmd(pmd_t *pmdp,
> >>  				   unsigned long start,
> >>  				   unsigned long end,
> >> @@ -136,6 +183,8 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> >>  		 * page table entry. Other special swap entries are not
> >>  		 * migratable, and we ignore regular swapped page.
> >>  		 */
> >> +		struct folio *folio;
> >> +
> >>  		entry = pte_to_swp_entry(pte);
> >>  		if (!is_device_private_entry(entry))
> >>  			goto next;
> >> @@ -147,6 +196,23 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> >>  			    pgmap->owner != migrate->pgmap_owner)
> >>  				goto next;
> >>
> >> +		folio = page_folio(page);
> >> +		if (folio_test_large(folio)) {
> >> +			int ret;
> >> +
> >> +			pte_unmap_unlock(ptep, ptl);
> >> +			ret = migrate_vma_split_folio(folio,
> >> +						      migrate->fault_page);
> >> +
> >> +			if (ret) {
> >> +				ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
> >> +				goto next;
> >> +			}
> >> +
> >> +			addr = start;
> >> +			goto again;
> >> +		}
> >
> > This does not look right to me.
> >
> > The folio here is device private, but migrate_vma_split_folio()
> > calls split_folio(), which cannot handle device private folios yet.
> > Your change to split_folio() is in Patch 10 and should be moved
> > before this patch.
> >
>
> Patch 10 is to split the folio in the middle of migration (when we have
> converted the entries to migration entries). This patch relies on the
> changes in patch 4. I agree the names are confusing, I'll reword the
> functions
Hi Balbir,

I am still reviewing the patches, but I think I agree with Zi here.
split_folio() will replace the PMD mappings of the huge folio with PTE
mappings, but it will also split the folio itself into smaller folios.
The former is fine for this patch, but the latter is probably not
correct when the folio is a zone device folio: the driver needs to know
about the change, since it usually keeps some sort of mapping between
GPU physical memory chunks and their corresponding zone device pages.
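
To make the concern concrete, here is a rough sketch of the kind of
bookkeeping such a driver typically keeps and the sort of notification it
would need. The structure and the callback below are hypothetical and do
not exist upstream; they only illustrate what goes stale when core mm
splits the folio on its own:

/*
 * Hypothetical illustration only -- not an existing interface.
 * A driver backing device private memory usually tracks its GPU
 * memory in chunks, with each chunk covered by one large folio:
 */
struct gpu_chunk {
	unsigned long	dev_pfn;	/* first device page of the chunk */
	struct folio	*folio;		/* large folio covering the chunk */
	unsigned int	order;		/* folio order at allocation time */
};

/*
 * If split_folio() turns that large folio into order-0 folios without
 * telling the driver, chunk->folio and chunk->order above go stale.
 * Some form of callback, e.g. a new dev_pagemap_ops hook (again,
 * hypothetical), would let the driver refresh its chunk <-> page
 * mapping after the split:
 *
 *	void (*folio_split)(struct dev_pagemap *pgmap,
 *			    struct folio *original, unsigned int old_order);
 */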
