Re: [Nouveau] [PATCH 2/7] mm: Free device private pages have zero refcount
Jason Gunthorpe writes:

> On Mon, Sep 26, 2022 at 04:03:06PM +1000, Alistair Popple wrote:
>> Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page
>> refcount") device private pages have no longer had an extra reference
>> count when the page is in use. However before handing them back to the
>> owning device driver we add an extra reference count such that free
>> pages have a reference count of one.
>>
>> This makes it difficult to tell if a page is free or not because both
>> free and in use pages will have a non-zero refcount. Instead we should
>> return pages to the drivers page allocator with a zero reference count.
>> Kernel code can then safely use kernel functions such as
>> get_page_unless_zero().
>>
>> Signed-off-by: Alistair Popple
>> ---
>>  arch/powerpc/kvm/book3s_hv_uvmem.c       | 1 +
>>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 +
>>  drivers/gpu/drm/nouveau/nouveau_dmem.c   | 1 +
>>  lib/test_hmm.c                           | 1 +
>>  mm/memremap.c                            | 5 -----
>>  mm/page_alloc.c                          | 6 ++++++
>>  6 files changed, 10 insertions(+), 5 deletions(-)
>
> I think this is a great idea, but I'm surprised no dax stuff is
> touched here?

free_zone_device_page() shouldn't be called for pgmap->type ==
MEMORY_DEVICE_FS_DAX so I don't think we should have to worry about DAX
there.

Except that the folio code looks like it might have introduced a bug.
AFAICT put_page() always calls put_devmap_managed_page(&folio->page) but
folio_put() does not (although folios_put() does!). So it seems
folio_put() won't end up calling __put_devmap_managed_page_refs() as I
think it should.

I think you're right about the change to __init_zone_device_page() - I
should limit it to DEVICE_PRIVATE/COHERENT pages only. But I need to
look at Dan's patch series more closely as I suspect it might be better
to rebase this patch on top of that.

> Jason
Re: [Nouveau] [PATCH 6/7] nouveau/dmem: Evict device private memory during release
Felix Kuehling writes:

> On 2022-09-26 17:35, Lyude Paul wrote:
>> On Mon, 2022-09-26 at 16:03 +1000, Alistair Popple wrote:
>>> When the module is unloaded or a GPU is unbound from the module it is
>>> possible for device private pages to be left mapped in currently
>>> running processes. This leads to a kernel crash when the pages are
>>> either freed or accessed from the CPU because the GPU and associated
>>> data structures and callbacks have all been freed.
>>>
>>> Fix this by migrating any mappings back to normal CPU memory prior to
>>> freeing the GPU memory chunks and associated device private pages.
>>>
>>> Signed-off-by: Alistair Popple
>>>
>>> ---
>>>
>>> I assume the AMD driver might have a similar issue. However I can't
>>> see where device private (or coherent) pages actually get
>>> unmapped/freed during teardown as I couldn't find any relevant calls
>>> to devm_memunmap(), memunmap(), devm_release_mem_region() or
>>> release_mem_region(). So it appears that ZONE_DEVICE pages are not
>>> being properly freed during module unload, unless I'm missing
>>> something?
>>
>> I've got no idea, will poke Ben to see if they know the answer to this
>
> I guess we're relying on devm to release the region. Isn't the whole
> point of using devm_request_free_mem_region that we don't have to
> remember to explicitly release it when the device gets destroyed? I
> believe we had an explicit free call at some point by mistake, and that
> caused a double-free during module unload. See this commit for
> reference:

Argh, thanks for that pointer. I was not so familiar with
devm_request_free_mem_region()/devm_memremap_pages() as currently
Nouveau explicitly manages that itself.

> commit 22f4f4faf337d5fb2d2750aff13215726814273e
> Author: Philip Yang
> Date:   Mon Sep 20 17:25:52 2021 -0400
>
>     drm/amdkfd: fix svm_migrate_fini warning
>
>     Device manager releases device-specific resources when a driver
>     disconnects from a device, devm_memunmap_pages and
>     devm_release_mem_region calls in svm_migrate_fini are redundant.
>     It causes below warning trace after patch "drm/amdgpu: Split
>     amdgpu_device_fini into early and late", so remove function
>     svm_migrate_fini.
>
>     BUG: https://gitlab.freedesktop.org/drm/amd/-/issues/1718
>
>     WARNING: CPU: 1 PID: 3646 at drivers/base/devres.c:795 devm_release_action+0x51/0x60
>     Call Trace:
>      ? memunmap_pages+0x360/0x360
>      svm_migrate_fini+0x2d/0x60 [amdgpu]
>      kgd2kfd_device_exit+0x23/0xa0 [amdgpu]
>      amdgpu_amdkfd_device_fini_sw+0x1d/0x30 [amdgpu]
>      amdgpu_device_fini_sw+0x45/0x290 [amdgpu]
>      amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
>      drm_dev_release+0x20/0x40 [drm]
>      release_nodes+0x196/0x1e0
>      device_release_driver_internal+0x104/0x1d0
>      driver_detach+0x47/0x90
>      bus_remove_driver+0x7a/0xd0
>      pci_unregister_driver+0x3d/0x90
>      amdgpu_exit+0x11/0x20 [amdgpu]
>
>     Signed-off-by: Philip Yang
>     Reviewed-by: Felix Kuehling
>     Signed-off-by: Alex Deucher
>
> Furthermore, I guess we are assuming that nobody is using the GPU when
> the module is unloaded. As long as any processes have /dev/kfd open,
> you won't be able to unload the module (except by force-unload). I
> suppose with ZONE_DEVICE memory, we can have references to device
> memory pages even when user mode has closed /dev/kfd. We do have a
> cleanup handler that runs in an MMU-free-notifier. In theory that
> should run after all the pages in the mm_struct have been freed. It
> releases all sorts of other device resources and needs the driver to
> still be there. I'm not sure if there is anything preventing a module
> unload before the free-notifier runs. I'll look into that.

Right - module unload (or device unbind) is one of the other ways we can
hit this issue in Nouveau at least. You can end up with ZONE_DEVICE
pages mapped in a running process after the module has unloaded.
Although now you mention it that seems a bit wrong - the pgmap refcount
should provide some protection against that. Will have to look into
that too.

> Regards,
>   Felix
>
>>> ---
>>>  drivers/gpu/drm/nouveau/nouveau_dmem.c | 48 +++-
>>>  1 file changed, 48 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c
>>> b/drivers/gpu/drm/nouveau/nouveau_dmem.c
>>> index 66ebbd4..3b247b8 100644
>>> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
>>> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
>>> @@ -369,6 +369,52 @@ nouveau_dmem_suspend(struct nouveau_drm *drm)
>>>  	mutex_unlock(&drm->dmem->mutex);
>>>  }
>>>
>>> +/*
>>> + * Evict all pages mapping a chunk.
>>> + */
>>> +void
>>> +nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
>>> +{
>>> +	unsigned long i, npages = range_len(&chunk->pagemap.range) >> PAGE_SHIFT;
>>> +	unsigned long *src_pfns, *dst_pfns;
>>> +	dma_addr_t *dma_addrs;
>>> +	struct
Re: [Nouveau] [PATCH 6/7] nouveau/dmem: Evict device private memory during release
John Hubbard writes:

> On 9/26/22 14:35, Lyude Paul wrote:
>>> +	for (i = 0; i < npages; i++) {
>>> +		if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
>>> +			struct page *dpage;
>>> +
>>> +			/*
>>> +			 * __GFP_NOFAIL because the GPU is going away and there
>>> +			 * is nothing sensible we can do if we can't copy the
>>> +			 * data back.
>>> +			 */
>>
>> You'll have to excuse me for a moment since this area of nouveau isn't
>> one of my strongpoints, but are we sure about this? IIRC __GFP_NOFAIL
>> means infinite retry, in the case of a GPU hotplug event I would assume
>> we would rather just stop trying to migrate things to the GPU and just
>> drop the data instead of hanging on infinite retries.

No problem, thanks for taking a look!

> Hi Lyude!
>
> Actually, I really think it's better in this case to keep trying
> (presumably not necessarily infinitely, but only until memory becomes
> available), rather than failing out and corrupting data.
>
> That's because I'm not sure it's completely clear that this memory is
> discardable. And at some point, we're going to make this all work with
> file-backed memory, which will *definitely* not be discardable--I
> realize that we're not there yet, of course.
>
> But here, it's reasonable to commit to just retrying indefinitely,
> really. Memory should eventually show up. And if it doesn't, then
> restarting the machine is better than corrupting data, generally.

The memory is definitely not discardable here if the migration failed
because that implies it is still mapped into some userspace process. We
could avoid restarting the machine by doing something similar to what
happens during memory failure and killing every process that maps the
page(s). But overall I think it's better to retry until memory is
available, because that allows things like reclaim to work and in the
worst case allows the OOM killer to select an appropriate task to kill.
It also won't cause data corruption if/when we have file-backed memory.

> thanks,
Re: [Nouveau] [PATCH 6/7] nouveau/dmem: Evict device private memory during release
On 2022-09-26 17:35, Lyude Paul wrote:
> On Mon, 2022-09-26 at 16:03 +1000, Alistair Popple wrote:
>> When the module is unloaded or a GPU is unbound from the module it is
>> possible for device private pages to be left mapped in currently
>> running processes. This leads to a kernel crash when the pages are
>> either freed or accessed from the CPU because the GPU and associated
>> data structures and callbacks have all been freed.
>>
>> Fix this by migrating any mappings back to normal CPU memory prior to
>> freeing the GPU memory chunks and associated device private pages.
>>
>> Signed-off-by: Alistair Popple
>>
>> ---
>>
>> I assume the AMD driver might have a similar issue. However I can't
>> see where device private (or coherent) pages actually get
>> unmapped/freed during teardown as I couldn't find any relevant calls
>> to devm_memunmap(), memunmap(), devm_release_mem_region() or
>> release_mem_region(). So it appears that ZONE_DEVICE pages are not
>> being properly freed during module unload, unless I'm missing
>> something?
>
> I've got no idea, will poke Ben to see if they know the answer to this

I guess we're relying on devm to release the region. Isn't the whole
point of using devm_request_free_mem_region that we don't have to
remember to explicitly release it when the device gets destroyed? I
believe we had an explicit free call at some point by mistake, and that
caused a double-free during module unload. See this commit for
reference:

commit 22f4f4faf337d5fb2d2750aff13215726814273e
Author: Philip Yang
Date:   Mon Sep 20 17:25:52 2021 -0400

    drm/amdkfd: fix svm_migrate_fini warning

    Device manager releases device-specific resources when a driver
    disconnects from a device, devm_memunmap_pages and
    devm_release_mem_region calls in svm_migrate_fini are redundant.
    It causes below warning trace after patch "drm/amdgpu: Split
    amdgpu_device_fini into early and late", so remove function
    svm_migrate_fini.

    BUG: https://gitlab.freedesktop.org/drm/amd/-/issues/1718

    WARNING: CPU: 1 PID: 3646 at drivers/base/devres.c:795 devm_release_action+0x51/0x60
    Call Trace:
     ? memunmap_pages+0x360/0x360
     svm_migrate_fini+0x2d/0x60 [amdgpu]
     kgd2kfd_device_exit+0x23/0xa0 [amdgpu]
     amdgpu_amdkfd_device_fini_sw+0x1d/0x30 [amdgpu]
     amdgpu_device_fini_sw+0x45/0x290 [amdgpu]
     amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
     drm_dev_release+0x20/0x40 [drm]
     release_nodes+0x196/0x1e0
     device_release_driver_internal+0x104/0x1d0
     driver_detach+0x47/0x90
     bus_remove_driver+0x7a/0xd0
     pci_unregister_driver+0x3d/0x90
     amdgpu_exit+0x11/0x20 [amdgpu]

    Signed-off-by: Philip Yang
    Reviewed-by: Felix Kuehling
    Signed-off-by: Alex Deucher

Furthermore, I guess we are assuming that nobody is using the GPU when
the module is unloaded. As long as any processes have /dev/kfd open, you
won't be able to unload the module (except by force-unload). I suppose
with ZONE_DEVICE memory, we can have references to device memory pages
even when user mode has closed /dev/kfd. We do have a cleanup handler
that runs in an MMU-free-notifier. In theory that should run after all
the pages in the mm_struct have been freed. It releases all sorts of
other device resources and needs the driver to still be there. I'm not
sure if there is anything preventing a module unload before the
free-notifier runs. I'll look into that.

Regards,
  Felix

>> ---
>>  drivers/gpu/drm/nouveau/nouveau_dmem.c | 48 +++-
>>  1 file changed, 48 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> b/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> index 66ebbd4..3b247b8 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> @@ -369,6 +369,52 @@ nouveau_dmem_suspend(struct nouveau_drm *drm)
>>  	mutex_unlock(&drm->dmem->mutex);
>>  }
>>
>> +/*
>> + * Evict all pages mapping a chunk.
>> + */
>> +void
>> +nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
>> +{
>> +	unsigned long i, npages = range_len(&chunk->pagemap.range) >> PAGE_SHIFT;
>> +	unsigned long *src_pfns, *dst_pfns;
>> +	dma_addr_t *dma_addrs;
>> +	struct nouveau_fence *fence;
>> +
>> +	src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL);
>> +	dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL);
>> +	dma_addrs = kcalloc(npages, sizeof(*dma_addrs), GFP_KERNEL);
>> +
>> +	migrate_device_range(src_pfns, chunk->pagemap.range.start >> PAGE_SHIFT,
>> +			npages);
>> +
>> +	for (i = 0; i < npages; i++) {
>> +		if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
>> +			struct page *dpage;
>> +
>> +			/*
>> +			 * __GFP_NOFAIL because the GPU is going away and there
>> +			 * is nothing sensible we can do if we can't copy the
>> +			 * data back.
>> +			 */
>
> You'll have to excuse me for a moment since this area
Re: [Nouveau] [PATCH 6/7] nouveau/dmem: Evict device private memory during release
On 9/26/22 14:35, Lyude Paul wrote:
>> +	for (i = 0; i < npages; i++) {
>> +		if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
>> +			struct page *dpage;
>> +
>> +			/*
>> +			 * __GFP_NOFAIL because the GPU is going away and there
>> +			 * is nothing sensible we can do if we can't copy the
>> +			 * data back.
>> +			 */
>
> You'll have to excuse me for a moment since this area of nouveau isn't one of
> my strongpoints, but are we sure about this? IIRC __GFP_NOFAIL means infinite
> retry, in the case of a GPU hotplug event I would assume we would rather just
> stop trying to migrate things to the GPU and just drop the data instead of
> hanging on infinite retries.
>

Hi Lyude!

Actually, I really think it's better in this case to keep trying
(presumably not necessarily infinitely, but only until memory becomes
available), rather than failing out and corrupting data.

That's because I'm not sure it's completely clear that this memory is
discardable. And at some point, we're going to make this all work with
file-backed memory, which will *definitely* not be discardable--I
realize that we're not there yet, of course.

But here, it's reasonable to commit to just retrying indefinitely,
really. Memory should eventually show up. And if it doesn't, then
restarting the machine is better than corrupting data, generally.

thanks,
-- 
John Hubbard
NVIDIA
[Nouveau] [PATCH -next] drm/nouveau/disp: fix cast removes address space of expression warnings
When building the Linux kernel with 'make C=2', the following warnings are
encountered:

./drivers/gpu/drm/nouveau/dispnv50/disp.c:134:34: warning: cast removes address space '__iomem' of expression
./drivers/gpu/drm/nouveau/dispnv50/disp.c:197:34: warning: cast removes address space '__iomem' of expression

The data type of dmac->_push.mem.object.map.ptr is 'void __iomem *', but
it is converted to 'u32 *' directly, which causes the warnings above.
Restore the __iomem qualifier in the casts to fix these warnings.

Signed-off-by: ruanjinjie
---
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index 33c97d510999..aa94f8e284dd 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -131,7 +131,7 @@ nv50_dmac_kick(struct nvif_push *push)
 {
 	struct nv50_dmac *dmac = container_of(push, typeof(*dmac), _push);
 
-	dmac->cur = push->cur - (u32 *)dmac->_push.mem.object.map.ptr;
+	dmac->cur = push->cur - (u32 __iomem *)dmac->_push.mem.object.map.ptr;
 	if (dmac->put != dmac->cur) {
 		/* Push buffer fetches are not coherent with BAR1, we need to ensure
 		 * writes have been flushed right through to VRAM before writing PUT.
@@ -194,7 +194,7 @@ nv50_dmac_wait(struct nvif_push *push, u32 size)
 	if (WARN_ON(size > dmac->max))
 		return -EINVAL;
 
-	dmac->cur = push->cur - (u32 *)dmac->_push.mem.object.map.ptr;
+	dmac->cur = push->cur - (u32 __iomem *)dmac->_push.mem.object.map.ptr;
 	if (dmac->cur + size >= dmac->max) {
 		int ret = nv50_dmac_wind(dmac);
 
 		if (ret)
-- 
2.25.1
[Nouveau] [PATCH -next] drm/nouveau/disp: make gv100_disp_core_mthd_base static
The symbol is not used outside of the file, so mark it static.

Fixes the following warning:

./drivers/gpu/drm/nouveau/nvkm/engine/disp/gv100.c:591:1: warning:
symbol 'gv100_disp_core_mthd_base' was not declared. Should it be static?

Signed-off-by: ruanjinjie
---
 drivers/gpu/drm/nouveau/nvkm/engine/disp/gv100.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/disp/gv100.c b/drivers/gpu/drm/nouveau/nvkm/engine/disp/gv100.c
index 6b9d49270fa7..347c12a1fcb7 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/disp/gv100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/disp/gv100.c
@@ -587,7 +587,7 @@ gv100_disp_curs = {
 	.user = 73,
 };
 
-const struct nvkm_disp_mthd_list
+static const struct nvkm_disp_mthd_list
 gv100_disp_core_mthd_base = {
 	.mthd = 0x,
 	.addr = 0x00,
-- 
2.25.1
Re: [Nouveau] [PATCH 6/7] nouveau/dmem: Evict device private memory during release
On Mon, 2022-09-26 at 16:03 +1000, Alistair Popple wrote:
> When the module is unloaded or a GPU is unbound from the module it is
> possible for device private pages to be left mapped in currently running
> processes. This leads to a kernel crash when the pages are either freed
> or accessed from the CPU because the GPU and associated data structures
> and callbacks have all been freed.
>
> Fix this by migrating any mappings back to normal CPU memory prior to
> freeing the GPU memory chunks and associated device private pages.
>
> Signed-off-by: Alistair Popple
>
> ---
>
> I assume the AMD driver might have a similar issue. However I can't see
> where device private (or coherent) pages actually get unmapped/freed
> during teardown as I couldn't find any relevant calls to
> devm_memunmap(), memunmap(), devm_release_mem_region() or
> release_mem_region(). So it appears that ZONE_DEVICE pages are not being
> properly freed during module unload, unless I'm missing something?

I've got no idea, will poke Ben to see if they know the answer to this

> ---
>  drivers/gpu/drm/nouveau/nouveau_dmem.c | 48 +++-
>  1 file changed, 48 insertions(+)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c
> b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> index 66ebbd4..3b247b8 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> @@ -369,6 +369,52 @@ nouveau_dmem_suspend(struct nouveau_drm *drm)
>  	mutex_unlock(&drm->dmem->mutex);
>  }
>
> +/*
> + * Evict all pages mapping a chunk.
> + */
> +void
> +nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
> +{
> +	unsigned long i, npages = range_len(&chunk->pagemap.range) >> PAGE_SHIFT;
> +	unsigned long *src_pfns, *dst_pfns;
> +	dma_addr_t *dma_addrs;
> +	struct nouveau_fence *fence;
> +
> +	src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL);
> +	dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL);
> +	dma_addrs = kcalloc(npages, sizeof(*dma_addrs), GFP_KERNEL);
> +
> +	migrate_device_range(src_pfns, chunk->pagemap.range.start >> PAGE_SHIFT,
> +			npages);
> +
> +	for (i = 0; i < npages; i++) {
> +		if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
> +			struct page *dpage;
> +
> +			/*
> +			 * __GFP_NOFAIL because the GPU is going away and there
> +			 * is nothing sensible we can do if we can't copy the
> +			 * data back.
> +			 */

You'll have to excuse me for a moment since this area of nouveau isn't one of
my strongpoints, but are we sure about this? IIRC __GFP_NOFAIL means infinite
retry, in the case of a GPU hotplug event I would assume we would rather just
stop trying to migrate things to the GPU and just drop the data instead of
hanging on infinite retries.

> +			dpage = alloc_page(GFP_HIGHUSER | __GFP_NOFAIL);
> +			dst_pfns[i] = migrate_pfn(page_to_pfn(dpage));
> +			nouveau_dmem_copy_one(chunk->drm,
> +					migrate_pfn_to_page(src_pfns[i]), dpage,
> +					&dma_addrs[i]);
> +		}
> +	}
> +
> +	nouveau_fence_new(chunk->drm->dmem->migrate.chan, false, &fence);
> +	migrate_device_pages(src_pfns, dst_pfns, npages);
> +	nouveau_dmem_fence_done(&fence);
> +	migrate_device_finalize(src_pfns, dst_pfns, npages);
> +	kfree(src_pfns);
> +	kfree(dst_pfns);
> +	for (i = 0; i < npages; i++)
> +		dma_unmap_page(chunk->drm->dev->dev, dma_addrs[i], PAGE_SIZE,
> +				DMA_BIDIRECTIONAL);
> +	kfree(dma_addrs);
> +}
> +
>  void
>  nouveau_dmem_fini(struct nouveau_drm *drm)
>  {
> @@ -380,8 +426,10 @@ nouveau_dmem_fini(struct nouveau_drm *drm)
>  	mutex_lock(&drm->dmem->mutex);
>
>  	list_for_each_entry_safe(chunk, tmp, &drm->dmem->chunks, list) {
> +		nouveau_dmem_evict_chunk(chunk);
>  		nouveau_bo_unpin(chunk->bo);
>  		nouveau_bo_ref(NULL, &chunk->bo);
> +		WARN_ON(chunk->callocated);
>  		list_del(&chunk->list);
>  		memunmap_pages(&chunk->pagemap);
>  		release_mem_region(chunk->pagemap.range.start,
-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat
Re: [Nouveau] [PATCH 5/7] nouveau/dmem: Refactor nouveau_dmem_fault_copy_one()
On Mon, 2022-09-26 at 16:03 +1000, Alistair Popple wrote:
> nouveau_dmem_fault_copy_one() is used during handling of CPU faults via
> the migrate_to_ram() callback and is used to copy data from GPU to CPU
> memory. It is currently specific to fault handling, however a future
> patch implementing eviction of data during teardown needs similar
> functionality.
>
> Refactor out the core functionality so that it is not specific to fault
> handling.
>
> Signed-off-by: Alistair Popple
> ---
>  drivers/gpu/drm/nouveau/nouveau_dmem.c | 59 +--
>  1 file changed, 29 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c
> b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> index f9234ed..66ebbd4 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> @@ -139,44 +139,25 @@ static void nouveau_dmem_fence_done(struct nouveau_fence **fence)
>  	}
>  }
>
> -static vm_fault_t nouveau_dmem_fault_copy_one(struct nouveau_drm *drm,
> -		struct vm_fault *vmf, struct migrate_vma *args,
> -		dma_addr_t *dma_addr)
> +static int nouveau_dmem_copy_one(struct nouveau_drm *drm, struct page *spage,
> +				 struct page *dpage, dma_addr_t *dma_addr)
>  {
>  	struct device *dev = drm->dev->dev;
> -	struct page *dpage, *spage;
> -	struct nouveau_svmm *svmm;
> -
> -	spage = migrate_pfn_to_page(args->src[0]);
> -	if (!spage || !(args->src[0] & MIGRATE_PFN_MIGRATE))
> -		return 0;
>
> -	dpage = alloc_page_vma(GFP_HIGHUSER, vmf->vma, vmf->address);
> -	if (!dpage)
> -		return VM_FAULT_SIGBUS;
>  	lock_page(dpage);
>
>  	*dma_addr = dma_map_page(dev, dpage, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
>  	if (dma_mapping_error(dev, *dma_addr))
> -		goto error_free_page;
> +		return -EIO;
>
> -	svmm = spage->zone_device_data;
> -	mutex_lock(&svmm->mutex);
> -	nouveau_svmm_invalidate(svmm, args->start, args->end);
>  	if (drm->dmem->migrate.copy_func(drm, 1, NOUVEAU_APER_HOST, *dma_addr,
> -			NOUVEAU_APER_VRAM, nouveau_dmem_page_addr(spage)))
> -		goto error_dma_unmap;
> -	mutex_unlock(&svmm->mutex);
> +					 NOUVEAU_APER_VRAM,
> +					 nouveau_dmem_page_addr(spage))) {
> +		dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
> +		return -EIO;
> +	}

Feel free to just align this with the starting (, as long as it doesn't go
above 100 characters it doesn't really matter imho and would look nicer that
way. Otherwise:

Reviewed-by: Lyude Paul

Will look at the other patch in a moment

>
> -	args->dst[0] = migrate_pfn(page_to_pfn(dpage));
>  	return 0;
> -
> -error_dma_unmap:
> -	mutex_unlock(&svmm->mutex);
> -	dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
> -error_free_page:
> -	__free_page(dpage);
> -	return VM_FAULT_SIGBUS;
>  }
>
>  static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
> @@ -184,9 +165,11 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
>  	struct nouveau_drm *drm = page_to_drm(vmf->page);
>  	struct nouveau_dmem *dmem = drm->dmem;
>  	struct nouveau_fence *fence;
> +	struct nouveau_svmm *svmm;
> +	struct page *spage, *dpage;
>  	unsigned long src = 0, dst = 0;
>  	dma_addr_t dma_addr = 0;
> -	vm_fault_t ret;
> +	vm_fault_t ret = 0;
>  	struct migrate_vma args = {
>  		.vma = vmf->vma,
>  		.start = vmf->address,
> @@ -207,9 +190,25 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
>  	if (!args.cpages)
>  		return 0;
>
> -	ret = nouveau_dmem_fault_copy_one(drm, vmf, &args, &dma_addr);
> -	if (ret || dst == 0)
> +	spage = migrate_pfn_to_page(src);
> +	if (!spage || !(src & MIGRATE_PFN_MIGRATE))
> +		goto done;
> +
> +	dpage = alloc_page_vma(GFP_HIGHUSER, vmf->vma, vmf->address);
> +	if (!dpage)
> +		goto done;
> +
> +	dst = migrate_pfn(page_to_pfn(dpage));
> +
> +	svmm = spage->zone_device_data;
> +	mutex_lock(&svmm->mutex);
> +	nouveau_svmm_invalidate(svmm, args.start, args.end);
> +	ret = nouveau_dmem_copy_one(drm, spage, dpage, &dma_addr);
> +	mutex_unlock(&svmm->mutex);
> +	if (ret) {
> +		ret = VM_FAULT_SIGBUS;
>  		goto done;
> +	}
>
>  	nouveau_fence_new(dmem->migrate.chan, false, &fence);
>  	migrate_vma_pages(&args);
-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat
Re: [Nouveau] [PATCH 2/7] mm: Free device private pages have zero refcount
On Mon, Sep 26, 2022 at 04:03:06PM +1000, Alistair Popple wrote:
> Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page
> refcount") device private pages have no longer had an extra reference
> count when the page is in use. However before handing them back to the
> owning device driver we add an extra reference count such that free
> pages have a reference count of one.
>
> This makes it difficult to tell if a page is free or not because both
> free and in use pages will have a non-zero refcount. Instead we should
> return pages to the drivers page allocator with a zero reference count.
> Kernel code can then safely use kernel functions such as
> get_page_unless_zero().
>
> Signed-off-by: Alistair Popple
> ---
>  arch/powerpc/kvm/book3s_hv_uvmem.c       | 1 +
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 +
>  drivers/gpu/drm/nouveau/nouveau_dmem.c   | 1 +
>  lib/test_hmm.c                           | 1 +
>  mm/memremap.c                            | 5 -----
>  mm/page_alloc.c                          | 6 ++++++
>  6 files changed, 10 insertions(+), 5 deletions(-)

I think this is a great idea, but I'm surprised no dax stuff is
touched here?

Jason
Re: [Nouveau] [PATCH v2 10/33] drm/modes: Add a function to generate analog display modes
Hi

Am 26.09.22 um 14:42 schrieb Maxime Ripard:
> On Mon, Sep 26, 2022 at 01:17:52PM +0200, Thomas Zimmermann wrote:
>> Hi
>>
>> Am 26.09.22 um 12:34 schrieb Geert Uytterhoeven:
>>> Hi Maxime,
>>>
>>> On Mon, Sep 26, 2022 at 12:17 PM Maxime Ripard wrote:
>>>> On Fri, Sep 23, 2022 at 11:05:48AM +0200, Thomas Zimmermann wrote:
>>>>>> +	/* 63.556us * 13.5MHz = 858 pixels */
>>>>>
>>>>> I kind of get what the comment wants to tell me, but the units
>>>>> don't add up.
>>>>
>>>> I'm not sure how it doesn't add up?
>>>>
>>>> We have a frequency in Hz (equivalent to s^-1) and a duration in s,
>>>> so the result ends up with no dimension, which is to be expected
>>>> for a number of periods?
>>>
>>> To make the units add up, it should be 13.5 Mpixel/s
>>> (which is what a pixel clock of 13.5 MHz really means ;-)
>>
>> Sort of. It leaves the time value as a magic number, which obfuscates
>> what's happening.
>>
>> The unit for htotal is pixels/scanline because if you multiply it
>> with the number of scanlines per frame (which is in vtotal), you get
>> pixels/frame. Multiplying with the frames per second results in the
>> pixel clock in pixels/second.
>
> That's true, but both are true?

I'm not quite sure what you mean. I tried to say that this magic time
value makes all this hard to see.

>> That's a bit much for this comment. Hence, I suggested to remove
>> these comments entirely and document the relation among the numbers
>> in a more prominent location. The documentation for drm_display_mode
>> would be a good place, I guess.
>
> I'm not sure I understand what it's about. It's an explicit
> requirement of PAL and NTSC, why would something so specific be in the
> generic definition of drm_display_mode?

Not just TV signals, it's the case for all displays where we control the
electron beam in some way (VGA). Such documentation could therefore be
added to DRM in an appropriate place. That makes it easier for newcomers
to see why certain modes are defined the way they are. (At first,
display modes can look like they are made up randomly.)

For your test cases, maybe simply refer to the relevant standard
documents.

Best regards
Thomas

> Maxime

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev
Re: [Nouveau] [PATCH v2 09/33] drm/connector: Add TV standard property
Den 26.09.2022 12.01, skrev Maxime Ripard:
> On Sat, Sep 24, 2022 at 05:52:29PM +0200, Noralf Trønnes wrote:
>> Den 22.09.2022 16.25, skrev Maxime Ripard:
>>> The TV mode property has been around for a while now to select and
>>> get the current TV mode output on an analog TV connector.
>>>
>>> Despite that property name being generic, its content isn't and has
>>> been driver-specific which makes it hard to build any generic
>>> behaviour on top of it, both in kernel and user-space.
>>>
>>> Let's create a new enum tv norm property, that can contain any of
>>> the analog TV standards currently supported by kernel drivers. Each
>>> driver can then pass in a bitmask of the modes it supports, and the
>>> property creation function will filter out the modes not supported.
>>>
>>> We'll then be able to phase out the older tv mode property.
>>>
>>> Signed-off-by: Maxime Ripard
>>
>> Please can you add per patch changelogs, it's hard to review when I
>> have to recall what might have happened with each patch. If you do it
>> drm style and put in the commit message it should be easy enough to
>> do.
>
> I certainly don't want to start that discussion, but I'm really not a
> fan of that format either. I'll do it for that series if you prefer.
>

The format isn't important, but especially with a big series like this
and weeks between each iteration it's difficult to follow and see which
review comments you have chosen to implement and how. It's almost a full
review each time. Even if I see that I have acked/reviewed a patch, if I
don't remember, I have to go back to the previous version and see if I
had any comments and if you followed up on that.

>>> +/**
>>> + * enum drm_connector_tv_mode - Analog TV output mode
>>> + *
>>> + * This enum is used to indicate the TV output mode used on an
>>> + * analog TV connector.
>>> + *
>>> + * WARNING: The values of this enum is uABI since they're exposed in
>>> + * the "TV mode" connector property.
>>> + */
>>> +enum drm_connector_tv_mode {
>>> +	/**
>>> +	 * @DRM_MODE_TV_MODE_NONE: Placeholder to not default on one
>>> +	 * variant or the other when nothing is set.
>>> +	 */
>>> +	DRM_MODE_TV_MODE_NONE = 0,
>>
>> How is this supposed to be used?
>
> It's not supposed to be used. It was a suggestion from Mateusz to
> avoid to default to any standard when we don't initialize something. I
> don't have any strong feeling about it, so I can drop it if you
> prefer.
>

The confusing thing to me is that "None" is part of the property enum
list, so the idea is that it can end up in userspace if there's a driver
error? Hmm, that won't work since TV_MODE_NONE won't be part of the
bitmask that the driver sets. So userspace reading the property ends up
with a value for which there's no enum name to match. So userspace
should be trained to know that zero for this property is a driver error?
No, not a good idea.

I think to catch a bug like this drm_atomic_connector_get_property()
should check the tv.mode value and see if it's a legal enum value and if
not it has to just pick a legal one and print an error. But I'm not sure
it's worth it to catch a bug like this. And I don't see any other enum
properties being checked for validity either before being returned to
userspace.

Based on this reasoning I think you should drop the NONE value.

Noralf.
Re: [Nouveau] [PATCH v2 10/33] drm/modes: Add a function to generate analog display modes
On Mon, Sep 26, 2022 at 01:17:52PM +0200, Thomas Zimmermann wrote: > Hi > > Am 26.09.22 um 12:34 schrieb Geert Uytterhoeven: > > Hi Maxime, > > > > On Mon, Sep 26, 2022 at 12:17 PM Maxime Ripard wrote: > > > On Fri, Sep 23, 2022 at 11:05:48AM +0200, Thomas Zimmermann wrote: > > > > > + /* 63.556us * 13.5MHz = 858 pixels */ > > > > > > > > I kind of get what the comment wants to tell me, but the units don't > > > > add up. > > > > > > I'm not sure how it doesn't add up? > > > > > > We have a frequency in Hz (equivalent to s^-1) and a duration in s, so > > > the result ends up with no dimension, which is to be expected for a > > > number of periods? > > > > To make the units add up, it should be 13.5 Mpixel/s > > (which is what a pixel clock of 13.5 MHz really means ;-) > > Sort of. It leaves the time value as a magic number, which obfuscates what's > happening. > > The unit for htotal is pixels/scanline because if you multiply it with the > number of scanlines per frame (which is in vtotal), you get pixels/frame. > Multiplying with the frames per second results in the pixel clock in > pixels/second. That's true, but both are true? > That's a bit much for this comment. Hence, I suggested to remove these > comments entirely and document the relation among the numbers in a more > prominent location. The documentation for drm_display_mode would be a good > place, I guess. I'm not sure I understand what it's about. It's an explicit requirement of PAL and NTSC, why would something so specific be in the generic definition of drm_display_mode? Maxime
Re: [Nouveau] [PATCH v2 06/33] drm/connector: Rename legacy TV property
Hi Am 26.09.22 um 11:50 schrieb Maxime Ripard: Hi Thomas, On Fri, Sep 23, 2022 at 10:19:08AM +0200, Thomas Zimmermann wrote: Hi Am 22.09.22 um 16:25 schrieb Maxime Ripard: The current tv_mode has driver-specific values that don't allow to easily share code using it, either at the userspace or kernel level. Since we're going to introduce a new, generic, property that fits the same purpose, let's rename this one to legacy_tv_mode to make it obvious we should move away from it. Signed-off-by: Maxime Ripard It's not wrong, but 'legacy' is already overloaded with meaning. If you can, maybe name it 'driver_tv_mode_property' or 'custom_tv_mode_property' instead. Acked-by: Thomas Zimmermann I'd really like to point out that new drivers shouldn't be using this. If we're using either of your proposals then writers might get the impression that this is ok to use. Would you prefer deprecated to legacy? I'm merely suggesting. Call it legacy then, so you don't have to rework all of the patches. Best regards Thomas Maxime -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Ivo Totev
Re: [Nouveau] [PATCH v2 02/33] drm/tests: Add Kunit Helpers
Den 26.09.2022 11.36, skrev Maxime Ripard: > Hi Noralf, > > On Sat, Sep 24, 2022 at 08:06:17PM +0200, Noralf Trønnes wrote: >> Den 24.09.2022 19.56, skrev Noralf Trønnes: >>> >>> >>> Den 22.09.2022 16.25, skrev Maxime Ripard: As the number of kunit tests in KMS grows further, we start to have multiple test suites that, for example, need to register a mock DRM driver to interact with the KMS function they are supposed to test. Let's add a file meant to provide those kind of helpers to avoid duplication. Signed-off-by: Maxime Ripard diff --git a/drivers/gpu/drm/tests/Makefile b/drivers/gpu/drm/tests/Makefile index 2d9f49b62ecb..b29ef1085cad 100644 --- a/drivers/gpu/drm/tests/Makefile +++ b/drivers/gpu/drm/tests/Makefile @@ -8,6 +8,7 @@ obj-$(CONFIG_DRM_KUNIT_TEST) += \ drm_format_helper_test.o \ drm_format_test.o \ drm_framebuffer_test.o \ + drm_kunit_helpers.o \ drm_mm_test.o \ drm_plane_helper_test.o \ drm_rect_test.o diff --git a/drivers/gpu/drm/tests/drm_kunit_helpers.c b/drivers/gpu/drm/tests/drm_kunit_helpers.c new file mode 100644 index ..7ebd620481c1 --- /dev/null +++ b/drivers/gpu/drm/tests/drm_kunit_helpers.c @@ -0,0 +1,54 @@ +#include +#include + +#include + +static const struct drm_mode_config_funcs drm_mode_config_funcs = { +}; + +static const struct drm_driver drm_mode_driver = { +}; + +static void drm_kunit_free_device(struct drm_device *drm, void *ptr) +{ + struct device *dev = ptr; + + root_device_unregister(dev); +} + +struct drm_device *drm_kunit_device_init(const char *name) +{ + struct drm_device *drm; + struct device *dev; + int ret; + + dev = root_device_register(name); + if (IS_ERR(dev)) + return ERR_CAST(dev); + + drm = drm_dev_alloc(&drm_mode_driver, dev); >>> >>> I can't find drm being freed anywhere? >>> Maybe you could assign it to drm->managed.final_kfree. > > There's a drm_dev_put in the test_exit hook which should free it. > I see now, there's a drmm_add_final_kfree() in drm_dev_alloc(). Noralf. 
>> Perhaps a better solution would be to use devm_drm_dev_alloc() and >> unregister the root device on exit. That avoids reaching into the drm >> managed internals and it looks more like a regular driver. > > But yeah, this is a good idea, I'll do it. > > Maxime
Re: [Nouveau] [PATCH v2 10/33] drm/modes: Add a function to generate analog display modes
Hi Am 26.09.22 um 12:34 schrieb Geert Uytterhoeven: Hi Maxime, On Mon, Sep 26, 2022 at 12:17 PM Maxime Ripard wrote: On Fri, Sep 23, 2022 at 11:05:48AM +0200, Thomas Zimmermann wrote: + /* 63.556us * 13.5MHz = 858 pixels */ I kind of get what the comment wants to tell me, but the units don't add up. I'm not sure how it doesn't add up? We have a frequency in Hz (equivalent to s^-1) and a duration in s, so the result ends up with no dimension, which is to be expected for a number of periods? To make the units add up, it should be 13.5 Mpixel/s (which is what a pixel clock of 13.5 MHz really means ;-) Sort of. It leaves the time value as a magic number, which obfuscates what's happening. The unit for htotal is pixels/scanline because if you multiply it with the number of scanlines per frame (which is in vtotal), you get pixels/frame. Multiplying with the frames per second results in the pixel clock in pixels/second. That's a bit much for this comment. Hence, I suggested to remove these comments entirely and document the relation among the numbers in a more prominent location. The documentation for drm_display_mode would be a good place, I guess. Best regards Thomas Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [Nouveau] [PATCH v2 10/33] drm/modes: Add a function to generate analog display modes
Hi Am 26.09.22 um 12:18 schrieb Maxime Ripard: On Fri, Sep 23, 2022 at 12:16:13PM +0200, Thomas Zimmermann wrote: Hi Am 23.09.22 um 11:18 schrieb Jani Nikula: On Fri, 23 Sep 2022, Thomas Zimmermann wrote: Am 22.09.22 um 16:25 schrieb Maxime Ripard: + drm_dbg_kms(dev, + "Generating a %ux%u%c, %u-line mode with a %lu kHz clock\n", + hactive, vactive, + interlace ? 'i' : 'p', + params->num_lines, + pixel_clock_hz / 1000); Divide by HZ_PER_KHZ here and in other places. https://elixir.bootlin.com/linux/latest/source/include/linux/units.h#L23 From the Department of Bikeshedding: I find "pixel_clock_hz / 1000" has much more clarity than "pixel_clock_hz / HZ_PER_KHZ". This one's easy to see because it tells you with the _hz postfix. Many places don't and then it quickly gets confusing what units the code's converting. So if I add it to places that don't have it explicitly (ie, tests) would that be acceptable to both of you? I'm OK with either. Or just leave it as-is. A HZ_TO_KHZ macro would be nice, but that's beyond this patchset. Best regards Thomas Maxime
Re: [Nouveau] [PATCH v2 10/33] drm/modes: Add a function to generate analog display modes
Hi Maxime, On Mon, Sep 26, 2022 at 12:17 PM Maxime Ripard wrote: > On Fri, Sep 23, 2022 at 11:05:48AM +0200, Thomas Zimmermann wrote: > > > + /* 63.556us * 13.5MHz = 858 pixels */ > > > > I kind of get what the comment wants to tell me, but the units don't add up. > > I'm not sure how it doesn't add up? > > We have a frequency in Hz (equivalent to s^-1) and a duration in s, so > the result ends up with no dimension, which is to be expected for a > number of periods? To make the units add up, it should be 13.5 Mpixel/s (which is what a pixel clock of 13.5 MHz really means ;-) Gr{oetje,eeting}s, Geert
Re: [Nouveau] [PATCH v2 10/33] drm/modes: Add a function to generate analog display modes
On Fri, Sep 23, 2022 at 12:16:13PM +0200, Thomas Zimmermann wrote: > Hi > > Am 23.09.22 um 11:18 schrieb Jani Nikula: > > On Fri, 23 Sep 2022, Thomas Zimmermann wrote: > > > Am 22.09.22 um 16:25 schrieb Maxime Ripard: > > > > + drm_dbg_kms(dev, > > > > + "Generating a %ux%u%c, %u-line mode with a %lu kHz > > > > clock\n", > > > > + hactive, vactive, > > > > + interlace ? 'i' : 'p', > > > > + params->num_lines, > > > > + pixel_clock_hz / 1000); > > > > > > Divide by HZ_PER_KHZ here and in other places. > > > > > > > > > https://elixir.bootlin.com/linux/latest/source/include/linux/units.h#L23 > > > > From the Department of Bikeshedding: > > > > I find "pixel_clock_hz / 1000" has much more clarity than > > "pixel_clock_hz / HZ_PER_KHZ". > > This one's easy to see because it tells you with the _hz postfix. Many > places don't and then it quickly gets confusing what units the code's > converting. So if I add it to places that don't have it explicitly (ie, tests) would that be acceptable to both of you? Maxime
Re: [Nouveau] [PATCH v2 10/33] drm/modes: Add a function to generate analog display modes
Hi, On Fri, Sep 23, 2022 at 11:05:48AM +0200, Thomas Zimmermann wrote: > > + /* 63.556us * 13.5MHz = 858 pixels */ > > I kind of get what the comment wants to tell me, but the units don't add up. I'm not sure how it doesn't add up? We have a frequency in Hz (equivalent to s^-1) and a duration in s, so the result ends up with no dimension, which is to be expected for a number of periods? If you're talking about the comment itself, then NTSC mandates that a line is 63.556us long. If we're using a pixel clock at 13.5 MHz, it means that the period (== pixel) is ~74ns, so we get 63556 / 74 = 858 pixels / line. > I think you want to end up with 858 pixels/line = > > 13,5 Mpixels/second / (60/2I frame/second * 525 lines/frame) > > I: interlaced > > Maybe just remove the short comments and document that in a more meaningful > place. I guess this is where it's meaningful, we really want to hit that target. BT601 also mandates it. Maxime
Re: [Nouveau] [PATCH v2 09/33] drm/connector: Add TV standard property
On Sat, Sep 24, 2022 at 05:52:29PM +0200, Noralf Trønnes wrote: > Den 22.09.2022 16.25, skrev Maxime Ripard: > > The TV mode property has been around for a while now to select and get the > > current TV mode output on an analog TV connector. > > > > Despite that property name being generic, its content isn't and has been > > driver-specific which makes it hard to build any generic behaviour on top > > of it, both in kernel and user-space. > > > > Let's create a new enum tv norm property, that can contain any of the > > analog TV standards currently supported by kernel drivers. Each driver can > > then pass in a bitmask of the modes it supports, and the property > > creation function will filter out the modes not supported. > > > > We'll then be able to phase out the older tv mode property. > > > > Signed-off-by: Maxime Ripard > > Please can you add per-patch changelogs; it's hard to review when I have > to recall what might have happened with each patch. If you do it drm > style and put it in the commit message it should be easy enough to do. I certainly don't want to start that discussion, but I'm really not a fan of that format either. I'll do it for that series if you prefer. > > +/** > > + * enum drm_connector_tv_mode - Analog TV output mode > > + * > > + * This enum is used to indicate the TV output mode used on an analog TV > > + * connector. > > + * > > + * WARNING: The values of this enum are uABI since they're exposed in the > > + * "TV mode" connector property. > > + */ > > +enum drm_connector_tv_mode { > > + /** > > +* @DRM_MODE_TV_MODE_NONE: Placeholder to not default on one > > +* variant or the other when nothing is set. > > +*/ > > + DRM_MODE_TV_MODE_NONE = 0, > > How is this supposed to be used? It's not supposed to be used. It was a suggestion from Mateusz to avoid defaulting to any standard when we don't initialize something. I don't have any strong feeling about it, so I can drop it if you prefer. Maxime
Re: [Nouveau] [PATCH v2 06/33] drm/connector: Rename legacy TV property
Hi Thomas, On Fri, Sep 23, 2022 at 10:19:08AM +0200, Thomas Zimmermann wrote: > Hi > > Am 22.09.22 um 16:25 schrieb Maxime Ripard: > > The current tv_mode has driver-specific values that don't allow to > > easily share code using it, either at the userspace or kernel level. > > > > Since we're going to introduce a new, generic, property that fits the > > same purpose, let's rename this one to legacy_tv_mode to make it > > obvious we should move away from it. > > > > Signed-off-by: Maxime Ripard > > It's not wrong, but 'legacy' is already overloaded with meaning. If you can, > maybe name it 'driver_tv_mode_property' or 'custom_tv_mode_property' > instead. > > Acked-by: Thomas Zimmermann I'd really like to point out that new drivers shouldn't be using this. If we're using either of your proposals then writers might get the impression that this is ok to use. Would you prefer deprecated to legacy? Maxime
Re: [Nouveau] [PATCH v2 02/33] drm/tests: Add Kunit Helpers
Hi Noralf, On Sat, Sep 24, 2022 at 08:06:17PM +0200, Noralf Trønnes wrote: > Den 24.09.2022 19.56, skrev Noralf Trønnes: > > > > > > Den 22.09.2022 16.25, skrev Maxime Ripard: > >> As the number of kunit tests in KMS grows further, we start to have > >> multiple test suites that, for example, need to register a mock DRM > >> driver to interact with the KMS function they are supposed to test. > >> > >> Let's add a file meant to provide those kind of helpers to avoid > >> duplication. > >> > >> Signed-off-by: Maxime Ripard > >> > >> diff --git a/drivers/gpu/drm/tests/Makefile > >> b/drivers/gpu/drm/tests/Makefile > >> index 2d9f49b62ecb..b29ef1085cad 100644 > >> --- a/drivers/gpu/drm/tests/Makefile > >> +++ b/drivers/gpu/drm/tests/Makefile > >> @@ -8,6 +8,7 @@ obj-$(CONFIG_DRM_KUNIT_TEST) += \ > >>drm_format_helper_test.o \ > >>drm_format_test.o \ > >>drm_framebuffer_test.o \ > >> + drm_kunit_helpers.o \ > >>drm_mm_test.o \ > >>drm_plane_helper_test.o \ > >>drm_rect_test.o > >> diff --git a/drivers/gpu/drm/tests/drm_kunit_helpers.c > >> b/drivers/gpu/drm/tests/drm_kunit_helpers.c > >> new file mode 100644 > >> index ..7ebd620481c1 > >> --- /dev/null > >> +++ b/drivers/gpu/drm/tests/drm_kunit_helpers.c > >> @@ -0,0 +1,54 @@ > >> +#include > >> +#include > >> + > >> +#include > >> + > >> +static const struct drm_mode_config_funcs drm_mode_config_funcs = { > >> +}; > >> + > >> +static const struct drm_driver drm_mode_driver = { > >> +}; > >> + > >> +static void drm_kunit_free_device(struct drm_device *drm, void *ptr) > >> +{ > >> + struct device *dev = ptr; > >> + > >> + root_device_unregister(dev); > >> +} > >> + > >> +struct drm_device *drm_kunit_device_init(const char *name) > >> +{ > >> + struct drm_device *drm; > >> + struct device *dev; > >> + int ret; > >> + > >> + dev = root_device_register(name); > >> + if (IS_ERR(dev)) > >> + return ERR_CAST(dev); > >> + > >> + drm = drm_dev_alloc(_mode_driver, dev); > > > > I can't find drm being freed anywhere? 
> > Maybe you could assign it to drm->managed.final_kfree. There's a drm_dev_put in the test_exit hook which should free it. > Perhaps a better solution would be to use devm_drm_dev_alloc() and > unregister the root device on exit. That avoids reaching into the drm > managed internals and it looks more like a regular driver. But yeah, this is a good idea, I'll do it. Maxime
[Nouveau] [PATCH 7/7] hmm-tests: Add test for migrate_device_range()
Signed-off-by: Alistair Popple --- lib/test_hmm.c | 119 +- lib/test_hmm_uapi.h| 1 +- tools/testing/selftests/vm/hmm-tests.c | 49 +++- 3 files changed, 148 insertions(+), 21 deletions(-) diff --git a/lib/test_hmm.c b/lib/test_hmm.c index 2bd3a67..d2821dd 100644 --- a/lib/test_hmm.c +++ b/lib/test_hmm.c @@ -100,6 +100,7 @@ struct dmirror { struct dmirror_chunk { struct dev_pagemap pagemap; struct dmirror_device *mdevice; + bool remove; }; /* @@ -192,11 +193,15 @@ static int dmirror_fops_release(struct inode *inode, struct file *filp) return 0; } +static struct dmirror_chunk *dmirror_page_to_chunk(struct page *page) +{ + return container_of(page->pgmap, struct dmirror_chunk, pagemap); +} + static struct dmirror_device *dmirror_page_to_device(struct page *page) { - return container_of(page->pgmap, struct dmirror_chunk, - pagemap)->mdevice; + return dmirror_page_to_chunk(page)->mdevice; } static int dmirror_do_fault(struct dmirror *dmirror, struct hmm_range *range) @@ -1219,6 +1224,84 @@ static int dmirror_snapshot(struct dmirror *dmirror, return ret; } +static void dmirror_device_evict_chunk(struct dmirror_chunk *chunk) +{ + unsigned long start_pfn = chunk->pagemap.range.start >> PAGE_SHIFT; + unsigned long end_pfn = chunk->pagemap.range.end >> PAGE_SHIFT; + unsigned long npages = end_pfn - start_pfn + 1; + unsigned long i; + unsigned long *src_pfns; + unsigned long *dst_pfns; + + src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL); + dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL); + + migrate_device_range(src_pfns, start_pfn, npages); + for (i = 0; i < npages; i++) { + struct page *dpage, *spage; + + spage = migrate_pfn_to_page(src_pfns[i]); + if (!spage || !(src_pfns[i] & MIGRATE_PFN_MIGRATE)) + continue; + + if (WARN_ON(!is_device_private_page(spage) && + !is_device_coherent_page(spage))) + continue; + spage = BACKING_PAGE(spage); + dpage = alloc_page(GFP_HIGHUSER_MOVABLE | __GFP_NOFAIL); + lock_page(dpage); + copy_highpage(dpage, spage); + 
dst_pfns[i] = migrate_pfn(page_to_pfn(dpage)); + if (src_pfns[i] & MIGRATE_PFN_WRITE) + dst_pfns[i] |= MIGRATE_PFN_WRITE; + } + migrate_device_pages(src_pfns, dst_pfns, npages); + migrate_device_finalize(src_pfns, dst_pfns, npages); + kfree(src_pfns); + kfree(dst_pfns); +} + +/* Removes free pages from the free list so they can't be re-allocated */ +static void dmirror_remove_free_pages(struct dmirror_chunk *devmem) +{ + struct dmirror_device *mdevice = devmem->mdevice; + struct page *page; + + for (page = mdevice->free_pages; page; page = page->zone_device_data) + if (dmirror_page_to_chunk(page) == devmem) + mdevice->free_pages = page->zone_device_data; +} + +static void dmirror_device_remove_chunks(struct dmirror_device *mdevice) +{ + unsigned int i; + + mutex_lock(&mdevice->devmem_lock); + if (mdevice->devmem_chunks) { + for (i = 0; i < mdevice->devmem_count; i++) { + struct dmirror_chunk *devmem = + mdevice->devmem_chunks[i]; + + spin_lock(&mdevice->lock); + devmem->remove = true; + dmirror_remove_free_pages(devmem); + spin_unlock(&mdevice->lock); + + dmirror_device_evict_chunk(devmem); + memunmap_pages(&devmem->pagemap); + if (devmem->pagemap.type == MEMORY_DEVICE_PRIVATE) + release_mem_region(devmem->pagemap.range.start, + range_len(&devmem->pagemap.range)); + kfree(devmem); + } + mdevice->devmem_count = 0; + mdevice->devmem_capacity = 0; + mdevice->free_pages = NULL; + kfree(mdevice->devmem_chunks); + } + mutex_unlock(&mdevice->devmem_lock); +} + static long dmirror_fops_unlocked_ioctl(struct file *filp, unsigned int command, unsigned long arg) @@ -1273,6 +1356,11 @@ static long dmirror_fops_unlocked_ioctl(struct file *filp, ret = dmirror_snapshot(dmirror, &cmd); break; + case HMM_DMIRROR_RELEASE: + dmirror_device_remove_chunks(dmirror->mdevice); + ret = 0; + break; + default: return -EINVAL; } @@ -1327,9 +1415,13 @@ static void
[Nouveau] [PATCH 6/7] nouveau/dmem: Evict device private memory during release
When the module is unloaded or a GPU is unbound from the module it is possible for device private pages to be left mapped in currently running processes. This leads to a kernel crash when the pages are either freed or accessed from the CPU because the GPU and associated data structures and callbacks have all been freed. Fix this by migrating any mappings back to normal CPU memory prior to freeing the GPU memory chunks and associated device private pages. Signed-off-by: Alistair Popple --- I assume the AMD driver might have a similar issue. However I can't see where device private (or coherent) pages actually get unmapped/freed during teardown as I couldn't find any relevant calls to devm_memunmap(), memunmap(), devm_release_mem_region() or release_mem_region(). So it appears that ZONE_DEVICE pages are not being properly freed during module unload, unless I'm missing something? --- drivers/gpu/drm/nouveau/nouveau_dmem.c | 48 +++- 1 file changed, 48 insertions(+) diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index 66ebbd4..3b247b8 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -369,6 +369,52 @@ nouveau_dmem_suspend(struct nouveau_drm *drm) mutex_unlock(&drm->dmem->mutex); } +/* + * Evict all pages mapping a chunk. 
+ */ +void +nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk) +{ + unsigned long i, npages = range_len(&chunk->pagemap.range) >> PAGE_SHIFT; + unsigned long *src_pfns, *dst_pfns; + dma_addr_t *dma_addrs; + struct nouveau_fence *fence; + + src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL); + dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL); + dma_addrs = kcalloc(npages, sizeof(*dma_addrs), GFP_KERNEL); + + migrate_device_range(src_pfns, chunk->pagemap.range.start >> PAGE_SHIFT, + npages); + + for (i = 0; i < npages; i++) { + if (src_pfns[i] & MIGRATE_PFN_MIGRATE) { + struct page *dpage; + + /* +* __GFP_NOFAIL because the GPU is going away and there +* is nothing sensible we can do if we can't copy the +* data back. +*/ + dpage = alloc_page(GFP_HIGHUSER | __GFP_NOFAIL); + dst_pfns[i] = migrate_pfn(page_to_pfn(dpage)); + nouveau_dmem_copy_one(chunk->drm, + migrate_pfn_to_page(src_pfns[i]), dpage, + &dma_addrs[i]); + } + } + + nouveau_fence_new(chunk->drm->dmem->migrate.chan, false, &fence); + migrate_device_pages(src_pfns, dst_pfns, npages); + nouveau_dmem_fence_done(&fence); + migrate_device_finalize(src_pfns, dst_pfns, npages); + kfree(src_pfns); + kfree(dst_pfns); + for (i = 0; i < npages; i++) + dma_unmap_page(chunk->drm->dev->dev, dma_addrs[i], PAGE_SIZE, DMA_BIDIRECTIONAL); + kfree(dma_addrs); +} + void nouveau_dmem_fini(struct nouveau_drm *drm) { @@ -380,8 +426,10 @@ nouveau_dmem_fini(struct nouveau_drm *drm) mutex_lock(&drm->dmem->mutex); list_for_each_entry_safe(chunk, tmp, &drm->dmem->chunks, list) { + nouveau_dmem_evict_chunk(chunk); nouveau_bo_unpin(chunk->bo); nouveau_bo_ref(NULL, &chunk->bo); + WARN_ON(chunk->callocated); list_del(&chunk->list); memunmap_pages(&chunk->pagemap); release_mem_region(chunk->pagemap.range.start, -- git-series 0.9.1
[Nouveau] [PATCH 5/7] nouveau/dmem: Refactor nouveau_dmem_fault_copy_one()
nouveau_dmem_fault_copy_one() is used during handling of CPU faults via the migrate_to_ram() callback and is used to copy data from GPU to CPU memory. It is currently specific to fault handling, however a future patch implementing eviction of data during teardown needs similar functionality. Refactor out the core functionality so that it is not specific to fault handling. Signed-off-by: Alistair Popple --- drivers/gpu/drm/nouveau/nouveau_dmem.c | 59 +-- 1 file changed, 29 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index f9234ed..66ebbd4 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -139,44 +139,25 @@ static void nouveau_dmem_fence_done(struct nouveau_fence **fence) } } -static vm_fault_t nouveau_dmem_fault_copy_one(struct nouveau_drm *drm, - struct vm_fault *vmf, struct migrate_vma *args, - dma_addr_t *dma_addr) +static int nouveau_dmem_copy_one(struct nouveau_drm *drm, struct page *spage, + struct page *dpage, dma_addr_t *dma_addr) { struct device *dev = drm->dev->dev; - struct page *dpage, *spage; - struct nouveau_svmm *svmm; - - spage = migrate_pfn_to_page(args->src[0]); - if (!spage || !(args->src[0] & MIGRATE_PFN_MIGRATE)) - return 0; - dpage = alloc_page_vma(GFP_HIGHUSER, vmf->vma, vmf->address); - if (!dpage) - return VM_FAULT_SIGBUS; lock_page(dpage); *dma_addr = dma_map_page(dev, dpage, 0, PAGE_SIZE, DMA_BIDIRECTIONAL); if (dma_mapping_error(dev, *dma_addr)) - goto error_free_page; + return -EIO; - svmm = spage->zone_device_data; - mutex_lock(&svmm->mutex); - nouveau_svmm_invalidate(svmm, args->start, args->end); if (drm->dmem->migrate.copy_func(drm, 1, NOUVEAU_APER_HOST, *dma_addr, - NOUVEAU_APER_VRAM, nouveau_dmem_page_addr(spage))) - goto error_dma_unmap; - mutex_unlock(&svmm->mutex); +NOUVEAU_APER_VRAM, +nouveau_dmem_page_addr(spage))) { + dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); + return -EIO; + } - 
args->dst[0] = migrate_pfn(page_to_pfn(dpage)); return 0; - -error_dma_unmap: - mutex_unlock(&svmm->mutex); - dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); -error_free_page: - __free_page(dpage); - return VM_FAULT_SIGBUS; } static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf) @@ -184,9 +165,11 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf) struct nouveau_drm *drm = page_to_drm(vmf->page); struct nouveau_dmem *dmem = drm->dmem; struct nouveau_fence *fence; + struct nouveau_svmm *svmm; + struct page *spage, *dpage; unsigned long src = 0, dst = 0; dma_addr_t dma_addr = 0; - vm_fault_t ret; + vm_fault_t ret = 0; struct migrate_vma args = { .vma = vmf->vma, .start = vmf->address, @@ -207,9 +190,25 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf) if (!args.cpages) return 0; - ret = nouveau_dmem_fault_copy_one(drm, vmf, &args, &dma_addr); - if (ret || dst == 0) + spage = migrate_pfn_to_page(src); + if (!spage || !(src & MIGRATE_PFN_MIGRATE)) + goto done; + + dpage = alloc_page_vma(GFP_HIGHUSER, vmf->vma, vmf->address); + if (!dpage) + goto done; + + dst = migrate_pfn(page_to_pfn(dpage)); + + svmm = spage->zone_device_data; + mutex_lock(&svmm->mutex); + nouveau_svmm_invalidate(svmm, args.start, args.end); + ret = nouveau_dmem_copy_one(drm, spage, dpage, &dma_addr); + mutex_unlock(&svmm->mutex); + if (ret) { + ret = VM_FAULT_SIGBUS; goto done; + } nouveau_fence_new(dmem->migrate.chan, false, &fence); migrate_vma_pages(&args); -- git-series 0.9.1
[Nouveau] [PATCH 4/7] mm/migrate_device.c: Add migrate_device_range()
Device drivers can use the migrate_vma family of functions to migrate existing private anonymous mappings to device private pages. These pages are backed by memory on the device with drivers being responsible for copying data to and from device memory. Device private pages are freed via the pgmap->page_free() callback when they are unmapped and their refcount drops to zero. Alternatively they may be freed indirectly via migration back to CPU memory in response to a pgmap->migrate_to_ram() callback called whenever the CPU accesses an address mapped to a device private page. In other words drivers cannot control the lifetime of data allocated on the devices and must wait until these pages are freed from userspace. This causes issues when memory needs to reclaimed on the device, either because the device is going away due to a ->release() callback or because another user needs to use the memory. Drivers could use the existing migrate_vma functions to migrate data off the device. However this would require them to track the mappings of each page which is both complicated and not always possible. Instead drivers need to be able to migrate device pages directly so they can free up device memory. To allow that this patch introduces the migrate_device family of functions which are functionally similar to migrate_vma but which skips the initial lookup based on mapping. 
Signed-off-by: Alistair Popple --- include/linux/migrate.h | 7 +++- mm/migrate_device.c | 89 ++ 2 files changed, 89 insertions(+), 7 deletions(-) diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 82ffa47..582cdc7 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -225,6 +225,13 @@ struct migrate_vma { int migrate_vma_setup(struct migrate_vma *args); void migrate_vma_pages(struct migrate_vma *migrate); void migrate_vma_finalize(struct migrate_vma *migrate); +int migrate_device_range(unsigned long *src_pfns, unsigned long start, + unsigned long npages); +void migrate_device_pages(unsigned long *src_pfns, unsigned long *dst_pfns, + unsigned long npages); +void migrate_device_finalize(unsigned long *src_pfns, + unsigned long *dst_pfns, unsigned long npages); + #endif /* CONFIG_MIGRATION */ #endif /* _LINUX_MIGRATE_H */ diff --git a/mm/migrate_device.c b/mm/migrate_device.c index ba479b5..824860a 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -681,7 +681,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, *src &= ~MIGRATE_PFN_MIGRATE; } -static void migrate_device_pages(unsigned long *src_pfns, +static void __migrate_device_pages(unsigned long *src_pfns, unsigned long *dst_pfns, unsigned long npages, struct migrate_vma *migrate) { @@ -703,6 +703,9 @@ static void migrate_device_pages(unsigned long *src_pfns, if (!page) { unsigned long addr; + if (!(src_pfns[i] & MIGRATE_PFN_MIGRATE)) + continue; + /* * The only time there is no vma is when called from * migrate_device_coherent_page(). 
However this isn't @@ -710,8 +713,6 @@ static void migrate_device_pages(unsigned long *src_pfns, */ VM_BUG_ON(!migrate); addr = migrate->start + i*PAGE_SIZE; - if (!(src_pfns[i] & MIGRATE_PFN_MIGRATE)) - continue; if (!notified) { notified = true; @@ -767,6 +768,22 @@ static void migrate_device_pages(unsigned long *src_pfns, } /** + * migrate_device_pages() - migrate meta-data from src page to dst page + * @src_pfns: src_pfns returned from migrate_device_range() + * @dst_pfns: array of pfns allocated by the driver to migrate memory to + * @npages: number of pages in the range + * + * Equivalent to migrate_vma_pages(). This is called to migrate struct page + * meta-data from source struct page to destination. + */ +void migrate_device_pages(unsigned long *src_pfns, unsigned long *dst_pfns, + unsigned long npages) +{ + __migrate_device_pages(src_pfns, dst_pfns, npages, NULL); +} +EXPORT_SYMBOL(migrate_device_pages); + +/** * migrate_vma_pages() - migrate meta-data from src page to dst page * @migrate: migrate struct containing all migration information * @@ -776,12 +793,22 @@ static void migrate_device_pages(unsigned long *src_pfns, */ void migrate_vma_pages(struct migrate_vma *migrate) { - migrate_device_pages(migrate->src, migrate->dst, migrate->npages, migrate); + __migrate_device_pages(migrate->src, migrate->dst, migrate->npages, migrate); } EXPORT_SYMBOL(migrate_vma_pages); -static void
[Nouveau] [PATCH 3/7] mm/migrate_device.c: Refactor migrate_vma and migrate_device_coherent_page()
migrate_device_coherent_page() reuses the existing migrate_vma family of functions to migrate a specific page without providing a valid mapping or vma. This looks a bit odd because it means we are calling migrate_vma_*() without setting a valid vma, however it was considered acceptable at the time because the details were internal to migrate_device.c and there was only a single user.

One of the reasons the details could be kept internal was that this was strictly for migrating device coherent memory. Such memory can be copied directly by the CPU without intervention from a driver. However this isn't true for device private memory, and a future change requires similar functionality for device private memory. So refactor the code into something more sensible for migrating device memory without a vma.

Signed-off-by: Alistair Popple
---
 mm/migrate_device.c | 150 +
 1 file changed, 85 insertions(+), 65 deletions(-)

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index f756c00..ba479b5 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -345,26 +345,20 @@ static bool migrate_vma_check_page(struct page *page, struct page *fault_page)
 }
 
 /*
- * migrate_vma_unmap() - replace page mapping with special migration pte entry
- * @migrate: migrate struct containing all migration information
- *
- * Isolate pages from the LRU and replace mappings (CPU page table pte) with a
- * special migration pte entry and check if it has been pinned. Pinned pages are
- * restored because we cannot migrate them.
- *
- * This is the last step before we call the device driver callback to allocate
- * destination memory and copy contents of original page over to new page.
+ * Unmaps pages for migration. Returns number of unmapped pages.
  */
-static void migrate_vma_unmap(struct migrate_vma *migrate)
+static unsigned long migrate_device_unmap(unsigned long *src_pfns,
+					unsigned long npages,
+					struct page *fault_page)
 {
-	const unsigned long npages = migrate->npages;
 	unsigned long i, restore = 0;
 	bool allow_drain = true;
+	unsigned long unmapped = 0;
 
 	lru_add_drain();
 
 	for (i = 0; i < npages; i++) {
-		struct page *page = migrate_pfn_to_page(migrate->src[i]);
+		struct page *page = migrate_pfn_to_page(src_pfns[i]);
 		struct folio *folio;
 
 		if (!page)
@@ -379,8 +373,7 @@ static void migrate_vma_unmap(struct migrate_vma *migrate)
 		}
 
 		if (isolate_lru_page(page)) {
-			migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
-			migrate->cpages--;
+			src_pfns[i] &= ~MIGRATE_PFN_MIGRATE;
 			restore++;
 			continue;
 		}
@@ -394,34 +387,54 @@ static void migrate_vma_unmap(struct migrate_vma *migrate)
 			try_to_migrate(folio, 0);
 
 		if (page_mapped(page) ||
-		    !migrate_vma_check_page(page, migrate->fault_page)) {
+		    !migrate_vma_check_page(page, fault_page)) {
 			if (!is_zone_device_page(page)) {
 				get_page(page);
 				putback_lru_page(page);
 			}
 
-			migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
-			migrate->cpages--;
+			src_pfns[i] &= ~MIGRATE_PFN_MIGRATE;
 			restore++;
 			continue;
 		}
+
+		unmapped++;
 	}
 
 	for (i = 0; i < npages && restore; i++) {
-		struct page *page = migrate_pfn_to_page(migrate->src[i]);
+		struct page *page = migrate_pfn_to_page(src_pfns[i]);
 		struct folio *folio;
 
-		if (!page || (migrate->src[i] & MIGRATE_PFN_MIGRATE))
+		if (!page || (src_pfns[i] & MIGRATE_PFN_MIGRATE))
 			continue;
 
 		folio = page_folio(page);
 		remove_migration_ptes(folio, folio, false);
 
-		migrate->src[i] = 0;
+		src_pfns[i] = 0;
 		folio_unlock(folio);
 		folio_put(folio);
 		restore--;
 	}
+
+	return unmapped;
+}
+
+/*
+ * migrate_vma_unmap() - replace page mapping with special migration pte entry
+ * @migrate: migrate struct containing all migration information
+ *
+ * Isolate pages from the LRU and replace mappings (CPU page table pte) with a
+ * special migration pte entry and check if it has been pinned. Pinned pages are
+ * restored because we cannot migrate them.
+ *
+ * This is the last step before we call the device driver callback to allocate
+ * destination memory and copy contents of original
[Nouveau] [PATCH 2/7] mm: Free device private pages have zero refcount
Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page refcount") device private pages have no longer had an extra reference count when the page is in use. However before handing them back to the owning device driver we add an extra reference count such that free pages have a reference count of one.

This makes it difficult to tell if a page is free or not because both free and in-use pages will have a non-zero refcount. Instead we should return pages to the driver's page allocator with a zero reference count. Kernel code can then safely use kernel functions such as get_page_unless_zero().

Signed-off-by: Alistair Popple
---
 arch/powerpc/kvm/book3s_hv_uvmem.c       | 1 +
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 +
 drivers/gpu/drm/nouveau/nouveau_dmem.c   | 1 +
 lib/test_hmm.c                           | 1 +
 mm/memremap.c                            | 5 -
 mm/page_alloc.c                          | 6 ++
 6 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index d4eacf4..08d2f7d 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -718,6 +718,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm)
 
 	dpage = pfn_to_page(uvmem_pfn);
 	dpage->zone_device_data = pvt;
+	set_page_count(dpage, 1);
 	lock_page(dpage);
 	return dpage;
 out_clear:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 776448b..05c2f4d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -223,6 +223,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn)
 	page = pfn_to_page(pfn);
 	svm_range_bo_ref(prange->svm_bo);
 	page->zone_device_data = prange->svm_bo;
+	set_page_count(page, 1);
 	lock_page(page);
 }
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 1635661..f9234ed 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -326,6 +326,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm)
 			return NULL;
 	}
 
+	set_page_count(page, 1);
 	lock_page(page);
 	return page;
 }
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 89463ff..2bd3a67 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -627,6 +627,7 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice)
 			goto error;
 	}
 
+	set_page_count(dpage, 1);
 	dpage->zone_device_data = rpage;
 	lock_page(dpage);
 	return dpage;
diff --git a/mm/memremap.c b/mm/memremap.c
index 25029a4..e065171 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -501,11 +501,6 @@ void free_zone_device_page(struct page *page)
 	 */
 	page->mapping = NULL;
 	page->pgmap->ops->page_free(page);
-
-	/*
-	 * Reset the page count to 1 to prepare for handing out the page again.
-	 */
-	set_page_count(page, 1);
 }
 
 #ifdef CONFIG_FS_DAX
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9d49803..67eaab5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6744,6 +6744,12 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 		set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 		cond_resched();
 	}
+
+	/*
+	 * ZONE_DEVICE pages are released directly to the driver page allocator
+	 * which will set the page count to 1 when allocating the page.
+	 */
+	set_page_count(page, 0);
 }
 
 /*
-- 
git-series 0.9.1
[Nouveau] [PATCH 1/7] mm/memory.c: Fix race when faulting a device private page
When the CPU tries to access a device private page the migrate_to_ram() callback associated with the pgmap for the page is called. However no reference is taken on the faulting page. Therefore a concurrent migration of the device private page can free the page and possibly the underlying pgmap. This results in a race which can crash the kernel due to the migrate_to_ram() function pointer becoming invalid. It also means drivers can't reliably read the zone_device_data field because the page may have been freed with memunmap_pages().

Close the race by getting a reference on the page while holding the ptl to ensure it has not been freed. Unfortunately the elevated reference count will cause the migration required to handle the fault to fail. To avoid this failure pass the faulting page into the migrate_vma functions so that if an elevated reference count is found it can be checked to see if it's expected or not.

Signed-off-by: Alistair Popple
---
 arch/powerpc/kvm/book3s_hv_uvmem.c       | 15 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 17 +++--
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c     | 11 +---
 include/linux/migrate.h                  |  8 ++-
 lib/test_hmm.c                           |  7 ++---
 mm/memory.c                              | 16 +++-
 mm/migrate.c                             | 34 ++---
 mm/migrate_device.c                      | 18 +
 9 files changed, 87 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 5980063..d4eacf4 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -508,10 +508,10 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
 static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
 		unsigned long start,
 		unsigned long end, unsigned long page_shift,
-		struct kvm *kvm, unsigned long gpa)
+		struct kvm *kvm, unsigned long gpa, struct page *fault_page)
 {
 	unsigned long src_pfn, dst_pfn = 0;
-	struct migrate_vma mig;
+	struct migrate_vma mig = { 0 };
 	struct page *dpage, *spage;
 	struct kvmppc_uvmem_page_pvt *pvt;
 	unsigned long pfn;
@@ -525,6 +525,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
 	mig.dst = &dst_pfn;
 	mig.pgmap_owner = &kvmppc_uvmem_pgmap;
 	mig.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
+	mig.fault_page = fault_page;
 
 	/* The requested page is already paged-out, nothing to do */
 	if (!kvmppc_gfn_is_uvmem_pfn(gpa >> page_shift, kvm, NULL))
@@ -580,12 +581,14 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
 static inline int kvmppc_svm_page_out(struct vm_area_struct *vma,
 		unsigned long start, unsigned long end,
 		unsigned long page_shift,
-		struct kvm *kvm, unsigned long gpa)
+		struct kvm *kvm, unsigned long gpa,
+		struct page *fault_page)
 {
 	int ret;
 
 	mutex_lock(&kvm->arch.uvmem_lock);
-	ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa);
+	ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa,
+				fault_page);
 	mutex_unlock(&kvm->arch.uvmem_lock);
 
 	return ret;
@@ -736,7 +739,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
 		bool pagein)
 {
 	unsigned long src_pfn, dst_pfn = 0;
-	struct migrate_vma mig;
+	struct migrate_vma mig = { 0 };
 	struct page *spage;
 	unsigned long pfn;
 	struct page *dpage;
@@ -994,7 +997,7 @@ static vm_fault_t kvmppc_uvmem_migrate_to_ram(struct vm_fault *vmf)
 
 	if (kvmppc_svm_page_out(vmf->vma, vmf->address,
 				vmf->address + PAGE_SIZE, PAGE_SHIFT,
-				pvt->kvm, pvt->gpa))
+				pvt->kvm, pvt->gpa, vmf->page))
 		return VM_FAULT_SIGBUS;
 	else
 		return 0;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index b059a77..776448b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -409,7 +409,7 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
 	uint64_t npages = (end - start) >> PAGE_SHIFT;
 	struct kfd_process_device *pdd;
 	struct dma_fence *mfence = NULL;
-	struct migrate_vma migrate;
+	struct migrate_vma migrate = { 0 };
 	unsigned long cpages = 0;
 	dma_addr_t *scratch;
 	void *buf;
@@ -668,7 +668,7 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
 static long
[Nouveau] [PATCH 0/7] Fix several device private page reference counting issues
This series aims to fix a number of page reference counting issues in drivers dealing with device private ZONE_DEVICE pages. These result in use-after-free type bugs, either from accessing a struct page which no longer exists because it has been removed or accessing fields within the struct page which are no longer valid because the page has been freed.

During normal usage it is unlikely these will cause any problems. However without these fixes it is possible to crash the kernel from userspace. These crashes can be triggered either by unloading the kernel module or unbinding the device from the driver prior to a userspace task exiting. In modules such as Nouveau it is also possible to trigger some of these issues by explicitly closing the device file-descriptor prior to the task exiting and then accessing device private memory.

This involves changes to both PowerPC and AMD GPU code. Unfortunately I lack the hardware to test on either of these so would appreciate it if someone with access could test those.
Alistair Popple (7):
  mm/memory.c: Fix race when faulting a device private page
  mm: Free device private pages have zero refcount
  mm/migrate_device.c: Refactor migrate_vma and migrate_device_coherent_page()
  mm/migrate_device.c: Add migrate_device_range()
  nouveau/dmem: Refactor nouveau_dmem_fault_copy_one()
  nouveau/dmem: Evict device private memory during release
  hmm-tests: Add test for migrate_device_range()

 arch/powerpc/kvm/book3s_hv_uvmem.c       |  16 +-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  18 +-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c     |  11 +-
 drivers/gpu/drm/nouveau/nouveau_dmem.c   | 108 +++
 include/linux/migrate.h                  |  15 ++-
 lib/test_hmm.c                           | 127 ++---
 lib/test_hmm_uapi.h                      |   1 +-
 mm/memory.c                              |  16 +-
 mm/memremap.c                            |   5 +-
 mm/migrate.c                             |  34 +--
 mm/migrate_device.c                      | 239 +---
 mm/page_alloc.c                          |   6 +-
 tools/testing/selftests/vm/hmm-tests.c   |  49 +-
 14 files changed, 487 insertions(+), 160 deletions(-)

base-commit: 088b8aa537c2c767765f1c19b555f21ffe555786
-- 
git-series 0.9.1