Re: [PATCH 1/2] drm/amdgpu: use pre-calculated bo size
Am 13.04.21 um 22:57 schrieb Nirmoy Das: Use bo->tbo.base.size instead of calculating it from num_pages. Those don't clash with the two I've send out yesterday, don't they? Signed-off-by: Nirmoy Das Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 1345f7eba011..74ecc0c1863f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -1371,7 +1371,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) if (bo->mem.mem_type != TTM_PL_VRAM) return 0; - size = bo->mem.num_pages << PAGE_SHIFT; + size = bo->base.size; offset = bo->mem.start << PAGE_SHIFT; if ((offset + size) <= adev->gmc.visible_vram_size) return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 936b3cfdde55..26deace78539 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -2048,7 +2048,7 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo, return r; } - num_bytes = bo->tbo.mem.num_pages << PAGE_SHIFT; + num_bytes = bo->tbo.base.size; num_loops = 0; amdgpu_res_first(&bo->tbo.mem, 0, num_bytes, &cursor); ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 8/9] drm/ttm: Don't count pages in SG BOs against pages_limit
Am 14.04.21 um 08:48 schrieb Felix Kuehling: Pages in SG BOs were not allocated by TTM. So don't count them against TTM's pages limit. Signed-off-by: Felix Kuehling Reviewed-by: Christian König Going to pick that one up for inclusion in drm-misc-next. Regards, Christian. --- drivers/gpu/drm/ttm/ttm_tt.c | 27 ++- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 5d8820725b75..e8b8c3257392 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -317,9 +317,12 @@ int ttm_tt_populate(struct ttm_device *bdev, if (ttm_tt_is_populated(ttm)) return 0; - atomic_long_add(ttm->num_pages, &ttm_pages_allocated); - if (bdev->pool.use_dma32) - atomic_long_add(ttm->num_pages, &ttm_dma32_pages_allocated); + if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) { + atomic_long_add(ttm->num_pages, &ttm_pages_allocated); + if (bdev->pool.use_dma32) + atomic_long_add(ttm->num_pages, + &ttm_dma32_pages_allocated); + } while (atomic_long_read(&ttm_pages_allocated) > ttm_pages_limit || atomic_long_read(&ttm_dma32_pages_allocated) > @@ -350,9 +353,12 @@ int ttm_tt_populate(struct ttm_device *bdev, return 0; error: - atomic_long_sub(ttm->num_pages, &ttm_pages_allocated); - if (bdev->pool.use_dma32) - atomic_long_sub(ttm->num_pages, &ttm_dma32_pages_allocated); + if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) { + atomic_long_sub(ttm->num_pages, &ttm_pages_allocated); + if (bdev->pool.use_dma32) + atomic_long_sub(ttm->num_pages, + &ttm_dma32_pages_allocated); + } return ret; } EXPORT_SYMBOL(ttm_tt_populate); @@ -382,9 +388,12 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm) else ttm_pool_free(&bdev->pool, ttm); - atomic_long_sub(ttm->num_pages, &ttm_pages_allocated); - if (bdev->pool.use_dma32) - atomic_long_sub(ttm->num_pages, &ttm_dma32_pages_allocated); + if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) { + atomic_long_sub(ttm->num_pages, &ttm_pages_allocated); + if (bdev->pool.use_dma32) + atomic_long_sub(ttm->num_pages, + &ttm_dma32_pages_allocated); + } ttm->page_flags &= ~TTM_PAGE_FLAG_PRIV_POPULATED; } ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K
Am 13.04.21 um 23:19 schrieb Mikhail Gavrilov: On Tue, 13 Apr 2021 at 12:29, Christian König wrote: Hi Mikhail, the crash is a known issue and should be fixed by: commit f63da9ae7584280582cbc834b20cc18bfb203b14 Author: Philip Yang Date: Thu Apr 1 00:22:23 2021 -0400 drm/amdgpu: reserve fence slot to update page table Unfortunately, this commit couldn't fix the initial problem. 1. Result video is jerky if it grabbed and encoded with ffmpeg (h264_vaapi codec). 2. OBS still crashed if I try to record or stream video. 3. In the kernel log still appears the message "amdgpu: [mmhub] page fault (src_id:0 ring:0 vmid:4 pasid:32770, for process obs" if I tried to record or stream video by OBS. That is expected behavior, the application is just buggy and causing a page fault on the GPU. The kernel should just not crash with a backtrace. Regards, Christian. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 9/9] drm/amdgpu: Lock the attached dmabuf in unpopulate
amdgpu_ttm_tt_unpopulate can be called during bo_destroy. The dmabuf->resv must not be held by the caller or dma_buf_detach will deadlock. This is probably not the right fix. I get a recursive lock warning with the reservation held in ttm_bo_release. Should unmap_attachment move to backend_unbind instead? Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 13 + 1 file changed, 13 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 936b3cfdde55..257750921eed 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -1216,9 +1216,22 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_device *bdev, if (ttm->sg && gtt->gobj->import_attach) { struct dma_buf_attachment *attach; + bool locked; attach = gtt->gobj->import_attach; + /* FIXME: unpopulate can be called during bo_destroy. +* The dmabuf->resv must not be held by the caller or +* dma_buf_detach will deadlock. This is probably not +* the right fix. I get a recursive lock warning with the +* reservation held in ttm_bo_releas.. Should +* unmap_attachment move to backend_unbind instead? +*/ + locked = dma_resv_is_locked(attach->dmabuf->resv); + if (!locked) + dma_resv_lock(attach->dmabuf->resv, NULL); dma_buf_unmap_attachment(attach, ttm->sg, DMA_BIDIRECTIONAL); + if (!locked) + dma_resv_unlock(attach->dmabuf->resv); ttm->sg = NULL; return; } -- 2.31.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 5/9] drm/amdgpu: DMA map/unmap when updating GPU mappings
DMA map kfd_mem_attachments in update_gpuvm_pte. This function is called with the BO and page tables reserved, so we can safely update the DMA mapping. DMA unmap when a BO is unmapped from a GPU and before updating mappings in restore workers. Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 56 ++- 1 file changed, 29 insertions(+), 27 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 51502a07fc1d..3bb2ae185bbb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -964,11 +964,12 @@ static int unreserve_bo_and_vms(struct bo_vm_reservation_context *ctx, return ret; } -static int unmap_bo_from_gpuvm(struct amdgpu_device *adev, +static void unmap_bo_from_gpuvm(struct kgd_mem *mem, struct kfd_mem_attachment *entry, struct amdgpu_sync *sync) { struct amdgpu_bo_va *bo_va = entry->bo_va; + struct amdgpu_device *adev = entry->adev; struct amdgpu_vm *vm = bo_va->base.vm; amdgpu_vm_bo_unmap(adev, bo_va, entry->va); @@ -977,15 +978,20 @@ static int unmap_bo_from_gpuvm(struct amdgpu_device *adev, amdgpu_sync_fence(sync, bo_va->last_pt_update); - return 0; + kfd_mem_dmaunmap_attachment(mem, entry); } -static int update_gpuvm_pte(struct amdgpu_device *adev, - struct kfd_mem_attachment *entry, - struct amdgpu_sync *sync) +static int update_gpuvm_pte(struct kgd_mem *mem, + struct kfd_mem_attachment *entry, + struct amdgpu_sync *sync) { - int ret; struct amdgpu_bo_va *bo_va = entry->bo_va; + struct amdgpu_device *adev = entry->adev; + int ret; + + ret = kfd_mem_dmamap_attachment(mem, entry); + if (ret) + return ret; /* Update the page tables */ ret = amdgpu_vm_bo_update(adev, bo_va, false); @@ -997,14 +1003,15 @@ static int update_gpuvm_pte(struct amdgpu_device *adev, return amdgpu_sync_fence(sync, bo_va->last_pt_update); } -static int map_bo_to_gpuvm(struct amdgpu_device *adev, - struct kfd_mem_attachment *entry, struct amdgpu_sync *sync, - bool no_update_pte) +static int map_bo_to_gpuvm(struct kgd_mem *mem, + struct kfd_mem_attachment *entry, + struct amdgpu_sync *sync, + bool no_update_pte) { int ret; /* Set virtual address for the allocation */ - ret = amdgpu_vm_bo_map(adev, entry->bo_va, entry->va, 0, + ret = amdgpu_vm_bo_map(entry->adev, entry->bo_va, entry->va, 0, amdgpu_bo_size(entry->bo_va->base.bo), entry->pte_flags); if (ret) { @@ -1016,7 +1023,7 @@ static int map_bo_to_gpuvm(struct amdgpu_device *adev, if (no_update_pte) return 0; - ret = update_gpuvm_pte(adev, entry, sync); + ret = update_gpuvm_pte(mem, entry, sync); if (ret) { pr_err("update_gpuvm_pte() failed\n"); goto update_gpuvm_pte_failed; @@ -1025,7 +1032,7 @@ static int map_bo_to_gpuvm(struct amdgpu_device *adev, return 0; update_gpuvm_pte_failed: - unmap_bo_from_gpuvm(adev, entry, sync); + unmap_bo_from_gpuvm(mem, entry, sync); return ret; } @@ -1633,7 +1640,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu( pr_debug("\t map VA 0x%llx - 0x%llx in entry %p\n", entry->va, entry->va + bo_size, entry); - ret = map_bo_to_gpuvm(adev, entry, ctx.sync, + ret = map_bo_to_gpuvm(mem, entry, ctx.sync, is_invalid_userptr); if (ret) { pr_err("Failed to map bo to gpuvm\n"); @@ -1672,7 +1679,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu( int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu( struct kgd_dev *kgd, struct kgd_mem *mem, void *vm) { - struct amdgpu_device *adev = get_amdgpu_device(kgd); struct amdkfd_process_info *process_info = ((struct amdgpu_vm *)vm)->process_info; unsigned long bo_size = mem->bo->tbo.base.size; @@ -1707,13 +1713,8 @@ int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu( pr_debug("\t unmap VA 0x%llx - 0x%llx from entry %p\n", entry->va, entry->va + bo_size, entry); - ret = unmap_bo_from_gpuvm(adev, entry, ctx.sync); - if (ret == 0) { - entry->is_mapped = false; - } else { - pr_err("failed to unmap VA 0x%llx\n", mem->va); - goto unreserve_out; - } + unmap_bo_from_gpuvm(mem,
[PATCH 6/9] drm/amdgpu: Move kfd_mem_attach outside reservation
This is needed to avoid deadlocks with DMA buf import in the next patch. Also move PT/PD validation out of kfd_mem_attach, that way the caller can bo this unconditionally. Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 72 +++ 1 file changed, 42 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 3bb2ae185bbb..1416f3c03f1d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -575,6 +575,32 @@ kfd_mem_dmaunmap_attachment(struct kgd_mem *mem, } } +static int +kfd_mem_attach_userptr(struct amdgpu_device *adev, struct kgd_mem *mem, + struct amdgpu_bo **bo) +{ + unsigned long bo_size = mem->bo->tbo.base.size; + struct drm_gem_object *gobj; + int ret; + + ret = amdgpu_bo_reserve(mem->bo, false); + if (ret) + return ret; + + ret = amdgpu_gem_object_create(adev, bo_size, 1, + AMDGPU_GEM_DOMAIN_CPU, + 0, ttm_bo_type_sg, + mem->bo->tbo.base.resv, + &gobj); + if (ret) + return ret; + + amdgpu_bo_unreserve(mem->bo); + + *bo = gem_to_amdgpu_bo(gobj); + return 0; +} + /* kfd_mem_attach - Add a BO to a VM * * Everything that needs to bo done only once when a BO is first added @@ -596,7 +622,6 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem, uint64_t va = mem->va; struct kfd_mem_attachment *attachment[2] = {NULL, NULL}; struct amdgpu_bo *bo[2] = {NULL, NULL}; - struct drm_gem_object *gobj; int i, ret; if (!va) { @@ -630,14 +655,9 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem, } else if (amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm)) { /* Create an SG BO to DMA-map userptrs on other GPUs */ attachment[i]->type = KFD_MEM_ATT_USERPTR; - ret = amdgpu_gem_object_create(adev, bo_size, 1, - AMDGPU_GEM_DOMAIN_CPU, - 0, ttm_bo_type_sg, - mem->bo->tbo.base.resv, - &gobj); + ret = kfd_mem_attach_userptr(adev, mem, &bo[i]); if (ret) goto unwind; - bo[i] = gem_to_amdgpu_bo(gobj); } else { /* FIXME: Need to DMA-map other BO types */ attachment[i]->type = KFD_MEM_ATT_SHARED; @@ -662,13 +682,6 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem, va += bo_size; } - /* Allocate validate page tables if needed */ - ret = vm_validate_pt_pd_bos(vm); - if (unlikely(ret)) { - pr_err("validate_pt_pd_bos() failed\n"); - goto unwind; - } - return 0; unwind: @@ -1516,12 +1529,12 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu( pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va, mem->va + bo_size * (1 + mem->aql_queue)); + ret = unreserve_bo_and_vms(&ctx, false, false); + /* Remove from VM internal data structures */ list_for_each_entry_safe(entry, tmp, &mem->attachments, list) kfd_mem_detach(mem, entry); - ret = unreserve_bo_and_vms(&ctx, false, false); - /* Free the sync object */ amdgpu_sync_free(&mem->sync); @@ -1597,6 +1610,12 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu( mem->va + bo_size * (1 + mem->aql_queue), vm, domain_string(domain)); + if (!kfd_mem_is_attached(avm, mem)) { + ret = kfd_mem_attach(adev, mem, avm, mem->aql_queue); + if (ret) + goto out; + } + ret = reserve_bo_and_vm(mem, vm, &ctx); if (unlikely(ret)) goto out; @@ -1610,15 +1629,9 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu( bo->tbo.mem.mem_type == TTM_PL_SYSTEM) is_invalid_userptr = true; - if (!kfd_mem_is_attached(avm, mem)) { - ret = kfd_mem_attach(adev, mem, avm, mem->aql_queue); - if (ret) - goto attach_failed; - } else { - ret = vm_validate_pt_pd_bos(avm); - if (unlikely(ret)) - goto attach_failed; - } + ret = vm_validate_pt_pd_bos(avm); + if (unlikely(ret)) + goto out_unreserve; if (mem->mapped_to_gpu_memor
[PATCH 8/9] drm/ttm: Don't count pages in SG BOs against pages_limit
Pages in SG BOs were not allocated by TTM. So don't count them against TTM's pages limit. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/ttm/ttm_tt.c | 27 ++- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 5d8820725b75..e8b8c3257392 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -317,9 +317,12 @@ int ttm_tt_populate(struct ttm_device *bdev, if (ttm_tt_is_populated(ttm)) return 0; - atomic_long_add(ttm->num_pages, &ttm_pages_allocated); - if (bdev->pool.use_dma32) - atomic_long_add(ttm->num_pages, &ttm_dma32_pages_allocated); + if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) { + atomic_long_add(ttm->num_pages, &ttm_pages_allocated); + if (bdev->pool.use_dma32) + atomic_long_add(ttm->num_pages, + &ttm_dma32_pages_allocated); + } while (atomic_long_read(&ttm_pages_allocated) > ttm_pages_limit || atomic_long_read(&ttm_dma32_pages_allocated) > @@ -350,9 +353,12 @@ int ttm_tt_populate(struct ttm_device *bdev, return 0; error: - atomic_long_sub(ttm->num_pages, &ttm_pages_allocated); - if (bdev->pool.use_dma32) - atomic_long_sub(ttm->num_pages, &ttm_dma32_pages_allocated); + if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) { + atomic_long_sub(ttm->num_pages, &ttm_pages_allocated); + if (bdev->pool.use_dma32) + atomic_long_sub(ttm->num_pages, + &ttm_dma32_pages_allocated); + } return ret; } EXPORT_SYMBOL(ttm_tt_populate); @@ -382,9 +388,12 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm) else ttm_pool_free(&bdev->pool, ttm); - atomic_long_sub(ttm->num_pages, &ttm_pages_allocated); - if (bdev->pool.use_dma32) - atomic_long_sub(ttm->num_pages, &ttm_dma32_pages_allocated); + if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) { + atomic_long_sub(ttm->num_pages, &ttm_pages_allocated); + if (bdev->pool.use_dma32) + atomic_long_sub(ttm->num_pages, + &ttm_dma32_pages_allocated); + } ttm->page_flags &= ~TTM_PAGE_FLAG_PRIV_POPULATED; } -- 2.31.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 7/9] drm/amdgpu: Add DMA mapping of GTT BOs
Use DMABufs with dynamic attachment to DMA-map GTT BOs on other GPUs. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 2 + .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 74 ++- 2 files changed, 75 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index fc3514ed1b74..3ea51982b720 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -41,6 +41,7 @@ struct amdgpu_device; enum kfd_mem_attachment_type { KFD_MEM_ATT_SHARED, /* Share kgd_mem->bo or another attachment's */ KFD_MEM_ATT_USERPTR,/* SG bo to DMA map pages from a userptr bo */ + KFD_MEM_ATT_DMABUF, /* DMAbuf to DMA map TTM BOs */ }; struct kfd_mem_attachment { @@ -56,6 +57,7 @@ struct kfd_mem_attachment { struct kgd_mem { struct mutex lock; struct amdgpu_bo *bo; + struct dma_buf *dmabuf; struct list_head attachments; /* protected by amdkfd_process_info.lock */ struct ttm_validate_buffer validate_list; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 1416f3c03f1d..bb3a96ab8f20 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -522,6 +522,16 @@ kfd_mem_dmamap_userptr(struct kgd_mem *mem, return ret; } +static int +kfd_mem_dmamap_dmabuf(struct kfd_mem_attachment *attachment) +{ + struct ttm_operation_ctx ctx = {.interruptible = true}; + struct amdgpu_bo *bo = attachment->bo_va->base.bo; + + amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT); + return ttm_bo_validate(&bo->tbo, &bo->placement, &ctx); +} + static int kfd_mem_dmamap_attachment(struct kgd_mem *mem, struct kfd_mem_attachment *attachment) @@ -531,6 +541,8 @@ kfd_mem_dmamap_attachment(struct kgd_mem *mem, return 0; case KFD_MEM_ATT_USERPTR: return kfd_mem_dmamap_userptr(mem, attachment); + case KFD_MEM_ATT_DMABUF: + return kfd_mem_dmamap_dmabuf(attachment); default: WARN_ON_ONCE(1); } @@ -560,6 +572,19 @@ kfd_mem_dmaunmap_userptr(struct kgd_mem *mem, ttm->sg = NULL; } +static void +kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment *attachment) +{ + struct ttm_operation_ctx ctx = {.interruptible = true}; + struct amdgpu_bo *bo = attachment->bo_va->base.bo; + + amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU); + ttm_bo_validate(&bo->tbo, &bo->placement, &ctx); + /* FIXME: This does not guarantee that amdgpu_ttm_tt_unpopulate is +* called +*/ +} + static void kfd_mem_dmaunmap_attachment(struct kgd_mem *mem, struct kfd_mem_attachment *attachment) @@ -570,6 +595,9 @@ kfd_mem_dmaunmap_attachment(struct kgd_mem *mem, case KFD_MEM_ATT_USERPTR: kfd_mem_dmaunmap_userptr(mem, attachment); break; + case KFD_MEM_ATT_DMABUF: + kfd_mem_dmaunmap_dmabuf(attachment); + break; default: WARN_ON_ONCE(1); } @@ -601,6 +629,36 @@ kfd_mem_attach_userptr(struct amdgpu_device *adev, struct kgd_mem *mem, return 0; } +static int +kfd_mem_attach_dmabuf(struct amdgpu_device *adev, struct kgd_mem *mem, + struct amdgpu_bo **bo) +{ + struct drm_gem_object *gobj; + + if (!mem->dmabuf) { + mem->dmabuf = amdgpu_gem_prime_export(&mem->bo->tbo.base, + mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ? + DRM_RDWR : 0); + if (IS_ERR(mem->dmabuf)) { + mem->dmabuf = NULL; + return PTR_ERR(mem->dmabuf); + } + } + + gobj = amdgpu_gem_prime_import(&adev->ddev, mem->dmabuf); + if (IS_ERR(gobj)) + return PTR_ERR(gobj); + + /* Import takes an extra reference on the dmabuf. Drop it now to +* avoid leaking it. We only need the one reference in +* kgd_mem->dmabuf. +*/ + dma_buf_put(mem->dmabuf); + + *bo = gem_to_amdgpu_bo(gobj); + return 0; +} + /* kfd_mem_attach - Add a BO to a VM * * Everything that needs to bo done only once when a BO is first added @@ -658,8 +716,20 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem, ret = kfd_mem_attach_userptr(adev, mem, &bo[i]); if (ret) goto unwind; + } else if (mem->domain == AMDGPU_GEM_DOMAIN_GTT && + mem->bo->tbo.type != ttm_bo_type_sg) { + /* GTT BOs use DMA-mapping ability of dynamic-attach +
[PATCH 3/9] drm/amdgpu: Simplify AQL queue mapping
Do AQL queue double-mapping with a single attach call. That will make it easier to create per-GPU BOs later, to be shared between the two BO VA mappings on the same GPU. Freeing the attachments is not necessary if map_to_gpu fails. These will be cleaned up when the kdg_mem object is destroyed in amdgpu_amdkfd_gpuvm_free_memory_of_gpu. Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 103 -- 1 file changed, 48 insertions(+), 55 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 40a296ca37b9..114fbf508707 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -484,70 +484,76 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem) * 4a. Validate new page tables and directories */ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem, - struct amdgpu_vm *vm, bool is_aql, - struct kfd_mem_attachment **p_attachment) + struct amdgpu_vm *vm, bool is_aql) { unsigned long bo_size = mem->bo->tbo.base.size; uint64_t va = mem->va; - struct kfd_mem_attachment *attachment; - struct amdgpu_bo *bo; - int ret; + struct kfd_mem_attachment *attachment[2] = {NULL, NULL}; + struct amdgpu_bo *bo[2] = {NULL, NULL}; + int i, ret; if (!va) { pr_err("Invalid VA when adding BO to VM\n"); return -EINVAL; } - if (is_aql) - va += bo_size; - - attachment = kzalloc(sizeof(*attachment), GFP_KERNEL); - if (!attachment) - return -ENOMEM; + for (i = 0; i <= is_aql; i++) { + attachment[i] = kzalloc(sizeof(*attachment[i]), GFP_KERNEL); + if (unlikely(!attachment[i])) { + ret = -ENOMEM; + goto unwind; + } - pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va, - va + bo_size, vm); + pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va, +va + bo_size, vm); - /* FIXME: For now all attachments use the same BO. This is incorrect -* because one BO can only have one DMA mapping for one GPU. We need -* one BO per GPU, e.g. a DMABuf import with dynamic attachment. This -* will be addressed one BO-type at a time in subsequent patches. -*/ - bo = mem->bo; - drm_gem_object_get(&bo->tbo.base); + /* FIXME: For now all attachments use the same BO. This is +* incorrect because one BO can only have one DMA mapping +* for one GPU. We need one BO per GPU, e.g. a DMABuf +* import with dynamic attachment. This will be addressed +* one BO-type at a time in subsequent patches. +*/ + bo[i] = mem->bo; + drm_gem_object_get(&bo[i]->tbo.base); - /* Add BO to VM internal data structures*/ - attachment->bo_va = amdgpu_vm_bo_add(adev, vm, bo); - if (!attachment->bo_va) { - ret = -EINVAL; - pr_err("Failed to add BO object to VM. ret == %d\n", - ret); - goto err_vmadd; - } + /* Add BO to VM internal data structures */ + attachment[i]->bo_va = amdgpu_vm_bo_add(adev, vm, bo[i]); + if (unlikely(!attachment[i]->bo_va)) { + ret = -ENOMEM; + pr_err("Failed to add BO object to VM. ret == %d\n", + ret); + goto unwind; + } - attachment->va = va; - attachment->pte_flags = get_pte_flags(adev, mem); - attachment->adev = adev; - list_add(&attachment->list, &mem->attachments); + attachment[i]->va = va; + attachment[i]->pte_flags = get_pte_flags(adev, mem); + attachment[i]->adev = adev; + list_add(&attachment[i]->list, &mem->attachments); - if (p_attachment) - *p_attachment = attachment; + va += bo_size; + } /* Allocate validate page tables if needed */ ret = vm_validate_pt_pd_bos(vm); if (unlikely(ret)) { pr_err("validate_pt_pd_bos() failed\n"); - goto err_alloc_pts; + goto unwind; } return 0; -err_alloc_pts: - amdgpu_vm_bo_rmv(adev, attachment->bo_va); - list_del(&attachment->list); -err_vmadd: - drm_gem_object_put(&bo->tbo.base); - kfree(attachment); +unwind: + for (; i >= 0; i--) { + if (!attachment[i]) + continue; + if (attachment[i]->bo_va) { + amdgpu_vm_bo_rmv(adev,
[PATCH 0/9] Implement multi-GPU DMA mappings for KFD
This patch series fixes DMA-mappings of system memory (GTT and userptr) for KFD running on multi-GPU systems with IOMMU enabled. One SG-BO per GPU is needed to maintain the DMA mappings of each BO. I ran into some reservation issues when unmapping or freeing DMA-buf imports. There are a few FIXME comments in this patch series where I'm hoping for some expert advice. Patches 8 and 9 are some related fixes in TTM and amdgpu_ttm. I'm pretty sure patch 9 is not the right way to do this. Felix Kuehling (9): drm/amdgpu: Rename kfd_bo_va_list to kfd_mem_attachment drm/amdgpu: Keep a bo-reference per-attachment drm/amdgpu: Simplify AQL queue mapping drm/amdgpu: Add multi-GPU DMA mapping helpers drm/amdgpu: DMA map/unmap when updating GPU mappings drm/amdgpu: Move kfd_mem_attach outside reservation drm/amdgpu: Add DMA mapping of GTT BOs drm/ttm: Don't count pages in SG BOs against pages_limit drm/amdgpu: Lock the attached dmabuf in unpopulate drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 18 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 535 -- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 13 + drivers/gpu/drm/ttm/ttm_tt.c | 27 +- 4 files changed, 420 insertions(+), 173 deletions(-) -- 2.31.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 1/9] drm/amdgpu: Rename kfd_bo_va_list to kfd_mem_attachment
This name is more fitting, especially for the changes coming next to support multi-GPU systems with proper DMA mappings. Cleaned up the code and renamed some related functions and variables to improve readability. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 8 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 209 +- 2 files changed, 104 insertions(+), 113 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index 14f68c028126..910c50956e16 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -38,10 +38,10 @@ extern uint64_t amdgpu_amdkfd_total_mem_size; struct amdgpu_device; -struct kfd_bo_va_list { - struct list_head bo_list; +struct kfd_mem_attachment { + struct list_head list; struct amdgpu_bo_va *bo_va; - void *kgd_dev; + struct amdgpu_device *adev; bool is_mapped; uint64_t va; uint64_t pte_flags; @@ -50,7 +50,7 @@ struct kfd_bo_va_list { struct kgd_mem { struct mutex lock; struct amdgpu_bo *bo; - struct list_head bo_va_list; + struct list_head attachments; /* protected by amdkfd_process_info.lock */ struct ttm_validate_buffer validate_list; struct ttm_validate_buffer resv_list; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 6622695a5eed..d021152314ad 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -75,16 +75,16 @@ static inline struct amdgpu_device *get_amdgpu_device(struct kgd_dev *kgd) return (struct amdgpu_device *)kgd; } -static bool check_if_add_bo_to_vm(struct amdgpu_vm *avm, +static bool kfd_mem_is_attached(struct amdgpu_vm *avm, struct kgd_mem *mem) { - struct kfd_bo_va_list *entry; + struct kfd_mem_attachment *entry; - list_for_each_entry(entry, &mem->bo_va_list, bo_list) + list_for_each_entry(entry, &mem->attachments, list) if (entry->bo_va->base.vm == avm) - return false; + return true; - return true; + return false; } /* Set memory usage limits. Current, limits are @@ -471,7 +471,7 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem) return pte_flags; } -/* add_bo_to_vm - Add a BO to a VM +/* kfd_mem_attach - Add a BO to a VM * * Everything that needs to bo done only once when a BO is first added * to a VM. It can later be mapped and unmapped many times without @@ -483,15 +483,14 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem) * 4. Alloc page tables and directories if needed * 4a. Validate new page tables and directories */ -static int add_bo_to_vm(struct amdgpu_device *adev, struct kgd_mem *mem, +static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem, struct amdgpu_vm *vm, bool is_aql, - struct kfd_bo_va_list **p_bo_va_entry) + struct kfd_mem_attachment **p_attachment) { int ret; - struct kfd_bo_va_list *bo_va_entry; + struct kfd_mem_attachment *attachment; struct amdgpu_bo *bo = mem->bo; uint64_t va = mem->va; - struct list_head *list_bo_va = &mem->bo_va_list; unsigned long bo_size = bo->tbo.base.size; if (!va) { @@ -502,29 +501,29 @@ static int add_bo_to_vm(struct amdgpu_device *adev, struct kgd_mem *mem, if (is_aql) va += bo_size; - bo_va_entry = kzalloc(sizeof(*bo_va_entry), GFP_KERNEL); - if (!bo_va_entry) + attachment = kzalloc(sizeof(*attachment), GFP_KERNEL); + if (!attachment) return -ENOMEM; pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va, va + bo_size, vm); /* Add BO to VM internal data structures*/ - bo_va_entry->bo_va = amdgpu_vm_bo_add(adev, vm, bo); - if (!bo_va_entry->bo_va) { + attachment->bo_va = amdgpu_vm_bo_add(adev, vm, bo); + if (!attachment->bo_va) { ret = -EINVAL; pr_err("Failed to add BO object to VM. ret == %d\n", ret); goto err_vmadd; } - bo_va_entry->va = va; - bo_va_entry->pte_flags = get_pte_flags(adev, mem); - bo_va_entry->kgd_dev = (void *)adev; - list_add(&bo_va_entry->bo_list, list_bo_va); + attachment->va = va; + attachment->pte_flags = get_pte_flags(adev, mem); + attachment->adev = adev; + list_add(&attachment->list, &mem->attachments); - if (p_bo_va_entry) - *p_bo_va_entry = bo_va_entry; + if (p_attachment) + *p_attachment = attachment; /* Allocate validate pag
[PATCH 4/9] drm/amdgpu: Add multi-GPU DMA mapping helpers
Add BO-type specific helpers functions to DMA-map and unmap kfd_mem_attachments. Implement this functionality for userptrs by creating one SG BO per GPU and filling it with a DMA mapping of the pages from the original mem->bo. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 8 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 155 -- 2 files changed, 152 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index 910c50956e16..fc3514ed1b74 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -38,11 +38,17 @@ extern uint64_t amdgpu_amdkfd_total_mem_size; struct amdgpu_device; +enum kfd_mem_attachment_type { + KFD_MEM_ATT_SHARED, /* Share kgd_mem->bo or another attachment's */ + KFD_MEM_ATT_USERPTR,/* SG bo to DMA map pages from a userptr bo */ +}; + struct kfd_mem_attachment { struct list_head list; + enum kfd_mem_attachment_type type; + bool is_mapped; struct amdgpu_bo_va *bo_va; struct amdgpu_device *adev; - bool is_mapped; uint64_t va; uint64_t pte_flags; }; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 114fbf508707..51502a07fc1d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -471,12 +471,117 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem) return pte_flags; } +static int +kfd_mem_dmamap_userptr(struct kgd_mem *mem, + struct kfd_mem_attachment *attachment) +{ + enum dma_data_direction direction = + mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ? + DMA_BIDIRECTIONAL : DMA_TO_DEVICE; + struct ttm_operation_ctx ctx = {.interruptible = true}; + struct amdgpu_bo *bo = attachment->bo_va->base.bo; + struct amdgpu_device *adev = attachment->adev; + struct ttm_tt *src_ttm = mem->bo->tbo.ttm; + struct ttm_tt *ttm = bo->tbo.ttm; + int ret; + + ttm->sg = kmalloc(sizeof(*ttm->sg), GFP_KERNEL); + if (unlikely(!ttm->sg)) + return -ENOMEM; + + if (WARN_ON(ttm->num_pages != src_ttm->num_pages)) + return -EINVAL; + + /* Same sequence as in amdgpu_ttm_tt_pin_userptr */ + ret = sg_alloc_table_from_pages(ttm->sg, src_ttm->pages, + ttm->num_pages, 0, + (u64)ttm->num_pages << PAGE_SHIFT, + GFP_KERNEL); + if (unlikely(ret)) + goto release_sg; + + ret = dma_map_sgtable(adev->dev, ttm->sg, direction, 0); + if (unlikely(ret)) + goto release_sg; + + drm_prime_sg_to_dma_addr_array(ttm->sg, ttm->dma_address, + ttm->num_pages); + + amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT); + ret = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx); + if (ret) + goto release_sg; + + return 0; + +release_sg: + pr_err("DMA map userptr failed: %d\n", ret); + sg_free_table(ttm->sg); + kfree(ttm->sg); + ttm->sg = NULL; + return ret; +} + +static int +kfd_mem_dmamap_attachment(struct kgd_mem *mem, + struct kfd_mem_attachment *attachment) +{ + switch (attachment->type) { + case KFD_MEM_ATT_SHARED: + return 0; + case KFD_MEM_ATT_USERPTR: + return kfd_mem_dmamap_userptr(mem, attachment); + default: + WARN_ON_ONCE(1); + } + return -EINVAL; +} + +static void +kfd_mem_dmaunmap_userptr(struct kgd_mem *mem, +struct kfd_mem_attachment *attachment) +{ + enum dma_data_direction direction = + mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ? + DMA_BIDIRECTIONAL : DMA_TO_DEVICE; + struct ttm_operation_ctx ctx = {.interruptible = false}; + struct amdgpu_bo *bo = attachment->bo_va->base.bo; + struct amdgpu_device *adev = attachment->adev; + struct ttm_tt *ttm = bo->tbo.ttm; + + if (unlikely(!ttm->sg)) + return; + + amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU); + ttm_bo_validate(&bo->tbo, &bo->placement, &ctx); + + dma_unmap_sgtable(adev->dev, ttm->sg, direction, 0); + sg_free_table(ttm->sg); + ttm->sg = NULL; +} + +static void +kfd_mem_dmaunmap_attachment(struct kgd_mem *mem, + struct kfd_mem_attachment *attachment) +{ + switch (attachment->type) { + case KFD_MEM_ATT_SHARED: + break; + case KFD_MEM_ATT_USERPTR: + kfd_mem_dmaunmap_userptr(mem, attachment); +
[PATCH 2/9] drm/amdgpu: Keep a bo-reference per-attachment
For now they all reference the same BO. For correct DMA mappings they will refer to different BOs per-GPU. Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 22 ++- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index d021152314ad..40a296ca37b9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -487,11 +487,11 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem, struct amdgpu_vm *vm, bool is_aql, struct kfd_mem_attachment **p_attachment) { - int ret; - struct kfd_mem_attachment *attachment; - struct amdgpu_bo *bo = mem->bo; + unsigned long bo_size = mem->bo->tbo.base.size; uint64_t va = mem->va; - unsigned long bo_size = bo->tbo.base.size; + struct kfd_mem_attachment *attachment; + struct amdgpu_bo *bo; + int ret; if (!va) { pr_err("Invalid VA when adding BO to VM\n"); @@ -508,6 +508,14 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem, pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va, va + bo_size, vm); + /* FIXME: For now all attachments use the same BO. This is incorrect +* because one BO can only have one DMA mapping for one GPU. We need +* one BO per GPU, e.g. a DMABuf import with dynamic attachment. This +* will be addressed one BO-type at a time in subsequent patches. +*/ + bo = mem->bo; + drm_gem_object_get(&bo->tbo.base); + /* Add BO to VM internal data structures*/ attachment->bo_va = amdgpu_vm_bo_add(adev, vm, bo); if (!attachment->bo_va) { @@ -527,7 +535,7 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem, /* Allocate validate page tables if needed */ ret = vm_validate_pt_pd_bos(vm); - if (ret) { + if (unlikely(ret)) { pr_err("validate_pt_pd_bos() failed\n"); goto err_alloc_pts; } @@ -538,15 +546,19 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem, amdgpu_vm_bo_rmv(adev, attachment->bo_va); list_del(&attachment->list); err_vmadd: + drm_gem_object_put(&bo->tbo.base); kfree(attachment); return ret; } static void kfd_mem_detach(struct kfd_mem_attachment *attachment) { + struct amdgpu_bo *bo = attachment->bo_va->base.bo; + pr_debug("\t remove VA 0x%llx in entry %p\n", attachment->va, attachment); amdgpu_vm_bo_rmv(attachment->adev, attachment->bo_va); + drm_gem_object_put(&bo->tbo.base); list_del(&attachment->list); kfree(attachment); } -- 2.31.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: Use iterator methods exposed by amdgpu_res_cursor.h in building SG_TABLE's for a VRAM BO
Am 13.04.21 um 20:26 schrieb Ramesh Errabolu: Extend current implementation of SG_TABLE construction method to allow exportation of sub-buffers of a VRAM BO. This capability will enable logical partitioning of a VRAM BO into multiple non-overlapping sub-buffers. One example of this use case is to partition a VRAM BO into two sub-buffers, one for SRC and another for DST. Signed-off-by: Ramesh Errabolu Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 4 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 34 ++-- 3 files changed, 27 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c index e0c4f7c7f1b9..baa980a477d9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c @@ -291,8 +291,8 @@ static struct sg_table *amdgpu_dma_buf_map(struct dma_buf_attachment *attach, break; case TTM_PL_VRAM: - r = amdgpu_vram_mgr_alloc_sgt(adev, &bo->tbo.mem, attach->dev, - dir, &sgt); + r = amdgpu_vram_mgr_alloc_sgt(adev, &bo->tbo.mem, 0, + bo->tbo.base.size, attach->dev, dir, &sgt); if (r) return ERR_PTR(r); break; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h index dec0db8b0b13..9e38475e0f8d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h @@ -112,6 +112,7 @@ int amdgpu_gtt_mgr_recover(struct ttm_resource_manager *man); u64 amdgpu_vram_mgr_bo_visible_size(struct amdgpu_bo *bo); int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, struct ttm_resource *mem, + u64 offset, u64 size, struct device *dev, enum dma_data_direction dir, struct sg_table **sgt); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 592a2dd16493..bce105e2973e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -25,6 +25,7 @@ #include #include "amdgpu.h" #include "amdgpu_vm.h" +#include "amdgpu_res_cursor.h" #include "amdgpu_atomfirmware.h" #include "atom.h" @@ -565,6 +566,8 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man, * * @adev: amdgpu device pointer * @mem: TTM memory object + * @offset: byte offset from the base of VRAM BO + * @length: number of bytes to export in sg_table * @dev: the other device * @dir: dma direction * @sgt: resulting sg table @@ -573,39 +576,47 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man, */ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, struct ttm_resource *mem, + u64 offset, u64 length, struct device *dev, enum dma_data_direction dir, struct sg_table **sgt) { - struct drm_mm_node *node; + struct amdgpu_res_cursor cursor; struct scatterlist *sg; int num_entries = 0; - unsigned int pages; int i, r; *sgt = kmalloc(sizeof(**sgt), GFP_KERNEL); if (!*sgt) return -ENOMEM; - for (pages = mem->num_pages, node = mem->mm_node; -pages; pages -= node->size, ++node) - ++num_entries; + /* Determine the number of DRM_MM nodes to export */ + amdgpu_res_first(mem, offset, length, &cursor); + while (cursor.remaining) { + num_entries++; + amdgpu_res_next(&cursor, cursor.size); + } r = sg_alloc_table(*sgt, num_entries, GFP_KERNEL); if (r) goto error_free; + /* Initialize scatterlist nodes of sg_table */ for_each_sgtable_sg((*sgt), sg, i) sg->length = 0; - node = mem->mm_node; + /* +* Walk down DRM_MM nodes to populate scatterlist nodes +* @note: Use iterator api to get first the DRM_MM node +* and the number of bytes from it. Access the following +* DRM_MM node(s) if more buffer needs to exported +*/ + amdgpu_res_first(mem, offset, length, &cursor); for_each_sgtable_sg((*sgt), sg, i) { - phys_addr_t phys = (node->start << PAGE_SHIFT) + - adev->gmc.aper_base; - size_t size = node->size << PAGE_SHIFT; + phys_addr_t phys = cursor.start + adev->gmc.aper_base; + size_t size = cursor.size; dma_addr_t addr; - ++node; addr = dma_map_res
Re: [PATCH 3/4] drm/amdkfd: Allow access for mmapping KFD BOs
Am 08.04.21 um 01:12 schrieb Felix Kuehling: DRM allows access automatically when it creates a GEM handle for a BO. KFD BOs don't have GEM handles, so KFD needs to manage access manually. Ok, double checking the code that makes sense. Signed-off-by: Felix Kuehling Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 3 ++- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 19 ++- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 8 +--- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 7 --- 4 files changed, 29 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index 0d59bebd92af..7c8c5e469707 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -245,7 +245,8 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( void *drm_priv, struct kgd_mem **mem, uint64_t *offset, uint32_t flags); int amdgpu_amdkfd_gpuvm_free_memory_of_gpu( - struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *size); + struct kgd_dev *kgd, struct kgd_mem *mem, void *drm_priv, + uint64_t *size); int amdgpu_amdkfd_gpuvm_map_memory_to_gpu( struct kgd_dev *kgd, struct kgd_mem *mem, void *drm_priv); int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu( diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 95442bcd60fb..e7d61ec966b6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1232,6 +1232,12 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( domain_string(alloc_domain), ret); goto err_bo_create; } + ret = drm_vma_node_allow(&gobj->vma_node, drm_priv); + if (ret) { + pr_debug("Failed to allow vma node access. ret %d\n", +ret); + goto err_node_allow; + } bo = gem_to_amdgpu_bo(gobj); if (bo_type == ttm_bo_type_sg) { bo->tbo.sg = sg; @@ -1261,6 +1267,8 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( allocate_init_user_pages_failed: remove_kgd_mem_from_kfd_bo_list(*mem, avm->process_info); + drm_vma_node_revoke(&gobj->vma_node, drm_priv); +err_node_allow: amdgpu_bo_unref(&bo); /* Don't unreserve system mem limit twice */ goto err_reserve_limit; @@ -1278,7 +1286,8 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( } int amdgpu_amdkfd_gpuvm_free_memory_of_gpu( - struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *size) + struct kgd_dev *kgd, struct kgd_mem *mem, void *drm_priv, + uint64_t *size) { struct amdkfd_process_info *process_info = mem->process_info; unsigned long bo_size = mem->bo->tbo.base.size; @@ -1355,6 +1364,7 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu( } /* Free the BO*/ + drm_vma_node_revoke(&mem->bo->tbo.base.vma_node, drm_priv); drm_gem_object_put(&mem->bo->tbo.base); mutex_destroy(&mem->lock); kfree(mem); @@ -1666,6 +1676,7 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev *kgd, struct amdgpu_vm *avm = drm_priv_to_vm(drm_priv); struct drm_gem_object *obj; struct amdgpu_bo *bo; + int ret; if (dma_buf->ops != &amdgpu_dmabuf_ops) /* Can't handle non-graphics buffers */ @@ -1686,6 +1697,12 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev *kgd, if (!*mem) return -ENOMEM; + ret = drm_vma_node_allow(&obj->vma_node, drm_priv); + if (ret) { + kfree(mem); + return ret; + } + if (size) *size = amdgpu_bo_size(bo); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 43de260b2230..8fc18de7cff4 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -1328,7 +1328,8 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep, return 0; err_free: - amdgpu_amdkfd_gpuvm_free_memory_of_gpu(dev->kgd, (struct kgd_mem *)mem, NULL); + amdgpu_amdkfd_gpuvm_free_memory_of_gpu(dev->kgd, (struct kgd_mem *)mem, + pdd->vm, NULL); err_unlock: mutex_unlock(&p->mutex); return err; @@ -1365,7 +1366,7 @@ static int kfd_ioctl_free_memory_of_gpu(struct file *filep, } ret = amdgpu_amdkfd_gpuvm_free_memory_of_gpu(dev->kgd, - (struct kgd_mem *)mem, &size); + (struct kgd_mem *)mem, pdd->vm, &size); /* If freeing the buffer failed, leave the handle in place for * clean-up during process tear
Re: [PATCH] drm/amd/pm: remove the "set" function of pp_dpm_mclk for vangogh
On Wed, Apr 14, 2021 at 02:20:10PM +0800, Du, Xiaojian wrote: > This patch is to remove the "set" function of pp_dpm_mclk for vangogh. > For vangogh, mclk bonds with fclk, they will lock each other > on the same perfomance level. But according to the smu message from pmfw, > only fclk is allowed to set value manually, so remove the unnecessary > code of "set" function for mclk. > > Signed-off-by: Xiaojian Du Acked-by: Huang Rui > --- > drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 6 -- > 1 file changed, 6 deletions(-) > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c > b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c > index 61ff9a663b21..35904315c1f9 100644 > --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c > @@ -1092,7 +1092,6 @@ static int vangogh_set_soft_freq_limited_range(struct > smu_context *smu, > return ret; > break; > case SMU_FCLK: > - case SMU_MCLK: > ret = smu_cmn_send_smc_msg_with_param(smu, > > SMU_MSG_SetHardMinFclkByFreq, > min, NULL); > @@ -1180,7 +1179,6 @@ static int vangogh_force_clk_levels(struct smu_context > *smu, > if (ret) > return ret; > break; > - case SMU_MCLK: > case SMU_FCLK: > ret = vangogh_get_dpm_clk_limited(smu, > clk_type, > soft_min_level, &min_freq); > @@ -1267,7 +1265,6 @@ static int vangogh_force_dpm_limit_value(struct > smu_context *smu, bool highest) > SMU_SOCCLK, > SMU_VCLK, > SMU_DCLK, > - SMU_MCLK, > SMU_FCLK, > }; > > @@ -1296,7 +1293,6 @@ static int vangogh_unforce_dpm_levels(struct > smu_context *smu) > enum smu_clk_type clk_type; > uint32_tfeature; > } clk_feature_map[] = { > - {SMU_MCLK, SMU_FEATURE_DPM_FCLK_BIT}, > {SMU_FCLK, SMU_FEATURE_DPM_FCLK_BIT}, > {SMU_SOCCLK, SMU_FEATURE_DPM_SOCCLK_BIT}, > {SMU_VCLK, SMU_FEATURE_VCN_DPM_BIT}, > @@ -1428,7 +1424,6 @@ static int vangogh_set_performance_level(struct > smu_context *smu, > if (ret) > return ret; > > - vangogh_force_clk_levels(smu, SMU_MCLK, 1 << mclk_mask); > vangogh_force_clk_levels(smu, SMU_FCLK, 1 << fclk_mask); > vangogh_force_clk_levels(smu, SMU_SOCCLK, 1 << soc_mask); > vangogh_force_clk_levels(smu, SMU_VCLK, 1 << vclk_mask); > @@ -1468,7 +1463,6 @@ static int vangogh_set_performance_level(struct > smu_context *smu, > if (ret) > return ret; > > - vangogh_force_clk_levels(smu, SMU_MCLK, 1 << mclk_mask); > vangogh_force_clk_levels(smu, SMU_FCLK, 1 << fclk_mask); > break; > case AMD_DPM_FORCED_LEVEL_PROFILE_PEAK: > -- > 2.25.1 > ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amd/pm: remove the "set" function of pp_dpm_mclk for vangogh
This patch is to remove the "set" function of pp_dpm_mclk for vangogh. For vangogh, mclk bonds with fclk, they will lock each other on the same perfomance level. But according to the smu message from pmfw, only fclk is allowed to set value manually, so remove the unnecessary code of "set" function for mclk. Signed-off-by: Xiaojian Du --- drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c index 61ff9a663b21..35904315c1f9 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c @@ -1092,7 +1092,6 @@ static int vangogh_set_soft_freq_limited_range(struct smu_context *smu, return ret; break; case SMU_FCLK: - case SMU_MCLK: ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetHardMinFclkByFreq, min, NULL); @@ -1180,7 +1179,6 @@ static int vangogh_force_clk_levels(struct smu_context *smu, if (ret) return ret; break; - case SMU_MCLK: case SMU_FCLK: ret = vangogh_get_dpm_clk_limited(smu, clk_type, soft_min_level, &min_freq); @@ -1267,7 +1265,6 @@ static int vangogh_force_dpm_limit_value(struct smu_context *smu, bool highest) SMU_SOCCLK, SMU_VCLK, SMU_DCLK, - SMU_MCLK, SMU_FCLK, }; @@ -1296,7 +1293,6 @@ static int vangogh_unforce_dpm_levels(struct smu_context *smu) enum smu_clk_type clk_type; uint32_tfeature; } clk_feature_map[] = { - {SMU_MCLK, SMU_FEATURE_DPM_FCLK_BIT}, {SMU_FCLK, SMU_FEATURE_DPM_FCLK_BIT}, {SMU_SOCCLK, SMU_FEATURE_DPM_SOCCLK_BIT}, {SMU_VCLK, SMU_FEATURE_VCN_DPM_BIT}, @@ -1428,7 +1424,6 @@ static int vangogh_set_performance_level(struct smu_context *smu, if (ret) return ret; - vangogh_force_clk_levels(smu, SMU_MCLK, 1 << mclk_mask); vangogh_force_clk_levels(smu, SMU_FCLK, 1 << fclk_mask); vangogh_force_clk_levels(smu, SMU_SOCCLK, 1 << soc_mask); vangogh_force_clk_levels(smu, SMU_VCLK, 1 << vclk_mask); @@ -1468,7 +1463,6 @@ static int vangogh_set_performance_level(struct smu_context *smu, if (ret) return ret; - vangogh_force_clk_levels(smu, SMU_MCLK, 1 << mclk_mask); vangogh_force_clk_levels(smu, SMU_FCLK, 1 << fclk_mask); break; case AMD_DPM_FORCED_LEVEL_PROFILE_PEAK: -- 2.25.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Recall: [PATCH] drm/amd/pm: remove the "set" function of pp_dpm_mclk for vangogh
Du, Xiaojian would like to recall the message, "[PATCH] drm/amd/pm: remove the "set" function of pp_dpm_mclk for vangogh". ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amd/pm: fix error code in smu_set_power_limit()
We should return -EINVAL instead of success if the "limit" is too high. Fixes: e098bc9612c2 ("drm/amd/pm: optimize the power related source code layout") Signed-off-by: Dan Carpenter --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c index e0eb7ca112e2..c29d8b3131b7 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c @@ -2221,6 +2221,7 @@ static int smu_set_power_limit(void *handle, uint32_t limit) dev_err(smu->adev->dev, "New power limit (%d) is over the max allowed %d\n", limit, smu->max_power_limit); + ret = -EINVAL; goto out; } -- 2.30.2 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH v2] drm/amdgpu: fix an error code in init_pmu_entry_by_type_and_add()
If the kmemdup() fails then this should return a negative error code but it currently returns success Fixes: b4a7db71ea06 ("drm/amdgpu: add per device user friendly xgmi events for vega20") Signed-off-by: Dan Carpenter --- v2: I sent this patch in Feb but I accidentally added an unrelated hunk from nouveau to the commit. Now both hunks are have been sent to the correct lists. drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c index 19c0a3655228..82e9ecf84352 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c @@ -519,8 +519,10 @@ static int init_pmu_entry_by_type_and_add(struct amdgpu_pmu_entry *pmu_entry, pmu_entry->pmu.attr_groups = kmemdup(attr_groups, sizeof(attr_groups), GFP_KERNEL); - if (!pmu_entry->pmu.attr_groups) + if (!pmu_entry->pmu.attr_groups) { + ret = -ENOMEM; goto err_attr_group; + } snprintf(pmu_name, PMU_NAME_SIZE, "%s_%d", pmu_entry->pmu_file_prefix, adev_to_drm(pmu_entry->adev)->primary->index); -- 2.30.2 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
FW: [PATCH] drm/amd/pm: remove the "set" function of pp_dpm_mclk for vangogh
[AMD Official Use Only - Internal Distribution Only] Forward to community for review. -Original Message- From: Du, Xiaojian Sent: 2021年4月13日 16:04 To: brahma_sw_dev Cc: Huang, Ray ; Quan, Evan ; Wang, Kevin(Yang) ; Lazar, Lijo ; Du, Xiaojian Subject: [PATCH] drm/amd/pm: remove the "set" function of pp_dpm_mclk for vangogh This patch is to remvoe the "set" function of pp_dpm_mclk for vangogh. For vangogh, mclk bonds with fclk, they will lock each other on the same perfomance level. But according to the smu message from pmfw, only fclk is allowed to set value manually, so remove the unnecessary code of "set" function for mclk. Signed-off-by: Xiaojian Du --- drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c index 61ff9a663b21..35904315c1f9 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c @@ -1092,7 +1092,6 @@ static int vangogh_set_soft_freq_limited_range(struct smu_context *smu, return ret; break; case SMU_FCLK: -case SMU_MCLK: ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetHardMinFclkByFreq, min, NULL); @@ -1180,7 +1179,6 @@ static int vangogh_force_clk_levels(struct smu_context *smu, if (ret) return ret; break; -case SMU_MCLK: case SMU_FCLK: ret = vangogh_get_dpm_clk_limited(smu, clk_type, soft_min_level, &min_freq); @@ -1267,7 +1265,6 @@ static int vangogh_force_dpm_limit_value(struct smu_context *smu, bool highest) SMU_SOCCLK, SMU_VCLK, SMU_DCLK, -SMU_MCLK, SMU_FCLK, }; @@ -1296,7 +1293,6 @@ static int vangogh_unforce_dpm_levels(struct smu_context *smu) enum smu_clk_type clk_type; uint32_tfeature; } clk_feature_map[] = { -{SMU_MCLK, SMU_FEATURE_DPM_FCLK_BIT}, {SMU_FCLK, SMU_FEATURE_DPM_FCLK_BIT}, {SMU_SOCCLK, SMU_FEATURE_DPM_SOCCLK_BIT}, {SMU_VCLK, SMU_FEATURE_VCN_DPM_BIT}, @@ -1428,7 +1424,6 @@ static int vangogh_set_performance_level(struct smu_context *smu, if (ret) return ret; -vangogh_force_clk_levels(smu, SMU_MCLK, 1 << mclk_mask); vangogh_force_clk_levels(smu, SMU_FCLK, 1 << fclk_mask); vangogh_force_clk_levels(smu, SMU_SOCCLK, 1 << soc_mask); vangogh_force_clk_levels(smu, SMU_VCLK, 1 << vclk_mask); @@ -1468,7 +1463,6 @@ static int vangogh_set_performance_level(struct smu_context *smu, if (ret) return ret; -vangogh_force_clk_levels(smu, SMU_MCLK, 1 << mclk_mask); vangogh_force_clk_levels(smu, SMU_FCLK, 1 << fclk_mask); break; case AMD_DPM_FORCED_LEVEL_PROFILE_PEAK: -- 2.25.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 1/2] drm/amd/amdgpu: enable ASPM on navi1x and vega
On Tue, Apr 13, 2021 at 11:04 PM Kenneth Feng wrote: > > enable ASPM on navi1x and vega series Please split this patch into two, one for vega and one for navi1x. With that fixed, the series is: Reviewed-by: Alex Deucher > > Signed-off-by: Kenneth Feng > --- > drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c | 128 + > drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 125 > drivers/gpu/drm/amd/amdgpu/nv.c| 10 +- > drivers/gpu/drm/amd/amdgpu/soc15.c | 7 +- > 4 files changed, 259 insertions(+), 11 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c > b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c > index 83ea063388fd..0d2d629e2d6a 100644 > --- a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c > +++ b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c > @@ -31,6 +31,28 @@ > #include "vega10_enum.h" > #include > > +#define smnPCIE_LC_CNTL0x11140280 > +#define smnPCIE_LC_CNTL3 0x111402d4 > +#define smnPCIE_LC_CNTL6 0x111402ec > +#define smnPCIE_LC_CNTL7 0x111402f0 > +#define smnNBIF_MGCG_CTRL_LCLK 0x1013a05c > +#define NBIF_MGCG_CTRL_LCLK__NBIF_MGCG_REG_DIS_LCLK_MASK 0x1000L > +#define RCC_BIF_STRAP3__STRAP_VLINK_ASPM_IDLE_TIMER_MASK 0xL > +#define RCC_BIF_STRAP3__STRAP_VLINK_PM_L1_ENTRY_TIMER_MASK 0xL > +#define smnRCC_EP_DEV0_0_EP_PCIE_TX_LTR_CNTL 0x10123530 > +#define smnBIF_CFG_DEV0_EPF0_DEVICE_CNTL2 0x1014008c > +#define smnBIF_CFG_DEV0_EPF0_PCIE_LTR_CAP 0x10140324 > +#define smnPSWUSP0_PCIE_LC_CNTL2 0x111402c4 > +#define smnRCC_BIF_STRAP2 0x10123488 > +#define smnRCC_BIF_STRAP3 0x1012348c > +#define smnRCC_BIF_STRAP5 0x10123494 > +#define BIF_CFG_DEV0_EPF0_DEVICE_CNTL2__LTR_EN_MASK > 0x0400L > +#define RCC_BIF_STRAP5__STRAP_VLINK_LDN_ENTRY_TIMER_MASK 0xL > +#define RCC_BIF_STRAP2__STRAP_LTR_IN_ASPML1_DIS_MASK 0x4000L > +#define RCC_BIF_STRAP3__STRAP_VLINK_ASPM_IDLE_TIMER__SHIFT 0x0 > +#define RCC_BIF_STRAP3__STRAP_VLINK_PM_L1_ENTRY_TIMER__SHIFT 0x10 > +#define RCC_BIF_STRAP5__STRAP_VLINK_LDN_ENTRY_TIMER__SHIFT 0x0 > + > static void nbio_v6_1_remap_hdp_registers(struct amdgpu_device *adev) > { > WREG32_SOC15(NBIO, 0, mmREMAP_HDP_MEM_FLUSH_CNTL, > @@ -256,6 +278,111 @@ static void nbio_v6_1_init_registers(struct > amdgpu_device *adev) > WREG32_PCIE(smnPCIE_CI_CNTL, data); > } > > +static void nbio_v6_1_program_ltr(struct amdgpu_device *adev) > +{ > + uint32_t def, data; > + > + WREG32_PCIE(smnRCC_EP_DEV0_0_EP_PCIE_TX_LTR_CNTL, 0x75EB); > + > + def = data = RREG32_PCIE(smnRCC_BIF_STRAP2); > + data &= ~RCC_BIF_STRAP2__STRAP_LTR_IN_ASPML1_DIS_MASK; > + if (def != data) > + WREG32_PCIE(smnRCC_BIF_STRAP2, data); > + > + def = data = RREG32_PCIE(smnRCC_EP_DEV0_0_EP_PCIE_TX_LTR_CNTL); > + data &= ~EP_PCIE_TX_LTR_CNTL__LTR_PRIV_MSG_DIS_IN_PM_NON_D0_MASK; > + if (def != data) > + WREG32_PCIE(smnRCC_EP_DEV0_0_EP_PCIE_TX_LTR_CNTL, data); > + > + def = data = RREG32_PCIE(smnBIF_CFG_DEV0_EPF0_DEVICE_CNTL2); > + data |= BIF_CFG_DEV0_EPF0_DEVICE_CNTL2__LTR_EN_MASK; > + if (def != data) > + WREG32_PCIE(smnBIF_CFG_DEV0_EPF0_DEVICE_CNTL2, data); > +} > + > +static void nbio_v6_1_program_aspm(struct amdgpu_device *adev) > +{ > + uint32_t def, data; > + > + def = data = RREG32_PCIE(smnPCIE_LC_CNTL); > + data &= ~PCIE_LC_CNTL__LC_L1_INACTIVITY_MASK; > + data &= ~PCIE_LC_CNTL__LC_L0S_INACTIVITY_MASK; > + data |= PCIE_LC_CNTL__LC_PMI_TO_L1_DIS_MASK; > + if (def != data) > + WREG32_PCIE(smnPCIE_LC_CNTL, data); > + > + def = data = RREG32_PCIE(smnPCIE_LC_CNTL7); > + data |= PCIE_LC_CNTL7__LC_NBIF_ASPM_INPUT_EN_MASK; > + if (def != data) > + WREG32_PCIE(smnPCIE_LC_CNTL7, data); > + > + def = data = RREG32_PCIE(smnNBIF_MGCG_CTRL_LCLK); > + data |= NBIF_MGCG_CTRL_LCLK__NBIF_MGCG_REG_DIS_LCLK_MASK; > + if (def != data) > + WREG32_PCIE(smnNBIF_MGCG_CTRL_LCLK, data); > + > + def = data = RREG32_PCIE(smnPCIE_LC_CNTL3); > + data |= PCIE_LC_CNTL3__LC_DSC_DONT_ENTER_L23_AFTER_PME_ACK_MASK; > + if (def != data) > + WREG32_PCIE(smnPCIE_LC_CNTL3, data); > + > + def = data = RREG32_PCIE(smnRCC_BIF_STRAP3); > + data &= ~RCC_BIF_STRAP3__STRAP_VLINK_ASPM_IDLE_TIMER_MASK; > + data &= ~RCC_BIF_STRAP3__STRAP_VLINK_PM_L1_ENTRY_TIMER_MASK; > + if (def != data) > + WREG32_PCIE(smnRCC_BIF_STRAP3, data); > + > + def = data = RREG32_PCIE(smnRCC_BIF_STRAP5); > + data &= ~RCC_BIF_STRAP5__STRAP_VLINK_LDN_ENTRY_TIMER_MASK; > + if (def != data) > + WREG32_PCIE(smnRCC_BIF_STRAP5, data); > + > + def = data = RREG32_PCIE(smnBIF_CFG_DEV0_EPF0_DEVICE_CNTL2); > + data &= ~BIF_CFG_DEV0_EPF0_DEVICE_CNTL2__LTR
[PATCH 2/2] drm/amd/amdgpu: add ASPM support on polaris
add ASPM support on polaris Signed-off-by: Kenneth Feng --- drivers/gpu/drm/amd/amdgpu/vi.c | 193 +++- 1 file changed, 191 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c index ea338de5818a..735ebbd1148f 100644 --- a/drivers/gpu/drm/amd/amdgpu/vi.c +++ b/drivers/gpu/drm/amd/amdgpu/vi.c @@ -81,6 +81,30 @@ #include "mxgpu_vi.h" #include "amdgpu_dm.h" +#define ixPCIE_LC_L1_PM_SUBSTATE 0x100100C6 +#define PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK 0x0001L +#define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK 0x0002L +#define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_1_OVERRIDE_MASK 0x0004L +#define PCIE_LC_L1_PM_SUBSTATE__LC_ASPM_L1_2_OVERRIDE_MASK 0x0008L +#define PCIE_LC_L1_PM_SUBSTATE__LC_ASPM_L1_1_OVERRIDE_MASK 0x0010L +#define ixPCIE_L1_PM_SUB_CNTL 0x378 +#define PCIE_L1_PM_SUB_CNTL__ASPM_L1_2_EN_MASK 0x0004L +#define PCIE_L1_PM_SUB_CNTL__ASPM_L1_1_EN_MASK 0x0008L +#define PCIE_L1_PM_SUB_CNTL__PCI_PM_L1_2_EN_MASK 0x0001L +#define PCIE_L1_PM_SUB_CNTL__PCI_PM_L1_1_EN_MASK 0x0002L +#define PCIE_LC_CNTL6__LC_L1_POWERDOWN_MASK0x0020L +#define LINK_CAP 0x64 +#define PCIE_LINK_CAP__CLOCK_POWER_MANAGEMENT_MASK 0x0004L +#define ixCPM_CONTROL 0x1400118 +#define ixPCIE_LC_CNTL70x100100BC +#define PCIE_LC_CNTL7__LC_L1_SIDEBAND_CLKREQ_PDWN_EN_MASK 0x0400L +#define PCIE_LC_CNTL__LC_L0S_INACTIVITY_DEFAULT0x0007 +#define PCIE_LC_CNTL__LC_L1_INACTIVITY_DEFAULT 0x0009 +#define CPM_CONTROL__CLKREQb_UNGATE_TXCLK_ENABLE_MASK 0x0100L +#define PCIE_L1_PM_SUB_CNTL0x378 +#define ASIC_IS_P22(asic_type, rid)((asic_type >= CHIP_POLARIS10) && \ + (asic_type <= CHIP_POLARIS12) && \ + (rid >= 0x6E)) /* Topaz */ static const struct amdgpu_video_codecs topaz_video_codecs_encode = { @@ -1091,13 +1115,178 @@ static void vi_pcie_gen3_enable(struct amdgpu_device *adev) /* todo */ } +static void vi_enable_aspm(struct amdgpu_device *adev) +{ + u32 data, orig; + + orig = data = RREG32_PCIE(ixPCIE_LC_CNTL); + data |= PCIE_LC_CNTL__LC_L0S_INACTIVITY_DEFAULT << + PCIE_LC_CNTL__LC_L0S_INACTIVITY__SHIFT; + data |= PCIE_LC_CNTL__LC_L1_INACTIVITY_DEFAULT << + PCIE_LC_CNTL__LC_L1_INACTIVITY__SHIFT; + data &= ~PCIE_LC_CNTL__LC_PMI_TO_L1_DIS_MASK; + data |= PCIE_LC_CNTL__LC_DELAY_L1_EXIT_MASK; + if (orig != data) + WREG32_PCIE(ixPCIE_LC_CNTL, data); +} + static void vi_program_aspm(struct amdgpu_device *adev) { + u32 data, data1, orig; + bool bL1SS = false; + bool bClkReqSupport = true; - if (amdgpu_aspm == 0) + if (amdgpu_aspm != 1) return; - /* todo */ + if (adev->flags & AMD_IS_APU || + adev->asic_type < CHIP_POLARIS10) + return; + + orig = data = RREG32_PCIE(ixPCIE_LC_CNTL); + data &= ~PCIE_LC_CNTL__LC_L1_INACTIVITY_MASK; + data &= ~PCIE_LC_CNTL__LC_L0S_INACTIVITY_MASK; + data |= PCIE_LC_CNTL__LC_PMI_TO_L1_DIS_MASK; + if (orig != data) + WREG32_PCIE(ixPCIE_LC_CNTL, data); + + orig = data = RREG32_PCIE(ixPCIE_LC_N_FTS_CNTL); + data &= ~PCIE_LC_N_FTS_CNTL__LC_XMIT_N_FTS_MASK; + data |= 0x0024 << PCIE_LC_N_FTS_CNTL__LC_XMIT_N_FTS__SHIFT; + data |= PCIE_LC_N_FTS_CNTL__LC_XMIT_N_FTS_OVERRIDE_EN_MASK; + if (orig != data) + WREG32_PCIE(ixPCIE_LC_N_FTS_CNTL, data); + + orig = data = RREG32_PCIE(ixPCIE_LC_CNTL3); + data |= PCIE_LC_CNTL3__LC_GO_TO_RECOVERY_MASK; + if (orig != data) + WREG32_PCIE(ixPCIE_LC_CNTL3, data); + + orig = data = RREG32_PCIE(ixPCIE_P_CNTL); + data |= PCIE_P_CNTL__P_IGNORE_EDB_ERR_MASK; + if (orig != data) + WREG32_PCIE(ixPCIE_P_CNTL, data); + + data = RREG32_PCIE(ixPCIE_LC_L1_PM_SUBSTATE); + pci_read_config_dword(adev->pdev, PCIE_L1_PM_SUB_CNTL, &data1); + if (data & PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK && + (data & (PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK | + PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_1_OVERRIDE_MASK | + PCIE_LC_L1_PM_SUBSTATE__LC_ASPM_L1_2_OVERRIDE_MASK | + PCIE_LC_L1_PM_SUBSTATE__LC_ASPM_L1_1_OVERRIDE_MASK))) { + bL1SS = true; + } else if (data1 & (PCIE_L1_PM_SUB_CNTL__ASPM_L1_2_EN_MASK | + PCIE_L1_PM_SUB_CNTL__ASPM_L1_1_EN_MASK | + PCIE_L1_PM_SUB_CNTL__PCI_PM_L1_2_EN_MASK | + PCIE_L1_PM_SUB_CNTL__PCI_PM_L1_1_EN_MASK)) { + bL1SS = true; + } + + orig = data = RRE
[PATCH 1/2] drm/amd/amdgpu: enable ASPM on navi1x and vega
enable ASPM on navi1x and vega series Signed-off-by: Kenneth Feng --- drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c | 128 + drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 125 drivers/gpu/drm/amd/amdgpu/nv.c| 10 +- drivers/gpu/drm/amd/amdgpu/soc15.c | 7 +- 4 files changed, 259 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c index 83ea063388fd..0d2d629e2d6a 100644 --- a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c +++ b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c @@ -31,6 +31,28 @@ #include "vega10_enum.h" #include +#define smnPCIE_LC_CNTL0x11140280 +#define smnPCIE_LC_CNTL3 0x111402d4 +#define smnPCIE_LC_CNTL6 0x111402ec +#define smnPCIE_LC_CNTL7 0x111402f0 +#define smnNBIF_MGCG_CTRL_LCLK 0x1013a05c +#define NBIF_MGCG_CTRL_LCLK__NBIF_MGCG_REG_DIS_LCLK_MASK 0x1000L +#define RCC_BIF_STRAP3__STRAP_VLINK_ASPM_IDLE_TIMER_MASK 0xL +#define RCC_BIF_STRAP3__STRAP_VLINK_PM_L1_ENTRY_TIMER_MASK 0xL +#define smnRCC_EP_DEV0_0_EP_PCIE_TX_LTR_CNTL 0x10123530 +#define smnBIF_CFG_DEV0_EPF0_DEVICE_CNTL2 0x1014008c +#define smnBIF_CFG_DEV0_EPF0_PCIE_LTR_CAP 0x10140324 +#define smnPSWUSP0_PCIE_LC_CNTL2 0x111402c4 +#define smnRCC_BIF_STRAP2 0x10123488 +#define smnRCC_BIF_STRAP3 0x1012348c +#define smnRCC_BIF_STRAP5 0x10123494 +#define BIF_CFG_DEV0_EPF0_DEVICE_CNTL2__LTR_EN_MASK0x0400L +#define RCC_BIF_STRAP5__STRAP_VLINK_LDN_ENTRY_TIMER_MASK 0xL +#define RCC_BIF_STRAP2__STRAP_LTR_IN_ASPML1_DIS_MASK 0x4000L +#define RCC_BIF_STRAP3__STRAP_VLINK_ASPM_IDLE_TIMER__SHIFT 0x0 +#define RCC_BIF_STRAP3__STRAP_VLINK_PM_L1_ENTRY_TIMER__SHIFT 0x10 +#define RCC_BIF_STRAP5__STRAP_VLINK_LDN_ENTRY_TIMER__SHIFT 0x0 + static void nbio_v6_1_remap_hdp_registers(struct amdgpu_device *adev) { WREG32_SOC15(NBIO, 0, mmREMAP_HDP_MEM_FLUSH_CNTL, @@ -256,6 +278,111 @@ static void nbio_v6_1_init_registers(struct amdgpu_device *adev) WREG32_PCIE(smnPCIE_CI_CNTL, data); } +static void nbio_v6_1_program_ltr(struct amdgpu_device *adev) +{ + uint32_t def, data; + + WREG32_PCIE(smnRCC_EP_DEV0_0_EP_PCIE_TX_LTR_CNTL, 0x75EB); + + def = data = RREG32_PCIE(smnRCC_BIF_STRAP2); + data &= ~RCC_BIF_STRAP2__STRAP_LTR_IN_ASPML1_DIS_MASK; + if (def != data) + WREG32_PCIE(smnRCC_BIF_STRAP2, data); + + def = data = RREG32_PCIE(smnRCC_EP_DEV0_0_EP_PCIE_TX_LTR_CNTL); + data &= ~EP_PCIE_TX_LTR_CNTL__LTR_PRIV_MSG_DIS_IN_PM_NON_D0_MASK; + if (def != data) + WREG32_PCIE(smnRCC_EP_DEV0_0_EP_PCIE_TX_LTR_CNTL, data); + + def = data = RREG32_PCIE(smnBIF_CFG_DEV0_EPF0_DEVICE_CNTL2); + data |= BIF_CFG_DEV0_EPF0_DEVICE_CNTL2__LTR_EN_MASK; + if (def != data) + WREG32_PCIE(smnBIF_CFG_DEV0_EPF0_DEVICE_CNTL2, data); +} + +static void nbio_v6_1_program_aspm(struct amdgpu_device *adev) +{ + uint32_t def, data; + + def = data = RREG32_PCIE(smnPCIE_LC_CNTL); + data &= ~PCIE_LC_CNTL__LC_L1_INACTIVITY_MASK; + data &= ~PCIE_LC_CNTL__LC_L0S_INACTIVITY_MASK; + data |= PCIE_LC_CNTL__LC_PMI_TO_L1_DIS_MASK; + if (def != data) + WREG32_PCIE(smnPCIE_LC_CNTL, data); + + def = data = RREG32_PCIE(smnPCIE_LC_CNTL7); + data |= PCIE_LC_CNTL7__LC_NBIF_ASPM_INPUT_EN_MASK; + if (def != data) + WREG32_PCIE(smnPCIE_LC_CNTL7, data); + + def = data = RREG32_PCIE(smnNBIF_MGCG_CTRL_LCLK); + data |= NBIF_MGCG_CTRL_LCLK__NBIF_MGCG_REG_DIS_LCLK_MASK; + if (def != data) + WREG32_PCIE(smnNBIF_MGCG_CTRL_LCLK, data); + + def = data = RREG32_PCIE(smnPCIE_LC_CNTL3); + data |= PCIE_LC_CNTL3__LC_DSC_DONT_ENTER_L23_AFTER_PME_ACK_MASK; + if (def != data) + WREG32_PCIE(smnPCIE_LC_CNTL3, data); + + def = data = RREG32_PCIE(smnRCC_BIF_STRAP3); + data &= ~RCC_BIF_STRAP3__STRAP_VLINK_ASPM_IDLE_TIMER_MASK; + data &= ~RCC_BIF_STRAP3__STRAP_VLINK_PM_L1_ENTRY_TIMER_MASK; + if (def != data) + WREG32_PCIE(smnRCC_BIF_STRAP3, data); + + def = data = RREG32_PCIE(smnRCC_BIF_STRAP5); + data &= ~RCC_BIF_STRAP5__STRAP_VLINK_LDN_ENTRY_TIMER_MASK; + if (def != data) + WREG32_PCIE(smnRCC_BIF_STRAP5, data); + + def = data = RREG32_PCIE(smnBIF_CFG_DEV0_EPF0_DEVICE_CNTL2); + data &= ~BIF_CFG_DEV0_EPF0_DEVICE_CNTL2__LTR_EN_MASK; + if (def != data) + WREG32_PCIE(smnBIF_CFG_DEV0_EPF0_DEVICE_CNTL2, data); + + WREG32_PCIE(smnBIF_CFG_DEV0_EPF0_PCIE_LTR_CAP, 0x10011001); + + def = data = RREG32_PCIE(smnPSWUSP0_PCIE_LC_CNTL2); + data |= PSWUSP0_PCIE_LC_CNTL2__LC_ALLOW_PDWN_IN_L1_MASK | + PSWUSP0_PCIE_LC_CNTL2__LC_ALLOW_PDWN_IN_L23_MASK; + data &= ~PSWUSP0_PCIE_LC
[PATCH v2] drm/amd/display: Fix two cursor duplication when using overlay
Our driver supports overlay planes, and as expected, some userspace compositor takes advantage of these features. If the userspace is not enabling the cursor, they can use multiple planes as they please. Nevertheless, we start to have constraints when userspace tries to enable hardware cursor with various planes. Basically, we cannot draw the cursor at the same size and position on two separated pipes since it uses extra bandwidth and DML only run with one cursor. For those reasons, when we enable hardware cursor and multiple planes, our driver should accept variations like the ones described below: +-+ +--+ | +-+ | | | | |Primary | | | Primary | | | | | | Overlay | | +-+ | | | |Overlay | | | +-+ +--+ In this scenario, we can have the desktop UI in the overlay and some other framebuffer attached to the primary plane (e.g., video). However, userspace needs to obey some rules and avoid scenarios like the ones described below (when enabling hw cursor): ++ |Overlay | +-++-+---+ +-||--+ | ++ | ++ | | ++ | | |Overlay | | |Overlay | | | | | || | || | | | | ++ | ++ | | | | Primary || Primary | | Primary | +-++-+ +-+ +-+ +-+ | ++ | Primary| | |Overlay | | | | || | | | ++ | ++ | | Primary | | |Overlay | | +-+ +-||--+ ++ If the userspace violates some of the above scenarios, our driver needs to reject the commit; otherwise, we can have unexpected behavior. Since we don't have a proper driver validation for the above case, we can see some problems like a duplicate cursor in applications that use multiple planes. This commit fixes the cursor issue and others by adding adequate verification for multiple planes. Change since V1 (Harry and Sean): - Remove cursor verification from the equation. Cc: Louis Li Cc: Nicholas Kazlauskas Cc: Harry Wentland Cc: Hersen Wu Cc: Sean Paul Signed-off-by: Rodrigo Siqueira --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 51 +++ 1 file changed, 51 insertions(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index e29cb2e956db..ac1408d52eff 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -9946,6 +9946,53 @@ static int add_affected_mst_dsc_crtcs(struct drm_atomic_state *state, struct drm } #endif +static int validate_overlay(struct drm_atomic_state *state) +{ + int i; + struct drm_plane *plane; + struct drm_plane_state *old_plane_state, *new_plane_state; + struct drm_plane_state *primary_state, *overlay_state = NULL; + + /* Check if primary plane is contained inside overlay */ + for_each_oldnew_plane_in_state_reverse(state, plane, old_plane_state, new_plane_state, i) { + if (plane->type == DRM_PLANE_TYPE_OVERLAY) { + if (drm_atomic_plane_disabling(plane->state, new_plane_state)) + return 0; + + overlay_state = new_plane_state; + continue; + } + } + + /* check if we're making changes to the overlay plane */ + if (!overlay_state) + return 0; + + /* check if overlay plane is enabled */ + if (!overlay_state->crtc) + return 0; + + /* find the primary plane for the CRTC that the overlay is enabled on */ + primary_state = drm_atomic_get_plane_state(state, overlay_state->crtc->primary); + if (IS_ERR(primary_state)) + return PTR_ERR(primary_state); + + /* check if primary plane is enabled */ + if (!primary_state->crtc) + return 0; + + /* Perform the bounds check to ensure the overlay plane covers the primary */ + if (primary_state->crtc_x < overlay_state->crtc_x || + primary_state->crtc_y < overlay_state->crtc_y || + primary_state->crtc_x + primary_state->crtc_w > overlay_state->crtc_x + overlay_state->crtc_w || + primary_state->crtc_y + primary_state->crtc_h > overlay_state->crtc_y + overlay_state->crtc_h) { + DRM_DEBUG_ATOMIC("Overlay plane is enabled with hardware cursor but does not fully cover primary plane\n"); + return -EINVAL; + } + + return 0; +} + /** * amdgpu_dm_atomic_check() - Atomic check implementation for AMDgpu DM. * @dev:
Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K
On 2021-04-13 5:24 p.m., Mikhail Gavrilov wrote: On Tue, 13 Apr 2021 at 04:55, Leo Liu wrote: It curious why ffmpeg does not cause such issues. For example such command not cause kernel panic: $ ffmpeg -f x11grab -framerate 60 -video_size 3840x2160 -i :0.0 -vf 'format=nv12,hwupload' -vaapi_device /dev/dri/renderD128 -vcodec h264_vaapi output3.mp4 What command are you using to see the issue or how can the issue be reproduced? $ mpv output4.mp4 This is decode command line, are you seeing issue with encode or decode?, you also said `ffmpeg -f x11grab -framerate 60 -video_size 3840x2160 -i :0.0 -vf 'format=nv12,hwupload' -vaapi_device /dev/dri/renderD128 -vcodec h264_vaapi output3.mp4` doesn't cause such issue, right? What command line can cause the issue then? And of course, I know how it should works because when I encode video with CPU encoder (libx264) all fine. $ ffmpeg -f x11grab -framerate 60 -video_size 3840x2160 -i :0.0 -vcodec libx264 output3.mp4 Please file a freedesktop gitlab issue, so we can keep track of it. Here? https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues&data=04%7C01%7Cleo.liu%40amd.com%7C3cd466c3286e4303f2b108d8fec2833a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637539458675499474%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=GbiA7%2FrkiLwh2E9js9tGhWkZyr%2B9TY57H6G6cL7ex8s%3D&reserved=0 Yes. Also, I found that other users face the same problem. https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbbs.archlinux.org%2Fviewtopic.php%3Fid%3D261965&data=04%7C01%7Cleo.liu%40amd.com%7C3cd466c3286e4303f2b108d8fec2833a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637539458675499474%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=YNnk%2BZnPS0DVtuDfttnTThYfHOvmP38%2BwNpNZ5voLuk%3D&reserved=0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K
On Tue, 13 Apr 2021 at 04:55, Leo Liu wrote: > > >It curious why ffmpeg does not cause such issues. > >For example such command not cause kernel panic: > >$ ffmpeg -f x11grab -framerate 60 -video_size 3840x2160 -i :0.0 -vf > >'format=nv12,hwupload' -vaapi_device /dev/dri/renderD128 -vcodec > >h264_vaapi output3.mp4 > > What command are you using to see the issue or how can the issue be > reproduced? $ mpv output4.mp4 And of course, I know how it should works because when I encode video with CPU encoder (libx264) all fine. $ ffmpeg -f x11grab -framerate 60 -video_size 3840x2160 -i :0.0 -vcodec libx264 output3.mp4 > Please file a freedesktop gitlab issue, so we can keep track of it. Here? https://gitlab.freedesktop.org/drm/amd/-/issues Also, I found that other users face the same problem. https://bbs.archlinux.org/viewtopic.php?id=261965 -- Best Regards, Mike Gavrilov. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K
On Tue, 13 Apr 2021 at 12:29, Christian König wrote: > > Hi Mikhail, > > the crash is a known issue and should be fixed by: > > commit f63da9ae7584280582cbc834b20cc18bfb203b14 > Author: Philip Yang > Date: Thu Apr 1 00:22:23 2021 -0400 > > drm/amdgpu: reserve fence slot to update page table > Unfortunately, this commit couldn't fix the initial problem. 1. Result video is jerky if it grabbed and encoded with ffmpeg (h264_vaapi codec). 2. OBS still crashed if I try to record or stream video. 3. In the kernel log still appears the message "amdgpu: [mmhub] page fault (src_id:0 ring:0 vmid:4 pasid:32770, for process obs" if I tried to record or stream video by OBS. -- Best Regards, Mike Gavrilov. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [pull] amdgpu, radeon drm-next-5.13
On Mon, Apr 12, 2021 at 06:07:32PM -0400, Alex Deucher wrote: > Hi Dave, Daniel, > > Same PR as last week plus a few accumulated fixes, rebased on drm-next > to resolve the dependencies between ttm and scheduler with changes in amdgpu. > > The following changes since commit c103b850721e4a79ff9578f131888129c37a4679: > > Merge tag 'drm-misc-next-2021-04-09' of > git://anongit.freedesktop.org/drm/drm-misc into drm-next (2021-04-10 05:46:35 > +1000) > > are available in the Git repository at: > > https://gitlab.freedesktop.org/agd5f/linux.git > tags/amd-drm-next-5.13-2021-04-12 > > for you to fetch changes up to cbb8f989d5a07cb3e39e9c149a6f89d6c83432aa: > > drm/amdgpu: page retire over debugfs mechanism (2021-04-09 16:58:28 -0400) Applied to drm-next, thanks. -Daniel > > > amd-drm-next-5.13-2021-04-12: > > amdgpu: > - Re-enable GPU reset on VanGogh > - Enable DPM flags for SMART_SUSPEND and MAY_SKIP_RESUME > - Disentangle HG from vga_switcheroo > - S0ix fixes > - W=1 fixes > - Resource iterator fixes > - DMCUB updates > - UBSAN fixes > - More PM API cleanup > - Aldebaran updates > - Modifier fixes > - Enable VCN load balancing with asymmetric engines > - Rework BO structs > - Aldebaran reset support > - Initial LTTPR display work > - Display MALL fixes > - Fall back to YCbCr420 when YCbCr444 fails > - SR-IOV fixes > - RAS updates > - Misc cleanups and fixes > > radeon: > - Typo fixes > - Fix error handling for firmware on r6xx > - Fix a missing check in DP MST handling > > > Alex Deucher (22): > drm/amdgpu/display/dm: add missing parameter documentation > drm/amdgpu: Add additional Sienna Cichlid PCI ID > drm/amdgpu: add a dev_pm_ops prepare callback (v2) > drm/amdgpu: enable DPM_FLAG_MAY_SKIP_RESUME and DPM_FLAG_SMART_SUSPEND > flags (v2) > drm/amdgpu: disentangle HG systems from vgaswitcheroo > drm/amdgpu: rework S3/S4/S0ix state handling > drm/amdgpu: don't evict vram on APUs for suspend to ram (v4) > drm/amdgpu: clean up non-DC suspend/resume handling > drm/amdgpu: move s0ix check into amdgpu_device_ip_suspend_phase2 (v3) > drm/amdgpu: re-enable suspend phase 2 for S0ix > drm/amdgpu/swsmu: skip gfx cgpg on s0ix suspend > drm/amdgpu: update comments about s0ix suspend/resume > drm/amdgpu: drop S0ix checks around CG/PG in suspend > drm/amdgpu: skip kfd suspend/resume for S0ix > drm/amdgpu/display: restore AUX_DPHY_TX_CONTROL for DCN2.x > drm/amdgpu/display: fix memory leak for dimgrey cavefish > drm/amdgpu/pm: mark pcie link/speed arrays as const > drm/amdgpu/pm: bail on sysfs/debugfs queries during platform suspend > drm/amdgpu/vangogh: don't check for dpm in is_dpm_running when in > suspend > drm/amdgpu/display: fix warning on 32 bit in dmub > drm/amdgpu: drop some unused atombios functions > drm/amdgpu/smu7: fix CAC setting on TOPAZ > > Alex Sierra (2): > drm/amdgpu: replace per_device_list by array > drm/amdgpu: ih reroute for newer asics than vega20 > > Alvin Lee (1): > drm/amd/display: Change input parameter for set_drr > > Amber Lin (1): > drm/amdkfd: Avoid null pointer in SMI event > > Anson Jacob (2): > drm/amd/display: Fix UBSAN: shift-out-of-bounds warning > drm/amd/display: Removing unused code from dmub_cmd.h > > Anthony Koo (3): > drm/amd/display: [FW Promotion] Release 0.0.57 > drm/amd/display: [FW Promotion] Release 0.0.58 > drm/amd/display: [FW Promotion] Release 0.0.59 > > Aric Cyr (3): > drm/amd/display: 3.2.128 > drm/amd/display: 3.2.129 > drm/amd/display: 3.2.130 > > Arnd Bergmann (3): > amdgpu: avoid incorrect %hu format string > amdgpu: fix gcc -Wrestrict warning > amdgpu: securedisplay: simplify i2c hexdump output > > Aurabindo Pillai (1): > drm/amd/display: Add debugfs entry for LTTPR register status > > Bernard Zhao (2): > drm/amd: use kmalloc_array over kmalloc with multiply > drm/amd: cleanup coding style a bit > > Bhaskar Chowdhury (6): > drm/amdgpu: Fix a typo > drm/amdgpu: Fix a typo > drm/atomic: Couple of typo fixes > drm/radeon/r600_cs: Few typo fixes > drm/amd/amdgpu/gfx_v7_0: Trivial typo fixes > drm/amd: Fix a typo in two different sentences > > Bindu Ramamurthy (1): > drm/amd/display: Allow idle optimization based on vblank. > > Chengming Gui (1): > drm/amd/amdgpu: set MP1 state to UNLOAD before reload its FW for > vega20/ALDEBARAN > > Chris Park (1): > drm/amd/display: Disable MALL when SMU not present > > Christian König (4): > drm/amdgpu: remove irq_src->data handling > drm/amdgpu: add the sched_score to amdgpu_ring_init > drm/amdgpu: share scheduler score on VCN3 instances > drm/amdgpu: lo
Re: [PATCH v2] drm/amdgpu: use pre-calculated bo size
On 4/13/21 10:50 PM, Nirmoy Das wrote: Use bo->tbo.base.size instead of bo->tbo.mem.num_pages << PAGE_SHIFT. Ignore this please, pressed send-email too quick! Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 1345f7eba011..74ecc0c1863f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -1371,7 +1371,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) if (bo->mem.mem_type != TTM_PL_VRAM) return 0; - size = bo->mem.num_pages << PAGE_SHIFT; + size = bo->base.size; offset = bo->mem.start << PAGE_SHIFT; if ((offset + size) <= adev->gmc.visible_vram_size) return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 936b3cfdde55..26deace78539 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -2048,7 +2048,7 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo, return r; } - num_bytes = bo->tbo.mem.num_pages << PAGE_SHIFT; + num_bytes = bo->tbo.base.size; num_loops = 0; amdgpu_res_first(&bo->tbo.mem, 0, num_bytes, &cursor); -- 2.30.2 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 2/2] drm/radeon: use pre-calculated bo size
Use bo->tbo.base.size instead of calculating it from num_pages. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/radeon/radeon_object.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c index cee11c55fd15..fd8f6ccc267e 100644 --- a/drivers/gpu/drm/radeon/radeon_object.c +++ b/drivers/gpu/drm/radeon/radeon_object.c @@ -763,7 +763,7 @@ vm_fault_t radeon_bo_fault_reserve_notify(struct ttm_buffer_object *bo) if (bo->mem.mem_type != TTM_PL_VRAM) return 0; - size = bo->mem.num_pages << PAGE_SHIFT; + size = bo->base.size; offset = bo->mem.start << PAGE_SHIFT; if ((offset + size) <= rdev->mc.visible_vram_size) return 0; -- 2.30.2 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 1/2] drm/amdgpu: use pre-calculated bo size
Use bo->tbo.base.size instead of calculating it from num_pages. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 1345f7eba011..74ecc0c1863f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -1371,7 +1371,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) if (bo->mem.mem_type != TTM_PL_VRAM) return 0; - size = bo->mem.num_pages << PAGE_SHIFT; + size = bo->base.size; offset = bo->mem.start << PAGE_SHIFT; if ((offset + size) <= adev->gmc.visible_vram_size) return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 936b3cfdde55..26deace78539 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -2048,7 +2048,7 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo, return r; } - num_bytes = bo->tbo.mem.num_pages << PAGE_SHIFT; + num_bytes = bo->tbo.base.size; num_loops = 0; amdgpu_res_first(&bo->tbo.mem, 0, num_bytes, &cursor); -- 2.30.2 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH v2] drm/amdgpu: use pre-calculated bo size
Use bo->tbo.base.size instead of bo->tbo.mem.num_pages << PAGE_SHIFT. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 1345f7eba011..74ecc0c1863f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -1371,7 +1371,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) if (bo->mem.mem_type != TTM_PL_VRAM) return 0; - size = bo->mem.num_pages << PAGE_SHIFT; + size = bo->base.size; offset = bo->mem.start << PAGE_SHIFT; if ((offset + size) <= adev->gmc.visible_vram_size) return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 936b3cfdde55..26deace78539 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -2048,7 +2048,7 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo, return r; } - num_bytes = bo->tbo.mem.num_pages << PAGE_SHIFT; + num_bytes = bo->tbo.base.size; num_loops = 0; amdgpu_res_first(&bo->tbo.mem, 0, num_bytes, &cursor); -- 2.30.2 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu: use pre-calculated bo size
Use bo->tbo.base.size instead of bo->tbo.mem.num_pages << PAGE_SHIFT. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 936b3cfdde55..26deace78539 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -2048,7 +2048,7 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo, return r; } - num_bytes = bo->tbo.mem.num_pages << PAGE_SHIFT; + num_bytes = bo->tbo.base.size; num_loops = 0; amdgpu_res_first(&bo->tbo.mem, 0, num_bytes, &cursor); -- 2.30.2 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
On Tue, Apr 13, 2021 at 11:13 AM Li, Dennis wrote: > > [AMD Official Use Only - Internal Distribution Only] > > Hi, Christian and Andrey, > We maybe try to implement "wait" callback function of dma_fence_ops, > when GPU reset or unplug happen, make this callback return - ENODEV, to > notify the caller device lost. > > * Must return -ERESTARTSYS if the wait is intr = true and the wait > was > * interrupted, and remaining jiffies if fence has signaled, or 0 if > wait > * timed out. Can also return other error values on custom > implementations, > * which should be treated as if the fence is signaled. For example a > hardware > * lockup could be reported like that. > * > * This callback is optional. > */ > signed long (*wait)(struct dma_fence *fence, > bool intr, signed long timeout); Uh this callback is for old horros like unreliable irq delivery on radeon. Please don't use it for anything, if we need to make fences bail out on error then we need something that works for all fences. Also TDR should recovery you here already and make sure the dma_fence_wait() is bound in time. -Daniel > > Best Regards > Dennis Li > -Original Message- > From: Christian König > Sent: Tuesday, April 13, 2021 3:10 PM > To: Grodzovsky, Andrey ; Koenig, Christian > ; Li, Dennis ; > amd-gfx@lists.freedesktop.org; Deucher, Alexander > ; Kuehling, Felix ; Zhang, > Hawking ; Daniel Vetter > Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability > > Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky: > > > > On 2021-04-12 3:18 p.m., Christian König wrote: > >> Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky: > >>> [SNIP] > > > > So what's the right approach ? How we guarantee that when running > > amdgpu_fence_driver_force_completion we will signal all the HW > > fences and not racing against some more fences insertion into that > > array ? > > > > Well I would still say the best approach would be to insert this > between the front end and the backend and not rely on signaling > fences while holding the device srcu. > >>> > >>> > >>> My question is, even now, when we run > >>> amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or > >>> amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, > >>> what there prevents a race with another fence being at the same time > >>> emitted and inserted into the fence array ? Looks like nothing. > >>> > >> > >> Each ring can only be used by one thread at the same time, this > >> includes emitting fences as well as other stuff. > >> > >> During GPU reset we make sure nobody writes to the rings by stopping > >> the scheduler and taking the GPU reset lock (so that nobody else can > >> start the scheduler again). > > > > > > What about direct submissions not through scheduler - > > amdgpu_job_submit_direct, I don't see how this is protected. > > Those only happen during startup and GPU reset. > > >> > > BTW: Could it be that the device SRCU protects more than one device > and we deadlock because of this? > >>> > >>> > >>> I haven't actually experienced any deadlock until now but, yes, > >>> drm_unplug_srcu is defined as static in drm_drv.c and so in the > >>> presence of multiple devices from same or different drivers we in > >>> fact are dependent on all their critical sections i guess. > >>> > >> > >> Shit, yeah the devil is a squirrel. So for A+I laptops we actually > >> need to sync that up with Daniel and the rest of the i915 guys. > >> > >> IIRC we could actually have an amdgpu device in a docking station > >> which needs hotplug and the driver might depend on waiting for the > >> i915 driver as well. > > > > > > Can't we propose a patch to make drm_unplug_srcu per drm_device ? I > > don't see why it has to be global and not per device thing. > > I'm really wondering the same thing for quite a while now. > > Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global. > > Regards, > Christian. > > > > > Andrey > > > > > >> > >> Christian. > >> > >>> Andrey > >>> > >>> > > Christian. > > > Andrey > > > > > >> > >>> Andrey > >>> > >>> > > Christian. > > > /* Past this point no more fence are submitted to HW ring > > and hence we can safely call force signal on all that are > > currently there. > > * Any subsequently created HW fences will be returned > > signaled with an error code right away > > */ > > > > for_each_ring(adev) > > amdgpu_fence_process(ring) > > > > drm_dev_unplug(dev); > > Stop schedulers > > cancel_sync(all timers and queued works); > > hw_fini > > unmap_mmio > > > > } > > > > > >>
Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
On Tue, Apr 13, 2021 at 9:10 AM Christian König wrote: > > Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky: > > > > On 2021-04-12 3:18 p.m., Christian König wrote: > >> Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky: > >>> [SNIP] > > > > So what's the right approach ? How we guarantee that when running > > amdgpu_fence_driver_force_completion we will signal all the HW > > fences and not racing against some more fences insertion into that > > array ? > > > > Well I would still say the best approach would be to insert this > between the front end and the backend and not rely on signaling > fences while holding the device srcu. > >>> > >>> > >>> My question is, even now, when we run > >>> amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or > >>> amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, > >>> what there prevents a race with another fence being at the same time > >>> emitted and inserted into the fence array ? Looks like nothing. > >>> > >> > >> Each ring can only be used by one thread at the same time, this > >> includes emitting fences as well as other stuff. > >> > >> During GPU reset we make sure nobody writes to the rings by stopping > >> the scheduler and taking the GPU reset lock (so that nobody else can > >> start the scheduler again). > > > > > > What about direct submissions not through scheduler - > > amdgpu_job_submit_direct, I don't see how this is protected. > > Those only happen during startup and GPU reset. > > >> > > BTW: Could it be that the device SRCU protects more than one device > and we deadlock because of this? > >>> > >>> > >>> I haven't actually experienced any deadlock until now but, yes, > >>> drm_unplug_srcu is defined as static in drm_drv.c and so in the > >>> presence of multiple devices from same or different drivers we in > >>> fact are dependent on all their critical sections i guess. > >>> > >> > >> Shit, yeah the devil is a squirrel. So for A+I laptops we actually > >> need to sync that up with Daniel and the rest of the i915 guys. > >> > >> IIRC we could actually have an amdgpu device in a docking station > >> which needs hotplug and the driver might depend on waiting for the > >> i915 driver as well. > > > > > > Can't we propose a patch to make drm_unplug_srcu per drm_device ? I > > don't see why it has to be global and not per device thing. > > I'm really wondering the same thing for quite a while now. > > Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global. SRCU isn't exactly the cheapest thing, but aside from that we could make it per-device. I'm not seeing the point much since if you do end up being stuck on an ioctl this might happen with anything really. Also note that dma_fence_waits are supposed to be time bound, so you shouldn't end up waiting on them forever. It should all get sorted out in due time with TDR I hope (e.g. if i915 is stuck on a fence because you're unlucky). -Daniel > > Regards, > Christian. > > > > > Andrey > > > > > >> > >> Christian. > >> > >>> Andrey > >>> > >>> > > Christian. > > > Andrey > > > > > >> > >>> Andrey > >>> > >>> > > Christian. > > > /* Past this point no more fence are submitted to HW ring > > and hence we can safely call force signal on all that are > > currently there. > > * Any subsequently created HW fences will be returned > > signaled with an error code right away > > */ > > > > for_each_ring(adev) > > amdgpu_fence_process(ring) > > > > drm_dev_unplug(dev); > > Stop schedulers > > cancel_sync(all timers and queued works); > > hw_fini > > unmap_mmio > > > > } > > > > > > Andrey > > > > > >> > >> > >>> > >> > >> Alternatively grabbing the reset write side and stopping > >> and then restarting the scheduler could work as well. > >> > >> Christian. > > > > > > I didn't get the above and I don't see why I need to reuse > > the GPU reset rw_lock. I rely on the SRCU unplug flag for > > unplug. Also, not clear to me why are we focusing on the > > scheduler threads, any code patch to generate HW fences > > should be covered, so any code leading to > > amdgpu_fence_emit needs to be taken into account such as, > > direct IB submissions, VM flushes e.t.c > > You need to work together with the reset lock anyway, cause > a hotplug could run at the same time as a reset. > >>> > >>> > >>> For going my way indeed now I see now that I have to take > >>> reset write side lock during HW f
Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
On 2021-04-13 2:25 p.m., Christian König wrote: Am 13.04.21 um 20:18 schrieb Andrey Grodzovsky: On 2021-04-13 2:03 p.m., Christian König wrote: Am 13.04.21 um 17:12 schrieb Andrey Grodzovsky: On 2021-04-13 3:10 a.m., Christian König wrote: Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky: On 2021-04-12 3:18 p.m., Christian König wrote: Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky: [SNIP] So what's the right approach ? How we guarantee that when running amdgpu_fence_driver_force_completion we will signal all the HW fences and not racing against some more fences insertion into that array ? Well I would still say the best approach would be to insert this between the front end and the backend and not rely on signaling fences while holding the device srcu. My question is, even now, when we run amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, what there prevents a race with another fence being at the same time emitted and inserted into the fence array ? Looks like nothing. Each ring can only be used by one thread at the same time, this includes emitting fences as well as other stuff. During GPU reset we make sure nobody writes to the rings by stopping the scheduler and taking the GPU reset lock (so that nobody else can start the scheduler again). What about direct submissions not through scheduler - amdgpu_job_submit_direct, I don't see how this is protected. Those only happen during startup and GPU reset. Ok, but then looks like I am missing something, see the following steps in amdgpu_pci_remove - 1) Use disable_irq API function to stop and flush all in flight HW interrupts handlers 2) Grab the reset lock and stop all the schedulers After above 2 steps the HW fences array is idle, no more insertions and no more extractions from the array 3) Run one time amdgpu_fence_process to signal all current HW fences 4) Set drm_dev_unplug (will 'flush' all in flight IOCTLs), release the GPU reset lock and go on with the rest of the sequence (cancel timers, work items e.t.c) What's problematic in this sequence ? drm_dev_unplug() will wait for the IOCTLs to finish. The IOCTLs in turn can wait for fences. That can be both hardware fences, scheduler fences, as well as fences from other devices (and KIQ fences for register writes under SRIOV, but we can hopefully ignore them for now). We have handled the hardware fences, but we have no idea when the scheduler fences or the fences from other devices will signal. Scheduler fences won't signal until the scheduler threads are restarted or we somehow cancel the submissions. Doable, but tricky as well. For scheduler fences I am not worried, for the sched_fence->finished fence they are by definition attached to HW fences which already signaledfor sched_fence->scheduled we should run drm_sched_entity_kill_jobs for each entity after stopping the scheduler threads and before setting drm_dev_unplug. Well exactly that is what is tricky here. drm_sched_entity_kill_jobs() assumes that there are no more jobs pushed into the entity. We are racing here once more and need to handle that. But why, I wrote above that we first stop the all schedulers, then only call drm_sched_entity_kill_jobs. For waiting for other device I have no idea if that couldn't deadlock somehow. Yea, not sure for imported fences and dma_bufs, I would assume the other devices should not be impacted by our device removal but, who knows... So I guess we are NOT going with finalizing HW fences before drm_dev_unplug and instead will just call drm_dev_enter/exit at the back-ends all over the place where there are MMIO accesses ? Good question. As you said that is really the hard path. Handling it all at once at IOCTL level certainly has some appeal as well, but I have no idea if we can guarantee that this is lock free. Maybe just empirically - let's try it and see under different test scenarios what actually happens ? Andrey Christian. Andrey Regards, Christian. Andrey BTW: Could it be that the device SRCU protects more than one device and we deadlock because of this? I haven't actually experienced any deadlock until now but, yes, drm_unplug_srcu is defined as static in drm_drv.c and so in the presence of multiple devices from same or different drivers we in fact are dependent on all their critical sections i guess. Shit, yeah the devil is a squirrel. So for A+I laptops we actually need to sync that up with Daniel and the rest of the i915 guys. IIRC we could actually have an amdgpu device in a docking station which needs hotplug and the driver might depend on waiting for the i915 driver as well. Can't we propose a patch to make drm_unplug_srcu per drm_device ? I don't see why it has to be global and not per device thing. I'm really wondering the same thing for quite a while now. Adding D
[PATCH] drm/amdgpu: Use iterator methods exposed by amdgpu_res_cursor.h in building SG_TABLE's for a VRAM BO
Extend current implementation of SG_TABLE construction method to allow exportation of sub-buffers of a VRAM BO. This capability will enable logical partitioning of a VRAM BO into multiple non-overlapping sub-buffers. One example of this use case is to partition a VRAM BO into two sub-buffers, one for SRC and another for DST. Signed-off-by: Ramesh Errabolu --- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 4 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 34 ++-- 3 files changed, 27 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c index e0c4f7c7f1b9..baa980a477d9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c @@ -291,8 +291,8 @@ static struct sg_table *amdgpu_dma_buf_map(struct dma_buf_attachment *attach, break; case TTM_PL_VRAM: - r = amdgpu_vram_mgr_alloc_sgt(adev, &bo->tbo.mem, attach->dev, - dir, &sgt); + r = amdgpu_vram_mgr_alloc_sgt(adev, &bo->tbo.mem, 0, + bo->tbo.base.size, attach->dev, dir, &sgt); if (r) return ERR_PTR(r); break; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h index dec0db8b0b13..9e38475e0f8d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h @@ -112,6 +112,7 @@ int amdgpu_gtt_mgr_recover(struct ttm_resource_manager *man); u64 amdgpu_vram_mgr_bo_visible_size(struct amdgpu_bo *bo); int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, struct ttm_resource *mem, + u64 offset, u64 size, struct device *dev, enum dma_data_direction dir, struct sg_table **sgt); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 592a2dd16493..bce105e2973e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -25,6 +25,7 @@ #include #include "amdgpu.h" #include "amdgpu_vm.h" +#include "amdgpu_res_cursor.h" #include "amdgpu_atomfirmware.h" #include "atom.h" @@ -565,6 +566,8 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man, * * @adev: amdgpu device pointer * @mem: TTM memory object + * @offset: byte offset from the base of VRAM BO + * @length: number of bytes to export in sg_table * @dev: the other device * @dir: dma direction * @sgt: resulting sg table @@ -573,39 +576,47 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man, */ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, struct ttm_resource *mem, + u64 offset, u64 length, struct device *dev, enum dma_data_direction dir, struct sg_table **sgt) { - struct drm_mm_node *node; + struct amdgpu_res_cursor cursor; struct scatterlist *sg; int num_entries = 0; - unsigned int pages; int i, r; *sgt = kmalloc(sizeof(**sgt), GFP_KERNEL); if (!*sgt) return -ENOMEM; - for (pages = mem->num_pages, node = mem->mm_node; -pages; pages -= node->size, ++node) - ++num_entries; + /* Determine the number of DRM_MM nodes to export */ + amdgpu_res_first(mem, offset, length, &cursor); + while (cursor.remaining) { + num_entries++; + amdgpu_res_next(&cursor, cursor.size); + } r = sg_alloc_table(*sgt, num_entries, GFP_KERNEL); if (r) goto error_free; + /* Initialize scatterlist nodes of sg_table */ for_each_sgtable_sg((*sgt), sg, i) sg->length = 0; - node = mem->mm_node; + /* +* Walk down DRM_MM nodes to populate scatterlist nodes +* @note: Use iterator api to get first the DRM_MM node +* and the number of bytes from it. Access the following +* DRM_MM node(s) if more buffer needs to exported +*/ + amdgpu_res_first(mem, offset, length, &cursor); for_each_sgtable_sg((*sgt), sg, i) { - phys_addr_t phys = (node->start << PAGE_SHIFT) + - adev->gmc.aper_base; - size_t size = node->size << PAGE_SHIFT; + phys_addr_t phys = cursor.start + adev->gmc.aper_base; + size_t size = cursor.size; dma_addr_t addr; - ++node; addr = dma_map_resource(dev, phys, size, dir, DMA
Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
Am 13.04.21 um 20:18 schrieb Andrey Grodzovsky: On 2021-04-13 2:03 p.m., Christian König wrote: Am 13.04.21 um 17:12 schrieb Andrey Grodzovsky: On 2021-04-13 3:10 a.m., Christian König wrote: Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky: On 2021-04-12 3:18 p.m., Christian König wrote: Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky: [SNIP] So what's the right approach ? How we guarantee that when running amdgpu_fence_driver_force_completion we will signal all the HW fences and not racing against some more fences insertion into that array ? Well I would still say the best approach would be to insert this between the front end and the backend and not rely on signaling fences while holding the device srcu. My question is, even now, when we run amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, what there prevents a race with another fence being at the same time emitted and inserted into the fence array ? Looks like nothing. Each ring can only be used by one thread at the same time, this includes emitting fences as well as other stuff. During GPU reset we make sure nobody writes to the rings by stopping the scheduler and taking the GPU reset lock (so that nobody else can start the scheduler again). What about direct submissions not through scheduler - amdgpu_job_submit_direct, I don't see how this is protected. Those only happen during startup and GPU reset. Ok, but then looks like I am missing something, see the following steps in amdgpu_pci_remove - 1) Use disable_irq API function to stop and flush all in flight HW interrupts handlers 2) Grab the reset lock and stop all the schedulers After above 2 steps the HW fences array is idle, no more insertions and no more extractions from the array 3) Run one time amdgpu_fence_process to signal all current HW fences 4) Set drm_dev_unplug (will 'flush' all in flight IOCTLs), release the GPU reset lock and go on with the rest of the sequence (cancel timers, work items e.t.c) What's problematic in this sequence ? drm_dev_unplug() will wait for the IOCTLs to finish. The IOCTLs in turn can wait for fences. That can be both hardware fences, scheduler fences, as well as fences from other devices (and KIQ fences for register writes under SRIOV, but we can hopefully ignore them for now). We have handled the hardware fences, but we have no idea when the scheduler fences or the fences from other devices will signal. Scheduler fences won't signal until the scheduler threads are restarted or we somehow cancel the submissions. Doable, but tricky as well. For scheduler fences I am not worried, for the sched_fence->finished fence they are by definition attached to HW fences which already signaledfor sched_fence->scheduled we should run drm_sched_entity_kill_jobs for each entity after stopping the scheduler threads and before setting drm_dev_unplug. Well exactly that is what is tricky here. drm_sched_entity_kill_jobs() assumes that there are no more jobs pushed into the entity. We are racing here once more and need to handle that. For waiting for other device I have no idea if that couldn't deadlock somehow. Yea, not sure for imported fences and dma_bufs, I would assume the other devices should not be impacted by our device removal but, who knows... So I guess we are NOT going with finalizing HW fences before drm_dev_unplug and instead will just call drm_dev_enter/exit at the back-ends all over the place where there are MMIO accesses ? Good question. As you said that is really the hard path. Handling it all at once at IOCTL level certainly has some appeal as well, but I have no idea if we can guarantee that this is lock free. Christian. Andrey Regards, Christian. Andrey BTW: Could it be that the device SRCU protects more than one device and we deadlock because of this? I haven't actually experienced any deadlock until now but, yes, drm_unplug_srcu is defined as static in drm_drv.c and so in the presence of multiple devices from same or different drivers we in fact are dependent on all their critical sections i guess. Shit, yeah the devil is a squirrel. So for A+I laptops we actually need to sync that up with Daniel and the rest of the i915 guys. IIRC we could actually have an amdgpu device in a docking station which needs hotplug and the driver might depend on waiting for the i915 driver as well. Can't we propose a patch to make drm_unplug_srcu per drm_device ? I don't see why it has to be global and not per device thing. I'm really wondering the same thing for quite a while now. Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global. Regards, Christian. Andrey Christian. Andrey Christian. Andrey Andrey Christian. /* Past this point no more fence are submitted to HW ring and hence we can safely call force signal o
Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
On 2021-04-13 2:03 p.m., Christian König wrote: Am 13.04.21 um 17:12 schrieb Andrey Grodzovsky: On 2021-04-13 3:10 a.m., Christian König wrote: Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky: On 2021-04-12 3:18 p.m., Christian König wrote: Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky: [SNIP] So what's the right approach ? How we guarantee that when running amdgpu_fence_driver_force_completion we will signal all the HW fences and not racing against some more fences insertion into that array ? Well I would still say the best approach would be to insert this between the front end and the backend and not rely on signaling fences while holding the device srcu. My question is, even now, when we run amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, what there prevents a race with another fence being at the same time emitted and inserted into the fence array ? Looks like nothing. Each ring can only be used by one thread at the same time, this includes emitting fences as well as other stuff. During GPU reset we make sure nobody writes to the rings by stopping the scheduler and taking the GPU reset lock (so that nobody else can start the scheduler again). What about direct submissions not through scheduler - amdgpu_job_submit_direct, I don't see how this is protected. Those only happen during startup and GPU reset. Ok, but then looks like I am missing something, see the following steps in amdgpu_pci_remove - 1) Use disable_irq API function to stop and flush all in flight HW interrupts handlers 2) Grab the reset lock and stop all the schedulers After above 2 steps the HW fences array is idle, no more insertions and no more extractions from the array 3) Run one time amdgpu_fence_process to signal all current HW fences 4) Set drm_dev_unplug (will 'flush' all in flight IOCTLs), release the GPU reset lock and go on with the rest of the sequence (cancel timers, work items e.t.c) What's problematic in this sequence ? drm_dev_unplug() will wait for the IOCTLs to finish. The IOCTLs in turn can wait for fences. That can be both hardware fences, scheduler fences, as well as fences from other devices (and KIQ fences for register writes under SRIOV, but we can hopefully ignore them for now). We have handled the hardware fences, but we have no idea when the scheduler fences or the fences from other devices will signal. Scheduler fences won't signal until the scheduler threads are restarted or we somehow cancel the submissions. Doable, but tricky as well. For scheduler fences I am not worried, for the sched_fence->finished fence they are by definition attached to HW fences which already signaled,for sched_fence->scheduled we should run drm_sched_entity_kill_jobs for each entity after stopping the scheduler threads and before setting drm_dev_unplug. For waiting for other device I have no idea if that couldn't deadlock somehow. Yea, not sure for imported fences and dma_bufs, I would assume the other devices should not be impacted by our device removal but, who knows... So I guess we are NOT going with finalizing HW fences before drm_dev_unplug and instead will just call drm_dev_enter/exit at the back-ends all over the place where there are MMIO accesses ? Andrey Regards, Christian. Andrey BTW: Could it be that the device SRCU protects more than one device and we deadlock because of this? I haven't actually experienced any deadlock until now but, yes, drm_unplug_srcu is defined as static in drm_drv.c and so in the presence of multiple devices from same or different drivers we in fact are dependent on all their critical sections i guess. Shit, yeah the devil is a squirrel. So for A+I laptops we actually need to sync that up with Daniel and the rest of the i915 guys. IIRC we could actually have an amdgpu device in a docking station which needs hotplug and the driver might depend on waiting for the i915 driver as well. Can't we propose a patch to make drm_unplug_srcu per drm_device ? I don't see why it has to be global and not per device thing. I'm really wondering the same thing for quite a while now. Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global. Regards, Christian. Andrey Christian. Andrey Christian. Andrey Andrey Christian. /* Past this point no more fence are submitted to HW ring and hence we can safely call force signal on all that are currently there. * Any subsequently created HW fences will be returned signaled with an error code right away */ for_each_ring(adev) amdgpu_fence_process(ring) drm_dev_unplug(dev); Stop schedulers cancel_sync(all timers and queued works); hw_fini unmap_mmio } Andrey Alternatively grabbing the reset write side and stopping and then restarting the scheduler could work as well
Re: [PATCH] drm/amdgpu: Use iterator methods exposed by amdgpu_res_cursor.h in building SG_TABLE's for a VRAM BO
Am 13.04.21 um 19:17 schrieb Ramesh Errabolu: Extend current implementation of SG_TABLE construction method to allow exportation of sub-buffers of a VRAM BO. This capability will enable logical partitioning of a VRAM BO into multiple non-overlapping sub-buffers. One example of this use case is to partition a VRAM BO into two sub-buffers, one for SRC and another for DST. Signed-off-by: Ramesh Errabolu --- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 6 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 34 ++-- 3 files changed, 29 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c index e0c4f7c7f1b9..57534b93bd91 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c @@ -255,6 +255,7 @@ static struct sg_table *amdgpu_dma_buf_map(struct dma_buf_attachment *attach, struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj); struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); struct sg_table *sgt; + u64 num_bytes; long r; if (!bo->tbo.pin_count) { @@ -291,8 +292,9 @@ static struct sg_table *amdgpu_dma_buf_map(struct dma_buf_attachment *attach, break; case TTM_PL_VRAM: - r = amdgpu_vram_mgr_alloc_sgt(adev, &bo->tbo.mem, attach->dev, - dir, &sgt); + num_bytes = bo->tbo.mem.num_pages << PAGE_SHIFT; Please rather use bo->tbo.base.size here. It is already in bytes. + r = amdgpu_vram_mgr_alloc_sgt(adev, &bo->tbo.mem, 0, num_bytes, + attach->dev, dir, &sgt); if (r) return ERR_PTR(r); break; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h index dec0db8b0b13..9e38475e0f8d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h @@ -112,6 +112,7 @@ int amdgpu_gtt_mgr_recover(struct ttm_resource_manager *man); u64 amdgpu_vram_mgr_bo_visible_size(struct amdgpu_bo *bo); int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, struct ttm_resource *mem, + u64 offset, u64 size, struct device *dev, enum dma_data_direction dir, struct sg_table **sgt); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 592a2dd16493..c1a7772fa8e8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -25,6 +25,7 @@ #include #include "amdgpu.h" #include "amdgpu_vm.h" +#include "amdgpu_res_cursor.h" #include "amdgpu_atomfirmware.h" #include "atom.h" @@ -565,6 +566,8 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man, * * @adev: amdgpu device pointer * @mem: TTM memory object + * @offset: byte offset from the base of VRAM BO + * @length: number of bytes to export in sg_table * @dev: the other device * @dir: dma direction * @sgt: resulting sg table @@ -573,39 +576,47 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man, */ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, struct ttm_resource *mem, + u64 offset, u64 length, struct device *dev, enum dma_data_direction dir, struct sg_table **sgt) { - struct drm_mm_node *node; + struct amdgpu_res_cursor cursor; struct scatterlist *sg; int num_entries = 0; - unsigned int pages; int i, r; *sgt = kmalloc(sizeof(**sgt), GFP_KERNEL); if (!*sgt) return -ENOMEM; - for (pages = mem->num_pages, node = mem->mm_node; -pages; pages -= node->size, ++node) - ++num_entries; + /* Determine the number of DRM_MM nodes to export */ + amdgpu_res_first(mem, offset, length, &cursor); + while (cursor.remaining) { + num_entries++; + amdgpu_res_next(&cursor, cursor.size); + } r = sg_alloc_table(*sgt, num_entries, GFP_KERNEL); if (r) goto error_free; + /* Initialize scatterlist nodes of sg_table */ for_each_sgtable_sg((*sgt), sg, i) sg->length = 0; - node = mem->mm_node; + /* +* Walk down DRM_MM nodes to populate scatterlist nodes +* @note: Use iterator api to get first the DRM_MM node +* and the number of bytes from it. Access the following +* DRM_MM node(s) if more buffer needs to exported +*/ + amdgpu_res_first(mem, of
Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
Am 13.04.21 um 17:12 schrieb Andrey Grodzovsky: On 2021-04-13 3:10 a.m., Christian König wrote: Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky: On 2021-04-12 3:18 p.m., Christian König wrote: Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky: [SNIP] So what's the right approach ? How we guarantee that when running amdgpu_fence_driver_force_completion we will signal all the HW fences and not racing against some more fences insertion into that array ? Well I would still say the best approach would be to insert this between the front end and the backend and not rely on signaling fences while holding the device srcu. My question is, even now, when we run amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, what there prevents a race with another fence being at the same time emitted and inserted into the fence array ? Looks like nothing. Each ring can only be used by one thread at the same time, this includes emitting fences as well as other stuff. During GPU reset we make sure nobody writes to the rings by stopping the scheduler and taking the GPU reset lock (so that nobody else can start the scheduler again). What about direct submissions not through scheduler - amdgpu_job_submit_direct, I don't see how this is protected. Those only happen during startup and GPU reset. Ok, but then looks like I am missing something, see the following steps in amdgpu_pci_remove - 1) Use disable_irq API function to stop and flush all in flight HW interrupts handlers 2) Grab the reset lock and stop all the schedulers After above 2 steps the HW fences array is idle, no more insertions and no more extractions from the array 3) Run one time amdgpu_fence_process to signal all current HW fences 4) Set drm_dev_unplug (will 'flush' all in flight IOCTLs), release the GPU reset lock and go on with the rest of the sequence (cancel timers, work items e.t.c) What's problematic in this sequence ? drm_dev_unplug() will wait for the IOCTLs to finish. The IOCTLs in turn can wait for fences. That can be both hardware fences, scheduler fences, as well as fences from other devices (and KIQ fences for register writes under SRIOV, but we can hopefully ignore them for now). We have handled the hardware fences, but we have no idea when the scheduler fences or the fences from other devices will signal. Scheduler fences won't signal until the scheduler threads are restarted or we somehow cancel the submissions. Doable, but tricky as well. For waiting for other device I have no idea if that couldn't deadlock somehow. Regards, Christian. Andrey BTW: Could it be that the device SRCU protects more than one device and we deadlock because of this? I haven't actually experienced any deadlock until now but, yes, drm_unplug_srcu is defined as static in drm_drv.c and so in the presence of multiple devices from same or different drivers we in fact are dependent on all their critical sections i guess. Shit, yeah the devil is a squirrel. So for A+I laptops we actually need to sync that up with Daniel and the rest of the i915 guys. IIRC we could actually have an amdgpu device in a docking station which needs hotplug and the driver might depend on waiting for the i915 driver as well. Can't we propose a patch to make drm_unplug_srcu per drm_device ? I don't see why it has to be global and not per device thing. I'm really wondering the same thing for quite a while now. Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global. Regards, Christian. Andrey Christian. Andrey Christian. Andrey Andrey Christian. /* Past this point no more fence are submitted to HW ring and hence we can safely call force signal on all that are currently there. * Any subsequently created HW fences will be returned signaled with an error code right away */ for_each_ring(adev) amdgpu_fence_process(ring) drm_dev_unplug(dev); Stop schedulers cancel_sync(all timers and queued works); hw_fini unmap_mmio } Andrey Alternatively grabbing the reset write side and stopping and then restarting the scheduler could work as well. Christian. I didn't get the above and I don't see why I need to reuse the GPU reset rw_lock. I rely on the SRCU unplug flag for unplug. Also, not clear to me why are we focusing on the scheduler threads, any code patch to generate HW fences should be covered, so any code leading to amdgpu_fence_emit needs to be taken into account such as, direct IB submissions, VM flushes e.t.c You need to work together with the reset lock anyway, cause a hotplug could run at the same time as a reset. For going my way indeed now I see now that I have to take reset write side lock during HW fences signalling in order to protect against scheduler/HW fences detachment and reattachment during schedulers
[PATCH] drm/amdgpu: Use iterator methods exposed by amdgpu_res_cursor.h in building SG_TABLE's for a VRAM BO
Extend current implementation of SG_TABLE construction method to allow exportation of sub-buffers of a VRAM BO. This capability will enable logical partitioning of a VRAM BO into multiple non-overlapping sub-buffers. One example of this use case is to partition a VRAM BO into two sub-buffers, one for SRC and another for DST. Signed-off-by: Ramesh Errabolu --- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 6 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 34 ++-- 3 files changed, 29 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c index e0c4f7c7f1b9..57534b93bd91 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c @@ -255,6 +255,7 @@ static struct sg_table *amdgpu_dma_buf_map(struct dma_buf_attachment *attach, struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj); struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); struct sg_table *sgt; + u64 num_bytes; long r; if (!bo->tbo.pin_count) { @@ -291,8 +292,9 @@ static struct sg_table *amdgpu_dma_buf_map(struct dma_buf_attachment *attach, break; case TTM_PL_VRAM: - r = amdgpu_vram_mgr_alloc_sgt(adev, &bo->tbo.mem, attach->dev, - dir, &sgt); + num_bytes = bo->tbo.mem.num_pages << PAGE_SHIFT; + r = amdgpu_vram_mgr_alloc_sgt(adev, &bo->tbo.mem, 0, num_bytes, + attach->dev, dir, &sgt); if (r) return ERR_PTR(r); break; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h index dec0db8b0b13..9e38475e0f8d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h @@ -112,6 +112,7 @@ int amdgpu_gtt_mgr_recover(struct ttm_resource_manager *man); u64 amdgpu_vram_mgr_bo_visible_size(struct amdgpu_bo *bo); int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, struct ttm_resource *mem, + u64 offset, u64 size, struct device *dev, enum dma_data_direction dir, struct sg_table **sgt); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 592a2dd16493..c1a7772fa8e8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -25,6 +25,7 @@ #include #include "amdgpu.h" #include "amdgpu_vm.h" +#include "amdgpu_res_cursor.h" #include "amdgpu_atomfirmware.h" #include "atom.h" @@ -565,6 +566,8 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man, * * @adev: amdgpu device pointer * @mem: TTM memory object + * @offset: byte offset from the base of VRAM BO + * @length: number of bytes to export in sg_table * @dev: the other device * @dir: dma direction * @sgt: resulting sg table @@ -573,39 +576,47 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man, */ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, struct ttm_resource *mem, + u64 offset, u64 length, struct device *dev, enum dma_data_direction dir, struct sg_table **sgt) { - struct drm_mm_node *node; + struct amdgpu_res_cursor cursor; struct scatterlist *sg; int num_entries = 0; - unsigned int pages; int i, r; *sgt = kmalloc(sizeof(**sgt), GFP_KERNEL); if (!*sgt) return -ENOMEM; - for (pages = mem->num_pages, node = mem->mm_node; -pages; pages -= node->size, ++node) - ++num_entries; + /* Determine the number of DRM_MM nodes to export */ + amdgpu_res_first(mem, offset, length, &cursor); + while (cursor.remaining) { + num_entries++; + amdgpu_res_next(&cursor, cursor.size); + } r = sg_alloc_table(*sgt, num_entries, GFP_KERNEL); if (r) goto error_free; + /* Initialize scatterlist nodes of sg_table */ for_each_sgtable_sg((*sgt), sg, i) sg->length = 0; - node = mem->mm_node; + /* +* Walk down DRM_MM nodes to populate scatterlist nodes +* @note: Use iterator api to get first the DRM_MM node +* and the number of bytes from it. Access the following +* DRM_MM node(s) if more buffer needs to exported +*/ + amdgpu_res_first(mem, offset, length, &cursor); for_each_sgtable_sg((*sgt), sg, i) { - phys_addr_t phys = (node->star
Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
On 2021-04-13 3:10 a.m., Christian König wrote: Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky: On 2021-04-12 3:18 p.m., Christian König wrote: Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky: [SNIP] So what's the right approach ? How we guarantee that when running amdgpu_fence_driver_force_completion we will signal all the HW fences and not racing against some more fences insertion into that array ? Well I would still say the best approach would be to insert this between the front end and the backend and not rely on signaling fences while holding the device srcu. My question is, even now, when we run amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, what there prevents a race with another fence being at the same time emitted and inserted into the fence array ? Looks like nothing. Each ring can only be used by one thread at the same time, this includes emitting fences as well as other stuff. During GPU reset we make sure nobody writes to the rings by stopping the scheduler and taking the GPU reset lock (so that nobody else can start the scheduler again). What about direct submissions not through scheduler - amdgpu_job_submit_direct, I don't see how this is protected. Those only happen during startup and GPU reset. Ok, but then looks like I am missing something, see the following steps in amdgpu_pci_remove - 1) Use disable_irq API function to stop and flush all in flight HW interrupts handlers 2) Grab the reset lock and stop all the schedulers After above 2 steps the HW fences array is idle, no more insertions and no more extractions from the array 3) Run one time amdgpu_fence_process to signal all current HW fences 4) Set drm_dev_unplug (will 'flush' all in flight IOCTLs), release the GPU reset lock and go on with the rest of the sequence (cancel timers, work items e.t.c) What's problematic in this sequence ? Andrey BTW: Could it be that the device SRCU protects more than one device and we deadlock because of this? I haven't actually experienced any deadlock until now but, yes, drm_unplug_srcu is defined as static in drm_drv.c and so in the presence of multiple devices from same or different drivers we in fact are dependent on all their critical sections i guess. Shit, yeah the devil is a squirrel. So for A+I laptops we actually need to sync that up with Daniel and the rest of the i915 guys. IIRC we could actually have an amdgpu device in a docking station which needs hotplug and the driver might depend on waiting for the i915 driver as well. Can't we propose a patch to make drm_unplug_srcu per drm_device ? I don't see why it has to be global and not per device thing. I'm really wondering the same thing for quite a while now. Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global. Regards, Christian. Andrey Christian. Andrey Christian. Andrey Andrey Christian. /* Past this point no more fence are submitted to HW ring and hence we can safely call force signal on all that are currently there. * Any subsequently created HW fences will be returned signaled with an error code right away */ for_each_ring(adev) amdgpu_fence_process(ring) drm_dev_unplug(dev); Stop schedulers cancel_sync(all timers and queued works); hw_fini unmap_mmio } Andrey Alternatively grabbing the reset write side and stopping and then restarting the scheduler could work as well. Christian. I didn't get the above and I don't see why I need to reuse the GPU reset rw_lock. I rely on the SRCU unplug flag for unplug. Also, not clear to me why are we focusing on the scheduler threads, any code patch to generate HW fences should be covered, so any code leading to amdgpu_fence_emit needs to be taken into account such as, direct IB submissions, VM flushes e.t.c You need to work together with the reset lock anyway, cause a hotplug could run at the same time as a reset. For going my way indeed now I see now that I have to take reset write side lock during HW fences signalling in order to protect against scheduler/HW fences detachment and reattachment during schedulers stop/restart. But if we go with your approach then calling drm_dev_unplug and scoping amdgpu_job_timeout with drm_dev_enter/exit should be enough to prevent any concurrent GPU resets during unplug. In fact I already do it anyway - https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc36b1&data=04%7C01%7Candrey.grodzovsky%40amd.com%7Cc7fc6cb505c34aedfe6d08d8fe4b3947%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637538946324857369%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=64362PRC8xTgR2Uj2R256bMegVm8YWq1KI%2BAjze
Re: [PATCH] drm/amd/amdgpu: Expose some power info through AMDGPU_INFO
On Mon, Apr 12, 2021 at 8:15 AM Roy Sun wrote: > > Add interface to get the mm clock, temperature and memory load > > Signed-off-by: Roy Sun > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 50 + > include/uapi/drm/amdgpu_drm.h | 12 ++ > 2 files changed, 62 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > index b2e774aeab45..e5b16e0819ce 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > @@ -971,6 +971,56 @@ int amdgpu_info_ioctl(struct drm_device *dev, void > *data, struct drm_file *filp) > } > ui32 /= 100; > break; > + case AMDGPU_INFO_SENSOR_UVD_VCLK: > + /* get mm vclk in Mhz */ > + if (amdgpu_dpm_read_sensor(adev, > + AMDGPU_PP_SENSOR_UVD_VCLK, > + (void *)&ui32, > &ui32_size)) { > + return -EINVAL; > + } > + ui32 /= 100; > + break; > + case AMDGPU_INFO_SENSOR_UVD_DCLK: > + /* get mm dclk in Mhz */ > + if (amdgpu_dpm_read_sensor(adev, > + AMDGPU_PP_SENSOR_UVD_DCLK, > + (void *)&ui32, > &ui32_size)) { > + return -EINVAL; > + } > + ui32 /= 100; > + break; > + case AMDGPU_INFO_SENSOR_HOTSPOT_TEMP: > + /* get junction temperature */ > + if (amdgpu_dpm_read_sensor(adev, > + > AMDGPU_PP_SENSOR_HOTSPOT_TEMP, > + (void *)&ui32, > &ui32_size)) { > + return -EINVAL; > + } > + break; > + case AMDGPU_INFO_SENSOR_EDGE_TEMP: > + /* get current edge temperature */ > + if (amdgpu_dpm_read_sensor(adev, > + AMDGPU_PP_SENSOR_EDGE_TEMP, > + (void *)&ui32, > &ui32_size)) { > + return -EINVAL; > + } > + break; > + case AMDGPU_INFO_SENSOR_MEM_TEMP: > + /* get current memory temperature */ > + if (amdgpu_dpm_read_sensor(adev, > + AMDGPU_PP_SENSOR_MEM_TEMP, > + (void *)&ui32, > &ui32_size)) { > + return -EINVAL; > + } > + break; > + case AMDGPU_INFO_SENSOR_MEM_LOAD: > + /* get memory load */ > + if (amdgpu_dpm_read_sensor(adev, > + AMDGPU_PP_SENSOR_MEM_LOAD, > + (void *)&ui32, > &ui32_size)) { > + return -EINVAL; > + } > + break; > default: > DRM_DEBUG_KMS("Invalid request %d\n", > info->sensor_info.type); > diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h > index 8b832f7458f2..484c72e17c72 100644 > --- a/include/uapi/drm/amdgpu_drm.h > +++ b/include/uapi/drm/amdgpu_drm.h > @@ -780,6 +780,18 @@ struct drm_amdgpu_cs_chunk_data { > #define AMDGPU_INFO_SENSOR_STABLE_PSTATE_GFX_SCLK 0x8 > /* Subquery id: Query GPU stable pstate memory clock */ > #define AMDGPU_INFO_SENSOR_STABLE_PSTATE_GFX_MCLK 0x9 > + /* Subquery id: Query GPU mm vclk */ > + #define AMDGPU_INFO_SENSOR_UVD_VCLK 0xa > + /* Subquery id: Query GPU mm dclk */ > + #define AMDGPU_INFO_SENSOR_UVD_DCLK 0xb > + /* Subquery id: Query junction temperature */ > + #define AMDGPU_INFO_SENSOR_HOTSPOT_TEMP 0xc > + /* Subquery id: Query edge temperature */ > + #define AMDGPU_INFO_SENSOR_EDGE_TEMP0xd > + /* Subquery id: Query memory temperature */ > + #define AMDGPU_INFO_SENSOR_MEM_TEMP 0xe > + /* Subquery id: Query Memory load */ > + #define AMDGPU_INFO_SENSOR_MEM_LOAD 0xf + Tom Please provide a link to patches for some userspace tool which uses these new queries. Something like umr would be a logical choice. Once you have that, the patch itself loo
[PATCH] drm/amdgpu: remove set but not used variables
The value of ret set but will rewriten, so just delete. Signed-off-by: Tian Tao --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 1fb2a91..f303ad6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -1815,10 +1815,8 @@ int amdgpu_ras_recovery_init(struct amdgpu_device *adev) return 0; *data = kmalloc(sizeof(**data), GFP_KERNEL | __GFP_ZERO); - if (!*data) { - ret = -ENOMEM; - goto out; - } + if (!*data) + return -ENOMEM; mutex_init(&con->recovery_lock); INIT_WORK(&con->recovery_work, amdgpu_ras_do_recovery); @@ -1848,7 +1846,7 @@ int amdgpu_ras_recovery_init(struct amdgpu_device *adev) kfree((*data)->bps); kfree(*data); con->eh_data = NULL; -out: + dev_warn(adev->dev, "Failed to initialize ras recovery!\n"); /* -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH v2 1/5] drm/connector: Create a helper to attach the hdr_output_metadata property
All the drivers that implement HDR output call pretty much the same function to initialise the hdr_output_metadata property, and while the creation of that property is in a helper, every driver uses the same code to attach it. Provide a helper for it as well Reviewed-by: Harry Wentland Reviewed-by: Jernej Skrabec Signed-off-by: Maxime Ripard --- Changes from v1: - Rebased on latest drm-misc-next tag - Added the tags --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 4 +--- drivers/gpu/drm/bridge/synopsys/dw-hdmi.c | 3 +-- drivers/gpu/drm/drm_connector.c | 21 +++ drivers/gpu/drm/i915/display/intel_hdmi.c | 3 +-- include/drm/drm_connector.h | 1 + 5 files changed, 25 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 55e39b462a5e..1e22ce718010 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -7078,9 +7078,7 @@ void amdgpu_dm_connector_init_helper(struct amdgpu_display_manager *dm, if (connector_type == DRM_MODE_CONNECTOR_HDMIA || connector_type == DRM_MODE_CONNECTOR_DisplayPort || connector_type == DRM_MODE_CONNECTOR_eDP) { - drm_object_attach_property( - &aconnector->base.base, - dm->ddev->mode_config.hdr_output_metadata_property, 0); + drm_connector_attach_hdr_output_metadata_property(&aconnector->base); if (!aconnector->mst_port) drm_connector_attach_vrr_capable_property(&aconnector->base); diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c index dda4fa9a1a08..f24bbb840dbf 100644 --- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c +++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c @@ -2492,8 +2492,7 @@ static int dw_hdmi_connector_create(struct dw_hdmi *hdmi) drm_connector_attach_max_bpc_property(connector, 8, 16); if (hdmi->version >= 0x200a && hdmi->plat_data->use_drm_infoframe) - drm_object_attach_property(&connector->base, - connector->dev->mode_config.hdr_output_metadata_property, 0); + drm_connector_attach_hdr_output_metadata_property(connector); drm_connector_attach_encoder(connector, hdmi->bridge.encoder); diff --git a/drivers/gpu/drm/drm_connector.c b/drivers/gpu/drm/drm_connector.c index 7631f76e7f34..a4aa2d87af35 100644 --- a/drivers/gpu/drm/drm_connector.c +++ b/drivers/gpu/drm/drm_connector.c @@ -2150,6 +2150,27 @@ int drm_connector_attach_max_bpc_property(struct drm_connector *connector, } EXPORT_SYMBOL(drm_connector_attach_max_bpc_property); +/** + * drm_connector_attach_hdr_output_metadata_property - attach "HDR_OUTPUT_METADA" property + * @connector: connector to attach the property on. + * + * This is used to allow the userspace to send HDR Metadata to the + * driver. + * + * Returns: + * Zero on success, negative errno on failure. + */ +int drm_connector_attach_hdr_output_metadata_property(struct drm_connector *connector) +{ + struct drm_device *dev = connector->dev; + struct drm_property *prop = dev->mode_config.hdr_output_metadata_property; + + drm_object_attach_property(&connector->base, prop, 0); + + return 0; +} +EXPORT_SYMBOL(drm_connector_attach_hdr_output_metadata_property); + /** * drm_connector_set_vrr_capable_property - sets the variable refresh rate * capable property for a connector diff --git a/drivers/gpu/drm/i915/display/intel_hdmi.c b/drivers/gpu/drm/i915/display/intel_hdmi.c index 95919d325b0b..f2f1b025e6ba 100644 --- a/drivers/gpu/drm/i915/display/intel_hdmi.c +++ b/drivers/gpu/drm/i915/display/intel_hdmi.c @@ -2965,8 +2965,7 @@ intel_hdmi_add_properties(struct intel_hdmi *intel_hdmi, struct drm_connector *c drm_connector_attach_content_type_property(connector); if (INTEL_GEN(dev_priv) >= 10 || IS_GEMINILAKE(dev_priv)) - drm_object_attach_property(&connector->base, - connector->dev->mode_config.hdr_output_metadata_property, 0); + drm_connector_attach_hdr_output_metadata_property(connector); if (!HAS_GMCH(dev_priv)) drm_connector_attach_max_bpc_property(connector, 8, 12); diff --git a/include/drm/drm_connector.h b/include/drm/drm_connector.h index 1922b278ffad..32172dab8427 100644 --- a/include/drm/drm_connector.h +++ b/include/drm/drm_connector.h @@ -1671,6 +1671,7 @@ int drm_connector_attach_scaling_mode_property(struct drm_connector *connector, u32 scaling_mode_mask); int drm_connector_attach_vrr_capable_property( struct drm_connector *connector); +int drm_connector_attach_hdr_output_metadata_property(struct drm_connector *connector); int drm_mode_create_
[PATCH v2 5/5] drm/vc4: hdmi: Signal the proper colorimetry info in the infoframe
Our driver while supporting HDR didn't send the proper colorimetry info in the AVI infoframe. Let's add the property needed so that the userspace can let us know what the colorspace is supposed to be. Signed-off-by: Maxime Ripard --- Changes from v1: - New patch --- drivers/gpu/drm/vc4/vc4_hdmi.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c index a33fa1662588..a22e17788074 100644 --- a/drivers/gpu/drm/vc4/vc4_hdmi.c +++ b/drivers/gpu/drm/vc4/vc4_hdmi.c @@ -226,7 +226,8 @@ static int vc4_hdmi_connector_atomic_check(struct drm_connector *connector, if (!crtc) return 0; - if (!drm_connector_atomic_hdr_metadata_equal(old_state, new_state)) { + if (old_state->colorspace != new_state->colorspace || + !drm_connector_atomic_hdr_metadata_equal(old_state, new_state)) { struct drm_crtc_state *crtc_state; crtc_state = drm_atomic_get_crtc_state(state, crtc); @@ -316,6 +317,11 @@ static int vc4_hdmi_connector_init(struct drm_device *dev, if (ret) return ret; + ret = drm_mode_create_hdmi_colorspace_property(connector); + if (ret) + return ret; + + drm_connector_attach_colorspace_property(connector); drm_connector_attach_tv_margin_properties(connector); drm_connector_attach_max_bpc_property(connector, 8, 12); @@ -424,7 +430,7 @@ static void vc4_hdmi_set_avi_infoframe(struct drm_encoder *encoder) vc4_encoder->limited_rgb_range ? HDMI_QUANTIZATION_RANGE_LIMITED : HDMI_QUANTIZATION_RANGE_FULL); - + drm_hdmi_avi_infoframe_colorspace(&frame.avi, cstate); drm_hdmi_avi_infoframe_bars(&frame.avi, cstate); vc4_hdmi_write_infoframe(encoder, &frame); -- 2.30.2 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH v2 4/5] drm/connector: Add a helper to attach the colorspace property
The intel driver uses the same logic to attach the Colorspace property in multiple places and we'll need it in vc4 too. Let's move that common code in a helper. Signed-off-by: Maxime Ripard --- Changes from v1: - New patch --- drivers/gpu/drm/drm_connector.c | 20 +++ .../gpu/drm/i915/display/intel_connector.c| 6 ++ include/drm/drm_connector.h | 1 + 3 files changed, 23 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/drm_connector.c b/drivers/gpu/drm/drm_connector.c index b13021b1b128..6a20b249e533 100644 --- a/drivers/gpu/drm/drm_connector.c +++ b/drivers/gpu/drm/drm_connector.c @@ -2171,6 +2171,26 @@ int drm_connector_attach_hdr_output_metadata_property(struct drm_connector *conn } EXPORT_SYMBOL(drm_connector_attach_hdr_output_metadata_property); +/** + * drm_connector_attach_colorspace_property - attach "Colorspace" property + * @connector: connector to attach the property on. + * + * This is used to allow the userspace to signal the output colorspace + * to the driver. + * + * Returns: + * Zero on success, negative errno on failure. + */ +int drm_connector_attach_colorspace_property(struct drm_connector *connector) +{ + struct drm_property *prop = connector->colorspace_property; + + drm_object_attach_property(&connector->base, prop, DRM_MODE_COLORIMETRY_DEFAULT); + + return 0; +} +EXPORT_SYMBOL(drm_connector_attach_colorspace_property); + /** * drm_connector_atomic_hdr_metadata_equal - checks if the hdr metadata changed * @old_state: old connector state to compare diff --git a/drivers/gpu/drm/i915/display/intel_connector.c b/drivers/gpu/drm/i915/display/intel_connector.c index d5ceb7bdc14b..9bed1ccecea0 100644 --- a/drivers/gpu/drm/i915/display/intel_connector.c +++ b/drivers/gpu/drm/i915/display/intel_connector.c @@ -282,14 +282,12 @@ void intel_attach_hdmi_colorspace_property(struct drm_connector *connector) { if (!drm_mode_create_hdmi_colorspace_property(connector)) - drm_object_attach_property(&connector->base, - connector->colorspace_property, 0); + drm_connector_attach_colorspace_property(connector); } void intel_attach_dp_colorspace_property(struct drm_connector *connector) { if (!drm_mode_create_dp_colorspace_property(connector)) - drm_object_attach_property(&connector->base, - connector->colorspace_property, 0); + drm_connector_attach_colorspace_property(connector); } diff --git a/include/drm/drm_connector.h b/include/drm/drm_connector.h index 1f51d73ca715..714d1a01c065 100644 --- a/include/drm/drm_connector.h +++ b/include/drm/drm_connector.h @@ -1671,6 +1671,7 @@ int drm_connector_attach_scaling_mode_property(struct drm_connector *connector, u32 scaling_mode_mask); int drm_connector_attach_vrr_capable_property( struct drm_connector *connector); +int drm_connector_attach_colorspace_property(struct drm_connector *connector); int drm_connector_attach_hdr_output_metadata_property(struct drm_connector *connector); bool drm_connector_atomic_hdr_metadata_equal(struct drm_connector_state *old_state, struct drm_connector_state *new_state); -- 2.30.2 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH v2 3/5] drm/vc4: Add HDR metadata property to the VC5 HDMI connectors
From: Dave Stevenson Now that we can export deeper colour depths, add in the signalling for HDR metadata. Signed-off-by: Dave Stevenson Signed-off-by: Maxime Ripard --- Changes from v1: - Rebased on latest drm-misc-next tag --- drivers/gpu/drm/vc4/vc4_hdmi.c | 53 ++ drivers/gpu/drm/vc4/vc4_hdmi.h | 3 ++ 2 files changed, 56 insertions(+) diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c index 1fda574579af..a33fa1662588 100644 --- a/drivers/gpu/drm/vc4/vc4_hdmi.c +++ b/drivers/gpu/drm/vc4/vc4_hdmi.c @@ -214,6 +214,31 @@ static int vc4_hdmi_connector_get_modes(struct drm_connector *connector) return ret; } +static int vc4_hdmi_connector_atomic_check(struct drm_connector *connector, + struct drm_atomic_state *state) +{ + struct drm_connector_state *old_state = + drm_atomic_get_old_connector_state(state, connector); + struct drm_connector_state *new_state = + drm_atomic_get_new_connector_state(state, connector); + struct drm_crtc *crtc = new_state->crtc; + + if (!crtc) + return 0; + + if (!drm_connector_atomic_hdr_metadata_equal(old_state, new_state)) { + struct drm_crtc_state *crtc_state; + + crtc_state = drm_atomic_get_crtc_state(state, crtc); + if (IS_ERR(crtc_state)) + return PTR_ERR(crtc_state); + + crtc_state->mode_changed = true; + } + + return 0; +} + static void vc4_hdmi_connector_reset(struct drm_connector *connector) { struct vc4_hdmi_connector_state *old_state = @@ -263,6 +288,7 @@ static const struct drm_connector_funcs vc4_hdmi_connector_funcs = { static const struct drm_connector_helper_funcs vc4_hdmi_connector_helper_funcs = { .get_modes = vc4_hdmi_connector_get_modes, + .atomic_check = vc4_hdmi_connector_atomic_check, }; static int vc4_hdmi_connector_init(struct drm_device *dev, @@ -299,6 +325,9 @@ static int vc4_hdmi_connector_init(struct drm_device *dev, connector->interlace_allowed = 1; connector->doublescan_allowed = 0; + if (vc4_hdmi->variant->supports_hdr) + drm_connector_attach_hdr_output_metadata_property(connector); + drm_connector_attach_encoder(connector, encoder); return 0; @@ -432,6 +461,25 @@ static void vc4_hdmi_set_audio_infoframe(struct drm_encoder *encoder) vc4_hdmi_write_infoframe(encoder, &frame); } +static void vc4_hdmi_set_hdr_infoframe(struct drm_encoder *encoder) +{ + struct vc4_hdmi *vc4_hdmi = encoder_to_vc4_hdmi(encoder); + struct drm_connector *connector = &vc4_hdmi->connector; + struct drm_connector_state *conn_state = connector->state; + union hdmi_infoframe frame; + + if (!vc4_hdmi->variant->supports_hdr) + return; + + if (!conn_state->hdr_output_metadata) + return; + + if (drm_hdmi_infoframe_set_hdr_metadata(&frame.drm, conn_state)) + return; + + vc4_hdmi_write_infoframe(encoder, &frame); +} + static void vc4_hdmi_set_infoframes(struct drm_encoder *encoder) { struct vc4_hdmi *vc4_hdmi = encoder_to_vc4_hdmi(encoder); @@ -444,6 +492,8 @@ static void vc4_hdmi_set_infoframes(struct drm_encoder *encoder) */ if (vc4_hdmi->audio.streaming) vc4_hdmi_set_audio_infoframe(encoder); + + vc4_hdmi_set_hdr_infoframe(encoder); } static void vc4_hdmi_encoder_post_crtc_disable(struct drm_encoder *encoder, @@ -2102,6 +2152,7 @@ static const struct vc4_hdmi_variant bcm2835_variant = { .phy_rng_enable = vc4_hdmi_phy_rng_enable, .phy_rng_disable= vc4_hdmi_phy_rng_disable, .channel_map= vc4_hdmi_channel_map, + .supports_hdr = false, }; static const struct vc4_hdmi_variant bcm2711_hdmi0_variant = { @@ -2129,6 +2180,7 @@ static const struct vc4_hdmi_variant bcm2711_hdmi0_variant = { .phy_rng_enable = vc5_hdmi_phy_rng_enable, .phy_rng_disable= vc5_hdmi_phy_rng_disable, .channel_map= vc5_hdmi_channel_map, + .supports_hdr = true, }; static const struct vc4_hdmi_variant bcm2711_hdmi1_variant = { @@ -2156,6 +2208,7 @@ static const struct vc4_hdmi_variant bcm2711_hdmi1_variant = { .phy_rng_enable = vc5_hdmi_phy_rng_enable, .phy_rng_disable= vc5_hdmi_phy_rng_disable, .channel_map= vc5_hdmi_channel_map, + .supports_hdr = true, }; static const struct of_device_id vc4_hdmi_dt_match[] = { diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.h b/drivers/gpu/drm/vc4/vc4_hdmi.h index 3cebd1fd00fc..060bcaefbeb5 100644 --- a/drivers/gpu/drm/vc4/vc4_hdmi.h +++ b/drivers/gpu/drm/vc4/vc4_hdmi.h @@ -99,6 +99,9 @@ struct vc4_hdmi_variant { /* Callback to get channel map */
[PATCH v2 2/5] drm/connector: Add helper to compare HDR metadata
All the drivers that support the HDR metadata property have a similar function to compare the metadata from one connector state to the next, and force a mode change if they differ. All these functions run pretty much the same code, so let's turn it into an helper that can be shared across those drivers. Reviewed-by: Harry Wentland Reviewed-by: Jernej Skrabec Signed-off-by: Maxime Ripard --- Changes from v1: - Rebased on latest drm-misc-next tag - Added the tags - Fix build breakage on amdgpu --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 23 ++- drivers/gpu/drm/bridge/synopsys/dw-hdmi.c | 17 +-- drivers/gpu/drm/drm_connector.c | 28 +++ drivers/gpu/drm/i915/display/intel_atomic.c | 13 + include/drm/drm_connector.h | 2 ++ 5 files changed, 34 insertions(+), 49 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 1e22ce718010..ed1972fb531d 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -5967,25 +5967,6 @@ static int fill_hdr_info_packet(const struct drm_connector_state *state, return 0; } -static bool -is_hdr_metadata_different(const struct drm_connector_state *old_state, - const struct drm_connector_state *new_state) -{ - struct drm_property_blob *old_blob = old_state->hdr_output_metadata; - struct drm_property_blob *new_blob = new_state->hdr_output_metadata; - - if (old_blob != new_blob) { - if (old_blob && new_blob && - old_blob->length == new_blob->length) - return memcmp(old_blob->data, new_blob->data, - old_blob->length); - - return true; - } - - return false; -} - static int amdgpu_dm_connector_atomic_check(struct drm_connector *conn, struct drm_atomic_state *state) @@ -6003,7 +5984,7 @@ amdgpu_dm_connector_atomic_check(struct drm_connector *conn, if (!crtc) return 0; - if (is_hdr_metadata_different(old_con_state, new_con_state)) { + if (!drm_connector_atomic_hdr_metadata_equal(old_con_state, new_con_state)) { struct dc_info_packet hdr_infopacket; ret = fill_hdr_info_packet(new_con_state, &hdr_infopacket); @@ -8357,7 +8338,7 @@ static void amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state) dm_old_crtc_state->abm_level; hdr_changed = - is_hdr_metadata_different(old_con_state, new_con_state); + !drm_connector_atomic_hdr_metadata_equal(old_con_state, new_con_state); if (!scaling_changed && !abm_changed && !hdr_changed) continue; diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c index f24bbb840dbf..f871e33c2fc9 100644 --- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c +++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c @@ -2395,21 +2395,6 @@ static int dw_hdmi_connector_get_modes(struct drm_connector *connector) return ret; } -static bool hdr_metadata_equal(const struct drm_connector_state *old_state, - const struct drm_connector_state *new_state) -{ - struct drm_property_blob *old_blob = old_state->hdr_output_metadata; - struct drm_property_blob *new_blob = new_state->hdr_output_metadata; - - if (!old_blob || !new_blob) - return old_blob == new_blob; - - if (old_blob->length != new_blob->length) - return false; - - return !memcmp(old_blob->data, new_blob->data, old_blob->length); -} - static int dw_hdmi_connector_atomic_check(struct drm_connector *connector, struct drm_atomic_state *state) { @@ -2423,7 +2408,7 @@ static int dw_hdmi_connector_atomic_check(struct drm_connector *connector, if (!crtc) return 0; - if (!hdr_metadata_equal(old_state, new_state)) { + if (!drm_connector_atomic_hdr_metadata_equal(old_state, new_state)) { crtc_state = drm_atomic_get_crtc_state(state, crtc); if (IS_ERR(crtc_state)) return PTR_ERR(crtc_state); diff --git a/drivers/gpu/drm/drm_connector.c b/drivers/gpu/drm/drm_connector.c index a4aa2d87af35..b13021b1b128 100644 --- a/drivers/gpu/drm/drm_connector.c +++ b/drivers/gpu/drm/drm_connector.c @@ -2171,6 +2171,34 @@ int drm_connector_attach_hdr_output_metadata_property(struct drm_connector *conn } EXPORT_SYMBOL(drm_connector_attach_hdr_output_metadata_property); +/** + * drm_connector_atomic_hdr_metadata_equal - checks if the hdr metadata changed + * @old_state: old connector state to compare + * @new_state: new connector
Re: [PATCH 4/4] drm/amdgpu: Fix kernel-doc for the RAS sysfs interface
[AMD Public Use] Series is: Reviewed-by: Alex Deucher From: Zhang, Hawking Sent: Tuesday, April 13, 2021 9:04 AM To: Tuikov, Luben ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: RE: [PATCH 4/4] drm/amdgpu: Fix kernel-doc for the RAS sysfs interface [AMD Public Use] Series is Reviewed-by: Hawking Zhang Regards, Hawking -Original Message- From: Tuikov, Luben Sent: Tuesday, April 13, 2021 20:56 To: amd-gfx@lists.freedesktop.org Cc: Tuikov, Luben ; Deucher, Alexander ; Zhang, Hawking Subject: [PATCH 4/4] drm/amdgpu: Fix kernel-doc for the RAS sysfs interface Imporve the kernel-doc for the RAS sysfs interface. Fix the grammar, fix the context. Cc: Alexander Deucher Cc: Hawking Zhang Signed-off-by: Luben Tuikov --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 47 + 1 file changed, 24 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 30cda4b8a461..44dfb3613e37 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -274,7 +274,7 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file *f, /** * DOC: AMDGPU RAS debugfs control interface * - * It accepts struct ras_debug_if who has two members. + * The control interface accepts struct ras_debug_if which has two members. * * First member: ras_debug_if::head or ras_debug_if::inject. * @@ -299,32 +299,33 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file *f, * * How to use the interface? * - * Programs + * In a program * - * Copy the struct ras_debug_if in your codes and initialize it. - * Write the struct to the control node. + * Copy the struct ras_debug_if in your code and initialize it. + * Write the struct to the control interface. * - * Shells + * From shell * * .. code-block:: bash * - * echo op block [error [sub_block address value]] > .../ras/ras_ctrl + * echo "disable " > /sys/kernel/debug/dri//ras/ras_ctrl + * echo "enable " > /sys/kernel/debug/dri//ras/ras_ctrl + * echo "inject > /sys/kernel/debug/dri//ras/ras_ctrl * - * Parameters: + * Where N, is the card which you want to affect. * - * op: disable, enable, inject - * disable: only block is needed - * enable: block and error are needed - * inject: error, address, value are needed - * block: umc, sdma, gfx, . + * "disable" requires only the block. + * "enable" requires the block and error type. + * "inject" requires the block, error type, address, and value. + * The block is one of: umc, sdma, gfx, etc. * see ras_block_string[] for details - * error: ue, ce - * ue: multi_uncorrectable - * ce: single_correctable - * sub_block: - * sub block index, pass 0 if there is no sub block + * The error type is one of: ue, ce, where, + * ue is multi-uncorrectable + * ce is single-correctable + * The sub-block is a the sub-block index, pass 0 if there is no sub-block. + * The address and value are hexadecimal numbers, leading 0x is optional. * - * here are some examples for bash commands: + * For instance, * * .. code-block:: bash * @@ -332,17 +333,17 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file *f, * echo inject umc ce 0 0 0 > /sys/kernel/debug/dri/0/ras/ras_ctrl * echo disable umc > /sys/kernel/debug/dri/0/ras/ras_ctrl * - * How to check the result? + * How to check the result of the operation? * - * For disable/enable, please check ras features at + * To check disable/enable, see "ras" features at, * /sys/class/drm/card[0/1/2...]/device/ras/features * - * For inject, please check corresponding err count at - * /sys/class/drm/card[0/1/2...]/device/ras/[gfx/sdma/...]_err_count + * To check inject, see the corresponding error count at, + * + /sys/class/drm/card[0/1/2...]/device/ras/[gfx|sdma|umc|...]_err_count * * .. note:: * Operations are only allowed on blocks which are supported. - * Please check ras mask at /sys/module/amdgpu/parameters/ras_mask + * Check the "ras" mask at /sys/module/amdgpu/parameters/ras_mask * to see which blocks support RAS on a particular asic. * */ -- 2.31.0.97.g1424303384 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: correction of ucode fw_size calculation errors
[AMD Public Use] Reviewed-by: Alex Deucher From: amd-gfx on behalf of Kevin Wang Sent: Tuesday, April 13, 2021 7:51 AM To: amd-gfx@lists.freedesktop.org Cc: Wang, Kevin(Yang) Subject: [PATCH] drm/amdgpu: correction of ucode fw_size calculation errors correct big and little endian problems on different platforms. Signed-off-by: Kevin Wang --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 8 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 +- drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 2 +- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 2 +- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index 85a6a10e048f..0a00a7cc8eaa 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -4354,7 +4354,7 @@ static int gfx_v10_0_mec_init(struct amdgpu_device *adev) le32_to_cpu(mec_hdr->header.ucode_array_offset_bytes)); fw_size = le32_to_cpu(mec_hdr->header.ucode_size_bytes); - r = amdgpu_bo_create_reserved(adev, mec_hdr->header.ucode_size_bytes, + r = amdgpu_bo_create_reserved(adev, fw_size, PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT, &adev->gfx.mec.mec_fw_obj, &adev->gfx.mec.mec_fw_gpu_addr, @@ -5769,7 +5769,7 @@ static int gfx_v10_0_cp_gfx_load_pfp_microcode(struct amdgpu_device *adev) le32_to_cpu(pfp_hdr->header.ucode_array_offset_bytes)); fw_size = le32_to_cpu(pfp_hdr->header.ucode_size_bytes); - r = amdgpu_bo_create_reserved(adev, pfp_hdr->header.ucode_size_bytes, + r = amdgpu_bo_create_reserved(adev, fw_size, PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT, &adev->gfx.pfp.pfp_fw_obj, &adev->gfx.pfp.pfp_fw_gpu_addr, @@ -5847,7 +5847,7 @@ static int gfx_v10_0_cp_gfx_load_ce_microcode(struct amdgpu_device *adev) le32_to_cpu(ce_hdr->header.ucode_array_offset_bytes)); fw_size = le32_to_cpu(ce_hdr->header.ucode_size_bytes); - r = amdgpu_bo_create_reserved(adev, ce_hdr->header.ucode_size_bytes, + r = amdgpu_bo_create_reserved(adev, fw_size, PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT, &adev->gfx.ce.ce_fw_obj, &adev->gfx.ce.ce_fw_gpu_addr, @@ -5924,7 +5924,7 @@ static int gfx_v10_0_cp_gfx_load_me_microcode(struct amdgpu_device *adev) le32_to_cpu(me_hdr->header.ucode_array_offset_bytes)); fw_size = le32_to_cpu(me_hdr->header.ucode_size_bytes); - r = amdgpu_bo_create_reserved(adev, me_hdr->header.ucode_size_bytes, + r = amdgpu_bo_create_reserved(adev, fw_size, PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT, &adev->gfx.me.me_fw_obj, &adev->gfx.me.me_fw_gpu_addr, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index 06811a1f4625..831398e15b84 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c @@ -2013,7 +2013,7 @@ static int gfx_v9_0_mec_init(struct amdgpu_device *adev) le32_to_cpu(mec_hdr->header.ucode_array_offset_bytes)); fw_size = le32_to_cpu(mec_hdr->header.ucode_size_bytes); - r = amdgpu_bo_create_reserved(adev, mec_hdr->header.ucode_size_bytes, + r = amdgpu_bo_create_reserved(adev, fw_size, PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT, &adev->gfx.mec.mec_fw_obj, &adev->gfx.mec.mec_fw_gpu_addr, diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c index 6274cae4a065..00610def69ea 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c @@ -174,7 +174,7 @@ int smu_v11_0_load_microcode(struct smu_context *smu) hdr = (const struct smc_firmware_header_v1_0 *) adev->pm.fw->data; src = (const uint32_t *)(adev->pm.fw->data + le32_to_cpu(hdr->header.ucode_array_offset_bytes)); - smc_fw_size = hdr->header.ucode_size_bytes; + smc_fw_size = le32_to_cpu(hdr->header.ucode_size_bytes;) for (i = 1; i < smc_fw_size/4 - 1; i++) { WREG32_PCIE(addr_start, src[i]); diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c index 0864083e7435..d7ea8215d7dc 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu
Re: [PATCH] drm/amdgpu: Add graphics cache rinse packet for sdma
[AMD Official Use Only - Internal Distribution Only] Shouldn't we so something similar for sdma 5.0 as well? Alex From: Su, Jinzhou (Joe) Sent: Tuesday, April 13, 2021 2:23 AM To: amd-gfx@lists.freedesktop.org Cc: Huang, Ray ; Deucher, Alexander ; Koenig, Christian ; Su, Jinzhou (Joe) Subject: [PATCH] drm/amdgpu: Add graphics cache rinse packet for sdma Add emit mem sync callback for sdma_v5_2 Signed-off-by: Jinzhou Su --- drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 28 ++ 1 file changed, 28 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c index 93f826a7d3f0..b1ad9e52b234 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c @@ -369,6 +369,33 @@ static void sdma_v5_2_ring_emit_ib(struct amdgpu_ring *ring, amdgpu_ring_write(ring, upper_32_bits(csa_mc_addr)); } +/** + * sdma_v5_2_ring_emit_mem_sync - flush the IB by graphics cache rinse + * + * @ring: amdgpu ring pointer + * @job: job to retrieve vmid from + * @ib: IB object to schedule + * + * flush the IB by graphics cache rinse. + */ +static void sdma_v5_2_ring_emit_mem_sync(struct amdgpu_ring *ring) +{ +uint32_t gcr_cntl = + SDMA_GCR_GL2_INV | SDMA_GCR_GL2_WB | SDMA_GCR_GLM_INV | + SDMA_GCR_GL1_INV | SDMA_GCR_GLV_INV | SDMA_GCR_GLK_INV | + SDMA_GCR_GLI_INV(1); + + /* flush entire cache L0/L1/L2, this can be optimized by performance requirement */ + amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_GCR_REQ)); + amdgpu_ring_write(ring, SDMA_PKT_GCR_REQ_PAYLOAD1_BASE_VA_31_7(0)); + amdgpu_ring_write(ring, SDMA_PKT_GCR_REQ_PAYLOAD2_GCR_CONTROL_15_0(gcr_cntl) | + SDMA_PKT_GCR_REQ_PAYLOAD2_BASE_VA_47_32(0)); + amdgpu_ring_write(ring, SDMA_PKT_GCR_REQ_PAYLOAD3_LIMIT_VA_31_7(0) | + SDMA_PKT_GCR_REQ_PAYLOAD3_GCR_CONTROL_18_16(gcr_cntl >> 16)); + amdgpu_ring_write(ring, SDMA_PKT_GCR_REQ_PAYLOAD4_LIMIT_VA_47_32(0) | + SDMA_PKT_GCR_REQ_PAYLOAD4_VMID(0)); +} + /** * sdma_v5_2_ring_emit_hdp_flush - emit an hdp flush on the DMA ring * @@ -1663,6 +1690,7 @@ static const struct amdgpu_ring_funcs sdma_v5_2_ring_funcs = { 10 + 10 + 10, /* sdma_v5_2_ring_emit_fence x3 for user fence, vm fence */ .emit_ib_size = 7 + 6, /* sdma_v5_2_ring_emit_ib */ .emit_ib = sdma_v5_2_ring_emit_ib, + .emit_mem_sync = sdma_v5_2_ring_emit_mem_sync, .emit_fence = sdma_v5_2_ring_emit_fence, .emit_pipeline_sync = sdma_v5_2_ring_emit_pipeline_sync, .emit_vm_flush = sdma_v5_2_ring_emit_vm_flush, -- 2.27.0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: Use iterator methods exposed by amdgpu_res_cursor.h in building SG_TABLE's for a VRAM BO
On Mon, Apr 12, 2021 at 7:28 PM Ramesh Errabolu wrote: > > Extend current implementation of SG_TABLE construction method to > allow exportation of sub-buffers of a VRAM BO. This capability will > enable logical partitioning of a VRAM BO into multiple non-overlapping > sub-buffers. One example of this use case is to partition a VRAM BO > into two sub-buffers, one for SRC and another for DST. > > Signed-off-by: Ramesh Errabolu > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 6 ++-- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 1 + > drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 32 ++-- > 3 files changed, 27 insertions(+), 12 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c > index e0c4f7c7f1b9..57534b93bd91 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c > @@ -255,6 +255,7 @@ static struct sg_table *amdgpu_dma_buf_map(struct > dma_buf_attachment *attach, > struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj); > struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); > struct sg_table *sgt; > + u64 num_bytes; > long r; > > if (!bo->tbo.pin_count) { > @@ -291,8 +292,9 @@ static struct sg_table *amdgpu_dma_buf_map(struct > dma_buf_attachment *attach, > break; > > case TTM_PL_VRAM: > - r = amdgpu_vram_mgr_alloc_sgt(adev, &bo->tbo.mem, attach->dev, > - dir, &sgt); > + num_bytes = bo->tbo.mem.num_pages << PAGE_SHIFT; > + r = amdgpu_vram_mgr_alloc_sgt(adev, &bo->tbo.mem, 0, > num_bytes, > + attach->dev, dir, &sgt); > if (r) > return ERR_PTR(r); > break; > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h > index dec0db8b0b13..9e38475e0f8d 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h > @@ -112,6 +112,7 @@ int amdgpu_gtt_mgr_recover(struct ttm_resource_manager > *man); > u64 amdgpu_vram_mgr_bo_visible_size(struct amdgpu_bo *bo); > int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, > struct ttm_resource *mem, > + u64 offset, u64 size, > struct device *dev, > enum dma_data_direction dir, > struct sg_table **sgt); > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c > index 592a2dd16493..fcdee0deba16 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c > @@ -25,6 +25,7 @@ > #include > #include "amdgpu.h" > #include "amdgpu_vm.h" > +#include "amdgpu_res_cursor.h" > #include "amdgpu_atomfirmware.h" > #include "atom.h" > > @@ -565,6 +566,8 @@ static void amdgpu_vram_mgr_del(struct > ttm_resource_manager *man, > * > * @adev: amdgpu device pointer > * @mem: TTM memory object > + * @offset: byte offset from the base of VRAM BO > + * @length: number of bytes to export in sg_table > * @dev: the other device > * @dir: dma direction > * @sgt: resulting sg table > @@ -573,39 +576,45 @@ static void amdgpu_vram_mgr_del(struct > ttm_resource_manager *man, > */ > int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, > struct ttm_resource *mem, > + u64 offset, u64 length, > struct device *dev, > enum dma_data_direction dir, > struct sg_table **sgt) > { > - struct drm_mm_node *node; > + struct amdgpu_res_cursor cursor; > struct scatterlist *sg; > int num_entries = 0; > - unsigned int pages; > int i, r; > > *sgt = kmalloc(sizeof(**sgt), GFP_KERNEL); > if (!*sgt) > return -ENOMEM; > > - for (pages = mem->num_pages, node = mem->mm_node; > -pages; pages -= node->size, ++node) > - ++num_entries; > + // Determine the number of DRM_MM nodes to export > + amdgpu_res_first(mem, offset, length, &cursor); > + while (cursor.remaining) { > + num_entries++; > + amdgpu_res_next(&cursor, cursor.size); > + } > > r = sg_alloc_table(*sgt, num_entries, GFP_KERNEL); > if (r) > goto error_free; > > + // Initialize scatterlist nodes of sg_table > for_each_sgtable_sg((*sgt), sg, i) > sg->length = 0; > > - node = mem->mm_node; > + // Walk down DRM_MM nodes to populate scatterlist nodes > + // @note: Use iterator api to get first the DRM_MM node > + // and th
Re: [PATCH] drm/amd/amdgpu: Expose some power info through AMDGPU_INFO
Am 13.04.21 um 15:09 schrieb Sun, Roy: [AMD Official Use Only - Internal Distribution Only] ping -Original Message- From: Roy Sun Sent: Monday, April 12, 2021 8:15 PM To: amd-gfx@lists.freedesktop.org Cc: Sun, Roy Subject: [PATCH] drm/amd/amdgpu: Expose some power info through AMDGPU_INFO Add interface to get the mm clock, temperature and memory load Signed-off-by: Roy Sun --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 50 + include/uapi/drm/amdgpu_drm.h | 12 ++ 2 files changed, 62 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index b2e774aeab45..e5b16e0819ce 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -971,6 +971,56 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) } ui32 /= 100; break; + case AMDGPU_INFO_SENSOR_UVD_VCLK: + /* get mm vclk in Mhz */ Maybe drop the comment, it looks redundant. + if (amdgpu_dpm_read_sensor(adev, + AMDGPU_PP_SENSOR_UVD_VCLK, In general I would not repeat that code over and over again, but rather translate the parameter and then call amdgpu_dpm_read_sensor() once. But this is just my top level view on code I don't know well. Regards, Christian. + (void *)&ui32, &ui32_size)) { + return -EINVAL; + } + ui32 /= 100; + break; + case AMDGPU_INFO_SENSOR_UVD_DCLK: + /* get mm dclk in Mhz */ + if (amdgpu_dpm_read_sensor(adev, + AMDGPU_PP_SENSOR_UVD_DCLK, + (void *)&ui32, &ui32_size)) { + return -EINVAL; + } + ui32 /= 100; + break; + case AMDGPU_INFO_SENSOR_HOTSPOT_TEMP: + /* get junction temperature */ + if (amdgpu_dpm_read_sensor(adev, + AMDGPU_PP_SENSOR_HOTSPOT_TEMP, + (void *)&ui32, &ui32_size)) { + return -EINVAL; + } + break; + case AMDGPU_INFO_SENSOR_EDGE_TEMP: + /* get current edge temperature */ + if (amdgpu_dpm_read_sensor(adev, + AMDGPU_PP_SENSOR_EDGE_TEMP, + (void *)&ui32, &ui32_size)) { + return -EINVAL; + } + break; + case AMDGPU_INFO_SENSOR_MEM_TEMP: + /* get current memory temperature */ + if (amdgpu_dpm_read_sensor(adev, + AMDGPU_PP_SENSOR_MEM_TEMP, + (void *)&ui32, &ui32_size)) { + return -EINVAL; + } + break; + case AMDGPU_INFO_SENSOR_MEM_LOAD: + /* get memory load */ + if (amdgpu_dpm_read_sensor(adev, + AMDGPU_PP_SENSOR_MEM_LOAD, + (void *)&ui32, &ui32_size)) { + return -EINVAL; + } + break; default: DRM_DEBUG_KMS("Invalid request %d\n", info->sensor_info.type); diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h index 8b832f7458f2..484c72e17c72 100644 --- a/include/uapi/drm/amdgpu_drm.h +++ b/include/uapi/drm/amdgpu_drm.h @@ -780,6 +780,18 @@ struct drm_amdgpu_cs_chunk_data { #define AMDGPU_INFO_SENSOR_STABLE_PSTATE_GFX_SCLK 0x8 /* Subquery id: Query GPU stable pstate memory clock */ #define AMDGPU_INFO_SENSOR_STABLE_PSTATE_GFX_MCLK 0x9 + /* Subquery id: Query GPU mm vclk */ + #define AMDGPU_INFO_SENSOR_UVD_VCLK 0xa + /* Subquery id: Query GPU mm dclk */ + #define AMDGPU_INFO_SENSOR_UVD_DCLK 0xb + /* Subquery id: Query junction temperature */ + #define AMDGPU_INFO_SENSOR_HOTSPOT_TEMP 0xc + /* Subquery id: Query edge temperature */ + #define AMDGPU_INFO_SENSOR_EDGE_TEMP0xd + /* Subquery id: Query memory temperature */ +
RE: [PATCH] drm/amd/amdgpu: Expose some power info through AMDGPU_INFO
[AMD Official Use Only - Internal Distribution Only] ping -Original Message- From: Roy Sun Sent: Monday, April 12, 2021 8:15 PM To: amd-gfx@lists.freedesktop.org Cc: Sun, Roy Subject: [PATCH] drm/amd/amdgpu: Expose some power info through AMDGPU_INFO Add interface to get the mm clock, temperature and memory load Signed-off-by: Roy Sun --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 50 + include/uapi/drm/amdgpu_drm.h | 12 ++ 2 files changed, 62 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index b2e774aeab45..e5b16e0819ce 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -971,6 +971,56 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) } ui32 /= 100; break; + case AMDGPU_INFO_SENSOR_UVD_VCLK: + /* get mm vclk in Mhz */ + if (amdgpu_dpm_read_sensor(adev, + AMDGPU_PP_SENSOR_UVD_VCLK, + (void *)&ui32, &ui32_size)) { + return -EINVAL; + } + ui32 /= 100; + break; + case AMDGPU_INFO_SENSOR_UVD_DCLK: + /* get mm dclk in Mhz */ + if (amdgpu_dpm_read_sensor(adev, + AMDGPU_PP_SENSOR_UVD_DCLK, + (void *)&ui32, &ui32_size)) { + return -EINVAL; + } + ui32 /= 100; + break; + case AMDGPU_INFO_SENSOR_HOTSPOT_TEMP: + /* get junction temperature */ + if (amdgpu_dpm_read_sensor(adev, + AMDGPU_PP_SENSOR_HOTSPOT_TEMP, + (void *)&ui32, &ui32_size)) { + return -EINVAL; + } + break; + case AMDGPU_INFO_SENSOR_EDGE_TEMP: + /* get current edge temperature */ + if (amdgpu_dpm_read_sensor(adev, + AMDGPU_PP_SENSOR_EDGE_TEMP, + (void *)&ui32, &ui32_size)) { + return -EINVAL; + } + break; + case AMDGPU_INFO_SENSOR_MEM_TEMP: + /* get current memory temperature */ + if (amdgpu_dpm_read_sensor(adev, + AMDGPU_PP_SENSOR_MEM_TEMP, + (void *)&ui32, &ui32_size)) { + return -EINVAL; + } + break; + case AMDGPU_INFO_SENSOR_MEM_LOAD: + /* get memory load */ + if (amdgpu_dpm_read_sensor(adev, + AMDGPU_PP_SENSOR_MEM_LOAD, + (void *)&ui32, &ui32_size)) { + return -EINVAL; + } + break; default: DRM_DEBUG_KMS("Invalid request %d\n", info->sensor_info.type); diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h index 8b832f7458f2..484c72e17c72 100644 --- a/include/uapi/drm/amdgpu_drm.h +++ b/include/uapi/drm/amdgpu_drm.h @@ -780,6 +780,18 @@ struct drm_amdgpu_cs_chunk_data { #define AMDGPU_INFO_SENSOR_STABLE_PSTATE_GFX_SCLK 0x8 /* Subquery id: Query GPU stable pstate memory clock */ #define AMDGPU_INFO_SENSOR_STABLE_PSTATE_GFX_MCLK 0x9 + /* Subquery id: Query GPU mm vclk */ + #define AMDGPU_INFO_SENSOR_UVD_VCLK 0xa + /* Subquery id: Query GPU mm dclk */ + #define AMDGPU_INFO_SENSOR_UVD_DCLK 0xb + /* Subquery id: Query junction temperature */ + #define AMDGPU_INFO_SENSOR_HOTSPOT_TEMP 0xc + /* Subquery id: Query edge temperature */ + #define AMDGPU_INFO_SENSOR_EDGE_TEMP0xd + /* Subquery id: Query memory temperature */ + #define AMDGPU_INFO_SENSOR_MEM_TEMP 0xe + /* Subquery id: Query Memory load */ + #define AMDGPU_INFO_SENSOR_MEM_LOAD 0xf /* Number of VRAM page faults on CPU access. */ #define AMDGPU_INFO_NUM_VRAM_CPU_PAGE_FAULTS 0x1E #define AMDGPU_INFO_VRAM_LOST_COUNTER 0x1F -- 2.31.1 _
RE: [PATCH 4/4] drm/amdgpu: Fix kernel-doc for the RAS sysfs interface
[AMD Public Use] Series is Reviewed-by: Hawking Zhang Regards, Hawking -Original Message- From: Tuikov, Luben Sent: Tuesday, April 13, 2021 20:56 To: amd-gfx@lists.freedesktop.org Cc: Tuikov, Luben ; Deucher, Alexander ; Zhang, Hawking Subject: [PATCH 4/4] drm/amdgpu: Fix kernel-doc for the RAS sysfs interface Imporve the kernel-doc for the RAS sysfs interface. Fix the grammar, fix the context. Cc: Alexander Deucher Cc: Hawking Zhang Signed-off-by: Luben Tuikov --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 47 + 1 file changed, 24 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 30cda4b8a461..44dfb3613e37 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -274,7 +274,7 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file *f, /** * DOC: AMDGPU RAS debugfs control interface * - * It accepts struct ras_debug_if who has two members. + * The control interface accepts struct ras_debug_if which has two members. * * First member: ras_debug_if::head or ras_debug_if::inject. * @@ -299,32 +299,33 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file *f, * * How to use the interface? * - * Programs + * In a program * - * Copy the struct ras_debug_if in your codes and initialize it. - * Write the struct to the control node. + * Copy the struct ras_debug_if in your code and initialize it. + * Write the struct to the control interface. * - * Shells + * From shell * * .. code-block:: bash * - * echo op block [error [sub_block address value]] > .../ras/ras_ctrl + * echo "disable " > /sys/kernel/debug/dri//ras/ras_ctrl + * echo "enable " > /sys/kernel/debug/dri//ras/ras_ctrl + * echo "inject > /sys/kernel/debug/dri//ras/ras_ctrl * - * Parameters: + * Where N, is the card which you want to affect. * - * op: disable, enable, inject - * disable: only block is needed - * enable: block and error are needed - * inject: error, address, value are needed - * block: umc, sdma, gfx, . + * "disable" requires only the block. + * "enable" requires the block and error type. + * "inject" requires the block, error type, address, and value. + * The block is one of: umc, sdma, gfx, etc. * see ras_block_string[] for details - * error: ue, ce - * ue: multi_uncorrectable - * ce: single_correctable - * sub_block: - * sub block index, pass 0 if there is no sub block + * The error type is one of: ue, ce, where, + * ue is multi-uncorrectable + * ce is single-correctable + * The sub-block is a the sub-block index, pass 0 if there is no sub-block. + * The address and value are hexadecimal numbers, leading 0x is optional. * - * here are some examples for bash commands: + * For instance, * * .. code-block:: bash * @@ -332,17 +333,17 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file *f, * echo inject umc ce 0 0 0 > /sys/kernel/debug/dri/0/ras/ras_ctrl * echo disable umc > /sys/kernel/debug/dri/0/ras/ras_ctrl * - * How to check the result? + * How to check the result of the operation? * - * For disable/enable, please check ras features at + * To check disable/enable, see "ras" features at, * /sys/class/drm/card[0/1/2...]/device/ras/features * - * For inject, please check corresponding err count at - * /sys/class/drm/card[0/1/2...]/device/ras/[gfx/sdma/...]_err_count + * To check inject, see the corresponding error count at, + * + /sys/class/drm/card[0/1/2...]/device/ras/[gfx|sdma|umc|...]_err_count * * .. note:: * Operations are only allowed on blocks which are supported. - * Please check ras mask at /sys/module/amdgpu/parameters/ras_mask + * Check the "ras" mask at /sys/module/amdgpu/parameters/ras_mask * to see which blocks support RAS on a particular asic. * */ -- 2.31.0.97.g1424303384 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 4/4] drm/amdgpu: Fix kernel-doc for the RAS sysfs interface
Imporve the kernel-doc for the RAS sysfs interface. Fix the grammar, fix the context. Cc: Alexander Deucher Cc: Hawking Zhang Signed-off-by: Luben Tuikov --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 47 + 1 file changed, 24 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 30cda4b8a461..44dfb3613e37 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -274,7 +274,7 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file *f, /** * DOC: AMDGPU RAS debugfs control interface * - * It accepts struct ras_debug_if who has two members. + * The control interface accepts struct ras_debug_if which has two members. * * First member: ras_debug_if::head or ras_debug_if::inject. * @@ -299,32 +299,33 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file *f, * * How to use the interface? * - * Programs + * In a program * - * Copy the struct ras_debug_if in your codes and initialize it. - * Write the struct to the control node. + * Copy the struct ras_debug_if in your code and initialize it. + * Write the struct to the control interface. * - * Shells + * From shell * * .. code-block:: bash * - * echo op block [error [sub_block address value]] > .../ras/ras_ctrl + * echo "disable " > /sys/kernel/debug/dri//ras/ras_ctrl + * echo "enable " > /sys/kernel/debug/dri//ras/ras_ctrl + * echo "inject > /sys/kernel/debug/dri//ras/ras_ctrl * - * Parameters: + * Where N, is the card which you want to affect. * - * op: disable, enable, inject - * disable: only block is needed - * enable: block and error are needed - * inject: error, address, value are needed - * block: umc, sdma, gfx, . + * "disable" requires only the block. + * "enable" requires the block and error type. + * "inject" requires the block, error type, address, and value. + * The block is one of: umc, sdma, gfx, etc. * see ras_block_string[] for details - * error: ue, ce - * ue: multi_uncorrectable - * ce: single_correctable - * sub_block: - * sub block index, pass 0 if there is no sub block + * The error type is one of: ue, ce, where, + * ue is multi-uncorrectable + * ce is single-correctable + * The sub-block is a the sub-block index, pass 0 if there is no sub-block. + * The address and value are hexadecimal numbers, leading 0x is optional. * - * here are some examples for bash commands: + * For instance, * * .. code-block:: bash * @@ -332,17 +333,17 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file *f, * echo inject umc ce 0 0 0 > /sys/kernel/debug/dri/0/ras/ras_ctrl * echo disable umc > /sys/kernel/debug/dri/0/ras/ras_ctrl * - * How to check the result? + * How to check the result of the operation? * - * For disable/enable, please check ras features at + * To check disable/enable, see "ras" features at, * /sys/class/drm/card[0/1/2...]/device/ras/features * - * For inject, please check corresponding err count at - * /sys/class/drm/card[0/1/2...]/device/ras/[gfx/sdma/...]_err_count + * To check inject, see the corresponding error count at, + * /sys/class/drm/card[0/1/2...]/device/ras/[gfx|sdma|umc|...]_err_count * * .. note:: * Operations are only allowed on blocks which are supported. - * Please check ras mask at /sys/module/amdgpu/parameters/ras_mask + * Check the "ras" mask at /sys/module/amdgpu/parameters/ras_mask * to see which blocks support RAS on a particular asic. * */ -- 2.31.0.97.g1424303384 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 3/4] drm/amdgpu: Add bad_page_cnt_threshold to debugfs
Add bad_page_cnt_threshold to debugfs, an optional file system used for debugging, for reporting purposes only--it usually matches the size of EEPROM but may be different depending on the "bad_page_threshold" kernel module option. The "bad_page_cnt_threshold" is a dynamically computed value. It depends on three things: the VRAM size; the size of the EEPROM (or the size allocated to the RAS table therein); and the "bad_page_threshold" module parameter. It is a dynamically computed value, when the amdgpu module is run, on which further parameters and logic depend, and as such it is helpful to see the dynamically computed value in debugfs. Cc: Alexander Deucher Cc: Hawking Zhang Signed-off-by: Luben Tuikov --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 3ca6b51f0c9c..30cda4b8a461 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -1263,6 +1263,8 @@ static struct dentry *amdgpu_ras_debugfs_create_ctrl_node(struct amdgpu_device * &amdgpu_ras_debugfs_ctrl_ops); debugfs_create_file("ras_eeprom_reset", S_IWUGO | S_IRUGO, dir, adev, &amdgpu_ras_debugfs_eeprom_ops); + debugfs_create_u32("bad_page_cnt_threshold", 0444, dir, + &con->bad_page_cnt_threshold); /* * After one uncorrectable error happens, usually GPU recovery will -- 2.31.0.97.g1424303384 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 2/4] drm/amdgpu: Fix a bug in checking the result of reserve page
Fix if (ret) --> if (!ret), a bug, for "retire_page", which caused the kernel to recall the method with *pos == end of file, and that bounced back with error. On the first run, we advanced *pos, but returned 0 back to fs layer, also a bug. Fix the logic of the check of the result of amdgpu_reserve_page_direct()--it is 0 on success, and non-zero on error, not the other way around. This patch fixes this bug. Cc: Alexander Deucher Cc: John Clements Cc: Hawking Zhang Signed-off-by: Luben Tuikov --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 9041453465f1..3ca6b51f0c9c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -114,7 +114,7 @@ static int amdgpu_reserve_page_direct(struct amdgpu_device *adev, uint64_t addre if (amdgpu_ras_check_bad_page(adev, address)) { dev_warn(adev->dev, -"RAS WARN: 0x%llx has been marked as bad page!\n", +"RAS WARN: 0x%llx has already been marked as bad page!\n", address); return 0; } @@ -228,7 +228,6 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file *f, return -EINVAL; if (op != -1) { - if (op == 3) { if (sscanf(str, "%*s %llx", &address) != 1) return -EINVAL; @@ -364,11 +363,9 @@ static ssize_t amdgpu_ras_debugfs_ctrl_write(struct file *f, const char __user * if (ret) return -EINVAL; - if (data.op == 3) - { + if (data.op == 3) { ret = amdgpu_reserve_page_direct(adev, data.inject.address); - - if (ret) + if (!ret) return size; else return ret; -- 2.31.0.97.g1424303384 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 1/4] drm/amdgpu: Fix a bug for input with double sscanf
Remove double-sscanf to scan for %llu and 0x%llx, as that is not going to work! The %llu will consume the "0" in "0x" of your input, and the hex value you think you're entering will always be 0. That is, a valid hex value can never be consumed. On the other hand, just entering a hex number without leading 0x will either be scanned as a string and not match, for instance FAB123, or the leading decimal portion is scanned as the %llu, for instance 123FAB will be scanned as 123, which is not correct. Thus remove the first %llu scan and leave only the %llx scan, removing the leading 0x since %llx can scan either. Addresses are usually always hex values, so this suffices. Cc: Alexander Deucher Cc: Xinhui Pan Cc: Hawking Zhang Signed-off-by: Luben Tuikov --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 13 + 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 0541196ae1ed..9041453465f1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -230,9 +230,8 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file *f, if (op != -1) { if (op == 3) { - if (sscanf(str, "%*s %llu", &address) != 1) - if (sscanf(str, "%*s 0x%llx", &address) != 1) - return -EINVAL; + if (sscanf(str, "%*s %llx", &address) != 1) + return -EINVAL; data->op = op; data->inject.address = address; @@ -255,11 +254,9 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file *f, data->op = op; if (op == 2) { - if (sscanf(str, "%*s %*s %*s %u %llu %llu", - &sub_block, &address, &value) != 3) - if (sscanf(str, "%*s %*s %*s 0x%x 0x%llx 0x%llx", - &sub_block, &address, &value) != 3) - return -EINVAL; + if (sscanf(str, "%*s %*s %*s %x %llx %llx", + &sub_block, &address, &value) != 3) + return -EINVAL; data->head.sub_block_index = sub_block; data->inject.address = address; data->inject.value = value; -- 2.31.0.97.g1424303384 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 2/2] drm/amdgpu: Add show_fdinfo() interface
Am 13.04.21 um 14:14 schrieb Roy Sun: Tracking devices, process info and fence info using /proc/pid/fdinfo Signed-off-by: David M Nieto Signed-off-by: Roy Sun --- drivers/gpu/drm/amd/amdgpu/Makefile| 2 + drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c| 59 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h| 3 +- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 5 +- drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 149 + drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h | 43 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 21 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_gem.h| 3 +- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 35 + drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 + 12 files changed, 321 insertions(+), 3 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile index ee85e8aba636..d216b7ecb5d1 100644 --- a/drivers/gpu/drm/amd/amdgpu/Makefile +++ b/drivers/gpu/drm/amd/amdgpu/Makefile @@ -58,6 +58,8 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \ amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \ amdgpu_fw_attestation.o amdgpu_securedisplay.o +amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o + amdgpu-$(CONFIG_PERF_EVENTS) += amdgpu_pmu.o # add asic specific block diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 616c85a01299..c2338a0dd1f0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -107,6 +107,7 @@ #include "amdgpu_gfxhub.h" #include "amdgpu_df.h" #include "amdgpu_smuio.h" +#include "amdgpu_fdinfo.h" #define MAX_GPU_INSTANCE 16 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c index 0350205c4897..8d33571754d9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c @@ -651,3 +651,62 @@ void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr) idr_destroy(&mgr->ctx_handles); mutex_destroy(&mgr->lock); } + +ktime_t amdgpu_ctx_fence_usage(struct amdgpu_fpriv *fpriv, uint32_t hwip, + uint32_t idx, uint64_t *elapsed) That starts to look better, but please change this function so that it gets a amdgpu_ctx_mgr structure as first parameter and rename it to amdgpu_ctx_mgr_fence_usage. +{ + struct amdgpu_ctx_entity *centity; + struct idr *idp; + struct amdgpu_ctx *ctx; + uint32_t id, i; + ktime_t now, t1; + ktime_t total = 0, max = 0; + + + if (idx >= AMDGPU_MAX_ENTITY_NUM) + return 0; + + idp = &fpriv->ctx_mgr.ctx_handles; + + mutex_lock(&fpriv->ctx_mgr.lock); + idr_for_each_entry(idp, ctx, id) { Maybe move everything from here till the end of the loop into a separate function. + if (!ctx->entities[hwip][idx]) + continue; + + centity = ctx->entities[hwip][idx]; + now = ktime_get(); + for (i = 0; i < amdgpu_sched_jobs; i++) { + struct dma_fence *fence; + struct drm_sched_fence *s_fence; + + spin_lock(&ctx->ring_lock); + fence = dma_fence_get(centity->fences[i]); + spin_unlock(&ctx->ring_lock); + if (!fence) + continue; + s_fence = to_drm_sched_fence(fence); + if (!dma_fence_is_signaled(&s_fence->scheduled)) + continue; + t1 = s_fence->scheduled.timestamp; + if (t1 >= now) + continue; + if (dma_fence_is_signaled(&s_fence->finished) && + s_fence->finished.timestamp < now) That is indented to far to the right, the "s_fence..." should be below the "dma_fence..." in the line above. + total += ktime_sub(s_fence->finished.timestamp, t1); + else + total += ktime_sub(now, t1); + t1 = ktime_sub(now, t1); + dma_fence_put(fence); + if (t1 > max) + max = t1; Please write this as "max = max(max, t1);". + } + + } + + mutex_unlock(&fpriv->ctx_mgr.lock); + + if (elapsed) + *elapsed = max; + + return total; +} diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h index f54e10314661..8ee42f12e5f8 100644 --- a/drivers/gpu/drm/amd/amdgpu
[PATCH 2/2] drm/amdgpu: Add show_fdinfo() interface
Tracking devices, process info and fence info using /proc/pid/fdinfo Signed-off-by: David M Nieto Signed-off-by: Roy Sun --- drivers/gpu/drm/amd/amdgpu/Makefile| 2 + drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c| 59 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h| 3 +- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 5 +- drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 149 + drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h | 43 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 21 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_gem.h| 3 +- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 35 + drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 + 12 files changed, 321 insertions(+), 3 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile index ee85e8aba636..d216b7ecb5d1 100644 --- a/drivers/gpu/drm/amd/amdgpu/Makefile +++ b/drivers/gpu/drm/amd/amdgpu/Makefile @@ -58,6 +58,8 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \ amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \ amdgpu_fw_attestation.o amdgpu_securedisplay.o +amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o + amdgpu-$(CONFIG_PERF_EVENTS) += amdgpu_pmu.o # add asic specific block diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 616c85a01299..c2338a0dd1f0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -107,6 +107,7 @@ #include "amdgpu_gfxhub.h" #include "amdgpu_df.h" #include "amdgpu_smuio.h" +#include "amdgpu_fdinfo.h" #define MAX_GPU_INSTANCE 16 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c index 0350205c4897..8d33571754d9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c @@ -651,3 +651,62 @@ void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr) idr_destroy(&mgr->ctx_handles); mutex_destroy(&mgr->lock); } + +ktime_t amdgpu_ctx_fence_usage(struct amdgpu_fpriv *fpriv, uint32_t hwip, + uint32_t idx, uint64_t *elapsed) +{ + struct amdgpu_ctx_entity *centity; + struct idr *idp; + struct amdgpu_ctx *ctx; + uint32_t id, i; + ktime_t now, t1; + ktime_t total = 0, max = 0; + + + if (idx >= AMDGPU_MAX_ENTITY_NUM) + return 0; + + idp = &fpriv->ctx_mgr.ctx_handles; + + mutex_lock(&fpriv->ctx_mgr.lock); + idr_for_each_entry(idp, ctx, id) { + if (!ctx->entities[hwip][idx]) + continue; + + centity = ctx->entities[hwip][idx]; + now = ktime_get(); + for (i = 0; i < amdgpu_sched_jobs; i++) { + struct dma_fence *fence; + struct drm_sched_fence *s_fence; + + spin_lock(&ctx->ring_lock); + fence = dma_fence_get(centity->fences[i]); + spin_unlock(&ctx->ring_lock); + if (!fence) + continue; + s_fence = to_drm_sched_fence(fence); + if (!dma_fence_is_signaled(&s_fence->scheduled)) + continue; + t1 = s_fence->scheduled.timestamp; + if (t1 >= now) + continue; + if (dma_fence_is_signaled(&s_fence->finished) && + s_fence->finished.timestamp < now) + total += ktime_sub(s_fence->finished.timestamp, t1); + else + total += ktime_sub(now, t1); + t1 = ktime_sub(now, t1); + dma_fence_put(fence); + if (t1 > max) + max = t1; + } + + } + + mutex_unlock(&fpriv->ctx_mgr.lock); + + if (elapsed) + *elapsed = max; + + return total; +} diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h index f54e10314661..8ee42f12e5f8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h @@ -87,5 +87,6 @@ void amdgpu_ctx_mgr_init(struct amdgpu_ctx_mgr *mgr); void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr); long amdgpu_ctx_mgr_entity_flush(struct amdgpu_ctx_mgr *mgr, long timeout); void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr); - +ktime_t amdgpu_ctx_fence_usage(struct amdgpu_fpriv *fpriv, uint32_t hwip, + uint32_t idx, uint64_t *elapsed); #endif diff --git a/drivers
[PATCH 1/2] drm/scheduler: Change scheduled fence track
Update the timestamp of scheduled fence on HW completion of the previous fences This allow more accurate tracking of the fence execution in HW Signed-off-by: David M Nieto Signed-off-by: Roy Sun --- drivers/gpu/drm/scheduler/sched_main.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 92d8de24d0a1..4e5d8d4af010 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -515,7 +515,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched) EXPORT_SYMBOL(drm_sched_resubmit_jobs); /** - * drm_sched_resubmit_jobs_ext - helper to relunch certain number of jobs from mirror ring list + * drm_sched_resubmit_jobs_ext - helper to relunch certain number of jobs from pending list * * @sched: scheduler instance * @max: job numbers to relaunch @@ -671,7 +671,7 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched) static struct drm_sched_job * drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched) { - struct drm_sched_job *job; + struct drm_sched_job *job, *next; /* * Don't destroy jobs while the timeout worker is running OR thread @@ -690,6 +690,13 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched) if (job && dma_fence_is_signaled(&job->s_fence->finished)) { /* remove job from pending_list */ list_del_init(&job->list); + /* account for the next fence in the queue */ + next = list_first_entry_or_null(&sched->pending_list, + struct drm_sched_job, list); + if (next) { + next->s_fence->scheduled.timestamp = + job->s_fence->finished.timestamp; + } } else { job = NULL; /* queue timeout for next job */ -- 2.31.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu: correction of ucode fw_size calculation errors
correct big and little endian problems on different platforms. Signed-off-by: Kevin Wang --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 8 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 +- drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 2 +- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 2 +- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index 85a6a10e048f..0a00a7cc8eaa 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -4354,7 +4354,7 @@ static int gfx_v10_0_mec_init(struct amdgpu_device *adev) le32_to_cpu(mec_hdr->header.ucode_array_offset_bytes)); fw_size = le32_to_cpu(mec_hdr->header.ucode_size_bytes); - r = amdgpu_bo_create_reserved(adev, mec_hdr->header.ucode_size_bytes, + r = amdgpu_bo_create_reserved(adev, fw_size, PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT, &adev->gfx.mec.mec_fw_obj, &adev->gfx.mec.mec_fw_gpu_addr, @@ -5769,7 +5769,7 @@ static int gfx_v10_0_cp_gfx_load_pfp_microcode(struct amdgpu_device *adev) le32_to_cpu(pfp_hdr->header.ucode_array_offset_bytes)); fw_size = le32_to_cpu(pfp_hdr->header.ucode_size_bytes); - r = amdgpu_bo_create_reserved(adev, pfp_hdr->header.ucode_size_bytes, + r = amdgpu_bo_create_reserved(adev, fw_size, PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT, &adev->gfx.pfp.pfp_fw_obj, &adev->gfx.pfp.pfp_fw_gpu_addr, @@ -5847,7 +5847,7 @@ static int gfx_v10_0_cp_gfx_load_ce_microcode(struct amdgpu_device *adev) le32_to_cpu(ce_hdr->header.ucode_array_offset_bytes)); fw_size = le32_to_cpu(ce_hdr->header.ucode_size_bytes); - r = amdgpu_bo_create_reserved(adev, ce_hdr->header.ucode_size_bytes, + r = amdgpu_bo_create_reserved(adev, fw_size, PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT, &adev->gfx.ce.ce_fw_obj, &adev->gfx.ce.ce_fw_gpu_addr, @@ -5924,7 +5924,7 @@ static int gfx_v10_0_cp_gfx_load_me_microcode(struct amdgpu_device *adev) le32_to_cpu(me_hdr->header.ucode_array_offset_bytes)); fw_size = le32_to_cpu(me_hdr->header.ucode_size_bytes); - r = amdgpu_bo_create_reserved(adev, me_hdr->header.ucode_size_bytes, + r = amdgpu_bo_create_reserved(adev, fw_size, PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT, &adev->gfx.me.me_fw_obj, &adev->gfx.me.me_fw_gpu_addr, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index 06811a1f4625..831398e15b84 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c @@ -2013,7 +2013,7 @@ static int gfx_v9_0_mec_init(struct amdgpu_device *adev) le32_to_cpu(mec_hdr->header.ucode_array_offset_bytes)); fw_size = le32_to_cpu(mec_hdr->header.ucode_size_bytes); - r = amdgpu_bo_create_reserved(adev, mec_hdr->header.ucode_size_bytes, + r = amdgpu_bo_create_reserved(adev, fw_size, PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT, &adev->gfx.mec.mec_fw_obj, &adev->gfx.mec.mec_fw_gpu_addr, diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c index 6274cae4a065..00610def69ea 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c @@ -174,7 +174,7 @@ int smu_v11_0_load_microcode(struct smu_context *smu) hdr = (const struct smc_firmware_header_v1_0 *) adev->pm.fw->data; src = (const uint32_t *)(adev->pm.fw->data + le32_to_cpu(hdr->header.ucode_array_offset_bytes)); - smc_fw_size = hdr->header.ucode_size_bytes; + smc_fw_size = le32_to_cpu(hdr->header.ucode_size_bytes;) for (i = 1; i < smc_fw_size/4 - 1; i++) { WREG32_PCIE(addr_start, src[i]); diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c index 0864083e7435..d7ea8215d7dc 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c @@ -149,7 +149,7 @@ int smu_v13_0_load_microcode(struct smu_context *smu) hdr = (const struct smc_firmware_header_v1_0 *) adev->pm.fw->data; src = (const uint32_t *)(adev->pm.fw->data + le32_to_cpu(hdr->header.ucode_array_offset_bytes)); - smc_fw
[bug report] drm/amdgpu: page retire over debugfs mechanism
Hello John Clements, The patch cbb8f989d5a0: "drm/amdgpu: page retire over debugfs mechanism" from Apr 9, 2021, leads to the following static checker warning: drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:377 amdgpu_ras_debugfs_ctrl_write() info: return a literal instead of 'ret' drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 366 ret = amdgpu_ras_debugfs_ctrl_parse_data(f, buf, size, pos, &data); 367 if (ret) 368 return -EINVAL; 369 370 if (data.op == 3) 371 { ^ Please use scripts/checkpatch.pl... :( Bonus points for replacing the magic number 3 with a define? 372 ret = amdgpu_reserve_page_direct(adev, data.inject.address); 373 374 if (ret) 375 return size; 376 else 377 return ret; This static checker warning is disguised as a style warning, but it's really to detect code like this where the if statements are reversed as appears to be the case here. ret = amdgpu_reserve_page_direct(adev, data.inject.address); if (ret) return ret; return size; 378 } 379 380 if (!amdgpu_ras_is_supported(adev, data.head.block)) 381 return -EINVAL; regards, dan carpenter ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
Hi Dennis, yeah, that just has the same down side of a lot of additional overhead as the is_signaled callback. Bouncing cache lines on the CPU isn't funny at all. Christian. Am 13.04.21 um 11:13 schrieb Li, Dennis: [AMD Official Use Only - Internal Distribution Only] Hi, Christian and Andrey, We maybe try to implement "wait" callback function of dma_fence_ops, when GPU reset or unplug happen, make this callback return - ENODEV, to notify the caller device lost. * Must return -ERESTARTSYS if the wait is intr = true and the wait was * interrupted, and remaining jiffies if fence has signaled, or 0 if wait * timed out. Can also return other error values on custom implementations, * which should be treated as if the fence is signaled. For example a hardware * lockup could be reported like that. * * This callback is optional. */ signed long (*wait)(struct dma_fence *fence, bool intr, signed long timeout); Best Regards Dennis Li -Original Message- From: Christian König Sent: Tuesday, April 13, 2021 3:10 PM To: Grodzovsky, Andrey ; Koenig, Christian ; Li, Dennis ; amd-gfx@lists.freedesktop.org; Deucher, Alexander ; Kuehling, Felix ; Zhang, Hawking ; Daniel Vetter Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky: On 2021-04-12 3:18 p.m., Christian König wrote: Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky: [SNIP] So what's the right approach ? How we guarantee that when running amdgpu_fence_driver_force_completion we will signal all the HW fences and not racing against some more fences insertion into that array ? Well I would still say the best approach would be to insert this between the front end and the backend and not rely on signaling fences while holding the device srcu. My question is, even now, when we run amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, what there prevents a race with another fence being at the same time emitted and inserted into the fence array ? Looks like nothing. Each ring can only be used by one thread at the same time, this includes emitting fences as well as other stuff. During GPU reset we make sure nobody writes to the rings by stopping the scheduler and taking the GPU reset lock (so that nobody else can start the scheduler again). What about direct submissions not through scheduler - amdgpu_job_submit_direct, I don't see how this is protected. Those only happen during startup and GPU reset. BTW: Could it be that the device SRCU protects more than one device and we deadlock because of this? I haven't actually experienced any deadlock until now but, yes, drm_unplug_srcu is defined as static in drm_drv.c and so in the presence of multiple devices from same or different drivers we in fact are dependent on all their critical sections i guess. Shit, yeah the devil is a squirrel. So for A+I laptops we actually need to sync that up with Daniel and the rest of the i915 guys. IIRC we could actually have an amdgpu device in a docking station which needs hotplug and the driver might depend on waiting for the i915 driver as well. Can't we propose a patch to make drm_unplug_srcu per drm_device ? I don't see why it has to be global and not per device thing. I'm really wondering the same thing for quite a while now. Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global. Regards, Christian. Andrey Christian. Andrey Christian. Andrey Andrey Christian. /* Past this point no more fence are submitted to HW ring and hence we can safely call force signal on all that are currently there. * Any subsequently created HW fences will be returned signaled with an error code right away */ for_each_ring(adev) amdgpu_fence_process(ring) drm_dev_unplug(dev); Stop schedulers cancel_sync(all timers and queued works); hw_fini unmap_mmio } Andrey Alternatively grabbing the reset write side and stopping and then restarting the scheduler could work as well. Christian. I didn't get the above and I don't see why I need to reuse the GPU reset rw_lock. I rely on the SRCU unplug flag for unplug. Also, not clear to me why are we focusing on the scheduler threads, any code patch to generate HW fences should be covered, so any code leading to amdgpu_fence_emit needs to be taken into account such as, direct IB submissions, VM flushes e.t.c You need to work together with the reset lock anyway, cause a hotplug could run at the same time as a reset. For going my way indeed now I see now that I have to take reset write side lock during HW fences signalling in order to protect against scheduler/HW fences detachment and reattachment during schedulers stop/restart. But if we go with your approach then callin
RE: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
[AMD Official Use Only - Internal Distribution Only] Hi, Christian and Andrey, We maybe try to implement "wait" callback function of dma_fence_ops, when GPU reset or unplug happen, make this callback return - ENODEV, to notify the caller device lost. * Must return -ERESTARTSYS if the wait is intr = true and the wait was * interrupted, and remaining jiffies if fence has signaled, or 0 if wait * timed out. Can also return other error values on custom implementations, * which should be treated as if the fence is signaled. For example a hardware * lockup could be reported like that. * * This callback is optional. */ signed long (*wait)(struct dma_fence *fence, bool intr, signed long timeout); Best Regards Dennis Li -Original Message- From: Christian König Sent: Tuesday, April 13, 2021 3:10 PM To: Grodzovsky, Andrey ; Koenig, Christian ; Li, Dennis ; amd-gfx@lists.freedesktop.org; Deucher, Alexander ; Kuehling, Felix ; Zhang, Hawking ; Daniel Vetter Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky: > > On 2021-04-12 3:18 p.m., Christian König wrote: >> Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky: >>> [SNIP] > > So what's the right approach ? How we guarantee that when running > amdgpu_fence_driver_force_completion we will signal all the HW > fences and not racing against some more fences insertion into that > array ? > Well I would still say the best approach would be to insert this between the front end and the backend and not rely on signaling fences while holding the device srcu. >>> >>> >>> My question is, even now, when we run >>> amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or >>> amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, >>> what there prevents a race with another fence being at the same time >>> emitted and inserted into the fence array ? Looks like nothing. >>> >> >> Each ring can only be used by one thread at the same time, this >> includes emitting fences as well as other stuff. >> >> During GPU reset we make sure nobody writes to the rings by stopping >> the scheduler and taking the GPU reset lock (so that nobody else can >> start the scheduler again). > > > What about direct submissions not through scheduler - > amdgpu_job_submit_direct, I don't see how this is protected. Those only happen during startup and GPU reset. >> BTW: Could it be that the device SRCU protects more than one device and we deadlock because of this? >>> >>> >>> I haven't actually experienced any deadlock until now but, yes, >>> drm_unplug_srcu is defined as static in drm_drv.c and so in the >>> presence of multiple devices from same or different drivers we in >>> fact are dependent on all their critical sections i guess. >>> >> >> Shit, yeah the devil is a squirrel. So for A+I laptops we actually >> need to sync that up with Daniel and the rest of the i915 guys. >> >> IIRC we could actually have an amdgpu device in a docking station >> which needs hotplug and the driver might depend on waiting for the >> i915 driver as well. > > > Can't we propose a patch to make drm_unplug_srcu per drm_device ? I > don't see why it has to be global and not per device thing. I'm really wondering the same thing for quite a while now. Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global. Regards, Christian. > > Andrey > > >> >> Christian. >> >>> Andrey >>> >>> Christian. > Andrey > > >> >>> Andrey >>> >>> Christian. > /* Past this point no more fence are submitted to HW ring > and hence we can safely call force signal on all that are > currently there. > * Any subsequently created HW fences will be returned > signaled with an error code right away > */ > > for_each_ring(adev) > amdgpu_fence_process(ring) > > drm_dev_unplug(dev); > Stop schedulers > cancel_sync(all timers and queued works); > hw_fini > unmap_mmio > > } > > > Andrey > > >> >> >>> >> >> Alternatively grabbing the reset write side and stopping >> and then restarting the scheduler could work as well. >> >> Christian. > > > I didn't get the above and I don't see why I need to reuse > the GPU reset rw_lock. I rely on the SRCU unplug flag for > unplug. Also, not clear to me why are we focusing on the > scheduler threads, any code patch to generate HW fences > should be covered,
[PATCH] drm/radeon/cik: remove set but not used variables
The value of pipe_id and queue_id are not used under certain circumstances, so just delete. Signed-off-by: Tian Tao --- drivers/gpu/drm/radeon/cik.c | 4 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c index 8b7a4f7..42a8afa 100644 --- a/drivers/gpu/drm/radeon/cik.c +++ b/drivers/gpu/drm/radeon/cik.c @@ -7948,8 +7948,6 @@ int cik_irq_process(struct radeon_device *rdev) DRM_ERROR("Illegal register access in command stream\n"); /* XXX check the bitfield order! */ me_id = (ring_id & 0x60) >> 5; - pipe_id = (ring_id & 0x18) >> 3; - queue_id = (ring_id & 0x7) >> 0; switch (me_id) { case 0: /* This results in a full GPU reset, but all we need to do is soft @@ -7971,8 +7969,6 @@ int cik_irq_process(struct radeon_device *rdev) DRM_ERROR("Illegal instruction in command stream\n"); /* XXX check the bitfield order! */ me_id = (ring_id & 0x60) >> 5; - pipe_id = (ring_id & 0x18) >> 3; - queue_id = (ring_id & 0x7) >> 0; switch (me_id) { case 0: /* This results in a full GPU reset, but all we need to do is soft -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: Add graphics cache rinse packet for sdma
Yeah agree, a bit more commit text would be nice to have. Apart from that feel free to add an Acked-by: Christian König as well. Christian. Am 13.04.21 um 08:41 schrieb Huang Rui: On Tue, Apr 13, 2021 at 02:23:00PM +0800, Su, Jinzhou (Joe) wrote: Add emit mem sync callback for sdma_v5_2 I suggest to describe the problem you encountered for this change, most of persons would like to know how. With that fixed, patch is Reviewed-by: Huang Rui Signed-off-by: Jinzhou Su --- drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 28 ++ 1 file changed, 28 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c index 93f826a7d3f0..b1ad9e52b234 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c @@ -369,6 +369,33 @@ static void sdma_v5_2_ring_emit_ib(struct amdgpu_ring *ring, amdgpu_ring_write(ring, upper_32_bits(csa_mc_addr)); } +/** + * sdma_v5_2_ring_emit_mem_sync - flush the IB by graphics cache rinse + * + * @ring: amdgpu ring pointer + * @job: job to retrieve vmid from + * @ib: IB object to schedule + * + * flush the IB by graphics cache rinse. + */ +static void sdma_v5_2_ring_emit_mem_sync(struct amdgpu_ring *ring) +{ +uint32_t gcr_cntl = + SDMA_GCR_GL2_INV | SDMA_GCR_GL2_WB | SDMA_GCR_GLM_INV | + SDMA_GCR_GL1_INV | SDMA_GCR_GLV_INV | SDMA_GCR_GLK_INV | + SDMA_GCR_GLI_INV(1); + + /* flush entire cache L0/L1/L2, this can be optimized by performance requirement */ + amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_GCR_REQ)); + amdgpu_ring_write(ring, SDMA_PKT_GCR_REQ_PAYLOAD1_BASE_VA_31_7(0)); + amdgpu_ring_write(ring, SDMA_PKT_GCR_REQ_PAYLOAD2_GCR_CONTROL_15_0(gcr_cntl) | + SDMA_PKT_GCR_REQ_PAYLOAD2_BASE_VA_47_32(0)); + amdgpu_ring_write(ring, SDMA_PKT_GCR_REQ_PAYLOAD3_LIMIT_VA_31_7(0) | + SDMA_PKT_GCR_REQ_PAYLOAD3_GCR_CONTROL_18_16(gcr_cntl >> 16)); + amdgpu_ring_write(ring, SDMA_PKT_GCR_REQ_PAYLOAD4_LIMIT_VA_47_32(0) | + SDMA_PKT_GCR_REQ_PAYLOAD4_VMID(0)); +} + /** * sdma_v5_2_ring_emit_hdp_flush - emit an hdp flush on the DMA ring * @@ -1663,6 +1690,7 @@ static const struct amdgpu_ring_funcs sdma_v5_2_ring_funcs = { 10 + 10 + 10, /* sdma_v5_2_ring_emit_fence x3 for user fence, vm fence */ .emit_ib_size = 7 + 6, /* sdma_v5_2_ring_emit_ib */ .emit_ib = sdma_v5_2_ring_emit_ib, + .emit_mem_sync = sdma_v5_2_ring_emit_mem_sync, .emit_fence = sdma_v5_2_ring_emit_fence, .emit_pipeline_sync = sdma_v5_2_ring_emit_pipeline_sync, .emit_vm_flush = sdma_v5_2_ring_emit_vm_flush, -- 2.27.0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K
Hi Mikhail, the crash is a known issue and should be fixed by: commit f63da9ae7584280582cbc834b20cc18bfb203b14 Author: Philip Yang Date: Thu Apr 1 00:22:23 2021 -0400 drm/amdgpu: reserve fence slot to update page table But that an userspace application can cause a page fault is perfectly possible. See here for example https://en.wikipedia.org/wiki/Halting_problem What we do with misbehaving applications is to log the incident and prevent the queue which does nasty things from doing even more submissions. Regards, Christian. Am 13.04.21 um 00:05 schrieb Mikhail Gavrilov: Video demonstration: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fyoutu.be%2F3nkvUeB0GSw&data=04%7C01%7Cchristian.koenig%40amd.com%7C15d8dd21061b4466fefd08d8fdff0df6%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637538619197386172%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=yunKS%2Fbm%2B4eF5IMS4dYH9mKELbM6ajK19pXXgm8dv6Q%3D&reserved=0 How looks kernel traces. 1. [ 7315.156460] amdgpu :0b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:0 vmid:6 pasid:32779, for process obs pid 23963 thread obs:cs0 pid 23977) [ 7315.156490] amdgpu :0b:00.0: amdgpu: in page starting at address 0x80011fdf5000 from client 18 [ 7315.156495] amdgpu :0b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00641A51 [ 7315.156500] amdgpu :0b:00.0: amdgpu: Faulty UTCL2 client ID: VCN1 (0xd) [ 7315.156503] amdgpu :0b:00.0: amdgpu: MORE_FAULTS: 0x1 [ 7315.156505] amdgpu :0b:00.0: amdgpu: WALKER_ERROR: 0x0 [ 7315.156509] amdgpu :0b:00.0: amdgpu: PERMISSION_FAULTS: 0x5 [ 7315.156510] amdgpu :0b:00.0: amdgpu: MAPPING_ERROR: 0x0 [ 7315.156513] amdgpu :0b:00.0: amdgpu: RW: 0x1 [ 7315.156545] amdgpu :0b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:0 vmid:6 pasid:32779, for process obs pid 23963 thread obs:cs0 pid 23977) [ 7315.156549] amdgpu :0b:00.0: amdgpu: in page starting at address 0x80011fdf6000 from client 18 [ 7315.156551] amdgpu :0b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00641A51 [ 7315.156554] amdgpu :0b:00.0: amdgpu: Faulty UTCL2 client ID: VCN1 (0xd) [ 7315.156556] amdgpu :0b:00.0: amdgpu: MORE_FAULTS: 0x1 [ 7315.156559] amdgpu :0b:00.0: amdgpu: WALKER_ERROR: 0x0 [ 7315.156561] amdgpu :0b:00.0: amdgpu: PERMISSION_FAULTS: 0x5 [ 7315.156564] amdgpu :0b:00.0: amdgpu: MAPPING_ERROR: 0x0 [ 7315.156566] amdgpu :0b:00.0: amdgpu: RW: 0x1 This is a harmless panic, but nevertheless VAAPI does not work and the application that tried to use the encoder crashed. 2. If we tries again and again encode 4K stream through VAAPI we can encounter the next trace: [12341.860944] [ cut here ] [12341.860961] kernel BUG at drivers/dma-buf/dma-resv.c:287! [12341.860968] invalid opcode: [#1] SMP NOPTI [12341.860972] CPU: 28 PID: 18261 Comm: kworker/28:0 Tainted: G W- --- 5.12.0-0.rc5.180.fc35.x86_64+debug #1 [12341.860977] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 3402 01/13/2021 [12341.860981] Workqueue: events amdgpu_irq_handle_ih_soft [amdgpu] [12341.861102] RIP: 0010:dma_resv_add_shared_fence+0x2ab/0x2c0 [12341.861108] Code: fd ff ff be 01 00 00 00 e8 e2 74 dc ff e9 ac fd ff ff 48 83 c4 18 be 03 00 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f e9 c5 74 dc ff <0f> 0b 31 ed e9 73 fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f [12341.861112] RSP: 0018:b2f084c87bb0 EFLAGS: 00010246 [12341.861115] RAX: 0002 RBX: 9f9551184998 RCX: [12341.861119] RDX: 0002 RSI: RDI: 9f9551184a50 [12341.861122] RBP: 0002 R08: R09: [12341.861124] R10: R11: R12: 9f91b9a18140 [12341.861127] R13: 9f91c9020740 R14: 9f91c9020768 R15: [12341.861130] FS: () GS:9f984a20() knlGS: [12341.861133] CS: 0010 DS: ES: CR0: 80050033 [12341.861136] CR2: 144e080d8000 CR3: 00010e98c000 CR4: 00350ee0 [12341.861139] Call Trace: [12341.861143] amdgpu_vm_sdma_commit+0x182/0x220 [amdgpu] [12341.861251] amdgpu_vm_bo_update_mapping.constprop.0+0x278/0x3c0 [amdgpu] [12341.861356] amdgpu_vm_handle_fault+0x145/0x290 [amdgpu] [12341.861461] gmc_v10_0_process_interrupt+0xb3/0x250 [amdgpu] [12341.861571] ? _raw_spin_unlock_irqrestore+0x37/0x40 [12341.861577] ? lock_acquire+0x179/0x3a0 [12341.861583] ? lock_acquire+0x179/0x3a0 [12341.861587] ? amdgpu_irq_dispatch+0xc6/0x240 [amdgpu] [12341.861692] amdgpu_irq_dispatch+0xc6/0x240 [amdgpu] [12341.861796] amdgpu_ih_process+0x90/0x110 [amdgpu] [12341.861900] process_one_work+0x2b0/0x5e0 [12341.861906] worker_thread+0x55/0x3c0 [12341.861910] ? process_one_work+0x5e0/0x5e0 [12341.861915] kthread+0x13a/0x150 [12341.861918] ? __kthread_bind_mask+0x60/0x60 [12341.861922
Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky: On 2021-04-12 3:18 p.m., Christian König wrote: Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky: [SNIP] So what's the right approach ? How we guarantee that when running amdgpu_fence_driver_force_completion we will signal all the HW fences and not racing against some more fences insertion into that array ? Well I would still say the best approach would be to insert this between the front end and the backend and not rely on signaling fences while holding the device srcu. My question is, even now, when we run amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, what there prevents a race with another fence being at the same time emitted and inserted into the fence array ? Looks like nothing. Each ring can only be used by one thread at the same time, this includes emitting fences as well as other stuff. During GPU reset we make sure nobody writes to the rings by stopping the scheduler and taking the GPU reset lock (so that nobody else can start the scheduler again). What about direct submissions not through scheduler - amdgpu_job_submit_direct, I don't see how this is protected. Those only happen during startup and GPU reset. BTW: Could it be that the device SRCU protects more than one device and we deadlock because of this? I haven't actually experienced any deadlock until now but, yes, drm_unplug_srcu is defined as static in drm_drv.c and so in the presence of multiple devices from same or different drivers we in fact are dependent on all their critical sections i guess. Shit, yeah the devil is a squirrel. So for A+I laptops we actually need to sync that up with Daniel and the rest of the i915 guys. IIRC we could actually have an amdgpu device in a docking station which needs hotplug and the driver might depend on waiting for the i915 driver as well. Can't we propose a patch to make drm_unplug_srcu per drm_device ? I don't see why it has to be global and not per device thing. I'm really wondering the same thing for quite a while now. Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global. Regards, Christian. Andrey Christian. Andrey Christian. Andrey Andrey Christian. /* Past this point no more fence are submitted to HW ring and hence we can safely call force signal on all that are currently there. * Any subsequently created HW fences will be returned signaled with an error code right away */ for_each_ring(adev) amdgpu_fence_process(ring) drm_dev_unplug(dev); Stop schedulers cancel_sync(all timers and queued works); hw_fini unmap_mmio } Andrey Alternatively grabbing the reset write side and stopping and then restarting the scheduler could work as well. Christian. I didn't get the above and I don't see why I need to reuse the GPU reset rw_lock. I rely on the SRCU unplug flag for unplug. Also, not clear to me why are we focusing on the scheduler threads, any code patch to generate HW fences should be covered, so any code leading to amdgpu_fence_emit needs to be taken into account such as, direct IB submissions, VM flushes e.t.c You need to work together with the reset lock anyway, cause a hotplug could run at the same time as a reset. For going my way indeed now I see now that I have to take reset write side lock during HW fences signalling in order to protect against scheduler/HW fences detachment and reattachment during schedulers stop/restart. But if we go with your approach then calling drm_dev_unplug and scoping amdgpu_job_timeout with drm_dev_enter/exit should be enough to prevent any concurrent GPU resets during unplug. In fact I already do it anyway - https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc36b1&data=04%7C01%7Candrey.grodzovsky%40amd.com%7Ceefa9c90ed8c405ec3b708d8fc46daaa%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637536728550884740%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=UiNaJE%2BH45iYmbwSDnMSKZS5z0iak0fNlbbfYqKS2Jo%3D&reserved=0 Yes, good point as well. Christian. Andrey Christian. Andrey Christian. Andrey Andrey ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
Am 13.04.21 um 07:36 schrieb Andrey Grodzovsky: [SNIP] emit_fence(fence); */* We can't wait forever as the HW might be gone at any point*/** dma_fence_wait_timeout(old_fence, 5S);* You can pretty much ignore this wait here. It is only as a last resort so that we never overwrite the ring buffers. If device is present how can I ignore this ? I think you missed my question here Sorry I thought I answered that below. See this is just the last resort so that we don't need to worry about ring buffer overflows during testing. We should not get here in practice and if we get here generating a deadlock might actually be the best handling. The alternative would be to call BUG(). BTW, I am not sure it's so improbable to get here in case of sudden device remove, if you are during rapid commands submission to the ring during this time you could easily get to ring buffer overrun because EOP interrupts are gone and fences are not removed anymore but new ones keep arriving from new submissions which don't stop yet. During normal operation hardware fences are only created by two code paths: 1. The scheduler when it pushes jobs to the hardware. 2. The KIQ when it does register access on SRIOV. Both are limited in how many submissions could be made. The only case where this here becomes necessary is during GPU reset when we do direct submission bypassing the scheduler for IB and other tests. Christian. Andrey ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx