Re: [PATCH 1/2] drm/amdgpu: use pre-calculated bo size

2021-04-13 Thread Christian König

On 13.04.21 at 22:57, Nirmoy Das wrote:

Use bo->tbo.base.size instead of calculating it from num_pages.


Those don't clash with the two I sent out yesterday, do they?



Signed-off-by: Nirmoy Das 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 1345f7eba011..74ecc0c1863f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1371,7 +1371,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
if (bo->mem.mem_type != TTM_PL_VRAM)
return 0;
  
-	size = bo->mem.num_pages << PAGE_SHIFT;
+   size = bo->base.size;
offset = bo->mem.start << PAGE_SHIFT;
if ((offset + size) <= adev->gmc.visible_vram_size)
return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 936b3cfdde55..26deace78539 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2048,7 +2048,7 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
return r;
}
  
-	num_bytes = bo->tbo.mem.num_pages << PAGE_SHIFT;
+   num_bytes = bo->tbo.base.size;
num_loops = 0;
  
  	amdgpu_res_first(&bo->tbo.mem, 0, num_bytes, &cursor);


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 8/9] drm/ttm: Don't count pages in SG BOs against pages_limit

2021-04-13 Thread Christian König

On 14.04.21 at 08:48, Felix Kuehling wrote:

Pages in SG BOs were not allocated by TTM. So don't count them against
TTM's pages limit.

Signed-off-by: Felix Kuehling 


Reviewed-by: Christian König 

Going to pick that one up for inclusion in drm-misc-next.

Regards,
Christian.


---
  drivers/gpu/drm/ttm/ttm_tt.c | 27 ++-
  1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 5d8820725b75..e8b8c3257392 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -317,9 +317,12 @@ int ttm_tt_populate(struct ttm_device *bdev,
if (ttm_tt_is_populated(ttm))
return 0;
  
-	atomic_long_add(ttm->num_pages, &ttm_pages_allocated);
-   if (bdev->pool.use_dma32)
-   atomic_long_add(ttm->num_pages, &ttm_dma32_pages_allocated);
+   if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) {
+   atomic_long_add(ttm->num_pages, &ttm_pages_allocated);
+   if (bdev->pool.use_dma32)
+   atomic_long_add(ttm->num_pages,
+   &ttm_dma32_pages_allocated);
+   }
  
  	while (atomic_long_read(&ttm_pages_allocated) > ttm_pages_limit ||
   atomic_long_read(&ttm_dma32_pages_allocated) >
@@ -350,9 +353,12 @@ int ttm_tt_populate(struct ttm_device *bdev,
return 0;
  
  error:

-   atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
-   if (bdev->pool.use_dma32)
-   atomic_long_sub(ttm->num_pages, &ttm_dma32_pages_allocated);
+   if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) {
+   atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
+   if (bdev->pool.use_dma32)
+   atomic_long_sub(ttm->num_pages,
+   &ttm_dma32_pages_allocated);
+   }
return ret;
  }
  EXPORT_SYMBOL(ttm_tt_populate);
@@ -382,9 +388,12 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm)
else
ttm_pool_free(&bdev->pool, ttm);
  
-	atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
-   if (bdev->pool.use_dma32)
-   atomic_long_sub(ttm->num_pages, &ttm_dma32_pages_allocated);
+   if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) {
+   atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
+   if (bdev->pool.use_dma32)
+   atomic_long_sub(ttm->num_pages,
+   &ttm_dma32_pages_allocated);
+   }
  
  	ttm->page_flags &= ~TTM_PAGE_FLAG_PRIV_POPULATED;
  }




Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K

2021-04-13 Thread Christian König



On 13.04.21 at 23:19, Mikhail Gavrilov wrote:

On Tue, 13 Apr 2021 at 12:29, Christian König  wrote:

Hi Mikhail,

the crash is a known issue and should be fixed by:

commit f63da9ae7584280582cbc834b20cc18bfb203b14
Author: Philip Yang 
Date:   Thu Apr 1 00:22:23 2021 -0400

  drm/amdgpu: reserve fence slot to update page table


Unfortunately, this commit couldn't fix the initial problem.
1. Result video is jerky if it grabbed and encoded with ffmpeg
(h264_vaapi codec).
2. OBS still crashed if I try to record or stream video.
3. In the kernel log still appears the message "amdgpu: [mmhub] page
fault (src_id:0 ring:0 vmid:4 pasid:32770, for process obs" if I tried
to record or stream video by OBS.


That is expected behavior: the application is buggy and is causing a
page fault on the GPU.


The kernel should just not crash with a backtrace.

Regards,
Christian.


[PATCH 9/9] drm/amdgpu: Lock the attached dmabuf in unpopulate

2021-04-13 Thread Felix Kuehling
amdgpu_ttm_tt_unpopulate can be called during bo_destroy. The dmabuf->resv
must not be held by the caller or dma_buf_detach will deadlock. This is
probably not the right fix. I get a recursive lock warning with the
reservation held in ttm_bo_release. Should unmap_attachment move to
backend_unbind instead?

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 936b3cfdde55..257750921eed 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1216,9 +1216,22 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_device *bdev,
 
if (ttm->sg && gtt->gobj->import_attach) {
struct dma_buf_attachment *attach;
+   bool locked;
 
attach = gtt->gobj->import_attach;
+   /* FIXME: unpopulate can be called during bo_destroy.
+* The dmabuf->resv must not be held by the caller or
+* dma_buf_detach will deadlock. This is probably not
+* the right fix. I get a recursive lock warning with the
+* reservation held in ttm_bo_release. Should
+* unmap_attachment move to backend_unbind instead?
+*/
+   locked = dma_resv_is_locked(attach->dmabuf->resv);
+   if (!locked)
+   dma_resv_lock(attach->dmabuf->resv, NULL);
dma_buf_unmap_attachment(attach, ttm->sg, DMA_BIDIRECTIONAL);
+   if (!locked)
+   dma_resv_unlock(attach->dmabuf->resv);
ttm->sg = NULL;
return;
}
-- 
2.31.1



[PATCH 5/9] drm/amdgpu: DMA map/unmap when updating GPU mappings

2021-04-13 Thread Felix Kuehling
DMA map kfd_mem_attachments in update_gpuvm_pte. This function is called
with the BO and page tables reserved, so we can safely update the DMA
mapping.

DMA unmap when a BO is unmapped from a GPU and before updating mappings
in restore workers.

Signed-off-by: Felix Kuehling 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 56 ++-
 1 file changed, 29 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 51502a07fc1d..3bb2ae185bbb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -964,11 +964,12 @@ static int unreserve_bo_and_vms(struct bo_vm_reservation_context *ctx,
return ret;
 }
 
-static int unmap_bo_from_gpuvm(struct amdgpu_device *adev,
+static void unmap_bo_from_gpuvm(struct kgd_mem *mem,
struct kfd_mem_attachment *entry,
struct amdgpu_sync *sync)
 {
struct amdgpu_bo_va *bo_va = entry->bo_va;
+   struct amdgpu_device *adev = entry->adev;
struct amdgpu_vm *vm = bo_va->base.vm;
 
amdgpu_vm_bo_unmap(adev, bo_va, entry->va);
@@ -977,15 +978,20 @@ static int unmap_bo_from_gpuvm(struct amdgpu_device *adev,
 
amdgpu_sync_fence(sync, bo_va->last_pt_update);
 
-   return 0;
+   kfd_mem_dmaunmap_attachment(mem, entry);
 }
 
-static int update_gpuvm_pte(struct amdgpu_device *adev,
-   struct kfd_mem_attachment *entry,
-   struct amdgpu_sync *sync)
+static int update_gpuvm_pte(struct kgd_mem *mem,
+   struct kfd_mem_attachment *entry,
+   struct amdgpu_sync *sync)
 {
-   int ret;
struct amdgpu_bo_va *bo_va = entry->bo_va;
+   struct amdgpu_device *adev = entry->adev;
+   int ret;
+
+   ret = kfd_mem_dmamap_attachment(mem, entry);
+   if (ret)
+   return ret;
 
/* Update the page tables  */
ret = amdgpu_vm_bo_update(adev, bo_va, false);
@@ -997,14 +1003,15 @@ static int update_gpuvm_pte(struct amdgpu_device *adev,
return amdgpu_sync_fence(sync, bo_va->last_pt_update);
 }
 
-static int map_bo_to_gpuvm(struct amdgpu_device *adev,
-   struct kfd_mem_attachment *entry, struct amdgpu_sync *sync,
-   bool no_update_pte)
+static int map_bo_to_gpuvm(struct kgd_mem *mem,
+  struct kfd_mem_attachment *entry,
+  struct amdgpu_sync *sync,
+  bool no_update_pte)
 {
int ret;
 
/* Set virtual address for the allocation */
-   ret = amdgpu_vm_bo_map(adev, entry->bo_va, entry->va, 0,
+   ret = amdgpu_vm_bo_map(entry->adev, entry->bo_va, entry->va, 0,
   amdgpu_bo_size(entry->bo_va->base.bo),
   entry->pte_flags);
if (ret) {
@@ -1016,7 +1023,7 @@ static int map_bo_to_gpuvm(struct amdgpu_device *adev,
if (no_update_pte)
return 0;
 
-   ret = update_gpuvm_pte(adev, entry, sync);
+   ret = update_gpuvm_pte(mem, entry, sync);
if (ret) {
pr_err("update_gpuvm_pte() failed\n");
goto update_gpuvm_pte_failed;
@@ -1025,7 +1032,7 @@ static int map_bo_to_gpuvm(struct amdgpu_device *adev,
return 0;
 
 update_gpuvm_pte_failed:
-   unmap_bo_from_gpuvm(adev, entry, sync);
+   unmap_bo_from_gpuvm(mem, entry, sync);
return ret;
 }
 
@@ -1633,7 +1640,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
pr_debug("\t map VA 0x%llx - 0x%llx in entry %p\n",
 entry->va, entry->va + bo_size, entry);
 
-   ret = map_bo_to_gpuvm(adev, entry, ctx.sync,
+   ret = map_bo_to_gpuvm(mem, entry, ctx.sync,
  is_invalid_userptr);
if (ret) {
pr_err("Failed to map bo to gpuvm\n");
@@ -1672,7 +1679,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
struct kgd_dev *kgd, struct kgd_mem *mem, void *vm)
 {
-   struct amdgpu_device *adev = get_amdgpu_device(kgd);
struct amdkfd_process_info *process_info =
((struct amdgpu_vm *)vm)->process_info;
unsigned long bo_size = mem->bo->tbo.base.size;
@@ -1707,13 +1713,8 @@ int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
pr_debug("\t unmap VA 0x%llx - 0x%llx from entry %p\n",
 entry->va, entry->va + bo_size, entry);
 
-   ret = unmap_bo_from_gpuvm(adev, entry, ctx.sync);
-   if (ret == 0) {
-   entry->is_mapped = false;
-   } else {
-   pr_err("failed to unmap VA 0x%llx\n", mem->va);
-   goto unreserve_out;
-   }
+   unmap_bo_from_gpuvm(mem, 

[PATCH 6/9] drm/amdgpu: Move kfd_mem_attach outside reservation

2021-04-13 Thread Felix Kuehling
This is needed to avoid deadlocks with DMA buf import in the next patch.
Also move PT/PD validation out of kfd_mem_attach so that the caller
can do it unconditionally.

Signed-off-by: Felix Kuehling 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 72 +++
 1 file changed, 42 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 3bb2ae185bbb..1416f3c03f1d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -575,6 +575,32 @@ kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
}
 }
 
+static int
+kfd_mem_attach_userptr(struct amdgpu_device *adev, struct kgd_mem *mem,
+  struct amdgpu_bo **bo)
+{
+   unsigned long bo_size = mem->bo->tbo.base.size;
+   struct drm_gem_object *gobj;
+   int ret;
+
+   ret = amdgpu_bo_reserve(mem->bo, false);
+   if (ret)
+   return ret;
+
+   ret = amdgpu_gem_object_create(adev, bo_size, 1,
+  AMDGPU_GEM_DOMAIN_CPU,
+  0, ttm_bo_type_sg,
+  mem->bo->tbo.base.resv,
+  &gobj);
+   if (ret)
+   return ret;
+
+   amdgpu_bo_unreserve(mem->bo);
+
+   *bo = gem_to_amdgpu_bo(gobj);
+   return 0;
+}
+
 /* kfd_mem_attach - Add a BO to a VM
  *
  * Everything that needs to bo done only once when a BO is first added
@@ -596,7 +622,6 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
uint64_t va = mem->va;
struct kfd_mem_attachment *attachment[2] = {NULL, NULL};
struct amdgpu_bo *bo[2] = {NULL, NULL};
-   struct drm_gem_object *gobj;
int i, ret;
 
if (!va) {
@@ -630,14 +655,9 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
} else if (amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm)) {
/* Create an SG BO to DMA-map userptrs on other GPUs */
attachment[i]->type = KFD_MEM_ATT_USERPTR;
-   ret = amdgpu_gem_object_create(adev, bo_size, 1,
-  AMDGPU_GEM_DOMAIN_CPU,
-  0, ttm_bo_type_sg,
-  mem->bo->tbo.base.resv,
-  &gobj);
+   ret = kfd_mem_attach_userptr(adev, mem, &bo[i]);
if (ret)
goto unwind;
-   bo[i] = gem_to_amdgpu_bo(gobj);
} else {
/* FIXME: Need to DMA-map other BO types */
attachment[i]->type = KFD_MEM_ATT_SHARED;
@@ -662,13 +682,6 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
va += bo_size;
}
 
-   /* Allocate validate page tables if needed */
-   ret = vm_validate_pt_pd_bos(vm);
-   if (unlikely(ret)) {
-   pr_err("validate_pt_pd_bos() failed\n");
-   goto unwind;
-   }
-
return 0;
 
 unwind:
@@ -1516,12 +1529,12 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
mem->va + bo_size * (1 + mem->aql_queue));
 
+   ret = unreserve_bo_and_vms(&ctx, false, false);
+
/* Remove from VM internal data structures */
list_for_each_entry_safe(entry, tmp, &mem->attachments, list)
kfd_mem_detach(mem, entry);
 
-   ret = unreserve_bo_and_vms(&ctx, false, false);
-
/* Free the sync object */
amdgpu_sync_free(&mem->sync);
 
@@ -1597,6 +1610,12 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
mem->va + bo_size * (1 + mem->aql_queue),
vm, domain_string(domain));
 
+   if (!kfd_mem_is_attached(avm, mem)) {
+   ret = kfd_mem_attach(adev, mem, avm, mem->aql_queue);
+   if (ret)
+   goto out;
+   }
+
ret = reserve_bo_and_vm(mem, vm, &ctx);
if (unlikely(ret))
goto out;
@@ -1610,15 +1629,9 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
bo->tbo.mem.mem_type == TTM_PL_SYSTEM)
is_invalid_userptr = true;
 
-   if (!kfd_mem_is_attached(avm, mem)) {
-   ret = kfd_mem_attach(adev, mem, avm, mem->aql_queue);
-   if (ret)
-   goto attach_failed;
-   } else {
-   ret = vm_validate_pt_pd_bos(avm);
-   if (unlikely(ret))
-   goto attach_failed;
-   }
+   ret = vm_validate_pt_pd_bos(avm);
+   if (unlikely(ret))
+   goto out_unreserve;
 
if (mem->mapped_to_gpu_memor

[PATCH 8/9] drm/ttm: Don't count pages in SG BOs against pages_limit

2021-04-13 Thread Felix Kuehling
Pages in SG BOs were not allocated by TTM. So don't count them against
TTM's pages limit.

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/ttm/ttm_tt.c | 27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 5d8820725b75..e8b8c3257392 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -317,9 +317,12 @@ int ttm_tt_populate(struct ttm_device *bdev,
if (ttm_tt_is_populated(ttm))
return 0;
 
-   atomic_long_add(ttm->num_pages, &ttm_pages_allocated);
-   if (bdev->pool.use_dma32)
-   atomic_long_add(ttm->num_pages, &ttm_dma32_pages_allocated);
+   if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) {
+   atomic_long_add(ttm->num_pages, &ttm_pages_allocated);
+   if (bdev->pool.use_dma32)
+   atomic_long_add(ttm->num_pages,
+   &ttm_dma32_pages_allocated);
+   }
 
while (atomic_long_read(&ttm_pages_allocated) > ttm_pages_limit ||
   atomic_long_read(&ttm_dma32_pages_allocated) >
@@ -350,9 +353,12 @@ int ttm_tt_populate(struct ttm_device *bdev,
return 0;
 
 error:
-   atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
-   if (bdev->pool.use_dma32)
-   atomic_long_sub(ttm->num_pages, &ttm_dma32_pages_allocated);
+   if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) {
+   atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
+   if (bdev->pool.use_dma32)
+   atomic_long_sub(ttm->num_pages,
+   &ttm_dma32_pages_allocated);
+   }
return ret;
 }
 EXPORT_SYMBOL(ttm_tt_populate);
@@ -382,9 +388,12 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm)
else
ttm_pool_free(&bdev->pool, ttm);
 
-   atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
-   if (bdev->pool.use_dma32)
-   atomic_long_sub(ttm->num_pages, &ttm_dma32_pages_allocated);
+   if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) {
+   atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
+   if (bdev->pool.use_dma32)
+   atomic_long_sub(ttm->num_pages,
+   &ttm_dma32_pages_allocated);
+   }
 
ttm->page_flags &= ~TTM_PAGE_FLAG_PRIV_POPULATED;
 }
-- 
2.31.1



[PATCH 7/9] drm/amdgpu: Add DMA mapping of GTT BOs

2021-04-13 Thread Felix Kuehling
Use DMABufs with dynamic attachment to DMA-map GTT BOs on other GPUs.

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  2 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 74 ++-
 2 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index fc3514ed1b74..3ea51982b720 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -41,6 +41,7 @@ struct amdgpu_device;
 enum kfd_mem_attachment_type {
KFD_MEM_ATT_SHARED, /* Share kgd_mem->bo or another attachment's */
KFD_MEM_ATT_USERPTR,/* SG bo to DMA map pages from a userptr bo */
+   KFD_MEM_ATT_DMABUF, /* DMAbuf to DMA map TTM BOs */
 };
 
 struct kfd_mem_attachment {
@@ -56,6 +57,7 @@ struct kfd_mem_attachment {
 struct kgd_mem {
struct mutex lock;
struct amdgpu_bo *bo;
+   struct dma_buf *dmabuf;
struct list_head attachments;
/* protected by amdkfd_process_info.lock */
struct ttm_validate_buffer validate_list;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 1416f3c03f1d..bb3a96ab8f20 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -522,6 +522,16 @@ kfd_mem_dmamap_userptr(struct kgd_mem *mem,
return ret;
 }
 
+static int
+kfd_mem_dmamap_dmabuf(struct kfd_mem_attachment *attachment)
+{
+   struct ttm_operation_ctx ctx = {.interruptible = true};
+   struct amdgpu_bo *bo = attachment->bo_va->base.bo;
+
+   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
+   return ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+}
+
 static int
 kfd_mem_dmamap_attachment(struct kgd_mem *mem,
  struct kfd_mem_attachment *attachment)
@@ -531,6 +541,8 @@ kfd_mem_dmamap_attachment(struct kgd_mem *mem,
return 0;
case KFD_MEM_ATT_USERPTR:
return kfd_mem_dmamap_userptr(mem, attachment);
+   case KFD_MEM_ATT_DMABUF:
+   return kfd_mem_dmamap_dmabuf(attachment);
default:
WARN_ON_ONCE(1);
}
@@ -560,6 +572,19 @@ kfd_mem_dmaunmap_userptr(struct kgd_mem *mem,
ttm->sg = NULL;
 }
 
+static void
+kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment *attachment)
+{
+   struct ttm_operation_ctx ctx = {.interruptible = true};
+   struct amdgpu_bo *bo = attachment->bo_va->base.bo;
+
+   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
+   ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+   /* FIXME: This does not guarantee that amdgpu_ttm_tt_unpopulate is
+* called
+*/
+}
+
 static void
 kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
struct kfd_mem_attachment *attachment)
@@ -570,6 +595,9 @@ kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
case KFD_MEM_ATT_USERPTR:
kfd_mem_dmaunmap_userptr(mem, attachment);
break;
+   case KFD_MEM_ATT_DMABUF:
+   kfd_mem_dmaunmap_dmabuf(attachment);
+   break;
default:
WARN_ON_ONCE(1);
}
@@ -601,6 +629,36 @@ kfd_mem_attach_userptr(struct amdgpu_device *adev, struct kgd_mem *mem,
return 0;
 }
 
+static int
+kfd_mem_attach_dmabuf(struct amdgpu_device *adev, struct kgd_mem *mem,
+ struct amdgpu_bo **bo)
+{
+   struct drm_gem_object *gobj;
+
+   if (!mem->dmabuf) {
+   mem->dmabuf = amdgpu_gem_prime_export(&mem->bo->tbo.base,
+   mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ?
+   DRM_RDWR : 0);
+   if (IS_ERR(mem->dmabuf)) {
+   mem->dmabuf = NULL;
+   return PTR_ERR(mem->dmabuf);
+   }
+   }
+
+   gobj = amdgpu_gem_prime_import(&adev->ddev, mem->dmabuf);
+   if (IS_ERR(gobj))
+   return PTR_ERR(gobj);
+
+   /* Import takes an extra reference on the dmabuf. Drop it now to
+* avoid leaking it. We only need the one reference in
+* kgd_mem->dmabuf.
+*/
+   dma_buf_put(mem->dmabuf);
+
+   *bo = gem_to_amdgpu_bo(gobj);
+   return 0;
+}
+
 /* kfd_mem_attach - Add a BO to a VM
  *
  * Everything that needs to bo done only once when a BO is first added
@@ -658,8 +716,20 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
ret = kfd_mem_attach_userptr(adev, mem, &bo[i]);
if (ret)
goto unwind;
+   } else if (mem->domain == AMDGPU_GEM_DOMAIN_GTT &&
+  mem->bo->tbo.type != ttm_bo_type_sg) {
+   /* GTT BOs use DMA-mapping ability of dynamic-attach
+  

[PATCH 3/9] drm/amdgpu: Simplify AQL queue mapping

2021-04-13 Thread Felix Kuehling
Do AQL queue double-mapping with a single attach call. That will make it
easier to create per-GPU BOs later, to be shared between the two BO VA
mappings on the same GPU.

Freeing the attachments is not necessary if map_to_gpu fails. These will be
cleaned up when the kgd_mem object is destroyed in
amdgpu_amdkfd_gpuvm_free_memory_of_gpu.

Signed-off-by: Felix Kuehling 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 103 --
 1 file changed, 48 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 40a296ca37b9..114fbf508707 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -484,70 +484,76 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
  * 4a.  Validate new page tables and directories
  */
 static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
-   struct amdgpu_vm *vm, bool is_aql,
-   struct kfd_mem_attachment **p_attachment)
+   struct amdgpu_vm *vm, bool is_aql)
 {
unsigned long bo_size = mem->bo->tbo.base.size;
uint64_t va = mem->va;
-   struct kfd_mem_attachment *attachment;
-   struct amdgpu_bo *bo;
-   int ret;
+   struct kfd_mem_attachment *attachment[2] = {NULL, NULL};
+   struct amdgpu_bo *bo[2] = {NULL, NULL};
+   int i, ret;
 
if (!va) {
pr_err("Invalid VA when adding BO to VM\n");
return -EINVAL;
}
 
-   if (is_aql)
-   va += bo_size;
-
-   attachment = kzalloc(sizeof(*attachment), GFP_KERNEL);
-   if (!attachment)
-   return -ENOMEM;
+   for (i = 0; i <= is_aql; i++) {
+   attachment[i] = kzalloc(sizeof(*attachment[i]), GFP_KERNEL);
+   if (unlikely(!attachment[i])) {
+   ret = -ENOMEM;
+   goto unwind;
+   }
 
-   pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
-   va + bo_size, vm);
+   pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
+va + bo_size, vm);
 
-   /* FIXME: For now all attachments use the same BO. This is incorrect
-* because one BO can only have one DMA mapping for one GPU. We need
-* one BO per GPU, e.g. a DMABuf import with dynamic attachment. This
-* will be addressed one BO-type at a time in subsequent patches.
-*/
-   bo = mem->bo;
-   drm_gem_object_get(&bo->tbo.base);
+   /* FIXME: For now all attachments use the same BO. This is
+* incorrect because one BO can only have one DMA mapping
+* for one GPU. We need one BO per GPU, e.g. a DMABuf
+* import with dynamic attachment. This will be addressed
+* one BO-type at a time in subsequent patches.
+*/
+   bo[i] = mem->bo;
+   drm_gem_object_get(&bo[i]->tbo.base);
 
-   /* Add BO to VM internal data structures*/
-   attachment->bo_va = amdgpu_vm_bo_add(adev, vm, bo);
-   if (!attachment->bo_va) {
-   ret = -EINVAL;
-   pr_err("Failed to add BO object to VM. ret == %d\n",
-   ret);
-   goto err_vmadd;
-   }
+   /* Add BO to VM internal data structures */
+   attachment[i]->bo_va = amdgpu_vm_bo_add(adev, vm, bo[i]);
+   if (unlikely(!attachment[i]->bo_va)) {
+   ret = -ENOMEM;
+   pr_err("Failed to add BO object to VM. ret == %d\n",
+  ret);
+   goto unwind;
+   }
 
-   attachment->va = va;
-   attachment->pte_flags = get_pte_flags(adev, mem);
-   attachment->adev = adev;
-   list_add(&attachment->list, &mem->attachments);
+   attachment[i]->va = va;
+   attachment[i]->pte_flags = get_pte_flags(adev, mem);
+   attachment[i]->adev = adev;
+   list_add(&attachment[i]->list, &mem->attachments);
 
-   if (p_attachment)
-   *p_attachment = attachment;
+   va += bo_size;
+   }
 
/* Allocate validate page tables if needed */
ret = vm_validate_pt_pd_bos(vm);
if (unlikely(ret)) {
pr_err("validate_pt_pd_bos() failed\n");
-   goto err_alloc_pts;
+   goto unwind;
}
 
return 0;
 
-err_alloc_pts:
-   amdgpu_vm_bo_rmv(adev, attachment->bo_va);
-   list_del(&attachment->list);
-err_vmadd:
-   drm_gem_object_put(&bo->tbo.base);
-   kfree(attachment);
+unwind:
+   for (; i >= 0; i--) {
+   if (!attachment[i])
+   continue;
+   if (attachment[i]->bo_va) {
+   amdgpu_vm_bo_rmv(adev, 

[PATCH 0/9] Implement multi-GPU DMA mappings for KFD

2021-04-13 Thread Felix Kuehling
This patch series fixes DMA-mappings of system memory (GTT and userptr)
for KFD running on multi-GPU systems with IOMMU enabled. One SG-BO per
GPU is needed to maintain the DMA mappings of each BO.

I ran into some reservation issues when unmapping or freeing DMA-buf
imports. There are a few FIXME comments in this patch series where I'm
hoping for some expert advice. Patches 8 and 9 are some related fixes
in TTM and amdgpu_ttm. I'm pretty sure patch 9 is not the right way to
do this.

Felix Kuehling (9):
  drm/amdgpu: Rename kfd_bo_va_list to kfd_mem_attachment
  drm/amdgpu: Keep a bo-reference per-attachment
  drm/amdgpu: Simplify AQL queue mapping
  drm/amdgpu: Add multi-GPU DMA mapping helpers
  drm/amdgpu: DMA map/unmap when updating GPU mappings
  drm/amdgpu: Move kfd_mem_attach outside reservation
  drm/amdgpu: Add DMA mapping of GTT BOs
  drm/ttm: Don't count pages in SG BOs against pages_limit
  drm/amdgpu: Lock the attached dmabuf in unpopulate

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  18 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 535 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  13 +
 drivers/gpu/drm/ttm/ttm_tt.c  |  27 +-
 4 files changed, 420 insertions(+), 173 deletions(-)

-- 
2.31.1



[PATCH 1/9] drm/amdgpu: Rename kfd_bo_va_list to kfd_mem_attachment

2021-04-13 Thread Felix Kuehling
This name is more fitting, especially for the changes coming next to
support multi-GPU systems with proper DMA mappings. Cleaned up the code
and renamed some related functions and variables to improve readability.

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   8 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 209 +-
 2 files changed, 104 insertions(+), 113 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 14f68c028126..910c50956e16 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -38,10 +38,10 @@ extern uint64_t amdgpu_amdkfd_total_mem_size;
 
 struct amdgpu_device;
 
-struct kfd_bo_va_list {
-   struct list_head bo_list;
+struct kfd_mem_attachment {
+   struct list_head list;
struct amdgpu_bo_va *bo_va;
-   void *kgd_dev;
+   struct amdgpu_device *adev;
bool is_mapped;
uint64_t va;
uint64_t pte_flags;
@@ -50,7 +50,7 @@ struct kfd_bo_va_list {
 struct kgd_mem {
struct mutex lock;
struct amdgpu_bo *bo;
-   struct list_head bo_va_list;
+   struct list_head attachments;
/* protected by amdkfd_process_info.lock */
struct ttm_validate_buffer validate_list;
struct ttm_validate_buffer resv_list;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 6622695a5eed..d021152314ad 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -75,16 +75,16 @@ static inline struct amdgpu_device *get_amdgpu_device(struct kgd_dev *kgd)
return (struct amdgpu_device *)kgd;
 }
 
-static bool check_if_add_bo_to_vm(struct amdgpu_vm *avm,
+static bool kfd_mem_is_attached(struct amdgpu_vm *avm,
struct kgd_mem *mem)
 {
-   struct kfd_bo_va_list *entry;
+   struct kfd_mem_attachment *entry;
 
-   list_for_each_entry(entry, &mem->bo_va_list, bo_list)
+   list_for_each_entry(entry, &mem->attachments, list)
if (entry->bo_va->base.vm == avm)
-   return false;
+   return true;
 
-   return true;
+   return false;
 }
 
 /* Set memory usage limits. Current, limits are
@@ -471,7 +471,7 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
return pte_flags;
 }
 
-/* add_bo_to_vm - Add a BO to a VM
+/* kfd_mem_attach - Add a BO to a VM
  *
  * Everything that needs to bo done only once when a BO is first added
  * to a VM. It can later be mapped and unmapped many times without
@@ -483,15 +483,14 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
  * 4. Alloc page tables and directories if needed
  * 4a.  Validate new page tables and directories
  */
-static int add_bo_to_vm(struct amdgpu_device *adev, struct kgd_mem *mem,
+static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
struct amdgpu_vm *vm, bool is_aql,
-   struct kfd_bo_va_list **p_bo_va_entry)
+   struct kfd_mem_attachment **p_attachment)
 {
int ret;
-   struct kfd_bo_va_list *bo_va_entry;
+   struct kfd_mem_attachment *attachment;
struct amdgpu_bo *bo = mem->bo;
uint64_t va = mem->va;
-   struct list_head *list_bo_va = &mem->bo_va_list;
unsigned long bo_size = bo->tbo.base.size;
 
if (!va) {
@@ -502,29 +501,29 @@ static int add_bo_to_vm(struct amdgpu_device *adev, struct kgd_mem *mem,
if (is_aql)
va += bo_size;
 
-   bo_va_entry = kzalloc(sizeof(*bo_va_entry), GFP_KERNEL);
-   if (!bo_va_entry)
+   attachment = kzalloc(sizeof(*attachment), GFP_KERNEL);
+   if (!attachment)
return -ENOMEM;
 
pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
va + bo_size, vm);
 
/* Add BO to VM internal data structures*/
-   bo_va_entry->bo_va = amdgpu_vm_bo_add(adev, vm, bo);
-   if (!bo_va_entry->bo_va) {
+   attachment->bo_va = amdgpu_vm_bo_add(adev, vm, bo);
+   if (!attachment->bo_va) {
ret = -EINVAL;
pr_err("Failed to add BO object to VM. ret == %d\n",
ret);
goto err_vmadd;
}
 
-   bo_va_entry->va = va;
-   bo_va_entry->pte_flags = get_pte_flags(adev, mem);
-   bo_va_entry->kgd_dev = (void *)adev;
-   list_add(&bo_va_entry->bo_list, list_bo_va);
+   attachment->va = va;
+   attachment->pte_flags = get_pte_flags(adev, mem);
+   attachment->adev = adev;
+   list_add(&attachment->list, &mem->attachments);
 
-   if (p_bo_va_entry)
-   *p_bo_va_entry = bo_va_entry;
+   if (p_attachment)
+   *p_attachment = attachment;
 
/* Allocate validate pag

[PATCH 4/9] drm/amdgpu: Add multi-GPU DMA mapping helpers

2021-04-13 Thread Felix Kuehling
Add BO-type-specific helper functions to DMA-map and unmap
kfd_mem_attachments. Implement this functionality for userptrs by creating
one SG BO per GPU and filling it with a DMA mapping of the pages from the
original mem->bo.

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   8 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 155 --
 2 files changed, 152 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 910c50956e16..fc3514ed1b74 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -38,11 +38,17 @@ extern uint64_t amdgpu_amdkfd_total_mem_size;
 
 struct amdgpu_device;
 
+enum kfd_mem_attachment_type {
+   KFD_MEM_ATT_SHARED, /* Share kgd_mem->bo or another attachment's */
+   KFD_MEM_ATT_USERPTR,/* SG bo to DMA map pages from a userptr bo */
+};
+
 struct kfd_mem_attachment {
struct list_head list;
+   enum kfd_mem_attachment_type type;
+   bool is_mapped;
struct amdgpu_bo_va *bo_va;
struct amdgpu_device *adev;
-   bool is_mapped;
uint64_t va;
uint64_t pte_flags;
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 114fbf508707..51502a07fc1d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -471,12 +471,117 @@ static uint64_t get_pte_flags(struct amdgpu_device 
*adev, struct kgd_mem *mem)
return pte_flags;
 }
 
+static int
+kfd_mem_dmamap_userptr(struct kgd_mem *mem,
+  struct kfd_mem_attachment *attachment)
+{
+   enum dma_data_direction direction =
+   mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ?
+   DMA_BIDIRECTIONAL : DMA_TO_DEVICE;
+   struct ttm_operation_ctx ctx = {.interruptible = true};
+   struct amdgpu_bo *bo = attachment->bo_va->base.bo;
+   struct amdgpu_device *adev = attachment->adev;
+   struct ttm_tt *src_ttm = mem->bo->tbo.ttm;
+   struct ttm_tt *ttm = bo->tbo.ttm;
+   int ret;
+
+   ttm->sg = kmalloc(sizeof(*ttm->sg), GFP_KERNEL);
+   if (unlikely(!ttm->sg))
+   return -ENOMEM;
+
+   if (WARN_ON(ttm->num_pages != src_ttm->num_pages))
+   return -EINVAL;
+
+   /* Same sequence as in amdgpu_ttm_tt_pin_userptr */
+   ret = sg_alloc_table_from_pages(ttm->sg, src_ttm->pages,
+   ttm->num_pages, 0,
+   (u64)ttm->num_pages << PAGE_SHIFT,
+   GFP_KERNEL);
+   if (unlikely(ret))
+   goto release_sg;
+
+   ret = dma_map_sgtable(adev->dev, ttm->sg, direction, 0);
+   if (unlikely(ret))
+   goto release_sg;
+
+   drm_prime_sg_to_dma_addr_array(ttm->sg, ttm->dma_address,
+  ttm->num_pages);
+
+   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
+   ret = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+   if (ret)
+   goto release_sg;
+
+   return 0;
+
+release_sg:
+   pr_err("DMA map userptr failed: %d\n", ret);
+   sg_free_table(ttm->sg);
+   kfree(ttm->sg);
+   ttm->sg = NULL;
+   return ret;
+}
+
+static int
+kfd_mem_dmamap_attachment(struct kgd_mem *mem,
+ struct kfd_mem_attachment *attachment)
+{
+   switch (attachment->type) {
+   case KFD_MEM_ATT_SHARED:
+   return 0;
+   case KFD_MEM_ATT_USERPTR:
+   return kfd_mem_dmamap_userptr(mem, attachment);
+   default:
+   WARN_ON_ONCE(1);
+   }
+   return -EINVAL;
+}
+
+static void
+kfd_mem_dmaunmap_userptr(struct kgd_mem *mem,
+struct kfd_mem_attachment *attachment)
+{
+   enum dma_data_direction direction =
+   mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ?
+   DMA_BIDIRECTIONAL : DMA_TO_DEVICE;
+   struct ttm_operation_ctx ctx = {.interruptible = false};
+   struct amdgpu_bo *bo = attachment->bo_va->base.bo;
+   struct amdgpu_device *adev = attachment->adev;
+   struct ttm_tt *ttm = bo->tbo.ttm;
+
+   if (unlikely(!ttm->sg))
+   return;
+
+   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
+   ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+
+   dma_unmap_sgtable(adev->dev, ttm->sg, direction, 0);
+   sg_free_table(ttm->sg);
+   ttm->sg = NULL;
+}
+
+static void
+kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
+   struct kfd_mem_attachment *attachment)
+{
+   switch (attachment->type) {
+   case KFD_MEM_ATT_SHARED:
+   break;
+   case KFD_MEM_ATT_USERPTR:
+   kfd_mem_dmaunmap_userptr(mem, attachment);
+   

[PATCH 2/9] drm/amdgpu: Keep a bo-reference per-attachment

2021-04-13 Thread Felix Kuehling
For now they all reference the same BO. For correct DMA mappings they will
refer to different BOs per-GPU.

Signed-off-by: Felix Kuehling 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 22 ++-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index d021152314ad..40a296ca37b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -487,11 +487,11 @@ static int kfd_mem_attach(struct amdgpu_device *adev, 
struct kgd_mem *mem,
struct amdgpu_vm *vm, bool is_aql,
struct kfd_mem_attachment **p_attachment)
 {
-   int ret;
-   struct kfd_mem_attachment *attachment;
-   struct amdgpu_bo *bo = mem->bo;
+   unsigned long bo_size = mem->bo->tbo.base.size;
uint64_t va = mem->va;
-   unsigned long bo_size = bo->tbo.base.size;
+   struct kfd_mem_attachment *attachment;
+   struct amdgpu_bo *bo;
+   int ret;
 
if (!va) {
pr_err("Invalid VA when adding BO to VM\n");
@@ -508,6 +508,14 @@ static int kfd_mem_attach(struct amdgpu_device *adev, 
struct kgd_mem *mem,
pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
va + bo_size, vm);
 
+   /* FIXME: For now all attachments use the same BO. This is incorrect
+* because one BO can only have one DMA mapping for one GPU. We need
+* one BO per GPU, e.g. a DMABuf import with dynamic attachment. This
+* will be addressed one BO-type at a time in subsequent patches.
+*/
+   bo = mem->bo;
+   drm_gem_object_get(&bo->tbo.base);
+
/* Add BO to VM internal data structures*/
attachment->bo_va = amdgpu_vm_bo_add(adev, vm, bo);
if (!attachment->bo_va) {
@@ -527,7 +535,7 @@ static int kfd_mem_attach(struct amdgpu_device *adev, 
struct kgd_mem *mem,
 
/* Allocate validate page tables if needed */
ret = vm_validate_pt_pd_bos(vm);
-   if (ret) {
+   if (unlikely(ret)) {
pr_err("validate_pt_pd_bos() failed\n");
goto err_alloc_pts;
}
@@ -538,15 +546,19 @@ static int kfd_mem_attach(struct amdgpu_device *adev, 
struct kgd_mem *mem,
amdgpu_vm_bo_rmv(adev, attachment->bo_va);
list_del(&attachment->list);
 err_vmadd:
+   drm_gem_object_put(&bo->tbo.base);
kfree(attachment);
return ret;
 }
 
 static void kfd_mem_detach(struct kfd_mem_attachment *attachment)
 {
+   struct amdgpu_bo *bo = attachment->bo_va->base.bo;
+
pr_debug("\t remove VA 0x%llx in entry %p\n",
attachment->va, attachment);
amdgpu_vm_bo_rmv(attachment->adev, attachment->bo_va);
+   drm_gem_object_put(&bo->tbo.base);
list_del(&attachment->list);
kfree(attachment);
 }
-- 
2.31.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Use iterator methods exposed by amdgpu_res_cursor.h in building SG_TABLE's for a VRAM BO

2021-04-13 Thread Christian König

Am 13.04.21 um 20:26 schrieb Ramesh Errabolu:

Extend the current implementation of the SG_TABLE construction method to
allow exporting sub-buffers of a VRAM BO. This capability will
enable logical partitioning of a VRAM BO into multiple non-overlapping
sub-buffers. One example of this use case is to partition a VRAM BO
into two sub-buffers, one for SRC and another for DST.

Signed-off-by: Ramesh Errabolu 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c  |  4 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h  |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 34 ++--
  3 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index e0c4f7c7f1b9..baa980a477d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -291,8 +291,8 @@ static struct sg_table *amdgpu_dma_buf_map(struct 
dma_buf_attachment *attach,
break;
  
  	case TTM_PL_VRAM:

-   r = amdgpu_vram_mgr_alloc_sgt(adev, &bo->tbo.mem, attach->dev,
- dir, &sgt);
+   r = amdgpu_vram_mgr_alloc_sgt(adev, &bo->tbo.mem, 0,
+   bo->tbo.base.size, attach->dev, dir, &sgt);
if (r)
return ERR_PTR(r);
break;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index dec0db8b0b13..9e38475e0f8d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -112,6 +112,7 @@ int amdgpu_gtt_mgr_recover(struct ttm_resource_manager 
*man);
  u64 amdgpu_vram_mgr_bo_visible_size(struct amdgpu_bo *bo);
  int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
  struct ttm_resource *mem,
+ u64 offset, u64 size,
  struct device *dev,
  enum dma_data_direction dir,
  struct sg_table **sgt);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 592a2dd16493..bce105e2973e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -25,6 +25,7 @@
  #include 
  #include "amdgpu.h"
  #include "amdgpu_vm.h"
+#include "amdgpu_res_cursor.h"
  #include "amdgpu_atomfirmware.h"
  #include "atom.h"
  
@@ -565,6 +566,8 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man,

   *
   * @adev: amdgpu device pointer
   * @mem: TTM memory object
+ * @offset: byte offset from the base of VRAM BO
+ * @length: number of bytes to export in sg_table
   * @dev: the other device
   * @dir: dma direction
   * @sgt: resulting sg table
@@ -573,39 +576,47 @@ static void amdgpu_vram_mgr_del(struct 
ttm_resource_manager *man,
   */
  int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
  struct ttm_resource *mem,
+ u64 offset, u64 length,
  struct device *dev,
  enum dma_data_direction dir,
  struct sg_table **sgt)
  {
-   struct drm_mm_node *node;
+   struct amdgpu_res_cursor cursor;
struct scatterlist *sg;
int num_entries = 0;
-   unsigned int pages;
int i, r;
  
  	*sgt = kmalloc(sizeof(**sgt), GFP_KERNEL);

if (!*sgt)
return -ENOMEM;
  
-	for (pages = mem->num_pages, node = mem->mm_node;

-pages; pages -= node->size, ++node)
-   ++num_entries;
+   /* Determine the number of DRM_MM nodes to export */
+   amdgpu_res_first(mem, offset, length, &cursor);
+   while (cursor.remaining) {
+   num_entries++;
+   amdgpu_res_next(&cursor, cursor.size);
+   }
  
  	r = sg_alloc_table(*sgt, num_entries, GFP_KERNEL);

if (r)
goto error_free;
  
+	/* Initialize scatterlist nodes of sg_table */

for_each_sgtable_sg((*sgt), sg, i)
sg->length = 0;
  
-	node = mem->mm_node;

+   /*
+* Walk down DRM_MM nodes to populate scatterlist nodes
+* @note: Use the iterator API to get the first DRM_MM node
+* and the number of bytes from it. Access the following
+* DRM_MM node(s) if more of the buffer needs to be exported
+*/
+   amdgpu_res_first(mem, offset, length, &cursor);
for_each_sgtable_sg((*sgt), sg, i) {
-   phys_addr_t phys = (node->start << PAGE_SHIFT) +
-   adev->gmc.aper_base;
-   size_t size = node->size << PAGE_SHIFT;
+   phys_addr_t phys = cursor.start + adev->gmc.aper_base;
+   size_t size = cursor.size;
dma_addr_t addr;
  
-		++node;

addr = dma_map_res

Re: [PATCH 3/4] drm/amdkfd: Allow access for mmapping KFD BOs

2021-04-13 Thread Christian König

Am 08.04.21 um 01:12 schrieb Felix Kuehling:

DRM allows access automatically when it creates a GEM handle for a BO.
KFD BOs don't have GEM handles, so KFD needs to manage access manually.


Ok, double checking the code that makes sense.


Signed-off-by: Felix Kuehling 


Acked-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  3 ++-
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 19 ++-
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  8 +---
  drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  7 ---
  4 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 0d59bebd92af..7c8c5e469707 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -245,7 +245,8 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
void *drm_priv, struct kgd_mem **mem,
uint64_t *offset, uint32_t flags);
  int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
-   struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *size);
+   struct kgd_dev *kgd, struct kgd_mem *mem, void *drm_priv,
+   uint64_t *size);
  int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
struct kgd_dev *kgd, struct kgd_mem *mem, void *drm_priv);
  int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 95442bcd60fb..e7d61ec966b6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1232,6 +1232,12 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 domain_string(alloc_domain), ret);
goto err_bo_create;
}
+   ret = drm_vma_node_allow(&gobj->vma_node, drm_priv);
+   if (ret) {
+   pr_debug("Failed to allow vma node access. ret %d\n",
+ret);
+   goto err_node_allow;
+   }
bo = gem_to_amdgpu_bo(gobj);
if (bo_type == ttm_bo_type_sg) {
bo->tbo.sg = sg;
@@ -1261,6 +1267,8 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
  
  allocate_init_user_pages_failed:

remove_kgd_mem_from_kfd_bo_list(*mem, avm->process_info);
+   drm_vma_node_revoke(&gobj->vma_node, drm_priv);
+err_node_allow:
amdgpu_bo_unref(&bo);
/* Don't unreserve system mem limit twice */
goto err_reserve_limit;
@@ -1278,7 +1286,8 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
  }
  
  int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(

-   struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *size)
+   struct kgd_dev *kgd, struct kgd_mem *mem, void *drm_priv,
+   uint64_t *size)
  {
struct amdkfd_process_info *process_info = mem->process_info;
unsigned long bo_size = mem->bo->tbo.base.size;
@@ -1355,6 +1364,7 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
}
  
  	/* Free the BO*/

+   drm_vma_node_revoke(&mem->bo->tbo.base.vma_node, drm_priv);
drm_gem_object_put(&mem->bo->tbo.base);
mutex_destroy(&mem->lock);
kfree(mem);
@@ -1666,6 +1676,7 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev *kgd,
struct amdgpu_vm *avm = drm_priv_to_vm(drm_priv);
struct drm_gem_object *obj;
struct amdgpu_bo *bo;
+   int ret;
  
  	if (dma_buf->ops != &amdgpu_dmabuf_ops)

/* Can't handle non-graphics buffers */
@@ -1686,6 +1697,12 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev 
*kgd,
if (!*mem)
return -ENOMEM;
  
+	ret = drm_vma_node_allow(&obj->vma_node, drm_priv);

+   if (ret) {
+   kfree(mem);
+   return ret;
+   }
+
if (size)
*size = amdgpu_bo_size(bo);
  
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c

index 43de260b2230..8fc18de7cff4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1328,7 +1328,8 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file 
*filep,
return 0;
  
  err_free:

-   amdgpu_amdkfd_gpuvm_free_memory_of_gpu(dev->kgd, (struct kgd_mem *)mem, 
NULL);
+   amdgpu_amdkfd_gpuvm_free_memory_of_gpu(dev->kgd, (struct kgd_mem *)mem,
+  pdd->vm, NULL);
  err_unlock:
mutex_unlock(&p->mutex);
return err;
@@ -1365,7 +1366,7 @@ static int kfd_ioctl_free_memory_of_gpu(struct file 
*filep,
}
  
  	ret = amdgpu_amdkfd_gpuvm_free_memory_of_gpu(dev->kgd,

-   (struct kgd_mem *)mem, &size);
+   (struct kgd_mem *)mem, pdd->vm, &size);
  
  	/* If freeing the buffer failed, leave the handle in place for

 * clean-up during process tear

Re: [PATCH] drm/amd/pm: remove the "set" function of pp_dpm_mclk for vangogh

2021-04-13 Thread Huang Rui
On Wed, Apr 14, 2021 at 02:20:10PM +0800, Du, Xiaojian wrote:
> This patch is to remove the "set" function of pp_dpm_mclk for vangogh.
> For vangogh, mclk is bonded with fclk; they lock each other
> on the same performance level. But according to the smu message from pmfw,
> only fclk is allowed to set value manually, so remove the unnecessary
> code of "set" function for mclk.
> 
> Signed-off-by: Xiaojian Du 

Acked-by: Huang Rui 

> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 6 --
>  1 file changed, 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
> index 61ff9a663b21..35904315c1f9 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
> @@ -1092,7 +1092,6 @@ static int vangogh_set_soft_freq_limited_range(struct 
> smu_context *smu,
>   return ret;
>   break;
>   case SMU_FCLK:
> - case SMU_MCLK:
>   ret = smu_cmn_send_smc_msg_with_param(smu,
>   
> SMU_MSG_SetHardMinFclkByFreq,
>   min, NULL);
> @@ -1180,7 +1179,6 @@ static int vangogh_force_clk_levels(struct smu_context 
> *smu,
>   if (ret)
>   return ret;
>   break;
> - case SMU_MCLK:
>   case SMU_FCLK:
>   ret = vangogh_get_dpm_clk_limited(smu,
>   clk_type, 
> soft_min_level, &min_freq);
> @@ -1267,7 +1265,6 @@ static int vangogh_force_dpm_limit_value(struct 
> smu_context *smu, bool highest)
>   SMU_SOCCLK,
>   SMU_VCLK,
>   SMU_DCLK,
> - SMU_MCLK,
>   SMU_FCLK,
>   };
>  
> @@ -1296,7 +1293,6 @@ static int vangogh_unforce_dpm_levels(struct 
> smu_context *smu)
>   enum smu_clk_type clk_type;
>   uint32_tfeature;
>   } clk_feature_map[] = {
> - {SMU_MCLK,   SMU_FEATURE_DPM_FCLK_BIT},
>   {SMU_FCLK, SMU_FEATURE_DPM_FCLK_BIT},
>   {SMU_SOCCLK, SMU_FEATURE_DPM_SOCCLK_BIT},
>   {SMU_VCLK, SMU_FEATURE_VCN_DPM_BIT},
> @@ -1428,7 +1424,6 @@ static int vangogh_set_performance_level(struct 
> smu_context *smu,
>   if (ret)
>   return ret;
>  
> - vangogh_force_clk_levels(smu, SMU_MCLK, 1 << mclk_mask);
>   vangogh_force_clk_levels(smu, SMU_FCLK, 1 << fclk_mask);
>   vangogh_force_clk_levels(smu, SMU_SOCCLK, 1 << soc_mask);
>   vangogh_force_clk_levels(smu, SMU_VCLK, 1 << vclk_mask);
> @@ -1468,7 +1463,6 @@ static int vangogh_set_performance_level(struct 
> smu_context *smu,
>   if (ret)
>   return ret;
>  
> - vangogh_force_clk_levels(smu, SMU_MCLK, 1 << mclk_mask);
>   vangogh_force_clk_levels(smu, SMU_FCLK, 1 << fclk_mask);
>   break;
>   case AMD_DPM_FORCED_LEVEL_PROFILE_PEAK:
> -- 
> 2.25.1
> 


[PATCH] drm/amd/pm: remove the "set" function of pp_dpm_mclk for vangogh

2021-04-13 Thread Xiaojian Du
This patch is to remove the "set" function of pp_dpm_mclk for vangogh.
For vangogh, mclk is bonded with fclk; they lock each other
on the same performance level. But according to the smu message from pmfw,
only fclk is allowed to set value manually, so remove the unnecessary
code of "set" function for mclk.

Signed-off-by: Xiaojian Du 
---
 drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
index 61ff9a663b21..35904315c1f9 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
@@ -1092,7 +1092,6 @@ static int vangogh_set_soft_freq_limited_range(struct 
smu_context *smu,
return ret;
break;
case SMU_FCLK:
-   case SMU_MCLK:
ret = smu_cmn_send_smc_msg_with_param(smu,

SMU_MSG_SetHardMinFclkByFreq,
min, NULL);
@@ -1180,7 +1179,6 @@ static int vangogh_force_clk_levels(struct smu_context 
*smu,
if (ret)
return ret;
break;
-   case SMU_MCLK:
case SMU_FCLK:
ret = vangogh_get_dpm_clk_limited(smu,
clk_type, 
soft_min_level, &min_freq);
@@ -1267,7 +1265,6 @@ static int vangogh_force_dpm_limit_value(struct 
smu_context *smu, bool highest)
SMU_SOCCLK,
SMU_VCLK,
SMU_DCLK,
-   SMU_MCLK,
SMU_FCLK,
};
 
@@ -1296,7 +1293,6 @@ static int vangogh_unforce_dpm_levels(struct smu_context 
*smu)
enum smu_clk_type clk_type;
uint32_tfeature;
} clk_feature_map[] = {
-   {SMU_MCLK,   SMU_FEATURE_DPM_FCLK_BIT},
{SMU_FCLK, SMU_FEATURE_DPM_FCLK_BIT},
{SMU_SOCCLK, SMU_FEATURE_DPM_SOCCLK_BIT},
{SMU_VCLK, SMU_FEATURE_VCN_DPM_BIT},
@@ -1428,7 +1424,6 @@ static int vangogh_set_performance_level(struct 
smu_context *smu,
if (ret)
return ret;
 
-   vangogh_force_clk_levels(smu, SMU_MCLK, 1 << mclk_mask);
vangogh_force_clk_levels(smu, SMU_FCLK, 1 << fclk_mask);
vangogh_force_clk_levels(smu, SMU_SOCCLK, 1 << soc_mask);
vangogh_force_clk_levels(smu, SMU_VCLK, 1 << vclk_mask);
@@ -1468,7 +1463,6 @@ static int vangogh_set_performance_level(struct 
smu_context *smu,
if (ret)
return ret;
 
-   vangogh_force_clk_levels(smu, SMU_MCLK, 1 << mclk_mask);
vangogh_force_clk_levels(smu, SMU_FCLK, 1 << fclk_mask);
break;
case AMD_DPM_FORCED_LEVEL_PROFILE_PEAK:
-- 
2.25.1



Recall: [PATCH] drm/amd/pm: remove the "set" function of pp_dpm_mclk for vangogh

2021-04-13 Thread Du, Xiaojian
Du, Xiaojian would like to recall the message, "[PATCH] drm/amd/pm: remove the 
"set" function of pp_dpm_mclk for vangogh".


[PATCH] drm/amd/pm: fix error code in smu_set_power_limit()

2021-04-13 Thread Dan Carpenter
We should return -EINVAL instead of success if the "limit" is too high.

Fixes: e098bc9612c2 ("drm/amd/pm: optimize the power related source code 
layout")
Signed-off-by: Dan Carpenter 
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index e0eb7ca112e2..c29d8b3131b7 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -2221,6 +2221,7 @@ static int smu_set_power_limit(void *handle, uint32_t 
limit)
dev_err(smu->adev->dev,
"New power limit (%d) is over the max allowed %d\n",
limit, smu->max_power_limit);
+   ret = -EINVAL;
goto out;
}
 
-- 
2.30.2



[PATCH v2] drm/amdgpu: fix an error code in init_pmu_entry_by_type_and_add()

2021-04-13 Thread Dan Carpenter
If the kmemdup() fails then this should return a negative error code
but it currently returns success.

Fixes: b4a7db71ea06 ("drm/amdgpu: add per device user friendly xgmi events for 
vega20")
Signed-off-by: Dan Carpenter 
---
v2: I sent this patch in Feb but I accidentally added an unrelated
hunk from nouveau to the commit.  Now both hunks have been sent to
the correct lists.

 drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
index 19c0a3655228..82e9ecf84352 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
@@ -519,8 +519,10 @@ static int init_pmu_entry_by_type_and_add(struct 
amdgpu_pmu_entry *pmu_entry,
pmu_entry->pmu.attr_groups = kmemdup(attr_groups, sizeof(attr_groups),
GFP_KERNEL);
 
-   if (!pmu_entry->pmu.attr_groups)
+   if (!pmu_entry->pmu.attr_groups) {
+   ret = -ENOMEM;
goto err_attr_group;
+   }
 
snprintf(pmu_name, PMU_NAME_SIZE, "%s_%d", pmu_entry->pmu_file_prefix,
adev_to_drm(pmu_entry->adev)->primary->index);
-- 
2.30.2



FW: [PATCH] drm/amd/pm: remove the "set" function of pp_dpm_mclk for vangogh

2021-04-13 Thread Du, Xiaojian
[AMD Official Use Only - Internal Distribution Only]

Forward to community for review.

-Original Message-
From: Du, Xiaojian 
Sent: April 13, 2021 16:04
To: brahma_sw_dev 
Cc: Huang, Ray ; Quan, Evan ; Wang, 
Kevin(Yang) ; Lazar, Lijo ; Du, 
Xiaojian 
Subject: [PATCH] drm/amd/pm: remove the "set" function of pp_dpm_mclk for 
vangogh

This patch is to remove the "set" function of pp_dpm_mclk for vangogh.
For vangogh, mclk is bonded with fclk; they lock each other on the same 
performance level. But according to the smu message from pmfw, only fclk is 
allowed to set a value manually, so remove the unnecessary code of "set" function 
for mclk.

Signed-off-by: Xiaojian Du 
---
 drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
index 61ff9a663b21..35904315c1f9 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
@@ -1092,7 +1092,6 @@ static int vangogh_set_soft_freq_limited_range(struct 
smu_context *smu,
 return ret;
 break;
 case SMU_FCLK:
-case SMU_MCLK:
 ret = smu_cmn_send_smc_msg_with_param(smu,
 SMU_MSG_SetHardMinFclkByFreq,
 min, NULL);
@@ -1180,7 +1179,6 @@ static int vangogh_force_clk_levels(struct smu_context 
*smu,
 if (ret)
 return ret;
 break;
-case SMU_MCLK:
 case SMU_FCLK:
 ret = vangogh_get_dpm_clk_limited(smu,
clk_type, soft_min_level, &min_freq);
@@ -1267,7 +1265,6 @@ static int vangogh_force_dpm_limit_value(struct smu_context *smu, bool highest)
 SMU_SOCCLK,
 SMU_VCLK,
 SMU_DCLK,
-SMU_MCLK,
 SMU_FCLK,
 };

@@ -1296,7 +1293,6 @@ static int vangogh_unforce_dpm_levels(struct smu_context 
*smu)
 enum smu_clk_type clk_type;
 uint32_tfeature;
 } clk_feature_map[] = {
-{SMU_MCLK,   SMU_FEATURE_DPM_FCLK_BIT},
 {SMU_FCLK, SMU_FEATURE_DPM_FCLK_BIT},
 {SMU_SOCCLK, SMU_FEATURE_DPM_SOCCLK_BIT},
 {SMU_VCLK, SMU_FEATURE_VCN_DPM_BIT},
@@ -1428,7 +1424,6 @@ static int vangogh_set_performance_level(struct 
smu_context *smu,
 if (ret)
 return ret;

-vangogh_force_clk_levels(smu, SMU_MCLK, 1 << mclk_mask);
 vangogh_force_clk_levels(smu, SMU_FCLK, 1 << fclk_mask);
 vangogh_force_clk_levels(smu, SMU_SOCCLK, 1 << soc_mask);
vangogh_force_clk_levels(smu, SMU_VCLK, 1 << vclk_mask);
@@ -1468,7 +1463,6 @@ static int vangogh_set_performance_level(struct smu_context *smu,
 if (ret)
 return ret;

-vangogh_force_clk_levels(smu, SMU_MCLK, 1 << mclk_mask);
 vangogh_force_clk_levels(smu, SMU_FCLK, 1 << fclk_mask);
 break;
 case AMD_DPM_FORCED_LEVEL_PROFILE_PEAK:
--
2.25.1



Re: [PATCH 1/2] drm/amd/amdgpu: enable ASPM on navi1x and vega

2021-04-13 Thread Alex Deucher
On Tue, Apr 13, 2021 at 11:04 PM Kenneth Feng  wrote:
>
> enable ASPM on navi1x and vega series

Please split this patch into two, one for vega and one for navi1x.
With that fixed, the series is:
Reviewed-by: Alex Deucher 

>
> Signed-off-by: Kenneth Feng 
> ---
>  drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c | 128 +
>  drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 125 
>  drivers/gpu/drm/amd/amdgpu/nv.c|  10 +-
>  drivers/gpu/drm/amd/amdgpu/soc15.c |   7 +-
>  4 files changed, 259 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c 
> b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
> index 83ea063388fd..0d2d629e2d6a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
> +++ b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
> @@ -31,6 +31,28 @@
>  #include "vega10_enum.h"
>  #include 
>
> +#define smnPCIE_LC_CNTL0x11140280
> +#define smnPCIE_LC_CNTL3   0x111402d4
> +#define smnPCIE_LC_CNTL6   0x111402ec
> +#define smnPCIE_LC_CNTL7   0x111402f0
> +#define smnNBIF_MGCG_CTRL_LCLK 0x1013a05c
> +#define NBIF_MGCG_CTRL_LCLK__NBIF_MGCG_REG_DIS_LCLK_MASK   0x1000L
> +#define RCC_BIF_STRAP3__STRAP_VLINK_ASPM_IDLE_TIMER_MASK   0xL
> +#define RCC_BIF_STRAP3__STRAP_VLINK_PM_L1_ENTRY_TIMER_MASK 0xL
> +#define smnRCC_EP_DEV0_0_EP_PCIE_TX_LTR_CNTL   0x10123530
> +#define smnBIF_CFG_DEV0_EPF0_DEVICE_CNTL2  0x1014008c
> +#define smnBIF_CFG_DEV0_EPF0_PCIE_LTR_CAP  0x10140324
> +#define smnPSWUSP0_PCIE_LC_CNTL2   0x111402c4
> +#define smnRCC_BIF_STRAP2  0x10123488
> +#define smnRCC_BIF_STRAP3  0x1012348c
> +#define smnRCC_BIF_STRAP5  0x10123494
> +#define BIF_CFG_DEV0_EPF0_DEVICE_CNTL2__LTR_EN_MASK0x0400L
> +#define RCC_BIF_STRAP5__STRAP_VLINK_LDN_ENTRY_TIMER_MASK   0xL
> +#define RCC_BIF_STRAP2__STRAP_LTR_IN_ASPML1_DIS_MASK   0x4000L
> +#define RCC_BIF_STRAP3__STRAP_VLINK_ASPM_IDLE_TIMER__SHIFT 0x0
> +#define RCC_BIF_STRAP3__STRAP_VLINK_PM_L1_ENTRY_TIMER__SHIFT   0x10
> +#define RCC_BIF_STRAP5__STRAP_VLINK_LDN_ENTRY_TIMER__SHIFT 0x0
> +
>  static void nbio_v6_1_remap_hdp_registers(struct amdgpu_device *adev)
>  {
> WREG32_SOC15(NBIO, 0, mmREMAP_HDP_MEM_FLUSH_CNTL,
> @@ -256,6 +278,111 @@ static void nbio_v6_1_init_registers(struct 
> amdgpu_device *adev)
> WREG32_PCIE(smnPCIE_CI_CNTL, data);
>  }
>
> +static void nbio_v6_1_program_ltr(struct amdgpu_device *adev)
> +{
> +   uint32_t def, data;
> +
> +   WREG32_PCIE(smnRCC_EP_DEV0_0_EP_PCIE_TX_LTR_CNTL, 0x75EB);
> +
> +   def = data = RREG32_PCIE(smnRCC_BIF_STRAP2);
> +   data &= ~RCC_BIF_STRAP2__STRAP_LTR_IN_ASPML1_DIS_MASK;
> +   if (def != data)
> +   WREG32_PCIE(smnRCC_BIF_STRAP2, data);
> +
> +   def = data = RREG32_PCIE(smnRCC_EP_DEV0_0_EP_PCIE_TX_LTR_CNTL);
> +   data &= ~EP_PCIE_TX_LTR_CNTL__LTR_PRIV_MSG_DIS_IN_PM_NON_D0_MASK;
> +   if (def != data)
> +   WREG32_PCIE(smnRCC_EP_DEV0_0_EP_PCIE_TX_LTR_CNTL, data);
> +
> +   def = data = RREG32_PCIE(smnBIF_CFG_DEV0_EPF0_DEVICE_CNTL2);
> +   data |= BIF_CFG_DEV0_EPF0_DEVICE_CNTL2__LTR_EN_MASK;
> +   if (def != data)
> +   WREG32_PCIE(smnBIF_CFG_DEV0_EPF0_DEVICE_CNTL2, data);
> +}
> +
> +static void nbio_v6_1_program_aspm(struct amdgpu_device *adev)
> +{
> +   uint32_t def, data;
> +
> +   def = data = RREG32_PCIE(smnPCIE_LC_CNTL);
> +   data &= ~PCIE_LC_CNTL__LC_L1_INACTIVITY_MASK;
> +   data &= ~PCIE_LC_CNTL__LC_L0S_INACTIVITY_MASK;
> +   data |= PCIE_LC_CNTL__LC_PMI_TO_L1_DIS_MASK;
> +   if (def != data)
> +   WREG32_PCIE(smnPCIE_LC_CNTL, data);
> +
> +   def = data = RREG32_PCIE(smnPCIE_LC_CNTL7);
> +   data |= PCIE_LC_CNTL7__LC_NBIF_ASPM_INPUT_EN_MASK;
> +   if (def != data)
> +   WREG32_PCIE(smnPCIE_LC_CNTL7, data);
> +
> +   def = data = RREG32_PCIE(smnNBIF_MGCG_CTRL_LCLK);
> +   data |= NBIF_MGCG_CTRL_LCLK__NBIF_MGCG_REG_DIS_LCLK_MASK;
> +   if (def != data)
> +   WREG32_PCIE(smnNBIF_MGCG_CTRL_LCLK, data);
> +
> +   def = data = RREG32_PCIE(smnPCIE_LC_CNTL3);
> +   data |= PCIE_LC_CNTL3__LC_DSC_DONT_ENTER_L23_AFTER_PME_ACK_MASK;
> +   if (def != data)
> +   WREG32_PCIE(smnPCIE_LC_CNTL3, data);
> +
> +   def = data = RREG32_PCIE(smnRCC_BIF_STRAP3);
> +   data &= ~RCC_BIF_STRAP3__STRAP_VLINK_ASPM_IDLE_TIMER_MASK;
> +   data &= ~RCC_BIF_STRAP3__STRAP_VLINK_PM_L1_ENTRY_TIMER_MASK;
> +   if (def != data)
> +   WREG32_PCIE(smnRCC_BIF_STRAP3, data);
> +
> +   def = data = RREG32_PCIE(smnRCC_BIF_STRAP5);
> +   data &= ~RCC_BIF_STRAP5__STRAP_VLINK_LDN_ENTRY_TIMER_MASK;
> +   if (def != data)
> +   WREG32_PCIE(smnRCC_BIF_STRAP5, data);
> +
> +   def = data = RREG32_PCIE(smnBIF_CFG_DEV0_EPF0_DEVICE_CNTL2);
> +   data &= ~BIF_CFG_DEV0_EPF0_DEVICE_CNTL2__LTR

[PATCH 2/2] drm/amd/amdgpu: add ASPM support on polaris

2021-04-13 Thread Kenneth Feng
add ASPM support on polaris

Signed-off-by: Kenneth Feng 
---
 drivers/gpu/drm/amd/amdgpu/vi.c | 193 +++-
 1 file changed, 191 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
index ea338de5818a..735ebbd1148f 100644
--- a/drivers/gpu/drm/amd/amdgpu/vi.c
+++ b/drivers/gpu/drm/amd/amdgpu/vi.c
@@ -81,6 +81,30 @@
 #include "mxgpu_vi.h"
 #include "amdgpu_dm.h"
 
+#define ixPCIE_LC_L1_PM_SUBSTATE   0x100100C6
+#define PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK   0x0001L
+#define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK   0x0002L
+#define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_1_OVERRIDE_MASK   0x0004L
+#define PCIE_LC_L1_PM_SUBSTATE__LC_ASPM_L1_2_OVERRIDE_MASK 0x0008L
+#define PCIE_LC_L1_PM_SUBSTATE__LC_ASPM_L1_1_OVERRIDE_MASK 0x0010L
+#define ixPCIE_L1_PM_SUB_CNTL  0x378
+#define PCIE_L1_PM_SUB_CNTL__ASPM_L1_2_EN_MASK 0x0004L
+#define PCIE_L1_PM_SUB_CNTL__ASPM_L1_1_EN_MASK 0x0008L
+#define PCIE_L1_PM_SUB_CNTL__PCI_PM_L1_2_EN_MASK   0x0001L
+#define PCIE_L1_PM_SUB_CNTL__PCI_PM_L1_1_EN_MASK   0x0002L
+#define PCIE_LC_CNTL6__LC_L1_POWERDOWN_MASK	0x0020L
+#define LINK_CAP	0x64
+#define PCIE_LINK_CAP__CLOCK_POWER_MANAGEMENT_MASK	0x0004L
+#define ixCPM_CONTROL	0x1400118
+#define ixPCIE_LC_CNTL7	0x100100BC
+#define PCIE_LC_CNTL7__LC_L1_SIDEBAND_CLKREQ_PDWN_EN_MASK	0x0400L
+#define PCIE_LC_CNTL__LC_L0S_INACTIVITY_DEFAULT	0x0007
+#define PCIE_LC_CNTL__LC_L1_INACTIVITY_DEFAULT	0x0009
+#define CPM_CONTROL__CLKREQb_UNGATE_TXCLK_ENABLE_MASK	0x0100L
+#define PCIE_L1_PM_SUB_CNTL	0x378
+#define ASIC_IS_P22(asic_type, rid)	((asic_type >= CHIP_POLARIS10) && \
+					(asic_type <= CHIP_POLARIS12) && \
+					(rid >= 0x6E))
 /* Topaz */
 static const struct amdgpu_video_codecs topaz_video_codecs_encode =
 {
@@ -1091,13 +1115,178 @@ static void vi_pcie_gen3_enable(struct amdgpu_device *adev)
/* todo */
 }
 
+static void vi_enable_aspm(struct amdgpu_device *adev)
+{
+   u32 data, orig;
+
+   orig = data = RREG32_PCIE(ixPCIE_LC_CNTL);
+   data |= PCIE_LC_CNTL__LC_L0S_INACTIVITY_DEFAULT <<
+   PCIE_LC_CNTL__LC_L0S_INACTIVITY__SHIFT;
+   data |= PCIE_LC_CNTL__LC_L1_INACTIVITY_DEFAULT <<
+   PCIE_LC_CNTL__LC_L1_INACTIVITY__SHIFT;
+   data &= ~PCIE_LC_CNTL__LC_PMI_TO_L1_DIS_MASK;
+   data |= PCIE_LC_CNTL__LC_DELAY_L1_EXIT_MASK;
+   if (orig != data)
+   WREG32_PCIE(ixPCIE_LC_CNTL, data);
+}
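Editorial aside: nearly every hunk in this patch follows the same read-modify-write idiom — read a register, flip some mask bits, and write back only if the value changed, so that re-applying identical settings generates no redundant register writes. A minimal standalone sketch of that idiom, using made-up register storage and mask names rather than the real RREG32_PCIE/WREG32_PCIE accessors:

```c
#include <stdint.h>

/* Hypothetical register file standing in for PCIE config space. */
static uint32_t fake_regs[4];
static int write_count;	/* counts writes, to show the skip-if-unchanged effect */

static uint32_t rreg32(unsigned idx) { return fake_regs[idx]; }

static void wreg32(unsigned idx, uint32_t val)
{
	fake_regs[idx] = val;
	write_count++;
}

/* Made-up masks for the example, mirroring the patch's bit constants. */
#define LC_PMI_TO_L1_DIS_MASK	0x00008000u
#define LC_DELAY_L1_EXIT_MASK	0x00000080u

/* Read-modify-write that, like the patch, only issues the register
 * write when the value actually changed. */
static void set_bits_if_needed(unsigned idx, uint32_t set, uint32_t clear)
{
	uint32_t orig, data;

	orig = data = rreg32(idx);
	data &= ~clear;
	data |= set;
	if (orig != data)
		wreg32(idx, data);
}
```

The `if (orig != data)` guard is the point of the idiom: applying the same configuration twice must not produce extra bus traffic.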
+
 static void vi_program_aspm(struct amdgpu_device *adev)
 {
+   u32 data, data1, orig;
+   bool bL1SS = false;
+   bool bClkReqSupport = true;
 
-   if (amdgpu_aspm == 0)
+   if (amdgpu_aspm != 1)
return;
 
-   /* todo */
+   if (adev->flags & AMD_IS_APU ||
+   adev->asic_type < CHIP_POLARIS10)
+   return;
+
+   orig = data = RREG32_PCIE(ixPCIE_LC_CNTL);
+   data &= ~PCIE_LC_CNTL__LC_L1_INACTIVITY_MASK;
+   data &= ~PCIE_LC_CNTL__LC_L0S_INACTIVITY_MASK;
+   data |= PCIE_LC_CNTL__LC_PMI_TO_L1_DIS_MASK;
+   if (orig != data)
+   WREG32_PCIE(ixPCIE_LC_CNTL, data);
+
+   orig = data = RREG32_PCIE(ixPCIE_LC_N_FTS_CNTL);
+   data &= ~PCIE_LC_N_FTS_CNTL__LC_XMIT_N_FTS_MASK;
+   data |= 0x0024 << PCIE_LC_N_FTS_CNTL__LC_XMIT_N_FTS__SHIFT;
+   data |= PCIE_LC_N_FTS_CNTL__LC_XMIT_N_FTS_OVERRIDE_EN_MASK;
+   if (orig != data)
+   WREG32_PCIE(ixPCIE_LC_N_FTS_CNTL, data);
+
+   orig = data = RREG32_PCIE(ixPCIE_LC_CNTL3);
+   data |= PCIE_LC_CNTL3__LC_GO_TO_RECOVERY_MASK;
+   if (orig != data)
+   WREG32_PCIE(ixPCIE_LC_CNTL3, data);
+
+   orig = data = RREG32_PCIE(ixPCIE_P_CNTL);
+   data |= PCIE_P_CNTL__P_IGNORE_EDB_ERR_MASK;
+   if (orig != data)
+   WREG32_PCIE(ixPCIE_P_CNTL, data);
+
+   data = RREG32_PCIE(ixPCIE_LC_L1_PM_SUBSTATE);
+   pci_read_config_dword(adev->pdev, PCIE_L1_PM_SUB_CNTL, &data1);
+   if (data & PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK &&
+   (data & (PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK |
+   PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_1_OVERRIDE_MASK |
+   PCIE_LC_L1_PM_SUBSTATE__LC_ASPM_L1_2_OVERRIDE_MASK |
+   PCIE_LC_L1_PM_SUBSTATE__LC_ASPM_L1_1_OVERRIDE_MASK))) {
+   bL1SS = true;
+   } else if (data1 & (PCIE_L1_PM_SUB_CNTL__ASPM_L1_2_EN_MASK |
+   PCIE_L1_PM_SUB_CNTL__ASPM_L1_1_EN_MASK |
+   PCIE_L1_PM_SUB_CNTL__PCI_PM_L1_2_EN_MASK |
+   PCIE_L1_PM_SUB_CNTL__PCI_PM_L1_1_EN_MASK)) {
+   bL1SS = true;
+   }
+
+   orig = data = RRE

[PATCH 1/2] drm/amd/amdgpu: enable ASPM on navi1x and vega

2021-04-13 Thread Kenneth Feng
enable ASPM on navi1x and vega series

Signed-off-by: Kenneth Feng 
---
 drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c | 128 +
 drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 125 
 drivers/gpu/drm/amd/amdgpu/nv.c|  10 +-
 drivers/gpu/drm/amd/amdgpu/soc15.c |   7 +-
 4 files changed, 259 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c 
b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
index 83ea063388fd..0d2d629e2d6a 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
@@ -31,6 +31,28 @@
 #include "vega10_enum.h"
 #include 
 
+#define smnPCIE_LC_CNTL	0x11140280
+#define smnPCIE_LC_CNTL3   0x111402d4
+#define smnPCIE_LC_CNTL6   0x111402ec
+#define smnPCIE_LC_CNTL7   0x111402f0
+#define smnNBIF_MGCG_CTRL_LCLK 0x1013a05c
+#define NBIF_MGCG_CTRL_LCLK__NBIF_MGCG_REG_DIS_LCLK_MASK   0x1000L
+#define RCC_BIF_STRAP3__STRAP_VLINK_ASPM_IDLE_TIMER_MASK   0xL
+#define RCC_BIF_STRAP3__STRAP_VLINK_PM_L1_ENTRY_TIMER_MASK 0xL
+#define smnRCC_EP_DEV0_0_EP_PCIE_TX_LTR_CNTL   0x10123530
+#define smnBIF_CFG_DEV0_EPF0_DEVICE_CNTL2  0x1014008c
+#define smnBIF_CFG_DEV0_EPF0_PCIE_LTR_CAP  0x10140324
+#define smnPSWUSP0_PCIE_LC_CNTL2   0x111402c4
+#define smnRCC_BIF_STRAP2  0x10123488
+#define smnRCC_BIF_STRAP3  0x1012348c
+#define smnRCC_BIF_STRAP5  0x10123494
+#define BIF_CFG_DEV0_EPF0_DEVICE_CNTL2__LTR_EN_MASK	0x0400L
+#define RCC_BIF_STRAP5__STRAP_VLINK_LDN_ENTRY_TIMER_MASK   0xL
+#define RCC_BIF_STRAP2__STRAP_LTR_IN_ASPML1_DIS_MASK   0x4000L
+#define RCC_BIF_STRAP3__STRAP_VLINK_ASPM_IDLE_TIMER__SHIFT 0x0
+#define RCC_BIF_STRAP3__STRAP_VLINK_PM_L1_ENTRY_TIMER__SHIFT   0x10
+#define RCC_BIF_STRAP5__STRAP_VLINK_LDN_ENTRY_TIMER__SHIFT 0x0
+
 static void nbio_v6_1_remap_hdp_registers(struct amdgpu_device *adev)
 {
WREG32_SOC15(NBIO, 0, mmREMAP_HDP_MEM_FLUSH_CNTL,
@@ -256,6 +278,111 @@ static void nbio_v6_1_init_registers(struct amdgpu_device *adev)
WREG32_PCIE(smnPCIE_CI_CNTL, data);
 }
 
+static void nbio_v6_1_program_ltr(struct amdgpu_device *adev)
+{
+   uint32_t def, data;
+
+   WREG32_PCIE(smnRCC_EP_DEV0_0_EP_PCIE_TX_LTR_CNTL, 0x75EB);
+
+   def = data = RREG32_PCIE(smnRCC_BIF_STRAP2);
+   data &= ~RCC_BIF_STRAP2__STRAP_LTR_IN_ASPML1_DIS_MASK;
+   if (def != data)
+   WREG32_PCIE(smnRCC_BIF_STRAP2, data);
+
+   def = data = RREG32_PCIE(smnRCC_EP_DEV0_0_EP_PCIE_TX_LTR_CNTL);
+   data &= ~EP_PCIE_TX_LTR_CNTL__LTR_PRIV_MSG_DIS_IN_PM_NON_D0_MASK;
+   if (def != data)
+   WREG32_PCIE(smnRCC_EP_DEV0_0_EP_PCIE_TX_LTR_CNTL, data);
+
+   def = data = RREG32_PCIE(smnBIF_CFG_DEV0_EPF0_DEVICE_CNTL2);
+   data |= BIF_CFG_DEV0_EPF0_DEVICE_CNTL2__LTR_EN_MASK;
+   if (def != data)
+   WREG32_PCIE(smnBIF_CFG_DEV0_EPF0_DEVICE_CNTL2, data);
+}
+
+static void nbio_v6_1_program_aspm(struct amdgpu_device *adev)
+{
+   uint32_t def, data;
+
+   def = data = RREG32_PCIE(smnPCIE_LC_CNTL);
+   data &= ~PCIE_LC_CNTL__LC_L1_INACTIVITY_MASK;
+   data &= ~PCIE_LC_CNTL__LC_L0S_INACTIVITY_MASK;
+   data |= PCIE_LC_CNTL__LC_PMI_TO_L1_DIS_MASK;
+   if (def != data)
+   WREG32_PCIE(smnPCIE_LC_CNTL, data);
+
+   def = data = RREG32_PCIE(smnPCIE_LC_CNTL7);
+   data |= PCIE_LC_CNTL7__LC_NBIF_ASPM_INPUT_EN_MASK;
+   if (def != data)
+   WREG32_PCIE(smnPCIE_LC_CNTL7, data);
+
+   def = data = RREG32_PCIE(smnNBIF_MGCG_CTRL_LCLK);
+   data |= NBIF_MGCG_CTRL_LCLK__NBIF_MGCG_REG_DIS_LCLK_MASK;
+   if (def != data)
+   WREG32_PCIE(smnNBIF_MGCG_CTRL_LCLK, data);
+
+   def = data = RREG32_PCIE(smnPCIE_LC_CNTL3);
+   data |= PCIE_LC_CNTL3__LC_DSC_DONT_ENTER_L23_AFTER_PME_ACK_MASK;
+   if (def != data)
+   WREG32_PCIE(smnPCIE_LC_CNTL3, data);
+
+   def = data = RREG32_PCIE(smnRCC_BIF_STRAP3);
+   data &= ~RCC_BIF_STRAP3__STRAP_VLINK_ASPM_IDLE_TIMER_MASK;
+   data &= ~RCC_BIF_STRAP3__STRAP_VLINK_PM_L1_ENTRY_TIMER_MASK;
+   if (def != data)
+   WREG32_PCIE(smnRCC_BIF_STRAP3, data);
+
+   def = data = RREG32_PCIE(smnRCC_BIF_STRAP5);
+   data &= ~RCC_BIF_STRAP5__STRAP_VLINK_LDN_ENTRY_TIMER_MASK;
+   if (def != data)
+   WREG32_PCIE(smnRCC_BIF_STRAP5, data);
+
+   def = data = RREG32_PCIE(smnBIF_CFG_DEV0_EPF0_DEVICE_CNTL2);
+   data &= ~BIF_CFG_DEV0_EPF0_DEVICE_CNTL2__LTR_EN_MASK;
+   if (def != data)
+   WREG32_PCIE(smnBIF_CFG_DEV0_EPF0_DEVICE_CNTL2, data);
+
+   WREG32_PCIE(smnBIF_CFG_DEV0_EPF0_PCIE_LTR_CAP, 0x10011001);
+
+   def = data = RREG32_PCIE(smnPSWUSP0_PCIE_LC_CNTL2);
+   data |= PSWUSP0_PCIE_LC_CNTL2__LC_ALLOW_PDWN_IN_L1_MASK |
+   PSWUSP0_PCIE_LC_CNTL2__LC_ALLOW_PDWN_IN_L23_MASK;
+   data &= ~PSWUSP0_PCIE_LC

[PATCH v2] drm/amd/display: Fix two cursor duplication when using overlay

2021-04-13 Thread Rodrigo Siqueira
Our driver supports overlay planes, and as expected, some userspace
compositors take advantage of these features. If userspace is not
enabling the cursor, it can use multiple planes as it pleases.
Nevertheless, we start to have constraints when userspace tries to
enable the hardware cursor with multiple planes. Basically, we cannot
draw the cursor at the same size and position on two separate pipes,
since that uses extra bandwidth and DML only runs with one cursor.

For those reasons, when we enable hardware cursor and multiple planes,
our driver should accept variations like the ones described below:

  +-+   +--+
  | +-+ |   |  |
  | |Primary  | |   | Primary  |
  | | | |   | Overlay  |
  | +-+ |   |  |
  |Overlay  |   |  |
  +-+   +--+

In this scenario, we can have the desktop UI in the overlay and some
other framebuffer attached to the primary plane (e.g., video). However,
userspace needs to obey some rules and avoid scenarios like the ones
described below (when enabling hw cursor):

  ++
  |Overlay |
 +-++-+---+ +-||--+
 | ++  | ++   | | ++  |
 | |Overlay |  | |Overlay |   | | |
 | ||  | ||   | | |
 | ++  | ++   | | |
 | Primary || Primary | | Primary |
 +-++-+ +-+

 +-+   +-+
 | ++  |  Primary|
 | |Overlay |  | |
 | ||  | |
 | ++  | ++  |
 | Primary |   | |Overlay |  |
 +-+   +-||--+
 ++

If userspace violates any of the above constraints, our driver needs
to reject the commit; otherwise, we can have unexpected behavior. Since
we don't have proper driver validation for these cases, we can see
problems such as a duplicated cursor in applications that use multiple
planes. This commit fixes the cursor issue and others by adding adequate
verification for multiple planes.

Change since V1 (Harry and Sean):
- Remove cursor verification from the equation.

Cc: Louis Li 
Cc: Nicholas Kazlauskas 
Cc: Harry Wentland 
Cc: Hersen Wu 
Cc: Sean Paul 
Signed-off-by: Rodrigo Siqueira 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 51 +++
 1 file changed, 51 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index e29cb2e956db..ac1408d52eff 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -9946,6 +9946,53 @@ static int add_affected_mst_dsc_crtcs(struct drm_atomic_state *state, struct drm
 }
 #endif
 
+static int validate_overlay(struct drm_atomic_state *state)
+{
+   int i;
+   struct drm_plane *plane;
+   struct drm_plane_state *old_plane_state, *new_plane_state;
+   struct drm_plane_state *primary_state, *overlay_state = NULL;
+
+   /* Check if primary plane is contained inside overlay */
+   for_each_oldnew_plane_in_state_reverse(state, plane, old_plane_state, new_plane_state, i) {
+   if (plane->type == DRM_PLANE_TYPE_OVERLAY) {
+   if (drm_atomic_plane_disabling(plane->state, new_plane_state))
+   return 0;
+
+   overlay_state = new_plane_state;
+   continue;
+   }
+   }
+
+   /* check if we're making changes to the overlay plane */
+   if (!overlay_state)
+   return 0;
+
+   /* check if overlay plane is enabled */
+   if (!overlay_state->crtc)
+   return 0;
+
+   /* find the primary plane for the CRTC that the overlay is enabled on */
+   primary_state = drm_atomic_get_plane_state(state, overlay_state->crtc->primary);
+   if (IS_ERR(primary_state))
+   return PTR_ERR(primary_state);
+
+   /* check if primary plane is enabled */
+   if (!primary_state->crtc)
+   return 0;
+
+   /* Perform the bounds check to ensure the overlay plane covers the primary */
+   if (primary_state->crtc_x < overlay_state->crtc_x ||
+   primary_state->crtc_y < overlay_state->crtc_y ||
+   primary_state->crtc_x + primary_state->crtc_w > overlay_state->crtc_x + overlay_state->crtc_w ||
+   primary_state->crtc_y + primary_state->crtc_h > overlay_state->crtc_y + overlay_state->crtc_h) {
+   DRM_DEBUG_ATOMIC("Overlay plane is enabled with hardware cursor but does not fully cover primary plane\n");
+   return -EINVAL;
+   }
+
+   return 0;
+}
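Editorial aside: the bounds check above encodes a simple rectangle-containment rule — the overlay must fully cover the primary plane, or the commit is rejected with -EINVAL. A self-contained sketch of the same rule, with a toy `struct rect` standing in for the relevant `drm_plane_state` fields (the names here are hypothetical, not the DRM API):

```c
#include <stdbool.h>

/* Simplified stand-in for the screen rectangle of a plane state. */
struct rect {
	int x, y, w, h;
};

/* Mirrors the patch's bounds check: true iff the overlay rectangle
 * fully contains the primary rectangle. */
static bool overlay_covers_primary(const struct rect *primary,
				   const struct rect *overlay)
{
	return primary->x >= overlay->x &&
	       primary->y >= overlay->y &&
	       primary->x + primary->w <= overlay->x + overlay->w &&
	       primary->y + primary->h <= overlay->y + overlay->h;
}
```

This matches the "allowed" diagrams in the commit message: an overlay that wraps the primary passes, while any overlay that is offset or smaller than the primary fails.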
+
 /**
  * amdgpu_dm_atomic_check() - Atomic check implementation for AMDgpu DM.
  * @dev: 

Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K

2021-04-13 Thread Leo Liu



On 2021-04-13 5:24 p.m., Mikhail Gavrilov wrote:

On Tue, 13 Apr 2021 at 04:55, Leo Liu  wrote:

It is curious why ffmpeg does not cause such issues.
For example, such a command does not cause a kernel panic:
$ ffmpeg -f x11grab -framerate 60 -video_size 3840x2160 -i :0.0 -vf
'format=nv12,hwupload' -vaapi_device /dev/dri/renderD128 -vcodec
h264_vaapi output3.mp4

What command are you using to see the issue or how can the issue be reproduced?

$ mpv output4.mp4


That is a decode command line; are you seeing the issue with encode or 
with decode? You also said `ffmpeg -f x11grab -framerate 60 -video_size 
3840x2160 -i :0.0 -vf 'format=nv12,hwupload' -vaapi_device 
/dev/dri/renderD128 -vcodec h264_vaapi output3.mp4` doesn't cause such 
an issue, right? What command line does cause the issue then?





And of course, I know how it should work, because when I encode video
with a CPU encoder (libx264) all is fine.
$ ffmpeg -f x11grab -framerate 60 -video_size 3840x2160 -i :0.0
-vcodec libx264 output3.mp4


Please file a freedesktop gitlab issue, so we can keep track of it.

Here? 
https://gitlab.freedesktop.org/drm/amd/-/issues


Yes.



Also, I found that other users face the same problem.
https://bbs.archlinux.org/viewtopic.php?id=261965


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K

2021-04-13 Thread Mikhail Gavrilov
On Tue, 13 Apr 2021 at 04:55, Leo Liu  wrote:
>
> >It is curious why ffmpeg does not cause such issues.
> >For example, such a command does not cause a kernel panic:
> >$ ffmpeg -f x11grab -framerate 60 -video_size 3840x2160 -i :0.0 -vf
> >'format=nv12,hwupload' -vaapi_device /dev/dri/renderD128 -vcodec
> >h264_vaapi output3.mp4
>
> What command are you using to see the issue or how can the issue be 
> reproduced?
$ mpv output4.mp4

And of course, I know how it should work, because when I encode video
with a CPU encoder (libx264) all is fine.
$ ffmpeg -f x11grab -framerate 60 -video_size 3840x2160 -i :0.0
-vcodec libx264 output3.mp4

> Please file a freedesktop gitlab issue, so we can keep track of it.
Here? https://gitlab.freedesktop.org/drm/amd/-/issues

Also, I found that other users face the same problem.
https://bbs.archlinux.org/viewtopic.php?id=261965

-- 
Best Regards,
Mike Gavrilov.


Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K

2021-04-13 Thread Mikhail Gavrilov
On Tue, 13 Apr 2021 at 12:29, Christian König  wrote:
>
> Hi Mikhail,
>
> the crash is a known issue and should be fixed by:
>
> commit f63da9ae7584280582cbc834b20cc18bfb203b14
> Author: Philip Yang 
> Date:   Thu Apr 1 00:22:23 2021 -0400
>
>  drm/amdgpu: reserve fence slot to update page table
>

Unfortunately, this commit couldn't fix the initial problem.
1. The resulting video is jerky if it is grabbed and encoded with ffmpeg
(the h264_vaapi codec).
2. OBS still crashed if I try to record or stream video.
3. In the kernel log still appears the message "amdgpu: [mmhub] page
fault (src_id:0 ring:0 vmid:4 pasid:32770, for process obs" if I tried
to record or stream video by OBS.

-- 
Best Regards,
Mike Gavrilov.


Re: [pull] amdgpu, radeon drm-next-5.13

2021-04-13 Thread Daniel Vetter
On Mon, Apr 12, 2021 at 06:07:32PM -0400, Alex Deucher wrote:
> Hi Dave, Daniel,
> 
> Same PR as last week plus a few accumulated fixes, rebased on drm-next
> to resolve the dependencies between ttm and scheduler with changes in amdgpu.
> 
> The following changes since commit c103b850721e4a79ff9578f131888129c37a4679:
> 
>   Merge tag 'drm-misc-next-2021-04-09' of 
> git://anongit.freedesktop.org/drm/drm-misc into drm-next (2021-04-10 05:46:35 
> +1000)
> 
> are available in the Git repository at:
> 
>   https://gitlab.freedesktop.org/agd5f/linux.git 
> tags/amd-drm-next-5.13-2021-04-12
> 
> for you to fetch changes up to cbb8f989d5a07cb3e39e9c149a6f89d6c83432aa:
> 
>   drm/amdgpu: page retire over debugfs mechanism (2021-04-09 16:58:28 -0400)

Applied to drm-next, thanks.
-Daniel

> 
> 
> amd-drm-next-5.13-2021-04-12:
> 
> amdgpu:
> - Re-enable GPU reset on VanGogh
> - Enable DPM flags for SMART_SUSPEND and MAY_SKIP_RESUME
> - Disentangle HG from vga_switcheroo
> - S0ix fixes
> - W=1 fixes
> - Resource iterator fixes
> - DMCUB updates
> - UBSAN fixes
> - More PM API cleanup
> - Aldebaran updates
> - Modifier fixes
> - Enable VCN load balancing with asymmetric engines
> - Rework BO structs
> - Aldebaran reset support
> - Initial LTTPR display work
> - Display MALL fixes
> - Fall back to YCbCr420 when YCbCr444 fails
> - SR-IOV fixes
> - RAS updates
> - Misc cleanups and fixes
> 
> radeon:
> - Typo fixes
> - Fix error handling for firmware on r6xx
> - Fix a missing check in DP MST handling
> 
> 
> Alex Deucher (22):
>   drm/amdgpu/display/dm: add missing parameter documentation
>   drm/amdgpu: Add additional Sienna Cichlid PCI ID
>   drm/amdgpu: add a dev_pm_ops prepare callback (v2)
>   drm/amdgpu: enable DPM_FLAG_MAY_SKIP_RESUME and DPM_FLAG_SMART_SUSPEND 
> flags (v2)
>   drm/amdgpu: disentangle HG systems from vgaswitcheroo
>   drm/amdgpu: rework S3/S4/S0ix state handling
>   drm/amdgpu: don't evict vram on APUs for suspend to ram (v4)
>   drm/amdgpu: clean up non-DC suspend/resume handling
>   drm/amdgpu: move s0ix check into amdgpu_device_ip_suspend_phase2 (v3)
>   drm/amdgpu: re-enable suspend phase 2 for S0ix
>   drm/amdgpu/swsmu: skip gfx cgpg on s0ix suspend
>   drm/amdgpu: update comments about s0ix suspend/resume
>   drm/amdgpu: drop S0ix checks around CG/PG in suspend
>   drm/amdgpu: skip kfd suspend/resume for S0ix
>   drm/amdgpu/display: restore AUX_DPHY_TX_CONTROL for DCN2.x
>   drm/amdgpu/display: fix memory leak for dimgrey cavefish
>   drm/amdgpu/pm: mark pcie link/speed arrays as const
>   drm/amdgpu/pm: bail on sysfs/debugfs queries during platform suspend
>   drm/amdgpu/vangogh: don't check for dpm in is_dpm_running when in 
> suspend
>   drm/amdgpu/display: fix warning on 32 bit in dmub
>   drm/amdgpu: drop some unused atombios functions
>   drm/amdgpu/smu7: fix CAC setting on TOPAZ
> 
> Alex Sierra (2):
>   drm/amdgpu: replace per_device_list by array
>   drm/amdgpu: ih reroute for newer asics than vega20
> 
> Alvin Lee (1):
>   drm/amd/display: Change input parameter for set_drr
> 
> Amber Lin (1):
>   drm/amdkfd: Avoid null pointer in SMI event
> 
> Anson Jacob (2):
>   drm/amd/display: Fix UBSAN: shift-out-of-bounds warning
>   drm/amd/display: Removing unused code from dmub_cmd.h
> 
> Anthony Koo (3):
>   drm/amd/display: [FW Promotion] Release 0.0.57
>   drm/amd/display: [FW Promotion] Release 0.0.58
>   drm/amd/display: [FW Promotion] Release 0.0.59
> 
> Aric Cyr (3):
>   drm/amd/display: 3.2.128
>   drm/amd/display: 3.2.129
>   drm/amd/display: 3.2.130
> 
> Arnd Bergmann (3):
>   amdgpu: avoid incorrect %hu format string
>   amdgpu: fix gcc -Wrestrict warning
>   amdgpu: securedisplay: simplify i2c hexdump output
> 
> Aurabindo Pillai (1):
>   drm/amd/display: Add debugfs entry for LTTPR register status
> 
> Bernard Zhao (2):
>   drm/amd: use kmalloc_array over kmalloc with multiply
>   drm/amd: cleanup coding style a bit
> 
> Bhaskar Chowdhury (6):
>   drm/amdgpu: Fix a typo
>   drm/amdgpu: Fix a typo
>   drm/atomic: Couple of typo fixes
>   drm/radeon/r600_cs: Few typo fixes
>   drm/amd/amdgpu/gfx_v7_0: Trivial typo fixes
>   drm/amd: Fix a typo in two different sentences
> 
> Bindu Ramamurthy (1):
>   drm/amd/display: Allow idle optimization based on vblank.
> 
> Chengming Gui (1):
>   drm/amd/amdgpu: set MP1 state to UNLOAD before reload its FW for 
> vega20/ALDEBARAN
> 
> Chris Park (1):
>   drm/amd/display: Disable MALL when SMU not present
> 
> Christian König (4):
>   drm/amdgpu: remove irq_src->data handling
>   drm/amdgpu: add the sched_score to amdgpu_ring_init
>   drm/amdgpu: share scheduler score on VCN3 instances
>   drm/amdgpu: lo

Re: [PATCH v2] drm/amdgpu: use pre-calculated bo size

2021-04-13 Thread Nirmoy



On 4/13/21 10:50 PM, Nirmoy Das wrote:

Use bo->tbo.base.size instead of bo->tbo.mem.num_pages << PAGE_SHIFT.



Ignore this please, I pressed send-email too quickly!




Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 1345f7eba011..74ecc0c1863f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1371,7 +1371,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
if (bo->mem.mem_type != TTM_PL_VRAM)
return 0;

-   size = bo->mem.num_pages << PAGE_SHIFT;
+   size = bo->base.size;
offset = bo->mem.start << PAGE_SHIFT;
if ((offset + size) <= adev->gmc.visible_vram_size)
return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 936b3cfdde55..26deace78539 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2048,7 +2048,7 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
return r;
}

-   num_bytes = bo->tbo.mem.num_pages << PAGE_SHIFT;
+   num_bytes = bo->tbo.base.size;
num_loops = 0;

amdgpu_res_first(&bo->tbo.mem, 0, num_bytes, &cursor);
--
2.30.2




[PATCH 2/2] drm/radeon: use pre-calculated bo size

2021-04-13 Thread Nirmoy Das
Use bo->tbo.base.size instead of calculating it from num_pages.

Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/radeon/radeon_object.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_object.c 
b/drivers/gpu/drm/radeon/radeon_object.c
index cee11c55fd15..fd8f6ccc267e 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -763,7 +763,7 @@ vm_fault_t radeon_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
if (bo->mem.mem_type != TTM_PL_VRAM)
return 0;
 
-   size = bo->mem.num_pages << PAGE_SHIFT;
+   size = bo->base.size;
offset = bo->mem.start << PAGE_SHIFT;
if ((offset + size) <= rdev->mc.visible_vram_size)
return 0;
-- 
2.30.2



[PATCH 1/2] drm/amdgpu: use pre-calculated bo size

2021-04-13 Thread Nirmoy Das
Use bo->tbo.base.size instead of calculating it from num_pages.

Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 1345f7eba011..74ecc0c1863f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1371,7 +1371,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
if (bo->mem.mem_type != TTM_PL_VRAM)
return 0;
 
-   size = bo->mem.num_pages << PAGE_SHIFT;
+   size = bo->base.size;
offset = bo->mem.start << PAGE_SHIFT;
if ((offset + size) <= adev->gmc.visible_vram_size)
return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 936b3cfdde55..26deace78539 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2048,7 +2048,7 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
return r;
}
 
-   num_bytes = bo->tbo.mem.num_pages << PAGE_SHIFT;
+   num_bytes = bo->tbo.base.size;
num_loops = 0;
 
amdgpu_res_first(&bo->tbo.mem, 0, num_bytes, &cursor);
-- 
2.30.2
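Editorial aside: the change relies on the invariant that a BO's pre-calculated byte size equals its page count shifted left by PAGE_SHIFT, so reading `base.size` is equivalent to (and cheaper than) recomputing `num_pages << PAGE_SHIFT`. A toy sketch of that invariant, assuming 4 KiB pages (`FAKE_PAGE_SHIFT` and `struct toy_bo` are made up for the example, not TTM types):

```c
#include <stddef.h>

#define FAKE_PAGE_SHIFT 12	/* assumption: 4 KiB pages, as on x86 */

/* Toy stand-in for a buffer object: both the page count and the
 * pre-calculated byte size are stored. */
struct toy_bo {
	unsigned long num_pages;
	size_t size;		/* always num_pages << FAKE_PAGE_SHIFT */
};

static struct toy_bo toy_bo_create(unsigned long num_pages)
{
	struct toy_bo bo = {
		.num_pages = num_pages,
		/* The size is computed once at creation; later code can
		 * read .size instead of repeating the shift. */
		.size = num_pages << FAKE_PAGE_SHIFT,
	};
	return bo;
}
```

With that invariant held at creation time, every `num_pages << PAGE_SHIFT` at a use site can be replaced by the stored size, which is exactly what the patch does.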



[PATCH v2] drm/amdgpu: use pre-calculated bo size

2021-04-13 Thread Nirmoy Das
Use bo->tbo.base.size instead of bo->tbo.mem.num_pages << PAGE_SHIFT.

Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 1345f7eba011..74ecc0c1863f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1371,7 +1371,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
if (bo->mem.mem_type != TTM_PL_VRAM)
return 0;

-   size = bo->mem.num_pages << PAGE_SHIFT;
+   size = bo->base.size;
offset = bo->mem.start << PAGE_SHIFT;
if ((offset + size) <= adev->gmc.visible_vram_size)
return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 936b3cfdde55..26deace78539 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2048,7 +2048,7 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
return r;
}

-   num_bytes = bo->tbo.mem.num_pages << PAGE_SHIFT;
+   num_bytes = bo->tbo.base.size;
num_loops = 0;

amdgpu_res_first(&bo->tbo.mem, 0, num_bytes, &cursor);
--
2.30.2



[PATCH] drm/amdgpu: use pre-calculated bo size

2021-04-13 Thread Nirmoy Das
Use bo->tbo.base.size instead of bo->tbo.mem.num_pages << PAGE_SHIFT.

Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 936b3cfdde55..26deace78539 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2048,7 +2048,7 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
return r;
}
 
-   num_bytes = bo->tbo.mem.num_pages << PAGE_SHIFT;
+   num_bytes = bo->tbo.base.size;
num_loops = 0;
 
amdgpu_res_first(&bo->tbo.mem, 0, num_bytes, &cursor);
-- 
2.30.2



Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-04-13 Thread Daniel Vetter
On Tue, Apr 13, 2021 at 11:13 AM Li, Dennis  wrote:
>
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi, Christian and Andrey,
>   Maybe we could try to implement the "wait" callback function of
> dma_fence_ops: when a GPU reset or unplug happens, make this callback
> return -ENODEV to notify the caller that the device is lost.
>
>  * Must return -ERESTARTSYS if the wait is intr = true and the wait 
> was
>  * interrupted, and remaining jiffies if fence has signaled, or 0 if 
> wait
>  * timed out. Can also return other error values on custom 
> implementations,
>  * which should be treated as if the fence is signaled. For example a 
> hardware
>  * lockup could be reported like that.
>  *
>  * This callback is optional.
>  */
> signed long (*wait)(struct dma_fence *fence,
> bool intr, signed long timeout);

Uh, this callback is for old horrors like unreliable irq delivery on
radeon. Please don't use it for anything; if we need to make fences
bail out on error then we need something that works for all fences.
Also, TDR should recover you here already and make sure the
dma_fence_wait() is bound in time.
-Daniel
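Editorial aside: the return-value contract Dennis proposes — a wait that reports device loss instead of timing out — can be sketched with a toy fence type. This is not the real `struct dma_fence` API, just an illustration of the proposed semantics under stated assumptions:

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a fence; not the real struct dma_fence. */
struct toy_fence {
	bool signaled;
	bool device_lost;	/* would be set on GPU reset/unplug */
};

/* Sketch of the proposed behavior: -ENODEV on a lost device,
 * the remaining timeout once signaled, 0 on timeout. */
static long toy_fence_wait(struct toy_fence *f, long timeout)
{
	if (f->device_lost)
		return -ENODEV;
	if (f->signaled)
		return timeout;	/* "remaining jiffies" in the real contract */
	return 0;		/* timed out in this simplified model */
}
```

As Daniel notes, the real question is making such error propagation work uniformly for all fences rather than through a per-driver `wait` override.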

>
> Best Regards
> Dennis Li
> -Original Message-
> From: Christian König 
> Sent: Tuesday, April 13, 2021 3:10 PM
> To: Grodzovsky, Andrey ; Koenig, Christian 
> ; Li, Dennis ; 
> amd-gfx@lists.freedesktop.org; Deucher, Alexander 
> ; Kuehling, Felix ; Zhang, 
> Hawking ; Daniel Vetter 
> Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
>
> Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky:
> >
> > On 2021-04-12 3:18 p.m., Christian König wrote:
> >> Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:
> >>> [SNIP]
> >
> > So what's the right approach ? How we guarantee that when running
> > amdgpu_fence_driver_force_completion we will signal all the HW
> > fences and not racing against some more fences insertion into that
> > array ?
> >
> 
>  Well I would still say the best approach would be to insert this
>  between the front end and the backend and not rely on signaling
>  fences while holding the device srcu.
> >>>
> >>>
> >>> My question is, even now, when we run
> >>> amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or
> >>> amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion,
> >>> what there prevents a race with another fence being at the same time
> >>> emitted and inserted into the fence array ? Looks like nothing.
> >>>
> >>
> >> Each ring can only be used by one thread at the same time, this
> >> includes emitting fences as well as other stuff.
> >>
> >> During GPU reset we make sure nobody writes to the rings by stopping
> >> the scheduler and taking the GPU reset lock (so that nobody else can
> >> start the scheduler again).
> >
> >
> > What about direct submissions not through scheduler -
> > amdgpu_job_submit_direct, I don't see how this is protected.
>
> Those only happen during startup and GPU reset.
>
> >>
> 
>  BTW: Could it be that the device SRCU protects more than one device
>  and we deadlock because of this?
> >>>
> >>>
> >>> I haven't actually experienced any deadlock until now but, yes,
> >>> drm_unplug_srcu is defined as static in drm_drv.c and so in the
> >>> presence  of multiple devices from same or different drivers we in
> >>> fact are dependent on all their critical sections i guess.
> >>>
> >>
> >> Shit, yeah the devil is a squirrel. So for A+I laptops we actually
> >> need to sync that up with Daniel and the rest of the i915 guys.
> >>
> >> IIRC we could actually have an amdgpu device in a docking station
> >> which needs hotplug and the driver might depend on waiting for the
> >> i915 driver as well.
> >
> >
> > Can't we propose a patch to make drm_unplug_srcu per drm_device ? I
> > don't see why it has to be global and not per device thing.
>
> I'm really wondering the same thing for quite a while now.
>
> Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global.
>
> Regards,
> Christian.
>
> >
> > Andrey
> >
> >
> >>
> >> Christian.
> >>
> >>> Andrey
> >>>
> >>>
> 
>  Christian.
> 
> > Andrey
> >
> >
> >>
> >>> Andrey
> >>>
> >>>
> 
>  Christian.
> 
> > /* Past this point no more fence are submitted to HW ring
> > and hence we can safely call force signal on all that are
> > currently there.
> >  * Any subsequently created  HW fences will be returned
> > signaled with an error code right away
> >  */
> >
> > for_each_ring(adev)
> > amdgpu_fence_process(ring)
> >
> > drm_dev_unplug(dev);
> > Stop schedulers
> > cancel_sync(all timers and queued works);
> > hw_fini
> > unmap_mmio
> >
> > }
> >
> >
> >>

Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-04-13 Thread Daniel Vetter
On Tue, Apr 13, 2021 at 9:10 AM Christian König
 wrote:
>
> Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky:
> >
> > On 2021-04-12 3:18 p.m., Christian König wrote:
> >> Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:
> >>> [SNIP]
> >
> > So what's the right approach? How do we guarantee that when running
> > amdgpu_fence_driver_force_completion we will signal all the HW
> > fences and not race against more fence insertions into that
> > array?
> >
> 
>  Well I would still say the best approach would be to insert this
>  between the front end and the backend and not rely on signaling
>  fences while holding the device srcu.
> >>>
> >>>
> >>> My question is, even now, when we run
> >>> amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or
> >>> amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion,
> >>> what prevents a race with another fence being emitted and inserted
> >>> into the fence array at the same time? Looks like nothing.
> >>>
> >>
> >> Each ring can only be used by one thread at the same time, this
> >> includes emitting fences as well as other stuff.
> >>
> >> During GPU reset we make sure nobody writes to the rings by stopping
> >> the scheduler and taking the GPU reset lock (so that nobody else can
> >> start the scheduler again).
> >
> >
> > What about direct submissions not through scheduler -
> > amdgpu_job_submit_direct, I don't see how this is protected.
>
> Those only happen during startup and GPU reset.
>
> >>
> 
>  BTW: Could it be that the device SRCU protects more than one device
>  and we deadlock because of this?
> >>>
> >>>
> >>> I haven't actually experienced any deadlock until now but, yes,
> >>> drm_unplug_srcu is defined as static in drm_drv.c and so in the
> >>> presence of multiple devices from the same or different drivers we in
> >>> fact are dependent on all their critical sections, I guess.
> >>>
> >>
> >> Shit, yeah the devil is a squirrel. So for A+I laptops we actually
> >> need to sync that up with Daniel and the rest of the i915 guys.
> >>
> >> IIRC we could actually have an amdgpu device in a docking station
> >> which needs hotplug and the driver might depend on waiting for the
> >> i915 driver as well.
> >
> >
> > Can't we propose a patch to make drm_unplug_srcu per drm_device? I
> > don't see why it has to be global and not a per-device thing.
>
> I'm really wondering the same thing for quite a while now.
>
> Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global.

SRCU isn't exactly the cheapest thing, but aside from that we could
make it per-device. I'm not seeing the point much since if you do end
up being stuck on an ioctl this might happen with anything really.

Also note that dma_fence_waits are supposed to be time bound, so you
shouldn't end up waiting on them forever. It should all get sorted out
in due time with TDR I hope (e.g. if i915 is stuck on a fence because
you're unlucky).
-Daniel

>
> Regards,
> Christian.
>
> >
> > Andrey
> >
> >
> >>
> >> Christian.
> >>
> >>> Andrey
> >>>
> >>>
> 
>  Christian.
> 
> > Andrey
> >
> >
> >>
> >>> Andrey
> >>>
> >>>
> 
>  Christian.
> 
> > /* Past this point no more fences are submitted to the HW ring
> > and hence we can safely call force signal on all that are
> > currently there.
> >  * Any subsequently created  HW fences will be returned
> > signaled with an error code right away
> >  */
> >
> > for_each_ring(adev)
> > amdgpu_fence_process(ring)
> >
> > drm_dev_unplug(dev);
> > Stop schedulers
> > cancel_sync(all timers and queued works);
> > hw_fini
> > unmap_mmio
> >
> > }
> >
> >
> > Andrey
> >
> >
> >>
> >>
> >>>
> >>
> >> Alternatively grabbing the reset write side and stopping
> >> and then restarting the scheduler could work as well.
> >>
> >> Christian.
> >
> >
> > I didn't get the above and I don't see why I need to reuse
> > the GPU reset rw_lock. I rely on the SRCU unplug flag for
> > unplug. Also, it is not clear to me why we are focusing on the
> > scheduler threads; any code path that generates HW fences
> > should be covered, so any code leading to
> > amdgpu_fence_emit needs to be taken into account, such as
> > direct IB submissions, VM flushes, etc.
> 
>  You need to work together with the reset lock anyway, cause
>  a hotplug could run at the same time as a reset.
> >>>
> >>>
> >>> For going my way indeed now I see now that I have to take
> >>> reset write side lock during HW f

Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-04-13 Thread Andrey Grodzovsky


On 2021-04-13 2:25 p.m., Christian König wrote:



Am 13.04.21 um 20:18 schrieb Andrey Grodzovsky:


On 2021-04-13 2:03 p.m., Christian König wrote:

Am 13.04.21 um 17:12 schrieb Andrey Grodzovsky:


On 2021-04-13 3:10 a.m., Christian König wrote:

Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky:


On 2021-04-12 3:18 p.m., Christian König wrote:

Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:

[SNIP]


So what's the right approach? How do we guarantee that when 
running amdgpu_fence_driver_force_completion we will signal 
all the HW fences and not race against more fence 
insertions into that array?




Well I would still say the best approach would be to insert 
this between the front end and the backend and not rely on 
signaling fences while holding the device srcu.



My question is, even now, when we run 
amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or 
amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, 
what prevents a race with another fence being emitted and 
inserted into the fence array at the same time? Looks like 
nothing.




Each ring can only be used by one thread at the same time, this 
includes emitting fences as well as other stuff.


During GPU reset we make sure nobody writes to the rings by 
stopping the scheduler and taking the GPU reset lock (so that 
nobody else can start the scheduler again).



What about direct submissions not through scheduler - 
amdgpu_job_submit_direct, I don't see how this is protected.


Those only happen during startup and GPU reset.



Ok, but then it looks like I am missing something; see the following 
steps in amdgpu_pci_remove -


1) Use disable_irq API function to stop and flush all in flight HW 
interrupts handlers


2) Grab the reset lock and stop all the schedulers

After above 2 steps the HW fences array is idle, no more insertions 
and no more extractions from the array


3) Run one time amdgpu_fence_process to signal all current HW fences

4) Set drm_dev_unplug (will 'flush' all in flight IOCTLs), release 
the GPU reset lock and go on with the rest of the sequence (cancel 
timers, work items e.t.c)


What's problematic in this sequence ?


drm_dev_unplug() will wait for the IOCTLs to finish.

The IOCTLs in turn can wait for fences. That can be both hardware 
fences, scheduler fences, as well as fences from other devices (and 
KIQ fences for register writes under SRIOV, but we can hopefully 
ignore them for now).


We have handled the hardware fences, but we have no idea when the 
scheduler fences or the fences from other devices will signal.


Scheduler fences won't signal until the scheduler threads are 
restarted or we somehow cancel the submissions. Doable, but tricky 
as well.



For scheduler fences I am not worried; the sched_fence->finished 
fences are by definition attached to HW fences which already 
signaled. For sched_fence->scheduled we should run 
drm_sched_entity_kill_jobs for each entity after stopping the 
scheduler threads and before setting drm_dev_unplug.


Well exactly that is what is tricky here. drm_sched_entity_kill_jobs() 
assumes that there are no more jobs pushed into the entity.


We are racing here once more and need to handle that.



But why? I wrote above that we first stop all the schedulers, and only 
then call drm_sched_entity_kill_jobs.







For waiting for other device I have no idea if that couldn't 
deadlock somehow.



Yea, not sure for imported fences and dma_bufs, I would assume the 
other devices should not be impacted by our device removal but, who 
knows...


So I guess we are NOT going with finalizing HW fences before 
drm_dev_unplug and instead will just call drm_dev_enter/exit at the 
back-ends all over the place where there are MMIO accesses ?


Good question. As you said that is really the hard path.

Handling it all at once at IOCTL level certainly has some appeal as 
well, but I have no idea if we can guarantee that this is lock free.



Maybe just empirically - let's try it and see what actually happens 
under different test scenarios?


Andrey




Christian.



Andrey



Regards,
Christian.



Andrey









BTW: Could it be that the device SRCU protects more than one 
device and we deadlock because of this?



I haven't actually experienced any deadlock until now but, yes, 
drm_unplug_srcu is defined as static in drm_drv.c and so in the 
presence of multiple devices from the same or different drivers we 
in fact are dependent on all their critical sections, I guess.




Shit, yeah the devil is a squirrel. So for A+I laptops we 
actually need to sync that up with Daniel and the rest of the 
i915 guys.


IIRC we could actually have an amdgpu device in a docking 
station which needs hotplug and the driver might depend on 
waiting for the i915 driver as well.



Can't we propose a patch to make drm_unplug_srcu per drm_device? 
I don't see why it has to be global and not a per-device thing.


I'm really wondering the same thing for quite a while now.

Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global.

[PATCH] drm/amdgpu: Use iterator methods exposed by amdgpu_res_cursor.h in building SG_TABLE's for a VRAM BO

2021-04-13 Thread Ramesh Errabolu
Extend current implementation of SG_TABLE construction method to
allow exportation of sub-buffers of a VRAM BO. This capability will
enable logical partitioning of a VRAM BO into multiple non-overlapping
sub-buffers. One example of this use case is to partition a VRAM BO
into two sub-buffers, one for SRC and another for DST.

Signed-off-by: Ramesh Errabolu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c  |  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h  |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 34 ++--
 3 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index e0c4f7c7f1b9..baa980a477d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -291,8 +291,8 @@ static struct sg_table *amdgpu_dma_buf_map(struct 
dma_buf_attachment *attach,
break;
 
case TTM_PL_VRAM:
-   r = amdgpu_vram_mgr_alloc_sgt(adev, &bo->tbo.mem, attach->dev,
- dir, &sgt);
+   r = amdgpu_vram_mgr_alloc_sgt(adev, &bo->tbo.mem, 0,
+   bo->tbo.base.size, attach->dev, dir, &sgt);
if (r)
return ERR_PTR(r);
break;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index dec0db8b0b13..9e38475e0f8d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -112,6 +112,7 @@ int amdgpu_gtt_mgr_recover(struct ttm_resource_manager 
*man);
 u64 amdgpu_vram_mgr_bo_visible_size(struct amdgpu_bo *bo);
 int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
  struct ttm_resource *mem,
+ u64 offset, u64 size,
  struct device *dev,
  enum dma_data_direction dir,
  struct sg_table **sgt);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 592a2dd16493..bce105e2973e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -25,6 +25,7 @@
 #include 
 #include "amdgpu.h"
 #include "amdgpu_vm.h"
+#include "amdgpu_res_cursor.h"
 #include "amdgpu_atomfirmware.h"
 #include "atom.h"
 
@@ -565,6 +566,8 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager 
*man,
  *
  * @adev: amdgpu device pointer
  * @mem: TTM memory object
+ * @offset: byte offset from the base of VRAM BO
+ * @length: number of bytes to export in sg_table
  * @dev: the other device
  * @dir: dma direction
  * @sgt: resulting sg table
@@ -573,39 +576,47 @@ static void amdgpu_vram_mgr_del(struct 
ttm_resource_manager *man,
  */
 int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
  struct ttm_resource *mem,
+ u64 offset, u64 length,
  struct device *dev,
  enum dma_data_direction dir,
  struct sg_table **sgt)
 {
-   struct drm_mm_node *node;
+   struct amdgpu_res_cursor cursor;
struct scatterlist *sg;
int num_entries = 0;
-   unsigned int pages;
int i, r;
 
*sgt = kmalloc(sizeof(**sgt), GFP_KERNEL);
if (!*sgt)
return -ENOMEM;
 
-   for (pages = mem->num_pages, node = mem->mm_node;
-pages; pages -= node->size, ++node)
-   ++num_entries;
+   /* Determine the number of DRM_MM nodes to export */
+   amdgpu_res_first(mem, offset, length, &cursor);
+   while (cursor.remaining) {
+   num_entries++;
+   amdgpu_res_next(&cursor, cursor.size);
+   }
 
r = sg_alloc_table(*sgt, num_entries, GFP_KERNEL);
if (r)
goto error_free;
 
+   /* Initialize scatterlist nodes of sg_table */
for_each_sgtable_sg((*sgt), sg, i)
sg->length = 0;
 
-   node = mem->mm_node;
+   /*
+* Walk down DRM_MM nodes to populate scatterlist nodes
+* @note: Use iterator api to get first the DRM_MM node
+* and the number of bytes from it. Access the following
+* DRM_MM node(s) if more buffer needs to exported
+*/
+   amdgpu_res_first(mem, offset, length, &cursor);
for_each_sgtable_sg((*sgt), sg, i) {
-   phys_addr_t phys = (node->start << PAGE_SHIFT) +
-   adev->gmc.aper_base;
-   size_t size = node->size << PAGE_SHIFT;
+   phys_addr_t phys = cursor.start + adev->gmc.aper_base;
+   size_t size = cursor.size;
dma_addr_t addr;
 
-   ++node;
addr = dma_map_resource(dev, phys, size, dir,
DMA

Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-04-13 Thread Christian König



Am 13.04.21 um 20:18 schrieb Andrey Grodzovsky:


On 2021-04-13 2:03 p.m., Christian König wrote:

Am 13.04.21 um 17:12 schrieb Andrey Grodzovsky:


On 2021-04-13 3:10 a.m., Christian König wrote:

Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky:


On 2021-04-12 3:18 p.m., Christian König wrote:

Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:

[SNIP]


So what's the right approach? How do we guarantee that when 
running amdgpu_fence_driver_force_completion we will signal 
all the HW fences and not race against more fence 
insertions into that array?




Well I would still say the best approach would be to insert 
this between the front end and the backend and not rely on 
signaling fences while holding the device srcu.



My question is, even now, when we run 
amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or 
amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, 
what prevents a race with another fence being emitted and 
inserted into the fence array at the same time? Looks like 
nothing.




Each ring can only be used by one thread at the same time, this 
includes emitting fences as well as other stuff.


During GPU reset we make sure nobody writes to the rings by 
stopping the scheduler and taking the GPU reset lock (so that 
nobody else can start the scheduler again).



What about direct submissions not through scheduler - 
amdgpu_job_submit_direct, I don't see how this is protected.


Those only happen during startup and GPU reset.



Ok, but then it looks like I am missing something; see the following 
steps in amdgpu_pci_remove -


1) Use disable_irq API function to stop and flush all in flight HW 
interrupts handlers


2) Grab the reset lock and stop all the schedulers

After above 2 steps the HW fences array is idle, no more insertions 
and no more extractions from the array


3) Run one time amdgpu_fence_process to signal all current HW fences

4) Set drm_dev_unplug (will 'flush' all in flight IOCTLs), release 
the GPU reset lock and go on with the rest of the sequence (cancel 
timers, work items e.t.c)


What's problematic in this sequence ?


drm_dev_unplug() will wait for the IOCTLs to finish.

The IOCTLs in turn can wait for fences. That can be both hardware 
fences, scheduler fences, as well as fences from other devices (and 
KIQ fences for register writes under SRIOV, but we can hopefully 
ignore them for now).


We have handled the hardware fences, but we have no idea when the 
scheduler fences or the fences from other devices will signal.


Scheduler fences won't signal until the scheduler threads are 
restarted or we somehow cancel the submissions. Doable, but tricky as 
well.



For scheduler fences I am not worried; the sched_fence->finished 
fences are by definition attached to HW fences which already 
signaled. For sched_fence->scheduled we should run 
drm_sched_entity_kill_jobs for each entity after stopping the 
scheduler threads and before setting drm_dev_unplug.


Well exactly that is what is tricky here. drm_sched_entity_kill_jobs() 
assumes that there are no more jobs pushed into the entity.


We are racing here once more and need to handle that.



For waiting for other device I have no idea if that couldn't deadlock 
somehow.



Yea, not sure for imported fences and dma_bufs, I would assume the 
other devices should not be impacted by our device removal but, who 
knows...


So I guess we are NOT going with finalizing HW fences before 
drm_dev_unplug and instead will just call drm_dev_enter/exit at the 
back-ends all over the place where there are MMIO accesses ?


Good question. As you said that is really the hard path.

Handling it all at once at IOCTL level certainly has some appeal as 
well, but I have no idea if we can guarantee that this is lock free.


Christian.



Andrey



Regards,
Christian.



Andrey









BTW: Could it be that the device SRCU protects more than one 
device and we deadlock because of this?



I haven't actually experienced any deadlock until now but, yes, 
drm_unplug_srcu is defined as static in drm_drv.c and so in the 
presence of multiple devices from the same or different drivers we 
in fact are dependent on all their critical sections, I guess.




Shit, yeah the devil is a squirrel. So for A+I laptops we 
actually need to sync that up with Daniel and the rest of the 
i915 guys.


IIRC we could actually have an amdgpu device in a docking station 
which needs hotplug and the driver might depend on waiting for 
the i915 driver as well.



Can't we propose a patch to make drm_unplug_srcu per drm_device? 
I don't see why it has to be global and not a per-device thing.


I'm really wondering the same thing for quite a while now.

Adding Daniel as well, maybe he knows why the drm_unplug_srcu is 
global.


Regards,
Christian.



Andrey




Christian.


Andrey




Christian.


Andrey





Andrey




Christian.

    /* Past this point no more fences are submitted to the HW 
ring and hence we can safely call force signal on all that 
are currently there.

Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-04-13 Thread Andrey Grodzovsky


On 2021-04-13 2:03 p.m., Christian König wrote:

Am 13.04.21 um 17:12 schrieb Andrey Grodzovsky:


On 2021-04-13 3:10 a.m., Christian König wrote:

Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky:


On 2021-04-12 3:18 p.m., Christian König wrote:

Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:

[SNIP]


So what's the right approach? How do we guarantee that when 
running amdgpu_fence_driver_force_completion we will signal all 
the HW fences and not race against more fence insertions 
into that array?




Well I would still say the best approach would be to insert this 
between the front end and the backend and not rely on signaling 
fences while holding the device srcu.



My question is, even now, when we run 
amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or 
amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, 
what prevents a race with another fence being emitted and inserted 
into the fence array at the same time? Looks like nothing.




Each ring can only be used by one thread at the same time, this 
includes emitting fences as well as other stuff.


During GPU reset we make sure nobody writes to the rings by 
stopping the scheduler and taking the GPU reset lock (so that 
nobody else can start the scheduler again).



What about direct submissions not through scheduler - 
amdgpu_job_submit_direct, I don't see how this is protected.


Those only happen during startup and GPU reset.



Ok, but then it looks like I am missing something; see the following 
steps in amdgpu_pci_remove -


1) Use disable_irq API function to stop and flush all in flight HW 
interrupts handlers


2) Grab the reset lock and stop all the schedulers

After above 2 steps the HW fences array is idle, no more insertions 
and no more extractions from the array


3) Run one time amdgpu_fence_process to signal all current HW fences

4) Set drm_dev_unplug (will 'flush' all in flight IOCTLs), release 
the GPU reset lock and go on with the rest of the sequence (cancel 
timers, work items e.t.c)


What's problematic in this sequence ?


drm_dev_unplug() will wait for the IOCTLs to finish.

The IOCTLs in turn can wait for fences. That can be both hardware 
fences, scheduler fences, as well as fences from other devices (and 
KIQ fences for register writes under SRIOV, but we can hopefully 
ignore them for now).


We have handled the hardware fences, but we have no idea when the 
scheduler fences or the fences from other devices will signal.


Scheduler fences won't signal until the scheduler threads are 
restarted or we somehow cancel the submissions. Doable, but tricky as 
well.



For scheduler fences I am not worried; the sched_fence->finished 
fences are by definition attached to HW fences which already 
signaled. For sched_fence->scheduled we should run 
drm_sched_entity_kill_jobs for each entity after stopping the scheduler 
threads and before setting drm_dev_unplug.





For waiting for other device I have no idea if that couldn't deadlock 
somehow.



Yea, not sure for imported fences and dma_bufs, I would assume the other 
devices should not be impacted by our device removal but, who knows...


So I guess we are NOT going with finalizing HW fences before 
drm_dev_unplug and instead will just call drm_dev_enter/exit at the 
back-ends all over the place where there are MMIO accesses ?


Andrey



Regards,
Christian.



Andrey









BTW: Could it be that the device SRCU protects more than one 
device and we deadlock because of this?



I haven't actually experienced any deadlock until now but, yes, 
drm_unplug_srcu is defined as static in drm_drv.c and so in the 
presence of multiple devices from the same or different drivers we 
in fact are dependent on all their critical sections, I guess.




Shit, yeah the devil is a squirrel. So for A+I laptops we actually 
need to sync that up with Daniel and the rest of the i915 guys.


IIRC we could actually have an amdgpu device in a docking station 
which needs hotplug and the driver might depend on waiting for the 
i915 driver as well.



Can't we propose a patch to make drm_unplug_srcu per drm_device? I 
don't see why it has to be global and not a per-device thing.


I'm really wondering the same thing for quite a while now.

Adding Daniel as well, maybe he knows why the drm_unplug_srcu is 
global.


Regards,
Christian.



Andrey




Christian.


Andrey




Christian.


Andrey





Andrey




Christian.

    /* Past this point no more fences are submitted to the HW 
ring and hence we can safely call force signal on all that 
are currently there.
 * Any subsequently created  HW fences will be returned 
signaled with an error code right away

 */

    for_each_ring(adev)
        amdgpu_fence_process(ring)

    drm_dev_unplug(dev);
    Stop schedulers
    cancel_sync(all timers and queued works);
    hw_fini
    unmap_mmio

}


Andrey









Alternatively grabbing the reset write side and 
stopping and then restarting the scheduler could work 
as well

Re: [PATCH] drm/amdgpu: Use iterator methods exposed by amdgpu_res_cursor.h in building SG_TABLE's for a VRAM BO

2021-04-13 Thread Christian König

Am 13.04.21 um 19:17 schrieb Ramesh Errabolu:

Extend current implementation of SG_TABLE construction method to
allow exportation of sub-buffers of a VRAM BO. This capability will
enable logical partitioning of a VRAM BO into multiple non-overlapping
sub-buffers. One example of this use case is to partition a VRAM BO
into two sub-buffers, one for SRC and another for DST.

Signed-off-by: Ramesh Errabolu 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c  |  6 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h  |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 34 ++--
  3 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index e0c4f7c7f1b9..57534b93bd91 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -255,6 +255,7 @@ static struct sg_table *amdgpu_dma_buf_map(struct 
dma_buf_attachment *attach,
struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
struct sg_table *sgt;
+   u64 num_bytes;
long r;
  
  	if (!bo->tbo.pin_count) {

@@ -291,8 +292,9 @@ static struct sg_table *amdgpu_dma_buf_map(struct 
dma_buf_attachment *attach,
break;
  
  	case TTM_PL_VRAM:

-   r = amdgpu_vram_mgr_alloc_sgt(adev, &bo->tbo.mem, attach->dev,
- dir, &sgt);
+   num_bytes = bo->tbo.mem.num_pages << PAGE_SHIFT;


Please rather use bo->tbo.base.size here. It is already in bytes.


+   r = amdgpu_vram_mgr_alloc_sgt(adev, &bo->tbo.mem, 0, num_bytes,
+ attach->dev, dir, &sgt);
if (r)
return ERR_PTR(r);
break;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index dec0db8b0b13..9e38475e0f8d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -112,6 +112,7 @@ int amdgpu_gtt_mgr_recover(struct ttm_resource_manager 
*man);
  u64 amdgpu_vram_mgr_bo_visible_size(struct amdgpu_bo *bo);
  int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
  struct ttm_resource *mem,
+ u64 offset, u64 size,
  struct device *dev,
  enum dma_data_direction dir,
  struct sg_table **sgt);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 592a2dd16493..c1a7772fa8e8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -25,6 +25,7 @@
  #include 
  #include "amdgpu.h"
  #include "amdgpu_vm.h"
+#include "amdgpu_res_cursor.h"
  #include "amdgpu_atomfirmware.h"
  #include "atom.h"
  
@@ -565,6 +566,8 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man,

   *
   * @adev: amdgpu device pointer
   * @mem: TTM memory object
+ * @offset: byte offset from the base of VRAM BO
+ * @length: number of bytes to export in sg_table
   * @dev: the other device
   * @dir: dma direction
   * @sgt: resulting sg table
@@ -573,39 +576,47 @@ static void amdgpu_vram_mgr_del(struct 
ttm_resource_manager *man,
   */
  int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
  struct ttm_resource *mem,
+ u64 offset, u64 length,
  struct device *dev,
  enum dma_data_direction dir,
  struct sg_table **sgt)
  {
-   struct drm_mm_node *node;
+   struct amdgpu_res_cursor cursor;
struct scatterlist *sg;
int num_entries = 0;
-   unsigned int pages;
int i, r;
  
  	*sgt = kmalloc(sizeof(**sgt), GFP_KERNEL);

if (!*sgt)
return -ENOMEM;
  
-	for (pages = mem->num_pages, node = mem->mm_node;

-pages; pages -= node->size, ++node)
-   ++num_entries;
+   /* Determine the number of DRM_MM nodes to export */
+   amdgpu_res_first(mem, offset, length, &cursor);
+   while (cursor.remaining) {
+   num_entries++;
+   amdgpu_res_next(&cursor, cursor.size);
+   }
  
  	r = sg_alloc_table(*sgt, num_entries, GFP_KERNEL);

if (r)
goto error_free;
  
+	/* Initialize scatterlist nodes of sg_table */

for_each_sgtable_sg((*sgt), sg, i)
sg->length = 0;
  
-	node = mem->mm_node;

+   /*
+* Walk down DRM_MM nodes to populate scatterlist nodes
+* @note: Use iterator api to get first the DRM_MM node
+* and the number of bytes from it. Access the following
+* DRM_MM node(s) if more buffer needs to exported
+*/
+   amdgpu_res_first(mem, of

Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-04-13 Thread Christian König

Am 13.04.21 um 17:12 schrieb Andrey Grodzovsky:


On 2021-04-13 3:10 a.m., Christian König wrote:

Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky:


On 2021-04-12 3:18 p.m., Christian König wrote:

Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:

[SNIP]


So what's the right approach? How do we guarantee that when 
running amdgpu_fence_driver_force_completion we will signal all 
the HW fences and not race against more fence insertions 
into that array?




Well I would still say the best approach would be to insert this 
between the front end and the backend and not rely on signaling 
fences while holding the device srcu.



My question is, even now, when we run 
amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or 
amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, 
what prevents a race with another fence being emitted and inserted 
into the fence array at the same time? Looks like nothing.




Each ring can only be used by one thread at the same time, this 
includes emitting fences as well as other stuff.


During GPU reset we make sure nobody writes to the rings by 
stopping the scheduler and taking the GPU reset lock (so that 
nobody else can start the scheduler again).



What about direct submissions not through scheduler - 
amdgpu_job_submit_direct, I don't see how this is protected.


Those only happen during startup and GPU reset.



Ok, but then it looks like I am missing something; see the following 
steps in amdgpu_pci_remove -


1) Use disable_irq API function to stop and flush all in flight HW 
interrupts handlers


2) Grab the reset lock and stop all the schedulers

After above 2 steps the HW fences array is idle, no more insertions 
and no more extractions from the array


3) Run one time amdgpu_fence_process to signal all current HW fences

4) Set drm_dev_unplug (will 'flush' all in flight IOCTLs), release the 
GPU reset lock and go on with the rest of the sequence (cancel timers, 
work items e.t.c)


What's problematic in this sequence ?


drm_dev_unplug() will wait for the IOCTLs to finish.

The IOCTLs in turn can wait for fences. That can be both hardware 
fences, scheduler fences, as well as fences from other devices (and KIQ 
fences for register writes under SRIOV, but we can hopefully ignore them 
for now).


We have handled the hardware fences, but we have no idea when the 
scheduler fences or the fences from other devices will signal.


Scheduler fences won't signal until the scheduler threads are restarted 
or we somehow cancel the submissions. Doable, but tricky as well.


For waiting for other device I have no idea if that couldn't deadlock 
somehow.


Regards,
Christian.



Andrey









BTW: Could it be that the device SRCU protects more than one 
device and we deadlock because of this?



I haven't actually experienced any deadlock until now but, yes, 
drm_unplug_srcu is defined as static in drm_drv.c and so in the 
presence of multiple devices from the same or different drivers we in 
fact are dependent on all their critical sections, I guess.




Shit, yeah the devil is a squirrel. So for A+I laptops we actually 
need to sync that up with Daniel and the rest of the i915 guys.


IIRC we could actually have an amdgpu device in a docking station 
which needs hotplug and the driver might depend on waiting for the 
i915 driver as well.



Can't we propose a patch to make drm_unplug_srcu per drm_device? I 
don't see why it has to be global and not a per-device thing.


I've really been wondering the same thing for quite a while now.

Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global.

Regards,
Christian.



Andrey




Christian.


Andrey




Christian.


Andrey





Andrey




Christian.

    /* Past this point no more fences are submitted to the HW 
ring and hence we can safely call force signal on all that 
are currently there.
 * Any subsequently created HW fences will be returned 
signaled with an error code right away
 */

    for_each_ring(adev)
        amdgpu_fence_process(ring)

    drm_dev_unplug(dev);
    Stop schedulers
    cancel_sync(all timers and queued works);
    hw_fini
    unmap_mmio

}


Andrey









Alternatively grabbing the reset write side and 
stopping and then restarting the scheduler could work 
as well.


Christian.



I didn't get the above and I don't see why I need to 
reuse the GPU reset rw_lock. I rely on the SRCU unplug 
flag for unplug. Also, it's not clear to me why we are 
focusing on the scheduler threads; any code path that 
generates HW fences should be covered, so any code 
leading to amdgpu_fence_emit needs to be taken into 
account, such as direct IB submissions, VM flushes, etc.


You need to work together with the reset lock anyway, 
cause a hotplug could run at the same time as a reset.



For going my way indeed now I see now that I have to take 
reset write side lock during HW fences signalling in order 
to protect against scheduler/HW fences detachment and 
reattachment during schedulers 

[PATCH] drm/amdgpu: Use iterator methods exposed by amdgpu_res_cursor.h in building SG_TABLE's for a VRAM BO

2021-04-13 Thread Ramesh Errabolu
Extend the current implementation of the SG_TABLE construction method
to allow exporting sub-buffers of a VRAM BO. This capability will
enable logical partitioning of a VRAM BO into multiple non-overlapping
sub-buffers. One example of this use case is to partition a VRAM BO
into two sub-buffers, one for SRC and another for DST.

Signed-off-by: Ramesh Errabolu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c  |  6 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h  |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 34 ++--
 3 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index e0c4f7c7f1b9..57534b93bd91 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -255,6 +255,7 @@ static struct sg_table *amdgpu_dma_buf_map(struct 
dma_buf_attachment *attach,
struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
struct sg_table *sgt;
+   u64 num_bytes;
long r;
 
if (!bo->tbo.pin_count) {
@@ -291,8 +292,9 @@ static struct sg_table *amdgpu_dma_buf_map(struct 
dma_buf_attachment *attach,
break;
 
case TTM_PL_VRAM:
-   r = amdgpu_vram_mgr_alloc_sgt(adev, &bo->tbo.mem, attach->dev,
- dir, &sgt);
+   num_bytes = bo->tbo.mem.num_pages << PAGE_SHIFT;
+   r = amdgpu_vram_mgr_alloc_sgt(adev, &bo->tbo.mem, 0, num_bytes,
+ attach->dev, dir, &sgt);
if (r)
return ERR_PTR(r);
break;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index dec0db8b0b13..9e38475e0f8d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -112,6 +112,7 @@ int amdgpu_gtt_mgr_recover(struct ttm_resource_manager 
*man);
 u64 amdgpu_vram_mgr_bo_visible_size(struct amdgpu_bo *bo);
 int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
  struct ttm_resource *mem,
+ u64 offset, u64 size,
  struct device *dev,
  enum dma_data_direction dir,
  struct sg_table **sgt);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 592a2dd16493..c1a7772fa8e8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -25,6 +25,7 @@
 #include 
 #include "amdgpu.h"
 #include "amdgpu_vm.h"
+#include "amdgpu_res_cursor.h"
 #include "amdgpu_atomfirmware.h"
 #include "atom.h"
 
@@ -565,6 +566,8 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager 
*man,
  *
  * @adev: amdgpu device pointer
  * @mem: TTM memory object
+ * @offset: byte offset from the base of VRAM BO
+ * @length: number of bytes to export in sg_table
  * @dev: the other device
  * @dir: dma direction
  * @sgt: resulting sg table
@@ -573,39 +576,47 @@ static void amdgpu_vram_mgr_del(struct 
ttm_resource_manager *man,
  */
 int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
  struct ttm_resource *mem,
+ u64 offset, u64 length,
  struct device *dev,
  enum dma_data_direction dir,
  struct sg_table **sgt)
 {
-   struct drm_mm_node *node;
+   struct amdgpu_res_cursor cursor;
struct scatterlist *sg;
int num_entries = 0;
-   unsigned int pages;
int i, r;
 
*sgt = kmalloc(sizeof(**sgt), GFP_KERNEL);
if (!*sgt)
return -ENOMEM;
 
-   for (pages = mem->num_pages, node = mem->mm_node;
-pages; pages -= node->size, ++node)
-   ++num_entries;
+   /* Determine the number of DRM_MM nodes to export */
+   amdgpu_res_first(mem, offset, length, &cursor);
+   while (cursor.remaining) {
+   num_entries++;
+   amdgpu_res_next(&cursor, cursor.size);
+   }
 
r = sg_alloc_table(*sgt, num_entries, GFP_KERNEL);
if (r)
goto error_free;
 
+   /* Initialize scatterlist nodes of sg_table */
for_each_sgtable_sg((*sgt), sg, i)
sg->length = 0;
 
-   node = mem->mm_node;
+   /*
+* Walk down DRM_MM nodes to populate scatterlist nodes
+* @note: Use the iterator API to first get the DRM_MM node
+* and the number of bytes from it. Access the following
+* DRM_MM node(s) if more of the buffer needs to be exported
+*/
+   amdgpu_res_first(mem, offset, length, &cursor);
for_each_sgtable_sg((*sgt), sg, i) {
-   phys_addr_t phys = (node->star

Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-04-13 Thread Andrey Grodzovsky


On 2021-04-13 3:10 a.m., Christian König wrote:

Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky:


On 2021-04-12 3:18 p.m., Christian König wrote:

Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:

[SNIP]


So what's the right approach ? How we guarantee that when running 
amdgpu_fence_driver_force_completion we will signal all the HW 
fences and not racing against some more fences insertion into 
that array ?




Well I would still say the best approach would be to insert this 
between the front end and the backend and not rely on signaling 
fences while holding the device srcu.



My question is, even now, when we run 
amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or 
amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, 
what there prevents a race with another fence being at the same 
time emitted and inserted into the fence array ? Looks like nothing.




Each ring can only be used by one thread at the same time; this 
includes emitting fences as well as other stuff.


During GPU reset we make sure nobody writes to the rings by stopping 
the scheduler and taking the GPU reset lock (so that nobody else can 
start the scheduler again).



What about direct submissions not through scheduler - 
amdgpu_job_submit_direct, I don't see how this is protected.


Those only happen during startup and GPU reset.



Ok, but then it looks like I am missing something; see the following steps 
in amdgpu_pci_remove -


1) Use the disable_irq API function to stop and flush all in-flight HW 
interrupt handlers


2) Grab the reset lock and stop all the schedulers

After above 2 steps the HW fences array is idle, no more insertions and 
no more extractions from the array


3) Run one time amdgpu_fence_process to signal all current HW fences

4) Call drm_dev_unplug (will 'flush' all in-flight IOCTLs), release the 
GPU reset lock and go on with the rest of the sequence (cancel timers, 
work items, etc.)


What's problematic in this sequence?

Andrey









BTW: Could it be that the device SRCU protects more than one 
device and we deadlock because of this?



I haven't actually experienced any deadlock until now but, yes, 
drm_unplug_srcu is defined as static in drm_drv.c and so in the 
presence  of multiple devices from same or different drivers we in 
fact are dependent on all their critical sections i guess.




Shit, yeah the devil is a squirrel. So for A+I laptops we actually 
need to sync that up with Daniel and the rest of the i915 guys.


IIRC we could actually have an amdgpu device in a docking station 
which needs hotplug and the driver might depend on waiting for the 
i915 driver as well.



Can't we propose a patch to make drm_unplug_srcu per drm_device? I 
don't see why it has to be global rather than a per-device thing.


I've really been wondering the same thing for quite a while now.

Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global.

Regards,
Christian.



Andrey




Christian.


Andrey




Christian.


Andrey





Andrey




Christian.

    /* Past this point no more fences are submitted to the HW ring 
and hence we can safely call force signal on all that are 
currently there.
 * Any subsequently created HW fences will be returned 
signaled with an error code right away
 */

    for_each_ring(adev)
        amdgpu_fence_process(ring)

    drm_dev_unplug(dev);
    Stop schedulers
    cancel_sync(all timers and queued works);
    hw_fini
    unmap_mmio

}


Andrey









Alternatively grabbing the reset write side and stopping 
and then restarting the scheduler could work as well.


Christian.



I didn't get the above and I don't see why I need to 
reuse the GPU reset rw_lock. I rely on the SRCU unplug 
flag for unplug. Also, it's not clear to me why we are 
focusing on the scheduler threads; any code path that 
generates HW fences should be covered, so any code leading 
to amdgpu_fence_emit needs to be taken into account, such 
as direct IB submissions, VM flushes, etc.


You need to work together with the reset lock anyway, 
cause a hotplug could run at the same time as a reset.



Going my way, indeed I now see that I have to take the 
reset write side lock during HW fence signalling in order 
to protect against scheduler/HW fence detachment and 
reattachment during scheduler stop/restart. But if we go 
with your approach, then calling drm_dev_unplug and scoping 
amdgpu_job_timeout with drm_dev_enter/exit should be enough 
to prevent any concurrent GPU resets during unplug. In fact 
I already do it anyway - 
https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=drm-misc-next&id=ef0ea4dd29ef44d2649c5eda16c8f4869acc36b1

Re: [PATCH] drm/amd/amdgpu: Expose some power info through AMDGPU_INFO

2021-04-13 Thread Alex Deucher
On Mon, Apr 12, 2021 at 8:15 AM Roy Sun  wrote:
>
> Add an interface to get the mm clocks, temperature and memory load
>
> Signed-off-by: Roy Sun 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 50 +
>  include/uapi/drm/amdgpu_drm.h   | 12 ++
>  2 files changed, 62 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index b2e774aeab45..e5b16e0819ce 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -971,6 +971,56 @@ int amdgpu_info_ioctl(struct drm_device *dev, void 
> *data, struct drm_file *filp)
> }
> ui32 /= 100;
> break;
> +   case AMDGPU_INFO_SENSOR_UVD_VCLK:
> +   /* get mm vclk in Mhz */
> +   if (amdgpu_dpm_read_sensor(adev,
> +  AMDGPU_PP_SENSOR_UVD_VCLK,
> +  (void *)&ui32, 
> &ui32_size)) {
> +   return -EINVAL;
> +   }
> +   ui32 /= 100;
> +   break;
> +   case AMDGPU_INFO_SENSOR_UVD_DCLK:
> +   /* get mm dclk in Mhz */
> +   if (amdgpu_dpm_read_sensor(adev,
> +  AMDGPU_PP_SENSOR_UVD_DCLK,
> +  (void *)&ui32, 
> &ui32_size)) {
> +   return -EINVAL;
> +   }
> +   ui32 /= 100;
> +   break;
> +   case AMDGPU_INFO_SENSOR_HOTSPOT_TEMP:
> +   /* get junction temperature */
> +   if (amdgpu_dpm_read_sensor(adev,
> +  
> AMDGPU_PP_SENSOR_HOTSPOT_TEMP,
> +  (void *)&ui32, 
> &ui32_size)) {
> +   return -EINVAL;
> +   }
> +   break;
> +   case AMDGPU_INFO_SENSOR_EDGE_TEMP:
> +   /* get current edge temperature */
> +   if (amdgpu_dpm_read_sensor(adev,
> +  AMDGPU_PP_SENSOR_EDGE_TEMP,
> +  (void *)&ui32, 
> &ui32_size)) {
> +   return -EINVAL;
> +   }
> +   break;
> +   case AMDGPU_INFO_SENSOR_MEM_TEMP:
> +   /* get current memory temperature */
> +   if (amdgpu_dpm_read_sensor(adev,
> +  AMDGPU_PP_SENSOR_MEM_TEMP,
> +  (void *)&ui32, 
> &ui32_size)) {
> +   return -EINVAL;
> +   }
> +   break;
> +   case AMDGPU_INFO_SENSOR_MEM_LOAD:
> +   /* get memory load */
> +   if (amdgpu_dpm_read_sensor(adev,
> +  AMDGPU_PP_SENSOR_MEM_LOAD,
> +  (void *)&ui32, 
> &ui32_size)) {
> +   return -EINVAL;
> +   }
> +   break;
> default:
> DRM_DEBUG_KMS("Invalid request %d\n",
>   info->sensor_info.type);
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index 8b832f7458f2..484c72e17c72 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -780,6 +780,18 @@ struct drm_amdgpu_cs_chunk_data {
> #define AMDGPU_INFO_SENSOR_STABLE_PSTATE_GFX_SCLK   0x8
> /* Subquery id: Query GPU stable pstate memory clock */
> #define AMDGPU_INFO_SENSOR_STABLE_PSTATE_GFX_MCLK   0x9
> +   /* Subquery id: Query GPU mm vclk */
> +   #define AMDGPU_INFO_SENSOR_UVD_VCLK 0xa
> +   /* Subquery id: Query GPU mm dclk */
> +   #define AMDGPU_INFO_SENSOR_UVD_DCLK 0xb
> +   /* Subquery id: Query junction temperature */
> +   #define AMDGPU_INFO_SENSOR_HOTSPOT_TEMP 0xc
> +   /* Subquery id: Query edge temperature */
> +   #define AMDGPU_INFO_SENSOR_EDGE_TEMP0xd
> +   /* Subquery id: Query memory temperature */
> +   #define AMDGPU_INFO_SENSOR_MEM_TEMP 0xe
> +   /* Subquery id: Query Memory load */
> +   #define AMDGPU_INFO_SENSOR_MEM_LOAD 0xf

+ Tom

Please provide a link to patches for some userspace tool which uses
these new queries.  Something like umr would be a logical choice.
Once you have that, the patch itself loo

[PATCH] drm/amdgpu: remove set but not used variables

2021-04-13 Thread Tian Tao
The value of ret is set but immediately overwritten, so just delete it.

Signed-off-by: Tian Tao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 1fb2a91..f303ad6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1815,10 +1815,8 @@ int amdgpu_ras_recovery_init(struct amdgpu_device *adev)
return 0;
 
*data = kmalloc(sizeof(**data), GFP_KERNEL | __GFP_ZERO);
-   if (!*data) {
-   ret = -ENOMEM;
-   goto out;
-   }
+   if (!*data)
+   return -ENOMEM;
 
mutex_init(&con->recovery_lock);
INIT_WORK(&con->recovery_work, amdgpu_ras_do_recovery);
@@ -1848,7 +1846,7 @@ int amdgpu_ras_recovery_init(struct amdgpu_device *adev)
kfree((*data)->bps);
kfree(*data);
con->eh_data = NULL;
-out:
+
dev_warn(adev->dev, "Failed to initialize ras recovery!\n");
 
/*
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2 1/5] drm/connector: Create a helper to attach the hdr_output_metadata property

2021-04-13 Thread Maxime Ripard
All the drivers that implement HDR output call pretty much the same
function to initialise the hdr_output_metadata property, and while the
creation of that property is in a helper, every driver uses the same
code to attach it.

Provide a helper for it as well

Reviewed-by: Harry Wentland 
Reviewed-by: Jernej Skrabec 
Signed-off-by: Maxime Ripard 

---

Changes from v1:
  - Rebased on latest drm-misc-next tag
  - Added the tags
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  4 +---
 drivers/gpu/drm/bridge/synopsys/dw-hdmi.c |  3 +--
 drivers/gpu/drm/drm_connector.c   | 21 +++
 drivers/gpu/drm/i915/display/intel_hdmi.c |  3 +--
 include/drm/drm_connector.h   |  1 +
 5 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 55e39b462a5e..1e22ce718010 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -7078,9 +7078,7 @@ void amdgpu_dm_connector_init_helper(struct 
amdgpu_display_manager *dm,
if (connector_type == DRM_MODE_CONNECTOR_HDMIA ||
connector_type == DRM_MODE_CONNECTOR_DisplayPort ||
connector_type == DRM_MODE_CONNECTOR_eDP) {
-   drm_object_attach_property(
-   &aconnector->base.base,
-   dm->ddev->mode_config.hdr_output_metadata_property, 0);
+   
drm_connector_attach_hdr_output_metadata_property(&aconnector->base);
 
if (!aconnector->mst_port)

drm_connector_attach_vrr_capable_property(&aconnector->base);
diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c 
b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
index dda4fa9a1a08..f24bbb840dbf 100644
--- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
+++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
@@ -2492,8 +2492,7 @@ static int dw_hdmi_connector_create(struct dw_hdmi *hdmi)
drm_connector_attach_max_bpc_property(connector, 8, 16);
 
if (hdmi->version >= 0x200a && hdmi->plat_data->use_drm_infoframe)
-   drm_object_attach_property(&connector->base,
-   
connector->dev->mode_config.hdr_output_metadata_property, 0);
+   drm_connector_attach_hdr_output_metadata_property(connector);
 
drm_connector_attach_encoder(connector, hdmi->bridge.encoder);
 
diff --git a/drivers/gpu/drm/drm_connector.c b/drivers/gpu/drm/drm_connector.c
index 7631f76e7f34..a4aa2d87af35 100644
--- a/drivers/gpu/drm/drm_connector.c
+++ b/drivers/gpu/drm/drm_connector.c
@@ -2150,6 +2150,27 @@ int drm_connector_attach_max_bpc_property(struct 
drm_connector *connector,
 }
 EXPORT_SYMBOL(drm_connector_attach_max_bpc_property);
 
+/**
+ * drm_connector_attach_hdr_output_metadata_property - attach 
"HDR_OUTPUT_METADATA" property
+ * @connector: connector to attach the property on.
+ *
+ * This is used to allow the userspace to send HDR Metadata to the
+ * driver.
+ *
+ * Returns:
+ * Zero on success, negative errno on failure.
+ */
+int drm_connector_attach_hdr_output_metadata_property(struct drm_connector 
*connector)
+{
+   struct drm_device *dev = connector->dev;
+   struct drm_property *prop = 
dev->mode_config.hdr_output_metadata_property;
+
+   drm_object_attach_property(&connector->base, prop, 0);
+
+   return 0;
+}
+EXPORT_SYMBOL(drm_connector_attach_hdr_output_metadata_property);
+
 /**
  * drm_connector_set_vrr_capable_property - sets the variable refresh rate
  * capable property for a connector
diff --git a/drivers/gpu/drm/i915/display/intel_hdmi.c 
b/drivers/gpu/drm/i915/display/intel_hdmi.c
index 95919d325b0b..f2f1b025e6ba 100644
--- a/drivers/gpu/drm/i915/display/intel_hdmi.c
+++ b/drivers/gpu/drm/i915/display/intel_hdmi.c
@@ -2965,8 +2965,7 @@ intel_hdmi_add_properties(struct intel_hdmi *intel_hdmi, 
struct drm_connector *c
drm_connector_attach_content_type_property(connector);
 
if (INTEL_GEN(dev_priv) >= 10 || IS_GEMINILAKE(dev_priv))
-   drm_object_attach_property(&connector->base,
-   
connector->dev->mode_config.hdr_output_metadata_property, 0);
+   drm_connector_attach_hdr_output_metadata_property(connector);
 
if (!HAS_GMCH(dev_priv))
drm_connector_attach_max_bpc_property(connector, 8, 12);
diff --git a/include/drm/drm_connector.h b/include/drm/drm_connector.h
index 1922b278ffad..32172dab8427 100644
--- a/include/drm/drm_connector.h
+++ b/include/drm/drm_connector.h
@@ -1671,6 +1671,7 @@ int drm_connector_attach_scaling_mode_property(struct 
drm_connector *connector,
   u32 scaling_mode_mask);
 int drm_connector_attach_vrr_capable_property(
struct drm_connector *connector);
+int drm_connector_attach_hdr_output_metadata_property(struct drm_connector 
*connector);
 int drm_mode_create_

[PATCH v2 5/5] drm/vc4: hdmi: Signal the proper colorimetry info in the infoframe

2021-04-13 Thread Maxime Ripard
Our driver, while supporting HDR, didn't send the proper colorimetry info
in the AVI infoframe.

Let's add the property needed so that the userspace can let us know what
the colorspace is supposed to be.

Signed-off-by: Maxime Ripard 

---

Changes from v1:
  - New patch
---
 drivers/gpu/drm/vc4/vc4_hdmi.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c
index a33fa1662588..a22e17788074 100644
--- a/drivers/gpu/drm/vc4/vc4_hdmi.c
+++ b/drivers/gpu/drm/vc4/vc4_hdmi.c
@@ -226,7 +226,8 @@ static int vc4_hdmi_connector_atomic_check(struct 
drm_connector *connector,
if (!crtc)
return 0;
 
-   if (!drm_connector_atomic_hdr_metadata_equal(old_state, new_state)) {
+   if (old_state->colorspace != new_state->colorspace ||
+   !drm_connector_atomic_hdr_metadata_equal(old_state, new_state)) {
struct drm_crtc_state *crtc_state;
 
crtc_state = drm_atomic_get_crtc_state(state, crtc);
@@ -316,6 +317,11 @@ static int vc4_hdmi_connector_init(struct drm_device *dev,
if (ret)
return ret;
 
+   ret = drm_mode_create_hdmi_colorspace_property(connector);
+   if (ret)
+   return ret;
+
+   drm_connector_attach_colorspace_property(connector);
drm_connector_attach_tv_margin_properties(connector);
drm_connector_attach_max_bpc_property(connector, 8, 12);
 
@@ -424,7 +430,7 @@ static void vc4_hdmi_set_avi_infoframe(struct drm_encoder 
*encoder)
   vc4_encoder->limited_rgb_range ?
   HDMI_QUANTIZATION_RANGE_LIMITED :
   HDMI_QUANTIZATION_RANGE_FULL);
-
+   drm_hdmi_avi_infoframe_colorspace(&frame.avi, cstate);
drm_hdmi_avi_infoframe_bars(&frame.avi, cstate);
 
vc4_hdmi_write_infoframe(encoder, &frame);
-- 
2.30.2

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2 4/5] drm/connector: Add a helper to attach the colorspace property

2021-04-13 Thread Maxime Ripard
The intel driver uses the same logic to attach the Colorspace property
in multiple places, and we'll need it in vc4 too. Let's move that common
code into a helper.

Signed-off-by: Maxime Ripard 

---

Changes from v1:
  - New patch
---
 drivers/gpu/drm/drm_connector.c   | 20 +++
 .../gpu/drm/i915/display/intel_connector.c|  6 ++
 include/drm/drm_connector.h   |  1 +
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_connector.c b/drivers/gpu/drm/drm_connector.c
index b13021b1b128..6a20b249e533 100644
--- a/drivers/gpu/drm/drm_connector.c
+++ b/drivers/gpu/drm/drm_connector.c
@@ -2171,6 +2171,26 @@ int 
drm_connector_attach_hdr_output_metadata_property(struct drm_connector *conn
 }
 EXPORT_SYMBOL(drm_connector_attach_hdr_output_metadata_property);
 
+/**
+ * drm_connector_attach_colorspace_property - attach "Colorspace" property
+ * @connector: connector to attach the property on.
+ *
+ * This is used to allow the userspace to signal the output colorspace
+ * to the driver.
+ *
+ * Returns:
+ * Zero on success, negative errno on failure.
+ */
+int drm_connector_attach_colorspace_property(struct drm_connector *connector)
+{
+   struct drm_property *prop = connector->colorspace_property;
+
+   drm_object_attach_property(&connector->base, prop, 
DRM_MODE_COLORIMETRY_DEFAULT);
+
+   return 0;
+}
+EXPORT_SYMBOL(drm_connector_attach_colorspace_property);
+
 /**
  * drm_connector_atomic_hdr_metadata_equal - checks if the hdr metadata changed
  * @old_state: old connector state to compare
diff --git a/drivers/gpu/drm/i915/display/intel_connector.c 
b/drivers/gpu/drm/i915/display/intel_connector.c
index d5ceb7bdc14b..9bed1ccecea0 100644
--- a/drivers/gpu/drm/i915/display/intel_connector.c
+++ b/drivers/gpu/drm/i915/display/intel_connector.c
@@ -282,14 +282,12 @@ void
 intel_attach_hdmi_colorspace_property(struct drm_connector *connector)
 {
if (!drm_mode_create_hdmi_colorspace_property(connector))
-   drm_object_attach_property(&connector->base,
-  connector->colorspace_property, 0);
+   drm_connector_attach_colorspace_property(connector);
 }
 
 void
 intel_attach_dp_colorspace_property(struct drm_connector *connector)
 {
if (!drm_mode_create_dp_colorspace_property(connector))
-   drm_object_attach_property(&connector->base,
-  connector->colorspace_property, 0);
+   drm_connector_attach_colorspace_property(connector);
 }
diff --git a/include/drm/drm_connector.h b/include/drm/drm_connector.h
index 1f51d73ca715..714d1a01c065 100644
--- a/include/drm/drm_connector.h
+++ b/include/drm/drm_connector.h
@@ -1671,6 +1671,7 @@ int drm_connector_attach_scaling_mode_property(struct 
drm_connector *connector,
   u32 scaling_mode_mask);
 int drm_connector_attach_vrr_capable_property(
struct drm_connector *connector);
+int drm_connector_attach_colorspace_property(struct drm_connector *connector);
 int drm_connector_attach_hdr_output_metadata_property(struct drm_connector 
*connector);
 bool drm_connector_atomic_hdr_metadata_equal(struct drm_connector_state 
*old_state,
 struct drm_connector_state 
*new_state);
-- 
2.30.2

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2 3/5] drm/vc4: Add HDR metadata property to the VC5 HDMI connectors

2021-04-13 Thread Maxime Ripard
From: Dave Stevenson 

Now that we can export deeper colour depths, add in the signalling
for HDR metadata.

Signed-off-by: Dave Stevenson 
Signed-off-by: Maxime Ripard 

---

Changes from v1:
  - Rebased on latest drm-misc-next tag
---
 drivers/gpu/drm/vc4/vc4_hdmi.c | 53 ++
 drivers/gpu/drm/vc4/vc4_hdmi.h |  3 ++
 2 files changed, 56 insertions(+)

diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c
index 1fda574579af..a33fa1662588 100644
--- a/drivers/gpu/drm/vc4/vc4_hdmi.c
+++ b/drivers/gpu/drm/vc4/vc4_hdmi.c
@@ -214,6 +214,31 @@ static int vc4_hdmi_connector_get_modes(struct 
drm_connector *connector)
return ret;
 }
 
+static int vc4_hdmi_connector_atomic_check(struct drm_connector *connector,
+ struct drm_atomic_state *state)
+{
+   struct drm_connector_state *old_state =
+   drm_atomic_get_old_connector_state(state, connector);
+   struct drm_connector_state *new_state =
+   drm_atomic_get_new_connector_state(state, connector);
+   struct drm_crtc *crtc = new_state->crtc;
+
+   if (!crtc)
+   return 0;
+
+   if (!drm_connector_atomic_hdr_metadata_equal(old_state, new_state)) {
+   struct drm_crtc_state *crtc_state;
+
+   crtc_state = drm_atomic_get_crtc_state(state, crtc);
+   if (IS_ERR(crtc_state))
+   return PTR_ERR(crtc_state);
+
+   crtc_state->mode_changed = true;
+   }
+
+   return 0;
+}
+
 static void vc4_hdmi_connector_reset(struct drm_connector *connector)
 {
struct vc4_hdmi_connector_state *old_state =
@@ -263,6 +288,7 @@ static const struct drm_connector_funcs 
vc4_hdmi_connector_funcs = {
 
 static const struct drm_connector_helper_funcs vc4_hdmi_connector_helper_funcs 
= {
.get_modes = vc4_hdmi_connector_get_modes,
+   .atomic_check = vc4_hdmi_connector_atomic_check,
 };
 
 static int vc4_hdmi_connector_init(struct drm_device *dev,
@@ -299,6 +325,9 @@ static int vc4_hdmi_connector_init(struct drm_device *dev,
connector->interlace_allowed = 1;
connector->doublescan_allowed = 0;
 
+   if (vc4_hdmi->variant->supports_hdr)
+   drm_connector_attach_hdr_output_metadata_property(connector);
+
drm_connector_attach_encoder(connector, encoder);
 
return 0;
@@ -432,6 +461,25 @@ static void vc4_hdmi_set_audio_infoframe(struct 
drm_encoder *encoder)
vc4_hdmi_write_infoframe(encoder, &frame);
 }
 
+static void vc4_hdmi_set_hdr_infoframe(struct drm_encoder *encoder)
+{
+   struct vc4_hdmi *vc4_hdmi = encoder_to_vc4_hdmi(encoder);
+   struct drm_connector *connector = &vc4_hdmi->connector;
+   struct drm_connector_state *conn_state = connector->state;
+   union hdmi_infoframe frame;
+
+   if (!vc4_hdmi->variant->supports_hdr)
+   return;
+
+   if (!conn_state->hdr_output_metadata)
+   return;
+
+   if (drm_hdmi_infoframe_set_hdr_metadata(&frame.drm, conn_state))
+   return;
+
+   vc4_hdmi_write_infoframe(encoder, &frame);
+}
+
 static void vc4_hdmi_set_infoframes(struct drm_encoder *encoder)
 {
struct vc4_hdmi *vc4_hdmi = encoder_to_vc4_hdmi(encoder);
@@ -444,6 +492,8 @@ static void vc4_hdmi_set_infoframes(struct drm_encoder 
*encoder)
 */
if (vc4_hdmi->audio.streaming)
vc4_hdmi_set_audio_infoframe(encoder);
+
+   vc4_hdmi_set_hdr_infoframe(encoder);
 }
 
 static void vc4_hdmi_encoder_post_crtc_disable(struct drm_encoder *encoder,
@@ -2102,6 +2152,7 @@ static const struct vc4_hdmi_variant bcm2835_variant = {
.phy_rng_enable = vc4_hdmi_phy_rng_enable,
.phy_rng_disable= vc4_hdmi_phy_rng_disable,
.channel_map= vc4_hdmi_channel_map,
+   .supports_hdr   = false,
 };
 
 static const struct vc4_hdmi_variant bcm2711_hdmi0_variant = {
@@ -2129,6 +2180,7 @@ static const struct vc4_hdmi_variant 
bcm2711_hdmi0_variant = {
.phy_rng_enable = vc5_hdmi_phy_rng_enable,
.phy_rng_disable= vc5_hdmi_phy_rng_disable,
.channel_map= vc5_hdmi_channel_map,
+   .supports_hdr   = true,
 };
 
 static const struct vc4_hdmi_variant bcm2711_hdmi1_variant = {
@@ -2156,6 +2208,7 @@ static const struct vc4_hdmi_variant 
bcm2711_hdmi1_variant = {
.phy_rng_enable = vc5_hdmi_phy_rng_enable,
.phy_rng_disable= vc5_hdmi_phy_rng_disable,
.channel_map= vc5_hdmi_channel_map,
+   .supports_hdr   = true,
 };
 
 static const struct of_device_id vc4_hdmi_dt_match[] = {
diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.h b/drivers/gpu/drm/vc4/vc4_hdmi.h
index 3cebd1fd00fc..060bcaefbeb5 100644
--- a/drivers/gpu/drm/vc4/vc4_hdmi.h
+++ b/drivers/gpu/drm/vc4/vc4_hdmi.h
@@ -99,6 +99,9 @@ struct vc4_hdmi_variant {
 
/* Callback to get channel map */

[PATCH v2 2/5] drm/connector: Add helper to compare HDR metadata

2021-04-13 Thread Maxime Ripard
All the drivers that support the HDR metadata property have a similar
function to compare the metadata from one connector state to the next,
and force a mode change if they differ.

All these functions run pretty much the same code, so let's turn it into
a helper that can be shared across those drivers.

Reviewed-by: Harry Wentland 
Reviewed-by: Jernej Skrabec 
Signed-off-by: Maxime Ripard 

---

Changes from v1:
  - Rebased on latest drm-misc-next tag
  - Added the tags
  - Fix build breakage on amdgpu
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 23 ++-
 drivers/gpu/drm/bridge/synopsys/dw-hdmi.c | 17 +--
 drivers/gpu/drm/drm_connector.c   | 28 +++
 drivers/gpu/drm/i915/display/intel_atomic.c   | 13 +
 include/drm/drm_connector.h   |  2 ++
 5 files changed, 34 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 1e22ce718010..ed1972fb531d 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -5967,25 +5967,6 @@ static int fill_hdr_info_packet(const struct 
drm_connector_state *state,
return 0;
 }
 
-static bool
-is_hdr_metadata_different(const struct drm_connector_state *old_state,
- const struct drm_connector_state *new_state)
-{
-   struct drm_property_blob *old_blob = old_state->hdr_output_metadata;
-   struct drm_property_blob *new_blob = new_state->hdr_output_metadata;
-
-   if (old_blob != new_blob) {
-   if (old_blob && new_blob &&
-   old_blob->length == new_blob->length)
-   return memcmp(old_blob->data, new_blob->data,
- old_blob->length);
-
-   return true;
-   }
-
-   return false;
-}
-
 static int
 amdgpu_dm_connector_atomic_check(struct drm_connector *conn,
 struct drm_atomic_state *state)
@@ -6003,7 +5984,7 @@ amdgpu_dm_connector_atomic_check(struct drm_connector 
*conn,
if (!crtc)
return 0;
 
-   if (is_hdr_metadata_different(old_con_state, new_con_state)) {
+   if (!drm_connector_atomic_hdr_metadata_equal(old_con_state, 
new_con_state)) {
struct dc_info_packet hdr_infopacket;
 
ret = fill_hdr_info_packet(new_con_state, &hdr_infopacket);
@@ -8357,7 +8338,7 @@ static void amdgpu_dm_atomic_commit_tail(struct 
drm_atomic_state *state)
  dm_old_crtc_state->abm_level;
 
hdr_changed =
-   is_hdr_metadata_different(old_con_state, new_con_state);
+   !drm_connector_atomic_hdr_metadata_equal(old_con_state, 
new_con_state);
 
if (!scaling_changed && !abm_changed && !hdr_changed)
continue;
diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c 
b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
index f24bbb840dbf..f871e33c2fc9 100644
--- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
+++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
@@ -2395,21 +2395,6 @@ static int dw_hdmi_connector_get_modes(struct 
drm_connector *connector)
return ret;
 }
 
-static bool hdr_metadata_equal(const struct drm_connector_state *old_state,
-  const struct drm_connector_state *new_state)
-{
-   struct drm_property_blob *old_blob = old_state->hdr_output_metadata;
-   struct drm_property_blob *new_blob = new_state->hdr_output_metadata;
-
-   if (!old_blob || !new_blob)
-   return old_blob == new_blob;
-
-   if (old_blob->length != new_blob->length)
-   return false;
-
-   return !memcmp(old_blob->data, new_blob->data, old_blob->length);
-}
-
 static int dw_hdmi_connector_atomic_check(struct drm_connector *connector,
  struct drm_atomic_state *state)
 {
@@ -2423,7 +2408,7 @@ static int dw_hdmi_connector_atomic_check(struct 
drm_connector *connector,
if (!crtc)
return 0;
 
-   if (!hdr_metadata_equal(old_state, new_state)) {
+   if (!drm_connector_atomic_hdr_metadata_equal(old_state, new_state)) {
crtc_state = drm_atomic_get_crtc_state(state, crtc);
if (IS_ERR(crtc_state))
return PTR_ERR(crtc_state);
diff --git a/drivers/gpu/drm/drm_connector.c b/drivers/gpu/drm/drm_connector.c
index a4aa2d87af35..b13021b1b128 100644
--- a/drivers/gpu/drm/drm_connector.c
+++ b/drivers/gpu/drm/drm_connector.c
@@ -2171,6 +2171,34 @@ int 
drm_connector_attach_hdr_output_metadata_property(struct drm_connector *conn
 }
 EXPORT_SYMBOL(drm_connector_attach_hdr_output_metadata_property);
 
+/**
+ * drm_connector_atomic_hdr_metadata_equal - checks if the hdr metadata changed
+ * @old_state: old connector state to compare
+ * @new_state: new connector

Re: [PATCH 4/4] drm/amdgpu: Fix kernel-doc for the RAS sysfs interface

2021-04-13 Thread Deucher, Alexander
[AMD Public Use]

Series is:
Reviewed-by: Alex Deucher 

From: Zhang, Hawking 
Sent: Tuesday, April 13, 2021 9:04 AM
To: Tuikov, Luben ; amd-gfx@lists.freedesktop.org 

Cc: Deucher, Alexander 
Subject: RE: [PATCH 4/4] drm/amdgpu: Fix kernel-doc for the RAS sysfs interface

[AMD Public Use]

Series is

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: Tuikov, Luben 
Sent: Tuesday, April 13, 2021 20:56
To: amd-gfx@lists.freedesktop.org
Cc: Tuikov, Luben ; Deucher, Alexander 
; Zhang, Hawking 
Subject: [PATCH 4/4] drm/amdgpu: Fix kernel-doc for the RAS sysfs interface

Improve the kernel-doc for the RAS sysfs interface. Fix the grammar, fix the 
context.

Cc: Alexander Deucher 
Cc: Hawking Zhang 
Signed-off-by: Luben Tuikov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 47 +
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 30cda4b8a461..44dfb3613e37 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -274,7 +274,7 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file 
*f,
 /**
  * DOC: AMDGPU RAS debugfs control interface
  *
- * It accepts struct ras_debug_if who has two members.
+ * The control interface accepts struct ras_debug_if which has two members.
  *
  * First member: ras_debug_if::head or ras_debug_if::inject.
  *
@@ -299,32 +299,33 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file 
*f,
  *
  * How to use the interface?
  *
- * Programs
+ * In a program
  *
- * Copy the struct ras_debug_if in your codes and initialize it.
- * Write the struct to the control node.
+ * Copy the struct ras_debug_if in your code and initialize it.
+ * Write the struct to the control interface.
  *
- * Shells
+ * From shell
  *
  * .. code-block:: bash
  *
- * echo op block [error [sub_block address value]] > .../ras/ras_ctrl
+ * echo "disable <block>" > /sys/kernel/debug/dri/<N>/ras/ras_ctrl
+ * echo "enable <block> <error>" > /sys/kernel/debug/dri/<N>/ras/ras_ctrl
+ * echo "inject <block> <error> <sub-block> <address> <value>" > /sys/kernel/debug/dri/<N>/ras/ras_ctrl
  *
- * Parameters:
+ * Where N, is the card which you want to affect.
  *
- * op: disable, enable, inject
- * disable: only block is needed
- * enable: block and error are needed
- * inject: error, address, value are needed
- * block: umc, sdma, gfx, .
+ * "disable" requires only the block.
+ * "enable" requires the block and error type.
+ * "inject" requires the block, error type, address, and value.
+ * The block is one of: umc, sdma, gfx, etc.
  * see ras_block_string[] for details
- * error: ue, ce
- * ue: multi_uncorrectable
- * ce: single_correctable
- * sub_block:
- * sub block index, pass 0 if there is no sub block
+ * The error type is one of: ue, ce, where,
+ * ue is multi-uncorrectable
+ * ce is single-correctable
+ * The sub-block is the sub-block index, pass 0 if there is no sub-block.
+ * The address and value are hexadecimal numbers, leading 0x is optional.
  *
- * here are some examples for bash commands:
+ * For instance,
  *
  * .. code-block:: bash
  *
@@ -332,17 +333,17 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file 
*f,
  * echo inject umc ce 0 0 0 > /sys/kernel/debug/dri/0/ras/ras_ctrl
  * echo disable umc > /sys/kernel/debug/dri/0/ras/ras_ctrl
  *
- * How to check the result?
+ * How to check the result of the operation?
  *
- * For disable/enable, please check ras features at
+ * To check disable/enable, see "ras" features at,
  * /sys/class/drm/card[0/1/2...]/device/ras/features
  *
- * For inject, please check corresponding err count at
- * /sys/class/drm/card[0/1/2...]/device/ras/[gfx/sdma/...]_err_count
+ * To check inject, see the corresponding error count at,
+ * /sys/class/drm/card[0/1/2...]/device/ras/[gfx|sdma|umc|...]_err_count
  *
  * .. note::
  * Operations are only allowed on blocks which are supported.
- * Please check ras mask at /sys/module/amdgpu/parameters/ras_mask
+ * Check the "ras" mask at /sys/module/amdgpu/parameters/ras_mask
  * to see which blocks support RAS on a particular asic.
  *
  */
--
2.31.0.97.g1424303384
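As a side note, the programmatic path the doc comment describes ("copy the struct in your code and write it to the control interface") might look roughly like this from user space. The struct layout below is a simplified placeholder; as the doc comment says, the real ras_debug_if must be copied from the kernel sources, and the op/block/error encodings here are assumptions for illustration only:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Simplified placeholder for struct ras_debug_if -- NOT the real layout;
 * copy the actual struct from the kernel sources as the doc comment
 * instructs. */
struct ras_debug_if {
	uint32_t op;        /* assumed: 0 = disable, 1 = enable, 2 = inject */
	uint32_t block;     /* umc, sdma, gfx, ... */
	uint32_t error;     /* ue or ce */
	uint64_t address;
	uint64_t value;
};

/* Write one command struct to the RAS control node; returns the number
 * of bytes written, or -1 on error.  O_CREAT is only there so the sketch
 * also works against a plain file; the real debugfs node already exists. */
static ssize_t ras_ctrl_write(const char *path, const struct ras_debug_if *cmd)
{
	int fd = open(path, O_WRONLY | O_CREAT, 0600);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, cmd, sizeof(*cmd));
	close(fd);
	return n;
}
```

In real use the path would be /sys/kernel/debug/dri/N/ras/ras_ctrl for the card N you want to affect.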
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: correction of ucode fw_size calculation errors

2021-04-13 Thread Deucher, Alexander
[AMD Public Use]

Reviewed-by: Alex Deucher 

From: amd-gfx  on behalf of Kevin Wang 

Sent: Tuesday, April 13, 2021 7:51 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Wang, Kevin(Yang) 
Subject: [PATCH] drm/amdgpu: correction of ucode fw_size calculation errors

correct big and little endian problems on different platforms.
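For context on why the conversion matters: firmware headers store these size fields little-endian on disk, so reading the raw field on a big-endian host yields a byte-swapped value. A portable userspace re-implementation of the le32-to-host conversion (for illustration only; the kernel uses the le32_to_cpu macro) looks like:

```c
#include <assert.h>
#include <stdint.h>

/* Portable re-implementation of a le32-to-host conversion for
 * illustration: assemble the host value byte by byte from the
 * little-endian on-disk order, which is correct regardless of the
 * host's own endianness. */
static uint32_t le32_to_host(const uint8_t buf[4])
{
	return (uint32_t)buf[0] |
	       ((uint32_t)buf[1] << 8) |
	       ((uint32_t)buf[2] << 16) |
	       ((uint32_t)buf[3] << 24);
}
```

On a little-endian host the conversion is a no-op, which is why the missing le32_to_cpu() calls fixed below went unnoticed there.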

Signed-off-by: Kevin Wang 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 8 
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 85a6a10e048f..0a00a7cc8eaa 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4354,7 +4354,7 @@ static int gfx_v10_0_mec_init(struct amdgpu_device *adev)
  
le32_to_cpu(mec_hdr->header.ucode_array_offset_bytes));
 fw_size = le32_to_cpu(mec_hdr->header.ucode_size_bytes);

-   r = amdgpu_bo_create_reserved(adev, 
mec_hdr->header.ucode_size_bytes,
+   r = amdgpu_bo_create_reserved(adev, fw_size,
   PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT,
   &adev->gfx.mec.mec_fw_obj,
   &adev->gfx.mec.mec_fw_gpu_addr,
@@ -5769,7 +5769,7 @@ static int gfx_v10_0_cp_gfx_load_pfp_microcode(struct 
amdgpu_device *adev)
 le32_to_cpu(pfp_hdr->header.ucode_array_offset_bytes));
 fw_size = le32_to_cpu(pfp_hdr->header.ucode_size_bytes);

-   r = amdgpu_bo_create_reserved(adev, pfp_hdr->header.ucode_size_bytes,
+   r = amdgpu_bo_create_reserved(adev, fw_size,
   PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT,
   &adev->gfx.pfp.pfp_fw_obj,
   &adev->gfx.pfp.pfp_fw_gpu_addr,
@@ -5847,7 +5847,7 @@ static int gfx_v10_0_cp_gfx_load_ce_microcode(struct 
amdgpu_device *adev)
 le32_to_cpu(ce_hdr->header.ucode_array_offset_bytes));
 fw_size = le32_to_cpu(ce_hdr->header.ucode_size_bytes);

-   r = amdgpu_bo_create_reserved(adev, ce_hdr->header.ucode_size_bytes,
+   r = amdgpu_bo_create_reserved(adev, fw_size,
   PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT,
   &adev->gfx.ce.ce_fw_obj,
   &adev->gfx.ce.ce_fw_gpu_addr,
@@ -5924,7 +5924,7 @@ static int gfx_v10_0_cp_gfx_load_me_microcode(struct 
amdgpu_device *adev)
 le32_to_cpu(me_hdr->header.ucode_array_offset_bytes));
 fw_size = le32_to_cpu(me_hdr->header.ucode_size_bytes);

-   r = amdgpu_bo_create_reserved(adev, me_hdr->header.ucode_size_bytes,
+   r = amdgpu_bo_create_reserved(adev, fw_size,
   PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT,
   &adev->gfx.me.me_fw_obj,
   &adev->gfx.me.me_fw_gpu_addr,
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 06811a1f4625..831398e15b84 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -2013,7 +2013,7 @@ static int gfx_v9_0_mec_init(struct amdgpu_device *adev)
  le32_to_cpu(mec_hdr->header.ucode_array_offset_bytes));
 fw_size = le32_to_cpu(mec_hdr->header.ucode_size_bytes);

-   r = amdgpu_bo_create_reserved(adev, mec_hdr->header.ucode_size_bytes,
+   r = amdgpu_bo_create_reserved(adev, fw_size,
   PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT,
   &adev->gfx.mec.mec_fw_obj,
   &adev->gfx.mec.mec_fw_gpu_addr,
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
index 6274cae4a065..00610def69ea 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
@@ -174,7 +174,7 @@ int smu_v11_0_load_microcode(struct smu_context *smu)
 hdr = (const struct smc_firmware_header_v1_0 *) adev->pm.fw->data;
 src = (const uint32_t *)(adev->pm.fw->data +
 le32_to_cpu(hdr->header.ucode_array_offset_bytes));
-   smc_fw_size = hdr->header.ucode_size_bytes;
+   smc_fw_size = le32_to_cpu(hdr->header.ucode_size_bytes);

 for (i = 1; i < smc_fw_size/4 - 1; i++) {
 WREG32_PCIE(addr_start, src[i]);
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
index 0864083e7435..d7ea8215d7dc 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu

Re: [PATCH] drm/amdgpu: Add graphics cache rinse packet for sdma

2021-04-13 Thread Deucher, Alexander
[AMD Official Use Only - Internal Distribution Only]

Shouldn't we do something similar for sdma 5.0 as well?

Alex


From: Su, Jinzhou (Joe) 
Sent: Tuesday, April 13, 2021 2:23 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Huang, Ray ; Deucher, Alexander 
; Koenig, Christian ; Su, 
Jinzhou (Joe) 
Subject: [PATCH] drm/amdgpu: Add graphics cache rinse packet for sdma

  Add emit mem sync callback for sdma_v5_2

Signed-off-by: Jinzhou Su 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 28 ++
 1 file changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
index 93f826a7d3f0..b1ad9e52b234 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
@@ -369,6 +369,33 @@ static void sdma_v5_2_ring_emit_ib(struct amdgpu_ring 
*ring,
 amdgpu_ring_write(ring, upper_32_bits(csa_mc_addr));
 }

+/**
+ * sdma_v5_2_ring_emit_mem_sync - flush the IB by graphics cache rinse
+ *
+ * @ring: amdgpu ring pointer
+ *
+ * flush the IB by graphics cache rinse.
+ */
+static void sdma_v5_2_ring_emit_mem_sync(struct amdgpu_ring *ring)
+{
+uint32_t gcr_cntl =
+   SDMA_GCR_GL2_INV | SDMA_GCR_GL2_WB | SDMA_GCR_GLM_INV |
+   SDMA_GCR_GL1_INV | SDMA_GCR_GLV_INV | SDMA_GCR_GLK_INV |
+   SDMA_GCR_GLI_INV(1);
+
+   /* flush entire cache L0/L1/L2, this can be optimized by performance 
requirement */
+   amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_GCR_REQ));
+   amdgpu_ring_write(ring, SDMA_PKT_GCR_REQ_PAYLOAD1_BASE_VA_31_7(0));
+   amdgpu_ring_write(ring, 
SDMA_PKT_GCR_REQ_PAYLOAD2_GCR_CONTROL_15_0(gcr_cntl) |
+   SDMA_PKT_GCR_REQ_PAYLOAD2_BASE_VA_47_32(0));
+   amdgpu_ring_write(ring, SDMA_PKT_GCR_REQ_PAYLOAD3_LIMIT_VA_31_7(0) |
+   SDMA_PKT_GCR_REQ_PAYLOAD3_GCR_CONTROL_18_16(gcr_cntl >> 
16));
+   amdgpu_ring_write(ring, SDMA_PKT_GCR_REQ_PAYLOAD4_LIMIT_VA_47_32(0) |
+   SDMA_PKT_GCR_REQ_PAYLOAD4_VMID(0));
+}
+
 /**
  * sdma_v5_2_ring_emit_hdp_flush - emit an hdp flush on the DMA ring
  *
@@ -1663,6 +1690,7 @@ static const struct amdgpu_ring_funcs 
sdma_v5_2_ring_funcs = {
 10 + 10 + 10, /* sdma_v5_2_ring_emit_fence x3 for user fence, 
vm fence */
 .emit_ib_size = 7 + 6, /* sdma_v5_2_ring_emit_ib */
 .emit_ib = sdma_v5_2_ring_emit_ib,
+   .emit_mem_sync = sdma_v5_2_ring_emit_mem_sync,
 .emit_fence = sdma_v5_2_ring_emit_fence,
 .emit_pipeline_sync = sdma_v5_2_ring_emit_pipeline_sync,
 .emit_vm_flush = sdma_v5_2_ring_emit_vm_flush,
--
2.27.0



Re: [PATCH] drm/amdgpu: Use iterator methods exposed by amdgpu_res_cursor.h in building SG_TABLE's for a VRAM BO

2021-04-13 Thread Alex Deucher
On Mon, Apr 12, 2021 at 7:28 PM Ramesh Errabolu  wrote:
>
> Extend current implementation of SG_TABLE construction method to
> allow exportation of sub-buffers of a VRAM BO. This capability will
> enable logical partitioning of a VRAM BO into multiple non-overlapping
> sub-buffers. One example of this use case is to partition a VRAM BO
> into two sub-buffers, one for SRC and another for DST.
>
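The cursor-based counting loop in the patch ("walk the range, one sg entry per touched node") can be sketched standalone. The cursor below is a simplified toy stand-in for amdgpu_res_cursor with fixed-size nodes, not the real amdgpu_res_first()/amdgpu_res_next() implementation:

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for amdgpu_res_cursor: walks an (offset, length) byte
 * range across fixed-size allocation nodes. */
struct res_cursor {
	uint64_t remaining;   /* bytes left in the requested range */
	uint64_t size;        /* bytes available in the current node */
	uint64_t node_size;   /* fixed node size for this toy model */
};

static void res_first(struct res_cursor *cur, uint64_t offset,
		      uint64_t length, uint64_t node_size)
{
	cur->remaining = length;
	cur->node_size = node_size;
	/* first chunk runs from offset to the end of its node */
	cur->size = node_size - (offset % node_size);
	if (cur->size > length)
		cur->size = length;
}

static void res_next(struct res_cursor *cur)
{
	cur->remaining -= cur->size;
	cur->size = cur->node_size;
	if (cur->size > cur->remaining)
		cur->size = cur->remaining;
}

/* Count how many sg entries an (offset, length) sub-buffer needs -- one
 * per touched node, as in the patch's num_entries loop. */
static int count_entries(uint64_t offset, uint64_t length, uint64_t node_size)
{
	struct res_cursor cur;
	int num_entries = 0;

	res_first(&cur, offset, length, node_size);
	while (cur.remaining) {
		num_entries++;
		res_next(&cur);
	}
	return num_entries;
}
```

A range that starts mid-node needs one extra entry, which is exactly why the patch counts by walking the cursor rather than dividing length by page size.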
> Signed-off-by: Ramesh Errabolu 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c  |  6 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h  |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 32 ++--
>  3 files changed, 27 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> index e0c4f7c7f1b9..57534b93bd91 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> @@ -255,6 +255,7 @@ static struct sg_table *amdgpu_dma_buf_map(struct 
> dma_buf_attachment *attach,
> struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
> struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> struct sg_table *sgt;
> +   u64 num_bytes;
> long r;
>
> if (!bo->tbo.pin_count) {
> @@ -291,8 +292,9 @@ static struct sg_table *amdgpu_dma_buf_map(struct 
> dma_buf_attachment *attach,
> break;
>
> case TTM_PL_VRAM:
> -   r = amdgpu_vram_mgr_alloc_sgt(adev, &bo->tbo.mem, attach->dev,
> - dir, &sgt);
> +   num_bytes = bo->tbo.mem.num_pages << PAGE_SHIFT;
> +   r = amdgpu_vram_mgr_alloc_sgt(adev, &bo->tbo.mem, 0, 
> num_bytes,
> + attach->dev, dir, &sgt);
> if (r)
> return ERR_PTR(r);
> break;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> index dec0db8b0b13..9e38475e0f8d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> @@ -112,6 +112,7 @@ int amdgpu_gtt_mgr_recover(struct ttm_resource_manager 
> *man);
>  u64 amdgpu_vram_mgr_bo_visible_size(struct amdgpu_bo *bo);
>  int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
>   struct ttm_resource *mem,
> + u64 offset, u64 size,
>   struct device *dev,
>   enum dma_data_direction dir,
>   struct sg_table **sgt);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> index 592a2dd16493..fcdee0deba16 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include "amdgpu.h"
>  #include "amdgpu_vm.h"
> +#include "amdgpu_res_cursor.h"
>  #include "amdgpu_atomfirmware.h"
>  #include "atom.h"
>
> @@ -565,6 +566,8 @@ static void amdgpu_vram_mgr_del(struct 
> ttm_resource_manager *man,
>   *
>   * @adev: amdgpu device pointer
>   * @mem: TTM memory object
> + * @offset: byte offset from the base of VRAM BO
> + * @length: number of bytes to export in sg_table
>   * @dev: the other device
>   * @dir: dma direction
>   * @sgt: resulting sg table
> @@ -573,39 +576,45 @@ static void amdgpu_vram_mgr_del(struct 
> ttm_resource_manager *man,
>   */
>  int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
>   struct ttm_resource *mem,
> + u64 offset, u64 length,
>   struct device *dev,
>   enum dma_data_direction dir,
>   struct sg_table **sgt)
>  {
> -   struct drm_mm_node *node;
> +   struct amdgpu_res_cursor cursor;
> struct scatterlist *sg;
> int num_entries = 0;
> -   unsigned int pages;
> int i, r;
>
> *sgt = kmalloc(sizeof(**sgt), GFP_KERNEL);
> if (!*sgt)
> return -ENOMEM;
>
> -   for (pages = mem->num_pages, node = mem->mm_node;
> -pages; pages -= node->size, ++node)
> -   ++num_entries;
> +   // Determine the number of DRM_MM nodes to export
> +   amdgpu_res_first(mem, offset, length, &cursor);
> +   while (cursor.remaining) {
> +   num_entries++;
> +   amdgpu_res_next(&cursor, cursor.size);
> +   }
>
> r = sg_alloc_table(*sgt, num_entries, GFP_KERNEL);
> if (r)
> goto error_free;
>
> +   // Initialize scatterlist nodes of sg_table
> for_each_sgtable_sg((*sgt), sg, i)
> sg->length = 0;
>
> -   node = mem->mm_node;
> +   // Walk down DRM_MM nodes to populate scatterlist nodes
> +   // @note: Use iterator api to get first the DRM_MM node
> +   // and th

Re: [PATCH] drm/amd/amdgpu: Expose some power info through AMDGPU_INFO

2021-04-13 Thread Christian König




Am 13.04.21 um 15:09 schrieb Sun, Roy:

[AMD Official Use Only - Internal Distribution Only]

ping

-Original Message-
From: Roy Sun 
Sent: Monday, April 12, 2021 8:15 PM
To: amd-gfx@lists.freedesktop.org
Cc: Sun, Roy 
Subject: [PATCH] drm/amd/amdgpu: Expose some power info through AMDGPU_INFO

Add interface to get the mm clock, temperature and memory load

Signed-off-by: Roy Sun 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 50 +
  include/uapi/drm/amdgpu_drm.h   | 12 ++
  2 files changed, 62 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index b2e774aeab45..e5b16e0819ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -971,6 +971,56 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
}
ui32 /= 100;
break;
+   case AMDGPU_INFO_SENSOR_UVD_VCLK:
+   /* get mm vclk in Mhz */


Maybe drop the comment, it looks redundant.


+   if (amdgpu_dpm_read_sensor(adev,
+  AMDGPU_PP_SENSOR_UVD_VCLK,


In general I would not repeat that code over and over again, but rather 
translate the parameter and then call amdgpu_dpm_read_sensor() once.


But this is just my top level view on code I don't know well.

Regards,
Christian.
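
A sketch of that suggestion -- translate the subquery id once, then make a single amdgpu_dpm_read_sensor() call -- might look like the following. The enum values here are placeholders standing in for the real AMDGPU_INFO_SENSOR_* and AMDGPU_PP_SENSOR_* definitions, not the actual amdgpu headers:

```c
#include <assert.h>

/* Placeholder enums standing in for the AMDGPU_INFO_SENSOR_* (uapi) and
 * AMDGPU_PP_SENSOR_* (powerplay) values in the real headers. */
enum info_sensor {
	INFO_UVD_VCLK,
	INFO_UVD_DCLK,
	INFO_HOTSPOT_TEMP,
	INFO_MEM_LOAD,
	INFO_INVALID,
};

enum pp_sensor {
	PP_UVD_VCLK,
	PP_UVD_DCLK,
	PP_HOTSPOT_TEMP,
	PP_MEM_LOAD,
	PP_INVALID,
};

/* Map the uapi subquery id to the powerplay sensor id in one place, so
 * the ioctl handler can do the translation and then call the read
 * function once instead of repeating the same block per case. */
static enum pp_sensor translate_sensor(enum info_sensor type)
{
	switch (type) {
	case INFO_UVD_VCLK:     return PP_UVD_VCLK;
	case INFO_UVD_DCLK:     return PP_UVD_DCLK;
	case INFO_HOTSPOT_TEMP: return PP_HOTSPOT_TEMP;
	case INFO_MEM_LOAD:     return PP_MEM_LOAD;
	default:                return PP_INVALID;
	}
}
```

The handler would then fail early on PP_INVALID and otherwise perform a single read (plus any per-sensor post-processing such as the /100 for clocks).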


+  (void *)&ui32, &ui32_size)) {
+   return -EINVAL;
+   }
+   ui32 /= 100;
+   break;
+   case AMDGPU_INFO_SENSOR_UVD_DCLK:
+   /* get mm dclk in Mhz */
+   if (amdgpu_dpm_read_sensor(adev,
+  AMDGPU_PP_SENSOR_UVD_DCLK,
+  (void *)&ui32, &ui32_size)) {
+   return -EINVAL;
+   }
+   ui32 /= 100;
+   break;
+   case AMDGPU_INFO_SENSOR_HOTSPOT_TEMP:
+   /* get junction temperature */
+   if (amdgpu_dpm_read_sensor(adev,
+  
AMDGPU_PP_SENSOR_HOTSPOT_TEMP,
+  (void *)&ui32, &ui32_size)) {
+   return -EINVAL;
+   }
+   break;
+   case AMDGPU_INFO_SENSOR_EDGE_TEMP:
+   /* get current edge temperature */
+   if (amdgpu_dpm_read_sensor(adev,
+  AMDGPU_PP_SENSOR_EDGE_TEMP,
+  (void *)&ui32, &ui32_size)) {
+   return -EINVAL;
+   }
+   break;
+   case AMDGPU_INFO_SENSOR_MEM_TEMP:
+   /* get current memory temperature */
+   if (amdgpu_dpm_read_sensor(adev,
+  AMDGPU_PP_SENSOR_MEM_TEMP,
+  (void *)&ui32, &ui32_size)) {
+   return -EINVAL;
+   }
+   break;
+   case AMDGPU_INFO_SENSOR_MEM_LOAD:
+   /* get memory load */
+   if (amdgpu_dpm_read_sensor(adev,
+  AMDGPU_PP_SENSOR_MEM_LOAD,
+  (void *)&ui32, &ui32_size)) {
+   return -EINVAL;
+   }
+   break;
default:
DRM_DEBUG_KMS("Invalid request %d\n",
  info->sensor_info.type);
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h 
index 8b832f7458f2..484c72e17c72 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -780,6 +780,18 @@ struct drm_amdgpu_cs_chunk_data {
#define AMDGPU_INFO_SENSOR_STABLE_PSTATE_GFX_SCLK   0x8
/* Subquery id: Query GPU stable pstate memory clock */
#define AMDGPU_INFO_SENSOR_STABLE_PSTATE_GFX_MCLK   0x9
+   /* Subquery id: Query GPU mm vclk */
+   #define AMDGPU_INFO_SENSOR_UVD_VCLK 0xa
+   /* Subquery id: Query GPU mm dclk */
+   #define AMDGPU_INFO_SENSOR_UVD_DCLK 0xb
+   /* Subquery id: Query junction temperature */
+   #define AMDGPU_INFO_SENSOR_HOTSPOT_TEMP 0xc
+   /* Subquery id: Query edge temperature */
+   #define AMDGPU_INFO_SENSOR_EDGE_TEMP0xd
+   /* Subquery id: Query memory temperature */
+ 

RE: [PATCH] drm/amd/amdgpu: Expose some power info through AMDGPU_INFO

2021-04-13 Thread Sun, Roy
[AMD Official Use Only - Internal Distribution Only]

ping

-Original Message-
From: Roy Sun  
Sent: Monday, April 12, 2021 8:15 PM
To: amd-gfx@lists.freedesktop.org
Cc: Sun, Roy 
Subject: [PATCH] drm/amd/amdgpu: Expose some power info through AMDGPU_INFO

Add interface to get the mm clock, temperature and memory load

Signed-off-by: Roy Sun 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 50 +
 include/uapi/drm/amdgpu_drm.h   | 12 ++
 2 files changed, 62 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index b2e774aeab45..e5b16e0819ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -971,6 +971,56 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
}
ui32 /= 100;
break;
+   case AMDGPU_INFO_SENSOR_UVD_VCLK:
+   /* get mm vclk in Mhz */
+   if (amdgpu_dpm_read_sensor(adev,
+  AMDGPU_PP_SENSOR_UVD_VCLK,
+  (void *)&ui32, &ui32_size)) {
+   return -EINVAL;
+   }
+   ui32 /= 100;
+   break;
+   case AMDGPU_INFO_SENSOR_UVD_DCLK:
+   /* get mm dclk in Mhz */
+   if (amdgpu_dpm_read_sensor(adev,
+  AMDGPU_PP_SENSOR_UVD_DCLK,
+  (void *)&ui32, &ui32_size)) {
+   return -EINVAL;
+   }
+   ui32 /= 100;
+   break;
+   case AMDGPU_INFO_SENSOR_HOTSPOT_TEMP:
+   /* get junction temperature */
+   if (amdgpu_dpm_read_sensor(adev,
+  
AMDGPU_PP_SENSOR_HOTSPOT_TEMP,
+  (void *)&ui32, &ui32_size)) {
+   return -EINVAL;
+   }
+   break;
+   case AMDGPU_INFO_SENSOR_EDGE_TEMP:
+   /* get current edge temperature */
+   if (amdgpu_dpm_read_sensor(adev,
+  AMDGPU_PP_SENSOR_EDGE_TEMP,
+  (void *)&ui32, &ui32_size)) {
+   return -EINVAL;
+   }
+   break;
+   case AMDGPU_INFO_SENSOR_MEM_TEMP:
+   /* get current memory temperature */
+   if (amdgpu_dpm_read_sensor(adev,
+  AMDGPU_PP_SENSOR_MEM_TEMP,
+  (void *)&ui32, &ui32_size)) {
+   return -EINVAL;
+   }
+   break;
+   case AMDGPU_INFO_SENSOR_MEM_LOAD:
+   /* get memory load */
+   if (amdgpu_dpm_read_sensor(adev,
+  AMDGPU_PP_SENSOR_MEM_LOAD,
+  (void *)&ui32, &ui32_size)) {
+   return -EINVAL;
+   }
+   break;
default:
DRM_DEBUG_KMS("Invalid request %d\n",
  info->sensor_info.type);
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h 
index 8b832f7458f2..484c72e17c72 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -780,6 +780,18 @@ struct drm_amdgpu_cs_chunk_data {
#define AMDGPU_INFO_SENSOR_STABLE_PSTATE_GFX_SCLK   0x8
/* Subquery id: Query GPU stable pstate memory clock */
#define AMDGPU_INFO_SENSOR_STABLE_PSTATE_GFX_MCLK   0x9
+   /* Subquery id: Query GPU mm vclk */
+   #define AMDGPU_INFO_SENSOR_UVD_VCLK 0xa
+   /* Subquery id: Query GPU mm dclk */
+   #define AMDGPU_INFO_SENSOR_UVD_DCLK 0xb
+   /* Subquery id: Query junction temperature */
+   #define AMDGPU_INFO_SENSOR_HOTSPOT_TEMP 0xc
+   /* Subquery id: Query edge temperature */
+   #define AMDGPU_INFO_SENSOR_EDGE_TEMP0xd
+   /* Subquery id: Query memory temperature */
+   #define AMDGPU_INFO_SENSOR_MEM_TEMP 0xe
+   /* Subquery id: Query Memory load */
+   #define AMDGPU_INFO_SENSOR_MEM_LOAD 0xf
 /* Number of VRAM page faults on CPU access. */
 #define AMDGPU_INFO_NUM_VRAM_CPU_PAGE_FAULTS   0x1E
 #define AMDGPU_INFO_VRAM_LOST_COUNTER  0x1F
--
2.31.1

RE: [PATCH 4/4] drm/amdgpu: Fix kernel-doc for the RAS sysfs interface

2021-04-13 Thread Zhang, Hawking
[AMD Public Use]

Series is

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: Tuikov, Luben  
Sent: Tuesday, April 13, 2021 20:56
To: amd-gfx@lists.freedesktop.org
Cc: Tuikov, Luben ; Deucher, Alexander 
; Zhang, Hawking 
Subject: [PATCH 4/4] drm/amdgpu: Fix kernel-doc for the RAS sysfs interface

Improve the kernel-doc for the RAS sysfs interface. Fix the grammar, fix the 
context.

Cc: Alexander Deucher 
Cc: Hawking Zhang 
Signed-off-by: Luben Tuikov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 47 +
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 30cda4b8a461..44dfb3613e37 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -274,7 +274,7 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file 
*f,
 /**
  * DOC: AMDGPU RAS debugfs control interface
  *
- * It accepts struct ras_debug_if who has two members.
+ * The control interface accepts struct ras_debug_if which has two members.
  *
  * First member: ras_debug_if::head or ras_debug_if::inject.
  *
@@ -299,32 +299,33 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file 
*f,
  *
  * How to use the interface?
  *
- * Programs
+ * In a program
  *
- * Copy the struct ras_debug_if in your codes and initialize it.
- * Write the struct to the control node.
+ * Copy the struct ras_debug_if in your code and initialize it.
+ * Write the struct to the control interface.
  *
- * Shells
+ * From shell
  *
  * .. code-block:: bash
  *
- * echo op block [error [sub_block address value]] > .../ras/ras_ctrl
+ * echo "disable <block>" > /sys/kernel/debug/dri/<N>/ras/ras_ctrl
+ * echo "enable <block> <error>" > /sys/kernel/debug/dri/<N>/ras/ras_ctrl
+ * echo "inject <block> <error> <sub-block> <address> <value>" > /sys/kernel/debug/dri/<N>/ras/ras_ctrl
  *
- * Parameters:
+ * Where N, is the card which you want to affect.
  *
- * op: disable, enable, inject
- * disable: only block is needed
- * enable: block and error are needed
- * inject: error, address, value are needed
- * block: umc, sdma, gfx, .
+ * "disable" requires only the block.
+ * "enable" requires the block and error type.
+ * "inject" requires the block, error type, address, and value.
+ * The block is one of: umc, sdma, gfx, etc.
  * see ras_block_string[] for details
- * error: ue, ce
- * ue: multi_uncorrectable
- * ce: single_correctable
- * sub_block:
- * sub block index, pass 0 if there is no sub block
+ * The error type is one of: ue, ce, where,
+ * ue is multi-uncorrectable
+ * ce is single-correctable
+ * The sub-block is the sub-block index, pass 0 if there is no sub-block.
+ * The address and value are hexadecimal numbers, leading 0x is optional.
  *
- * here are some examples for bash commands:
+ * For instance,
  *
  * .. code-block:: bash
  *
@@ -332,17 +333,17 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file 
*f,
  * echo inject umc ce 0 0 0 > /sys/kernel/debug/dri/0/ras/ras_ctrl
  * echo disable umc > /sys/kernel/debug/dri/0/ras/ras_ctrl
  *
- * How to check the result?
+ * How to check the result of the operation?
  *
- * For disable/enable, please check ras features at
+ * To check disable/enable, see "ras" features at,
  * /sys/class/drm/card[0/1/2...]/device/ras/features
  *
- * For inject, please check corresponding err count at
- * /sys/class/drm/card[0/1/2...]/device/ras/[gfx/sdma/...]_err_count
+ * To check inject, see the corresponding error count at,
+ * /sys/class/drm/card[0/1/2...]/device/ras/[gfx|sdma|umc|...]_err_count
  *
  * .. note::
  * Operations are only allowed on blocks which are supported.
- * Please check ras mask at /sys/module/amdgpu/parameters/ras_mask
+ * Check the "ras" mask at /sys/module/amdgpu/parameters/ras_mask
  * to see which blocks support RAS on a particular asic.
  *
  */
--
2.31.0.97.g1424303384


[PATCH 4/4] drm/amdgpu: Fix kernel-doc for the RAS sysfs interface

2021-04-13 Thread Luben Tuikov
Improve the kernel-doc for the RAS sysfs
interface. Fix the grammar, fix the context.

Cc: Alexander Deucher 
Cc: Hawking Zhang 
Signed-off-by: Luben Tuikov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 47 +
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 30cda4b8a461..44dfb3613e37 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -274,7 +274,7 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file 
*f,
 /**
  * DOC: AMDGPU RAS debugfs control interface
  *
- * It accepts struct ras_debug_if who has two members.
+ * The control interface accepts struct ras_debug_if which has two members.
  *
  * First member: ras_debug_if::head or ras_debug_if::inject.
  *
@@ -299,32 +299,33 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file 
*f,
  *
  * How to use the interface?
  *
- * Programs
+ * In a program
  *
- * Copy the struct ras_debug_if in your codes and initialize it.
- * Write the struct to the control node.
+ * Copy the struct ras_debug_if in your code and initialize it.
+ * Write the struct to the control interface.
  *
- * Shells
+ * From shell
  *
  * .. code-block:: bash
  *
- * echo op block [error [sub_block address value]] > .../ras/ras_ctrl
+ * echo "disable <block>" > /sys/kernel/debug/dri/<N>/ras/ras_ctrl
+ * echo "enable <block> <error>" > /sys/kernel/debug/dri/<N>/ras/ras_ctrl
+ * echo "inject <block> <error> <sub-block> <address> <value>" > /sys/kernel/debug/dri/<N>/ras/ras_ctrl
  *
- * Parameters:
+ * Where N, is the card which you want to affect.
  *
- * op: disable, enable, inject
- * disable: only block is needed
- * enable: block and error are needed
- * inject: error, address, value are needed
- * block: umc, sdma, gfx, .
+ * "disable" requires only the block.
+ * "enable" requires the block and error type.
+ * "inject" requires the block, error type, address, and value.
+ * The block is one of: umc, sdma, gfx, etc.
  * see ras_block_string[] for details
- * error: ue, ce
- * ue: multi_uncorrectable
- * ce: single_correctable
- * sub_block:
- * sub block index, pass 0 if there is no sub block
+ * The error type is one of: ue, ce, where,
+ * ue is multi-uncorrectable
+ * ce is single-correctable
+ * The sub-block is the sub-block index, pass 0 if there is no sub-block.
+ * The address and value are hexadecimal numbers, leading 0x is optional.
  *
- * here are some examples for bash commands:
+ * For instance,
  *
  * .. code-block:: bash
  *
@@ -332,17 +333,17 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file 
*f,
  * echo inject umc ce 0 0 0 > /sys/kernel/debug/dri/0/ras/ras_ctrl
  * echo disable umc > /sys/kernel/debug/dri/0/ras/ras_ctrl
  *
- * How to check the result?
+ * How to check the result of the operation?
  *
- * For disable/enable, please check ras features at
+ * To check disable/enable, see "ras" features at,
  * /sys/class/drm/card[0/1/2...]/device/ras/features
  *
- * For inject, please check corresponding err count at
- * /sys/class/drm/card[0/1/2...]/device/ras/[gfx/sdma/...]_err_count
+ * To check inject, see the corresponding error count at,
+ * /sys/class/drm/card[0/1/2...]/device/ras/[gfx|sdma|umc|...]_err_count
  *
  * .. note::
  * Operations are only allowed on blocks which are supported.
- * Please check ras mask at /sys/module/amdgpu/parameters/ras_mask
+ * Check the "ras" mask at /sys/module/amdgpu/parameters/ras_mask
  * to see which blocks support RAS on a particular asic.
  *
  */
-- 
2.31.0.97.g1424303384

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 3/4] drm/amdgpu: Add bad_page_cnt_threshold to debugfs

2021-04-13 Thread Luben Tuikov
Add bad_page_cnt_threshold to debugfs, an optional
file system used for debugging, for reporting
purposes only--it usually matches the size of
EEPROM but may be different depending on the
"bad_page_threshold" kernel module option.

The "bad_page_cnt_threshold" is a dynamically
computed value. It depends on three things: the
VRAM size; the size of the EEPROM (or the size
allocated to the RAS table therein); and the
"bad_page_threshold" module parameter. It is
computed when the amdgpu module is run, and
further parameters and logic depend on it, so it
is helpful to be able to see the computed value
in debugfs.

Cc: Alexander Deucher 
Cc: Hawking Zhang 
Signed-off-by: Luben Tuikov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 3ca6b51f0c9c..30cda4b8a461 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1263,6 +1263,8 @@ static struct dentry 
*amdgpu_ras_debugfs_create_ctrl_node(struct amdgpu_device *
&amdgpu_ras_debugfs_ctrl_ops);
debugfs_create_file("ras_eeprom_reset", S_IWUGO | S_IRUGO, dir, adev,
&amdgpu_ras_debugfs_eeprom_ops);
+   debugfs_create_u32("bad_page_cnt_threshold", 0444, dir,
+  &con->bad_page_cnt_threshold);
 
/*
 * After one uncorrectable error happens, usually GPU recovery will
-- 
2.31.0.97.g1424303384

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/4] drm/amdgpu: Fix a bug in checking the result of reserve page

2021-04-13 Thread Luben Tuikov
Fix if (ret) --> if (!ret), a bug in the
"retire_page" handler, which caused the kernel to
call the method again with *pos == end of file,
and that bounced back with an error. On the first
run, we advanced *pos but returned 0 back to the
fs layer, also a bug.

Fix the logic of the check of the result of
amdgpu_reserve_page_direct()--it is 0 on success,
and non-zero on error, not the other way around.

Cc: Alexander Deucher 
Cc: John Clements 
Cc: Hawking Zhang 
Signed-off-by: Luben Tuikov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 9041453465f1..3ca6b51f0c9c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -114,7 +114,7 @@ static int amdgpu_reserve_page_direct(struct amdgpu_device 
*adev, uint64_t addre
 
if (amdgpu_ras_check_bad_page(adev, address)) {
dev_warn(adev->dev,
-"RAS WARN: 0x%llx has been marked as bad page!\n",
+"RAS WARN: 0x%llx has already been marked as bad 
page!\n",
 address);
return 0;
}
@@ -228,7 +228,6 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file 
*f,
return -EINVAL;
 
if (op != -1) {
-
if (op == 3) {
if (sscanf(str, "%*s %llx", &address) != 1)
return -EINVAL;
@@ -364,11 +363,9 @@ static ssize_t amdgpu_ras_debugfs_ctrl_write(struct file 
*f, const char __user *
if (ret)
return -EINVAL;
 
-   if (data.op == 3)
-   {
+   if (data.op == 3) {
ret = amdgpu_reserve_page_direct(adev, data.inject.address);
-
-   if (ret)
+   if (!ret)
return size;
else
return ret;
-- 
2.31.0.97.g1424303384

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 1/4] drm/amdgpu: Fix a bug for input with double sscanf

2021-04-13 Thread Luben Tuikov
Remove double-sscanf to scan for %llu and 0x%llx,
as that is not going to work!

The %llu will consume the "0" in "0x" of your
input, and the hex value you think you're entering
will always be 0. That is, a valid hex value can
never be consumed.

On the other hand, just entering a hex number
without leading 0x will either be scanned as a
string and not match, for instance FAB123, or
the leading decimal portion is scanned as the
%llu, for instance 123FAB will be scanned as 123,
which is not correct.

Thus remove the first %llu scan and leave only the
%llx scan, removing the leading 0x since %llx can
scan either.

Addresses are usually always hex values, so this
suffices.

Cc: Alexander Deucher 
Cc: Xinhui Pan 
Cc: Hawking Zhang 
Signed-off-by: Luben Tuikov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 0541196ae1ed..9041453465f1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -230,9 +230,8 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file 
*f,
if (op != -1) {
 
if (op == 3) {
-   if (sscanf(str, "%*s %llu", &address) != 1)
-   if (sscanf(str, "%*s 0x%llx", &address) != 1)
-   return -EINVAL;
+   if (sscanf(str, "%*s %llx", &address) != 1)
+   return -EINVAL;
 
data->op = op;
data->inject.address = address;
@@ -255,11 +254,9 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file 
*f,
data->op = op;
 
if (op == 2) {
-   if (sscanf(str, "%*s %*s %*s %u %llu %llu",
-   &sub_block, &address, &value) 
!= 3)
-   if (sscanf(str, "%*s %*s %*s 0x%x 0x%llx 
0x%llx",
-   &sub_block, &address, 
&value) != 3)
-   return -EINVAL;
+   if (sscanf(str, "%*s %*s %*s %x %llx %llx",
+  &sub_block, &address, &value) != 3)
+   return -EINVAL;
data->head.sub_block_index = sub_block;
data->inject.address = address;
data->inject.value = value;
-- 
2.31.0.97.g1424303384

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/2] drm/amdgpu: Add show_fdinfo() interface

2021-04-13 Thread Christian König

Am 13.04.21 um 14:14 schrieb Roy Sun:

Tracking devices, process info and fence info using
/proc/pid/fdinfo

Signed-off-by: David M Nieto 
Signed-off-by: Roy Sun 
---
  drivers/gpu/drm/amd/amdgpu/Makefile|   2 +
  drivers/gpu/drm/amd/amdgpu/amdgpu.h|   1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c|  59 
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h|   3 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|   5 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 149 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h |  43 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c|  21 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.h|   3 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c|   1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  35 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |   2 +
  12 files changed, 321 insertions(+), 3 deletions(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index ee85e8aba636..d216b7ecb5d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -58,6 +58,8 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
amdgpu_fw_attestation.o amdgpu_securedisplay.o
  
+amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o

+
  amdgpu-$(CONFIG_PERF_EVENTS) += amdgpu_pmu.o
  
  # add asic specific block

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 616c85a01299..c2338a0dd1f0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -107,6 +107,7 @@
  #include "amdgpu_gfxhub.h"
  #include "amdgpu_df.h"
  #include "amdgpu_smuio.h"
+#include "amdgpu_fdinfo.h"
  
  #define MAX_GPU_INSTANCE		16
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c

index 0350205c4897..8d33571754d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -651,3 +651,62 @@ void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr)
idr_destroy(&mgr->ctx_handles);
mutex_destroy(&mgr->lock);
  }
+
+ktime_t amdgpu_ctx_fence_usage(struct amdgpu_fpriv *fpriv, uint32_t hwip,
+   uint32_t idx, uint64_t *elapsed)


That starts to look better, but please change this function so that it 
gets an amdgpu_ctx_mgr structure as its first parameter and rename it to 
amdgpu_ctx_mgr_fence_usage.



+{
+   struct amdgpu_ctx_entity *centity;
+   struct idr *idp;
+   struct amdgpu_ctx *ctx;
+   uint32_t id, i;
+   ktime_t now, t1;
+   ktime_t total = 0, max = 0;
+
+
+   if (idx >= AMDGPU_MAX_ENTITY_NUM)
+   return 0;
+
+   idp = &fpriv->ctx_mgr.ctx_handles;
+
+   mutex_lock(&fpriv->ctx_mgr.lock);
+   idr_for_each_entry(idp, ctx, id) {


Maybe move everything from here till the end of the loop into a separate 
function.



+   if (!ctx->entities[hwip][idx])
+   continue;
+
+   centity = ctx->entities[hwip][idx];
+   now = ktime_get();
+   for (i = 0; i < amdgpu_sched_jobs; i++) {
+   struct dma_fence *fence;
+   struct drm_sched_fence *s_fence;
+
+   spin_lock(&ctx->ring_lock);
+   fence = dma_fence_get(centity->fences[i]);
+   spin_unlock(&ctx->ring_lock);
+   if (!fence)
+   continue;
+   s_fence = to_drm_sched_fence(fence);
+   if (!dma_fence_is_signaled(&s_fence->scheduled))
+   continue;
+   t1 = s_fence->scheduled.timestamp;
+   if (t1 >= now)
+   continue;
+   if (dma_fence_is_signaled(&s_fence->finished) &&
+   s_fence->finished.timestamp < 
now)


That is indented too far to the right; the "s_fence..." should be below 
the "dma_fence..." in the line above.



+   total += ktime_sub(s_fence->finished.timestamp, 
t1);
+   else
+   total += ktime_sub(now, t1);
+   t1 = ktime_sub(now, t1);
+   dma_fence_put(fence);
+   if (t1 > max)
+   max = t1;


Please write this as "max = max(max, t1);".


+   }
+
+   }
+
+   mutex_unlock(&fpriv->ctx_mgr.lock);
+
+   if (elapsed)
+   *elapsed = max;
+
+   return total;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
index f54e10314661..8ee42f12e5f8 100644
--- a/drivers/gpu/drm/amd/amdgpu

[PATCH 2/2] drm/amdgpu: Add show_fdinfo() interface

2021-04-13 Thread Roy Sun
Tracking devices, process info and fence info using
/proc/pid/fdinfo

Signed-off-by: David M Nieto 
Signed-off-by: Roy Sun 
---
 drivers/gpu/drm/amd/amdgpu/Makefile|   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c|  59 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h|   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 149 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h |  43 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c|  21 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.h|   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c|   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  35 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |   2 +
 12 files changed, 321 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index ee85e8aba636..d216b7ecb5d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -58,6 +58,8 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
amdgpu_fw_attestation.o amdgpu_securedisplay.o
 
+amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
+
 amdgpu-$(CONFIG_PERF_EVENTS) += amdgpu_pmu.o
 
 # add asic specific block
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 616c85a01299..c2338a0dd1f0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -107,6 +107,7 @@
 #include "amdgpu_gfxhub.h"
 #include "amdgpu_df.h"
 #include "amdgpu_smuio.h"
+#include "amdgpu_fdinfo.h"
 
 #define MAX_GPU_INSTANCE   16
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 0350205c4897..8d33571754d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -651,3 +651,62 @@ void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr)
idr_destroy(&mgr->ctx_handles);
mutex_destroy(&mgr->lock);
 }
+
+ktime_t amdgpu_ctx_fence_usage(struct amdgpu_fpriv *fpriv, uint32_t hwip,
+   uint32_t idx, uint64_t *elapsed)
+{
+   struct amdgpu_ctx_entity *centity;
+   struct idr *idp;
+   struct amdgpu_ctx *ctx;
+   uint32_t id, i;
+   ktime_t now, t1;
+   ktime_t total = 0, max = 0;
+
+
+   if (idx >= AMDGPU_MAX_ENTITY_NUM)
+   return 0;
+
+   idp = &fpriv->ctx_mgr.ctx_handles;
+
+   mutex_lock(&fpriv->ctx_mgr.lock);
+   idr_for_each_entry(idp, ctx, id) {
+   if (!ctx->entities[hwip][idx])
+   continue;
+
+   centity = ctx->entities[hwip][idx];
+   now = ktime_get();
+   for (i = 0; i < amdgpu_sched_jobs; i++) {
+   struct dma_fence *fence;
+   struct drm_sched_fence *s_fence;
+
+   spin_lock(&ctx->ring_lock);
+   fence = dma_fence_get(centity->fences[i]);
+   spin_unlock(&ctx->ring_lock);
+   if (!fence)
+   continue;
+   s_fence = to_drm_sched_fence(fence);
+   if (!dma_fence_is_signaled(&s_fence->scheduled))
+   continue;
+   t1 = s_fence->scheduled.timestamp;
+   if (t1 >= now)
+   continue;
+   if (dma_fence_is_signaled(&s_fence->finished) &&
+   s_fence->finished.timestamp < 
now)
+   total += ktime_sub(s_fence->finished.timestamp, 
t1);
+   else
+   total += ktime_sub(now, t1);
+   t1 = ktime_sub(now, t1);
+   dma_fence_put(fence);
+   if (t1 > max)
+   max = t1;
+   }
+
+   }
+
+   mutex_unlock(&fpriv->ctx_mgr.lock);
+
+   if (elapsed)
+   *elapsed = max;
+
+   return total;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
index f54e10314661..8ee42f12e5f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
@@ -87,5 +87,6 @@ void amdgpu_ctx_mgr_init(struct amdgpu_ctx_mgr *mgr);
 void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr);
 long amdgpu_ctx_mgr_entity_flush(struct amdgpu_ctx_mgr *mgr, long timeout);
 void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr);
-
+ktime_t amdgpu_ctx_fence_usage(struct amdgpu_fpriv *fpriv, uint32_t hwip,
+   uint32_t idx, uint64_t *elapsed);
 #endif
diff --git a/drivers

[PATCH 1/2] drm/scheduler: Change scheduled fence track

2021-04-13 Thread Roy Sun
Update the timestamp of the scheduled fence on HW
completion of the previous fences.

This allows more accurate tracking of the fence
execution in HW.

Signed-off-by: David M Nieto 
Signed-off-by: Roy Sun 
---
 drivers/gpu/drm/scheduler/sched_main.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 92d8de24d0a1..4e5d8d4af010 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -515,7 +515,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler 
*sched)
 EXPORT_SYMBOL(drm_sched_resubmit_jobs);
 
 /**
- * drm_sched_resubmit_jobs_ext - helper to relunch certain number of jobs from 
mirror ring list
+ * drm_sched_resubmit_jobs_ext - helper to relunch certain number of jobs from 
pending list
  *
  * @sched: scheduler instance
  * @max: job numbers to relaunch
@@ -671,7 +671,7 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
 static struct drm_sched_job *
 drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
 {
-   struct drm_sched_job *job;
+   struct drm_sched_job *job, *next;
 
/*
 * Don't destroy jobs while the timeout worker is running  OR thread
@@ -690,6 +690,13 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
/* remove job from pending_list */
list_del_init(&job->list);
+   /* account for the next fence in the queue */
+   next = list_first_entry_or_null(&sched->pending_list,
+   struct drm_sched_job, list);
+   if (next) {
+   next->s_fence->scheduled.timestamp =
+   job->s_fence->finished.timestamp;
+   }
} else {
job = NULL;
/* queue timeout for next job */
-- 
2.31.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: correction of ucode fw_size calculation errors

2021-04-13 Thread Kevin Wang
correct big and little endian problems on different platforms.

Signed-off-by: Kevin Wang 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 8 
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 85a6a10e048f..0a00a7cc8eaa 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4354,7 +4354,7 @@ static int gfx_v10_0_mec_init(struct amdgpu_device *adev)
 le32_to_cpu(mec_hdr->header.ucode_array_offset_bytes));
fw_size = le32_to_cpu(mec_hdr->header.ucode_size_bytes);
 
-   r = amdgpu_bo_create_reserved(adev, 
mec_hdr->header.ucode_size_bytes,
+   r = amdgpu_bo_create_reserved(adev, fw_size,
  PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT,
  &adev->gfx.mec.mec_fw_obj,
  &adev->gfx.mec.mec_fw_gpu_addr,
@@ -5769,7 +5769,7 @@ static int gfx_v10_0_cp_gfx_load_pfp_microcode(struct 
amdgpu_device *adev)
le32_to_cpu(pfp_hdr->header.ucode_array_offset_bytes));
fw_size = le32_to_cpu(pfp_hdr->header.ucode_size_bytes);
 
-   r = amdgpu_bo_create_reserved(adev, pfp_hdr->header.ucode_size_bytes,
+   r = amdgpu_bo_create_reserved(adev, fw_size,
  PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT,
  &adev->gfx.pfp.pfp_fw_obj,
  &adev->gfx.pfp.pfp_fw_gpu_addr,
@@ -5847,7 +5847,7 @@ static int gfx_v10_0_cp_gfx_load_ce_microcode(struct 
amdgpu_device *adev)
le32_to_cpu(ce_hdr->header.ucode_array_offset_bytes));
fw_size = le32_to_cpu(ce_hdr->header.ucode_size_bytes);
 
-   r = amdgpu_bo_create_reserved(adev, ce_hdr->header.ucode_size_bytes,
+   r = amdgpu_bo_create_reserved(adev, fw_size,
  PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT,
  &adev->gfx.ce.ce_fw_obj,
  &adev->gfx.ce.ce_fw_gpu_addr,
@@ -5924,7 +5924,7 @@ static int gfx_v10_0_cp_gfx_load_me_microcode(struct 
amdgpu_device *adev)
le32_to_cpu(me_hdr->header.ucode_array_offset_bytes));
fw_size = le32_to_cpu(me_hdr->header.ucode_size_bytes);
 
-   r = amdgpu_bo_create_reserved(adev, me_hdr->header.ucode_size_bytes,
+   r = amdgpu_bo_create_reserved(adev, fw_size,
  PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT,
  &adev->gfx.me.me_fw_obj,
  &adev->gfx.me.me_fw_gpu_addr,
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 06811a1f4625..831398e15b84 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -2013,7 +2013,7 @@ static int gfx_v9_0_mec_init(struct amdgpu_device *adev)
 le32_to_cpu(mec_hdr->header.ucode_array_offset_bytes));
fw_size = le32_to_cpu(mec_hdr->header.ucode_size_bytes);
 
-   r = amdgpu_bo_create_reserved(adev, mec_hdr->header.ucode_size_bytes,
+   r = amdgpu_bo_create_reserved(adev, fw_size,
  PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT,
  &adev->gfx.mec.mec_fw_obj,
  &adev->gfx.mec.mec_fw_gpu_addr,
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
index 6274cae4a065..00610def69ea 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
@@ -174,7 +174,7 @@ int smu_v11_0_load_microcode(struct smu_context *smu)
hdr = (const struct smc_firmware_header_v1_0 *) adev->pm.fw->data;
src = (const uint32_t *)(adev->pm.fw->data +
le32_to_cpu(hdr->header.ucode_array_offset_bytes));
-   smc_fw_size = hdr->header.ucode_size_bytes;
+   smc_fw_size = le32_to_cpu(hdr->header.ucode_size_bytes);
 
for (i = 1; i < smc_fw_size/4 - 1; i++) {
WREG32_PCIE(addr_start, src[i]);
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
index 0864083e7435..d7ea8215d7dc 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
@@ -149,7 +149,7 @@ int smu_v13_0_load_microcode(struct smu_context *smu)
hdr = (const struct smc_firmware_header_v1_0 *) adev->pm.fw->data;
src = (const uint32_t *)(adev->pm.fw->data +
 
le32_to_cpu(hdr->header.ucode_array_offset_bytes));
-   smc_fw

[bug report] drm/amdgpu: page retire over debugfs mechanism

2021-04-13 Thread Dan Carpenter
Hello John Clements,

The patch cbb8f989d5a0: "drm/amdgpu: page retire over debugfs
mechanism" from Apr 9, 2021, leads to the following static checker
warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:377 
amdgpu_ras_debugfs_ctrl_write()
info: return a literal instead of 'ret'

drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
   366  ret = amdgpu_ras_debugfs_ctrl_parse_data(f, buf, size, pos, 
&data);
   367  if (ret)
   368  return -EINVAL;
   369  
   370  if (data.op == 3)
   371  {
^
Please use scripts/checkpatch.pl...  :(  Bonus points for replacing the
magic number 3 with a define?

   372  ret = amdgpu_reserve_page_direct(adev, 
data.inject.address);
   373  
   374  if (ret)
   375  return size;
   376  else
   377  return ret;

This static checker warning is disguised as a style warning, but it's
really to detect code like this where the if statements are reversed as
appears to be the case here.

ret = amdgpu_reserve_page_direct(adev, data.inject.address);
if (ret)
return ret;

return size;

   378  }
   379  
   380  if (!amdgpu_ras_is_supported(adev, data.head.block))
   381  return -EINVAL;

regards,
dan carpenter
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-04-13 Thread Christian König

Hi Dennis,

yeah, that just has the same down side of a lot of additional overhead 
as the is_signaled callback.


Bouncing cache lines on the CPU isn't funny at all.

Christian.

Am 13.04.21 um 11:13 schrieb Li, Dennis:

[AMD Official Use Only - Internal Distribution Only]

Hi, Christian and Andrey,
    Maybe we could try to implement the "wait" callback function of 
dma_fence_ops; when a GPU reset or unplug happens, make this callback return 
-ENODEV to notify the caller that the device is lost.

 * Must return -ERESTARTSYS if the wait is intr = true and the wait was
 * interrupted, and remaining jiffies if fence has signaled, or 0 if 
wait
 * timed out. Can also return other error values on custom 
implementations,
 * which should be treated as if the fence is signaled. For example a 
hardware
 * lockup could be reported like that.
 *
 * This callback is optional.
 */
signed long (*wait)(struct dma_fence *fence,
bool intr, signed long timeout);

Best Regards
Dennis Li
-Original Message-
From: Christian König 
Sent: Tuesday, April 13, 2021 3:10 PM
To: Grodzovsky, Andrey ; Koenig, Christian ; Li, Dennis 
; amd-gfx@lists.freedesktop.org; Deucher, Alexander ; Kuehling, 
Felix ; Zhang, Hawking ; Daniel Vetter 
Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky:

On 2021-04-12 3:18 p.m., Christian König wrote:

Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:

[SNIP]

So what's the right approach ? How we guarantee that when running
amdgpu_fence_driver_force_completion we will signal all the HW
fences and not racing against some more fences insertion into that
array ?


Well I would still say the best approach would be to insert this
between the front end and the backend and not rely on signaling
fences while holding the device srcu.


My question is, even now, when we run
amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or
amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion,
what there prevents a race with another fence being at the same time
emitted and inserted into the fence array ? Looks like nothing.


Each ring can only be used by one thread at the same time, this
includes emitting fences as well as other stuff.

During GPU reset we make sure nobody writes to the rings by stopping
the scheduler and taking the GPU reset lock (so that nobody else can
start the scheduler again).


What about direct submissions not through scheduler -
amdgpu_job_submit_direct, I don't see how this is protected.

Those only happen during startup and GPU reset.


BTW: Could it be that the device SRCU protects more than one device
and we deadlock because of this?


I haven't actually experienced any deadlock until now but, yes,
drm_unplug_srcu is defined as static in drm_drv.c and so in the
presence  of multiple devices from same or different drivers we in
fact are dependent on all their critical sections i guess.


Shit, yeah the devil is a squirrel. So for A+I laptops we actually
need to sync that up with Daniel and the rest of the i915 guys.

IIRC we could actually have an amdgpu device in a docking station
which needs hotplug and the driver might depend on waiting for the
i915 driver as well.


Can't we propose a patch to make drm_unplug_srcu per drm_device ? I
don't see why it has to be global and not per device thing.

I'm really wondering the same thing for quite a while now.

Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global.

Regards,
Christian.


Andrey



Christian.


Andrey



Christian.


Andrey



Andrey



Christian.


     /* Past this point no more fence are submitted to HW ring
and hence we can safely call force signal on all that are
currently there.
  * Any subsequently created  HW fences will be returned
signaled with an error code right away
  */

     for_each_ring(adev)
         amdgpu_fence_process(ring)

     drm_dev_unplug(dev);
     Stop schedulers
     cancel_sync(all timers and queued works);
     hw_fini
     unmap_mmio

}


Andrey





Alternatively grabbing the reset write side and stopping
and then restarting the scheduler could work as well.

Christian.


I didn't get the above and I don't see why I need to reuse
the GPU reset rw_lock. I rely on the SRCU unplug flag for
unplug. Also, not clear to me why are we focusing on the
scheduler threads, any code patch to generate HW fences
should be covered, so any code leading to
amdgpu_fence_emit needs to be taken into account such as,
direct IB submissions, VM flushes e.t.c

You need to work together with the reset lock anyway, cause
a hotplug could run at the same time as a reset.


For going my way indeed now I see now that I have to take
reset write side lock during HW fences signalling in order
to protect against scheduler/HW fences detachment and
reattachment during schedulers stop/restart. But if we go
with your approach  then callin

RE: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-04-13 Thread Li, Dennis
[AMD Official Use Only - Internal Distribution Only]

Hi, Christian and Andrey,
  Maybe we could try to implement the "wait" callback function of dma_fence_ops; 
when a GPU reset or unplug happens, make this callback return -ENODEV to notify 
the caller that the device is lost.

 * Must return -ERESTARTSYS if the wait is intr = true and the wait was
 * interrupted, and remaining jiffies if fence has signaled, or 0 if 
wait
 * timed out. Can also return other error values on custom 
implementations,
 * which should be treated as if the fence is signaled. For example a 
hardware
 * lockup could be reported like that.
 *
 * This callback is optional.
 */
signed long (*wait)(struct dma_fence *fence,
bool intr, signed long timeout);

Best Regards
Dennis Li
-Original Message-
From: Christian König  
Sent: Tuesday, April 13, 2021 3:10 PM
To: Grodzovsky, Andrey ; Koenig, Christian 
; Li, Dennis ; 
amd-gfx@lists.freedesktop.org; Deucher, Alexander ; 
Kuehling, Felix ; Zhang, Hawking 
; Daniel Vetter 
Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky:
>
> On 2021-04-12 3:18 p.m., Christian König wrote:
>> Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:
>>> [SNIP]
>
> So what's the right approach ? How we guarantee that when running 
> amdgpu_fence_driver_force_completion we will signal all the HW 
> fences and not racing against some more fences insertion into that 
> array ?
>

 Well I would still say the best approach would be to insert this 
 between the front end and the backend and not rely on signaling 
 fences while holding the device srcu.
>>>
>>>
>>> My question is, even now, when we run 
>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or 
>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion,
>>> what there prevents a race with another fence being at the same time 
>>> emitted and inserted into the fence array ? Looks like nothing.
>>>
>>
>> Each ring can only be used by one thread at the same time, this 
>> includes emitting fences as well as other stuff.
>>
>> During GPU reset we make sure nobody writes to the rings by stopping 
>> the scheduler and taking the GPU reset lock (so that nobody else can 
>> start the scheduler again).
>
>
> What about direct submissions not through scheduler - 
> amdgpu_job_submit_direct, I don't see how this is protected.

Those only happen during startup and GPU reset.

>>

 BTW: Could it be that the device SRCU protects more than one device 
 and we deadlock because of this?
>>>
>>>
>>> I haven't actually experienced any deadlock until now but, yes, 
>>> drm_unplug_srcu is defined as static in drm_drv.c and so in the 
>>> presence  of multiple devices from same or different drivers we in 
>>> fact are dependent on all their critical sections i guess.
>>>
>>
>> Shit, yeah the devil is a squirrel. So for A+I laptops we actually 
>> need to sync that up with Daniel and the rest of the i915 guys.
>>
>> IIRC we could actually have an amdgpu device in a docking station 
>> which needs hotplug and the driver might depend on waiting for the
>> i915 driver as well.
>
>
> Can't we propose a patch to make drm_unplug_srcu per drm_device ? I 
> don't see why it has to be global and not per device thing.

I'm really wondering the same thing for quite a while now.

Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global.

Regards,
Christian.

>
> Andrey
>
>
>>
>> Christian.
>>
>>> Andrey
>>>
>>>

 Christian.

> Andrey
>
>
>>
>>> Andrey
>>>
>>>

 Christian.

>     /* Past this point no more fence are submitted to HW ring 
> and hence we can safely call force signal on all that are 
> currently there.
>  * Any subsequently created  HW fences will be returned 
> signaled with an error code right away
>  */
>
>     for_each_ring(adev)
>         amdgpu_fence_process(ring)
>
>     drm_dev_unplug(dev);
>     Stop schedulers
>     cancel_sync(all timers and queued works);
>     hw_fini
>     unmap_mmio
>
> }
>
>
> Andrey
>
>
>>
>>
>>>
>>
>> Alternatively grabbing the reset write side and stopping 
>> and then restarting the scheduler could work as well.
>>
>> Christian.
>
>
> I didn't get the above, and I don't see why I need to reuse
> the GPU reset rw_lock. I rely on the SRCU unplug flag for
> unplug. Also, it's not clear to me why we are focusing on the
> scheduler threads; any code path that generates HW fences
> should be covered, so any code leading to amdgpu_fence_emit
> needs to be taken into account, such as direct IB submissions,
> VM flushes etc.

[PATCH] drm/radeon/cik: remove set but not used variables

2021-04-13 Thread Tian Tao
The values of pipe_id and queue_id are not used under certain
circumstances, so just delete them.

Signed-off-by: Tian Tao 
---
 drivers/gpu/drm/radeon/cik.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index 8b7a4f7..42a8afa 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -7948,8 +7948,6 @@ int cik_irq_process(struct radeon_device *rdev)
 			DRM_ERROR("Illegal register access in command stream\n");
 			/* XXX check the bitfield order! */
 			me_id = (ring_id & 0x60) >> 5;
-			pipe_id = (ring_id & 0x18) >> 3;
-			queue_id = (ring_id & 0x7) >> 0;
 			switch (me_id) {
 			case 0:
 				/* This results in a full GPU reset, but all we need to do is soft
@@ -7971,8 +7969,6 @@ int cik_irq_process(struct radeon_device *rdev)
 			DRM_ERROR("Illegal instruction in command stream\n");
 			/* XXX check the bitfield order! */
 			me_id = (ring_id & 0x60) >> 5;
-			pipe_id = (ring_id & 0x18) >> 3;
-			queue_id = (ring_id & 0x7) >> 0;
 			switch (me_id) {
 			case 0:
 				/* This results in a full GPU reset, but all we need to do is soft
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Add graphics cache rinse packet for sdma

2021-04-13 Thread Christian König

Yeah agree, a bit more commit text would be nice to have.

Apart from that feel free to add an Acked-by: Christian König 
 as well.


Christian.

On 13.04.21 at 08:41, Huang Rui wrote:

On Tue, Apr 13, 2021 at 02:23:00PM +0800, Su, Jinzhou (Joe) wrote:

   Add emit mem sync callback for sdma_v5_2

I suggest describing the problem you encountered with this change; most
people would like to know how it happens.

With that fixed, patch is Reviewed-by: Huang Rui 


Signed-off-by: Jinzhou Su 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 28 ++++++++++++++++++++++++++++
  1 file changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
index 93f826a7d3f0..b1ad9e52b234 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
@@ -369,6 +369,33 @@ static void sdma_v5_2_ring_emit_ib(struct amdgpu_ring *ring,
 	amdgpu_ring_write(ring, upper_32_bits(csa_mc_addr));
 }
 
+/**
+ * sdma_v5_2_ring_emit_mem_sync - flush the IB by graphics cache rinse
+ *
+ * @ring: amdgpu ring pointer
+ * @job: job to retrieve vmid from
+ * @ib: IB object to schedule
+ *
+ * flush the IB by graphics cache rinse.
+ */
+static void sdma_v5_2_ring_emit_mem_sync(struct amdgpu_ring *ring)
+{
+	uint32_t gcr_cntl =
+		SDMA_GCR_GL2_INV | SDMA_GCR_GL2_WB | SDMA_GCR_GLM_INV |
+		SDMA_GCR_GL1_INV | SDMA_GCR_GLV_INV | SDMA_GCR_GLK_INV |
+		SDMA_GCR_GLI_INV(1);
+
+	/* flush entire cache L0/L1/L2, this can be optimized by performance requirement */
+	amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_GCR_REQ));
+	amdgpu_ring_write(ring, SDMA_PKT_GCR_REQ_PAYLOAD1_BASE_VA_31_7(0));
+	amdgpu_ring_write(ring, SDMA_PKT_GCR_REQ_PAYLOAD2_GCR_CONTROL_15_0(gcr_cntl) |
+			SDMA_PKT_GCR_REQ_PAYLOAD2_BASE_VA_47_32(0));
+	amdgpu_ring_write(ring, SDMA_PKT_GCR_REQ_PAYLOAD3_LIMIT_VA_31_7(0) |
+			SDMA_PKT_GCR_REQ_PAYLOAD3_GCR_CONTROL_18_16(gcr_cntl >> 16));
+	amdgpu_ring_write(ring, SDMA_PKT_GCR_REQ_PAYLOAD4_LIMIT_VA_47_32(0) |
+			SDMA_PKT_GCR_REQ_PAYLOAD4_VMID(0));
+}
+
 /**
  * sdma_v5_2_ring_emit_hdp_flush - emit an hdp flush on the DMA ring
  *
@@ -1663,6 +1690,7 @@ static const struct amdgpu_ring_funcs sdma_v5_2_ring_funcs = {
 		10 + 10 + 10, /* sdma_v5_2_ring_emit_fence x3 for user fence, vm fence */
 	.emit_ib_size = 7 + 6, /* sdma_v5_2_ring_emit_ib */
 	.emit_ib = sdma_v5_2_ring_emit_ib,
+	.emit_mem_sync = sdma_v5_2_ring_emit_mem_sync,
 	.emit_fence = sdma_v5_2_ring_emit_fence,
 	.emit_pipeline_sync = sdma_v5_2_ring_emit_pipeline_sync,
 	.emit_vm_flush = sdma_v5_2_ring_emit_vm_flush,
--
2.27.0





Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K

2021-04-13 Thread Christian König

Hi Mikhail,

the crash is a known issue and should be fixed by:

commit f63da9ae7584280582cbc834b20cc18bfb203b14
Author: Philip Yang 
Date:   Thu Apr 1 00:22:23 2021 -0400

    drm/amdgpu: reserve fence slot to update page table

But it is perfectly possible for a userspace application to cause a page
fault. See here for example: https://en.wikipedia.org/wiki/Halting_problem


What we do with misbehaving applications is to log the incident and 
prevent the queue which does nasty things from doing even more submissions.


Regards,
Christian.

On 13.04.21 at 00:05, Mikhail Gavrilov wrote:

Video demonstration: 
https://youtu.be/3nkvUeB0GSw

This is how the kernel traces look.

1.
[ 7315.156460] amdgpu 0000:0b:00.0: amdgpu: [mmhub] page fault
(src_id:0 ring:0 vmid:6 pasid:32779, for process obs pid 23963 thread
obs:cs0 pid 23977)
[ 7315.156490] amdgpu 0000:0b:00.0: amdgpu:   in page starting at
address 0x80011fdf5000 from client 18
[ 7315.156495] amdgpu 0000:0b:00.0: amdgpu:
MMVM_L2_PROTECTION_FAULT_STATUS:0x00641A51
[ 7315.156500] amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: VCN1 (0xd)
[ 7315.156503] amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x1
[ 7315.156505] amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
[ 7315.156509] amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x5
[ 7315.156510] amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 7315.156513] amdgpu 0000:0b:00.0: amdgpu: RW: 0x1
[ 7315.156545] amdgpu 0000:0b:00.0: amdgpu: [mmhub] page fault
(src_id:0 ring:0 vmid:6 pasid:32779, for process obs pid 23963 thread
obs:cs0 pid 23977)
[ 7315.156549] amdgpu 0000:0b:00.0: amdgpu:   in page starting at
address 0x80011fdf6000 from client 18
[ 7315.156551] amdgpu 0000:0b:00.0: amdgpu:
MMVM_L2_PROTECTION_FAULT_STATUS:0x00641A51
[ 7315.156554] amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: VCN1 (0xd)
[ 7315.156556] amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x1
[ 7315.156559] amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
[ 7315.156561] amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x5
[ 7315.156564] amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 7315.156566] amdgpu 0000:0b:00.0: amdgpu: RW: 0x1

This is a harmless panic, but nevertheless VAAPI does not work and the
application that tried to use the encoder crashed.

2.
If we try again and again to encode a 4K stream through VAAPI, we can
encounter the next trace:
[12341.860944] [ cut here ]
[12341.860961] kernel BUG at drivers/dma-buf/dma-resv.c:287!
[12341.860968] invalid opcode: 0000 [#1] SMP NOPTI
[12341.860972] CPU: 28 PID: 18261 Comm: kworker/28:0 Tainted: G
W- ---  5.12.0-0.rc5.180.fc35.x86_64+debug #1
[12341.860977] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 3402 01/13/2021
[12341.860981] Workqueue: events amdgpu_irq_handle_ih_soft [amdgpu]
[12341.861102] RIP: 0010:dma_resv_add_shared_fence+0x2ab/0x2c0
[12341.861108] Code: fd ff ff be 01 00 00 00 e8 e2 74 dc ff e9 ac fd
ff ff 48 83 c4 18 be 03 00 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f e9 c5
74 dc ff <0f> 0b 31 ed e9 73 fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00
90 0f
[12341.861112] RSP: 0018:b2f084c87bb0 EFLAGS: 00010246
[12341.861115] RAX: 0002 RBX: 9f9551184998 RCX: 
[12341.861119] RDX: 0002 RSI:  RDI: 9f9551184a50
[12341.861122] RBP: 0002 R08:  R09: 
[12341.861124] R10:  R11:  R12: 9f91b9a18140
[12341.861127] R13: 9f91c9020740 R14: 9f91c9020768 R15: 
[12341.861130] FS:  () GS:9f984a20()
knlGS:
[12341.861133] CS:  0010 DS:  ES:  CR0: 80050033
[12341.861136] CR2: 144e080d8000 CR3: 00010e98c000 CR4: 00350ee0
[12341.861139] Call Trace:
[12341.861143]  amdgpu_vm_sdma_commit+0x182/0x220 [amdgpu]
[12341.861251]  amdgpu_vm_bo_update_mapping.constprop.0+0x278/0x3c0 [amdgpu]
[12341.861356]  amdgpu_vm_handle_fault+0x145/0x290 [amdgpu]
[12341.861461]  gmc_v10_0_process_interrupt+0xb3/0x250 [amdgpu]
[12341.861571]  ? _raw_spin_unlock_irqrestore+0x37/0x40
[12341.861577]  ? lock_acquire+0x179/0x3a0
[12341.861583]  ? lock_acquire+0x179/0x3a0
[12341.861587]  ? amdgpu_irq_dispatch+0xc6/0x240 [amdgpu]
[12341.861692]  amdgpu_irq_dispatch+0xc6/0x240 [amdgpu]
[12341.861796]  amdgpu_ih_process+0x90/0x110 [amdgpu]
[12341.861900]  process_one_work+0x2b0/0x5e0
[12341.861906]  worker_thread+0x55/0x3c0
[12341.861910]  ? process_one_work+0x5e0/0x5e0
[12341.861915]  kthread+0x13a/0x150
[12341.861918]  ? __kthread_bind_mask+0x60/0x60
[12341.861922

Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-04-13 Thread Christian König

On 12.04.21 at 22:01, Andrey Grodzovsky wrote:


On 2021-04-12 3:18 p.m., Christian König wrote:

On 12.04.21 at 21:12, Andrey Grodzovsky wrote:

[SNIP]


So what's the right approach? How do we guarantee that when running 
amdgpu_fence_driver_force_completion we will signal all the HW 
fences and not race against more fence insertions into that 
array?




Well I would still say the best approach would be to insert this 
between the front end and the backend and not rely on signaling 
fences while holding the device srcu.



My question is: even now, when we run 
amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or 
amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, 
what prevents a race with another fence being emitted and inserted 
into the fence array at the same time? Looks like nothing.




Each ring can only be used by one thread at the same time; this 
includes emitting fences as well as other stuff.


During GPU reset we make sure nobody writes to the rings by stopping 
the scheduler and taking the GPU reset lock (so that nobody else can 
start the scheduler again).



What about direct submissions not through scheduler - 
amdgpu_job_submit_direct, I don't see how this is protected.


Those only happen during startup and GPU reset.





BTW: Could it be that the device SRCU protects more than one device 
and we deadlock because of this?



I haven't actually experienced any deadlock until now but, yes, 
drm_unplug_srcu is defined as static in drm_drv.c and so in the 
presence  of multiple devices from same or different drivers we in 
fact are dependent on all their critical sections i guess.




Shit, yeah the devil is a squirrel. So for A+I laptops we actually 
need to sync that up with Daniel and the rest of the i915 guys.


IIRC we could actually have an amdgpu device in a docking station 
which needs hotplug and the driver might depend on waiting for the 
i915 driver as well.



Can't we propose a patch to make drm_unplug_srcu per drm_device ? I 
don't see why it has to be global and not per device thing.


I'm really wondering the same thing for quite a while now.

Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global.

Regards,
Christian.



    /* Past this point no more fence are submitted to HW ring 
and hence we can safely call force signal on all that are 
currently there.
 * Any subsequently created  HW fences will be returned 
signaled with an error code right away

 */

    for_each_ring(adev)
        amdgpu_fence_process(ring)

    drm_dev_unplug(dev);
    Stop schedulers
    cancel_sync(all timers and queued works);
    hw_fini
    unmap_mmio

}


Andrey









Alternatively grabbing the reset write side and stopping 
and then restarting the scheduler could work as well.


Christian.



I didn't get the above, and I don't see why I need to reuse 
the GPU reset rw_lock. I rely on the SRCU unplug flag for 
unplug. Also, it's not clear to me why we are focusing on the 
scheduler threads; any code path that generates HW fences 
should be covered, so any code leading to amdgpu_fence_emit 
needs to be taken into account, such as direct IB submissions, 
VM flushes etc.


You need to work together with the reset lock anyway, cause 
a hotplug could run at the same time as a reset.



Going my way, I indeed now see that I have to take the 
reset write side lock during HW fence signalling in order 
to protect against scheduler/HW fence detachment and 
reattachment during scheduler stop/restart. But if we go 
with your approach, then calling drm_dev_unplug and scoping 
amdgpu_job_timeout with drm_dev_enter/exit should be enough 
to prevent any concurrent GPU resets during unplug. In fact 
I already do it anyway - 
https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=drm-misc-next&id=ef0ea4dd29ef44d2649c5eda16c8f4869acc36b1


Yes, good point as well.

Christian.





Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-04-13 Thread Christian König

On 13.04.21 at 07:36, Andrey Grodzovsky wrote:

[SNIP]



emit_fence(fence);


/* We can't wait forever as the HW might be gone at any point */
       dma_fence_wait_timeout(old_fence, 5S);



You can pretty much ignore this wait here. It is only as a last 
resort so that we never overwrite the ring buffers.



If device is present how can I ignore this ?



I think you missed my question here



Sorry I thought I answered that below.

See this is just the last resort so that we don't need to worry about 
ring buffer overflows during testing.


We should not get here in practice, and if we do, generating a 
deadlock might actually be the best handling.


The alternative would be to call BUG().



BTW, I am not sure it's so improbable to get here in case of sudden 
device removal: if you are in the middle of rapid command submission 
to the ring at that time, you could easily run into a ring buffer 
overrun, because EOP interrupts are gone and fences are no longer 
removed, while new ones keep arriving from submissions that haven't 
stopped yet.




During normal operation hardware fences are only created by two code paths:
1. The scheduler when it pushes jobs to the hardware.
2. The KIQ when it does register access on SRIOV.

Both are limited in how many submissions could be made.

The only case where this here becomes necessary is during GPU reset when 
we do direct submission bypassing the scheduler for IB and other tests.


Christian.


Andrey


