[PATCH] Handling of amdgpu_device_resume return value for graceful teardown

2021-04-26 Thread pavan.ramayanam
From: Pavan Kumar Ramayanam 

The runtime resume PM op disregards the return value from
amdgpu_device_resume(), masking errors for failed resumes at the PM
layer. Propagate the error instead, so the PM core can see when a
runtime resume has failed.

Signed-off-by: Pavan Kumar Ramayanam 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 0369d3532bf0..03f3cf194300 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1574,6 +1574,9 @@ static int amdgpu_pmops_runtime_resume(struct device *dev)
amdgpu_device_baco_exit(drm_dev);
}
ret = amdgpu_device_resume(drm_dev, false);
+   if (ret)
+   return ret;
+
if (amdgpu_device_supports_px(drm_dev))
drm_dev->switch_power_state = DRM_SWITCH_POWER_ON;
adev->in_runpm = false;
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdkfd: fix spelling mistake in packet manager

2021-04-26 Thread Felix Kuehling
On 2021-04-26 at 3:44 p.m., Jonathan Kim wrote:
> The plural of 'process' should be 'processes'.
>
> Signed-off-by: Jonathan Kim 

Reviewed-by: Felix Kuehling 


> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> index 0ce507d7208a..f688451cb299 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> @@ -124,14 +124,14 @@ static int pm_create_runlist_ib(struct packet_manager 
> *pm,
>  {
>   unsigned int alloc_size_bytes;
>   unsigned int *rl_buffer, rl_wptr, i;
> - int retval, proccesses_mapped;
> + int retval, processes_mapped;
>   struct device_process_node *cur;
>   struct qcm_process_device *qpd;
>   struct queue *q;
>   struct kernel_queue *kq;
>   bool is_over_subscription;
>  
> - rl_wptr = retval = proccesses_mapped = 0;
> + rl_wptr = retval = processes_mapped = 0;
>  
>   retval = pm_allocate_runlist_ib(pm, &rl_buffer, rl_gpu_addr,
>   &alloc_size_bytes, &is_over_subscription);
> @@ -148,7 +148,7 @@ static int pm_create_runlist_ib(struct packet_manager *pm,
>   list_for_each_entry(cur, queues, list) {
>   qpd = cur->qpd;
>   /* build map process packet */
> - if (proccesses_mapped >= pm->dqm->processes_count) {
> + if (processes_mapped >= pm->dqm->processes_count) {
>   pr_debug("Not enough space left in runlist IB\n");
>   pm_release_ib(pm);
>   return -ENOMEM;
> @@ -158,7 +158,7 @@ static int pm_create_runlist_ib(struct packet_manager *pm,
>   if (retval)
>   return retval;
>  
> - proccesses_mapped++;
> + processes_mapped++;
>   inc_wptr(&rl_wptr, pm->pmf->map_process_size,
>   alloc_size_bytes);
>  


Re: [PATCH v2 08/10] drm/amdgpu: Add DMA mapping of GTT BOs

2021-04-26 Thread Felix Kuehling
On 2021-04-26 at 8:35 p.m., Zeng, Oak wrote:
> Regards,
> Oak 
>
>  
>
> On 2021-04-21, 9:31 PM, "amd-gfx on behalf of Felix Kuehling" 
>  
> wrote:
>
> Use DMABufs with dynamic attachment to DMA-map GTT BOs on other GPUs.
>
> Signed-off-by: Felix Kuehling 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  2 +
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 76 ++-
>  2 files changed, 77 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 63668433f5a6..b706e5a54782 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -41,6 +41,7 @@ struct amdgpu_device;
>  enum kfd_mem_attachment_type {
>   KFD_MEM_ATT_SHARED, /* Share kgd_mem->bo or another attachment's */
>   KFD_MEM_ATT_USERPTR,/* SG bo to DMA map pages from a userptr bo */
> + KFD_MEM_ATT_DMABUF, /* DMAbuf to DMA map TTM BOs */
>  };
>
>  struct kfd_mem_attachment {
> @@ -56,6 +57,7 @@ struct kfd_mem_attachment {
>  struct kgd_mem {
>   struct mutex lock;
>   struct amdgpu_bo *bo;
> + struct dma_buf *dmabuf;
>   struct list_head attachments;
>   /* protected by amdkfd_process_info.lock */
>   struct ttm_validate_buffer validate_list;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 9eeedd0c7920..18a1f9222a59 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -524,6 +524,16 @@ kfd_mem_dmamap_userptr(struct kgd_mem *mem,
>   return ret;
>  }
>
> +static int
> +kfd_mem_dmamap_dmabuf(struct kfd_mem_attachment *attachment)
> +{
> + struct ttm_operation_ctx ctx = {.interruptible = true};
> + struct amdgpu_bo *bo = attachment->bo_va->base.bo;
> +
> + amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
> + return ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
> How does this work? The function name says this is dma mapping a buffer but 
> from the implementation, it is just a placement and validation

Conceptually, calling ttm_bo_validate ensures that the BO is in the
specified domain, in this case GTT. Before calling validate, it can be
in the CPU domain, which means it may be swapped to disk so it's not GPU
accessible. For a DMABuf attachment, the CPU domain means, that the
DMABuf is not attached because the underlying memory object may be on
the move or swapped out.

The actual implementation of the dmabuf attachment is currently in
amdgpu_ttm_populate/unpopulate. This is incorrect. Patch 10 in this
series fixes that to move the actual dmabuf attachment into
amdgpu_ttm_backend_bind/unbind, which is called from amdgpu_bo_move when
a BO is moved between the CPU and GTT domains.

Regards,
  Felix
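
To make the intended flow concrete, here is a minimal sketch (simplified,
assuming the patch-10 rework; sketch_backend_bind is a made-up name, not the
literal driver code):

/* Sketch only: with dynamic DMABuf attachment, the DMA mapping is
 * created as a side effect of the CPU->GTT domain transition, roughly
 * ttm_bo_validate(GTT) -> amdgpu_bo_move() -> amdgpu_ttm_backend_bind().
 */
static int sketch_backend_bind(struct dma_buf_attachment *attach,
                               struct sg_table **sgt)
{
        *sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
        return PTR_ERR_OR_ZERO(*sgt);
}

The CPU domain is the mirror image: backend_unbind is expected to drop the
mapping again with dma_buf_unmap_attachment(), which is what
kfd_mem_dmaunmap_dmabuf() relies on when it moves the BO back to the CPU
domain.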


> +}
> +
>  static int
>  kfd_mem_dmamap_attachment(struct kgd_mem *mem,
> struct kfd_mem_attachment *attachment)
> @@ -533,6 +543,8 @@ kfd_mem_dmamap_attachment(struct kgd_mem *mem,
>   return 0;
>   case KFD_MEM_ATT_USERPTR:
>   return kfd_mem_dmamap_userptr(mem, attachment);
> + case KFD_MEM_ATT_DMABUF:
> + return kfd_mem_dmamap_dmabuf(attachment);
>   default:
>   WARN_ON_ONCE(1);
>   }
> @@ -562,6 +574,19 @@ kfd_mem_dmaunmap_userptr(struct kgd_mem *mem,
>   ttm->sg = NULL;
>  }
>
> +static void
> +kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment *attachment)
> +{
> + struct ttm_operation_ctx ctx = {.interruptible = true};
> + struct amdgpu_bo *bo = attachment->bo_va->base.bo;
> +
> + amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
> + ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
> + /* FIXME: This does not guarantee that amdgpu_ttm_tt_unpopulate is
> +  * called
> +  */
> +}
> +
>  static void
>  kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
>   struct kfd_mem_attachment *attachment)
> @@ -572,6 +597,9 @@ kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
>   case KFD_MEM_ATT_USERPTR:
>   kfd_mem_dmaunmap_userptr(mem, attachment);
>   break;
> + case KFD_MEM_ATT_DMABUF:
> + kfd_mem_dmaunmap_dmabuf(attachment);
> + break;
>   default:
>   WARN_ON_ONCE(1);
>   }
> @@ -605,6 +633,38 @@ kfd_mem_attach_userptr(struct amdgpu_device *adev, 
> struct kgd_mem *mem,
>   return 0;
>  }
>
> +static int
> +kfd_mem_attach_dmabuf(struct amdgpu_device *adev, struct kgd_mem *mem,
> +   struct amdgpu_bo **bo)
> +{
> + struct drm_gem_object *gobj;
> +
> + if (!mem->dmabuf) {
> + mem->dmabuf = amdgpu_gem_prime_expor

Re: [PATCH v2 06/10] drm/amdgpu: DMA map/unmap when updating GPU mappings

2021-04-26 Thread Felix Kuehling
On 2021-04-26 at 8:23 p.m., Zeng, Oak wrote:
> Regards,
> Oak 
>
>  
>
> On 2021-04-21, 9:31 PM, "dri-devel on behalf of Felix Kuehling" 
>  
> wrote:
>
> DMA map kfd_mem_attachments in update_gpuvm_pte. This function is called
> with the BO and page tables reserved, so we can safely update the DMA
> mapping.
>
> DMA unmap when a BO is unmapped from a GPU and before updating mappings
> in restore workers.
>
> Signed-off-by: Felix Kuehling 
> ---
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 56 ++-
>  1 file changed, 29 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 49d1af4aa5f1..7d25d886b98c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -961,11 +961,12 @@ static int unreserve_bo_and_vms(struct 
> bo_vm_reservation_context *ctx,
>   return ret;
>  }
>
> -static int unmap_bo_from_gpuvm(struct amdgpu_device *adev,
> +static void unmap_bo_from_gpuvm(struct kgd_mem *mem,
>   struct kfd_mem_attachment *entry,
>   struct amdgpu_sync *sync)
>  {
>   struct amdgpu_bo_va *bo_va = entry->bo_va;
> + struct amdgpu_device *adev = entry->adev;
>   struct amdgpu_vm *vm = bo_va->base.vm;
>
>   amdgpu_vm_bo_unmap(adev, bo_va, entry->va);
> @@ -974,15 +975,20 @@ static int unmap_bo_from_gpuvm(struct amdgpu_device 
> *adev,
>
>   amdgpu_sync_fence(sync, bo_va->last_pt_update);
>
> - return 0;
> + kfd_mem_dmaunmap_attachment(mem, entry);
>  }
>
> -static int update_gpuvm_pte(struct amdgpu_device *adev,
> - struct kfd_mem_attachment *entry,
> - struct amdgpu_sync *sync)
> +static int update_gpuvm_pte(struct kgd_mem *mem,
> + struct kfd_mem_attachment *entry,
> + struct amdgpu_sync *sync)
>  {
> - int ret;
>   struct amdgpu_bo_va *bo_va = entry->bo_va;
> + struct amdgpu_device *adev = entry->adev;
> + int ret;
> +
> + ret = kfd_mem_dmamap_attachment(mem, entry);
> Should the dma mapping be done in the kfd_mem_attach function when a memory
> object is attached to a vm for the first time? Since each memory object can be
> mapped to many GPUs or many VMs, doing the dma mapping the first time it is
> attached would simplify the logic. Or even simpler, maybe we can just dma map
> when a memory object is created - it wastes some iommu page table entries but
> really simplifies the logic in this patch series. I found this series is not
> very easy to understand.

The DMA mapping must be updated every time the physical memory
allocation changes, e.g. after a BO was evicted and restored. Basically,
if the physical pages of the BO change, we need to update the DMA
mapping to point to those new pages. Therefore I added this in the
update_gpu_vm_pte function, which is called after a BO has been
validated the first time, or revalidated after an eviction.

You'll also see that I call dmaunmap in the re-validation cases (in the
restore workers below) to ensure that we don't leak DMA mappings.

Regards,
  Felix
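
As a rough sketch of that pattern (illustrative only; sketch_restore_attachment
is a made-up wrapper around the helpers from this series, and BO validation is
omitted):

/* Illustrative only: drop the stale mapping before revalidation, then
 * update_gpuvm_pte() re-creates the DMA mapping against the BO's new
 * physical pages before touching the page tables.
 */
static int sketch_restore_attachment(struct kgd_mem *mem,
                                     struct kfd_mem_attachment *entry,
                                     struct amdgpu_sync *sync)
{
        kfd_mem_dmaunmap_attachment(mem, entry);   /* avoid leaking the old mapping */
        /* ... BO is (re)validated here ... */
        return update_gpuvm_pte(mem, entry, sync); /* dmamap + PTE update */
}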


> + if (ret)
> + return ret;
>
>   /* Update the page tables  */
>   ret = amdgpu_vm_bo_update(adev, bo_va, false);
> @@ -994,14 +1000,15 @@ static int update_gpuvm_pte(struct amdgpu_device 
> *adev,
>   return amdgpu_sync_fence(sync, bo_va->last_pt_update);
>  }
>
> -static int map_bo_to_gpuvm(struct amdgpu_device *adev,
> - struct kfd_mem_attachment *entry, struct amdgpu_sync *sync,
> - bool no_update_pte)
> +static int map_bo_to_gpuvm(struct kgd_mem *mem,
> +struct kfd_mem_attachment *entry,
> +struct amdgpu_sync *sync,
> +bool no_update_pte)
>  {
>   int ret;
>
>   /* Set virtual address for the allocation */
> - ret = amdgpu_vm_bo_map(adev, entry->bo_va, entry->va, 0,
> + ret = amdgpu_vm_bo_map(entry->adev, entry->bo_va, entry->va, 0,
>  amdgpu_bo_size(entry->bo_va->base.bo),
>  entry->pte_flags);
>   if (ret) {
> @@ -1013,7 +1020,7 @@ static int map_bo_to_gpuvm(struct amdgpu_device 
> *adev,
>   if (no_update_pte)
>   return 0;
>
> - ret = update_gpuvm_pte(adev, entry, sync);
> + ret = update_gpuvm_pte(mem, entry, sync);
>   if (ret) {
>   pr_err("update_gpuvm_pte() failed\n");
>   goto update_gpuvm_pte_failed;
> @@ -1022,7 +1029,7 @@ static int map_bo_to_gpuvm(struct amdgpu_device 
> *adev,
>   return 0;
>
>  update_gpuvm_pte_failed:
> - unmap_bo_from_gpuvm(adev, entry, sync);
> + unmap_bo_from_gpuvm(mem, entry, sync);
>   return ret;
>  }
>
> @@ -1596

Re: [PATCH v2 05/10] drm/amdgpu: Add multi-GPU DMA mapping helpers

2021-04-26 Thread Felix Kuehling
On 2021-04-26 at 8:09 p.m., Zeng, Oak wrote:
> As I understand it, when one GPU maps another GPU's vram, this vram should
> also be mapped in the iommu page table. Normal GTT memory (versus userptr)
> also needs to be mapped in the iommu, but I don't see that code below.

Right, I'm not solving all problems at once. The next patch is there to
handle GTT BOs.

Peer mappings of doorbells, MMIO and VRAM still need to be handled in
the future. I'm trying to fix the worst issues first. This series should
get 99% of real world tests working.


>  I only see you map userptrs in the iommu. Maybe you map them in the iommu at
> some point other than memory attachment time?
>
> Also see a nit-pick inline
>
> Regards,
> Oak 
>
>  
>
> On 2021-04-21, 9:31 PM, "dri-devel on behalf of Felix Kuehling" 
>  
> wrote:
>
> Add BO-type specific helpers functions to DMA-map and unmap
> kfd_mem_attachments. Implement this functionality for userptrs by creating
> one SG BO per GPU and filling it with a DMA mapping of the pages from the
> original mem->bo.
>
> Signed-off-by: Felix Kuehling 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   8 +-
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 146 +-
>  2 files changed, 145 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index c24b2478f445..63668433f5a6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -38,11 +38,17 @@ extern uint64_t amdgpu_amdkfd_total_mem_size;
>
>  struct amdgpu_device;
>
> +enum kfd_mem_attachment_type {
> + KFD_MEM_ATT_SHARED, /* Share kgd_mem->bo or another attachment's */
> + KFD_MEM_ATT_USERPTR,/* SG bo to DMA map pages from a userptr bo */
> +};
> +
>  struct kfd_mem_attachment {
>   struct list_head list;
> + enum kfd_mem_attachment_type type;
> + bool is_mapped;
>   struct amdgpu_bo_va *bo_va;
>   struct amdgpu_device *adev;
> - bool is_mapped;
>   uint64_t va;
>   uint64_t pte_flags;
>  };
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index fbd7e786b54e..49d1af4aa5f1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -473,12 +473,117 @@ static uint64_t get_pte_flags(struct amdgpu_device 
> *adev, struct kgd_mem *mem)
>   return pte_flags;
>  }
>
> +static int
> +kfd_mem_dmamap_userptr(struct kgd_mem *mem,
> +struct kfd_mem_attachment *attachment)
> +{
> + enum dma_data_direction direction =
> + mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ?
> + DMA_BIDIRECTIONAL : DMA_TO_DEVICE;
> + struct ttm_operation_ctx ctx = {.interruptible = true};
> + struct amdgpu_bo *bo = attachment->bo_va->base.bo;
> + struct amdgpu_device *adev = attachment->adev;
> + struct ttm_tt *src_ttm = mem->bo->tbo.ttm;
> + struct ttm_tt *ttm = bo->tbo.ttm;
> + int ret;
> +
> + ttm->sg = kmalloc(sizeof(*ttm->sg), GFP_KERNEL);
> + if (unlikely(!ttm->sg))
> + return -ENOMEM;
> +
> + if (WARN_ON(ttm->num_pages != src_ttm->num_pages))
> + return -EINVAL;
> +
> + /* Same sequence as in amdgpu_ttm_tt_pin_userptr */
> + ret = sg_alloc_table_from_pages(ttm->sg, src_ttm->pages,
> + ttm->num_pages, 0,
> + (u64)ttm->num_pages << PAGE_SHIFT,
> + GFP_KERNEL);
> + if (unlikely(ret))
> + goto release_sg;
> Should this go to a label starting at the kfree below?

Thanks, I'll fix that.

Regards,
  Felix


> +
> + ret = dma_map_sgtable(adev->dev, ttm->sg, direction, 0);
> + if (unlikely(ret))
> + goto release_sg;
> +
> + drm_prime_sg_to_dma_addr_array(ttm->sg, ttm->dma_address,
> +ttm->num_pages);
> +
> + amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
> + ret = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
> + if (ret)
> + goto release_sg;
> +
> + return 0;
> +
> +release_sg:
> + pr_err("DMA map userptr failed: %d\n", ret);
> + sg_free_table(ttm->sg);
> + kfree(ttm->sg);
> + ttm->sg = NULL;
> + return ret;
> +}
> +
> +static int
> +kfd_mem_dmamap_attachment(struct kgd_mem *mem,
> +   struct kfd_mem_attachment *attachment)
> +{
> + switch (attachment->type) {
> + case KFD_MEM_ATT_SHARED:
> + return 0;
> + case KFD_MEM_ATT_USERPTR:
> + return kfd_mem_dmamap_userptr(mem, attachment);
> + default:
> + WARN_ON_ONCE(1);
> + }
> + return -EINVAL;
> 

Re: [PATCH v2 08/10] drm/amdgpu: Add DMA mapping of GTT BOs

2021-04-26 Thread Zeng, Oak


Regards,
Oak 

 

On 2021-04-21, 9:31 PM, "amd-gfx on behalf of Felix Kuehling" 
 
wrote:

Use DMABufs with dynamic attachment to DMA-map GTT BOs on other GPUs.

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  2 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 76 ++-
 2 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 63668433f5a6..b706e5a54782 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -41,6 +41,7 @@ struct amdgpu_device;
 enum kfd_mem_attachment_type {
KFD_MEM_ATT_SHARED, /* Share kgd_mem->bo or another attachment's */
KFD_MEM_ATT_USERPTR,/* SG bo to DMA map pages from a userptr bo */
+   KFD_MEM_ATT_DMABUF, /* DMAbuf to DMA map TTM BOs */
 };

 struct kfd_mem_attachment {
@@ -56,6 +57,7 @@ struct kfd_mem_attachment {
 struct kgd_mem {
struct mutex lock;
struct amdgpu_bo *bo;
+   struct dma_buf *dmabuf;
struct list_head attachments;
/* protected by amdkfd_process_info.lock */
struct ttm_validate_buffer validate_list;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 9eeedd0c7920..18a1f9222a59 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -524,6 +524,16 @@ kfd_mem_dmamap_userptr(struct kgd_mem *mem,
return ret;
 }

+static int
+kfd_mem_dmamap_dmabuf(struct kfd_mem_attachment *attachment)
+{
+   struct ttm_operation_ctx ctx = {.interruptible = true};
+   struct amdgpu_bo *bo = attachment->bo_va->base.bo;
+
+   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
+   return ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
How does this work? The function name says this is dma mapping a buffer but 
from the implementation, it is just a placement and validation
+}
+
 static int
 kfd_mem_dmamap_attachment(struct kgd_mem *mem,
  struct kfd_mem_attachment *attachment)
@@ -533,6 +543,8 @@ kfd_mem_dmamap_attachment(struct kgd_mem *mem,
return 0;
case KFD_MEM_ATT_USERPTR:
return kfd_mem_dmamap_userptr(mem, attachment);
+   case KFD_MEM_ATT_DMABUF:
+   return kfd_mem_dmamap_dmabuf(attachment);
default:
WARN_ON_ONCE(1);
}
@@ -562,6 +574,19 @@ kfd_mem_dmaunmap_userptr(struct kgd_mem *mem,
ttm->sg = NULL;
 }

+static void
+kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment *attachment)
+{
+   struct ttm_operation_ctx ctx = {.interruptible = true};
+   struct amdgpu_bo *bo = attachment->bo_va->base.bo;
+
+   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
+   ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+   /* FIXME: This does not guarantee that amdgpu_ttm_tt_unpopulate is
+* called
+*/
+}
+
 static void
 kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
struct kfd_mem_attachment *attachment)
@@ -572,6 +597,9 @@ kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
case KFD_MEM_ATT_USERPTR:
kfd_mem_dmaunmap_userptr(mem, attachment);
break;
+   case KFD_MEM_ATT_DMABUF:
+   kfd_mem_dmaunmap_dmabuf(attachment);
+   break;
default:
WARN_ON_ONCE(1);
}
@@ -605,6 +633,38 @@ kfd_mem_attach_userptr(struct amdgpu_device *adev, 
struct kgd_mem *mem,
return 0;
 }

+static int
+kfd_mem_attach_dmabuf(struct amdgpu_device *adev, struct kgd_mem *mem,
+ struct amdgpu_bo **bo)
+{
+   struct drm_gem_object *gobj;
+
+   if (!mem->dmabuf) {
+   mem->dmabuf = amdgpu_gem_prime_export(&mem->bo->tbo.base,
+   mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ?
+   DRM_RDWR : 0);
+   if (IS_ERR(mem->dmabuf)) {
+   mem->dmabuf = NULL;
+   return PTR_ERR(mem->dmabuf);
+   }
+   }
+
+   gobj = amdgpu_gem_prime_import(&adev->ddev, mem->dmabuf);
+   if (IS_ERR(gobj))
+   return PTR_ERR(gobj);
+
+   /* Import takes an extra reference on the dmabuf. Drop it now to
+* avoid leaking it. We only need the one reference in
+* kgd_mem->dmabuf.
+*/
+   dma_buf_put(mem->dmabuf);
+
+   *bo = gem_to_amdgpu_bo(gobj);
+   (*bo)->parent = amdgpu_bo_ref(mem->bo);
+
+   return 0;
+}
+
 /* kfd_mem_attach - Add a BO to a VM
  *
  * Everything that needs to bo d

Re: [Nouveau] [PATCH v4 00/17] drm: Use new DRM printk funcs (like drm_dbg_*()) in DP helpers

2021-04-26 Thread Dave Airlie
On Sat, 24 Apr 2021 at 04:43, Lyude Paul  wrote:
>
> Since it's been asked quite a few times on some of the various DP
> related patch series I've submitted to use the new DRM printk helpers,
> and it technically wasn't really trivial to do this before due to the
> lack of a consistent way to find a drm_device for an AUX channel, this
> patch series aims to address this. In this series we:
>
> * (NEW! starting from V3) Make sure drm_dbg_*() and friends can handle
>   NULL drm device pointers
> * Clean-up potentially erroneous usages of drm_dp_aux_init() and
>   drm_dp_aux_register() so that actual AUX registration doesn't happen
>   until we have an associated DRM device
> * Clean-up any obvious errors in drivers we find along the way
> * Add a backpointer to the respective drm_device for an AUX channel in
>   drm_dp_aux.drm_dev, and hook it up in every driver with an AUX channel
>   across the tree
> * Add a new ratelimited print helper we'll need for converting the DP
>   helpers over to using the new DRM printk helpers
> * Fix any inconsistencies with logging in drm_dp_helper.c so we always
>   have the aux channel name printed
> * Prepare the various DP helpers so they can find the correct drm_device
>   to use for logging
> * And finally, convert all of the DP helpers over to using drm_dbg_*()
>   and drm_err().
>
> Major changes in v4:
> * Don't move i2c aux init into drm_dp_aux_init(), since I think I've
>   found a much better solution to tegra's issues:
>   https://patchwork.freedesktop.org/series/89420/

I've given this a general once-over; it seems like a good plan, and it's
mostly mechanical.

for the series:
Reviewed-by: Dave Airlie 

Dave.


Re: [PATCH v2 06/10] drm/amdgpu: DMA map/unmap when updating GPU mappings

2021-04-26 Thread Zeng, Oak


Regards,
Oak 

 

On 2021-04-21, 9:31 PM, "dri-devel on behalf of Felix Kuehling" 
 
wrote:

DMA map kfd_mem_attachments in update_gpuvm_pte. This function is called
with the BO and page tables reserved, so we can safely update the DMA
mapping.

DMA unmap when a BO is unmapped from a GPU and before updating mappings
in restore workers.

Signed-off-by: Felix Kuehling 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 56 ++-
 1 file changed, 29 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 49d1af4aa5f1..7d25d886b98c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -961,11 +961,12 @@ static int unreserve_bo_and_vms(struct 
bo_vm_reservation_context *ctx,
return ret;
 }

-static int unmap_bo_from_gpuvm(struct amdgpu_device *adev,
+static void unmap_bo_from_gpuvm(struct kgd_mem *mem,
struct kfd_mem_attachment *entry,
struct amdgpu_sync *sync)
 {
struct amdgpu_bo_va *bo_va = entry->bo_va;
+   struct amdgpu_device *adev = entry->adev;
struct amdgpu_vm *vm = bo_va->base.vm;

amdgpu_vm_bo_unmap(adev, bo_va, entry->va);
@@ -974,15 +975,20 @@ static int unmap_bo_from_gpuvm(struct amdgpu_device 
*adev,

amdgpu_sync_fence(sync, bo_va->last_pt_update);

-   return 0;
+   kfd_mem_dmaunmap_attachment(mem, entry);
 }

-static int update_gpuvm_pte(struct amdgpu_device *adev,
-   struct kfd_mem_attachment *entry,
-   struct amdgpu_sync *sync)
+static int update_gpuvm_pte(struct kgd_mem *mem,
+   struct kfd_mem_attachment *entry,
+   struct amdgpu_sync *sync)
 {
-   int ret;
struct amdgpu_bo_va *bo_va = entry->bo_va;
+   struct amdgpu_device *adev = entry->adev;
+   int ret;
+
+   ret = kfd_mem_dmamap_attachment(mem, entry);
Should the dma mapping be done in the kfd_mem_attach function when a memory
object is attached to a vm for the first time? Since each memory object can be
mapped to many GPUs or many VMs, doing the dma mapping the first time it is
attached would simplify the logic. Or even simpler, maybe we can just dma map
when a memory object is created - it wastes some iommu page table entries but
really simplifies the logic in this patch series. I found this series is not
very easy to understand.
+   if (ret)
+   return ret;

/* Update the page tables  */
ret = amdgpu_vm_bo_update(adev, bo_va, false);
@@ -994,14 +1000,15 @@ static int update_gpuvm_pte(struct amdgpu_device 
*adev,
return amdgpu_sync_fence(sync, bo_va->last_pt_update);
 }

-static int map_bo_to_gpuvm(struct amdgpu_device *adev,
-   struct kfd_mem_attachment *entry, struct amdgpu_sync *sync,
-   bool no_update_pte)
+static int map_bo_to_gpuvm(struct kgd_mem *mem,
+  struct kfd_mem_attachment *entry,
+  struct amdgpu_sync *sync,
+  bool no_update_pte)
 {
int ret;

/* Set virtual address for the allocation */
-   ret = amdgpu_vm_bo_map(adev, entry->bo_va, entry->va, 0,
+   ret = amdgpu_vm_bo_map(entry->adev, entry->bo_va, entry->va, 0,
   amdgpu_bo_size(entry->bo_va->base.bo),
   entry->pte_flags);
if (ret) {
@@ -1013,7 +1020,7 @@ static int map_bo_to_gpuvm(struct amdgpu_device *adev,
if (no_update_pte)
return 0;

-   ret = update_gpuvm_pte(adev, entry, sync);
+   ret = update_gpuvm_pte(mem, entry, sync);
if (ret) {
pr_err("update_gpuvm_pte() failed\n");
goto update_gpuvm_pte_failed;
@@ -1022,7 +1029,7 @@ static int map_bo_to_gpuvm(struct amdgpu_device *adev,
return 0;

 update_gpuvm_pte_failed:
-   unmap_bo_from_gpuvm(adev, entry, sync);
+   unmap_bo_from_gpuvm(mem, entry, sync);
return ret;
 }

@@ -1596,7 +1603,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
pr_debug("\t map VA 0x%llx - 0x%llx in entry %p\n",
 entry->va, entry->va + bo_size, entry);

-   ret = map_bo_to_gpuvm(adev, entry, ctx.sync,
+   ret = map_bo_to_gpuvm(mem, entry, ctx.sync,
  is_invalid_userptr);
if (ret) {
pr_err("Failed to map bo to gpuvm\n");
@@ -1635,7 +1642,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
struct kgd_dev *kgd, struct kgd_mem *mem, void *drm_priv)
 {
-   struct amdgpu_device *adev = get_amdgpu_device(kgd)

Re: [PATCH v2 05/10] drm/amdgpu: Add multi-GPU DMA mapping helpers

2021-04-26 Thread Zeng, Oak
As I understand it, when one GPU maps another GPU's vram, this vram should also
be mapped in the iommu page table. Normal GTT memory (versus userptr) also
needs to be mapped in the iommu, but I don't see that code below. I only see
you map userptrs in the iommu. Maybe you map them in the iommu at some point
other than memory attachment time?

Also see a nit-pick inline

Regards,
Oak 

 

On 2021-04-21, 9:31 PM, "dri-devel on behalf of Felix Kuehling" 
 
wrote:

Add BO-type specific helpers functions to DMA-map and unmap
kfd_mem_attachments. Implement this functionality for userptrs by creating
one SG BO per GPU and filling it with a DMA mapping of the pages from the
original mem->bo.

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   8 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 146 +-
 2 files changed, 145 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index c24b2478f445..63668433f5a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -38,11 +38,17 @@ extern uint64_t amdgpu_amdkfd_total_mem_size;

 struct amdgpu_device;

+enum kfd_mem_attachment_type {
+   KFD_MEM_ATT_SHARED, /* Share kgd_mem->bo or another attachment's */
+   KFD_MEM_ATT_USERPTR,/* SG bo to DMA map pages from a userptr bo */
+};
+
 struct kfd_mem_attachment {
struct list_head list;
+   enum kfd_mem_attachment_type type;
+   bool is_mapped;
struct amdgpu_bo_va *bo_va;
struct amdgpu_device *adev;
-   bool is_mapped;
uint64_t va;
uint64_t pte_flags;
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index fbd7e786b54e..49d1af4aa5f1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -473,12 +473,117 @@ static uint64_t get_pte_flags(struct amdgpu_device 
*adev, struct kgd_mem *mem)
return pte_flags;
 }

+static int
+kfd_mem_dmamap_userptr(struct kgd_mem *mem,
+  struct kfd_mem_attachment *attachment)
+{
+   enum dma_data_direction direction =
+   mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ?
+   DMA_BIDIRECTIONAL : DMA_TO_DEVICE;
+   struct ttm_operation_ctx ctx = {.interruptible = true};
+   struct amdgpu_bo *bo = attachment->bo_va->base.bo;
+   struct amdgpu_device *adev = attachment->adev;
+   struct ttm_tt *src_ttm = mem->bo->tbo.ttm;
+   struct ttm_tt *ttm = bo->tbo.ttm;
+   int ret;
+
+   ttm->sg = kmalloc(sizeof(*ttm->sg), GFP_KERNEL);
+   if (unlikely(!ttm->sg))
+   return -ENOMEM;
+
+   if (WARN_ON(ttm->num_pages != src_ttm->num_pages))
+   return -EINVAL;
+
+   /* Same sequence as in amdgpu_ttm_tt_pin_userptr */
+   ret = sg_alloc_table_from_pages(ttm->sg, src_ttm->pages,
+   ttm->num_pages, 0,
+   (u64)ttm->num_pages << PAGE_SHIFT,
+   GFP_KERNEL);
+   if (unlikely(ret))
+   goto release_sg;
Should this go to a label starting at the kfree below?
+
+   ret = dma_map_sgtable(adev->dev, ttm->sg, direction, 0);
+   if (unlikely(ret))
+   goto release_sg;
+
+   drm_prime_sg_to_dma_addr_array(ttm->sg, ttm->dma_address,
+  ttm->num_pages);
+
+   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
+   ret = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+   if (ret)
+   goto release_sg;
+
+   return 0;
+
+release_sg:
+   pr_err("DMA map userptr failed: %d\n", ret);
+   sg_free_table(ttm->sg);
+   kfree(ttm->sg);
+   ttm->sg = NULL;
+   return ret;
+}
+
+static int
+kfd_mem_dmamap_attachment(struct kgd_mem *mem,
+ struct kfd_mem_attachment *attachment)
+{
+   switch (attachment->type) {
+   case KFD_MEM_ATT_SHARED:
+   return 0;
+   case KFD_MEM_ATT_USERPTR:
+   return kfd_mem_dmamap_userptr(mem, attachment);
+   default:
+   WARN_ON_ONCE(1);
+   }
+   return -EINVAL;
+}
+
+static void
+kfd_mem_dmaunmap_userptr(struct kgd_mem *mem,
+struct kfd_mem_attachment *attachment)
+{
+   enum dma_data_direction direction =
+   mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ?
+   DMA_BIDIRECTIONAL : DMA_TO_DEVICE;
+   struct ttm_operation_ctx ctx = {.interruptible = false};
+   struct amdgpu_bo *bo = attachment->bo_va->base.bo;
+   struct amdgpu_device *adev = attachment->adev;
 

[PATCH 5/5] drm/amdkfd: enable subsequent retry fault

2021-04-26 Thread Philip Yang
After draining a stale retry fault, or after failing to validate the range
to recover, the fault address has to be removed from the fault filter ring
so that a subsequent retry interrupt on the same address can be handled.
Otherwise the retry fault will not be processed for recovery until the
filter timeout has passed.

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 00d759b257f4..d9111fea724b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2363,8 +2363,10 @@ svm_range_restore_pages(struct amdgpu_device *adev, 
unsigned int pasid,
 
mutex_lock(&prange->migrate_mutex);
 
-   if (svm_range_skip_recover(prange))
+   if (svm_range_skip_recover(prange)) {
+   amdgpu_gmc_filter_faults_remove(adev, addr, pasid);
goto out_unlock_range;
+   }
 
timestamp = ktime_to_us(ktime_get()) - prange->validate_timestamp;
/* skip duplicate vm fault on different pages of same range */
@@ -2426,6 +2428,7 @@ svm_range_restore_pages(struct amdgpu_device *adev, 
unsigned int pasid,
 
if (r == -EAGAIN) {
pr_debug("recover vm fault later\n");
+   amdgpu_gmc_filter_faults_remove(adev, addr, pasid);
r = 0;
}
return r;
-- 
2.17.1



[PATCH v4 4/5] drm/amdgpu: address remove from fault filter

2021-04-26 Thread Philip Yang
Add an interface to remove an address from the fault filter ring by
resetting the fault ring entry key, so that a future vm fault on that
address will be processed for recovery.

Define the fault key as atomic64_t and use atomic read/set/cmpxchg on the
key to protect fault ring access from the interrupt handler and the
deferred interrupt work on vg20. Change fault->timestamp to 56 bits so it
shares the same uint64_t with the 8-bit fault->next; that is still big
enough for the 48-bit IH timestamp.

Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 54 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h |  6 ++-
 2 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index c39ed9eb0987..888b749bd75e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -332,6 +332,17 @@ void amdgpu_gmc_agp_location(struct amdgpu_device *adev, 
struct amdgpu_gmc *mc)
mc->agp_size >> 20, mc->agp_start, mc->agp_end);
 }
 
+/**
+ * amdgpu_gmc_fault_key - get hash key from vm fault address and pasid
+ *
+ * @addr: 48bit physical address
+ * @pasid: 4 bit
+ */
+static inline uint64_t amdgpu_gmc_fault_key(uint64_t addr, uint16_t pasid)
+{
+   return addr << 4 | pasid;
+}
+
 /**
  * amdgpu_gmc_filter_faults - filter VM faults
  *
@@ -349,13 +360,14 @@ bool amdgpu_gmc_filter_faults(struct amdgpu_device *adev, 
uint64_t addr,
 {
struct amdgpu_gmc *gmc = &adev->gmc;
 
-   uint64_t stamp, key = addr << 4 | pasid;
+   uint64_t stamp, key = amdgpu_gmc_fault_key(addr, pasid);
struct amdgpu_gmc_fault *fault;
uint32_t hash;
 
/* If we don't have space left in the ring buffer return immediately */
stamp = max(timestamp, AMDGPU_GMC_FAULT_TIMEOUT + 1) -
AMDGPU_GMC_FAULT_TIMEOUT;
+
if (gmc->fault_ring[gmc->last_fault].timestamp >= stamp)
return true;
 
@@ -365,7 +377,7 @@ bool amdgpu_gmc_filter_faults(struct amdgpu_device *adev, 
uint64_t addr,
while (fault->timestamp >= stamp) {
uint64_t tmp;
 
-   if (fault->key == key)
+   if (atomic64_read(&fault->key) == key)
return true;
 
tmp = fault->timestamp;
@@ -378,7 +390,7 @@ bool amdgpu_gmc_filter_faults(struct amdgpu_device *adev, 
uint64_t addr,
 
/* Add the fault to the ring */
fault = &gmc->fault_ring[gmc->last_fault];
-   fault->key = key;
+   atomic64_set(&fault->key, key);
fault->timestamp = timestamp;
 
/* And update the hash */
@@ -387,6 +399,42 @@ bool amdgpu_gmc_filter_faults(struct amdgpu_device *adev, 
uint64_t addr,
return false;
 }
 
+/**
+ * amdgpu_gmc_filter_faults_remove - remove address from VM faults filter
+ *
+ * @adev: amdgpu device structure
+ * @addr: address of the VM fault
+ * @pasid: PASID of the process causing the fault
+ *
+ * Remove the address from the fault filter, so that a future vm fault on this
+ * address will be passed to the retry fault handler to recover.
+ */
+void amdgpu_gmc_filter_faults_remove(struct amdgpu_device *adev, uint64_t addr,
+uint16_t pasid)
+{
+   struct amdgpu_gmc *gmc = &adev->gmc;
+
+   uint64_t key = amdgpu_gmc_fault_key(addr, pasid);
+   struct amdgpu_gmc_fault *fault;
+   uint32_t hash;
+
+   hash = hash_64(key, AMDGPU_GMC_FAULT_HASH_ORDER);
+   fault = &gmc->fault_ring[gmc->fault_hash[hash].idx];
+   while (true) {
+   uint64_t tmp;
+
+   if (atomic64_cmpxchg(&fault->key, key, 0) == key)
+   break;
+
+   tmp = fault->timestamp;
+   fault = &gmc->fault_ring[fault->next];
+
+   /* Check if the entry was reused */
+   if (fault->timestamp >= tmp)
+   break;
+   }
+}
+
 int amdgpu_gmc_ras_late_init(struct amdgpu_device *adev)
 {
int r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index 9d11c02a3938..95e18ef83aec 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -66,9 +66,9 @@ struct firmware;
  * GMC page fault information
  */
 struct amdgpu_gmc_fault {
-   uint64_ttimestamp;
+   uint64_ttimestamp:56;
uint64_tnext:AMDGPU_GMC_FAULT_RING_ORDER;
-   uint64_tkey:52;
+   atomic64_t  key;
 };
 
 /*
@@ -318,6 +318,8 @@ void amdgpu_gmc_agp_location(struct amdgpu_device *adev,
 struct amdgpu_gmc *mc);
 bool amdgpu_gmc_filter_faults(struct amdgpu_device *adev, uint64_t addr,
  uint16_t pasid, uint64_t timestamp);
+void amdgpu_gmc_filter_faults_remove(struct amdgpu_device *adev, uint64_t addr,
+uint16_t pasid);
 int amdgpu_gmc_ras_late_init(struct amdgpu_device *adev);
 v

Crash on device remove in drm_mode_config_cleanup

2021-04-26 Thread Andrey Grodzovsky
Daniel, Harry and Nick - on latest drm-misc-next (5.12-rc3), while I am
testing my device unplug patches, a user testing with an eGPU box reported
a crash on unplug. I debugged it a bit myself and I see that
drm_mode_config_cleanup is called twice - once explicitly from the display
shutdown code and once as a callback from drm_managed_release. Obviously
there is a problem here. What's the best way to fix this?


root@andrey-test:~# echo 1 > 
/sys/bus/pci/drivers/amdgpu/\:05\:00.0/remove
[   37.068698 <3.923109>] amdgpu :05:00.0: amdgpu: amdgpu: 
finishing device.
[   37.081385 <0.012687>] CPU: 1 PID: 2397 Comm: bash Tainted: G 
B   W  OE 5.12.0-rc3-drm-misc-next+ #3
[   37.081397 <0.12>] Hardware name: ASUS System Product 
Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1004 08/13/2020

[   37.081402 <0.05>] Call Trace:
[   37.081407 <0.05>]  dump_stack+0xa5/0xe6
[   37.081419 <0.12>]  drm_mode_config_cleanup.cold+0x5/0x4f [drm]
[   37.081555 <0.000136>]  ? drm_mode_config_reset+0x220/0x220 [drm]
[   37.081689 <0.000134>]  ? kfree+0xf3/0x3c0
[   37.081699 <0.10>]  amdgpu_dm_fini+0x73/0x230 [amdgpu]
[   37.082541 <0.000842>]  dm_hw_fini+0x1e/0x30 [amdgpu]
[   37.083404 <0.000863>]  amdgpu_device_fini_hw+0x38f/0x660 [amdgpu]
[   37.084030 <0.000626>]  amdgpu_pci_remove+0x40/0x60 [amdgpu]
[   37.084524 <0.000494>]  pci_device_remove+0x82/0x120
[   37.084531 <0.07>]  device_release_driver_internal+0x17b/0x2a0
[   37.084537 <0.06>]  ? sysfs_file_ops+0xa0/0xa0
[   37.084541 <0.04>]  pci_stop_bus_device+0xd5/0x100
[   37.084547 <0.06>] 
pci_stop_and_remove_bus_device_locked+0x16/0x30

[   37.084552 <0.05>]  remove_store+0xe7/0x100
[   37.084557 <0.05>]  ? subordinate_bus_number_show+0xc0/0xc0
[   37.084563 <0.06>]  ? __check_object_size+0x16b/0x480
[   37.084572 <0.09>]  ? sysfs_file_ops+0x76/0xa0
[   37.084577 <0.05>]  ? sysfs_kf_write+0x83/0xe0
[   37.084582 <0.05>]  kernfs_fop_write_iter+0x1ef/0x290
[   37.084587 <0.05>]  new_sync_write+0x253/0x370
[   37.084591 <0.04>]  ? new_sync_read+0x360/0x360
[   37.084596 <0.05>]  ? lockdep_hardirqs_on_prepare+0x210/0x210
[   37.084603 <0.07>]  ? __cond_resched+0x15/0x30
[   37.084608 <0.05>]  ? __inode_security_revalidate+0xa2/0xb0
[   37.084614 <0.06>]  ? __might_sleep+0x45/0xf0
[   37.084620 <0.06>]  vfs_write+0x3d7/0x4e0
[   37.084624 <0.04>]  ? ksys_write+0xe6/0x1a0
[   37.084629 <0.05>]  ksys_write+0xe6/0x1a0
[   37.084633 <0.04>]  ? __ia32_sys_read+0x60/0x60
[   37.084638 <0.05>]  ? lockdep_hardirqs_on_prepare+0xe/0x210
[   37.084643 <0.05>]  ? syscall_enter_from_user_mode+0x27/0x70
[   37.084648 <0.05>]  do_syscall_64+0x33/0x80
[   37.084653 <0.05>]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   37.084658 <0.05>] RIP: 0033:0x7f576c3e01e7
[   37.084663 <0.05>] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 
0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 
01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 
18 48 89 74 24
[   37.084667 <0.04>] RSP: 002b:7ffcf7b05948 EFLAGS: 
0246 ORIG_RAX: 0001
[   37.084672 <0.05>] RAX: ffda RBX: 
0002 RCX: 7f576c3e01e7
[   37.084675 <0.03>] RDX: 0002 RSI: 
5568ffe63d80 RDI: 0001
[   37.084678 <0.03>] RBP: 5568ffe63d80 R08: 
000a R09: 0001
[   37.084681 <0.03>] R10: 5568ff9f3017 R11: 
0246 R12: 0002
[   37.084684 <0.03>] R13: 7f576c4bb6a0 R14: 
7f576c4bc4a0 R15: 7f576c4bb8a0
[   37.400338 <0.315654>] amdgpu :05:00.0: 
[drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test 
failed (-110)

[   37.401171 <0.000833>] [drm] free PSP TMR buffer
[   37.443240 <0.042069>] [drm] amdgpu: ttm finalized
[   37.443246 <0.06>] x86/PAT: bash:2397 freeing invalid memtype 
[mem 0xd000-0xdfff]
[   37.443945 <0.000699>] CPU: 3 PID: 2397 Comm: bash Tainted: G 
B   W  OE 5.12.0-rc3-drm-misc-next+ #3
[   37.443952 <0.07>] Hardware name: ASUS System Product 
Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1004 08/13/2020

[   37.443956 <0.04>] Call Trace:
[   37.443959 <0.03>]  dump_stack+0xa5/0xe6
[   37.443967 <0.08>]  drm_mode_config_cleanup.cold+0x5/0x4f [drm]
[   37.444048 <0.81>]  ? drm_mode_config_reset+0x220/0x220 [drm]
[   37.444129 <0.81>]  ? drm_mode_config_cleanup+0x430/0x430 [drm]
[   37.444208 <0.79>]  drm_managed_release+0xf2/0x1c0 [drm]
[   37.444287 <0.79>]  drm_dev_release+0x4d/0x80 [drm]
[   37.444363 <0.76>]  release_nodes+0x373/0x3e0
[   37.444371 <0.08>]  ? devres_close_group+0x150/0x150
[   37.444376 <
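
For reference, a hypothetical illustration of the double call the trace points
at (sketch_dm_fini is made up, not the actual amdgpu code):

/* Hypothetical sketch: if the mode config was set up with
 * drmm_mode_config_init(), a managed release action already runs
 * drm_mode_config_cleanup() when the drm_device is released
 * (drm_dev_release() -> drm_managed_release()), so an explicit call in
 * the driver's teardown path makes the cleanup run twice.
 */
static void sketch_dm_fini(struct drm_device *ddev)
{
        drm_mode_config_cleanup(ddev);  /* 1st call: explicit, display shutdown */
        /* 2nd call happens later via drm_managed_release() on the final
         * drm_dev_put().
         */
}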

Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track

2021-04-26 Thread Deucher, Alexander
[AMD Official Use Only - Internal Distribution Only]

Fair point.  Either way works for me.

Alex

From: Christian König 
Sent: Monday, April 26, 2021 3:48 PM
To: Deucher, Alexander 
Cc: amd-gfx list ; Sun, Roy ; 
Nieto, David M 
Subject: Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track

My concern is more to get this tested by more people than just AMD.

Christian.

On 26.04.21 at 21:40, Deucher, Alexander wrote:

[AMD Official Use Only - Internal Distribution Only]

That said, it would be easier for me to merge through the AMD tree since a 
relatively big AMD feature depends on it.  Not sure how much conflict potential 
there is if this goes through the AMD tree.

Alex


From: amd-gfx 

 on behalf of Deucher, Alexander 

Sent: Monday, April 26, 2021 3:24 PM
To: Christian König 

Cc: amd-gfx list 
; Sun, Roy 
; Nieto, David M 

Subject: Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track


[AMD Official Use Only - Internal Distribution Only]



No objections from me.

Thanks!

Alex


From: Christian König 

Sent: Monday, April 26, 2021 2:49 AM
To: Deucher, Alexander 

Cc: Nieto, David M ; Sun, Roy 
; amd-gfx list 

Subject: Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track

Hey Alex,

any objections that we merge those two patches through drm-misc-next?

Thanks,
Christian.

On 26.04.21 at 08:27, Roy Sun wrote:
> Update the timestamp of scheduled fence on HW
> completion of the previous fences
>
> This allows more accurate tracking of the fence
> execution in HW
>
> Signed-off-by: David M Nieto 
> Signed-off-by: Roy Sun 
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 12 ++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 92d8de24d0a1..f8e39ab0c41b 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -515,7 +515,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler 
> *sched)
>   EXPORT_SYMBOL(drm_sched_resubmit_jobs);
>
>   /**
> - * drm_sched_resubmit_jobs_ext - helper to relunch certain number of jobs 
> from mirror ring list
> + * drm_sched_resubmit_jobs_ext - helper to relaunch certain number of jobs 
> from pending list
>*
>* @sched: scheduler instance
>* @max: job numbers to relaunch
> @@ -671,7 +671,7 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
>   static struct drm_sched_job *
>   drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>   {
> - struct drm_sched_job *job;
> + struct drm_sched_job *job, *next;
>
>/*
> * Don't destroy jobs while the timeout worker is running  OR thread
> @@ -690,6 +690,14 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler 
> *sched)
>if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>/* remove job from pending_list */
>list_del_init(&job->list);
> We just need to record the scheduled time of the next job, so we
> do not need to check the remaining jobs.
> + /* account for the next fence in the queue */
> + next = list_first_entry_or_null(&sched->pending_list,
> + struct drm_sched_job, list);
> + if (next && test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
> + &job->s_fence->finished.flags)) {
> + next->s_fence->scheduled.timestamp =
> + job->s_fence->finished.timestamp;
> + }
>} else {
>job = NULL;
>/* queue timeout for next job */




Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track

2021-04-26 Thread Christian König

My concern is more to get this tested by more people than just AMD.

Christian.

On 26.04.21 at 21:40, Deucher, Alexander wrote:


[AMD Official Use Only - Internal Distribution Only]


That said, it would be easier for me to merge through the AMD tree 
since a relatively big AMD feature depends on it.  Not sure how much 
conflict potential there is if this goes through the AMD tree.


Alex


*From:* amd-gfx  on behalf of 
Deucher, Alexander 

*Sent:* Monday, April 26, 2021 3:24 PM
*To:* Christian König 
*Cc:* amd-gfx list ; Sun, Roy 
; Nieto, David M 

*Subject:* Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track

[AMD Official Use Only - Internal Distribution Only]




No objections from me.

Thanks!

Alex


*From:* Christian König 
*Sent:* Monday, April 26, 2021 2:49 AM
*To:* Deucher, Alexander 
*Cc:* Nieto, David M ; Sun, Roy 
; amd-gfx list 

*Subject:* Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track
Hey Alex,

any objections that we merge those two patches through drm-misc-next?

Thanks,
Christian.

On 26.04.21 at 08:27, Roy Sun wrote:
> Update the timestamp of scheduled fence on HW
> completion of the previous fences
>
> This allows more accurate tracking of the fence
> execution in HW
>
> Signed-off-by: David M Nieto 
> Signed-off-by: Roy Sun 
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 12 ++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c

> index 92d8de24d0a1..f8e39ab0c41b 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -515,7 +515,7 @@ void drm_sched_resubmit_jobs(struct 
drm_gpu_scheduler *sched)

>   EXPORT_SYMBOL(drm_sched_resubmit_jobs);
>
>   /**
> - * drm_sched_resubmit_jobs_ext - helper to relunch certain number 
of jobs from mirror ring list
> + * drm_sched_resubmit_jobs_ext - helper to relaunch certain number 
of jobs from pending list

>    *
>    * @sched: scheduler instance
>    * @max: job numbers to relaunch
> @@ -671,7 +671,7 @@ drm_sched_select_entity(struct drm_gpu_scheduler 
*sched)

>   static struct drm_sched_job *
>   drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>   {
> - struct drm_sched_job *job;
> + struct drm_sched_job *job, *next;
>
>    /*
> * Don't destroy jobs while the timeout worker is running  OR 
thread
> @@ -690,6 +690,14 @@ drm_sched_get_cleanup_job(struct 
drm_gpu_scheduler *sched)

>    if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>    /* remove job from pending_list */
> list_del_init(&job->list);
> We just need to record the scheduled time of the next job, so we
> do not need to check the remaining jobs.
> + /* account for the next fence in the queue */
> + next = list_first_entry_or_null(&sched->pending_list,
> + struct drm_sched_job, list);
> + if (next && test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
> + &job->s_fence->finished.flags)) {
> + next->s_fence->scheduled.timestamp =
> + job->s_fence->finished.timestamp;
> + }
>    } else {
>    job = NULL;
>    /* queue timeout for next job */





[PATCH] drm/amdkfd: fix spelling mistake in packet manager

2021-04-26 Thread Jonathan Kim
The plural of 'process' should be 'processes'.

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
index 0ce507d7208a..f688451cb299 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
@@ -124,14 +124,14 @@ static int pm_create_runlist_ib(struct packet_manager *pm,
 {
unsigned int alloc_size_bytes;
unsigned int *rl_buffer, rl_wptr, i;
-   int retval, proccesses_mapped;
+   int retval, processes_mapped;
struct device_process_node *cur;
struct qcm_process_device *qpd;
struct queue *q;
struct kernel_queue *kq;
bool is_over_subscription;
 
-   rl_wptr = retval = proccesses_mapped = 0;
+   rl_wptr = retval = processes_mapped = 0;
 
retval = pm_allocate_runlist_ib(pm, &rl_buffer, rl_gpu_addr,
&alloc_size_bytes, &is_over_subscription);
@@ -148,7 +148,7 @@ static int pm_create_runlist_ib(struct packet_manager *pm,
list_for_each_entry(cur, queues, list) {
qpd = cur->qpd;
/* build map process packet */
-   if (proccesses_mapped >= pm->dqm->processes_count) {
+   if (processes_mapped >= pm->dqm->processes_count) {
pr_debug("Not enough space left in runlist IB\n");
pm_release_ib(pm);
return -ENOMEM;
@@ -158,7 +158,7 @@ static int pm_create_runlist_ib(struct packet_manager *pm,
if (retval)
return retval;
 
-   proccesses_mapped++;
+   processes_mapped++;
inc_wptr(&rl_wptr, pm->pmf->map_process_size,
alloc_size_bytes);
 
-- 
2.17.1



Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track

2021-04-26 Thread Deucher, Alexander
[AMD Official Use Only - Internal Distribution Only]

That said, it would be easier for me to merge through the AMD tree since a 
relatively big AMD feature depends on it.  Not sure how much conflict potential 
there is if this goes through the AMD tree.

Alex


From: amd-gfx  on behalf of Deucher, 
Alexander 
Sent: Monday, April 26, 2021 3:24 PM
To: Christian König 
Cc: amd-gfx list ; Sun, Roy ; 
Nieto, David M 
Subject: Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track


[AMD Official Use Only - Internal Distribution Only]

No objections from me.

Thanks!

Alex


From: Christian König 
Sent: Monday, April 26, 2021 2:49 AM
To: Deucher, Alexander 
Cc: Nieto, David M ; Sun, Roy ; amd-gfx 
list 
Subject: Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track

Hey Alex,

any objections that we merge those two patches through drm-misc-next?

Thanks,
Christian.

On 26.04.21 at 08:27, Roy Sun wrote:
> Update the timestamp of scheduled fence on HW
> completion of the previous fences
>
> This allows more accurate tracking of the fence
> execution in HW
>
> Signed-off-by: David M Nieto 
> Signed-off-by: Roy Sun 
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 12 ++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 92d8de24d0a1..f8e39ab0c41b 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -515,7 +515,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler 
> *sched)
>   EXPORT_SYMBOL(drm_sched_resubmit_jobs);
>
>   /**
> - * drm_sched_resubmit_jobs_ext - helper to relunch certain number of jobs 
> from mirror ring list
> + * drm_sched_resubmit_jobs_ext - helper to relaunch certain number of jobs 
> from pending list
>*
>* @sched: scheduler instance
>* @max: job numbers to relaunch
> @@ -671,7 +671,7 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
>   static struct drm_sched_job *
>   drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>   {
> - struct drm_sched_job *job;
> + struct drm_sched_job *job, *next;
>
>/*
> * Don't destroy jobs while the timeout worker is running  OR thread
> @@ -690,6 +690,14 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler 
> *sched)
>if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>/* remove job from pending_list */
>list_del_init(&job->list);
> We just need to record the scheduled time of the next job, so we
> do not need to check the remaining jobs.
> + /* account for the next fence in the queue */
> + next = list_first_entry_or_null(&sched->pending_list,
> + struct drm_sched_job, list);
> + if (next && test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
> + &job->s_fence->finished.flags)) {
> + next->s_fence->scheduled.timestamp =
> + job->s_fence->finished.timestamp;
> + }
>} else {
>job = NULL;
>/* queue timeout for next job */



Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track

2021-04-26 Thread Deucher, Alexander
[AMD Official Use Only - Internal Distribution Only]

No objections from me.

Thanks!

Alex


From: Christian König 
Sent: Monday, April 26, 2021 2:49 AM
To: Deucher, Alexander 
Cc: Nieto, David M ; Sun, Roy ; amd-gfx 
list 
Subject: Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track

Hey Alex,

any objections that we merge those two patches through drm-misc-next?

Thanks,
Christian.

On 26.04.21 at 08:27, Roy Sun wrote:
> Update the timestamp of scheduled fence on HW
> completion of the previous fences
>
> This allows more accurate tracking of the fence
> execution in HW
>
> Signed-off-by: David M Nieto 
> Signed-off-by: Roy Sun 
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 12 ++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 92d8de24d0a1..f8e39ab0c41b 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -515,7 +515,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler 
> *sched)
>   EXPORT_SYMBOL(drm_sched_resubmit_jobs);
>
>   /**
> - * drm_sched_resubmit_jobs_ext - helper to relunch certain number of jobs 
> from mirror ring list
> + * drm_sched_resubmit_jobs_ext - helper to relaunch certain number of jobs 
> from pending list
>*
>* @sched: scheduler instance
>* @max: job numbers to relaunch
> @@ -671,7 +671,7 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
>   static struct drm_sched_job *
>   drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>   {
> - struct drm_sched_job *job;
> + struct drm_sched_job *job, *next;
>
>/*
> * Don't destroy jobs while the timeout worker is running  OR thread
> @@ -690,6 +690,14 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler 
> *sched)
>if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>/* remove job from pending_list */
>list_del_init(&job->list);
> We just need to record the scheduled time of the next job, so we
> do not need to check the remaining jobs.
> + /* account for the next fence in the queue */
> + next = list_first_entry_or_null(&sched->pending_list,
> + struct drm_sched_job, list);
> + if (next && test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
> + &job->s_fence->finished.flags)) {
> + next->s_fence->scheduled.timestamp =
> + job->s_fence->finished.timestamp;
> + }
>} else {
>job = NULL;
>/* queue timeout for next job */



Re: [RFC PATCH 1/3] drm/color: Add RGB Color encodings

2021-04-26 Thread Ville Syrjälä
On Mon, Apr 26, 2021 at 02:56:26PM -0400, Harry Wentland wrote:
> On 2021-04-26 2:07 p.m., Ville Syrjälä wrote:
> > On Mon, Apr 26, 2021 at 01:38:50PM -0400, Harry Wentland wrote:
> >> From: Bhawanpreet Lakha 
> >>
> >> Add the following color encodings
> >> - RGB versions for BT601, BT709, BT2020
> >> - DCI-P3: Used for digital movies
> >>
> >> Signed-off-by: Bhawanpreet Lakha 
> >> Signed-off-by: Harry Wentland 
> >> ---
> >>   drivers/gpu/drm/drm_color_mgmt.c | 4 
> >>   include/drm/drm_color_mgmt.h | 4 
> >>   2 files changed, 8 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/drm_color_mgmt.c 
> >> b/drivers/gpu/drm/drm_color_mgmt.c
> >> index bb14f488c8f6..a183ebae2941 100644
> >> --- a/drivers/gpu/drm/drm_color_mgmt.c
> >> +++ b/drivers/gpu/drm/drm_color_mgmt.c
> >> @@ -469,6 +469,10 @@ static const char * const color_encoding_name[] = {
> >>[DRM_COLOR_YCBCR_BT601] = "ITU-R BT.601 YCbCr",
> >>[DRM_COLOR_YCBCR_BT709] = "ITU-R BT.709 YCbCr",
> >>[DRM_COLOR_YCBCR_BT2020] = "ITU-R BT.2020 YCbCr",
> >> +  [DRM_COLOR_RGB_BT601] = "ITU-R BT.601 RGB",
> >> +  [DRM_COLOR_RGB_BT709] = "ITU-R BT.709 RGB",
> >> +  [DRM_COLOR_RGB_BT2020] = "ITU-R BT.2020 RGB",
> >> +  [DRM_COLOR_P3] = "DCI-P3",
> > 
> > These are a totally different thing than the YCbCr stuff.
> > The YCbCr stuff just specifies the YCbCr<->RGB converison matrix,
> > whereas these are I guess supposed to specify the primaries/whitepoint?
> > But without specifying what we're converting *to* these mean absolutely
> > nothing. Ie. I don't think they belong in this property.
> > 
> 
> If this is the intention I don't see it documented.
> 
> I might have overlooked something but do we have a way to explicitly 
> specify today what *to* format the YCbCr color encodings convert into? 

These just specify which YCbCr<->RGB matrix to use as specified
in the relevant standards. The primaries/whitepoint/etc. don't
change at all.
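
To make that concrete, the full-range BT.709 case boils down to a fixed
matrix like the one below (illustration only, coefficients taken from the
standard, not from any patch in this series):

static void ycbcr_bt709_full_to_rgb(double y, double cb, double cr,
				    double *r, double *g, double *b)
{
	/* y in [0,1], cb/cr centered around zero in [-0.5,0.5] */
	*r = y + 1.5748 * cr;
	*g = y - 0.18732 * cb - 0.46812 * cr;
	*b = y + 1.8556 * cb;
}

Swapping in the BT.601/BT.2020 constants (plus the limited/full range
scaling) is the entire effect of the existing enum; the primaries stay
whatever they were.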

> Would that be a combination of the output color encoding specified via 
> colorspace_property and the color space encoded in the primaries and 
> whitepoint of the hdr_output_metadata?

Those properties only affect the infoframes. They don't apply any
color processing to the data.

> 
> Fundamentally I don't see how the use of this property differs, whether 
> you translate from YCbCr or from RGB. In either case you're converting 
> from the defined input color space and pixel format to an output color 
> space and pixel format.

The gamut does not change when you do YCbCr<->RGB conversion.

> 
> > The previous proposals around this topic have suggested a new
> > property to specify a conversion matrix either explicitly, or
> > via a separate enum (which would specify both the src and dst
> > colorspaces). I've always argued the enum approach is needed
> > anyway since not all hardware has a programmable matrix for
> > this. But a fully programmable matrix could be nice for tone
> > mapping purposes/etc, so we may want to make sure both are
> > possible.
> > 
> > As for the transfer func, the proposals so far have mostly just
> > been to expose a programmable degamma/gamma LUTs for each plane.
> > But considering how poor the current gamma uapi is we've thrown
> > around some ideas how to allow the kernel to properly expose the
> > hw capabilities. This is one of those ideas:
> > https://lists.freedesktop.org/archives/dri-devel/2019-April/212886.html
> > I think that basic idea could also be extended to allow fixed
> > curves in case the hw doesn't have a fully programmable LUT. But
> > dunno if that's relevant for your hw.
> > 
> 
> The problem with exposing gamma, whether per-plane or per-crtc, is that 
> it is hard to define an API that works for all the HW out there. The 
> capabilities for different HW differ a lot, not just between vendors but 
> also between generations of a vendor's HW.
> 
> Another reason I'm proposing to define the color space (and gamma) of a 
> plane is to make this explicit. Up until the color space and gamma of a 
> plane or framebuffer are not well defined, which leads to drivers 
> assuming the color space and gamma of a buffer (for blending and other 
> purposes) and might lead to sub-optimal outcomes.

The current state is that things just get passed through as is
(apart from the crtc LUTs/CTM).

-- 
Ville Syrjälä
Intel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC PATCH 1/3] drm/color: Add RGB Color encodings

2021-04-26 Thread Harry Wentland

On 2021-04-26 2:07 p.m., Ville Syrjälä wrote:

On Mon, Apr 26, 2021 at 01:38:50PM -0400, Harry Wentland wrote:

From: Bhawanpreet Lakha 

Add the following color encodings
- RGB versions for BT601, BT709, BT2020
- DCI-P3: Used for digital movies

Signed-off-by: Bhawanpreet Lakha 
Signed-off-by: Harry Wentland 
---
  drivers/gpu/drm/drm_color_mgmt.c | 4 
  include/drm/drm_color_mgmt.h | 4 
  2 files changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/drm_color_mgmt.c b/drivers/gpu/drm/drm_color_mgmt.c
index bb14f488c8f6..a183ebae2941 100644
--- a/drivers/gpu/drm/drm_color_mgmt.c
+++ b/drivers/gpu/drm/drm_color_mgmt.c
@@ -469,6 +469,10 @@ static const char * const color_encoding_name[] = {
[DRM_COLOR_YCBCR_BT601] = "ITU-R BT.601 YCbCr",
[DRM_COLOR_YCBCR_BT709] = "ITU-R BT.709 YCbCr",
[DRM_COLOR_YCBCR_BT2020] = "ITU-R BT.2020 YCbCr",
+   [DRM_COLOR_RGB_BT601] = "ITU-R BT.601 RGB",
+   [DRM_COLOR_RGB_BT709] = "ITU-R BT.709 RGB",
+   [DRM_COLOR_RGB_BT2020] = "ITU-R BT.2020 RGB",
+   [DRM_COLOR_P3] = "DCI-P3",


These are a totally different thing than the YCbCr stuff.
The YCbCr stuff just specifies the YCbCr<->RGB converison matrix,
whereas these are I guess supposed to specify the primaries/whitepoint?
But without specifying what we're converting *to* these mean absolutely
nothing. Ie. I don't think they belong in this property.



If this is the intention I don't see it documented.

I might have overlooked something but do we have a way to explicitly 
specify today what *to* format the YCbCr color encodings convert into? 
Would that be a combination of the output color encoding specified via 
colorspace_property and the color space encoded in the primaries and 
whitepoint of the hdr_output_metadata?


Fundamentally I don't see how the use of this property differs, whether 
you translate from YCbCr or from RGB. In either case you're converting 
from the defined input color space and pixel format to an output color 
space and pixel format.



The previous proposals around this topic have suggested a new
property to specify a conversion matrix either explicitly, or
via a separate enum (which would specify both the src and dst
colorspaces). I've always argued the enum approach is needed
anyway since not all hardware has a programmable matrix for
this. But a fully programmable matrix could be nice for tone
mapping purposes/etc, so we may want to make sure both are
possible.

As for the transfer func, the proposals so far have mostly just
been to expose a programmable degamma/gamma LUTs for each plane.
But considering how poor the current gamma uapi is we've thrown
around some ideas how to allow the kernel to properly expose the
hw capabilities. This is one of those ideas:
https://lists.freedesktop.org/archives/dri-devel/2019-April/212886.html
I think that basic idea could also be extended to allow fixed
curves in case the hw doesn't have a fully programmable LUT. But
dunno if that's relevant for your hw.



The problem with exposing gamma, whether per-plane or per-crtc, is that 
it is hard to define an API that works for all the HW out there. The 
capabilities for different HW differ a lot, not just between vendors but 
also between generations of a vendor's HW.


Another reason I'm proposing to define the color space (and gamma) of a 
plane is to make this explicit. Up until now, the color space and gamma of a 
plane or framebuffer have not been well defined, which leads to drivers 
assuming the color space and gamma of a buffer (for blending and other 
purposes) and might lead to sub-optimal outcomes.


Harry

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH][next] drm/amdkfd: Fix spelling mistake "unregisterd" -> "unregistered"

2021-04-26 Thread Felix Kuehling
On 2021-04-26 8:15, Nirmoy wrote:
> Reviewed-by: Nirmoy Das 
>
> On 4/26/21 2:13 PM, Colin King wrote:
>> From: Colin Ian King 
>>
>> There is a spelling mistake in a pr_debug message. Fix it.
>>
>> Signed-off-by: Colin Ian King 

Applied to amd-staging-drm-next.

Thanks,
  Felix


>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> index 4cc2539bed5b..e4ce97ab6e26 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> @@ -2286,7 +2286,7 @@ svm_range_restore_pages(struct amdgpu_device
>> *adev, unsigned int pasid,
>>   }
>>   prange = svm_range_create_unregistered_range(adev, p, mm,
>> addr);
>>   if (!prange) {
>> -    pr_debug("failed to create unregisterd range svms 0x%p
>> address [0x%llx]\n",
>> +    pr_debug("failed to create unregistered range svms 0x%p
>> address [0x%llx]\n",
>>    svms, addr);
>>   mmap_write_downgrade(mm);
>>   r = -EFAULT;
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdkfd: fix double free device pgmap resource

2021-04-26 Thread Felix Kuehling
On 2021-04-26 at 2:41 p.m., Philip Yang wrote:
> Use devm_memunmap_pages instead of memunmap_pages to release pgmap
> and remove pgmap from device action, to avoid a double free of pgmap when
> unloading the driver module.
>
> Release the device memory region if creating the device memory pages
> structure fails.
>
> Signed-off-by: Philip Yang 

Reviewed-by: Felix Kuehling 


> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> index a66b67083d83..6b810863f6ba 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> @@ -912,6 +912,8 @@ int svm_migrate_init(struct amdgpu_device *adev)
>   r = devm_memremap_pages(adev->dev, pgmap);
>   if (IS_ERR(r)) {
>   pr_err("failed to register HMM device memory\n");
> + devm_release_mem_region(adev->dev, res->start,
> + res->end - res->start + 1);
>   return PTR_ERR(r);
>   }
>  
> @@ -927,5 +929,9 @@ int svm_migrate_init(struct amdgpu_device *adev)
>  
>  void svm_migrate_fini(struct amdgpu_device *adev)
>  {
> - memunmap_pages(&adev->kfd.dev->pgmap);
> + struct dev_pagemap *pgmap = &adev->kfd.dev->pgmap;
> +
> + devm_memunmap_pages(adev->dev, pgmap);
> + devm_release_mem_region(adev->dev, pgmap->range.start,
> + pgmap->range.end - pgmap->range.start + 1);
>  }
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdkfd: fix double free device pgmap resource

2021-04-26 Thread Philip Yang
Use devm_memunmap_pages instead of memunmap_pages to release pgmap
and remove pgmap from device action, to avoid a double free of pgmap when
unloading the driver module.

Release the device memory region if creating the device memory pages
structure fails.

Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index a66b67083d83..6b810863f6ba 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -912,6 +912,8 @@ int svm_migrate_init(struct amdgpu_device *adev)
r = devm_memremap_pages(adev->dev, pgmap);
if (IS_ERR(r)) {
pr_err("failed to register HMM device memory\n");
+   devm_release_mem_region(adev->dev, res->start,
+   res->end - res->start + 1);
return PTR_ERR(r);
}
 
@@ -927,5 +929,9 @@ int svm_migrate_init(struct amdgpu_device *adev)
 
 void svm_migrate_fini(struct amdgpu_device *adev)
 {
-   memunmap_pages(&adev->kfd.dev->pgmap);
+   struct dev_pagemap *pgmap = &adev->kfd.dev->pgmap;
+
+   devm_memunmap_pages(adev->dev, pgmap);
+   devm_release_mem_region(adev->dev, pgmap->range.start,
+   pgmap->range.end - pgmap->range.start + 1);
 }
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC PATCH 1/3] drm/color: Add RGB Color encodings

2021-04-26 Thread Ville Syrjälä
On Mon, Apr 26, 2021 at 01:38:50PM -0400, Harry Wentland wrote:
> From: Bhawanpreet Lakha 
> 
> Add the following color encodings
> - RGB versions for BT601, BT709, BT2020
> - DCI-P3: Used for digital movies
> 
> Signed-off-by: Bhawanpreet Lakha 
> Signed-off-by: Harry Wentland 
> ---
>  drivers/gpu/drm/drm_color_mgmt.c | 4 
>  include/drm/drm_color_mgmt.h | 4 
>  2 files changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_color_mgmt.c 
> b/drivers/gpu/drm/drm_color_mgmt.c
> index bb14f488c8f6..a183ebae2941 100644
> --- a/drivers/gpu/drm/drm_color_mgmt.c
> +++ b/drivers/gpu/drm/drm_color_mgmt.c
> @@ -469,6 +469,10 @@ static const char * const color_encoding_name[] = {
>   [DRM_COLOR_YCBCR_BT601] = "ITU-R BT.601 YCbCr",
>   [DRM_COLOR_YCBCR_BT709] = "ITU-R BT.709 YCbCr",
>   [DRM_COLOR_YCBCR_BT2020] = "ITU-R BT.2020 YCbCr",
> + [DRM_COLOR_RGB_BT601] = "ITU-R BT.601 RGB",
> + [DRM_COLOR_RGB_BT709] = "ITU-R BT.709 RGB",
> + [DRM_COLOR_RGB_BT2020] = "ITU-R BT.2020 RGB",
> + [DRM_COLOR_P3] = "DCI-P3",

These are a totally different thing than the YCbCr stuff.
The YCbCr stuff just specifies the YCbCr<->RGB converison matrix,
whereas these are I guess supposed to specify the primaries/whitepoint?
But without specifying what we're converting *to* these mean absolutely
nothing. Ie. I don't think they belong in this property.

The previous proposals around this topic have suggested a new
property to specify a conversion matrix either explicitly, or
via a separate enum (which would specify both the src and dst
colorspaces). I've always argued the enum approach is needed
anyway since not all hardware has a programmable matrix for
this. But a fully programmable matrix could be nice for tone
mapping purposes/etc, so we may want to make sure both are
possible.

As for the transfer func, the proposals so far have mostly just
been to expose a programmable degamma/gamma LUTs for each plane.
But considering how poor the current gamma uapi is we've thrown
around some ideas how to allow the kernel to properly expose the
hw capabilities. This is one of those ideas:
https://lists.freedesktop.org/archives/dri-devel/2019-April/212886.html
I think that basic idea could also be extended to allow fixed
curves in case the hw doesn't have a fully programmable LUT. But
dunno if that's relevant for your hw.

-- 
Ville Syrjälä
Intel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/dp_mst: Use the correct DPCD space in Synaptics quirk

2021-04-26 Thread Lyude Paul
mhhh, probably should do this a bit differently. Also adding Koba since
this involves using extended DPCD caps in the MST topology mgr.

On Fri, 2021-04-23 at 16:05 -0400, Nikola Cornij wrote:
> [why]
> Two conditions that were part of this fix did not go through:
> 
> 1. DPCD revision has to be v1.4 and up
> This was because the wrong DPCD space was used to get the values
> 
> 2. Downstream port must not be VGA converter
>    This was because for MST the topology manager AUX has to be used,
>    due to the way MST AUX reads are done.
> 
> [how]
> - Use Extended Receiver Capability DPCD space if
> DP_EXTENDED_RECEIVER_CAP_FIELD_PRESENT is set
> - Use MST topology manager AUX to get port DPCD
> 
> Signed-off-by: Nikola Cornij 
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 33 ---
>  1 file changed, 25 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> b/drivers/gpu/drm/drm_dp_mst_topology.c
> index de5124ce42cb..69fd16ce2cb3 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -5878,18 +5878,35 @@ struct drm_dp_aux *drm_dp_mst_dsc_aux_for_port(struct
> drm_dp_mst_port *port)
> return NULL;
>  
> if (drm_dp_has_quirk(&desc, DP_DPCD_QUIRK_DSC_WITHOUT_VIRTUAL_DPCD) &&
> -   port->mgr->dpcd[DP_DPCD_REV] >= DP_DPCD_REV_14 &&
>     port->parent == port->mgr->mst_primary) {
> -   u8 downstreamport;
> +   u8 training_aux_rd_interval = 0;
> +   u8 dpcd_rev = 0;
> +   unsigned int dpcd_caps_offset = 0;
>  
> -   if (drm_dp_dpcd_read(&port->aux, DP_DOWNSTREAMPORT_PRESENT,
> -    &downstreamport, 1) < 0)
> +   if (drm_dp_dpcd_read(port->mgr->aux,
> DP_TRAINING_AUX_RD_INTERVAL,
> +    &training_aux_rd_interval, 1) < 1)
> return NULL;
>  
> -   if ((downstreamport & DP_DWN_STRM_PORT_PRESENT) &&
> -  ((downstreamport & DP_DWN_STRM_PORT_TYPE_MASK)
> -    != DP_DWN_STRM_PORT_TYPE_ANALOG))
> -   return port->mgr->aux;
> +   /* If DP_EXTENDED_RECEIVER_CAP_FIELD_PRESENT is set, the
> Extended Receiver Capability field has to be used */
> +   if (training_aux_rd_interval &
> DP_EXTENDED_RECEIVER_CAP_FIELD_PRESENT)
> +   dpcd_caps_offset = 0x02200;

If we need to read the extended DPCD caps then we should give another go at
teaching the MST helpers how to read them by default instead of hacking around
it like this. We attempted to do this in the past:

https://patchwork.kernel.org/project/dri-devel/patch/20200911034431.29059-1-koba...@canonical.com/

But it ended up causing issues that I didn't have the time to look into, so we
had to revert it:

https://patchwork.freedesktop.org/series/83408/

Looking at this now I think the proper solution here shouldn't be very
difficult. In order to make it so drm_dp_mst_topology_mgr uses
drm_dp_dpcd_read_caps() we just need to make it so that drivers need to
provide their maximum link rate and lane count when setting up
drm_dp_mst_topology_set_mst(). From there, we can convert
drm_dp_mst_topology_mgr to drm_dp_dpcd_read_caps() and simply limit the
maximum link rate/lane count we cache in mgr->dpcd to the max link rate/lane
count limitations provided by the DRM driver.
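
Just to sketch the clamping part of that (untested, purely illustrative;
max_link_rate/max_lane_count here stand for the new values drivers would
pass in at init time):

#include <drm/drm_dp_helper.h>

static void clamp_mst_dpcd_caps(u8 dpcd[DP_RECEIVER_CAP_SIZE],
				u8 max_link_rate, u8 max_lane_count)
{
	/* keep whatever the sink reports within the driver's limits */
	if (dpcd[DP_MAX_LINK_RATE] > max_link_rate)
		dpcd[DP_MAX_LINK_RATE] = max_link_rate;

	if ((dpcd[DP_MAX_LANE_COUNT] & DP_MAX_LANE_COUNT_MASK) > max_lane_count)
		dpcd[DP_MAX_LANE_COUNT] =
			(dpcd[DP_MAX_LANE_COUNT] & ~DP_MAX_LANE_COUNT_MASK) |
			max_lane_count;
}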

Would you write up some patches to do this instead?

> +
> +   if (drm_dp_dpcd_read(port->mgr->aux, dpcd_caps_offset +
> DP_DPCD_REV,
> +    &dpcd_rev, 1) < 1)
> +   return NULL;
> +
> +   if (dpcd_rev >= DP_DPCD_REV_14) {
> +   u8 downstreamport = 0;
> +
> +   if (drm_dp_dpcd_read(port->mgr->aux, dpcd_caps_offset
> + DP_DOWNSTREAMPORT_PRESENT,
> +    &downstreamport, 1) < 1)
> +   return NULL;
> +
> +   if ((downstreamport & DP_DWN_STRM_PORT_PRESENT) &&
> +  ((downstreamport & DP_DWN_STRM_PORT_TYPE_MASK)
> +    != DP_DWN_STRM_PORT_TYPE_ANALOG))
> +   return port->mgr->aux;
> +   }
> }
>  
> /*

-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[RFC PATCH 1/3] drm/color: Add RGB Color encodings

2021-04-26 Thread Harry Wentland
From: Bhawanpreet Lakha 

Add the following color encodings
- RGB versions for BT601, BT709, BT2020
- DCI-P3: Used for digital movies

Signed-off-by: Bhawanpreet Lakha 
Signed-off-by: Harry Wentland 
---
 drivers/gpu/drm/drm_color_mgmt.c | 4 
 include/drm/drm_color_mgmt.h | 4 
 2 files changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/drm_color_mgmt.c b/drivers/gpu/drm/drm_color_mgmt.c
index bb14f488c8f6..a183ebae2941 100644
--- a/drivers/gpu/drm/drm_color_mgmt.c
+++ b/drivers/gpu/drm/drm_color_mgmt.c
@@ -469,6 +469,10 @@ static const char * const color_encoding_name[] = {
[DRM_COLOR_YCBCR_BT601] = "ITU-R BT.601 YCbCr",
[DRM_COLOR_YCBCR_BT709] = "ITU-R BT.709 YCbCr",
[DRM_COLOR_YCBCR_BT2020] = "ITU-R BT.2020 YCbCr",
+   [DRM_COLOR_RGB_BT601] = "ITU-R BT.601 RGB",
+   [DRM_COLOR_RGB_BT709] = "ITU-R BT.709 RGB",
+   [DRM_COLOR_RGB_BT2020] = "ITU-R BT.2020 RGB",
+   [DRM_COLOR_P3] = "DCI-P3",
 };
 
 static const char * const color_range_name[] = {
diff --git a/include/drm/drm_color_mgmt.h b/include/drm/drm_color_mgmt.h
index 81c298488b0c..3043dd73480c 100644
--- a/include/drm/drm_color_mgmt.h
+++ b/include/drm/drm_color_mgmt.h
@@ -78,6 +78,10 @@ enum drm_color_encoding {
DRM_COLOR_YCBCR_BT601,
DRM_COLOR_YCBCR_BT709,
DRM_COLOR_YCBCR_BT2020,
+   DRM_COLOR_RGB_BT601,
+   DRM_COLOR_RGB_BT709,
+   DRM_COLOR_RGB_BT2020,
+   DRM_COLOR_P3,
DRM_COLOR_ENCODING_MAX,
 };
 
-- 
2.31.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[RFC PATCH 2/3] drm/color: Add Color transfer functions for HDR/SDR

2021-04-26 Thread Harry Wentland
From: Bhawanpreet Lakha 

Due to the way displays and human vision work it is most effective to
encode luminance information in a non-linear space.

For SDR this non-linear mapping is assumed to roughly use a gamma 2.2 curve.
This was due to the way CRTs worked and was fine for SDR content with a
low luminance range.

The large luminance range (0-10,000 nits) for HDR exposes some
short-comings of a simple gamma curve that have been addressed
through various Electro-Optical Transfer Functions (EOTFs).

Rather than assuming how framebuffer content is encoded we want to
make sure userspace presenting HDR content is explicit about the
EOTF of the content, so a driver can decide whether the content
can be supported or not.

This patch adds common color transfer functions for SDR/HDR. These can
be used to communicate with the driver regarding the transformation to
use for a given plane.

enums added:
DRM_COLOR_TF_SRGB
roughly 2.4 gamma with initial linear section
DRM_COLOR_TF_BT709
Similar to Gamma 2.2-2.8
DRM_COLOR_TF_PQ2084
most common tf used for HDR video (HDR10/Dolby). Can support up
to 10,000 nits
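
For reference, the sRGB curve mentioned above is the standard piecewise
function below (floating point, illustration only; a real implementation
would typically use a LUT or fixed point):

#include <math.h>

/* sRGB EOTF: non-linear encoded value in [0,1] -> linear light in [0,1] */
static double srgb_eotf(double v)
{
	return v <= 0.04045 ? v / 12.92 : pow((v + 0.055) / 1.055, 2.4);
}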

The usage is similar to color_encoding and color_range where the driver
can specify the default and supported tfs and pass it into
drm_plane_create_color_properties().

Signed-off-by: Bhawanpreet Lakha 
Signed-off-by: Harry Wentland 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  4 +-
 .../gpu/drm/arm/display/komeda/komeda_plane.c |  4 +-
 drivers/gpu/drm/arm/malidp_planes.c   |  4 +-
 drivers/gpu/drm/armada/armada_overlay.c   |  4 +-
 drivers/gpu/drm/drm_atomic_uapi.c |  4 ++
 drivers/gpu/drm/drm_color_mgmt.c  | 63 +--
 drivers/gpu/drm/i915/display/intel_sprite.c   |  4 +-
 .../drm/i915/display/skl_universal_plane.c|  4 +-
 drivers/gpu/drm/nouveau/dispnv04/overlay.c|  4 +-
 drivers/gpu/drm/omapdrm/omap_plane.c  |  4 +-
 drivers/gpu/drm/sun4i/sun8i_vi_layer.c|  4 +-
 drivers/gpu/drm/tidss/tidss_plane.c   |  6 +-
 include/drm/drm_color_mgmt.h  | 15 -
 include/drm/drm_plane.h   | 16 +
 14 files changed, 124 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 6985362c367d..28481540327f 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -7127,7 +7127,9 @@ static int amdgpu_dm_plane_init(struct 
amdgpu_display_manager *dm,
BIT(DRM_COLOR_YCBCR_BT2020),
BIT(DRM_COLOR_YCBCR_LIMITED_RANGE) |
BIT(DRM_COLOR_YCBCR_FULL_RANGE),
-   DRM_COLOR_YCBCR_BT709, DRM_COLOR_YCBCR_LIMITED_RANGE);
+   BIT(DRM_COLOR_TF_SRGB),
+   DRM_COLOR_YCBCR_BT709, DRM_COLOR_YCBCR_LIMITED_RANGE,
+   DRM_COLOR_TF_SRGB);
}
 
supported_rotations =
diff --git a/drivers/gpu/drm/arm/display/komeda/komeda_plane.c 
b/drivers/gpu/drm/arm/display/komeda/komeda_plane.c
index 2d5066ea270c..eb8ccf805408 100644
--- a/drivers/gpu/drm/arm/display/komeda/komeda_plane.c
+++ b/drivers/gpu/drm/arm/display/komeda/komeda_plane.c
@@ -299,8 +299,10 @@ static int komeda_plane_add(struct komeda_kms_dev *kms,
BIT(DRM_COLOR_YCBCR_BT2020),
BIT(DRM_COLOR_YCBCR_LIMITED_RANGE) |
BIT(DRM_COLOR_YCBCR_FULL_RANGE),
+   BIT(DRM_COLOR_TF_SRGB),
DRM_COLOR_YCBCR_BT601,
-   DRM_COLOR_YCBCR_LIMITED_RANGE);
+   DRM_COLOR_YCBCR_LIMITED_RANGE,
+   DRM_COLOR_TF_SRGB);
if (err)
goto cleanup;
 
diff --git a/drivers/gpu/drm/arm/malidp_planes.c 
b/drivers/gpu/drm/arm/malidp_planes.c
index 351a85088d0e..0d77a2c0c829 100644
--- a/drivers/gpu/drm/arm/malidp_planes.c
+++ b/drivers/gpu/drm/arm/malidp_planes.c
@@ -1020,7 +1020,9 @@ int malidp_de_planes_init(struct drm_device *drm)
BIT(DRM_COLOR_YCBCR_BT2020),
BIT(DRM_COLOR_YCBCR_LIMITED_RANGE) | \
BIT(DRM_COLOR_YCBCR_FULL_RANGE),
-   enc, range);
+   BIT(DRM_COLOR_SRGB),
+   enc, range,
+   DRM_COLOR_SRGB);
if (!ret)
/* program the HW registers */
malidp_de_set_color_encoding(plane, enc, range);
diff --git a/drivers/gpu/drm/armada/armada_overlay.c 
b/drivers/gpu/drm/armada/armada_overlay.c
index 6346b890279a..ad3a743242f0 100644
--- a/drivers/gpu/drm/armada/arm

[RFC PATCH 3/3] drm/color: Add sdr boost property

2021-04-26 Thread Harry Wentland
From: Bhawanpreet Lakha 

SDR is typically mastered at 200 nits and HDR is mastered at up to 10,000
nits. Due to this luminance range difference, if we blend an SDR and an
HDR plane together, we can run into problems where the HDR plane is too
bright or the SDR plane is too dim.

A common solution to this problem is to boost the SDR plane so that it's
not too dim.

This patch introduces an "sdr_white_level" property; userspace can use it
to boost the SDR content luminance. The boost value is an explicit
luminance value in nits. This allows userspace to set the maximum white
level for the SDR plane.
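
As an illustration of how userspace could program this through the atomic
API (names as proposed here, not part of any released kernel/libdrm; the
property id is assumed to have been looked up by name beforehand):

#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

static int set_sdr_white_level(int fd, uint32_t plane_id,
			       uint32_t sdr_white_level_prop_id,
			       uint64_t nits)
{
	drmModeAtomicReqPtr req = drmModeAtomicAlloc();
	int ret;

	if (!req)
		return -1;

	/* e.g. nits = 300 to boost SDR white to 300 nits */
	ret = drmModeAtomicAddProperty(req, plane_id,
				       sdr_white_level_prop_id, nits);
	if (ret >= 0)
		ret = drmModeAtomicCommit(fd, req, 0, NULL);

	drmModeAtomicFree(req);
	return ret < 0 ? ret : 0;
}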

Signed-off-by: Bhawanpreet Lakha 
Signed-off-by: Harry Wentland 
---
 drivers/gpu/drm/drm_atomic_uapi.c |  4 
 drivers/gpu/drm/drm_color_mgmt.c  | 17 +
 include/drm/drm_color_mgmt.h  |  6 ++
 include/drm/drm_plane.h   | 14 ++
 4 files changed, 41 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index ea95c1224253..b3b6de7b74aa 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -597,6 +597,8 @@ static int drm_atomic_plane_set_property(struct drm_plane 
*plane,
state->color_range = val;
} else if (property == plane->color_tf_property) {
state->color_tf = val;
+   } else if (property == plane->sdr_white_level_property) {
+   state->sdr_white_level = val;
} else if (property == config->prop_fb_damage_clips) {
ret = drm_atomic_replace_property_blob_from_id(dev,
&state->fb_damage_clips,
@@ -665,6 +667,8 @@ drm_atomic_plane_get_property(struct drm_plane *plane,
*val = state->color_range;
} else if (property == plane->color_tf_property) {
*val = state->color_tf;
+   } else if (property == plane->sdr_white_level_property) {
+   *val = state->sdr_white_level;
} else if (property == config->prop_fb_damage_clips) {
*val = (state->fb_damage_clips) ?
state->fb_damage_clips->base.id : 0;
diff --git a/drivers/gpu/drm/drm_color_mgmt.c b/drivers/gpu/drm/drm_color_mgmt.c
index 2404b07046c5..a0b77d7d0565 100644
--- a/drivers/gpu/drm/drm_color_mgmt.c
+++ b/drivers/gpu/drm/drm_color_mgmt.c
@@ -519,6 +519,23 @@ const char *drm_get_color_range_name(enum drm_color_range 
range)
return color_range_name[range];
 }
 
+int drm_plane_create_sdr_white_level_property(struct drm_plane *plane){
+
+   struct drm_property *prop;
+
+   prop = drm_property_create_range(plane->dev, 0, "SDR_WHITE_LEVEL", 0, 
UINT_MAX);
+
+   if (!prop)
+   return -ENOMEM;
+
+   plane->sdr_white_level_property = prop;
+   drm_object_attach_property(&plane->base, prop, 
DRM_DEFAULT_SDR_WHITE_LEVEL);
+
+   if (plane->state)
+   plane->state->sdr_white_level = DRM_DEFAULT_SDR_WHITE_LEVEL;
+
+   return 0;
+}
 /**
  * drm_get_color_transfer_function - return a string for color transfer 
function
  * @tf: transfer function to compute name of
diff --git a/include/drm/drm_color_mgmt.h b/include/drm/drm_color_mgmt.h
index f59806366855..a020346b1747 100644
--- a/include/drm/drm_color_mgmt.h
+++ b/include/drm/drm_color_mgmt.h
@@ -26,6 +26,12 @@
 #include 
 #include 
 
+/**
+ * Default SDR white level in nits. Although there is no standard SDR nit 
level, 200
+ * is chosen as the default since that is the generally accepted value.
+ */
+#define DRM_DEFAULT_SDR_WHITE_LEVEL 200
+
 struct drm_crtc;
 struct drm_plane;
 
diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
index c85c59501a7a..fad8b7dd430c 100644
--- a/include/drm/drm_plane.h
+++ b/include/drm/drm_plane.h
@@ -187,6 +187,11 @@ struct drm_plane_state {
 * format for a proper HDR color/luminance output.
 */
enum drm_color_transfer_function color_tf;
+   /**
+* @sdr_white_level:
+* SDR white level boost for HDR+SDR multi plane usecases. max white 
level in nits
+*/
+   unsigned int sdr_white_level;
/**
 * @fb_damage_clips:
 *
@@ -757,6 +762,15 @@ struct drm_plane {
 * See drm_plane_create_color_properties().
 */
struct drm_property *color_tf_property;
+   /**
+* @sdr_white_level:
+*
+* Optional sdr_white_level. When HDR and SDR are combined in multi 
plane
+* overlay cases, the sdr plane will be very dim. This property allows
+* the driver to boost the sdr plane's white level. The value should be
+* max white level in nits.
+*/
+   struct drm_property *sdr_white_level_property;
 
/**
 * @scaling_filter_property: property to apply a particular filter while
-- 
2.31.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/

[RFC PATCH 0/3] A drm_plane API to support HDR planes

2021-04-26 Thread Harry Wentland


## Introduction

We are looking to enable HDR support for a couple of single-plane and 
multi-plane scenarios. To do this effectively we recommend new interfaces to 
drm_plane. Below I'll give a bit of background on HDR and why we propose these 
interfaces.


## Defining a pixel's luminance

Currently the luminance space of pixels in a framebuffer/plane presented to the 
display is not well defined. It's usually assumed to be in a 2.2 or 2.4 gamma 
space and has no mapping to an absolute luminance value but is interpreted in 
relative terms.

Luminance can be measured and described in absolute terms as candela per meter 
squared, or cd/m2, or nits. Even though a pixel value can be mapped to 
luminance in a linear fashion, doing so without losing a lot of detail requires 
16-bpc color depth. The reason for this is that human perception can 
distinguish a luminance delta of roughly 0.5-1%. A linear representation 
is suboptimal, wasting precision in the highlights and losing precision in the 
shadows.

A gamma curve is a decent approximation to a human's perception of luminance, 
but the PQ (perceptual quantizer) function [1] improves on it. It also defines 
the luminance values in absolute terms, with the highest value being 10,000 
nits and the lowest 0.0005 nits.

Using content that's defined in PQ space we can approximate the real world in 
a much better way.
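
For readers not familiar with it, the PQ mapping from a normalized code
value to absolute luminance looks roughly like this (floating point,
illustration only):

#include <math.h>

/* SMPTE ST 2084 (PQ) EOTF: normalized code value in [0,1] -> nits */
static double pq_eotf_nits(double n)
{
	const double m1 = 2610.0 / 16384.0;		/* 0.1593...   */
	const double m2 = 2523.0 / 4096.0 * 128.0;	/* 78.84375    */
	const double c1 = 3424.0 / 4096.0;		/* 0.8359375   */
	const double c2 = 2413.0 / 4096.0 * 32.0;	/* 18.8515625  */
	const double c3 = 2392.0 / 4096.0 * 32.0;	/* 18.6875     */
	double p = pow(n, 1.0 / m2);

	return 10000.0 * pow(fmax(p - c1, 0.0) / (c2 - c3 * p), 1.0 / m1);
}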

Here are some examples of real-life objects and their approximate luminance 
values:

| Object| Luminance in nits |
| - | - |
| Sun   | 1.6 million   |
| Fluorescent light | 10,000|
| Highlights| 1,000 - sunlight  |
| White Objects | 250 - 1,000   |
| Typical objects   | 1 - 250   |
| Shadows   | 0.01 - 1  |
| Ultra Blacks  | 0 - 0.0005|


## Describing the luminance space

**We propose a new drm_plane property to describe the Electro-Optical Transfer 
Function (EOTF) with which its framebuffer was composed.** Examples of EOTF are:

| EOTF      | Description                                                                |
| --------- |:-------------------------------------------------------------------------- |
| Gamma 2.2 | a simple 2.2 gamma                                                         |
| sRGB      | 2.4 gamma with small initial linear section                                |
| PQ 2084   | SMPTE ST 2084; used for HDR video and allows for up to 10,000 nit support  |
| Linear    | Linear relationship between pixel value and luminance value                |


## Mastering Luminances

Now we are able to use the PQ 2084 EOTF to define the luminance of pixels in 
absolute terms. Unfortunately we're again presented with physical limitations 
of the display technologies on the market today. Here are a few examples of 
luminance ranges of displays.

| Display  | Luminance range in nits |
|  | --- |
| Typical PC display   | 0.3 - 200   |
| Excellent LCD HDTV   | 0.3 - 400   |
| HDR LCD w/ local dimming | 0.05 - 1,500|

Since no display can currently show the full 0.0005 to 10,000 nits luminance 
range, the display will need to tonemap the HDR content, i.e. to fit the content 
within the display's capabilities. To assist with tonemapping, HDR content is 
usually accompanied by metadata that describes (among other things) the 
minimum and maximum mastering luminance, i.e. the maximum and minimum luminance 
of the display that was used to master the HDR content.

The HDR metadata is currently defined on the drm_connector via the 
hdr_output_metadata blob property.

It might be useful to define per-plane hdr metadata, as different planes might 
have been mastered differently.


## SDR Luminance

Since SDR covers a smaller luminance range than HDR, an SDR plane might look 
dark when blended with HDR content. Since the max HDR luminance can be quite 
variable (200-1,500 nits on actual displays) it is best to make the SDR maximum 
luminance value configurable.

**We propose a drm_plane property to specify the desired maximum luminance of 
the SDR plane in nits.** This allows us to map the SDR content predictably into 
HDR's absolute luminance space.
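
A minimal sketch of the intended semantics, assuming the SDR content has
already been linearized (illustration only, not an implementation):

/* 1.0 in linear SDR maps to the configured white level, e.g. 200 nits */
static double sdr_to_nits(double linear_sdr, unsigned int sdr_white_level_nits)
{
	return linear_sdr * (double)sdr_white_level_nits;
}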


## Let There Be Color

So far we've only talked about luminance, ignoring colors altogether. Just like 
in the luminance space, traditionally the color space of display outputs has 
not been well defined. Similar to how an EOTF defines a mapping of pixel data 
to an absolute luminance value, the color space maps color information for each 
pixel onto the CIE 1931 chromaticity space. This can be thought of as a mapping 
to an absolute, real-life color value.

A color space is defined by its primaries and white point. The primaries and 
white point are expressed as coordinates in the CIE 1931 color space. Think of 
the red primary as the reddest red that can be displayed withi

Re: [PATCH] drm/amdgpu: restructure amdgpu_vram_mgr_new

2021-04-26 Thread Felix Kuehling

On 2021-04-26 at 12:12 p.m., Christian König wrote:
>
>
> On 26.04.21 at 18:02, Felix Kuehling wrote:
>> On 2021-04-26 at 4:54 a.m., Christian König wrote:
>>> Merge the two loops, loosen the restriction for big allocations.
>>> This reduces the CPU overhead in the good case, but increases
>>> it a bit under memory pressure.
>>>
>>> Signed-off-by: Christian König 
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 58
>>> +---
>>>   1 file changed, 27 insertions(+), 31 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> index 529c5c32a205..e2cbe19404c0 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> @@ -358,13 +358,13 @@ static int amdgpu_vram_mgr_new(struct
>>> ttm_resource_manager *man,
>>>  const struct ttm_place *place,
>>>  struct ttm_resource *mem)
>>>   {
>>> +    unsigned long lpfn, num_nodes, pages_per_node, pages_left, pages;
>>>   struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
>>>   struct amdgpu_device *adev = to_amdgpu_device(mgr);
>>> +    uint64_t vis_usage = 0, mem_bytes, max_bytes;
>>>   struct drm_mm *mm = &mgr->mm;
>>> -    struct drm_mm_node *nodes;
>>>   enum drm_mm_insert_mode mode;
>>> -    unsigned long lpfn, num_nodes, pages_per_node, pages_left;
>>> -    uint64_t vis_usage = 0, mem_bytes, max_bytes;
>>> +    struct drm_mm_node *nodes;
>>>   unsigned i;
>>>   int r;
>>>   @@ -391,9 +391,10 @@ static int amdgpu_vram_mgr_new(struct
>>> ttm_resource_manager *man,
>>>   pages_per_node = HPAGE_PMD_NR;
>>>   #else
>>>   /* default to 2MB */
>>> -    pages_per_node = (2UL << (20UL - PAGE_SHIFT));
>>> +    pages_per_node = 2UL << (20UL - PAGE_SHIFT);
>>>   #endif
>>> -    pages_per_node = max((uint32_t)pages_per_node,
>>> mem->page_alignment);
>>> +    pages_per_node = max_t(uint32_t, pages_per_node,
>>> +   mem->page_alignment);
>>>   num_nodes = DIV_ROUND_UP(mem->num_pages, pages_per_node);
>>>   }
>>>   @@ -411,42 +412,37 @@ static int amdgpu_vram_mgr_new(struct
>>> ttm_resource_manager *man,
>>>   mem->start = 0;
>>>   pages_left = mem->num_pages;
>>>   -    spin_lock(&mgr->lock);
>>> -    for (i = 0; pages_left >= pages_per_node; ++i) {
>>> -    unsigned long pages = rounddown_pow_of_two(pages_left);
>>> -
>>> -    /* Limit maximum size to 2GB due to SG table limitations */
>>> -    pages = min(pages, (2UL << (30 - PAGE_SHIFT)));
>>> +    /* Limit maximum size to 2GB due to SG table limitations */
>>> +    pages = min(pages_left, 2UL << (30 - PAGE_SHIFT));
>>>   -    r = drm_mm_insert_node_in_range(mm, &nodes[i], pages,
>>> -    pages_per_node, 0,
>>> -    place->fpfn, lpfn,
>>> -    mode);
>>> -    if (unlikely(r))
>>> -    break;
>>> -
>>> -    vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
>>> -    amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
>>> -    pages_left -= pages;
>>> -    }
>>> -
>>> -    for (; pages_left; ++i) {
>>> -    unsigned long pages = min(pages_left, pages_per_node);
>>> +    i = 0;
>>> +    spin_lock(&mgr->lock);
>>> +    while (pages_left) {
>>>   uint32_t alignment = mem->page_alignment;
>>>   -    if (pages == pages_per_node)
>>> +    if (pages >= pages_per_node)
>>>   alignment = pages_per_node;
>>>   -    r = drm_mm_insert_node_in_range(mm, &nodes[i],
>>> -    pages, alignment, 0,
>>> -    place->fpfn, lpfn,
>>> -    mode);
>>> -    if (unlikely(r))
>>> +    r = drm_mm_insert_node_in_range(mm, &nodes[i], pages,
>>> alignment,
>>> +    0, place->fpfn, lpfn, mode);
>>> +    if (unlikely(r)) {
>>> +    if (pages > pages_per_node) {
>> This means we can never allocate chunks smaller than 2MB, except for the
>> tail. And the tail still needs to be allocated in one piece if it's <
>> 2MB.
>
> Correct, but that was the behavior before as well.
>
>> On the other hand, we should not allow allocations smaller than
>> mem->page_alignment, except for the tail. So should this condition be
>> "if (pages > mem->page_alignment)" to allow maximum flexibility for
>> allocations without physical alignment constraints when memory is very
>> fragmented?
>
> See a few lines above:
>
> pages_per_node = max_t(uint32_t, pages_per_node, page_alignment);
>
> So pages_per_node is always at least as large as page_alignment and we actually
> can't allocate less than pages_per_node in one allocation or we would
> overflow the nodes array.

Makes sense. The patch is

Reviewed-by: Felix Kuehling 



>
> Regards,
> Christian.
>
>>
>> Regards,
>>    Felix
>>
>>
>>> +    if (is_power_of_2(pages))
>>> +    pages = pages 

Re: [PATCH] drm/amdgpu: restructure amdgpu_vram_mgr_new

2021-04-26 Thread Christian König



On 26.04.21 at 18:02, Felix Kuehling wrote:

On 2021-04-26 at 4:54 a.m., Christian König wrote:

Merge the two loops, loosen the restriction for big allocations.
This reduces the CPU overhead in the good case, but increases
it a bit under memory pressure.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 58 +---
  1 file changed, 27 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 529c5c32a205..e2cbe19404c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -358,13 +358,13 @@ static int amdgpu_vram_mgr_new(struct 
ttm_resource_manager *man,
   const struct ttm_place *place,
   struct ttm_resource *mem)
  {
+   unsigned long lpfn, num_nodes, pages_per_node, pages_left, pages;
struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
struct amdgpu_device *adev = to_amdgpu_device(mgr);
+   uint64_t vis_usage = 0, mem_bytes, max_bytes;
struct drm_mm *mm = &mgr->mm;
-   struct drm_mm_node *nodes;
enum drm_mm_insert_mode mode;
-   unsigned long lpfn, num_nodes, pages_per_node, pages_left;
-   uint64_t vis_usage = 0, mem_bytes, max_bytes;
+   struct drm_mm_node *nodes;
unsigned i;
int r;
  
@@ -391,9 +391,10 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,

pages_per_node = HPAGE_PMD_NR;
  #else
/* default to 2MB */
-   pages_per_node = (2UL << (20UL - PAGE_SHIFT));
+   pages_per_node = 2UL << (20UL - PAGE_SHIFT);
  #endif
-   pages_per_node = max((uint32_t)pages_per_node, 
mem->page_alignment);
+   pages_per_node = max_t(uint32_t, pages_per_node,
+  mem->page_alignment);
num_nodes = DIV_ROUND_UP(mem->num_pages, pages_per_node);
}
  
@@ -411,42 +412,37 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,

mem->start = 0;
pages_left = mem->num_pages;
  
-	spin_lock(&mgr->lock);

-   for (i = 0; pages_left >= pages_per_node; ++i) {
-   unsigned long pages = rounddown_pow_of_two(pages_left);
-
-   /* Limit maximum size to 2GB due to SG table limitations */
-   pages = min(pages, (2UL << (30 - PAGE_SHIFT)));
+   /* Limit maximum size to 2GB due to SG table limitations */
+   pages = min(pages_left, 2UL << (30 - PAGE_SHIFT));
  
-		r = drm_mm_insert_node_in_range(mm, &nodes[i], pages,

-   pages_per_node, 0,
-   place->fpfn, lpfn,
-   mode);
-   if (unlikely(r))
-   break;
-
-   vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
-   amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
-   pages_left -= pages;
-   }
-
-   for (; pages_left; ++i) {
-   unsigned long pages = min(pages_left, pages_per_node);
+   i = 0;
+   spin_lock(&mgr->lock);
+   while (pages_left) {
uint32_t alignment = mem->page_alignment;
  
-		if (pages == pages_per_node)

+   if (pages >= pages_per_node)
alignment = pages_per_node;
  
-		r = drm_mm_insert_node_in_range(mm, &nodes[i],

-   pages, alignment, 0,
-   place->fpfn, lpfn,
-   mode);
-   if (unlikely(r))
+   r = drm_mm_insert_node_in_range(mm, &nodes[i], pages, alignment,
+   0, place->fpfn, lpfn, mode);
+   if (unlikely(r)) {
+   if (pages > pages_per_node) {

This means we can never allocate chunks smaller than 2MB, except for the
tail. And the tail still needs to be allocated in one piece if it's < 2MB.


Correct, but that was the behavior before as well.


On the other hand, we should not allow allocations smaller than
mem->page_alignment, except for the tail. So should this condition be
"if (pages > mem->page_alignment)" to allow maximum flexibility for
allocations without physical alignment constraints when memory is very
fragmented?


See a few lines above:

pages_per_node = max_t(uint32_t, pages_per_node, page_alignment);

So pages_per_node is always at least as large as page_alignment and we actually 
can't allocate less than pages_per_node in one allocation or we would 
overflow the nodes array.
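
A quick worked example with made-up numbers, just to illustrate the
overflow:

#include <stdio.h>

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

int main(void)
{
	unsigned long num_pages = 2560;		/* 10 MiB of 4 KiB pages */
	unsigned long pages_per_node = 512;	/* 2 MiB */
	unsigned long num_nodes = DIV_ROUND_UP(num_pages, pages_per_node);

	/* nodes[] is sized for num_nodes entries (5 here); splitting the
	 * request into chunks smaller than pages_per_node would need more
	 * entries than that, except for the tail. */
	printf("num_nodes = %lu\n", num_nodes);
	return 0;
}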


Regards,
Christian.



Regards,
   Felix



+   if (is_power_of_2(pages))
+   pages = pages / 2;
+   else
+   

Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track

2021-04-26 Thread Christian König



On 26.04.21 at 17:30, Wang, Kevin(Yang) wrote:


[AMD Official Use Only - Internal Distribution Only]





*From:* amd-gfx  on behalf of 
Roy Sun 

*Sent:* Monday, April 26, 2021 2:27 PM
*To:* amd-gfx@lists.freedesktop.org 
*Cc:* Sun, Roy ; Nieto, David M 
*Subject:* [PATCH 1/2] drm/scheduler: Change scheduled fence track
Update the timestamp of scheduled fence on HW
completion of the previous fences

This allows more accurate tracking of the fence
execution in HW

Signed-off-by: David M Nieto 
Signed-off-by: Roy Sun 
---
 drivers/gpu/drm/scheduler/sched_main.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c

index 92d8de24d0a1..f8e39ab0c41b 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -515,7 +515,7 @@ void drm_sched_resubmit_jobs(struct 
drm_gpu_scheduler *sched)

 EXPORT_SYMBOL(drm_sched_resubmit_jobs);

 /**
- * drm_sched_resubmit_jobs_ext - helper to relunch certain number of 
jobs from mirror ring list
+ * drm_sched_resubmit_jobs_ext - helper to relaunch certain number of 
jobs from pending list

  *
  * @sched: scheduler instance
  * @max: job numbers to relaunch
@@ -671,7 +671,7 @@ drm_sched_select_entity(struct drm_gpu_scheduler 
*sched)

 static struct drm_sched_job *
 drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
 {
-   struct drm_sched_job *job;
+   struct drm_sched_job *job, *next;

 /*
  * Don't destroy jobs while the timeout worker is running  OR 
thread
@@ -690,6 +690,14 @@ drm_sched_get_cleanup_job(struct 
drm_gpu_scheduler *sched)

 if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
 /* remove job from pending_list */
 list_del_init(&job->list);
We just need to record the scheduled time of the next job, so we
don't need to check the rest of the jobs.

[kevin]:
ok, it is fine for me with the timestamp flag check.
Reviewed-by: Kevin Wang 


Actually please drop that extra check.

The timestamp is guaranteed to be set on the next job or otherwise we 
wouldn't have gotten here in the first place.


I've considered dropping the flag for quite a while and don't want any 
new users of this.


Christian.



+   /* account for the next fence in the queue */
+   next = list_first_entry_or_null(&sched->pending_list,
+   struct drm_sched_job, list);
+   if (next && test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
+ &job->s_fence->finished.flags)) {
+ next->s_fence->scheduled.timestamp =
+ job->s_fence->finished.timestamp;
+   }
 } else {
 job = NULL;
 /* queue timeout for next job */
--
2.31.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: restructure amdgpu_vram_mgr_new

2021-04-26 Thread Felix Kuehling

On 2021-04-26 at 4:54 a.m., Christian König wrote:
> Merge the two loops, loosen the restriction for big allocations.
> This reduces the CPU overhead in the good case, but increases
> it a bit under memory pressure.
>
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 58 +---
>  1 file changed, 27 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> index 529c5c32a205..e2cbe19404c0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> @@ -358,13 +358,13 @@ static int amdgpu_vram_mgr_new(struct 
> ttm_resource_manager *man,
>  const struct ttm_place *place,
>  struct ttm_resource *mem)
>  {
> + unsigned long lpfn, num_nodes, pages_per_node, pages_left, pages;
>   struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
>   struct amdgpu_device *adev = to_amdgpu_device(mgr);
> + uint64_t vis_usage = 0, mem_bytes, max_bytes;
>   struct drm_mm *mm = &mgr->mm;
> - struct drm_mm_node *nodes;
>   enum drm_mm_insert_mode mode;
> - unsigned long lpfn, num_nodes, pages_per_node, pages_left;
> - uint64_t vis_usage = 0, mem_bytes, max_bytes;
> + struct drm_mm_node *nodes;
>   unsigned i;
>   int r;
>  
> @@ -391,9 +391,10 @@ static int amdgpu_vram_mgr_new(struct 
> ttm_resource_manager *man,
>   pages_per_node = HPAGE_PMD_NR;
>  #else
>   /* default to 2MB */
> - pages_per_node = (2UL << (20UL - PAGE_SHIFT));
> + pages_per_node = 2UL << (20UL - PAGE_SHIFT);
>  #endif
> - pages_per_node = max((uint32_t)pages_per_node, 
> mem->page_alignment);
> + pages_per_node = max_t(uint32_t, pages_per_node,
> +mem->page_alignment);
>   num_nodes = DIV_ROUND_UP(mem->num_pages, pages_per_node);
>   }
>  
> @@ -411,42 +412,37 @@ static int amdgpu_vram_mgr_new(struct 
> ttm_resource_manager *man,
>   mem->start = 0;
>   pages_left = mem->num_pages;
>  
> - spin_lock(&mgr->lock);
> - for (i = 0; pages_left >= pages_per_node; ++i) {
> - unsigned long pages = rounddown_pow_of_two(pages_left);
> -
> - /* Limit maximum size to 2GB due to SG table limitations */
> - pages = min(pages, (2UL << (30 - PAGE_SHIFT)));
> + /* Limit maximum size to 2GB due to SG table limitations */
> + pages = min(pages_left, 2UL << (30 - PAGE_SHIFT));
>  
> - r = drm_mm_insert_node_in_range(mm, &nodes[i], pages,
> - pages_per_node, 0,
> - place->fpfn, lpfn,
> - mode);
> - if (unlikely(r))
> - break;
> -
> - vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
> - amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
> - pages_left -= pages;
> - }
> -
> - for (; pages_left; ++i) {
> - unsigned long pages = min(pages_left, pages_per_node);
> + i = 0;
> + spin_lock(&mgr->lock);
> + while (pages_left) {
>   uint32_t alignment = mem->page_alignment;
>  
> - if (pages == pages_per_node)
> + if (pages >= pages_per_node)
>   alignment = pages_per_node;
>  
> - r = drm_mm_insert_node_in_range(mm, &nodes[i],
> - pages, alignment, 0,
> - place->fpfn, lpfn,
> - mode);
> - if (unlikely(r))
> + r = drm_mm_insert_node_in_range(mm, &nodes[i], pages, alignment,
> + 0, place->fpfn, lpfn, mode);
> + if (unlikely(r)) {
> + if (pages > pages_per_node) {

This means we can never allocate chunks smaller than 2MB, except for the
tail. And the tail still needs to be allocated in one piece if it's < 2MB.

On the other hand, we should not allow allocations smaller than
mem->page_alignment, except for the tail. So should this condition be
"if (pages > mem->page_alignment)" to allow maximum flexibility for
allocations without physical alignment constraints when memory is very
fragmented?

Regards,
  Felix


> + if (is_power_of_2(pages))
> + pages = pages / 2;
> + else
> + pages = rounddown_pow_of_two(pages);
> + continue;
> + }
>   goto error;
> + }
>  
>   vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
>   amdgpu_vram_mgr_virt_start(mem, &nodes[i])

Re: [PATCH 1/2] drm/scheduler: Change scheduled fence track

2021-04-26 Thread Wang, Kevin(Yang)
[AMD Official Use Only - Internal Distribution Only]




From: amd-gfx  on behalf of Roy Sun 

Sent: Monday, April 26, 2021 2:27 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Sun, Roy ; Nieto, David M 
Subject: [PATCH 1/2] drm/scheduler: Change scheduled fence track

Update the timestamp of scheduled fence on HW
completion of the previous fences

This allows more accurate tracking of the fence
execution in HW

Signed-off-by: David M Nieto 
Signed-off-by: Roy Sun 
---
 drivers/gpu/drm/scheduler/sched_main.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 92d8de24d0a1..f8e39ab0c41b 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -515,7 +515,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler 
*sched)
 EXPORT_SYMBOL(drm_sched_resubmit_jobs);

 /**
- * drm_sched_resubmit_jobs_ext - helper to relunch certain number of jobs from 
mirror ring list
+ * drm_sched_resubmit_jobs_ext - helper to relaunch certain number of jobs 
from pending list
  *
  * @sched: scheduler instance
  * @max: job numbers to relaunch
@@ -671,7 +671,7 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
 static struct drm_sched_job *
 drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
 {
-   struct drm_sched_job *job;
+   struct drm_sched_job *job, *next;

 /*
  * Don't destroy jobs while the timeout worker is running  OR thread
@@ -690,6 +690,14 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
 if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
 /* remove job from pending_list */
 list_del_init(&job->list);
We just need to record the scheduled time of the next job, so we
don't need to check the rest of the jobs.

[kevin]:
ok, it is fine for me with the timestamp flag check.
Reviewed-by: Kevin Wang 

+   /* account for the next fence in the queue */
+   next = list_first_entry_or_null(&sched->pending_list,
+   struct drm_sched_job, list);
+   if (next && test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
+   &job->s_fence->finished.flags)) {
+   next->s_fence->scheduled.timestamp =
+   job->s_fence->finished.timestamp;
+   }
 } else {
 job = NULL;
 /* queue timeout for next job */
--
2.31.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amd/pm: Use correct typecast

2021-04-26 Thread Felix Kuehling
Please add a patch description. With that fixed, the patch is

Reviewed-by: Felix Kuehling 

Am 2021-04-26 um 9:44 a.m. schrieb Harish Kasiviswanathan:
> Signed-off-by: Harish Kasiviswanathan 
> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
> index 1f02e4ee2909..771e2f0acdd3 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
> @@ -1657,8 +1657,9 @@ static ssize_t aldebaran_get_gpu_metrics(struct 
> smu_context *smu,
>   gpu_metrics->average_mm_activity = 0;
>  
>   gpu_metrics->average_socket_power = metrics.AverageSocketPower;
> - gpu_metrics->energy_accumulator = metrics.EnergyAcc64bitHigh << 32 |
> -   metrics.EnergyAcc64bitLow;
> + gpu_metrics->energy_accumulator =
> + (uint64_t)metrics.EnergyAcc64bitHigh << 32 |
> + metrics.EnergyAcc64bitLow;
>  
>   gpu_metrics->average_gfxclk_frequency = metrics.AverageGfxclkFrequency;
>   gpu_metrics->average_socclk_frequency = metrics.AverageSocclkFrequency;
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amd/pm: Use correct typecast

2021-04-26 Thread Harish Kasiviswanathan
Signed-off-by: Harish Kasiviswanathan 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index 1f02e4ee2909..771e2f0acdd3 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -1657,8 +1657,9 @@ static ssize_t aldebaran_get_gpu_metrics(struct 
smu_context *smu,
gpu_metrics->average_mm_activity = 0;
 
gpu_metrics->average_socket_power = metrics.AverageSocketPower;
-   gpu_metrics->energy_accumulator = metrics.EnergyAcc64bitHigh << 32 |
- metrics.EnergyAcc64bitLow;
+   gpu_metrics->energy_accumulator =
+   (uint64_t)metrics.EnergyAcc64bitHigh << 32 |
+   metrics.EnergyAcc64bitLow;
 
gpu_metrics->average_gfxclk_frequency = metrics.AverageGfxclkFrequency;
gpu_metrics->average_socclk_frequency = metrics.AverageSocclkFrequency;
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2] drm/amdgpu: Register VGA clients after init can no longer fail

2021-04-26 Thread Kai-Heng Feng
When an amdgpu device fails to init, it makes another VGA device cause
a kernel splat:
kernel: amdgpu :08:00.0: amdgpu: amdgpu_device_ip_init failed
kernel: amdgpu :08:00.0: amdgpu: Fatal error during GPU init
kernel: amdgpu: probe of :08:00.0 failed with error -110
...
kernel: amdgpu :01:00.0: vgaarb: changed VGA decodes: 
olddecodes=io+mem,decodes=none:owns=none
kernel: BUG: kernel NULL pointer dereference, address: 0018
kernel: #PF: supervisor read access in kernel mode
kernel: #PF: error_code(0x) - not-present page
kernel: PGD 0 P4D 0
kernel: Oops:  [#1] SMP NOPTI
kernel: CPU: 6 PID: 1080 Comm: Xorg Tainted: GW 5.12.0-rc8+ #12
kernel: Hardware name: HP HP EliteDesk 805 G6/872B, BIOS S09 Ver. 02.02.00 
12/30/2020
kernel: RIP: 0010:amdgpu_device_vga_set_decode+0x13/0x30 [amdgpu]
kernel: Code: 06 31 c0 c3 b8 ea ff ff ff 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 
90 0f 1f 44 00 00 55 48 8b 87 90 06 00 00 48 89 e5 53 89 f3 <48> 8b 40 18 40 0f 
b6 f6 e8 40 58 39 fd 80 fb 01 5b 5d 19 c0 83 e0
kernel: RSP: 0018:ae3c0246bd68 EFLAGS: 00010002
kernel: RAX:  RBX:  RCX: 
kernel: RDX: 8dd1af5a8560 RSI:  RDI: 8dce8c16
kernel: RBP: ae3c0246bd70 R08: 8dd1af5985c0 R09: ae3c0246ba38
kernel: R10: 0001 R11: 0001 R12: 0246
kernel: R13:  R14: 0003 R15: 8dce8149
kernel: FS:  7f9303d8fa40() GS:8dd1af58() 
knlGS:
kernel: CS:  0010 DS:  ES:  CR0: 80050033
kernel: CR2: 0018 CR3: 000103cfa000 CR4: 00350ee0
kernel: Call Trace:
kernel:  vga_arbiter_notify_clients.part.0+0x4a/0x80
kernel:  vga_get+0x17f/0x1c0
kernel:  vga_arb_write+0x121/0x6a0
kernel:  ? apparmor_file_permission+0x1c/0x20
kernel:  ? security_file_permission+0x30/0x180
kernel:  vfs_write+0xca/0x280
kernel:  ksys_write+0x67/0xe0
kernel:  __x64_sys_write+0x1a/0x20
kernel:  do_syscall_64+0x38/0x90
kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
kernel: RIP: 0033:0x7f93041e02f7
kernel: Code: 75 05 48 83 c4 58 c3 e8 f7 33 ff ff 0f 1f 80 00 00 00 00 f3 0f 1e 
fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 
77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
kernel: RSP: 002b:7fff60e49b28 EFLAGS: 0246 ORIG_RAX: 0001
kernel: RAX: ffda RBX: 000b RCX: 7f93041e02f7
kernel: RDX: 000b RSI: 7fff60e49b40 RDI: 000f
kernel: RBP: 7fff60e49b40 R08:  R09: 7fff60e499d0
kernel: R10: 7f93049350b5 R11: 0246 R12: 56111d45e808
kernel: R13:  R14: 56111d45e7f8 R15: 56111d46c980
kernel: Modules linked in: nls_iso8859_1 snd_hda_codec_realtek 
snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel 
snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_seq 
input_leds snd_seq_device snd_timer snd soundcore joydev kvm_amd serio_raw 
k10temp mac_hid hp_wmi ccp kvm sparse_keymap wmi_bmof ucsi_acpi efi_pstore 
typec_ucsi rapl typec video wmi sch_fq_codel parport_pc ppdev lp parport 
ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 
async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor 
raid6_pq raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log 
hid_generic usbhid hid amdgpu drm_ttm_helper ttm iommu_v2 gpu_sched 
i2c_algo_bit drm_kms_helper syscopyarea sysfillrect crct10dif_pclmul sysimgblt 
crc32_pclmul fb_sys_fops ghash_clmulni_intel cec rc_core aesni_intel 
crypto_simd psmouse cryptd r8169 i2c_piix4 drm ahci xhci_pci realtek libahci 
xhci_pci_renesas gpio_amdpt gpio_generic
kernel: CR2: 0018
kernel: ---[ end trace 76d04313d4214c51 ]---

Commit 4192f7b57689 ("drm/amdgpu: unmap register bar on device init
failure") makes amdgpu_driver_unload_kms() skip amdgpu_device_fini(),
so the VGA clients remain registered. When
vga_arbiter_notify_clients() later iterates over the registered
clients, it hits a NULL pointer dereference.

Since there is no reason to register VGA clients that early, solve the
issue by moving their registration after all the goto cleanups.

v2:
 - Remove redundant vga_switcheroo cleanup in failed: label.

Fixes: 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure")
Signed-off-by: Kai-Heng Feng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 28 ++
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b4ad1c055c70..7d3b54615147 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3410,19 +3410,6 @@ int amdgpu_device_init(struct amdgpu_device *adev,
/* doorbell bar mapping and doorbell index init*/
amdgpu_device_doorbell_init(adev);
 
-   /
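As an illustration of the pattern the patch applies, here is a generic
sketch using made-up names (mydev_probe, framework_register_client,
...) rather than the amdgpu/vgaarb API: publish callbacks to a shared
framework only once probe can no longer fail, so the framework never
calls back into a device that bailed out half-initialized.

/* Generic sketch with hypothetical names; not the amdgpu or vgaarb API. */
#include <stdio.h>

struct mydev { int hw_ready; };

static int dev_init_hw(struct mydev *d)       { d->hw_ready = 1; return 0; }
static int dev_init_firmware(struct mydev *d) { (void)d; return -1; /* can fail */ }
static void dev_fini_hw(struct mydev *d)      { d->hw_ready = 0; }
static void framework_register_client(struct mydev *d) { (void)d; }

static int mydev_probe(struct mydev *d)
{
	int r;

	r = dev_init_hw(d);
	if (r)
		return r;

	r = dev_init_firmware(d);
	if (r)
		goto err_fini_hw;	/* nothing registered yet: plain unwind */

	/* Only now, when probe can no longer fail, become visible to the
	 * framework; its callbacks will never see a half-torn-down device.
	 */
	framework_register_client(d);
	return 0;

err_fini_hw:
	dev_fini_hw(d);
	return r;
}

int main(void)
{
	struct mydev d = { 0 };

	printf("probe: %d\n", mydev_probe(&d));
	return 0;
}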

[PATCH][next] drm/amdkfd: Fix spelling mistake "unregisterd" -> "unregistered"

2021-04-26 Thread Colin King
From: Colin Ian King 

There is a spelling mistake in a pr_debug message. Fix it.

Signed-off-by: Colin Ian King 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 4cc2539bed5b..e4ce97ab6e26 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2286,7 +2286,7 @@ svm_range_restore_pages(struct amdgpu_device *adev, 
unsigned int pasid,
}
prange = svm_range_create_unregistered_range(adev, p, mm, addr);
if (!prange) {
-   pr_debug("failed to create unregisterd range svms 0x%p 
address [0x%llx]\n",
+   pr_debug("failed to create unregistered range svms 0x%p 
address [0x%llx]\n",
 svms, addr);
mmap_write_downgrade(mm);
r = -EFAULT;
-- 
2.30.2



Re: [PATCH][next] drm/amdkfd: Fix spelling mistake "unregisterd" -> "unregistered"

2021-04-26 Thread Nirmoy

Reviewed-by: Nirmoy Das 

On 4/26/21 2:13 PM, Colin King wrote:

From: Colin Ian King 

There is a spelling mistake in a pr_debug message. Fix it.

Signed-off-by: Colin Ian King 
---
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 4cc2539bed5b..e4ce97ab6e26 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2286,7 +2286,7 @@ svm_range_restore_pages(struct amdgpu_device *adev, 
unsigned int pasid,
}
prange = svm_range_create_unregistered_range(adev, p, mm, addr);
if (!prange) {
-   pr_debug("failed to create unregisterd range svms 0x%p 
address [0x%llx]\n",
+   pr_debug("failed to create unregistered range svms 0x%p 
address [0x%llx]\n",
 svms, addr);
mmap_write_downgrade(mm);
r = -EFAULT;



Re: [PATCH] drm/amdgpu: restructure amdgpu_vram_mgr_new

2021-04-26 Thread Nirmoy

Acked-and-Tested-by: Nirmoy Das 

On 4/26/21 10:54 AM, Christian König wrote:

Merge the two loops, loosen the restriction for big allocations.
This reduces the CPU overhead in the good case, but increases
it a bit under memory pressure.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 58 +---
  1 file changed, 27 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 529c5c32a205..e2cbe19404c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -358,13 +358,13 @@ static int amdgpu_vram_mgr_new(struct 
ttm_resource_manager *man,
   const struct ttm_place *place,
   struct ttm_resource *mem)
  {
+   unsigned long lpfn, num_nodes, pages_per_node, pages_left, pages;
struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
struct amdgpu_device *adev = to_amdgpu_device(mgr);
+   uint64_t vis_usage = 0, mem_bytes, max_bytes;
struct drm_mm *mm = &mgr->mm;
-   struct drm_mm_node *nodes;
enum drm_mm_insert_mode mode;
-   unsigned long lpfn, num_nodes, pages_per_node, pages_left;
-   uint64_t vis_usage = 0, mem_bytes, max_bytes;
+   struct drm_mm_node *nodes;
unsigned i;
int r;
  
@@ -391,9 +391,10 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,

pages_per_node = HPAGE_PMD_NR;
  #else
/* default to 2MB */
-   pages_per_node = (2UL << (20UL - PAGE_SHIFT));
+   pages_per_node = 2UL << (20UL - PAGE_SHIFT);
  #endif
-   pages_per_node = max((uint32_t)pages_per_node, 
mem->page_alignment);
+   pages_per_node = max_t(uint32_t, pages_per_node,
+  mem->page_alignment);
num_nodes = DIV_ROUND_UP(mem->num_pages, pages_per_node);
}
  
@@ -411,42 +412,37 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,

mem->start = 0;
pages_left = mem->num_pages;
  
-	spin_lock(&mgr->lock);

-   for (i = 0; pages_left >= pages_per_node; ++i) {
-   unsigned long pages = rounddown_pow_of_two(pages_left);
-
-   /* Limit maximum size to 2GB due to SG table limitations */
-   pages = min(pages, (2UL << (30 - PAGE_SHIFT)));
+   /* Limit maximum size to 2GB due to SG table limitations */
+   pages = min(pages_left, 2UL << (30 - PAGE_SHIFT));
  
-		r = drm_mm_insert_node_in_range(mm, &nodes[i], pages,

-   pages_per_node, 0,
-   place->fpfn, lpfn,
-   mode);
-   if (unlikely(r))
-   break;
-
-   vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
-   amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
-   pages_left -= pages;
-   }
-
-   for (; pages_left; ++i) {
-   unsigned long pages = min(pages_left, pages_per_node);
+   i = 0;
+   spin_lock(&mgr->lock);
+   while (pages_left) {
uint32_t alignment = mem->page_alignment;
  
-		if (pages == pages_per_node)

+   if (pages >= pages_per_node)
alignment = pages_per_node;
  
-		r = drm_mm_insert_node_in_range(mm, &nodes[i],

-   pages, alignment, 0,
-   place->fpfn, lpfn,
-   mode);
-   if (unlikely(r))
+   r = drm_mm_insert_node_in_range(mm, &nodes[i], pages, alignment,
+   0, place->fpfn, lpfn, mode);
+   if (unlikely(r)) {
+   if (pages > pages_per_node) {
+   if (is_power_of_2(pages))
+   pages = pages / 2;
+   else
+   pages = rounddown_pow_of_two(pages);
+   continue;
+   }
goto error;
+   }
  
  		vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);

amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
pages_left -= pages;
+   ++i;
+
+   if (pages > pages_left)
+   pages = pages_left;
}
spin_unlock(&mgr->lock);
  



[PATCH] drm/amdgpu: restructure amdgpu_vram_mgr_new

2021-04-26 Thread Christian König
Merge the two loops, loosen the restriction for big allocations.
This reduces the CPU overhead in the good case, but increases
it a bit under memory pressure.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 58 +---
 1 file changed, 27 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 529c5c32a205..e2cbe19404c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -358,13 +358,13 @@ static int amdgpu_vram_mgr_new(struct 
ttm_resource_manager *man,
   const struct ttm_place *place,
   struct ttm_resource *mem)
 {
+   unsigned long lpfn, num_nodes, pages_per_node, pages_left, pages;
struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
struct amdgpu_device *adev = to_amdgpu_device(mgr);
+   uint64_t vis_usage = 0, mem_bytes, max_bytes;
struct drm_mm *mm = &mgr->mm;
-   struct drm_mm_node *nodes;
enum drm_mm_insert_mode mode;
-   unsigned long lpfn, num_nodes, pages_per_node, pages_left;
-   uint64_t vis_usage = 0, mem_bytes, max_bytes;
+   struct drm_mm_node *nodes;
unsigned i;
int r;
 
@@ -391,9 +391,10 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,
pages_per_node = HPAGE_PMD_NR;
 #else
/* default to 2MB */
-   pages_per_node = (2UL << (20UL - PAGE_SHIFT));
+   pages_per_node = 2UL << (20UL - PAGE_SHIFT);
 #endif
-   pages_per_node = max((uint32_t)pages_per_node, 
mem->page_alignment);
+   pages_per_node = max_t(uint32_t, pages_per_node,
+  mem->page_alignment);
num_nodes = DIV_ROUND_UP(mem->num_pages, pages_per_node);
}
 
@@ -411,42 +412,37 @@ static int amdgpu_vram_mgr_new(struct 
ttm_resource_manager *man,
mem->start = 0;
pages_left = mem->num_pages;
 
-   spin_lock(&mgr->lock);
-   for (i = 0; pages_left >= pages_per_node; ++i) {
-   unsigned long pages = rounddown_pow_of_two(pages_left);
-
-   /* Limit maximum size to 2GB due to SG table limitations */
-   pages = min(pages, (2UL << (30 - PAGE_SHIFT)));
+   /* Limit maximum size to 2GB due to SG table limitations */
+   pages = min(pages_left, 2UL << (30 - PAGE_SHIFT));
 
-   r = drm_mm_insert_node_in_range(mm, &nodes[i], pages,
-   pages_per_node, 0,
-   place->fpfn, lpfn,
-   mode);
-   if (unlikely(r))
-   break;
-
-   vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
-   amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
-   pages_left -= pages;
-   }
-
-   for (; pages_left; ++i) {
-   unsigned long pages = min(pages_left, pages_per_node);
+   i = 0;
+   spin_lock(&mgr->lock);
+   while (pages_left) {
uint32_t alignment = mem->page_alignment;
 
-   if (pages == pages_per_node)
+   if (pages >= pages_per_node)
alignment = pages_per_node;
 
-   r = drm_mm_insert_node_in_range(mm, &nodes[i],
-   pages, alignment, 0,
-   place->fpfn, lpfn,
-   mode);
-   if (unlikely(r))
+   r = drm_mm_insert_node_in_range(mm, &nodes[i], pages, alignment,
+   0, place->fpfn, lpfn, mode);
+   if (unlikely(r)) {
+   if (pages > pages_per_node) {
+   if (is_power_of_2(pages))
+   pages = pages / 2;
+   else
+   pages = rounddown_pow_of_two(pages);
+   continue;
+   }
goto error;
+   }
 
vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
pages_left -= pages;
+   ++i;
+
+   if (pages > pages_left)
+   pages = pages_left;
}
spin_unlock(&mgr->lock);
 
-- 
2.25.1
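To make the retry strategy of the merged loop concrete, here is a
simplified, allocator-agnostic sketch: start with the largest chunk
allowed (capped for SG-table reasons), and when a chunk does not fit,
shrink it to the next power of two (or halve it if it already is one)
instead of failing, until the per-node granularity is reached.
try_alloc() is a stand-in for drm_mm_insert_node_in_range(), not the
real DRM API.

/* Sketch only; try_alloc() stands in for drm_mm_insert_node_in_range(). */
#include <stdbool.h>
#include <stdio.h>

static unsigned long rounddown_pow_of_two(unsigned long n)
{
	unsigned long p = 1;

	while (p <= n / 2)
		p *= 2;
	return p;
}

static bool try_alloc(unsigned long pages)
{
	/* Pretend the space is fragmented: nothing larger than 4096 pages
	 * fits contiguously.
	 */
	return pages <= 4096;
}

int main(void)
{
	const unsigned long pages_per_node = 512;   /* 2MB of 4K pages */
	const unsigned long max_chunk = 524288;     /* 2GB of 4K pages */
	unsigned long pages_left = 10000;
	unsigned long pages = pages_left < max_chunk ? pages_left : max_chunk;

	while (pages_left) {
		if (!try_alloc(pages)) {
			if (pages > pages_per_node) {
				/* Shrink the chunk and retry rather than fail. */
				pages = (pages & (pages - 1)) ?
					rounddown_pow_of_two(pages) : pages / 2;
				continue;
			}
			fprintf(stderr, "out of space\n");
			return 1;
		}
		printf("placed %lu pages\n", pages);
		pages_left -= pages;
		if (pages > pages_left)
			pages = pages_left;
	}
	return 0;
}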



Re: [PATCH] drm/amdgpu: Added missing prototype

2021-04-26 Thread Souptick Joarder
On Sat, Apr 24, 2021 at 5:03 AM Felix Kuehling  wrote:
>
> On 2021-04-23 at 5:39 p.m., Souptick Joarder wrote:
> > Kernel test robot throws below warning ->
> >
> >>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:125:5: warning:
> >>> no previous prototype for 'kgd_arcturus_hqd_sdma_load'
> >>> [-Wmissing-prototypes]
> >  125 | int kgd_arcturus_hqd_sdma_load(struct kgd_dev *kgd, void
> > *mqd,
> >
> >>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:227:6: warning:
> >>> no previous prototype for 'kgd_arcturus_hqd_sdma_is_occupied'
> >>> [-Wmissing-prototypes]
> >  227 | bool kgd_arcturus_hqd_sdma_is_occupied(struct kgd_dev *kgd,
> > void *mqd)
> >  |  ^
> >>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:246:5: warning:
> >>> no previous prototype for 'kgd_arcturus_hqd_sdma_destroy'
> >>> [-Wmissing-prototypes]
> >  246 | int kgd_arcturus_hqd_sdma_destroy(struct kgd_dev *kgd, void
> > *mqd,
> >  | ^
> >
> > Added prototype for these functions.
> The prototypes are already defined in amdgpu_amdkfd_arcturus.h. I think
> we just need to add a #include in amdgpu_amdkfd_arcturus.c.
>

Right. I missed that file. Will send v2.

> Regards,
>   Felix
>
>
> >
> > Reported-by: kernel test robot 
> > Signed-off-by: Souptick Joarder 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu.h | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > index dc3a692..8fff0e7 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > @@ -316,6 +316,11 @@ int amdgpu_device_ip_wait_for_idle(struct 
> > amdgpu_device *adev,
> >  enum amd_ip_block_type block_type);
> >  bool amdgpu_device_ip_is_idle(struct amdgpu_device *adev,
> > enum amd_ip_block_type block_type);
> > +int kgd_arcturus_hqd_sdma_load(struct kgd_dev *kgd, void *mqd,
> > + uint32_t __user *wptr, struct mm_struct *mm);
> > +bool kgd_arcturus_hqd_sdma_is_occupied(struct kgd_dev *kgd, void *mqd);
> > +int kgd_arcturus_hqd_sdma_destroy(struct kgd_dev *kgd, void *mqd,
> > + unsigned int utimeout);
> >
> >  #define AMDGPU_MAX_IP_NUM 16
> >


[PATCH v2] drm/amdgpu: Added missing prototype

2021-04-26 Thread Souptick Joarder
Kernel test robot reports the following warnings:

>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:125:5: warning:
>> no previous prototype for 'kgd_arcturus_hqd_sdma_load'
>> [-Wmissing-prototypes]
 125 | int kgd_arcturus_hqd_sdma_load(struct kgd_dev *kgd, void
*mqd,

>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:195:5: warning:
>> no previous prototype for 'kgd_arcturus_hqd_sdma_dump'
>> [-Wmissing-prototypes]
 195 | int kgd_arcturus_hqd_sdma_dump(struct kgd_dev *kgd,

>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:227:6: warning:
>> no previous prototype for 'kgd_arcturus_hqd_sdma_is_occupied'
>> [-Wmissing-prototypes]
 227 | bool kgd_arcturus_hqd_sdma_is_occupied(struct kgd_dev *kgd,
void *mqd)
 |  ^
>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:246:5: warning:
>> no previous prototype for 'kgd_arcturus_hqd_sdma_destroy'
>> [-Wmissing-prototypes]
 246 | int kgd_arcturus_hqd_sdma_destroy(struct kgd_dev *kgd, void
*mqd,
 | ^

Fix the warnings by including the header that already declares these functions.

Reported-by: kernel test robot 
Signed-off-by: Souptick Joarder 
---
v2:
Added header which has the function declarations.

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
index 9ef9f3d..6409d6b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
@@ -25,6 +25,7 @@
 #include 
 #include "amdgpu.h"
 #include "amdgpu_amdkfd.h"
+#include "amdgpu_amdkfd_arcturus.h"
 #include "sdma0/sdma0_4_2_2_offset.h"
 #include "sdma0/sdma0_4_2_2_sh_mask.h"
 #include "sdma1/sdma1_4_2_2_offset.h"
-- 
1.9.1
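To illustrate why the include is enough, here is a minimal single-file
sketch with a made-up function name rather than the amdgpu code:
-Wmissing-prototypes fires for any non-static function defined without
a previously seen declaration, and pulling in the header that declares
it both silences the warning and lets the compiler check that the
declaration and definition agree.

/* Sketch with a made-up function; the prototype line plays the role of
 * the #include "amdgpu_amdkfd_arcturus.h" added by the patch.
 */
#include <stdio.h>

/* What the header would provide.  Without this declaration, building
 * with -Wmissing-prototypes warns: "no previous prototype for
 * 'foo_compute'".
 */
int foo_compute(int x);

int foo_compute(int x)
{
	return x * 2;
}

int main(void)
{
	printf("%d\n", foo_compute(21));
	return 0;
}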
