Re: [PATCH v10 03/14] drm/amdgpu: add new IOCTL for usermode queue

2024-05-02 Thread Sharma, Shashank



On 02/05/2024 23:25, Alex Deucher wrote:

On Thu, May 2, 2024 at 1:27 PM Shashank Sharma  wrote:

This patch adds:
- A new IOCTL function to create and destroy
- A new structure to keep all the user queue data in one place.
- A function to generate unique index for the queue.

V1: Worked on review comments from RFC patch series:
   - Alex: Keep a list of queues, instead of single queue per process.
   - Christian: Use the queue manager instead of global ptrs,
Don't keep the queue structure in amdgpu_ctx

V2: Worked on review comments:
  - Christian:
- Formatting of text
- There is no need for queuing of userqueues, with idr in place
  - Alex:
- Remove use_doorbell, its unnecessary
- Reuse amdgpu_mqd_props for saving mqd fields

  - Code formatting and re-arrangement

V3:
  - Integration with doorbell manager

V4:
  - Accommodate MQD union related changes in UAPI (Alex)
  - Do not set the queue size twice (Bas)

V5:
  - Remove wrapper functions for queue indexing (Christian)
  - Do not save the queue id/idr in queue itself (Christian)
  - Move the idr allocation in the IP independent generic space
   (Christian)

V6:
  - Check the validity of input IP type (Christian)

V7:
  - Move uq_func from uq_mgr to adev (Alex)
  - Add missing free(queue) for error cases (Yifan)

V9:
  - Rebase

V10: Addressed review comments from Christian, and added R-B:
  - Do not initialize the local variable
  - Convert DRM_ERROR to DEBUG.

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Christian Koenig 
Signed-off-by: Shashank Sharma 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |   1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 121 ++
  .../gpu/drm/amd/include/amdgpu_userqueue.h|   2 +
  3 files changed, 124 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index b52442e2d04a..551e13693100 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2929,6 +2929,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
 DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
 DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
 DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
  };

  static const struct drm_driver amdgpu_kms_driver = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index effc0c7c02cf..ce9b25b82e94 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -23,6 +23,127 @@
   */

  #include "amdgpu.h"
+#include "amdgpu_vm.h"
+#include "amdgpu_userqueue.h"
+
+static struct amdgpu_usermode_queue *
+amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
+{
+   return idr_find(&uq_mgr->userq_idr, qid);
+}
+
+static int
+amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
+{
+   struct amdgpu_fpriv *fpriv = filp->driver_priv;
+   struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+   struct amdgpu_device *adev = uq_mgr->adev;
+   const struct amdgpu_userq_funcs *uq_funcs;
+   struct amdgpu_usermode_queue *queue;
+
+   mutex_lock(&uq_mgr->userq_mutex);
+
+   queue = amdgpu_userqueue_find(uq_mgr, queue_id);
+   if (!queue) {
+   DRM_DEBUG_DRIVER("Invalid queue id to destroy\n");
+   mutex_unlock(&uq_mgr->userq_mutex);
+   return -EINVAL;
+   }
+
+   uq_funcs = adev->userq_funcs[queue->queue_type];
+   uq_funcs->mqd_destroy(uq_mgr, queue);
+   idr_remove(&uq_mgr->userq_idr, queue_id);
+   kfree(queue);
+
+   mutex_unlock(&uq_mgr->userq_mutex);
+   return 0;
+}
+
+static int
+amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
+{
+   struct amdgpu_fpriv *fpriv = filp->driver_priv;
+   struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+   struct amdgpu_device *adev = uq_mgr->adev;
+   const struct amdgpu_userq_funcs *uq_funcs;
+   struct amdgpu_usermode_queue *queue;
+   int qid, r = 0;
+
+   /* Usermode queues are only supported for GFX/SDMA engines as of now */
+   if (args->in.ip_type != AMDGPU_HW_IP_GFX) {
+   DRM_ERROR("Usermode queue doesn't support IP type %u\n", 
args->in.ip_type);
+   return -EINVAL;
+   }
+
+   mutex_lock(&uq_mgr->userq_mutex);
+
+   uq_funcs = adev->userq_funcs[args->in.ip_type];
+   if (!uq_funcs) {
+   DRM_ERROR("Usermode queue is not supported for this IP (%u)\n", 
args->in.ip_type);
+   r = -EINVAL;
+   goto unlock;
+   }
+
+   queue = kzalloc(sizeof(struct amdgpu_usermode_queue), GFP_KERNEL);
+   if (!queue) {
+   DRM_ERROR("Failed to allocate memory 

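For context, the "function to generate unique index for the queue" mentioned in the commit message above is built on the kernel's idr allocator, protected by the per-file userq_mutex. A minimal sketch of that pattern (illustrative only; the field names mirror those visible in the quoted hunks, and the upper bound is an arbitrary value chosen for the sketch):

```c
#include <linux/idr.h>
#include <linux/mutex.h>
#include <linux/gfp.h>

#define EXAMPLE_MAX_USERQ	128	/* arbitrary bound, for the sketch only */

struct example_userq_mgr {
	struct idr	userq_idr;	/* qid -> queue pointer */
	struct mutex	userq_mutex;	/* protects the idr */
};

/* Create path: hand back a small, unique queue id for @queue. */
static int example_userq_index(struct example_userq_mgr *uq_mgr, void *queue)
{
	int qid;

	mutex_lock(&uq_mgr->userq_mutex);
	qid = idr_alloc(&uq_mgr->userq_idr, queue, 1, EXAMPLE_MAX_USERQ, GFP_KERNEL);
	mutex_unlock(&uq_mgr->userq_mutex);

	/* Negative errno on failure; otherwise the qid returned to userspace.
	 * The destroy path later looks the pointer up with idr_find() and
	 * drops it with idr_remove(), as in the quoted amdgpu_userqueue_destroy().
	 */
	return qid;
}
```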
Re: [PATCH v3 2/3] drm/amdgpu: Reduce mem_type to domain double indirection

2024-05-02 Thread Felix Kuehling



On 2024-04-30 13:16, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

All memory domains apart from AMDGPU_GEM_DOMAIN_GTT map 1:1 to TTM
placements, and the former can be either AMDGPU_PL_PREEMPT or TTM_PL_TT,
depending on AMDGPU_GEM_CREATE_PREEMPTIBLE.

Simplify a few places in the code which convert the TTM placement into
a domain by checking against the current placement directly.

In the conversion, AMDGPU_PL_PREEMPT does not have to be handled
because amdgpu_mem_type_to_domain() cannot return that value anyway.

v2:
  * Remove AMDGPU_PL_PREEMPT handling.

v3:
  * Rebase.

Signed-off-by: Tvrtko Ursulin 
Reviewed-by: Christian König  # v1
Reviewed-by: Felix Kuehling  # v2


I'm waiting for Christian to review patches 1 and 3. Then I can apply 
the whole series.


Regards,
  Felix



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c |  3 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  | 29 +
  2 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index 055ba2ea4c12..0b3b10d21952 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -165,8 +165,7 @@ static struct sg_table *amdgpu_dma_buf_map(struct 
dma_buf_attachment *attach,
if (r)
return ERR_PTR(r);
  
-	} else if (!(amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type) &

-AMDGPU_GEM_DOMAIN_GTT)) {
+   } else if (bo->tbo.resource->mem_type != TTM_PL_TT) {
return ERR_PTR(-EBUSY);
}
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c

index b2a83c802bbd..c581e4952cbd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -983,12 +983,11 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 
domain,
  
	ttm_bo_pin(&bo->tbo);
  
-	domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);

-   if (domain == AMDGPU_GEM_DOMAIN_VRAM) {
+   if (bo->tbo.resource->mem_type == TTM_PL_VRAM) {
	atomic64_add(amdgpu_bo_size(bo), &adev->vram_pin_size);
	atomic64_add(amdgpu_vram_mgr_bo_visible_size(bo),
		     &adev->visible_pin_size);
-   } else if (domain == AMDGPU_GEM_DOMAIN_GTT) {
+   } else if (bo->tbo.resource->mem_type == TTM_PL_TT) {
	atomic64_add(amdgpu_bo_size(bo), &adev->gart_pin_size);
}
  
@@ -1289,7 +1288,6 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,

struct ttm_resource *res = bo->tbo.resource;
uint64_t size = amdgpu_bo_size(bo);
struct drm_gem_object *obj;
-   unsigned int domain;
bool shared;
  
  	/* Abort if the BO doesn't currently have a backing store */

@@ -1299,21 +1297,20 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
	obj = &bo->tbo.base;
shared = drm_gem_object_is_shared_for_memory_stats(obj);
  
-	domain = amdgpu_mem_type_to_domain(res->mem_type);

-   switch (domain) {
-   case AMDGPU_GEM_DOMAIN_VRAM:
+   switch (res->mem_type) {
+   case TTM_PL_VRAM:
stats->vram += size;
-   if (amdgpu_res_cpu_visible(adev, bo->tbo.resource))
+   if (amdgpu_res_cpu_visible(adev, res))
stats->visible_vram += size;
if (shared)
stats->vram_shared += size;
break;
-   case AMDGPU_GEM_DOMAIN_GTT:
+   case TTM_PL_TT:
stats->gtt += size;
if (shared)
stats->gtt_shared += size;
break;
-   case AMDGPU_GEM_DOMAIN_CPU:
+   case TTM_PL_SYSTEM:
default:
stats->cpu += size;
if (shared)
@@ -1326,7 +1323,7 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
if (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)
stats->requested_visible_vram += size;
  
-		if (domain != AMDGPU_GEM_DOMAIN_VRAM) {

+   if (res->mem_type != TTM_PL_VRAM) {
stats->evicted_vram += size;
if (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)
stats->evicted_visible_vram += size;
@@ -1600,20 +1597,18 @@ u64 amdgpu_bo_print_info(int id, struct amdgpu_bo *bo, 
struct seq_file *m)
u64 size;
  
  	if (dma_resv_trylock(bo->tbo.base.resv)) {

-   unsigned int domain;
  
-		domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);

-   switch (domain) {
-   case AMDGPU_GEM_DOMAIN_VRAM:
+   switch (bo->tbo.resource->mem_type) {
+   case TTM_PL_VRAM:
if (amdgpu_res_cpu_visible(adev, bo->tbo.resource))
placement = "VRAM VISIBLE";
else

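To make the "1:1 mapping" claim in the commit message concrete, the relationship the patch stops re-deriving in the hot paths looks roughly like this. This is only a sketch, not a copy of amdgpu_mem_type_to_domain(); the real helper also covers the GDS/GWS/OA placements, which are omitted here.

```c
#include <drm/ttm/ttm_placement.h>	/* TTM_PL_* */
#include <drm/amdgpu_drm.h>		/* AMDGPU_GEM_DOMAIN_* */

/* Sketch: TTM placement -> amdgpu GEM domain, per the commit message above.
 * Only GTT is ambiguous in the other direction: a GTT-domain BO may sit in
 * either TTM_PL_TT or AMDGPU_PL_PREEMPT, depending on
 * AMDGPU_GEM_CREATE_PREEMPTIBLE.
 */
static unsigned int example_mem_type_to_domain(unsigned int mem_type)
{
	switch (mem_type) {
	case TTM_PL_VRAM:
		return AMDGPU_GEM_DOMAIN_VRAM;
	case TTM_PL_TT:
		return AMDGPU_GEM_DOMAIN_GTT;
	case TTM_PL_SYSTEM:
		return AMDGPU_GEM_DOMAIN_CPU;
	default:
		return 0;
	}
}
```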
[PATCH 3/3] drm/amdgpu/gfx11: enable gfx pipe1 hardware support

2024-05-02 Thread Alex Deucher
Enable gfx pipe1 hardware support.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 75157e0196d22..de15abc6a0351 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -50,7 +50,7 @@
 #include "nbio_v4_3.h"
 #include "mes_v11_0.h"
 
-#define GFX11_NUM_GFX_RINGS	1
+#define GFX11_NUM_GFX_RINGS	2
 #define GFX11_MEC_HPD_SIZE 2048
 
 #define RLCG_UCODE_LOADING_START_ADDRESS   0x2000L
@@ -1341,7 +1341,7 @@ static int gfx_v11_0_sw_init(void *handle)
case IP_VERSION(11, 0, 2):
case IP_VERSION(11, 0, 3):
adev->gfx.me.num_me = 1;
-   adev->gfx.me.num_pipe_per_me = 1;
+   adev->gfx.me.num_pipe_per_me = 2;
adev->gfx.me.num_queue_per_pipe = 1;
adev->gfx.mec.num_mec = 2;
adev->gfx.mec.num_pipe_per_mec = 4;
@@ -1352,7 +1352,7 @@ static int gfx_v11_0_sw_init(void *handle)
case IP_VERSION(11, 5, 0):
case IP_VERSION(11, 5, 1):
adev->gfx.me.num_me = 1;
-   adev->gfx.me.num_pipe_per_me = 1;
+   adev->gfx.me.num_pipe_per_me = 2;
adev->gfx.me.num_queue_per_pipe = 1;
adev->gfx.mec.num_mec = 1;
adev->gfx.mec.num_pipe_per_mec = 4;
-- 
2.44.0



[PATCH 2/3] drm/amdgpu/gfx11: handle priority setup for gfx pipe1

2024-05-02 Thread Alex Deucher
Set up pipe1 as a high priority queue.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 36 ++
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 81a35d0f0a58e..75157e0196d22 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -929,9 +929,9 @@ static int gfx_v11_0_gpu_early_init(struct amdgpu_device 
*adev)
 static int gfx_v11_0_gfx_ring_init(struct amdgpu_device *adev, int ring_id,
   int me, int pipe, int queue)
 {
-   int r;
struct amdgpu_ring *ring;
unsigned int irq_type;
+   unsigned int hw_prio;
 
	ring = &adev->gfx.gfx_ring[ring_id];
 
@@ -950,11 +950,10 @@ static int gfx_v11_0_gfx_ring_init(struct amdgpu_device 
*adev, int ring_id,
sprintf(ring->name, "gfx_%d.%d.%d", ring->me, ring->pipe, ring->queue);
 
irq_type = AMDGPU_CP_IRQ_GFX_ME0_PIPE0_EOP + ring->pipe;
-   r = amdgpu_ring_init(adev, ring, 1024, &adev->gfx.eop_irq, irq_type,
-AMDGPU_RING_PRIO_DEFAULT, NULL);
-   if (r)
-   return r;
-   return 0;
+   hw_prio = amdgpu_gfx_is_high_priority_graphics_queue(adev, ring) ?
+   AMDGPU_GFX_PIPE_PRIO_HIGH : AMDGPU_GFX_PIPE_PRIO_NORMAL;
+   return amdgpu_ring_init(adev, ring, 1024, &adev->gfx.eop_irq, irq_type,
+   hw_prio, NULL);
 }
 
 static int gfx_v11_0_compute_ring_init(struct amdgpu_device *adev, int ring_id,
@@ -3615,6 +3614,24 @@ static void gfx_v11_0_cp_set_doorbell_range(struct 
amdgpu_device *adev)
 (adev->doorbell_index.userqueue_end * 2) << 2);
 }
 
+static void gfx_v11_0_gfx_mqd_set_priority(struct amdgpu_device *adev,
+  struct v11_gfx_mqd *mqd,
+  struct amdgpu_mqd_prop *prop)
+{
+   bool priority = 0;
+   u32 tmp;
+
+   /* set up default queue priority level
+* 0x0 = low priority, 0x1 = high priority
+*/
+   if (prop->hqd_pipe_priority == AMDGPU_GFX_PIPE_PRIO_HIGH)
+   priority = 1;
+
+   tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUEUE_PRIORITY);
+   tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUEUE_PRIORITY, PRIORITY_LEVEL, 
priority);
+   mqd->cp_gfx_hqd_queue_priority = tmp;
+}
+
 static int gfx_v11_0_gfx_mqd_init(struct amdgpu_device *adev, void *m,
  struct amdgpu_mqd_prop *prop)
 {
@@ -3643,11 +3660,8 @@ static int gfx_v11_0_gfx_mqd_init(struct amdgpu_device 
*adev, void *m,
tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_VMID, VMID, 0);
mqd->cp_gfx_hqd_vmid = 0;
 
-   /* set up default queue priority level
-* 0x0 = low priority, 0x1 = high priority */
-   tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUEUE_PRIORITY);
-   tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUEUE_PRIORITY, PRIORITY_LEVEL, 0);
-   mqd->cp_gfx_hqd_queue_priority = tmp;
+   /* set up gfx queue priority */
+   gfx_v11_0_gfx_mqd_set_priority(adev, mqd, prop);
 
/* set up time quantum */
tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUANTUM);
-- 
2.44.0



[PATCH 1/3] drm/amdgpu/gfx11: select HDP ref/mask according to gfx ring pipe

2024-05-02 Thread Alex Deucher
Use the correct ref/mask for different gfx ring pipes. Ported from
ZhenGuo's patch for gfx10.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index ad6431013c738..81a35d0f0a58e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -5293,7 +5293,7 @@ static void gfx_v11_0_ring_emit_hdp_flush(struct 
amdgpu_ring *ring)
}
reg_mem_engine = 0;
} else {
-   ref_and_mask = nbio_hf_reg->ref_and_mask_cp0;
+   ref_and_mask = nbio_hf_reg->ref_and_mask_cp0 << ring->pipe;
reg_mem_engine = 1; /* pfp */
}
 
-- 
2.44.0


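The one-line change above works because the CPn "HDP flush done" ref/mask bits are assumed to be adjacent in the register, so shifting the CP0 bit by the pipe index selects the bit for that pipe (the same trick as ZhenGuo's gfx10 patch). A small stand-alone illustration; the bit position is made up, since the real value comes from the nbio ref_and_mask table:

```c
#include <stdio.h>

/* Made-up position: pretend CP0's flush-done bit is bit 5. The assumption the
 * patch relies on is that CP1, CP2, ... occupy the immediately following bits.
 */
#define EXAMPLE_REF_AND_MASK_CP0	(1u << 5)

static unsigned int example_hdp_ref_mask(unsigned int pipe)
{
	return EXAMPLE_REF_AND_MASK_CP0 << pipe;
}

int main(void)
{
	/* pipe 0 -> CP0's bit, pipe 1 -> CP1's bit */
	printf("pipe0=0x%x pipe1=0x%x\n",
	       example_hdp_ref_mask(0), example_hdp_ref_mask(1));
	return 0;
}
```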

[PATCH] drm/amdgpu: drop MES 10.1 support

2024-05-02 Thread Alex Deucher
It was an enablement vehicle for MES 11 and was never
productized.  Remove it.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c |   20 -
 drivers/gpu/drm/amd/amdgpu/mes_v10_1.c| 1189 -
 drivers/gpu/drm/amd/amdgpu/mes_v10_1.h|   29 -
 drivers/gpu/drm/amd/amdgpu/nv.c   |1 -
 5 files changed, 1240 deletions(-)
 delete mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
 delete mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v10_1.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index de7b76327f5ba..6e1237a97a91e 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -187,7 +187,6 @@ amdgpu-y += \
 # add MES block
 amdgpu-y += \
amdgpu_mes.o \
-   mes_v10_1.o \
mes_v11_0.o \
mes_v12_0.o
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index ece462f8a324b..20887fd5a3342 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -94,7 +94,6 @@
 #include "vcn_v4_0_5.h"
 #include "jpeg_v4_0_5.h"
 #include "amdgpu_vkms.h"
-#include "mes_v10_1.h"
 #include "mes_v11_0.h"
 #include "mes_v12_0.h"
 #include "smuio_v11_0.h"
@@ -2213,25 +2212,6 @@ static int amdgpu_discovery_set_mm_ip_blocks(struct 
amdgpu_device *adev)
 static int amdgpu_discovery_set_mes_ip_blocks(struct amdgpu_device *adev)
 {
switch (amdgpu_ip_version(adev, GC_HWIP, 0)) {
-   case IP_VERSION(10, 1, 10):
-   case IP_VERSION(10, 1, 1):
-   case IP_VERSION(10, 1, 2):
-   case IP_VERSION(10, 1, 3):
-   case IP_VERSION(10, 1, 4):
-   case IP_VERSION(10, 3, 0):
-   case IP_VERSION(10, 3, 1):
-   case IP_VERSION(10, 3, 2):
-   case IP_VERSION(10, 3, 3):
-   case IP_VERSION(10, 3, 4):
-   case IP_VERSION(10, 3, 5):
-   case IP_VERSION(10, 3, 6):
-   if (amdgpu_mes) {
-   amdgpu_device_ip_block_add(adev, &mes_v10_1_ip_block);
-   adev->enable_mes = true;
-   if (amdgpu_mes_kiq)
-   adev->enable_mes_kiq = true;
-   }
-   break;
case IP_VERSION(11, 0, 0):
case IP_VERSION(11, 0, 1):
case IP_VERSION(11, 0, 2):
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
deleted file mode 100644
index a626bf9049260..0
--- a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
+++ /dev/null
@@ -1,1189 +0,0 @@
-/*
- * Copyright 2019 Advanced Micro Devices, Inc.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
- * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
- * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
- * OTHER DEALINGS IN THE SOFTWARE.
- *
- */
-
-#include 
-#include 
-#include "amdgpu.h"
-#include "soc15_common.h"
-#include "nv.h"
-#include "gc/gc_10_1_0_offset.h"
-#include "gc/gc_10_1_0_sh_mask.h"
-#include "gc/gc_10_1_0_default.h"
-#include "v10_structs.h"
-#include "mes_api_def.h"
-
-#define mmCP_MES_IC_OP_CNTL_Sienna_Cichlid   0x2820
-#define mmCP_MES_IC_OP_CNTL_Sienna_Cichlid_BASE_IDX  1
-#define mmRLC_CP_SCHEDULERS_Sienna_Cichlid 0x4ca1
-#define mmRLC_CP_SCHEDULERS_Sienna_Cichlid_BASE_IDX1
-
-MODULE_FIRMWARE("amdgpu/navi10_mes.bin");
-MODULE_FIRMWARE("amdgpu/sienna_cichlid_mes.bin");
-MODULE_FIRMWARE("amdgpu/sienna_cichlid_mes1.bin");
-
-static int mes_v10_1_hw_fini(void *handle);
-static int mes_v10_1_kiq_hw_init(struct amdgpu_device *adev);
-
-#define MES_EOP_SIZE   2048
-
-static void mes_v10_1_ring_set_wptr(struct amdgpu_ring *ring)
-{
-   struct amdgpu_device *adev = ring->adev;
-
-   if (ring->use_doorbell) {
-   atomic64_set((atomic64_t *)ring->wptr_cpu_addr,
-ring->wptr);
-   WDOORBELL64(ring->doorbell_index, ring->wptr);
-   } else {
-   BUG();
-   }
-}
-
-static 

Re: [PATCH v10 03/14] drm/amdgpu: add new IOCTL for usermode queue

2024-05-02 Thread Alex Deucher
On Thu, May 2, 2024 at 1:27 PM Shashank Sharma  wrote:
>
> This patch adds:
> - A new IOCTL function to create and destroy
> - A new structure to keep all the user queue data in one place.
> - A function to generate unique index for the queue.
>
> V1: Worked on review comments from RFC patch series:
>   - Alex: Keep a list of queues, instead of single queue per process.
>   - Christian: Use the queue manager instead of global ptrs,
>Don't keep the queue structure in amdgpu_ctx
>
> V2: Worked on review comments:
>  - Christian:
>- Formatting of text
>- There is no need for queuing of userqueues, with idr in place
>  - Alex:
>- Remove use_doorbell, its unnecessary
>- Reuse amdgpu_mqd_props for saving mqd fields
>
>  - Code formatting and re-arrangement
>
> V3:
>  - Integration with doorbell manager
>
> V4:
>  - Accommodate MQD union related changes in UAPI (Alex)
>  - Do not set the queue size twice (Bas)
>
> V5:
>  - Remove wrapper functions for queue indexing (Christian)
>  - Do not save the queue id/idr in queue itself (Christian)
>  - Move the idr allocation in the IP independent generic space
>   (Christian)
>
> V6:
>  - Check the validity of input IP type (Christian)
>
> V7:
>  - Move uq_func from uq_mgr to adev (Alex)
>  - Add missing free(queue) for error cases (Yifan)
>
> V9:
>  - Rebase
>
> V10: Addressed review comments from Christian, and added R-B:
>  - Do not initialize the local variable
>  - Convert DRM_ERROR to DEBUG.
>
> Cc: Alex Deucher 
> Cc: Christian Koenig 
> Reviewed-by: Christian Koenig 
> Signed-off-by: Shashank Sharma 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |   1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 121 ++
>  .../gpu/drm/amd/include/amdgpu_userqueue.h|   2 +
>  3 files changed, 124 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index b52442e2d04a..551e13693100 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2929,6 +2929,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
> DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, 
> DRM_AUTH|DRM_RENDER_ALLOW),
> DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, 
> DRM_AUTH|DRM_RENDER_ALLOW),
> DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, 
> DRM_AUTH|DRM_RENDER_ALLOW),
> +   DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, 
> DRM_AUTH|DRM_RENDER_ALLOW),
>  };
>
>  static const struct drm_driver amdgpu_kms_driver = {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index effc0c7c02cf..ce9b25b82e94 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -23,6 +23,127 @@
>   */
>
>  #include "amdgpu.h"
> +#include "amdgpu_vm.h"
> +#include "amdgpu_userqueue.h"
> +
> +static struct amdgpu_usermode_queue *
> +amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
> +{
> +   return idr_find(&uq_mgr->userq_idr, qid);
> +}
> +
> +static int
> +amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
> +{
> +   struct amdgpu_fpriv *fpriv = filp->driver_priv;
> +   struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
> +   struct amdgpu_device *adev = uq_mgr->adev;
> +   const struct amdgpu_userq_funcs *uq_funcs;
> +   struct amdgpu_usermode_queue *queue;
> +
> +   mutex_lock(&uq_mgr->userq_mutex);
> +
> +   queue = amdgpu_userqueue_find(uq_mgr, queue_id);
> +   if (!queue) {
> +   DRM_DEBUG_DRIVER("Invalid queue id to destroy\n");
> +   mutex_unlock(&uq_mgr->userq_mutex);
> +   return -EINVAL;
> +   }
> +
> +   uq_funcs = adev->userq_funcs[queue->queue_type];
> +   uq_funcs->mqd_destroy(uq_mgr, queue);
> +   idr_remove(&uq_mgr->userq_idr, queue_id);
> +   kfree(queue);
> +
> +   mutex_unlock(&uq_mgr->userq_mutex);
> +   return 0;
> +}
> +
> +static int
> +amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
> +{
> +   struct amdgpu_fpriv *fpriv = filp->driver_priv;
> +   struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
> +   struct amdgpu_device *adev = uq_mgr->adev;
> +   const struct amdgpu_userq_funcs *uq_funcs;
> +   struct amdgpu_usermode_queue *queue;
> +   int qid, r = 0;
> +
> +   /* Usermode queues are only supported for GFX/SDMA engines as of now 
> */
> +   if (args->in.ip_type != AMDGPU_HW_IP_GFX) {
> +   DRM_ERROR("Usermode queue doesn't support IP type %u\n", 
> args->in.ip_type);
> +   return -EINVAL;
> +   }
> +
> +   mutex_lock(&uq_mgr->userq_mutex);
> +
> +   uq_funcs = adev->userq_funcs[args->in.ip_type];
> +   if (!uq_funcs) {
> +   DRM_ERROR("Usermode queue is not supported for this IP 
> (%u)\n", args->in.ip_type);
> +   r = 

[linux-next:master] BUILD REGRESSION 9c6ecb3cb6e20c4fd7997047213ba0efcf9ada1a

2024-05-02 Thread kernel test robot
tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: 9c6ecb3cb6e20c4fd7997047213ba0efcf9ada1a  Add linux-next specific 
files for 20240502

Unverified Error/Warning (likely false positive, please contact us if 
interested):

drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu14/smu_v14_0.c:80:52: error: '%s' 
directive output may be truncated writing up to 29 bytes into a region of size 
23 [-Werror=format-truncation=]
{standard input}:898: Warning: overflow in branch to .L152; converted into 
longer instruction sequence

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-gmc_v12_0.c:warning:Function-parameter-or-struct-member-all_hub-not-described-in-gmc_v12_0_flush_gpu_tlb_pasid
|   |-- 
drivers-gpu-drm-amd-amdgpu-gmc_v12_0.c:warning:Function-parameter-or-struct-member-flush_type-not-described-in-gmc_v12_0_flush_gpu_tlb
|   |-- 
drivers-gpu-drm-amd-amdgpu-gmc_v12_0.c:warning:Function-parameter-or-struct-member-flush_type-not-described-in-gmc_v12_0_flush_gpu_tlb_pasid
|   |-- 
drivers-gpu-drm-amd-amdgpu-gmc_v12_0.c:warning:Function-parameter-or-struct-member-inst-not-described-in-gmc_v12_0_flush_gpu_tlb_pasid
|   |-- 
drivers-gpu-drm-amd-amdgpu-gmc_v12_0.c:warning:Function-parameter-or-struct-member-vmhub-not-described-in-gmc_v12_0_flush_gpu_tlb
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Excess-function-parameter-addr-description-in-sdma_v7_0_vm_write_pte
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Excess-function-parameter-fence-description-in-sdma_v7_0_ring_emit_fence
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Excess-function-parameter-flags-description-in-sdma_v7_0_vm_write_pte
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Excess-function-parameter-ib-description-in-sdma_v7_0_ring_emit_mem_sync
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Excess-function-parameter-job-description-in-sdma_v7_0_ring_emit_mem_sync
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Excess-function-parameter-ring-description-in-sdma_v7_0_emit_copy_buffer
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Excess-function-parameter-ring-description-in-sdma_v7_0_emit_fill_buffer
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Excess-function-parameter-vm-description-in-sdma_v7_0_ring_emit_vm_flush
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-addr-not-described-in-sdma_v7_0_ring_emit_fence
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-flags-not-described-in-sdma_v7_0_ring_emit_fence
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-flags-not-described-in-sdma_v7_0_ring_emit_ib
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-ib-not-described-in-sdma_v7_0_emit_copy_buffer
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-ib-not-described-in-sdma_v7_0_emit_fill_buffer
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-job-not-described-in-sdma_v7_0_ring_emit_ib
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-pd_addr-not-described-in-sdma_v7_0_ring_emit_vm_flush
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-ring-not-described-in-sdma_v7_0_ring_pad_ib
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-seq-not-described-in-sdma_v7_0_ring_emit_fence
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-timeout-not-described-in-sdma_v7_0_ring_test_ib
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-value-not-described-in-sdma_v7_0_vm_write_pte
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-vmid-not-described-in-sdma_v7_0_ring_emit_vm_flush
|   |-- 
drivers-gpu-drm-imx-ipuv3-imx-ldb.c:error:_sel-directive-output-may-be-truncated-writing-bytes-into-a-region-of-size-between-and
|   |-- 
drivers-gpu-drm-nouveau-nouveau_backlight.c:error:d-directive-output-may-be-truncated-writing-between-and-bytes-into-a-region-of-size
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- arc-allmodconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-gmc_v12_0.c:warning:Function-parameter-or-struct-member-all_hub-not-described-in-gmc_v12_0_flush_gpu_tlb_pasid
|   |-- 
drivers-gpu-drm-amd-amdgpu-gmc_v12_0.c:warning:Function-parameter-or-struct-member-flush_type-not-described-in-gmc_v12_0_flush_gpu_tlb
|   |-- 
drivers-gpu-drm-amd-amdgpu-gmc_v12_0.c:warning:Function-parameter-or-struct-member-flush_type-not-described-in-gmc_v12_0_flush_gpu_tlb_pasid
|   |-- 
drivers-gpu-drm-amd-amdgpu-gmc_v12_0.c:warning:Function-parameter-or-struct-member-inst

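For readers unfamiliar with the smu_v14_0.c report above: -Wformat-truncation fires when gcc can prove that an snprintf() result may not fit its destination. A stand-alone reproduction of the same class of warning follows; it is deliberately unrelated to the driver code, with buffer and string sizes picked only to trigger the diagnostic:

```c
/* Build with: gcc -O2 -Wall -Wformat-truncation repro.c */
#include <stdio.h>

int main(void)
{
	char fw_name[23];				/* small destination buffer */
	const char *chip_name = "a_fairly_long_chip_name_here"; /* 28 characters */

	/* gcc warns that the %s expansion may be truncated when written into
	 * the 23-byte region, which is the same wording as in the report above.
	 */
	snprintf(fw_name, sizeof(fw_name), "%s.bin", chip_name);
	return 0;
}
```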
[PATCH] drm/amdgpu/pm: update documentation on memory clock

2024-05-02 Thread Alex Deucher
Update documentation for RDNA3 dGPUs.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/pm/amdgpu_pm.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index ec9058c80647a..9ad114e695e5d 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -610,12 +610,18 @@ static ssize_t amdgpu_set_pp_table(struct device *dev,
  *
  * Clock conversion (Mhz):
  *
+ * Pre-RDNA3 GPUs:
+ *
  * HBM: effective_memory_clock = memory_controller_clock * 1
  *
  * G5: effective_memory_clock = memory_controller_clock * 1
  *
  * G6: effective_memory_clock = memory_controller_clock * 2
  *
+ * RDNA3 GPUs:
+ *
+ * G6: effective_memory_clock = memory_controller_clock * 1
+ *
  * DRAM data rate (MT/s):
  *
  * HBM: effective_memory_clock * 2 = data_rate
-- 
2.44.0

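A worked example of the conversions documented above; the controller clock value is arbitrary and chosen only to illustrate the formulas:

```c
#include <stdio.h>

int main(void)
{
	unsigned int mc_clk_mhz = 1250;	/* made-up memory_controller_clock */

	/* Pre-RDNA3, GDDR6: effective clock is 2x the controller clock */
	unsigned int g6_pre_rdna3 = mc_clk_mhz * 2;	/* 2500 MHz */

	/* RDNA3, GDDR6: effective clock equals the controller clock */
	unsigned int g6_rdna3 = mc_clk_mhz * 1;		/* 1250 MHz */

	/* HBM: effective clock equals the controller clock, and the
	 * data rate is twice the effective clock.
	 */
	unsigned int hbm_eff = mc_clk_mhz * 1;		/* 1250 MHz */
	unsigned int hbm_data_rate = hbm_eff * 2;	/* 2500 MT/s */

	printf("G6 pre-RDNA3: %u MHz, G6 RDNA3: %u MHz, HBM: %u MHz / %u MT/s\n",
	       g6_pre_rdna3, g6_rdna3, hbm_eff, hbm_data_rate);
	return 0;
}
```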


[PATCH] dm/amd/pm: Fix problems with reboot/shutdown for some SMU 13.0.4/13.0.11 users

2024-05-02 Thread Mario Limonciello
Limit the workaround introduced by commit 31729e8c21ec ("drm/amd/pm: fixes
a random hang in S4 for SMU v13.0.4/11") to only run in the s4 path.

Cc: Tim Huang 
Fixes: 31729e8c21ec ("drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3351
Signed-off-by: Mario Limonciello 
---
I tested this on SMU 13.0.4 for ~85 cycles with this script, BIOS 1.1.0.2a and
didn't observe any hangs.

```
#!/bin/sh
echo test_resume > /sys/power/disk
i=1
while [ : ]; do

  echo "Starting cycle $i"
  echo disk > /sys/power/state
  echo "Ending cycle $i"
  i=$((i+1))
  sleep 5
done
```

 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
index 949131bd1ecb..4abfcd32747d 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c
@@ -226,7 +226,7 @@ static int smu_v13_0_4_system_features_control(struct 
smu_context *smu, bool en)
struct amdgpu_device *adev = smu->adev;
int ret = 0;
 
-   if (!en && !adev->in_s0ix) {
+   if (!en && adev->in_s4) {
/* Adds a GFX reset as workaround just before sending the
 * MP1_UNLOAD message to prevent GC/RLC/PMFW from entering
 * an invalid state.
-- 
2.43.0



Re: [PATCH v9 14/14] drm/amdgpu: add kernel config for gfx-userqueue

2024-05-02 Thread Sharma, Shashank



On 02/05/2024 17:22, Christian König wrote:



Am 26.04.24 um 15:48 schrieb Shashank Sharma:

This patch:
- adds a kernel config option "CONFIG_DRM_AMD_USERQ_GFX"
- moves the userqueue initialization code for all IPs under
   this flag

so that the userqueue works only when the config is enabled.

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
---
  drivers/gpu/drm/amd/amdgpu/Kconfig | 8 
  drivers/gpu/drm/amd/amdgpu/Makefile    | 8 ++--
  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 
  drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 3 +++
  4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
b/drivers/gpu/drm/amd/amdgpu/Kconfig

index 22d88f8ef527..bba963527d22 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -80,6 +80,14 @@ config DRM_AMDGPU_WERROR
    Add -Werror to the build flags for amdgpu.ko.
    Only enable this if you are warning code for amdgpu.ko.
  +config DRM_AMDGPU_USERQ_GFX
+    bool "Enable Navi 3x gfx usermode queues"
+    depends on DRM_AMDGPU
+    default n
+    help
+  Choose this option to enable usermode queue support for GFX
+  workload submission. This feature is supported on Navi 3X 
only.


When this is for Navi 3x only I would name that 
DRM_AMDGPU_NAVI3X_USERQ instead.


And since we enable/disable GFX, Compute and SDMA I would drop "gfx" 
from the comment and description.


Apart from that the approach looks good to me.


Agree, both the review comments addressed in V10.

- Shashank


Christian.


+
  source "drivers/gpu/drm/amd/acp/Kconfig"
  source "drivers/gpu/drm/amd/display/Kconfig"
  source "drivers/gpu/drm/amd/amdkfd/Kconfig"
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile

index a640bfa468ad..0b17fc1740a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -184,8 +184,12 @@ amdgpu-y += \
  amdgpu-y += \
  amdgpu_mes.o \
  mes_v10_1.o \
-    mes_v11_0.o \
-    mes_v11_0_userqueue.o
+    mes_v11_0.o
+
+# add GFX userqueue support
+ifneq ($(CONFIG_DRM_AMD_USERQ_GFX),)
+amdgpu-y += mes_v11_0_userqueue.o
+endif
    # add UVD block
  amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c

index 27b86f7fe949..8591aed9f9ab 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1349,8 +1349,10 @@ static int gfx_v11_0_sw_init(void *handle)
  adev->gfx.mec.num_mec = 2;
  adev->gfx.mec.num_pipe_per_mec = 4;
  adev->gfx.mec.num_queue_per_pipe = 4;
+#ifdef CONFIG_DRM_AMD_USERQ_GFX
  adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
  adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = 
&userq_mes_v11_0_funcs;

+#endif
  break;
  case IP_VERSION(11, 0, 1):
  case IP_VERSION(11, 0, 4):
@@ -1362,8 +1364,10 @@ static int gfx_v11_0_sw_init(void *handle)
  adev->gfx.mec.num_mec = 1;
  adev->gfx.mec.num_pipe_per_mec = 4;
  adev->gfx.mec.num_queue_per_pipe = 4;
+#ifdef CONFIG_DRM_AMD_USERQ_GFX
  adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
  adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = 
&userq_mes_v11_0_funcs;

+#endif
  break;
  default:
  adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c

index 90354a70c807..084059c95db6 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -1267,7 +1267,10 @@ static int sdma_v6_0_sw_init(void *handle)
  return -EINVAL;
  }
  +#ifdef CONFIG_DRM_AMD_USERQ_GFX
  adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
+#endif
+
  return r;
  }




[PATCH] drm/amdgpu: Add Ring Hang Events

2024-05-02 Thread Ori Messinger
This patch adds 'ring hang' events to the driver.
This is done by adding a 'reset_ring_hang' bool variable to the
struct 'amdgpu_reset_context' in the amdgpu_reset.h file.
Whenever a GPU reset is initiated because of a ring hang, this
'reset_ring_hang' variable is set to 'true'.

This 'amdgpu_reset_context' struct is now also passed
through across all relevant functions, and another event type
"KFD_SMI_EVENT_RING_HANG" is added to the kfd_smi_event enum.

Signed-off-by: Ori Messinger 
Change-Id: I6af3022eb1b4514201c9430d635ff87f167ad6f7
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c  |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h  |  9 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h   |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c |  7 ---
 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c |  6 +-
 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h |  5 -
 include/uapi/linux/kfd_ioctl.h  | 15 ---
 9 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 7ba05f030dd1..509f750702b4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -261,12 +261,12 @@ int amdgpu_amdkfd_resume(struct amdgpu_device *adev, bool 
run_pm)
return r;
 }
 
-int amdgpu_amdkfd_pre_reset(struct amdgpu_device *adev)
+int amdgpu_amdkfd_pre_reset(struct amdgpu_device *adev, struct 
amdgpu_reset_context *reset_context)
 {
int r = 0;
 
if (adev->kfd.dev)
-   r = kgd2kfd_pre_reset(adev->kfd.dev);
+   r = kgd2kfd_pre_reset(adev->kfd.dev, reset_context);
 
return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 1de021ebdd46..c9030d8b8308 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -47,6 +47,7 @@ enum TLB_FLUSH_TYPE {
 };
 
 struct amdgpu_device;
+struct amdgpu_reset_context;
 
 enum kfd_mem_attachment_type {
KFD_MEM_ATT_SHARED, /* Share kgd_mem->bo or another attachment's */
@@ -170,7 +171,8 @@ bool amdgpu_amdkfd_have_atomics_support(struct 
amdgpu_device *adev);
 
 bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid);
 
-int amdgpu_amdkfd_pre_reset(struct amdgpu_device *adev);
+int amdgpu_amdkfd_pre_reset(struct amdgpu_device *adev,
+   struct amdgpu_reset_context *reset_context);
 
 int amdgpu_amdkfd_post_reset(struct amdgpu_device *adev);
 
@@ -416,7 +418,8 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
 void kgd2kfd_device_exit(struct kfd_dev *kfd);
 void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm);
 int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm);
-int kgd2kfd_pre_reset(struct kfd_dev *kfd);
+int kgd2kfd_pre_reset(struct kfd_dev *kfd,
+ struct amdgpu_reset_context *reset_context);
 int kgd2kfd_post_reset(struct kfd_dev *kfd);
 void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry);
 void kgd2kfd_set_sram_ecc_flag(struct kfd_dev *kfd);
@@ -459,7 +462,7 @@ static inline int kgd2kfd_resume(struct kfd_dev *kfd, bool 
run_pm)
return 0;
 }
 
-static inline int kgd2kfd_pre_reset(struct kfd_dev *kfd)
+static inline int kgd2kfd_pre_reset(struct kfd_dev *kfd, struct 
amdgpu_reset_context *reset_context)
 {
return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 77f6fd50002a..f9fa784f36f4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5772,7 +5772,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 
	cancel_delayed_work_sync(&tmp_adev->delayed_init_work);
 
-   amdgpu_amdkfd_pre_reset(tmp_adev);
+   amdgpu_amdkfd_pre_reset(tmp_adev, reset_context);
 
/*
 * Mark these ASICs to be reseted as untracked first
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index e4742b65032d..361ba892739f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -77,6 +77,8 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct 
drm_sched_job *s_job)
 
reset_context.method = AMD_RESET_METHOD_NONE;
reset_context.reset_req_dev = adev;
+   reset_context.reset_ring_hang = true;
+   DRM_ERROR("Reset cause: ring hang\n");
	clear_bit(AMDGPU_NEED_FULL_RESET, &reset_context.flags);
 
	r = amdgpu_device_gpu_recover(ring->adev, job, &reset_context);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
index 5a9cc043b858..e1f5c0c1458d 100644
--- 

[PATCH v10 14/14] drm/amdgpu: add kernel config for gfx-userqueue

2024-05-02 Thread Shashank Sharma
This patch:
- adds a kernel config option "CONFIG_DRM_AMD_USERQ_GFX"
- moves the userqueue initialization code for all IPs under
  this flag

so that the userqueue works only when the config is enabled.

V9:  Introduce this patch
V10: Call it CONFIG_DRM_AMDGPU_NAVI3X_USERQ instead of
 CONFIG_DRM_AMDGPU_USERQ_GFX (Christian)

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/Kconfig | 8 
 drivers/gpu/drm/amd/amdgpu/Makefile| 4 +++-
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 3 +++
 4 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 22d88f8ef527..a7c85eeec756 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -80,6 +80,14 @@ config DRM_AMDGPU_WERROR
  Add -Werror to the build flags for amdgpu.ko.
  Only enable this if you are warning code for amdgpu.ko.
 
+config DRM_AMDGPU_NAVI3X_USERQ
+   bool "Enable Navi 3x gfx usermode queues"
+   depends on DRM_AMDGPU
+   default n
+   help
+ Choose this option to enable usermode queue support for 
GFX/SDMA/Compute
+  workload submission. This feature is supported on Navi 3X only.
+
 source "drivers/gpu/drm/amd/acp/Kconfig"
 source "drivers/gpu/drm/amd/display/Kconfig"
 source "drivers/gpu/drm/amd/amdkfd/Kconfig"
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 987fabb2b2c6..0a64f2c57def 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -189,9 +189,11 @@ amdgpu-y += \
amdgpu_mes.o \
mes_v10_1.o \
mes_v11_0.o \
-   mes_v11_0_userqueue.o \
mes_v12_0.o
 
+# add GFX userqueue support
+amdgpu-$(DRM_AMDGPU_NAVI3X_USERQ) += mes_v11_0_userqueue.o
+
 # add UVD block
 amdgpu-y += \
amdgpu_uvd.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 46304d09c4bd..5c4bf243ed04 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1348,8 +1348,10 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_mec = 2;
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
+#ifdef DRM_AMDGPU_NAVI3X_USERQ
	adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
	adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = 
&userq_mes_v11_0_funcs;
+#endif
break;
case IP_VERSION(11, 0, 1):
case IP_VERSION(11, 0, 4):
@@ -1361,8 +1363,10 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_mec = 1;
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
+#ifdef CONFIG_DRM_AMD_USERQ_GFX
	adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
	adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = 
&userq_mes_v11_0_funcs;
+#endif
break;
default:
adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index 0989400d0afe..f6a2c2daa00f 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -1274,7 +1274,10 @@ static int sdma_v6_0_sw_init(void *handle)
return -EINVAL;
}
 
+#ifdef DRM_AMDGPU_NAVI3X_USERQ
	adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
+#endif
+
return r;
 }
 
-- 
2.43.2



[PATCH v10 13/14] drm/amdgpu: enable compute/gfx usermode queue

2024-05-02 Thread Shashank Sharma
This patch makes the necessary changes to enable compute
workload support using the existing usermode queue
infrastructure.

V9:  Patch introduced
V10: Add custom IP specific mqd structure for compute (Alex)

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Arvind Yadav 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c|  3 ++-
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c   |  2 ++
 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c | 15 +++
 include/uapi/drm/amdgpu_drm.h| 10 ++
 4 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index f7ece0b31ff9..84bce9434102 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -189,7 +189,8 @@ amdgpu_userqueue_create(struct drm_file *filp, union 
drm_amdgpu_userq *args)
int qid, r = 0;
 
/* Usermode queues are only supported for GFX/SDMA engines as of now */
-   if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != 
AMDGPU_HW_IP_DMA) {
+   if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != 
AMDGPU_HW_IP_DMA
+   && args->in.ip_type != AMDGPU_HW_IP_COMPUTE) {
DRM_ERROR("Usermode queue doesn't support IP type %u\n", 
args->in.ip_type);
return -EINVAL;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 888edc2b4769..46304d09c4bd 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1349,6 +1349,7 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
	adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
+   adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = 
&userq_mes_v11_0_funcs;
break;
case IP_VERSION(11, 0, 1):
case IP_VERSION(11, 0, 4):
@@ -1361,6 +1362,7 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
	adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
+   adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = 
&userq_mes_v11_0_funcs;
break;
default:
adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 80375894c4f3..2ae6f720dc66 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -260,6 +260,21 @@ static int mes_v11_0_userq_mqd_create(struct 
amdgpu_userq_mgr *uq_mgr,
userq_props->use_doorbell = true;
userq_props->doorbell_index = queue->doorbell_index;
 
+   if (queue->queue_type == AMDGPU_HW_IP_COMPUTE) {
+   struct drm_amdgpu_userq_mqd_compute_gfx_v11 *compute_mqd;
+
+   if (mqd_user->mqd_size != sizeof(*compute_mqd)) {
+   DRM_ERROR("Invalid compute IP MQD size\n");
+   goto free_mqd_user;
+   }
+   compute_mqd = (struct drm_amdgpu_userq_mqd_compute_gfx_v11 
*)mqd_user->mqd;
+
+   userq_props->eop_gpu_addr = compute_mqd->eop_va;
+   userq_props->hqd_pipe_priority = AMDGPU_GFX_PIPE_PRIO_NORMAL;
+   userq_props->hqd_queue_priority = 
AMDGPU_GFX_QUEUE_PRIORITY_MINIMUM;
+   userq_props->hqd_active = false;
+   }
+
queue->userq_prop = userq_props;
 
r = mqd_hw_default->init_mqd(adev, (void *)queue->mqd.cpu_ptr, 
userq_props);
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 6798139036a1..7ffa9ee885e6 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -429,6 +429,16 @@ struct drm_amdgpu_userq_mqd_gfx_v11 {
__u64   csa_va;
 };
 
+/* GFX V11 Compute IP specific MQD parameters */
+struct drm_amdgpu_userq_mqd_compute_gfx_v11 {
+   /**
+* @eop_va: Virtual address of the GPU memory to hold the EOP buffer.
+* This must be a from a separate GPU object, and must be at least 1 
page
+* sized.
+*/
+   __u64   eop_va;
+};
+
 /* vm ioctl */
 #define AMDGPU_VM_OP_RESERVE_VMID  1
 #define AMDGPU_VM_OP_UNRESERVE_VMID2
-- 
2.43.2



[PATCH v10 11/14] drm/amdgpu: enable GFX-V11 userqueue support

2024-05-02 Thread Shashank Sharma
This patch enables GFX-v11 IP support in the usermode queue base
code. It typically:
- adds a GFX_v11 specific MQD structure
- sets IP functions to create and destroy MQDs
- sets MQD objects coming from userspace

V10: introduced this separate patch for GFX V11 enabling (Alex).

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c|  3 +++
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 22 +++
 include/uapi/drm/amdgpu_drm.h | 22 +++
 3 files changed, 47 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index ad6431013c73..888edc2b4769 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -49,6 +49,7 @@
 #include "gfx_v11_0_3.h"
 #include "nbio_v4_3.h"
 #include "mes_v11_0.h"
+#include "mes_v11_0_userqueue.h"
 
 #define GFX11_NUM_GFX_RINGS	1
 #define GFX11_MEC_HPD_SIZE 2048
@@ -1347,6 +1348,7 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_mec = 2;
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
+   adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
break;
case IP_VERSION(11, 0, 1):
case IP_VERSION(11, 0, 4):
@@ -1358,6 +1360,7 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_mec = 1;
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
+   adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
break;
default:
adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index d084c5754273..80375894c4f3 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -180,6 +180,28 @@ static int mes_v11_0_userq_create_ctx_space(struct 
amdgpu_userq_mgr *uq_mgr,
return r;
}
 
+   /* Shadow, GDS and CSA objects come directly from userspace */
+   if (mqd_user->ip_type == AMDGPU_HW_IP_GFX) {
+   struct v11_gfx_mqd *mqd = queue->mqd.cpu_ptr;
+   struct drm_amdgpu_userq_mqd_gfx_v11 *mqd_gfx_v11;
+
+   if (mqd_user->mqd_size != sizeof(*mqd_gfx_v11) || 
!mqd_user->mqd) {
+   DRM_ERROR("Invalid GFX MQD\n");
+   return -EINVAL;
+   }
+
+   mqd_gfx_v11 = (struct drm_amdgpu_userq_mqd_gfx_v11 
*)mqd_user->mqd;
+
+   mqd->shadow_base_lo = mqd_gfx_v11->shadow_va & 0xFFFFFFFC;
+   mqd->shadow_base_hi = upper_32_bits(mqd_gfx_v11->shadow_va);
+
+   mqd->gds_bkup_base_lo = mqd_gfx_v11->gds_va & 0xFFFFFFFC;
+   mqd->gds_bkup_base_hi = upper_32_bits(mqd_gfx_v11->gds_va);
+
+   mqd->fw_work_area_base_lo = mqd_gfx_v11->csa_va & 0xFFFFFFFC;
+   mqd->fw_work_area_base_hi = upper_32_bits(mqd_gfx_v11->csa_va);
+   }
+
return 0;
 }
 
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index f7313e576f06..6798139036a1 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -407,6 +407,28 @@ union drm_amdgpu_userq {
struct drm_amdgpu_userq_out out;
 };
 
+/* GFX V11 IP specific MQD parameters */
+struct drm_amdgpu_userq_mqd_gfx_v11 {
+   /**
+* @shadow_va: Virtual address of the GPU memory to hold the shadow 
buffer.
+* This must be a from a separate GPU object, and must be at least 
4-page
+* sized.
+*/
+   __u64   shadow_va;
+   /**
+* @gds_va: Virtual address of the GPU memory to hold the GDS buffer.
+* This must be a from a separate GPU object, and must be at least 
1-page
+* sized.
+*/
+   __u64   gds_va;
+   /**
+* @csa_va: Virtual address of the GPU memory to hold the CSA buffer.
+* This must be a from a separate GPU object, and must be at least 
1-page
+* sized.
+*/
+   __u64   csa_va;
+};
+
 /* vm ioctl */
 #define AMDGPU_VM_OP_RESERVE_VMID  1
 #define AMDGPU_VM_OP_UNRESERVE_VMID2
-- 
2.43.2

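To show how the new UAPI structure above is meant to be filled from userspace, here is a hedged sketch. The three virtual addresses are placeholders (allocating the shadow/GDS/CSA buffers and their GPU VAs is not shown), it assumes a copy of amdgpu_drm.h that already contains the structure added above, and how the structure is then attached to the queue-create ioctl (via the mqd/mqd_size fields the kernel code dereferences) is defined in earlier patches of this series.

```c
#include <stdint.h>
#include <string.h>
#include <drm/amdgpu_drm.h>	/* assumes headers carrying the struct above */

/* Fill the GFX11 per-queue MQD parameters. shadow_va must point at a separate
 * GPU object of at least 4 pages; gds_va and csa_va at separate objects of at
 * least 1 page each (per the UAPI comments above).
 */
static void example_fill_gfx11_mqd(struct drm_amdgpu_userq_mqd_gfx_v11 *mqd,
				   uint64_t shadow_va, uint64_t gds_va,
				   uint64_t csa_va)
{
	memset(mqd, 0, sizeof(*mqd));
	mqd->shadow_va = shadow_va;
	mqd->gds_va = gds_va;
	mqd->csa_va = csa_va;
	/* The kernel side rejects the request unless the ioctl's mqd_size
	 * equals sizeof(*mqd), as seen in mes_v11_0_userq_create_ctx_space().
	 */
}
```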


[PATCH v10 12/14] drm/amdgpu: enable SDMA-V6 usermode queues

2024-05-02 Thread Shashank Sharma
This patch makes the necessary modifications to enable SDMA-v6
usermode queues using the existing userqueue infrastructure.

V9:  introduced this patch in the series
V10: use header file instead of extern (Alex)

Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
Signed-off-by: Srinivasan Shanmugam 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c| 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index df0e74a3ec8c..f7ece0b31ff9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -189,7 +189,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union 
drm_amdgpu_userq *args)
int qid, r = 0;
 
/* Usermode queues are only supported for GFX/SDMA engines as of now */
-   if (args->in.ip_type != AMDGPU_HW_IP_GFX) {
+   if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != 
AMDGPU_HW_IP_DMA) {
DRM_ERROR("Usermode queue doesn't support IP type %u\n", 
args->in.ip_type);
return -EINVAL;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index c833b6b8373b..0989400d0afe 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -43,6 +43,7 @@
 #include "sdma_common.h"
 #include "sdma_v6_0.h"
 #include "v11_structs.h"
+#include "mes_v11_0_userqueue.h"
 
 MODULE_FIRMWARE("amdgpu/sdma_6_0_0.bin");
 MODULE_FIRMWARE("amdgpu/sdma_6_0_1.bin");
@@ -1273,6 +1274,7 @@ static int sdma_v6_0_sw_init(void *handle)
return -EINVAL;
}
 
+   adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
return r;
 }
 
-- 
2.43.2



[PATCH v10 09/14] drm/amdgpu: generate doorbell index for userqueue

2024-05-02 Thread Shashank Sharma
Userspace sends us the doorbell object and the relative doorbell
index within that object to be used for the usermode queue, but the FW
expects the absolute doorbell index on the PCI BAR in the MQD. This
patch adds a function to convert the relative doorbell index into the
absolute doorbell index.

V5:  Fix the db object reference leak (Christian)
V6:  Pin the doorbell bo in userqueue_create() function, and unpin it
     in userqueue destroy (Christian)
V7:  Added missing kfree for queue in error cases
 Added Alex's R-B
V8:  Rebase
V9:  Changed the function names from gfx_v11* to mes_v11*
V10: Rebase

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 59 +++
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  |  1 +
 .../gpu/drm/amd/include/amdgpu_userqueue.h|  1 +
 3 files changed, 61 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index edbcb0f4c898..fbf6235cfea0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -94,6 +94,53 @@ void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr 
*uq_mgr,
	amdgpu_bo_unref(&userq_obj->obj);
 }
 
+static uint64_t
+amdgpu_userqueue_get_doorbell_index(struct amdgpu_userq_mgr *uq_mgr,
+struct amdgpu_usermode_queue *queue,
+struct drm_file *filp,
+uint32_t doorbell_offset)
+{
+   uint64_t index;
+   struct drm_gem_object *gobj;
+   struct amdgpu_userq_obj *db_obj = &queue->db_obj;
+   int r;
+
+   gobj = drm_gem_object_lookup(filp, queue->doorbell_handle);
+   if (gobj == NULL) {
+   DRM_ERROR("Can't find GEM object for doorbell\n");
+   return -EINVAL;
+   }
+
+   db_obj->obj = amdgpu_bo_ref(gem_to_amdgpu_bo(gobj));
+   drm_gem_object_put(gobj);
+
+   /* Pin the BO before generating the index, unpin in queue destroy */
+   r = amdgpu_bo_pin(db_obj->obj, AMDGPU_GEM_DOMAIN_DOORBELL);
+   if (r) {
+   DRM_ERROR("[Usermode queues] Failed to pin doorbell object\n");
+   goto unref_bo;
+   }
+
+   r = amdgpu_bo_reserve(db_obj->obj, true);
+   if (r) {
+   DRM_ERROR("[Usermode queues] Failed to pin doorbell object\n");
+   goto unpin_bo;
+   }
+
+   index = amdgpu_doorbell_index_on_bar(uq_mgr->adev, db_obj->obj,
+doorbell_offset, sizeof(u64));
+   DRM_DEBUG_DRIVER("[Usermode queues] doorbell index=%lld\n", index);
+   amdgpu_bo_unreserve(db_obj->obj);
+   return index;
+
+unpin_bo:
+   amdgpu_bo_unpin(db_obj->obj);
+
+unref_bo:
+   amdgpu_bo_unref(&db_obj->obj);
+   return r;
+}
+
 static int
 amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 {
@@ -114,6 +161,8 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int 
queue_id)
 
uq_funcs = adev->userq_funcs[queue->queue_type];
uq_funcs->mqd_destroy(uq_mgr, queue);
+   amdgpu_bo_unpin(queue->db_obj.obj);
+   amdgpu_bo_unref(&queue->db_obj.obj);
+   idr_remove(&uq_mgr->userq_idr, queue_id);
kfree(queue);
 
@@ -129,6 +178,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union 
drm_amdgpu_userq *args)
struct amdgpu_device *adev = uq_mgr->adev;
const struct amdgpu_userq_funcs *uq_funcs;
struct amdgpu_usermode_queue *queue;
+   uint64_t index;
int qid, r = 0;
 
/* Usermode queues are only supported for GFX/SDMA engines as of now */
@@ -158,6 +208,15 @@ amdgpu_userqueue_create(struct drm_file *filp, union 
drm_amdgpu_userq *args)
queue->flags = args->in.flags;
queue->vm = >vm;
 
+   /* Convert relative doorbell offset into absolute doorbell index */
+   index = amdgpu_userqueue_get_doorbell_index(uq_mgr, queue, filp, 
args->in.doorbell_offset);
+   if (index == (uint64_t)-EINVAL) {
+   DRM_ERROR("Failed to get doorbell for queue\n");
+   kfree(queue);
+   goto unlock;
+   }
+   queue->doorbell_index = index;
+
	r = uq_funcs->mqd_create(uq_mgr, &args->in, queue);
if (r) {
DRM_ERROR("Failed to create Queue\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 6ff04647b62e..d084c5754273 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -236,6 +236,7 @@ static int mes_v11_0_userq_mqd_create(struct 
amdgpu_userq_mgr *uq_mgr,
userq_props->hqd_base_gpu_addr = mqd_user->queue_va;
userq_props->mqd_gpu_addr = queue->mqd.gpu_addr;
userq_props->use_doorbell = true;
+   userq_props->doorbell_index = queue->doorbell_index;
 
queue->userq_prop = 

[PATCH v10 10/14] drm/amdgpu: cleanup leftover queues

2024-05-02 Thread Shashank Sharma
This patch adds code to clean up any leftover userqueues which
a user might have failed to destroy due to a crash or any other
programming error.

V7:  Added Alex's R-B
V8:  Rebase
V9:  Rebase
V10: Rebase

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Alex Deucher 
Suggested-by: Bas Nieuwenhuizen 
Signed-off-by: Bas Nieuwenhuizen 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 27 ++-
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index fbf6235cfea0..df0e74a3ec8c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -26,6 +26,19 @@
 #include "amdgpu_vm.h"
 #include "amdgpu_userqueue.h"
 
+static void
+amdgpu_userqueue_cleanup(struct amdgpu_userq_mgr *uq_mgr,
+struct amdgpu_usermode_queue *queue,
+int queue_id)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   const struct amdgpu_userq_funcs *uq_funcs = 
adev->userq_funcs[queue->queue_type];
+
+   uq_funcs->mqd_destroy(uq_mgr, queue);
+   idr_remove(&uq_mgr->userq_idr, queue_id);
+   kfree(queue);
+}
+
 static struct amdgpu_usermode_queue *
 amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
 {
@@ -146,8 +159,6 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int 
queue_id)
 {
struct amdgpu_fpriv *fpriv = filp->driver_priv;
	struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
-   struct amdgpu_device *adev = uq_mgr->adev;
-   const struct amdgpu_userq_funcs *uq_funcs;
struct amdgpu_usermode_queue *queue;
 
	mutex_lock(&uq_mgr->userq_mutex);
@@ -159,13 +170,9 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int 
queue_id)
return -EINVAL;
}
 
-   uq_funcs = adev->userq_funcs[queue->queue_type];
-   uq_funcs->mqd_destroy(uq_mgr, queue);
amdgpu_bo_unpin(queue->db_obj.obj);
	amdgpu_bo_unref(&queue->db_obj.obj);
-   idr_remove(&uq_mgr->userq_idr, queue_id);
-   kfree(queue);
-
+   amdgpu_userqueue_cleanup(uq_mgr, queue, queue_id);
	mutex_unlock(&uq_mgr->userq_mutex);
return 0;
 }
@@ -277,6 +284,12 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr 
*userq_mgr, struct amdgpu_devi
 
 void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
 {
+   uint32_t queue_id;
+   struct amdgpu_usermode_queue *queue;
+
+   idr_for_each_entry(&userq_mgr->userq_idr, queue, queue_id)
+   amdgpu_userqueue_cleanup(userq_mgr, queue, queue_id);
+
	idr_destroy(&userq_mgr->userq_idr);
	mutex_destroy(&userq_mgr->userq_mutex);
 }
-- 
2.43.2



[PATCH v10 07/14] drm/amdgpu: map usermode queue into MES

2024-05-02 Thread Shashank Sharma
This patch adds new functions to map/unmap a usermode queue into
the FW using the MES ring. As soon as this mapping is done, the
queue is considered ready to accept workloads.

V1: Addressed review comments from Alex on the RFC patch series
- Map/Unmap should be IP specific.
V2:
Addressed review comments from Christian:
- Fix the wptr_mc_addr calculation (moved into another patch)
Addressed review comments from Alex:
- Do not add fptrs for map/unmap

V3:  Integration with doorbell manager
V4:  Rebase
V5:  Use gfx_v11_0 for function names (Alex)
V6:  Removed queue->proc/gang/fw_ctx_address variables and doing the
 address calculations locally to keep the queue structure GEN
 independent (Alex)
V7:  Added R-B from Alex
V8:  Rebase
V9:  Rebase
V10: Rebase

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 74 +++
 1 file changed, 74 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 58cfc956cddd..874ea3901319 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -30,6 +30,69 @@
 #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
 #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
 
+static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
+  struct amdgpu_usermode_queue *queue,
+  struct amdgpu_mqd_prop *userq_props)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+   struct mes_add_queue_input queue_input;
+   int r;
+
+   memset(&queue_input, 0x0, sizeof(struct mes_add_queue_input));
+
+   queue_input.process_va_start = 0;
+   queue_input.process_va_end = (adev->vm_manager.max_pfn - 1) << 
AMDGPU_GPU_PAGE_SHIFT;
+
+   /* set process quantum to 10 ms and gang quantum to 1 ms as default */
+   queue_input.process_quantum = 10;
+   queue_input.gang_quantum = 1;
+   queue_input.paging = false;
+
+   queue_input.process_context_addr = ctx->gpu_addr;
+   queue_input.gang_context_addr = ctx->gpu_addr + 
AMDGPU_USERQ_PROC_CTX_SZ;
+   queue_input.inprocess_gang_priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
+   queue_input.gang_global_priority_level = 
AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
+
+   queue_input.process_id = queue->vm->pasid;
+   queue_input.queue_type = queue->queue_type;
+   queue_input.mqd_addr = queue->mqd.gpu_addr;
+   queue_input.wptr_addr = userq_props->wptr_gpu_addr;
+   queue_input.queue_size = userq_props->queue_size >> 2;
+   queue_input.doorbell_offset = userq_props->doorbell_index;
+   queue_input.page_table_base_addr = 
amdgpu_gmc_pd_addr(queue->vm->root.bo);
+
+   amdgpu_mes_lock(&adev->mes);
+   r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
+   amdgpu_mes_unlock(&adev->mes);
+   if (r) {
+   DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
+   return r;
+   }
+
+   DRM_DEBUG_DRIVER("Queue (doorbell:%d) mapped successfully\n", 
userq_props->doorbell_index);
+   return 0;
+}
+
+static void mes_v11_0_userq_unmap(struct amdgpu_userq_mgr *uq_mgr,
+ struct amdgpu_usermode_queue *queue)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   struct mes_remove_queue_input queue_input;
+   struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+   int r;
+
+   memset(&queue_input, 0x0, sizeof(struct mes_remove_queue_input));
+   queue_input.doorbell_offset = queue->doorbell_index;
+   queue_input.gang_context_addr = ctx->gpu_addr + 
AMDGPU_USERQ_PROC_CTX_SZ;
+
+   amdgpu_mes_lock(&adev->mes);
+   r = adev->mes.funcs->remove_hw_queue(&adev->mes, &queue_input);
+   amdgpu_mes_unlock(&adev->mes);
+   if (r)
+   DRM_ERROR("Failed to unmap queue in HW, err (%d)\n", r);
+}
+
 static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
struct amdgpu_usermode_queue *queue,
struct drm_amdgpu_userq_in 
*mqd_user)
@@ -121,8 +184,18 @@ static int mes_v11_0_userq_mqd_create(struct 
amdgpu_userq_mgr *uq_mgr,
goto free_mqd;
}
 
+   /* Map userqueue into FW using MES */
+   r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
+   if (r) {
+   DRM_ERROR("Failed to init MQD\n");
+   goto free_ctx;
+   }
+
return 0;
 
+free_ctx:
+   amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
+
 free_mqd:
	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
 
@@ -139,6 +212,7 @@ static void
 mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
struct amdgpu_usermode_queue *queue)
 {
+   mes_v11_0_userq_unmap(uq_mgr, queue);

[PATCH v10 08/14] drm/amdgpu: map wptr BO into GART

2024-05-02 Thread Shashank Sharma
To support oversubscription, the MES FW expects WPTR BOs to
be mapped into GART before they are submitted to usermode
queues. This patch adds a function for the same.

V4: fix the wptr value before mapping lookup (Bas, Christian).

V5: Addressed review comments from Christian:
- Either pin object or allocate from GART, but not both.
- All the handling must be done with the VM locks held.

V7: Addressed review comments from Christian:
- Do not take vm->eviction_lock
- Use amdgpu_bo_gpu_offset to get the wptr_bo GPU offset

V8:  Rebase
V9:  Changed the function names from gfx_v11* to mes_v11*
V10: Remove unused adev (Harish)

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 76 +++
 .../gpu/drm/amd/include/amdgpu_userqueue.h|  1 +
 2 files changed, 77 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 874ea3901319..6ff04647b62e 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -30,6 +30,73 @@
 #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
 #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
 
+static int
+mes_v11_0_map_gtt_bo_to_gart(struct amdgpu_bo *bo)
+{
+   int ret;
+
+   ret = amdgpu_bo_reserve(bo, true);
+   if (ret) {
+   DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
+   goto err_reserve_bo_failed;
+   }
+
+   ret = amdgpu_ttm_alloc_gart(&bo->tbo);
+   if (ret) {
+   DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
+   goto err_map_bo_gart_failed;
+   }
+
+   amdgpu_bo_unreserve(bo);
+   bo = amdgpu_bo_ref(bo);
+
+   return 0;
+
+err_map_bo_gart_failed:
+   amdgpu_bo_unreserve(bo);
+err_reserve_bo_failed:
+   return ret;
+}
+
+static int
+mes_v11_0_create_wptr_mapping(struct amdgpu_userq_mgr *uq_mgr,
+ struct amdgpu_usermode_queue *queue,
+ uint64_t wptr)
+{
+   struct amdgpu_bo_va_mapping *wptr_mapping;
+   struct amdgpu_vm *wptr_vm;
+   struct amdgpu_userq_obj *wptr_obj = &queue->wptr_obj;
+   int ret;
+
+   wptr_vm = queue->vm;
+   ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
+   if (ret)
+   return ret;
+
+   wptr &= AMDGPU_GMC_HOLE_MASK;
+   wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> PAGE_SHIFT);
+   amdgpu_bo_unreserve(wptr_vm->root.bo);
+   if (!wptr_mapping) {
+   DRM_ERROR("Failed to lookup wptr bo\n");
+   return -EINVAL;
+   }
+
+   wptr_obj->obj = wptr_mapping->bo_va->base.bo;
+   if (wptr_obj->obj->tbo.base.size > PAGE_SIZE) {
+   DRM_ERROR("Requested GART mapping for wptr bo larger than one 
page\n");
+   return -EINVAL;
+   }
+
+   ret = mes_v11_0_map_gtt_bo_to_gart(wptr_obj->obj);
+   if (ret) {
+   DRM_ERROR("Failed to map wptr bo to GART\n");
+   return ret;
+   }
+
+   queue->wptr_obj.gpu_addr = amdgpu_bo_gpu_offset_no_check(wptr_obj->obj);
+   return 0;
+}
+
 static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
   struct amdgpu_usermode_queue *queue,
   struct amdgpu_mqd_prop *userq_props)
@@ -61,6 +128,7 @@ static int mes_v11_0_userq_map(struct amdgpu_userq_mgr 
*uq_mgr,
queue_input.queue_size = userq_props->queue_size >> 2;
queue_input.doorbell_offset = userq_props->doorbell_index;
queue_input.page_table_base_addr = 
amdgpu_gmc_pd_addr(queue->vm->root.bo);
+   queue_input.wptr_mc_addr = queue->wptr_obj.gpu_addr;
 
	amdgpu_mes_lock(&adev->mes);
	r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
@@ -184,6 +252,13 @@ static int mes_v11_0_userq_mqd_create(struct 
amdgpu_userq_mgr *uq_mgr,
goto free_mqd;
}
 
+   /* FW expects WPTR BOs to be mapped into GART */
+   r = mes_v11_0_create_wptr_mapping(uq_mgr, queue, 
userq_props->wptr_gpu_addr);
+   if (r) {
+   DRM_ERROR("Failed to create WPTR mapping\n");
+   goto free_ctx;
+   }
+
/* Map userqueue into FW using MES */
r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
if (r) {
@@ -213,6 +288,7 @@ mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
struct amdgpu_usermode_queue *queue)
 {
mes_v11_0_userq_unmap(uq_mgr, queue);
+   amdgpu_bo_unref(&queue->wptr_obj.obj);
	amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
kfree(queue->userq_prop);
	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 643f31474bd8..ffe8a3d73756 100644
--- 

[PATCH v10 05/14] drm/amdgpu: create MES-V11 usermode queue for GFX

2024-05-02 Thread Shashank Sharma
A memory queue descriptor (MQD) of a userqueue defines the queue
in the HW's context. As the MQD format can vary between different
graphics IPs, we need GFX generation specific handlers to create MQDs.

This patch:
- Adds a new file which will be used for MES based userqueue
  functions targeting GFX and SDMA IP.
- Introduces MQD handler functions for the usermode queues.

V1: Worked on review comments from Alex:
- Make MQD functions GEN and IP specific

V2: Worked on review comments from Alex:
- Reuse the existing adev->mqd[ip] for MQD creation
- Formatting and arrangement of code

V3:
- Integration with doorbell manager

V4: Review comments addressed:
- Do not create a new file for userq, reuse gfx_v11_0.c (Alex)
- Align name of structure members (Luben)
- Don't break up the Cc tag list and the Sob tag list in commit
  message (Luben)
V5:
   - No need to reserve the bo for MQD (Christian).
   - Some more changes to support IP specific MQD creation.

V6:
   - Add a comment reminding us to replace the amdgpu_bo_create_kernel()
 calls while creating MQD object to amdgpu_bo_create() once eviction
 fences are ready (Christian).

V7:
   - Re-arrange userqueue functions in adev instead of uq_mgr (Alex)
   - Use memdup_user instead of copy_from_user (Christian)

V9:
   - Moved userqueue code from gfx_v11_0.c to new file mes_v11_0.c so
 that it can be reused for SDMA userqueues as well (Shashank, Alex)

V10: Addressed review comments from Alex
   - Making this patch independent of IP engine(GFX/SDMA/Compute) and
 specific to MES V11 only, using the generic MQD structure.
   - Splitting out a separate patch to enable GFX support from here.
   - Verify mqd va address to be non-NULL.
   - Add a separate header file.

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |   1 +
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 117 ++
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h  |  30 +
 3 files changed, 148 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 2d421f17626d..987fabb2b2c6 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -189,6 +189,7 @@ amdgpu-y += \
amdgpu_mes.o \
mes_v10_1.o \
mes_v11_0.o \
+   mes_v11_0_userqueue.o \
mes_v12_0.o
 
 # add UVD block
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
new file mode 100644
index ..75d7c58418c8
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2024 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include "amdgpu.h"
+#include "amdgpu_gfx.h"
+#include "v11_structs.h"
+#include "mes_v11_0.h"
+#include "mes_v11_0_userqueue.h"
+
+static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
+ struct drm_amdgpu_userq_in *args_in,
+ struct amdgpu_usermode_queue *queue)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   struct amdgpu_mqd *mqd_hw_default = &adev->mqds[queue->queue_type];
+   struct drm_amdgpu_userq_in *mqd_user;
+   struct amdgpu_mqd_prop *userq_props;
+   int r;
+
+   /* Incoming MQD parameters from userspace to be saved here */
+   memset(&mqd_user, 0, sizeof(mqd_user));
+
+   /* Structure to initialize MQD for userqueue using generic MQD init 
function */
+   userq_props = kzalloc(sizeof(struct amdgpu_mqd_prop), GFP_KERNEL);
+   if (!userq_props) {
+   DRM_ERROR("Failed to 

[PATCH v10 04/14] drm/amdgpu: add helpers to create userqueue object

2024-05-02 Thread Shashank Sharma
This patch introduces amdgpu_userqueue_object and its helper
functions to create and destroy this object. The helpers
create/destroy a base amdgpu_bo, kmap/unmap it, and save the
respective GPU and CPU addresses in the encapsulating
userqueue object.

These helpers will be used to create/destroy userqueue MQD, WPTR
and FW areas.

V7:
- Forked out this new patch from V11-gfx-userqueue patch to prevent
  that patch from growing very big.
- Using amdgpu_bo_create instead of amdgpu_bo_create_kernel in prep
  for eviction fences (Christian)

V9:
 - Rebase
V10:
 - Added Alex's R-B

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 62 +++
 .../gpu/drm/amd/include/amdgpu_userqueue.h| 13 
 2 files changed, 75 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index ce9b25b82e94..edbcb0f4c898 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -32,6 +32,68 @@ amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int 
qid)
	return idr_find(&uq_mgr->userq_idr, qid);
 }
 
+int amdgpu_userqueue_create_object(struct amdgpu_userq_mgr *uq_mgr,
+  struct amdgpu_userq_obj *userq_obj,
+  int size)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   struct amdgpu_bo_param bp;
+   int r;
+
+   memset(&bp, 0, sizeof(bp));
+   bp.byte_align = PAGE_SIZE;
+   bp.domain = AMDGPU_GEM_DOMAIN_GTT;
+   bp.flags = AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
+  AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
+   bp.type = ttm_bo_type_kernel;
+   bp.size = size;
+   bp.resv = NULL;
+   bp.bo_ptr_size = sizeof(struct amdgpu_bo);
+
+   r = amdgpu_bo_create(adev, &bp, &userq_obj->obj);
+   if (r) {
+   DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
+   return r;
+   }
+
+   r = amdgpu_bo_reserve(userq_obj->obj, true);
+   if (r) {
+   DRM_ERROR("Failed to reserve BO to map (%d)", r);
+   goto free_obj;
+   }
+
+   r = amdgpu_ttm_alloc_gart(&(userq_obj->obj)->tbo);
+   if (r) {
+   DRM_ERROR("Failed to alloc GART for userqueue object (%d)", r);
+   goto unresv;
+   }
+
+   r = amdgpu_bo_kmap(userq_obj->obj, &userq_obj->cpu_ptr);
+   if (r) {
+   DRM_ERROR("Failed to map BO for userqueue (%d)", r);
+   goto unresv;
+   }
+
+   userq_obj->gpu_addr = amdgpu_bo_gpu_offset(userq_obj->obj);
+   amdgpu_bo_unreserve(userq_obj->obj);
+   memset(userq_obj->cpu_ptr, 0, size);
+   return 0;
+
+unresv:
+   amdgpu_bo_unreserve(userq_obj->obj);
+
+free_obj:
+   amdgpu_bo_unref(&userq_obj->obj);
+   return r;
+}
+
+void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr *uq_mgr,
+  struct amdgpu_userq_obj *userq_obj)
+{
+   amdgpu_bo_kunmap(userq_obj->obj);
+   amdgpu_bo_unref(&userq_obj->obj);
+}
+
 static int
 amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 {
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index b739274c72e1..bbd29f68b8d4 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -29,6 +29,12 @@
 
 struct amdgpu_mqd_prop;
 
+struct amdgpu_userq_obj {
+   void *cpu_ptr;
+   uint64_t gpu_addr;
+   struct amdgpu_bo *obj;
+};
+
 struct amdgpu_usermode_queue {
int queue_type;
	uint64_t		doorbell_handle;
@@ -37,6 +43,7 @@ struct amdgpu_usermode_queue {
struct amdgpu_mqd_prop  *userq_prop;
struct amdgpu_userq_mgr *userq_mgr;
	struct amdgpu_vm	*vm;
+   struct amdgpu_userq_obj mqd;
 };
 
 struct amdgpu_userq_funcs {
@@ -60,4 +67,10 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr 
*userq_mgr, struct amdgpu_devi
 
 void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
 
+int amdgpu_userqueue_create_object(struct amdgpu_userq_mgr *uq_mgr,
+  struct amdgpu_userq_obj *userq_obj,
+  int size);
+
+void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr *uq_mgr,
+struct amdgpu_userq_obj *userq_obj);
 #endif
-- 
2.43.2



[PATCH v10 06/14] drm/amdgpu: create context space for usermode queue

2024-05-02 Thread Shashank Sharma
The MES FW expects us to allocate at least one page each as context
space for gang and process related context data. This patch creates
a joint object for both, and calculates the GPU address offsets of
these spaces.

V1: Addressed review comments on RFC patch:
Alex: Make this function IP specific

V2: Addressed review comments from Christian
- Allocate only one object for total FW space, and calculate
  offsets for each of these objects.

V3: Integration with doorbell manager

V4: Review comments:
- Remove shadow from FW space list from cover letter (Alex)
- Alignment of macro (Luben)

V5: Merged patches 5 and 6 into this single patch
Addressed review comments:
- Use lower_32_bits instead of mask (Christian)
- gfx_v11_0 instead of gfx_v11 in function names (Alex)
- Shadow and GDS objects are now coming from userspace (Christian,
  Alex)

V6:
- Add a comment to replace amdgpu_bo_create_kernel() with
  amdgpu_bo_create() during fw_ctx object creation (Christian).
- Move proc_ctx_gpu_addr, gang_ctx_gpu_addr and fw_ctx_gpu_addr out
  of generic queue structure and make it gen11 specific (Alex).

V7:
   - Using helper function to create/destroy userqueue objects.
   - Removed FW object space allocation.

V8:
   - Updating FW object address from user values.

V9:
   - updated function name from gfx_v11_* to mes_v11_*

V10:
   - making this patch independent of IP based changes, moving any
 GFX object related changes in GFX specific patch (Alex)

Cc: Alex Deucher 
Cc: Christian Koenig 
Acked-by: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 33 +++
 .../gpu/drm/amd/include/amdgpu_userqueue.h|  1 +
 2 files changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 75d7c58418c8..58cfc956cddd 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -27,6 +27,31 @@
 #include "mes_v11_0.h"
 #include "mes_v11_0_userqueue.h"
 
+#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
+#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
+
+static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
+   struct amdgpu_usermode_queue *queue,
+   struct drm_amdgpu_userq_in 
*mqd_user)
+{
+   struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+   int r, size;
+
+   /*
+* The FW expects at least one page space allocated for
+* process ctx and gang ctx each. Create an object
+* for the same.
+*/
+   size = AMDGPU_USERQ_PROC_CTX_SZ + AMDGPU_USERQ_GANG_CTX_SZ;
+   r = amdgpu_userqueue_create_object(uq_mgr, ctx, size);
+   if (r) {
+   DRM_ERROR("Failed to allocate ctx space bo for userqueue, 
err:%d\n", r);
+   return r;
+   }
+
+   return 0;
+}
+
 static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
  struct drm_amdgpu_userq_in *args_in,
  struct amdgpu_usermode_queue *queue)
@@ -89,6 +114,13 @@ static int mes_v11_0_userq_mqd_create(struct 
amdgpu_userq_mgr *uq_mgr,
goto free_mqd;
}
 
+   /* Create BO for FW operations */
+   r = mes_v11_0_userq_create_ctx_space(uq_mgr, queue, mqd_user);
+   if (r) {
+   DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
+   goto free_mqd;
+   }
+
return 0;
 
 free_mqd:
@@ -107,6 +139,7 @@ static void
 mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
struct amdgpu_usermode_queue *queue)
 {
+   amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
kfree(queue->userq_prop);
	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
 }
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index bbd29f68b8d4..643f31474bd8 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -44,6 +44,7 @@ struct amdgpu_usermode_queue {
struct amdgpu_userq_mgr *userq_mgr;
	struct amdgpu_vm	*vm;
struct amdgpu_userq_obj mqd;
+   struct amdgpu_userq_obj fw_obj;
 };
 
 struct amdgpu_userq_funcs {
-- 
2.43.2



[PATCH v10 03/14] drm/amdgpu: add new IOCTL for usermode queue

2024-05-02 Thread Shashank Sharma
This patch adds:
- A new IOCTL function to create and destroy a usermode queue.
- A new structure to keep all the user queue data in one place.
- A function to generate unique index for the queue.

V1: Worked on review comments from RFC patch series:
  - Alex: Keep a list of queues, instead of single queue per process.
  - Christian: Use the queue manager instead of global ptrs,
   Don't keep the queue structure in amdgpu_ctx

V2: Worked on review comments:
 - Christian:
   - Formatting of text
   - There is no need for queuing of userqueues, with idr in place
 - Alex:
   - Remove use_doorbell, its unnecessary
   - Reuse amdgpu_mqd_props for saving mqd fields

 - Code formatting and re-arrangement

V3:
 - Integration with doorbell manager

V4:
 - Accommodate MQD union related changes in UAPI (Alex)
 - Do not set the queue size twice (Bas)

V5:
 - Remove wrapper functions for queue indexing (Christian)
 - Do not save the queue id/idr in queue itself (Christian)
 - Move the idr allocation in the IP independent generic space
  (Christian)

V6:
 - Check the validity of input IP type (Christian)

V7:
 - Move uq_func from uq_mgr to adev (Alex)
 - Add missing free(queue) for error cases (Yifan)

V9:
 - Rebase

V10: Addressed review comments from Christian, and added R-B:
 - Do not initialize the local variable
 - Convert DRM_ERROR to DEBUG.

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Christian Koenig 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 121 ++
 .../gpu/drm/amd/include/amdgpu_userqueue.h|   2 +
 3 files changed, 124 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index b52442e2d04a..551e13693100 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2929,6 +2929,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
 };
 
 static const struct drm_driver amdgpu_kms_driver = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index effc0c7c02cf..ce9b25b82e94 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -23,6 +23,127 @@
  */
 
 #include "amdgpu.h"
+#include "amdgpu_vm.h"
+#include "amdgpu_userqueue.h"
+
+static struct amdgpu_usermode_queue *
+amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
+{
+   return idr_find(&uq_mgr->userq_idr, qid);
+}
+
+static int
+amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
+{
+   struct amdgpu_fpriv *fpriv = filp->driver_priv;
+   struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+   struct amdgpu_device *adev = uq_mgr->adev;
+   const struct amdgpu_userq_funcs *uq_funcs;
+   struct amdgpu_usermode_queue *queue;
+
+   mutex_lock(&uq_mgr->userq_mutex);
+
+   queue = amdgpu_userqueue_find(uq_mgr, queue_id);
+   if (!queue) {
+   DRM_DEBUG_DRIVER("Invalid queue id to destroy\n");
+   mutex_unlock(&uq_mgr->userq_mutex);
+   return -EINVAL;
+   }
+
+   uq_funcs = adev->userq_funcs[queue->queue_type];
+   uq_funcs->mqd_destroy(uq_mgr, queue);
+   idr_remove(&uq_mgr->userq_idr, queue_id);
+   kfree(queue);
+
+   mutex_unlock(&uq_mgr->userq_mutex);
+   return 0;
+}
+
+static int
+amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
+{
+   struct amdgpu_fpriv *fpriv = filp->driver_priv;
+   struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+   struct amdgpu_device *adev = uq_mgr->adev;
+   const struct amdgpu_userq_funcs *uq_funcs;
+   struct amdgpu_usermode_queue *queue;
+   int qid, r = 0;
+
+   /* Usermode queues are only supported for GFX/SDMA engines as of now */
+   if (args->in.ip_type != AMDGPU_HW_IP_GFX) {
+   DRM_ERROR("Usermode queue doesn't support IP type %u\n", 
args->in.ip_type);
+   return -EINVAL;
+   }
+
+   mutex_lock(&uq_mgr->userq_mutex);
+
+   uq_funcs = adev->userq_funcs[args->in.ip_type];
+   if (!uq_funcs) {
+   DRM_ERROR("Usermode queue is not supported for this IP (%u)\n", 
args->in.ip_type);
+   r = -EINVAL;
+   goto unlock;
+   }
+
+   queue = kzalloc(sizeof(struct amdgpu_usermode_queue), GFP_KERNEL);
+   if (!queue) {
+   DRM_ERROR("Failed to allocate memory for queue\n");
+   r = -ENOMEM;
+   goto unlock;
+   }
+   queue->doorbell_handle = 

[PATCH v10 02/14] drm/amdgpu: add usermode queue base code

2024-05-02 Thread Shashank Sharma
This patch adds IP independent skeleton code for amdgpu
usermode queue. It contains:
- A new file with init functions of usermode queues.
- A queue context manager in driver private data.

V1: Worked on design review comments from RFC patch series:
(https://patchwork.freedesktop.org/series/112214/)
- Alex: Keep a list of queues, instead of single queue per process.
- Christian: Use the queue manager instead of global ptrs,
   Don't keep the queue structure in amdgpu_ctx

V2:
 - Reformatted code, split the big patch into two

V3:
- Integration with doorbell manager

V4:
- Align the structure member names to the largest member's column
  (Luben)
- Added SPDX license (Luben)

V5:
- Do not add amdgpu.h in amdgpu_userqueue.h (Christian).
- Move struct amdgpu_userq_mgr into amdgpu_userqueue.h (Christian).

V6: Rebase
V9: Rebase
V10: Rebase + Alex's R-B

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Christian König 
Reviewed-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 40 
 .../gpu/drm/amd/include/amdgpu_userqueue.h| 61 +++
 6 files changed, 113 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index de7b76327f5b..2d421f17626d 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -266,6 +266,8 @@ amdgpu-y += \
 # add amdkfd interfaces
 amdgpu-y += amdgpu_amdkfd.o
 
+# add usermode queue
+amdgpu-y += amdgpu_userqueue.o
 
 ifneq ($(CONFIG_HSA_AMD),)
 AMDKFD_PATH := ../amdkfd
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 8bb8b414d511..c24e9f9d37e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -112,6 +112,7 @@
 #include "amdgpu_xcp.h"
 #include "amdgpu_seq64.h"
 #include "amdgpu_reg_state.h"
+#include "amdgpu_userqueue.h"
 
 #define MAX_GPU_INSTANCE   64
 
@@ -486,6 +487,7 @@ struct amdgpu_fpriv {
	struct mutex		bo_list_lock;
struct idr  bo_list_handles;
struct amdgpu_ctx_mgr   ctx_mgr;
+   struct amdgpu_userq_mgr userq_mgr;
/** GPU partition selection */
	uint32_t		xcp_id;
 };
@@ -1050,6 +1052,7 @@ struct amdgpu_device {
	bool			enable_uni_mes;
struct amdgpu_mes   mes;
struct amdgpu_mqd   mqds[AMDGPU_HW_IP_NUM];
+   const struct amdgpu_userq_funcs *userq_funcs[AMDGPU_HW_IP_NUM];
 
/* df */
	struct amdgpu_df	df;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 447fa858c654..b52442e2d04a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -50,6 +50,7 @@
 #include "amdgpu_reset.h"
 #include "amdgpu_sched.h"
 #include "amdgpu_xgmi.h"
+#include "amdgpu_userqueue.h"
 #include "../amdxcp/amdgpu_xcp_drv.h"
 
 /*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index a0ea6fe8d060..76d02dc330a4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -44,6 +44,7 @@
 #include "amdgpu_display.h"
 #include "amdgpu_ras.h"
 #include "amd_pcie.h"
+#include "amdgpu_userqueue.h"
 
 void amdgpu_unregister_gpu_instance(struct amdgpu_device *adev)
 {
@@ -1357,6 +1358,10 @@ int amdgpu_driver_open_kms(struct drm_device *dev, 
struct drm_file *file_priv)
 
	amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);
 
+   r = amdgpu_userq_mgr_init(&fpriv->userq_mgr, adev);
+   if (r)
+   DRM_WARN("Can't setup usermode queues, use legacy workload 
submission only\n");
+
file_priv->driver_priv = fpriv;
goto out_suspend;
 
@@ -1426,6 +1431,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 
	amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
	amdgpu_vm_fini(adev, &fpriv->vm);
+   amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
 
if (pasid)
amdgpu_pasid_free_delayed(pd->tbo.base.resv, pasid);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
new file mode 100644
index ..effc0c7c02cf
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software 

[PATCH v10 01/14] drm/amdgpu: UAPI for user queue management

2024-05-02 Thread Shashank Sharma
From: Alex Deucher 

This patch introduces a new UAPI/IOCTL for usermode graphics
queues. The userspace app will fill this structure and request
the graphics driver to add a graphics work queue for it. The
output of this UAPI is a queue id.

This UAPI maps the queue into GPU, so the graphics app can start
submitting work to the queue as soon as the call returns.
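For reference, a rough userspace sketch of exercising this IOCTL (illustrative
only, not part of the patch: the BO allocation/mapping is omitted, the
mqd/mqd_size member names follow the ptr-size pair mentioned in V5 and are
assumptions here, and libdrm's drmCommandWriteRead() is used so that the
queue id in the out member is copied back):

/* Illustrative sketch: assumes fd is an opened amdgpu render node and the
 * queue, rptr and wptr BOs are already allocated and GPU-mapped. */
#include <string.h>
#include <stdint.h>
#include <xf86drm.h>
#include <drm/amdgpu_drm.h>

static int example_userq_create(int fd, uint32_t doorbell_handle,
                                uint32_t doorbell_offset,
                                uint64_t queue_va, uint64_t queue_size,
                                uint64_t rptr_va, uint64_t wptr_va,
                                uint64_t mqd_va, uint64_t mqd_size,
                                uint32_t *queue_id)
{
    union drm_amdgpu_userq args;
    int r;

    memset(&args, 0, sizeof(args));
    args.in.op = AMDGPU_USERQ_OP_CREATE;
    args.in.ip_type = AMDGPU_HW_IP_GFX;        /* GFX queue for this example */
    args.in.doorbell_handle = doorbell_handle; /* GEM handle of the doorbell BO */
    args.in.doorbell_offset = doorbell_offset; /* 32-bit offset in that BO */
    args.in.queue_va = queue_va;               /* 256-byte aligned ring buffer */
    args.in.queue_size = queue_size;
    args.in.rptr_va = rptr_va;
    args.in.wptr_va = wptr_va;
    args.in.mqd = mqd_va;                      /* IP specific MQD data (assumed name) */
    args.in.mqd_size = mqd_size;               /* (assumed name) */

    r = drmCommandWriteRead(fd, DRM_AMDGPU_USERQ, &args, sizeof(args));
    if (r)
        return r;

    *queue_id = args.out.queue_id;
    return 0;
}

Destroying the queue would be the same call with op = AMDGPU_USERQ_OP_FREE
and the returned queue_id filled in.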

V2: Addressed review comments from Alex and Christian
- Make the doorbell offset's comment clearer
- Change the output parameter name to queue_id

V3: Integration with doorbell manager

V4:
- Updated the UAPI doc (Pierre-Eric)
- Created a Union for engine specific MQDs (Alex)
- Added Christian's R-B
V5:
- Add variables for GDS and CSA in MQD structure (Alex)
- Make MQD data a ptr-size pair instead of union (Alex)

V9:
   - renamed struct drm_amdgpu_userq_mqd_gfx_v11 to struct
 drm_amdgpu_userq_mqd as its being used for SDMA and
 compute queues as well

V10:
- keeping the drm_amdgpu_userq_mqd IP independent, moving the
  _gfx_v11 objects in a separate structure in other patch.
  (Alex)

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 include/uapi/drm/amdgpu_drm.h | 90 +++
 1 file changed, 90 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 5b6c0055cfcf..f7313e576f06 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -54,6 +54,7 @@ extern "C" {
 #define DRM_AMDGPU_VM  0x13
 #define DRM_AMDGPU_FENCE_TO_HANDLE 0x14
 #define DRM_AMDGPU_SCHED   0x15
+#define DRM_AMDGPU_USERQ   0x16
 
 #define DRM_IOCTL_AMDGPU_GEM_CREATEDRM_IOWR(DRM_COMMAND_BASE + 
DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
 #define DRM_IOCTL_AMDGPU_GEM_MMAP  DRM_IOWR(DRM_COMMAND_BASE + 
DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
@@ -71,6 +72,7 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_VMDRM_IOWR(DRM_COMMAND_BASE + 
DRM_AMDGPU_VM, union drm_amdgpu_vm)
 #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + 
DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
 #define DRM_IOCTL_AMDGPU_SCHED DRM_IOW(DRM_COMMAND_BASE + 
DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
+#define DRM_IOCTL_AMDGPU_USERQ DRM_IOW(DRM_COMMAND_BASE + 
DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
 
 /**
  * DOC: memory domains
@@ -317,6 +319,94 @@ union drm_amdgpu_ctx {
union drm_amdgpu_ctx_out out;
 };
 
+/* user queue IOCTL */
+#define AMDGPU_USERQ_OP_CREATE 1
+#define AMDGPU_USERQ_OP_FREE   2
+
+/* Flag to indicate secure buffer related workload, unused for now */
+#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
+/* Flag to indicate AQL workload, unused for now */
+#define AMDGPU_USERQ_MQD_FLAGS_AQL (1 << 1)
+
+/*
+ * MQD (memory queue descriptor) is a set of parameters which allow
+ * the GPU to uniquely define and identify a usermode queue. This
+ * structure defines the MQD for GFX-V11 IP ver 0.
+ */
+struct drm_amdgpu_userq_in {
+   /** AMDGPU_USERQ_OP_* */
+   __u32   op;
+   /** Queue handle for USERQ_OP_FREE */
+   __u32   queue_id;
+   /** the target GPU engine to execute workload (AMDGPU_HW_IP_*) */
+   __u32   ip_type;
+   /**
+* @flags: flags to indicate special function for queue like secure
+* buffer (TMZ). Unused for now.
+*/
+   __u32   flags;
+   /**
+* @doorbell_handle: the handle of doorbell GEM object
+* associated to this client.
+*/
+   __u32   doorbell_handle;
+   /**
+* @doorbell_offset: 32-bit offset of the doorbell in the doorbell bo.
+* Kernel will generate absolute doorbell offset using doorbell_handle
+* and doorbell_offset in the doorbell bo.
+*/
+   __u32   doorbell_offset;
+
+   /**
+* @queue_va: Virtual address of the GPU memory which holds the queue
+* object. The queue holds the workload packets.
+*/
+   __u64   queue_va;
+   /**
+* @queue_size: Size of the queue in bytes, this needs to be 256-byte
+* aligned.
+*/
+   __u64   queue_size;
+   /**
+* @rptr_va : Virtual address of the GPU memory which holds the ring 
RPTR.
+* This object must be at least 8 byte in size and aligned to 8-byte 
offset.
+*/
+   __u64   rptr_va;
+   /**
+* @wptr_va : Virtual address of the GPU memory which holds the ring 
WPTR.
+* This object must be at least 8 byte in size and aligned to 8-byte 
offset.
+*
+* Queue, RPTR and WPTR can come from the same object, as long as the 
size
+* and alignment related requirements are met.
+*/
+   __u64   wptr_va;
+   /**
+* @mqd: Queue descriptor for USERQ_OP_CREATE
+* MQD data can be of different size for 

[PATCH v10 00/14] AMDGPU usermode queues

2024-05-02 Thread Shashank Sharma
This patch series introduces AMDGPU usermode queues for gfx workloads.
Usermode queues are a method of GPU workload submission into the graphics
hardware without any interaction with the kernel/DRM schedulers. In this
method, a userspace graphics application can create its own workqueue and
submit work directly to the GPU HW.

The general idea of how this is supposed to work:
- The application creates the following GPU objects:
  - A queue object to hold the workload packets.
  - A read pointer object.
  - A write pointer object.
  - A doorbell page.
  - Shadow buffer pages.
  - GDS buffer pages (as required).
- The application picks a 32-bit offset in the doorbell page for this
  queue.
- The application uses the usermode_queue_create IOCTL introduced in
  this patch, by passing the GPU addresses of these objects (read ptr,
  write ptr, queue base address, shadow, gds) with doorbell object and
  32-bit doorbell offset in the doorbell page.
- The kernel creates the queue and maps it in the HW.
- The application maps the GPU buffers in process address space.
- The application can start submitting the data in the queue as soon as
  the kernel IOCTL returns.
- After filling the workload data in the queue, the app must write the
  number of dwords added in the queue into the doorbell offset and the
  WPTR buffer, and the GPU will start fetching the data.
- This series adds usermode queue support for all three MES based IPs
  (GFX, SDMA and Compute).
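
To make the last two steps of the flow above concrete, here is a rough sketch
of the userspace side of the submission (illustrative only: names are
placeholders, the packet format is IP specific, and the required cache
flushes and memory barriers are omitted):

/* Illustrative sketch: queue_cpu, wptr_cpu and doorbell_cpu are the
 * application's CPU mappings of the queue BO, WPTR BO and doorbell page. */
#include <stdint.h>

static void example_submit(uint32_t *queue_cpu, uint32_t ring_size_dw,
                           uint64_t *wptr_cpu, uint64_t *doorbell_cpu,
                           uint32_t doorbell_offset,
                           const uint32_t *pkt, uint32_t num_dwords,
                           uint64_t *wptr_dw)
{
    uint32_t i;

    /* 1. Copy the workload packets into the ring at the current wptr. */
    for (i = 0; i < num_dwords; i++)
        queue_cpu[(*wptr_dw + i) % ring_size_dw] = pkt[i];
    *wptr_dw += num_dwords;

    /* 2. Publish the updated write pointer (in dwords) in the WPTR buffer. */
    *wptr_cpu = *wptr_dw;

    /* 3. Ring the doorbell at the 32-bit offset picked at queue creation;
     * the GPU starts fetching the new data from here on. The doorbell
     * offset is assumed to be a byte offset into the doorbell page. */
    doorbell_cpu[doorbell_offset / sizeof(uint64_t)] = *wptr_dw;
}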

libDRM changes for this series and a sample DRM test program can be
found here:
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287

MESA changes consuming this series can be seen in the MR here:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29010

Alex Deucher (1):
  drm/amdgpu: UAPI for user queue management

Shashank Sharma (13):
  drm/amdgpu: add usermode queue base code
  drm/amdgpu: add new IOCTL for usermode queue
  drm/amdgpu: add helpers to create userqueue object
  drm/amdgpu: create MES-V11 usermode queue for GFX
  drm/amdgpu: create context space for usermode queue
  drm/amdgpu: map usermode queue into MES
  drm/amdgpu: map wptr BO into GART
  drm/amdgpu: generate doorbell index for userqueue
  drm/amdgpu: cleanup leftover queues
  drm/amdgpu: enable GFX-V11 userqueue support
  drm/amdgpu: enable SDMA-V6 usermode queues
  drm/amdgpu: enable compute/gfx usermode queue
  drm/amdgpu: add kernel config for gfx-userqueue

 drivers/gpu/drm/amd/amdgpu/Kconfig|   8 +
 drivers/gpu/drm/amd/amdgpu/Makefile   |   5 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |   6 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 296 +++
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c|   9 +
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 338 ++
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h  |  30 ++
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c|   5 +
 .../gpu/drm/amd/include/amdgpu_userqueue.h|  79 
 include/uapi/drm/amdgpu_drm.h | 122 +++
 12 files changed, 903 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h

-- 
2.43.2



Re: [PATCH v9 14/14] drm/amdgpu: add kernel config for gfx-userqueue

2024-05-02 Thread Sharma, Shashank



On 02/05/2024 17:22, Christian König wrote:



Am 26.04.24 um 15:48 schrieb Shashank Sharma:

This patch:
- adds a kernel config option "CONFIG_DRM_AMD_USERQ_GFX"
- moves the usequeue initialization code for all IPs under
   this flag

so that the userqueue works only when the config is enabled.

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
---
  drivers/gpu/drm/amd/amdgpu/Kconfig | 8 
  drivers/gpu/drm/amd/amdgpu/Makefile    | 8 ++--
  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 
  drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 3 +++
  4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
b/drivers/gpu/drm/amd/amdgpu/Kconfig

index 22d88f8ef527..bba963527d22 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -80,6 +80,14 @@ config DRM_AMDGPU_WERROR
    Add -Werror to the build flags for amdgpu.ko.
    Only enable this if you are warning code for amdgpu.ko.
  +config DRM_AMDGPU_USERQ_GFX
+    bool "Enable Navi 3x gfx usermode queues"
+    depends on DRM_AMDGPU
+    default n
+    help
+  Choose this option to enable usermode queue support for GFX
+  workload submission. This feature is supported on Navi 3X 
only.


When this is for Navi 3x only I would name that 
DRM_AMDGPU_NAVI3X_USERQ instead.



Noted,
And since we enable/disable GFX, Compute and SDMA I would drop "gfx" 
from the comment and description.


Noted, I just did not want users to get confused with KFD queues, hence 
added GFX.


I will update the patch with both the changes.

- Shashank


Apart from that the approach looks good to me.

Christian.


+
  source "drivers/gpu/drm/amd/acp/Kconfig"
  source "drivers/gpu/drm/amd/display/Kconfig"
  source "drivers/gpu/drm/amd/amdkfd/Kconfig"
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile

index a640bfa468ad..0b17fc1740a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -184,8 +184,12 @@ amdgpu-y += \
  amdgpu-y += \
  amdgpu_mes.o \
  mes_v10_1.o \
-    mes_v11_0.o \
-    mes_v11_0_userqueue.o
+    mes_v11_0.o
+
+# add GFX userqueue support
+ifneq ($(CONFIG_DRM_AMD_USERQ_GFX),)
+amdgpu-y += mes_v11_0_userqueue.o
+endif
    # add UVD block
  amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c

index 27b86f7fe949..8591aed9f9ab 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1349,8 +1349,10 @@ static int gfx_v11_0_sw_init(void *handle)
  adev->gfx.mec.num_mec = 2;
  adev->gfx.mec.num_pipe_per_mec = 4;
  adev->gfx.mec.num_queue_per_pipe = 4;
+#ifdef CONFIG_DRM_AMD_USERQ_GFX
  adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
  adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;

+#endif
  break;
  case IP_VERSION(11, 0, 1):
  case IP_VERSION(11, 0, 4):
@@ -1362,8 +1364,10 @@ static int gfx_v11_0_sw_init(void *handle)
  adev->gfx.mec.num_mec = 1;
  adev->gfx.mec.num_pipe_per_mec = 4;
  adev->gfx.mec.num_queue_per_pipe = 4;
+#ifdef CONFIG_DRM_AMD_USERQ_GFX
  adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
  adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;

+#endif
  break;
  default:
  adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c

index 90354a70c807..084059c95db6 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -1267,7 +1267,10 @@ static int sdma_v6_0_sw_init(void *handle)
  return -EINVAL;
  }
  +#ifdef CONFIG_DRM_AMD_USERQ_GFX
  adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
+#endif
+
  return r;
  }




Re: [PATCH v9 00/14] AMDGPU usermode queues

2024-05-02 Thread Alex Deucher
On Fri, Apr 26, 2024 at 10:17 AM Shashank Sharma
 wrote:
>
> This patch series introduces AMDGPU usermode queues for gfx workloads.
> Usermode queues is a method of GPU workload submission into the graphics
> hardware without any interaction with kernel/DRM schedulers. In this
> method, a userspace graphics application can create its own workqueue and
> submit it directly in the GPU HW.
>
> The general idea of how this is supposed to work:
> - The application creates the following GPU objetcs:
>   - A queue object to hold the workload packets.
>   - A read pointer object.
>   - A write pointer object.
>   - A doorbell page.
>   - Shadow bufffer pages.
>   - GDS buffer pages (as required).
> - The application picks a 32-bit offset in the doorbell page for this
>   queue.
> - The application uses the usermode_queue_create IOCTL introduced in
>   this patch, by passing the GPU addresses of these objects (read ptr,
>   write ptr, queue base address, shadow, gds) with doorbell object and
>   32-bit doorbell offset in the doorbell page.
> - The kernel creates the queue and maps it in the HW.
> - The application maps the GPU buffers in process address space.
> - The application can start submitting the data in the queue as soon as
>   the kernel IOCTL returns.
> - After filling the workload data in the queue, the app must write the
>   number of dwords added in the queue into the doorbell offset and the
>   WPTR buffer, and the GPU will start fetching the data.
> - This series adds usermode queue support for all three MES based IPs
>   (GFX, SDMA and Compute).

I think we also need a new INFO IOCTL query to get the doorbell
offsets for each engine type within each doorbell page.

Alex

>
> libDRM changes for this series and a sample DRM test program can be found
> in the MESA merge request here:
> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
>
> Alex Deucher (1):
>   drm/amdgpu: UAPI for user queue management
>
> Arvind Yadav (1):
>   drm/amdgpu: enable compute/gfx usermode queue
>
> Shashank Sharma (12):
>   drm/amdgpu: add usermode queue base code
>   drm/amdgpu: add new IOCTL for usermode queue
>   drm/amdgpu: add helpers to create userqueue object
>   drm/amdgpu: create MES-V11 usermode queue for GFX
>   drm/amdgpu: create context space for usermode queue
>   drm/amdgpu: map usermode queue into MES
>   drm/amdgpu: map wptr BO into GART
>   drm/amdgpu: generate doorbell index for userqueue
>   drm/amdgpu: cleanup leftover queues
>   drm/amdgpu: fix MES GFX mask
>   drm/amdgpu: enable SDMA usermode queues
>   drm/amdgpu: add kernel config for gfx-userqueue
>
>  drivers/gpu/drm/amd/amdgpu/Kconfig|   8 +
>  drivers/gpu/drm/amd/amdgpu/Makefile   |   7 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |   3 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |   2 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |   6 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c   |   3 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h   |   1 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 296 
>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c|  10 +
>  drivers/gpu/drm/amd/amdgpu/mes_v10_1.c|   9 +-
>  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c|   9 +-
>  .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 317 ++
>  drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c|   6 +
>  .../gpu/drm/amd/include/amdgpu_userqueue.h|  79 +
>  include/uapi/drm/amdgpu_drm.h | 111 ++
>  15 files changed, 859 insertions(+), 8 deletions(-)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>  create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>
> --
> 2.43.2
>


Re: [PATCH v9 11/14] drm/amdgpu: fix MES GFX mask

2024-05-02 Thread Sharma, Shashank



On 02/05/2024 17:19, Christian König wrote:

Am 26.04.24 um 15:48 schrieb Shashank Sharma:

Current MES GFX mask prevents FW to enable oversubscription. This patch
does the following:
- Fixes the mask values and adds a description for the same.
- Removes the central mask setup and makes it IP specific, as it would
   be different when the number of pipes and queues are different.

V9: introduce this patch in the series


As far as I can see this is a bug fix for existing code and should be 
pushed completely independent of the other work to amd-staging-drm-next.


Agreed, I added it here for completion of series. I had pushed this as 
single patch as well last week, I will push it accordingly.


- Shashank


Regards,
Christian.



Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 3 ---
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 1 -
  drivers/gpu/drm/amd/amdgpu/mes_v10_1.c  | 9 +++--
  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c  | 9 +++--
  4 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c

index a00cf4756ad0..b405fafc0b71 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -151,9 +151,6 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
  adev->mes.compute_hqd_mask[i] = 0xc;
  }
  -    for (i = 0; i < AMDGPU_MES_MAX_GFX_PIPES; i++)
-    adev->mes.gfx_hqd_mask[i] = i ? 0 : 0xfffffffe;
-
  for (i = 0; i < AMDGPU_MES_MAX_SDMA_PIPES; i++) {
  if (amdgpu_ip_version(adev, SDMA0_HWIP, 0) <
  IP_VERSION(6, 0, 0))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h

index 4c8fc3117ef8..598556619337 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -110,7 +110,6 @@ struct amdgpu_mes {
  uint32_t    vmid_mask_gfxhub;
  uint32_t    vmid_mask_mmhub;
  uint32_t compute_hqd_mask[AMDGPU_MES_MAX_COMPUTE_PIPES];
-    uint32_t gfx_hqd_mask[AMDGPU_MES_MAX_GFX_PIPES];
  uint32_t sdma_hqd_mask[AMDGPU_MES_MAX_SDMA_PIPES];
  uint32_t aggregated_doorbells[AMDGPU_MES_PRIORITY_NUM_LEVELS];
  uint32_t    sch_ctx_offs;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c

index 1e5ad1e08d2a..4d1121d1a1e7 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
@@ -290,8 +290,13 @@ static int mes_v10_1_set_hw_resources(struct 
amdgpu_mes *mes)

  mes_set_hw_res_pkt.compute_hqd_mask[i] =
  mes->compute_hqd_mask[i];
  -    for (i = 0; i < MAX_GFX_PIPES; i++)
-    mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes->gfx_hqd_mask[i];
+    /*
+ * GFX pipe 0 queue 0 is being used by kernel
+ * Set GFX pipe 0 queue 1 for MES scheduling
+ * GFX pipe 1 can't be used for MES due to HW limitation.
+ */
+    mes_set_hw_res_pkt.gfx_hqd_mask[0] = 0x2;
+    mes_set_hw_res_pkt.gfx_hqd_mask[1] = 0;
    for (i = 0; i < MAX_SDMA_PIPES; i++)
  mes_set_hw_res_pkt.sdma_hqd_mask[i] = mes->sdma_hqd_mask[i];
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c

index 63f281a9984d..feb7fa2c304c 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -387,8 +387,13 @@ static int mes_v11_0_set_hw_resources(struct 
amdgpu_mes *mes)

  mes_set_hw_res_pkt.compute_hqd_mask[i] =
  mes->compute_hqd_mask[i];
  -    for (i = 0; i < MAX_GFX_PIPES; i++)
-    mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes->gfx_hqd_mask[i];
+    /*
+ * GFX pipe 0 queue 0 is being used by kernel
+ * Set GFX pipe 0 queue 1 for MES scheduling
+ * GFX pipe 1 can't be used for MES due to HW limitation.
+ */
+    mes_set_hw_res_pkt.gfx_hqd_mask[0] = 0x2;
+    mes_set_hw_res_pkt.gfx_hqd_mask[1] = 0;
    for (i = 0; i < MAX_SDMA_PIPES; i++)
  mes_set_hw_res_pkt.sdma_hqd_mask[i] = mes->sdma_hqd_mask[i];




Re: [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART

2024-05-02 Thread Sharma, Shashank



On 02/05/2024 17:18, Christian König wrote:

Am 26.04.24 um 15:48 schrieb Shashank Sharma:

To support oversubscription, MES FW expects WPTR BOs to
be mapped into GART, before they are submitted to usermode
queues. This patch adds a function for the same.

V4: fix the wptr value before mapping lookup (Bas, Christian).

V5: Addressed review comments from Christian:
 - Either pin object or allocate from GART, but not both.
 - All the handling must be done with the VM locks held.

V7: Addressed review comments from Christian:
 - Do not take vm->eviction_lock
 - Use amdgpu_bo_gpu_offset to get the wptr_bo GPU offset

V8: Rebase
V9: Changed the function names from gfx_v11* to mes_v11*

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 


The patch itself looks good, but this really need the eviction fence 
to work properly.


Otherwise it can be that the BO mapped into the GART is evicted at 
some point.



Noted, eviction fences will be following up soon.

- Shashank



Christian.


---
  .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 77 +++
  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
  2 files changed, 78 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c

index 8d2cd61af26b..37b80626e792 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -30,6 +30,74 @@
  #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
  #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
  +static int
+mes_v11_0_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct 
amdgpu_bo *bo)

+{
+    int ret;
+
+    ret = amdgpu_bo_reserve(bo, true);
+    if (ret) {
+    DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
+    goto err_reserve_bo_failed;
+    }
+
+    ret = amdgpu_ttm_alloc_gart(&bo->tbo);
+    if (ret) {
+    DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
+    goto err_map_bo_gart_failed;
+    }
+
+    amdgpu_bo_unreserve(bo);
+    bo = amdgpu_bo_ref(bo);
+
+    return 0;
+
+err_map_bo_gart_failed:
+    amdgpu_bo_unreserve(bo);
+err_reserve_bo_failed:
+    return ret;
+}
+
+static int
+mes_v11_0_create_wptr_mapping(struct amdgpu_userq_mgr *uq_mgr,
+  struct amdgpu_usermode_queue *queue,
+  uint64_t wptr)
+{
+    struct amdgpu_device *adev = uq_mgr->adev;
+    struct amdgpu_bo_va_mapping *wptr_mapping;
+    struct amdgpu_vm *wptr_vm;
+    struct amdgpu_userq_obj *wptr_obj = >wptr_obj;
+    int ret;
+
+    wptr_vm = queue->vm;
+    ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
+    if (ret)
+    return ret;
+
+    wptr &= AMDGPU_GMC_HOLE_MASK;
+    wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> PAGE_SHIFT);
+    amdgpu_bo_unreserve(wptr_vm->root.bo);
+    if (!wptr_mapping) {
+    DRM_ERROR("Failed to lookup wptr bo\n");
+    return -EINVAL;
+    }
+
+    wptr_obj->obj = wptr_mapping->bo_va->base.bo;
+    if (wptr_obj->obj->tbo.base.size > PAGE_SIZE) {
+    DRM_ERROR("Requested GART mapping for wptr bo larger than 
one page\n");

+    return -EINVAL;
+    }
+
+    ret = mes_v11_0_map_gtt_bo_to_gart(adev, wptr_obj->obj);
+    if (ret) {
+    DRM_ERROR("Failed to map wptr bo to GART\n");
+    return ret;
+    }
+
+    queue->wptr_obj.gpu_addr = amdgpu_bo_gpu_offset_no_check(wptr_obj->obj);
+    return 0;
+}
+
  static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
 struct amdgpu_usermode_queue *queue,
 struct amdgpu_mqd_prop *userq_props)
@@ -61,6 +129,7 @@ static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
  queue_input.queue_size = userq_props->queue_size >> 2;
  queue_input.doorbell_offset = userq_props->doorbell_index;
  queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
+    queue_input.wptr_mc_addr = queue->wptr_obj.gpu_addr;
    amdgpu_mes_lock(>mes);
  r = adev->mes.funcs->add_hw_queue(>mes, _input);
@@ -187,6 +256,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
  goto free_mqd;
  }
  +    /* FW expects WPTR BOs to be mapped into GART */
+    r = mes_v11_0_create_wptr_mapping(uq_mgr, queue, userq_props->wptr_gpu_addr);
+    if (r) {
+    DRM_ERROR("Failed to create WPTR mapping\n");
+    goto free_ctx;
+    }
+
  /* Map userqueue into FW using MES */
  r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
  if (r) {
@@ -216,6 +292,7 @@ mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
  struct amdgpu_usermode_queue *queue)
  {
  mes_v11_0_userq_unmap(uq_mgr, queue);
+    amdgpu_bo_unref(>wptr_obj.obj);
  amdgpu_userqueue_destroy_object(uq_mgr, >fw_obj);
  kfree(queue->userq_prop);
  amdgpu_userqueue_destroy_object(uq_mgr, >mqd);
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 

Re: [PATCH v9 05/14] drm/amdgpu: create MES-V11 usermode queue for GFX

2024-05-02 Thread Sharma, Shashank



On 02/05/2024 17:14, Christian König wrote:



Am 26.04.24 um 15:48 schrieb Shashank Sharma:

A Memory queue descriptor (MQD) of a userqueue defines it in
the hw's context. As MQD format can vary between different
graphics IPs, we need gfx GEN specific handlers to create MQDs.

This patch:
- Adds a new file which will be used for MES based userqueue
   functions targeting GFX and SDMA IP.
- Introduces MQD handler functions for the usermode queues.
- Adds new functions to create and destroy userqueue MQD for
   MES-V11 for GFX IP.

V1: Worked on review comments from Alex:
 - Make MQD functions GEN and IP specific

V2: Worked on review comments from Alex:
 - Reuse the existing adev->mqd[ip] for MQD creation
 - Formatting and arrangement of code

V3:
 - Integration with doorbell manager

V4: Review comments addressed:
 - Do not create a new file for userq, reuse gfx_v11_0.c (Alex)
 - Align name of structure members (Luben)
 - Don't break up the Cc tag list and the Sob tag list in commit
   message (Luben)
V5:
    - No need to reserve the bo for MQD (Christian).
    - Some more changes to support IP specific MQD creation.

V6:
    - Add a comment reminding us to replace the amdgpu_bo_create_kernel()
      calls while creating MQD object to amdgpu_bo_create() once eviction
      fences are ready (Christian).

V7:
    - Re-arrange userqueue functions in adev instead of uq_mgr (Alex)
    - Use memdup_user instead of copy_from_user (Christian)

V9:
    - Moved userqueue code from gfx_v11_0.c to new file mes_v11_0.c so
  that it can be reused for SDMA userqueues as well (Shashank, Alex)

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/Makefile   |   3 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c    |   4 +
  .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 110 ++
  3 files changed, 116 insertions(+), 1 deletion(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 05a2d1714070..a640bfa468ad 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -184,7 +184,8 @@ amdgpu-y += \
  amdgpu-y += \
  amdgpu_mes.o \
  mes_v10_1.o \
-    mes_v11_0.o
+    mes_v11_0.o \
+    mes_v11_0_userqueue.o


Do we really need a new C file for this or could we put the two 
functions into mes_v11_0.c as well?


Apart from that it looks correct to me, but I'm really not that deep 
inside the code at the moment.


Actually, this patch adds only these two functions; the upcoming patches 
then add several more on top of them to create/destroy FW objects, 
map/unmap the queue, handle the doorbell and map the wptr BO. So when we 
look at it in the end, it's probably fine :).


- Shashank


Regards,
Christian.


    # add UVD block
  amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index f7325b02a191..525bd0f4d3f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1331,6 +1331,8 @@ static int gfx_v11_0_rlc_backdoor_autoload_enable(struct amdgpu_device *adev)
  return 0;
  }
  +extern const struct amdgpu_userq_funcs userq_mes_v11_0_funcs;
+
  static int gfx_v11_0_sw_init(void *handle)
  {
  int i, j, k, r, ring_id = 0;
@@ -1347,6 +1349,7 @@ static int gfx_v11_0_sw_init(void *handle)
  adev->gfx.mec.num_mec = 2;
  adev->gfx.mec.num_pipe_per_mec = 4;
  adev->gfx.mec.num_queue_per_pipe = 4;
+    adev->userq_funcs[AMDGPU_HW_IP_GFX] = _mes_v11_0_funcs;
  break;
  case IP_VERSION(11, 0, 1):
  case IP_VERSION(11, 0, 4):
@@ -1358,6 +1361,7 @@ static int gfx_v11_0_sw_init(void *handle)
  adev->gfx.mec.num_mec = 1;
  adev->gfx.mec.num_pipe_per_mec = 4;
  adev->gfx.mec.num_queue_per_pipe = 4;
+    adev->userq_funcs[AMDGPU_HW_IP_GFX] = _mes_v11_0_funcs;
  break;
  default:
  adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
new file mode 100644
index ..9e7dee77d344
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -0,0 +1,110 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2024 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this 

Re: [PATCH v9 14/14] drm/amdgpu: add kernel config for gfx-userqueue

2024-05-02 Thread Christian König




Am 26.04.24 um 15:48 schrieb Shashank Sharma:

This patch:
- adds a kernel config option "CONFIG_DRM_AMD_USERQ_GFX"
- moves the userqueue initialization code for all IPs under
   this flag

so that the userqueue works only when the config is enabled.

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
---
  drivers/gpu/drm/amd/amdgpu/Kconfig | 8 
  drivers/gpu/drm/amd/amdgpu/Makefile| 8 ++--
  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 
  drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 3 +++
  4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 22d88f8ef527..bba963527d22 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -80,6 +80,14 @@ config DRM_AMDGPU_WERROR
  Add -Werror to the build flags for amdgpu.ko.
  Only enable this if you are warning code for amdgpu.ko.
  
+config DRM_AMDGPU_USERQ_GFX

+   bool "Enable Navi 3x gfx usermode queues"
+   depends on DRM_AMDGPU
+   default n
+   help
+ Choose this option to enable usermode queue support for GFX
+  workload submission. This feature is supported on Navi 3X only.


When this is for Navi 3x only I would name that DRM_AMDGPU_NAVI3X_USERQ 
instead.


And since we enable/disable GFX, Compute and SDMA I would drop "gfx" 
from the comment and description.


Apart from that the approach looks good to me.

Christian.


+
  source "drivers/gpu/drm/amd/acp/Kconfig"
  source "drivers/gpu/drm/amd/display/Kconfig"
  source "drivers/gpu/drm/amd/amdkfd/Kconfig"
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index a640bfa468ad..0b17fc1740a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -184,8 +184,12 @@ amdgpu-y += \
  amdgpu-y += \
amdgpu_mes.o \
mes_v10_1.o \
-   mes_v11_0.o \
-   mes_v11_0_userqueue.o
+   mes_v11_0.o
+
+# add GFX userqueue support
+ifneq ($(CONFIG_DRM_AMD_USERQ_GFX),)
+amdgpu-y += mes_v11_0_userqueue.o
+endif
  
  # add UVD block

  amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 27b86f7fe949..8591aed9f9ab 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1349,8 +1349,10 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_mec = 2;
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
+#ifdef CONFIG_DRM_AMD_USERQ_GFX
adev->userq_funcs[AMDGPU_HW_IP_GFX] = _mes_v11_0_funcs;
adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = 
_mes_v11_0_funcs;
+#endif
break;
case IP_VERSION(11, 0, 1):
case IP_VERSION(11, 0, 4):
@@ -1362,8 +1364,10 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_mec = 1;
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
+#ifdef CONFIG_DRM_AMD_USERQ_GFX
adev->userq_funcs[AMDGPU_HW_IP_GFX] = _mes_v11_0_funcs;
adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = 
_mes_v11_0_funcs;
+#endif
break;
default:
adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index 90354a70c807..084059c95db6 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -1267,7 +1267,10 @@ static int sdma_v6_0_sw_init(void *handle)
return -EINVAL;
}
  
+#ifdef CONFIG_DRM_AMD_USERQ_GFX

adev->userq_funcs[AMDGPU_HW_IP_DMA] = _mes_v11_0_funcs;
+#endif
+
return r;
  }
  




Re: [PATCH v9 11/14] drm/amdgpu: fix MES GFX mask

2024-05-02 Thread Christian König

Am 26.04.24 um 15:48 schrieb Shashank Sharma:

Current MES GFX mask prevents FW to enable oversubscription. This patch
does the following:
- Fixes the mask values and adds a description for the same.
- Removes the central mask setup and makes it IP specific, as it would
   be different when the number of pipes and queues are different.

V9: introduce this patch in the series


As far as I can see this is a bug fix for existing code and should be 
pushed completely independent of the other work to amd-staging-drm-next.


Regards,
Christian.



Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 3 ---
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 1 -
  drivers/gpu/drm/amd/amdgpu/mes_v10_1.c  | 9 +++--
  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c  | 9 +++--
  4 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index a00cf4756ad0..b405fafc0b71 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -151,9 +151,6 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
adev->mes.compute_hqd_mask[i] = 0xc;
}
  
-	for (i = 0; i < AMDGPU_MES_MAX_GFX_PIPES; i++)

-   adev->mes.gfx_hqd_mask[i] = i ? 0 : 0xfffe;
-
for (i = 0; i < AMDGPU_MES_MAX_SDMA_PIPES; i++) {
if (amdgpu_ip_version(adev, SDMA0_HWIP, 0) <
IP_VERSION(6, 0, 0))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index 4c8fc3117ef8..598556619337 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -110,7 +110,6 @@ struct amdgpu_mes {
uint32_tvmid_mask_gfxhub;
uint32_tvmid_mask_mmhub;
uint32_t
compute_hqd_mask[AMDGPU_MES_MAX_COMPUTE_PIPES];
-   uint32_tgfx_hqd_mask[AMDGPU_MES_MAX_GFX_PIPES];
uint32_t
sdma_hqd_mask[AMDGPU_MES_MAX_SDMA_PIPES];
uint32_t
aggregated_doorbells[AMDGPU_MES_PRIORITY_NUM_LEVELS];
uint32_tsch_ctx_offs;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
index 1e5ad1e08d2a..4d1121d1a1e7 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
@@ -290,8 +290,13 @@ static int mes_v10_1_set_hw_resources(struct amdgpu_mes 
*mes)
mes_set_hw_res_pkt.compute_hqd_mask[i] =
mes->compute_hqd_mask[i];
  
-	for (i = 0; i < MAX_GFX_PIPES; i++)

-   mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes->gfx_hqd_mask[i];
+   /*
+* GFX pipe 0 queue 0 is being used by kernel
+* Set GFX pipe 0 queue 1 for MES scheduling
+* GFX pipe 1 can't be used for MES due to HW limitation.
+*/
+   mes_set_hw_res_pkt.gfx_hqd_mask[0] = 0x2;
+   mes_set_hw_res_pkt.gfx_hqd_mask[1] = 0;
  
  	for (i = 0; i < MAX_SDMA_PIPES; i++)

mes_set_hw_res_pkt.sdma_hqd_mask[i] = mes->sdma_hqd_mask[i];
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 63f281a9984d..feb7fa2c304c 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -387,8 +387,13 @@ static int mes_v11_0_set_hw_resources(struct amdgpu_mes 
*mes)
mes_set_hw_res_pkt.compute_hqd_mask[i] =
mes->compute_hqd_mask[i];
  
-	for (i = 0; i < MAX_GFX_PIPES; i++)

-   mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes->gfx_hqd_mask[i];
+   /*
+* GFX pipe 0 queue 0 is being used by kernel
+* Set GFX pipe 0 queue 1 for MES scheduling
+* GFX pipe 1 can't be used for MES due to HW limitation.
+*/
+   mes_set_hw_res_pkt.gfx_hqd_mask[0] = 0x2;
+   mes_set_hw_res_pkt.gfx_hqd_mask[1] = 0;
  
  	for (i = 0; i < MAX_SDMA_PIPES; i++)

mes_set_hw_res_pkt.sdma_hqd_mask[i] = mes->sdma_hqd_mask[i];
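
To spell out why the new mask value is 0x2: each bit in gfx_hqd_mask selects one
hardware queue slot on the pipe, queue 0 stays reserved for the kernel GFX ring,
and only queue 1 is handed to MES. A minimal illustration (the names below are
made up for clarity and are not part of the patch):

    /* Illustrative only: how the 0x2 mask above maps to "pipe 0, queue 1". */
    #define EXAMPLE_GFX_PIPE0_KERNEL_QUEUE	0	/* owned by the kernel GFX ring */
    #define EXAMPLE_GFX_PIPE0_MES_QUEUE		1	/* handed over to MES scheduling */

    static inline u32 example_gfx_pipe0_hqd_mask(void)
    {
    	return BIT(EXAMPLE_GFX_PIPE0_MES_QUEUE);	/* BIT(1) == 0x2 */
    }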




Re: [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART

2024-05-02 Thread Christian König

Am 26.04.24 um 15:48 schrieb Shashank Sharma:

To support oversubscription, MES FW expects WPTR BOs to
be mapped into GART, before they are submitted to usermode
queues. This patch adds a function for the same.

V4: fix the wptr value before mapping lookup (Bas, Christian).

V5: Addressed review comments from Christian:
 - Either pin object or allocate from GART, but not both.
 - All the handling must be done with the VM locks held.

V7: Addressed review comments from Christian:
 - Do not take vm->eviction_lock
 - Use amdgpu_bo_gpu_offset to get the wptr_bo GPU offset

V8: Rebase
V9: Changed the function names from gfx_v11* to mes_v11*

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 


The patch itself looks good, but this really need the eviction fence to 
work properly.


Otherwise it can be that the BO mapped into the GART is evicted at some 
point.


Christian.


---
  .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 77 +++
  .../gpu/drm/amd/include/amdgpu_userqueue.h|  1 +
  2 files changed, 78 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 8d2cd61af26b..37b80626e792 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -30,6 +30,74 @@
  #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
  #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
  
+static int

+mes_v11_0_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo)
+{
+   int ret;
+
+   ret = amdgpu_bo_reserve(bo, true);
+   if (ret) {
+   DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
+   goto err_reserve_bo_failed;
+   }
+
+   ret = amdgpu_ttm_alloc_gart(>tbo);
+   if (ret) {
+   DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
+   goto err_map_bo_gart_failed;
+   }
+
+   amdgpu_bo_unreserve(bo);
+   bo = amdgpu_bo_ref(bo);
+
+   return 0;
+
+err_map_bo_gart_failed:
+   amdgpu_bo_unreserve(bo);
+err_reserve_bo_failed:
+   return ret;
+}
+
+static int
+mes_v11_0_create_wptr_mapping(struct amdgpu_userq_mgr *uq_mgr,
+ struct amdgpu_usermode_queue *queue,
+ uint64_t wptr)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   struct amdgpu_bo_va_mapping *wptr_mapping;
+   struct amdgpu_vm *wptr_vm;
+   struct amdgpu_userq_obj *wptr_obj = >wptr_obj;
+   int ret;
+
+   wptr_vm = queue->vm;
+   ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
+   if (ret)
+   return ret;
+
+   wptr &= AMDGPU_GMC_HOLE_MASK;
+   wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> PAGE_SHIFT);
+   amdgpu_bo_unreserve(wptr_vm->root.bo);
+   if (!wptr_mapping) {
+   DRM_ERROR("Failed to lookup wptr bo\n");
+   return -EINVAL;
+   }
+
+   wptr_obj->obj = wptr_mapping->bo_va->base.bo;
+   if (wptr_obj->obj->tbo.base.size > PAGE_SIZE) {
+   DRM_ERROR("Requested GART mapping for wptr bo larger than one 
page\n");
+   return -EINVAL;
+   }
+
+   ret = mes_v11_0_map_gtt_bo_to_gart(adev, wptr_obj->obj);
+   if (ret) {
+   DRM_ERROR("Failed to map wptr bo to GART\n");
+   return ret;
+   }
+
+   queue->wptr_obj.gpu_addr = amdgpu_bo_gpu_offset_no_check(wptr_obj->obj);
+   return 0;
+}
+
  static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
   struct amdgpu_usermode_queue *queue,
   struct amdgpu_mqd_prop *userq_props)
@@ -61,6 +129,7 @@ static int mes_v11_0_userq_map(struct amdgpu_userq_mgr 
*uq_mgr,
queue_input.queue_size = userq_props->queue_size >> 2;
queue_input.doorbell_offset = userq_props->doorbell_index;
queue_input.page_table_base_addr = 
amdgpu_gmc_pd_addr(queue->vm->root.bo);
+   queue_input.wptr_mc_addr = queue->wptr_obj.gpu_addr;
  
  	amdgpu_mes_lock(>mes);

r = adev->mes.funcs->add_hw_queue(>mes, _input);
@@ -187,6 +256,13 @@ static int mes_v11_0_userq_mqd_create(struct 
amdgpu_userq_mgr *uq_mgr,
goto free_mqd;
}
  
+	/* FW expects WPTR BOs to be mapped into GART */

+   r = mes_v11_0_create_wptr_mapping(uq_mgr, queue, 
userq_props->wptr_gpu_addr);
+   if (r) {
+   DRM_ERROR("Failed to create WPTR mapping\n");
+   goto free_ctx;
+   }
+
/* Map userqueue into FW using MES */
r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
if (r) {
@@ -216,6 +292,7 @@ mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
struct amdgpu_usermode_queue *queue)
  {
mes_v11_0_userq_unmap(uq_mgr, queue);
+   amdgpu_bo_unref(>wptr_obj.obj);
amdgpu_userqueue_destroy_object(uq_mgr, 

Re: [PATCH v9 06/14] drm/amdgpu: create context space for usermode queue

2024-05-02 Thread Christian König

Am 26.04.24 um 15:48 schrieb Shashank Sharma:

The FW expects us to allocate at least one page as context
space to process gang, process, GDS and FW  related work.
This patch creates a joint object for the same, and calculates
GPU space offsets of these spaces.

V1: Addressed review comments on RFC patch:
 Alex: Make this function IP specific

V2: Addressed review comments from Christian
 - Allocate only one object for total FW space, and calculate
   offsets for each of these objects.

V3: Integration with doorbell manager

V4: Review comments:
 - Remove shadow from FW space list from cover letter (Alex)
 - Alignment of macro (Luben)

V5: Merged patches 5 and 6 into this single patch
 Addressed review comments:
 - Use lower_32_bits instead of mask (Christian)
 - gfx_v11_0 instead of gfx_v11 in function names (Alex)
 - Shadow and GDS objects are now coming from userspace (Christian,
   Alex)

V6:
 - Add a comment to replace amdgpu_bo_create_kernel() with
   amdgpu_bo_create() during fw_ctx object creation (Christian).
 - Move proc_ctx_gpu_addr, gang_ctx_gpu_addr and fw_ctx_gpu_addr out
   of generic queue structure and make it gen11 specific (Alex).

V7:
- Using helper function to create/destroy userqueue objects.
- Removed FW object space allocation.

V8:
- Updating FW object address from user values.

V9:
- updated function name from gfx_v11_* to mes_v11_*

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 


Acked-by: Christian König 


---
  .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 43 +++
  .../gpu/drm/amd/include/amdgpu_userqueue.h|  1 +
  2 files changed, 44 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 9e7dee77d344..9f9fdcb9c294 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -27,6 +27,41 @@
  #include "mes_v11_0.h"
  #include "amdgpu_userqueue.h"
  
+#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE

+#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
+
+static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
+   struct amdgpu_usermode_queue *queue,
+   struct drm_amdgpu_userq_mqd 
*mqd_user)
+{
+   struct amdgpu_userq_obj *ctx = >fw_obj;
+   struct v11_gfx_mqd *mqd = queue->mqd.cpu_ptr;
+   int r, size;
+
+   /*
+* The FW expects at least one page space allocated for
+* process ctx and gang ctx each. Create an object
+* for the same.
+*/
+   size = AMDGPU_USERQ_PROC_CTX_SZ + AMDGPU_USERQ_GANG_CTX_SZ;
+   r = amdgpu_userqueue_create_object(uq_mgr, ctx, size);
+   if (r) {
+   DRM_ERROR("Failed to allocate ctx space bo for userqueue, 
err:%d\n", r);
+   return r;
+   }
+
+   /* Shadow and GDS objects come directly from userspace */
+   mqd->shadow_base_lo = mqd_user->shadow_va & 0xFFFC;
+   mqd->shadow_base_hi = upper_32_bits(mqd_user->shadow_va);
+
+   mqd->gds_bkup_base_lo = mqd_user->gds_va & 0xFFFC;
+   mqd->gds_bkup_base_hi = upper_32_bits(mqd_user->gds_va);
+
+   mqd->fw_work_area_base_lo = mqd_user->csa_va & 0xFFFC;
+   mqd->fw_work_area_base_hi = upper_32_bits(mqd_user->csa_va);
+   return 0;
+}
+
  static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
  struct drm_amdgpu_userq_in *args_in,
  struct amdgpu_usermode_queue *queue)
@@ -82,6 +117,13 @@ static int mes_v11_0_userq_mqd_create(struct 
amdgpu_userq_mgr *uq_mgr,
goto free_mqd;
}
  
+	/* Create BO for FW operations */

+   r = mes_v11_0_userq_create_ctx_space(uq_mgr, queue, mqd_user);
+   if (r) {
+   DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
+   goto free_mqd;
+   }
+
return 0;
  
  free_mqd:

@@ -100,6 +142,7 @@ static void
  mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
struct amdgpu_usermode_queue *queue)
  {
+   amdgpu_userqueue_destroy_object(uq_mgr, >fw_obj);
kfree(queue->userq_prop);
amdgpu_userqueue_destroy_object(uq_mgr, >mqd);
  }
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index bbd29f68b8d4..643f31474bd8 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -44,6 +44,7 @@ struct amdgpu_usermode_queue {
struct amdgpu_userq_mgr *userq_mgr;
struct amdgpu_vm*vm;
struct amdgpu_userq_obj mqd;
+   struct amdgpu_userq_obj fw_obj;
  };
  
  struct amdgpu_userq_funcs {




Re: [PATCH v9 05/14] drm/amdgpu: create MES-V11 usermode queue for GFX

2024-05-02 Thread Christian König




Am 26.04.24 um 15:48 schrieb Shashank Sharma:

A Memory queue descriptor (MQD) of a userqueue defines it in
the hw's context. As MQD format can vary between different
graphics IPs, we need gfx GEN specific handlers to create MQDs.

This patch:
- Adds a new file which will be used for MES based userqueue
   functions targeting GFX and SDMA IP.
- Introduces MQD handler functions for the usermode queues.
- Adds new functions to create and destroy userqueue MQD for
   MES-V11 for GFX IP.

V1: Worked on review comments from Alex:
 - Make MQD functions GEN and IP specific

V2: Worked on review comments from Alex:
 - Reuse the existing adev->mqd[ip] for MQD creation
 - Formatting and arrangement of code

V3:
 - Integration with doorbell manager

V4: Review comments addressed:
 - Do not create a new file for userq, reuse gfx_v11_0.c (Alex)
 - Align name of structure members (Luben)
 - Don't break up the Cc tag list and the Sob tag list in commit
   message (Luben)
V5:
- No need to reserve the bo for MQD (Christian).
- Some more changes to support IP specific MQD creation.

V6:
- Add a comment reminding us to replace the amdgpu_bo_create_kernel()
  calls while creating MQD object to amdgpu_bo_create() once eviction
  fences are ready (Christian).

V7:
- Re-arrange userqueue functions in adev instead of uq_mgr (Alex)
- Use memdup_user instead of copy_from_user (Christian)

V9:
- Moved userqueue code from gfx_v11_0.c to new file mes_v11_0.c so
  that it can be reused for SDMA userqueues as well (Shashank, Alex)

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/Makefile   |   3 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c|   4 +
  .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 110 ++
  3 files changed, 116 insertions(+), 1 deletion(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 05a2d1714070..a640bfa468ad 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -184,7 +184,8 @@ amdgpu-y += \
  amdgpu-y += \
amdgpu_mes.o \
mes_v10_1.o \
-   mes_v11_0.o
+   mes_v11_0.o \
+   mes_v11_0_userqueue.o


Do we really need a new C file for this or could we put the two 
functions into mes_v11_0.c as well?


Apart from that it looks correct to me, but I'm really not that deep 
inside the code at the moment.


Regards,
Christian.

  
  # add UVD block

  amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index f7325b02a191..525bd0f4d3f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1331,6 +1331,8 @@ static int gfx_v11_0_rlc_backdoor_autoload_enable(struct 
amdgpu_device *adev)
return 0;
  }
  
+extern const struct amdgpu_userq_funcs userq_mes_v11_0_funcs;

+
  static int gfx_v11_0_sw_init(void *handle)
  {
int i, j, k, r, ring_id = 0;
@@ -1347,6 +1349,7 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_mec = 2;
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
+   adev->userq_funcs[AMDGPU_HW_IP_GFX] = _mes_v11_0_funcs;
break;
case IP_VERSION(11, 0, 1):
case IP_VERSION(11, 0, 4):
@@ -1358,6 +1361,7 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_mec = 1;
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
+   adev->userq_funcs[AMDGPU_HW_IP_GFX] = _mes_v11_0_funcs;
break;
default:
adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
new file mode 100644
index ..9e7dee77d344
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -0,0 +1,110 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2024 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 

Re: [PATCH] drm/amdkfd: Remove arbitrary timeout for hmm_range_fault

2024-05-02 Thread Philip Yang

  


On 2024-05-02 08:42, James Zhu wrote:


  
  On 2024-05-01 18:56, Philip Yang wrote:
  
On systems with khugepaged enabled and user cases with THP buffers,
hmm_range_fault may take > 15 seconds to return -EBUSY; the arbitrary
timeout value is not accurate and causes memory allocation failures.

Remove the arbitrary timeout value and return EAGAIN to the application
if hmm_range_fault returns EBUSY; userspace libdrm and Thunk will then
call the ioctl again.

Change the EAGAIN print to a debug message, as this is not an error.


Signed-off-by: Philip Yang 

---

  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  5 -

  drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c  | 12
+++-

  drivers/gpu/drm/amd/amdkfd/kfd_svm.c |  5 +

  3 files changed, 8 insertions(+), 14 deletions(-)


diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 54198c3928c7..02696c2102f1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1087,7 +1087,10 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr,

    ret = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages, );
    if (ret) {
-    pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
+    if (ret == -EAGAIN)
+    pr_debug("Failed to get user pages, try again\n");
+    else
+    pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
    goto unregister_out;
    }

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
index 431ec72655ec..e36fede7f74c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -202,20 +202,12 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
    pr_debug("hmm range: start = 0x%lx, end = 0x%lx",
    hmm_range->start, hmm_range->end);

-    /* Assuming 64MB takes maximum 1 second to fault page address */
-    timeout = max((hmm_range->end - hmm_range->start) >> 26, 1UL);
-    timeout *= HMM_RANGE_DEFAULT_TIMEOUT;
-    timeout = jiffies + msecs_to_jiffies(timeout);
+    timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);

[JZ] should we reduce MAX_WALK_BYTE to 64M in the meantime?

From the debug log, the range size is not related; a 64MB range may take
just as long to return EBUSY.

 retry:
    hmm_range->notifier_seq = mmu_interval_read_begin(notifier);
    r = hmm_range_fault(hmm_range);
    if (unlikely(r)) {
-    schedule();

[JZ] the above is for CPU stall WA, we may still need keep it.

The 1 second timeout should be long enough for the normal case. If
hmm_range_fault returns EBUSY, we release the mmap_read lock and return
to user space, so we don't need the explicit schedule() to avoid the CPU
stall warning. I will run the KFDTest LargestSysBufferTest overnight on a
larger-memory system to confirm there is no CPU stall message.

Regards,
Philip

-    /*
- * FIXME: This timeout should encompass the retry from
- * mmu_interval_read_retry() as well.
- */
    if (r == -EBUSY && !time_after(jiffies, timeout))
    goto retry;
    goto out_free_pfns;
@@ -247,6 +239,8 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
 out_free_range:
    kfree(hmm_range);

+    if (r == -EBUSY)
+    r = -EAGAIN;
    return r;
    }

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 94f83be2232d..e7040f809f33 100644
--- 

Re: [RFC 0/5] Add capacity key to fdinfo

2024-05-02 Thread Alex Deucher
On Thu, May 2, 2024 at 10:43 AM Tvrtko Ursulin
 wrote:
>
>
> On 02/05/2024 14:07, Christian König wrote:
> > Am 01.05.24 um 15:27 schrieb Tvrtko Ursulin:
> >>
> >> Hi Alex,
> >>
> >> On 30/04/2024 19:32, Alex Deucher wrote:
> >>> On Tue, Apr 30, 2024 at 1:27 PM Tvrtko Ursulin 
> >>> wrote:
> 
>  From: Tvrtko Ursulin 
> 
>  I have noticed AMD GPUs can have more than one "engine" (ring?) of
>  the same type
>  but amdgpu is not reporting that in fdinfo using the capacity engine
>  tag.
> 
>  This series is therefore an attempt to improve that, but only an RFC
>  since it is
>  quite likely I got stuff wrong on the first attempt. Or if not wrong
>  it may not
>  be very beneficial in AMDs case.
> 
>  So I tried to figure out how to count and store the number of
>  instances of an
>  "engine" type and spotted that could perhaps be used in more than
>  one place in
>  the driver. I was more than a little bit confused by the ip_instance
>  and uapi
>  rings, then how rings are selected to context entities internally.
>  Anyway..
>  hopefully it is a simple enough series to easily spot any such large
>  misses.
> 
>  End result should be that, assuming two "engine" instances, one
>  fully loaded and
>  one idle will only report client using 50% of that engine type.
> >>>
> >>> That would only be true if there are multiple instantiations of the IP
> >>> on the chip which in most cases is not true.  In most cases there is
> >>> one instance of the IP that can be fed from multiple rings. E.g. for
> >>> graphics and compute, all of the rings ultimately feed into the same
> >>> compute units on the chip.  So if you have a gfx ring and a compute
> >>> rings, you can schedule work to them asynchronously, but ultimately
> >>> whether they execute serially or in parallel depends on the actual
> >>> shader code in the command buffers and the extent to which it can
> >>> utilize the available compute units in the shader cores.
> >>
> >> This is the same as with Intel/i915. Fdinfo is not intended to provide
> >> utilisation of EUs and such, just how busy are the "entities" kernel
> >> submits to. So doing something like in this series would make the
> >> reporting more similar between the two drivers.
> >>
> >> I think both the 0-800% or 0-100% range (taking 8 ring compute as an
> >> example) can be misleading for different workloads. Neither <800% in
> >> the former means one can send more work and same for <100% in the latter.
> >
> > Yeah, I think that's what Alex tries to describe. By using 8 compute
> > rings your 800% load is actually incorrect and quite misleading.
> >
> > Background is that those 8 compute rings won't be active all at the same
> > time, but rather waiting on each other for resources.
> >
> > But this "waiting" is unfortunately considered execution time since the
> > used approach is actually not really capable of separating waiting and
> > execution time.
>
> Right, so 800% is what gputop could be suggesting today, by virtue of 8
> contexts/clients each being able to use 100% if they only use a subset of
> compute units. I was proposing to expose the capacity in fdinfo so it can be
> scaled down, and then discussing how both situations have pros and cons.
>
> >> There is also a parallel with the CPU world here and hyper threading,
> >> if not wider, where "What does 100% actually mean?" is also wishy-washy.
> >>
> >> Also note that the reporting of actual time based values in fdinfo
> >> would not changing with this series.
> >>
> >> Of if you can guide me towards how to distinguish real vs fake
> >> parallelism in HW IP blocks I could modify the series to only add
> >> capacity tags where there are truly independent blocks. That would be
> >> different from i915 though were I did not bother with that
> >> distinction. (For reasons that assignment of for instance EUs to
> >> compute "rings" (command streamers in i915) was supposed to be
> >> possible to re-configure on the fly. So it did not make sense to try
> >> and be super smart in fdinfo.)
> >
> > Well exactly that's the point we don't really have truly independent
> > blocks on AMD hardware.
> >
> > There are things like independent SDMA instances, but those a meant to
> > be used like the first instance for uploads and the second for downloads
> > etc.. When you use both instances for the same job they will pretty much
> > limit each other because of a single resource.
>
> So _never_ multiple instances of the same IP block? No video decode,
> encode, anything?

Some chips have multiple encode/decode IP blocks that are actually
separate instances, however, we load balance between them so userspace
sees just one engine.  Also in some cases they are asymmetric (e.g.,
different sets of supported CODECs on each instance).  The driver
handles this by inspecting the command buffer and scheduling on the
appropriate instance based on the requested CODEC. 
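
For context on the fdinfo keys being debated here: the capacity tag from
Documentation/gpu/drm-usage-stats.rst lets a tool such as gputop divide a
client's accumulated busy time by the number of identical hardware instances
backing an engine name. A hedged sketch of how a driver's fdinfo callback might
emit such a pair follows; the engine name, the value 2 and the helper itself
are illustrative, not current amdgpu behaviour:

    #include <drm/drm_print.h>

    /* Illustrative only: report one engine's busy time plus a capacity of 2,
     * i.e. two identical hardware instances stand behind this engine name. */
    static void example_show_fdinfo(struct drm_printer *p, u64 busy_ns)
    {
    	drm_printf(p, "drm-engine-enc:\t%llu ns\n", busy_ns);
    	drm_printf(p, "drm-capacity-enc:\t2\n");
    }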

Re: [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription

2024-05-02 Thread Maarten Lankhorst

Hey,

On 2024-04-24 at 18:56, Friedrich Vock wrote:

Hi everyone,

recently I've been looking into remedies for apps (in particular, newer
games) that experience significant performance loss when they start to
hit VRAM limits, especially on older or lower-end cards that struggle
to fit both desktop apps and all the game data into VRAM at once.

The root of the problem lies in the fact that from userspace's POV,
buffer eviction is very opaque: Userspace applications/drivers cannot
tell how oversubscribed VRAM is, nor do they have fine-grained control
over which buffers get evicted.  At the same time, with GPU APIs becoming
increasingly lower-level and GPU-driven, only the application itself
can know which buffers are used within a particular submission, and
how important each buffer is. For this, GPU APIs include interfaces
to query oversubscription and specify memory priorities: In Vulkan,
oversubscription can be queried through the VK_EXT_memory_budget
extension. Different buffers can also be assigned priorities via the
VK_EXT_pageable_device_local_memory extension. Modern games, especially
D3D12 games via vkd3d-proton, rely on oversubscription being reported and
priorities being respected in order to perform their memory management.

However, relaying this information to the kernel via the current KMD uAPIs
is not possible. On AMDGPU for example, all work submissions include a
"bo list" that contains any buffer object that is accessed during the
course of the submission. If VRAM is oversubscribed and a buffer in the
list was evicted to system memory, that buffer is moved back to VRAM
(potentially evicting other unused buffers).

Since the usermode driver doesn't know what buffers are used by the
application, its only choice is to submit a bo list that contains every
buffer the application has allocated. In case of VRAM oversubscription,
it is highly likely that some of the application's buffers were evicted,
which almost guarantees that some buffers will get moved around. Since
the bo list is only known at submit time, this also means the buffers
will get moved right before submitting application work, which is the
worst possible time to move buffers from a latency perspective. Another
consequence of the large bo list is that nearly all memory from other
applications will be evicted, too. When different applications (e.g. game
and compositor) submit work one after the other, this causes a ping-pong
effect where each app's submission evicts the other app's memory,
resulting in a large amount of unnecessary moves.

This overly aggressive eviction behavior led to RADV adopting a change
that effectively allows all VRAM applications to reside in system memory
[1].  This worked around the ping-ponging/excessive buffer moving problem,
but also meant that any memory evicted to system memory would forever
stay there, regardless of how VRAM is used.

My proposal aims at providing a middle ground between these extremes.
The goals I want to meet are:
- Userspace is accurately informed about VRAM oversubscription/how much
   VRAM has been evicted
- Buffer eviction respects priorities set by userspace
- Wasteful ping-ponging is avoided to the extent possible

I have been testing out some prototypes, and came up with this rough
sketch of an API:

- For each ttm_resource_manager, the amount of evicted memory is tracked
   (similarly to how "usage" tracks the memory usage). When memory is
   evicted via ttm_bo_evict, the size of the evicted memory is added, when
   memory is un-evicted (see below), its size is subtracted. The amount of
   evicted memory for e.g. VRAM can be queried by userspace via an ioctl.

- Each ttm_resource_manager maintains a list of evicted buffer objects.

- ttm_mem_unevict walks the list of evicted bos for a given
   ttm_resource_manager and tries moving evicted resources back. When a
   buffer is freed, this function is called to immediately restore some
   evicted memory.

- Each ttm_buffer_object independently tracks the mem_type it wants
   to reside in.

- ttm_bo_try_unevict is added as a helper function which attempts to
   move the buffer to its preferred mem_type. If no space is available
   there, it fails with -ENOSPC/-ENOMEM.

- Similar to how ttm_bo_evict works, each driver can implement
   uneviction_valuable/unevict_flags callbacks to control buffer
   un-eviction.

This is what patches 1-10 accomplish (together with an amdgpu
implementation utilizing the new API).
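
Since the description above is prose-only, here is a rough, non-authoritative
sketch of how the proposed un-eviction walk could look. The "evicted" list, the
"evicted_link" member and ttm_bo_try_unevict() are names taken from this RFC
text, not from current TTM, so treat the snippet as illustrative pseudocode of
the proposal rather than the actual patches 1-10:

    #include <drm/ttm/ttm_bo.h>
    #include <drm/ttm/ttm_resource.h>

    /* Illustrative sketch of the proposed ttm_mem_unevict() walk; the list
     * members used here are hypothetical and only mirror the RFC text. */
    static void example_ttm_mem_unevict(struct ttm_resource_manager *man)
    {
    	struct ttm_buffer_object *bo, *tmp;

    	/* Try to move evicted BOs back to their preferred mem_type; once
    	 * the target domain is full, leave the remaining ones evicted. */
    	list_for_each_entry_safe(bo, tmp, &man->evicted, evicted_link) {
    		int ret = ttm_bo_try_unevict(bo);

    		if (ret == -ENOSPC || ret == -ENOMEM)
    			break;
    	}
    }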

Userspace priorities could then be implemented as follows:

- TTM already manages priorities for each buffer object. These priorities
   can be updated by userspace via a GEM_OP ioctl to inform the kernel
   which buffers should be evicted before others. If an ioctl increases
   the priority of a buffer, ttm_bo_try_unevict is called on that buffer to
   try and move it back (potentially evicting buffers with a lower
   priority)

- Buffers should never be evicted by other buffers with equal/lower
   priority, but 

Re: [PATCH v9 13/14] drm/amdgpu: enable compute/gfx usermode queue

2024-05-02 Thread Sharma, Shashank



On 02/05/2024 16:10, Alex Deucher wrote:

On Thu, May 2, 2024 at 1:51 AM Sharma, Shashank  wrote:


On 01/05/2024 22:44, Alex Deucher wrote:

On Fri, Apr 26, 2024 at 10:27 AM Shashank Sharma
 wrote:

From: Arvind Yadav 

This patch does the necessary changes required to
enable compute workload support using the existing
usermode queues infrastructure.

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Arvind Yadav 
Signed-off-by: Shashank Sharma 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c|  3 ++-
   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c   |  2 ++
   drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c | 10 +-
   include/uapi/drm/amdgpu_drm.h|  1 +
   4 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index e516487e8db9..78d34fa7a0b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -189,7 +189,8 @@ amdgpu_userqueue_create(struct drm_file *filp, union 
drm_amdgpu_userq *args)
  int qid, r = 0;

  /* Usermode queues are only supported for GFX/SDMA engines as of now 
*/
-   if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != 
AMDGPU_HW_IP_DMA) {
+   if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != 
AMDGPU_HW_IP_DMA
+   && args->in.ip_type != AMDGPU_HW_IP_COMPUTE) {
  DRM_ERROR("Usermode queue doesn't support IP type %u\n", 
args->in.ip_type);
  return -EINVAL;
  }
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 525bd0f4d3f7..27b86f7fe949 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1350,6 +1350,7 @@ static int gfx_v11_0_sw_init(void *handle)
  adev->gfx.mec.num_pipe_per_mec = 4;
  adev->gfx.mec.num_queue_per_pipe = 4;
  adev->userq_funcs[AMDGPU_HW_IP_GFX] = _mes_v11_0_funcs;
+   adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = 
_mes_v11_0_funcs;
  break;
  case IP_VERSION(11, 0, 1):
  case IP_VERSION(11, 0, 4):
@@ -1362,6 +1363,7 @@ static int gfx_v11_0_sw_init(void *handle)
  adev->gfx.mec.num_pipe_per_mec = 4;
  adev->gfx.mec.num_queue_per_pipe = 4;
  adev->userq_funcs[AMDGPU_HW_IP_GFX] = _mes_v11_0_funcs;
+   adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = 
_mes_v11_0_funcs;
  break;
  default:
  adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index a5e270eda37b..d61d80f86003 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -183,7 +183,8 @@ static int mes_v11_0_userq_create_ctx_space(struct 
amdgpu_userq_mgr *uq_mgr,
  }

  /* We don't need to set other FW objects for SDMA queues */
-   if (queue->queue_type == AMDGPU_HW_IP_DMA)
+   if ((queue->queue_type == AMDGPU_HW_IP_DMA) ||
+   (queue->queue_type == AMDGPU_HW_IP_COMPUTE))
  return 0;

  /* Shadow and GDS objects come directly from userspace */
@@ -246,6 +247,13 @@ static int mes_v11_0_userq_mqd_create(struct 
amdgpu_userq_mgr *uq_mgr,
  userq_props->use_doorbell = true;
  userq_props->doorbell_index = queue->doorbell_index;

+   if (queue->queue_type == AMDGPU_HW_IP_COMPUTE) {
+   userq_props->eop_gpu_addr = mqd_user->eop_va;
+   userq_props->hqd_pipe_priority = AMDGPU_GFX_PIPE_PRIO_NORMAL;
+   userq_props->hqd_queue_priority = 
AMDGPU_GFX_QUEUE_PRIORITY_MINIMUM;
+   userq_props->hqd_active = false;
+   }
+
  queue->userq_prop = userq_props;

  r = mqd_hw_default->init_mqd(adev, (void *)queue->mqd.cpu_ptr, 
userq_props);
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 22f56a30f7cb..676792ad3618 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -375,6 +375,7 @@ struct drm_amdgpu_userq_mqd {
   * sized.
   */
  __u64   csa_va;
+   __u64   eop_va;
   };

Let's add a new mqd descriptor for compute since it's different from
gfx and sdma.

the only different thing is this object (vs csa and gds objects), apart
from that, the mqd is the same as they all are MES based. Am I missing
something here ?

The scheduling entity is irrelevant.  The mqd is defined by the engine
itself.  E.g., v11_structs.h.  Gfx has one set of requirements,
compute has different ones, and SDMA has different ones.  VPE and VCN
also have mqds.  When we add support for them in the future, they may
have additional requirements.  I want to make it clear in the
interface what 

Re: [PATCH v9 13/14] drm/amdgpu: enable compute/gfx usermode queue

2024-05-02 Thread Alex Deucher
On Thu, May 2, 2024 at 1:51 AM Sharma, Shashank  wrote:
>
>
> On 01/05/2024 22:44, Alex Deucher wrote:
> > On Fri, Apr 26, 2024 at 10:27 AM Shashank Sharma
> >  wrote:
> >> From: Arvind Yadav 
> >>
> >> This patch does the necessary changes required to
> >> enable compute workload support using the existing
> >> usermode queues infrastructure.
> >>
> >> Cc: Alex Deucher 
> >> Cc: Christian Koenig 
> >> Signed-off-by: Arvind Yadav 
> >> Signed-off-by: Shashank Sharma 
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c|  3 ++-
> >>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c   |  2 ++
> >>   drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c | 10 +-
> >>   include/uapi/drm/amdgpu_drm.h|  1 +
> >>   4 files changed, 14 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >> index e516487e8db9..78d34fa7a0b9 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >> @@ -189,7 +189,8 @@ amdgpu_userqueue_create(struct drm_file *filp, union 
> >> drm_amdgpu_userq *args)
> >>  int qid, r = 0;
> >>
> >>  /* Usermode queues are only supported for GFX/SDMA engines as of 
> >> now */
> >> -   if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != 
> >> AMDGPU_HW_IP_DMA) {
> >> +   if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != 
> >> AMDGPU_HW_IP_DMA
> >> +   && args->in.ip_type != AMDGPU_HW_IP_COMPUTE) {
> >>  DRM_ERROR("Usermode queue doesn't support IP type %u\n", 
> >> args->in.ip_type);
> >>  return -EINVAL;
> >>  }
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
> >> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> >> index 525bd0f4d3f7..27b86f7fe949 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> >> @@ -1350,6 +1350,7 @@ static int gfx_v11_0_sw_init(void *handle)
> >>  adev->gfx.mec.num_pipe_per_mec = 4;
> >>  adev->gfx.mec.num_queue_per_pipe = 4;
> >>  adev->userq_funcs[AMDGPU_HW_IP_GFX] = 
> >> _mes_v11_0_funcs;
> >> +   adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = 
> >> _mes_v11_0_funcs;
> >>  break;
> >>  case IP_VERSION(11, 0, 1):
> >>  case IP_VERSION(11, 0, 4):
> >> @@ -1362,6 +1363,7 @@ static int gfx_v11_0_sw_init(void *handle)
> >>  adev->gfx.mec.num_pipe_per_mec = 4;
> >>  adev->gfx.mec.num_queue_per_pipe = 4;
> >>  adev->userq_funcs[AMDGPU_HW_IP_GFX] = 
> >> _mes_v11_0_funcs;
> >> +   adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = 
> >> _mes_v11_0_funcs;
> >>  break;
> >>  default:
> >>  adev->gfx.me.num_me = 1;
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
> >> b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> >> index a5e270eda37b..d61d80f86003 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> >> @@ -183,7 +183,8 @@ static int mes_v11_0_userq_create_ctx_space(struct 
> >> amdgpu_userq_mgr *uq_mgr,
> >>  }
> >>
> >>  /* We don't need to set other FW objects for SDMA queues */
> >> -   if (queue->queue_type == AMDGPU_HW_IP_DMA)
> >> +   if ((queue->queue_type == AMDGPU_HW_IP_DMA) ||
> >> +   (queue->queue_type == AMDGPU_HW_IP_COMPUTE))
> >>  return 0;
> >>
> >>  /* Shadow and GDS objects come directly from userspace */
> >> @@ -246,6 +247,13 @@ static int mes_v11_0_userq_mqd_create(struct 
> >> amdgpu_userq_mgr *uq_mgr,
> >>  userq_props->use_doorbell = true;
> >>  userq_props->doorbell_index = queue->doorbell_index;
> >>
> >> +   if (queue->queue_type == AMDGPU_HW_IP_COMPUTE) {
> >> +   userq_props->eop_gpu_addr = mqd_user->eop_va;
> >> +   userq_props->hqd_pipe_priority = 
> >> AMDGPU_GFX_PIPE_PRIO_NORMAL;
> >> +   userq_props->hqd_queue_priority = 
> >> AMDGPU_GFX_QUEUE_PRIORITY_MINIMUM;
> >> +   userq_props->hqd_active = false;
> >> +   }
> >> +
> >>  queue->userq_prop = userq_props;
> >>
> >>  r = mqd_hw_default->init_mqd(adev, (void *)queue->mqd.cpu_ptr, 
> >> userq_props);
> >> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> >> index 22f56a30f7cb..676792ad3618 100644
> >> --- a/include/uapi/drm/amdgpu_drm.h
> >> +++ b/include/uapi/drm/amdgpu_drm.h
> >> @@ -375,6 +375,7 @@ struct drm_amdgpu_userq_mqd {
> >>   * sized.
> >>   */
> >>  __u64   csa_va;
> >> +   __u64   eop_va;
> >>   };
> > Let's add a new mqd descriptor for compute since it's different from
> > gfx and sdma.
> the only different thing is this object (vs csa and gds 

Re: [PATCH] drm/amdkfd: Remove arbitrary timeout for hmm_range_fault

2024-05-02 Thread Philip Yang

  


On 2024-05-02 00:09, Chen, Xiaogang
  wrote:


  
  On 5/1/2024 5:56 PM, Philip Yang wrote:
  



On systems with khugepaged enabled and user cases with THP buffers,
hmm_range_fault may take > 15 seconds to return -EBUSY; the arbitrary
timeout value is not accurate and causes memory allocation failures.

Remove the arbitrary timeout value and return EAGAIN to the application
if hmm_range_fault returns EBUSY; userspace libdrm and Thunk will then
call the ioctl again.

  
  
  I wonder why letting user space do the retry is better? This issue seems
  to be caused by hugepage merging, so how can user space avoid it?
  

The issue is caused by khugepaged + 4 processes + the SDMA stall test
(to slow down SDMA) + small BAR + QPX mode. During an overnight test,
hmm_range_fault on a 180MB buffer may take >15 seconds before returning
EBUSY, and the alloc memory ioctl then fails. With EAGAIN returned
instead, Thunk calls the alloc memory ioctl again and we no longer see
the allocation failure.

  
  And applications may not use Thunk or libdrm, instead, use ioctl
  directly.
  

If the app calls the ioctl directly, it should do the same thing: call
  the ioctl again if errno is EINTR or EAGAIN.
Regards,
Philip
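
As a purely illustrative aside on the retry contract described above, a
userspace caller that goes through the ioctl directly could wrap it like this;
the request value and argument pointer are placeholders, not the actual
KFD/libdrm interface:

    #include <errno.h>
    #include <sys/ioctl.h>

    /* Illustrative only: re-issue the ioctl while the kernel asks for a retry. */
    static int ioctl_with_retry(int fd, unsigned long request, void *args)
    {
    	int ret;

    	do {
    		ret = ioctl(fd, request, args);
    	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    	return ret;
    }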


  
  Regards
  
  
  Xiaogang
  
  
  Change the EAGAIN print to a debug message, as this is not an error.


Signed-off-by: Philip Yang 

---

  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  5 -

  drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c  | 12
+++-

  drivers/gpu/drm/amd/amdkfd/kfd_svm.c |  5 +

  3 files changed, 8 insertions(+), 14 deletions(-)


diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 54198c3928c7..02696c2102f1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1087,7 +1087,10 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr,

 ret = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages, );
 if (ret) {
-   pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
+   if (ret == -EAGAIN)
+   pr_debug("Failed to get user pages, try again\n");
+   else
+   pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
 goto unregister_out;
 }

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
index 431ec72655ec..e36fede7f74c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -202,20 +202,12 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
 pr_debug("hmm range: start = 0x%lx, end = 0x%lx",
 hmm_range->start, hmm_range->end);

-   /* Assuming 64MB takes maximum 1 second to fault page address */
-   timeout = max((hmm_range->end - hmm_range->start) >> 26, 1UL);
-   timeout *= HMM_RANGE_DEFAULT_TIMEOUT;
-   timeout = jiffies + msecs_to_jiffies(timeout);
+   timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);

 retry:
 hmm_range->notifier_seq = mmu_interval_read_begin(notifier);
 r = hmm_range_fault(hmm_range);
 if (unlikely(r)) {
-   schedule();
-   /*
-    * FIXME: This timeout should encompass the retry from
-    * mmu_interval_read_retry() as well.
-    */
 if (r == -EBUSY && !time_after(jiffies, timeout))

Re: [PATCH v9 12/14] drm/amdgpu: enable SDMA usermode queues

2024-05-02 Thread Sharma, Shashank



On 02/05/2024 15:55, Alex Deucher wrote:

On Thu, May 2, 2024 at 1:47 AM Sharma, Shashank  wrote:


On 01/05/2024 22:41, Alex Deucher wrote:

On Fri, Apr 26, 2024 at 10:27 AM Shashank Sharma
 wrote:

This patch does necessary modifications to enable the SDMA
usermode queues using the existing userqueue infrastructure.

V9: introduced this patch in the series

Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
Signed-off-by: Srinivasan Shanmugam 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c| 2 +-
   drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c | 4 
   drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c   | 3 +++
   3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 781283753804..e516487e8db9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -189,7 +189,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union 
drm_amdgpu_userq *args)
  int qid, r = 0;

  /* Usermode queues are only supported for GFX/SDMA engines as of now 
*/
-   if (args->in.ip_type != AMDGPU_HW_IP_GFX) {
+   if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != 
AMDGPU_HW_IP_DMA) {
  DRM_ERROR("Usermode queue doesn't support IP type %u\n", 
args->in.ip_type);
  return -EINVAL;
  }
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index a6c3037d2d1f..a5e270eda37b 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -182,6 +182,10 @@ static int mes_v11_0_userq_create_ctx_space(struct 
amdgpu_userq_mgr *uq_mgr,
  return r;
  }

+   /* We don't need to set other FW objects for SDMA queues */
+   if (queue->queue_type == AMDGPU_HW_IP_DMA)
+   return 0;
+
  /* Shadow and GDS objects come directly from userspace */
  mqd->shadow_base_lo = mqd_user->shadow_va & 0xFFFC;
  mqd->shadow_base_hi = upper_32_bits(mqd_user->shadow_va);
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index 361835a61f2e..90354a70c807 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -1225,6 +1225,8 @@ static int sdma_v6_0_early_init(void *handle)
  return 0;
   }

+extern const struct amdgpu_userq_funcs userq_mes_v11_0_funcs;

Can you include the header rather than adding an extern?

Noted,

+
   static int sdma_v6_0_sw_init(void *handle)
   {
  struct amdgpu_ring *ring;
@@ -1265,6 +1267,7 @@ static int sdma_v6_0_sw_init(void *handle)
  return -EINVAL;
  }

+   adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
  return r;
   }

I think we need a new mqd descriptor in amdgpu_drm.h as well since the
sdma metadata is different from gfx and compute.

Can you please elaborate on this ? AFAIK SDMA queue doesn't need any
specific metadata objects (like GFX).

Right.  I want to make it clear in the IOCTL interface what buffers
are required for which ring types.  E.g., UMD might allocate a shadow
buffer for SDMA, but they don't need it so there is no need to
allocate it.  If we have separate mqd structures for every ring type,
it makes it clear which additional buffers are needed for which ring
types.


Agree, it makes sense.

- Shashank


Alex


- Shashank


Alex


--
2.43.2
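
(As a sketch of Alex's suggestion above to include a header rather than adding
an extern: the funcs table could be declared once in a shared header and pulled
into sdma_v6_0.c. The header name and placement are assumptions for
illustration, not taken from the series.)

/* mes_v11_0_userqueue.h (name assumed) */
#ifndef __MES_V11_0_USERQUEUE_H__
#define __MES_V11_0_USERQUEUE_H__

#include "amdgpu_userqueue.h"

extern const struct amdgpu_userq_funcs userq_mes_v11_0_funcs;

#endif

/* sdma_v6_0.c would then simply do: */
#include "mes_v11_0_userqueue.h"
	...
	adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;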



Re: [PATCH v9 12/14] drm/amdgpu: enable SDMA usermode queues

2024-05-02 Thread Alex Deucher
On Thu, May 2, 2024 at 1:47 AM Sharma, Shashank  wrote:
>
>
> On 01/05/2024 22:41, Alex Deucher wrote:
> > On Fri, Apr 26, 2024 at 10:27 AM Shashank Sharma
> >  wrote:
> >> This patch makes the necessary modifications to enable the SDMA
> >> usermode queues using the existing userqueue infrastructure.
> >>
> >> V9: introduced this patch in the series
> >>
> >> Cc: Christian König 
> >> Cc: Alex Deucher 
> >> Signed-off-by: Shashank Sharma 
> >> Signed-off-by: Arvind Yadav 
> >> Signed-off-by: Srinivasan Shanmugam 
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c| 2 +-
> >>   drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c | 4 
> >>   drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c   | 3 +++
> >>   3 files changed, 8 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >> index 781283753804..e516487e8db9 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >> @@ -189,7 +189,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union 
> >> drm_amdgpu_userq *args)
> >>  int qid, r = 0;
> >>
> >>  /* Usermode queues are only supported for GFX/SDMA engines as of 
> >> now */
> >> -   if (args->in.ip_type != AMDGPU_HW_IP_GFX) {
> >> +   if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != 
> >> AMDGPU_HW_IP_DMA) {
> >>  DRM_ERROR("Usermode queue doesn't support IP type %u\n", 
> >> args->in.ip_type);
> >>  return -EINVAL;
> >>  }
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
> >> b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> >> index a6c3037d2d1f..a5e270eda37b 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> >> @@ -182,6 +182,10 @@ static int mes_v11_0_userq_create_ctx_space(struct 
> >> amdgpu_userq_mgr *uq_mgr,
> >>  return r;
> >>  }
> >>
> >> +   /* We don't need to set other FW objects for SDMA queues */
> >> +   if (queue->queue_type == AMDGPU_HW_IP_DMA)
> >> +   return 0;
> >> +
> >>  /* Shadow and GDS objects come directly from userspace */
> >>  mqd->shadow_base_lo = mqd_user->shadow_va & 0xFFFC;
> >>  mqd->shadow_base_hi = upper_32_bits(mqd_user->shadow_va);
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c 
> >> b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
> >> index 361835a61f2e..90354a70c807 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
> >> @@ -1225,6 +1225,8 @@ static int sdma_v6_0_early_init(void *handle)
> >>  return 0;
> >>   }
> >>
> >> +extern const struct amdgpu_userq_funcs userq_mes_v11_0_funcs;
> > Can you include the header rather than adding an extern?
> Noted,
> >
> >> +
> >>   static int sdma_v6_0_sw_init(void *handle)
> >>   {
> >>  struct amdgpu_ring *ring;
> >> @@ -1265,6 +1267,7 @@ static int sdma_v6_0_sw_init(void *handle)
> >>  return -EINVAL;
> >>  }
> >>
> >> +   adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
> >>  return r;
> >>   }
> > I think we need a new mqd descriptor in amdgpu_drm.h as well since the
> > sdma metadata is different from gfx and compute.
>
> Can you please elaborate on this ? AFAIK SDMA queue doesn't need any
> specific metadata objects (like GFX).

Right.  I want to make it clear in the IOCTL interface what buffers
are required for which ring types.  E.g., UMD might allocate a shadow
buffer for SDMA, but they don't need it so there is no need to
allocate it.  If we have separate mqd structures for every ring type,
it makes it clear which additional buffers are needed for which ring
types.

Alex

>
> - Shashank
>
> > Alex
> >
> >> --
> >> 2.43.2
> >>


Re: [PATCH v9 01/14] drm/amdgpu: UAPI for user queue management

2024-05-02 Thread Alex Deucher
On Thu, May 2, 2024 at 8:53 AM Sharma, Shashank  wrote:
>
>
> On 02/05/2024 07:23, Sharma, Shashank wrote:
> > Hey Alex,
> >
> > On 01/05/2024 22:39, Alex Deucher wrote:
> >> On Fri, Apr 26, 2024 at 10:07 AM Shashank Sharma
> >>  wrote:
> >>> From: Alex Deucher 
> >>>
> >>> This patch introduces new UAPI/IOCTL for usermode graphics
> >>> queue. The userspace app will fill this structure and request
> >>> the graphics driver to add a graphics work queue for it. The
> >>> output of this UAPI is a queue id.
> >>>
> >>> This UAPI maps the queue into GPU, so the graphics app can start
> >>> submitting work to the queue as soon as the call returns.
> >>>
> >>> V2: Addressed review comments from Alex and Christian
> >>>  - Make the doorbell offset's comment clearer
> >>>  - Change the output parameter name to queue_id
> >>>
> >>> V3: Integration with doorbell manager
> >>>
> >>> V4:
> >>>  - Updated the UAPI doc (Pierre-Eric)
> >>>  - Created a Union for engine specific MQDs (Alex)
> >>>  - Added Christian's R-B
> >>> V5:
> >>>  - Add variables for GDS and CSA in MQD structure (Alex)
> >>>  - Make MQD data a ptr-size pair instead of union (Alex)
> >>>
> >>> V9:
> >>> - renamed struct drm_amdgpu_userq_mqd_gfx_v11 to struct
> >>>   drm_amdgpu_userq_mqd as its being used for SDMA and
> >>>   compute queues as well
> >>>
> >>> Cc: Alex Deucher 
> >>> Cc: Christian Koenig 
> >>> Reviewed-by: Christian König 
> >>> Signed-off-by: Alex Deucher 
> >>> Signed-off-by: Shashank Sharma 
> >>> ---
> >>>   include/uapi/drm/amdgpu_drm.h | 110
> >>> ++
> >>>   1 file changed, 110 insertions(+)
> >>>
> >>> diff --git a/include/uapi/drm/amdgpu_drm.h
> >>> b/include/uapi/drm/amdgpu_drm.h
> >>> index 96e32dafd4f0..22f56a30f7cb 100644
> >>> --- a/include/uapi/drm/amdgpu_drm.h
> >>> +++ b/include/uapi/drm/amdgpu_drm.h
> >>> @@ -54,6 +54,7 @@ extern "C" {
> >>>   #define DRM_AMDGPU_VM  0x13
> >>>   #define DRM_AMDGPU_FENCE_TO_HANDLE 0x14
> >>>   #define DRM_AMDGPU_SCHED   0x15
> >>> +#define DRM_AMDGPU_USERQ   0x16
> >>>
> >>>   #define DRM_IOCTL_AMDGPU_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE +
> >>> DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
> >>>   #define DRM_IOCTL_AMDGPU_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE +
> >>> DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
> >>> @@ -71,6 +72,7 @@ extern "C" {
> >>>   #define DRM_IOCTL_AMDGPU_VM DRM_IOWR(DRM_COMMAND_BASE +
> >>> DRM_AMDGPU_VM, union drm_amdgpu_vm)
> >>>   #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE
> >>> + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
> >>>   #define DRM_IOCTL_AMDGPU_SCHED DRM_IOW(DRM_COMMAND_BASE +
> >>> DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
> >>> +#define DRM_IOCTL_AMDGPU_USERQ DRM_IOW(DRM_COMMAND_BASE +
> >>> DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
> >>>
> >>>   /**
> >>>* DOC: memory domains
> >>> @@ -317,6 +319,114 @@ union drm_amdgpu_ctx {
> >>>  union drm_amdgpu_ctx_out out;
> >>>   };
> >>>
> >>> +/* user queue IOCTL */
> >>> +#define AMDGPU_USERQ_OP_CREATE 1
> >>> +#define AMDGPU_USERQ_OP_FREE   2
> >>> +
> >>> +/* Flag to indicate secure buffer related workload, unused for now */
> >>> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
> >>> +/* Flag to indicate AQL workload, unused for now */
> >>> +#define AMDGPU_USERQ_MQD_FLAGS_AQL (1 << 1)
> >>> +
> >>> +/*
> >>> + * MQD (memory queue descriptor) is a set of parameters which allow
> >>> + * the GPU to uniquely define and identify a usermode queue. This
> >>> + * structure defines the MQD for GFX-V11 IP ver 0.
> >>> + */
> >>> +struct drm_amdgpu_userq_mqd {
> >> Maybe rename this to drm_amdgpu_gfx_userq_mqd since it's gfx specific.
> >> Then we can add different MQDs for SDMA, compute, etc. as they have
> >> different metadata.  E.g., the shadow and CSA are gfx only.
> >
> >
> > Actually this was named drm_amdgpu_userq_mqd_gfx_v11_0 until the last
> > patchset, but then I realized that apart from the objects (gds/shadow
> > va) nothing is gfx specific, its actually required for every userqueue
> > IP which is MES based, so I thought it would be an overkill to create
> > multiple structures for almost the same data. If you feel strong about
> > this, I can change it again.
> >
> > - Shashank
>
>
> Please ignore my last comment, I understand what you are mentioning, and
> I have reformatted the patches accordingly. Now, I am keeping everything
> required for MES in one basic structure (drm_amdgpu_userq_in) and creating
> drm_amdgpu_userq_mqd_gfx_v11 for GFX specific things (like CSA, Shadow
> and GDS areas). Now there will be one separate patch which will enabled
> GFX_IP on MES code, just like how we have separate patches for SDMA and
> Compute IP in this series.  I will send the V10 patches with this
> reformatting in some time.

Yeah, we just need to make it clear to userspace what buffers are
necessary for which ring type.
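
(For reference, a minimal sketch of how such a per-ring-type split in
amdgpu_drm.h could look. Field and structure names are assumptions based on the
discussion above, not the actual v10 patch; the point is only that GFX carries
extra buffers while SDMA/compute carry none.)

/*
 * GFX-specific metadata, passed through the generic mqd pointer/size pair
 * of drm_amdgpu_userq_in. Names assumed for illustration.
 */
struct drm_amdgpu_userq_mqd_gfx_v11 {
	/* Shadow buffer GPU VA, required for GFX userqueues */
	__u64	shadow_va;
	/* CSA (context save area) GPU VA, required for GFX userqueues */
	__u64	csa_va;
	/* GDS backing buffer GPU VA */
	__u64	gds_va;
};

/*
 * SDMA and compute queues need no extra firmware objects, so they would
 * either pass no mqd blob at all or an empty per-IP structure, which makes
 * the required buffers per ring type explicit in the UAPI.
 */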


Re: [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART

2024-05-02 Thread Alex Deucher
On Thu, May 2, 2024 at 1:31 AM Sharma, Shashank  wrote:
>
>
> On 01/05/2024 23:36, Alex Deucher wrote:
> > On Fri, Apr 26, 2024 at 9:57 AM Shashank Sharma  
> > wrote:
> >> To support oversubscription, MES FW expects WPTR BOs to
> >> be mapped into GART, before they are submitted to usermode
> >> queues. This patch adds a function for the same.
> >>
> >> V4: fix the wptr value before mapping lookup (Bas, Christian).
> >>
> >> V5: Addressed review comments from Christian:
> >>  - Either pin object or allocate from GART, but not both.
> >>  - All the handling must be done with the VM locks held.
> >>
> >> V7: Addressed review comments from Christian:
> >>  - Do not take vm->eviction_lock
> >>  - Use amdgpu_bo_gpu_offset to get the wptr_bo GPU offset
> >>
> >> V8: Rebase
> >> V9: Changed the function names from gfx_v11* to mes_v11*
> >>
> >> Cc: Alex Deucher 
> >> Cc: Christian Koenig 
> >> Signed-off-by: Shashank Sharma 
> >> Signed-off-by: Arvind Yadav 
> >> ---
> >>   .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 77 +++
> >>   .../gpu/drm/amd/include/amdgpu_userqueue.h|  1 +
> >>   2 files changed, 78 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
> >> b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> >> index 8d2cd61af26b..37b80626e792 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> >> @@ -30,6 +30,74 @@
> >>   #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
> >>   #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
> >>
> >> +static int
> >> +mes_v11_0_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo 
> >> *bo)
> >> +{
> >> +   int ret;
> >> +
> >> +   ret = amdgpu_bo_reserve(bo, true);
> >> +   if (ret) {
> >> +   DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
> >> +   goto err_reserve_bo_failed;
> >> +   }
> >> +
> >> +   ret = amdgpu_ttm_alloc_gart(&bo->tbo);
> >> +   if (ret) {
> >> +   DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
> >> +   goto err_map_bo_gart_failed;
> >> +   }
> >> +
> >> +   amdgpu_bo_unreserve(bo);
> >> +   bo = amdgpu_bo_ref(bo);
> >> +
> >> +   return 0;
> >> +
> >> +err_map_bo_gart_failed:
> >> +   amdgpu_bo_unreserve(bo);
> >> +err_reserve_bo_failed:
> >> +   return ret;
> >> +}
> >> +
> >> +static int
> >> +mes_v11_0_create_wptr_mapping(struct amdgpu_userq_mgr *uq_mgr,
> >> + struct amdgpu_usermode_queue *queue,
> >> + uint64_t wptr)
> >> +{
> >> +   struct amdgpu_device *adev = uq_mgr->adev;
> >> +   struct amdgpu_bo_va_mapping *wptr_mapping;
> >> +   struct amdgpu_vm *wptr_vm;
> >> +   struct amdgpu_userq_obj *wptr_obj = &queue->wptr_obj;
> >> +   int ret;
> >> +
> >> +   wptr_vm = queue->vm;
> >> +   ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
> >> +   if (ret)
> >> +   return ret;
> >> +
> >> +   wptr &= AMDGPU_GMC_HOLE_MASK;
> >> +   wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> 
> >> PAGE_SHIFT);
> >> +   amdgpu_bo_unreserve(wptr_vm->root.bo);
> >> +   if (!wptr_mapping) {
> >> +   DRM_ERROR("Failed to lookup wptr bo\n");
> >> +   return -EINVAL;
> >> +   }
> >> +
> >> +   wptr_obj->obj = wptr_mapping->bo_va->base.bo;
> >> +   if (wptr_obj->obj->tbo.base.size > PAGE_SIZE) {
> >> +   DRM_ERROR("Requested GART mapping for wptr bo larger than 
> >> one page\n");
> >> +   return -EINVAL;
> >> +   }
> >> +
> >> +   ret = mes_v11_0_map_gtt_bo_to_gart(adev, wptr_obj->obj);
> >> +   if (ret) {
> >> +   DRM_ERROR("Failed to map wptr bo to GART\n");
> >> +   return ret;
> >> +   }
> >> +
> >> +   queue->wptr_obj.gpu_addr = 
> >> amdgpu_bo_gpu_offset_no_check(wptr_obj->obj);
> > The wptr virtual address from the user may not be at offset 0 from the
> > start of the object.  We should add the offset to the base vmid0 GPU
> > address.
>
> can you please elaborate a bit here ? wptr_obj->obj is already mapped to
> gart, do we still need this ?

The location that the MES will poll needs to be the same as the
location that the UMD will be writing to.  E.g., if you allocate the
BO and then map it into user space at location 0x5000 in the user's
GPU virtual address space and then the user uses 0x5008 as the wptr
address, we need to make sure that we are polling in MES at vmid0
virtual address + 0x8.  If you map the BO at 0x2000 in the vmid0
address space, you need to make sure to point the firmware to 0x2008.

Alex
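
(A minimal sketch of the fix Alex describes: keep the sub-page offset of the
user's wptr VA when pointing MES at the GART address of that page. This is an
illustration under the assumption that the wptr BO is a single page, as the
size check above enforces; it is not the actual follow-up patch.)

	/* preserve the offset of the wptr within its page */
	queue->wptr_obj.gpu_addr =
		amdgpu_bo_gpu_offset_no_check(wptr_obj->obj) +
		(wptr & (PAGE_SIZE - 1));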

>
> - Shashank
>
> >
> > Alex
> >
> >> +   return 0;
> >> +}
> >> +
> >>   static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
> >> struct amdgpu_usermode_queue *queue,
> >> struct amdgpu_mqd_prop 

Re: [PATCH] drm/amd/amdgpu: Check tbo resource pointer

2024-05-02 Thread Lazar, Lijo



On 5/2/2024 7:01 PM, Asad Kamal wrote:
> Validate tbo resource pointer, skip if NULL
> 
> Signed-off-by: Asad Kamal 
> Reviewed-by: Christian König 

Reviewed-by: Lijo Lazar 

Thanks,
Lijo

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 7a6e3d13a454..77f6fd50002a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5012,7 +5012,8 @@ static int amdgpu_device_recover_vram(struct 
> amdgpu_device *adev)
>   shadow = vmbo->shadow;
>  
>   /* No need to recover an evicted BO */
> - if (shadow->tbo.resource->mem_type != TTM_PL_TT ||
> + if (!shadow->tbo.resource ||
> + shadow->tbo.resource->mem_type != TTM_PL_TT ||
>   shadow->tbo.resource->start == AMDGPU_BO_INVALID_OFFSET ||
>   shadow->parent->tbo.resource->mem_type != TTM_PL_VRAM)
>   continue;


[PATCH] drm/amd/amdgpu: Check tbo resource pointer

2024-05-02 Thread Asad Kamal
Validate tbo resource pointer, skip if NULL

Signed-off-by: Asad Kamal 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7a6e3d13a454..77f6fd50002a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5012,7 +5012,8 @@ static int amdgpu_device_recover_vram(struct 
amdgpu_device *adev)
shadow = vmbo->shadow;
 
/* No need to recover an evicted BO */
-   if (shadow->tbo.resource->mem_type != TTM_PL_TT ||
+   if (!shadow->tbo.resource ||
+   shadow->tbo.resource->mem_type != TTM_PL_TT ||
shadow->tbo.resource->start == AMDGPU_BO_INVALID_OFFSET ||
shadow->parent->tbo.resource->mem_type != TTM_PL_VRAM)
continue;
-- 
2.42.0



Re: [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART

2024-05-02 Thread Sharma, Shashank



On 02/05/2024 15:06, Kasiviswanathan, Harish wrote:


-Original Message-
From: amd-gfx  On Behalf Of Sharma, 
Shashank
Sent: Thursday, May 2, 2024 1:32 AM
To: Alex Deucher 
Cc: amd-gfx@lists.freedesktop.org; Yadav, Arvind ; Deucher, Alexander 
; Koenig, Christian 
Subject: Re: [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART


On 01/05/2024 23:36, Alex Deucher wrote:

On Fri, Apr 26, 2024 at 9:57 AM Shashank Sharma  wrote:

To support oversubscription, MES FW expects WPTR BOs to
be mapped into GART, before they are submitted to usermode
queues. This patch adds a function for the same.

V4: fix the wptr value before mapping lookup (Bas, Christian).

V5: Addressed review comments from Christian:
  - Either pin object or allocate from GART, but not both.
  - All the handling must be done with the VM locks held.

V7: Addressed review comments from Christian:
  - Do not take vm->eviction_lock
  - Use amdgpu_bo_gpu_offset to get the wptr_bo GPU offset

V8: Rebase
V9: Changed the function names from gfx_v11* to mes_v11*

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
   .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 77 +++
   .../gpu/drm/amd/include/amdgpu_userqueue.h|  1 +
   2 files changed, 78 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 8d2cd61af26b..37b80626e792 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -30,6 +30,74 @@
   #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
   #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE

+static int
+mes_v11_0_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo)
+{
+   int ret;
+
+   ret = amdgpu_bo_reserve(bo, true);
+   if (ret) {
+   DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
+   goto err_reserve_bo_failed;
+   }
+
+   ret = amdgpu_ttm_alloc_gart(&bo->tbo);
+   if (ret) {
+   DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
+   goto err_map_bo_gart_failed;
+   }
+
+   amdgpu_bo_unreserve(bo);
+   bo = amdgpu_bo_ref(bo);
+
+   return 0;
+
+err_map_bo_gart_failed:
+   amdgpu_bo_unreserve(bo);
+err_reserve_bo_failed:
+   return ret;
+}
+

There is a very similar function amdgpu_amdkfd_map_gtt_bo_to_gart(). Is it
possible to unify them? Also, the adev parameter in the above function is
confusing; it was also removed from amdgpu_amdkfd_map_gtt_bo_to_gart(). It looks
like the bo is mapped to the GART of adev, however it doesn't have to be; it is
mapped to the GART with which the bo is associated.


I don't think unification makes much sense here, but I agree that adev 
can be removed from the input args. I will update this.


- Shashank
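
(A sketch of the helper with the adev argument dropped, as agreed above. If the
device were ever needed inside, it could be derived from the BO itself via
amdgpu_ttm_adev(bo->tbo.bdev); this is an illustration, not the final patch.)

static int mes_v11_0_map_gtt_bo_to_gart(struct amdgpu_bo *bo)
{
	int ret;

	ret = amdgpu_bo_reserve(bo, true);
	if (ret) {
		DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
		return ret;
	}

	ret = amdgpu_ttm_alloc_gart(&bo->tbo);
	if (ret) {
		DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
		amdgpu_bo_unreserve(bo);
		return ret;
	}

	amdgpu_bo_unreserve(bo);
	/* keep a reference while the queue holds the GART mapping */
	amdgpu_bo_ref(bo);

	return 0;
}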


+static int
+mes_v11_0_create_wptr_mapping(struct amdgpu_userq_mgr *uq_mgr,
+ struct amdgpu_usermode_queue *queue,
+ uint64_t wptr)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   struct amdgpu_bo_va_mapping *wptr_mapping;
+   struct amdgpu_vm *wptr_vm;
+   struct amdgpu_userq_obj *wptr_obj = &queue->wptr_obj;
+   int ret;
+
+   wptr_vm = queue->vm;
+   ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
+   if (ret)
+   return ret;
+
+   wptr &= AMDGPU_GMC_HOLE_MASK;
+   wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> PAGE_SHIFT);
+   amdgpu_bo_unreserve(wptr_vm->root.bo);
+   if (!wptr_mapping) {
+   DRM_ERROR("Failed to lookup wptr bo\n");
+   return -EINVAL;
+   }
+
+   wptr_obj->obj = wptr_mapping->bo_va->base.bo;
+   if (wptr_obj->obj->tbo.base.size > PAGE_SIZE) {
+   DRM_ERROR("Requested GART mapping for wptr bo larger than one 
page\n");
+   return -EINVAL;
+   }
+
+   ret = mes_v11_0_map_gtt_bo_to_gart(adev, wptr_obj->obj);
+   if (ret) {
+   DRM_ERROR("Failed to map wptr bo to GART\n");
+   return ret;
+   }
+
+   queue->wptr_obj.gpu_addr = amdgpu_bo_gpu_offset_no_check(wptr_obj->obj);

The wptr virtual address from the user may not be at offset 0 from the
start of the object.  We should add the offset to the base vmid0 GPU
address.

can you please elaborate a bit here ? wptr_obj->obj is already mapped to
gart, do we still need this ?

- Shashank


Alex


+   return 0;
+}
+
   static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
 struct amdgpu_usermode_queue *queue,
 struct amdgpu_mqd_prop *userq_props)
@@ -61,6 +129,7 @@ static int mes_v11_0_userq_map(struct amdgpu_userq_mgr 
*uq_mgr,
  queue_input.queue_size = userq_props->queue_size >> 2;
  queue_input.doorbell_offset = userq_props->doorbell_index;
  

Re: [RFC 5/5] drm/amdgpu: Only show VRAM in fdinfo if it exists

2024-05-02 Thread Christian König

Am 30.04.24 um 19:27 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin 

Do not emit the key-value pairs if the VRAM does not exist, i.e. VRAM
placement is not valid and accessible.


Yeah, that's unfortunately rather misleading.

Even APUs have VRAM or rather stolen system memory which is managed by 
the graphics driver.


We only have a single compute model which really doesn't have VRAM at all.

Regards,
Christian.



Signed-off-by: Tvrtko Ursulin 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 29 +-
  1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
index a09944104c41..603a5c010f5d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
@@ -83,25 +83,30 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct 
drm_file *file)
 */
  
  	drm_printf(p, "pasid:\t%u\n", fpriv->vm.pasid);

-   drm_printf(p, "drm-memory-vram:\t%llu KiB\n", stats.vram/1024UL);
drm_printf(p, "drm-memory-gtt: \t%llu KiB\n", stats.gtt/1024UL);
drm_printf(p, "drm-memory-cpu: \t%llu KiB\n", stats.cpu/1024UL);
-   drm_printf(p, "amd-memory-visible-vram:\t%llu KiB\n",
-  stats.visible_vram/1024UL);
-   drm_printf(p, "amd-evicted-vram:\t%llu KiB\n",
-  stats.evicted_vram/1024UL);
-   drm_printf(p, "amd-evicted-visible-vram:\t%llu KiB\n",
-  stats.evicted_visible_vram/1024UL);
-   drm_printf(p, "amd-requested-vram:\t%llu KiB\n",
-  stats.requested_vram/1024UL);
-   drm_printf(p, "amd-requested-visible-vram:\t%llu KiB\n",
-  stats.requested_visible_vram/1024UL);
drm_printf(p, "amd-requested-gtt:\t%llu KiB\n",
   stats.requested_gtt/1024UL);
-   drm_printf(p, "drm-shared-vram:\t%llu KiB\n", stats.vram_shared/1024UL);
drm_printf(p, "drm-shared-gtt:\t%llu KiB\n", stats.gtt_shared/1024UL);
drm_printf(p, "drm-shared-cpu:\t%llu KiB\n", stats.cpu_shared/1024UL);
  
+	if (!adev->gmc.is_app_apu) {

+   drm_printf(p, "drm-memory-vram:\t%llu KiB\n",
+  stats.vram/1024UL);
+   drm_printf(p, "amd-memory-visible-vram:\t%llu KiB\n",
+  stats.visible_vram/1024UL);
+   drm_printf(p, "amd-evicted-vram:\t%llu KiB\n",
+  stats.evicted_vram/1024UL);
+   drm_printf(p, "amd-evicted-visible-vram:\t%llu KiB\n",
+  stats.evicted_visible_vram/1024UL);
+   drm_printf(p, "amd-requested-vram:\t%llu KiB\n",
+  stats.requested_vram/1024UL);
+   drm_printf(p, "amd-requested-visible-vram:\t%llu KiB\n",
+  stats.requested_visible_vram/1024UL);
+   drm_printf(p, "drm-shared-vram:\t%llu KiB\n",
+  stats.vram_shared/1024UL);
+   }
+
for (hw_ip = 0; hw_ip < AMDGPU_HW_IP_NUM; ++hw_ip) {
if (!usage[hw_ip])
continue;




Re: [RFC 0/5] Add capacity key to fdinfo

2024-05-02 Thread Christian König

Am 01.05.24 um 15:27 schrieb Tvrtko Ursulin:


Hi Alex,

On 30/04/2024 19:32, Alex Deucher wrote:
On Tue, Apr 30, 2024 at 1:27 PM Tvrtko Ursulin  
wrote:


From: Tvrtko Ursulin 

I have noticed AMD GPUs can have more than one "engine" (ring?) of 
the same type
but amdgpu is not reporting that in fdinfo using the capacity engine 
tag.


This series is therefore an attempt to improve that, but only an RFC 
since it is
quite likely I got stuff wrong on the first attempt. Or if not wrong 
it may not

be very beneficial in AMD's case.

So I tried to figure out how to count and store the number of 
instances of an
"engine" type and spotted that could perhaps be used in more than 
one place in
the driver. I was more than a little bit confused by the ip_instance 
and uapi
rings, then how rings are selected to context entities internally. 
Anyway..
hopefully it is a simple enough series to easily spot any such large 
misses.


End result should be that, assuming two "engine" instances, one 
fully loaded and

one idle will only report client using 50% of that engine type.


That would only be true if there are multiple instantiations of the IP
on the chip which in most cases is not true.  In most cases there is
one instance of the IP that can be fed from multiple rings. E.g. for
graphics and compute, all of the rings ultimately feed into the same
compute units on the chip.  So if you have a gfx ring and a compute
rings, you can schedule work to them asynchronously, but ultimately
whether they execute serially or in parallel depends on the actual
shader code in the command buffers and the extent to which it can
utilize the available compute units in the shader cores.


This is the same as with Intel/i915. Fdinfo is not intended to provide 
utilisation of EUs and such, just how busy are the "entities" kernel 
submits to. So doing something like in this series would make the 
reporting more similar between the two drivers.


I think both the 0-800% or 0-100% range (taking 8 ring compute as an 
example) can be misleading for different workloads. Neither <800% in 
the former means one can send more work and same for <100% in the latter.


Yeah, I think that's what Alex tries to describe. By using 8 compute 
rings your 800% load is actually incorrect and quite misleading.


Background is that those 8 compute rings won't be active all at the same 
time, but rather waiting on each other for resources.


But this "waiting" is unfortunately considered execution time since the 
used approach is actually not really capable of separating waiting and 
execution time.




There is also a parallel with the CPU world here and hyper threading, 
if not wider, where "What does 100% actually mean?" is also wishy-washy.


Also note that the reporting of actual time based values in fdinfo 
would not changing with this series.


Of if you can guide me towards how to distinguish real vs fake 
parallelism in HW IP blocks I could modify the series to only add 
capacity tags where there are truly independent blocks. That would be 
different from i915 though were I did not bother with that 
distinction. (For reasons that assignment of for instance EUs to 
compute "rings" (command streamers in i915) was supposed to be 
possible to re-configure on the fly. So it did not make sense to try 
and be super smart in fdinfo.)


Well exactly that's the point we don't really have truly independent 
blocks on AMD hardware.


There are things like independent SDMA instances, but those are meant to
be used like the first instance for uploads and the second for downloads,
etc. When you use both instances for the same job they will pretty much
limit each other because of a single resource.



As for the UAPI portion of this, we generally expose a limited number
of rings to user space and then we use the GPU scheduler to load
balance between all of the available rings of a type to try and
extract as much parallelism as we can.


The part I do not understand is the purpose of the ring argument in 
for instance drm_amdgpu_cs_chunk_ib. It appears userspace can create 
up to N scheduling entities using different ring id's, but internally 
they can map to 1:N same scheduler instances (depending on IP type, 
can be that each userspace ring maps to same N hw rings, or for rings 
with no drm sched load balancing userspace ring also does not appear 
to have a relation to the picked drm sched instance.).


So I neither understand how this ring is useful, or how it does not 
create a problem for IP types which use drm_sched_pick_best. It 
appears even if userspace created two scheduling entities with 
different ring ids they could randomly map to same drm sched aka same 
hw ring, no?


Yeah, that is correct. The multimedia instances have to use a "fixed" 
load balancing because of lack of firmware support. That should have 
been fixed by now but we never found time to actually validate it.


Regarding the "ring" parameter in CS, that is basically just for 
backward 

RE: [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART

2024-05-02 Thread Kasiviswanathan, Harish

-Original Message-
From: amd-gfx  On Behalf Of Sharma, 
Shashank
Sent: Thursday, May 2, 2024 1:32 AM
To: Alex Deucher 
Cc: amd-gfx@lists.freedesktop.org; Yadav, Arvind ; 
Deucher, Alexander ; Koenig, Christian 

Subject: Re: [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART


On 01/05/2024 23:36, Alex Deucher wrote:
> On Fri, Apr 26, 2024 at 9:57 AM Shashank Sharma  
> wrote:
>> To support oversubscription, MES FW expects WPTR BOs to
>> be mapped into GART, before they are submitted to usermode
>> queues. This patch adds a function for the same.
>>
>> V4: fix the wptr value before mapping lookup (Bas, Christian).
>>
>> V5: Addressed review comments from Christian:
>>  - Either pin object or allocate from GART, but not both.
>>  - All the handling must be done with the VM locks held.
>>
>> V7: Addressed review comments from Christian:
>>  - Do not take vm->eviction_lock
>>  - Use amdgpu_bo_gpu_offset to get the wptr_bo GPU offset
>>
>> V8: Rebase
>> V9: Changed the function names from gfx_v11* to mes_v11*
>>
>> Cc: Alex Deucher 
>> Cc: Christian Koenig 
>> Signed-off-by: Shashank Sharma 
>> Signed-off-by: Arvind Yadav 
>> ---
>>   .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 77 +++
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h|  1 +
>>   2 files changed, 78 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
>> b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> index 8d2cd61af26b..37b80626e792 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> @@ -30,6 +30,74 @@
>>   #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>>   #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>>
>> +static int
>> +mes_v11_0_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo 
>> *bo)
>> +{
>> +   int ret;
>> +
>> +   ret = amdgpu_bo_reserve(bo, true);
>> +   if (ret) {
>> +   DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
>> +   goto err_reserve_bo_failed;
>> +   }
>> +
>> +   ret = amdgpu_ttm_alloc_gart(&bo->tbo);
>> +   if (ret) {
>> +   DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
>> +   goto err_map_bo_gart_failed;
>> +   }
>> +
>> +   amdgpu_bo_unreserve(bo);
>> +   bo = amdgpu_bo_ref(bo);
>> +
>> +   return 0;
>> +
>> +err_map_bo_gart_failed:
>> +   amdgpu_bo_unreserve(bo);
>> +err_reserve_bo_failed:
>> +   return ret;
>> +}
>> +

There is a very similar function amdgpu_amdkfd_map_gtt_bo_to_gart(). Is it
possible to unify them? Also, the adev parameter in the above function is
confusing; it was also removed from amdgpu_amdkfd_map_gtt_bo_to_gart(). It looks
like the bo is mapped to the GART of adev, however it doesn't have to be; it is
mapped to the GART with which the bo is associated.

>> +static int
>> +mes_v11_0_create_wptr_mapping(struct amdgpu_userq_mgr *uq_mgr,
>> + struct amdgpu_usermode_queue *queue,
>> + uint64_t wptr)
>> +{
>> +   struct amdgpu_device *adev = uq_mgr->adev;
>> +   struct amdgpu_bo_va_mapping *wptr_mapping;
>> +   struct amdgpu_vm *wptr_vm;
>> +   struct amdgpu_userq_obj *wptr_obj = &queue->wptr_obj;
>> +   int ret;
>> +
>> +   wptr_vm = queue->vm;
>> +   ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
>> +   if (ret)
>> +   return ret;
>> +
>> +   wptr &= AMDGPU_GMC_HOLE_MASK;
>> +   wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> 
>> PAGE_SHIFT);
>> +   amdgpu_bo_unreserve(wptr_vm->root.bo);
>> +   if (!wptr_mapping) {
>> +   DRM_ERROR("Failed to lookup wptr bo\n");
>> +   return -EINVAL;
>> +   }
>> +
>> +   wptr_obj->obj = wptr_mapping->bo_va->base.bo;
>> +   if (wptr_obj->obj->tbo.base.size > PAGE_SIZE) {
>> +   DRM_ERROR("Requested GART mapping for wptr bo larger than 
>> one page\n");
>> +   return -EINVAL;
>> +   }
>> +
>> +   ret = mes_v11_0_map_gtt_bo_to_gart(adev, wptr_obj->obj);
>> +   if (ret) {
>> +   DRM_ERROR("Failed to map wptr bo to GART\n");
>> +   return ret;
>> +   }
>> +
>> +   queue->wptr_obj.gpu_addr = 
>> amdgpu_bo_gpu_offset_no_check(wptr_obj->obj);
> The wptr virtual address from the user may not be at offset 0 from the
> start of the object.  We should add the offset to the base vmid0 GPU
> address.

can you please elaborate a bit here ? wptr_obj->obj is already mapped to
gart, do we still need this ?

- Shashank

>
> Alex
>
>> +   return 0;
>> +}
>> +
>>   static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
>> struct amdgpu_usermode_queue *queue,
>> struct amdgpu_mqd_prop *userq_props)
>> @@ -61,6 +129,7 @@ static int mes_v11_0_userq_map(struct amdgpu_userq_mgr 
>> *uq_mgr,
>>  

Re: [PATCH v9 01/14] drm/amdgpu: UAPI for user queue management

2024-05-02 Thread Sharma, Shashank



On 02/05/2024 07:23, Sharma, Shashank wrote:

Hey Alex,

On 01/05/2024 22:39, Alex Deucher wrote:

On Fri, Apr 26, 2024 at 10:07 AM Shashank Sharma
 wrote:

From: Alex Deucher 

This patch introduces new UAPI/IOCTL for usermode graphics
queue. The userspace app will fill this structure and request
the graphics driver to add a graphics work queue for it. The
output of this UAPI is a queue id.

This UAPI maps the queue into GPU, so the graphics app can start
submitting work to the queue as soon as the call returns.

V2: Addressed review comments from Alex and Christian
 - Make the doorbell offset's comment clearer
 - Change the output parameter name to queue_id

V3: Integration with doorbell manager

V4:
 - Updated the UAPI doc (Pierre-Eric)
 - Created a Union for engine specific MQDs (Alex)
 - Added Christian's R-B
V5:
 - Add variables for GDS and CSA in MQD structure (Alex)
 - Make MQD data a ptr-size pair instead of union (Alex)

V9:
    - renamed struct drm_amdgpu_userq_mqd_gfx_v11 to struct
  drm_amdgpu_userq_mqd as its being used for SDMA and
  compute queues as well

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
  include/uapi/drm/amdgpu_drm.h | 110 
++

  1 file changed, 110 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h 
b/include/uapi/drm/amdgpu_drm.h

index 96e32dafd4f0..22f56a30f7cb 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -54,6 +54,7 @@ extern "C" {
  #define DRM_AMDGPU_VM  0x13
  #define DRM_AMDGPU_FENCE_TO_HANDLE 0x14
  #define DRM_AMDGPU_SCHED   0x15
+#define DRM_AMDGPU_USERQ   0x16

  #define DRM_IOCTL_AMDGPU_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + 
DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
  #define DRM_IOCTL_AMDGPU_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE + 
DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)

@@ -71,6 +72,7 @@ extern "C" {
  #define DRM_IOCTL_AMDGPU_VM DRM_IOWR(DRM_COMMAND_BASE + 
DRM_AMDGPU_VM, union drm_amdgpu_vm)
  #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE 
+ DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
  #define DRM_IOCTL_AMDGPU_SCHED DRM_IOW(DRM_COMMAND_BASE + 
DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
+#define DRM_IOCTL_AMDGPU_USERQ DRM_IOW(DRM_COMMAND_BASE + 
DRM_AMDGPU_USERQ, union drm_amdgpu_userq)


  /**
   * DOC: memory domains
@@ -317,6 +319,114 @@ union drm_amdgpu_ctx {
 union drm_amdgpu_ctx_out out;
  };

+/* user queue IOCTL */
+#define AMDGPU_USERQ_OP_CREATE 1
+#define AMDGPU_USERQ_OP_FREE   2
+
+/* Flag to indicate secure buffer related workload, unused for now */
+#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
+/* Flag to indicate AQL workload, unused for now */
+#define AMDGPU_USERQ_MQD_FLAGS_AQL (1 << 1)
+
+/*
+ * MQD (memory queue descriptor) is a set of parameters which allow
+ * the GPU to uniquely define and identify a usermode queue. This
+ * structure defines the MQD for GFX-V11 IP ver 0.
+ */
+struct drm_amdgpu_userq_mqd {

Maybe rename this to drm_amdgpu_gfx_userq_mqd since it's gfx specific.
Then we can add different MQDs for SDMA, compute, etc. as they have
different metadata.  E.g., the shadow and CSA are gfx only.



Actually this was named drm_amdgpu_userq_mqd_gfx_v11_0 until the last 
patchset, but then I realized that apart from the objects (gds/shadow 
va) nothing is gfx specific, it's actually required for every userqueue 
IP which is MES based, so I thought it would be an overkill to create 
multiple structures for almost the same data. If you feel strong about 
this, I can change it again.


- Shashank



Please ignore my last comment, I understand what you are mentioning, and 
I have reformatted the patches accordingly. Now, I am keeping everything 
required for MES in one basic structure (drm_amdgpu_userq_in) and creating  
drm_amdgpu_userq_mqd_gfx_v11 for GFX specific things (like CSA, Shadow 
and GDS areas). Now there will be one separate patch which will enable 
GFX_IP on MES code, just like how we have separate patches for SDMA and 
Compute IP in this series.  I will send the V10 patches with this 
reformatting in some time.


- Shashank




Alex



+   /**
+    * @queue_va: Virtual address of the GPU memory which holds 
the queue

+    * object. The queue holds the workload packets.
+    */
+   __u64   queue_va;
+   /**
+    * @queue_size: Size of the queue in bytes, this needs to be 
256-byte

+    * aligned.
+    */
+   __u64   queue_size;
+   /**
+    * @rptr_va : Virtual address of the GPU memory which holds 
the ring RPTR.
+    * This object must be at least 8 byte in size and aligned 
to 8-byte offset.

+    */
+   __u64   rptr_va;
+   /**
+    * @wptr_va : Virtual address of the GPU memory which holds 
the ring WPTR.
+    * This object must be at 

Re: [PATCH] drm/amdkfd: Remove arbitrary timeout for hmm_range_fault

2024-05-02 Thread James Zhu



On 2024-05-01 18:56, Philip Yang wrote:

On systems with khugepaged enabled and use cases with THP buffers,
hmm_range_fault may take more than 15 seconds to return -EBUSY; the
arbitrary timeout value is not accurate and causes memory allocation
failures.

Remove the arbitrary timeout value and return EAGAIN to the application
if hmm_range_fault returns EBUSY, so that userspace libdrm and Thunk will
call the ioctl again.

Change the EAGAIN message to a debug message as this is not an error.

Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  5 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c  | 12 +++-
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c |  5 +
  3 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 54198c3928c7..02696c2102f1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1087,7 +1087,10 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t 
user_addr,
  
  	ret = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages, &range);

if (ret) {
-   pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
+   if (ret == -EAGAIN)
+   pr_debug("Failed to get user pages, try again\n");
+   else
+   pr_err("%s: Failed to get user pages: %d\n", __func__, 
ret);
goto unregister_out;
}
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c

index 431ec72655ec..e36fede7f74c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -202,20 +202,12 @@ int amdgpu_hmm_range_get_pages(struct 
mmu_interval_notifier *notifier,
pr_debug("hmm range: start = 0x%lx, end = 0x%lx",
hmm_range->start, hmm_range->end);
  
-		/* Assuming 64MB takes maximum 1 second to fault page address */

-   timeout = max((hmm_range->end - hmm_range->start) >> 26, 1UL);
-   timeout *= HMM_RANGE_DEFAULT_TIMEOUT;
-   timeout = jiffies + msecs_to_jiffies(timeout);
+   timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);

[JZ] should we reduce MAX_WALK_BYTE to 64M in the meantime?
  
  retry:

hmm_range->notifier_seq = mmu_interval_read_begin(notifier);
r = hmm_range_fault(hmm_range);
if (unlikely(r)) {
-   schedule();

[JZ] the above is for CPU stall WA, we may still need keep it.

-   /*
-* FIXME: This timeout should encompass the retry from
-* mmu_interval_read_retry() as well.
-*/
if (r == -EBUSY && !time_after(jiffies, timeout))
goto retry;
goto out_free_pfns;
@@ -247,6 +239,8 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier 
*notifier,
  out_free_range:
kfree(hmm_range);
  
+	if (r == -EBUSY)

+   r = -EAGAIN;
return r;
  }
  
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c

index 94f83be2232d..e7040f809f33 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1670,11 +1670,8 @@ static int svm_range_validate_and_map(struct mm_struct 
*mm,
   readonly, owner, NULL,
    &hmm_range);
WRITE_ONCE(p->svms.faulting_task, NULL);
-   if (r) {
+   if (r)
pr_debug("failed %d to get svm range pages\n", 
r);
-   if (r == -EBUSY)
-   r = -EAGAIN;
-   }
} else {
r = -EFAULT;
}
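
(For illustration of James's point about keeping the CPU-stall workaround: one
option would be to yield inside the retry path while still dropping the
size-based timeout. A sketch only, not part of the posted patch.)

retry:
	hmm_range->notifier_seq = mmu_interval_read_begin(notifier);
	r = hmm_range_fault(hmm_range);
	if (unlikely(r)) {
		if (r == -EBUSY && !time_after(jiffies, timeout)) {
			/* give other tasks a chance to run before retrying */
			cond_resched();
			goto retry;
		}
		goto out_free_pfns;
	}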


Re: [PATCH v1 12/12] fbdev/viafb: Make I2C terminology more inclusive

2024-05-02 Thread Thomas Zimmermann




Am 30.04.24 um 19:38 schrieb Easwar Hariharan:

I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced "master/slave"
with more appropriate terms. Inspired by and following on to Wolfram's
series to fix drivers/i2c/[1], fix the terminology for users of
I2C_ALGOBIT bitbanging interface, now that the approved verbiage exists
in the specification.

Compile tested, no functionality changes intended

[1]: 
https://lore.kernel.org/all/20240322132619.6389-1-wsa+rene...@sang-engineering.com/

Signed-off-by: Easwar Hariharan 


Acked-by: Thomas Zimmermann 


---
  drivers/video/fbdev/via/chip.h|  8 
  drivers/video/fbdev/via/dvi.c | 24 
  drivers/video/fbdev/via/lcd.c |  6 +++---
  drivers/video/fbdev/via/via_aux.h |  2 +-
  drivers/video/fbdev/via/via_i2c.c | 12 ++--
  drivers/video/fbdev/via/vt1636.c  |  6 +++---
  6 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/drivers/video/fbdev/via/chip.h b/drivers/video/fbdev/via/chip.h
index f0a19cbcb9e5..1ea6d4ce79e7 100644
--- a/drivers/video/fbdev/via/chip.h
+++ b/drivers/video/fbdev/via/chip.h
@@ -69,7 +69,7 @@
  #define VT1632_TMDS 0x01
  #define INTEGRATED_TMDS 0x42
  
-/* Definition TMDS Trasmitter I2C Slave Address */

+/* Definition TMDS Trasmitter I2C Client Address */
  #define VT1632_TMDS_I2C_ADDR0x10
  
  /**/

@@ -88,21 +88,21 @@
  #define TX_DATA_DDR_MODE0x04
  #define TX_DATA_SDR_MODE0x08
  
-/* Definition LVDS Trasmitter I2C Slave Address */

+/* Definition LVDS Trasmitter I2C Client Address */
  #define VT1631_LVDS_I2C_ADDR0x70
  #define VT3271_LVDS_I2C_ADDR0x80
  #define VT1636_LVDS_I2C_ADDR0x80
  
  struct tmds_chip_information {

int tmds_chip_name;
-   int tmds_chip_slave_addr;
+   int tmds_chip_client_addr;
int output_interface;
int i2c_port;
  };
  
  struct lvds_chip_information {

int lvds_chip_name;
-   int lvds_chip_slave_addr;
+   int lvds_chip_client_addr;
int output_interface;
int i2c_port;
  };
diff --git a/drivers/video/fbdev/via/dvi.c b/drivers/video/fbdev/via/dvi.c
index 13147e3066eb..db7db26416c3 100644
--- a/drivers/video/fbdev/via/dvi.c
+++ b/drivers/video/fbdev/via/dvi.c
@@ -70,7 +70,7 @@ bool viafb_tmds_trasmitter_identify(void)
/* Check for VT1632: */
viaparinfo->chip_info->tmds_chip_info.tmds_chip_name = VT1632_TMDS;
viaparinfo->chip_info->
-   tmds_chip_info.tmds_chip_slave_addr = VT1632_TMDS_I2C_ADDR;
+   tmds_chip_info.tmds_chip_client_addr = VT1632_TMDS_I2C_ADDR;
viaparinfo->chip_info->tmds_chip_info.i2c_port = VIA_PORT_31;
if (check_tmds_chip(VT1632_DEVICE_ID_REG, VT1632_DEVICE_ID)) {
/*
@@ -128,14 +128,14 @@ bool viafb_tmds_trasmitter_identify(void)
viaparinfo->chip_info->
tmds_chip_info.tmds_chip_name = NON_TMDS_TRANSMITTER;
viaparinfo->chip_info->tmds_chip_info.
-   tmds_chip_slave_addr = VT1632_TMDS_I2C_ADDR;
+   tmds_chip_client_addr = VT1632_TMDS_I2C_ADDR;
return false;
  }
  
  static void tmds_register_write(int index, u8 data)

  {
viafb_i2c_writebyte(viaparinfo->chip_info->tmds_chip_info.i2c_port,
-   
viaparinfo->chip_info->tmds_chip_info.tmds_chip_slave_addr,
+   
viaparinfo->chip_info->tmds_chip_info.tmds_chip_client_addr,
index, data);
  }
  
@@ -144,7 +144,7 @@ static int tmds_register_read(int index)

u8 data;
  
  	viafb_i2c_readbyte(viaparinfo->chip_info->tmds_chip_info.i2c_port,

-  (u8) 
viaparinfo->chip_info->tmds_chip_info.tmds_chip_slave_addr,
+  (u8) 
viaparinfo->chip_info->tmds_chip_info.tmds_chip_client_addr,
    (u8) index, &data);
return data;
  }
@@ -152,7 +152,7 @@ static int tmds_register_read(int index)
  static int tmds_register_read_bytes(int index, u8 *buff, int buff_len)
  {
viafb_i2c_readbytes(viaparinfo->chip_info->tmds_chip_info.i2c_port,
-   (u8) 
viaparinfo->chip_info->tmds_chip_info.tmds_chip_slave_addr,
+   (u8) 
viaparinfo->chip_info->tmds_chip_info.tmds_chip_client_addr,
(u8) index, buff, buff_len);
return 0;
  }
@@ -256,14 +256,14 @@ static int viafb_dvi_query_EDID(void)
  
  	DEBUG_MSG(KERN_INFO "viafb_dvi_query_EDID!!\n");
  
-	restore = viaparinfo->chip_info->tmds_chip_info.tmds_chip_slave_addr;

-   viaparinfo->chip_info->tmds_chip_info.tmds_chip_slave_addr = 0xA0;
+   restore = viaparinfo->chip_info->tmds_chip_info.tmds_chip_client_addr;
+   viaparinfo->chip_info->tmds_chip_info.tmds_chip_client_addr = 0xA0;
  
  	data0 = (u8) tmds_register_read(0x00);

data1 = (u8) tmds_register_read(0x01);
 

Re: [PATCH v1 11/12] fbdev/smscufx: Make I2C terminology more inclusive

2024-05-02 Thread Thomas Zimmermann




Am 30.04.24 um 19:38 schrieb Easwar Hariharan:

I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced "master/slave"
with more appropriate terms. Inspired by and following on to Wolfram's
series to fix drivers/i2c/[1], fix the terminology for users of
I2C_ALGOBIT bitbanging interface, now that the approved verbiage exists
in the specification.

Compile tested, no functionality changes intended

[1]: 
https://lore.kernel.org/all/20240322132619.6389-1-wsa+rene...@sang-engineering.com/

Signed-off-by: Easwar Hariharan 


Acked-by: Thomas Zimmermann 


---
  drivers/video/fbdev/smscufx.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/video/fbdev/smscufx.c b/drivers/video/fbdev/smscufx.c
index 35d682b110c4..1c80c1a3d516 100644
--- a/drivers/video/fbdev/smscufx.c
+++ b/drivers/video/fbdev/smscufx.c
@@ -1292,7 +1292,7 @@ static int ufx_realloc_framebuffer(struct ufx_data *dev, 
struct fb_info *info)
return 0;
  }
  
-/* sets up I2C Controller for 100 Kbps, std. speed, 7-bit addr, master,

+/* sets up I2C Controller for 100 Kbps, std. speed, 7-bit addr, host,
   * restart enabled, but no start byte, enable controller */
  static int ufx_i2c_init(struct ufx_data *dev)
  {
@@ -1321,7 +1321,7 @@ static int ufx_i2c_init(struct ufx_data *dev)
/* 7-bit (not 10-bit) addressing */
tmp &= ~(0x10);
  
-	/* enable restart conditions and master mode */

+   /* enable restart conditions and host mode */
tmp |= 0x21;
  
  	status = ufx_reg_write(dev, 0x1000, tmp);


--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)



Re: [PATCH v1 02/12] drm/gma500: Make I2C terminology more inclusive

2024-05-02 Thread Thomas Zimmermann




Am 30.04.24 um 19:38 schrieb Easwar Hariharan:

I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced "master/slave"
with more appropriate terms. Inspired by and following on to Wolfram's
series to fix drivers/i2c/[1], fix the terminology for users of
I2C_ALGOBIT bitbanging interface, now that the approved verbiage exists
in the specification.

Compile tested, no functionality changes intended

[1]: 
https://lore.kernel.org/all/20240322132619.6389-1-wsa+rene...@sang-engineering.com/

Signed-off-by: Easwar Hariharan 


Acked-by: Thomas Zimmermann 


---
  drivers/gpu/drm/gma500/cdv_intel_lvds.c |  2 +-
  drivers/gpu/drm/gma500/intel_bios.c | 22 ++---
  drivers/gpu/drm/gma500/intel_bios.h |  4 ++--
  drivers/gpu/drm/gma500/intel_gmbus.c|  2 +-
  drivers/gpu/drm/gma500/psb_drv.h|  2 +-
  drivers/gpu/drm/gma500/psb_intel_drv.h  |  2 +-
  drivers/gpu/drm/gma500/psb_intel_lvds.c |  4 ++--
  drivers/gpu/drm/gma500/psb_intel_sdvo.c | 26 -
  8 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/gma500/cdv_intel_lvds.c 
b/drivers/gpu/drm/gma500/cdv_intel_lvds.c
index f08a6803dc18..c7652a02b42e 100644
--- a/drivers/gpu/drm/gma500/cdv_intel_lvds.c
+++ b/drivers/gpu/drm/gma500/cdv_intel_lvds.c
@@ -565,7 +565,7 @@ void cdv_intel_lvds_init(struct drm_device *dev,
dev->dev, "I2C bus registration failed.\n");
goto err_encoder_cleanup;
}
-   gma_encoder->i2c_bus->slave_addr = 0x2C;
+   gma_encoder->i2c_bus->target_addr = 0x2C;
dev_priv->lvds_i2c_bus = gma_encoder->i2c_bus;
  
  	/*

diff --git a/drivers/gpu/drm/gma500/intel_bios.c 
b/drivers/gpu/drm/gma500/intel_bios.c
index 8245b5603d2c..d5924ca3ed05 100644
--- a/drivers/gpu/drm/gma500/intel_bios.c
+++ b/drivers/gpu/drm/gma500/intel_bios.c
@@ -14,8 +14,8 @@
  #include "psb_intel_drv.h"
  #include "psb_intel_reg.h"
  
-#define	SLAVE_ADDR1	0x70

-#defineSLAVE_ADDR2 0x72
+#defineTARGET_ADDR10x70
+#defineTARGET_ADDR20x72
  
  static void *find_section(struct bdb_header *bdb, int section_id)

  {
@@ -357,10 +357,10 @@ parse_sdvo_device_mapping(struct drm_psb_private 
*dev_priv,
/* skip the device block if device type is invalid */
continue;
}
-   if (p_child->slave_addr != SLAVE_ADDR1 &&
-   p_child->slave_addr != SLAVE_ADDR2) {
+   if (p_child->target_addr != TARGET_ADDR1 &&
+   p_child->target_addr != TARGET_ADDR2) {
/*
-* If the slave address is neither 0x70 nor 0x72,
+* If the target address is neither 0x70 nor 0x72,
 * it is not a SDVO device. Skip it.
 */
continue;
@@ -371,22 +371,22 @@ parse_sdvo_device_mapping(struct drm_psb_private 
*dev_priv,
DRM_DEBUG_KMS("Incorrect SDVO port. Skip it\n");
continue;
}
-   DRM_DEBUG_KMS("the SDVO device with slave addr %2x is found on"
+   DRM_DEBUG_KMS("the SDVO device with target addr %2x is found on"
" %s port\n",
-   p_child->slave_addr,
+   p_child->target_addr,
(p_child->dvo_port == DEVICE_PORT_DVOB) ?
"SDVOB" : "SDVOC");
p_mapping = &(dev_priv->sdvo_mappings[p_child->dvo_port - 1]);
if (!p_mapping->initialized) {
p_mapping->dvo_port = p_child->dvo_port;
-   p_mapping->slave_addr = p_child->slave_addr;
+   p_mapping->target_addr = p_child->target_addr;
p_mapping->dvo_wiring = p_child->dvo_wiring;
p_mapping->ddc_pin = p_child->ddc_pin;
p_mapping->i2c_pin = p_child->i2c_pin;
p_mapping->initialized = 1;
DRM_DEBUG_KMS("SDVO device: dvo=%x, addr=%x, wiring=%d, 
ddc_pin=%d, i2c_pin=%d\n",
  p_mapping->dvo_port,
- p_mapping->slave_addr,
+ p_mapping->target_addr,
  p_mapping->dvo_wiring,
  p_mapping->ddc_pin,
  p_mapping->i2c_pin);
@@ -394,10 +394,10 @@ parse_sdvo_device_mapping(struct drm_psb_private 
*dev_priv,
DRM_DEBUG_KMS("Maybe one SDVO port is shared by "
 "two SDVO device.\n");
}
-   if (p_child->slave2_addr) {
+   if (p_child->target2_addr) {
/* Maybe this is a SDVO device 

Re: [PATCH] drm/amdgpu: remove ip dump reg_count variable

2024-05-02 Thread Christian König

Am 02.05.24 um 10:56 schrieb Sunil Khatri:

reg_count is not used and the register count is
derived directly from the array size, so the
variable is removed.

Signed-off-by: Sunil Khatri 


Acked-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 1 -
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 2 --
  2 files changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 64f197bbc866..9a946f0e015c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -436,7 +436,6 @@ struct amdgpu_gfx {
  
  	/* IP reg dump */

uint32_t*ip_dump;
-   uint32_treg_count;
  };
  
  struct amdgpu_gfx_ras_reg_entry {

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 536287ddd2ec..3171ed5e5af3 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4592,10 +4592,8 @@ static void gfx_v10_0_alloc_dump_mem(struct 
amdgpu_device *adev)
if (ptr == NULL) {
DRM_ERROR("Failed to allocate memory for IP Dump\n");
adev->gfx.ip_dump = NULL;
-   adev->gfx.reg_count = 0;
} else {
adev->gfx.ip_dump = ptr;
-   adev->gfx.reg_count = reg_count;
}
  }
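
(For context, with the cached variable gone the dump path can recompute the
count from the same register list used at allocation time. A sketch only,
assuming the gc_reg_list_10_1 array and a struct drm_printer *p as used in
gfx_v10_0.c:)

	/* derive the count from the register list instead of caching it */
	uint32_t reg_count = ARRAY_SIZE(gc_reg_list_10_1);
	uint32_t i;

	for (i = 0; i < reg_count; i++)
		drm_printf(p, "0x%08x\n", adev->gfx.ip_dump[i]);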
  




[PATCH] drm/amdgpu: remove ip dump reg_count variable

2024-05-02 Thread Sunil Khatri
reg_count is not used and the register count is
derived directly from the array size, so the
variable is removed.

Signed-off-by: Sunil Khatri 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 1 -
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 2 --
 2 files changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 64f197bbc866..9a946f0e015c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -436,7 +436,6 @@ struct amdgpu_gfx {
 
/* IP reg dump */
uint32_t*ip_dump;
-   uint32_treg_count;
 };
 
 struct amdgpu_gfx_ras_reg_entry {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 536287ddd2ec..3171ed5e5af3 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4592,10 +4592,8 @@ static void gfx_v10_0_alloc_dump_mem(struct 
amdgpu_device *adev)
if (ptr == NULL) {
DRM_ERROR("Failed to allocate memory for IP Dump\n");
adev->gfx.ip_dump = NULL;
-   adev->gfx.reg_count = 0;
} else {
adev->gfx.ip_dump = ptr;
-   adev->gfx.reg_count = reg_count;
}
 }
 
-- 
2.34.1



Re: [RFC 0/5] Add capacity key to fdinfo

2024-05-02 Thread Tvrtko Ursulin



Hi Alex,

On 30/04/2024 19:32, Alex Deucher wrote:

On Tue, Apr 30, 2024 at 1:27 PM Tvrtko Ursulin  wrote:


From: Tvrtko Ursulin 

I have noticed AMD GPUs can have more than one "engine" (ring?) of the same type
but amdgpu is not reporting that in fdinfo using the capacity engine tag.

This series is therefore an attempt to improve that, but only an RFC since it is
quite likely I got stuff wrong on the first attempt. Or if not wrong it may not
be very beneficial in AMD's case.

So I tried to figure out how to count and store the number of instances of an
"engine" type and spotted that could perhaps be used in more than one place in
the driver. I was more than a little bit confused by the ip_instance and uapi
rings, then how rings are selected to context entities internally. Anyway..
hopefully it is a simple enough series to easily spot any such large misses.

End result should be that, assuming two "engine" instances, one fully loaded and
one idle will only report client using 50% of that engine type.


That would only be true if there are multiple instantiations of the IP
on the chip which in most cases is not true.  In most cases there is
one instance of the IP that can be fed from multiple rings.  E.g. for
graphics and compute, all of the rings ultimately feed into the same
compute units on the chip.  So if you have a gfx ring and a compute
rings, you can schedule work to them asynchronously, but ultimately
whether they execute serially or in parallel depends on the actual
shader code in the command buffers and the extent to which it can
utilize the available compute units in the shader cores.


This is the same as with Intel/i915. Fdinfo is not intended to provide 
utilisation of EUs and such, just how busy are the "entities" kernel 
submits to. So doing something like in this series would make the 
reporting more similar between the two drivers.


I think both the 0-800% or 0-100% range (taking 8 ring compute as an 
example) can be misleading for different workloads. Neither <800% in the 
former means one can send more work and same for <100% in the latter.


There is also a parallel with the CPU world here and hyper threading, if 
not wider, where "What does 100% actually mean?" is also wishy-washy.


Also note that the reporting of actual time based values in fdinfo would 
not changing with this series.


Of if you can guide me towards how to distinguish real vs fake 
parallelism in HW IP blocks I could modify the series to only add 
capacity tags where there are truly independent blocks. That would be 
different from i915 though were I did not bother with that distinction. 
(For reasons that assignment of for instance EUs to compute "rings" 
(command streamers in i915) was supposed to be possible to re-configure 
on the fly. So it did not make sense to try and be super smart in fdinfo.)



As for the UAPI portion of this, we generally expose a limited number
of rings to user space and then we use the GPU scheduler to load
balance between all of the available rings of a type to try and
extract as much parallelism as we can.


The part I do not understand is the purpose of the ring argument in for 
instance drm_amdgpu_cs_chunk_ib. It appears userspace can create up to N 
scheduling entities using different ring id's, but internally they can 
map to 1:N same scheduler instances (depending on IP type, can be that 
each userspace ring maps to same N hw rings, or for rings with no drm 
sched load balancing userspace ring also does not appear to have a 
relation to the picked drm sched instance.).


So I neither understand how this ring is useful, or how it does not 
create a problem for IP types which use drm_sched_pick_best. It appears 
even if userspace created two scheduling entities with different ring 
ids they could randomly map to same drm sched aka same hw ring, no?


Regards,

Tvrtko


Alex




Tvrtko Ursulin (5):
   drm/amdgpu: Cache number of rings per hw ip type
   drm/amdgpu: Use cached number of rings from the AMDGPU_INFO_HW_IP_INFO
 ioctl
   drm/amdgpu: Skip not present rings in amdgpu_ctx_mgr_usage
   drm/amdgpu: Show engine capacity in fdinfo
   drm/amdgpu: Only show VRAM in fdinfo if it exists

  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c|  3 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 39 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 62 +++---
  5 files changed, 49 insertions(+), 70 deletions(-)

--
2.44.0


Re: [PATCH v3 1/4] drm/amdgpu: Fix two reset triggered in a row

2024-05-02 Thread Christian König

Am 30.04.24 um 21:05 schrieb Li, Yunxiang (Teddy):

[Public]

Hi Christian,

I got R-b from the SRIOV team for the rest of the patches; can you help review 
this last one? I think the concerns from the previous thread are all addressed: 
https://patchwork.freedesktop.org/patch/590678/?series=132727


I don't think I can help here since I'm not familiar with the RAS code 
either.


But I've seen that you already got an rb from Hawking's team, that 
should be sufficient I think.


Regards,
Christian.



Regards,
Teddy