Re: [PATCH 15/32] drm/amdkfd: prepare trap workaround for gfx11

2023-03-23 Thread Felix Kuehling

RE: [PATCH 15/32] drm/amdkfd: prepare trap workaround for gfx11

2023-03-23 Thread Kim, Jonathan

Re: [PATCH 15/32] drm/amdkfd: prepare trap workaround for gfx11

2023-03-20 Thread Felix Kuehling

[PATCH 15/32] drm/amdkfd: prepare trap workaround for gfx11

2023-01-25 Thread Jonathan Kim
Due to a HW bug, waves in only half the shader arrays can enter trap.

When starting a debug session, relocate all waves to the first shader
array of each shader engine and mask off the 2nd shader array as
unavailable.

When ending a debug session, re-enable the 2nd shader array per
shader engine.

User CU masking per queue cannot be guaranteed to remain functional
if requested during debugging (e.g. a user CU mask that requests only the
2nd shader array as an available resource leaves zero HW resources
available), nor can the runtime be alerted of any of these changes during
execution.

Make user CU masking and debugging mutually exclusive with respect to
availability.

If the debugger tries to attach to a process with a user CU masked
queue, return the runtime status as enabled but busy.

If the debugger tries to attach and fails to relocate queue waves to
the first shader array of each shader engine, return the runtime status
as enabled but with an error.

In addition, as on any other multi-process debug capable device,
disable per-process trap temporary setup to avoid the performance impact
of the setup overhead.
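
The two attach-time outcomes above reduce to a small state decision. A rough,
self-contained sketch of that decision follows; the enum and function names
are illustrative only and are not the uAPI names introduced elsewhere in this
series:

#include <stdbool.h>

/* Illustrative names only; the real runtime-enable status values come from
 * the KFD debugger uAPI added by this series and are not shown in this
 * excerpt. */
enum dbg_runtime_state_sketch {
        DBG_RUNTIME_ENABLED = 0,
        DBG_RUNTIME_ENABLED_BUSY,   /* a queue already carries a user CU mask */
        DBG_RUNTIME_ENABLED_ERROR,  /* relocating waves to the 1st shader array failed */
};

static enum dbg_runtime_state_sketch
attach_runtime_state(bool has_user_cu_masked_queue, bool wave_relocation_failed)
{
        if (has_user_cu_masked_queue)
                return DBG_RUNTIME_ENABLED_BUSY;
        if (wave_relocation_failed)
                return DBG_RUNTIME_ENABLED_ERROR;
        return DBG_RUNTIME_ENABLED;
}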

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h       |  2 +
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c        |  7 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |  2 -
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c        | 64 +++
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h        |  3 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  7 ++
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  |  3 +-
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  |  3 +-
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c  | 42
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   |  3 +-
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   |  3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  5 +-
 .../amd/amdkfd/kfd_process_queue_manager.c    |  9 ++-
 13 files changed, 124 insertions(+), 29 deletions(-)
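
The diffstat shows most of the workaround landing in the gfx11 MQD manager
(kfd_mqd_manager_v11.c), whose hunks are not included in this excerpt. The
core idea, masking off the 2nd shader array per shader engine, can be
sketched as below; the "low 16 bits = first shader array" packing of the
per-SE CU mask is an assumption made for illustration:

#include <stdint.h>

/* Sketch only: assume each shader engine's CU mask packs shader array 0
 * in the low 16 bits and shader array 1 in the high 16 bits. */
#define SE_MASK_FIRST_SA        0x0000ffffu
#define SE_MASK_ALL_SA          0xffffffffu

/* Per-SE CU mask while the gfx11 trap workaround is active: only the first
 * shader array stays available; when the debug session ends, the full mask
 * is restored. */
static uint32_t gfx11_debug_wa_se_mask(int debug_session_active)
{
        return debug_session_active ? SE_MASK_FIRST_SA : SE_MASK_ALL_SA;
}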

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index d20df0cf0d88..b5f5eed2b5ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -219,6 +219,8 @@ struct mes_add_queue_input {
        uint32_t        gws_size;
        uint64_t        tba_addr;
        uint64_t        tma_addr;
+       uint32_t        trap_en;
+       uint32_t        skip_process_ctx_clear;
        uint32_t        is_kfd_process;
        uint32_t        is_aql_queue;
        uint32_t        queue_size;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index fbacdc42efac..38c7a0cbf264 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -197,17 +197,14 @@ static int mes_v11_0_add_hw_queue(struct amdgpu_mes *mes,
        mes_add_queue_pkt.gws_size = input->gws_size;
        mes_add_queue_pkt.trap_handler_addr = input->tba_addr;
        mes_add_queue_pkt.tma_addr = input->tma_addr;
+       mes_add_queue_pkt.trap_en = input->trap_en;
+       mes_add_queue_pkt.skip_process_ctx_clear = input->skip_process_ctx_clear;
        mes_add_queue_pkt.is_kfd_process = input->is_kfd_process;

        /* For KFD, gds_size is re-used for queue size (needed in MES for AQL queues) */
        mes_add_queue_pkt.is_aql_queue = input->is_aql_queue;
        mes_add_queue_pkt.gds_size = input->queue_size;

-       if (!(((adev->mes.sched_version & AMDGPU_MES_VERSION_MASK) >= 4) &&
-             (adev->ip_versions[GC_HWIP][0] >= IP_VERSION(11, 0, 0)) &&
-             (adev->ip_versions[GC_HWIP][0] <= IP_VERSION(11, 0, 3))))
-               mes_add_queue_pkt.trap_en = 1;
-
        /* For KFD, gds_size is re-used for queue size (needed in MES for AQL queues) */
        mes_add_queue_pkt.is_aql_queue = input->is_aql_queue;
        mes_add_queue_pkt.gds_size = input->queue_size;
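
With the hard-coded version check above removed, MES no longer decides
trap_en on its own; whatever the caller places in mes_add_queue_input is
passed straight through. A sketch of the policy the commit message implies
follows; all names below are assumed rather than taken from the KFD-side
hunk, which is not shown in this excerpt:

#include <stdbool.h>
#include <stdint.h>

/* Sketch: on the affected gfx11.0 - gfx11.3 parts, only enable the trap
 * handler for a queue once a debug session has applied the shader-array
 * workaround; on unaffected parts traps stay enabled as before. */
static uint32_t choose_trap_en(bool gfx11_0_to_11_3, bool debug_wa_applied)
{
        if (!gfx11_0_to_11_3)
                return 1;
        return debug_wa_applied ? 1 : 0;
}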
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index ee05c2e54ef6..f5f639de28f0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -530,8 +530,6 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, struct kfd_process *p,
                goto out;
        }

-       minfo.update_flag = UPDATE_FLAG_CU_MASK;
-
        mutex_lock(&p->mutex);

        retval = pqm_update_mqd(&p->pqm, args->queue_id, &minfo);
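
Dropping the unconditional UPDATE_FLAG_CU_MASK in the ioctl lets the
queue-manager path decide whether a user CU mask may be applied at all,
which is where the "mutually exclusive" rule from the commit message can be
enforced. A minimal sketch of that policy check, using assumed field names
(the actual pqm_update_mqd and MQD-manager changes appear in the diffstat
but are not shown here):

#include <errno.h>
#include <stdbool.h>

struct queue_state_sketch {
        bool is_dbg_wa;  /* gfx11 debug shader-array workaround active on this queue */
};

/* Reject a user CU-mask update while the debug workaround owns the CU mask;
 * in the other direction, a debugger attaching to a process that already has
 * a user CU-masked queue is reported as enabled but busy. */
static int user_cu_mask_allowed(const struct queue_state_sketch *q)
{
        return q->is_dbg_wa ? -EBUSY : 0;
}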
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index f6ea6db266b4..6e99a0160275 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -37,6 +37,70 @@ void debug_event_write_work_handler(struct work_struct *work)
        kernel_write(process->dbg_ev_file, &write_data, 1, &pos);
 }

+static int kfd_dbg_set_queue_workaround(struct queue *q, bool enable)
+{
+       struct mqd_update_info minfo = {0};
+       int err;
+
+       if (!q || (!q->properties.is_dbg_wa && !enable))
+               return 0;
+
+
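
The kfd_debug.c hunk is cut off at this point in the archive copy. Purely as
an illustration of where such a helper usually goes next, and not the patch's
actual body, a per-queue workaround toggle of this kind would record the new
state on the queue and push an MQD update through the device queue manager so
the shader-array mask change takes effect. Every identifier below that is not
visible in the excerpt (the UPDATE_FLAG_DBG_WA_* flags, dqm->ops.update_queue)
is an assumption:

/* Sketch, not the patch body: toggle the gfx11 debug workaround on one queue. */
static int dbg_set_queue_workaround_sketch(struct queue *q, bool enable)
{
        struct mqd_update_info minfo = {0};
        int err;

        /* Tell the MQD manager which way to flip the per-SE CU mask. */
        minfo.update_flag = enable ? UPDATE_FLAG_DBG_WA_ENABLE :
                                     UPDATE_FLAG_DBG_WA_DISABLE;
        q->properties.is_dbg_wa = enable;

        err = q->device->dqm->ops.update_queue(q->device->dqm, q, &minfo);
        if (err)
                q->properties.is_dbg_wa = false;  /* roll back on failure */

        return err;
}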