Re: [PATCH 15/32] drm/amdkfd: prepare trap workaround for gfx11
On 2023-03-23 at 09:50, Kim, Jonathan wrote:
> [Public]
>
>> -----Original Message-----
>> From: Kuehling, Felix
>> Sent: Monday, March 20, 2023 5:50 PM
>> To: Kim, Jonathan ; amd-g...@lists.freedesktop.org; dri-de...@lists.freedesktop.org
>> Subject: Re: [PATCH 15/32] drm/amdkfd: prepare trap workaround for gfx11
>>
>> On 2023-01-25 14:53, Jonathan Kim wrote:
>>> Due to a HW bug, waves in only half the shader arrays can enter trap.
>>>
>>> When starting a debug session, relocate all waves to the first shader
>>> array of each shader engine and mask off the 2nd shader array as
>>> unavailable.
>>>
>>> When ending a debug session, re-enable the 2nd shader array per
>>> shader engine.
>>>
>>> User CU masking per queue cannot be guaranteed to remain functional
>>> if requested during debugging (e.g. a user CU mask that requests only
>>> the 2nd shader array as an available resource leaves zero HW resources
>>> available), nor can the runtime be alerted of any of these changes
>>> during execution.
>>>
>>> Make user CU masking and debugging mutually exclusive with respect to
>>> availability.
>>>
>>> If the debugger tries to attach to a process with a user CU masked
>>> queue, return the runtime status as enabled but busy.
>>>
>>> If the debugger tries to attach and fails to reallocate queue waves to
>>> the first shader array of each shader engine, return the runtime
>>> status as enabled but with an error.
>>>
>>> In addition, as on other multi-process debug capable devices, disable
>>> per-process trap temporary setup to avoid the performance impact of
>>> setup overhead.
RE: [PATCH 15/32] drm/amdkfd: prepare trap workaround for gfx11
[Public]

> -----Original Message-----
> From: Kuehling, Felix
> Sent: Monday, March 20, 2023 5:50 PM
> To: Kim, Jonathan ; amd-g...@lists.freedesktop.org; dri-de...@lists.freedesktop.org
> Subject: Re: [PATCH 15/32] drm/amdkfd: prepare trap workaround for gfx11
>
>
> On 2023-01-25 14:53, Jonathan Kim wrote:
> > Due to a HW bug, waves in only half the shader arrays can enter trap.
> >
> > When starting a debug session, relocate all waves to the first shader
> > array of each shader engine and mask off the 2nd shader array as
> > unavailable.
> >
> > When ending a debug session, re-enable the 2nd shader array per
> > shader engine.
> >
> > User CU masking per queue cannot be guaranteed to remain functional
> > if requested during debugging (e.g. a user CU mask that requests only
> > the 2nd shader array as an available resource leaves zero HW resources
> > available), nor can the runtime be alerted of any of these changes
> > during execution.
> >
> > Make user CU masking and debugging mutually exclusive with respect to
> > availability.
> >
> > If the debugger tries to attach to a process with a user CU masked
> > queue, return the runtime status as enabled but busy.
> >
> > If the debugger tries to attach and fails to reallocate queue waves to
> > the first shader array of each shader engine, return the runtime
> > status as enabled but with an error.
> >
> > In addition, as on other multi-process debug capable devices, disable
> > per-process trap temporary setup to avoid the performance impact of
> > setup overhead.
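[Editor's note: the workaround in the quoted commit message keeps only the first shader array (SA0) of each shader engine available while a debug session is active. A minimal sketch of that masking idea follows; the function name and the flat one-bit-per-SA layout are assumptions for illustration, not the hardware register format.]

```c
#include <stdint.h>

/* Build an "enabled shader arrays" bitmap in which only SA0 of each
 * shader engine stays available, as the debug workaround requires.
 * Bit (se * num_sa_per_se + sa) represents SA 'sa' of engine 'se'. */
static uint32_t debug_enabled_sa_mask(unsigned num_se, unsigned num_sa_per_se)
{
	uint32_t mask = 0;

	for (unsigned se = 0; se < num_se; se++)
		mask |= 1u << (se * num_sa_per_se); /* keep only SA0 per SE */
	return mask;
}
```

For a hypothetical part with 2 shader engines of 2 shader arrays each, this yields bits 0 and 2 set, i.e. half the arrays masked off, matching the "only half the shader arrays can enter trap" constraint.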
> >
> > Signed-off-by: Jonathan Kim
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h       |  2 +
> >  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c        |  7 +-
> >  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |  2 -
> >  drivers/gpu/drm/amd/amdkfd/kfd_debug.c        | 64 +++
> >  drivers/gpu/drm/amd/amdkfd/kfd_debug.h        |  3 +-
> >  .../drm/amd/amdkfd/kfd_device_queue_manager.c |  7 ++
> >  .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  |  3 +-
> >  .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  |  3 +-
> >  .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c  | 42
> >  .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   |  3 +-
> >  .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   |  3 +-
> >  drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  5 +-
> >  .../amd/amdkfd/kfd_process_queue_manager.c    |  9 ++-
> >  13 files changed, 124 insertions(+), 29 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> > index d20df0cf0d88..b5f5eed2b5ef 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> > @@ -219,6 +219,8 @@ struct mes_add_queue_input {
> >  	uint32_t	gws_size;
> >  	uint64_t	tba_addr;
> >  	uint64_t	tma_addr;
> > +	uint32_t	trap_en;
> > +	uint32_t	skip_process_ctx_clear;
> >  	uint32_t	is_kfd_process;
> >  	uint32_t	is_aql_queue;
> >  	uint32_t	queue_size;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> > index fbacdc42efac..38c7a0cbf264 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> > @@ -197,17 +197,14 @@ static int mes_v11_0_add_hw_queue(struct amdgpu_mes *mes,
> >  	mes_add_queue_pkt.gws_size = input->gws_size;
> >  	mes_add_queue_pkt.trap_handler_addr = input->tba_addr;
> >  	mes_add_queue_pkt.tma_addr = input->tma_addr;
> > +	mes_add_queue_pkt.trap_en = input->trap_en;
> > +	mes_add_queue_pkt.skip_process_ctx_clear = input->skip_process_ctx_clear;
> >  	mes_add_queue_pkt.is_kfd_process = input->is_kfd_process;
> >
> >  	/* For KFD, gds_size is re-used for queue size (needed in MES for AQL queues) */
> >  	mes_add_queue_pkt.is_aql_queue = input->is_aql_queue;
> >  	mes_add_queue_pkt.gds_size = input->queue_size;
> >
> > -	if (!(((adev->mes.sched_version & AMDGPU_MES_VERSION_MASK) >= 4) &&
> > -	      (adev->ip_versions[GC_HWIP][0] >= IP_VERSION(11, 0, 0)) &&
> > -	      (adev->ip_versions[GC_HWIP][0] <= IP_VERSION(11, 0, 3))))
> > -		mes_add_queue_pkt.trap_en = 1;
> > -
> >  	/* For KFD, gds_size is re-used for queue size (needed in MES for AQL queues) */
> >  	mes_add_queue_pkt.is_aql_queue = input->is_aql_queue;
> >  	mes_add_queue_pkt.gds_size = input->queue_size;
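[Editor's note: the quoted mes_v11_0.c hunk removes a hard-coded check that forced trap_en on for older firmware, moving the decision to the caller via the new input field. A standalone sketch of the removed predicate follows; IP_VERSION mirrors the kernel macro, while the AMDGPU_MES_VERSION_MASK value is an assumption for this sketch.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Kernel-style IP version encoding: major/minor/revision packed into
 * one integer so version ranges compare numerically. */
#define IP_VERSION(maj, min, rev) (((maj) << 16) | ((min) << 8) | (rev))
#define AMDGPU_MES_VERSION_MASK 0x00000fff /* assumed value for illustration */

/* Before this patch, trap_en was forced to 1 unless the MES scheduler
 * firmware (version >= 4) on a GC 11.0.0..11.0.3 part could honor a
 * per-queue trap_en setting. */
static bool mes_honors_per_queue_trap_en(uint32_t sched_version, uint32_t gc_ip)
{
	return ((sched_version & AMDGPU_MES_VERSION_MASK) >= 4) &&
	       gc_ip >= IP_VERSION(11, 0, 0) &&
	       gc_ip <= IP_VERSION(11, 0, 3);
}
```

With the hunk applied, this gating no longer lives in mes_v11_0.c; the KFD side fills in input->trap_en directly.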
Re: [PATCH 15/32] drm/amdkfd: prepare trap workaround for gfx11
On 2023-01-25 14:53, Jonathan Kim wrote:
> Due to a HW bug, waves in only half the shader arrays can enter trap.
>
> When starting a debug session, relocate all waves to the first shader
> array of each shader engine and mask off the 2nd shader array as
> unavailable.
>
> When ending a debug session, re-enable the 2nd shader array per
> shader engine.
>
> User CU masking per queue cannot be guaranteed to remain functional
> if requested during debugging (e.g. a user CU mask that requests only
> the 2nd shader array as an available resource leaves zero HW resources
> available), nor can the runtime be alerted of any of these changes
> during execution.
>
> Make user CU masking and debugging mutually exclusive with respect to
> availability.
>
> If the debugger tries to attach to a process with a user CU masked
> queue, return the runtime status as enabled but busy.
>
> If the debugger tries to attach and fails to reallocate queue waves to
> the first shader array of each shader engine, return the runtime
> status as enabled but with an error.
>
> In addition, as on other multi-process debug capable devices, disable
> per-process trap temporary setup to avoid the performance impact of
> setup overhead.
> Signed-off-by: Jonathan Kim
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h       |  2 +
>  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c        |  7 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |  2 -
>  drivers/gpu/drm/amd/amdkfd/kfd_debug.c        | 64 +++
>  drivers/gpu/drm/amd/amdkfd/kfd_debug.h        |  3 +-
>  .../drm/amd/amdkfd/kfd_device_queue_manager.c |  7 ++
>  .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  |  3 +-
>  .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  |  3 +-
>  .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c  | 42
>  .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   |  3 +-
>  .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   |  3 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  5 +-
>  .../amd/amdkfd/kfd_process_queue_manager.c    |  9 ++-
>  13 files changed, 124 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> index d20df0cf0d88..b5f5eed2b5ef 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> @@ -219,6 +219,8 @@ struct mes_add_queue_input {
>  	uint32_t	gws_size;
>  	uint64_t	tba_addr;
>  	uint64_t	tma_addr;
> +	uint32_t	trap_en;
> +	uint32_t	skip_process_ctx_clear;
>  	uint32_t	is_kfd_process;
>  	uint32_t	is_aql_queue;
>  	uint32_t	queue_size;
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> index fbacdc42efac..38c7a0cbf264 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> @@ -197,17 +197,14 @@ static int mes_v11_0_add_hw_queue(struct amdgpu_mes *mes,
>  	mes_add_queue_pkt.gws_size = input->gws_size;
>  	mes_add_queue_pkt.trap_handler_addr = input->tba_addr;
>  	mes_add_queue_pkt.tma_addr = input->tma_addr;
> +	mes_add_queue_pkt.trap_en = input->trap_en;
> +	mes_add_queue_pkt.skip_process_ctx_clear = input->skip_process_ctx_clear;
>  	mes_add_queue_pkt.is_kfd_process = input->is_kfd_process;
>
>  	/* For KFD, gds_size is re-used for queue size (needed in MES for AQL queues) */
>  	mes_add_queue_pkt.is_aql_queue = input->is_aql_queue;
>  	mes_add_queue_pkt.gds_size = input->queue_size;
>
> -	if (!(((adev->mes.sched_version & AMDGPU_MES_VERSION_MASK) >= 4) &&
> -	      (adev->ip_versions[GC_HWIP][0] >= IP_VERSION(11, 0, 0)) &&
> -	      (adev->ip_versions[GC_HWIP][0] <= IP_VERSION(11, 0, 3))))
> -		mes_add_queue_pkt.trap_en = 1;
> -
>  	/* For KFD, gds_size is re-used for queue size (needed in MES for AQL queues) */
>  	mes_add_queue_pkt.is_aql_queue = input->is_aql_queue;
>  	mes_add_queue_pkt.gds_size = input->queue_size;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index ee05c2e54ef6..f5f639de28f0 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -530,8 +530,6 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, struct kfd_process *p,
>  		goto out;
>  	}
>
> -	minfo.update_flag = UPDATE_FLAG_CU_MASK;
> -
>  	mutex_lock(&p->mutex);
>
>  	retval = pqm_update_mqd(&p->pqm, args->queue_id, &minfo);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
> index f6ea6db266b4..6e99a0160275 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
> @@ -37,6 +37,70 @@ void debug_event_write_work_handler(struct work_struct *work)
>  	kernel_write(process->dbg_ev_file, &write_data, 1, &pos);
>  }
>
> +static int kfd_dbg_set_queue_workaround(struct queue *q, bool enable)
> +{
> +	struct mqd_update_info minfo = {0};
> +	int err;
> +
> +	if (!q || (!q->properties.is_dbg_wa &&
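[Editor's note: the commit message makes user CU masking and debugging mutually exclusive, and the kfd_chardev.c hunk drops the unconditional UPDATE_FLAG_CU_MASK in line with that. A simple sketch of such a guard follows; all names here are illustrative, not the kernel's actual API.]

```c
#include <stdbool.h>
#include <errno.h>

/* Illustrative per-queue state relevant to the mutual-exclusion rule. */
struct queue_props {
	bool is_dbg_wa;         /* debug trap workaround active on this queue */
	bool has_user_cu_mask;  /* a user CU mask has been applied */
};

/* Reject a user CU mask update while the debug workaround owns CU
 * availability; otherwise record the mask. Mirrors the "enabled but
 * busy" idea from the commit message. */
static int try_set_cu_mask(struct queue_props *q)
{
	if (q->is_dbg_wa)
		return -EBUSY;
	q->has_user_cu_mask = true;
	return 0;
}
```

A debugger attach would apply the converse check: if has_user_cu_mask is already set, report the runtime as enabled but busy instead of attaching.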