[AMD Official Use Only - General]

This is a strange solution because, by design, the MEC should automatically
set the watch controls to non-valid on queue preemption, which avoids this
kind of issue in the first place.  MAP_PROCESS on resume will then take
whatever the driver requests.
GFX11 has no issue with letting the HWS do this.
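
To make the failure mode in the commit message concrete, here is a toy model
of the two clearing strategies. It is only a sketch, not the driver code: the
register array, the toy_* names and NUM_WATCH_POINTS are all hypothetical, and
it deliberately does not model the MEC invalidating watch controls on
preemption.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_WATCH_POINTS 4 /* hypothetical count for this model */

/* Toy stand-in for the per-watch-point TCP watch control registers. */
struct toy_gpu {
    uint32_t tcp_watch_cntl[NUM_WATCH_POINTS];
};

/*
 * Firmware-reliant clear: touches no register and only returns the
 * value (0) that the driver would place in the next MAP_PROCESS packet.
 */
static uint32_t toy_clear_via_firmware(struct toy_gpu *gpu, uint32_t watch_id)
{
    (void)gpu;
    (void)watch_id;
    return 0;
}

/* Driver-side clear: zero the register directly, then return 0. */
static uint32_t toy_clear_in_driver(struct toy_gpu *gpu, uint32_t watch_id)
{
    if (watch_id < NUM_WATCH_POINTS)
        gpu->tcp_watch_cntl[watch_id] = 0;
    return 0;
}

/*
 * Model of MAP_PROCESS: the requested value reaches the register only
 * if the packet is actually sent, i.e. the queue is not suspended.
 */
static void toy_map_process(struct toy_gpu *gpu, uint32_t watch_id,
                            uint32_t cntl, bool queue_suspended)
{
    if (queue_suspended || watch_id >= NUM_WATCH_POINTS)
        return; /* packet never sent, register keeps its old value */
    gpu->tcp_watch_cntl[watch_id] = cntl;
}
```

In this model, calling toy_clear_via_firmware() and then toy_map_process()
with queue_suspended set leaves the old value in place, while
toy_clear_in_driver() zeroes it regardless of queue state.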

Are we sure we're not working around some HWS bug?

Thanks,

Jon

> -----Original Message-----
> From: Kuehling, Felix <felix.kuehl...@amd.com>
> Sent: Thursday, August 10, 2023 5:03 PM
> To: Huang, JinHuiEric <jinhuieric.hu...@amd.com>; amd-
> g...@lists.freedesktop.org
> Cc: Kim, Jonathan <jonathan....@amd.com>
> Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2
>
> I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix, though it may need
> to be a bit different because it has to support multiple XCCs.
>
> That said, this patch is
>
> Reviewed-by: Felix Kuehling <felix.kuehl...@amd.com>
>
>
> On 2023-08-10 16:47, Eric Huang wrote:
> > KFD currently relies on MEC FW to clear the TCP watch control
> > register by sending a MAP_PROCESS packet with the tcp_watch_cntl
> > field set to 0 to HWS. But if the queue is suspended, the packet
> > is not sent and the previous value is left in the register, which
> > affects subsequent apps. So the solution is to clear the register
> > in KFD, as gfx v9 does.
> >
> > Signed-off-by: Eric Huang <jinhuieric.hu...@amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +-------
> >   1 file changed, 1 insertion(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
> > index e2fed6edbdd0..aff08321e976 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
> > @@ -163,12 +163,6 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
> >     return watch_address_cntl;
> >   }
> >
> > -static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev,
> > -                                                 uint32_t watch_id)
> > -{
> > -   return 0;
> > -}
> > -
> >   const struct kfd2kgd_calls aldebaran_kfd2kgd = {
> >     .program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
> >     .set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
> > @@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
> >     .set_wave_launch_trap_override = kgd_aldebaran_set_wave_launch_trap_override,
> >     .set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
> >     .set_address_watch = kgd_gfx_aldebaran_set_address_watch,
> > -   .clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
> > +   .clear_address_watch = kgd_gfx_v9_clear_address_watch,
> >     .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
> >     .build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info,
> >     .program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,
