Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-11 Thread Eric Huang

On 2023-08-11 09:26, Felix Kuehling wrote:

Am 2023-08-10 um 18:27 schrieb Eric Huang:
There is not UNMAP_QUEUES command sending for queue preemption 
because the queue is suspended and test is closed to the end. 
Function unmap_queue_cpsch will do nothing after that.


How do you suspend queues without sending an UNMAP_QUEUES command?
Now I understand what you mean, I was only thinking of UNMAP_QUEUES 
sending after clearing call. So MEC FW should clear the control register 
unconditionally on every UNMAP_QUEUES command. We can request it for gfx 
v9.4.3 to avoid the awkward workaround in KFD.


Thanks,
Eric


Regards,
  Felix




The workaround is new and only for gfx v9.4.2, because debugger tests 
has changed to check if all address watch points are correctly set, 
i.e. test A sets more than one watchpoint and leave, the following 
test B only sets one watchpoint, and test A's setting will cause more 
than one watchpoint event, so test B check out and report error on 
second or third watchpoint not set by itself.


Regards,
Eric

On 2023-08-10 17:56, Felix Kuehling wrote:
I think Jon is suggesting that the UNMAP_QUEUES command should clear 
the address watch registers. Requesting such a change from the the 
HWS team may take a long time.


That said, when was this workaround implemented and reviewed? Did I 
review it as part of Jon's debugger upstreaming patch series? Or did 
this come later? This patch only enables the workaround for v9.4.2.


Regards,
  Felix


On 2023-08-10 17:52, Eric Huang wrote:
The problem is the queue is suspended before clearing address watch 
call in KFD, there is not queue preemption and queue resume after 
clearing call, and the test ends. So there is not chance to send 
MAP_PROCESS to HWS. At this point FW has nothing to do. We have 
several test FWs from Tej, none of them works, so I recalled the 
kernel debug log and found out the problem.


GFX11 has different scheduler, when calling clear address watch, 
KFD directly sends the MES_MISC_OP_SET_SHADER_DEBUGGER to MES, it 
doesn't consider if the queue is suspended. So GFX11 doesn't have 
this issue.


Regards,
Eric

On 2023-08-10 17:27, Kim, Jonathan wrote:

[AMD Official Use Only - General]

This is a strange solution because the MEC should set watch 
controls as non-valid automatically on queue preemption to avoid 
this kind of issue in the first place by design. MAP_PROCESS on 
resume will take whatever the driver requests.

GFX11 has no issue with letting the HWS do this.

Are we sure we're not working around some HWS bug?

Thanks,

Jon


-Original Message-
From: Kuehling, Felix 
Sent: Thursday, August 10, 2023 5:03 PM
To: Huang, JinHuiEric ; amd-
g...@lists.freedesktop.org
Cc: Kim, Jonathan 
Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug 
for gfx v9.4.2


I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a 
bit

different because it needs to support multiple XCCs.

That said, this patch is

Reviewed-by: Felix Kuehling 


On 2023-08-10 16:47, Eric Huang wrote:

KFD currently relies on MEC FW to clear tcp watch control
register by sending MAP_PROCESS packet with 0 of field
tcp_watch_cntl to HWS, but if the queue is suspended, the
packet will not be sent and the previous value will be
left on the register, that will affect the following apps.
So the solution is to clear the register as gfx v9 in KFD.

Signed-off-by: Eric Huang 
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +---
   1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c

b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c

index e2fed6edbdd0..aff08321e976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,12 +163,6 @@ static uint32_t

kgd_gfx_aldebaran_set_address_watch(

 return watch_address_cntl;
   }

-static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct

amdgpu_device *adev,

- uint32_t watch_id)
-{
-   return 0;
-}
-
   const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 .program_sh_mem_settings =

kgd_gfx_v9_program_sh_mem_settings,

 .set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd 
= {

 .set_wave_launch_trap_override =

kgd_aldebaran_set_wave_launch_trap_override,

 .set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
 .set_address_watch = kgd_gfx_aldebaran_set_address_watch,
-   .clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
+   .clear_address_watch = kgd_gfx_v9_clear_address_watch,
 .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
 .build_grace_period_packet_info =

kgd_gfx_v9_build_grace_period_packet_info,

.program_trap_handler_settings =

kgd_gfx_v9_program_trap_handler_settings,








Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-11 Thread Felix Kuehling

Am 2023-08-10 um 18:27 schrieb Eric Huang:
There is not UNMAP_QUEUES command sending for queue preemption because 
the queue is suspended and test is closed to the end. Function 
unmap_queue_cpsch will do nothing after that.


How do you suspend queues without sending an UNMAP_QUEUES command?

Regards,
  Felix




The workaround is new and only for gfx v9.4.2, because debugger tests 
has changed to check if all address watch points are correctly set, 
i.e. test A sets more than one watchpoint and leave, the following 
test B only sets one watchpoint, and test A's setting will cause more 
than one watchpoint event, so test B check out and report error on 
second or third watchpoint not set by itself.


Regards,
Eric

On 2023-08-10 17:56, Felix Kuehling wrote:
I think Jon is suggesting that the UNMAP_QUEUES command should clear 
the address watch registers. Requesting such a change from the the 
HWS team may take a long time.


That said, when was this workaround implemented and reviewed? Did I 
review it as part of Jon's debugger upstreaming patch series? Or did 
this come later? This patch only enables the workaround for v9.4.2.


Regards,
  Felix


On 2023-08-10 17:52, Eric Huang wrote:
The problem is the queue is suspended before clearing address watch 
call in KFD, there is not queue preemption and queue resume after 
clearing call, and the test ends. So there is not chance to send 
MAP_PROCESS to HWS. At this point FW has nothing to do. We have 
several test FWs from Tej, none of them works, so I recalled the 
kernel debug log and found out the problem.


GFX11 has different scheduler, when calling clear address watch, KFD 
directly sends the MES_MISC_OP_SET_SHADER_DEBUGGER to MES, it 
doesn't consider if the queue is suspended. So GFX11 doesn't have 
this issue.


Regards,
Eric

On 2023-08-10 17:27, Kim, Jonathan wrote:

[AMD Official Use Only - General]

This is a strange solution because the MEC should set watch 
controls as non-valid automatically on queue preemption to avoid 
this kind of issue in the first place by design. MAP_PROCESS on 
resume will take whatever the driver requests.

GFX11 has no issue with letting the HWS do this.

Are we sure we're not working around some HWS bug?

Thanks,

Jon


-Original Message-
From: Kuehling, Felix 
Sent: Thursday, August 10, 2023 5:03 PM
To: Huang, JinHuiEric ; amd-
g...@lists.freedesktop.org
Cc: Kim, Jonathan 
Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug 
for gfx v9.4.2


I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit
different because it needs to support multiple XCCs.

That said, this patch is

Reviewed-by: Felix Kuehling 


On 2023-08-10 16:47, Eric Huang wrote:

KFD currently relies on MEC FW to clear tcp watch control
register by sending MAP_PROCESS packet with 0 of field
tcp_watch_cntl to HWS, but if the queue is suspended, the
packet will not be sent and the previous value will be
left on the register, that will affect the following apps.
So the solution is to clear the register as gfx v9 in KFD.

Signed-off-by: Eric Huang 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +---
   1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c

b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c

index e2fed6edbdd0..aff08321e976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,12 +163,6 @@ static uint32_t

kgd_gfx_aldebaran_set_address_watch(

 return watch_address_cntl;
   }

-static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct

amdgpu_device *adev,

- uint32_t watch_id)
-{
-   return 0;
-}
-
   const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 .program_sh_mem_settings =

kgd_gfx_v9_program_sh_mem_settings,

 .set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 .set_wave_launch_trap_override =

kgd_aldebaran_set_wave_launch_trap_override,

 .set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
 .set_address_watch = kgd_gfx_aldebaran_set_address_watch,
-   .clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
+   .clear_address_watch = kgd_gfx_v9_clear_address_watch,
 .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
 .build_grace_period_packet_info =

kgd_gfx_v9_build_grace_period_packet_info,

.program_trap_handler_settings =

kgd_gfx_v9_program_trap_handler_settings,






RE: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-10 Thread Kim, Jonathan
[Public]

Sounds good.

I'd also change:

>>>>>> KFD currently relies on MEC FW to clear tcp watch control
>>>>>> register by sending MAP_PROCESS packet with 0 of field
>>>>>> tcp_watch_cntl to HWS, but if the queue is suspended, the
>>>>>> packet will not be sent and the previous value will be
>>>>>> left on the register, that will affect the following apps.
>>>>>> So the solution is to clear the register as gfx v9 in KFD.

To something like:

KFD currently relies on MEC FW to clear tcp watch control
register on UNMAP_QUEUES.  Due to a FW bug, MEC does not
do this.
So the solution is to clear the register as gfx v9 in KFD.

With those fixed, this patch is Reviewed-by: Jonathan Kim 

Hopefully we can get away with this since every watch instance register is 
supposed to be 1-1 to a process ...
And that there's no race scenarios with trailing exceptions on dynamic watch 
point address changes ...

Thanks,

Jon

> -Original Message-
> From: Huang, JinHuiEric 
> Sent: Thursday, August 10, 2023 6:31 PM
> To: Kim, Jonathan ; Kuehling, Felix
> ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2
>
> I will change title to "drm/amdkfd: workaround address watch clearing
> bug for gfx v9.4.2". is it OK?
>
> Regards,
> Eric
>
> On 2023-08-10 18:25, Kim, Jonathan wrote:
> > [Public]
> >
> > Yeah this is a recent bug so this workaround is new.  More rigorous tests
> revealed this is probably a miss on the FW side.  We explicitly requested
> UNMAP_QUEUES unconditionally invalidate watch controls during the
> beginning of design to prevent any watch point racing.
> >
> > Note GFX11 MES calls are different on the surface but under the hood it's
> the same (registers get invalidated on unmap then get updated on map.
> Only difference it's at the queue level).
> >
> > I'm fine with this solution but I think it'd be good to describe this as a
> workaround somewhere (as opposed to a driver issue) so that folks aren't
> scratching their heads later on looking at code for GFX11 and up and
> wondering why we don't nuke the control setting with the KFD for those
> devices.
> >
> > Thanks,
> >
> > Jon
> >
> >> -Original Message-
> >> From: Kuehling, Felix 
> >> Sent: Thursday, August 10, 2023 5:56 PM
> >> To: Huang, JinHuiEric ; Kim, Jonathan
> >> ; amd-gfx@lists.freedesktop.org
> >> Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx
> v9.4.2
> >>
> >> I think Jon is suggesting that the UNMAP_QUEUES command should clear
> the
> >> address watch registers. Requesting such a change from the the HWS team
> >> may take a long time.
> >>
> >> That said, when was this workaround implemented and reviewed? Did I
> >> review it as part of Jon's debugger upstreaming patch series? Or did
> >> this come later? This patch only enables the workaround for v9.4.2.
> >>
> >> Regards,
> >> Felix
> >>
> >>
> >> On 2023-08-10 17:52, Eric Huang wrote:
> >>> The problem is the queue is suspended before clearing address watch
> >>> call in KFD, there is not queue preemption and queue resume after
> >>> clearing call, and the test ends. So there is not chance to send
> >>> MAP_PROCESS to HWS. At this point FW has nothing to do. We have
> >>> several test FWs from Tej, none of them works, so I recalled the
> >>> kernel debug log and found out the problem.
> >>>
> >>> GFX11 has different scheduler, when calling clear address watch, KFD
> >>> directly sends the MES_MISC_OP_SET_SHADER_DEBUGGER to MES, it
> >> doesn't
> >>> consider if the queue is suspended. So GFX11 doesn't have this issue.
> >>>
> >>> Regards,
> >>> Eric
> >>>
> >>> On 2023-08-10 17:27, Kim, Jonathan wrote:
> >>>> [AMD Official Use Only - General]
> >>>>
> >>>> This is a strange solution because the MEC should set watch controls
> >>>> as non-valid automatically on queue preemption to avoid this kind of
> >>>> issue in the first place by design.  MAP_PROCESS on resume will take
> >>>> whatever the driver requests.
> >>>> GFX11 has no issue with letting the HWS do this.
> >>>>
> >>>> Are we sure we're not working around some HWS bug?
> >>>>
> >>>>

Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-10 Thread Eric Huang
I will change title to "drm/amdkfd: workaround address watch clearing 
bug for gfx v9.4.2". is it OK?


Regards,
Eric

On 2023-08-10 18:25, Kim, Jonathan wrote:

[Public]

Yeah this is a recent bug so this workaround is new.  More rigorous tests 
revealed this is probably a miss on the FW side.  We explicitly requested 
UNMAP_QUEUES unconditionally invalidate watch controls during the beginning of 
design to prevent any watch point racing.

Note GFX11 MES calls are different on the surface but under the hood it's the 
same (registers get invalidated on unmap then get updated on map.  Only 
difference it's at the queue level).

I'm fine with this solution but I think it'd be good to describe this as a 
workaround somewhere (as opposed to a driver issue) so that folks aren't 
scratching their heads later on looking at code for GFX11 and up and wondering 
why we don't nuke the control setting with the KFD for those devices.

Thanks,

Jon


-Original Message-
From: Kuehling, Felix 
Sent: Thursday, August 10, 2023 5:56 PM
To: Huang, JinHuiEric ; Kim, Jonathan
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

I think Jon is suggesting that the UNMAP_QUEUES command should clear the
address watch registers. Requesting such a change from the the HWS team
may take a long time.

That said, when was this workaround implemented and reviewed? Did I
review it as part of Jon's debugger upstreaming patch series? Or did
this come later? This patch only enables the workaround for v9.4.2.

Regards,
Felix


On 2023-08-10 17:52, Eric Huang wrote:

The problem is the queue is suspended before clearing address watch
call in KFD, there is not queue preemption and queue resume after
clearing call, and the test ends. So there is not chance to send
MAP_PROCESS to HWS. At this point FW has nothing to do. We have
several test FWs from Tej, none of them works, so I recalled the
kernel debug log and found out the problem.

GFX11 has different scheduler, when calling clear address watch, KFD
directly sends the MES_MISC_OP_SET_SHADER_DEBUGGER to MES, it

doesn't

consider if the queue is suspended. So GFX11 doesn't have this issue.

Regards,
Eric

On 2023-08-10 17:27, Kim, Jonathan wrote:

[AMD Official Use Only - General]

This is a strange solution because the MEC should set watch controls
as non-valid automatically on queue preemption to avoid this kind of
issue in the first place by design.  MAP_PROCESS on resume will take
whatever the driver requests.
GFX11 has no issue with letting the HWS do this.

Are we sure we're not working around some HWS bug?

Thanks,

Jon


-Original Message-
From: Kuehling, Felix 
Sent: Thursday, August 10, 2023 5:03 PM
To: Huang, JinHuiEric ; amd-
g...@lists.freedesktop.org
Cc: Kim, Jonathan 
Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for
gfx v9.4.2

I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit
different because it needs to support multiple XCCs.

That said, this patch is

Reviewed-by: Felix Kuehling 


On 2023-08-10 16:47, Eric Huang wrote:

KFD currently relies on MEC FW to clear tcp watch control
register by sending MAP_PROCESS packet with 0 of field
tcp_watch_cntl to HWS, but if the queue is suspended, the
packet will not be sent and the previous value will be
left on the register, that will affect the following apps.
So the solution is to clear the register as gfx v9 in KFD.

Signed-off-by: Eric Huang 
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +-

--

1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c

b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c

index e2fed6edbdd0..aff08321e976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,12 +163,6 @@ static uint32_t

kgd_gfx_aldebaran_set_address_watch(

  return watch_address_cntl;
}

-static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct

amdgpu_device *adev,

- uint32_t watch_id)
-{
-   return 0;
-}
-
const struct kfd2kgd_calls aldebaran_kfd2kgd = {
  .program_sh_mem_settings =

kgd_gfx_v9_program_sh_mem_settings,

  .set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd =

{

  .set_wave_launch_trap_override =

kgd_aldebaran_set_wave_launch_trap_override,

  .set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
  .set_address_watch = kgd_gfx_aldebaran_set_address_watch,
-   .clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
+   .clear_address_watch = kgd_gfx_v9_clear_address_watch,
  .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
  .build_grace_period_packet_info =

kgd_gfx_v9_build_grace_period_packet_info,

  .program_trap_handler_settings =

kgd_gfx_v9_program_trap_handler_settings,




Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-10 Thread Eric Huang
There is not UNMAP_QUEUES command sending for queue preemption because 
the queue is suspended and test is closed to the end. Function 
unmap_queue_cpsch will do nothing after that.


The workaround is new and only for gfx v9.4.2, because debugger tests 
has changed to check if all address watch points are correctly set, i.e. 
test A sets more than one watchpoint and leave, the following test B 
only sets one watchpoint, and test A's setting will cause more than one 
watchpoint event, so test B check out and report error on second or 
third watchpoint not set by itself.


Regards,
Eric

On 2023-08-10 17:56, Felix Kuehling wrote:
I think Jon is suggesting that the UNMAP_QUEUES command should clear 
the address watch registers. Requesting such a change from the the HWS 
team may take a long time.


That said, when was this workaround implemented and reviewed? Did I 
review it as part of Jon's debugger upstreaming patch series? Or did 
this come later? This patch only enables the workaround for v9.4.2.


Regards,
  Felix


On 2023-08-10 17:52, Eric Huang wrote:
The problem is the queue is suspended before clearing address watch 
call in KFD, there is not queue preemption and queue resume after 
clearing call, and the test ends. So there is not chance to send 
MAP_PROCESS to HWS. At this point FW has nothing to do. We have 
several test FWs from Tej, none of them works, so I recalled the 
kernel debug log and found out the problem.


GFX11 has different scheduler, when calling clear address watch, KFD 
directly sends the MES_MISC_OP_SET_SHADER_DEBUGGER to MES, it doesn't 
consider if the queue is suspended. So GFX11 doesn't have this issue.


Regards,
Eric

On 2023-08-10 17:27, Kim, Jonathan wrote:

[AMD Official Use Only - General]

This is a strange solution because the MEC should set watch controls 
as non-valid automatically on queue preemption to avoid this kind of 
issue in the first place by design. MAP_PROCESS on resume will take 
whatever the driver requests.

GFX11 has no issue with letting the HWS do this.

Are we sure we're not working around some HWS bug?

Thanks,

Jon


-Original Message-
From: Kuehling, Felix 
Sent: Thursday, August 10, 2023 5:03 PM
To: Huang, JinHuiEric ; amd-
g...@lists.freedesktop.org
Cc: Kim, Jonathan 
Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for 
gfx v9.4.2


I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit
different because it needs to support multiple XCCs.

That said, this patch is

Reviewed-by: Felix Kuehling 


On 2023-08-10 16:47, Eric Huang wrote:

KFD currently relies on MEC FW to clear tcp watch control
register by sending MAP_PROCESS packet with 0 of field
tcp_watch_cntl to HWS, but if the queue is suspended, the
packet will not be sent and the previous value will be
left on the register, that will affect the following apps.
So the solution is to clear the register as gfx v9 in KFD.

Signed-off-by: Eric Huang 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +---
   1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c

b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c

index e2fed6edbdd0..aff08321e976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,12 +163,6 @@ static uint32_t

kgd_gfx_aldebaran_set_address_watch(

 return watch_address_cntl;
   }

-static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct

amdgpu_device *adev,

- uint32_t watch_id)
-{
-   return 0;
-}
-
   const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 .program_sh_mem_settings =

kgd_gfx_v9_program_sh_mem_settings,

 .set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 .set_wave_launch_trap_override =

kgd_aldebaran_set_wave_launch_trap_override,

 .set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
 .set_address_watch = kgd_gfx_aldebaran_set_address_watch,
-   .clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
+   .clear_address_watch = kgd_gfx_v9_clear_address_watch,
 .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
 .build_grace_period_packet_info =

kgd_gfx_v9_build_grace_period_packet_info,

 .program_trap_handler_settings =

kgd_gfx_v9_program_trap_handler_settings,






RE: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-10 Thread Kim, Jonathan
[Public]

Yeah this is a recent bug so this workaround is new.  More rigorous tests 
revealed this is probably a miss on the FW side.  We explicitly requested 
UNMAP_QUEUES unconditionally invalidate watch controls during the beginning of 
design to prevent any watch point racing.

Note GFX11 MES calls are different on the surface but under the hood it's the 
same (registers get invalidated on unmap then get updated on map.  Only 
difference it's at the queue level).

I'm fine with this solution but I think it'd be good to describe this as a 
workaround somewhere (as opposed to a driver issue) so that folks aren't 
scratching their heads later on looking at code for GFX11 and up and wondering 
why we don't nuke the control setting with the KFD for those devices.

Thanks,

Jon

> -Original Message-
> From: Kuehling, Felix 
> Sent: Thursday, August 10, 2023 5:56 PM
> To: Huang, JinHuiEric ; Kim, Jonathan
> ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2
>
> I think Jon is suggesting that the UNMAP_QUEUES command should clear the
> address watch registers. Requesting such a change from the the HWS team
> may take a long time.
>
> That said, when was this workaround implemented and reviewed? Did I
> review it as part of Jon's debugger upstreaming patch series? Or did
> this come later? This patch only enables the workaround for v9.4.2.
>
> Regards,
>Felix
>
>
> On 2023-08-10 17:52, Eric Huang wrote:
> > The problem is the queue is suspended before clearing address watch
> > call in KFD, there is not queue preemption and queue resume after
> > clearing call, and the test ends. So there is not chance to send
> > MAP_PROCESS to HWS. At this point FW has nothing to do. We have
> > several test FWs from Tej, none of them works, so I recalled the
> > kernel debug log and found out the problem.
> >
> > GFX11 has different scheduler, when calling clear address watch, KFD
> > directly sends the MES_MISC_OP_SET_SHADER_DEBUGGER to MES, it
> doesn't
> > consider if the queue is suspended. So GFX11 doesn't have this issue.
> >
> > Regards,
> > Eric
> >
> > On 2023-08-10 17:27, Kim, Jonathan wrote:
> >> [AMD Official Use Only - General]
> >>
> >> This is a strange solution because the MEC should set watch controls
> >> as non-valid automatically on queue preemption to avoid this kind of
> >> issue in the first place by design.  MAP_PROCESS on resume will take
> >> whatever the driver requests.
> >> GFX11 has no issue with letting the HWS do this.
> >>
> >> Are we sure we're not working around some HWS bug?
> >>
> >> Thanks,
> >>
> >> Jon
> >>
> >>> -Original Message-
> >>> From: Kuehling, Felix 
> >>> Sent: Thursday, August 10, 2023 5:03 PM
> >>> To: Huang, JinHuiEric ; amd-
> >>> g...@lists.freedesktop.org
> >>> Cc: Kim, Jonathan 
> >>> Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for
> >>> gfx v9.4.2
> >>>
> >>> I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit
> >>> different because it needs to support multiple XCCs.
> >>>
> >>> That said, this patch is
> >>>
> >>> Reviewed-by: Felix Kuehling 
> >>>
> >>>
> >>> On 2023-08-10 16:47, Eric Huang wrote:
> >>>> KFD currently relies on MEC FW to clear tcp watch control
> >>>> register by sending MAP_PROCESS packet with 0 of field
> >>>> tcp_watch_cntl to HWS, but if the queue is suspended, the
> >>>> packet will not be sent and the previous value will be
> >>>> left on the register, that will affect the following apps.
> >>>> So the solution is to clear the register as gfx v9 in KFD.
> >>>>
> >>>> Signed-off-by: Eric Huang 
> >>>> ---
> >>>>drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +-
> --
> >>>>1 file changed, 1 insertion(+), 7 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
> >>>> index e2fed6edbdd0..aff08321e976 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
> >>>> @@ -163,12 +163,6 @@ static uint32_t
> >>> kg

Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-10 Thread Felix Kuehling
I think Jon is suggesting that the UNMAP_QUEUES command should clear the 
address watch registers. Requesting such a change from the the HWS team 
may take a long time.


That said, when was this workaround implemented and reviewed? Did I 
review it as part of Jon's debugger upstreaming patch series? Or did 
this come later? This patch only enables the workaround for v9.4.2.


Regards,
  Felix


On 2023-08-10 17:52, Eric Huang wrote:
The problem is the queue is suspended before clearing address watch 
call in KFD, there is not queue preemption and queue resume after 
clearing call, and the test ends. So there is not chance to send 
MAP_PROCESS to HWS. At this point FW has nothing to do. We have 
several test FWs from Tej, none of them works, so I recalled the 
kernel debug log and found out the problem.


GFX11 has different scheduler, when calling clear address watch, KFD 
directly sends the MES_MISC_OP_SET_SHADER_DEBUGGER to MES, it doesn't 
consider if the queue is suspended. So GFX11 doesn't have this issue.


Regards,
Eric

On 2023-08-10 17:27, Kim, Jonathan wrote:

[AMD Official Use Only - General]

This is a strange solution because the MEC should set watch controls 
as non-valid automatically on queue preemption to avoid this kind of 
issue in the first place by design.  MAP_PROCESS on resume will take 
whatever the driver requests.

GFX11 has no issue with letting the HWS do this.

Are we sure we're not working around some HWS bug?

Thanks,

Jon


-Original Message-
From: Kuehling, Felix 
Sent: Thursday, August 10, 2023 5:03 PM
To: Huang, JinHuiEric ; amd-
g...@lists.freedesktop.org
Cc: Kim, Jonathan 
Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for 
gfx v9.4.2


I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit
different because it needs to support multiple XCCs.

That said, this patch is

Reviewed-by: Felix Kuehling 


On 2023-08-10 16:47, Eric Huang wrote:

KFD currently relies on MEC FW to clear tcp watch control
register by sending MAP_PROCESS packet with 0 of field
tcp_watch_cntl to HWS, but if the queue is suspended, the
packet will not be sent and the previous value will be
left on the register, that will affect the following apps.
So the solution is to clear the register as gfx v9 in KFD.

Signed-off-by: Eric Huang 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +---
   1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c

b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c

index e2fed6edbdd0..aff08321e976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,12 +163,6 @@ static uint32_t

kgd_gfx_aldebaran_set_address_watch(

 return watch_address_cntl;
   }

-static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct

amdgpu_device *adev,

- uint32_t watch_id)
-{
-   return 0;
-}
-
   const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 .program_sh_mem_settings =

kgd_gfx_v9_program_sh_mem_settings,

 .set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 .set_wave_launch_trap_override =

kgd_aldebaran_set_wave_launch_trap_override,

 .set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
 .set_address_watch = kgd_gfx_aldebaran_set_address_watch,
-   .clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
+   .clear_address_watch = kgd_gfx_v9_clear_address_watch,
 .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
 .build_grace_period_packet_info =

kgd_gfx_v9_build_grace_period_packet_info,

 .program_trap_handler_settings =

kgd_gfx_v9_program_trap_handler_settings,




Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-10 Thread Eric Huang
The problem is the queue is suspended before clearing address watch call 
in KFD, there is not queue preemption and queue resume after clearing 
call, and the test ends. So there is not chance to send MAP_PROCESS to 
HWS. At this point FW has nothing to do. We have several test FWs from 
Tej, none of them works, so I recalled the kernel debug log and found 
out the problem.


GFX11 has different scheduler, when calling clear address watch, KFD 
directly sends the MES_MISC_OP_SET_SHADER_DEBUGGER to MES, it doesn't 
consider if the queue is suspended. So GFX11 doesn't have this issue.


Regards,
Eric

On 2023-08-10 17:27, Kim, Jonathan wrote:

[AMD Official Use Only - General]

This is a strange solution because the MEC should set watch controls as 
non-valid automatically on queue preemption to avoid this kind of issue in the 
first place by design.  MAP_PROCESS on resume will take whatever the driver 
requests.
GFX11 has no issue with letting the HWS do this.

Are we sure we're not working around some HWS bug?

Thanks,

Jon


-Original Message-
From: Kuehling, Felix 
Sent: Thursday, August 10, 2023 5:03 PM
To: Huang, JinHuiEric ; amd-
g...@lists.freedesktop.org
Cc: Kim, Jonathan 
Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit
different because it needs to support multiple XCCs.

That said, this patch is

Reviewed-by: Felix Kuehling 


On 2023-08-10 16:47, Eric Huang wrote:

KFD currently relies on MEC FW to clear tcp watch control
register by sending MAP_PROCESS packet with 0 of field
tcp_watch_cntl to HWS, but if the queue is suspended, the
packet will not be sent and the previous value will be
left on the register, that will affect the following apps.
So the solution is to clear the register as gfx v9 in KFD.

Signed-off-by: Eric Huang 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +---
   1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c

b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c

index e2fed6edbdd0..aff08321e976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,12 +163,6 @@ static uint32_t

kgd_gfx_aldebaran_set_address_watch(

 return watch_address_cntl;
   }

-static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct

amdgpu_device *adev,

- uint32_t watch_id)
-{
-   return 0;
-}
-
   const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 .program_sh_mem_settings =

kgd_gfx_v9_program_sh_mem_settings,

 .set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 .set_wave_launch_trap_override =

kgd_aldebaran_set_wave_launch_trap_override,

 .set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
 .set_address_watch = kgd_gfx_aldebaran_set_address_watch,
-   .clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
+   .clear_address_watch = kgd_gfx_v9_clear_address_watch,
 .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
 .build_grace_period_packet_info =

kgd_gfx_v9_build_grace_period_packet_info,

 .program_trap_handler_settings =

kgd_gfx_v9_program_trap_handler_settings,




RE: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-10 Thread Kim, Jonathan
[AMD Official Use Only - General]

This is a strange solution because the MEC should set watch controls as 
non-valid automatically on queue preemption to avoid this kind of issue in the 
first place by design.  MAP_PROCESS on resume will take whatever the driver 
requests.
GFX11 has no issue with letting the HWS do this.

Are we sure we're not working around some HWS bug?

Thanks,

Jon

> -Original Message-
> From: Kuehling, Felix 
> Sent: Thursday, August 10, 2023 5:03 PM
> To: Huang, JinHuiEric ; amd-
> g...@lists.freedesktop.org
> Cc: Kim, Jonathan 
> Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2
>
> I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit
> different because it needs to support multiple XCCs.
>
> That said, this patch is
>
> Reviewed-by: Felix Kuehling 
>
>
> On 2023-08-10 16:47, Eric Huang wrote:
> > KFD currently relies on MEC FW to clear tcp watch control
> > register by sending MAP_PROCESS packet with 0 of field
> > tcp_watch_cntl to HWS, but if the queue is suspended, the
> > packet will not be sent and the previous value will be
> > left on the register, that will affect the following apps.
> > So the solution is to clear the register as gfx v9 in KFD.
> >
> > Signed-off-by: Eric Huang 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +---
> >   1 file changed, 1 insertion(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
> > index e2fed6edbdd0..aff08321e976 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
> > @@ -163,12 +163,6 @@ static uint32_t
> kgd_gfx_aldebaran_set_address_watch(
> > return watch_address_cntl;
> >   }
> >
> > -static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct
> amdgpu_device *adev,
> > - uint32_t watch_id)
> > -{
> > -   return 0;
> > -}
> > -
> >   const struct kfd2kgd_calls aldebaran_kfd2kgd = {
> > .program_sh_mem_settings =
> kgd_gfx_v9_program_sh_mem_settings,
> > .set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
> > @@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
> > .set_wave_launch_trap_override =
> kgd_aldebaran_set_wave_launch_trap_override,
> > .set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
> > .set_address_watch = kgd_gfx_aldebaran_set_address_watch,
> > -   .clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
> > +   .clear_address_watch = kgd_gfx_v9_clear_address_watch,
> > .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
> > .build_grace_period_packet_info =
> kgd_gfx_v9_build_grace_period_packet_info,
> > .program_trap_handler_settings =
> kgd_gfx_v9_program_trap_handler_settings,


Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-10 Thread Eric Huang

Yes. I will send out the fix for gc v9.4.3 later. Thanks for your review.

Eric

On 2023-08-10 17:02, Felix Kuehling wrote:
I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit 
different because it needs to support multiple XCCs.


That said, this patch is

Reviewed-by: Felix Kuehling 


On 2023-08-10 16:47, Eric Huang wrote:

KFD currently relies on MEC FW to clear tcp watch control
register by sending MAP_PROCESS packet with 0 of field
tcp_watch_cntl to HWS, but if the queue is suspended, the
packet will not be sent and the previous value will be
left on the register, that will affect the following apps.
So the solution is to clear the register as gfx v9 in KFD.

Signed-off-by: Eric Huang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +---
  1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c

index e2fed6edbdd0..aff08321e976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,12 +163,6 @@ static uint32_t 
kgd_gfx_aldebaran_set_address_watch(

  return watch_address_cntl;
  }
  -static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct 
amdgpu_device *adev,

-  uint32_t watch_id)
-{
-    return 0;
-}
-
  const struct kfd2kgd_calls aldebaran_kfd2kgd = {
  .program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
  .set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
  .set_wave_launch_trap_override = 
kgd_aldebaran_set_wave_launch_trap_override,

  .set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
  .set_address_watch = kgd_gfx_aldebaran_set_address_watch,
-    .clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
+    .clear_address_watch = kgd_gfx_v9_clear_address_watch,
  .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
  .build_grace_period_packet_info = 
kgd_gfx_v9_build_grace_period_packet_info,
  .program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings,




Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-10 Thread Felix Kuehling
I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit 
different because it needs to support multiple XCCs.


That said, this patch is

Reviewed-by: Felix Kuehling 


On 2023-08-10 16:47, Eric Huang wrote:

KFD currently relies on MEC FW to clear tcp watch control
register by sending MAP_PROCESS packet with 0 of field
tcp_watch_cntl to HWS, but if the queue is suspended, the
packet will not be sent and the previous value will be
left on the register, that will affect the following apps.
So the solution is to clear the register as gfx v9 in KFD.

Signed-off-by: Eric Huang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +---
  1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index e2fed6edbdd0..aff08321e976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,12 +163,6 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
return watch_address_cntl;
  }
  
-static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev,

- uint32_t watch_id)
-{
-   return 0;
-}
-
  const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.set_wave_launch_trap_override = 
kgd_aldebaran_set_wave_launch_trap_override,
.set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
.set_address_watch = kgd_gfx_aldebaran_set_address_watch,
-   .clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
+   .clear_address_watch = kgd_gfx_v9_clear_address_watch,
.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
.build_grace_period_packet_info = 
kgd_gfx_v9_build_grace_period_packet_info,
.program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings,


[PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-10 Thread Eric Huang
KFD currently relies on MEC FW to clear tcp watch control
register by sending MAP_PROCESS packet with 0 of field
tcp_watch_cntl to HWS, but if the queue is suspended, the
packet will not be sent and the previous value will be
left on the register, that will affect the following apps.
So the solution is to clear the register as gfx v9 in KFD.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index e2fed6edbdd0..aff08321e976 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,12 +163,6 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
return watch_address_cntl;
 }
 
-static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device 
*adev,
- uint32_t watch_id)
-{
-   return 0;
-}
-
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.set_wave_launch_trap_override = 
kgd_aldebaran_set_wave_launch_trap_override,
.set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode,
.set_address_watch = kgd_gfx_aldebaran_set_address_watch,
-   .clear_address_watch = kgd_gfx_aldebaran_clear_address_watch,
+   .clear_address_watch = kgd_gfx_v9_clear_address_watch,
.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
.build_grace_period_packet_info = 
kgd_gfx_v9_build_grace_period_packet_info,
.program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings,
-- 
2.34.1