Re: [PATCH v2] drm/amdkfd: Initialize kfd_gpu_cache_info for KFD topology

2024-02-06 Thread Felix Kuehling

On 2024-02-07 0:32, Joseph Greathouse wrote:

The current kfd_gpu_cache_info structure is only partially
filled in for some architectures. This means that for devices
where we do not fill in some fields, we can return
uninitialized values through the KFD topology.
Zero out the kfd_gpu_cache_info before asking the remaining
fields to be filled in by lower-level functions.

Fixes: 04756ac9a24c ("drm/amdkfd: Add cache line sizes to KFD topology")
Signed-off-by: Joseph Greathouse 


Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 3df2a8ad86fb..5cb0465493b8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1707,6 +1707,7 @@ static void kfd_fill_cache_non_crat_info(struct 
kfd_topology_device *dev, struct
  
  	gpu_processor_id = dev->node_props.simd_id_base;
  
+	memset(cache_info, 0, sizeof(cache_info));

pcache_info = cache_info;
num_of_cache_types = kfd_get_gpu_cache_info(kdev, &pcache_info);
if (!num_of_cache_types) {


[PATCH v2] drm/amdkfd: Initialize kfd_gpu_cache_info for KFD topology

2024-02-06 Thread Joseph Greathouse
The current kfd_gpu_cache_info structure is only partially
filled in for some architectures. This means that for devices
where we do not fill in some fields, we can return
uninitialized values through the KFD topology.
Zero out the kfd_gpu_cache_info before asking the remaining
fields to be filled in by lower-level functions.

Fixes: 04756ac9a24c ("drm/amdkfd: Add cache line sizes to KFD topology")
Signed-off-by: Joseph Greathouse 
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 3df2a8ad86fb..5cb0465493b8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1707,6 +1707,7 @@ static void kfd_fill_cache_non_crat_info(struct 
kfd_topology_device *dev, struct
 
gpu_processor_id = dev->node_props.simd_id_base;
 
+   memset(cache_info, 0, sizeof(cache_info));
pcache_info = cache_info;
num_of_cache_types = kfd_get_gpu_cache_info(kdev, &pcache_info);
if (!num_of_cache_types) {
-- 
2.20.1
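
A minimal sketch of why zeroing first matters, and why sizeof on the array itself is enough. Everything below is a hypothetical stand-in (struct demo_cache_info, DEMO_MAX_CACHE_TYPES, demo_fill_partial); it assumes cache_info is a local array in the patched function, which the diff context suggests but does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_CACHE_TYPES 10 /* hypothetical array length */

/* Hypothetical, trimmed-down stand-in for struct kfd_gpu_cache_info. */
struct demo_cache_info {
	unsigned int cache_size;
	unsigned int cache_level;
	unsigned int cache_line_size;
};

/* Fills only some fields of entry 0, like a helper that predates a new field. */
static int demo_fill_partial(struct demo_cache_info *info)
{
	assert(info != NULL);
	info[0].cache_size = 16;
	info[0].cache_level = 1;
	/* cache_line_size is deliberately left untouched */
	return 1;
}

/* Returns the cache_line_size a caller would report for entry 0. */
static unsigned int demo_reported_line_size(void)
{
	struct demo_cache_info cache_info[DEMO_MAX_CACHE_TYPES];

	/* sizeof on the array covers every element, not just the first. */
	memset(cache_info, 0, sizeof(cache_info));
	demo_fill_partial(cache_info);
	return cache_info[0].cache_line_size;
}

/* Number of bytes the memset() above clears. */
static size_t demo_zeroed_bytes(void)
{
	struct demo_cache_info cache_info[DEMO_MAX_CACHE_TYPES];

	return sizeof(cache_info);
}
```

Without the memset(), the value returned by demo_reported_line_size() would be indeterminate. Note that sizeof(cache_info) only yields the full size while cache_info is a true array in scope; on a pointer parameter it would yield the pointer size instead.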



[PATCH] drm/amd/display: Fix possible use of uninitialized 'max_chunks_fbc_mode' in 'calculate_bandwidth()'

2024-02-06 Thread Srinivasan Shanmugam
'max_chunks_fbc_mode' is declared without an initializer and is only
assigned a value under a specific condition in the following lines:

if (data->fbc_en[i] == 1) {
max_chunks_fbc_mode = 128 - dmif_chunk_buff_margin;
}

If 'data->fbc_en[i]' is not equal to 1 for any i, 'max_chunks_fbc_mode'
is never assigned and is still uninitialized when it is used outside of
this for loop.

Ensure that 'max_chunks_fbc_mode' is properly initialized before it's
used. Initialize it to a default value right after its declaration to
ensure that it gets a value assigned under all possible control flow
paths.

This fixes the following errors:
drivers/gpu/drm/amd/amdgpu/../display/dc/basics/dce_calcs.c:914 
calculate_bandwidth() error: uninitialized symbol 'max_chunks_fbc_mode'.
drivers/gpu/drm/amd/amdgpu/../display/dc/basics/dce_calcs.c:917 
calculate_bandwidth() error: uninitialized symbol 'max_chunks_fbc_mode'.

Fixes: 4562236b3bc0 ("drm/amd/dc: Add dc display driver (v2)")
Cc: Harry Wentland 
Cc: Alex Deucher 
Cc: Rodrigo Siqueira 
Cc: Aurabindo Pillai 
Signed-off-by: Srinivasan Shanmugam 
---
 drivers/gpu/drm/amd/display/dc/basics/dce_calcs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/basics/dce_calcs.c 
b/drivers/gpu/drm/amd/display/dc/basics/dce_calcs.c
index f2dfa96f9ef5..39530b2ea495 100644
--- a/drivers/gpu/drm/amd/display/dc/basics/dce_calcs.c
+++ b/drivers/gpu/drm/amd/display/dc/basics/dce_calcs.c
@@ -94,7 +94,7 @@ static void calculate_bandwidth(
const uint32_t s_high = 7;
const uint32_t dmif_chunk_buff_margin = 1;
 
-   uint32_t max_chunks_fbc_mode;
+   uint32_t max_chunks_fbc_mode = 0;
int32_t num_cursor_lines;
 
int32_t i, j, k;
-- 
2.34.1
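
The control-flow problem described above can be sketched in isolation. This is a hypothetical helper (demo_max_chunks is not a function in dce_calcs.c); it only mirrors the shape of the flagged code, with the initializer from the patch applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: the variable is assigned only when some entry of
 * fbc_en is set, so the initializer is what defines the result when no
 * entry is set.
 */
static uint32_t demo_max_chunks(const bool *fbc_en, size_t n, uint32_t margin)
{
	uint32_t max_chunks = 0; /* defined on every path, as in the patch */
	size_t i;

	assert(fbc_en != NULL);
	for (i = 0; i < n; i++) {
		if (fbc_en[i])
			max_chunks = 128 - margin;
	}
	return max_chunks;
}
```

Whether 0 is a meaningful default for the real bandwidth calculation is not something this sketch can show; it only demonstrates that the read after the loop is now well defined.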



[PATCH] drm/amd/display: Fix possible buffer overflow in 'find_dcfclk_for_voltage()'

2024-02-06 Thread Srinivasan Shanmugam
The 'find_dcfclk_for_voltage()' function loops over
VG_NUM_SOC_VOLTAGE_LEVELS (which is 8), but the size of the DcfClocks
array is VG_NUM_DCFCLK_DPM_LEVELS (which is 7).

When the loop variable i reaches 7, the function tries to access
clock_table->DcfClocks[7]. However, since the size of the DcfClocks
array is 7, the valid indices are 0 to 6. Index 7 is beyond the size of
the array, leading to a buffer overflow.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn301/vg_clk_mgr.c:550 
find_dcfclk_for_voltage() error: buffer overflow 'clock_table->DcfClocks' 7 <= 7

Fixes: 3a83e4e64bb1 ("drm/amd/display: Add dcn3.01 support to DC (v2)")
Cc: Roman Li 
Cc: Rodrigo Siqueira 
Cc: Aurabindo Pillai 
Signed-off-by: Srinivasan Shanmugam 
---
 drivers/gpu/drm/amd/display/dc/clk_mgr/dcn301/vg_clk_mgr.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn301/vg_clk_mgr.c 
b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn301/vg_clk_mgr.c
index a5489fe6875f..aa9fd1dc550a 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn301/vg_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn301/vg_clk_mgr.c
@@ -546,6 +546,8 @@ static unsigned int find_dcfclk_for_voltage(const struct 
vg_dpm_clocks *clock_ta
int i;
 
for (i = 0; i < VG_NUM_SOC_VOLTAGE_LEVELS; i++) {
+   if (i >= VG_NUM_DCFCLK_DPM_LEVELS)
+   break;
if (clock_table->SocVoltage[i] == voltage)
return clock_table->DcfClocks[i];
}
-- 
2.34.1
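
A minimal sketch of the bounded lookup, using the two sizes quoted in the commit message. The struct and function names are hypothetical stand-ins, and returning 0 for "not found" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NUM_SOC_VOLTAGE_LEVELS 8 /* value quoted in the commit message */
#define DEMO_NUM_DCFCLK_DPM_LEVELS 7  /* value quoted in the commit message */

/* Hypothetical, reduced stand-in for struct vg_dpm_clocks. */
struct demo_clock_table {
	unsigned int DcfClocks[DEMO_NUM_DCFCLK_DPM_LEVELS];
	unsigned int SocVoltage[DEMO_NUM_SOC_VOLTAGE_LEVELS];
};

static unsigned int demo_find_dcfclk(const struct demo_clock_table *t,
				     unsigned int voltage)
{
	size_t i;

	assert(t != NULL);
	for (i = 0; i < DEMO_NUM_SOC_VOLTAGE_LEVELS; i++) {
		if (i >= DEMO_NUM_DCFCLK_DPM_LEVELS)
			break; /* never index DcfClocks[7] */
		if (t->SocVoltage[i] == voltage)
			return t->DcfClocks[i];
	}
	return 0; /* assumed "not found" value for this sketch */
}
```

One consequence of the guard is that a voltage stored only in the last SocVoltage slot can no longer match, because the loop stops before reaching it.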



[PATCH] drm/amd/display: Fix possible NULL dereference on device remove/driver unload

2024-02-06 Thread Srinivasan Shanmugam
amdgpu_dm_fini() is a cleanup function, typically called when a device
is being shut down or a driver is being unloaded.

The error message below suggests that there is a potential NULL pointer
dereference of adev->dm.dc.

In the following line of code, adev->dm.dc is used without a preceding
NULL check:

for (i = 0; i < adev->dm.dc->caps.max_links; i++) {

To fix this issue, add a null check for adev->dm.dc before this line.

Reported by smatch:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:1959 
amdgpu_dm_fini() error: we previously assumed 'adev->dm.dc' could be null (see 
line 1943)

Fixes: 006c26a0f1c8 ("drm/amd/display: Fix crash on device remove/driver 
unload")
Cc: Andrey Grodzovsky 
Cc: Harry Wentland 
Cc: Rodrigo Siqueira 
Cc: Aurabindo Pillai 
Signed-off-by: Srinivasan Shanmugam 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index b3a5e730be24..d4c1415f4562 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1956,7 +1956,7 @@ static void amdgpu_dm_fini(struct amdgpu_device *adev)
  &adev->dm.dmub_bo_gpu_addr,
  &adev->dm.dmub_bo_cpu_addr);
 
-   if (adev->dm.hpd_rx_offload_wq) {
+   if (adev->dm.hpd_rx_offload_wq && adev->dm.dc) {
for (i = 0; i < adev->dm.dc->caps.max_links; i++) {
if (adev->dm.hpd_rx_offload_wq[i].wq) {

destroy_workqueue(adev->dm.hpd_rx_offload_wq[i].wq);
-- 
2.34.1
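
The guard added by the patch can be sketched with stand-in types. Nothing below is an amdgpu structure; struct demo_dm, struct demo_dc and demo_fini() are hypothetical, and plain malloc()/free() stand in for workqueue creation and destruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All types here are hypothetical stand-ins, not the amdgpu structures. */
struct demo_dc { int max_links; };
struct demo_wq { void *wq; };
struct demo_dm {
	struct demo_dc *dc;
	struct demo_wq *offload_wq;
};

/* Tears down per-link queues; returns how many were released. */
static int demo_fini(struct demo_dm *dm)
{
	int i, freed = 0;

	assert(dm != NULL);
	/* Both pointers are checked before dc->max_links is read. */
	if (dm->offload_wq && dm->dc) {
		for (i = 0; i < dm->dc->max_links; i++) {
			if (dm->offload_wq[i].wq) {
				free(dm->offload_wq[i].wq);
				dm->offload_wq[i].wq = NULL;
				freed++;
			}
		}
	}
	free(dm->offload_wq);
	dm->offload_wq = NULL;
	return freed;
}
```

When dc is NULL the link count is unknown, so this sketch skips the per-link entries and releases only the array itself. How the real driver should handle that case is outside what the patch shows.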



[PATCH] drm/amd/display: Initialize 'wait_time_microsec' variable in link_dp_training_dpia.c

2024-02-06 Thread Srinivasan Shanmugam
wait_time_microsec = max(wait_time_microsec, (uint32_t)
DPIA_CLK_SYNC_DELAY);

The line above assigns the larger of 'wait_time_microsec' and
'DPIA_CLK_SYNC_DELAY' to 'wait_time_microsec'. However,
'wait_time_microsec' has not been assigned a value before this line, so
initialize it at the point of declaration.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_training_dpia.c:697
 dpia_training_eq_non_transparent() error: uninitialized symbol 
'wait_time_microsec'.

Fixes: 630168a97314 ("drm/amd/display: move dp link training logic to 
link_dp_training")
Cc: Wenjing Liu 
Cc: Rodrigo Siqueira 
Cc: Aurabindo Pillai 
Signed-off-by: Srinivasan Shanmugam 
---
 .../drm/amd/display/dc/link/protocols/link_dp_training_dpia.c   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training_dpia.c 
b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training_dpia.c
index e8dda44b23cb..5d36bab0029c 100644
--- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training_dpia.c
+++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training_dpia.c
@@ -619,7 +619,7 @@ static enum link_training_result 
dpia_training_eq_non_transparent(
uint32_t retries_eq = 0;
enum dc_status status;
enum dc_dp_training_pattern tr_pattern;
-   uint32_t wait_time_microsec;
+   uint32_t wait_time_microsec = 0;
enum dc_lane_count lane_count = lt_settings->link_settings.lane_count;
union lane_align_status_updated dpcd_lane_status_updated = {0};
union lane_status dpcd_lane_status[LANE_COUNT_DP_MAX] = {0};
-- 
2.34.1



RE: [PATCH] drm/amdgpu/soc21: update VCN 4 max HEVC encoding resolution

2024-02-06 Thread Deucher, Alexander
[AMD Official Use Only - General]

> -Original Message-
> From: amd-gfx  On Behalf Of Thong
> Sent: Tuesday, February 6, 2024 6:28 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Thai, Thong 
> Subject: [PATCH] drm/amdgpu/soc21: update VCN 4 max HEVC encoding
> resolution
>
> Update the maximum resolution reported for HEVC encoding on VCN 4 devices
> to reflect its 8K encoding capability.
>

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3159

With that added,
Acked-by: Alex Deucher 

> Signed-off-by: Thong 
> ---
>  drivers/gpu/drm/amd/amdgpu/soc21.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c
> b/drivers/gpu/drm/amd/amdgpu/soc21.c
> index 48c6efcdeac9..4d7188912edf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
> @@ -50,13 +50,13 @@ static const struct amd_ip_funcs
> soc21_common_ip_funcs;
>  /* SOC21 */
>  static const struct amdgpu_video_codec_info
> vcn_4_0_0_video_codecs_encode_array_vcn0[] = {
>
>   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4
> _AVC, 4096, 2304, 0)},
> - {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC,
> 4096, 2304, 0)},
> + {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC,
> 8192, 4352,
> +0)},
>   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_AV1,
> 8192, 4352, 0)},  };
>
>  static const struct amdgpu_video_codec_info
> vcn_4_0_0_video_codecs_encode_array_vcn1[] = {
>
>   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4
> _AVC, 4096, 2304, 0)},
> - {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC,
> 4096, 2304, 0)},
> + {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC,
> 8192, 4352,
> +0)},
>  };
>
>  static const struct amdgpu_video_codecs
> vcn_4_0_0_video_codecs_encode_vcn0 = {
> --
> 2.34.1



RE: [PATCH] drm/amdgpu: Fix HDP flush for VFs on nbio v7.9

2024-02-06 Thread Zhang, Hawking
[AMD Official Use Only - General]

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: Lazar, Lijo 
Sent: Wednesday, February 7, 2024 10:22
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Deucher, Alexander 
; Ming, Davis ; Kamal, Asad 
; Ma, Le 
Subject: [PATCH] drm/amdgpu: Fix HDP flush for VFs on nbio v7.9

HDP flush remapping is not done for VFs. Keep the original offsets in VF 
environment.

Signed-off-by: Lijo Lazar 
---
 drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c 
b/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
index e90f33780803..b4723d68eab0 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
@@ -431,6 +431,12 @@ static void nbio_v7_9_init_registers(struct amdgpu_device 
*adev)
u32 inst_mask;
int i;

+   if (amdgpu_sriov_vf(adev))
+   adev->rmmio_remap.reg_offset =
+   SOC15_REG_OFFSET(
+   NBIO, 0,
+   
regBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL)
+   << 2;
WREG32_SOC15(NBIO, 0, regXCC_DOORBELL_FENCE,
0xff & ~(adev->gfx.xcc_mask));

--
2.25.1



[PATCH] drm/amdgpu: Fix HDP flush for VFs on nbio v7.9

2024-02-06 Thread Lijo Lazar
HDP flush remapping is not done for VFs. Keep the original offsets in VF
environment.

Signed-off-by: Lijo Lazar 
---
 drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c 
b/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
index e90f33780803..b4723d68eab0 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
@@ -431,6 +431,12 @@ static void nbio_v7_9_init_registers(struct amdgpu_device 
*adev)
u32 inst_mask;
int i;
 
+   if (amdgpu_sriov_vf(adev))
+   adev->rmmio_remap.reg_offset =
+   SOC15_REG_OFFSET(
+   NBIO, 0,
+   
regBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL)
+   << 2;
WREG32_SOC15(NBIO, 0, regXCC_DOORBELL_FENCE,
0xff & ~(adev->gfx.xcc_mask));
 
-- 
2.25.1
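
A small sketch of the `<< 2` in the hunk above, under the assumption that the register offset is a dword index and the remap offset is wanted in bytes. The helper name is hypothetical and is not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a dword-indexed register offset to a byte offset. */
static uint32_t demo_dword_to_byte_offset(uint32_t dword_offset)
{
	/* Guard against losing the top two bits in the shift. */
	assert(dword_offset <= (UINT32_MAX >> 2));
	return dword_offset << 2;
}
```

For example, a dword index of 0x10 corresponds to byte offset 0x40 under that assumption.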



[PATCH] drm/amdgpu/soc21: update VCN 4 max HEVC encoding resolution

2024-02-06 Thread Thong
Update the maximum resolution reported for HEVC encoding on VCN 4
devices to reflect its 8K encoding capability.

Signed-off-by: Thong 
---
 drivers/gpu/drm/amd/amdgpu/soc21.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c 
b/drivers/gpu/drm/amd/amdgpu/soc21.c
index 48c6efcdeac9..4d7188912edf 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
@@ -50,13 +50,13 @@ static const struct amd_ip_funcs soc21_common_ip_funcs;
 /* SOC21 */
 static const struct amdgpu_video_codec_info 
vcn_4_0_0_video_codecs_encode_array_vcn0[] = {
{codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4_AVC, 4096, 
2304, 0)},
-   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 4096, 2304, 
0)},
+   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 8192, 4352, 
0)},
{codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_AV1, 8192, 4352, 0)},
 };
 
 static const struct amdgpu_video_codec_info 
vcn_4_0_0_video_codecs_encode_array_vcn1[] = {
{codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4_AVC, 4096, 
2304, 0)},
-   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 4096, 2304, 
0)},
+   {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 8192, 4352, 
0)},
 };
 
 static const struct amdgpu_video_codecs vcn_4_0_0_video_codecs_encode_vcn0 = {
-- 
2.34.1



Re: [PATCH] drm/amd: Set s0i3/s3 in prepare() callback instead of suspend() callback

2024-02-06 Thread Mario Limonciello

On 2/6/2024 16:00, Deucher, Alexander wrote:

[AMD Official Use Only - General]


-Original Message-
From: amd-gfx  On Behalf Of Mario
Limonciello
Sent: Tuesday, February 6, 2024 4:32 PM
To: amd-gfx@lists.freedesktop.org
Cc: Limonciello, Mario ; Jürg Billeter

Subject: [PATCH] drm/amd: Set s0i3/s3 in prepare() callback instead of
suspend() callback

commit 5095d5418193 ("drm/amd: Evict resources during PM ops prepare()
callback") intentionally moved the eviction of resources to earlier in the
suspend process, but this introduced a subtle change that it occurs before
adev->in_s0ix or adev->in_s3 are set. This meant that APUs actually started to
evict resources at suspend time as well.

Move the s0i3/s3 setting flags into prepare() to ensure that they're set during
eviction. Drop the existing call to return 1 in this case because the suspend()
callback looks for the flags too.

Reported-by: Jürg Billeter 
Closes: https://gitlab.freedesktop.org/drm/amd/-
/issues/3132#note_2271038
Fixes: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare()
callback")
Signed-off-by: Mario Limonciello 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 14 --
  1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index b74f68a15802..190b2ee9e36b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2464,12 +2464,10 @@ static int amdgpu_pmops_prepare(struct device
*dev)
   pm_runtime_suspended(dev))
   return 1;

- /* if we will not support s3 or s2i for the device
-  *  then skip suspend
-  */
- if (!amdgpu_acpi_is_s0ix_active(adev) &&
- !amdgpu_acpi_is_s3_active(adev))
- return 1;
+ if (amdgpu_acpi_is_s0ix_active(adev))
+ adev->in_s0ix = true;
+ else if (amdgpu_acpi_is_s3_active(adev))
+ adev->in_s3 = true;



Will resume always get called to clear these after prepare?  Will these 
ever get set and then not unset?


You're right; it doesn't clean up.

This is the call sequence:

suspend_devices_and_enter()
->dpm_suspend_start()
->->device_prepare()
->->->dpm_prepare()

Errors bubble up.  In suspend_devices_and_enter() errors goto 
Recover_platform label.  This calls platform_recover().


platform_recover() is for platform recovery not device recovery.
So this patch is incorrect.

Let me see if I can come up with another way to do this without having 
to revert 5095d5418193.




Alex


   return amdgpu_device_prepare(drm_dev);
  }
@@ -2484,10 +2482,6 @@ static int amdgpu_pmops_suspend(struct device *dev)
   struct drm_device *drm_dev = dev_get_drvdata(dev);
   struct amdgpu_device *adev = drm_to_adev(drm_dev);

- if (amdgpu_acpi_is_s0ix_active(adev))
- adev->in_s0ix = true;
- else if (amdgpu_acpi_is_s3_active(adev))
- adev->in_s3 = true;
   if (!adev->in_s0ix && !adev->in_s3)
   return 0;
   return amdgpu_device_suspend(drm_dev, true);
--
2.34.1






RE: [PATCH] drm/amd: Set s0i3/s3 in prepare() callback instead of suspend() callback

2024-02-06 Thread Deucher, Alexander
[AMD Official Use Only - General]

> -Original Message-
> From: amd-gfx  On Behalf Of Mario
> Limonciello
> Sent: Tuesday, February 6, 2024 4:32 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Limonciello, Mario ; Jürg Billeter
> 
> Subject: [PATCH] drm/amd: Set s0i3/s3 in prepare() callback instead of
> suspend() callback
>
> commit 5095d5418193 ("drm/amd: Evict resources during PM ops prepare()
> callback") intentionally moved the eviction of resources to earlier in the
> suspend process, but this introduced a subtle change that it occurs before
> adev->in_s0ix or adev->in_s3 are set. This meant that APUs actually started to
> evict resources at suspend time as well.
>
> Move the s0i3/s3 setting flags into prepare() to ensure that they're set 
> during
> eviction. Drop the existing call to return 1 in this case because the 
> suspend()
> callback looks for the flags too.
>
> Reported-by: Jürg Billeter 
> Closes: https://gitlab.freedesktop.org/drm/amd/-
> /issues/3132#note_2271038
> Fixes: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare()
> callback")
> Signed-off-by: Mario Limonciello 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 14 --
>  1 file changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index b74f68a15802..190b2ee9e36b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2464,12 +2464,10 @@ static int amdgpu_pmops_prepare(struct device
> *dev)
>   pm_runtime_suspended(dev))
>   return 1;
>
> - /* if we will not support s3 or s2i for the device
> -  *  then skip suspend
> -  */
> - if (!amdgpu_acpi_is_s0ix_active(adev) &&
> - !amdgpu_acpi_is_s3_active(adev))
> - return 1;
> + if (amdgpu_acpi_is_s0ix_active(adev))
> + adev->in_s0ix = true;
> + else if (amdgpu_acpi_is_s3_active(adev))
> + adev->in_s3 = true;
>

Will resume always get called to clear these after prepare?  Will these 
ever get set and then not unset?

Alex

>   return amdgpu_device_prepare(drm_dev);
>  }
> @@ -2484,10 +2482,6 @@ static int amdgpu_pmops_suspend(struct device *dev)
>   struct drm_device *drm_dev = dev_get_drvdata(dev);
>   struct amdgpu_device *adev = drm_to_adev(drm_dev);
>
> - if (amdgpu_acpi_is_s0ix_active(adev))
> - adev->in_s0ix = true;
> - else if (amdgpu_acpi_is_s3_active(adev))
> - adev->in_s3 = true;
>   if (!adev->in_s0ix && !adev->in_s3)
>   return 0;
>   return amdgpu_device_suspend(drm_dev, true);
> --
> 2.34.1



[PATCH] drm/amd: Set s0i3/s3 in prepare() callback instead of suspend() callback

2024-02-06 Thread Mario Limonciello
commit 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() 
callback")
intentionally moved the eviction of resources to earlier in the suspend
process, but this introduced a subtle change that it occurs before adev->in_s0ix
or adev->in_s3 are set. This meant that APUs actually started to evict
resources at suspend time as well.

Move the s0i3/s3 setting flags into prepare() to ensure that they're set
during eviction. Drop the existing call to return 1 in this case because
the suspend() callback looks for the flags too.

Reported-by: Jürg Billeter 
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132#note_2271038
Fixes: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() 
callback")
Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index b74f68a15802..190b2ee9e36b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2464,12 +2464,10 @@ static int amdgpu_pmops_prepare(struct device *dev)
pm_runtime_suspended(dev))
return 1;
 
-   /* if we will not support s3 or s2i for the device
-*  then skip suspend
-*/
-   if (!amdgpu_acpi_is_s0ix_active(adev) &&
-   !amdgpu_acpi_is_s3_active(adev))
-   return 1;
+   if (amdgpu_acpi_is_s0ix_active(adev))
+   adev->in_s0ix = true;
+   else if (amdgpu_acpi_is_s3_active(adev))
+   adev->in_s3 = true;
 
return amdgpu_device_prepare(drm_dev);
 }
@@ -2484,10 +2482,6 @@ static int amdgpu_pmops_suspend(struct device *dev)
struct drm_device *drm_dev = dev_get_drvdata(dev);
struct amdgpu_device *adev = drm_to_adev(drm_dev);
 
-   if (amdgpu_acpi_is_s0ix_active(adev))
-   adev->in_s0ix = true;
-   else if (amdgpu_acpi_is_s3_active(adev))
-   adev->in_s3 = true;
if (!adev->in_s0ix && !adev->in_s3)
return 0;
return amdgpu_device_suspend(drm_dev, true);
-- 
2.34.1
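
The review of this patch asks whether flags set in prepare() are always cleared again. A minimal sketch of that concern follows; struct demo_dev and both functions are hypothetical, and demo_complete() is an assumed cleanup hook for illustration only, not the fix the thread settled on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state; only the field names echo the patch. */
struct demo_dev {
	bool in_s0ix;
	bool in_s3;
};

/* Sets the flag in prepare(), as the patch proposes. */
static int demo_prepare(struct demo_dev *d, bool s0ix_active, bool s3_active)
{
	assert(d != NULL);
	if (s0ix_active)
		d->in_s0ix = true;
	else if (s3_active)
		d->in_s3 = true;
	return 0;
}

/* Assumed cleanup hook for this sketch: undoes what demo_prepare() set. */
static void demo_complete(struct demo_dev *d)
{
	assert(d != NULL);
	d->in_s0ix = false;
	d->in_s3 = false;
}
```

If a later step fails after demo_prepare() and nothing like demo_complete() runs, the flag stays set, which is the situation the reviewer's question points at.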



RE: [PATCH] drm/amdkfd: Initialize kfd_gpu_cache_info for KFD topology

2024-02-06 Thread Deucher, Alexander
[AMD Official Use Only - General]

> -Original Message-
> From: Kuehling, Felix 
> Sent: Tuesday, February 6, 2024 4:15 PM
> To: Greathouse, Joseph ; amd-
> g...@lists.freedesktop.org; Deucher, Alexander
> 
> Subject: Re: [PATCH] drm/amdkfd: Initialize kfd_gpu_cache_info for KFD
> topology
>
>
> On 2024-02-06 15:55, Joseph Greathouse wrote:
> > The current kfd_gpu_cache_info structure is only partially filled in
> > for some architectures. This means that for devices where we do not
> > fill in some fields, we can return uninitialized values through the
> > KFD topology.
> > Zero out the kfd_gpu_cache_info before asking the remaining fields to
> > be filled in by lower-level functions.
> >
> > Signed-off-by: Joseph Greathouse 
>
> This fixes your previous patch "drm/amdkfd: Add cache line sizes to KFD
> topology". Alex, I think the previous patch hasn't gone upstream yet. Do you
> want a Fixes: tag or is it possible to squash this with Joe's previous patch
> before upstreaming?

Either way.  I can fix up the tag when we upstream or squash it.

Alex

>
> One nit-pick below.
>
>
> > ---
> >   drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 1 +
> >   1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > index 3df2a8ad86fb..67c1e7f84750 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > @@ -1707,6 +1707,7 @@ static void kfd_fill_cache_non_crat_info(struct
> > kfd_topology_device *dev, struct
> >
> > gpu_processor_id = dev->node_props.simd_id_base;
> >
> > +   memset(cache_info, 0, sizeof(struct kfd_gpu_cache_info) *
> > +KFD_MAX_CACHE_TYPES);
>
> Just use sizeof(cache_info). No need to calculate the size of the array and 
> risk
> getting it wrong.
>
> Regards,
>Felix
>
>
> > pcache_info = cache_info;
> > num_of_cache_types = kfd_get_gpu_cache_info(kdev, &pcache_info);
> > if (!num_of_cache_types) {


RE: [PATCH v2] drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3

2024-02-06 Thread Errabolu, Ramesh
[AMD Official Use Only - General]

Looks fine by me

Regards,
Ramesh

-Original Message-
From: amd-gfx  On Behalf Of Kent Russell
Sent: Wednesday, February 7, 2024 3:02 AM
To: amd-gfx@lists.freedesktop.org
Cc: Joshi, Mukul ; Russell, Kent 
Subject: [PATCH v2] drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3

It's currently incorrectly multiplied by the number of XCCs in the partition.

Fixes: 6b537864925e ("drm/amdkfd: Update cache info for GFX 9.4.3")
Signed-off-by: Kent Russell 
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 3df2a8ad86fb..533b8292b136 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1640,12 +1640,10 @@ static int fill_in_l2_l3_pcache(struct 
kfd_cache_properties **props_ext,
else
mode = UNKNOWN_MEMORY_PARTITION_MODE;

-   if (pcache->cache_level == 2)
-   pcache->cache_size = pcache_info[cache_type].cache_size 
* num_xcc;
-   else if (mode)
-   pcache->cache_size = pcache_info[cache_type].cache_size 
/ mode;
-   else
-   pcache->cache_size = pcache_info[cache_type].cache_size;
+   pcache->cache_size = pcache_info[cache_type].cache_size;
+   /* Partition mode only affects L3 cache size */
+   if (mode && pcache->cache_level == 3)
+   pcache->cache_size /= mode;

if (pcache_info[cache_type].flags & CRAT_CACHE_FLAGS_DATA_CACHE)
pcache->cache_type |= HSA_CACHE_TYPE_DATA;
--
2.34.1



[PATCH v2] drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3

2024-02-06 Thread Kent Russell
It's currently incorrectly multiplied by the number of XCCs in the partition.

Fixes: 6b537864925e ("drm/amdkfd: Update cache info for GFX 9.4.3")
Signed-off-by: Kent Russell 
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 3df2a8ad86fb..533b8292b136 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1640,12 +1640,10 @@ static int fill_in_l2_l3_pcache(struct 
kfd_cache_properties **props_ext,
else
mode = UNKNOWN_MEMORY_PARTITION_MODE;
 
-   if (pcache->cache_level == 2)
-   pcache->cache_size = pcache_info[cache_type].cache_size 
* num_xcc;
-   else if (mode)
-   pcache->cache_size = pcache_info[cache_type].cache_size 
/ mode;
-   else
-   pcache->cache_size = pcache_info[cache_type].cache_size;
+   pcache->cache_size = pcache_info[cache_type].cache_size;
+   /* Partition mode only affects L3 cache size */
+   if (mode && pcache->cache_level == 3)
+   pcache->cache_size /= mode;
 
if (pcache_info[cache_type].flags & CRAT_CACHE_FLAGS_DATA_CACHE)
pcache->cache_type |= HSA_CACHE_TYPE_DATA;
-- 
2.34.1
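
The size rule after this change can be sketched as a standalone function. The name demo_reported_cache_size is hypothetical, and treating mode == 0 as "unknown or not partitioned" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Reported size for one cache entry, given its level and the partition mode. */
static uint32_t demo_reported_cache_size(uint32_t base_size, int cache_level,
					 uint32_t mode)
{
	uint32_t size = base_size;

	assert(cache_level >= 1);
	/* Partition mode only affects L3 cache size. */
	if (mode && cache_level == 3)
		size /= mode;
	return size;
}
```

With this rule an L2 entry keeps its base size regardless of mode or XCC count, while an L3 entry is divided by a non-zero mode.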



Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

2024-02-06 Thread Deucher, Alexander
[AMD Official Use Only - General]

The firmware has not been released yet. It's still undergoing regression 
testing.

Alex



From: Shengyu Qu
Sent: Tuesday, February 6, 2024 5:08 AM
To: Deucher, Alexander; Kuehling, Felix; amd-gfx@lists.freedesktop.org
Cc: wiagn...@outlook.com; Cornwall, Jay; Koenig, Christian; Paneer Selvam, 
Arunpravin
Subject: Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

Hi Alexander,

On 2024/2/6 1:12, Deucher, Alexander wrote:

Are you only seeing the problem with this patch applied or in general?  If you 
are seeing it in general, it is likely related to a firmware issue that was 
recently fixed and will be resolved with an updated CP firmware image.
Driver side changes:
https://gitlab.freedesktop.org/agd5f/linux/-/commit/0eb6c664b780dd1b4080e047ad51b100cd7840a3
https://gitlab.freedesktop.org/agd5f/linux/-/commit/40970e60070ed3d1390ec65e38e819f6d81b8f0c

Alex


This problem is not affected by this patch, so it is possibly the firmware issue. 
Where can I get the newest firmware image? Or has it already been pushed to the 
linux-firmware repo?

Best regards,
Shengyu



RE: [PATCH] drm/amdkfd: Don't divide L2 cache by partition mode

2024-02-06 Thread Russell, Kent
[AMD Official Use Only - General]

Oh excellent, it didn't get merged in yet. Time to squash!

 Kent

> -Original Message-
> From: Kuehling, Felix 
> Sent: Tuesday, February 6, 2024 4:29 PM
> To: Russell, Kent ; amd-gfx@lists.freedesktop.org
> Cc: Joshi, Mukul 
> Subject: Re: [PATCH] drm/amdkfd: Don't divide L2 cache by partition mode
>
>
> On 2024-02-06 16:24, Kent Russell wrote:
> > Partition mode only affects L3 cache size. After removing the L2 check in
> > the previous patch, make sure we aren't dividing all cache sizes by
> > partition mode, just L3.
> >
> > Fixes: a75bfb3c4045 ("drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3")
> The fixes tag looks wrong. I can't find the commit a75bfb3c4045
> anywhere. Did your previous patch actually make it into the branch yet?
> Maybe you can still abandon it in Gerrit.
>
> Regards,
>Felix
>
>
>
> > Signed-off-by: Kent Russell 
> > ---
> >   drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 8 
> >   1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > index 64bf2a56f010..533b8292b136 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > @@ -1640,10 +1640,10 @@ static int fill_in_l2_l3_pcache(struct
> kfd_cache_properties **props_ext,
> > else
> > mode = UNKNOWN_MEMORY_PARTITION_MODE;
> >
> > -   if (mode)
> > -   pcache->cache_size =
> pcache_info[cache_type].cache_size / mode;
> > -   else
> > -   pcache->cache_size =
> pcache_info[cache_type].cache_size;
> > +   pcache->cache_size = pcache_info[cache_type].cache_size;
> > +   /* Partition mode only affects L3 cache size */
> > +   if (mode && pcache->cache_level == 3)
> > +   pcache->cache_size /= mode;
> >
> > if (pcache_info[cache_type].flags &
> CRAT_CACHE_FLAGS_DATA_CACHE)
> > pcache->cache_type |= HSA_CACHE_TYPE_DATA;


Re: [PATCH] drm/amdkfd: Don't divide L2 cache by partition mode

2024-02-06 Thread Felix Kuehling



On 2024-02-06 16:24, Kent Russell wrote:

Partition mode only affects L3 cache size. After removing the L2 check in
the previous patch, make sure we aren't dividing all cache sizes by
partition mode, just L3.

Fixes: a75bfb3c4045 ("drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3")
The fixes tag looks wrong. I can't find the commit a75bfb3c4045 
anywhere. Did your previous patch actually make it into the branch yet? 
Maybe you can still abandon it in Gerrit.


Regards,
  Felix




Signed-off-by: Kent Russell 
---
  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 64bf2a56f010..533b8292b136 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1640,10 +1640,10 @@ static int fill_in_l2_l3_pcache(struct 
kfd_cache_properties **props_ext,
else
mode = UNKNOWN_MEMORY_PARTITION_MODE;
  
-		if (mode)
-   pcache->cache_size = pcache_info[cache_type].cache_size 
/ mode;
-   else
-   pcache->cache_size = pcache_info[cache_type].cache_size;
+   pcache->cache_size = pcache_info[cache_type].cache_size;
+   /* Partition mode only affects L3 cache size */
+   if (mode && pcache->cache_level == 3)
+   pcache->cache_size /= mode;
  
  		if (pcache_info[cache_type].flags & CRAT_CACHE_FLAGS_DATA_CACHE)
pcache->cache_type |= HSA_CACHE_TYPE_DATA;


[PATCH] drm/amdkfd: Don't divide L2 cache by partition mode

2024-02-06 Thread Kent Russell
Partition mode only affects L3 cache size. After removing the L2 check in
the previous patch, make sure we aren't dividing all cache sizes by
partition mode, just L3.

Fixes: a75bfb3c4045 ("drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3")
Signed-off-by: Kent Russell 
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 64bf2a56f010..533b8292b136 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1640,10 +1640,10 @@ static int fill_in_l2_l3_pcache(struct 
kfd_cache_properties **props_ext,
else
mode = UNKNOWN_MEMORY_PARTITION_MODE;
 
-   if (mode)
-   pcache->cache_size = pcache_info[cache_type].cache_size 
/ mode;
-   else
-   pcache->cache_size = pcache_info[cache_type].cache_size;
+   pcache->cache_size = pcache_info[cache_type].cache_size;
+   /* Partition mode only affects L3 cache size */
+   if (mode && pcache->cache_level == 3)
+   pcache->cache_size /= mode;
 
if (pcache_info[cache_type].flags & CRAT_CACHE_FLAGS_DATA_CACHE)
pcache->cache_type |= HSA_CACHE_TYPE_DATA;
-- 
2.34.1
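
The size rule this patch introduces can be sketched in isolation. The helper below is a hypothetical stand-in, not the driver's function: the name reported_cache_size and its parameters are assumptions for illustration. Only the rule itself (divide by the partition mode for L3 only, and only when mode is nonzero) comes from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the patched logic in fill_in_l2_l3_pcache():
 * start from the raw size and divide by the memory partition mode only for
 * level-3 caches. A mode of 0 (unknown) leaves the size untouched.
 */
static uint32_t reported_cache_size(uint32_t raw_size, uint32_t cache_level,
				    uint32_t mode)
{
	uint32_t size = raw_size;

	/* Partition mode only affects L3 cache size */
	if (mode && cache_level == 3)
		size /= mode;

	return size;
}
```

With this shape, an L2 entry keeps its raw size regardless of mode, while an L3 entry is divided only when the mode is known.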



Re: [PATCH] drm/amdkfd: Initialize kfd_gpu_cache_info for KFD topology

2024-02-06 Thread Felix Kuehling



On 2024-02-06 15:55, Joseph Greathouse wrote:

The current kfd_gpu_cache_info structure is only partially
filled in for some architectures. This means that for devices
where we do not fill in some fields, we can return
uninitialized values through the KFD topology.
Zero out the kfd_gpu_cache_info before asking the remaining
fields to be filled in by lower-level functions.

Signed-off-by: Joseph Greathouse 


This fixes your previous patch "drm/amdkfd: Add cache line sizes to KFD 
topology". Alex, I think the previous patch hasn't gone upstream yet. Do 
you want a Fixes: tag or is it possible to squash this with Joe's 
previous patch before upstreaming?


One nit-pick below.



---
  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 3df2a8ad86fb..67c1e7f84750 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1707,6 +1707,7 @@ static void kfd_fill_cache_non_crat_info(struct 
kfd_topology_device *dev, struct
  
  	gpu_processor_id = dev->node_props.simd_id_base;
  
+	memset(cache_info, 0, sizeof(struct kfd_gpu_cache_info) * KFD_MAX_CACHE_TYPES);


Just use sizeof(cache_info). No need to calculate the size of the array 
and risk getting it wrong.


Regards,
  Felix



pcache_info = cache_info;
num_of_cache_types = kfd_get_gpu_cache_info(kdev, &pcache_info);
if (!num_of_cache_types) {
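
The nit-pick about sizeof can be illustrated with a standalone sketch. The struct below is a simplified stand-in and the value of KFD_MAX_CACHE_TYPES is assumed; neither is copied from the driver. When cache_info is a true array in the current scope, sizeof(cache_info) already equals the element size times the element count, so both forms zero the same bytes.

```c
#include <stddef.h>
#include <string.h>

#define KFD_MAX_CACHE_TYPES 6	/* assumed value, for illustration only */

/* Simplified stand-in for struct kfd_gpu_cache_info */
struct cache_info_example {
	unsigned int cache_size;
	unsigned int cache_level;
	unsigned int cache_line_size;
};

/* Returns 1 when both size expressions agree for a local array. */
static int sizeof_forms_agree(void)
{
	struct cache_info_example cache_info[KFD_MAX_CACHE_TYPES];

	/* sizeof on the array itself covers every element. */
	memset(cache_info, 0, sizeof(cache_info));

	return sizeof(cache_info) ==
	       sizeof(struct cache_info_example) * KFD_MAX_CACHE_TYPES;
}
```

The shortcut holds only while cache_info is an array in scope. Applied to a pointer (for example a function parameter), sizeof would yield the pointer size instead.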


RE: [PATCH] drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3

2024-02-06 Thread Errabolu, Ramesh
[AMD Official Use Only - General]

Comments inline.

Regards,
Ramesh

-----Original Message-----
From: amd-gfx  On Behalf Of Joshi, Mukul
Sent: Wednesday, February 7, 2024 1:36 AM
To: Russell, Kent ; amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH] drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3

[AMD Official Use Only - General]

The commit description needs a Fixes tag of the offending commit.
With that fixed, this patch is:

Reviewed-by: Mukul Joshi 

> -----Original Message-----
> From: Russell, Kent 
> Sent: Tuesday, February 6, 2024 1:06 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Joshi, Mukul ; Russell, Kent
> 
> Subject: [PATCH] drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3
>
> It's currently incorrectly multiplied by the number of XCCs in the
> partition.
>
> Signed-off-by: Kent Russell 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 3df2a8ad86fb..64bf2a56f010 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -1640,9 +1640,7 @@ static int fill_in_l2_l3_pcache(struct
> kfd_cache_properties **props_ext,
>   else
>   mode = UNKNOWN_MEMORY_PARTITION_MODE;
>
> - if (pcache->cache_level == 2)
> - pcache->cache_size =
> pcache_info[cache_type].cache_size * num_xcc;
> - else if (mode)
> + if (mode)
>   pcache->cache_size =
> pcache_info[cache_type].cache_size / mode;
>   else
>   pcache->cache_size =
> pcache_info[cache_type].cache_size;
Ramesh: By my reading, cache_size is already correct and should be around 
4 MiB. As I understand it, "mode" should not come into play here?
> --
> 2.34.1



[PATCH] drm/amdkfd: Initialize kfd_gpu_cache_info for KFD topology

2024-02-06 Thread Joseph Greathouse
The current kfd_gpu_cache_info structure is only partially
filled in for some architectures. This means that for devices
where we do not fill in some fields, we can return
uninitialized values through the KFD topology.
Zero out the kfd_gpu_cache_info before asking the remaining
fields to be filled in by lower-level functions.

Signed-off-by: Joseph Greathouse 
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 3df2a8ad86fb..67c1e7f84750 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1707,6 +1707,7 @@ static void kfd_fill_cache_non_crat_info(struct 
kfd_topology_device *dev, struct
 
gpu_processor_id = dev->node_props.simd_id_base;
 
+   memset(cache_info, 0, sizeof(struct kfd_gpu_cache_info) * 
KFD_MAX_CACHE_TYPES);
pcache_info = cache_info;
num_of_cache_types = kfd_get_gpu_cache_info(kdev, &pcache_info);
if (!num_of_cache_types) {
-- 
2.20.1
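
The failure mode described in the commit message can be sketched outside the kernel. Everything below is hypothetical: the struct, its field names, and the fill_partial helper are stand-ins for a lower-level function that sets only some fields. The sketch shows why zeroing first makes the untouched fields well-defined.

```c
#include <string.h>

/* Simplified stand-in; field names are illustrative only */
struct cache_desc {
	unsigned int cache_size;
	unsigned int cache_line_size;
};

/* Hypothetical lower-level filler that only sets cache_size */
static void fill_partial(struct cache_desc *d)
{
	d->cache_size = 16;
}

/* Returns the line size seen by a consumer after a partial fill. */
static unsigned int line_size_after_fill(void)
{
	struct cache_desc d;

	memset(&d, 0, sizeof(d));	/* every field starts at 0 */
	fill_partial(&d);		/* leaves cache_line_size untouched */

	return d.cache_line_size;	/* well-defined: still 0 */
}
```

Without the memset, reading cache_line_size here would read an indeterminate value, which is the kind of value the patch keeps out of the topology.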



[PATCH 2/3] drm/amdgpu: Add hdp v7_0 ip block support

2024-02-06 Thread Alex Deucher
From: Likun Gao 

Add hdp v7_0 ip block support.

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |   2 +-
 drivers/gpu/drm/amd/amdgpu/hdp_v7_0.c | 142 ++
 drivers/gpu/drm/amd/amdgpu/hdp_v7_0.h |  31 ++
 3 files changed, 174 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/hdp_v7_0.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/hdp_v7_0.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 9bc5f3dde442..87022325bbf7 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -98,7 +98,7 @@ amdgpu-y += \
vega20_reg_init.o nbio_v7_4.o nbio_v2_3.o nv.o arct_reg_init.o 
mxgpu_nv.o \
nbio_v7_2.o hdp_v4_0.o hdp_v5_0.o aldebaran_reg_init.o aldebaran.o 
soc21.o \
sienna_cichlid.o smu_v13_0_10.o nbio_v4_3.o hdp_v6_0.o nbio_v7_7.o 
hdp_v5_2.o lsdma_v6_0.o \
-   nbio_v7_9.o aqua_vanjaram.o nbio_v7_11.o lsdma_v7_0.o
+   nbio_v7_9.o aqua_vanjaram.o nbio_v7_11.o lsdma_v7_0.o hdp_v7_0.o
 
 # add DF block
 amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/hdp_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/hdp_v7_0.c
new file mode 100644
index 000000000000..8d7d0813e331
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/hdp_v7_0.c
@@ -0,0 +1,142 @@
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include "amdgpu.h"
+#include "amdgpu_atombios.h"
+#include "hdp_v7_0.h"
+
+#include "hdp/hdp_7_0_0_offset.h"
+#include "hdp/hdp_7_0_0_sh_mask.h"
+#include 
+
+static void hdp_v7_0_flush_hdp(struct amdgpu_device *adev,
+   struct amdgpu_ring *ring)
+{
+   if (!ring || !ring->funcs->emit_wreg)
+   WREG32_NO_KIQ((adev->rmmio_remap.reg_offset + 
KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0);
+   else
+   amdgpu_ring_emit_wreg(ring, (adev->rmmio_remap.reg_offset + 
KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0);
+}
+
+static void hdp_v7_0_update_clock_gating(struct amdgpu_device *adev,
+bool enable)
+{
+   uint32_t hdp_clk_cntl, hdp_clk_cntl1;
+   uint32_t hdp_mem_pwr_cntl;
+
+   if (!(adev->cg_flags & (AMD_CG_SUPPORT_HDP_LS |
+   AMD_CG_SUPPORT_HDP_DS |
+   AMD_CG_SUPPORT_HDP_SD)))
+   return;
+
+   hdp_clk_cntl = hdp_clk_cntl1 = RREG32_SOC15(HDP, 0,regHDP_CLK_CNTL);
+   hdp_mem_pwr_cntl = RREG32_SOC15(HDP, 0, regHDP_MEM_POWER_CTRL);
+
+   /* Before doing clock/power mode switch,
+* forced on IPH & RC clock */
+   hdp_clk_cntl = REG_SET_FIELD(hdp_clk_cntl, HDP_CLK_CNTL,
+RC_MEM_CLK_SOFT_OVERRIDE, 1);
+   WREG32_SOC15(HDP, 0, regHDP_CLK_CNTL, hdp_clk_cntl);
+
+   /* disable clock and power gating before any changing */
+   hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl, HDP_MEM_POWER_CTRL,
+ATOMIC_MEM_POWER_CTRL_EN, 0);
+   hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl, HDP_MEM_POWER_CTRL,
+ATOMIC_MEM_POWER_LS_EN, 0);
+   hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl, HDP_MEM_POWER_CTRL,
+ATOMIC_MEM_POWER_DS_EN, 0);
+   hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl, HDP_MEM_POWER_CTRL,
+ATOMIC_MEM_POWER_SD_EN, 0);
+   hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl, HDP_MEM_POWER_CTRL,
+RC_MEM_POWER_CTRL_EN, 0);
+   hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl, HDP_MEM_POWER_CTRL,
+RC_MEM_POWER_LS_EN, 0);
+   hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl, HDP_MEM_POWER_CTRL,
+ 

[PATCH 3/3] drm/amdgpu/discovery: Add hdp v7_0 ip block

2024-02-06 Thread Alex Deucher
From: Likun Gao 

Add hdp v7_0 ip block

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index c4370f154e8b..59530fe36b6b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -64,6 +64,7 @@
 #include "hdp_v5_0.h"
 #include "hdp_v5_2.h"
 #include "hdp_v6_0.h"
+#include "hdp_v7_0.h"
 #include "nv.h"
 #include "soc21.h"
 #include "navi10_ih.h"
@@ -2569,6 +2570,9 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device 
*adev)
case IP_VERSION(6, 1, 0):
adev->hdp.funcs = &hdp_v6_0_funcs;
break;
+   case IP_VERSION(7, 0, 0):
+   adev->hdp.funcs = &hdp_v7_0_funcs;
+   break;
default:
break;
}
-- 
2.42.0
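
The hunk above extends a switch that selects a function table by IP version. A minimal standalone sketch of that pattern follows. The EX_IP_VERSION packing, the ex_hdp_funcs type, and the table names are hypothetical stand-ins, not the driver's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packing of major/minor/revision into one integer */
#define EX_IP_VERSION(mj, mn, rv) \
	(((uint32_t)(mj) << 16) | ((uint32_t)(mn) << 8) | (uint32_t)(rv))

/* Stand-in for a per-version function table */
struct ex_hdp_funcs {
	const char *name;
};

static const struct ex_hdp_funcs ex_hdp_v6_0_funcs = { "hdp_v6_0" };
static const struct ex_hdp_funcs ex_hdp_v7_0_funcs = { "hdp_v7_0" };

/* Returns the matching table, or NULL for an unrecognized version. */
static const struct ex_hdp_funcs *ex_pick_hdp_funcs(uint32_t ip_version)
{
	switch (ip_version) {
	case EX_IP_VERSION(6, 1, 0):
		return &ex_hdp_v6_0_funcs;
	case EX_IP_VERSION(7, 0, 0):
		return &ex_hdp_v7_0_funcs;
	default:
		return NULL;
	}
}
```

Adding a new hardware generation then amounts to one new case label and one new table, which matches the shape of the four-line patch.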



[PATCH 0/3] HDP 7.0 Support

2024-02-06 Thread Alex Deucher
This series adds support for HDP 7.0.  HDP (Host Data Path)
provides CPU access to device memory via the PCI BAR.

Patch 1 adds the register headers and is very large, so I've
omitted it.

Hawking Zhang (1):
  drm/amdgpu: Add hdp v7_0_0 ip headers (v3)

Likun Gao (2):
  drm/amdgpu: Add hdp v7_0 ip block support
  drm/amdgpu/discovery: Add hdp v7_0 ip block

 drivers/gpu/drm/amd/amdgpu/Makefile   |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c |   4 +
 drivers/gpu/drm/amd/amdgpu/hdp_v7_0.c | 142 
 drivers/gpu/drm/amd/amdgpu/hdp_v7_0.h |  31 +
 .../include/asic_reg/hdp/hdp_7_0_0_offset.h   | 219 ++
 .../include/asic_reg/hdp/hdp_7_0_0_sh_mask.h  | 735 ++
 6 files changed, 1132 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/hdp_v7_0.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/hdp_v7_0.h
 create mode 100644 drivers/gpu/drm/amd/include/asic_reg/hdp/hdp_7_0_0_offset.h
 create mode 100644 drivers/gpu/drm/amd/include/asic_reg/hdp/hdp_7_0_0_sh_mask.h

-- 
2.42.0



[PATCH 3/3] drm/amdgpu/discovery: Add ih v7_0 ip block

2024-02-06 Thread Alex Deucher
From: Likun Gao 

Add ih v7_0 ip block.

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index e941aeb6f16a..c4370f154e8b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -69,6 +69,7 @@
 #include "navi10_ih.h"
 #include "ih_v6_0.h"
 #include "ih_v6_1.h"
+#include "ih_v7_0.h"
 #include "gfx_v10_0.h"
 #include "gfx_v11_0.h"
 #include "sdma_v5_0.h"
@@ -1768,6 +1769,9 @@ static int amdgpu_discovery_set_ih_ip_blocks(struct 
amdgpu_device *adev)
case IP_VERSION(6, 1, 0):
amdgpu_device_ip_block_add(adev, &ih_v6_1_ip_block);
break;
+   case IP_VERSION(7, 0, 0):
+   amdgpu_device_ip_block_add(adev, &ih_v7_0_ip_block);
+   break;
default:
dev_err(adev->dev,
"Failed to add ih ip block(OSSSYS_HWIP:0x%x)\n",
-- 
2.42.0



[PATCH 2/3] drm/amdgpu: Add ih v7_0 ip block support

2024-02-06 Thread Alex Deucher
From: Likun Gao 

Add ih v7_0 ip block support.

Signed-off-by: Likun Gao 
Signed-off-by: Hawking Zhang 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/Makefile  |   3 +-
 drivers/gpu/drm/amd/amdgpu/ih_v7_0.c | 766 +++
 drivers/gpu/drm/amd/amdgpu/ih_v7_0.h |  28 +
 3 files changed, 796 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/ih_v7_0.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/ih_v7_0.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 3f7de16e0dc4..9bc5f3dde442 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -132,7 +132,8 @@ amdgpu-y += \
vega20_ih.o \
navi10_ih.o \
ih_v6_0.o \
-   ih_v6_1.o
+   ih_v6_1.o \
+   ih_v7_0.o
 
 # add PSP block
 amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/ih_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/ih_v7_0.c
new file mode 100644
index 000000000000..236806797b23
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/ih_v7_0.c
@@ -0,0 +1,766 @@
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include 
+
+#include "amdgpu.h"
+#include "amdgpu_ih.h"
+
+#include "oss/osssys_7_0_0_offset.h"
+#include "oss/osssys_7_0_0_sh_mask.h"
+
+#include "soc15_common.h"
+#include "ih_v7_0.h"
+
+#define MAX_REARM_RETRY 10
+
+static void ih_v7_0_set_interrupt_funcs(struct amdgpu_device *adev);
+
+/**
+ * ih_v7_0_init_register_offset - Initialize register offset for ih rings
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Initialize register offset ih rings (IH_V7_0).
+ */
+static void ih_v7_0_init_register_offset(struct amdgpu_device *adev)
+{
+   struct amdgpu_ih_regs *ih_regs;
+
+   /* ih ring 2 is removed
+* ih ring and ih ring 1 are available */
+   if (adev->irq.ih.ring_size) {
+   ih_regs = &adev->irq.ih.ih_regs;
+   ih_regs->ih_rb_base = SOC15_REG_OFFSET(OSSSYS, 0, 
regIH_RB_BASE);
+   ih_regs->ih_rb_base_hi = SOC15_REG_OFFSET(OSSSYS, 0, 
regIH_RB_BASE_HI);
+   ih_regs->ih_rb_cntl = SOC15_REG_OFFSET(OSSSYS, 0, 
regIH_RB_CNTL);
+   ih_regs->ih_rb_wptr = SOC15_REG_OFFSET(OSSSYS, 0, 
regIH_RB_WPTR);
+   ih_regs->ih_rb_rptr = SOC15_REG_OFFSET(OSSSYS, 0, 
regIH_RB_RPTR);
+   ih_regs->ih_doorbell_rptr = SOC15_REG_OFFSET(OSSSYS, 0, 
regIH_DOORBELL_RPTR);
+   ih_regs->ih_rb_wptr_addr_lo = SOC15_REG_OFFSET(OSSSYS, 0, 
regIH_RB_WPTR_ADDR_LO);
+   ih_regs->ih_rb_wptr_addr_hi = SOC15_REG_OFFSET(OSSSYS, 0, 
regIH_RB_WPTR_ADDR_HI);
+   ih_regs->psp_reg_id = PSP_REG_IH_RB_CNTL;
+   }
+
+   if (adev->irq.ih1.ring_size) {
+   ih_regs = &adev->irq.ih1.ih_regs;
+   ih_regs->ih_rb_base = SOC15_REG_OFFSET(OSSSYS, 0, 
regIH_RB_BASE_RING1);
+   ih_regs->ih_rb_base_hi = SOC15_REG_OFFSET(OSSSYS, 0, 
regIH_RB_BASE_HI_RING1);
+   ih_regs->ih_rb_cntl = SOC15_REG_OFFSET(OSSSYS, 0, 
regIH_RB_CNTL_RING1);
+   ih_regs->ih_rb_wptr = SOC15_REG_OFFSET(OSSSYS, 0, 
regIH_RB_WPTR_RING1);
+   ih_regs->ih_rb_rptr = SOC15_REG_OFFSET(OSSSYS, 0, 
regIH_RB_RPTR_RING1);
+   ih_regs->ih_doorbell_rptr = SOC15_REG_OFFSET(OSSSYS, 0, 
regIH_DOORBELL_RPTR_RING1);
+   ih_regs->psp_reg_id = PSP_REG_IH_RB_CNTL_RING1;
+   }
+}
+
+/**
+ * force_update_wptr_for_self_int - Force update the wptr for self interrupt
+ *
+ * @adev: amdgpu_device pointer
+ * @threshold: threshold to trigger the wptr reporting
+ * @timeout: timeout to trigger the wptr reporting
+ * @enabled: Enable/disable timeout flush mechanism
+ *
+ * threshold input range: 0 ~ 15, default 0,
+ * real_threshold = 2^threshold
+ * timeout input range: 0 ~ 20, default 8,
+ * real_timeout = (2^timeout) * 1024 / (socclk_freq)
+ *
+ * Force 

[PATCH 0/3] IH 7.0 support

2024-02-06 Thread Alex Deucher
This series adds support for IH 7.0.x.  IH is the interrupt handler
on the GPU.  Interrupts are written to a ring buffer and the driver
walks the ring buffer handling the interrupt packets.

Patch 1 adds the new register headers and is very large, so I've
omitted it.

Hawking Zhang (1):
  drm/amdgpu: Add osssys v7_0_0 ip headers (v4)

Likun Gao (2):
  drm/amdgpu: Add ih v7_0 ip block support
  drm/amdgpu/discovery: Add ih v7_0 ip block

 drivers/gpu/drm/amd/amdgpu/Makefile   |3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c |4 +
 drivers/gpu/drm/amd/amdgpu/ih_v7_0.c  |  766 
 drivers/gpu/drm/amd/amdgpu/ih_v7_0.h  |   28 +
 .../asic_reg/oss/osssys_7_0_0_offset.h|  279 +
 .../asic_reg/oss/osssys_7_0_0_sh_mask.h   | 1029 +
 6 files changed, 2108 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/ih_v7_0.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/ih_v7_0.h
 create mode 100644 
drivers/gpu/drm/amd/include/asic_reg/oss/osssys_7_0_0_offset.h
 create mode 100644 
drivers/gpu/drm/amd/include/asic_reg/oss/osssys_7_0_0_sh_mask.h

-- 
2.42.0
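
The ring-buffer walk described in this cover letter can be sketched as a small simulation. All names below (ex_ih_ring, ex_ring_push, ex_ring_process) and the ring size are hypothetical. Real hardware writes multi-dword packets and signals overflow, neither of which is modeled here.

```c
#include <stdint.h>

#define EX_RING_ENTRIES 8	/* must be a power of two */

struct ex_ih_ring {
	uint32_t entries[EX_RING_ENTRIES];
	uint32_t rptr;	/* next entry the driver will read */
	uint32_t wptr;	/* next entry the hardware will write */
};

/* Simulates the hardware appending one interrupt packet. */
static void ex_ring_push(struct ex_ih_ring *r, uint32_t packet)
{
	r->entries[r->wptr] = packet;
	r->wptr = (r->wptr + 1) & (EX_RING_ENTRIES - 1);
}

/*
 * Walks from rptr to wptr, "handling" each packet by adding it to *sum.
 * Returns how many packets were handled.
 */
static unsigned int ex_ring_process(struct ex_ih_ring *r, uint32_t *sum)
{
	unsigned int handled = 0;

	while (r->rptr != r->wptr) {
		*sum += r->entries[r->rptr];
		r->rptr = (r->rptr + 1) & (EX_RING_ENTRIES - 1);
		handled++;
	}
	return handled;
}
```

The driver side only ever advances rptr and the producer only ever advances wptr, so an empty ring is simply the state where the two are equal.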



[PATCH 2/3] drm/amdgpu: Add lsdma v7_0 ip block support

2024-02-06 Thread Alex Deucher
From: Likun Gao 

Add lsdma v7_0 ip block support.

Signed-off-by: Likun Gao 
Signed-off-by: Hawking Zhang 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/Makefile |   2 +-
 drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.c | 121 
 drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.h |  31 ++
 3 files changed, 153 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 1b04bae60fbf..3f7de16e0dc4 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -98,7 +98,7 @@ amdgpu-y += \
vega20_reg_init.o nbio_v7_4.o nbio_v2_3.o nv.o arct_reg_init.o 
mxgpu_nv.o \
nbio_v7_2.o hdp_v4_0.o hdp_v5_0.o aldebaran_reg_init.o aldebaran.o 
soc21.o \
sienna_cichlid.o smu_v13_0_10.o nbio_v4_3.o hdp_v6_0.o nbio_v7_7.o 
hdp_v5_2.o lsdma_v6_0.o \
-   nbio_v7_9.o aqua_vanjaram.o nbio_v7_11.o
+   nbio_v7_9.o aqua_vanjaram.o nbio_v7_11.o lsdma_v7_0.o
 
 # add DF block
 amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.c
new file mode 100644
index 000000000000..396262044ea8
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.c
@@ -0,0 +1,121 @@
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include 
+#include "amdgpu.h"
+#include "lsdma_v7_0.h"
+#include "amdgpu_lsdma.h"
+
+#include "lsdma/lsdma_7_0_0_offset.h"
+#include "lsdma/lsdma_7_0_0_sh_mask.h"
+
+static int lsdma_v7_0_wait_pio_status(struct amdgpu_device *adev)
+{
+   return amdgpu_lsdma_wait_for(adev, SOC15_REG_OFFSET(LSDMA, 0, 
regLSDMA_PIO_STATUS),
+   LSDMA_PIO_STATUS__PIO_IDLE_MASK | 
LSDMA_PIO_STATUS__PIO_FIFO_EMPTY_MASK,
+   LSDMA_PIO_STATUS__PIO_IDLE_MASK | 
LSDMA_PIO_STATUS__PIO_FIFO_EMPTY_MASK);
+}
+
+static int lsdma_v7_0_copy_mem(struct amdgpu_device *adev,
+  uint64_t src_addr,
+  uint64_t dst_addr,
+  uint64_t size)
+{
+   int ret;
+   uint32_t tmp;
+
+   WREG32_SOC15(LSDMA, 0, regLSDMA_PIO_SRC_ADDR_LO, 
lower_32_bits(src_addr));
+   WREG32_SOC15(LSDMA, 0, regLSDMA_PIO_SRC_ADDR_HI, 
upper_32_bits(src_addr));
+
+   WREG32_SOC15(LSDMA, 0, regLSDMA_PIO_DST_ADDR_LO, 
lower_32_bits(dst_addr));
+   WREG32_SOC15(LSDMA, 0, regLSDMA_PIO_DST_ADDR_HI, 
upper_32_bits(dst_addr));
+
+   WREG32_SOC15(LSDMA, 0, regLSDMA_PIO_CONTROL, 0x0);
+
+   tmp = RREG32_SOC15(LSDMA, 0, regLSDMA_PIO_COMMAND);
+   tmp = REG_SET_FIELD(tmp, LSDMA_PIO_COMMAND, BYTE_COUNT, size);
+   tmp = REG_SET_FIELD(tmp, LSDMA_PIO_COMMAND, SRC_LOCATION, 0);
+   tmp = REG_SET_FIELD(tmp, LSDMA_PIO_COMMAND, DST_LOCATION, 0);
+   tmp = REG_SET_FIELD(tmp, LSDMA_PIO_COMMAND, SRC_ADDR_INC, 0);
+   tmp = REG_SET_FIELD(tmp, LSDMA_PIO_COMMAND, DST_ADDR_INC, 0);
+   tmp = REG_SET_FIELD(tmp, LSDMA_PIO_COMMAND, OVERLAP_DISABLE, 0);
+   tmp = REG_SET_FIELD(tmp, LSDMA_PIO_COMMAND, CONSTANT_FILL, 0);
+   WREG32_SOC15(LSDMA, 0, regLSDMA_PIO_COMMAND, tmp);
+
+   ret = lsdma_v7_0_wait_pio_status(adev);
+   if (ret)
+   dev_err(adev->dev, "LSDMA PIO failed to copy memory!\n");
+
+   return ret;
+}
+
+static int lsdma_v7_0_fill_mem(struct amdgpu_device *adev,
+  uint64_t dst_addr,
+  uint32_t data,
+  uint64_t size)
+{
+   int ret;
+   uint32_t tmp;
+
+   WREG32_SOC15(LSDMA, 0, regLSDMA_PIO_CONSTFILL_DATA, data);
+
+   WREG32_SOC15(LSDMA, 0, regLSDMA_PIO_DST_ADDR_LO, 
lower_32_bits(dst_addr));
+   WREG32_SOC15(LSDMA, 0, regLSDMA_PIO_DST_ADDR_HI, 

[PATCH 3/3] drm/amdgpu/discovery: Add lsdma v7_0 ip block

2024-02-06 Thread Alex Deucher
From: Likun Gao 

Add lsdma v7_0 ip block.

v2: squash in updates (Alex)

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 93c84a1c1d3e..e941aeb6f16a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -75,6 +75,7 @@
 #include "sdma_v5_2.h"
 #include "sdma_v6_0.h"
 #include "lsdma_v6_0.h"
+#include "lsdma_v7_0.h"
 #include "vcn_v2_0.h"
 #include "jpeg_v2_0.h"
 #include "vcn_v3_0.h"
@@ -2641,6 +2642,10 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device 
*adev)
case IP_VERSION(6, 0, 3):
adev->lsdma.funcs = &lsdma_v6_0_funcs;
break;
+   case IP_VERSION(7, 0, 0):
+   case IP_VERSION(7, 0, 1):
+   adev->lsdma.funcs = &lsdma_v7_0_funcs;
+   break;
default:
break;
}
-- 
2.42.0



[PATCH 0/3] LSDMA 7.0 support

2024-02-06 Thread Alex Deucher
LSDMA (Light SDMA) is a general purpose SDMA engine on the GPU.
The driver uses it for MMIO-controlled DMA access to GPU
accessible memory.  This adds support for ASICs containing
LSDMA version 7.0.x.

The first patch adds the register headers and is very large, so I've
omitted it.

Hawking Zhang (1):
  drm/amdgpu: Add lsdma v7_0_0 ip headers (v3)

Likun Gao (2):
  drm/amdgpu: Add lsdma v7_0 ip block support
  drm/amdgpu/discovery: Add lsdma v7_0 ip block

 drivers/gpu/drm/amd/amdgpu/Makefile   |2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c |5 +
 drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.c   |  121 ++
 drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.h   |   31 +
 .../asic_reg/lsdma/lsdma_7_0_0_offset.h   |  388 +
 .../asic_reg/lsdma/lsdma_7_0_0_sh_mask.h  | 1411 +
 6 files changed, 1957 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.h
 create mode 100644 
drivers/gpu/drm/amd/include/asic_reg/lsdma/lsdma_7_0_0_offset.h
 create mode 100644 
drivers/gpu/drm/amd/include/asic_reg/lsdma/lsdma_7_0_0_sh_mask.h

-- 
2.42.0



[PATCH 2/2] drm/amdgpu: Add athub v4_1_0 ip block support

2024-02-06 Thread Alex Deucher
From: Hawking Zhang 

Add athub v4_1_0 ip block support.

Signed-off-by: Hawking Zhang 
Reviewed-by: Likun Gao 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |   3 +-
 drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.c | 121 ++
 drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.h |  30 ++
 3 files changed, 153 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 4c989da4d2f3..1b04bae60fbf 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -233,7 +233,8 @@ amdgpu-y += \
athub_v1_0.o \
athub_v2_0.o \
athub_v2_1.o \
-   athub_v3_0.o
+   athub_v3_0.o \
+   athub_v4_1_0.o
 
 # add SMUIO block
 amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.c 
b/drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.c
new file mode 100644
index 000000000000..14f0a63cfb45
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.c
@@ -0,0 +1,121 @@
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "amdgpu.h"
+#include "athub_v4_1_0.h"
+#include "athub/athub_4_1_0_offset.h"
+#include "athub/athub_4_1_0_sh_mask.h"
+#include "soc15_common.h"
+
+static uint32_t athub_v4_1_0_get_cg_cntl(struct amdgpu_device *adev)
+{
+   uint32_t data;
+
+   switch (amdgpu_ip_version(adev, ATHUB_HWIP, 0)) {
+   case IP_VERSION(4, 1, 0):
+   data = RREG32_SOC15(ATHUB, 0, regATHUB_MISC_CNTL);
+   break;
+   default:
+   break;
+   }
+   return data;
+}
+
+static void athub_v4_1_0_set_cg_cntl(struct amdgpu_device *adev, uint32_t data)
+{
+   switch (amdgpu_ip_version(adev, ATHUB_HWIP, 0)) {
+   case IP_VERSION(4, 1, 0):
+   WREG32_SOC15(ATHUB, 0, regATHUB_MISC_CNTL, data);
+   break;
+   default:
+   break;
+   }
+}
+
+static void
+athub_v4_1_0_update_medium_grain_clock_gating(struct amdgpu_device *adev,
+ bool enable)
+{
+   uint32_t def, data;
+
+   def = data = athub_v4_1_0_get_cg_cntl(adev);
+
+   if (enable && (adev->cg_flags & AMD_CG_SUPPORT_ATHUB_MGCG))
+   data |= ATHUB_MISC_CNTL__CG_ENABLE_MASK;
+   else
+   data &= ~ATHUB_MISC_CNTL__CG_ENABLE_MASK;
+
+   if (def != data)
+   athub_v4_1_0_set_cg_cntl(adev, data);
+}
+
+static void
+athub_v4_1_0_update_medium_grain_light_sleep(struct amdgpu_device *adev,
+bool enable)
+{
+   uint32_t def, data;
+
+   def = data = athub_v4_1_0_get_cg_cntl(adev);
+
+   if (enable && (adev->cg_flags & AMD_CG_SUPPORT_ATHUB_LS))
+   data |= ATHUB_MISC_CNTL__CG_MEM_LS_ENABLE_MASK;
+   else
+   data &= ~ATHUB_MISC_CNTL__CG_MEM_LS_ENABLE_MASK;
+
+   if (def != data)
+   athub_v4_1_0_set_cg_cntl(adev, data);
+}
+
+int athub_v4_1_0_set_clockgating(struct amdgpu_device *adev,
+enum amd_clockgating_state state)
+{
+   if (amdgpu_sriov_vf(adev))
+   return 0;
+
+   switch (amdgpu_ip_version(adev, ATHUB_HWIP, 0)) {
+   case IP_VERSION(4, 1, 0):
+   athub_v4_1_0_update_medium_grain_clock_gating(adev,
+   state == AMD_CG_STATE_GATE);
+   athub_v4_1_0_update_medium_grain_light_sleep(adev,
+   state == AMD_CG_STATE_GATE);
+   break;
+   default:
+   break;
+   }
+
+   return 0;
+}
+
+void athub_v4_1_0_get_clockgating(struct amdgpu_device *adev, u64 *flags)
+{
+   int data;
+
+   /* AMD_CG_SUPPORT_ATHUB_MGCG */
+   data = 

[PATCH 0/2] Add ATHUB 4.1 support

2024-02-06 Thread Alex Deucher
This adds support for ATHUB 4.1.x.  The driver's
interaction with this hardware is largely limited
to enabling clockgating features.

The first patch just adds the register headers and is
large, so I've omitted it.

Hawking Zhang (2):
  drm/amdgpu: Add athub v4_1_0 ip headers (v5)
  drm/amdgpu: Add athub v4_1_0 ip block support

 drivers/gpu/drm/amd/amdgpu/Makefile   |3 +-
 drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.c |  121 ++
 drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.h |   30 +
 .../asic_reg/athub/athub_4_1_0_offset.h   |  287 
 .../asic_reg/athub/athub_4_1_0_sh_mask.h  | 1348 +
 5 files changed, 1788 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.h
 create mode 100644 
drivers/gpu/drm/amd/include/asic_reg/athub/athub_4_1_0_offset.h
 create mode 100644 
drivers/gpu/drm/amd/include/asic_reg/athub/athub_4_1_0_sh_mask.h

-- 
2.42.0
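
Editor's note: the clockgating helpers in this series all follow the same
read-modify-write shape: read the control register, set or clear one bit
depending on the requested state and the capability flag, and write back only
if the value changed. A minimal sketch of that shape follows. The fake_hub
type, the bit positions and the SUPPORT_* flags are hypothetical stand-ins,
not the driver's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this sketch. */
#define CG_ENABLE_MASK        (1u << 0)
#define CG_MEM_LS_ENABLE_MASK (1u << 1)

/* Hypothetical capability flags standing in for the cg_flags bits. */
#define SUPPORT_MGCG (1u << 0)
#define SUPPORT_LS   (1u << 1)

/* A fake register plus a write counter so the effect is observable. */
struct fake_hub {
    uint32_t cg_cntl;
    unsigned int writes;
};

/* Set or clear `mask`, and write back only when the value changed. */
static void update_gate_bit(struct fake_hub *hub, uint32_t mask, bool on)
{
    uint32_t def = hub->cg_cntl;
    uint32_t data = on ? (def | mask) : (def & ~mask);

    if (def != data) {
        hub->cg_cntl = data;
        hub->writes++;
    }
}

/* Gate both features, each only if the matching capability flag is set. */
static void set_clockgating(struct fake_hub *hub, uint32_t cg_flags, bool gate)
{
    update_gate_bit(hub, CG_ENABLE_MASK,
                    gate && (cg_flags & SUPPORT_MGCG) != 0);
    update_gate_bit(hub, CG_MEM_LS_ENABLE_MASK,
                    gate && (cg_flags & SUPPORT_LS) != 0);
}
```

Skipping the write when nothing changed avoids redundant register traffic,
which is presumably why the patch compares def against data before writing.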



[PATCH] drm/amdgpu: skip ucode bo reserve for RLC AUTOLOAD

2024-02-06 Thread Alex Deucher
From: Likun Gao 

Skip ucode BO reservation for backdoor RLC autoload.

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
index 3e12763e477a..afa3ac931638 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
@@ -1060,7 +1060,8 @@ static int amdgpu_ucode_patch_jt(struct 
amdgpu_firmware_info *ucode,
 
 int amdgpu_ucode_create_bo(struct amdgpu_device *adev)
 {
-   if (adev->firmware.load_type != AMDGPU_FW_LOAD_DIRECT) {
+   if ((adev->firmware.load_type != AMDGPU_FW_LOAD_DIRECT) &&
+   (adev->firmware.load_type != AMDGPU_FW_LOAD_RLC_BACKDOOR_AUTO)) {
amdgpu_bo_create_kernel(adev, adev->firmware.fw_size, PAGE_SIZE,
(amdgpu_sriov_vf(adev) || adev->debug_use_vram_fw_buf) ?
AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
-- 
2.42.0
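
Editor's note: the condition this patch introduces can be read as a small
predicate: a firmware buffer object is reserved unless the load type is
direct or backdoor RLC autoload. The sketch below assumes a simplified enum;
the names and numeric values are illustrative and are not the driver's.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's firmware load-type enum. */
enum fw_load_type {
    FW_LOAD_DIRECT,
    FW_LOAD_PSP,
    FW_LOAD_RLC_BACKDOOR_AUTO,
};

/* Returns true when a firmware buffer object should be reserved. */
static bool needs_ucode_bo(enum fw_load_type type)
{
    return type != FW_LOAD_DIRECT && type != FW_LOAD_RLC_BACKDOOR_AUTO;
}
```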



[PATCH] drm/amdkfd: fill in data for control stack header for gfx10

2024-02-06 Thread Alex Deucher
From: Jonathan Kim 

The debugger requires the control stack header to be filled in to
update_waves.

Signed-off-by: Jonathan Kim 
Reviewed-by: Felix Kuehling 
Signed-off-by: Jonathan Kim 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h | 25 
 1 file changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h
index 57bf5e513f4d..e5cc697a3ca8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h
@@ -128,6 +128,31 @@ struct mqd_manager {
uint32_t mqd_size;
 };
 
+struct mqd_user_context_save_area_header {
+   /* Byte offset from start of user context
+* save area to the last saved top (lowest
+* address) of control stack data. Must be
+* 4 byte aligned.
+*/
+   uint32_t control_stack_offset;
+
+   /* Byte size of the last saved control stack
+* data. Must be 4 byte aligned.
+*/
+   uint32_t control_stack_size;
+
+   /* Byte offset from start of user context save
+* area to the last saved base (lowest address)
+* of wave state data. Must be 4 byte aligned.
+*/
+   uint32_t wave_state_offset;
+
+   /* Byte size of the last saved wave state data.
+* Must be 4 byte aligned.
+*/
+   uint32_t wave_state_size;
+};
+
 struct kfd_mem_obj *allocate_hiq_mqd(struct kfd_node *dev,
struct queue_properties *q);
 
-- 
2.42.0
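
Editor's note: the comments in the new struct require every field to be
4-byte aligned. A sketch of how a caller might fill such a header while
enforcing that rule is shown below. The struct is a local copy for
illustration, and fill_header() is a hypothetical helper, not part of the
patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copy of the header layout described in the patch. */
struct save_area_header {
    uint32_t control_stack_offset;
    uint32_t control_stack_size;
    uint32_t wave_state_offset;
    uint32_t wave_state_size;
};

static bool is_dword_aligned(uint32_t v)
{
    return (v & 3u) == 0;
}

/*
 * Hypothetical helper: fill the header, refusing any value that is not
 * 4-byte aligned. Leaves *hdr untouched on failure.
 */
static bool fill_header(struct save_area_header *hdr,
                        uint32_t cs_off, uint32_t cs_size,
                        uint32_t ws_off, uint32_t ws_size)
{
    if (!is_dword_aligned(cs_off) || !is_dword_aligned(cs_size) ||
        !is_dword_aligned(ws_off) || !is_dword_aligned(ws_size))
        return false;

    hdr->control_stack_offset = cs_off;
    hdr->control_stack_size = cs_size;
    hdr->wave_state_offset = ws_off;
    hdr->wave_state_size = ws_size;
    return true;
}
```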



[PATCH] drm/amd/swsmu: add judgement for vcn jpeg dpm set

2024-02-06 Thread Alex Deucher
From: Likun Gao 

Only enable VCN/JPEG DPM when the corresponding VCN/JPEG PG flag
is set while the SMU sets up the default DPM table.

Signed-off-by: Likun Gao 
Reviewed-by: Kenneth Feng 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 30 +++
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 0ad947df777a..3d72c945cf56 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -751,6 +751,7 @@ static int smu_early_init(void *handle)
 
 static int smu_set_default_dpm_table(struct smu_context *smu)
 {
+   struct amdgpu_device *adev = smu->adev;
struct smu_power_context *smu_power = &smu->smu_power;
struct smu_power_gate *power_gate = &smu_power->power_gate;
int vcn_gate, jpeg_gate;
@@ -759,25 +760,34 @@ static int smu_set_default_dpm_table(struct smu_context 
*smu)
if (!smu->ppt_funcs->set_default_dpm_table)
return 0;
 
-   vcn_gate = atomic_read(&power_gate->vcn_gated);
-   jpeg_gate = atomic_read(&power_gate->jpeg_gated);
+   if (adev->pg_flags & AMD_PG_SUPPORT_VCN)
+   vcn_gate = atomic_read(&power_gate->vcn_gated);
+   if (adev->pg_flags & AMD_PG_SUPPORT_JPEG)
+   jpeg_gate = atomic_read(&power_gate->jpeg_gated);
 
-   ret = smu_dpm_set_vcn_enable(smu, true);
-   if (ret)
-   return ret;
+   if (adev->pg_flags & AMD_PG_SUPPORT_VCN) {
+   ret = smu_dpm_set_vcn_enable(smu, true);
+   if (ret)
+   return ret;
+   }
 
-   ret = smu_dpm_set_jpeg_enable(smu, true);
-   if (ret)
-   goto err_out;
+   if (adev->pg_flags & AMD_PG_SUPPORT_JPEG) {
+   ret = smu_dpm_set_jpeg_enable(smu, true);
+   if (ret)
+   goto err_out;
+   }
 
ret = smu->ppt_funcs->set_default_dpm_table(smu);
if (ret)
dev_err(smu->adev->dev,
"Failed to setup default dpm clock tables!\n");
 
-   smu_dpm_set_jpeg_enable(smu, !jpeg_gate);
+   if (adev->pg_flags & AMD_PG_SUPPORT_JPEG)
+   smu_dpm_set_jpeg_enable(smu, !jpeg_gate);
 err_out:
-   smu_dpm_set_vcn_enable(smu, !vcn_gate);
+   if (adev->pg_flags & AMD_PG_SUPPORT_VCN)
+   smu_dpm_set_vcn_enable(smu, !vcn_gate);
+
return ret;
 }
 
-- 
2.42.0
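
Editor's note: the control flow of this patch is "enable each block only if
its power-gating flag is set, run the table setup, then restore each block to
its previous state under the same condition". A simplified model of that flow
is sketched below. The fake_smu type, the flag values and the helper names
are hypothetical, and error handling is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch only. */
#define PG_SUPPORT_VCN  (1u << 0)
#define PG_SUPPORT_JPEG (1u << 1)

struct fake_smu {
    uint32_t pg_flags;
    bool vcn_gated, jpeg_gated;     /* gate state before table setup */
    bool vcn_on, jpeg_on;           /* current DPM enable state */
    unsigned int vcn_calls, jpeg_calls;
};

static void set_vcn(struct fake_smu *s, bool on)
{
    s->vcn_on = on;
    s->vcn_calls++;
}

static void set_jpeg(struct fake_smu *s, bool on)
{
    s->jpeg_on = on;
    s->jpeg_calls++;
}

/* Enable around the table setup, touching only the supported blocks. */
static void setup_default_table(struct fake_smu *s)
{
    bool has_vcn = (s->pg_flags & PG_SUPPORT_VCN) != 0;
    bool has_jpeg = (s->pg_flags & PG_SUPPORT_JPEG) != 0;

    if (has_vcn)
        set_vcn(s, true);
    if (has_jpeg)
        set_jpeg(s, true);

    /* The real driver would set up the default DPM table here. */

    if (has_jpeg)
        set_jpeg(s, !s->jpeg_gated);
    if (has_vcn)
        set_vcn(s, !s->vcn_gated);
}
```

Guarding the restore with the same flag as the enable means the saved gate
state is only read on paths where it was actually captured.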



[PATCH] drm/amdgpu: support rlc autoload type set

2024-02-06 Thread Alex Deucher
From: Likun Gao 

Support setting fw_load_type=3 to use backdoor
RLC autoload.

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
index afa3ac931638..2ab01b18d62e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
@@ -556,6 +556,8 @@ amdgpu_ucode_get_load_type(struct amdgpu_device *adev, int 
load_type)
default:
if (!load_type)
return AMDGPU_FW_LOAD_DIRECT;
+   else if (load_type == 3)
+   return AMDGPU_FW_LOAD_RLC_BACKDOOR_AUTO;
else
return AMDGPU_FW_LOAD_PSP;
}
-- 
2.42.0
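
Editor's note: the default branch touched by this patch maps the
fw_load_type parameter to a load type: 0 selects direct loading, 3 selects
backdoor RLC autoload, and any other value falls through to PSP. A sketch of
just that mapping follows. The enum is a simplified stand-in, and the
per-ASIC cases that precede the default branch in the driver are not
modelled.

```c
/* Hypothetical stand-in for the driver's firmware load-type enum. */
enum fw_load_type {
    FW_LOAD_DIRECT,
    FW_LOAD_PSP,
    FW_LOAD_RLC_BACKDOOR_AUTO,
};

/* Map the module parameter value to a load type (default branch only). */
static enum fw_load_type load_type_from_param(int load_type)
{
    if (!load_type)
        return FW_LOAD_DIRECT;
    else if (load_type == 3)
        return FW_LOAD_RLC_BACKDOOR_AUTO;
    else
        return FW_LOAD_PSP;
}
```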



RE: [PATCH] drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3

2024-02-06 Thread Joshi, Mukul
[AMD Official Use Only - General]

The commit description needs a Fixes tag of the offending commit.
With that fixed, this patch is:

Reviewed-by: Mukul Joshi 

> -----Original Message-----
> From: Russell, Kent 
> Sent: Tuesday, February 6, 2024 1:06 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Joshi, Mukul ; Russell, Kent
> 
> Subject: [PATCH] drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3
>
> It's currently incorrectly multiplied by the number of XCCs in the partition.
>
> Signed-off-by: Kent Russell 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 3df2a8ad86fb..64bf2a56f010 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -1640,9 +1640,7 @@ static int fill_in_l2_l3_pcache(struct
> kfd_cache_properties **props_ext,
>   else
>   mode = UNKNOWN_MEMORY_PARTITION_MODE;
>
> - if (pcache->cache_level == 2)
> - pcache->cache_size =
> pcache_info[cache_type].cache_size * num_xcc;
> - else if (mode)
> + if (mode)
>   pcache->cache_size =
> pcache_info[cache_type].cache_size / mode;
>   else
>   pcache->cache_size =
> pcache_info[cache_type].cache_size;
> --
> 2.34.1



Re: [PATCH v2] amdkfd: pass debug exceptions to second-level trap handler

2024-02-06 Thread Deucher, Alexander
[AMD Official Use Only - General]

Acked-by: Alex Deucher 

From: amd-gfx  on behalf of Laurent 
Morichetti 
Sent: Thursday, February 1, 2024 4:33 PM
To: amd-gfx@lists.freedesktop.org 
Cc: jay.cornwall@amd.com ; Morichetti, Laurent 
; Six, Lancelot ; Cornwall, 
Jay 
Subject: [PATCH v2] amdkfd: pass debug exceptions to second-level trap handler

Call the 2nd level trap handler if the cwsr handler is entered with any
one of wave_start, wave_end, or trap_after_inst exceptions.

Signed-off-by: Laurent Morichetti 
Tested-by: Lancelot Six 
Reviewed-by: Jay Cornwall 
---
 drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h  |  2 +-
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm  | 17 -
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index d1caaf0e6a7c..2e9b64edb8d2 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -2518,7 +2518,7 @@ static const uint32_t cwsr_trap_gfx11_hex[] = {
 0x8b6eff7b, 0x0400,
 0xbfa20045, 0xbf830010,
 0xb8fbf803, 0xbfa0fffa,
-   0x8b6eff7b, 0x0900,
+   0x8b6eff7b, 0x00160900,
 0xbfa20015, 0x8b6eff7b,
 0x71ff, 0xbfa10008,
 0x8b6fff7b, 0x7080,
diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
index 71b3dc0c7363..7568ff3af978 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
@@ -81,6 +81,11 @@ var SQ_WAVE_TRAPSTS_POST_SAVECTX_SHIFT   = 11
 var SQ_WAVE_TRAPSTS_POST_SAVECTX_SIZE   = 21
 var SQ_WAVE_TRAPSTS_ILLEGAL_INST_MASK   = 0x800
 var SQ_WAVE_TRAPSTS_EXCP_HI_MASK= 0x7000
+#if ASIC_FAMILY >= CHIP_PLUM_BONITO
+var SQ_WAVE_TRAPSTS_WAVE_START_MASK= 0x2
+var SQ_WAVE_TRAPSTS_WAVE_END_MASK  = 0x4
+var SQ_WAVE_TRAPSTS_TRAP_AFTER_INST_MASK   = 0x10
+#endif

 var SQ_WAVE_MODE_EXCP_EN_SHIFT  = 12
 var SQ_WAVE_MODE_EXCP_EN_ADDR_WATCH_SHIFT   = 19
@@ -92,6 +97,16 @@ var SQ_WAVE_IB_STS_RCNT_FIRST_REPLAY_MASK= 0x003F8000

 var SQ_WAVE_MODE_DEBUG_EN_MASK  = 0x800

+#if ASIC_FAMILY < CHIP_PLUM_BONITO
+var S_TRAPSTS_NON_MASKABLE_EXCP_MASK   = SQ_WAVE_TRAPSTS_MEM_VIOL_MASK|SQ_WAVE_TRAPSTS_ILLEGAL_INST_MASK
+#else
+var S_TRAPSTS_NON_MASKABLE_EXCP_MASK   = SQ_WAVE_TRAPSTS_MEM_VIOL_MASK        |\
+                                         SQ_WAVE_TRAPSTS_ILLEGAL_INST_MASK    |\
+                                         SQ_WAVE_TRAPSTS_WAVE_START_MASK      |\
+                                         SQ_WAVE_TRAPSTS_WAVE_END_MASK        |\
+                                         SQ_WAVE_TRAPSTS_TRAP_AFTER_INST_MASK
+#endif
+
 // bits [31:24] unused by SPI debug data
 var TTMP11_SAVE_REPLAY_W64H_SHIFT   = 31
 var TTMP11_SAVE_REPLAY_W64H_MASK= 0x8000
@@ -224,7 +239,7 @@ L_NOT_HALTED:
 // Check non-maskable exceptions. memory_violation, illegal_instruction
 // and xnack_error exceptions always cause the wave to enter the trap
 // handler.
-   s_and_b32   ttmp2, s_save_trapsts, 
SQ_WAVE_TRAPSTS_MEM_VIOL_MASK|SQ_WAVE_TRAPSTS_ILLEGAL_INST_MASK
+   s_and_b32   ttmp2, s_save_trapsts, S_TRAPSTS_NON_MASKABLE_EXCP_MASK
 s_cbranch_scc1  L_FETCH_2ND_TRAP

 // Check for maskable exceptions in trapsts.excp and trapsts.excp_hi.

base-commit: c4b562a17829454713e45219fa754be1bfda9004
--
2.25.1
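
Editor's note: the handler change amounts to widening the mask that decides
whether a wave is handed to the second-level trap handler. The sketch below
models that test in C. The individual bit values are assumptions, chosen so
that their union matches the 0x00160900 literal in the gfx11 hex hunk above;
they should be checked against the kernel source before being relied on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed TRAPSTS bit values; see the note above. */
#define TRAPSTS_MEM_VIOL        0x100u
#define TRAPSTS_ILLEGAL_INST    0x800u
#define TRAPSTS_WAVE_START      0x20000u
#define TRAPSTS_WAVE_END        0x40000u
#define TRAPSTS_TRAP_AFTER_INST 0x100000u

/* Exceptions that always go to the second-level handler after the patch. */
#define NON_MASKABLE_EXCP_MASK (TRAPSTS_MEM_VIOL | TRAPSTS_ILLEGAL_INST | \
                                TRAPSTS_WAVE_START | TRAPSTS_WAVE_END |   \
                                TRAPSTS_TRAP_AFTER_INST)

/* True when any non-maskable exception bit is set in trapsts. */
static bool goes_to_second_level(uint32_t trapsts)
{
    return (trapsts & NON_MASKABLE_EXCP_MASK) != 0;
}
```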



Re: [PATCH v3 1/5] ACPI: video: Handle fetching EDID that is longer than 256 bytes

2024-02-06 Thread Rafael J. Wysocki
On Fri, Feb 2, 2024 at 5:09 PM Mario Limonciello
 wrote:
>
> On 2/2/2024 10:07, Rafael J. Wysocki wrote:
> > On Thu, Feb 1, 2024 at 11:11 PM Mario Limonciello
> >  wrote:
> >>
> >> The ACPI specification allows for an EDID to be up to 512 bytes but
> >> the _DDC EDID fetching code will only try up to 256 bytes.
> >>
> >> Modify the code to instead start at 512 bytes and work its way
> >> down.
> >>
> >> As _DDC is now called up to 4 times on a machine, debugging messages
> >> are noisier than necessary.  Decrease from info to debug.
> >>
> >> Link: 
> >> https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/Apx_B_Video_Extensions/output-device-specific-methods.html#ddc-return-the-edid-for-this-device
> >> Signed-off-by: Mario Limonciello 
> >
> > Acked-by: Rafael J. Wysocki 
> >
> > or I can apply it if that's preferred.
>
> Thanks!
>
> I think go ahead and apply this one to your -next tree.

Applied now.

Barring any issues with it, it will get into linux-next in a couple of days.

Thanks!
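
Editor's note: the "start at 512 bytes and work down" approach described in
the quoted patch can be sketched as a loop over candidate lengths. The
128-byte step is an assumption based on the EDID block size and on the
"up to 4 times" remark. The callback type and the fake firmware below are
hypothetical and only exist to make the example self-contained.

```c
#include <stddef.h>

/* Hypothetical fetch callback: returns 0 when `len` bytes could be read. */
typedef int (*edid_fetch_fn)(void *ctx, size_t len);

/* Try 512, 384, 256 and 128 bytes in turn; return the first that works. */
static size_t fetch_edid_len(edid_fetch_fn fetch, void *ctx)
{
    for (size_t len = 512; len > 0; len -= 128) {
        if (fetch(ctx, len) == 0)
            return len;
    }
    return 0; /* no length succeeded */
}

/* Fake firmware for demonstration: succeeds only up to max_len bytes. */
struct fake_fw {
    size_t max_len;
    unsigned int calls;
};

static int fake_fetch(void *ctx, size_t len)
{
    struct fake_fw *fw = ctx;

    fw->calls++;
    return len <= fw->max_len ? 0 : -1;
}
```

With a fake firmware that supports at most 256 bytes, the loop makes three
calls before succeeding, which illustrates why the quoted patch lowers the
log level of the per-attempt messages.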


Re: [PATCH 3/3] drm/amdgpu: wire up the can_remove() callback

2024-02-06 Thread Christian König

On 06.02.24 at 15:29, Daniel Vetter wrote:

On Fri, Feb 02, 2024 at 03:40:03PM -0800, Greg Kroah-Hartman wrote:

On Fri, Feb 02, 2024 at 05:25:56PM -0500, Hamza Mahfooz wrote:

Removing an amdgpu device that still has user space references allocated
to it causes undefined behaviour.

Then fix that please.  There should not be anything special about your
hardware that all of the tens of thousands of other devices can't handle
today.

What happens when I yank your device out of a system with a pci hotplug
bus?  You can't prevent that either, so this should not be any different
at all.

sorry, but please, just fix your driver.

fwiw Christian König from amd already rejected this too, I have no idea
why this was submitted


Well that was my fault.

I commented on an internal bug tracker that when sysfs bind/unbind is a 
different code path from PCI remove/re-scan we could try to reject it.


Turned out it isn't a different code path.


  since the very elaborate plan I developed with a
bunch of amd folks was to fix the various lifetime lolz we still have in
drm. We unfortunately export the world of internal objects to userspace as
uabi objects with dma_buf, dma_fence and everything else, but it's all
fixable and we have the plan even documented:

https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#device-hot-unplug

So yeah anything that isn't that plan of record is very much no-go for drm
drivers. Unless we change that plan of course, but that needs a
documentation patch first and a big discussion.

Aside from an absolute massive pile of kernel-internal refcounting bugs
the really big one we agreed on after a lot of discussion is that SIGBUS
on dma-buf mmaps is no-go for drm drivers, because it would break way too
much userspace in ways which are simply not fixable (since sig handlers
are shared in a process, which means the gl/vk driver cannot use it).

Otherwise it's bog standard "fix the kernel bugs" work, just a lot of it.


Ignoring a few memory leaks because of messed up refcounting we actually 
got that working quite nicely.


At least hot unplug / hot add seems to be working rather reliably in our 
internal testing.


So it can't be that messed up.

Regards,
Christian.



Cheers, Sima




RE: [PATCH v2] drm/amd/display: Implement bounds check for stream encoder creation in DCN301

2024-02-06 Thread Li, Roman
[Public]

Inline.

> -----Original Message-----
> From: SHANMUGAM, SRINIVASAN 
> Sent: Monday, February 5, 2024 10:47 PM
> To: Li, Roman ; Siqueira, Rodrigo
> ; Pillai, Aurabindo 
> Cc: amd-gfx@lists.freedesktop.org; SHANMUGAM, SRINIVASAN
> 
> Subject: [PATCH v2] drm/amd/display: Implement bounds check for stream
> encoder creation in DCN301
>
> The 'stream_enc_regs' array is an array of dcn10_stream_enc_registers
> structures. The array is initialized with four elements, corresponding to
> the four calls to stream_enc_regs() in the array initializer. This means
> that valid indices for this array are 0, 1, 2, and 3.
>
> The error message 'stream_enc_regs' 4 <= 5 below indicates an attempt to
> access this array with an index of 5, which is out of bounds. This could
> lead to undefined behavior.
>
> Here, eng_id is used as an index to access the stream_enc_regs array. If
> eng_id is 5, this would result in an out-of-bounds access on the
> stream_enc_regs array.
>
> This fixes the buffer overflow error in dcn301_stream_encoder_create()
> reported by Smatch:
> drivers/gpu/drm/amd/amdgpu/../display/dc/resource/dcn301/dcn301_resource.c:1011 dcn301_stream_encoder_create() error: buffer overflow 'stream_enc_regs' 4 <= 5
>
> Fixes: 3a83e4e64bb1 ("drm/amd/display: Add dcn3.01 support to DC (v2)")
> Cc: Roman Li 
> Cc: Rodrigo Siqueira 
> Cc: Aurabindo Pillai 
> Signed-off-by: Srinivasan Shanmugam 
> ---
>  .../drm/amd/display/dc/resource/dcn301/dcn301_resource.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git
> a/drivers/gpu/drm/amd/display/dc/resource/dcn301/dcn301_resource.c
> b/drivers/gpu/drm/amd/display/dc/resource/dcn301/dcn301_resource.c
> index 511ff6b5b985..4a475a723191 100644
> --- a/drivers/gpu/drm/amd/display/dc/resource/dcn301/dcn301_resource.c
> +++
> b/drivers/gpu/drm/amd/display/dc/resource/dcn301/dcn301_resource.c
> @@ -999,7 +999,7 @@ static struct stream_encoder
> *dcn301_stream_encoder_create(enum engine_id eng_id
>   vpg = dcn301_vpg_create(ctx, vpg_inst);
>   afmt = dcn301_afmt_create(ctx, afmt_inst);
>
> - if (!enc1 || !vpg || !afmt) {
> + if (!enc1 || !vpg || !afmt || eng_id >= ARRAY_SIZE(stream_enc_regs))
> {
>   kfree(enc1);
>   kfree(vpg);
>   kfree(afmt);

Reviewed-by: Roman Li 
I don't think the part below is necessary.

> @@ -1007,10 +1007,9 @@ static struct stream_encoder
> *dcn301_stream_encoder_create(enum engine_id eng_id
>   }
>
>   dcn30_dio_stream_encoder_construct(enc1, ctx, ctx->dc_bios,
> - eng_id, vpg, afmt,
> - _enc_regs[eng_id],
> - _shift, _mask);
> -
> +eng_id, vpg, afmt,
> +_enc_regs[eng_id],
> +_shift, _mask);
>   return >base;
>  }
>
> --
> 2.34.1
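
Editor's note: the bounds check discussed in this thread can be illustrated
with a lookup helper that validates the index against the array size before
taking the element's address. The enc_regs type, the table contents and
lookup_enc_regs() are hypothetical. ARRAY_SIZE is redefined locally because
this sketch is built outside the kernel.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical register block; the real structure has many more fields. */
struct enc_regs {
    unsigned int base;
};

/* Four entries, so valid indices are 0..3, as in the thread. */
static const struct enc_regs stream_enc_regs[] = {
    { 0x100 }, { 0x200 }, { 0x300 }, { 0x400 },
};

/* Return the register block for eng_id, or NULL if it is out of range. */
static const struct enc_regs *lookup_enc_regs(unsigned int eng_id)
{
    if (eng_id >= ARRAY_SIZE(stream_enc_regs))
        return NULL;
    return &stream_enc_regs[eng_id];
}
```

The patch under review folds the range check into the existing
allocation-failure test instead, so the error path frees the three
allocations in one place. The sketch checks first only because it has
nothing to free.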



[PATCH] drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3

2024-02-06 Thread Kent Russell
It's currently incorrectly multiplied by the number of XCCs in the partition.

Signed-off-by: Kent Russell 
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 3df2a8ad86fb..64bf2a56f010 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1640,9 +1640,7 @@ static int fill_in_l2_l3_pcache(struct 
kfd_cache_properties **props_ext,
else
mode = UNKNOWN_MEMORY_PARTITION_MODE;
 
-   if (pcache->cache_level == 2)
-   pcache->cache_size = pcache_info[cache_type].cache_size 
* num_xcc;
-   else if (mode)
+   if (mode)
pcache->cache_size = pcache_info[cache_type].cache_size 
/ mode;
else
pcache->cache_size = pcache_info[cache_type].cache_size;
-- 
2.34.1
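
Editor's note: after this patch the reported size no longer depends on the
cache level. It is the raw size divided by the memory partition mode when
the mode is known, and the raw size otherwise. A sketch of that rule
follows, assuming, as the diff does, that a non-zero mode can be used
directly as the divisor. The function name is hypothetical.

```c
#include <stdint.h>

/*
 * mode: memory partition mode used as a divisor; 0 means unknown, in which
 * case the size is reported unchanged.
 */
static uint32_t reported_cache_size(uint32_t cache_size, uint32_t mode)
{
    return mode ? cache_size / mode : cache_size;
}
```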



Re: [PATCH] drm/amd/display: Clear phantom stream count and plane count

2024-02-06 Thread Harry Wentland



On 2024-02-05 08:54, Deucher, Alexander wrote:
> [Public]
> 
> 
> Acked-by: Alex Deucher 
> 

Reviewed-by: Harry Wentland 

Harry

--
> *From:* amd-gfx  on behalf of Mario 
> Limonciello 
> *Sent:* Friday, February 2, 2024 7:30 PM
> *To:* amd-gfx@lists.freedesktop.org 
> *Cc:* Limonciello, Mario 
> *Subject:* [PATCH] drm/amd/display: Clear phantom stream count and plane count
>  
> When dc_state_destruct() was refactored the new phantom_stream_count
> and phantom_plane_count members weren't cleared.
> 
> Fixes: 012a04b1d6af ("drm/amd/display: Refactor phantom resource allocation")
> Signed-off-by: Mario Limonciello 
> ---
>  drivers/gpu/drm/amd/display/dc/core/dc_state.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_state.c 
> b/drivers/gpu/drm/amd/display/dc/core/dc_state.c
> index 88c6436b28b6..180ac47868c2 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc_state.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc_state.c
> @@ -291,11 +291,14 @@ void dc_state_destruct(struct dc_state *state)
>  dc_stream_release(state->phantom_streams[i]);
>  state->phantom_streams[i] = NULL;
>  }
> +   state->phantom_stream_count = 0;
>  
>  for (i = 0; i < state->phantom_plane_count; i++) {
>  dc_plane_state_release(state->phantom_planes[i]);
>  state->phantom_planes[i] = NULL;
>  }
> +   state->phantom_plane_count = 0;
> +
>  state->stream_mask = 0;
>  memset(>res_ctx, 0, sizeof(state->res_ctx));
>  memset(>pp_display_cfg, 0, sizeof(state->pp_display_cfg));
> -- 
> 2.34.1
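
Editor's note: the bug fixed here is a stale element count left behind after
the elements themselves were released. A reduced model of the corrected
destructor is sketched below. The fake_state type and MAX_PHANTOM are
hypothetical, and the reference-count release calls are replaced by a
comment.

```c
#include <stddef.h>

#define MAX_PHANTOM 4 /* hypothetical capacity for this sketch */

struct fake_state {
    void *phantom_streams[MAX_PHANTOM];
    int phantom_stream_count;
    void *phantom_planes[MAX_PHANTOM];
    int phantom_plane_count;
};

/* Drop every phantom entry and reset both counts to zero. */
static void state_destruct(struct fake_state *st)
{
    int i;

    for (i = 0; i < st->phantom_stream_count; i++)
        st->phantom_streams[i] = NULL; /* real code also drops a reference */
    st->phantom_stream_count = 0;

    for (i = 0; i < st->phantom_plane_count; i++)
        st->phantom_planes[i] = NULL;  /* real code also drops a reference */
    st->phantom_plane_count = 0;
}
```

Without the two count resets, a second pass over the same state would still
iterate over the cleared slots.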
> 



[PATCH v4 12/24] drm/amdgpu: use trapID 4 for host trap

2024-02-06 Thread James Zhu
Since TRAPSTS.HOST_TRAP does not work before gfx943, use
TTMP1 (bit 24: HT; bits 16-23: trapID) to identify
the host trap.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |2 +
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2117 +
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |5 +
 3 files changed, 1070 insertions(+), 1054 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index 7d8c0e13ac12..adfe5e5585e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -1162,6 +1162,8 @@ uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct 
amdgpu_device *adev,
value = REG_SET_FIELD(value, SQ_CMD, SIMD_ID, *target_simd);
/* select *target_wave_slot */
value = REG_SET_FIELD(value, SQ_CMD, WAVE_ID, 
(*target_wave_slot)++);
+   /* set TrapID 4 for HOSTTRAP */
+   value = REG_SET_FIELD(value, SQ_CMD, DATA, 0x4);
 
mutex_lock(>grbm_idx_mutex);
amdgpu_gfx_select_se_sh(adev, 0x, 0x, 
0x, 0);
diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index af1f678790e7..b3c681d7256b 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -274,155 +274,263 @@ static const uint32_t cwsr_trap_gfx8_hex[] = {
 
 
 static const uint32_t cwsr_trap_gfx9_hex[] = {
-   0xbf820001, 0xbf82025e,
+   0xbf820001, 0xbf820263,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
-   0x00ff, 0xbf85001e,
+   0x00ff, 0xbf850023,
0x866eff7b, 0x0400,
-   0xbf85005b, 0xbf8e0010,
+   0xbf850060, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
-   0xbf850015, 0x866eff7b,
-   0x71ff, 0xbf840008,
-   0x866fff7b, 0x7080,
-   0xbf840001, 0xbeee1a87,
-   0xb8eff801, 0x8e6e8c6e,
-   0x866e6f6e, 0xbf85000a,
-   0x866eff6d, 0x00ff,
-   0xbf850007, 0xb8eef801,
-   0x866eff6e, 0x0800,
-   0xbf850003, 0x866eff7b,
-   0x0400, 0xbf850040,
-   0xb8faf807, 0x867aff7a,
-   0x001f8000, 0x8e7a8b7a,
-   0x8977ff77, 0xfc00,
-   0x8a77, 0xba7ff807,
-   0x, 0xb8faf812,
-   0xb8fbf813, 0x8efa887a,
-   0xbf0d8f7b, 0xbf840002,
-   0x877bff7b, 0x,
-   0xc0031c3d, 0x0010,
-   0xc0071bbd, 0x,
-   0xc0071ebd, 0x0008,
-   0xbf8cc07f, 0x8671ff6d,
-   0x0100, 0xbf840004,
-   0x92f1ff70, 0x00010001,
-   0xbf840016, 0xbf820005,
-   0x86708170, 0x8e709770,
-   0x8977ff77, 0x0080,
-   0x8077, 0x86ee6e6e,
-   0xbf840001, 0xbe801d6e,
-   0x866eff6d, 0x01ff,
-   0xbf850005, 0x8778ff78,
-   0x2000, 0x80ec886c,
-   0x82ed806d, 0xbf820005,
-   0x866eff6d, 0x0100,
-   0xbf850002, 0x806c846c,
-   0x826d806d, 0x866dff6d,
-   0x, 0x8f7a8b77,
+   0xbf85001a, 0x866eff6d,
+   0x01ff, 0xbf06ff6e,
+   0x0104, 0xbf850015,
+   0x866eff7b, 0x71ff,
+   0xbf840008, 0x866fff7b,
+   0x7080, 0xbf840001,
+   0xbeee1a87, 0xb8eff801,
+   0x8e6e8c6e, 0x866e6f6e,
+   0xbf85000a, 0x866eff6d,
+   0x00ff, 0xbf850007,
+   0xb8eef801, 0x866eff6e,
+   0x0800, 0xbf850003,
+   0x866eff7b, 0x0400,
+   0xbf850040, 0xb8faf807,
0x867aff7a, 0x001f8000,
-   0xb97af807, 0x86fe7e7e,
-   0x86ea6a6a, 0x8f6e8378,
-   0xb96ee0c2, 0xbf82,
-   0xb9780002, 0xbe801f6c,
+   0x8e7a8b7a, 0x8977ff77,
+   0xfc00, 0x8a77,
+   0xba7ff807, 0x,
+   0xb8faf812, 0xb8fbf813,
+   0x8efa887a, 0xbf0d8f7b,
+   0xbf840002, 0x877bff7b,
+   0x, 0xc0031c3d,
+   0x0010, 0xc0071bbd,
+   0x, 0xc0071ebd,
+   0x0008, 0xbf8cc07f,
+   0x8671ff6d, 0x0100,
+   0xbf840004, 0x92f1ff70,
+   0x00010001, 0xbf840016,
+   0xbf820005, 0x86708170,
+   0x8e709770, 0x8977ff77,
+   0x0080, 0x8077,
+   0x86ee6e6e, 0xbf840001,
+   0xbe801d6e, 0x866eff6d,
+   0x01ff, 0xbf850005,
+   0x8778ff78, 0x2000,
+   0x80ec886c, 0x82ed806d,
+   0xbf820005, 0x866eff6d,
+   0x0100, 0xbf850002,
+   0x806c846c, 0x826d806d,
0x866dff6d, 0x,
-   0xbefa0080, 0xb97a0283,
-   0xb8faf807, 0x867aff7a,
-   0x001f8000, 0x8e7a8b7a,
-   0x8977ff77, 0xfc00,
-   0x8a77, 0xba7ff807,
-   0x, 0xbeee007e,
-   0xbeef007f, 0xbefe0180,
-   0xbf94, 0x877a8478,
-   0xb97af802, 0xbf8e0002,
-   0xbf88fffe, 0xb8fa2a05,
-   0x807a817a, 0x8e7a8a7a,
-   
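
Editor's note: the commit message describes two TTMP1 fields, HT at bit 24
and trapID at bits 16-23, and the driver change programs trap ID 4. The
sketch below decodes those fields in C. Treating "HT set and trapID == 4" as
the host-trap condition is one reading of the description and is an
assumption here, as are all the names.

```c
#include <stdbool.h>
#include <stdint.h>

#define TTMP1_HT_SHIFT      24
#define TTMP1_TRAP_ID_SHIFT 16
#define TTMP1_TRAP_ID_MASK  0xffu
#define HOST_TRAP_ID        4u

/* Bit 24: host-trap flag. */
static bool ttmp1_ht(uint32_t ttmp1)
{
    return ((ttmp1 >> TTMP1_HT_SHIFT) & 1u) != 0;
}

/* Bits 16-23: trap ID. */
static uint32_t ttmp1_trap_id(uint32_t ttmp1)
{
    return (ttmp1 >> TTMP1_TRAP_ID_SHIFT) & TTMP1_TRAP_ID_MASK;
}

/* Assumed condition: HT set and trap ID equal to 4. */
static bool is_host_trap(uint32_t ttmp1)
{
    return ttmp1_ht(ttmp1) && ttmp1_trap_id(ttmp1) == HOST_TRAP_ID;
}
```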

[PATCH v4 21/24] drm/amdkfd: add pc sampling thread to trigger trap

2024-02-06 Thread James Zhu
Add a kthread to trigger the PC sampling trap.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 91 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  1 +
 2 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 6f50ba1f8989..ea9478c3738a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -39,6 +39,84 @@ struct supported_pc_sample_info supported_formats[] = {
{ IP_VERSION(9, 4, 2), _info_hosttrap_9_0_0 },
 };
 
+static int kfd_pc_sample_thread(void *param)
+{
+   struct amdgpu_device *adev;
+   struct kfd_node *node = param;
+   uint32_t timeout = 0;
+   ktime_t next_trap_time;
+
+   mutex_lock(>pcs_data.mutex);
+   if (node->pcs_data.hosttrap_entry.base.active_count &&
+   node->pcs_data.hosttrap_entry.base.pc_sample_info.interval &&
+   node->kfd2kgd->trigger_pc_sample_trap) {
+   switch (node->pcs_data.hosttrap_entry.base.pc_sample_info.type) 
{
+   case KFD_IOCTL_PCS_TYPE_TIME_US:
+   timeout = 
(uint32_t)node->pcs_data.hosttrap_entry.base.pc_sample_info.interval;
+   break;
+   default:
+   pr_debug("PC Sampling type %d not supported.",
+   
node->pcs_data.hosttrap_entry.base.pc_sample_info.type);
+   }
+   }
+   mutex_unlock(>pcs_data.mutex);
+   if (!timeout)
+   return -EINVAL;
+
+   adev = node->adev;
+
+   allow_signal(SIGKILL);
+   while (!kthread_should_stop() &&
+   
!READ_ONCE(node->pcs_data.hosttrap_entry.base.stop_enable) &&
+   
!signal_pending(node->pcs_data.hosttrap_entry.base.pc_sample_thread)) {
+   next_trap_time = ktime_add_us(ktime_get_raw(), timeout);
+
+   node->kfd2kgd->trigger_pc_sample_trap(adev, 
node->vm_info.last_vmid_kfd,
+   >pcs_data.hosttrap_entry.base.target_simd,
+   
>pcs_data.hosttrap_entry.base.target_wave_slot,
+   
node->pcs_data.hosttrap_entry.base.pc_sample_info.method);
+   pr_debug_ratelimited("triggered a host trap.");
+
+   might_sleep();
+   do {
+   ktime_t wait_time;
+   s64 wait_ns, wait_us;
+
+   wait_time = ktime_sub(next_trap_time, ktime_get_raw());
+   wait_ns = ktime_to_ns(wait_time);
+   wait_us = ktime_to_us(wait_time);
+   if (wait_ns >= 1)
+   usleep_range(wait_us - 10, wait_us);
+   else if (wait_ns > 0)
+   schedule();
+   else
+   break;
+   } while (1);
+   }
+   node->pcs_data.hosttrap_entry.base.pc_sample_thread = NULL;
+
+   return 0;
+}
+
+static int kfd_pc_sample_thread_start(struct kfd_node *node)
+{
+   char thread_name[16];
+   int ret = 0;
+
+   snprintf(thread_name, 16, "pcs_%08x", node->adev->ddev.render->index);
+   node->pcs_data.hosttrap_entry.base.pc_sample_thread =
+   kthread_run(kfd_pc_sample_thread, node, thread_name);
+
+   if (IS_ERR(node->pcs_data.hosttrap_entry.base.pc_sample_thread)) {
+   ret = 
PTR_ERR(node->pcs_data.hosttrap_entry.base.pc_sample_thread);
+   node->pcs_data.hosttrap_entry.base.pc_sample_thread = NULL;
+   pr_debug("Failed to create pc sample thread for %s with ret = 
%d.",
+   thread_name, ret);
+   }
+
+   return ret;
+}
+
 static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*user_args)
 {
@@ -99,6 +177,7 @@ static int kfd_pc_sample_start(struct kfd_process_device 
*pdd,
struct pc_sampling_entry *pcs_entry)
 {
bool pc_sampling_start = false;
+   int ret = 0;
 
pcs_entry->enabled = true;
mutex_lock(>dev->pcs_data.mutex);
@@ -112,13 +191,16 @@ static int kfd_pc_sample_start(struct kfd_process_device 
*pdd,
mutex_unlock(>dev->pcs_data.mutex);
 
while (pc_sampling_start) {
-   if 
(READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable))
+   if 
(READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable)) {
usleep_range(1000, 2000);
-   else
+   } else {
+   if 
(!pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_thread)
+   ret = kfd_pc_sample_thread_start(pdd->dev);
break;
+   }
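
Editor's note: the inner wait loop of the sampling thread makes a three-way
choice based on the time left until the next trap: sleep when enough time
remains, yield when only a little remains, and stop waiting once the deadline
has passed. That decision can be isolated as a pure function, sketched below.
The enum, the function name and the threshold parameter are hypothetical;
the patch hard-codes its own threshold.

```c
#include <stdint.h>

enum wait_action {
    WAIT_SLEEP, /* enough time left to justify a timed sleep */
    WAIT_YIELD, /* deadline is close: just give up the CPU briefly */
    WAIT_DONE,  /* deadline reached or passed */
};

/* Decide how to wait given the nanoseconds left until the next trap. */
static enum wait_action next_wait_action(int64_t remaining_ns,
                                         int64_t sleep_threshold_ns)
{
    if (remaining_ns >= sleep_threshold_ns)
        return WAIT_SLEEP;
    if (remaining_ns > 0)
        return WAIT_YIELD;
    return WAIT_DONE;
}
```

Keeping the decision separate from the sleeping primitives makes the
boundary cases easy to check without a running kthread.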

[PATCH v4 22/24] drm/amdkfd: add pc sampling release when process release

2024-02-06 Thread James Zhu
Release PC sampling resources when a process is released. This forces
all active sessions associated with the process to stop.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 25 
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c |  3 +++
 3 files changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index ea9478c3738a..783844ddd82f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -337,6 +337,31 @@ static int kfd_pc_sample_destroy(struct kfd_process_device 
*pdd, uint32_t trace_
return 0;
 }
 
+void kfd_pc_sample_release(struct kfd_process_device *pdd)
+{
+   struct pc_sampling_entry *pcs_entry;
+   struct idr *idp;
+   uint32_t id;
+
+   /* force to release all PC sampling task for this process */
+   idp = >dev->pcs_data.hosttrap_entry.base.pc_sampling_idr;
+   do {
+   pcs_entry = NULL;
+   mutex_lock(>dev->pcs_data.mutex);
+   idr_for_each_entry(idp, pcs_entry, id) {
+   if (pcs_entry->pdd != pdd)
+   continue;
+   break;
+   }
+   mutex_unlock(>dev->pcs_data.mutex);
+   if (pcs_entry) {
+   if (pcs_entry->enabled)
+   kfd_pc_sample_stop(pdd, pcs_entry);
+   kfd_pc_sample_destroy(pdd, id, pcs_entry);
+   }
+   } while (pcs_entry);
+}
+
 int kfd_pc_sample(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*args)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
index 4eeded4ea5b6..6175563ca9be 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
@@ -30,5 +30,6 @@
 
 int kfd_pc_sample(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*args);
+void kfd_pc_sample_release(struct kfd_process_device *pdd);
 
 #endif /* KFD_PC_SAMPLING_H_ */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 4a450abf9fa9..bbad0b0848df 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -43,6 +43,7 @@ struct mm_struct;
 #include "kfd_svm.h"
 #include "kfd_smi_events.h"
 #include "kfd_debug.h"
+#include "kfd_pc_sampling.h"
 
 /*
  * List of struct kfd_process (field kfd_process).
@@ -1021,6 +1022,8 @@ static void kfd_process_destroy_pdds(struct kfd_process 
*p)
pr_debug("Releasing pdd (topology id %d) for process (pasid 
0x%x)\n",
pdd->dev->id, p->pasid);
 
+   kfd_pc_sample_release(pdd);
+
kfd_process_device_destroy_cwsr_dgpu(pdd);
kfd_process_device_destroy_ib_mem(pdd);
 
-- 
2.25.1



[PATCH v4 23/24] drm/amdkfd: Set debug trap bit when enabling PC Sampling

2024-02-06 Thread James Zhu
From: David Yat Sin 

We need the SPI_GDBG_PER_VMID_CNTL.TRAP_EN bit to be set during PC
Sampling so that the TTMP registers are valid inside the sampling data.
runtime_info.ttmp_setup will be cleared when the user application
does the AMDKFD_IOC_RUNTIME_ENABLE ioctl without
KFD_RUNTIME_ENABLE_MODE_ENABLE_MASK flag on exit.

It is also not valid to have the debugger attached to a process while PC
sampling is enabled, so add some checks to prevent this.

Signed-off-by: David Yat Sin 
Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 30 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c   | 26 +
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h   |  3 ++
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 13 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  3 ++
 5 files changed, 54 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index d9cac97c54c0..bc37f3ee2c66 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2804,26 +2804,9 @@ static int runtime_enable(struct kfd_process *p, 
uint64_t r_debug,
 
p->runtime_info.runtime_state = DEBUG_RUNTIME_STATE_ENABLED;
p->runtime_info.r_debug = r_debug;
-   p->runtime_info.ttmp_setup = enable_ttmp_setup;
 
-   if (p->runtime_info.ttmp_setup) {
-   for (i = 0; i < p->n_pdds; i++) {
-   struct kfd_process_device *pdd = p->pdds[i];
-
-   if (!kfd_dbg_is_rlc_restore_supported(pdd->dev)) {
-   amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
-   pdd->dev->kfd2kgd->enable_debug_trap(
-   pdd->dev->adev,
-   true,
-   
pdd->dev->vm_info.last_vmid_kfd);
-   } else if (kfd_dbg_is_per_vmid_supported(pdd->dev)) {
-   pdd->spi_dbg_override = 
pdd->dev->kfd2kgd->enable_debug_trap(
-   pdd->dev->adev,
-   false,
-   0);
-   }
-   }
-   }
+   if (enable_ttmp_setup)
+   kfd_dbg_enable_ttmp_setup(p);
 
 retry:
if (p->debug_trap_enabled) {
@@ -2972,10 +2955,10 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, 
struct kfd_process *p, v
goto out;
}
 
-   /* Check if target is still PTRACED. */
rcu_read_lock();
+   /* Check if target is still PTRACED. */
if (target != p && args->op != KFD_IOC_DBG_TRAP_DISABLE
-   && ptrace_parent(target->lead_thread) != 
current) {
+   && ptrace_parent(target->lead_thread) != current) {
pr_err("PID %i is not PTRACED and cannot be debugged\n", 
args->pid);
r = -EPERM;
}
@@ -2985,6 +2968,11 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, 
struct kfd_process *p, v
goto out;
 
mutex_lock(&target->mutex);
+   if (!!target->pc_sampling_ref) {
+   pr_debug("Cannot enable debug trap on PID:%d because PC 
Sampling active\n", args->pid);
+   r = -EBUSY;
+   goto unlock_out;
+   }
 
if (args->op != KFD_IOC_DBG_TRAP_ENABLE && !target->debug_trap_enabled) 
{
pr_err("PID %i not debug enabled for op %i\n", args->pid, 
args->op);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index d889e3545120..8d836c65c636 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -1120,3 +1120,29 @@ void kfd_dbg_set_enabled_debug_exception_mask(struct 
kfd_process *target,
 
mutex_unlock(&target->event_mutex);
 }
+
+void kfd_dbg_enable_ttmp_setup(struct kfd_process *p)
+{
+   int i;
+
+   if (p->runtime_info.ttmp_setup)
+   return;
+
+   p->runtime_info.ttmp_setup = true;
+   for (i = 0; i < p->n_pdds; i++) {
+   struct kfd_process_device *pdd = p->pdds[i];
+
+   if (!kfd_dbg_is_rlc_restore_supported(pdd->dev)) {
+   amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
+   pdd->dev->kfd2kgd->enable_debug_trap(
+   pdd->dev->adev,
+   true,
+   pdd->dev->vm_info.last_vmid_kfd);
+   } else if (kfd_dbg_is_per_vmid_supported(pdd->dev)) {
+   pdd->spi_dbg_override = 
pdd->dev->kfd2kgd->enable_debug_trap(
+   pdd->dev->adev,
+   false,
+   0);
+   }
+ 

[PATCH v4 24/24] drm/amdkfd: bump kfd ioctl minor version for pc sampling availability

2024-02-06 Thread James Zhu
Bump the minor version to declare that the PC sampling feature is now
available.
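
A sketch of how user space might gate on this bump, assuming the caller has
already obtained the major/minor pair (for example from the existing
get-version ioctl). The helper name and the rule that the major version must
match exactly are assumptions of this sketch, not something the patch states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Version numbers taken from this patch; the macro names are local. */
#define PCS_REQUIRED_MAJOR 1u
#define PCS_REQUIRED_MINOR 16u

static_assert(PCS_REQUIRED_MINOR > 15u,
	      "1.15 predates the PC sampling ioctl");

/* Returns true when the reported KFD ioctl version advertises PC sampling. */
static bool pc_sampling_advertised(uint32_t major, uint32_t minor)
{
	if (major != PCS_REQUIRED_MAJOR)
		return false;	/* assumption: other majors are not compatible */
	return minor >= PCS_REQUIRED_MINOR;
}
```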

Signed-off-by: James Zhu 
---
 include/uapi/linux/kfd_ioctl.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index ec1b6404b185..7c2c867b57e8 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -41,9 +41,10 @@
  * - 1.13 - Add debugger API
  * - 1.14 - Update kfd_event_data
  * - 1.15 - Enable managing mappings in compute VMs with GEM_VA ioctl
+ * - 1.16 - Add PC Sampling ioctl
  */
 #define KFD_IOCTL_MAJOR_VERSION 1
-#define KFD_IOCTL_MINOR_VERSION 15
+#define KFD_IOCTL_MINOR_VERSION 16
 
 struct kfd_ioctl_get_version_args {
__u32 major_version;/* from KFD */
-- 
2.25.1



[PATCH v4 15/24] drm/amdkfd: trigger pc sampling trap for aldebaran

2024-02-06 Thread James Zhu
Implement trigger pc sampling trap for aldebaran.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index aff08321e976..27eda75ceecb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,6 +163,16 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
return watch_address_cntl;
 }
 
+static uint32_t kgd_aldebaran_trigger_pc_sample_trap(struct amdgpu_device 
*adev,
+   uint32_t vmid,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method 
method)
+{
+   return kgd_gfx_v9_trigger_pc_sample_trap(adev, vmid, 8, 4,
+   target_simd, target_wave_slot, method);
+}
+
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -191,4 +201,5 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
.build_grace_period_packet_info = 
kgd_gfx_v9_build_grace_period_packet_info,
.program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings,
+   .trigger_pc_sample_trap = kgd_aldebaran_trigger_pc_sample_trap,
 };
-- 
2.25.1



[PATCH v4 18/24] drm/amdkfd: enable pc sampling stop

2024-02-06 Thread James Zhu
Enable pc sampling stop.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 29 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  4 +++
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index b46caa52fbe8..53e44e68408e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -99,10 +99,33 @@ static int kfd_pc_sample_start(struct kfd_process_device 
*pdd)
return -EINVAL;
 }
 
-static int kfd_pc_sample_stop(struct kfd_process_device *pdd)
+static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
+   struct pc_sampling_entry *pcs_entry)
 {
-   return -EINVAL;
+   bool pc_sampling_stop = false;
+
+   pcs_entry->enabled = false;
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   pdd->dev->pcs_data.hosttrap_entry.base.active_count--;
+   if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count) {
+   WRITE_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable, 
true);
+   pc_sampling_stop = true;
+   }
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
+   pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, 
false);
 
+   if (pc_sampling_stop) {
+
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;
+   pdd->dev->pcs_data.hosttrap_entry.base.target_wave_slot = 0;
+   WRITE_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable, 
false);
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   }
+
+   return 0;
 }
 
 static int kfd_pc_sample_create(struct kfd_process_device *pdd,
@@ -250,7 +273,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
if (!pcs_entry->enabled)
return -EALREADY;
else
-   return kfd_pc_sample_stop(pdd);
+   return kfd_pc_sample_stop(pdd, pcs_entry);
}
 
return -EINVAL;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 5a7805147da0..7bdcbe6be4fe 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -271,6 +271,10 @@ struct kfd_dev;
 
 struct kfd_dev_pc_sampling_data {
uint32_t use_count; /* Num of PC sampling sessions */
+   uint32_t active_count;  /* Num of active sessions */
+   uint32_t target_simd;   /* target simd for trap */
+   uint32_t target_wave_slot;  /* target wave slot for trap */
+   bool stop_enable;   /* pc sampling stop in process */
struct idr pc_sampling_idr;
struct kfd_pc_sample_info pc_sample_info;
 };
-- 
2.25.1



[PATCH v4 19/24] drm/amdkfd: add queue remapping

2024-02-06 Thread James Zhu
Add queue remapping to ensure that any waves executing the PC sampling
part of the trap handler are done before kfd_pc_sample_stop returns,
and that no new waves enter that part of the trap handler afterwards.
This avoids race conditions that could lead to use-after-free. Unmapping
and remapping the queues either waits for the waves to drain, or preempts
them with CWSR, which itself executes a trap and waits for previous traps
to finish.
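
The ordering argued for above can be modelled with a toy state machine. This is
only an illustration under stated assumptions: the struct and helpers are
hypothetical, and drain_waves() merely stands in for the unmap/remap cycle
rather than reproducing what the queue manager does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state that matters for the race. */
struct sampler_model {
	bool trap_flag;		/* waves test this before entering sampling code */
	int waves_in_handler;	/* waves currently inside the sampling path */
	bool buffer_valid;	/* resource the sampling path dereferences */
};

/* Step 1: no new wave may enter the sampling part of the trap handler. */
static void clear_trap_flag(struct sampler_model *m)
{
	m->trap_flag = false;
}

/* Step 2: stands in for unmap + remap; returns once the handler is empty. */
static void drain_waves(struct sampler_model *m)
{
	m->waves_in_handler = 0;
}

/* Step 3: release the resource; reports whether doing so was safe. */
static bool release_buffer(struct sampler_model *m)
{
	bool safe = !m->trap_flag && m->waves_in_handler == 0;

	m->buffer_valid = false;
	return safe;
}

/* Full stop sequence in the order the commit message describes. */
static bool stop_sampling(struct sampler_model *m)
{
	assert(m != NULL);
	clear_trap_flag(m);
	drain_waves(m);
	return release_buffer(m);
}
```

Skipping the drain step leaves waves_in_handler non-zero at release time, which
is the use-after-free window the remap is meant to close.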

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 +++
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |  5 +
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  |  4 +++-
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index c0e71543389a..a3f57be63f4f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -3155,6 +3155,17 @@ int debug_refresh_runlist(struct device_queue_manager 
*dqm)
return debug_map_and_unlock(dqm);
 }
 
+void remap_queue(struct device_queue_manager *dqm,
+   enum kfd_unmap_queues_filter filter,
+   uint32_t filter_param,
+   uint32_t grace_period)
+{
+   dqm_lock(dqm);
+   if (!dqm->dev->kfd->shared_resources.enable_mes)
+   execute_queues_cpsch(dqm, filter, filter_param, grace_period);
+   dqm_unlock(dqm);
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
 static void seq_reg_dump(struct seq_file *m,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index cf7e182588f8..f8aae3747a36 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -303,6 +303,11 @@ int debug_lock_and_unmap(struct device_queue_manager *dqm);
 int debug_map_and_unlock(struct device_queue_manager *dqm);
 int debug_refresh_runlist(struct device_queue_manager *dqm);
 
+void remap_queue(struct device_queue_manager *dqm,
+   enum kfd_unmap_queues_filter filter,
+   uint32_t filter_param,
+   uint32_t grace_period);
+
 static inline unsigned int get_sh_mem_bases_32(struct kfd_process_device *pdd)
 {
return (pdd->lds_base >> 16) & 0xFF;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 53e44e68408e..df2f4bfd0cda 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -24,6 +24,7 @@
 #include "kfd_priv.h"
 #include "amdgpu_amdkfd.h"
 #include "kfd_pc_sampling.h"
+#include "kfd_device_queue_manager.h"
 
 struct supported_pc_sample_info {
uint32_t ip_version;
@@ -115,9 +116,10 @@ static int kfd_pc_sample_stop(struct kfd_process_device 
*pdd,
 
kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, 
false);
+   remap_queue(pdd->dev->dqm,
+   KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0, 
USE_DEFAULT_GRACE_PERIOD);
 
if (pc_sampling_stop) {
-
mutex_lock(&pdd->dev->pcs_data.mutex);
pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;
pdd->dev->pcs_data.hosttrap_entry.base.target_wave_slot = 0;
-- 
2.25.1



[PATCH v4 09/24] drm/amdkfd: add interface to trigger pc sampling trap

2024-02-06 Thread James Zhu
Add interface to trigger pc sampling trap.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h 
b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 6d094cf3587d..12f9021d563e 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -31,6 +31,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include "amdgpu_irq.h"
 #include "amdgpu_gfx.h"
 
@@ -318,6 +320,11 @@ struct kfd2kgd_calls {
void (*program_trap_handler_settings)(struct amdgpu_device *adev,
uint32_t vmid, uint64_t tba_addr, uint64_t tma_addr,
uint32_t inst);
+   uint32_t (*trigger_pc_sample_trap)(struct amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method method);
 };
 
 #endif /* KGD_KFD_INTERFACE_H_INCLUDED */
-- 
2.25.1



[PATCH v4 17/24] drm/amdkfd: add setting trap pc sampling flag

2024-02-06 Thread James Zhu
Add setting trap pc sampling flag.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 13 +
 2 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 2df240518d1f..5a7805147da0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1198,6 +1198,8 @@ void kfd_process_set_trap_handler(struct 
qcm_process_device *qpd,
  uint64_t tma_addr);
 void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd,
 bool enabled);
+void kfd_process_set_trap_pc_sampling_flag(struct qcm_process_device *qpd,
+enum kfd_ioctl_pc_sample_method method, 
bool enabled);
 
 /* CWSR initialization */
 int kfd_process_init_cwsr_apu(struct kfd_process *process, struct file *filep);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 3e3cead6ccf8..4a450abf9fa9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1463,6 +1463,19 @@ void kfd_process_set_trap_debug_flag(struct 
qcm_process_device *qpd,
}
 }
 
+void kfd_process_set_trap_pc_sampling_flag(struct qcm_process_device *qpd,
+enum kfd_ioctl_pc_sample_method method, 
bool enabled)
+{
+   if (qpd->cwsr_kaddr) {
+   volatile unsigned long *tma =
+   (volatile unsigned long *)(qpd->cwsr_kaddr + 
KFD_CWSR_TMA_OFFSET);
+   if (enabled)
+   set_bit(method, &tma[2]);
+   else
+   clear_bit(method, &tma[2]);
+   }
+}
+
 /*
  * On return the kfd_process is fully operational and will be freed when the
  * mm is released
-- 
2.25.1



[PATCH v4 05/24] drm/amdkfd: enable pc sampling create

2024-02-06 Thread James Zhu
From: David Yat Sin 

Enable pc sampling create.

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 59 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 10 
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index e9277c9beec7..9267de0bbdac 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -108,7 +108,64 @@ static int kfd_pc_sample_stop(struct kfd_process_device 
*pdd)
 static int kfd_pc_sample_create(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*user_args)
 {
-   return -EINVAL;
+   struct kfd_pc_sample_info *supported_format = NULL;
+   struct kfd_pc_sample_info user_info;
+   int ret;
+   int i;
+
+   if (user_args->num_sample_info != 1)
+   return -EINVAL;
+
+   ret = copy_from_user(&user_info, (void __user *) 
user_args->sample_info_ptr,
+   sizeof(struct kfd_pc_sample_info));
+   if (ret) {
+   pr_debug("Failed to copy PC sampling info from user\n");
+   return -EFAULT;
+   }
+
+   if (user_info.flags & KFD_IOCTL_PCS_FLAG_POWER_OF_2 &&
+   user_info.interval & (user_info.interval - 1)) {
+   pr_debug("Sampling interval's power is unmatched!");
+   return -EINVAL;
+   }
+
+   for (i = 0; i < ARRAY_SIZE(supported_formats); i++) {
+   if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version
+   && user_info.method == 
supported_formats[i].sample_info->method
+   && user_info.type == 
supported_formats[i].sample_info->type
+   && user_info.interval <= 
supported_formats[i].sample_info->interval_max
+   && user_info.interval >= 
supported_formats[i].sample_info->interval_min) {
+   supported_format =
+   (struct kfd_pc_sample_info 
*)supported_formats[i].sample_info;
+   break;
+   }
+   }
+
+   if (!supported_format) {
+   pr_debug("Sampling format is not supported!");
+   return -EOPNOTSUPP;
+   }
+
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   if (pdd->dev->pcs_data.hosttrap_entry.base.use_count &&
+   memcmp(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
+   &user_info, sizeof(user_info))) {
+   ret = copy_to_user((void __user *) user_args->sample_info_ptr,
+   &pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
+   sizeof(struct kfd_pc_sample_info));
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   return ret ? -EFAULT : -EEXIST;
+   }
+
+   /* TODO: add trace_id return */
+
+   if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
+   pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info = 
user_info;
+
+   pdd->dev->pcs_data.hosttrap_entry.base.use_count++;
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   return 0;
 }
 
 static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t 
trace_id)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index f55195fea3df..96999f602224 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -269,9 +269,19 @@ struct kfd_vmid_info {
 
 struct kfd_dev;
 
+struct kfd_dev_pc_sampling_data {
+   uint32_t use_count; /* Num of PC sampling sessions */
+   struct kfd_pc_sample_info pc_sample_info;
+};
+
+struct kfd_dev_pcs_hosttrap {
+   struct kfd_dev_pc_sampling_data base;
+};
+
 /* Per device PC Sampling data */
 struct kfd_dev_pc_sampling {
struct mutex mutex;
+   struct kfd_dev_pcs_hosttrap hosttrap_entry;
 };
 
 struct kfd_node {
-- 
2.25.1



[PATCH v4 11/24] drm/amdkfd/gfx9: enable host trap

2024-02-06 Thread James Zhu
Enable host trap.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 63 +++
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 24 ---
 2 files changed, 52 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index d1caaf0e6a7c..af1f678790e7 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -274,14 +274,14 @@ static const uint32_t cwsr_trap_gfx8_hex[] = {
 
 
 static const uint32_t cwsr_trap_gfx9_hex[] = {
-   0xbf820001, 0xbf820258,
+   0xbf820001, 0xbf82025e,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
0x00ff, 0xbf85001e,
0x866eff7b, 0x0400,
-   0xbf850055, 0xbf8e0010,
+   0xbf85005b, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
0xbf850015, 0x866eff7b,
@@ -294,7 +294,7 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
0xbf850007, 0xb8eef801,
0x866eff6e, 0x0800,
0xbf850003, 0x866eff7b,
-   0x0400, 0xbf85003a,
+   0x0400, 0xbf850040,
0xb8faf807, 0x867aff7a,
0x001f8000, 0x8e7a8b7a,
0x8977ff77, 0xfc00,
@@ -303,13 +303,16 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
0xb8fbf813, 0x8efa887a,
0xbf0d8f7b, 0xbf840002,
0x877bff7b, 0x,
-   0xc0031bbd, 0x0010,
-   0xbf8cc07f, 0x8e6e976e,
-   0x8977ff77, 0x0080,
-   0x87776e77, 0xc0071bbd,
-   0x, 0xbf8cc07f,
+   0xc0031c3d, 0x0010,
+   0xc0071bbd, 0x,
0xc0071ebd, 0x0008,
-   0xbf8cc07f, 0x86ee6e6e,
+   0xbf8cc07f, 0x8671ff6d,
+   0x0100, 0xbf840004,
+   0x92f1ff70, 0x00010001,
+   0xbf840016, 0xbf820005,
+   0x86708170, 0x8e709770,
+   0x8977ff77, 0x0080,
+   0x8077, 0x86ee6e6e,
0xbf840001, 0xbe801d6e,
0x866eff6d, 0x01ff,
0xbf850005, 0x8778ff78,
@@ -1098,14 +1101,14 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
 };
 
 static const uint32_t cwsr_trap_arcturus_hex[] = {
-   0xbf820001, 0xbf8202d4,
+   0xbf820001, 0xbf8202da,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
0x00ff, 0xbf85001e,
0x866eff7b, 0x0400,
-   0xbf850055, 0xbf8e0010,
+   0xbf85005b, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
0xbf850015, 0x866eff7b,
@@ -1118,7 +1121,7 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
0xbf850007, 0xb8eef801,
0x866eff6e, 0x0800,
0xbf850003, 0x866eff7b,
-   0x0400, 0xbf85003a,
+   0x0400, 0xbf850040,
0xb8faf807, 0x867aff7a,
0x001f8000, 0x8e7a8b7a,
0x8977ff77, 0xfc00,
@@ -1127,13 +1130,16 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
0xb8fbf813, 0x8efa887a,
0xbf0d8f7b, 0xbf840002,
0x877bff7b, 0x,
-   0xc0031bbd, 0x0010,
-   0xbf8cc07f, 0x8e6e976e,
-   0x8977ff77, 0x0080,
-   0x87776e77, 0xc0071bbd,
-   0x, 0xbf8cc07f,
+   0xc0031c3d, 0x0010,
+   0xc0071bbd, 0x,
0xc0071ebd, 0x0008,
-   0xbf8cc07f, 0x86ee6e6e,
+   0xbf8cc07f, 0x8671ff6d,
+   0x0100, 0xbf840004,
+   0x92f1ff70, 0x00010001,
+   0xbf840016, 0xbf820005,
+   0x86708170, 0x8e709770,
+   0x8977ff77, 0x0080,
+   0x8077, 0x86ee6e6e,
0xbf840001, 0xbe801d6e,
0x866eff6d, 0x01ff,
0xbf850005, 0x8778ff78,
@@ -1578,14 +1584,14 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
 };
 
 static const uint32_t cwsr_trap_aldebaran_hex[] = {
-   0xbf820001, 0xbf8202df,
+   0xbf820001, 0xbf8202e5,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
0x00ff, 0xbf85001e,
0x866eff7b, 0x0400,
-   0xbf850055, 0xbf8e0010,
+   0xbf85005b, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
0xbf850015, 0x866eff7b,
@@ -1598,7 +1604,7 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = {
0xbf850007, 0xb8eef801,
0x866eff6e, 0x0800,
0xbf850003, 0x866eff7b,
-   0x0400, 0xbf85003a,
+   0x0400, 0xbf850040,
0xb8faf807, 0x867aff7a,
0x001f8000, 0x8e7a8b7a,
0x8977ff77, 0xfc00,
@@ -1607,13 +1613,16 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = {
0xb8fbf813, 0x8efa887a,
0xbf0d8f7b, 0xbf840002,
0x877bff7b, 0x,
-   0xc0031bbd, 0x0010,
-   0xbf8cc07f, 0x8e6e976e,
-   0x8977ff77, 0x0080,
-   0x87776e77, 0xc0071bbd,
-   0x, 0xbf8cc07f,
+   

[PATCH v4 06/24] drm/amdkfd: add trace_id return

2024-02-06 Thread James Zhu
Return a trace_id for each new per-device PC sampling session.
Use an IDR to quickly look up the pc_sampling_entry by trace_id.
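
For readers unfamiliar with the pattern, here is a small user-space analogue of
an ID-to-entry table. It is a sketch only: the type and function names are
hypothetical, the table is fixed-size, and it reuses the lowest free ID,
whereas the patch uses idr_alloc_cyclic(), which allocates cyclically. As in
the patch, IDs start at 1 so that 0 is never a valid trace_id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACES 8u

/* Slot 0 is left unused so that 0 can mean "no ID". */
struct trace_table {
	void *slot[MAX_TRACES + 1];
};

/* Store entry and return its ID (>= 1), or 0 when the table is full. */
static uint32_t trace_alloc(struct trace_table *t, void *entry)
{
	assert(t != NULL && entry != NULL);
	for (uint32_t id = 1; id <= MAX_TRACES; id++) {
		if (!t->slot[id]) {
			t->slot[id] = entry;
			return id;
		}
	}
	return 0;
}

/* Look up the entry for an ID; NULL when the ID is unknown. */
static void *trace_find(const struct trace_table *t, uint32_t id)
{
	assert(t != NULL);
	if (id == 0 || id > MAX_TRACES)
		return NULL;
	return t->slot[id];
}

/* Drop the mapping and hand the entry back so the caller can free it. */
static void *trace_remove(struct trace_table *t, uint32_t id)
{
	void *entry = trace_find(t, id);

	if (entry)
		t->slot[id] = NULL;
	return entry;
}
```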

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c  |  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  6 ++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0e24e011f66b..bcaeedac8fe0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -536,10 +536,12 @@ static void kfd_smi_init(struct kfd_node *dev)
 static void kfd_pc_sampling_init(struct kfd_node *dev)
 {
mutex_init(&dev->pcs_data.mutex);
+   idr_init_base(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, 1);
 }
 
 static void kfd_pc_sampling_exit(struct kfd_node *dev)
 {
+   idr_destroy(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr);
mutex_destroy(&dev->pcs_data.mutex);
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 9267de0bbdac..a607fc148958 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -110,6 +110,7 @@ static int kfd_pc_sample_create(struct kfd_process_device 
*pdd,
 {
struct kfd_pc_sample_info *supported_format = NULL;
struct kfd_pc_sample_info user_info;
+   struct pc_sampling_entry *pcs_entry;
int ret;
int i;
 
@@ -157,7 +158,19 @@ static int kfd_pc_sample_create(struct kfd_process_device 
*pdd,
return ret ? -EFAULT : -EEXIST;
}
 
-   /* TODO: add trace_id return */
+   pcs_entry = kzalloc(sizeof(*pcs_entry), GFP_KERNEL);
+   if (!pcs_entry) {
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   return -ENOMEM;
+   }
+
+   i = 
idr_alloc_cyclic(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+   pcs_entry, 1, 0, GFP_KERNEL);
+   if (i < 0) {
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   kfree(pcs_entry);
+   return i;
+   }
 
if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info = 
user_info;
@@ -165,6 +178,11 @@ static int kfd_pc_sample_create(struct kfd_process_device 
*pdd,
pdd->dev->pcs_data.hosttrap_entry.base.use_count++;
mutex_unlock(&pdd->dev->pcs_data.mutex);
 
+   pcs_entry->pdd = pdd;
+   user_args->trace_id = (uint32_t)i;
+
+   pr_debug("alloc pcs_entry = %p, trace_id = 0x%x on gpu 0x%x", 
pcs_entry, i, pdd->dev->id);
+
return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 96999f602224..2df240518d1f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -271,6 +271,7 @@ struct kfd_dev;
 
 struct kfd_dev_pc_sampling_data {
uint32_t use_count; /* Num of PC sampling sessions */
+   struct idr pc_sampling_idr;
struct kfd_pc_sample_info pc_sample_info;
 };
 
@@ -756,6 +757,11 @@ enum kfd_pdd_bound {
  */
 #define SDMA_ACTIVITY_DIVISOR  100
 
+struct pc_sampling_entry {
+   bool enabled;
+   struct kfd_process_device *pdd;
+};
+
 /* Data that is per-process-per device. */
 struct kfd_process_device {
/* The device that owns this data. */
-- 
2.25.1



[PATCH v4 16/24] drm/amdkfd: use bit operation set debug trap

2024-02-06 Thread James Zhu
The 1st-level TMA's 2nd byte is used for the trap type setting.
Use bit operations so that only the selected bit is changed.
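
The difference from a plain store can be sketched in user space with C11
atomics. This is an analogue, not the kernel code: atomic_fetch_or() and
atomic_fetch_and() stand in for set_bit() and clear_bit(), and the enum and
function names are local to this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the bit layout introduced by the patch. */
enum trap_type_bit {
	TRAP_TYPE_DEBUG = 0,
	TRAP_TYPE_HOST,
	TRAP_TYPE_STOCHASTIC,
};

/* Set or clear one trap-type bit without disturbing its neighbours. */
static void trap_flag_update(_Atomic uint64_t *word, enum trap_type_bit bit,
			     bool enabled)
{
	uint64_t mask = UINT64_C(1) << bit;

	assert(word != NULL);
	if (enabled)
		atomic_fetch_or(word, mask);
	else
		atomic_fetch_and(word, ~mask);
}
```

With the old "word = enabled" style store, disabling the debug trap would also
wipe a host-trap bit that another user had set in the same word.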

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 717a60d7a4ea..3e3cead6ccf8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1443,13 +1443,23 @@ bool kfd_process_xnack_mode(struct kfd_process *p, bool 
supported)
return true;
 }
 
+/* bit offset in 1st-level TMA's 2nd byte which used for KFD_TRAP_TYPE_BIT */
+enum KFD_TRAP_TYPE_BIT {
+   KFD_TRAP_TYPE_DEBUG = 0,/* bit 0 for debug trap */
+   KFD_TRAP_TYPE_HOST,
+   KFD_TRAP_TYPE_STOCHASTIC,
+};
+
 void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd,
 bool enabled)
 {
if (qpd->cwsr_kaddr) {
-   uint64_t *tma =
-   (uint64_t *)(qpd->cwsr_kaddr + KFD_CWSR_TMA_OFFSET);
-   tma[2] = enabled;
+   volatile unsigned long *tma =
+   (volatile unsigned long *)(qpd->cwsr_kaddr + 
KFD_CWSR_TMA_OFFSET);
+   if (enabled)
+   set_bit(KFD_TRAP_TYPE_DEBUG, &tma[2]);
+   else
+   clear_bit(KFD_TRAP_TYPE_DEBUG, &tma[2]);
}
 }
 
-- 
2.25.1



[PATCH v4 20/24] drm/amdkfd: enable pc sampling start

2024-02-06 Thread James Zhu
Enable pc sampling start.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 27 +---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index df2f4bfd0cda..6f50ba1f8989 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -95,9 +95,30 @@ static int kfd_pc_sample_query_cap(struct kfd_process_device 
*pdd,
return 0;
 }
 
-static int kfd_pc_sample_start(struct kfd_process_device *pdd)
+static int kfd_pc_sample_start(struct kfd_process_device *pdd,
+   struct pc_sampling_entry *pcs_entry)
 {
-   return -EINVAL;
+   bool pc_sampling_start = false;
+
+   pcs_entry->enabled = true;
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+
+   kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
+   pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, 
true);
+
+   if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count)
+   pc_sampling_start = true;
+   pdd->dev->pcs_data.hosttrap_entry.base.active_count++;
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   while (pc_sampling_start) {
+   if 
(READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable))
+   usleep_range(1000, 2000);
+   else
+   break;
+   }
+
+   return 0;
 }
 
 static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
@@ -269,7 +290,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
if (pcs_entry->enabled)
return -EALREADY;
else
-   return kfd_pc_sample_start(pdd);
+   return kfd_pc_sample_start(pdd, pcs_entry);
 
case KFD_IOCTL_PCS_OP_STOP:
if (!pcs_entry->enabled)
-- 
2.25.1



[PATCH v4 14/24] drm/amdkfd: trigger pc sampling trap for arcturus

2024-02-06 Thread James Zhu
Implement trigger pc sampling trap for arcturus.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c| 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
index 0ba15dcbe4e1..10b362e072a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
@@ -390,6 +390,17 @@ static uint32_t kgd_arcturus_disable_debug_trap(struct 
amdgpu_device *adev,
 
return 0;
 }
+
+static uint32_t kgd_arcturus_trigger_pc_sample_trap(struct amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method 
method)
+{
+   return kgd_gfx_v9_trigger_pc_sample_trap(adev, vmid, 10, 4,
+   target_simd, target_wave_slot, method);
+}
+
 const struct kfd2kgd_calls arcturus_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -418,5 +429,6 @@ const struct kfd2kgd_calls arcturus_kfd2kgd = {
.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
.build_grace_period_packet_info = 
kgd_gfx_v9_build_grace_period_packet_info,
.get_cu_occupancy = kgd_gfx_v9_get_cu_occupancy,
-   .program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings
+   .program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings,
+   .trigger_pc_sample_trap = kgd_arcturus_trigger_pc_sample_trap
 };
-- 
2.25.1



[PATCH v4 01/24] drm/amdkfd/kfd_ioctl: add pc sampling support

2024-02-06 Thread James Zhu
From: David Yat Sin 

Add pc sampling support in kfd_ioctl.

The user-mode code that uses this new kfd_ioctl is available on the
master branch of
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface
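
As a rough illustration of the new interface, the sketch below fills in a
CREATE request. The types are local mirrors of the uapi definitions this patch
adds, with shortened names so the example compiles on its own; the helper
pcs_fill_create() is hypothetical and no ioctl is actually issued.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the enums and structs added by this patch. */
enum pcs_op { PCS_OP_QUERY_CAPABILITIES, PCS_OP_CREATE, PCS_OP_DESTROY,
	      PCS_OP_START, PCS_OP_STOP };
enum pcs_method { PCS_METHOD_HOSTTRAP = 1, PCS_METHOD_STOCHASTIC };
enum pcs_type { PCS_TYPE_TIME_US, PCS_TYPE_CLOCK_CYCLES,
		PCS_TYPE_INSTRUCTIONS };

struct pc_sample_info {
	uint64_t interval;
	uint64_t interval_min;
	uint64_t interval_max;
	uint64_t flags;
	uint32_t method;
	uint32_t type;
};

struct pc_sample_args {
	uint64_t sample_info_ptr;
	uint32_t num_sample_info;
	uint32_t op;
	uint32_t gpu_id;
	uint32_t trace_id;
	uint32_t flags;
	uint32_t reserved;
};

/* Prepare a host-trap, time-based CREATE request for one GPU. */
static void pcs_fill_create(struct pc_sample_args *args,
			    struct pc_sample_info *info,
			    uint32_t gpu_id, uint64_t interval_us)
{
	assert(args != NULL && info != NULL);

	memset(info, 0, sizeof(*info));
	info->interval = interval_us;
	info->method = PCS_METHOD_HOSTTRAP;
	info->type = PCS_TYPE_TIME_US;

	memset(args, 0, sizeof(*args));
	args->sample_info_ptr = (uint64_t)(uintptr_t)info;
	args->num_sample_info = 1;	/* exactly one descriptor per request */
	args->op = PCS_OP_CREATE;
	args->gpu_id = gpu_id;
}
```

On success the kernel side is expected to write the session handle back into
trace_id, which later DESTROY, START and STOP requests would carry.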

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
 include/uapi/linux/kfd_ioctl.h | 61 +-
 1 file changed, 60 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 9ce46edc62a5..ec1b6404b185 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -1447,6 +1447,62 @@ struct kfd_ioctl_dbg_trap_args {
};
 };
 
+/**
+ * kfd_ioctl_pc_sample_op - PC Sampling ioctl operations
+ *
+ * @KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES: Query device PC Sampling capabilities
+ * @KFD_IOCTL_PCS_OP_CREATE: Register this process with a 
per-device PC sampler instance
+ * @KFD_IOCTL_PCS_OP_DESTROY:Unregister from a previously 
registered PC sampler instance
+ * @KFD_IOCTL_PCS_OP_START:  Process begins taking samples from a 
previously registered PC sampler instance
+ * @KFD_IOCTL_PCS_OP_STOP:   Process stops taking samples from a 
previously registered PC sampler instance
+ */
+enum kfd_ioctl_pc_sample_op {
+   KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES,
+   KFD_IOCTL_PCS_OP_CREATE,
+   KFD_IOCTL_PCS_OP_DESTROY,
+   KFD_IOCTL_PCS_OP_START,
+   KFD_IOCTL_PCS_OP_STOP,
+};
+
+/* Values have to be a power of 2*/
+#define KFD_IOCTL_PCS_FLAG_POWER_OF_2 0x0001
+
+enum kfd_ioctl_pc_sample_method {
+   KFD_IOCTL_PCS_METHOD_HOSTTRAP = 1,
+   KFD_IOCTL_PCS_METHOD_STOCHASTIC,
+};
+
+enum kfd_ioctl_pc_sample_type {
+   KFD_IOCTL_PCS_TYPE_TIME_US,
+   KFD_IOCTL_PCS_TYPE_CLOCK_CYCLES,
+   KFD_IOCTL_PCS_TYPE_INSTRUCTIONS
+};
+
+struct kfd_pc_sample_info {
+   __u64 interval;  /* [IN] if PCS_TYPE_TIME_US: sample interval 
in us
+ * if PCS_TYPE_CLOCK_CYCLES: sample interval in 
graphics core clk cycles
+ * if PCS_TYPE_INSTRUCTIONS: sample interval in 
instructions issued by
+ * graphics compute units
+ */
+   __u64 interval_min;  /* [OUT] */
+   __u64 interval_max;  /* [OUT] */
+   __u64 flags; /* [OUT] indicate potential restrictions e.g 
FLAG_POWER_OF_2 */
+   __u32 method;/* [IN/OUT] kfd_ioctl_pc_sample_method */
+   __u32 type;  /* [IN/OUT] kfd_ioctl_pc_sample_type */
+};
+
+#define KFD_IOCTL_PCS_QUERY_TYPE_FULL (1 << 0) /* If not set, return current */
+
+struct kfd_ioctl_pc_sample_args {
+   __u64 sample_info_ptr;   /* array of kfd_pc_sample_info */
+   __u32 num_sample_info;
+   __u32 op;/* kfd_ioctl_pc_sample_op */
+   __u32 gpu_id;
+   __u32 trace_id;
+   __u32 flags; /* kfd_ioctl_pcs_query flags */
+   __u32 reserved;
+};
+
 #define AMDKFD_IOCTL_BASE 'K'
 #define AMDKFD_IO(nr)  _IO(AMDKFD_IOCTL_BASE, nr)
 #define AMDKFD_IOR(nr, type)   _IOR(AMDKFD_IOCTL_BASE, nr, type)
@@ -1567,7 +1623,10 @@ struct kfd_ioctl_dbg_trap_args {
 #define AMDKFD_IOC_DBG_TRAP\
AMDKFD_IOWR(0x26, struct kfd_ioctl_dbg_trap_args)
 
+#define AMDKFD_IOC_PC_SAMPLE   \
+   AMDKFD_IOWR(0x27, struct kfd_ioctl_pc_sample_args)
+
 #define AMDKFD_COMMAND_START   0x01
-#define AMDKFD_COMMAND_END 0x27
+#define AMDKFD_COMMAND_END 0x28
 
 #endif
-- 
2.25.1



[PATCH v4 08/24] drm/amdkfd: enable pc sampling destroy

2024-02-06 Thread James Zhu
Enable pc sampling destroy.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 72c66d4bd24f..b46caa52fbe8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -186,10 +186,24 @@ static int kfd_pc_sample_create(struct kfd_process_device 
*pdd,
return 0;
 }
 
-static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_id)
+static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_id,
+   struct pc_sampling_entry *pcs_entry)
 {
-   return -EINVAL;
+   pr_debug("free pcs_entry = %p, trace_id = 0x%x on gpu 0x%x",
+   pcs_entry, trace_id, pdd->dev->id);
+
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   pdd->dev->pcs_data.hosttrap_entry.base.use_count--;
+   idr_remove(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, trace_id);
 
+   if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
+   memset(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info, 0x0,
+   sizeof(struct kfd_pc_sample_info));
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   kfree(pcs_entry);
+
+   return 0;
 }
 
 int kfd_pc_sample(struct kfd_process_device *pdd,
@@ -224,7 +238,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
if (pcs_entry->enabled)
return -EBUSY;
else
-   return kfd_pc_sample_destroy(pdd, args->trace_id);
+   return kfd_pc_sample_destroy(pdd, args->trace_id, pcs_entry);
 
case KFD_IOCTL_PCS_OP_START:
if (pcs_entry->enabled)
-- 
2.25.1



[PATCH v4 10/24] drm/amdkfd: trigger pc sampling trap for gfx v9

2024-02-06 Thread James Zhu
Implement trigger pc sampling trap for gfx v9.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 36 +++
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  7 
 2 files changed, 43 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index 5a35a8ca8922..7d8c0e13ac12 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -1144,6 +1144,42 @@ void kgd_gfx_v9_program_trap_handler_settings(struct 
amdgpu_device *adev,
kgd_gfx_v9_unlock_srbm(adev, inst);
 }
 
+uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t max_wave_slot,
+   uint32_t max_simd,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method method)
+{
+   if (method == KFD_IOCTL_PCS_METHOD_HOSTTRAP) {
+   uint32_t value = 0;
+
+   value = REG_SET_FIELD(value, SQ_CMD, CMD, SQ_IND_CMD_CMD_TRAP);
+   value = REG_SET_FIELD(value, SQ_CMD, MODE, SQ_IND_CMD_MODE_SINGLE);
+
+   /* select *target_simd */
+   value = REG_SET_FIELD(value, SQ_CMD, SIMD_ID, *target_simd);
+   /* select *target_wave_slot */
+   value = REG_SET_FIELD(value, SQ_CMD, WAVE_ID, (*target_wave_slot)++);
+
+   mutex_lock(&adev->grbm_idx_mutex);
+   amdgpu_gfx_select_se_sh(adev, 0xffffffff, 0xffffffff, 0xffffffff, 0);
+   WREG32_SOC15(GC, 0, mmSQ_CMD, value);
+   mutex_unlock(&adev->grbm_idx_mutex);
+
+   *target_wave_slot %= max_wave_slot;
+   if (!(*target_wave_slot)) {
+   (*target_simd)++;
+   *target_simd %= max_simd;
+   }
+   } else {
+   pr_debug("PC Sampling method %d not supported.", method);
+   return -EOPNOTSUPP;
+   }
+   return 0;
+}
+
 const struct kfd2kgd_calls gfx_v9_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
index ce424615f59b..b47b926891a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
@@ -101,3 +101,10 @@ void kgd_gfx_v9_build_grace_period_packet_info(struct 
amdgpu_device *adev,
   uint32_t grace_period,
   uint32_t *reg_offset,
   uint32_t *reg_data);
+uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t max_wave_slot,
+   uint32_t max_simd,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method method);
-- 
2.25.1



[PATCH v4 13/24] drm/amdgpu: add sq host trap status check

2024-02-06 Thread James Zhu
Before firing a new host trap, check the host trap status.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 35 +++
 .../amd/include/asic_reg/gc/gc_9_0_offset.h   |  2 ++
 .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h  |  5 +++
 3 files changed, 42 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index adfe5e5585e5..43edd62df5fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -1144,6 +1144,35 @@ void kgd_gfx_v9_program_trap_handler_settings(struct 
amdgpu_device *adev,
kgd_gfx_v9_unlock_srbm(adev, inst);
 }
 
+static uint32_t kgd_aldebaran_get_hosttrap_status(struct amdgpu_device *adev)
+{
+   uint32_t sq_hosttrap_status = 0x0;
+   int i, j;
+
+   mutex_lock(&adev->grbm_idx_mutex);
+   for (i = 0; i < adev->gfx.config.max_shader_engines; i++) {
+   for (j = 0; j < adev->gfx.config.max_sh_per_se; j++) {
+   amdgpu_gfx_select_se_sh(adev, i, j, 0xffffffff, 0);
+   sq_hosttrap_status = RREG32_SOC15(GC, 0, mmSQ_HOSTTRAP_STATUS);
+
+   if (sq_hosttrap_status & SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK) {
+   WREG32_SOC15(GC, 0, mmSQ_HOSTTRAP_STATUS,
+   SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK);
+   sq_hosttrap_status = 0x0;
+   continue;
+   }
+   if (sq_hosttrap_status)
+   goto out;
+   }
+   }
+
+out:
+   amdgpu_gfx_select_se_sh(adev, 0xffffffff, 0xffffffff, 0xffffffff, 0);
+   mutex_unlock(&adev->grbm_idx_mutex);
+
+   return sq_hosttrap_status;
+}
+
 uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev,
uint32_t vmid,
uint32_t max_wave_slot,
@@ -1154,6 +1183,12 @@ uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct 
amdgpu_device *adev,
 {
if (method == KFD_IOCTL_PCS_METHOD_HOSTTRAP) {
uint32_t value = 0;
+   uint32_t sq_hosttrap_status = 0x0;
+
+   sq_hosttrap_status = kgd_aldebaran_get_hosttrap_status(adev);
+   /* skip when last host trap request is still pending to complete */
+   if (sq_hosttrap_status)
+   return 0;
 
value = REG_SET_FIELD(value, SQ_CMD, CMD, SQ_IND_CMD_CMD_TRAP);
value = REG_SET_FIELD(value, SQ_CMD, MODE, 
SQ_IND_CMD_MODE_SINGLE);
diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h 
b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h
index 12d451e5475b..5b17d9066452 100644
--- a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h
+++ b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h
@@ -462,6 +462,8 @@
 #define mmSQ_IND_DATA_BASE_IDX 0
 #define mmSQ_CMD   0x037b
 #define mmSQ_CMD_BASE_IDX  0
+#define mmSQ_HOSTTRAP_STATUS   0x0376
+#define mmSQ_HOSTTRAP_STATUS_BASE_IDX  0
 #define mmSQ_TIME_HI   0x037c
 #define mmSQ_TIME_HI_BASE_IDX  0
 #define mmSQ_TIME_LO   0x037d
diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h 
b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
index efc16ddf274a..3dfe4ab31421 100644
--- a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
+++ b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
@@ -2616,6 +2616,11 @@
 //SQ_CMD_TIMESTAMP
 #define SQ_CMD_TIMESTAMP__TIMESTAMP__SHIFT 
   0x0
 #define SQ_CMD_TIMESTAMP__TIMESTAMP_MASK   
   0x00FFL
+//SQ_HOSTTRAP_STATUS
+#define SQ_HOSTTRAP_STATUS__HTPENDINGCOUNT__SHIFT  0x0
+#define SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE__SHIFT  0x8
+#define SQ_HOSTTRAP_STATUS__HTPENDINGCOUNT_MASK    0x000000FFL
+#define SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK    0x00000100L
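
To make the status check easier to follow, here is a small stand-alone model
of how a raw SQ_HOSTTRAP_STATUS value is interpreted by
kgd_aldebaran_get_hosttrap_status(). It assumes an 8-bit pending count at
bit 0 (mask 0xFF) and a single override bit at bit 8 (mask 0x100), as the
__SHIFT values 0x0 and 0x8 suggest. The macro and helper names are
hypothetical, and no hardware access is involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed field layout, derived from the __SHIFT values in the patch. */
#define HT_PENDINGCOUNT_SHIFT     0x0
#define HT_PENDINGCOUNT_MASK      0x000000FFu
#define HT_PENDING_OVERRIDE_MASK  0x00000100u

/* Hypothetical helper: number of host traps still pending. */
static uint32_t hosttrap_pending_count(uint32_t status)
{
	return (status & HT_PENDINGCOUNT_MASK) >> HT_PENDINGCOUNT_SHIFT;
}

/* Hypothetical helper: true if firing a new host trap should be skipped. */
static bool hosttrap_busy(uint32_t status)
{
	/* The patch clears the override bit and then treats the SH as idle. */
	if (status & HT_PENDING_OVERRIDE_MASK)
		return false;

	return status != 0;
}
```

Note that with the override bit set the driver loop writes the bit back and
continues to the next SE/SH, which the model reflects by reporting "not busy".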
 

[PATCH v4 07/24] drm/amdkfd: check pcs_entry valid

2024-02-06 Thread James Zhu
Check that pcs_entry is valid for the pc sampling ioctl.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 33 ++--
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index a607fc148958..72c66d4bd24f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -195,6 +195,24 @@ static int kfd_pc_sample_destroy(struct kfd_process_device 
*pdd, uint32_t trace_
 int kfd_pc_sample(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user *args)
 {
+   struct pc_sampling_entry *pcs_entry;
+
+   if (args->op != KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES &&
+   args->op != KFD_IOCTL_PCS_OP_CREATE) {
+
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   pcs_entry = idr_find(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+   args->trace_id);
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   /* pcs_entry belongs only to this pc sampling process,
+* which is protected by kfd_process->mutex here.
+*/
+   if (!pcs_entry ||
+   pcs_entry->pdd != pdd)
+   return -EINVAL;
+   }
+
switch (args->op) {
case KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES:
return kfd_pc_sample_query_cap(pdd, args);
@@ -203,13 +221,22 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
return kfd_pc_sample_create(pdd, args);
 
case KFD_IOCTL_PCS_OP_DESTROY:
-   return kfd_pc_sample_destroy(pdd, args->trace_id);
+   if (pcs_entry->enabled)
+   return -EBUSY;
+   else
+   return kfd_pc_sample_destroy(pdd, args->trace_id);
 
case KFD_IOCTL_PCS_OP_START:
-   return kfd_pc_sample_start(pdd);
+   if (pcs_entry->enabled)
+   return -EALREADY;
+   else
+   return kfd_pc_sample_start(pdd);
 
case KFD_IOCTL_PCS_OP_STOP:
-   return kfd_pc_sample_stop(pdd);
+   if (!pcs_entry->enabled)
+   return -EALREADY;
+   else
+   return kfd_pc_sample_stop(pdd);
}
 
return -EINVAL;
-- 
2.25.1
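
The state checks this patch adds to kfd_pc_sample() amount to a small
decision table. A stand-alone sketch, with a hypothetical enum and helper
standing in for the kfd_ioctl_pc_sample_op values and pcs_entry->enabled:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical local mirror of enum kfd_ioctl_pc_sample_op. */
enum pcs_op {
	PCS_OP_QUERY_CAPABILITIES,
	PCS_OP_CREATE,
	PCS_OP_DESTROY,
	PCS_OP_START,
	PCS_OP_STOP,
};

/* Return 0 if `op` may proceed for a session whose state is `enabled`. */
static int pcs_check_state(enum pcs_op op, bool enabled)
{
	switch (op) {
	case PCS_OP_DESTROY:
		return enabled ? -EBUSY : 0;     /* must be stopped first */
	case PCS_OP_START:
		return enabled ? -EALREADY : 0;  /* already sampling */
	case PCS_OP_STOP:
		return enabled ? 0 : -EALREADY;  /* already stopped */
	default:
		return 0;  /* query/create do not look up a session */
	}
}
```

In the driver the lookup failure case (-EINVAL for an unknown trace_id or a
foreign pdd) is handled before this table is consulted.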



[PATCH v4 04/24] drm/amdkfd: add pc sampling mutex

2024-02-06 Thread James Zhu
Add pc sampling mutex per node, and do init/destroy in node init.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 12 
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h   |  7 +++
 2 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0a9cf9dfc224..0e24e011f66b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -533,6 +533,16 @@ static void kfd_smi_init(struct kfd_node *dev)
spin_lock_init(&dev->smi_lock);
 }
 
+static void kfd_pc_sampling_init(struct kfd_node *dev)
+{
+   mutex_init(&dev->pcs_data.mutex);
+}
+
+static void kfd_pc_sampling_exit(struct kfd_node *dev)
+{
+   mutex_destroy(&dev->pcs_data.mutex);
+}
+
 static int kfd_init_node(struct kfd_node *node)
 {
int err = -1;
@@ -563,6 +573,7 @@ static int kfd_init_node(struct kfd_node *node)
}
 
kfd_smi_init(node);
+   kfd_pc_sampling_init(node);
 
return 0;
 
@@ -593,6 +604,7 @@ static void kfd_cleanup_nodes(struct kfd_dev *kfd, unsigned 
int num_nodes)
kfd_topology_remove_device(knode);
if (knode->gws)
amdgpu_amdkfd_free_gws(knode->adev, knode->gws);
+   kfd_pc_sampling_exit(knode);
kfree(knode);
kfd->nodes[i] = NULL;
}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index ae9a41670909..f55195fea3df 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -269,6 +269,11 @@ struct kfd_vmid_info {
 
 struct kfd_dev;
 
+/* Per device PC Sampling data */
+struct kfd_dev_pc_sampling {
+   struct mutex mutex;
+};
+
 struct kfd_node {
unsigned int node_id;
struct amdgpu_device *adev; /* Duplicated here along with keeping
@@ -322,6 +327,8 @@ struct kfd_node {
struct kfd_local_mem_info local_mem_info;
 
struct kfd_dev *kfd;
+
+   struct kfd_dev_pc_sampling pcs_data;
 };
 
 struct kfd_dev {
-- 
2.25.1



[PATCH v4 03/24] drm/amdkfd: enable pc sampling query

2024-02-06 Thread James Zhu
From: David Yat Sin 

Enable pc sampling to query system capability.

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 65 +++-
 1 file changed, 64 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index a7e78ff42d07..e9277c9beec7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -25,10 +25,73 @@
 #include "amdgpu_amdkfd.h"
 #include "kfd_pc_sampling.h"
 
+struct supported_pc_sample_info {
+   uint32_t ip_version;
+   const struct kfd_pc_sample_info *sample_info;
+};
+
+const struct kfd_pc_sample_info sample_info_hosttrap_9_0_0 = {
+   0, 1, ~0ULL, 0, KFD_IOCTL_PCS_METHOD_HOSTTRAP, KFD_IOCTL_PCS_TYPE_TIME_US };
+
+struct supported_pc_sample_info supported_formats[] = {
+   { IP_VERSION(9, 4, 1), &sample_info_hosttrap_9_0_0 },
+   { IP_VERSION(9, 4, 2), &sample_info_hosttrap_9_0_0 },
+};
+
 static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user *user_args)
 {
-   return -EINVAL;
+   uint64_t sample_offset;
+   int num_method = 0;
+   int ret;
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(supported_formats); i++)
+   if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version)
+   num_method++;
+
+   if (!num_method) {
+   pr_debug("PC Sampling not supported on GC_HWIP:0x%x.",
+   pdd->dev->adev->ip_versions[GC_HWIP][0]);
+   return -EOPNOTSUPP;
+   }
+
+   ret = 0;
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   if (user_args->flags != KFD_IOCTL_PCS_QUERY_TYPE_FULL &&
+   pdd->dev->pcs_data.hosttrap_entry.base.use_count) {
+   /* If we already have a session, restrict returned list to current method  */
+   user_args->num_sample_info = 1;
+
+   if (user_args->sample_info_ptr)
+   ret = copy_to_user((void __user *) user_args->sample_info_ptr,
+   &pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
+   sizeof(struct kfd_pc_sample_info));
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   return ret ? -EFAULT : 0;
+   }
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   if (!user_args->sample_info_ptr || user_args->num_sample_info < num_method) {
+   user_args->num_sample_info = num_method;
+   pr_debug("ASIC requires space for %d kfd_pc_sample_info entries.", num_method);
+   return -ENOSPC;
+   }
+
+   sample_offset = user_args->sample_info_ptr;
+   for (i = 0; i < ARRAY_SIZE(supported_formats); i++) {
+   if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version) {
+   ret = copy_to_user((void __user *) sample_offset,
+   supported_formats[i].sample_info, sizeof(struct kfd_pc_sample_info));
+   if (ret) {
+   pr_debug("Failed to copy PC sampling info to user.");
+   return -EFAULT;
+   }
+   sample_offset += sizeof(struct kfd_pc_sample_info);
+   }
+   }
+
+   return 0;
 }
 
 static int kfd_pc_sample_start(struct kfd_process_device *pdd)
-- 
2.25.1
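
The size negotiation in kfd_pc_sample_query_cap() follows the usual two-call
pattern: a first call with no (or too small a) buffer reports the required
entry count with -ENOSPC, and a second call with enough room copies the
entries. A user-space model of just that logic, with hypothetical names and a
plain array copy standing in for copy_to_user(), might look like this:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of struct kfd_pc_sample_info. */
struct sample_info {
	uint32_t method;
	uint32_t type;
};

/*
 * Model of the full-query path: `supported` plays the role of the entries
 * matching the device's IP version, `out`/`num_out` the caller's buffer and
 * its capacity (updated to the required count on -ENOSPC).
 */
static int query_cap_model(const struct sample_info *supported,
			   uint32_t num_supported,
			   struct sample_info *out, uint32_t *num_out)
{
	uint32_t i;

	if (!num_supported)
		return -EOPNOTSUPP;

	if (!out || *num_out < num_supported) {
		*num_out = num_supported;
		return -ENOSPC;
	}

	for (i = 0; i < num_supported; i++)
		out[i] = supported[i];

	return 0;
}
```

The model leaves out the "session already active" branch, where the driver
returns only the currently configured sample info unless
KFD_IOCTL_PCS_QUERY_TYPE_FULL is set.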



[PATCH v4 02/24] drm/amdkfd: add pc sampling support

2024-02-06 Thread James Zhu
From: David Yat Sin 

Add pc sampling functions in amdkfd.

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/Makefile  |  3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 45 +++
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 78 
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h | 34 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 13 
 5 files changed, 172 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile 
b/drivers/gpu/drm/amd/amdkfd/Makefile
index a5ae7bcf44eb..790fd028a681 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -57,7 +57,8 @@ AMDKFD_FILES  := $(AMDKFD_PATH)/kfd_module.o \
$(AMDKFD_PATH)/kfd_int_process_v11.o \
$(AMDKFD_PATH)/kfd_smi_events.o \
$(AMDKFD_PATH)/kfd_crat.o \
-   $(AMDKFD_PATH)/kfd_debug.o
+   $(AMDKFD_PATH)/kfd_debug.o \
+   $(AMDKFD_PATH)/kfd_pc_sampling.o
 
 ifneq ($(CONFIG_DEBUG_FS),)
 AMDKFD_FILES += $(AMDKFD_PATH)/kfd_debugfs.o
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 80e90fdef291..d9cac97c54c0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -41,6 +41,7 @@
 #include "kfd_priv.h"
 #include "kfd_device_queue_manager.h"
 #include "kfd_svm.h"
+#include "kfd_pc_sampling.h"
 #include "amdgpu_amdkfd.h"
 #include "kfd_smi_events.h"
 #include "amdgpu_dma_buf.h"
@@ -1745,6 +1746,39 @@ static int kfd_ioctl_svm(struct file *filep, struct 
kfd_process *p, void *data)
 }
 #endif
 
+static int kfd_ioctl_pc_sample(struct file *filep,
+  struct kfd_process *p, void __user *data)
+{
+   struct kfd_ioctl_pc_sample_args *args = data;
+   struct kfd_process_device *pdd;
+   int ret = 0;
+
+   if (sched_policy == KFD_SCHED_POLICY_NO_HWS) {
+   pr_err("PC Sampling does not support sched_policy %i", sched_policy);
+   return -EINVAL;
+   }
+
+   mutex_lock(&p->mutex);
+   pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+
+   if (!pdd) {
+   pr_debug("could not find gpu id 0x%x.", args->gpu_id);
+   ret = -EINVAL;
+   } else if (args->op == KFD_IOCTL_PCS_OP_START) {
+   pdd = kfd_bind_process_to_device(pdd->dev, p);
+   if (IS_ERR(pdd)) {
+   pr_debug("failed to bind process %p with gpu id 0x%x", p, args->gpu_id);
+   ret = -ESRCH;
+   }
+   }
+
+   if (!ret)
+   ret = kfd_pc_sample(pdd, args);
+   mutex_unlock(&p->mutex);
+
+   return ret;
+}
+
 static int criu_checkpoint_process(struct kfd_process *p,
 uint8_t __user *user_priv_data,
 uint64_t *priv_offset)
@@ -3219,6 +3253,9 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
 
AMDKFD_IOCTL_DEF(AMDKFD_IOC_DBG_TRAP,
kfd_ioctl_set_debug_trap, 0),
+
+   AMDKFD_IOCTL_DEF(AMDKFD_IOC_PC_SAMPLE,
+   kfd_ioctl_pc_sample, KFD_IOC_FLAG_PERFMON),
 };
 
 #define AMDKFD_CORE_IOCTL_COUNT ARRAY_SIZE(amdkfd_ioctls)
@@ -3295,6 +3332,14 @@ static long kfd_ioctl(struct file *filep, unsigned int 
cmd, unsigned long arg)
}
}
 
+   /* PC Sampling Monitor */
+   if (unlikely(ioctl->flags & KFD_IOC_FLAG_PERFMON)) {
+   if (!capable(CAP_PERFMON) && !capable(CAP_SYS_ADMIN)) {
+   retcode = -EACCES;
+   goto err_i1;
+   }
+   }
+
if (cmd & (IOC_IN | IOC_OUT)) {
if (asize <= sizeof(stack_kdata)) {
kdata = stack_kdata;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
new file mode 100644
index 000000000000..a7e78ff42d07
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED 

[PATCH v4 00/24] Support Host Trap Sampling for gfx941/gfx942

2024-02-06 Thread James Zhu
PC sampling is a form of software profiling, where the threads of an application
are periodically interrupted and the program counter that the threads are
currently attempting to execute is saved out for profiling.

David Yat Sin (5):
  drm/amdkfd/kfd_ioctl: add pc sampling support
  drm/amdkfd: add pc sampling support
  drm/amdkfd: enable pc sampling query
  drm/amdkfd: enable pc sampling create
  drm/amdkfd: Set debug trap bit when enabling PC Sampling

James Zhu (19):
  drm/amdkfd: add pc sampling mutex
  drm/amdkfd: add trace_id return
  drm/amdkfd: check pcs_entry valid
  drm/amdkfd: enable pc sampling destroy
  drm/amdkfd: add interface to trigger pc sampling trap
  drm/amdkfd: trigger pc sampling trap for gfx v9
  drm/amdkfd/gfx9: enable host trap
  drm/amdgpu: use trapID 4 for host trap
  drm/amdgpu: add sq host trap status check
  drm/amdkfd: trigger pc sampling trap for arcturus
  drm/amdkfd: trigger pc sampling trap for aldebaran
  drm/amdkfd: use bit operation set debug trap
  drm/amdkfd: add setting trap pc sampling flag
  drm/amdkfd: enable pc sampling stop
  drm/amdkfd: add queue remapping
  drm/amdkfd: enable pc sampling start
  drm/amdkfd: add pc sampling thread to trigger trap
  drm/amdkfd: add pc sampling release when process release
  drm/amdkfd: bump kfd ioctl minor version for pc sampling availability

 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |   11 +
 .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |   14 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |   73 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |7 +
 drivers/gpu/drm/amd/amdkfd/Makefile   |3 +-
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2106 +
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |   29 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |   75 +-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c|   26 +
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h|3 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   |   14 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |   11 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |5 +
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  |  426 
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h  |   35 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   46 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |   32 +-
 .../amd/include/asic_reg/gc/gc_9_0_offset.h   |2 +
 .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h  |5 +
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |7 +
 include/uapi/linux/kfd_ioctl.h|   64 +-
 21 files changed, 1914 insertions(+), 1080 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h

-- 
2.25.1



Re: [PATCH] drm/amd/display: Increase frame-larger-than for all display_mode_vba files

2024-02-06 Thread Alex Deucher
Applied.  Thanks!

On Mon, Feb 5, 2024 at 5:08 PM Nathan Chancellor  wrote:
>
> After a recent change in LLVM, allmodconfig (which has CONFIG_KCSAN=y
> and CONFIG_WERROR=y enabled) has a few new instances of
> -Wframe-larger-than for the mode support and system configuration
> functions:
>
>   
> drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20v2.c:3393:6:
>  error: stack frame size (2144) exceeds limit (2048) in 
> 'dml20v2_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
>3393 | void dml20v2_ModeSupportAndSystemConfigurationFull(struct 
> display_mode_lib *mode_lib)
> |  ^
>   1 error generated.
>
>   
> drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn21/display_mode_vba_21.c:3520:6:
>  error: stack frame size (2192) exceeds limit (2048) in 
> 'dml21_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
>3520 | void dml21_ModeSupportAndSystemConfigurationFull(struct 
> display_mode_lib *mode_lib)
> |  ^
>   1 error generated.
>
>   
> drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20.c:3286:6:
>  error: stack frame size (2128) exceeds limit (2048) in 
> 'dml20_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
>3286 | void dml20_ModeSupportAndSystemConfigurationFull(struct 
> display_mode_lib *mode_lib)
> |  ^
>   1 error generated.
>
> Without the sanitizers enabled, there are no warnings.
>
> This was the catalyst for commit 6740ec97bcdb ("drm/amd/display:
> Increase frame warning limit with KASAN or KCSAN in dml2") and that same
> change was made to dml in commit 5b750b22530f ("drm/amd/display:
> Increase frame warning limit with KASAN or KCSAN in dml") but the
> frame_warn_flag variable was not applied to all files. Do so now to
> clear up the warnings and make all these files consistent.
>
> Cc: sta...@vger.kernel.org
> Closes: https://github.com/ClangBuiltLinux/linux/issue/1990
> Signed-off-by: Nathan Chancellor 
> ---
>  drivers/gpu/drm/amd/display/dc/dml/Makefile | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile 
> b/drivers/gpu/drm/amd/display/dc/dml/Makefile
> index 6042a5a6a44f..59ade76ffb18 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
> +++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
> @@ -72,11 +72,11 @@ CFLAGS_$(AMDDALPATH)/dc/dml/display_mode_lib.o := 
> $(dml_ccflags)
>  CFLAGS_$(AMDDALPATH)/dc/dml/display_mode_vba.o := $(dml_ccflags)
>  CFLAGS_$(AMDDALPATH)/dc/dml/dcn10/dcn10_fpu.o := $(dml_ccflags)
>  CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/dcn20_fpu.o := $(dml_ccflags)
> -CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_mode_vba_20.o := $(dml_ccflags)
> +CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_mode_vba_20.o := $(dml_ccflags) 
> $(frame_warn_flag)
>  CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_rq_dlg_calc_20.o := $(dml_ccflags)
> -CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_mode_vba_20v2.o := $(dml_ccflags)
> +CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_mode_vba_20v2.o := $(dml_ccflags) 
> $(frame_warn_flag)
>  CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_rq_dlg_calc_20v2.o := 
> $(dml_ccflags)
> -CFLAGS_$(AMDDALPATH)/dc/dml/dcn21/display_mode_vba_21.o := $(dml_ccflags)
> +CFLAGS_$(AMDDALPATH)/dc/dml/dcn21/display_mode_vba_21.o := $(dml_ccflags) 
> $(frame_warn_flag)
>  CFLAGS_$(AMDDALPATH)/dc/dml/dcn21/display_rq_dlg_calc_21.o := $(dml_ccflags)
>  CFLAGS_$(AMDDALPATH)/dc/dml/dcn30/display_mode_vba_30.o := $(dml_ccflags) 
> $(frame_warn_flag)
>  CFLAGS_$(AMDDALPATH)/dc/dml/dcn30/display_rq_dlg_calc_30.o := $(dml_ccflags)
>
> ---
> base-commit: 6813cdca4ab94a238f8eb0cef3d3f3fcbdfb0ee0
> change-id: 20240205-amdgpu-raise-flt-for-dml-vba-files-ee5b5a9c5e43
>
> Best regards,
> --
> Nathan Chancellor 
>


Re: [PATCH 3/3] drm/amdgpu: wire up the can_remove() callback

2024-02-06 Thread Daniel Vetter
On Fri, Feb 02, 2024 at 03:40:03PM -0800, Greg Kroah-Hartman wrote:
> On Fri, Feb 02, 2024 at 05:25:56PM -0500, Hamza Mahfooz wrote:
> > Removing an amdgpu device that still has user space references allocated
> > to it causes undefined behaviour.
> 
> Then fix that please.  There should not be anything special about your
> hardware that all of the tens of thousands of other devices can't handle
> today.
> 
> What happens when I yank your device out of a system with a pci hotplug
> bus?  You can't prevent that either, so this should not be any different
> at all.
> 
> sorry, but please, just fix your driver.

fwiw Christian König from amd already rejected this too, I have no idea
why this was submitted since the very elaborate plan I developed with a
bunch of amd folks was to fix the various lifetime lolz we still have in
drm. We unfortunately export the world of internal objects to userspace as
uabi objects with dma_buf, dma_fence and everything else, but it's all
fixable and we have the plan even documented:

https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#device-hot-unplug

So yeah anything that isn't that plan of record is very much no-go for drm
drivers. Unless we change that plan of course, but that needs a
documentation patch first and a big discussion.

Aside from an absolute massive pile of kernel-internal refcounting bugs
the really big one we agreed on after a lot of discussion is that SIGBUS
on dma-buf mmaps is no-go for drm drivers, because it would break way too
much userspace in ways which are simply not fixable (since sig handlers
are shared in a process, which means the gl/vk driver cannot use it).

Otherwise it's bog standard "fix the kernel bugs" work, just a lot of it.

Cheers, Sima
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v2 3/3] drm/amdgpu: sync page table freeing with tlb flush

2024-02-06 Thread Sharma, Shashank

Hey Christian,

On 01/02/2024 14:48, Christian König wrote:



Am 31.01.24 um 18:14 schrieb Shashank Sharma:

This patch:
- Attaches the TLB flush fence to the PT objects being freed
- Adds a new ptr in VM to save this last TLB flush fence
- Adds a new lock in VM to prevent out-of-context update of saved
   TLB flush fence
- Adds a new ptr in tlb_flush structure to save vm

The idea is to delay freeing of page table objects until we have the
respective TLB entries flushed.

V2: rebase

Cc: Christian König 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Signed-off-by: Shashank Sharma 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    |  3 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h    |  4 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 27 +++
  .../gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c  | 13 +++--
  4 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index 67c690044b97..b0e81c249e3a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2245,6 +2245,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,

  vm->generation = 0;
    mutex_init(&vm->eviction_lock);
+    mutex_init(&vm->tlb_flush_lock);
  vm->evicting = false;
  vm->tlb_fence_context = dma_fence_context_alloc(1);
  @@ -2360,7 +2361,9 @@ int amdgpu_vm_make_compute(struct 
amdgpu_device *adev, struct amdgpu_vm *vm)

  }
    dma_fence_put(vm->last_update);
+    dma_fence_put(vm->tlb_fence_last);
  vm->last_update = dma_fence_get_stub();
+    vm->tlb_fence_last = dma_fence_get_stub();
  vm->is_compute_context = true;
    /* Free the shadow bo for compute VM */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h

index 8e6fd25d07b7..b05bc586237f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -334,6 +334,10 @@ struct amdgpu_vm {
  uint64_t    *tlb_seq_cpu_addr;
  uint64_t    tlb_fence_context;
  +    /* Ptr and lock to maintain tlb flush sync */
+    struct mutex    tlb_flush_lock;
+    struct dma_fence    *tlb_fence_last;
+
  atomic64_t    kfd_last_flushed_seq;
    /* How many times we had to re-generate the page tables */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c

index 95dc0afdaffb..f1c4418c4d63 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -631,6 +631,18 @@ static int amdgpu_vm_pt_alloc(struct 
amdgpu_device *adev,

  return r;
  }
  +static inline
+void amdgpu_vm_attach_tlb_fence(struct amdgpu_bo *bo, struct 
dma_fence *fence)

+{
+    if (!bo || !fence)
+    return;
+
+    if (!dma_resv_reserve_fences(bo->tbo.base.resv, 1)) {
+    dma_resv_add_fence(bo->tbo.base.resv, fence,
+   DMA_RESV_USAGE_BOOKKEEP);
+    }
+}
+
  /**
   * amdgpu_vm_pt_free - free one PD/PT
   *
@@ -638,6 +650,7 @@ static int amdgpu_vm_pt_alloc(struct 
amdgpu_device *adev,

   */
  static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry)
  {
+    struct amdgpu_vm *vm;
  struct amdgpu_bo *shadow;
    if (!entry->bo)
@@ -646,9 +659,23 @@ static void amdgpu_vm_pt_free(struct 
amdgpu_vm_bo_base *entry)

  entry->bo->vm_bo = NULL;
  shadow = amdgpu_bo_shadowed(entry->bo);
  if (shadow) {
+    vm = shadow->vm_bo->vm;
+
+    mutex_lock(&vm->tlb_flush_lock);
+    if (vm->tlb_fence_last)
+    amdgpu_vm_attach_tlb_fence(shadow, vm->tlb_fence_last);
+    mutex_unlock(&vm->tlb_flush_lock);
+
  ttm_bo_set_bulk_move(&shadow->tbo, NULL);
  amdgpu_bo_unref(&shadow);
  }
+
+    vm = entry->vm;
+    mutex_lock(&vm->tlb_flush_lock);
+    if (vm->tlb_fence_last)
+    amdgpu_vm_attach_tlb_fence(entry->bo, vm->tlb_fence_last);
+    mutex_unlock(&vm->tlb_flush_lock);
+


That approach doesn't make sense. Instead add the freed PT/PDs to a 
linked list in the parameters structure and only really free them 
after adding the fence to the root PD.
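
A rough user-space sketch of that idea, for illustration only. Every name below (struct pt_entry, struct update_params, the params_* helpers) is a hypothetical stand-in and not the real amdgpu code, and the fence is modelled as a plain sequence number. During the update, entries are only queued on a list in the parameters structure; the real free happens once the fence has been attached to the root PD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a page table / page directory entry. */
struct pt_entry {
	int id;
	struct pt_entry *next;	/* link in the deferred-free list */
};

/* Hypothetical stand-in for the update parameters structure. */
struct update_params {
	struct pt_entry *freed;	/* PT/PDs unlinked during this update */
	int root_fence_seq;	/* 0 until a fence is attached to the root PD */
};

/* Step 1: while updating, only queue the entry; do not free it yet. */
static void params_defer_free(struct update_params *p, struct pt_entry *e)
{
	e->next = p->freed;
	p->freed = e;
}

/* Step 2: attach the fence to the root PD (modelled as storing a seq). */
static void params_attach_root_fence(struct update_params *p, int seq)
{
	p->root_fence_seq = seq;
}

/* Step 3: really free everything that was queued; returns the count. */
static int params_free_deferred(struct update_params *p)
{
	int n = 0;

	/* Freeing before the fence is attached would defeat the purpose. */
	assert(p->root_fence_seq != 0);

	while (p->freed) {
		struct pt_entry *e = p->freed;

		p->freed = e->next;
		free(e);
		n++;
	}
	return n;
}
```

The point of the ordering is that no per-entry fence handling is needed: one fence goes on the root PD, and the queued entries are released in a single pass afterwards.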


Sure, I will do those changes.

Just out of curiosity, why wouldn't this approach work? Wouldn't this 
delay the actual freeing of the buffers by TTM until the fence signals?


- Shashank





ttm_bo_set_bulk_move(&entry->bo->tbo, NULL);
    spin_lock(&entry->vm->status_lock);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c

index 569681badd7c..54ec81d30034 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
@@ -31,6 +31,7 @@
  struct amdgpu_tlb_fence {
  struct dma_fence    base;
  struct amdgpu_device    *adev;
+    struct amdgpu_vm    *vm;


Big NAK to that. The VM might not live long enough to see the end of 
the TLB flush.


Regards,
Christian.


  struct dma_fence    

Re: linux-next: Tree for Feb 6 (gpu/drm/amd/display/ kernel-doc warnings)

2024-02-06 Thread Randy Dunlap



On 2/5/24 20:43, Stephen Rothwell wrote:
> Hi all,
> 
> Changes since 20240205:
> 

Hi Rodrigo,

Are you aware of these kernel-doc warnings?
I think they are due to

commit b8c1c3a82e75
Author: Rodrigo Siqueira 
Date:   Mon Jan 22 14:24:57 2024 -0700
Documentation/gpu: Add kernel doc entry for MPC



../drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h:1: warning: no structured 
comments found
../drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h:1: warning: no structured 
comments found
../drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h:1: warning: no structured 
comments found
../drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h:1: warning: no structured 
comments found
../drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h:1: warning: no structured 
comments found
../drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h:1: warning: no structured 
comments found
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:1: warning: no structured 
comments found
../drivers/gpu/drm/amd/display/dc/inc/hw/opp.h:1: warning: no structured 
comments found
../drivers/gpu/drm/amd/display/dc/inc/hw/opp.h:1: warning: no structured 
comments found
../drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h:1: warning: no 
structured comments found
../drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h:1: warning: no 
structured comments found

../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:132: warning: Incorrect use of 
kernel-doc format:  * @@overlap_only: Whether overlapping of different 
planes is allowed.
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:132: warning: Incorrect use of 
kernel-doc format:  * @@overlap_only: Whether overlapping of different 
planes is allowed.
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:132: warning: Incorrect use of 
kernel-doc format:  * @@overlap_only: Whether overlapping of different 
planes is allowed.
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:162: warning: Function parameter 
or struct member 'pre_multiplied_alpha' not described in 'mpcc_blnd_cfg'
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:162: warning: Function parameter 
or struct member 'overlap_only' not described in 'mpcc_blnd_cfg'
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter 
or struct member 'read_mpcc_state' not described in 'mpc_funcs'
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter 
or struct member 'mpc_init_single_inst' not described in 'mpc_funcs'
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter 
or struct member 'get_mpcc_for_dpp_from_secondary' not described in 'mpc_funcs'
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter 
or struct member 'get_mpcc_for_dpp' not described in 'mpc_funcs'
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter 
or struct member 'wait_for_idle' not described in 'mpc_funcs'
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter 
or struct member 'assert_mpcc_idle_before_connect' not described in 'mpc_funcs'
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter 
or struct member 'init_mpcc_list_from_hw' not described in 'mpc_funcs'
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter 
or struct member 'set_denorm' not described in 'mpc_funcs'
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter 
or struct member 'set_denorm_clamp' not described in 'mpc_funcs'
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter 
or struct member 'set_output_csc' not described in 'mpc_funcs'
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter 
or struct member 'set_ocsc_default' not described in 'mpc_funcs'
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter 
or struct member 'set_output_gamma' not described in 'mpc_funcs'
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter 
or struct member 'power_on_mpc_mem_pwr' not described in 'mpc_funcs'
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter 
or struct member 'set_dwb_mux' not described in 'mpc_funcs'
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter 
or struct member 'disable_dwb_mux' not described in 'mpc_funcs'
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter 
or struct member 'is_dwb_idle' not described in 'mpc_funcs'
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter 
or struct member 'set_out_rate_control' not described in 'mpc_funcs'
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter 
or struct member 'set_gamut_remap' not described in 'mpc_funcs'
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter 
or struct member 'program_1dlut' not described in 'mpc_funcs'

[lvc-project] [PATCH v2] drm/amd/pm: check return value of amdgpu_irq_add_id()

2024-02-06 Thread Igor Artemiev
amdgpu_irq_add_id() may fail, in which case the irq handlers will not be
registered. This patch adds an error code check.

Found by Linux Verification Center (linuxtesting.org).

Signed-off-by: Igor Artemiev 
---
v2: Free the source as Alexey Khoroshilov  suggested.
 .../drm/amd/pm/powerplay/hwmgr/smu_helper.c   | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu_helper.c 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu_helper.c
index 79a566f3564a..109df1039d5c 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu_helper.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu_helper.c
@@ -647,28 +647,41 @@ int smu9_register_irq_handlers(struct pp_hwmgr *hwmgr)
 {
struct amdgpu_irq_src *source =
kzalloc(sizeof(struct amdgpu_irq_src), GFP_KERNEL);
+   int ret;
 
if (!source)
return -ENOMEM;
 
source->funcs = &smu9_irq_funcs;
 
-   amdgpu_irq_add_id((struct amdgpu_device *)(hwmgr->adev),
+   ret = amdgpu_irq_add_id((struct amdgpu_device *)(hwmgr->adev),
SOC15_IH_CLIENTID_THM,
THM_9_0__SRCID__THM_DIG_THERM_L2H,
source);
-   amdgpu_irq_add_id((struct amdgpu_device *)(hwmgr->adev),
+   if (ret)
+   goto err;
+
+   ret = amdgpu_irq_add_id((struct amdgpu_device *)(hwmgr->adev),
SOC15_IH_CLIENTID_THM,
THM_9_0__SRCID__THM_DIG_THERM_H2L,
source);
+   if (ret)
+   goto err;
 
/* Register CTF(GPIO_19) interrupt */
-   amdgpu_irq_add_id((struct amdgpu_device *)(hwmgr->adev),
+   ret = amdgpu_irq_add_id((struct amdgpu_device *)(hwmgr->adev),
SOC15_IH_CLIENTID_ROM_SMUIO,
SMUIO_9_0__SRCID__SMUIO_GPIO19,
source);
+   if (ret)
+   goto err;
 
return 0;
+
+err:
+   kfree(source);
+
+   return ret;
 }
 
 void *smu_atom_get_data_table(void *dev, uint32_t table, uint16_t *size,
-- 
2.39.2