Re: [PATCH v2] drm/amdkfd: Initialize kfd_gpu_cache_info for KFD topology
On 2024-02-07 0:32, Joseph Greathouse wrote:
> The current kfd_gpu_cache_info structure is only partially filled in for
> some architectures. This means that for devices where we do not fill in
> some fields, we can return uninitialized values through the KFD topology.
> Zero out the kfd_gpu_cache_info before asking the remaining fields to be
> filled in by lower-level functions.
>
> Fixes: 04756ac9a24c ("drm/amdkfd: Add cache line sizes to KFD topology")
> Signed-off-by: Joseph Greathouse

Reviewed-by: Felix Kuehling

> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 3df2a8ad86fb..5cb0465493b8 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -1707,6 +1707,7 @@ static void kfd_fill_cache_non_crat_info(struct kfd_topology_device *dev, struct
>
>  	gpu_processor_id = dev->node_props.simd_id_base;
>
> +	memset(cache_info, 0, sizeof(cache_info));
>  	pcache_info = cache_info;
>  	num_of_cache_types = kfd_get_gpu_cache_info(kdev, &pcache_info);
>  	if (!num_of_cache_types) {
[PATCH v2] drm/amdkfd: Initialize kfd_gpu_cache_info for KFD topology
The current kfd_gpu_cache_info structure is only partially filled in for
some architectures. This means that for devices where we do not fill in
some fields, we can return uninitialized values through the KFD topology.
Zero out the kfd_gpu_cache_info before asking the remaining fields to be
filled in by lower-level functions.

Fixes: 04756ac9a24c ("drm/amdkfd: Add cache line sizes to KFD topology")
Signed-off-by: Joseph Greathouse
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 3df2a8ad86fb..5cb0465493b8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1707,6 +1707,7 @@ static void kfd_fill_cache_non_crat_info(struct kfd_topology_device *dev, struct

 	gpu_processor_id = dev->node_props.simd_id_base;

+	memset(cache_info, 0, sizeof(cache_info));
 	pcache_info = cache_info;
 	num_of_cache_types = kfd_get_gpu_cache_info(kdev, &pcache_info);
 	if (!num_of_cache_types) {
-- 
2.20.1
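The zero-then-fill pattern described in this patch can be sketched in isolation. This is a minimal user-space sketch, not the driver code: `struct cache_info`, `MAX_CACHE_TYPES`, `fill_partial()` and `reported_line_size()` are hypothetical stand-ins for the KFD structure and helpers.

```c
#include <string.h>

/* Hypothetical stand-in for KFD_MAX_CACHE_TYPES. */
#define MAX_CACHE_TYPES 6

/* Hypothetical, simplified stand-in for struct kfd_gpu_cache_info. */
struct cache_info {
	unsigned int cache_size;
	unsigned int cache_level;
	unsigned int cache_line_size; /* not filled in on every architecture */
};

/* Hypothetical lower-level helper that fills only some of the fields. */
static int fill_partial(struct cache_info *info)
{
	info[0].cache_size = 16;
	info[0].cache_level = 1;
	/* cache_line_size is deliberately left untouched */
	return 1;
}

/* Returns the line size reported for entry 0 after zeroing and filling. */
static unsigned int reported_line_size(void)
{
	struct cache_info cache_info[MAX_CACHE_TYPES];

	/*
	 * cache_info is a true array here (not a pointer), so
	 * sizeof(cache_info) covers every element.
	 */
	memset(cache_info, 0, sizeof(cache_info));
	(void)fill_partial(cache_info);

	return cache_info[0].cache_line_size;
}
```

Because the array is zeroed first, any field the helper skips reads back as 0 instead of stack garbage.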
[PATCH] drm/amd/display: Fix possible use of uninitialized 'max_chunks_fbc_mode' in 'calculate_bandwidth()'
'max_chunks_fbc_mode' is declared without an initializer and is only
assigned a value under a specific condition:

if (data->fbc_en[i] == 1) {
	max_chunks_fbc_mode = 128 - dmif_chunk_buff_margin;
}

If 'data->fbc_en[i]' is not equal to 1 for any i, 'max_chunks_fbc_mode'
is still uninitialized when it is used after this for loop.

Initialize 'max_chunks_fbc_mode' to a default value at its declaration
so that it has a value on all possible control flow paths.

Fixes the below:

drivers/gpu/drm/amd/amdgpu/../display/dc/basics/dce_calcs.c:914 calculate_bandwidth() error: uninitialized symbol 'max_chunks_fbc_mode'.
drivers/gpu/drm/amd/amdgpu/../display/dc/basics/dce_calcs.c:917 calculate_bandwidth() error: uninitialized symbol 'max_chunks_fbc_mode'.

Fixes: 4562236b3bc0 ("drm/amd/dc: Add dc display driver (v2)")
Cc: Harry Wentland
Cc: Alex Deucher
Cc: Rodrigo Siqueira
Cc: Aurabindo Pillai
Signed-off-by: Srinivasan Shanmugam
---
 drivers/gpu/drm/amd/display/dc/basics/dce_calcs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/basics/dce_calcs.c b/drivers/gpu/drm/amd/display/dc/basics/dce_calcs.c
index f2dfa96f9ef5..39530b2ea495 100644
--- a/drivers/gpu/drm/amd/display/dc/basics/dce_calcs.c
+++ b/drivers/gpu/drm/amd/display/dc/basics/dce_calcs.c
@@ -94,7 +94,7 @@ static void calculate_bandwidth(
 	const uint32_t s_high = 7;
 	const uint32_t dmif_chunk_buff_margin = 1;

-	uint32_t max_chunks_fbc_mode;
+	uint32_t max_chunks_fbc_mode = 0;
 	int32_t num_cursor_lines;

 	int32_t i, j, k;
-- 
2.34.1
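The control-flow problem can be reproduced in a few lines. The helper below is a hypothetical, heavily reduced model of the loop in calculate_bandwidth(); `max_chunks_for()` and its parameters are illustrative names, not part of the driver.

```c
#include <stdint.h>

/*
 * Hypothetical reduction of the pattern: the variable is only assigned
 * inside the loop when some fbc_en[i] == 1, so it needs a default for
 * the path where no entry matches.
 */
static uint32_t max_chunks_for(const int *fbc_en, int count, uint32_t margin)
{
	uint32_t max_chunks_fbc_mode = 0; /* default covers the "no match" path */
	int i;

	for (i = 0; i < count; i++) {
		if (fbc_en[i] == 1)
			max_chunks_fbc_mode = 128 - margin;
	}

	/* Without the initializer, this read would be undefined when nothing matched. */
	return max_chunks_fbc_mode;
}
```

Whether 0 is the right default for the real bandwidth calculation is a question for the driver maintainers; the sketch only shows that every path now reads a defined value.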
[PATCH] drm/amd/display: Fix possible buffer overflow in 'find_dcfclk_for_voltage()'
The 'find_dcfclk_for_voltage()' function loops over
VG_NUM_SOC_VOLTAGE_LEVELS (which is 8), but the size of the DcfClocks
array is VG_NUM_DCFCLK_DPM_LEVELS (which is 7).

When the loop variable i reaches 7, the function tries to access
clock_table->DcfClocks[7]. However, since the size of the DcfClocks
array is 7, the valid indices are 0 to 6. Index 7 is beyond the size of
the array, leading to a buffer overflow.

Fixes the below:

drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn301/vg_clk_mgr.c:550 find_dcfclk_for_voltage() error: buffer overflow 'clock_table->DcfClocks' 7 <= 7

Fixes: 3a83e4e64bb1 ("drm/amd/display: Add dcn3.01 support to DC (v2)")
Cc: Roman Li
Cc: Rodrigo Siqueira
Cc: Aurabindo Pillai
Signed-off-by: Srinivasan Shanmugam
---
 drivers/gpu/drm/amd/display/dc/clk_mgr/dcn301/vg_clk_mgr.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn301/vg_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn301/vg_clk_mgr.c
index a5489fe6875f..aa9fd1dc550a 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn301/vg_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn301/vg_clk_mgr.c
@@ -546,6 +546,8 @@ static unsigned int find_dcfclk_for_voltage(const struct vg_dpm_clocks *clock_ta
 	int i;

 	for (i = 0; i < VG_NUM_SOC_VOLTAGE_LEVELS; i++) {
+		if (i >= VG_NUM_DCFCLK_DPM_LEVELS)
+			break;
 		if (clock_table->SocVoltage[i] == voltage)
 			return clock_table->DcfClocks[i];
 	}
-- 
2.34.1
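The mismatch between the two array sizes can be modeled outside the driver. In this sketch the macro names, `struct dpm_clocks` and `find_dcfclk()` are hypothetical simplifications of the real definitions, and the fallback return value of 0 is an assumption.

```c
/* Hypothetical stand-ins for VG_NUM_SOC_VOLTAGE_LEVELS / VG_NUM_DCFCLK_DPM_LEVELS. */
#define NUM_SOC_VOLTAGE_LEVELS 8
#define NUM_DCFCLK_DPM_LEVELS 7

/* Hypothetical, reduced stand-in for struct vg_dpm_clocks. */
struct dpm_clocks {
	unsigned int DcfClocks[NUM_DCFCLK_DPM_LEVELS];
	unsigned int SocVoltage[NUM_SOC_VOLTAGE_LEVELS];
};

static unsigned int find_dcfclk(const struct dpm_clocks *table,
				unsigned int voltage)
{
	int i;

	for (i = 0; i < NUM_SOC_VOLTAGE_LEVELS; i++) {
		/* Stop before indexing past the shorter DcfClocks array. */
		if (i >= NUM_DCFCLK_DPM_LEVELS)
			break;
		if (table->SocVoltage[i] == voltage)
			return table->DcfClocks[i];
	}

	return 0; /* assumed "not found" value for this sketch */
}
```

With the guard in place, a voltage that only matches SocVoltage[7] falls through to the not-found path instead of reading DcfClocks[7].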
[PATCH] drm/amd/display: Fix possible NULL dereference on device remove/driver unload
amdgpu_dm_fini() is a cleanup function that is typically called when a
device is being shut down or a driver is being unloaded.

The error message below suggests that there is a potential NULL pointer
dereference of adev->dm.dc. In the following line, adev->dm.dc is used
without a preceding NULL check:

for (i = 0; i < adev->dm.dc->caps.max_links; i++) {

To fix this issue, add a NULL check for adev->dm.dc before this line.

Reported by smatch:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:1959 amdgpu_dm_fini() error: we previously assumed 'adev->dm.dc' could be null (see line 1943)

Fixes: 006c26a0f1c8 ("drm/amd/display: Fix crash on device remove/driver unload")
Cc: Andrey Grodzovsky
Cc: Harry Wentland
Cc: Rodrigo Siqueira
Cc: Aurabindo Pillai
Signed-off-by: Srinivasan Shanmugam
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index b3a5e730be24..d4c1415f4562 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1956,7 +1956,7 @@ static void amdgpu_dm_fini(struct amdgpu_device *adev)
 				      &adev->dm.dmub_bo_gpu_addr,
 				      &adev->dm.dmub_bo_cpu_addr);

-	if (adev->dm.hpd_rx_offload_wq) {
+	if (adev->dm.hpd_rx_offload_wq && adev->dm.dc) {
 		for (i = 0; i < adev->dm.dc->caps.max_links; i++) {
 			if (adev->dm.hpd_rx_offload_wq[i].wq) {
 				destroy_workqueue(adev->dm.hpd_rx_offload_wq[i].wq);
-- 
2.34.1
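The guard added by this patch can be illustrated with a reduced model. All types and the function `teardown_offload_queues()` below are hypothetical; an `active` flag stands in for the real workqueue pointer and destroy_workqueue() call.

```c
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the dc and display-manager structures. */
struct caps { int max_links; };
struct dc { struct caps caps; };
struct wq_slot { int active; };

struct dm {
	struct dc *dc;                     /* may be NULL on a partial init */
	struct wq_slot *hpd_rx_offload_wq; /* may be NULL as well */
};

/* Returns how many slots were torn down; 0 if either pointer is missing. */
static int teardown_offload_queues(struct dm *dm)
{
	int i, destroyed = 0;

	/* Check both pointers before dm->dc is dereferenced for the loop bound. */
	if (dm->hpd_rx_offload_wq && dm->dc) {
		for (i = 0; i < dm->dc->caps.max_links; i++) {
			if (dm->hpd_rx_offload_wq[i].active) {
				dm->hpd_rx_offload_wq[i].active = 0;
				destroyed++;
			}
		}
	}

	return destroyed;
}
```

One consequence of this shape, visible in the sketch, is that the queues are simply skipped when dc is NULL; whether they still need freeing on that path is outside what the patch description covers.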
[PATCH] drm/amd/display: Initialize 'wait_time_microsec' variable in link_dp_training_dpia.c
wait_time_microsec = max(wait_time_microsec, (uint32_t) DPIA_CLK_SYNC_DELAY);

The line above assigns the maximum of 'wait_time_microsec' and
'DPIA_CLK_SYNC_DELAY' to 'wait_time_microsec'. However,
'wait_time_microsec' has not been assigned a value before this line, so
initialize it at the point of declaration.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_training_dpia.c:697 dpia_training_eq_non_transparent() error: uninitialized symbol 'wait_time_microsec'.

Fixes: 630168a97314 ("drm/amd/display: move dp link training logic to link_dp_training")
Cc: Wenjing Liu
Cc: Rodrigo Siqueira
Cc: Aurabindo Pillai
Signed-off-by: Srinivasan Shanmugam
---
 .../drm/amd/display/dc/link/protocols/link_dp_training_dpia.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training_dpia.c b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training_dpia.c
index e8dda44b23cb..5d36bab0029c 100644
--- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training_dpia.c
+++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training_dpia.c
@@ -619,7 +619,7 @@ static enum link_training_result dpia_training_eq_non_transparent(
 	uint32_t retries_eq = 0;
 	enum dc_status status;
 	enum dc_dp_training_pattern tr_pattern;
-	uint32_t wait_time_microsec;
+	uint32_t wait_time_microsec = 0;
 	enum dc_lane_count lane_count = lt_settings->link_settings.lane_count;
 	union lane_align_status_updated dpcd_lane_status_updated = {0};
 	union lane_status dpcd_lane_status[LANE_COUNT_DP_MAX] = {0};
-- 
2.34.1
RE: [PATCH] drm/amdgpu/soc21: update VCN 4 max HEVC encoding resolution
[AMD Official Use Only - General]

> -----Original Message-----
> From: amd-gfx On Behalf Of Thong
> Sent: Tuesday, February 6, 2024 6:28 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Thai, Thong
> Subject: [PATCH] drm/amdgpu/soc21: update VCN 4 max HEVC encoding resolution
>
> Update the maximum resolution reported for HEVC encoding on VCN 4 devices
> to reflect its 8K encoding capability.
>

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3159

With that added,
Acked-by: Alex Deucher

> Signed-off-by: Thong
> ---
>  drivers/gpu/drm/amd/amdgpu/soc21.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
> index 48c6efcdeac9..4d7188912edf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
> @@ -50,13 +50,13 @@ static const struct amd_ip_funcs soc21_common_ip_funcs;
>  /* SOC21 */
>  static const struct amdgpu_video_codec_info vcn_4_0_0_video_codecs_encode_array_vcn0[] = {
>  	{codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4_AVC, 4096, 2304, 0)},
> -	{codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 4096, 2304, 0)},
> +	{codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 8192, 4352, 0)},
>  	{codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_AV1, 8192, 4352, 0)},
>  };
>
>  static const struct amdgpu_video_codec_info vcn_4_0_0_video_codecs_encode_array_vcn1[] = {
>  	{codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4_AVC, 4096, 2304, 0)},
> -	{codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 4096, 2304, 0)},
> +	{codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 8192, 4352, 0)},
>  };
>
>  static const struct amdgpu_video_codecs vcn_4_0_0_video_codecs_encode_vcn0 = {
> --
> 2.34.1
RE: [PATCH] drm/amdgpu: Fix HDP flush for VFs on nbio v7.9
[AMD Official Use Only - General]

Reviewed-by: Hawking Zhang

Regards,
Hawking

-----Original Message-----
From: Lazar, Lijo
Sent: Wednesday, February 7, 2024 10:22
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking; Deucher, Alexander; Ming, Davis; Kamal, Asad; Ma, Le
Subject: [PATCH] drm/amdgpu: Fix HDP flush for VFs on nbio v7.9

HDP flush remapping is not done for VFs. Keep the original offsets in VF environment.

Signed-off-by: Lijo Lazar
---
 drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
index e90f33780803..b4723d68eab0 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
@@ -431,6 +431,12 @@ static void nbio_v7_9_init_registers(struct amdgpu_device *adev)
 	u32 inst_mask;
 	int i;

+	if (amdgpu_sriov_vf(adev))
+		adev->rmmio_remap.reg_offset =
+			SOC15_REG_OFFSET(
+				NBIO, 0,
+				regBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL)
+			<< 2;
 	WREG32_SOC15(NBIO, 0, regXCC_DOORBELL_FENCE,
 		0xff & ~(adev->gfx.xcc_mask));

-- 
2.25.1
[PATCH] drm/amdgpu: Fix HDP flush for VFs on nbio v7.9
HDP flush remapping is not done for VFs. Keep the original offsets in VF environment.

Signed-off-by: Lijo Lazar
---
 drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
index e90f33780803..b4723d68eab0 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
@@ -431,6 +431,12 @@ static void nbio_v7_9_init_registers(struct amdgpu_device *adev)
 	u32 inst_mask;
 	int i;

+	if (amdgpu_sriov_vf(adev))
+		adev->rmmio_remap.reg_offset =
+			SOC15_REG_OFFSET(
+				NBIO, 0,
+				regBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL)
+			<< 2;
 	WREG32_SOC15(NBIO, 0, regXCC_DOORBELL_FENCE,
 		0xff & ~(adev->gfx.xcc_mask));

-- 
2.25.1
[PATCH] drm/amdgpu/soc21: update VCN 4 max HEVC encoding resolution
Update the maximum resolution reported for HEVC encoding on VCN 4 devices
to reflect its 8K encoding capability.

Signed-off-by: Thong
---
 drivers/gpu/drm/amd/amdgpu/soc21.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
index 48c6efcdeac9..4d7188912edf 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
@@ -50,13 +50,13 @@ static const struct amd_ip_funcs soc21_common_ip_funcs;
 /* SOC21 */
 static const struct amdgpu_video_codec_info vcn_4_0_0_video_codecs_encode_array_vcn0[] = {
 	{codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4_AVC, 4096, 2304, 0)},
-	{codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 4096, 2304, 0)},
+	{codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 8192, 4352, 0)},
 	{codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_AV1, 8192, 4352, 0)},
 };

 static const struct amdgpu_video_codec_info vcn_4_0_0_video_codecs_encode_array_vcn1[] = {
 	{codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4_AVC, 4096, 2304, 0)},
-	{codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 4096, 2304, 0)},
+	{codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 8192, 4352, 0)},
 };

 static const struct amdgpu_video_codecs vcn_4_0_0_video_codecs_encode_vcn0 = {
-- 
2.34.1
Re: [PATCH] drm/amd: Set s0i3/s3 in prepare() callback instead of suspend() callback
On 2/6/2024 16:00, Deucher, Alexander wrote:
> [AMD Official Use Only - General]
>
> > -----Original Message-----
> > From: amd-gfx On Behalf Of Mario Limonciello
> > Sent: Tuesday, February 6, 2024 4:32 PM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Limonciello, Mario; Jürg Billeter
> > Subject: [PATCH] drm/amd: Set s0i3/s3 in prepare() callback instead of suspend() callback
> >
> > commit 5095d5418193 ("drm/amd: Evict resources during PM ops prepare()
> > callback") intentionally moved the eviction of resources to earlier in the
> > suspend process, but this introduced a subtle change that it occurs before
> > adev->in_s0ix or adev->in_s3 are set. This meant that APUs actually started
> > to evict resources at suspend time as well.
> >
> > Move the s0i3/s3 setting flags into prepare() to ensure that they're set
> > during eviction. Drop the existing call to return 1 in this case because
> > the suspend() callback looks for the flags too.
> >
> > Reported-by: Jürg Billeter
> > Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132#note_2271038
> > Fixes: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback")
> > Signed-off-by: Mario Limonciello
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 14 ++++----------
> >  1 file changed, 4 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index b74f68a15802..190b2ee9e36b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -2464,12 +2464,10 @@ static int amdgpu_pmops_prepare(struct device *dev)
> >  	    pm_runtime_suspended(dev))
> >  		return 1;
> >
> > -	/* if we will not support s3 or s2i for the device
> > -	 * then skip suspend
> > -	 */
> > -	if (!amdgpu_acpi_is_s0ix_active(adev) &&
> > -	    !amdgpu_acpi_is_s3_active(adev))
> > -		return 1;
> > +	if (amdgpu_acpi_is_s0ix_active(adev))
> > +		adev->in_s0ix = true;
> > +	else if (amdgpu_acpi_is_s3_active(adev))
> > +		adev->in_s3 = true;
>
> Will resume always get called to clear these after prepare? Will these
> ever get set and then not unset?

You're right; it doesn't clean up.

This is the call sequence:

suspend_devices_and_enter()
->dpm_suspend_start()
->->device_prepare()
->->->dpm_prepare()

Errors bubble up. In suspend_devices_and_enter(), errors go to the
Recover_platform label, which calls platform_recover().
platform_recover() is for platform recovery, not device recovery.

So this patch is incorrect. Let me see if I can come up with another way
to do this without having to revert 5095d5418193.

> Alex
>
> >
> >  	return amdgpu_device_prepare(drm_dev);
> >  }
> > @@ -2484,10 +2482,6 @@ static int amdgpu_pmops_suspend(struct device *dev)
> >  	struct drm_device *drm_dev = dev_get_drvdata(dev);
> >  	struct amdgpu_device *adev = drm_to_adev(drm_dev);
> >
> > -	if (amdgpu_acpi_is_s0ix_active(adev))
> > -		adev->in_s0ix = true;
> > -	else if (amdgpu_acpi_is_s3_active(adev))
> > -		adev->in_s3 = true;
> >  	if (!adev->in_s0ix && !adev->in_s3)
> >  		return 0;
> >  	return amdgpu_device_suspend(drm_dev, true);
> > --
> > 2.34.1
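The concern raised in this exchange (a flag set in prepare() that nothing clears if the suspend attempt is aborted) can be shown with a toy model. This is not the kernel's PM core: `struct toy_dev`, `toy_prepare()`, `toy_resume()` and `flag_stale_after_abort()` are all hypothetical names, and the abort path is reduced to a boolean.

```c
#include <stdbool.h>

/* Hypothetical device with a single "in suspend" flag, like adev->in_s3. */
struct toy_dev {
	bool in_s3;
};

/* Models a prepare() callback that sets the flag early. */
static int toy_prepare(struct toy_dev *dev)
{
	dev->in_s3 = true;
	return 0;
}

/* Models the resume path, the only place where the flag is cleared. */
static void toy_resume(struct toy_dev *dev)
{
	dev->in_s3 = false;
}

/*
 * Returns whether the flag is still set after a suspend attempt that is
 * aborted right after prepare(). The parameter models whether the abort
 * path ever reaches the device's resume callback.
 */
static bool flag_stale_after_abort(bool resume_runs_on_error)
{
	struct toy_dev dev = { false };

	toy_prepare(&dev);

	/* Something else fails here and the suspend sequence is abandoned. */
	if (resume_runs_on_error)
		toy_resume(&dev);

	return dev.in_s3;
}
```

In the model, the flag stays set whenever the abort path skips the device-level cleanup, which is the situation the reply describes for platform_recover().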
RE: [PATCH] drm/amd: Set s0i3/s3 in prepare() callback instead of suspend() callback
[AMD Official Use Only - General]

> -----Original Message-----
> From: amd-gfx On Behalf Of Mario Limonciello
> Sent: Tuesday, February 6, 2024 4:32 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Limonciello, Mario; Jürg Billeter
> Subject: [PATCH] drm/amd: Set s0i3/s3 in prepare() callback instead of suspend() callback
>
> commit 5095d5418193 ("drm/amd: Evict resources during PM ops prepare()
> callback") intentionally moved the eviction of resources to earlier in the
> suspend process, but this introduced a subtle change that it occurs before
> adev->in_s0ix or adev->in_s3 are set. This meant that APUs actually started
> to evict resources at suspend time as well.
>
> Move the s0i3/s3 setting flags into prepare() to ensure that they're set
> during eviction. Drop the existing call to return 1 in this case because
> the suspend() callback looks for the flags too.
>
> Reported-by: Jürg Billeter
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132#note_2271038
> Fixes: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback")
> Signed-off-by: Mario Limonciello
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 14 ++++----------
>  1 file changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index b74f68a15802..190b2ee9e36b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2464,12 +2464,10 @@ static int amdgpu_pmops_prepare(struct device *dev)
>  	    pm_runtime_suspended(dev))
>  		return 1;
>
> -	/* if we will not support s3 or s2i for the device
> -	 * then skip suspend
> -	 */
> -	if (!amdgpu_acpi_is_s0ix_active(adev) &&
> -	    !amdgpu_acpi_is_s3_active(adev))
> -		return 1;
> +	if (amdgpu_acpi_is_s0ix_active(adev))
> +		adev->in_s0ix = true;
> +	else if (amdgpu_acpi_is_s3_active(adev))
> +		adev->in_s3 = true;

Will resume always get called to clear these after prepare? Will these
ever get set and then not unset?

Alex

>
>  	return amdgpu_device_prepare(drm_dev);
>  }
> @@ -2484,10 +2482,6 @@ static int amdgpu_pmops_suspend(struct device *dev)
>  	struct drm_device *drm_dev = dev_get_drvdata(dev);
>  	struct amdgpu_device *adev = drm_to_adev(drm_dev);
>
> -	if (amdgpu_acpi_is_s0ix_active(adev))
> -		adev->in_s0ix = true;
> -	else if (amdgpu_acpi_is_s3_active(adev))
> -		adev->in_s3 = true;
>  	if (!adev->in_s0ix && !adev->in_s3)
>  		return 0;
>  	return amdgpu_device_suspend(drm_dev, true);
> --
> 2.34.1
[PATCH] drm/amd: Set s0i3/s3 in prepare() callback instead of suspend() callback
commit 5095d5418193 ("drm/amd: Evict resources during PM ops prepare()
callback") intentionally moved the eviction of resources to earlier in the
suspend process, but this introduced a subtle change that it occurs before
adev->in_s0ix or adev->in_s3 are set. This meant that APUs actually started
to evict resources at suspend time as well.

Move the s0i3/s3 setting flags into prepare() to ensure that they're set
during eviction. Drop the existing call to return 1 in this case because
the suspend() callback looks for the flags too.

Reported-by: Jürg Billeter
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132#note_2271038
Fixes: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback")
Signed-off-by: Mario Limonciello
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index b74f68a15802..190b2ee9e36b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2464,12 +2464,10 @@ static int amdgpu_pmops_prepare(struct device *dev)
 	    pm_runtime_suspended(dev))
 		return 1;

-	/* if we will not support s3 or s2i for the device
-	 * then skip suspend
-	 */
-	if (!amdgpu_acpi_is_s0ix_active(adev) &&
-	    !amdgpu_acpi_is_s3_active(adev))
-		return 1;
+	if (amdgpu_acpi_is_s0ix_active(adev))
+		adev->in_s0ix = true;
+	else if (amdgpu_acpi_is_s3_active(adev))
+		adev->in_s3 = true;

 	return amdgpu_device_prepare(drm_dev);
 }
@@ -2484,10 +2482,6 @@ static int amdgpu_pmops_suspend(struct device *dev)
 	struct drm_device *drm_dev = dev_get_drvdata(dev);
 	struct amdgpu_device *adev = drm_to_adev(drm_dev);

-	if (amdgpu_acpi_is_s0ix_active(adev))
-		adev->in_s0ix = true;
-	else if (amdgpu_acpi_is_s3_active(adev))
-		adev->in_s3 = true;
 	if (!adev->in_s0ix && !adev->in_s3)
 		return 0;
 	return amdgpu_device_suspend(drm_dev, true);
-- 
2.34.1
RE: [PATCH] drm/amdkfd: Initialize kfd_gpu_cache_info for KFD topology
[AMD Official Use Only - General]

> -----Original Message-----
> From: Kuehling, Felix
> Sent: Tuesday, February 6, 2024 4:15 PM
> To: Greathouse, Joseph; amd-g...@lists.freedesktop.org; Deucher, Alexander
> Subject: Re: [PATCH] drm/amdkfd: Initialize kfd_gpu_cache_info for KFD topology
>
>
> On 2024-02-06 15:55, Joseph Greathouse wrote:
> > The current kfd_gpu_cache_info structure is only partially filled in
> > for some architectures. This means that for devices where we do not
> > fill in some fields, we can return uninitialized values through the
> > KFD topology.
> > Zero out the kfd_gpu_cache_info before asking the remaining fields to
> > be filled in by lower-level functions.
> >
> > Signed-off-by: Joseph Greathouse
>
> This fixes your previous patch "drm/amdkfd: Add cache line sizes to KFD
> topology". Alex, I think the previous patch hasn't gone upstream yet. Do you
> want a Fixes: tag or is it possible to squash this with Joe's previous patch
> before upstreaming?

Either way. I can fix up the tag when we upstream or squash it.

Alex

>
> One nit-pick below.
>
>
> > ---
> >  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > index 3df2a8ad86fb..67c1e7f84750 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > @@ -1707,6 +1707,7 @@ static void kfd_fill_cache_non_crat_info(struct kfd_topology_device *dev, struct
> >
> >  	gpu_processor_id = dev->node_props.simd_id_base;
> >
> > +	memset(cache_info, 0, sizeof(struct kfd_gpu_cache_info) * KFD_MAX_CACHE_TYPES);
>
> Just use sizeof(cache_info). No need to calculate the size of the array and
> risk getting it wrong.
>
> Regards,
>    Felix
>
>
> >  	pcache_info = cache_info;
> >  	num_of_cache_types = kfd_get_gpu_cache_info(kdev, &pcache_info);
> >  	if (!num_of_cache_types) {
RE: [PATCH v2] drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3
[AMD Official Use Only - General]

Looks fine by me

Regards,
Ramesh

-----Original Message-----
From: amd-gfx On Behalf Of Kent Russell
Sent: Wednesday, February 7, 2024 3:02 AM
To: amd-gfx@lists.freedesktop.org
Cc: Joshi, Mukul; Russell, Kent
Subject: [PATCH v2] drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3

It's currently incorrectly multiplied by the number of XCCs in the partition.

Fixes: 6b537864925e ("drm/amdkfd: Update cache info for GFX 9.4.3")
Signed-off-by: Kent Russell
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 3df2a8ad86fb..533b8292b136 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1640,12 +1640,10 @@ static int fill_in_l2_l3_pcache(struct kfd_cache_properties **props_ext,
 		else
 			mode = UNKNOWN_MEMORY_PARTITION_MODE;

-		if (pcache->cache_level == 2)
-			pcache->cache_size = pcache_info[cache_type].cache_size * num_xcc;
-		else if (mode)
-			pcache->cache_size = pcache_info[cache_type].cache_size / mode;
-		else
-			pcache->cache_size = pcache_info[cache_type].cache_size;
+		pcache->cache_size = pcache_info[cache_type].cache_size;
+		/* Partition mode only affects L3 cache size */
+		if (mode && pcache->cache_level == 3)
+			pcache->cache_size /= mode;

 		if (pcache_info[cache_type].flags & CRAT_CACHE_FLAGS_DATA_CACHE)
 			pcache->cache_type |= HSA_CACHE_TYPE_DATA;
-- 
2.34.1
[PATCH v2] drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3
It's currently incorrectly multiplied by the number of XCCs in the partition.

Fixes: 6b537864925e ("drm/amdkfd: Update cache info for GFX 9.4.3")
Signed-off-by: Kent Russell
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 3df2a8ad86fb..533b8292b136 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1640,12 +1640,10 @@ static int fill_in_l2_l3_pcache(struct kfd_cache_properties **props_ext,
 		else
 			mode = UNKNOWN_MEMORY_PARTITION_MODE;

-		if (pcache->cache_level == 2)
-			pcache->cache_size = pcache_info[cache_type].cache_size * num_xcc;
-		else if (mode)
-			pcache->cache_size = pcache_info[cache_type].cache_size / mode;
-		else
-			pcache->cache_size = pcache_info[cache_type].cache_size;
+		pcache->cache_size = pcache_info[cache_type].cache_size;
+		/* Partition mode only affects L3 cache size */
+		if (mode && pcache->cache_level == 3)
+			pcache->cache_size /= mode;

 		if (pcache_info[cache_type].flags & CRAT_CACHE_FLAGS_DATA_CACHE)
 			pcache->cache_type |= HSA_CACHE_TYPE_DATA;
-- 
2.34.1
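The size rule that v2 of this patch ends up with can be restated as a small pure function. `reported_cache_size()` and its parameter names are hypothetical; the sizes used when calling it are arbitrary illustration values, not real GFX 9.4.3 cache sizes.

```c
/*
 * Hypothetical restatement of the v2 logic: start from the raw size and
 * divide by the memory partition mode only for L3. A mode of 0 stands in
 * for the "unknown partition mode" case and leaves the size untouched.
 */
static unsigned int reported_cache_size(unsigned int raw_size,
					int cache_level,
					unsigned int mode)
{
	unsigned int size = raw_size;

	/* Partition mode only affects L3 cache size. */
	if (mode && cache_level == 3)
		size /= mode;

	return size;
}
```

Compared with the removed code, L2 is neither multiplied by the XCC count nor divided by the partition mode; it is reported as-is.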
Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)
[AMD Official Use Only - General]

The firmware has not been released yet. It's still undergoing regression testing.

Alex

From: Shengyu Qu
Sent: Tuesday, February 6, 2024 5:08 AM
To: Deucher, Alexander; Kuehling, Felix; amd-gfx@lists.freedesktop.org
Cc: wiagn...@outlook.com; Cornwall, Jay; Koenig, Christian; Paneer Selvam, Arunpravin
Subject: Re: drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole (v2)

Hi Alexander,

On 2024/2/6 1:12, Deucher, Alexander wrote:
> Are you only seeing the problem with this patch applied or in general?
> If you are seeing it in general, it is likely related to a firmware issue
> that was recently fixed and that will be resolved with an updated CP
> firmware image.
>
> Driver side changes:
> https://gitlab.freedesktop.org/agd5f/linux/-/commit/0eb6c664b780dd1b4080e047ad51b100cd7840a3
> https://gitlab.freedesktop.org/agd5f/linux/-/commit/40970e60070ed3d1390ec65e38e819f6d81b8f0c
>
> Alex

This problem is not affected by this patch, so it is possibly the firmware
issue. Where can I get the newest firmware image? Or has it already been
pushed to the linux-firmware repo?

Best regards,
Shengyu
RE: [PATCH] drm/amdkfd: Don't divide L2 cache by partition mode
[AMD Official Use Only - General]

Oh excellent, it didn't get merged in yet. Time to squash!

Kent

> -----Original Message-----
> From: Kuehling, Felix
> Sent: Tuesday, February 6, 2024 4:29 PM
> To: Russell, Kent; amd-gfx@lists.freedesktop.org
> Cc: Joshi, Mukul
> Subject: Re: [PATCH] drm/amdkfd: Don't divide L2 cache by partition mode
>
>
> On 2024-02-06 16:24, Kent Russell wrote:
> > Partition mode only affects L3 cache size. After removing the L2 check in
> > the previous patch, make sure we aren't dividing all cache sizes by
> > partition mode, just L3.
> >
> > Fixes: a75bfb3c4045 ("drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3")
>
> The fixes tag looks wrong. I can't find the commit a75bfb3c4045
> anywhere. Did your previous patch actually make it into the branch yet?
> Maybe you can still abandon it in Gerrit.
>
> Regards,
>    Felix
>
>
> > Signed-off-by: Kent Russell
> > ---
> >  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > index 64bf2a56f010..533b8292b136 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > @@ -1640,10 +1640,10 @@ static int fill_in_l2_l3_pcache(struct kfd_cache_properties **props_ext,
> >  		else
> >  			mode = UNKNOWN_MEMORY_PARTITION_MODE;
> >
> > -		if (mode)
> > -			pcache->cache_size = pcache_info[cache_type].cache_size / mode;
> > -		else
> > -			pcache->cache_size = pcache_info[cache_type].cache_size;
> > +		pcache->cache_size = pcache_info[cache_type].cache_size;
> > +		/* Partition mode only affects L3 cache size */
> > +		if (mode && pcache->cache_level == 3)
> > +			pcache->cache_size /= mode;
> >
> >  		if (pcache_info[cache_type].flags & CRAT_CACHE_FLAGS_DATA_CACHE)
> >  			pcache->cache_type |= HSA_CACHE_TYPE_DATA;
Re: [PATCH] drm/amdkfd: Don't divide L2 cache by partition mode
On 2024-02-06 16:24, Kent Russell wrote:
> Partition mode only affects L3 cache size. After removing the L2 check in
> the previous patch, make sure we aren't dividing all cache sizes by
> partition mode, just L3.
>
> Fixes: a75bfb3c4045 ("drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3")

The fixes tag looks wrong. I can't find the commit a75bfb3c4045
anywhere. Did your previous patch actually make it into the branch yet?
Maybe you can still abandon it in Gerrit.

Regards,
  Felix

> Signed-off-by: Kent Russell
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 64bf2a56f010..533b8292b136 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -1640,10 +1640,10 @@ static int fill_in_l2_l3_pcache(struct kfd_cache_properties **props_ext,
>  		else
>  			mode = UNKNOWN_MEMORY_PARTITION_MODE;
>
> -		if (mode)
> -			pcache->cache_size = pcache_info[cache_type].cache_size / mode;
> -		else
> -			pcache->cache_size = pcache_info[cache_type].cache_size;
> +		pcache->cache_size = pcache_info[cache_type].cache_size;
> +		/* Partition mode only affects L3 cache size */
> +		if (mode && pcache->cache_level == 3)
> +			pcache->cache_size /= mode;
>
>  		if (pcache_info[cache_type].flags & CRAT_CACHE_FLAGS_DATA_CACHE)
>  			pcache->cache_type |= HSA_CACHE_TYPE_DATA;
[PATCH] drm/amdkfd: Don't divide L2 cache by partition mode
Partition mode only affects L3 cache size. After removing the L2 check in
the previous patch, make sure we aren't dividing all cache sizes by
partition mode, just L3.

Fixes: a75bfb3c4045 ("drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3")
Signed-off-by: Kent Russell
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 64bf2a56f010..533b8292b136 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1640,10 +1640,10 @@ static int fill_in_l2_l3_pcache(struct kfd_cache_properties **props_ext,
 		else
 			mode = UNKNOWN_MEMORY_PARTITION_MODE;

-		if (mode)
-			pcache->cache_size = pcache_info[cache_type].cache_size / mode;
-		else
-			pcache->cache_size = pcache_info[cache_type].cache_size;
+		pcache->cache_size = pcache_info[cache_type].cache_size;
+		/* Partition mode only affects L3 cache size */
+		if (mode && pcache->cache_level == 3)
+			pcache->cache_size /= mode;

 		if (pcache_info[cache_type].flags & CRAT_CACHE_FLAGS_DATA_CACHE)
 			pcache->cache_type |= HSA_CACHE_TYPE_DATA;
-- 
2.34.1
Re: [PATCH] drm/amdkfd: Initialize kfd_gpu_cache_info for KFD topology
On 2024-02-06 15:55, Joseph Greathouse wrote:
> The current kfd_gpu_cache_info structure is only partially
> filled in for some architectures. This means that for devices
> where we do not fill in some fields, we can return
> uninitialized values through the KFD topology.
> Zero out the kfd_gpu_cache_info before asking the remaining
> fields to be filled in by lower-level functions.
>
> Signed-off-by: Joseph Greathouse

This fixes your previous patch "drm/amdkfd: Add cache line sizes to KFD topology". Alex, I think the previous patch hasn't gone upstream yet. Do you want a Fixes: tag or is it possible to squash this with Joe's previous patch before upstreaming?

One nit-pick below.

> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 3df2a8ad86fb..67c1e7f84750 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -1707,6 +1707,7 @@ static void kfd_fill_cache_non_crat_info(struct kfd_topology_device *dev, struct
>
>  	gpu_processor_id = dev->node_props.simd_id_base;
>
> +	memset(cache_info, 0, sizeof(struct kfd_gpu_cache_info) * KFD_MAX_CACHE_TYPES);

Just use sizeof(cache_info). No need to calculate the size of the array and risk getting it wrong.

Regards,
  Felix

>  	pcache_info = cache_info;
>  	num_of_cache_types = kfd_get_gpu_cache_info(kdev, &pcache_info);
>  	if (!num_of_cache_types) {
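The nit-pick above relies on a plain C property: when the array itself is in scope, sizeof(array) already equals the element size times the element count. A minimal user-space sketch of that point follows. The struct layout and SKETCH_MAX_CACHE_TYPES are stand-ins invented for illustration, not the kernel's definitions, and the equivalence only holds where cache_info is a real array rather than a pointer parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for KFD_MAX_CACHE_TYPES and kfd_gpu_cache_info. */
#define SKETCH_MAX_CACHE_TYPES 6

struct cache_info_sketch {
	unsigned int cache_size;
	unsigned int cache_level;
	unsigned int cache_line_size;
	unsigned int flags;
};

/* Returns 1 when both ways of computing the array size agree. */
static int sizes_agree(void)
{
	struct cache_info_sketch cache_info[SKETCH_MAX_CACHE_TYPES];

	return sizeof(cache_info) ==
	       sizeof(struct cache_info_sketch) * SKETCH_MAX_CACHE_TYPES;
}

/*
 * Fill the array with a poison pattern to mimic uninitialized stack
 * contents, zero it with sizeof(cache_info), and count leftover bytes.
 */
static size_t nonzero_bytes_after_memset(void)
{
	struct cache_info_sketch cache_info[SKETCH_MAX_CACHE_TYPES];
	const unsigned char *bytes = (const unsigned char *)cache_info;
	size_t i, nonzero = 0;

	memset(cache_info, 0xa5, sizeof(cache_info));
	memset(cache_info, 0, sizeof(cache_info));

	for (i = 0; i < sizeof(cache_info); i++)
		if (bytes[i] != 0)
			nonzero++;

	return nonzero;
}
```

The shorter form has one practical advantage: it keeps tracking the array if its element type or length changes later.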
RE: [PATCH] drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3
[AMD Official Use Only - General]

Comments inline.

Regards,
Ramesh

-----Original Message-----
From: amd-gfx On Behalf Of Joshi, Mukul
Sent: Wednesday, February 7, 2024 1:36 AM
To: Russell, Kent; amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH] drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3

[AMD Official Use Only - General]

The commit description needs a Fixes tag of the offending commit. With that fixed, this patch is:
Reviewed-by: Mukul Joshi

> -----Original Message-----
> From: Russell, Kent
> Sent: Tuesday, February 6, 2024 1:06 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Joshi, Mukul; Russell, Kent
> Subject: [PATCH] drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3
>
> It's currently incorrectly multiplied by number of XCCs in the partition
>
> Signed-off-by: Kent Russell
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 3df2a8ad86fb..64bf2a56f010 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -1640,9 +1640,7 @@ static int fill_in_l2_l3_pcache(struct kfd_cache_properties **props_ext,
>  	else
>  		mode = UNKNOWN_MEMORY_PARTITION_MODE;
>
> -	if (pcache->cache_level == 2)
> -		pcache->cache_size = pcache_info[cache_type].cache_size * num_xcc;
> -	else if (mode)
> +	if (mode)
>  		pcache->cache_size = pcache_info[cache_type].cache_size / mode;
>  	else
>  		pcache->cache_size = pcache_info[cache_type].cache_size;

Ramesh: By my reading, cache_size is already correct here and should be around 4 MiB. I don't think "mode" should come into play for it?

> --
> 2.34.1
[PATCH] drm/amdkfd: Initialize kfd_gpu_cache_info for KFD topology
The current kfd_gpu_cache_info structure is only partially
filled in for some architectures. This means that for devices
where we do not fill in some fields, we can return
uninitialized values through the KFD topology.
Zero out the kfd_gpu_cache_info before asking the remaining
fields to be filled in by lower-level functions.

Signed-off-by: Joseph Greathouse
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 3df2a8ad86fb..67c1e7f84750 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1707,6 +1707,7 @@ static void kfd_fill_cache_non_crat_info(struct kfd_topology_device *dev, struct
 
 	gpu_processor_id = dev->node_props.simd_id_base;
 
+	memset(cache_info, 0, sizeof(struct kfd_gpu_cache_info) * KFD_MAX_CACHE_TYPES);
 	pcache_info = cache_info;
 	num_of_cache_types = kfd_get_gpu_cache_info(kdev, &pcache_info);
 	if (!num_of_cache_types) {
-- 
2.20.1
[PATCH 2/3] drm/amdgpu: Add hdp v7_0 ip block support
From: Likun Gao Add hdp v7_0 ip block support. Signed-off-by: Likun Gao Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/Makefile | 2 +- drivers/gpu/drm/amd/amdgpu/hdp_v7_0.c | 142 ++ drivers/gpu/drm/amd/amdgpu/hdp_v7_0.h | 31 ++ 3 files changed, 174 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/hdp_v7_0.c create mode 100644 drivers/gpu/drm/amd/amdgpu/hdp_v7_0.h diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile index 9bc5f3dde442..87022325bbf7 100644 --- a/drivers/gpu/drm/amd/amdgpu/Makefile +++ b/drivers/gpu/drm/amd/amdgpu/Makefile @@ -98,7 +98,7 @@ amdgpu-y += \ vega20_reg_init.o nbio_v7_4.o nbio_v2_3.o nv.o arct_reg_init.o mxgpu_nv.o \ nbio_v7_2.o hdp_v4_0.o hdp_v5_0.o aldebaran_reg_init.o aldebaran.o soc21.o \ sienna_cichlid.o smu_v13_0_10.o nbio_v4_3.o hdp_v6_0.o nbio_v7_7.o hdp_v5_2.o lsdma_v6_0.o \ - nbio_v7_9.o aqua_vanjaram.o nbio_v7_11.o lsdma_v7_0.o + nbio_v7_9.o aqua_vanjaram.o nbio_v7_11.o lsdma_v7_0.o hdp_v7_0.o # add DF block amdgpu-y += \ diff --git a/drivers/gpu/drm/amd/amdgpu/hdp_v7_0.c b/drivers/gpu/drm/amd/amdgpu/hdp_v7_0.c new file mode 100644 index ..8d7d0813e331 --- /dev/null +++ b/drivers/gpu/drm/amd/amdgpu/hdp_v7_0.c @@ -0,0 +1,142 @@ +/* + * Copyright 2023 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + * + */ +#include "amdgpu.h" +#include "amdgpu_atombios.h" +#include "hdp_v7_0.h" + +#include "hdp/hdp_7_0_0_offset.h" +#include "hdp/hdp_7_0_0_sh_mask.h" +#include + +static void hdp_v7_0_flush_hdp(struct amdgpu_device *adev, + struct amdgpu_ring *ring) +{ + if (!ring || !ring->funcs->emit_wreg) + WREG32_NO_KIQ((adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0); + else + amdgpu_ring_emit_wreg(ring, (adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0); +} + +static void hdp_v7_0_update_clock_gating(struct amdgpu_device *adev, +bool enable) +{ + uint32_t hdp_clk_cntl, hdp_clk_cntl1; + uint32_t hdp_mem_pwr_cntl; + + if (!(adev->cg_flags & (AMD_CG_SUPPORT_HDP_LS | + AMD_CG_SUPPORT_HDP_DS | + AMD_CG_SUPPORT_HDP_SD))) + return; + + hdp_clk_cntl = hdp_clk_cntl1 = RREG32_SOC15(HDP, 0,regHDP_CLK_CNTL); + hdp_mem_pwr_cntl = RREG32_SOC15(HDP, 0, regHDP_MEM_POWER_CTRL); + + /* Before doing clock/power mode switch, +* forced on IPH & RC clock */ + hdp_clk_cntl = REG_SET_FIELD(hdp_clk_cntl, HDP_CLK_CNTL, +RC_MEM_CLK_SOFT_OVERRIDE, 1); + WREG32_SOC15(HDP, 0, regHDP_CLK_CNTL, hdp_clk_cntl); + + /* disable clock and power gating before any changing */ + hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl, HDP_MEM_POWER_CTRL, +ATOMIC_MEM_POWER_CTRL_EN, 0); + hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl, HDP_MEM_POWER_CTRL, +ATOMIC_MEM_POWER_LS_EN, 0); + hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl, HDP_MEM_POWER_CTRL, 
+ATOMIC_MEM_POWER_DS_EN, 0); + hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl, HDP_MEM_POWER_CTRL, +ATOMIC_MEM_POWER_SD_EN, 0); + hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl, HDP_MEM_POWER_CTRL, +RC_MEM_POWER_CTRL_EN, 0); + hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl, HDP_MEM_POWER_CTRL, +RC_MEM_POWER_LS_EN, 0); + hdp_mem_pwr_cntl = REG_SET_FIELD(hdp_mem_pwr_cntl, HDP_MEM_POWER_CTRL, +
[PATCH 3/3] drm/amdgpu/discovery: Add hdp v7_0 ip block
From: Likun Gao

Add hdp v7_0 ip block

Signed-off-by: Likun Gao
Reviewed-by: Hawking Zhang
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index c4370f154e8b..59530fe36b6b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -64,6 +64,7 @@
 #include "hdp_v5_0.h"
 #include "hdp_v5_2.h"
 #include "hdp_v6_0.h"
+#include "hdp_v7_0.h"
 #include "nv.h"
 #include "soc21.h"
 #include "navi10_ih.h"
@@ -2569,6 +2570,9 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
 	case IP_VERSION(6, 1, 0):
 		adev->hdp.funcs = &hdp_v6_0_funcs;
 		break;
+	case IP_VERSION(7, 0, 0):
+		adev->hdp.funcs = &hdp_v7_0_funcs;
+		break;
 	default:
 		break;
 	}
-- 
2.42.0
[PATCH 0/3] HDP 7.0 Support
This series adds support for HDP 7.0. HDP (Host Data Path), provides CPU access to device memory via the PCI BAR. Patch 1 adds the register headers and is very large, so I've omitted it. Hawking Zhang (1): drm/amdgpu: Add hdp v7_0_0 ip headers (v3) Likun Gao (2): drm/amdgpu: Add hdp v7_0 ip block support drm/amdgpu/discovery: Add hdp v7_0 ip block drivers/gpu/drm/amd/amdgpu/Makefile | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 4 + drivers/gpu/drm/amd/amdgpu/hdp_v7_0.c | 142 drivers/gpu/drm/amd/amdgpu/hdp_v7_0.h | 31 + .../include/asic_reg/hdp/hdp_7_0_0_offset.h | 219 ++ .../include/asic_reg/hdp/hdp_7_0_0_sh_mask.h | 735 ++ 6 files changed, 1132 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/hdp_v7_0.c create mode 100644 drivers/gpu/drm/amd/amdgpu/hdp_v7_0.h create mode 100644 drivers/gpu/drm/amd/include/asic_reg/hdp/hdp_7_0_0_offset.h create mode 100644 drivers/gpu/drm/amd/include/asic_reg/hdp/hdp_7_0_0_sh_mask.h -- 2.42.0
[PATCH 3/3] drm/amdgpu/discovery: Add ih v7_0 ip block
From: Likun Gao

Add ih v7_0 ip block.

Signed-off-by: Likun Gao
Reviewed-by: Hawking Zhang
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index e941aeb6f16a..c4370f154e8b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -69,6 +69,7 @@
 #include "navi10_ih.h"
 #include "ih_v6_0.h"
 #include "ih_v6_1.h"
+#include "ih_v7_0.h"
 #include "gfx_v10_0.h"
 #include "gfx_v11_0.h"
 #include "sdma_v5_0.h"
@@ -1768,6 +1769,9 @@ static int amdgpu_discovery_set_ih_ip_blocks(struct amdgpu_device *adev)
 	case IP_VERSION(6, 1, 0):
 		amdgpu_device_ip_block_add(adev, &ih_v6_1_ip_block);
 		break;
+	case IP_VERSION(7, 0, 0):
+		amdgpu_device_ip_block_add(adev, &ih_v7_0_ip_block);
+		break;
 	default:
 		dev_err(adev->dev,
 			"Failed to add ih ip block(OSSSYS_HWIP:0x%x)\n",
-- 
2.42.0
[PATCH 2/3] drm/amdgpu: Add ih v7_0 ip block support
From: Likun Gao Add ih v7_0 ip block support. Signed-off-by: Likun Gao Signed-off-by: Hawking Zhang Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/Makefile | 3 +- drivers/gpu/drm/amd/amdgpu/ih_v7_0.c | 766 +++ drivers/gpu/drm/amd/amdgpu/ih_v7_0.h | 28 + 3 files changed, 796 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/ih_v7_0.c create mode 100644 drivers/gpu/drm/amd/amdgpu/ih_v7_0.h diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile index 3f7de16e0dc4..9bc5f3dde442 100644 --- a/drivers/gpu/drm/amd/amdgpu/Makefile +++ b/drivers/gpu/drm/amd/amdgpu/Makefile @@ -132,7 +132,8 @@ amdgpu-y += \ vega20_ih.o \ navi10_ih.o \ ih_v6_0.o \ - ih_v6_1.o + ih_v6_1.o \ + ih_v7_0.o # add PSP block amdgpu-y += \ diff --git a/drivers/gpu/drm/amd/amdgpu/ih_v7_0.c b/drivers/gpu/drm/amd/amdgpu/ih_v7_0.c new file mode 100644 index ..236806797b23 --- /dev/null +++ b/drivers/gpu/drm/amd/amdgpu/ih_v7_0.c @@ -0,0 +1,766 @@ +/* + * Copyright 2023 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + * + */ + +#include + +#include "amdgpu.h" +#include "amdgpu_ih.h" + +#include "oss/osssys_7_0_0_offset.h" +#include "oss/osssys_7_0_0_sh_mask.h" + +#include "soc15_common.h" +#include "ih_v7_0.h" + +#define MAX_REARM_RETRY 10 + +static void ih_v7_0_set_interrupt_funcs(struct amdgpu_device *adev); + +/** + * ih_v7_0_init_register_offset - Initialize register offset for ih rings + * + * @adev: amdgpu_device pointer + * + * Initialize register offset ih rings (IH_V7_0). + */ +static void ih_v7_0_init_register_offset(struct amdgpu_device *adev) +{ + struct amdgpu_ih_regs *ih_regs; + + /* ih ring 2 is removed +* ih ring and ih ring 1 are available */ + if (adev->irq.ih.ring_size) { + ih_regs = >irq.ih.ih_regs; + ih_regs->ih_rb_base = SOC15_REG_OFFSET(OSSSYS, 0, regIH_RB_BASE); + ih_regs->ih_rb_base_hi = SOC15_REG_OFFSET(OSSSYS, 0, regIH_RB_BASE_HI); + ih_regs->ih_rb_cntl = SOC15_REG_OFFSET(OSSSYS, 0, regIH_RB_CNTL); + ih_regs->ih_rb_wptr = SOC15_REG_OFFSET(OSSSYS, 0, regIH_RB_WPTR); + ih_regs->ih_rb_rptr = SOC15_REG_OFFSET(OSSSYS, 0, regIH_RB_RPTR); + ih_regs->ih_doorbell_rptr = SOC15_REG_OFFSET(OSSSYS, 0, regIH_DOORBELL_RPTR); + ih_regs->ih_rb_wptr_addr_lo = SOC15_REG_OFFSET(OSSSYS, 0, regIH_RB_WPTR_ADDR_LO); + ih_regs->ih_rb_wptr_addr_hi = SOC15_REG_OFFSET(OSSSYS, 0, regIH_RB_WPTR_ADDR_HI); + ih_regs->psp_reg_id = PSP_REG_IH_RB_CNTL; + } + + if (adev->irq.ih1.ring_size) { + ih_regs = >irq.ih1.ih_regs; + ih_regs->ih_rb_base = SOC15_REG_OFFSET(OSSSYS, 0, regIH_RB_BASE_RING1); + ih_regs->ih_rb_base_hi = SOC15_REG_OFFSET(OSSSYS, 0, regIH_RB_BASE_HI_RING1); + ih_regs->ih_rb_cntl = SOC15_REG_OFFSET(OSSSYS, 0, regIH_RB_CNTL_RING1); + ih_regs->ih_rb_wptr = SOC15_REG_OFFSET(OSSSYS, 0, 
regIH_RB_WPTR_RING1); + ih_regs->ih_rb_rptr = SOC15_REG_OFFSET(OSSSYS, 0, regIH_RB_RPTR_RING1); + ih_regs->ih_doorbell_rptr = SOC15_REG_OFFSET(OSSSYS, 0, regIH_DOORBELL_RPTR_RING1); + ih_regs->psp_reg_id = PSP_REG_IH_RB_CNTL_RING1; + } +} + +/** + * force_update_wptr_for_self_int - Force update the wptr for self interrupt + * + * @adev: amdgpu_device pointer + * @threshold: threshold to trigger the wptr reporting + * @timeout: timeout to trigger the wptr reporting + * @enabled: Enable/disable timeout flush mechanism + * + * threshold input range: 0 ~ 15, default 0, + * real_threshold = 2^threshold + * timeout input range: 0 ~ 20, default 8, + * real_timeout = (2^timeout) * 1024 / (socclk_freq) + * + * Force
[PATCH 0/3] IH 7.0 support
This series adds support for IH 7.0.x. IH is the interrupt handler on the GPU. Interrupts are written to a ring buffer and the driver walks the ring buffer handling the interrupt packets. Patch 1 adds the new register headers and is very large, so I've omitted it. Hawking Zhang (1): drm/amdgpu: Add osssys v7_0_0 ip headers (v4) Likun Gao (2): drm/amdgpu: Add ih v7_0 ip block support drm/amdgpu/discovery: Add ih v7_0 ip block drivers/gpu/drm/amd/amdgpu/Makefile |3 +- drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c |4 + drivers/gpu/drm/amd/amdgpu/ih_v7_0.c | 766 drivers/gpu/drm/amd/amdgpu/ih_v7_0.h | 28 + .../asic_reg/oss/osssys_7_0_0_offset.h| 279 + .../asic_reg/oss/osssys_7_0_0_sh_mask.h | 1029 + 6 files changed, 2108 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/ih_v7_0.c create mode 100644 drivers/gpu/drm/amd/amdgpu/ih_v7_0.h create mode 100644 drivers/gpu/drm/amd/include/asic_reg/oss/osssys_7_0_0_offset.h create mode 100644 drivers/gpu/drm/amd/include/asic_reg/oss/osssys_7_0_0_sh_mask.h -- 2.42.0
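The ring-buffer walk described in the cover letter can be pictured with a small user-space model. This is a sketch under stated assumptions, not the ih_v7_0 code: the ring size, the one-word "packet" format, and all names (ih_ring_sketch, ring_push, ring_process, demo_walk) are hypothetical, and overflow handling is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RING_ENTRIES 8u /* power of two so the index can be masked */

struct ih_ring_sketch {
	uint32_t entries[SKETCH_RING_ENTRIES];
	uint32_t rptr; /* next entry the consumer (driver) will read */
	uint32_t wptr; /* next entry the producer (hardware) will write */
};

/* Producer side: store one packet and advance the write pointer. */
static void ring_push(struct ih_ring_sketch *r, uint32_t packet)
{
	r->entries[r->wptr & (SKETCH_RING_ENTRIES - 1)] = packet;
	r->wptr++;
}

/*
 * Consumer side: walk from rptr to wptr, "handling" each packet by
 * adding it to *sum, and return how many packets were handled.
 */
static unsigned int ring_process(struct ih_ring_sketch *r, uint32_t *sum)
{
	unsigned int handled = 0;

	while (r->rptr != r->wptr) {
		*sum += r->entries[r->rptr & (SKETCH_RING_ENTRIES - 1)];
		r->rptr++;
		handled++;
	}

	return handled;
}

/* Push three packets, drain the ring, and return the accumulated sum. */
static uint32_t demo_walk(void)
{
	struct ih_ring_sketch ring = { .rptr = 0, .wptr = 0 };
	uint32_t sum = 0;

	ring_push(&ring, 1);
	ring_push(&ring, 2);
	ring_push(&ring, 3);
	ring_process(&ring, &sum);

	return sum;
}
```

The real driver additionally has to deal with ring overflow, doorbells, and multiple rings, none of which this model attempts.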
[PATCH 2/3] drm/amdgpu: Add lsdma v7_0 ip block support
From: Likun Gao Add lsdma v7_0 ip block support. Signed-off-by: Likun Gao Signed-off-by: Hawking Zhang Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/Makefile | 2 +- drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.c | 121 drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.h | 31 ++ 3 files changed, 153 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.c create mode 100644 drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.h diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile index 1b04bae60fbf..3f7de16e0dc4 100644 --- a/drivers/gpu/drm/amd/amdgpu/Makefile +++ b/drivers/gpu/drm/amd/amdgpu/Makefile @@ -98,7 +98,7 @@ amdgpu-y += \ vega20_reg_init.o nbio_v7_4.o nbio_v2_3.o nv.o arct_reg_init.o mxgpu_nv.o \ nbio_v7_2.o hdp_v4_0.o hdp_v5_0.o aldebaran_reg_init.o aldebaran.o soc21.o \ sienna_cichlid.o smu_v13_0_10.o nbio_v4_3.o hdp_v6_0.o nbio_v7_7.o hdp_v5_2.o lsdma_v6_0.o \ - nbio_v7_9.o aqua_vanjaram.o nbio_v7_11.o + nbio_v7_9.o aqua_vanjaram.o nbio_v7_11.o lsdma_v7_0.o # add DF block amdgpu-y += \ diff --git a/drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.c b/drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.c new file mode 100644 index ..396262044ea8 --- /dev/null +++ b/drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.c @@ -0,0 +1,121 @@ +/* + * Copyright 2023 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + * + */ + +#include +#include "amdgpu.h" +#include "lsdma_v7_0.h" +#include "amdgpu_lsdma.h" + +#include "lsdma/lsdma_7_0_0_offset.h" +#include "lsdma/lsdma_7_0_0_sh_mask.h" + +static int lsdma_v7_0_wait_pio_status(struct amdgpu_device *adev) +{ + return amdgpu_lsdma_wait_for(adev, SOC15_REG_OFFSET(LSDMA, 0, regLSDMA_PIO_STATUS), + LSDMA_PIO_STATUS__PIO_IDLE_MASK | LSDMA_PIO_STATUS__PIO_FIFO_EMPTY_MASK, + LSDMA_PIO_STATUS__PIO_IDLE_MASK | LSDMA_PIO_STATUS__PIO_FIFO_EMPTY_MASK); +} + +static int lsdma_v7_0_copy_mem(struct amdgpu_device *adev, + uint64_t src_addr, + uint64_t dst_addr, + uint64_t size) +{ + int ret; + uint32_t tmp; + + WREG32_SOC15(LSDMA, 0, regLSDMA_PIO_SRC_ADDR_LO, lower_32_bits(src_addr)); + WREG32_SOC15(LSDMA, 0, regLSDMA_PIO_SRC_ADDR_HI, upper_32_bits(src_addr)); + + WREG32_SOC15(LSDMA, 0, regLSDMA_PIO_DST_ADDR_LO, lower_32_bits(dst_addr)); + WREG32_SOC15(LSDMA, 0, regLSDMA_PIO_DST_ADDR_HI, upper_32_bits(dst_addr)); + + WREG32_SOC15(LSDMA, 0, regLSDMA_PIO_CONTROL, 0x0); + + tmp = RREG32_SOC15(LSDMA, 0, regLSDMA_PIO_COMMAND); + tmp = REG_SET_FIELD(tmp, LSDMA_PIO_COMMAND, BYTE_COUNT, size); + tmp = REG_SET_FIELD(tmp, LSDMA_PIO_COMMAND, SRC_LOCATION, 0); + tmp = REG_SET_FIELD(tmp, LSDMA_PIO_COMMAND, DST_LOCATION, 0); + tmp = REG_SET_FIELD(tmp, LSDMA_PIO_COMMAND, SRC_ADDR_INC, 0); + tmp = REG_SET_FIELD(tmp, LSDMA_PIO_COMMAND, DST_ADDR_INC, 0); + tmp = REG_SET_FIELD(tmp, LSDMA_PIO_COMMAND, OVERLAP_DISABLE, 0); + tmp = REG_SET_FIELD(tmp, LSDMA_PIO_COMMAND, 
CONSTANT_FILL, 0); + WREG32_SOC15(LSDMA, 0, regLSDMA_PIO_COMMAND, tmp); + + ret = lsdma_v7_0_wait_pio_status(adev); + if (ret) + dev_err(adev->dev, "LSDMA PIO failed to copy memory!\n"); + + return ret; +} + +static int lsdma_v7_0_fill_mem(struct amdgpu_device *adev, + uint64_t dst_addr, + uint32_t data, + uint64_t size) +{ + int ret; + uint32_t tmp; + + WREG32_SOC15(LSDMA, 0, regLSDMA_PIO_CONSTFILL_DATA, data); + + WREG32_SOC15(LSDMA, 0, regLSDMA_PIO_DST_ADDR_LO, lower_32_bits(dst_addr)); + WREG32_SOC15(LSDMA, 0, regLSDMA_PIO_DST_ADDR_HI,
[PATCH 3/3] drm/amdgpu/discovery: Add lsdma v7_0 ip block
From: Likun Gao

Add lsdma v7_0 ip block.

v2: squash in updates (Alex)

Signed-off-by: Likun Gao
Reviewed-by: Hawking Zhang
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 93c84a1c1d3e..e941aeb6f16a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -75,6 +75,7 @@
 #include "sdma_v5_2.h"
 #include "sdma_v6_0.h"
 #include "lsdma_v6_0.h"
+#include "lsdma_v7_0.h"
 #include "vcn_v2_0.h"
 #include "jpeg_v2_0.h"
 #include "vcn_v3_0.h"
@@ -2641,6 +2642,10 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev)
 	case IP_VERSION(6, 0, 3):
 		adev->lsdma.funcs = &lsdma_v6_0_funcs;
 		break;
+	case IP_VERSION(7, 0, 0):
+	case IP_VERSION(7, 0, 1):
+		adev->lsdma.funcs = &lsdma_v7_0_funcs;
+		break;
 	default:
 		break;
 	}
-- 
2.42.0
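The discovery change above follows a common pattern: several hardware revisions fall through to one implementation, and unknown versions are left unhandled. A stand-alone sketch of that dispatch follows. The bit packing in SKETCH_IP_VERSION and the enum names are assumptions made for illustration; the kernel's IP_VERSION macro may be defined differently, and only the versions visible in the hunk are modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packing of major/minor/revision into one integer. */
#define SKETCH_IP_VERSION(maj, min, rev) \
	(((uint32_t)(maj) << 16) | ((uint32_t)(min) << 8) | (uint32_t)(rev))

enum lsdma_impl_sketch {
	LSDMA_IMPL_NONE,
	LSDMA_IMPL_V6_0,
	LSDMA_IMPL_V7_0,
};

/* Map a discovered IP version to the implementation that handles it. */
static enum lsdma_impl_sketch pick_lsdma_impl(uint32_t ip_version)
{
	switch (ip_version) {
	case SKETCH_IP_VERSION(6, 0, 3):
		return LSDMA_IMPL_V6_0;
	case SKETCH_IP_VERSION(7, 0, 0):
	case SKETCH_IP_VERSION(7, 0, 1):
		/* Both 7.0.x revisions share one set of callbacks. */
		return LSDMA_IMPL_V7_0;
	default:
		return LSDMA_IMPL_NONE;
	}
}
```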
[PATCH 0/3] LSDMA 7.0 support
LSDMA (Light SDMA) is a general purpose SDMA engine on the GPU. The driver uses it for MMIO-controlled DMA access to GPU accessible memory. This adds support for ASICs containing LSDMA version 7.0.x. The first patch adds the register headers and is very large, so I've omitted it. Hawking Zhang (1): drm/amdgpu: Add lsdma v7_0_0 ip headers (v3) Likun Gao (2): drm/amdgpu: Add lsdma v7_0 ip block support drm/amdgpu/discovery: Add lsdma v7_0 ip block drivers/gpu/drm/amd/amdgpu/Makefile |2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c |5 + drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.c | 121 ++ drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.h | 31 + .../asic_reg/lsdma/lsdma_7_0_0_offset.h | 388 + .../asic_reg/lsdma/lsdma_7_0_0_sh_mask.h | 1411 + 6 files changed, 1957 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.c create mode 100644 drivers/gpu/drm/amd/amdgpu/lsdma_v7_0.h create mode 100644 drivers/gpu/drm/amd/include/asic_reg/lsdma/lsdma_7_0_0_offset.h create mode 100644 drivers/gpu/drm/amd/include/asic_reg/lsdma/lsdma_7_0_0_sh_mask.h -- 2.42.0
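Because the engine is driven through 32-bit MMIO registers, 64-bit source and destination addresses have to be written as low and high halves, which is what the copy routine in patch 2/3 does with lower_32_bits() and upper_32_bits(). The sketch below models that split in user space. The register struct and the helper names (lo32, hi32, join64, program_copy_addrs) are hypothetical and stand in for real register writes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the four PIO address registers. */
struct pio_addr_regs_sketch {
	uint32_t src_lo;
	uint32_t src_hi;
	uint32_t dst_lo;
	uint32_t dst_hi;
};

/* User-space equivalents of splitting a 64-bit value into halves. */
static uint32_t lo32(uint64_t v)
{
	return (uint32_t)(v & 0xffffffffu);
}

static uint32_t hi32(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

/* Reassemble the halves, useful for checking the split is lossless. */
static uint64_t join64(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* "Program" the modeled registers for a copy from src to dst. */
static void program_copy_addrs(struct pio_addr_regs_sketch *regs,
			       uint64_t src, uint64_t dst)
{
	regs->src_lo = lo32(src);
	regs->src_hi = hi32(src);
	regs->dst_lo = lo32(dst);
	regs->dst_hi = hi32(dst);
}
```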
[PATCH 2/2] drm/amdgpu: Add athub v4_1_0 ip block support
From: Hawking Zhang Add athub v4_1_0 ip block support. Signed-off-by: Hawking Zhang Reviewed-by: Likun Gao Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/Makefile | 3 +- drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.c | 121 ++ drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.h | 30 ++ 3 files changed, 153 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.c create mode 100644 drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.h diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile index 4c989da4d2f3..1b04bae60fbf 100644 --- a/drivers/gpu/drm/amd/amdgpu/Makefile +++ b/drivers/gpu/drm/amd/amdgpu/Makefile @@ -233,7 +233,8 @@ amdgpu-y += \ athub_v1_0.o \ athub_v2_0.o \ athub_v2_1.o \ - athub_v3_0.o + athub_v3_0.o \ + athub_v4_1_0.o # add SMUIO block amdgpu-y += \ diff --git a/drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.c b/drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.c new file mode 100644 index ..14f0a63cfb45 --- /dev/null +++ b/drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.c @@ -0,0 +1,121 @@ +/* + * Copyright 2023 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + * + */ + +#include "amdgpu.h" +#include "athub_v4_1_0.h" +#include "athub/athub_4_1_0_offset.h" +#include "athub/athub_4_1_0_sh_mask.h" +#include "soc15_common.h" + +static uint32_t athub_v4_1_0_get_cg_cntl(struct amdgpu_device *adev) +{ + uint32_t data; + + switch (amdgpu_ip_version(adev, ATHUB_HWIP, 0)) { + case IP_VERSION(4, 1, 0): + data = RREG32_SOC15(ATHUB, 0, regATHUB_MISC_CNTL); + break; + default: + break; + } + return data; +} + +static void athub_v4_1_0_set_cg_cntl(struct amdgpu_device *adev, uint32_t data) +{ + switch (amdgpu_ip_version(adev, ATHUB_HWIP, 0)) { + case IP_VERSION(4, 1, 0): + WREG32_SOC15(ATHUB, 0, regATHUB_MISC_CNTL, data); + break; + default: + break; + } +} + +static void +athub_v4_1_0_update_medium_grain_clock_gating(struct amdgpu_device *adev, + bool enable) +{ + uint32_t def, data; + + def = data = athub_v4_1_0_get_cg_cntl(adev); + + if (enable && (adev->cg_flags & AMD_CG_SUPPORT_ATHUB_MGCG)) + data |= ATHUB_MISC_CNTL__CG_ENABLE_MASK; + else + data &= ~ATHUB_MISC_CNTL__CG_ENABLE_MASK; + + if (def != data) + athub_v4_1_0_set_cg_cntl(adev, data); +} + +static void +athub_v4_1_0_update_medium_grain_light_sleep(struct amdgpu_device *adev, +bool enable) +{ + uint32_t def, data; + + def = data = athub_v4_1_0_get_cg_cntl(adev); + + if (enable && (adev->cg_flags & AMD_CG_SUPPORT_ATHUB_LS)) + data |= ATHUB_MISC_CNTL__CG_MEM_LS_ENABLE_MASK; + else + data &= ~ATHUB_MISC_CNTL__CG_MEM_LS_ENABLE_MASK; + + if (def != data) + athub_v4_1_0_set_cg_cntl(adev, data); +} + +int athub_v4_1_0_set_clockgating(struct amdgpu_device *adev, +enum amd_clockgating_state state) +{ + if (amdgpu_sriov_vf(adev)) + return 0; + + switch (amdgpu_ip_version(adev, ATHUB_HWIP, 0)) { + case 
IP_VERSION(4, 1, 0): + athub_v4_1_0_update_medium_grain_clock_gating(adev, + state == AMD_CG_STATE_GATE); + athub_v4_1_0_update_medium_grain_light_sleep(adev, + state == AMD_CG_STATE_GATE); + break; + default: + break; + } + + return 0; +} + +void athub_v4_1_0_get_clockgating(struct amdgpu_device *adev, u64 *flags) +{ + int data; + + /* AMD_CG_SUPPORT_ATHUB_MGCG */ + data =
[PATCH 0/2] Add ATHUB 4.1 support
This adds support for ATHUB 4.1.x. The driver's interaction with this hardware is largely limited to enabling clockgating features. The first just adds the register headers and is large, so I've omitted it. Hawking Zhang (2): drm/amdgpu: Add athub v4_1_0 ip headers (v5) drm/amdgpu: Add athub v4_1_0 ip block support drivers/gpu/drm/amd/amdgpu/Makefile |3 +- drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.c | 121 ++ drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.h | 30 + .../asic_reg/athub/athub_4_1_0_offset.h | 287 .../asic_reg/athub/athub_4_1_0_sh_mask.h | 1348 + 5 files changed, 1788 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.c create mode 100644 drivers/gpu/drm/amd/amdgpu/athub_v4_1_0.h create mode 100644 drivers/gpu/drm/amd/include/asic_reg/athub/athub_4_1_0_offset.h create mode 100644 drivers/gpu/drm/amd/include/asic_reg/athub/athub_4_1_0_sh_mask.h -- 2.42.0
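The clockgating helpers in patch 2/2 follow a read-modify-write pattern: read the control register, set or clear one enable bit depending on the request and on whether the feature is supported, and write back only if the value changed. A register-free sketch of that decision follows. The function names and the mask values used with them are made up for illustration and do not correspond to real ATHUB register fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute the new register value: set the bits in mask only when the
 * feature is both requested and supported, otherwise clear them.
 */
static uint32_t cg_apply(uint32_t reg, uint32_t mask, bool enable,
			 bool supported)
{
	if (enable && supported)
		reg |= mask;
	else
		reg &= ~mask;

	return reg;
}

/* A write is only needed when the computed value differs from the old one. */
static bool cg_needs_write(uint32_t old_val, uint32_t new_val)
{
	return old_val != new_val;
}
```

Skipping the write when nothing changed avoids a redundant register access, which is the purpose of the `def != data` comparison in the patch.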
[PATCH] drm/amdgpu: skip ucode bo reserve for RLC AUTOLOAD
From: Likun Gao

Skip ucode BO reservation for backdoor RLC autoload.

Signed-off-by: Likun Gao
Reviewed-by: Hawking Zhang
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
index 3e12763e477a..afa3ac931638 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
@@ -1060,7 +1060,8 @@ static int amdgpu_ucode_patch_jt(struct amdgpu_firmware_info *ucode,
 
 int amdgpu_ucode_create_bo(struct amdgpu_device *adev)
 {
-	if (adev->firmware.load_type != AMDGPU_FW_LOAD_DIRECT) {
+	if ((adev->firmware.load_type != AMDGPU_FW_LOAD_DIRECT) &&
+	    (adev->firmware.load_type != AMDGPU_FW_LOAD_RLC_BACKDOOR_AUTO)) {
 		amdgpu_bo_create_kernel(adev, adev->firmware.fw_size, PAGE_SIZE,
 			(amdgpu_sriov_vf(adev) || adev->debug_use_vram_fw_buf) ?
 			AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
-- 
2.42.0
[PATCH] drm/amdkfd: fill in data for control stack header for gfx10
From: Jonathan Kim The debugger requires the control stack header to be filled in to update_waves. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Jonathan Kim Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h | 25 1 file changed, 25 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h index 57bf5e513f4d..e5cc697a3ca8 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h @@ -128,6 +128,31 @@ struct mqd_manager { uint32_t mqd_size; }; +struct mqd_user_context_save_area_header { + /* Byte offset from start of user context +* save area to the last saved top (lowest +* address) of control stack data. Must be +* 4 byte aligned. +*/ + uint32_t control_stack_offset; + + /* Byte size of the last saved control stack +* data. Must be 4 byte aligned. +*/ + uint32_t control_stack_size; + + /* Byte offset from start of user context save +* area to the last saved base (lowest address) +* of wave state data. Must be 4 byte aligned. +*/ + uint32_t wave_state_offset; + + /* Byte size of the last saved wave state data. +* Must be 4 byte aligned. +*/ + uint32_t wave_state_size; +}; + struct kfd_mem_obj *allocate_hiq_mqd(struct kfd_node *dev, struct queue_properties *q); -- 2.42.0
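The comments in the new header struct state that every field must be 4-byte aligned. A consumer could check that with something like the sketch below. It mirrors the four fields from the patch, but the struct name and header_is_aligned() are hypothetical, and nothing in the patch says the driver or debugger performs such a check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field-for-field mirror of the header added by the patch. */
struct save_area_header_sketch {
	uint32_t control_stack_offset;
	uint32_t control_stack_size;
	uint32_t wave_state_offset;
	uint32_t wave_state_size;
};

/* True when every offset and size is a multiple of 4 bytes. */
static bool header_is_aligned(const struct save_area_header_sketch *h)
{
	return (h->control_stack_offset % 4 == 0) &&
	       (h->control_stack_size % 4 == 0) &&
	       (h->wave_state_offset % 4 == 0) &&
	       (h->wave_state_size % 4 == 0);
}
```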
[PATCH] drm/amd/swsmu: add judgement for vcn jpeg dpm set
From: Likun Gao

Only enable VCN/JPEG DPM while setting up the default DPM table when the
corresponding VCN/JPEG PG flag is set.

Signed-off-by: Likun Gao
Reviewed-by: Kenneth Feng
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 0ad947df777a..3d72c945cf56 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -751,6 +751,7 @@ static int smu_early_init(void *handle)
 
 static int smu_set_default_dpm_table(struct smu_context *smu)
 {
+	struct amdgpu_device *adev = smu->adev;
 	struct smu_power_context *smu_power = &smu->smu_power;
 	struct smu_power_gate *power_gate = &smu_power->power_gate;
 	int vcn_gate, jpeg_gate;
@@ -759,25 +760,34 @@ static int smu_set_default_dpm_table(struct smu_context *smu)
 	if (!smu->ppt_funcs->set_default_dpm_table)
 		return 0;
 
-	vcn_gate = atomic_read(&power_gate->vcn_gated);
-	jpeg_gate = atomic_read(&power_gate->jpeg_gated);
+	if (adev->pg_flags & AMD_PG_SUPPORT_VCN)
+		vcn_gate = atomic_read(&power_gate->vcn_gated);
+	if (adev->pg_flags & AMD_PG_SUPPORT_JPEG)
+		jpeg_gate = atomic_read(&power_gate->jpeg_gated);
 
-	ret = smu_dpm_set_vcn_enable(smu, true);
-	if (ret)
-		return ret;
+	if (adev->pg_flags & AMD_PG_SUPPORT_VCN) {
+		ret = smu_dpm_set_vcn_enable(smu, true);
+		if (ret)
+			return ret;
+	}
 
-	ret = smu_dpm_set_jpeg_enable(smu, true);
-	if (ret)
-		goto err_out;
+	if (adev->pg_flags & AMD_PG_SUPPORT_JPEG) {
+		ret = smu_dpm_set_jpeg_enable(smu, true);
+		if (ret)
+			goto err_out;
+	}
 
 	ret = smu->ppt_funcs->set_default_dpm_table(smu);
 	if (ret)
 		dev_err(smu->adev->dev,
 			"Failed to setup default dpm clock tables!\n");
 
-	smu_dpm_set_jpeg_enable(smu, !jpeg_gate);
+	if (adev->pg_flags & AMD_PG_SUPPORT_JPEG)
+		smu_dpm_set_jpeg_enable(smu, !jpeg_gate);
 err_out:
-	smu_dpm_set_vcn_enable(smu, !vcn_gate);
+	if (adev->pg_flags & AMD_PG_SUPPORT_VCN)
+		smu_dpm_set_vcn_enable(smu, !vcn_gate);
+
 	return ret;
 }

-- 
2.42.0
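For anyone following along outside the kernel tree, the guarded save/enable/restore pattern in this patch can be sketched in plain C. Everything below is a hypothetical stand-in (the fake_smu struct, the PG_SUPPORT_* bits, the set_*_enable helpers); it is not the amdgpu API, and it only illustrates that a block is touched solely when its power-gating flag is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for AMD_PG_SUPPORT_VCN and AMD_PG_SUPPORT_JPEG. */
#define PG_SUPPORT_VCN  (1u << 0)
#define PG_SUPPORT_JPEG (1u << 1)

/* Hypothetical, simplified device state; not the real smu_context. */
struct fake_smu {
	uint32_t pg_flags;              /* which blocks support power gating */
	bool vcn_gated, jpeg_gated;     /* gate state saved before the setup */
	bool vcn_enabled, jpeg_enabled; /* DPM state toggled by the sketch */
	int vcn_calls, jpeg_calls;      /* how often each block was touched */
};

static void set_vcn_enable(struct fake_smu *smu, bool enable)
{
	smu->vcn_enabled = enable;
	smu->vcn_calls++;
}

static void set_jpeg_enable(struct fake_smu *smu, bool enable)
{
	smu->jpeg_enabled = enable;
	smu->jpeg_calls++;
}

static int set_default_dpm_table(struct fake_smu *smu)
{
	const bool has_vcn = (smu->pg_flags & PG_SUPPORT_VCN) != 0;
	const bool has_jpeg = (smu->pg_flags & PG_SUPPORT_JPEG) != 0;
	/* Initialised here so the skipped paths never hold garbage. */
	bool vcn_gate = false, jpeg_gate = false;

	if (has_vcn) {
		vcn_gate = smu->vcn_gated;
		set_vcn_enable(smu, true);
	}
	if (has_jpeg) {
		jpeg_gate = smu->jpeg_gated;
		set_jpeg_enable(smu, true);
	}

	/* The default DPM table would be built here. */

	/* Restore in reverse order, again only for supported blocks. */
	if (has_jpeg)
		set_jpeg_enable(smu, !jpeg_gate);
	if (has_vcn)
		set_vcn_enable(smu, !vcn_gate);
	return 0;
}
```

With only the VCN bit set, the sketch enables and restores VCN and never calls the JPEG helper, which is the behaviour the patch is after.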
[PATCH] drm/amdgpu: support rlc autoload type set
From: Likun Gao

Support to set fw_load_type=3 to use backdoor rlc autoload.

Signed-off-by: Likun Gao
Reviewed-by: Hawking Zhang
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
index afa3ac931638..2ab01b18d62e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
@@ -556,6 +556,8 @@ amdgpu_ucode_get_load_type(struct amdgpu_device *adev, int load_type)
 	default:
 		if (!load_type)
 			return AMDGPU_FW_LOAD_DIRECT;
+		else if (load_type == 3)
+			return AMDGPU_FW_LOAD_RLC_BACKDOOR_AUTO;
 		else
 			return AMDGPU_FW_LOAD_PSP;
 	}
-- 
2.42.0
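The mapping that the default branch implements after this change can be written out as a small standalone function. This is only a sketch of that one branch: the enum below is a hypothetical local copy for illustration, not the kernel's definition, and the per-ASIC cases of the real function are left out.

```c
/* Hypothetical local enum; the kernel uses AMDGPU_FW_LOAD_* values. */
enum fw_load_type {
	FW_LOAD_DIRECT,
	FW_LOAD_PSP,
	FW_LOAD_RLC_BACKDOOR_AUTO,
};

/* Mirrors the default branch shown in the diff: 0 and 3 are special,
 * every other value falls back to PSP loading. */
static enum fw_load_type pick_load_type(int load_type)
{
	if (load_type == 0)
		return FW_LOAD_DIRECT;
	if (load_type == 3)
		return FW_LOAD_RLC_BACKDOOR_AUTO;
	return FW_LOAD_PSP;
}
```

Note that values such as 1, 2 or -1 all end up on the PSP path, so only fw_load_type=3 selects the backdoor autoload.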
RE: [PATCH] drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3
[AMD Official Use Only - General] The commit description needs a Fixes tag of the offending commit. With that fixed, this patch is: Reviewed-by: Mukul Joshi > -Original Message- > From: Russell, Kent > Sent: Tuesday, February 6, 2024 1:06 PM > To: amd-gfx@lists.freedesktop.org > Cc: Joshi, Mukul ; Russell, Kent > > Subject: [PATCH] drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3 > > Its currently incorrectly multiplied by number of XCCs in the partition > > Signed-off-by: Kent Russell > --- > drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c > b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c > index 3df2a8ad86fb..64bf2a56f010 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c > @@ -1640,9 +1640,7 @@ static int fill_in_l2_l3_pcache(struct > kfd_cache_properties **props_ext, > else > mode = UNKNOWN_MEMORY_PARTITION_MODE; > > - if (pcache->cache_level == 2) > - pcache->cache_size = > pcache_info[cache_type].cache_size * num_xcc; > - else if (mode) > + if (mode) > pcache->cache_size = > pcache_info[cache_type].cache_size / mode; > else > pcache->cache_size = > pcache_info[cache_type].cache_size; > -- > 2.34.1
Re: [PATCH v2] amdkfd: pass debug exceptions to second-level trap handler
[AMD Official Use Only - General] Acked-by: Alex Deucher From: amd-gfx on behalf of Laurent Morichetti Sent: Thursday, February 1, 2024 4:33 PM To: amd-gfx@lists.freedesktop.org Cc: jay.cornwall@amd.com ; Morichetti, Laurent ; Six, Lancelot ; Cornwall, Jay Subject: [PATCH v2] amdkfd: pass debug exceptions to second-level trap handler Call the 2nd level trap handler if the cwsr handler is entered with any one of wave_start, wave_end, or trap_after_inst exceptions. Signed-off-by: Laurent Morichetti Tested-by: Lancelot Six Reviewed-by: Jay Cornwall --- drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h | 2 +- .../drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm | 17 - 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h index d1caaf0e6a7c..2e9b64edb8d2 100644 --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h @@ -2518,7 +2518,7 @@ static const uint32_t cwsr_trap_gfx11_hex[] = { 0x8b6eff7b, 0x0400, 0xbfa20045, 0xbf830010, 0xb8fbf803, 0xbfa0fffa, - 0x8b6eff7b, 0x0900, + 0x8b6eff7b, 0x00160900, 0xbfa20015, 0x8b6eff7b, 0x71ff, 0xbfa10008, 0x8b6fff7b, 0x7080, diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm index 71b3dc0c7363..7568ff3af978 100644 --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm @@ -81,6 +81,11 @@ var SQ_WAVE_TRAPSTS_POST_SAVECTX_SHIFT = 11 var SQ_WAVE_TRAPSTS_POST_SAVECTX_SIZE = 21 var SQ_WAVE_TRAPSTS_ILLEGAL_INST_MASK = 0x800 var SQ_WAVE_TRAPSTS_EXCP_HI_MASK= 0x7000 +#if ASIC_FAMILY >= CHIP_PLUM_BONITO +var SQ_WAVE_TRAPSTS_WAVE_START_MASK= 0x2 +var SQ_WAVE_TRAPSTS_WAVE_END_MASK = 0x4 +var SQ_WAVE_TRAPSTS_TRAP_AFTER_INST_MASK = 0x10 +#endif var SQ_WAVE_MODE_EXCP_EN_SHIFT = 12 var SQ_WAVE_MODE_EXCP_EN_ADDR_WATCH_SHIFT = 19 @@ -92,6 +97,16 @@ var 
SQ_WAVE_IB_STS_RCNT_FIRST_REPLAY_MASK= 0x003F8000 var SQ_WAVE_MODE_DEBUG_EN_MASK = 0x800 +#if ASIC_FAMILY < CHIP_PLUM_BONITO +var S_TRAPSTS_NON_MASKABLE_EXCP_MASK = SQ_WAVE_TRAPSTS_MEM_VIOL_MASK|SQ_WAVE_TRAPSTS_ILLEGAL_INST_MASK +#else +var S_TRAPSTS_NON_MASKABLE_EXCP_MASK = SQ_WAVE_TRAPSTS_MEM_VIOL_MASK |\ + SQ_WAVE_TRAPSTS_ILLEGAL_INST_MASK |\ + SQ_WAVE_TRAPSTS_WAVE_START_MASK|\ + SQ_WAVE_TRAPSTS_WAVE_END_MASK |\ + SQ_WAVE_TRAPSTS_TRAP_AFTER_INST_MASK +#endif + // bits [31:24] unused by SPI debug data var TTMP11_SAVE_REPLAY_W64H_SHIFT = 31 var TTMP11_SAVE_REPLAY_W64H_MASK= 0x8000 @@ -224,7 +239,7 @@ L_NOT_HALTED: // Check non-maskable exceptions. memory_violation, illegal_instruction // and xnack_error exceptions always cause the wave to enter the trap // handler. - s_and_b32 ttmp2, s_save_trapsts, SQ_WAVE_TRAPSTS_MEM_VIOL_MASK|SQ_WAVE_TRAPSTS_ILLEGAL_INST_MASK + s_and_b32 ttmp2, s_save_trapsts, S_TRAPSTS_NON_MASKABLE_EXCP_MASK s_cbranch_scc1 L_FETCH_2ND_TRAP // Check for maskable exceptions in trapsts.excp and trapsts.excp_hi. base-commit: c4b562a17829454713e45219fa754be1bfda9004 -- 2.25.1
Re: [PATCH v3 1/5] ACPI: video: Handle fetching EDID that is longer than 256 bytes
On Fri, Feb 2, 2024 at 5:09 PM Mario Limonciello wrote:
>
> On 2/2/2024 10:07, Rafael J. Wysocki wrote:
> > On Thu, Feb 1, 2024 at 11:11 PM Mario Limonciello
> > wrote:
> >>
> >> The ACPI specification allows for an EDID to be up to 512 bytes but
> >> the _DDC EDID fetching code will only try up to 256 bytes.
> >>
> >> Modify the code to start at 512 bytes and work its way
> >> down instead.
> >>
> >> As _DDC is now called up to 4 times on a machine debugging messages
> >> are noisier than necessary. Decrease from info to debug.
> >>
> >> Link:
> >> https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/Apx_B_Video_Extensions/output-device-specific-methods.html#ddc-return-the-edid-for-this-device
> >> Signed-off-by: Mario Limonciello
> >
> > Acked-by: Rafael J. Wysocki
> >
> > or I can apply it if that's preferred.
>
> Thanks!
>
> I think go ahead and apply this one to your -next tree.

Applied now. Barring any issues with it, it will get into linux-next
in a couple of days.

Thanks!
Re: [PATCH 3/3] drm/amdgpu: wire up the can_remove() callback
On 06.02.24 at 15:29, Daniel Vetter wrote:
> On Fri, Feb 02, 2024 at 03:40:03PM -0800, Greg Kroah-Hartman wrote:
>> On Fri, Feb 02, 2024 at 05:25:56PM -0500, Hamza Mahfooz wrote:
>>> Removing an amdgpu device that still has user space references
>>> allocated to it causes undefined behaviour.
>>
>> Then fix that please. There should not be anything special about your
>> hardware that all of the tens of thousands of other devices can't
>> handle today.
>>
>> What happens when I yank your device out of a system with a pci
>> hotplug bus? You can't prevent that either, so this should not be any
>> different at all.
>>
>> sorry, but please, just fix your driver.
>
> fwiw Christian König from amd already rejected this too, I have no idea
> why this was submitted

Well that was my fault.

I commented on an internal bug tracker that when sysfs bind/unbind is a
different code path from PCI remove/re-scan we could try to reject it.

Turned out it isn't a different code path.

> since the very elaborate plan I developed with a bunch of amd folks was
> to fix the various lifetime lolz we still have in drm. We unfortunately
> export the world of internal objects to userspace as uabi objects with
> dma_buf, dma_fence and everything else, but it's all fixable and we
> have the plan even documented:
>
> https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#device-hot-unplug
>
> So yeah anything that isn't that plan of record is very much no-go for
> drm drivers. Unless we change that plan of course, but that needs a
> documentation patch first and a big discussion.
>
> Aside from an absolute massive pile of kernel-internal refcounting bugs
> the really big one we agreed on after a lot of discussion is that
> SIGBUS on dma-buf mmaps is no-go for drm drivers, because it would
> break way too much userspace in ways which are simply not fixable
> (since sig handlers are shared in a process, which means the gl/vk
> driver cannot use it).
>
> Otherwise it's bog standard "fix the kernel bugs" work, just a lot of
> it.

Ignoring a few memory leaks because of messed up refcounting we actually
got that working quite nicely.

At least hot unplug / hot add seems to be working rather reliably in our
internal testing.

So it can't be that messed up.

Regards,
Christian.

> Cheers, Sima
RE: [PATCH v2] drm/amd/display: Implement bounds check for stream encoder creation in DCN301
[Public] Inline. > -Original Message- > From: SHANMUGAM, SRINIVASAN > Sent: Monday, February 5, 2024 10:47 PM > To: Li, Roman ; Siqueira, Rodrigo > ; Pillai, Aurabindo > Cc: amd-gfx@lists.freedesktop.org; SHANMUGAM, SRINIVASAN > > Subject: [PATCH v2] drm/amd/display: Implement bounds check for stream > encoder creation in DCN301 > > 'stream_enc_regs' array is an array of dcn10_stream_enc_registers structures. > The array is initialized with four elements, corresponding to the four calls > to > stream_enc_regs() in the array initializer. This means that valid indices for > this > array are 0, 1, 2, and 3. > > The error message 'stream_enc_regs' 4 <= 5 below, is indicating that there is > an > attempt to access this array with an index of 5, which is out of bounds. This > could lead to undefined behavior > > Here, eng_id is used as an index to access the stream_enc_regs array. If > eng_id > is 5, this would result in an out-of-bounds access on the stream_enc_regs > array. > > Thus fixing Buffer overflow error in dcn301_stream_encoder_create reported > by Smatch: > drivers/gpu/drm/amd/amdgpu/../display/dc/resource/dcn301/dcn301_reso > urce.c:1011 dcn301_stream_encoder_create() error: buffer overflow > 'stream_enc_regs' 4 <= 5 > > Fixes: 3a83e4e64bb1 ("drm/amd/display: Add dcn3.01 support to DC (v2)") > Cc: Roman Li > Cc: Rodrigo Siqueira > Cc: Aurabindo Pillai > Signed-off-by: Srinivasan Shanmugam > --- > .../drm/amd/display/dc/resource/dcn301/dcn301_resource.c | 9 - > 1 file changed, 4 insertions(+), 5 deletions(-) > > diff --git > a/drivers/gpu/drm/amd/display/dc/resource/dcn301/dcn301_resource.c > b/drivers/gpu/drm/amd/display/dc/resource/dcn301/dcn301_resource.c > index 511ff6b5b985..4a475a723191 100644 > --- a/drivers/gpu/drm/amd/display/dc/resource/dcn301/dcn301_resource.c > +++ > b/drivers/gpu/drm/amd/display/dc/resource/dcn301/dcn301_resource.c > @@ -999,7 +999,7 @@ static struct stream_encoder > *dcn301_stream_encoder_create(enum engine_id eng_id > vpg = 
dcn301_vpg_create(ctx, vpg_inst); > afmt = dcn301_afmt_create(ctx, afmt_inst); > > - if (!enc1 || !vpg || !afmt) { > + if (!enc1 || !vpg || !afmt || eng_id >= ARRAY_SIZE(stream_enc_regs)) > { > kfree(enc1); > kfree(vpg); > kfree(afmt); Reviewed-by: Roman Li I don't think the part below is necessary. > @@ -1007,10 +1007,9 @@ static struct stream_encoder > *dcn301_stream_encoder_create(enum engine_id eng_id > } > > dcn30_dio_stream_encoder_construct(enc1, ctx, ctx->dc_bios, > - eng_id, vpg, afmt, > - _enc_regs[eng_id], > - _shift, _mask); > - > +eng_id, vpg, afmt, > +_enc_regs[eng_id], > +_shift, _mask); > return >base; > } > > -- > 2.34.1
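As a side note for readers, the bounds check discussed in this thread follows the usual ARRAY_SIZE idiom. A minimal userspace sketch is below; the table name is borrowed from the patch, but the enc_regs struct, its contents and the lookup helper are hypothetical and only stand in for the real register tables.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's ARRAY_SIZE() macro. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical placeholder for dcn10_stream_enc_registers. */
struct enc_regs {
	int base;
};

/* Four entries, so valid indices are 0..3, as in the commit message. */
static const struct enc_regs stream_enc_regs[] = {
	{ 0 }, { 1 }, { 2 }, { 3 },
};

/* Returns NULL instead of reading past the end of the table. */
static const struct enc_regs *lookup_enc_regs(unsigned int eng_id)
{
	if (eng_id >= ARRAY_SIZE(stream_enc_regs))
		return NULL;
	return &stream_enc_regs[eng_id];
}
```

An eng_id of 5, the value Smatch complains about, is rejected here before the array is indexed.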
[PATCH] drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3
The L2 cache size is currently incorrectly multiplied by the number of
XCCs in the partition.

Signed-off-by: Kent Russell
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 3df2a8ad86fb..64bf2a56f010 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1640,9 +1640,7 @@ static int fill_in_l2_l3_pcache(struct kfd_cache_properties **props_ext,
 	else
 		mode = UNKNOWN_MEMORY_PARTITION_MODE;
 
-	if (pcache->cache_level == 2)
-		pcache->cache_size = pcache_info[cache_type].cache_size * num_xcc;
-	else if (mode)
+	if (mode)
 		pcache->cache_size = pcache_info[cache_type].cache_size / mode;
 	else
 		pcache->cache_size = pcache_info[cache_type].cache_size;
-- 
2.34.1
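To make the effect of this change concrete, here is a small arithmetic sketch of the size calculation before and after the patch. The helper names and the example numbers are made up for illustration, and a mode of 0 is treated as "unknown", which is what the `if (mode)` test in the diff implies.

```c
#include <stdint.h>

/* Pre-patch behaviour for cache level 2: scale by the XCC count. */
static uint32_t l2_size_before(uint32_t cache_size, uint32_t num_xcc)
{
	return cache_size * num_xcc;
}

/* Post-patch behaviour for every level handled by this function:
 * divide by the memory partition mode when it is known (non-zero),
 * otherwise report the size unchanged. */
static uint32_t cache_size_after(uint32_t cache_size, uint32_t mode)
{
	return mode ? cache_size / mode : cache_size;
}
```

For a hypothetical 4096 KiB L2 in a partition with 4 XCCs, the old code would report 16384 KiB, while the new code reports 4096 KiB with an unknown mode and 2048 KiB with a mode of 2.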
Re: [PATCH] drm/amd/display: Clear phantom stream count and plane count
On 2024-02-05 08:54, Deucher, Alexander wrote: > [Public] > > > [Public] > > > Acked-by: Alex Deucher > Reviewed-by: Harry Wentland Harry -- > *From:* amd-gfx on behalf of Mario > Limonciello > *Sent:* Friday, February 2, 2024 7:30 PM > *To:* amd-gfx@lists.freedesktop.org > *Cc:* Limonciello, Mario > *Subject:* [PATCH] drm/amd/display: Clear phantom stream count and plane count > > When dc_state_destruct() was refactored the new phantom_stream_count > and phantom_plane_count members weren't cleared. > > Fixes: 012a04b1d6af ("drm/amd/display: Refactor phantom resource allocation") > Signed-off-by: Mario Limonciello > --- > drivers/gpu/drm/amd/display/dc/core/dc_state.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_state.c > b/drivers/gpu/drm/amd/display/dc/core/dc_state.c > index 88c6436b28b6..180ac47868c2 100644 > --- a/drivers/gpu/drm/amd/display/dc/core/dc_state.c > +++ b/drivers/gpu/drm/amd/display/dc/core/dc_state.c > @@ -291,11 +291,14 @@ void dc_state_destruct(struct dc_state *state) > dc_stream_release(state->phantom_streams[i]); > state->phantom_streams[i] = NULL; > } > + state->phantom_stream_count = 0; > > for (i = 0; i < state->phantom_plane_count; i++) { > dc_plane_state_release(state->phantom_planes[i]); > state->phantom_planes[i] = NULL; > } > + state->phantom_plane_count = 0; > + > state->stream_mask = 0; > memset(>res_ctx, 0, sizeof(state->res_ctx)); > memset(>pp_display_cfg, 0, sizeof(state->pp_display_cfg)); > -- > 2.34.1 >
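The reason the counters need clearing can be shown with a reduced model of the destructor. This is a hypothetical sketch (the struct, the release helper and the global counter are invented for illustration), not the DC code: it only shows that resetting the count together with the pointers leaves the state consistent for a later reuse or a second destruct.

```c
#include <stddef.h>

#define MAX_PHANTOM 4

/* Hypothetical, reduced stand-in for the phantom part of dc_state. */
struct dc_like_state {
	int *phantom_streams[MAX_PHANTOM];
	int phantom_stream_count;
};

/* Counts how many live objects were released, for demonstration. */
static int released;

static void stream_release(int *stream)
{
	if (stream)
		released++;
}

static void state_destruct(struct dc_like_state *st)
{
	for (int i = 0; i < st->phantom_stream_count; i++) {
		stream_release(st->phantom_streams[i]);
		st->phantom_streams[i] = NULL;
	}
	/* Without this reset the count would still claim live entries. */
	st->phantom_stream_count = 0;
}
```

After the call the count matches the (now empty) array, so code that later walks the array by count sees no stale entries.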
[PATCH v4 12/24] drm/amdgpu: use trapID 4 for host trap
Since TRAPSTS.HOST_TRAP won't work pre-gfx943, so use TTMP1 (bit 24: HT) and (bit 16-23: trapID) to identify the host trap. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |2 + .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2117 + .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |5 + 3 files changed, 1070 insertions(+), 1054 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c index 7d8c0e13ac12..adfe5e5585e5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c @@ -1162,6 +1162,8 @@ uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev, value = REG_SET_FIELD(value, SQ_CMD, SIMD_ID, *target_simd); /* select *target_wave_slot */ value = REG_SET_FIELD(value, SQ_CMD, WAVE_ID, (*target_wave_slot)++); + /* set TrapID 4 for HOSTTRAP */ + value = REG_SET_FIELD(value, SQ_CMD, DATA, 0x4); mutex_lock(>grbm_idx_mutex); amdgpu_gfx_select_se_sh(adev, 0x, 0x, 0x, 0); diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h index af1f678790e7..b3c681d7256b 100644 --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h @@ -274,155 +274,263 @@ static const uint32_t cwsr_trap_gfx8_hex[] = { static const uint32_t cwsr_trap_gfx9_hex[] = { - 0xbf820001, 0xbf82025e, + 0xbf820001, 0xbf820263, 0xb8f8f802, 0x8978ff78, 0x00020006, 0xb8fbf803, 0x866eff78, 0x2000, 0xbf840009, 0x866eff6d, - 0x00ff, 0xbf85001e, + 0x00ff, 0xbf850023, 0x866eff7b, 0x0400, - 0xbf85005b, 0xbf8e0010, + 0xbf850060, 0xbf8e0010, 0xb8fbf803, 0xbf82fffa, 0x866eff7b, 0x03c00900, - 0xbf850015, 0x866eff7b, - 0x71ff, 0xbf840008, - 0x866fff7b, 0x7080, - 0xbf840001, 0xbeee1a87, - 0xb8eff801, 0x8e6e8c6e, - 0x866e6f6e, 0xbf85000a, - 0x866eff6d, 0x00ff, - 0xbf850007, 0xb8eef801, - 0x866eff6e, 0x0800, - 0xbf850003, 0x866eff7b, - 0x0400, 0xbf850040, - 
0xb8faf807, 0x867aff7a, - 0x001f8000, 0x8e7a8b7a, - 0x8977ff77, 0xfc00, - 0x8a77, 0xba7ff807, - 0x, 0xb8faf812, - 0xb8fbf813, 0x8efa887a, - 0xbf0d8f7b, 0xbf840002, - 0x877bff7b, 0x, - 0xc0031c3d, 0x0010, - 0xc0071bbd, 0x, - 0xc0071ebd, 0x0008, - 0xbf8cc07f, 0x8671ff6d, - 0x0100, 0xbf840004, - 0x92f1ff70, 0x00010001, - 0xbf840016, 0xbf820005, - 0x86708170, 0x8e709770, - 0x8977ff77, 0x0080, - 0x8077, 0x86ee6e6e, - 0xbf840001, 0xbe801d6e, - 0x866eff6d, 0x01ff, - 0xbf850005, 0x8778ff78, - 0x2000, 0x80ec886c, - 0x82ed806d, 0xbf820005, - 0x866eff6d, 0x0100, - 0xbf850002, 0x806c846c, - 0x826d806d, 0x866dff6d, - 0x, 0x8f7a8b77, + 0xbf85001a, 0x866eff6d, + 0x01ff, 0xbf06ff6e, + 0x0104, 0xbf850015, + 0x866eff7b, 0x71ff, + 0xbf840008, 0x866fff7b, + 0x7080, 0xbf840001, + 0xbeee1a87, 0xb8eff801, + 0x8e6e8c6e, 0x866e6f6e, + 0xbf85000a, 0x866eff6d, + 0x00ff, 0xbf850007, + 0xb8eef801, 0x866eff6e, + 0x0800, 0xbf850003, + 0x866eff7b, 0x0400, + 0xbf850040, 0xb8faf807, 0x867aff7a, 0x001f8000, - 0xb97af807, 0x86fe7e7e, - 0x86ea6a6a, 0x8f6e8378, - 0xb96ee0c2, 0xbf82, - 0xb9780002, 0xbe801f6c, + 0x8e7a8b7a, 0x8977ff77, + 0xfc00, 0x8a77, + 0xba7ff807, 0x, + 0xb8faf812, 0xb8fbf813, + 0x8efa887a, 0xbf0d8f7b, + 0xbf840002, 0x877bff7b, + 0x, 0xc0031c3d, + 0x0010, 0xc0071bbd, + 0x, 0xc0071ebd, + 0x0008, 0xbf8cc07f, + 0x8671ff6d, 0x0100, + 0xbf840004, 0x92f1ff70, + 0x00010001, 0xbf840016, + 0xbf820005, 0x86708170, + 0x8e709770, 0x8977ff77, + 0x0080, 0x8077, + 0x86ee6e6e, 0xbf840001, + 0xbe801d6e, 0x866eff6d, + 0x01ff, 0xbf850005, + 0x8778ff78, 0x2000, + 0x80ec886c, 0x82ed806d, + 0xbf820005, 0x866eff6d, + 0x0100, 0xbf850002, + 0x806c846c, 0x826d806d, 0x866dff6d, 0x, - 0xbefa0080, 0xb97a0283, - 0xb8faf807, 0x867aff7a, - 0x001f8000, 0x8e7a8b7a, - 0x8977ff77, 0xfc00, - 0x8a77, 0xba7ff807, - 0x, 0xbeee007e, - 0xbeef007f, 0xbefe0180, - 0xbf94, 0x877a8478, - 0xb97af802, 0xbf8e0002, - 0xbf88fffe, 0xb8fa2a05, - 0x807a817a, 0x8e7a8a7a, -
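The TTMP1 layout named in the commit message (bit 24: HT, bits 16-23: trapID) can be decoded as in the sketch below. The macro and function names are hypothetical, and requiring both the HT bit and trapID 4 is one reading of the commit message rather than something confirmed by the handler source shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the fields described in the commit message. */
#define TTMP1_HT_MASK       (1u << 24) /* bit 24: host trap */
#define TTMP1_TRAP_ID_SHIFT 16         /* bits 16-23: trap ID */
#define TTMP1_TRAP_ID_MASK  0xffu
#define HOST_TRAP_ID        4u         /* trapID chosen by this patch */

static uint32_t ttmp1_trap_id(uint32_t ttmp1)
{
	return (ttmp1 >> TTMP1_TRAP_ID_SHIFT) & TTMP1_TRAP_ID_MASK;
}

/* Assumption: a host trap has HT set and carries trap ID 4. */
static bool ttmp1_is_host_trap(uint32_t ttmp1)
{
	return (ttmp1 & TTMP1_HT_MASK) != 0 &&
	       ttmp1_trap_id(ttmp1) == HOST_TRAP_ID;
}
```

Under that reading, a TTMP1 value with HT set and trap ID 4 in bits 16-23 is classified as a host trap, while the same trap ID without HT is not.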
[PATCH v4 21/24] drm/amdkfd: add pc sampling thread to trigger trap
Add a kthread to trigger pc sampling trap. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 91 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 1 + 2 files changed, 89 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index 6f50ba1f8989..ea9478c3738a 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -39,6 +39,84 @@ struct supported_pc_sample_info supported_formats[] = { { IP_VERSION(9, 4, 2), _info_hosttrap_9_0_0 }, }; +static int kfd_pc_sample_thread(void *param) +{ + struct amdgpu_device *adev; + struct kfd_node *node = param; + uint32_t timeout = 0; + ktime_t next_trap_time; + + mutex_lock(>pcs_data.mutex); + if (node->pcs_data.hosttrap_entry.base.active_count && + node->pcs_data.hosttrap_entry.base.pc_sample_info.interval && + node->kfd2kgd->trigger_pc_sample_trap) { + switch (node->pcs_data.hosttrap_entry.base.pc_sample_info.type) { + case KFD_IOCTL_PCS_TYPE_TIME_US: + timeout = (uint32_t)node->pcs_data.hosttrap_entry.base.pc_sample_info.interval; + break; + default: + pr_debug("PC Sampling type %d not supported.", + node->pcs_data.hosttrap_entry.base.pc_sample_info.type); + } + } + mutex_unlock(>pcs_data.mutex); + if (!timeout) + return -EINVAL; + + adev = node->adev; + + allow_signal(SIGKILL); + while (!kthread_should_stop() && + !READ_ONCE(node->pcs_data.hosttrap_entry.base.stop_enable) && + !signal_pending(node->pcs_data.hosttrap_entry.base.pc_sample_thread)) { + next_trap_time = ktime_add_us(ktime_get_raw(), timeout); + + node->kfd2kgd->trigger_pc_sample_trap(adev, node->vm_info.last_vmid_kfd, + >pcs_data.hosttrap_entry.base.target_simd, + >pcs_data.hosttrap_entry.base.target_wave_slot, + node->pcs_data.hosttrap_entry.base.pc_sample_info.method); + pr_debug_ratelimited("triggered a host trap."); + + might_sleep(); + do { + ktime_t wait_time; + s64 wait_ns, wait_us; + + wait_time = 
ktime_sub(next_trap_time, ktime_get_raw()); + wait_ns = ktime_to_ns(wait_time); + wait_us = ktime_to_us(wait_time); + if (wait_ns >= 1) + usleep_range(wait_us - 10, wait_us); + else if (wait_ns > 0) + schedule(); + else + break; + } while (1); + } + node->pcs_data.hosttrap_entry.base.pc_sample_thread = NULL; + + return 0; +} + +static int kfd_pc_sample_thread_start(struct kfd_node *node) +{ + char thread_name[16]; + int ret = 0; + + snprintf(thread_name, 16, "pcs_%08x", node->adev->ddev.render->index); + node->pcs_data.hosttrap_entry.base.pc_sample_thread = + kthread_run(kfd_pc_sample_thread, node, thread_name); + + if (IS_ERR(node->pcs_data.hosttrap_entry.base.pc_sample_thread)) { + ret = PTR_ERR(node->pcs_data.hosttrap_entry.base.pc_sample_thread); + node->pcs_data.hosttrap_entry.base.pc_sample_thread = NULL; + pr_debug("Failed to create pc sample thread for %s with ret = %d.", + thread_name, ret); + } + + return ret; +} + static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd, struct kfd_ioctl_pc_sample_args __user *user_args) { @@ -99,6 +177,7 @@ static int kfd_pc_sample_start(struct kfd_process_device *pdd, struct pc_sampling_entry *pcs_entry) { bool pc_sampling_start = false; + int ret = 0; pcs_entry->enabled = true; mutex_lock(>dev->pcs_data.mutex); @@ -112,13 +191,16 @@ static int kfd_pc_sample_start(struct kfd_process_device *pdd, mutex_unlock(>dev->pcs_data.mutex); while (pc_sampling_start) { - if (READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable)) + if (READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable)) { usleep_range(1000, 2000); - else + } else { + if (!pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_thread) + ret = kfd_pc_sample_thread_start(pdd->dev); break; + }
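The pacing loop in the sampling thread boils down to a three-way decision on the time left until the next trap. A standalone sketch of that decision is below; the enum, the function name and the threshold parameter are hypothetical, and the threshold is deliberately left as an argument rather than copied from the patch.

```c
#include <stdint.h>

/* What the pacing loop should do next. */
enum pace_action {
	PACE_SLEEP, /* plenty of time left: sleep for most of it */
	PACE_YIELD, /* a little time left: just give up the CPU */
	PACE_FIRE,  /* deadline reached or passed: trigger the next trap */
};

/* Decide based on the time remaining until the deadline.
 * sleep_threshold_ns is an assumed tunable, not a value from the patch. */
static enum pace_action pace_decide(int64_t deadline_ns, int64_t now_ns,
				    int64_t sleep_threshold_ns)
{
	int64_t wait_ns = deadline_ns - now_ns;

	if (wait_ns >= sleep_threshold_ns)
		return PACE_SLEEP;
	if (wait_ns > 0)
		return PACE_YIELD;
	return PACE_FIRE;
}
```

Computing the deadline once per iteration and re-checking the remaining time after every sleep keeps the sampling period from drifting by the time spent in the trap trigger itself.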
[PATCH v4 22/24] drm/amdkfd: add pc sampling release when process release
Add pc sampling release when process release, it will force to stop all activate sessions with this process. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 25 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h | 1 + drivers/gpu/drm/amd/amdkfd/kfd_process.c | 3 +++ 3 files changed, 29 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index ea9478c3738a..783844ddd82f 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -337,6 +337,31 @@ static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_ return 0; } +void kfd_pc_sample_release(struct kfd_process_device *pdd) +{ + struct pc_sampling_entry *pcs_entry; + struct idr *idp; + uint32_t id; + + /* force to release all PC sampling task for this process */ + idp = >dev->pcs_data.hosttrap_entry.base.pc_sampling_idr; + do { + pcs_entry = NULL; + mutex_lock(>dev->pcs_data.mutex); + idr_for_each_entry(idp, pcs_entry, id) { + if (pcs_entry->pdd != pdd) + continue; + break; + } + mutex_unlock(>dev->pcs_data.mutex); + if (pcs_entry) { + if (pcs_entry->enabled) + kfd_pc_sample_stop(pdd, pcs_entry); + kfd_pc_sample_destroy(pdd, id, pcs_entry); + } + } while (pcs_entry); +} + int kfd_pc_sample(struct kfd_process_device *pdd, struct kfd_ioctl_pc_sample_args __user *args) { diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h index 4eeded4ea5b6..6175563ca9be 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h @@ -30,5 +30,6 @@ int kfd_pc_sample(struct kfd_process_device *pdd, struct kfd_ioctl_pc_sample_args __user *args); +void kfd_pc_sample_release(struct kfd_process_device *pdd); #endif /* KFD_PC_SAMPLING_H_ */ diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index 4a450abf9fa9..bbad0b0848df 100644 --- 
a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -43,6 +43,7 @@ struct mm_struct; #include "kfd_svm.h" #include "kfd_smi_events.h" #include "kfd_debug.h" +#include "kfd_pc_sampling.h" /* * List of struct kfd_process (field kfd_process). @@ -1021,6 +1022,8 @@ static void kfd_process_destroy_pdds(struct kfd_process *p) pr_debug("Releasing pdd (topology id %d) for process (pasid 0x%x)\n", pdd->dev->id, p->pasid); + kfd_pc_sample_release(pdd); + kfd_process_device_destroy_cwsr_dgpu(pdd); kfd_process_device_destroy_ib_mem(pdd); -- 2.25.1
[PATCH v4 23/24] drm/amdkfd: Set debug trap bit when enabling PC Sampling
From: David Yat Sin We need the SPI_GDBG_PER_VMID_CNTL.TRAP_EN bit to be set during PC Sampling so that the TTMP registers are valid inside the sampling data. runtime_info.ttmp_setup will be cleared when the user application does the AMDKFD_IOC_RUNTIME_ENABLE ioctl without KFD_RUNTIME_ENABLE_MODE_ENABLE_MASK flag on exit. It is also not valid to have the debugger attached to a process while PC sampling is enabled so adding some checks to prevent this. Signed-off-by: David Yat Sin Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 30 ++-- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 26 + drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 3 ++ drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 13 + drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 3 ++ 5 files changed, 54 insertions(+), 21 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index d9cac97c54c0..bc37f3ee2c66 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -2804,26 +2804,9 @@ static int runtime_enable(struct kfd_process *p, uint64_t r_debug, p->runtime_info.runtime_state = DEBUG_RUNTIME_STATE_ENABLED; p->runtime_info.r_debug = r_debug; - p->runtime_info.ttmp_setup = enable_ttmp_setup; - if (p->runtime_info.ttmp_setup) { - for (i = 0; i < p->n_pdds; i++) { - struct kfd_process_device *pdd = p->pdds[i]; - - if (!kfd_dbg_is_rlc_restore_supported(pdd->dev)) { - amdgpu_gfx_off_ctrl(pdd->dev->adev, false); - pdd->dev->kfd2kgd->enable_debug_trap( - pdd->dev->adev, - true, - pdd->dev->vm_info.last_vmid_kfd); - } else if (kfd_dbg_is_per_vmid_supported(pdd->dev)) { - pdd->spi_dbg_override = pdd->dev->kfd2kgd->enable_debug_trap( - pdd->dev->adev, - false, - 0); - } - } - } + if (enable_ttmp_setup) + kfd_dbg_enable_ttmp_setup(p); retry: if (p->debug_trap_enabled) { @@ -2972,10 +2955,10 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process *p, v goto out; } - /* Check if target is still 
PTRACED. */ rcu_read_lock(); + /* Check if target is still PTRACED. */ if (target != p && args->op != KFD_IOC_DBG_TRAP_DISABLE - && ptrace_parent(target->lead_thread) != current) { + && ptrace_parent(target->lead_thread) != current) { pr_err("PID %i is not PTRACED and cannot be debugged\n", args->pid); r = -EPERM; } @@ -2985,6 +2968,11 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process *p, v goto out; mutex_lock(>mutex); + if (!!target->pc_sampling_ref) { + pr_debug("Cannot enable debug trap on PID:%d because PC Sampling active\n", args->pid); + r = -EBUSY; + goto unlock_out; + } if (args->op != KFD_IOC_DBG_TRAP_ENABLE && !target->debug_trap_enabled) { pr_err("PID %i not debug enabled for op %i\n", args->pid, args->op); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c index d889e3545120..8d836c65c636 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c @@ -1120,3 +1120,29 @@ void kfd_dbg_set_enabled_debug_exception_mask(struct kfd_process *target, mutex_unlock(>event_mutex); } + +void kfd_dbg_enable_ttmp_setup(struct kfd_process *p) +{ + int i; + + if (p->runtime_info.ttmp_setup) + return; + + p->runtime_info.ttmp_setup = true; + for (i = 0; i < p->n_pdds; i++) { + struct kfd_process_device *pdd = p->pdds[i]; + + if (!kfd_dbg_is_rlc_restore_supported(pdd->dev)) { + amdgpu_gfx_off_ctrl(pdd->dev->adev, false); + pdd->dev->kfd2kgd->enable_debug_trap( + pdd->dev->adev, + true, + pdd->dev->vm_info.last_vmid_kfd); + } else if (kfd_dbg_is_per_vmid_supported(pdd->dev)) { + pdd->spi_dbg_override = pdd->dev->kfd2kgd->enable_debug_trap( + pdd->dev->adev, + false, + 0); + } +
[PATCH v4 24/24] drm/amdkfd: bump kfd ioctl minor version for pc sampling availability
Bump the minor version to declare that the PC sampling feature is now
available.

Signed-off-by: James Zhu
---
 include/uapi/linux/kfd_ioctl.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index ec1b6404b185..7c2c867b57e8 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -41,9 +41,10 @@
  * - 1.13 - Add debugger API
  * - 1.14 - Update kfd_event_data
  * - 1.15 - Enable managing mappings in compute VMs with GEM_VA ioctl
+ * - 1.16 - Add PC Sampling ioctl
  */
 #define KFD_IOCTL_MAJOR_VERSION 1
-#define KFD_IOCTL_MINOR_VERSION 15
+#define KFD_IOCTL_MINOR_VERSION 16
 
 struct kfd_ioctl_get_version_args {
 	__u32 major_version;	/* from KFD */
-- 
2.25.1
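A userspace client could gate its use of PC sampling on the reported version along the following lines. This is a sketch under assumptions: the helper name is hypothetical, the major/minor values are expected to come from the existing get-version ioctl, and treating only major version 1 as compatible is a conservative choice made here, not something the patch states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Version at which this series declares PC sampling available. */
#define PCS_MIN_MAJOR 1u
#define PCS_MIN_MINOR 16u

/* Hypothetical helper: true when the reported KFD ioctl version is
 * 1.16 or a later 1.x. Other major versions are rejected because
 * their compatibility is unknown to this sketch. */
static bool kfd_has_pc_sampling(uint32_t major, uint32_t minor)
{
	return major == PCS_MIN_MAJOR && minor >= PCS_MIN_MINOR;
}
```

A client would fill major and minor from struct kfd_ioctl_get_version_args and fall back to a non-sampling path when this returns false.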
[PATCH v4 15/24] drm/amdkfd: trigger pc sampling trap for aldebaran
Implement trigger pc sampling trap for aldebaran. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c index aff08321e976..27eda75ceecb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c @@ -163,6 +163,16 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch( return watch_address_cntl; } +static uint32_t kgd_aldebaran_trigger_pc_sample_trap(struct amdgpu_device *adev, + uint32_t vmid, + uint32_t *target_simd, + uint32_t *target_wave_slot, + enum kfd_ioctl_pc_sample_method method) +{ + return kgd_gfx_v9_trigger_pc_sample_trap(adev, vmid, 8, 4, + target_simd, target_wave_slot, method); +} + const struct kfd2kgd_calls aldebaran_kfd2kgd = { .program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings, .set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping, @@ -191,4 +201,5 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = { .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times, .build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info, .program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings, + .trigger_pc_sample_trap = kgd_aldebaran_trigger_pc_sample_trap, }; -- 2.25.1
[PATCH v4 18/24] drm/amdkfd: enable pc sampling stop
Enable pc sampling stop. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 29 ++-- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 4 +++ 2 files changed, 30 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index b46caa52fbe8..53e44e68408e 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -99,10 +99,33 @@ static int kfd_pc_sample_start(struct kfd_process_device *pdd) return -EINVAL; } -static int kfd_pc_sample_stop(struct kfd_process_device *pdd) +static int kfd_pc_sample_stop(struct kfd_process_device *pdd, + struct pc_sampling_entry *pcs_entry) { - return -EINVAL; + bool pc_sampling_stop = false; + + pcs_entry->enabled = false; + mutex_lock(&pdd->dev->pcs_data.mutex); + pdd->dev->pcs_data.hosttrap_entry.base.active_count--; + if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count) { + WRITE_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable, true); + pc_sampling_stop = true; + } + mutex_unlock(&pdd->dev->pcs_data.mutex); + + kfd_process_set_trap_pc_sampling_flag(&pdd->qpd, + pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false); + if (pc_sampling_stop) { + + mutex_lock(&pdd->dev->pcs_data.mutex); + pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0; + pdd->dev->pcs_data.hosttrap_entry.base.target_wave_slot = 0; + WRITE_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable, false); + mutex_unlock(&pdd->dev->pcs_data.mutex); + } + + return 0; } static int kfd_pc_sample_create(struct kfd_process_device *pdd, @@ -250,7 +273,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd, if (!pcs_entry->enabled) return -EALREADY; else - return kfd_pc_sample_stop(pdd); + return kfd_pc_sample_stop(pdd, pcs_entry); } return -EINVAL; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index 5a7805147da0..7bdcbe6be4fe 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -271,6 +271,10 @@ struct kfd_dev; struct kfd_dev_pc_sampling_data { uint32_t use_count; /* Num of PC sampling sessions */ + uint32_t active_count; /* Num of active sessions */ + uint32_t target_simd; /* target simd for trap */ + uint32_t target_wave_slot; /* target wave slot for trap */ + bool stop_enable; /* pc sampling stop in process */ struct idr pc_sampling_idr; struct kfd_pc_sample_info pc_sample_info; }; -- 2.25.1
[PATCH v4 19/24] drm/amdkfd: add queue remapping
Add queue remapping to ensure that any waves executing the PC sampling part of the trap handler are done before kfd_pc_sample_stop returns, and that no new waves enter that part of the trap handler afterwards. This avoids race conditions that could lead to use-after-free. Unmapping and remapping the queues either waits for the waves to drain, or preempts them with CWSR, which itself executes a trap and waits for previous traps to finish. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 +++ drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 5 + drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 4 +++- 3 files changed, 19 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index c0e71543389a..a3f57be63f4f 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -3155,6 +3155,17 @@ int debug_refresh_runlist(struct device_queue_manager *dqm) return debug_map_and_unlock(dqm); } +void remap_queue(struct device_queue_manager *dqm, + enum kfd_unmap_queues_filter filter, + uint32_t filter_param, + uint32_t grace_period) +{ + dqm_lock(dqm); + if (!dqm->dev->kfd->shared_resources.enable_mes) + execute_queues_cpsch(dqm, filter, filter_param, grace_period); + dqm_unlock(dqm); +} + #if defined(CONFIG_DEBUG_FS) static void seq_reg_dump(struct seq_file *m, diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h index cf7e182588f8..f8aae3747a36 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h @@ -303,6 +303,11 @@ int debug_lock_and_unmap(struct device_queue_manager *dqm); int debug_map_and_unlock(struct device_queue_manager *dqm); int debug_refresh_runlist(struct device_queue_manager *dqm); +void remap_queue(struct device_queue_manager *dqm, + enum kfd_unmap_queues_filter filter, + uint32_t filter_param, + uint32_t grace_period); + static inline unsigned int get_sh_mem_bases_32(struct kfd_process_device *pdd) { return (pdd->lds_base >> 16) & 0xFF; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index 53e44e68408e..df2f4bfd0cda 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -24,6 +24,7 @@ #include "kfd_priv.h" #include "amdgpu_amdkfd.h" #include "kfd_pc_sampling.h" +#include "kfd_device_queue_manager.h" struct supported_pc_sample_info { uint32_t ip_version; @@ -115,9 +116,10 @@ static int kfd_pc_sample_stop(struct kfd_process_device *pdd, kfd_process_set_trap_pc_sampling_flag(&pdd->qpd, pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false); + remap_queue(pdd->dev->dqm, + KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0, USE_DEFAULT_GRACE_PERIOD); if (pc_sampling_stop) { - mutex_lock(&pdd->dev->pcs_data.mutex); pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0; pdd->dev->pcs_data.hosttrap_entry.base.target_wave_slot = 0; -- 2.25.1
[PATCH v4 09/24] drm/amdkfd: add interface to trigger pc sampling trap
Add interface to trigger pc sampling trap. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h index 6d094cf3587d..12f9021d563e 100644 --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h @@ -31,6 +31,8 @@ #include #include #include +#include + #include "amdgpu_irq.h" #include "amdgpu_gfx.h" @@ -318,6 +320,11 @@ struct kfd2kgd_calls { void (*program_trap_handler_settings)(struct amdgpu_device *adev, uint32_t vmid, uint64_t tba_addr, uint64_t tma_addr, uint32_t inst); + uint32_t (*trigger_pc_sample_trap)(struct amdgpu_device *adev, + uint32_t vmid, + uint32_t *target_simd, + uint32_t *target_wave_slot, + enum kfd_ioctl_pc_sample_method method); }; #endif /* KGD_KFD_INTERFACE_H_INCLUDED */ -- 2.25.1
[PATCH v4 17/24] drm/amdkfd: add setting trap pc sampling flag
Add setting trap pc sampling flag. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 13 + 2 files changed, 15 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index 2df240518d1f..5a7805147da0 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -1198,6 +1198,8 @@ void kfd_process_set_trap_handler(struct qcm_process_device *qpd, uint64_t tma_addr); void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd, bool enabled); +void kfd_process_set_trap_pc_sampling_flag(struct qcm_process_device *qpd, +enum kfd_ioctl_pc_sample_method method, bool enabled); /* CWSR initialization */ int kfd_process_init_cwsr_apu(struct kfd_process *process, struct file *filep); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index 3e3cead6ccf8..4a450abf9fa9 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -1463,6 +1463,19 @@ void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd, } } +void kfd_process_set_trap_pc_sampling_flag(struct qcm_process_device *qpd, +enum kfd_ioctl_pc_sample_method method, bool enabled) +{ + if (qpd->cwsr_kaddr) { + volatile unsigned long *tma = + (volatile unsigned long *)(qpd->cwsr_kaddr + KFD_CWSR_TMA_OFFSET); + if (enabled) + set_bit(method, &tma[2]); + else + clear_bit(method, &tma[2]); + } +} + /* * On return the kfd_process is fully operational and will be freed when the * mm is released -- 2.25.1
[PATCH v4 05/24] drm/amdkfd: enable pc sampling create
From: David Yat Sin Enable pc sampling create. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 59 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 10 2 files changed, 68 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index e9277c9beec7..9267de0bbdac 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -108,7 +108,64 @@ static int kfd_pc_sample_stop(struct kfd_process_device *pdd) static int kfd_pc_sample_create(struct kfd_process_device *pdd, struct kfd_ioctl_pc_sample_args __user *user_args) { - return -EINVAL; + struct kfd_pc_sample_info *supported_format = NULL; + struct kfd_pc_sample_info user_info; + int ret; + int i; + + if (user_args->num_sample_info != 1) + return -EINVAL; + + ret = copy_from_user(&user_info, (void __user *) user_args->sample_info_ptr, + sizeof(struct kfd_pc_sample_info)); + if (ret) { + pr_debug("Failed to copy PC sampling info from user\n"); + return -EFAULT; + } + + if (user_info.flags & KFD_IOCTL_PCS_FLAG_POWER_OF_2 && + user_info.interval & (user_info.interval - 1)) { + pr_debug("Sampling interval's power is unmatched!"); + return -EINVAL; + } + + for (i = 0; i < ARRAY_SIZE(supported_formats); i++) { + if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version + && user_info.method == supported_formats[i].sample_info->method + && user_info.type == supported_formats[i].sample_info->type + && user_info.interval <= supported_formats[i].sample_info->interval_max + && user_info.interval >= supported_formats[i].sample_info->interval_min) { + supported_format = + (struct kfd_pc_sample_info *)supported_formats[i].sample_info; + break; + } + } + + if (!supported_format) { + pr_debug("Sampling format is not supported!"); + return -EOPNOTSUPP; + } + + mutex_lock(&pdd->dev->pcs_data.mutex); + if (pdd->dev->pcs_data.hosttrap_entry.base.use_count && + memcmp(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info, + &user_info, sizeof(user_info))) { + ret = copy_to_user((void __user *) user_args->sample_info_ptr, + &pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info, + sizeof(struct kfd_pc_sample_info)); + mutex_unlock(&pdd->dev->pcs_data.mutex); + return ret ? -EFAULT : -EEXIST; + } + + /* TODO: add trace_id return */ + + if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count) + pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info = user_info; + + pdd->dev->pcs_data.hosttrap_entry.base.use_count++; + mutex_unlock(&pdd->dev->pcs_data.mutex); + + return 0; } static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_id) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index f55195fea3df..96999f602224 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -269,9 +269,19 @@ struct kfd_vmid_info { struct kfd_dev; +struct kfd_dev_pc_sampling_data { + uint32_t use_count; /* Num of PC sampling sessions */ + struct kfd_pc_sample_info pc_sample_info; +}; + +struct kfd_dev_pcs_hosttrap { + struct kfd_dev_pc_sampling_data base; +}; + /* Per device PC Sampling data */ struct kfd_dev_pc_sampling { struct mutex mutex; + struct kfd_dev_pcs_hosttrap hosttrap_entry; }; struct kfd_node { -- 2.25.1
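The interval validation in kfd_pc_sample_create can be tried out in user space. The following is only a minimal sketch under stated assumptions: PCS_FLAG_POWER_OF_2 and pcs_interval_ok() are hypothetical names standing in for KFD_IOCTL_PCS_FLAG_POWER_OF_2 and the inline checks in the patch, and the range check is folded into the same helper for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for KFD_IOCTL_PCS_FLAG_POWER_OF_2. */
#define PCS_FLAG_POWER_OF_2 0x1ULL

/* True when v has at most one bit set. Note that 0 also passes,
 * exactly as with the `interval & (interval - 1)` test in the patch. */
static bool at_most_one_bit(uint64_t v)
{
	return (v & (v - 1)) == 0;
}

/*
 * Hypothetical helper: accept `interval` only if it honours the
 * power-of-two restriction (when flagged) and lies in [min, max].
 */
static bool pcs_interval_ok(uint64_t flags, uint64_t interval,
			    uint64_t min, uint64_t max)
{
	if ((flags & PCS_FLAG_POWER_OF_2) && !at_most_one_bit(interval))
		return false;
	return interval >= min && interval <= max;
}
```

An interval of 0 slips through the bit test, so in this sketch it is the range check that rejects it whenever min is at least 1.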
[PATCH v4 11/24] drm/amdkfd/gfx9: enable host trap
Enable host trap. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 63 +++ .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 24 --- 2 files changed, 52 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h index d1caaf0e6a7c..af1f678790e7 100644 --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h @@ -274,14 +274,14 @@ static const uint32_t cwsr_trap_gfx8_hex[] = { static const uint32_t cwsr_trap_gfx9_hex[] = { - 0xbf820001, 0xbf820258, + 0xbf820001, 0xbf82025e, 0xb8f8f802, 0x8978ff78, 0x00020006, 0xb8fbf803, 0x866eff78, 0x2000, 0xbf840009, 0x866eff6d, 0x00ff, 0xbf85001e, 0x866eff7b, 0x0400, - 0xbf850055, 0xbf8e0010, + 0xbf85005b, 0xbf8e0010, 0xb8fbf803, 0xbf82fffa, 0x866eff7b, 0x03c00900, 0xbf850015, 0x866eff7b, @@ -294,7 +294,7 @@ static const uint32_t cwsr_trap_gfx9_hex[] = { 0xbf850007, 0xb8eef801, 0x866eff6e, 0x0800, 0xbf850003, 0x866eff7b, - 0x0400, 0xbf85003a, + 0x0400, 0xbf850040, 0xb8faf807, 0x867aff7a, 0x001f8000, 0x8e7a8b7a, 0x8977ff77, 0xfc00, @@ -303,13 +303,16 @@ static const uint32_t cwsr_trap_gfx9_hex[] = { 0xb8fbf813, 0x8efa887a, 0xbf0d8f7b, 0xbf840002, 0x877bff7b, 0x, - 0xc0031bbd, 0x0010, - 0xbf8cc07f, 0x8e6e976e, - 0x8977ff77, 0x0080, - 0x87776e77, 0xc0071bbd, - 0x, 0xbf8cc07f, + 0xc0031c3d, 0x0010, + 0xc0071bbd, 0x, 0xc0071ebd, 0x0008, - 0xbf8cc07f, 0x86ee6e6e, + 0xbf8cc07f, 0x8671ff6d, + 0x0100, 0xbf840004, + 0x92f1ff70, 0x00010001, + 0xbf840016, 0xbf820005, + 0x86708170, 0x8e709770, + 0x8977ff77, 0x0080, + 0x8077, 0x86ee6e6e, 0xbf840001, 0xbe801d6e, 0x866eff6d, 0x01ff, 0xbf850005, 0x8778ff78, @@ -1098,14 +1101,14 @@ static const uint32_t cwsr_trap_nv1x_hex[] = { }; static const uint32_t cwsr_trap_arcturus_hex[] = { - 0xbf820001, 0xbf8202d4, + 0xbf820001, 0xbf8202da, 0xb8f8f802, 0x8978ff78, 0x00020006, 0xb8fbf803, 0x866eff78, 0x2000, 0xbf840009, 0x866eff6d, 0x00ff, 0xbf85001e, 
0x866eff7b, 0x0400, - 0xbf850055, 0xbf8e0010, + 0xbf85005b, 0xbf8e0010, 0xb8fbf803, 0xbf82fffa, 0x866eff7b, 0x03c00900, 0xbf850015, 0x866eff7b, @@ -1118,7 +1121,7 @@ static const uint32_t cwsr_trap_arcturus_hex[] = { 0xbf850007, 0xb8eef801, 0x866eff6e, 0x0800, 0xbf850003, 0x866eff7b, - 0x0400, 0xbf85003a, + 0x0400, 0xbf850040, 0xb8faf807, 0x867aff7a, 0x001f8000, 0x8e7a8b7a, 0x8977ff77, 0xfc00, @@ -1127,13 +1130,16 @@ static const uint32_t cwsr_trap_arcturus_hex[] = { 0xb8fbf813, 0x8efa887a, 0xbf0d8f7b, 0xbf840002, 0x877bff7b, 0x, - 0xc0031bbd, 0x0010, - 0xbf8cc07f, 0x8e6e976e, - 0x8977ff77, 0x0080, - 0x87776e77, 0xc0071bbd, - 0x, 0xbf8cc07f, + 0xc0031c3d, 0x0010, + 0xc0071bbd, 0x, 0xc0071ebd, 0x0008, - 0xbf8cc07f, 0x86ee6e6e, + 0xbf8cc07f, 0x8671ff6d, + 0x0100, 0xbf840004, + 0x92f1ff70, 0x00010001, + 0xbf840016, 0xbf820005, + 0x86708170, 0x8e709770, + 0x8977ff77, 0x0080, + 0x8077, 0x86ee6e6e, 0xbf840001, 0xbe801d6e, 0x866eff6d, 0x01ff, 0xbf850005, 0x8778ff78, @@ -1578,14 +1584,14 @@ static const uint32_t cwsr_trap_arcturus_hex[] = { }; static const uint32_t cwsr_trap_aldebaran_hex[] = { - 0xbf820001, 0xbf8202df, + 0xbf820001, 0xbf8202e5, 0xb8f8f802, 0x8978ff78, 0x00020006, 0xb8fbf803, 0x866eff78, 0x2000, 0xbf840009, 0x866eff6d, 0x00ff, 0xbf85001e, 0x866eff7b, 0x0400, - 0xbf850055, 0xbf8e0010, + 0xbf85005b, 0xbf8e0010, 0xb8fbf803, 0xbf82fffa, 0x866eff7b, 0x03c00900, 0xbf850015, 0x866eff7b, @@ -1598,7 +1604,7 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = { 0xbf850007, 0xb8eef801, 0x866eff6e, 0x0800, 0xbf850003, 0x866eff7b, - 0x0400, 0xbf85003a, + 0x0400, 0xbf850040, 0xb8faf807, 0x867aff7a, 0x001f8000, 0x8e7a8b7a, 0x8977ff77, 0xfc00, @@ -1607,13 +1613,16 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = { 0xb8fbf813, 0x8efa887a, 0xbf0d8f7b, 0xbf840002, 0x877bff7b, 0x, - 0xc0031bbd, 0x0010, - 0xbf8cc07f, 0x8e6e976e, - 0x8977ff77, 0x0080, - 0x87776e77, 0xc0071bbd, - 0x, 0xbf8cc07f, +
[PATCH v4 06/24] drm/amdkfd: add trace_id return
Return a trace_id for each new pc sampling session created on a device. Use an IDR to quickly locate the pc_sampling_entry for reference. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 6 ++ 3 files changed, 27 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index 0e24e011f66b..bcaeedac8fe0 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -536,10 +536,12 @@ static void kfd_smi_init(struct kfd_node *dev) static void kfd_pc_sampling_init(struct kfd_node *dev) { mutex_init(&dev->pcs_data.mutex); + idr_init_base(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, 1); } static void kfd_pc_sampling_exit(struct kfd_node *dev) { + idr_destroy(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr); mutex_destroy(&dev->pcs_data.mutex); } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index 9267de0bbdac..a607fc148958 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -110,6 +110,7 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd, { struct kfd_pc_sample_info *supported_format = NULL; struct kfd_pc_sample_info user_info; + struct pc_sampling_entry *pcs_entry; int ret; int i; @@ -157,7 +158,19 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd, return ret ? -EFAULT : -EEXIST; } - /* TODO: add trace_id return */ + pcs_entry = kzalloc(sizeof(*pcs_entry), GFP_KERNEL); + if (!pcs_entry) { + mutex_unlock(&pdd->dev->pcs_data.mutex); + return -ENOMEM; + } + + i = idr_alloc_cyclic(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, + pcs_entry, 1, 0, GFP_KERNEL); + if (i < 0) { + mutex_unlock(&pdd->dev->pcs_data.mutex); + kfree(pcs_entry); + return i; + } if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count) pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info = user_info; @@ -165,6 +178,11 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd, pdd->dev->pcs_data.hosttrap_entry.base.use_count++; mutex_unlock(&pdd->dev->pcs_data.mutex); + pcs_entry->pdd = pdd; + user_args->trace_id = (uint32_t)i; + + pr_debug("alloc pcs_entry = %p, trace_id = 0x%x on gpu 0x%x", pcs_entry, i, pdd->dev->id); + return 0; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index 96999f602224..2df240518d1f 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -271,6 +271,7 @@ struct kfd_dev; struct kfd_dev_pc_sampling_data { uint32_t use_count; /* Num of PC sampling sessions */ + struct idr pc_sampling_idr; struct kfd_pc_sample_info pc_sample_info; }; @@ -756,6 +757,11 @@ enum kfd_pdd_bound { */ #define SDMA_ACTIVITY_DIVISOR 100 +struct pc_sampling_entry { + bool enabled; + struct kfd_process_device *pdd; +}; + /* Data that is per-process-per device. */ struct kfd_process_device { /* The device that owns this data. */ -- 2.25.1
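For readers unfamiliar with cyclic ID allocation, the trace_id handling can be illustrated in user space. This is a rough sketch, not the kernel IDR: struct id_table, SKETCH_MAX_IDS and the id_* helpers are hypothetical, the table is a fixed-size array, and no locking is shown (the patch holds pcs_data.mutex around the real calls). It assumes the behaviour the patch relies on, namely that IDs start at 1 and a freed ID is not handed out again until the search wraps around.

```c
#include <stddef.h>

#define SKETCH_MAX_IDS 8	/* hypothetical capacity; the kernel IDR has no such fixed limit */

struct id_table {
	void *slot[SKETCH_MAX_IDS + 1];	/* index 0 unused, so 0 is never a valid id */
	int next;			/* id at which the next search starts */
};

static void id_table_init(struct id_table *t)
{
	for (int i = 0; i <= SKETCH_MAX_IDS; i++)
		t->slot[i] = NULL;
	t->next = 1;
}

/* Returns a fresh id in [1, SKETCH_MAX_IDS], or -1 when the table is full. */
static int id_alloc_cyclic(struct id_table *t, void *entry)
{
	for (int n = 0; n < SKETCH_MAX_IDS; n++) {
		int id = 1 + (t->next - 1 + n) % SKETCH_MAX_IDS;

		if (!t->slot[id]) {
			t->slot[id] = entry;
			t->next = id % SKETCH_MAX_IDS + 1;
			return id;
		}
	}
	return -1;
}

/* Look up the entry registered under id, or NULL if there is none. */
static void *id_find(const struct id_table *t, int id)
{
	if (id < 1 || id > SKETCH_MAX_IDS)
		return NULL;
	return t->slot[id];
}

/* Release id so that a later wrap-around may reuse it. */
static void id_remove(struct id_table *t, int id)
{
	if (id >= 1 && id <= SKETCH_MAX_IDS)
		t->slot[id] = NULL;
}
```

Handing out IDs cyclically makes it less likely that a stale trace_id kept by user space silently matches a newer session.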
[PATCH v4 16/24] drm/amdkfd: use bit operation set debug trap
The 1st-level TMA's 2nd byte is used for the trap type setting. Use bit operations so that only the selected bit is changed. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index 717a60d7a4ea..3e3cead6ccf8 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -1443,13 +1443,23 @@ bool kfd_process_xnack_mode(struct kfd_process *p, bool supported) return true; } +/* bit offset in 1st-level TMA's 2nd byte which used for KFD_TRAP_TYPE_BIT */ +enum KFD_TRAP_TYPE_BIT { + KFD_TRAP_TYPE_DEBUG = 0,/* bit 0 for debug trap */ + KFD_TRAP_TYPE_HOST, + KFD_TRAP_TYPE_STOCHASTIC, +}; + void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd, bool enabled) { if (qpd->cwsr_kaddr) { - uint64_t *tma = - (uint64_t *)(qpd->cwsr_kaddr + KFD_CWSR_TMA_OFFSET); - tma[2] = enabled; + volatile unsigned long *tma = + (volatile unsigned long *)(qpd->cwsr_kaddr + KFD_CWSR_TMA_OFFSET); + if (enabled) + set_bit(KFD_TRAP_TYPE_DEBUG, &tma[2]); + else + clear_bit(KFD_TRAP_TYPE_DEBUG, &tma[2]); } } -- 2.25.1
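The effect of switching from a whole-word store to per-bit updates can be shown with a small user-space sketch. It uses C11 atomics as a stand-in for the kernel's set_bit()/clear_bit(); trap_flag_update() and enum trap_type_bit are hypothetical names that only mirror the enum in the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Bit positions mirroring enum KFD_TRAP_TYPE_BIT from the patch. */
enum trap_type_bit {
	TRAP_TYPE_DEBUG = 0,
	TRAP_TYPE_HOST,
	TRAP_TYPE_STOCHASTIC,
};

/* Atomically set or clear one flag bit, leaving all other bits untouched. */
static void trap_flag_update(atomic_ulong *word, enum trap_type_bit bit,
			     bool enabled)
{
	unsigned long mask = 1UL << bit;

	if (enabled)
		atomic_fetch_or(word, mask);
	else
		atomic_fetch_and(word, ~mask);
}
```

With the old `tma[2] = enabled;` store, turning the debug flag off would also have wiped a host trap bit set by another user of the same word; with per-bit updates the two flags are independent.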
[PATCH v4 20/24] drm/amdkfd: enable pc sampling start
Enable pc sampling start. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 27 +--- 1 file changed, 24 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index df2f4bfd0cda..6f50ba1f8989 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -95,9 +95,30 @@ static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd, return 0; } -static int kfd_pc_sample_start(struct kfd_process_device *pdd) +static int kfd_pc_sample_start(struct kfd_process_device *pdd, + struct pc_sampling_entry *pcs_entry) { - return -EINVAL; + bool pc_sampling_start = false; + + pcs_entry->enabled = true; + mutex_lock(&pdd->dev->pcs_data.mutex); + + kfd_process_set_trap_pc_sampling_flag(&pdd->qpd, + pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true); + + if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count) + pc_sampling_start = true; + pdd->dev->pcs_data.hosttrap_entry.base.active_count++; + mutex_unlock(&pdd->dev->pcs_data.mutex); + + while (pc_sampling_start) { + if (READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable)) + usleep_range(1000, 2000); + else + break; + } + + return 0; } static int kfd_pc_sample_stop(struct kfd_process_device *pdd, @@ -269,7 +290,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd, if (pcs_entry->enabled) return -EALREADY; else - return kfd_pc_sample_start(pdd); + return kfd_pc_sample_start(pdd, pcs_entry); case KFD_IOCTL_PCS_OP_STOP: if (!pcs_entry->enabled) -- 2.25.1
[PATCH v4 14/24] drm/amdkfd: trigger pc sampling trap for arcturus
Implement trigger pc sampling trap for arcturus. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c| 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c index 0ba15dcbe4e1..10b362e072a6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c @@ -390,6 +390,17 @@ static uint32_t kgd_arcturus_disable_debug_trap(struct amdgpu_device *adev, return 0; } + +static uint32_t kgd_arcturus_trigger_pc_sample_trap(struct amdgpu_device *adev, + uint32_t vmid, + uint32_t *target_simd, + uint32_t *target_wave_slot, + enum kfd_ioctl_pc_sample_method method) +{ + return kgd_gfx_v9_trigger_pc_sample_trap(adev, vmid, 10, 4, + target_simd, target_wave_slot, method); +} + const struct kfd2kgd_calls arcturus_kfd2kgd = { .program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings, .set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping, @@ -418,5 +429,6 @@ const struct kfd2kgd_calls arcturus_kfd2kgd = { .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times, .build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info, .get_cu_occupancy = kgd_gfx_v9_get_cu_occupancy, - .program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings + .program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings, + .trigger_pc_sample_trap = kgd_arcturus_trigger_pc_sample_trap }; -- 2.25.1
[PATCH v4 01/24] drm/amdkfd/kfd_ioctl: add pc sampling support
From: David Yat Sin Add pc sampling support in kfd_ioctl. The user mode code which uses this new kfd_ioctl is linked to https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface with master branch. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- include/uapi/linux/kfd_ioctl.h | 61 +- 1 file changed, 60 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index 9ce46edc62a5..ec1b6404b185 100644 --- a/include/uapi/linux/kfd_ioctl.h +++ b/include/uapi/linux/kfd_ioctl.h @@ -1447,6 +1447,62 @@ struct kfd_ioctl_dbg_trap_args { }; }; +/** + * kfd_ioctl_pc_sample_op - PC Sampling ioctl operations + * + * @KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES: Query device PC Sampling capabilities + * @KFD_IOCTL_PCS_OP_CREATE: Register this process with a per-device PC sampler instance + * @KFD_IOCTL_PCS_OP_DESTROY:Unregister from a previously registered PC sampler instance + * @KFD_IOCTL_PCS_OP_START: Process begins taking samples from a previously registered PC sampler instance + * @KFD_IOCTL_PCS_OP_STOP: Process stops taking samples from a previously registered PC sampler instance + */ +enum kfd_ioctl_pc_sample_op { + KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES, + KFD_IOCTL_PCS_OP_CREATE, + KFD_IOCTL_PCS_OP_DESTROY, + KFD_IOCTL_PCS_OP_START, + KFD_IOCTL_PCS_OP_STOP, +}; + +/* Values have to be a power of 2*/ +#define KFD_IOCTL_PCS_FLAG_POWER_OF_2 0x0001 + +enum kfd_ioctl_pc_sample_method { + KFD_IOCTL_PCS_METHOD_HOSTTRAP = 1, + KFD_IOCTL_PCS_METHOD_STOCHASTIC, +}; + +enum kfd_ioctl_pc_sample_type { + KFD_IOCTL_PCS_TYPE_TIME_US, + KFD_IOCTL_PCS_TYPE_CLOCK_CYCLES, + KFD_IOCTL_PCS_TYPE_INSTRUCTIONS +}; + +struct kfd_pc_sample_info { + __u64 interval; /* [IN] if PCS_TYPE_INTERVAL_US: sample interval in us + * if PCS_TYPE_CLOCK_CYCLES: sample interval in graphics core clk cycles + * if PCS_TYPE_INSTRUCTIONS: sample interval in instructions issued by + * graphics compute units + */ + __u64 interval_min; /* [OUT] 
*/ + __u64 interval_max; /* [OUT] */ + __u64 flags; /* [OUT] indicate potential restrictions e.g FLAG_POWER_OF_2 */ + __u32 method;/* [IN/OUT] kfd_ioctl_pc_sample_method */ + __u32 type; /* [IN/OUT] kfd_ioctl_pc_sample_type */ +}; + +#define KFD_IOCTL_PCS_QUERY_TYPE_FULL (1 << 0) /* If not set, return current */ + +struct kfd_ioctl_pc_sample_args { + __u64 sample_info_ptr; /* array of kfd_pc_sample_info */ + __u32 num_sample_info; + __u32 op;/* kfd_ioctl_pc_sample_op */ + __u32 gpu_id; + __u32 trace_id; + __u32 flags; /* kfd_ioctl_pcs_query flags */ + __u32 reserved; +}; + #define AMDKFD_IOCTL_BASE 'K' #define AMDKFD_IO(nr) _IO(AMDKFD_IOCTL_BASE, nr) #define AMDKFD_IOR(nr, type) _IOR(AMDKFD_IOCTL_BASE, nr, type) @@ -1567,7 +1623,10 @@ struct kfd_ioctl_dbg_trap_args { #define AMDKFD_IOC_DBG_TRAP\ AMDKFD_IOWR(0x26, struct kfd_ioctl_dbg_trap_args) +#define AMDKFD_IOC_PC_SAMPLE \ + AMDKFD_IOWR(0x27, struct kfd_ioctl_pc_sample_args) + #define AMDKFD_COMMAND_START 0x01 -#define AMDKFD_COMMAND_END 0x27 +#define AMDKFD_COMMAND_END 0x28 #endif -- 2.25.1
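To make the proposed ioctl argument layout concrete, here is a hedged user-space sketch of filling in a CREATE request. The struct and enum names ending in _sketch or starting with PCS_ are local copies written for this example (using <stdint.h> types in place of __u64/__u32), not the real uapi header, and pcs_fill_create() is a hypothetical helper. No ioctl is issued; actually sending the request would need the patched kfd_ioctl.h and an open /dev/kfd.

```c
#include <stdint.h>
#include <string.h>

/* Local copy of the proposed struct kfd_pc_sample_info layout. */
struct pc_sample_info_sketch {
	uint64_t interval;
	uint64_t interval_min;
	uint64_t interval_max;
	uint64_t flags;
	uint32_t method;
	uint32_t type;
};

/* Local copy of the proposed struct kfd_ioctl_pc_sample_args layout. */
struct pc_sample_args_sketch {
	uint64_t sample_info_ptr;
	uint32_t num_sample_info;
	uint32_t op;
	uint32_t gpu_id;
	uint32_t trace_id;
	uint32_t flags;
	uint32_t reserved;
};

/* Values follow the enums in the patch. */
enum { PCS_OP_QUERY_CAPABILITIES, PCS_OP_CREATE, PCS_OP_DESTROY,
       PCS_OP_START, PCS_OP_STOP };
enum { PCS_METHOD_HOSTTRAP = 1, PCS_METHOD_STOCHASTIC };
enum { PCS_TYPE_TIME_US, PCS_TYPE_CLOCK_CYCLES, PCS_TYPE_INSTRUCTIONS };

/* Hypothetical helper: prepare a host-trap CREATE request with one sample_info. */
static void pcs_fill_create(struct pc_sample_args_sketch *args,
			    struct pc_sample_info_sketch *info,
			    uint32_t gpu_id, uint64_t interval_us)
{
	memset(info, 0, sizeof(*info));
	info->interval = interval_us;
	info->method = PCS_METHOD_HOSTTRAP;
	info->type = PCS_TYPE_TIME_US;

	memset(args, 0, sizeof(*args));
	args->sample_info_ptr = (uint64_t)(uintptr_t)info;
	args->num_sample_info = 1;	/* patch 05 rejects any other count on create */
	args->op = PCS_OP_CREATE;
	args->gpu_id = gpu_id;
}
```

Both structures are multiples of 8 bytes with no implicit padding (40 and 32 bytes), which keeps the layout identical for 32-bit and 64-bit user space.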
[PATCH v4 08/24] drm/amdkfd: enable pc sampling destroy
Enable pc sampling destroy. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index 72c66d4bd24f..b46caa52fbe8 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -186,10 +186,24 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd, return 0; } -static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_id) +static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_id, + struct pc_sampling_entry *pcs_entry) { - return -EINVAL; + pr_debug("free pcs_entry = %p, trace_id = 0x%x on gpu 0x%x", + pcs_entry, trace_id, pdd->dev->id); + + mutex_lock(&pdd->dev->pcs_data.mutex); + pdd->dev->pcs_data.hosttrap_entry.base.use_count--; + idr_remove(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, trace_id); + if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count) + memset(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info, 0x0, + sizeof(struct kfd_pc_sample_info)); + mutex_unlock(&pdd->dev->pcs_data.mutex); + + kfree(pcs_entry); + + return 0; } int kfd_pc_sample(struct kfd_process_device *pdd, @@ -224,7 +238,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd, if (pcs_entry->enabled) return -EBUSY; else - return kfd_pc_sample_destroy(pdd, args->trace_id, pcs_entry); + return kfd_pc_sample_destroy(pdd, args->trace_id, pcs_entry); case KFD_IOCTL_PCS_OP_START: if (pcs_entry->enabled) -- 2.25.1
[PATCH v4 10/24] drm/amdkfd: trigger pc sampling trap for gfx v9
Implement trigger pc sampling trap for gfx v9. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 36 +++ .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 7 2 files changed, 43 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c index 5a35a8ca8922..7d8c0e13ac12 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c @@ -1144,6 +1144,42 @@ void kgd_gfx_v9_program_trap_handler_settings(struct amdgpu_device *adev, kgd_gfx_v9_unlock_srbm(adev, inst); } +uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev, + uint32_t vmid, + uint32_t max_wave_slot, + uint32_t max_simd, + uint32_t *target_simd, + uint32_t *target_wave_slot, + enum kfd_ioctl_pc_sample_method method) +{ + if (method == KFD_IOCTL_PCS_METHOD_HOSTTRAP) { + uint32_t value = 0; + + value = REG_SET_FIELD(value, SQ_CMD, CMD, SQ_IND_CMD_CMD_TRAP); + value = REG_SET_FIELD(value, SQ_CMD, MODE, SQ_IND_CMD_MODE_SINGLE); + + /* select *target_simd */ + value = REG_SET_FIELD(value, SQ_CMD, SIMD_ID, *target_simd); + /* select *target_wave_slot */ + value = REG_SET_FIELD(value, SQ_CMD, WAVE_ID, (*target_wave_slot)++); + + mutex_lock(>grbm_idx_mutex); + amdgpu_gfx_select_se_sh(adev, 0x, 0x, 0x, 0); + WREG32_SOC15(GC, 0, mmSQ_CMD, value); + mutex_unlock(>grbm_idx_mutex); + + *target_wave_slot %= max_wave_slot; + if (!(*target_wave_slot)) { + (*target_simd)++; + *target_simd %= max_simd; + } + } else { + pr_debug("PC Sampling method %d not supported.", method); + return -EOPNOTSUPP; + } + return 0; +} + const struct kfd2kgd_calls gfx_v9_kfd2kgd = { .program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings, .set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h index ce424615f59b..b47b926891a8 100644 --- 
a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h @@ -101,3 +101,10 @@ void kgd_gfx_v9_build_grace_period_packet_info(struct amdgpu_device *adev, uint32_t grace_period, uint32_t *reg_offset, uint32_t *reg_data); +uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev, + uint32_t vmid, + uint32_t max_wave_slot, + uint32_t max_simd, + uint32_t *target_simd, + uint32_t *target_wave_slot, + enum kfd_ioctl_pc_sample_method method); -- 2.25.1
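The target selection in kgd_gfx_v9_trigger_pc_sample_trap is a round-robin walk over (SIMD, wave slot) pairs. The sketch below isolates just that bookkeeping in user space, under the assumption that the register write uses the current pair before the counters advance; advance_target() is a hypothetical name, and the limits 10 and 4 used in the note afterwards are the values the arcturus patch in this series passes.

```c
#include <stdint.h>

/*
 * Advance to the next (simd, wave_slot) pair: the wave slot moves first,
 * and the SIMD index only steps when the wave slot wraps back to 0.
 * Both limits must be non-zero.
 */
static void advance_target(uint32_t *target_simd, uint32_t *target_wave_slot,
			   uint32_t max_wave_slot, uint32_t max_simd)
{
	(*target_wave_slot)++;
	*target_wave_slot %= max_wave_slot;
	if (!(*target_wave_slot)) {
		(*target_simd)++;
		*target_simd %= max_simd;
	}
}
```

With max_wave_slot = 10 and max_simd = 4, a full cycle visits 40 pairs before returning to (0, 0), so each call to the trigger function samples a different wave slot in turn.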
[PATCH v4 13/24] drm/amdgpu: add sq host trap status check
Before fire a new host trap, check the host trap status. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 35 +++ .../amd/include/asic_reg/gc/gc_9_0_offset.h | 2 ++ .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h | 5 +++ 3 files changed, 42 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c index adfe5e5585e5..43edd62df5fe 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c @@ -1144,6 +1144,35 @@ void kgd_gfx_v9_program_trap_handler_settings(struct amdgpu_device *adev, kgd_gfx_v9_unlock_srbm(adev, inst); } +static uint32_t kgd_aldebaran_get_hosttrap_status(struct amdgpu_device *adev) +{ + uint32_t sq_hosttrap_status = 0x0; + int i, j; + + mutex_lock(>grbm_idx_mutex); + for (i = 0; i < adev->gfx.config.max_shader_engines; i++) { + for (j = 0; j < adev->gfx.config.max_sh_per_se; j++) { + amdgpu_gfx_select_se_sh(adev, i, j, 0x, 0); + sq_hosttrap_status = RREG32_SOC15(GC, 0, mmSQ_HOSTTRAP_STATUS); + + if (sq_hosttrap_status & SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK) { + WREG32_SOC15(GC, 0, mmSQ_HOSTTRAP_STATUS, + SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK); + sq_hosttrap_status = 0x0; + continue; + } + if (sq_hosttrap_status) + goto out; + } + } + +out: + amdgpu_gfx_select_se_sh(adev, 0x, 0x, 0x, 0); + mutex_unlock(>grbm_idx_mutex); + + return sq_hosttrap_status; +} + uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev, uint32_t vmid, uint32_t max_wave_slot, @@ -1154,6 +1183,12 @@ uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev, { if (method == KFD_IOCTL_PCS_METHOD_HOSTTRAP) { uint32_t value = 0; + uint32_t sq_hosttrap_status = 0x0; + + sq_hosttrap_status = kgd_aldebaran_get_hosttrap_status(adev); + /* skip when last host trap request is still pending to complete */ + if (sq_hosttrap_status) + return 0; value = REG_SET_FIELD(value, SQ_CMD, CMD, 
SQ_IND_CMD_CMD_TRAP); value = REG_SET_FIELD(value, SQ_CMD, MODE, SQ_IND_CMD_MODE_SINGLE); diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h index 12d451e5475b..5b17d9066452 100644 --- a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h +++ b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h @@ -462,6 +462,8 @@ #define mmSQ_IND_DATA_BASE_IDX 0 #define mmSQ_CMD 0x037b #define mmSQ_CMD_BASE_IDX 0 +#define mmSQ_HOSTTRAP_STATUS 0x0376 +#define mmSQ_HOSTTRAP_STATUS_BASE_IDX 0 #define mmSQ_TIME_HI 0x037c #define mmSQ_TIME_HI_BASE_IDX 0 #define mmSQ_TIME_LO 0x037d diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h index efc16ddf274a..3dfe4ab31421 100644 --- a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h +++ b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h @@ -2616,6 +2616,11 @@ //SQ_CMD_TIMESTAMP #define SQ_CMD_TIMESTAMP__TIMESTAMP__SHIFT 0x0 #define SQ_CMD_TIMESTAMP__TIMESTAMP_MASK 0x00FFL +//SQ_HOSTTRAP_STATUS +#define SQ_HOSTTRAP_STATUS__HTPENDINGCOUNT__SHIFT 0x0 +#define SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE__SHIFT 0x8 +#define SQ_HOSTTRAP_STATUS__HTPENDINGCOUNT_MASK 0x00FFL +#define SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK 0x0100L
[PATCH v4 07/24] drm/amdkfd: check pcs_entry valid
Check that pcs_entry is valid for the pc sampling ioctl. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 33 ++-- 1 file changed, 30 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index a607fc148958..72c66d4bd24f 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -195,6 +195,24 @@ static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_ int kfd_pc_sample(struct kfd_process_device *pdd, struct kfd_ioctl_pc_sample_args __user *args) { + struct pc_sampling_entry *pcs_entry; + + if (args->op != KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES && + args->op != KFD_IOCTL_PCS_OP_CREATE) { + + mutex_lock(>dev->pcs_data.mutex); + pcs_entry = idr_find(>dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, + args->trace_id); + mutex_unlock(>dev->pcs_data.mutex); + + /* pcs_entry is only for this pc sampling process, +* which has kfd_process->mutex protected here. +*/ + if (!pcs_entry || + pcs_entry->pdd != pdd) + return -EINVAL; + } + switch (args->op) { case KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES: return kfd_pc_sample_query_cap(pdd, args); @@ -203,13 +221,22 @@ int kfd_pc_sample(struct kfd_process_device *pdd, return kfd_pc_sample_create(pdd, args); case KFD_IOCTL_PCS_OP_DESTROY: - return kfd_pc_sample_destroy(pdd, args->trace_id); + if (pcs_entry->enabled) + return -EBUSY; + else + return kfd_pc_sample_destroy(pdd, args->trace_id); case KFD_IOCTL_PCS_OP_START: - return kfd_pc_sample_start(pdd); + if (pcs_entry->enabled) + return -EALREADY; + else + return kfd_pc_sample_start(pdd); case KFD_IOCTL_PCS_OP_STOP: - return kfd_pc_sample_stop(pdd); + if (!pcs_entry->enabled) + return -EALREADY; + else + return kfd_pc_sample_stop(pdd); } return -EINVAL; -- 2.25.1
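The validity and state checks in the patch above can be read as a small decision table. The following is only a user-space sketch of that table, assuming nothing beyond what the diff shows: the types and names (enum pcs_op, struct pcs_entry_model, pcs_precheck) are hypothetical stand-ins, not kernel definitions, and a return of 0 here means "the real handler would be called next".

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the KFD_IOCTL_PCS_OP_* operations in the patch. */
enum pcs_op {
	PCS_OP_QUERY_CAPABILITIES,
	PCS_OP_CREATE,
	PCS_OP_DESTROY,
	PCS_OP_START,
	PCS_OP_STOP,
};

/* Hypothetical stand-in for struct pc_sampling_entry. */
struct pcs_entry_model {
	bool enabled;      /* session currently started */
	const void *owner; /* models pcs_entry->pdd */
};

/*
 * Returns 0 if the operation may proceed, or the negative errno the
 * patch would return before calling the per-op handler.
 */
static int pcs_precheck(enum pcs_op op, const struct pcs_entry_model *entry,
			const void *caller)
{
	/* Query and create do not need an existing entry. */
	if (op == PCS_OP_QUERY_CAPABILITIES || op == PCS_OP_CREATE)
		return 0;

	/* Every other op needs an entry owned by the caller. */
	if (!entry || entry->owner != caller)
		return -EINVAL;

	switch (op) {
	case PCS_OP_DESTROY:
		return entry->enabled ? -EBUSY : 0;
	case PCS_OP_START:
		return entry->enabled ? -EALREADY : 0;
	case PCS_OP_STOP:
		return entry->enabled ? 0 : -EALREADY;
	default:
		return -EINVAL;
	}
}
```

Read this way, destroying a running session yields -EBUSY, while starting a running session or stopping a stopped one yields -EALREADY.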
[PATCH v4 04/24] drm/amdkfd: add pc sampling mutex
Add pc sampling mutex per node, and do init/destroy in node init. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 12 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 7 +++ 2 files changed, 19 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index 0a9cf9dfc224..0e24e011f66b 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -533,6 +533,16 @@ static void kfd_smi_init(struct kfd_node *dev) spin_lock_init(>smi_lock); } +static void kfd_pc_sampling_init(struct kfd_node *dev) +{ + mutex_init(>pcs_data.mutex); +} + +static void kfd_pc_sampling_exit(struct kfd_node *dev) +{ + mutex_destroy(>pcs_data.mutex); +} + static int kfd_init_node(struct kfd_node *node) { int err = -1; @@ -563,6 +573,7 @@ static int kfd_init_node(struct kfd_node *node) } kfd_smi_init(node); + kfd_pc_sampling_init(node); return 0; @@ -593,6 +604,7 @@ static void kfd_cleanup_nodes(struct kfd_dev *kfd, unsigned int num_nodes) kfd_topology_remove_device(knode); if (knode->gws) amdgpu_amdkfd_free_gws(knode->adev, knode->gws); + kfd_pc_sampling_exit(knode); kfree(knode); kfd->nodes[i] = NULL; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index ae9a41670909..f55195fea3df 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -269,6 +269,11 @@ struct kfd_vmid_info { struct kfd_dev; +/* Per device PC Sampling data */ +struct kfd_dev_pc_sampling { + struct mutex mutex; +}; + struct kfd_node { unsigned int node_id; struct amdgpu_device *adev; /* Duplicated here along with keeping @@ -322,6 +327,8 @@ struct kfd_node { struct kfd_local_mem_info local_mem_info; struct kfd_dev *kfd; + + struct kfd_dev_pc_sampling pcs_data; }; struct kfd_dev { -- 2.25.1
[PATCH v4 03/24] drm/amdkfd: enable pc sampling query
From: David Yat Sin Enable pc sampling to query system capability. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 65 +++- 1 file changed, 64 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index a7e78ff42d07..e9277c9beec7 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -25,10 +25,73 @@ #include "amdgpu_amdkfd.h" #include "kfd_pc_sampling.h" +struct supported_pc_sample_info { + uint32_t ip_version; + const struct kfd_pc_sample_info *sample_info; +}; + +const struct kfd_pc_sample_info sample_info_hosttrap_9_0_0 = { + 0, 1, ~0ULL, 0, KFD_IOCTL_PCS_METHOD_HOSTTRAP, KFD_IOCTL_PCS_TYPE_TIME_US }; + +struct supported_pc_sample_info supported_formats[] = { + { IP_VERSION(9, 4, 1), _info_hosttrap_9_0_0 }, + { IP_VERSION(9, 4, 2), _info_hosttrap_9_0_0 }, +}; + static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd, struct kfd_ioctl_pc_sample_args __user *user_args) { - return -EINVAL; + uint64_t sample_offset; + int num_method = 0; + int ret; + int i; + + for (i = 0; i < ARRAY_SIZE(supported_formats); i++) + if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version) + num_method++; + + if (!num_method) { + pr_debug("PC Sampling not supported on GC_HWIP:0x%x.", + pdd->dev->adev->ip_versions[GC_HWIP][0]); + return -EOPNOTSUPP; + } + + ret = 0; + mutex_lock(>dev->pcs_data.mutex); + if (user_args->flags != KFD_IOCTL_PCS_QUERY_TYPE_FULL && + pdd->dev->pcs_data.hosttrap_entry.base.use_count) { + /* If we already have a session, restrict returned list to current method */ + user_args->num_sample_info = 1; + + if (user_args->sample_info_ptr) + ret = copy_to_user((void __user *) user_args->sample_info_ptr, + >dev->pcs_data.hosttrap_entry.base.pc_sample_info, + sizeof(struct kfd_pc_sample_info)); + mutex_unlock(>dev->pcs_data.mutex); + 
return ret ? -EFAULT : 0; + } + mutex_unlock(>dev->pcs_data.mutex); + + if (!user_args->sample_info_ptr || user_args->num_sample_info < num_method) { + user_args->num_sample_info = num_method; + pr_debug("ASIC requires space for %d kfd_pc_sample_info entries.", num_method); + return -ENOSPC; + } + + sample_offset = user_args->sample_info_ptr; + for (i = 0; i < ARRAY_SIZE(supported_formats); i++) { + if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version) { + ret = copy_to_user((void __user *) sample_offset, + supported_formats[i].sample_info, sizeof(struct kfd_pc_sample_info)); + if (ret) { + pr_debug("Failed to copy PC sampling info to user."); + return -EFAULT; + } + sample_offset += sizeof(struct kfd_pc_sample_info); + } + } + + return 0; } static int kfd_pc_sample_start(struct kfd_process_device *pdd) -- 2.25.1
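The size negotiation in kfd_pc_sample_query_cap() above (report the required entry count with -ENOSPC, then copy on the second call) is a common two-call pattern. Below is a minimal user-space sketch of just that pattern, under stated assumptions: struct sample_info_model, the table contents and query_caps() are hypothetical, and copy_to_user(), the mutex and the "session already active" branch are deliberately left out.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct kfd_pc_sample_info. */
struct sample_info_model {
	uint32_t method;
	uint32_t type;
};

/* Hypothetical capability table; the values are placeholders. */
static const struct sample_info_model supported[] = {
	{ 1, 1 },
	{ 2, 1 },
};

#define NUM_SUPPORTED (sizeof(supported) / sizeof(supported[0]))

/*
 * buf:   caller buffer, may be NULL on the first (sizing) call
 * count: in  = number of entries buf can hold
 *        out = required number of entries when -ENOSPC is returned
 *
 * On success the entries are copied and *count is left unchanged,
 * matching what the patch does with num_sample_info.
 */
static int query_caps(struct sample_info_model *buf, uint32_t *count)
{
	if (!buf || *count < NUM_SUPPORTED) {
		*count = (uint32_t)NUM_SUPPORTED;
		return -ENOSPC;
	}

	memcpy(buf, supported, sizeof(supported));
	return 0;
}
```

A caller would typically invoke query_caps(NULL, &n) with n = 0, allocate n entries after seeing -ENOSPC, and call again with the buffer.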
[PATCH v4 02/24] drm/amdkfd: add pc sampling support
From: David Yat Sin Add pc sampling functions in amdkfd. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/Makefile | 3 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 45 +++ drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 78 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h | 34 + drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 13 5 files changed, 172 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile index a5ae7bcf44eb..790fd028a681 100644 --- a/drivers/gpu/drm/amd/amdkfd/Makefile +++ b/drivers/gpu/drm/amd/amdkfd/Makefile @@ -57,7 +57,8 @@ AMDKFD_FILES := $(AMDKFD_PATH)/kfd_module.o \ $(AMDKFD_PATH)/kfd_int_process_v11.o \ $(AMDKFD_PATH)/kfd_smi_events.o \ $(AMDKFD_PATH)/kfd_crat.o \ - $(AMDKFD_PATH)/kfd_debug.o + $(AMDKFD_PATH)/kfd_debug.o \ + $(AMDKFD_PATH)/kfd_pc_sampling.o ifneq ($(CONFIG_DEBUG_FS),) AMDKFD_FILES += $(AMDKFD_PATH)/kfd_debugfs.o diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 80e90fdef291..d9cac97c54c0 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -41,6 +41,7 @@ #include "kfd_priv.h" #include "kfd_device_queue_manager.h" #include "kfd_svm.h" +#include "kfd_pc_sampling.h" #include "amdgpu_amdkfd.h" #include "kfd_smi_events.h" #include "amdgpu_dma_buf.h" @@ -1745,6 +1746,39 @@ static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data) } #endif +static int kfd_ioctl_pc_sample(struct file *filep, + struct kfd_process *p, void __user *data) +{ + struct kfd_ioctl_pc_sample_args *args = data; + struct kfd_process_device *pdd; + int ret = 0; + + if (sched_policy == KFD_SCHED_POLICY_NO_HWS) { + pr_err("PC Sampling does not support sched_policy %i", sched_policy); + return -EINVAL; + } + 
+ mutex_lock(>mutex); + pdd = kfd_process_device_data_by_id(p, args->gpu_id); + + if (!pdd) { + pr_debug("could not find gpu id 0x%x.", args->gpu_id); + ret = -EINVAL; + } else if (args->op == KFD_IOCTL_PCS_OP_START) { + pdd = kfd_bind_process_to_device(pdd->dev, p); + if (IS_ERR(pdd)) { + pr_debug("failed to bind process %p with gpu id 0x%x", p, args->gpu_id); + ret = -ESRCH; + } + } + + if (!ret) + ret = kfd_pc_sample(pdd, args); + mutex_unlock(>mutex); + + return ret; +} + static int criu_checkpoint_process(struct kfd_process *p, uint8_t __user *user_priv_data, uint64_t *priv_offset) @@ -3219,6 +3253,9 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = { AMDKFD_IOCTL_DEF(AMDKFD_IOC_DBG_TRAP, kfd_ioctl_set_debug_trap, 0), + + AMDKFD_IOCTL_DEF(AMDKFD_IOC_PC_SAMPLE, + kfd_ioctl_pc_sample, KFD_IOC_FLAG_PERFMON), }; #define AMDKFD_CORE_IOCTL_COUNTARRAY_SIZE(amdkfd_ioctls) @@ -3295,6 +3332,14 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg) } } + /* PC Sampling Monitor */ + if (unlikely(ioctl->flags & KFD_IOC_FLAG_PERFMON)) { + if (!capable(CAP_PERFMON) && !capable(CAP_SYS_ADMIN)) { + retcode = -EACCES; + goto err_i1; + } + } + if (cmd & (IOC_IN | IOC_OUT)) { if (asize <= sizeof(stack_kdata)) { kdata = stack_kdata; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c new file mode 100644 index ..a7e78ff42d07 --- /dev/null +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -0,0 +1,78 @@ +// SPDX-License-Identifier: GPL-2.0 OR MIT +/* + * Copyright 2023 Advanced Micro Devices, Inc. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED
[PATCH v4 00/24] Support Host Trap Sampling for gfx941/gfx942
PC sampling is a form of software profiling, where the threads of an application are periodically interrupted and the program counter that the threads are currently attempting to execute is saved out for profiling. David Yat Sin (5): drm/amdkfd/kfd_ioctl: add pc sampling support drm/amdkfd: add pc sampling support drm/amdkfd: enable pc sampling query drm/amdkfd: enable pc sampling create drm/amdkfd: Set debug trap bit when enabling PC Sampling James Zhu (19): drm/amdkfd: add pc sampling mutex drm/amdkfd: add trace_id return drm/amdkfd: check pcs_entry valid drm/amdkfd: enable pc sampling destroy drm/amdkfd: add interface to trigger pc sampling trap drm/amdkfd: trigger pc sampling trap for gfx v9 drm/amdkfd/gfx9: enable host trap drm/amdgpu: use trapID 4 for host trap drm/amdgpu: add sq host trap status check drm/amdkfd: trigger pc sampling trap for arcturus drm/amdkfd: trigger pc sampling trap for aldebaran drm/amdkfd: use bit operation set debug trap drm/amdkfd: add setting trap pc sampling flag drm/amdkfd: enable pc sampling stop drm/amdkfd: add queue remapping drm/amdkfd: enable pc sampling start drm/amdkfd: add pc sampling thread to trigger trap drm/amdkfd: add pc sampling release when process release drm/amdkfd: bump kfd ioctl minor version for pc sampling availability .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 11 + .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 14 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 73 + .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |7 + drivers/gpu/drm/amd/amdkfd/Makefile |3 +- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2106 + .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 29 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 75 +- drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 26 + drivers/gpu/drm/amd/amdkfd/kfd_debug.h|3 + drivers/gpu/drm/amd/amdkfd/kfd_device.c | 14 + .../drm/amd/amdkfd/kfd_device_queue_manager.c | 11 + .../drm/amd/amdkfd/kfd_device_queue_manager.h |5 + drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 426 
drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h | 35 + drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 46 + drivers/gpu/drm/amd/amdkfd/kfd_process.c | 32 +- .../amd/include/asic_reg/gc/gc_9_0_offset.h |2 + .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h |5 + .../gpu/drm/amd/include/kgd_kfd_interface.h |7 + include/uapi/linux/kfd_ioctl.h| 64 +- 21 files changed, 1914 insertions(+), 1080 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h -- 2.25.1
Re: [PATCH] drm/amd/display: Increase frame-larger-than for all display_mode_vba files
Applied. Thanks! On Mon, Feb 5, 2024 at 5:08 PM Nathan Chancellor wrote: > > After a recent change in LLVM, allmodconfig (which has CONFIG_KCSAN=y > and CONFIG_WERROR=y enabled) has a few new instances of > -Wframe-larger-than for the mode support and system configuration > functions: > > > drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20v2.c:3393:6: > error: stack frame size (2144) exceeds limit (2048) in > 'dml20v2_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than] >3393 | void dml20v2_ModeSupportAndSystemConfigurationFull(struct > display_mode_lib *mode_lib) > | ^ > 1 error generated. > > > drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn21/display_mode_vba_21.c:3520:6: > error: stack frame size (2192) exceeds limit (2048) in > 'dml21_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than] >3520 | void dml21_ModeSupportAndSystemConfigurationFull(struct > display_mode_lib *mode_lib) > | ^ > 1 error generated. > > > drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20.c:3286:6: > error: stack frame size (2128) exceeds limit (2048) in > 'dml20_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than] >3286 | void dml20_ModeSupportAndSystemConfigurationFull(struct > display_mode_lib *mode_lib) > | ^ > 1 error generated. > > Without the sanitizers enabled, there are no warnings. > > This was the catalyst for commit 6740ec97bcdb ("drm/amd/display: > Increase frame warning limit with KASAN or KCSAN in dml2") and that same > change was made to dml in commit 5b750b22530f ("drm/amd/display: > Increase frame warning limit with KASAN or KCSAN in dml") but the > frame_warn_flag variable was not applied to all files. Do so now to > clear up the warnings and make all these files consistent. 
> > Cc: sta...@vger.kernel.org > Closes: https://github.com/ClangBuiltLinux/linux/issue/1990 > Signed-off-by: Nathan Chancellor > --- > drivers/gpu/drm/amd/display/dc/dml/Makefile | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile > b/drivers/gpu/drm/amd/display/dc/dml/Makefile > index 6042a5a6a44f..59ade76ffb18 100644 > --- a/drivers/gpu/drm/amd/display/dc/dml/Makefile > +++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile > @@ -72,11 +72,11 @@ CFLAGS_$(AMDDALPATH)/dc/dml/display_mode_lib.o := > $(dml_ccflags) > CFLAGS_$(AMDDALPATH)/dc/dml/display_mode_vba.o := $(dml_ccflags) > CFLAGS_$(AMDDALPATH)/dc/dml/dcn10/dcn10_fpu.o := $(dml_ccflags) > CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/dcn20_fpu.o := $(dml_ccflags) > -CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_mode_vba_20.o := $(dml_ccflags) > +CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_mode_vba_20.o := $(dml_ccflags) > $(frame_warn_flag) > CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_rq_dlg_calc_20.o := $(dml_ccflags) > -CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_mode_vba_20v2.o := $(dml_ccflags) > +CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_mode_vba_20v2.o := $(dml_ccflags) > $(frame_warn_flag) > CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_rq_dlg_calc_20v2.o := > $(dml_ccflags) > -CFLAGS_$(AMDDALPATH)/dc/dml/dcn21/display_mode_vba_21.o := $(dml_ccflags) > +CFLAGS_$(AMDDALPATH)/dc/dml/dcn21/display_mode_vba_21.o := $(dml_ccflags) > $(frame_warn_flag) > CFLAGS_$(AMDDALPATH)/dc/dml/dcn21/display_rq_dlg_calc_21.o := $(dml_ccflags) > CFLAGS_$(AMDDALPATH)/dc/dml/dcn30/display_mode_vba_30.o := $(dml_ccflags) > $(frame_warn_flag) > CFLAGS_$(AMDDALPATH)/dc/dml/dcn30/display_rq_dlg_calc_30.o := $(dml_ccflags) > > --- > base-commit: 6813cdca4ab94a238f8eb0cef3d3f3fcbdfb0ee0 > change-id: 20240205-amdgpu-raise-flt-for-dml-vba-files-ee5b5a9c5e43 > > Best regards, > -- > Nathan Chancellor >
Re: [PATCH 3/3] drm/amdgpu: wire up the can_remove() callback
On Fri, Feb 02, 2024 at 03:40:03PM -0800, Greg Kroah-Hartman wrote: > On Fri, Feb 02, 2024 at 05:25:56PM -0500, Hamza Mahfooz wrote: > > Removing an amdgpu device that still has user space references allocated > > to it causes undefined behaviour. > > Then fix that please. There should not be anything special about your > hardware that all of the tens of thousands of other devices can't handle > today. > > What happens when I yank your device out of a system with a pci hotplug > bus? You can't prevent that either, so this should not be any different > at all. > > sorry, but please, just fix your driver. fwiw Christian König from amd already rejected this too, I have no idea why this was submitted since the very elaborate plan I developed with a bunch of amd folks was to fix the various lifetime lolz we still have in drm. We unfortunately export the world of internal objects to userspace as uabi objects with dma_buf, dma_fence and everything else, but it's all fixable and we have the plan even documented: https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#device-hot-unplug So yeah anything that isn't that plan of record is very much no-go for drm drivers. Unless we change that plan of course, but that needs a documentation patch first and a big discussion. Aside from an absolute massive pile of kernel-internal refcounting bugs the really big one we agreed on after a lot of discussion is that SIGBUS on dma-buf mmaps is no-go for drm drivers, because it would break way too much userspace in ways which are simply not fixable (since sig handlers are shared in a process, which means the gl/vk driver cannot use it). Otherwise it's bog standard "fix the kernel bugs" work, just a lot of it. Cheers, Sima -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [PATCH v2 3/3] drm/amdgpu: sync page table freeing with tlb flush
Hey Christian, On 01/02/2024 14:48, Christian König wrote: Am 31.01.24 um 18:14 schrieb Shashank Sharma: This patch: - Attaches the TLB flush fence to the PT objects being freed - Adds a new ptr in VM to save this last TLB flush fence - Adds a new lock in VM to prevent out-of-context update of saved TLB flush fence - Adds a new ptr in tlb_flush structure to save vm The idea is to delay freeing of page table objects until we have the respective TLB entries flushed. V2: rebase Cc: Christian König Cc: Alex Deucher Cc: Felix Kuehling Cc: Rajneesh Bhardwaj Signed-off-by: Shashank Sharma --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 3 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 4 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 27 +++ .../gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c | 13 +++-- 4 files changed, 45 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 67c690044b97..b0e81c249e3a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2245,6 +2245,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm, vm->generation = 0; mutex_init(>eviction_lock); + mutex_init(>tlb_flush_lock); vm->evicting = false; vm->tlb_fence_context = dma_fence_context_alloc(1); @@ -2360,7 +2361,9 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm) } dma_fence_put(vm->last_update); + dma_fence_put(vm->tlb_fence_last); vm->last_update = dma_fence_get_stub(); + vm->tlb_fence_last = dma_fence_get_stub(); vm->is_compute_context = true; /* Free the shadow bo for compute VM */ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h index 8e6fd25d07b7..b05bc586237f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h @@ -334,6 +334,10 @@ struct amdgpu_vm { uint64_t *tlb_seq_cpu_addr; uint64_t tlb_fence_context; + /* Ptr and lock to maintain tlb flush 
sync */ + struct mutex tlb_flush_lock; + struct dma_fence *tlb_fence_last; + atomic64_t kfd_last_flushed_seq; /* How many times we had to re-generate the page tables */ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c index 95dc0afdaffb..f1c4418c4d63 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c @@ -631,6 +631,18 @@ static int amdgpu_vm_pt_alloc(struct amdgpu_device *adev, return r; } +static inline +void amdgpu_vm_attach_tlb_fence(struct amdgpu_bo *bo, struct dma_fence *fence) +{ + if (!bo || !fence) + return; + + if (!dma_resv_reserve_fences(bo->tbo.base.resv, 1)) { + dma_resv_add_fence(bo->tbo.base.resv, fence, + DMA_RESV_USAGE_BOOKKEEP); + } +} + /** * amdgpu_vm_pt_free - free one PD/PT * @@ -638,6 +650,7 @@ static int amdgpu_vm_pt_alloc(struct amdgpu_device *adev, */ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry) { + struct amdgpu_vm *vm; struct amdgpu_bo *shadow; if (!entry->bo) @@ -646,9 +659,23 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry) entry->bo->vm_bo = NULL; shadow = amdgpu_bo_shadowed(entry->bo); if (shadow) { + vm = shadow->vm_bo->vm; + + mutex_lock(>tlb_flush_lock); + if (vm->tlb_fence_last) + amdgpu_vm_attach_tlb_fence(shadow, vm->tlb_fence_last); + mutex_unlock(>tlb_flush_lock); + ttm_bo_set_bulk_move(>tbo, NULL); amdgpu_bo_unref(); } + + vm = entry->vm; + mutex_lock(>tlb_flush_lock); + if (vm->tlb_fence_last) + amdgpu_vm_attach_tlb_fence(entry->bo, vm->tlb_fence_last); + mutex_unlock(>tlb_flush_lock); + That approach doesn't make sense. Instead add the freed PT/PDs to a linked list in the parameters structure and only really free them after adding the fence to the root PD. Sure, I will do those changes. Just for the curiosity, why wouldn't this approach work ? Wouldn't this delay the actual freeing of buffers TTM until the fence signal ? 
- Shashank ttm_bo_set_bulk_move(>bo->tbo, NULL); spin_lock(>vm->status_lock); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c index 569681badd7c..54ec81d30034 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c @@ -31,6 +31,7 @@ struct amdgpu_tlb_fence { struct dma_fence base; struct amdgpu_device *adev; + struct amdgpu_vm *vm; Big NAK to that. The VM might not live long enough to see the end of the TLB flush. Regards, Christian. struct dma_fence
Re: linux-next: Tree for Feb 6 (gpu/drm/amd/display/ kernel-doc warnings)
On 2/5/24 20:43, Stephen Rothwell wrote: > Hi all, > > Changes since 20240205: > Hi Rodrigo, Are you aware of these kernel-doc warnings? I think they are due to commit b8c1c3a82e75 Author: Rodrigo Siqueira Date: Mon Jan 22 14:24:57 2024 -0700 Documentation/gpu: Add kernel doc entry for MPC ../drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h:1: warning: no structured comments found ../drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h:1: warning: no structured comments found ../drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h:1: warning: no structured comments found ../drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h:1: warning: no structured comments found ../drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h:1: warning: no structured comments found ../drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h:1: warning: no structured comments found ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:1: warning: no structured comments found ../drivers/gpu/drm/amd/display/dc/inc/hw/opp.h:1: warning: no structured comments found ../drivers/gpu/drm/amd/display/dc/inc/hw/opp.h:1: warning: no structured comments found ../drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h:1: warning: no structured comments found ../drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h:1: warning: no structured comments found ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:132: warning: Incorrect use of kernel-doc format: * @@overlap_only: Whether overlapping of different planes is allowed. ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:132: warning: Incorrect use of kernel-doc format: * @@overlap_only: Whether overlapping of different planes is allowed. ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:132: warning: Incorrect use of kernel-doc format: * @@overlap_only: Whether overlapping of different planes is allowed. 
../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:162: warning: Function parameter or struct member 'pre_multiplied_alpha' not described in 'mpcc_blnd_cfg' ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:162: warning: Function parameter or struct member 'overlap_only' not described in 'mpcc_blnd_cfg' ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or struct member 'read_mpcc_state' not described in 'mpc_funcs' ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or struct member 'mpc_init_single_inst' not described in 'mpc_funcs' ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or struct member 'get_mpcc_for_dpp_from_secondary' not described in 'mpc_funcs' ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or struct member 'get_mpcc_for_dpp' not described in 'mpc_funcs' ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or struct member 'wait_for_idle' not described in 'mpc_funcs' ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or struct member 'assert_mpcc_idle_before_connect' not described in 'mpc_funcs' ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or struct member 'init_mpcc_list_from_hw' not described in 'mpc_funcs' ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or struct member 'set_denorm' not described in 'mpc_funcs' ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or struct member 'set_denorm_clamp' not described in 'mpc_funcs' ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or struct member 'set_output_csc' not described in 'mpc_funcs' ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or struct member 'set_ocsc_default' not described in 'mpc_funcs' ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function 
parameter or struct member 'set_output_gamma' not described in 'mpc_funcs' ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or struct member 'power_on_mpc_mem_pwr' not described in 'mpc_funcs' ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or struct member 'set_dwb_mux' not described in 'mpc_funcs' ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or struct member 'disable_dwb_mux' not described in 'mpc_funcs' ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or struct member 'is_dwb_idle' not described in 'mpc_funcs' ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or struct member 'set_out_rate_control' not described in 'mpc_funcs' ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or struct member 'set_gamut_remap' not described in 'mpc_funcs' ../drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or struct member 'program_1dlut' not described in 'mpc_funcs'
[lvc-project] [PATCH v2] drm/amd/pm: check return value of amdgpu_irq_add_id()
amdgpu_irq_add_id() may fail, in which case the irq handlers will not be registered. This patch adds an error code check. Found by Linux Verification Center (linuxtesting.org). Signed-off-by: Igor Artemiev --- v2: Free the source as Alexey Khoroshilov suggested. .../drm/amd/pm/powerplay/hwmgr/smu_helper.c | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu_helper.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu_helper.c index 79a566f3564a..109df1039d5c 100644 --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu_helper.c +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu_helper.c @@ -647,28 +647,41 @@ int smu9_register_irq_handlers(struct pp_hwmgr *hwmgr) { struct amdgpu_irq_src *source = kzalloc(sizeof(struct amdgpu_irq_src), GFP_KERNEL); + int ret; if (!source) return -ENOMEM; source->funcs = _irq_funcs; - amdgpu_irq_add_id((struct amdgpu_device *)(hwmgr->adev), + ret = amdgpu_irq_add_id((struct amdgpu_device *)(hwmgr->adev), SOC15_IH_CLIENTID_THM, THM_9_0__SRCID__THM_DIG_THERM_L2H, source); - amdgpu_irq_add_id((struct amdgpu_device *)(hwmgr->adev), + if (ret) + goto err; + + ret = amdgpu_irq_add_id((struct amdgpu_device *)(hwmgr->adev), SOC15_IH_CLIENTID_THM, THM_9_0__SRCID__THM_DIG_THERM_H2L, source); + if (ret) + goto err; /* Register CTF(GPIO_19) interrupt */ - amdgpu_irq_add_id((struct amdgpu_device *)(hwmgr->adev), + ret = amdgpu_irq_add_id((struct amdgpu_device *)(hwmgr->adev), SOC15_IH_CLIENTID_ROM_SMUIO, SMUIO_9_0__SRCID__SMUIO_GPIO19, source); + if (ret) + goto err; return 0; + +err: + kfree(source); + + return ret; } void *smu_atom_get_data_table(void *dev, uint32_t table, uint16_t *size, -- 2.39.2
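The error handling added in this patch follows the usual "check each call, jump to one cleanup label" shape. Here is a small user-space sketch of that shape only; fake_irq_add_id(), struct irq_src_model and register_handlers() are hypothetical stand-ins (calloc/free replace kzalloc/kfree), and nothing here models the real amdgpu interrupt tables.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct amdgpu_irq_src. */
struct irq_src_model {
	int registered; /* how many ids were attached to this source */
};

/*
 * Hypothetical stand-in for amdgpu_irq_add_id(): fails with -EINVAL
 * for the id passed as fail_id, succeeds otherwise.
 */
static int fake_irq_add_id(struct irq_src_model *src, int id, int fail_id)
{
	if (id == fail_id)
		return -EINVAL;
	src->registered++;
	return 0;
}

/*
 * Allocate one source, register three ids against it, and free the
 * source through a single error label if any registration fails.
 */
static int register_handlers(int fail_id, struct irq_src_model **out)
{
	static const int ids[] = { 1, 2, 3 };
	struct irq_src_model *source = calloc(1, sizeof(*source));
	size_t i;
	int ret;

	*out = NULL;
	if (!source)
		return -ENOMEM;

	for (i = 0; i < sizeof(ids) / sizeof(ids[0]); i++) {
		ret = fake_irq_add_id(source, ids[i], fail_id);
		if (ret)
			goto err;
	}

	*out = source;
	return 0;

err:
	free(source);
	return ret;
}
```

The single label keeps the allocation and its release in one function, so every failing registration takes the same path out.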