Re: Spontaneous reboots when using RX 560
Finally some progress!

I found a thread with a couple of people having the same symptoms as I
do ([1]), and interestingly that was with the same brand & model of
card.

Although there is no solution, there is a workaround that works:

  echo -n low > /sys/class/drm/card0/device/power_dpm_force_performance_level

Then the card seems stable. At least I was able to get through an
entire GL benchmark and also a bunch of CL tests without crashing.
(By default it crashes nearly instantly.)

Of course the card is slow, but it's better than nothing and maybe it
gives a clue to a solution?

Following some advice on IRC, I also tried setting it to "high". That
doesn't crash immediately: the display stays fine and I can move
windows and do light stuff, but as soon as I actually run GL or CL
workloads it crashes.

I also dumped the Power Play tables, see [2]. I can't really
understand them. There are definitely some weird values, but I'm not
sure if that's normal or not.

As I noted earlier in the thread, when I first used the card on
Windows with just AMD's driver, the card was stuck at its lowest clock
rate and performed poorly in benchmarks. It was only after I loaded
ASRock's own tweak utility that the card started to auto-adapt its
clocks / voltages. Not sure if there is a way to dump the Windows
Power Play config?

Cheers,

   Sylvain

[1] https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/1112121-rx-560-crash-under-light-load
[2] https://pastebin.com/raw/uWh6WLmh
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
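For anyone who wants to apply the same workaround from a program rather than a shell, here is a minimal sketch in C. It assumes nothing beyond what the message states: the sysfs file is the one quoted above and "low" / "high" are the two values that were tried. The helper name write_perf_level is hypothetical, and the write needs the same privileges as the echo command.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write a performance level string (e.g. "low")
 * to the given sysfs-style file without a trailing newline, like
 * `echo -n`. Returns 0 on success, -1 on failure.
 */
int write_perf_level(const char *path, const char *level)
{
	size_t len = strlen(level);
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fwrite(level, 1, len, f) == len;

	/* fclose() flushes the buffer, so a rejected write shows up here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Calling write_perf_level() with the power_dpm_force_performance_level path and "low" mirrors the echo line; checking the return value matters because the write fails if the process lacks permission to open the file.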
Re: Spontaneous reboots when using RX 560
Just in case there was any doubt, it seems an OpenCL workload crashes
the card just as hard.

(That was with the AMDGPU-Pro OpenCL lib, legacy version. I can't get
PAL to detect the card at all.)

Cheers,

   Sylvain
RE: [PATCH 1/3] drm/amd/powerplay: add lock protection for swSMU APIs V2
Acked-by: Feifei Xu

Thanks,
Feifei

-----Original Message-----
From: amd-gfx On Behalf Of Quan, Evan
Sent: Friday, October 18, 2019 10:57 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Grodzovsky, Andrey; Quan, Evan
Subject: [PATCH 1/3] drm/amd/powerplay: add lock protection for swSMU APIs V2

This is a quick and low risk fix. The APIs that are exposed to other
IPs, or that support the sysfs/hwmon interfaces or DAL, will have lock
protection. Meanwhile, no lock protection is enforced for APIs used
only inside swSMU. Future optimization is needed.

V2: strip the lock protection for all swSMU internal APIs

Change-Id: I8392652c9da1574a85acd9b171f04380f3630852
Signed-off-by: Evan Quan
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c       |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h       |   6 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c        |  23 +-
 .../amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c  |   4 +-
 drivers/gpu/drm/amd/powerplay/amdgpu_smu.c    | 696 --
 drivers/gpu/drm/amd/powerplay/arcturus_ppt.c  |   3 -
 .../gpu/drm/amd/powerplay/inc/amdgpu_smu.h    | 163 ++--
 drivers/gpu/drm/amd/powerplay/navi10_ppt.c    |  15 +-
 drivers/gpu/drm/amd/powerplay/renoir_ppt.c    |  14 +-
 drivers/gpu/drm/amd/powerplay/smu_v11_0.c     |  22 +-
 drivers/gpu/drm/amd/powerplay/smu_v12_0.c     |   3 -
 drivers/gpu/drm/amd/powerplay/vega20_ppt.c    |  20 +-
 12 files changed, 777 insertions(+), 198 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c
index 263265245e19..28d32725285b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c
@@ -912,7 +912,8 @@ int amdgpu_dpm_get_sclk(struct amdgpu_device *adev, bool low)
 	if (is_support_sw_smu(adev)) {
 		ret = smu_get_dpm_freq_range(&adev->smu, SMU_GFXCLK,
 					     low ? &clk_freq : NULL,
-					     !low ? &clk_freq : NULL);
+					     !low ? &clk_freq : NULL,
+					     true);
 		if (ret)
 			return 0;
 		return clk_freq * 100;
@@ -930,7 +931,8 @@ int amdgpu_dpm_get_mclk(struct amdgpu_device *adev, bool low)
 	if (is_support_sw_smu(adev)) {
 		ret = smu_get_dpm_freq_range(&adev->smu, SMU_UCLK,
 					     low ? &clk_freq : NULL,
-					     !low ? &clk_freq : NULL);
+					     !low ? &clk_freq : NULL,
+					     true);
 		if (ret)
 			return 0;
 		return clk_freq * 100;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
index 1c5c0fd76dbf..2cfb677272af 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
@@ -298,12 +298,6 @@ enum amdgpu_pcie_gen {
 #define amdgpu_dpm_get_current_power_state(adev) \
 		((adev)->powerplay.pp_funcs->get_current_power_state((adev)->powerplay.pp_handle))

-#define amdgpu_smu_get_current_power_state(adev) \
-		((adev)->smu.ppt_funcs->get_current_power_state(&((adev)->smu)))
-
-#define amdgpu_smu_set_power_state(adev) \
-		((adev)->smu.ppt_funcs->set_power_state(&((adev)->smu)))
-
 #define amdgpu_dpm_get_pp_num_states(adev, data) \
 		((adev)->powerplay.pp_funcs->get_pp_num_states((adev)->powerplay.pp_handle, data))

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
index c50d5f1e75e5..36f36b35000d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
@@ -211,7 +211,7 @@ static ssize_t amdgpu_get_dpm_state(struct device *dev,
 	if (is_support_sw_smu(adev)) {
 		if (adev->smu.ppt_funcs->get_current_power_state)
-			pm = amdgpu_smu_get_current_power_state(adev);
+			pm = smu_get_current_power_state(&adev->smu);
 		else
 			pm = adev->pm.dpm.user_state;
 	} else if (adev->powerplay.pp_funcs->get_current_power_state) {
@@ -957,7 +957,7 @@ static ssize_t amdgpu_set_pp_dpm_sclk(struct device *dev,
 		return ret;

 	if (is_support_sw_smu(adev))
-		ret = smu_force_clk_levels(&adev->smu, SMU_SCLK, mask);
+		ret = smu_force_clk_levels(&adev->smu, SMU_SCLK, mask, true);
 	else if (adev->powerplay.pp_funcs->force_clock_level)
 		ret = amdgpu_dpm_force_clock_level(adev, PP_SCLK, mask);

@@ -1004,7 +1004,7 @@ static ssize_t amdgpu_set_pp_dpm_mclk(struct device *dev,
 		return ret;

 	if (is_support_sw_smu(adev))
-		ret = smu_force_clk_levels(&adev->smu, SMU_MCLK, mask);
+		ret = smu_force_clk_lev
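The trailing `true` added to the external callers in the hunks above reads like a flag that asks the callee to take the SMU lock itself, which would match the commit message's split between externally exposed and internal APIs. The callee is not visible in this excerpt, so the following is only a user-space sketch of that pattern under that assumption: the parameter name lock_needed, the struct and both function names are hypothetical, and a pthread mutex stands in for the kernel mutex.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the SMU context; field names are illustrative. */
struct smu_sketch {
	pthread_mutex_t mutex;
	unsigned int min_freq;
	unsigned int max_freq;
};

/*
 * Read the frequency range. External callers pass lock_needed = true;
 * callers that already hold smu->mutex pass false so the same
 * non-recursive mutex is not taken twice.
 */
int smu_sketch_get_freq_range(struct smu_sketch *smu,
			      unsigned int *min, unsigned int *max,
			      bool lock_needed)
{
	if (!min && !max)
		return -1;

	if (lock_needed)
		pthread_mutex_lock(&smu->mutex);

	if (min)
		*min = smu->min_freq;
	if (max)
		*max = smu->max_freq;

	if (lock_needed)
		pthread_mutex_unlock(&smu->mutex);

	return 0;
}

/* Internal path: takes the lock once, then calls the helper unlocked. */
int smu_sketch_get_span(struct smu_sketch *smu, unsigned int *span)
{
	unsigned int lo, hi;
	int ret;

	pthread_mutex_lock(&smu->mutex);
	ret = smu_sketch_get_freq_range(smu, &lo, &hi, false);
	if (!ret)
		*span = hi - lo;
	pthread_mutex_unlock(&smu->mutex);

	return ret;
}
```

The point of the flag in this sketch is that one function body serves both kinds of caller; the cost is that every call site has to know whether it already holds the lock.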
[PATCH] drm/amd/amdgpu: correct length misspelling
Correct the "_LENTH" misspelling in the AMDGPU_MAX_TIMEOUT_PARAM_LENGTH
constant.

Signed-off-by: Wambui Karuga
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index c5b3c0c9193b..aaab37833659 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -86,7 +86,7 @@
 #define KMS_DRIVER_MINOR	34
 #define KMS_DRIVER_PATCHLEVEL	0

-#define AMDGPU_MAX_TIMEOUT_PARAM_LENTH	256
+#define AMDGPU_MAX_TIMEOUT_PARAM_LENGTH	256

 int amdgpu_vram_limit = 0;
 int amdgpu_vis_vram_limit = 0;
@@ -100,7 +100,7 @@ int amdgpu_disp_priority = 0;
 int amdgpu_hw_i2c = 0;
 int amdgpu_pcie_gen2 = -1;
 int amdgpu_msi = -1;
-static char amdgpu_lockup_timeout[AMDGPU_MAX_TIMEOUT_PARAM_LENTH];
+static char amdgpu_lockup_timeout[AMDGPU_MAX_TIMEOUT_PARAM_LENGTH];
 int amdgpu_dpm = -1;
 int amdgpu_fw_load_type = -1;
 int amdgpu_aspm = -1;
@@ -1327,9 +1327,9 @@ int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
 	adev->sdma_timeout = adev->video_timeout = adev->gfx_timeout;
 	adev->compute_timeout = MAX_SCHEDULE_TIMEOUT;

-	if (strnlen(input, AMDGPU_MAX_TIMEOUT_PARAM_LENTH)) {
+	if (strnlen(input, AMDGPU_MAX_TIMEOUT_PARAM_LENGTH)) {
 		while ((timeout_setting = strsep(&input, ",")) &&
-				strnlen(timeout_setting, AMDGPU_MAX_TIMEOUT_PARAM_LENTH)) {
+				strnlen(timeout_setting, AMDGPU_MAX_TIMEOUT_PARAM_LENGTH)) {
 			ret = kstrtol(timeout_setting, 0, &timeout);
 			if (ret)
 				return ret;
-- 
2.23.0
[PATCH] drm/radeon: remove assignment for return value
Remove unnecessary assignment for return value and have the
function return the required value directly.

Issue found by coccinelle:
@@
local idexpression ret;
expression e;
@@

-ret =
+return
     e;
-return ret;

Signed-off-by: Wambui Karuga
---
 drivers/gpu/drm/radeon/cik.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index 62eab82a64f9..daff9a2af3be 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -221,9 +221,7 @@ int ci_get_temp(struct radeon_device *rdev)
 	else
 		actual_temp = temp & 0x1ff;

-	actual_temp = actual_temp * 1000;
-
-	return actual_temp;
+	return actual_temp * 1000;
 }

 /* get temperature in millidegrees */
@@ -239,9 +237,7 @@ int kv_get_temp(struct radeon_device *rdev)
 	else
 		actual_temp = 0;

-	actual_temp = actual_temp * 1000;
-
-	return actual_temp;
+	return actual_temp * 1000;
 }

 /*
-- 
2.23.0
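To make the coccinelle rule above concrete, here is a small stand-alone before/after pair in C. It is a sketch, not the driver code: both function names are hypothetical, the raw value is passed in instead of being read from a register, and the `temp & 0x200` branch is an assumed placeholder for the part of ci_get_temp() that the diff does not show.

```c
/* Before: the scaled value is stored back into the local, then returned. */
int temp_millideg_before(int temp)
{
	int actual_temp;

	if (temp & 0x200)	/* assumed saturation check, not shown in the diff */
		actual_temp = 255;
	else
		actual_temp = temp & 0x1ff;

	actual_temp = actual_temp * 1000;

	return actual_temp;
}

/* After: the rule folds "ret = e; return ret;" into "return e;". */
int temp_millideg_after(int temp)
{
	int actual_temp;

	if (temp & 0x200)	/* same assumed check as above */
		actual_temp = 255;
	else
		actual_temp = temp & 0x1ff;

	return actual_temp * 1000;
}
```

Both versions compute the same value for every input, which is why the change is safe to apply mechanically: only the redundant store to the local disappears.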
[PATCH] drm/amd/amdgpu: make undeclared variables static
Make the `amdgpu_lockup_timeout` and `amdgpu_exp_hw_support` variables
static to remove the following sparse warnings:
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:103:19: warning: symbol 'amdgpu_lockup_timeout' was not declared. Should it be static?
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:117:18: warning: symbol 'amdgpu_exp_hw_support' was not declared. Should it be static?

Signed-off-by: Wambui Karuga
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 3fae1007143e..c5b3c0c9193b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -100,7 +100,7 @@ int amdgpu_disp_priority = 0;
 int amdgpu_hw_i2c = 0;
 int amdgpu_pcie_gen2 = -1;
 int amdgpu_msi = -1;
-char amdgpu_lockup_timeout[AMDGPU_MAX_TIMEOUT_PARAM_LENTH];
+static char amdgpu_lockup_timeout[AMDGPU_MAX_TIMEOUT_PARAM_LENTH];
 int amdgpu_dpm = -1;
 int amdgpu_fw_load_type = -1;
 int amdgpu_aspm = -1;
@@ -114,7 +114,7 @@ int amdgpu_vm_block_size = -1;
 int amdgpu_vm_fault_stop = 0;
 int amdgpu_vm_debug = 0;
 int amdgpu_vm_update_mode = -1;
-int amdgpu_exp_hw_support = 0;
+static int amdgpu_exp_hw_support;
 int amdgpu_dc = -1;
 int amdgpu_sched_jobs = 32;
 int amdgpu_sched_hw_submission = 2;
-- 
2.23.0
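As a stand-alone illustration of what the change relies on, here is a minimal sketch in C. All names prefixed with sketch_ are hypothetical; only the size 256 comes from the driver's define. It shows file-local variables with internal linkage, and why dropping the explicit `= 0` is harmless: objects with static storage duration start out zeroed.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PARAM_LEN 256	/* mirrors the 256 used by the driver's define */

/*
 * `static` gives these internal linkage, so no header declaration is
 * expected for them. Neither has an initializer: both start zeroed.
 */
static char sketch_lockup_timeout[SKETCH_PARAM_LEN];
static int sketch_exp_hw_support;

/* Accessors, since the variables are not visible outside this file. */
int sketch_exp_hw_support_enabled(void)
{
	return sketch_exp_hw_support != 0;
}

void sketch_set_exp_hw_support(int value)
{
	sketch_exp_hw_support = value;
}

size_t sketch_lockup_timeout_len(void)
{
	/* The zeroed buffer is an empty string until something fills it. */
	return strlen(sketch_lockup_timeout);
}
```

Code in other translation units can no longer name the two variables directly, which is the property the sparse warning is nudging towards for symbols that have no declaration in a header.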