RE: [PATCH v2] drm/amdgpu:Limit the resolution for virtual_display
[AMD Official Use Only - Internal Distribution Only]

Yes, it just won't report modes bigger than 16384 to user mode, as they won't work properly.

Best wishes
Emily Deng

>-----Original Message-----
>From: Alex Deucher
>Sent: Friday, January 8, 2021 11:14 PM
>To: Deng, Emily
>Cc: amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH v2] drm/amdgpu:Limit the resolution for virtual_display
>
>On Thu, Jan 7, 2021 at 8:45 PM Deng, Emily wrote:
>>
>> [AMD Official Use Only - Internal Distribution Only]
>>
>> Ping ..
>>
>
>It's not clear what problem this solves.
>
>Alex
>
>> Best wishes
>> Emily Deng
>>
>> >-----Original Message-----
>> >From: Emily Deng
>> >Sent: Thursday, January 7, 2021 11:29 AM
>> >To: amd-gfx@lists.freedesktop.org
>> >Cc: Deng, Emily
>> >Subject: [PATCH v2] drm/amdgpu:Limit the resolution for virtual_display
>> >
>> >From: "Emily.Deng"
>> >
>> >Limit the resolution to no bigger than 16384, which means
>> >dev->mode_info.num_crtc * common_modes[i].w must not be bigger than 16384.
>> >
>> >v2:
>> >  Refine the code
>> >
>> >Signed-off-by: Emily.Deng
>> >---
>> > drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 7 +-
>> > 1 file changed, 5 insertions(+), 2 deletions(-)
>> >
>> >diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> >index 2b16c8faca34..fd2b3a6dfd60 100644
>> >--- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> >+++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> >@@ -319,6 +319,7 @@ dce_virtual_encoder(struct drm_connector *connector)
>> > static int dce_virtual_get_modes(struct drm_connector *connector)
>> > {
>> > 	struct drm_device *dev = connector->dev;
>> >+	struct amdgpu_device *adev = dev->dev_private;
>> > 	struct drm_display_mode *mode = NULL;
>> > 	unsigned i;
>> > 	static const struct mode_size {
>> >@@ -350,8 +351,10 @@ static int dce_virtual_get_modes(struct drm_connector *connector)
>> > 	};
>> >
>> > 	for (i = 0; i < ARRAY_SIZE(common_modes); i++) {
>> >-		mode = drm_cvt_mode(dev, common_modes[i].w, common_modes[i].h, 60, false, false, false);
>> >-		drm_mode_probed_add(connector, mode);
>> >+		if (adev->mode_info.num_crtc * common_modes[i].w <= 16384) {
>> >+			mode = drm_cvt_mode(dev, common_modes[i].w, common_modes[i].h, 60, false, false, false);
>> >+			drm_mode_probed_add(connector, mode);
>> >+		}
>> > 	}
>> >
>> > 	return 0;
>> >--
>> >2.25.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH v2] drm/amdgpu: Decrease compute timeout to 10 s for sriov multiple VF
[AMD Official Use Only - Internal Distribution Only]

Ping .

>-----Original Message-----
>From: Emily Deng
>Sent: Thursday, January 7, 2021 10:51 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily
>Subject: [PATCH v2] drm/amdgpu: Decrease compute timeout to 10 s for sriov multiple VF
>
>From: "Emily.Deng"
>
>For multiple VF, after an engine hang the host driver will encounter FLR first, so there is no point in setting the compute timeout to 60 s.
>
>v2:
>  Refine the patch and comment
>
>Signed-off-by: Emily.Deng
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 -
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 5527c549db82..35edf58c825d 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -3133,7 +3133,10 @@ static int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
> 	 */
> 	adev->gfx_timeout = msecs_to_jiffies(10000);
> 	adev->sdma_timeout = adev->video_timeout = adev->gfx_timeout;
>-	if (amdgpu_sriov_vf(adev) || amdgpu_passthrough(adev))
>+	if (amdgpu_sriov_vf(adev))
>+		adev->compute_timeout = amdgpu_sriov_is_pp_one_vf(adev) ?
>+				msecs_to_jiffies(60000) : msecs_to_jiffies(10000);
>+	else if (amdgpu_passthrough(adev))
> 		adev->compute_timeout = msecs_to_jiffies(60000);
> 	else
> 		adev->compute_timeout = MAX_SCHEDULE_TIMEOUT;
>--
>2.25.1
RE: [PATCH v2 7/7] drm/amd/pm: implement processor fine grain feature for vangogh (v2)
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Evan Quan

-----Original Message-----
From: Huang, Ray
Sent: Monday, January 11, 2021 12:26 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Liu, Aaron; Du, Xiaojian; Hou, Xiaomeng (Matthew); Quan, Evan; Huang, Ray
Subject: [PATCH v2 7/7] drm/amd/pm: implement processor fine grain feature for vangogh (v2)

This patch implements the processor fine grain feature for vangogh. It's similar to the gfx clock; the only difference is below:

echo "p core_id level value" > pp_od_clk_voltage

1. "p" - set the cclk (processor) frequency
2. "core_id" - 0/1/2/3, selects which cpu core you want to adjust
3. "level" - 0 or 1, "0" represents the min value, "1" represents the max value
4. "value" - the target value of the cclk frequency; it should be limited to the safe range

v2: fix some missing changes as Evan's suggestion.

Signed-off-by: Huang Rui
---
 .../gpu/drm/amd/include/kgd_pp_interface.h    |  1 +
 drivers/gpu/drm/amd/pm/amdgpu_pm.c            |  3 +
 drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h       |  6 ++
 drivers/gpu/drm/amd/pm/inc/smu_types.h        |  1 +
 .../gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c  | 80 ++-
 5 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
index 57b24c4c205b..a41875ac5dfb 100644
--- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
@@ -156,6 +156,7 @@ enum {
 enum PP_OD_DPM_TABLE_COMMAND {
 	PP_OD_EDIT_SCLK_VDDC_TABLE,
 	PP_OD_EDIT_MCLK_VDDC_TABLE,
+	PP_OD_EDIT_CCLK_VDDC_TABLE,
 	PP_OD_EDIT_VDDC_CURVE,
 	PP_OD_RESTORE_DEFAULT_TABLE,
 	PP_OD_COMMIT_DPM_TABLE,
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index a5be03aa384b..75cefcb25a44 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -800,6 +800,8 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct device *dev,
 	if (*buf == 's')
 		type = PP_OD_EDIT_SCLK_VDDC_TABLE;
+	else if (*buf == 'p')
+		type = PP_OD_EDIT_CCLK_VDDC_TABLE;
 	else if (*buf == 'm')
 		type = PP_OD_EDIT_MCLK_VDDC_TABLE;
 	else if(*buf == 'r')
@@ -916,6 +918,7 @@ static ssize_t amdgpu_get_pp_od_clk_voltage(struct device *dev,
 		size += smu_print_clk_levels(&adev->smu, SMU_OD_VDDC_CURVE, buf+size);
 		size += smu_print_clk_levels(&adev->smu, SMU_OD_VDDGFX_OFFSET, buf+size);
 		size += smu_print_clk_levels(&adev->smu, SMU_OD_RANGE, buf+size);
+		size += smu_print_clk_levels(&adev->smu, SMU_OD_CCLK, buf+size);
 	} else if (adev->powerplay.pp_funcs->print_clock_levels) {
 		size = amdgpu_dpm_print_clock_levels(adev, OD_SCLK, buf);
 		size += amdgpu_dpm_print_clock_levels(adev, OD_MCLK, buf+size);
diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
index 97d788451624..25ee9f51813b 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
@@ -465,6 +465,12 @@ struct smu_context
 	uint32_t gfx_default_soft_max_freq;
 	uint32_t gfx_actual_hard_min_freq;
 	uint32_t gfx_actual_soft_max_freq;
+
+	uint32_t cpu_default_soft_min_freq;
+	uint32_t cpu_default_soft_max_freq;
+	uint32_t cpu_actual_soft_min_freq;
+	uint32_t cpu_actual_soft_max_freq;
+	uint32_t cpu_core_id_select;
 };

 struct i2c_adapter;
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_types.h b/drivers/gpu/drm/amd/pm/inc/smu_types.h
index 8e428c728e0e..b76270e8767c 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_types.h
@@ -237,6 +237,7 @@ enum smu_clk_type {
 	SMU_SCLK,
 	SMU_MCLK,
 	SMU_PCIE,
+	SMU_OD_CCLK,
 	SMU_OD_SCLK,
 	SMU_OD_MCLK,
 	SMU_OD_VDDC_CURVE,
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
index 63be82386964..b2b2955c1024 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
@@ -449,11 +449,22 @@ static int vangogh_print_fine_grain_clk(struct smu_context *smu,
 			(smu->gfx_actual_soft_max_freq > 0) ? smu->gfx_actual_soft_max_freq : smu->gfx_default_soft_max_freq);
 		}
 		break;
+	case SMU_OD_CCLK:
+		if (smu->od_enabled) {
+			size = sprintf(buf, "CCLK_RANGE in Core%d:\n", smu->cpu_core_id_select);
+			size += sprintf(buf + size, "0: %10uMhz\n",
+				(smu->cpu_actual_soft_min_freq > 0) ? smu->cpu_actual_soft_min_freq : smu->cpu_default_soft_min_freq);
+			size += sprintf(buf + size, "1: %10uMhz\n",
+				(smu->cpu_actual_soft_max_freq > 0) ? smu->cpu_actual_soft_max_freq : smu->cpu_default_soft_max_freq);
+		}
+		break;
 	case SMU_OD_RANGE:
 		if (smu->od_enabled) {
 			size = sprintf(buf, "%s:\n", "OD_RANGE");
 			size += sprintf(buf + size, "SCLK: %7uMhz %10uMhz\n",
 				smu->gfx_default_hard_min_freq, smu->gfx_default_soft_max_freq);
+			size += sprintf(buf + size, "CCLK: %7uMhz %10uMhz\n",
+				smu->cpu_default_soft_min_freq, smu->cpu_default_soft_max_freq);
 		}
 		break;
 	case SMU_SOCCLK:
@@ -1245,7
RE: [PATCH v2 2/7] drm/amd/pm: enhance the real response for smu message (v2)
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Evan Quan

-----Original Message-----
From: Huang, Ray
Sent: Monday, January 11, 2021 12:26 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Liu, Aaron; Du, Xiaojian; Hou, Xiaomeng (Matthew); Quan, Evan; Huang, Ray
Subject: [PATCH v2 2/7] drm/amd/pm: enhance the real response for smu message (v2)

The user prefers to know the real response value from the C2PMSG 90 register, which is written by firmware, not -EIO.

v2: return C2PMSG 90 value

Signed-off-by: Huang Rui
---
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index f8260769061c..59cf650efbd9 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -92,7 +92,7 @@ static int smu_cmn_wait_for_response(struct smu_context *smu)
 	for (i = 0; i < timeout; i++) {
 		cur_value = RREG32_SOC15_NO_KIQ(MP1, 0, mmMP1_SMN_C2PMSG_90);
 		if ((cur_value & MP1_C2PMSG_90__CONTENT_MASK) != 0)
-			return cur_value == 0x1 ? 0 : -EIO;
+			return cur_value;

 		udelay(1);
 	}
@@ -101,7 +101,7 @@ static int smu_cmn_wait_for_response(struct smu_context *smu)
 	if (i == timeout)
 		return -ETIME;

-	return RREG32_SOC15_NO_KIQ(MP1, 0, mmMP1_SMN_C2PMSG_90) == 0x1 ? 0 : -EIO;
+	return RREG32_SOC15_NO_KIQ(MP1, 0, mmMP1_SMN_C2PMSG_90);
 }

 int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
@@ -123,7 +123,7 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
 	mutex_lock(&smu->message_lock);
 	ret = smu_cmn_wait_for_response(smu);
-	if (ret) {
+	if (ret != 0x1) {
 		dev_err(adev->dev, "Msg issuing pre-check failed and "
 			"SMU may be not in the right state!\n");
 		goto out;
@@ -136,9 +136,9 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
 	smu_cmn_send_msg_without_waiting(smu, (uint16_t)index);
 	ret = smu_cmn_wait_for_response(smu);
-	if (ret) {
+	if (ret != 0x1) {
 		dev_err(adev->dev, "failed send message: %10s (%d) \tparam: 0x%08x response %#x\n",
-		       smu_get_message_name(smu, msg), index, param, ret);
+			smu_get_message_name(smu, msg), index, param, ret);
 		goto out;
 	}
--
2.25.1
[PATCH v2 7/7] drm/amd/pm: implement processor fine grain feature for vangogh (v2)
This patch implements the processor fine grain feature for vangogh. It's similar to the gfx clock; the only difference is below:

echo "p core_id level value" > pp_od_clk_voltage

1. "p" - set the cclk (processor) frequency
2. "core_id" - 0/1/2/3, selects which cpu core you want to adjust
3. "level" - 0 or 1, "0" represents the min value, "1" represents the max value
4. "value" - the target value of the cclk frequency; it should be limited to the safe range

v2: fix some missing changes as Evan's suggestion.

Signed-off-by: Huang Rui
---
 .../gpu/drm/amd/include/kgd_pp_interface.h    |  1 +
 drivers/gpu/drm/amd/pm/amdgpu_pm.c            |  3 +
 drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h       |  6 ++
 drivers/gpu/drm/amd/pm/inc/smu_types.h        |  1 +
 .../gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c  | 80 ++-
 5 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
index 57b24c4c205b..a41875ac5dfb 100644
--- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
@@ -156,6 +156,7 @@ enum {
 enum PP_OD_DPM_TABLE_COMMAND {
 	PP_OD_EDIT_SCLK_VDDC_TABLE,
 	PP_OD_EDIT_MCLK_VDDC_TABLE,
+	PP_OD_EDIT_CCLK_VDDC_TABLE,
 	PP_OD_EDIT_VDDC_CURVE,
 	PP_OD_RESTORE_DEFAULT_TABLE,
 	PP_OD_COMMIT_DPM_TABLE,
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index a5be03aa384b..75cefcb25a44 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -800,6 +800,8 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct device *dev,
 	if (*buf == 's')
 		type = PP_OD_EDIT_SCLK_VDDC_TABLE;
+	else if (*buf == 'p')
+		type = PP_OD_EDIT_CCLK_VDDC_TABLE;
 	else if (*buf == 'm')
 		type = PP_OD_EDIT_MCLK_VDDC_TABLE;
 	else if(*buf == 'r')
@@ -916,6 +918,7 @@ static ssize_t amdgpu_get_pp_od_clk_voltage(struct device *dev,
 		size += smu_print_clk_levels(&adev->smu, SMU_OD_VDDC_CURVE, buf+size);
 		size += smu_print_clk_levels(&adev->smu, SMU_OD_VDDGFX_OFFSET, buf+size);
 		size += smu_print_clk_levels(&adev->smu, SMU_OD_RANGE, buf+size);
+		size += smu_print_clk_levels(&adev->smu, SMU_OD_CCLK, buf+size);
 	} else if (adev->powerplay.pp_funcs->print_clock_levels) {
 		size = amdgpu_dpm_print_clock_levels(adev, OD_SCLK, buf);
 		size += amdgpu_dpm_print_clock_levels(adev, OD_MCLK, buf+size);
diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
index 97d788451624..25ee9f51813b 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
@@ -465,6 +465,12 @@ struct smu_context
 	uint32_t gfx_default_soft_max_freq;
 	uint32_t gfx_actual_hard_min_freq;
 	uint32_t gfx_actual_soft_max_freq;
+
+	uint32_t cpu_default_soft_min_freq;
+	uint32_t cpu_default_soft_max_freq;
+	uint32_t cpu_actual_soft_min_freq;
+	uint32_t cpu_actual_soft_max_freq;
+	uint32_t cpu_core_id_select;
 };

 struct i2c_adapter;
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_types.h b/drivers/gpu/drm/amd/pm/inc/smu_types.h
index 8e428c728e0e..b76270e8767c 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_types.h
@@ -237,6 +237,7 @@ enum smu_clk_type {
 	SMU_SCLK,
 	SMU_MCLK,
 	SMU_PCIE,
+	SMU_OD_CCLK,
 	SMU_OD_SCLK,
 	SMU_OD_MCLK,
 	SMU_OD_VDDC_CURVE,
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
index 63be82386964..b2b2955c1024 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
@@ -449,11 +449,22 @@ static int vangogh_print_fine_grain_clk(struct smu_context *smu,
 			(smu->gfx_actual_soft_max_freq > 0) ? smu->gfx_actual_soft_max_freq : smu->gfx_default_soft_max_freq);
 		}
 		break;
+	case SMU_OD_CCLK:
+		if (smu->od_enabled) {
+			size = sprintf(buf, "CCLK_RANGE in Core%d:\n", smu->cpu_core_id_select);
+			size += sprintf(buf + size, "0: %10uMhz\n",
+				(smu->cpu_actual_soft_min_freq > 0) ? smu->cpu_actual_soft_min_freq : smu->cpu_default_soft_min_freq);
+			size += sprintf(buf + size, "1: %10uMhz\n",
+				(smu->cpu_actual_soft_max_freq > 0) ? smu->cpu_actual_soft_max_freq : smu->cpu_default_soft_max_freq);
+		}
+		break;
 	case SMU_OD_RANGE:
 		if (smu->od_enabled) {
 			size = sprintf(buf, "%s:\n", "OD_RANGE");
 			size += sprintf(buf + size, "SCLK: %7uMhz %10uMhz\n",
[PATCH v2 2/7] drm/amd/pm: enhance the real response for smu message (v2)
The user prefers to know the real response value from the C2PMSG 90 register, which is written by firmware, not -EIO.

v2: return C2PMSG 90 value

Signed-off-by: Huang Rui
---
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index f8260769061c..59cf650efbd9 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -92,7 +92,7 @@ static int smu_cmn_wait_for_response(struct smu_context *smu)
 	for (i = 0; i < timeout; i++) {
 		cur_value = RREG32_SOC15_NO_KIQ(MP1, 0, mmMP1_SMN_C2PMSG_90);
 		if ((cur_value & MP1_C2PMSG_90__CONTENT_MASK) != 0)
-			return cur_value == 0x1 ? 0 : -EIO;
+			return cur_value;

 		udelay(1);
 	}
@@ -101,7 +101,7 @@ static int smu_cmn_wait_for_response(struct smu_context *smu)
 	if (i == timeout)
 		return -ETIME;

-	return RREG32_SOC15_NO_KIQ(MP1, 0, mmMP1_SMN_C2PMSG_90) == 0x1 ? 0 : -EIO;
+	return RREG32_SOC15_NO_KIQ(MP1, 0, mmMP1_SMN_C2PMSG_90);
 }

 int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
@@ -123,7 +123,7 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
 	mutex_lock(&smu->message_lock);
 	ret = smu_cmn_wait_for_response(smu);
-	if (ret) {
+	if (ret != 0x1) {
 		dev_err(adev->dev, "Msg issuing pre-check failed and "
 			"SMU may be not in the right state!\n");
 		goto out;
@@ -136,9 +136,9 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
 	smu_cmn_send_msg_without_waiting(smu, (uint16_t)index);
 	ret = smu_cmn_wait_for_response(smu);
-	if (ret) {
+	if (ret != 0x1) {
 		dev_err(adev->dev, "failed send message: %10s (%d) \tparam: 0x%08x response %#x\n",
-		       smu_get_message_name(smu, msg), index, param, ret);
+			smu_get_message_name(smu, msg), index, param, ret);
 		goto out;
 	}
--
2.25.1
RE: [PATCH] drm/amdgpu: fix issue when 2 ring job timeout
Hi Horace,

The XGMI part should already be well protected by hive->hive_lock. So, I think you need the non-XGMI part only. Also, it seems better to place the modifications within the hive check.

	/*
	 * Here we trylock to avoid chain of resets executing from
	 * either trigger by jobs on different adevs in XGMI hive or jobs on
	 * different schedulers for same device while this TO handler is running.
	 * We always reset all schedulers for device and all devices for XGMI
	 * hive so that should take care of them too.
	 */
	hive = amdgpu_get_xgmi_hive(adev);
	if (hive) {
		if (atomic_cmpxchg(&hive->in_reset, 0, 1) != 0) {
			DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress",
				 job ? job->base.id : -1, hive->hive_id);
			amdgpu_put_xgmi_hive(hive);
			return 0;
		}
		mutex_lock(&hive->hive_lock);
	} else {
+		/* if current dev is already in reset, skip adding list to prevent race issue */
+		if (!amdgpu_device_lock_adev(adev, hive)) {
+			dev_info(adev->dev, "Bailing on TDR for s_job:%llx, as another already in progress",
+				 job ? job->base.id : -1);
+			r = 0;
+			goto skip_recovery;
+		}
	}

BR
Evan

-----Original Message-----
From: amd-gfx On Behalf Of Chen, Horace
Sent: Friday, January 8, 2021 7:35 PM
To: Chen, Horace; amd-gfx@lists.freedesktop.org
Cc: Xiao, Jack; Xu, Feifei; Wang, Kevin(Yang); Tuikov, Luben; Deucher, Alexander; Koenig, Christian; Liu, Monk; Zhang, Hawking
Subject: RE: [PATCH] drm/amdgpu: fix issue when 2 ring job timeout

[AMD Public Use]

Hi Christian,

Can you help review this change?

This issue happens when 2 jobs on 2 schedulers time out at the same time, which leads 2 threads to enter amdgpu_device_gpu_recover() at the same time.

The problem is that if the device is not an XGMI node, the adev->gmc.xgmi.head will be added to device_list, which is a stack variable. So the first thread will get the device into its device list and start to iterate; meanwhile the second thread may rob the device away from the first thread and add it to its own device list. This will cause the first thread to get into a bad state in its iteration.

The solution is to lock the device early, before we add the device to the local device list.

Thanks & Regards,
Horace.

-----Original Message-----
From: Horace Chen
Sent: Wednesday, January 6, 2021 8:43 PM
To: amd-gfx@lists.freedesktop.org
Cc: Chen, Horace; Tuikov, Luben; Koenig, Christian; Deucher, Alexander; Xiao, Jack; Zhang, Hawking; Liu, Monk; Xu, Feifei; Wang, Kevin(Yang); Xiaojie Yuan
Subject: [PATCH] drm/amdgpu: fix issue when 2 ring job timeout

Fix a racing issue when 2 rings' jobs time out simultaneously.

If 2 rings time out at the same time, amdgpu_device_gpu_recover will be reentered. Then the adev->gmc.xgmi.head will be grabbed by 2 local linked lists, which may cause a wild pointer issue while iterating.

Lock the device early to prevent the node from being added to 2 different lists.

Signed-off-by: Horace Chen
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 25 --
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 9a3cb98d03be..233dae27c8eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4620,23 +4620,34 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 	if (adev->gmc.xgmi.num_physical_nodes > 1) {
 		if (!hive)
 			return -ENODEV;
+
+		list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) {
+			if (!amdgpu_device_lock_adev(tmp_adev, hive)) {
+				dev_info(tmp_adev->dev, "Bailing on TDR for s_job:%llx, as another already in progress",
+					 job ? job->base.id : -1);
+				r = 0;
+				goto skip_recovery;
+			}
+		}
+
 		if (!list_is_first(&adev->gmc.xgmi.head, &hive->device_list))
 			list_rotate_to_front(&adev->gmc.xgmi.head, &hive->device_list);
 		device_list_handle = &hive->device_list;
 	} else {
+		/* if current dev is already in reset, skip adding list to prevent race issue */
+		if (!amdgpu_device_lock_adev(adev, hive)) {
+			dev_info(adev->dev, "Bailing on TDR for s_job:%llx, as another already in progress",
+				 job ? job->base.id : -1);
+			r = 0;
+
Re: [PATCH 06/21] drm/amd/display: Remove HUBP_DISABLE from default
Hi,

After a discussion, we decided to drop this patch from the weekly promotion. Please, don't apply this change to amd-staging-drm-next.

Thanks

On 01/08, Rodrigo Siqueira wrote:
> From: Wesley Chalmers
>
> [WHY]
> HW team plans to rename HUBP_DISABLE to HUBP_SOFT_RESET in future HW
> revisions. Those future revisions should not inherit the HUBP_DISABLE
> name.
>
> Signed-off-by: Wesley Chalmers
> Reviewed-by: Aric Cyr
> Acked-by: Rodrigo Siqueira
> ---
>  .../gpu/drm/amd/display/dc/dcn10/dcn10_hubp.h |  2 +-
>  .../gpu/drm/amd/display/dc/dcn20/dcn20_hubp.h | 22 ++-
>  2 files changed, 18 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.h b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.h
> index a9a6ed7f4f99..80794fed6e20 100644
> --- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.h
> +++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.h
> @@ -450,7 +450,6 @@
>
>  #define DCN_HUBP_REG_FIELD_BASE_LIST(type) \
>  	type HUBP_BLANK_EN;\
> -	type HUBP_DISABLE;\
>  	type HUBP_TTU_DISABLE;\
>  	type HUBP_NO_OUTSTANDING_REQ;\
>  	type HUBP_VTG_SEL;\
> @@ -644,6 +643,7 @@
>
>  #define DCN_HUBP_REG_FIELD_LIST(type) \
>  	DCN_HUBP_REG_FIELD_BASE_LIST(type);\
> +	type HUBP_DISABLE;\
>  	type ALPHA_PLANE_EN
>
>  struct dcn_mi_registers {
> diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.h b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.h
> index f501c02c244b..98ec1f9171b6 100644
> --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.h
> +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.h
> @@ -161,7 +161,7 @@
>  	DCN21_HUBP_REG_COMMON_VARIABLE_LIST;\
>  	uint32_t DCN_DMDATA_VM_CNTL
>
> -#define DCN2_HUBP_REG_FIELD_VARIABLE_LIST(type) \
> +#define DCN2_HUBP_REG_FIELD_VARIABLE_LIST_COMMON(type) \
>  	DCN_HUBP_REG_FIELD_BASE_LIST(type); \
>  	type DMDATA_ADDRESS_HIGH;\
>  	type DMDATA_MODE;\
> @@ -186,8 +186,12 @@
>  	type SURFACE_TRIPLE_BUFFER_ENABLE;\
>  	type VMID
>
> -#define DCN21_HUBP_REG_FIELD_VARIABLE_LIST(type) \
> -	DCN2_HUBP_REG_FIELD_VARIABLE_LIST(type);\
> +#define DCN2_HUBP_REG_FIELD_VARIABLE_LIST(type) \
> +	DCN2_HUBP_REG_FIELD_VARIABLE_LIST_COMMON(type); \
> +	type HUBP_DISABLE
> +
> +#define DCN21_HUBP_REG_FIELD_VARIABLE_LIST_COMMON(type) \
> +	DCN2_HUBP_REG_FIELD_VARIABLE_LIST_COMMON(type);\
>  	type REFCYC_PER_VM_GROUP_FLIP;\
>  	type REFCYC_PER_VM_REQ_FLIP;\
>  	type REFCYC_PER_VM_GROUP_VBLANK;\
> @@ -196,8 +200,12 @@
>  	type REFCYC_PER_META_CHUNK_FLIP_C; \
>  	type VM_GROUP_SIZE
>
> -#define DCN30_HUBP_REG_FIELD_VARIABLE_LIST(type) \
> -	DCN21_HUBP_REG_FIELD_VARIABLE_LIST(type);\
> +#define DCN21_HUBP_REG_FIELD_VARIABLE_LIST(type) \
> +	DCN21_HUBP_REG_FIELD_VARIABLE_LIST_COMMON(type);\
> +	type HUBP_DISABLE
> +
> +#define DCN30_HUBP_REG_FIELD_VARIABLE_LIST_COMMON(type) \
> +	DCN21_HUBP_REG_FIELD_VARIABLE_LIST_COMMON(type);\
>  	type PRIMARY_SURFACE_DCC_IND_BLK;\
>  	type SECONDARY_SURFACE_DCC_IND_BLK;\
>  	type PRIMARY_SURFACE_DCC_IND_BLK_C;\
> @@ -216,6 +224,10 @@
>  	type ROW_TTU_MODE; \
>  	type NUM_PKRS
>
> +#define DCN30_HUBP_REG_FIELD_VARIABLE_LIST(type) \
> +	DCN30_HUBP_REG_FIELD_VARIABLE_LIST_COMMON(type);\
> +	type HUBP_DISABLE
> +
>  struct dcn_hubp2_registers {
>  	DCN30_HUBP_REG_COMMON_VARIABLE_LIST;
>  };
> --
> 2.25.1
>

--
Rodrigo Siqueira
https://siqueira.tech
[drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
Hi folks,

Today I joined testing kernel 5.11 and saw that the kernel log was flooded with BUG messages:

BUG: sleeping function called from invalid context at mm/vmalloc.c:1756
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 266, name: kswapd0
INFO: lockdep is turned off.
CPU: 15 PID: 266 Comm: kswapd0 Tainted: G W 5.11.0-0.rc2.20210108gitf5e6c330254a.119.fc34.x86_64 #1
Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 2802 10/21/2020
Call Trace:
 dump_stack+0x8b/0xb0
 ___might_sleep.cold+0xb6/0xc6
 vm_unmap_aliases+0x21/0x40
 change_page_attr_set_clr+0x9e/0x190
 set_memory_wb+0x2f/0x80
 ttm_pool_free_page+0x28/0x90 [ttm]
 ttm_pool_shrink+0x45/0xb0 [ttm]
 ttm_pool_shrinker_scan+0xa/0x20 [ttm]
 do_shrink_slab+0x177/0x3a0
 shrink_slab+0x9c/0x290
 shrink_node+0x2e6/0x700
 balance_pgdat+0x2f5/0x650
 kswapd+0x21d/0x4d0
 ? do_wait_intr_irq+0xd0/0xd0
 ? balance_pgdat+0x650/0x650
 kthread+0x13a/0x150
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30

But the most unpleasant thing is that after a while the monitor turns off and does not come on again until a restart. This is accompanied by an entry in the kernel log:

amdgpu :0b:00.0: amdgpu: ff7d8b94 pin failed
[drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

$ grep "Failed to pin framebuffer with error" -Rn .
./drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c:5816: DRM_ERROR("Failed to pin framebuffer with error %d\n", r);

$ git blame -L 5811,5821 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
5d43be0ccbc2f (Christian König 2017-10-26 18:06:23 +0200 5811) 		domain = AMDGPU_GEM_DOMAIN_VRAM;
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5812)
7b7c6c81b3a37 (Junwei Zhang    2018-06-25 12:51:14 +0800 5813) 	r = amdgpu_bo_pin(rbo, domain);
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5814) 	if (unlikely(r != 0)) {
30b7c6147d18d (Harry Wentland  2017-10-26 15:35:14 -0400 5815) 		if (r != -ERESTARTSYS)
30b7c6147d18d (Harry Wentland  2017-10-26 15:35:14 -0400 5816) 			DRM_ERROR("Failed to pin framebuffer with error %d\n", r);
0f257b09531b4 (Chunming Zhou   2019-05-07 19:45:31 +0800 5817) 		ttm_eu_backoff_reservation(, );
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5818) 		return r;
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5819) 	}
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5820)
bb812f1ea87dd (Junwei Zhang    2018-06-25 13:32:24 +0800 5821) 	r = amdgpu_ttm_alloc_gart(&rbo->tbo);

Who knows how to fix it?

Full kernel logs are here:
[1] https://pastebin.com/fLasjDHX
[2] https://pastebin.com/g3wR2r9e

--
Best Regards,
Mike Gavrilov.