RE: [PATCH v2] drm/amdgpu:Limit the resolution for virtual_display

2021-01-10 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Yes, it just won't report modes wider than 16384 to user space, as they won't
work properly.

Best wishes
Emily Deng



>-Original Message-
>From: Alex Deucher 
>Sent: Friday, January 8, 2021 11:14 PM
>To: Deng, Emily 
>Cc: amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH v2] drm/amdgpu:Limit the resolution for virtual_display
>
>On Thu, Jan 7, 2021 at 8:45 PM Deng, Emily  wrote:
>>
>> [AMD Official Use Only - Internal Distribution Only]
>>
>> Ping ..
>>
>
>It's not clear what problem this solves.
>
>Alex
>
>
>> Best wishes
>> Emily Deng
>>
>>
>>
>> >-Original Message-
>> >From: Emily Deng 
>> >Sent: Thursday, January 7, 2021 11:29 AM
>> >To: amd-gfx@lists.freedesktop.org
>> >Cc: Deng, Emily 
>> >Subject: [PATCH v2] drm/amdgpu:Limit the resolution for
>> >virtual_display
>> >
>> >From: "Emily.Deng" 
>> >
>> >Limit the resolution so that it is not bigger than 16384, which means
>> >dev->mode_info.num_crtc * common_modes[i].w must not exceed 16384.
>> >
>> >v2:
>> >  Refine the code
>> >
>> >Signed-off-by: Emily.Deng 
>> >---
>> > drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 7 +--
>> > 1 file changed, 5 insertions(+), 2 deletions(-)
>> >
>> >diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> >b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> >index 2b16c8faca34..fd2b3a6dfd60 100644
>> >--- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> >+++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>> >@@ -319,6 +319,7 @@ dce_virtual_encoder(struct drm_connector *connector)
>> > static int dce_virtual_get_modes(struct drm_connector *connector)
>> > {
>> > 	struct drm_device *dev = connector->dev;
>> >+	struct amdgpu_device *adev = dev->dev_private;
>> > 	struct drm_display_mode *mode = NULL;
>> > 	unsigned i;
>> > 	static const struct mode_size {
>> >@@ -350,8 +351,10 @@ static int dce_virtual_get_modes(struct drm_connector *connector)
>> > 	};
>> >
>> > 	for (i = 0; i < ARRAY_SIZE(common_modes); i++) {
>> >-		mode = drm_cvt_mode(dev, common_modes[i].w, common_modes[i].h, 60, false, false, false);
>> >-		drm_mode_probed_add(connector, mode);
>> >+		if (adev->mode_info.num_crtc * common_modes[i].w <= 16384) {
>> >+			mode = drm_cvt_mode(dev, common_modes[i].w, common_modes[i].h, 60, false, false, false);
>> >+			drm_mode_probed_add(connector, mode);
>> >+		}
>> > 	}
>> >
>> > 	return 0;
>> >--
>> >2.25.1
>>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH v2] drm/amdgpu: Decrease compute timeout to 10 s for sriov multiple VF

2021-01-10 Thread Deng, Emily
[AMD Official Use Only - Internal Distribution Only]

Ping .

>-Original Message-
>From: Emily Deng 
>Sent: Thursday, January 7, 2021 10:51 AM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily 
>Subject: [PATCH v2] drm/amdgpu: Decrease compute timeout to 10 s for sriov
>multiple VF
>
>From: "Emily.Deng" 
>
>For multiple VF, after an engine hang the host driver will trigger an FLR
>first, so it is meaningless to set the compute timeout to 60s.
>
>v2:
>   Refine the patch and comment
>
>Signed-off-by: Emily.Deng 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 -
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 5527c549db82..35edf58c825d 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -3133,7 +3133,10 @@ static int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
>	 */
>	adev->gfx_timeout = msecs_to_jiffies(10000);
>	adev->sdma_timeout = adev->video_timeout = adev->gfx_timeout;
>-	if (amdgpu_sriov_vf(adev) || amdgpu_passthrough(adev))
>+	if (amdgpu_sriov_vf(adev))
>+		adev->compute_timeout = amdgpu_sriov_is_pp_one_vf(adev) ?
>+				msecs_to_jiffies(60000) : msecs_to_jiffies(10000);
>+	else if (amdgpu_passthrough(adev))
> 		adev->compute_timeout =  msecs_to_jiffies(60000);
> 	else
> 		adev->compute_timeout = MAX_SCHEDULE_TIMEOUT;
>--
>2.25.1



RE: [PATCH v2 7/7] drm/amd/pm: implement processor fine grain feature for vangogh (v2)

2021-01-10 Thread Quan, Evan
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Evan Quan 

-Original Message-
From: Huang, Ray 
Sent: Monday, January 11, 2021 12:26 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Liu, Aaron 
; Du, Xiaojian ; Hou, Xiaomeng 
(Matthew) ; Quan, Evan ; Huang, Ray 

Subject: [PATCH v2 7/7] drm/amd/pm: implement processor fine grain feature for 
vangogh (v2)

This patch is to implement the processor fine grain feature for vangogh.
It's similar to the gfx clock; the only differences are below:

echo "p core_id level value" > pp_od_clk_voltage

1. "p" - set the cclk (processor) frequency
2. "core_id" - 0/1/2/3, represents which cpu core you want to select
3. "level" - 0 or 1, "0" represents the min value, "1" represents the
   max value
4. "value" - the target value of cclk frequency, it should be limited in
   the safe range

v2: fix some missing changes per Evan's suggestion.

Signed-off-by: Huang Rui 
---
 .../gpu/drm/amd/include/kgd_pp_interface.h|  1 +
 drivers/gpu/drm/amd/pm/amdgpu_pm.c|  3 +
 drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h   |  6 ++
 drivers/gpu/drm/amd/pm/inc/smu_types.h|  1 +
 .../gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c  | 80 ++-
 5 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h 
b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
index 57b24c4c205b..a41875ac5dfb 100644
--- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
@@ -156,6 +156,7 @@ enum {
 enum PP_OD_DPM_TABLE_COMMAND {
 PP_OD_EDIT_SCLK_VDDC_TABLE,
 PP_OD_EDIT_MCLK_VDDC_TABLE,
+PP_OD_EDIT_CCLK_VDDC_TABLE,
 PP_OD_EDIT_VDDC_CURVE,
 PP_OD_RESTORE_DEFAULT_TABLE,
 PP_OD_COMMIT_DPM_TABLE,
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index a5be03aa384b..75cefcb25a44 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -800,6 +800,8 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct device 
*dev,

 if (*buf == 's')
 type = PP_OD_EDIT_SCLK_VDDC_TABLE;
+else if (*buf == 'p')
+type = PP_OD_EDIT_CCLK_VDDC_TABLE;
 else if (*buf == 'm')
 type = PP_OD_EDIT_MCLK_VDDC_TABLE;
 else if(*buf == 'r')
@@ -916,6 +918,7 @@ static ssize_t amdgpu_get_pp_od_clk_voltage(struct device 
*dev,
 size += smu_print_clk_levels(&adev->smu, SMU_OD_VDDC_CURVE, buf+size);
 size += smu_print_clk_levels(&adev->smu, SMU_OD_VDDGFX_OFFSET, buf+size);
 size += smu_print_clk_levels(&adev->smu, SMU_OD_RANGE, buf+size);
+size += smu_print_clk_levels(&adev->smu, SMU_OD_CCLK, buf+size);
 } else if (adev->powerplay.pp_funcs->print_clock_levels) {
 size = amdgpu_dpm_print_clock_levels(adev, OD_SCLK, buf);
 size += amdgpu_dpm_print_clock_levels(adev, OD_MCLK, buf+size); diff --git 
a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
index 97d788451624..25ee9f51813b 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
@@ -465,6 +465,12 @@ struct smu_context
 uint32_t gfx_default_soft_max_freq;
 uint32_t gfx_actual_hard_min_freq;
 uint32_t gfx_actual_soft_max_freq;
+
+uint32_t cpu_default_soft_min_freq;
+uint32_t cpu_default_soft_max_freq;
+uint32_t cpu_actual_soft_min_freq;
+uint32_t cpu_actual_soft_max_freq;
+uint32_t cpu_core_id_select;
 };

 struct i2c_adapter;
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_types.h 
b/drivers/gpu/drm/amd/pm/inc/smu_types.h
index 8e428c728e0e..b76270e8767c 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_types.h
@@ -237,6 +237,7 @@ enum smu_clk_type {
 SMU_SCLK,
 SMU_MCLK,
 SMU_PCIE,
+SMU_OD_CCLK,
 SMU_OD_SCLK,
 SMU_OD_MCLK,
 SMU_OD_VDDC_CURVE,
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
index 63be82386964..b2b2955c1024 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
@@ -449,11 +449,22 @@ static int vangogh_print_fine_grain_clk(struct 
smu_context *smu,
 (smu->gfx_actual_soft_max_freq > 0) ? smu->gfx_actual_soft_max_freq : 
smu->gfx_default_soft_max_freq);
 }
 break;
+case SMU_OD_CCLK:
+if (smu->od_enabled) {
+size = sprintf(buf, "CCLK_RANGE in Core%d:\n",  smu->cpu_core_id_select);
+size += sprintf(buf + size, "0: %10uMhz\n",
+(smu->cpu_actual_soft_min_freq > 0) ? smu->cpu_actual_soft_min_freq : 
smu->cpu_default_soft_min_freq);
+size += sprintf(buf + size, "1: %10uMhz\n",
+(smu->cpu_actual_soft_max_freq > 0) ? smu->cpu_actual_soft_max_freq : 
smu->cpu_default_soft_max_freq);
+}
+break;
 case SMU_OD_RANGE:
 if (smu->od_enabled) {
 size = sprintf(buf, "%s:\n", "OD_RANGE");
 size += sprintf(buf + size, "SCLK: %7uMhz %10uMhz\n",
 smu->gfx_default_hard_min_freq, smu->gfx_default_soft_max_freq);
+size += sprintf(buf + size, "CCLK: %7uMhz %10uMhz\n",
+smu->cpu_default_soft_min_freq, smu->cpu_default_soft_max_freq);
 }
 break;
 case SMU_SOCCLK:
@@ -1245,7 

RE: [PATCH v2 2/7] drm/amd/pm: enhance the real response for smu message (v2)

2021-01-10 Thread Quan, Evan
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Evan Quan 

-Original Message-
From: Huang, Ray 
Sent: Monday, January 11, 2021 12:26 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Liu, Aaron 
; Du, Xiaojian ; Hou, Xiaomeng 
(Matthew) ; Quan, Evan ; Huang, Ray 

Subject: [PATCH v2 2/7] drm/amd/pm: enhance the real response for smu message 
(v2)




[PATCH v2 7/7] drm/amd/pm: implement processor fine grain feature for vangogh (v2)

2021-01-10 Thread Huang Rui
This patch is to implement the processor fine grain feature for vangogh.
It's similar to the gfx clock; the only differences are below:

echo "p core_id level value" > pp_od_clk_voltage

1. "p" - set the cclk (processor) frequency
2. "core_id" - 0/1/2/3, represents which cpu core you want to select
3. "level" - 0 or 1, "0" represents the min value, "1" represents the
   max value
4. "value" - the target value of cclk frequency, it should be limited in
   the safe range

v2: fix some missing changes per Evan's suggestion.

Signed-off-by: Huang Rui 
---
 .../gpu/drm/amd/include/kgd_pp_interface.h|  1 +
 drivers/gpu/drm/amd/pm/amdgpu_pm.c|  3 +
 drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h   |  6 ++
 drivers/gpu/drm/amd/pm/inc/smu_types.h|  1 +
 .../gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c  | 80 ++-
 5 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h 
b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
index 57b24c4c205b..a41875ac5dfb 100644
--- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
@@ -156,6 +156,7 @@ enum {
 enum PP_OD_DPM_TABLE_COMMAND {
PP_OD_EDIT_SCLK_VDDC_TABLE,
PP_OD_EDIT_MCLK_VDDC_TABLE,
+   PP_OD_EDIT_CCLK_VDDC_TABLE,
PP_OD_EDIT_VDDC_CURVE,
PP_OD_RESTORE_DEFAULT_TABLE,
PP_OD_COMMIT_DPM_TABLE,
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index a5be03aa384b..75cefcb25a44 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -800,6 +800,8 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct device 
*dev,
 
if (*buf == 's')
type = PP_OD_EDIT_SCLK_VDDC_TABLE;
+   else if (*buf == 'p')
+   type = PP_OD_EDIT_CCLK_VDDC_TABLE;
else if (*buf == 'm')
type = PP_OD_EDIT_MCLK_VDDC_TABLE;
else if(*buf == 'r')
@@ -916,6 +918,7 @@ static ssize_t amdgpu_get_pp_od_clk_voltage(struct device 
*dev,
	size += smu_print_clk_levels(&adev->smu, SMU_OD_VDDC_CURVE, buf+size);
	size += smu_print_clk_levels(&adev->smu, SMU_OD_VDDGFX_OFFSET, buf+size);
	size += smu_print_clk_levels(&adev->smu, SMU_OD_RANGE, buf+size);
+	size += smu_print_clk_levels(&adev->smu, SMU_OD_CCLK, buf+size);
} else if (adev->powerplay.pp_funcs->print_clock_levels) {
size = amdgpu_dpm_print_clock_levels(adev, OD_SCLK, buf);
size += amdgpu_dpm_print_clock_levels(adev, OD_MCLK, buf+size);
diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
index 97d788451624..25ee9f51813b 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
@@ -465,6 +465,12 @@ struct smu_context
uint32_t gfx_default_soft_max_freq;
uint32_t gfx_actual_hard_min_freq;
uint32_t gfx_actual_soft_max_freq;
+
+   uint32_t cpu_default_soft_min_freq;
+   uint32_t cpu_default_soft_max_freq;
+   uint32_t cpu_actual_soft_min_freq;
+   uint32_t cpu_actual_soft_max_freq;
+   uint32_t cpu_core_id_select;
 };
 
 struct i2c_adapter;
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_types.h 
b/drivers/gpu/drm/amd/pm/inc/smu_types.h
index 8e428c728e0e..b76270e8767c 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_types.h
@@ -237,6 +237,7 @@ enum smu_clk_type {
SMU_SCLK,
SMU_MCLK,
SMU_PCIE,
+   SMU_OD_CCLK,
SMU_OD_SCLK,
SMU_OD_MCLK,
SMU_OD_VDDC_CURVE,
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
index 63be82386964..b2b2955c1024 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
@@ -449,11 +449,22 @@ static int vangogh_print_fine_grain_clk(struct 
smu_context *smu,
(smu->gfx_actual_soft_max_freq > 0) ? 
smu->gfx_actual_soft_max_freq : smu->gfx_default_soft_max_freq);
}
break;
+   case SMU_OD_CCLK:
+   if (smu->od_enabled) {
+   size = sprintf(buf, "CCLK_RANGE in Core%d:\n",  
smu->cpu_core_id_select);
+   size += sprintf(buf + size, "0: %10uMhz\n",
+   (smu->cpu_actual_soft_min_freq > 0) ? 
smu->cpu_actual_soft_min_freq : smu->cpu_default_soft_min_freq);
+   size += sprintf(buf + size, "1: %10uMhz\n",
+   (smu->cpu_actual_soft_max_freq > 0) ? 
smu->cpu_actual_soft_max_freq : smu->cpu_default_soft_max_freq);
+   }
+   break;
case SMU_OD_RANGE:
if (smu->od_enabled) {
size = sprintf(buf, "%s:\n", "OD_RANGE");
size += sprintf(buf + size, "SCLK: %7uMhz %10uMhz\n",
 

[PATCH v2 2/7] drm/amd/pm: enhance the real response for smu message (v2)

2021-01-10 Thread Huang Rui
The user prefers to know the real response value from the C2PMSG 90 register,
which is written by the firmware, rather than -EIO.

v2: return C2PMSG 90 value

Signed-off-by: Huang Rui 
---
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index f8260769061c..59cf650efbd9 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -92,7 +92,7 @@ static int smu_cmn_wait_for_response(struct smu_context *smu)
for (i = 0; i < timeout; i++) {
cur_value = RREG32_SOC15_NO_KIQ(MP1, 0, mmMP1_SMN_C2PMSG_90);
if ((cur_value & MP1_C2PMSG_90__CONTENT_MASK) != 0)
-   return cur_value == 0x1 ? 0 : -EIO;
+   return cur_value;
 
udelay(1);
}
@@ -101,7 +101,7 @@ static int smu_cmn_wait_for_response(struct smu_context 
*smu)
if (i == timeout)
return -ETIME;
 
-	return RREG32_SOC15_NO_KIQ(MP1, 0, mmMP1_SMN_C2PMSG_90) == 0x1 ? 0 : -EIO;
+   return RREG32_SOC15_NO_KIQ(MP1, 0, mmMP1_SMN_C2PMSG_90);
 }
 
 int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
@@ -123,7 +123,7 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
 
	mutex_lock(&smu->message_lock);
ret = smu_cmn_wait_for_response(smu);
-   if (ret) {
+   if (ret != 0x1) {
dev_err(adev->dev, "Msg issuing pre-check failed and "
   "SMU may be not in the right state!\n");
goto out;
@@ -136,9 +136,9 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
smu_cmn_send_msg_without_waiting(smu, (uint16_t)index);
 
ret = smu_cmn_wait_for_response(smu);
-   if (ret) {
+   if (ret != 0x1) {
dev_err(adev->dev, "failed send message: %10s (%d) \tparam: 
0x%08x response %#x\n",
-  smu_get_message_name(smu, msg), index, param, ret);
+   smu_get_message_name(smu, msg), index, param, ret);
goto out;
}
 
-- 
2.25.1



RE: [PATCH] drm/amdgpu: fix issue when 2 ring job timeout

2021-01-10 Thread Quan, Evan
Hi Horace,

The XGMI part should be already well protected by hive->hive_lock. So, I think 
you need the non-XGMI part only.
Also, it seems better to place the modifications within the hive check.
/*
 * Here we trylock to avoid chain of resets executing from
 * either trigger by jobs on different adevs in XGMI hive or jobs on
 * different schedulers for same device while this TO handler is 
running.
 * We always reset all schedulers for device and all devices for XGMI
 * hive so that should take care of them too.
 */
hive = amdgpu_get_xgmi_hive(adev);
if (hive) {
if (atomic_cmpxchg(&hive->in_reset, 0, 1) != 0) {
DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as 
another already in progress",
job ? job->base.id : -1, hive->hive_id);
amdgpu_put_xgmi_hive(hive);
return 0;
}
mutex_lock(>hive_lock);
} else {
+   /* if current dev is already in reset, skip adding list to 
prevent race issue */
+   if (!amdgpu_device_lock_adev(adev, hive)) {
+   dev_info(adev->dev, "Bailing on TDR for s_job:%llx, as 
another already in progress",
+   job ? job->base.id : -1);
+   r = 0;
+   goto skip_recovery;
+   }
}

BR
Evan
-Original Message-
From: amd-gfx  On Behalf Of Chen, Horace
Sent: Friday, January 8, 2021 7:35 PM
To: Chen, Horace ; amd-gfx@lists.freedesktop.org
Cc: Xiao, Jack ; Xu, Feifei ; Wang, 
Kevin(Yang) ; Tuikov, Luben ; 
Deucher, Alexander ; Koenig, Christian 
; Liu, Monk ; Zhang, Hawking 

Subject: RE: [PATCH] drm/amdgpu: fix issue when 2 ring job timeout

[AMD Public Use]

Hi Christian,

Can you help review this change?

This issue happens when 2 jobs on 2 schedulers time out at the same time, which 
leads 2 threads to enter amdgpu_device_gpu_recover() simultaneously. The 
problem is that if the device is not an XGMI node, the adev->gmc.xgmi.head will 
be added to device_list, which is a stack variable.
So the first thread will get the device into its device list and start to 
iterate; meanwhile the second thread may rob the device away from the first 
thread and add it to its own device list. This puts the first thread into a bad 
state during its iteration.

The solution is to lock the device early, before we add the device to the 
local device list.

Thanks & Regards,
Horace.

-Original Message-
From: Horace Chen  
Sent: Wednesday, January 6, 2021 8:43 PM
To: amd-gfx@lists.freedesktop.org
Cc: Chen, Horace ; Tuikov, Luben ; 
Koenig, Christian ; Deucher, Alexander 
; Xiao, Jack ; Zhang, Hawking 
; Liu, Monk ; Xu, Feifei 
; Wang, Kevin(Yang) ; Xiaojie Yuan 

Subject: [PATCH] drm/amdgpu: fix issue when 2 ring job timeout

Fix a race issue when 2 rings' jobs time out simultaneously.

If 2 rings time out at the same time,
amdgpu_device_gpu_recover() will be reentered. Then the
adev->gmc.xgmi.head will be grabbed by 2 local linked lists,
which may cause a wild pointer issue while iterating.

Lock the device early to prevent the node from being added to 2
different lists.

Signed-off-by: Horace Chen 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 25 --
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 9a3cb98d03be..233dae27c8eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4620,23 +4620,34 @@ int amdgpu_device_gpu_recover(struct amdgpu_device 
*adev,
if (adev->gmc.xgmi.num_physical_nodes > 1) {
if (!hive)
return -ENODEV;
+
+	list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) {
+   if (!amdgpu_device_lock_adev(tmp_adev, hive)) {
+   dev_info(tmp_adev->dev, "Bailing on TDR for 
s_job:%llx, as another already in progress",
+   job ? job->base.id : -1);
+   r = 0;
+   goto skip_recovery;
+   }
+   }
+
	if (!list_is_first(&adev->gmc.xgmi.head, &hive->device_list))
		list_rotate_to_front(&adev->gmc.xgmi.head, &hive->device_list);
device_list_handle = >device_list;
} else {
+   /* if current dev is already in reset, skip adding list to 
prevent race issue */
+   if (!amdgpu_device_lock_adev(adev, hive)) {
+   dev_info(adev->dev, "Bailing on TDR for s_job:%llx, as 
another already in progress",
+   job ? job->base.id : -1);
+   r = 0;
+   goto skip_recovery;
+   

Re: [PATCH 06/21] drm/amd/display: Remove HUBP_DISABLE from default

2021-01-10 Thread Rodrigo Siqueira
Hi,

After a discussion, we decided to drop this patch from the weekly
promotion. Please, don't apply this change to amd-staging-drm-next.

Thanks

On 01/08, Rodrigo Siqueira wrote:
> From: Wesley Chalmers 
> 
> [WHY]
> HW team plans to rename HUBP_DISABLE to HUBP_SOFT_RESET in future HW
> revisions. Those future revisions should not inherit the HUBP_DISABLE
> name.
> 
> Signed-off-by: Wesley Chalmers 
> Reviewed-by: Aric Cyr 
> Acked-by: Rodrigo Siqueira 
> ---
>  .../gpu/drm/amd/display/dc/dcn10/dcn10_hubp.h |  2 +-
>  .../gpu/drm/amd/display/dc/dcn20/dcn20_hubp.h | 22 ++-
>  2 files changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.h 
> b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.h
> index a9a6ed7f4f99..80794fed6e20 100644
> --- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.h
> +++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.h
> @@ -450,7 +450,6 @@
>  
>  #define DCN_HUBP_REG_FIELD_BASE_LIST(type) \
>   type HUBP_BLANK_EN;\
> - type HUBP_DISABLE;\
>   type HUBP_TTU_DISABLE;\
>   type HUBP_NO_OUTSTANDING_REQ;\
>   type HUBP_VTG_SEL;\
> @@ -644,6 +643,7 @@
>  
>  #define DCN_HUBP_REG_FIELD_LIST(type) \
>   DCN_HUBP_REG_FIELD_BASE_LIST(type);\
> + type HUBP_DISABLE;\
>   type ALPHA_PLANE_EN
>  
>  struct dcn_mi_registers {
> diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.h 
> b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.h
> index f501c02c244b..98ec1f9171b6 100644
> --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.h
> +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.h
> @@ -161,7 +161,7 @@
>   DCN21_HUBP_REG_COMMON_VARIABLE_LIST;\
>   uint32_t DCN_DMDATA_VM_CNTL
>  
> -#define DCN2_HUBP_REG_FIELD_VARIABLE_LIST(type) \
> +#define DCN2_HUBP_REG_FIELD_VARIABLE_LIST_COMMON(type) \
>   DCN_HUBP_REG_FIELD_BASE_LIST(type); \
>   type DMDATA_ADDRESS_HIGH;\
>   type DMDATA_MODE;\
> @@ -186,8 +186,12 @@
>   type SURFACE_TRIPLE_BUFFER_ENABLE;\
>   type VMID
>  
> -#define DCN21_HUBP_REG_FIELD_VARIABLE_LIST(type) \
> - DCN2_HUBP_REG_FIELD_VARIABLE_LIST(type);\
> +#define DCN2_HUBP_REG_FIELD_VARIABLE_LIST(type) \
> + DCN2_HUBP_REG_FIELD_VARIABLE_LIST_COMMON(type); \
> + type HUBP_DISABLE
> +
> +#define DCN21_HUBP_REG_FIELD_VARIABLE_LIST_COMMON(type) \
> + DCN2_HUBP_REG_FIELD_VARIABLE_LIST_COMMON(type);\
>   type REFCYC_PER_VM_GROUP_FLIP;\
>   type REFCYC_PER_VM_REQ_FLIP;\
>   type REFCYC_PER_VM_GROUP_VBLANK;\
> @@ -196,8 +200,12 @@
>   type REFCYC_PER_META_CHUNK_FLIP_C; \
>   type VM_GROUP_SIZE
>  
> -#define DCN30_HUBP_REG_FIELD_VARIABLE_LIST(type) \
> - DCN21_HUBP_REG_FIELD_VARIABLE_LIST(type);\
> +#define DCN21_HUBP_REG_FIELD_VARIABLE_LIST(type) \
> + DCN21_HUBP_REG_FIELD_VARIABLE_LIST_COMMON(type);\
> + type HUBP_DISABLE
> +
> +#define DCN30_HUBP_REG_FIELD_VARIABLE_LIST_COMMON(type) \
> + DCN21_HUBP_REG_FIELD_VARIABLE_LIST_COMMON(type);\
>   type PRIMARY_SURFACE_DCC_IND_BLK;\
>   type SECONDARY_SURFACE_DCC_IND_BLK;\
>   type PRIMARY_SURFACE_DCC_IND_BLK_C;\
> @@ -216,6 +224,10 @@
>   type ROW_TTU_MODE; \
>   type NUM_PKRS
>  
> +#define DCN30_HUBP_REG_FIELD_VARIABLE_LIST(type) \
> + DCN30_HUBP_REG_FIELD_VARIABLE_LIST_COMMON(type);\
> + type HUBP_DISABLE
> +
>  struct dcn_hubp2_registers {
>   DCN30_HUBP_REG_COMMON_VARIABLE_LIST;
>  };
> -- 
> 2.25.1
> 

-- 
Rodrigo Siqueira
https://siqueira.tech




[drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-10 Thread Mikhail Gavrilov
Hi folks,
today I started testing kernel 5.11 and saw that the kernel log was
flooded with BUG messages:
BUG: sleeping function called from invalid context at mm/vmalloc.c:1756
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 266, name: kswapd0
INFO: lockdep is turned off.
CPU: 15 PID: 266 Comm: kswapd0 Tainted: GW-
---  5.11.0-0.rc2.20210108gitf5e6c330254a.119.fc34.x86_64 #1
Hardware name: System manufacturer System Product Name/ROG STRIX
X570-I GAMING, BIOS 2802 10/21/2020
Call Trace:
 dump_stack+0x8b/0xb0
 ___might_sleep.cold+0xb6/0xc6
 vm_unmap_aliases+0x21/0x40
 change_page_attr_set_clr+0x9e/0x190
 set_memory_wb+0x2f/0x80
 ttm_pool_free_page+0x28/0x90 [ttm]
 ttm_pool_shrink+0x45/0xb0 [ttm]
 ttm_pool_shrinker_scan+0xa/0x20 [ttm]
 do_shrink_slab+0x177/0x3a0
 shrink_slab+0x9c/0x290
 shrink_node+0x2e6/0x700
 balance_pgdat+0x2f5/0x650
 kswapd+0x21d/0x4d0
 ? do_wait_intr_irq+0xd0/0xd0
 ? balance_pgdat+0x650/0x650
 kthread+0x13a/0x150
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30

But the most unpleasant thing is that after a while the monitor turns
off and does not come back on until a reboot.
This is accompanied by the following entries in the kernel log:

amdgpu :0b:00.0: amdgpu: ff7d8b94 pin failed
[drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin
framebuffer with error -12

$ grep "Failed to pin framebuffer with error" -Rn .
./drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c:5816:
DRM_ERROR("Failed to pin framebuffer with error %d\n", r);

$ git blame -L 5811,5821 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
5d43be0ccbc2f (Christian König 2017-10-26 18:06:23 +0200 5811)		domain = AMDGPU_GEM_DOMAIN_VRAM;
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5812)
7b7c6c81b3a37 (Junwei Zhang    2018-06-25 12:51:14 +0800 5813)	r = amdgpu_bo_pin(rbo, domain);
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5814)	if (unlikely(r != 0)) {
30b7c6147d18d (Harry Wentland  2017-10-26 15:35:14 -0400 5815)		if (r != -ERESTARTSYS)
30b7c6147d18d (Harry Wentland  2017-10-26 15:35:14 -0400 5816)			DRM_ERROR("Failed to pin framebuffer with error %d\n", r);
0f257b09531b4 (Chunming Zhou   2019-05-07 19:45:31 +0800 5817)		ttm_eu_backoff_reservation(&ticket, &list);
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5818)		return r;
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5819)	}
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5820)
bb812f1ea87dd (Junwei Zhang    2018-06-25 13:32:24 +0800 5821)	r = amdgpu_ttm_alloc_gart(&rbo->tbo);

Who knows how to fix it?

Full kernel logs is here:
[1] https://pastebin.com/fLasjDHX
[2] https://pastebin.com/g3wR2r9e

--
Best Regards,
Mike Gavrilov.