Re: [PATCH 1/2] Revert "drm/amd/pm: Use gpu_metrics_v1_6 for SMUv13.0.6"

2024-05-19 Thread Lazar, Lijo



On 5/20/2024 10:31 AM, Asad Kamal wrote:
> Remove gpu_metrics_v1_6 usage for SMUv13.0.6 temporarily and use
> gpu_metrics_v1_5 until tool support is ready for it.
> 
> This reverts commit e6efb71ae640eada28f44cc97aa79e8ae4901e63.
> 
> Signed-off-by: Asad Kamal 

Series is
Reviewed-by: Lijo Lazar 

Thanks,
Lijo

> ---
>  .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c   | 18 --
>  1 file changed, 4 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
> index ceb2174baff6..81a241ed18f5 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
> @@ -351,7 +351,7 @@ static int smu_v13_0_6_tables_init(struct smu_context 
> *smu)
>   return -ENOMEM;
>   smu_table->metrics_time = 0;
>  
> - smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v1_6);
> + smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v1_5);
>   smu_table->gpu_metrics_table =
>   kzalloc(smu_table->gpu_metrics_table_size, GFP_KERNEL);
>   if (!smu_table->gpu_metrics_table) {
> @@ -2290,8 +2290,8 @@ static int 
> smu_v13_0_6_get_current_pcie_link_speed(struct smu_context *smu)
>  static ssize_t smu_v13_0_6_get_gpu_metrics(struct smu_context *smu, void 
> **table)
>  {
>   struct smu_table_context *smu_table = >smu_table;
> - struct gpu_metrics_v1_6 *gpu_metrics =
> - (struct gpu_metrics_v1_6 *)smu_table->gpu_metrics_table;
> + struct gpu_metrics_v1_5 *gpu_metrics =
> + (struct gpu_metrics_v1_5 *)smu_table->gpu_metrics_table;
>   struct amdgpu_device *adev = smu->adev;
>   int ret = 0, xcc_id, inst, i, j;
>   MetricsTableX_t *metrics_x;
> @@ -2307,7 +2307,7 @@ static ssize_t smu_v13_0_6_get_gpu_metrics(struct 
> smu_context *smu, void **table
>  
>   metrics_a = (MetricsTableA_t *)metrics_x;
>  
> - smu_cmn_init_soft_gpu_metrics(gpu_metrics, 1, 6);
> + smu_cmn_init_soft_gpu_metrics(gpu_metrics, 1, 5);
>  
>   gpu_metrics->temperature_hotspot =
>   SMUQ10_ROUND(GET_METRIC_FIELD(MaxSocketTemperature));
> @@ -2349,16 +2349,6 @@ static ssize_t smu_v13_0_6_get_gpu_metrics(struct 
> smu_context *smu, void **table
>  
>   gpu_metrics->current_uclk = 
> SMUQ10_ROUND(GET_METRIC_FIELD(UclkFrequency));
>  
> - /* Total accumulated cycle counter */
> - gpu_metrics->accumulation_counter = 
> GET_METRIC_FIELD(AccumulationCounter);
> -
> - /* Accumulated throttler residencies */
> - gpu_metrics->prochot_residency_acc = 
> GET_METRIC_FIELD(ProchotResidencyAcc);
> - gpu_metrics->ppt_residency_acc = GET_METRIC_FIELD(PptResidencyAcc);
> - gpu_metrics->socket_thm_residency_acc = 
> GET_METRIC_FIELD(SocketThmResidencyAcc);
> - gpu_metrics->vr_thm_residency_acc = GET_METRIC_FIELD(VrThmResidencyAcc);
> - gpu_metrics->hbm_thm_residency_acc = 
> GET_METRIC_FIELD(HbmThmResidencyAcc);
> -
>   /* Throttle status is not reported through metrics now */
>   gpu_metrics->throttle_status = 0;
>  


[PATCH 2/2] Revert "drm/amd/pm: Add gpu_metrics_v1_6"

2024-05-19 Thread Asad Kamal
Remove gpu_metrics_v1_6 temporarily until tool support is ready

This reverts commit d24c43b1e3193415e28c4b1f3b2908ff88b28eb3.

Signed-off-by: Asad Kamal 
---
 .../gpu/drm/amd/include/kgd_pp_interface.h| 89 ---
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c|  3 -
 2 files changed, 92 deletions(-)

diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h 
b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
index ac76bcd81a5d..4b20e2274313 100644
--- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
@@ -871,95 +871,6 @@ struct gpu_metrics_v1_5 {
uint16_tpadding;
 };
 
-struct gpu_metrics_v1_6 {
-   struct metrics_table_header common_header;
-
-   /* Temperature (Celsius) */
-   uint16_ttemperature_hotspot;
-   uint16_ttemperature_mem;
-   uint16_ttemperature_vrsoc;
-
-   /* Power (Watts) */
-   uint16_tcurr_socket_power;
-
-   /* Utilization (%) */
-   uint16_taverage_gfx_activity;
-   uint16_taverage_umc_activity; // memory 
controller
-   uint16_tvcn_activity[NUM_VCN];
-   uint16_tjpeg_activity[NUM_JPEG_ENG];
-
-   /* Energy (15.259uJ (2^-16) units) */
-   uint64_tenergy_accumulator;
-
-   /* Driver attached timestamp (in ns) */
-   uint64_tsystem_clock_counter;
-
-   /* Accumulation cycle counter */
-   uint32_taccumulation_counter;
-
-   /* Accumulated throttler residencies */
-   uint32_tprochot_residency_acc;
-   uint32_tppt_residency_acc;
-   uint32_tsocket_thm_residency_acc;
-   uint32_tvr_thm_residency_acc;
-   uint32_thbm_thm_residency_acc;
-
-   /* Throttle status */
-   uint32_tthrottle_status;
-
-   /* Clock Lock Status. Each bit corresponds to clock instance */
-   uint32_tgfxclk_lock_status;
-
-   /* Link width (number of lanes) and speed (in 0.1 GT/s) */
-   uint16_tpcie_link_width;
-   uint16_tpcie_link_speed;
-
-   /* XGMI bus width and bitrate (in Gbps) */
-   uint16_txgmi_link_width;
-   uint16_txgmi_link_speed;
-
-   /* Utilization Accumulated (%) */
-   uint32_tgfx_activity_acc;
-   uint32_tmem_activity_acc;
-
-   /*PCIE accumulated bandwidth (Mbps) */
-   uint64_tpcie_bandwidth_acc;
-
-   /*PCIE instantaneous bandwidth (Mbps) */
-   uint64_tpcie_bandwidth_inst;
-
-   /* PCIE L0 to recovery state transition accumulated count */
-   uint64_tpcie_l0_to_recov_count_acc;
-
-   /* PCIE replay accumulated count */
-   uint64_tpcie_replay_count_acc;
-
-   /* PCIE replay rollover accumulated count */
-   uint64_tpcie_replay_rover_count_acc;
-
-   /* PCIE NAK sent  accumulated count */
-   uint32_tpcie_nak_sent_count_acc;
-
-   /* PCIE NAK received accumulated count */
-   uint32_tpcie_nak_rcvd_count_acc;
-
-   /* XGMI accumulated data transfer size(KiloBytes) */
-   uint64_txgmi_read_data_acc[NUM_XGMI_LINKS];
-   uint64_txgmi_write_data_acc[NUM_XGMI_LINKS];
-
-   /* PMFW attached timestamp (10ns resolution) */
-   uint64_tfirmware_timestamp;
-
-   /* Current clocks (Mhz) */
-   uint16_tcurrent_gfxclk[MAX_GFX_CLKS];
-   uint16_tcurrent_socclk[MAX_CLKS];
-   uint16_tcurrent_vclk0[MAX_CLKS];
-   uint16_tcurrent_dclk0[MAX_CLKS];
-   uint16_tcurrent_uclk;
-
-   uint16_tpadding;
-};
-
 /*
  * gpu_metrics_v2_0 is not recommended as it's not naturally aligned.
  * Use gpu_metrics_v2_1 or later instead.
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index f265a449c342..5592fd825aa3 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -1052,9 +1052,6 @@ void smu_cmn_init_soft_gpu_metrics(void *table, uint8_t 
frev, uint8_t crev)
case METRICS_VERSION(1, 5):
structure_size = sizeof(struct gpu_metrics_v1_5);
break;
-   case METRICS_VERSION(1, 6):
-   

[PATCH 1/2] Revert "drm/amd/pm: Use gpu_metrics_v1_6 for SMUv13.0.6"

2024-05-19 Thread Asad Kamal
Remove gpu_metrics_v1_6 usage for SMUv13.0.6 temporarily and use
gpu_metrics_v1_5 until tool support is ready for it.

This reverts commit e6efb71ae640eada28f44cc97aa79e8ae4901e63.

Signed-off-by: Asad Kamal 
---
 .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c   | 18 --
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
index ceb2174baff6..81a241ed18f5 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
@@ -351,7 +351,7 @@ static int smu_v13_0_6_tables_init(struct smu_context *smu)
return -ENOMEM;
smu_table->metrics_time = 0;
 
-   smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v1_6);
+   smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v1_5);
smu_table->gpu_metrics_table =
kzalloc(smu_table->gpu_metrics_table_size, GFP_KERNEL);
if (!smu_table->gpu_metrics_table) {
@@ -2290,8 +2290,8 @@ static int smu_v13_0_6_get_current_pcie_link_speed(struct 
smu_context *smu)
 static ssize_t smu_v13_0_6_get_gpu_metrics(struct smu_context *smu, void 
**table)
 {
struct smu_table_context *smu_table = >smu_table;
-   struct gpu_metrics_v1_6 *gpu_metrics =
-   (struct gpu_metrics_v1_6 *)smu_table->gpu_metrics_table;
+   struct gpu_metrics_v1_5 *gpu_metrics =
+   (struct gpu_metrics_v1_5 *)smu_table->gpu_metrics_table;
struct amdgpu_device *adev = smu->adev;
int ret = 0, xcc_id, inst, i, j;
MetricsTableX_t *metrics_x;
@@ -2307,7 +2307,7 @@ static ssize_t smu_v13_0_6_get_gpu_metrics(struct 
smu_context *smu, void **table
 
metrics_a = (MetricsTableA_t *)metrics_x;
 
-   smu_cmn_init_soft_gpu_metrics(gpu_metrics, 1, 6);
+   smu_cmn_init_soft_gpu_metrics(gpu_metrics, 1, 5);
 
gpu_metrics->temperature_hotspot =
SMUQ10_ROUND(GET_METRIC_FIELD(MaxSocketTemperature));
@@ -2349,16 +2349,6 @@ static ssize_t smu_v13_0_6_get_gpu_metrics(struct 
smu_context *smu, void **table
 
gpu_metrics->current_uclk = 
SMUQ10_ROUND(GET_METRIC_FIELD(UclkFrequency));
 
-   /* Total accumulated cycle counter */
-   gpu_metrics->accumulation_counter = 
GET_METRIC_FIELD(AccumulationCounter);
-
-   /* Accumulated throttler residencies */
-   gpu_metrics->prochot_residency_acc = 
GET_METRIC_FIELD(ProchotResidencyAcc);
-   gpu_metrics->ppt_residency_acc = GET_METRIC_FIELD(PptResidencyAcc);
-   gpu_metrics->socket_thm_residency_acc = 
GET_METRIC_FIELD(SocketThmResidencyAcc);
-   gpu_metrics->vr_thm_residency_acc = GET_METRIC_FIELD(VrThmResidencyAcc);
-   gpu_metrics->hbm_thm_residency_acc = 
GET_METRIC_FIELD(HbmThmResidencyAcc);
-
/* Throttle status is not reported through metrics now */
gpu_metrics->throttle_status = 0;
 
-- 
2.42.0



RE: [PATCH] drm/amdgpu: update type of buf size to u32 for eeprom functions

2024-05-19 Thread Zhang, Hawking
[AMD Official Use Only - AMD Internal Distribution Only]


Hmm... but in __amdgpu_eeprom_xfer, the u32 will still be cut to u16.
__amdgpu_eeprom_xfer(struct i2c_adapter *i2c_adap, u32 eeprom_addr, u8 
*eeprom_buf, u16 buf_size, bool read)

Regards,
Hawking

-Original Message-
From: amd-gfx  On Behalf Of Tao Zhou
Sent: Monday, May 20, 2024 10:46
To: amd-gfx@lists.freedesktop.org
Cc: Zhou1, Tao 
Subject: [PATCH] drm/amdgpu: update type of buf size to u32 for eeprom functions

Avoid overflow issue.

Signed-off-by: Tao Zhou mailto:tao.zh...@amd.com>>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c | 6 +++---  
drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c
index e71768661ca8..09a34c7258e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c
@@ -179,7 +179,7 @@ static int __amdgpu_eeprom_xfer(struct i2c_adapter 
*i2c_adap, u32 eeprom_addr,
  * Returns the number of bytes read/written; -errno on error.
  */
 static int amdgpu_eeprom_xfer(struct i2c_adapter *i2c_adap, u32 eeprom_addr,
- u8 *eeprom_buf, u16 buf_size, bool read)
+ u8 *eeprom_buf, u32 buf_size, bool read)
 {
const struct i2c_adapter_quirks *quirks = i2c_adap->quirks;
u16 limit;
@@ -225,7 +225,7 @@ static int amdgpu_eeprom_xfer(struct i2c_adapter *i2c_adap, 
u32 eeprom_addr,

 int amdgpu_eeprom_read(struct i2c_adapter *i2c_adap,
   u32 eeprom_addr, u8 *eeprom_buf,
-  u16 bytes)
+  u32 bytes)
 {
return amdgpu_eeprom_xfer(i2c_adap, eeprom_addr, eeprom_buf, bytes,
  true);
@@ -233,7 +233,7 @@ int amdgpu_eeprom_read(struct i2c_adapter *i2c_adap,

 int amdgpu_eeprom_write(struct i2c_adapter *i2c_adap,
u32 eeprom_addr, u8 *eeprom_buf,
-   u16 bytes)
+   u32 bytes)
 {
return amdgpu_eeprom_xfer(i2c_adap, eeprom_addr, eeprom_buf, bytes,
  false);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h
index 6935adb2be1f..8083b8253ef4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h
@@ -28,10 +28,10 @@

 int amdgpu_eeprom_read(struct i2c_adapter *i2c_adap,
   u32 eeprom_addr, u8 *eeprom_buf,
-  u16 bytes);
+  u32 bytes);

 int amdgpu_eeprom_write(struct i2c_adapter *i2c_adap,
u32 eeprom_addr, u8 *eeprom_buf,
-   u16 bytes);
+   u32 bytes);

 #endif
--
2.34.1



RE: [PATCH] drm/amdgpu: update type of buf size to u32 for eeprom functions

2024-05-19 Thread Wang, Yang(Kevin)
[AMD Official Use Only - AMD Internal Distribution Only]

Reviewed-by: Yang Wang 

Best Regards,
Kevin

-Original Message-
From: amd-gfx  On Behalf Of Tao Zhou
Sent: Monday, May 20, 2024 10:46 AM
To: amd-gfx@lists.freedesktop.org
Cc: Zhou1, Tao 
Subject: [PATCH] drm/amdgpu: update type of buf size to u32 for eeprom functions

Avoid overflow issue.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c | 6 +++---  
drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c
index e71768661ca8..09a34c7258e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c
@@ -179,7 +179,7 @@ static int __amdgpu_eeprom_xfer(struct i2c_adapter 
*i2c_adap, u32 eeprom_addr,
  * Returns the number of bytes read/written; -errno on error.
  */
 static int amdgpu_eeprom_xfer(struct i2c_adapter *i2c_adap, u32 eeprom_addr,
- u8 *eeprom_buf, u16 buf_size, bool read)
+ u8 *eeprom_buf, u32 buf_size, bool read)
 {
const struct i2c_adapter_quirks *quirks = i2c_adap->quirks;
u16 limit;
@@ -225,7 +225,7 @@ static int amdgpu_eeprom_xfer(struct i2c_adapter *i2c_adap, 
u32 eeprom_addr,

 int amdgpu_eeprom_read(struct i2c_adapter *i2c_adap,
   u32 eeprom_addr, u8 *eeprom_buf,
-  u16 bytes)
+  u32 bytes)
 {
return amdgpu_eeprom_xfer(i2c_adap, eeprom_addr, eeprom_buf, bytes,
  true);
@@ -233,7 +233,7 @@ int amdgpu_eeprom_read(struct i2c_adapter *i2c_adap,

 int amdgpu_eeprom_write(struct i2c_adapter *i2c_adap,
u32 eeprom_addr, u8 *eeprom_buf,
-   u16 bytes)
+   u32 bytes)
 {
return amdgpu_eeprom_xfer(i2c_adap, eeprom_addr, eeprom_buf, bytes,
  false);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h
index 6935adb2be1f..8083b8253ef4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h
@@ -28,10 +28,10 @@

 int amdgpu_eeprom_read(struct i2c_adapter *i2c_adap,
   u32 eeprom_addr, u8 *eeprom_buf,
-  u16 bytes);
+  u32 bytes);

 int amdgpu_eeprom_write(struct i2c_adapter *i2c_adap,
u32 eeprom_addr, u8 *eeprom_buf,
-   u16 bytes);
+   u32 bytes);

 #endif
--
2.34.1



[PATCH] drm/amdgpu: update type of buf size to u32 for eeprom functions

2024-05-19 Thread Tao Zhou
Avoid overflow issue.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c | 6 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c
index e71768661ca8..09a34c7258e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c
@@ -179,7 +179,7 @@ static int __amdgpu_eeprom_xfer(struct i2c_adapter 
*i2c_adap, u32 eeprom_addr,
  * Returns the number of bytes read/written; -errno on error.
  */
 static int amdgpu_eeprom_xfer(struct i2c_adapter *i2c_adap, u32 eeprom_addr,
- u8 *eeprom_buf, u16 buf_size, bool read)
+ u8 *eeprom_buf, u32 buf_size, bool read)
 {
const struct i2c_adapter_quirks *quirks = i2c_adap->quirks;
u16 limit;
@@ -225,7 +225,7 @@ static int amdgpu_eeprom_xfer(struct i2c_adapter *i2c_adap, 
u32 eeprom_addr,
 
 int amdgpu_eeprom_read(struct i2c_adapter *i2c_adap,
   u32 eeprom_addr, u8 *eeprom_buf,
-  u16 bytes)
+  u32 bytes)
 {
return amdgpu_eeprom_xfer(i2c_adap, eeprom_addr, eeprom_buf, bytes,
  true);
@@ -233,7 +233,7 @@ int amdgpu_eeprom_read(struct i2c_adapter *i2c_adap,
 
 int amdgpu_eeprom_write(struct i2c_adapter *i2c_adap,
u32 eeprom_addr, u8 *eeprom_buf,
-   u16 bytes)
+   u32 bytes)
 {
return amdgpu_eeprom_xfer(i2c_adap, eeprom_addr, eeprom_buf, bytes,
  false);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h
index 6935adb2be1f..8083b8253ef4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h
@@ -28,10 +28,10 @@
 
 int amdgpu_eeprom_read(struct i2c_adapter *i2c_adap,
   u32 eeprom_addr, u8 *eeprom_buf,
-  u16 bytes);
+  u32 bytes);
 
 int amdgpu_eeprom_write(struct i2c_adapter *i2c_adap,
u32 eeprom_addr, u8 *eeprom_buf,
-   u16 bytes);
+   u32 bytes);
 
 #endif
-- 
2.34.1



RE: [PATCH 2/2] drm/amdgpu: Queue KFD reset workitem in VF FED

2024-05-19 Thread Luo, Zhigang
[AMD Official Use Only - AMD Internal Distribution Only]

Reviewed-by: Zhigang Luo 

-Original Message-
From: amd-gfx  On Behalf Of Victor 
Skvortsov
Sent: Sunday, May 19, 2024 10:52 AM
To: amd-gfx@lists.freedesktop.org
Cc: Skvortsov, Victor 
Subject: [PATCH 2/2] drm/amdgpu: Queue KFD reset workitem in VF FED

The guest recovery sequence is buggy in Fatal Error when both FLR & KFD reset 
workitems are queued at the same time. In addition, FLR guest recovery sequence 
is out of order when PF/VF communication breaks due to a GPU fatal error

As a temporary work around, perform a KFD style reset (Initiate reset request 
from the guest) inside the pf2vf thread on FED.

Signed-off-by: Victor Skvortsov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index d98d619fba97..3d5f58e76f2d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -602,7 +602,7 @@ static void amdgpu_virt_update_vf2pf_work_item(struct 
work_struct *work)
amdgpu_sriov_runtime(adev)) {
amdgpu_ras_set_fed(adev, true);
if (amdgpu_reset_domain_schedule(adev->reset_domain,
- >virt.flr_work))
+ 
>kfd.reset_work))
return;
else
dev_err(adev->dev, "Failed to queue work! at 
%s", __func__);
--
2.34.1



RE: [PATCH 1/2] drm/amdgpu: Extend KIQ reg polling wait for VF

2024-05-19 Thread Luo, Zhigang
[AMD Official Use Only - AMD Internal Distribution Only]

Reviewed-by: Zhigang Luo 

-Original Message-
From: amd-gfx  On Behalf Of Victor 
Skvortsov
Sent: Sunday, May 19, 2024 10:52 AM
To: amd-gfx@lists.freedesktop.org
Cc: Skvortsov, Victor 
Subject: [PATCH 1/2] drm/amdgpu: Extend KIQ reg polling wait for VF

Runtime KIQ interface to read/write registers in VF may take longer than 
expected for BM environment. Extend the timeout.

Signed-off-by: Victor Skvortsov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index d749c6abdc5e..e8980b6009c9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -348,9 +348,9 @@ enum amdgpu_kiq_irq {
AMDGPU_CP_KIQ_IRQ_DRIVER0 = 0,
AMDGPU_CP_KIQ_IRQ_LAST
 };
-#define SRIOV_USEC_TIMEOUT  120 /* wait 12 * 100ms for SRIOV */
-#define MAX_KIQ_REG_WAIT   5000 /* in usecs, 5ms */
-#define MAX_KIQ_REG_BAILOUT_INTERVAL   5 /* in msecs, 5ms */
+#define SRIOV_USEC_TIMEOUT 120 /* wait 12 * 100ms for SRIOV */
+#define MAX_KIQ_REG_WAIT amdgpu_sriov_vf(adev) ? 5 : 5000 /* in
+usecs, extend for VF */ #define MAX_KIQ_REG_BAILOUT_INTERVAL 5 /* in
+msecs, 5ms */
 #define MAX_KIQ_REG_TRY 1000

 int amdgpu_device_ip_set_clockgating_state(void *dev,
--
2.34.1



[PATCH 2/2] drm/amdgpu: Queue KFD reset workitem in VF FED

2024-05-19 Thread Victor Skvortsov
The guest recovery sequence is buggy in Fatal Error when both
FLR & KFD reset workitems are queued at the same time. In addition,
FLR guest recovery sequence is out of order when PF/VF communication
breaks due to a GPU fatal error

As a temporary work around, perform a KFD style reset (Initiate reset
request from the guest) inside the pf2vf thread on FED.

Signed-off-by: Victor Skvortsov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index d98d619fba97..3d5f58e76f2d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -602,7 +602,7 @@ static void amdgpu_virt_update_vf2pf_work_item(struct 
work_struct *work)
amdgpu_sriov_runtime(adev)) {
amdgpu_ras_set_fed(adev, true);
if (amdgpu_reset_domain_schedule(adev->reset_domain,
- >virt.flr_work))
+ 
>kfd.reset_work))
return;
else
dev_err(adev->dev, "Failed to queue work! at 
%s", __func__);
-- 
2.34.1



[PATCH 1/2] drm/amdgpu: Extend KIQ reg polling wait for VF

2024-05-19 Thread Victor Skvortsov
Runtime KIQ interface to read/write registers in VF may take longer
than expected for BM environment. Extend the timeout.

Signed-off-by: Victor Skvortsov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index d749c6abdc5e..e8980b6009c9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -348,9 +348,9 @@ enum amdgpu_kiq_irq {
AMDGPU_CP_KIQ_IRQ_DRIVER0 = 0,
AMDGPU_CP_KIQ_IRQ_LAST
 };
-#define SRIOV_USEC_TIMEOUT  120 /* wait 12 * 100ms for SRIOV */
-#define MAX_KIQ_REG_WAIT   5000 /* in usecs, 5ms */
-#define MAX_KIQ_REG_BAILOUT_INTERVAL   5 /* in msecs, 5ms */
+#define SRIOV_USEC_TIMEOUT 120 /* wait 12 * 100ms for SRIOV */
+#define MAX_KIQ_REG_WAIT amdgpu_sriov_vf(adev) ? 5 : 5000 /* in usecs, 
extend for VF */
+#define MAX_KIQ_REG_BAILOUT_INTERVAL 5 /* in msecs, 5ms */
 #define MAX_KIQ_REG_TRY 1000
 
 int amdgpu_device_ip_set_clockgating_state(void *dev,
-- 
2.34.1



[PATCH 2/2] tests/amdgpu/amd_abm: Add support for panel_power_saving property

2024-05-19 Thread Mario Limonciello
From: Mario Limonciello 

When the "panel power saving" property is set to forbidden the
compositor has indicated that userspace prefers to have color
accuracy and fidelity instead of power saving.

Verify that the sysfs file behaves as expected in this situation.

Signed-off-by: Mario Limonciello 
---
 tests/amdgpu/amd_abm.c | 65 ++
 1 file changed, 65 insertions(+)

diff --git a/tests/amdgpu/amd_abm.c b/tests/amdgpu/amd_abm.c
index f74c3012c..971cda153 100644
--- a/tests/amdgpu/amd_abm.c
+++ b/tests/amdgpu/amd_abm.c
@@ -104,6 +104,32 @@ static int backlight_write_brightness(int value)
return 0;
 }
 
+static int toggle_panel_power_saving_prop(data_t *data, igt_output_t *output, 
bool forbid)
+{
+   uint32_t type = DRM_MODE_OBJECT_CONNECTOR;
+   bool prop_exists;
+   uint32_t prop_id;
+
+   prop_exists = kmstest_get_property(
+   data->drm_fd, output->id, type, "panel power saving",
+   _id, NULL, NULL);
+
+   if (!prop_exists)
+   return -ENODEV;
+
+   return drmModeObjectSetProperty(data->drm_fd, output->id, type, 
prop_id, forbid);
+}
+
+static int allow_panel_power_saving(data_t *data, igt_output_t *output)
+{
+   return toggle_panel_power_saving_prop(data, output, false);
+}
+
+static int forbid_panel_power_saving(data_t *data, igt_output_t *output)
+{
+   return toggle_panel_power_saving_prop(data, output, true);
+}
+
 static int set_abm_level(data_t *data, igt_output_t *output, int level)
 {
char buf[PATH_MAX];
@@ -365,6 +391,43 @@ static void abm_gradual(data_t *data)
}
 }
 
+
+static void abm_forbidden(data_t *data)
+{
+   igt_output_t *output;
+   enum pipe pipe;
+   int target, r;
+
+   for_each_pipe_with_valid_output(>display, pipe, output) {
+   if (output->config.connector->connector_type != 
DRM_MODE_CONNECTOR_eDP)
+   continue;
+
+   r = allow_panel_power_saving(data, output);
+   if (r == -ENODEV) {
+   igt_skip("No panel power saving prop\n");
+   return;
+   }
+   igt_assert_eq(r, 0);
+
+   target = 3;
+   r = set_abm_level(data, output, target);
+   igt_assert_eq(r, 0);
+
+   r = forbid_panel_power_saving(data, output);
+   igt_assert_eq(r, 0);
+
+   target = 0;
+   r = set_abm_level(data, output, target);
+   igt_assert_eq(r, -1);
+
+   r = allow_panel_power_saving(data, output);
+   igt_assert_eq(r, 0);
+
+   r = set_abm_level(data, output, target);
+   igt_assert_eq(r, 0);
+   }
+}
+
 igt_main
 {
data_t data = {};
@@ -393,6 +456,8 @@ igt_main
abm_enabled();
igt_subtest("abm_gradual")
abm_gradual();
+   igt_subtest("abm_forbidden")
+   abm_forbidden();
 
igt_fixture {
igt_display_fini();
-- 
2.45.0



[PATCH 0/2] Add support for testing panel power saving DRM property

2024-05-19 Thread Mario Limonciello
During the Display Next hackfest 2024 one of the topics discussed
was the need for compositor to be able to relay intention to drivers
that color fidelity is preferred over power savings.

To accomplish this a new optional DRM property is being introduced called
"panel power saving".  This property is an enum that can be configured
by the compositor to "Allowed" or "Forbidden".

When a driver advertises support for this property and the compositor
sets it to "Forbidden" then the driver will disable any power saving
features that can compromise color fidelity.

This set of IGT changes introduces a new subtest that will cover the
expected kernel behavior when switching between Allowed and Forbidden.

Mario Limonciello (2):
  tests/amdgpu/amd_abm: Make set_abm_level return type int
  tests/amdgpu/amd_abm: Add support for panel_power_saving property

 tests/amdgpu/amd_abm.c | 98 +-
 1 file changed, 88 insertions(+), 10 deletions(-)

-- 
2.45.0



[PATCH 1/2] tests/amdgpu/amd_abm: Make set_abm_level return type int

2024-05-19 Thread Mario Limonciello
From: Mario Limonciello 

In order to bubble of cases of expeted errors on set_abm_level()
change the return type to int.

Signed-off-by: Mario Limonciello 
---
 tests/amdgpu/amd_abm.c | 33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/tests/amdgpu/amd_abm.c b/tests/amdgpu/amd_abm.c
index 2882c2c18..f74c3012c 100644
--- a/tests/amdgpu/amd_abm.c
+++ b/tests/amdgpu/amd_abm.c
@@ -104,10 +104,11 @@ static int backlight_write_brightness(int value)
return 0;
 }
 
-static void set_abm_level(data_t *data, igt_output_t *output, int level)
+static int set_abm_level(data_t *data, igt_output_t *output, int level)
 {
char buf[PATH_MAX];
int fd;
+   int ret;
 
igt_assert(snprintf(buf, PATH_MAX, PANEL_POWER_SAVINGS_PATH,
output->name) < PATH_MAX);
@@ -116,8 +117,12 @@ static void set_abm_level(data_t *data, igt_output_t 
*output, int level)
 
igt_assert(fd != -1);
 
-   igt_assert_eq(snprintf(buf, sizeof(buf), "%d", level),
- write(fd, buf, 1));
+   snprintf(buf, sizeof(buf), "%d", level);
+   ret = write(fd, buf, 1);
+   if (ret < 0) {
+   close(fd);
+   return ret;
+   }
 
igt_assert_eq(close(fd), 0);
 
@@ -129,6 +134,7 @@ static void set_abm_level(data_t *data, igt_output_t 
*output, int level)
 DRM_MODE_DPMS_OFF);
kmstest_set_connector_dpms(data->drm_fd, output->config.connector,
 DRM_MODE_DPMS_ON);
+   return 0;
 }
 
 static int backlight_read_max_brightness(int *result)
@@ -192,7 +198,8 @@ static void backlight_dpms_cycle(data_t *data)
ret = backlight_read_max_brightness(_brightness);
igt_assert_eq(ret, 0);
 
-   set_abm_level(data, output, 0);
+   ret = set_abm_level(data, output, 0);
+   igt_assert_eq(ret, 0);
backlight_write_brightness(max_brightness / 2);
usleep(10);
pwm_1 = read_target_backlight_pwm(data->drm_fd, output->name);
@@ -223,7 +230,8 @@ static void backlight_monotonic_basic(data_t *data)
 
brightness_step = max_brightness / 10;
 
-   set_abm_level(data, output, 0);
+   ret = set_abm_level(data, output, 0);
+   igt_assert_eq(ret, 0);
backlight_write_brightness(max_brightness);
usleep(10);
prev_pwm = read_target_backlight_pwm(data->drm_fd, 
output->name);
@@ -257,7 +265,8 @@ static void backlight_monotonic_abm(data_t *data)
 
brightness_step = max_brightness / 10;
for (i = 1; i < 5; i++) {
-   set_abm_level(data, output, 0);
+   ret = set_abm_level(data, output, 0);
+   igt_assert_eq(ret, 0);
backlight_write_brightness(max_brightness);
usleep(10);
prev_pwm = read_target_backlight_pwm(data->drm_fd, 
output->name);
@@ -289,14 +298,16 @@ static void abm_enabled(data_t *data)
ret = backlight_read_max_brightness(_brightness);
igt_assert_eq(ret, 0);
 
-   set_abm_level(data, output, 0);
+   ret = set_abm_level(data, output, 0);
+   igt_assert_eq(ret, 0);
backlight_write_brightness(max_brightness);
usleep(10);
prev_pwm = read_target_backlight_pwm(data->drm_fd, 
output->name);
pwm_without_abm = prev_pwm;
 
for (i = 1; i < 5; i++) {
-   set_abm_level(data, output, i);
+   ret = set_abm_level(data, output, i);
+   igt_assert_eq(ret, 0);
usleep(10);
pwm = read_target_backlight_pwm(data->drm_fd, 
output->name);
igt_assert(pwm <= prev_pwm);
@@ -323,7 +334,8 @@ static void abm_gradual(data_t *data)
 
igt_assert_eq(ret, 0);
 
-   set_abm_level(data, output, 0);
+   ret = set_abm_level(data, output, 0);
+   igt_assert_eq(ret, 0);
backlight_write_brightness(max_brightness);
 
sleep(convergence_delay);
@@ -331,7 +343,8 @@ static void abm_gradual(data_t *data)
curr = read_current_backlight_pwm(data->drm_fd, output->name);
 
igt_assert_eq(prev_pwm, curr);
-   set_abm_level(data, output, 4);
+   ret = set_abm_level(data, output, 4);
+   igt_assert_eq(ret, 0);
for (i = 0; i < 10; i++) {
usleep(10);
pwm = read_current_backlight_pwm(data->drm_fd, 
output->name);
-- 
2.45.0



[PATCH 1/2] drm: Introduce panel_power_saving drm property

2024-05-19 Thread Mario Limonciello
The `panel_power_saving` DRM property is an optional property that
can be added to a connector by a driver.

This property is for compositors to indicate intent of allowing
policy for the driver to use power saving features that may
compromise color fidelity.

Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/drm_connector.c | 36 +
 include/drm/drm_connector.h |  1 +
 include/drm/drm_mode_config.h   |  6 ++
 include/uapi/drm/drm_mode.h |  4 
 4 files changed, 47 insertions(+)

diff --git a/drivers/gpu/drm/drm_connector.c b/drivers/gpu/drm/drm_connector.c
index 4d2df7f64dc5..ccf672c55e0c 100644
--- a/drivers/gpu/drm/drm_connector.c
+++ b/drivers/gpu/drm/drm_connector.c
@@ -961,6 +961,11 @@ static const struct drm_prop_enum_list 
drm_scaling_mode_enum_list[] = {
{ DRM_MODE_SCALE_ASPECT, "Full aspect" },
 };
 
+static const struct drm_prop_enum_list drm_panel_power_saving_enum_list[] = {
+   { DRM_MODE_PANEL_POWER_SAVING_ALLOWED, "Allowed" },
+   { DRM_MODE_PANEL_POWER_SAVING_FORBIDDEN, "Forbidden" },
+};
+
 static const struct drm_prop_enum_list drm_aspect_ratio_enum_list[] = {
{ DRM_MODE_PICTURE_ASPECT_NONE, "Automatic" },
{ DRM_MODE_PICTURE_ASPECT_4_3, "4:3" },
@@ -1963,6 +1968,37 @@ int drm_mode_create_scaling_mode_property(struct 
drm_device *dev)
 }
 EXPORT_SYMBOL(drm_mode_create_scaling_mode_property);
 
+/**
+ * drm_mode_create_panel_power_saving_property - create panel power saving 
property
+ * @dev: DRM device
+ *
+ * Called by a driver the first time it's needed, must be attached to desired
+ * connectors.
+ *
+ * Atomic drivers should use drm_mode_create_panel_power_saving_property()
+ * instead to correctly assign _connector_state.panel_power_saving
+ * in the atomic state.
+ *
+ * Returns: %0
+ */
+int drm_mode_create_panel_power_saving_property(struct drm_device *dev)
+{
+   struct drm_property *panel_power_saving;
+
+   if (dev->mode_config.panel_power_saving)
+   return 0;
+
+   panel_power_saving =
+   drm_property_create_enum(dev, 0, "panel power saving",
+   drm_panel_power_saving_enum_list,
+   
ARRAY_SIZE(drm_panel_power_saving_enum_list));
+
+   dev->mode_config.panel_power_saving = panel_power_saving;
+
+   return 0;
+}
+EXPORT_SYMBOL(drm_mode_create_panel_power_saving_property);
+
 /**
  * DOC: Variable refresh properties
  *
diff --git a/include/drm/drm_connector.h b/include/drm/drm_connector.h
index fe88d7fc6b8f..4ea3f912c641 100644
--- a/include/drm/drm_connector.h
+++ b/include/drm/drm_connector.h
@@ -2025,6 +2025,7 @@ int drm_mode_create_dp_colorspace_property(struct 
drm_connector *connector,
   u32 supported_colorspaces);
 int drm_mode_create_content_type_property(struct drm_device *dev);
 int drm_mode_create_suggested_offset_properties(struct drm_device *dev);
+int drm_mode_create_panel_power_saving_property(struct drm_device *dev);
 
 int drm_connector_set_path_property(struct drm_connector *connector,
const char *path);
diff --git a/include/drm/drm_mode_config.h b/include/drm/drm_mode_config.h
index 973119a9176b..099ad2d8c5c1 100644
--- a/include/drm/drm_mode_config.h
+++ b/include/drm/drm_mode_config.h
@@ -954,6 +954,12 @@ struct drm_mode_config {
 */
struct drm_atomic_state *suspend_state;
 
+   /**
+* @panel_power_saving: DRM ENUM property for type of
+* Panel Power Saving.
+*/
+   struct drm_property *panel_power_saving;
+
const struct drm_mode_config_helper_funcs *helper_private;
 };
 
diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
index 7040e7ea80c7..82e565cc76fb 100644
--- a/include/uapi/drm/drm_mode.h
+++ b/include/uapi/drm/drm_mode.h
@@ -152,6 +152,10 @@ extern "C" {
 #define DRM_MODE_SCALE_CENTER  2 /* Centered, no scaling */
 #define DRM_MODE_SCALE_ASPECT  3 /* Full screen, preserve aspect */
 
+/* Panel power saving options */
+#define DRM_MODE_PANEL_POWER_SAVING_ALLOWED0 /* Panel power savings 
features allowed */
+#define DRM_MODE_PANEL_POWER_SAVING_FORBIDDEN  1 /* Panel power savings 
features not allowed */
+
 /* Dithering mode options */
 #define DRM_MODE_DITHERING_OFF 0
 #define DRM_MODE_DITHERING_ON  1
-- 
2.45.0



[PATCH 2/2] drm/amd: Add panel_power_saving drm property to eDP connectors

2024-05-19 Thread Mario Limonciello
When the `panel_power_saving` property is set to "Forbidden" ABM
should be disabled immediately and any requests by sysfs to update
will return an -EBUSY error.

When the property is restored to "Allowed" the previous value of
ABM will be restored.

Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c   |  3 ++
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 36 ---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  1 +
 3 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
index 3ecc7ef95172..6e6531c93d81 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -1350,6 +1350,9 @@ int amdgpu_display_modeset_create_props(struct 
amdgpu_device *adev)
 "dither",
 amdgpu_dither_enum_list, sz);
 
+   if (adev->dc_enabled)
+   drm_mode_create_panel_power_saving_property(adev_to_drm(adev));
+
return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 01b0a331e4a6..f6b80018b136 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -6421,6 +6421,12 @@ int amdgpu_dm_connector_atomic_set_property(struct 
drm_connector *connector,
} else if (property == adev->mode_info.underscan_property) {
dm_new_state->underscan_enable = val;
ret = 0;
+   } else if (property == dev->mode_config.panel_power_saving) {
+   dm_new_state->abm_forbidden = val;
+   dm_new_state->abm_level = (val || !amdgpu_dm_abm_level) ?
+   ABM_LEVEL_IMMEDIATE_DISABLE :
+   amdgpu_dm_abm_level;
+   ret = 0;
}
 
return ret;
@@ -6463,6 +6469,9 @@ int amdgpu_dm_connector_atomic_get_property(struct 
drm_connector *connector,
} else if (property == adev->mode_info.underscan_property) {
*val = dm_state->underscan_enable;
ret = 0;
+   } else if (property == dev->mode_config.panel_power_saving) {
+   *val = dm_state->abm_forbidden;
+   ret = 0;
}
 
return ret;
@@ -6489,9 +6498,12 @@ static ssize_t panel_power_savings_show(struct device 
*device,
u8 val;
 
drm_modeset_lock(>mode_config.connection_mutex, NULL);
-   val = to_dm_connector_state(connector->state)->abm_level ==
-   ABM_LEVEL_IMMEDIATE_DISABLE ? 0 :
-   to_dm_connector_state(connector->state)->abm_level;
+   if (to_dm_connector_state(connector->state)->abm_forbidden)
+   val = 0;
+   else
+   val = to_dm_connector_state(connector->state)->abm_level ==
+   ABM_LEVEL_IMMEDIATE_DISABLE ? 0 :
+   to_dm_connector_state(connector->state)->abm_level;
drm_modeset_unlock(>mode_config.connection_mutex);
 
return sysfs_emit(buf, "%u\n", val);
@@ -6515,10 +6527,16 @@ static ssize_t panel_power_savings_store(struct device 
*device,
return -EINVAL;
 
drm_modeset_lock(>mode_config.connection_mutex, NULL);
-   to_dm_connector_state(connector->state)->abm_level = val ?:
-   ABM_LEVEL_IMMEDIATE_DISABLE;
+   if (to_dm_connector_state(connector->state)->abm_forbidden)
+   ret = -EBUSY;
+   else
+   to_dm_connector_state(connector->state)->abm_level = val ?:
+   ABM_LEVEL_IMMEDIATE_DISABLE;
drm_modeset_unlock(>mode_config.connection_mutex);
 
+   if (ret)
+   return ret;
+
drm_kms_helper_hotplug_event(dev);
 
return count;
@@ -7689,6 +7707,14 @@ void amdgpu_dm_connector_init_helper(struct 
amdgpu_display_manager *dm,
aconnector->base.state->max_bpc = 16;
aconnector->base.state->max_requested_bpc = 
aconnector->base.state->max_bpc;
 
+   if (connector_type == DRM_MODE_CONNECTOR_eDP &&
+   (dc_is_dmcu_initialized(adev->dm.dc) ||
+adev->dm.dc->ctx->dmub_srv) && amdgpu_dm_abm_level < 0) {
+   drm_object_attach_property(>base.base,
+   dm->ddev->mode_config.panel_power_saving,
+   DRM_MODE_PANEL_POWER_SAVING_ALLOWED);
+   }
+
if (connector_type == DRM_MODE_CONNECTOR_HDMIA) {
/* Content Type is currently only implemented for HDMI. */
drm_connector_attach_content_type_property(>base);
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
index 09519b7abf67..43c3e5845a94 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
+++ 

[PATCH 0/2] Add support for Panel Power Savings property

2024-05-19 Thread Mario Limonciello
During the Display Next hackfest 2024 one of the topics discussed
was the need for compositor to be able to relay intention to drivers
that color fidelity is preferred over power savings.

To accomplish this a new optional DRM property is being introduced called
"panel power saving".  This property is an enum that can be configured
by the compositor to "Allowed" or "Forbidden".

When a driver advertises support for this property and the compositor
sets it to "Forbidden" then the driver will disable any power saving
features that can compromise color fidelity.

In practice the main feature this currently applies to is the
"Adaptive Backlight Modulation" feature within AMD DCN on eDP panels.

When the compositor has marked the property  "Forbidden" then this
feature will be disabled and any userspace that tries to turn it on
will get an -EBUSY return code.

When the compositor has restored the value back to "Allowed" then the
previous value that would have been programmed will be restored.

Mario Limonciello (2):
  drm: Introduce panel_power_saving drm property
  drm/amd: Add panel_power_saving drm property to eDP connectors

 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c   |  3 ++
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 34 ++---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  1 +
 drivers/gpu/drm/drm_connector.c   | 37 +++
 include/drm/drm_connector.h   |  1 +
 include/drm/drm_mode_config.h |  6 +++
 include/uapi/drm/drm_mode.h   |  4 ++
 7 files changed, 81 insertions(+), 5 deletions(-)

-- 
2.45.0



[PATCH] drm/amd/display: Pass errors from amdgpu_dm_init() up

2024-05-19 Thread Mario Limonciello
Errors in amdgpu_dm_init() are silently ignored and dm_hw_init()
will succeed. However often these are fatal errors and it would
be better to pass them up.

Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index d6e71aa808d8..01b0a331e4a6 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2556,8 +2556,12 @@ static int amdgpu_dm_smu_write_watermarks_table(struct 
amdgpu_device *adev)
 static int dm_hw_init(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+   int r;
+
/* Create DAL display manager */
-   amdgpu_dm_init(adev);
+   r = amdgpu_dm_init(adev);
+   if (r)
+   return r;
amdgpu_dm_hpd_init(adev);
 
return 0;
-- 
2.43.0



Re: [PATCH] tracing/treewide: Remove second parameter of __assign_str()

2024-05-19 Thread Guenter Roeck
On Thu, May 16, 2024 at 01:34:54PM -0400, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)" 
> 
> [
>This is a treewide change. I will likely re-create this patch again in
>the second week of the merge window of v6.10 and submit it then. Hoping
>to keep the conflicts that it will cause to a minimum.
> ]
> 
> With the rework of how the __string() handles dynamic strings where it
> saves off the source string in field in the helper structure[1], the
> assignment of that value to the trace event field is stored in the helper
> value and does not need to be passed in again.
> 
> This means that with:
> 
>   __string(field, mystring)
> 
> Which use to be assigned with __assign_str(field, mystring), no longer
> needs the second parameter and it is unused. With this, __assign_str()
> will now only get a single parameter.
> 
> There's over 700 users of __assign_str() and because coccinelle does not
> handle the TRACE_EVENT() macro I ended up using the following sed script:
> 
>   git grep -l __assign_str | while read a ; do
>   sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file;
>   mv /tmp/test-file $a;
>   done
> 
> I then searched for __assign_str() that did not end with ';' as those
> were multi line assignments that the sed script above would fail to catch.
> 

Building csky:allmodconfig (and others) ... failed
--
Error log:
In file included from include/trace/trace_events.h:419,
 from include/trace/define_trace.h:102,
 from drivers/cxl/core/trace.h:737,
 from drivers/cxl/core/trace.c:8:
drivers/cxl/core/./trace.h:383:1: error: macro "__assign_str" passed 2 
arguments, but takes just 1

This is with the patch applied on top of v6.9-8410-gff2632d7d08e.
So far that seems to be the only build failure.
Introduced with commit 6aec00139d3a8 ("cxl/core: Add region info to
cxl_general_media and cxl_dram events"). Guess we'll see more of those
towards the end of the commit window.

Guenter


Re: [PATCH] tracing/treewide: Remove second parameter of __assign_str()

2024-05-19 Thread Guenter Roeck

On 5/17/24 10:48, Steven Rostedt wrote:

On Fri, 17 May 2024 10:36:37 -0700
Guenter Roeck  wrote:


Building csky:allmodconfig (and others) ... failed
--
Error log:
In file included from include/trace/trace_events.h:419,
  from include/trace/define_trace.h:102,
  from drivers/cxl/core/trace.h:737,
  from drivers/cxl/core/trace.c:8:
drivers/cxl/core/./trace.h:383:1: error: macro "__assign_str" passed 2 
arguments, but takes just 1

This is with the patch applied on top of v6.9-8410-gff2632d7d08e.
So far that seems to be the only build failure.
Introduced with commit 6aec00139d3a8 ("cxl/core: Add region info to
cxl_general_media and cxl_dram events"). Guess we'll see more of those
towards the end of the commit window.


Looks like I made this patch just before this commit was pulled into
Linus's tree.

Which is why I'll apply and rerun the above again probably on Tuesday of
next week against Linus's latest.

This patch made it through both an allyesconfig and an allmodconfig, but on
the commit I had applied it to, which was:

   1b294a1f3561 ("Merge tag 'net-next-6.10' of 
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next")

I'll be compiling those two builds after I update it then.



I am currently repeating my test builds with the above errors fixed.
That should take a couple of hours. I'll let you know how it goes.

Guenter



Re: [PATCH] drm/amdgpu: Remove GC HW IP 9.3.0 from noretry=1

2024-05-19 Thread Tim Van Patten
> Fair enough, but this is also the only gfx9 APU which defaults to
> noretry=1, all of the rest are dGPUs.  I'd argue it should align with
> the other GFX9 APUs or they should all enable noretry=1.

Do you mean we should remove all IP_VERSION(9, X, X) entries from
amdgpu_gmc_noretry_set(), leaving just >= IP_VERSION(10, 3, 0)?


Re: Kernel 5.15.150 black screen with AMD Raven/Picasso GPU

2024-05-19 Thread Armin Wolf

Am 17.05.24 um 03:30 schrieb Barry Kauler:


Armin, Yifan, Prike,
I will top-post, so you don't have to scroll down.
After identifying the commit that causes black screen with my gpu, I
posted the result to you guys, on May 9.
It is now May 17 and no reply.
OK, I have now created a patch that reverts Yifan's commit, compiled
5.15.158, and my gpu now works.
Note, the radeon module is not loaded, so it is not a factor.
I'm not a kernel developer. I have identified the culprit and it is up
to you guys to fix it, Yifan especially, as you are the person who has
created the regression.
I will attach my patch.
Regards,
Barry Kauler


Hi,

sorry for not responding to your findings. I normally do not work with GPU 
drivers,
so i hoped one of the amdgpu developers would handle this.

I cceddri-de...@lists.freedesktop.org  and amd-gfx@lists.freedesktop.org so 
that other
amdgpu developers hear from this issue.

Thanks you for you persistence in finding the offending commit.
Armin Wolf



On Thu, May 9, 2024 at 4:08 PM Barry Kauler  wrote:

On Fri, May 3, 2024 at 9:03 PM Armin Wolf  wrote:

...
# lspci | grep VGA
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
[AMD/ATI] Picasso/Raven 2 [Radeon Vega Series / Radeon Vega Mobile
Series] (rev c2)
05:00.7 Non-VGA unclassified device: Advanced Micro Devices, Inc.
[AMD] Raven/Raven2/Renoir Non-Sensor Fusion Hub KMDF driver

# lspci -n -k
...
05:00.0 0300: 1002:15d8 (rev c2)
Subsystem: 1025:1456
Kernel driver in use: amdgpu
Kernel modules: amdgpu
...

thanks for informing us of this regression. Since there are four commits 
affecting
amdgpu in 5.15.150, i suggest that you use "git bisect" to find the faulty 
commits,
see https://docs.kernel.org/admin-guide/bug-bisect.html for details.

I think you can speed up the bisecting process by limiting yourself to the AMD 
DRM
driver directory with "git bisect start -- drivers/gpu/drm/amd", take a look at 
the
man page of "git bisect" for details.

Thanks,
Armin Wolf

Armin,
Thanks for the advice. I am unfamiliar with git on the commandline.
Previously only used SmartGit gui.
EasyOS requires aufs patch, and for a few days tried to figure out how
to use that with git bisect, then gave up. Changed to testing with my
"QV" distro, which is more conventional, doesn't need any kernel
patches. Managed to get it down to one commit. Here are the steps I
followed:

# git clone 
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
# cd linux-stable
# git tag -l | grep '5\.15\.150'
v5.15.150
# git checkout -b my5.15.150 v5.15.150
Updating files: 100% (65776/65776), done.
Switched to a new branch 'my5.15.150'

Copied in my .config then...

# make menuconfig
# git bisect start -- drivers/gpu/drm/amd
# git bisect bad
# git bisect good v5.15.149
Bisecting: 1 revision left to test after this (roughly 1 step)
[b9a61ee2bb2704e42516e3da962f99dfa98f3b20] drm/amdgpu: reset gpu for
s3 suspend abort case
# make
# rm -rf /boot2
# mkdir -p /boot2/lib/modules
# make INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=/boot2 modules_install
# cp arch/x86/boot/bzImage /boot2/vmlinuz
# sync
...QV on Acer laptop, with amdgpu, works!
# git bisect good
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[56b522f4668167096a50c39446d6263c96219f5f] drm/amdgpu: init iommu
after amdkfd device init
# make
# mkdir -p /boot2/lib/modules
# make INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=/boot2 modules_install
# cp arch/x86/boot/bzImage /boot2/vmlinuz
# sync
...QV on Acer laptop, black screen!

# git bisect bad
56b522f4668167096a50c39446d6263c96219f5f is the first bad commit
commit 56b522f4668167096a50c39446d6263c96219f5f
Author: Yifan Zhang 
Date:   Tue Sep 28 15:42:35 2021 +0800

 drm/amdgpu: init iommu after amdkfd device init

 [ Upstream commit 286826d7d976e7646b09149d9bc2899d74ff962b ]

 This patch is to fix clinfo failure in Raven/Picasso:

 Number of platforms: 1
   Platform Profile: FULL_PROFILE
   Platform Version: OpenCL 2.2 AMD-APP (3364.0)
   Platform Name: AMD Accelerated Parallel Processing
   Platform Vendor: Advanced Micro Devices, Inc.
   Platform Extensions: cl_khr_icd cl_amd_event_callback

   Platform Name: AMD Accelerated Parallel Processing Number of devices: 0

 Signed-off-by: Yifan Zhang 
 Reviewed-by: James Zhu 
 Tested-by: James Zhu 
 Acked-by: Felix Kuehling 
 Signed-off-by: Alex Deucher 
 Signed-off-by: Sasha Levin 

  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

Anything else I should do, to identify what in this commit is the
likely culprit?
Regards,
Barry Kauler


Re: [PATCH] tracing/treewide: Remove second parameter of __assign_str()

2024-05-19 Thread Steven Rostedt
On Fri, 17 May 2024 10:36:37 -0700
Guenter Roeck  wrote:

> Building csky:allmodconfig (and others) ... failed
> --
> Error log:
> In file included from include/trace/trace_events.h:419,
>  from include/trace/define_trace.h:102,
>  from drivers/cxl/core/trace.h:737,
>  from drivers/cxl/core/trace.c:8:
> drivers/cxl/core/./trace.h:383:1: error: macro "__assign_str" passed 2 
> arguments, but takes just 1
> 
> This is with the patch applied on top of v6.9-8410-gff2632d7d08e.
> So far that seems to be the only build failure.
> Introduced with commit 6aec00139d3a8 ("cxl/core: Add region info to
> cxl_general_media and cxl_dram events"). Guess we'll see more of those
> towards the end of the commit window.

Looks like I made this patch just before this commit was pulled into
Linus's tree.

Which is why I'll apply and rerun the above again probably on Tuesday of
next week against Linus's latest.

This patch made it through both an allyesconfig and an allmodconfig, but on
the commit I had applied it to, which was:

  1b294a1f3561 ("Merge tag 'net-next-6.10' of 
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next")

I'll be compiling those two builds after I update it then.

-- Steve


Re: [PATCH] drm/amdgpu: Remove GC HW IP 9.3.0 from noretry=1

2024-05-19 Thread Tim Van Patten
On Fri, May 17, 2024 at 12:35 AM Christian König
 wrote:
>
> Am 16.05.24 um 19:57 schrieb Tim Van Patten:
> > From: Tim Van Patten 
> >
> > The following commit updated gmc->noretry from 0 to 1 for GC HW IP
> > 9.3.0:
> >
> >  commit 5f3854f1f4e2 ("drm/amdgpu: add more cases to noretry=1")
> >
> > This causes the device to hang when a page fault occurs, until the
> > device is rebooted. Instead, revert back to gmc->noretry=0 so the device
> > is still responsive.
>
> Wait a second. Why does the device hang on a page fault? That shouldn't
> happen independent of noretry.

No idea, but hopefully someone within AMD can help answer that.

I'm not an expert in this area, I was just able to bisect to the CL
causing the change in behavior. There are other reports of people
bisecting to the same CL, so this issue appears to extend beyond
ChromeOS:
https://gitlab.freedesktop.org/mesa/mesa/-/issues/9728#note_2063879

> So that strongly sounds like this is just hiding a bug elsewhere.

That's entirely possible, bringing the number of real issues up to (at
least) two:
1. Why the page faults are occurring to begin with.
  - Any video of size 66x2158 seems to trigger the issue.
2. Why the page faults result in the device hanging with gmc->noretry=1.

I've asked prathyushi.nangia@amd (chris.kuruts@amd may be helping as
well) to look into the page faults further, since they can't hang the
device if they don't exist. She should be able to provide any further
details if you're interested, but please feel free to reach out to me
as well if you have any other questions.


Re: [PATCH] tracing/treewide: Remove second parameter of __assign_str()

2024-05-19 Thread Guenter Roeck

On 5/17/24 11:00, Guenter Roeck wrote:

On 5/17/24 10:48, Steven Rostedt wrote:

On Fri, 17 May 2024 10:36:37 -0700
Guenter Roeck  wrote:


Building csky:allmodconfig (and others) ... failed
--
Error log:
In file included from include/trace/trace_events.h:419,
  from include/trace/define_trace.h:102,
  from drivers/cxl/core/trace.h:737,
  from drivers/cxl/core/trace.c:8:
drivers/cxl/core/./trace.h:383:1: error: macro "__assign_str" passed 2 
arguments, but takes just 1

This is with the patch applied on top of v6.9-8410-gff2632d7d08e.
So far that seems to be the only build failure.
Introduced with commit 6aec00139d3a8 ("cxl/core: Add region info to
cxl_general_media and cxl_dram events"). Guess we'll see more of those
towards the end of the commit window.


Looks like I made this patch just before this commit was pulled into
Linus's tree.

Which is why I'll apply and rerun the above again probably on Tuesday of
next week against Linus's latest.

This patch made it through both an allyesconfig and an allmodconfig, but on
the commit I had applied it to, which was:

   1b294a1f3561 ("Merge tag 'net-next-6.10' of 
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next")

I'll be compiling those two builds after I update it then.



I am currently repeating my test builds with the above errors fixed.
That should take a couple of hours. I'll let you know how it goes.



There are no more build failures caused by this patch after fixing the above
errors.

Tested-by: Guenter Roeck 

Guenter