Re: [PATCH 1/2] Revert "drm/amd/pm: Use gpu_metrics_v1_6 for SMUv13.0.6"
On 5/20/2024 10:31 AM, Asad Kamal wrote: > Remove gpu_metrics_v1_6 usage for SMUv13.0.6 temporarily and use > gpu_metrics_v1_5 until tool support is ready for it. > > This reverts commit e6efb71ae640eada28f44cc97aa79e8ae4901e63. > > Signed-off-by: Asad Kamal Series is Reviewed-by: Lijo Lazar Thanks, Lijo > --- > .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 18 -- > 1 file changed, 4 insertions(+), 14 deletions(-) > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c > b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c > index ceb2174baff6..81a241ed18f5 100644 > --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c > @@ -351,7 +351,7 @@ static int smu_v13_0_6_tables_init(struct smu_context > *smu) > return -ENOMEM; > smu_table->metrics_time = 0; > > - smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v1_6); > + smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v1_5); > smu_table->gpu_metrics_table = > kzalloc(smu_table->gpu_metrics_table_size, GFP_KERNEL); > if (!smu_table->gpu_metrics_table) { > @@ -2290,8 +2290,8 @@ static int > smu_v13_0_6_get_current_pcie_link_speed(struct smu_context *smu) > static ssize_t smu_v13_0_6_get_gpu_metrics(struct smu_context *smu, void > **table) > { > struct smu_table_context *smu_table = >smu_table; > - struct gpu_metrics_v1_6 *gpu_metrics = > - (struct gpu_metrics_v1_6 *)smu_table->gpu_metrics_table; > + struct gpu_metrics_v1_5 *gpu_metrics = > + (struct gpu_metrics_v1_5 *)smu_table->gpu_metrics_table; > struct amdgpu_device *adev = smu->adev; > int ret = 0, xcc_id, inst, i, j; > MetricsTableX_t *metrics_x; > @@ -2307,7 +2307,7 @@ static ssize_t smu_v13_0_6_get_gpu_metrics(struct > smu_context *smu, void **table > > metrics_a = (MetricsTableA_t *)metrics_x; > > - smu_cmn_init_soft_gpu_metrics(gpu_metrics, 1, 6); > + smu_cmn_init_soft_gpu_metrics(gpu_metrics, 1, 5); > > gpu_metrics->temperature_hotspot = > SMUQ10_ROUND(GET_METRIC_FIELD(MaxSocketTemperature)); > @@ -2349,16 +2349,6 @@ static ssize_t smu_v13_0_6_get_gpu_metrics(struct > smu_context *smu, void **table > > gpu_metrics->current_uclk = > SMUQ10_ROUND(GET_METRIC_FIELD(UclkFrequency)); > > - /* Total accumulated cycle counter */ > - gpu_metrics->accumulation_counter = > GET_METRIC_FIELD(AccumulationCounter); > - > - /* Accumulated throttler residencies */ > - gpu_metrics->prochot_residency_acc = > GET_METRIC_FIELD(ProchotResidencyAcc); > - gpu_metrics->ppt_residency_acc = GET_METRIC_FIELD(PptResidencyAcc); > - gpu_metrics->socket_thm_residency_acc = > GET_METRIC_FIELD(SocketThmResidencyAcc); > - gpu_metrics->vr_thm_residency_acc = GET_METRIC_FIELD(VrThmResidencyAcc); > - gpu_metrics->hbm_thm_residency_acc = > GET_METRIC_FIELD(HbmThmResidencyAcc); > - > /* Throttle status is not reported through metrics now */ > gpu_metrics->throttle_status = 0; >
[PATCH 2/2] Revert "drm/amd/pm: Add gpu_metrics_v1_6"
Remove gpu_metrics_v1_6 temporarily until tool support is ready This reverts commit d24c43b1e3193415e28c4b1f3b2908ff88b28eb3. Signed-off-by: Asad Kamal --- .../gpu/drm/amd/include/kgd_pp_interface.h| 89 --- drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c| 3 - 2 files changed, 92 deletions(-) diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h b/drivers/gpu/drm/amd/include/kgd_pp_interface.h index ac76bcd81a5d..4b20e2274313 100644 --- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h +++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h @@ -871,95 +871,6 @@ struct gpu_metrics_v1_5 { uint16_tpadding; }; -struct gpu_metrics_v1_6 { - struct metrics_table_header common_header; - - /* Temperature (Celsius) */ - uint16_ttemperature_hotspot; - uint16_ttemperature_mem; - uint16_ttemperature_vrsoc; - - /* Power (Watts) */ - uint16_tcurr_socket_power; - - /* Utilization (%) */ - uint16_taverage_gfx_activity; - uint16_taverage_umc_activity; // memory controller - uint16_tvcn_activity[NUM_VCN]; - uint16_tjpeg_activity[NUM_JPEG_ENG]; - - /* Energy (15.259uJ (2^-16) units) */ - uint64_tenergy_accumulator; - - /* Driver attached timestamp (in ns) */ - uint64_tsystem_clock_counter; - - /* Accumulation cycle counter */ - uint32_taccumulation_counter; - - /* Accumulated throttler residencies */ - uint32_tprochot_residency_acc; - uint32_tppt_residency_acc; - uint32_tsocket_thm_residency_acc; - uint32_tvr_thm_residency_acc; - uint32_thbm_thm_residency_acc; - - /* Throttle status */ - uint32_tthrottle_status; - - /* Clock Lock Status. Each bit corresponds to clock instance */ - uint32_tgfxclk_lock_status; - - /* Link width (number of lanes) and speed (in 0.1 GT/s) */ - uint16_tpcie_link_width; - uint16_tpcie_link_speed; - - /* XGMI bus width and bitrate (in Gbps) */ - uint16_txgmi_link_width; - uint16_txgmi_link_speed; - - /* Utilization Accumulated (%) */ - uint32_tgfx_activity_acc; - uint32_tmem_activity_acc; - - /*PCIE accumulated bandwidth (Mbps) */ - uint64_tpcie_bandwidth_acc; - - /*PCIE instantaneous bandwidth (Mbps) */ - uint64_tpcie_bandwidth_inst; - - /* PCIE L0 to recovery state transition accumulated count */ - uint64_tpcie_l0_to_recov_count_acc; - - /* PCIE replay accumulated count */ - uint64_tpcie_replay_count_acc; - - /* PCIE replay rollover accumulated count */ - uint64_tpcie_replay_rover_count_acc; - - /* PCIE NAK sent accumulated count */ - uint32_tpcie_nak_sent_count_acc; - - /* PCIE NAK received accumulated count */ - uint32_tpcie_nak_rcvd_count_acc; - - /* XGMI accumulated data transfer size(KiloBytes) */ - uint64_txgmi_read_data_acc[NUM_XGMI_LINKS]; - uint64_txgmi_write_data_acc[NUM_XGMI_LINKS]; - - /* PMFW attached timestamp (10ns resolution) */ - uint64_tfirmware_timestamp; - - /* Current clocks (Mhz) */ - uint16_tcurrent_gfxclk[MAX_GFX_CLKS]; - uint16_tcurrent_socclk[MAX_CLKS]; - uint16_tcurrent_vclk0[MAX_CLKS]; - uint16_tcurrent_dclk0[MAX_CLKS]; - uint16_tcurrent_uclk; - - uint16_tpadding; -}; - /* * gpu_metrics_v2_0 is not recommended as it's not naturally aligned. * Use gpu_metrics_v2_1 or later instead. diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c index f265a449c342..5592fd825aa3 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c @@ -1052,9 +1052,6 @@ void smu_cmn_init_soft_gpu_metrics(void *table, uint8_t frev, uint8_t crev) case METRICS_VERSION(1, 5): structure_size = sizeof(struct gpu_metrics_v1_5); break; - case METRICS_VERSION(1, 6): -
[PATCH 1/2] Revert "drm/amd/pm: Use gpu_metrics_v1_6 for SMUv13.0.6"
Remove gpu_metrics_v1_6 usage for SMUv13.0.6 temporarily and use gpu_metrics_v1_5 until tool support is ready for it. This reverts commit e6efb71ae640eada28f44cc97aa79e8ae4901e63. Signed-off-by: Asad Kamal --- .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 18 -- 1 file changed, 4 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c index ceb2174baff6..81a241ed18f5 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c @@ -351,7 +351,7 @@ static int smu_v13_0_6_tables_init(struct smu_context *smu) return -ENOMEM; smu_table->metrics_time = 0; - smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v1_6); + smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v1_5); smu_table->gpu_metrics_table = kzalloc(smu_table->gpu_metrics_table_size, GFP_KERNEL); if (!smu_table->gpu_metrics_table) { @@ -2290,8 +2290,8 @@ static int smu_v13_0_6_get_current_pcie_link_speed(struct smu_context *smu) static ssize_t smu_v13_0_6_get_gpu_metrics(struct smu_context *smu, void **table) { struct smu_table_context *smu_table = >smu_table; - struct gpu_metrics_v1_6 *gpu_metrics = - (struct gpu_metrics_v1_6 *)smu_table->gpu_metrics_table; + struct gpu_metrics_v1_5 *gpu_metrics = + (struct gpu_metrics_v1_5 *)smu_table->gpu_metrics_table; struct amdgpu_device *adev = smu->adev; int ret = 0, xcc_id, inst, i, j; MetricsTableX_t *metrics_x; @@ -2307,7 +2307,7 @@ static ssize_t smu_v13_0_6_get_gpu_metrics(struct smu_context *smu, void **table metrics_a = (MetricsTableA_t *)metrics_x; - smu_cmn_init_soft_gpu_metrics(gpu_metrics, 1, 6); + smu_cmn_init_soft_gpu_metrics(gpu_metrics, 1, 5); gpu_metrics->temperature_hotspot = SMUQ10_ROUND(GET_METRIC_FIELD(MaxSocketTemperature)); @@ -2349,16 +2349,6 @@ static ssize_t smu_v13_0_6_get_gpu_metrics(struct smu_context *smu, void **table gpu_metrics->current_uclk = SMUQ10_ROUND(GET_METRIC_FIELD(UclkFrequency)); - /* Total accumulated cycle counter */ - gpu_metrics->accumulation_counter = GET_METRIC_FIELD(AccumulationCounter); - - /* Accumulated throttler residencies */ - gpu_metrics->prochot_residency_acc = GET_METRIC_FIELD(ProchotResidencyAcc); - gpu_metrics->ppt_residency_acc = GET_METRIC_FIELD(PptResidencyAcc); - gpu_metrics->socket_thm_residency_acc = GET_METRIC_FIELD(SocketThmResidencyAcc); - gpu_metrics->vr_thm_residency_acc = GET_METRIC_FIELD(VrThmResidencyAcc); - gpu_metrics->hbm_thm_residency_acc = GET_METRIC_FIELD(HbmThmResidencyAcc); - /* Throttle status is not reported through metrics now */ gpu_metrics->throttle_status = 0; -- 2.42.0
RE: [PATCH] drm/amdgpu: update type of buf size to u32 for eeprom functions
[AMD Official Use Only - AMD Internal Distribution Only] Hmm... but in __amdgpu_eeprom_xfer, the u32 will still be cut to u16. __amdgpu_eeprom_xfer(struct i2c_adapter *i2c_adap, u32 eeprom_addr, u8 *eeprom_buf, u16 buf_size, bool read) Regards, Hawking -Original Message- From: amd-gfx On Behalf Of Tao Zhou Sent: Monday, May 20, 2024 10:46 To: amd-gfx@lists.freedesktop.org Cc: Zhou1, Tao Subject: [PATCH] drm/amdgpu: update type of buf size to u32 for eeprom functions Avoid overflow issue. Signed-off-by: Tao Zhou mailto:tao.zh...@amd.com>> --- drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c | 6 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c index e71768661ca8..09a34c7258e2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c @@ -179,7 +179,7 @@ static int __amdgpu_eeprom_xfer(struct i2c_adapter *i2c_adap, u32 eeprom_addr, * Returns the number of bytes read/written; -errno on error. */ static int amdgpu_eeprom_xfer(struct i2c_adapter *i2c_adap, u32 eeprom_addr, - u8 *eeprom_buf, u16 buf_size, bool read) + u8 *eeprom_buf, u32 buf_size, bool read) { const struct i2c_adapter_quirks *quirks = i2c_adap->quirks; u16 limit; @@ -225,7 +225,7 @@ static int amdgpu_eeprom_xfer(struct i2c_adapter *i2c_adap, u32 eeprom_addr, int amdgpu_eeprom_read(struct i2c_adapter *i2c_adap, u32 eeprom_addr, u8 *eeprom_buf, - u16 bytes) + u32 bytes) { return amdgpu_eeprom_xfer(i2c_adap, eeprom_addr, eeprom_buf, bytes, true); @@ -233,7 +233,7 @@ int amdgpu_eeprom_read(struct i2c_adapter *i2c_adap, int amdgpu_eeprom_write(struct i2c_adapter *i2c_adap, u32 eeprom_addr, u8 *eeprom_buf, - u16 bytes) + u32 bytes) { return amdgpu_eeprom_xfer(i2c_adap, eeprom_addr, eeprom_buf, bytes, false); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h index 6935adb2be1f..8083b8253ef4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h @@ -28,10 +28,10 @@ int amdgpu_eeprom_read(struct i2c_adapter *i2c_adap, u32 eeprom_addr, u8 *eeprom_buf, - u16 bytes); + u32 bytes); int amdgpu_eeprom_write(struct i2c_adapter *i2c_adap, u32 eeprom_addr, u8 *eeprom_buf, - u16 bytes); + u32 bytes); #endif -- 2.34.1
RE: [PATCH] drm/amdgpu: update type of buf size to u32 for eeprom functions
[AMD Official Use Only - AMD Internal Distribution Only] Reviewed-by: Yang Wang Best Regards, Kevin -Original Message- From: amd-gfx On Behalf Of Tao Zhou Sent: Monday, May 20, 2024 10:46 AM To: amd-gfx@lists.freedesktop.org Cc: Zhou1, Tao Subject: [PATCH] drm/amdgpu: update type of buf size to u32 for eeprom functions Avoid overflow issue. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c | 6 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c index e71768661ca8..09a34c7258e2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c @@ -179,7 +179,7 @@ static int __amdgpu_eeprom_xfer(struct i2c_adapter *i2c_adap, u32 eeprom_addr, * Returns the number of bytes read/written; -errno on error. */ static int amdgpu_eeprom_xfer(struct i2c_adapter *i2c_adap, u32 eeprom_addr, - u8 *eeprom_buf, u16 buf_size, bool read) + u8 *eeprom_buf, u32 buf_size, bool read) { const struct i2c_adapter_quirks *quirks = i2c_adap->quirks; u16 limit; @@ -225,7 +225,7 @@ static int amdgpu_eeprom_xfer(struct i2c_adapter *i2c_adap, u32 eeprom_addr, int amdgpu_eeprom_read(struct i2c_adapter *i2c_adap, u32 eeprom_addr, u8 *eeprom_buf, - u16 bytes) + u32 bytes) { return amdgpu_eeprom_xfer(i2c_adap, eeprom_addr, eeprom_buf, bytes, true); @@ -233,7 +233,7 @@ int amdgpu_eeprom_read(struct i2c_adapter *i2c_adap, int amdgpu_eeprom_write(struct i2c_adapter *i2c_adap, u32 eeprom_addr, u8 *eeprom_buf, - u16 bytes) + u32 bytes) { return amdgpu_eeprom_xfer(i2c_adap, eeprom_addr, eeprom_buf, bytes, false); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h index 6935adb2be1f..8083b8253ef4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h @@ -28,10 +28,10 @@ int amdgpu_eeprom_read(struct i2c_adapter *i2c_adap, u32 eeprom_addr, u8 *eeprom_buf, - u16 bytes); + u32 bytes); int amdgpu_eeprom_write(struct i2c_adapter *i2c_adap, u32 eeprom_addr, u8 *eeprom_buf, - u16 bytes); + u32 bytes); #endif -- 2.34.1
[PATCH] drm/amdgpu: update type of buf size to u32 for eeprom functions
Avoid overflow issue. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c | 6 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c index e71768661ca8..09a34c7258e2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c @@ -179,7 +179,7 @@ static int __amdgpu_eeprom_xfer(struct i2c_adapter *i2c_adap, u32 eeprom_addr, * Returns the number of bytes read/written; -errno on error. */ static int amdgpu_eeprom_xfer(struct i2c_adapter *i2c_adap, u32 eeprom_addr, - u8 *eeprom_buf, u16 buf_size, bool read) + u8 *eeprom_buf, u32 buf_size, bool read) { const struct i2c_adapter_quirks *quirks = i2c_adap->quirks; u16 limit; @@ -225,7 +225,7 @@ static int amdgpu_eeprom_xfer(struct i2c_adapter *i2c_adap, u32 eeprom_addr, int amdgpu_eeprom_read(struct i2c_adapter *i2c_adap, u32 eeprom_addr, u8 *eeprom_buf, - u16 bytes) + u32 bytes) { return amdgpu_eeprom_xfer(i2c_adap, eeprom_addr, eeprom_buf, bytes, true); @@ -233,7 +233,7 @@ int amdgpu_eeprom_read(struct i2c_adapter *i2c_adap, int amdgpu_eeprom_write(struct i2c_adapter *i2c_adap, u32 eeprom_addr, u8 *eeprom_buf, - u16 bytes) + u32 bytes) { return amdgpu_eeprom_xfer(i2c_adap, eeprom_addr, eeprom_buf, bytes, false); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h index 6935adb2be1f..8083b8253ef4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h @@ -28,10 +28,10 @@ int amdgpu_eeprom_read(struct i2c_adapter *i2c_adap, u32 eeprom_addr, u8 *eeprom_buf, - u16 bytes); + u32 bytes); int amdgpu_eeprom_write(struct i2c_adapter *i2c_adap, u32 eeprom_addr, u8 *eeprom_buf, - u16 bytes); + u32 bytes); #endif -- 2.34.1
RE: [PATCH 2/2] drm/amdgpu: Queue KFD reset workitem in VF FED
[AMD Official Use Only - AMD Internal Distribution Only] Reviewed-by: Zhigang Luo -Original Message- From: amd-gfx On Behalf Of Victor Skvortsov Sent: Sunday, May 19, 2024 10:52 AM To: amd-gfx@lists.freedesktop.org Cc: Skvortsov, Victor Subject: [PATCH 2/2] drm/amdgpu: Queue KFD reset workitem in VF FED The guest recovery sequence is buggy in Fatal Error when both FLR & KFD reset workitems are queued at the same time. In addition, FLR guest recovery sequence is out of order when PF/VF communication breaks due to a GPU fatal error As a temporary work around, perform a KFD style reset (Initiate reset request from the guest) inside the pf2vf thread on FED. Signed-off-by: Victor Skvortsov --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c index d98d619fba97..3d5f58e76f2d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c @@ -602,7 +602,7 @@ static void amdgpu_virt_update_vf2pf_work_item(struct work_struct *work) amdgpu_sriov_runtime(adev)) { amdgpu_ras_set_fed(adev, true); if (amdgpu_reset_domain_schedule(adev->reset_domain, - >virt.flr_work)) + >kfd.reset_work)) return; else dev_err(adev->dev, "Failed to queue work! at %s", __func__); -- 2.34.1
RE: [PATCH 1/2] drm/amdgpu: Extend KIQ reg polling wait for VF
[AMD Official Use Only - AMD Internal Distribution Only] Reviewed-by: Zhigang Luo -Original Message- From: amd-gfx On Behalf Of Victor Skvortsov Sent: Sunday, May 19, 2024 10:52 AM To: amd-gfx@lists.freedesktop.org Cc: Skvortsov, Victor Subject: [PATCH 1/2] drm/amdgpu: Extend KIQ reg polling wait for VF Runtime KIQ interface to read/write registers in VF may take longer than expected for BM environment. Extend the timeout. Signed-off-by: Victor Skvortsov --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index d749c6abdc5e..e8980b6009c9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -348,9 +348,9 @@ enum amdgpu_kiq_irq { AMDGPU_CP_KIQ_IRQ_DRIVER0 = 0, AMDGPU_CP_KIQ_IRQ_LAST }; -#define SRIOV_USEC_TIMEOUT 120 /* wait 12 * 100ms for SRIOV */ -#define MAX_KIQ_REG_WAIT 5000 /* in usecs, 5ms */ -#define MAX_KIQ_REG_BAILOUT_INTERVAL 5 /* in msecs, 5ms */ +#define SRIOV_USEC_TIMEOUT 120 /* wait 12 * 100ms for SRIOV */ +#define MAX_KIQ_REG_WAIT amdgpu_sriov_vf(adev) ? 5 : 5000 /* in +usecs, extend for VF */ #define MAX_KIQ_REG_BAILOUT_INTERVAL 5 /* in +msecs, 5ms */ #define MAX_KIQ_REG_TRY 1000 int amdgpu_device_ip_set_clockgating_state(void *dev, -- 2.34.1
[PATCH 2/2] drm/amdgpu: Queue KFD reset workitem in VF FED
The guest recovery sequence is buggy in Fatal Error when both FLR & KFD reset workitems are queued at the same time. In addition, FLR guest recovery sequence is out of order when PF/VF communication breaks due to a GPU fatal error As a temporary work around, perform a KFD style reset (Initiate reset request from the guest) inside the pf2vf thread on FED. Signed-off-by: Victor Skvortsov --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c index d98d619fba97..3d5f58e76f2d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c @@ -602,7 +602,7 @@ static void amdgpu_virt_update_vf2pf_work_item(struct work_struct *work) amdgpu_sriov_runtime(adev)) { amdgpu_ras_set_fed(adev, true); if (amdgpu_reset_domain_schedule(adev->reset_domain, - >virt.flr_work)) + >kfd.reset_work)) return; else dev_err(adev->dev, "Failed to queue work! at %s", __func__); -- 2.34.1
[PATCH 1/2] drm/amdgpu: Extend KIQ reg polling wait for VF
Runtime KIQ interface to read/write registers in VF may take longer than expected for BM environment. Extend the timeout. Signed-off-by: Victor Skvortsov --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index d749c6abdc5e..e8980b6009c9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -348,9 +348,9 @@ enum amdgpu_kiq_irq { AMDGPU_CP_KIQ_IRQ_DRIVER0 = 0, AMDGPU_CP_KIQ_IRQ_LAST }; -#define SRIOV_USEC_TIMEOUT 120 /* wait 12 * 100ms for SRIOV */ -#define MAX_KIQ_REG_WAIT 5000 /* in usecs, 5ms */ -#define MAX_KIQ_REG_BAILOUT_INTERVAL 5 /* in msecs, 5ms */ +#define SRIOV_USEC_TIMEOUT 120 /* wait 12 * 100ms for SRIOV */ +#define MAX_KIQ_REG_WAIT amdgpu_sriov_vf(adev) ? 5 : 5000 /* in usecs, extend for VF */ +#define MAX_KIQ_REG_BAILOUT_INTERVAL 5 /* in msecs, 5ms */ #define MAX_KIQ_REG_TRY 1000 int amdgpu_device_ip_set_clockgating_state(void *dev, -- 2.34.1
[PATCH 2/2] tests/amdgpu/amd_abm: Add support for panel_power_saving property
From: Mario Limonciello When the "panel power saving" property is set to forbidden the compositor has indicated that userspace prefers to have color accuracy and fidelity instead of power saving. Verify that the sysfs file behaves as expected in this situation. Signed-off-by: Mario Limonciello --- tests/amdgpu/amd_abm.c | 65 ++ 1 file changed, 65 insertions(+) diff --git a/tests/amdgpu/amd_abm.c b/tests/amdgpu/amd_abm.c index f74c3012c..971cda153 100644 --- a/tests/amdgpu/amd_abm.c +++ b/tests/amdgpu/amd_abm.c @@ -104,6 +104,32 @@ static int backlight_write_brightness(int value) return 0; } +static int toggle_panel_power_saving_prop(data_t *data, igt_output_t *output, bool forbid) +{ + uint32_t type = DRM_MODE_OBJECT_CONNECTOR; + bool prop_exists; + uint32_t prop_id; + + prop_exists = kmstest_get_property( + data->drm_fd, output->id, type, "panel power saving", + _id, NULL, NULL); + + if (!prop_exists) + return -ENODEV; + + return drmModeObjectSetProperty(data->drm_fd, output->id, type, prop_id, forbid); +} + +static int allow_panel_power_saving(data_t *data, igt_output_t *output) +{ + return toggle_panel_power_saving_prop(data, output, false); +} + +static int forbid_panel_power_saving(data_t *data, igt_output_t *output) +{ + return toggle_panel_power_saving_prop(data, output, true); +} + static int set_abm_level(data_t *data, igt_output_t *output, int level) { char buf[PATH_MAX]; @@ -365,6 +391,43 @@ static void abm_gradual(data_t *data) } } + +static void abm_forbidden(data_t *data) +{ + igt_output_t *output; + enum pipe pipe; + int target, r; + + for_each_pipe_with_valid_output(>display, pipe, output) { + if (output->config.connector->connector_type != DRM_MODE_CONNECTOR_eDP) + continue; + + r = allow_panel_power_saving(data, output); + if (r == -ENODEV) { + igt_skip("No panel power saving prop\n"); + return; + } + igt_assert_eq(r, 0); + + target = 3; + r = set_abm_level(data, output, target); + igt_assert_eq(r, 0); + + r = forbid_panel_power_saving(data, output); + igt_assert_eq(r, 0); + + target = 0; + r = set_abm_level(data, output, target); + igt_assert_eq(r, -1); + + r = allow_panel_power_saving(data, output); + igt_assert_eq(r, 0); + + r = set_abm_level(data, output, target); + igt_assert_eq(r, 0); + } +} + igt_main { data_t data = {}; @@ -393,6 +456,8 @@ igt_main abm_enabled(); igt_subtest("abm_gradual") abm_gradual(); + igt_subtest("abm_forbidden") + abm_forbidden(); igt_fixture { igt_display_fini(); -- 2.45.0
[PATCH 0/2] Add support for testing panel power saving DRM property
During the Display Next hackfest 2024 one of the topics discussed was the need for compositor to be able to relay intention to drivers that color fidelity is preferred over power savings. To accomplish this a new optional DRM property is being introduced called "panel power saving". This property is an enum that can be configured by the compositor to "Allowed" or "Forbidden". When a driver advertises support for this property and the compositor sets it to "Forbidden" then the driver will disable any power saving features that can compromise color fidelity. This set of IGT changes introduces a new subtest that will cover the expected kernel behavior when switching between Allowed and Forbidden. Mario Limonciello (2): tests/amdgpu/amd_abm: Make set_abm_level return type int tests/amdgpu/amd_abm: Add support for panel_power_saving property tests/amdgpu/amd_abm.c | 98 +- 1 file changed, 88 insertions(+), 10 deletions(-) -- 2.45.0
[PATCH 1/2] tests/amdgpu/amd_abm: Make set_abm_level return type int
From: Mario Limonciello In order to bubble of cases of expeted errors on set_abm_level() change the return type to int. Signed-off-by: Mario Limonciello --- tests/amdgpu/amd_abm.c | 33 +++-- 1 file changed, 23 insertions(+), 10 deletions(-) diff --git a/tests/amdgpu/amd_abm.c b/tests/amdgpu/amd_abm.c index 2882c2c18..f74c3012c 100644 --- a/tests/amdgpu/amd_abm.c +++ b/tests/amdgpu/amd_abm.c @@ -104,10 +104,11 @@ static int backlight_write_brightness(int value) return 0; } -static void set_abm_level(data_t *data, igt_output_t *output, int level) +static int set_abm_level(data_t *data, igt_output_t *output, int level) { char buf[PATH_MAX]; int fd; + int ret; igt_assert(snprintf(buf, PATH_MAX, PANEL_POWER_SAVINGS_PATH, output->name) < PATH_MAX); @@ -116,8 +117,12 @@ static void set_abm_level(data_t *data, igt_output_t *output, int level) igt_assert(fd != -1); - igt_assert_eq(snprintf(buf, sizeof(buf), "%d", level), - write(fd, buf, 1)); + snprintf(buf, sizeof(buf), "%d", level); + ret = write(fd, buf, 1); + if (ret < 0) { + close(fd); + return ret; + } igt_assert_eq(close(fd), 0); @@ -129,6 +134,7 @@ static void set_abm_level(data_t *data, igt_output_t *output, int level) DRM_MODE_DPMS_OFF); kmstest_set_connector_dpms(data->drm_fd, output->config.connector, DRM_MODE_DPMS_ON); + return 0; } static int backlight_read_max_brightness(int *result) @@ -192,7 +198,8 @@ static void backlight_dpms_cycle(data_t *data) ret = backlight_read_max_brightness(_brightness); igt_assert_eq(ret, 0); - set_abm_level(data, output, 0); + ret = set_abm_level(data, output, 0); + igt_assert_eq(ret, 0); backlight_write_brightness(max_brightness / 2); usleep(10); pwm_1 = read_target_backlight_pwm(data->drm_fd, output->name); @@ -223,7 +230,8 @@ static void backlight_monotonic_basic(data_t *data) brightness_step = max_brightness / 10; - set_abm_level(data, output, 0); + ret = set_abm_level(data, output, 0); + igt_assert_eq(ret, 0); backlight_write_brightness(max_brightness); usleep(10); prev_pwm = read_target_backlight_pwm(data->drm_fd, output->name); @@ -257,7 +265,8 @@ static void backlight_monotonic_abm(data_t *data) brightness_step = max_brightness / 10; for (i = 1; i < 5; i++) { - set_abm_level(data, output, 0); + ret = set_abm_level(data, output, 0); + igt_assert_eq(ret, 0); backlight_write_brightness(max_brightness); usleep(10); prev_pwm = read_target_backlight_pwm(data->drm_fd, output->name); @@ -289,14 +298,16 @@ static void abm_enabled(data_t *data) ret = backlight_read_max_brightness(_brightness); igt_assert_eq(ret, 0); - set_abm_level(data, output, 0); + ret = set_abm_level(data, output, 0); + igt_assert_eq(ret, 0); backlight_write_brightness(max_brightness); usleep(10); prev_pwm = read_target_backlight_pwm(data->drm_fd, output->name); pwm_without_abm = prev_pwm; for (i = 1; i < 5; i++) { - set_abm_level(data, output, i); + ret = set_abm_level(data, output, i); + igt_assert_eq(ret, 0); usleep(10); pwm = read_target_backlight_pwm(data->drm_fd, output->name); igt_assert(pwm <= prev_pwm); @@ -323,7 +334,8 @@ static void abm_gradual(data_t *data) igt_assert_eq(ret, 0); - set_abm_level(data, output, 0); + ret = set_abm_level(data, output, 0); + igt_assert_eq(ret, 0); backlight_write_brightness(max_brightness); sleep(convergence_delay); @@ -331,7 +343,8 @@ static void abm_gradual(data_t *data) curr = read_current_backlight_pwm(data->drm_fd, output->name); igt_assert_eq(prev_pwm, curr); - set_abm_level(data, output, 4); + ret = set_abm_level(data, output, 4); + igt_assert_eq(ret, 0); for (i = 0; i < 10; i++) { usleep(10); pwm = read_current_backlight_pwm(data->drm_fd, output->name); -- 2.45.0
[PATCH 1/2] drm: Introduce panel_power_saving drm property
The `panel_power_saving` DRM property is an optional property that can be added to a connector by a driver. This property is for compositors to indicate intent of allowing policy for the driver to use power saving features that may compromise color fidelity. Signed-off-by: Mario Limonciello --- drivers/gpu/drm/drm_connector.c | 36 + include/drm/drm_connector.h | 1 + include/drm/drm_mode_config.h | 6 ++ include/uapi/drm/drm_mode.h | 4 4 files changed, 47 insertions(+) diff --git a/drivers/gpu/drm/drm_connector.c b/drivers/gpu/drm/drm_connector.c index 4d2df7f64dc5..ccf672c55e0c 100644 --- a/drivers/gpu/drm/drm_connector.c +++ b/drivers/gpu/drm/drm_connector.c @@ -961,6 +961,11 @@ static const struct drm_prop_enum_list drm_scaling_mode_enum_list[] = { { DRM_MODE_SCALE_ASPECT, "Full aspect" }, }; +static const struct drm_prop_enum_list drm_panel_power_saving_enum_list[] = { + { DRM_MODE_PANEL_POWER_SAVING_ALLOWED, "Allowed" }, + { DRM_MODE_PANEL_POWER_SAVING_FORBIDDEN, "Forbidden" }, +}; + static const struct drm_prop_enum_list drm_aspect_ratio_enum_list[] = { { DRM_MODE_PICTURE_ASPECT_NONE, "Automatic" }, { DRM_MODE_PICTURE_ASPECT_4_3, "4:3" }, @@ -1963,6 +1968,37 @@ int drm_mode_create_scaling_mode_property(struct drm_device *dev) } EXPORT_SYMBOL(drm_mode_create_scaling_mode_property); +/** + * drm_mode_create_panel_power_saving_property - create panel power saving property + * @dev: DRM device + * + * Called by a driver the first time it's needed, must be attached to desired + * connectors. + * + * Atomic drivers should use drm_mode_create_panel_power_saving_property() + * instead to correctly assign _connector_state.panel_power_saving + * in the atomic state. + * + * Returns: %0 + */ +int drm_mode_create_panel_power_saving_property(struct drm_device *dev) +{ + struct drm_property *panel_power_saving; + + if (dev->mode_config.panel_power_saving) + return 0; + + panel_power_saving = + drm_property_create_enum(dev, 0, "panel power saving", + drm_panel_power_saving_enum_list, + ARRAY_SIZE(drm_panel_power_saving_enum_list)); + + dev->mode_config.panel_power_saving = panel_power_saving; + + return 0; +} +EXPORT_SYMBOL(drm_mode_create_panel_power_saving_property); + /** * DOC: Variable refresh properties * diff --git a/include/drm/drm_connector.h b/include/drm/drm_connector.h index fe88d7fc6b8f..4ea3f912c641 100644 --- a/include/drm/drm_connector.h +++ b/include/drm/drm_connector.h @@ -2025,6 +2025,7 @@ int drm_mode_create_dp_colorspace_property(struct drm_connector *connector, u32 supported_colorspaces); int drm_mode_create_content_type_property(struct drm_device *dev); int drm_mode_create_suggested_offset_properties(struct drm_device *dev); +int drm_mode_create_panel_power_saving_property(struct drm_device *dev); int drm_connector_set_path_property(struct drm_connector *connector, const char *path); diff --git a/include/drm/drm_mode_config.h b/include/drm/drm_mode_config.h index 973119a9176b..099ad2d8c5c1 100644 --- a/include/drm/drm_mode_config.h +++ b/include/drm/drm_mode_config.h @@ -954,6 +954,12 @@ struct drm_mode_config { */ struct drm_atomic_state *suspend_state; + /** +* @panel_power_saving: DRM ENUM property for type of +* Panel Power Saving. +*/ + struct drm_property *panel_power_saving; + const struct drm_mode_config_helper_funcs *helper_private; }; diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h index 7040e7ea80c7..82e565cc76fb 100644 --- a/include/uapi/drm/drm_mode.h +++ b/include/uapi/drm/drm_mode.h @@ -152,6 +152,10 @@ extern "C" { #define DRM_MODE_SCALE_CENTER 2 /* Centered, no scaling */ #define DRM_MODE_SCALE_ASPECT 3 /* Full screen, preserve aspect */ +/* Panel power saving options */ +#define DRM_MODE_PANEL_POWER_SAVING_ALLOWED0 /* Panel power savings features allowed */ +#define DRM_MODE_PANEL_POWER_SAVING_FORBIDDEN 1 /* Panel power savings features not allowed */ + /* Dithering mode options */ #define DRM_MODE_DITHERING_OFF 0 #define DRM_MODE_DITHERING_ON 1 -- 2.45.0
[PATCH 2/2] drm/amd: Add panel_power_saving drm property to eDP connectors
When the `panel_power_saving` property is set to "Forbidden" ABM should be disabled immediately and any requests by sysfs to update will return an -EBUSY error. When the property is restored to "Allowed" the previous value of ABM will be restored. Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 3 ++ .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 36 --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 1 + 3 files changed, 35 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c index 3ecc7ef95172..6e6531c93d81 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c @@ -1350,6 +1350,9 @@ int amdgpu_display_modeset_create_props(struct amdgpu_device *adev) "dither", amdgpu_dither_enum_list, sz); + if (adev->dc_enabled) + drm_mode_create_panel_power_saving_property(adev_to_drm(adev)); + return 0; } diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 01b0a331e4a6..f6b80018b136 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -6421,6 +6421,12 @@ int amdgpu_dm_connector_atomic_set_property(struct drm_connector *connector, } else if (property == adev->mode_info.underscan_property) { dm_new_state->underscan_enable = val; ret = 0; + } else if (property == dev->mode_config.panel_power_saving) { + dm_new_state->abm_forbidden = val; + dm_new_state->abm_level = (val || !amdgpu_dm_abm_level) ? + ABM_LEVEL_IMMEDIATE_DISABLE : + amdgpu_dm_abm_level; + ret = 0; } return ret; @@ -6463,6 +6469,9 @@ int amdgpu_dm_connector_atomic_get_property(struct drm_connector *connector, } else if (property == adev->mode_info.underscan_property) { *val = dm_state->underscan_enable; ret = 0; + } else if (property == dev->mode_config.panel_power_saving) { + *val = dm_state->abm_forbidden; + ret = 0; } return ret; @@ -6489,9 +6498,12 @@ static ssize_t panel_power_savings_show(struct device *device, u8 val; drm_modeset_lock(>mode_config.connection_mutex, NULL); - val = to_dm_connector_state(connector->state)->abm_level == - ABM_LEVEL_IMMEDIATE_DISABLE ? 0 : - to_dm_connector_state(connector->state)->abm_level; + if (to_dm_connector_state(connector->state)->abm_forbidden) + val = 0; + else + val = to_dm_connector_state(connector->state)->abm_level == + ABM_LEVEL_IMMEDIATE_DISABLE ? 0 : + to_dm_connector_state(connector->state)->abm_level; drm_modeset_unlock(>mode_config.connection_mutex); return sysfs_emit(buf, "%u\n", val); @@ -6515,10 +6527,16 @@ static ssize_t panel_power_savings_store(struct device *device, return -EINVAL; drm_modeset_lock(>mode_config.connection_mutex, NULL); - to_dm_connector_state(connector->state)->abm_level = val ?: - ABM_LEVEL_IMMEDIATE_DISABLE; + if (to_dm_connector_state(connector->state)->abm_forbidden) + ret = -EBUSY; + else + to_dm_connector_state(connector->state)->abm_level = val ?: + ABM_LEVEL_IMMEDIATE_DISABLE; drm_modeset_unlock(>mode_config.connection_mutex); + if (ret) + return ret; + drm_kms_helper_hotplug_event(dev); return count; @@ -7689,6 +7707,14 @@ void amdgpu_dm_connector_init_helper(struct amdgpu_display_manager *dm, aconnector->base.state->max_bpc = 16; aconnector->base.state->max_requested_bpc = aconnector->base.state->max_bpc; + if (connector_type == DRM_MODE_CONNECTOR_eDP && + (dc_is_dmcu_initialized(adev->dm.dc) || +adev->dm.dc->ctx->dmub_srv) && amdgpu_dm_abm_level < 0) { + drm_object_attach_property(>base.base, + dm->ddev->mode_config.panel_power_saving, + DRM_MODE_PANEL_POWER_SAVING_ALLOWED); + } + if (connector_type == DRM_MODE_CONNECTOR_HDMIA) { /* Content Type is currently only implemented for HDMI. */ drm_connector_attach_content_type_property(>base); diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h index 09519b7abf67..43c3e5845a94 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h +++
[PATCH 0/2] Add support for Panel Power Savings property
During the Display Next hackfest 2024 one of the topics discussed was the need for compositor to be able to relay intention to drivers that color fidelity is preferred over power savings. To accomplish this a new optional DRM property is being introduced called "panel power saving". This property is an enum that can be configured by the compositor to "Allowed" or "Forbidden". When a driver advertises support for this property and the compositor sets it to "Forbidden" then the driver will disable any power saving features that can compromise color fidelity. In practice the main feature this currently applies to is the "Adaptive Backlight Modulation" feature within AMD DCN on eDP panels. When the compositor has marked the property "Forbidden" then this feature will be disabled and any userspace that tries to turn it on will get an -EBUSY return code. When the compositor has restored the value back to "Allowed" then the previous value that would have been programmed will be restored. Mario Limonciello (2): drm: Introduce panel_power_saving drm property drm/amd: Add panel_power_saving drm property to eDP connectors drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 3 ++ .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 34 ++--- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 1 + drivers/gpu/drm/drm_connector.c | 37 +++ include/drm/drm_connector.h | 1 + include/drm/drm_mode_config.h | 6 +++ include/uapi/drm/drm_mode.h | 4 ++ 7 files changed, 81 insertions(+), 5 deletions(-) -- 2.45.0
[PATCH] drm/amd/display: Pass errors from amdgpu_dm_init() up
Errors in amdgpu_dm_init() are silently ignored and dm_hw_init() will succeed. However often these are fatal errors and it would be better to pass them up. Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index d6e71aa808d8..01b0a331e4a6 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -2556,8 +2556,12 @@ static int amdgpu_dm_smu_write_watermarks_table(struct amdgpu_device *adev) static int dm_hw_init(void *handle) { struct amdgpu_device *adev = (struct amdgpu_device *)handle; + int r; + /* Create DAL display manager */ - amdgpu_dm_init(adev); + r = amdgpu_dm_init(adev); + if (r) + return r; amdgpu_dm_hpd_init(adev); return 0; -- 2.43.0
Re: [PATCH] tracing/treewide: Remove second parameter of __assign_str()
On Thu, May 16, 2024 at 01:34:54PM -0400, Steven Rostedt wrote: > From: "Steven Rostedt (Google)" > > [ >This is a treewide change. I will likely re-create this patch again in >the second week of the merge window of v6.10 and submit it then. Hoping >to keep the conflicts that it will cause to a minimum. > ] > > With the rework of how the __string() handles dynamic strings where it > saves off the source string in field in the helper structure[1], the > assignment of that value to the trace event field is stored in the helper > value and does not need to be passed in again. > > This means that with: > > __string(field, mystring) > > Which use to be assigned with __assign_str(field, mystring), no longer > needs the second parameter and it is unused. With this, __assign_str() > will now only get a single parameter. > > There's over 700 users of __assign_str() and because coccinelle does not > handle the TRACE_EVENT() macro I ended up using the following sed script: > > git grep -l __assign_str | while read a ; do > sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file; > mv /tmp/test-file $a; > done > > I then searched for __assign_str() that did not end with ';' as those > were multi line assignments that the sed script above would fail to catch. > Building csky:allmodconfig (and others) ... failed -- Error log: In file included from include/trace/trace_events.h:419, from include/trace/define_trace.h:102, from drivers/cxl/core/trace.h:737, from drivers/cxl/core/trace.c:8: drivers/cxl/core/./trace.h:383:1: error: macro "__assign_str" passed 2 arguments, but takes just 1 This is with the patch applied on top of v6.9-8410-gff2632d7d08e. So far that seems to be the only build failure. Introduced with commit 6aec00139d3a8 ("cxl/core: Add region info to cxl_general_media and cxl_dram events"). Guess we'll see more of those towards the end of the commit window. Guenter
Re: [PATCH] tracing/treewide: Remove second parameter of __assign_str()
On 5/17/24 10:48, Steven Rostedt wrote: On Fri, 17 May 2024 10:36:37 -0700 Guenter Roeck wrote: Building csky:allmodconfig (and others) ... failed -- Error log: In file included from include/trace/trace_events.h:419, from include/trace/define_trace.h:102, from drivers/cxl/core/trace.h:737, from drivers/cxl/core/trace.c:8: drivers/cxl/core/./trace.h:383:1: error: macro "__assign_str" passed 2 arguments, but takes just 1 This is with the patch applied on top of v6.9-8410-gff2632d7d08e. So far that seems to be the only build failure. Introduced with commit 6aec00139d3a8 ("cxl/core: Add region info to cxl_general_media and cxl_dram events"). Guess we'll see more of those towards the end of the commit window. Looks like I made this patch just before this commit was pulled into Linus's tree. Which is why I'll apply and rerun the above again probably on Tuesday of next week against Linus's latest. This patch made it through both an allyesconfig and an allmodconfig, but on the commit I had applied it to, which was: 1b294a1f3561 ("Merge tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next") I'll be compiling those two builds after I update it then. I am currently repeating my test builds with the above errors fixed. That should take a couple of hours. I'll let you know how it goes. Guenter
Re: [PATCH] drm/amdgpu: Remove GC HW IP 9.3.0 from noretry=1
> Fair enough, but this is also the only gfx9 APU which defaults to > noretry=1, all of the rest are dGPUs. I'd argue it should align with > the other GFX9 APUs or they should all enable noretry=1. Do you mean we should remove all IP_VERSION(9, X, X) entries from amdgpu_gmc_noretry_set(), leaving just >= IP_VERSION(10, 3, 0)?
Re: Kernel 5.15.150 black screen with AMD Raven/Picasso GPU
Am 17.05.24 um 03:30 schrieb Barry Kauler: Armin, Yifan, Prike, I will top-post, so you don't have to scroll down. After identifying the commit that causes black screen with my gpu, I posted the result to you guys, on May 9. It is now May 17 and no reply. OK, I have now created a patch that reverts Yifan's commit, compiled 5.15.158, and my gpu now works. Note, the radeon module is not loaded, so it is not a factor. I'm not a kernel developer. I have identified the culprit and it is up to you guys to fix it, Yifan especially, as you are the person who has created the regression. I will attach my patch. Regards, Barry Kauler Hi, sorry for not responding to your findings. I normally do not work with GPU drivers, so i hoped one of the amdgpu developers would handle this. I cceddri-de...@lists.freedesktop.org and amd-gfx@lists.freedesktop.org so that other amdgpu developers hear from this issue. Thanks you for you persistence in finding the offending commit. Armin Wolf On Thu, May 9, 2024 at 4:08 PM Barry Kauler wrote: On Fri, May 3, 2024 at 9:03 PM Armin Wolf wrote: ... # lspci | grep VGA 05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Picasso/Raven 2 [Radeon Vega Series / Radeon Vega Mobile Series] (rev c2) 05:00.7 Non-VGA unclassified device: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/Renoir Non-Sensor Fusion Hub KMDF driver # lspci -n -k ... 05:00.0 0300: 1002:15d8 (rev c2) Subsystem: 1025:1456 Kernel driver in use: amdgpu Kernel modules: amdgpu ... thanks for informing us of this regression. Since there are four commits affecting amdgpu in 5.15.150, i suggest that you use "git bisect" to find the faulty commits, see https://docs.kernel.org/admin-guide/bug-bisect.html for details. I think you can speed up the bisecting process by limiting yourself to the AMD DRM driver directory with "git bisect start -- drivers/gpu/drm/amd", take a look at the man page of "git bisect" for details. Thanks, Armin Wolf Armin, Thanks for the advice. I am unfamiliar with git on the commandline. Previously only used SmartGit gui. EasyOS requires aufs patch, and for a few days tried to figure out how to use that with git bisect, then gave up. Changed to testing with my "QV" distro, which is more conventional, doesn't need any kernel patches. Managed to get it down to one commit. Here are the steps I followed: # git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git # cd linux-stable # git tag -l | grep '5\.15\.150' v5.15.150 # git checkout -b my5.15.150 v5.15.150 Updating files: 100% (65776/65776), done. Switched to a new branch 'my5.15.150' Copied in my .config then... # make menuconfig # git bisect start -- drivers/gpu/drm/amd # git bisect bad # git bisect good v5.15.149 Bisecting: 1 revision left to test after this (roughly 1 step) [b9a61ee2bb2704e42516e3da962f99dfa98f3b20] drm/amdgpu: reset gpu for s3 suspend abort case # make # rm -rf /boot2 # mkdir -p /boot2/lib/modules # make INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=/boot2 modules_install # cp arch/x86/boot/bzImage /boot2/vmlinuz # sync ...QV on Acer laptop, with amdgpu, works! # git bisect good Bisecting: 0 revisions left to test after this (roughly 0 steps) [56b522f4668167096a50c39446d6263c96219f5f] drm/amdgpu: init iommu after amdkfd device init # make # mkdir -p /boot2/lib/modules # make INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=/boot2 modules_install # cp arch/x86/boot/bzImage /boot2/vmlinuz # sync ...QV on Acer laptop, black screen! # git bisect bad 56b522f4668167096a50c39446d6263c96219f5f is the first bad commit commit 56b522f4668167096a50c39446d6263c96219f5f Author: Yifan Zhang Date: Tue Sep 28 15:42:35 2021 +0800 drm/amdgpu: init iommu after amdkfd device init [ Upstream commit 286826d7d976e7646b09149d9bc2899d74ff962b ] This patch is to fix clinfo failure in Raven/Picasso: Number of platforms: 1 Platform Profile: FULL_PROFILE Platform Version: OpenCL 2.2 AMD-APP (3364.0) Platform Name: AMD Accelerated Parallel Processing Platform Vendor: Advanced Micro Devices, Inc. Platform Extensions: cl_khr_icd cl_amd_event_callback Platform Name: AMD Accelerated Parallel Processing Number of devices: 0 Signed-off-by: Yifan Zhang Reviewed-by: James Zhu Tested-by: James Zhu Acked-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) Anything else I should do, to identify what in this commit is the likely culprit? Regards, Barry Kauler
Re: [PATCH] tracing/treewide: Remove second parameter of __assign_str()
On Fri, 17 May 2024 10:36:37 -0700 Guenter Roeck wrote: > Building csky:allmodconfig (and others) ... failed > -- > Error log: > In file included from include/trace/trace_events.h:419, > from include/trace/define_trace.h:102, > from drivers/cxl/core/trace.h:737, > from drivers/cxl/core/trace.c:8: > drivers/cxl/core/./trace.h:383:1: error: macro "__assign_str" passed 2 > arguments, but takes just 1 > > This is with the patch applied on top of v6.9-8410-gff2632d7d08e. > So far that seems to be the only build failure. > Introduced with commit 6aec00139d3a8 ("cxl/core: Add region info to > cxl_general_media and cxl_dram events"). Guess we'll see more of those > towards the end of the commit window. Looks like I made this patch just before this commit was pulled into Linus's tree. Which is why I'll apply and rerun the above again probably on Tuesday of next week against Linus's latest. This patch made it through both an allyesconfig and an allmodconfig, but on the commit I had applied it to, which was: 1b294a1f3561 ("Merge tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next") I'll be compiling those two builds after I update it then. -- Steve
Re: [PATCH] drm/amdgpu: Remove GC HW IP 9.3.0 from noretry=1
On Fri, May 17, 2024 at 12:35 AM Christian König wrote: > > Am 16.05.24 um 19:57 schrieb Tim Van Patten: > > From: Tim Van Patten > > > > The following commit updated gmc->noretry from 0 to 1 for GC HW IP > > 9.3.0: > > > > commit 5f3854f1f4e2 ("drm/amdgpu: add more cases to noretry=1") > > > > This causes the device to hang when a page fault occurs, until the > > device is rebooted. Instead, revert back to gmc->noretry=0 so the device > > is still responsive. > > Wait a second. Why does the device hang on a page fault? That shouldn't > happen independent of noretry. No idea, but hopefully someone within AMD can help answer that. I'm not an expert in this area, I was just able to bisect to the CL causing the change in behavior. There are other reports of people bisecting to the same CL, so this issue appears to extend beyond ChromeOS: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9728#note_2063879 > So that strongly sounds like this is just hiding a bug elsewhere. That's entirely possible, bringing the number of real issues up to (at least) two: 1. Why the page faults are occurring to begin with. - Any video of size 66x2158 seems to trigger the issue. 2. Why the page faults result in the device hanging with gmc->noretry=1. I've asked prathyushi.nangia@amd (chris.kuruts@amd may be helping as well) to look into the page faults further, since they can't hang the device if they don't exist. She should be able to provide any further details if you're interested, but please feel free to reach out to me as well if you have any other questions.
Re: [PATCH] tracing/treewide: Remove second parameter of __assign_str()
On 5/17/24 11:00, Guenter Roeck wrote: On 5/17/24 10:48, Steven Rostedt wrote: On Fri, 17 May 2024 10:36:37 -0700 Guenter Roeck wrote: Building csky:allmodconfig (and others) ... failed -- Error log: In file included from include/trace/trace_events.h:419, from include/trace/define_trace.h:102, from drivers/cxl/core/trace.h:737, from drivers/cxl/core/trace.c:8: drivers/cxl/core/./trace.h:383:1: error: macro "__assign_str" passed 2 arguments, but takes just 1 This is with the patch applied on top of v6.9-8410-gff2632d7d08e. So far that seems to be the only build failure. Introduced with commit 6aec00139d3a8 ("cxl/core: Add region info to cxl_general_media and cxl_dram events"). Guess we'll see more of those towards the end of the commit window. Looks like I made this patch just before this commit was pulled into Linus's tree. Which is why I'll apply and rerun the above again probably on Tuesday of next week against Linus's latest. This patch made it through both an allyesconfig and an allmodconfig, but on the commit I had applied it to, which was: 1b294a1f3561 ("Merge tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next") I'll be compiling those two builds after I update it then. I am currently repeating my test builds with the above errors fixed. That should take a couple of hours. I'll let you know how it goes. There are no more build failures caused by this patch after fixing the above errors. Tested-by: Guenter Roeck Guenter