Re: [PATCH 3/5] drm/amdgpu: Fix null pointer dereference to aca_handle
On 5/14/2024 6:30 AM, Ma, Jun wrote: > Hi Lijo & Kevin, thanks for review, will drop this patch > In the original function below check is there. if (!handle || !info || type >= ACA_ERROR_TYPE_COUNT) return -EINVAL; So moving this to a later stage is still valid. struct aca_error_cache *error_cache = >error_cache; Further NULL check of error_cache is not required. Thanks, Lijo > Regards, > Ma Jun > > On 5/14/2024 7:13 AM, Wang, Yang(Kevin) wrote: >> [AMD Official Use Only - AMD Internal Distribution Only] >> >> -Original Message- >> From: Ma, Jun >> Sent: Monday, May 13, 2024 4:56 PM >> To: amd-gfx@lists.freedesktop.org >> Cc: Feng, Kenneth ; Deucher, Alexander >> ; Wang, Yang(Kevin) ; >> Koenig, Christian ; Ma, Jun >> Subject: [PATCH 3/5] drm/amdgpu: Fix null pointer dereference to aca_handle >> >> Check handle pointer before using it >> >> Signed-off-by: Ma Jun >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 6 +- >> 1 file changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c >> index 28febf33fb1b..e969a7d77b4d 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c >> @@ -279,7 +279,7 @@ static struct aca_bank_error *get_bank_error(struct >> aca_error *aerr, struct aca_ int aca_error_cache_log_bank_error(struct >> aca_handle *handle, struct aca_bank_info *info, >>enum aca_error_type type, u64 count) { >> - struct aca_error_cache *error_cache = >error_cache; >> + struct aca_error_cache *error_cache; >> struct aca_bank_error *bank_error; >> struct aca_error *aerr; >> >> @@ -289,6 +289,10 @@ int aca_error_cache_log_bank_error(struct aca_handle >> *handle, struct aca_bank_in >> if (!count) >> return 0; >> >> + error_cache = >error_cache; >> [Kevin]: >> The above code is always return non-0 value, right? >> >> Best Regards, >> Kevin >> + if (!error_cache) >> + return -EINVAL; >> + >> aerr = _cache->errors[type]; >> bank_error = get_bank_error(aerr, info); >> if (!bank_error) >> -- >> 2.34.1 >>
[PATCH] drm/amdgpu/pm: Drop hard-code value of usTMax
Drop hard-code value of nsTmax because we read this value from fantable below. Signed-off-by: Ma Jun --- drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c index 17882f8dfdd3..6cfef1b295ab 100644 --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c @@ -977,8 +977,6 @@ static int init_thermal_controller( = le16_to_cpu(tonga_fan_table->usPWMMed); hwmgr->thermal_controller.advanceFanControlParameters.usPWMHigh = le16_to_cpu(tonga_fan_table->usPWMHigh); - hwmgr->thermal_controller.advanceFanControlParameters.usTMax - = 10900; /* hard coded */ hwmgr->thermal_controller.advanceFanControlParameters.usTMax = le16_to_cpu(tonga_fan_table->usTMax); hwmgr->thermal_controller.advanceFanControlParameters.ucFanControlMode -- 2.34.1
[PATCH v2] drm/amdgpu: Fix the null pointer dereference to ras_manager
Check ras_manager before using it Signed-off-by: Ma Jun --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 925ec65ac5ed..2bcf5c3b5d70 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -2172,12 +2172,15 @@ static void amdgpu_ras_interrupt_process_handler(struct work_struct *work) int amdgpu_ras_interrupt_dispatch(struct amdgpu_device *adev, struct ras_dispatch_if *info) { - struct ras_manager *obj = amdgpu_ras_find_obj(adev, >head); - struct ras_ih_data *data = >ih_data; + struct ras_manager *obj; + struct ras_ih_data *data; + obj = amdgpu_ras_find_obj(adev, >head); if (!obj) return -EINVAL; + data = >ih_data; + if (data->inuse == 0) return 0; -- 2.34.1
RE: [PATCH 01/22] drm/amdgpu: fix dereference after null check
[AMD Official Use Only - AMD Internal Distribution Only] Hi Christian, -Original Message- From: Christian König Sent: Monday, May 13, 2024 7:41 PM To: Zhang, Jesse(Jie) ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Koenig, Christian ; Huang, Tim Subject: Re: [PATCH 01/22] drm/amdgpu: fix dereference after null check Am 10.05.24 um 04:50 schrieb Jesse Zhang: > check the pointer hive before use. > > Signed-off-by: Jesse Zhang > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index 77f6fd50002a..00fe3c2d5431 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -5725,7 +5725,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device > *adev, >* to put adev in the 1st position. >*/ > INIT_LIST_HEAD(_list); > - if (!amdgpu_sriov_vf(adev) && (adev->gmc.xgmi.num_physical_nodes > 1)) { > + if (!amdgpu_sriov_vf(adev) && (adev->gmc.xgmi.num_physical_nodes > > +1) && hive) { That solution looks not optimal to me. Checking adev->gmc.xgmi.num_physical_nodes > 1 already makes sure that hive shouldn't be NULL. If automated checkers complain about that we should probably drop the adev->gmc.xgmi.num_physical_nodes > 1 check and check for hive instead. [Zhang, Jesse(Jie)] gmc.xgmi.num_physical_nodes is obtained by reading register GCMC_VM_XGMI_LFB_CNTL. But getting hive may fail because of no memory (NOMEM), or the kobject of xgmi hive cannot be initialized in the function amdgpu_get_xgmi_hive. Is (adev->gmc.xgmi.num_physical_nodes > 1) equivalent to (!hive) here? Regards Jesse Regards, Christian. > list_for_each_entry(tmp_adev, >device_list, > gmc.xgmi.head) { > list_add_tail(_adev->reset_list, _list); > if (adev->shutdown)
RE: [PATCH 1/5] drm/amdgpu/pm: Fix the null pointer dereference for smu7
[AMD Official Use Only - AMD Internal Distribution Only] Reviewed-by: Yang Wang Best Regards, Kevin -Original Message- From: Ma, Jun Sent: Monday, May 13, 2024 4:56 PM To: amd-gfx@lists.freedesktop.org Cc: Feng, Kenneth ; Deucher, Alexander ; Wang, Yang(Kevin) ; Koenig, Christian ; Ma, Jun Subject: [PATCH 1/5] drm/amdgpu/pm: Fix the null pointer dereference for smu7 optimize the code to avoid pass a null pointer (hwmgr->backend) to function smu7_update_edc_leakage_table. Signed-off-by: Ma Jun --- .../drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c | 50 +-- 1 file changed, 24 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c index 123af237878f..632a25957477 100644 --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c @@ -2957,6 +2957,7 @@ static int smu7_update_edc_leakage_table(struct pp_hwmgr *hwmgr) static int smu7_hwmgr_backend_init(struct pp_hwmgr *hwmgr) { + struct amdgpu_device *adev = hwmgr->adev; struct smu7_hwmgr *data; int result = 0; @@ -2993,40 +2994,37 @@ static int smu7_hwmgr_backend_init(struct pp_hwmgr *hwmgr) /* Initalize Dynamic State Adjustment Rule Settings */ result = phm_initializa_dynamic_state_adjustment_rule_settings(hwmgr); - if (0 == result) { - struct amdgpu_device *adev = hwmgr->adev; + if (result) + goto fail; - data->is_tlu_enabled = false; + data->is_tlu_enabled = false; - hwmgr->platform_descriptor.hardwareActivityPerformanceLevels = + hwmgr->platform_descriptor.hardwareActivityPerformanceLevels = SMU7_MAX_HARDWARE_POWERLEVELS; - hwmgr->platform_descriptor.hardwarePerformanceLevels = 2; - hwmgr->platform_descriptor.minimumClocksReductionPercentage = 50; + hwmgr->platform_descriptor.hardwarePerformanceLevels = 2; + hwmgr->platform_descriptor.minimumClocksReductionPercentage = 50; - data->pcie_gen_cap = adev->pm.pcie_gen_mask; - if (data->pcie_gen_cap & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3) - data->pcie_spc_cap = 20; - else - data->pcie_spc_cap = 16; - data->pcie_lane_cap = adev->pm.pcie_mlw_mask; - - hwmgr->platform_descriptor.vbiosInterruptId = 0x2400; /* IRQ_SOURCE1_SW_INT */ -/* The true clock step depends on the frequency, typically 4.5 or 9 MHz. Here we use 5. */ - hwmgr->platform_descriptor.clockStep.engineClock = 500; - hwmgr->platform_descriptor.clockStep.memoryClock = 500; - smu7_thermal_parameter_init(hwmgr); - } else { - /* Ignore return value in here, we are cleaning up a mess. */ - smu7_hwmgr_backend_fini(hwmgr); - } + data->pcie_gen_cap = adev->pm.pcie_gen_mask; + if (data->pcie_gen_cap & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3) + data->pcie_spc_cap = 20; + else + data->pcie_spc_cap = 16; + data->pcie_lane_cap = adev->pm.pcie_mlw_mask; + + hwmgr->platform_descriptor.vbiosInterruptId = 0x2400; /* IRQ_SOURCE1_SW_INT */ + /* The true clock step depends on the frequency, typically 4.5 or 9 MHz. Here we use 5. */ + hwmgr->platform_descriptor.clockStep.engineClock = 500; + hwmgr->platform_descriptor.clockStep.memoryClock = 500; + smu7_thermal_parameter_init(hwmgr); result = smu7_update_edc_leakage_table(hwmgr); - if (result) { - smu7_hwmgr_backend_fini(hwmgr); - return result; - } + if (result) + goto fail; return 0; +fail: + smu7_hwmgr_backend_fini(hwmgr); + return result; } static int smu7_force_dpm_highest(struct pp_hwmgr *hwmgr) -- 2.34.1
RE: [PATCH 5/5] drm/amdgpu: Remove dead code in amdgpu_ras_add_mca_err_addr
[AMD Official Use Only - AMD Internal Distribution Only] Reviewed-by: YiPeng Chai - Best Regards, Thomas -Original Message- From: amd-gfx On Behalf Of Ma Jun Sent: Monday, May 13, 2024 4:56 PM To: amd-gfx@lists.freedesktop.org Cc: Feng, Kenneth ; Deucher, Alexander ; Wang, Yang(Kevin) ; Koenig, Christian ; Ma, Jun Subject: [PATCH 5/5] drm/amdgpu: Remove dead code in amdgpu_ras_add_mca_err_addr Remove dead code in amdgpu_ras_add_mca_err_addr Signed-off-by: Ma Jun --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 13 - 1 file changed, 13 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 6da02a209890..0cf67923c0fc 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -4292,21 +4292,8 @@ static struct ras_err_info *amdgpu_ras_error_get_info(struct ras_err_data *err_d void amdgpu_ras_add_mca_err_addr(struct ras_err_info *err_info, struct ras_err_addr *err_addr) { - struct ras_err_addr *mca_err_addr; - /* This function will be retired. */ return; - mca_err_addr = kzalloc(sizeof(*mca_err_addr), GFP_KERNEL); - if (!mca_err_addr) - return; - - INIT_LIST_HEAD(_err_addr->node); - - mca_err_addr->err_status = err_addr->err_status; - mca_err_addr->err_ipid = err_addr->err_ipid; - mca_err_addr->err_addr = err_addr->err_addr; - - list_add_tail(_err_addr->node, _info->err_addr_list); } void amdgpu_ras_del_mca_err_addr(struct ras_err_info *err_info, struct ras_err_addr *mca_err_addr) -- 2.34.1
Re: [PATCH 3/5] drm/amdgpu: Fix null pointer dereference to aca_handle
Hi Lijo & Kevin, thanks for review, will drop this patch Regards, Ma Jun On 5/14/2024 7:13 AM, Wang, Yang(Kevin) wrote: > [AMD Official Use Only - AMD Internal Distribution Only] > > -Original Message- > From: Ma, Jun > Sent: Monday, May 13, 2024 4:56 PM > To: amd-gfx@lists.freedesktop.org > Cc: Feng, Kenneth ; Deucher, Alexander > ; Wang, Yang(Kevin) ; > Koenig, Christian ; Ma, Jun > Subject: [PATCH 3/5] drm/amdgpu: Fix null pointer dereference to aca_handle > > Check handle pointer before using it > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c > index 28febf33fb1b..e969a7d77b4d 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c > @@ -279,7 +279,7 @@ static struct aca_bank_error *get_bank_error(struct > aca_error *aerr, struct aca_ int aca_error_cache_log_bank_error(struct > aca_handle *handle, struct aca_bank_info *info, >enum aca_error_type type, u64 count) { > - struct aca_error_cache *error_cache = >error_cache; > + struct aca_error_cache *error_cache; > struct aca_bank_error *bank_error; > struct aca_error *aerr; > > @@ -289,6 +289,10 @@ int aca_error_cache_log_bank_error(struct aca_handle > *handle, struct aca_bank_in > if (!count) > return 0; > > + error_cache = >error_cache; > [Kevin]: > The above code is always return non-0 value, right? > > Best Regards, > Kevin > + if (!error_cache) > + return -EINVAL; > + > aerr = _cache->errors[type]; > bank_error = get_bank_error(aerr, info); > if (!bank_error) > -- > 2.34.1 >
[PATCH] drm/amdgpu: fix compiler 'side-effect' check issue for RAS_EVENT_LOG()
create a new helper function to avoid compiler 'side-effect' check about RAS_EVENT_LOG() macro. Signed-off-by: Yang Wang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 18 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 13 ++--- 2 files changed, 24 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 1dd13ed3b7b5..c04e6ced1af3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -4504,3 +4504,21 @@ int amdgpu_ras_reserve_page(struct amdgpu_device *adev, uint64_t pfn) return ret; } + +void amdgpu_ras_event_log_print(struct amdgpu_device *adev, u64 event_id, + const char *fmt, ...) +{ + struct va_format vaf; + va_list args; + + va_start(args, fmt); + vaf.fmt = fmt; + vaf.va = + + if (amdgpu_ras_event_id_is_valid(adev, event_id)) + dev_printk(KERN_INFO, adev->dev, "{%llu}%pV", event_id, ); + else + dev_printk(KERN_INFO, adev->dev, "%pV", ); + + va_end(args); +} diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h index c8980d5f6540..6a8c7b1609df 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h @@ -67,13 +67,8 @@ struct amdgpu_iv_entry; /* The high three bits indicates socketid */ #define AMDGPU_RAS_GET_FEATURES(val) ((val) & ~AMDGPU_RAS_FEATURES_SOCKETID_MASK) -#define RAS_EVENT_LOG(_adev, _id, _fmt, ...) \ -do { \ - if (amdgpu_ras_event_id_is_valid((_adev), (_id))) \ - dev_info((_adev)->dev, "{%llu}" _fmt, (_id), ##__VA_ARGS__); \ - else\ - dev_info((_adev)->dev, _fmt, ##__VA_ARGS__); \ -} while (0) +#define RAS_EVENT_LOG(adev, id, fmt, ...) \ + amdgpu_ras_event_log_print((adev), (id), (fmt), ##__VA_ARGS__); enum amdgpu_ras_block { AMDGPU_RAS_BLOCK__UMC = 0, @@ -956,4 +951,8 @@ int amdgpu_ras_put_poison_req(struct amdgpu_device *adev, enum amdgpu_ras_block block, uint16_t pasid, pasid_notify pasid_fn, void *data, uint32_t reset); +__printf(3, 4) +void amdgpu_ras_event_log_print(struct amdgpu_device *adev, u64 event_id, + const char *fmt, ...); + #endif -- 2.34.1
Re: [PATCH] drm/kfd: Correct pined buffer handling at kfd restore and validate process
On 2024-05-13 11:18, Xiaogang.Chen wrote: > From: Xiaogang Chen > > This reverts 8a774fe912ff09e39c2d3a3589c729330113f388 "drm/amdgpu: avoid > restore > process run into dead loop" since buffer got pined is not related whether it Spelling: pined -> pinned Same in the commit headline. > needs mapping. And skip buffer validation at kfd driver if the buffer has been > pinned. > > Signed-off-by: Xiaogang Chen > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 + > 1 file changed, 5 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > index 3314821e4cf3..80018738bd1c 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > @@ -415,6 +415,10 @@ static int amdgpu_amdkfd_bo_validate(struct amdgpu_bo > *bo, uint32_t domain, >"Called with userptr BO")) > return -EINVAL; > > + /* bo has been pined, not need validate it */ pined -> pinned With those typos fixed, the patch is Reviewed-by: Felix Kuehling > + if (bo->tbo.pin_count) > + return 0; > + > amdgpu_bo_placement_from_domain(bo, domain); > > ret = ttm_bo_validate(>tbo, >placement, ); > @@ -2736,7 +2740,7 @@ static int confirm_valid_user_pages_locked(struct > amdkfd_process_info *process_i > > /* keep mem without hmm range at userptr_inval_list */ > if (!mem->range) > - continue; > + continue; > > /* Only check mem with hmm range associated */ > valid = amdgpu_ttm_tt_get_user_pages_done( > @@ -2981,9 +2985,6 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, > struct dma_fence __rcu * > if (!attachment->is_mapped) > continue; > > - if (attachment->bo_va->base.bo->tbo.pin_count) > - continue; > - > kfd_mem_dmaunmap_attachment(mem, attachment); > ret = update_gpuvm_pte(mem, attachment, _obj); > if (ret) {
RE: [PATCH 3/5] drm/amdgpu: Fix null pointer dereference to aca_handle
[AMD Official Use Only - AMD Internal Distribution Only] -Original Message- From: Ma, Jun Sent: Monday, May 13, 2024 4:56 PM To: amd-gfx@lists.freedesktop.org Cc: Feng, Kenneth ; Deucher, Alexander ; Wang, Yang(Kevin) ; Koenig, Christian ; Ma, Jun Subject: [PATCH 3/5] drm/amdgpu: Fix null pointer dereference to aca_handle Check handle pointer before using it Signed-off-by: Ma Jun --- drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c index 28febf33fb1b..e969a7d77b4d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c @@ -279,7 +279,7 @@ static struct aca_bank_error *get_bank_error(struct aca_error *aerr, struct aca_ int aca_error_cache_log_bank_error(struct aca_handle *handle, struct aca_bank_info *info, enum aca_error_type type, u64 count) { - struct aca_error_cache *error_cache = >error_cache; + struct aca_error_cache *error_cache; struct aca_bank_error *bank_error; struct aca_error *aerr; @@ -289,6 +289,10 @@ int aca_error_cache_log_bank_error(struct aca_handle *handle, struct aca_bank_in if (!count) return 0; + error_cache = >error_cache; [Kevin]: The above code is always return non-0 value, right? Best Regards, Kevin + if (!error_cache) + return -EINVAL; + aerr = _cache->errors[type]; bank_error = get_bank_error(aerr, info); if (!bank_error) -- 2.34.1
RE: [PATCH v3 10/10] Documentation/amdgpu: Add PM policy documentation
[Public] > -Original Message- > From: Lazar, Lijo > Sent: Monday, May 13, 2024 5:21 AM > To: amd-gfx@lists.freedesktop.org > Cc: Zhang, Hawking ; Deucher, Alexander > ; Kamal, Asad ; Ma, > Le > Subject: [PATCH v3 10/10] Documentation/amdgpu: Add PM policy > documentation > > Add documentation about the newly added pm_policy node in sysfs. > > Signed-off-by: Lijo Lazar > --- > Documentation/gpu/amdgpu/thermal.rst | 6 > drivers/gpu/drm/amd/pm/amdgpu_pm.c | 48 > > 2 files changed, 54 insertions(+) > > diff --git a/Documentation/gpu/amdgpu/thermal.rst > b/Documentation/gpu/amdgpu/thermal.rst > index 2f6166f81e6a..6d942b5c58f0 100644 > --- a/Documentation/gpu/amdgpu/thermal.rst > +++ b/Documentation/gpu/amdgpu/thermal.rst > @@ -49,6 +49,12 @@ pp_power_profile_mode .. kernel-doc:: > drivers/gpu/drm/amd/pm/amdgpu_pm.c > :doc: pp_power_profile_mode > > +pm_policy > +- > + > +.. kernel-doc:: drivers/gpu/drm/amd/pm/amdgpu_pm.c > + :doc: pm_policy > + > \*_busy_percent > --- > > diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c > b/drivers/gpu/drm/amd/pm/amdgpu_pm.c > index 9cca4716ec43..45766d49f1f2 100644 > --- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c > +++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c > @@ -2214,6 +2214,54 @@ static int pp_dpm_clk_default_attr_update(struct > amdgpu_device *adev, struct amd > return 0; > } > > +/** > + * DOC: pm_policy > + * > + * Certain SOCs can support different power policies to optimize > +application > + * performance. However, this policy is provided only at SOC level and > +not at a > + * per-process level. This is useful especially when entire SOC is > +utilized for > + * dedicated workload. > + * > + * The amdgpu driver provides a sysfs API for selecting the policy. > +Presently, > + * only two types of policies are supported through this interface. > + * > + * Pstate Policy Selection - This is to select different Pstate > +profiles which > + * decides clock/throttling preferences. > + * > + * XGMI PLPD Policy Selection - When multiple devices are connected > +over XGMI, > + * this helps to select policy to be applied for per link power down. > + * > + * The list of available policies and policy levels vary between SOCs. > +They can > + * be viewed by reading the file. The policy level which is applied > +presently is > + * denoted by * (asterisk). E.g., > + * > + * .. code-block:: console > + * > + * cat /sys/bus/pci/devices/.../pm_policy > + * soc pstate > + * 0 : soc_pstate_default > + * 1 : soc_pstate_0 > + * 2 : soc_pstate_1* > + * 3 : soc_pstate_2 > + * xgmi plpd > + * 0 : plpd_disallow > + * 1 : plpd_default > + * 2 : plpd_optimized* > + * > + * To apply a specific policy > + * > + * "echo > /sys/bus/pci/devices/.../pm_policy" > + * > + * For the levels listed in the example above, to select > +"plpd_optimized" for > + * XGMI and "soc_pstate_2" for soc pstate policy - > + * > + * .. code-block:: console > + * > + * echo "xgmi 2" > /sys/bus/pci/devices/.../pm_policy > + * echo "soc_pstate 3" > /sys/bus/pci/devices/.../pm_policy The naming should be consistent between what is printed when you read the file and what you write to it. E.g., policy_type should be soc_pstate and xgmi_plpd in both cases. Alex > + * > + */ > + > static ssize_t amdgpu_get_pm_policy(struct device *dev, > struct device_attribute *attr, char *buf) { > -- > 2.25.1
[PATCH 2/2] drm/amdgpu: apply state adjust rules to some additional HAINAN vairants
They need a similar workaround. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1839 Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c index f245fc0bc6d36..c836bb03547f6 100644 --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c @@ -3435,9 +3435,11 @@ static void si_apply_state_adjust_rules(struct amdgpu_device *adev, if (adev->asic_type == CHIP_HAINAN) { if ((adev->pdev->revision == 0x81) || (adev->pdev->revision == 0xC3) || + (adev->pdev->device == 0x6660) || (adev->pdev->device == 0x6664) || (adev->pdev->device == 0x6665) || - (adev->pdev->device == 0x6667)) { + (adev->pdev->device == 0x6667) || + (adev->pdev->device == 0x666F)) { max_sclk = 75000; } if ((adev->pdev->revision == 0xC3) || -- 2.45.0
[PATCH 1/2] drm/radeon: apply state adjust rules to some additional HAINAN vairants
They need a similar workaround. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1839 Signed-off-by: Alex Deucher --- drivers/gpu/drm/radeon/si_dpm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/radeon/si_dpm.c b/drivers/gpu/drm/radeon/si_dpm.c index 9deb91970d4df..5db16c20bd173 100644 --- a/drivers/gpu/drm/radeon/si_dpm.c +++ b/drivers/gpu/drm/radeon/si_dpm.c @@ -2915,9 +2915,11 @@ static void si_apply_state_adjust_rules(struct radeon_device *rdev, if (rdev->family == CHIP_HAINAN) { if ((rdev->pdev->revision == 0x81) || (rdev->pdev->revision == 0xC3) || + (rdev->pdev->device == 0x6660) || (rdev->pdev->device == 0x6664) || (rdev->pdev->device == 0x6665) || - (rdev->pdev->device == 0x6667)) { + (rdev->pdev->device == 0x6667) || + (rdev->pdev->device == 0x666F)) { max_sclk = 75000; } if ((rdev->pdev->revision == 0xC3) || -- 2.45.0
Re: [PATCH] drm/amdgpu/pm: update documentation on memory clock
Ping? On Thu, May 2, 2024 at 5:07 PM Alex Deucher wrote: > > Update documentation for RDNA3 dGPUs. > > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/pm/amdgpu_pm.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c > b/drivers/gpu/drm/amd/pm/amdgpu_pm.c > index ec9058c80647a..9ad114e695e5d 100644 > --- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c > +++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c > @@ -610,12 +610,18 @@ static ssize_t amdgpu_set_pp_table(struct device *dev, > * > * Clock conversion (Mhz): > * > + * Pre-RDNA3 GPUs: > + * > * HBM: effective_memory_clock = memory_controller_clock * 1 > * > * G5: effective_memory_clock = memory_controller_clock * 1 > * > * G6: effective_memory_clock = memory_controller_clock * 2 > * > + * RDNA3 GPUs: > + * > + * G6: effective_memory_clock = memory_controller_clock * 1 > + * > * DRAM data rate (MT/s): > * > * HBM: effective_memory_clock * 2 = data_rate > -- > 2.44.0 >
[PATCH 3/3] drm/amdgpu/gfx11: enable gfx pipe1 hardware support
Enable gfx pipe1 hardware support. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index d750ab86e4b27..96f441917b719 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c @@ -51,7 +51,7 @@ #include "nbio_v4_3.h" #include "mes_v11_0.h" -#define GFX11_NUM_GFX_RINGS1 +#define GFX11_NUM_GFX_RINGS2 #define GFX11_MEC_HPD_SIZE 2048 #define RLCG_UCODE_LOADING_START_ADDRESS 0x2000L @@ -1342,7 +1342,7 @@ static int gfx_v11_0_sw_init(void *handle) case IP_VERSION(11, 0, 2): case IP_VERSION(11, 0, 3): adev->gfx.me.num_me = 1; - adev->gfx.me.num_pipe_per_me = 1; + adev->gfx.me.num_pipe_per_me = 2; adev->gfx.me.num_queue_per_pipe = 1; adev->gfx.mec.num_mec = 2; adev->gfx.mec.num_pipe_per_mec = 4; @@ -1353,7 +1353,7 @@ static int gfx_v11_0_sw_init(void *handle) case IP_VERSION(11, 5, 0): case IP_VERSION(11, 5, 1): adev->gfx.me.num_me = 1; - adev->gfx.me.num_pipe_per_me = 1; + adev->gfx.me.num_pipe_per_me = 2; adev->gfx.me.num_queue_per_pipe = 1; adev->gfx.mec.num_mec = 1; adev->gfx.mec.num_pipe_per_mec = 4; -- 2.45.0
[PATCH 1/3] drm/amdgpu/gfx11: select HDP ref/mask according to gfx ring pipe
Use correct ref/mask for differnent gfx ring pipe. Ported from ZhenGuo's patch for gfx10. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index c87943f6c4436..c8c055ef2f3c2 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c @@ -5294,7 +5294,7 @@ static void gfx_v11_0_ring_emit_hdp_flush(struct amdgpu_ring *ring) } reg_mem_engine = 0; } else { - ref_and_mask = nbio_hf_reg->ref_and_mask_cp0; + ref_and_mask = nbio_hf_reg->ref_and_mask_cp0 << ring->pipe; reg_mem_engine = 1; /* pfp */ } -- 2.45.0
[PATCH 2/3] drm/amdgpu/gfx11: handle priority setup for gfx pipe1
Set up pipe1 as a high priority queue. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 36 ++ 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index c8c055ef2f3c2..d750ab86e4b27 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c @@ -930,9 +930,9 @@ static int gfx_v11_0_gpu_early_init(struct amdgpu_device *adev) static int gfx_v11_0_gfx_ring_init(struct amdgpu_device *adev, int ring_id, int me, int pipe, int queue) { - int r; struct amdgpu_ring *ring; unsigned int irq_type; + unsigned int hw_prio; ring = >gfx.gfx_ring[ring_id]; @@ -951,11 +951,10 @@ static int gfx_v11_0_gfx_ring_init(struct amdgpu_device *adev, int ring_id, sprintf(ring->name, "gfx_%d.%d.%d", ring->me, ring->pipe, ring->queue); irq_type = AMDGPU_CP_IRQ_GFX_ME0_PIPE0_EOP + ring->pipe; - r = amdgpu_ring_init(adev, ring, 1024, >gfx.eop_irq, irq_type, -AMDGPU_RING_PRIO_DEFAULT, NULL); - if (r) - return r; - return 0; + hw_prio = amdgpu_gfx_is_high_priority_graphics_queue(adev, ring) ? + AMDGPU_GFX_PIPE_PRIO_HIGH : AMDGPU_GFX_PIPE_PRIO_NORMAL; + return amdgpu_ring_init(adev, ring, 1024, >gfx.eop_irq, irq_type, + hw_prio, NULL); } static int gfx_v11_0_compute_ring_init(struct amdgpu_device *adev, int ring_id, @@ -3616,6 +3615,24 @@ static void gfx_v11_0_cp_set_doorbell_range(struct amdgpu_device *adev) (adev->doorbell_index.userqueue_end * 2) << 2); } +static void gfx_v11_0_gfx_mqd_set_priority(struct amdgpu_device *adev, + struct v11_gfx_mqd *mqd, + struct amdgpu_mqd_prop *prop) +{ + bool priority = 0; + u32 tmp; + + /* set up default queue priority level +* 0x0 = low priority, 0x1 = high priority +*/ + if (prop->hqd_pipe_priority == AMDGPU_GFX_PIPE_PRIO_HIGH) + priority = 1; + + tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUEUE_PRIORITY); + tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUEUE_PRIORITY, PRIORITY_LEVEL, priority); + mqd->cp_gfx_hqd_queue_priority = tmp; +} + static int gfx_v11_0_gfx_mqd_init(struct amdgpu_device *adev, void *m, struct amdgpu_mqd_prop *prop) { @@ -3644,11 +3661,8 @@ static int gfx_v11_0_gfx_mqd_init(struct amdgpu_device *adev, void *m, tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_VMID, VMID, 0); mqd->cp_gfx_hqd_vmid = 0; - /* set up default queue priority level -* 0x0 = low priority, 0x1 = high priority */ - tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUEUE_PRIORITY); - tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUEUE_PRIORITY, PRIORITY_LEVEL, 0); - mqd->cp_gfx_hqd_queue_priority = tmp; + /* set up gfx queue priority */ + gfx_v11_0_gfx_mqd_set_priority(adev, mqd, prop); /* set up time quantum */ tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUANTUM); -- 2.45.0
Re: [PATCH] drm/amdgpu/soc24: use common nbio callback to set remap offset
Ping. On Wed, May 8, 2024 at 3:42 PM Alex Deucher wrote: > > This fixes HDP flushes on systems with non-4K pages. > > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/soc24.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/soc24.c > b/drivers/gpu/drm/amd/amdgpu/soc24.c > index 12900488dd618..66c7138fc6aa4 100644 > --- a/drivers/gpu/drm/amd/amdgpu/soc24.c > +++ b/drivers/gpu/drm/amd/amdgpu/soc24.c > @@ -372,11 +372,9 @@ static const struct amdgpu_asic_funcs soc24_asic_funcs = > { > > static int soc24_common_early_init(void *handle) > { > -#define MMIO_REG_HOLE_OFFSET (0x8 - PAGE_SIZE) > struct amdgpu_device *adev = (struct amdgpu_device *)handle; > > - adev->rmmio_remap.reg_offset = MMIO_REG_HOLE_OFFSET; > - adev->rmmio_remap.bus_addr = adev->rmmio_base + MMIO_REG_HOLE_OFFSET; > + adev->nbio.funcs->set_reg_remap(adev); > adev->smc_rreg = NULL; > adev->smc_wreg = NULL; > adev->pcie_rreg = _device_indirect_rreg; > -- > 2.45.0 >
Re: [PATCH v2] drm/amd/display: fix documentation warnings for mpc.h
On Sat, May 11, 2024 at 12:22 AM Stephen Rothwell wrote: > > Hi Marcelo, > > On Sat, 11 May 2024 13:37:17 +1000 Stephen Rothwell > wrote: > > > > Thanks for doing this. > > > > I haven't tested it, but just a couple of little things: > > > > On Fri, 10 May 2024 21:02:02 -0300 Marcelo Mendes Spessoto Junior > > wrote: > > > > > > Fix most of the display documentation compile warnings by > > > documenting struct mpc_funcs functions in dc/inc/hw/mpc.h file. > > > > > . > > . > > . > > > > > > Fixes: > > > b8c1c3a82e75 ("Documentation/gpu: Add kernel doc entry for MPC") > > > > Please don't split the Fixes tag line and also don't add blank lines > > between tags. > > > > I also appreciate being credited for reporting (unless you got a report > > from somewhere else as well, then maybe credit both). > > > > Reported-by: Stephen Rothwell > > Now tested. > > Tested-by: Stephen Rothwell > Closes: > https://lore.kernel.org/linux-next/20240130134954.04fcf...@canb.auug.org.au/ Applied and added/fixed up tags. Thanks! Alex > > -- > Cheers, > Stephen Rothwell
Re: [PATCH v2] drm/amdgpu: Add Ring Hang Events
Am 09.05.24 um 22:41 schrieb Ori Messinger: This patch adds 'ring hang' events to the driver. This is done by adding a 'reset_ring_hang' bool variable to the struct 'amdgpu_reset_context' in the amdgpu_reset.h file. The purpose for this 'reset_ring_hang' variable is whenever a GPU reset is initiated due to a ring hang, the reset_ring_hang should be set to 'true'. This 'amdgpu_reset_context' struct is now also passed through across all relevant functions, and another event type "KFD_SMI_EVENT_RING_HANG" is added to the kfd_smi_event enum. To repeat myself on the newest version on the patch, this whole approach is a clear NAK. Driver hangs including the cause of it are exposed through udev. And in general exposing driver telemetry through the SMI interface in KFD is a pretty clear no-go. Regards, Christian. Signed-off-by: Ori Messinger Change-Id: I6af3022eb1b4514201c9430d635ff87f167ad6f7 --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 6 -- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 9 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 3 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_device.c | 7 --- drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 7 ++- drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h | 5 - include/uapi/linux/kfd_ioctl.h | 15 --- 9 files changed, 38 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index e3738d417245..ac0ee4322555 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -133,6 +133,8 @@ static void amdgpu_amdkfd_reset_work(struct work_struct *work) reset_context.method = AMD_RESET_METHOD_NONE; reset_context.reset_req_dev = adev; + reset_context.reset_ring_hang = true; + strscpy(reset_context.reset_cause, "scheduler_hang", sizeof(reset_context.reset_cause)); clear_bit(AMDGPU_NEED_FULL_RESET, _context.flags); amdgpu_device_gpu_recover(adev, NULL, _context); @@ -261,12 +263,12 @@ int amdgpu_amdkfd_resume(struct amdgpu_device *adev, bool run_pm) return r; } -int amdgpu_amdkfd_pre_reset(struct amdgpu_device *adev) +int amdgpu_amdkfd_pre_reset(struct amdgpu_device *adev, struct amdgpu_reset_context *reset_context) { int r = 0; if (adev->kfd.dev) - r = kgd2kfd_pre_reset(adev->kfd.dev); + r = kgd2kfd_pre_reset(adev->kfd.dev, reset_context); return r; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index 1de021ebdd46..c9030d8b8308 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -47,6 +47,7 @@ enum TLB_FLUSH_TYPE { }; struct amdgpu_device; +struct amdgpu_reset_context; enum kfd_mem_attachment_type { KFD_MEM_ATT_SHARED, /* Share kgd_mem->bo or another attachment's */ @@ -170,7 +171,8 @@ bool amdgpu_amdkfd_have_atomics_support(struct amdgpu_device *adev); bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid); -int amdgpu_amdkfd_pre_reset(struct amdgpu_device *adev); +int amdgpu_amdkfd_pre_reset(struct amdgpu_device *adev, + struct amdgpu_reset_context *reset_context); int amdgpu_amdkfd_post_reset(struct amdgpu_device *adev); @@ -416,7 +418,8 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, void kgd2kfd_device_exit(struct kfd_dev *kfd); void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm); int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm); -int kgd2kfd_pre_reset(struct kfd_dev *kfd); +int kgd2kfd_pre_reset(struct kfd_dev *kfd, + struct amdgpu_reset_context *reset_context); int kgd2kfd_post_reset(struct kfd_dev *kfd); void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry); void kgd2kfd_set_sram_ecc_flag(struct kfd_dev *kfd); @@ -459,7 +462,7 @@ static inline int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm) return 0; } -static inline int kgd2kfd_pre_reset(struct kfd_dev *kfd) +static inline int kgd2kfd_pre_reset(struct kfd_dev *kfd, struct amdgpu_reset_context *reset_context) { return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 77f6fd50002a..f9fa784f36f4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5772,7 +5772,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, cancel_delayed_work_sync(_adev->delayed_init_work); - amdgpu_amdkfd_pre_reset(tmp_adev); + amdgpu_amdkfd_pre_reset(tmp_adev, reset_context); /* * Mark these ASICs to be reseted as untracked first diff --git
Re: [PATCH] drm/amdgpu/vcn: remove irq disabling in vcn 5 suspend
On 2024-05-13 13:43, Christian König wrote: Am 13.05.24 um 19:41 schrieb David Wu: On 2024-05-13 13:11, Christian König wrote: Am 09.05.24 um 20:40 schrieb David (Ming Qiang) Wu: We do not directly enable/disable VCN IRQ in vcn 5.0.0. And we do not handle the IRQ state as well. So the calls to disable IRQ and set state are removed. This effectively gets rid of the warining of "WARN_ON(!amdgpu_irq_enabled(adev, src, type))" in amdgpu_irq_put(). Signed-off-by: David (Ming Qiang) Wu --- drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c | 19 --- 1 file changed, 19 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c index 851975b5ce29..9b87d6a49b39 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c @@ -229,8 +229,6 @@ static int vcn_v5_0_0_hw_fini(void *handle) for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { if (adev->vcn.harvest_config & (1 << i)) continue; - - amdgpu_irq_put(adev, >vcn.inst[i].irq, 0); } Looks like you can now remove the whole for loop. I realized that but there is a new patch added inside this loop to cover the suspend/resume issue. Is that added in a later patch or did you rebased your patch ontop of it? If it's added in a later patch then it's better to remove and re-add the lines. Otherwise you can get a mail from automated scripts that you have dead code. Bit annoying but the documented way of doing things in the kernel. understood - it is added in the next patch right after this one - so far I have not got an email for the dead code yet. If it raises an issue I will work on it. Regards, Christian. Apart from that looks good to me, Christian. return 0; @@ -1226,22 +1224,6 @@ static int vcn_v5_0_0_set_powergating_state(void *handle, enum amd_powergating_s return ret; } -/** - * vcn_v5_0_0_set_interrupt_state - set VCN block interrupt state - * - * @adev: amdgpu_device pointer - * @source: interrupt sources - * @type: interrupt types - * @state: interrupt states - * - * Set VCN block interrupt state - */ -static int vcn_v5_0_0_set_interrupt_state(struct amdgpu_device *adev, struct amdgpu_irq_src *source, - unsigned type, enum amdgpu_interrupt_state state) -{ - return 0; -} - /** * vcn_v5_0_0_process_interrupt - process VCN block interrupt * @@ -1287,7 +1269,6 @@ static int vcn_v5_0_0_process_interrupt(struct amdgpu_device *adev, struct amdgp } static const struct amdgpu_irq_src_funcs vcn_v5_0_0_irq_funcs = { - .set = vcn_v5_0_0_set_interrupt_state, .process = vcn_v5_0_0_process_interrupt, };
Re: [PATCH] drm/amdgpu/vcn: remove irq disabling in vcn 5 suspend
Am 13.05.24 um 19:41 schrieb David Wu: On 2024-05-13 13:11, Christian König wrote: Am 09.05.24 um 20:40 schrieb David (Ming Qiang) Wu: We do not directly enable/disable VCN IRQ in vcn 5.0.0. And we do not handle the IRQ state as well. So the calls to disable IRQ and set state are removed. This effectively gets rid of the warining of "WARN_ON(!amdgpu_irq_enabled(adev, src, type))" in amdgpu_irq_put(). Signed-off-by: David (Ming Qiang) Wu --- drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c | 19 --- 1 file changed, 19 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c index 851975b5ce29..9b87d6a49b39 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c @@ -229,8 +229,6 @@ static int vcn_v5_0_0_hw_fini(void *handle) for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { if (adev->vcn.harvest_config & (1 << i)) continue; - - amdgpu_irq_put(adev, >vcn.inst[i].irq, 0); } Looks like you can now remove the whole for loop. I realized that but there is a new patch added inside this loop to cover the suspend/resume issue. Is that added in a later patch or did you rebased your patch ontop of it? If it's added in a later patch then it's better to remove and re-add the lines. Otherwise you can get a mail from automated scripts that you have dead code. Bit annoying but the documented way of doing things in the kernel. Regards, Christian. Apart from that looks good to me, Christian. return 0; @@ -1226,22 +1224,6 @@ static int vcn_v5_0_0_set_powergating_state(void *handle, enum amd_powergating_s return ret; } -/** - * vcn_v5_0_0_set_interrupt_state - set VCN block interrupt state - * - * @adev: amdgpu_device pointer - * @source: interrupt sources - * @type: interrupt types - * @state: interrupt states - * - * Set VCN block interrupt state - */ -static int vcn_v5_0_0_set_interrupt_state(struct amdgpu_device *adev, struct amdgpu_irq_src *source, - unsigned type, enum amdgpu_interrupt_state state) -{ - return 0; -} - /** * vcn_v5_0_0_process_interrupt - process VCN block interrupt * @@ -1287,7 +1269,6 @@ static int vcn_v5_0_0_process_interrupt(struct amdgpu_device *adev, struct amdgp } static const struct amdgpu_irq_src_funcs vcn_v5_0_0_irq_funcs = { - .set = vcn_v5_0_0_set_interrupt_state, .process = vcn_v5_0_0_process_interrupt, };
Re: [PATCH] drm/amdgpu/vcn: remove irq disabling in vcn 5 suspend
On 2024-05-13 13:11, Christian König wrote: Am 09.05.24 um 20:40 schrieb David (Ming Qiang) Wu: We do not directly enable/disable VCN IRQ in vcn 5.0.0. And we do not handle the IRQ state as well. So the calls to disable IRQ and set state are removed. This effectively gets rid of the warining of "WARN_ON(!amdgpu_irq_enabled(adev, src, type))" in amdgpu_irq_put(). Signed-off-by: David (Ming Qiang) Wu --- drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c | 19 --- 1 file changed, 19 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c index 851975b5ce29..9b87d6a49b39 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c @@ -229,8 +229,6 @@ static int vcn_v5_0_0_hw_fini(void *handle) for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { if (adev->vcn.harvest_config & (1 << i)) continue; - - amdgpu_irq_put(adev, >vcn.inst[i].irq, 0); } Looks like you can now remove the whole for loop. I realized that but there is a new patch added inside this loop to cover the suspend/resume issue. Apart from that looks good to me, Christian. return 0; @@ -1226,22 +1224,6 @@ static int vcn_v5_0_0_set_powergating_state(void *handle, enum amd_powergating_s return ret; } -/** - * vcn_v5_0_0_set_interrupt_state - set VCN block interrupt state - * - * @adev: amdgpu_device pointer - * @source: interrupt sources - * @type: interrupt types - * @state: interrupt states - * - * Set VCN block interrupt state - */ -static int vcn_v5_0_0_set_interrupt_state(struct amdgpu_device *adev, struct amdgpu_irq_src *source, - unsigned type, enum amdgpu_interrupt_state state) -{ - return 0; -} - /** * vcn_v5_0_0_process_interrupt - process VCN block interrupt * @@ -1287,7 +1269,6 @@ static int vcn_v5_0_0_process_interrupt(struct amdgpu_device *adev, struct amdgp } static const struct amdgpu_irq_src_funcs vcn_v5_0_0_irq_funcs = { - .set = vcn_v5_0_0_set_interrupt_state, .process = vcn_v5_0_0_process_interrupt, };
Re: [PATCH] drm/amdgpu/vcn: remove irq disabling in vcn 5 suspend
Am 09.05.24 um 20:40 schrieb David (Ming Qiang) Wu: We do not directly enable/disable VCN IRQ in vcn 5.0.0. And we do not handle the IRQ state as well. So the calls to disable IRQ and set state are removed. This effectively gets rid of the warining of "WARN_ON(!amdgpu_irq_enabled(adev, src, type))" in amdgpu_irq_put(). Signed-off-by: David (Ming Qiang) Wu --- drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c | 19 --- 1 file changed, 19 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c index 851975b5ce29..9b87d6a49b39 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c @@ -229,8 +229,6 @@ static int vcn_v5_0_0_hw_fini(void *handle) for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { if (adev->vcn.harvest_config & (1 << i)) continue; - - amdgpu_irq_put(adev, >vcn.inst[i].irq, 0); } Looks like you can now remove the whole for loop. Apart from that looks good to me, Christian. return 0; @@ -1226,22 +1224,6 @@ static int vcn_v5_0_0_set_powergating_state(void *handle, enum amd_powergating_s return ret; } -/** - * vcn_v5_0_0_set_interrupt_state - set VCN block interrupt state - * - * @adev: amdgpu_device pointer - * @source: interrupt sources - * @type: interrupt types - * @state: interrupt states - * - * Set VCN block interrupt state - */ -static int vcn_v5_0_0_set_interrupt_state(struct amdgpu_device *adev, struct amdgpu_irq_src *source, - unsigned type, enum amdgpu_interrupt_state state) -{ - return 0; -} - /** * vcn_v5_0_0_process_interrupt - process VCN block interrupt * @@ -1287,7 +1269,6 @@ static int vcn_v5_0_0_process_interrupt(struct amdgpu_device *adev, struct amdgp } static const struct amdgpu_irq_src_funcs vcn_v5_0_0_irq_funcs = { - .set = vcn_v5_0_0_set_interrupt_state, .process = vcn_v5_0_0_process_interrupt, };
Re: [RESEND 4/6] drm/amdgpu: remove amdgpu_connector_edid() and stop using edid_blob_ptr
On Fri, May 10, 2024 at 5:09 PM Jani Nikula wrote: > > amdgpu_connector_edid() copies the EDID from edid_blob_ptr as a side > effect if amdgpu_connector->edid isn't initialized. However, everywhere > that the returned EDID is used, the EDID should have been set > beforehands. > > Only the drm EDID code should look at the EDID property, anyway, so stop > using it. > > Cc: Alex Deucher > Cc: Christian König > Cc: Pan, Xinhui > Cc: amd-gfx@lists.freedesktop.org > Signed-off-by: Jani Nikula > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 16 > drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.h | 1 - > drivers/gpu/drm/amd/amdgpu/dce_v10_0.c | 4 ++-- > drivers/gpu/drm/amd/amdgpu/dce_v11_0.c | 4 ++-- > drivers/gpu/drm/amd/amdgpu/dce_v6_0.c | 4 ++-- > drivers/gpu/drm/amd/amdgpu/dce_v8_0.c | 4 ++-- > 6 files changed, 8 insertions(+), 25 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c > index 9caba10315a8..cae7479c3ecf 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c > @@ -246,22 +246,6 @@ amdgpu_connector_find_encoder(struct drm_connector > *connector, > return NULL; > } > > -struct edid *amdgpu_connector_edid(struct drm_connector *connector) > -{ > - struct amdgpu_connector *amdgpu_connector = > to_amdgpu_connector(connector); > - struct drm_property_blob *edid_blob = connector->edid_blob_ptr; > - > - if (amdgpu_connector->edid) { > - return amdgpu_connector->edid; > - } else if (edid_blob) { > - struct edid *edid = kmemdup(edid_blob->data, > edid_blob->length, GFP_KERNEL); > - > - if (edid) > - amdgpu_connector->edid = edid; > - } > - return amdgpu_connector->edid; > -} > - > static struct edid * > amdgpu_connector_get_hardcoded_edid(struct amdgpu_device *adev) > { > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.h > index 61fcef15ad72..eff833b6ed31 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.h > @@ -24,7 +24,6 @@ > #ifndef __AMDGPU_CONNECTORS_H__ > #define __AMDGPU_CONNECTORS_H__ > > -struct edid *amdgpu_connector_edid(struct drm_connector *connector); > void amdgpu_connector_hotplug(struct drm_connector *connector); > int amdgpu_connector_get_monitor_bpc(struct drm_connector *connector); > u16 amdgpu_connector_encoder_get_dp_bridge_encoder_id(struct drm_connector > *connector); > diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c > b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c > index b44fce44c066..dddb5fe16f2c 100644 > --- a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c > @@ -1299,7 +1299,7 @@ static void > dce_v10_0_audio_write_speaker_allocation(struct drm_encoder *encoder > return; > } > > - sad_count = > drm_edid_to_speaker_allocation(amdgpu_connector_edid(connector), ); > + sad_count = drm_edid_to_speaker_allocation(amdgpu_connector->edid, > ); > if (sad_count < 0) { > DRM_ERROR("Couldn't read Speaker Allocation Data Block: > %d\n", sad_count); > sad_count = 0; > @@ -1369,7 +1369,7 @@ static void dce_v10_0_audio_write_sad_regs(struct > drm_encoder *encoder) > return; > } > > - sad_count = drm_edid_to_sad(amdgpu_connector_edid(connector), ); > + sad_count = drm_edid_to_sad(amdgpu_connector->edid, ); > if (sad_count < 0) > DRM_ERROR("Couldn't read SADs: %d\n", sad_count); > if (sad_count <= 0) > diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c > b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c > index 80b2e7f79acf..11780e4d7e9f 100644 > --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c > @@ -1331,7 +1331,7 @@ static void > dce_v11_0_audio_write_speaker_allocation(struct drm_encoder *encoder > return; > } > > - sad_count = > drm_edid_to_speaker_allocation(amdgpu_connector_edid(connector), ); > + sad_count = drm_edid_to_speaker_allocation(amdgpu_connector->edid, > ); > if (sad_count < 0) { > DRM_ERROR("Couldn't read Speaker Allocation Data Block: > %d\n", sad_count); > sad_count = 0; > @@ -1401,7 +1401,7 @@ static void dce_v11_0_audio_write_sad_regs(struct > drm_encoder *encoder) > return; > } > > - sad_count = drm_edid_to_sad(amdgpu_connector_edid(connector), ); > + sad_count = drm_edid_to_sad(amdgpu_connector->edid, ); > if (sad_count < 0) > DRM_ERROR("Couldn't read SADs: %d\n", sad_count); > if (sad_count <= 0) > diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c >
Re: [RESEND 3/6] drm/radeon: remove radeon_connector_edid() and stop using edid_blob_ptr
On Fri, May 10, 2024 at 5:08 PM Jani Nikula wrote: > > radeon_connector_edid() copies the EDID from edid_blob_ptr as a side > effect if radeon_connector->edid isn't initialized. However, everywhere > that the returned EDID is used, the EDID should have been set > beforehands. > > Only the drm EDID code should look at the EDID property, anyway, so stop > using it. > > Cc: Alex Deucher > Cc: Christian König > Cc: Pan, Xinhui > Cc: amd-gfx@lists.freedesktop.org > Signed-off-by: Jani Nikula > --- > drivers/gpu/drm/radeon/radeon_audio.c | 7 --- > drivers/gpu/drm/radeon/radeon_connectors.c | 15 --- > drivers/gpu/drm/radeon/radeon_mode.h | 2 -- > 3 files changed, 4 insertions(+), 20 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/radeon_audio.c > b/drivers/gpu/drm/radeon/radeon_audio.c > index 16c10db3ce6f..0bcd767b9f47 100644 > --- a/drivers/gpu/drm/radeon/radeon_audio.c > +++ b/drivers/gpu/drm/radeon/radeon_audio.c > @@ -303,6 +303,7 @@ void radeon_audio_endpoint_wreg(struct radeon_device > *rdev, u32 offset, > static void radeon_audio_write_sad_regs(struct drm_encoder *encoder) > { > struct drm_connector *connector = > radeon_get_connector_for_encoder(encoder); > + struct radeon_connector *radeon_connector = > to_radeon_connector(connector); > struct radeon_encoder *radeon_encoder = to_radeon_encoder(encoder); > struct cea_sad *sads; > int sad_count; > @@ -310,7 +311,7 @@ static void radeon_audio_write_sad_regs(struct > drm_encoder *encoder) > if (!connector) > return; > > - sad_count = drm_edid_to_sad(radeon_connector_edid(connector), ); > + sad_count = drm_edid_to_sad(radeon_connector->edid, ); > if (sad_count < 0) > DRM_ERROR("Couldn't read SADs: %d\n", sad_count); > if (sad_count <= 0) > @@ -326,6 +327,7 @@ static void radeon_audio_write_sad_regs(struct > drm_encoder *encoder) > static void radeon_audio_write_speaker_allocation(struct drm_encoder > *encoder) > { > struct drm_connector *connector = > radeon_get_connector_for_encoder(encoder); > + struct radeon_connector *radeon_connector = > to_radeon_connector(connector); > struct radeon_encoder *radeon_encoder = to_radeon_encoder(encoder); > u8 *sadb = NULL; > int sad_count; > @@ -333,8 +335,7 @@ static void radeon_audio_write_speaker_allocation(struct > drm_encoder *encoder) > if (!connector) > return; > > - sad_count = > drm_edid_to_speaker_allocation(radeon_connector_edid(connector), > - ); > + sad_count = drm_edid_to_speaker_allocation(radeon_connector->edid, > ); > if (sad_count < 0) { > DRM_DEBUG("Couldn't read Speaker Allocation Data Block: %d\n", > sad_count); > diff --git a/drivers/gpu/drm/radeon/radeon_connectors.c > b/drivers/gpu/drm/radeon/radeon_connectors.c > index 81b5c3c8f658..80879e946342 100644 > --- a/drivers/gpu/drm/radeon/radeon_connectors.c > +++ b/drivers/gpu/drm/radeon/radeon_connectors.c > @@ -255,21 +255,6 @@ static struct drm_encoder *radeon_find_encoder(struct > drm_connector *connector, > return NULL; > } > > -struct edid *radeon_connector_edid(struct drm_connector *connector) > -{ > - struct radeon_connector *radeon_connector = > to_radeon_connector(connector); > - struct drm_property_blob *edid_blob = connector->edid_blob_ptr; > - > - if (radeon_connector->edid) { > - return radeon_connector->edid; > - } else if (edid_blob) { > - struct edid *edid = kmemdup(edid_blob->data, > edid_blob->length, GFP_KERNEL); > - if (edid) > - radeon_connector->edid = edid; > - } > - return radeon_connector->edid; > -} > - > static void radeon_connector_get_edid(struct drm_connector *connector) > { > struct drm_device *dev = connector->dev; > diff --git a/drivers/gpu/drm/radeon/radeon_mode.h > b/drivers/gpu/drm/radeon/radeon_mode.h > index 546381a5c918..e0a5af180801 100644 > --- a/drivers/gpu/drm/radeon/radeon_mode.h > +++ b/drivers/gpu/drm/radeon/radeon_mode.h > @@ -701,8 +701,6 @@ extern u16 > radeon_connector_encoder_get_dp_bridge_encoder_id(struct drm_connecto > extern bool radeon_connector_is_dp12_capable(struct drm_connector > *connector); > extern int radeon_get_monitor_bpc(struct drm_connector *connector); > > -extern struct edid *radeon_connector_edid(struct drm_connector *connector); > - > extern void radeon_connector_hotplug(struct drm_connector *connector); > extern int radeon_dp_mode_valid_helper(struct drm_connector *connector, >struct drm_display_mode *mode); > -- > 2.39.2 > Reviewed-by: Robert Foss
Re: [PATCH] drm/amdgpu: fix getting vram info for gfx12
On Mon, May 13, 2024 at 11:32 AM Min, Frank wrote: > > [AMD Official Use Only - AMD Internal Distribution Only] > > From: Frank Min > > gfx12 query video mem channel/type/width from umc_info of atom list, so fix > it accordingly. > > Signed-off-by: Frank Min Acked-by: Alex Deucher > --- > .../gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c | 263 ++ > 1 file changed, 148 insertions(+), 115 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c > index a6d64bdbbb14..6fe84151332e 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c > @@ -289,7 +289,6 @@ static int convert_atom_mem_type_to_vram_type(struct > amdgpu_device *adev, > return vram_type; > } > > - > int > amdgpu_atomfirmware_get_vram_info(struct amdgpu_device *adev, > int *vram_width, int *vram_type, > @@ -300,6 +299,7 @@ amdgpu_atomfirmware_get_vram_info(struct amdgpu_device > *adev, > u16 data_offset, size; > union igp_info *igp_info; > union vram_info *vram_info; > + union umc_info *umc_info; > union vram_module *vram_module; > u8 frev, crev; > u8 mem_type; > @@ -311,10 +311,16 @@ amdgpu_atomfirmware_get_vram_info(struct amdgpu_device > *adev, > if (adev->flags & AMD_IS_APU) > index = > get_index_into_master_table(atom_master_list_of_data_tables_v2_1, > integratedsysteminfo); > - else > - index = > get_index_into_master_table(atom_master_list_of_data_tables_v2_1, > - vram_info); > - > + else { > + switch (amdgpu_ip_version(adev, GC_HWIP, 0)) { > + case IP_VERSION(12, 0, 0): > + case IP_VERSION(12, 0, 1): > + index = > get_index_into_master_table(atom_master_list_of_data_tables_v2_1, umc_info); > + break; > + default: > + index = > get_index_into_master_table(atom_master_list_of_data_tables_v2_1, vram_info); > + } > + } > if (amdgpu_atom_parse_data_header(mode_info->atom_context, > index, , > , , _offset)) { > @@ -368,123 +374,150 @@ amdgpu_atomfirmware_get_vram_info(struct > amdgpu_device *adev, > return -EINVAL; > } > } else { > - vram_info = (union vram_info *) > - (mode_info->atom_context->bios + data_offset); > - module_id = (RREG32(adev->bios_scratch_reg_offset + > 4) & 0x00ff) >> 16; > - if (frev == 3) { > - switch (crev) { > - /* v30 */ > - case 0: > - vram_module = (union vram_module > *)vram_info->v30.vram_module; > - mem_vendor = > (vram_module->v30.dram_vendor_id) & 0xF; > - if (vram_vendor) > - *vram_vendor = mem_vendor; > - mem_type = vram_info->v30.memory_type; > - if (vram_type) > - *vram_type = > convert_atom_mem_type_to_vram_type(adev, mem_type); > - mem_channel_number = > vram_info->v30.channel_num; > - mem_channel_width = > vram_info->v30.channel_width; > - if (vram_width) > - *vram_width = > mem_channel_number * (1 << mem_channel_width); > - break; > - default: > - return -EINVAL; > - } > - } else if (frev == 2) { > - switch (crev) { > - /* v23 */ > - case 3: > - if (module_id > > vram_info->v23.vram_module_num) > - module_id = 0; > - vram_module = (union vram_module > *)vram_info->v23.vram_module; > - while (i < module_id) { > - vram_module = (union > vram_module *) > - ((u8 *)vram_module + > vram_module->v9.vram_module_size); > - i++;
Re: [RESEND 2/6] drm/radeon: convert to using is_hdmi and has_audio from display info
On Fri, May 10, 2024 at 5:08 PM Jani Nikula wrote: > > Prefer the parsed results for is_hdmi and has_audio in display info over > calling drm_detect_hdmi_monitor() and drm_detect_monitor_audio(), > respectively. > > Cc: Alex Deucher > Cc: Christian König > Cc: Pan, Xinhui > Cc: amd-gfx@lists.freedesktop.org > Signed-off-by: Jani Nikula > --- > drivers/gpu/drm/radeon/atombios_encoders.c | 10 +- > drivers/gpu/drm/radeon/evergreen_hdmi.c| 5 ++--- > drivers/gpu/drm/radeon/radeon_audio.c | 6 +++--- > drivers/gpu/drm/radeon/radeon_connectors.c | 12 ++-- > drivers/gpu/drm/radeon/radeon_display.c| 2 +- > drivers/gpu/drm/radeon/radeon_encoders.c | 4 ++-- > 6 files changed, 19 insertions(+), 20 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/atombios_encoders.c > b/drivers/gpu/drm/radeon/atombios_encoders.c > index 2bff0d9e20f5..0aa395fac36f 100644 > --- a/drivers/gpu/drm/radeon/atombios_encoders.c > +++ b/drivers/gpu/drm/radeon/atombios_encoders.c > @@ -701,7 +701,7 @@ atombios_get_encoder_mode(struct drm_encoder *encoder) > if (radeon_connector->use_digital && > (radeon_connector->audio == RADEON_AUDIO_ENABLE)) > return ATOM_ENCODER_MODE_HDMI; > - else if > (drm_detect_hdmi_monitor(radeon_connector_edid(connector)) && > + else if (connector->display_info.is_hdmi && > (radeon_connector->audio == > RADEON_AUDIO_AUTO)) > return ATOM_ENCODER_MODE_HDMI; > else if (radeon_connector->use_digital) > @@ -720,7 +720,7 @@ atombios_get_encoder_mode(struct drm_encoder *encoder) > if (radeon_audio != 0) { > if (radeon_connector->audio == RADEON_AUDIO_ENABLE) > return ATOM_ENCODER_MODE_HDMI; > - else if > (drm_detect_hdmi_monitor(radeon_connector_edid(connector)) && > + else if (connector->display_info.is_hdmi && > (radeon_connector->audio == > RADEON_AUDIO_AUTO)) > return ATOM_ENCODER_MODE_HDMI; > else > @@ -737,14 +737,14 @@ atombios_get_encoder_mode(struct drm_encoder *encoder) > if ((dig_connector->dp_sink_type == > CONNECTOR_OBJECT_ID_DISPLAYPORT) || > (dig_connector->dp_sink_type == CONNECTOR_OBJECT_ID_eDP)) > { > if (radeon_audio != 0 && > - > drm_detect_monitor_audio(radeon_connector_edid(connector)) && > + connector->display_info.has_audio && > ASIC_IS_DCE4(rdev) && !ASIC_IS_DCE5(rdev)) > return ATOM_ENCODER_MODE_DP_AUDIO; > return ATOM_ENCODER_MODE_DP; > } else if (radeon_audio != 0) { > if (radeon_connector->audio == RADEON_AUDIO_ENABLE) > return ATOM_ENCODER_MODE_HDMI; > - else if > (drm_detect_hdmi_monitor(radeon_connector_edid(connector)) && > + else if (connector->display_info.is_hdmi && > (radeon_connector->audio == > RADEON_AUDIO_AUTO)) > return ATOM_ENCODER_MODE_HDMI; > else > @@ -755,7 +755,7 @@ atombios_get_encoder_mode(struct drm_encoder *encoder) > break; > case DRM_MODE_CONNECTOR_eDP: > if (radeon_audio != 0 && > - > drm_detect_monitor_audio(radeon_connector_edid(connector)) && > + connector->display_info.has_audio && > ASIC_IS_DCE4(rdev) && !ASIC_IS_DCE5(rdev)) > return ATOM_ENCODER_MODE_DP_AUDIO; > return ATOM_ENCODER_MODE_DP; > diff --git a/drivers/gpu/drm/radeon/evergreen_hdmi.c > b/drivers/gpu/drm/radeon/evergreen_hdmi.c > index 681119c91d94..09dda114e218 100644 > --- a/drivers/gpu/drm/radeon/evergreen_hdmi.c > +++ b/drivers/gpu/drm/radeon/evergreen_hdmi.c > @@ -412,7 +412,7 @@ void evergreen_hdmi_enable(struct drm_encoder *encoder, > bool enable) > if (enable) { > struct drm_connector *connector = > radeon_get_connector_for_encoder(encoder); > > - if (connector && > drm_detect_monitor_audio(radeon_connector_edid(connector))) { > + if (connector && connector->display_info.has_audio) { > WREG32(HDMI_INFOFRAME_CONTROL0 + dig->afmt->offset, >HDMI_AVI_INFO_SEND | /* enable AVI info frames > */ >HDMI_AVI_INFO_CONT | /* required for audio > info values to be updated */ > @@ -450,8 +450,7 @@ void evergreen_dp_enable(struct drm_encoder *encoder, > bool enable) >
Re: [PATCH v10 6/6] drm/amdgpu: Enable userq fence interrupt support
Am 10.05.24 um 10:50 schrieb Arunpravin Paneer Selvam: Add support to handle the userqueue protected fence signal hardware interrupt. Create a xarray which maps the doorbell index to the fence driver address. This would help to retrieve the fence driver information when an userq fence interrupt is triggered. Firmware sends the doorbell offset value and this info is compared with the queue's mqd doorbell offset value. If they are same, we process the userq fence interrupt. v1:(Christian) - use xa_load() instead of going over all entries - keep the xa_lock until the fence driver process completes - create a separate patch to remove the MES self test function call. Signed-off-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 2 ++ .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 15 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c| 23 +-- 4 files changed, 30 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 4ca14b02668b..2d5ef2e74c71 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1043,6 +1043,8 @@ struct amdgpu_device { struct amdgpu_mqd mqds[AMDGPU_HW_IP_NUM]; const struct amdgpu_userq_funcs *userq_funcs[AMDGPU_HW_IP_NUM]; + struct xarray userq_xa; + /* df */ struct amdgpu_dfdf; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 2d9fa3d0d4a4..fd919105a181 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3982,6 +3982,8 @@ int amdgpu_device_init(struct amdgpu_device *adev, spin_lock_init(>audio_endpt_idx_lock); spin_lock_init(>mm_stats.lock); + xa_init_flags(>userq_xa, XA_FLAGS_LOCK_IRQ); + INIT_LIST_HEAD(>shadow_list); mutex_init(>shadow_list_lock); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c index 339d82d5808f..4cbc25595226 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c @@ -70,6 +70,7 @@ int amdgpu_userq_fence_driver_alloc(struct amdgpu_device *adev, struct amdgpu_usermode_queue *userq) { struct amdgpu_userq_fence_driver *fence_drv; + unsigned long flags; int r; fence_drv = kzalloc(sizeof(*fence_drv), GFP_KERNEL); @@ -97,6 +98,11 @@ int amdgpu_userq_fence_driver_alloc(struct amdgpu_device *adev, fence_drv->context = dma_fence_context_alloc(1); get_task_comm(fence_drv->timeline_name, current); + xa_lock_irqsave(>userq_xa, flags); + __xa_store(>userq_xa, userq->doorbell_index, + fence_drv, GFP_KERNEL); + xa_unlock_irqrestore(>userq_xa, flags); + userq->fence_drv = fence_drv; return 0; @@ -147,8 +153,11 @@ void amdgpu_userq_fence_driver_destroy(struct kref *ref) struct amdgpu_userq_fence_driver *fence_drv = container_of(ref, struct amdgpu_userq_fence_driver, refcount); + struct amdgpu_userq_fence_driver *xa_fence_drv; struct amdgpu_device *adev = fence_drv->adev; struct amdgpu_userq_fence *fence, *tmp; + struct xarray *xa = >userq_xa; + unsigned long index; struct dma_fence *f; spin_lock(_drv->fence_list_lock); @@ -165,6 +174,12 @@ void amdgpu_userq_fence_driver_destroy(struct kref *ref) } spin_unlock(_drv->fence_list_lock); + xa_lock(xa); + xa_for_each(xa, index, xa_fence_drv) + if (xa_fence_drv == fence_drv) + __xa_erase(xa, index); + xa_unlock(xa); That is rather inefficient. We should probably move registering a fence_drv for a certain doorbell into the userq code instead. + /* Free seq64 memory */ amdgpu_seq64_free(adev, fence_drv->gpu_addr); kfree(fence_drv); diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index a786e25432ae..0a206f484240 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c @@ -49,6 +49,7 @@ #include "gfx_v11_0_3.h" #include "nbio_v4_3.h" #include "mes_v11_0.h" +#include "amdgpu_userq_fence.h" #define GFX11_NUM_GFX_RINGS 1 #define GFX11_MEC_HPD_SIZE2048 @@ -5939,25 +5940,23 @@ static int gfx_v11_0_eop_irq(struct amdgpu_device *adev, struct amdgpu_irq_src *source, struct amdgpu_iv_entry *entry) { - int i; + u32 doorbell_offset = entry->src_data[0]; u8 me_id, pipe_id, queue_id; struct amdgpu_ring *ring; -
Re: [PATCH v10 5/6] drm/amdgpu: Remove the MES self test
Am 10.05.24 um 10:50 schrieb Arunpravin Paneer Selvam: Remove MES self test as this conflicts the userqueue fence interrupts. Please also completely remove the amdgpu_mes_self_test() function and any now unused code. Regards, Christian. Signed-off-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 --- drivers/gpu/drm/amd/amdgpu/mes_v10_1.c | 12 +--- drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 14 +- 3 files changed, 2 insertions(+), 27 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 7753a2e64d41..2d9fa3d0d4a4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4719,9 +4719,6 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon) } adev->in_suspend = false; - if (adev->enable_mes) - amdgpu_mes_self_test(adev); - if (amdgpu_acpi_smart_shift_update(dev, AMDGPU_SS_DEV_D0)) DRM_WARN("smart shift update failed\n"); diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c index 4d1121d1a1e7..b7bfb3185a30 100644 --- a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c +++ b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c @@ -1161,20 +1161,10 @@ static int mes_v10_0_early_init(void *handle) return 0; } -static int mes_v10_0_late_init(void *handle) -{ - struct amdgpu_device *adev = (struct amdgpu_device *)handle; - - if (!amdgpu_in_reset(adev)) - amdgpu_mes_self_test(adev); - - return 0; -} - static const struct amd_ip_funcs mes_v10_1_ip_funcs = { .name = "mes_v10_1", .early_init = mes_v10_0_early_init, - .late_init = mes_v10_0_late_init, + .late_init = NULL, .sw_init = mes_v10_1_sw_init, .sw_fini = mes_v10_1_sw_fini, .hw_init = mes_v10_1_hw_init, diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c index feb7fa2c304c..5923b7b0bd95 100644 --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c @@ -1274,22 +1274,10 @@ static int mes_v11_0_early_init(void *handle) return 0; } -static int mes_v11_0_late_init(void *handle) -{ - struct amdgpu_device *adev = (struct amdgpu_device *)handle; - - /* it's only intended for use in mes_self_test case, not for s0ix and reset */ - if (!amdgpu_in_reset(adev) && !adev->in_s0ix && !adev->in_suspend && - (amdgpu_ip_version(adev, GC_HWIP, 0) != IP_VERSION(11, 0, 3))) - amdgpu_mes_self_test(adev); - - return 0; -} - static const struct amd_ip_funcs mes_v11_0_ip_funcs = { .name = "mes_v11_0", .early_init = mes_v11_0_early_init, - .late_init = mes_v11_0_late_init, + .late_init = NULL, .sw_init = mes_v11_0_sw_init, .sw_fini = mes_v11_0_sw_fini, .hw_init = mes_v11_0_hw_init,
Re: [PATCH v3] drm/amdgpu: Add Ring Hang Events
Am 13.05.24 um 06:14 schrieb Ori Messinger: This patch adds 'ring hang' events to the driver. This is done by adding a 'reset_ring_hang' bool variable to the struct 'amdgpu_reset_context' in the amdgpu_reset.h file. The purpose for this 'reset_ring_hang' variable is whenever a GPU reset is initiated due to a ring hang, the reset_ring_hang should be set to 'true'. Additionally, the reset cause is passed into the kfd smi event as a string, and is displayed in dmesg as an error. This 'amdgpu_reset_context' struct is now also passed through across all relevant functions, and another event type "KFD_SMI_EVENT_RING_HANG" is added to the kfd_smi_event enum. Well general NAK to that design. Why in the world would we want to expose the ring hang through the KFD SMI interface? Regards, Christian. Signed-off-by: Ori Messinger Change-Id: I6af3022eb1b4514201c9430d635ff87f167ad6f7 --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 7 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 9 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 16 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_device.c | 7 --- drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 7 ++- drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h | 5 - include/uapi/linux/kfd_ioctl.h | 15 --- 10 files changed, 56 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index e3738d417245..f1c6dc939cc3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -133,6 +133,9 @@ static void amdgpu_amdkfd_reset_work(struct work_struct *work) reset_context.method = AMD_RESET_METHOD_NONE; reset_context.reset_req_dev = adev; + reset_context.reset_ring_hang = true; + strscpy(reset_context.reset_cause, "hws_hang", sizeof(reset_context.reset_cause)); + DRM_ERROR("Reset cause: %s\n", reset_context.reset_cause); clear_bit(AMDGPU_NEED_FULL_RESET, _context.flags); amdgpu_device_gpu_recover(adev, NULL, _context); @@ -261,12 +264,12 @@ int amdgpu_amdkfd_resume(struct amdgpu_device *adev, bool run_pm) return r; } -int amdgpu_amdkfd_pre_reset(struct amdgpu_device *adev) +int amdgpu_amdkfd_pre_reset(struct amdgpu_device *adev, struct amdgpu_reset_context *reset_context) { int r = 0; if (adev->kfd.dev) - r = kgd2kfd_pre_reset(adev->kfd.dev); + r = kgd2kfd_pre_reset(adev->kfd.dev, reset_context); return r; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index 1de021ebdd46..c9030d8b8308 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -47,6 +47,7 @@ enum TLB_FLUSH_TYPE { }; struct amdgpu_device; +struct amdgpu_reset_context; enum kfd_mem_attachment_type { KFD_MEM_ATT_SHARED, /* Share kgd_mem->bo or another attachment's */ @@ -170,7 +171,8 @@ bool amdgpu_amdkfd_have_atomics_support(struct amdgpu_device *adev); bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid); -int amdgpu_amdkfd_pre_reset(struct amdgpu_device *adev); +int amdgpu_amdkfd_pre_reset(struct amdgpu_device *adev, + struct amdgpu_reset_context *reset_context); int amdgpu_amdkfd_post_reset(struct amdgpu_device *adev); @@ -416,7 +418,8 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, void kgd2kfd_device_exit(struct kfd_dev *kfd); void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm); int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm); -int kgd2kfd_pre_reset(struct kfd_dev *kfd); +int kgd2kfd_pre_reset(struct kfd_dev *kfd, + struct amdgpu_reset_context *reset_context); int kgd2kfd_post_reset(struct kfd_dev *kfd); void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry); void kgd2kfd_set_sram_ecc_flag(struct kfd_dev *kfd); @@ -459,7 +462,7 @@ static inline int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm) return 0; } -static inline int kgd2kfd_pre_reset(struct kfd_dev *kfd) +static inline int kgd2kfd_pre_reset(struct kfd_dev *kfd, struct amdgpu_reset_context *reset_context) { return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 00fe3c2d5431..b18f37426b5e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5772,7 +5772,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, cancel_delayed_work_sync(_adev->delayed_init_work); - amdgpu_amdkfd_pre_reset(tmp_adev); + amdgpu_amdkfd_pre_reset(tmp_adev, reset_context); /*
Re: [PATCH v10 4/6] drm/amdgpu: Implement userqueue signal/wait IOCTL
Am 10.05.24 um 10:50 schrieb Arunpravin Paneer Selvam: This patch introduces new IOCTL for userqueue secure semaphore. The signal IOCTL called from userspace application creates a drm syncobj and array of bo GEM handles and passed in as parameter to the driver to install the fence into it. The wait IOCTL gets an array of drm syncobjs, finds the fences attached to the drm syncobjs and obtain the array of memory_address/fence_value combintion which are returned to userspace. v2: (Christian) - Install fence into GEM BO object. - Lock all BO's using the dma resv subsystem - Reorder the sequence in signal IOCTL function. - Get write pointer from the shadow wptr - use userq_fence to fetch the va/value in wait IOCTL. v3: (Christian) - Use drm_exec helper for the proper BO drm reserve and avoid BO lock/unlock issues. - fence/fence driver reference count logic for signal/wait IOCTLs. v4: (Christian) - Fixed the drm_exec calling sequence - use dma_resv_for_each_fence_unlock if BO's are not locked - Modified the fence_info array storing logic. v5: (Christian) - Keep fence_drv until wait queue execution. - Add dma_fence_wait for other fences. - Lock BO's using drm_exec as the number of fences in them could change. - Install signaled fences as well into BO/Syncobj. - Move Syncobj fence installation code after the drm_exec_prepare_array. - Directly add dma_resv_usage_rw(args->bo_flags - remove unnecessary dma_fence_put. v6: (Christian) - Add xarray stuff to store the fence_drv - Implement a function to iterate over the xarray and drop the fence_drv references. - Add drm_exec_until_all_locked() wrapper - Add a check that if we haven't exceeded the user allocated num_fences before adding dma_fence to the fences array. v7: (Christian) - Use memdup_user() for kmalloc_array + copy_from_user - Move the fence_drv references from the xarray into the newly created fence and drop the fence_drv references when we signal this fence. - Move this locking of BOs before the "if (!wait_info->num_fences)", this way you need this code block only once. - Merge the error handling code and the cleanup + return 0 code. - Initializing the xa should probably be done in the userq code. - Remove the userq back pointer stored in fence_drv. - Pass xarray as parameter in amdgpu_userq_walk_and_drop_fence_drv() v8: (Christian) - Move fence_drv references must come before adding the fence to the list. - Use xa_lock_irqsave_nested for nested spinlock operations. - userq_mgr should be per fpriv and not one per device. - Restructure the interrupt process code for the early exit of the loop. - The reference acquired in the syncobj fence replace code needs to be kept around. - Modify the dma_fence acquire placement in wait IOCTL. - Move USERQ_BO_WRITE flag to UAPI header file. - drop the fence drv reference after telling the hw to stop accessing it. - Add multi sync object support to userq signal IOCTL. v9: (Christian) - Store all the fence_drv ref to other drivers and not ourself. - Iterate over uq_fence_drv_xa without holding a lock. Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 + .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 431 +- .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h | 6 + drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 29 +- .../gpu/drm/amd/include/amdgpu_userqueue.h| 1 + 5 files changed, 462 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 844f7b5f90db..5892a4c1a92e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -2918,6 +2918,8 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = { DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(AMDGPU_USERQ_SIGNAL, amdgpu_userq_signal_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(AMDGPU_USERQ_WAIT, amdgpu_userq_wait_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), }; static const struct drm_driver amdgpu_kms_driver = { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c index f7baea2c67ab..339d82d5808f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c @@ -25,6 +25,7 @@ #include #include +#include #include #include "amdgpu.h" @@ -92,6 +93,7 @@ int amdgpu_userq_fence_driver_alloc(struct
Re: [PATCH v2 2/2] drm/tests: Add a unit test for range bias allocation
On 13/05/2024 16:11, Paneer Selvam, Arunpravin wrote: Hi Matthew, On 5/13/2024 1:49 PM, Matthew Auld wrote: On 12/05/2024 08:59, Arunpravin Paneer Selvam wrote: Allocate cleared blocks in the bias range when the DRM buddy's clear avail is zero. This will validate the bias range allocation in scenarios like system boot when no cleared blocks are available and exercise the fallback path too. The resulting blocks should always be dirty. Signed-off-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/tests/drm_buddy_test.c | 35 ++ 1 file changed, 35 insertions(+) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index e3b50e240d36..a194f271bc55 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -26,6 +26,8 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test) u32 mm_size, ps, bias_size, bias_start, bias_end, bias_rem; DRM_RND_STATE(prng, random_seed); unsigned int i, count, *order; + struct drm_buddy_block *block; + unsigned long flags; struct drm_buddy mm; LIST_HEAD(allocated); @@ -222,6 +224,39 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test) drm_buddy_free_list(, , 0); drm_buddy_fini(); + + /* + * Allocate cleared blocks in the bias range when the DRM buddy's clear avail is + * zero. This will validate the bias range allocation in scenarios like system boot + * when no cleared blocks are available and exercise the fallback path too. The resulting + * blocks should always be dirty. + */ + + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(, mm_size, ps), + "buddy_init failed\n"); + mm.clear_avail = 0; Should already be zero, right? Maybe make this an assert instead? No, since the mm declared as a local variable in the test case, mm.clear_avail is not zero. That sounds like a bug IMO. The init() should initialize it, like it does for mm.avail and everything else. + + bias_start = round_up(prandom_u32_state() % (mm_size - ps), ps); + bias_end = round_up(bias_start + prandom_u32_state() % (mm_size - bias_start), ps); + bias_end = max(bias_end, bias_start + ps); + bias_rem = bias_end - bias_start; + + flags = DRM_BUDDY_CLEAR_ALLOCATION | DRM_BUDDY_RANGE_ALLOCATION; + u32 size = max(round_up(prandom_u32_state() % bias_rem, ps), ps); u32 declaration should be moved to above? Sure. Thanks, Arun. Otherwise, Reviewed-by: Matthew Auld + + KUNIT_ASSERT_FALSE_MSG(test, + drm_buddy_alloc_blocks(, bias_start, + bias_end, size, ps, + , + flags), + "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n", + bias_start, bias_end, size, ps); + + list_for_each_entry(block, , link) + KUNIT_EXPECT_EQ(test, drm_buddy_block_is_clear(block), false); + + drm_buddy_free_list(, , 0); + drm_buddy_fini(); } static void drm_test_buddy_alloc_clear(struct kunit *test)
[PATCH] drm/kfd: Correct pined buffer handling at kfd restore and validate process
From: Xiaogang Chen This reverts 8a774fe912ff09e39c2d3a3589c729330113f388 "drm/amdgpu: avoid restore process run into dead loop" since buffer got pined is not related whether it needs mapping. And skip buffer validation at kfd driver if the buffer has been pinned. Signed-off-by: Xiaogang Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 3314821e4cf3..80018738bd1c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -415,6 +415,10 @@ static int amdgpu_amdkfd_bo_validate(struct amdgpu_bo *bo, uint32_t domain, "Called with userptr BO")) return -EINVAL; + /* bo has been pined, not need validate it */ + if (bo->tbo.pin_count) + return 0; + amdgpu_bo_placement_from_domain(bo, domain); ret = ttm_bo_validate(>tbo, >placement, ); @@ -2736,7 +2740,7 @@ static int confirm_valid_user_pages_locked(struct amdkfd_process_info *process_i /* keep mem without hmm range at userptr_inval_list */ if (!mem->range) -continue; + continue; /* Only check mem with hmm range associated */ valid = amdgpu_ttm_tt_get_user_pages_done( @@ -2981,9 +2985,6 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu * if (!attachment->is_mapped) continue; - if (attachment->bo_va->base.bo->tbo.pin_count) - continue; - kfd_mem_dmaunmap_attachment(mem, attachment); ret = update_gpuvm_pte(mem, attachment, _obj); if (ret) { -- 2.25.1
Re: [PATCH v2 2/2] drm/tests: Add a unit test for range bias allocation
Hi Matthew, On 5/13/2024 1:49 PM, Matthew Auld wrote: On 12/05/2024 08:59, Arunpravin Paneer Selvam wrote: Allocate cleared blocks in the bias range when the DRM buddy's clear avail is zero. This will validate the bias range allocation in scenarios like system boot when no cleared blocks are available and exercise the fallback path too. The resulting blocks should always be dirty. Signed-off-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/tests/drm_buddy_test.c | 35 ++ 1 file changed, 35 insertions(+) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index e3b50e240d36..a194f271bc55 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -26,6 +26,8 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test) u32 mm_size, ps, bias_size, bias_start, bias_end, bias_rem; DRM_RND_STATE(prng, random_seed); unsigned int i, count, *order; + struct drm_buddy_block *block; + unsigned long flags; struct drm_buddy mm; LIST_HEAD(allocated); @@ -222,6 +224,39 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test) drm_buddy_free_list(, , 0); drm_buddy_fini(); + + /* + * Allocate cleared blocks in the bias range when the DRM buddy's clear avail is + * zero. This will validate the bias range allocation in scenarios like system boot + * when no cleared blocks are available and exercise the fallback path too. The resulting + * blocks should always be dirty. + */ + + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(, mm_size, ps), + "buddy_init failed\n"); + mm.clear_avail = 0; Should already be zero, right? Maybe make this an assert instead? No, since the mm declared as a local variable in the test case, mm.clear_avail is not zero. + + bias_start = round_up(prandom_u32_state() % (mm_size - ps), ps); + bias_end = round_up(bias_start + prandom_u32_state() % (mm_size - bias_start), ps); + bias_end = max(bias_end, bias_start + ps); + bias_rem = bias_end - bias_start; + + flags = DRM_BUDDY_CLEAR_ALLOCATION | DRM_BUDDY_RANGE_ALLOCATION; + u32 size = max(round_up(prandom_u32_state() % bias_rem, ps), ps); u32 declaration should be moved to above? Sure. Thanks, Arun. Otherwise, Reviewed-by: Matthew Auld + + KUNIT_ASSERT_FALSE_MSG(test, + drm_buddy_alloc_blocks(, bias_start, + bias_end, size, ps, + , + flags), + "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n", + bias_start, bias_end, size, ps); + + list_for_each_entry(block, , link) + KUNIT_EXPECT_EQ(test, drm_buddy_block_is_clear(block), false); + + drm_buddy_free_list(, , 0); + drm_buddy_fini(); } static void drm_test_buddy_alloc_clear(struct kunit *test)
[PATCH] drm/amdgpu: fix getting vram info for gfx12
[AMD Official Use Only - AMD Internal Distribution Only] From: Frank Min gfx12 query video mem channel/type/width from umc_info of atom list, so fix it accordingly. Signed-off-by: Frank Min --- .../gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c | 263 ++ 1 file changed, 148 insertions(+), 115 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c index a6d64bdbbb14..6fe84151332e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c @@ -289,7 +289,6 @@ static int convert_atom_mem_type_to_vram_type(struct amdgpu_device *adev, return vram_type; } - int amdgpu_atomfirmware_get_vram_info(struct amdgpu_device *adev, int *vram_width, int *vram_type, @@ -300,6 +299,7 @@ amdgpu_atomfirmware_get_vram_info(struct amdgpu_device *adev, u16 data_offset, size; union igp_info *igp_info; union vram_info *vram_info; + union umc_info *umc_info; union vram_module *vram_module; u8 frev, crev; u8 mem_type; @@ -311,10 +311,16 @@ amdgpu_atomfirmware_get_vram_info(struct amdgpu_device *adev, if (adev->flags & AMD_IS_APU) index = get_index_into_master_table(atom_master_list_of_data_tables_v2_1, integratedsysteminfo); - else - index = get_index_into_master_table(atom_master_list_of_data_tables_v2_1, - vram_info); - + else { + switch (amdgpu_ip_version(adev, GC_HWIP, 0)) { + case IP_VERSION(12, 0, 0): + case IP_VERSION(12, 0, 1): + index = get_index_into_master_table(atom_master_list_of_data_tables_v2_1, umc_info); + break; + default: + index = get_index_into_master_table(atom_master_list_of_data_tables_v2_1, vram_info); + } + } if (amdgpu_atom_parse_data_header(mode_info->atom_context, index, , , , _offset)) { @@ -368,123 +374,150 @@ amdgpu_atomfirmware_get_vram_info(struct amdgpu_device *adev, return -EINVAL; } } else { - vram_info = (union vram_info *) - (mode_info->atom_context->bios + data_offset); - module_id = (RREG32(adev->bios_scratch_reg_offset + 4) & 0x00ff) >> 16; - if (frev == 3) { - switch (crev) { - /* v30 */ - case 0: - vram_module = (union vram_module *)vram_info->v30.vram_module; - mem_vendor = (vram_module->v30.dram_vendor_id) & 0xF; - if (vram_vendor) - *vram_vendor = mem_vendor; - mem_type = vram_info->v30.memory_type; - if (vram_type) - *vram_type = convert_atom_mem_type_to_vram_type(adev, mem_type); - mem_channel_number = vram_info->v30.channel_num; - mem_channel_width = vram_info->v30.channel_width; - if (vram_width) - *vram_width = mem_channel_number * (1 << mem_channel_width); - break; - default: - return -EINVAL; - } - } else if (frev == 2) { - switch (crev) { - /* v23 */ - case 3: - if (module_id > vram_info->v23.vram_module_num) - module_id = 0; - vram_module = (union vram_module *)vram_info->v23.vram_module; - while (i < module_id) { - vram_module = (union vram_module *) - ((u8 *)vram_module + vram_module->v9.vram_module_size); - i++; - } - mem_type = vram_module->v9.memory_type; - if (vram_type) - *vram_type = convert_atom_mem_type_to_vram_type(adev, mem_type); -
Re: [RFC 1/5] drm/amdgpu: Fix migration rate limiting accounting
On 09.05.24 11:19, Tvrtko Ursulin wrote: On 08/05/2024 20:08, Friedrich Vock wrote: On 08.05.24 20:09, Tvrtko Ursulin wrote: From: Tvrtko Ursulin The logic assumed any migration attempt worked and therefore would over- account the amount of data migrated during buffer re-validation. As a consequence client can be unfairly penalised by incorrectly considering its migration budget spent. If the migration failed but data was still moved (which I think could be the case when we try evicting everything but it still doesn't work?), shouldn't the eviction movements count towards the ratelimit too? Possibly, which path would that be? Thinking about it more, the only case where allocation still won't succeed after evicting everything from a place is the edge case when the buffer is larger than the place's size. The most likely condition for this to happen (without the submission failing entirely because the buffer just doesn't fit anywhere) would be if the app tries creating a 256MB+ visible-VRAM buffer if resizeable BAR is disabled. This case could potentially trigger an allocation failure when trying with preferred_domains, but retrying with allowed_domains, which includes GTT, could subsequently work. I mean there are definitely more migration which *should not* be counted which I think your mini-series approaches more accurately. What this patch achieves, in its current RFC form, is reduces the "false-positive" migration budget depletions. So larger improvements aside, point of the series was to illustrate that even the things which were said to be working do not seem to. See cover letter to see what I thought does not work either well or at all. Fair point. If this patchset does "wrong"/inaccurate accounting in a different way that improves the experience, then it's still an improvement. Fix it by looking at the before and after buffer object backing store and only account if there was a change. FIXME: I think this needs a better solution to account for migrations between VRAM visible and non-visible portions. FWIW, I have some WIP patches (not posted on any MLs yet though) that attempt to solve this issue (+actually enforcing ratelimits) by moving the ratelimit accounting/enforcement to TTM entirely. By moving the accounting to TTM we can count moved bytes when we move them, and don't have to rely on comparing resources to determine whether moving actually happened. This should address your FIXME as well. Yep, I've seen them. They are not necessarily conflicting with this series, potentialy TTM placement flag aside. *If* something like this can be kept small and still manage to fix up a few simple things which do not appear to work at all at the moment. For the larger re-work it is quite, well, large and it is not easy to be certain the end result would work as expected. IMO it would be best to sketch out a larger series which brings some practical and masurable change in behaviour before commiting to merge things piecemeal. Yeah, fully agree. Getting something working and iterating on that based on the results you get seems like the best way forward, that's what I'll be focusing on for now. Thanks, Friedrich For instance I have a niggling feeling the runtime games driver plays with placements and domains are not great and wonder if things could be cleaner if simplified by letting TTM manage things more, more explicitly, and having the list of placements more static. Thinking about it seems a step too far for now though. Regards, Tvrtko Regards, Friedrich Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Friedrich Vock --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 26 +- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index ec888fc6ead8..22708954ae68 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -784,12 +784,15 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo) .no_wait_gpu = false, .resv = bo->tbo.base.resv }; + struct ttm_resource *old_res; uint32_t domain; int r; if (bo->tbo.pin_count) return 0; + old_res = bo->tbo.resource; + /* Don't move this buffer if we have depleted our allowance * to move it. Don't move anything if the threshold is zero. */ @@ -817,16 +820,29 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo) amdgpu_bo_placement_from_domain(bo, domain); r = ttm_bo_validate(>tbo, >placement, ); - p->bytes_moved += ctx.bytes_moved; - if (!amdgpu_gmc_vram_full_visible(>gmc) && - amdgpu_res_cpu_visible(adev, bo->tbo.resource)) - p->bytes_moved_vis += ctx.bytes_moved; - if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) { domain = bo->allowed_domains; goto retry; } + if (!r) { +
Re: [PATCH] drm/amdgpu: add initial value for gfx12 AGP aperture
Reviewed-by: Alex Deucher On Mon, May 13, 2024 at 5:37 AM Gao, Likun wrote: > > [AMD Official Use Only - AMD Internal Distribution Only] > > This patch was > Reviewed-by: Likun Gao . > > Regards, > Likun > > -Original Message- > From: Min, Frank > Sent: Monday, May 13, 2024 4:56 PM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; Koenig, Christian > ; Zhang, Hawking ; Gao, > Likun > Subject: [PATCH] drm/amdgpu: add initial value for gfx12 AGP aperture > > [AMD Official Use Only - AMD Internal Distribution Only] > > From: Frank Min > > add initial value for gfx12 AGP aperture > > Signed-off-by: Frank Min > --- > drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c > b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c > index 34264a33dcdf..b876300bb9f6 100644 > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c > @@ -622,6 +622,7 @@ static void gmc_v12_0_vram_gtt_location(struct > amdgpu_device *adev, > > base = adev->mmhub.funcs->get_fb_location(adev); > > + amdgpu_gmc_set_agp_default(adev, mc); > amdgpu_gmc_vram_location(adev, >gmc, base); > amdgpu_gmc_gart_location(adev, mc, AMDGPU_GART_PLACEMENT_LOW); > if (!amdgpu_sriov_vf(adev) && (amdgpu_agp == 1)) > -- > 2.34.1 > >
[PATCH v1 0/4] Add CP and GFX Queue register in ip Dump for
*** BLURB HERE *** Sunil Khatri (4): drm/amdgpu: update the ip_dump to ipdump_core drm/amdgpu: Add support to dump gfx10 cp registers drm/amdgpu: add support to dump gfx10 queue registers drm/amdgpu: add prints while ip registr dump drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h| 4 +- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 214 +++-- 3 files changed, 208 insertions(+), 12 deletions(-) -- 2.34.1
[PATCH v2 3/4] drm/amdgpu: add support to dump gfx10 queue registers
Add gfx queue register for all instances in ip dump for gfx10. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 1 + drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 86 + 2 files changed, 87 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h index d96873c154ed..54232066cd3b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h @@ -437,6 +437,7 @@ struct amdgpu_gfx { /* IP reg dump */ uint32_t*ipdump_core; uint32_t*ipdump_cp; + uint32_t*ipdump_gfx_queue; }; struct amdgpu_gfx_ras_reg_entry { diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index daf9a3571183..221fbde297e9 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -424,6 +424,33 @@ static const struct amdgpu_hwip_reg_entry gc_cp_reg_list_10[] = { SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_STATUS) }; +static const struct amdgpu_hwip_reg_entry gc_gfx_queue_reg_list_10[] = { + /* gfx queue registers */ + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_ACTIVE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_QUEUE_PRIORITY), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_BASE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_BASE_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_CSMD_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_WPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_WPTR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_DEQUEUE_REQUEST), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_MAPPED), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_QUE_MGR_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_HQ_CONTROL0), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_HQ_STATUS0), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_CE_WPTR_POLL_ADDR_LO), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_CE_WPTR_POLL_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_CE_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_CE_CSMD_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_CE_WPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_HQD_CE_WPTR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_MQD_BASE_ADDR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_GFX_MQD_BASE_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB_WPTR_POLL_ADDR_LO), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_RB_WPTR_POLL_ADDR_HI) +}; + static const struct soc15_reg_golden golden_settings_gc_10_1[] = { SOC15_REG_GOLDEN_VALUE(GC, 0, mmCB_HW_CONTROL_4, 0x, 0x00400014), SOC15_REG_GOLDEN_VALUE(GC, 0, mmCGTT_CPF_CLK_CTRL, 0xfcff8fff, 0xf8000100), @@ -4664,6 +4691,19 @@ static void gfx_v10_0_alloc_ip_dump(struct amdgpu_device *adev) } else { adev->gfx.ipdump_cp = ptr; } + + /* Allocate memory for gfx cp queue registers for all the instances */ + reg_count = ARRAY_SIZE(gc_cp_reg_list_10); + inst = adev->gfx.me.num_me * adev->gfx.me.num_pipe_per_me * + adev->gfx.me.num_queue_per_pipe; + + ptr = kcalloc(reg_count * inst, sizeof(uint32_t), GFP_KERNEL); + if (ptr == NULL) { + DRM_ERROR("Failed to allocate memory for GFX CP IP Dump\n"); + adev->gfx.ipdump_gfx_queue = NULL; + } else { + adev->gfx.ipdump_gfx_queue = ptr; + } } static int gfx_v10_0_sw_init(void *handle) @@ -4874,6 +4914,7 @@ static int gfx_v10_0_sw_fini(void *handle) kfree(adev->gfx.ipdump_core); kfree(adev->gfx.ipdump_cp); + kfree(adev->gfx.ipdump_gfx_queue); return 0; } @@ -9368,6 +9409,26 @@ static void gfx_v10_ip_print(void *handle, struct drm_printer *p) } } } + + /* print gfx queue registers for all instances */ + if (!adev->gfx.ipdump_gfx_queue) + return; + + reg_count = ARRAY_SIZE(gc_gfx_queue_reg_list_10); + + for (i = 0; i < adev->gfx.me.num_me; i++) { + for (j = 0; j < adev->gfx.me.num_pipe_per_me; j++) { + for (k = 0; k < adev->gfx.me.num_queue_per_pipe; k++) { + drm_printf(p, "me %d, pipe %d, queue %d\n", i, j, k); + for (reg = 0; reg < reg_count; reg++) { + drm_printf(p, "%-50s \t 0x%08x\n", + gc_gfx_queue_reg_list_10[reg].reg_name, + adev->gfx.ipdump_gfx_queue[index + reg]); + } + index += reg_count; + } + } + } } static void gfx_v10_ip_dump(void *handle)
[PATCH v2 4/4] drm/amdgpu: add prints while ip registr dump
add prints before and after during ip registers dump. It avoids user to think of system being stuck/hung as register dump takes time after a gpu hang. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 00fe3c2d5431..b0186c61d24a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5373,11 +5373,13 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, if (!test_bit(AMDGPU_SKIP_COREDUMP, _context->flags)) { amdgpu_reset_reg_dumps(tmp_adev); + dev_info(tmp_adev->dev, "Dumping IP Registers\n"); /* Trigger ip dump before we reset the asic */ for (i = 0; i < tmp_adev->num_ip_blocks; i++) if (tmp_adev->ip_blocks[i].version->funcs->dump_ip_state) tmp_adev->ip_blocks[i].version->funcs ->dump_ip_state((void *)tmp_adev); + dev_info(tmp_adev->dev, "Dumping IP Registers Completed\n"); } reset_context->reset_device_list = device_list_handle; -- 2.34.1
[PATCH v2 2/4] drm/amdgpu: Add support to dump gfx10 cp registers
add support to dump registers of all instances of cp registers in gfx10 Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 1 + drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 117 +++- 2 files changed, 114 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h index 30d7f9c29478..d96873c154ed 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h @@ -436,6 +436,7 @@ struct amdgpu_gfx { /* IP reg dump */ uint32_t*ipdump_core; + uint32_t*ipdump_cp; }; struct amdgpu_gfx_ras_reg_entry { diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index f6d6a4b9802d..daf9a3571183 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -381,6 +381,49 @@ static const struct amdgpu_hwip_reg_entry gc_reg_list_10_1[] = { SOC15_REG_ENTRY_STR(GC, 0, mmGRBM_STATUS_SE3) }; +static const struct amdgpu_hwip_reg_entry gc_cp_reg_list_10[] = { + /* compute registers */ + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_VMID), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PERSISTENT_STATE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PIPE_PRIORITY), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUEUE_PRIORITY), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_QUANTUM), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_BASE_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_BASE_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_IB_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_REQUEST), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_BASE_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_RPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_EVENTS), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_LO), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_BASE_ADDR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_CONTROL), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CNTL_STACK_SIZE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_WG_STATE_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_CTX_SAVE_SIZE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_GDS_RESOURCE_STATE), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_ERROR), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_EOP_WPTR_MEM), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_LO), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_PQ_WPTR_HI), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_CNTL_STACK_DW_CNT), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_SUSPEND_WG_STATE_OFFSET), + SOC15_REG_ENTRY_STR(GC, 0, mmCP_HQD_DEQUEUE_STATUS) +}; + static const struct soc15_reg_golden golden_settings_gc_10_1[] = { SOC15_REG_GOLDEN_VALUE(GC, 0, mmCB_HW_CONTROL_4, 0x, 0x00400014), SOC15_REG_GOLDEN_VALUE(GC, 0, mmCGTT_CPF_CLK_CTRL, 0xfcff8fff, 0xf8000100), @@ -4595,10 +4638,11 @@ static int gfx_v10_0_compute_ring_init(struct amdgpu_device *adev, int ring_id, hw_prio, NULL); } -static void gfx_v10_0_alloc_dump_mem(struct amdgpu_device *adev) +static void gfx_v10_0_alloc_ip_dump(struct amdgpu_device *adev) { uint32_t reg_count = ARRAY_SIZE(gc_reg_list_10_1); uint32_t *ptr; + uint32_t inst; ptr = kcalloc(reg_count, sizeof(uint32_t), GFP_KERNEL); if (ptr == NULL) { @@ -4607,6 +4651,19 @@ static void gfx_v10_0_alloc_dump_mem(struct amdgpu_device *adev) } else { adev->gfx.ipdump_core = ptr; } + + /* Allocate memory for gfx cp registers for all the instances */ + reg_count = ARRAY_SIZE(gc_cp_reg_list_10); + inst = adev->gfx.mec.num_mec * adev->gfx.mec.num_pipe_per_mec * + adev->gfx.mec.num_queue_per_pipe; + + ptr = kcalloc(reg_count * inst, sizeof(uint32_t), GFP_KERNEL); + if (ptr == NULL) { + DRM_ERROR("Failed to allocate memory for GFX CP IP Dump\n"); + adev->gfx.ipdump_cp = NULL; + } else { + adev->gfx.ipdump_cp = ptr; + } } static
[PATCH v2 1/4] drm/amdgpu: update the ip_dump to ipdump_core
Update the memory pointer from ip_dump to ipdump_core to make it specific to core registers and rest other registers to be dumped in their respective memories. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 2 +- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 14 +++--- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h index 109f471ff315..30d7f9c29478 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h @@ -435,7 +435,7 @@ struct amdgpu_gfx { boolmcbp; /* mid command buffer preemption */ /* IP reg dump */ - uint32_t*ip_dump; + uint32_t*ipdump_core; }; struct amdgpu_gfx_ras_reg_entry { diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index 953df202953a..f6d6a4b9802d 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -4603,9 +4603,9 @@ static void gfx_v10_0_alloc_dump_mem(struct amdgpu_device *adev) ptr = kcalloc(reg_count, sizeof(uint32_t), GFP_KERNEL); if (ptr == NULL) { DRM_ERROR("Failed to allocate memory for IP Dump\n"); - adev->gfx.ip_dump = NULL; + adev->gfx.ipdump_core = NULL; } else { - adev->gfx.ip_dump = ptr; + adev->gfx.ipdump_core = ptr; } } @@ -4815,7 +4815,7 @@ static int gfx_v10_0_sw_fini(void *handle) gfx_v10_0_free_microcode(adev); - kfree(adev->gfx.ip_dump); + kfree(adev->gfx.ipdump_core); return 0; } @@ -9283,13 +9283,13 @@ static void gfx_v10_ip_print(void *handle, struct drm_printer *p) uint32_t i; uint32_t reg_count = ARRAY_SIZE(gc_reg_list_10_1); - if (!adev->gfx.ip_dump) + if (!adev->gfx.ipdump_core) return; for (i = 0; i < reg_count; i++) drm_printf(p, "%-50s \t 0x%08x\n", gc_reg_list_10_1[i].reg_name, - adev->gfx.ip_dump[i]); + adev->gfx.ipdump_core[i]); } static void gfx_v10_ip_dump(void *handle) @@ -9298,12 +9298,12 @@ static void gfx_v10_ip_dump(void *handle) uint32_t i; uint32_t reg_count = ARRAY_SIZE(gc_reg_list_10_1); - if (!adev->gfx.ip_dump) + if (!adev->gfx.ipdump_core) return; amdgpu_gfx_off_ctrl(adev, false); for (i = 0; i < reg_count; i++) - adev->gfx.ip_dump[i] = RREG32(SOC15_REG_ENTRY_OFFSET(gc_reg_list_10_1[i])); + adev->gfx.ipdump_core[i] = RREG32(SOC15_REG_ENTRY_OFFSET(gc_reg_list_10_1[i])); amdgpu_gfx_off_ctrl(adev, true); } -- 2.34.1
Re: [RESEND 0/6] drm, nouveau/radeon/amdpgu: edid_blob_ptr cleanups
On Mon, May 13, 2024 at 8:20 AM Jani Nikula wrote: > > On Fri, 10 May 2024, Alex Deucher wrote: > > On Fri, May 10, 2024 at 11:17 AM Jani Nikula wrote: > >> > >> I've sent this some moths ago, let's try again... > >> > >> BR, > >> Jani. > >> > >> Jani Nikula (6): > >> drm/nouveau: convert to using is_hdmi and has_audio from display info > >> drm/radeon: convert to using is_hdmi and has_audio from display info > >> drm/radeon: remove radeon_connector_edid() and stop using > >> edid_blob_ptr > >> drm/amdgpu: remove amdgpu_connector_edid() and stop using > >> edid_blob_ptr > >> drm/edid: add a helper for EDID sysfs property show > >> drm/connector: update edid_blob_ptr documentation > > > > Series is: > > Acked-by: Alex Deucher > > Thanks, do you want to pick these up via your tree? And do you expect a > proper R-b before merging? Feel free to take them via drm-misc if you'd prefer to land the whole set together, otherwise, I can pick up the radeon/amdgpu patches. Alex > > BR, > Jani. > > > > > >> > >> .../gpu/drm/amd/amdgpu/amdgpu_connectors.c| 16 - > >> .../gpu/drm/amd/amdgpu/amdgpu_connectors.h| 1 - > >> drivers/gpu/drm/amd/amdgpu/dce_v10_0.c| 4 +-- > >> drivers/gpu/drm/amd/amdgpu/dce_v11_0.c| 4 +-- > >> drivers/gpu/drm/amd/amdgpu/dce_v6_0.c | 4 +-- > >> drivers/gpu/drm/amd/amdgpu/dce_v8_0.c | 4 +-- > >> drivers/gpu/drm/drm_crtc_internal.h | 2 ++ > >> drivers/gpu/drm/drm_edid.c| 33 +++ > >> drivers/gpu/drm/drm_sysfs.c | 24 ++ > >> drivers/gpu/drm/nouveau/dispnv50/disp.c | 8 ++--- > >> drivers/gpu/drm/nouveau/dispnv50/head.c | 8 + > >> drivers/gpu/drm/nouveau/nouveau_connector.c | 2 +- > >> drivers/gpu/drm/radeon/atombios_encoders.c| 10 +++--- > >> drivers/gpu/drm/radeon/evergreen_hdmi.c | 5 ++- > >> drivers/gpu/drm/radeon/radeon_audio.c | 13 > >> drivers/gpu/drm/radeon/radeon_connectors.c| 27 --- > >> drivers/gpu/drm/radeon/radeon_display.c | 2 +- > >> drivers/gpu/drm/radeon/radeon_encoders.c | 4 +-- > >> drivers/gpu/drm/radeon/radeon_mode.h | 2 -- > >> include/drm/drm_connector.h | 6 +++- > >> 20 files changed, 79 insertions(+), 100 deletions(-) > >> > >> -- > >> 2.39.2 > >> > > -- > Jani Nikula, Intel
Re: [RFC 0/5] Discussion around eviction improvements
On 09/05/2024 13:40, Tvrtko Ursulin wrote: On 08/05/2024 19:09, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Last few days I was looking at the situation with VRAM over subscription, what happens versus what perhaps should happen. Browsing through the driver and running some simple experiments. I ended up with this patch series which, as a disclaimer, may be completely wrong but as I found some suspicious things, to me at least, I thought it was a good point to stop and request some comments. To perhaps summarise what are the main issues I think I found: * Migration rate limiting does not bother knowing if actual migration happened and so can over-account and unfairly penalise. * Migration rate limiting does not even work, at least not for the common case where userspace configures VRAM+GTT. It thinks it can stop migration attempts by playing with bo->allowed_domains vs bo->preferred domains but, both from the code, and from empirical experiments, I see that not working at all. Both masks are identical so fiddling with them achieves nothing. * Idea of the fallback placement only works when VRAM has free space. As soon as it does not, ttm_resource_compatible is happy to leave the buffers in the secondary placement forever. * Driver thinks it will be re-validating evicted buffers on the next submission but it does not for the very common case of VRAM+GTT because it only checks if current placement is *none* of the preferred placements. All those problems are addressed in individual patches. End result of this series appears to be driver which will try harder to move buffers back into VRAM, but will be (more) correctly throttled in doing so by the existing rate limiting logic. I have run a quick benchmark of Cyberpunk 2077 and cannot say that I saw a change but that could be a good thing too. At least I did not break anything, perhaps.. On one occassion I did see the rate limiting logic get confused while for a period of few minutes it went to a mode where it was constantly giving a high migration budget. But that recovered itself when I switched clients and did not come back so I don't know. If there is something wrong there I don't think it would be caused by any patches in this series. Since yesterday I also briefly tested with Far Cry New Dawn. One run each so possibly doesn't mean anything apart that there isn't a regression aka migration throttling is keeping things at bay even with increased requests to migrate things back to VRAM: before after min/avg/max fps 36/44/54 37/45/55 Cyberpunk 2077 from yesterday was similarly close: 26.96/29.59/30.40 29.70/30.00/30.32 I guess the real story is proper DGPU where misplaced buffers have a real cost. I found one game which regresses spectacularly badly with this series - Assasin's Creed Valhalla. The built-in benchmark at least. The game appears to have a working set much larger than the other games I tested, around 5GiB total during the benchmark. And for some reason migration throttling totally fails to put it in check. I will be investigating this shortly. Regards, Tvrtko Series is probably rough but should be good enough for dicsussion. I am curious to hear if I identified at least something correctly as a real problem. It would also be good to hear what are the suggested games to check and see whether there is any improvement. Cc: Christian König Cc: Friedrich Vock Tvrtko Ursulin (5): drm/amdgpu: Fix migration rate limiting accounting drm/amdgpu: Actually respect buffer migration budget drm/ttm: Add preferred placement flag drm/amdgpu: Use preferred placement for VRAM+GTT drm/amdgpu: Re-validate evicted buffers drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 38 +- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 8 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 21 ++-- drivers/gpu/drm/ttm/ttm_resource.c | 13 +--- include/drm/ttm/ttm_placement.h | 3 ++ 5 files changed, 65 insertions(+), 18 deletions(-)
Re: [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription
Hi, On 02.05.24 16:23, Maarten Lankhorst wrote: Hey, [snip] For Xe, I've been loking at using cgroups. A small prototype is available at https://cgit.freedesktop.org/~mlankhorst/linux/log/?h=dumpcg To stimulate discussion, I've added amdgpu support as well. This should make it possible to isolate the compositor allocations from the target program. This support is still incomplete and covers vram only, but I need help from userspace and consensus from other drivers on how to move forward. I'm thinking of making 3 cgroup limits: 1. Physical memory, each time a buffer is allocated, it counts towards it, regardless where it resides. 2. Mappable memory, all buffers allocated in sysmem or vram count towards this limit. 3. VRAM, only buffers residing in VRAM count here. This ensures that VRAM can always be evicted to sysmem, by having a mappable memory quota, and having a sysmem reservation. The main trouble is that when evicting, you want to charge the original process the changes in allocation limits, but it should be solvable. I've been looking for someone else needing the usecase in a different context, so let me know what you think of the idea. Sorry for the late reply. The idea sounds really good! I think cgroups are great fit for what we'd need to prioritize game+compositor over other potential non-foreground apps. From what I can tell looking through the code, the current cgroup properties are absolute memory sizes that userspace asks the kernel to restrict the cgroup usage to? While that sounds useful for some usecases too, I'm not sure just these limits are a good solution for making sure that your compositor's and foreground app's resources stay in memory (in favor of background apps) when there is pressure. This can be generalized towards all uses of the GPU, but the compositor vs game thrashing is a good example of why it is useful to have. IIRC Tvrtko's original proposal was about per-cgroup DRM scheduling priorities providing lower submission latency for prioritized cgroups, right? I think what we need here would pretty much exactly such a priority system, but for memory: The cgroup containing the foreground app/game and the compositor should have some hint telling TTM to try its hardest to avoid evicting its buffers (i.e. a high memory priority). Your existing drm_cgroup work looks like a great base for this, and I'd be happy to help/participate with the implementation for amdgpu. Thanks, Friedrich I should still have my cgroup testcase somewhere, this is only a rebase of my previous proposal, but I think it fits the usecase. Cheers, Maarten
Re: [RESEND 0/6] drm, nouveau/radeon/amdpgu: edid_blob_ptr cleanups
On Fri, 10 May 2024, Alex Deucher wrote: > On Fri, May 10, 2024 at 11:17 AM Jani Nikula wrote: >> >> I've sent this some moths ago, let's try again... >> >> BR, >> Jani. >> >> Jani Nikula (6): >> drm/nouveau: convert to using is_hdmi and has_audio from display info >> drm/radeon: convert to using is_hdmi and has_audio from display info >> drm/radeon: remove radeon_connector_edid() and stop using >> edid_blob_ptr >> drm/amdgpu: remove amdgpu_connector_edid() and stop using >> edid_blob_ptr >> drm/edid: add a helper for EDID sysfs property show >> drm/connector: update edid_blob_ptr documentation > > Series is: > Acked-by: Alex Deucher Thanks, do you want to pick these up via your tree? And do you expect a proper R-b before merging? BR, Jani. > >> >> .../gpu/drm/amd/amdgpu/amdgpu_connectors.c| 16 - >> .../gpu/drm/amd/amdgpu/amdgpu_connectors.h| 1 - >> drivers/gpu/drm/amd/amdgpu/dce_v10_0.c| 4 +-- >> drivers/gpu/drm/amd/amdgpu/dce_v11_0.c| 4 +-- >> drivers/gpu/drm/amd/amdgpu/dce_v6_0.c | 4 +-- >> drivers/gpu/drm/amd/amdgpu/dce_v8_0.c | 4 +-- >> drivers/gpu/drm/drm_crtc_internal.h | 2 ++ >> drivers/gpu/drm/drm_edid.c| 33 +++ >> drivers/gpu/drm/drm_sysfs.c | 24 ++ >> drivers/gpu/drm/nouveau/dispnv50/disp.c | 8 ++--- >> drivers/gpu/drm/nouveau/dispnv50/head.c | 8 + >> drivers/gpu/drm/nouveau/nouveau_connector.c | 2 +- >> drivers/gpu/drm/radeon/atombios_encoders.c| 10 +++--- >> drivers/gpu/drm/radeon/evergreen_hdmi.c | 5 ++- >> drivers/gpu/drm/radeon/radeon_audio.c | 13 >> drivers/gpu/drm/radeon/radeon_connectors.c| 27 --- >> drivers/gpu/drm/radeon/radeon_display.c | 2 +- >> drivers/gpu/drm/radeon/radeon_encoders.c | 4 +-- >> drivers/gpu/drm/radeon/radeon_mode.h | 2 -- >> include/drm/drm_connector.h | 6 +++- >> 20 files changed, 79 insertions(+), 100 deletions(-) >> >> -- >> 2.39.2 >> -- Jani Nikula, Intel
Re: [RESEND 1/6] drm/nouveau: convert to using is_hdmi and has_audio from display info
On Fri, 10 May 2024, Lyude Paul wrote: > Reviewed-by: Lyude Paul Thanks, how do you want to handle merging this? BR, Jani. > > On Fri, 2024-05-10 at 18:08 +0300, Jani Nikula wrote: >> Prefer the parsed results for is_hdmi and has_audio in display info >> over >> calling drm_detect_hdmi_monitor() and drm_detect_monitor_audio(), >> respectively. >> >> Conveniently, this also removes the need to use edid_blob_ptr. >> >> v2: Reverse a backwards if condition (Ilia) >> >> Cc: Karol Herbst >> Cc: Lyude Paul >> Cc: Danilo Krummrich >> Cc: nouv...@lists.freedesktop.org >> Signed-off-by: Jani Nikula >> --- >> drivers/gpu/drm/nouveau/dispnv50/disp.c | 8 >> drivers/gpu/drm/nouveau/dispnv50/head.c | 8 +--- >> drivers/gpu/drm/nouveau/nouveau_connector.c | 2 +- >> 3 files changed, 6 insertions(+), 12 deletions(-) >> >> diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c >> b/drivers/gpu/drm/nouveau/dispnv50/disp.c >> index 0c3d88ad0b0e..168c27213287 100644 >> --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c >> +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c >> @@ -751,7 +751,7 @@ nv50_audio_enable(struct drm_encoder *encoder, >> struct nouveau_crtc *nv_crtc, >> struct nouveau_encoder *nv_encoder = >> nouveau_encoder(encoder); >> struct nvif_outp *outp = _encoder->outp; >> >> -if (!nv50_audio_supported(encoder) || >> !drm_detect_monitor_audio(nv_connector->edid)) >> +if (!nv50_audio_supported(encoder) || !nv_connector- >> >base.display_info.has_audio) >> return; >> >> mutex_lock(>audio.lock); >> @@ -1765,7 +1765,7 @@ nv50_sor_atomic_enable(struct drm_encoder >> *encoder, struct drm_atomic_state *sta >> if ((disp->disp->object.oclass == GT214_DISP || >> disp->disp->object.oclass >= GF110_DISP) && >> nv_encoder->dcb->type != DCB_OUTPUT_LVDS && >> - drm_detect_monitor_audio(nv_connector->edid)) >> + nv_connector->base.display_info.has_audio) >> hda = true; >> >> if (!nvif_outp_acquired(outp)) >> @@ -1774,7 +1774,7 @@ nv50_sor_atomic_enable(struct drm_encoder >> *encoder, struct drm_atomic_state *sta >> switch (nv_encoder->dcb->type) { >> case DCB_OUTPUT_TMDS: >> if (disp->disp->object.oclass != NV50_DISP && >> - drm_detect_hdmi_monitor(nv_connector->edid)) >> + nv_connector->base.display_info.is_hdmi) >> nv50_hdmi_enable(encoder, nv_crtc, >> nv_connector, state, mode, hda); >> >> if (nv_encoder->outp.or.link & 1) { >> @@ -1787,7 +1787,7 @@ nv50_sor_atomic_enable(struct drm_encoder >> *encoder, struct drm_atomic_state *sta >> */ >> if (mode->clock >= 165000 && >> nv_encoder->dcb->duallink_possible && >> - !drm_detect_hdmi_monitor(nv_connector- >> >edid)) >> + !nv_connector- >> >base.display_info.is_hdmi) >> proto = >> NV507D_SOR_SET_CONTROL_PROTOCOL_DUAL_TMDS; >> } else { >> proto = >> NV507D_SOR_SET_CONTROL_PROTOCOL_SINGLE_TMDS_B; >> diff --git a/drivers/gpu/drm/nouveau/dispnv50/head.c >> b/drivers/gpu/drm/nouveau/dispnv50/head.c >> index 83355dbc15ee..d7c74cc43ba5 100644 >> --- a/drivers/gpu/drm/nouveau/dispnv50/head.c >> +++ b/drivers/gpu/drm/nouveau/dispnv50/head.c >> @@ -127,14 +127,8 @@ nv50_head_atomic_check_view(struct >> nv50_head_atom *armh, >> struct drm_display_mode *omode = >state.adjusted_mode; >> struct drm_display_mode *umode = >state.mode; >> int mode = asyc->scaler.mode; >> -struct edid *edid; >> int umode_vdisplay, omode_hdisplay, omode_vdisplay; >> >> -if (connector->edid_blob_ptr) >> -edid = (struct edid *)connector->edid_blob_ptr- >> >data; >> -else >> -edid = NULL; >> - >> if (!asyc->scaler.full) { >> if (mode == DRM_MODE_SCALE_NONE) >> omode = umode; >> @@ -162,7 +156,7 @@ nv50_head_atomic_check_view(struct nv50_head_atom >> *armh, >> */ >> if ((asyc->scaler.underscan.mode == UNDERSCAN_ON || >> (asyc->scaler.underscan.mode == UNDERSCAN_AUTO && >> - drm_detect_hdmi_monitor(edid { >> + connector->display_info.is_hdmi))) { >> u32 bX = asyc->scaler.underscan.hborder; >> u32 bY = asyc->scaler.underscan.vborder; >> u32 r = (asyh->view.oH << 19) / asyh->view.oW; >> diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c >> b/drivers/gpu/drm/nouveau/nouveau_connector.c >> index 856b3ef5edb8..938832a6af15 100644 >> --- a/drivers/gpu/drm/nouveau/nouveau_connector.c >> +++ b/drivers/gpu/drm/nouveau/nouveau_connector.c >> @@ -1034,7 +1034,7 @@ get_tmds_link_bandwidth(struct drm_connector >> *connector) >> unsigned duallink_scale = >> nouveau_duallink && nv_encoder->dcb- >> >duallink_possible ? 2 : 1; >> >> -if
Re: [PATCH 4/5] drm/amdgpu: Fix null pointer dereference to bo
Am 13.05.24 um 10:56 schrieb Ma Jun: Check bo before using it Signed-off-by: Ma Jun Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c | 16 +++- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c index 2b7b67916c1d..8269fd6828bf 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c @@ -497,9 +497,8 @@ static void gmc_v12_0_get_vm_pte(struct amdgpu_device *adev, uint64_t *flags) { struct amdgpu_bo *bo = mapping->bo_va->base.bo; - struct amdgpu_device *bo_adev = amdgpu_ttm_adev(bo->tbo.bdev); - bool coherent = bo->flags & AMDGPU_GEM_CREATE_COHERENT; - bool is_system = bo->tbo.resource->mem_type == TTM_PL_SYSTEM; + struct amdgpu_device *bo_adev; + bool coherent, is_system; *flags &= ~AMDGPU_PTE_EXECUTABLE; @@ -515,13 +514,20 @@ static void gmc_v12_0_get_vm_pte(struct amdgpu_device *adev, *flags &= ~AMDGPU_PTE_VALID; } - if (bo && bo->flags & (AMDGPU_GEM_CREATE_COHERENT | + if (!bo) + return; + + if (bo->flags & (AMDGPU_GEM_CREATE_COHERENT | AMDGPU_GEM_CREATE_UNCACHED)) *flags = (*flags & ~AMDGPU_PTE_MTYPE_GFX12_MASK) | AMDGPU_PTE_MTYPE_GFX12(MTYPE_UC); + bo_adev = amdgpu_ttm_adev(bo->tbo.bdev); + coherent = bo->flags & AMDGPU_GEM_CREATE_COHERENT; + is_system = bo->tbo.resource->mem_type == TTM_PL_SYSTEM; + /* WA for HW bug */ - if ((bo && is_system) || ((bo_adev != adev) && coherent)) + if (is_system || ((bo_adev != adev) && coherent)) *flags |= AMDGPU_PTE_MTYPE_GFX12(MTYPE_NC); }
RE: [PATCH] drm/amd/amdgpu: update jpeg 5 capability
[AMD Official Use Only - AMD Internal Distribution Only] Reviewed-by: Leo Liu > -Original Message- > From: Wu, David > Sent: Thursday, May 9, 2024 5:37 PM > To: amd-gfx@lists.freedesktop.org; Deucher, Alexander > > Cc: Koenig, Christian ; Liu, Leo > ; Jiang, Sonny > Subject: [PATCH] drm/amd/amdgpu: update jpeg 5 capability > > Based on the documentation the maximum resolustion should be 16384x16384. > > Signed-off-by: David (Ming Qiang) Wu > --- > drivers/gpu/drm/amd/amdgpu/soc24.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/soc24.c > b/drivers/gpu/drm/amd/amdgpu/soc24.c > index 12900488dd61..285d6af10f62 100644 > --- a/drivers/gpu/drm/amd/amdgpu/soc24.c > +++ b/drivers/gpu/drm/amd/amdgpu/soc24.c > @@ -61,7 +61,7 @@ static const struct amdgpu_video_codecs > vcn_5_0_0_video_codecs_encode_vcn0 = { static const struct > amdgpu_video_codec_info vcn_5_0_0_video_codecs_decode_array_vcn0[] = { > > {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4_ > AVC, 4096, 2160, 52)}, > {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, > 8192, 4320, 183)}, > - {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_JPEG, > 4096, 4096, 0)}, > + {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_JPEG, > 16384, 16384, > +0)}, > {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_VP9, > 8192, 4352, 0)}, > {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_AV1, > 8192, 4352, 0)}, }; > -- > 2.34.1
Re: [PATCH 02/20] drm/amdgpu: don't trample pdev drvdata
Am 10.05.24 um 20:12 schrieb Matthew Auld: The driver release callback is called when a particular drm_device goes away, just like with drmm, so here we should never nuke the pdev drvdata pointer, since that could already be pointing to a new drvdata. For example something hotunplugs the device, for which we have an open driver fd, keeping the drm_device alive, and in the meantime the same physical device is re-attached to a new drm_device therefore setting drvdata again. Once the original drm_device goes away, we might then call the release which then incorrectly tramples the drvdata. The driver core will already nuke the pointer for us when the pci device is removed, so should be safe to simply drop. Alternative would be to move to the driver pci remove callback. Signed-off-by: Matthew Auld Cc: Christian König Cc: Daniel Vetter Cc: amd-gfx@lists.freedesktop.org Oh! Very good catch! That might become important for a feature we current discuss internally. Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index a0ea6fe8d060..d5fed007c698 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -1450,7 +1450,6 @@ void amdgpu_driver_release_kms(struct drm_device *dev) struct amdgpu_device *adev = drm_to_adev(dev); amdgpu_device_fini_sw(adev); - pci_set_drvdata(adev->pdev, NULL); } /*
Re: [PATCH 01/22] drm/amdgpu: fix dereference after null check
Am 10.05.24 um 04:50 schrieb Jesse Zhang: check the pointer hive before use. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 77f6fd50002a..00fe3c2d5431 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5725,7 +5725,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, * to put adev in the 1st position. */ INIT_LIST_HEAD(_list); - if (!amdgpu_sriov_vf(adev) && (adev->gmc.xgmi.num_physical_nodes > 1)) { + if (!amdgpu_sriov_vf(adev) && (adev->gmc.xgmi.num_physical_nodes > 1) && hive) { That solution looks not optimal to me. Checking adev->gmc.xgmi.num_physical_nodes > 1 already makes sure that hive shouldn't be NULL. If automated checkers complain about that we should probably drop the adev->gmc.xgmi.num_physical_nodes > 1 check and check for hive instead. Regards, Christian. list_for_each_entry(tmp_adev, >device_list, gmc.xgmi.head) { list_add_tail(_adev->reset_list, _list); if (adev->shutdown)
Re: [PATCH 05/22] drm/amd/pm: check specific index for aldebaran
On 5/10/2024 8:20 AM, Jesse Zhang wrote: > Check for specific indexes that may be invalid values. > > Signed-off-by: Jesse Zhang > --- > drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c > b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c > index ce941fbb9cfb..a22eb6bbb05e 100644 > --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c > @@ -1886,7 +1886,8 @@ static int aldebaran_mode2_reset(struct smu_context > *smu) > > index = smu_cmn_to_asic_specific_index(smu, CMN2ASIC_MAPPING_MSG, > SMU_MSG_GfxDeviceDriverReset); > - > + if (index < 0 ) > + return -EINVAL; To avoid warning problems, drop index and use PPSMC_MSG_GfxDriverReset instead of index. Thanks, Lijo > mutex_lock(>message_lock); > if (smu->smc_fw_version >= 0x00441400) { > ret = smu_cmn_send_msg_without_waiting(smu, (uint16_t)index, > SMU_RESET_MODE_2);
Re: [PATCH] drm/amd/display: Don't register panel_power_savings on OLED panels
On 2024-05-09 13:05, Mario Limonciello wrote: OLED panels don't support the ABM, they shouldn't offer the panel_power_savings attribute to the user. Check whether aux BL control support was enabled to decide whether to offer it. Reported-by: Gergo Koteles Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3359 Signed-off-by: Mario Limonciello Reviewed-by: Harry Wentland Harry --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 29 --- 1 file changed, 25 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 3054bf79fc99..ce2ec857b8a6 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -6532,12 +6532,34 @@ static const struct attribute_group amdgpu_group = { .attrs = amdgpu_attrs }; +static bool +amdgpu_dm_should_create_sysfs(struct amdgpu_dm_connector *amdgpu_dm_connector) +{ + if (amdgpu_dm_abm_level >= 0) + return false; + + if (amdgpu_dm_connector->base.connector_type != DRM_MODE_CONNECTOR_eDP) + return false; + + /* check for OLED panels */ + if (amdgpu_dm_connector->bl_idx >= 0) { + struct drm_device *drm = amdgpu_dm_connector->base.dev; + struct amdgpu_display_manager *dm = _to_adev(drm)->dm; + struct amdgpu_dm_backlight_caps *caps; + + caps = >backlight_caps[amdgpu_dm_connector->bl_idx]; + if (caps->aux_support) + return false; + } + + return true; +} + static void amdgpu_dm_connector_unregister(struct drm_connector *connector) { struct amdgpu_dm_connector *amdgpu_dm_connector = to_amdgpu_dm_connector(connector); - if (connector->connector_type == DRM_MODE_CONNECTOR_eDP && - amdgpu_dm_abm_level < 0) + if (amdgpu_dm_should_create_sysfs(amdgpu_dm_connector)) sysfs_remove_group(>kdev->kobj, _group); drm_dp_aux_unregister(_dm_connector->dm_dp_aux.aux); @@ -6644,8 +,7 @@ amdgpu_dm_connector_late_register(struct drm_connector *connector) to_amdgpu_dm_connector(connector); int r; - if (connector->connector_type == DRM_MODE_CONNECTOR_eDP && - amdgpu_dm_abm_level < 0) { + if (amdgpu_dm_should_create_sysfs(amdgpu_dm_connector)) { r = sysfs_create_group(>kdev->kobj, _group); if (r)
Re: [PATCH 09/22] drm/amd/pm: check specific index for smu13
On 5/13/2024 4:27 PM, Lazar, Lijo wrote: > > > On 5/10/2024 8:20 AM, Jesse Zhang wrote: >> Check for specific indexes that may be invalid values. >> >> Signed-off-by: Jesse Zhang >> --- >> drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c >> b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c >> index 051092f1b1b4..7c343dd12a7f 100644 >> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c >> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c >> @@ -2336,6 +2336,8 @@ static int smu_v13_0_6_mode2_reset(struct smu_context >> *smu) >> >> index = smu_cmn_to_asic_specific_index(smu, CMN2ASIC_MAPPING_MSG, >> SMU_MSG_GfxDeviceDriverReset); >> +if (index < 0) >> +ret = -EINVAL; >> > > This check is unnecessary. This is IP specific logic and the index is > expected to exist. > If you are seeing a warning problem because of this, drop index and use PPSMC_MSG_GfxDriverReset directly. Thanks, Lijo > See this entry in smu_v13_0_6_message_map > > MSG_MAP(GfxDeviceDriverReset,PPSMC_MSG_GfxDriverReset, >SMU_MSG_RAS_PRI), > > > Thanks, > Lijo > >> mutex_lock(>message_lock); >>
Re: [PATCH 09/22] drm/amd/pm: check specific index for smu13
On 5/10/2024 8:20 AM, Jesse Zhang wrote: > Check for specific indexes that may be invalid values. > > Signed-off-by: Jesse Zhang > --- > drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c > b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c > index 051092f1b1b4..7c343dd12a7f 100644 > --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c > @@ -2336,6 +2336,8 @@ static int smu_v13_0_6_mode2_reset(struct smu_context > *smu) > > index = smu_cmn_to_asic_specific_index(smu, CMN2ASIC_MAPPING_MSG, > SMU_MSG_GfxDeviceDriverReset); > + if (index < 0) > + ret = -EINVAL; > This check is unnecessary. This is IP specific logic and the index is expected to exist. See this entry in smu_v13_0_6_message_map MSG_MAP(GfxDeviceDriverReset,PPSMC_MSG_GfxDriverReset, SMU_MSG_RAS_PRI), Thanks, Lijo > mutex_lock(>message_lock); >
Re: [PATCH 3/5] drm/amdgpu: Fix null pointer dereference to aca_handle
On 5/13/2024 2:26 PM, Ma Jun wrote: > Check handle pointer before using it > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c > index 28febf33fb1b..e969a7d77b4d 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c > @@ -279,7 +279,7 @@ static struct aca_bank_error *get_bank_error(struct > aca_error *aerr, struct aca_ > int aca_error_cache_log_bank_error(struct aca_handle *handle, struct > aca_bank_info *info, > enum aca_error_type type, u64 count) > { > - struct aca_error_cache *error_cache = >error_cache; > + struct aca_error_cache *error_cache; > struct aca_bank_error *bank_error; > struct aca_error *aerr; > > @@ -289,6 +289,10 @@ int aca_error_cache_log_bank_error(struct aca_handle > *handle, struct aca_bank_in > if (!count) > return 0; > > + error_cache = >error_cache; > + if (!error_cache) > + return -EINVAL; Similar as in patch 2. error_cache is not a pointer variable. struct aca_error_cache error_cache; Thanks, Lijo > + > aerr = _cache->errors[type]; > bank_error = get_bank_error(aerr, info); > if (!bank_error)
Re: [PATCH 2/5] drm/amdgpu: Fix the null pointer dereference to ras_manager
On 5/13/2024 2:26 PM, Ma Jun wrote: > Check ras_manager before using it > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 9 +++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > index 1dd13ed3b7b5..6da02a209890 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > @@ -2172,12 +2172,17 @@ static void > amdgpu_ras_interrupt_process_handler(struct work_struct *work) > int amdgpu_ras_interrupt_dispatch(struct amdgpu_device *adev, > struct ras_dispatch_if *info) > { > - struct ras_manager *obj = amdgpu_ras_find_obj(adev, >head); > - struct ras_ih_data *data = >ih_data; > + struct ras_manager *obj; > + struct ras_ih_data *data; > > + obj = amdgpu_ras_find_obj(adev, >head); > if (!obj) > return -EINVAL; > > + data = >ih_data; > + if (!data) > + return -EINVAL; This check is not needed. ih_data is declared as below in ras_manager. struct ras_ih_data ih_data; Thanks, Lijo > + > if (data->inuse == 0) > return 0; >
[PATCH v3 10/10] Documentation/amdgpu: Add PM policy documentation
Add documentation about the newly added pm_policy node in sysfs. Signed-off-by: Lijo Lazar --- Documentation/gpu/amdgpu/thermal.rst | 6 drivers/gpu/drm/amd/pm/amdgpu_pm.c | 48 2 files changed, 54 insertions(+) diff --git a/Documentation/gpu/amdgpu/thermal.rst b/Documentation/gpu/amdgpu/thermal.rst index 2f6166f81e6a..6d942b5c58f0 100644 --- a/Documentation/gpu/amdgpu/thermal.rst +++ b/Documentation/gpu/amdgpu/thermal.rst @@ -49,6 +49,12 @@ pp_power_profile_mode .. kernel-doc:: drivers/gpu/drm/amd/pm/amdgpu_pm.c :doc: pp_power_profile_mode +pm_policy +- + +.. kernel-doc:: drivers/gpu/drm/amd/pm/amdgpu_pm.c + :doc: pm_policy + \*_busy_percent --- diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c b/drivers/gpu/drm/amd/pm/amdgpu_pm.c index 9cca4716ec43..45766d49f1f2 100644 --- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c +++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c @@ -2214,6 +2214,54 @@ static int pp_dpm_clk_default_attr_update(struct amdgpu_device *adev, struct amd return 0; } +/** + * DOC: pm_policy + * + * Certain SOCs can support different power policies to optimize application + * performance. However, this policy is provided only at SOC level and not at a + * per-process level. This is useful especially when entire SOC is utilized for + * dedicated workload. + * + * The amdgpu driver provides a sysfs API for selecting the policy. Presently, + * only two types of policies are supported through this interface. + * + * Pstate Policy Selection - This is to select different Pstate profiles which + * decides clock/throttling preferences. + * + * XGMI PLPD Policy Selection - When multiple devices are connected over XGMI, + * this helps to select policy to be applied for per link power down. + * + * The list of available policies and policy levels vary between SOCs. They can + * be viewed by reading the file. The policy level which is applied presently is + * denoted by * (asterisk). E.g., + * + * .. code-block:: console + * + * cat /sys/bus/pci/devices/.../pm_policy + * soc pstate + * 0 : soc_pstate_default + * 1 : soc_pstate_0 + * 2 : soc_pstate_1* + * 3 : soc_pstate_2 + * xgmi plpd + * 0 : plpd_disallow + * 1 : plpd_default + * 2 : plpd_optimized* + * + * To apply a specific policy + * + * "echo > /sys/bus/pci/devices/.../pm_policy" + * + * For the levels listed in the example above, to select "plpd_optimized" for + * XGMI and "soc_pstate_2" for soc pstate policy - + * + * .. code-block:: console + * + * echo "xgmi 2" > /sys/bus/pci/devices/.../pm_policy + * echo "soc_pstate 3" > /sys/bus/pci/devices/.../pm_policy + * + */ + static ssize_t amdgpu_get_pm_policy(struct device *dev, struct device_attribute *attr, char *buf) { -- 2.25.1
[PATCH v3 06/10] drm/amd/pm: Add xgmi plpd to aldebaran pm_policy
On aldebaran, allow changing xgmi plpd policy through pm_policy sysfs interface. Signed-off-by: Lijo Lazar Reviewed-by: Hawking Zhang --- .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 35 +++ 1 file changed, 35 insertions(+) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c index a22eb6bbb05e..66d386ef1da9 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c @@ -266,9 +266,30 @@ static int aldebaran_tables_init(struct smu_context *smu) return 0; } +static int aldebaran_select_plpd_policy(struct smu_context *smu, int level) +{ + struct amdgpu_device *adev = smu->adev; + + /* The message only works on master die and NACK will be sent + back for other dies, only send it on master die */ + if (adev->smuio.funcs->get_socket_id(adev) || + adev->smuio.funcs->get_die_id(adev)) + return 0; + + if (level == XGMI_PLPD_DEFAULT) + return smu_cmn_send_smc_msg_with_param( + smu, SMU_MSG_GmiPwrDnControl, 0, NULL); + else if (level == XGMI_PLPD_DISALLOW) + return smu_cmn_send_smc_msg_with_param( + smu, SMU_MSG_GmiPwrDnControl, 1, NULL); + else + return -EINVAL; +} + static int aldebaran_allocate_dpm_context(struct smu_context *smu) { struct smu_dpm_context *smu_dpm = >smu_dpm; + struct smu_dpm_policy *policy; smu_dpm->dpm_context = kzalloc(sizeof(struct smu_13_0_dpm_context), GFP_KERNEL); @@ -276,6 +297,20 @@ static int aldebaran_allocate_dpm_context(struct smu_context *smu) return -ENOMEM; smu_dpm->dpm_context_size = sizeof(struct smu_13_0_dpm_context); + smu_dpm->dpm_policies = + kzalloc(sizeof(struct smu_dpm_policy_ctxt), GFP_KERNEL); + + if (!smu_dpm->dpm_policies) + return -ENOMEM; + + policy = &(smu_dpm->dpm_policies->policies[0]); + policy->policy_type = PP_PM_POLICY_XGMI_PLPD; + policy->level_mask = BIT(XGMI_PLPD_DISALLOW) | BIT(XGMI_PLPD_DEFAULT); + policy->current_level = XGMI_PLPD_DEFAULT; + policy->set_policy = aldebaran_select_plpd_policy; + smu_cmn_generic_plpd_policy_desc(policy); + smu_dpm->dpm_policies->policy_mask |= BIT(PP_PM_POLICY_XGMI_PLPD); + return 0; } -- 2.25.1
[PATCH v3 09/10] drm/amd/pm: Remove unused interface to set plpd
Remove unused callback to set PLPD policy and its implementation from arcturus, aldebaran and SMUv13.0.6 SOCs. Signed-off-by: Lijo Lazar Reviewed-by: Hawking Zhang --- drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 6 --- .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 22 --- .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 24 .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 39 --- 4 files changed, 91 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h index e460fa9aaf23..dd9356c7428e 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h +++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h @@ -876,12 +876,6 @@ struct pptable_funcs { */ int (*set_df_cstate)(struct smu_context *smu, enum pp_df_cstate state); - /** -* @select_xgmi_plpd_policy: Select xgmi per-link power down policy. -*/ - int (*select_xgmi_plpd_policy)(struct smu_context *smu, - enum pp_xgmi_plpd_mode mode); - /** * @update_pcie_parameters: Update and upload the system's PCIe * capabilites to the SMU. diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c index 84f7d4139bda..c0f6b59369b7 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c @@ -,27 +,6 @@ static int arcturus_set_df_cstate(struct smu_context *smu, return smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_DFCstateControl, state, NULL); } -static int arcturus_select_xgmi_plpd_policy(struct smu_context *smu, - enum pp_xgmi_plpd_mode mode) -{ - /* PPSMC_MSG_GmiPwrDnControl is supported by 54.23.0 and onwards */ - if (smu->smc_fw_version < 0x00361700) { - dev_err(smu->adev->dev, "XGMI power down control is only supported by PMFW 54.23.0 and onwards\n"); - return -EINVAL; - } - - if (mode == XGMI_PLPD_DEFAULT) - return smu_cmn_send_smc_msg_with_param(smu, - SMU_MSG_GmiPwrDnControl, - 1, NULL); - else if (mode == XGMI_PLPD_DISALLOW) - return smu_cmn_send_smc_msg_with_param(smu, - SMU_MSG_GmiPwrDnControl, - 0, NULL); - else - return -EINVAL; -} - static const struct throttling_logging_label { uint32_t feature_mask; const char *label; @@ -2440,7 +2419,6 @@ static const struct pptable_funcs arcturus_ppt_funcs = { .get_dpm_ultimate_freq = smu_v11_0_get_dpm_ultimate_freq, .set_soft_freq_limited_range = smu_v11_0_set_soft_freq_limited_range, .set_df_cstate = arcturus_set_df_cstate, - .select_xgmi_plpd_policy = arcturus_select_xgmi_plpd_policy, .log_thermal_throttling_event = arcturus_log_thermal_throttling_event, .get_pp_feature_mask = smu_cmn_get_pp_feature_mask, .set_pp_feature_mask = smu_cmn_set_pp_feature_mask, diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c index 66d386ef1da9..e584e53e3760 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c @@ -1642,29 +1642,6 @@ static int aldebaran_set_df_cstate(struct smu_context *smu, return smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_DFCstateControl, state, NULL); } -static int aldebaran_select_xgmi_plpd_policy(struct smu_context *smu, -enum pp_xgmi_plpd_mode mode) -{ - struct amdgpu_device *adev = smu->adev; - - /* The message only works on master die and NACK will be sent - back for other dies, only send it on master die */ - if (adev->smuio.funcs->get_socket_id(adev) || - adev->smuio.funcs->get_die_id(adev)) - return 0; - - if (mode == XGMI_PLPD_DEFAULT) - return smu_cmn_send_smc_msg_with_param(smu, - SMU_MSG_GmiPwrDnControl, - 0, NULL); - else if (mode == XGMI_PLPD_DISALLOW) - return smu_cmn_send_smc_msg_with_param(smu, - SMU_MSG_GmiPwrDnControl, - 1, NULL); - else - return -EINVAL; -} - static const struct throttling_logging_label { uint32_t feature_mask; const char *label; @@ -2104,7 +2081,6 @@ static const struct pptable_funcs aldebaran_ppt_funcs = { .set_soft_freq_limited_range =
[PATCH v3 08/10] drm/amd/pm: Remove legacy interface for xgmi plpd
Replace the legacy interface with amdgpu_dpm_set_pm_policy to set XGMI PLPD mode. Also, xgmi_plpd sysfs node is not used by any client. Remove that as well. Signed-off-by: Lijo Lazar Reviewed-by: Hawking Zhang --- v2: No change v3: Rebase to remove device_attr_id__xgmi_plpd_policy drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 +- drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 43 drivers/gpu/drm/amd/pm/amdgpu_pm.c| 68 --- drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h | 5 -- drivers/gpu/drm/amd/pm/inc/amdgpu_pm.h| 1 - drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 27 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 2 - 7 files changed, 2 insertions(+), 148 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c index 4a14f9c1bfe8..821ba2309dec 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c @@ -1446,7 +1446,7 @@ static int amdgpu_ras_error_inject_xgmi(struct amdgpu_device *adev, if (amdgpu_dpm_set_df_cstate(adev, DF_CSTATE_DISALLOW)) dev_warn(adev->dev, "Failed to disallow df cstate"); - ret1 = amdgpu_dpm_set_xgmi_plpd_mode(adev, XGMI_PLPD_DISALLOW); + ret1 = amdgpu_dpm_set_pm_policy(adev, PP_PM_POLICY_XGMI_PLPD, XGMI_PLPD_DISALLOW); if (ret1 && ret1 != -EOPNOTSUPP) dev_warn(adev->dev, "Failed to disallow XGMI power down"); @@ -1455,7 +1455,7 @@ static int amdgpu_ras_error_inject_xgmi(struct amdgpu_device *adev, if (amdgpu_ras_intr_triggered()) return ret2; - ret1 = amdgpu_dpm_set_xgmi_plpd_mode(adev, XGMI_PLPD_DEFAULT); + ret1 = amdgpu_dpm_set_pm_policy(adev, PP_PM_POLICY_XGMI_PLPD, XGMI_PLPD_DEFAULT); if (ret1 && ret1 != -EOPNOTSUPP) dev_warn(adev->dev, "Failed to allow XGMI power down"); diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c index b443906484e7..cd169af35399 100644 --- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c +++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c @@ -368,49 +368,6 @@ int amdgpu_dpm_set_df_cstate(struct amdgpu_device *adev, return ret; } -int amdgpu_dpm_get_xgmi_plpd_mode(struct amdgpu_device *adev, char **mode_desc) -{ - struct smu_context *smu = adev->powerplay.pp_handle; - int mode = XGMI_PLPD_NONE; - - if (is_support_sw_smu(adev)) { - mode = smu->plpd_mode; - if (mode_desc == NULL) - return mode; - switch (smu->plpd_mode) { - case XGMI_PLPD_DISALLOW: - *mode_desc = "disallow"; - break; - case XGMI_PLPD_DEFAULT: - *mode_desc = "default"; - break; - case XGMI_PLPD_OPTIMIZED: - *mode_desc = "optimized"; - break; - case XGMI_PLPD_NONE: - default: - *mode_desc = "none"; - break; - } - } - - return mode; -} - -int amdgpu_dpm_set_xgmi_plpd_mode(struct amdgpu_device *adev, int mode) -{ - struct smu_context *smu = adev->powerplay.pp_handle; - int ret = -EOPNOTSUPP; - - if (is_support_sw_smu(adev)) { - mutex_lock(>pm.mutex); - ret = smu_set_xgmi_plpd_mode(smu, mode); - mutex_unlock(>pm.mutex); - } - - return ret; -} - ssize_t amdgpu_dpm_get_pm_policy_info(struct amdgpu_device *adev, char *buf) { struct smu_context *smu = adev->powerplay.pp_handle; diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c b/drivers/gpu/drm/amd/pm/amdgpu_pm.c index 48da662da005..9cca4716ec43 100644 --- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c +++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c @@ -2214,70 +2214,6 @@ static int pp_dpm_clk_default_attr_update(struct amdgpu_device *adev, struct amd return 0; } -/* Following items will be read out to indicate current plpd policy: - * - -1: none - * - 0: disallow - * - 1: default - * - 2: optimized - */ -static ssize_t amdgpu_get_xgmi_plpd_policy(struct device *dev, - struct device_attribute *attr, - char *buf) -{ - struct drm_device *ddev = dev_get_drvdata(dev); - struct amdgpu_device *adev = drm_to_adev(ddev); - char *mode_desc = "none"; - int mode; - - if (amdgpu_in_reset(adev)) - return -EPERM; - if (adev->in_suspend && !adev->in_runpm) - return -EPERM; - - mode = amdgpu_dpm_get_xgmi_plpd_mode(adev, _desc); - - return sysfs_emit(buf, "%d: %s\n", mode, mode_desc); -} - -/* Following argument value is expected from user to change plpd policy - * - arg 0: disallow plpd - * - arg 1: default policy - * - arg 2: optimized policy - */
[PATCH v3 05/10] drm/amd/pm: Add xgmi plpd to SMU v13.0.6 pm_policy
On SOCs with SMU v13.0.6, allow changing xgmi plpd policy through pm_policy sysfs interface. Signed-off-by: Lijo Lazar Reviewed-by: Hawking Zhang --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 19 +-- .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 51 +-- drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c| 27 ++ drivers/gpu/drm/amd/pm/swsmu/smu_cmn.h| 1 + 4 files changed, 91 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c index df9ff377ebfd..a8116ae4472a 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c @@ -1197,6 +1197,9 @@ static void smu_swctf_delayed_work_handler(struct work_struct *work) static void smu_init_xgmi_plpd_mode(struct smu_context *smu) { + struct smu_dpm_context *dpm_ctxt = &(smu->smu_dpm); + struct smu_dpm_policy_ctxt *policy_ctxt; + if (amdgpu_ip_version(smu->adev, MP1_HWIP, 0) == IP_VERSION(11, 0, 2)) { smu->plpd_mode = XGMI_PLPD_DEFAULT; return; @@ -1204,10 +1207,20 @@ static void smu_init_xgmi_plpd_mode(struct smu_context *smu) /* PMFW put PLPD into default policy after enabling the feature */ if (smu_feature_is_enabled(smu, - SMU_FEATURE_XGMI_PER_LINK_PWR_DWN_BIT)) + SMU_FEATURE_XGMI_PER_LINK_PWR_DWN_BIT)) { + struct smu_dpm_policy *policy; + smu->plpd_mode = XGMI_PLPD_DEFAULT; - else + policy = smu_get_pm_policy(smu, PP_PM_POLICY_XGMI_PLPD); + if (policy) + policy->current_level = XGMI_PLPD_DEFAULT; + } else { smu->plpd_mode = XGMI_PLPD_NONE; + policy_ctxt = dpm_ctxt->dpm_policies; + if (policy_ctxt) + policy_ctxt->policy_mask &= + ~BIT(PP_PM_POLICY_XGMI_PLPD); + } } static int smu_sw_init(void *handle) @@ -3555,7 +3568,7 @@ struct smu_dpm_policy *smu_get_pm_policy(struct smu_context *smu, policy_ctxt = dpm_ctxt->dpm_policies; if (!policy_ctxt) return NULL; - + for_each_set_bit(i, _ctxt->policy_mask, PP_PM_POLICY_NUM) { if (policy_ctxt->policies[i].policy_type == p_type) return _ctxt->policies[i]; diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c index 0ed0b5326d35..173c5599279b 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c @@ -403,9 +403,45 @@ static int smu_v13_0_6_select_policy_soc_pstate(struct smu_context *smu, return ret; } +static int smu_v13_0_6_select_plpd_policy(struct smu_context *smu, int level) +{ + struct amdgpu_device *adev = smu->adev; + int ret, param; + + switch (level) { + case XGMI_PLPD_DEFAULT: + param = PPSMC_PLPD_MODE_DEFAULT; + break; + case XGMI_PLPD_OPTIMIZED: + param = PPSMC_PLPD_MODE_OPTIMIZED; + break; + case XGMI_PLPD_DISALLOW: + param = 0; + break; + default: + return -EINVAL; + } + + if (level == XGMI_PLPD_DISALLOW) + ret = smu_cmn_send_smc_msg_with_param( + smu, SMU_MSG_GmiPwrDnControl, param, NULL); + else + /* change xgmi per-link power down policy */ + ret = smu_cmn_send_smc_msg_with_param( + smu, SMU_MSG_SelectPLPDMode, param, NULL); + + if (ret) + dev_err(adev->dev, + "select xgmi per-link power down policy %d failed\n", + level); + + return ret; +} + static int smu_v13_0_6_allocate_dpm_context(struct smu_context *smu) { struct smu_dpm_context *smu_dpm = >smu_dpm; + struct smu_dpm_policy *policy; smu_dpm->dpm_context = kzalloc(sizeof(struct smu_13_0_dpm_context), GFP_KERNEL); @@ -413,11 +449,9 @@ static int smu_v13_0_6_allocate_dpm_context(struct smu_context *smu) return -ENOMEM; smu_dpm->dpm_context_size = sizeof(struct smu_13_0_dpm_context); + smu_dpm->dpm_policies = + kzalloc(sizeof(struct smu_dpm_policy_ctxt), GFP_KERNEL); if (!(smu->adev->flags & AMD_IS_APU)) { - struct smu_dpm_policy *policy; - - smu_dpm->dpm_policies = - kzalloc(sizeof(struct smu_dpm_policy_ctxt), GFP_KERNEL); policy = &(smu_dpm->dpm_policies->policies[0]); policy->policy_type = PP_PM_POLICY_SOC_PSTATE; @@ -430,6 +464,15 @@ static int smu_v13_0_6_allocate_dpm_context(struct smu_context *smu)
[PATCH v3 04/10] drm/amd/pm: Add xgmi plpd policy to pm_policy
Add support to set XGMI PLPD policy levels through pm_policy sysfs node. Signed-off-by: Lijo Lazar Reviewed-by: Hawking Zhang --- drivers/gpu/drm/amd/include/kgd_pp_interface.h | 1 + drivers/gpu/drm/amd/pm/amdgpu_pm.c | 3 +++ 2 files changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h b/drivers/gpu/drm/amd/include/kgd_pp_interface.h index 8ed9aa9a990d..4b20e2274313 100644 --- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h +++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h @@ -276,6 +276,7 @@ enum pp_xgmi_plpd_mode { enum pp_pm_policy { PP_PM_POLICY_NONE = -1, PP_PM_POLICY_SOC_PSTATE = 0, + PP_PM_POLICY_XGMI_PLPD, PP_PM_POLICY_NUM, }; diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c b/drivers/gpu/drm/amd/pm/amdgpu_pm.c index d62f1d1d6c84..48da662da005 100644 --- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c +++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c @@ -2318,6 +2318,9 @@ static ssize_t amdgpu_set_pm_policy(struct device *dev, if (strncmp(tmp, "soc_pstate", strlen("soc_pstate")) == 0) { policy_type = PP_PM_POLICY_SOC_PSTATE; tmp += strlen("soc_pstate"); + } else if (strncmp(tmp, "xgmi", strlen("xgmi")) == 0) { + policy_type = PP_PM_POLICY_XGMI_PLPD; + tmp += strlen("xgmi"); } else { return -EINVAL; } -- 2.25.1
[PATCH v3 07/10] drm/amd/pm: Add xgmi plpd to arcturus pm_policy
On arcturus, allow changing xgmi plpd policy through pm_policy sysfs interface. Signed-off-by: Lijo Lazar Reviewed-by: Hawking Zhang --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 7 ++-- .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 42 +++ 2 files changed, 46 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c index a8116ae4472a..a395967905d2 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c @@ -1199,19 +1199,20 @@ static void smu_init_xgmi_plpd_mode(struct smu_context *smu) { struct smu_dpm_context *dpm_ctxt = &(smu->smu_dpm); struct smu_dpm_policy_ctxt *policy_ctxt; + struct smu_dpm_policy *policy; + policy = smu_get_pm_policy(smu, PP_PM_POLICY_XGMI_PLPD); if (amdgpu_ip_version(smu->adev, MP1_HWIP, 0) == IP_VERSION(11, 0, 2)) { smu->plpd_mode = XGMI_PLPD_DEFAULT; + if (policy) + policy->current_level = XGMI_PLPD_DEFAULT; return; } /* PMFW put PLPD into default policy after enabling the feature */ if (smu_feature_is_enabled(smu, SMU_FEATURE_XGMI_PER_LINK_PWR_DWN_BIT)) { - struct smu_dpm_policy *policy; - smu->plpd_mode = XGMI_PLPD_DEFAULT; - policy = smu_get_pm_policy(smu, PP_PM_POLICY_XGMI_PLPD); if (policy) policy->current_level = XGMI_PLPD_DEFAULT; } else { diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c index 623f6052f97e..84f7d4139bda 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c @@ -283,9 +283,29 @@ static int arcturus_tables_init(struct smu_context *smu) return 0; } +static int arcturus_select_plpd_policy(struct smu_context *smu, int level) +{ + /* PPSMC_MSG_GmiPwrDnControl is supported by 54.23.0 and onwards */ + if (smu->smc_fw_version < 0x00361700) { + dev_err(smu->adev->dev, + "XGMI power down control is only supported by PMFW 54.23.0 and onwards\n"); + return -EINVAL; + } + + if (level == XGMI_PLPD_DEFAULT) + return smu_cmn_send_smc_msg_with_param( + smu, SMU_MSG_GmiPwrDnControl, 1, NULL); + else if (level == XGMI_PLPD_DISALLOW) + return smu_cmn_send_smc_msg_with_param( + smu, SMU_MSG_GmiPwrDnControl, 0, NULL); + else + return -EINVAL; +} + static int arcturus_allocate_dpm_context(struct smu_context *smu) { struct smu_dpm_context *smu_dpm = >smu_dpm; + struct smu_dpm_policy *policy; smu_dpm->dpm_context = kzalloc(sizeof(struct smu_11_0_dpm_context), GFP_KERNEL); @@ -293,6 +313,20 @@ static int arcturus_allocate_dpm_context(struct smu_context *smu) return -ENOMEM; smu_dpm->dpm_context_size = sizeof(struct smu_11_0_dpm_context); + smu_dpm->dpm_policies = + kzalloc(sizeof(struct smu_dpm_policy_ctxt), GFP_KERNEL); + + if (!smu_dpm->dpm_policies) + return -ENOMEM; + + policy = &(smu_dpm->dpm_policies->policies[0]); + policy->policy_type = PP_PM_POLICY_XGMI_PLPD; + policy->level_mask = BIT(XGMI_PLPD_DISALLOW) | BIT(XGMI_PLPD_DEFAULT); + policy->current_level = XGMI_PLPD_DEFAULT; + policy->set_policy = arcturus_select_plpd_policy; + smu_cmn_generic_plpd_policy_desc(policy); + smu_dpm->dpm_policies->policy_mask |= BIT(PP_PM_POLICY_XGMI_PLPD); + return 0; } @@ -403,6 +437,14 @@ static int arcturus_set_default_dpm_table(struct smu_context *smu) dpm_table->max = dpm_table->dpm_levels[0].value; } + /* XGMI PLPD is supported by 54.23.0 and onwards */ + if (smu->smc_fw_version < 0x00361700) { + struct smu_dpm_context *smu_dpm = >smu_dpm; + + smu_dpm->dpm_policies->policy_mask &= + ~BIT(PP_PM_POLICY_XGMI_PLPD); + } + return 0; } -- 2.25.1
[PATCH v3 03/10] drm/amd/pm: Add support to select pstate policy
Add support to select pstate policy in SOCs with SMUv13.0.6 Signed-off-by: Lijo Lazar Reviewed-by: Hawking Zhang --- .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c| 2 + .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 71 +++ drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c| 30 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.h| 1 + 4 files changed, 104 insertions(+) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c index 9c0445fa9f9b..3a50076e44f0 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c @@ -531,10 +531,12 @@ int smu_v13_0_fini_smc_tables(struct smu_context *smu) smu_table->watermarks_table = NULL; smu_table->metrics_time = 0; + kfree(smu_dpm->dpm_policies); kfree(smu_dpm->dpm_context); kfree(smu_dpm->golden_dpm_context); kfree(smu_dpm->dpm_current_power_state); kfree(smu_dpm->dpm_request_power_state); + smu_dpm->dpm_policies = NULL; smu_dpm->dpm_context = NULL; smu_dpm->golden_dpm_context = NULL; smu_dpm->dpm_context_size = 0; diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c index ed9c4866b6e4..0ed0b5326d35 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c @@ -174,6 +174,7 @@ static const struct cmn2asic_msg_mapping smu_v13_0_6_message_map[SMU_MSG_MAX_COU MSG_MAP(McaBankCeDumpDW, PPSMC_MSG_McaBankCeDumpDW, SMU_MSG_RAS_PRI), MSG_MAP(SelectPLPDMode, PPSMC_MSG_SelectPLPDMode, 0), MSG_MAP(RmaDueToBadPageThreshold, PPSMC_MSG_RmaDueToBadPageThreshold,0), + MSG_MAP(SelectPstatePolicy, PPSMC_MSG_SelectPstatePolicy, 0), }; // clang-format on @@ -369,6 +370,39 @@ static int smu_v13_0_6_tables_init(struct smu_context *smu) return 0; } +static int smu_v13_0_6_select_policy_soc_pstate(struct smu_context *smu, + int policy) +{ + struct amdgpu_device *adev = smu->adev; + int ret, param; + + switch (policy) { + case SOC_PSTATE_DEFAULT: + param = 0; + break; + case SOC_PSTATE_0: + param = 1; + break; + case SOC_PSTATE_1: + param = 2; + break; + case SOC_PSTATE_2: + param = 3; + break; + default: + return -EINVAL; + } + + ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SelectPstatePolicy, + param, NULL); + + if (ret) + dev_err(adev->dev, "select soc pstate policy %d failed", + policy); + + return ret; +} + static int smu_v13_0_6_allocate_dpm_context(struct smu_context *smu) { struct smu_dpm_context *smu_dpm = >smu_dpm; @@ -379,6 +413,24 @@ static int smu_v13_0_6_allocate_dpm_context(struct smu_context *smu) return -ENOMEM; smu_dpm->dpm_context_size = sizeof(struct smu_13_0_dpm_context); + if (!(smu->adev->flags & AMD_IS_APU)) { + struct smu_dpm_policy *policy; + + smu_dpm->dpm_policies = + kzalloc(sizeof(struct smu_dpm_policy_ctxt), GFP_KERNEL); + policy = &(smu_dpm->dpm_policies->policies[0]); + + policy->policy_type = PP_PM_POLICY_SOC_PSTATE; + policy->level_mask = BIT(SOC_PSTATE_DEFAULT) | +BIT(SOC_PSTATE_0) | BIT(SOC_PSTATE_1) | +BIT(SOC_PSTATE_2); + policy->current_level = SOC_PSTATE_DEFAULT; + policy->set_policy = smu_v13_0_6_select_policy_soc_pstate; + smu_cmn_generic_soc_policy_desc(policy); + smu_dpm->dpm_policies->policy_mask |= + BIT(PP_PM_POLICY_SOC_PSTATE); + } + return 0; } @@ -639,6 +691,15 @@ static int smu_v13_0_6_get_dpm_level_count(struct smu_context *smu, return ret; } +static void smu_v13_0_6_pm_policy_init(struct smu_context *smu) +{ + struct smu_dpm_policy *policy; + + policy = smu_get_pm_policy(smu, PP_PM_POLICY_SOC_PSTATE); + if (policy) + policy->current_level = SOC_PSTATE_DEFAULT; +} + static int smu_v13_0_6_set_default_dpm_table(struct smu_context *smu) { struct smu_13_0_dpm_context *dpm_context = smu->smu_dpm.dpm_context; @@ -668,6 +729,16 @@ static int smu_v13_0_6_set_default_dpm_table(struct smu_context *smu) smu_v13_0_6_setup_driver_pptable(smu); + /* DPM policy not supported in older firmwares */ + if (!(smu->adev->flags & AMD_IS_APU) &&
[PATCH v3 02/10] drm/amd/pm: Update PMFW messages for SMUv13.0.6
Add PMF message to select a Pstate policy in SOCs with SMU v13.0.6. Signed-off-by: Lijo Lazar Reviewed-by: Hawking Zhang --- drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_ppsmc.h | 3 ++- drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_ppsmc.h b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_ppsmc.h index 86758051cb93..41cb681927e2 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_ppsmc.h +++ b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_ppsmc.h @@ -92,7 +92,8 @@ #define PPSMC_MSG_McaBankCeDumpDW 0x3B #define PPSMC_MSG_SelectPLPDMode0x40 #define PPSMC_MSG_RmaDueToBadPageThreshold 0x43 -#define PPSMC_Message_Count 0x44 +#define PPSMC_MSG_SelectPstatePolicy0x44 +#define PPSMC_Message_Count 0x45 //PPSMC Reset Types for driver msg argument #define PPSMC_RESET_TYPE_DRIVER_MODE_1_RESET0x1 diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h index c48214e3dc8e..dff36bd7a17c 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h +++ b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h @@ -272,7 +272,8 @@ __SMU_DUMMY_MAP(SetSoftMinVpe), \ __SMU_DUMMY_MAP(GetMetricsVersion), \ __SMU_DUMMY_MAP(EnableUCLKShadow), \ - __SMU_DUMMY_MAP(RmaDueToBadPageThreshold), + __SMU_DUMMY_MAP(RmaDueToBadPageThreshold),\ + __SMU_DUMMY_MAP(SelectPstatePolicy), #undef __SMU_DUMMY_MAP #define __SMU_DUMMY_MAP(type) SMU_MSG_##type -- 2.25.1
[PATCH v3 01/10] drm/amd/pm: Add support for DPM policies
Add support to set/get information about different DPM policies. The support is only available on SOCs which use swsmu architecture. A DPM policy type may be defined with different levels. For example, a policy may be defined to select Pstate preference and then later a pstate preference may be chosen. Signed-off-by: Lijo Lazar Reviewed-by: Hawking Zhang --- v2: Add NULL checks before accessing smu_dpm_policy_ctxt v3: Rebase to add device_attr_id__pm_policy .../gpu/drm/amd/include/kgd_pp_interface.h| 16 +++ drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 29 ++ drivers/gpu/drm/amd/pm/amdgpu_pm.c| 92 + drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h | 4 + drivers/gpu/drm/amd/pm/inc/amdgpu_pm.h| 1 + drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 98 +++ drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 29 ++ 7 files changed, 269 insertions(+) diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h b/drivers/gpu/drm/amd/include/kgd_pp_interface.h index 805c9d37a2b4..8ed9aa9a990d 100644 --- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h +++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h @@ -273,6 +273,22 @@ enum pp_xgmi_plpd_mode { XGMI_PLPD_COUNT, }; +enum pp_pm_policy { + PP_PM_POLICY_NONE = -1, + PP_PM_POLICY_SOC_PSTATE = 0, + PP_PM_POLICY_NUM, +}; + +enum pp_policy_soc_pstate { + SOC_PSTATE_DEFAULT = 0, + SOC_PSTATE_0, + SOC_PSTATE_1, + SOC_PSTATE_2, + SOC_PSTAT_COUNT, +}; + +#define PP_POLICY_MAX_LEVELS 5 + #define PP_GROUP_MASK0xF000 #define PP_GROUP_SHIFT 28 diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c index eee919577b44..b443906484e7 100644 --- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c +++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c @@ -411,6 +411,35 @@ int amdgpu_dpm_set_xgmi_plpd_mode(struct amdgpu_device *adev, int mode) return ret; } +ssize_t amdgpu_dpm_get_pm_policy_info(struct amdgpu_device *adev, char *buf) +{ + struct smu_context *smu = adev->powerplay.pp_handle; + int ret = -EOPNOTSUPP; + + if (is_support_sw_smu(adev)) { + mutex_lock(>pm.mutex); + ret = smu_get_pm_policy_info(smu, buf); + mutex_unlock(>pm.mutex); + } + + return ret; +} + +int amdgpu_dpm_set_pm_policy(struct amdgpu_device *adev, int policy_type, +int policy_level) +{ + struct smu_context *smu = adev->powerplay.pp_handle; + int ret = -EOPNOTSUPP; + + if (is_support_sw_smu(adev)) { + mutex_lock(>pm.mutex); + ret = smu_set_pm_policy(smu, policy_type, policy_level); + mutex_unlock(>pm.mutex); + } + + return ret; +} + int amdgpu_dpm_enable_mgpu_fan_boost(struct amdgpu_device *adev) { void *pp_handle = adev->powerplay.pp_handle; diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c b/drivers/gpu/drm/amd/pm/amdgpu_pm.c index 110f2fc31754..d62f1d1d6c84 100644 --- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c +++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c @@ -2278,6 +2278,96 @@ static ssize_t amdgpu_set_xgmi_plpd_policy(struct device *dev, return count; } +static ssize_t amdgpu_get_pm_policy(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct drm_device *ddev = dev_get_drvdata(dev); + struct amdgpu_device *adev = drm_to_adev(ddev); + + if (amdgpu_in_reset(adev)) + return -EPERM; + if (adev->in_suspend && !adev->in_runpm) + return -EPERM; + + return amdgpu_dpm_get_pm_policy_info(adev, buf); +} + +static ssize_t amdgpu_set_pm_policy(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct drm_device *ddev = dev_get_drvdata(dev); + struct amdgpu_device *adev = drm_to_adev(ddev); + int policy_type, ret, num_params = 0; + char delimiter[] = " \n\t"; + char tmp_buf[128]; + char *tmp, *param; + long val; + + if (amdgpu_in_reset(adev)) + return -EPERM; + if (adev->in_suspend && !adev->in_runpm) + return -EPERM; + + count = min(count, sizeof(tmp_buf)); + memcpy(tmp_buf, buf, count); + tmp_buf[count - 1] = '\0'; + tmp = tmp_buf; + + tmp = skip_spaces(tmp); + if (strncmp(tmp, "soc_pstate", strlen("soc_pstate")) == 0) { + policy_type = PP_PM_POLICY_SOC_PSTATE; + tmp += strlen("soc_pstate"); + } else { + return -EINVAL; + } + + tmp = skip_spaces(tmp); + while ((param = strsep(, delimiter))) { + if (!strlen(param)) { + tmp = skip_spaces(tmp); + continue; + } + ret = kstrtol(param, 0, );
[PATCH v3 00/10] Add PM policy interfaces
This series adds APIs to get the supported PM policies and also set them. A PM policy type is a predefined policy type supported by an SOC and each policy may define two or more levels to choose from. A user can select the appropriate level through amdgpu_dpm_set_pm_policy() or through sysfs node pm_policy. Based on the specific PM functional area, multiple PM policies may be defined for an SOC For ex: a policy may be defined to set the right setting for XGMI per link power down feature and another may be defined to select the SOC Pstate preferences. Presently, XGMI PLPD and SOC Pstate policy types are supported. It also removes the legacy sysfs interface to set XGMI PLPD as it is not used any client like SMI tool. v2: Add NULL checks to avoid access on SOCs which don't support any policy. v3: Rebase and add documentation patch Lijo Lazar (10): drm/amd/pm: Add support for DPM policies drm/amd/pm: Update PMFW messages for SMUv13.0.6 drm/amd/pm: Add support to select pstate policy drm/amd/pm: Add xgmi plpd policy to pm_policy drm/amd/pm: Add xgmi plpd to SMU v13.0.6 pm_policy drm/amd/pm: Add xgmi plpd to aldebaran pm_policy drm/amd/pm: Add xgmi plpd to arcturus pm_policy drm/amd/pm: Remove legacy interface for xgmi plpd drm/amd/pm: Remove unused interface to set plpd Documentation/amdgpu: Add PM policy documentation Documentation/gpu/amdgpu/thermal.rst | 6 + drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 +- .../gpu/drm/amd/include/kgd_pp_interface.h| 17 ++ drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 32 ++-- drivers/gpu/drm/amd/pm/amdgpu_pm.c| 133 +++ drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h | 9 +- drivers/gpu/drm/amd/pm/inc/amdgpu_pm.h| 2 +- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 113 +++-- drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 37 - .../pm/swsmu/inc/pmfw_if/smu_v13_0_6_ppsmc.h | 3 +- drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h | 3 +- .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 64 +--- .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 59 --- .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c| 2 + .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 153 +- drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c| 57 +++ drivers/gpu/drm/amd/pm/swsmu/smu_cmn.h| 2 + 17 files changed, 527 insertions(+), 169 deletions(-) -- 2.25.1
RE: [PATCH] drm/amdgpu: add initial value for gfx12 AGP aperture
[AMD Official Use Only - AMD Internal Distribution Only] This patch was Reviewed-by: Likun Gao . Regards, Likun -Original Message- From: Min, Frank Sent: Monday, May 13, 2024 4:56 PM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Koenig, Christian ; Zhang, Hawking ; Gao, Likun Subject: [PATCH] drm/amdgpu: add initial value for gfx12 AGP aperture [AMD Official Use Only - AMD Internal Distribution Only] From: Frank Min add initial value for gfx12 AGP aperture Signed-off-by: Frank Min --- drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c index 34264a33dcdf..b876300bb9f6 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c @@ -622,6 +622,7 @@ static void gmc_v12_0_vram_gtt_location(struct amdgpu_device *adev, base = adev->mmhub.funcs->get_fb_location(adev); + amdgpu_gmc_set_agp_default(adev, mc); amdgpu_gmc_vram_location(adev, >gmc, base); amdgpu_gmc_gart_location(adev, mc, AMDGPU_GART_PLACEMENT_LOW); if (!amdgpu_sriov_vf(adev) && (amdgpu_agp == 1)) -- 2.34.1
[PATCH 4/5] drm/amdgpu: Fix null pointer dereference to bo
Check bo before using it Signed-off-by: Ma Jun --- drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c | 16 +++- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c index 2b7b67916c1d..8269fd6828bf 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c @@ -497,9 +497,8 @@ static void gmc_v12_0_get_vm_pte(struct amdgpu_device *adev, uint64_t *flags) { struct amdgpu_bo *bo = mapping->bo_va->base.bo; - struct amdgpu_device *bo_adev = amdgpu_ttm_adev(bo->tbo.bdev); - bool coherent = bo->flags & AMDGPU_GEM_CREATE_COHERENT; - bool is_system = bo->tbo.resource->mem_type == TTM_PL_SYSTEM; + struct amdgpu_device *bo_adev; + bool coherent, is_system; *flags &= ~AMDGPU_PTE_EXECUTABLE; @@ -515,13 +514,20 @@ static void gmc_v12_0_get_vm_pte(struct amdgpu_device *adev, *flags &= ~AMDGPU_PTE_VALID; } - if (bo && bo->flags & (AMDGPU_GEM_CREATE_COHERENT | + if (!bo) + return; + + if (bo->flags & (AMDGPU_GEM_CREATE_COHERENT | AMDGPU_GEM_CREATE_UNCACHED)) *flags = (*flags & ~AMDGPU_PTE_MTYPE_GFX12_MASK) | AMDGPU_PTE_MTYPE_GFX12(MTYPE_UC); + bo_adev = amdgpu_ttm_adev(bo->tbo.bdev); + coherent = bo->flags & AMDGPU_GEM_CREATE_COHERENT; + is_system = bo->tbo.resource->mem_type == TTM_PL_SYSTEM; + /* WA for HW bug */ - if ((bo && is_system) || ((bo_adev != adev) && coherent)) + if (is_system || ((bo_adev != adev) && coherent)) *flags |= AMDGPU_PTE_MTYPE_GFX12(MTYPE_NC); } -- 2.34.1
[PATCH 5/5] drm/amdgpu: Remove dead code in amdgpu_ras_add_mca_err_addr
Remove dead code in amdgpu_ras_add_mca_err_addr Signed-off-by: Ma Jun --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 13 - 1 file changed, 13 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 6da02a209890..0cf67923c0fc 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -4292,21 +4292,8 @@ static struct ras_err_info *amdgpu_ras_error_get_info(struct ras_err_data *err_d void amdgpu_ras_add_mca_err_addr(struct ras_err_info *err_info, struct ras_err_addr *err_addr) { - struct ras_err_addr *mca_err_addr; - /* This function will be retired. */ return; - mca_err_addr = kzalloc(sizeof(*mca_err_addr), GFP_KERNEL); - if (!mca_err_addr) - return; - - INIT_LIST_HEAD(_err_addr->node); - - mca_err_addr->err_status = err_addr->err_status; - mca_err_addr->err_ipid = err_addr->err_ipid; - mca_err_addr->err_addr = err_addr->err_addr; - - list_add_tail(_err_addr->node, _info->err_addr_list); } void amdgpu_ras_del_mca_err_addr(struct ras_err_info *err_info, struct ras_err_addr *mca_err_addr) -- 2.34.1
[PATCH 3/5] drm/amdgpu: Fix null pointer dereference to aca_handle
Check handle pointer before using it Signed-off-by: Ma Jun --- drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c index 28febf33fb1b..e969a7d77b4d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c @@ -279,7 +279,7 @@ static struct aca_bank_error *get_bank_error(struct aca_error *aerr, struct aca_ int aca_error_cache_log_bank_error(struct aca_handle *handle, struct aca_bank_info *info, enum aca_error_type type, u64 count) { - struct aca_error_cache *error_cache = >error_cache; + struct aca_error_cache *error_cache; struct aca_bank_error *bank_error; struct aca_error *aerr; @@ -289,6 +289,10 @@ int aca_error_cache_log_bank_error(struct aca_handle *handle, struct aca_bank_in if (!count) return 0; + error_cache = >error_cache; + if (!error_cache) + return -EINVAL; + aerr = _cache->errors[type]; bank_error = get_bank_error(aerr, info); if (!bank_error) -- 2.34.1
[PATCH 2/5] drm/amdgpu: Fix the null pointer dereference to ras_manager
Check ras_manager before using it Signed-off-by: Ma Jun --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 1dd13ed3b7b5..6da02a209890 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -2172,12 +2172,17 @@ static void amdgpu_ras_interrupt_process_handler(struct work_struct *work) int amdgpu_ras_interrupt_dispatch(struct amdgpu_device *adev, struct ras_dispatch_if *info) { - struct ras_manager *obj = amdgpu_ras_find_obj(adev, >head); - struct ras_ih_data *data = >ih_data; + struct ras_manager *obj; + struct ras_ih_data *data; + obj = amdgpu_ras_find_obj(adev, >head); if (!obj) return -EINVAL; + data = >ih_data; + if (!data) + return -EINVAL; + if (data->inuse == 0) return 0; -- 2.34.1
[PATCH 1/5] drm/amdgpu/pm: Fix the null pointer dereference for smu7
optimize the code to avoid pass a null pointer (hwmgr->backend) to function smu7_update_edc_leakage_table. Signed-off-by: Ma Jun --- .../drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c | 50 +-- 1 file changed, 24 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c index 123af237878f..632a25957477 100644 --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c @@ -2957,6 +2957,7 @@ static int smu7_update_edc_leakage_table(struct pp_hwmgr *hwmgr) static int smu7_hwmgr_backend_init(struct pp_hwmgr *hwmgr) { + struct amdgpu_device *adev = hwmgr->adev; struct smu7_hwmgr *data; int result = 0; @@ -2993,40 +2994,37 @@ static int smu7_hwmgr_backend_init(struct pp_hwmgr *hwmgr) /* Initalize Dynamic State Adjustment Rule Settings */ result = phm_initializa_dynamic_state_adjustment_rule_settings(hwmgr); - if (0 == result) { - struct amdgpu_device *adev = hwmgr->adev; + if (result) + goto fail; - data->is_tlu_enabled = false; + data->is_tlu_enabled = false; - hwmgr->platform_descriptor.hardwareActivityPerformanceLevels = + hwmgr->platform_descriptor.hardwareActivityPerformanceLevels = SMU7_MAX_HARDWARE_POWERLEVELS; - hwmgr->platform_descriptor.hardwarePerformanceLevels = 2; - hwmgr->platform_descriptor.minimumClocksReductionPercentage = 50; + hwmgr->platform_descriptor.hardwarePerformanceLevels = 2; + hwmgr->platform_descriptor.minimumClocksReductionPercentage = 50; - data->pcie_gen_cap = adev->pm.pcie_gen_mask; - if (data->pcie_gen_cap & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3) - data->pcie_spc_cap = 20; - else - data->pcie_spc_cap = 16; - data->pcie_lane_cap = adev->pm.pcie_mlw_mask; - - hwmgr->platform_descriptor.vbiosInterruptId = 0x2400; /* IRQ_SOURCE1_SW_INT */ -/* The true clock step depends on the frequency, typically 4.5 or 9 MHz. Here we use 5. */ - hwmgr->platform_descriptor.clockStep.engineClock = 500; - hwmgr->platform_descriptor.clockStep.memoryClock = 500; - smu7_thermal_parameter_init(hwmgr); - } else { - /* Ignore return value in here, we are cleaning up a mess. */ - smu7_hwmgr_backend_fini(hwmgr); - } + data->pcie_gen_cap = adev->pm.pcie_gen_mask; + if (data->pcie_gen_cap & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3) + data->pcie_spc_cap = 20; + else + data->pcie_spc_cap = 16; + data->pcie_lane_cap = adev->pm.pcie_mlw_mask; + + hwmgr->platform_descriptor.vbiosInterruptId = 0x2400; /* IRQ_SOURCE1_SW_INT */ + /* The true clock step depends on the frequency, typically 4.5 or 9 MHz. Here we use 5. */ + hwmgr->platform_descriptor.clockStep.engineClock = 500; + hwmgr->platform_descriptor.clockStep.memoryClock = 500; + smu7_thermal_parameter_init(hwmgr); result = smu7_update_edc_leakage_table(hwmgr); - if (result) { - smu7_hwmgr_backend_fini(hwmgr); - return result; - } + if (result) + goto fail; return 0; +fail: + smu7_hwmgr_backend_fini(hwmgr); + return result; } static int smu7_force_dpm_highest(struct pp_hwmgr *hwmgr) -- 2.34.1
[PATCH] drm/amdgpu: add initial value for gfx12 AGP aperture
[AMD Official Use Only - AMD Internal Distribution Only] From: Frank Min add initial value for gfx12 AGP aperture Signed-off-by: Frank Min --- drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c index 34264a33dcdf..b876300bb9f6 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c @@ -622,6 +622,7 @@ static void gmc_v12_0_vram_gtt_location(struct amdgpu_device *adev, base = adev->mmhub.funcs->get_fb_location(adev); + amdgpu_gmc_set_agp_default(adev, mc); amdgpu_gmc_vram_location(adev, >gmc, base); amdgpu_gmc_gart_location(adev, mc, AMDGPU_GART_PLACEMENT_LOW); if (!amdgpu_sriov_vf(adev) && (amdgpu_agp == 1)) -- 2.34.1
Re: [PATCH v2 2/2] drm/tests: Add a unit test for range bias allocation
On 12/05/2024 08:59, Arunpravin Paneer Selvam wrote: Allocate cleared blocks in the bias range when the DRM buddy's clear avail is zero. This will validate the bias range allocation in scenarios like system boot when no cleared blocks are available and exercise the fallback path too. The resulting blocks should always be dirty. Signed-off-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/tests/drm_buddy_test.c | 35 ++ 1 file changed, 35 insertions(+) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index e3b50e240d36..a194f271bc55 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -26,6 +26,8 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test) u32 mm_size, ps, bias_size, bias_start, bias_end, bias_rem; DRM_RND_STATE(prng, random_seed); unsigned int i, count, *order; + struct drm_buddy_block *block; + unsigned long flags; struct drm_buddy mm; LIST_HEAD(allocated); @@ -222,6 +224,39 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test) drm_buddy_free_list(, , 0); drm_buddy_fini(); + + /* +* Allocate cleared blocks in the bias range when the DRM buddy's clear avail is +* zero. This will validate the bias range allocation in scenarios like system boot +* when no cleared blocks are available and exercise the fallback path too. The resulting +* blocks should always be dirty. +*/ + + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(, mm_size, ps), + "buddy_init failed\n"); + mm.clear_avail = 0; Should already be zero, right? Maybe make this an assert instead? + + bias_start = round_up(prandom_u32_state() % (mm_size - ps), ps); + bias_end = round_up(bias_start + prandom_u32_state() % (mm_size - bias_start), ps); + bias_end = max(bias_end, bias_start + ps); + bias_rem = bias_end - bias_start; + + flags = DRM_BUDDY_CLEAR_ALLOCATION | DRM_BUDDY_RANGE_ALLOCATION; + u32 size = max(round_up(prandom_u32_state() % bias_rem, ps), ps); u32 declaration should be moved to above? Otherwise, Reviewed-by: Matthew Auld + + KUNIT_ASSERT_FALSE_MSG(test, + drm_buddy_alloc_blocks(, bias_start, + bias_end, size, ps, + , + flags), + "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n", + bias_start, bias_end, size, ps); + + list_for_each_entry(block, , link) + KUNIT_EXPECT_EQ(test, drm_buddy_block_is_clear(block), false); + + drm_buddy_free_list(, , 0); + drm_buddy_fini(); } static void drm_test_buddy_alloc_clear(struct kunit *test)
RE: [PATCH 18/22 V3] drm/amd/pm: check negtive return for table entries
[AMD Official Use Only - AMD Internal Distribution Only] This patch is, Reviewed-by: Tim Huang > -Original Message- > From: Jesse Zhang > Sent: Monday, May 13, 2024 4:06 PM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; Koenig, Christian > ; Huang, Tim ; Zhang, > Jesse(Jie) ; Zhang, Jesse(Jie) > > Subject: [PATCH 18/22 V3] drm/amd/pm: check negtive return for table > entries > > Function hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr) > returns a negative number > > Signed-off-by: Jesse Zhang > Suggested-by: Tim Huang > --- > drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c | 13 - > 1 file changed, 8 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c > b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c > index f4bd8e9357e2..18f00038d844 100644 > --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c > +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c > @@ -30,9 +30,8 @@ int psm_init_power_state_table(struct pp_hwmgr > *hwmgr) { > int result; > unsigned int i; > - unsigned int table_entries; > struct pp_power_state *state; > - int size; > + int size, table_entries; > > if (hwmgr->hwmgr_func->get_num_of_pp_table_entries == NULL) > return 0; > @@ -40,15 +39,19 @@ int psm_init_power_state_table(struct pp_hwmgr > *hwmgr) > if (hwmgr->hwmgr_func->get_power_state_size == NULL) > return 0; > > - hwmgr->num_ps = table_entries = > hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr); > + table_entries = > hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr); > > - hwmgr->ps_size = size = > hwmgr->hwmgr_func->get_power_state_size(hwmgr) + > + size = hwmgr->hwmgr_func->get_power_state_size(hwmgr) + > sizeof(struct pp_power_state); > > - if (table_entries == 0 || size == 0) { > + if (table_entries <= 0 || size == 0) { > pr_warn("Please check whether power state management is > supported on this asic\n"); > + hwmgr->num_ps = 0; > + hwmgr->ps_size = 0; > return 0; > } > + hwmgr->num_ps = table_entries; > + hwmgr->ps_size = size; > > hwmgr->ps = kcalloc(table_entries, size, GFP_KERNEL); > if (hwmgr->ps == NULL) > -- > 2.25.1
[PATCH 18/22 V3] drm/amd/pm: check negtive return for table entries
Function hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr) returns a negative number Signed-off-by: Jesse Zhang Suggested-by: Tim Huang --- drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c index f4bd8e9357e2..18f00038d844 100644 --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c @@ -30,9 +30,8 @@ int psm_init_power_state_table(struct pp_hwmgr *hwmgr) { int result; unsigned int i; - unsigned int table_entries; struct pp_power_state *state; - int size; + int size, table_entries; if (hwmgr->hwmgr_func->get_num_of_pp_table_entries == NULL) return 0; @@ -40,15 +39,19 @@ int psm_init_power_state_table(struct pp_hwmgr *hwmgr) if (hwmgr->hwmgr_func->get_power_state_size == NULL) return 0; - hwmgr->num_ps = table_entries = hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr); + table_entries = hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr); - hwmgr->ps_size = size = hwmgr->hwmgr_func->get_power_state_size(hwmgr) + + size = hwmgr->hwmgr_func->get_power_state_size(hwmgr) + sizeof(struct pp_power_state); - if (table_entries == 0 || size == 0) { + if (table_entries <= 0 || size == 0) { pr_warn("Please check whether power state management is supported on this asic\n"); + hwmgr->num_ps = 0; + hwmgr->ps_size = 0; return 0; } + hwmgr->num_ps = table_entries; + hwmgr->ps_size = size; hwmgr->ps = kcalloc(table_entries, size, GFP_KERNEL); if (hwmgr->ps == NULL) -- 2.25.1
RE: [PATCH 18/22] drm/amd/pm: check negtive return for table entries
[AMD Official Use Only - AMD Internal Distribution Only] Hi Tim, -Original Message- From: Huang, Tim Sent: Monday, May 13, 2024 3:40 PM To: Zhang, Jesse(Jie) ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Koenig, Christian ; Zhang, Jesse(Jie) ; Zhang, Jesse(Jie) Subject: RE: [PATCH 18/22] drm/amd/pm: check negtive return for table entries [AMD Official Use Only - AMD Internal Distribution Only] Hi Jesse, > -Original Message- > From: Jesse Zhang > Sent: Monday, May 13, 2024 3:34 PM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; Koenig, Christian > ; Huang, Tim ; Zhang, > Jesse(Jie) ; Zhang, Jesse(Jie) > > Subject: [PATCH 18/22] drm/amd/pm: check negtive return for table > entries > > Function hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr) > returns a negative number > > Signed-off-by: Jesse Zhang > Suggested-by: Tim Huang > --- > drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c | 11 ++- > 1 file changed, 6 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c > b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c > index f4bd8e9357e2..1276a95acc90 100644 > --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c > +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c > @@ -30,9 +30,8 @@ int psm_init_power_state_table(struct pp_hwmgr > *hwmgr) { > int result; > unsigned int i; > - unsigned int table_entries; > struct pp_power_state *state; > - int size; > + int size, table_entries; > > if (hwmgr->hwmgr_func->get_num_of_pp_table_entries == NULL) > return 0; > @@ -40,15 +39,17 @@ int psm_init_power_state_table(struct pp_hwmgr > *hwmgr) > if (hwmgr->hwmgr_func->get_power_state_size == NULL) > return 0; > > - hwmgr->num_ps = table_entries = > hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr); > + table_entries = > hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr); > > - hwmgr->ps_size = size = > hwmgr->hwmgr_func->get_power_state_size(hwmgr) + > + size = hwmgr->hwmgr_func->get_power_state_size(hwmgr) + > sizeof(struct pp_power_state); > > - if (table_entries == 0 || size == 0) { > + if (table_entries <= 0 || size == 0) { > pr_warn("Please check whether power state management is > supported on this asic\n"); As we return 0 here, we still need to set the hwmgr->num_ps and hwmgr->ps_size to 0 here. [Zhang, Jesse(Jie)] yes, right. Thanks Tim. Tim Huang > return 0; > } > + hwmgr->num_ps = table_entries; > + hwmgr->ps_size = size; > > hwmgr->ps = kcalloc(table_entries, size, GFP_KERNEL); > if (hwmgr->ps == NULL) > -- > 2.25.1
[RESEND 2/6] drm/radeon: convert to using is_hdmi and has_audio from display info
Prefer the parsed results for is_hdmi and has_audio in display info over calling drm_detect_hdmi_monitor() and drm_detect_monitor_audio(), respectively. Cc: Alex Deucher Cc: Christian König Cc: Pan, Xinhui Cc: amd-gfx@lists.freedesktop.org Signed-off-by: Jani Nikula --- drivers/gpu/drm/radeon/atombios_encoders.c | 10 +- drivers/gpu/drm/radeon/evergreen_hdmi.c| 5 ++--- drivers/gpu/drm/radeon/radeon_audio.c | 6 +++--- drivers/gpu/drm/radeon/radeon_connectors.c | 12 ++-- drivers/gpu/drm/radeon/radeon_display.c| 2 +- drivers/gpu/drm/radeon/radeon_encoders.c | 4 ++-- 6 files changed, 19 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/radeon/atombios_encoders.c b/drivers/gpu/drm/radeon/atombios_encoders.c index 2bff0d9e20f5..0aa395fac36f 100644 --- a/drivers/gpu/drm/radeon/atombios_encoders.c +++ b/drivers/gpu/drm/radeon/atombios_encoders.c @@ -701,7 +701,7 @@ atombios_get_encoder_mode(struct drm_encoder *encoder) if (radeon_connector->use_digital && (radeon_connector->audio == RADEON_AUDIO_ENABLE)) return ATOM_ENCODER_MODE_HDMI; - else if (drm_detect_hdmi_monitor(radeon_connector_edid(connector)) && + else if (connector->display_info.is_hdmi && (radeon_connector->audio == RADEON_AUDIO_AUTO)) return ATOM_ENCODER_MODE_HDMI; else if (radeon_connector->use_digital) @@ -720,7 +720,7 @@ atombios_get_encoder_mode(struct drm_encoder *encoder) if (radeon_audio != 0) { if (radeon_connector->audio == RADEON_AUDIO_ENABLE) return ATOM_ENCODER_MODE_HDMI; - else if (drm_detect_hdmi_monitor(radeon_connector_edid(connector)) && + else if (connector->display_info.is_hdmi && (radeon_connector->audio == RADEON_AUDIO_AUTO)) return ATOM_ENCODER_MODE_HDMI; else @@ -737,14 +737,14 @@ atombios_get_encoder_mode(struct drm_encoder *encoder) if ((dig_connector->dp_sink_type == CONNECTOR_OBJECT_ID_DISPLAYPORT) || (dig_connector->dp_sink_type == CONNECTOR_OBJECT_ID_eDP)) { if (radeon_audio != 0 && - drm_detect_monitor_audio(radeon_connector_edid(connector)) && + connector->display_info.has_audio && ASIC_IS_DCE4(rdev) && !ASIC_IS_DCE5(rdev)) return ATOM_ENCODER_MODE_DP_AUDIO; return ATOM_ENCODER_MODE_DP; } else if (radeon_audio != 0) { if (radeon_connector->audio == RADEON_AUDIO_ENABLE) return ATOM_ENCODER_MODE_HDMI; - else if (drm_detect_hdmi_monitor(radeon_connector_edid(connector)) && + else if (connector->display_info.is_hdmi && (radeon_connector->audio == RADEON_AUDIO_AUTO)) return ATOM_ENCODER_MODE_HDMI; else @@ -755,7 +755,7 @@ atombios_get_encoder_mode(struct drm_encoder *encoder) break; case DRM_MODE_CONNECTOR_eDP: if (radeon_audio != 0 && - drm_detect_monitor_audio(radeon_connector_edid(connector)) && + connector->display_info.has_audio && ASIC_IS_DCE4(rdev) && !ASIC_IS_DCE5(rdev)) return ATOM_ENCODER_MODE_DP_AUDIO; return ATOM_ENCODER_MODE_DP; diff --git a/drivers/gpu/drm/radeon/evergreen_hdmi.c b/drivers/gpu/drm/radeon/evergreen_hdmi.c index 681119c91d94..09dda114e218 100644 --- a/drivers/gpu/drm/radeon/evergreen_hdmi.c +++ b/drivers/gpu/drm/radeon/evergreen_hdmi.c @@ -412,7 +412,7 @@ void evergreen_hdmi_enable(struct drm_encoder *encoder, bool enable) if (enable) { struct drm_connector *connector = radeon_get_connector_for_encoder(encoder); - if (connector && drm_detect_monitor_audio(radeon_connector_edid(connector))) { + if (connector && connector->display_info.has_audio) { WREG32(HDMI_INFOFRAME_CONTROL0 + dig->afmt->offset, HDMI_AVI_INFO_SEND | /* enable AVI info frames */ HDMI_AVI_INFO_CONT | /* required for audio info values to be updated */ @@ -450,8 +450,7 @@ void evergreen_dp_enable(struct drm_encoder *encoder, bool enable) if (!dig || !dig->afmt) return; - if (enable && connector && - drm_detect_monitor_audio(radeon_connector_edid(connector))) { + if (enable && connector && connector->display_info.has_audio) {
[RESEND 5/6] drm/edid: add a helper for EDID sysfs property show
Add a helper to get the EDID property for sysfs property show. This hides all the edid_blob_ptr usage within drm_edid.c. Signed-off-by: Jani Nikula --- drivers/gpu/drm/drm_crtc_internal.h | 2 ++ drivers/gpu/drm/drm_edid.c | 33 + drivers/gpu/drm/drm_sysfs.c | 24 ++--- 3 files changed, 37 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/drm_crtc_internal.h b/drivers/gpu/drm/drm_crtc_internal.h index 25aaae937ceb..20e9d7b206a2 100644 --- a/drivers/gpu/drm/drm_crtc_internal.h +++ b/drivers/gpu/drm/drm_crtc_internal.h @@ -303,6 +303,8 @@ const u8 *drm_edid_find_extension(const struct drm_edid *drm_edid, int ext_id, int *ext_index); void drm_edid_cta_sad_get(const struct cea_sad *cta_sad, u8 *sad); void drm_edid_cta_sad_set(struct cea_sad *cta_sad, const u8 *sad); +ssize_t drm_edid_connector_property_show(struct drm_connector *connector, +char *buf, loff_t off, size_t count); /* drm_edid_load.c */ #ifdef CONFIG_DRM_LOAD_EDID_FIRMWARE diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c index 4f54c91b31b2..97362dd2330b 100644 --- a/drivers/gpu/drm/drm_edid.c +++ b/drivers/gpu/drm/drm_edid.c @@ -6969,6 +6969,39 @@ static int _drm_edid_connector_property_update(struct drm_connector *connector, return ret; } +/* For sysfs edid show implementation */ +ssize_t drm_edid_connector_property_show(struct drm_connector *connector, +char *buf, loff_t off, size_t count) +{ + const void *edid; + size_t size; + ssize_t ret = 0; + + mutex_lock(>dev->mode_config.mutex); + + if (!connector->edid_blob_ptr) + goto unlock; + + edid = connector->edid_blob_ptr->data; + size = connector->edid_blob_ptr->length; + if (!edid) + goto unlock; + + if (off >= size) + goto unlock; + + if (off + count > size) + count = size - off; + + memcpy(buf, edid + off, count); + + ret = count; +unlock: + mutex_unlock(>dev->mode_config.mutex); + + return ret; +} + /** * drm_edid_connector_update - Update connector information from EDID * @connector: Connector diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c index bd9b8ab4f82b..fb3bbb6adcd1 100644 --- a/drivers/gpu/drm/drm_sysfs.c +++ b/drivers/gpu/drm/drm_sysfs.c @@ -266,29 +266,9 @@ static ssize_t edid_show(struct file *filp, struct kobject *kobj, { struct device *connector_dev = kobj_to_dev(kobj); struct drm_connector *connector = to_drm_connector(connector_dev); - unsigned char *edid; - size_t size; - ssize_t ret = 0; + ssize_t ret; - mutex_lock(>dev->mode_config.mutex); - if (!connector->edid_blob_ptr) - goto unlock; - - edid = connector->edid_blob_ptr->data; - size = connector->edid_blob_ptr->length; - if (!edid) - goto unlock; - - if (off >= size) - goto unlock; - - if (off + count > size) - count = size - off; - memcpy(buf, edid + off, count); - - ret = count; -unlock: - mutex_unlock(>dev->mode_config.mutex); + ret = drm_edid_connector_property_show(connector, buf, off, count); return ret; } -- 2.39.2
Re: [PATCH v2] drm/amd/display: fix documentation warnings for mpc.h
Hi Marcelo, On Sat, 11 May 2024 13:37:17 +1000 Stephen Rothwell wrote: > > Thanks for doing this. > > I haven't tested it, but just a couple of little things: > > On Fri, 10 May 2024 21:02:02 -0300 Marcelo Mendes Spessoto Junior > wrote: > > > > Fix most of the display documentation compile warnings by > > documenting struct mpc_funcs functions in dc/inc/hw/mpc.h file. > > > . > . > . > > > > Fixes: > > b8c1c3a82e75 ("Documentation/gpu: Add kernel doc entry for MPC") > > Please don't split the Fixes tag line and also don't add blank lines > between tags. > > I also appreciate being credited for reporting (unless you got a report > from somewhere else as well, then maybe credit both). > > Reported-by: Stephen Rothwell Now tested. Tested-by: Stephen Rothwell Closes: https://lore.kernel.org/linux-next/20240130134954.04fcf...@canb.auug.org.au/ -- Cheers, Stephen Rothwell pgpn3TqFBg4mT.pgp Description: OpenPGP digital signature
Re: [PATCH v2] drm/amd/display: fix documentation warnings for mpc.h
Hi Marcelo, Thanks for doing this. I haven't tested it, but just a couple of little things: On Fri, 10 May 2024 21:02:02 -0300 Marcelo Mendes Spessoto Junior wrote: > > Fix most of the display documentation compile warnings by > documenting struct mpc_funcs functions in dc/inc/hw/mpc.h file. > . . . > > Fixes: > b8c1c3a82e75 ("Documentation/gpu: Add kernel doc entry for MPC") Please don't split the Fixes tag line and also don't add blank lines between tags. I also appreciate being credited for reporting (unless you got a report from somewhere else as well, then maybe credit both). Reported-by: Stephen Rothwell > > Signed-off-by: Marcelo Mendes Spessoto Junior > --- -- Cheers, Stephen Rothwell pgpleiX0ogMI_.pgp Description: OpenPGP digital signature
[PATCH] drm/amd/display: clean up some inconsistent indenting
No functional modification involved. drivers/gpu/drm/amd/amdgpu/../display/dc/resource/dcn401/dcn401_resource.c:792 dcn401_i2c_hw_create() warn: inconsistent indenting. drivers/gpu/drm/amd/amdgpu/../display/dc/resource/dcn401/dcn401_resource.c:894 dcn401_hubp_create() warn: inconsistent indenting. drivers/gpu/drm/amd/amdgpu/../display/dc/resource/dcn401/dcn401_resource.c:1738 dcn401_resource_construct() warn: inconsistent indenting. Reported-by: Abaci Robot Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9002 Signed-off-by: Jiapeng Chong --- .../dc/resource/dcn401/dcn401_resource.c | 29 +-- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/resource/dcn401/dcn401_resource.c b/drivers/gpu/drm/amd/display/dc/resource/dcn401/dcn401_resource.c index 75e2c62ae792..3e1bfddc6e43 100644 --- a/drivers/gpu/drm/amd/display/dc/resource/dcn401/dcn401_resource.c +++ b/drivers/gpu/drm/amd/display/dc/resource/dcn401/dcn401_resource.c @@ -784,14 +784,13 @@ static struct dce_i2c_hw *dcn401_i2c_hw_create( #undef REG_STRUCT #define REG_STRUCT i2c_hw_regs - i2c_inst_regs_init(1), - i2c_inst_regs_init(2), - i2c_inst_regs_init(3), - i2c_inst_regs_init(4); + i2c_inst_regs_init(1), + i2c_inst_regs_init(2), + i2c_inst_regs_init(3), + i2c_inst_regs_init(4); dcn2_i2c_hw_construct(dce_i2c_hw, ctx, inst, - _hw_regs[inst], _shifts, _masks); - + _hw_regs[inst], _shifts, _masks); return dce_i2c_hw; } @@ -886,13 +885,13 @@ static struct hubp *dcn401_hubp_create( #undef REG_STRUCT #define REG_STRUCT hubp_regs - hubp_regs_init(0), - hubp_regs_init(1), - hubp_regs_init(2), - hubp_regs_init(3); + hubp_regs_init(0), + hubp_regs_init(1), + hubp_regs_init(2), + hubp_regs_init(3); if (hubp401_construct(hubp2, ctx, inst, - _regs[inst], _shift, _mask)) + _regs[inst], _shift, _mask)) return >base; BREAK_TO_DEBUGGER(); @@ -1735,10 +1734,10 @@ static bool dcn401_resource_construct( #undef REG_STRUCT #define REG_STRUCT abm_regs - abm_regs_init(0), - abm_regs_init(1), - abm_regs_init(2), - abm_regs_init(3); + abm_regs_init(0), + abm_regs_init(1), + abm_regs_init(2), + abm_regs_init(3); #undef REG_STRUCT #define REG_STRUCT dccg_regs -- 2.20.1.7.g153144c
[RESEND 0/6] drm, nouveau/radeon/amdpgu: edid_blob_ptr cleanups
I've sent this some moths ago, let's try again... BR, Jani. Jani Nikula (6): drm/nouveau: convert to using is_hdmi and has_audio from display info drm/radeon: convert to using is_hdmi and has_audio from display info drm/radeon: remove radeon_connector_edid() and stop using edid_blob_ptr drm/amdgpu: remove amdgpu_connector_edid() and stop using edid_blob_ptr drm/edid: add a helper for EDID sysfs property show drm/connector: update edid_blob_ptr documentation .../gpu/drm/amd/amdgpu/amdgpu_connectors.c| 16 - .../gpu/drm/amd/amdgpu/amdgpu_connectors.h| 1 - drivers/gpu/drm/amd/amdgpu/dce_v10_0.c| 4 +-- drivers/gpu/drm/amd/amdgpu/dce_v11_0.c| 4 +-- drivers/gpu/drm/amd/amdgpu/dce_v6_0.c | 4 +-- drivers/gpu/drm/amd/amdgpu/dce_v8_0.c | 4 +-- drivers/gpu/drm/drm_crtc_internal.h | 2 ++ drivers/gpu/drm/drm_edid.c| 33 +++ drivers/gpu/drm/drm_sysfs.c | 24 ++ drivers/gpu/drm/nouveau/dispnv50/disp.c | 8 ++--- drivers/gpu/drm/nouveau/dispnv50/head.c | 8 + drivers/gpu/drm/nouveau/nouveau_connector.c | 2 +- drivers/gpu/drm/radeon/atombios_encoders.c| 10 +++--- drivers/gpu/drm/radeon/evergreen_hdmi.c | 5 ++- drivers/gpu/drm/radeon/radeon_audio.c | 13 drivers/gpu/drm/radeon/radeon_connectors.c| 27 --- drivers/gpu/drm/radeon/radeon_display.c | 2 +- drivers/gpu/drm/radeon/radeon_encoders.c | 4 +-- drivers/gpu/drm/radeon/radeon_mode.h | 2 -- include/drm/drm_connector.h | 6 +++- 20 files changed, 79 insertions(+), 100 deletions(-) -- 2.39.2
[RESEND 6/6] drm/connector: update edid_blob_ptr documentation
Accessing the EDID via edid_blob_ptr causes chicken-and-egg problems. Keep edid_blob_ptr as the userspace interface that should be accessed via dedicated functions. Signed-off-by: Jani Nikula --- include/drm/drm_connector.h | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/include/drm/drm_connector.h b/include/drm/drm_connector.h index fe88d7fc6b8f..58ee9adf9091 100644 --- a/include/drm/drm_connector.h +++ b/include/drm/drm_connector.h @@ -1636,8 +1636,12 @@ struct drm_connector { /** * @edid_blob_ptr: DRM property containing EDID if present. Protected by -* _mode_config.mutex. This should be updated only by calling +* _mode_config.mutex. +* +* This must be updated only by calling drm_edid_connector_update() or * drm_connector_update_edid_property(). +* +* This must not be used by drivers directly. */ struct drm_property_blob *edid_blob_ptr; -- 2.39.2
[RESEND 1/6] drm/nouveau: convert to using is_hdmi and has_audio from display info
Prefer the parsed results for is_hdmi and has_audio in display info over calling drm_detect_hdmi_monitor() and drm_detect_monitor_audio(), respectively. Conveniently, this also removes the need to use edid_blob_ptr. v2: Reverse a backwards if condition (Ilia) Cc: Karol Herbst Cc: Lyude Paul Cc: Danilo Krummrich Cc: nouv...@lists.freedesktop.org Signed-off-by: Jani Nikula --- drivers/gpu/drm/nouveau/dispnv50/disp.c | 8 drivers/gpu/drm/nouveau/dispnv50/head.c | 8 +--- drivers/gpu/drm/nouveau/nouveau_connector.c | 2 +- 3 files changed, 6 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c index 0c3d88ad0b0e..168c27213287 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c @@ -751,7 +751,7 @@ nv50_audio_enable(struct drm_encoder *encoder, struct nouveau_crtc *nv_crtc, struct nouveau_encoder *nv_encoder = nouveau_encoder(encoder); struct nvif_outp *outp = _encoder->outp; - if (!nv50_audio_supported(encoder) || !drm_detect_monitor_audio(nv_connector->edid)) + if (!nv50_audio_supported(encoder) || !nv_connector->base.display_info.has_audio) return; mutex_lock(>audio.lock); @@ -1765,7 +1765,7 @@ nv50_sor_atomic_enable(struct drm_encoder *encoder, struct drm_atomic_state *sta if ((disp->disp->object.oclass == GT214_DISP || disp->disp->object.oclass >= GF110_DISP) && nv_encoder->dcb->type != DCB_OUTPUT_LVDS && - drm_detect_monitor_audio(nv_connector->edid)) + nv_connector->base.display_info.has_audio) hda = true; if (!nvif_outp_acquired(outp)) @@ -1774,7 +1774,7 @@ nv50_sor_atomic_enable(struct drm_encoder *encoder, struct drm_atomic_state *sta switch (nv_encoder->dcb->type) { case DCB_OUTPUT_TMDS: if (disp->disp->object.oclass != NV50_DISP && - drm_detect_hdmi_monitor(nv_connector->edid)) + nv_connector->base.display_info.is_hdmi) nv50_hdmi_enable(encoder, nv_crtc, nv_connector, state, mode, hda); if (nv_encoder->outp.or.link & 1) { @@ -1787,7 +1787,7 @@ nv50_sor_atomic_enable(struct drm_encoder *encoder, struct drm_atomic_state *sta */ if (mode->clock >= 165000 && nv_encoder->dcb->duallink_possible && - !drm_detect_hdmi_monitor(nv_connector->edid)) + !nv_connector->base.display_info.is_hdmi) proto = NV507D_SOR_SET_CONTROL_PROTOCOL_DUAL_TMDS; } else { proto = NV507D_SOR_SET_CONTROL_PROTOCOL_SINGLE_TMDS_B; diff --git a/drivers/gpu/drm/nouveau/dispnv50/head.c b/drivers/gpu/drm/nouveau/dispnv50/head.c index 83355dbc15ee..d7c74cc43ba5 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/head.c +++ b/drivers/gpu/drm/nouveau/dispnv50/head.c @@ -127,14 +127,8 @@ nv50_head_atomic_check_view(struct nv50_head_atom *armh, struct drm_display_mode *omode = >state.adjusted_mode; struct drm_display_mode *umode = >state.mode; int mode = asyc->scaler.mode; - struct edid *edid; int umode_vdisplay, omode_hdisplay, omode_vdisplay; - if (connector->edid_blob_ptr) - edid = (struct edid *)connector->edid_blob_ptr->data; - else - edid = NULL; - if (!asyc->scaler.full) { if (mode == DRM_MODE_SCALE_NONE) omode = umode; @@ -162,7 +156,7 @@ nv50_head_atomic_check_view(struct nv50_head_atom *armh, */ if ((asyc->scaler.underscan.mode == UNDERSCAN_ON || (asyc->scaler.underscan.mode == UNDERSCAN_AUTO && -drm_detect_hdmi_monitor(edid { +connector->display_info.is_hdmi))) { u32 bX = asyc->scaler.underscan.hborder; u32 bY = asyc->scaler.underscan.vborder; u32 r = (asyh->view.oH << 19) / asyh->view.oW; diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c index 856b3ef5edb8..938832a6af15 100644 --- a/drivers/gpu/drm/nouveau/nouveau_connector.c +++ b/drivers/gpu/drm/nouveau/nouveau_connector.c @@ -1034,7 +1034,7 @@ get_tmds_link_bandwidth(struct drm_connector *connector) unsigned duallink_scale = nouveau_duallink && nv_encoder->dcb->duallink_possible ? 2 : 1; - if (drm_detect_hdmi_monitor(nv_connector->edid)) { + if (nv_connector->base.display_info.is_hdmi) { info = _connector->base.display_info; duallink_scale = 1; } -- 2.39.2
[RESEND 3/6] drm/radeon: remove radeon_connector_edid() and stop using edid_blob_ptr
radeon_connector_edid() copies the EDID from edid_blob_ptr as a side effect if radeon_connector->edid isn't initialized. However, everywhere that the returned EDID is used, the EDID should have been set beforehands. Only the drm EDID code should look at the EDID property, anyway, so stop using it. Cc: Alex Deucher Cc: Christian König Cc: Pan, Xinhui Cc: amd-gfx@lists.freedesktop.org Signed-off-by: Jani Nikula --- drivers/gpu/drm/radeon/radeon_audio.c | 7 --- drivers/gpu/drm/radeon/radeon_connectors.c | 15 --- drivers/gpu/drm/radeon/radeon_mode.h | 2 -- 3 files changed, 4 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_audio.c b/drivers/gpu/drm/radeon/radeon_audio.c index 16c10db3ce6f..0bcd767b9f47 100644 --- a/drivers/gpu/drm/radeon/radeon_audio.c +++ b/drivers/gpu/drm/radeon/radeon_audio.c @@ -303,6 +303,7 @@ void radeon_audio_endpoint_wreg(struct radeon_device *rdev, u32 offset, static void radeon_audio_write_sad_regs(struct drm_encoder *encoder) { struct drm_connector *connector = radeon_get_connector_for_encoder(encoder); + struct radeon_connector *radeon_connector = to_radeon_connector(connector); struct radeon_encoder *radeon_encoder = to_radeon_encoder(encoder); struct cea_sad *sads; int sad_count; @@ -310,7 +311,7 @@ static void radeon_audio_write_sad_regs(struct drm_encoder *encoder) if (!connector) return; - sad_count = drm_edid_to_sad(radeon_connector_edid(connector), ); + sad_count = drm_edid_to_sad(radeon_connector->edid, ); if (sad_count < 0) DRM_ERROR("Couldn't read SADs: %d\n", sad_count); if (sad_count <= 0) @@ -326,6 +327,7 @@ static void radeon_audio_write_sad_regs(struct drm_encoder *encoder) static void radeon_audio_write_speaker_allocation(struct drm_encoder *encoder) { struct drm_connector *connector = radeon_get_connector_for_encoder(encoder); + struct radeon_connector *radeon_connector = to_radeon_connector(connector); struct radeon_encoder *radeon_encoder = to_radeon_encoder(encoder); u8 *sadb = NULL; int sad_count; @@ -333,8 +335,7 @@ static void radeon_audio_write_speaker_allocation(struct drm_encoder *encoder) if (!connector) return; - sad_count = drm_edid_to_speaker_allocation(radeon_connector_edid(connector), - ); + sad_count = drm_edid_to_speaker_allocation(radeon_connector->edid, ); if (sad_count < 0) { DRM_DEBUG("Couldn't read Speaker Allocation Data Block: %d\n", sad_count); diff --git a/drivers/gpu/drm/radeon/radeon_connectors.c b/drivers/gpu/drm/radeon/radeon_connectors.c index 81b5c3c8f658..80879e946342 100644 --- a/drivers/gpu/drm/radeon/radeon_connectors.c +++ b/drivers/gpu/drm/radeon/radeon_connectors.c @@ -255,21 +255,6 @@ static struct drm_encoder *radeon_find_encoder(struct drm_connector *connector, return NULL; } -struct edid *radeon_connector_edid(struct drm_connector *connector) -{ - struct radeon_connector *radeon_connector = to_radeon_connector(connector); - struct drm_property_blob *edid_blob = connector->edid_blob_ptr; - - if (radeon_connector->edid) { - return radeon_connector->edid; - } else if (edid_blob) { - struct edid *edid = kmemdup(edid_blob->data, edid_blob->length, GFP_KERNEL); - if (edid) - radeon_connector->edid = edid; - } - return radeon_connector->edid; -} - static void radeon_connector_get_edid(struct drm_connector *connector) { struct drm_device *dev = connector->dev; diff --git a/drivers/gpu/drm/radeon/radeon_mode.h b/drivers/gpu/drm/radeon/radeon_mode.h index 546381a5c918..e0a5af180801 100644 --- a/drivers/gpu/drm/radeon/radeon_mode.h +++ b/drivers/gpu/drm/radeon/radeon_mode.h @@ -701,8 +701,6 @@ extern u16 radeon_connector_encoder_get_dp_bridge_encoder_id(struct drm_connecto extern bool radeon_connector_is_dp12_capable(struct drm_connector *connector); extern int radeon_get_monitor_bpc(struct drm_connector *connector); -extern struct edid *radeon_connector_edid(struct drm_connector *connector); - extern void radeon_connector_hotplug(struct drm_connector *connector); extern int radeon_dp_mode_valid_helper(struct drm_connector *connector, struct drm_display_mode *mode); -- 2.39.2
[RESEND 4/6] drm/amdgpu: remove amdgpu_connector_edid() and stop using edid_blob_ptr
amdgpu_connector_edid() copies the EDID from edid_blob_ptr as a side effect if amdgpu_connector->edid isn't initialized. However, everywhere that the returned EDID is used, the EDID should have been set beforehands. Only the drm EDID code should look at the EDID property, anyway, so stop using it. Cc: Alex Deucher Cc: Christian König Cc: Pan, Xinhui Cc: amd-gfx@lists.freedesktop.org Signed-off-by: Jani Nikula --- drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 16 drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.h | 1 - drivers/gpu/drm/amd/amdgpu/dce_v10_0.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/dce_v11_0.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/dce_v6_0.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/dce_v8_0.c | 4 ++-- 6 files changed, 8 insertions(+), 25 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c index 9caba10315a8..cae7479c3ecf 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c @@ -246,22 +246,6 @@ amdgpu_connector_find_encoder(struct drm_connector *connector, return NULL; } -struct edid *amdgpu_connector_edid(struct drm_connector *connector) -{ - struct amdgpu_connector *amdgpu_connector = to_amdgpu_connector(connector); - struct drm_property_blob *edid_blob = connector->edid_blob_ptr; - - if (amdgpu_connector->edid) { - return amdgpu_connector->edid; - } else if (edid_blob) { - struct edid *edid = kmemdup(edid_blob->data, edid_blob->length, GFP_KERNEL); - - if (edid) - amdgpu_connector->edid = edid; - } - return amdgpu_connector->edid; -} - static struct edid * amdgpu_connector_get_hardcoded_edid(struct amdgpu_device *adev) { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.h index 61fcef15ad72..eff833b6ed31 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.h @@ -24,7 +24,6 @@ #ifndef __AMDGPU_CONNECTORS_H__ #define __AMDGPU_CONNECTORS_H__ -struct edid *amdgpu_connector_edid(struct drm_connector *connector); void amdgpu_connector_hotplug(struct drm_connector *connector); int amdgpu_connector_get_monitor_bpc(struct drm_connector *connector); u16 amdgpu_connector_encoder_get_dp_bridge_encoder_id(struct drm_connector *connector); diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c index b44fce44c066..dddb5fe16f2c 100644 --- a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c @@ -1299,7 +1299,7 @@ static void dce_v10_0_audio_write_speaker_allocation(struct drm_encoder *encoder return; } - sad_count = drm_edid_to_speaker_allocation(amdgpu_connector_edid(connector), ); + sad_count = drm_edid_to_speaker_allocation(amdgpu_connector->edid, ); if (sad_count < 0) { DRM_ERROR("Couldn't read Speaker Allocation Data Block: %d\n", sad_count); sad_count = 0; @@ -1369,7 +1369,7 @@ static void dce_v10_0_audio_write_sad_regs(struct drm_encoder *encoder) return; } - sad_count = drm_edid_to_sad(amdgpu_connector_edid(connector), ); + sad_count = drm_edid_to_sad(amdgpu_connector->edid, ); if (sad_count < 0) DRM_ERROR("Couldn't read SADs: %d\n", sad_count); if (sad_count <= 0) diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c index 80b2e7f79acf..11780e4d7e9f 100644 --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c @@ -1331,7 +1331,7 @@ static void dce_v11_0_audio_write_speaker_allocation(struct drm_encoder *encoder return; } - sad_count = drm_edid_to_speaker_allocation(amdgpu_connector_edid(connector), ); + sad_count = drm_edid_to_speaker_allocation(amdgpu_connector->edid, ); if (sad_count < 0) { DRM_ERROR("Couldn't read Speaker Allocation Data Block: %d\n", sad_count); sad_count = 0; @@ -1401,7 +1401,7 @@ static void dce_v11_0_audio_write_sad_regs(struct drm_encoder *encoder) return; } - sad_count = drm_edid_to_sad(amdgpu_connector_edid(connector), ); + sad_count = drm_edid_to_sad(amdgpu_connector->edid, ); if (sad_count < 0) DRM_ERROR("Couldn't read SADs: %d\n", sad_count); if (sad_count <= 0) diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c index db20012600f5..05c0df97f01d 100644 --- a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c +++ b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c @@ -1217,7 +1217,7 @@ static void dce_v6_0_audio_write_speaker_allocation(struct drm_encoder *encoder) return;
RE: [PATCH 18/22] drm/amd/pm: check negtive return for table entries
[AMD Official Use Only - AMD Internal Distribution Only] Hi Jesse, > -Original Message- > From: Jesse Zhang > Sent: Monday, May 13, 2024 3:34 PM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; Koenig, Christian > ; Huang, Tim ; Zhang, > Jesse(Jie) ; Zhang, Jesse(Jie) > > Subject: [PATCH 18/22] drm/amd/pm: check negtive return for table entries > > Function hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr) > returns a negative number > > Signed-off-by: Jesse Zhang > Suggested-by: Tim Huang > --- > drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c | 11 ++- > 1 file changed, 6 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c > b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c > index f4bd8e9357e2..1276a95acc90 100644 > --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c > +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c > @@ -30,9 +30,8 @@ int psm_init_power_state_table(struct pp_hwmgr > *hwmgr) { > int result; > unsigned int i; > - unsigned int table_entries; > struct pp_power_state *state; > - int size; > + int size, table_entries; > > if (hwmgr->hwmgr_func->get_num_of_pp_table_entries == NULL) > return 0; > @@ -40,15 +39,17 @@ int psm_init_power_state_table(struct pp_hwmgr > *hwmgr) > if (hwmgr->hwmgr_func->get_power_state_size == NULL) > return 0; > > - hwmgr->num_ps = table_entries = > hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr); > + table_entries = > hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr); > > - hwmgr->ps_size = size = > hwmgr->hwmgr_func->get_power_state_size(hwmgr) + > + size = hwmgr->hwmgr_func->get_power_state_size(hwmgr) + > sizeof(struct pp_power_state); > > - if (table_entries == 0 || size == 0) { > + if (table_entries <= 0 || size == 0) { > pr_warn("Please check whether power state management is > supported on this asic\n"); As we return 0 here, we still need to set the hwmgr->num_ps and hwmgr->ps_size to 0 here. Tim Huang > return 0; > } > + hwmgr->num_ps = table_entries; > + hwmgr->ps_size = size; > > hwmgr->ps = kcalloc(table_entries, size, GFP_KERNEL); > if (hwmgr->ps == NULL) > -- > 2.25.1
RE: [PATCH 2/22 V2] drm/amdgpu: the warning dereferencing obj for nbio_v7_4
[AMD Official Use Only - AMD Internal Distribution Only] This patch is, Reviewed-by: Tim Huang > -Original Message- > From: Jesse Zhang > Sent: Monday, May 13, 2024 3:33 PM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; Koenig, Christian > ; Huang, Tim ; Zhang, > Jesse(Jie) ; Zhang, Jesse(Jie) > > Subject: [PATCH 2/22 V2] drm/amdgpu: the warning dereferencing obj for > nbio_v7_4 > > if ras_manager obj null, don't print NBIO err data > > Signed-off-by: Jesse Zhang > Suggested-by: Tim Huang > --- > drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c > b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c > index fe18df10daaa..32cc60ce5521 100644 > --- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c > +++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c > @@ -383,7 +383,7 @@ static void > nbio_v7_4_handle_ras_controller_intr_no_bifring(struct amdgpu_device > else > WREG32_SOC15(NBIO, 0, mmBIF_DOORBELL_INT_CNTL, > bif_doorbell_intr_cntl); > > - if (!ras->disable_ras_err_cnt_harvest) { > + if (ras && !ras->disable_ras_err_cnt_harvest && obj) { > /* >* clear error status after ras_controller_intr >* according to hw team and count ue number > -- > 2.25.1
[PATCH 18/22] drm/amd/pm: check negtive return for table entries
Function hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr) returns a negative number Signed-off-by: Jesse Zhang Suggested-by: Tim Huang --- drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c index f4bd8e9357e2..1276a95acc90 100644 --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c @@ -30,9 +30,8 @@ int psm_init_power_state_table(struct pp_hwmgr *hwmgr) { int result; unsigned int i; - unsigned int table_entries; struct pp_power_state *state; - int size; + int size, table_entries; if (hwmgr->hwmgr_func->get_num_of_pp_table_entries == NULL) return 0; @@ -40,15 +39,17 @@ int psm_init_power_state_table(struct pp_hwmgr *hwmgr) if (hwmgr->hwmgr_func->get_power_state_size == NULL) return 0; - hwmgr->num_ps = table_entries = hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr); + table_entries = hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr); - hwmgr->ps_size = size = hwmgr->hwmgr_func->get_power_state_size(hwmgr) + + size = hwmgr->hwmgr_func->get_power_state_size(hwmgr) + sizeof(struct pp_power_state); - if (table_entries == 0 || size == 0) { + if (table_entries <= 0 || size == 0) { pr_warn("Please check whether power state management is supported on this asic\n"); return 0; } + hwmgr->num_ps = table_entries; + hwmgr->ps_size = size; hwmgr->ps = kcalloc(table_entries, size, GFP_KERNEL); if (hwmgr->ps == NULL) -- 2.25.1
[PATCH 2/22 V2] drm/amdgpu: the warning dereferencing obj for nbio_v7_4
if ras_manager obj null, don't print NBIO err data Signed-off-by: Jesse Zhang Suggested-by: Tim Huang --- drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c index fe18df10daaa..32cc60ce5521 100644 --- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c +++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c @@ -383,7 +383,7 @@ static void nbio_v7_4_handle_ras_controller_intr_no_bifring(struct amdgpu_device else WREG32_SOC15(NBIO, 0, mmBIF_DOORBELL_INT_CNTL, bif_doorbell_intr_cntl); - if (!ras->disable_ras_err_cnt_harvest) { + if (ras && !ras->disable_ras_err_cnt_harvest && obj) { /* * clear error status after ras_controller_intr * according to hw team and count ue number -- 2.25.1
Re: [RFC 0/5] Discussion around eviction improvements
Just FYI, I've been on sick leave for a while and now trying to catch up. It will probably be at least week until I can look into this again. Sorry, Christian. Am 08.05.24 um 20:09 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin Last few days I was looking at the situation with VRAM over subscription, what happens versus what perhaps should happen. Browsing through the driver and running some simple experiments. I ended up with this patch series which, as a disclaimer, may be completely wrong but as I found some suspicious things, to me at least, I thought it was a good point to stop and request some comments. To perhaps summarise what are the main issues I think I found: * Migration rate limiting does not bother knowing if actual migration happened and so can over-account and unfairly penalise. * Migration rate limiting does not even work, at least not for the common case where userspace configures VRAM+GTT. It thinks it can stop migration attempts by playing with bo->allowed_domains vs bo->preferred domains but, both from the code, and from empirical experiments, I see that not working at all. Both masks are identical so fiddling with them achieves nothing. * Idea of the fallback placement only works when VRAM has free space. As soon as it does not, ttm_resource_compatible is happy to leave the buffers in the secondary placement forever. * Driver thinks it will be re-validating evicted buffers on the next submission but it does not for the very common case of VRAM+GTT because it only checks if current placement is *none* of the preferred placements. All those problems are addressed in individual patches. End result of this series appears to be driver which will try harder to move buffers back into VRAM, but will be (more) correctly throttled in doing so by the existing rate limiting logic. I have run a quick benchmark of Cyberpunk 2077 and cannot say that I saw a change but that could be a good thing too. At least I did not break anything, perhaps.. On one occassion I did see the rate limiting logic get confused while for a period of few minutes it went to a mode where it was constantly giving a high migration budget. But that recovered itself when I switched clients and did not come back so I don't know. If there is something wrong there I don't think it would be caused by any patches in this series. Series is probably rough but should be good enough for dicsussion. I am curious to hear if I identified at least something correctly as a real problem. It would also be good to hear what are the suggested games to check and see whether there is any improvement. Cc: Christian König Cc: Friedrich Vock Tvrtko Ursulin (5): drm/amdgpu: Fix migration rate limiting accounting drm/amdgpu: Actually respect buffer migration budget drm/ttm: Add preferred placement flag drm/amdgpu: Use preferred placement for VRAM+GTT drm/amdgpu: Re-validate evicted buffers drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 38 +- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 8 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 21 ++-- drivers/gpu/drm/ttm/ttm_resource.c | 13 +--- include/drm/ttm/ttm_placement.h| 3 ++ 5 files changed, 65 insertions(+), 18 deletions(-)
RE: [PATCH 02/22] drm/amdgpu: the warning dereferencing obj for nbio_v7_4
[AMD Official Use Only - AMD Internal Distribution Only] Hi Tim -Original Message- From: Huang, Tim Sent: Monday, May 13, 2024 12:23 PM To: Zhang, Jesse(Jie) ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Koenig, Christian Subject: RE: [PATCH 02/22] drm/amdgpu: the warning dereferencing obj for nbio_v7_4 [AMD Official Use Only - AMD Internal Distribution Only] Hi Jesse, > -Original Message- > From: Zhang, Jesse(Jie) > Sent: Monday, May 13, 2024 10:18 AM > To: Zhang, Jesse(Jie) ; > amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; Koenig, Christian > ; Huang, Tim > Subject: RE: [PATCH 02/22] drm/amdgpu: the warning dereferencing obj > for > nbio_v7_4 > > [AMD Official Use Only - AMD Internal Distribution Only] > > Ping ... > > -Original Message- > From: Jesse Zhang > Sent: Friday, May 10, 2024 10:50 AM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; Koenig, Christian > ; Huang, Tim ; Zhang, > Jesse(Jie) ; Zhang, Jesse(Jie) > > Subject: [PATCH 02/22] drm/amdgpu: the warning dereferencing obj for > nbio_v7_4 > > if ras_manager obj null, don't print NBIO err data > > Signed-off-by: Jesse Zhang > --- > drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c > b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c > index fe18df10daaa..26e5885db9b7 100644 > --- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c > +++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c > @@ -383,7 +383,7 @@ static void > nbio_v7_4_handle_ras_controller_intr_no_bifring(struct amdgpu_device > else > WREG32_SOC15(NBIO, 0, mmBIF_DOORBELL_INT_CNTL, > bif_doorbell_intr_cntl); > > - if (!ras->disable_ras_err_cnt_harvest) { > + if (!ras->disable_ras_err_cnt_harvest && obj) { We may need to check the ras pointer as well? Such as change to " if (ras && !ras->disable_ras_err_cnt_harvest && obj) {" [Zhang, Jesse(Jie)] Thanks, will update the patch . Tim Huang > /* > * clear error status after ras_controller_intr > * according to hw team and count ue number > -- > 2.25.1 >
Re: [PATCH v3] drm/amdgpu: Add Ring Hang Events
On 5/13/2024 9:44 AM, Ori Messinger wrote: > This patch adds 'ring hang' events to the driver. > This is done by adding a 'reset_ring_hang' bool variable to the > struct 'amdgpu_reset_context' in the amdgpu_reset.h file. > The purpose for this 'reset_ring_hang' variable is whenever a GPU > reset is initiated due to a ring hang, the reset_ring_hang should > be set to 'true'. > > Additionally, the reset cause is passed into the kfd smi event as > a string, and is displayed in dmesg as an error. > > This 'amdgpu_reset_context' struct is now also passed > through across all relevant functions, and another event type > "KFD_SMI_EVENT_RING_HANG" is added to the kfd_smi_event enum. > > Signed-off-by: Ori Messinger > Change-Id: I6af3022eb1b4514201c9430d635ff87f167ad6f7 > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 7 +-- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 9 ++--- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 16 > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 > drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 2 ++ > drivers/gpu/drm/amd/amdkfd/kfd_device.c | 7 --- > drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 7 ++- > drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h | 5 - > include/uapi/linux/kfd_ioctl.h | 15 --- > 10 files changed, 56 insertions(+), 18 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c > index e3738d417245..f1c6dc939cc3 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c > @@ -133,6 +133,9 @@ static void amdgpu_amdkfd_reset_work(struct work_struct > *work) > > reset_context.method = AMD_RESET_METHOD_NONE; > reset_context.reset_req_dev = adev; > + reset_context.reset_ring_hang = true; > + strscpy(reset_context.reset_cause, "hws_hang", > sizeof(reset_context.reset_cause)); Please add only reset cause as an enum to reset context. There is no need for a separate variable like ring hang. For a user ring that induces a HWS hang, that may be identified separately or identified generically with "HWS hang" as the reason. HWS hang also could be caused by RAS error. A possible list is - DRIVER_TRIGGERED (suspend/reset on init etc) JOB HANG, HWS HANG, USER TRIGGERED, RAS ERROR, The description string for reset cause may be obtained separately with something like below which returns the details - no need to add them to reset context and pass around. amdgpu_reset_get_reset_desc(reset_context); If reset is caused by a specific job, reset context already has a job pointer. From there you may get more details like device/partition id, job id, ring name etc. and provide the description. For RAS errors, there is already detailed dmesg log of IP which caused issue. So only device could be sufficient. > + DRM_ERROR("Reset cause: %s\n", reset_context.reset_cause); Please don't use DRM_*. They are deprecated. Use either drm_err() or dev_err() - they help to identify the device also. Thanks, Lijo > clear_bit(AMDGPU_NEED_FULL_RESET, _context.flags); > > amdgpu_device_gpu_recover(adev, NULL, _context); > @@ -261,12 +264,12 @@ int amdgpu_amdkfd_resume(struct amdgpu_device *adev, > bool run_pm) > return r; > } > > -int amdgpu_amdkfd_pre_reset(struct amdgpu_device *adev) > +int amdgpu_amdkfd_pre_reset(struct amdgpu_device *adev, struct > amdgpu_reset_context *reset_context) > { > int r = 0; > > if (adev->kfd.dev) > - r = kgd2kfd_pre_reset(adev->kfd.dev); > + r = kgd2kfd_pre_reset(adev->kfd.dev, reset_context); > > return r; > } > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h > index 1de021ebdd46..c9030d8b8308 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h > @@ -47,6 +47,7 @@ enum TLB_FLUSH_TYPE { > }; > > struct amdgpu_device; > +struct amdgpu_reset_context; > > enum kfd_mem_attachment_type { > KFD_MEM_ATT_SHARED, /* Share kgd_mem->bo or another attachment's */ > @@ -170,7 +171,8 @@ bool amdgpu_amdkfd_have_atomics_support(struct > amdgpu_device *adev); > > bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid); > > -int amdgpu_amdkfd_pre_reset(struct amdgpu_device *adev); > +int amdgpu_amdkfd_pre_reset(struct amdgpu_device *adev, > + struct amdgpu_reset_context *reset_context); > > int amdgpu_amdkfd_post_reset(struct amdgpu_device *adev); > > @@ -416,7 +418,8 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, > void kgd2kfd_device_exit(struct kfd_dev *kfd); > void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm); > int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm); > -int kgd2kfd_pre_reset(struct kfd_dev *kfd); > +int