On Thu, Oct 16, 2025 at 01:55:27PM -0500, Mario Limonciello wrote:
> [Why]
> Newer VPE microcode has functionality that will decrease DPM level
> only when a workload has run for 2 or more seconds.  If VPE is turned
> off before this DPM decrease and the PMFW doesn't reset it when
> power gating VPE, the SOC can get stuck with a higher DPM level.
> 
> This can happen from amdgpu's ring buffer test because it's a short
> quick workload for VPE and VPE is turned off after 1s.
> 
> [How]
> In the idle handler, besides checking that fences are drained, check the
> PMFW version to determine whether it will reset DPM when power gating VPE.
> If PMFW will not do this, then check the VPE DPM level. If it is not DPM0,
> reschedule the delayed work until it is.
> 
> Cc: [email protected]
> Reported-by: Sultan Alsawaf <[email protected]>
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4615
> Signed-off-by: Mario Limonciello <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 33 ++++++++++++++++++++++---
>  1 file changed, 29 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
> index 474bfe36c0c2..f4932339d79d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
> @@ -322,6 +322,26 @@ static int vpe_early_init(struct amdgpu_ip_block *ip_block)
>       return 0;
>  }
>  
> +static bool vpe_need_dpm0_at_power_down(struct amdgpu_device *adev)
> +{
> +     switch (amdgpu_ip_version(adev, VPE_HWIP, 0)) {
> +     case IP_VERSION(6, 1, 1):
> +             return adev->pm.fw_version < 0x0a640500;
> +     default:
> +             return false;
> +     }
> +}
> +
> +static int vpe_get_dpm_level(struct amdgpu_device *adev)
> +{
> +     struct amdgpu_vpe *vpe = &adev->vpe;
> +
> +     if (!adev->pm.dpm_enabled)
> +             return 0;
> +
> +     return RREG32(vpe_get_reg_offset(vpe, 0, vpe->regs.dpm_request_lv));
> +}
> +
>  static void vpe_idle_work_handler(struct work_struct *work)
>  {
>       struct amdgpu_device *adev =
> @@ -329,11 +349,16 @@ static void vpe_idle_work_handler(struct work_struct *work)
>       unsigned int fences = 0;
>  
>       fences += amdgpu_fence_count_emitted(&adev->vpe.ring);
> +     if (fences)
> +             goto reschedule;
>  
> -     if (fences == 0)
> -             amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VPE, AMD_PG_STATE_GATE);
> -     else
> -             schedule_delayed_work(&adev->vpe.idle_work, VPE_IDLE_TIMEOUT);
> +     if (vpe_need_dpm0_at_power_down(adev) && vpe_get_dpm_level(adev) != 0)
> +             goto reschedule;
> +
> +     amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VPE, AMD_PG_STATE_GATE);
> +

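Wait a second, there's no return here! After power gating VPE, execution falls
through to the reschedule label, so the idle work re-queues itself every
VPE_IDLE_TIMEOUT even when there is nothing left to do. As a result, my laptop
kept getting kicked out of S0i3 when I'd suspend it, and I found it cooking in
my backpack with its battery mostly drained. :-(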
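
To make the control flow concrete, here is a minimal userspace sketch of the
handler, not the driver code itself. struct vpe_sim and both function names
are hypothetical stand-ins, and the early return in the second variant is only
my assumption about what the fix would look like:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state the idle handler inspects. */
struct vpe_sim {
	unsigned int fences;	/* fences still outstanding on the VPE ring */
	int dpm_level;		/* current DPM level, 0 meaning DPM0 */
	bool need_dpm0;		/* PMFW will not reset DPM when gating */
	bool gated;		/* set once the block has been power gated */
	unsigned int resched;	/* times the idle work was re-queued */
};

/* Control flow as posted: gating falls through into the label. */
static void idle_handler_as_posted(struct vpe_sim *s)
{
	assert(s);

	if (s->fences)
		goto reschedule;

	if (s->need_dpm0 && s->dpm_level != 0)
		goto reschedule;

	s->gated = true;
	/* no return, so the work is re-queued even though VPE is gated */

reschedule:
	s->resched++;
}

/* Same flow with an early return after gating (assumed fix). */
static void idle_handler_with_return(struct vpe_sim *s)
{
	assert(s);

	if (s->fences)
		goto reschedule;

	if (s->need_dpm0 && s->dpm_level != 0)
		goto reschedule;

	s->gated = true;
	return;

reschedule:
	s->resched++;
}
```

With an idle ring already at DPM0, the first variant gates and still bumps
resched on every invocation, while the second gates once and stops.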

> +reschedule:
> +     schedule_delayed_work(&adev->vpe.idle_work, VPE_IDLE_TIMEOUT);
>  }
>  
>  static int vpe_common_init(struct amdgpu_vpe *vpe)
> -- 
> 2.51.0
> 

Sultan
