On Thu, Oct 16, 2025 at 01:55:27PM -0500, Mario Limonciello wrote:
> [Why]
> Newer VPE microcode has functionality that will decrease DPM level
> only when a workload has run for 2 or more seconds. If VPE is turned
> off before this DPM decrease and the PMFW doesn't reset it when
> power gating VPE, the SOC can get stuck with a higher DPM level.
>
> This can happen from amdgpu's ring buffer test because it's a short
> quick workload for VPE and VPE is turned off after 1s.
>
> [How]
> In idle handler besides checking fences are drained check PMFW version
> to determine if it will reset DPM when power gating VPE. If PMFW will
> not do this, then check VPE DPM level. If it is not DPM0 reschedule
> delayed work again until it is.
>
> Cc: [email protected]
> Reported-by: Sultan Alsawaf <[email protected]>
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4615
> Signed-off-by: Mario Limonciello <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 33 ++++++++++++++++++++++---
>  1 file changed, 29 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
> index 474bfe36c0c2..f4932339d79d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
> @@ -322,6 +322,26 @@ static int vpe_early_init(struct amdgpu_ip_block *ip_block)
>  	return 0;
>  }
>
> +static bool vpe_need_dpm0_at_power_down(struct amdgpu_device *adev)
> +{
> +	switch (amdgpu_ip_version(adev, VPE_HWIP, 0)) {
> +	case IP_VERSION(6, 1, 1):
> +		return adev->pm.fw_version < 0x0a640500;
> +	default:
> +		return false;
> +	}
> +}
> +
> +static int vpe_get_dpm_level(struct amdgpu_device *adev)
> +{
> +	struct amdgpu_vpe *vpe = &adev->vpe;
> +
> +	if (!adev->pm.dpm_enabled)
> +		return 0;
> +
> +	return RREG32(vpe_get_reg_offset(vpe, 0, vpe->regs.dpm_request_lv));
> +}
> +
>  static void vpe_idle_work_handler(struct work_struct *work)
>  {
>  	struct amdgpu_device *adev =
> @@ -329,11 +349,16 @@ static void vpe_idle_work_handler(struct work_struct *work)
>  	unsigned int fences = 0;
>
>  	fences += amdgpu_fence_count_emitted(&adev->vpe.ring);
> +	if (fences)
> +		goto reschedule;
>
> -	if (fences == 0)
> -		amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VPE, AMD_PG_STATE_GATE);
> -	else
> -		schedule_delayed_work(&adev->vpe.idle_work, VPE_IDLE_TIMEOUT);
> +	if (vpe_need_dpm0_at_power_down(adev) && vpe_get_dpm_level(adev) != 0)
> +		goto reschedule;
> +
> +	amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VPE, AMD_PG_STATE_GATE);
> +

Wait a second, there's no return here! Without one, control falls through
to the reschedule label, so the idle work re-arms itself every
VPE_IDLE_TIMEOUT even after VPE has been power gated. As a result, my
laptop kept getting kicked out of S0i3 whenever I suspended it, and I
found it cooking in my backpack with its battery mostly drained. :-(

> +reschedule:
> +	schedule_delayed_work(&adev->vpe.idle_work, VPE_IDLE_TIMEOUT);
>  }
>
>  static int vpe_common_init(struct amdgpu_vpe *vpe)
> --
> 2.51.0
>

Sultan
