On Fri, 2025-11-07 at 10:25 +0100, Christian König wrote:
> On 11/6/25 19:44, Timur Kristóf wrote:
> > The VCPU BO doesn't only contain the VCE firmware but also other
> > ranges that the VCE uses for its stack and data. Let's initialize
> > this to zero to avoid having garbage in the VCPU BO.
> >
> > v2:
> > - Only clear BO after creation, not on resume.
> >
> > Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
> > Signed-off-by: Timur Kristóf <[email protected]>
>
> For now this patch here is Reviewed-by: Christian König
> <[email protected]> since it addresses a clear problem and
> potentially even need to be back-ported to older kernels.
Thank you, I agree.
>
> But I think we should clean that up more full after we landed VCE1
> support.
Yes, I'm happy to continue this work after the VCE 1 support lands.
>
> Assuming that it hold true that VCE1-3 can't continue with sessions
> after suspend resume we should do something like this:
>
> 1. Remove all amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr).
> As kernel BO the VCE FW BO is pinned and mapped on creation
> time.
This is already done by patch 6 of this series:
"Save/restore and pin VCPU BO for all VCE (v2)"
>
> 2. Rename amdgpu_vce_resume() into amdgpu_vce_reload_fw() and add the
> memset_io() there like you originally planned.
Also done by patch 6 of this series, except for the rename.
>
> 3. Also add resetting the VCE FW handles into amdgpu_vce_reload_fw().
>
> E.g. something like this:
> for (i = 0; i < AMDGPU_MAX_VCE_HANDLES; ++i) {
> atomic_set(&adev->vce.handles[i], 0);
> adev->vce.filp[i] = NULL;
> }
>
> This way the kernel will reject submissions when userspace tries
> to use the same FW handles as before the suspend/resume and prevent
> the HW from crashing.
>
> Does that sounds like a plan to you?
Yes, that sounds like a good plan to me.
>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > index b9060bcd4806..e028ad0d3b7a 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > @@ -187,6 +187,8 @@ int amdgpu_vce_sw_init(struct amdgpu_device
> > *adev, unsigned long size)
> > return r;
> > }
> >
> > + memset_io(adev->vce.cpu_addr, 0, size);
> > +
> > for (i = 0; i < AMDGPU_MAX_VCE_HANDLES; ++i) {
> > atomic_set(&adev->vce.handles[i], 0);
> > adev->vce.filp[i] = NULL;