On Fri, 2025-11-07 at 10:25 +0100, Christian König wrote:
> On 11/6/25 19:44, Timur Kristóf wrote:
> > The VCPU BO doesn't only contain the VCE firmware but also other
> > ranges that the VCE uses for its stack and data. Let's initialize
> > this to zero to avoid having garbage in the VCPU BO.
> > 
> > v2:
> > - Only clear BO after creation, not on resume.
> > 
> > Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
> > Signed-off-by: Timur Kristóf <[email protected]>
> 
> For now this patch here is Reviewed-by: Christian König
> <[email protected]> since it addresses a clear problem and
> potentially even needs to be back-ported to older kernels.

Thank you, I agree.

> 
> But I think we should clean that up more fully after we have landed
> VCE1 support.

Yes, I'm happy to continue this work after the VCE 1 support lands.

> 
> Assuming that it holds true that VCE1-3 can't continue with sessions
> after suspend/resume, we should do something like this:
> 
> 1. Remove all amdgpu_bo_kmap(adev->vce.vcpu_bo, &cpu_addr).
>       As a kernel BO, the VCE FW BO is pinned and mapped at creation
> time.

This is already done by patch 6 of this series:
"Save/restore and pin VCPU BO for all VCE (v2)"

> 
> 2. Rename amdgpu_vce_resume() to amdgpu_vce_reload_fw() and add the
> memset_io() there like you originally planned.

Also done by patch 6 of this series, except for the rename.

> 
> 3. Also add resetting the VCE FW handles into amdgpu_vce_reload_fw().
> 
>    E.g. something like this:
>       for (i = 0; i < AMDGPU_MAX_VCE_HANDLES; ++i) {
>               atomic_set(&adev->vce.handles[i], 0);
>               adev->vce.filp[i] = NULL;
>       }
> 
>    This way the kernel will reject submissions when userspace tries
> to use the same FW handles as before the suspend/resume and prevent
> the HW from crashing.
> 
> Does that sound like a plan to you?

Yes, that sounds like a good plan to me.
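
For reference, the three steps above can be modelled in plain userspace
C. This is only a sketch under stated assumptions: struct vce_model,
vce_model_reload_fw(), vce_model_open_session() and vce_model_validate()
are hypothetical stand-ins and not the driver's code, memset() stands in
for memset_io(), plain integers stand in for the atomic handles, and the
validation logic is much simpler than what the driver does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_VCE_HANDLES 16   /* stand-in for AMDGPU_MAX_VCE_HANDLES */
#define VCPU_BO_SIZE    256  /* toy size, not the real BO size */

/* Hypothetical userspace model of the VCE state discussed above. */
struct vce_model {
	uint8_t     vcpu_bo[VCPU_BO_SIZE];     /* firmware + stack + data */
	uint32_t    handles[MAX_VCE_HANDLES];  /* 0 means "slot is free" */
	const void *filp[MAX_VCE_HANDLES];     /* owner of each handle */
};

/*
 * Models the proposed amdgpu_vce_reload_fw(): clear the whole BO, copy
 * the firmware image back in, then forget every session handle.
 */
static void vce_model_reload_fw(struct vce_model *vce,
				const uint8_t *fw, size_t fw_size)
{
	size_t i;

	assert(fw_size <= VCPU_BO_SIZE);

	/* The kernel would use memset_io() on the mapped BO here. */
	memset(vce->vcpu_bo, 0, sizeof(vce->vcpu_bo));
	memcpy(vce->vcpu_bo, fw, fw_size);

	for (i = 0; i < MAX_VCE_HANDLES; ++i) {
		vce->handles[i] = 0;
		vce->filp[i] = NULL;
	}
}

/* Registers a new session handle; returns 0 on success, -1 if full. */
static int vce_model_open_session(struct vce_model *vce,
				  uint32_t handle, const void *filp)
{
	size_t i;

	for (i = 0; i < MAX_VCE_HANDLES; ++i) {
		if (vce->handles[i] == 0) {
			vce->handles[i] = handle;
			vce->filp[i] = filp;
			return 0;
		}
	}
	return -1;
}

/* Returns 0 if the handle is known and owned by filp, -1 otherwise. */
static int vce_model_validate(const struct vce_model *vce,
			      uint32_t handle, const void *filp)
{
	size_t i;

	if (handle == 0)
		return -1;

	for (i = 0; i < MAX_VCE_HANDLES; ++i) {
		if (vce->handles[i] == handle && vce->filp[i] == filp)
			return 0;
	}
	return -1;
}
```

In this model, a handle registered with vce_model_open_session() passes
vce_model_validate() until the next vce_model_reload_fw() call, after
which the same handle is rejected instead of reaching the hardware.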

> 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > index b9060bcd4806..e028ad0d3b7a 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> > @@ -187,6 +187,8 @@ int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size)
> >             return r;
> >     }
> >  
> > +   memset_io(adev->vce.cpu_addr, 0, size);
> > +
> >     for (i = 0; i < AMDGPU_MAX_VCE_HANDLES; ++i) {
> >             atomic_set(&adev->vce.handles[i], 0);
> >             adev->vce.filp[i] = NULL;
