On 09/23, Alex Deucher wrote:
> On Tue, Sep 23, 2025 at 5:12 PM Rodrigo Siqueira <[email protected]> wrote:
> >
> > When trying to unload amdgpu in the SteamDeck (TTY mode), the following
> > set of errors happens and the system gets unstable:
> >
> > [..]
> >  [drm] Initialized amdgpu 3.64.0 for 0000:04:00.0 on minor 0
> >  amdgpu 0000:04:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx_0.0.0 (-110).
> >  amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110).
> > [..]
> >  amdgpu 0000:04:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
> >  amdgpu 0000:04:00.0: amdgpu: Failed to disable gfxoff!
> >  amdgpu 0000:04:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
> >  amdgpu 0000:04:00.0: amdgpu: Failed to disable gfxoff!
> > [..]
> >
> > When the driver initializes the GPU, the PSP validates all the firmware
> > loaded, and after that, it is not possible to load any other firmware
> > unless the device is reset. What is happening in the load/unload
> > situation is that PSP halts the GC engine because it suspects that
> > something is amiss. To address this issue, this commit ensures that the
> > GPU is reset (mode 2 reset) in the load/unload sequence.
> >
> > Suggested-by: Alex Deucher <[email protected]>
> > Signed-off-by: Rodrigo Siqueira <[email protected]>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/nv.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
> > index 50e77d9b30af..1964aa37c499 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> > @@ -543,6 +543,13 @@ static bool nv_need_reset_on_init(struct amdgpu_device *adev)
> >  {
> >         u32 sol_reg;
> >
> > +       /* GFX in the SteamDeck hangs when amdgpu module is reloaded, since the
> > +        * firmware is already loaded. To avoid this issue, ensure that the
> > +        * device is reset to put the PSP in a good state.
> > +        */
> > +       if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(10, 3, 1))
> > +               return true;
> 
> This will force a reset every time the driver loads.  That will add a
> lot of latency to the driver load sequence.  I think it would be
> better to reset on unload or add a check to see if CP firmware is
> already loaded here so we only reset if the driver has been previously
> loaded.

Hi Alex,

Thanks for the feedback.

First, I tried to call amdgpu_asic_reset() in amdgpu_pci_remove(), and
then in amdgpu_device_fini_hw(). Something like this:

r = amdgpu_asic_reset(adev); // mode 2

However, this made things worse: the system hung, the SteamDeck fan
spun up to high speed, and then the system shut down. Given that, do
you have any suggestions on which stage of the unload path should
invoke the GPU reset? It feels like amdgpu_device_fini_hw() and
amdgpu_pci_remove() are already too late for it. Or maybe the reset
operation that I used was not the correct one?
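
For reference, the other option from the review (reset on init only
when the CP firmware is already loaded) could be modeled roughly as
below. This is a standalone sketch, not driver code: struct fake_dev,
its fields, and need_reset_on_init() are hypothetical names, and how
the driver would actually detect that the CP firmware is loaded is
still an open question in this thread.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device state; NOT struct amdgpu_device. */
struct fake_dev {
	bool is_apu;        /* models adev->flags & AMD_IS_APU */
	bool is_gc_10_3_1;  /* models the GC IP_VERSION(10, 3, 1) check */
	bool cp_fw_loaded;  /* assumption: some way exists to query this */
};

/*
 * Reset on init only if a previous driver instance left CP firmware
 * loaded, so a cold boot does not pay the reset latency.
 */
bool need_reset_on_init(const struct fake_dev *dev)
{
	if (dev->is_gc_10_3_1 && dev->cp_fw_loaded)
		return true;

	if (dev->is_apu)
		return false;

	/* The remaining checks in the real function are omitted here. */
	return false;
}
```

With this shape, the first load after boot would skip the reset, and
only a reload would trigger it.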

Thanks

> 
> Alex
> 
> > +
> >         if (adev->flags & AMD_IS_APU)
> >                 return false;
> >
> > --
> > 2.51.0
> >

-- 
Rodrigo Siqueira