On 09/23, Alex Deucher wrote:
> On Tue, Sep 23, 2025 at 5:12 PM Rodrigo Siqueira <[email protected]> wrote:
> >
> > When trying to unload amdgpu in the SteamDeck (TTY mode), the following
> > set of errors happens and the system gets unstable:
> >
> > [..]
> >  [drm] Initialized amdgpu 3.64.0 for 0000:04:00.0 on minor 0
> >  amdgpu 0000:04:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx_0.0.0 (-110).
> >  amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110).
> > [..]
> >  amdgpu 0000:04:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
> >  amdgpu 0000:04:00.0: amdgpu: Failed to disable gfxoff!
> >  amdgpu 0000:04:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
> >  amdgpu 0000:04:00.0: amdgpu: Failed to disable gfxoff!
> > [..]
> >
> > When the driver initializes the GPU, the PSP validates all the firmware
> > loaded, and after that, it is not possible to load any other firmware
> > unless the device is reset. What is happening in the load/unload
> > situation is that PSP halts the GC engine because it suspects that
> > something is amiss. To address this issue, this commit ensures that the
> > GPU is reset (mode 2 reset) in the load/unload sequence.
> >
> > Suggested-by: Alex Deucher <[email protected]>
> > Signed-off-by: Rodrigo Siqueira <[email protected]>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/nv.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
> > index 50e77d9b30af..1964aa37c499 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> > @@ -543,6 +543,13 @@ static bool nv_need_reset_on_init(struct amdgpu_device *adev)
> >  {
> >  	u32 sol_reg;
> >
> > +	/* GFX in the SteamDeck hangs when amdgpu module is reloaded, since the
> > +	 * firmware is already loaded. To avoid this issue, ensure that the
> > +	 * device is reset to put the PSP in a good state.
> > +	 */
> > +	if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(10, 3, 1))
> > +		return true;
>
> This will force a reset every time the driver loads. That will add a
> lot of latency to the driver load sequence. I think it would be
> better to reset on unload or add a check to see if CP firmware is
> already loaded here so we only reset if the driver has been previously
> loaded.
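
For reference, the second option (reset only when CP firmware is already loaded) could be modelled roughly as below. This is a standalone sketch that builds outside the kernel, not a proposed patch: struct sketch_dev, its fields, SKETCH_IP_VERSION() and sketch_need_reset_on_init() are all hypothetical stand-ins, and how the driver would actually detect that CP firmware is loaded is left open here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical version packing, only so the sketch builds standalone. */
#define SKETCH_IP_VERSION(maj, min, rev) \
	(((uint32_t)(maj) << 16) | ((uint32_t)(min) << 8) | (uint32_t)(rev))

/*
 * Hypothetical model of the state the check would look at.
 * None of these fields are real struct amdgpu_device members.
 */
struct sketch_dev {
	uint32_t gc_version;   /* GC IP version of the device */
	bool     is_apu;       /* stands in for adev->flags & AMD_IS_APU */
	bool     cp_fw_loaded; /* assumption: obtainable from the hardware */
};

static bool sketch_need_reset_on_init(const struct sketch_dev *dev)
{
	/*
	 * Reset the GC 10.3.1 part only when a previous driver load left
	 * CP firmware in place. A cold boot skips the reset, so it does
	 * not pay the extra load latency.
	 */
	if (dev->gc_version == SKETCH_IP_VERSION(10, 3, 1) && dev->cp_fw_loaded)
		return true;

	if (dev->is_apu)
		return false;

	/* The rest of the real function is not shown in the quoted hunk. */
	return false;
}
```

With this shape, a first load on the affected part returns false and a reload with firmware still resident returns true; other GC versions fall through to the existing checks.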

Hi Alex,

Thanks for the feedback.

First, I tried calling amdgpu_asic_reset() in amdgpu_pci_remove(), and
then in amdgpu_device_fini_hw(). Something like this:

 r = amdgpu_asic_reset(adev); // mode 2

However, that made things worse: the system hung, the SteamDeck fan spun
up to high speed, and then the system shut down.

Given that, do you have any suggestions on which stage of the unload
phase I should invoke the GPU reset from? It feels like
amdgpu_device_fini_hw() and amdgpu_pci_remove() are already too late to
invoke the GPU reset. Or maybe the reset operation that I used was not
the correct one?

Thanks

>
> Alex
>
> > +
> >  	if (adev->flags & AMD_IS_APU)
> >  		return false;
> >
> > --
> > 2.51.0
> >

-- 
Rodrigo Siqueira
