Hello Juerg,

On Thursday, January 20, 2022, at 12:32:48 -03, Juerg Haefliger wrote:
> If you want this fixed in Ubuntu I need to know what series are
> affected. Hirsute goes EOL at the end of the month. Are Impish and/or
> Jammy working or affected as well?

I upgraded to Impish a while ago. I haven’t seen “retry page fault” messages in a long while (I don’t think that’s related to the distro upgrade, but I’m not sure), so I’d say this particular bug is fixed, at least for me (I have a Picasso GPU).

Which is not to say that things are rosy, unfortunately. But the other issues I see don’t cause any message to appear in dmesg, so it’s hard to search for existing bug reports about them or to open a new one.

The following is off-topic for this bug report, but I’ll mention it anyway; I hope you’ll bear with me. One thing I noticed is that things did get rosy when I did two things:

1. Switched from Xorg to Wayland.
2. Switched Firefox to use Wayland as well.

This led me to the conclusion that the bugs that plague my machine are triggered by something that Firefox does when it uses X (either “natively” or via XWayland). For some reason, when it uses Wayland it doesn’t trigger these GPU bugs.

Another thing that might be relevant is that I have tons of tabs open (probably more than 200) distributed across 27 open windows. Perhaps I’m stressing some kind of resource limit in the driver or firmware?

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-firmware in Ubuntu.
https://bugs.launchpad.net/bugs/1928393

Title: linux-firmware 1.197 causes kernel to report error "amdgpu: [gfxhub0] retry page fault"

Status in amd: New
Status in linux-firmware package in Ubuntu: Invalid
Status in mesa package in Ubuntu: Invalid
Status in linux-firmware source package in Hirsute: Incomplete

Bug description:

After upgrading linux-firmware from 1.190.5 to 1.197 (as part of the upgrade from Ubuntu 20.10 to 21.04), I started experiencing frequent and severe GPU instability.

When this happens, I see this error in dmesg:

[20061.061069] amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 1141 thread Xorg:cs0 pid 1236)
[20061.061103] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x800000401000 from client 27
[20061.061135] amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
[20061.061147] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[20061.061157] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
[20061.061167] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[20061.061174] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[20061.061183] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[20061.061189] amdgpu 0000:03:00.0: amdgpu: RW: 0x0

I'll attach a couple of full dmesgs that I collected.

Often when this happens, the screen and keyboard freeze irreversibly (I tried waiting for more than 30 minutes, but it doesn't help). I can still log in via ssh though.

When there's no freeze, I can continue using the computer normally, but the laptop fans are always running and the battery depletes fast. There's probably something stuck in a permanent loop, either in the kernel or in the GPU.

This bug happens several times a day, rendering the machine so unstable as to be almost unusable. It is a severe regression and I'm aghast that it passed AMD's Quality Assurance.

After downgrading back to linux-firmware 1.190.5, the machine is back to its previous, mostly reliable state. Which is to say, this bug is gone, and I'm just left with the other amdgpu suspend bug I've learned to live with since I bought this computer.

Please revert the amdgpu firmware in this package as soon as possible. This is unbearable.

Relevant information:

Ubuntu version: 21.04
Linux kernel: 5.11.0-17-generic x86_64
CPU model: AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx
GPU: 03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Picasso (rev c1)
Laptop model: Lenovo Ideapad S145

To manage notifications about this bug go to:
https://bugs.launchpad.net/amd/+bug/1928393/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp