I don't think this patch is intended to fix a suspend/resume bug; that's
not the problem I was seeing, at any rate.  That said...

Does this bug apply to the GCP kernel?  I'm assuming GCP is "Google Cloud
Platform", so I'm not sure they are even using affected hardware.  AMD
CPUs are fairly power-efficient, so I could see Google using them, but
probably not the ones with a built-in GPU (in favor of getting ones with
more CPU cores instead).  I assume if they want to support CUDA-style
workloads they'd throw some monster GPUs into their "GPU compute" cloud
systems.  (All of that means that, for the GCP kernel's use case, it
should be fine either way, whether or not this patch is applied.)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1956401

Title:
  amdgpu hangs for 90 seconds at a time in 5.13.0-23, but 5.13.0-22
  works

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Impish:
  Fix Released
Status in linux source package in Jammy:
  Fix Released

Bug description:
  SRU Justification

  Impact:

  This does not occur with linux-image-5.13.0-22-generic, but does with
  linux-image-5.13.0-23-generic.
  On startup, I get about a 60 second hang, with the following in the kernel
  dmesg:
  Jan  4 15:26:36 inspiron-3505 kernel: [   34.160572] amdgpu 0000:04:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
  Jan  4 15:26:56 inspiron-3505 kernel: [   54.189055] amdgpu 0000:04:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
  Jan  4 15:27:16 inspiron-3505 kernel: [   74.329264] amdgpu 0000:04:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
  Jan  4 15:27:36 inspiron-3505 kernel: [   94.337904] amdgpu 0000:04:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
  I have the following GPU:
  04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Picasso (rev c2) (prog-if 00 [VGA controller])
  04:00.0 0300: 1002:15d8 (rev c2)
  (This is a Ryzen 5 3450U CPU with Radeon Vega Mobile.)
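
  As a rough illustration of how the log excerpt above can be checked,
  here is a minimal Python sketch that picks out the "failed to write
  reg" lines and measures the time between the first and last one.  The
  function names and the regular expression are assumptions made for
  this example, not part of any amdgpu or Ubuntu tooling, and the
  pattern is only known to fit the four lines quoted in this report.

```python
import re

# Hypothetical pattern for the register-write failure lines quoted in this
# report. The bracketed number is the kernel timestamp (seconds since boot).
FAIL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+amdgpu\s+(?P<dev>\S+):\s+.*?"
    r"failed to write reg (?P<reg>[0-9a-f]+) wait reg (?P<wait>[0-9a-f]+)"
)


def find_write_failures(log_text):
    """Return (timestamp, device, reg, wait_reg) for each failure line."""
    return [
        (float(m.group("ts")), m.group("dev"), m.group("reg"), m.group("wait"))
        for m in FAIL_RE.finditer(log_text)
    ]


def stall_seconds(failures):
    """Seconds between the first and last failure message (0.0 if < 2)."""
    if len(failures) < 2:
        return 0.0
    return failures[-1][0] - failures[0][0]


# The four lines from the bug description, unwrapped to one line each.
SAMPLE = """\
Jan  4 15:26:36 inspiron-3505 kernel: [   34.160572] amdgpu 0000:04:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
Jan  4 15:26:56 inspiron-3505 kernel: [   54.189055] amdgpu 0000:04:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
Jan  4 15:27:16 inspiron-3505 kernel: [   74.329264] amdgpu 0000:04:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
Jan  4 15:27:36 inspiron-3505 kernel: [   94.337904] amdgpu 0000:04:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
"""

failures = find_write_failures(SAMPLE)
print(f"{len(failures)} failures spanning {stall_seconds(failures):.1f} s")
```

  On the sample lines this reports four failures roughly 20 seconds
  apart, spanning about 60 seconds, which matches the hang described
  above.  Running it over a full saved dmesg or syslog file would be the
  same call with the file contents passed in as log_text.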

  I get a similar hang if I start firefox (when it's probing OpenGL
  contexts), and even with glxgears and glxinfo.  Seems like anything
  that'd kick on an OpenGL context does it.  I had a freeze as well when
  I tried running firefox and glxgears together, along with odd BUG:
  messages being logged (I have some in the attached log).

  I was running with "iommu=pt", but did try with this removed and still
  got the errors (I think the amdgpu driver uses the IOMMU even when it's
  set to iommu=pt, though).  See the attached log for some very odd
  "[Hardware Error]" messages that were logged on one test run.  I think
  this was when I tried to run Firestorm (a Second Life viewer); it
  paused for a long time, then opened to a black window.

  Per Google, I see there was a bug like this that turned up in kernel
  5.14.15 but was fixed in 5.14.17.  See
  https://gitlab.freedesktop.org/drm/amd/-/issues/1770

  Thanks!
  --Henry

  Fix:
  upstream commit afd18180c070 ("drm/amdkfd: fix boot failure when iommu is
  disabled in Picasso.")

  The patch was included in the Impish kernel in -proposed (5.13.0.24.24)
  from an upstream patch set.  There are multiple confirmations that the
  problem is resolved with the kernel in -proposed.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1956401/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
