On Thursday, 18 May 2023 12:52:24 CEST David Reviejo wrote:
> Seems to be an amdgpu bug introduced two or three kernel releases ago, as
> you can see googling around; for example here:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=2191739
>
> or here:
>
> https://gitlab.freedesktop.org/drm/amd/-/issues/2522
>
> If it's this bug, the fix seems to be in yesterday's latest upstream
> kernel update: 6.3.3 (and 6.1.29 for the stable longterm).

I _think_ I got the right commit for the 6.3 branch attached.

Para 4.5(.2) of the Debian Kernel Handbook describes how to test a simple patch:
https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#s-common-official

Could you try that and see whether it indeed fixes your issue?

From c5123c193696bf97fdf259c825ebfac517b54e44 Mon Sep 17 00:00:00 2001
From: Guchun Chen
Date: Sat, 6 May 2023 16:52:59 +0800
Subject: drm/amdgpu: disable sdma ecc irq only when sdma RAS is enabled in
 suspend

commit 8b229ada2669b74fdae06c83fbfda5a5a99fc253 upstream.

sdma_v4_0_ip is shared on a few asics, but in sdma_v4_0_hw_fini,
driver unconditionally disables ecc_irq which is only enabled on
those asics enabling sdma ecc. This will introduce a warning in
suspend cycle on those chips with sdma ip v4.0, while without
sdma ecc. So this patch correct this.
[ 7283.166354] RIP: 0010:amdgpu_irq_put+0x45/0x70 [amdgpu]
[ 7283.167001] RSP: 0018:9a5fc3967d08 EFLAGS: 00010246
[ 7283.167019] RAX: 98d88afd3770 RBX: 0001 RCX:
[ 7283.167023] RDX: RSI: 98d89da30390 RDI: 98d89da2
[ 7283.167025] RBP: 98d89da2 R08: 00036838 R09: 0006
[ 7283.167028] R10: d5764243c008 R11: R12: 98d89da30390
[ 7283.167030] R13: 98d89da38978 R14: 999ae15a R15: 98d880130105
[ 7283.167032] FS: () GS:98d996f0() knlGS:
[ 7283.167036] CS: 0010 DS: ES: CR0: 80050033
[ 7283.167039] CR2: f7a9d178 CR3: 0001c42ea000 CR4: 003506e0
[ 7283.167041] Call Trace:
[ 7283.167046]
[ 7283.167048] sdma_v4_0_hw_fini+0x38/0xa0 [amdgpu]
[ 7283.167704] amdgpu_device_ip_suspend_phase2+0x101/0x1a0 [amdgpu]
[ 7283.168296] amdgpu_device_suspend+0x103/0x180 [amdgpu]
[ 7283.168875] amdgpu_pmops_freeze+0x21/0x60 [amdgpu]
[ 7283.169464] pci_pm_freeze+0x54/0xc0
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522
Signed-off-by: Guchun Chen
Reviewed-by: Tao Zhou
Signed-off-by: Alex Deucher
Cc: sta...@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index b5affba221569..8b8ddf0502661 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -1903,9 +1903,11 @@ static int sdma_v4_0_hw_fini(void *handle)
 		return 0;
 	}
 
-	for (i = 0; i < adev->sdma.num_instances; i++) {
-		amdgpu_irq_put(adev, &adev->sdma.ecc_irq,
-			       AMDGPU_SDMA_IRQ_INSTANCE0 + i);
+	if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__SDMA)) {
+		for (i = 0; i < adev->sdma.num_instances; i++) {
+			amdgpu_irq_put(adev, &adev->sdma.ecc_irq,
+				       AMDGPU_SDMA_IRQ_INSTANCE0 + i);
+		}
 	}
 
 	sdma_v4_0_ctx_switch_enable(adev, false);
--
cgit
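
For anyone wondering what the patch above actually changes: as far as I can tell from the commit message, the ECC interrupt is only enabled on chips with SDMA ECC, yet the teardown path released it unconditionally, which is what triggers the warning in amdgpu_irq_put during suspend. The following is only a toy model of that imbalance, not driver code. All names in it (toy_irq_src, toy_irq_get, toy_irq_put, toy_hw_init, toy_hw_fini_unguarded, toy_hw_fini_guarded, toy_suspend_cycle) are made up for illustration, and the warning counter is an assumption standing in for the kernel warning.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an interrupt source with an enable refcount. */
struct toy_irq_src {
	int enabled_refs;	/* outstanding "get" calls */
	int warnings;		/* "put" calls that had no matching "get" */
};

/* Take a reference: models enabling the interrupt. */
static void toy_irq_get(struct toy_irq_src *src)
{
	src->enabled_refs++;
}

/* Drop a reference; an unbalanced put is recorded as a warning. */
static int toy_irq_put(struct toy_irq_src *src)
{
	if (src->enabled_refs == 0) {
		src->warnings++;
		return -1;
	}
	src->enabled_refs--;
	return 0;
}

/* Init path: the ECC interrupt is only taken when RAS is supported. */
static void toy_hw_init(struct toy_irq_src *ecc, bool ras_supported)
{
	if (ras_supported)
		toy_irq_get(ecc);
}

/* Teardown as before the patch: the put happens unconditionally. */
static void toy_hw_fini_unguarded(struct toy_irq_src *ecc)
{
	toy_irq_put(ecc);
}

/* Teardown as after the patch: put only when init did the get. */
static void toy_hw_fini_guarded(struct toy_irq_src *ecc, bool ras_supported)
{
	if (ras_supported)
		toy_irq_put(ecc);
}

/* Run one init/fini cycle and return the number of warnings raised. */
static int toy_suspend_cycle(bool ras_supported, bool guarded)
{
	struct toy_irq_src ecc = { 0, 0 };

	toy_hw_init(&ecc, ras_supported);
	if (guarded)
		toy_hw_fini_guarded(&ecc, ras_supported);
	else
		toy_hw_fini_unguarded(&ecc);
	return ecc.warnings;
}
```

In this model, only the combination "no RAS support, unguarded teardown" produces a warning, which matches the case the commit message describes (sdma ip v4.0 without sdma ecc). Guarding the put with the same condition as the get keeps the two balanced on every chip.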