On 5/14/23 21:30, Christian Gelinek wrote:

I've had 2 similar lockups that needed a front panel reset just in the last 2 weeks.
Something isn't right.
Hi,

I found my Debian system frozen this morning. This is the second time it has happened; the first was on April 10, with very similar symptoms: the PC was still running, but moving the mouse or typing didn't wake up my screens, and I couldn't connect to it via SSH.

After force-rebooting, I had a look at journalctl and these are the messages before the reboot:

May 14 00:00:09 gar systemd[1]: Starting cups.service - CUPS Scheduler...
May 14 00:00:09 gar audit[2912]: AVC apparmor="DENIED" operation="capable" profile="/usr/sbin/cupsd" pid=2912 comm="cupsd" capability=12  capname="net_admin"
May 14 00:00:09 gar systemd[1]: Started cups.service - CUPS Scheduler.
May 14 00:00:09 gar kernel: audit: type=1400 audit(1683988209.079:32): apparmor="DENIED" operation="capable" profile="/usr/sbin/cupsd" pid=2912 comm="cupsd" capability=12  capname="net_admin"
May 14 00:00:09 gar systemd[1]: Started cups-browsed.service - Make remote CUPS printers available locally.
May 14 00:00:09 gar systemd[1]: logrotate.service: Deactivated successfully.
May 14 00:00:09 gar systemd[1]: Finished logrotate.service - Rotate log files.
May 14 00:17:01 gar CRON[2929]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 14 00:17:01 gar CRON[2930]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 14 00:17:01 gar CRON[2929]: pam_unix(cron:session): session closed for user root
May 14 00:54:00 gar kernel: snd_hda_intel 0000:04:00.0: Unable to change power state from D3hot to D0, device inaccessible
May 14 00:54:03 gar kernel: [drm:fw_domains_get_with_fallback [i915]] *ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:03 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:07 gar kernel: [drm:fw_domains_get_with_fallback [i915]] *ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:07 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:11 gar kernel: hrtimer: interrupt took 252466383 ns
May 14 00:54:11 gar kernel: [drm:fw_domains_get_with_fallback [i915]] *ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:11 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:16 gar kernel: [drm:fw_domains_get_with_fallback [i915]] *ERROR* gt: timed out waiting for forcewake ack to clear.
May 14 00:54:16 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:17 gar kernel: i915 0000:03:00.0: [drm] *ERROR* CT: Corrupted descriptor head=4294967295 tail=4294967295 status=0xffffffff
May 14 00:54:26 gar kernel: [drm:fw_domains_get_with_fallback [i915]] *ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:26 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:26 gar kernel: [drm:fw_domains_get_with_fallback [i915]] *ERROR* gt: timed out waiting for forcewake ack to clear.
May 14 00:54:26 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:26 gar kernel: watchdog: BUG: soft lockup - CPU#15 stuck for 26s! [kworker/15:1:233]
May 14 00:54:26 gar kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill qrtr sunrpc binfmt_misc nls_ascii nls_cp437 vfat fat snd_sof_pci_>
May 14 00:54:26 gar kernel:  intel_uncore ee1004 pcspkr watchdog snd soundcore intel_vsec serial_multi_instantiate acpi_pad intel_pmc_core acpi_tad mei_me sg mei evdev parport_pc ppdev lp parport fuse loop efi_pstore configfs efivarfs ip_tables x_tables autof>
May 14 00:54:26 gar kernel: CPU: 15 PID: 233 Comm: kworker/15:1 Tainted: G     U  W          6.1.0-8-amd64 #1  Debian 6.1.25-1
May 14 00:54:26 gar kernel: Hardware name: Micro-Star International Co., Ltd. MS-7E02/PRO B760M-P DDR4 (MS-7E02), BIOS 1.00 10/21/2022
May 14 00:54:26 gar kernel: Workqueue: pm pm_runtime_work
May 14 00:54:26 gar kernel: RIP: 0010:pci_mmcfg_read+0xb0/0xe0
May 14 00:54:26 gar kernel: Code: 5d 41 5e 41 5f c3 cc cc cc cc 4c 01 e0 66 8b 00 0f b7 c0 89 45 00 eb dc 4c 01 e0 8a 00 0f b6 c0 89 45 00 eb cf 4c 01 e0 8b 00 <89> 45 00 eb c5 e8 66 a2 78 ff c7 45 00 ff ff ff ff b8 ea ff ff ff
May 14 00:54:26 gar kernel: RSP: 0018:ffffa9d000947cc0 EFLAGS: 00000286
May 14 00:54:26 gar kernel: RAX: 00000000ffffffff RBX: 0000000000400000 RCX: 0000000000000ffc
May 14 00:54:26 gar kernel: RDX: 00000000000000ff RSI: 0000000000000004 RDI: 0000000000000000
May 14 00:54:26 gar kernel: RBP: ffffa9d000947cfc R08: 0000000000000004 R09: ffffa9d000947cfc
May 14 00:54:26 gar kernel: R10: 0000000000000004 R11: ffffffffbb7a6b80 R12: 0000000000000ffc
May 14 00:54:26 gar kernel: R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000000
May 14 00:54:26 gar kernel: FS:  0000000000000000(0000) GS:ffff967f1fbc0000(0000) knlGS:0000000000000000
May 14 00:54:26 gar kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 14 00:54:26 gar kernel: CR2: 000055ba02054018 CR3: 0000000109b4c004 CR4: 0000000000770ee0
May 14 00:54:26 gar kernel: PKRU: 55555554
May 14 00:54:26 gar kernel: Call Trace:
May 14 00:54:26 gar kernel:  <TASK>
May 14 00:54:26 gar kernel:  pci_bus_read_config_dword+0x46/0x80
May 14 00:54:26 gar kernel:  pci_find_next_ext_capability+0x82/0xe0
May 14 00:54:26 gar kernel:  ? pci_conf1_read+0x9b/0xf0
May 14 00:54:26 gar kernel:  pci_restore_state.part.0+0x5d/0x3a0
May 14 00:54:26 gar kernel:  pci_pm_runtime_resume+0x41/0xe0
May 14 00:54:26 gar kernel:  ? pci_pm_restore_noirq+0xc0/0xc0
May 14 00:54:26 gar kernel:  __rpm_callback+0x41/0x170
May 14 00:54:26 gar kernel:  ? pci_pm_restore_noirq+0xc0/0xc0
May 14 00:54:26 gar kernel:  rpm_callback+0x5d/0x70
May 14 00:54:26 gar kernel:  ? pci_pm_restore_noirq+0xc0/0xc0
May 14 00:54:26 gar kernel:  rpm_resume+0x5df/0x820
May 14 00:54:26 gar kernel:  pm_runtime_work+0x6c/0xa0
May 14 00:54:26 gar kernel:  process_one_work+0x1c4/0x380
May 14 00:54:26 gar kernel:  worker_thread+0x4d/0x380
May 14 00:54:26 gar kernel:  ? _raw_spin_lock_irqsave+0x23/0x50
May 14 00:54:26 gar kernel:  ? rescuer_thread+0x3a0/0x3a0
May 14 00:54:26 gar kernel:  kthread+0xe6/0x110
May 14 00:54:26 gar kernel:  ? kthread_complete_and_exit+0x20/0x20
May 14 00:54:26 gar kernel:  ret_from_fork+0x1f/0x30
May 14 00:54:26 gar kernel:  </TASK>
-- Boot 846264f027214bbfbb81c66db4ff1c81 --

It seems to be an issue with the i915 driver, potentially triggered by snd_hda_intel.
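For anyone wanting to check their own logs for the same pattern, here is a rough sketch of how the relevant lines could be pulled out of the previous boot's kernel messages. The function name `filter_gpu_lockup_lines` is made up for illustration, and the match pattern is only an assumption based on the messages quoted above; it may well miss other relevant lines.

```shell
#!/bin/sh
# Keep only kernel log lines that mention the two drivers or the soft lockup.
# Reads log text on stdin, so it works on journalctl output or a saved file.
# NOTE: the pattern is an assumption drawn from the log excerpt above.
filter_gpu_lockup_lines() {
    grep -E 'i915|snd_hda_intel|soft lockup'
}

# Typical use against the previous boot (-b -1), kernel messages only (-k):
#   journalctl -k -b -1 --no-pager | filter_gpu_lockup_lines
```

Reading from stdin keeps the filter independent of where the log comes from, so the same function can be pointed at a saved copy of the journal from the April 10 freeze.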

`sudo lspci -v` reports (among others):

03:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A750] (rev 08) (prog-if 00 [VGA controller])
         Subsystem: Intel Corporation DG2 [Arc A750]
         Flags: bus master, fast devsel, latency 0, IRQ 153, IOMMU group 14
         Memory at 80000000 (64-bit, non-prefetchable) [size=16M]
         Memory at 4000000000 (64-bit, prefetchable) [size=8G]
         Expansion ROM at 81000000 [disabled] [size=2M]
         Capabilities: [40] Vendor Specific Information: Len=0c <?>
         Capabilities: [70] Express Endpoint, MSI 00
         Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
         Capabilities: [d0] Power Management version 3
         Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
         Capabilities: [420] Physical Resizable BAR
         Capabilities: [400] Latency Tolerance Reporting
         Kernel driver in use: i915
         Kernel modules: i915

00:1f.3 Audio device: Intel Corporation Device 7a50 (rev 11)
         DeviceName: Onboard - Sound
         Subsystem: Micro-Star International Co., Ltd. [MSI] Device 9e02
         Flags: bus master, fast devsel, latency 32, IRQ 158, IOMMU group 10
         Memory at 4200920000 (64-bit, non-prefetchable) [size=16K]
         Memory at 4200800000 (64-bit, non-prefetchable) [size=1M]
         Capabilities: [50] Power Management version 3
         Capabilities: [80] Vendor Specific Information: Len=14 <?>
         Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
         Kernel driver in use: snd_hda_intel
         Kernel modules: snd_hda_intel, snd_sof_pci_intel_tgl

I'm using firmware-misc-nonfree version 20230210-4, and `sudo dmesg | grep i915` returns:

[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.1.0-8-amd64 root=/dev/mapper/gar--vg-root ro quiet i915.force_probe=56a1
[    0.018130] Kernel command line: BOOT_IMAGE=/vmlinuz-6.1.0-8-amd64 root=/dev/mapper/gar--vg-root ro quiet i915.force_probe=56a1
[    1.379955] i915 0000:03:00.0: [drm] Incompatible option enable_guc=3 - HuC is not supported!
[    1.380780] i915 0000:03:00.0: [drm] VT-d active for gfx access
[    1.380845] i915 0000:03:00.0: vgaarb: deactivate vga console
[    1.380869] i915 0000:03:00.0: [drm] Local memory IO size: 0x00000001fc000000
[    1.380870] i915 0000:03:00.0: [drm] Local memory available: 0x00000001fc000000
[    1.393505] i915 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[    1.393643] i915 0000:03:00.0: firmware: direct-loading firmware i915/dg2_dmc_ver2_07.bin
[    1.396144] i915 0000:03:00.0: [drm] Finished loading DMC firmware i915/dg2_dmc_ver2_07.bin (v2.7)
[    1.404739] i915 0000:03:00.0: firmware: direct-loading firmware i915/dg2_guc_70.bin
[    1.484762] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist Class(1):Compute(4)!
[    1.484763] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist Instance(2):Compute(4)!
[    1.487222] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist Class(1):Compute(4)!
[    1.487223] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist Instance(2):Compute(4)!
[    1.488237] i915 0000:03:00.0: [drm] GuC firmware i915/dg2_guc_70.bin version 70.5.1
[    1.488347] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist Class(1):Compute(4)!
[    1.488348] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist Instance(2):Compute(4)!
[    1.500565] i915 0000:03:00.0: [drm] GuC submission enabled
[    1.500565] i915 0000:03:00.0: [drm] GuC SLPC enabled
[    1.500891] i915 0000:03:00.0: [drm] GuC RC: enabled
[    1.521026] [drm] Initialized i915 1.6.0 20201103 for 0000:03:00.0 on minor 0
[    2.234182] fbcon: i915drmfb (fb0) is primary device
[    2.326912] i915 0000:03:00.0: [drm] fb0: i915drmfb frame buffer device
[    4.824372] snd_hda_intel 0000:04:00.0: bound 0000:03:00.0 (ops i915_audio_component_bind_ops [i915])

Is anyone else seeing a similar problem? What can I do to avoid this? Do we need anything else to narrow it down further?
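Since the call trace goes through pm_runtime_work and pci_pm_runtime_resume, one extra data point that might help is the runtime power-management state of the two devices named in the log (0000:03:00.0 and 0000:04:00.0). Below is a minimal sketch, assuming the standard sysfs files `power/control` and `power/runtime_status` are present; `show_runtime_pm` is a hypothetical helper name, and the second argument exists only so the function can be pointed at a different root directory.

```shell
#!/bin/sh
# Print the runtime power-management setting and state for one PCI device.
#   $1: PCI address, e.g. 0000:03:00.0
#   $2: sysfs root (optional, defaults to /sys; an assumption for testability)
show_runtime_pm() {
    dev_dir="${2:-/sys}/bus/pci/devices/$1/power"
    for f in control runtime_status; do
        if [ -r "$dev_dir/$f" ]; then
            printf '%s %s=%s\n' "$1" "$f" "$(cat "$dev_dir/$f")"
        else
            # File missing or unreadable: report it rather than fail.
            printf '%s %s=unavailable\n' "$1" "$f"
        fi
    done
}

# Example for the GPU and its audio function from the log above:
#   show_runtime_pm 0000:03:00.0
#   show_runtime_pm 0000:04:00.0
```

This only reports state; whether changing any of these settings would avoid the freeze is not something the logs above establish.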

Thanks for your time!

Cheers, Gene Heskett.
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Genes Web page <http://geneslinuxbox.net:6309/>