On 5/14/23 21:30, Christian Gelinek wrote:
I've had 2 similar lockups that needed a front panel reset just in the
last 2 weeks.
Something isn't right.
Hi,
I found my Debian system frozen this morning. This is the second time
this has happened; the first was on April 10, with very similar
symptoms: the PC was still running, but moving the mouse or typing
didn't wake up my screens, and I couldn't connect to it via SSH.
After force-rebooting, I had a look at journalctl and these are the
messages before the reboot:
May 14 00:00:09 gar systemd[1]: Starting cups.service - CUPS Scheduler...
May 14 00:00:09 gar audit[2912]: AVC apparmor="DENIED"
operation="capable" profile="/usr/sbin/cupsd" pid=2912 comm="cupsd"
capability=12 capname="net_admin"
May 14 00:00:09 gar systemd[1]: Started cups.service - CUPS Scheduler.
May 14 00:00:09 gar kernel: audit: type=1400 audit(1683988209.079:32):
apparmor="DENIED" operation="capable" profile="/usr/sbin/cupsd" pid=2912
comm="cupsd" capability=12 capname="net_admin"
May 14 00:00:09 gar systemd[1]: Started cups-browsed.service - Make
remote CUPS printers available locally.
May 14 00:00:09 gar systemd[1]: logrotate.service: Deactivated
successfully.
May 14 00:00:09 gar systemd[1]: Finished logrotate.service - Rotate log
files.
May 14 00:17:01 gar CRON[2929]: pam_unix(cron:session): session opened
for user root(uid=0) by (uid=0)
May 14 00:17:01 gar CRON[2930]: (root) CMD (cd / && run-parts --report
/etc/cron.hourly)
May 14 00:17:01 gar CRON[2929]: pam_unix(cron:session): session closed
for user root
May 14 00:54:00 gar kernel: snd_hda_intel 0000:04:00.0: Unable to change
power state from D3hot to D0, device inaccessible
May 14 00:54:03 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:03 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:07 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:07 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:11 gar kernel: hrtimer: interrupt took 252466383 ns
May 14 00:54:11 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:11 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:16 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* gt: timed out waiting for forcewake ack to clear.
May 14 00:54:16 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:17 gar kernel: i915 0000:03:00.0: [drm] *ERROR* CT:
Corrupted descriptor head=4294967295 tail=4294967295 status=0xffffffff
May 14 00:54:26 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:26 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:26 gar kernel: [drm:fw_domains_get_with_fallback [i915]]
*ERROR* gt: timed out waiting for forcewake ack to clear.
May 14 00:54:26 gar kernel: i915 0000:03:00.0: [drm:add_taint_for_CI
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:26 gar kernel: watchdog: BUG: soft lockup - CPU#15 stuck
for 26s! [kworker/15:1:233]
May 14 00:54:26 gar kernel: Modules linked in: snd_seq_dummy snd_hrtimer
snd_seq snd_seq_device nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4
dns_resolver nfs lockd grace fscache netfs rfkill qrtr sunrpc
binfmt_misc nls_ascii nls_cp437 vfat fat snd_sof_pci_>
May 14 00:54:26 gar kernel: intel_uncore ee1004 pcspkr watchdog snd
soundcore intel_vsec serial_multi_instantiate acpi_pad intel_pmc_core
acpi_tad mei_me sg mei evdev parport_pc ppdev lp parport fuse loop
efi_pstore configfs efivarfs ip_tables x_tables autof>
May 14 00:54:26 gar kernel: CPU: 15 PID: 233 Comm: kworker/15:1 Tainted:
G U W 6.1.0-8-amd64 #1 Debian 6.1.25-1
May 14 00:54:26 gar kernel: Hardware name: Micro-Star International Co.,
Ltd. MS-7E02/PRO B760M-P DDR4 (MS-7E02), BIOS 1.00 10/21/2022
May 14 00:54:26 gar kernel: Workqueue: pm pm_runtime_work
May 14 00:54:26 gar kernel: RIP: 0010:pci_mmcfg_read+0xb0/0xe0
May 14 00:54:26 gar kernel: Code: 5d 41 5e 41 5f c3 cc cc cc cc 4c 01 e0
66 8b 00 0f b7 c0 89 45 00 eb dc 4c 01 e0 8a 00 0f b6 c0 89 45 00 eb cf
4c 01 e0 8b 00 <89> 45 00 eb c5 e8 66 a2 78 ff c7 45 00 ff ff ff ff b8
ea ff ff ff
May 14 00:54:26 gar kernel: RSP: 0018:ffffa9d000947cc0 EFLAGS: 00000286
May 14 00:54:26 gar kernel: RAX: 00000000ffffffff RBX: 0000000000400000
RCX: 0000000000000ffc
May 14 00:54:26 gar kernel: RDX: 00000000000000ff RSI: 0000000000000004
RDI: 0000000000000000
May 14 00:54:26 gar kernel: RBP: ffffa9d000947cfc R08: 0000000000000004
R09: ffffa9d000947cfc
May 14 00:54:26 gar kernel: R10: 0000000000000004 R11: ffffffffbb7a6b80
R12: 0000000000000ffc
May 14 00:54:26 gar kernel: R13: 0000000000000000 R14: 0000000000000004
R15: 0000000000000000
May 14 00:54:26 gar kernel: FS: 0000000000000000(0000)
GS:ffff967f1fbc0000(0000) knlGS:0000000000000000
May 14 00:54:26 gar kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
May 14 00:54:26 gar kernel: CR2: 000055ba02054018 CR3: 0000000109b4c004
CR4: 0000000000770ee0
May 14 00:54:26 gar kernel: PKRU: 55555554
May 14 00:54:26 gar kernel: Call Trace:
May 14 00:54:26 gar kernel: <TASK>
May 14 00:54:26 gar kernel: pci_bus_read_config_dword+0x46/0x80
May 14 00:54:26 gar kernel: pci_find_next_ext_capability+0x82/0xe0
May 14 00:54:26 gar kernel: ? pci_conf1_read+0x9b/0xf0
May 14 00:54:26 gar kernel: pci_restore_state.part.0+0x5d/0x3a0
May 14 00:54:26 gar kernel: pci_pm_runtime_resume+0x41/0xe0
May 14 00:54:26 gar kernel: ? pci_pm_restore_noirq+0xc0/0xc0
May 14 00:54:26 gar kernel: __rpm_callback+0x41/0x170
May 14 00:54:26 gar kernel: ? pci_pm_restore_noirq+0xc0/0xc0
May 14 00:54:26 gar kernel: rpm_callback+0x5d/0x70
May 14 00:54:26 gar kernel: ? pci_pm_restore_noirq+0xc0/0xc0
May 14 00:54:26 gar kernel: rpm_resume+0x5df/0x820
May 14 00:54:26 gar kernel: pm_runtime_work+0x6c/0xa0
May 14 00:54:26 gar kernel: process_one_work+0x1c4/0x380
May 14 00:54:26 gar kernel: worker_thread+0x4d/0x380
May 14 00:54:26 gar kernel: ? _raw_spin_lock_irqsave+0x23/0x50
May 14 00:54:26 gar kernel: ? rescuer_thread+0x3a0/0x3a0
May 14 00:54:26 gar kernel: kthread+0xe6/0x110
May 14 00:54:26 gar kernel: ? kthread_complete_and_exit+0x20/0x20
May 14 00:54:26 gar kernel: ret_from_fork+0x1f/0x30
May 14 00:54:26 gar kernel: </TASK>
-- Boot 846264f027214bbfbb81c66db4ff1c81 --
It seems to be an issue with the i915 driver, potentially triggered by
snd_hda_intel.
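The ordering behind that guess can be checked mechanically. Below is a
minimal sketch, assuming the journal was saved as text with one entry per
line (not re-wrapped by a mail client). The helper name `first_mentions`
and the `SAMPLE` excerpt layout are made up for illustration; the two
kernel lines in `SAMPLE` are copied from the log above.

```python
import re

# Matches journalctl short-format kernel lines, e.g.
# "May 14 00:54:00 gar kernel: snd_hda_intel 0000:04:00.0: Unable to ..."
KERNEL_LINE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: (?P<msg>.*)$"
)


def first_mentions(journal_text, drivers=("snd_hda_intel", "i915")):
    """Return {driver: timestamp of the first kernel line mentioning it}."""
    seen = {}
    for line in journal_text.splitlines():
        m = KERNEL_LINE.match(line)
        if not m:
            continue  # skip non-kernel entries and continuation lines
        for name in drivers:
            if name in m["msg"] and name not in seen:
                seen[name] = m["stamp"]
    return seen


# Two entries from the journal excerpt above, joined back onto one line each.
SAMPLE = (
    "May 14 00:54:00 gar kernel: snd_hda_intel 0000:04:00.0: Unable to "
    "change power state from D3hot to D0, device inaccessible\n"
    "May 14 00:54:03 gar kernel: [drm:fw_domains_get_with_fallback [i915]] "
    "*ERROR* render: timed out waiting for forcewake ack to clear.\n"
)

if __name__ == "__main__":
    for driver, stamp in first_mentions(SAMPLE).items():
        print(f"{driver}: first kernel message at {stamp}")
```

For the excerpt above, the snd_hda_intel message (00:54:00) precedes the
first i915 message (00:54:03), which is only an ordering, not proof of
cause.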
`sudo lspci -v` reports (among others):
03:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A750] (rev
08) (prog-if 00 [VGA controller])
Subsystem: Intel Corporation DG2 [Arc A750]
Flags: bus master, fast devsel, latency 0, IRQ 153, IOMMU group 14
Memory at 80000000 (64-bit, non-prefetchable) [size=16M]
Memory at 4000000000 (64-bit, prefetchable) [size=8G]
Expansion ROM at 81000000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Capabilities: [d0] Power Management version 3
Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
Capabilities: [420] Physical Resizable BAR
Capabilities: [400] Latency Tolerance Reporting
Kernel driver in use: i915
Kernel modules: i915
00:1f.3 Audio device: Intel Corporation Device 7a50 (rev 11)
DeviceName: Onboard - Sound
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 9e02
Flags: bus master, fast devsel, latency 32, IRQ 158, IOMMU
group 10
Memory at 4200920000 (64-bit, non-prefetchable) [size=16K]
Memory at 4200800000 (64-bit, non-prefetchable) [size=1M]
Capabilities: [50] Power Management version 3
Capabilities: [80] Vendor Specific Information: Len=14 <?>
Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel, snd_sof_pci_intel_tgl
I'm using firmware-misc-nonfree version 20230210-4.
`sudo dmesg | grep i915` returns:
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.1.0-8-amd64
root=/dev/mapper/gar--vg-root ro quiet i915.force_probe=56a1
[ 0.018130] Kernel command line: BOOT_IMAGE=/vmlinuz-6.1.0-8-amd64
root=/dev/mapper/gar--vg-root ro quiet i915.force_probe=56a1
[ 1.379955] i915 0000:03:00.0: [drm] Incompatible option enable_guc=3
- HuC is not supported!
[ 1.380780] i915 0000:03:00.0: [drm] VT-d active for gfx access
[ 1.380845] i915 0000:03:00.0: vgaarb: deactivate vga console
[ 1.380869] i915 0000:03:00.0: [drm] Local memory IO size:
0x00000001fc000000
[ 1.380870] i915 0000:03:00.0: [drm] Local memory available:
0x00000001fc000000
[ 1.393505] i915 0000:03:00.0: vgaarb: changed VGA decodes:
olddecodes=io+mem,decodes=io+mem:owns=none
[ 1.393643] i915 0000:03:00.0: firmware: direct-loading firmware
i915/dg2_dmc_ver2_07.bin
[ 1.396144] i915 0000:03:00.0: [drm] Finished loading DMC firmware
i915/dg2_dmc_ver2_07.bin (v2.7)
[ 1.404739] i915 0000:03:00.0: firmware: direct-loading firmware
i915/dg2_guc_70.bin
[ 1.484762] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist
Class(1):Compute(4)!
[ 1.484763] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist
Instance(2):Compute(4)!
[ 1.487222] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist
Class(1):Compute(4)!
[ 1.487223] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist
Instance(2):Compute(4)!
[ 1.488237] i915 0000:03:00.0: [drm] GuC firmware i915/dg2_guc_70.bin
version 70.5.1
[ 1.488347] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist
Class(1):Compute(4)!
[ 1.488348] i915 0000:03:00.0: [drm] Missing GuC-Err-Cap reglist
Instance(2):Compute(4)!
[ 1.500565] i915 0000:03:00.0: [drm] GuC submission enabled
[ 1.500565] i915 0000:03:00.0: [drm] GuC SLPC enabled
[ 1.500891] i915 0000:03:00.0: [drm] GuC RC: enabled
[ 1.521026] [drm] Initialized i915 1.6.0 20201103 for 0000:03:00.0 on
minor 0
[ 2.234182] fbcon: i915drmfb (fb0) is primary device
[ 2.326912] i915 0000:03:00.0: [drm] fb0: i915drmfb frame buffer device
[ 4.824372] snd_hda_intel 0000:04:00.0: bound 0000:03:00.0 (ops
i915_audio_component_bind_ops [i915])
Is anyone else seeing a similar problem? What can I do to avoid this? Do
we need anything else to narrow it down further?
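One more data point that might help: the call trace runs through
pm_runtime_work and pci_pm_runtime_resume, and the dmesg output shows
snd_hda_intel 0000:04:00.0 bound to the GPU at 0000:03:00.0, so the
runtime power-management state of those two PCI functions could be
recorded. A minimal sketch, assuming the standard sysfs attributes
`power/control` and `power/runtime_status`; the helper name
`runtime_pm_state` and its `sysfs_root` parameter are made up for
illustration.

```python
from pathlib import Path


def runtime_pm_state(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Read the runtime-PM attributes sysfs exposes for one PCI device."""
    power_dir = Path(sysfs_root) / pci_addr / "power"
    state = {}
    for attr in ("control", "runtime_status"):
        path = power_dir / attr
        # An attribute can be missing, e.g. if the device is not present.
        state[attr] = path.read_text().strip() if path.is_file() else None
    return state


if __name__ == "__main__":
    # Addresses taken from the lspci and dmesg output above.
    for addr in ("0000:03:00.0", "0000:04:00.0"):
        print(addr, runtime_pm_state(addr))
```

This only reads the values; it does not change any power setting.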
Thanks for your time!
Cheers, Gene Heskett.
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
- Louis D. Brandeis
Genes Web page <http://geneslinuxbox.net:6309/>