Re: Bookworm soft lockup

2023-05-30 Thread Christian Gelinek

On Wed, 17 May 2023 02:12:32 +0100, Philip Wyett wrote:


A little research shows that this is not that uncommon. A suggested 
workaround is to disable the

power management for the device as follows.

Create a file (such as): 
/etc/modprobe.d/snd-intel-disable-power-management.conf


Add the following line: options snd_hda_intel power_save=0

Reboot.

Hopefully this may assist.


I tried:

$ ls -l /etc/modprobe.d/snd-intel-disable-power-management.conf
-rw-r- 1 root root 36 May 23 10:01 
/etc/modprobe.d/snd-intel-disable-power-management.conf


$ sudo cat /etc/modprobe.d/snd-intel-disable-power-management.conf
options snd_hda_intel power_save=0

Nevertheless, this morning I arrived to the locked-up state. Journalctl 
output:


May 31 04:17:01 gar CRON[2175]: (root) CMD (cd / && run-parts --report 
/etc/cron.hourly)
May 31 04:17:01 gar CRON[2174]: pam_unix(cron:session): session closed 
for user root
May 31 05:12:00 gar kernel: snd_hda_intel :04:00.0: Unable to change 
power state from D3hot to D0, device inaccessible
May 31 05:12:03 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* render: timed out waiting for forcewake ack to clear.
May 31 05:12:03 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x>
May 31 05:12:07 gar kernel: INFO: NMI handler (perf_event_nmi_handler) 
took too long to run: 84.155 msecs
May 31 05:12:07 gar kernel: perf: interrupt took too long (657458 > 
2500), lowering kernel.perf_event_max_sample_rate to 250
May 31 05:12:07 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* render: timed out waiting for forcewake ack to clear.
May 31 05:12:07 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x>
May 31 05:12:11 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* render: timed out waiting for forcewake ack to clear.
May 31 05:12:11 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x>
May 31 05:12:15 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* gt: timed out waiting for forcewake ack to clear.
May 31 05:12:15 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x>
May 31 05:12:16 gar kernel: i915 :03:00.0: [drm] *ERROR* CT: 
Corrupted descriptor head=4294967295 tail=4294967295 status=0xf>
May 31 05:12:27 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* render: timed out waiting for forcewake ack to clear.
May 31 05:12:27 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x>
May 31 05:12:27 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* gt: timed out waiting for forcewake ack to clear.
May 31 05:12:27 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x>
May 31 05:12:27 gar kernel: watchdog: BUG: soft lockup - CPU#14 stuck 
for 26s! [kworker/14:2:2123]
May 31 05:12:27 gar kernel: Modules linked in: snd_seq_dummy snd_hrtimer 
snd_seq snd_seq_device nfsv3 nfs_acl rpcsec_gss_krb5 auth_r>
May 31 05:12:27 gar kernel:  intel_uncore wmi_bmof ee1004 pcspkr 
watchdog soundcore intel_vsec serial_multi_instantiate intel_pmc_co>
May 31 05:12:27 gar kernel: CPU: 14 PID: 2123 Comm: kworker/14:2 
Tainted: G U  W  6.1.0-9-amd64 #1  Debian 6.1.27-1
May 31 05:12:27 gar kernel: Hardware name: Micro-Star International Co., 
Ltd. MS-7E02/PRO B760M-P DDR4 (MS-7E02), BIOS 1.00 10/21/20>

May 31 05:12:27 gar kernel: Workqueue: pm pm_runtime_work
May 31 05:12:27 gar kernel: RIP: 0010:pci_mmcfg_read+0xb0/0xe0
May 31 05:12:27 gar kernel: Code: 5d 41 5e 41 5f c3 cc cc cc cc 4c 01 e0 
66 8b 00 0f b7 c0 89 45 00 eb dc 4c 01 e0 8a 00 0f b6 c0 89>

May 31 05:12:27 gar kernel: RSP: 0018:b5ce87b9bcc0 EFLAGS: 0286
May 31 05:12:27 gar kernel: RAX:  RBX: 0040 
RCX: 0ffc
May 31 05:12:27 gar kernel: RDX: 00ff RSI: 0004 
RDI: 
May 31 05:12:27 gar kernel: RBP: b5ce87b9bcfc R08: 0004 
R09: b5ce87b9bcfc
May 31 05:12:27 gar kernel: R10: 0004 R11: b19a70a0 
R12: 0ffc
May 31 05:12:27 gar kernel: R13:  R14: 0004 
R15: 
May 31 05:12:27 gar kernel: FS:  () 
GS:9c641fb8() knlGS:
May 31 05:12:27 gar kernel: CS:  0010 DS:  ES:  CR0: 
80050033
May 31 05:12:27 gar kernel: CR2: 7f78df2005c9 CR3: 00010571e005 
CR4: 00770ee0

May 31 05:12:27 gar kernel: PKRU: 5554
May 31 05:12:27 gar kernel: Call Trace:
May 31 05:12:27 gar kernel:  
May 31 05:12:27 gar kernel:  pci_bus_read_config_dword+0x46/0x80
May 31 05:12:27 gar kernel:  pci_find_next_ext_capability+0x82/0xe

Re: Bookworm soft lockup

2023-05-22 Thread Christian Gelinek
It happened again (see journal entries below), so I'm going to try your 
workaround and see if that helps.


On Wed, 17 May 2023 02:12:32 +0100, Philip Wyett wrote:

A little research shows that this is not that uncommon. A suggested 
workaround is to disable the

power management for the device as follows.

Create a file (such as): 
/etc/modprobe.d/snd-intel-disable-power-management.conf


Add the following line: options snd_hda_intel power_save=0

Reboot.

Hopefully this may assist.


Thanks for the suggestion.

Journal of the most recent lockup below:

May 19 23:17:01 gar CRON[1902]: pam_unix(cron:session): session opened 
for user root(uid=0) by (uid=0)
May 19 23:17:01 gar CRON[1903]: (root) CMD (cd / && run-parts --report 
/etc/cron.hourly)
May 19 23:17:01 gar CRON[1902]: pam_unix(cron:session): session closed 
for user root
May 19 23:19:00 gar kernel: snd_hda_intel :04:00.0: Unable to change 
power state from D3hot to D0, device inaccessible
May 19 23:19:03 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* render: timed out waiting for forcewake ack to clear.
May 19 23:19:03 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 19 23:19:07 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* render: timed out waiting for forcewake ack to clear.
May 19 23:19:07 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]

May 19 23:19:09 gar kernel: hrtimer: interrupt took 252466511 ns
May 19 23:19:11 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* render: timed out waiting for forcewake ack to clear.
May 19 23:19:11 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 19 23:19:15 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* gt: timed out waiting for forcewake ack to clear.
May 19 23:19:15 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 19 23:19:16 gar kernel: i915 :03:00.0: [drm] *ERROR* CT: 
Corrupted descriptor head=4294967295 tail=4294967295 status=0x

-- Boot 4540f787dd2341cea70a75aac62b1843 --
May 23 09:24:43 gar kernel: Linux version 6.1.0-9-amd64 
(debian-ker...@lists.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU 
ld (GNU Binutils for Debian) 2.40) #1 SMP P>


Thank you for your time!

Christian



Re: Bookworm soft lockup

2023-05-16 Thread Xiyue Deng

Philip Wyett  writes:

> [[PGP Signed Part:Undecided]]
> On Wed, 2023-05-17 at 09:51 +0930, Christian Gelinek wrote:
>> On Mon, 15 May 2023 18:30:31, David wrote:
>> > On Mon, 2023-05-15 at 11:17 +0300, Anssi Saari wrote:
>> > > Christian Gelinek  writes:
>> > > 
>> > > > Is anyone else seeing a similar problem? What can I do to avoid
>> > > > this?
>> > > > Do we need anything else to narrow it down further?
>> > > 
>> > > Only time I've seen a soft lockup was from a bad CPU. There were a
>> > > bunch
>> > > of them and eventually the computer hung. Going back to the slow
>> > > plodding Celeron fixed all issues. Except CPU performance of course.
>> > 
>> > It's happened to me a couple of times, but only since I switched from
>> > stable to testing, over the last month.
>> > As I don't think everybody is running a Dell 980 desktop, or the same
>> > desktop environment, it's probably not a hardware/software mismatch.
>> > We'd be looking at strictly software, I suspect.
>> 
>> I have the same hunch. I got that PC new on Feb 20th, so I hope it's not 
>> the CPU.  It also happened just twice since then, even though I keep it 
>> running pretty much 24/7.  Not doing very much when I'm not there, which 
>> was the case both times.
>> 
>> And for both times, the journalctl log looks suspiciously similar, 
>> starting with the snd_hda_intel entry.
>> 
>> First time:
>> 
>> Apr 10 07:36:07 gar systemd[1]: anacron.service: Deactivated successfully.
>> Apr 10 07:50:01 gar kernel: snd_hda_intel :04:00.0: Unable to change 
>> power state from D3hot to D0, device inaccessible
>> Apr 10 07:50:03 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
>> *ERROR* render: timed out waiting for forcewake ack to clear.
>> Apr 10 07:50:03 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
>> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
>> ...
>> 
>> Second time:
>> 
>> May 14 00:17:01 gar CRON[2929]: pam_unix(cron:session): session closed 
>> for user root
>> May 14 00:54:00 gar kernel: snd_hda_intel :04:00.0: Unable to change 
>> power state from D3hot to D0, device inaccessible
>> May 14 00:54:03 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
>> *ERROR* render: timed out waiting for forcewake ack to clear.
>> May 14 00:54:03 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
>> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
>> ...
>> 
>> To reproduce it, I'd probably have to somehow trigger the condition 
>> manually, any ideas?
>> 
>> Thanks for your time!
>> 
>
> Hi,
>
> You state that this is a new PC. These issues can be caused by faulty 
> firmware e.g. BIOS. Keep an
> eye out on your motherboards manufacturer to see if any BIOS updates become 
> available.
>
> Regards
>
> Phil

Seconded. My system had a similar soft lockup issue[1] (also after
upgrading from Bullseye to Bookworm) though without any backtrace in
journalctl.  After debugging over a month it turned out that the BIOS
had an issue that certain instructions to access the TPM may cause the
system to freeze, and an upgrade to a beta BIOS fixed this issue.  So
definitely contact their customer service and check for similar reports.

[1] https://lists.debian.org/debian-user/2023/04/msg00425.html
-- 
Manphiz


signature.asc
Description: PGP signature


Re: Bookworm soft lockup

2023-05-16 Thread Philip Wyett
On Wed, 2023-05-17 at 09:51 +0930, Christian Gelinek wrote:
> On Mon, 15 May 2023 18:30:31, David wrote:
> > On Mon, 2023-05-15 at 11:17 +0300, Anssi Saari wrote:
> > > Christian Gelinek  writes:
> > > 
> > > > Is anyone else seeing a similar problem? What can I do to avoid
> > > > this?
> > > > Do we need anything else to narrow it down further?
> > > 
> > > Only time I've seen a soft lockup was from a bad CPU. There were a
> > > bunch
> > > of them and eventually the computer hung. Going back to the slow
> > > plodding Celeron fixed all issues. Except CPU performance of course.
> > 
> > It's happened to me a couple of times, but only since I switched from
> > stable to testing, over the last month.
> > As I don't think everybody is running a Dell 980 desktop, or the same
> > desktop environment, it's probably not a hardware/software mismatch.
> > We'd be looking at strictly software, I suspect.
> 
> I have the same hunch. I got that PC new on Feb 20th, so I hope it's not 
> the CPU.  It also happened just twice since then, even though I keep it 
> running pretty much 24/7.  Not doing very much when I'm not there, which 
> was the case both times.
> 
> And for both times, the journalctl log looks suspiciously similar, 
> starting with the snd_hda_intel entry.
> 
> First time:
> 
> Apr 10 07:36:07 gar systemd[1]: anacron.service: Deactivated successfully.
> Apr 10 07:50:01 gar kernel: snd_hda_intel :04:00.0: Unable to change 
> power state from D3hot to D0, device inaccessible
> Apr 10 07:50:03 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
> *ERROR* render: timed out waiting for forcewake ack to clear.
> Apr 10 07:50:03 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
> ...
> 
> Second time:
> 
> May 14 00:17:01 gar CRON[2929]: pam_unix(cron:session): session closed 
> for user root
> May 14 00:54:00 gar kernel: snd_hda_intel :04:00.0: Unable to change 
> power state from D3hot to D0, device inaccessible
> May 14 00:54:03 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
> *ERROR* render: timed out waiting for forcewake ack to clear.
> May 14 00:54:03 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
> ...
> 
> To reproduce it, I'd probably have to somehow trigger the condition 
> manually, any ideas?
> 
> Thanks for your time!
> 

Hi,

You state that this is a new PC. These issues can be caused by faulty firmware 
e.g. BIOS. Keep an
eye out on your motherboards manufacturer to see if any BIOS updates become 
available.

Regards

Phil

-- 
*** Playing the game for the games own sake. ***


Associations:

* Debian Maintainer (DM)
* Fedora/EPEL Maintainer.
* Contributor member of the AlmaLinux foundation.

WWW: https://kathenas.org

Buy Me a Coffee: https://www.buymeacoffee.com/kathenasorg

Twitter: @kathenasorg

Instagram: @kathenasorg

IRC: kathenas

GPG: 724AA9B52F024C8B


signature.asc
Description: This is a digitally signed message part


Re: Bookworm soft lockup

2023-05-16 Thread Philip Wyett
On Mon, 2023-05-15 at 10:43 +0930, Christian Gelinek wrote:
> Hi,
> 
> I encountered my Debian frozen this morning. This is the 2nd time this 
> happened, the 1st one was on April 10, with very similar symptoms: The 
> PC was still running, but moving the mouse or typing didn't wake up my 
> screens and I couldn't connect to it via SSH.
> 
> After force-rebooting, I had a look at journalctl and these are the 
> messages before the reboot:
> 
> May 14 00:00:09 gar systemd[1]: Starting cups.service - CUPS Scheduler...
> May 14 00:00:09 gar audit[2912]: AVC apparmor="DENIED" 
> operation="capable" profile="/usr/sbin/cupsd" pid=2912 comm="cupsd" 
> capability=12  capname="net_admin"
> May 14 00:00:09 gar systemd[1]: Started cups.service - CUPS Scheduler.
> May 14 00:00:09 gar kernel: audit: type=1400 audit(1683988209.079:32): 
> apparmor="DENIED" operation="capable" profile="/usr/sbin/cupsd" pid=2912 
> comm="cupsd" capability=12  capname="net_admin"
> May 14 00:00:09 gar systemd[1]: Started cups-browsed.service - Make 
> remote CUPS printers available locally.
> May 14 00:00:09 gar systemd[1]: logrotate.service: Deactivated successfully.
> May 14 00:00:09 gar systemd[1]: Finished logrotate.service - Rotate log 
> files.
> May 14 00:17:01 gar CRON[2929]: pam_unix(cron:session): session opened 
> for user root(uid=0) by (uid=0)
> May 14 00:17:01 gar CRON[2930]: (root) CMD (cd / && run-parts --report 
> /etc/cron.hourly)
> May 14 00:17:01 gar CRON[2929]: pam_unix(cron:session): session closed 
> for user root
> May 14 00:54:00 gar kernel: snd_hda_intel :04:00.0: Unable to change 
> power state from D3hot to D0, device inaccessible
> May 14 00:54:03 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
> *ERROR* render: timed out waiting for forcewake ack to clear.
> May 14 00:54:03 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
> May 14 00:54:07 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
> *ERROR* render: timed out waiting for forcewake ack to clear.
> May 14 00:54:07 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
> May 14 00:54:11 gar kernel: hrtimer: interrupt took 252466383 ns
> May 14 00:54:11 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
> *ERROR* render: timed out waiting for forcewake ack to clear.
> May 14 00:54:11 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
> May 14 00:54:16 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
> *ERROR* gt: timed out waiting for forcewake ack to clear.
> May 14 00:54:16 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
> May 14 00:54:17 gar kernel: i915 :03:00.0: [drm] *ERROR* CT: 
> Corrupted descriptor head=4294967295 tail=4294967295 status=0x
> May 14 00:54:26 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
> *ERROR* render: timed out waiting for forcewake ack to clear.
> May 14 00:54:26 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
> May 14 00:54:26 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
> *ERROR* gt: timed out waiting for forcewake ack to clear.
> May 14 00:54:26 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
> [i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
> May 14 00:54:26 gar kernel: watchdog: BUG: soft lockup - CPU#15 stuck 
> for 26s! [kworker/15:1:233]
> May 14 00:54:26 gar kernel: Modules linked in: snd_seq_dummy snd_hrtimer 
> snd_seq snd_seq_device nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 
> dns_resolver nfs lockd grace fscache netfs rfkill qrtr sunrpc 
> binfmt_misc nls_ascii nls_cp437 vfat fat snd_sof_pci_>
> May 14 00:54:26 gar kernel:  intel_uncore ee1004 pcspkr watchdog snd 
> soundcore intel_vsec serial_multi_instantiate acpi_pad intel_pmc_core 
> acpi_tad mei_me sg mei evdev parport_pc ppdev lp parport fuse loop 
> efi_pstore configfs efivarfs ip_tables x_tables autof>
> May 14 00:54:26 gar kernel: CPU: 15 PID: 233 Comm: kworker/15:1 Tainted: 
> G U  W  6.1.0-8-amd64 #1  Debian 6.1.25-1
> May 14 00:54:26 gar kernel: Hardware name: Micro-Star International Co., 
> Ltd. MS-7E02/PRO B760M-P DDR4 (MS-7E02), BIOS 1.00 10/21/2022
> May 14 00:54:26 gar kernel: Workqueue: pm pm_runtime_work
> May 14 00:54:26 gar kernel: RIP: 0010:pci_mmcfg_read+0xb0/0xe0
> May 14 00:54:26 gar kernel: Code: 5d 41 5e 41 5f c3 cc cc cc cc 4c 01 e0 
> 66 8b 00 0f b7 c0 89 45 00 eb dc 4c 01 e0 8a 00 0f b6 c0 89 45 00 eb cf 
> 4c 01 e0 8b 00 <89> 45 00 eb c5 e8 66 a2 78 ff c7 45 00 ff ff ff ff b8 
> ea ff ff ff
> May 14 00:54:26 gar kernel: RSP: 0018:a9d000947cc0 EFLAGS: 0286
> May 14 00:54:26 gar kernel: RAX:  RBX: 0040 

Re: Bookworm soft lockup

2023-05-16 Thread Christian Gelinek

On Mon, 15 May 2023 09:48:12 +0100, piorunz  wrote:

On 15/05/2023 02:13, Christian Gelinek wrote:

It seems to be an issue with the i915 driver, potentially triggered by
snd_hda_intel.


Yes indeed that looks like it, to my untrained eye.
Does it happen on Debian Stable (bullseye) also?
I have one laptop with Intel CPU, Intel integrated graphics, on Debian
Stable (bullseye), and it's ok, never crashes. Probably using same
driver as you, i915 on that one.


Interesting. I'm using an Intel ARC 750 discrete graphics card and 
couldn't get it to work properly with bullseye, which is why I'm using 
bookworm. (In bullseye, the kernel and the firmware-misc-nonfree 
packages weren't recent enough).


Since it happened just twice in the last (almost) 3 months, I think I'd 
need to manually trigger the cause of this lockup - any suggestions are 
welcome.


Thanks for your time!



Re: Bookworm soft lockup

2023-05-16 Thread Christian Gelinek

On Mon, 15 May 2023 18:30:31, David wrote:

On Mon, 2023-05-15 at 11:17 +0300, Anssi Saari wrote:

Christian Gelinek  writes:

> Is anyone else seeing a similar problem? What can I do to avoid
> this?
> Do we need anything else to narrow it down further?

Only time I've seen a soft lockup was from a bad CPU. There were a
bunch
of them and eventually the computer hung. Going back to the slow
plodding Celeron fixed all issues. Except CPU performance of course.


It's happened to me a couple of times, but only since I switched from
stable to testing, over the last month.
As I don't think everybody is running a Dell 980 desktop, or the same
desktop environment, it's probably not a hardware/software mismatch.
We'd be looking at strictly software, I suspect.


I have the same hunch. I got that PC new on Feb 20th, so I hope it's not 
the CPU.  It also happened just twice since then, even though I keep it 
running pretty much 24/7.  Not doing very much when I'm not there, which 
was the case both times.


And for both times, the journalctl log looks suspiciously similar, 
starting with the snd_hda_intel entry.


First time:

Apr 10 07:36:07 gar systemd[1]: anacron.service: Deactivated successfully.
Apr 10 07:50:01 gar kernel: snd_hda_intel :04:00.0: Unable to change 
power state from D3hot to D0, device inaccessible
Apr 10 07:50:03 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* render: timed out waiting for forcewake ack to clear.
Apr 10 07:50:03 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]

...

Second time:

May 14 00:17:01 gar CRON[2929]: pam_unix(cron:session): session closed 
for user root
May 14 00:54:00 gar kernel: snd_hda_intel :04:00.0: Unable to change 
power state from D3hot to D0, device inaccessible
May 14 00:54:03 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:03 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]

...

To reproduce it, I'd probably have to somehow trigger the condition 
manually, any ideas?


Thanks for your time!



Re: Bookworm soft lockup

2023-05-15 Thread Timothy M Butterworth
On Mon, May 15, 2023 at 4:31 AM David  wrote:

> On Mon, 2023-05-15 at 11:17 +0300, Anssi Saari wrote:
> > Christian Gelinek  writes:
> >
> > > Is anyone else seeing a similar problem? What can I do to avoid
> > > this?
> > > Do we need anything else to narrow it down further?
> >
> > Only time I've seen a soft lockup was from a bad CPU. There were a
> > bunch
> > of them and eventually the computer hung. Going back to the slow
> > plodding Celeron fixed all issues. Except CPU performance of course.
>
> It's happened to me a couple of times, but only since I switched from
> stable to testing, over the last month.
> As I don't think everybody is running a Dell 980 desktop, or the same
> desktop environment, it's probably not a hardware/software mismatch.
> We'd be looking at strictly software, I suspect.
> Cheers!
>

I am also running Bookworm. I am not having the problems you describe. I
have KDE Plasma installed. I do have an issue with CrossOver Office locking
up when exiting TES IV Oblivion. I have to switch terminals and manually
kill the Oblivion processes to get access to the GUI back.


>
> --
> A Kiwi in Australia,
> doing my bit toward raising the national standard.
>
>

-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org/
⠈⠳⣄⠀⠀


Re: Bookworm soft lockup

2023-05-15 Thread piorunz

On 15/05/2023 02:13, Christian Gelinek wrote:

It seems to be an issue with the i915 driver, potentially triggered by
snd_hda_intel.


Yes indeed that looks like it, to my untrained eye.
Does it happen on Debian Stable (bullseye) also?
I have one laptop with Intel CPU, Intel integrated graphics, on Debian
Stable (bullseye), and it's ok, never crashes. Probably using same
driver as you, i915 on that one.

--
With kindest regards, Piotr.

⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org/
⠈⠳⣄



Re: Bookworm soft lockup

2023-05-15 Thread David
On Mon, 2023-05-15 at 11:17 +0300, Anssi Saari wrote:
> Christian Gelinek  writes:
> 
> > Is anyone else seeing a similar problem? What can I do to avoid
> > this?
> > Do we need anything else to narrow it down further?
> 
> Only time I've seen a soft lockup was from a bad CPU. There were a
> bunch
> of them and eventually the computer hung. Going back to the slow
> plodding Celeron fixed all issues. Except CPU performance of course.

It's happened to me a couple of times, but only since I switched from
stable to testing, over the last month.
As I don't think everybody is running a Dell 980 desktop, or the same
desktop environment, it's probably not a hardware/software mismatch.
We'd be looking at strictly software, I suspect.
Cheers!


-- 
A Kiwi in Australia,
doing my bit toward raising the national standard.



Re: Bookworm soft lockup

2023-05-15 Thread Anssi Saari
Christian Gelinek  writes:

> Is anyone else seeing a similar problem? What can I do to avoid this?
> Do we need anything else to narrow it down further?

Only time I've seen a soft lockup was from a bad CPU. There were a bunch
of them and eventually the computer hung. Going back to the slow
plodding Celeron fixed all issues. Except CPU performance of course.




Re: Bookworm soft lockup

2023-05-15 Thread gene heskett

On 5/14/23 21:30, Christian Gelinek wrote:

I've had 2 similar lockups that needed a front panel reset just in the 
last 2 weeks.

Something isn't right.

Hi,

I encountered my Debian frozen this morning. This is the 2nd time this 
happened, the 1st one was on April 10, with very similar symptoms: The 
PC was still running, but moving the mouse or typing didn't wake up my 
screens and I couldn't connect to it via SSH.


After force-rebooting, I had a look at journalctl and these are the 
messages before the reboot:


May 14 00:00:09 gar systemd[1]: Starting cups.service - CUPS Scheduler...
May 14 00:00:09 gar audit[2912]: AVC apparmor="DENIED" 
operation="capable" profile="/usr/sbin/cupsd" pid=2912 comm="cupsd" 
capability=12  capname="net_admin"

May 14 00:00:09 gar systemd[1]: Started cups.service - CUPS Scheduler.
May 14 00:00:09 gar kernel: audit: type=1400 audit(1683988209.079:32): 
apparmor="DENIED" operation="capable" profile="/usr/sbin/cupsd" pid=2912 
comm="cupsd" capability=12  capname="net_admin"
May 14 00:00:09 gar systemd[1]: Started cups-browsed.service - Make 
remote CUPS printers available locally.
May 14 00:00:09 gar systemd[1]: logrotate.service: Deactivated 
successfully.
May 14 00:00:09 gar systemd[1]: Finished logrotate.service - Rotate log 
files.
May 14 00:17:01 gar CRON[2929]: pam_unix(cron:session): session opened 
for user root(uid=0) by (uid=0)
May 14 00:17:01 gar CRON[2930]: (root) CMD (cd / && run-parts --report 
/etc/cron.hourly)
May 14 00:17:01 gar CRON[2929]: pam_unix(cron:session): session closed 
for user root
May 14 00:54:00 gar kernel: snd_hda_intel :04:00.0: Unable to change 
power state from D3hot to D0, device inaccessible
May 14 00:54:03 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:03 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:07 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:07 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]

May 14 00:54:11 gar kernel: hrtimer: interrupt took 252466383 ns
May 14 00:54:11 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:11 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:16 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* gt: timed out waiting for forcewake ack to clear.
May 14 00:54:16 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:17 gar kernel: i915 :03:00.0: [drm] *ERROR* CT: 
Corrupted descriptor head=4294967295 tail=4294967295 status=0x
May 14 00:54:26 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:26 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:26 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* gt: timed out waiting for forcewake ack to clear.
May 14 00:54:26 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:26 gar kernel: watchdog: BUG: soft lockup - CPU#15 stuck 
for 26s! [kworker/15:1:233]
May 14 00:54:26 gar kernel: Modules linked in: snd_seq_dummy snd_hrtimer 
snd_seq snd_seq_device nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 
dns_resolver nfs lockd grace fscache netfs rfkill qrtr sunrpc 
binfmt_misc nls_ascii nls_cp437 vfat fat snd_sof_pci_>
May 14 00:54:26 gar kernel:  intel_uncore ee1004 pcspkr watchdog snd 
soundcore intel_vsec serial_multi_instantiate acpi_pad intel_pmc_core 
acpi_tad mei_me sg mei evdev parport_pc ppdev lp parport fuse loop 
efi_pstore configfs efivarfs ip_tables x_tables autof>
May 14 00:54:26 gar kernel: CPU: 15 PID: 233 Comm: kworker/15:1 Tainted: 
G U  W  6.1.0-8-amd64 #1  Debian 6.1.25-1
May 14 00:54:26 gar kernel: Hardware name: Micro-Star International Co., 
Ltd. MS-7E02/PRO B760M-P DDR4 (MS-7E02), BIOS 1.00 10/21/2022

May 14 00:54:26 gar kernel: Workqueue: pm pm_runtime_work
May 14 00:54:26 gar kernel: RIP: 0010:pci_mmcfg_read+0xb0/0xe0
May 14 00:54:26 gar kernel: Code: 5d 41 5e 41 5f c3 cc cc cc cc 4c 01 e0 
66 8b 00 0f b7 c0 89 45 00 eb dc 4c 01 e0 8a 00 0f b6 c0 89 45 00 eb cf 
4c 01 e0 8b 00 <89> 45 00 eb c5 e8 66 a2 78 ff c7 45 00 ff ff ff ff b8 
ea ff ff ff

May 14 00:54:26 gar kernel: RSP: 0018:a9d000947cc0 EFLAGS: 0286
May 14 00:54:26 gar kernel: RAX:  RBX: 0040 
RCX: 0ffc
May 14 00:54:26 gar kernel: RDX: 00

Bookworm soft lockup

2023-05-14 Thread Christian Gelinek

Hi,

I encountered my Debian frozen this morning. This is the 2nd time this 
happened, the 1st one was on April 10, with very similar symptoms: The 
PC was still running, but moving the mouse or typing didn't wake up my 
screens and I couldn't connect to it via SSH.


After force-rebooting, I had a look at journalctl and these are the 
messages before the reboot:


May 14 00:00:09 gar systemd[1]: Starting cups.service - CUPS Scheduler...
May 14 00:00:09 gar audit[2912]: AVC apparmor="DENIED" 
operation="capable" profile="/usr/sbin/cupsd" pid=2912 comm="cupsd" 
capability=12  capname="net_admin"

May 14 00:00:09 gar systemd[1]: Started cups.service - CUPS Scheduler.
May 14 00:00:09 gar kernel: audit: type=1400 audit(1683988209.079:32): 
apparmor="DENIED" operation="capable" profile="/usr/sbin/cupsd" pid=2912 
comm="cupsd" capability=12  capname="net_admin"
May 14 00:00:09 gar systemd[1]: Started cups-browsed.service - Make 
remote CUPS printers available locally.

May 14 00:00:09 gar systemd[1]: logrotate.service: Deactivated successfully.
May 14 00:00:09 gar systemd[1]: Finished logrotate.service - Rotate log 
files.
May 14 00:17:01 gar CRON[2929]: pam_unix(cron:session): session opened 
for user root(uid=0) by (uid=0)
May 14 00:17:01 gar CRON[2930]: (root) CMD (cd / && run-parts --report 
/etc/cron.hourly)
May 14 00:17:01 gar CRON[2929]: pam_unix(cron:session): session closed 
for user root
May 14 00:54:00 gar kernel: snd_hda_intel :04:00.0: Unable to change 
power state from D3hot to D0, device inaccessible
May 14 00:54:03 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:03 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:07 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:07 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]

May 14 00:54:11 gar kernel: hrtimer: interrupt took 252466383 ns
May 14 00:54:11 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:11 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:16 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* gt: timed out waiting for forcewake ack to clear.
May 14 00:54:16 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:17 gar kernel: i915 :03:00.0: [drm] *ERROR* CT: 
Corrupted descriptor head=4294967295 tail=4294967295 status=0x
May 14 00:54:26 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* render: timed out waiting for forcewake ack to clear.
May 14 00:54:26 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:26 gar kernel: [drm:fw_domains_get_with_fallback [i915]] 
*ERROR* gt: timed out waiting for forcewake ack to clear.
May 14 00:54:26 gar kernel: i915 :03:00.0: [drm:add_taint_for_CI 
[i915]] CI tainted:0x9 by fw_domains_get_with_fallback+0x20c/0x230 [i915]
May 14 00:54:26 gar kernel: watchdog: BUG: soft lockup - CPU#15 stuck 
for 26s! [kworker/15:1:233]
May 14 00:54:26 gar kernel: Modules linked in: snd_seq_dummy snd_hrtimer 
snd_seq snd_seq_device nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 
dns_resolver nfs lockd grace fscache netfs rfkill qrtr sunrpc 
binfmt_misc nls_ascii nls_cp437 vfat fat snd_sof_pci_>
May 14 00:54:26 gar kernel:  intel_uncore ee1004 pcspkr watchdog snd 
soundcore intel_vsec serial_multi_instantiate acpi_pad intel_pmc_core 
acpi_tad mei_me sg mei evdev parport_pc ppdev lp parport fuse loop 
efi_pstore configfs efivarfs ip_tables x_tables autof>
May 14 00:54:26 gar kernel: CPU: 15 PID: 233 Comm: kworker/15:1 Tainted: 
G U  W  6.1.0-8-amd64 #1  Debian 6.1.25-1
May 14 00:54:26 gar kernel: Hardware name: Micro-Star International Co., 
Ltd. MS-7E02/PRO B760M-P DDR4 (MS-7E02), BIOS 1.00 10/21/2022

May 14 00:54:26 gar kernel: Workqueue: pm pm_runtime_work
May 14 00:54:26 gar kernel: RIP: 0010:pci_mmcfg_read+0xb0/0xe0
May 14 00:54:26 gar kernel: Code: 5d 41 5e 41 5f c3 cc cc cc cc 4c 01 e0 
66 8b 00 0f b7 c0 89 45 00 eb dc 4c 01 e0 8a 00 0f b6 c0 89 45 00 eb cf 
4c 01 e0 8b 00 <89> 45 00 eb c5 e8 66 a2 78 ff c7 45 00 ff ff ff ff b8 
ea ff ff ff

May 14 00:54:26 gar kernel: RSP: 0018:a9d000947cc0 EFLAGS: 0286
May 14 00:54:26 gar kernel: RAX:  RBX: 0040 
RCX: 0ffc
May 14 00:54:26 gar kernel: RDX: 00ff RSI: 0004 
RDI: 
May 14 00:54:26 gar kernel: RBP: a9d000947cfc R08: 0004 
R09: a9d000947cfc