Re: How can I investigate the cause of "watchdog: BUG: soft lockup"?
Hi,

Thank you for your help and patience.

> Jul 3 10:23:31 yx kernel: [ 1176.166204] CPU: 0 PID: 1837 Comm: rt_cansend Tainted: G OE 4.19.84-solve-alc-failure #1
> Jul 3 10:23:31 yx kernel: [ 1176.166209] I-pipe domain: Linux
> Jul 3 10:23:31 yx kernel: [ 1176.166218] RIP: 0010:queued_spin_lock_slowpath+0xd9/0x1a0
> ...
> Jul 3 10:23:31 yx kernel: [ 1176.166252] Call Trace:
> Jul 3 10:23:31 yx kernel: [ 1176.166261] _raw_spin_lock+0x20/0x30
> Jul 3 10:23:31 yx kernel: [ 1176.166270] can_write+0x6c/0x2c0 [advcan]
> Jul 3 10:23:31 yx kernel: [ 1176.166292] __vfs_write+0x3a/0x190

One more question: what is the relation between "queued_spin_lock_slowpath"
and "_raw_spin_lock"?

Best Regards.

孙世龙 sunshilong wrote on Sat, Jul 4, 2020 at 5:13 PM:
>
> Hi, Valdis Klētnieks
>
> Thank you for taking the time to respond to me.
> I have a better understanding of this matter.
>
> >> Can I draw the conclusion that continually acquiring the spinlock causes
> >> the soft lockup and the CPU has been stuck for 22s?
> >> Can I think in this way?
>
> > No. It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.
>
> I see. So there is a thread that has held the corresponding spinlock
> for more than 22s, and a CPU has been stuck (busy trying to acquire the
> spinlock) for that same duration.
> Can I think in this way?
>
> Thank you for your attention to this matter.
> Best Regards.
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
Re: How can I investigate the cause of "watchdog: BUG: soft lockup"?
Hi, Valdis Klētnieks

Thank you for taking the time to respond to me.
I have a better understanding of this matter.

>> Can I draw the conclusion that continually acquiring the spinlock causes
>> the soft lockup and the CPU has been stuck for 22s?
>> Can I think in this way?

> No. It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.

I see. So there is a thread that has held the corresponding spinlock
for more than 22s, and a CPU has been stuck (busy trying to acquire the
spinlock) for that same duration.
Can I think in this way?

Thank you for your attention to this matter.
Best Regards.

Valdis Klētnieks wrote on Sat, Jul 4, 2020 at 4:09 PM:
> > No. It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.
> >
> > For comparison - spinlocks are usually used when you need a lock, but the
> > code protected by the lock is short (things like adding to a linked list,
> > etc), so it should again become available in milliseconds - things where
> > it would take longer to put this thread to sleep and wake another one up
> > than we expect to be waiting for this lock.
Re: How can I investigate the cause of "watchdog: BUG: soft lockup"?
Hi, Valdis Klētnieks

Thank you for taking the time to respond to me.
I have a better understanding of this matter.

>> Can I draw the conclusion that continually acquiring the spinlock causes
>> the soft lockup and the CPU has been stuck for 22s?
>> Can I think in this way?

> No. It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.

I see. So there is a thread that has held the corresponding spinlock
for more than 22s.
Can I think in this way?

Thank you for your attention to this matter.
Best Regards.

Valdis Klētnieks wrote on Sat, Jul 4, 2020 at 4:09 PM:
> > > Can I draw the conclusion that continually acquiring the spinlock
> > > causes the soft lockup and the CPU has been stuck for 22s?
> > > Can I think in this way?
>
> No. It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.
>
> For comparison - spinlocks are usually used when you need a lock, but the
> code protected by the lock is short (things like adding to a linked list,
> etc), so it should again become available in milliseconds - things where it
> would take longer to put this thread to sleep and wake another one up than
> we expect to be waiting for this lock.
Re: How can I investigate the cause of "watchdog: BUG: soft lockup"?
> Can I draw the conclusion that continually acquiring the spinlock causes
> the soft lockup and the CPU has been stuck for 22s?
> Can I think in this way?

No. It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.

For comparison - spinlocks are usually used when you need a lock, but the
code protected by the lock is short (things like adding to a linked list,
etc), so it should again become available in milliseconds - things where it
would take longer to put this thread to sleep and wake another one up than
we expect to be waiting for this lock.
Re: How can I investigate the cause of "watchdog: BUG: soft lockup"?
Hi, Valdis Klētnieks

Thank you for your generous help.
My understanding of this matter has improved a great deal with your help.

>> Jul 3 10:23:31 yx kernel: [ 1176.166058] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [rt_cansend:1837]
>> Jul 3 10:23:31 yx kernel: [ 1176.166066] Modules linked in:
>> ...
>> Jul 3 10:23:31 yx kernel: [ 1176.166252] Call Trace:
>> Jul 3 10:23:31 yx kernel: [ 1176.166261] _raw_spin_lock+0x20/0x30
>> Jul 3 10:23:31 yx kernel: [ 1176.166270] can_write+0x6c/0x2c0 [advcan]

> You get into function can_write() in module advcan.
> That tries to take a spinlock, while something else already has it.

Can I draw the conclusion that continually acquiring the spinlock causes
the soft lockup and the CPU has been stuck for 22s?
Can I think in this way?

Thank you for your attention to this matter.
Best Regards.

Valdis Klētnieks wrote on Sat, Jul 4, 2020 at 12:39 PM:
>
> > Could you please give me some hint on how to investigate the cause deeply?
>
> Shortening the call trace to the relevant lines:
>
> > Jul 3 10:23:31 yx kernel: [ 1176.166252] Call Trace:
> > Jul 3 10:23:31 yx kernel: [ 1176.166261] _raw_spin_lock+0x20/0x30
> > Jul 3 10:23:31 yx kernel: [ 1176.166270] can_write+0x6c/0x2c0 [advcan]
> > Jul 3 10:23:31 yx kernel: [ 1176.166292] __vfs_write+0x3a/0x190
>
> You get into function can_write() in module advcan.
>
> That tries to take a spinlock, while something else already has it.
>
> The spinlock call is (roughly) 15% of the way through the function
> can_write().
>
> The 'modules linked in' list includes "advcan(OE)".
>
> The 'O' tells us it's an out-of-tree module, which means you need to talk
> to whoever wrote the module and find out why it's hanging on a spin lock
> (most likely something else is failing to release it).
>
> And that's about as far as we can hint, since we don't have the source
> for your out-of-tree module.
> If the people who wrote it would clean it up and get it into the base
> Linux tree, then we'd all have access to it and be able to help in much
> greater detail.
Re: How can I investigate the cause of "watchdog: BUG: soft lockup"?
> Could you please give me some hint on how to investigate the cause deeply?

Shortening the call trace to the relevant lines:

> Jul 3 10:23:31 yx kernel: [ 1176.166252] Call Trace:
> Jul 3 10:23:31 yx kernel: [ 1176.166261] _raw_spin_lock+0x20/0x30
> Jul 3 10:23:31 yx kernel: [ 1176.166270] can_write+0x6c/0x2c0 [advcan]
> Jul 3 10:23:31 yx kernel: [ 1176.166292] __vfs_write+0x3a/0x190

You get into function can_write() in module advcan.

That tries to take a spinlock, while something else already has it.

The spinlock call is (roughly) 15% of the way through the function
can_write().

The 'modules linked in' list includes "advcan(OE)".

The 'O' tells us it's an out-of-tree module, which means you need to talk to
whoever wrote the module and find out why it's hanging on a spin lock (most
likely something else is failing to release it).

And that's about as far as we can hint, since we don't have the source for
your out-of-tree module. If the people who wrote it would clean it up and
get it into the base Linux tree, then we'd all have access to it and be able
to help in much greater detail.
How can I investigate the cause of "watchdog: BUG: soft lockup"?
Hi, list

I encountered the error "watchdog: BUG: soft lockup" when I sent data
through the CAN bus.

Could you please give me some hint on how to investigate the cause deeply?
Thank you for your attention to this matter.

The most relevant part of the log (the full log is given below):
Jul 3 10:22:36 yx kernel: [ 1120.688506] CAN[0][0] RX: FIFO overrun
Jul 3 10:23:31 yx kernel: [ 1176.166058] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [rt_cansend:1837]
...
Jul 3 10:23:31 yx kernel: [ 1176.166252] Call Trace:
Jul 3 10:23:31 yx kernel: [ 1176.166261] _raw_spin_lock+0x20/0x30
Jul 3 10:23:31 yx kernel: [ 1176.166270] can_write+0x6c/0x2c0 [advcan]
Jul 3 10:23:31 yx kernel: [ 1176.166276] ? dequeue_signal+0xae/0x1a0
Jul 3 10:23:31 yx kernel: [ 1176.166281] ? recalc_sigpending+0x1b/0x50
Jul 3 10:23:31 yx kernel: [ 1176.166286] ? __set_task_blocked+0x3c/0xa0
Jul 3 10:23:31 yx kernel: [ 1176.166292] __vfs_write+0x3a/0x190
Jul 3 10:23:31 yx kernel: [ 1176.166298] ? apparmor_file_permission+0x1a/0x20
Jul 3 10:23:31 yx kernel: [ 1176.166302] ? security_file_permission+0x3b/0xc0
Jul 3 10:23:31 yx kernel: [ 1176.166307] vfs_write+0xb8/0x1b0
Jul 3 10:23:31 yx kernel: [ 1176.166312] ksys_write+0x5c/0xe0
Jul 3 10:23:31 yx kernel: [ 1176.166316] __x64_sys_write+0x1a/0x20
Jul 3 10:23:31 yx kernel: [ 1176.166321] do_syscall_64+0x87/0x250
Jul 3 10:23:31 yx kernel: [ 1176.166326] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Here is the full log:
Jul 3 10:06:16 yx kernel: [ 140.313856] CAN[0][0] RX: FIFO overrun
Jul 3 10:06:59 yx kernel: [ 183.323792] CAN[0][0] RX: FIFO overrun
Jul 3 10:07:42 yx kernel: [ 226.329465] CAN[0][0] RX: FIFO overrun
Jul 3 10:08:24 yx kernel: [ 268.362822] CAN[0][0] RX: FIFO overrun
Jul 3 10:09:07 yx kernel: [ 311.372488] CAN[0][0] RX: FIFO overrun
Jul 3 10:09:50 yx kernel: [ 354.377996] CAN[0][0] RX: FIFO overrun
Jul 3 10:10:32 yx kernel: [ 396.411726] CAN[0][0] RX: FIFO overrun
Jul 3 10:11:15 yx kernel: [ 439.421156] CAN[0][0] RX: FIFO overrun
Jul 3 10:11:58 yx kernel: [ 482.426522] CAN[0][0] RX: FIFO overrun
Jul 3 10:12:40 yx kernel: [ 524.460688] CAN[0][0] RX: FIFO overrun
Jul 3 10:13:23 yx kernel: [ 567.469857] CAN[0][0] RX: FIFO overrun
Jul 3 10:14:06 yx kernel: [ 610.475021] CAN[0][0] RX: FIFO overrun
Jul 3 10:14:48 yx kernel: [ 652.509597] CAN[0][0] RX: FIFO overrun
Jul 3 10:15:31 yx kernel: [ 695.518491] CAN[0][0] RX: FIFO overrun
Jul 3 10:16:14 yx kernel: [ 738.523551] CAN[0][0] RX: FIFO overrun
Jul 3 10:16:55 yx kernel: [ 779.558139] CAN[0][0] RX: FIFO overrun
Jul 3 10:17:38 yx kernel: [ 822.566773] CAN[0][0] RX: FIFO overrun
Jul 3 10:18:21 yx kernel: [ 865.571697] CAN[0][0] RX: FIFO overrun
Jul 3 10:19:03 yx kernel: [ 907.607049] CAN[0][0] RX: FIFO overrun
Jul 3 10:19:46 yx kernel: [ 950.615449] CAN[0][0] RX: FIFO overrun
Jul 3 10:20:29 yx kernel: [ 993.620196] CAN[0][0] RX: FIFO overrun
Jul 3 10:21:11 yx kernel: [ 1035.655974] CAN[0][0] RX: FIFO overrun
Jul 3 10:21:54 yx kernel: [ 1078.664116] CAN[0][0] RX: FIFO overrun
Jul 3 10:22:36 yx kernel: [ 1120.688506] CAN[0][0] RX: FIFO overrun
Jul 3 10:23:31 yx kernel: [ 1176.166058] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [rt_cansend:1837]
Jul 3 10:23:31 yx kernel: [ 1176.166066] Modules linked in: bnep snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic nls_iso8859_1 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi intel_rapl intel_soc_dts_thermal intel_soc_dts_iosf intel_powerclamp coretemp kvm_intel snd_seq punit_atom_debug crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_seq_device cryptd intel_cstate snd_timer hci_uart snd lpc_ich advcan(OE) mei_txe btqca soundcore mei btbcm btintel bluetooth ecdh_generic rfkill_gpio pwm_lpss_platform mac_hid pwm_lpss parport_pc ppdev lp parport autofs4 i915 kvmgt vfio_mdev mdev vfio_iommu_type1 vfio kvm irqbypass drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops igb drm dca ahci i2c_algo_bit libahci video i2c_hid hid
Jul 3 10:23:31 yx kernel: [ 1176.166204] CPU: 0 PID: 1837 Comm: rt_cansend Tainted: G OE 4.19.84-solve-alc-failure #1
Jul 3 10:23:31 yx kernel: [ 1176.166209] I-pipe domain: Linux
Jul 3 10:23:31 yx kernel: [ 1176.166218] RIP: 0010:queued_spin_lock_slowpath+0xd9/0x1a0
Jul 3 10:23:31 yx kernel: [ 1176.166223] Code: 48 03 34 c5 00 67 37 91 48 89 16 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 32 48 85 f6 74 07 0f 0d 0e eb 02 f3 90 <8b> 07 66 85 c0 75 f7 41 89 c0 66 45 31 c0 41 39 c8 0f 84 96 00 00
Jul 3 10:23:31 yx kernel: [ 1176.166226] RSP: 0018:be6f4c17bd08 EFLAGS: 0202 ORIG_RAX: ff13
Jul 3 10:23:31 yx kernel: [ 1176.166231] RAX: RBX: RCX:
Jul 3 10:23:31 yx kernel: [ 1176.166234] RDX: RSI: RDI:
Jul 3 10:23:31 yx kernel: [ 1176.166236] RBP: be6f4c17bd08 R08: R09