Re: How can I investigate the cause of "watchdog: BUG: soft lockup"?

2020-07-07 Thread 孙世龙 sunshilong
Hi,

Thank you for your help and patience.

>Jul  3 10:23:31 yx kernel: [ 1176.166204] CPU: 0 PID: 1837 Comm:
>rt_cansend Tainted: G   OE 4.19.84-solve-alc-failure #1
>Jul  3 10:23:31 yx kernel: [ 1176.166209] I-pipe domain: Linux
>Jul  3 10:23:31 yx kernel: [ 1176.166218] RIP:
>0010:queued_spin_lock_slowpath+0xd9/0x1a0
...
>  Jul  3 10:23:31 yx kernel: [ 1176.166252] Call Trace:
> Jul  3 10:23:31 yx kernel: [ 1176.166261]  _raw_spin_lock+0x20/0x30
> Jul  3 10:23:31 yx kernel: [ 1176.166270]  can_write+0x6c/0x2c0 [advcan]
> Jul  3 10:23:31 yx kernel: [ 1176.166292]  __vfs_write+0x3a/0x190

One more question: what is the relation between "queued_spin_lock_slowpath"
and "_raw_spin_lock"?

Best Regards.

孙世龙 sunshilong wrote on Sat, Jul 4, 2020 at 5:13 PM:
>
> Hi, Valdis Klētnieks
>
> Thank you for taking the time to respond to me.
> I have a better understanding of this matter.
>
> >> Can I draw the conclusion that continually acquiring the spinlock causes
> >> the soft lockup and that the CPU has been stuck for 22s?
> >> Can I think of it this way?
>
> >No.  It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.
>
> I see. So there is a thread that has held the corresponding spinlock
> for more than 22s, and a CPU has been stuck (busy trying to acquire the
> spinlock) for that same duration.
> Can I think of it this way?
>
> Thank you for your attention to this matter.
> Best Regards.
>
> Valdis Klētnieks wrote on Sat, Jul 4, 2020 at 4:09 PM:
>
> 孙世龙 sunshilong wrote on Sat, Jul 4, 2020 at 5:04 PM:
> >
> > Hi, Valdis Klētnieks
> >
> > Thank you for taking the time to respond to me.
> > I have a better understanding of this matter.
> >
> > >> Can I draw the conclusion that continually acquiring the spinlock causes
> > >> the soft lockup and that the CPU has been stuck for 22s?
> > >> Can I think of it this way?
> >
> > >No.  It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.
> >
> > I see. So there is a thread that has held the corresponding spinlock
> > for more than 22s.
> > Can I think of it this way?
> >
> > Thank you for your attention to this matter.
> > Best Regards.
> >
> > Valdis Klētnieks wrote on Sat, Jul 4, 2020 at 4:09 PM:
> > >
> > >
> > > > Can I draw the conclusion that continually acquiring the spinlock causes
> > > > the soft lockup and that the CPU has been stuck for 22s?
> > > > Can I think of it this way?
> > >
> > > No.  It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.
> > >
> > > For comparison - spinlocks are usually used when you need a lock, but the
> > > code protected by the lock is short (things like adding to a linked list,
> > > etc), so it should again become available in milliseconds - things where
> > > it would take longer to put this thread to sleep and wake another one up
> > > than we expect to be waiting for this lock.



Re: How can I investigate the cause of "watchdog: BUG: soft lockup"?

2020-07-04 Thread 孙世龙 sunshilong
Hi, Valdis Klētnieks

Thank you for taking the time to respond to me.
I have a better understanding of this matter.

>> Can I draw the conclusion that continually acquiring the spinlock causes the
>> soft lockup and that the CPU has been stuck for 22s?
>> Can I think of it this way?

>No.  It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.

I see. So there is a thread that has held the corresponding spinlock
for more than 22s, and a CPU has been stuck (busy trying to acquire the
spinlock) for that same duration.
Can I think of it this way?

Thank you for your attention to this matter.
Best Regards.

Valdis Klētnieks wrote on Sat, Jul 4, 2020 at 4:09 PM:

孙世龙 sunshilong wrote on Sat, Jul 4, 2020 at 5:04 PM:
>
> Hi, Valdis Klētnieks
>
> Thank you for taking the time to respond to me.
> I have a better understanding of this matter.
>
> >> Can I draw the conclusion that continually acquiring the spinlock causes
> >> the soft lockup and that the CPU has been stuck for 22s?
> >> Can I think of it this way?
>
> >No.  It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.
>
> I see. So there is a thread that has held the corresponding spinlock
> for more than 22s.
> Can I think of it this way?
>
> Thank you for your attention to this matter.
> Best Regards.
>
> Valdis Klētnieks wrote on Sat, Jul 4, 2020 at 4:09 PM:
> >
> >
> > > Can I draw the conclusion that continually acquiring the spinlock causes
> > > the soft lockup and that the CPU has been stuck for 22s?
> > > Can I think of it this way?
> >
> > No.  It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.
> >
> > For comparison - spinlocks are usually used when you need a lock, but the
> > code protected by the lock is short (things like adding to a linked list,
> > etc), so it should again become available in milliseconds - things where it
> > would take longer to put this thread to sleep and wake another one up than
> > we expect to be waiting for this lock.



Re: How can I investigate the cause of "watchdog: BUG: soft lockup"?

2020-07-04 Thread 孙世龙 sunshilong
Hi, Valdis Klētnieks

Thank you for taking the time to respond to me.
I have a better understanding of this matter.

>> Can I draw the conclusion that continually acquiring the spinlock causes the
>> soft lockup and that the CPU has been stuck for 22s?
>> Can I think of it this way?

>No.  It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.

I see. So there is a thread that has held the corresponding spinlock
for more than 22s.
Can I think of it this way?

Thank you for your attention to this matter.
Best Regards.

Valdis Klētnieks wrote on Sat, Jul 4, 2020 at 4:09 PM:
>
>
> > Can I draw the conclusion that continually acquiring the spinlock causes
> > the soft lockup and that the CPU has been stuck for 22s?
> > Can I think of it this way?
>
> No.  It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.
>
> For comparison - spinlocks are usually used when you need a lock, but the
> code protected by the lock is short (things like adding to a linked list,
> etc), so it should again become available in milliseconds - things where it
> would take longer to put this thread to sleep and wake another one up than
> we expect to be waiting for this lock.



Re: How can I investigate the cause of "watchdog: BUG: soft lockup"?

2020-07-04 Thread Valdis Klētnieks

> Can I draw the conclusion that continually acquiring the spinlock causes the
> soft lockup and that the CPU has been stuck for 22s?
> Can I think of it this way?

No.  It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.

For comparison - spinlocks are usually used when you need a lock, but the
code protected by the lock is short (things like adding to a linked list, etc),
so it should again become available in milliseconds - things where it would take
longer to put this thread to sleep and wake another one up than we expect
to be waiting for this lock.
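
As a rough sketch of that pattern (hypothetical names, not taken from any
particular driver), the critical section is just a couple of pointer updates,
so a waiter normally spins for far less time than a sleep/wakeup would cost:

#include <linux/spinlock.h>
#include <linux/list.h>

/* Hypothetical driver state, purely for illustration. */
struct tx_ctx {
        spinlock_t lock;
        struct list_head queue;
};

struct tx_frame {
        struct list_head list;
        /* payload omitted */
};

static void tx_enqueue(struct tx_ctx *ctx, struct tx_frame *frame)
{
        unsigned long flags;

        /* Lock is held only across two pointer updates. */
        spin_lock_irqsave(&ctx->lock, flags);
        list_add_tail(&frame->list, &ctx->queue);
        spin_unlock_irqrestore(&ctx->lock, flags);
}

Waiting 22 seconds for a lock that is only meant to be held across a few
instructions is why the watchdog reports it as a bug rather than treating it
as ordinary contention.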




Re: How can I investigate the cause of "watchdog: BUG: soft lockup"?

2020-07-04 Thread 孙世龙 sunshilong
Hi, Valdis Klētnieks

Thank you for your generous help.
With your help, my understanding of this matter has improved a great deal.

>>Jul  3 10:23:31 yx kernel: [ 1176.166058] watchdog: BUG: soft lockup -
>>CPU#0 stuck for 22s! [rt_cansend:1837]
>>Jul  3 10:23:31 yx kernel: [ 1176.166066] Modules linked in:
>>..
>>Jul  3 10:23:31 yx kernel: [ 1176.166252] Call Trace:
>>Jul  3 10:23:31 yx kernel: [ 1176.166261]  _raw_spin_lock+0x20/0x30
>>Jul  3 10:23:31 yx kernel: [ 1176.166270]  can_write+0x6c/0x2c0 [advcan]
>>
>You get into function can_write() in module advcan.
>That tries to take a spinlock, while something else already has it.
Can I draw the conclusion that continually acquiring the spinlock causes the
soft lockup and that the CPU has been stuck for 22s?
Can I think of it this way?

Thank you for your attention to this matter.
Best Regards.


Valdis Klētnieks wrote on Sat, Jul 4, 2020 at 12:39 PM:
>
> > Could you please give me some hint on how to investigate the cause deeply?
>
> Shortening the call trace to the relevant lines:
>
> >  Jul  3 10:23:31 yx kernel: [ 1176.166252] Call Trace:
> > Jul  3 10:23:31 yx kernel: [ 1176.166261]  _raw_spin_lock+0x20/0x30
> > Jul  3 10:23:31 yx kernel: [ 1176.166270]  can_write+0x6c/0x2c0 [advcan]
> > Jul  3 10:23:31 yx kernel: [ 1176.166292]  __vfs_write+0x3a/0x190
>
> You get into function can_write() in module advcan.
>
> That tries to take a spinlock, while something else already has it.
>
> The spinlock call is (roughly) 15% of the way through the function can_write().
>
> The 'modules linked in' list includes "advcan(OE)".
>
> The 'O' tells us it's an out-of-tree module, which means you need to talk to
> whoever wrote the module and find out why it's hanging on a spin lock (most
> likely something else is failing to release it).
>
> And that's about as far as we can hint, since we don't have the source for
> your out-of-tree module.  If the people who wrote it would clean it up and
> get it into the base Linux tree, then we'd all have access to it and be able
> to help in much greater detail.
>



Re: How can I investigate the cause of "watchdog: BUG: soft lockup"?

2020-07-03 Thread Valdis Klētnieks
> Could you please give me some hint on how to investigate the cause deeply?

Shortening the call trace to the relevant lines:

>  Jul  3 10:23:31 yx kernel: [ 1176.166252] Call Trace:
> Jul  3 10:23:31 yx kernel: [ 1176.166261]  _raw_spin_lock+0x20/0x30
> Jul  3 10:23:31 yx kernel: [ 1176.166270]  can_write+0x6c/0x2c0 [advcan]
> Jul  3 10:23:31 yx kernel: [ 1176.166292]  __vfs_write+0x3a/0x190

You get into function can_write() in module advcan.

That tries to take a spinlock, while something else already has it.

The spinlock call is (roughly) 15% of the way through the function can_write().
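
(That figure comes straight from the offsets in the trace: "can_write+0x6c/0x2c0"
means the reported address is 0x6c = 108 bytes into a function whose total size
is 0x2c0 = 704 bytes, and 108 / 704 is roughly 15%.)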

The 'modules linked in' list includes "advcan(OE)".

The 'O' tells us it's an out-of-tree module, which means you need to talk to
whoever wrote the module and find out why it's hanging on a spin lock (most
likely something else is failing to release it).

And that's about as far as we can hint, since we don't have the source for your
out-of-tree module.  If the people who wrote it would clean it up and get it
into the base Linux tree, then we'd all have access to it and be able to help
in much greater detail.
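
To make the failure mode concrete, here is a purely illustrative sketch (the
advcan source is not available here, so every name below is hypothetical) of
the classic pattern behind "something else is failing to release it": an error
path that returns with the spinlock still held, so the next writer spins in
_raw_spin_lock() / queued_spin_lock_slowpath() until the watchdog fires.

#include <linux/errno.h>
#include <linux/spinlock.h>
#include <linux/types.h>

/* Hypothetical per-device state; not the real advcan layout. */
struct can_dev {
        spinlock_t lock;
        bool tx_full;
};

static ssize_t buggy_write(struct can_dev *dev, size_t count)
{
        unsigned long flags;

        spin_lock_irqsave(&dev->lock, flags);

        if (dev->tx_full)
                return -EAGAIN;  /* BUG: missing spin_unlock_irqrestore() */

        /* ... copy the frame into the hardware FIFO here ... */

        spin_unlock_irqrestore(&dev->lock, flags);
        return count;
}

Whether advcan actually does something like this can only be confirmed by
whoever has its source.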





How can I investigate the cause of "watchdog: BUG: soft lockup"?

2020-07-03 Thread 孙世龙 sunshilong
Hi, list
I encountered the error "watchdog: BUG: soft lockup" while sending
data over the CAN bus.
Could you please give me some hints on how to investigate the cause more deeply?
Thank you for your attention to this matter.

The most relevant log lines (the full log appears below):
 Jul  3 10:22:36 yx kernel: [ 1120.688506] CAN[0][0] RX: FIFO overrun
Jul  3 10:23:31 yx kernel: [ 1176.166058] watchdog: BUG: soft lockup -
CPU#0 stuck for 22s! [rt_cansend:1837]
...
 Jul  3 10:23:31 yx kernel: [ 1176.166252] Call Trace:
Jul  3 10:23:31 yx kernel: [ 1176.166261]  _raw_spin_lock+0x20/0x30
Jul  3 10:23:31 yx kernel: [ 1176.166270]  can_write+0x6c/0x2c0 [advcan]
Jul  3 10:23:31 yx kernel: [ 1176.166276]  ? dequeue_signal+0xae/0x1a0
Jul  3 10:23:31 yx kernel: [ 1176.166281]  ? recalc_sigpending+0x1b/0x50
Jul  3 10:23:31 yx kernel: [ 1176.166286]  ? __set_task_blocked+0x3c/0xa0
Jul  3 10:23:31 yx kernel: [ 1176.166292]  __vfs_write+0x3a/0x190
Jul  3 10:23:31 yx kernel: [ 1176.166298]  ? apparmor_file_permission+0x1a/0x20
Jul  3 10:23:31 yx kernel: [ 1176.166302]  ? security_file_permission+0x3b/0xc0
Jul  3 10:23:31 yx kernel: [ 1176.166307]  vfs_write+0xb8/0x1b0
Jul  3 10:23:31 yx kernel: [ 1176.166312]  ksys_write+0x5c/0xe0
Jul  3 10:23:31 yx kernel: [ 1176.166316]  __x64_sys_write+0x1a/0x20
Jul  3 10:23:31 yx kernel: [ 1176.166321]  do_syscall_64+0x87/0x250
Jul  3 10:23:31 yx kernel: [ 1176.166326]
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Here is the full log:
Jul  3 10:06:16 yx kernel: [  140.313856] CAN[0][0] RX: FIFO overrun
Jul  3 10:06:59 yx kernel: [  183.323792] CAN[0][0] RX: FIFO overrun
Jul  3 10:07:42 yx kernel: [  226.329465] CAN[0][0] RX: FIFO overrun
Jul  3 10:08:24 yx kernel: [  268.362822] CAN[0][0] RX: FIFO overrun
Jul  3 10:09:07 yx kernel: [  311.372488] CAN[0][0] RX: FIFO overrun
Jul  3 10:09:50 yx kernel: [  354.377996] CAN[0][0] RX: FIFO overrun
Jul  3 10:10:32 yx kernel: [  396.411726] CAN[0][0] RX: FIFO overrun
Jul  3 10:11:15 yx kernel: [  439.421156] CAN[0][0] RX: FIFO overrun
Jul  3 10:11:58 yx kernel: [  482.426522] CAN[0][0] RX: FIFO overrun
Jul  3 10:12:40 yx kernel: [  524.460688] CAN[0][0] RX: FIFO overrun
Jul  3 10:13:23 yx kernel: [  567.469857] CAN[0][0] RX: FIFO overrun
Jul  3 10:14:06 yx kernel: [  610.475021] CAN[0][0] RX: FIFO overrun
Jul  3 10:14:48 yx kernel: [  652.509597] CAN[0][0] RX: FIFO overrun
Jul  3 10:15:31 yx kernel: [  695.518491] CAN[0][0] RX: FIFO overrun
Jul  3 10:16:14 yx kernel: [  738.523551] CAN[0][0] RX: FIFO overrun
Jul  3 10:16:55 yx kernel: [  779.558139] CAN[0][0] RX: FIFO overrun
Jul  3 10:17:38 yx kernel: [  822.566773] CAN[0][0] RX: FIFO overrun
Jul  3 10:18:21 yx kernel: [  865.571697] CAN[0][0] RX: FIFO overrun
Jul  3 10:19:03 yx kernel: [  907.607049] CAN[0][0] RX: FIFO overrun
Jul  3 10:19:46 yx kernel: [  950.615449] CAN[0][0] RX: FIFO overrun
Jul  3 10:20:29 yx kernel: [  993.620196] CAN[0][0] RX: FIFO overrun
Jul  3 10:21:11 yx kernel: [ 1035.655974] CAN[0][0] RX: FIFO overrun
Jul  3 10:21:54 yx kernel: [ 1078.664116] CAN[0][0] RX: FIFO overrun
Jul  3 10:22:36 yx kernel: [ 1120.688506] CAN[0][0] RX: FIFO overrun
Jul  3 10:23:31 yx kernel: [ 1176.166058] watchdog: BUG: soft lockup -
CPU#0 stuck for 22s! [rt_cansend:1837]
Jul  3 10:23:31 yx kernel: [ 1176.166066] Modules linked in: bnep
snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic
nls_iso8859_1 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep
snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi intel_rapl
intel_soc_dts_thermal intel_soc_dts_iosf intel_powerclamp coretemp
kvm_intel snd_seq punit_atom_debug crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel snd_seq_device cryptd intel_cstate snd_timer
hci_uart snd lpc_ich advcan(OE) mei_txe btqca soundcore mei btbcm
btintel bluetooth ecdh_generic rfkill_gpio pwm_lpss_platform mac_hid
pwm_lpss parport_pc ppdev lp parport autofs4 i915 kvmgt vfio_mdev mdev
vfio_iommu_type1 vfio kvm irqbypass drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops igb drm dca ahci i2c_algo_bit
libahci video i2c_hid hid
Jul  3 10:23:31 yx kernel: [ 1176.166204] CPU: 0 PID: 1837 Comm:
rt_cansend Tainted: G   OE 4.19.84-solve-alc-failure #1
Jul  3 10:23:31 yx kernel: [ 1176.166209] I-pipe domain: Linux
Jul  3 10:23:31 yx kernel: [ 1176.166218] RIP:
0010:queued_spin_lock_slowpath+0xd9/0x1a0
Jul  3 10:23:31 yx kernel: [ 1176.166223] Code: 48 03 34 c5 00 67 37
91 48 89 16 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 32
48 85 f6 74 07 0f 0d 0e eb 02 f3 90 <8b> 07 66 85 c0 75 f7 41 89 c0 66
45 31 c0 41 39 c8 0f 84 96 00 00
Jul  3 10:23:31 yx kernel: [ 1176.166226] RSP: 0018:be6f4c17bd08
EFLAGS: 0202 ORIG_RAX: ff13
Jul  3 10:23:31 yx kernel: [ 1176.166231] RAX:  RBX:
 RCX: 
Jul  3 10:23:31 yx kernel: [ 1176.166234] RDX:  RSI:
 RDI: 
Jul  3 10:23:31 yx kernel: [ 1176.166236] RBP: be6f4c17bd08 R08:
 R09