Bug#822936: INFO: rcu_sched self-detected stall on CPU

2016-05-02 Thread Ben Hutchings
On Fri, 2016-04-29 at 03:10 -0400, Jiann-Ming Su wrote:

> Package: linux-image-3.16.0-4-686-pae
> Version: 3.16.7-ckt25-1
> 
> The netbook doesn't crash, but the entire networking stack freezes up.
> When I press a key on the keyboard, the backlight comes on, and
> everything is functional again.
> 
> Apr 27 21:53:20 ranfan kernel: INFO: rcu_sched self-detected stall on CPU { 
> 0}  (t=6590 jiffies g=23325 c=23324 q=2)
> Apr 27 21:53:20 ranfan kernel: sending NMI to all CPUs:
> Apr 27 21:53:20 ranfan kernel: NMI backtrace for cpu 0
[...]
> Apr 27 21:52:54 ranfan kernel: NMI backtrace for cpu 1

Notice how the log time goes backward by 26 seconds, which corresponds
closely to the stall time of 6590 jiffies.  The syslog daemon generates
the timestamps, not the kernel, but presumably it has been woken up
once on each of the CPUs as they started to write a call trace.

[...]
> Apr 28 03:19:37 ranfan kernel: INFO: rcu_sched self-detected stall on CPU { 
> 0}  (t=67279 jiffies g=88895 c=88894 q=1)
> Apr 28 03:19:37 ranfan kernel: sending NMI to all CPUs:
> Apr 28 03:19:37 ranfan kernel: NMI backtrace for cpu 0
[...]
> Apr 28 03:15:08 ranfan kernel: NMI backtrace for cpu 1

The log time goes backward by 269 seconds, which again corresponds
closely to the stall time of 67279 jiffies.

After that long a stall, the soft-lockup watchdog should also have
fired and logged errror messages, but we don't see them.

In the first two cases the CPU that didn't detect the stall is in
cpu_startup_entry which implies it's just coming online.  But in the
last case, both CPUs seem to be in a more normal state.  Also there's
no sign of clock skew in the log, though it doesn't mean there isn't a
skew.

Ben.

-- 
Ben Hutchings
Life is what happens to you while you're busy making other plans.
   - John Lennon

signature.asc
Description: This is a digitally signed message part


Bug#822936: INFO: rcu_sched self-detected stall on CPU

2016-04-29 Thread Jiann-Ming Su
Package: linux-image-3.16.0-4-686-pae
Version: 3.16.7-ckt25-1

The netbook doesn't crash, but the entire networking stack freezes up.
When I press a key on the keyboard, the backlight comes on, and
everything is functional again.

Apr 27 21:53:20 ranfan kernel: INFO: rcu_sched self-detected stall on
CPU { 0}  (t=6590 jiffies g=23325 c=23324 q=2)
Apr 27 21:53:20 ranfan kernel: sending NMI to all CPUs:
Apr 27 21:53:20 ranfan kernel: NMI backtrace for cpu 0
Apr 27 21:53:20 ranfan kernel: CPU: 0 PID: 0 Comm: swapper/0 Not
tainted 3.16.0-4-686-pae #1 Debian 3.16.7-ckt25-1
Apr 27 21:53:20 ranfan kernel: Hardware name: Acer AO751h
 /JV11-ML  , BIOS V0.3212 02/26/2010
Apr 27 21:53:20 ranfan kernel: task: c15fca00 ti: c15ee000 task.ti: c15ee000
Apr 27 21:53:20 ranfan kernel: EIP: 0060:[] EFLAGS: 0026 CPU: 0
Apr 27 21:53:20 ranfan kernel: EIP is at
arch_trigger_all_cpu_backtrace+0x95/0xd0
Apr 27 21:53:20 ranfan kernel: EAX: 00418958 EBX: 2710 ECX:
f000 EDX: f000
Apr 27 21:53:20 ranfan kernel: ESI: c1614f80 EDI: f75d9620 EBP:
f5007e40 ESP: f5007e34
Apr 27 21:53:20 ranfan kernel:  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Apr 27 21:53:20 ranfan kernel: CR0: 8005003b CR2: b7765000 CR3:
0171e000 CR4: 07f0
Apr 27 21:53:20 ranfan kernel: Stack:
Apr 27 21:53:20 ranfan kernel:  c1548401 c157a0ce c1614f80 f5007e88
c10ac95d c1557924 19be 5b1d
Apr 27 21:53:20 ranfan kernel:  5b1c 0002  f5007e88
c1084790 0001 c166a5ec c1614f80
Apr 27 21:53:20 ranfan kernel:  f75d9620  c15fca00 
 f5007e9c c1062dac f5009f7c
Apr 27 21:53:20 ranfan kernel: Call Trace:
Apr 27 21:53:20 ranfan kernel:  [] ? rcu_check_callbacks+0x38d/0x5b0
Apr 27 21:53:20 ranfan kernel:  [] ? account_process_tick+0x60/0x130
Apr 27 21:53:20 ranfan kernel:  [] ? update_process_times+0x3c/0x60
Apr 27 21:53:20 ranfan kernel:  [] ?
tick_sched_handle.isra.13+0x26/0x60
Apr 27 21:53:20 ranfan kernel:  [] ? tick_sched_timer+0x37/0x70
Apr 27 21:53:20 ranfan kernel:  [] ? __remove_hrtimer+0x38/0x90
Apr 27 21:53:20 ranfan kernel:  [] ? __run_hrtimer+0x6d/0x190
Apr 27 21:53:20 ranfan kernel:  [] ?
tick_sched_handle.isra.13+0x60/0x60
Apr 27 21:53:20 ranfan kernel:  [] ? hrtimer_interrupt+0x1e8/0x2a0
Apr 27 21:53:20 ranfan kernel:  [] ?
tick_do_broadcast.constprop.7+0x6b/0x70
Apr 27 21:53:20 ranfan kernel:  [] ?
tick_handle_oneshot_broadcast+0x105/0x180
Apr 27 21:53:20 ranfan kernel:  [] ?
add_interrupt_randomness+0x15a/0x1a0
Apr 27 21:53:20 ranfan kernel:  [] ? timer_interrupt+0x12/0x20
Apr 27 21:53:20 ranfan kernel:  [] ?
handle_irq_event_percpu+0x35/0x180
Apr 27 21:53:20 ranfan kernel:  [] ? handle_irq_event+0x2a/0x50
Apr 27 21:53:20 ranfan kernel:  [] ? handle_simple_irq+0x70/0x70
Apr 27 21:53:20 ranfan kernel:  [] ? handle_edge_irq+0x66/0x100
Apr 27 21:53:20 ranfan kernel:  [] ? handle_irq+0x71/0x90
Apr 27 21:53:20 ranfan kernel:  
Apr 27 21:53:20 ranfan kernel:
Apr 27 21:53:20 ranfan kernel:  [] ? do_IRQ+0x3c/0xd0
Apr 27 21:53:20 ranfan kernel:  [] ? rebalance_domains+0x14e/0x250
Apr 27 21:53:20 ranfan kernel:  [] ? common_interrupt+0x33/0x38
Apr 27 21:53:20 ranfan kernel:  [] ?
psb_intel_sdvo_read_response+0xcb/0x200 [gma500_gfx]
Apr 27 21:53:20 ranfan kernel:  [] ? __do_softirq+0x6d/0x230
Apr 27 21:53:20 ranfan kernel:  [] ? cpu_callback+0x160/0x160
Apr 27 21:53:20 ranfan kernel:  [] ? do_softirq_own_stack+0x22/0x30
Apr 27 21:53:20 ranfan kernel:  
Apr 27 21:53:20 ranfan kernel:
Apr 27 21:53:20 ranfan kernel:  [] ? irq_exit+0x8d/0xa0
Apr 27 21:53:20 ranfan kernel:  [] ? do_IRQ+0x45/0xd0
Apr 27 21:53:20 ranfan kernel:  [] ?
tick_broadcast_oneshot_control+0x7b/0x210
Apr 27 21:53:20 ranfan kernel:  [] ? common_interrupt+0x33/0x38
Apr 27 21:53:20 ranfan kernel:  [] ? cpuidle_enter_state+0x3e/0xd0
Apr 27 21:53:20 ranfan kernel:  [] ? cpu_startup_entry+0x2be/0x3b0
Apr 27 21:53:20 ranfan kernel:  [] ? start_kernel+0x3dd/0x3e2
Apr 27 21:53:20 ranfan kernel:  [] ? set_init_arg+0x45/0x45
Apr 27 21:53:20 ranfan kernel: Code: 08 5b 5d c3 66 90 f0 0f b3 15 a8
a4 66 c1 a1 a8 a4 66 c1 85 c0 75 3a 31 c0 bb 10 27 00 00 eb 1f 8d b6
00 00 00 00 b8 58 89 41 00  36 aa 21 00 e8 91 ca 09 00 83 eb 01 74
09 a1 a8 a4 66 c1 85
Apr 27 21:52:54 ranfan kernel: NMI backtrace for cpu 1
Apr 27 21:52:54 ranfan kernel: CPU: 1 PID: 0 Comm: swapper/1 Not
tainted 3.16.0-4-686-pae #1 Debian 3.16.7-ckt25-1
Apr 27 21:52:54 ranfan kernel: Hardware name: Acer AO751h
 /JV11-ML  , BIOS V0.3212 02/26/2010
Apr 27 21:52:54 ranfan kernel: task: f50b9560 ti: f50d task.ti: f50d
Apr 27 21:52:54 ranfan kernel: EIP: 0060:[] EFLAGS: 00200246 CPU: 1
Apr 27 21:52:54 ranfan kernel: EIP is at cpu_startup_entry+0xf2/0x3b0
Apr 27 21:52:54 ranfan kernel: EAX: 0020 EBX: f50d ECX:
0001 EDX: f50d1fec
Apr 27 21:52:54 ranfan kernel: ESI: 0004 EDI: 0001 EBP:
f50d1f90 ESP: f50d1f60
Apr 27 21:52:54 ranfan kernel:  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Apr 27 21:52:54 ranfan