Bug#822936: INFO: rcu_sched self-detected stall on CPU
On Fri, 2016-04-29 at 03:10 -0400, Jiann-Ming Su wrote: > Package: linux-image-3.16.0-4-686-pae > Version: 3.16.7-ckt25-1 > > The netbook doesn't crash, but the entire networking stack freezes up. > When I press a key on the keyboard, the backlight comes on, and > everything is functional again. > > Apr 27 21:53:20 ranfan kernel: INFO: rcu_sched self-detected stall on CPU { > 0} (t=6590 jiffies g=23325 c=23324 q=2) > Apr 27 21:53:20 ranfan kernel: sending NMI to all CPUs: > Apr 27 21:53:20 ranfan kernel: NMI backtrace for cpu 0 [...] > Apr 27 21:52:54 ranfan kernel: NMI backtrace for cpu 1 Notice how the log time goes backward by 26 seconds, which corresponds closely to the stall time of 6590 jiffies. The syslog daemon generates the timestamps, not the kernel, but presumably it has been woken up once on each of the CPUs as they started to write a call trace. [...] > Apr 28 03:19:37 ranfan kernel: INFO: rcu_sched self-detected stall on CPU { > 0} (t=67279 jiffies g=88895 c=88894 q=1) > Apr 28 03:19:37 ranfan kernel: sending NMI to all CPUs: > Apr 28 03:19:37 ranfan kernel: NMI backtrace for cpu 0 [...] > Apr 28 03:15:08 ranfan kernel: NMI backtrace for cpu 1 The log time goes backward by 269 seconds, which again corresponds closely to the stall time of 67279 jiffies. After that long a stall, the soft-lockup watchdog should also have fired and logged errror messages, but we don't see them. In the first two cases the CPU that didn't detect the stall is in cpu_startup_entry which implies it's just coming online. But in the last case, both CPUs seem to be in a more normal state. Also there's no sign of clock skew in the log, though it doesn't mean there isn't a skew. Ben. -- Ben Hutchings Life is what happens to you while you're busy making other plans. - John Lennon signature.asc Description: This is a digitally signed message part
Bug#822936: INFO: rcu_sched self-detected stall on CPU
Package: linux-image-3.16.0-4-686-pae Version: 3.16.7-ckt25-1 The netbook doesn't crash, but the entire networking stack freezes up. When I press a key on the keyboard, the backlight comes on, and everything is functional again. Apr 27 21:53:20 ranfan kernel: INFO: rcu_sched self-detected stall on CPU { 0} (t=6590 jiffies g=23325 c=23324 q=2) Apr 27 21:53:20 ranfan kernel: sending NMI to all CPUs: Apr 27 21:53:20 ranfan kernel: NMI backtrace for cpu 0 Apr 27 21:53:20 ranfan kernel: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-4-686-pae #1 Debian 3.16.7-ckt25-1 Apr 27 21:53:20 ranfan kernel: Hardware name: Acer AO751h /JV11-ML , BIOS V0.3212 02/26/2010 Apr 27 21:53:20 ranfan kernel: task: c15fca00 ti: c15ee000 task.ti: c15ee000 Apr 27 21:53:20 ranfan kernel: EIP: 0060:[] EFLAGS: 0026 CPU: 0 Apr 27 21:53:20 ranfan kernel: EIP is at arch_trigger_all_cpu_backtrace+0x95/0xd0 Apr 27 21:53:20 ranfan kernel: EAX: 00418958 EBX: 2710 ECX: f000 EDX: f000 Apr 27 21:53:20 ranfan kernel: ESI: c1614f80 EDI: f75d9620 EBP: f5007e40 ESP: f5007e34 Apr 27 21:53:20 ranfan kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Apr 27 21:53:20 ranfan kernel: CR0: 8005003b CR2: b7765000 CR3: 0171e000 CR4: 07f0 Apr 27 21:53:20 ranfan kernel: Stack: Apr 27 21:53:20 ranfan kernel: c1548401 c157a0ce c1614f80 f5007e88 c10ac95d c1557924 19be 5b1d Apr 27 21:53:20 ranfan kernel: 5b1c 0002 f5007e88 c1084790 0001 c166a5ec c1614f80 Apr 27 21:53:20 ranfan kernel: f75d9620 c15fca00 f5007e9c c1062dac f5009f7c Apr 27 21:53:20 ranfan kernel: Call Trace: Apr 27 21:53:20 ranfan kernel: [] ? rcu_check_callbacks+0x38d/0x5b0 Apr 27 21:53:20 ranfan kernel: [] ? account_process_tick+0x60/0x130 Apr 27 21:53:20 ranfan kernel: [] ? update_process_times+0x3c/0x60 Apr 27 21:53:20 ranfan kernel: [] ? tick_sched_handle.isra.13+0x26/0x60 Apr 27 21:53:20 ranfan kernel: [] ? tick_sched_timer+0x37/0x70 Apr 27 21:53:20 ranfan kernel: [] ? __remove_hrtimer+0x38/0x90 Apr 27 21:53:20 ranfan kernel: [] ? __run_hrtimer+0x6d/0x190 Apr 27 21:53:20 ranfan kernel: [] ? tick_sched_handle.isra.13+0x60/0x60 Apr 27 21:53:20 ranfan kernel: [] ? hrtimer_interrupt+0x1e8/0x2a0 Apr 27 21:53:20 ranfan kernel: [] ? tick_do_broadcast.constprop.7+0x6b/0x70 Apr 27 21:53:20 ranfan kernel: [] ? tick_handle_oneshot_broadcast+0x105/0x180 Apr 27 21:53:20 ranfan kernel: [] ? add_interrupt_randomness+0x15a/0x1a0 Apr 27 21:53:20 ranfan kernel: [] ? timer_interrupt+0x12/0x20 Apr 27 21:53:20 ranfan kernel: [] ? handle_irq_event_percpu+0x35/0x180 Apr 27 21:53:20 ranfan kernel: [] ? handle_irq_event+0x2a/0x50 Apr 27 21:53:20 ranfan kernel: [] ? handle_simple_irq+0x70/0x70 Apr 27 21:53:20 ranfan kernel: [] ? handle_edge_irq+0x66/0x100 Apr 27 21:53:20 ranfan kernel: [] ? handle_irq+0x71/0x90 Apr 27 21:53:20 ranfan kernel: Apr 27 21:53:20 ranfan kernel: Apr 27 21:53:20 ranfan kernel: [] ? do_IRQ+0x3c/0xd0 Apr 27 21:53:20 ranfan kernel: [] ? rebalance_domains+0x14e/0x250 Apr 27 21:53:20 ranfan kernel: [] ? common_interrupt+0x33/0x38 Apr 27 21:53:20 ranfan kernel: [] ? psb_intel_sdvo_read_response+0xcb/0x200 [gma500_gfx] Apr 27 21:53:20 ranfan kernel: [] ? __do_softirq+0x6d/0x230 Apr 27 21:53:20 ranfan kernel: [] ? cpu_callback+0x160/0x160 Apr 27 21:53:20 ranfan kernel: [] ? do_softirq_own_stack+0x22/0x30 Apr 27 21:53:20 ranfan kernel: Apr 27 21:53:20 ranfan kernel: Apr 27 21:53:20 ranfan kernel: [] ? irq_exit+0x8d/0xa0 Apr 27 21:53:20 ranfan kernel: [] ? do_IRQ+0x45/0xd0 Apr 27 21:53:20 ranfan kernel: [] ? tick_broadcast_oneshot_control+0x7b/0x210 Apr 27 21:53:20 ranfan kernel: [] ? common_interrupt+0x33/0x38 Apr 27 21:53:20 ranfan kernel: [] ? cpuidle_enter_state+0x3e/0xd0 Apr 27 21:53:20 ranfan kernel: [] ? cpu_startup_entry+0x2be/0x3b0 Apr 27 21:53:20 ranfan kernel: [] ? start_kernel+0x3dd/0x3e2 Apr 27 21:53:20 ranfan kernel: [] ? set_init_arg+0x45/0x45 Apr 27 21:53:20 ranfan kernel: Code: 08 5b 5d c3 66 90 f0 0f b3 15 a8 a4 66 c1 a1 a8 a4 66 c1 85 c0 75 3a 31 c0 bb 10 27 00 00 eb 1f 8d b6 00 00 00 00 b8 58 89 41 00 36 aa 21 00 e8 91 ca 09 00 83 eb 01 74 09 a1 a8 a4 66 c1 85 Apr 27 21:52:54 ranfan kernel: NMI backtrace for cpu 1 Apr 27 21:52:54 ranfan kernel: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-4-686-pae #1 Debian 3.16.7-ckt25-1 Apr 27 21:52:54 ranfan kernel: Hardware name: Acer AO751h /JV11-ML , BIOS V0.3212 02/26/2010 Apr 27 21:52:54 ranfan kernel: task: f50b9560 ti: f50d task.ti: f50d Apr 27 21:52:54 ranfan kernel: EIP: 0060:[] EFLAGS: 00200246 CPU: 1 Apr 27 21:52:54 ranfan kernel: EIP is at cpu_startup_entry+0xf2/0x3b0 Apr 27 21:52:54 ranfan kernel: EAX: 0020 EBX: f50d ECX: 0001 EDX: f50d1fec Apr 27 21:52:54 ranfan kernel: ESI: 0004 EDI: 0001 EBP: f50d1f90 ESP: f50d1f60 Apr 27 21:52:54 ranfan kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Apr 27 21:52:54 ranfan