On Mon, Oct 15, 2018 at 10:58:54PM +0200, Heiner Kallweit wrote: > On 28.09.2018 15:18, Frederic Weisbecker wrote: > > On Thu, Sep 27, 2018 at 06:05:46PM +0200, Thomas Gleixner wrote: > >> On Tue, 28 Aug 2018, Frederic Weisbecker wrote: > >>> On Fri, Aug 24, 2018 at 07:06:32PM +0200, Heiner Kallweit wrote: > >>>> I tested it and Frederic is right, it doesn't help. Can it be somehow > >>>> related to > >>>> the cpu being brought down during suspend? Because I get the warning > >>>> only during > >>>> suspend when the cpu is inactive already (but still online). > >>> > >>> It's hard to tell, I haven't been able to reproduce on suspend to > >>> disk/mem. > >>> > >>> Does this script eventually trigger it after some time? > >> > >> Any update to this? > > > > Heiner? Can you please test the script I sent to you? > > > > Thanks. > > > Sorry, took some time .. And yes, running your script triggers the message > too. > > [ 25.646015] x86: Booting SMP configuration: > [ 25.646044] smpboot: Booting Node 0 Processor 1 APIC 0x2 > [ 25.664491] smpboot: CPU 1 is now offline > [ 25.679299] x86: Booting SMP configuration: > [ 25.679329] smpboot: Booting Node 0 Processor 1 APIC 0x2 > [ 25.698449] smpboot: CPU 1 is now offline > [ 25.711698] x86: Booting SMP configuration: > [ 25.711727] smpboot: Booting Node 0 Processor 1 APIC 0x2 > [ 25.729185] NOHZ: local_softirq_pending 202 > [ 25.729229] NOHZ: local_softirq_pending 202 > [ 25.730759] smpboot: CPU 1 is now offline > [ 25.744053] x86: Booting SMP configuration: > [ 25.744083] smpboot: Booting Node 0 Processor 1 APIC 0x2 > [ 25.762520] smpboot: CPU 1 is now offline > [ 25.776834] x86: Booting SMP configuration: > [ 25.776863] smpboot: Booting Node 0 Processor 1 APIC 0x2 > [ 25.794189] NOHZ: local_softirq_pending 202 > [ 25.796672] smpboot: CPU 1 is now offline > [ 25.805970] x86: Booting SMP configuration: > [ 25.805999] smpboot: Booting Node 0 Processor 1 APIC 0x2 > [ 25.827360] smpboot: CPU 1 is now offline > [ 25.839043] x86: Booting SMP configuration: > [ 25.839073] smpboot: Booting Node 0 Processor 1 APIC 0x2 > [ 25.858184] NOHZ: local_softirq_pending 202 > [ 25.862182] smpboot: CPU 1 is now offline > [ 25.873759] x86: Booting SMP configuration: > [ 25.873789] smpboot: Booting Node 0 Processor 1 APIC 0x2 > [ 25.893385] smpboot: CPU 1 is now offline >
Sorry, I got sidetracked and almost forgot about it. So this is triggered by CPU hotplug. At some point the CPU has an opportunity to go idle and for some reason the timer softirq is still pending. We need to know which timer this is about and why the timer softirq keeps pending. I'm going to need your help again. Can you please run the following (possibly repeat until it triggers the bug) ? echo 1 > /sys/devices/system/cpu/cpu1/online # Pause and reset tracing echo 0 > /sys/kernel/debug/tracing/tracing_on echo > /sys/kernel/debug/tracing/trace # Turn on relevant events echo 1 > /sys/kernel/debug/tracing/events/timer/timer_*/enable echo 1 > /sys/kernel/debug/tracing/events/irq/softirq_*/enable echo 1 > /sys/kernel/debug/tracing/tracing_on # Trigger echo 0 > /sys/devices/system/cpu/cpu1/online echo 0 > /sys/kernel/debug/tracing/tracing_on And please apply the following patch before. With all that I'll have the relevant informations stored in /sys/kernel/debug/tracing/per_cpu/cpu1/trace Please send its content to me. Thanks! diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 69e673b..0e57a3b 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -892,6 +892,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts) (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) { pr_warn("NOHZ: local_softirq_pending %02x\n", (unsigned int) local_softirq_pending()); + trace_dump_stack(0); ratelimit++; } return false;