Re: frequent lockups in 3.18rc4

Frederic Weisbecker Thu, 04 Dec 2014 08:52:38 -0800

On Thu, Dec 04, 2014 at 08:18:10AM -0800, Linus Torvalds wrote:
> On Thu, Dec 4, 2014 at 12:43 AM, Dâniel Fraga <[email protected]> wrote:
> >
> >         Linus, today it's your lucky day, because I think I found the
> > real bad commit (if it isn't, then it's some very close to it). I
> > managed to narrow the bisect and here's the result:
> 
> Ok, that actually looks very reasonable, I had actually looked at it
> because of the whole "changes IPI" thing.
> 
> One more thing to try: does a revert fix it on current git?
> 
> It doesn't revert entirely cleanly, but close enough - attached a
> quick rough patch that may or may not work, but looks like a good
> revert.
> 
> Dave - this might be worth testing for you too, exactly because of
> that whole "it changes how we do IPI's". It was your bug report with
> TLB IPI's that made me look at that commit originally.


I think this is a different issue. What Daniel reported is:

Dec  4 06:03:41 tux kernel: [  737.180761]  [<ffffffff810637ca>] hrtimer_cancel+0x1a/0x30
Dec  4 06:03:41 tux kernel: [  737.180766]  [<ffffffff81097842>] tick_nohz_restart+0x12/0x80
Dec  4 06:03:41 tux kernel: [  737.180769]  [<ffffffff81097c4f>] __tick_nohz_full_check+0x9f/0xb0
Dec  4 06:03:41 tux kernel: [  737.180771]  [<ffffffff81097c69>] nohz_full_kick_work_func+0x9/0x10
Dec  4 06:03:41 tux kernel: [  737.180774]  [<ffffffff810aecd4>] irq_work_run_list+0x44/0x70
Dec  4 06:03:41 tux kernel: [  737.180777]  [<ffffffff81097730>] ? tick_sched_handle.isra.20+0x40/0x40
Dec  4 06:03:41 tux kernel: [  737.180779]  [<ffffffff810aed19>] __irq_work_run+0x19/0x30
Dec  4 06:03:41 tux kernel: [  737.180782]  [<ffffffff810aed98>] irq_work_run+0x18/0x40
Dec  4 06:03:41 tux kernel: [  737.180784]  [<ffffffff8104deb6>] update_process_times+0x56/0x70
Dec  4 06:03:41 tux kernel: [  737.180786]  [<ffffffff81097721>] tick_sched_handle.isra.20+0x31/0x40
Dec  4 06:03:42 tux kernel: [  737.180788]  [<ffffffff81097769>] tick_sched_timer+0x39/0x60
Dec  4 06:03:42 tux kernel: [  737.180790]  [<ffffffff810636a1>] __run_hrtimer.isra.33+0x41/0xd0
Dec  4 06:03:42 tux kernel: [  737.180792]  [<ffffffff81063a4f>] hrtimer_interrupt+0xef/0x250
Dec  4 06:03:42 tux kernel: [  737.180795]  [<ffffffff8102db65>] local_apic_timer_interrupt+0x35/0x60
Dec  4 06:03:42 tux kernel: [  737.180797]  [<ffffffff8102e12a>] smp_apic_timer_interrupt+0x3a/0x50
Dec  4 06:03:42 tux kernel: [  737.180799]  [<ffffffff81391a3a>] apic_timer_interrupt+0x6a/0x70

And this bug has been fixed upstream with:

     _ nohz: nohz full depends on irq work self IPI support
     _ x86: Tell irq work about self IPI support
     _ irq_work: Force raised irq work to run on irq work interrupt
     _ nohz: Move nohz full init call to tick init

These patches have been backported to stable as well.
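
For anyone checking a bisect point against the list above, here is a minimal sketch, assuming a local kernel git checkout. The function name check_nohz_fixes and its two arguments are hypothetical, chosen for illustration. It only matches on commit message text, so a later revert that quotes the same subject would also count as a match.

```shell
#!/bin/sh
# Report which of the four fixes named above are reachable from a revision.
# check_nohz_fixes is a hypothetical helper; both arguments are optional.
#   $1: path to the git checkout (default: current directory)
#   $2: revision to inspect      (default: HEAD)
check_nohz_fixes() {
    repo=${1:-.}
    rev=${2:-HEAD}
    while IFS= read -r subject; do
        # -F: treat the pattern as a fixed string; -1: stop at the first match.
        # stdin is redirected so git cannot consume the here-document.
        match=$(git -C "$repo" log -1 --oneline -F --grep="$subject" "$rev" \
                    2>/dev/null </dev/null)
        if [ -n "$match" ]; then
            echo "present: $subject"
        else
            echo "missing: $subject"
        fi
    done <<'EOF'
nohz: nohz full depends on irq work self IPI support
x86: Tell irq work about self IPI support
irq_work: Force raised irq work to run on irq work interrupt
nohz: Move nohz full init call to tick init
EOF
}
```

Calling check_nohz_fixes with the checkout path and a bisect commit prints one "present:" or "missing:" line per fix.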

I suspect Daniel rewound far enough to hit that old bug.

Daniel, did you see this very same stack trace in latest upstream too? Or was
it a different one?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
