Re: [ANNOUNCE] 3.12.6-rt9
On 01/21/2014 03:17 AM, Steven Rostedt wrote:
> Signed-off-by: Steven Rostedt
>
> diff --git a/kernel/timer.c b/kernel/timer.c
> index 46467be..8212c10 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1464,13 +1464,11 @@ void run_local_timers(void)
>                  raise_softirq(TIMER_SOFTIRQ);
>                  return;
>          }
> -        if (!base->active_timers)
> -                goto out;
>
>          /* Check whether the next pending timer has expired */
>          if (time_before_eq(base->next_timer, jiffies))
>                  raise_softirq(TIMER_SOFTIRQ);

Hmmm. If active_timers is 0 and "time_before_eq(base->next_timer, jiffies)"
is true, then that timer should have been initialized with
init_timer_deferrable(), or we have a serious bug here where active_timers
isn't properly synchronized anymore.
Now, if there is really just a deferrable timer that expired and nothing
else, then this would explain it.

> -out:
> +
>          rt_spin_unlock_after_trylock_in_irq(&base->lock);
>
>  }

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
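The case Sebastian describes can be sketched as a small standalone C model. This is not the kernel code: struct model_base, model_time_before_eq() and the two raise_* helpers are hypothetical names, and the model simply assumes, as the message above suggests, that a pending deferrable timer is not reflected in active_timers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for the fields of struct tvec_base used here. */
struct model_base {
        unsigned long next_timer;    /* expiry of the next pending timer */
        unsigned long active_timers; /* assumed to exclude deferrable timers */
};

/* Wrap-safe "a is at or before b", in the style of time_before_eq(). */
static bool model_time_before_eq(unsigned long a, unsigned long b)
{
        return (long)(a - b) <= 0;
}

/* Decision with the active_timers early exit (before Steven's patch). */
static bool raise_with_check(const struct model_base *base, unsigned long now)
{
        if (!base->active_timers)
                return false;
        return model_time_before_eq(base->next_timer, now);
}

/* Decision with the early exit removed (after Steven's patch). */
static bool raise_without_check(const struct model_base *base, unsigned long now)
{
        return model_time_before_eq(base->next_timer, now);
}
```

With next_timer already reached and active_timers == 0, only the second helper asks for the softirq, which is the difference the patch makes in this model.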
Re: [ANNOUNCE] 3.12.6-rt9
On Mon, 20 Jan 2014 21:17:36 -0500 Steven Rostedt wrote:
> I happen to have a i7 box to test on, and sure enough, the latest
> 3.12-rt locks up on boot and reverting the
> timers-do-not-raise-softirq-unconditionally.patch, it boots fine.

> Signed-off-by: Steven Rostedt
>
> diff --git a/kernel/timer.c b/kernel/timer.c
> index 46467be..8212c10 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1464,13 +1464,11 @@ void run_local_timers(void)
>                  raise_softirq(TIMER_SOFTIRQ);
>                  return;
>          }
> -        if (!base->active_timers)
> -                goto out;
>
>          /* Check whether the next pending timer has expired */
>          if (time_before_eq(base->next_timer, jiffies))
>                  raise_softirq(TIMER_SOFTIRQ);
> -out:
> +
>          rt_spin_unlock_after_trylock_in_irq(&base->lock);
>
>  }

This fixes the problem on my i7-2600k.

-- Joakim
Re: [ANNOUNCE] 3.12.6-rt9
On Tue, Jan 21, 2014 at 01:39:10AM -0500, Muli Baron wrote:
> On 21/1/2014 04:17, Steven Rostedt wrote:
> > On Sat, 18 Jan 2014 04:15:29 +0100
> > Mike Galbraith wrote:
> >
> >>> So you also have the timers-do-not-raise-softirq-unconditionally.patch?
> >>
> >
> > People have been complaining that the latest 3.12-rt does not boot on
> > intel i7 boxes. And by reverting this patch, it boots fine.
> >
> > I happen to have a i7 box to test on, and sure enough, the latest
> > 3.12-rt locks up on boot and reverting the
> > timers-do-not-raise-softirq-unconditionally.patch, it boots fine.
> >
> > Looking into it, I made this small update, and the box boots. Seems
> > checking "active_timers" is not enough to skip raising softirqs. I
> > haven't looked at why yet, but I would like others to test this patch
> > too.
> >
> > I'll leave why this lets i7 boxes boot as an exercise for Thomas ;-)
> >
> > -- Steve
> >
> > Signed-off-by: Steven Rostedt
> >
> > diff --git a/kernel/timer.c b/kernel/timer.c
> > index 46467be..8212c10 100644
> > --- a/kernel/timer.c
> > +++ b/kernel/timer.c
> > @@ -1464,13 +1464,11 @@ void run_local_timers(void)
> >                  raise_softirq(TIMER_SOFTIRQ);
> >                  return;
> >          }
> > -        if (!base->active_timers)
> > -                goto out;
> >
> >          /* Check whether the next pending timer has expired */
> >          if (time_before_eq(base->next_timer, jiffies))
> >                  raise_softirq(TIMER_SOFTIRQ);
> > -out:
> > +
> >          rt_spin_unlock_after_trylock_in_irq(&base->lock);
> >
> >  }
>
> While this might fix booting on i7 machines it kind of defeats the
> original purpose of this patch, which was to let NO_HZ_FULL work
> properly with threaded interrupts. With the active_timers check removed
> the timer interrupt keeps firing even though there is only one task
> running on a specific processor, since it can't shut down the tick
> because the ksoftirqd thread keeps getting scheduled (see the previous
> thread "CONFIG_NO_HZ_FULL + CONFIG_PREEMPT_RT_FULL = nogo" for the full
> discussion).
>
> -- Muli

Would something like this work? This would get us past boot, which has
always been this strange, half initialized thing one has to tiptoe
around.

-        if (!base->active_timers)
+        if (!base->active_timers && system_state == SYSTEM_RUNNING)

Joe
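Joe's proposed condition can be written out as a tiny standalone predicate. This is only an illustration: enum model_system_state and may_skip_timer_softirq() are hypothetical names, and the real system_state has more values than the two modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-value stand-in for the kernel's system_state. */
enum model_system_state { MODEL_BOOTING, MODEL_RUNNING };

/*
 * True when the tick may skip raising the timer softirq under the proposed
 * condition: no accounted timers AND the system has finished booting.
 */
static bool may_skip_timer_softirq(unsigned long active_timers,
                                   enum model_system_state state)
{
        return active_timers == 0 && state == MODEL_RUNNING;
}
```

During boot the predicate is always false, so the softirq decision falls through to the next_timer comparison as in Steven's patch; once running, the NO_HZ_FULL shortcut Muli asks for is kept.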
Re: [ANNOUNCE] 3.12.6-rt9
On Sat, 18 Jan 2014 04:15:29 +0100 Mike Galbraith wrote:

> > So you also have the timers-do-not-raise-softirq-unconditionally.patch?
>

People have been complaining that the latest 3.12-rt does not boot on
intel i7 boxes. And by reverting this patch, it boots fine.

I happen to have a i7 box to test on, and sure enough, the latest
3.12-rt locks up on boot and reverting the
timers-do-not-raise-softirq-unconditionally.patch, it boots fine.

Looking into it, I made this small update, and the box boots. Seems
checking "active_timers" is not enough to skip raising softirqs. I
haven't looked at why yet, but I would like others to test this patch
too.

I'll leave why this lets i7 boxes boot as an exercise for Thomas ;-)

-- Steve

Signed-off-by: Steven Rostedt

diff --git a/kernel/timer.c b/kernel/timer.c
index 46467be..8212c10 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1464,13 +1464,11 @@ void run_local_timers(void)
                 raise_softirq(TIMER_SOFTIRQ);
                 return;
         }
-        if (!base->active_timers)
-                goto out;

         /* Check whether the next pending timer has expired */
         if (time_before_eq(base->next_timer, jiffies))
                 raise_softirq(TIMER_SOFTIRQ);
-out:
+
         rt_spin_unlock_after_trylock_in_irq(&base->lock);

 }
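The one check that survives the patch compares base->next_timer against jiffies with time_before_eq(). As a rough standalone sketch of why that comparison is done by signed subtraction rather than a plain "<=", here is a model with hypothetical names (model_time_before_eq, naive_before_eq); it assumes the usual two's complement conversion from unsigned long to long.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for time_before_eq(): true when timestamp a is at
 * or before timestamp b, even if the counter wrapped between the two.
 */
static bool model_time_before_eq(unsigned long a, unsigned long b)
{
        return (long)(a - b) <= 0;
}

/* Naive comparison, shown only for contrast; it breaks across a wrap. */
static bool naive_before_eq(unsigned long a, unsigned long b)
{
        return a <= b;
}
```

A timer armed just before the counter wraps is still seen as expired once the counter has wrapped to a small value, which the naive version gets wrong.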
Re: [ANNOUNCE] 3.12.6-rt9
On Fri, 2014-01-17 at 18:00 +0100, Sebastian Andrzej Siewior wrote:
> * Mike Galbraith | 2013-12-24 16:47:47 [+0100]:
>
> >I built this kernel with Paul's patch and NO_HZ_FULL enabled again on 64
> >core box. I haven't seen RCU grip yet, but I just checked on it after
> >3.5 hours into this boot/beat (after fixing crash+kdump setup), and
> >found it in the process of dumping.
>
> So you also have the timers-do-not-raise-softirq-unconditionally.patch?

Oh dear, there's holidays, vacation, and massive turkey overdose between
then and now, but I'm almost positive that the tree was virgin $subject,
with only Paul's patch enabled, that being what I wanted to beat on.

> I have a small problem with understanding this…
>
> |#24 [880273a03cd0] run_timer_softirq at 81069002
>
> Here we obtain wait_lock from tvec_base of _this_ CPU. And we get to
> init_lists() before the apic timer kicks in. So we have the wait_lock.

gdb fibs a little, we're acquiring.

> >--- ---
> >#21 [880273a03b28] apic_timer_interrupt at 815cbf9d
> >[exception RIP: _raw_spin_lock+50]
>
> In the hard interrupt triggered by the apic timer we get to
> get_next_timer_interrupt() and go again for the same wait_lock. Here we
> have the try_lock so we avoid this deadlock.
> The odd part: we get the lock. It should be the same lock because both use
> | struct tvec_base *base = __this_cpu_read(tvec_bases);
> to get it. And we shouldn't get it because the lock is already held.
> We get into trouble in the unlock path where we spin forever:
>
> |#14 [880276803e50] rt_spin_unlock_after_trylock_in_irq at 815c3425
> |#12 [880276803e28] _raw_spin_trylock at 815c3790
>
> which releases the lock with a trylock in order to keep lockdep happy.
> My understanding was that we should be able to obtain the wait_lock here
> since we were able to obtain it in the lock path and in irq off context
> there is nothing that could take the lock in the meantime.

IIRC, we were endlessly trying, but with an un-punched ticket under us,
and no Xen like evilness to save the day.

I've since cleaned out my crashdump directory and moved on to frolicking
with hotplug gremlins, so don't have that one to revisit, but the don't
unconditionally raise timer softirq patch is the bad guy.

-Mike
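Mike's "un-punched ticket" remark can be illustrated with a single-threaded model of a ticket spinlock. This is a sketch under assumptions, not the kernel's implementation: struct model_ticket_lock and the model_* helpers are hypothetical, and the retry bound exists only so the model terminates where the loop in rt_spin_lock_slowunlock_hirq() would not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of a ticket spinlock. */
struct model_ticket_lock {
        unsigned int head; /* ticket currently being served */
        unsigned int tail; /* next ticket to hand out */
};

/* First half of a blocking lock: draw a ticket, then wait for head to reach it. */
static unsigned int model_take_ticket(struct model_ticket_lock *lock)
{
        return lock->tail++;
}

/* trylock succeeds only when no ticket is outstanding. */
static bool model_trylock(struct model_ticket_lock *lock)
{
        if (lock->head != lock->tail)
                return false;
        lock->tail++;
        return true;
}

/* Release: serve the next ticket. */
static void model_unlock(struct model_ticket_lock *lock)
{
        assert(lock->head != lock->tail); /* a ticket must be outstanding */
        lock->head++;
}

/* Bounded version of the "do { ret = trylock; } while (!ret);" loop. */
static bool model_spin_trylock(struct model_ticket_lock *lock,
                               unsigned int max_tries)
{
        while (max_tries--) {
                if (model_trylock(lock))
                        return true;
        }
        return false;
}
```

If the interrupted context on the same CPU has already drawn a ticket, the trylock loop in the interrupt can never succeed, because the only party able to punch that ticket is the one the interrupt is sitting on top of.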
Re: [ANNOUNCE] 3.12.6-rt9
* Mike Galbraith | 2013-12-24 16:47:47 [+0100]:

>I built this kernel with Paul's patch and NO_HZ_FULL enabled again on 64
>core box. I haven't seen RCU grip yet, but I just checked on it after
>3.5 hours into this boot/beat (after fixing crash+kdump setup), and
>found it in the process of dumping.

So you also have the timers-do-not-raise-softirq-unconditionally.patch?

>crash> bt
>PID: 508  TASK: 8802739ba340  CPU: 16  COMMAND: "ksoftirqd/16"
> #0 [880276806a40] machine_kexec at 8103bc07
> #1 [880276806aa0] crash_kexec at 810d56b3
> #2 [880276806b70] panic at 815bf8b0
> #3 [880276806bf0] watchdog_overflow_callback at 810fed3d
> #4 [880276806c10] __perf_event_overflow at 81131928
> #5 [880276806ca0] perf_event_overflow at 81132254
> #6 [880276806cb0] intel_pmu_handle_irq at 8102078f
> #7 [880276806de0] perf_event_nmi_handler at 815c5825
> #8 [880276806e10] nmi_handle at 815c4ed3
> #9 [880276806ea0] default_do_nmi at 815c5063
>#10 [880276806ed0] do_nmi at 815c5388
>#11 [880276806ef0] end_repeat_nmi at 815c4371
>    [exception RIP: _raw_spin_trylock+48]
>    RIP: 815c3790  RSP: 880276803e28  RFLAGS: 0002
>    RAX: 0010  RBX: 0010  RCX: 0002
>    RDX: 880276803e28  RSI: 0018  RDI: 0001
>    RBP: 815c3790  R8: 815c3790  R9: 0018
>    R10: 880276803e28  R11: 0002  R12:
>    R13: 880273a0c000  R14: 8802739ba340  R15: 880273a03fd8
>    ORIG_RAX: 880273a03fd8  CS: 0010  SS: 0018
>--- ---
>#12 [880276803e28] _raw_spin_trylock at 815c3790
>#13 [880276803e30] rt_spin_lock_slowunlock_hirq at 815c2cc8
>#14 [880276803e50] rt_spin_unlock_after_trylock_in_irq at 815c3425
>#15 [880276803e60] get_next_timer_interrupt at 810684a7
>#16 [880276803ed0] tick_nohz_stop_sched_tick at 810c5f2e
>#17 [880276803f50] tick_nohz_irq_exit at 810c6333
>#18 [880276803f70] irq_exit at 81060065
>#19 [880276803f90] smp_apic_timer_interrupt at 810358f5
>#20 [880276803fb0] apic_timer_interrupt at 815cbf9d
>--- ---
>#21 [880273a03b28] apic_timer_interrupt at 815cbf9d
>    [exception RIP: _raw_spin_lock+50]
>    RIP: 815c3642  RSP: 880273a03bd8  RFLAGS: 0202
>    RAX: 8b49  RBX: 880272157290  RCX: 8802739ba340
>    RDX: 8b4a  RSI: 0010  RDI: 880273a0c000
>    RBP: 880273a03bd8  R8: 0001  R9:
>    R10:  R11: 0001  R12: 810927b5
>    R13: 880273a03b68  R14: 0010  R15: 0010
>    ORIG_RAX: ff10  CS: 0010  SS: 0018
>#22 [880273a03be0] rt_spin_lock_slowlock at 815c2591
>#23 [880273a03cc0] rt_spin_lock at 815c3362
>#24 [880273a03cd0] run_timer_softirq at 81069002
>#25 [880273a03d70] handle_softirq at 81060d0f
>#26 [880273a03db0] do_current_softirqs at 81060f3c
>#27 [880273a03e20] run_ksoftirqd at 81061045
>#28 [880273a03e40] smpboot_thread_fn at 81089c31
>#29 [880273a03ec0] kthread at 810807fe
>#30 [880273a03f50] ret_from_fork at 815cb28c
>crash> gdb list *0x815c2591
>0x815c2591 is in rt_spin_lock_slowlock (kernel/rtmutex.c:109).
>104     }
>105     #endif
>106
>107     static inline void init_lists(struct rt_mutex *lock)
>108     {
>109             if (unlikely(!lock->wait_list.node_list.prev))
>110                     plist_head_init(&lock->wait_list);
>111     }
>112
>113     /*
>crash> gdb list *0x815c2590
>0x815c2590 is in rt_spin_lock_slowlock (kernel/rtmutex.c:744).
>739             struct rt_mutex_waiter waiter, *top_waiter;
>740             int ret;
>741
>742             rt_mutex_init_waiter(&waiter, true);
>743
>744             raw_spin_lock(&lock->wait_lock);
>745             init_lists(lock);
>746
>747             if (__try_to_take_rt_mutex(lock, self, NULL, STEAL_LATERAL)) {
>748                     raw_spin_unlock(&lock->wait_lock);
>crash> gdb list *0x815c2cc8
>0x815c2cc8 is in rt_spin_lock_slowunlock_hirq (kernel/rtmutex.c:851).
>846     {
>847             int ret;
>848
>849             do {
>850                     ret = raw_spin_trylock(&lock->wait_lock);
>851             } while (!ret);
>852
>853             __rt_spin_lock_slowunlock(lock);
>854     }
>855
>
>Dang, Santa might have delivered a lock pick set in a few more hours.

I have a small problem with understanding this…

|#24 [880273a03cd0] run_timer_softirq at 81069002

Here we obtain wait_lock from tvec_base of _this_ CPU. And we get to
init_lists() before the apic timer kicks in.
Re: [ANNOUNCE] 3.12.6-rt9
* Nicholas Mc Guire | 2013-12-27 21:00:24 [+0100]:

>> - A patch from Thomas Gleixner not to raise the timer softirq
>>   unconditionally (only if a timer is pending)
>>
>
>This one seems to deadlock early in the boot sequence on x86
>(i3/i7/Phenom-4x here and Carsten Emde also had boot failures)
>
>after dropping this patch with:
>patch -p1 -R < ../paches/timers-do-not-raise-softirq-unconditionally.patch
>3.12.6-rt9 boots up fine. cyclictest seems to be back to what it was before
>(only ran for a few minutes idle and 1h with load on an i3).
>
>The main problems with this patch, though, are procedural issues:
>the commit note - which is a mail exchange - actually does not explain what
>the rationale for the changes is (...well I don't understand the logic of
>run_local_timers - if someone can explain - please do) and notably:
>
>from timers-do-not-raise-softirq-unconditionally.patch
>
>well, that very same problem is in mainline if you add "threadirqs" to
>the command line. But we can be smart about this. The untested patch
>                                                      ^^
>below should address that issue. If that works on mainline we can
>adapt it for RT (needs a trylock(&base->lock) there).
>
>
> does make me wonder why this went into -rt9 ?

It was on the mailing list for a few weeks. My understanding was that
Mike Galbraith tested it on mainline and then I added the RT specific
pieces and added it to the tree.

> It also fails to build with CONFIG_PREEMPT_RT_FULL not set.

I will add a non-RT based config to my compile tests.

> as with this patch, systems that booted just fine with 3.12.5-rt7 don't
> even boot (at least my 3 x86 test boxes here did not) this raises some
> questions regarding the process of getting patches into -rtX - are
> we going too fast here ?
>
> I would prefer if such patches would go out with a request for testing
> or at least a "might blow up your system" note in them...

I didn't expect that much trouble. In general I try to avoid adding
explosives unless marked as such.

>thx!
>hofrat

Sebastian
Re: [ANNOUNCE] 3.12.6-rt9
On Fri, 27 Dec 2013 21:00:24 +0100 Nicholas Mc Guire wrote:

> On Mon, 23 Dec 2013, Sebastian Andrzej Siewior wrote:
>
> > Dear RT folks!
> >
> > I'm pleased to announce the v3.12.6-rt9 patch set.
> >
> > Changes since v3.12.6-rt8
>
> > - A patch from Thomas Gleixner not to raise the timer softirq
> >   unconditionally (only if a timer is pending)
> >
>
> This one seems to deadlock early in the boot sequence on x86
> (i3/i7/Phenom-4x here and Carsten Emde also had boot failures)

This patch seems to frequently make the kernel hang hard early in the
boot process on my i7-2600k too. Reverting
timers-do-not-raise-softirq-unconditionally.patch appears to fix the
problem.

-- Joakim
Re: [ANNOUNCE] 3.12.6-rt9
On Sat, 2013-12-28 at 08:43 +0100, Nicholas Mc Guire wrote:

> This type of blowup will not help to go mainline (referring to 3.12.X here,
> 3.4/6/8/10 is a different story).

Nah. Breakage is a vital sign. When breakage stops, bury it.

-Mike
Re: [ANNOUNCE] 3.12.6-rt9
On Sat, 28 Dec 2013, Mike Galbraith wrote:

> On Sat, 2013-12-28 at 04:30 +0100, Mike Galbraith wrote:
>
> > (Less than wonderful changelogs probably comes from the fact that
> > maintaining -rt out of tree is time consuming as all hell. Everybody
> > gets to break it, a couple guys get to fix it up again and again.)
>
> P.S. try rolling your tree forward to master or tip for entertainment,
> you'll see what I mean. Hi Peter, Rik.. other breakers of worlds :)
>

protesting external breakage by amending -rt with home-made landmines
does sound like an optimized entertainment strategy...

This type of blowup will not help to go mainline (referring to 3.12.X
here, 3.4/6/8/10 is a different story).

thx!
hofrat
Re: [ANNOUNCE] 3.12.6-rt9
On Sat, 2013-12-28 at 04:30 +0100, Mike Galbraith wrote:

> Watchdog barked at two such spots..

btw, lockdep doesn't grumble about that (didn't stare at annotation,
don't speak lockdep well). I fixed it up to not take its toys and go
home in a snit at boot (rt_mutex debug offends it methinks), but it
didn't gripe.
Re: [ANNOUNCE] 3.12.6-rt9
On Sat, 2013-12-28 at 04:30 +0100, Mike Galbraith wrote:

> (Less than wonderful changelogs probably comes from the fact that
> maintaining -rt out of tree is time consuming as all hell. Everybody
> gets to break it, a couple guys get to fix it up again and again.)

P.S. try rolling your tree forward to master or tip for entertainment,
you'll see what I mean. Hi Peter, Rik.. other breakers of worlds :)
Re: [ANNOUNCE] 3.12.6-rt9
On Fri, 2013-12-27 at 21:00 +0100, Nicholas Mc Guire wrote:
> On Mon, 23 Dec 2013, Sebastian Andrzej Siewior wrote:
>
> > Dear RT folks!
> >
> > I'm pleased to announce the v3.12.6-rt9 patch set.
> >
> > Changes since v3.12.6-rt8
>
> > - A patch from Thomas Gleixner not to raise the timer softirq
> >   unconditionally (only if a timer is pending)
> >
>
> This one seems to deadlock early in the boot sequence on x86
> (i3/i7/Phenom-4x here and Carsten Emde also had boot failures)
>
> after dropping this patch with:
> patch -p1 -R < ../paches/timers-do-not-raise-softirq-unconditionally.patch
> 3.12.6-rt9 boots up fine. cyclictest seems to be back to what it was before
> (only ran for a few minutes idle and 1h with load on an i3).
>
> The main problems with this patch, though, are procedural issues:
> the commit note - which is a mail exchange - actually does not explain what
> the rationale for the changes is

Raising the timer softirq unconditionally wakes ksoftirqd at every tick,
so the only time the no_hz_full "one and only one task is runnable" tick
shutdown criteria can be met is when the box has zero other runnable
tasks.. i.e. when box is idle.

Here, patch works fine boot wise, and no_hz_full tick shutdown works as
well, but there are a couple spots where taking an interrupt is a bad
idea as things sit. Watchdog barked at two such spots, and there's a
"you _will_ hit this warning in -rt" spot as well. With bandaids on the
sore spots, my 64 core box survives.

-Mike

(Less than wonderful changelogs probably comes from the fact that
maintaining -rt out of tree is time consuming as all hell. Everybody
gets to break it, a couple guys get to fix it up again and again.)
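Mike's explanation of why the unconditional raise defeats the no_hz_full tick shutdown can be put into a toy model. The names model_nr_running() and model_tick_can_stop() are hypothetical, and the criterion is reduced to "at most one runnable task on the CPU", which is an assumption made for illustration rather than the scheduler's full rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Tasks runnable on a CPU at tick time: user tasks plus ksoftirqd if woken. */
static unsigned int model_nr_running(unsigned int user_tasks,
                                     bool ksoftirqd_woken)
{
        return user_tasks + (ksoftirqd_woken ? 1u : 0u);
}

/* Simplified criterion: the tick may stop with at most one runnable task. */
static bool model_tick_can_stop(unsigned int nr_running)
{
        return nr_running <= 1;
}
```

With one busy user task, waking ksoftirqd at every tick brings the count to two and the tick keeps running; only an otherwise idle CPU meets the criterion, as Mike describes.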
Re: [ANNOUNCE] 3.12.6-rt9
On Mon, 23 Dec 2013, Sebastian Andrzej Siewior wrote:

> Dear RT folks!
>
> I'm pleased to announce the v3.12.6-rt9 patch set.
>
> Changes since v3.12.6-rt8
> - A patch from Thomas Gleixner not to raise the timer softirq
>   unconditionally (only if a timer is pending)
>

This one seems to deadlock early in the boot sequence on x86
(i3/i7/Phenom-4x here and Carsten Emde also had boot failures)

after dropping this patch with:
patch -p1 -R < ../paches/timers-do-not-raise-softirq-unconditionally.patch
3.12.6-rt9 boots up fine. cyclictest seems to be back to what it was before
(only ran for a few minutes idle and 1h with load on an i3).

The main problems with this patch, though, are procedural issues:
the commit note - which is a mail exchange - actually does not explain what
the rationale for the changes is (...well I don't understand the logic of
run_local_timers - if someone can explain - please do) and notably:

from timers-do-not-raise-softirq-unconditionally.patch

 well, that very same problem is in mainline if you add "threadirqs" to
 the command line. But we can be smart about this. The untested patch
                                                       ^^
 below should address that issue. If that works on mainline we can
 adapt it for RT (needs a trylock(&base->lock) there).

does make me wonder why this went into -rt9 ?

It also fails to build with CONFIG_PREEMPT_RT_FULL not set.

as with this patch, systems that booted just fine with 3.12.5-rt7 don't
even boot (at least my 3 x86 test boxes here did not) this raises some
questions regarding the process of getting patches into -rtX - are
we going too fast here ?

I would prefer if such patches would go out with a request for testing
or at least a "might blow up your system" note in them...

thx!
hofrat
Re: [ANNOUNCE] 3.12.6-rt9
On Tue, 2013-12-24 at 20:39 +0400, Pavel Vasilyev wrote:
> 24.12.2013 19:47, Mike Galbraith wrote:
> > On Mon, 2013-12-23 at 23:50 +0100, Sebastian Andrzej Siewior wrote:
>
> > crash> bt
> > PID: 508  TASK: 8802739ba340  CPU: 16  COMMAND: "ksoftirqd/16"
>
> YES!!! And ARM code broke :)

And NO_HZ_TICK config survived for only 4.5 hours.

PID: 6948  TASK: 880272d1f1c0  CPU: 29  COMMAND: "tbench"
 #0 [8802769a6a40] machine_kexec at 8103bc07
 #1 [8802769a6aa0] crash_kexec at 810d3e93
 #2 [8802769a6b70] panic at 815bce70
 #3 [8802769a6bf0] watchdog_overflow_callback at 810fd51d
 #4 [8802769a6c10] __perf_event_overflow at 8112f1f8
 #5 [8802769a6ca0] perf_event_overflow at 8112fb14
 #6 [8802769a6cb0] intel_pmu_handle_irq at 8102078f
 #7 [8802769a6de0] perf_event_nmi_handler at 815c2de5
 #8 [8802769a6e10] nmi_handle at 815c2493
 #9 [8802769a6ea0] default_do_nmi at 815c2623
#10 [8802769a6ed0] do_nmi at 815c2948
#11 [8802769a6ef0] end_repeat_nmi at 815c1931
    [exception RIP: preempt_schedule+36]
    RIP: 815be944  RSP: 8802769a3d98  RFLAGS: 0002
    RAX: 0010  RBX: 0010  RCX: 0002
    RDX: 8802769a3d98  RSI: 0018  RDI: 0001
    RBP: 815be944  R8: 815be944  R9: 0018
    R10: 8802769a3d98  R11: 0002  R12:
    R13: 880273f74000  R14: 880272d1f1c0  R15: 880269cedfd8
    ORIG_RAX: 880269cedfd8  CS: 0010  SS: 0018
--- ---
#12 [8802769a3d98] preempt_schedule at 815be944
#13 [8802769a3db0] _raw_spin_trylock at 815c0d6e
#14 [8802769a3dc0] rt_spin_lock_slowunlock_hirq at 815c0288
#15 [8802769a3de0] rt_spin_unlock_after_trylock_in_irq at 815c09e5
#16 [8802769a3df0] run_local_timers at 81068025
#17 [8802769a3e10] update_process_times at 810680ac
#18 [8802769a3e40] tick_sched_handle at 810c3a92
#19 [8802769a3e60] tick_sched_timer at 810c3d2f
#20 [8802769a3e90] __run_hrtimer at 8108471d
#21 [8802769a3ed0] hrtimer_interrupt at 8108497a
#22 [8802769a3f70] local_apic_timer_interrupt at 810349e6
#23 [8802769a3f90] smp_apic_timer_interrupt at 810358ee
#24 [8802769a3fb0] apic_timer_interrupt at 815c955d
--- ---
#25 [880269ced848] apic_timer_interrupt at 815c955d
    [exception RIP: _raw_spin_lock+53]
    RIP: 815c0c05  RSP: 880269ced8f8  RFLAGS: 0202
    RAX: 0b7b  RBX: 0282  RCX: 880272d1f1c0
    RDX: 0b7d  RSI: 880269ceda38  RDI: 880273f74000
    RBP: 880269ced8f8  R8: 0001  R9: b54d13a4
    R10: 0001  R11: 0001  R12: 880269ced910
    R13: 880276d32170  R14: 810c9030  R15: 880269ced8b8
    ORIG_RAX: ff10  CS: 0010  SS: 0018
#26 [880269ced900] rt_spin_lock_slowlock at 815bfb51
#27 [880269ced9e0] rt_spin_lock at 815c0922
#28 [880269ced9f0] lock_timer_base at 81067f92
#29 [880269ceda20] mod_timer at 81069bcb
#30 [880269ceda70] sk_reset_timer at 814d1e57
#31 [880269ceda90] inet_csk_reset_xmit_timer at 8152d4a8
#32 [880269cedac0] tcp_rearm_rto at 8152d583
#33 [880269cedae0] tcp_ack at 81534085
#34 [880269cedb60] tcp_rcv_established at 8153443d
#35 [880269cedbb0] tcp_v4_do_rcv at 8153f56a
#36 [880269cedbe0] __release_sock at 814d3891
#37 [880269cedc10] release_sock at 814d3942
#38 [880269cedc30] tcp_sendmsg at 8152b955
#39 [880269cedd00] inet_sendmsg at 8155350e
#40 [880269cedd30] sock_sendmsg at 814cea87
#41 [880269cede40] sys_sendto at 814cebdf
#42 [880269cedf80] tracesys at 815c8b09 (via system_call)
    RIP: 7f0441a1fc35  RSP: 7fffdea86130  RFLAGS: 0246
    RAX: ffda  RBX: 815c8b09  RCX:
    RDX: 248d  RSI: 00607260  RDI: 0004
    RBP: 248d  R8:  R9:
    R10:  R11: 0246  R12: 7fffdea86a10
    R13: 7fffdea86414  R14: 0004  R15: 00607260
    ORIG_RAX: 002c  CS: 0033  SS: 002b
Re: [ANNOUNCE] 3.12.6-rt9
24.12.2013 19:47, Mike Galbraith wrote:
> On Mon, 2013-12-23 at 23:50 +0100, Sebastian Andrzej Siewior wrote:

> crash> bt
> PID: 508  TASK: 8802739ba340  CPU: 16  COMMAND: "ksoftirqd/16"

YES!!! And ARM code broke :)

--
Pavel.
Re: [ANNOUNCE] 3.12.6-rt9
On Mon, 2013-12-23 at 23:50 +0100, Sebastian Andrzej Siewior wrote:
> Dear RT folks!
>
> I'm pleased to announce the v3.12.6-rt9 patch set.
>
> Changes since v3.12.6-rt8
> - ARM's mach-sti is now using rawlock as boot_lock (like the other
>   mach-*)
> - There was a callpath to rcu_preempt_qs() with interrupts enabled. Tiejun
>   Chen posted a patch to call it with interrupt disabled like we always
>   do.
> - A patch from Paul E. McKenney to not activate RCU core on NO_HZ_FULL
>   CPUs
> - A patch from Thomas Gleixner not to raise the timer softirq
>   unconditionally (only if a timer is pending)
>
>
> There is also a patch in the queue from Paul E. McKenney to move RCU
> processing from softirq into its own thread. After Mike Galbraith
> reported a few RCU stalls I decided to keep it disabled for now until I
> have some time to look at it.

I built this kernel with Paul's patch and NO_HZ_FULL enabled again on 64
core box. I haven't seen RCU grip yet, but I just checked on it after
3.5 hours into this boot/beat (after fixing crash+kdump setup), and
found it in the process of dumping.

crash> bt
PID: 508  TASK: 8802739ba340  CPU: 16  COMMAND: "ksoftirqd/16"
 #0 [880276806a40] machine_kexec at 8103bc07
 #1 [880276806aa0] crash_kexec at 810d56b3
 #2 [880276806b70] panic at 815bf8b0
 #3 [880276806bf0] watchdog_overflow_callback at 810fed3d
 #4 [880276806c10] __perf_event_overflow at 81131928
 #5 [880276806ca0] perf_event_overflow at 81132254
 #6 [880276806cb0] intel_pmu_handle_irq at 8102078f
 #7 [880276806de0] perf_event_nmi_handler at 815c5825
 #8 [880276806e10] nmi_handle at 815c4ed3
 #9 [880276806ea0] default_do_nmi at 815c5063
#10 [880276806ed0] do_nmi at 815c5388
#11 [880276806ef0] end_repeat_nmi at 815c4371
    [exception RIP: _raw_spin_trylock+48]
    RIP: 815c3790  RSP: 880276803e28  RFLAGS: 0002
    RAX: 0010  RBX: 0010  RCX: 0002
    RDX: 880276803e28  RSI: 0018  RDI: 0001
    RBP: 815c3790  R8: 815c3790  R9: 0018
    R10: 880276803e28  R11: 0002  R12:
    R13: 880273a0c000  R14: 8802739ba340  R15: 880273a03fd8
    ORIG_RAX: 880273a03fd8  CS: 0010  SS: 0018
--- ---
#12 [880276803e28] _raw_spin_trylock at 815c3790
#13 [880276803e30] rt_spin_lock_slowunlock_hirq at 815c2cc8
#14 [880276803e50] rt_spin_unlock_after_trylock_in_irq at 815c3425
#15 [880276803e60] get_next_timer_interrupt at 810684a7
#16 [880276803ed0] tick_nohz_stop_sched_tick at 810c5f2e
#17 [880276803f50] tick_nohz_irq_exit at 810c6333
#18 [880276803f70] irq_exit at 81060065
#19 [880276803f90] smp_apic_timer_interrupt at 810358f5
#20 [880276803fb0] apic_timer_interrupt at 815cbf9d
--- ---
#21 [880273a03b28] apic_timer_interrupt at 815cbf9d
    [exception RIP: _raw_spin_lock+50]
    RIP: 815c3642  RSP: 880273a03bd8  RFLAGS: 0202
    RAX: 8b49  RBX: 880272157290  RCX: 8802739ba340
    RDX: 8b4a  RSI: 0010  RDI: 880273a0c000
    RBP: 880273a03bd8  R8: 0001  R9:
    R10:  R11: 0001  R12: 810927b5
    R13: 880273a03b68  R14: 0010  R15: 0010
    ORIG_RAX: ff10  CS: 0010  SS: 0018
#22 [880273a03be0] rt_spin_lock_slowlock at 815c2591
#23 [880273a03cc0] rt_spin_lock at 815c3362
#24 [880273a03cd0] run_timer_softirq at 81069002
#25 [880273a03d70] handle_softirq at 81060d0f
#26 [880273a03db0] do_current_softirqs at 81060f3c
#27 [880273a03e20] run_ksoftirqd at 81061045
#28 [880273a03e40] smpboot_thread_fn at 81089c31
#29 [880273a03ec0] kthread at 810807fe
#30 [880273a03f50] ret_from_fork at 815cb28c
crash> gdb list *0x815c2591
0x815c2591 is in rt_spin_lock_slowlock (kernel/rtmutex.c:109).
104     }
105     #endif
106
107     static inline void init_lists(struct rt_mutex *lock)
108     {
109             if (unlikely(!lock->wait_list.node_list.prev))
110                     plist_head_init(&lock->wait_list);
111     }
112
113     /*
crash> gdb list *0x815c2590
0x815c2590 is in rt_spin_lock_slowlock (kernel/rtmutex.c:744).
739             struct rt_mutex_waiter waiter, *top_waiter;
740             int ret;
741
742             rt_mutex_init_waiter(&waiter, true);
743
744             raw_spin_lock(&lock->wait_lock);
745             init_lists(lock);
746
747             if (__try_to_take_rt_mutex(lock, self, NULL, STEAL_LATERAL)) {
748                     raw_spin_unlock(&lock->wait_lock);
c