Re: [ANNOUNCE] 3.12.6-rt9

2014-01-24 Thread Sebastian Andrzej Siewior
On 01/21/2014 03:17 AM, Steven Rostedt wrote:
> Signed-off-by: Steven Rostedt 
> 
> diff --git a/kernel/timer.c b/kernel/timer.c
> index 46467be..8212c10 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1464,13 +1464,11 @@ void run_local_timers(void)
>   raise_softirq(TIMER_SOFTIRQ);
>   return;
>   }
> - if (!base->active_timers)
> - goto out;
>  
>   /* Check whether the next pending timer has expired */
>   if (time_before_eq(base->next_timer, jiffies))
>   raise_softirq(TIMER_SOFTIRQ);

Hmmm. If active_timers is 0 and "time_before_eq(base->next_timer,
jiffies)" is true, then that timer should have been initialized with
init_timer_deferrable(), or we have a serious bug here where
active_timers isn't properly synchronized anymore.

Now, if there is really just a deferrable timer that expired and nothing
else, then this would explain it.
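
A minimal sketch of the two decisions being compared, in plain C outside
the kernel. It assumes, as the reasoning above implies, that
active_timers does not count deferrable timers. struct toy_base and both
function names are hypothetical stand-ins, not kernel code; only
time_before_eq() mirrors the idea of the kernel macro.

```c
#include <assert.h>
#include <stdbool.h>

/* Wrap-safe "a <= b" for jiffies-style counters (same idea as the kernel macro). */
#define time_before_eq(a, b) ((long)((a) - (b)) <= 0)

/* Hypothetical stand-in for the two tvec_base fields discussed here. */
struct toy_base {
	unsigned long next_timer;    /* expiry the base believes is next */
	unsigned long active_timers; /* assumption: deferrable timers are not counted */
};

/* Decision with the active_timers shortcut still in place. */
static bool raise_with_check(const struct toy_base *b, unsigned long jiffies)
{
	assert(b);
	if (!b->active_timers)
		return false; /* the "goto out" path: softirq is not raised */
	return time_before_eq(b->next_timer, jiffies);
}

/* Decision after the shortcut is removed: only the expiry test remains. */
static bool raise_without_check(const struct toy_base *b, unsigned long jiffies)
{
	assert(b);
	return time_before_eq(b->next_timer, jiffies);
}
```

With active_timers == 0 and next_timer already reached, the first
function leaves the softirq alone while the second raises it, which is
the case a lone expired deferrable timer would produce.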

> -out:
> +
>   rt_spin_unlock_after_trylock_in_irq(&base->lock);
>  
>  }

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ANNOUNCE] 3.12.6-rt9

2014-01-22 Thread Joakim Hernberg
On Mon, 20 Jan 2014 21:17:36 -0500
Steven Rostedt  wrote:

> I happen to have a i7 box to test on, and sure enough, the latest
> 3.12-rt locks up on boot and reverting the
> timers-do-not-raise-softirq-unconditionally.patch, it boots fine.

> Signed-off-by: Steven Rostedt 
> 
> diff --git a/kernel/timer.c b/kernel/timer.c
> index 46467be..8212c10 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1464,13 +1464,11 @@ void run_local_timers(void)
>   raise_softirq(TIMER_SOFTIRQ);
>   return;
>   }
> - if (!base->active_timers)
> - goto out;
>  
>   /* Check whether the next pending timer has expired */
>   if (time_before_eq(base->next_timer, jiffies))
>   raise_softirq(TIMER_SOFTIRQ);
> -out:
> +
>   rt_spin_unlock_after_trylock_in_irq(&base->lock);
>  
>  }

This fixes the problem on my i7-2600k.

-- 

   Joakim


Re: [ANNOUNCE] 3.12.6-rt9

2014-01-21 Thread Joe Korty
On Tue, Jan 21, 2014 at 01:39:10AM -0500, Muli Baron wrote:
> On 21/1/2014 04:17, Steven Rostedt wrote:
> > On Sat, 18 Jan 2014 04:15:29 +0100
> > Mike Galbraith  wrote:
> >
> >
> >>> So you also have the timers-do-not-raise-softirq-unconditionally.patch?
> >>
> >
> > People have been complaining that the latest 3.12-rt does not boot on
> > intel i7 boxes. And by reverting this patch, it boots fine.
> >
> > I happen to have a i7 box to test on, and sure enough, the latest
> > 3.12-rt locks up on boot and reverting the
> > timers-do-not-raise-softirq-unconditionally.patch, it boots fine.
> >
> > Looking into it, I made this small update, and the box boots. Seems
> > checking "active_timers" is not enough to skip raising softirqs. I
> > haven't looked at why yet, but I would like others to test this patch
> > too.
> >
> > I'll leave why this lets i7 boxes boot as an exercise for Thomas ;-)
> >
> > -- Steve
> >
> > Signed-off-by: Steven Rostedt 
> >
> > diff --git a/kernel/timer.c b/kernel/timer.c
> > index 46467be..8212c10 100644
> > --- a/kernel/timer.c
> > +++ b/kernel/timer.c
> > @@ -1464,13 +1464,11 @@ void run_local_timers(void)
> > raise_softirq(TIMER_SOFTIRQ);
> > return;
> > }
> > -   if (!base->active_timers)
> > -   goto out;
> >
> > /* Check whether the next pending timer has expired */
> > if (time_before_eq(base->next_timer, jiffies))
> > raise_softirq(TIMER_SOFTIRQ);
> > -out:
> > +
> > rt_spin_unlock_after_trylock_in_irq(&base->lock);
> >
> >   }
> >
> 
> While this might fix booting on i7 machines it kind of defeats the 
> original purpose of this patch, which was to let NO_HZ_FULL work 
> properly with threaded interrupts. With the active_timers check removed 
> the timer interrupt keeps firing even though there is only one task 
> running on a specific processor, since it can't shut down the tick 
> because the ksoftirqd thread keeps getting scheduled (see the previous 
> thread "CONFIG_NO_HZ_FULL + CONFIG_PREEMPT_RT_FULL = nogo" for the full 
> discussion).
> 
> -- Muli


Would something like this work?  This would get us past boot, which has
always been this strange, half initialized thing one has to tiptoe around.

-   if (!base->active_timers)
+   if (!base->active_timers && system_state == SYSTEM_RUNNING)
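
A small sketch of what that one-line change would mean, written as plain
C outside the kernel. The enum is a hypothetical mirror of just two
system states, and may_skip_expiry_test() is an illustrative name, not a
kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the two system states that matter for this sketch. */
enum toy_system_state { TOY_SYSTEM_BOOTING, TOY_SYSTEM_RUNNING };

/*
 * Returns true when the tick may take the "goto out" shortcut and leave
 * the timer softirq alone. During boot the shortcut is never taken.
 */
static bool may_skip_expiry_test(unsigned long active_timers,
				 enum toy_system_state state)
{
	return active_timers == 0 && state == TOY_SYSTEM_RUNNING;
}

/* Usage: the truth table implied by the proposed condition. */
void toy_check_truth_table(void)
{
	assert(!may_skip_expiry_test(0, TOY_SYSTEM_BOOTING));
	assert(may_skip_expiry_test(0, TOY_SYSTEM_RUNNING));
	assert(!may_skip_expiry_test(2, TOY_SYSTEM_RUNNING));
}
```

Under this reading, boot behaves as if the check had been removed, and
the NO_HZ_FULL benefit only applies once the system is running.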

Joe



Re: [ANNOUNCE] 3.12.6-rt9

2014-01-20 Thread Steven Rostedt
On Sat, 18 Jan 2014 04:15:29 +0100
Mike Galbraith  wrote:

 
> > So you also have the timers-do-not-raise-softirq-unconditionally.patch?
> 

People have been complaining that the latest 3.12-rt does not boot on
Intel i7 boxes, and that it boots fine once this patch is reverted.

I happen to have an i7 box to test on, and sure enough, the latest
3.12-rt locks up on boot; with
timers-do-not-raise-softirq-unconditionally.patch reverted, it boots fine.

Looking into it, I made this small update, and the box boots. It seems
that checking "active_timers" is not enough to skip raising softirqs. I
haven't looked at why yet, but I would like others to test this patch
too.

I'll leave why this lets i7 boxes boot as an exercise for Thomas ;-)

-- Steve

Signed-off-by: Steven Rostedt 

diff --git a/kernel/timer.c b/kernel/timer.c
index 46467be..8212c10 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1464,13 +1464,11 @@ void run_local_timers(void)
 		raise_softirq(TIMER_SOFTIRQ);
 		return;
 	}
-	if (!base->active_timers)
-		goto out;
 
 	/* Check whether the next pending timer has expired */
 	if (time_before_eq(base->next_timer, jiffies))
 		raise_softirq(TIMER_SOFTIRQ);
-out:
+
 	rt_spin_unlock_after_trylock_in_irq(&base->lock);
 
 }


Re: [ANNOUNCE] 3.12.6-rt9

2014-01-17 Thread Mike Galbraith
On Fri, 2014-01-17 at 18:00 +0100, Sebastian Andrzej Siewior wrote: 
> * Mike Galbraith | 2013-12-24 16:47:47 [+0100]:
> 
> >I built this kernel with Paul's patch and NO_HZ_FULL enabled again on 64
> >core box.  I haven't seen RCU grip yet, but I just checked on it after
> >3.5 hours into this boot/beat (after fixing crash+kdump setup), and
> >found it in the process of dumping. 
> 
> So you also have the timers-do-not-raise-softirq-unconditionally.patch?

Oh dear, there's holidays, vacation, and massive turkey overdose between
then and now, but I'm almost positive that the tree was virgin $subject,
with only Paul's patch enabled, that being what I wanted to beat on.

> I have a small problem with understanding this…
> 
> |#24 [880273a03cd0] run_timer_softirq at 81069002
> 
> Here we obtain wait_lock from tvec_base of _this_ CPU. And we get to
> init_lists() before the apic timer kicks in. So we have the wait_lock.

gdb fibs a little, we're acquiring.

>---  ---
> >#21 [880273a03b28] apic_timer_interrupt at 815cbf9d
> >[exception RIP: _raw_spin_lock+50]

> In the hard interrupt triggered by the apic timer we get to
> get_next_timer_interrupt() and go again for the same wait_lock. Here we
> have the try_lock so we avoid this deadlock.
> The odd part: we get the lock. It should be the same lock because both use
> | struct tvec_base *base = __this_cpu_read(tvec_bases);
> to get it. And we shouldn't get it because the lock is already held.
> We get into trouble in the unlock path where we spin forever:
> 
> |#14 [880276803e50] rt_spin_unlock_after_trylock_in_irq at 815c3425
> |#12 [880276803e28] _raw_spin_trylock at 815c3790
> 
> which releases the lock with a trylock in order to keep lockdep happy.
> My understanding was that we should be able to obtain the wait_lock here
> since we were able to obtain it in the lock path and in irq off context
> there is nothing that could take the lock in the meantime.

IIRC, we were endlessly trying, but with an un-punched ticket under us,
and no Xen like evilness to save the day.
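
One possible reading of the "un-punched ticket" remark, sketched as a toy
ticket lock in plain C. Everything here (struct toy_ticket_lock and the
toy_* helpers) is hypothetical and single-threaded; it only models the
bookkeeping, not the real raw spinlock. The scenario assumes the local
task had already drawn a ticket when the interrupt arrived, so a trylock
in the interrupt on the same CPU sees head != tail on every attempt.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy ticket lock: "tail" is the next ticket to hand out, "head" is now serving. */
struct toy_ticket_lock {
	unsigned int head;
	unsigned int tail;
};

/* First half of a blocking lock: draw a ticket and return it. */
static unsigned int toy_take_ticket(struct toy_ticket_lock *l)
{
	return l->tail++;
}

/* The lock belongs to whoever holds the ticket equal to head. */
static bool toy_ticket_is_served(const struct toy_ticket_lock *l, unsigned int ticket)
{
	return l->head == ticket;
}

/* trylock succeeds only when nobody holds or waits for the lock. */
static bool toy_trylock(struct toy_ticket_lock *l)
{
	if (l->head != l->tail)
		return false;
	l->tail++;
	return true;
}

static void toy_unlock(struct toy_ticket_lock *l)
{
	assert(l->head != l->tail); /* must be held by someone */
	l->head++;
}

/*
 * Bounded version of "do { ret = trylock; } while (!ret);".
 * Returns the number of failed attempts, or max_tries if it never succeeded.
 */
static unsigned int toy_spin_trylock(struct toy_ticket_lock *l, unsigned int max_tries)
{
	unsigned int tries;

	for (tries = 0; tries < max_tries; tries++)
		if (toy_trylock(l))
			break;
	return tries;
}

/* Scenario: the interrupted task owns the next ticket but cannot run. */
unsigned int toy_unpunched_ticket_demo(unsigned int max_tries)
{
	struct toy_ticket_lock l = { 0, 0 };
	unsigned int task;

	(void)toy_take_ticket(&l);   /* another CPU holds the lock */
	task = toy_take_ticket(&l);  /* local task queues behind it, then gets interrupted */
	toy_unlock(&l);              /* the other CPU releases */
	if (!toy_ticket_is_served(&l, task))
		return 0;            /* not reached in this scenario */
	return toy_spin_trylock(&l, max_tries); /* the interrupt spins on trylock */
}
```

In this model the loop exhausts every attempt, because only the
interrupted task could advance the lock and it cannot run until the
interrupt returns.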

I've since cleaned out my crashdump directory and moved on to frolicking
with hotplug gremlins, so don't have that one to revisit, but the don't
unconditionally raise timer softirq patch is the bad guy.

-Mike



Re: [ANNOUNCE] 3.12.6-rt9

2014-01-17 Thread Sebastian Andrzej Siewior
* Mike Galbraith | 2013-12-24 16:47:47 [+0100]:

>I built this kernel with Paul's patch and NO_HZ_FULL enabled again on 64
>core box.  I haven't seen RCU grip yet, but I just checked on it after
>3.5 hours into this boot/beat (after fixing crash+kdump setup), and
>found it in the process of dumping. 

So you also have the timers-do-not-raise-softirq-unconditionally.patch?

>crash> bt
>PID: 508    TASK: 8802739ba340  CPU: 16  COMMAND: "ksoftirqd/16"
> #0 [880276806a40] machine_kexec at 8103bc07
> #1 [880276806aa0] crash_kexec at 810d56b3
> #2 [880276806b70] panic at 815bf8b0
> #3 [880276806bf0] watchdog_overflow_callback at 810fed3d
> #4 [880276806c10] __perf_event_overflow at 81131928
> #5 [880276806ca0] perf_event_overflow at 81132254
> #6 [880276806cb0] intel_pmu_handle_irq at 8102078f
> #7 [880276806de0] perf_event_nmi_handler at 815c5825
> #8 [880276806e10] nmi_handle at 815c4ed3
> #9 [880276806ea0] default_do_nmi at 815c5063
>#10 [880276806ed0] do_nmi at 815c5388
>#11 [880276806ef0] end_repeat_nmi at 815c4371
>[exception RIP: _raw_spin_trylock+48]
>RIP: 815c3790  RSP: 880276803e28  RFLAGS: 0002
>RAX: 0010  RBX: 0010  RCX: 0002
>RDX: 880276803e28  RSI: 0018  RDI: 0001
>RBP: 815c3790   R8: 815c3790   R9: 0018
>R10: 880276803e28  R11: 0002  R12: 
>R13: 880273a0c000  R14: 8802739ba340  R15: 880273a03fd8
>ORIG_RAX: 880273a03fd8  CS: 0010  SS: 0018
>--- <NMI exception stack> ---
>#12 [880276803e28] _raw_spin_trylock at 815c3790
>#13 [880276803e30] rt_spin_lock_slowunlock_hirq at 815c2cc8
>#14 [880276803e50] rt_spin_unlock_after_trylock_in_irq at 815c3425
>#15 [880276803e60] get_next_timer_interrupt at 810684a7
>#16 [880276803ed0] tick_nohz_stop_sched_tick at 810c5f2e
>#17 [880276803f50] tick_nohz_irq_exit at 810c6333
>#18 [880276803f70] irq_exit at 81060065
>#19 [880276803f90] smp_apic_timer_interrupt at 810358f5
>#20 [880276803fb0] apic_timer_interrupt at 815cbf9d
>--- <IRQ stack> ---
>#21 [880273a03b28] apic_timer_interrupt at 815cbf9d
>[exception RIP: _raw_spin_lock+50]
>RIP: 815c3642  RSP: 880273a03bd8  RFLAGS: 0202
>RAX: 8b49  RBX: 880272157290  RCX: 8802739ba340
>RDX: 8b4a  RSI: 0010  RDI: 880273a0c000
>RBP: 880273a03bd8   R8: 0001   R9: 
>R10:   R11: 0001  R12: 810927b5
>R13: 880273a03b68  R14: 0010  R15: 0010
>ORIG_RAX: ff10  CS: 0010  SS: 0018
>#22 [880273a03be0] rt_spin_lock_slowlock at 815c2591
>#23 [880273a03cc0] rt_spin_lock at 815c3362
>#24 [880273a03cd0] run_timer_softirq at 81069002
>#25 [880273a03d70] handle_softirq at 81060d0f
>#26 [880273a03db0] do_current_softirqs at 81060f3c
>#27 [880273a03e20] run_ksoftirqd at 81061045
>#28 [880273a03e40] smpboot_thread_fn at 81089c31
>#29 [880273a03ec0] kthread at 810807fe
>#30 [880273a03f50] ret_from_fork at 815cb28c
>crash> gdb list *0x815c2591
>0x815c2591 is in rt_spin_lock_slowlock (kernel/rtmutex.c:109).
>104 }
>105 #endif
>106 
>107 static inline void init_lists(struct rt_mutex *lock)
>108 {
>109 if (unlikely(!lock->wait_list.node_list.prev))
>110 plist_head_init(&lock->wait_list);
>111 }
>112 
>113 /*
>crash> gdb list *0x815c2590
>0x815c2590 is in rt_spin_lock_slowlock (kernel/rtmutex.c:744).
>739 struct rt_mutex_waiter waiter, *top_waiter;
>740 int ret;
>741 
>742 rt_mutex_init_waiter(&waiter, true);
>743 
>744 raw_spin_lock(&lock->wait_lock);
>745 init_lists(lock);
>746 
>747 if (__try_to_take_rt_mutex(lock, self, NULL, STEAL_LATERAL)) {
>748 raw_spin_unlock(&lock->wait_lock);
>crash> gdb list *0x815c2cc8
>0x815c2cc8 is in rt_spin_lock_slowunlock_hirq (kernel/rtmutex.c:851).
>846 {
>847 int ret;
>848 
>849 do {
>850 ret = raw_spin_trylock(&lock->wait_lock);
>851 } while (!ret);
>852 
>853 __rt_spin_lock_slowunlock(lock);
>854 }
>855
>
>Dang, Santa might have delivered a lock pick set in a few more hours.

I have a small problem with understanding this…

|#24 [880273a03cd0] run_timer_softirq at 81069002

Here we obtain wait_lock from tvec_base of _this_ CPU. And we get to
init_lists() before the apic timer kicks in.

Re: [ANNOUNCE] 3.12.6-rt9

2014-01-17 Thread Sebastian Andrzej Siewior
* Nicholas Mc Guire | 2013-12-27 21:00:24 [+0100]:

>> - A patch from Thomas Gleixner not to raise the timer softirq
>>   unconditionally (only if a timer is pending)
>> 
>
>This one seems to deadlock early in the boot sequence on x86
>(i3/i7/Phenom-4x here and Carsten Emde also had boot failures)
>
>after droping this patch with:
>patch -p1 -R < ../paches/timers-do-not-raise-softirq-unconditionally.patch
>3.12.6-rt9 boots up fine. cyclictest seems to be back to what it was before
>(only ran for a few minutes idle and 1h with load on an i3).
>
>The main problem with this patch though are proceduaral isues 
>the commit note - which is a mail exchange - actually does not explain what 
>the rational for the changes is (...well I don't understand the logic of
>run_local_timers - if someone can explain - pleas do) and notably:
>
>from timers-do-not-raise-softirq-unconditionally.patch
>
>well, that very same problem is in mainline if you add "threadirqs" to
>the command line. But we can be smart about this. The untested patch
>^^
>below should address that issue. If that works on mainline we can
>adapt it for RT (needs a trylock(&base->lock) there).
>
>
> does make me wonder why this went into -rt9 ?

It was on the mailing list for a few weeks. My understanding was that
Mike Galbraith tested it on mainline, and then I added the RT-specific
pieces and added it to the tree.

> It also build fails with CONFIG_PREEMPT_RT_FULL not set.

I will add a non-RT based config to my compile tests.

> as with this patch, systems that booted just fine with 3.12.5-rt7 don't
> even boot (atleast my 3 x86 test boxes here did not) this raises some
> questions regarding the process of getting patches into -rtX - are
> we going to fast here ?
>
> I would prefere if such patches would go out with a request for testing
> or atleast a "might blow up your system" note in them...

I didn't expect that much trouble. In general I try to avoid adding
explosives unless marked as such.

>thx!
>hofrat

Sebastian


Re: [ANNOUNCE] 3.12.6-rt9

2014-01-11 Thread Joakim Hernberg
On Fri, 27 Dec 2013 21:00:24 +0100
Nicholas Mc Guire  wrote:

> On Mon, 23 Dec 2013, Sebastian Andrzej Siewior wrote:
> 
> > Dear RT folks!
> > 
> > I'm pleased to announce the v3.12.6-rt9 patch set.
> > 
> > Changes since v3.12.6-rt8
> 
> > - A patch from Thomas Gleixner not to raise the timer softirq
> >   unconditionally (only if a timer is pending)
> > 
> 
> This one seems to deadlock early in the boot sequence on x86
> (i3/i7/Phenom-4x here and Carsten Emde also had boot failures)

This patch seems to frequently make the kernel hang hard early in the
boot process on my i7-2600k too. Reverting
timers-do-not-raise-softirq-unconditionally.patch appears to fix the
problem.

-- 

   Joakim


Re: [ANNOUNCE] 3.12.6-rt9

2013-12-28 Thread Mike Galbraith
On Sat, 2013-12-28 at 08:43 +0100, Nicholas Mc Guire wrote:

> This type of blowups will not help to go mainline (refereing to 3.12.X here, 
> 3.4/6/8/10 is a different story).

Nah.  Breakage is a vital sign.  When breakage stops, bury it.

-Mike



Re: [ANNOUNCE] 3.12.6-rt9

2013-12-27 Thread Nicholas Mc Guire
On Sat, 28 Dec 2013, Mike Galbraith wrote:

> On Sat, 2013-12-28 at 04:30 +0100, Mike Galbraith wrote:
> 
> > (Less than wonderful changelogs probably comes from the fact that
> > maintaining -rt out of tree is time consuming as all hell.  Everybody
> > gets to breaks it, a couple guys get to fix it up again and again.)
> 
> P.S.  try rolling your tree forward to master or tip for entertainment,
> you'll see what I mean.  Hi Peter, Rik.. other breakers of worlds :)
>
protesting external breakage by amending -rt with home-made landmines
does sound like an optimized entertainment strategy...

This type of blowup will not help to go mainline (referring to 3.12.X here,
3.4/6/8/10 is a different story).

thx!
hofrat


Re: [ANNOUNCE] 3.12.6-rt9

2013-12-27 Thread Mike Galbraith
On Sat, 2013-12-28 at 04:30 +0100, Mike Galbraith wrote:

> Watchdog barked at two such spots..

btw, lockdep doesn't grumble about that (didn't stare at annotation,
don't speak lockdep well).  I fixed it up to not take its toys and go
home in a snit at boot (rt_mutex debug offends it methinks), but it
didn't gripe.



Re: [ANNOUNCE] 3.12.6-rt9

2013-12-27 Thread Mike Galbraith
On Sat, 2013-12-28 at 04:30 +0100, Mike Galbraith wrote:

> (Less than wonderful changelogs probably comes from the fact that
> maintaining -rt out of tree is time consuming as all hell.  Everybody
> gets to breaks it, a couple guys get to fix it up again and again.)

P.S.  try rolling your tree forward to master or tip for entertainment,
you'll see what I mean.  Hi Peter, Rik.. other breakers of worlds :)



Re: [ANNOUNCE] 3.12.6-rt9

2013-12-27 Thread Mike Galbraith
On Fri, 2013-12-27 at 21:00 +0100, Nicholas Mc Guire wrote: 
> On Mon, 23 Dec 2013, Sebastian Andrzej Siewior wrote:
> 
> > Dear RT folks!
> > 
> > I'm pleased to announce the v3.12.6-rt9 patch set.
> > 
> > Changes since v3.12.6-rt8
> 
> > - A patch from Thomas Gleixner not to raise the timer softirq
> >   unconditionally (only if a timer is pending)
> > 
> 
> This one seems to deadlock early in the boot sequence on x86
> (i3/i7/Phenom-4x here and Carsten Emde also had boot failures)
> 
> after droping this patch with:
> patch -p1 -R < ../paches/timers-do-not-raise-softirq-unconditionally.patch
> 3.12.6-rt9 boots up fine. cyclictest seems to be back to what it was before
> (only ran for a few minutes idle and 1h with load on an i3).
> 
> The main problem with this patch though are proceduaral isues 
> the commit note - which is a mail exchange - actually does not explain what 
> the rational for the changes is

Raising the timer softirq unconditionally wakes ksoftirqd at every tick,
so the only time the no_hz_full "one and only one task is runnable" tick
shutdown criterion can be met is when the box has zero other runnable
tasks, i.e. when the box is idle.
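
The argument above can be sketched as a tiny counting model in plain C.
struct toy_cpu and both functions are hypothetical, and the model assumes
that raising the timer softirq always makes ksoftirqd one extra runnable
task on that CPU, which is a simplification of the threaded-softirq case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU snapshot taken at one tick. */
struct toy_cpu {
	unsigned int runnable_tasks; /* runnable tasks, not counting ksoftirqd */
	bool timer_pending;          /* an expired timer really needs the softirq */
};

/*
 * Runnable count after the tick. Raising the timer softirq wakes
 * ksoftirqd, which adds one runnable task in this model.
 */
static unsigned int runnable_after_tick(const struct toy_cpu *c,
					bool raise_unconditionally)
{
	bool wake_ksoftirqd;

	assert(c);
	wake_ksoftirqd = raise_unconditionally || c->timer_pending;
	return c->runnable_tasks + (wake_ksoftirqd ? 1u : 0u);
}

/* The rule quoted above: the tick may stop only with exactly one runnable task. */
static bool tick_can_stop(const struct toy_cpu *c, bool raise_unconditionally)
{
	return runnable_after_tick(c, raise_unconditionally) == 1;
}
```

With one busy task and no pending timer, tick_can_stop() is false when
the softirq is raised unconditionally and true when it is raised only on
demand; with zero tasks the unconditional case is the only one where
ksoftirqd ends up being the single runnable task.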

Here, patch works fine boot wise, and no_hz_full tick shutdown works as
well, but there are a couple spots where taking an interrupt is a bad
idea as things sit.  Watchdog barked at two such spots, and there's a
"you _will_ hit this warning in -rt" spot as well.  

With bandaids on the sore spots, my 64 core box survives.

-Mike

(Less-than-wonderful changelogs probably come from the fact that
maintaining -rt out of tree is time consuming as all hell.  Everybody
gets to break it, a couple guys get to fix it up again and again.)



Re: [ANNOUNCE] 3.12.6-rt9

2013-12-27 Thread Nicholas Mc Guire
On Mon, 23 Dec 2013, Sebastian Andrzej Siewior wrote:

> Dear RT folks!
> 
> I'm pleased to announce the v3.12.6-rt9 patch set.
> 
> Changes since v3.12.6-rt8

> - A patch from Thomas Gleixner not to raise the timer softirq
>   unconditionally (only if a timer is pending)
> 

This one seems to deadlock early in the boot sequence on x86
(i3/i7/Phenom-4x here and Carsten Emde also had boot failures)

After dropping this patch with:
patch -p1 -R < ../paches/timers-do-not-raise-softirq-unconditionally.patch
3.12.6-rt9 boots up fine. cyclictest seems to be back to what it was before
(only ran for a few minutes idle and 1h with load on an i3).

The main problem with this patch, though, is procedural issues:
the commit note - which is a mail exchange - does not actually explain what
the rationale for the changes is (...well, I don't understand the logic of
run_local_timers - if someone can explain, please do) and notably:

from timers-do-not-raise-softirq-unconditionally.patch

well, that very same problem is in mainline if you add "threadirqs" to
the command line. But we can be smart about this. The untested patch
^^
below should address that issue. If that works on mainline we can
adapt it for RT (needs a trylock(&base->lock) there).


 does make me wonder why this went into -rt9?
 It also fails to build with CONFIG_PREEMPT_RT_FULL not set.

 As with this patch, systems that booted just fine with 3.12.5-rt7 don't
 even boot (at least my 3 x86 test boxes here did not), this raises some
 questions regarding the process of getting patches into -rtX - are
 we going too fast here?

 I would prefer if such patches would go out with a request for testing
 or at least a "might blow up your system" note in them...

thx!
hofrat


Re: [ANNOUNCE] 3.12.6-rt9

2013-12-24 Thread Mike Galbraith
On Tue, 2013-12-24 at 20:39 +0400, Pavel Vasilyev wrote: 
> 24.12.2013 19:47, Mike Galbraith wrote:
> > On Mon, 2013-12-23 at 23:50 +0100, Sebastian Andrzej Siewior wrote: 
> 
> > crash> bt
> > PID: 508    TASK: 8802739ba340  CPU: 16  COMMAND: "ksoftirqd/16"
> 
> YES!!! And ARM code broke :)

And NO_HZ_TICK config survived for only 4.5 hours.

PID: 6948   TASK: 880272d1f1c0  CPU: 29  COMMAND: "tbench"
 #0 [8802769a6a40] machine_kexec at 8103bc07
 #1 [8802769a6aa0] crash_kexec at 810d3e93
 #2 [8802769a6b70] panic at 815bce70
 #3 [8802769a6bf0] watchdog_overflow_callback at 810fd51d
 #4 [8802769a6c10] __perf_event_overflow at 8112f1f8
 #5 [8802769a6ca0] perf_event_overflow at 8112fb14
 #6 [8802769a6cb0] intel_pmu_handle_irq at 8102078f
 #7 [8802769a6de0] perf_event_nmi_handler at 815c2de5
 #8 [8802769a6e10] nmi_handle at 815c2493
 #9 [8802769a6ea0] default_do_nmi at 815c2623
#10 [8802769a6ed0] do_nmi at 815c2948
#11 [8802769a6ef0] end_repeat_nmi at 815c1931
[exception RIP: preempt_schedule+36]
RIP: 815be944  RSP: 8802769a3d98  RFLAGS: 0002
RAX: 0010  RBX: 0010  RCX: 0002
RDX: 8802769a3d98  RSI: 0018  RDI: 0001
RBP: 815be944   R8: 815be944   R9: 0018
R10: 8802769a3d98  R11: 0002  R12: 
R13: 880273f74000  R14: 880272d1f1c0  R15: 880269cedfd8
ORIG_RAX: 880269cedfd8  CS: 0010  SS: 0018
--- <NMI exception stack> ---
#12 [8802769a3d98] preempt_schedule at 815be944
#13 [8802769a3db0] _raw_spin_trylock at 815c0d6e
#14 [8802769a3dc0] rt_spin_lock_slowunlock_hirq at 815c0288
#15 [8802769a3de0] rt_spin_unlock_after_trylock_in_irq at 815c09e5
#16 [8802769a3df0] run_local_timers at 81068025
#17 [8802769a3e10] update_process_times at 810680ac
#18 [8802769a3e40] tick_sched_handle at 810c3a92
#19 [8802769a3e60] tick_sched_timer at 810c3d2f
#20 [8802769a3e90] __run_hrtimer at 8108471d
#21 [8802769a3ed0] hrtimer_interrupt at 8108497a
#22 [8802769a3f70] local_apic_timer_interrupt at 810349e6
#23 [8802769a3f90] smp_apic_timer_interrupt at 810358ee
#24 [8802769a3fb0] apic_timer_interrupt at 815c955d
--- <IRQ stack> ---
#25 [880269ced848] apic_timer_interrupt at 815c955d
[exception RIP: _raw_spin_lock+53]
RIP: 815c0c05  RSP: 880269ced8f8  RFLAGS: 0202
RAX: 0b7b  RBX: 0282  RCX: 880272d1f1c0
RDX: 0b7d  RSI: 880269ceda38  RDI: 880273f74000
RBP: 880269ced8f8   R8: 0001   R9: b54d13a4
R10: 0001  R11: 0001  R12: 880269ced910
R13: 880276d32170  R14: 810c9030  R15: 880269ced8b8
ORIG_RAX: ff10  CS: 0010  SS: 0018
#26 [880269ced900] rt_spin_lock_slowlock at 815bfb51
#27 [880269ced9e0] rt_spin_lock at 815c0922
#28 [880269ced9f0] lock_timer_base at 81067f92
#29 [880269ceda20] mod_timer at 81069bcb
#30 [880269ceda70] sk_reset_timer at 814d1e57
#31 [880269ceda90] inet_csk_reset_xmit_timer at 8152d4a8
#32 [880269cedac0] tcp_rearm_rto at 8152d583
#33 [880269cedae0] tcp_ack at 81534085
#34 [880269cedb60] tcp_rcv_established at 8153443d
#35 [880269cedbb0] tcp_v4_do_rcv at 8153f56a
#36 [880269cedbe0] __release_sock at 814d3891
#37 [880269cedc10] release_sock at 814d3942
#38 [880269cedc30] tcp_sendmsg at 8152b955
#39 [880269cedd00] inet_sendmsg at 8155350e
#40 [880269cedd30] sock_sendmsg at 814cea87
#41 [880269cede40] sys_sendto at 814cebdf
#42 [880269cedf80] tracesys at 815c8b09 (via system_call)
RIP: 7f0441a1fc35  RSP: 7fffdea86130  RFLAGS: 0246
RAX: ffda  RBX: 815c8b09  RCX: 
RDX: 248d  RSI: 00607260  RDI: 0004
RBP: 248d   R8:    R9: 
R10:   R11: 0246  R12: 7fffdea86a10
R13: 7fffdea86414  R14: 0004  R15: 00607260
ORIG_RAX: 002c  CS: 0033  SS: 002b




Re: [ANNOUNCE] 3.12.6-rt9

2013-12-24 Thread Pavel Vasilyev
24.12.2013 19:47, Mike Galbraith wrote:
> On Mon, 2013-12-23 at 23:50 +0100, Sebastian Andrzej Siewior wrote: 

> crash> bt
> PID: 508    TASK: 8802739ba340  CPU: 16  COMMAND: "ksoftirqd/16"

YES!!! And ARM code broke :)



-- 

 Pavel.





Re: [ANNOUNCE] 3.12.6-rt9

2013-12-24 Thread Mike Galbraith
On Mon, 2013-12-23 at 23:50 +0100, Sebastian Andrzej Siewior wrote: 
> Dear RT folks!
> 
> I'm pleased to announce the v3.12.6-rt9 patch set.
> 
> Changes since v3.12.6-rt8
> - ARM's mach-sti is now using rawlock as boot_lock (like the other
>   mach-*)
> - There was a callpath to rcu_preempt_qs() with interrupts enabled. Tiejun
>   Chen posted a patch to call it with interrupt disabled like we always
>   do.
> - A patch from Paul E. McKenney to not activate RCU core on NO_HZ_FULL
>   CPUs
> - A patch from Thomas Gleixner not to raise the timer softirq
>   unconditionally (only if a timer is pending)
> 
> 
> There is also a patch in the queue from Paul E. McKenney to move RCU
> processing from softirq into its own thread. After Mike Galbraith
> reported a few RCU stalls I decided to keep it disabled for now until I
> have some time to look at it.

I built this kernel with Paul's patch and NO_HZ_FULL enabled again on 64
core box.  I haven't seen RCU grip yet, but I just checked on it after
3.5 hours into this boot/beat (after fixing crash+kdump setup), and
found it in the process of dumping. 

crash> bt
PID: 508    TASK: 8802739ba340  CPU: 16  COMMAND: "ksoftirqd/16"
 #0 [880276806a40] machine_kexec at 8103bc07
 #1 [880276806aa0] crash_kexec at 810d56b3
 #2 [880276806b70] panic at 815bf8b0
 #3 [880276806bf0] watchdog_overflow_callback at 810fed3d
 #4 [880276806c10] __perf_event_overflow at 81131928
 #5 [880276806ca0] perf_event_overflow at 81132254
 #6 [880276806cb0] intel_pmu_handle_irq at 8102078f
 #7 [880276806de0] perf_event_nmi_handler at 815c5825
 #8 [880276806e10] nmi_handle at 815c4ed3
 #9 [880276806ea0] default_do_nmi at 815c5063
#10 [880276806ed0] do_nmi at 815c5388
#11 [880276806ef0] end_repeat_nmi at 815c4371
[exception RIP: _raw_spin_trylock+48]
RIP: 815c3790  RSP: 880276803e28  RFLAGS: 0002
RAX: 0010  RBX: 0010  RCX: 0002
RDX: 880276803e28  RSI: 0018  RDI: 0001
RBP: 815c3790   R8: 815c3790   R9: 0018
R10: 880276803e28  R11: 0002  R12: 
R13: 880273a0c000  R14: 8802739ba340  R15: 880273a03fd8
ORIG_RAX: 880273a03fd8  CS: 0010  SS: 0018
--- <NMI exception stack> ---
#12 [880276803e28] _raw_spin_trylock at 815c3790
#13 [880276803e30] rt_spin_lock_slowunlock_hirq at 815c2cc8
#14 [880276803e50] rt_spin_unlock_after_trylock_in_irq at 815c3425
#15 [880276803e60] get_next_timer_interrupt at 810684a7
#16 [880276803ed0] tick_nohz_stop_sched_tick at 810c5f2e
#17 [880276803f50] tick_nohz_irq_exit at 810c6333
#18 [880276803f70] irq_exit at 81060065
#19 [880276803f90] smp_apic_timer_interrupt at 810358f5
#20 [880276803fb0] apic_timer_interrupt at 815cbf9d
--- <IRQ stack> ---
#21 [880273a03b28] apic_timer_interrupt at 815cbf9d
[exception RIP: _raw_spin_lock+50]
RIP: 815c3642  RSP: 880273a03bd8  RFLAGS: 0202
RAX: 8b49  RBX: 880272157290  RCX: 8802739ba340
RDX: 8b4a  RSI: 0010  RDI: 880273a0c000
RBP: 880273a03bd8   R8: 0001   R9: 
R10:   R11: 0001  R12: 810927b5
R13: 880273a03b68  R14: 0010  R15: 0010
ORIG_RAX: ff10  CS: 0010  SS: 0018
#22 [880273a03be0] rt_spin_lock_slowlock at 815c2591
#23 [880273a03cc0] rt_spin_lock at 815c3362
#24 [880273a03cd0] run_timer_softirq at 81069002
#25 [880273a03d70] handle_softirq at 81060d0f
#26 [880273a03db0] do_current_softirqs at 81060f3c
#27 [880273a03e20] run_ksoftirqd at 81061045
#28 [880273a03e40] smpboot_thread_fn at 81089c31
#29 [880273a03ec0] kthread at 810807fe
#30 [880273a03f50] ret_from_fork at 815cb28c
crash> gdb list *0x815c2591
0x815c2591 is in rt_spin_lock_slowlock (kernel/rtmutex.c:109).
104 }
105 #endif
106 
107 static inline void init_lists(struct rt_mutex *lock)
108 {
109 if (unlikely(!lock->wait_list.node_list.prev))
110 plist_head_init(&lock->wait_list);
111 }
112 
113 /*
crash> gdb list *0x815c2590
0x815c2590 is in rt_spin_lock_slowlock (kernel/rtmutex.c:744).
739 struct rt_mutex_waiter waiter, *top_waiter;
740 int ret;
741 
742 rt_mutex_init_waiter(&waiter, true);
743 
744 raw_spin_lock(&lock->wait_lock);
745 init_lists(lock);
746 
747 if (__try_to_take_rt_mutex(lock, self, NULL, STEAL_LATERAL)) {
748 raw_spin_unlock(&lock->wait_lock);
c