subject:"\[RFC PATCH 00\/11\] printk\: safe printing in NMI context"

On Wed, 18 Jun 2014 23:12:48 +0200 (CEST)
Jiri Kosina  wrote:


> > I believe that this was what Steven was suggesting, though by using
> > tracing.  
> 
> My understanding was that Steven is suggesting using trace_printk() from 
> NMI.

Not quite. I was suggesting using the ftrace ring buffer. It could have
its own way to write the stack and not depend on tracing itself.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, Jun 18, 2014 at 11:32:53PM +0200, Jiri Kosina wrote:
> On Wed, 18 Jun 2014, Paul E. McKenney wrote:
> 
> > > I agree that it might work nicely for RCU stall detector indeed. I was 
> > > looking for solution that'd work nicely both for RCU and for sysrq-l 
> > > (where we can't rely on processess being stuck in any way).
> > 
> > Agreed.  And if some more generally useful approach appears, I will be
> > quite happy to adjust RCU to use it.  In the meantime, I expect that
> > my patch will be helpful.
> 
> Agreed. And we'll look into fixing sysrq-l in parallel I guess; once there 
> is a working solution (hangs with sysrq-l can be trivially reproduced 
> almost immediately), we can then migrate RCU to it.
> 
> Still, I feel bad about the fact that we are now hostages of our printk() 
> implementation, which doesn't allow for any fixes/improvements. Having the 
> possibility to printk() from NMI would be nice and more robust ... 
> otherwise, we'll be getting people trying to do it in the future over and 
> over again, even if we now get rid of it at once.

Well, we could always have printk() splat if invoke in_nmi().

Oh, wait...  ;-)

More seriously, an in_nmi() printk() could taint the kernel, set a
flag that results in a deferred splat, do a trace_printk(), or any
number of things to let the developer know that this was a bad idea.

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, 18 Jun 2014, Paul E. McKenney wrote:

> > I agree that it might work nicely for RCU stall detector indeed. I was 
> > looking for solution that'd work nicely both for RCU and for sysrq-l 
> > (where we can't rely on processess being stuck in any way).
> 
> Agreed.  And if some more generally useful approach appears, I will be
> quite happy to adjust RCU to use it.  In the meantime, I expect that
> my patch will be helpful.

Agreed. And we'll look into fixing sysrq-l in parallel I guess; once there 
is a working solution (hangs with sysrq-l can be trivially reproduced 
almost immediately), we can then migrate RCU to it.

Still, I feel bad about the fact that we are now hostages of our printk() 
implementation, which doesn't allow for any fixes/improvements. Having the 
possibility to printk() from NMI would be nice and more robust ... 
otherwise, we'll be getting people trying to do it in the future over and 
over again, even if we now get rid of it at once.

Thanks,

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, Jun 18, 2014 at 11:12:48PM +0200, Jiri Kosina wrote:
> On Wed, 18 Jun 2014, Paul E. McKenney wrote:
> 
> > > > /* Complain about tasks blocking the grace period. */
> > > > @@ -1044,8 +1041,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
> > > > pr_cont(" (t=%lu jiffies g=%ld c=%ld q=%lu)\n",
> > > > jiffies - rsp->gp_start,
> > > > (long)rsp->gpnum, (long)rsp->completed, totqlen);
> > > > -   if (!trigger_all_cpu_backtrace())
> > > > -   dump_stack();
> > > > +   rcu_dump_cpu_stacks(rsp);
> > > 
> > > This is prone to producing not really consistent stacktraces though, 
> > > right? As the target task is still running at the time the stack is being 
> > > walked, it might produce stacktraces that are potentially nonsensial.
> > 
> > If a CPU is stuck, the stack trace down to where it is stuck is
> > likely to be static.  But yes, there is some potential for confusion.
> > My (admittedly limited) rcutorture testing produced sensible stack traces,
> > but things might be a bit uglier in other situations.
> 
> I agree that it might work nicely for RCU stall detector indeed. I was 
> looking for solution that'd work nicely both for RCU and for sysrq-l 
> (where we can't rely on processess being stuck in any way).

Agreed.  And if some more generally useful approach appears, I will be
quite happy to adjust RCU to use it.  In the meantime, I expect that
my patch will be helpful.

Thanx, Paul

> > > How about sending NMI to the target CPU, so that the task is actually 
> > > stopped, but printing its stacktrace from the CPU that detected the stall 
> > > while it's stopped?
> > > 
> > > That way, there is no printk()-from-NMI, but also the stacktrace is 
> > > guaranteed to be self-consistent.
> > 
> > I believe that this was what Steven was suggesting, though by using
> > tracing.  
> 
> My understanding was that Steven is suggesting using trace_printk() from 
> NMI.
> 
> > Of course, if my current approach isn't up to the job, then something 
> > like this general approach would look quite good.
> 
> Thanks,
> 
> -- 
> Jiri Kosina
> SUSE Labs
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, 18 Jun 2014, Paul E. McKenney wrote:

> > >   /* Complain about tasks blocking the grace period. */
> > > @@ -1044,8 +1041,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
> > >   pr_cont(" (t=%lu jiffies g=%ld c=%ld q=%lu)\n",
> > >   jiffies - rsp->gp_start,
> > >   (long)rsp->gpnum, (long)rsp->completed, totqlen);
> > > - if (!trigger_all_cpu_backtrace())
> > > - dump_stack();
> > > + rcu_dump_cpu_stacks(rsp);
> > 
> > This is prone to producing not really consistent stacktraces though, 
> > right? As the target task is still running at the time the stack is being 
> > walked, it might produce stacktraces that are potentially nonsensial.
> 
> If a CPU is stuck, the stack trace down to where it is stuck is
> likely to be static.  But yes, there is some potential for confusion.
> My (admittedly limited) rcutorture testing produced sensible stack traces,
> but things might be a bit uglier in other situations.

I agree that it might work nicely for RCU stall detector indeed. I was 
looking for solution that'd work nicely both for RCU and for sysrq-l 
(where we can't rely on processess being stuck in any way).

> > How about sending NMI to the target CPU, so that the task is actually 
> > stopped, but printing its stacktrace from the CPU that detected the stall 
> > while it's stopped?
> > 
> > That way, there is no printk()-from-NMI, but also the stacktrace is 
> > guaranteed to be self-consistent.
> 
> I believe that this was what Steven was suggesting, though by using
> tracing.  

My understanding was that Steven is suggesting using trace_printk() from 
NMI.

> Of course, if my current approach isn't up to the job, then something 
> like this general approach would look quite good.

Thanks,

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, Jun 18, 2014 at 10:36:10PM +0200, Jiri Kosina wrote:
> On Wed, 18 Jun 2014, Paul E. McKenney wrote:
> 
> > OK, unconditional non-use of NMIs is even easier.  ;-)
> > 
> > Something like the following.
> > 
> > Thanx, Paul
> > 
> > 
> > 
> > rcu: Don't use NMIs to dump other CPUs' stacks
> > 
> > Although NMI-based stack dumps are in principle more accurate, they are
> > also more likely to trigger deadlocks.  This commit therefore replaces
> > all uses of trigger_all_cpu_backtrace() with rcu_dump_cpu_stacks(), so
> > that the CPU detecting an RCU CPU stall does the stack dumping.
> > 
> > Signed-off-by: Paul E. McKenney 
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index c590e1201c74..777624e1329b 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -932,10 +932,7 @@ static void record_gp_stall_check_time(struct 
> > rcu_state *rsp)
> >  }
> >  
> >  /*
> > - * Dump stacks of all tasks running on stalled CPUs.  This is a fallback
> > - * for architectures that do not implement trigger_all_cpu_backtrace().
> > - * The NMI-triggered stack traces are more accurate because they are
> > - * printed by the target CPU.
> > + * Dump stacks of all tasks running on stalled CPUs.
> >   */
> >  static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
> >  {
> > @@ -1013,7 +1010,7 @@ static void print_other_cpu_stall(struct rcu_state 
> > *rsp)
> >(long)rsp->gpnum, (long)rsp->completed, totqlen);
> > if (ndetected == 0)
> > pr_err("INFO: Stall ended before state dump start\n");
> > -   else if (!trigger_all_cpu_backtrace())
> > +   else
> > rcu_dump_cpu_stacks(rsp);
> >  
> > /* Complain about tasks blocking the grace period. */
> > @@ -1044,8 +1041,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
> > pr_cont(" (t=%lu jiffies g=%ld c=%ld q=%lu)\n",
> > jiffies - rsp->gp_start,
> > (long)rsp->gpnum, (long)rsp->completed, totqlen);
> > -   if (!trigger_all_cpu_backtrace())
> > -   dump_stack();
> > +   rcu_dump_cpu_stacks(rsp);
> 
> This is prone to producing not really consistent stacktraces though, 
> right? As the target task is still running at the time the stack is being 
> walked, it might produce stacktraces that are potentially nonsensial.

If a CPU is stuck, the stack trace down to where it is stuck is
likely to be static.  But yes, there is some potential for confusion.
My (admittedly limited) rcutorture testing produced sensible stack traces,
but things might be a bit uglier in other situations.

> How about sending NMI to the target CPU, so that the task is actually 
> stopped, but printing its stacktrace from the CPU that detected the stall 
> while it's stopped?
> 
> That way, there is no printk()-from-NMI, but also the stacktrace is 
> guaranteed to be self-consistent.

I believe that this was what Steven was suggesting, though by using
tracing.  Of course, if my current approach isn't up to the job,
then something like this general approach would look quite good.

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, 18 Jun 2014, Paul E. McKenney wrote:

> OK, unconditional non-use of NMIs is even easier.  ;-)
> 
> Something like the following.
> 
>   Thanx, Paul
> 
> 
> 
> rcu: Don't use NMIs to dump other CPUs' stacks
> 
> Although NMI-based stack dumps are in principle more accurate, they are
> also more likely to trigger deadlocks.  This commit therefore replaces
> all uses of trigger_all_cpu_backtrace() with rcu_dump_cpu_stacks(), so
> that the CPU detecting an RCU CPU stall does the stack dumping.
> 
> Signed-off-by: Paul E. McKenney 
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index c590e1201c74..777624e1329b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -932,10 +932,7 @@ static void record_gp_stall_check_time(struct rcu_state 
> *rsp)
>  }
>  
>  /*
> - * Dump stacks of all tasks running on stalled CPUs.  This is a fallback
> - * for architectures that do not implement trigger_all_cpu_backtrace().
> - * The NMI-triggered stack traces are more accurate because they are
> - * printed by the target CPU.
> + * Dump stacks of all tasks running on stalled CPUs.
>   */
>  static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
>  {
> @@ -1013,7 +1010,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
>  (long)rsp->gpnum, (long)rsp->completed, totqlen);
>   if (ndetected == 0)
>   pr_err("INFO: Stall ended before state dump start\n");
> - else if (!trigger_all_cpu_backtrace())
> + else
>   rcu_dump_cpu_stacks(rsp);
>  
>   /* Complain about tasks blocking the grace period. */
> @@ -1044,8 +1041,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
>   pr_cont(" (t=%lu jiffies g=%ld c=%ld q=%lu)\n",
>   jiffies - rsp->gp_start,
>   (long)rsp->gpnum, (long)rsp->completed, totqlen);
> - if (!trigger_all_cpu_backtrace())
> - dump_stack();
> + rcu_dump_cpu_stacks(rsp);

This is prone to producing not really consistent stacktraces though, 
right? As the target task is still running at the time the stack is being 
walked, it might produce stacktraces that are potentially nonsensial.

How about sending NMI to the target CPU, so that the task is actually 
stopped, but printing its stacktrace from the CPU that detected the stall 
while it's stopped?

That way, there is no printk()-from-NMI, but also the stacktrace is 
guaranteed to be self-consistent.

Thanks,

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, Jun 18, 2014 at 12:38:37PM -0400, Steven Rostedt wrote:
> On Wed, 18 Jun 2014 09:21:17 -0700
> "Paul E. McKenney"  wrote:
> 
> > On Wed, Jun 18, 2014 at 05:58:40AM -1000, Linus Torvalds wrote:
> > > On Jun 18, 2014 4:36 AM, "Paul E. McKenney" 
> > > wrote:
> > > >
> > > > I could easily add an option to RCU to allow people to tell it not to
> > > > use NMIs to dump the stack.
> > > 
> > > I don't think it should be an "option".
> > > 
> > > We should stop using nmi as if it was something "normal". It isn't. Code
> > > running in nmi context should be special, and should be very very aware
> > > that it is special. That goes way beyond "don't use printk". We seem to
> > > have gone way way too far in using nmi context.
> > > 
> > > So we should get *rid* of code in nmi context rather than then complain
> > > about printk being buggy.
> > 
> > OK, unconditional non-use of NMIs is even easier.  ;-)
> > 
> > Something like the following.
> 
> I have found the RCU stalls extremely useful in debugging lockups. In
> case this doesn't work as well, I'm willing to write up something that
> could send NMIs to all CPUs that would write into the ftrace ring
> buffer and when finished, the calling CPU can dump it out. No printk
> from NMI context at all.

Sounds like a good plan to me!

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, 18 Jun 2014 09:21:17 -0700
"Paul E. McKenney"  wrote:

> On Wed, Jun 18, 2014 at 05:58:40AM -1000, Linus Torvalds wrote:
> > On Jun 18, 2014 4:36 AM, "Paul E. McKenney" 
> > wrote:
> > >
> > > I could easily add an option to RCU to allow people to tell it not to
> > > use NMIs to dump the stack.
> > 
> > I don't think it should be an "option".
> > 
> > We should stop using nmi as if it was something "normal". It isn't. Code
> > running in nmi context should be special, and should be very very aware
> > that it is special. That goes way beyond "don't use printk". We seem to
> > have gone way way too far in using nmi context.
> > 
> > So we should get *rid* of code in nmi context rather than then complain
> > about printk being buggy.
> 
> OK, unconditional non-use of NMIs is even easier.  ;-)
> 
> Something like the following.
> 

I have found the RCU stalls extremely useful in debugging lockups. In
case this doesn't work as well, I'm willing to write up something that
could send NMIs to all CPUs that would write into the ftrace ring
buffer and when finished, the calling CPU can dump it out. No printk
from NMI context at all.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, Jun 18, 2014 at 05:58:40AM -1000, Linus Torvalds wrote:
> On Jun 18, 2014 4:36 AM, "Paul E. McKenney" 
> wrote:
> >
> > I could easily add an option to RCU to allow people to tell it not to
> > use NMIs to dump the stack.
> 
> I don't think it should be an "option".
> 
> We should stop using nmi as if it was something "normal". It isn't. Code
> running in nmi context should be special, and should be very very aware
> that it is special. That goes way beyond "don't use printk". We seem to
> have gone way way too far in using nmi context.
> 
> So we should get *rid* of code in nmi context rather than then complain
> about printk being buggy.

OK, unconditional non-use of NMIs is even easier.  ;-)

Something like the following.

Thanx, Paul



rcu: Don't use NMIs to dump other CPUs' stacks

Although NMI-based stack dumps are in principle more accurate, they are
also more likely to trigger deadlocks.  This commit therefore replaces
all uses of trigger_all_cpu_backtrace() with rcu_dump_cpu_stacks(), so
that the CPU detecting an RCU CPU stall does the stack dumping.

Signed-off-by: Paul E. McKenney 

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c590e1201c74..777624e1329b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -932,10 +932,7 @@ static void record_gp_stall_check_time(struct rcu_state 
*rsp)
 }
 
 /*
- * Dump stacks of all tasks running on stalled CPUs.  This is a fallback
- * for architectures that do not implement trigger_all_cpu_backtrace().
- * The NMI-triggered stack traces are more accurate because they are
- * printed by the target CPU.
+ * Dump stacks of all tasks running on stalled CPUs.
  */
 static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
 {
@@ -1013,7 +1010,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
   (long)rsp->gpnum, (long)rsp->completed, totqlen);
if (ndetected == 0)
pr_err("INFO: Stall ended before state dump start\n");
-   else if (!trigger_all_cpu_backtrace())
+   else
rcu_dump_cpu_stacks(rsp);
 
/* Complain about tasks blocking the grace period. */
@@ -1044,8 +1041,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
pr_cont(" (t=%lu jiffies g=%ld c=%ld q=%lu)\n",
jiffies - rsp->gp_start,
(long)rsp->gpnum, (long)rsp->completed, totqlen);
-   if (!trigger_all_cpu_backtrace())
-   dump_stack();
+   rcu_dump_cpu_stacks(rsp);
 
raw_spin_lock_irqsave(>lock, flags);
if (ULONG_CMP_GE(jiffies, ACCESS_ONCE(rsp->jiffies_stall)))

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, Jun 18, 2014 at 04:53:14PM +0200, Jiri Kosina wrote:
> On Wed, 18 Jun 2014, Paul E. McKenney wrote:
> 
> > > > > - both RCU stall detector and 'echo l > sysrq-trigger' can (and we've 
> > > > >   seen it happening for real) cause a complete, undebuggable, silent 
> > > > > hang 
> > > > >   of machine (deadlock in NMI context)
> > > > 
> > > > I could easily add an option to RCU to allow people to tell it not to
> > > > use NMIs to dump the stack.  Would that help?
> > > 
> > > Well, that would make unfortunately the information provided by RCU stall 
> > > detector rather useless ... workqueue-based stack dumping is very 
> > > unlikely 
> > > to point its finger to the real offender, as it'd be coming way too late.
> > 
> > I would not use workqueues, but rather have the CPU detecting the
> > stall grovel through the other CPUs' stacks, which is what I do now for
> > architectures that don't support NMI-based stack dumps.  Would that be
> > a reasonable approach?
> 
> That would indeed solve lockups induced by RCU stall detector (and we 
> should convert sysrq stack dumping code to use the same mechanism 
> afterwards).
> 
> But then, the kernel is still polluted by quite a few instances of
> 
>   WARN_ON(in_nmi())
> 
>   BUG_IN(in_nmi())
> 
>   if (in_nmi())
>   printk()
> 
> which need to be fixed separately afterwards anyway.

True enough!

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, 18 Jun 2014, Paul E. McKenney wrote:

> > > > - both RCU stall detector and 'echo l > sysrq-trigger' can (and we've 
> > > >   seen it happening for real) cause a complete, undebuggable, silent 
> > > > hang 
> > > >   of machine (deadlock in NMI context)
> > > 
> > > I could easily add an option to RCU to allow people to tell it not to
> > > use NMIs to dump the stack.  Would that help?
> > 
> > Well, that would make unfortunately the information provided by RCU stall 
> > detector rather useless ... workqueue-based stack dumping is very unlikely 
> > to point its finger to the real offender, as it'd be coming way too late.
> 
> I would not use workqueues, but rather have the CPU detecting the
> stall grovel through the other CPUs' stacks, which is what I do now for
> architectures that don't support NMI-based stack dumps.  Would that be
> a reasonable approach?

That would indeed solve lockups induced by RCU stall detector (and we 
should convert sysrq stack dumping code to use the same mechanism 
afterwards).

But then, the kernel is still polluted by quite a few instances of

WARN_ON(in_nmi())

BUG_IN(in_nmi())

if (in_nmi())
printk()

which need to be fixed separately afterwards anyway.

Thanks,

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, Jun 18, 2014 at 04:41:09PM +0200, Jiri Kosina wrote:
> On Wed, 18 Jun 2014, Paul E. McKenney wrote:
> 
> > > - both RCU stall detector and 'echo l > sysrq-trigger' can (and we've 
> > >   seen it happening for real) cause a complete, undebuggable, silent hang 
> > >   of machine (deadlock in NMI context)
> > 
> > I could easily add an option to RCU to allow people to tell it not to
> > use NMIs to dump the stack.  Would that help?
> 
> Well, that would make unfortunately the information provided by RCU stall 
> detector rather useless ... workqueue-based stack dumping is very unlikely 
> to point its finger to the real offender, as it'd be coming way too late.

I would not use workqueues, but rather have the CPU detecting the
stall grovel through the other CPUs' stacks, which is what I do now for
architectures that don't support NMI-based stack dumps.  Would that be
a reasonable approach?

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, 18 Jun 2014, Paul E. McKenney wrote:

> > - both RCU stall detector and 'echo l > sysrq-trigger' can (and we've 
> >   seen it happening for real) cause a complete, undebuggable, silent hang 
> >   of machine (deadlock in NMI context)
> 
> I could easily add an option to RCU to allow people to tell it not to
> use NMIs to dump the stack.  Would that help?

Well, that would make unfortunately the information provided by RCU stall 
detector rather useless ... workqueue-based stack dumping is very unlikely 
to point its finger to the real offender, as it'd be coming way too late.

Thanks,

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, Jun 18, 2014 at 01:03:05PM +0200, Jiri Kosina wrote:
> On Tue, 10 Jun 2014, Linus Torvalds wrote:
> 
> > > Lets be crazy and Cc Linus on that.
> > 
> > Quite frankly, I hate seeing something like this:
> > 
> >  kernel/printk/printk.c  | 1218 
> > +--
> > 
> > for something that is stupid and broken. Printing from NMI context
> > isn't really supposed to work, and we all *know* it's not supposed to
> > work.
> > 
> > I'd much rather disallow it, and if there is one or two places that
> > really want to print a warning and know that they are in NMI context,
> > have a special workaround just for them, with something that does
> > *not* try to make printk in general work any better.
> > 
> > Dammit, NMI context is special. I absolutely refuse to buy into the
> > broken concept that we should make more stuff work in NMI context.
> > Hell no, we should *not* try to make more crap work in NMI. NMI people
> > should be careful.
> > 
> > Make a trivial "printk_nmi()" wrapper that tries to do a trylock on
> > logbuf_lock, and *maybe* the existing sequence of
> > 
> > if (console_trylock_for_printk())
> > console_unlock();
> > 
> > then works for actually triggering the printout. But the wrapper
> > should be 15 lines of code for "if possible, try to print things", and
> > *not* a thousand lines of changes.
> 
> Alright, so this went silent again without any real progress. Is everyone 
> hoping this gets sorted out on kernel summit, or ... ?
> 
> Let me sum up the current situation:
> 
> - both RCU stall detector and 'echo l > sysrq-trigger' can (and we've 
>   seen it happening for real) cause a complete, undebuggable, silent hang 
>   of machine (deadlock in NMI context)

I could easily add an option to RCU to allow people to tell it not to
use NMIs to dump the stack.  Would that help?

Thanx, Paul

> - before 7ff9554bb578 and friends, this was trivial to fix more or less 
>   exactly the way Linus is proposing above. We've been carrying the 
>   fix in our kernels for a while already [1]. With printk() having got  
>   overly complicated recently, the "in principle trivial" fix turns out  
>   into crazy mess due to handling of all the indexes, sequence numbers, 
>   etc.
> 
> - printk() from NMI is actually useful in rare cases (such as inter-CPU 
>   stack dumping)
> 
> - using lockless buffers that trace_printk() is using has its own 
>   problems, as described by Petr elsewhere in this thread
> 
> 
> I find it rather outrageous that fixing *real bugs* (leading to hangs) 
> becomes impossible due to printk() being too complex. It's very 
> unfortunate that the same level of pushback didn't happen when new 
> features (that actually *made* it so complicated) have been pushed; that 
> would be much more valuable an appropriate.
> 
> I believe Jan Kara is in the same situation with his softlockup fixes for 
> printk. Those are real problems, as they are bringing machines down, and 
> after two years, still not fixed, because "printk() code is scary enough 
> as-is"
> 
> [1] 
> http://kernel.suse.com/cgit/kernel/commit/?h=SLE11-SP3=8d62ae68ff61d77ae3c4899f05dbd9c9742b14c9
> 
> -- 
> Jiri Kosina
> SUSE Labs
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Tue, 10 Jun 2014, Linus Torvalds wrote:

> > Lets be crazy and Cc Linus on that.
> 
> Quite frankly, I hate seeing something like this:
> 
>  kernel/printk/printk.c  | 1218 
> +--
> 
> for something that is stupid and broken. Printing from NMI context
> isn't really supposed to work, and we all *know* it's not supposed to
> work.
> 
> I'd much rather disallow it, and if there is one or two places that
> really want to print a warning and know that they are in NMI context,
> have a special workaround just for them, with something that does
> *not* try to make printk in general work any better.
> 
> Dammit, NMI context is special. I absolutely refuse to buy into the
> broken concept that we should make more stuff work in NMI context.
> Hell no, we should *not* try to make more crap work in NMI. NMI people
> should be careful.
> 
> Make a trivial "printk_nmi()" wrapper that tries to do a trylock on
> logbuf_lock, and *maybe* the existing sequence of
> 
> if (console_trylock_for_printk())
> console_unlock();
> 
> then works for actually triggering the printout. But the wrapper
> should be 15 lines of code for "if possible, try to print things", and
> *not* a thousand lines of changes.

Alright, so this went silent again without any real progress. Is everyone 
hoping this gets sorted out on kernel summit, or ... ?

Let me sum up the current situation:

- both RCU stall detector and 'echo l > sysrq-trigger' can (and we've 
  seen it happening for real) cause a complete, undebuggable, silent hang 
  of machine (deadlock in NMI context)

- before 7ff9554bb578 and friends, this was trivial to fix more or less 
  exactly the way Linus is proposing above. We've been carrying the 
  fix in our kernels for a while already [1]. With printk() having got  
  overly complicated recently, the "in principle trivial" fix turns out  
  into crazy mess due to handling of all the indexes, sequence numbers, 
  etc.

- printk() from NMI is actually useful in rare cases (such as inter-CPU 
  stack dumping)

- using lockless buffers that trace_printk() is using has its own 
  problems, as described by Petr elsewhere in this thread


I find it rather outrageous that fixing *real bugs* (leading to hangs) 
becomes impossible due to printk() being too complex. It's very 
unfortunate that the same level of pushback didn't happen when new 
features (that actually *made* it so complicated) have been pushed; that 
would be much more valuable an appropriate.

I believe Jan Kara is in the same situation with his softlockup fixes for 
printk. Those are real problems, as they are bringing machines down, and 
after two years, still not fixed, because "printk() code is scary enough 
as-is"

[1] 
http://kernel.suse.com/cgit/kernel/commit/?h=SLE11-SP3=8d62ae68ff61d77ae3c4899f05dbd9c9742b14c9

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Tue, 10 Jun 2014, Linus Torvalds wrote:

Lets be crazy and Cc Linus on that.

Quite frankly, I hate seeing something like this:

kernel/printk/printk.c | 1218
+--

for something that is stupid and broken. Printing from NMI context
isn't really supposed to work, and we all *know* it's not supposed to
work.

I'd much rather disallow it, and if there is one or two places that
really want to print a warning and know that they are in NMI context,
have a special workaround just for them, with something that does
*not* try to make printk in general work any better.

Dammit, NMI context is special. I absolutely refuse to buy into the
broken concept that we should make more stuff work in NMI context.
Hell no, we should *not* try to make more crap work in NMI. NMI people
should be careful.

Make a trivial printk_nmi() wrapper that tries to do a trylock on
logbuf_lock, and *maybe* the existing sequence of

if (console_trylock_for_printk())
console_unlock();

then works for actually triggering the printout. But the wrapper
should be 15 lines of code for if possible, try to print things, and
*not* a thousand lines of changes.

Alright, so this went silent again without any real progress. Is everyone
hoping this gets sorted out on kernel summit, or ... ?

Let me sum up the current situation:

- both RCU stall detector and 'echo l sysrq-trigger' can (and we've
seen it happening for real) cause a complete, undebuggable, silent hang
of machine (deadlock in NMI context)

- before 7ff9554bb578 and friends, this was trivial to fix more or less
exactly the way Linus is proposing above. We've been carrying the
fix in our kernels for a while already [1]. With printk() having got
overly complicated recently, the in principle trivial fix turns out
into crazy mess due to handling of all the indexes, sequence numbers,
etc.

- printk() from NMI is actually useful in rare cases (such as inter-CPU
stack dumping)

- using lockless buffers that trace_printk() is using has its own
problems, as described by Petr elsewhere in this thread

I find it rather outrageous that fixing *real bugs* (leading to hangs)
becomes impossible due to printk() being too complex. It's very
unfortunate that the same level of pushback didn't happen when new
features (that actually *made* it so complicated) have been pushed; that
would be much more valuable an appropriate.

I believe Jan Kara is in the same situation with his softlockup fixes for
printk. Those are real problems, as they are bringing machines down, and
after two years, still not fixed, because printk() code is scary enough
as-is

[1]
http://kernel.suse.com/cgit/kernel/commit/?h=SLE11-SP3id=8d62ae68ff61d77ae3c4899f05dbd9c9742b14c9

--
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, 18 Jun 2014, Paul E. McKenney wrote:

  - both RCU stall detector and 'echo l  sysrq-trigger' can (and we've 
seen it happening for real) cause a complete, undebuggable, silent hang 
of machine (deadlock in NMI context)
 
 I could easily add an option to RCU to allow people to tell it not to
 use NMIs to dump the stack.  Would that help?

Well, that would make unfortunately the information provided by RCU stall 
detector rather useless ... workqueue-based stack dumping is very unlikely 
to point its finger to the real offender, as it'd be coming way too late.

Thanks,

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, Jun 18, 2014 at 04:41:09PM +0200, Jiri Kosina wrote:
 On Wed, 18 Jun 2014, Paul E. McKenney wrote:
 
   - both RCU stall detector and 'echo l  sysrq-trigger' can (and we've 
 seen it happening for real) cause a complete, undebuggable, silent hang 
 of machine (deadlock in NMI context)
  
  I could easily add an option to RCU to allow people to tell it not to
  use NMIs to dump the stack.  Would that help?
 
 Well, that would make unfortunately the information provided by RCU stall 
 detector rather useless ... workqueue-based stack dumping is very unlikely 
 to point its finger to the real offender, as it'd be coming way too late.

I would not use workqueues, but rather have the CPU detecting the
stall grovel through the other CPUs' stacks, which is what I do now for
architectures that don't support NMI-based stack dumps.  Would that be
a reasonable approach?

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, 18 Jun 2014, Paul E. McKenney wrote:

- both RCU stall detector and 'echo l  sysrq-trigger' can (and we've 
  seen it happening for real) cause a complete, undebuggable, silent 
hang 
  of machine (deadlock in NMI context)
   
   I could easily add an option to RCU to allow people to tell it not to
   use NMIs to dump the stack.  Would that help?
  
  Well, that would make unfortunately the information provided by RCU stall 
  detector rather useless ... workqueue-based stack dumping is very unlikely 
  to point its finger to the real offender, as it'd be coming way too late.
 
 I would not use workqueues, but rather have the CPU detecting the
 stall grovel through the other CPUs' stacks, which is what I do now for
 architectures that don't support NMI-based stack dumps.  Would that be
 a reasonable approach?

That would indeed solve lockups induced by RCU stall detector (and we 
should convert sysrq stack dumping code to use the same mechanism 
afterwards).

But then, the kernel is still polluted by quite a few instances of

WARN_ON(in_nmi())

BUG_IN(in_nmi())

if (in_nmi())
printk()

which need to be fixed separately afterwards anyway.

Thanks,

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, Jun 18, 2014 at 01:03:05PM +0200, Jiri Kosina wrote:
On Tue, 10 Jun 2014, Linus Torvalds wrote:

Lets be crazy and Cc Linus on that.

Quite frankly, I hate seeing something like this:

kernel/printk/printk.c | 1218
+--

for something that is stupid and broken. Printing from NMI context
isn't really supposed to work, and we all *know* it's not supposed to
work.

Make a trivial printk_nmi() wrapper that tries to do a trylock on
logbuf_lock, and *maybe* the existing sequence of

if (console_trylock_for_printk())
console_unlock();

then works for actually triggering the printout. But the wrapper
should be 15 lines of code for if possible, try to print things, and
*not* a thousand lines of changes.

Alright, so this went silent again without any real progress. Is everyone
hoping this gets sorted out on kernel summit, or ... ?

Let me sum up the current situation:

- both RCU stall detector and 'echo l sysrq-trigger' can (and we've
seen it happening for real) cause a complete, undebuggable, silent hang
of machine (deadlock in NMI context)

I could easily add an option to RCU to allow people to tell it not to
use NMIs to dump the stack. Would that help?

Thanx, Paul

- printk() from NMI is actually useful in rare cases (such as inter-CPU
stack dumping)

- using lockless buffers that trace_printk() is using has its own
problems, as described by Petr elsewhere in this thread

[1]
http://kernel.suse.com/cgit/kernel/commit/?h=SLE11-SP3id=8d62ae68ff61d77ae3c4899f05dbd9c9742b14c9

--
Jiri Kosina
SUSE Labs

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, Jun 18, 2014 at 04:53:14PM +0200, Jiri Kosina wrote:
 On Wed, 18 Jun 2014, Paul E. McKenney wrote:
 
 - both RCU stall detector and 'echo l  sysrq-trigger' can (and we've 
   seen it happening for real) cause a complete, undebuggable, silent 
 hang 
   of machine (deadlock in NMI context)

I could easily add an option to RCU to allow people to tell it not to
use NMIs to dump the stack.  Would that help?
   
   Well, that would make unfortunately the information provided by RCU stall 
   detector rather useless ... workqueue-based stack dumping is very 
   unlikely 
   to point its finger to the real offender, as it'd be coming way too late.
  
  I would not use workqueues, but rather have the CPU detecting the
  stall grovel through the other CPUs' stacks, which is what I do now for
  architectures that don't support NMI-based stack dumps.  Would that be
  a reasonable approach?
 
 That would indeed solve lockups induced by RCU stall detector (and we 
 should convert sysrq stack dumping code to use the same mechanism 
 afterwards).
 
 But then, the kernel is still polluted by quite a few instances of
 
   WARN_ON(in_nmi())
 
   BUG_IN(in_nmi())
 
   if (in_nmi())
   printk()
 
 which need to be fixed separately afterwards anyway.

True enough!

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, Jun 18, 2014 at 05:58:40AM -1000, Linus Torvalds wrote:
 On Jun 18, 2014 4:36 AM, Paul E. McKenney paul...@linux.vnet.ibm.com
 wrote:
 
  I could easily add an option to RCU to allow people to tell it not to
  use NMIs to dump the stack.
 
 I don't think it should be an option.
 
 We should stop using nmi as if it was something normal. It isn't. Code
 running in nmi context should be special, and should be very very aware
 that it is special. That goes way beyond don't use printk. We seem to
 have gone way way too far in using nmi context.
 
 So we should get *rid* of code in nmi context rather than then complain
 about printk being buggy.

OK, unconditional non-use of NMIs is even easier.  ;-)

Something like the following.

Thanx, Paul



rcu: Don't use NMIs to dump other CPUs' stacks

Although NMI-based stack dumps are in principle more accurate, they are
also more likely to trigger deadlocks.  This commit therefore replaces
all uses of trigger_all_cpu_backtrace() with rcu_dump_cpu_stacks(), so
that the CPU detecting an RCU CPU stall does the stack dumping.

Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c590e1201c74..777624e1329b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -932,10 +932,7 @@ static void record_gp_stall_check_time(struct rcu_state 
*rsp)
 }
 
 /*
- * Dump stacks of all tasks running on stalled CPUs.  This is a fallback
- * for architectures that do not implement trigger_all_cpu_backtrace().
- * The NMI-triggered stack traces are more accurate because they are
- * printed by the target CPU.
+ * Dump stacks of all tasks running on stalled CPUs.
  */
 static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
 {
@@ -1013,7 +1010,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
   (long)rsp-gpnum, (long)rsp-completed, totqlen);
if (ndetected == 0)
pr_err(INFO: Stall ended before state dump start\n);
-   else if (!trigger_all_cpu_backtrace())
+   else
rcu_dump_cpu_stacks(rsp);
 
/* Complain about tasks blocking the grace period. */
@@ -1044,8 +1041,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
pr_cont( (t=%lu jiffies g=%ld c=%ld q=%lu)\n,
jiffies - rsp-gp_start,
(long)rsp-gpnum, (long)rsp-completed, totqlen);
-   if (!trigger_all_cpu_backtrace())
-   dump_stack();
+   rcu_dump_cpu_stacks(rsp);
 
raw_spin_lock_irqsave(rnp-lock, flags);
if (ULONG_CMP_GE(jiffies, ACCESS_ONCE(rsp-jiffies_stall)))

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, 18 Jun 2014 09:21:17 -0700
Paul E. McKenney paul...@linux.vnet.ibm.com wrote:

 On Wed, Jun 18, 2014 at 05:58:40AM -1000, Linus Torvalds wrote:
  On Jun 18, 2014 4:36 AM, Paul E. McKenney paul...@linux.vnet.ibm.com
  wrote:
  
   I could easily add an option to RCU to allow people to tell it not to
   use NMIs to dump the stack.
  
  I don't think it should be an option.
  
  We should stop using nmi as if it was something normal. It isn't. Code
  running in nmi context should be special, and should be very very aware
  that it is special. That goes way beyond don't use printk. We seem to
  have gone way way too far in using nmi context.
  
  So we should get *rid* of code in nmi context rather than then complain
  about printk being buggy.
 
 OK, unconditional non-use of NMIs is even easier.  ;-)
 
 Something like the following.
 

I have found the RCU stalls extremely useful in debugging lockups. In
case this doesn't work as well, I'm willing to write up something that
could send NMIs to all CPUs that would write into the ftrace ring
buffer and when finished, the calling CPU can dump it out. No printk
from NMI context at all.

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, Jun 18, 2014 at 12:38:37PM -0400, Steven Rostedt wrote:
 On Wed, 18 Jun 2014 09:21:17 -0700
 Paul E. McKenney paul...@linux.vnet.ibm.com wrote:
 
  On Wed, Jun 18, 2014 at 05:58:40AM -1000, Linus Torvalds wrote:
   On Jun 18, 2014 4:36 AM, Paul E. McKenney paul...@linux.vnet.ibm.com
   wrote:
   
I could easily add an option to RCU to allow people to tell it not to
use NMIs to dump the stack.
   
   I don't think it should be an option.
   
   We should stop using nmi as if it was something normal. It isn't. Code
   running in nmi context should be special, and should be very very aware
   that it is special. That goes way beyond don't use printk. We seem to
   have gone way way too far in using nmi context.
   
   So we should get *rid* of code in nmi context rather than then complain
   about printk being buggy.
  
  OK, unconditional non-use of NMIs is even easier.  ;-)
  
  Something like the following.
 
 I have found the RCU stalls extremely useful in debugging lockups. In
 case this doesn't work as well, I'm willing to write up something that
 could send NMIs to all CPUs that would write into the ftrace ring
 buffer and when finished, the calling CPU can dump it out. No printk
 from NMI context at all.

Sounds like a good plan to me!

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, 18 Jun 2014, Paul E. McKenney wrote:

 OK, unconditional non-use of NMIs is even easier.  ;-)
 
 Something like the following.
 
   Thanx, Paul
 
 
 
 rcu: Don't use NMIs to dump other CPUs' stacks
 
 Although NMI-based stack dumps are in principle more accurate, they are
 also more likely to trigger deadlocks.  This commit therefore replaces
 all uses of trigger_all_cpu_backtrace() with rcu_dump_cpu_stacks(), so
 that the CPU detecting an RCU CPU stall does the stack dumping.
 
 Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com
 
 diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
 index c590e1201c74..777624e1329b 100644
 --- a/kernel/rcu/tree.c
 +++ b/kernel/rcu/tree.c
 @@ -932,10 +932,7 @@ static void record_gp_stall_check_time(struct rcu_state 
 *rsp)
  }
  
  /*
 - * Dump stacks of all tasks running on stalled CPUs.  This is a fallback
 - * for architectures that do not implement trigger_all_cpu_backtrace().
 - * The NMI-triggered stack traces are more accurate because they are
 - * printed by the target CPU.
 + * Dump stacks of all tasks running on stalled CPUs.
   */
  static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
  {
 @@ -1013,7 +1010,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
  (long)rsp-gpnum, (long)rsp-completed, totqlen);
   if (ndetected == 0)
   pr_err(INFO: Stall ended before state dump start\n);
 - else if (!trigger_all_cpu_backtrace())
 + else
   rcu_dump_cpu_stacks(rsp);
  
   /* Complain about tasks blocking the grace period. */
 @@ -1044,8 +1041,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
   pr_cont( (t=%lu jiffies g=%ld c=%ld q=%lu)\n,
   jiffies - rsp-gp_start,
   (long)rsp-gpnum, (long)rsp-completed, totqlen);
 - if (!trigger_all_cpu_backtrace())
 - dump_stack();
 + rcu_dump_cpu_stacks(rsp);

This is prone to producing not really consistent stacktraces though, 
right? As the target task is still running at the time the stack is being 
walked, it might produce stacktraces that are potentially nonsensial.

How about sending NMI to the target CPU, so that the task is actually 
stopped, but printing its stacktrace from the CPU that detected the stall 
while it's stopped?

That way, there is no printk()-from-NMI, but also the stacktrace is 
guaranteed to be self-consistent.

Thanks,

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, Jun 18, 2014 at 10:36:10PM +0200, Jiri Kosina wrote:
 On Wed, 18 Jun 2014, Paul E. McKenney wrote:
 
  OK, unconditional non-use of NMIs is even easier.  ;-)
  
  Something like the following.
  
  Thanx, Paul
  
  
  
  rcu: Don't use NMIs to dump other CPUs' stacks
  
  Although NMI-based stack dumps are in principle more accurate, they are
  also more likely to trigger deadlocks.  This commit therefore replaces
  all uses of trigger_all_cpu_backtrace() with rcu_dump_cpu_stacks(), so
  that the CPU detecting an RCU CPU stall does the stack dumping.
  
  Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com
  
  diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
  index c590e1201c74..777624e1329b 100644
  --- a/kernel/rcu/tree.c
  +++ b/kernel/rcu/tree.c
  @@ -932,10 +932,7 @@ static void record_gp_stall_check_time(struct 
  rcu_state *rsp)
   }
   
   /*
  - * Dump stacks of all tasks running on stalled CPUs.  This is a fallback
  - * for architectures that do not implement trigger_all_cpu_backtrace().
  - * The NMI-triggered stack traces are more accurate because they are
  - * printed by the target CPU.
  + * Dump stacks of all tasks running on stalled CPUs.
*/
   static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
   {
  @@ -1013,7 +1010,7 @@ static void print_other_cpu_stall(struct rcu_state 
  *rsp)
 (long)rsp-gpnum, (long)rsp-completed, totqlen);
  if (ndetected == 0)
  pr_err(INFO: Stall ended before state dump start\n);
  -   else if (!trigger_all_cpu_backtrace())
  +   else
  rcu_dump_cpu_stacks(rsp);
   
  /* Complain about tasks blocking the grace period. */
  @@ -1044,8 +1041,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
  pr_cont( (t=%lu jiffies g=%ld c=%ld q=%lu)\n,
  jiffies - rsp-gp_start,
  (long)rsp-gpnum, (long)rsp-completed, totqlen);
  -   if (!trigger_all_cpu_backtrace())
  -   dump_stack();
  +   rcu_dump_cpu_stacks(rsp);
 
 This is prone to producing not really consistent stacktraces though, 
 right? As the target task is still running at the time the stack is being 
 walked, it might produce stacktraces that are potentially nonsensial.

If a CPU is stuck, the stack trace down to where it is stuck is
likely to be static.  But yes, there is some potential for confusion.
My (admittedly limited) rcutorture testing produced sensible stack traces,
but things might be a bit uglier in other situations.

 How about sending NMI to the target CPU, so that the task is actually 
 stopped, but printing its stacktrace from the CPU that detected the stall 
 while it's stopped?
 
 That way, there is no printk()-from-NMI, but also the stacktrace is 
 guaranteed to be self-consistent.

I believe that this was what Steven was suggesting, though by using
tracing.  Of course, if my current approach isn't up to the job,
then something like this general approach would look quite good.

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, 18 Jun 2014, Paul E. McKenney wrote:

 /* Complain about tasks blocking the grace period. */
   @@ -1044,8 +1041,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
 pr_cont( (t=%lu jiffies g=%ld c=%ld q=%lu)\n,
 jiffies - rsp-gp_start,
 (long)rsp-gpnum, (long)rsp-completed, totqlen);
   - if (!trigger_all_cpu_backtrace())
   - dump_stack();
   + rcu_dump_cpu_stacks(rsp);
  
  This is prone to producing not really consistent stacktraces though, 
  right? As the target task is still running at the time the stack is being 
  walked, it might produce stacktraces that are potentially nonsensial.
 
 If a CPU is stuck, the stack trace down to where it is stuck is
 likely to be static.  But yes, there is some potential for confusion.
 My (admittedly limited) rcutorture testing produced sensible stack traces,
 but things might be a bit uglier in other situations.

I agree that it might work nicely for RCU stall detector indeed. I was 
looking for solution that'd work nicely both for RCU and for sysrq-l 
(where we can't rely on processess being stuck in any way).

  How about sending NMI to the target CPU, so that the task is actually 
  stopped, but printing its stacktrace from the CPU that detected the stall 
  while it's stopped?
  
  That way, there is no printk()-from-NMI, but also the stacktrace is 
  guaranteed to be self-consistent.
 
 I believe that this was what Steven was suggesting, though by using
 tracing.  

My understanding was that Steven is suggesting using trace_printk() from 
NMI.

 Of course, if my current approach isn't up to the job, then something 
 like this general approach would look quite good.

Thanks,

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, Jun 18, 2014 at 11:12:48PM +0200, Jiri Kosina wrote:
 On Wed, 18 Jun 2014, Paul E. McKenney wrote:
 
/* Complain about tasks blocking the grace period. */
@@ -1044,8 +1041,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
pr_cont( (t=%lu jiffies g=%ld c=%ld q=%lu)\n,
jiffies - rsp-gp_start,
(long)rsp-gpnum, (long)rsp-completed, totqlen);
-   if (!trigger_all_cpu_backtrace())
-   dump_stack();
+   rcu_dump_cpu_stacks(rsp);
   
   This is prone to producing not really consistent stacktraces though, 
   right? As the target task is still running at the time the stack is being 
   walked, it might produce stacktraces that are potentially nonsensial.
  
  If a CPU is stuck, the stack trace down to where it is stuck is
  likely to be static.  But yes, there is some potential for confusion.
  My (admittedly limited) rcutorture testing produced sensible stack traces,
  but things might be a bit uglier in other situations.
 
 I agree that it might work nicely for RCU stall detector indeed. I was 
 looking for solution that'd work nicely both for RCU and for sysrq-l 
 (where we can't rely on processess being stuck in any way).

Agreed.  And if some more generally useful approach appears, I will be
quite happy to adjust RCU to use it.  In the meantime, I expect that
my patch will be helpful.

Thanx, Paul

   How about sending NMI to the target CPU, so that the task is actually 
   stopped, but printing its stacktrace from the CPU that detected the stall 
   while it's stopped?
   
   That way, there is no printk()-from-NMI, but also the stacktrace is 
   guaranteed to be self-consistent.
  
  I believe that this was what Steven was suggesting, though by using
  tracing.  
 
 My understanding was that Steven is suggesting using trace_printk() from 
 NMI.
 
  Of course, if my current approach isn't up to the job, then something 
  like this general approach would look quite good.
 
 Thanks,
 
 -- 
 Jiri Kosina
 SUSE Labs
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, 18 Jun 2014, Paul E. McKenney wrote:

  I agree that it might work nicely for RCU stall detector indeed. I was 
  looking for solution that'd work nicely both for RCU and for sysrq-l 
  (where we can't rely on processess being stuck in any way).
 
 Agreed.  And if some more generally useful approach appears, I will be
 quite happy to adjust RCU to use it.  In the meantime, I expect that
 my patch will be helpful.

Agreed. And we'll look into fixing sysrq-l in parallel I guess; once there 
is a working solution (hangs with sysrq-l can be trivially reproduced 
almost immediately), we can then migrate RCU to it.

Still, I feel bad about the fact that we are now hostages of our printk() 
implementation, which doesn't allow for any fixes/improvements. Having the 
possibility to printk() from NMI would be nice and more robust ... 
otherwise, we'll be getting people trying to do it in the future over and 
over again, even if we now get rid of it at once.

Thanks,

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, Jun 18, 2014 at 11:32:53PM +0200, Jiri Kosina wrote:
 On Wed, 18 Jun 2014, Paul E. McKenney wrote:
 
   I agree that it might work nicely for RCU stall detector indeed. I was 
   looking for solution that'd work nicely both for RCU and for sysrq-l 
   (where we can't rely on processess being stuck in any way).
  
  Agreed.  And if some more generally useful approach appears, I will be
  quite happy to adjust RCU to use it.  In the meantime, I expect that
  my patch will be helpful.
 
 Agreed. And we'll look into fixing sysrq-l in parallel I guess; once there 
 is a working solution (hangs with sysrq-l can be trivially reproduced 
 almost immediately), we can then migrate RCU to it.
 
 Still, I feel bad about the fact that we are now hostages of our printk() 
 implementation, which doesn't allow for any fixes/improvements. Having the 
 possibility to printk() from NMI would be nice and more robust ... 
 otherwise, we'll be getting people trying to do it in the future over and 
 over again, even if we now get rid of it at once.

Well, we could always have printk() splat if invoke in_nmi().

Oh, wait...  ;-)

More seriously, an in_nmi() printk() could taint the kernel, set a
flag that results in a deferred splat, do a trace_printk(), or any
number of things to let the developer know that this was a bad idea.

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, 18 Jun 2014 23:12:48 +0200 (CEST)
Jiri Kosina jkos...@suse.cz wrote:


  I believe that this was what Steven was suggesting, though by using
  tracing.  
 
 My understanding was that Steven is suggesting using trace_printk() from 
 NMI.

Not quite. I was suggesting using the ftrace ring buffer. It could have
its own way to write the stack and not depend on tracing itself.

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

2014-06-12 Thread Petr Mládek

On Thu 29-05-14 02:09:11, Frederic Weisbecker wrote:
> On Thu, May 29, 2014 at 12:02:30AM +0200, Jiri Kosina wrote:
> > On Fri, 9 May 2014, Petr Mladek wrote:
> > 
> > > printk() cannot be used safely in NMI context because it uses internal 
> > > locks
> > > and thus could cause a deadlock. Unfortunately there are circumstances 
> > > when
> > > calling printk from NMI is very useful. For example, all WARN.*(in_nmi())
> > > would be much more helpful if they didn't lockup the machine.
> > > 
> > > Another example would be arch_trigger_all_cpu_backtrace for x86 which 
> > > uses NMI
> > > to dump traces on all CPU (either triggered by sysrq+l or from RCU stall
> > > detector).
> > 
> > I am rather surprised that this patchset hasn't received a single review 
> > comment for 3 weeks.
> > 
> > Let me point out that the issues Petr is talking about in the cover letter 
> > are real -- we've actually seen the lockups triggered by RCU stall 
> > detector trying to dump stacks on all CPUs, and hard-locking machine up 
> > while doing so.
> > 
> > So this really needs to be solved.
> 
> The lack of review may be partly due to a not very appealing changestat on an
> old codebase that is already unpopular:
> 
>  Documentation/kernel-parameters.txt |   19 +-
>  kernel/printk/printk.c  | 1218 
> +--
>  2 files changed, 878 insertions(+), 359 deletions(-)
> 
> 
> Your patches look clean and pretty nice actually. They must be seriously 
> considered if
> we want to keep the current locked ring buffer design and extend it to 
> multiple per context
> buffers. But I wonder if it's worth to continue that way with the printk 
> ancient design.
> 
> If it takes more than 1000 line changes (including 500 added) to make it 
> finally work
> correctly with NMIs by working around its fundamental flaws, shouldn't we 
> rather redesign
> it to use a lockless ring buffer like ftrace or perf ones?


I like the idea of the lock less buffer. So I have spent some time
to understand kernel/trace/ring_buffer.c and there are some challenges
that would need to be solved. Some of them might be pretty hard :-(

Here are the hardest missing features from my point of view:

+ storing the last message in all situations
+ reading from more locations in parallel
+ "aggressive" printing to console

, see below for more details.

Also note that the current code is already quite complex. There are
many tricks to solve conflicts a lockless way and it might get worse if
we want to solve the above stuff.



Bellow are more details that explain the above statements. But first,
let me show how I understand the lock less ringbuffer:

+ there are separate buffers for each CPU
+ writers use a circular list of pages that are linked in both
  directions
+ writers reserve space before they copy the data
+ reader has an extra page that is not in the main ring and thus
  not accessible by writers
+ reader swap its page with another one from the main ring buffer
  when it wants to read some newer data

+ pages might have special flag/function:

+ reader: the separate page accessed by reader
+ head: page with the oldest data; the next one to be read
+ tail: page with the newest data; the next write will try to
reserve the space here
+ commit: the newest page where we have already copied the
data; it is usually the same as tail; the difference
happens when the write is interrupted and followup pages
are reserved and written; we must newer move tail over
this page, otherwise we would reserve the same location
twice, overwrite the data, and create mess

I hope that I have got the above correctly. It is why I think that the
missing features are hard to solve.

Here are the details about the above mentioned problems:

1. storing the last message in all situations
-

  The problem is if we reserve space in a normal context, get
  interrupted, and the interrupt wants to write more data than the size
  of the ring buffer. We must stop rotating when we hit the first
  reserved but not committed page. Here are the code pointers:

  ring_buffer_write()
rb_reserve_next_event()
  rb_reserve_next()
rb_move_tail()
  if (unlikely(next_page == commit_page)) {
goto out_reset;

  This is must to have because the data are simply copied when
  the space is reserved, see memcpy(body, data, length) in
  ring_buffer_write()

  I think that we do not want this for printk(). The last messages are
  usually more important, especially in case of a fatal error.

  Possible solutions:

  + use different ring buffer for each context; it would need even
more space

  + lock the page for the given context when a space is reserved; such
locked page will be skipped when the buffer is rotated in a nested

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

2014-06-12 Thread Petr Mládek

On Thu 29-05-14 02:09:11, Frederic Weisbecker wrote:
 On Thu, May 29, 2014 at 12:02:30AM +0200, Jiri Kosina wrote:
  On Fri, 9 May 2014, Petr Mladek wrote:
  
   printk() cannot be used safely in NMI context because it uses internal 
   locks
   and thus could cause a deadlock. Unfortunately there are circumstances 
   when
   calling printk from NMI is very useful. For example, all WARN.*(in_nmi())
   would be much more helpful if they didn't lockup the machine.
   
   Another example would be arch_trigger_all_cpu_backtrace for x86 which 
   uses NMI
   to dump traces on all CPU (either triggered by sysrq+l or from RCU stall
   detector).
  
  I am rather surprised that this patchset hasn't received a single review 
  comment for 3 weeks.
  
  Let me point out that the issues Petr is talking about in the cover letter 
  are real -- we've actually seen the lockups triggered by RCU stall 
  detector trying to dump stacks on all CPUs, and hard-locking machine up 
  while doing so.
  
  So this really needs to be solved.
 
 The lack of review may be partly due to a not very appealing changestat on an
 old codebase that is already unpopular:
 
  Documentation/kernel-parameters.txt |   19 +-
  kernel/printk/printk.c  | 1218 
 +--
  2 files changed, 878 insertions(+), 359 deletions(-)
 
 
 Your patches look clean and pretty nice actually. They must be seriously 
 considered if
 we want to keep the current locked ring buffer design and extend it to 
 multiple per context
 buffers. But I wonder if it's worth to continue that way with the printk 
 ancient design.
 
 If it takes more than 1000 line changes (including 500 added) to make it 
 finally work
 correctly with NMIs by working around its fundamental flaws, shouldn't we 
 rather redesign
 it to use a lockless ring buffer like ftrace or perf ones?


I like the idea of the lock less buffer. So I have spent some time
to understand kernel/trace/ring_buffer.c and there are some challenges
that would need to be solved. Some of them might be pretty hard :-(

Here are the hardest missing features from my point of view:

+ storing the last message in all situations
+ reading from more locations in parallel
+ aggressive printing to console

, see below for more details.

Also note that the current code is already quite complex. There are
many tricks to solve conflicts a lockless way and it might get worse if
we want to solve the above stuff.



Bellow are more details that explain the above statements. But first,
let me show how I understand the lock less ringbuffer:

+ there are separate buffers for each CPU
+ writers use a circular list of pages that are linked in both
  directions
+ writers reserve space before they copy the data
+ reader has an extra page that is not in the main ring and thus
  not accessible by writers
+ reader swap its page with another one from the main ring buffer
  when it wants to read some newer data

+ pages might have special flag/function:

+ reader: the separate page accessed by reader
+ head: page with the oldest data; the next one to be read
+ tail: page with the newest data; the next write will try to
reserve the space here
+ commit: the newest page where we have already copied the
data; it is usually the same as tail; the difference
happens when the write is interrupted and followup pages
are reserved and written; we must newer move tail over
this page, otherwise we would reserve the same location
twice, overwrite the data, and create mess

I hope that I have got the above correctly. It is why I think that the
missing features are hard to solve.

Here are the details about the above mentioned problems:

1. storing the last message in all situations
-

  The problem is if we reserve space in a normal context, get
  interrupted, and the interrupt wants to write more data than the size
  of the ring buffer. We must stop rotating when we hit the first
  reserved but not committed page. Here are the code pointers:

  ring_buffer_write()
rb_reserve_next_event()
  rb_reserve_next()
rb_move_tail()
  if (unlikely(next_page == commit_page)) {
goto out_reset;

  This is must to have because the data are simply copied when
  the space is reserved, see memcpy(body, data, length) in
  ring_buffer_write()

  I think that we do not want this for printk(). The last messages are
  usually more important, especially in case of a fatal error.

  Possible solutions:

  + use different ring buffer for each context; it would need even
more space

  + lock the page for the given context when a space is reserved; such
locked page will be skipped when the buffer is rotated in a nested
interrupt context; this will make the algorithm even more
complicated; I am

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

2014-06-11 Thread Petr Mládek

On Tue 2014-06-10 19:32:51, Jiri Kosina wrote:
> On Tue, 10 Jun 2014, Linus Torvalds wrote:
> 
> > > Lets be crazy and Cc Linus on that.
> > 
> > Quite frankly, I hate seeing something like this:
> > 
> >  kernel/printk/printk.c  | 1218 
> > +--
> > 
> > for something that is stupid and broken. Printing from NMI context
> > isn't really supposed to work, and we all *know* it's not supposed to
> > work.
> 
> It's OTOH rather useful in a few scenarios -- particularly it's the only 
> way to dump stacktraces from remote CPUs in order to obtain traces that 
> actually make sense (in situations like RCU stall); using workqueue-based 
> dumping is useless there.
> 
> > Make a trivial "printk_nmi()" wrapper that tries to do a trylock on
> > logbuf_lock, and *maybe* the existing sequence of
> > 
> > if (console_trylock_for_printk())
> > console_unlock();
> > 
> > then works for actually triggering the printout. But the wrapper
> > should be 15 lines of code for "if possible, try to print things", and
> > *not* a thousand lines of changes.

I am afraid that basically this is done in my patch set. It does
trylock and uses the main buffer when possible. I am just not able to
squeeze it into 15 lines :-(

One problem is that we do not want to loose the messages,
e.g. stacktraces. We need to store them somewhere and merge them into
the main ring buffer later.

> Well, we are carrying much simpler fix for this whole braindamage in our 
> enterprise kernel that is from pre-7ff9554bb578 era, and it was rather 
> simple fix in principle (the diffstat is much larger than it had to be due 
> to code movements):
> 
>   
> http://kernel.suse.com/cgit/kernel/commit/?h=SLE11-SP3=8d62ae68ff61d77ae3c4899f05dbd9c9742b14c9
> 
> But after the scary 7ff9554bb578 and its successors, things got a lot more 
> complicated.

Yes, another big problem is the above mentioned commit. The reading from the
temporary storage has to be in the normal context and thus lockless.
It is much more complicated when we work with whole messages and all
the added flags.

Also note that we want to save the last messages when the temporary storage
is full. This is why I used a ring buffer and was not able to use a
more simple producer and consumer algorithm.


Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

2014-06-11 Thread Petr Mládek

On Tue 2014-06-10 19:32:51, Jiri Kosina wrote:
On Tue, 10 Jun 2014, Linus Torvalds wrote:

Lets be crazy and Cc Linus on that.

Quite frankly, I hate seeing something like this:

kernel/printk/printk.c | 1218
+--

for something that is stupid and broken. Printing from NMI context
isn't really supposed to work, and we all *know* it's not supposed to
work.

It's OTOH rather useful in a few scenarios -- particularly it's the only
way to dump stacktraces from remote CPUs in order to obtain traces that
actually make sense (in situations like RCU stall); using workqueue-based
dumping is useless there.

Make a trivial printk_nmi() wrapper that tries to do a trylock on
logbuf_lock, and *maybe* the existing sequence of

if (console_trylock_for_printk())
console_unlock();

then works for actually triggering the printout. But the wrapper
should be 15 lines of code for if possible, try to print things, and
*not* a thousand lines of changes.

I am afraid that basically this is done in my patch set. It does
trylock and uses the main buffer when possible. I am just not able to
squeeze it into 15 lines :-(

One problem is that we do not want to loose the messages,
e.g. stacktraces. We need to store them somewhere and merge them into
the main ring buffer later.

Well, we are carrying much simpler fix for this whole braindamage in our
enterprise kernel that is from pre-7ff9554bb578 era, and it was rather
simple fix in principle (the diffstat is much larger than it had to be due
to code movements):

http://kernel.suse.com/cgit/kernel/commit/?h=SLE11-SP3id=8d62ae68ff61d77ae3c4899f05dbd9c9742b14c9

But after the scary 7ff9554bb578 and its successors, things got a lot more
complicated.

Yes, another big problem is the above mentioned commit. The reading from the
temporary storage has to be in the normal context and thus lockless.
It is much more complicated when we work with whole messages and all
the added flags.

Also note that we want to save the last messages when the temporary storage
is full. This is why I used a ring buffer and was not able to use a
more simple producer and consumer algorithm.

Best Regards,
Petr
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

2014-06-10 Thread Jiri Kosina

On Tue, 10 Jun 2014, Linus Torvalds wrote:

> > Lets be crazy and Cc Linus on that.
> 
> Quite frankly, I hate seeing something like this:
> 
>  kernel/printk/printk.c  | 1218 
> +--
> 
> for something that is stupid and broken. Printing from NMI context
> isn't really supposed to work, and we all *know* it's not supposed to
> work.

It's OTOH rather useful in a few scenarios -- particularly it's the only 
way to dump stacktraces from remote CPUs in order to obtain traces that 
actually make sense (in situations like RCU stall); using workqueue-based 
dumping is useless there.

> I'd much rather disallow it, and if there is one or two places that
> really want to print a warning and know that they are in NMI context,
> have a special workaround just for them, with something that does
> *not* try to make printk in general work any better.

Well, that'd mean that at least our stack dumping mechanism would need to 
know both ways of printing; but yes, it'll still probably be less than 880 
lines added.

> Dammit, NMI context is special. I absolutely refuse to buy into the
> broken concept that we should make more stuff work in NMI context.
> Hell no, we should *not* try to make more crap work in NMI. NMI people
> should be careful.

In parallel, I'd for the sake of argument propose to just drop the whole 
_CONT printing (and all the things that followed on top) as that made 
printk() a complete hell to maintain for a disputable gain IMO.

> Make a trivial "printk_nmi()" wrapper that tries to do a trylock on
> logbuf_lock, and *maybe* the existing sequence of
> 
> if (console_trylock_for_printk())
> console_unlock();
> 
> then works for actually triggering the printout. But the wrapper
> should be 15 lines of code for "if possible, try to print things", and
> *not* a thousand lines of changes.

Well, we are carrying much simpler fix for this whole braindamage in our 
enterprise kernel that is from pre-7ff9554bb578 era, and it was rather 
simple fix in principle (the diffstat is much larger than it had to be due 
to code movements):


http://kernel.suse.com/cgit/kernel/commit/?h=SLE11-SP3=8d62ae68ff61d77ae3c4899f05dbd9c9742b14c9

But after the scary 7ff9554bb578 and its successors, things got a lot more 
complicated.

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

2014-06-10 Thread Linus Torvalds

On Tue, Jun 10, 2014 at 9:46 AM, Frederic Weisbecker  wrote:
>
> There is also a big risk that if we push back this bugfix, nobody will 
> actually do
> that desired rewrite.
>
> Lets be crazy and Cc Linus on that.

Quite frankly, I hate seeing something like this:

 kernel/printk/printk.c  | 1218 +--

for something that is stupid and broken. Printing from NMI context
isn't really supposed to work, and we all *know* it's not supposed to
work.

I'd much rather disallow it, and if there is one or two places that
really want to print a warning and know that they are in NMI context,
have a special workaround just for them, with something that does
*not* try to make printk in general work any better.

Dammit, NMI context is special. I absolutely refuse to buy into the
broken concept that we should make more stuff work in NMI context.
Hell no, we should *not* try to make more crap work in NMI. NMI people
should be careful.

Make a trivial "printk_nmi()" wrapper that tries to do a trylock on
logbuf_lock, and *maybe* the existing sequence of

if (console_trylock_for_printk())
console_unlock();

then works for actually triggering the printout. But the wrapper
should be 15 lines of code for "if possible, try to print things", and
*not* a thousand lines of changes.

 Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Fri, May 30, 2014 at 10:13:28AM +0200, Jan Kara wrote:
> On Thu 29-05-14 02:09:11, Frederic Weisbecker wrote:
> > On Thu, May 29, 2014 at 12:02:30AM +0200, Jiri Kosina wrote:
> > > On Fri, 9 May 2014, Petr Mladek wrote:
> > > 
> > > > printk() cannot be used safely in NMI context because it uses internal 
> > > > locks
> > > > and thus could cause a deadlock. Unfortunately there are circumstances 
> > > > when
> > > > calling printk from NMI is very useful. For example, all 
> > > > WARN.*(in_nmi())
> > > > would be much more helpful if they didn't lockup the machine.
> > > > 
> > > > Another example would be arch_trigger_all_cpu_backtrace for x86 which 
> > > > uses NMI
> > > > to dump traces on all CPU (either triggered by sysrq+l or from RCU stall
> > > > detector).
> > > 
> > > I am rather surprised that this patchset hasn't received a single review 
> > > comment for 3 weeks.
> > > 
> > > Let me point out that the issues Petr is talking about in the cover 
> > > letter 
> > > are real -- we've actually seen the lockups triggered by RCU stall 
> > > detector trying to dump stacks on all CPUs, and hard-locking machine up 
> > > while doing so.
> > > 
> > > So this really needs to be solved.
> > 
> > The lack of review may be partly due to a not very appealing changestat on 
> > an
> > old codebase that is already unpopular:
> > 
> >  Documentation/kernel-parameters.txt |   19 +-
> >  kernel/printk/printk.c  | 1218 
> > +--
> >  2 files changed, 878 insertions(+), 359 deletions(-)
> > 
> > 
> > Your patches look clean and pretty nice actually. They must be seriously
> > considered if we want to keep the current locked ring buffer design and
> > extend it to multiple per context buffers. But I wonder if it's worth to
> > continue that way with the printk ancient design.
> > 
> > If it takes more than 1000 line changes (including 500 added) to make it
> > finally work correctly with NMIs by working around its fundamental flaws,
> > shouldn't we rather redesign it to use a lockless ring buffer like ftrace
> > or perf ones?
>   I agree that lockless ringbuffer would be a more elegant solution but a
> much more intrusive one and complex as well. Petr's patch set basically
> leaves ordinary printk path intact to avoid concerns about regressions
> there.
> 
> Given how difficult / time consuming is it to push any complex changes to
> printk I'd push for fixing printk from NMI in this inelegant but relatively
> non-contentious way and work on converting printk to lockless
> implementation long term. But before spending huge amount of time on that
> I'd like to get some wider concensus that this is really the way we want to
> go - at least AKPM and Steven - something for discussion in the KS topic I'd
> proposed I think [1].

Agreed, lets wait for others opinion. Andrew, Steve?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Thu, May 29, 2014 at 10:09:48AM +0200, Jiri Kosina wrote:
> On Thu, 29 May 2014, Frederic Weisbecker wrote:
> 
> > > I am rather surprised that this patchset hasn't received a single review 
> > > comment for 3 weeks.
> > > 
> > > Let me point out that the issues Petr is talking about in the cover 
> > > letter 
> > > are real -- we've actually seen the lockups triggered by RCU stall 
> > > detector trying to dump stacks on all CPUs, and hard-locking machine up 
> > > while doing so.
> > > 
> > > So this really needs to be solved.
> > 
> > The lack of review may be partly due to a not very appealing changestat 
> > on an old codebase that is already unpopular:
> > 
> >  Documentation/kernel-parameters.txt |   19 +-
> >  kernel/printk/printk.c  | 1218 
> > +--
> >  2 files changed, 878 insertions(+), 359 deletions(-)
> > 
> > 
> > Your patches look clean and pretty nice actually. They must be seriously 
> > considered if we want to keep the current locked ring buffer design and 
> > extend it to multiple per context buffers. But I wonder if it's worth to 
> > continue that way with the printk ancient design.
> > 
> > If it takes more than 1000 line changes (including 500 added) to make it 
> > finally work correctly with NMIs by working around its fundamental 
> > flaws, shouldn't we rather redesign it to use a lockless ring buffer 
> > like ftrace or perf ones?
> 
> Yeah, printk() has grown over years to a stinking pile of you-know-what, 
> no argument to that.
> 
> I also agree that performing a massive rewrite, which will make it use a 
> lockless buffer, and therefore ultimately solve all its problems 
> (scheduler deadlocks, NMI deadlocks, xtime_lock deadlocks) at once, is 
> necessary in the long run.
> 
> On the other hand, I am completely sure that the diffstat for such rewrite 
> is going to be much more scary :)

Indeed, but probably much more valuable in the long term.

> 
> This is not adding fancy features to printk(), where we really should be 
> saying no; horrible commits like 7ff9554bb5 is exactly something that 
> should be pushed against *heavily*. But bugfixes for hard machine lockups 
> are a completely different story to me (until we have a whole new printk() 
> buffer handling implementation).

Yeah bugfixes are certainly another story. Still it looks like yet another
layer of workaround on a big hack.

But yeah I'm certainly not in a right position to set anyone to do a massive
rewrite on such a boring subsystem :)

There is also a big risk that if we push back this bugfix, nobody will actually 
do
that desired rewrite.

Lets be crazy and Cc Linus on that.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Thu, May 29, 2014 at 10:09:48AM +0200, Jiri Kosina wrote:
 On Thu, 29 May 2014, Frederic Weisbecker wrote:
 
   I am rather surprised that this patchset hasn't received a single review 
   comment for 3 weeks.
   
   Let me point out that the issues Petr is talking about in the cover 
   letter 
   are real -- we've actually seen the lockups triggered by RCU stall 
   detector trying to dump stacks on all CPUs, and hard-locking machine up 
   while doing so.
   
   So this really needs to be solved.
  
  The lack of review may be partly due to a not very appealing changestat 
  on an old codebase that is already unpopular:
  
   Documentation/kernel-parameters.txt |   19 +-
   kernel/printk/printk.c  | 1218 
  +--
   2 files changed, 878 insertions(+), 359 deletions(-)
  
  
  Your patches look clean and pretty nice actually. They must be seriously 
  considered if we want to keep the current locked ring buffer design and 
  extend it to multiple per context buffers. But I wonder if it's worth to 
  continue that way with the printk ancient design.
  
  If it takes more than 1000 line changes (including 500 added) to make it 
  finally work correctly with NMIs by working around its fundamental 
  flaws, shouldn't we rather redesign it to use a lockless ring buffer 
  like ftrace or perf ones?
 
 Yeah, printk() has grown over years to a stinking pile of you-know-what, 
 no argument to that.
 
 I also agree that performing a massive rewrite, which will make it use a 
 lockless buffer, and therefore ultimately solve all its problems 
 (scheduler deadlocks, NMI deadlocks, xtime_lock deadlocks) at once, is 
 necessary in the long run.
 
 On the other hand, I am completely sure that the diffstat for such rewrite 
 is going to be much more scary :)

Indeed, but probably much more valuable in the long term.

 
 This is not adding fancy features to printk(), where we really should be 
 saying no; horrible commits like 7ff9554bb5 is exactly something that 
 should be pushed against *heavily*. But bugfixes for hard machine lockups 
 are a completely different story to me (until we have a whole new printk() 
 buffer handling implementation).

Yeah bugfixes are certainly another story. Still it looks like yet another
layer of workaround on a big hack.

But yeah I'm certainly not in a right position to set anyone to do a massive
rewrite on such a boring subsystem :)

There is also a big risk that if we push back this bugfix, nobody will actually 
do
that desired rewrite.

Lets be crazy and Cc Linus on that.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 00/11] printk: safe printing in NMI context