Re: [PATCH] printk/tracing: Do not trace printk_nmi_enter()

2018-09-07 Thread Sergey Senozhatsky
On (09/07/18 16:03), Peter Zijlstra wrote:
> > 
> > I would even argue that placing printk_nmi_enter() between
> > lockdep_off() and ftrace_nmi_enter() is wrong because if in the future
> > printk_nmi_enter() were to do any ftrace tracing, it wont be caught, as
> > it was by having it before lockdep_off().
> > 
> > printk_nmi_enter() should not muck with IRQ state, nor should it do any
> > ftrace tracing. Since ftrace mucks with IRQ state when it gets enabled
> > or disabled, it will screw up lockdep, and lockdep will complain. That
> > way we can use lockdep not being off to catch this bug.
> 
> The very bestest solution is to rm -rf printk ;-)

Talented, capable and tremendously clever people had spent decades on
making printk what it is today. I feel responsible for respecting that
effort and, thus, my vote would be to keep printk around for a while.
... we also support !CONFIG_PRINTK builds ;)

-ss


Re: [PATCH] printk/tracing: Do not trace printk_nmi_enter()

2018-09-07 Thread Peter Zijlstra
On Fri, Sep 07, 2018 at 10:01:28AM -0400, Steven Rostedt wrote:
> On Fri, 7 Sep 2018 09:55:33 -0400
> Steven Rostedt  wrote:
> 
> > On Fri, 7 Sep 2018 15:45:32 +0200
> > Peter Zijlstra  wrote:
> > 
> > > Yes really, we should not muck with the IRQ state from NMI context.  
> > 
> > Right, and we didn't. Your patch didn't change anything, but allow for
> > printk_nmi_enter/exit() to be traced by ftrace, but that's wrong to
> > begin with because it ftrace_nmi_enter() hasn't been called yet.
> > 
> 
> I would even argue that placing printk_nmi_enter() between
> lockdep_off() and ftrace_nmi_enter() is wrong because if in the future
> printk_nmi_enter() were to do any ftrace tracing, it wont be caught, as
> it was by having it before lockdep_off().
> 
> printk_nmi_enter() should not muck with IRQ state, nor should it do any
> ftrace tracing. Since ftrace mucks with IRQ state when it gets enabled
> or disabled, it will screw up lockdep, and lockdep will complain. That
> way we can use lockdep not being off to catch this bug.

The very bestest solution is to rm -rf printk ;-)


Re: [PATCH] printk/tracing: Do not trace printk_nmi_enter()

2018-09-07 Thread Peter Zijlstra
On Fri, Sep 07, 2018 at 09:55:33AM -0400, Steven Rostedt wrote:
> On Fri, 7 Sep 2018 15:45:32 +0200
> Peter Zijlstra  wrote:
> 
> > Yes really, we should not muck with the IRQ state from NMI context.
> 
> Right, and we didn't. Your patch didn't change anything, but allow for

It does, it kills lockdep before we get to mucking with the IRQ state.

> printk_nmi_enter/exit() to be traced by ftrace, but that's wrong to
> begin with because it ftrace_nmi_enter() hasn't been called yet.

That's a second bug :-) A better solution might be to inline that
function as well. A direct OR instruction is still way faster than a
CALL+RET.


Re: [PATCH] printk/tracing: Do not trace printk_nmi_enter()

2018-09-07 Thread Steven Rostedt
On Fri, 7 Sep 2018 09:55:33 -0400
Steven Rostedt  wrote:

> On Fri, 7 Sep 2018 15:45:32 +0200
> Peter Zijlstra  wrote:
> 
> > Yes really, we should not muck with the IRQ state from NMI context.  
> 
> Right, and we didn't. Your patch didn't change anything, but allow for
> printk_nmi_enter/exit() to be traced by ftrace, but that's wrong to
> begin with because it ftrace_nmi_enter() hasn't been called yet.
> 

I would even argue that placing printk_nmi_enter() between
lockdep_off() and ftrace_nmi_enter() is wrong because if in the future
printk_nmi_enter() were to do any ftrace tracing, it wont be caught, as
it was by having it before lockdep_off().

printk_nmi_enter() should not muck with IRQ state, nor should it do any
ftrace tracing. Since ftrace mucks with IRQ state when it gets enabled
or disabled, it will screw up lockdep, and lockdep will complain. That
way we can use lockdep not being off to catch this bug.

-- Steve


Re: [PATCH] printk/tracing: Do not trace printk_nmi_enter()

2018-09-07 Thread Steven Rostedt
On Fri, 7 Sep 2018 15:45:32 +0200
Peter Zijlstra  wrote:

> Yes really, we should not muck with the IRQ state from NMI context.

Right, and we didn't. Your patch didn't change anything, but allow for
printk_nmi_enter/exit() to be traced by ftrace, but that's wrong to
begin with because it ftrace_nmi_enter() hasn't been called yet.

-- Steve


Re: [PATCH] printk/tracing: Do not trace printk_nmi_enter()

2018-09-07 Thread Steven Rostedt
On Fri, 7 Sep 2018 11:30:55 +0200
Peter Zijlstra  wrote:

> > That said, I am not against this change. Especially the inlining
> > is a good move. Note that lockdep_off()/lockdep_on() must not
> > be traced as well.  
> 
> Hard to trace an inline funcion; we could make it __always_inline to
> feel better.

In the old days, you would need to make it __always_inline, because if
gcc didn't inline it (and it did so in the past), it would be traced
and would re-introduce this bug. But I've hit this issue too many times
in the past, that I added "notrace" to the "inline" definition, where
all "inline" labeled functions are also "notrace".

lockdep.o has the -pg option removed in the Makefile so that all
functions in lockdep.c is notrace, which is why it was fine to keep it
there.

But adding __always_inline may not be a bad idea regardless.

-- Steve


Re: [PATCH] printk/tracing: Do not trace printk_nmi_enter()

2018-09-07 Thread Steven Rostedt
On Fri, 7 Sep 2018 17:35:38 +0900
Sergey Senozhatsky  wrote:

> Should't printk_nmi_enter()/printk_nmi_exit() still be notrace?
> Like you and Steven said - it's still before ftrace_nmi_enter()
> and should be notrace regardless.

Correct. My patch fixes both issues.

The patch that Peter is posting is in case for some reason in the future
printk_nmi_enter/exit() do something with interrupts or locking.

And the printk_nmi_enter/exit() is placed where it is, in case for some
reason lockdep_off/on() does a printk.

The printk_nmi_enter/exit() and lockdep_off/on() are only changing
state variables. Neither of them should be traced by ftrace.

My patch solved the problem and was not just fixing a symptom.

It could have been solved either way, but as Petr said, the
printk_nmi_enter/exit() should have been notrace, because it was before
ftrace_nmi_enter/exit(). Lockdep actually found this bug because the
lockdep_on/off() was not first.

-- Steve


Re: [PATCH] printk/tracing: Do not trace printk_nmi_enter()

2018-09-07 Thread Peter Zijlstra
On Fri, Sep 07, 2018 at 09:41:48AM -0400, Steven Rostedt wrote:
> On Fri, 7 Sep 2018 09:34:48 +0200
> Peter Zijlstra  wrote:
> 
> > On Wed, Sep 05, 2018 at 09:33:34PM -0400, Steven Rostedt wrote:
> > >   do_idle {
> > > 
> > > [interrupts enabled]
> > > 
> > >  [interrupts disabled]
> > >   TRACE_IRQS_OFF [lockdep says irqs off]
> > >   [...]
> > >   TRACE_IRQS_IRET
> > >   test if pt_regs say return to interrupts enabled [yes]
> > >   TRACE_IRQS_ON [lockdep says irqs are on]
> > > 
> > >   
> > >   nmi_enter() {
> > >   printk_nmi_enter() [traced by ftrace]
> > >   [ hit ftrace breakpoint ]
> > >   
> > >   TRACE_IRQS_OFF [lockdep says irqs off]
> > >   [...]
> > >   TRACE_IRQS_IRET [return from breakpoint]
> > >  test if pt_regs say interrupts enabled [no]
> > >  [iret back to interrupt]
> > >  [iret back to code]
> > > 
> > > tick_nohz_idle_enter() {
> > > 
> > >   lockdep_assert_irqs_enabled() [lockdep say no!]  
> > 
> > Isn't the problem that we muck with the IRQ state from NMI context? We
> > shouldn't be doing that.
> 
> Not really. 

Yes really, we should not muck with the IRQ state from NMI context.


Re: [PATCH] printk/tracing: Do not trace printk_nmi_enter()

2018-09-07 Thread Steven Rostedt
On Fri, 7 Sep 2018 09:34:48 +0200
Peter Zijlstra  wrote:

> On Wed, Sep 05, 2018 at 09:33:34PM -0400, Steven Rostedt wrote:
> >   do_idle {
> > 
> > [interrupts enabled]
> > 
> >  [interrupts disabled]
> > TRACE_IRQS_OFF [lockdep says irqs off]
> > [...]
> > TRACE_IRQS_IRET
> > test if pt_regs say return to interrupts enabled [yes]
> > TRACE_IRQS_ON [lockdep says irqs are on]
> > 
> > 
> > nmi_enter() {
> > printk_nmi_enter() [traced by ftrace]
> > [ hit ftrace breakpoint ]
> > 
> > TRACE_IRQS_OFF [lockdep says irqs off]
> > [...]
> > TRACE_IRQS_IRET [return from breakpoint]
> >test if pt_regs say interrupts enabled [no]
> >[iret back to interrupt]
> >[iret back to code]
> > 
> > tick_nohz_idle_enter() {
> > 
> > lockdep_assert_irqs_enabled() [lockdep say no!]  
> 
> Isn't the problem that we muck with the IRQ state from NMI context? We
> shouldn't be doing that.

Not really. The problem is that the exception handling code jumps into
lockdep, and we allow exceptions to happen in NMIs. It's just that the
tracing of printk_nmi_enter() caused an exception before lockdep was
turned off.

> 
> The thing is, since we trace the IRQ state from within IRQ-disable,
> since that's the only IRQ-safe option, it is very much not NMI-safe.
> 
> Your patch might avoid the symptom, but I don't think it cures the
> fundamental problem.

No, the issue is that the nmi_enter() code was traced early which adds
breakpoints and breakpoints causes exceptions which jumps into lockdep,
and we did this before lockdep was turned off.

-- Steve


Re: [PATCH] printk/tracing: Do not trace printk_nmi_enter()

2018-09-07 Thread Peter Zijlstra
On Fri, Sep 07, 2018 at 10:28:34AM +0200, Petr Mladek wrote:
> On Fri 2018-09-07 09:45:31, Peter Zijlstra wrote:
> > On Thu, Sep 06, 2018 at 11:31:51AM +0900, Sergey Senozhatsky wrote:
> > > An alternative option, thus, could be re-instating back the rule that
> > > lockdep_off/on should be the first and the last thing we do in
> > > nmi_enter/nmi_exit. E.g.
> > > 
> > > nmi_enter()
> > >   lockdep_off();
> > >   printk_nmi_enter();
> > > 
> > > nmi_exit()
> > >   printk_nmi_exit();
> > >   lockdep_on();
> > 
> > Yes that. Also, those should probably be inline functions.
> > 
> > ---
> > Subject: locking/lockdep: Fix NMI handling
> > 
> > Someone put code in the NMI handler before lockdep_off(). Since lockdep
> > is not NMI safe, this wrecks stuff.
> 
> My view is that nmi_enter() has to switch several features into
> NMI-safe mode. The code must not trigger the other features when
> they are still in the unsafe mode.
> 
> It is a chicken&egg problem. And it is hard to completely prevent
> regressions caused by future changes.

Sure, not bothered too much about the regression, that happens.

> I though that printk_nmi_enter() should never need any lockdep-related
> code. On the other hand, people might want to printk debug messages
> when lockdep_off() is called. This is why I put it in the current order.

Nah, that'd be broken. Or rather, if you want to debug NMI stuff, you
had better know wth you're doing. Printk, as per mainline, is utterly
useless for that -- I still carry those force_early_printk patches
locally.

Because even if the core printk code were to be NMI safe (possible I
think, all we need is a lockless ring-buffer), then none of the console
drivers are :/

(I really hate this current printk-nmi mess)

> That said, I am not against this change. Especially the inlining
> is a good move. Note that lockdep_off()/lockdep_on() must not
> be traced as well.

Hard to trace an inline funcion; we could make it __always_inline to
feel better.



Re: [PATCH] printk/tracing: Do not trace printk_nmi_enter()

2018-09-07 Thread Sergey Senozhatsky
On (09/07/18 10:28), Petr Mladek wrote:
> On Fri 2018-09-07 09:45:31, Peter Zijlstra wrote:
> > On Thu, Sep 06, 2018 at 11:31:51AM +0900, Sergey Senozhatsky wrote:
> > > An alternative option, thus, could be re-instating back the rule that
> > > lockdep_off/on should be the first and the last thing we do in
> > > nmi_enter/nmi_exit. E.g.
> > > 
> > > nmi_enter()
> > >   lockdep_off();
> > >   printk_nmi_enter();
> > > 
> > > nmi_exit()
> > >   printk_nmi_exit();
> > >   lockdep_on();
> > 
> > Yes that. Also, those should probably be inline functions.
> > 
> > ---
> > Subject: locking/lockdep: Fix NMI handling
> > 
> > Someone put code in the NMI handler before lockdep_off(). Since lockdep
> > is not NMI safe, this wrecks stuff.
> 
> My view is that nmi_enter() has to switch several features into
> NMI-safe mode. The code must not trigger the other features when
> they are still in the unsafe mode.
> 
> It is a chicken&egg problem. And it is hard to completely prevent
> regressions caused by future changes.
> 
> I though that printk_nmi_enter() should never need any lockdep-related
> code. On the other hand, people might want to printk debug messages
> when lockdep_off() is called. This is why I put it in the current order.
> 
> That said, I am not against this change. Especially the inlining
> is a good move. Note that lockdep_off()/lockdep_on() must not
> be traced as well.

Should't printk_nmi_enter()/printk_nmi_exit() still be notrace?
Like you and Steven said - it's still before ftrace_nmi_enter()
and should be notrace regardless.

-ss


Re: [PATCH] printk/tracing: Do not trace printk_nmi_enter()

2018-09-07 Thread Petr Mladek
On Fri 2018-09-07 09:45:31, Peter Zijlstra wrote:
> On Thu, Sep 06, 2018 at 11:31:51AM +0900, Sergey Senozhatsky wrote:
> > An alternative option, thus, could be re-instating back the rule that
> > lockdep_off/on should be the first and the last thing we do in
> > nmi_enter/nmi_exit. E.g.
> > 
> > nmi_enter()
> > lockdep_off();
> > printk_nmi_enter();
> > 
> > nmi_exit()
> > printk_nmi_exit();
> > lockdep_on();
> 
> Yes that. Also, those should probably be inline functions.
> 
> ---
> Subject: locking/lockdep: Fix NMI handling
> 
> Someone put code in the NMI handler before lockdep_off(). Since lockdep
> is not NMI safe, this wrecks stuff.

My view is that nmi_enter() has to switch several features into
NMI-safe mode. The code must not trigger the other features when
they are still in the unsafe mode.

It is a chicken&egg problem. And it is hard to completely prevent
regressions caused by future changes.

I though that printk_nmi_enter() should never need any lockdep-related
code. On the other hand, people might want to printk debug messages
when lockdep_off() is called. This is why I put it in the current order.

That said, I am not against this change. Especially the inlining
is a good move. Note that lockdep_off()/lockdep_on() must not
be traced as well.

Best Regards,
Petr


Re: [PATCH] printk/tracing: Do not trace printk_nmi_enter()

2018-09-07 Thread Peter Zijlstra
On Thu, Sep 06, 2018 at 11:31:51AM +0900, Sergey Senozhatsky wrote:
> An alternative option, thus, could be re-instating back the rule that
> lockdep_off/on should be the first and the last thing we do in
> nmi_enter/nmi_exit. E.g.
> 
> nmi_enter()
>   lockdep_off();
>   printk_nmi_enter();
> 
> nmi_exit()
>   printk_nmi_exit();
>   lockdep_on();

Yes that. Also, those should probably be inline functions.

---
Subject: locking/lockdep: Fix NMI handling

Someone put code in the NMI handler before lockdep_off(). Since lockdep
is not NMI safe, this wrecks stuff.

Fixes: 42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI")
Signed-off-by: Peter Zijlstra (Intel) 
---
 include/linux/hardirq.h  |  4 ++--
 include/linux/lockdep.h  | 11 +--
 kernel/locking/lockdep.c | 12 
 3 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 0fbbcdf0c178..8d70270d9486 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -62,8 +62,8 @@ extern void irq_exit(void);
 
 #define nmi_enter()\
do {\
-   printk_nmi_enter(); \
lockdep_off();  \
+   printk_nmi_enter(); \
ftrace_nmi_enter(); \
BUG_ON(in_nmi());   \
preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \
@@ -78,8 +78,8 @@ extern void irq_exit(void);
BUG_ON(!in_nmi());  \
preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
ftrace_nmi_exit();  \
-   lockdep_on();   \
printk_nmi_exit();  \
+   lockdep_on();   \
} while (0)
 
 #endif /* LINUX_HARDIRQ_H */
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index b0d0b51c4d85..70bb9e8fc8f9 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -272,8 +272,15 @@ extern void lockdep_reset_lock(struct lockdep_map *lock);
 extern void lockdep_free_key_range(void *start, unsigned long size);
 extern asmlinkage void lockdep_sys_exit(void);
 
-extern void lockdep_off(void);
-extern void lockdep_on(void);
+static inline void lockdep_off(void)
+{
+   current->lockdep_recursion++;
+}
+
+static inline void lockdep_on(void)
+{
+   current->lockdep_recursion--;
+}
 
 /*
  * These methods are used by specific locking variants (spinlocks,
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index e406c5fdb41e..da51ed1c0c21 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -317,18 +317,6 @@ static inline u64 iterate_chain_key(u64 key, u32 idx)
return k0 | (u64)k1 << 32;
 }
 
-void lockdep_off(void)
-{
-   current->lockdep_recursion++;
-}
-EXPORT_SYMBOL(lockdep_off);
-
-void lockdep_on(void)
-{
-   current->lockdep_recursion--;
-}
-EXPORT_SYMBOL(lockdep_on);
-
 /*
  * Debugging switches:
  */


Re: [PATCH] printk/tracing: Do not trace printk_nmi_enter()

2018-09-07 Thread Peter Zijlstra
On Wed, Sep 05, 2018 at 09:33:34PM -0400, Steven Rostedt wrote:
>   do_idle {
> 
> [interrupts enabled]
> 
>  [interrupts disabled]
>   TRACE_IRQS_OFF [lockdep says irqs off]
>   [...]
>   TRACE_IRQS_IRET
>   test if pt_regs say return to interrupts enabled [yes]
>   TRACE_IRQS_ON [lockdep says irqs are on]
> 
>   
>   nmi_enter() {
>   printk_nmi_enter() [traced by ftrace]
>   [ hit ftrace breakpoint ]
>   
>   TRACE_IRQS_OFF [lockdep says irqs off]
>   [...]
>   TRACE_IRQS_IRET [return from breakpoint]
>  test if pt_regs say interrupts enabled [no]
>  [iret back to interrupt]
>  [iret back to code]
> 
> tick_nohz_idle_enter() {
> 
>   lockdep_assert_irqs_enabled() [lockdep say no!]

Isn't the problem that we muck with the IRQ state from NMI context? We
shouldn't be doing that.

The thing is, since we trace the IRQ state from within IRQ-disable,
since that's the only IRQ-safe option, it is very much not NMI-safe.

Your patch might avoid the symptom, but I don't think it cures the
fundamental problem.


Re: [PATCH] printk/tracing: Do not trace printk_nmi_enter()

2018-09-06 Thread Steven Rostedt
On Thu, 6 Sep 2018 11:31:51 +0900
Sergey Senozhatsky  wrote:

> Hello,
> 
> On (09/05/18 21:33), Steven Rostedt wrote:
> >   do_idle {
> > 
> > [interrupts enabled]
> > 
> >  [interrupts disabled]
> > TRACE_IRQS_OFF [lockdep says irqs off]
> > [...]
> > TRACE_IRQS_IRET
> > test if pt_regs say return to interrupts enabled [yes]
> > TRACE_IRQS_ON [lockdep says irqs are on]
> > 
> > 
> > nmi_enter() {
> > printk_nmi_enter() [traced by ftrace]
> > [ hit ftrace breakpoint ]
> > 
> > TRACE_IRQS_OFF [lockdep says irqs off]
> > [...]
> > TRACE_IRQS_IRET [return from breakpoint]
> >test if pt_regs say interrupts enabled [no]
> >[iret back to interrupt]
> >[iret back to code]
> > 
> > tick_nohz_idle_enter() {
> > 
> > lockdep_assert_irqs_enabled() [lockdep say no!]
> > 
> > Although interrupts are indeed enabled, lockdep thinks it is not, and since
> > we now do asserts via lockdep, it gives a false warning. The issue here is
> > that printk_nmi_enter() is called before lockdep_off(), which disables
> > lockdep (for this reason) in NMIs. By simply not allowing ftrace to see
> > printk_nmi_enter() (via notrace annotation) we keep lockdep from getting
> > confused.  
> 
> Great catch and I like the patch!
> 
> Indeed, with printk_nmi we changed the "everything that nmi_enter does
> should happen after lockdep_off" to "everything that nmi_enter does should
> happen after printk_nmi_enter" // +similar change to nmi_exit.
> 
> An alternative option, thus, could be re-instating back the rule that
> lockdep_off/on should be the first and the last thing we do in
> nmi_enter/nmi_exit. E.g.
> 
> nmi_enter()
>   lockdep_off();
>   printk_nmi_enter();
> 
> nmi_exit()
>   printk_nmi_exit();
>   lockdep_on();
> 
> I guess that we can keep printk_nmi_enter/printk_nmi_exit at the top
> and at the bottom of nmi_enter/nmi_exit correspondingly just in case
> if lockdep_off/lockdep_on sometime in the future starts invoking printk(),
> which would deadlock us. Hence

I thought about doing the move first, but figured it wasn't needed, as
Petr said, it's before ftrace_nmi_enter() and should be notrace
regrardless.

> 
> Acked-by: Sergey Senozhatsky 
> 
> 
> > Cc: sta...@vger.kernel.org
> > Fixes: 099f1c84c0052 ("printk: introduce per-cpu safe_print seq buffer")  
> 
> A minor nitpick,
> 
> printk_nmi was introduced by 42a0bb3f71383b457 ("printk/nmi: generic
> solution for safe printk in NMI") - in that commit we declared a new
> printk_nmi_enter/exit should be the first and the last thing we do in
> nmi_enter/nmi_exit rule.

Thanks I'll update the change log. Grumble, my git blame foo is getting
rusty.

-- Steve


Re: [PATCH] printk/tracing: Do not trace printk_nmi_enter()

2018-09-06 Thread Petr Mladek
On Thu 2018-09-06 11:31:51, Sergey Senozhatsky wrote:
> Hello,
> 
> On (09/05/18 21:33), Steven Rostedt wrote:
> >   do_idle {
> > 
> > [interrupts enabled]
> > 
> >  [interrupts disabled]
> > TRACE_IRQS_OFF [lockdep says irqs off]
> > [...]
> > TRACE_IRQS_IRET
> > test if pt_regs say return to interrupts enabled [yes]
> > TRACE_IRQS_ON [lockdep says irqs are on]
> > 
> > 
> > nmi_enter() {
> > printk_nmi_enter() [traced by ftrace]
> > [ hit ftrace breakpoint ]
> > 
> > TRACE_IRQS_OFF [lockdep says irqs off]
> > [...]
> > TRACE_IRQS_IRET [return from breakpoint]
> >test if pt_regs say interrupts enabled [no]
> >[iret back to interrupt]
> >[iret back to code]
> > 
> > tick_nohz_idle_enter() {
> > 
> > lockdep_assert_irqs_enabled() [lockdep say no!]
> > 
> > Although interrupts are indeed enabled, lockdep thinks it is not, and since
> > we now do asserts via lockdep, it gives a false warning. The issue here is
> > that printk_nmi_enter() is called before lockdep_off(), which disables
> > lockdep (for this reason) in NMIs. By simply not allowing ftrace to see
> > printk_nmi_enter() (via notrace annotation) we keep lockdep from getting
> > confused.
> 
> Great catch and I like the patch!
> 
> Indeed, with printk_nmi we changed the "everything that nmi_enter does
> should happen after lockdep_off" to "everything that nmi_enter does should
> happen after printk_nmi_enter" // +similar change to nmi_exit.
> 
> An alternative option, thus, could be re-instating back the rule that
> lockdep_off/on should be the first and the last thing we do in
> nmi_enter/nmi_exit. E.g.
> 
> nmi_enter()
>   lockdep_off();
>   printk_nmi_enter();
> 
> nmi_exit()
>   printk_nmi_exit();
>   lockdep_on();

Yup, this would help here as well.

> I guess that we can keep printk_nmi_enter/printk_nmi_exit at the top
> and at the bottom of nmi_enter/nmi_exit correspondingly just in case
> if lockdep_off/lockdep_on sometime in the future starts invoking printk(),
> which would deadlock us. Hence

Yes.

In general, it does not looks safe to allow tracing functions
that are called before ftrace_nmi_enter().

> Acked-by: Sergey Senozhatsky 

Therefore I am all for Steven's patch.

Acked-by: Petr Mladek 

Best Regards,
Petr


Re: [PATCH] printk/tracing: Do not trace printk_nmi_enter()

2018-09-05 Thread Sergey Senozhatsky
Hello,

On (09/05/18 21:33), Steven Rostedt wrote:
>   do_idle {
> 
> [interrupts enabled]
> 
>  [interrupts disabled]
>   TRACE_IRQS_OFF [lockdep says irqs off]
>   [...]
>   TRACE_IRQS_IRET
>   test if pt_regs say return to interrupts enabled [yes]
>   TRACE_IRQS_ON [lockdep says irqs are on]
> 
>   
>   nmi_enter() {
>   printk_nmi_enter() [traced by ftrace]
>   [ hit ftrace breakpoint ]
>   
>   TRACE_IRQS_OFF [lockdep says irqs off]
>   [...]
>   TRACE_IRQS_IRET [return from breakpoint]
>  test if pt_regs say interrupts enabled [no]
>  [iret back to interrupt]
>  [iret back to code]
> 
> tick_nohz_idle_enter() {
> 
>   lockdep_assert_irqs_enabled() [lockdep say no!]
> 
> Although interrupts are indeed enabled, lockdep thinks it is not, and since
> we now do asserts via lockdep, it gives a false warning. The issue here is
> that printk_nmi_enter() is called before lockdep_off(), which disables
> lockdep (for this reason) in NMIs. By simply not allowing ftrace to see
> printk_nmi_enter() (via notrace annotation) we keep lockdep from getting
> confused.

Great catch and I like the patch!

Indeed, with printk_nmi we changed the "everything that nmi_enter does
should happen after lockdep_off" to "everything that nmi_enter does should
happen after printk_nmi_enter" // +similar change to nmi_exit.

An alternative option, thus, could be re-instating back the rule that
lockdep_off/on should be the first and the last thing we do in
nmi_enter/nmi_exit. E.g.

nmi_enter()
lockdep_off();
printk_nmi_enter();

nmi_exit()
printk_nmi_exit();
lockdep_on();

I guess that we can keep printk_nmi_enter/printk_nmi_exit at the top
and at the bottom of nmi_enter/nmi_exit correspondingly just in case
if lockdep_off/lockdep_on sometime in the future starts invoking printk(),
which would deadlock us. Hence

Acked-by: Sergey Senozhatsky 


> Cc: sta...@vger.kernel.org
> Fixes: 099f1c84c0052 ("printk: introduce per-cpu safe_print seq buffer")

A minor nitpick,

printk_nmi was introduced by 42a0bb3f71383b457 ("printk/nmi: generic
solution for safe printk in NMI") - in that commit we declared a new
printk_nmi_enter/exit should be the first and the last thing we do in
nmi_enter/nmi_exit rule.

-ss


[PATCH] printk/tracing: Do not trace printk_nmi_enter()

2018-09-05 Thread Steven Rostedt
[
  I'm currently testing this patch, and will push this to mainline,
  I'm  just looking for acks and review bys. Also, Peter, this is the
  bug that my patch that differentiates if lockdep is broken or if the
  code is broken when lockdep_assert_irqs_{dis,en}abled() triggers, and
  would have been helpful.
]

From: "Steven Rostedt (VMware)" 

I hit the following splat in my tests:

[ cut here ]
IRQs not enabled as expected
WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 
tick_nohz_idle_enter+0x44/0x8c
Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
EIP: tick_nohz_idle_enter+0x44/0x8c
Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00
75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff <0f> 0b 58 fa bb a0
e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b
EAX: 001c EBX: f26e7f8c ECX: 0006 EDX: 0007
ESI: f26dd1c0 EDI:  EBP: f26e7f40 ESP: f26e7f38
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296
CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0
Call Trace:
 do_idle+0x33/0x202
 cpu_startup_entry+0x61/0x63
 start_secondary+0x18e/0x1ed
 startup_32_smp+0x164/0x168
irq event stamp: 18773830
hardirqs last  enabled at (18773829): [] 
trace_hardirqs_on_thunk+0xc/0x10
hardirqs last disabled at (18773830): [] 
trace_hardirqs_off_thunk+0xc/0x10
softirqs last  enabled at (18773824): [] __do_softirq+0x25f/0x2bf
softirqs last disabled at (18773767): [] call_on_stack+0x45/0x4b
---[ end trace b7c64aa79e17954a ]---

After a bit of debugging, I found what was happening. This would trigger
when performing "perf" with a high NMI interrupt rate, while enabling and
disabling function tracer. Ftrace uses breakpoints to convert the nops at
the start of functions to calls to the function trampolines. The breakpoint
traps disable interrupts and this makes calls into lockdep via the
trace_hardirqs_off_thunk in the entry.S code. What happens is the following:

  do_idle {

[interrupts enabled]

 [interrupts disabled]
TRACE_IRQS_OFF [lockdep says irqs off]
[...]
TRACE_IRQS_IRET
test if pt_regs say return to interrupts enabled [yes]
TRACE_IRQS_ON [lockdep says irqs are on]


nmi_enter() {
printk_nmi_enter() [traced by ftrace]
[ hit ftrace breakpoint ]

TRACE_IRQS_OFF [lockdep says irqs off]
[...]
TRACE_IRQS_IRET [return from breakpoint]
   test if pt_regs say interrupts enabled [no]
   [iret back to interrupt]
   [iret back to code]

tick_nohz_idle_enter() {

lockdep_assert_irqs_enabled() [lockdep say no!]

Although interrupts are indeed enabled, lockdep thinks it is not, and since
we now do asserts via lockdep, it gives a false warning. The issue here is
that printk_nmi_enter() is called before lockdep_off(), which disables
lockdep (for this reason) in NMIs. By simply not allowing ftrace to see
printk_nmi_enter() (via notrace annotation) we keep lockdep from getting
confused.

Cc: sta...@vger.kernel.org
Fixes: 099f1c84c0052 ("printk: introduce per-cpu safe_print seq buffer")
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/printk/printk_safe.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index a0a74c533e4b..0913b4d385de 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -306,12 +306,12 @@ static __printf(1, 0) int vprintk_nmi(const char *fmt, 
va_list args)
return printk_safe_log_store(s, fmt, args);
 }
 
-void printk_nmi_enter(void)
+void notrace printk_nmi_enter(void)
 {
this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
 }
 
-void printk_nmi_exit(void)
+void notrace printk_nmi_exit(void)
 {
this_cpu_and(printk_context, ~PRINTK_NMI_CONTEXT_MASK);
 }
-- 
2.13.6