Re: console: lockup on boot

2014-07-08 Thread Peter Zijlstra
On Thu, Jun 12, 2014 at 10:54:22AM +0200, Mike Galbraith wrote:
> On Thu, 2014-06-12 at 10:26 +0200, Jan Kara wrote: 
> > On Wed 11-06-14 23:07:04, Sasha Levin wrote:
> 
> > > The first patch fixed it (I assumed that there's no need to try the 
> > > second).
> >   Good. So that shows that it is the increased lockdep coverage which is
> > causing problems - with my patch, lockdep is able to identify some problem
> > because console drivers are now called with lockdep enabled. But because
> > the problem was found in some difficult context, lockdep just hung the
> > machine when trying to report it... Sadly the stacktraces you posted don't
> > tell us what lockdep found.
> > 
> > Adding Peter Zijlstra to CC. Peter, any idea how lockdep could report
> > problems when holding logbuf_lock? One possibility would be to extend
> > logbuf_cpu recursion logic to every holder of logbuf_lock. That will at
> > least avoid the spinlock recursion killing the machine but we won't be able
> > to see what lockdep found...
> 
> Could tell lockdep to use trace_printk().

lkml.kernel.org/r/2011122143.401184...@chello.nl




pgpLuUYbI0MSR.pgp
Description: PGP signature


Re: console: lockup on boot

2014-07-08 Thread Peter Zijlstra
On Thu, Jun 12, 2014 at 10:54:22AM +0200, Mike Galbraith wrote:
 On Thu, 2014-06-12 at 10:26 +0200, Jan Kara wrote: 
  On Wed 11-06-14 23:07:04, Sasha Levin wrote:
 
   The first patch fixed it (I assumed that there's no need to try the 
   second).
Good. So that shows that it is the increased lockdep coverage which is
  causing problems - with my patch, lockdep is able to identify some problem
  because console drivers are now called with lockdep enabled. But because
  the problem was found in some difficult context, lockdep just hung the
  machine when trying to report it... Sadly the stacktraces you posted don't
  tell us what lockdep found.
  
  Adding Peter Zijlstra to CC. Peter, any idea how lockdep could report
  problems when holding logbuf_lock? One possibility would be to extend
  logbuf_cpu recursion logic to every holder of logbuf_lock. That will at
  least avoid the spinlock recursion killing the machine but we won't be able
  to see what lockdep found...
 
 Could tell lockdep to use trace_printk().

lkml.kernel.org/r/2011122143.401184...@chello.nl




pgpLuUYbI0MSR.pgp
Description: PGP signature


Re: console: lockup on boot

2014-06-19 Thread Jan Kara
On Wed 11-06-14 23:07:04, Sasha Levin wrote:
> On 06/11/2014 05:31 PM, Jan Kara wrote:
> > On Wed 11-06-14 22:34:36, Jan Kara wrote:
> >> > On Wed 11-06-14 10:55:55, Sasha Levin wrote:
> >>> > > On 06/10/2014 11:59 AM, Peter Hurley wrote:
>  > > > On 06/06/2014 03:05 PM, Sasha Levin wrote:
> > > > >> On 05/30/2014 10:07 AM, Jan Kara wrote:
> >> > > >>> On Fri 30-05-14 09:58:14, Peter Hurley wrote:
>  > > > On 05/30/2014 09:11 AM, Sasha Levin wrote:
> >> > > >>> Hi all,
> >> > > >>>
> >> > > >>> I sometime see lockups when booting my KVM guest with 
> >> > > >>> the latest -next kernel,
> >> > > >>> it basically hangs right when it should start 'init', 
> >> > > >>> and after a while I get
> >> > > >>> the following spew:
> >> > > >>>
> >> > > >>> [   30.790833] BUG: spinlock lockup suspected on 
> >> > > >>> CPU#1, swapper/1/0
>  > > >
>  > > > Maybe related to this report: 
>  > > > https://lkml.org/lkml/2014/5/30/26
>  > > > from Jet Chen which was bisected to
>  > > >
>  > > > commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
>  > > > Author: Jan Kara 
>  > > > AuthorDate: Thu May 22 10:43:35 2014 +1000
>  > > > Commit: Stephen Rothwell 
>  > > > CommitDate: Thu May 22 10:43:35 2014 +1000
>  > > >
>  > > >  printk: enable interrupts before calling 
>  > > > console_trylock_for_printk()
>  > > >  We need interrupts disabled when calling 
>  > > > console_trylock_for_printk() only
>  > > >  so that cpu id we pass to can_use_console() remains 
>  > > > valid (for other
>  > > >  things console_sem provides all the exclusion we need 
>  > > > and deadlocks on
>  > > >  console_sem due to interrupts are impossible because 
>  > > > we use
>  > > >  down_trylock()).  However if we are rescheduled, we 
>  > > > are guaranteed to run
>  > > >  on an online cpu so we can easily just get the cpu id 
>  > > > in
>  > > >  can_use_console().
>  > > >  We can lose a bit of performance when we enable 
>  > > > interrupts in
>  > > >  vprintk_emit() and then disable them again in 
>  > > > console_unlock() but OTOH it
>  > > >  can somewhat reduce interrupt latency caused by 
>  > > > console_unlock()
>  > > >  especially since later in the patch series we will 
>  > > > want to spin on
>  > > >  console_sem in console_trylock_for_printk().
>  > > >  Signed-off-by: Jan Kara 
>  > > >  Signed-off-by: Andrew Morton 
>  > > > 
>  > > >
>  > > > ?
> >> > > >>>Yeah, very likely. I think I see the problem, I'll send the 
> >> > > >>> fix shortly.
> > > > >>
> > > > >> Hi Jan,
> > > > >>
> > > > >> It seems that the issue I'm seeing is different from the 
> > > > >> "[prink]  BUG: spinlock
> > > > >> lockup suspected on CPU#0, swapper/1".
> > > > >>
> > > > >> Is there anything else I could try here? The issue is very 
> > > > >> common during testing.
>  > > > 
>  > > > Sasha,
>  > > > 
>  > > > Is this bisectable? Maybe that's the best way forward here.
> >>> > > 
> >>> > > I've ran a bisection again and ended up at the same commit as Jet Chen
> >>> > > (the commit unfortunately already made it to Linus's tree).
> >>> > > 
> >>> > > Note that I did try Jan's proposed fix and that didn't solve the issue
> >>> > > for me, I believe we're seeing different issues caused by the same
> >>> > > commit.
> >> >   Sorry it has been busy time lately and I didn't have as much time to 
> >> > look
> >> > into this as would be needed.
> >   Oops, pressed send too early... So I have two debug patches for you. Can
> > you try whether the problem reproduces with the first one or with both of
> > them applied?
> 
> The first patch fixed it (I assumed that there's no need to try the second).
  OK, so I got back to it. Can you try with attached patch and without my
"fix"? lockdep should dump complaints using trace buffer (so you should
check it from time to time) instead of using printk and thus we should be
able to see what it is complaining about.  Thanks!

Honza
-- 
Jan Kara 
SUSE Labs, CR
>From 93930c92c664d8b834e1e04b92c00023626fc07a Mon Sep 17 00:00:00 2001
From: Jan Kara 
Date: Thu, 19 Jun 2014 15:36:09 +0200
Subject: [PATCH] lockdep: Dump info via tracing

Signed-off-by: Jan Kara 
---
 kernel/locking/lockdep.c | 707 

Re: console: lockup on boot

2014-06-19 Thread Jan Kara
On Wed 11-06-14 23:07:04, Sasha Levin wrote:
 On 06/11/2014 05:31 PM, Jan Kara wrote:
  On Wed 11-06-14 22:34:36, Jan Kara wrote:
   On Wed 11-06-14 10:55:55, Sasha Levin wrote:
On 06/10/2014 11:59 AM, Peter Hurley wrote:
 On 06/06/2014 03:05 PM, Sasha Levin wrote:
 On 05/30/2014 10:07 AM, Jan Kara wrote:
 On Fri 30-05-14 09:58:14, Peter Hurley wrote:
 On 05/30/2014 09:11 AM, Sasha Levin wrote:
 Hi all,

 I sometime see lockups when booting my KVM guest with 
 the latest -next kernel,
 it basically hangs right when it should start 'init', 
 and after a while I get
 the following spew:

 [   30.790833] BUG: spinlock lockup suspected on 
 CPU#1, swapper/1/0

 Maybe related to this report: 
 https://lkml.org/lkml/2014/5/30/26
 from Jet Chen which was bisected to

 commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
 Author: Jan Kara j...@suse.cz
 AuthorDate: Thu May 22 10:43:35 2014 +1000
 Commit: Stephen Rothwell s...@canb.auug.org.au
 CommitDate: Thu May 22 10:43:35 2014 +1000

  printk: enable interrupts before calling 
 console_trylock_for_printk()
  We need interrupts disabled when calling 
 console_trylock_for_printk() only
  so that cpu id we pass to can_use_console() remains 
 valid (for other
  things console_sem provides all the exclusion we need 
 and deadlocks on
  console_sem due to interrupts are impossible because 
 we use
  down_trylock()).  However if we are rescheduled, we 
 are guaranteed to run
  on an online cpu so we can easily just get the cpu id 
 in
  can_use_console().
  We can lose a bit of performance when we enable 
 interrupts in
  vprintk_emit() and then disable them again in 
 console_unlock() but OTOH it
  can somewhat reduce interrupt latency caused by 
 console_unlock()
  especially since later in the patch series we will 
 want to spin on
  console_sem in console_trylock_for_printk().
  Signed-off-by: Jan Kara j...@suse.cz
  Signed-off-by: Andrew Morton 
 a...@linux-foundation.org

 ?
Yeah, very likely. I think I see the problem, I'll send the 
 fix shortly.

 Hi Jan,

 It seems that the issue I'm seeing is different from the 
 [prink]  BUG: spinlock
 lockup suspected on CPU#0, swapper/1.

 Is there anything else I could try here? The issue is very 
 common during testing.
 
 Sasha,
 
 Is this bisectable? Maybe that's the best way forward here.

I've ran a bisection again and ended up at the same commit as Jet Chen
(the commit unfortunately already made it to Linus's tree).

Note that I did try Jan's proposed fix and that didn't solve the issue
for me, I believe we're seeing different issues caused by the same
commit.
 Sorry it has been busy time lately and I didn't have as much time to 
   look
   into this as would be needed.
Oops, pressed send too early... So I have two debug patches for you. Can
  you try whether the problem reproduces with the first one or with both of
  them applied?
 
 The first patch fixed it (I assumed that there's no need to try the second).
  OK, so I got back to it. Can you try with attached patch and without my
fix? lockdep should dump complaints using trace buffer (so you should
check it from time to time) instead of using printk and thus we should be
able to see what it is complaining about.  Thanks!

Honza
-- 
Jan Kara j...@suse.cz
SUSE Labs, CR
From 93930c92c664d8b834e1e04b92c00023626fc07a Mon Sep 17 00:00:00 2001
From: Jan Kara j...@suse.cz
Date: Thu, 19 Jun 2014 15:36:09 +0200
Subject: [PATCH] lockdep: Dump info via tracing

Signed-off-by: Jan Kara j...@suse.cz
---
 kernel/locking/lockdep.c | 707 +++
 1 file changed, 402 insertions(+), 305 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index d24e4339b46d..b15e7dec55f6 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -77,6 +77,26 @@ module_param(lock_stat, int, 0644);
  */
 static arch_spinlock_t lockdep_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
 
+static void sprint_ip_sym(char *buf, unsigned long ip)
+{
+	sprintf(buf, [%p] %pS\n, (void *) ip, (void *) ip);
+}
+
+static void trace_print_stack_trace(struct stack_trace *trace, int spaces)
+{
+	int i, n;
+	char buf[256];
+
+	if (!trace-entries)
+		return;
+
+	for (i = 0; i  trace-nr_entries; i++) {
+		n = sprintf(buf, %*c, 1 + spaces, ' ');
+		sprint_ip_sym(buf + n, trace-entries[i]);
+		trace_printk(buf);
+	}
+}
+
 static int graph_lock(void)
 {
 	arch_spin_lock(lockdep_lock);
@@ -382,9 +402,9 @@ static unsigned long stack_trace[MAX_STACK_TRACE_ENTRIES];
 
 static void 

Re: console: lockup on boot

2014-06-12 Thread Mike Galbraith
On Thu, 2014-06-12 at 10:26 +0200, Jan Kara wrote: 
> On Wed 11-06-14 23:07:04, Sasha Levin wrote:

> > The first patch fixed it (I assumed that there's no need to try the second).
>   Good. So that shows that it is the increased lockdep coverage which is
> causing problems - with my patch, lockdep is able to identify some problem
> because console drivers are now called with lockdep enabled. But because
> the problem was found in some difficult context, lockdep just hung the
> machine when trying to report it... Sadly the stacktraces you posted don't
> tell us what lockdep found.
> 
> Adding Peter Zijlstra to CC. Peter, any idea how lockdep could report
> problems when holding logbuf_lock? One possibility would be to extend
> logbuf_cpu recursion logic to every holder of logbuf_lock. That will at
> least avoid the spinlock recursion killing the machine but we won't be able
> to see what lockdep found...

Could tell lockdep to use trace_printk().

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: console: lockup on boot

2014-06-12 Thread Jan Kara
On Wed 11-06-14 23:07:04, Sasha Levin wrote:
> On 06/11/2014 05:31 PM, Jan Kara wrote:
> > On Wed 11-06-14 22:34:36, Jan Kara wrote:
> >> > On Wed 11-06-14 10:55:55, Sasha Levin wrote:
> >>> > > On 06/10/2014 11:59 AM, Peter Hurley wrote:
>  > > > On 06/06/2014 03:05 PM, Sasha Levin wrote:
> > > > >> On 05/30/2014 10:07 AM, Jan Kara wrote:
> >> > > >>> On Fri 30-05-14 09:58:14, Peter Hurley wrote:
>  > > > On 05/30/2014 09:11 AM, Sasha Levin wrote:
> >> > > >>> Hi all,
> >> > > >>>
> >> > > >>> I sometime see lockups when booting my KVM guest with 
> >> > > >>> the latest -next kernel,
> >> > > >>> it basically hangs right when it should start 'init', 
> >> > > >>> and after a while I get
> >> > > >>> the following spew:
> >> > > >>>
> >> > > >>> [   30.790833] BUG: spinlock lockup suspected on 
> >> > > >>> CPU#1, swapper/1/0
>  > > >
>  > > > Maybe related to this report: 
>  > > > https://lkml.org/lkml/2014/5/30/26
>  > > > from Jet Chen which was bisected to
>  > > >
>  > > > commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
>  > > > Author: Jan Kara 
>  > > > AuthorDate: Thu May 22 10:43:35 2014 +1000
>  > > > Commit: Stephen Rothwell 
>  > > > CommitDate: Thu May 22 10:43:35 2014 +1000
>  > > >
>  > > >  printk: enable interrupts before calling 
>  > > > console_trylock_for_printk()
>  > > >  We need interrupts disabled when calling 
>  > > > console_trylock_for_printk() only
>  > > >  so that cpu id we pass to can_use_console() remains 
>  > > > valid (for other
>  > > >  things console_sem provides all the exclusion we need 
>  > > > and deadlocks on
>  > > >  console_sem due to interrupts are impossible because 
>  > > > we use
>  > > >  down_trylock()).  However if we are rescheduled, we 
>  > > > are guaranteed to run
>  > > >  on an online cpu so we can easily just get the cpu id 
>  > > > in
>  > > >  can_use_console().
>  > > >  We can lose a bit of performance when we enable 
>  > > > interrupts in
>  > > >  vprintk_emit() and then disable them again in 
>  > > > console_unlock() but OTOH it
>  > > >  can somewhat reduce interrupt latency caused by 
>  > > > console_unlock()
>  > > >  especially since later in the patch series we will 
>  > > > want to spin on
>  > > >  console_sem in console_trylock_for_printk().
>  > > >  Signed-off-by: Jan Kara 
>  > > >  Signed-off-by: Andrew Morton 
>  > > > 
>  > > >
>  > > > ?
> >> > > >>>Yeah, very likely. I think I see the problem, I'll send the 
> >> > > >>> fix shortly.
> > > > >>
> > > > >> Hi Jan,
> > > > >>
> > > > >> It seems that the issue I'm seeing is different from the 
> > > > >> "[prink]  BUG: spinlock
> > > > >> lockup suspected on CPU#0, swapper/1".
> > > > >>
> > > > >> Is there anything else I could try here? The issue is very 
> > > > >> common during testing.
>  > > > 
>  > > > Sasha,
>  > > > 
>  > > > Is this bisectable? Maybe that's the best way forward here.
> >>> > > 
> >>> > > I've ran a bisection again and ended up at the same commit as Jet Chen
> >>> > > (the commit unfortunately already made it to Linus's tree).
> >>> > > 
> >>> > > Note that I did try Jan's proposed fix and that didn't solve the issue
> >>> > > for me, I believe we're seeing different issues caused by the same
> >>> > > commit.
> >> >   Sorry it has been busy time lately and I didn't have as much time to 
> >> > look
> >> > into this as would be needed.
> >   Oops, pressed send too early... So I have two debug patches for you. Can
> > you try whether the problem reproduces with the first one or with both of
> > them applied?
> 
> The first patch fixed it (I assumed that there's no need to try the second).
  Good. So that shows that it is the increased lockdep coverage which is
causing problems - with my patch, lockdep is able to identify some problem
because console drivers are now called with lockdep enabled. But because
the problem was found in some difficult context, lockdep just hung the
machine when trying to report it... Sadly the stacktraces you posted don't
tell us what lockdep found.

Adding Peter Zijlstra to CC. Peter, any idea how lockdep could report
problems when holding logbuf_lock? One possibility would be to extend
logbuf_cpu recursion logic to every holder of logbuf_lock. That will at

Re: console: lockup on boot

2014-06-12 Thread Jan Kara
On Wed 11-06-14 23:07:04, Sasha Levin wrote:
 On 06/11/2014 05:31 PM, Jan Kara wrote:
  On Wed 11-06-14 22:34:36, Jan Kara wrote:
   On Wed 11-06-14 10:55:55, Sasha Levin wrote:
On 06/10/2014 11:59 AM, Peter Hurley wrote:
 On 06/06/2014 03:05 PM, Sasha Levin wrote:
 On 05/30/2014 10:07 AM, Jan Kara wrote:
 On Fri 30-05-14 09:58:14, Peter Hurley wrote:
 On 05/30/2014 09:11 AM, Sasha Levin wrote:
 Hi all,

 I sometime see lockups when booting my KVM guest with 
 the latest -next kernel,
 it basically hangs right when it should start 'init', 
 and after a while I get
 the following spew:

 [   30.790833] BUG: spinlock lockup suspected on 
 CPU#1, swapper/1/0

 Maybe related to this report: 
 https://lkml.org/lkml/2014/5/30/26
 from Jet Chen which was bisected to

 commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
 Author: Jan Kara j...@suse.cz
 AuthorDate: Thu May 22 10:43:35 2014 +1000
 Commit: Stephen Rothwell s...@canb.auug.org.au
 CommitDate: Thu May 22 10:43:35 2014 +1000

  printk: enable interrupts before calling 
 console_trylock_for_printk()
  We need interrupts disabled when calling 
 console_trylock_for_printk() only
  so that cpu id we pass to can_use_console() remains 
 valid (for other
  things console_sem provides all the exclusion we need 
 and deadlocks on
  console_sem due to interrupts are impossible because 
 we use
  down_trylock()).  However if we are rescheduled, we 
 are guaranteed to run
  on an online cpu so we can easily just get the cpu id 
 in
  can_use_console().
  We can lose a bit of performance when we enable 
 interrupts in
  vprintk_emit() and then disable them again in 
 console_unlock() but OTOH it
  can somewhat reduce interrupt latency caused by 
 console_unlock()
  especially since later in the patch series we will 
 want to spin on
  console_sem in console_trylock_for_printk().
  Signed-off-by: Jan Kara j...@suse.cz
  Signed-off-by: Andrew Morton 
 a...@linux-foundation.org

 ?
Yeah, very likely. I think I see the problem, I'll send the 
 fix shortly.

 Hi Jan,

 It seems that the issue I'm seeing is different from the 
 [prink]  BUG: spinlock
 lockup suspected on CPU#0, swapper/1.

 Is there anything else I could try here? The issue is very 
 common during testing.
 
 Sasha,
 
 Is this bisectable? Maybe that's the best way forward here.

I've ran a bisection again and ended up at the same commit as Jet Chen
(the commit unfortunately already made it to Linus's tree).

Note that I did try Jan's proposed fix and that didn't solve the issue
for me, I believe we're seeing different issues caused by the same
commit.
 Sorry it has been busy time lately and I didn't have as much time to 
   look
   into this as would be needed.
Oops, pressed send too early... So I have two debug patches for you. Can
  you try whether the problem reproduces with the first one or with both of
  them applied?
 
 The first patch fixed it (I assumed that there's no need to try the second).
  Good. So that shows that it is the increased lockdep coverage which is
causing problems - with my patch, lockdep is able to identify some problem
because console drivers are now called with lockdep enabled. But because
the problem was found in some difficult context, lockdep just hung the
machine when trying to report it... Sadly the stacktraces you posted don't
tell us what lockdep found.

Adding Peter Zijlstra to CC. Peter, any idea how lockdep could report
problems when holding logbuf_lock? One possibility would be to extend
logbuf_cpu recursion logic to every holder of logbuf_lock. That will at
least avoid the spinlock recursion killing the machine but we won't be able
to see what lockdep found...

Honza
-- 
Jan Kara j...@suse.cz
SUSE Labs, CR
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: console: lockup on boot

2014-06-12 Thread Mike Galbraith
On Thu, 2014-06-12 at 10:26 +0200, Jan Kara wrote: 
 On Wed 11-06-14 23:07:04, Sasha Levin wrote:

  The first patch fixed it (I assumed that there's no need to try the second).
   Good. So that shows that it is the increased lockdep coverage which is
 causing problems - with my patch, lockdep is able to identify some problem
 because console drivers are now called with lockdep enabled. But because
 the problem was found in some difficult context, lockdep just hung the
 machine when trying to report it... Sadly the stacktraces you posted don't
 tell us what lockdep found.
 
 Adding Peter Zijlstra to CC. Peter, any idea how lockdep could report
 problems when holding logbuf_lock? One possibility would be to extend
 logbuf_cpu recursion logic to every holder of logbuf_lock. That will at
 least avoid the spinlock recursion killing the machine but we won't be able
 to see what lockdep found...

Could tell lockdep to use trace_printk().

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: console: lockup on boot

2014-06-11 Thread Sasha Levin
On 06/11/2014 05:31 PM, Jan Kara wrote:
> On Wed 11-06-14 22:34:36, Jan Kara wrote:
>> > On Wed 11-06-14 10:55:55, Sasha Levin wrote:
>>> > > On 06/10/2014 11:59 AM, Peter Hurley wrote:
 > > > On 06/06/2014 03:05 PM, Sasha Levin wrote:
> > > >> On 05/30/2014 10:07 AM, Jan Kara wrote:
>> > > >>> On Fri 30-05-14 09:58:14, Peter Hurley wrote:
 > > > On 05/30/2014 09:11 AM, Sasha Levin wrote:
>> > > >>> Hi all,
>> > > >>>
>> > > >>> I sometime see lockups when booting my KVM guest with 
>> > > >>> the latest -next kernel,
>> > > >>> it basically hangs right when it should start 'init', 
>> > > >>> and after a while I get
>> > > >>> the following spew:
>> > > >>>
>> > > >>> [   30.790833] BUG: spinlock lockup suspected on CPU#1, 
>> > > >>> swapper/1/0
 > > >
 > > > Maybe related to this report: 
 > > > https://lkml.org/lkml/2014/5/30/26
 > > > from Jet Chen which was bisected to
 > > >
 > > > commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
 > > > Author: Jan Kara 
 > > > AuthorDate: Thu May 22 10:43:35 2014 +1000
 > > > Commit: Stephen Rothwell 
 > > > CommitDate: Thu May 22 10:43:35 2014 +1000
 > > >
 > > >  printk: enable interrupts before calling 
 > > > console_trylock_for_printk()
 > > >  We need interrupts disabled when calling 
 > > > console_trylock_for_printk() only
 > > >  so that cpu id we pass to can_use_console() remains 
 > > > valid (for other
 > > >  things console_sem provides all the exclusion we need 
 > > > and deadlocks on
 > > >  console_sem due to interrupts are impossible because we 
 > > > use
 > > >  down_trylock()).  However if we are rescheduled, we are 
 > > > guaranteed to run
 > > >  on an online cpu so we can easily just get the cpu id in
 > > >  can_use_console().
 > > >  We can lose a bit of performance when we enable 
 > > > interrupts in
 > > >  vprintk_emit() and then disable them again in 
 > > > console_unlock() but OTOH it
 > > >  can somewhat reduce interrupt latency caused by 
 > > > console_unlock()
 > > >  especially since later in the patch series we will want 
 > > > to spin on
 > > >  console_sem in console_trylock_for_printk().
 > > >  Signed-off-by: Jan Kara 
 > > >  Signed-off-by: Andrew Morton 
 > > >
 > > > ?
>> > > >>>Yeah, very likely. I think I see the problem, I'll send the 
>> > > >>> fix shortly.
> > > >>
> > > >> Hi Jan,
> > > >>
> > > >> It seems that the issue I'm seeing is different from the "[prink]  
> > > >> BUG: spinlock
> > > >> lockup suspected on CPU#0, swapper/1".
> > > >>
> > > >> Is there anything else I could try here? The issue is very common 
> > > >> during testing.
 > > > 
 > > > Sasha,
 > > > 
 > > > Is this bisectable? Maybe that's the best way forward here.
>>> > > 
>>> > > I've ran a bisection again and ended up at the same commit as Jet Chen
>>> > > (the commit unfortunately already made it to Linus's tree).
>>> > > 
>>> > > Note that I did try Jan's proposed fix and that didn't solve the issue
>>> > > for me, I believe we're seeing different issues caused by the same
>>> > > commit.
>> >   Sorry it has been busy time lately and I didn't have as much time to look
>> > into this as would be needed.
>   Oops, pressed send too early... So I have two debug patches for you. Can
> you try whether the problem reproduces with the first one or with both of
> them applied?

The first patch fixed it (I assumed that there's no need to try the second).


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: console: lockup on boot

2014-06-11 Thread Jan Kara
On Wed 11-06-14 22:34:36, Jan Kara wrote:
> On Wed 11-06-14 10:55:55, Sasha Levin wrote:
> > On 06/10/2014 11:59 AM, Peter Hurley wrote:
> > > On 06/06/2014 03:05 PM, Sasha Levin wrote:
> > >> On 05/30/2014 10:07 AM, Jan Kara wrote:
> > >>> On Fri 30-05-14 09:58:14, Peter Hurley wrote:
> > > On 05/30/2014 09:11 AM, Sasha Levin wrote:
> > >>> Hi all,
> > >>>
> > >>> I sometime see lockups when booting my KVM guest with the latest 
> > >>> -next kernel,
> > >>> it basically hangs right when it should start 'init', and after a 
> > >>> while I get
> > >>> the following spew:
> > >>>
> > >>> [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
> > >
> > > Maybe related to this report: https://lkml.org/lkml/2014/5/30/26
> > > from Jet Chen which was bisected to
> > >
> > > commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
> > > Author: Jan Kara 
> > > AuthorDate: Thu May 22 10:43:35 2014 +1000
> > > Commit: Stephen Rothwell 
> > > CommitDate: Thu May 22 10:43:35 2014 +1000
> > >
> > >  printk: enable interrupts before calling 
> > > console_trylock_for_printk()
> > >  We need interrupts disabled when calling 
> > > console_trylock_for_printk() only
> > >  so that cpu id we pass to can_use_console() remains valid (for 
> > > other
> > >  things console_sem provides all the exclusion we need and 
> > > deadlocks on
> > >  console_sem due to interrupts are impossible because we use
> > >  down_trylock()).  However if we are rescheduled, we are 
> > > guaranteed to run
> > >  on an online cpu so we can easily just get the cpu id in
> > >  can_use_console().
> > >  We can lose a bit of performance when we enable interrupts in
> > >  vprintk_emit() and then disable them again in console_unlock() 
> > > but OTOH it
> > >  can somewhat reduce interrupt latency caused by console_unlock()
> > >  especially since later in the patch series we will want to spin 
> > > on
> > >  console_sem in console_trylock_for_printk().
> > >  Signed-off-by: Jan Kara 
> > >  Signed-off-by: Andrew Morton 
> > >
> > > ?
> > >>>Yeah, very likely. I think I see the problem, I'll send the fix 
> > >>> shortly.
> > >>
> > >> Hi Jan,
> > >>
> > >> It seems that the issue I'm seeing is different from the "[prink]  BUG: 
> > >> spinlock
> > >> lockup suspected on CPU#0, swapper/1".
> > >>
> > >> Is there anything else I could try here? The issue is very common during 
> > >> testing.
> > > 
> > > Sasha,
> > > 
> > > Is this bisectable? Maybe that's the best way forward here.
> > 
> > I've ran a bisection again and ended up at the same commit as Jet Chen
> > (the commit unfortunately already made it to Linus's tree).
> > 
> > Note that I did try Jan's proposed fix and that didn't solve the issue
> > for me, I believe we're seeing different issues caused by the same
> > commit.
>   Sorry it has been busy time lately and I didn't have as much time to look
> into this as would be needed.
  Oops, pressed send too early... So I have two debug patches for you. Can
you try whether the problem reproduces with the first one or with both of
them applied?

The stacktrace from spinlock lockup is somewhat weird - the spinlock lockup
message comes from NMI context where we tried to do print stack dump.
Indeed calling printk from NMI context *can* lead to this sort of lockups
(and the current process is a worker thread which apparently runs
fb_flashcursor() function which calls console_unlock() which can hold
logbuf_lock so all these things fit together).  What I really fail to see
is how my patch makes the problem happen to you pretty reliably.

Another somewhat strange thing to me is that although lock->owner in your
dump looks fine and points to currently running process, lock->owner_cpu is
-1 which is a combination that shouldn't ever happen as I'm looking into
spinlock debug code.

So for now I'm still puzzled.
Honza

-- 
Jan Kara 
SUSE Labs, CR
>From bf739a5af7a73b08b3dea0491d38816f686dbf57 Mon Sep 17 00:00:00 2001
From: Jan Kara 
Date: Wed, 11 Jun 2014 22:33:23 +0200
Subject: [PATCH 1/2] printk: Debug patch1

Signed-off-by: Jan Kara 
---
 kernel/printk/printk.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 5ea6b148a1a6..da94dc18b6fa 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1649,6 +1649,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	 * console_sem which would prevent anyone from printing to console
 	 */
 	preempt_disable();
+	lockdep_off();
 	/*
 	 * Try to acquire and then immediately release the console semaphore.
 	 * The release will print out buffers and wake up /dev/kmsg and syslog()
@@ -1656,6 +1657,7 @@ asmlinkage 

Re: console: lockup on boot

2014-06-11 Thread Jan Kara
On Wed 11-06-14 10:55:55, Sasha Levin wrote:
> On 06/10/2014 11:59 AM, Peter Hurley wrote:
> > On 06/06/2014 03:05 PM, Sasha Levin wrote:
> >> On 05/30/2014 10:07 AM, Jan Kara wrote:
> >>> On Fri 30-05-14 09:58:14, Peter Hurley wrote:
> > On 05/30/2014 09:11 AM, Sasha Levin wrote:
> >>> Hi all,
> >>>
> >>> I sometime see lockups when booting my KVM guest with the latest 
> >>> -next kernel,
> >>> it basically hangs right when it should start 'init', and after a 
> >>> while I get
> >>> the following spew:
> >>>
> >>> [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
> >
> > Maybe related to this report: https://lkml.org/lkml/2014/5/30/26
> > from Jet Chen which was bisected to
> >
> > commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
> > Author: Jan Kara 
> > AuthorDate: Thu May 22 10:43:35 2014 +1000
> > Commit: Stephen Rothwell 
> > CommitDate: Thu May 22 10:43:35 2014 +1000
> >
> >  printk: enable interrupts before calling 
> > console_trylock_for_printk()
> >  We need interrupts disabled when calling 
> > console_trylock_for_printk() only
> >  so that cpu id we pass to can_use_console() remains valid (for 
> > other
> >  things console_sem provides all the exclusion we need and 
> > deadlocks on
> >  console_sem due to interrupts are impossible because we use
> >  down_trylock()).  However if we are rescheduled, we are guaranteed 
> > to run
> >  on an online cpu so we can easily just get the cpu id in
> >  can_use_console().
> >  We can lose a bit of performance when we enable interrupts in
> >  vprintk_emit() and then disable them again in console_unlock() but 
> > OTOH it
> >  can somewhat reduce interrupt latency caused by console_unlock()
> >  especially since later in the patch series we will want to spin on
> >  console_sem in console_trylock_for_printk().
> >  Signed-off-by: Jan Kara 
> >  Signed-off-by: Andrew Morton 
> >
> > ?
> >>>Yeah, very likely. I think I see the problem, I'll send the fix 
> >>> shortly.
> >>
> >> Hi Jan,
> >>
> >> It seems that the issue I'm seeing is different from the "[prink]  BUG: 
> >> spinlock
> >> lockup suspected on CPU#0, swapper/1".
> >>
> >> Is there anything else I could try here? The issue is very common during 
> >> testing.
> > 
> > Sasha,
> > 
> > Is this bisectable? Maybe that's the best way forward here.
> 
> I've ran a bisection again and ended up at the same commit as Jet Chen
> (the commit unfortunately already made it to Linus's tree).
> 
> Note that I did try Jan's proposed fix and that didn't solve the issue
> for me, I believe we're seeing different issues caused by the same
> commit.
  Sorry it has been busy time lately and I didn't have as much time to look
into this as would be needed.
> 
> 
> 939f04bec1a4ef6ba4370b0f34b01decc844b1b1 is the first bad commit
> commit 939f04bec1a4ef6ba4370b0f34b01decc844b1b1
> Author: Jan Kara 
> Date:   Wed Jun 4 16:11:37 2014 -0700
> 
> printk: enable interrupts before calling console_trylock_for_printk()
> 
> We need interrupts disabled when calling console_trylock_for_printk()
> only so that cpu id we pass to can_use_console() remains valid (for
> other things console_sem provides all the exclusion we need and
> deadlocks on console_sem due to interrupts are impossible because we use
> down_trylock()).  However if we are rescheduled, we are guaranteed to
> run on an online cpu so we can easily just get the cpu id in
> can_use_console().
> 
> We can lose a bit of performance when we enable interrupts in
> vprintk_emit() and then disable them again in console_unlock() but OTOH
> it can somewhat reduce interrupt latency caused by console_unlock()
> especially since later in the patch series we will want to spin on
> console_sem in console_trylock_for_printk().
> 
> Signed-off-by: Jan Kara 
> Signed-off-by: Andrew Morton 
> Signed-off-by: Linus Torvalds 
> 
> 
> Thanks,
> Sasha
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: console: lockup on boot

2014-06-11 Thread Jan Kara
On Wed 11-06-14 11:34:28, Peter Hurley wrote:
> On 06/11/2014 10:55 AM, Sasha Levin wrote:
> >On 06/10/2014 11:59 AM, Peter Hurley wrote:
> >>On 06/06/2014 03:05 PM, Sasha Levin wrote:
> >>>On 05/30/2014 10:07 AM, Jan Kara wrote:
> On Fri 30-05-14 09:58:14, Peter Hurley wrote:
> >>On 05/30/2014 09:11 AM, Sasha Levin wrote:
> Hi all,
> 
> I sometime see lockups when booting my KVM guest with the latest 
> -next kernel,
> it basically hangs right when it should start 'init', and after a 
> while I get
> the following spew:
> 
> [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
> >>
> >>Maybe related to this report: https://lkml.org/lkml/2014/5/30/26
> >>from Jet Chen which was bisected to
> >>
> >>commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
> >>Author: Jan Kara 
> >>AuthorDate: Thu May 22 10:43:35 2014 +1000
> >>Commit: Stephen Rothwell 
> >>CommitDate: Thu May 22 10:43:35 2014 +1000
> >>
> >>  printk: enable interrupts before calling 
> >> console_trylock_for_printk()
> >>  We need interrupts disabled when calling 
> >> console_trylock_for_printk() only
> >>  so that cpu id we pass to can_use_console() remains valid (for 
> >> other
> >>  things console_sem provides all the exclusion we need and 
> >> deadlocks on
> >>  console_sem due to interrupts are impossible because we use
> >>  down_trylock()).  However if we are rescheduled, we are 
> >> guaranteed to run
> >>  on an online cpu so we can easily just get the cpu id in
> >>  can_use_console().
> >>  We can lose a bit of performance when we enable interrupts in
> >>  vprintk_emit() and then disable them again in console_unlock() 
> >> but OTOH it
> >>  can somewhat reduce interrupt latency caused by console_unlock()
> >>  especially since later in the patch series we will want to spin on
> >>  console_sem in console_trylock_for_printk().
> >>  Signed-off-by: Jan Kara 
> >>  Signed-off-by: Andrew Morton 
> >>
> >>?
> Yeah, very likely. I think I see the problem, I'll send the fix 
>  shortly.
> >>>
> >>>Hi Jan,
> >>>
> >>>It seems that the issue I'm seeing is different from the "[prink]  BUG: 
> >>>spinlock
> >>>lockup suspected on CPU#0, swapper/1".
> >>>
> >>>Is there anything else I could try here? The issue is very common during 
> >>>testing.
> >>
> >>Sasha,
> >>
> >>Is this bisectable? Maybe that's the best way forward here.
> >
> >I've ran a bisection again and ended up at the same commit as Jet Chen (the 
> >commit
> >unfortunately already made it to Linus's tree).
> >
> >Note that I did try Jan's proposed fix and that didn't solve the issue for 
> >me, I
> >believe we're seeing different issues caused by the same commit.
> >
> >
> >939f04bec1a4ef6ba4370b0f34b01decc844b1b1 is the first bad commit
> >commit 939f04bec1a4ef6ba4370b0f34b01decc844b1b1
> >Author: Jan Kara 
> >Date:   Wed Jun 4 16:11:37 2014 -0700
> >
> > printk: enable interrupts before calling console_trylock_for_printk()
> >
> > We need interrupts disabled when calling console_trylock_for_printk()
> > only so that cpu id we pass to can_use_console() remains valid (for
> > other things console_sem provides all the exclusion we need and
> > deadlocks on console_sem due to interrupts are impossible because we use
> > down_trylock()).  However if we are rescheduled, we are guaranteed to
> > run on an online cpu so we can easily just get the cpu id in
> > can_use_console().
> >
> > We can lose a bit of performance when we enable interrupts in
> > vprintk_emit() and then disable them again in console_unlock() but OTOH
> > it can somewhat reduce interrupt latency caused by console_unlock()
> > especially since later in the patch series we will want to spin on
> > console_sem in console_trylock_for_printk().
> >
> > Signed-off-by: Jan Kara 
> > Signed-off-by: Andrew Morton 
> > Signed-off-by: Linus Torvalds 
> 
> I apologize; I didn't look at the patch very closely, but now that I do,
> this sticks out:
> 
> @@ -1597,17 +1599,22 @@ asmlinkage int vprintk_emit(int facility, int level,
> 
>   logbuf_cpu = UINT_MAX;
>   raw_spin_unlock(_lock);
> + lockdep_on();
> + local_irq_restore(flags);
> +
> 
> What prevents cpu migration right here?
  Nothing.

> If nothing, then logbuf_cpu is now stale and the recursion test at
> the top of vprintk_emit is doing nothing to prevent recursion.
  Well, note that logbuf_cpu has just been set to UINT_MAX (i.e. undefined)
two lines above. So my patch changes nothing wrt. how printk recursion
detection works (at least AFAICT).
 
> + /*
> +  * Disable preemption to avoid being preempted while holding
> +  * console_sem which would prevent anyone from 

Re: console: lockup on boot

2014-06-11 Thread Linus Torvalds
On Wed, Jun 11, 2014 at 10:38 AM, Linus Torvalds
 wrote:
>
> Ok, I'm going to revert that commit.  The games it plays with cpu
> numbers and preemption make me worried, and it looks silly.
>
> If the only reason for that patch is interrupt latency, then if the
> "preempt_disable()" had been done *before* restoring irq's, none of
> those "let's reload the CPU number" games would have been necessary,
> and the patch could have been smaller.

Ugh. It doesn't revert cleanly, so I guess I'm not just reverting it
after all. Jan?

   Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: console: lockup on boot

2014-06-11 Thread Linus Torvalds
On Wed, Jun 11, 2014 at 7:55 AM, Sasha Levin  wrote:
>
> I've ran a bisection again and ended up at the same commit as Jet Chen (the 
> commit
> unfortunately already made it to Linus's tree).

Ok, I'm going to revert that commit.  The games it plays with cpu
numbers and preemption make me worried, and it looks silly.

If the only reason for that patch is interrupt latency, then if the
"preempt_disable()" had been done *before* restoring irq's, none of
those "let's reload the CPU number" games would have been necessary,
and the patch could have been smaller. So even if that is the solution
to the problems people report, it's actually *better* to just revert
the patch entirely, and try again with a smaller and simpler version,
rather than try to change the patch after-the-fact.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: console: lockup on boot

2014-06-11 Thread Peter Hurley

On 06/11/2014 10:55 AM, Sasha Levin wrote:

On 06/10/2014 11:59 AM, Peter Hurley wrote:

On 06/06/2014 03:05 PM, Sasha Levin wrote:

On 05/30/2014 10:07 AM, Jan Kara wrote:

On Fri 30-05-14 09:58:14, Peter Hurley wrote:

On 05/30/2014 09:11 AM, Sasha Levin wrote:

Hi all,

I sometime see lockups when booting my KVM guest with the latest -next kernel,
it basically hangs right when it should start 'init', and after a while I get
the following spew:

[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0


Maybe related to this report: https://lkml.org/lkml/2014/5/30/26
from Jet Chen which was bisected to

commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
Author: Jan Kara 
AuthorDate: Thu May 22 10:43:35 2014 +1000
Commit: Stephen Rothwell 
CommitDate: Thu May 22 10:43:35 2014 +1000

  printk: enable interrupts before calling console_trylock_for_printk()
  We need interrupts disabled when calling console_trylock_for_printk() 
only
  so that cpu id we pass to can_use_console() remains valid (for other
  things console_sem provides all the exclusion we need and deadlocks on
  console_sem due to interrupts are impossible because we use
  down_trylock()).  However if we are rescheduled, we are guaranteed to run
  on an online cpu so we can easily just get the cpu id in
  can_use_console().
  We can lose a bit of performance when we enable interrupts in
  vprintk_emit() and then disable them again in console_unlock() but OTOH it
  can somewhat reduce interrupt latency caused by console_unlock()
  especially since later in the patch series we will want to spin on
  console_sem in console_trylock_for_printk().
  Signed-off-by: Jan Kara 
  Signed-off-by: Andrew Morton 

?

Yeah, very likely. I think I see the problem, I'll send the fix shortly.


Hi Jan,

It seems that the issue I'm seeing is different from the "[prink]  BUG: spinlock
lockup suspected on CPU#0, swapper/1".

Is there anything else I could try here? The issue is very common during 
testing.


Sasha,

Is this bisectable? Maybe that's the best way forward here.


I've ran a bisection again and ended up at the same commit as Jet Chen (the 
commit
unfortunately already made it to Linus's tree).

Note that I did try Jan's proposed fix and that didn't solve the issue for me, I
believe we're seeing different issues caused by the same commit.


939f04bec1a4ef6ba4370b0f34b01decc844b1b1 is the first bad commit
commit 939f04bec1a4ef6ba4370b0f34b01decc844b1b1
Author: Jan Kara 
Date:   Wed Jun 4 16:11:37 2014 -0700

 printk: enable interrupts before calling console_trylock_for_printk()

 We need interrupts disabled when calling console_trylock_for_printk()
 only so that cpu id we pass to can_use_console() remains valid (for
 other things console_sem provides all the exclusion we need and
 deadlocks on console_sem due to interrupts are impossible because we use
 down_trylock()).  However if we are rescheduled, we are guaranteed to
 run on an online cpu so we can easily just get the cpu id in
 can_use_console().

 We can lose a bit of performance when we enable interrupts in
 vprintk_emit() and then disable them again in console_unlock() but OTOH
 it can somewhat reduce interrupt latency caused by console_unlock()
 especially since later in the patch series we will want to spin on
 console_sem in console_trylock_for_printk().

 Signed-off-by: Jan Kara 
 Signed-off-by: Andrew Morton 
 Signed-off-by: Linus Torvalds 


I apologize; I didn't look at the patch very closely, but now that I do, this
sticks out:

@@ -1597,17 +1599,22 @@ asmlinkage int vprintk_emit(int facility, int level,

logbuf_cpu = UINT_MAX;
raw_spin_unlock(_lock);
+   lockdep_on();
+   local_irq_restore(flags);
+

What prevents cpu migration right here?
If nothing, then logbuf_cpu is now stale and the recursion test at
the top of vprintk_emit is doing nothing to prevent recursion.


+   /*
+* Disable preemption to avoid being preempted while holding
+* console_sem which would prevent anyone from printing to console
+*/
+   preempt_disable();
/*
 * Try to acquire and then immediately release the console semaphore.
 * The release will print out buffers and wake up /dev/kmsg and syslog()
 * users.
 */
-   if (console_trylock_for_printk(this_cpu))
+   if (console_trylock_for_printk())
console_unlock();

Regards,
Peter Hurley
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: console: lockup on boot

2014-06-11 Thread Sasha Levin
On 06/10/2014 11:59 AM, Peter Hurley wrote:
> On 06/06/2014 03:05 PM, Sasha Levin wrote:
>> On 05/30/2014 10:07 AM, Jan Kara wrote:
>>> On Fri 30-05-14 09:58:14, Peter Hurley wrote:
> On 05/30/2014 09:11 AM, Sasha Levin wrote:
>>> Hi all,
>>>
>>> I sometime see lockups when booting my KVM guest with the latest -next 
>>> kernel,
>>> it basically hangs right when it should start 'init', and after a while 
>>> I get
>>> the following spew:
>>>
>>> [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
>
> Maybe related to this report: https://lkml.org/lkml/2014/5/30/26
> from Jet Chen which was bisected to
>
> commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
> Author: Jan Kara 
> AuthorDate: Thu May 22 10:43:35 2014 +1000
> Commit: Stephen Rothwell 
> CommitDate: Thu May 22 10:43:35 2014 +1000
>
>  printk: enable interrupts before calling console_trylock_for_printk()
>  We need interrupts disabled when calling 
> console_trylock_for_printk() only
>  so that cpu id we pass to can_use_console() remains valid (for other
>  things console_sem provides all the exclusion we need and deadlocks 
> on
>  console_sem due to interrupts are impossible because we use
>  down_trylock()).  However if we are rescheduled, we are guaranteed 
> to run
>  on an online cpu so we can easily just get the cpu id in
>  can_use_console().
>  We can lose a bit of performance when we enable interrupts in
>  vprintk_emit() and then disable them again in console_unlock() but 
> OTOH it
>  can somewhat reduce interrupt latency caused by console_unlock()
>  especially since later in the patch series we will want to spin on
>  console_sem in console_trylock_for_printk().
>  Signed-off-by: Jan Kara 
>  Signed-off-by: Andrew Morton 
>
> ?
>>>Yeah, very likely. I think I see the problem, I'll send the fix shortly.
>>
>> Hi Jan,
>>
>> It seems that the issue I'm seeing is different from the "[prink]  BUG: 
>> spinlock
>> lockup suspected on CPU#0, swapper/1".
>>
>> Is there anything else I could try here? The issue is very common during 
>> testing.
> 
> Sasha,
> 
> Is this bisectable? Maybe that's the best way forward here.

I've ran a bisection again and ended up at the same commit as Jet Chen (the 
commit
unfortunately already made it to Linus's tree).

Note that I did try Jan's proposed fix and that didn't solve the issue for me, I
believe we're seeing different issues caused by the same commit.


939f04bec1a4ef6ba4370b0f34b01decc844b1b1 is the first bad commit
commit 939f04bec1a4ef6ba4370b0f34b01decc844b1b1
Author: Jan Kara 
Date:   Wed Jun 4 16:11:37 2014 -0700

printk: enable interrupts before calling console_trylock_for_printk()

We need interrupts disabled when calling console_trylock_for_printk()
only so that cpu id we pass to can_use_console() remains valid (for
other things console_sem provides all the exclusion we need and
deadlocks on console_sem due to interrupts are impossible because we use
down_trylock()).  However if we are rescheduled, we are guaranteed to
run on an online cpu so we can easily just get the cpu id in
can_use_console().

We can lose a bit of performance when we enable interrupts in
vprintk_emit() and then disable them again in console_unlock() but OTOH
it can somewhat reduce interrupt latency caused by console_unlock()
especially since later in the patch series we will want to spin on
console_sem in console_trylock_for_printk().

Signed-off-by: Jan Kara 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: console: lockup on boot

2014-06-11 Thread Sasha Levin
On 06/10/2014 11:59 AM, Peter Hurley wrote:
 On 06/06/2014 03:05 PM, Sasha Levin wrote:
 On 05/30/2014 10:07 AM, Jan Kara wrote:
 On Fri 30-05-14 09:58:14, Peter Hurley wrote:
 On 05/30/2014 09:11 AM, Sasha Levin wrote:
 Hi all,

 I sometime see lockups when booting my KVM guest with the latest -next 
 kernel,
 it basically hangs right when it should start 'init', and after a while 
 I get
 the following spew:

 [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0

 Maybe related to this report: https://lkml.org/lkml/2014/5/30/26
 from Jet Chen which was bisected to

 commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
 Author: Jan Kara j...@suse.cz
 AuthorDate: Thu May 22 10:43:35 2014 +1000
 Commit: Stephen Rothwell s...@canb.auug.org.au
 CommitDate: Thu May 22 10:43:35 2014 +1000

  printk: enable interrupts before calling console_trylock_for_printk()
  We need interrupts disabled when calling 
 console_trylock_for_printk() only
  so that cpu id we pass to can_use_console() remains valid (for other
  things console_sem provides all the exclusion we need and deadlocks 
 on
  console_sem due to interrupts are impossible because we use
  down_trylock()).  However if we are rescheduled, we are guaranteed 
 to run
  on an online cpu so we can easily just get the cpu id in
  can_use_console().
  We can lose a bit of performance when we enable interrupts in
  vprintk_emit() and then disable them again in console_unlock() but 
 OTOH it
  can somewhat reduce interrupt latency caused by console_unlock()
  especially since later in the patch series we will want to spin on
  console_sem in console_trylock_for_printk().
  Signed-off-by: Jan Kara j...@suse.cz
  Signed-off-by: Andrew Morton a...@linux-foundation.org

 ?
Yeah, very likely. I think I see the problem, I'll send the fix shortly.

 Hi Jan,

 It seems that the issue I'm seeing is different from the [prink]  BUG: 
 spinlock
 lockup suspected on CPU#0, swapper/1.

 Is there anything else I could try here? The issue is very common during 
 testing.
 
 Sasha,
 
 Is this bisectable? Maybe that's the best way forward here.

I've ran a bisection again and ended up at the same commit as Jet Chen (the 
commit
unfortunately already made it to Linus's tree).

Note that I did try Jan's proposed fix and that didn't solve the issue for me, I
believe we're seeing different issues caused by the same commit.


939f04bec1a4ef6ba4370b0f34b01decc844b1b1 is the first bad commit
commit 939f04bec1a4ef6ba4370b0f34b01decc844b1b1
Author: Jan Kara j...@suse.cz
Date:   Wed Jun 4 16:11:37 2014 -0700

printk: enable interrupts before calling console_trylock_for_printk()

We need interrupts disabled when calling console_trylock_for_printk()
only so that cpu id we pass to can_use_console() remains valid (for
other things console_sem provides all the exclusion we need and
deadlocks on console_sem due to interrupts are impossible because we use
down_trylock()).  However if we are rescheduled, we are guaranteed to
run on an online cpu so we can easily just get the cpu id in
can_use_console().

We can lose a bit of performance when we enable interrupts in
vprintk_emit() and then disable them again in console_unlock() but OTOH
it can somewhat reduce interrupt latency caused by console_unlock()
especially since later in the patch series we will want to spin on
console_sem in console_trylock_for_printk().

Signed-off-by: Jan Kara j...@suse.cz
Signed-off-by: Andrew Morton a...@linux-foundation.org
Signed-off-by: Linus Torvalds torva...@linux-foundation.org


Thanks,
Sasha
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: console: lockup on boot

2014-06-11 Thread Peter Hurley

On 06/11/2014 10:55 AM, Sasha Levin wrote:

On 06/10/2014 11:59 AM, Peter Hurley wrote:

On 06/06/2014 03:05 PM, Sasha Levin wrote:

On 05/30/2014 10:07 AM, Jan Kara wrote:

On Fri 30-05-14 09:58:14, Peter Hurley wrote:

On 05/30/2014 09:11 AM, Sasha Levin wrote:

Hi all,

I sometime see lockups when booting my KVM guest with the latest -next kernel,
it basically hangs right when it should start 'init', and after a while I get
the following spew:

[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0


Maybe related to this report: https://lkml.org/lkml/2014/5/30/26
from Jet Chen which was bisected to

commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
Author: Jan Kara j...@suse.cz
AuthorDate: Thu May 22 10:43:35 2014 +1000
Commit: Stephen Rothwell s...@canb.auug.org.au
CommitDate: Thu May 22 10:43:35 2014 +1000

  printk: enable interrupts before calling console_trylock_for_printk()
  We need interrupts disabled when calling console_trylock_for_printk() 
only
  so that cpu id we pass to can_use_console() remains valid (for other
  things console_sem provides all the exclusion we need and deadlocks on
  console_sem due to interrupts are impossible because we use
  down_trylock()).  However if we are rescheduled, we are guaranteed to run
  on an online cpu so we can easily just get the cpu id in
  can_use_console().
  We can lose a bit of performance when we enable interrupts in
  vprintk_emit() and then disable them again in console_unlock() but OTOH it
  can somewhat reduce interrupt latency caused by console_unlock()
  especially since later in the patch series we will want to spin on
  console_sem in console_trylock_for_printk().
  Signed-off-by: Jan Kara j...@suse.cz
  Signed-off-by: Andrew Morton a...@linux-foundation.org

?

Yeah, very likely. I think I see the problem, I'll send the fix shortly.


Hi Jan,

It seems that the issue I'm seeing is different from the [prink]  BUG: spinlock
lockup suspected on CPU#0, swapper/1.

Is there anything else I could try here? The issue is very common during 
testing.


Sasha,

Is this bisectable? Maybe that's the best way forward here.


I've ran a bisection again and ended up at the same commit as Jet Chen (the 
commit
unfortunately already made it to Linus's tree).

Note that I did try Jan's proposed fix and that didn't solve the issue for me, I
believe we're seeing different issues caused by the same commit.


939f04bec1a4ef6ba4370b0f34b01decc844b1b1 is the first bad commit
commit 939f04bec1a4ef6ba4370b0f34b01decc844b1b1
Author: Jan Kara j...@suse.cz
Date:   Wed Jun 4 16:11:37 2014 -0700

 printk: enable interrupts before calling console_trylock_for_printk()

 We need interrupts disabled when calling console_trylock_for_printk()
 only so that cpu id we pass to can_use_console() remains valid (for
 other things console_sem provides all the exclusion we need and
 deadlocks on console_sem due to interrupts are impossible because we use
 down_trylock()).  However if we are rescheduled, we are guaranteed to
 run on an online cpu so we can easily just get the cpu id in
 can_use_console().

 We can lose a bit of performance when we enable interrupts in
 vprintk_emit() and then disable them again in console_unlock() but OTOH
 it can somewhat reduce interrupt latency caused by console_unlock()
 especially since later in the patch series we will want to spin on
 console_sem in console_trylock_for_printk().

 Signed-off-by: Jan Kara j...@suse.cz
 Signed-off-by: Andrew Morton a...@linux-foundation.org
 Signed-off-by: Linus Torvalds torva...@linux-foundation.org


I apologize; I didn't look at the patch very closely, but now that I do, this
sticks out:

@@ -1597,17 +1599,22 @@ asmlinkage int vprintk_emit(int facility, int level,

logbuf_cpu = UINT_MAX;
raw_spin_unlock(logbuf_lock);
+   lockdep_on();
+   local_irq_restore(flags);
+

What prevents cpu migration right here?
If nothing, then logbuf_cpu is now stale and the recursion test at
the top of vprintk_emit is doing nothing to prevent recursion.


+   /*
+* Disable preemption to avoid being preempted while holding
+* console_sem which would prevent anyone from printing to console
+*/
+   preempt_disable();
/*
 * Try to acquire and then immediately release the console semaphore.
 * The release will print out buffers and wake up /dev/kmsg and syslog()
 * users.
 */
-   if (console_trylock_for_printk(this_cpu))
+   if (console_trylock_for_printk())
console_unlock();

Regards,
Peter Hurley
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: console: lockup on boot

2014-06-11 Thread Linus Torvalds
On Wed, Jun 11, 2014 at 7:55 AM, Sasha Levin sasha.le...@oracle.com wrote:

 I've ran a bisection again and ended up at the same commit as Jet Chen (the 
 commit
 unfortunately already made it to Linus's tree).

Ok, I'm going to revert that commit.  The games it plays with cpu
numbers and preemption make me worried, and it looks silly.

If the only reason for that patch is interrupt latency, then if the
preempt_disable() had been done *before* restoring irq's, none of
those let's reload the CPU number games would have been necessary,
and the patch could have been smaller. So even if that is the solution
to the problems people report, it's actually *better* to just revert
the patch entirely, and try again with a smaller and simpler version,
rather than try to change the patch after-the-fact.

  Linus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: console: lockup on boot

2014-06-11 Thread Linus Torvalds
On Wed, Jun 11, 2014 at 10:38 AM, Linus Torvalds
torva...@linux-foundation.org wrote:

 Ok, I'm going to revert that commit.  The games it plays with cpu
 numbers and preemption make me worried, and it looks silly.

 If the only reason for that patch is interrupt latency, then if the
 preempt_disable() had been done *before* restoring irq's, none of
 those let's reload the CPU number games would have been necessary,
 and the patch could have been smaller.

Ugh. It doesn't revert cleanly, so I guess I'm not just reverting it
after all. Jan?

   Linus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: console: lockup on boot

2014-06-11 Thread Jan Kara
On Wed 11-06-14 11:34:28, Peter Hurley wrote:
 On 06/11/2014 10:55 AM, Sasha Levin wrote:
 On 06/10/2014 11:59 AM, Peter Hurley wrote:
 On 06/06/2014 03:05 PM, Sasha Levin wrote:
 On 05/30/2014 10:07 AM, Jan Kara wrote:
 On Fri 30-05-14 09:58:14, Peter Hurley wrote:
 On 05/30/2014 09:11 AM, Sasha Levin wrote:
 Hi all,
 
 I sometime see lockups when booting my KVM guest with the latest 
 -next kernel,
 it basically hangs right when it should start 'init', and after a 
 while I get
 the following spew:
 
 [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
 
 Maybe related to this report: https://lkml.org/lkml/2014/5/30/26
 from Jet Chen which was bisected to
 
 commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
 Author: Jan Kara j...@suse.cz
 AuthorDate: Thu May 22 10:43:35 2014 +1000
 Commit: Stephen Rothwell s...@canb.auug.org.au
 CommitDate: Thu May 22 10:43:35 2014 +1000
 
   printk: enable interrupts before calling 
  console_trylock_for_printk()
   We need interrupts disabled when calling 
  console_trylock_for_printk() only
   so that cpu id we pass to can_use_console() remains valid (for 
  other
   things console_sem provides all the exclusion we need and 
  deadlocks on
   console_sem due to interrupts are impossible because we use
   down_trylock()).  However if we are rescheduled, we are 
  guaranteed to run
   on an online cpu so we can easily just get the cpu id in
   can_use_console().
   We can lose a bit of performance when we enable interrupts in
   vprintk_emit() and then disable them again in console_unlock() 
  but OTOH it
   can somewhat reduce interrupt latency caused by console_unlock()
   especially since later in the patch series we will want to spin on
   console_sem in console_trylock_for_printk().
   Signed-off-by: Jan Kara j...@suse.cz
   Signed-off-by: Andrew Morton a...@linux-foundation.org
 
 ?
 Yeah, very likely. I think I see the problem, I'll send the fix 
  shortly.
 
 Hi Jan,
 
 It seems that the issue I'm seeing is different from the [prink]  BUG: 
 spinlock
 lockup suspected on CPU#0, swapper/1.
 
 Is there anything else I could try here? The issue is very common during 
 testing.
 
 Sasha,
 
 Is this bisectable? Maybe that's the best way forward here.
 
 I've ran a bisection again and ended up at the same commit as Jet Chen (the 
 commit
 unfortunately already made it to Linus's tree).
 
 Note that I did try Jan's proposed fix and that didn't solve the issue for 
 me, I
 believe we're seeing different issues caused by the same commit.
 
 
 939f04bec1a4ef6ba4370b0f34b01decc844b1b1 is the first bad commit
 commit 939f04bec1a4ef6ba4370b0f34b01decc844b1b1
 Author: Jan Kara j...@suse.cz
 Date:   Wed Jun 4 16:11:37 2014 -0700
 
  printk: enable interrupts before calling console_trylock_for_printk()
 
  We need interrupts disabled when calling console_trylock_for_printk()
  only so that cpu id we pass to can_use_console() remains valid (for
  other things console_sem provides all the exclusion we need and
  deadlocks on console_sem due to interrupts are impossible because we use
  down_trylock()).  However if we are rescheduled, we are guaranteed to
  run on an online cpu so we can easily just get the cpu id in
  can_use_console().
 
  We can lose a bit of performance when we enable interrupts in
  vprintk_emit() and then disable them again in console_unlock() but OTOH
  it can somewhat reduce interrupt latency caused by console_unlock()
  especially since later in the patch series we will want to spin on
  console_sem in console_trylock_for_printk().
 
  Signed-off-by: Jan Kara j...@suse.cz
  Signed-off-by: Andrew Morton a...@linux-foundation.org
  Signed-off-by: Linus Torvalds torva...@linux-foundation.org
 
 I apologize; I didn't look at the patch very closely, but now that I do,
 this sticks out:
 
 @@ -1597,17 +1599,22 @@ asmlinkage int vprintk_emit(int facility, int level,
 
   logbuf_cpu = UINT_MAX;
   raw_spin_unlock(logbuf_lock);
 + lockdep_on();
 + local_irq_restore(flags);
 +
 
 What prevents cpu migration right here?
  Nothing.

 If nothing, then logbuf_cpu is now stale and the recursion test at
 the top of vprintk_emit is doing nothing to prevent recursion.
  Well, note that logbuf_cpu has just been set to UINT_MAX (i.e. undefined)
two lines above. So my patch changes nothing wrt. how printk recursion
detection works (at least AFAICT).
 
 + /*
 +  * Disable preemption to avoid being preempted while holding
 +  * console_sem which would prevent anyone from printing to console
 +  */
 + preempt_disable();
   /*
* Try to acquire and then immediately release the console semaphore.
* The release will print out buffers and wake up /dev/kmsg and syslog()
* users.
*/
 - if (console_trylock_for_printk(this_cpu))
 + if 

Re: console: lockup on boot

2014-06-11 Thread Jan Kara
On Wed 11-06-14 10:55:55, Sasha Levin wrote:
 On 06/10/2014 11:59 AM, Peter Hurley wrote:
  On 06/06/2014 03:05 PM, Sasha Levin wrote:
  On 05/30/2014 10:07 AM, Jan Kara wrote:
  On Fri 30-05-14 09:58:14, Peter Hurley wrote:
  On 05/30/2014 09:11 AM, Sasha Levin wrote:
  Hi all,
 
  I sometime see lockups when booting my KVM guest with the latest 
  -next kernel,
  it basically hangs right when it should start 'init', and after a 
  while I get
  the following spew:
 
  [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
 
  Maybe related to this report: https://lkml.org/lkml/2014/5/30/26
  from Jet Chen which was bisected to
 
  commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
  Author: Jan Kara j...@suse.cz
  AuthorDate: Thu May 22 10:43:35 2014 +1000
  Commit: Stephen Rothwell s...@canb.auug.org.au
  CommitDate: Thu May 22 10:43:35 2014 +1000
 
   printk: enable interrupts before calling 
  console_trylock_for_printk()
   We need interrupts disabled when calling 
  console_trylock_for_printk() only
   so that cpu id we pass to can_use_console() remains valid (for 
  other
   things console_sem provides all the exclusion we need and 
  deadlocks on
   console_sem due to interrupts are impossible because we use
   down_trylock()).  However if we are rescheduled, we are guaranteed 
  to run
   on an online cpu so we can easily just get the cpu id in
   can_use_console().
   We can lose a bit of performance when we enable interrupts in
   vprintk_emit() and then disable them again in console_unlock() but 
  OTOH it
   can somewhat reduce interrupt latency caused by console_unlock()
   especially since later in the patch series we will want to spin on
   console_sem in console_trylock_for_printk().
   Signed-off-by: Jan Kara j...@suse.cz
   Signed-off-by: Andrew Morton a...@linux-foundation.org
 
  ?
 Yeah, very likely. I think I see the problem, I'll send the fix 
  shortly.
 
  Hi Jan,
 
  It seems that the issue I'm seeing is different from the [prink]  BUG: 
  spinlock
  lockup suspected on CPU#0, swapper/1.
 
  Is there anything else I could try here? The issue is very common during 
  testing.
  
  Sasha,
  
  Is this bisectable? Maybe that's the best way forward here.
 
 I've ran a bisection again and ended up at the same commit as Jet Chen
 (the commit unfortunately already made it to Linus's tree).
 
 Note that I did try Jan's proposed fix and that didn't solve the issue
 for me, I believe we're seeing different issues caused by the same
 commit.
  Sorry it has been busy time lately and I didn't have as much time to look
into this as would be needed.
 
 
 939f04bec1a4ef6ba4370b0f34b01decc844b1b1 is the first bad commit
 commit 939f04bec1a4ef6ba4370b0f34b01decc844b1b1
 Author: Jan Kara j...@suse.cz
 Date:   Wed Jun 4 16:11:37 2014 -0700
 
 printk: enable interrupts before calling console_trylock_for_printk()
 
 We need interrupts disabled when calling console_trylock_for_printk()
 only so that cpu id we pass to can_use_console() remains valid (for
 other things console_sem provides all the exclusion we need and
 deadlocks on console_sem due to interrupts are impossible because we use
 down_trylock()).  However if we are rescheduled, we are guaranteed to
 run on an online cpu so we can easily just get the cpu id in
 can_use_console().
 
 We can lose a bit of performance when we enable interrupts in
 vprintk_emit() and then disable them again in console_unlock() but OTOH
 it can somewhat reduce interrupt latency caused by console_unlock()
 especially since later in the patch series we will want to spin on
 console_sem in console_trylock_for_printk().
 
 Signed-off-by: Jan Kara j...@suse.cz
 Signed-off-by: Andrew Morton a...@linux-foundation.org
 Signed-off-by: Linus Torvalds torva...@linux-foundation.org
 
 
 Thanks,
 Sasha
-- 
Jan Kara j...@suse.cz
SUSE Labs, CR
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: console: lockup on boot

2014-06-11 Thread Jan Kara
On Wed 11-06-14 22:34:36, Jan Kara wrote:
 On Wed 11-06-14 10:55:55, Sasha Levin wrote:
  On 06/10/2014 11:59 AM, Peter Hurley wrote:
   On 06/06/2014 03:05 PM, Sasha Levin wrote:
   On 05/30/2014 10:07 AM, Jan Kara wrote:
   On Fri 30-05-14 09:58:14, Peter Hurley wrote:
   On 05/30/2014 09:11 AM, Sasha Levin wrote:
   Hi all,
  
   I sometime see lockups when booting my KVM guest with the latest 
   -next kernel,
   it basically hangs right when it should start 'init', and after a 
   while I get
   the following spew:
  
   [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
  
   Maybe related to this report: https://lkml.org/lkml/2014/5/30/26
   from Jet Chen which was bisected to
  
   commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
   Author: Jan Kara j...@suse.cz
   AuthorDate: Thu May 22 10:43:35 2014 +1000
   Commit: Stephen Rothwell s...@canb.auug.org.au
   CommitDate: Thu May 22 10:43:35 2014 +1000
  
printk: enable interrupts before calling 
   console_trylock_for_printk()
We need interrupts disabled when calling 
   console_trylock_for_printk() only
so that cpu id we pass to can_use_console() remains valid (for 
   other
things console_sem provides all the exclusion we need and 
   deadlocks on
console_sem due to interrupts are impossible because we use
down_trylock()).  However if we are rescheduled, we are 
   guaranteed to run
on an online cpu so we can easily just get the cpu id in
can_use_console().
We can lose a bit of performance when we enable interrupts in
vprintk_emit() and then disable them again in console_unlock() 
   but OTOH it
can somewhat reduce interrupt latency caused by console_unlock()
especially since later in the patch series we will want to spin 
   on
console_sem in console_trylock_for_printk().
Signed-off-by: Jan Kara j...@suse.cz
Signed-off-by: Andrew Morton a...@linux-foundation.org
  
   ?
  Yeah, very likely. I think I see the problem, I'll send the fix 
   shortly.
  
   Hi Jan,
  
   It seems that the issue I'm seeing is different from the [prink]  BUG: 
   spinlock
   lockup suspected on CPU#0, swapper/1.
  
   Is there anything else I could try here? The issue is very common during 
   testing.
   
   Sasha,
   
   Is this bisectable? Maybe that's the best way forward here.
  
  I've ran a bisection again and ended up at the same commit as Jet Chen
  (the commit unfortunately already made it to Linus's tree).
  
  Note that I did try Jan's proposed fix and that didn't solve the issue
  for me, I believe we're seeing different issues caused by the same
  commit.
   Sorry it has been busy time lately and I didn't have as much time to look
 into this as would be needed.
  Oops, pressed send too early... So I have two debug patches for you. Can
you try whether the problem reproduces with the first one or with both of
them applied?

The stacktrace from spinlock lockup is somewhat weird - the spinlock lockup
message comes from NMI context where we tried to do print stack dump.
Indeed calling printk from NMI context *can* lead to this sort of lockups
(and the current process is a worker thread which apparently runs
fb_flashcursor() function which calls console_unlock() which can hold
logbuf_lock so all these things fit together).  What I really fail to see
is how my patch makes the problem happen to you pretty reliably.

Another somewhat strange thing to me is that although lock-owner in your
dump looks fine and points to currently running process, lock-owner_cpu is
-1 which is a combination that shouldn't ever happen as I'm looking into
spinlock debug code.

So for now I'm still puzzled.
Honza

-- 
Jan Kara j...@suse.cz
SUSE Labs, CR
From bf739a5af7a73b08b3dea0491d38816f686dbf57 Mon Sep 17 00:00:00 2001
From: Jan Kara j...@suse.cz
Date: Wed, 11 Jun 2014 22:33:23 +0200
Subject: [PATCH 1/2] printk: Debug patch1

Signed-off-by: Jan Kara j...@suse.cz
---
 kernel/printk/printk.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 5ea6b148a1a6..da94dc18b6fa 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1649,6 +1649,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	 * console_sem which would prevent anyone from printing to console
 	 */
 	preempt_disable();
+	lockdep_off();
 	/*
 	 * Try to acquire and then immediately release the console semaphore.
 	 * The release will print out buffers and wake up /dev/kmsg and syslog()
@@ -1656,6 +1657,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	 */
 	if (console_trylock_for_printk())
 		console_unlock();
+	lockdep_on();
 	preempt_enable();
 
 	return printed_len;
-- 
1.8.1.4

From 976ab0ea1d855686d8166b9c08239aa50e1c983b Mon Sep 17 00:00:00 2001
From: Jan Kara j...@suse.cz
Date: Wed, 11 Jun 2014 22:38:41 +0200

Re: console: lockup on boot

2014-06-11 Thread Sasha Levin
On 06/11/2014 05:31 PM, Jan Kara wrote:
 On Wed 11-06-14 22:34:36, Jan Kara wrote:
  On Wed 11-06-14 10:55:55, Sasha Levin wrote:
   On 06/10/2014 11:59 AM, Peter Hurley wrote:
On 06/06/2014 03:05 PM, Sasha Levin wrote:
On 05/30/2014 10:07 AM, Jan Kara wrote:
On Fri 30-05-14 09:58:14, Peter Hurley wrote:
On 05/30/2014 09:11 AM, Sasha Levin wrote:
Hi all,
   
I sometime see lockups when booting my KVM guest with 
the latest -next kernel,
it basically hangs right when it should start 'init', 
and after a while I get
the following spew:
   
[   30.790833] BUG: spinlock lockup suspected on CPU#1, 
swapper/1/0
   
Maybe related to this report: 
https://lkml.org/lkml/2014/5/30/26
from Jet Chen which was bisected to
   
commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
Author: Jan Kara j...@suse.cz
AuthorDate: Thu May 22 10:43:35 2014 +1000
Commit: Stephen Rothwell s...@canb.auug.org.au
CommitDate: Thu May 22 10:43:35 2014 +1000
   
 printk: enable interrupts before calling 
console_trylock_for_printk()
 We need interrupts disabled when calling 
console_trylock_for_printk() only
 so that cpu id we pass to can_use_console() remains 
valid (for other
 things console_sem provides all the exclusion we need 
and deadlocks on
 console_sem due to interrupts are impossible because we 
use
 down_trylock()).  However if we are rescheduled, we are 
guaranteed to run
 on an online cpu so we can easily just get the cpu id in
 can_use_console().
 We can lose a bit of performance when we enable 
interrupts in
 vprintk_emit() and then disable them again in 
console_unlock() but OTOH it
 can somewhat reduce interrupt latency caused by 
console_unlock()
 especially since later in the patch series we will want 
to spin on
 console_sem in console_trylock_for_printk().
 Signed-off-by: Jan Kara j...@suse.cz
 Signed-off-by: Andrew Morton a...@linux-foundation.org
   
?
   Yeah, very likely. I think I see the problem, I'll send the 
fix shortly.
   
Hi Jan,
   
It seems that the issue I'm seeing is different from the [prink]  
BUG: spinlock
lockup suspected on CPU#0, swapper/1.
   
Is there anything else I could try here? The issue is very common 
during testing.

Sasha,

Is this bisectable? Maybe that's the best way forward here.
   
   I've ran a bisection again and ended up at the same commit as Jet Chen
   (the commit unfortunately already made it to Linus's tree).
   
   Note that I did try Jan's proposed fix and that didn't solve the issue
   for me, I believe we're seeing different issues caused by the same
   commit.
Sorry it has been busy time lately and I didn't have as much time to look
  into this as would be needed.
   Oops, pressed send too early... So I have two debug patches for you. Can
 you try whether the problem reproduces with the first one or with both of
 them applied?

The first patch fixed it (I assumed that there's no need to try the second).


Thanks,
Sasha
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: console: lockup on boot

2014-06-10 Thread Peter Hurley

On 06/06/2014 03:05 PM, Sasha Levin wrote:

On 05/30/2014 10:07 AM, Jan Kara wrote:

On Fri 30-05-14 09:58:14, Peter Hurley wrote:

On 05/30/2014 09:11 AM, Sasha Levin wrote:

Hi all,

I sometime see lockups when booting my KVM guest with the latest -next kernel,
it basically hangs right when it should start 'init', and after a while I get
the following spew:

[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0


Maybe related to this report: https://lkml.org/lkml/2014/5/30/26
from Jet Chen which was bisected to

commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
Author: Jan Kara 
AuthorDate: Thu May 22 10:43:35 2014 +1000
Commit: Stephen Rothwell 
CommitDate: Thu May 22 10:43:35 2014 +1000

 printk: enable interrupts before calling console_trylock_for_printk()
 We need interrupts disabled when calling console_trylock_for_printk() 
only
 so that cpu id we pass to can_use_console() remains valid (for other
 things console_sem provides all the exclusion we need and deadlocks on
 console_sem due to interrupts are impossible because we use
 down_trylock()).  However if we are rescheduled, we are guaranteed to run
 on an online cpu so we can easily just get the cpu id in
 can_use_console().
 We can lose a bit of performance when we enable interrupts in
 vprintk_emit() and then disable them again in console_unlock() but OTOH it
 can somewhat reduce interrupt latency caused by console_unlock()
 especially since later in the patch series we will want to spin on
 console_sem in console_trylock_for_printk().
 Signed-off-by: Jan Kara 
 Signed-off-by: Andrew Morton 

?

   Yeah, very likely. I think I see the problem, I'll send the fix shortly.


Hi Jan,

It seems that the issue I'm seeing is different from the "[prink]  BUG: spinlock
lockup suspected on CPU#0, swapper/1".

Is there anything else I could try here? The issue is very common during 
testing.


Sasha,

Is this bisectable? Maybe that's the best way forward here.

Regards,
Peter Hurley

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: console: lockup on boot

2014-06-10 Thread Peter Hurley

On 06/06/2014 03:05 PM, Sasha Levin wrote:

On 05/30/2014 10:07 AM, Jan Kara wrote:

On Fri 30-05-14 09:58:14, Peter Hurley wrote:

On 05/30/2014 09:11 AM, Sasha Levin wrote:

Hi all,

I sometime see lockups when booting my KVM guest with the latest -next kernel,
it basically hangs right when it should start 'init', and after a while I get
the following spew:

[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0


Maybe related to this report: https://lkml.org/lkml/2014/5/30/26
from Jet Chen which was bisected to

commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
Author: Jan Kara j...@suse.cz
AuthorDate: Thu May 22 10:43:35 2014 +1000
Commit: Stephen Rothwell s...@canb.auug.org.au
CommitDate: Thu May 22 10:43:35 2014 +1000

 printk: enable interrupts before calling console_trylock_for_printk()
 We need interrupts disabled when calling console_trylock_for_printk() 
only
 so that cpu id we pass to can_use_console() remains valid (for other
 things console_sem provides all the exclusion we need and deadlocks on
 console_sem due to interrupts are impossible because we use
 down_trylock()).  However if we are rescheduled, we are guaranteed to run
 on an online cpu so we can easily just get the cpu id in
 can_use_console().
 We can lose a bit of performance when we enable interrupts in
 vprintk_emit() and then disable them again in console_unlock() but OTOH it
 can somewhat reduce interrupt latency caused by console_unlock()
 especially since later in the patch series we will want to spin on
 console_sem in console_trylock_for_printk().
 Signed-off-by: Jan Kara j...@suse.cz
 Signed-off-by: Andrew Morton a...@linux-foundation.org

?

   Yeah, very likely. I think I see the problem, I'll send the fix shortly.


Hi Jan,

It seems that the issue I'm seeing is different from the [prink]  BUG: spinlock
lockup suspected on CPU#0, swapper/1.

Is there anything else I could try here? The issue is very common during 
testing.


Sasha,

Is this bisectable? Maybe that's the best way forward here.

Regards,
Peter Hurley

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: console: lockup on boot

2014-06-06 Thread Sasha Levin
On 05/30/2014 10:07 AM, Jan Kara wrote:
> On Fri 30-05-14 09:58:14, Peter Hurley wrote:
>> > On 05/30/2014 09:11 AM, Sasha Levin wrote:
>>> > >Hi all,
>>> > >
>>> > >I sometime see lockups when booting my KVM guest with the latest -next 
>>> > >kernel,
>>> > >it basically hangs right when it should start 'init', and after a while 
>>> > >I get
>>> > >the following spew:
>>> > >
>>> > >[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
>> > 
>> > Maybe related to this report: https://lkml.org/lkml/2014/5/30/26
>> > from Jet Chen which was bisected to
>> > 
>> > commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
>> > Author: Jan Kara 
>> > AuthorDate: Thu May 22 10:43:35 2014 +1000
>> > Commit: Stephen Rothwell 
>> > CommitDate: Thu May 22 10:43:35 2014 +1000
>> > 
>> > printk: enable interrupts before calling console_trylock_for_printk()
>> > We need interrupts disabled when calling 
>> > console_trylock_for_printk() only
>> > so that cpu id we pass to can_use_console() remains valid (for other
>> > things console_sem provides all the exclusion we need and deadlocks on
>> > console_sem due to interrupts are impossible because we use
>> > down_trylock()).  However if we are rescheduled, we are guaranteed to 
>> > run
>> > on an online cpu so we can easily just get the cpu id in
>> > can_use_console().
>> > We can lose a bit of performance when we enable interrupts in
>> > vprintk_emit() and then disable them again in console_unlock() but 
>> > OTOH it
>> > can somewhat reduce interrupt latency caused by console_unlock()
>> > especially since later in the patch series we will want to spin on
>> > console_sem in console_trylock_for_printk().
>> > Signed-off-by: Jan Kara 
>> > Signed-off-by: Andrew Morton 
>> > 
>> > ?
>   Yeah, very likely. I think I see the problem, I'll send the fix shortly.

Hi Jan,

It seems that the issue I'm seeing is different from the "[prink]  BUG: spinlock
lockup suspected on CPU#0, swapper/1".

Is there anything else I could try here? The issue is very common during 
testing.


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: console: lockup on boot

2014-06-06 Thread Sasha Levin
On 05/30/2014 10:07 AM, Jan Kara wrote:
 On Fri 30-05-14 09:58:14, Peter Hurley wrote:
  On 05/30/2014 09:11 AM, Sasha Levin wrote:
  Hi all,
  
  I sometime see lockups when booting my KVM guest with the latest -next 
  kernel,
  it basically hangs right when it should start 'init', and after a while 
  I get
  the following spew:
  
  [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
  
  Maybe related to this report: https://lkml.org/lkml/2014/5/30/26
  from Jet Chen which was bisected to
  
  commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
  Author: Jan Kara j...@suse.cz
  AuthorDate: Thu May 22 10:43:35 2014 +1000
  Commit: Stephen Rothwell s...@canb.auug.org.au
  CommitDate: Thu May 22 10:43:35 2014 +1000
  
  printk: enable interrupts before calling console_trylock_for_printk()
  We need interrupts disabled when calling 
  console_trylock_for_printk() only
  so that cpu id we pass to can_use_console() remains valid (for other
  things console_sem provides all the exclusion we need and deadlocks on
  console_sem due to interrupts are impossible because we use
  down_trylock()).  However if we are rescheduled, we are guaranteed to 
  run
  on an online cpu so we can easily just get the cpu id in
  can_use_console().
  We can lose a bit of performance when we enable interrupts in
  vprintk_emit() and then disable them again in console_unlock() but 
  OTOH it
  can somewhat reduce interrupt latency caused by console_unlock()
  especially since later in the patch series we will want to spin on
  console_sem in console_trylock_for_printk().
  Signed-off-by: Jan Kara j...@suse.cz
  Signed-off-by: Andrew Morton a...@linux-foundation.org
  
  ?
   Yeah, very likely. I think I see the problem, I'll send the fix shortly.

Hi Jan,

It seems that the issue I'm seeing is different from the [prink]  BUG: spinlock
lockup suspected on CPU#0, swapper/1.

Is there anything else I could try here? The issue is very common during 
testing.


Thanks,
Sasha
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: console: lockup on boot

2014-05-30 Thread Jan Kara
On Fri 30-05-14 09:58:14, Peter Hurley wrote:
> On 05/30/2014 09:11 AM, Sasha Levin wrote:
> >Hi all,
> >
> >I sometime see lockups when booting my KVM guest with the latest -next 
> >kernel,
> >it basically hangs right when it should start 'init', and after a while I get
> >the following spew:
> >
> >[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
> 
> Maybe related to this report: https://lkml.org/lkml/2014/5/30/26
> from Jet Chen which was bisected to
> 
> commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
> Author: Jan Kara 
> AuthorDate: Thu May 22 10:43:35 2014 +1000
> Commit: Stephen Rothwell 
> CommitDate: Thu May 22 10:43:35 2014 +1000
> 
> printk: enable interrupts before calling console_trylock_for_printk()
> We need interrupts disabled when calling console_trylock_for_printk() 
> only
> so that cpu id we pass to can_use_console() remains valid (for other
> things console_sem provides all the exclusion we need and deadlocks on
> console_sem due to interrupts are impossible because we use
> down_trylock()).  However if we are rescheduled, we are guaranteed to run
> on an online cpu so we can easily just get the cpu id in
> can_use_console().
> We can lose a bit of performance when we enable interrupts in
> vprintk_emit() and then disable them again in console_unlock() but OTOH it
> can somewhat reduce interrupt latency caused by console_unlock()
> especially since later in the patch series we will want to spin on
> console_sem in console_trylock_for_printk().
> Signed-off-by: Jan Kara 
> Signed-off-by: Andrew Morton 
> 
> ?
  Yeah, very likely. I think I see the problem, I'll send the fix shortly.

Honza
 
> >[   30.770667] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
> >[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
> >[   30.770667] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
> >[  320.541585] kworker/dying (172) used greatest stack depth: 3656 bytes left
> >[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
> >[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
> >[  408.010589] INFO: task swapper/0:1 blocked for more than 200 seconds.
> >[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
> >[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
> >[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
> >[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
> >[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
> >[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
> >[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
> >[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
> >[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
> >[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
> >[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
> >[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
> >[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
> >[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
> >[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
> >[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
> >[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
> >[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
> >[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
> >[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
> >[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
> >[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
> >[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
> >[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
> >[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
> >[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
> >[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
> >[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
> >[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
> >[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
> >[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
> >[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
> >[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
> >[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
> >[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
> >[  

Re: console: lockup on boot

2014-05-30 Thread Peter Hurley

On 05/30/2014 09:11 AM, Sasha Levin wrote:

Hi all,

I sometime see lockups when booting my KVM guest with the latest -next kernel,
it basically hangs right when it should start 'init', and after a while I get
the following spew:

[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0


Maybe related to this report: https://lkml.org/lkml/2014/5/30/26
from Jet Chen which was bisected to

commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
Author: Jan Kara 
AuthorDate: Thu May 22 10:43:35 2014 +1000
Commit: Stephen Rothwell 
CommitDate: Thu May 22 10:43:35 2014 +1000

printk: enable interrupts before calling console_trylock_for_printk()
We need interrupts disabled when calling console_trylock_for_printk() 
only
so that cpu id we pass to can_use_console() remains valid (for other
things console_sem provides all the exclusion we need and deadlocks on
console_sem due to interrupts are impossible because we use
down_trylock()).  However if we are rescheduled, we are guaranteed to run
on an online cpu so we can easily just get the cpu id in
can_use_console().
We can lose a bit of performance when we enable interrupts in
vprintk_emit() and then disable them again in console_unlock() but OTOH it
can somewhat reduce interrupt latency caused by console_unlock()
especially since later in the patch series we will want to spin on
console_sem in console_trylock_for_printk().
Signed-off-by: Jan Kara 
Signed-off-by: Andrew Morton 

?


[   30.770667] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[   30.770667] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[  320.541585] kworker/dying (172) used greatest stack depth: 3656 bytes left
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  408.010589] INFO: task swapper/0:1 blocked for more than 200 seconds.
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
[  321.761422] BUG: spinlock lockup 

Re: console: lockup on boot

2014-05-30 Thread Peter Hurley

On 05/30/2014 09:11 AM, Sasha Levin wrote:

Hi all,

I sometime see lockups when booting my KVM guest with the latest -next kernel,
it basically hangs right when it should start 'init', and after a while I get
the following spew:

[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0


Maybe related to this report: https://lkml.org/lkml/2014/5/30/26
from Jet Chen which was bisected to

commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
Author: Jan Kara j...@suse.cz
AuthorDate: Thu May 22 10:43:35 2014 +1000
Commit: Stephen Rothwell s...@canb.auug.org.au
CommitDate: Thu May 22 10:43:35 2014 +1000

printk: enable interrupts before calling console_trylock_for_printk()
We need interrupts disabled when calling console_trylock_for_printk() 
only
so that cpu id we pass to can_use_console() remains valid (for other
things console_sem provides all the exclusion we need and deadlocks on
console_sem due to interrupts are impossible because we use
down_trylock()).  However if we are rescheduled, we are guaranteed to run
on an online cpu so we can easily just get the cpu id in
can_use_console().
We can lose a bit of performance when we enable interrupts in
vprintk_emit() and then disable them again in console_unlock() but OTOH it
can somewhat reduce interrupt latency caused by console_unlock()
especially since later in the patch series we will want to spin on
console_sem in console_trylock_for_printk().
Signed-off-by: Jan Kara j...@suse.cz
Signed-off-by: Andrew Morton a...@linux-foundation.org

?


[   30.770667] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[   30.770667] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[  320.541585] kworker/dying (172) used greatest stack depth: 3656 bytes left
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  408.010589] INFO: task swapper/0:1 blocked for more than 200 seconds.
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
[  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
[   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[  408.020558] BUG: spinlock lockup suspected 

Re: console: lockup on boot

2014-05-30 Thread Jan Kara
On Fri 30-05-14 09:58:14, Peter Hurley wrote:
 On 05/30/2014 09:11 AM, Sasha Levin wrote:
 Hi all,
 
 I sometime see lockups when booting my KVM guest with the latest -next 
 kernel,
 it basically hangs right when it should start 'init', and after a while I get
 the following spew:
 
 [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
 
 Maybe related to this report: https://lkml.org/lkml/2014/5/30/26
 from Jet Chen which was bisected to
 
 commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
 Author: Jan Kara j...@suse.cz
 AuthorDate: Thu May 22 10:43:35 2014 +1000
 Commit: Stephen Rothwell s...@canb.auug.org.au
 CommitDate: Thu May 22 10:43:35 2014 +1000
 
 printk: enable interrupts before calling console_trylock_for_printk()
 We need interrupts disabled when calling console_trylock_for_printk() 
 only
 so that cpu id we pass to can_use_console() remains valid (for other
 things console_sem provides all the exclusion we need and deadlocks on
 console_sem due to interrupts are impossible because we use
 down_trylock()).  However if we are rescheduled, we are guaranteed to run
 on an online cpu so we can easily just get the cpu id in
 can_use_console().
 We can lose a bit of performance when we enable interrupts in
 vprintk_emit() and then disable them again in console_unlock() but OTOH it
 can somewhat reduce interrupt latency caused by console_unlock()
 especially since later in the patch series we will want to spin on
 console_sem in console_trylock_for_printk().
 Signed-off-by: Jan Kara j...@suse.cz
 Signed-off-by: Andrew Morton a...@linux-foundation.org
 
 ?
  Yeah, very likely. I think I see the problem, I'll send the fix shortly.

Honza
 
 [   30.770667] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
 [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
 [   30.770667] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
 [  320.541585] kworker/dying (172) used greatest stack depth: 3656 bytes left
 [  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
 [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
 [  408.010589] INFO: task swapper/0:1 blocked for more than 200 seconds.
 [  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
 [  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
 [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
 [  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
 [  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
 [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
 [  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
 [  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
 [  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
 [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
 [  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
 [  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
 [  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
 [  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
 [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
 [  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
 [  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
 [  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
 [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
 [  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
 [  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
 [  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
 [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
 [  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
 [  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
 [  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
 [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
 [  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
 [  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
 [  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
 [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
 [  408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
 [  321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
 [  320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
 [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
 [  408.020558] BUG: spinlock lockup suspected on CPU#43,