Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-25 Thread Jan Kara
On Fri 22-11-13 15:27:11, Andrew Morton wrote:
> On Fri, 8 Nov 2013 11:21:13 +0100 Jan Kara  wrote:
> 
> > On Fri 08-11-13 00:46:49, Frederic Weisbecker wrote:
> > > On Thu, Nov 07, 2013 at 06:37:17PM -0500, Steven Rostedt wrote:
> > > > On Fri, 8 Nov 2013 00:21:51 +0100
> > > > Frederic Weisbecker  wrote:
> > > > > 
> > > > > Offloading to a workqueue would be perhaps better, and writing to the 
> > > > > serial
> > > > > console could then be done with interrupts enabled, preemptible 
> > > > > context, etc...
> > > > 
> > > > Oh God no ;-)  Adding workqueue logic into printk just spells a
> > > > nightmare of much more complexity for a critical kernel infrastructure.
> > > 
> > > But yeah that's scary, that means workqueues itself can't printk that 
> > > safely.
> > > So, you're right after all.
> >   Yeah, we've been there (that was actually my initial proposal). But
> > Andrew and Steven (rightfully) objected and suggested irq_work should be
> > used instead.
> 
> I still hate the patchset and so does everyone else, including you ;)
> There must be something smarter we can do.  Let's start by restating
> the problem:
> 
> CPU A is in printk, emitting log_buf characters to a slow device. 
> Meanwhile other CPUs come into printk(), see that the system is busy,
> dump their load into log_buf then scram, leaving CPU A to do even more
> work.
> 
> Correct so far?
  Yes, correct.

> If so, what is the role of local_irq_disabled() in this?  Did CPU A
> call printk() with local interrupts disabled, or is printk (or the
> console driver) causing the irqs-off condition?  Where and why is this
> IRQ disablement happening?
  So there are couple of places where we disable interrupts.
a) call_console_drivers() which does the printing to console is always
   called with interrupts disabled. Commonly it is called from
   console_unlock() which takes care of disabling interrupts. I presume
   this is because we want to guard against interrupts doing something
   unexpected with the console while we are printing to it. But I don't
   really understand console drivers to be sure...
b) vprintk_emit() (which is the function usually calling console_unlock())
   also disables interrupts to make updates of log_buf interrupt safe. It
   calls console_unlock() with interrupts disabled which seems to be
   unnecessary as that function takes care of disabling interrupts itself.
   It makes the situation somewhat worse because console_unlock() could
   otherwise enable interrupts from time to time. That being said I've
   tried to fix this shortcoming in previous versions of the patch set but
   it didn't seem to make a difference - maybe
 local_irq_restore(flags);
 spin_lock_irqsafe(&logbuf_lock, flags);
   which is what console_unlock() does, doesn't give APIC enough time to
   deliver blocked interrupts.
c) printk() itself is sometimes called with interrupts disabled. This
   happens a lot for example from sysrq handlers which is sometimes
   unpleasant (sysrq-s simply kills large machines) but not a primary
   concern for me. It doesn't seem to happen too often after an early boot
   is finished (in particular SCSI messages which make machines unbootable
   seem to be generated from kernel thread context). But there are some
   messages like this and if we are unlucky and we get caught in such
   printk, the machine dies. So I believe we have to reliably handle a
   situation when printk() itself gets called with interrupts disabled.
  
> Could we fix this problem by not permitting CPUs B, C and D to DoS CPU
> A?  When CPU B comes into printk() and sees that printk is busy, make
> CPU A hand over to CPU B and let CPU A get out of there?
  We could. In fact I was proposing this in
https://lkml.org/lkml/2013/9/5/329
  It has the advantage that we won't rely on irq work. If we changed
console_trylock() to console_lock() in console_trylock_for_printk() and
made console_unlock() only print the messages in log_buf on function entry,
it would even make things simpler but it would basically undo your change
from ages ago and I'm not sure about consequences. All printk()s could
suddently block much more since printk() would essentially become
completely synchronous.

We could try some more fancy compromise between current "completely async
printk" and ancient "completely synchronous printk" but then it gets more
complex and so far dependence on irq work seemed as a lesser evil to me.

Honza
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-22 Thread Andrew Morton
On Fri, 8 Nov 2013 11:21:13 +0100 Jan Kara  wrote:

> On Fri 08-11-13 00:46:49, Frederic Weisbecker wrote:
> > On Thu, Nov 07, 2013 at 06:37:17PM -0500, Steven Rostedt wrote:
> > > On Fri, 8 Nov 2013 00:21:51 +0100
> > > Frederic Weisbecker  wrote:
> > > > 
> > > > Offloading to a workqueue would be perhaps better, and writing to the 
> > > > serial
> > > > console could then be done with interrupts enabled, preemptible 
> > > > context, etc...
> > > 
> > > Oh God no ;-)  Adding workqueue logic into printk just spells a
> > > nightmare of much more complexity for a critical kernel infrastructure.
> > 
> > But yeah that's scary, that means workqueues itself can't printk that 
> > safely.
> > So, you're right after all.
>   Yeah, we've been there (that was actually my initial proposal). But
> Andrew and Steven (rightfully) objected and suggested irq_work should be
> used instead.

I still hate the patchset and so does everyone else, including you ;)
There must be something smarter we can do.  Let's start by restating
the problem:

CPU A is in printk, emitting log_buf characters to a slow device. 
Meanwhile other CPUs come into printk(), see that the system is busy,
dump their load into log_buf then scram, leaving CPU A to do even more
work.

Correct so far?

If so, what is the role of local_irq_disabled() in this?  Did CPU A
call printk() with local interrupts disabled, or is printk (or the
console driver) causing the irqs-off condition?  Where and why is this
IRQ disablement happening?


Could we fix this problem by not permitting CPUs B, C and D to DoS CPU
A?  When CPU B comes into printk() and sees that printk is busy, make
CPU A hand over to CPU B and let CPU A get out of there?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-16 Thread Pavel Machek
Hi!

> > > > > Reviewed-by: Steven Rostedt 
> > > > > Signed-off-by: Jan Kara 
> > > > 
> > > > When a message takes tens of seconds to be printed, it usually means
> > > > we are in trouble somehow :)
> > > > I wonder what printk source can trigger such a high volume.
> > >   Machines with tens of processors and thousands of scsi devices. When
> > > device discovery happens on boot, all processors are busily reporting new
> > > scsi devices and one poor looser is bound to do the printing for ever and
> > > ever until the machine dies...
> > 
> > Dunno. In these cases, would it make sense to:
> > 
> > 1) reduce amount of text printed
>   I thought about this as well. But
> a) It doesn't seem practical as you would have to modify lots of drivers
>and keep them rather silent. That seems rather fragile. Plus you will
>not display some potentially useful information.
> b) It doesn't address the real underlying problem that the way printk() is
>currently implemented, there is no bound on the time CPU spends in the
>loop printing from buffer to console. And the fact that this loop
>sometimes happens with interrupts disabled makes the situation even
>worse.
>  
> > 2) just print [XXX characters lost] on overruns?
>   We don't overrun the printk buffer so no characters are lost. It just
> takes too long to feed the whole printk buffer through serial console...

Yes, I know. No characters are lost, but we spend seconds with
interrupts disabled, breaking the system. (SCSI timeouts? We do
keyboard autorepeat in software these days...)

Would it be better to just drop the characters?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-11 Thread Jan Kara
On Mon 11-11-13 22:54:15, Pavel Machek wrote:
> Hi!
> 
> > > > A CPU can be caught in console_unlock() for a long time (tens of seconds
> > > > are reported by our customers) when other CPUs are using printk heavily
> > > > and serial console makes printing slow. Despite serial console drivers
> > > > are calling touch_nmi_watchdog() this triggers softlockup warnings
> > > > because interrupts are disabled for the whole time console_unlock() runs
> > > > (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> > > > IPIs cannot be processed and other CPUs get stuck spinning in calls like
> > > > smp_call_function_many(). Also RCU eventually starts reporting lockups.
> > > >
> > > > In my artifical testing I can also easily trigger a situation when disk
> > > > disappears from the system apparently because interrupt from it wasn't
> > > > served for too long. This is why just silencing watchdogs isn't a
> > > > reliable solution to the problem and we simply have to avoid spending
> > > > too long in console_unlock() with interrupts disabled.
> > > >
> > > > The solution this patch works toward is to postpone printing to a later
> > > > moment / different CPU when we already printed over X characters in
> > > > current console_unlock() invocation. This is a crude heuristic but
> > > > measuring time we spent printing doesn't seem to be really viable - we
> > > > cannot rely on high resolution time being available and with interrupts
> > > > disabled jiffies are not updated. User can tune the value X via
> > > > printk.offload_chars kernel parameter.
> > > >
> > > > Reviewed-by: Steven Rostedt 
> > > > Signed-off-by: Jan Kara 
> > > 
> > > When a message takes tens of seconds to be printed, it usually means
> > > we are in trouble somehow :)
> > > I wonder what printk source can trigger such a high volume.
> >   Machines with tens of processors and thousands of scsi devices. When
> > device discovery happens on boot, all processors are busily reporting new
> > scsi devices and one poor looser is bound to do the printing for ever and
> > ever until the machine dies...
> 
> Dunno. In these cases, would it make sense to:
> 
> 1) reduce amount of text printed
  I thought about this as well. But
a) It doesn't seem practical as you would have to modify lots of drivers
   and keep them rather silent. That seems rather fragile. Plus you will
   not display some potentially useful information.
b) It doesn't address the real underlying problem that the way printk() is
   currently implemented, there is no bound on the time CPU spends in the
   loop printing from buffer to console. And the fact that this loop
   sometimes happens with interrupts disabled makes the situation even
   worse.
 
> 2) just print [XXX characters lost] on overruns?
  We don't overrun the printk buffer so no characters are lost. It just
takes too long to feed the whole printk buffer through serial console...

Honza
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-11 Thread Pavel Machek
Hi!

> > > A CPU can be caught in console_unlock() for a long time (tens of seconds
> > > are reported by our customers) when other CPUs are using printk heavily
> > > and serial console makes printing slow. Despite serial console drivers
> > > are calling touch_nmi_watchdog() this triggers softlockup warnings
> > > because interrupts are disabled for the whole time console_unlock() runs
> > > (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> > > IPIs cannot be processed and other CPUs get stuck spinning in calls like
> > > smp_call_function_many(). Also RCU eventually starts reporting lockups.
> > >
> > > In my artifical testing I can also easily trigger a situation when disk
> > > disappears from the system apparently because interrupt from it wasn't
> > > served for too long. This is why just silencing watchdogs isn't a
> > > reliable solution to the problem and we simply have to avoid spending
> > > too long in console_unlock() with interrupts disabled.
> > >
> > > The solution this patch works toward is to postpone printing to a later
> > > moment / different CPU when we already printed over X characters in
> > > current console_unlock() invocation. This is a crude heuristic but
> > > measuring time we spent printing doesn't seem to be really viable - we
> > > cannot rely on high resolution time being available and with interrupts
> > > disabled jiffies are not updated. User can tune the value X via
> > > printk.offload_chars kernel parameter.
> > >
> > > Reviewed-by: Steven Rostedt 
> > > Signed-off-by: Jan Kara 
> > 
> > When a message takes tens of seconds to be printed, it usually means
> > we are in trouble somehow :)
> > I wonder what printk source can trigger such a high volume.
>   Machines with tens of processors and thousands of scsi devices. When
> device discovery happens on boot, all processors are busily reporting new
> scsi devices and one poor looser is bound to do the printing for ever and
> ever until the machine dies...

Dunno. In these cases, would it make sense to:

1) reduce amount of text printed

2) just print [XXX characters lost] on overruns?

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-08 Thread Jan Kara
On Fri 08-11-13 00:46:49, Frederic Weisbecker wrote:
> On Thu, Nov 07, 2013 at 06:37:17PM -0500, Steven Rostedt wrote:
> > On Fri, 8 Nov 2013 00:21:51 +0100
> > Frederic Weisbecker  wrote:
> > > 
> > > Offloading to a workqueue would be perhaps better, and writing to the 
> > > serial
> > > console could then be done with interrupts enabled, preemptible context, 
> > > etc...
> > 
> > Oh God no ;-)  Adding workqueue logic into printk just spells a
> > nightmare of much more complexity for a critical kernel infrastructure.
> 
> But yeah that's scary, that means workqueues itself can't printk that safely.
> So, you're right after all.
  Yeah, we've been there (that was actually my initial proposal). But
Andrew and Steven (rightfully) objected and suggested irq_work should be
used instead.

Honza
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-07 Thread Frederic Weisbecker
On Thu, Nov 07, 2013 at 06:37:17PM -0500, Steven Rostedt wrote:
> On Fri, 8 Nov 2013 00:21:51 +0100
> Frederic Weisbecker  wrote:
>  
> > Ok I see now.
> > 
> > But then this irq_work based solution won't work if, say, you run in full 
> > dynticks
> > mode. Also the hook on the timer interrupt is something that I wish we get 
> > rid
> > of on archs that can trigger self-IPIs.
> 
> Do we really want that? What about users that use LAZY? That is, the
> work isn't that important to trigger right now (added interrupt
> expense).

Ah right, I forgot that :)

> 
> > 
> > Notwithstanding it's going to have scalibility issues as irq work then 
> > converges
> > to a single list for unbound works.
> 
> Well, it doesn't seem to be something that would be called often. All
> CPUs checking a variable that is seldom changed should not have any
> scalability issues. Unless of course it happens to share a cache line
> with a variable that does change often.

Right, if it was printk specific code I wouldn't mind much in fact. But having 
that in
such a globally available API like irq work make me feel uncomfortable.

> 
> > 
> > Offloading to a workqueue would be perhaps better, and writing to the serial
> > console could then be done with interrupts enabled, preemptible context, 
> > etc...
> 
> Oh God no ;-)  Adding workqueue logic into printk just spells a
> nightmare of much more complexity for a critical kernel infrastructure.

True, in fact I was thinking about raising an irq work that enqueues a 
workqueue, circumvoluted right? ;)

But it would make it safe, as irq work can be enqueued everywhere, and as it 
seems that
we can have high bandwidth of stuffs to print to the serial device, a process 
context looks much saner to me.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-07 Thread Frederic Weisbecker
On Thu, Nov 07, 2013 at 06:37:17PM -0500, Steven Rostedt wrote:
> On Fri, 8 Nov 2013 00:21:51 +0100
> Frederic Weisbecker  wrote:
> > 
> > Offloading to a workqueue would be perhaps better, and writing to the serial
> > console could then be done with interrupts enabled, preemptible context, 
> > etc...
> 
> Oh God no ;-)  Adding workqueue logic into printk just spells a
> nightmare of much more complexity for a critical kernel infrastructure.

But yeah that's scary, that means workqueues itself can't printk that safely.
So, you're right after all.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-07 Thread Steven Rostedt
On Fri, 8 Nov 2013 00:21:51 +0100
Frederic Weisbecker  wrote:
 
> Ok I see now.
> 
> But then this irq_work based solution won't work if, say, you run in full 
> dynticks
> mode. Also the hook on the timer interrupt is something that I wish we get rid
> of on archs that can trigger self-IPIs.

Do we really want that? What about users that use LAZY? That is, the
work isn't that important to trigger right now (added interrupt
expense).

> 
> Notwithstanding it's going to have scalibility issues as irq work then 
> converges
> to a single list for unbound works.

Well, it doesn't seem to be something that would be called often. All
CPUs checking a variable that is seldom changed should not have any
scalability issues. Unless of course it happens to share a cache line
with a variable that does change often.

> 
> Offloading to a workqueue would be perhaps better, and writing to the serial
> console could then be done with interrupts enabled, preemptible context, 
> etc...

Oh God no ;-)  Adding workqueue logic into printk just spells a
nightmare of much more complexity for a critical kernel infrastructure.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-07 Thread Frederic Weisbecker
On Thu, Nov 07, 2013 at 11:57:33PM +0100, Jan Kara wrote:
> On Thu 07-11-13 23:43:52, Frederic Weisbecker wrote:
> > 2013/11/7 Jan Kara :
> > > A CPU can be caught in console_unlock() for a long time (tens of seconds
> > > are reported by our customers) when other CPUs are using printk heavily
> > > and serial console makes printing slow. Despite serial console drivers
> > > are calling touch_nmi_watchdog() this triggers softlockup warnings
> > > because interrupts are disabled for the whole time console_unlock() runs
> > > (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> > > IPIs cannot be processed and other CPUs get stuck spinning in calls like
> > > smp_call_function_many(). Also RCU eventually starts reporting lockups.
> > >
> > > In my artifical testing I can also easily trigger a situation when disk
> > > disappears from the system apparently because interrupt from it wasn't
> > > served for too long. This is why just silencing watchdogs isn't a
> > > reliable solution to the problem and we simply have to avoid spending
> > > too long in console_unlock() with interrupts disabled.
> > >
> > > The solution this patch works toward is to postpone printing to a later
> > > moment / different CPU when we already printed over X characters in
> > > current console_unlock() invocation. This is a crude heuristic but
> > > measuring time we spent printing doesn't seem to be really viable - we
> > > cannot rely on high resolution time being available and with interrupts
> > > disabled jiffies are not updated. User can tune the value X via
> > > printk.offload_chars kernel parameter.
> > >
> > > Reviewed-by: Steven Rostedt 
> > > Signed-off-by: Jan Kara 
> > 
> > When a message takes tens of seconds to be printed, it usually means
> > we are in trouble somehow :)
> > I wonder what printk source can trigger such a high volume.
>   Machines with tens of processors and thousands of scsi devices. When
> device discovery happens on boot, all processors are busily reporting new
> scsi devices and one poor looser is bound to do the printing for ever and
> ever until the machine dies...
> 
> Or try running sysrq-t on a large machine with serial console connected. The
> machine will die because of lockups (although in this case I agree it is more
> of a problem of sysrq-t doing lots of printing in interrupt-disabled
> context).
> 
> > May be cutting some huge message into smaller chunks could help? That
> > would re enable interrupts between each call.
> > 
> > It's hard to tell without the context, but using other CPUs for
> > rescuing doesn't look like a good solution. What if the issue happens
> > in UP to begin with?
>   The real trouble in practice is that while one cpu is doing printing,
> other cpus are appending to the printk buffer. So the cpu can be printing
> for a *long* time. So offloading the work to other cpus which are also
> appending messages seems as a fair thing to do.

Ok I see now.

But then this irq_work based solution won't work if, say, you run in full 
dynticks
mode. Also the hook on the timer interrupt is something that I wish we get rid
of on archs that can trigger self-IPIs.

Notwithstanding it's going to have scalibility issues as irq work then converges
to a single list for unbound works.

Offloading to a workqueue would be perhaps better, and writing to the serial
console could then be done with interrupts enabled, preemptible context, etc...

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-07 Thread Steven Rostedt
On Thu, 7 Nov 2013 23:43:52 +0100
Frederic Weisbecker  wrote:


> When a message takes tens of seconds to be printed, it usually means
> we are in trouble somehow :)
> I wonder what printk source can trigger such a high volume.

The only ones that I'm aware of is the prints from sysrq. Which
includes a ftrace_dump as well. These have caused issues in the past,
but usually I do them on boxes that I reboot afterward anyway.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-07 Thread Jan Kara
On Thu 07-11-13 23:43:52, Frederic Weisbecker wrote:
> 2013/11/7 Jan Kara :
> > A CPU can be caught in console_unlock() for a long time (tens of seconds
> > are reported by our customers) when other CPUs are using printk heavily
> > and serial console makes printing slow. Despite serial console drivers
> > are calling touch_nmi_watchdog() this triggers softlockup warnings
> > because interrupts are disabled for the whole time console_unlock() runs
> > (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> > IPIs cannot be processed and other CPUs get stuck spinning in calls like
> > smp_call_function_many(). Also RCU eventually starts reporting lockups.
> >
> > In my artifical testing I can also easily trigger a situation when disk
> > disappears from the system apparently because interrupt from it wasn't
> > served for too long. This is why just silencing watchdogs isn't a
> > reliable solution to the problem and we simply have to avoid spending
> > too long in console_unlock() with interrupts disabled.
> >
> > The solution this patch works toward is to postpone printing to a later
> > moment / different CPU when we already printed over X characters in
> > current console_unlock() invocation. This is a crude heuristic but
> > measuring time we spent printing doesn't seem to be really viable - we
> > cannot rely on high resolution time being available and with interrupts
> > disabled jiffies are not updated. User can tune the value X via
> > printk.offload_chars kernel parameter.
> >
> > Reviewed-by: Steven Rostedt 
> > Signed-off-by: Jan Kara 
> 
> When a message takes tens of seconds to be printed, it usually means
> we are in trouble somehow :)
> I wonder what printk source can trigger such a high volume.
  Machines with tens of processors and thousands of scsi devices. When
device discovery happens on boot, all processors are busily reporting new
scsi devices and one poor looser is bound to do the printing for ever and
ever until the machine dies...

Or try running sysrq-t on a large machine with serial console connected. The
machine will die because of lockups (although in this case I agree it is more
of a problem of sysrq-t doing lots of printing in interrupt-disabled
context).

> May be cutting some huge message into smaller chunks could help? That
> would re enable interrupts between each call.
> 
> It's hard to tell without the context, but using other CPUs for
> rescuing doesn't look like a good solution. What if the issue happens
> in UP to begin with?
  The real trouble in practice is that while one cpu is doing printing,
other cpus are appending to the printk buffer. So the cpu can be printing
for a *long* time. So offloading the work to other cpus which are also
appending messages seems as a fair thing to do.

Honza
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-07 Thread Frederic Weisbecker
2013/11/7 Jan Kara :
> A CPU can be caught in console_unlock() for a long time (tens of seconds
> are reported by our customers) when other CPUs are using printk heavily
> and serial console makes printing slow. Despite serial console drivers
> are calling touch_nmi_watchdog() this triggers softlockup warnings
> because interrupts are disabled for the whole time console_unlock() runs
> (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> IPIs cannot be processed and other CPUs get stuck spinning in calls like
> smp_call_function_many(). Also RCU eventually starts reporting lockups.
>
> In my artifical testing I can also easily trigger a situation when disk
> disappears from the system apparently because interrupt from it wasn't
> served for too long. This is why just silencing watchdogs isn't a
> reliable solution to the problem and we simply have to avoid spending
> too long in console_unlock() with interrupts disabled.
>
> The solution this patch works toward is to postpone printing to a later
> moment / different CPU when we already printed over X characters in
> current console_unlock() invocation. This is a crude heuristic but
> measuring time we spent printing doesn't seem to be really viable - we
> cannot rely on high resolution time being available and with interrupts
> disabled jiffies are not updated. User can tune the value X via
> printk.offload_chars kernel parameter.
>
> Reviewed-by: Steven Rostedt 
> Signed-off-by: Jan Kara 

When a message takes tens of seconds to be printed, it usually means
we are in trouble somehow :)
I wonder what printk source can trigger such a high volume.

May be cutting some huge message into smaller chunks could help? That
would re enable interrupts between each call.

It's hard to tell without the context, but using other CPUs for
rescuing doesn't look like a good solution. What if the issue happens
in UP to begin with?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-11-07 Thread Jan Kara
A CPU can be caught in console_unlock() for a long time (tens of seconds
are reported by our customers) when other CPUs are using printk heavily
and serial console makes printing slow. Despite serial console drivers
are calling touch_nmi_watchdog() this triggers softlockup warnings
because interrupts are disabled for the whole time console_unlock() runs
(e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
IPIs cannot be processed and other CPUs get stuck spinning in calls like
smp_call_function_many(). Also RCU eventually starts reporting lockups.

In my artifical testing I can also easily trigger a situation when disk
disappears from the system apparently because interrupt from it wasn't
served for too long. This is why just silencing watchdogs isn't a
reliable solution to the problem and we simply have to avoid spending
too long in console_unlock() with interrupts disabled.

The solution this patch works toward is to postpone printing to a later
moment / different CPU when we already printed over X characters in
current console_unlock() invocation. This is a crude heuristic but
measuring time we spent printing doesn't seem to be really viable - we
cannot rely on high resolution time being available and with interrupts
disabled jiffies are not updated. User can tune the value X via
printk.offload_chars kernel parameter.

Reviewed-by: Steven Rostedt 
Signed-off-by: Jan Kara 
---
 Documentation/kernel-parameters.txt | 17 ++
 kernel/printk/printk.c  | 68 +++--
 2 files changed, 75 insertions(+), 10 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index fcbb736d55fe..579815bd90eb 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2540,6 +2540,23 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
Format:   (1/Y/y=enable, 0/N/n=disable)
default: disabled
 
+   printk.offload_chars=
+   Printing to console can be relatively slow especially
+   in case of serial console. When there is intensive
+   printing happening from several cpus (as is the case
+   during boot), a cpu can be spending significant time
+   (seconds or more) doing printing. To avoid softlockups,
+   lost interrupts, and similar problems a cpu stops
+   printing to console after printing
+   'printk.offload_chars' characters.  Another cpu will
+   then continue printing. Higher value means possibly
+   longer interrupt and other latencies but lower latency
+   of printing and lower chance something goes wrong
+   during system crash and important message is not
+   printed.
+   Format: 
+   default: 0 (disabled)
+
printk.time=Show timing data prefixed to each printk message line
Format:   (1/Y/y=enable, 0/N/n=disable)
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 572fb1f438b6..35bb70ea6427 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -253,6 +253,18 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/*
+ * How much characters can we print in one call of printk before offloading
+ * printing work (0 means infinity). Tunable via
+ * /proc/sys/kernel/printk_offload_chars.
+ */
+static unsigned int __read_mostly printk_offload_chars = 0;
+
+module_param_named(offload_chars, printk_offload_chars, uint,
+  S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(offload_chars, "offload printing to console to a different"
+   " cpu after this number of characters");
+
 /* cpu currently holding logbuf_lock */
 static volatile unsigned int logbuf_cpu = UINT_MAX;
 
@@ -2026,31 +2038,43 @@ out:
raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 }
 
+/* Should we stop printing on this cpu? */
+static bool cpu_stop_printing(int printed_chars)
+{
+   /* Oops? Print everything now to maximize chances user will see it */
+   if (oops_in_progress)
+   return false;
+   return printk_offload_chars && printed_chars > printk_offload_chars;
+}
+
 /**
- * console_unlock - unlock the console system
+ * __console_unlock - unlock the console system
  *
  * Releases the console_lock which the caller holds on the console system
  * and the console driver list.
  *
- * While the console_lock was held, console output may have been buffered
- * by printk().  If this is the case, console_unlock(); emits
- * the output prior to releasing the lock.
+ * While the console_lock was held, console output may have been buffered by
+ * printk(). If thi

Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-08-21 Thread Steven Rostedt
On Wed, 21 Aug 2013 10:08:31 +0200
Jan Kara  wrote:

> A CPU can be caught in console_unlock() for a long time (tens of seconds
> are reported by our customers) when other CPUs are using printk heavily
> and serial console makes printing slow. Despite serial console drivers
> are calling touch_nmi_watchdog() this triggers softlockup warnings
> because interrupts are disabled for the whole time console_unlock() runs
> (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> IPIs cannot be processed and other CPUs get stuck spinning in calls like
> smp_call_function_many(). Also RCU eventually starts reporting lockups.
> 
> In my artifical testing I can also easily trigger a situation when disk
> disappears from the system apparently because commands to / from it
> could not be delivered for long enough. This is why just silencing
> watchdogs isn't a reliable solution to the problem and we simply have to
> avoid spending too long in console_unlock() with interrupts disabled.
> 
> The solution this patch works toward is to postpone printing to a later
> moment / different CPU when we already printed over 1000 characters in
> current console_unlock() invocation. This is a crude heuristic but
> measuring time we spent printing doesn't seem to be really viable - we
> cannot rely on high resolution time being available and with interrupts
> disabled jiffies are not updated. The value 1000 was chosen so that
> things are still bearable with 9600 baud serial console and OTOH it
> shouldn't cause offloading of printing in common cases.
> 

BTW, you can also test by running:

 # cd /sys/kernel/debug/tracing
 # echo function > current_tracer
 # sleep 1
 # echo z > /proc/sysrq-trigger


> Signed-off-by: Jan Kara 

Reviewed-by: Steven Rostedt 

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-08-21 Thread Jan Kara
A CPU can be caught in console_unlock() for a long time (tens of seconds
are reported by our customers) when other CPUs are using printk heavily
and serial console makes printing slow. Despite serial console drivers
are calling touch_nmi_watchdog() this triggers softlockup warnings
because interrupts are disabled for the whole time console_unlock() runs
(e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
IPIs cannot be processed and other CPUs get stuck spinning in calls like
smp_call_function_many(). Also RCU eventually starts reporting lockups.

In my artifical testing I can also easily trigger a situation when disk
disappears from the system apparently because commands to / from it
could not be delivered for long enough. This is why just silencing
watchdogs isn't a reliable solution to the problem and we simply have to
avoid spending too long in console_unlock() with interrupts disabled.

The solution this patch works toward is to postpone printing to a later
moment / different CPU when we already printed over 1000 characters in
current console_unlock() invocation. This is a crude heuristic but
measuring time we spent printing doesn't seem to be really viable - we
cannot rely on high resolution time being available and with interrupts
disabled jiffies are not updated. The value 1000 was chosen so that
things are still bearable with 9600 baud serial console and OTOH it
shouldn't cause offloading of printing in common cases.

Signed-off-by: Jan Kara 
---
 Documentation/sysctl/kernel.txt | 19 +
 include/linux/printk.h  |  1 +
 kernel/printk/printk.c  | 62 ++---
 kernel/sysctl.c |  8 ++
 4 files changed, 80 insertions(+), 10 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index ab7d16e..0967bcc 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -57,6 +57,7 @@ show up in /proc/sys/kernel:
 - printk_delay
 - printk_ratelimit
 - printk_ratelimit_burst
+- printk_offload_chars
 - randomize_va_space
 - real-root-dev   ==> Documentation/initrd.txt
 - reboot-cmd  [ SPARC only ]
@@ -524,6 +525,24 @@ send before ratelimiting kicks in.
 
 ==
 
+printk_offload_chars:
+
+Printing to console can be relatively slow especially in case of
+serial console. When there is intensive printing happening from
+several cpus (as is the case during boot), a cpu can be spending
+significant time (seconds or more) doing printing. To avoid
+softlockups, lost interrupts, and similar problems a cpu stops
+printing to console after printing 'printk_offload_chars' characters.
+Another cpu will then continue printing. Higher value means possibly
+longer interrupt and other latencies but lower latency of printing
+and lower chance something goes wrong during system crash and
+important message is not printed.
+
+By default we offload printing from a cpu after printing 1000
+characters. Setting printk_offload_chars to 0 disables offloading.
+
+==
+
 randomize_va_space:
 
 This option can be used to select the type of process address
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 22c7052..135d958 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -141,6 +141,7 @@ extern bool printk_timed_ratelimit(unsigned long 
*caller_jiffies,
 extern int printk_delay_msec;
 extern int dmesg_restrict;
 extern int kptr_restrict;
+extern int printk_offload_chars;
 
 extern void wake_up_klogd(void);
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9208e17..e97971b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -253,6 +253,12 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/*
+ * How much characters can we print in one call of printk before offloading
+ * printing work. Tunable via /proc/sys/kernel/printk_offload_chars.
+ */
+int __read_mostly printk_offload_chars = 1000;
+
 /* cpu currently holding logbuf_lock */
 static volatile unsigned int logbuf_cpu = UINT_MAX;
 
@@ -2026,31 +2032,43 @@ out:
raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 }
 
+/* Should we stop printing on this cpu? */
+static bool cpu_stop_printing(int printed_chars)
+{
+   /* Oops? Print everything now to maximize chances user will see it */
+   if (oops_in_progress)
+   return false;
+   return printk_offload_chars && printed_chars > printk_offload_chars;
+}
+
 /**
- * console_unlock - unlock the console system
+ * __console_unlock - unlock the console system
  *
  * Releases the console_lock which the caller holds on the console system
  * and the console driver list.
  *
- * While the console_lock was held, console output may have been buffered
- * by printk().

Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-08-15 Thread Steven Rostedt
On Thu, 15 Aug 2013 09:52:51 +0200
Jan Kara  wrote:

 
> > > +/*
> > > + * How much characters can we print in one call of printk before 
> > > offloading
> > > + * printing work
> > > + */
> > > +#define MAX_PRINTK_CHARS 1000
> > 
> > This really needs to be configurable. 1000 may be good for you, but may
> > not be good for others. Heck, we should be able to disable this
> > offloading too.
> > 
> > This should probably be added to the printk /proc/sys/kernel/... files.
> > 
> > If it is set to zero, then disable any offloading.
>   OK, I can easily do this. I'm somewhat skeptical people would want to
> tune this but maybe they would.
> 

Most people wouldn't. But as this is a new heuristic, if things are not
behaving as people expect, then it would be good to be able to modify
that heuristic to see if that was the issue. Including, disabling it.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-08-15 Thread Jan Kara
On Wed 14-08-13 21:38:18, Steven Rostedt wrote:
> On Wed, 14 Aug 2013 15:28:27 +0200
> Jan Kara  wrote:
> 
> > A CPU can be caught in console_unlock() for a long time (tens of seconds
> > are reported by our customers) when other CPUs are using printk heavily
> > and serial console makes printing slow. Despite serial console drivers
> > are calling touch_nmi_watchdog() this triggers softlockup warnings
> > because interrupts are disabled for the whole time console_unlock() runs
> > (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> > IPIs cannot be processed and other CPUs get stuck spinning in calls like
> > smp_call_function_many(). Also RCU eventually starts reporting lockups.
> > 
> > In my artifical testing I can also easily trigger a situation when disk
> > disappears from the system apparently because commands to / from it
> > could not be delivered for long enough. This is why just silencing
> > watchdogs isn't a reliable solution to the problem and we simply have to
> > avoid spending too long in console_unlock() with interrupts disabled.
> > 
> > The solution this patch works toward is to postpone printing to a later
> > moment / different CPU when we already printed over 1000 characters in
> > current console_unlock() invocation. This is a crude heuristic but
> > measuring time we spent printing doesn't seem to be really viable - we
> > cannot rely on high resolution time being available and with interrupts
> > disabled jiffies are not updated. The value 1000 was chosen so that
> > things are still bearable with 9600 baud serial console and OTOH it
> > shouldn't cause offloading of printing in common cases.
> > 
> > Signed-off-by: Jan Kara 
> > ---
> >  kernel/printk/printk.c | 62 
> > ++
> >  1 file changed, 52 insertions(+), 10 deletions(-)
> > 
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 9208e17..28c220a 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2026,31 +2026,49 @@ out:
> > raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> >  }
> >  
> > +/*
> > + * How much characters can we print in one call of printk before offloading
> > + * printing work
> > + */
> > +#define MAX_PRINTK_CHARS 1000
> 
> This really needs to be configurable. 1000 may be good for you, but may
> not be good for others. Heck, we should be able to disable this
> offloading too.
> 
> This should probably be added to the printk /proc/sys/kernel/... files.
> 
> If it is set to zero, then disable any offloading.
  OK, I can easily do this. I'm somewhat skeptical people would want to
tune this but maybe they would.

Honza

> > +
> > +/* Should we stop printing on this cpu? */
> > +static bool cpu_stop_printing(int printed_chars)
> > +{
> > +   /* Oops? Print everything now to maximize chances user will see it */
> > +   if (oops_in_progress)
> > +   return false;
> > +   return printed_chars > MAX_PRINTK_CHARS;
> > +}
> > +
> >  /**
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-08-14 Thread Steven Rostedt
On Wed, 14 Aug 2013 15:28:27 +0200
Jan Kara  wrote:

> A CPU can be caught in console_unlock() for a long time (tens of seconds
> are reported by our customers) when other CPUs are using printk heavily
> and serial console makes printing slow. Despite serial console drivers
> are calling touch_nmi_watchdog() this triggers softlockup warnings
> because interrupts are disabled for the whole time console_unlock() runs
> (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
> IPIs cannot be processed and other CPUs get stuck spinning in calls like
> smp_call_function_many(). Also RCU eventually starts reporting lockups.
> 
> In my artifical testing I can also easily trigger a situation when disk
> disappears from the system apparently because commands to / from it
> could not be delivered for long enough. This is why just silencing
> watchdogs isn't a reliable solution to the problem and we simply have to
> avoid spending too long in console_unlock() with interrupts disabled.
> 
> The solution this patch works toward is to postpone printing to a later
> moment / different CPU when we already printed over 1000 characters in
> current console_unlock() invocation. This is a crude heuristic but
> measuring time we spent printing doesn't seem to be really viable - we
> cannot rely on high resolution time being available and with interrupts
> disabled jiffies are not updated. The value 1000 was chosen so that
> things are still bearable with 9600 baud serial console and OTOH it
> shouldn't cause offloading of printing in common cases.
> 
> Signed-off-by: Jan Kara 
> ---
>  kernel/printk/printk.c | 62 
> ++
>  1 file changed, 52 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 9208e17..28c220a 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2026,31 +2026,49 @@ out:
>   raw_spin_unlock_irqrestore(&logbuf_lock, flags);
>  }
>  
> +/*
> + * How much characters can we print in one call of printk before offloading
> + * printing work
> + */
> +#define MAX_PRINTK_CHARS 1000

This really needs to be configurable. 1000 may be good for you, but may
not be good for others. Heck, we should be able to disable this
offloading too.

This should probably be added to the printk /proc/sys/kernel/... files.

If it is set to zero, then disable any offloading.

-- Steve


> +
> +/* Should we stop printing on this cpu? */
> +static bool cpu_stop_printing(int printed_chars)
> +{
> + /* Oops? Print everything now to maximize chances user will see it */
> + if (oops_in_progress)
> + return false;
> + return printed_chars > MAX_PRINTK_CHARS;
> +}
> +
>  /**
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/4] printk: Defer printing to irq work when we printed too much

2013-08-14 Thread Jan Kara
A CPU can be caught in console_unlock() for a long time (tens of seconds
are reported by our customers) when other CPUs are using printk heavily
and serial console makes printing slow. Despite serial console drivers
are calling touch_nmi_watchdog() this triggers softlockup warnings
because interrupts are disabled for the whole time console_unlock() runs
(e.g. vprintk() calls console_unlock() with interrupts disabled). Thus
IPIs cannot be processed and other CPUs get stuck spinning in calls like
smp_call_function_many(). Also RCU eventually starts reporting lockups.

In my artifical testing I can also easily trigger a situation when disk
disappears from the system apparently because commands to / from it
could not be delivered for long enough. This is why just silencing
watchdogs isn't a reliable solution to the problem and we simply have to
avoid spending too long in console_unlock() with interrupts disabled.

The solution this patch works toward is to postpone printing to a later
moment / different CPU when we already printed over 1000 characters in
current console_unlock() invocation. This is a crude heuristic but
measuring time we spent printing doesn't seem to be really viable - we
cannot rely on high resolution time being available and with interrupts
disabled jiffies are not updated. The value 1000 was chosen so that
things are still bearable with 9600 baud serial console and OTOH it
shouldn't cause offloading of printing in common cases.

Signed-off-by: Jan Kara 
---
 kernel/printk/printk.c | 62 ++
 1 file changed, 52 insertions(+), 10 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9208e17..28c220a 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2026,31 +2026,49 @@ out:
raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 }
 
+/*
+ * How much characters can we print in one call of printk before offloading
+ * printing work
+ */
+#define MAX_PRINTK_CHARS 1000
+
+/* Should we stop printing on this cpu? */
+static bool cpu_stop_printing(int printed_chars)
+{
+   /* Oops? Print everything now to maximize chances user will see it */
+   if (oops_in_progress)
+   return false;
+   return printed_chars > MAX_PRINTK_CHARS;
+}
+
 /**
- * console_unlock - unlock the console system
+ * __console_unlock - unlock the console system
  *
  * Releases the console_lock which the caller holds on the console system
  * and the console driver list.
  *
- * While the console_lock was held, console output may have been buffered
- * by printk().  If this is the case, console_unlock(); emits
- * the output prior to releasing the lock.
+ * While the console_lock was held, console output may have been buffered by
+ * printk(). If this is the case, __console_unlock() emits the output prior to
+ * releasing the lock. However we need not write all the data in the buffer if
+ * we would hog the CPU for too long. Function returns true, if there's more
+ * data that needs printing in the buffer.
  *
  * If there is output waiting, we wake /dev/kmsg and syslog() users.
  *
- * console_unlock(); may be called from any context.
+ * __console_unlock(); may be called from any context.
  */
-void console_unlock(void)
+static bool __console_unlock(void)
 {
static char text[LOG_LINE_MAX + PREFIX_MAX];
static u64 seen_seq;
unsigned long flags;
bool wake_klogd = false;
bool retry;
+   int printed_chars = 0;
 
if (console_suspended) {
up(&console_sem);
-   return;
+   return false;
}
 
console_may_schedule = 0;
@@ -2078,6 +2096,8 @@ again:
 skip:
if (console_seq == log_next_seq)
break;
+   if (cpu_stop_printing(printed_chars))
+   break;
 
msg = log_from_idx(console_idx);
if (msg->flags & LOG_NOCONS) {
@@ -2108,6 +2128,7 @@ skip:
stop_critical_timings();/* don't trace print latency */
call_console_drivers(level, text, len);
start_critical_timings();
+   printed_chars += len;
local_irq_restore(flags);
}
console_locked = 0;
@@ -2131,13 +2152,20 @@ skip:
retry = console_seq != log_next_seq;
raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 
-   if (retry && console_trylock())
-   goto again;
+   if (retry && !cpu_stop_printing(printed_chars)) {
+   if (console_trylock())
+   goto again;
+   /*
+* Someone else is printing so the caller doesn't have to
+* schedule irq work
+*/
+   retry = false;
+   }
 
if (wake_klogd)
wake_up_klogd();
+   return retry;
 }
-EXPORT_SYMBOL(console_unlock);
 
 /**
  * console_conditional_schedule - yield the CPU