Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2018-01-11 Thread Tetsuo Handa
Tejun Heo wrote:
> On Wed, Jan 10, 2018 at 07:08:30AM +0900, Tetsuo Handa wrote:
> > > * Netconsole tries to send out OOM messages and tries memory
> > >   allocation which fails which then prints allocation failed messages.
> > >   Because this happens while already printing, it just queues the
> > >   messages to the buffer.  This repeats.
> > 
> > What? Does netconsole need to allocate memory when sending? I assume it 
> > does not.
> 
> A lot of network drivers do, unfortunately.
> 

Excuse me, but can you show me an example of such traces?

Any path that is called from printk() must not (directly or indirectly)
depend on __GFP_DIRECT_RECLAIM && !__GFP_NORETRY memory allocation.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/page_alloc.c?id=e746bf730a76fe53b82c9e6b6da72d58e9ae3565
If it depends on such memory allocation, it can cause an OOM lockup.
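
To illustrate the rule, here is a hypothetical driver-side sketch (not taken
from any real netconsole driver): a transmit path reachable from printk()
should allocate with GFP_ATOMIC (no __GFP_DIRECT_RECLAIM) and simply drop the
message on failure, instead of using GFP_KERNEL.

static int example_console_tx(const char *msg, unsigned int len)
{
	struct sk_buff *skb;

	/*
	 * OK on a printk() path: fails fast, never enters direct reclaim.
	 * Using GFP_KERNEL here instead would imply __GFP_DIRECT_RECLAIM
	 * and could recurse into reclaim/OOM, which itself printk()s.
	 */
	skb = alloc_skb(len, GFP_ATOMIC | __GFP_NOWARN);
	if (!skb)
		return -ENOMEM;		/* drop the message, do not retry */

	skb_put_data(skb, msg, len);
	/* ... hand the skb to the NIC; transmit is elided in this sketch ... */
	kfree_skb(skb);
	return 0;
}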


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2018-01-10 Thread Tejun Heo
Hello, Steven.

On Wed, Jan 10, 2018 at 02:18:27AM -0500, Steven Rostedt wrote:
> My point is that your test is only hammering at a single CPU. You say
> it is the scenario you see, which means that the OOM is printing out
> more than it should, because if it prints it out once, it should not
> print it out again for the same process, or go into a loop doing it
> over and over on a single CPU. That would be a bug in the
> implementation.

That's not what's happening.  You're not actually reading what I'm
writing.  Can you please go back and re-read the scenario I've been
describing over and over again.

Thanks.

-- 
tejun


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2018-01-09 Thread Steven Rostedt
On Tue, 9 Jan 2018 14:53:56 -0800
Tejun Heo  wrote:

> Hello, Steven.
> 
> On Tue, Jan 09, 2018 at 05:47:50PM -0500, Steven Rostedt wrote:
> > > Maybe it can break out eventually but that can take a really long
> > > time.  It's OOM.  Most of userland is waiting for reclaim.  There
> > > isn't all that much going on outside that and there can only be one
> > > CPU which is OOMing.  The kernel isn't gonna be all that chatty.  
> > 
> > Are you saying that the OOM is stuck printing over and over on a single
> > CPU. Perhaps we should fix THAT.  
> 
> I'm not sure what you meant but OOM code isn't doing anything bad

My point is that your test is only hammering at a single CPU. You say
it is the scenario you see, which means that the OOM is printing out
more than it should, because if it prints it out once, it should not
print it out again for the same process, or go into a loop doing it
over and over on a single CPU. That would be a bug in the
implementation.

> other than excluding others from doing OOM kills simultaneously, which
> is what we want, and printing a lot of messages, and then it gets caught
> up in a positive feedback loop.
> 
> To me, the whole point of this effort is preventing printk messages
> from causing significant or critical disruptions to overall system
> operation.

I agree, and my patch helps with this tremendously, if we are not doing
something stupid like printk thousands of times in an interrupt
handler, over and over on a single CPU.

>  IOW, it's rather dumb if the machine goes down because
> somebody printk'd wrong or just failed to foresee the combinations of
> events which could lead to such conditions.

I'd still like to see a trace of a real situation.

> 
> It's not like we don't know how to fix this either.

But we don't want the fix to introduce regressions, and offloading
printk does. Heck, the current fixes to printk have already caused issues
for me in my own debugging. For example, we can no longer do large printk
dumps from NMI context, which I used to do when detecting a lockup and
then dumping the task list of all tasks, or even an ftrace_dump_on_oops.

http://lkml.kernel.org/r/20180109162019.gl3...@hirez.programming.kicks-ass.net


-- Steve


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2018-01-09 Thread Tejun Heo
Hello, Steven.

On Tue, Jan 09, 2018 at 05:47:50PM -0500, Steven Rostedt wrote:
> > Maybe it can break out eventually but that can take a really long
> > time.  It's OOM.  Most of userland is waiting for reclaim.  There
> > isn't all that much going on outside that and there can only be one
> > CPU which is OOMing.  The kernel isn't gonna be all that chatty.
> 
> Are you saying that the OOM is stuck printing over and over on a single
> CPU. Perhaps we should fix THAT.

I'm not sure what you meant but OOM code isn't doing anything bad
other than excluding others from doing OOM kills simultaneously, which
is what we want, and printing a lot of messages, and then it gets caught
up in a positive feedback loop.

To me, the whole point of this effort is preventing printk messages
from causing significant or critical disruptions to overall system
operation.  IOW, it's rather dumb if the machine goes down because
somebody printk'd wrong or just failed to foresee the combinations of
events which could lead to such conditions.

It's not like we don't know how to fix this either.

Thanks.

-- 
tejun


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2018-01-09 Thread Steven Rostedt
On Tue, 9 Jan 2018 14:17:05 -0800
Tejun Heo  wrote:

> Hello, Steven.
> 
> On Tue, Jan 09, 2018 at 05:08:47PM -0500, Steven Rostedt wrote:
> > The scenario you listed would affect multiple CPUs and multiple CPUs
> > would be flooding printk. In that case my patch WILL help. Because the
> > current method, the first CPU to do the printk will get stuck doing the
> > printk for ALL OTHER CPUs. With my patch, the printk load will migrate
> > around and there will not be a single CPU that is stuck.  
> 
> Maybe it can break out eventually but that can take a really long
> time.  It's OOM.  Most of userland is waiting for reclaim.  There
> isn't all that much going on outside that and there can only be one
> CPU which is OOMing.  The kernel isn't gonna be all that chatty.

Are you saying that the OOM is stuck printing over and over on a single
CPU. Perhaps we should fix THAT.

-- Steve


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2018-01-09 Thread Tejun Heo
On Wed, Jan 10, 2018 at 07:08:30AM +0900, Tetsuo Handa wrote:
> > * Netconsole tries to send out OOM messages and tries memory
> >   allocation which fails which then prints allocation failed messages.
> >   Because this happens while already printing, it just queues the
> >   messages to the buffer.  This repeats.
> 
> What? Does netconsole need to allocate memory when sending? I assume it does 
> not.

A lot of network drivers do, unfortunately.

Thanks.

-- 
tejun


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2018-01-09 Thread Tejun Heo
Hello, Steven.

On Tue, Jan 09, 2018 at 05:08:47PM -0500, Steven Rostedt wrote:
> The scenario you listed would affect multiple CPUs and multiple CPUs
> would be flooding printk. In that case my patch WILL help. Because the
> current method, the first CPU to do the printk will get stuck doing the
> printk for ALL OTHER CPUs. With my patch, the printk load will migrate
> around and there will not be a single CPU that is stuck.

Maybe it can break out eventually but that can take a really long
time.  It's OOM.  Most of userland is waiting for reclaim.  There
isn't all that much going on outside that and there can only be one
CPU which is OOMing.  The kernel isn't gonna be all that chatty.

Thanks.

-- 
tejun


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2018-01-09 Thread Steven Rostedt
On Tue, 9 Jan 2018 12:06:20 -0800
Tejun Heo  wrote:


> What's happening is that the OOM killer is trapped flushing printk
> failing to clear the memory condition and that leads irq / softirq
> contexts to produce messages faster than can be flushed.  I don't see
> how we'd be able to clear the condition without introducing an
> independent context to flush the ring buffer.
> 
> Again, this is an actual problem that we've been seeing fairly
> regularly in production machines.

But your test case is pinned to a single CPU. You have a work queue
that does a printk and triggers an timer interrupt to go off on that
same CPU. Then the timer interrupt does a 10,000 printks, over and over
on the same CPU. Of course that will be an issue, and it is NOT similar
to the scenario that you listed above.

The scenario you listed would affect multiple CPUs and multiple CPUs
would be flooding printk. In that case my patch WILL help. Because the
current method, the first CPU to do the printk will get stuck doing the
printk for ALL OTHER CPUs. With my patch, the printk load will migrate
around and there will not be a single CPU that is stuck.

So no, your test is not realistic.

-- Steve


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2018-01-09 Thread Tetsuo Handa
Tejun Heo wrote:
> The code might suck but I think this does replicate what we've been
> seeing regularly in the fleet.  The console side is pretty slow - IPMI
> faithfully emulating serial console.  I don't know whether it's doing 115200
> or even slower.  Please consider something like the following.

Emulated serial consoles tend to be slow.

> 
> * The kernel isn't preemptible.  Machine runs out of memory, hits OOM
>   condition.  It starts printing OOM information.
> 
> * Netconsole tries to send out OOM messages and tries memory
>   allocation which fails which then prints allocation failed messages.
>   Because this happens while already printing, it just queues the
>   messages to the buffer.  This repeats.

What? Does netconsole need to allocate memory when sending? I assume it does 
not.

> 
> * We're still in the middle of OOM and haven't killed anybody, so memory
>   keeps being short and the printk ring buffer is continuously getting
>   filled by the above.  Also, after a bit, RCU stall warnings kick in
>   too producing more messages.

And mutex_trylock(&oom_lock) keeps wasting CPU. :-(


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2018-01-09 Thread Tejun Heo
Hello, Steven.

My apologies for the late reply.  Was traveling and then got sick.

On Thu, Dec 21, 2017 at 11:19:32PM -0500, Steven Rostedt wrote:
> You don't think handing off printks to an offloaded thread isn't more
> complex nor can it cause even more issues (like more likely to lose
> relevant information on kernel crashes)?

Sergey's patch seems more complex (and probably handles more
requirements) but my original patch was pretty simple.

http://lkml.kernel.org/r/20171102135258.go3252...@devbig577.frc2.facebook.com

> > static enum hrtimer_restart printk_timerfn(struct hrtimer *timer)
> > {
> > int i;
> > 
> > if (READ_ONCE(in_printk))
> > for (i = 0; i < 10000; i++)
> > printk("%-80s\n", "XXX TIMER");
> 
> WTF!
> 
> You are printing 10,000 printk messages from an interrupt context???
> And to top it off, I ran this on my box, switching printk() to
> trace_printk() (which is extremely low overhead). And it is triggered
> on the same CPU that did the printk() itself on. Yeah, there is no hand
> off, because you are doing a shitload of printks on one CPU and nothing
> on any of the other CPUs. This isn't the problem that my patch was set
> out to solve, nor is it a very realistic problem. I added a counter to
> the printk as well, to keep track of how many printks there were:

The code might suck but I think this does replicate what we've been
seeing regularly in the fleet.  The console side is pretty slow - IPMI
faithfully emulating serial console.  I don't know whether it's doing 115200
or even slower.  Please consider something like the following.

* The kernel isn't preemptible.  Machine runs out of memory, hits OOM
  condition.  It starts printing OOM information.

* Netconsole tries to send out OOM messages and tries memory
  allocation which fails which then prints allocation failed messages.
  Because this happens while already printing, it just queues the
  messages to the buffer.  This repeats.

* We're still in the middle of OOM and haven't killed anybody, so memory
  keeps being short and the printk ring buffer is continuously getting
  filled by the above.  Also, after a bit, RCU stall warnings kick in
  too producing more messages.

What's happening is that the OOM killer is trapped flushing printk
failing to clear the memory condition and that leads irq / softirq
contexts to produce messages faster than can be flushed.  I don't see
how we'd be able to clear the condition without introducing an
independent context to flush the ring buffer.
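
For illustration, a minimal sketch of such an independent flushing context --
a dedicated kthread that drains the ring buffer.  The helper name below is
hypothetical and this is not the actual patch:

static struct task_struct *printk_kthread;
/* created with kthread_run(printk_kthread_fn, NULL, "printk") at init */

static int printk_kthread_fn(void *unused)
{
	while (!kthread_should_stop()) {
		set_current_state(TASK_INTERRUPTIBLE);
		if (!logbuf_has_pending_records())	/* hypothetical helper */
			schedule();
		__set_current_state(TASK_RUNNING);

		/* flush everything that callers merely log_store()'d */
		console_lock();
		console_unlock();
	}
	return 0;
}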

Again, this is an actual problem that we've been seeing fairly
regularly in production machines.

Thanks.

-- 
tejun


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2018-01-04 Thread Sergey Senozhatsky
On (12/04/17 22:48), Sergey Senozhatsky wrote:
>   A new version, yet another rework. Lots of changes, e.g. hand off
> control based on Steven's patch. Another change is that this time around
> we finally have a kernel module to test printk offloading (YAYY!). The
> module tests a bunch of use cases; we also have trace printk events to...
> trace offloading. I'll post the testing script and test module in reply
> to this mail. So... let's have some progress ;) The code is not completely
> awesome, but not tremendously difficult at the same time. We can verify
> the approach/design (we have tests and traces now) first and then start
> improving the code.

a quick note:

the patch set and test_printk module are obsolete.
I have "reworked" both.

-ss


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-30 Thread Sergey Senozhatsky
Hello,

On (12/29/17 22:59), Tetsuo Handa wrote:
[..]
> Just an idea: Do we really need to use a semaphore for console_sem?
> 
> Is it possible to replace it with a spinlock? Then, I feel that we can write
> to consoles from non-process context (i.e. soft or hard IRQ context),
> writing only one log entry (or even one byte) at a time (i.e. write one log
> entry from one context, and defer all remaining logs by "somehow" scheduling
> that context to be called again).
> 
> Since process context might fail to let the printk kernel thread run for a
> long period, due to many threads waiting to run, I thought that interrupt
> context might fit better if we can "somehow" chain interrupt contexts.


that's a good question. printk(), indeed, does not care that much. but
the whole thing is more complex. I can copy-paste (sorry for that) one
of my previous emails to give a brief (I'm sure the description is
incomplete) idea.



the real purpose of console_sem is to synchronize all events that can
happen to VT, fbcon, TTY, video, etc. and there are many events that
can happen to VT/fbcon. and some of those events can sleep - that's
where printk() can suffer. and this is why printk() is not different
from any  other console_sem users -- printk() uses that lock in order
to synchronize its own events: to have only one printing CPU, to prevent
concurrent console drivers list modification, to prevent concurrent consoles
modification, and so on.

let's take VT and fbcon for simplicity.

the events are.

1) IOCTL from user space
   they may involve things like resizing, scrolling, rotating,

   take a look at drivers/tty/vt/vt_ioctl.c  vt_ioctl().
   we need to take console_sem there because we modify the very
   important things - size, font maps, etc. we don't want those changes
   to mess with possibly active print outs happening from another CPUs.

2) timer events and workqueue events
   even cursor blinking must take console_sem. because it modifies the
   state of console/screen. take a look at drivers/video/fbdev/core/fbcon.c
   show_cursor_blink() for example.

   and take a look at fbcon_add_cursor_timer() in 
drivers/video/fbdev/core/fbcon.c

3) foreground console may change. video driver may be initialized and
   registered.

4) PM events
   for example, drivers/video/fbdev/aty/radeon_pm.c   radeonfb_pci_suspend()

5) TTY write from user space
   when user space wants to write anything to console it goes through
   nTTY -> con_write() -> do_con_write().

 CPU: 1 PID: 1 Comm: systemd
 Call Trace:
  do_con_write+0x4c/0x1a5f
  con_write+0xa/0x1d
  n_tty_write+0xdb/0x3c5
  tty_write+0x191/0x223
  n_tty_receive_buf+0x8/0x8
  do_loop_readv_writev.part.23+0x58/0x89
  do_iter_write+0x98/0xb1
  vfs_writev+0x62/0x89

take a look at drivers/tty/vt/vt.c do_con_write()


it does a ton of things. why - because we need to scroll the console;
we need to wrap around the lines; we need to process control characters
- like \r or \n and so on and need to modify the console state accordingly;
we need to do UTF8/ASCII/etc. all of these things cannot run concurrently with
IOCTLs that modify the font map or resize the console, or flip it, or rotate
it.

take a look at lf() -> con_scroll() -> fbcon_scroll() // 
drivers/video/fbdev/core/fbcon.c

we also don't want printk() to mess with do_con_write(). including
printk() from IRQ.

6) even more TTY
   I suspect that TTY may be invoked from IRQ.

7) printk() write  (and occasional kmsg_dump dumpers, e.g. 
arch/um/kernel/kmsg_dump)

   printk() goes through console_unlock()->vt_console_print().
   and it, basically, must handle all the things that TTY write does.
   handle console chars properly, do scrolling, wrapping, etc. and we
   don't want anything else to jump in and mess with us at this stage.
   that's why we use console_sem in printk.c - to serialize all the
   events... including concurrent printk() from other CPUs. that's why
   we do console_trylock() in vprintk_emit().

8) even more printk()
   printk() can be called from IRQ. console_sem stops it if some of
   the consoles can't work in IRQ context right now.

9) consoles have notifiers

/*
 * We defer the timer blanking to work queue so it can take the console mutex
 * (console operations can still happen at irq time, but only from printk which
 * has the console mutex. Not perfect yet, but better than no locking
 */
static void blank_screen_t(unsigned long dummy)
{
blank_timer_expired = 1;
schedule_work(&console_work);
}

so console_sem is also used to, basically, synchronize IRQs/etc.

10) I suspect that some consoles can do things with console_sem from IRQ
   context.

and so on. we really use console_sem as a big-kernel-lock.


so where might console_sem users sleep? in tons of places...

like ark_pci_suspend()   console_lock(); mutex_lock(par);
or ark_pci_resume()   console_lock(); mutex_lock();
or con_install()  console_lock(); vc_allocate() -> kzalloc(GFP_KERNEL)

and so on and on a

Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-29 Thread Tetsuo Handa
Sergey Senozhatsky wrote:
> On (12/28/17 15:48), Sergey Senozhatsky wrote:
> [..]
> > and I'm actually thinking about returning back the old vprintk_emit()
> > behavior
> > 
> >vprintk_emit()
> >{
> > + preempt_disable();
> >  if (console_trylock())
> >  console_unlock();
> > + preempt_enable();
> >}
> 
> but am not going to.
> it's outside of printk_kthread scope. and, besides, not every CPU which
> is looping on console_unlock() came there via printk(). so by disabling
> preemption in console_unlock() we cover more cases.

Just an idea: Do we really need to use a semaphore for console_sem?

Is it possible to replace it with a spinlock? Then, I feel that we can write
to consoles from non-process context (i.e. soft or hard IRQ context),
writing only one log entry (or even one byte) at a time (i.e. write one log
entry from one context, and defer all remaining logs by "somehow" scheduling
that context to be called again).

Since process context might fail to let the printk kernel thread run for a
long period, due to many threads waiting to run, I thought that interrupt
context might fit better if we can "somehow" chain interrupt contexts.
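
One way to "somehow" chain interrupt contexts could be irq_work; a rough
sketch only (the flush helper below is hypothetical, this is not a patch):

static struct irq_work printk_flush_irq_work;

static void printk_flush_one(struct irq_work *work)
{
	/*
	 * Write a single pending record to the consoles under a spinlock,
	 * then re-arm so that the next record goes out from the next IRQ.
	 */
	if (flush_one_logbuf_record())		/* hypothetical helper */
		irq_work_queue(&printk_flush_irq_work);
}

/* during init:  init_irq_work(&printk_flush_irq_work, printk_flush_one); */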


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-28 Thread Sergey Senozhatsky
On (12/28/17 15:48), Sergey Senozhatsky wrote:
[..]
> and I'm actually thinking about returning back the old vprintk_emit()
> behavior
> 
>vprintk_emit()
>{
> + preempt_disable();
>  if (console_trylock())
>  console_unlock();
> + preempt_enable();
>}

but am not going to.
it's outside of printk_kthread scope. and, besides, not every CPU which
is looping on console_unlock() came there via printk(). so by disabling
preemption in console_unlock() we cover more cases.

-ss


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-27 Thread Sergey Senozhatsky

Hello,

On (12/21/17 23:19), Steven Rostedt wrote:
[..]
> > I wrote this before but this isn't a theoretical problem.  We see
> > these stalls a lot.  Preemption isn't enabled to begin with.  Memory
> > pressure is high and OOM triggers and printk starts printing out OOM
> > warnings; then, a network packet comes in which triggers allocations
> > in the network layer, which fails due to memory pressure, which then
> > generates memory allocation failure messages which then generates
> > netconsole packets which then tries to allocate more memory and so on.
> 
> It doesn't matter if preemption is enabled or not. The hand off should
> happen either way.

preemption does matter. it matters so much that it makes the first
thing your patch depends on questionable - if CPUA is stuck
   in console_unlock() printing other CPUs' messages, it's
   because there are currently printk()-s happening on CPUB-CPUZ.

which is not always true. with preemption enabled CPUA can be stuck in
console_unlock() because printk()-s from CPUB-CPUZ could have happened
seconds or even minutes ago. I have demonstrated it with the traces; it
didn't convince you. OK, I sat down and went through Tetsuo's reports
starting from 2016:

https://marc.info/?l=linux-kernel&m=151375384500555

and Tetsuo has reported several times that the printing CPU can sleep for
minutes with console_sem locked, while other CPUs are happy to
printk()->log_store() as much as they want. and that's why I believe
that "The hand off should happen either way" is dangerous, especially
when we hand off from a preemptible context to an atomic context. you
don't think that this is a problem, because your patch requires the logbuf to
be small enough for any atomic context (irq, under spin_lock, etc) to be
able to print out all the pending messages to all registered consoles [we
print to consoles sequentially] within watchdog_threshold seconds (10
seconds by default). who actually adjusts the size of the logbuf so that
console_unlock() can print the entire logbuf in under 10 seconds?
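
some rough numbers to illustrate the point (back-of-the-envelope only): a
115200 baud serial console moves at most ~11.5 KB/s, so draining even a
default-sized 128 KB logbuf takes about 128 / 11.5 ~= 11 seconds - already
over the 10 second threshold - and plenty of systems run with a much larger
logbuf or a much slower console.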


> > It's just that there's no one else to give that flushing duty to, so
> > the pingponging that your patch implements can't really help
> > anything.
> > 
> > You argue that it isn't made worse by your patch, which may be true
> > but your patch doesn't solve actual problems and is most likely
> > unnecessary complication which gets in the way for the actual
> > solution.  It's a weird argument to push an approach which is
> > fundamentally broken.  Let's please stop that.
> 
> BULLSHIT!
> 
> It's not a complex solution, and coming from the cgroup and workqueue
> maintainer, that's a pretty ironic comment.
> 
> It has already been proven that it can solve problems:
> 
>   http://lkml.kernel.org/r/20171219143147.gb15...@dhcp22.suse.cz

and Tetsuo also said that his test does not cover all the scenarios;
and he has sent us all a pretty clear warning:

 : Therefore, while it is true that any approach would survive my
 : environment, it is dangerous to assume that any approach is safe
 : for my customer's enterprise servers.

https://marc.info/?l=linux-kernel&m=151377161705573


when it comes to lockups, printk() design is so flexible, that one can
justify nearly every patch no matter how many regressions it introduces,
by simply saying:
"but the same lockup scenario could have happened even before my
 patch. it's just printk() was designed this way. you were lucky
 enough not to hit the problem before; now you are less lucky".

been there, done that. it's a trap.

> You don't think handing off printks to an offloaded thread isn't more
> complex nor can it cause even more issues (like more likely to lose
> relevant information on kernel crashes)?

printk_kthread *used* to be way too independent. basically what we had
before was

bad_function()
  printk()
    vprintk_emit()
    {
	if (oops_in_progress)
		can_offload_to_printk = false;

	if (can_offload_to_printk)
		wake_up(printk_kthread);
	else if (console_trylock())
		console_unlock();
    }


the bad things:

- first, we do not take into account the fact that printk_kthread can
  already be active.

- second, we do not take into account the fact that printk_kthread can
  be way-way behind - a huge gap between `console_seq' and `log_next_seq'.

- third, we do not take into account the fact that printk_kthread can
  be preempted under console_sem.

so, IOW, the CPU which was in trouble declared the "oh, we should not
offload to printk kthread" emergency only for itself, basically. the CPU
which printk_kthread was running on or sleeping on [preemption] did not
care, neither did printk_kthread.

what we have now:

- printk_kthread cannot be preempted under console_sem. if it acquires
  the console_sem it stays RUNNING, printing the messages.

- printk_kthread does check for emergency on other CPUs. right 

Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-21 Thread Steven Rostedt
On Thu, 21 Dec 2017 16:09:32 -0800
Tejun Heo  wrote:
> 
> I tried your v4 patch and ran the test module and could easily
> reproduce RCU stall and other issues stemming from a CPU getting
> pegged down by printk flushing.  I'm attaching the test module code at
> the end.

Thanks, I'll take a look.

> 
> I wrote this before but this isn't a theoretical problem.  We see
> these stalls a lot.  Preemption isn't enabled to begin with.  Memory
> pressure is high and OOM triggers and printk starts printing out OOM
> warnings; then, a network packet comes in which triggers allocations
> in the network layer, which fails due to memory pressure, which then
> generates memory allocation failure messages which then generates
> netconsole packets which then tries to allocate more memory and so on.

It doesn't matter if preemption is enabled or not. The hand off should
happen either way.


> 
> It's just that there's no one else to give that flushing duty to, so
> the pingponging that your patch implements can't really help
> anything.
> 
> You argue that it isn't made worse by your patch, which may be true
> but your patch doesn't solve actual problems and is most likely
> unnecessary complication which gets in the way for the actual
> solution.  It's a weird argument to push an approach which is
> fundamentally broken.  Let's please stop that.

BULLSHIT!

It's not a complex solution, and coming from the cgroup and workqueue
maintainer, that's a pretty ironic comment.

It has already been proven that it can solve problems:

  http://lkml.kernel.org/r/20171219143147.gb15...@dhcp22.suse.cz

You don't think handing off printks to an offloaded thread isn't more
complex nor can it cause even more issues (like more likely to lose
relevant information on kernel crashes)?

> 
> Thanks.
> 
> #include <linux/kernel.h>
> #include <linux/module.h>
> #include <linux/hrtimer.h>
> #include <linux/ktime.h>
> #include <linux/workqueue.h>
> #include <linux/delay.h>
> 
> static bool in_printk;
> static bool stop_testing;
> static struct hrtimer printk_timer;
> static ktime_t timer_interval;
> 
> static enum hrtimer_restart printk_timerfn(struct hrtimer *timer)
> {
>   int i;
> 
>   if (READ_ONCE(in_printk))
>   for (i = 0; i < 10000; i++)
>   printk("%-80s\n", "XXX TIMER");

WTF!

You are printing 10,000 printk messages from an interrupt context???
And to top it off, I ran this on my box, switching printk() to
trace_printk() (which is extremely low overhead). And it is triggered
on the same CPU that did the printk() itself on. Yeah, there is no hand
off, because you are doing a shitload of printks on one CPU and nothing
on any of the other CPUs. This isn't the problem that my patch was set
out to solve, nor is it a very realistic problem. I added a counter to
the printk as well, to keep track of how many printks there were:

 # trace-cmd record -e printk -e irq

kworker/-16030...1  1571.783182: print:(nil)s: start 0
kworker/-16030d..1  1571.783189: console:  [ 1571.540656] XXX PREEMPT
kworker/-16030d.h1  1571.791953: softirq_raise:vec=1 [action=TIMER]
kworker/-16030d.h1  1571.791953: softirq_raise:vec=9 [action=RCU]
kworker/-16030d.h1  1571.791957: softirq_raise:vec=7 [action=SCHED]
kworker/-16030d.h1  1571.791959: print:(nil)s: XXX TIMER (0)
kworker/-16030d.h1  1571.791960: print:(nil)s: XXX TIMER (1)


Let's look at the above. My trace_printk() is the "start 0" which is
the first iteration of the 100 printk()s you are doing. The "console"
is the printk() tracepoint of your worker thread printk.

The other "print:" events are my trace_printk()s that replaced the
printk() from your interrupt handler. I added a counter to see which
iteration it is. Note, trace_printk() is much faster than printk()
writing to the log buffer (let alone writing to the console).


kworker/-16030d.h1  1571.791960: print:(nil)s: XXX TIMER (2)
kworker/-16030d.h1  1571.791961: print:(nil)s: XXX TIMER (3)
kworker/-16030d.h1  1571.791961: print:(nil)s: XXX TIMER (4)
kworker/-16030d.h1  1571.791962: print:(nil)s: XXX TIMER (5)

[..]

kworker/-16030d.h1  1571.794473: print:(nil)s: XXX TIMER (9992)
kworker/-16030d.h1  1571.794474: print:(nil)s: XXX TIMER (9993)

Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-21 Thread Tejun Heo
Hello,

Sorry about the long delay.

On Thu, Dec 14, 2017 at 01:21:09PM -0500, Steven Rostedt wrote:
> > Yeah, will do, but out of curiosity, Sergey and I already described
> > what the root problem was and you didn't really seem to take that.  Is
> > that because the explanation didn't make sense to you or us
> > misunderstanding what your code does?
> 
> Can you post the message id of the discussion you are referencing.
> Because I've been swamped with other activities and only been skimming
> these threads.

This was already on another reply but just in case.

 http://lkml.kernel.org/r/20171108162813.ga983...@devbig577.frc2.facebook.com

I tried your v4 patch and ran the test module and could easily
reproduce RCU stall and other issues stemming from a CPU getting
pegged down by printk flushing.  I'm attaching the test module code at
the end.

I wrote this before but this isn't a theoretical problem.  We see
these stalls a lot.  Preemption isn't enabled to begin with.  Memory
pressure is high and OOM triggers and printk starts printing out OOM
warnings; then, a network packet comes in which triggers allocations
in the network layer, which fails due to memory pressure, which then
generates memory allocation failure messages which then generates
netconsole packets which then tries to allocate more memory and so on.

It's just that there's no one else to give that flushing duty to, so
the pingponging that your patch implements can't really help
anything.

You argue that it isn't made worse by your patch, which may be true
but your patch doesn't solve actual problems and is most likely
unnecessary complication which gets in the way for the actual
solution.  It's a weird argument to push an approach which is
fundamentally broken.  Let's please stop that.

Thanks.

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/hrtimer.h>
#include <linux/ktime.h>
#include <linux/workqueue.h>
#include <linux/delay.h>

static bool in_printk;
static bool stop_testing;
static struct hrtimer printk_timer;
static ktime_t timer_interval;

static enum hrtimer_restart printk_timerfn(struct hrtimer *timer)
{
	int i;

	if (READ_ONCE(in_printk))
		for (i = 0; i < 10000; i++)
			printk("%-80s\n", "XXX TIMER");

	hrtimer_forward_now(&printk_timer, timer_interval);
	return READ_ONCE(stop_testing) ? HRTIMER_NORESTART : HRTIMER_RESTART;
}

static void preempt_printk_workfn(struct work_struct *work)
{
	int i;

	hrtimer_init(&printk_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	printk_timer.function = printk_timerfn;
	timer_interval = ktime_set(0, NSEC_PER_MSEC);
	hrtimer_start(&printk_timer, timer_interval, HRTIMER_MODE_REL);

	while (!READ_ONCE(stop_testing)) {
		preempt_disable();
		WRITE_ONCE(in_printk, true);
		for (i = 0; i < 100; i++)
			printk("%-80s\n", "XXX PREEMPT");
		WRITE_ONCE(in_printk, false);
		preempt_enable();
		msleep(1);
	}
}
static DECLARE_WORK(preempt_printk_work, preempt_printk_workfn);

static int __init test_init(void)
{
	queue_work_on(0, system_wq, &preempt_printk_work);
	return 0;
}

static void __exit test_exit(void)
{
	WRITE_ONCE(stop_testing, true);
	flush_work(&preempt_printk_work);
	hrtimer_cancel(&printk_timer);
}

module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");
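
For reference, what the module above exercises: queue_work_on(0, ...) pins the
flood-generating work item to CPU 0; that work item disables preemption,
prints 100 lines with in_printk set, then sleeps for 1 ms, while the hrtimer
it armed fires every millisecond on the same CPU and, whenever it lands inside
that window, dumps another 10,000 lines from hard-IRQ context.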


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-20 Thread Sergey Senozhatsky
Hi Tetsuo,

On (12/20/17 21:06), Tetsuo Handa wrote:
> Sergey Senozhatsky wrote:
>
[..]
>
> Anyway, the rule that "do not try to printk() faster than the kernel can
> write to consoles" will remain no matter how printk() changes.

and the "faster than the kernel can write to consoles" is tricky.
it does not depend ONLY on the performance of underlying console
devices. but also on the scheduler and IRQs. simply because of the
way we print:

void console_unlock(void)
{
	for (;;) {
		local_irq_save();

		text = pick_the_next_pending_logbuf_message();

		call_console_drivers(text);
		local_irq_enable();
		<< preemption + irqs >>
	}
}

on my board call_console_drivers() can spend up to 0.01 of a second
printing a _single_ message. which basically means that I'm guaranteed
to have preemption in console_unlock() under console_sem after every
line it prints [if console_unlock() is invoked from preemptible context].
this, surely, has huge impact on "do not try to printk() faster than
this".

but there are more points in my "why the patch doesn't work for me"
email than just "preemption in console_unlock()".

-ss


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-20 Thread Tetsuo Handa
Sergey Senozhatsky wrote:
> Steven said that this scenario is possible, but is not of any particular
> interest, because printk from IRQ or from any other atomic context is a
> bad thing, which should happen only when something wrong is going on in
> the system. but we are in OOM or have just returned from OOM, which _is_
> "something bad going on", isn't it? can we instead say - OOM makes that
> printk from atomic context more likely? if it does happen, will there be
> non-atomic printk-s to take over printing from atomic CPUz? we can't tell.
> I don't know much about Tetsuo's test, but I assume that his VM does not
> have any networking activities during the test. I probably wouldn't be so
> surprised to see a bunch of printk-s from atomic contexts under OOM.

I'm using VMware Workstation Player, and my VM does not have any network
activity other than an ssh login session. Fortunately, VMware's serial console
(written to a file on the host) is reliable enough to allow a console=ttyS0,115200n8
configuration. But there is virtualization software where the serial console is
so weak that I have to choose netconsole instead. Also, there are enterprise
servers where a very slow configuration (e.g. 1200 or 9600) has to be used for
the serial console because the serial device is emulated using system management
interrupts instead of real hardware. Therefore, while it is true that
any approach would survive my environment, it is dangerous to assume that any
approach is safe for my customer's enterprise servers.

Thanks for summarizing the pointers. The safest way to avoid overflowing
printk() would be to use mutex_lock(&oom_lock) at __alloc_pages_may_oom() (and
yield the CPU to the thread flushing the logbuf), but so far we
have not come to an agreement. Fortunately, since warn_alloc() for reporting
allocation stall was killed in 4.15-rc1, the risk of overflowing printk()
under OOM was reduced a lot. But yes, since my VM has little network
activity, printk() flooding due to allocation failure might happen in
different VMs.

Anyway, the rule that "do not try to printk() faster than the kernel can
write to consoles" will remain no matter how printk() changes. I think that
any printk() user has to be careful not to waste CPU resources. MM's direct
reclaim + back-off combination is a user that really loves to waste CPU
resources while someone is printk()ing.
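
The mutex_lock(&oom_lock) idea, as a rough sketch only (not the actual
mm/page_alloc.c code; the function below is hypothetical): block on oom_lock
instead of trylocking, so an allocating task that loses the race sleeps and
yields the CPU to whoever is flushing the logbuf, instead of busy-retrying its
allocation around mutex_trylock(&oom_lock).

static bool oom_kill_sketch(struct oom_control *oc)
{
	bool killed;

	/* was (roughly): if (!mutex_trylock(&oom_lock)) retry the allocation */
	mutex_lock(&oom_lock);
	killed = out_of_memory(oc);
	mutex_unlock(&oom_lock);
	return killed;
}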


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-19 Thread Sergey Senozhatsky
On (12/19/17 09:40), Steven Rostedt wrote:
> On Tue, 19 Dec 2017 13:58:46 +0900
> Sergey Senozhatsky  wrote:
> 
> > so you are not convinced that my scenarios are real / matter; I'm not
> 
> Well, not with the test module. I'm looking for actual code in the
> upstream kernel.
> 
> > convinced that I have stable and functional boards with this patch ;)
> > seems that we are coming to a dead end.
> 
> Can I ask, is it any worse than what we have today?

that's a really hard question. both the existing printk() and the
tweaked printk() have the same thing in common - a) preemption from
console_unlock() and b) printk() being way to fast compared to anything
else (call_console_drivers() and to preemption latency of console_sem
owner). your patch puts some requirements that my workload simply cannot
fulfill. so may be if I'll start pushing it towards OOM and so on, then
I'll see some difference (but both (a) and (b) still gonna stay true).

the thing that really changes everything is offloading to printk_kthread.
given that I can't have a tiny logbuf, and that I can't have a fast console,
and that I can't have tons of printks to choose from to take advantage
of the hand off algorithm in any reliable way, I need something more to
guarantee that the current console_sem owner will not be forced to evict all
logbuf messages.

> > for the record,
> > I'm not going to block the patch if you want it to be merged.
> 
> Thanks,

I mean it :)

-ss


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-19 Thread Sergey Senozhatsky
Hello,

not sure if you've been following the whole thread, so I'll try
to summarize it here. apologies if it'll massively repeat the things
that have already been said or will be too long.

On (12/19/17 15:31), Michal Hocko wrote:
> On Tue 19-12-17 10:24:55, Sergey Senozhatsky wrote:
> > On (12/18/17 20:08), Steven Rostedt wrote:
> > > > ... do you guys read my emails? which part of the traces I have provided
> > > > suggests that there is any improvement?
> > > 
> > > The traces I've seen from you were from non-realistic scenarios.
> > > But I have hit issues with printk()s happening that cause one CPU to do 
> > > all
> > > the work, where my patch would fix that. Those are the scenarios I'm
> > > talking about.
> > 
> > any hints about what makes your scenario more realistic than mine?
> 
> Well, Tetsuo had some semi-realistic scenario where alloc stall messages
> managed to stall other printk callers (namely oom handler). I am saying
> sem-realistic because he is testing OOM throughput with an unrealistic
> workload which itself is not very real on production systems. The
> essential thing here is that many processes might be in the allocation
> path and any printk there could just swamp some unrelated printk caller
> and cause hard to debug problems. Steven's patch should address that
> with a relatively simple lock handover. I was pessimistic this would
> work sufficiently well but it survived Tetsuo's testing IIRC so it
> sounds good enough to me.

sure, no denial. Tetsuo indeed said that Steven's patch passed his
test. and for the note, the most recent printk_kthread patch set passed
Tetsuo's test as well; I asked him privately to run the tests before I
published it. but this is not the point. and to make it clear - this is
not a "Steven's patch" vs. "my patch set + Steven's patch atop of it"
type of thing. not at all.


IMPORTANT DISCLAIMER
   I SPEAK FOR MYSELF ONLY. ABOUT MY OBSERVATION ONLY. THIS IS AN
   ATTEMPT TO ANALYSE WHY THE PATCH DIDN'T WORK ON MY SETUP AND WHY
   MY SETUP NEEDS ANOTHER APPROACH. THIS IS NOT TO WARN ANYONE THAT
   THE PATCH WON'T WORK ON THEIR SETUPS. I MEAN NO OFFENSE AND AM
   NOT TRYING TO LOOK/SOUND SMART. AND, LIKE I SAID, IF STEVEN OR
   PETR WANT TO PUSH THE PATCH, I'M NOT GOING TO BLOCK IT.

so why has Steven's patch not succeeded on my boards?... partly because
of the fact that printk is "multidimensional" in its complexity and
Steven's patch just doesn't touch some of those problematic parts; partly
because the patch has the requirements which can't be 100% true on my
boards.

to begin with,
the patch works only when console_sem is contended. IOW, when
multiple sites perform printk-s concurrently, frequently enough for the
hand off logic to detect it and to pass control to another CPU.
but this turned out to be a bit hard to count on. why? several problems.

(1) the really big one:
   console_sem owner can be preempted under console_sem, removing any
   console_sem competition, it's already locked - its owner is preempted.
   this removes any possibility of hand off. and this unleashes CPUs that
   do printk-s, because when console_sem is locked, printk-s from other
   CPUs become, basically, as fast as sprintf+memcpy.

(1.1) the really big one, part two. more on this later. see @PART2

(2) another big one:
   printk() fast path - sprintf+memcpy - can be significantly faster than
   call to console drivers. on my board I see that printk CPU can add 1140
   new messages to the logbuf, while active console_sem owner prints a
   single message to the serial console. not to mention console_sem owner
   preemption.

(1) and (2) combined can do magical things. on my extremely trivial
test -- not insanely huge number of printks (much less than 2000 lines)
from a preemptible context, not so much printk-s from other CPUs - I
can see, and I confirmed that with the traces, that when console_sem is
getting handed over to another CPU and that new console_sem owner is
getting preempted or when it begins to print messages to serial console,
the CPU that actually does most of printk-s finishes its job in almost no
time, because all it has to do is sprintf+memcpy. which basically means
that console_sem is not contended anymore, and thus it's either the current
console_sem owner or _maybe_ some other task that has to print all of
the pending messages.

and to make it worse, the hand off logic does not distinguish between
the contexts it's passing console_sem ownership to. it will be equally
happy to hand off console_sem to an atomic context from a non-atomic one,
or from one atomic context to another (and vice versa), regardless of the
number of pending messages in the logbuf. more on this later [see @LATER].
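
for reference, the hand off works roughly like this (simplified pseudocode,
not the literal patch; spin_until_handed_over() is a placeholder name):

/* current console_sem owner, inside console_unlock(): */
for (;;) {
	WRITE_ONCE(console_owner, current);
	call_console_drivers(text);		/* print one record */
	WRITE_ONCE(console_owner, NULL);
	if (READ_ONCE(console_waiter))
		break;				/* hand console_sem over */
}

/* concurrent printk() on another CPU, inside vprintk_emit(): */
if (READ_ONCE(console_owner)) {
	WRITE_ONCE(console_waiter, true);
	spin_until_handed_over();		/* becomes the new owner */
}

note that nothing in this hand off checks whether the waiter is in an atomic
or a non-atomic context, which is exactly the point above.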

now, Steven and Petr said that my test was non-realistic and thus the
observations were invalid. OK, let's take a closer look at OOM. and
let's start with the question - what is so special about OOM that
makes (1) or (1.1) or (2) invalid?
i
can we get preempted when we call out_

Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-19 Thread Steven Rostedt
On Tue, 19 Dec 2017 13:58:46 +0900
Sergey Senozhatsky  wrote:


> so you are not convinced that my scenarios are real / matter; I'm not

Well, not with the test module. I'm looking for actual code in the
upstream kernel.

> convinced that I have stable and functional boards with this patch ;)
> seems that we are coming to a dead end.

Can I ask, is it any worse than what we have today?

> 
> for the record,
> I'm not going to block the patch if you want it to be merged.

Thanks,

-- Steve


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-19 Thread Michal Hocko
On Tue 19-12-17 10:24:55, Sergey Senozhatsky wrote:
> On (12/18/17 20:08), Steven Rostedt wrote:
> > > ... do you guys read my emails? which part of the traces I have provided
> > > suggests that there is any improvement?
> > 
> > The traces I've seen from you were from non-realistic scenarios.
> > But I have hit issues with printk()s happening that cause one CPU to do all
> > the work, where my patch would fix that. Those are the scenarios I'm
> > talking about.
> 
> any hints about what makes your scenario more realistic than mine?

Well, Tetsuo had some semi-realistic scenario where alloc stall messages
managed to stall other printk callers (namely oom handler). I am saying
semi-realistic because he is testing OOM throughput with an unrealistic
workload which itself is not very real on production systems. The
essential thing here is that many processes might be in the allocation
path and any printk there could just swamp some unrelated printk caller
and cause hard to debug problems. Steven's patch should address that
with a relatively simple lock handover. I was pessimistic this would
work sufficiently well but it survived Tetsuo's testing IIRC so it
sounds good enough to me.
-- 
Michal Hocko
SUSE Labs


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-18 Thread Sergey Senozhatsky
On (12/18/17 22:38), Steven Rostedt wrote:
> On Tue, 19 Dec 2017 11:46:10 +0900
> Sergey Senozhatsky  wrote:
> 
> > anyway,
> > before you guys push the patch to printk.git, can we wait for Tejun to
> > run his tests against it?
> 
> I've been asking for that since day one ;-)

ok

so you are not convinced that my scenarios are real / matter; I'm not
convinced that I have stable and functional boards with this patch ;)
seems that we are coming to a dead end.

for the record,
I'm not going to block the patch if you want it to be merged.

-ss


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-18 Thread Sergey Senozhatsky
On (12/18/17 12:46), Steven Rostedt wrote:
> > One question is if we really want to rely on offloading in
> > this case. What if this is printed to debug some stalled
> > system.
> 
> Correct, and this is what I call when debugging hard lockups, and I do
> it from NMI.
[..]
> show_state_filter() is not a normal printk() call. It is used for
> debugging.

just for the record. a side note.

you guys somehow drew spectacularly off-target conclusions from the
traces I have provided and decided NOT to concentrate on the demonstrated
behavioural patterns, but on, perhaps, process names (I really should
have renamed i_do_printks to DONALD_TRUMP ;) ) and on how those printk
lines got into the logbuf. as if it mattered. [seriously, why?]. the
point was not show_state_filter()... the point was preemption and the
things that hand off does. but somehow filling up the logbuf while the
console_sem owner is preempted is unrealistic if the printks are coming
from task A under normal conditions; and it is a completely different
story when the same task A fills up the logbuf from OOM while the
console_sem owner is preempted. the end result is the same in both cases:
it's not task A that is going to flush the logbuf. it's some other task
that will have to do it, possibly in atomic context. anyway, anyway.

-ss


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-18 Thread Steven Rostedt
On Tue, 19 Dec 2017 11:46:10 +0900
Sergey Senozhatsky  wrote:

> anyway,
> before you guys push the patch to printk.git, can we wait for Tejun to
> run his tests against it?

I've been asking for that since day one ;-)

-- Steve


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-18 Thread Sergey Senozhatsky
On (12/18/17 21:03), Steven Rostedt wrote:
> > and this is exactly what I'm still observing. i_do_printks-1992 stops
> > printing, while console_sem is owned by another task. Since log_store()
> > much faster than call_console_drivers() AND console_sem owner is getting
> > preempted for unknown period of time, we end up having pending messages
> > in logbuf... and it's kworker/0:1-135 that prints them all.
> > 
> >systemd-udevd-671   [003] d..366.334866: offloading: set 
> > console_owner
> >  kworker/0:1-135   [000] d..266.335999: offloading: 
> > vprintk_emit()->trylock FAIL  will spin? :1
> > i_do_printks-1992  [002] d..266.345474: offloading: 
> > vprintk_emit()->trylock FAIL  will spin? :0x 1100
> >...
> >systemd-udevd-671   [003] d..366.345917: offloading: clear 
> > console_owner  waiter != NULL :1
> 
> And kworker will still be bounded in what it can print. Yes it may end
> up being the entire buffer, but that should not take longer than a
> watchdog.

not the case on my setup. "1100 messages" already takes longer than the watchdog threshold.
consoles don't scale. if anyone's console can keep up with 2 printing CPUs,
then let's see what logbuf size that person will set on a system with 1024
CPUs under OOM. I doubt that will be 128KB.

anyway,
before you guys push the patch to printk.git, can we wait for Tejun to
run his tests against it? (or do we have a preemptive "non realistic
tests" conclusion?)

-ss


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-18 Thread Steven Rostedt
On Tue, 19 Dec 2017 10:24:55 +0900
Sergey Senozhatsky  wrote:

> On (12/18/17 20:08), Steven Rostedt wrote:
> > > ... do you guys read my emails? which part of the traces I have provided
> > > suggests that there is any improvement?  
> > 
> > The traces I've seen from you were from non-realistic scenarios.
> > But I have hit issues with printk()s happening that cause one CPU to do all
> > the work, where my patch would fix that. Those are the scenarios I'm
> > talking about.  
> 
> any hints about what makes your scenario more realistic than mine?
> to begin with, what was the scenario?

It was a while ago when I hit it. I think it was an OOM issue. And it
wasn't contrived. It happened on a production system.

> 
> [..]
> 
> > But I have hit issues with printk()s happening that cause one CPU to do all
> > the work, where my patch would fix that. Those are the scenarios I'm
> > talking about.  
> 
> and this is exactly what I'm still observing. i_do_printks-1992 stops
> printing, while console_sem is owned by another task. Since log_store()
> much faster than call_console_drivers() AND console_sem owner is getting
> preempted for unknown period of time, we end up having pending messages
> in logbuf... and it's kworker/0:1-135 that prints them all.
> 
>systemd-udevd-671   [003] d..366.334866: offloading: set console_owner
>  kworker/0:1-135   [000] d..266.335999: offloading: 
> vprintk_emit()->trylock FAIL  will spin? :1
> i_do_printks-1992  [002] d..266.345474: offloading: 
> vprintk_emit()->trylock FAIL  will spin? :0x 1100
>...
>systemd-udevd-671   [003] d..366.345917: offloading: clear 
> console_owner  waiter != NULL :1

And kworker will still be bounded in what it can print. Yes it may end
up being the entire buffer, but that should not take longer than a
watchdog.

If that proves to be an issue in the real world, then we could simply
wake up an offloaded thread, if the current owner does more than one
iteration (more than what it wrote). Then when the thread wakes up, it
simply does a printk, and it will take over by the waiter logic.

But that is only if it still appears to be an issue.
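
As a sketch of that idea (hypothetical counters and names, not an actual
patch):

/* in the console_unlock() print loop: */
printed++;
if (printed > my_own_records && printk_kthread)
	wake_up_process(printk_kthread);

/* the woken thread then simply calls printk(); the waiter logic makes it
 * the next console_sem owner on the busy owner's following iteration */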

-- Steve


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-18 Thread Sergey Senozhatsky
On (12/18/17 20:08), Steven Rostedt wrote:
> > ... do you guys read my emails? which part of the traces I have provided
> > suggests that there is any improvement?
> 
> The traces I've seen from you were from non-realistic scenarios.
> But I have hit issues with printk()s happening that cause one CPU to do all
> the work, where my patch would fix that. Those are the scenarios I'm
> talking about.

any hints about what makes your scenario more realistic than mine?
to begin with, what was the scenario?

[..]

> But I have hit issues with printk()s happening that cause one CPU to do all
> the work, where my patch would fix that. Those are the scenarios I'm
> talking about.

and this is exactly what I'm still observing. i_do_printks-1992 stops
printing while console_sem is owned by another task. Since log_store() is
much faster than call_console_drivers() AND the console_sem owner is getting
preempted for an unknown period of time, we end up having pending messages
in the logbuf... and it's kworker/0:1-135 that prints them all.

   systemd-udevd-671   [003] d..3 66.334866: offloading: set console_owner
     kworker/0:1-135   [000] d..2 66.335999: offloading: vprintk_emit()->trylock FAIL  will spin? :1
    i_do_printks-1992  [002] d..2 66.345474: offloading: vprintk_emit()->trylock FAIL  will spin? :0x 1100
   ...
   systemd-udevd-671   [003] d..3 66.345917: offloading: clear console_owner  waiter != NULL :1

-ss


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-18 Thread Sergey Senozhatsky
On (12/18/17 15:10), Petr Mladek wrote:
[..]
> > All this is in the current upstream code as well. Steven's patch
> > should make it better in compare with the current upstream code.
> > 
> > Sure, the printk offload approach does all these things better.
> > But there is still the fear that the offload is not reliable
> > in all situations. The lazy offload should handle this well from
> > my POV but I am not 100% sure.
> 
> BTW: There is one interesting thing. If we rely on the kthread
> to handle a flood of messages, it might be too slow because it
> reschedules. It might cause losing messages. Note that
> the kthread should have rather normal priority to avoid
> blocking other processes.

... and this is why preemption is disabled in console_unlock()
in the patch set I have posted.

obviously you haven't even looked at the patches. wonderful.

-ss


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-18 Thread Steven Rostedt
On Tue, 19 Dec 2017 10:03:11 +0900
Sergey Senozhatsky  wrote:


> ... do you guys read my emails? which part of the traces I have provided
> suggests that there is any improvement?

The traces I've seen from you were from non-realistic scenarios. But I
have hit issues with printk()s happening that cause one CPU to do all
the work, where my patch would fix that. Those are the scenarios I'm
talking about.

-- Steve



Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-18 Thread Steven Rostedt
On Tue, 19 Dec 2017 09:52:48 +0900
Sergey Senozhatsky  wrote:

> > The case here, you are talking about a CPU doing console_lock() from a
> > non printk() case. Which is what I was asking about how often this
> > happens.  
> 
> I'd say often enough. but the point I was trying to make is that we can
> have non-atomic CPUs which can do the print out, instead of "sharing the
> load" between atomic CPUs.

We don't even know if sharing between "atomic" and "non-atomic" is an
issue. Anything that does a printk() in an atomic location is going to
have latency to begin with.

> 
> > As for why there's no handoff. Does the non printk()
> > console_lock/unlock ever happen from a critical location? I don't think
> > it does (but I haven't checked). Then it is the perfect candidate to do
> > all the printing.  
> 
> that's right. that is the point I was trying to make. we can have better
> candidates to do all the printing.

Sure, but we don't even know if we have to. A problem scenario hasn't
come up that wasn't due to the current implementation (which my patch
changes).


> I did tests yesterday, traces are available. I can't conclude that
> the patch fixes the unfairness of printk().

It doesn't fix the "unfairness"; it fixes the unboundedness of printk.
That is what has been triggering all the issues from before.



> consider the following case
> 
> we have console_lock() from a non-atomic context. the console_sem owner is
> getting preempted, under console_sem, which is totally possible and
> happens a lot. in the meantime we have OOM, which can print a lot of
> info. by the time the console_sem owner returns back to TASK_RUNNING the
> logbuf contains some number of pending messages [let's say 10 seconds'
> worth of printing]. the console owner goes to console_unlock(). right then
> we get a printk from IRQ on CPUz. the console owner hands over printing
> duty to CPUz. so now we have to print 10 seconds' worth of OOM messages
> from irq.

Yes that can happen. But printk's from irq context is not nice to have
either, and should only happen when things are going wrong to begin
with.

> 
> 
> 
> CPU0                     CPU1 ~ CPUx               CPUz
> 
> console_lock
> 
>  << preempted >>
> 
> 
>                          OOM
>                          OOM printouts, lots
>                          of OOM traces, etc.
> 
>                          OOM end [progress done].
> 
>  << back to RUNNING >>
> 
>   console_unlock()
> 
> for (;;)
>   sets console_owner
>   call_console_drivers()                           IRQ
>                                                     printk
>                                                      sees console_owner
>                                                      sets console_waiter
> 
>   clears console_owner
>   sees console_waiter
>   handoff
>                                                    for (;;) {
>                                                      call_console_drivers()
>                                                      ??? lockup
>                                                    }
>                                                    up()
> 
> this is how we down() from non-atomic and up() from atomic [if we make
> it to up(). we might end up in NMI panic]. this scenario is totally possible,

The printk buffer needs to be very big, and bad things have to happen
first. This is a theoretical scenario, and I'd like to see it happen in
the real world before we try to fix it. My patch should make printk
behave *MUCH BETTER* than it currently does.

If you are worried about NMI panics, then we could add a touch nmi
within the printk loop.
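
Something along these lines -- untested, just a sketch on top of the same
printing loop that is quoted elsewhere in this thread:

	for (;;) {
		/* ... pick the next pending logbuf record ... */

		/* keep the NMI watchdog quiet while one CPU prints a
		 * long backlog with local interrupts disabled
		 */
		touch_nmi_watchdog();

		stop_critical_timings();
		call_console_drivers(level, text, len);
		start_critical_timings();

		/* ... hand off if a console_waiter showed up ... */
	}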


> isn't it? the optimistic expectation here is that some other printk() from
> non-atomic CPU will jump in and take over printing from atomic CPUz. but I
> don't see why we are counting on it.

I don't see why we even care. Placing a printk in an atomic context is
a problem to begin with, and should only happen if there's issues in
the system.

-- Steve


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-18 Thread Sergey Senozhatsky
On (12/18/17 12:46), Steven Rostedt wrote:
> On Mon, 18 Dec 2017 15:13:53 +0100
> Petr Mladek  wrote:
> 
> > One question is if we really want to rely on offloading in
> > this case. What if this is printed to debug some stalled
> > system.
> 
> Correct, and this is what I call when debugging hard lockups, and I do
> it from NMI. Which the new NMI code prevents all the data I want to
> print to come out to console.
> 
> I had to create a really huge buffer to print it.
> 
> show_state_filter() is not a normal printk() call. It is used for
> debugging. Not a very good example of issues that happen on production
> systems. If anything, this should be disabled on a production system.
> 
> Let's just add my patch (I'll respin it if it needs it), and send it
> off into the wild. Let's see if there's still reports of issues, and
> then come back to solutions. Because, really, I'm still not convinced
> that there's anything out there that needs much more "fixing" of
> printk().

... do you guys read my emails? which part of the traces I have provided
suggests that there is any improvement?

-ss


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-18 Thread Sergey Senozhatsky
Hello Steven,

I couldn't reply sooner.


On (12/15/17 10:19), Steven Rostedt wrote:
> > On (12/14/17 22:18), Steven Rostedt wrote:
> > > > Steven, your approach works ONLY when we have the following 
> > > > preconditions:
> > > > 
> > > >  a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
> > > > etc) context
> > > > 
> > > > what does guarantee that? what happens if there is NO non-atomic
> > > > CPU or that non-atomic simply misses the console_owner !=
> > > > false
> > > > point? we are going to conclude
> > > > 
> > > > "if printk() doesn't work for you, it's because you are holding 
> > > > it wrong"?
> > > > 
> > > > 
> > > > what if that non-atomic CPU does not call printk(), but instead
> > > > it does console_lock()/console_unlock()? why there is no 
> > > > handoff?
> 
> The case here, you are talking about a CPU doing console_lock() from a
> non printk() case. Which is what I was asking about how often this
> happens.

I'd say often enough. but the point I was trying to make is that we can
have non-atomic CPUs which can do the print out, instead of "sharing the
load" between atomic CPUs.

> As for why there's no handoff. Does the non printk()
> console_lock/unlock ever happen from a critical location? I don't think
> it does (but I haven't checked). Then it is the perfect candidate to do
> all the printing.

that's right. that is the point I was trying to make. we can have better
candidates to do all the printing.

[..]
> > deep-stack spin_lock_irqsave() lockup reports from multiple CPUs (3 cpus)
> > happening at the same moment + NMI backtraces from all the CPUs (more
> > than 3 cpus) that follows the lockups, over not-so-fast serial console.
> > exactly the bug report I received two days ago. so which one of the CPUs
> > here is a good candidate to successfully emit all of the pending logbuf
> > entries? none. all of them either have local IRQs disabled, or dump_stack()
> > from either backtrace IPI or backtrace NMI (depending on the configuration).
> > 
> 
> Is the above showing an issue of console_lock happening in the non
> printk() case?
>
> > do we periodically do console_lock() on a running system? yes, we do.
> > add to console_unlock()
> 
> Right, and the non printk() console_lock() should be fine to do all
> printing when it unlocks.

that's right.

> > this argument is really may be applied against your patch as well. I
> > really don't want us to have this type of "technical" discussion.
> 
> Sure, but my patch fixes the unfair approach that printk currently does.

I did tests yesterday, traces are available. I can't conclude that
the patch fixes the unfairness of printk().


> > printk() is a tool for developers. but developers can't use it.
> > 
> > 
> > > >  c) the task that is looping in console_unlock() sees non-atomic CPU 
> > > > when
> > > > console_owner is set.  
> > > 
> > > I haven't looked at the latest code, but my last patch didn't care
> > > about "atomic" and "non-atomic"  
> > 
> > I know. and I think it is sort of a problem.
> 
> Please show me the case that it is. And don't explain where it is.
> Please apply the patch and have the problem occur and show it to me.
> That's all that I'm asking for.

I did some tests yesterday. I posted analysis and traces.

[..]
> No, because it is unrealistic. For example:

right.

> +static void test_noirq_console_unlock(void)
> +{
> +   unsigned long flags;
> +   unsigned long num_messages = 0;
> +
> +   pr_err("=== TEST %s\n", __func__);
> +
> +   num_messages = 0;
> +   console_lock();
> +   while (num_messages++ < max_num_messages)
> +   pr_info("=== %s Append message %lu out of %lu\n",
> +   __func__,
> +   num_messages,
> +   max_num_messages);
> +
> +   local_irq_save(flags);
> +   console_unlock();
> 
> Where in the kernel do we do this?

the funny thing is that we _are going to start doing this_ with
the console_owner hand off enabled.

consider the following case

we have console_lock() from non-atomic context. console_sem owner is
getting preempted, under console_sem. which is totally possible and
happens a lot. in the mean time we have OOM, which can print a lot of
info. by the time console_sem returns back to TASK_RUNNING logbuf
contains some number of pending messages [lets say 10 seconds worth
of printing]. console owner goes to console_unlock(). accidentally
we have printk from IRQ on CPUz. console_owner hands over printing
duty to CPUz. so now we have to print 10 seconds worth of OOM messages
from irq.



CPU0                     CPU1 ~ CPUx               CPUz

console_lock

 << preempted >>


                         OOM
                         OOM printouts, lots
                         of OOM traces, etc.

                         OOM end [progress done].

 << back to RUNNING >>

  console_unlock()

for (;;)

Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-18 Thread Steven Rostedt
On Mon, 18 Dec 2017 15:13:53 +0100
Petr Mladek  wrote:

> One question is if we really want to rely on offloading in
> this case. What if this is printed to debug some stalled
> system.

Correct, and this is what I call when debugging hard lockups, and I do
it from NMI. Which the new NMI code prevents all the data I want to
print to come out to console.

I had to create a really huge buffer to print it.

show_state_filter() is not a normal printk() call. It is used for
debugging. Not a very good example of issues that happen on production
systems. If anything, this should be disabled on a production system.

Let's just add my patch (I'll respin it if it needs it), and send it
off into the wild. Let's see if there's still reports of issues, and
then come back to solutions. Because, really, I'm still not convinced
that there's anything out there that needs much more "fixing" of
printk().

-- Steve


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-18 Thread Petr Mladek
On Mon 2017-12-18 22:39:48, Sergey Senozhatsky wrote:
> On (12/18/17 14:31), Petr Mladek wrote:
> > On Mon 2017-12-18 18:36:15, Sergey Senozhatsky wrote:
> > > On (12/15/17 10:08), Petr Mladek wrote:
> > > 1) it opens both soft and hard lockup vectors
> > > 
> > >I see *a lot* of cases when CPU that call printk in a loop does not
> > >end up flushing its messages. And the problem seems to be - preemption.
> > > 
> > > 
> > >   CPU0                CPU1
> > > 
> > >   for_each_process_thread(g, p)
> > > printk()
> > 
> > You print one message for each process in a tight loop.
> > Is there a code like this?
> 
> um... show_state() -> show_state_filter()?
> which prints a million times more info than a single line per-PID.

Good example. Heh, it already somehow deals with this:

void show_state_filter(unsigned long state_filter)
{
struct task_struct *g, *p;

rcu_read_lock();
for_each_process_thread(g, p) {
/*
 * reset the NMI-timeout, listing all files on a slow
 * console might take a lot of time:
 * Also, reset softlockup watchdogs on all CPUs, because
 * another CPU might be blocked waiting for us to process
 * an IPI.
 */
touch_nmi_watchdog();
touch_all_softlockup_watchdogs();
if (state_filter_match(state_filter, p))
sched_show_task(p);


One question is if we really want to rely on offloading in
this case. What if this is printed to debug some stalled
system.

Best Regards,
Petr


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-18 Thread Petr Mladek
On Mon 2017-12-18 14:31:01, Petr Mladek wrote:
> On Mon 2017-12-18 18:36:15, Sergey Senozhatsky wrote:
> > -  it has a significantly worse behaviour compared to old async printk.
> > -  it keeps tasks sleeping on console_sem in TASK_UNINTERRUPTIBLE
> >   for a long time.
> > -  it times out user space apps.
> > -  it re-introduces the lockup vector, by passing console_sem ownership
> >   to atomic tasks.
> 
> All this is in the current upstream code as well. Steven's patch
> should make it better compared with the current upstream code.
> 
> Sure, the printk offload approach does all these things better.
> But there is still the fear that the offload is not reliable
> in all situations. The lazy offload should handle this well from
> my POV but I am not 100% sure.

BTW: There is one interesting thing. If we rely on the kthread
to handle flood of messages. It might be too slow because it
reschedules. It might cause losing messages. Note that
the kthread should have a rather normal priority to avoid
blocking other processes.

From this point of view. We get the messages out much faster
in an atomic context. The question is if we want to sacrifice
the atomic context of a random process for this.

Just an idea. The handshake + throttling big sinners might
help to balance the load over the biggest sinners (printk
users).

The nice thing about the hand-shake + throttling based approach
is that it might be built step by step. Each step would just
make things better. While the offloading is rather a big
revolution. We already have many extra patches to avoid
regressions in the reliability.

One thing is that we play with offloading for years. The handshake
might have problems that we just do not know about at the moment.
But it definitely has some interesting aspects.

Best Regards,
Petr


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-18 Thread Petr Mladek
On Mon 2017-12-18 19:36:24, Sergey Senozhatsky wrote:
> it takes call_console_drivers() 0.01+ of a second to print some of
> the messages [I think we can ignore raw_spin_lock(&console_owner_lock)
> and fully blame call_console_drivers()]. so vprintk_emit() seems to be
> gazillion times faster and i_do_printks can add tons of messages while
> some other process prints just one.
> 
> to be more precise, I see from the traces that i_do_printks can add 1100
> messages to the logbuf while call_console_drivers() prints just one.

This sounds interesting.

A solution would be to add some "simple" throttling. We could add
a per-CPU or per-process counter that would count the number
of lines added while other CPU is processing one line.

The counter would be incremented only when the CPU is actively
printing.

One question is how to clear the counter. One possibility
would be to limit it to one scheduling period or so.
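
Just to illustrate the idea, something like this (only a sketch;
printk_burst and PRINTK_BURST_LIMIT are made-up names):

	static DEFINE_PER_CPU(unsigned int, printk_burst);

	/* in vprintk_emit(), after log_store(), count lines added
	 * while some other CPU is actively printing
	 */
	raw_spin_lock(&console_owner_lock);
	if (console_owner && console_owner != current)
		this_cpu_inc(printk_burst);
	raw_spin_unlock(&console_owner_lock);

	/* throttle the biggest sinners once they have queued too many
	 * lines behind a busy console; reset the counter e.g. once per
	 * scheduling period
	 */
	if (this_cpu_read(printk_burst) > PRINTK_BURST_LIMIT)
		; /* wait, help with the printing, or drop to a slow path */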

Best Regards,
Petr


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-18 Thread Sergey Senozhatsky
On (12/18/17 14:31), Petr Mladek wrote:
> On Mon 2017-12-18 18:36:15, Sergey Senozhatsky wrote:
> > On (12/15/17 10:08), Petr Mladek wrote:
> > 1) it opens both soft and hard lockup vectors
> > 
> >I see *a lot* of cases when CPU that call printk in a loop does not
> >end up flushing its messages. And the problem seems to be - preemption.
> > 
> > 
> >   CPU0  CPU1
> > 
> >   for_each_process_thread(g, p)
> > printk()
> 
> You print one message for each process in a tight loop.
> Is there a code like this?

um... show_state() -> show_state_filter()?
which prints a million times more info than a single line per-PID.

-ss


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-18 Thread Petr Mladek
On Mon 2017-12-18 18:36:15, Sergey Senozhatsky wrote:
> On (12/15/17 10:08), Petr Mladek wrote:
> 1) it opens both soft and hard lockup vectors
> 
>I see *a lot* of cases when CPU that call printk in a loop does not
>end up flushing its messages. And the problem seems to be - preemption.
> 
> 
>   CPU0                CPU1
> 
>   for_each_process_thread(g, p)
> printk()

You print one message for each process in a tight loop.
Is there a code like this?

I think that more realistic is to print a message for
each CPU. But even in this case, the messages are usually
written by each CPU separately and thus less synchronized.

One exception might be the backtraces of all CPUs. These
are printed in NMI and flushed synchronously from a
single CPU. But they were recently optimized by
not printing idle threads.


> > Why did you completely ignored the paragraph about step by step
> > approach? Is there anything wrong about it?
> 
> frankly, I don't even understand what our plan is. I don't see how are
> we going to verify the patch. over the last 3 years, how many emails
> you have from facebook or samsung or linaro, or any other company
> reporting the printk-related lockups? I don't have that many in my gmail
> inbox, to be honest. and this is not because there were no printk lockups
> observed. this is because people usually don't report those issues to
> the upstream community. especially vendors that use "outdated" LTS
> kernels, which are approx 1-2 year(s) behind the mainline. and I don't see
> why it should be different this time. it will take years before vendors
> pick the next LTS kernel, which will have that patch in it. but the really
> big problem here is that we already know that the patch has some problems.
> are we going to conclude that "no emails === no problems"? with all my
> respect, it does seem like, in the grand scheme of things, we are going
> to do the same thing, yet expecting different result. that's my worry.

The patches for offloading printk console work are floating around
about 5 years and nothing went upstream. Therefore 2 more years look
acceptable if we actually could make things a bit better now. Well, I
believe that we would know the usefulness earlier than this.


> > You are looking for a perfect solution.
> 
> gentlemen, once again, you really can name it whatever you like.
> the bottom line is [speaking for myself only]:
> 
> -  the patch does not fit my needs.
> -  it does not address the issues I'm dealing with.

I am still missing some real life reproducer of the problems.

> -  it has a significantly worse behaviour compared to old async printk.
> -  it keeps tasks sleeping on console_sem in TASK_UNINTERRUPTIBLE
>   for a long time.
> -  it times out user space apps.
> -  it re-introduces the lockup vector, by passing console_sem ownership
>   to atomic tasks.

All this is in the current upstream code as well. Steven's patch
should make it better compared with the current upstream code.

Sure, the printk offload approach does all these things better.
But there is still the fear that the offload is not reliable
in all situations. The lazy offload should handle this well from
my POV but I am not 100% sure.

If we have hard data (real life reproducers) in hand, we could
start pushing the offloading again.


> -  it doubles the amount of time CPU spins with local IRQs disabled in
>   the worst case.

It happens only during the hand shake. And the other CPU waits only
until a single line is flushed. It is much less than what a single CPU
might spend flushing lines.
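
The waiter side is basically just this (a simplified sketch of the
hand-off, not the exact code from Steven's patch):

	local_irq_save(flags);

	raw_spin_lock(&console_owner_lock);
	if (console_owner && console_owner != current) {
		console_waiter = true;
		spin = true;
	}
	raw_spin_unlock(&console_owner_lock);

	/* bounded: the owner finishes at most one call_console_drivers()
	 * call, then clears console_waiter and hands console_sem over
	 */
	while (spin && READ_ONCE(console_waiter))
		cpu_relax();

	local_irq_restore(flags);

So the IRQs-off window on the waiting CPU is one line of console output,
not the whole backlog.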


> -  I probably need to run more tests [I didn't execute any OOM tests, etc.],
>   but, preliminary, I can't see Samsung using this patch.

Does Samsung already use some offload implementation?

Best Regards,
Petr


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-18 Thread Sergey Senozhatsky
On (12/18/17 19:36), Sergey Senozhatsky wrote:
[..]
> it takes call_console_drivers() 0.01+ of a second to print some of
> the messages [I think we can ignore raw_spin_lock(&console_owner_lock)
> and fully blame call_console_drivers()]. so vprintk_emit() seems to be
> gazillion times faster and i_do_printks can add tons of messages while
> some other process prints just one.
> 
> 
> to be more precise, I see from the traces that i_do_printks can add 1100
> messages to the logbuf while call_console_drivers() prints just one.
> 
> 
> systemd-udevd-671 owns the lock. sets the console_owner. i_do_printks
> keeps adding printks. there kworker/0:1-135 that just ahead of
> i_do_printks-1992 and registers itself as the console_sem waiter.
> 
>    systemd-udevd-671   [003] d..3  66.334866: offloading: set console_owner  :0
>      kworker/0:1-135   [000] d..2  66.335999: offloading: vprintk_emit()->trylock FAIL  will spin? :1
>     i_do_printks-1992  [002] d..2  66.345474: offloading: vprintk_emit()->trylock FAIL  will spin? :0   x 1100
>    ...
>    systemd-udevd-671   [003] d..3  66.345917: offloading: clear console_owner  waiter != NULL :1
> 
> 
> i_do_printks-1992 finishes printing [it does limited number of printks],
> it does not compete for console_sem anymore, so those are other tasks
> that have to flush pending messages stored by i_do_printks-1992   :(

even in this case the task that took over printing had to flush logbuf
messages worth of 1100 x 0.01s == 10+ seconds of printing. which is
enough to cause problem. if there are 2200 messages in logbuf, then
there will be 2200 x 0.01 == 20+ seconds of printing. if the task is
atomic, then you probably can imagine what will happen. numbers don't lie.
if we have enough tasks competing for console_sem then the tasks that
actually fills up the logbuf buffer may never call_console_drivers().
so the lockups are still very much possible. *in my particular case*

-ss


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-18 Thread Sergey Senozhatsky
On (12/18/17 18:36), Sergey Senozhatsky wrote:
[..]
>I see *a lot* of cases when CPU that call printk in a loop does not
>end up flushing its messages. And the problem seems to be - preemption.
> 
> 
>   CPU0                                 CPU1
> 
>   for_each_process_thread(g, p)
>     printk()
>       console_unlock()                 printk
>                                          console_trylock() fails
>         sets console_owner
>                                          sees console_owner
>                                          sets console_waiter
>         call_console_drivers
>         clears console_owner
>         sees console_waiter
>         hand off                         spins with local_irq disabled
>                                          sees that it has acquired
>                                           console_sem ownership
> 
>                                          enables local_irq
>     printk
>     ..                                   << preemption >>
>     printk
>     ... unbound number of printk-s
>     printk
>     ..
>     printk
>                                          back to TASK_RUNNING
>                                          goes to console_unlock()
>     printk
>                                          local_irq_save
> 
>                                          ???
>                                          *** if it will see console_waiter
>                                          [being in any context] it will
>                                          hand off. otherwise, preemption
>                                          again and CPU0 can add even more
>                                          messages to logbuf
> 
>                                          local_irq_restore
> 
>                                          << preemption >>


hm... adding preempt_disable()/preempt_enable() to vprintk_emit()
does not fix the issues on my side.

preempt_disable();
...
if (console_trylock()) {
	console_unlock();
} else {
	/* console_owner check and spin/hand-off loop if needed */
	/* console_unlock(); */
}
preempt_enable();


the root cause seems to be the fact that log_store() is significantly
faster than msg_print_text() + call_console_drivers().


looking at this

   systemd-udevd-671   [002] dn.3  66.736432: offloading: set console_owner  :0
   systemd-udevd-671   [002] dn.3  66.749927: offloading: clear console_owner  waiter != NULL :0
   systemd-udevd-671   [002] dn.3  66.749931: offloading: set console_owner  :0
   systemd-udevd-671   [002] dn.3  66.763426: offloading: clear console_owner  waiter != NULL :0
   systemd-udevd-671   [002] dn.3  66.763430: offloading: set console_owner  :0
   systemd-udevd-671   [002] dn.3  66.776925: offloading: clear console_owner  waiter != NULL :0

which is this thing

len += msg_print_text(msg, console_prev, false,
  text + len, sizeof(text) - len);
console_idx = log_next(console_idx);
console_seq++;
console_prev = msg->flags;
raw_spin_unlock(&logbuf_lock);

/*
 * While actively printing out messages, if another printk()
 * were to occur on another CPU, it may wait for this one to
 * finish. This task can not be preempted if there is a
 * waiter waiting to take over.
 */
raw_spin_lock(&console_owner_lock);
console_owner = current;
raw_spin_unlock(&console_owner_lock);

+trace_offloading("set console_owner", " ", 0);

/* The waiter may spin on us after setting console_owner */
spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);

stop_critical_timings();/* don't trace print latency */
call_console_drivers(level, text, len);
start_critical_timings();

raw_spin_lock(&console_owner_lock);
waiter = READ_ONCE(console_waiter);
console_owner = NULL;
raw_spin_unlock(&console_owner_lock);

+trace_offloading("clear console_owner",
+" waiter != NULL ", !!waiter);


it takes call_console_drivers() 0.01+ of a second to print some of
the messages [I think we can ignore raw_spin_lock(&console_owner_lock)
and fully blame call_console_drivers()]. so vprintk_emit() seems to be
gazillion times faster and i_do_printks can add tons of messages while
some other process prints just one.


to be more precise, I see from the traces that i_do_printks can add 1100
messages to the logbuf while call_console_drivers() prints just one.


systemd-udevd-671 owns the lock. sets the console_owner. i_do_printks
keeps adding printks. there kworker/0:1-135 that just ahead of
i_do_printks-1992 and registers itself as the console_sem waiter.

Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-18 Thread Sergey Senozhatsky
On (12/15/17 10:08), Petr Mladek wrote:
[..]
> > > Is the above scenario really dangerous? console_lock() owner is
> > > able to sleep. Therefore there is no risk of a softlockup.
> > > 
> > > Sure, many messages will get stacked in the meantime and the console
> > > owner may then get passed to another owner in atomic context. But
> > > do you really see this in the real life?
> > 
> > console_sem is locked by atomic printk CPU1~CPU10. non-atomic CPU is just
> > sleeping waiting for the console_sem. while atomic printk CPUs just hand
> > off console_sem ownership to each other without ever up()-ing the 
> > console_sem.
> > what's the point of hand off here? how is that going to work?
> > 
> > what we need to do is to offload printing from atomic contexts to a
> > non-atomic one - which is CPU0. and that non-atomic CPU is sleeping
> > on the console_sem, ready to continue printing. but it never gets its
> > chance to do so, because CPU0 ~ CPU10 just passing console_sem ownership
> > around, resulting in the same "we print from atomic context" thing.
> 
> Yes, I understand the scenario. The question is how much it is
> realistic.

so it's unlikely to have several CPUs on an SMP system printk-ing
from atomic contexts, while one of the available CPUs does console_lock()
or printk() from non-atomic context?


[..]
> > which is not a justification. we are not looking for a solution that
> > does not make the things worse. we are looking for a solution that
> > does improve the things.
> 
> But it does improve things! The question is if it is enough or not
> in the real life.

console_unlock() is still unbound.


"spreading the printk load" between N CPUs is just 50% of the actual
problem.

console_unlock() has several sides being involved: one is doing the
print out, the other one is sleeping on console_sem waiting for the
first one to up() the console_sem. yes, CPUs that do printk will now
hand over console_sem to each other, but up() still does not happen
for unknown period of time. so all of the processes that sleep in
TASK_UNINTERRUPTIBLE on console_sem - user space process, tty, pty,
drm, frame buffers, PM, etc. etc. - still have unbound TASK_UNINTERRUPTIBLE
sleep. we've been talking about it for years. but there are more
issues.


> Do you see a scenario where it makes things statistically or
> even deterministically worse?

I really don't have that much time right now; but I did a very quick
test on one of my boards today.

NOTE: speaking for myself only and about my observations only.
  you are free to call it unrealistic, impossible, silly, etc.

One more NOTE:
  The board I'm playing with has immediate printk offloading enabled.
  We figured out it's better to have it enabled rather than not, after
  all. It makes a huge difference.

And another NOTE:
  I did NOT alter my setup; I did NOT perform the stupid printk-flood
  type of test (while (1) printk()). I used the "tools" we are using here
  every day, which don't really abuse printk.



// UPD... OK... I actually ended up spending several hours on it, much more
// than I planned. because... I was quite puzzled. I double... no... triple
// checked the backport. It's exactly what v4 posted by Steven does - modulo
// printk_safe stuff [I don't have it backported].


Back to the console_owner patch.

1) it opens both soft and hard lockup vectors

   I see *a lot* of cases when CPU that call printk in a loop does not
   end up flushing its messages. And the problem seems to be - preemption.


  CPU0                                 CPU1

  for_each_process_thread(g, p)
    printk()
      console_unlock()                 printk
                                         console_trylock() fails
        sets console_owner
                                         sees console_owner
                                         sets console_waiter
        call_console_drivers
        clears console_owner
        sees console_waiter
        hand off                         spins with local_irq disabled
                                         sees that it has acquired
                                          console_sem ownership

                                         enables local_irq
    printk
    ..                                   << preemption >>
    printk
    ... unbound number of printk-s
    printk
    ..
    printk
                                         back to TASK_RUNNING
                                         goes to console_unlock()
    printk
                                         local_irq_save

                                         ???
                                         *** if it will see console_waiter
                                         [being in any context] it will
                                         hand off. otherwise, preemption
                                         again and CPU0 can add even more
                                         messages to logbuf

                                         local_irq_restore

                                         << preemption >>

Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-15 Thread Steven Rostedt
On Thu, 14 Dec 2017 23:39:36 +0900
Sergey Senozhatsky  wrote:

> On (12/14/17 15:27), Petr Mladek wrote:
> >
> > Therefore I tend to give Steven's solution a chance before this
> > combined approach.
> >   
> 
> have you seen this https://marc.info/?l=linux-kernel&m=151015850209859
> or this https://marc.info/?l=linux-kernel&m=151011840830776&w=2
> or this https://marc.info/?l=linux-kernel&m=151020275921953&w=2
> 

And these are all just still hand waving. Can we please apply my patch
and send it off into the wild. It doesn't not make things worse, but
does make things better.

My patch probably would have kept this from being a problem:

 
http://lkml.kernel.org/r/1509017339-4802-1-git-send-email-penguin-ker...@i-love.sakura.ne.jp

-- Steve


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-15 Thread Steven Rostedt
On Fri, 15 Dec 2017 10:08:01 +0100
Petr Mladek  wrote:

> You are looking for a perfect solution. But there is no perfect
> solution.

"Perfection is the enemy of 'good enough'" :-)

-- Steve


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-15 Thread Steven Rostedt
On Fri, 15 Dec 2017 09:31:51 +0100
Petr Mladek  wrote:

> Do people have issues with the current upstream printk() or
> still even with Steven's patch?
> 
> My current view is that Steven's patch could not make things
> worse. I was afraid of possible deadlock but it seems that I was
> wrong. Other than that the patch should make things just better
> because it allows passing the work from time to time in a safe way.
> 
> Of course, there is a chance that it will pass the work from
> a safe context to atomic one. But there was the same chance that
> the work already started in the atomic context. Therefore statistically
> this should not make things worse.
> 
> This is why I suggest to start with Steven's solution. If people
> would still see problems in the real life then we could think
> about how to fix it. It is quite likely that we would need to add
> offloading to the kthreads in the end but there is a chance...
> 
> In each case, I think that it is better to split it into
> two or even more steps than introducing one mega-complex
> change. And given the many-years resistance against offloading
> I tend to start with Steven's approach.

THANK YOU!!!

This is exactly what I'm trying to convey.

> 
> Does this make some sense, please?

It definitely does to me :-)

-- Steve


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-15 Thread Steven Rostedt
On Fri, 15 Dec 2017 15:52:05 +0900
Sergey Senozhatsky  wrote:

> On (12/15/17 14:06), Sergey Senozhatsky wrote:
> [..]
> > > Where do we do the above? And has this been proven to be an issue?  
> > 
> > um... hundreds of cases.
> > 
> > deep-stack spin_lock_irqsave() lockup reports from multiple CPUs (3 cpus)
> > happening at the same moment + NMI backtraces from all the CPUs (more
> > than 3 cpus) that follows the lockups, over not-so-fast serial console.
> > exactly the bug report I received two days ago. so which one of the CPUs
> > here is a good candidate to successfully emit all of the pending logbuf
> > entries? none. all of them either have local IRQs disabled, or dump_stack()
> > from either backtrace IPI or backtrace NMI (depending on the 
> > configuration).  
> 
> 
> and, Steven, one more thing. wondering what's your opinion.
> 
> 
> suppose we have console_owner hand off enabled, 1 non-atomic CPU doing
> printk-s and several atomic CPUs doing printk-s. Is proposed hand off
> scheme really useful in this case? CPUs will now
> 
> a) print their lines (a potentially slow call_console_drivers())

My question is, is this an actual problem. Because we haven't fully
evaluated if my patch is enough to make this a non issue.

> 
> and
> 
> b) spin in vprintk_emit on console_owner with local IRQs disabled
>waiting for either non-atomic printk CPU or another atomic CPU
>to finish printing its line (call_console_drivers()) and to hand
>off printing. so current CPU, after busy-waiting for foreign CPU's
>call_console_drivers(), will go and do his own call_console_drivers().
>which, time-wise, simply doubles (roughly) the amount of time that
>CPU spends in printk()->console_unlock(). agreed?

Worse case is doubling, if the two printks happen at the same time.

Today, it is unbounded, which means my patch is much better than what
we do today, and hence we still don't know if it is "good enough". This
is where I say you are trying to solve another problem than what we are
encountering today. Because we don't know if the problem is due to the
unbounded nature of printk, or just a slight longer (but bounded) time.

> 
>if we previously could have a case when non-atomic printk CPU would
>grab the console_sem and print all atomic printk CPUs messages first,
>and then its own messages, thus atomic printk CPUs would have just
>log_store(), now we will have CPUs to call_console_driver() and to
>spin on console_sem owner waiting for call_console_driver() on a foreign
>CPU  [not all of them: it's one CPU doing the print out and one CPU
>spinning console_owner. but overall I think all CPUs will experience
>that spin on console_sem waiting for call_console_driver() and then do
>its own call_console_driver()].
> 
> 
> even two CPUs case is not so simple anymore. see below.
> 
> - first, assume one CPU is atomic and one is non-atomic.
> - second, assume that both CPUs are atomic CPUs, and go through it again.
> 
> 
> CPU0                                CPU1
> 
> printk()                            printk()
>  log_store()
>  log_store()
>  console_unlock()
>   set console_owner
>  sees console_owner
>  sets console_waiter
>  spin
>   call_console_drivers()
>   sees console_waiter
>break
> 
> printk()
>  log_store()
>  console_unlock()
>   set console_owner
>  sees console_owner
>  sets console_waiter
>  spin
>  call_console_drivers()
>  sees console_waiter
>   break
> 
> printk()
>  log_store()
>  console_unlock()
>   set console_owner
>  sees console_owner
>  sets console_waiter
>  spin
>   call_console_drivers()
>   sees console_waiter
>   break
> 
> printk()
>  log_store()
>  console_unlock()
>   set console_owner
>  sees console_owner
>  sets console_waiter
>  spin
> 
> 
> 
> 
> that "wait for call_console_drivers() on another CPU and then do
> its own call_console_drivers()" pattern does look dangerous. the
> benefit of hand-off is really fragile sometimes, isn't it?

Actually, I see your scenario as a perfect example of my patch working
well. You have two CPUs spamming the console with printks, and instead
of one CPU stuck doing nothing but outputting both CPUs work, the two
share the load.

Again, I'm not convinced that all the issues you have encountered so
far can not be solved with my patch. I would like to see a real example
of where it fails. Lets not make this a theoretical approach, lets do
it incrementally and solve one problem at a time.

I know fo

Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-15 Thread Steven Rostedt
On Fri, 15 Dec 2017 14:06:07 +0900
Sergey Senozhatsky  wrote:

> Hello,
> 
> On (12/14/17 22:18), Steven Rostedt wrote:
> > > Steven, your approach works ONLY when we have the following preconditions:
> > > 
> > >  a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
> > > etc) context
> > > 
> > > what does guarantee that? what happens if there is NO non-atomic
> > > CPU or that non-atomic simply misses the console_owner != false
> > > point? we are going to conclude
> > > 
> > > "if printk() doesn't work for you, it's because you are holding 
> > > it wrong"?
> > > 
> > > 
> > > what if that non-atomic CPU does not call printk(), but instead
> > > it does console_lock()/console_unlock()? why there is no handoff?

The case here, you are talking about a CPU doing console_lock() from a
non printk() case. Which is what I was asking about how often this
happens.

As for why there's no handoff. Does the non printk()
console_lock/unlock ever happen from a critical location? I don't think
it does (but I haven't checked). Then it is the perfect candidate to do
all the printing.


> > > 
> > > CPU0  CPU1 ~ CPU10
> > >   in atomic contexts [!]. ping-ponging 
> > > console_sem
> > >   ownership to each other. while what 
> > > they really
> > >   need to do is to simply up() and let 
> > > CPU0 to
> > >   handle it.
> > >   printk
> > >   console_lock()
> > >schedule()
> > >   ...
> > >   printk
> > >   printk
> > >   ...
> > >   printk
> > >   printk
> > > 
> > >   up()
> > > 
> > >   // woken up
> > >   console_unlock()
> > > 
> > > why do we make an emphasis on fixing vprintk_printk()?  
> > 
> > Where do we do the above? And has this been proven to be an issue?  
> 
> um... hundreds of cases.

I was asking about doing the console_unlock from not printk case.

> 
> deep-stack spin_lock_irqsave() lockup reports from multiple CPUs (3 cpus)
> happening at the same moment + NMI backtraces from all the CPUs (more
> than 3 cpus) that follows the lockups, over not-so-fast serial console.
> exactly the bug report I received two days ago. so which one of the CPUs
> here is a good candidate to successfully emit all of the pending logbuf
> entries? none. all of them either have local IRQs disabled, or dump_stack()
> from either backtrace IPI or backtrace NMI (depending on the configuration).
> 

Is the above showing an issue of console_lock happening in the non
printk() case?

> 
> do we periodically do console_lock() on a running system? yes, we do.
> add to console_unlock()

Right, and the non printk() console_lock() should be fine to do all
printing when it unlocks.

> 
> ---
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index b9006617710f..1c811f6d94bf 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2143,6 +2143,10 @@ void console_unlock(void)
> bool wake_klogd = false;
> bool do_cond_resched, retry;
>  
> +   if (!(current->flags & PF_KTHREAD))
> +   dump_stack();
> +
> +
> if (console_suspended) {
> up_console_sem();
> return;
> 
> ---
> 
> and just boot the system.
> 
> 
> I work for a company that has several thousand engineers spread
> across the globe. and people do use printk(), and issues do happen.

Sure, and I still think my patch is good enough.

> 
> the scenarios that Tejun and I talk about are not theoretical. if those
> scenarios are completely theoretical, as you suggest, - then, OK, what
> exactly guarantees that

And I still think my patch is good enough.

> 
>   whenever atomic CPUs printk there is always a non-atomic
>   CPU to take over the printing?

No, and I don't think it has to.

> 
> 
> 
> > >  b) non-atomic CPU sees console_owner set (which is set for a very short
> > > period of time)
> > > 
> > > again. what if that non-atomic CPU does not see console_owner?
> > > "don't use printk()"?  
> > 
> > May I ask, why are we doing the printk in the first place?  
> 
> this argument is really may be applied against your patch as well. I
> really don't want us to have this type of "technical" discussion.

Sure, but my patch fixes the unfair approach that printk currently does.

> 
> printk() is a tool for developers. but developers can't use it.
> 
> 
> > >  c) the task that is looping in console_unlock() sees non-atomic CPU when
> > > console_owner is set.  
> > 
> > I haven't looked at the latest code, but my last patch didn't care
> > about "atomic" and 

Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-15 Thread Petr Mladek
On Fri 2017-12-15 17:42:36, Sergey Senozhatsky wrote:
> On (12/15/17 09:31), Petr Mladek wrote:
> > > On (12/14/17 22:18), Steven Rostedt wrote:
> > > > > Steven, your approach works ONLY when we have the following 
> > > > > preconditions:
> > > > > 
> > > > >  a) there is a CPU that is calling printk() from the 'safe' 
> > > > > (non-atomic,
> > > > > etc) context
> > > > > 
> > > > > what does guarantee that? what happens if there is NO 
> > > > > non-atomic
> > > > > CPU or that non-atomic simply misses the console_owner !=
> > > > > false
> > > > > point? we are going to conclude
> > > > > 
> > > > > "if printk() doesn't work for you, it's because you are 
> > > > > holding it wrong"?
> > > > > 
> > > > > 
> > > > > what if that non-atomic CPU does not call printk(), but 
> > > > > instead
> > > > > it does console_lock()/console_unlock()? why there is no 
> > > > > handoff?
> > > > > 
> > > > > CPU0  CPU1 ~ CPU10
> > > > >   in atomic contexts [!]. 
> > > > > ping-ponging console_sem
> > > > >   ownership to each other. while 
> > > > > what they really
> > > > >   need to do is to simply up() 
> > > > > and let CPU0 to
> > > > >   handle it.
> > > > >   printk
> > > > >   console_lock()
> > > > >schedule()
> > > > >   ...
> > > > >   printk
> > > > >   printk
> > > > >   ...
> > > > >   printk
> > > > >   printk
> > > > > 
> > > > >   up()
> > > > > 
> > > > >   // woken up
> > > > >   console_unlock()
> > > > > 
> > > > > why do we make an emphasis on fixing vprintk_printk()?
> > 
> > Is the above scenario really dangerous? console_lock() owner is
> > able to sleep. Therefore there is no risk of a softlockup.
> > 
> > Sure, many messages will get stacked in the meantime and the console
> > owner may then get passed to another owner in atomic context. But
> > do you really see this in the real life?
> 
> console_sem is locked by atomic printk CPU1~CPU10. non-atomic CPU is just
> sleeping waiting for the console_sem. while atomic printk CPUs just hand
> off console_sem ownership to each other without ever up()-ing the console_sem.
> what's the point of hand off here? how is that going to work?
> 
> what we need to do is to offload printing from atomic contexts to a
> non-atomic one - which is CPU0. and that non-atomic CPU is sleeping
> on the console_sem, ready to continue printing. but it never gets its
> chance to do so, because CPU0 ~ CPU10 just passing console_sem ownership
> around, resulting in the same "we print from atomic context" thing.

Yes, I understand the scenario. The question is how much it is
realistic. And if it is realistic, the question is if the Steven's
patch helps to avoid the softlockup or not.

IMHO, what we need is to push the patch into the wild and wait for
real life reports.


> > Of course, there is a chance that it will pass the work from
> > a safe context to atomic one. But there was the same chance that
> > the work already started in the atomic context. Therefore statistically
> > this should not make things worse.
> 
> which is not a justification. we are not looking for a solution that
> does not make the things worse. we are looking for a solution that
> does improve the things.

But it does improve things! The question is if it is enough or not
in the real life.

Do you see a scenario where it makes things statistically or
even deterministically worse?

Why did you completely ignored the paragraph about step by step
approach? Is there anything wrong about it?


You are looking for a perfect solution. But there is no perfect
solution. There will still be a conflict between the user requirements:
"get messages out" vs. "do not lock up the machine".

The nice thing about Steven's solution is that it slightly improves
one side and does not make the other side worse. Or am I wrong?
Sure, it is possible that it will not be enough. But why not try
this small step first?

Best Regards,
Petr


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-15 Thread Sergey Senozhatsky
On (12/15/17 09:31), Petr Mladek wrote:
> > On (12/14/17 22:18), Steven Rostedt wrote:
> > > > Steven, your approach works ONLY when we have the following 
> > > > preconditions:
> > > > 
> > > >  a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
> > > > etc) context
> > > > 
> > > > what does guarantee that? what happens if there is NO non-atomic
> > > > CPU or that non-atomic simply misses the console_owner !=
> > > > false
> > > > point? we are going to conclude
> > > > 
> > > > "if printk() doesn't work for you, it's because you are holding 
> > > > it wrong"?
> > > > 
> > > > 
> > > > what if that non-atomic CPU does not call printk(), but instead
> > > > it does console_lock()/console_unlock()? why there is no 
> > > > handoff?
> > > > 
> > > > CPU0        CPU1 ~ CPU10
> > > > in atomic contexts [!]. 
> > > > ping-ponging console_sem
> > > > ownership to each other. while 
> > > > what they really
> > > > need to do is to simply up() 
> > > > and let CPU0 to
> > > > handle it.
> > > > printk
> > > > console_lock()
> > > >  schedule()
> > > > ...
> > > > printk
> > > > printk
> > > > ...
> > > > printk
> > > > printk
> > > > 
> > > > up()
> > > > 
> > > > // woken up
> > > > console_unlock()
> > > > 
> > > > why do we make an emphasis on fixing vprintk_printk()?
> 
> Is the above scenario really dangerous? console_lock() owner is
> able to sleep. Therefore there is no risk of a softlockup.
> 
> Sure, many messages will get stacked in the meantime and the console
> owner may then get passed to another owner in atomic context. But
> do you really see this in the real life?

console_sem is locked by atomic printk CPU1~CPU10. non-atomic CPU is just
sleeping waiting for the console_sem. while atomic printk CPUs just hand
off console_sem ownership to each other without ever up()-ing the console_sem.
what's the point of hand off here? how is that going to work?

what we need to do is to offload printing from atomic contexts to a
non-atomic one - which is CPU0. and that non-atomic CPU is sleeping
on the console_sem, ready to continue printing. but it never gets its
chance to do so, because CPU0 ~ CPU10 just passing console_sem ownership
around, resulting in the same "we print from atomic context" thing.
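
to put it in code, the simplest form of what I mean by offloading is just
this (a sketch only; the actual patch set is more careful about when to
wake the kthread and when to fall back to direct printing):

	static struct task_struct *printk_kthread;

	static int printk_kthread_func(void *arg)
	{
		/* sketch: glosses over the sleep/wake-up ordering and
		 * over stopping the thread
		 */
		while (!kthread_should_stop()) {
			set_current_state(TASK_INTERRUPTIBLE);
			schedule();
			__set_current_state(TASK_RUNNING);

			/* schedulable, non-atomic context: safe to do
			 * the slow console work here
			 */
			console_lock();
			console_unlock();
		}
		return 0;
	}

	/* and in vprintk_emit(), instead of printing from the caller's
	 * possibly-atomic context:
	 */
	wake_up_process(printk_kthread);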

> Of course, there is a chance that it will pass the work from
> a safe context to atomic one. But there was the same chance that
> the work already started in the atomic context. Therefore statistically
> this should not make things worse.

which is not a justification. we are not looking for a solution that
does not make the things worse. we are looking for a solution that
does improve the things.

-ss


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-15 Thread Petr Mladek
On Fri 2017-12-15 14:06:07, Sergey Senozhatsky wrote:
> Hello,
> 
> On (12/14/17 22:18), Steven Rostedt wrote:
> > > Steven, your approach works ONLY when we have the following preconditions:
> > > 
> > >  a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
> > > etc) context
> > > 
> > > what does guarantee that? what happens if there is NO non-atomic
> > > CPU or that non-atomic simply misses the console_owner != false
> > > point? we are going to conclude
> > > 
> > > "if printk() doesn't work for you, it's because you are holding 
> > > it wrong"?
> > > 
> > > 
> > > what if that non-atomic CPU does not call printk(), but instead
> > > it does console_lock()/console_unlock()? why there is no handoff?
> > > 
> > > CPU0  CPU1 ~ CPU10
> > >   in atomic contexts [!]. ping-ponging 
> > > console_sem
> > >   ownership to each other. while what 
> > > they really
> > >   need to do is to simply up() and let 
> > > CPU0 to
> > >   handle it.
> > >   printk
> > >   console_lock()
> > >schedule()
> > >   ...
> > >   printk
> > >   printk
> > >   ...
> > >   printk
> > >   printk
> > > 
> > >   up()
> > > 
> > >   // woken up
> > >   console_unlock()
> > > 
> > > why do we make an emphasis on fixing vprintk_printk()?

Is the above scenario really dangerous? console_lock() owner is
able to sleep. Therefore there is no risk of a softlockup.

Sure, many messages will get stacked in the meantime and the console
owner may then get passed to another owner in atomic context. But
do you really see this in the real life?


> > Where do we do the above? And has this been proven to be an issue?
> 
> um... hundreds of cases.
> 
> I work for a company that has several thousand engineers spread
> across the globe. and people do use printk(), and issues do happen.

Do people have issues with the current upstream printk() or
still even with Steven's patch?

My current view is that Steven's patch could not make things
worse. I was afraid of possible deadlock but it seems that I was
wrong. Other than that the patch should make things just better
because it allows passing the work from time to time in a safe way.

Of course, there is a chance that it will pass the work from
a safe context to atomic one. But there was the same chance that
the work already started in the atomic context. Therefore statistically
this should not make things worse.

This is why I suggest to start with Steven's solution. If people
would still see problems in the real life then we could think
about how to fix it. It is quite likely that we would need to add
offloading to the kthreads in the end but there is a chance...

In each case, I think that it is better to split it into
two or even more steps than introducing one mega-complex
change. And given the many-years resistance against offloading
I tend to start with Steven's approach.

Does this make some sense, please?

Best Regards,
Petr


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-14 Thread Sergey Senozhatsky
On (12/15/17 14:06), Sergey Senozhatsky wrote:
[..]
> > Where do we do the above? And has this been proven to be an issue?
> 
> um... hundreds of cases.
> 
> deep-stack spin_lock_irqsave() lockup reports from multiple CPUs (3 cpus)
> happening at the same moment + NMI backtraces from all the CPUs (more
> than 3 cpus) that follows the lockups, over not-so-fast serial console.
> exactly the bug report I received two days ago. so which one of the CPUs
> here is a good candidate to successfully emit all of the pending logbuf
> entries? none. all of them either have local IRQs disabled, or dump_stack()
> from either backtrace IPI or backtrace NMI (depending on the configuration).


and, Steven, one more thing. wondering what's your opinion.


suppose we have console_owner hand off enabled, 1 non-atomic CPU doing
printk-s and several atomic CPUs doing printk-s. Is proposed hand off
scheme really useful in this case? CPUs will now

a) print their lines (a potentially slow call_console_drivers())

and

b) spin in vprintk_emit on console_owner with local IRQs disabled
   waiting for either non-atomic printk CPU or another atomic CPU
   to finish printing its line (call_console_drivers()) and to hand
   off printing. so current CPU, after busy-waiting for foreign CPU's
   call_console_drivers(), will go and do his own call_console_drivers().
   which, time-wise, simply doubles (roughly) the amount of time that
   CPU spends in printk()->console_unlock(). agreed?

   if we previously could have a case when non-atomic printk CPU would
   grab the console_sem and print all atomic printk CPUs messages first,
   and then its own messages, thus atomic printk CPUs would have just
   log_store(), now we will have CPUs to call_console_driver() and to
   spin on console_sem owner waiting for call_console_driver() on a foreign
   CPU  [not all of them: it's one CPU doing the print out and one CPU
   spinning console_owner. but overall I think all CPUs will experience
   that spin on console_sem waiting for call_console_driver() and then do
   its own call_console_driver()].


even two CPUs case is not so simple anymore. see below.

- first, assume one CPU is atomic and one is non-atomic.
- second, assume that both CPUs are atomic CPUs, and go through it again.


CPU0                                CPU1

printk()                            printk()
 log_store()
 log_store()
 console_unlock()
  set console_owner
 sees console_owner
 sets console_waiter
 spin
  call_console_drivers()
  sees console_waiter
   break

printk()
 log_store()
 console_unlock()
  set console_owner
 sees console_owner
 sets console_waiter
 spin
 call_console_drivers()
 sees console_waiter
  break

printk()
 log_store()
 console_unlock()
  set console_owner
 sees console_owner
 sets console_waiter
 spin
  call_console_drivers()
  sees console_waiter
  break

printk()
 log_store()
 console_unlock()
  set console_owner
 sees console_owner
 sets console_waiter
 spin




that "wait for call_console_drivers() on another CPU and then do
its own call_console_drivers()" pattern does look dangerous. the
benefit of hand-off is really fragile sometimes, isn't it?

-ss


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-14 Thread Sergey Senozhatsky
Hello,

On (12/14/17 22:18), Steven Rostedt wrote:
> > Steven, your approach works ONLY when we have the following preconditions:
> > 
> >  a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
> > etc) context
> > 
> > what does guarantee that? what happens if there is NO non-atomic
> > CPU or that non-atomic simply misses the console_owner != false
> > point? we are going to conclude
> > 
> > "if printk() doesn't work for you, it's because you are holding it 
> > wrong"?
> > 
> > 
> > what if that non-atomic CPU does not call printk(), but instead
> > it does console_lock()/console_unlock()? why there is no handoff?
> > 
> > CPU0        CPU1 ~ CPU10
> > in atomic contexts [!]. ping-ponging 
> > console_sem
> > ownership to each other. while what 
> > they really
> > need to do is to simply up() and let 
> > CPU0 to
> > handle it.
> > printk
> > console_lock()
> >  schedule()
> > ...
> > printk
> > printk
> > ...
> > printk
> > printk
> > 
> > up()
> > 
> > // woken up
> > console_unlock()
> > 
> > why do we make an emphasis on fixing vprintk_printk()?
> 
> Where do we do the above? And has this been proven to be an issue?

um... hundreds of cases.

deep-stack spin_lock_irqsave() lockup reports from multiple CPUs (3 cpus)
happening at the same moment + NMI backtraces from all the CPUs (more
than 3 cpus) that follows the lockups, over not-so-fast serial console.
exactly the bug report I received two days ago. so which one of the CPUs
here is a good candidate to successfully emit all of the pending logbuf
entries? none. all of them either have local IRQs disabled, or dump_stack()
from either backtrace IPI or backtrace NMI (depending on the configuration).


do we periodically do console_lock() on a running system? yes, we do.
add to console_unlock()

---

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b9006617710f..1c811f6d94bf 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2143,6 +2143,10 @@ void console_unlock(void)
bool wake_klogd = false;
bool do_cond_resched, retry;
 
+   if (!(current->flags & PF_KTHREAD))
+   dump_stack();
+
+
if (console_suspended) {
up_console_sem();
return;

---

and just boot the system.


I work for a company that has several thousand engineers spread
across the globe. and people do use printk(), and issues do happen.

the scenarios that Tejun and I talk about are not theoretical. if those
scenarios are completely theoretical, as you suggest, - then, OK, what
exactly guarantees that

whenever atomic CPUs printk there is always a non-atomic
CPU to take over the printing?



> >  b) non-atomic CPU sees console_owner set (which is set for a very short
> > period of time)
> > 
> > again. what if that non-atomic CPU does not see console_owner?
> > "don't use printk()"?
> 
> May I ask, why are we doing the printk in the first place?

this argument is really may be applied against your patch as well. I
really don't want us to have this type of "technical" discussion.

printk() is a tool for developers. but developers can't use it.


> >  c) the task that is looping in console_unlock() sees non-atomic CPU when
> > console_owner is set.
> 
> I haven't looked at the latest code, but my last patch didn't care
> about "atomic" and "non-atomic"

I know. and I think it is sort of a problem.

lots of printk-s are happening from IRQs / softirqs and so on.
take a look at CONFIG_IP_ROUTE_VERBOSE, for example.

do_softirq() -> ip_handle_martian_source() and a bunch of other
places. 
these irq->printk-s can "steal" the console_sem and go to
console_unlock().

"don't use printk() then" type of argument does not really help
to a guy who reports the lockup.


> > Steven, I thought we reached the agreement [**] that the solution we should
> > be working on is a combination of prinkt_kthread and console_sem hand
> > off. Simply because it adds the missing "there is a non-atomic CPU wishing
> > to console_unlock()" thing.
> > 
> > lkml.kernel.org/r/20171108162813.ga983...@devbig577.frc2.facebook.com
> > 
> > https://marc.info/?l=linux-kernel&m=151011840830776&w=2
> > https://marc.info/?l=linux-kernel&m=151015141407368&w=2
> > https://marc.info/?l=linux-kernel&m=151018900919386&w=2
> > https://marc.info/?l=linux-kernel&m=151019815721161&w=2
> > https://marc.info/?l=linux-kernel&m=1510

Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-14 Thread Steven Rostedt
On Fri, 15 Dec 2017 11:10:24 +0900
Sergey Senozhatsky  wrote:

> Steven, your approach works ONLY when we have the following preconditions:
> 
>  a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
> etc) context
> 
> what does guarantee that? what happens if there is NO non-atomic
> CPU or that non-atomic simply misses the console_owner != false
> point? we are going to conclude
> 
> "if printk() doesn't work for you, it's because you are holding it 
> wrong"?
> 
> 
> what if that non-atomic CPU does not call printk(), but instead
> it does console_lock()/console_unlock()? why there is no handoff?
> 
> CPU0                            CPU1 ~ CPU10
>                                 in atomic contexts [!]. ping-ponging
>                                 console_sem ownership to each other,
>                                 while what they really need to do is
>                                 to simply up() and let CPU0 handle it.
> printk
>  console_lock()
>   schedule()
>                                 ...
>                                 printk
>                                 printk
>                                 ...
>                                 printk
>                                 printk
>
>                                 up()
>
> // woken up
> console_unlock()
> 
> why do we make an emphasis on fixing vprintk_printk()?

Where do we do the above? And has this been proven to be an issue? If
it has, I think it's a separate issue from what I proposed, as what I
proposed is to fix the case where lots of CPUs are doing printks and
only one actually does the write.

> 
> 
>  b) non-atomic CPU sees console_owner set (which is set for a very short
> period of time)
> 
> again. what if that non-atomic CPU does not see console_owner?
> "don't use printk()"?

May I ask, why are we doing the printk in the first place?

> 
>  c) the task that is looping in console_unlock() sees non-atomic CPU when
> console_owner is set.

I haven't looked at the latest code, but my last patch didn't care
about "atomic" and "non-atomic" issues, because I don't know if that is
indeed an issue in the real world.

> 
> 
> IOW, we need to have
> 
> 
>the right CPU (a) at the very right moment (b && c) doing the very right 
> thing.
> 
> 
>* and the "very right moment" is tiny and additionally depends
>  on a foreign CPU [the one that is looping in console_unlock()].
> 
> 
> 
> a simple question - how is that going to work for everyone? are we
> "fixing" a small fraction of possible use-cases?

Still sounds like you are ;-)

> 
> 
> 
> Steven, I thought we reached the agreement [**] that the solution we should
> be working on is a combination of printk_kthread and console_sem hand
> off. Simply because it adds the missing "there is a non-atomic CPU wishing
> to console_unlock()" thing.
> 
>   lkml.kernel.org/r/20171108162813.ga983...@devbig577.frc2.facebook.com
> 
>   https://marc.info/?l=linux-kernel&m=151011840830776&w=2
>   https://marc.info/?l=linux-kernel&m=151015141407368&w=2
>   https://marc.info/?l=linux-kernel&m=151018900919386&w=2
>   https://marc.info/?l=linux-kernel&m=151019815721161&w=2
>   https://marc.info/?l=linux-kernel&m=151020275921953&w=2
> **https://marc.info/?l=linux-kernel&m=151020404622181&w=2
> **https://marc.info/?l=linux-kernel&m=151020565222469&w=2

I'm still fine with the hybrid approach, but I want to see a problem
first before we fix it.

> 
> 
> what am I missing?

The reproducer. Let Tejun do the test with just my patch, and if it
still has problems, then we can add more logic to the code. I like to
take things one step at a time. What I'm seeing is that there was a
problem that could be solved with my solution, but during this process,
people have found hundreds of theoretical problems and started down the
path to solve each of them. I want to see a real bug, before we go down
the path of having to have external threads and such, to solve a bug
that we don't really know exists yet.

-- Steve


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-14 Thread Sergey Senozhatsky
Hello,

On (12/14/17 10:11), Tejun Heo wrote:
> Hey, Steven.
> 
> On Thu, Dec 14, 2017 at 12:55:06PM -0500, Steven Rostedt wrote:
> > Yes! Please create a reproducer, because I still don't believe there is
> > one. And it's all hand waving until there's an actual report that we can
> > lock up the system with my approach.
> 
> Yeah, will do, but out of curiosity, Sergey and I already described
> what the root problem was and you didn't really seem to take that.  Is
> that because the explanation didn't make sense to you or us
> misunderstanding what your code does?

I second _everything_ that Tejun has said.


Steven, your approach works ONLY when we have the following preconditions:

 a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
etc) context

what does guarantee that? what happens if there is NO non-atomic
CPU or that non-atomic CPU simply misses the console_owner != false
point? we are going to conclude

"if printk() doesn't work for you, it's because you are holding it 
wrong"?


what if that non-atomic CPU does not call printk(), but instead
it does console_lock()/console_unlock()? why there is no handoff?

CPU0                            CPU1 ~ CPU10
                                in atomic contexts [!]. ping-ponging
                                console_sem ownership to each other,
                                while what they really need to do is
                                to simply up() and let CPU0 handle it.
printk
 console_lock()
  schedule()
                                ...
                                printk
                                printk
                                ...
                                printk
                                printk

                                up()

// woken up
console_unlock()

why do we make an emphasis on fixing vprintk_printk()?


 b) non-atomic CPU sees console_owner set (which is set for a very short
period of time)

again. what if that non-atomic CPU does not see console_owner?
"don't use printk()"?

 c) the task that is looping in console_unlock() sees non-atomic CPU when
console_owner is set.


IOW, we need to have


   the right CPU (a) at the very right moment (b && c) doing the very right 
thing.


   * and the "very right moment" is tiny and additionally depends
 on a foreign CPU [the one that is looping in console_unlock()].



a simple question - how is that going to work for everyone? are we
"fixing" a small fraction of possible use-cases?



Steven, I thought we reached the agreement [**] that the solution we should
be working on is a combination of printk_kthread and console_sem hand
off. Simply because it adds the missing "there is a non-atomic CPU wishing
to console_unlock()" thing.

lkml.kernel.org/r/20171108162813.ga983...@devbig577.frc2.facebook.com

https://marc.info/?l=linux-kernel&m=151011840830776&w=2
https://marc.info/?l=linux-kernel&m=151015141407368&w=2
https://marc.info/?l=linux-kernel&m=151018900919386&w=2
https://marc.info/?l=linux-kernel&m=151019815721161&w=2
https://marc.info/?l=linux-kernel&m=151020275921953&w=2
**  https://marc.info/?l=linux-kernel&m=151020404622181&w=2
**  https://marc.info/?l=linux-kernel&m=151020565222469&w=2


what am I missing?

-ss


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-14 Thread Steven Rostedt
On Thu, 14 Dec 2017 10:11:53 -0800
Tejun Heo  wrote:

> Hey, Steven.
> 
> On Thu, Dec 14, 2017 at 12:55:06PM -0500, Steven Rostedt wrote:
> > Yes! Please create a reproducer, because I still don't believe there is
> > one. And it's all hand waving until there's an actual report that we can
> > lock up the system with my approach.  
> 
> Yeah, will do, but out of curiosity, Sergey and I already described
> what the root problem was and you didn't really seem to take that.  Is
> that because the explanation didn't make sense to you or us
> misunderstanding what your code does?

Can you post the message id of the discussion you are referencing?
Because I've been swamped with other activities and have only been skimming
these threads.

Thanks,

-- Steve


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-14 Thread Tejun Heo
Hey, Steven.

On Thu, Dec 14, 2017 at 12:55:06PM -0500, Steven Rostedt wrote:
> Yes! Please create a reproducer, because I still don't believe there is
> one. And it's all hand waving until there's an actual report that we can
> lock up the system with my approach.

Yeah, will do, but out of curiosity, Sergey and I already described
what the root problem was and you didn't really seem to take that.  Is
that because the explanation didn't make sense to you or us
misunderstanding what your code does?

Thanks.

-- 
tejun


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-14 Thread Steven Rostedt
On Thu, 14 Dec 2017 07:25:51 -0800
Tejun Heo  wrote:

> > 3. Soft-lockups are still theoretically possible with Steven's
> >approach.
> > 
> >But it seems to be quite efficient in many real life scenarios,
> >including Tetsuo's stress testing. Or am I wrong?  
> 
> AFAICS, Steven's approach doesn't fix the livelock that we see quite
> often in the fleet where we don't have a safe context to keep flushing
> messages.  This isn't theoretical at all.  You simply don't have a
> safe context on the cpu to go to.  I said I'd come back with a repro
> case but haven't had a chance to yet.  I'll try to do it before the
> end of the year, but idk this is pretty obvious.

Yes! Please create a reproducer, because I still don't believe there is
one. And it's all hand waving until there's an actual report that we can
lock up the system with my approach.

-- Steve



Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-14 Thread Tejun Heo
Hello, Petr.

On Thu, Dec 14, 2017 at 03:27:09PM +0100, Petr Mladek wrote:
> Ah, I know that it was me who was pessimistic about Steven's approach[1]
> and persuaded you that offloading idea was still alive. But I am less
> sure now.

So, I don't really care which one gets in as long as the livelock
problem is fixed although to my obviously partial eyes the two
alternatives seem overly complex.  That said,

> My three main concerns about Steven's approach were:
> 
> 1. I was afraid that it might introduce new type of deadlocks.
> 
>But it seems that it is quite safe after all.
> 
> 
> 2. Steven's code, implementing the handshake, is far from trivial.
>    A few people were confused and reported false bugs.
> 
>    But the basic idea is pretty simple and straightforward. If
>    we manage to encapsulate it into a few helpers, it might become
>    rather self-contained and maintainable. In any case, the needed
>    changes are much smaller than I expected.
> 
> 
> 3. Soft-lockups are still theoretically possible with Steven's
>approach.
> 
>But it seems to be quite efficient in many real life scenarios,
>including Tetsuo's stress testing. Or am I wrong?

AFAICS, Steven's approach doesn't fix the livelock that we see quite
often in the fleet where we don't have a safe context to keep flushing
messages.  This isn't theoretical at all.  You simply don't have a
safe context on the cpu to go to.  I said I'd come back with a repro
case but haven't had a chance to yet.  I'll try to do it before the
end of the year, but idk this is pretty obvious.

Thanks.

-- 
tejun


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-14 Thread Sergey Senozhatsky
On (12/14/17 15:27), Petr Mladek wrote:
>
> Therefore I tend to give Steven's solution a chance before this
> combined approach.
> 

have you seen this https://marc.info/?l=linux-kernel&m=151015850209859
or this https://marc.info/?l=linux-kernel&m=151011840830776&w=2
or this https://marc.info/?l=linux-kernel&m=151020275921953&w=2

?

-ss


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-14 Thread Petr Mladek
On Mon 2017-12-04 22:48:13, Sergey Senozhatsky wrote:
> Hello,
> 
>   RFC
> 
>   A new version, yet another rework. Lots of changes, e.g. hand off
> control based on Steven's patch. Another change is that this time around
> we finally have a kernel module to test printk offloading (YAYY!). The
> module tests a bunch use cases; we also have trace printk events to...
> trace offloading.

Ah, I know that it was me who was pessimistic about Steven's approach[1]
and persuaded you that offloading idea was still alive. But I am less
sure now.

My three main concerns about Steven's approach were:

1. I was afraid that it might introduce new type of deadlocks.

   But it seems that it is quite safe after all.


2. Steven's code, implementing the handshake, is far from trivial.
   A few people were confused and reported false bugs.

   But the basic idea is pretty simple and straightforward. If
   we manage to encapsulate it into a few helpers, it might become
   rather self-contained and maintainable. (A rough sketch of what
   such helpers might look like follows after this list.) In any
   case, the needed changes are much smaller than I expected.


3. Soft-lockups are still theoretically possible with Steven's
   approach.

   But it seems to be quite efficient in many real life scenarios,
   including Tetsuo's stress testing. Or am I wrong?
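
For illustration of point 2 only (invented names, not Steven's actual
patch): the owner/waiter handshake could be hidden behind a few small
helpers along these lines, so that the printk() and console_unlock()
call sites stay readable:

/* Rough sketch only -- invented names, not the real implementation. */
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int  console_owner_cpu = -1;  /* who is currently flushing     */
static atomic_bool console_waiter;          /* somebody spins for a takeover */

/* Flushing context: mark ourselves as the owner while writing a record. */
static inline void console_owner_set(int cpu)
{
	atomic_store(&console_owner_cpu, cpu);
}

/* Flushing context: returns true if a waiter showed up and the flushing
 * duty (together with console_sem) was handed over to it. */
static inline bool console_owner_clear_and_handoff(void)
{
	atomic_store(&console_owner_cpu, -1);
	return atomic_exchange(&console_waiter, false);
}

/* printk() caller that lost the trylock: spin until the current owner
 * hands the flushing duty over to us.  Returns true if we now own it. */
static inline bool console_spin_for_handoff(void)
{
	if (atomic_load(&console_owner_cpu) < 0)
		return false;                   /* nobody to take over from   */
	if (atomic_exchange(&console_waiter, true))
		return false;                   /* another waiter is queued   */
	while (atomic_load(&console_waiter))
		;                               /* owner clears it on hand-off */
	return true;
}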



Therefore I tend to give Steven's solution a chance before this
combined approach.

In any case, I do not feel comfortable with this combined solution.
I know that it might work much better than the two approaches
alone. But it has the complexity and possible risks of both
implementations. I would prefer to go with smaller steps.

[1] https://lkml.kernel.org/r/20171108102723.60221...@gandalf.local.home


> I'll post the testing script and test module in reply
> to this mail. So... let's have some progress ;) The code is not completely
> awesome, but not tremendously difficult at the same time. We can verify
> the approach/design (we have tests and traces now) first and then start
> improving the code.

The testing module and code are great. Thanks a lot for posting them.

Best Regards,
Petr