Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-10 Thread Tejun Heo
Hello,

On Wed, Jan 10, 2018 at 01:22:55PM -0500, Steven Rostedt wrote:
> > Can you please chime in?  Would you be opposed to offloading to an
> > independent context even if it were only for cases where we were
> > already punting?  The thing with the current offloading is that we
> > don't know who we're offloading to.  It might end up in faster or
> > slower context, or more importantly a dangerous one.
> 
> And how is that different to what we have today? It could be the
> "dangerous one" that did the first printk, and 100 other CPUs in "non
> dangerous" locations are constantly calling printk and making that
> "dangerous" one NEVER STOP.

So, the dangerous one would punt to the dedicated safe one beyond
certain point.  The posted version just flushes to the last message
that it saw on entry to flush.

> > The particular case that we've been seeing regularly in the fleet was
> > the following scenario.
> > 
> > 1. Console is IPMI emulated serial console.  Super slow.  Also
> >netconsole is in use.
> > 2. System runs out of memory, OOM triggers.
> > 3. OOM handler is printing out OOM debug info.
> > 4. While trying to emit the messages for netconsole, the network stack
> >/ driver tries to allocate memory and then fail, which in turn
> >triggers allocation failure or other warning messages.  printk was
> >already flushing, so the messages are queued on the ring.
> 
> This looks like a bug in the netconsole, as the net console shouldn't
> print warnings if the warning is caused by it doing a print.
> 
> Totally unrelated problem to my and Petr's patch set. Basically your
> argument is "I see this bug, and your patch doesn't fix it". Well maybe
> we are not solving your bug. Not to mention, it looks like printk isn't
> the bug, but net console is.

Sure, that could be the case, especially if punting to a safe context
can't be done reasonably (and there are downsides to silencing the
recursive messages too), but it'd also be really great to have printk
generaly safe from brining down a machine this way, right?  I just
don't yet see why punting to a safe context is so difficult /
undesirable that we can't solve the issue in a general manner.

Thanks.

-- 
tejun


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-10 Thread Tejun Heo
Hello, Peter.

On Wed, Jan 10, 2018 at 07:21:53PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote:
> > 2. System runs out of memory, OOM triggers.
> > 3. OOM handler is printing out OOM debug info.
> > 4. While trying to emit the messages for netconsole, the network stack
> >/ driver tries to allocate memory and then fail, which in turn
> >triggers allocation failure or other warning messages.  printk was
> >already flushing, so the messages are queued on the ring.
> > 5. OOM handler keeps flushing but 4 repeats and the queue is never
> >shrinking.  Because OOM handler is trapped in printk flushing, it
> >never manages to free memory and no one else can enter OOM path
> >either, so the system is trapped in this state.
> 
> Why not kill recursive OOM (msgs) ?

Sure, we can do that too, e.g. marking flushing thread and ignoring
new messages from it, although that does come with its own downsides.
The choices are

* If we can make printk safe without much downside, that'd be the best
  option.

* If we decide that we can't do that in a reasonable way, we sure can
  try to plug the identified cases.  We might have to play a bit of
  whack-a-mole (e.g. the feedback loop might not necessarily be from
  the same context) but there likely are very few repeatable cases.

It could be me not knowing the history of the discussion but up until
now the discussion hasn't really gotten to that point since I brought
up the case that we've been seeing.

Thanks.

-- 
tejun


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-10 Thread Steven Rostedt
On Wed, 10 Jan 2018 09:02:23 -0800
Tejun Heo  wrote:

> Hello, Linus, Andrew.
> 
> On Wed, Jan 10, 2018 at 05:29:00PM +0100, Petr Mladek wrote:
> > Where is the acceptable compromise? I am not sure. So far, the most
> > forceful people (Linus) did not see softlockups as a big problem.
> > They rather wanted to see the messages.  
> 
> Can you please chime in?  Would you be opposed to offloading to an
> independent context even if it were only for cases where we were
> already punting?  The thing with the current offloading is that we
> don't know who we're offloading to.  It might end up in faster or
> slower context, or more importantly a dangerous one.

And how is that different to what we have today? It could be the
"dangerous one" that did the first printk, and 100 other CPUs in "non
dangerous" locations are constantly calling printk and making that
"dangerous" one NEVER STOP.

My solution is, if there are a ton of printks going off, each one will
do a single print, and pass it to the next one. The printk will only be
stuck doing more than one message if no more printks happen. Which is a
good thing!

Again, my algorithm bounds printk to printing AT MOST the printk buffer
size. And that can only happen if there was a burst of printks on all
CPUs, and then no printks. The one to get handed off the printk would
just finish the buffer and continue. Which should not be an issue.

> 
> The particular case that we've been seeing regularly in the fleet was
> the following scenario.
> 
> 1. Console is IPMI emulated serial console.  Super slow.  Also
>netconsole is in use.
> 2. System runs out of memory, OOM triggers.
> 3. OOM handler is printing out OOM debug info.
> 4. While trying to emit the messages for netconsole, the network stack
>/ driver tries to allocate memory and then fail, which in turn
>triggers allocation failure or other warning messages.  printk was
>already flushing, so the messages are queued on the ring.

This looks like a bug in the netconsole, as the net console shouldn't
print warnings if the warning is caused by it doing a print.

Totally unrelated problem to my and Petr's patch set. Basically your
argument is "I see this bug, and your patch doesn't fix it". Well maybe
we are not solving your bug. Not to mention, it looks like printk isn't
the bug, but net console is.


> 5. OOM handler keeps flushing but 4 repeats and the queue is never
>shrinking.  Because OOM handler is trapped in printk flushing, it
>never manages to free memory and no one else can enter OOM path
>either, so the system is trapped in this state.
> 
> The system usually never recovers in time once this sort of condition
> hits and the following was the patch that I suggested which only punts
> when messages are already being punted and we can easily make it less
> punty by delaying the punting by N messages.
> 
>  http://lkml.kernel.org/r/20171102135258.go3252...@devbig577.frc2.facebook.com
> 
> We definitely can fix the above described case by e.g. preventing
> printk flushing task from queueing more messages or whatever, but it
> just seems really dumb for the system to die from things like this in
> general and it doesn't really take all that much to trigger the
> condition.

It seems really dumb to not fix that recursive net console bug, and
try to solve it with a printk work around. 

-- Steve



Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-10 Thread Peter Zijlstra
On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote:
> 2. System runs out of memory, OOM triggers.
> 3. OOM handler is printing out OOM debug info.
> 4. While trying to emit the messages for netconsole, the network stack
>/ driver tries to allocate memory and then fail, which in turn
>triggers allocation failure or other warning messages.  printk was
>already flushing, so the messages are queued on the ring.
> 5. OOM handler keeps flushing but 4 repeats and the queue is never
>shrinking.  Because OOM handler is trapped in printk flushing, it
>never manages to free memory and no one else can enter OOM path
>either, so the system is trapped in this state.

Why not kill recursive OOM (msgs) ?


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-10 Thread Tejun Heo
On Wed, Jan 10, 2018 at 10:12:52AM -0800, Tejun Heo wrote:
> Hello, Steven.
> 
> So, everything else on your message, sure.  You do what you have to
> do, but I really don't understand the following part, and this has
> been the main source of frustration in the whole discussion.
> 
> On Wed, Jan 10, 2018 at 01:05:17PM -0500, Steven Rostedt wrote:
> > You on the other hand are showing unrealistic scenarios, and crying
> > that it's what you see in production, with no proof of it.
> 
> I've explained the same scenario multiple times.  Unless you're
> assuming that I'm lying, it should be amply clear that the scenario is
> unrealistic - we've been seeing them taking place repeatedly for quite
> a while.

Oops, I meant to write "not unrealistic".  Anyways, if you think I'm
lying, please let me know.  I can ask others who have been seeing the
issue to join the thread.

Thanks.

-- 
tejun


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-10 Thread Tejun Heo
Hello, Steven.

So, everything else on your message, sure.  You do what you have to
do, but I really don't understand the following part, and this has
been the main source of frustration in the whole discussion.

On Wed, Jan 10, 2018 at 01:05:17PM -0500, Steven Rostedt wrote:
> You on the other hand are showing unrealistic scenarios, and crying
> that it's what you see in production, with no proof of it.

I've explained the same scenario multiple times.  Unless you're
assuming that I'm lying, it should be amply clear that the scenario is
unrealistic - we've been seeing them taking place repeatedly for quite
a while.

What I don't understand is why we can't address this seemingly obvious
problem.  If there are technical reasons and the consensus is to not
solve this within flushing logic, sure, we can deal with it otherwise,
but we at least have to be able to agree that there are actual issues
here, no?

Thanks.

-- 
tejun


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-10 Thread Steven Rostedt
On Wed, 10 Jan 2018 06:05:47 -0800
Tejun Heo  wrote:

> On Wed, Jan 10, 2018 at 02:24:16PM +0100, Petr Mladek wrote:
> > This is the last version of Steven's console owner/waiter logic.
> > Plus my proposal to hide it into 3 helper functions. It is supposed
> > to keep the code maintenable.
> > 
> > The handshake really works. It happens about 10-times even during
> > boot of a simple system in qemu with a fast console here. It is
> > definitely able to avoid some softlockups. Let's see if it is
> > enough in practice.
> > 
> > From my point of view, it is ready to go into linux-next so that
> > it can get some more test coverage.
> > 
> > Steven's patch is the v4, see
> > https://lkml.kernel.org/r/20171108102723.60221...@gandalf.local.home  
> 
> At least for now,
> 
>  Nacked-by: Tejun Heo 

And I NACK your NACK!

> 
> Maybe this can be a part of solution but it's really worrying how the
> whole discussion around this subject is proceeding.  You guys are
> trying to railroad actual problems.  Please address actual technical
> problems.

WE ARE!

I presented the issue at Kernel Summit and everyone agreed with me that
the issue my patch solves is a real issue. You have yet to demonstrate
how this does not solve issues.

I presented the history of printk, where it use to serialize all
printks. This was a problem when you had n CPUs doing printks at the
same time, because the n'th CPU had to wait for the n-1 CPUs to print
before it could. This was obviously an issue.

The "solution" to that was to have the first printk do the printing,
and all other printks that come in while it is printing just load their
data into the log buffer and continue. The first printk would get stuck
printing for everyone else. This was fine when we had 4 CPUs, but now
that we have boxes with 100s of CPUs, this is definitely an issue. I
demonstrated that this caused printk() to be unbounded, and there were
real word scenarios that could easily cause a printk to never stop
printing.

My solution is to make printk() have a max bounded time to print. This
is how we solve things in the Real Time world, and it makes perfect
sense in this context. The point being, the max a printk() could
print, and that is if it was really unlucky, which would be really
unlikely because it would mean we had a burst of printks followed by no
printks, the bounded time is what it takes to print the entire buffer.

My solution takes printk from its current unbounded state, and makes it
fixed bounded. Which means printk() is now a O(1) algorithm.

The solution is simple, everyone at KS agreed with it, there should be
no controversy here.

You on the other hand are showing unrealistic scenarios, and crying
that it's what you see in production, with no proof of it.

My printk solution is solid, with no risk of regressions of current
printk usages.

If anything, I'll pull theses patches myself, and push them to Linus
directly. I'll Cc you and you can make your argument to NACK them, and
I'll make mine to take them.

-- Steve


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-10 Thread Tejun Heo
Hello, Linus, Andrew.

On Wed, Jan 10, 2018 at 05:29:00PM +0100, Petr Mladek wrote:
> Where is the acceptable compromise? I am not sure. So far, the most
> forceful people (Linus) did not see softlockups as a big problem.
> They rather wanted to see the messages.

Can you please chime in?  Would you be opposed to offloading to an
independent context even if it were only for cases where we were
already punting?  The thing with the current offloading is that we
don't know who we're offloading to.  It might end up in faster or
slower context, or more importantly a dangerous one.

The particular case that we've been seeing regularly in the fleet was
the following scenario.

1. Console is IPMI emulated serial console.  Super slow.  Also
   netconsole is in use.
2. System runs out of memory, OOM triggers.
3. OOM handler is printing out OOM debug info.
4. While trying to emit the messages for netconsole, the network stack
   / driver tries to allocate memory and then fail, which in turn
   triggers allocation failure or other warning messages.  printk was
   already flushing, so the messages are queued on the ring.
5. OOM handler keeps flushing but 4 repeats and the queue is never
   shrinking.  Because OOM handler is trapped in printk flushing, it
   never manages to free memory and no one else can enter OOM path
   either, so the system is trapped in this state.

The system usually never recovers in time once this sort of condition
hits and the following was the patch that I suggested which only punts
when messages are already being punted and we can easily make it less
punty by delaying the punting by N messages.

 http://lkml.kernel.org/r/20171102135258.go3252...@devbig577.frc2.facebook.com

We definitely can fix the above described case by e.g. preventing
printk flushing task from queueing more messages or whatever, but it
just seems really dumb for the system to die from things like this in
general and it doesn't really take all that much to trigger the
condition.

Thanks.

-- 
tejun


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-10 Thread Petr Mladek
On Wed 2018-01-10 06:05:47, Tejun Heo wrote:
> On Wed, Jan 10, 2018 at 02:24:16PM +0100, Petr Mladek wrote:
> > This is the last version of Steven's console owner/waiter logic.
> > Plus my proposal to hide it into 3 helper functions. It is supposed
> > to keep the code maintenable.
> > 
> > The handshake really works. It happens about 10-times even during
> > boot of a simple system in qemu with a fast console here. It is
> > definitely able to avoid some softlockups. Let's see if it is
> > enough in practice.
> > 
> > From my point of view, it is ready to go into linux-next so that
> > it can get some more test coverage.
> > 
> > Steven's patch is the v4, see
> > https://lkml.kernel.org/r/20171108102723.60221...@gandalf.local.home
> 
> At least for now,
> 
>  Nacked-by: Tejun Heo 
> 
> Maybe this can be a part of solution but it's really worrying how the
> whole discussion around this subject is proceeding.  You guys are
> trying to railroad actual problems.  Please address actual technical
> problems.

I wonder how long you follow the discussions about solving this
problem. I was able to find one old solution from Jan Kara that
was sent on January 15, 2013. You might google it by
"[PATCH] printk: Avoid softlockups in console_unlock()". For example,
it is archived at
http://linux-kernel.2935.n7.nabble.com/PATCH-printk-Avoid-softlockups-in-console-unlock-td581957.html

The historic Jan Kara's solution is actually very similar to your proposal at
https://lkml.kernel.org/r/20171102135258.go3252...@devbig577.frc2.facebook.com

Why Jan Kara's Solution was not accepted?
Was it because he was not trying enough?

No, Jan provided several variants (based on workqueues, irqwork,
kthread), for example
https://lkml.kernel.org/r/1395770101-24534-1-git-send-email-j...@suse.cz
Also he discussed this on conferences, etc.

Later Jan handed over the fight to Sergey Senozhatsky, see
https://lkml.kernel.org/r/1457175338-1665-1-git-send-email-sergey.senozhat...@gmail.com

Also Sergey was very active. He was addressing many issues, discussed
this on Kernel Summit twice.

Why is it not upstream?

All attempts up to v12 were blocked by someone (Andrew, Linus,
Pavel Machek, few others) because they did not guarantee enough
that the kthread would wake up and they would be able to see
the messages!

Sergey tried to address this by forcing synchronous mode in
some situations (panic, suspend, kexec, ...). But people
still complained.

One important milestone was v12, see
https://lkml.kernel.org/r/20160513131848.2087-1-sergey.senozhat...@gmail.com
It was the last version where we did the offload immediately from
vprintk_emit().

The next versions used lazy offload from console_unlock() when
the thread spent there too much time. IMHO, this is one
very promising solution. It guarantees that softlockup
would never happen. But it tries hard to get the messages
out immediately.

Unfortunately, it is very complicated. We have troubles to understand
the concerns, for example see the long discussion about v3 at
https://lkml.kernel.org/r/20170509082859.854-1-sergey.senozhat...@gmail.com
I admit that I did not have enough time to review this.


Anyway, in October, 2017, Steven came up with a completely
different approach (console owner/waiter transfer). It does
not guarantee that the softlockup will not happen. But it
does not suffer from the problem that blocked the obvious
solution for years. It moves the owner at runtime, so
it is guaranteed that the new owner would continue
printing.



Finally, no solution is perfect! There are contradicting requirements
on printk:

get the messages out ASAP
   vs.
do not block the system

The harder you try to get the messages out the more you could block
the entire system.

Where is the acceptable compromise? I am not sure. So far, the most
forceful people (Linus) did not see softlockups as a big problem.
They rather wanted to see the messages.


What could we do?

   + offload  -> not acceptable so far
   + lazy offload -> might be acceptable if done more easily or gets
review

   + try to transfer console owner (Steven) -> helps in several
 situations, so far only hand made stress code failed

   + reduce amount of messages
 + does it make sense to print the same warning 1000-times?
 + could one long warning cause softlockup with the console
   owner transfer?

   + throttle thread producing too many messages
 + IMHO, very good solution but nobody investigated it


This patchset really helps in many situations. I believe that it
does not make things worse. You might block it and spend another
long time discussing other solutions.

Will we need a better solution? Maybe, probably.

Is it possible to provide an acceptable solution using offload?
Probably using lazy offload. In a reasonable time frame with
a comparably low risk? Me not.

Best Regards,
Petr


Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

2018-01-10 Thread Tejun Heo
On Wed, Jan 10, 2018 at 02:24:16PM +0100, Petr Mladek wrote:
> This is the last version of Steven's console owner/waiter logic.
> Plus my proposal to hide it into 3 helper functions. It is supposed
> to keep the code maintenable.
> 
> The handshake really works. It happens about 10-times even during
> boot of a simple system in qemu with a fast console here. It is
> definitely able to avoid some softlockups. Let's see if it is
> enough in practice.
> 
> From my point of view, it is ready to go into linux-next so that
> it can get some more test coverage.
> 
> Steven's patch is the v4, see
> https://lkml.kernel.org/r/20171108102723.60221...@gandalf.local.home

At least for now,

 Nacked-by: Tejun Heo 

Maybe this can be a part of solution but it's really worrying how the
whole discussion around this subject is proceeding.  You guys are
trying to railroad actual problems.  Please address actual technical
problems.

Thanks.

-- 
tejun


<    1   2