subject:"Re\: \[RFC\]\[PATCH v2 1\/2\] printk\: Make printk\(\) completely async"

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-11 Thread Sergey Senozhatsky

Hello Tejun,

On (03/11/16 12:22), Tejun Heo wrote:
> On Tue, Mar 08, 2016 at 07:21:52PM +0900, Sergey Senozhatsky wrote:
> > I'd personally prefer to go with the "less dependency" option -- a dedicated
> > kthread, I think. mostly for the sake of simplicity. I agree with the point
> > that console_unlock() has unpredictable execution time, and in general case
> > we would have a busy kworker (or sleeping in console_lock() or doing
> > cond_resched()) and an idle extra WQ_RESCUER kthread, with activation rules
> > that don't depend on printk. printk with dedicated printk-kthread seems
> > easier to control. how does it sound?
> 
> I don't think it makes sense to avoid workqueue for execution latency.
> The only case which can matter is the rescuer case and as I wrote
> before the system is already in an extremely high latency mode by the
> time rescuer is needed, so it's unlikely to make noticeable
> differences.
>
> However, I agree that using kthread is a good idea here just to reduce
> the amount of dependency as prink working even during complex failures
> is important.  workqueue itself is fairly complex and it also requires
> timer and task creation to work correctly for proper operation.
> That's a lot of extra dependency.

Thanks!

I agree that, in some cases (if not in most of them) the "value" of printk()
output is inversely proportional to the system health -- the worst the state,
the more attention people pay to printk() output; so a simpler solution here
gives more confidence.

-ss

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-11 Thread Tejun Heo

Hello, Sergey.

On Tue, Mar 08, 2016 at 07:21:52PM +0900, Sergey Senozhatsky wrote:
> I'd personally prefer to go with the "less dependency" option -- a dedicated
> kthread, I think. mostly for the sake of simplicity. I agree with the point
> that console_unlock() has unpredictable execution time, and in general case
> we would have a busy kworker (or sleeping in console_lock() or doing
> cond_resched()) and an idle extra WQ_RESCUER kthread, with activation rules
> that don't depend on printk. printk with dedicated printk-kthread seems
> easier to control. how does it sound?

I don't think it makes sense to avoid workqueue for execution latency.
The only case which can matter is the rescuer case and as I wrote
before the system is already in an extremely high latency mode by the
time rescuer is needed, so it's unlikely to make noticeable
differences.

However, I agree that using kthread is a good idea here just to reduce
the amount of dependency as prink working even during complex failures
is important.  workqueue itself is fairly complex and it also requires
timer and task creation to work correctly for proper operation.
That's a lot of extra dependency.

Thanks.

-- 
tejun

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-10 Thread Sergey Senozhatsky

On (03/10/16 10:53), Petr Mladek wrote:
[..]
> On Wed 2016-03-09 15:09:50, Sergey Senozhatsky wrote:
> > On (03/07/16 13:16), Jan Kara wrote:
> > What do you think? Or would you prefer to first introduce async
> > printk() rework, and move to console_unlock() in vprintk_emit()
> > one release cycle later?
> > IOW, in 3 steps:
> > -- first make printk() async
> > -- then console_unlock() async, and use console_unlock_for_printk() in
> >vprintk_emit()
> >
> > -- then switch to console_unlock() in vprintk_emit().
> 
> I would sort this by priorities.

I agree, let's settle down async printk() first.

> I know about real-world problems that will get solved by
> async printk. I haven't heard yet people complaining about
> blocked console_lock()/console_unlock() calls outside printk
> code. So, I would personally prefer to handle async printk
> first.

well, I see some problems with console_lock()/console_unlock()  :)

> Heh, you opened an interesting can of worms. There are definitely
> locations that just want to manipulate the list of consoles and
> their setting without the need to push the date. I wonder how
> many locations really need to push the data.

I've tested it briefly on some of the setups that I have around,
and the boot time reduced by (very roughly) ~20+%; systemd and
friends do a number of tty/etc. calls, and stuck in console_unock()
each time. of course, the "pre-condition" here are printk()s from
drivers/etc. (frequent enough to keep call_console_drivers() busy,
not necessarily "pressure").

even on my laptop, userspace does a ton of console_unlock()

...
  [] console_unlock+0x24/0x89
  [] console_device+0x4a/0x54
  [] tty_open+0x127/0x4c5
  [] chrdev_open+0x13f/0x164
  [] ? cdev_put+0x23/0x23
  [] do_dentry_open.isra.1+0x1b3/0x29e
  [] vfs_open+0x53/0x58
  [] path_openat+0xa37/0xc8c
  [] do_filp_open+0x4d/0xa3
  [] ? __alloc_fd+0x1ae/0x1c0
  [] ? _raw_spin_unlock+0x27/0x31
  [] do_sys_open+0x13c/0x1cc
  [] ? do_sys_open+0x13c/0x1cc
  [] SyS_open+0x1e/0x20
  [] entry_SYSCALL_64_fastpath+0x18/0xa8

fput()->console_unlock()

  [] console_unlock+0x24/0x89
  [] con_shutdown+0x2d/0x30
  [] release_tty+0x52/0x12e
  [] tty_release+0x436/0x453
  [] __fput+0x107/0x1ba
  [] fput+0xe/0x10
  [] task_work_run+0x67/0x90
  [] exit_to_usermode_loop+0x66/0x84
  [] syscall_return_slowpath+0x8d/0x92
  [] entry_SYSCALL_64_fastpath+0xa6/0xa8
...

etc.


> Note that console_unlock_for_printk() might be a bit
> misleading. Especially when you suggest to replace it by
> console_unlock() in vprintk_emit() ;-) I wonder if
> console_flush_and_unlock() might be more descriptive.

oh, yes, the function name was absolutely random.
console_flush_and_unlock() looks good.

> We might even split flush_console() into a separate function in the end.
> I think that the combination with unlock() is there to make sure
> that somebody will flush the last messages from printk(), see
> the retry stuff. It probably won't be needed with the asynch printk().
> 
> Anyway, all these console_unlock() changes looks like another big step
> and I suggest to do it separately.

ok.

> PS: I want to check more precisely the async printk patchset but
> I am repeatedy sidetracked this week :-(

no prob! it's a pre-merge period, no pressure.

I'll re-spin the printk() patch tomorrow, I think.
async console_unlock() will be separated.

-ss

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-10 Thread Sergey Senozhatsky

On (03/10/16 10:27), Jan Kara wrote:
[..]
> So I think this should definitely stay as a separate patch since it
> possibly changes user visible behavior and sometimes blocking may be
> actually desirable for userspace. I don't have that strong opinion whether
> it should be in a separate patch set or part of this one. Maybe a separate
> patch set would be somewhat better so that we first hash out possible issues
> with async printk and once that's settled we start messing more with the
> code changing the behavior of console_unlock() as well.

agree, thanks. I'll split the series and submit console_unlock() rework
separately.

-ss

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-10 Thread Petr Mladek

On Wed 2016-03-09 15:09:50, Sergey Senozhatsky wrote:
> On (03/07/16 13:16), Jan Kara wrote:
> What do you think? Or would you prefer to first introduce async
> printk() rework, and move to console_unlock() in vprintk_emit()
> one release cycle later?
> IOW, in 3 steps:
> -- first make printk() async
> -- then console_unlock() async, and use console_unlock_for_printk() in
>vprintk_emit()
>
> -- then switch to console_unlock() in vprintk_emit().

I would sort this by priorities. I know about real-world problems
that will get solved by async printk. I haven't heard yet people
complaining about blocked console_lock()/console_unlock() calls
outside printk code. So, I would personally prefer to handle
async printk first.

Heh, you opened an interesting can of worms. There are definitely
locations that just want to manipulate the list of consoles and
their setting without the need to push the date. I wonder how
many locations really need to push the data.

Note that console_unlock_for_printk() might be a bit
misleading. Especially when you suggest to replace it by
console_unlock() in vprintk_emit() ;-) I wonder if
console_flush_and_unlock() might be more descriptive.

We might even split flush_console() into a separate function in the end.
I think that the combination with unlock() is there to make sure
that somebody will flush the last messages from printk(), see
the retry stuff. It probably won't be needed with the asynch printk().

Anyway, all these console_unlock() changes looks like another big step
and I suggest to do it separately.

I looking forward to see where this ends.

Thanks,
Petr

PS: I want to check more precisely the async printk patchset but
I am repeatedy sidetracked this week :-(

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-10 Thread Jan Kara

On Wed 09-03-16 15:09:50, Sergey Senozhatsky wrote:
> Hello Jan,
> 
> On (03/07/16 13:16), Jan Kara wrote:
> [..]
> > > So if this will be a problem in practice, using a kthread will probably be
> > > the easiest solution.
> > 
> > Hum, and thinking more about it: Considering that WQ_MEM_RECLAIM workqueues
> > create kthread anyway as a rescuer thread, it may be the simplest to just
> > go back to using a single kthread for printing. What do you think?
> 
> I have this patch on top of the series now. In short, it closes one more
> possibility of lockups -- console_lock()/console_unlock() calls. the patch
> splits console_unlock() in two parts:
> -- the fast one just wake up printing kthread
> -- the slow one does call_console_drivers() loop
> 
> I think it sort of makes sense to tweak the patch below a bit and fold it
> into 0001, and move _some_ of the vprintk_emit() checks to console_unlock().
> 
> very schematically, after folding, vprintk_emit() will be
> 
>   if (in_sched) {
>   if (!printk_sync && printk_thread)
>   wake_up()
>   else
>   irq_work_queue()
>   }
>   
>   if (!in_sched)
>   if (console_trylock())
>   console_unlock()
> 
> and console_unlock() will be
> 
>   if (!in_panic && !printk_sync && printk_thread) {
>   up_console_sem()
>   wake_up()
>   } else {
>   console_unlock_for_printk()
>   }
> 
> console_unlock_for_printk() does the call_console_drivers() loop.
> 
> console_flush_on_panic() and printing_func() call console_unlock_for_printk()
> directly.
> 
> 
> What do you think? Or would you prefer to first introduce async
> printk() rework, and move to console_unlock() in vprintk_emit()
> one release cycle later?
> IOW, in 3 steps:
> -- first make printk() async
> -- then console_unlock() async, and use console_unlock_for_printk() in
>vprintk_emit()
> -- then switch to console_unlock() in vprintk_emit().
> 
> 
> below is the patch which introduces console_unlock_for_printk().
> not the squashed console_unlock_for_printk() and 0001.

So I think this should definitely stay as a separate patch since it
possibly changes user visible behavior and sometimes blocking may be
actually desirable for userspace. I don't have that strong opinion whether
it should be in a separate patch set or part of this one. Maybe a separate
patch set would be somewhat better so that we first hash out possible issues
with async printk and once that's settled we start messing more with the
code changing the behavior of console_unlock() as well.

Honza
-- 
Jan Kara 
SUSE Labs, CR

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-08 Thread Sergey Senozhatsky

Hello Jan,

On (03/07/16 13:16), Jan Kara wrote:
[..]
> > So if this will be a problem in practice, using a kthread will probably be
> > the easiest solution.
> 
> Hum, and thinking more about it: Considering that WQ_MEM_RECLAIM workqueues
> create kthread anyway as a rescuer thread, it may be the simplest to just
> go back to using a single kthread for printing. What do you think?

I have this patch on top of the series now. In short, it closes one more
possibility of lockups -- console_lock()/console_unlock() calls. the patch
splits console_unlock() in two parts:
-- the fast one just wake up printing kthread
-- the slow one does call_console_drivers() loop

I think it sort of makes sense to tweak the patch below a bit and fold it
into 0001, and move _some_ of the vprintk_emit() checks to console_unlock().

very schematically, after folding, vprintk_emit() will be

if (in_sched) {
if (!printk_sync && printk_thread)
wake_up()
else
irq_work_queue()
}

if (!in_sched)
if (console_trylock())
console_unlock()

and console_unlock() will be

if (!in_panic && !printk_sync && printk_thread) {
up_console_sem()
wake_up()
} else {
console_unlock_for_printk()
}

console_unlock_for_printk() does the call_console_drivers() loop.

console_flush_on_panic() and printing_func() call console_unlock_for_printk()
directly.


What do you think? Or would you prefer to first introduce async
printk() rework, and move to console_unlock() in vprintk_emit()
one release cycle later?
IOW, in 3 steps:
-- first make printk() async
-- then console_unlock() async, and use console_unlock_for_printk() in
   vprintk_emit()
-- then switch to console_unlock() in vprintk_emit().


below is the patch which introduces console_unlock_for_printk().
not the squashed console_unlock_for_printk() and 0001.

-ss

==

>From bc3932c68c5afb9bf43af98335c705c75067a93a Mon Sep 17 00:00:00 2001
From: Sergey Senozhatsky 
Subject: [PATCH 3/4] printk: introduce console_unlock_for_printk()

Even though we already have asynchronous printk()->vprintk_emit(),
there are still good chances to get lockups, because we don't have
asynchronous console_unlock(). So any process doing console_lock()
and console_unlock() will end up looping in console_unlock(), pushing
the messages to console drivers (possibly with IRQs or preemption
disabled), regardless the fact that we have a dedicated kthread for
that particular job.

Apart from that, console_lock()/console_unlock() can be executed by
user processes as a part of system calls:

a)  SyS_open()
 ...
  chrdev_open()
   tty_open()
console_device()
 console_lock()
 console_unlock()
  for (;;) {
   call_console_drivers()
  }

b) SyS_read()
...
 sysfs_read_file()
  dev_attr_show()
   show_cons_active()
console_lock()
console_unlock()
 for (;;) {
  call_console_drivers()
 }

c) doing `cat /proc/consoles`
  SyS_read()
   vfs_read()
proc_reg_read()
 seq_read()
  c_stop()
   console_unlock()
for (;;) {
 call_console_drivers()
}

etc.

This can add unnecessary latencies to the user space processes.

This patch splits console_unlock() in two parts:
-- the fast path up() console semaphore and wake up printing kthread
   (if there is one, of course), otherwise
-- the slow path: does what console_unlock() did previously, emit
   the messages and then up() console semaphore

The actual printing loop is, thus, moved to a new function,
console_unlock_for_printk(). There are 3 places that
unconditionally call it:
 a) direct printing from vprintk_emit()
 b) console_flush_on_panic()
 c) printing kthread callback

Signed-off-by: Sergey Senozhatsky 
---
 kernel/printk/printk.c | 51 +++---
 1 file changed, 44 insertions(+), 7 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index de45d86..ddaf62e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -303,6 +303,8 @@ static struct task_struct *printk_thread;
 /* Wait for printing wakeups from async vprintk_emit() */
 static DECLARE_WAIT_QUEUE_HEAD(printing_wait);
 
+static void console_unlock_for_printk(void);
+
 static int printing_func(void *data)
 {
while (1) {
@@ -314,7 +316,7 @@ static int printing_func(void *data)
remove_wait_queue(&printing_wait, &wait);
 
console_lock();
-   console_unlock();
+   console_unlock_for_printk();
}
 
return 0;
@@ -1900,7 +1902,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 * /dev/kmsg and syslog() users.
 */
if (console_trylock())
-   consol

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-08 Thread Sergey Senozhatsky

Hello Tejun,

On (03/07/16 10:49), Tejun Heo wrote:
> On Tue, Mar 08, 2016 at 12:10:47AM +0900, Sergey Senozhatsky wrote:
> > A new version. Switched to [printk] kthread.
> 
> There are some benefits to using a percpu workqueue with CPU_INTENSIVE
> set or an unbound workqueue.  It'd need WQ_RESCUER so it'd still
> create a dedicated thread which is used under heavy memory pressure
> but workqueue will usaully give you local (cpu or node) worker.  One
> reason to use kthread directly tho is minimizing the amount of
> dependency which is pretty important for printk.  If we decide to use
> kthread instead of workqueue, let's please do it for the right reason.

thanks.

I'd personally prefer to go with the "less dependency" option -- a dedicated
kthread, I think. mostly for the sake of simplicity. I agree with the point
that console_unlock() has unpredictable execution time, and in general case
we would have a busy kworker (or sleeping in console_lock() or doing
cond_resched()) and an idle extra WQ_RESCUER kthread, with activation rules
that don't depend on printk. printk with dedicated printk-kthread seems
easier to control. how does it sound?

-ss

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-07 Thread Tejun Heo

Hello,

On Tue, Mar 08, 2016 at 12:10:47AM +0900, Sergey Senozhatsky wrote:
> A new version. Switched to [printk] kthread.

There are some benefits to using a percpu workqueue with CPU_INTENSIVE
set or an unbound workqueue.  It'd need WQ_RESCUER so it'd still
create a dedicated thread which is used under heavy memory pressure
but workqueue will usaully give you local (cpu or node) worker.  One
reason to use kthread directly tho is minimizing the amount of
dependency which is pretty important for printk.  If we decide to use
kthread instead of workqueue, let's please do it for the right reason.

Thanks.

-- 
tejun

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-07 Thread Tejun Heo

Hello,

On Mon, Mar 07, 2016 at 09:22:30AM +0100, Jan Kara wrote:
> > > I don't know what MAYDAY is. I'm talking about a situation where 
> > > printing_work
> > > work item is not processed (i.e. printing_work_func() is not called) until
> > > current work item calls schedule_timeout_*().

That was because the work item was percpu and not marked
CPU_INTENSIVE.  Either using an unbound or CPU_INTENSIVE workqueue
should be enough.

> > > We had a problem that since vmstat_work work item was using system_wq,
> > > vmstat_work work item was not processed (i.e. vmstat_update() was not 
> > > called) if
> > > kworker was looping inside memory allocator without calling 
> > > schedule_timeout_*()
> > > due to disk_events_workfn() doing GFP_NOIO allocation).
> > 
> > hm, just for note, none of system-wide wqs seem to have a ->rescuer thread
> > (WQ_MEM_RECLAIM).

Because WQ_MEM_RECLAIM only guarantees concurrency of 1, it doesn't
make sense to set it to a shared workqueue.  A dedicated workquee
should be created per domain which needs forward progress guarantee.

> > hm. yes, seems that it may take some time until workqueue wakeup() a 
> > ->rescuer thread.
> > need to look more.
> 
> Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up
> creating a worker process and wakes up rescuer thread. However I don't see
> that as a problem...

I don't think it matters.  At that point, the system should already be
thrashing heavily and everything is crawling anyway.  A couple jiffies
delay isn't gonna be noticeable.

Thanks.

-- 
tejun

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-07 Thread Sergey Senozhatsky

On (03/07/16 13:16), Jan Kara wrote:
[..]
> > So for UP systems, we should by default disable async printing anyway I
> > suppose. It is just a pointless overhead. So please just make printk_sync
> > default to true if !CONFIG_SMP.
> > 
> > When IRQs are disabled, you're right we will have a change in behavior. I
> > don't see an easy way of avoiding delaying of printk until IRQs get
> > enabled. I don't want to queue work directly because that creates
> > possibility for lock recursion in queue_work(). And playing some tricks
> > with irq_works isn't easy either - you cannot actually rely on any other
> > CPU doing anything (even a timer tick) because of NOHZ.
> > 
> > So if this will be a problem in practice, using a kthread will probably be
> > the easiest solution.
> 
> Hum, and thinking more about it: Considering that WQ_MEM_RECLAIM workqueues
> create kthread anyway as a rescuer thread, it may be the simplest to just
> go back to using a single kthread for printing. What do you think?

A new version. Switched to [printk] kthread.
Minimally tested only.

===8<===8<===

>From 6f697488b20b356e4103ddcc4f8b0ec1fa6a9531 Mon Sep 17 00:00:00 2001
From: Jan Kara 
Date: Mon, 7 Mar 2016 23:53:46 +0900
Subject: [PATCH 1/2] printk: Make printk() completely async

Currently, printk() sometimes waits for message to be printed to console
and sometimes it does not (when console_sem is held by some other
process). In case printk() grabs console_sem and starts printing to
console, it prints messages from kernel printk buffer until the buffer
is empty. When serial console is attached, printing is slow and thus
other CPUs in the system have plenty of time to append new messages to
the buffer while one CPU is printing. Thus the CPU can spend unbounded
amount of time doing printing in console_unlock().  This is especially
serious problem if the printk() calling console_unlock() was called with
interrupts disabled.

In practice users have observed a CPU can spend tens of seconds printing
in console_unlock() (usually during boot when hundreds of SCSI devices
are discovered) resulting in RCU stalls (CPU doing printing doesn't
reach quiescent state for a long time), softlockup reports (IPIs for the
printing CPU don't get served and thus other CPUs are spinning waiting
for the printing CPU to process IPIs), and eventually a machine death
(as messages from stalls and lockups append to printk buffer faster than
we are able to print). So these machines are unable to boot with serial
console attached. Another observed issue is that due to slow printk,
hardware discovery is slow and udev times out before kernel manages to
discover all the attached HW. Also during artificial stress testing SATA
disk disappears from the system because its interrupts aren't served for
too long.

This patch makes printk() completely asynchronous (similar to what
printk_deferred() did until now). It appends message to the kernel
printk buffer and queues work to do the printing to console. This has
the advantage that printing always happens from a schedulable contex and
thus we don't lockup any particular CPU or even interrupts. Also it has
the advantage that printk() is fast and thus kernel booting is not
slowed down by slow serial console. Disadvantage of this method is that
in case of crash there is higher chance that important messages won't
appear in console output (we may need working scheduling to print
message to console). We somewhat mitigate this risk by switching printk
to the original method of immediate printing to console if oops is in
progress.  Also for debugging purposes we provide printk.synchronous
kernel parameter which resorts to the original printk behavior.

printk() is expected to work under different conditions and in different
scenarios, including corner cases of OOM when all of the workers are busy
(e.g. allocating memory), thus printk() uses its own dedicated printing
kthread, rather than relying on workqueue (even with WQ_MEM_RECLAIM bit
set we potentially can receive delays in printing until workqueue
declares a ->mayday, as noted by Tetsuo Handa). Another thing to mention,
is that deferred printk() messages may appear before @printk_thread is
allocated, so in the very beginning we have to print deferred messages
the old way -- via IRQs.

Signed-off-by: Jan Kara 
Signed-off-by: Sergey Senozhatsky 
---
 Documentation/kernel-parameters.txt |  10 ++
 kernel/printk/printk.c  | 193 +---
 2 files changed, 145 insertions(+), 58 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index e0a21e4..5b01941 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3108,6 +3108,16 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
printk.time=Show timing data prefixed to each printk message line
Format:   (1/Y/y=enable, 0/N/n=disable)

+   printk.

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-07 Thread Sergey Senozhatsky

On (03/07/16 11:52), Jan Kara wrote:
[..]
> So for UP systems, we should by default disable async printing anyway I
> suppose. It is just a pointless overhead. So please just make printk_sync
> default to true if !CONFIG_SMP.

ok, thanks.

> When IRQs are disabled, you're right we will have a change in behavior. I
> don't see an easy way of avoiding delaying of printk until IRQs get
> enabled. I don't want to queue work directly because that creates
> possibility for lock recursion in queue_work(). And playing some tricks
> with irq_works isn't easy either - you cannot actually rely on any other
> CPU doing anything (even a timer tick) because of NOHZ.

yes. I thought about some sort of PRINTK_IPI on smp, and queue work from
PRINTK_IPI. which is a bit insane.

> So if this will be a problem in practice, using a kthread will probably be
> the easiest solution.

probably kthread is the way to go then. Tetsuo's bad experience with
workqueues also sounds a bit alarming.

I'll post a new patch soon.

-ss

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-07 Thread Sergey Senozhatsky

On (03/07/16 20:10), Tetsuo Handa wrote:
[..]
> > > > hm. yes, seems that it may take some time until workqueue wakeup() a 
> > > > ->rescuer thread.
> > > > need to look more.
> > > 
> > > Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up
> > > creating a worker process and wakes up rescuer thread. However I don't see
> > > that as a problem...
> > 
> > yes, that's why I asked Tetsuo whether his concern was a wq's MAYDAY timer
> > delay. the two commits that Tetsuo pointed at earlier in he loop 
> > (373ccbe59270
> > and 564e81a57f97) solved the problem by switching to WQ_MEM_RECLAIM wq.
> > I've slightly tested OOM-kill on my desktop system and haven't spotted any
> > printk delays (well, a test on desktop is not really representative, of
> > course).
> 
> I wanted to tell that if kworker is running a buggy function that calls
> cond_resched() but does not call schedule_timeout_*() for very long time,
> such delay can become many seconds. WQ_MEM_RECLAIM is a requirement for
> waking up when kworker called schedule_timeout_*(). WQ_MEM_RECLAIM wq can
> still cause huge delay if kworker does not call schedule_timeout_*().
> Not specific to OOM-killer or vmstat.

your point is taken. thanks.

-ss

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-07 Thread Tetsuo Handa

Jan Kara wrote:
> > So for UP systems, we should by default disable async printing anyway I
> > suppose. It is just a pointless overhead. So please just make printk_sync
> > default to true if !CONFIG_SMP.
> > 
> > When IRQs are disabled, you're right we will have a change in behavior. I
> > don't see an easy way of avoiding delaying of printk until IRQs get
> > enabled. I don't want to queue work directly because that creates
> > possibility for lock recursion in queue_work(). And playing some tricks
> > with irq_works isn't easy either - you cannot actually rely on any other
> > CPU doing anything (even a timer tick) because of NOHZ.
> > 
> > So if this will be a problem in practice, using a kthread will probably be
> > the easiest solution.
> 
> Hum, and thinking more about it: Considering that WQ_MEM_RECLAIM workqueues
> create kthread anyway as a rescuer thread, it may be the simplest to just
> go back to using a single kthread for printing. What do you think?
> 

I think a plain kthread is better. printing_work_func() can become a buggy
function. console_unlock() from printing_work_func() can occupy kworker for
many seconds due to looping with cond_resched() (i.e. without calling
schedule_timeout_*()) from schedulable context which can in turn defer
other work items.

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-07 Thread Jan Kara

On Mon 07-03-16 11:52:48, Jan Kara wrote:
> On Mon 07-03-16 19:12:33, Sergey Senozhatsky wrote:
> > Hello,
> > 
> > On (03/07/16 09:22), Jan Kara wrote:
> > [..]
> > > > hm, just for note, none of system-wide wqs seem to have a ->rescuer 
> > > > thread
> > > > (WQ_MEM_RECLAIM).
> > > > 
> > > > [..]
> > > > > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work 
> > > > > item,
> > > > > printing_work_func() will not be called until current work item calls
> > > > > schedule_timeout_*(). That will be an undesirable random delay. If 
> > > > > you use
> > > > > a dedicated kernel thread rather than a dedicated workqueue with 
> > > > > WQ_MEM_RECLAIM,
> > > > > we can avoid this random delay.
> > > > 
> > > > hm. yes, seems that it may take some time until workqueue wakeup() a 
> > > > ->rescuer thread.
> > > > need to look more.
> > > 
> > > Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up
> > > creating a worker process and wakes up rescuer thread. However I don't see
> > > that as a problem...
> > 
> > yes, that's why I asked Tetsuo whether his concern was a wq's MAYDAY timer
> > delay. the two commits that Tetsuo pointed at earlier in he loop 
> > (373ccbe59270
> > and 564e81a57f97) solved the problem by switching to WQ_MEM_RECLAIM wq.
> > I've slightly tested OOM-kill on my desktop system and haven't spotted any
> > printk delays (well, a test on desktop is not really representative, of
> > course).
> > 
> > 
> > the only thing that so far grabbed my attention - is
> > 
> > __this_cpu_or(printk_pending)
> > irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > 
> > a _theoretical_ corner case here is when we have only one CPU doing a bunch
> > of printk()s and this CPUs disables irqs in advance
> > local_irq_save
> > for (...)
> > printk()
> > local_irq_restore()
> > 
> > if no other CPUs see `printk_pending' then nothing will be printed up
> > until local_irq_restore() (assuming that IRQ disable time is withing
> > the hardlockup detection threshold). if any other CPUs concurrently
> > execute printk then we are fine, but
> > a) if none -- then we probably have a small change in behaviour
> > and
> > b) UP systems
> 
> So for UP systems, we should by default disable async printing anyway I
> suppose. It is just a pointless overhead. So please just make printk_sync
> default to true if !CONFIG_SMP.
> 
> When IRQs are disabled, you're right we will have a change in behavior. I
> don't see an easy way of avoiding delaying of printk until IRQs get
> enabled. I don't want to queue work directly because that creates
> possibility for lock recursion in queue_work(). And playing some tricks
> with irq_works isn't easy either - you cannot actually rely on any other
> CPU doing anything (even a timer tick) because of NOHZ.
> 
> So if this will be a problem in practice, using a kthread will probably be
> the easiest solution.

Hum, and thinking more about it: Considering that WQ_MEM_RECLAIM workqueues
create kthread anyway as a rescuer thread, it may be the simplest to just
go back to using a single kthread for printing. What do you think?

Honza
-- 
Jan Kara 
SUSE Labs, CR

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-07 Thread Tetsuo Handa

Sergey Senozhatsky wrote:
> Hello,
> 
> On (03/07/16 09:22), Jan Kara wrote:
> [..]
> > > hm, just for note, none of system-wide wqs seem to have a ->rescuer thread
> > > (WQ_MEM_RECLAIM).
> > > 
> > > [..]
> > > > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work 
> > > > item,
> > > > printing_work_func() will not be called until current work item calls
> > > > schedule_timeout_*(). That will be an undesirable random delay. If you 
> > > > use
> > > > a dedicated kernel thread rather than a dedicated workqueue with 
> > > > WQ_MEM_RECLAIM,
> > > > we can avoid this random delay.
> > > 
> > > hm. yes, seems that it may take some time until workqueue wakeup() a 
> > > ->rescuer thread.
> > > need to look more.
> > 
> > Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up
> > creating a worker process and wakes up rescuer thread. However I don't see
> > that as a problem...
> 
> yes, that's why I asked Tetsuo whether his concern was a wq's MAYDAY timer
> delay. the two commits that Tetsuo pointed at earlier in he loop (373ccbe59270
> and 564e81a57f97) solved the problem by switching to WQ_MEM_RECLAIM wq.
> I've slightly tested OOM-kill on my desktop system and haven't spotted any
> printk delays (well, a test on desktop is not really representative, of
> course).

I wanted to tell that if kworker is running a buggy function that calls
cond_resched() but does not call schedule_timeout_*() for very long time,
such delay can become many seconds. WQ_MEM_RECLAIM is a requirement for
waking up when kworker called schedule_timeout_*(). WQ_MEM_RECLAIM wq can
still cause huge delay if kworker does not call schedule_timeout_*().
Not specific to OOM-killer or vmstat.

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-07 Thread Jan Kara

On Mon 07-03-16 19:12:33, Sergey Senozhatsky wrote:
> Hello,
> 
> On (03/07/16 09:22), Jan Kara wrote:
> [..]
> > > hm, just for note, none of system-wide wqs seem to have a ->rescuer thread
> > > (WQ_MEM_RECLAIM).
> > > 
> > > [..]
> > > > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work 
> > > > item,
> > > > printing_work_func() will not be called until current work item calls
> > > > schedule_timeout_*(). That will be an undesirable random delay. If you 
> > > > use
> > > > a dedicated kernel thread rather than a dedicated workqueue with 
> > > > WQ_MEM_RECLAIM,
> > > > we can avoid this random delay.
> > > 
> > > hm. yes, seems that it may take some time until workqueue wakeup() a 
> > > ->rescuer thread.
> > > need to look more.
> > 
> > Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up
> > creating a worker process and wakes up rescuer thread. However I don't see
> > that as a problem...
> 
> yes, that's why I asked Tetsuo whether his concern was a wq's MAYDAY timer
> delay. the two commits that Tetsuo pointed at earlier in he loop (373ccbe59270
> and 564e81a57f97) solved the problem by switching to WQ_MEM_RECLAIM wq.
> I've slightly tested OOM-kill on my desktop system and haven't spotted any
> printk delays (well, a test on desktop is not really representative, of
> course).
> 
> 
> the only thing that so far grabbed my attention - is
> 
>   __this_cpu_or(printk_pending)
>   irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> 
> a _theoretical_ corner case here is when we have only one CPU doing a bunch
> of printk()s and this CPUs disables irqs in advance
>   local_irq_save
>   for (...)
>   printk()
>   local_irq_restore()
> 
> if no other CPUs see `printk_pending' then nothing will be printed up
> until local_irq_restore() (assuming that IRQ disable time is withing
> the hardlockup detection threshold). if any other CPUs concurrently
> execute printk then we are fine, but
>   a) if none -- then we probably have a small change in behaviour
> and
>   b) UP systems

So for UP systems, we should by default disable async printing anyway I
suppose. It is just a pointless overhead. So please just make printk_sync
default to true if !CONFIG_SMP.

When IRQs are disabled, you're right we will have a change in behavior. I
don't see an easy way of avoiding delaying of printk until IRQs get
enabled. I don't want to queue work directly because that creates
possibility for lock recursion in queue_work(). And playing some tricks
with irq_works isn't easy either - you cannot actually rely on any other
CPU doing anything (even a timer tick) because of NOHZ.

So if this will be a problem in practice, using a kthread will probably be
the easiest solution.

Honza
-- 
Jan Kara 
SUSE Labs, CR

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-07 Thread Sergey Senozhatsky

Hello,

On (03/07/16 09:22), Jan Kara wrote:
[..]
> > hm, just for note, none of system-wide wqs seem to have a ->rescuer thread
> > (WQ_MEM_RECLAIM).
> > 
> > [..]
> > > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item,
> > > printing_work_func() will not be called until current work item calls
> > > schedule_timeout_*(). That will be an undesirable random delay. If you use
> > > a dedicated kernel thread rather than a dedicated workqueue with 
> > > WQ_MEM_RECLAIM,
> > > we can avoid this random delay.
> > 
> > hm. yes, seems that it may take some time until workqueue wakeup() a 
> > ->rescuer thread.
> > need to look more.
> 
> Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up
> creating a worker process and wakes up rescuer thread. However I don't see
> that as a problem...

yes, that's why I asked Tetsuo whether his concern was a wq's MAYDAY timer
delay. the two commits that Tetsuo pointed at earlier in he loop (373ccbe59270
and 564e81a57f97) solved the problem by switching to WQ_MEM_RECLAIM wq.
I've slightly tested OOM-kill on my desktop system and haven't spotted any
printk delays (well, a test on desktop is not really representative, of
course).

the only thing that so far grabbed my attention - is

__this_cpu_or(printk_pending)
irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));

a _theoretical_ corner case here is when we have only one CPU doing a bunch
of printk()s and this CPUs disables irqs in advance
local_irq_save
for (...)
printk()
local_irq_restore()

if no other CPUs see `printk_pending' then nothing will be printed up
until local_irq_restore() (assuming that IRQ disable time is withing
the hardlockup detection threshold). if any other CPUs concurrently
execute printk then we are fine, but
a) if none -- then we probably have a small change in behaviour
and
b) UP systems

[..]
> > such usage is quite possible.
> > 
> > problems that I have with console_lock()/console_unlock() is that
> > these functions serve a double purpose: exclusive printk() lock and a
> > console_drivers list lock.
> 
> Well, but changing how console locking works is a separate issue, isn't it?
> So please as a separate patch set if you want to try it.

absolutely agree, this is a separate thing.

> Actually I don't think changing the locking will be so easy.

again, agree. splitting any lock is always tricky and risky.
especially if we talk about console_sem. it can easily add up
new deadlocks, make some fbcon unhappy, etc. etc.

register_console()
write_lock_console_lock()
if (error)
printk()
printk_lock()
read_lock_console_lock() <- eadlock

and so on and so forth; I'm not very enthusiastic to change this at
the moment.

-ss

> console_lock/unlock is used e.g. for console blanking where you need to
> block any printing while you call ->unblank() for each console. That being
> said I don't think improvement is impossible, just given my experiences
> with console / printk code there will be surprises waiting for you :).
> 
>   Honza
> -- 
> Jan Kara 
> SUSE Labs, CR
>

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-07 Thread Jan Kara

On Sun 06-03-16 22:27:03, Sergey Senozhatsky wrote:
> On (03/06/16 20:06), Tetsuo Handa wrote:
> [..]
> > > do you mean a new worker allocation delay and a MAYDAY timer delay?
> > > 
> > 
> > I don't know what MAYDAY is. I'm talking about a situation where 
> > printing_work
> > work item is not processed (i.e. printing_work_func() is not called) until
> > current work item calls schedule_timeout_*().
> > 
> > We had a problem that since vmstat_work work item was using system_wq,
> > vmstat_work work item was not processed (i.e. vmstat_update() was not 
> > called) if
> > kworker was looping inside memory allocator without calling 
> > schedule_timeout_*()
> > due to disk_events_workfn() doing GFP_NOIO allocation).
> 
> hm, just for note, none of system-wide wqs seem to have a ->rescuer thread
> (WQ_MEM_RECLAIM).
> 
> [..]
> > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item,
> > printing_work_func() will not be called until current work item calls
> > schedule_timeout_*(). That will be an undesirable random delay. If you use
> > a dedicated kernel thread rather than a dedicated workqueue with 
> > WQ_MEM_RECLAIM,
> > we can avoid this random delay.
> 
> hm. yes, seems that it may take some time until workqueue wakeup() a 
> ->rescuer thread.
> need to look more.

Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up
creating a worker process and wakes up rescuer thread. However I don't see
that as a problem...

> > >   console_lock();
> > >   for (...) {
> > >   do_foo() {
> > >   ...
> > >   pr_err(" ... foo message ...\n");
> > >   ...
> > >   }
> > >   }
> > >   console_unlock();
> > > 
> > > then yes, nothing will be printed until that process executes 
> > > console_unlock(),
> > > because it's console_unlock() that pushes the messages to console drivers.
> > 
> > Yes, I meant a sequence like
> > 
> >   console_lock();
> >   ptr = kmalloc(GFP_KERNEL);
> >   kfree(ptr);
> >   console_unlock();
> > 
> > and kmalloc() prints OOM killer messages rather than failing that 
> > allocation.
> > Are we sure that there is no such usage?
> 
> such usage is quite possible.
> 
> problems that I have with console_lock()/console_unlock() is that
> these functions serve a double purpose: exclusive printk() lock and a
> console_drivers list lock.

Well, but changing how console locking works is a separate issue, isn't it?
So please as a separate patch set if you want to try it.

Actually I don't think changing the locking will be so easy.
console_lock/unlock is used e.g. for console blanking where you need to
block any printing while you call ->unblank() for each console. That being
said I don't think improvement is impossible, just given my experiences
with console / printk code there will be surprises waiting for you :).

Honza
-- 
Jan Kara 
SUSE Labs, CR

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-06 Thread Tetsuo Handa

Sergey Senozhatsky wrote:
> such usage is quite possible.
> 
> problems that I have with console_lock()/console_unlock() is that
> these functions serve a double purpose: exclusive printk() lock and a
> console_drivers list lock.

Yes, I don't like it too.

> 
>  I haven't really thought about it yet, but I want to split it. 
> 

Since writing to console does not call schedule(), I think
rcu_read_lock()/rcu_read_unlock()/synchronize_rcu() (or synchronize_rcu_*() ?)
can manage it without using
read_lock_console()/read_unlock_console()/write_lock_console()/write_unlock_console().

Replacing console_lock()/console_unlock() for protecting console_drivers list
with RCU might be helpful.

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-06 Thread Sergey Senozhatsky

On (03/06/16 20:06), Tetsuo Handa wrote:
[..]
> > do you mean a new worker allocation delay and a MAYDAY timer delay?
> > 
> 
> I don't know what MAYDAY is. I'm talking about a situation where printing_work
> work item is not processed (i.e. printing_work_func() is not called) until
> current work item calls schedule_timeout_*().
> 
> We had a problem that since vmstat_work work item was using system_wq,
> vmstat_work work item was not processed (i.e. vmstat_update() was not called) 
> if
> kworker was looping inside memory allocator without calling 
> schedule_timeout_*()
> due to disk_events_workfn() doing GFP_NOIO allocation).

hm, just for note, none of system-wide wqs seem to have a ->rescuer thread
(WQ_MEM_RECLAIM).

[..]
> Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item,
> printing_work_func() will not be called until current work item calls
> schedule_timeout_*(). That will be an undesirable random delay. If you use
> a dedicated kernel thread rather than a dedicated workqueue with 
> WQ_MEM_RECLAIM,
> we can avoid this random delay.

hm. yes, seems that it may take some time until workqueue wakeup() a ->rescuer 
thread.
need to look more.

[..]
> > console_lock();
> > for (...) {
> > do_foo() {
> > ...
> > pr_err(" ... foo message ...\n");
> > ...
> > }
> > }
> > console_unlock();
> > 
> > then yes, nothing will be printed until that process executes 
> > console_unlock(),
> > because it's console_unlock() that pushes the messages to console drivers.
> 
> Yes, I meant a sequence like
> 
>   console_lock();
>   ptr = kmalloc(GFP_KERNEL);
>   kfree(ptr);
>   console_unlock();
> 
> and kmalloc() prints OOM killer messages rather than failing that allocation.
> Are we sure that there is no such usage?

such usage is quite possible.

problems that I have with console_lock()/console_unlock() is that
these functions serve a double purpose: exclusive printk() lock and a
console_drivers list lock.

 I haven't really thought about it yet, but I want to split it. 

console_lock()/console_unlock() can be executed by user space processes
(inside system calls). for example:

SyS_open()
 ...
  chrdev_open()
   tty_open()
console_device()
 console_lock()
 console_unlock()
  for (;;) {
   call_console_drivers()
  }

or doing `cat /proc/consoles`

SyS_read()
 vfs_read()
  proc_reg_read()
   seq_read()
c_stop()
 console_unlock()
  for (;;) {
   call_console_drivers()
  }

which can introduce two nasty problems:

1) console_lock() may put user space process in TASK_UNINTERRUPTIBLE for
   unknown period of time -- until current console_sem owner will not
   finish print loop in console_unlock(). no signals, heartbeats, etc.
   will be processed by this user space process.

2) user space process may have to spend an unknown period of time in
   console_unlock() later, pushing "3rd party" kernel messages to
   console drivers. again, not really good.

(kthreads can suffer here too, sure).

in the examples above, a process just wanted to iterate the console_drivers
list in read access mode. so, for instance, in

struct tty_driver *console_device(int *index)
{
struct console *c;
struct tty_driver *driver = NULL;

console_lock();
for_each_console(c) {
if (!c->device)
continue;
driver = c->device(c, index);
if (driver)
break;
}
console_unlock();
return driver;
}

instead of console_lock()/console_unlock()->call_console_drivers()
it could do (very schematically)

read_lock_console();
  for_each_console(c) {
  if (!c->device)
  continue;
  driver = c->device(c, index);
  if (driver)
  break;
  }
read_unlock_console();

and in functions that modify the list, the lock can be acquired in
write mode. example,

int unregister_console(struct console *console)
{
write_lock_console();
...
for (a=console_drivers->next, b=console_drivers ;
 a; b=a, a=b->next) {
if (a == console) {
b->next = a->next;
res = 0;
break;
}
}
}
...
if (console_drivers != NULL && console->flags & CON_CONSDEV)
console_drivers->flags |= CON_CONSDEV;

console->flags &= ~CON_ENABLED;
write_unlock_unlock();

printk(), thus, will take its own "another" exclusive lock, to guarantee
that only one process can call_console_drivers(), and it will take the
console_drivers list lock in read mode. so other process(-es) that also
want to access console_drivers list in

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-06 Thread Tetsuo Handa

Sergey Senozhatsky wrote:
> On (03/06/16 16:18), Tetsuo Handa wrote:
> > Moreover, I don't like use of a workqueue even if printk_wq was allocated
> > with WQ_MEM_RECLAIM bit. As you can see in the discussion of the OOM reaper,
> > the OOM reaper chose a dedicated kernel thread rather than a workqueue
> > ( 
> > http://lkml.kernel.org/r/1454505240-23446-2-git-send-email-mho...@kernel.org
> >  ).
> >
> > Blocking actual printing until ongoing workqueue item calls 
> > schedule_timeout_*()
> > is not nice (see commit 373ccbe59270 and 564e81a57f97).
> 
> do you mean a new worker allocation delay and a MAYDAY timer delay?
> 

I don't know what MAYDAY is. I'm talking about a situation where printing_work
work item is not processed (i.e. printing_work_func() is not called) until
current work item calls schedule_timeout_*().

We had a problem that since vmstat_work work item was using system_wq,
vmstat_work work item was not processed (i.e. vmstat_update() was not called) if
kworker was looping inside memory allocator without calling schedule_timeout_*()
due to disk_events_workfn() doing GFP_NOIO allocation).

--
[  271.579276] MemAlloc: kworker/0:56(7399) gfp=0x240 order=0 delay=129294
[  271.581237]  88007c78fa08 8800778f8c80 88007c79 
8800778f8c80
[  271.583329]  0240  8800778f8c80 
88007c78fa20
[  271.585391]  8162aa9d 0001 88007c78fa30 
8162aac7
[  271.587463] Call Trace:
[  271.588512]  [] preempt_schedule_common+0x18/0x2b
[  271.590243]  [] _cond_resched+0x17/0x20
[  271.591830]  [] __alloc_pages_nodemask+0x64e/0xcc0
[  271.593561]  [] ? __kmalloc+0x22/0x190
[  271.595119]  [] alloc_pages_current+0x87/0x110
[  271.596778]  [] bio_copy_kern+0xc4/0x180
[  271.598342]  [] ? wait_woken+0x80/0x80
[  271.599878]  [] blk_rq_map_kern+0x70/0x130
[  271.601481]  [] ? blk_get_request+0x75/0xe0
[  271.603100]  [] scsi_execute+0x12d/0x160
[  271.604657]  [] scsi_execute_req_flags+0x84/0xf0
[  271.606339]  [] sr_check_events+0xb2/0x2a0 [sr_mod]
[  271.608141]  [] ? set_next_entity+0x6c/0x6a0
[  271.609830]  [] cdrom_check_events+0x13/0x30 [cdrom]
[  271.611610]  [] sr_block_check_events+0x25/0x30 [sr_mod]
[  271.613429]  [] disk_check_events+0x5b/0x150
[  271.615065]  [] disk_events_workfn+0x11/0x20
[  271.616699]  [] process_one_work+0x135/0x310
[  271.618321]  [] worker_thread+0x11b/0x4a0
[  271.620018]  [] ? process_one_work+0x310/0x310
[  271.622022]  [] kthread+0xd3/0xf0
[  271.623533]  [] ? kthread_create_on_node+0x1a0/0x1a0
[  271.625487]  [] ret_from_fork+0x3f/0x70
[  271.627175]  [] ? kthread_create_on_node+0x1a0/0x1a0
--

Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item,
printing_work_func() will not be called until current work item calls
schedule_timeout_*(). That will be an undesirable random delay. If you use
a dedicated kernel thread rather than a dedicated workqueue with WQ_MEM_RECLAIM,
we can avoid this random delay.

> > > +static void printing_work_func(struct work_struct *work)
> > > +{
> > > + console_lock();
> > > + console_unlock();
> > > +}
> > 
> > Is this safe? If somebody invokes the OOM killer between console_lock()
> > and console_unlock(), won't this cause OOM killer messages not printed?
> 
> if you mean something like
> 
>   console_lock();
>   for (...) {
>   do_foo() {
>   ...
>   pr_err(" ... foo message ...\n");
>   ...
>   }
>   }
>   console_unlock();
> 
> then yes, nothing will be printed until that process executes 
> console_unlock(),
> because it's console_unlock() that pushes the messages to console drivers.

Yes, I meant a sequence like

  console_lock();
  ptr = kmalloc(GFP_KERNEL);
  kfree(ptr);
  console_unlock();

and kmalloc() prints OOM killer messages rather than failing that allocation.
Are we sure that there is no such usage?

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-06 Thread Sergey Senozhatsky

On (03/06/16 16:18), Tetsuo Handa wrote:
> Sergey Senozhatsky wrote:
> > printk() is expected to work under different conditions and in different
> > scenarios, including corner cases of OOM when all of the workers are busy
> > (e.g. allocating memory). Thus by default printk() uses its own dedicated
> > workqueue with WQ_MEM_RECLAIM bit set. It falls back to system_long_wq
> > (console_unlock() is time unbound) only if it has failed to allocate
> > printk_wq. Another thing to mention, is that deferred printk() messages
> > may appear before printk_wq is allocated, so in the very beginning we
> > have to printk deferred messages the old way -- in IRQ context.
> 
> I think we should not depend on system_long_wq which does not have
> WQ_MEM_RECLAIM bit. If workqueue allocation upon boot fails (due to ENOMEM),
> such systems won't be able to start userspace programs.

well, yes. system_long_wq is a fallback, but probably just BUG_ON(!printk_wq)
would be a better thing to do.

> Moreover, I don't like use of a workqueue even if printk_wq was allocated
> with WQ_MEM_RECLAIM bit. As you can see in the discussion of the OOM reaper,
> the OOM reaper chose a dedicated kernel thread rather than a workqueue
> ( 
> http://lkml.kernel.org/r/1454505240-23446-2-git-send-email-mho...@kernel.org 
> ).
>
> Blocking actual printing until ongoing workqueue item calls 
> schedule_timeout_*()
> is not nice (see commit 373ccbe59270 and 564e81a57f97).

printk and console_drivers are expected to be 'callable' from any
context, including IRQs, or under spin_locks, etc. so neither part of
printk->console_unlock()->console_driver->write()
can sleep.

console_driver->write() is quite often something like this

foo_write()
{
spin_lock_irqsave(&port->lock);
uart_console_write( ... foo_put_char);
spin_unlock_iqrestore(&port->lock);
}

and foo_put_char(...), perhaps, waits for device to become ready
transaction

while (!(UART_GET_STATUS(port) & TXEMPTY))
cpu_relax();

and then sends out the character to the device

UART_SET_DATA(port, (unsigned char)ch)

so an already executing printk_wq item shouldn't block because of
OOM happening on another CPU.

do you mean a new worker allocation delay and a MAYDAY timer delay?

> Use of WQ_MEM_RECLAIM means we add a task_struct for that workqueue. Thus, 
> using
> a kernel thread does not change total number of task_struct compared to 
> WQ_MEM_RECLAIM
> approach. I think actual printing should occur as soon as possible rather 
> than randomly
> deferred until workqueue item sleeps.

hm, need to take a look.

> > +static void printing_work_func(struct work_struct *work)
> > +{
> > +   console_lock();
> > +   console_unlock();
> > +}
> 
> Is this safe? If somebody invokes the OOM killer between console_lock()
> and console_unlock(), won't this cause OOM killer messages not printed?

between console_lock() and console_unlock() nothing can steal that lock
from worker; so once a worker grabbed the console_lock() it will print
(and console_lock() does not mean "lock logbuf", printk() from other CPUs
can add messages to the logbuf while worker loops in console_unlock()).

if you mean something like

console_lock();
for (...) {
do_foo() {
...
pr_err(" ... foo message ...\n");
...
}
}
console_unlock();

then yes, nothing will be printed until that process executes console_unlock(),
because it's console_unlock() that pushes the messages to console drivers.
(console_lock()/console_unlock() have other problems and issues, which are
not addressed in this patch set). there is only one function that ignores the
state of console semaphore:
panic()->console_flush_on_panic()

but that's a separate thing, not related to this patch set. if you mean 
something
else here, then please more details.

-ss

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-05 Thread Tetsuo Handa

Sergey Senozhatsky wrote:
> printk() is expected to work under different conditions and in different
> scenarios, including corner cases of OOM when all of the workers are busy
> (e.g. allocating memory). Thus by default printk() uses its own dedicated
> workqueue with WQ_MEM_RECLAIM bit set. It falls back to system_long_wq
> (console_unlock() is time unbound) only if it has failed to allocate
> printk_wq. Another thing to mention, is that deferred printk() messages
> may appear before printk_wq is allocated, so in the very beginning we
> have to printk deferred messages the old way -- in IRQ context.

I think we should not depend on system_long_wq which does not have
WQ_MEM_RECLAIM bit. If workqueue allocation upon boot fails (due to ENOMEM),
such systems won't be able to start userspace programs.

Moreover, I don't like use of a workqueue even if printk_wq was allocated
with WQ_MEM_RECLAIM bit. As you can see in the discussion of the OOM reaper,
the OOM reaper chose a dedicated kernel thread rather than a workqueue
( http://lkml.kernel.org/r/1454505240-23446-2-git-send-email-mho...@kernel.org 
).

Blocking actual printing until ongoing workqueue item calls schedule_timeout_*()
is not nice (see commit 373ccbe59270 and 564e81a57f97). Use of WQ_MEM_RECLAIM
means we add a task_struct for that workqueue. Thus, using a kernel thread does
not change total number of task_struct compared to WQ_MEM_RECLAIM approach.
I think actual printing should occur as soon as possible rather than randomly
deferred until workqueue item sleeps.

> +static void printing_work_func(struct work_struct *work)
> +{
> + console_lock();
> + console_unlock();
> +}

Is this safe? If somebody invokes the OOM killer between console_lock()
and console_unlock(), won't this cause OOM killer messages not printed?

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

2016-03-05 Thread Sergey Senozhatsky

On (03/05/16 19:55), Sergey Senozhatsky wrote:
[..]
> +static int __init init_printk_workqueue(void)
> +{
> + if (printk_sync)
> + return 0;
> +
> + printk_wq = alloc_workqueue("printk_wq", WQ_MEM_RECLAIM, 0);
> + /*
> +  * Fallback to one of system-wide workqueues if we failed to
> +  * allocate a dedicated one. console_unlock() may take some
> +  * time to flush all of the messages, so we better use
> +  * system_long_wq.
> +  */
> + if (!printk_wq) {
> + pr_err("printk: Fallback to system_long workqueue\n");
> + printk_wq = system_long_wq;
> + }
> +
> + return 0;
> +}
> +early_initcall(init_printk_workqueue);
  ^^
better be late_initcall(). init_workqueues(), which creates pwq_cache 
kmem_cache,
is early initcall.

===8<===8<===8<

>From d9cf33958febd863ceb25c59d447aa9500427c30 Mon Sep 17 00:00:00 2001
From: Jan Kara 
Subject: [PATCH 1/2] printk: Make printk() completely async

Currently, printk() sometimes waits for message to be printed to console
and sometimes it does not (when console_sem is held by some other
process). In case printk() grabs console_sem and starts printing to
console, it prints messages from kernel printk buffer until the buffer
is empty. When serial console is attached, printing is slow and thus
other CPUs in the system have plenty of time to append new messages to
the buffer while one CPU is printing. Thus the CPU can spend unbounded
amount of time doing printing in console_unlock().  This is especially
serious problem if the printk() calling console_unlock() was called with
interrupts disabled.

In practice users have observed a CPU can spend tens of seconds printing
in console_unlock() (usually during boot when hundreds of SCSI devices
are discovered) resulting in RCU stalls (CPU doing printing doesn't
reach quiescent state for a long time), softlockup reports (IPIs for the
printing CPU don't get served and thus other CPUs are spinning waiting
for the printing CPU to process IPIs), and eventually a machine death
(as messages from stalls and lockups append to printk buffer faster than
we are able to print). So these machines are unable to boot with serial
console attached. Another observed issue is that due to slow printk,
hardware discovery is slow and udev times out before kernel manages to
discover all the attached HW. Also during artificial stress testing SATA
disk disappears from the system because its interrupts aren't served for
too long.

This patch makes printk() completely asynchronous (similar to what
printk_deferred() did until now). It appends message to the kernel
printk buffer and queues work to do the printing to console. This has
the advantage that printing always happens from a schedulable contex and
thus we don't lockup any particular CPU or even interrupts. Also it has
the advantage that printk() is fast and thus kernel booting is not
slowed down by slow serial console. Disadvantage of this method is that
in case of crash there is higher chance that important messages won't
appear in console output (we may need working scheduling to print
message to console). We somewhat mitigate this risk by switching printk
to the original method of immediate printing to console if oops is in
progress.  Also for debugging purposes we provide printk.synchronous
kernel parameter which resorts to the original printk behavior.

printk() is expected to work under different conditions and in different
scenarios, including corner cases of OOM when all of the workers are busy
(e.g. allocating memory). Thus by default printk() uses its own dedicated
workqueue with WQ_MEM_RECLAIM bit set. It falls back to system_long_wq
(console_unlock() is time unbound) only if it has failed to allocate
printk_wq. Another thing to mention, is that deferred printk() messages
may appear before printk_wq is allocated, so in the very beginning we
have to printk deferred messages the old way -- in IRQ context.

Signed-off-by: Jan Kara 
Signed-off-by: Sergey Senozhatsky 
---
 Documentation/kernel-parameters.txt |  10 ++
 kernel/printk/printk.c  | 194 +---
 2 files changed, 146 insertions(+), 58 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index e0a21e4..5b01941 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3108,6 +3108,16 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
printk.time=Show timing data prefixed to each printk message line
Format:   (1/Y/y=enable, 0/N/n=disable)

+   printk.synchronous=
+   By default kernel messages are printed to console
+   asynchronously (except during early boot or when oops
+   is happening). That avoids kernel stalling behind slow
+   serial console and thus avoids softlockups, interrupt
+

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

25 matches

Site Navigation

Mail list logo

Footer information