Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
Hello Tejun, On (03/11/16 12:22), Tejun Heo wrote: > On Tue, Mar 08, 2016 at 07:21:52PM +0900, Sergey Senozhatsky wrote: > > I'd personally prefer to go with the "less dependency" option -- a dedicated > > kthread, I think. mostly for the sake of simplicity. I agree with the point > > that console_unlock() has unpredictable execution time, and in general case > > we would have a busy kworker (or sleeping in console_lock() or doing > > cond_resched()) and an idle extra WQ_RESCUER kthread, with activation rules > > that don't depend on printk. printk with dedicated printk-kthread seems > > easier to control. how does it sound? > > I don't think it makes sense to avoid workqueue for execution latency. > The only case which can matter is the rescuer case and as I wrote > before the system is already in an extremely high latency mode by the > time rescuer is needed, so it's unlikely to make noticeable > differences. > > However, I agree that using kthread is a good idea here just to reduce > the amount of dependency as prink working even during complex failures > is important. workqueue itself is fairly complex and it also requires > timer and task creation to work correctly for proper operation. > That's a lot of extra dependency. Thanks! I agree that, in some cases (if not in most of them) the "value" of printk() output is inversely proportional to the system health -- the worst the state, the more attention people pay to printk() output; so a simpler solution here gives more confidence. -ss
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
Hello, Sergey. On Tue, Mar 08, 2016 at 07:21:52PM +0900, Sergey Senozhatsky wrote: > I'd personally prefer to go with the "less dependency" option -- a dedicated > kthread, I think. mostly for the sake of simplicity. I agree with the point > that console_unlock() has unpredictable execution time, and in general case > we would have a busy kworker (or sleeping in console_lock() or doing > cond_resched()) and an idle extra WQ_RESCUER kthread, with activation rules > that don't depend on printk. printk with dedicated printk-kthread seems > easier to control. how does it sound? I don't think it makes sense to avoid workqueue for execution latency. The only case which can matter is the rescuer case and as I wrote before the system is already in an extremely high latency mode by the time rescuer is needed, so it's unlikely to make noticeable differences. However, I agree that using kthread is a good idea here just to reduce the amount of dependency as prink working even during complex failures is important. workqueue itself is fairly complex and it also requires timer and task creation to work correctly for proper operation. That's a lot of extra dependency. Thanks. -- tejun
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
On (03/10/16 10:53), Petr Mladek wrote: [..] > On Wed 2016-03-09 15:09:50, Sergey Senozhatsky wrote: > > On (03/07/16 13:16), Jan Kara wrote: > > What do you think? Or would you prefer to first introduce async > > printk() rework, and move to console_unlock() in vprintk_emit() > > one release cycle later? > > IOW, in 3 steps: > > -- first make printk() async > > -- then console_unlock() async, and use console_unlock_for_printk() in > >vprintk_emit() > > > > -- then switch to console_unlock() in vprintk_emit(). > > I would sort this by priorities. I agree, let's settle down async printk() first. > I know about real-world problems that will get solved by > async printk. I haven't heard yet people complaining about > blocked console_lock()/console_unlock() calls outside printk > code. So, I would personally prefer to handle async printk > first. well, I see some problems with console_lock()/console_unlock() :) > Heh, you opened an interesting can of worms. There are definitely > locations that just want to manipulate the list of consoles and > their setting without the need to push the date. I wonder how > many locations really need to push the data. I've tested it briefly on some of the setups that I have around, and the boot time reduced by (very roughly) ~20+%; systemd and friends do a number of tty/etc. calls, and stuck in console_unock() each time. of course, the "pre-condition" here are printk()s from drivers/etc. (frequent enough to keep call_console_drivers() busy, not necessarily "pressure"). even on my laptop, userspace does a ton of console_unlock() ... [] console_unlock+0x24/0x89 [] console_device+0x4a/0x54 [] tty_open+0x127/0x4c5 [] chrdev_open+0x13f/0x164 [] ? cdev_put+0x23/0x23 [] do_dentry_open.isra.1+0x1b3/0x29e [] vfs_open+0x53/0x58 [] path_openat+0xa37/0xc8c [] do_filp_open+0x4d/0xa3 [] ? __alloc_fd+0x1ae/0x1c0 [] ? _raw_spin_unlock+0x27/0x31 [] do_sys_open+0x13c/0x1cc [] ? do_sys_open+0x13c/0x1cc [] SyS_open+0x1e/0x20 [] entry_SYSCALL_64_fastpath+0x18/0xa8 fput()->console_unlock() [] console_unlock+0x24/0x89 [] con_shutdown+0x2d/0x30 [] release_tty+0x52/0x12e [] tty_release+0x436/0x453 [] __fput+0x107/0x1ba [] fput+0xe/0x10 [] task_work_run+0x67/0x90 [] exit_to_usermode_loop+0x66/0x84 [] syscall_return_slowpath+0x8d/0x92 [] entry_SYSCALL_64_fastpath+0xa6/0xa8 ... etc. > Note that console_unlock_for_printk() might be a bit > misleading. Especially when you suggest to replace it by > console_unlock() in vprintk_emit() ;-) I wonder if > console_flush_and_unlock() might be more descriptive. oh, yes, the function name was absolutely random. console_flush_and_unlock() looks good. > We might even split flush_console() into a separate function in the end. > I think that the combination with unlock() is there to make sure > that somebody will flush the last messages from printk(), see > the retry stuff. It probably won't be needed with the asynch printk(). > > Anyway, all these console_unlock() changes looks like another big step > and I suggest to do it separately. ok. > PS: I want to check more precisely the async printk patchset but > I am repeatedy sidetracked this week :-( no prob! it's a pre-merge period, no pressure. I'll re-spin the printk() patch tomorrow, I think. async console_unlock() will be separated. -ss
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
On (03/10/16 10:27), Jan Kara wrote: [..] > So I think this should definitely stay as a separate patch since it > possibly changes user visible behavior and sometimes blocking may be > actually desirable for userspace. I don't have that strong opinion whether > it should be in a separate patch set or part of this one. Maybe a separate > patch set would be somewhat better so that we first hash out possible issues > with async printk and once that's settled we start messing more with the > code changing the behavior of console_unlock() as well. agree, thanks. I'll split the series and submit console_unlock() rework separately. -ss
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
On Wed 2016-03-09 15:09:50, Sergey Senozhatsky wrote: > On (03/07/16 13:16), Jan Kara wrote: > What do you think? Or would you prefer to first introduce async > printk() rework, and move to console_unlock() in vprintk_emit() > one release cycle later? > IOW, in 3 steps: > -- first make printk() async > -- then console_unlock() async, and use console_unlock_for_printk() in >vprintk_emit() > > -- then switch to console_unlock() in vprintk_emit(). I would sort this by priorities. I know about real-world problems that will get solved by async printk. I haven't heard yet people complaining about blocked console_lock()/console_unlock() calls outside printk code. So, I would personally prefer to handle async printk first. Heh, you opened an interesting can of worms. There are definitely locations that just want to manipulate the list of consoles and their setting without the need to push the date. I wonder how many locations really need to push the data. Note that console_unlock_for_printk() might be a bit misleading. Especially when you suggest to replace it by console_unlock() in vprintk_emit() ;-) I wonder if console_flush_and_unlock() might be more descriptive. We might even split flush_console() into a separate function in the end. I think that the combination with unlock() is there to make sure that somebody will flush the last messages from printk(), see the retry stuff. It probably won't be needed with the asynch printk(). Anyway, all these console_unlock() changes looks like another big step and I suggest to do it separately. I looking forward to see where this ends. Thanks, Petr PS: I want to check more precisely the async printk patchset but I am repeatedy sidetracked this week :-(
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
On Wed 09-03-16 15:09:50, Sergey Senozhatsky wrote: > Hello Jan, > > On (03/07/16 13:16), Jan Kara wrote: > [..] > > > So if this will be a problem in practice, using a kthread will probably be > > > the easiest solution. > > > > Hum, and thinking more about it: Considering that WQ_MEM_RECLAIM workqueues > > create kthread anyway as a rescuer thread, it may be the simplest to just > > go back to using a single kthread for printing. What do you think? > > I have this patch on top of the series now. In short, it closes one more > possibility of lockups -- console_lock()/console_unlock() calls. the patch > splits console_unlock() in two parts: > -- the fast one just wake up printing kthread > -- the slow one does call_console_drivers() loop > > I think it sort of makes sense to tweak the patch below a bit and fold it > into 0001, and move _some_ of the vprintk_emit() checks to console_unlock(). > > very schematically, after folding, vprintk_emit() will be > > if (in_sched) { > if (!printk_sync && printk_thread) > wake_up() > else > irq_work_queue() > } > > if (!in_sched) > if (console_trylock()) > console_unlock() > > and console_unlock() will be > > if (!in_panic && !printk_sync && printk_thread) { > up_console_sem() > wake_up() > } else { > console_unlock_for_printk() > } > > console_unlock_for_printk() does the call_console_drivers() loop. > > console_flush_on_panic() and printing_func() call console_unlock_for_printk() > directly. > > > What do you think? Or would you prefer to first introduce async > printk() rework, and move to console_unlock() in vprintk_emit() > one release cycle later? > IOW, in 3 steps: > -- first make printk() async > -- then console_unlock() async, and use console_unlock_for_printk() in >vprintk_emit() > -- then switch to console_unlock() in vprintk_emit(). > > > below is the patch which introduces console_unlock_for_printk(). > not the squashed console_unlock_for_printk() and 0001. So I think this should definitely stay as a separate patch since it possibly changes user visible behavior and sometimes blocking may be actually desirable for userspace. I don't have that strong opinion whether it should be in a separate patch set or part of this one. Maybe a separate patch set would be somewhat better so that we first hash out possible issues with async printk and once that's settled we start messing more with the code changing the behavior of console_unlock() as well. Honza -- Jan Kara SUSE Labs, CR
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
Hello Jan, On (03/07/16 13:16), Jan Kara wrote: [..] > > So if this will be a problem in practice, using a kthread will probably be > > the easiest solution. > > Hum, and thinking more about it: Considering that WQ_MEM_RECLAIM workqueues > create kthread anyway as a rescuer thread, it may be the simplest to just > go back to using a single kthread for printing. What do you think? I have this patch on top of the series now. In short, it closes one more possibility of lockups -- console_lock()/console_unlock() calls. the patch splits console_unlock() in two parts: -- the fast one just wake up printing kthread -- the slow one does call_console_drivers() loop I think it sort of makes sense to tweak the patch below a bit and fold it into 0001, and move _some_ of the vprintk_emit() checks to console_unlock(). very schematically, after folding, vprintk_emit() will be if (in_sched) { if (!printk_sync && printk_thread) wake_up() else irq_work_queue() } if (!in_sched) if (console_trylock()) console_unlock() and console_unlock() will be if (!in_panic && !printk_sync && printk_thread) { up_console_sem() wake_up() } else { console_unlock_for_printk() } console_unlock_for_printk() does the call_console_drivers() loop. console_flush_on_panic() and printing_func() call console_unlock_for_printk() directly. What do you think? Or would you prefer to first introduce async printk() rework, and move to console_unlock() in vprintk_emit() one release cycle later? IOW, in 3 steps: -- first make printk() async -- then console_unlock() async, and use console_unlock_for_printk() in vprintk_emit() -- then switch to console_unlock() in vprintk_emit(). below is the patch which introduces console_unlock_for_printk(). not the squashed console_unlock_for_printk() and 0001. -ss == >From bc3932c68c5afb9bf43af98335c705c75067a93a Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Subject: [PATCH 3/4] printk: introduce console_unlock_for_printk() Even though we already have asynchronous printk()->vprintk_emit(), there are still good chances to get lockups, because we don't have asynchronous console_unlock(). So any process doing console_lock() and console_unlock() will end up looping in console_unlock(), pushing the messages to console drivers (possibly with IRQs or preemption disabled), regardless the fact that we have a dedicated kthread for that particular job. Apart from that, console_lock()/console_unlock() can be executed by user processes as a part of system calls: a) SyS_open() ... chrdev_open() tty_open() console_device() console_lock() console_unlock() for (;;) { call_console_drivers() } b) SyS_read() ... sysfs_read_file() dev_attr_show() show_cons_active() console_lock() console_unlock() for (;;) { call_console_drivers() } c) doing `cat /proc/consoles` SyS_read() vfs_read() proc_reg_read() seq_read() c_stop() console_unlock() for (;;) { call_console_drivers() } etc. This can add unnecessary latencies to the user space processes. This patch splits console_unlock() in two parts: -- the fast path up() console semaphore and wake up printing kthread (if there is one, of course), otherwise -- the slow path: does what console_unlock() did previously, emit the messages and then up() console semaphore The actual printing loop is, thus, moved to a new function, console_unlock_for_printk(). There are 3 places that unconditionally call it: a) direct printing from vprintk_emit() b) console_flush_on_panic() c) printing kthread callback Signed-off-by: Sergey Senozhatsky --- kernel/printk/printk.c | 51 +++--- 1 file changed, 44 insertions(+), 7 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index de45d86..ddaf62e 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -303,6 +303,8 @@ static struct task_struct *printk_thread; /* Wait for printing wakeups from async vprintk_emit() */ static DECLARE_WAIT_QUEUE_HEAD(printing_wait); +static void console_unlock_for_printk(void); + static int printing_func(void *data) { while (1) { @@ -314,7 +316,7 @@ static int printing_func(void *data) remove_wait_queue(&printing_wait, &wait); console_lock(); - console_unlock(); + console_unlock_for_printk(); } return 0; @@ -1900,7 +1902,7 @@ asmlinkage int vprintk_emit(int facility, int level, * /dev/kmsg and syslog() users. */ if (console_trylock()) - consol
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
Hello Tejun, On (03/07/16 10:49), Tejun Heo wrote: > On Tue, Mar 08, 2016 at 12:10:47AM +0900, Sergey Senozhatsky wrote: > > A new version. Switched to [printk] kthread. > > There are some benefits to using a percpu workqueue with CPU_INTENSIVE > set or an unbound workqueue. It'd need WQ_RESCUER so it'd still > create a dedicated thread which is used under heavy memory pressure > but workqueue will usaully give you local (cpu or node) worker. One > reason to use kthread directly tho is minimizing the amount of > dependency which is pretty important for printk. If we decide to use > kthread instead of workqueue, let's please do it for the right reason. thanks. I'd personally prefer to go with the "less dependency" option -- a dedicated kthread, I think. mostly for the sake of simplicity. I agree with the point that console_unlock() has unpredictable execution time, and in general case we would have a busy kworker (or sleeping in console_lock() or doing cond_resched()) and an idle extra WQ_RESCUER kthread, with activation rules that don't depend on printk. printk with dedicated printk-kthread seems easier to control. how does it sound? -ss
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
Hello, On Tue, Mar 08, 2016 at 12:10:47AM +0900, Sergey Senozhatsky wrote: > A new version. Switched to [printk] kthread. There are some benefits to using a percpu workqueue with CPU_INTENSIVE set or an unbound workqueue. It'd need WQ_RESCUER so it'd still create a dedicated thread which is used under heavy memory pressure but workqueue will usaully give you local (cpu or node) worker. One reason to use kthread directly tho is minimizing the amount of dependency which is pretty important for printk. If we decide to use kthread instead of workqueue, let's please do it for the right reason. Thanks. -- tejun
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
Hello, On Mon, Mar 07, 2016 at 09:22:30AM +0100, Jan Kara wrote: > > > I don't know what MAYDAY is. I'm talking about a situation where > > > printing_work > > > work item is not processed (i.e. printing_work_func() is not called) until > > > current work item calls schedule_timeout_*(). That was because the work item was percpu and not marked CPU_INTENSIVE. Either using an unbound or CPU_INTENSIVE workqueue should be enough. > > > We had a problem that since vmstat_work work item was using system_wq, > > > vmstat_work work item was not processed (i.e. vmstat_update() was not > > > called) if > > > kworker was looping inside memory allocator without calling > > > schedule_timeout_*() > > > due to disk_events_workfn() doing GFP_NOIO allocation). > > > > hm, just for note, none of system-wide wqs seem to have a ->rescuer thread > > (WQ_MEM_RECLAIM). Because WQ_MEM_RECLAIM only guarantees concurrency of 1, it doesn't make sense to set it to a shared workqueue. A dedicated workquee should be created per domain which needs forward progress guarantee. > > hm. yes, seems that it may take some time until workqueue wakeup() a > > ->rescuer thread. > > need to look more. > > Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up > creating a worker process and wakes up rescuer thread. However I don't see > that as a problem... I don't think it matters. At that point, the system should already be thrashing heavily and everything is crawling anyway. A couple jiffies delay isn't gonna be noticeable. Thanks. -- tejun
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
On (03/07/16 13:16), Jan Kara wrote: [..] > > So for UP systems, we should by default disable async printing anyway I > > suppose. It is just a pointless overhead. So please just make printk_sync > > default to true if !CONFIG_SMP. > > > > When IRQs are disabled, you're right we will have a change in behavior. I > > don't see an easy way of avoiding delaying of printk until IRQs get > > enabled. I don't want to queue work directly because that creates > > possibility for lock recursion in queue_work(). And playing some tricks > > with irq_works isn't easy either - you cannot actually rely on any other > > CPU doing anything (even a timer tick) because of NOHZ. > > > > So if this will be a problem in practice, using a kthread will probably be > > the easiest solution. > > Hum, and thinking more about it: Considering that WQ_MEM_RECLAIM workqueues > create kthread anyway as a rescuer thread, it may be the simplest to just > go back to using a single kthread for printing. What do you think? A new version. Switched to [printk] kthread. Minimally tested only. ===8<===8<=== >From 6f697488b20b356e4103ddcc4f8b0ec1fa6a9531 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Mon, 7 Mar 2016 23:53:46 +0900 Subject: [PATCH 1/2] printk: Make printk() completely async Currently, printk() sometimes waits for message to be printed to console and sometimes it does not (when console_sem is held by some other process). In case printk() grabs console_sem and starts printing to console, it prints messages from kernel printk buffer until the buffer is empty. When serial console is attached, printing is slow and thus other CPUs in the system have plenty of time to append new messages to the buffer while one CPU is printing. Thus the CPU can spend unbounded amount of time doing printing in console_unlock(). This is especially serious problem if the printk() calling console_unlock() was called with interrupts disabled. In practice users have observed a CPU can spend tens of seconds printing in console_unlock() (usually during boot when hundreds of SCSI devices are discovered) resulting in RCU stalls (CPU doing printing doesn't reach quiescent state for a long time), softlockup reports (IPIs for the printing CPU don't get served and thus other CPUs are spinning waiting for the printing CPU to process IPIs), and eventually a machine death (as messages from stalls and lockups append to printk buffer faster than we are able to print). So these machines are unable to boot with serial console attached. Another observed issue is that due to slow printk, hardware discovery is slow and udev times out before kernel manages to discover all the attached HW. Also during artificial stress testing SATA disk disappears from the system because its interrupts aren't served for too long. This patch makes printk() completely asynchronous (similar to what printk_deferred() did until now). It appends message to the kernel printk buffer and queues work to do the printing to console. This has the advantage that printing always happens from a schedulable contex and thus we don't lockup any particular CPU or even interrupts. Also it has the advantage that printk() is fast and thus kernel booting is not slowed down by slow serial console. Disadvantage of this method is that in case of crash there is higher chance that important messages won't appear in console output (we may need working scheduling to print message to console). We somewhat mitigate this risk by switching printk to the original method of immediate printing to console if oops is in progress. Also for debugging purposes we provide printk.synchronous kernel parameter which resorts to the original printk behavior. printk() is expected to work under different conditions and in different scenarios, including corner cases of OOM when all of the workers are busy (e.g. allocating memory), thus printk() uses its own dedicated printing kthread, rather than relying on workqueue (even with WQ_MEM_RECLAIM bit set we potentially can receive delays in printing until workqueue declares a ->mayday, as noted by Tetsuo Handa). Another thing to mention, is that deferred printk() messages may appear before @printk_thread is allocated, so in the very beginning we have to print deferred messages the old way -- via IRQs. Signed-off-by: Jan Kara Signed-off-by: Sergey Senozhatsky --- Documentation/kernel-parameters.txt | 10 ++ kernel/printk/printk.c | 193 +--- 2 files changed, 145 insertions(+), 58 deletions(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index e0a21e4..5b01941 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -3108,6 +3108,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted. printk.time=Show timing data prefixed to each printk message line Format: (1/Y/y=enable, 0/N/n=disable) + printk.
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
On (03/07/16 11:52), Jan Kara wrote: [..] > So for UP systems, we should by default disable async printing anyway I > suppose. It is just a pointless overhead. So please just make printk_sync > default to true if !CONFIG_SMP. ok, thanks. > When IRQs are disabled, you're right we will have a change in behavior. I > don't see an easy way of avoiding delaying of printk until IRQs get > enabled. I don't want to queue work directly because that creates > possibility for lock recursion in queue_work(). And playing some tricks > with irq_works isn't easy either - you cannot actually rely on any other > CPU doing anything (even a timer tick) because of NOHZ. yes. I thought about some sort of PRINTK_IPI on smp, and queue work from PRINTK_IPI. which is a bit insane. > So if this will be a problem in practice, using a kthread will probably be > the easiest solution. probably kthread is the way to go then. Tetsuo's bad experience with workqueues also sounds a bit alarming. I'll post a new patch soon. -ss
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
On (03/07/16 20:10), Tetsuo Handa wrote: [..] > > > > hm. yes, seems that it may take some time until workqueue wakeup() a > > > > ->rescuer thread. > > > > need to look more. > > > > > > Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up > > > creating a worker process and wakes up rescuer thread. However I don't see > > > that as a problem... > > > > yes, that's why I asked Tetsuo whether his concern was a wq's MAYDAY timer > > delay. the two commits that Tetsuo pointed at earlier in he loop > > (373ccbe59270 > > and 564e81a57f97) solved the problem by switching to WQ_MEM_RECLAIM wq. > > I've slightly tested OOM-kill on my desktop system and haven't spotted any > > printk delays (well, a test on desktop is not really representative, of > > course). > > I wanted to tell that if kworker is running a buggy function that calls > cond_resched() but does not call schedule_timeout_*() for very long time, > such delay can become many seconds. WQ_MEM_RECLAIM is a requirement for > waking up when kworker called schedule_timeout_*(). WQ_MEM_RECLAIM wq can > still cause huge delay if kworker does not call schedule_timeout_*(). > Not specific to OOM-killer or vmstat. your point is taken. thanks. -ss
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
Jan Kara wrote: > > So for UP systems, we should by default disable async printing anyway I > > suppose. It is just a pointless overhead. So please just make printk_sync > > default to true if !CONFIG_SMP. > > > > When IRQs are disabled, you're right we will have a change in behavior. I > > don't see an easy way of avoiding delaying of printk until IRQs get > > enabled. I don't want to queue work directly because that creates > > possibility for lock recursion in queue_work(). And playing some tricks > > with irq_works isn't easy either - you cannot actually rely on any other > > CPU doing anything (even a timer tick) because of NOHZ. > > > > So if this will be a problem in practice, using a kthread will probably be > > the easiest solution. > > Hum, and thinking more about it: Considering that WQ_MEM_RECLAIM workqueues > create kthread anyway as a rescuer thread, it may be the simplest to just > go back to using a single kthread for printing. What do you think? > I think a plain kthread is better. printing_work_func() can become a buggy function. console_unlock() from printing_work_func() can occupy kworker for many seconds due to looping with cond_resched() (i.e. without calling schedule_timeout_*()) from schedulable context which can in turn defer other work items.
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
On Mon 07-03-16 11:52:48, Jan Kara wrote: > On Mon 07-03-16 19:12:33, Sergey Senozhatsky wrote: > > Hello, > > > > On (03/07/16 09:22), Jan Kara wrote: > > [..] > > > > hm, just for note, none of system-wide wqs seem to have a ->rescuer > > > > thread > > > > (WQ_MEM_RECLAIM). > > > > > > > > [..] > > > > > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work > > > > > item, > > > > > printing_work_func() will not be called until current work item calls > > > > > schedule_timeout_*(). That will be an undesirable random delay. If > > > > > you use > > > > > a dedicated kernel thread rather than a dedicated workqueue with > > > > > WQ_MEM_RECLAIM, > > > > > we can avoid this random delay. > > > > > > > > hm. yes, seems that it may take some time until workqueue wakeup() a > > > > ->rescuer thread. > > > > need to look more. > > > > > > Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up > > > creating a worker process and wakes up rescuer thread. However I don't see > > > that as a problem... > > > > yes, that's why I asked Tetsuo whether his concern was a wq's MAYDAY timer > > delay. the two commits that Tetsuo pointed at earlier in he loop > > (373ccbe59270 > > and 564e81a57f97) solved the problem by switching to WQ_MEM_RECLAIM wq. > > I've slightly tested OOM-kill on my desktop system and haven't spotted any > > printk delays (well, a test on desktop is not really representative, of > > course). > > > > > > the only thing that so far grabbed my attention - is > > > > __this_cpu_or(printk_pending) > > irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); > > > > a _theoretical_ corner case here is when we have only one CPU doing a bunch > > of printk()s and this CPUs disables irqs in advance > > local_irq_save > > for (...) > > printk() > > local_irq_restore() > > > > if no other CPUs see `printk_pending' then nothing will be printed up > > until local_irq_restore() (assuming that IRQ disable time is withing > > the hardlockup detection threshold). if any other CPUs concurrently > > execute printk then we are fine, but > > a) if none -- then we probably have a small change in behaviour > > and > > b) UP systems > > So for UP systems, we should by default disable async printing anyway I > suppose. It is just a pointless overhead. So please just make printk_sync > default to true if !CONFIG_SMP. > > When IRQs are disabled, you're right we will have a change in behavior. I > don't see an easy way of avoiding delaying of printk until IRQs get > enabled. I don't want to queue work directly because that creates > possibility for lock recursion in queue_work(). And playing some tricks > with irq_works isn't easy either - you cannot actually rely on any other > CPU doing anything (even a timer tick) because of NOHZ. > > So if this will be a problem in practice, using a kthread will probably be > the easiest solution. Hum, and thinking more about it: Considering that WQ_MEM_RECLAIM workqueues create kthread anyway as a rescuer thread, it may be the simplest to just go back to using a single kthread for printing. What do you think? Honza -- Jan Kara SUSE Labs, CR
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
Sergey Senozhatsky wrote: > Hello, > > On (03/07/16 09:22), Jan Kara wrote: > [..] > > > hm, just for note, none of system-wide wqs seem to have a ->rescuer thread > > > (WQ_MEM_RECLAIM). > > > > > > [..] > > > > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work > > > > item, > > > > printing_work_func() will not be called until current work item calls > > > > schedule_timeout_*(). That will be an undesirable random delay. If you > > > > use > > > > a dedicated kernel thread rather than a dedicated workqueue with > > > > WQ_MEM_RECLAIM, > > > > we can avoid this random delay. > > > > > > hm. yes, seems that it may take some time until workqueue wakeup() a > > > ->rescuer thread. > > > need to look more. > > > > Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up > > creating a worker process and wakes up rescuer thread. However I don't see > > that as a problem... > > yes, that's why I asked Tetsuo whether his concern was a wq's MAYDAY timer > delay. the two commits that Tetsuo pointed at earlier in he loop (373ccbe59270 > and 564e81a57f97) solved the problem by switching to WQ_MEM_RECLAIM wq. > I've slightly tested OOM-kill on my desktop system and haven't spotted any > printk delays (well, a test on desktop is not really representative, of > course). I wanted to tell that if kworker is running a buggy function that calls cond_resched() but does not call schedule_timeout_*() for very long time, such delay can become many seconds. WQ_MEM_RECLAIM is a requirement for waking up when kworker called schedule_timeout_*(). WQ_MEM_RECLAIM wq can still cause huge delay if kworker does not call schedule_timeout_*(). Not specific to OOM-killer or vmstat.
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
On Mon 07-03-16 19:12:33, Sergey Senozhatsky wrote: > Hello, > > On (03/07/16 09:22), Jan Kara wrote: > [..] > > > hm, just for note, none of system-wide wqs seem to have a ->rescuer thread > > > (WQ_MEM_RECLAIM). > > > > > > [..] > > > > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work > > > > item, > > > > printing_work_func() will not be called until current work item calls > > > > schedule_timeout_*(). That will be an undesirable random delay. If you > > > > use > > > > a dedicated kernel thread rather than a dedicated workqueue with > > > > WQ_MEM_RECLAIM, > > > > we can avoid this random delay. > > > > > > hm. yes, seems that it may take some time until workqueue wakeup() a > > > ->rescuer thread. > > > need to look more. > > > > Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up > > creating a worker process and wakes up rescuer thread. However I don't see > > that as a problem... > > yes, that's why I asked Tetsuo whether his concern was a wq's MAYDAY timer > delay. the two commits that Tetsuo pointed at earlier in he loop (373ccbe59270 > and 564e81a57f97) solved the problem by switching to WQ_MEM_RECLAIM wq. > I've slightly tested OOM-kill on my desktop system and haven't spotted any > printk delays (well, a test on desktop is not really representative, of > course). > > > the only thing that so far grabbed my attention - is > > __this_cpu_or(printk_pending) > irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); > > a _theoretical_ corner case here is when we have only one CPU doing a bunch > of printk()s and this CPUs disables irqs in advance > local_irq_save > for (...) > printk() > local_irq_restore() > > if no other CPUs see `printk_pending' then nothing will be printed up > until local_irq_restore() (assuming that IRQ disable time is withing > the hardlockup detection threshold). if any other CPUs concurrently > execute printk then we are fine, but > a) if none -- then we probably have a small change in behaviour > and > b) UP systems So for UP systems, we should by default disable async printing anyway I suppose. It is just a pointless overhead. So please just make printk_sync default to true if !CONFIG_SMP. When IRQs are disabled, you're right we will have a change in behavior. I don't see an easy way of avoiding delaying of printk until IRQs get enabled. I don't want to queue work directly because that creates possibility for lock recursion in queue_work(). And playing some tricks with irq_works isn't easy either - you cannot actually rely on any other CPU doing anything (even a timer tick) because of NOHZ. So if this will be a problem in practice, using a kthread will probably be the easiest solution. Honza -- Jan Kara SUSE Labs, CR
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
Hello, On (03/07/16 09:22), Jan Kara wrote: [..] > > hm, just for note, none of system-wide wqs seem to have a ->rescuer thread > > (WQ_MEM_RECLAIM). > > > > [..] > > > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item, > > > printing_work_func() will not be called until current work item calls > > > schedule_timeout_*(). That will be an undesirable random delay. If you use > > > a dedicated kernel thread rather than a dedicated workqueue with > > > WQ_MEM_RECLAIM, > > > we can avoid this random delay. > > > > hm. yes, seems that it may take some time until workqueue wakeup() a > > ->rescuer thread. > > need to look more. > > Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up > creating a worker process and wakes up rescuer thread. However I don't see > that as a problem... yes, that's why I asked Tetsuo whether his concern was a wq's MAYDAY timer delay. the two commits that Tetsuo pointed at earlier in he loop (373ccbe59270 and 564e81a57f97) solved the problem by switching to WQ_MEM_RECLAIM wq. I've slightly tested OOM-kill on my desktop system and haven't spotted any printk delays (well, a test on desktop is not really representative, of course). the only thing that so far grabbed my attention - is __this_cpu_or(printk_pending) irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); a _theoretical_ corner case here is when we have only one CPU doing a bunch of printk()s and this CPUs disables irqs in advance local_irq_save for (...) printk() local_irq_restore() if no other CPUs see `printk_pending' then nothing will be printed up until local_irq_restore() (assuming that IRQ disable time is withing the hardlockup detection threshold). if any other CPUs concurrently execute printk then we are fine, but a) if none -- then we probably have a small change in behaviour and b) UP systems [..] > > such usage is quite possible. > > > > problems that I have with console_lock()/console_unlock() is that > > these functions serve a double purpose: exclusive printk() lock and a > > console_drivers list lock. > > Well, but changing how console locking works is a separate issue, isn't it? > So please as a separate patch set if you want to try it. absolutely agree, this is a separate thing. > Actually I don't think changing the locking will be so easy. again, agree. splitting any lock is always tricky and risky. especially if we talk about console_sem. it can easily add up new deadlocks, make some fbcon unhappy, etc. etc. register_console() write_lock_console_lock() if (error) printk() printk_lock() read_lock_console_lock() <- eadlock and so on and so forth; I'm not very enthusiastic to change this at the moment. -ss > console_lock/unlock is used e.g. for console blanking where you need to > block any printing while you call ->unblank() for each console. That being > said I don't think improvement is impossible, just given my experiences > with console / printk code there will be surprises waiting for you :). > > Honza > -- > Jan Kara > SUSE Labs, CR >
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
On Sun 06-03-16 22:27:03, Sergey Senozhatsky wrote: > On (03/06/16 20:06), Tetsuo Handa wrote: > [..] > > > do you mean a new worker allocation delay and a MAYDAY timer delay? > > > > > > > I don't know what MAYDAY is. I'm talking about a situation where > > printing_work > > work item is not processed (i.e. printing_work_func() is not called) until > > current work item calls schedule_timeout_*(). > > > > We had a problem that since vmstat_work work item was using system_wq, > > vmstat_work work item was not processed (i.e. vmstat_update() was not > > called) if > > kworker was looping inside memory allocator without calling > > schedule_timeout_*() > > due to disk_events_workfn() doing GFP_NOIO allocation). > > hm, just for note, none of system-wide wqs seem to have a ->rescuer thread > (WQ_MEM_RECLAIM). > > [..] > > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item, > > printing_work_func() will not be called until current work item calls > > schedule_timeout_*(). That will be an undesirable random delay. If you use > > a dedicated kernel thread rather than a dedicated workqueue with > > WQ_MEM_RECLAIM, > > we can avoid this random delay. > > hm. yes, seems that it may take some time until workqueue wakeup() a > ->rescuer thread. > need to look more. Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up creating a worker process and wakes up rescuer thread. However I don't see that as a problem... > > > console_lock(); > > > for (...) { > > > do_foo() { > > > ... > > > pr_err(" ... foo message ...\n"); > > > ... > > > } > > > } > > > console_unlock(); > > > > > > then yes, nothing will be printed until that process executes > > > console_unlock(), > > > because it's console_unlock() that pushes the messages to console drivers. > > > > Yes, I meant a sequence like > > > > console_lock(); > > ptr = kmalloc(GFP_KERNEL); > > kfree(ptr); > > console_unlock(); > > > > and kmalloc() prints OOM killer messages rather than failing that > > allocation. > > Are we sure that there is no such usage? > > such usage is quite possible. > > problems that I have with console_lock()/console_unlock() is that > these functions serve a double purpose: exclusive printk() lock and a > console_drivers list lock. Well, but changing how console locking works is a separate issue, isn't it? So please as a separate patch set if you want to try it. Actually I don't think changing the locking will be so easy. console_lock/unlock is used e.g. for console blanking where you need to block any printing while you call ->unblank() for each console. That being said I don't think improvement is impossible, just given my experiences with console / printk code there will be surprises waiting for you :). Honza -- Jan Kara SUSE Labs, CR
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
Sergey Senozhatsky wrote: > such usage is quite possible. > > problems that I have with console_lock()/console_unlock() is that > these functions serve a double purpose: exclusive printk() lock and a > console_drivers list lock. Yes, I don't like it too. > > I haven't really thought about it yet, but I want to split it. > Since writing to console does not call schedule(), I think rcu_read_lock()/rcu_read_unlock()/synchronize_rcu() (or synchronize_rcu_*() ?) can manage it without using read_lock_console()/read_unlock_console()/write_lock_console()/write_unlock_console(). Replacing console_lock()/console_unlock() for protecting console_drivers list with RCU might be helpful.
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
On (03/06/16 20:06), Tetsuo Handa wrote: [..] > > do you mean a new worker allocation delay and a MAYDAY timer delay? > > > > I don't know what MAYDAY is. I'm talking about a situation where printing_work > work item is not processed (i.e. printing_work_func() is not called) until > current work item calls schedule_timeout_*(). > > We had a problem that since vmstat_work work item was using system_wq, > vmstat_work work item was not processed (i.e. vmstat_update() was not called) > if > kworker was looping inside memory allocator without calling > schedule_timeout_*() > due to disk_events_workfn() doing GFP_NOIO allocation). hm, just for note, none of system-wide wqs seem to have a ->rescuer thread (WQ_MEM_RECLAIM). [..] > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item, > printing_work_func() will not be called until current work item calls > schedule_timeout_*(). That will be an undesirable random delay. If you use > a dedicated kernel thread rather than a dedicated workqueue with > WQ_MEM_RECLAIM, > we can avoid this random delay. hm. yes, seems that it may take some time until workqueue wakeup() a ->rescuer thread. need to look more. [..] > > console_lock(); > > for (...) { > > do_foo() { > > ... > > pr_err(" ... foo message ...\n"); > > ... > > } > > } > > console_unlock(); > > > > then yes, nothing will be printed until that process executes > > console_unlock(), > > because it's console_unlock() that pushes the messages to console drivers. > > Yes, I meant a sequence like > > console_lock(); > ptr = kmalloc(GFP_KERNEL); > kfree(ptr); > console_unlock(); > > and kmalloc() prints OOM killer messages rather than failing that allocation. > Are we sure that there is no such usage? such usage is quite possible. problems that I have with console_lock()/console_unlock() is that these functions serve a double purpose: exclusive printk() lock and a console_drivers list lock. I haven't really thought about it yet, but I want to split it. console_lock()/console_unlock() can be executed by user space processes (inside system calls). for example: SyS_open() ... chrdev_open() tty_open() console_device() console_lock() console_unlock() for (;;) { call_console_drivers() } or doing `cat /proc/consoles` SyS_read() vfs_read() proc_reg_read() seq_read() c_stop() console_unlock() for (;;) { call_console_drivers() } which can introduce two nasty problems: 1) console_lock() may put user space process in TASK_UNINTERRUPTIBLE for unknown period of time -- until current console_sem owner will not finish print loop in console_unlock(). no signals, heartbeats, etc. will be processed by this user space process. 2) user space process may have to spend an unknown period of time in console_unlock() later, pushing "3rd party" kernel messages to console drivers. again, not really good. (kthreads can suffer here too, sure). in the examples above, a process just wanted to iterate the console_drivers list in read access mode. so, for instance, in struct tty_driver *console_device(int *index) { struct console *c; struct tty_driver *driver = NULL; console_lock(); for_each_console(c) { if (!c->device) continue; driver = c->device(c, index); if (driver) break; } console_unlock(); return driver; } instead of console_lock()/console_unlock()->call_console_drivers() it could do (very schematically) read_lock_console(); for_each_console(c) { if (!c->device) continue; driver = c->device(c, index); if (driver) break; } read_unlock_console(); and in functions that modify the list, the lock can be acquired in write mode. example, int unregister_console(struct console *console) { write_lock_console(); ... for (a=console_drivers->next, b=console_drivers ; a; b=a, a=b->next) { if (a == console) { b->next = a->next; res = 0; break; } } } ... if (console_drivers != NULL && console->flags & CON_CONSDEV) console_drivers->flags |= CON_CONSDEV; console->flags &= ~CON_ENABLED; write_unlock_unlock(); printk(), thus, will take its own "another" exclusive lock, to guarantee that only one process can call_console_drivers(), and it will take the console_drivers list lock in read mode. so other process(-es) that also want to access console_drivers list in
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
Sergey Senozhatsky wrote: > On (03/06/16 16:18), Tetsuo Handa wrote: > > Moreover, I don't like use of a workqueue even if printk_wq was allocated > > with WQ_MEM_RECLAIM bit. As you can see in the discussion of the OOM reaper, > > the OOM reaper chose a dedicated kernel thread rather than a workqueue > > ( > > http://lkml.kernel.org/r/1454505240-23446-2-git-send-email-mho...@kernel.org > > ). > > > > Blocking actual printing until ongoing workqueue item calls > > schedule_timeout_*() > > is not nice (see commit 373ccbe59270 and 564e81a57f97). > > do you mean a new worker allocation delay and a MAYDAY timer delay? > I don't know what MAYDAY is. I'm talking about a situation where printing_work work item is not processed (i.e. printing_work_func() is not called) until current work item calls schedule_timeout_*(). We had a problem that since vmstat_work work item was using system_wq, vmstat_work work item was not processed (i.e. vmstat_update() was not called) if kworker was looping inside memory allocator without calling schedule_timeout_*() due to disk_events_workfn() doing GFP_NOIO allocation). -- [ 271.579276] MemAlloc: kworker/0:56(7399) gfp=0x240 order=0 delay=129294 [ 271.581237] 88007c78fa08 8800778f8c80 88007c79 8800778f8c80 [ 271.583329] 0240 8800778f8c80 88007c78fa20 [ 271.585391] 8162aa9d 0001 88007c78fa30 8162aac7 [ 271.587463] Call Trace: [ 271.588512] [] preempt_schedule_common+0x18/0x2b [ 271.590243] [] _cond_resched+0x17/0x20 [ 271.591830] [] __alloc_pages_nodemask+0x64e/0xcc0 [ 271.593561] [] ? __kmalloc+0x22/0x190 [ 271.595119] [] alloc_pages_current+0x87/0x110 [ 271.596778] [] bio_copy_kern+0xc4/0x180 [ 271.598342] [] ? wait_woken+0x80/0x80 [ 271.599878] [] blk_rq_map_kern+0x70/0x130 [ 271.601481] [] ? blk_get_request+0x75/0xe0 [ 271.603100] [] scsi_execute+0x12d/0x160 [ 271.604657] [] scsi_execute_req_flags+0x84/0xf0 [ 271.606339] [] sr_check_events+0xb2/0x2a0 [sr_mod] [ 271.608141] [] ? set_next_entity+0x6c/0x6a0 [ 271.609830] [] cdrom_check_events+0x13/0x30 [cdrom] [ 271.611610] [] sr_block_check_events+0x25/0x30 [sr_mod] [ 271.613429] [] disk_check_events+0x5b/0x150 [ 271.615065] [] disk_events_workfn+0x11/0x20 [ 271.616699] [] process_one_work+0x135/0x310 [ 271.618321] [] worker_thread+0x11b/0x4a0 [ 271.620018] [] ? process_one_work+0x310/0x310 [ 271.622022] [] kthread+0xd3/0xf0 [ 271.623533] [] ? kthread_create_on_node+0x1a0/0x1a0 [ 271.625487] [] ret_from_fork+0x3f/0x70 [ 271.627175] [] ? kthread_create_on_node+0x1a0/0x1a0 -- Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item, printing_work_func() will not be called until current work item calls schedule_timeout_*(). That will be an undesirable random delay. If you use a dedicated kernel thread rather than a dedicated workqueue with WQ_MEM_RECLAIM, we can avoid this random delay. > > > +static void printing_work_func(struct work_struct *work) > > > +{ > > > + console_lock(); > > > + console_unlock(); > > > +} > > > > Is this safe? If somebody invokes the OOM killer between console_lock() > > and console_unlock(), won't this cause OOM killer messages not printed? > > if you mean something like > > console_lock(); > for (...) { > do_foo() { > ... > pr_err(" ... foo message ...\n"); > ... > } > } > console_unlock(); > > then yes, nothing will be printed until that process executes > console_unlock(), > because it's console_unlock() that pushes the messages to console drivers. Yes, I meant a sequence like console_lock(); ptr = kmalloc(GFP_KERNEL); kfree(ptr); console_unlock(); and kmalloc() prints OOM killer messages rather than failing that allocation. Are we sure that there is no such usage?
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
On (03/06/16 16:18), Tetsuo Handa wrote: > Sergey Senozhatsky wrote: > > printk() is expected to work under different conditions and in different > > scenarios, including corner cases of OOM when all of the workers are busy > > (e.g. allocating memory). Thus by default printk() uses its own dedicated > > workqueue with WQ_MEM_RECLAIM bit set. It falls back to system_long_wq > > (console_unlock() is time unbound) only if it has failed to allocate > > printk_wq. Another thing to mention, is that deferred printk() messages > > may appear before printk_wq is allocated, so in the very beginning we > > have to printk deferred messages the old way -- in IRQ context. > > I think we should not depend on system_long_wq which does not have > WQ_MEM_RECLAIM bit. If workqueue allocation upon boot fails (due to ENOMEM), > such systems won't be able to start userspace programs. well, yes. system_long_wq is a fallback, but probably just BUG_ON(!printk_wq) would be a better thing to do. > Moreover, I don't like use of a workqueue even if printk_wq was allocated > with WQ_MEM_RECLAIM bit. As you can see in the discussion of the OOM reaper, > the OOM reaper chose a dedicated kernel thread rather than a workqueue > ( > http://lkml.kernel.org/r/1454505240-23446-2-git-send-email-mho...@kernel.org > ). > > Blocking actual printing until ongoing workqueue item calls > schedule_timeout_*() > is not nice (see commit 373ccbe59270 and 564e81a57f97). printk and console_drivers are expected to be 'callable' from any context, including IRQs, or under spin_locks, etc. so neither part of printk->console_unlock()->console_driver->write() can sleep. console_driver->write() is quite often something like this foo_write() { spin_lock_irqsave(&port->lock); uart_console_write( ... foo_put_char); spin_unlock_iqrestore(&port->lock); } and foo_put_char(...), perhaps, waits for device to become ready transaction while (!(UART_GET_STATUS(port) & TXEMPTY)) cpu_relax(); and then sends out the character to the device UART_SET_DATA(port, (unsigned char)ch) so an already executing printk_wq item shouldn't block because of OOM happening on another CPU. do you mean a new worker allocation delay and a MAYDAY timer delay? > Use of WQ_MEM_RECLAIM means we add a task_struct for that workqueue. Thus, > using > a kernel thread does not change total number of task_struct compared to > WQ_MEM_RECLAIM > approach. I think actual printing should occur as soon as possible rather > than randomly > deferred until workqueue item sleeps. hm, need to take a look. > > +static void printing_work_func(struct work_struct *work) > > +{ > > + console_lock(); > > + console_unlock(); > > +} > > Is this safe? If somebody invokes the OOM killer between console_lock() > and console_unlock(), won't this cause OOM killer messages not printed? between console_lock() and console_unlock() nothing can steal that lock from worker; so once a worker grabbed the console_lock() it will print (and console_lock() does not mean "lock logbuf", printk() from other CPUs can add messages to the logbuf while worker loops in console_unlock()). if you mean something like console_lock(); for (...) { do_foo() { ... pr_err(" ... foo message ...\n"); ... } } console_unlock(); then yes, nothing will be printed until that process executes console_unlock(), because it's console_unlock() that pushes the messages to console drivers. (console_lock()/console_unlock() have other problems and issues, which are not addressed in this patch set). there is only one function that ignores the state of console semaphore: panic()->console_flush_on_panic() but that's a separate thing, not related to this patch set. if you mean something else here, then please more details. -ss
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
Sergey Senozhatsky wrote: > printk() is expected to work under different conditions and in different > scenarios, including corner cases of OOM when all of the workers are busy > (e.g. allocating memory). Thus by default printk() uses its own dedicated > workqueue with WQ_MEM_RECLAIM bit set. It falls back to system_long_wq > (console_unlock() is time unbound) only if it has failed to allocate > printk_wq. Another thing to mention, is that deferred printk() messages > may appear before printk_wq is allocated, so in the very beginning we > have to printk deferred messages the old way -- in IRQ context. I think we should not depend on system_long_wq which does not have WQ_MEM_RECLAIM bit. If workqueue allocation upon boot fails (due to ENOMEM), such systems won't be able to start userspace programs. Moreover, I don't like use of a workqueue even if printk_wq was allocated with WQ_MEM_RECLAIM bit. As you can see in the discussion of the OOM reaper, the OOM reaper chose a dedicated kernel thread rather than a workqueue ( http://lkml.kernel.org/r/1454505240-23446-2-git-send-email-mho...@kernel.org ). Blocking actual printing until ongoing workqueue item calls schedule_timeout_*() is not nice (see commit 373ccbe59270 and 564e81a57f97). Use of WQ_MEM_RECLAIM means we add a task_struct for that workqueue. Thus, using a kernel thread does not change total number of task_struct compared to WQ_MEM_RECLAIM approach. I think actual printing should occur as soon as possible rather than randomly deferred until workqueue item sleeps. > +static void printing_work_func(struct work_struct *work) > +{ > + console_lock(); > + console_unlock(); > +} Is this safe? If somebody invokes the OOM killer between console_lock() and console_unlock(), won't this cause OOM killer messages not printed?
Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async
On (03/05/16 19:55), Sergey Senozhatsky wrote: [..] > +static int __init init_printk_workqueue(void) > +{ > + if (printk_sync) > + return 0; > + > + printk_wq = alloc_workqueue("printk_wq", WQ_MEM_RECLAIM, 0); > + /* > + * Fallback to one of system-wide workqueues if we failed to > + * allocate a dedicated one. console_unlock() may take some > + * time to flush all of the messages, so we better use > + * system_long_wq. > + */ > + if (!printk_wq) { > + pr_err("printk: Fallback to system_long workqueue\n"); > + printk_wq = system_long_wq; > + } > + > + return 0; > +} > +early_initcall(init_printk_workqueue); ^^ better be late_initcall(). init_workqueues(), which creates pwq_cache kmem_cache, is early initcall. ===8<===8<===8< >From d9cf33958febd863ceb25c59d447aa9500427c30 Mon Sep 17 00:00:00 2001 From: Jan Kara Subject: [PATCH 1/2] printk: Make printk() completely async Currently, printk() sometimes waits for message to be printed to console and sometimes it does not (when console_sem is held by some other process). In case printk() grabs console_sem and starts printing to console, it prints messages from kernel printk buffer until the buffer is empty. When serial console is attached, printing is slow and thus other CPUs in the system have plenty of time to append new messages to the buffer while one CPU is printing. Thus the CPU can spend unbounded amount of time doing printing in console_unlock(). This is especially serious problem if the printk() calling console_unlock() was called with interrupts disabled. In practice users have observed a CPU can spend tens of seconds printing in console_unlock() (usually during boot when hundreds of SCSI devices are discovered) resulting in RCU stalls (CPU doing printing doesn't reach quiescent state for a long time), softlockup reports (IPIs for the printing CPU don't get served and thus other CPUs are spinning waiting for the printing CPU to process IPIs), and eventually a machine death (as messages from stalls and lockups append to printk buffer faster than we are able to print). So these machines are unable to boot with serial console attached. Another observed issue is that due to slow printk, hardware discovery is slow and udev times out before kernel manages to discover all the attached HW. Also during artificial stress testing SATA disk disappears from the system because its interrupts aren't served for too long. This patch makes printk() completely asynchronous (similar to what printk_deferred() did until now). It appends message to the kernel printk buffer and queues work to do the printing to console. This has the advantage that printing always happens from a schedulable contex and thus we don't lockup any particular CPU or even interrupts. Also it has the advantage that printk() is fast and thus kernel booting is not slowed down by slow serial console. Disadvantage of this method is that in case of crash there is higher chance that important messages won't appear in console output (we may need working scheduling to print message to console). We somewhat mitigate this risk by switching printk to the original method of immediate printing to console if oops is in progress. Also for debugging purposes we provide printk.synchronous kernel parameter which resorts to the original printk behavior. printk() is expected to work under different conditions and in different scenarios, including corner cases of OOM when all of the workers are busy (e.g. allocating memory). Thus by default printk() uses its own dedicated workqueue with WQ_MEM_RECLAIM bit set. It falls back to system_long_wq (console_unlock() is time unbound) only if it has failed to allocate printk_wq. Another thing to mention, is that deferred printk() messages may appear before printk_wq is allocated, so in the very beginning we have to printk deferred messages the old way -- in IRQ context. Signed-off-by: Jan Kara Signed-off-by: Sergey Senozhatsky --- Documentation/kernel-parameters.txt | 10 ++ kernel/printk/printk.c | 194 +--- 2 files changed, 146 insertions(+), 58 deletions(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index e0a21e4..5b01941 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -3108,6 +3108,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted. printk.time=Show timing data prefixed to each printk message line Format: (1/Y/y=enable, 0/N/n=disable) + printk.synchronous= + By default kernel messages are printed to console + asynchronously (except during early boot or when oops + is happening). That avoids kernel stalling behind slow + serial console and thus avoids softlockups, interrupt +