Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()
On Mon 2020-09-28 12:25:59, Peter Zijlstra wrote: > On Mon, Sep 28, 2020 at 06:04:23PM +0800, Chengming Zhou wrote: > > > Well, you are lucky. So it's a problem in our printk implementation. > > Not lucky; I just kicked it in the groin really hard: > > git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git > debug/experimental > > > The deadlock path is: > > > > printk > > vprintk_emit > > console_unlock > > vt_console_print > > hide_cursor > > bit_cursor > > soft_cursor > > queue_work_on > > __queue_work > > try_to_wake_up > > _raw_spin_lock > > native_queued_spin_lock_slowpath > > > > Looks like it's introduced by this commit: > > > > eaa434defaca1781fb2932c685289b610aeb8b4b > > > > "drm/fb-helper: Add fb_deferred_io support" > > Oh gawd, yeah, all the !serial consoles are utter batshit. > > Please look at John's last printk rewrite, IIRC it farms all that off to > a kernel thread instead of doing it from the printk() caller's context. > > I'm not sure where he hides his latests patches, but I'm sure he'll be > more than happy to tell you. AFAIK, John is just working on updating the patchset so that it will be based on the lockless ringbuffer that is finally in the queue for-5.10. Upstreaming the console handling will be the next big step. I am sure that there will be long discussion about it. But there might be few things that would help removing printk_deferred(). 1. Messages will be printed on consoles by dedicated kthreads. It will be safe context. No deadlocks. 2. The registration and unregistration of consoles should not longer be handled by console_lock (semaphore). It should be possible to call most consoles without a sleeping lock. It should remove all these deadlocks between printk() and scheduler(). There might be problems with some consoles. For example, tty would most likely still need a sleeping lock because it is using the console semaphore also internally. 3. We will try harder to get the messages out immediately during panic(). It would take some time until the above reaches upstream. But it seems to be the right way to go. About printk_deferred(): It is a whack a mole game. It is easy to miss printk() that might eventually cause the deadlock. printk deferred context is more safe. But it is still a what a mole game. The kthreads will do the same job for sure. Finally, the deadlock happens "only" when someone is waiting on console_lock() in parallel. Otherwise, the waitqueue for the semaphore is empty and scheduler is not called. It means that there is quite a big change to see the WARN(). It might be even bigger than with printk_deferred() because WARN() in scheduler means that the scheduler is big troubles. Nobody guarantees that the deferred messages will get handled later. Best Regards, Petr ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()
On (20/09/29 17:09), Peter Zijlstra wrote: > > 2. The registration and unregistration of consoles should not longer > >be handled by console_lock (semaphore). It should be possible to > >call most consoles without a sleeping lock. It should remove all > >these deadlocks between printk() and scheduler(). > > I'm confused, who cares about registation? That only happens once at > boot, right? You can modprobe or rmmod (register/unregister) netconsole anytime, for example. -ss ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()
On Tue, Sep 29, 2020 at 04:27:51PM +0200, Petr Mladek wrote: > Upstreaming the console handling will be the next big step. I am sure > that there will be long discussion about it. But there might be > few things that would help removing printk_deferred(). > > 1. Messages will be printed on consoles by dedicated kthreads. It will >be safe context. No deadlocks. This, that's what I remember. With sane consoles having a ->write_atomic() callback which is called in-line. Current -RT has a single kthread that's flushing the consoles. > 2. The registration and unregistration of consoles should not longer >be handled by console_lock (semaphore). It should be possible to >call most consoles without a sleeping lock. It should remove all >these deadlocks between printk() and scheduler(). I'm confused, who cares about registation? That only happens once at boot, right? >There might be problems with some consoles. For example, tty would >most likely still need a sleeping lock because it is using the console >semaphore also internally. But per 1) above, that's done from a kthread, so nobody cares. > 3. We will try harder to get the messages out immediately during >panic(). As long as you guarantee to first write everything to consoles with ->write_atomic() before attempting to flush others that should be fine. > It would take some time until the above reaches upstream. But it seems > to be the right way to go. Yep. > Finally, the deadlock happens "only" when someone is waiting on > console_lock() in parallel. Otherwise, the waitqueue for the semaphore > is empty and scheduler is not called. There's more deadlocks. In fact the one described earlier in this thread isn't the one you mention. > It means that there is quite a big change to see the WARN(). It might > be even bigger than with printk_deferred() because WARN() in scheduler > means that the scheduler is big troubles. Nobody guarantees that > the deferred messages will get handled later. I only care about ->write_atomic() :-) Anything else is a best-effort-loss anyway. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()
在 2020/9/28 下午5:01, Peter Zijlstra 写道: > On Mon, Sep 28, 2020 at 04:54:53PM +0800, Chengming Zhou wrote: >> 在 2020/9/28 下午3:32, Peter Zijlstra 写道: >>> On Mon, Sep 28, 2020 at 12:11:30AM +0800, Chengming Zhou wrote: The WARN_ON/WARN_ON_ONCE with rq lock held in __schedule() should be deferred by marking the PRINTK_DEFERRED_CONTEXT_MASK, or will cause deadlock on rq lock in the printk path. >>> It also shouldn't happen in the first place, so who bloody cares. >> Yes, but if our box deadlock just because a WARN_ON_ONCE, we have to >> reboot : ( > You have to reboot anyway to get into the fixed kernel. Mostly, WARN_ON_ONCE happened in the perf code on our machines, we actually don't care too much about the perf function works : ) Looks like we have to find and fix that perf bug before go on... >> So these WARN_ON_ONCE have BUG_ON effect ? Or we should change to use >> BUG_ON ? > Or use a sane printk implementation, I never suffer this. Well, you are lucky. So it's a problem in our printk implementation. The deadlock path is: printk vprintk_emit console_unlock vt_console_print hide_cursor bit_cursor soft_cursor queue_work_on __queue_work try_to_wake_up _raw_spin_lock native_queued_spin_lock_slowpath Looks like it's introduced by this commit: eaa434defaca1781fb2932c685289b610aeb8b4b "drm/fb-helper: Add fb_deferred_io support" This adds deferred io support to drm_fb_helper. The fbdev framebuffer changes are flushed using the callback (struct drm_framebuffer *)->funcs->dirty() by a dedicated worker ensuring that it always runs in process context. For those wondering why we need to be able to handle atomic calling contexts: Both panic paths and cursor code and fbcon blanking can run from atomic. See commit bcb39af4486be07e896fc374a2336bad3104ae0a Author: Dave Airlie Date: Thu Feb 7 11:19:15 2013 +1000 drm/udl: make usage as a console safer for where this was originally discovered. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()
On (20/09/28 12:25), Peter Zijlstra wrote: [..] > > printk > > vprintk_emit > > console_unlock > > vt_console_print > > hide_cursor > > bit_cursor > > soft_cursor > > queue_work_on > > __queue_work > > try_to_wake_up > > _raw_spin_lock > > native_queued_spin_lock_slowpath > > > > Looks like it's introduced by this commit: > > > > eaa434defaca1781fb2932c685289b610aeb8b4b > > > > "drm/fb-helper: Add fb_deferred_io support" > > Oh gawd, yeah, all the !serial consoles are utter batshit. > > Please look at John's last printk rewrite, IIRC it farms all that off to > a kernel thread instead of doing it from the printk() caller's context. Not yet. Scheduler is still part of the printk() - either in the form of !serial consoles or console_sem, or both. -ss ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()
On Mon, Sep 28, 2020 at 06:04:23PM +0800, Chengming Zhou wrote: > Well, you are lucky. So it's a problem in our printk implementation. Not lucky; I just kicked it in the groin really hard: git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git debug/experimental > The deadlock path is: > > printk > vprintk_emit > console_unlock > vt_console_print > hide_cursor > bit_cursor > soft_cursor > queue_work_on > __queue_work > try_to_wake_up > _raw_spin_lock > native_queued_spin_lock_slowpath > > Looks like it's introduced by this commit: > > eaa434defaca1781fb2932c685289b610aeb8b4b > > "drm/fb-helper: Add fb_deferred_io support" Oh gawd, yeah, all the !serial consoles are utter batshit. Please look at John's last printk rewrite, IIRC it farms all that off to a kernel thread instead of doing it from the printk() caller's context. I'm not sure where he hides his latests patches, but I'm sure he'll be more than happy to tell you. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel