[PATCH] fbcon: use default if cursor blink interval is not valid
On Thu, May 19, 2016 at 10:22 PM, Scot Doyle wrote: > On Thu, 19 May 2016, Pavel Machek wrote: >> Hi! >> >> > Two current [1] and three previous [2] systems locked during boot >> > because the cursor flash timer was set using an ops->cur_blink_jiffies >> > value of 0. Previous patches attempted to solve the problem by moving >> > variable initialization earlier in the setup sequence [2]. >> > >> > Use the normal cursor blink default interval of 200 ms if >> > ops->cur_blink_jiffies is not in the range specified in commit >> > bd63364caa8d. Since invalid values are not used, specific system >> > initialization timings should not cause lockups. >> > >> > [1] https://bugs.launchpad.net/bugs/1574814 >> > [2] see commits: 2a17d7e80f1d, f235f664a8af, a1e533ec07d5 >> >> Acked-by: Pavel Machek >> >> > static void cursor_timer_handler(unsigned long dev_addr) >> > { >> > struct fb_info *info = (struct fb_info *) dev_addr; >> > struct fbcon_ops *ops = info->fbcon_par; >> > >> > queue_work(system_power_efficient_wq, >queue); >> > - mod_timer(>cursor_timer, jiffies + ops->cur_blink_jiffies); >> > + mod_timer(>cursor_timer, jiffies + >> > + cursor_blink_jiffies(ops->cur_blink_jiffies)); >> > } >> > >> > static void fbcon_add_cursor_timer(struct fb_info *info) >> >> And actually... perhaps mod_timer should have some check for too low >> timeouts..? >> >> WARN_ON? >> Pavel > > > Interesting idea. I applied this patch to a couple systems and > receive the same warning on both: If 'jiffies' is passed to mod_timer() for same timer unusually OR mod_timer() isn't called from the timer handler, it shoudln't cause soft lockup. In the case of fbcon, 'jiffies' is always passed to mod_timer() and mod_timer() is called from the timer handler meantime, that is a real lockup. > > diff --git a/kernel/time/timer.c b/kernel/time/timer.c > index 73164c3..f6c0b69 100644 > --- a/kernel/time/timer.c > +++ b/kernel/time/timer.c > @@ -788,6 +788,7 @@ __mod_timer(struct timer_list *timer, unsigned long > expires, > > timer_stats_timer_set_start_info(timer); > BUG_ON(!timer->function); > + WARN_ONCE(expires == jiffies, "timer should expire in the future"); > > base = lock_timer_base(timer, ); > > -- > > [2.060474] [ cut here ] > [2.061613] WARNING: CPU: 0 PID: 164 at kernel/time/timer.c:791 > mod_timer+0x233/0x240 > [2.062740] timer should expire in the future > [2.062757] CPU: 0 PID: 164 Comm: kworker/0:2 Not tainted 4.6.0+ #7 > [2.065870] Hardware name: Toshiba Leon, BIOS 12/04/2013 > [2.067828] Workqueue: events_power_efficient hub_init_func3 > [2.069762] 88007443bbb8 8139932b > 88007443bc08 > [2.071701] 88007443bbf8 8112e57c > 0317 > [2.073655] 88007486a0b0 fffea2da 88007486a000 > 0202 > [2.075594] Call Trace: > [2.077503] [] dump_stack+0x4d/0x72 > [2.079426] [] __warn+0xcc/0xf0 > [2.081325] [] warn_slowpath_fmt+0x4f/0x60 > [2.083212] [] ? find_next_bit+0x15/0x20 > [2.085022] [] ? cpumask_next_and+0x2f/0x40 > [2.086696] [] mod_timer+0x233/0x240 > [2.088362] [] usb_hcd_submit_urb+0x3f2/0x8c0 > [2.090026] [] ? urb_destroy+0x24/0x30 > [2.091698] [] ? insert_work+0x58/0xb0 > [2.093349] [] usb_submit_urb+0x287/0x530 > [2.094985] [] hub_activate+0x1fd/0x5d0 > [2.096625] [] ? finish_task_switch+0x78/0x1f0 > [2.098268] [] hub_init_func3+0x1a/0x20 > [2.099908] [] process_one_work+0x140/0x3e0 > [2.101539] [] worker_thread+0x4e/0x480 > [2.103173] [] ? process_one_work+0x3e0/0x3e0 > [2.104790] [] ? process_one_work+0x3e0/0x3e0 > [2.106259] [] kthread+0xc9/0xe0 > [2.107731] [] ret_from_fork+0x22/0x40 > [2.109215] [] ? __kthread_parkme+0x70/0x70 > [2.110704] ---[ end trace 3519886a1a990d99 ]--- > > mod_timer is called from over a thousand places. Should timers always > expire in the future? >
[PATCH] tty: vt: Fix soft lockup in fbcon cursor blink timer.
On Wed, May 18, 2016 at 4:49 AM, Pavel Machek wrote: > On Tue 2016-05-17 11:41:04, David Daney wrote: >> From: David Daney >> >> We are getting somewhat random soft lockups with this signature: >> >> [ 86.992215] [] el1_irq+0xa0/0x10c >> [ 86.997082] [] cursor_timer_handler+0x30/0x54 >> [ 87.002991] [] call_timer_fn+0x54/0x1a8 >> [ 87.008378] [] run_timer_softirq+0x1c4/0x2bc >> [ 87.014200] [] __do_softirq+0x114/0x344 >> [ 87.019590] [] irq_exit+0x74/0x98 >> [ 87.024458] [] __handle_domain_irq+0x98/0xfc >> [ 87.030278] [] gic_handle_irq+0x94/0x190 >> >> This is caused by the vt visual_init() function calling into >> fbcon_init() with a vc_cur_blink_ms value of zero. This is a >> transient condition, as it is later set to a non-zero value. But, if >> the timer happens to expire while the blink rate is zero, it goes into >> an endless loop, and we get soft lockup. >> >> The fix is to initialize vc_cur_blink_ms before calling the con_init() >> function. >> >> Signed-off-by: David Daney >> Cc: stable at vger.kernel.org > > Acked-by: Pavel Machek Tested-by: Ming Lei Thanks David and Pavel for making it work! > > (And it is amazing how many problems configurable blink speed caused). > > Thanks! > Pavel > > -- > (english) http://www.livejournal.com/~pavelmachek > (cesky, pictures) > http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
ast: cursor flashing softlockups
Hi, On Tue, May 17, 2016 at 4:07 AM, Dann Frazier wrote: > Hi, > I'm observing a soft lockup issue w/ the ASPEED controller on an > arm64 server platform. This was originally seen on Ubuntu's 4.4 > kernel, but it is reproducible w/ vanilla 4.6-rc7 as well. > > [ 32.792656] NMI watchdog: BUG: soft lockup - CPU#38 stuck for 22s! > [swapper/38:0] > > I observe this just once each time I boot into debian-installer (I'm > using a serial console, but the ast module gets loaded during > startup). I have figured out that it is caused by 'mod_timer(timer, jiffies)' and 'ops->cur_blink_jiffies' is observed as zero in cursor_timer_handler() when the issue happened. Looks it is a real fbcon/vt issue, see following: fbcon_init() <-.con_init <-visual_init() reset_terminal() <-vc_init() vc->vc_cur_blink_ms is just set in reset_terminal() from vc_init() path, and ops->cur_blink_jiffies is figured out from vc->vc_cur_blink_ms in fbcon_init(). And visual_init() is always run before vc_init(), so ops->cur_blink_jiffies is initialized as zero and cause the soft lockup issue finally. Thanks, Ming > > perf shows that the CPU caught by the NMI is typically in code > updating the cursor timer: > > - 16.92% swapper [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore >- _raw_spin_unlock_irqrestore > + 16.87% mod_timer > + 0.05% cursor_timer_handler > - 12.15% swapper [kernel.kallsyms] [k] queue_work_on >- queue_work_on > + 12.00% cursor_timer_handler > + 0.15% call_timer_fn > + 10.98% swapper [kernel.kallsyms] [k] run_timer_softirq > -2.23% swapper [kernel.kallsyms] [k] mod_timer >- mod_timer > + 1.97% cursor_timer_handler > + 0.26% call_timer_fn > > During the same period, I can see that another CPU is actively > executing the timer function: > > - 42.18% kworker/u96:2 [kernel.kallsyms] [k] ww_mutex_unlock >- ww_mutex_unlock > - 40.70% ast_dirty_update >ast_imageblit >soft_cursor >bit_cursor >fb_flashcursor >process_one_work >worker_thread >kthread >ret_from_fork > + 1.48% ast_imageblit > - 40.15% kworker/u96:2 [kernel.kallsyms] [k] __memcpy_toio >- __memcpy_toio > + 31.54% ast_dirty_update > + 8.61% ast_imageblit > > Using the graph function tracer on fb_flashcursor(), I see that > ast_dirty_update usually takes around 60 us, in which it makes 16 > calls to __memcpy_toio(). However, there is always one instance on > every boot of the installer where ast_dirty_update() takes ~98 *ms* to > complete, during which it makes 743 calls to __memcpy_toio(). While > that doesn't directly account for the full 22s, I do wonder if that > maybe a smoking gun. > > fyi, this is being tracked at: https://bugs.launchpad.net/bugs/1574814 > > -dann