[PATCH] fbcon: use default if cursor blink interval is not valid

2016-05-20 Thread Ming Lei
On Thu, May 19, 2016 at 10:22 PM, Scot Doyle  wrote:
> On Thu, 19 May 2016, Pavel Machek wrote:
>> Hi!
>>
>> > Two current [1] and three previous [2] systems locked during boot
>> > because the cursor flash timer was set using an ops->cur_blink_jiffies
>> > value of 0. Previous patches attempted to solve the problem by moving
>> > variable initialization earlier in the setup sequence [2].
>> >
>> > Use the normal cursor blink default interval of 200 ms if
>> > ops->cur_blink_jiffies is not in the range specified in commit
>> > bd63364caa8d. Since invalid values are not used, specific system
>> > initialization timings should not cause lockups.
>> >
>> > [1] https://bugs.launchpad.net/bugs/1574814
>> > [2] see commits: 2a17d7e80f1d, f235f664a8af, a1e533ec07d5
>>
>> Acked-by: Pavel Machek 
>>
>> >  static void cursor_timer_handler(unsigned long dev_addr)
>> >  {
>> > struct fb_info *info = (struct fb_info *) dev_addr;
>> > struct fbcon_ops *ops = info->fbcon_par;
>> >
>> > queue_work(system_power_efficient_wq, >queue);
>> > -   mod_timer(>cursor_timer, jiffies + ops->cur_blink_jiffies);
>> > +   mod_timer(>cursor_timer, jiffies +
>> > +   cursor_blink_jiffies(ops->cur_blink_jiffies));
>> >  }
>> >
>> >  static void fbcon_add_cursor_timer(struct fb_info *info)
>>
>> And actually... perhaps mod_timer should have some check for too low
>> timeouts..?
>>
>> WARN_ON?
>>   Pavel
>
>
> Interesting idea. I applied this patch to a couple systems and
> receive the same warning on both:

If 'jiffies' is passed to mod_timer() for same timer unusually OR
mod_timer() isn't called from the timer handler, it shoudln't cause
soft lockup.

In the case of fbcon, 'jiffies' is always passed to mod_timer() and
mod_timer() is called from the timer handler meantime, that is a
real lockup.

>
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 73164c3..f6c0b69 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -788,6 +788,7 @@ __mod_timer(struct timer_list *timer, unsigned long 
> expires,
>
> timer_stats_timer_set_start_info(timer);
> BUG_ON(!timer->function);
> +   WARN_ONCE(expires == jiffies, "timer should expire in the future");
>
> base = lock_timer_base(timer, );
>
> --
>
> [2.060474] [ cut here ]
> [2.061613] WARNING: CPU: 0 PID: 164 at kernel/time/timer.c:791 
> mod_timer+0x233/0x240
> [2.062740] timer should expire in the future
> [2.062757] CPU: 0 PID: 164 Comm: kworker/0:2 Not tainted 4.6.0+ #7
> [2.065870] Hardware name: Toshiba Leon, BIOS  12/04/2013
> [2.067828] Workqueue: events_power_efficient hub_init_func3
> [2.069762]   88007443bbb8 8139932b 
> 88007443bc08
> [2.071701]   88007443bbf8 8112e57c 
> 0317
> [2.073655]  88007486a0b0 fffea2da 88007486a000 
> 0202
> [2.075594] Call Trace:
> [2.077503]  [] dump_stack+0x4d/0x72
> [2.079426]  [] __warn+0xcc/0xf0
> [2.081325]  [] warn_slowpath_fmt+0x4f/0x60
> [2.083212]  [] ? find_next_bit+0x15/0x20
> [2.085022]  [] ? cpumask_next_and+0x2f/0x40
> [2.086696]  [] mod_timer+0x233/0x240
> [2.088362]  [] usb_hcd_submit_urb+0x3f2/0x8c0
> [2.090026]  [] ? urb_destroy+0x24/0x30
> [2.091698]  [] ? insert_work+0x58/0xb0
> [2.093349]  [] usb_submit_urb+0x287/0x530
> [2.094985]  [] hub_activate+0x1fd/0x5d0
> [2.096625]  [] ? finish_task_switch+0x78/0x1f0
> [2.098268]  [] hub_init_func3+0x1a/0x20
> [2.099908]  [] process_one_work+0x140/0x3e0
> [2.101539]  [] worker_thread+0x4e/0x480
> [2.103173]  [] ? process_one_work+0x3e0/0x3e0
> [2.104790]  [] ? process_one_work+0x3e0/0x3e0
> [2.106259]  [] kthread+0xc9/0xe0
> [2.107731]  [] ret_from_fork+0x22/0x40
> [2.109215]  [] ? __kthread_parkme+0x70/0x70
> [2.110704] ---[ end trace 3519886a1a990d99 ]---
>
> mod_timer is called from over a thousand places. Should timers always
> expire in the future?
>


[PATCH] tty: vt: Fix soft lockup in fbcon cursor blink timer.

2016-05-18 Thread Ming Lei
On Wed, May 18, 2016 at 4:49 AM, Pavel Machek  wrote:
> On Tue 2016-05-17 11:41:04, David Daney wrote:
>> From: David Daney 
>>
>> We are getting somewhat random soft lockups with this signature:
>>
>> [   86.992215] [] el1_irq+0xa0/0x10c
>> [   86.997082] [] cursor_timer_handler+0x30/0x54
>> [   87.002991] [] call_timer_fn+0x54/0x1a8
>> [   87.008378] [] run_timer_softirq+0x1c4/0x2bc
>> [   87.014200] [] __do_softirq+0x114/0x344
>> [   87.019590] [] irq_exit+0x74/0x98
>> [   87.024458] [] __handle_domain_irq+0x98/0xfc
>> [   87.030278] [] gic_handle_irq+0x94/0x190
>>
>> This is caused by the vt visual_init() function calling into
>> fbcon_init() with a vc_cur_blink_ms value of zero.  This is a
>> transient condition, as it is later set to a non-zero value.  But, if
>> the timer happens to expire while the blink rate is zero, it goes into
>> an endless loop, and we get soft lockup.
>>
>> The fix is to initialize vc_cur_blink_ms before calling the con_init()
>> function.
>>
>> Signed-off-by: David Daney 
>> Cc: stable at vger.kernel.org
>
> Acked-by: Pavel Machek 

Tested-by: Ming Lei 

Thanks David and Pavel for making it work!

>
> (And it is amazing how many problems configurable blink speed caused).
>
> Thanks!
> Pavel
>
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) 
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


ast: cursor flashing softlockups

2016-05-17 Thread Ming Lei
Hi,

On Tue, May 17, 2016 at 4:07 AM, Dann Frazier
 wrote:
> Hi,
>  I'm observing a soft lockup issue w/ the ASPEED controller on an
> arm64 server platform. This was originally seen on Ubuntu's 4.4
> kernel, but it is reproducible w/ vanilla 4.6-rc7 as well.
>
> [   32.792656] NMI watchdog: BUG: soft lockup - CPU#38 stuck for 22s!
> [swapper/38:0]
>
> I observe this just once each time I boot into debian-installer (I'm
> using a serial console, but the ast module gets loaded during
> startup).

I have figured out that it is caused by 'mod_timer(timer, jiffies)' and
'ops->cur_blink_jiffies' is observed as zero in cursor_timer_handler()
when the issue happened.

Looks it is a real fbcon/vt issue, see following:

fbcon_init()
<-.con_init
  <-visual_init()

reset_terminal()
<-vc_init()

vc->vc_cur_blink_ms is just set in reset_terminal() from vc_init() path,
and ops->cur_blink_jiffies is figured out from vc->vc_cur_blink_ms
in fbcon_init().

And visual_init() is always run before vc_init(), so ops->cur_blink_jiffies
is initialized as zero and cause the soft lockup issue finally.

Thanks,
Ming

>
> perf shows that the CPU caught by the NMI is typically in code
> updating the cursor timer:
>
> -   16.92%  swapper  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
>- _raw_spin_unlock_irqrestore
>   + 16.87% mod_timer
>   + 0.05% cursor_timer_handler
> -   12.15%  swapper  [kernel.kallsyms]  [k] queue_work_on
>- queue_work_on
>   + 12.00% cursor_timer_handler
>   + 0.15% call_timer_fn
> +   10.98%  swapper  [kernel.kallsyms]  [k] run_timer_softirq
> -2.23%  swapper  [kernel.kallsyms]  [k] mod_timer
>- mod_timer
>   + 1.97% cursor_timer_handler
>   + 0.26% call_timer_fn
>
> During the same period, I can see that another CPU is actively
> executing the timer function:
>
> -   42.18%  kworker/u96:2  [kernel.kallsyms]  [k] ww_mutex_unlock
>- ww_mutex_unlock
>   - 40.70% ast_dirty_update
>ast_imageblit
>soft_cursor
>bit_cursor
>fb_flashcursor
>process_one_work
>worker_thread
>kthread
>ret_from_fork
>   + 1.48% ast_imageblit
> -   40.15%  kworker/u96:2  [kernel.kallsyms]  [k] __memcpy_toio
>- __memcpy_toio
>   + 31.54% ast_dirty_update
>   + 8.61% ast_imageblit
>
> Using the graph function tracer on fb_flashcursor(), I see that
> ast_dirty_update usually takes around 60 us, in which it makes 16
> calls to __memcpy_toio(). However, there is always one instance on
> every boot of the installer where ast_dirty_update() takes ~98 *ms* to
> complete, during which it makes 743 calls to __memcpy_toio(). While
> that  doesn't directly account for the full 22s, I do wonder if that
> maybe a smoking gun.
>
> fyi, this is being tracked at: https://bugs.launchpad.net/bugs/1574814
>
>   -dann