On 29.09.2020 15:07, Andrey Ryabinin wrote:
>
>
> On 9/29/20 11:24 AM, Kirill Tkhai wrote:
>> On 28.09.2020 15:03, Andrey Ryabinin wrote:
>>> We've got a hard lockup which seems to be caused by mgag200
>>> console printk code calling to schedule_work from scheduler
>>> with rq->lock held:
>>> #5 [ffffb79e034239a8] native_queued_spin_lock_slowpath at ffffffff8b50c6c6
>>> #6 [ffffb79e034239a8] _raw_spin_lock at ffffffff8bc96e5c
>>> #7 [ffffb79e034239b0] try_to_wake_up at ffffffff8b4e26ff
>>> #8 [ffffb79e03423a10] __queue_work at ffffffff8b4ce3f3
>>> #9 [ffffb79e03423a58] queue_work_on at ffffffff8b4ce714
>>> #10 [ffffb79e03423a68] mga_imageblit at ffffffffc026d666 [mgag200]
>>> #11 [ffffb79e03423a80] soft_cursor at ffffffff8b8a9d84
>>> #12 [ffffb79e03423ad8] bit_cursor at ffffffff8b8a99b2
>>> #13 [ffffb79e03423ba0] hide_cursor at ffffffff8b93bc7a
>>> #14 [ffffb79e03423bb0] vt_console_print at ffffffff8b93e07d
>>> #15 [ffffb79e03423c18] console_unlock at ffffffff8b518f0e
>>> #16 [ffffb79e03423c68] vprintk_emit_log at ffffffff8b51acf7
>>> #17 [ffffb79e03423cc0] vprintk_default at ffffffff8b51adcd
>>> #18 [ffffb79e03423cd0] printk at ffffffff8b51b3d6
>>> #19 [ffffb79e03423d30] __warn_printk at ffffffff8b4b13a0
>>> #20 [ffffb79e03423d98] assert_clock_updated at ffffffff8b4dd293
>>> #21 [ffffb79e03423da0] deactivate_task at ffffffff8b4e12d1
>>> #22 [ffffb79e03423dc8] move_task_group at ffffffff8b4eaa5b
>>> #23 [ffffb79e03423e00] cpulimit_balance_cpu_stop at ffffffff8b4f02f3
>>> #24 [ffffb79e03423eb0] cpu_stopper_thread at ffffffff8b576b67
>>> #25 [ffffb79e03423ee8] smpboot_thread_fn at ffffffff8b4d9125
>>> #26 [ffffb79e03423f10] kthread at ffffffff8b4d4fc2
>>> #27 [ffffb79e03423f50] ret_from_fork at ffffffff8be00255
>>>
>>> The printk called because assert_clock_updated() triggered
>>> SCHED_WARN_ON(rq->clock_update_flags < RQCF_ACT_SKIP);
>>>
>>> This means that we missing necessary update_rq_clock() call.
>>> Add one to cpulimit_balance_cpu_stop() to fix the warning.
>>> Also add one in load_balance() before move_task_groups() call.
>>> It seems to be another place missing this call.
>>>
>>> https://jira.sw.ru/browse/PSBM-108013
>>> Signed-off-by: Andrey Ryabinin <aryabi...@virtuozzo.com>
>>> ---
>>>  kernel/sched/fair.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 5d3556b15e70..e6dc21d5fa03 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -7816,6 +7816,7 @@ static int cpulimit_balance_cpu_stop(void *data)
>>>
>>>  	schedstat_inc(sd->clb_count);
>>>
>>> +	update_rq_clock(rq);
>>
>> Shouldn't we also add the same for target_rq to avoid WARN() coming from
>> attach_task()?
>>
>
> It seems like we should.

Are you going to send a v3, or a patch on top of this one?
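
For anyone following the thread, here is a rough user-space sketch of why both runqueues probably need the clock update. It is only a toy model, not the kernel code: the three RQCF_* values match kernel/sched/sched.h as far as I know, but struct toy_rq and all toy_* helpers are hypothetical names made up for illustration, and the sketch assumes the lock is taken through a helper that drops RQCF_UPDATED the way rq_pin_lock() does.

```c
#include <assert.h>

/* Flag values as in kernel/sched/sched.h; everything else here is a toy. */
#define RQCF_REQ_SKIP 0x01
#define RQCF_ACT_SKIP 0x02
#define RQCF_UPDATED  0x04

/* Hypothetical stand-in for struct rq. */
struct toy_rq {
	unsigned int clock_update_flags;
	int warnings;	/* counts would-be SCHED_WARN_ON() hits */
};

/* Models the flag handling in rq_pin_lock(): RQCF_UPDATED is dropped. */
static void toy_rq_lock(struct toy_rq *rq)
{
	rq->clock_update_flags &= (RQCF_REQ_SKIP | RQCF_ACT_SKIP);
}

/* Models the flag side effect of update_rq_clock(). */
static void toy_update_rq_clock(struct toy_rq *rq)
{
	rq->clock_update_flags |= RQCF_UPDATED;
}

/* Models assert_clock_updated(): warn if the clock was neither
 * updated nor marked as actively skipped. */
static void toy_assert_clock_updated(struct toy_rq *rq)
{
	if (rq->clock_update_flags < RQCF_ACT_SKIP)
		rq->warnings++;
}

/* Moves a task from src to dst and returns how many warnings fired.
 * The two checks stand in for the deactivate_task() path on the source
 * runqueue and the attach_task() path on the target runqueue. */
static int toy_move_task(struct toy_rq *src, struct toy_rq *dst,
			 int update_src, int update_dst)
{
	src->warnings = 0;
	dst->warnings = 0;

	toy_rq_lock(src);
	toy_rq_lock(dst);

	if (update_src)
		toy_update_rq_clock(src);
	if (update_dst)
		toy_update_rq_clock(dst);

	toy_assert_clock_updated(src);	/* source side check */
	toy_assert_clock_updated(dst);	/* target side check */

	return src->warnings + dst->warnings;
}
```

In this model, updating only the source runqueue (which is what the posted patch does) still leaves one warning on the target side, and updating both leaves none. Whether the real attach_task() path behaves the same way in our tree needs to be checked against the actual code.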