We've got a hard lockup which seems to be caused by the mgag200 console printk code calling schedule_work() from the scheduler with rq->lock held:
 #5 [ffffb79e034239a8] native_queued_spin_lock_slowpath at ffffffff8b50c6c6
 #6 [ffffb79e034239a8] _raw_spin_lock at ffffffff8bc96e5c
 #7 [ffffb79e034239b0] try_to_wake_up at ffffffff8b4e26ff
 #8 [ffffb79e03423a10] __queue_work at ffffffff8b4ce3f3
 #9 [ffffb79e03423a58] queue_work_on at ffffffff8b4ce714

The printk was called because assert_clock_updated() triggered
SCHED_WARN_ON(rq->clock_update_flags < RQCF_ACT_SKIP);

This means that we are missing a necessary update_rq_clock() call.
Add one to cpulimit_balance_cpu_stop() to fix the warning.

Also add one in load_balance() before the move_task_groups() call.
It seems to be another place missing this call.

https://jira.sw.ru/browse/PSBM-108013
Signed-off-by: Andrey Ryabinin <aryabi...@virtuozzo.com>
---
 kernel/sched/fair.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5d3556b15e70..e6dc21d5fa03 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7816,6 +7816,7 @@ static int cpulimit_balance_cpu_stop(void *data)
 
 	schedstat_inc(sd->clb_count);
 
+	update_rq_clock(rq);
 	if (do_cpulimit_balance(&env))
 		schedstat_inc(sd->clb_pushed);
 	else
@@ -9176,6 +9177,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 	env.loop = 0;
 	local_irq_save(rf.flags);
 	double_rq_lock(env.dst_rq, busiest);
+	update_rq_clock(env.dst_rq);
 	cur_ld_moved = ld_moved = move_task_groups(&env);
 	double_rq_unlock(env.dst_rq, busiest);
 	local_irq_restore(rf.flags);
-- 
2.26.2
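
For reference, the check behind the warning can be modelled in userspace roughly as below. This is only a sketch under assumptions, not the kernel code: struct model_rq and every model_* function are hypothetical stand-ins, the RQCF_* values are the ones used in mainline kernel/sched/sched.h, and the pinning step is assumed to behave like mainline rq_pin_lock(), which keeps only the skip flags. The model just shows why a path that takes the lock and then reads the clock without update_rq_clock() ends up with clock_update_flags < RQCF_ACT_SKIP.

```c
#include <stdbool.h>

/* Flag values as in mainline kernel/sched/sched.h. */
#define RQCF_REQ_SKIP 0x01
#define RQCF_ACT_SKIP 0x02
#define RQCF_UPDATED  0x04

/* Hypothetical stand-in for the two struct rq fields that matter here. */
struct model_rq {
	unsigned int clock_update_flags;
	unsigned long long clock;
};

/* Models lock pinning: only the skip flags survive, RQCF_UPDATED is dropped. */
static void model_pin_lock(struct model_rq *rq)
{
	rq->clock_update_flags &= (RQCF_REQ_SKIP | RQCF_ACT_SKIP);
}

/* Models update_rq_clock(): nothing happens while updates are being skipped. */
static void model_update_rq_clock(struct model_rq *rq, unsigned long long now)
{
	if (rq->clock_update_flags & RQCF_ACT_SKIP)
		return;
	rq->clock_update_flags |= RQCF_UPDATED;
	rq->clock = now;
}

/* Models the condition checked by assert_clock_updated(); true means "would warn". */
static bool model_clock_stale(const struct model_rq *rq)
{
	return rq->clock_update_flags < RQCF_ACT_SKIP;
}

/* Take the lock, optionally update the clock, then report whether the check fires. */
static bool model_warns(bool call_update)
{
	struct model_rq rq = { .clock_update_flags = RQCF_UPDATED, .clock = 0 };

	model_pin_lock(&rq);
	if (call_update)
		model_update_rq_clock(&rq, 100);
	return model_clock_stale(&rq);
}
```

In this model, model_warns(false) corresponds to the paths before the patch (lock taken, clock read without an update) and model_warns(true) to the paths after it.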