The commit is pushed to "branch-rh7-3.10.0-123.1.2-ovz" and will appear at
https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-123.1.2.vz7.5.9
------>
commit 034153bb1d78cbfd0c44c0e91c3c77d4178606b6
Author: Vladimir Davydov <vdavy...@parallels.com>
Date:   Thu Jun 4 15:53:10 2015 +0400
    sched: Port diff-sched-initialize-runtime-to-non-zero-on-cfs-bw-set

    Author: Vladimir Davydov
    Email: vdavy...@parallels.com
    Subject: sched: initialize runtime to non-zero on cfs bw set
    Date: Mon, 21 Jan 2013 11:44:58 +0400

    * [sched] running tasks could be throttled and never unthrottled,
      thus causing random node hangs. (PSBM-17658)

    If cfs_rq->runtime_remaining is <= 0, then either

     - cfs_rq is throttled and waiting for quota redistribution, or
     - cfs_rq is currently executing and will be throttled on
       put_prev_entity, or
     - cfs_rq is not throttled and has not executed since its quota was
       set (runtime_remaining is set to 0 on cfs bandwidth
       reconfiguration).

    Obviously, the last case is rather an exception to the rule
    "runtime_remaining <= 0 iff cfs_rq is throttled or will be throttled
    as soon as it finishes its execution". Moreover, it can lead to a
    task hang as follows. If put_prev_task is called immediately after
    the first pick_next_task after quota was set, "immediately" meaning
    rq->clock is the same in both functions, then the corresponding
    cfs_rq will be throttled. Besides being unfair (the cfs_rq has not
    in fact executed), the quota-refilling timer can be idle at that
    time, and it won't be activated on put_prev_task, because
    update_curr calls account_cfs_rq_runtime, which activates the timer,
    only if delta_exec is strictly positive. As a result we can get a
    task "running" inside a throttled cfs_rq which will probably never
    be unthrottled.

    To avoid the problem, the patch makes tg_set_cfs_bandwidth
    initialize runtime_remaining of each cfs_rq to 1 instead of 0, so
    that the cfs_rq will be throttled only if it has executed for some
    positive number of nanoseconds.

    Several times our customers have encountered such hangs inside a VM
    (it seems time accounting there is somehow different). Analyzing
    crash dumps revealed that the hung tasks were running inside
    cfs_rq's with the following setup:

        cfs_rq->throttled = 1
        cfs_rq->runtime_enabled = 1
        cfs_rq->runtime_remaining = 0
        cfs_rq->tg->cfs_bandwidth.idle = 1
        cfs_rq->tg->cfs_bandwidth.timer_active = 0

    which conforms pretty nicely to the explanation given above.

    https://jira.sw.ru/browse/PSBM-17658

    Signed-off-by: Vladimir Davydov <vdavy...@parallels.com>

    =============================================================================

    Related to https://jira.sw.ru/browse/PSBM-33642

    Signed-off-by: Vladimir Davydov <vdavy...@parallels.com>
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4e6254b..d8831c9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8342,7 +8342,7 @@ static int __tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
 		raw_spin_lock_irq(&rq->lock);
 		cfs_rq->runtime_enabled = runtime_enabled;
-		cfs_rq->runtime_remaining = 0;
+		cfs_rq->runtime_remaining = 1;

 		if (cfs_rq->throttled)
 			unthrottle_cfs_rq(cfs_rq);
_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
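
For illustration, below is a minimal user-space C model of the race described
in the commit message. It is not kernel code: the struct and function names
(model_cfs_rq, account_runtime, maybe_throttle, run_once) are invented for
this sketch, and only the two decisions that matter for the bug are kept:
update_curr/account_cfs_rq_runtime does nothing when delta_exec is zero (so
the period timer is never armed), while the throttle check done on
put_prev_entity still fires whenever runtime_remaining <= 0. Running it shows
that starting from runtime_remaining == 0 reproduces the hung state seen in
the crash dumps (throttled with the timer idle), while starting from 1, as
the patch does, avoids the throttle.

/*
 * Simplified user-space model of the CFS bandwidth race -- NOT kernel code.
 * Only the decision logic relevant to the bug is kept (the real
 * account_cfs_rq_runtime also tries to refill quota from the pool, etc.).
 */
#include <stdbool.h>
#include <stdio.h>

struct model_cfs_rq {
	long long runtime_remaining;	/* ns of quota left */
	bool throttled;
	bool timer_active;		/* quota-refill (period) timer armed? */
};

/* roughly what update_curr -> account_cfs_rq_runtime does in this scenario */
static void account_runtime(struct model_cfs_rq *cfs_rq, long long delta_exec)
{
	if (delta_exec == 0)
		return;			/* update_curr bails out early */
	cfs_rq->runtime_remaining -= delta_exec;
	if (cfs_rq->runtime_remaining <= 0)
		cfs_rq->timer_active = true;	/* timer armed, refill will come */
}

/* roughly what the throttle check on put_prev_entity does */
static void maybe_throttle(struct model_cfs_rq *cfs_rq)
{
	if (cfs_rq->runtime_remaining <= 0)
		cfs_rq->throttled = true;
}

static void run_once(long long initial_runtime)
{
	struct model_cfs_rq cfs_rq = { .runtime_remaining = initial_runtime };

	/* pick_next_task and put_prev_task at the same rq->clock: delta_exec == 0 */
	account_runtime(&cfs_rq, 0);
	maybe_throttle(&cfs_rq);

	printf("initial runtime %lld: throttled=%d timer_active=%d\n",
	       initial_runtime, cfs_rq.throttled, cfs_rq.timer_active);
}

int main(void)
{
	run_once(0);	/* old behaviour: throttled with no timer -> stuck */
	run_once(1);	/* patched behaviour: not throttled */
	return 0;
}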