I noticed that the group frequently got throttled even though its CPU
usage was low; this caused response-time jitter in some of our business
containers that enable the CPU quota.

It's very easy to reproduce:
  mkdir /sys/fs/cgroup/cpu/test
  cd /sys/fs/cgroup/cpu/test
  echo 100000 > cpu.cfs_quota_us
  echo $$ > tasks

then repeat:

  cat cpu.stat | grep nr_throttled   # nr_throttled will increase

After some analysis, we found that cfs_rq::runtime_remaining gets
cleared by expire_cfs_rq_runtime() because the two
"cfs_{b|q}->runtime_expires" values are equal but stale after the
period timer is re-armed.

The global expiration time should therefore be advanced accordingly
when the bandwidth period timer is restarted.
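
For context, the consumer side is assumed to be a companion change (it
is not part of the hunk below) that keys the drift check on an
expiration sequence count rather than on the raw stamps, roughly:

        /*
         * Assumed companion change in expire_cfs_rq_runtime(), not
         * part of the hunk below: key the drift check on a sequence
         * count bumped whenever the global deadline is advanced.
         */
        if (cfs_rq->expires_seq == cfs_b->expires_seq) {
                /* global deadline not advanced: clock drift, extend */
                cfs_rq->runtime_expires += TICK_NSEC;
        } else {
                /* global deadline truly advanced: runtime expired */
                cfs_rq->runtime_remaining = 0;
        }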

Signed-off-by: Xunlei Pang <xlp...@linux.alibaba.com>
---
 kernel/sched/fair.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9f384264e832..bb006e671609 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5204,13 +5204,18 @@ static void init_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 
 void start_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
 {
+       u64 overrun;
+
        lockdep_assert_held(&cfs_b->lock);
 
-       if (!cfs_b->period_active) {
-               cfs_b->period_active = 1;
-               hrtimer_forward_now(&cfs_b->period_timer, cfs_b->period);
-               hrtimer_start_expires(&cfs_b->period_timer, HRTIMER_MODE_ABS_PINNED);
-       }
+       if (cfs_b->period_active)
+               return;
+
+       cfs_b->period_active = 1;
+       overrun = hrtimer_forward_now(&cfs_b->period_timer, cfs_b->period);
+       cfs_b->runtime_expires += (overrun + 1) * ktime_to_ns(cfs_b->period);
+       cfs_b->expires_seq++;
+       hrtimer_start_expires(&cfs_b->period_timer, HRTIMER_MODE_ABS_PINNED);
 }
 
 static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
-- 
2.14.1.40.g8e62ba1
