Phil Auld <pa...@redhat.com> writes: > On Mon, Mar 11, 2019 at 04:25:36PM -0400 Phil Auld wrote: >> On Mon, Mar 11, 2019 at 10:44:25AM -0700 bseg...@google.com wrote: >> > Letting it spin for 100ms and then only increasing by 6% seems extremely >> > generous. If we went this route I'd probably say "after looping N >> > times, set the period to time taken / N + X%" where N is like 8 or >> > something. I think I'd probably perfer something like this to the >> > previous "just abort and let it happen again next interrupt" one. >> >> Okay. I'll try to spin something up that does this. It may be a little >> trickier to keep the quota proportional to the new period. I think that's >> important since we'll be changing the user's setting. >> >> Do you mean to have it break when it hits N and recalculates the period or >> reset the counter and keep going? >> > > Let me know what you think of the below. It's working nicely. I like your > suggestion to limit it quickly based on number of loops and use that to > scale up. I think it is best to break out and let it fire again if needed. > The warning fires once, very occasionally twice, and then things are quiet. > > If that looks reasonable I'll do some more testing and spin it up as a real > patch submission.
Yeah, this looks reasonable. I should probably see how unreasonable the other thing would be, but if your previous periods were kinda small (and it's just that the machine crashing isn't an ok failure mode) I suppose it's not a big deal. > > Cheers, > Phil > --- > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 310d0637fe4b..54b30adfc89e 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -4859,19 +4859,51 @@ static enum hrtimer_restart > sched_cfs_slack_timer(struct hrtimer *timer) > return HRTIMER_NORESTART; > } > > +extern const u64 max_cfs_quota_period; > +int cfs_period_autotune_loop_limit = 8; > +int cfs_period_autotune_cushion_pct = 15; /* percentage added to period > recalculation */ > + > static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer) > { > struct cfs_bandwidth *cfs_b = > container_of(timer, struct cfs_bandwidth, period_timer); > + s64 nsstart, nsnow, new_period; > int overrun; > int idle = 0; > + int count = 0; > > raw_spin_lock(&cfs_b->lock); > + nsstart = ktime_to_ns(hrtimer_cb_get_time(timer)); > for (;;) { > overrun = hrtimer_forward_now(timer, cfs_b->period); > if (!overrun) > break; > > + if (++count > cfs_period_autotune_loop_limit) { > + ktime_t old_period = ktime_to_ns(cfs_b->period); > + > + nsnow = ktime_to_ns(hrtimer_cb_get_time(timer)); > + new_period = (nsnow - > nsstart)/cfs_period_autotune_loop_limit; > + > + /* Make sure new period will be larger than old. */ > + if (new_period < old_period) { > + new_period = old_period; > + } > + new_period += (new_period * > cfs_period_autotune_cushion_pct) / 100; This ordering means that it will always increase by at least 15%. This is a bit odd but probably a good thing; I'd just change the comment to make it clear this is deliberate. > + > + if (new_period > max_cfs_quota_period) > + new_period = max_cfs_quota_period; > + > + cfs_b->period = ns_to_ktime(new_period); > + cfs_b->quota += (cfs_b->quota * ((new_period - > old_period) * 100)/old_period)/100; In general it makes sense to do fixed point via 1024 or something that can be optimized into shifts (and a larger number is better in general for better precision). > + pr_warn_ratelimited( > + "cfs_period_timer[cpu%d]: period too short, > scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n", > + smp_processor_id(), > cfs_b->period/NSEC_PER_USEC, cfs_b->quota/NSEC_PER_USEC); > + > + idle = 0; > + break; > + } > + > idle = do_sched_cfs_period_timer(cfs_b, overrun); > } > if (idle)