On 18/10/18 11:48, Peter Zijlstra wrote:
> On Thu, Oct 18, 2018 at 10:28:38AM +0200, Juri Lelli wrote:
>
> > Another side problem seems also to be that with such tiny parameters we
> > spend lot of time in the while (dl_se->runtime <= 0) loop of
> > replenish_dl_entity() (actually uselessly, as deadline is most probably
> > going to still be in the past when eventually runtime becomes positive
> > again), as delta_exec is huge w.r.t. runtime and runtime has to keep up
> > with tiny increments of dl_runtime. I guess we could ameliorate things
> > here by limiting the number of time we execute the loop before bailing
> > out.
>
> That's the "DL replenish lagged too much" case, right? Yeah, there is
> only so much we can recover from.

Right.

> Funny that GCC actually emits that loop; sometimes we've had to fight
> GCC not to turn that into a division.
>
> But yes, I suppose we can put a limit on how many periods we can lag
> before just giving up.

OK.

> > So, I tend to think that we might want to play safe and put some higher
> > minimum value for dl_runtime (it's currently at 1ULL << DL_SCALE).
> > Guess the problem is to pick a reasonable value, though. Maybe link it
> > someway to HZ? Then we might add a sysctl (or similar) thing with which
> > knowledgeable users can do whatever they think their platform/config can
> > support?
>
> Yes, a HZ related limit sounds like something we'd want. But if we're
> going to do a minimum sysctl, we should also consider adding a maximum,
> if you set a massive period/deadline, you can, even with a relatively
> low u, incur significant delays.
>
> And do we want to put the limit on runtime or on period ?
>
> That is, something like:
>
>   TICK_NSEC/2 < period < 10*TICK_NSEC
>
> and/or
>
>   TICK_NSEC/2 < runtime < 10*TICK_NSEC
>
> Hmm, for HZ=1000 that ends up with a max period of 10ms, that's far too
> low, 24Hz needs ~41ms. We can of course also limit the runtime by
> capping u for users (as we should anyway).

I also thought of TICK_NSEC/2 as a reasonably safe lower limit, that
will implicitly limit period as well since

  runtime <= deadline <= period

Not sure about the upper limit, though. Lower limit is something related
to the inherent granularity of the platform/config, upper limit is more
to do with highest prio stuff with huge period delaying everything else;
doesn't seem to be related to HZ?

Maybe we could just pick something that seems reasonably big to handle
SCHED_DEADLINE users needs and not too big to jeopardize everyone else,
say 0.5s?
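
To make the two ideas above a bit more concrete, here is a rough userspace
sketch, not a patch. It assumes HZ=1000 and a simplified TICK_NSEC; the
names dl_params_ok(), replenish_bounded(), struct dl_entity_sketch and the
DL_*_NS / DL_MAX_LAG_PERIODS constants are made up for illustration and do
not exist in the kernel. The cap of 8 periods and the "restart from now"
policy on giving up are assumptions as well.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ULL
#define HZ		1000ULL			/* assumed config value */
#define TICK_NSEC	(NSEC_PER_SEC / HZ)	/* simplified for the sketch */

/* Hypothetical bounds discussed in this thread, not kernel symbols. */
#define DL_RUNTIME_MIN_NS	(TICK_NSEC / 2)		/* granularity-related */
#define DL_PERIOD_MAX_NS	(NSEC_PER_SEC / 2)	/* "say 0.5s" */

/* Hypothetical cap on how many periods we try to catch up on. */
#define DL_MAX_LAG_PERIODS	8

struct dl_entity_sketch {
	int64_t  runtime;	/* remaining budget in ns, may go negative */
	uint64_t deadline;	/* absolute deadline in ns */
};

/*
 * Admission-time check: a lower bound on runtime implicitly bounds
 * deadline and period from below, since runtime <= deadline <= period.
 */
static bool dl_params_ok(uint64_t runtime, uint64_t deadline, uint64_t period)
{
	if (runtime < DL_RUNTIME_MIN_NS)
		return false;
	if (period > DL_PERIOD_MAX_NS)
		return false;
	return runtime <= deadline && deadline <= period;
}

/*
 * Replenish with a bounded number of iterations. Returns true if the
 * entity caught up within DL_MAX_LAG_PERIODS, false if we gave up and
 * restarted it from 'now' with a full budget (assumed policy).
 */
static bool replenish_bounded(struct dl_entity_sketch *se, uint64_t dl_runtime,
			      uint64_t dl_deadline, uint64_t dl_period,
			      uint64_t now)
{
	unsigned int lag;

	for (lag = 0; se->runtime <= 0; lag++) {
		if (lag == DL_MAX_LAG_PERIODS) {
			se->deadline = now + dl_deadline;
			se->runtime = (int64_t)dl_runtime;
			return false;
		}
		se->deadline += dl_period;
		se->runtime += (int64_t)dl_runtime;
	}
	return true;
}
```

With these numbers a 1us runtime is rejected up front (the floor is 500us
at HZ=1000), while a 1ms/41ms reservation for a 24Hz task passes. For an
entity that overran by 1ms with dl_runtime = 1024ns, the unbounded loop
would need 977 iterations; the bounded one bails out after 8.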