On 28-Jun 14:38, Peter Zijlstra wrote:
> On Fri, Jun 28, 2019 at 11:08:14AM +0100, Patrick Bellasi wrote:
> > On 26-Jun 13:40, Vincent Guittot wrote:
> > > Hi Patrick,
> > >
> > > On Thu, 20 Jun 2019 at 17:06, Patrick Bellasi <[email protected]>
> > > wrote:
> > > >
> > > > The estimated utilization for a task is currently defined based on:
> > > >  - enqueued: the utilization value at the end of the last activation
> > > >  - ewma:     an exponential moving average which samples are the
> > > >              enqueued values
> > > >
> > > > According to this definition, when a task suddenly change it's bandwidth
> > > > requirements from small to big, the EWMA will need to collect multiple
> > > > samples before converging up to track the new big utilization.
> > > >
> > > > Moreover, after the PELT scale invariance update [1], in the above
> > > > scenario we can see that the utilization of the task has a significant
> > > > drop from the first big activation to the following one. That's implied
> > > > by the new "time-scaling"
> > >
> > > Could you give us more details about this? I'm not sure to understand
> > > what changes between the 1st big activation and the following one ?
> >
> > We are after a solution for the problem Douglas Raillard discussed at
> > OSPM, specifically the "Task util drop after 1st idle" highlighted in
> > slide 6 of his presentation:
> >
> >   http://retis.sssup.it/ospm-summit/Downloads/02_05-Douglas_Raillard-How_can_we_make_schedutil_even_more_effective.pdf
> >
>
> So I see the problem, and I don't hate the patch, but I'm still
> struggling to understand how exactly it related to the time-scaling
> stuff. Afaict the fundamental problem here is layering two averages. The
> second (EWMA in our case) will always lag/delay the input of the first
> (PELT).
>
> The time-scaling thing might make matters worse, because that helps PELT
> ramp up faster, but that is not the primary issue.

Sure, we like the new time-scaling PELT, which ramps up faster and, as
long as we have idle time, is better at predicting what the utilization
would be if we were running at max OPP.

However, the experiment above shows that:

 - despite the task being a 75% task after a certain activation, it
   takes multiple activations for PELT to actually enter that range.

 - the first activation ends at 665, 10% short wrt the configured
   utilization.

 - while the PELT signal converges toward the 75%, we have some pretty
   consistent drops at wakeup time, especially after the first big
   activation.

> Or am I missing something?

I'm not sure the above happens because of a problem in the new
time-scaling PELT; I actually think it's kind of expected given the way
we re-scale time contributions depending on the current OPPs.

It's just that a drop of 375 in utilization after just 1.1ms of sleep
time looks to me more related to the time-scaling invariance than to the
normal/expected PELT decay.

Could it be an out-of-sync issue between the PELT time scaling code and
the capacity scaling code? Perhaps due to some OPP changes/notifications
going wrong?

Sorry for not being more useful on that, maybe Vincent has some better
ideas.

The only thing I've kind of convinced myself of is that an EWMA on
util_est does not make a lot of sense for tracking increasing
utilization.

Best,
Patrick

-- 
#include <best/regards.h>

Patrick Bellasi
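
For a sense of scale, here is a rough back-of-the-envelope sketch of the
two effects discussed in this thread: plain geometric decay over a short
sleep, and the number of samples an EWMA needs to follow a step. It is
not kernel code. The 32ms half-life and the 1/4 EWMA weight are
assumptions (the values commonly quoted for PELT and util_est), the
function names are made up for illustration, and both the starting value
of 100 and the use of 665 as the value going into the sleep are
hypothetical. 768 is simply 75% of a 1024 scale.

```python
# Back-of-the-envelope model only; the constants below are assumptions.
HALF_LIFE_MS = 32.0   # assumed PELT half-life
EWMA_WEIGHT = 0.25    # assumed weight of a new sample in the util_est EWMA


def pelt_decay(util, sleep_ms):
    """Utilization left after sleep_ms of plain geometric decay (no time-scaling)."""
    return util * 0.5 ** (sleep_ms / HALF_LIFE_MS)


def ewma_samples_to_reach(start, target, fraction=0.9):
    """Count samples until an EWMA fed with `target` covers `fraction` of the step."""
    ewma = float(start)
    goal = start + fraction * (target - start)
    samples = 0
    while ewma < goal:
        ewma += EWMA_WEIGHT * (target - ewma)  # one activation at `target`
        samples += 1
    return samples


if __name__ == "__main__":
    after_sleep = pelt_decay(665, 1.1)
    print(f"plain decay over 1.1ms: 665 -> {after_sleep:.0f}")
    print("samples for the EWMA to cover 90% of a 100 -> 768 step:",
          ewma_samples_to_reach(100, 768))
```

Under these assumptions, plain decay over 1.1ms takes 665 down to about
649, i.e. a drop of roughly 16 rather than 375, and the EWMA needs 9
samples at the new level to cover 90% of a 100 -> 768 step.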

