On Fri, Oct 11, 2019 at 11:55 PM Aaron Lu <aaron...@linux.alibaba.com> wrote:
>
> I don't think we need do the normalization afterwards and it appears
> we are on the same page regarding core wide vruntime.
>
> The intent of my patch is to treat all the root level sched entities of
> the two siblings as if they are in a single cfs_rq of the core. With a
> core wide min_vruntime, the core scheduler can decide which sched entity
> to run next. And the individual sched entity's vruntime shouldn't be
> changed based on the change of core wide min_vruntime, or fairness can
> hurt (if we add or reduce vruntime of a sched entity, its credit will
> change).
>

Ok, I think I get it now. I see that your first patch actually wraps all
the places where min_vruntime is accessed. So yes, the tree vruntime
update is needed only once. From then on, since we use the wrapper
cfs_rq_min_vruntime(), both runqueues self-adjust based on the core wide
min_vruntime. Also, because min_vruntime stays the minimum from there on,
the tree update logic will not be called more than once. So I think the
changes are safe. I will do some profiling to make sure that it is
actually called only once.

> The weird thing about my patch is, the min_vruntime is often increased,
> it doesn't point to the smallest value as in a traditional cfs_rq. This
> probably can be changed to follow the tradition, I don't quite remember
> why I did this, will need to check this some time later.

Yeah, I noticed this. In my patch, I had already accounted for this and
changed it to min() instead of max(), which is more logical: min_vruntime
should be the minimum of both run queues.

> All those sub cfs_rq's sched entities are not interesting. Because once
> we decided which sched entity in the root level cfs_rq should run next,
> we can then pick the final next task from there (using the usual way). In
> other words, to make scheduler choose the correct candidate for the core,
> we only need worry about sched entities on both CPU's root level cfs_rqs.
>

Understood.
The only reason I did the normalization is to get both runqueues under
one min_vruntime always. And as long as we use cfs_rq_min_vruntime() from
then on, we wouldn't be calling the balancing logic any more.

> Does this make sense?

Sure, thanks for the clarification.

Thanks,
Vineeth
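
For illustration only, here is a rough userspace sketch of the wrapper
idea discussed above: every read of min_vruntime goes through one
accessor, so once the core wide value has been set from the minimum of
the two siblings, both runqueues see the same value without further
syncing. This is not the code from either patch. The toy_* types and
functions are made up for this sketch, and only the name
cfs_rq_min_vruntime() comes from the thread. The signed-delta comparison
is assumed here so the minimum stays correct across u64 wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for per-core state; not a real kernel structure. */
struct toy_core {
	uint64_t min_vruntime;	/* core wide min_vruntime */
	bool enabled;		/* is core wide accounting active? */
};

/* Hypothetical stand-in for a sibling's root level cfs_rq. */
struct toy_cfs_rq {
	uint64_t min_vruntime;	/* per-runqueue value */
	struct toy_core *core;	/* core this sibling belongs to */
};

/* Minimum of two vruntimes, compared via signed delta to survive wraparound. */
static uint64_t toy_min_vruntime(uint64_t a, uint64_t b)
{
	int64_t delta = (int64_t)(b - a);

	return delta < 0 ? b : a;
}

/*
 * Sketch of the accessor idea behind cfs_rq_min_vruntime(): callers never
 * read the field directly, so the core wide value is used everywhere once
 * it is enabled.
 */
static uint64_t toy_cfs_rq_min_vruntime(const struct toy_cfs_rq *cfs_rq)
{
	if (cfs_rq->core && cfs_rq->core->enabled)
		return cfs_rq->core->min_vruntime;
	return cfs_rq->min_vruntime;
}

/* One-time step: seed the core wide value with min() of both siblings. */
static void toy_core_enable(struct toy_core *core,
			    const struct toy_cfs_rq *a,
			    const struct toy_cfs_rq *b)
{
	core->min_vruntime = toy_min_vruntime(a->min_vruntime, b->min_vruntime);
	core->enabled = true;
}
```

The sketch seeds the core wide value with min() rather than max(),
matching the min()/max() point above. It deliberately leaves out any
per-entity vruntime adjustment.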