On 2019/7/25 22:30, Aaron Lu wrote:
> On Mon, Jul 22, 2019 at 06:26:46PM +0800, Aubrey Li wrote:
>> The granularity period of util_avg seems too large to decide task priority
>> during pick_task(), at least in my case: cfs_prio_less() always picked the
>> core max task, so pick_task() eventually picked idle, which makes this
>> change not very helpful for my case.
>>
>>  <idle>-0     [057] dN..    83.716973: __schedule: max: sysbench/2578 ffff889050f68600
>>  <idle>-0     [057] dN..    83.716974: __schedule: (swapper/5/0;140,0,0) ?< (mysqld/2511;119,1042118143,0)
>>  <idle>-0     [057] dN..    83.716975: __schedule: (sysbench/2578;119,96449836,0) ?< (mysqld/2511;119,1042118143,0)
>>  <idle>-0     [057] dN..    83.716975: cfs_prio_less: picked sysbench/2578 util_avg: 20 527 -507 <======= here===
>>  <idle>-0     [057] dN..    83.716976: __schedule: pick_task cookie pick swapper/5/0 ffff889050f68600
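>>
>> For reference, the comparison I mean is essentially the following
>> (an illustrative sketch only, not the actual v3 code; the name
>> prio_less_by_util() is mine):
>>
>>   /* true if a has lower priority than b, judged by long-term util */
>>   static inline bool prio_less_by_util(struct task_struct *a,
>>                                        struct task_struct *b)
>>   {
>>           return a->se.avg.util_avg < b->se.avg.util_avg;
>>   }
>>
>> Because util_avg accumulates and decays over a relatively long
>> period, it barely reflects which task deserves to run at this very
>> pick, so the comparison keeps favoring the core max task as seen
>> above.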
> 
> I tried a different approach based on vruntime with 3 patches following.
> 
> When the two tasks are on the same CPU, no change is made: I still walk
> the two sched entities up till they are in the same group (cfs_rq) and
> then do the vruntime comparison.
> 
> When the two tasks are on different threads of the same core, the root
> level sched_entities to which the two tasks belong will be used to do
> the comparison.
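> 
> In rough code, the comparison then becomes something like the sketch
> below (core_prio_less() is my name for illustration; find_matching_se()
> is the existing helper in kernel/sched/fair.c that walks two ses up
> until they are in the same cfs_rq):
> 
>   /* true if a has lower priority than b */
>   static bool core_prio_less(struct task_struct *a, struct task_struct *b)
>   {
>           struct sched_entity *sea = &a->se;
>           struct sched_entity *seb = &b->se;
> 
>           if (task_cpu(a) == task_cpu(b)) {
>                   /* same CPU: compare within a common cfs_rq */
>                   find_matching_se(&sea, &seb);
>           } else {
>                   /* SMT siblings: compare the root level ses */
>                   while (sea->parent)
>                           sea = sea->parent;
>                   while (seb->parent)
>                           seb = seb->parent;
>           }
> 
>           /* the smaller vruntime wins */
>           return (s64)(sea->vruntime - seb->vruntime) > 0;
>   }
> 
> The cross CPU branch is only meaningful once the root level vruntimes
> on both siblings advance against a common baseline, which is what the
> core cfs_rq min vruntime below provides.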
> 
> An ugly illustration for the cross CPU case:
> 
>    cpu0         cpu1
>  /   |  \     /   |  \
> se1 se2 se3  se4 se5 se6
>     /  \            /   \
>   se21 se22       se61  se62
> 
> Assume CPU0 and CPU1 are smt siblings and task A's se is se21 while
> task B's se is se61. To compare priority of task A and B, we compare
> priority of se2 and se6. The smaller vruntime wins.
> 
> To make this work, the root level ses on both CPUs should have a common
> cfs_rq min vruntime, which I call the core cfs_rq min vruntime.
> 
> This is mostly done in patch2/3.
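> 
> Roughly, the idea is the following (a sketch; is_root_cfs_rq() and
> core_cfs_rq() are placeholder names for whatever the patches actually
> use):
> 
>   /*
>    * For a root cfs_rq, report a min_vruntime that is shared by the
>    * whole core instead of the per-rq one, so the root level ses on
>    * both SMT siblings are normalized against the same baseline and
>    * their vruntimes stay comparable.
>    */
>   static inline u64 cfs_rq_min_vruntime(struct cfs_rq *cfs_rq)
>   {
>           if (is_root_cfs_rq(cfs_rq))
>                   return core_cfs_rq(cfs_rq)->min_vruntime;
> 
>           return cfs_rq->min_vruntime;
>   }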
> 
> Test:
> 1 wrote a cpu intensive program that does nothing but while(1) in
>   main() (sketched below after this list), let's call it cpuhog;
> 2 start 2 cgroups, with one cgroup's cpuset bound to cpu2 and the
>   other bound to cpu3; cpu2 and cpu3 are smt siblings on the test VM;
> 3 enable cpu.tag for the two cgroups;
> 4 start one cpuhog task in each cgroup;
> 5 kill both cpuhog tasks after 10 seconds;
> 6 check each cgroup's cpu usage.
> 
> Since the two cgroups carry different tags, the two cpuhogs can never
> run on the sibling CPUs at the same time and effectively share one
> core's worth of CPU time. If they are scheduled fairly, each cgroup's
> cpu usage should be around 5s.
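> 
> The cpuhog from step 1 is essentially just:
> 
>   int main(void)
>   {
>           while (1)
>                   ;       /* spin, burning cpu */
> 
>           return 0;
>   }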
> 
> With v3, the cpu usage of the two cgroups is sometimes 3s and 7s,
> sometimes 1s and 9s.
> 
> With the 3 patches applied, the numbers are mostly around 5s, 5s.
> 
> Another test is starting two cgroups simultaneously with cpu.tag set,
> with one cgroup running: will-it-scale/page_fault1_processes -t 16 -s 30,
> the other running: will-it-scale/page_fault2_processes -t 16 -s 30.
> With v3, like I said last time, the later started page_fault processes
> can't start running. With the 3 patches applied, both run at the
> same time, with each CPU getting a relatively fair score:
> 
> output line of 16 page_fault1 processes in 1 second interval:
> min:105225 max:131716 total:1872322
> 
> output line of 16 page_fault2 processes in 1 second interval:
> min:86797 max:110554 total:1581177
> 
> Note the min and max values: the smaller the gap between them, the
> better the fairness.
> 
> Aubrey,
> 
> I haven't been able to run your workload yet...
> 

No worries, let me try it and see how it works.

Thanks,
-Aubrey
