Le mercredi 17 juin 2020 à 16:57:25 (+0200), Vincent Guittot a écrit :
> Le mercredi 17 juin 2020 à 08:30:21 (+0800), Xing Zhengjun a écrit :
> > 
> > 
> > On 6/16/2020 2:54 PM, Vincent Guittot wrote:
> > > 
> > > Hi Xing,
> > > 
> > > Le mardi 16 juin 2020 à 11:17:16 (+0800), Xing Zhengjun a écrit :
> > > > 
> > > > 
> > > > On 6/15/2020 4:10 PM, Vincent Guittot wrote:
> > > > > Hi Xing,
> > > > > 
> > > > > Le lundi 15 juin 2020 à 15:26:59 (+0800), Xing Zhengjun a écrit :
> > > > > > 
> > > > > > 
> > > > > > On 6/12/2020 7:06 PM, Hillf Danton wrote:
> > > > > > > 
> > > > > > > On Fri, 12 Jun 2020 14:36:49 +0800 Xing Zhengjun wrote:
> > > > > 
> > > 
> > > ...
> > > 
...
> > 
> > I apply the patch based on v5.7, the test result is as the following:
> > 
> > =========================================================================================
> > tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:
> > 
> > lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21
> > 
> > commit:
> >   9f68395333ad7f5bfe2f83473fed363d4229f11c
> >   070f5e860ee2bf588c99ef7b4c202451faa48236
> >   v5.7
> >   63a5d0fbb5ec62f5148c251c01e709b8358cd0ee (the test patch)
> > 
> > 9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2                        v5.7
> > 63a5d0fbb5ec62f5148c251c01e
> > ---------------- --------------------------- ---------------------------
> > ---------------------------
> >          %stddev     %change         %stddev     %change %stddev     %change
> > %stddev
> >              \          |                \          |                \
> > |                \
> >       0.69           -10.3%       0.62            -9.1%       0.62
> > +1.0%       0.69        reaim.child_systime
> >       0.62            -1.0%       0.61            +0.5%       0.62
> > -0.1%       0.62        reaim.child_utime
> >      66870           -10.0%      60187            -7.6%      61787
> > +1.1%      67636        reaim.jobs_per_min
> >      16717           -10.0%      15046            -7.6%      15446
> > +1.1%      16909        reaim.jobs_per_min_child
> 
> OK. So the regression disappears when the conditions on runnable_avg are 
> removed.
> 
> In the meantime, I have been able to understand more deeply what was 
> happeningi
> for this bench and how it is impacted by
>   commit: 070f5e860ee2 ("sched/fair: Take into account runnable_avg to 
> classify group")
> 
> This bench forks a new thread for each and every new step. But a newly forked
> threads start with a load_avg and a runnable_avg set to max whereas the 
> threads
> are running shortly before exiting. This makes the CPU to be set overloaded in
> some case whereas it isn't.
> 
> Could you try the patch below ?
> It fixes the problem on my setup (I have finally been able to reproduce the 
> problem)
> 
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0aeffff62807..b33a4a9e1491 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -807,7 +807,7 @@ void post_init_entity_util_avg(struct task_struct *p)
>               }
>       }
>  
> -     sa->runnable_avg = cpu_scale;
> +     sa->runnable_avg = sa->util_avg;
>  
>       if (p->sched_class != &fair_sched_class) {
>               /*
> -- 
> 2.17.1
> 

The patch above tries to move back to the group in the same classification as
before but this could harm other benchmarks.

There is another way to fix this by easing the migration of task in the case
of migrate_util imbalance.

Could you also try the patch below instead of the one above ?

---
 kernel/sched/fair.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0aeffff62807..fcaf66c4d086 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7753,7 +7753,8 @@ static int detach_tasks(struct lb_env *env)
                case migrate_util:
                        util = task_util_est(p);

-                       if (util > env->imbalance)
+                       if (util/2 > env->imbalance &&
+                           env->sd->nr_balance_failed <= 
env->sd->cache_nice_tries)
                                goto next;

                        env->imbalance -= util;
--
2.17.1


> 
> 
> >      97.84            -1.1%      96.75            -0.4%      97.43
> > +0.3%      98.09        reaim.jti
> >      72000           -10.8%      64216            -8.3%      66000
> > +0.0%      72000        reaim.max_jobs_per_min
> >       0.36           +10.6%       0.40            +7.8%       0.39
> > -1.1%       0.36        reaim.parent_time
> >       1.58 ±  2%     +71.0%       2.70 ±  2%     +26.9%       2.01 ± 2%
> > -11.9%       1.39 ±  4%  reaim.std_dev_percent
> >       0.00 ±  5%    +110.4%       0.01 ±  3%     +48.8%       0.01 ± 7%
> > -27.3%       0.00 ± 15%  reaim.std_dev_time
> >      50800            -2.4%      49600            -1.6%      50000
> > +0.0%      50800        reaim.workload
> > 
> > 
> > > > 
> > > > =========================================================================================
> > > > tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:
> > > > 
> > > > lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21
> > > > 
> > > > commit:
> > > >    9f68395333ad7f5bfe2f83473fed363d4229f11c
> > > >    070f5e860ee2bf588c99ef7b4c202451faa48236
> > > >    v5.7
> > > >    3e1643da53f3fc7414cfa3ad2a16ab2a164b7f4d (the test patch)
> > > > 
> > > > 9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2                        v5.7
> > > > 3e1643da53f3fc7414cfa3ad2a1
> > > > ---------------- --------------------------- ---------------------------
> > > > ---------------------------
> > > >           %stddev     %change         %stddev     %change %stddev     
> > > > %change
> > > > %stddev
> > > >               \          |                \          |                \
> > > > |                \
> > > >        0.69           -10.3%       0.62            -9.1%       0.62
> > > > -7.1%       0.64        reaim.child_systime
> > > >        0.62            -1.0%       0.61            +0.5%       0.62
> > > > +1.3%       0.63        reaim.child_utime
> > > >       66870           -10.0%      60187            -7.6%      61787
> > > > -6.1%      62807        reaim.jobs_per_min
> > > >       16717           -10.0%      15046            -7.6%      15446
> > > > -6.1%      15701        reaim.jobs_per_min_child
> > > >       97.84            -1.1%      96.75            -0.4%      97.43
> > > > -0.5%      97.34        reaim.jti
> > > >       72000           -10.8%      64216            -8.3%      66000
> > > > -5.7%      67885        reaim.max_jobs_per_min
> > > >        0.36           +10.6%       0.40            +7.8%       0.39
> > > > +6.9%       0.38        reaim.parent_time
> > > >        1.58 ±  2%     +71.0%       2.70 ±  2%     +26.9%       2.01 ± 2%
> > > > +32.5%       2.09 ±  6%  reaim.std_dev_percent
> > > >        0.00 ±  5%    +110.4%       0.01 ±  3%     +48.8%       0.01 ± 7%
> > > > +61.7%       0.01 ±  8%  reaim.std_dev_time
> > > >       50800            -2.4%      49600            -1.6%      50000
> > > > -1.3%      50133        reaim.workload
> > > > 
> > > > 
> > > > > > 
> > > > > > =========================================================================================
> > > > > > tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:
> > > > > > 
> > > > > > lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21
> > > > > > 
> > > > > > commit:
> > > > > >     9f68395333ad7f5bfe2f83473fed363d4229f11c
> > > > > >     070f5e860ee2bf588c99ef7b4c202451faa48236
> > > > > >     v5.7
> > > > > >     6b33257768b8dd3982054885ea310871be2cfe0b (Hillf's patch)
> > > > > > 
> > > > > > 9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2                        
> > > > > > v5.7
> > > > > > 6b33257768b8dd3982054885ea3
> > > > > > ---------------- --------------------------- 
> > > > > > ---------------------------
> > > > > > ---------------------------
> > > > > >            %stddev     %change         %stddev     %change %stddev  
> > > > > >    %change
> > > > > > %stddev
> > > > > >                \          |                \          |             
> > > > > >    \
> > > > > > |                \
> > > > > >         0.69           -10.3%       0.62            -9.1%       0.62
> > > > > > -10.1%       0.62        reaim.child_systime
> > > > > >         0.62            -1.0%       0.61            +0.5%       0.62
> > > > > > +0.3%       0.62        reaim.child_utime
> > > > > >        66870           -10.0%      60187            -7.6%      61787
> > > > > > -8.3%      61305        reaim.jobs_per_min
> > > > > >        16717           -10.0%      15046            -7.6%      15446
> > > > > > -8.3%      15326        reaim.jobs_per_min_child
> > > > > >        97.84            -1.1%      96.75            -0.4%      97.43
> > > > > > -0.5%      97.37        reaim.jti
> > > > > >        72000           -10.8%      64216            -8.3%      66000
> > > > > > -8.3%      66000        reaim.max_jobs_per_min
> > > > > >         0.36           +10.6%       0.40            +7.8%       0.39
> > > > > > +9.4%       0.39        reaim.parent_time
> > > > > >         1.58 ±  2%     +71.0%       2.70 ±  2%     +26.9%       
> > > > > > 2.01 ± 2%
> > > > > > +33.2%       2.11        reaim.std_dev_percent
> > > > > >         0.00 ±  5%    +110.4%       0.01 ±  3%     +48.8%       
> > > > > > 0.01 ± 7%
> > > > > > +65.3%       0.01 ±  3%  reaim.std_dev_time
> > > > > >        50800            -2.4%      49600            -1.6%      50000
> > > > > > -1.8%      49866        reaim.workload
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > -- 
> > > > > > Zhengjun Xing
> > > > 
> > > > -- 
> > > > Zhengjun Xing
> > 
> > -- 
> > Zhengjun Xing

Reply via email to