Much better; but the below does not in fact apply.

> ---
> 
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 1214f0a..82698e5 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -185,6 +185,8 @@ exit_idle:
>   */
>  static void cpu_idle_loop(void)
>  {
> +     int cpu_id;
> 
> +     cpu_id = smp_processor_id();

This hunk assumes there's an empty line between the '{' and 'while
(1)'; in the current tree there isn't one, so the hunk doesn't apply.

>       while (1) {
>               /*
>                * If the arch has a polling bit, we maintain an invariant:
> @@ -202,7 +204,7 @@ static void cpu_idle_loop(void)
>                       check_pgt_cache();
>                       rmb();
> 
> -                     if (cpu_is_offline(smp_processor_id()))
> +                     if (cpu_is_offline(cpu_id))
>                                 arch_cpu_idle_dead();

And this hunk fails to apply because the recent cpu hotplug work added
another call here.

> 
>                       local_irq_disable();

I've taken your patch and modified it; see below.

Another optimization you could look at is removing that rmb(); I don't
actually think it's needed, but you'd need to find why it was added,
check that it was in fact needed back then, and then check whether it
is still needed now.

It's a bit of a trek through git history, but lfence / dsb ld are
expensive instructions.

---
Subject: sched,idle: Optimize idle loop
From: "Gaurav Jindal (Gaurav Jindal)" <[email protected]>
Date: Thu, 12 May 2016 10:13:33 +0000

Currently, smp_processor_id() is used to fetch the current CPU in
cpu_idle_loop. Every time the idle thread runs, it fetches current CPU
using smp_processor_id().

Since the idle thread is per CPU, the current CPU is constant, so we
can lift the load out of the loop, saving execution cycles/time in the
loop.

x86-64:
Before patch (execution in loop):
        148:    0f ae e8                lfence
        14b:    65 8b 04 25 00 00 00 00 mov %gs:0x0,%eax
        152:    00
        153:    89 c0                   mov %eax,%eax
        155:    49 0f a3 04 24          bt %rax,(%r12)

After patch (execution in loop):
        150:    0f ae e8                lfence
        153:    4d 0f a3 34 24          bt %r14,(%r12)

ARM64:
Before patch (execution in loop):
        168:    d5033d9f        dsb     ld
        16c:    b9405661        ldr     w1,[x19,#84]
        170:    1100fc20        add     w0,w1,#0x3f
        174:    6b1f003f        cmp     w1,wzr
        178:    1a81b000        csel    w0,w0,w1,lt
        17c:    130c7000        asr     w0,w0,#6
        180:    937d7c00        sbfiz   x0,x0,#3,#32
        184:    f8606aa0        ldr     x0,[x21,x0]
        188:    9ac12401        lsr     x1,x0,x1
        18c:    36000e61        tbz     w1,#0,358

After patch (execution in loop):
        1a8:    d50339df        dsb     ld
        1ac:    f8776ac0        ldr     x0,[x22,x23]
        1b0:    ea18001f        tst     x0,x24
        1b4:    54000ea0        b.eq    388

Further observation on ARM64 over 4 seconds shows that the loop in
cpu_idle_loop runs 8672 times. Hoisting the load out of the loop saves
those instructions on every iteration, and eventually time as well.

Cc: "[email protected]" <[email protected]>
Signed-off-by: Gaurav Jindal <[email protected]>
Reviewed-by: Sanjeev Yadav <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
---
 kernel/sched/idle.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -201,6 +201,8 @@ static void cpuidle_idle_call(void)
  */
 static void cpu_idle_loop(void)
 {
+       int cpu = smp_processor_id();
+
        while (1) {
                /*
                 * If the arch has a polling bit, we maintain an invariant:
@@ -219,7 +221,7 @@ static void cpu_idle_loop(void)
                        check_pgt_cache();
                        rmb();
 
-                       if (cpu_is_offline(smp_processor_id())) {
+                       if (cpu_is_offline(cpu)) {
                                cpuhp_report_idle_dead();
                                arch_cpu_idle_dead();
                        }
