From: "Gautham R. Shenoy" <e...@linux.vnet.ibm.com>

This patch exposes those extended CEDE states to the cpuidle framework
which are responsive to external interrupts and do not need an H_PROD.

Since as per the PAPR, all the extended CEDE states are non-responsive
to timers, we indicate this to the cpuidle subsystem via the
CPUIDLE_FLAG_TIMER_STOP flag for all those extende CEDE states which
can wake up on external interrupts.

With the patch, we are able to see the extended CEDE state with
latency hint = 1 exposed via the cpuidle framework.

        $ cpupower idle-info
        CPUidle driver: pseries_idle
        CPUidle governor: menu
        analyzing CPU 0:

        Number of idle states: 3
        Available idle states: snooze CEDE XCEDE1
        snooze:
        Flags/Description: snooze
        Latency: 0
        Usage: 33429446
        Duration: 27006062
        CEDE:
        Flags/Description: CEDE
        Latency: 1
        Usage: 10272
        Duration: 110786770
        XCEDE1:
        Flags/Description: XCEDE1
        Latency: 12
        Usage: 26445
        Duration: 1436433815

Benchmark results:
TLDR: Over all we do not see any additional benefit from having XCEDE1 over
CEDE.

ebizzy :
2 threads bound to a big-core. With this patch, we see a 3.39%
regression compared to with only CEDE0 latency fixup.
x With only CEDE0 latency fixup
* With CEDE0 latency fixup + CEDE1
    N           Min           Max        Median           Avg        Stddev
x  10       2893813       5834474       5832448     5327281.3     1055941.4
*  10       2907329       5834923       5831398     5146614.6     1193874.8

context_switch2:
With the context_switch2 there are no observable regressions in the
results.

context_switch2 CPU0 CPU1 (Same Big-core, different small-cores).
No difference with and without patch.
x without_patch
* with_patch
    N           Min           Max        Median           Avg        Stddev
x 500        343644        348778        345444     345584.02     1035.1658
* 500        344310        347646        345776     345877.22     802.19501

context_switch2 CPU0 CPU8 (different big-cores). Minor 0.05% improvement
with patch
x without_patch
* with_patch
    N           Min           Max        Median           Avg        Stddev
x 500        287562        288756        288162     288134.76     262.24328
* 500        287874        288960        288306     288274.66     187.57034

schbench:
No regressions observed with schbench

Without Patch:
Latency percentiles (usec)
        50.0th: 29
        75.0th: 40
        90.0th: 50
        95.0th: 61
        *99.0th: 13648
        99.5th: 14768
        99.9th: 15664
        min=0, max=29812

With Patch:
Latency percentiles (usec)
        50.0th: 30
        75.0th: 40
        90.0th: 51
        95.0th: 59
        *99.0th: 13616
        99.5th: 14512
        99.9th: 15696
        min=0, max=15996

Signed-off-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
---
 drivers/cpuidle/cpuidle-pseries.c | 50 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/drivers/cpuidle/cpuidle-pseries.c 
b/drivers/cpuidle/cpuidle-pseries.c
index 502f906..6f893cd 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -362,9 +362,59 @@ static int add_pseries_idle_states(void)
        for (i = 0; i < nr_xcede_records; i++) {
                u64 latency_tb = xcede_records[i].wakeup_latency_tb_ticks;
                u64 latency_us = tb_to_ns(latency_tb) / NSEC_PER_USEC;
+               char name[CPUIDLE_NAME_LEN];
+               unsigned int latency_hint = xcede_records[i].latency_hint;
+               u64 residency_us;
+
+               if (!xcede_records[i].responsive_to_irqs) {
+                       pr_info("cpuidle : Skipping XCEDE%d. Not responsive to 
IRQs\n",
+                               latency_hint);
+                       continue;
+               }
 
                if (latency_us < min_latency_us)
                        min_latency_us = latency_us;
+               snprintf(name, CPUIDLE_NAME_LEN, "XCEDE%d", latency_hint);
+
+               /*
+                * As per the section 14.14.1 of PAPR version 2.8.1
+                * says that alling H_CEDE with the value of the cede
+                * latency specifier set greater than zero allows the
+                * processor timer facility to be disabled (so as not
+                * to cause gratuitous wake-ups - the use of H_PROD,
+                * or other external interrupt is required to wake the
+                * processor in this case).
+                *
+                * So, inform the cpuidle-subsystem that the timer
+                * will be stopped for these states.
+                *
+                * Also, bump up the latency by 10us, since cpuidle
+                * would use timer-offload framework which will need
+                * to send an IPI to wakeup a CPU whose timer has
+                * expired.
+                */
+               if (latency_hint > 0) {
+                       dedicated_states[nr_states].flags = 
CPUIDLE_FLAG_TIMER_STOP;
+                       latency_us += 10;
+               }
+
+               /*
+                * Thumb rule : Reside in the XCEDE state for at least
+                * 10x the time required to enter and exit that state.
+                */
+               residency_us = latency_us * 10;
+
+               strlcpy(dedicated_states[nr_states].name, (const char *)name,
+                       CPUIDLE_NAME_LEN);
+               strlcpy(dedicated_states[nr_states].desc, (const char *)name,
+                       CPUIDLE_NAME_LEN);
+               dedicated_states[nr_states].exit_latency = latency_us;
+               dedicated_states[nr_states].target_residency = residency_us;
+               dedicated_states[nr_states].enter = &dedicated_cede_loop;
+               cede_latency_hint[nr_states] = latency_hint;
+               pr_info("cpuidle : Added %s. latency-hint = %d\n",
+                       name, latency_hint);
+               nr_states++;
        }
 
        /*
-- 
1.9.4

Reply via email to