Re: [PATCH 4/5] cpuidle-pseries : Include extended CEDE states in cpuidle framework

2020-07-19 Thread Vaidyanathan Srinivasan
* Gautham R Shenoy  [2020-07-07 16:41:38]:

> From: "Gautham R. Shenoy" 
> 
> This patch exposes those extended CEDE states to the cpuidle framework
> which are responsive to external interrupts and do not need an H_PROD.
> 
> Since as per the PAPR, all the extended CEDE states are non-responsive
> to timers, we indicate this to the cpuidle subsystem via the
> CPUIDLE_FLAG_TIMER_STOP flag for all those extende CEDE states which
> can wake up on external interrupts.
> 
> With the patch, we are able to see the extended CEDE state with
> latency hint = 1 exposed via the cpuidle framework.
> 
>   $ cpupower idle-info
>   CPUidle driver: pseries_idle
>   CPUidle governor: menu
>   analyzing CPU 0:
> 
>   Number of idle states: 3
>   Available idle states: snooze CEDE XCEDE1
>   snooze:
>   Flags/Description: snooze
>   Latency: 0
>   Usage: 33429446
>   Duration: 27006062
>   CEDE:
>   Flags/Description: CEDE
>   Latency: 1
>   Usage: 10272
>   Duration: 110786770
>   XCEDE1:
>   Flags/Description: XCEDE1
>   Latency: 12
>   Usage: 26445
>   Duration: 1436433815
> 
> Benchmark results:
> TLDR: Over all we do not see any additional benefit from having XCEDE1 over
> CEDE.
> 
> ebizzy :
> 2 threads bound to a big-core. With this patch, we see a 3.39%
> regression compared to with only CEDE0 latency fixup.
> x With only CEDE0 latency fixup
> * With CEDE0 latency fixup + CEDE1
> N   Min   MaxMedian   AvgStddev
> x  10   2893813   5834474   5832448 5327281.3 1055941.4
> *  10   2907329   5834923   5831398 5146614.6 1193874.8
> 
> context_switch2:
> With the context_switch2 there are no observable regressions in the
> results.
> 
> context_switch2 CPU0 CPU1 (Same Big-core, different small-cores).
> No difference with and without patch.
> x without_patch
> * with_patch
> N   Min   MaxMedian   AvgStddev
> x 500343644348778345444 345584.02 1035.1658
> * 500344310347646345776 345877.22 802.19501
> 
> context_switch2 CPU0 CPU8 (different big-cores). Minor 0.05% improvement
> with patch
> x without_patch
> * with_patch
> N   Min   MaxMedian   AvgStddev
> x 500287562288756288162 288134.76 262.24328
> * 500287874288960288306 288274.66 187.57034
> 
> schbench:
> No regressions observed with schbench
> 
> Without Patch:
> Latency percentiles (usec)
>   50.0th: 29
>   75.0th: 40
>   90.0th: 50
>   95.0th: 61
>   *99.0th: 13648
>   99.5th: 14768
>   99.9th: 15664
>   min=0, max=29812
> 
> With Patch:
> Latency percentiles (usec)
>   50.0th: 30
>   75.0th: 40
>   90.0th: 51
>   95.0th: 59
>   *99.0th: 13616
>   99.5th: 14512
>   99.9th: 15696
>   min=0, max=15996
> 
> Signed-off-by: Gautham R. Shenoy 

Reviewed-by: Vaidyanathan Srinivasan 

> ---
>  drivers/cpuidle/cpuidle-pseries.c | 50 
> +++
>  1 file changed, 50 insertions(+)
> 
> diff --git a/drivers/cpuidle/cpuidle-pseries.c 
> b/drivers/cpuidle/cpuidle-pseries.c
> index 502f906..6f893cd 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -362,9 +362,59 @@ static int add_pseries_idle_states(void)
>   for (i = 0; i < nr_xcede_records; i++) {
>   u64 latency_tb = xcede_records[i].wakeup_latency_tb_ticks;
>   u64 latency_us = tb_to_ns(latency_tb) / NSEC_PER_USEC;
> + char name[CPUIDLE_NAME_LEN];
> + unsigned int latency_hint = xcede_records[i].latency_hint;
> + u64 residency_us;
> +
> + if (!xcede_records[i].responsive_to_irqs) {
> + pr_info("cpuidle : Skipping XCEDE%d. Not responsive to 
> IRQs\n",
> + latency_hint);
> + continue;
> + }
> 
>   if (latency_us < min_latency_us)
>   min_latency_us = latency_us;
> + snprintf(name, CPUIDLE_NAME_LEN, "XCEDE%d", latency_hint);
> +
> + /*
> +  * As per the section 14.14.1 of PAPR version 2.8.1
> +  * says that alling H_CEDE with the value of the cede
> +  * latency specifier set greater than zero allows the
> +  * processor timer facility to be disabled (so as not
> +  * to cause gratuitous wake-ups - the use of H_PROD,
> +  * or other external interrupt is required to wake the
> +  * processor in this case).
> +  *
> +  * So, inform the cpuidle-subsystem that the timer
> +  * will be stopped for these states.
> +  *
> +  * Also, bump up the latency by 10us, since cpu

[PATCH 4/5] cpuidle-pseries : Include extended CEDE states in cpuidle framework

2020-07-07 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

This patch exposes those extended CEDE states to the cpuidle framework
which are responsive to external interrupts and do not need an H_PROD.

Since as per the PAPR, all the extended CEDE states are non-responsive
to timers, we indicate this to the cpuidle subsystem via the
CPUIDLE_FLAG_TIMER_STOP flag for all those extende CEDE states which
can wake up on external interrupts.

With the patch, we are able to see the extended CEDE state with
latency hint = 1 exposed via the cpuidle framework.

$ cpupower idle-info
CPUidle driver: pseries_idle
CPUidle governor: menu
analyzing CPU 0:

Number of idle states: 3
Available idle states: snooze CEDE XCEDE1
snooze:
Flags/Description: snooze
Latency: 0
Usage: 33429446
Duration: 27006062
CEDE:
Flags/Description: CEDE
Latency: 1
Usage: 10272
Duration: 110786770
XCEDE1:
Flags/Description: XCEDE1
Latency: 12
Usage: 26445
Duration: 1436433815

Benchmark results:
TLDR: Over all we do not see any additional benefit from having XCEDE1 over
CEDE.

ebizzy :
2 threads bound to a big-core. With this patch, we see a 3.39%
regression compared to with only CEDE0 latency fixup.
x With only CEDE0 latency fixup
* With CEDE0 latency fixup + CEDE1
N   Min   MaxMedian   AvgStddev
x  10   2893813   5834474   5832448 5327281.3 1055941.4
*  10   2907329   5834923   5831398 5146614.6 1193874.8

context_switch2:
With the context_switch2 there are no observable regressions in the
results.

context_switch2 CPU0 CPU1 (Same Big-core, different small-cores).
No difference with and without patch.
x without_patch
* with_patch
N   Min   MaxMedian   AvgStddev
x 500343644348778345444 345584.02 1035.1658
* 500344310347646345776 345877.22 802.19501

context_switch2 CPU0 CPU8 (different big-cores). Minor 0.05% improvement
with patch
x without_patch
* with_patch
N   Min   MaxMedian   AvgStddev
x 500287562288756288162 288134.76 262.24328
* 500287874288960288306 288274.66 187.57034

schbench:
No regressions observed with schbench

Without Patch:
Latency percentiles (usec)
50.0th: 29
75.0th: 40
90.0th: 50
95.0th: 61
*99.0th: 13648
99.5th: 14768
99.9th: 15664
min=0, max=29812

With Patch:
Latency percentiles (usec)
50.0th: 30
75.0th: 40
90.0th: 51
95.0th: 59
*99.0th: 13616
99.5th: 14512
99.9th: 15696
min=0, max=15996

Signed-off-by: Gautham R. Shenoy 
---
 drivers/cpuidle/cpuidle-pseries.c | 50 +++
 1 file changed, 50 insertions(+)

diff --git a/drivers/cpuidle/cpuidle-pseries.c 
b/drivers/cpuidle/cpuidle-pseries.c
index 502f906..6f893cd 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -362,9 +362,59 @@ static int add_pseries_idle_states(void)
for (i = 0; i < nr_xcede_records; i++) {
u64 latency_tb = xcede_records[i].wakeup_latency_tb_ticks;
u64 latency_us = tb_to_ns(latency_tb) / NSEC_PER_USEC;
+   char name[CPUIDLE_NAME_LEN];
+   unsigned int latency_hint = xcede_records[i].latency_hint;
+   u64 residency_us;
+
+   if (!xcede_records[i].responsive_to_irqs) {
+   pr_info("cpuidle : Skipping XCEDE%d. Not responsive to 
IRQs\n",
+   latency_hint);
+   continue;
+   }
 
if (latency_us < min_latency_us)
min_latency_us = latency_us;
+   snprintf(name, CPUIDLE_NAME_LEN, "XCEDE%d", latency_hint);
+
+   /*
+* As per the section 14.14.1 of PAPR version 2.8.1
+* says that alling H_CEDE with the value of the cede
+* latency specifier set greater than zero allows the
+* processor timer facility to be disabled (so as not
+* to cause gratuitous wake-ups - the use of H_PROD,
+* or other external interrupt is required to wake the
+* processor in this case).
+*
+* So, inform the cpuidle-subsystem that the timer
+* will be stopped for these states.
+*
+* Also, bump up the latency by 10us, since cpuidle
+* would use timer-offload framework which will need
+* to send an IPI to wakeup a CPU whose timer has
+* expired.
+*/
+   if (latency_hint > 0) {
+