* Gautham R Shenoy [2020-07-07 16:41:38]:
> From: "Gautham R. Shenoy"
>
> This patch exposes those extended CEDE states to the cpuidle framework
> which are responsive to external interrupts and do not need an H_PROD.
>
> Since as per the PAPR, all the extended CEDE states are non-responsive
> to timers, we indicate this to the cpuidle subsystem via the
> CPUIDLE_FLAG_TIMER_STOP flag for all those extende CEDE states which
> can wake up on external interrupts.
>
> With the patch, we are able to see the extended CEDE state with
> latency hint = 1 exposed via the cpuidle framework.
>
> $ cpupower idle-info
> CPUidle driver: pseries_idle
> CPUidle governor: menu
> analyzing CPU 0:
>
> Number of idle states: 3
> Available idle states: snooze CEDE XCEDE1
> snooze:
> Flags/Description: snooze
> Latency: 0
> Usage: 33429446
> Duration: 27006062
> CEDE:
> Flags/Description: CEDE
> Latency: 1
> Usage: 10272
> Duration: 110786770
> XCEDE1:
> Flags/Description: XCEDE1
> Latency: 12
> Usage: 26445
> Duration: 1436433815
>
> Benchmark results:
> TLDR: Over all we do not see any additional benefit from having XCEDE1 over
> CEDE.
>
> ebizzy :
> 2 threads bound to a big-core. With this patch, we see a 3.39%
> regression compared to with only CEDE0 latency fixup.
> x With only CEDE0 latency fixup
> * With CEDE0 latency fixup + CEDE1
> N Min MaxMedian AvgStddev
> x 10 2893813 5834474 5832448 5327281.3 1055941.4
> * 10 2907329 5834923 5831398 5146614.6 1193874.8
>
> context_switch2:
> With the context_switch2 there are no observable regressions in the
> results.
>
> context_switch2 CPU0 CPU1 (Same Big-core, different small-cores).
> No difference with and without patch.
> x without_patch
> * with_patch
> N Min MaxMedian AvgStddev
> x 500343644348778345444 345584.02 1035.1658
> * 500344310347646345776 345877.22 802.19501
>
> context_switch2 CPU0 CPU8 (different big-cores). Minor 0.05% improvement
> with patch
> x without_patch
> * with_patch
> N Min MaxMedian AvgStddev
> x 500287562288756288162 288134.76 262.24328
> * 500287874288960288306 288274.66 187.57034
>
> schbench:
> No regressions observed with schbench
>
> Without Patch:
> Latency percentiles (usec)
> 50.0th: 29
> 75.0th: 40
> 90.0th: 50
> 95.0th: 61
> *99.0th: 13648
> 99.5th: 14768
> 99.9th: 15664
> min=0, max=29812
>
> With Patch:
> Latency percentiles (usec)
> 50.0th: 30
> 75.0th: 40
> 90.0th: 51
> 95.0th: 59
> *99.0th: 13616
> 99.5th: 14512
> 99.9th: 15696
> min=0, max=15996
>
> Signed-off-by: Gautham R. Shenoy
Reviewed-by: Vaidyanathan Srinivasan
> ---
> drivers/cpuidle/cpuidle-pseries.c | 50
> +++
> 1 file changed, 50 insertions(+)
>
> diff --git a/drivers/cpuidle/cpuidle-pseries.c
> b/drivers/cpuidle/cpuidle-pseries.c
> index 502f906..6f893cd 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -362,9 +362,59 @@ static int add_pseries_idle_states(void)
> for (i = 0; i < nr_xcede_records; i++) {
> u64 latency_tb = xcede_records[i].wakeup_latency_tb_ticks;
> u64 latency_us = tb_to_ns(latency_tb) / NSEC_PER_USEC;
> + char name[CPUIDLE_NAME_LEN];
> + unsigned int latency_hint = xcede_records[i].latency_hint;
> + u64 residency_us;
> +
> + if (!xcede_records[i].responsive_to_irqs) {
> + pr_info("cpuidle : Skipping XCEDE%d. Not responsive to
> IRQs\n",
> + latency_hint);
> + continue;
> + }
>
> if (latency_us < min_latency_us)
> min_latency_us = latency_us;
> + snprintf(name, CPUIDLE_NAME_LEN, "XCEDE%d", latency_hint);
> +
> + /*
> + * As per the section 14.14.1 of PAPR version 2.8.1
> + * says that alling H_CEDE with the value of the cede
> + * latency specifier set greater than zero allows the
> + * processor timer facility to be disabled (so as not
> + * to cause gratuitous wake-ups - the use of H_PROD,
> + * or other external interrupt is required to wake the
> + * processor in this case).
> + *
> + * So, inform the cpuidle-subsystem that the timer
> + * will be stopped for these states.
> + *
> + * Also, bump up the latency by 10us, since cpu