Re: [PATCH] thermal: hisilicon: Add dependency on the clock driver to allow frequency scaling
Hi Amit, On Mon, Jun 20, 2016 at 05:46:36PM +0530, Amit Kucheria wrote: > The Hisilicon clock stub driver is needed to allow the thermal drivers to > actually scale the frequency. Make it an automatic dependency. > > Signed-off-by: Amit Kucheria > --- > drivers/thermal/Kconfig | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig > index 22ae1f7..4e843f7 100644 > --- a/drivers/thermal/Kconfig > +++ b/drivers/thermal/Kconfig > @@ -178,6 +178,7 @@ config THERMAL_EMULATION > config HISI_THERMAL > tristate "Hisilicon thermal driver" > depends on (ARCH_HISI && CPU_THERMAL && OF) || COMPILE_TEST > + select STUB_CLK_HI6220 Acked-by: Leo Yan > help > Enable this to plug hisilicon's thermal sensor driver into the Linux > thermal framework. cpufreq is used as the cooling device to throttle > -- > 2.5.0 >
[PATCH 3/3] arm64: defconfig: enable two common modules for power management
Enable two common modules for power management, one is to enable CPUFREQ_DT driver; this CPUFreq driver is used by many platforms by passing OPP table from device tree. Also enables CPU_THERMAL for CPU cooling device driver; so after we enabled thermal framework and can bind thermal zone with CPU cooling device driver. Signed-off-by: Leo Yan --- arch/arm64/configs/defconfig | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig index eadf485..9fd2681 100644 --- a/arch/arm64/configs/defconfig +++ b/arch/arm64/configs/defconfig @@ -82,6 +82,7 @@ CONFIG_COMPAT=y CONFIG_CPU_IDLE=y CONFIG_ARM_CPUIDLE=y CONFIG_CPU_FREQ=y +CONFIG_CPUFREQ_DT=y CONFIG_ARM_BIG_LITTLE_CPUFREQ=y CONFIG_ARM_SCPI_CPUFREQ=y CONFIG_NET=y @@ -252,6 +253,7 @@ CONFIG_SENSORS_INA2XX=m CONFIG_SENSORS_ARM_SCPI=y CONFIG_THERMAL=y CONFIG_THERMAL_EMULATION=y +CONFIG_CPU_THERMAL=y CONFIG_EXYNOS_THERMAL=y CONFIG_WATCHDOG=y CONFIG_RENESAS_WDT=y -- 1.9.1
[PATCH 0/3] arm64: Hikey: enable CPUFreq and thermal drivers
Currently most power management code of 96boards Hikey have been merged into mainline kernel, but some of them have not been built by default; So this patch series is to build all these modules properly. The software layer is as below: set constraint cpufreq-dt driver <-- thermal/cpu_cooling driver | | V V hi6220 stub clock driverhisi thermal driver So first two patches are two enable fundamental modules for hi6220 stub clock driver and hisilicon thermal driver. Based on these two drivers, we also need enable common drivers for cpufreq-dt.c and cpu_cooling.c; these two drivers are commonly used by other platforms, so enable these two common drivers in arm64's defconfig. arm32's defconfig has enabled these two configurations yet. Leo Yan (3): clk: Hi6220: enable stub clock driver for ARCH_HISI thermal: hisilicon: fix for dependency arm64: defconfig: enable two common modules for power management arch/arm64/configs/defconfig | 2 ++ drivers/clk/hisilicon/Kconfig | 1 + drivers/thermal/Kconfig | 4 +++- 3 files changed, 6 insertions(+), 1 deletion(-) -- 1.9.1
[PATCH 2/3] thermal: hisilicon: fix for dependency
The thermal driver is standalone driver which is used to enable thermal sensors, so it can be used with any cooling device and should not bind with CPU cooling device driver. This original patch is suggested by Amit Kucheria; so it's to polish the dependency in Kconfig, and remove the dependency with CPU_THERMAL. Signed-off-by: Leo Yan --- drivers/thermal/Kconfig | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig index 2d702ca..91ebab3 100644 --- a/drivers/thermal/Kconfig +++ b/drivers/thermal/Kconfig @@ -177,8 +177,10 @@ config THERMAL_EMULATION config HISI_THERMAL tristate "Hisilicon thermal driver" - depends on (ARCH_HISI && CPU_THERMAL && OF) || COMPILE_TEST + depends on ARCH_HISI || COMPILE_TEST depends on HAS_IOMEM + depends on OF + default y help Enable this to plug hisilicon's thermal sensor driver into the Linux thermal framework. cpufreq is used as the cooling device to throttle -- 1.9.1
[PATCH 1/3] clk: Hi6220: enable stub clock driver for ARCH_HISI
In current kernel config 'CONFIG_STUB_CLK_HI6220' is disabled by default, as result stub clock driver has not been registered and CPUFreq driver cannot work. This patch is to enable stub clock driver in config for ARCH_HISI. Reported-by: Dietmar Eggemann Signed-off-by: Leo Yan --- drivers/clk/hisilicon/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/clk/hisilicon/Kconfig b/drivers/clk/hisilicon/Kconfig index 3f537a0..9e0a95e 100644 --- a/drivers/clk/hisilicon/Kconfig +++ b/drivers/clk/hisilicon/Kconfig @@ -23,5 +23,6 @@ config RESET_HISI config STUB_CLK_HI6220 bool "Hi6220 Stub Clock Driver" depends on COMMON_CLK_HI6220 && MAILBOX + default ARCH_HISI help Build the Hisilicon Hi6220 stub clock driver. -- 1.9.1
[PATCH v2] arm64: defconfig: enable common modules for power management
Enable common modules for power management; one is to enable CPUFREQ_DT driver; the driver is used by many platforms by passing OPP table from device tree. Also enables thermal related drivers. Firstly we need enable configuration CPU_THERMAL for CPU cooling device driver, this will bind thermal zone with CPU cooling device; and enable 'power allocator' thermal governor. Signed-off-by: Leo Yan --- arch/arm64/configs/defconfig | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig index eadf485..c4f5948 100644 --- a/arch/arm64/configs/defconfig +++ b/arch/arm64/configs/defconfig @@ -82,6 +82,7 @@ CONFIG_COMPAT=y CONFIG_CPU_IDLE=y CONFIG_ARM_CPUIDLE=y CONFIG_CPU_FREQ=y +CONFIG_CPUFREQ_DT=y CONFIG_ARM_BIG_LITTLE_CPUFREQ=y CONFIG_ARM_SCPI_CPUFREQ=y CONFIG_NET=y @@ -252,6 +253,8 @@ CONFIG_SENSORS_INA2XX=m CONFIG_SENSORS_ARM_SCPI=y CONFIG_THERMAL=y CONFIG_THERMAL_EMULATION=y +CONFIG_THERMAL_GOV_POWER_ALLOCATOR=y +CONFIG_CPU_THERMAL=y CONFIG_EXYNOS_THERMAL=y CONFIG_WATCHDOG=y CONFIG_RENESAS_WDT=y -- 1.9.1
Re: [PATCH v2] arm64: defconfig: enable common modules for power management
Hi Catalin, Will, On Thu, Sep 01, 2016 at 12:51:09PM +0800, Leo Yan wrote: > Enable common modules for power management; one is to enable > CPUFREQ_DT driver; the driver is used by many platforms by passing OPP > table from device tree. > > Also enables thermal related drivers. Firstly we need enable > configuration CPU_THERMAL for CPU cooling device driver, this will bind > thermal zone with CPU cooling device; and enable 'power allocator' > thermal governor. This patch is an updated version for [1] with enabling "power allocator". Sorry for regression. [1] http://archive.arm.linux.org.uk/lurker/message/20160831.085017.a42c57fe.en.html Thanks, Leo Yan > > Signed-off-by: Leo Yan > --- > arch/arm64/configs/defconfig | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig > index eadf485..c4f5948 100644 > --- a/arch/arm64/configs/defconfig > +++ b/arch/arm64/configs/defconfig > @@ -82,6 +82,7 @@ CONFIG_COMPAT=y > CONFIG_CPU_IDLE=y > CONFIG_ARM_CPUIDLE=y > CONFIG_CPU_FREQ=y > +CONFIG_CPUFREQ_DT=y > CONFIG_ARM_BIG_LITTLE_CPUFREQ=y > CONFIG_ARM_SCPI_CPUFREQ=y > CONFIG_NET=y > @@ -252,6 +253,8 @@ CONFIG_SENSORS_INA2XX=m > CONFIG_SENSORS_ARM_SCPI=y > CONFIG_THERMAL=y > CONFIG_THERMAL_EMULATION=y > +CONFIG_THERMAL_GOV_POWER_ALLOCATOR=y > +CONFIG_CPU_THERMAL=y > CONFIG_EXYNOS_THERMAL=y > CONFIG_WATCHDOG=y > CONFIG_RENESAS_WDT=y > -- > 1.9.1 >
[PATCH] sched/fair: polish function update_next_balance()
Function update_next_balance() is only used by idle balance, so its parameter 'cpu_busy' is redundant and always be passed with 0. This patch is to polish update_next_balance() to remove parameter 'cpu_busy'. Signed-off-by: Leo Yan --- kernel/sched/fair.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 4088eed..e342159 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7704,11 +7704,12 @@ get_sd_balance_interval(struct sched_domain *sd, int cpu_busy) } static inline void -update_next_balance(struct sched_domain *sd, int cpu_busy, unsigned long *next_balance) +update_next_balance(struct sched_domain *sd, unsigned long *next_balance) { unsigned long interval, next; - interval = get_sd_balance_interval(sd, cpu_busy); + /* used by idle balance, so cpu_busy = 0 */ + interval = get_sd_balance_interval(sd, 0); next = sd->last_balance + interval; if (time_after(*next_balance, next)) @@ -7738,7 +7739,7 @@ static int idle_balance(struct rq *this_rq) rcu_read_lock(); sd = rcu_dereference_check_sched_domain(this_rq->sd); if (sd) - update_next_balance(sd, 0, &next_balance); + update_next_balance(sd, &next_balance); rcu_read_unlock(); goto out; @@ -7756,7 +7757,7 @@ static int idle_balance(struct rq *this_rq) continue; if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) { - update_next_balance(sd, 0, &next_balance); + update_next_balance(sd, &next_balance); break; } @@ -7774,7 +7775,7 @@ static int idle_balance(struct rq *this_rq) curr_cost += domain_cost; } - update_next_balance(sd, 0, &next_balance); + update_next_balance(sd, &next_balance); /* * Stop searching for tasks to pull if there are -- 1.9.1
[PATCH] sched/core: fix one typo
Fix one minor typo in the comment: s/targer/target/. Signed-off-by: Leo Yan --- kernel/sched/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 5c883fe..4cb3547 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1264,7 +1264,7 @@ static void __migrate_swap_task(struct task_struct *p, int cpu) /* * Task isn't running anymore; make it appear like we migrated * it before it went to sleep. This means on wakeup we make the -* previous cpu our targer instead of where it really is. +* previous cpu our target instead of where it really is. */ p->wake_cpu = cpu; } -- 1.9.1
[PATCH] sched/fair: refine maximum periods for decay_load
In current code, decay_load() will consider to set load value to zero after passing 32*64 ms. So this means max_load * (0.5^64) ~= 0. Kernel can support maximum number of processes and threads to 2^29 and set task with highest priority with nice=-20 (weight = 88761). So in worst case, one CPU may have maximum load value is: max_load = 2^29 * 88761 < 2^46 In theory after pass 46 periods we can ensure load value will be decayed to zero. So this patch is to change maximum periods from 64 to 48. Signed-off-by: Leo Yan --- kernel/sched/fair.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index e342159..55cb134 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2622,7 +2622,7 @@ static __always_inline u64 decay_load(u64 val, u64 n) if (!n) return val; - else if (unlikely(n > LOAD_AVG_PERIOD * 63)) + else if (unlikely(n > LOAD_AVG_PERIOD * 47)) return 0; /* after bounds checking we can collapse to 32-bit */ -- 1.9.1
Re: [PATCH] sched/fair: refine maximum periods for decay_load
Hi Vincent, Thanks for review. On Fri, Aug 05, 2016 at 09:56:46AM +0200, Vincent Guittot wrote: > Hi Leo, > > On 5 August 2016 at 08:34, Leo Yan wrote: > > > > In current code, decay_load() will consider to set load value to zero > > after passing 32*64 ms. So this means max_load * (0.5^64) ~= 0. > > > > Kernel can support maximum number of processes and threads to 2^29 and > > set task with highest priority with nice=-20 (weight = 88761). So in > > worst case, one CPU may have maximum load value is: > > > > max_load = 2^29 * 88761 < 2^46 > > > > In theory after pass 46 periods we can ensure load value will be decayed > > to zero. So this patch is to change maximum periods from 64 to 48. > > > > Signed-off-by: Leo Yan > > --- > > kernel/sched/fair.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index e342159..55cb134 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -2622,7 +2622,7 @@ static __always_inline u64 decay_load(u64 val, u64 n) > > > > if (!n) > > return val; > > - else if (unlikely(n > LOAD_AVG_PERIOD * 63)) > > + else if (unlikely(n > LOAD_AVG_PERIOD * 47)) > > In the equation above, you use 46 then you mentioned : "change maximum > periods from 64 to 48." and finally you use 47. Sorry introduce confusion. I want to align to 16 so choose maximum periods to 48, and due to condition is (n > LOAD_AVG_PERIOD * 47) so it equeals to (n >= LOAD_AVG_PERIOD * 48). Precisely I think the code should be: else if (unlikely(n >= LOAD_AVG_PERIOD * 46)) Do you think this is okay? > > return 0; > > > > /* after bounds checking we can collapse to 32-bit */ > > -- > > 1.9.1 > >
[PATCH v2] sched/fair: refine maximum periods for decay_load
In current code, decay_load() will consider to set load value to zero after passing 32*64 ms. So this means max_load * (0.5^64) ~= 0. Kernel can support maximum number of processes and threads to 2^29 and set task with highest priority with nice=-20 (weight = 88761). So in worst case, one CPU may have maximum load value is: max_load = 2^29 * 88761 < 2^46 In theory after pass 46 periods we can ensure load value to be decayed to zero. So this patch is to change maximum periods from 64 to 46. Signed-off-by: Leo Yan --- kernel/sched/fair.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index e342159..bbd01eb 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2622,7 +2622,15 @@ static __always_inline u64 decay_load(u64 val, u64 n) if (!n) return val; - else if (unlikely(n > LOAD_AVG_PERIOD * 63)) + /* +* Kernel can support maximum number of processes and threads up to +* 2^29 and task has highest weight 88761 with nice=-20. +* +* maximum load = 2^29 * 88761 < 2^46 +* +* So pass 46 periods can ensure maximum load to be decayed to zero. +*/ + else if (unlikely(n >= LOAD_AVG_PERIOD * 46)) return 0; /* after bounds checking we can collapse to 32-bit */ -- 1.9.1
[PATCH] clk: Hi6220: enable stub clock driver for ARCH_HISI
In current kernel config 'CONFIG_STUB_CLK_HI6220' is disabled by default, as result stub clock driver has not been registered and CPUFreq driver cannot work. This patch is to enable stub clock driver in config for ARCH_HISI. Reported-by: Dietmar Eggemann Signed-off-by: Leo Yan --- drivers/clk/hisilicon/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/clk/hisilicon/Kconfig b/drivers/clk/hisilicon/Kconfig index 3f537a0..9e0a95e 100644 --- a/drivers/clk/hisilicon/Kconfig +++ b/drivers/clk/hisilicon/Kconfig @@ -23,5 +23,6 @@ config RESET_HISI config STUB_CLK_HI6220 bool "Hi6220 Stub Clock Driver" depends on COMMON_CLK_HI6220 && MAILBOX + default ARCH_HISI help Build the Hisilicon Hi6220 stub clock driver. -- 1.9.1
Re: [PATCH] clk: Hi6220: enable stub clock driver for ARCH_HISI
Hi Amit, On Mon, Aug 08, 2016 at 11:23:31AM +0530, Amit Kucheria wrote: > On Mon, Aug 8, 2016 at 9:07 AM, Leo Yan wrote: > > In current kernel config 'CONFIG_STUB_CLK_HI6220' is disabled by > > default, as result stub clock driver has not been registered and > > CPUFreq driver cannot work. > > I have a related patch that has been pending for a while: > https://lkml.org/lkml/2016/6/20/375 but it was tied only to the > thermal driver. I also have concern this patch may duplicate with yours. > Is the stub mandatory for the architecture? Will other SoCs in the > family will use the same stub? I don't think stub driver is mandartory for archtitecture and it's only used by Hi6220 on Hikey. AFAIK, currently stub driver is only used by CPU frequency change. The logic is: Thermal cooling device driver `-> CPUFreq DT driver `-> Stub clock driver ARM is working on Hikey for EAS profiling, so usually the use case is to enable CPUFreq driver and stub clock driver. Sometimes it only need enable this driver without thermal driver; so this is why we cannot rely on thermal driver to enable stub clock driver. I'm open-minded if you have better idea to enable it. > > This patch is to enable stub clock driver in config for ARCH_HISI. > > > > Reported-by: Dietmar Eggemann > > Signed-off-by: Leo Yan > > --- > > drivers/clk/hisilicon/Kconfig | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/clk/hisilicon/Kconfig b/drivers/clk/hisilicon/Kconfig > > index 3f537a0..9e0a95e 100644 > > --- a/drivers/clk/hisilicon/Kconfig > > +++ b/drivers/clk/hisilicon/Kconfig > > @@ -23,5 +23,6 @@ config RESET_HISI > > config STUB_CLK_HI6220 > > bool "Hi6220 Stub Clock Driver" > > depends on COMMON_CLK_HI6220 && MAILBOX > > + default ARCH_HISI > > help > > Build the Hisilicon Hi6220 stub clock driver. > > -- > > 1.9.1 > >
Re: [PATCH] clk: Hi6220: enable stub clock driver for ARCH_HISI
On Mon, Aug 08, 2016 at 08:12:21PM +0530, Amit Kucheria wrote: [...] > >> > diff --git a/drivers/clk/hisilicon/Kconfig > >> > b/drivers/clk/hisilicon/Kconfig > >> > index 3f537a0..9e0a95e 100644 > >> > --- a/drivers/clk/hisilicon/Kconfig > >> > +++ b/drivers/clk/hisilicon/Kconfig > >> > @@ -23,5 +23,6 @@ config RESET_HISI > >> > config STUB_CLK_HI6220 > >> > bool "Hi6220 Stub Clock Driver" > >> > depends on COMMON_CLK_HI6220 && MAILBOX > >> > + default ARCH_HISI > > Instead of forcing this up on the entire arch, why not restrict it to > just the cpufreq driver? ARM_HISI_ACPU_CPUFREQ? ARM_HISI_ACPU_CPUFREQ has been removed by 3920be471ce7f "cpufreq: hisilicon: Use generic platdev driver". By default now platforms are using drivers/cpufreq/cpufreq-dt-platdev.c. > >> > help > >> > Build the Hisilicon Hi6220 stub clock driver. > >> > -- > >> > 1.9.1 > >> >
Re: [PATCH] clk: Hi6220: enable stub clock driver for ARCH_HISI
On Mon, Aug 08, 2016 at 09:36:32PM +0100, Daniel Thompson wrote: [...] > >My earlier patch focused on enabling the stub driver in the case the > >thermal driver was enabled (and subsequently turning on HISI_THERMAL > >in defconfig). The EAS profiling usecase to prevent thermal-throttling > >from kicking in is a bad default to have in the kernel, IMO - it can > >be easily achieved by just changing the thermal thresholds. > > > >Something like the following, with HISI_THERMAL added to defconfig > >would give a "stable" kernel on Hikey. > > > >diff --git i/drivers/thermal/Kconfig w/drivers/thermal/Kconfig > >index 2d702ca..77597a5 100644 > >--- i/drivers/thermal/Kconfig > >+++ w/drivers/thermal/Kconfig > >@@ -177,8 +177,11 @@ config THERMAL_EMULATION > > > > config HISI_THERMAL > > tristate "Hisilicon thermal driver" > >- depends on (ARCH_HISI && CPU_THERMAL && OF) || COMPILE_TEST > >+ depends on (ARCH_HISI && OF) || COMPILE_TEST > > depends on HAS_IOMEM > >+ select CPU_THERMAL > >+ select CPUFREQ_DT > >+ select STUB_CLK_HI6220 > > I'm actually a little uncomfortable having a thermal sensor dictate > what cooling devices are used to react to its temperature reading. > The link between sensors and cooling devices comes from DT. > > However I admit there are other platforms (IMX and DB8500) that > accept the same build time diktat from their thermal sensors. For thermal enabling on Hikey with CPU cooling device, how about below change? I checked arch/arm/configs/multi_v7_defconfig, both CONFIG_CPUFREQ_DT and CONFIG_CPU_THERMAL have been enabled in it. These two drivers are quite common and used by many ARM platforms. diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig index 0555b7c..f65336f 100644 --- a/arch/arm64/configs/defconfig +++ b/arch/arm64/configs/defconfig @@ -78,6 +78,7 @@ CONFIG_COMPAT=y CONFIG_CPU_IDLE=y CONFIG_ARM_CPUIDLE=y CONFIG_CPU_FREQ=y +CONFIG_CPUFREQ_DT=y CONFIG_ARM_BIG_LITTLE_CPUFREQ=y CONFIG_ARM_SCPI_CPUFREQ=y CONFIG_NET=y @@ -217,6 +218,7 @@ CONFIG_SENSORS_INA2XX=m CONFIG_SENSORS_ARM_SCPI=y CONFIG_THERMAL=y CONFIG_THERMAL_EMULATION=y +CONFIG_CPU_THERMAL=y CONFIG_EXYNOS_THERMAL=y CONFIG_WATCHDOG=y CONFIG_RENESAS_WDT=y diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig index 2d702ca..91ebab3 100644 --- a/drivers/thermal/Kconfig +++ b/drivers/thermal/Kconfig @@ -177,8 +177,10 @@ config THERMAL_EMULATION config HISI_THERMAL tristate "Hisilicon thermal driver" - depends on (ARCH_HISI && CPU_THERMAL && OF) || COMPILE_TEST + depends on ARCH_HISI || COMPILE_TEST depends on HAS_IOMEM + depends on OF + default y help Enable this to plug hisilicon's thermal sensor driver into the Linux thermal framework. cpufreq is used as the cooling device to throttle
Re: [PATCH 2/3] thermal: hisilicon: add thermal sensor driver for Hi3660
Hi Rui, Eduardo, On Tue, Jun 20, 2017 at 11:40:34AM +0800, Tao Wang wrote: [...] > diff --git a/drivers/thermal/hi3660_thermal.c > b/drivers/thermal/hi3660_thermal.c > new file mode 100644 > index 000..a538721 > --- /dev/null > +++ b/drivers/thermal/hi3660_thermal.c > @@ -0,0 +1,198 @@ > +/* > + * linux/drivers/thermal/hi3660_thermal.c > + * > + * Copyright (c) 2017 Hisilicon Limited. > + * Copyright (c) 2017 Linaro Limited. > + * > + * Author: Tao Wang > + * Author: Leo Yan > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; version 2 of the License. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program. If not, see <http://www.gnu.org/licenses/>. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "thermal_core.h" > + > +#define HW_MAX_SENSORS 4 > +#define HISI_MAX_SENSORS 6 > +#define SENSOR_MAX 4 > +#define SENSOR_AVG 5 > + > +#define ADC_MIN 116 > +#define ADC_MAX 922 > + > +/* hi3660 Thermal Sensor Dev Structure */ > +struct hi3660_thermal_sensor { > + struct hi3660_thermal_data *thermal; > + struct thermal_zone_device *tzd; > + > + uint32_t id; > +}; > + > +struct hi3660_thermal_data { > + struct platform_device *pdev; > + struct hi3660_thermal_sensor sensors[HISI_MAX_SENSORS]; > + void __iomem *thermal_base; > +}; > + > +unsigned int sensor_reg_offset[HW_MAX_SENSORS] = { 0x1c, 0x5c, 0x9c, 0xdc }; > + > + > +static int hi3660_thermal_get_temp(void *_sensor, int *temp) > +{ > + struct hi3660_thermal_sensor *sensor = _sensor; > + struct hi3660_thermal_data *data = sensor->thermal; > + unsigned int idx; > + int val, average = 0, max = 0; > + > + if (sensor->id < HW_MAX_SENSORS) { > + val = readl(data->thermal_base + sensor_reg_offset[sensor->id]); > + val = clamp_val(val, ADC_MIN, ADC_MAX); > + } else { > + for (idx = 0; idx < HW_MAX_SENSORS; idx++) { > + val = readl(data->thermal_base > + + sensor_reg_offset[idx]); > + val = clamp_val(val, ADC_MIN, ADC_MAX); > + average += val; > + if (val > max) > + max = val; > + } > + > + if (sensor->id == SENSOR_MAX) > + val = max; > + else if (sensor->id == SENSOR_AVG) > + val = average / HW_MAX_SENSORS; > + } I think here have one thing it's better to check with you ahead and want your suggestions. Tao adds two 'software' sensors, one is sensor 4 which is used to present the maximum temperature value crossing from sensor 0 ~ 3; and another sensor 4 is used to present the average temperature value. Does this make sense for you? > + *temp = ((val - ADC_MIN) * 165000) / (ADC_MAX - ADC_MIN) - 4; > + > + return 0; > +} > + > +static struct thermal_zone_of_device_ops hi3660_of_thermal_ops = { > + .get_temp = hi3660_thermal_get_temp, > +}; > + > +static int hi3660_thermal_register_sensor(struct platform_device *pdev, > + struct hi3660_thermal_data *data, > + struct hi3660_thermal_sensor *sensor, > + int index) > +{ > + int ret = 0; > + > + sensor->id = index; > + sensor->thermal = data; > + > + sensor->tzd = devm_thermal_zone_of_sensor_register(&pdev->dev, > + sensor->id, sensor, &hi3660_of_thermal_ops); > + if (IS_ERR(sensor->tzd)) { > + ret = PTR_ERR(sensor->tzd); > + sensor->tzd = NULL; > + } > + > + return ret; > +} > + > +static void hi3660_thermal_toggle_sensor(struct hi3660_thermal_sensor > *sensor, > +bool on) > +{ > + struct thermal_zone_device *tzd = sensor->tzd; > + > + tzd->ops->set_mode(tzd, > + on ? THERMAL_DEVICE_ENABLED : THE
Re: [PATCH] arm64: kdump: Avoid to power off nonpanic CPUs
Hi Mark, On Tue, Oct 10, 2017 at 01:51:33PM -0600, Mathieu Poirier wrote: > On 8 October 2017 at 09:35, Mark Rutland wrote: > > Hi Leo, > > > > On Sun, Oct 08, 2017 at 10:12:46PM +0800, Leo Yan wrote: > >> commit a88ce63b642c ("arm64: kexec: have own crash_smp_send_stop() for > >> crash dump for nonpanic cores") introduces ARM64 architecture function > >> crash_smp_send_stop() to replace the weak function, this results in > >> the nonpanic CPUs to be hot-plugged out and CPUs are placed into low > >> power state on ARM64 platforms with the flow: > >> > >> Panic CPU: > >> machine_crash_shutdown() > >> crash_smp_send_stop() > >> smp_cross_call(&mask, IPI_CPU_CRASH_STOP) > >> > >> Nonpanic CPUs: > >> handle_IPI() > >> ipi_cpu_crash_stop() > >> cpu_ops[cpu]->cpu_die() > >> > >> The upper patch has no issue if enabled crash dump only; but if enabled > >> crash dump and Coresight debug module for panic dumping at the meantime, > >> nonpanic CPUs are powered off in crash dump flow, > > > > We want to turn secondary CPUs off if at all possible, since we want to > > prevent > > issues resulting from asynchronous behaviour (e.g. TLB/cache fetches) that > > could result in subsequent problems (e.g. if bad page tables resulted in > > page > > table walks to MMIO devices). > > > > So we *really* want this behaviour in the general case. > > > >> later this may introduce conflicts with the Coresight debug module because > >> Coresight debug registers dumping requires the CPU must be powered on for > >> some platforms (e.g. Hi6220 on Hikey board). If we cannot keep the CPUs > >> powered on, we can see the hardware lockup issue when access Coresight > >> debug > >> registers. > > > > Just to check I understand, the coresight debug module is being invoked as a > > panic notifier in the current kernel, right? > > > >> To fix this issue, this commit removes CPU hotplug operation in func > >> crash_smp_send_stop() and let CPUs to run into WFE/WFI states so CPUs > >> can still be powered on after crash dump. This finally is more safe > >> for Coresight debug module to dump registers and avoid hardware lockup. > >> > >> Cc: James Morse > >> Cc: Will Deacon > >> Cc: Catalin Marinas > >> Cc: Mathieu Poirier > >> Signed-off-by: Leo Yan > >> --- > >> arch/arm64/kernel/smp.c | 6 -- > >> 1 file changed, 6 deletions(-) > >> > >> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > >> index 9f7195a..a65e68b 100644 > >> --- a/arch/arm64/kernel/smp.c > >> +++ b/arch/arm64/kernel/smp.c > >> @@ -856,12 +856,6 @@ static void ipi_cpu_crash_stop(unsigned int cpu, > >> struct pt_regs *regs) > >> > >> local_irq_disable(); > >> > >> -#ifdef CONFIG_HOTPLUG_CPU > >> - if (cpu_ops[cpu]->cpu_die) > >> - cpu_ops[cpu]->cpu_die(cpu); > >> -#endif > >> - > > > > If it's really necessary to keep secondary CPUs online, please limit that to > > the case where the coresight debug module is being used. > > > > IIRC there were similar interactions with cpuidle, and I don't see why > > hotplug > > should be any different. > > Can you point to where it was fixed for CPUidle? We should try to do > the same for coresight_debug so that things are done the same way. > I'm also thinking that we could call ->cpu_die(cpu) in a #ifdef > CONFIG_HOTPLUG_CPU clause in debug_notifier_call(). That way the > behaviour remains the same, just enacted a little later - please > advise on what option you prefer. IMHO 's more readable to place hotplug operations into the function ipi_cpu_crash_stop(), due this function is doing stuffs related with "cpu stop". But I think Mathieu's question is for you :) Could you give advice as well? Thanks, Leo Yan
Re: ARM64: Regression with commit e3067861ba66 ("arm64: add basic VMAP_STACK support")
On Tue, Oct 10, 2017 at 05:03:44PM +0100, Robin Murphy wrote: > On 10/10/17 16:45, Mark Rutland wrote: > > On Tue, Oct 10, 2017 at 10:27:25PM +0800, Leo Yan wrote: > >> Hi Mark, > > > > Hi Leo, > > > >> I work mainline kernel on Hikey620 board, I find it's easily to > >> introduce the panic and report the log as below. So I bisect the kernel > >> and finally narrow down the commit e3067861ba66 ("arm64: add basic > >> VMAP_STACK support") which introduce this issue. > >> > >> I tried to remove 'select HAVE_ARCH_VMAP_STACK' from > >> arch/arm64/Kconfig, then I can see the panic issue will dismiss. So > >> could you check this and have insight for this issue? > > > > Given the stuff in the backtrace, my suspicion is something is trying to > > perform DMA to/from the stack, getting junk addresses form the attempted > > virt<->phys conversions. > > > > Could you try enabling both VMAP_STACK and CONFIG_DEBUG_VIRTUAL? > > CONFIG_DMA_API_DEBUG should scream about drivers trying to use stack > addresses either way, too. Thanks for suggestions, Mark & Robin. I enabled these debugging configs but cannot get clue from it; but occasionally found this issue is quite likely related with CA53 errata, especialy ERRATA_A53_855873 is the relative one. So I changed to use ARM-TF mainline code with ERRATA fixing, this issue can be dismissed. Please ignore this regression reporting. Thanks, Leo Yan
Re: ARM64: Regression with commit e3067861ba66 ("arm64: add basic VMAP_STACK support")
On Mon, Oct 16, 2017 at 03:35:46PM +0100, Robin Murphy wrote: > On 16/10/17 15:26, Mark Rutland wrote: > > On Mon, Oct 16, 2017 at 03:12:45PM +0100, Robin Murphy wrote: > >> On 16/10/17 14:48, Mark Rutland wrote: > >>> Hi Leo, > >>> > >>> On Mon, Oct 16, 2017 at 09:17:23AM +0800, Leo Yan wrote: > >>>> On Tue, Oct 10, 2017 at 05:03:44PM +0100, Robin Murphy wrote: > >>>>> On 10/10/17 16:45, Mark Rutland wrote: > >>>>>> On Tue, Oct 10, 2017 at 10:27:25PM +0800, Leo Yan wrote: > >>>>>>> I work mainline kernel on Hikey620 board, I find it's easily to > >>>>>>> introduce the panic and report the log as below. So I bisect the > >>>>>>> kernel > >>>>>>> and finally narrow down the commit e3067861ba66 ("arm64: add basic > >>>>>>> VMAP_STACK support") which introduce this issue. > >>>>>>> > >>>>>>> I tried to remove 'select HAVE_ARCH_VMAP_STACK' from > >>>>>>> arch/arm64/Kconfig, then I can see the panic issue will dismiss. So > >>>>>>> could you check this and have insight for this issue? > >>>>>> > >>>>>> Given the stuff in the backtrace, my suspicion is something is trying > >>>>>> to > >>>>>> perform DMA to/from the stack, getting junk addresses form the > >>>>>> attempted > >>>>>> virt<->phys conversions. > >>>>>> > >>>>>> Could you try enabling both VMAP_STACK and CONFIG_DEBUG_VIRTUAL? > >>>>> > >>>>> CONFIG_DMA_API_DEBUG should scream about drivers trying to use stack > >>>>> addresses either way, too. > >>>> > >>>> Thanks for suggestions, Mark & Robin. > >>>> > >>>> I enabled these debugging configs but cannot get clue from it; but > >>>> occasionally found this issue is quite likely related with CA53 errata, > >>>> especialy ERRATA_A53_855873 is the relative one. So I changed to use > >>>> ARM-TF mainline code with ERRATA fixing, this issue can be dismissed. > >>> > >>> Thanks for the update. > >>> > >>> Just to confirm, with the updated firmware you no longer see the issue? > >>> > >>> I can't immediately see how that would be related. > >> > >> Cores up to r0p2 have the other errata to which > >> ARM64_WORKAROUND_CLEAN_CACHE also applies anyway; r3p0+ have an ACTLR > >> bit to do thee CVAC->CIVAC upgrade in hardware, and our policy is that > >> we expect firmware to enable such hardware workarounds where possible. I > >> assume that's why we don't explicitly document 855873 anywhere in Linux. > > > > Sure, I also looked it up. ;) > > > > I meant that I couldn't immediately see why VMAP'd stacks were likely to > > tickle issues with that more reliably. > > Ah, right - in context, "that" appeared to refer to "updated firmware", > not "VMAP_STACK". Sorry. > > I guess the vmap addresses might tickle the "same L2 set" condition > differently to when both stack and DMA buffer are linear map addresses. A bit more info for this. I can reproduce this memory abort panic, and the panic places are not consistent; usually it's related with kmalloc address. Do you think "VMAP_STACK" introduces much more operations for cache clean? If so if might be in the same *set* with any other memory access (like kmalloc operations), then trigger data abort. Hikey has CA53 CPUs is r3 version so it's luck can directly apply the ERRATA 855873 in ARM-TF. BTW, in case I may mislead you guys, we should note there have another two ERRATAs applied in ARM-TFv1.4 for Hikey: ERRATA_A53_836870 := 1 ERRATA_A53_843419 := 1 Thanks, Leo Yan
Re: ARM64: Regression with commit e3067861ba66 ("arm64: add basic VMAP_STACK support")
On Mon, Oct 16, 2017 at 02:48:19PM +0100, Mark Rutland wrote: > Hi Leo, > > On Mon, Oct 16, 2017 at 09:17:23AM +0800, Leo Yan wrote: > > On Tue, Oct 10, 2017 at 05:03:44PM +0100, Robin Murphy wrote: > > > On 10/10/17 16:45, Mark Rutland wrote: > > > > On Tue, Oct 10, 2017 at 10:27:25PM +0800, Leo Yan wrote: > > > >> I work mainline kernel on Hikey620 board, I find it's easily to > > > >> introduce the panic and report the log as below. So I bisect the kernel > > > >> and finally narrow down the commit e3067861ba66 ("arm64: add basic > > > >> VMAP_STACK support") which introduce this issue. > > > >> > > > >> I tried to remove 'select HAVE_ARCH_VMAP_STACK' from > > > >> arch/arm64/Kconfig, then I can see the panic issue will dismiss. So > > > >> could you check this and have insight for this issue? > > > > > > > > Given the stuff in the backtrace, my suspicion is something is trying to > > > > perform DMA to/from the stack, getting junk addresses form the attempted > > > > virt<->phys conversions. > > > > > > > > Could you try enabling both VMAP_STACK and CONFIG_DEBUG_VIRTUAL? > > > > > > CONFIG_DMA_API_DEBUG should scream about drivers trying to use stack > > > addresses either way, too. > > > > Thanks for suggestions, Mark & Robin. > > > > I enabled these debugging configs but cannot get clue from it; but > > occasionally found this issue is quite likely related with CA53 errata, > > especialy ERRATA_A53_855873 is the relative one. So I changed to use > > ARM-TF mainline code with ERRATA fixing, this issue can be dismissed. > > Thanks for the update. > > Just to confirm, with the updated firmware you no longer see the issue? Yes. > I can't immediately see how that would be related. > > Thanks, > Mark.
Re: ARM64: Regression with commit e3067861ba66 ("arm64: add basic VMAP_STACK support")
On Tue, Oct 17, 2017 at 10:32:21AM +0100, Ard Biesheuvel wrote: [...] > > AFAICT, erratum 836870 results in livelock rather than memory > > corruption, so I think we can ignore that. > > > > I'm a little worried by erratum 843419. The VMAP_STACK patches changed > > {adr,ldr}_this_cpu (and some users thereof), and it's possible we're > > managing to tickle that issue. > > > > If you still have an affected kernel, could you dump the output of: > > > > $ aarch64-linux-gnu-objdump -d vmlinux | grep -A 3 > > 'ff[8c]:\s\+[a-f0-9]\+\s\+adrp' > > > > ... that would show us if there are any affected sequences. > > > > From a quick scan of my own vmlinux build from commit e3067861ba66, I > > didn't see any, but it's possible this depends on the config used. > > > > The linker should take care of that: it scans the entire executable, > and inserts a veneer if an adrp happens to end up at a vulnerable > offset in the page. Is this dependent on any GCC version? I am using GCC 6.2.1, so I get many affected sequences with Mark's command: leoy@leoy-linaro:~/Work/reference/opensource/linux$ aarch64-linux-gnu-objdump -d vmlinux | grep -A 3 'ff[8c]:\s\+[a-f0-9]\+\s\+adrp' 080a1ffc: 90007340adrpx0, 08f09000 080a2000: a900fedfstp xzr, xzr, [x22,#8] 080a2004: 91374000add x0, x0, #0xdd0 080a2008: f9000ec0str x0, [x22,#24] -- 080b6ff8: b0007ce0adrpx0, 09053000 080b6ffc: 52901801mov w1, #0x80c0 // #32960 080b7000: 72a02801movkw1, #0x140, lsl #16 080b7004: 9102e273add x19, x19, #0xb8 -- 080f1ff8: d00070a1adrpx1, 08f07000 080f1ffc: f9406402ldr x2, [x0,#200] 080f2000: f9405c03ldr x3, [x0,#184] 080f2004: 14002915b 080fc458 -- 080feff8: 9002adrpx2, 080fe000 080feffc: 9136e042add x2, x2, #0xdb8 080ff000: f9000462str x2, [x3,#8] 080ff004: f9448e62ldr x2, [x19,#2328] -- 0810affc: f0006fe1adrpx1, 08f09000 0810b000: f94017a2ldr x2, [x29,#40] 0810b004: aa1303e0mov x0, x19 0810b008: 1400013db 0810b4fc -- 0811dff8: d0005c80adrpx0, 08caf000 0811dffc: 912be021add x1, x1, #0xaf8 0811e000: 912ae000add x0, x0, #0xab8 0811e004: 97ffe6abbl 08117ab0 -- 08137ff8: f0004443adrpx3, 089c2000 08137ffc: 910ac042add x2, x2, #0x2b0 08138000: 9113c063add x3, x3, #0x4f0 08138004: 71000c9fcmp w4, #0x3 -- 0815eff8: b0005aa1adrpx1, 08cb3000 0815effc: 91362021add x1, x1, #0xd88 0815f000: 97c8bl 0815ef20 0815f004: b94023a1ldr w1, [x29,#32]
Re: [PATCH 02/25] thermal/drivers/hisi: Remove the multiple sensors support
On Tue, Oct 17, 2017 at 02:07:08PM -0700, Eduardo Valentin wrote: > On Tue, Oct 17, 2017 at 09:03:40PM +0200, Daniel Lezcano wrote: > > On 17/10/2017 20:25, Eduardo Valentin wrote: > > > Hello, > > > > > > On Tue, Oct 17, 2017 at 02:28:27PM +0200, Daniel Lezcano wrote: > > >> On 17/10/2017 05:54, Eduardo Valentin wrote: > > >>> On Tue, Oct 10, 2017 at 08:02:27PM +0200, Daniel Lezcano wrote: > > >>>> By essence, the tsensor does not really support multiple sensor at the > > >>>> same > > >>>> time. It allows to set a sensor and use it to get the temperature, > > >>>> another > > >>>> sensor could be switched but with a delay of 3-5ms. It is difficult to > > >>>> read > > >>>> simultaneously several sensors without a big delay. > > >>>> > > >>> > > >>> Is 3-5 ms enough to loose an event? Is this really a problem? > > >> > > >> There are several aspects: > > >> > > >> - the multiple sensors is not needed here > > > > > > Well, that is debatable, I cannot really agree or disagree with the > > > above statement without understanding the use cases and most important, > > > the location of each sensor. What is the location of each sensor? > > > > > >> > > >> - the temperature controller is not designed to read several sensors at > > >> the same time, we switch the sensor and that clears some internal > > >> buffers and re-init the controller > > > > > > Which is still very helpful in case you have multiple hotspots that you > > > want to track and they are exposed on different workloads. Sacrificing > > > the availability of sensors is something needs a better justification > > > other than "current code uses only one". > > > > > > > > >> > > >> - some boards can take 40°C in 1 sec, the temperature increase is > > >> insanely fast and reading several sensors add an extra 15ms. > > >> > > > > > > > > > Ok... What is the difference in update rate with and without the switch > > > of sensors? With the above worst case, you have about 4/6 mC/ms. Can > > > your tsensor support that resolution for a single sensor? What is the > > > maximum resolution a tsensor can support? What is the penalty added with > > > switch? > > > > > > Based on this data, and the above 3-5ms, that means you would miss about > > > ~ 3 - 4 mC while switching ( assuming tsensor can really achieve the > > > above rate of change: 5ms * 4/6 mC /ms). Are you sure that is > > > enough justification to drop three extra sensors? > > > > Ok if I refer to the documentation the rate is 0.768 ms with the current > > configuration. > > > > The driver is currently bogus: register overwritten, bouncing interrupt, > > unneeded lock, ... So the proposition was to remove the multiple sensors > > support, clean the driver, and re-introduce it if there is a need. > > > > If I remember correctly Leo, author of the driver, agreed on this. Leo ? > > > > Note, I'm not strongly against multiple sensors support in the driver if > > you think it is convenient but it is much simpler to remove the current > > code as it is not used and put it back on top of a sane foundation > > instead of circumventing that on the existing code. > > > > > > I am also fine with the above strategy, as long as you are sure you are > not breaking anyone (specially userspace). Also, it would be good to get > a reviewed-by from hisilicon just to confirm (Leo?). Sorry I missed to reply this patch. And yes, I have tested and reviewed it at my side: Reviewed-by: Leo Yan P.s. I am working for Linaro; I am continously co-working with Hisilicon to maintain this driver due it's important for Hikey/Hikey960 two boards stability; this driver also is important for our daily profiling for power and performance. Eduardo, so please let us know if you still need ack from Hisilicon engineer. > Besides, once you get his reviewed-by, and add it to the patches, > can you please resend the series with the minor issues I > mentioned (a few minor checkpatch issues and one compilation warn that > is added to the driver after the series is applied). > > > > > > > > > -- > > <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs > > > > Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | > > <http://twitter.com/#!/linaroorg> Twitter | > > <http://www.linaro.org/linaro-blog/> Blog > >
Re: [PATCH v13 0/9] coresight: enable debug module
On Mon, May 29, 2017 at 10:41:17AM -0600, Mathieu Poirier wrote: > On 25 May 2017 at 09:57, Leo Yan wrote: > > ARMv8 architecture reference manual (ARM DDI 0487A.k) Chapter H7 "The > > Sample-based Profiling Extension" has description for sampling > > registers, we can utilize these registers to check program counter > > value with combined CPU exception level, secure state, etc. So this is > > helpful for CPU lockup bugs, e.g. if one CPU has run into infinite loop > > with IRQ disabled; the 'hang' CPU cannot switch context and handle any > > interrupt, so it cannot handle SMP call for stack dump, etc. > > > > This patch series is to enable coresight debug module with sample-based > > registers and register call back notifier for PCSR register dumping > > when panic happens, so we can see below dumping info for panic; and > > this patch series has considered the conditions for access permission > > for debug registers self, so this can avoid access debug registers when > > CPU power domain is off; the driver also try to figure out the CPU is > > in secure or non-secure state. > > I have queued patches 1 to 7 to my tree. I can't do anything about > patches 8 and 9 because they haven't been ack'ed. From here you can > either chase them to get an ACK or send a separate patch to them > directly. Thanks a lot, Mathieu. I will ping Wei and Andy/David saperately. > Thanks, > Mathieu
Re: [PATCH v13 8/9] arm64: dts: hi6220: register debug module
Hi Wei, On Thu, May 25, 2017 at 11:57:15PM +0800, Leo Yan wrote: > Bind debug module driver for Hi6220. Could you ACK this patch? From Mathieu's previous suggestion, after your confirmation he could pick up this patch. If you want me to send a separate patch to you directly, also is okay. Please let me know which is preferring. Thanks, Leo Yan > Reviewed-by: Mathieu Poirier > Signed-off-by: Leo Yan > --- > arch/arm64/boot/dts/hisilicon/hi6220.dtsi | 64 > +++ > 1 file changed, 64 insertions(+) > > diff --git a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi > b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi > index 1e5129b..21805b9 100644 > --- a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi > +++ b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi > @@ -916,5 +916,69 @@ > }; > }; > }; > + > + debug@f659 { > + compatible = "arm,coresight-cpu-debug","arm,primecell"; > + reg = <0 0xf659 0 0x1000>; > + clocks = <&sys_ctrl HI6220_DAPB_CLK>; > + clock-names = "apb_pclk"; > + cpu = <&cpu0>; > + }; > + > + debug@f6592000 { > + compatible = "arm,coresight-cpu-debug","arm,primecell"; > + reg = <0 0xf6592000 0 0x1000>; > + clocks = <&sys_ctrl HI6220_DAPB_CLK>; > + clock-names = "apb_pclk"; > + cpu = <&cpu1>; > + }; > + > + debug@f6594000 { > + compatible = "arm,coresight-cpu-debug","arm,primecell"; > + reg = <0 0xf6594000 0 0x1000>; > + clocks = <&sys_ctrl HI6220_DAPB_CLK>; > + clock-names = "apb_pclk"; > + cpu = <&cpu2>; > + }; > + > + debug@f6596000 { > + compatible = "arm,coresight-cpu-debug","arm,primecell"; > + reg = <0 0xf6596000 0 0x1000>; > + clocks = <&sys_ctrl HI6220_DAPB_CLK>; > + clock-names = "apb_pclk"; > + cpu = <&cpu3>; > + }; > + > + debug@f65d { > + compatible = "arm,coresight-cpu-debug","arm,primecell"; > + reg = <0 0xf65d 0 0x1000>; > + clocks = <&sys_ctrl HI6220_DAPB_CLK>; > + clock-names = "apb_pclk"; > + cpu = <&cpu4>; > + }; > + > + debug@f65d2000 { > + compatible = "arm,coresight-cpu-debug","arm,primecell"; > + reg = <0 0xf65d2000 0 0x1000>; > + clocks = <&sys_ctrl HI6220_DAPB_CLK>; > + clock-names = "apb_pclk"; > + cpu = <&cpu5>; > + }; > + > + debug@f65d4000 { > + compatible = "arm,coresight-cpu-debug","arm,primecell"; > + reg = <0 0xf65d4000 0 0x1000>; > + clocks = <&sys_ctrl HI6220_DAPB_CLK>; > + clock-names = "apb_pclk"; > + cpu = <&cpu6>; > + }; > + > + debug@f65d6000 { > + compatible = "arm,coresight-cpu-debug","arm,primecell"; > + reg = <0 0xf65d6000 0 0x1000>; > + clocks = <&sys_ctrl HI6220_DAPB_CLK>; > + clock-names = "apb_pclk"; > + cpu = <&cpu7>; > + }; > }; > }; > -- > 2.7.4 >
Re: [PATCH v13 9/9] arm64: dts: qcom: msm8916: Add debug unit
Hi Andy, David, [ + Nico ] On Fri, May 26, 2017 at 12:04:13AM +0800, Leo Yan wrote: > Add debug unit on Qualcomm msm8916 based platforms, including the > DragonBoard 410c board. Could you take a look for this patch? After get your ACK I think Mathieu could help pick up this patch through coresight repository. If you want me to send a separate patch to you directly, also is okay. Please let me know which is your preferring. Thanks, Leo Yan > Reviewed-by: Mathieu Poirier > Signed-off-by: Leo Yan > --- > arch/arm64/boot/dts/qcom/msm8916.dtsi | 32 > 1 file changed, 32 insertions(+) > > diff --git a/arch/arm64/boot/dts/qcom/msm8916.dtsi > b/arch/arm64/boot/dts/qcom/msm8916.dtsi > index ab30939..17691ab 100644 > --- a/arch/arm64/boot/dts/qcom/msm8916.dtsi > +++ b/arch/arm64/boot/dts/qcom/msm8916.dtsi > @@ -1116,6 +1116,38 @@ > }; > }; > > + debug@85 { > + compatible = "arm,coresight-cpu-debug","arm,primecell"; > + reg = <0x85 0x1000>; > + clocks = <&rpmcc RPM_QDSS_CLK>; > + clock-names = "apb_pclk"; > + cpu = <&CPU0>; > + }; > + > + debug@852000 { > + compatible = "arm,coresight-cpu-debug","arm,primecell"; > + reg = <0x852000 0x1000>; > + clocks = <&rpmcc RPM_QDSS_CLK>; > + clock-names = "apb_pclk"; > + cpu = <&CPU1>; > + }; > + > + debug@854000 { > + compatible = "arm,coresight-cpu-debug","arm,primecell"; > + reg = <0x854000 0x1000>; > + clocks = <&rpmcc RPM_QDSS_CLK>; > + clock-names = "apb_pclk"; > + cpu = <&CPU2>; > + }; > + > + debug@856000 { > + compatible = "arm,coresight-cpu-debug","arm,primecell"; > + reg = <0x856000 0x1000>; > + clocks = <&rpmcc RPM_QDSS_CLK>; > + clock-names = "apb_pclk"; > + cpu = <&CPU3>; > + }; > + > etm@85c000 { > compatible = "arm,coresight-etm4x", "arm,primecell"; > reg = <0x85c000 0x1000>; > -- > 2.7.4 >
Re: [PATCH 0/2] Add support for Hi6220 coresight
On Wed, Nov 01, 2017 at 04:48:50PM +0100, Michael Turquette wrote: > Quoting Wei Xu (2017-10-13 10:57:02) > > Hi Leo, > > > > On 2017/10/7 13:18, Leo Yan wrote: > > > Hi Stephen, Wei, > > > > > > On Thu, Aug 31, 2017 at 06:33:01PM -0700, Stephen Boyd wrote: > > >> On 09/01, Leo Yan wrote: > > >>> This patch series adds support for coresight on Hi6220; the first patch > > >>> is to fix coresight PLL so can avoid system hang after we enable > > >>> coresight, the second patch is to add DT binding according to coresight > > >>> topology. > > >>> > > >>> The patch has been tested on Hikey; By using OpenCSD snapshot mode, it > > >>> can successfully decode ETF and ETB trace data. > > >>> > > >> > > >> I can take the first one and second one goes through arm-soc? > > > > > > Could you pick these two patches for Hi6220 coresight enabling for > > > this merge window? Or need me resend these two patches? > > > > Applied patch 2 into hisilicon dt tree with slight fix. > > Thanks! > > Cool. Patch #1 applied to clk tree. Thanks, Mike and Wei. > Regards, > Mike > > > > > BR, > > Wei > > > > > > > > Thanks, > > > Leo Yan
Re: [PATCH 1/3] driver: mailbox: add support for Hi3660
Hi Jassi, On Wed, Oct 25, 2017 at 09:47:34AM +0530, Jassi Brar wrote: > On Mon, Aug 7, 2017 at 2:47 PM, Zhong Kaihua wrote: > > From: Kaihua Zhong > > > > Add mailbox driver for Hi3660. > > > > Signed-off-by: Leo Yan > > Signed-off-by: Ruyi Wang > > Tested-by: Kaihua Zhong > > > > --- > > drivers/mailbox/Kconfig | 6 + > > drivers/mailbox/Makefile | 2 + > > drivers/mailbox/hi3660-mailbox.c | 688 > > +++ > > 3 files changed, 696 insertions(+) > > create mode 100644 drivers/mailbox/hi3660-mailbox.c > > > > diff --git a/drivers/mailbox/Kconfig b/drivers/mailbox/Kconfig > > index ee1a3d9..778ba85 100644 > > --- a/drivers/mailbox/Kconfig > > +++ b/drivers/mailbox/Kconfig > > @@ -116,6 +116,12 @@ config HI6220_MBOX > > between application processors and MCU. Say Y here if you want to > > build Hi6220 mailbox controller driver. > > > > +config HI3660_MBOX > > + tristate "Hi3660 Mailbox" > > + depends on ARCH_HISI > > + help > > + Mailbox implementation for Hi3660. > > + > > config MAILBOX_TEST > > tristate "Mailbox Test Client" > > depends on OF > > diff --git a/drivers/mailbox/Makefile b/drivers/mailbox/Makefile > > index e2bcb03..f1c2fc4 100644 > > --- a/drivers/mailbox/Makefile > > +++ b/drivers/mailbox/Makefile > > @@ -28,6 +28,8 @@ obj-$(CONFIG_XGENE_SLIMPRO_MBOX) += > > mailbox-xgene-slimpro.o > > > > obj-$(CONFIG_HI6220_MBOX) += hi6220-mailbox.o > > > > +obj-$(CONFIG_HI3660_MBOX) += hi3660-mailbox.o > > + > > obj-$(CONFIG_BCM_PDC_MBOX) += bcm-pdc-mailbox.o > > > > obj-$(CONFIG_BCM_FLEXRM_MBOX) += bcm-flexrm-mailbox.o > > diff --git a/drivers/mailbox/hi3660-mailbox.c > > b/drivers/mailbox/hi3660-mailbox.c > > new file mode 100644 > > index 000..14f469d > > --- /dev/null > > +++ b/drivers/mailbox/hi3660-mailbox.c > > @@ -0,0 +1,688 @@ > > +/* > > + * Hisilicon's Hi3660 mailbox driver > > + * > > + * Copyright (c) 2017 Hisilicon Limited. > > + * Copyright (c) 2017 Linaro Limited. > > + * > > + * Author: Leo Yan > > + * > > + * This program is free software: you can redistribute it and/or modify > > + * it under the terms of the GNU General Public License as published by > > + * the Free Software Foundation, version 2 of the License. > > + * > > + * This program is distributed in the hope that it will be useful, > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > + * GNU General Public License for more details. > > + * > > + */ > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > > no client.h please Will remove. > > +#include > > +#include > > +#include > > +#include > > + > > +#include "mailbox.h" > > + > > +#define MBOX_CHAN_MAX 32 > > + > > +#define MBOX_TX0x1 > > + > > +/* Mailbox message length: 2 words */ > > +#define MBOX_MSG_LEN 2 > > + > > +#define MBOX_OFF(m)(0x40 * (m)) > > +#define MBOX_SRC_REG(m)MBOX_OFF(m) > > +#define MBOX_DST_REG(m)(MBOX_OFF(m) + 0x04) > > +#define MBOX_DCLR_REG(m) (MBOX_OFF(m) + 0x08) > > +#define MBOX_DSTAT_REG(m) (MBOX_OFF(m) + 0x0C) > > +#define MBOX_MODE_REG(m) (MBOX_OFF(m) + 0x10) > > +#define MBOX_IMASK_REG(m) (MBOX_OFF(m) + 0x14) > > +#define MBOX_ICLR_REG(m) (MBOX_OFF(m) + 0x18) > > +#define MBOX_SEND_REG(m) (MBOX_OFF(m) + 0x1C) > > +#define MBOX_DATA_REG(m, i)(MBOX_OFF(m) + 0x20 + ((i) << 2)) > > + > > +#define MBOX_CPU_IMASK(cpu)(((cpu) << 3) + 0x800) > > +#define MBOX_CPU_IRST(cpu) (((cpu) << 3) + 0x804) > > +#define MBOX_IPC_LOCK (0xA00) > > + > > +#define MBOX_IPC_UNLOCKED 0x > > +#define AUTOMATIC_ACK_CONFIG (1 << 0) > > +#define NO_FUNC_CONFIG (0 << 0) > > + > > +#define MBOX_MANUAL_ACK0 > > +#define MBOX_AUTO_ACK 1 > >
Re: [PATCH 1/3] driver: mailbox: add support for Hi3660
Hi Jassi, On Sat, Sep 02, 2017 at 01:07:50PM +0530, Jassi Brar wrote: [...] > > +#define MBOX_CHAN_MAX 32 > > + > > +#define MBOX_TX0x1 > > + > > +/* Mailbox message length: 2 words */ > > +#define MBOX_MSG_LEN 2 > > + > > +#define MBOX_OFF(m)(0x40 * (m)) > > +#define MBOX_SRC_REG(m)MBOX_OFF(m) > > +#define MBOX_DST_REG(m)(MBOX_OFF(m) + 0x04) > > +#define MBOX_DCLR_REG(m) (MBOX_OFF(m) + 0x08) > > +#define MBOX_DSTAT_REG(m) (MBOX_OFF(m) + 0x0C) > > +#define MBOX_MODE_REG(m) (MBOX_OFF(m) + 0x10) > > +#define MBOX_IMASK_REG(m) (MBOX_OFF(m) + 0x14) > > +#define MBOX_ICLR_REG(m) (MBOX_OFF(m) + 0x18) > > +#define MBOX_SEND_REG(m) (MBOX_OFF(m) + 0x1C) > > +#define MBOX_DATA_REG(m, i)(MBOX_OFF(m) + 0x20 + ((i) << 2)) > > + > > +#define MBOX_CPU_IMASK(cpu)(((cpu) << 3) + 0x800) > > +#define MBOX_CPU_IRST(cpu) (((cpu) << 3) + 0x804) > > +#define MBOX_IPC_LOCK (0xA00) > > + > > How is this controller different than the PL320? Sorry for the late replying, I am starting to work on this patch set. I compared Hi3660 mailbox hardware design with PL320 IPC driver drivers/mailbox/pl320-ipc.c, below are summary for the difference between them: - The register offset is similiar, PL320 IPC driver has one more register IPCMxMSTATUS but Hi3660 doesn't have it; - Hi3660 introduces the ipc lock with offset 0xA00 but PL320 doesn't have it; - They have some conflicts for mailbox channel usage: Hi3660 introduces two extra operations: Unlocking (so have permission to access the channel) Occupying (so can exclusivly access the channel) Hi3660 introduces "MBOX_AUTO_ACK" mode, with this mode the kernel doesn't wait for acknowledge from remote and directly return back; as the first version for this patch I am planning to support this mode for CPU clock. - PL320 registers have different semantics and usage with Hi3660's: PL320 register IPCMxMODE (offset 0x10) and Hi3660 register MBOX_MODE_REG (offset 0x10) have different semantics; PL320 doesn't use IPCMxMODE, but Hi3660 uses MBOX_MODE_REG to set the channel working mode (e.g. set "MBOX_AUTO_ACK" mode). PL320 register IPCMxMSET (offset 0x14) and Hi3660 register MBOX_IMASK_REG (offset 0x14) have different semantics; PL320 IPCMxMSET is used to set the channel's source and destination, Hi3660 MBOX_IMASK_REG is used to mask interrupts for source and destination and only clear bit for which master need receive interrupt. Though Hi3660 register definition is close to PL320 driver, but it's seems hard to combine Hi3660 mailbox driver into PL320 driver according to the register usage and flow. So I prefer to write one dedicated Hi3660 mailbox driver. Jassi, how about you think for this? Thanks, Leo Yan
Re: ARM64: Regression with commit e3067861ba66 ("arm64: add basic VMAP_STACK support")
On Tue, Oct 17, 2017 at 10:56:43AM +0100, Mark Rutland wrote: > On Tue, Oct 17, 2017 at 05:36:58PM +0800, Leo Yan wrote: > > On Tue, Oct 17, 2017 at 10:32:21AM +0100, Ard Biesheuvel wrote: > > > > [...] > > > > > > AFAICT, erratum 836870 results in livelock rather than memory > > > > corruption, so I think we can ignore that. > > > > > > > > I'm a little worried by erratum 843419. The VMAP_STACK patches changed > > > > {adr,ldr}_this_cpu (and some users thereof), and it's possible we're > > > > managing to tickle that issue. > > > > > > > > If you still have an affected kernel, could you dump the output of: > > > > > > > > $ aarch64-linux-gnu-objdump -d vmlinux | grep -A 3 > > > > 'ff[8c]:\s\+[a-f0-9]\+\s\+adrp' > > > > > > > > ... that would show us if there are any affected sequences. > > > > > > > > From a quick scan of my own vmlinux build from commit e3067861ba66, I > > > > didn't see any, but it's possible this depends on the config used. > > > > > > > > > > The linker should take care of that: it scans the entire executable, > > > and inserts a veneer if an adrp happens to end up at a vulnerable > > > offset in the page. > > > > Is this dependent on any GCC version? > > It is, but we should warn if CONFIG_ARM64_ERRATUM_843419 is selected and > the linked doesn't support the --fix-cortex-a53-843419 option: > > ld does not support --fix-cortex-a53-843419; kernel may be susceptible to > erratum) > > ... do you see this when building the kernel? No, I don't see this building warning. Thanks you and Ard for confirmation. > > I am using GCC 6.2.1, so I get many affected sequences with Mark's command: > > I beleive these are all beningn. AFAICT, none of these meet the conditions for > sequence 1 or sequence 2 affected by the erratum. e.g. many don't have > loads/stores using the adrp result. > > Thanks, > Mark. > > > leoy@leoy-linaro:~/Work/reference/opensource/linux$ > > aarch64-linux-gnu-objdump -d vmlinux | grep -A 3 > > 'ff[8c]:\s\+[a-f0-9]\+\s\+adrp' > > 080a1ffc: 90007340adrpx0, 08f09000 > > > > 080a2000: a900fedfstp xzr, xzr, [x22,#8] > > 080a2004: 91374000add x0, x0, #0xdd0 > > 080a2008: f9000ec0str x0, [x22,#24] > > -- > > 080b6ff8: b0007ce0adrpx0, 09053000 > > > > 080b6ffc: 52901801mov w1, #0x80c0 > > // #32960 > > 080b7000: 72a02801movkw1, #0x140, lsl #16 > > 080b7004: 9102e273add x19, x19, #0xb8 > > -- > > 080f1ff8: d00070a1adrpx1, 08f07000 > > > > 080f1ffc: f9406402ldr x2, [x0,#200] > > 080f2000: f9405c03ldr x3, [x0,#184] > > 080f2004: 14002915b 080fc458 > > > > -- > > 080feff8: 9002adrpx2, 080fe000 > > > > 080feffc: 9136e042add x2, x2, #0xdb8 > > 080ff000: f9000462str x2, [x3,#8] > > 080ff004: f9448e62ldr x2, [x19,#2328] > > -- > > 0810affc: f0006fe1adrpx1, 08f09000 > > > > 0810b000: f94017a2ldr x2, [x29,#40] > > 0810b004: aa1303e0mov x0, x19 > > 0810b008: 1400013db 0810b4fc > > > > -- > > 0811dff8: d0005c80adrpx0, 08caf000 > > > > 0811dffc: 912be021add x1, x1, #0xaf8 > > 0811e000: 912ae000add x0, x0, #0xab8 > > 0811e004: 97ffe6abbl 08117ab0 > > -- > > 08137ff8: f0004443adrpx3, 089c2000 > > > > 08137ffc: 910ac042add x2, x2, #0x2b0 > > 08138000: 9113c063add x3, x3, #0x4f0 > > 08138004: 71000c9fcmp w4, #0x3 > > -- > > 0815eff8: b0005aa1adrpx1, 08cb3000 > > > > 0815effc: 91362021add x1, x1, #0xd88 > > 0815f000: 97c8bl 0815ef20 > > > > 0815f004: b94023a1ldr w1, [x29,#32]
Re: [PATCH v2 1/3] dt-bindings: mailbox: Introduce Hi3660 controller binding
On Fri, Oct 27, 2017 at 09:38:44AM -0500, Rob Herring wrote: > On Fri, Oct 27, 2017 at 02:15:02PM +0800, Kaihua Zhong wrote: > > From: Leo Yan > > > > Introduce a binding for the Hi3660 mailbox controller, the mailbox is > > used within application processor (AP), communication processor (CP), > > HIFI and MCU, etc. > > > > Cc: John Stultz > > Cc: Guodong Xu > > Cc: Haojian Zhuang > > Cc: Niranjan Yadla > > Cc: Raj Pawate > > Signed-off-by: Leo Yan > > --- > > .../bindings/mailbox/hisilicon,hi3660-mailbox.txt | 52 > > ++ > > 1 file changed, 52 insertions(+) > > create mode 100644 > > Documentation/devicetree/bindings/mailbox/hisilicon,hi3660-mailbox.txt > > > > diff --git > > a/Documentation/devicetree/bindings/mailbox/hisilicon,hi3660-mailbox.txt > > b/Documentation/devicetree/bindings/mailbox/hisilicon,hi3660-mailbox.txt > > new file mode 100644 > > index 000..8a8d7e1 > > --- /dev/null > > +++ b/Documentation/devicetree/bindings/mailbox/hisilicon,hi3660-mailbox.txt > > @@ -0,0 +1,52 @@ > > +Hisilicon Hi3660 Mailbox Driver > > Bindings are for h/w, not drivers. Thanks for reviewing, Rob. Will fix for this and below comments. > > + > > +Hisilicon Hi3660 mailbox controller supports up to 32 channels. Messages > > +are passed between processors, including application & communication > > +processors, MCU, HIFI, etc. Each channel is unidirectional and accessed > > +by using MMIO registers; it supports maximum to 8 words message. > > + > > +Controller > > +-- > > + > > +Required properties: > > +- compatible: : Shall be "hisilicon,hi3660-mbox" > > +- reg: : Offset and length of the device's register set > > +- #mbox-cells: : Must be 3 > > + <&phandle channel dst_irq ack_irq> > > + phandle : Label name of controller > > + channel : Channel number > > + dst_irq : Remote interrupt vector > > + ack_irq : Local interrupt vector > > + > > +- interrupts: : Contains the two IRQ lines for mailbox. > > + > > +Example: > > + > > +mailbox: mailbox@e896b000 { > > + compatible = "hisilicon,hi3660-mbox"; > > + reg = <0x0 0xe896b000 0x0 0x1000>; > > + interrupts = <0x0 0xc0 0x4>, > > +<0x0 0xc1 0x4>; > > + #mbox-cells = <3>; > > +}; > > + > > +Client > > +-- > > + > > +Required properties: > > +- compatible : See the client docs > > +- mboxes : Standard property to specify a Mailbox (See > > ./mailbox.txt) > > + Cells must match 'mbox-cells' (See Controller docs > > above) > > + > > +Optional properties > > +- mbox-names : Name given to channels seen in the 'mboxes' > > property. > > + > > +Example: > > + > > +stub_clock: stub_clock { > > clock@e896b500 > > > + compatible = "hisilicon,hi3660-stub-clk"; > > + reg = <0x0 0xe896b500 0x0 0x0100>; > > + #clock-cells = <1>; > > + mbox-names = "mbox-tx"; > > Don't need -names when there is only one. Plus "mbox-" part is > redundant. > > > + mboxes = <&mailbox 13 3 0>; > > +}; > > -- > > 1.9.1 > >
Re: [PATCH v2 2/3] mailbox: Add support for Hi3660 mailbox
Hi Mark, On Fri, Oct 27, 2017 at 11:46:00AM +0100, Mark Rutland wrote: > On Fri, Oct 27, 2017 at 02:15:03PM +0800, Kaihua Zhong wrote: > > Hi3660 mailbox controller is used to send message within multiple > > processors, MCU, HIFI, etc. It supports 32 mailbox channels and every > > channel can only be used for single transferring direction. Once the > > channel is enabled, it needs to specify the destination interrupt and > > acknowledge interrupt, these two interrupt vectors are used to create > > the connection between the mailbox and interrupt controllers. > > > > The application processor (or from point of view of kernel) is not the > > only one master which can launch the data transferring, other > > processors or MCU/DSP also can kick off the data transferring. So this > > driver implements a locking mechanism to support exclusive accessing. > > ... and that locking mechanism is what precisely? > > Where is the protocol defined? > > > +static int hi3660_mbox_check_state(struct mbox_chan *chan) > > +{ > > + unsigned long ch = (unsigned long)chan->con_priv; > > + struct hi3660_mbox *mbox = to_hi3660_mbox(chan); > > + struct hi3660_mbox_dev *mdev = &mbox->mdev[ch]; > > + void __iomem *base = MBOX_BASE(mbox, ch); > > + unsigned long val; > > + unsigned int state, ret; > > + > > + /* Mailbox is idle so directly bail out */ > > + state = readl_relaxed(base + MBOX_MODE_REG); > > + if (state & MBOX_STATE_IDLE) > > + return 0; > > + > > + /* Wait for acknowledge from remote */ > > + ret = readx_poll_timeout_atomic(readl_relaxed, base + MBOX_MODE_REG, > > + val, (val & MBOX_STATE_ACK), 1000, 30); > > + if (ret) { > > + dev_err(mbox->dev, "%s: timeout for receiving ack\n", __func__); > > + return ret; > > + } > > + > > + /* Ensure channel is released */ > > + writel_relaxed(0x, base + MBOX_IMASK_REG); > > + writel_relaxed(BIT(mdev->ack_irq), base + MBOX_SRC_REG); > > + __asm__ volatile ("sev"); > > + return 0; > > +} > > Drivers really shouldn't be using SEV directly (even if via the sev() > macro)... > > This SEV isn't ordered w.r.t. anything, and it's unclear what ordering you > need, so this simply does not work. I will leave your questions for Hisilicon colleagues, essentially your questions are related with mailbox mechanism. But I'd like to firstly get clear your question for "This SEV isn't ordered w.r.t. anything". From my understanding, ARMv8 architecture natually adds DMB before SEV so all previous register writing opreations should be ensured to endpoint before SEV? [...] Thanks, Leo Yan
Re: [PATCH v2 2/3] mailbox: Add support for Hi3660 mailbox
Hi Mark, On Mon, Oct 30, 2017 at 10:19:40AM +, Mark Rutland wrote: > Hi, > > On Mon, Oct 30, 2017 at 12:45:06PM +0800, Leo Yan wrote: > > On Fri, Oct 27, 2017 at 11:46:00AM +0100, Mark Rutland wrote: > > > On Fri, Oct 27, 2017 at 02:15:03PM +0800, Kaihua Zhong wrote: > > > > +static int hi3660_mbox_check_state(struct mbox_chan *chan) > > > > +{ > > > > > + /* Ensure channel is released */ > > > > + writel_relaxed(0x, base + MBOX_IMASK_REG); > > > > + writel_relaxed(BIT(mdev->ack_irq), base + MBOX_SRC_REG); > > > > + __asm__ volatile ("sev"); > > > > + return 0; > > > > +} > > > > > > Drivers really shouldn't be using SEV directly (even if via the > > > sev() macro)... > > > > > > This SEV isn't ordered w.r.t. anything, and it's unclear what > > > ordering you need, so this simply does not work. > > > > I will leave your questions for Hisilicon colleagues, essentially your > > questions are related with mailbox mechanism. > > > > But I'd like to firstly get clear your question for "This SEV isn't > > ordered w.r.t. anything". From my understanding, ARMv8 architecture > > natually adds DMB before SEV so all previous register writing > > opreations should be ensured to endpoint before SEV? > > This is not the case; SEV does not add any implicit memory barrier, and > is not ordered w.r.t. memory accesses. > > See ARM DDI 0487B.b, page D1-1905, "The Send Event instructions": > > The PE is not required to guarantee the ordering of this event with > respect to the completion of memory accesses by instructions before > the SEV instruction. Therefore, ARM recommends that software > includes a DSB instruction before any SEV instruction. My fault and thanks for explanation. > Note that a DMB is not sufficient, as SEV is not a memory access. Understood now, so below code should be safe? wmb(); -> dsb(st); sev(); Thanks, Leo Yan
Re: [PATCH v1 3/3] arm64: dts: add Hi6220 mailbox node
Hi Mark, On Thu, Aug 27, 2015 at 05:31:09PM +0100, Mark Rutland wrote: > On Wed, Aug 26, 2015 at 07:59:50AM +0100, Leo Yan wrote: > > Hi Haojian, > > > > On Wed, Aug 26, 2015 at 09:25:41AM +0800, Haojian Zhuang wrote: > > > On Wed, 2015-08-26 at 00:00 +0800, Leo Yan wrote: > > > > On Tue, Aug 25, 2015 at 09:43:14PM +0800, Haojian Zhuang wrote: > > > > > On Tue, 2015-08-25 at 11:42 +0100, Mark Rutland wrote: > > > > > > > > Are you then going to hack GRUB, release a special HiKey > > > > > > > > version of > > > > > > > > GRUB, not support any other versions, and still can your > > > > > > > > firmware > > > > > > > > UEFI? > > > > > > > > > > > > > > I don't need to hack GRUB at all. > > > > > > > > > > > > Then it is working for you by pure chance alone. > > > > > > > > > > > > Please listen to the advice you are being given here; we're trying > > > > > > to > > > > > > ensure that your platform functions (and continues to function) as > > > > > > best > > > > > > it can. > > > > > > > > > > Since we discussed a lot on this, let's make a conclusion on it. > > > > > > > > > > 1. UEFI could append the reserved buffer in it's memory mapping. > > > > > 2. These reserved buffer must be declared in DT, since we also need to > > > > >support non-UEFI (uboot) at the same time. > > > > > 3. Mailbox node should reference reserved buffer by phandle in DT. > > > > > Then > > > > >map the buffer as non-cacheable in driver. > > > > > 4. These reserved buffer must use "no-map" property since it should be > > > > >non-cacheable in driver. > > > > > > > > For more specific discussion for DTS, i list two options at here; > > > > > > > > - Option 1: just simply reserve memory regions through memory node, > > > > and mailbox node will directly use the buffer through reg ranges; > > > > > > > > - Option 2: use reserved-memory and mailbox node will refer phandle > > > > of reserved-memory; > > > > > > > > These two options both can work well with UEFI and Uboot, but option 1 > > > > is more simple and straightforward; so i personally prefer it. But > > > > look forwarding your guys' suggestion. > > > > > > > > Option 1: > > > > > > > > memory@0 { > > > > device_type = "memory"; > > > > reg = <0x 0x 0x 0x05e0>, > > > > <0x 0x05f0 0x 0x00eff000>, > > > > <0x 0x06e0 0x 0x0060f000>, > > > > <0x 0x0741 0x 0x38bf>; > > > > }; > > > > > > > > [...] > > > > > > > > mailbox: mailbox@f751 { > > > > #mbox-cells = <1>; > > > > compatible = "hisilicon,hi6220-mbox"; > > > > reg = <0x0 0xf751 0x0 0x1000>, /* IPC_S */ > > > > <0x0 0x06dff800 0x0 0x0800>; /* Mailbox buffer */ > > > > interrupts = ; > > > > }; > > > > > > > > Option 2: > > > > > > > > memory@0 { > > > > device_type = "memory"; > > > > reg = <0x0 0x0 0x0 0x4000>; > > > > }; > > > > > > > > reserved-memory { > > > > #address-cells = <2>; > > > > #size-cells = <2>; > > > > ranges; > > > > > > > > mcu_reserved: mcu_reserved@06dff000 { > > > > no-map; > > > > reg = <0x0 0x06dff000 0x0 0x1000>, /* MCU > > > > mailbox buffer */ > > > > <0x0 0x05e0 0x0 0x0010>, /* MCU > > > > firmware buffer */ > > > > <0x0 0x074
Re: [PATCH 2/2] arm64: dts: add dts files for Hisilicon Hip05-D02 Development Board
On Sat, Aug 29, 2015 at 04:52:41PM +0800, Ding Tianhong wrote: > Add initial dtsi file to support Hisilicon Hip05-D02 Board with > support of CPUs in four clusters and each cluster has quard Cortex-A57. > > Also add dts file to support Hip05-D02 development board. > > Signed-off-by: Ding Tianhong > Signed-off-by: Kefeng Wang > --- > arch/arm64/boot/dts/hisilicon/Makefile | 2 +- > arch/arm64/boot/dts/hisilicon/hip05-d02.dts | 36 > arch/arm64/boot/dts/hisilicon/hip05.dtsi| 271 > > 3 files changed, 308 insertions(+), 1 deletion(-) > create mode 100644 arch/arm64/boot/dts/hisilicon/hip05-d02.dts > create mode 100644 arch/arm64/boot/dts/hisilicon/hip05.dtsi > > diff --git a/arch/arm64/boot/dts/hisilicon/Makefile > b/arch/arm64/boot/dts/hisilicon/Makefile > index fa81a6e..cd158b8 100644 > --- a/arch/arm64/boot/dts/hisilicon/Makefile > +++ b/arch/arm64/boot/dts/hisilicon/Makefile > @@ -1,4 +1,4 @@ > -dtb-$(CONFIG_ARCH_HISI) += hi6220-hikey.dtb > +dtb-$(CONFIG_ARCH_HISI) += hi6220-hikey.dtb hip05-d02.dtb > > always := $(dtb-y) > subdir-y := $(dts-dirs) > diff --git a/arch/arm64/boot/dts/hisilicon/hip05-d02.dts > b/arch/arm64/boot/dts/hisilicon/hip05-d02.dts > new file mode 100644 > index 000..ae34e25 > --- /dev/null > +++ b/arch/arm64/boot/dts/hisilicon/hip05-d02.dts > @@ -0,0 +1,36 @@ > +/** > + * dts file for Hisilicon D02 Development Board > + * > + * Copyright (C) 2014,2015 Hisilicon Ltd. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License version 2 as > + * publishhed by the Free Software Foundation. > + * > + */ > + > +/dts-v1/; > + > +#include "hip05.dtsi" > + > +/ { > + model = "Hisilicon Hip05 D02 Development Board"; > + compatible = "hisilicon,hip05-d02"; > + > + memory@ { > + device_type = "memory"; > + reg = <0x0 0x 0x0 0x8000>; > + }; > + > + aliases { > + serial0 = &uart0; > + }; > + > + chosen { > + stdout-path = "serial0:115200n8"; > + }; > +}; > + > +&uart0 { > + status = "ok"; > +}; > diff --git a/arch/arm64/boot/dts/hisilicon/hip05.dtsi > b/arch/arm64/boot/dts/hisilicon/hip05.dtsi > new file mode 100644 > index 000..da12d94 > --- /dev/null > +++ b/arch/arm64/boot/dts/hisilicon/hip05.dtsi > @@ -0,0 +1,271 @@ > +/** > + * dts file for Hisilicon D02 Development Board > + * > + * Copyright (C) 2014,2015 Hisilicon Ltd. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License version 2 as > + * publishhed by the Free Software Foundation. > + * > + */ > + > +#include > + > +/ { > + compatible = "hisilicon,hip05-d02"; > + interrupt-parent = <&gic>; > + #address-cells = <2>; > + #size-cells = <2>; > + > + psci { > + compatible = "arm,psci-0.2"; > + method = "smc"; > + }; > + > + cpus { > + #address-cells = <1>; > + #size-cells = <0>; > + > + cpu-map { > + cluster0 { > + core0 { > + cpu = <&cpu0>; > + }; > + core1 { > + cpu = <&cpu1>; > + }; > + core2 { > + cpu = <&cpu2>; > + }; > + core3 { > + cpu = <&cpu3>; > + }; > + }; > + cluster1 { > + core0 { > + cpu = <&cpu4>; > + }; > + core1 { > + cpu = <&cpu5>; > + }; > + core2 { > + cpu = <&cpu6>; > + }; > + core3 { > + cpu = <&cpu7>; > + }; > + }; > + cluster2 { > + core0 { > + cpu = <&cpu8>; > + }; > + core1 { > + cpu = <&cpu9>; > + }; > + core2 { > + cpu = <&cpu10>; > + }; > + core3 { > + cpu = <&cpu11>; > + }; > + }; > + cluster3 { > + core0 { > +
Re: [PATCH 2/2] arm64: dts: add dts files for Hisilicon Hip05-D02 Development Board
On Mon, Aug 31, 2015 at 09:44:38PM +0800, Ding Tianhong wrote: > On 2015/8/31 21:12, Leo Yan wrote: > > On Sat, Aug 29, 2015 at 04:52:41PM +0800, Ding Tianhong wrote: > >> Add initial dtsi file to support Hisilicon Hip05-D02 Board with > >> support of CPUs in four clusters and each cluster has quard Cortex-A57. > >> > >> Also add dts file to support Hip05-D02 development board. > >> > >> Signed-off-by: Ding Tianhong > >> Signed-off-by: Kefeng Wang > >> --- > >> arch/arm64/boot/dts/hisilicon/Makefile | 2 +- > >> arch/arm64/boot/dts/hisilicon/hip05-d02.dts | 36 > >> arch/arm64/boot/dts/hisilicon/hip05.dtsi| 271 > >> > >> 3 files changed, 308 insertions(+), 1 deletion(-) > >> create mode 100644 arch/arm64/boot/dts/hisilicon/hip05-d02.dts > >> create mode 100644 arch/arm64/boot/dts/hisilicon/hip05.dtsi > >> > >> diff --git a/arch/arm64/boot/dts/hisilicon/Makefile > >> b/arch/arm64/boot/dts/hisilicon/Makefile > >> index fa81a6e..cd158b8 100644 > >> --- a/arch/arm64/boot/dts/hisilicon/Makefile > >> +++ b/arch/arm64/boot/dts/hisilicon/Makefile > >> @@ -1,4 +1,4 @@ > >> -dtb-$(CONFIG_ARCH_HISI) += hi6220-hikey.dtb > >> +dtb-$(CONFIG_ARCH_HISI) += hi6220-hikey.dtb hip05-d02.dtb > >> > >> always:= $(dtb-y) > >> subdir-y := $(dts-dirs) > >> diff --git a/arch/arm64/boot/dts/hisilicon/hip05-d02.dts > >> b/arch/arm64/boot/dts/hisilicon/hip05-d02.dts > >> new file mode 100644 > >> index 000..ae34e25 > >> --- /dev/null > >> +++ b/arch/arm64/boot/dts/hisilicon/hip05-d02.dts > >> @@ -0,0 +1,36 @@ > >> +/** > >> + * dts file for Hisilicon D02 Development Board > >> + * > >> + * Copyright (C) 2014,2015 Hisilicon Ltd. > >> + * > >> + * This program is free software; you can redistribute it and/or modify > >> + * it under the terms of the GNU General Public License version 2 as > >> + * publishhed by the Free Software Foundation. > >> + * > >> + */ > >> + > >> +/dts-v1/; > >> + > >> +#include "hip05.dtsi" > >> + > >> +/ { > >> + model = "Hisilicon Hip05 D02 Development Board"; > >> + compatible = "hisilicon,hip05-d02"; > >> + > >> + memory@ { > >> + device_type = "memory"; > >> + reg = <0x0 0x 0x0 0x8000>; > >> + }; > >> + > >> + aliases { > >> + serial0 = &uart0; > >> + }; > >> + > >> + chosen { > >> + stdout-path = "serial0:115200n8"; > >> + }; > >> +}; > >> + > >> +&uart0 { > >> + status = "ok"; > >> +}; > >> diff --git a/arch/arm64/boot/dts/hisilicon/hip05.dtsi > >> b/arch/arm64/boot/dts/hisilicon/hip05.dtsi > >> new file mode 100644 > >> index 000..da12d94 > >> --- /dev/null > >> +++ b/arch/arm64/boot/dts/hisilicon/hip05.dtsi > >> @@ -0,0 +1,271 @@ > >> +/** > >> + * dts file for Hisilicon D02 Development Board > >> + * > >> + * Copyright (C) 2014,2015 Hisilicon Ltd. > >> + * > >> + * This program is free software; you can redistribute it and/or modify > >> + * it under the terms of the GNU General Public License version 2 as > >> + * publishhed by the Free Software Foundation. > >> + * > >> + */ > >> + > >> +#include > >> + > >> +/ { > >> + compatible = "hisilicon,hip05-d02"; > >> + interrupt-parent = <&gic>; > >> + #address-cells = <2>; > >> + #size-cells = <2>; > >> + > >> + psci { > >> + compatible = "arm,psci-0.2"; > >> + method = "smc"; > >> + }; > >> + > >> + cpus { > >> + #address-cells = <1>; > >> + #size-cells = <0>; > >> + > >> + cpu-map { > >> + cluster0 { > >> + core0 { > >> + cpu = <&cpu0>; > >> + }; > >> + core1 { > >> + cpu = <&cpu1>; > >> + }; > >> +
[PATCH v4 0/4] mailbox: hisilicon: add Hi6220 mailbox driver
Hi6220 mailbox supports up to 32 channels. Each channel is unidirectional with a maximum message size of 8 words. I/O is performed using register access (there is no DMA) and the cell raises an interrupt when messages are received. This patch series is to implement Hi6220 mailbox driver. It registers two channels into framework for communication with MCU, one is tx channel and another is rx channel. Now mailbox driver is used to send message to MCU to control dynamic voltage and frequency scaling for CPU, GPU and DDR. Changes from v3: * The patch series for enabling idle state for Hi6220 has reserved memory regions, so this series will not include it anymore * Refined mailbox driver according to Jassi's suggestion; Removed kfifo from mailbox driver; Removed spinlock for ipc registers accessing, due every channel has its own dedicated bit in ipc register and readl/writel will introduce memory barrier, so don't need spinlock to protect ipc registers accessing * After mailbox driver is ready, can use patch 4 to enable CPU's OPPs and stub clock driver; finally can enable CPUFreq driver for CPU frequency scaling Changes from v2: * Get rid of unused memory regions from memory node in DT, and don't use reserved-memory node according to Mark and Leif's suggestion; Haojian also has updated UEFI for efi memory info Changes from v1: * Correct lock usage for SMP scenario Changes from RFC: * According to Jassi's review, totally remove the abstract common driver layer and only commit driver dedicated for Hi6220 * According to Paul Bolle's review, fix typo issue for Kconfig and remove unnecessary dependency with OF and fix minor for mailbox driver * Refine a little for dts nodes Leo Yan (4): dt-bindings: mailbox: Document Hi6220 mailbox driver mailbox: Hi6220: add mailbox driver arm64: dts: add mailbox node for Hi6220 arm64: dts: add Hi6220's stub clock node .../bindings/mailbox/hisilicon,hi6220-mailbox.txt | 57 arch/arm64/boot/dts/hisilicon/hi6220.dtsi | 31 ++ drivers/mailbox/Kconfig| 8 + drivers/mailbox/Makefile | 2 + drivers/mailbox/hi6220-mailbox.c | 371 + 5 files changed, 469 insertions(+) create mode 100644 Documentation/devicetree/bindings/mailbox/hisilicon,hi6220-mailbox.txt create mode 100644 drivers/mailbox/hi6220-mailbox.c -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 4/4] arm64: dts: add Hi6220's stub clock node
Enable SRAM node and stub clock node for Hi6220; furthermore add the CPU's clock so it will be used by cpufreq-dt driver. Signed-off-by: Leo Yan --- arch/arm64/boot/dts/hisilicon/hi6220.dtsi | 23 +++ 1 file changed, 23 insertions(+) diff --git a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi index 76560fa..912671e 100644 --- a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi +++ b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi @@ -80,6 +80,17 @@ device_type = "cpu"; reg = <0x0 0x0>; enable-method = "psci"; + clocks = <&stub_clock 0>; + clock-latency = <0>; + operating-points = < + /* kHz */ + 120 0 + 96 0 + 729000 0 + 432000 0 + 208000 0 + >; + #cooling-cells = <2>; /* min followed by max */ cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>; }; @@ -167,6 +178,11 @@ #size-cells = <2>; ranges; + sram: sram@fff8 { + compatible = "hisilicon,hi6220-sramctrl", "syscon"; + reg = <0x0 0xfff8 0x0 0x12000>; + }; + ao_ctrl: ao_ctrl@f780 { compatible = "hisilicon,hi6220-aoctrl", "syscon"; reg = <0x0 0xf780 0x0 0x2000>; @@ -191,6 +207,13 @@ #clock-cells = <1>; }; + stub_clock: stub_clock { + compatible = "hisilicon,hi6220-stub-clk"; + hisilicon,hi6220-clk-sram = <&sram>; + #clock-cells = <1>; + mboxes = <&mailbox 1>; + }; + uart0: uart@f8015000 { /* console */ compatible = "arm,pl011", "arm,primecell"; reg = <0x0 0xf8015000 0x0 0x1000>; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 2/4] mailbox: Hi6220: add mailbox driver
Add driver for Hi6220 mailbox, the mailbox communicates with MCU; for sending data, it can support two methods for low level implementation: one is to use interrupt as acknowledge, another is automatic mode which without any acknowledge. These two methods have been supported in the driver. For receiving data, it will depend on the interrupt to notify the channel has incoming message. Now mailbox driver is used to send message to MCU to control dynamic voltage and frequency scaling for CPU, GPU and DDR. Signed-off-by: Leo Yan --- drivers/mailbox/Kconfig | 8 + drivers/mailbox/Makefile | 2 + drivers/mailbox/hi6220-mailbox.c | 371 +++ 3 files changed, 381 insertions(+) create mode 100644 drivers/mailbox/hi6220-mailbox.c diff --git a/drivers/mailbox/Kconfig b/drivers/mailbox/Kconfig index bbec500..41fb7fa 100644 --- a/drivers/mailbox/Kconfig +++ b/drivers/mailbox/Kconfig @@ -71,4 +71,12 @@ config BCM2835_MBOX the services of the Videocore. Say Y here if you want to use the BCM2835 Mailbox. +config HI6220_MBOX + tristate "Hi6220 Mailbox" + depends on ARCH_HISI + help + An implementation of the hi6220 mailbox. It is used to send message + between application processors and MCU. Say Y here if you want to build + the Hi6220 mailbox controller driver. + endif diff --git a/drivers/mailbox/Makefile b/drivers/mailbox/Makefile index 8e6d822..4ba9f5f 100644 --- a/drivers/mailbox/Makefile +++ b/drivers/mailbox/Makefile @@ -13,3 +13,5 @@ obj-$(CONFIG_PCC) += pcc.o obj-$(CONFIG_ALTERA_MBOX) += mailbox-altera.o obj-$(CONFIG_BCM2835_MBOX) += bcm2835-mailbox.o + +obj-$(CONFIG_HI6220_MBOX) += hi6220-mailbox.o diff --git a/drivers/mailbox/hi6220-mailbox.c b/drivers/mailbox/hi6220-mailbox.c new file mode 100644 index 000..c0e19d5 --- /dev/null +++ b/drivers/mailbox/hi6220-mailbox.c @@ -0,0 +1,371 @@ +/* + * Hisilicon's Hi6220 mailbox driver + * + * Copyright (c) 2015 Hisilicon Limited. + * Copyright (c) 2015 Linaro Limited. + * + * Author: Leo Yan + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, version 2 of the License. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define MBOX_CHAN_MAX 32 +#define MBOX_CHAN_NUM 2 + +#define MBOX_RX0x0 +#define MBOX_TX0x1 + +/* Mailbox message length: 8 words */ +#define MBOX_MSG_LEN 8 + +/* Mailbox Registers */ +#define MBOX_OFF(m)(0x40 * (m)) +#define MBOX_MODE_REG(m) (MBOX_OFF(m) + 0x0) +#define MBOX_DATA_REG(m) (MBOX_OFF(m) + 0x4) + +#define MBOX_STATE_MASK(0xF << 4) +#define MBOX_STATE_IDLE(0x1 << 4) +#define MBOX_STATE_TX (0x2 << 4) +#define MBOX_STATE_RX (0x4 << 4) +#define MBOX_STATE_ACK (0x8 << 4) +#define MBOX_ACK_CONFIG_MASK (0x1 << 0) +#define MBOX_ACK_AUTOMATIC (0x1 << 0) +#define MBOX_ACK_IRQ (0x0 << 0) + +/* IPC registers */ +#define ACK_INT_RAW_REG(i) ((i) + 0x400) +#define ACK_INT_MSK_REG(i) ((i) + 0x404) +#define ACK_INT_STAT_REG(i)((i) + 0x408) +#define ACK_INT_CLR_REG(i) ((i) + 0x40c) +#define ACK_INT_ENA_REG(i) ((i) + 0x500) +#define ACK_INT_DIS_REG(i) ((i) + 0x504) +#define DST_INT_RAW_REG(i) ((i) + 0x420) + + +struct hi6220_mbox_chan { + + /* +* Description for channel's hardware info: +* - direction: tx or rx +* - dst irq: peer core's irq number +* - ack irq: local irq number +* - slot number +*/ + unsigned int dir, dst_irq, ack_irq; + unsigned int slot; + + struct hi6220_mbox *parent; +}; + +struct hi6220_mbox { + struct device *dev; + + int irq; + + /* flag of enabling tx's irq mode */ + bool tx_irq_mode; + + /* region for ipc event */ + void __iomem *ipc; + + /* region for mailbox */ + void __iomem *base; + + unsigned int chan_num; + struct hi6220_mbox_chan *mchan; + + void *irq_map_chan[MBOX_CHAN_MAX]; + struct mbox_chan *chan; + struct mbox_controller controller; +}; + +static void mbox_set_state(struct hi6220_mbox *mbox, +
[PATCH v4 3/4] arm64: dts: add mailbox node for Hi6220
This patch add device mailbox node for Hi6220 in DT. Signed-off-by: Leo Yan --- arch/arm64/boot/dts/hisilicon/hi6220.dtsi | 8 1 file changed, 8 insertions(+) diff --git a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi index e83802a..76560fa 100644 --- a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi +++ b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi @@ -207,5 +207,13 @@ clocks = <&ao_ctrl 27>, <&ao_ctrl 27>; clock-names = "apb_pclk", "apb_pclk"; }; + + mailbox: mailbox@f751 { + #mbox-cells = <1>; + compatible = "hisilicon,hi6220-mbox"; + reg = <0x0 0xf751 0x0 0x1000>, /* IPC_S */ + <0x0 0x06dff800 0x0 0x0800>; /* Mailbox buffer */ + interrupts = ; + }; }; }; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 1/4] dt-bindings: mailbox: Document Hi6220 mailbox driver
Document the new compatible for Hisilicon Hi6220 mailbox driver. Signed-off-by: Leo Yan --- .../bindings/mailbox/hisilicon,hi6220-mailbox.txt | 57 ++ 1 file changed, 57 insertions(+) create mode 100644 Documentation/devicetree/bindings/mailbox/hisilicon,hi6220-mailbox.txt diff --git a/Documentation/devicetree/bindings/mailbox/hisilicon,hi6220-mailbox.txt b/Documentation/devicetree/bindings/mailbox/hisilicon,hi6220-mailbox.txt new file mode 100644 index 000..3dfb0b0 --- /dev/null +++ b/Documentation/devicetree/bindings/mailbox/hisilicon,hi6220-mailbox.txt @@ -0,0 +1,57 @@ +Hisilicon Hi6220 Mailbox Driver +=== + +Hisilicon Hi6220 mailbox supports up to 32 channels. Each channel +is unidirectional with a maximum message size of 8 words. I/O is +performed using register access (there is no DMA) and the cell +raises an interrupt when messages are received. + +Mailbox Device Node: + + +Required properties: + +- compatible: Shall be "hisilicon,hi6220-mbox" +- reg: Contains the mailbox register address range (base + address and length); the first item is for IPC + registers, the second item is shared buffer for + slots. +- #mbox-cells Common mailbox binding property to identify the number + of cells required for the mailbox specifier. Should be 1. +- interrupts: Contains the interrupt information for the mailbox + device. The format is dependent on which interrupt + controller the SoCs use. + +Example: + + + mailbox: mailbox@F751 { + #mbox-cells = <1>; + compatible = "hisilicon,hi6220-mbox"; + reg = <0x0 0xF751 0x0 0x1000>, /* IPC_S */ + <0x0 0x06DFF800 0x0 0x0800>; /* Mailbox */ + interrupt-parent = <&gic>; + interrupts = <0 94 4>; + }; + + +Mailbox client +=== + +"mboxes" and the optional "mbox-names" (please see +Documentation/devicetree/bindings/mailbox/mailbox.txt for details). Each value +of the mboxes property should contain a phandle to the mailbox controller +device node and second argument is the channel index. It must be 0 (hardware +support only one channel). The equivalent "mbox-names" property value can be +used to give a name to the communication channel to be used by the client user. + +Example: + + + stub_clock: stub_clock { + compatible = "hisilicon,hi6220-stub-clk"; + hisilicon,hi6220-clk-sram = <&sram>; + #clock-cells = <1>; + mbox-names = "mbox-tx"; + mboxes = <&mailbox 1>; + }; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2 2/4] arm64: Kconfig: select sp804 timer for ARCH_HISI
Select sp804 timer for ARCH_HISI, which is used as broadcast timer. Signed-off-by: Leo Yan --- arch/arm64/Kconfig.platforms | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/Kconfig.platforms b/arch/arm64/Kconfig.platforms index 23800a1..6d730fb 100644 --- a/arch/arm64/Kconfig.platforms +++ b/arch/arm64/Kconfig.platforms @@ -35,6 +35,7 @@ config ARCH_FSL_LS2085A config ARCH_HISI bool "Hisilicon SoC Family" + select ARM_TIMER_SP804 help This enables support for Hisilicon ARMv8 SoC family -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2 0/4] arm64: Hi6220: enable CPU idle states
This patch series is to enable CPU idle states for Hi6220. Hi6220 uses PSCIv0.2 compliance interface, so directly use ARM's generic CPUIdle driver. Patch 1 is to reserve memory regions so make sure MCU can work well to handle power controlling; Patch 2/3 enable sp804 timer as broadcast timer during idle states; Patch 4 registers CPU power down state and cluster power down state. Changes from v1: * According to Sudeep's review, fix binding for idle-states * According to Rob's review, due timers share same clock source with apb clock, so just only pass one clock phandle Leo Yan (4): arm64: dts: Reserve memory regions for hi6220 arm64: Kconfig: select sp804 timer for ARCH_HISI arm64: dts: add sp804 timer node for Hi6220 arm64: dts: enable idle states for Hi6220 arch/arm64/Kconfig.platforms | 1 + arch/arm64/boot/dts/hisilicon/hi6220-hikey.dts | 16 --- arch/arm64/boot/dts/hisilicon/hi6220.dtsi | 40 ++ 3 files changed, 53 insertions(+), 4 deletions(-) -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2 3/4] arm64: dts: add sp804 timer node for Hi6220
Add sp804 timer for hi6220, so it can be used as broadcast timer. Signed-off-by: Leo Yan --- arch/arm64/boot/dts/hisilicon/hi6220.dtsi | 9 + 1 file changed, 9 insertions(+) diff --git a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi index 3f03380..cdd4125 100644 --- a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi +++ b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi @@ -167,5 +167,14 @@ clocks = <&ao_ctrl 36>, <&ao_ctrl 36>; clock-names = "uartclk", "apb_pclk"; }; + + dual_timer0: dual_timer@f8008000 { + compatible = "arm,sp804", "arm,primecell"; + reg = <0x0 0xf8008000 0x0 0x1000>; + interrupts = , +; + clocks = <&ao_ctrl 27>; + clock-names = "apb_pclk"; + }; }; }; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2 4/4] arm64: dts: enable idle states for Hi6220
Add cpu and cluster level's low power state for Hi6220. Acked-by: Sudeep Holla Signed-off-by: Leo Yan --- arch/arm64/boot/dts/hisilicon/hi6220.dtsi | 31 +++ 1 file changed, 31 insertions(+) diff --git a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi index cdd4125..2830571 100644 --- a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi +++ b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi @@ -52,11 +52,35 @@ }; }; + idle-states { + entry-method = "psci"; + + CPU_SLEEP: cpu-sleep { + compatible = "arm,idle-state"; + local-timer-stop; + arm,psci-suspend-param = <0x001>; + entry-latency-us = <700>; + exit-latency-us = <250>; + min-residency-us = <1000>; + }; + + CLUSTER_SLEEP: cluster-sleep { + compatible = "arm,idle-state"; + local-timer-stop; + arm,psci-suspend-param = <0x101>; + entry-latency-us = <1000>; + exit-latency-us = <700>; + min-residency-us = <2700>; + wakeup-latency-us = <1500>; + }; + }; + cpu0: cpu@0 { compatible = "arm,cortex-a53", "arm,armv8"; device_type = "cpu"; reg = <0x0 0x0>; enable-method = "psci"; + cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>; }; cpu1: cpu@1 { @@ -64,6 +88,7 @@ device_type = "cpu"; reg = <0x0 0x1>; enable-method = "psci"; + cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>; }; cpu2: cpu@2 { @@ -71,6 +96,7 @@ device_type = "cpu"; reg = <0x0 0x2>; enable-method = "psci"; + cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>; }; cpu3: cpu@3 { @@ -78,6 +104,7 @@ device_type = "cpu"; reg = <0x0 0x3>; enable-method = "psci"; + cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>; }; cpu4: cpu@100 { @@ -85,6 +112,7 @@ device_type = "cpu"; reg = <0x0 0x100>; enable-method = "psci"; + cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>; }; cpu5: cpu@101 { @@ -92,6 +120,7 @@ device_type = "cpu"; reg = <0x0 0x101>; enable-method = "psci"; + cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>; }; cpu6: cpu@102 { @@ -99,6 +128,7 @@ device_type = "cpu"; reg = <0x0 0x102>; enable-method = "psci"; + cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>; }; cpu7: cpu@103 { @@ -106,6 +136,7 @@ device_type = "cpu"; reg = <0x0 0x103>; enable-method = "psci"; + cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>; }; }; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2 1/4] arm64: dts: Reserve memory regions for hi6220
On Hi6220, below memory regions in DDR have specific purpose: 0x05e0, - 0x05ef,: For MCU firmware using at runtime; 0x06df,f000 - 0x06df,: For mailbox message data; 0x0740,f000 - 0x0740,: For MCU firmware's section; 0x3e00, - 0x3fff,: For OP-TEE. This patch reserves these memory regions in DT. Signed-off-by: Leo Yan --- arch/arm64/boot/dts/hisilicon/hi6220-hikey.dts | 16 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/arch/arm64/boot/dts/hisilicon/hi6220-hikey.dts b/arch/arm64/boot/dts/hisilicon/hi6220-hikey.dts index e36a539..e3f4cb3 100644 --- a/arch/arm64/boot/dts/hisilicon/hi6220-hikey.dts +++ b/arch/arm64/boot/dts/hisilicon/hi6220-hikey.dts @@ -7,9 +7,6 @@ /dts-v1/; -/*Reserved 1MB memory for MCU*/ -/memreserve/ 0x05e0 0x0010; - #include "hi6220.dtsi" / { @@ -24,8 +21,19 @@ stdout-path = "serial0:115200n8"; }; + /* +* Reserve below regions from memory node: +* +* - 0x05e0, - 0x05ef,: MCU firmware runtime using +* - 0x06df,f000 - 0x06df,: Mailbox message data +* - 0x0740,f000 - 0x0740,: MCU firmware section +* - 0x3e00, - 0x3fff,: OP-TEE +*/ memory@0 { device_type = "memory"; - reg = <0x0 0x0 0x0 0x4000>; + reg = <0x 0x 0x 0x05e0>, + <0x 0x05f0 0x 0x00eff000>, + <0x 0x06e0 0x 0x0060f000>, + <0x 0x0741 0x 0x36bf>; }; }; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 1/4] dt-bindings: mailbox: Document Hi6220 mailbox driver
Hi Jassi, Thanks for review. On Fri, Oct 16, 2015 at 10:26:48AM +0530, Jassi Brar wrote: > On Mon, Oct 12, 2015 at 10:02 PM, Leo Yan wrote: > > > + > > +Mailbox Device Node: > > + > > + > > +Required properties: > > + > > +- compatible: Shall be "hisilicon,hi6220-mbox" > > +- reg: Contains the mailbox register address range (base > > + address and length); the first item is for IPC > > + registers, the second item is shared buffer for > > + slots. > > > Is the shared-buffer a part of mailbox controller or just regular > memory that could be used for purposes other than mailbox also? If > latter, please remove it from here. The shared-buffer is part of mailbox controller, this region cannot be dynamically allocated for other purpose; it's hardcode by MCU firmware. > > +Mailbox client > > +=== > > + > > +"mboxes" and the optional "mbox-names" (please see > > +Documentation/devicetree/bindings/mailbox/mailbox.txt for details). Each > > value > > +of the mboxes property should contain a phandle to the mailbox controller > > +device node and second argument is the channel index. It must be 0 > > (hardware > > +support only one channel). > > > sorry, what must be 0? You have MBOX_CHAN_MAX and MBOX_CHAN_NUM as 32 > and 2 in the driver. Will fix it. Thanks, Leo Yan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 2/4] mailbox: Hi6220: add mailbox driver
On Fri, Oct 16, 2015 at 10:43:00AM +0530, Jassi Brar wrote: > On Mon, Oct 12, 2015 at 10:02 PM, Leo Yan wrote: > > > + > > +#define MBOX_CHAN_MAX 32 > > +#define MBOX_CHAN_NUM 2 > > + > You mean the hardware has 32 channels but this driver can not manage > more than 2 ? > > OR, there are 32 interfaces but only 2 physical 'floating' links, so > no more than 2 interfaces can be active at any time? For the later case. Mailbox also can be used by other modules, such like audio hifi; later need add links for them if want to enable audio. But so far only uses first 2 channels. Thanks, Leo Yan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 2/4] mailbox: Hi6220: add mailbox driver
On Fri, Oct 16, 2015 at 11:17:32AM +0530, Jassi Brar wrote: > On Fri, Oct 16, 2015 at 10:59 AM, Leo Yan wrote: > > On Fri, Oct 16, 2015 at 10:43:00AM +0530, Jassi Brar wrote: > >> On Mon, Oct 12, 2015 at 10:02 PM, Leo Yan wrote: > >> > >> > + > >> > +#define MBOX_CHAN_MAX 32 > >> > +#define MBOX_CHAN_NUM 2 > >> > + > >> You mean the hardware has 32 channels but this driver can not manage > >> more than 2 ? > >> > >> OR, there are 32 interfaces but only 2 physical 'floating' links, so > >> no more than 2 interfaces can be active at any time? > > > > For the later case. > Former is a s/w limitation and latter is h/w limitation. From what you > write below, it seems former to be the case. Sorry, i misunderstood the question. It's a s/w limitation. > > Mailbox also can be used by other modules, such > > like audio hifi; later need add links for them if want to enable > > audio. But so far only uses first 2 channels. > > > You mean every time your platform needs another channel, you'll update > the driver? Not sure about that. It should be simpler to assign which > ever and as many channels as the client asks via DT. For hi6220, every channel is fixed to specific purpose; so i can register all of them in driver, such like hifi related channels; Though i cannot test them currently, but this will avoid extra efforts for enabling channels anymore, do you think this is okay? Or you prefer to bind with DT? Thanks, Leo Yan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 2/4] mailbox: Hi6220: add mailbox driver
On Fri, Oct 16, 2015 at 11:42:54AM +0530, Jassi Brar wrote: > On 16 October 2015 at 11:38, Leo Yan wrote: > > On Fri, Oct 16, 2015 at 11:17:32AM +0530, Jassi Brar wrote: > >> On Fri, Oct 16, 2015 at 10:59 AM, Leo Yan wrote: > >> > On Fri, Oct 16, 2015 at 10:43:00AM +0530, Jassi Brar wrote: > >> >> On Mon, Oct 12, 2015 at 10:02 PM, Leo Yan wrote: > >> >> > >> >> > + > >> >> > +#define MBOX_CHAN_MAX 32 > >> >> > +#define MBOX_CHAN_NUM 2 > >> >> > + > >> >> You mean the hardware has 32 channels but this driver can not manage > >> >> more than 2 ? > >> >> > >> >> OR, there are 32 interfaces but only 2 physical 'floating' links, so > >> >> no more than 2 interfaces can be active at any time? > >> > > >> > For the later case. > >> Former is a s/w limitation and latter is h/w limitation. From what you > >> write below, it seems former to be the case. > > > > Sorry, i misunderstood the question. It's a s/w limitation. > > > >> > Mailbox also can be used by other modules, such > >> > like audio hifi; later need add links for them if want to enable > >> > audio. But so far only uses first 2 channels. > >> > > >> You mean every time your platform needs another channel, you'll update > >> the driver? Not sure about that. It should be simpler to assign which > >> ever and as many channels as the client asks via DT. > > > > For hi6220, every channel is fixed to specific purpose; > on different platform the purposes may be different, so > > > so i can > > register all of them in driver, such like hifi related channels; > > Though i cannot test them currently, but this will avoid extra > > efforts for enabling channels anymore, do you think this is okay? > > Or you prefer to bind with DT? > > > DT please. Ok, will do this. Thansk, Leo Yan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 1/4] arm64: dts: Reserve memory regions for hi6220
Hi Rob, On Tue, Oct 13, 2015 at 08:58:13AM +0800, Leo Yan wrote: > On Hi6220, below memory regions in DDR have specific purpose: > > 0x05e0, - 0x05ef,: For MCU firmware using at runtime; > 0x06df,f000 - 0x06df,: For mailbox message data; > 0x0740,f000 - 0x0740,: For MCU firmware's section; > 0x3e00, - 0x3fff,: For OP-TEE. > > This patch reserves these memory regions in DT. > > Signed-off-by: Leo Yan > --- > arch/arm64/boot/dts/hisilicon/hi6220-hikey.dts | 16 > 1 file changed, 12 insertions(+), 4 deletions(-) > > diff --git a/arch/arm64/boot/dts/hisilicon/hi6220-hikey.dts > b/arch/arm64/boot/dts/hisilicon/hi6220-hikey.dts > index e36a539..e3f4cb3 100644 > --- a/arch/arm64/boot/dts/hisilicon/hi6220-hikey.dts > +++ b/arch/arm64/boot/dts/hisilicon/hi6220-hikey.dts > @@ -7,9 +7,6 @@ > > /dts-v1/; > > -/*Reserved 1MB memory for MCU*/ > -/memreserve/ 0x05e0 0x0010; > - > #include "hi6220.dtsi" > > / { > @@ -24,8 +21,19 @@ > stdout-path = "serial0:115200n8"; > }; > > + /* > + * Reserve below regions from memory node: > + * > + * - 0x05e0, - 0x05ef,: MCU firmware runtime using > + * - 0x06df,f000 - 0x06df,: Mailbox message data > + * - 0x0740,f000 - 0x0740,: MCU firmware section > + * - 0x3e00, - 0x3fff,: OP-TEE > + */ > memory@0 { > device_type = "memory"; > - reg = <0x0 0x0 0x0 0x4000>; > + reg = <0x 0x 0x 0x05e0>, > + <0x 0x05f0 0x 0x00eff000>, > + <0x 0x06e0 0x 0x0060f000>, > + <0x 0x0741 0x 0x36bf>; > }; > }; Are you convinced by the before's patch arguments from Mark and I? i'd like get green light from you. Thanks, Leo Yan > -- > 1.9.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 4/5] clk: Hi6220: add stub clock driver
Hi Kevin, On Tue, Sep 01, 2015 at 05:28:03PM -0700, Kevin Hilman wrote: > On Sun, Aug 2, 2015 at 6:13 PM, Leo Yan wrote: > > On Hi6220, there have some clocks which can use mailbox channel to send > > messages to power controller to change frequency; this includes CPU, GPU > > and DDR clocks. > > > > For dynamic frequency scaling, firstly need write the frequency value to > > SRAM region, and then send message to mailbox to trigger power controller > > to handle this requirement. This driver will use syscon APIs to pass SRAM > > memory region and use common mailbox APIs for channels accessing. > > > > This init driver will support cpu frequency change firstly. > > > > Signed-off-by: Leo Yan > > The kernelci.org build/boot bot detected boot failures in > linux-next[1], and the failure was bisected down to this patch (landed > in linux-next as commit c1628a2c416da947f5afac615d53189250fa49cb. > > I verifed that reverting this commit on top of next-20150901 gets the > hikey booting again. Thanks for reporting. This issue has been confirmed at my side, it's caused by the patch have added dependency with MAILBOX, will fix this issue and send patch. Thanks, Leo Yan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] clk: Hi6220: separately build stub clock driver
The previous code, kernel builds Hi6220's common clock driver and stub clock driver together. Stub clock driver has introduced the dependency with CONFIG_MAILBOX, so kernel will not build Hi6220's common clock driver due ARM64's defconfig have not enabled CONFIG_MAILBOX by default. So separately build stub clock driver and common clock driver for Hi6220; and only let stub clock driver has the dependency with CONFIG_MAILBOX. Signed-off-by: Leo Yan --- drivers/clk/hisilicon/Kconfig | 8 +++- drivers/clk/hisilicon/Makefile | 3 ++- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/clk/hisilicon/Kconfig b/drivers/clk/hisilicon/Kconfig index 2c16807..e434854 100644 --- a/drivers/clk/hisilicon/Kconfig +++ b/drivers/clk/hisilicon/Kconfig @@ -1,6 +1,12 @@ config COMMON_CLK_HI6220 bool "Hi6220 Clock Driver" - depends on (ARCH_HISI || COMPILE_TEST) && MAILBOX + depends on ARCH_HISI || COMPILE_TEST default ARCH_HISI help Build the Hisilicon Hi6220 clock driver based on the common clock framework. + +config STUB_CLK_HI6220 + bool "Hi6220 Stub Clock Driver" + depends on COMMON_CLK_HI6220 && MAILBOX + help + Build the Hisilicon Hi6220 stub clock driver. diff --git a/drivers/clk/hisilicon/Makefile b/drivers/clk/hisilicon/Makefile index 4a1001a..74dba31 100644 --- a/drivers/clk/hisilicon/Makefile +++ b/drivers/clk/hisilicon/Makefile @@ -7,4 +7,5 @@ obj-y += clk.o clkgate-separated.o clkdivider-hi6220.o obj-$(CONFIG_ARCH_HI3xxx) += clk-hi3620.o obj-$(CONFIG_ARCH_HIP04) += clk-hip04.o obj-$(CONFIG_ARCH_HIX5HD2) += clk-hix5hd2.o -obj-$(CONFIG_COMMON_CLK_HI6220)+= clk-hi6220.o clk-hi6220-stub.o +obj-$(CONFIG_COMMON_CLK_HI6220)+= clk-hi6220.o +obj-$(CONFIG_STUB_CLK_HI6220) += clk-hi6220-stub.o -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFCv5 PATCH 32/46] sched: Energy-aware wake-up task placement
On Tue, Jul 07, 2015 at 07:24:15PM +0100, Morten Rasmussen wrote: > Let available compute capacity and estimated energy impact select > wake-up target cpu when energy-aware scheduling is enabled and the > system in not over-utilized (above the tipping point). > > energy_aware_wake_cpu() attempts to find group of cpus with sufficient > compute capacity to accommodate the task and find a cpu with enough spare > capacity to handle the task within that group. Preference is given to > cpus with enough spare capacity at the current OPP. Finally, the energy > impact of the new target and the previous task cpu is compared to select > the wake-up target cpu. > > cc: Ingo Molnar > cc: Peter Zijlstra > > Signed-off-by: Morten Rasmussen > --- > kernel/sched/fair.c | 85 > - > 1 file changed, 84 insertions(+), 1 deletion(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 0f7dbda4..01f7337 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -5427,6 +5427,86 @@ static int select_idle_sibling(struct task_struct *p, > int target) > return target; > } > > +static int energy_aware_wake_cpu(struct task_struct *p, int target) > +{ > + struct sched_domain *sd; > + struct sched_group *sg, *sg_target; > + int target_max_cap = INT_MAX; > + int target_cpu = task_cpu(p); > + int i; > + > + sd = rcu_dereference(per_cpu(sd_ea, task_cpu(p))); > + > + if (!sd) > + return target; > + > + sg = sd->groups; > + sg_target = sg; > + > + /* > + * Find group with sufficient capacity. We only get here if no cpu is > + * overutilized. We may end up overutilizing a cpu by adding the task, > + * but that should not be any worse than select_idle_sibling(). > + * load_balance() should sort it out later as we get above the tipping > + * point. > + */ > + do { > + /* Assuming all cpus are the same in group */ > + int max_cap_cpu = group_first_cpu(sg); > + > + /* > + * Assume smaller max capacity means more energy-efficient. > + * Ideally we should query the energy model for the right > + * answer but it easily ends up in an exhaustive search. > + */ > + if (capacity_of(max_cap_cpu) < target_max_cap && > + task_fits_capacity(p, max_cap_cpu)) { > + sg_target = sg; > + target_max_cap = capacity_of(max_cap_cpu); > + } Here should consider scenario for two groups have same capacity? This will benefit for the case LITTLE.LITTLE. So the code will be looks like below: int target_sg_cpu = INT_MAX; if (capacity_of(max_cap_cpu) <= target_max_cap && task_fits_capacity(p, max_cap_cpu)) { if ((capacity_of(max_cap_cpu) == target_max_cap) && (target_sg_cpu < max_cap_cpu)) continue; target_sg_cpu = max_cap_cpu; sg_target = sg; target_max_cap = capacity_of(max_cap_cpu); } > + } while (sg = sg->next, sg != sd->groups); > + > + /* Find cpu with sufficient capacity */ > + for_each_cpu_and(i, tsk_cpus_allowed(p), sched_group_cpus(sg_target)) { > + /* > + * p's blocked utilization is still accounted for on prev_cpu > + * so prev_cpu will receive a negative bias due the double > + * accouting. However, the blocked utilization may be zero. > + */ > + int new_usage = get_cpu_usage(i) + task_utilization(p); > + > + if (new_usage > capacity_orig_of(i)) > + continue; > + > + if (new_usage < capacity_curr_of(i)) { > + target_cpu = i; > + if (cpu_rq(i)->nr_running) > + break; > + } > + > + /* cpu has capacity at higher OPP, keep it as fallback */ > + if (target_cpu == task_cpu(p)) > + target_cpu = i; > + } > + > + if (target_cpu != task_cpu(p)) { > + struct energy_env eenv = { > + .usage_delta= task_utilization(p), > + .src_cpu= task_cpu(p), > + .dst_cpu= target_cpu, > + }; > + > + /* Not enough spare capacity on previous cpu */ > + if (cpu_overutilized(task_cpu(p))) > + return target_cpu; > + > + if (energy_diff(&eenv) >= 0) > + return task_cpu(p); > + } > + > + return target_cpu; > +} > + > /* > * select_task_rq_fair: Select target runqueue for the waking task in domains > * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE, > @@ -5479,7 +5559,10 @@ select_task_rq_fair(struct task_struct *p, int > prev_cpu, int
Re: [RFCv5 PATCH 22/46] sched: Calculate energy consumption of sched_group
On Tue, Jul 07, 2015 at 07:24:05PM +0100, Morten Rasmussen wrote: > For energy-aware load-balancing decisions it is necessary to know the > energy consumption estimates of groups of cpus. This patch introduces a > basic function, sched_group_energy(), which estimates the energy > consumption of the cpus in the group and any resources shared by the > members of the group. > > NOTE: The function has five levels of identation and breaks the 80 > character limit. Refactoring is necessary. > > cc: Ingo Molnar > cc: Peter Zijlstra > > Signed-off-by: Morten Rasmussen > --- > kernel/sched/fair.c | 146 > > 1 file changed, 146 insertions(+) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 78d3081..bd0be9d 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -4846,6 +4846,152 @@ static inline bool energy_aware(void) > return sched_feat(ENERGY_AWARE); > } > > +/* > + * cpu_norm_usage() returns the cpu usage relative to a specific capacity, > + * i.e. it's busy ratio, in the range [0..SCHED_LOAD_SCALE] which is useful > for > + * energy calculations. Using the scale-invariant usage returned by > + * get_cpu_usage() and approximating scale-invariant usage by: > + * > + * usage ~ (curr_freq/max_freq)*1024 * capacity_orig/1024 * > running_time/time > + * > + * the normalized usage can be found using the specific capacity. > + * > + * capacity = capacity_orig * curr_freq/max_freq > + * > + * norm_usage = running_time/time ~ usage/capacity > + */ > +static unsigned long cpu_norm_usage(int cpu, unsigned long capacity) > +{ > + int usage = __get_cpu_usage(cpu); > + > + if (usage >= capacity) > + return SCHED_CAPACITY_SCALE; > + > + return (usage << SCHED_CAPACITY_SHIFT)/capacity; > +} > + > +static unsigned long group_max_usage(struct sched_group *sg) > +{ > + int i; > + unsigned long max_usage = 0; > + > + for_each_cpu(i, sched_group_cpus(sg)) > + max_usage = max(max_usage, get_cpu_usage(i)); > + > + return max_usage; > +} > + > +/* > + * group_norm_usage() returns the approximated group usage relative to it's > + * current capacity (busy ratio) in the range [0..SCHED_LOAD_SCALE] for use > in > + * energy calculations. Since task executions may or may not overlap in time > in > + * the group the true normalized usage is between max(cpu_norm_usage(i)) and > + * sum(cpu_norm_usage(i)) when iterating over all cpus in the group, i. The > + * latter is used as the estimate as it leads to a more pessimistic energy > + * estimate (more busy). > + */ > +static unsigned long group_norm_usage(struct sched_group *sg, int cap_idx) > +{ > + int i; > + unsigned long usage_sum = 0; > + unsigned long capacity = sg->sge->cap_states[cap_idx].cap; > + > + for_each_cpu(i, sched_group_cpus(sg)) > + usage_sum += cpu_norm_usage(i, capacity); > + > + if (usage_sum > SCHED_CAPACITY_SCALE) > + return SCHED_CAPACITY_SCALE; > + return usage_sum; > +} > + > +static int find_new_capacity(struct sched_group *sg, > + struct sched_group_energy *sge) > +{ > + int idx; > + unsigned long util = group_max_usage(sg); > + > + for (idx = 0; idx < sge->nr_cap_states; idx++) { > + if (sge->cap_states[idx].cap >= util) > + return idx; > + } > + > + return idx; > +} > + > +/* > + * sched_group_energy(): Returns absolute energy consumption of cpus > belonging > + * to the sched_group including shared resources shared only by members of > the > + * group. Iterates over all cpus in the hierarchy below the sched_group > starting > + * from the bottom working it's way up before going to the next cpu until all > + * cpus are covered at all levels. The current implementation is likely to > + * gather the same usage statistics multiple times. This can probably be > done in > + * a faster but more complex way. > + */ > +static unsigned int sched_group_energy(struct sched_group *sg_top) > +{ > + struct sched_domain *sd; > + int cpu, total_energy = 0; > + struct cpumask visit_cpus; > + struct sched_group *sg; > + > + WARN_ON(!sg_top->sge); > + > + cpumask_copy(&visit_cpus, sched_group_cpus(sg_top)); > + > + while (!cpumask_empty(&visit_cpus)) { > + struct sched_group *sg_shared_cap = NULL; > + > + cpu = cpumask_first(&visit_cpus); > + > + /* > + * Is the group utilization affected by cpus outside this > + * sched_group? > + */ > + sd = highest_flag_domain(cpu, SD_SHARE_CAP_STATES); > + if (sd && sd->parent) > + sg_shared_cap = sd->parent->groups; If the sched domain is already the highest level, should directly use its group to calculate shared capacity? so the code like below: if (sd && sd->parent) sg_shared_cap =
Re: [PATCH] clk: Hi6220: separately build stub clock driver
On Thu, Sep 03, 2015 at 09:01:03AM -0700, Kevin Hilman wrote: > Leo Yan writes: > > > The previous code, kernel builds Hi6220's common clock driver and stub > > clock driver together. Stub clock driver has introduced the dependency > > with CONFIG_MAILBOX, so kernel will not build Hi6220's common clock > > driver due ARM64's defconfig have not enabled CONFIG_MAILBOX by default. > > > > So separately build stub clock driver and common clock driver for > > Hi6220; and only let stub clock driver has the dependency with > > CONFIG_MAILBOX. > > > > Signed-off-by: Leo Yan > > Tested-by: Kevin Hilman > > I verifed that this patch on top of next-20150902 fixes the hikey boot > failure I was seeing. Thanks, Kevin and Stephen :) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v1 0/2] Optimization CPU idle state impacted by tick
After Rafael's patch series 'sched/cpuidle: Idle loop rework' has been merged in mainline kernel, it perfectly resolved the Powernightmares issue [1] with not stopping the tick during the idle loop; we verified this patch series on Arm platform (96boards Hikey620 with octa CA53 CPUs) with the rt-app [2] program to generate workloads: a single task with below combinded configurations with period 5ms and duty cycle 1%/3%/5%/10%/20%/30%/40%. After run these testing cases, we found the CPU cannot stay in deepest idle state as expected, the issues essentialy are related with sched tick: The prominent issue is the criteria for decision stopping tick; now the criteria is checking expected interval is less than TICK_USEC, but this doesn't consider from the perspective of idle state parameters, so we can observe the CPU even has enough sleeping time but it cannot run into deepest idle state; this is very serious for some specific ducy cycle cases. Another issue is after tick keeping running in idle state, the tick can heavily impact on 'menu' governor metrics, especially it will introduce many noise for next event correction factors. This patch series tries to fix these two issues; patch 0001 wants to define a time point to distinguish for stopping or not, this time point consideres the factors from tick period and the maximum target residency and use prediction period to compare this time point to decide if need to stop tick. Patch 0002 wants to always to give compensation for tick event so that dimiss the tick impaction on correction factors for next time prediction. Blow table are comparison results for testing cases between without and with this patch series; we run the test case with single task with period 5ms with different dutycycle, the total running time is 10s. Based on the tracing log, we do statistics for all CPUs for all idle states duration, the unit is second (s), on Hikey board the result shows the C2 state (the CPU deepest state) selection improvement. Some notations are used in the table: state: C0: WFI; C1: CPU OFF; C2: Cluster OFF All testing cases have single task with 5ms period: Without patches With patches Difference - - --- Duty cycle C0C1C2 C0C1 C2 C0C1 C2 1%0.218589 4.208460 87.995606 0.119723 0.847116 91.940569 -0.098866 -3.361344 +3.944963 3%0.801521 5.031361 86.444753 0.147346 0.820276 91.761191 -0.654175 -4.211085 +5.316438 5%0.590236 2.733048 88.284541 0.149237 1.042383 90.490482 -0.440999 -1.690665 +2.205941 10%0.601922 6.282368 84.899870 0.169491 1.304985 89.725754 -0.432431 -4.977383 +4.825884 20%1.381870 8.531687 80.627691 0.307390 3.302562 86.686887 -1.074480 -5.229125 +6.059196 30%1.785221 6.974483 81.083312 0.548050 5.319929 83.551747 -1.237171 -1.654554 +2.468435 40%1.403247 6.474203 80.577176 0.467686 6.366482 81.983384 -0.935561 -0.107721 +1.406208 Leo Yan (2): cpuidle: menu: Correct the criteria for stopping tick cpuidle: menu: Dismiss tick impaction on correction factors drivers/cpuidle/governors/menu.c | 55 1 file changed, 45 insertions(+), 10 deletions(-) -- 2.7.4
[PATCH v1 2/2] cpuidle: menu: Dismiss tick impaction on correction factors
If the idle duration predictor detects the tick is triggered, and with meeting the condition 'data->next_timer_us > TICK_USEC', it will give a big compensation for the 'measured' interval; this is purposed to avoid artificially small correction factor values. Unfortunately, this still cannot cover all cases of the tick impaction on correction factors, e.g. if the predicted next event is less than ITCK_USEC, then all wakening up by the ticks will be taken as usual case and reducing exit latency, as results the tick events heavily impacts the correction factors. Moreover, the coming tick sometimes is very soon, especially at the first time when the CPU becomes idle the tick expire time might be vary, so ticks can introduce big deviation on correction factors. If idle governor deliberately doesn't stop the tick timer, the tick event is coming as expected with fixed interval, so the tick event is predictable; if the tick event is coming early than other normal timer event and other possible wakeup events, we need to dismiss the tick impaction on correction factors, this can let the correction factor array is purely used for other wakeup events correctness rather than sched tick. This patch is to check if it's a tick wakeup, it takes the CPU can stay in the idle state for enough time so it gives high compensation for the measured' interval, this can avoid tick impaction on the correction factor array. Cc: Daniel Lezcano Cc: Vincent Guittot Signed-off-by: Leo Yan --- drivers/cpuidle/governors/menu.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 2ce4068..43cbde3 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -525,15 +525,13 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) * assume the state was never reached and the exit latency is 0. */ - if (data->tick_wakeup && data->next_timer_us > TICK_USEC) { + if (data->tick_wakeup) { /* -* The nohz code said that there wouldn't be any events within -* the tick boundary (if the tick was stopped), but the idle -* duration predictor had a differing opinion. Since the CPU -* was woken up by a tick (that wasn't stopped after all), the -* predictor was not quite right, so assume that the CPU could -* have been idle long (but not forever) to help the idle -* duration predictor do a better job next time. +* Since the CPU was woken up by a tick (that wasn't stopped +* after all), the predictor was not quite right, so assume +* that the CPU could have been idle long (but not forever) +* to help the idle duration predictor do a better job next +* time. */ measured_us = 9 * MAX_INTERESTING / 10; } else { -- 2.7.4
[PATCH v1 1/2] cpuidle: menu: Correct the criteria for stopping tick
The criteria for keeping tick running is the prediction duration is less than TICK_USEC, the mainline kernel configures HZ=250 so TICK_USEC equals to 4000us, so any prediction is less than 4000us will not stop tick and the idle state will be fixed up to one shallow state. On the other hand, let's use 96boards Hikey (CA53 octa-CPUs) as an example, the platform has the deepest C-state is cluster off state which its 'target_residency' is 2700us, if the 'menu' governor predicts the next idle duration is any value fallen into the range [2700us, 4000us), then the 'menu' governor will keep sched tick running and and roll back to a shallow CPU off state rather than cluster off state. Finally we can see the CPU has much less chance to run into deepest state when a task repeatedly running on it with 5000us period and 40% duty cycle (so the task runs for 2000us and then sleep for 3000us in every period). In theory, we should permit the CPU to stay in cluster off state due the CPU sleeping time 3000us is over its 'target_residency' 2700us. This issue is caused by the 'menu' governor's criteria for decision if need to enable tick and roll back to shallow state, the criteria is: 'expected_interval < TICK_USEC'. This criteria is only considering from tick aspect, but it doesn't consider idle state residency so misses better choice for deeper idle state; e.g., the deepest idle state 'target_residency' is less than TICK_USEC, which is quite common on Arm platforms. To fix this issue, this patch is to add one extra variable 'stop_tick_point' to help decision if need to stop tick or not. If prediction is longer than 'stop_tick_point' then we can stop tick, otherwise it will keep tick running. For 'stop_tick_point', except we need to compare prediction period with TICK_USEC, we also need consider from the perspective of deepest idle state 'target_residency'. Finally, 'stop_tick_point' is coming from the minimum value within the deepest idle state 'target_residency' and TICK_USEC. Cc: Daniel Lezcano Cc: Vincent Guittot Signed-off-by: Leo Yan --- drivers/cpuidle/governors/menu.c | 41 ++-- 1 file changed, 39 insertions(+), 2 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 30ab759..2ce4068 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -294,6 +294,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, unsigned int expected_interval; unsigned long nr_iowaiters, cpu_load; ktime_t delta_next; + unsigned int stop_tick_point; if (data->needs_update) { menu_update(drv, dev); @@ -406,11 +407,47 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, idx = 0; /* No states enabled. Must use 0. */ /* +* Decide the time point for tick stopping, if the prediction is before +* this time point it's better to keep the tick enabled and after the +* time point it means the CPU can stay in idle state for enough long +* time so should stop the tick. This point needs to consider two +* factors: the first one is tick period and the another factor is the +* maximum target residency. +* +* We can divide into below cases: +* +* The first case is the prediction is shorter than the maximum target +* residency and also shorter than tick period, this means the +* prediction isn't to use deepest idle state and it's suppose the CPU +* will be waken up within tick period, for this case we should keep +* the tick to be enabled; +* +* The second case is the prediction is shorter than the maximum target +* residency and longer than tick period, for this case the idle state +* selection has already based on the prediction for shallow state and +* we will expect some events can arrive later than tick to wake up the +* CPU; another thinking for this case is the CPU is likely to stay in +* the expected idle state for long while (which should be longer than +* tick period), so it's reasonable to stop the tick. +* +* The third case is the prediction is longer than the maximum target +* residency, but weather it's longer or shorter than tick period; for +* this case we have selected the deepest idle state so it's pointless +* to enable tick to wake up CPU from deepest state. +* +* To summary upper cases, we use the value of min(TICK_USEC, +* maximum_target_residency) as the critical point to decide if need to +* stop tick. +*/ + stop_tick_point = min_t(u
Re: [PATCH v1 0/2] Optimization CPU idle state impacted by tick
On Tue, Aug 07, 2018 at 10:27:02PM +0800, Leo Yan wrote: > After Rafael's patch series 'sched/cpuidle: Idle loop rework' has been > merged in mainline kernel, it perfectly resolved the Powernightmares > issue [1] with not stopping the tick during the idle loop; we verified > this patch series on Arm platform (96boards Hikey620 with octa CA53 > CPUs) with the rt-app [2] program to generate workloads: a single task > with below combinded configurations with period 5ms and duty cycle > 1%/3%/5%/10%/20%/30%/40%. Oops, I missed the two reference links, for complete info so list here: [1] https://tu-dresden.de/zih/forschung/ressourcen/dateien/projekte/haec/powernightmares.pdf?lang=en [2] https://git.linaro.org/power/rt-app.git Thanks, Leo Yan
[PATCH] sched: idle: Reenable sched tick for cpuidle request
The idle loop stops tick by respecting the decision from cpuidle framework, if the condition 'need_resched()' is false without any task scheduling, the CPU keeps running in the loop in do_idle() and it has no chance call tick_nohz_idle_exit() to enable the tick. This results in the idle loop cannot reenable sched tick afterwards, if the idle governor selects a shallow state, thus the powernightmares issue can occur again. This issue can be easily reproduce with the case on Arm Hikey board: use CPU0 to send IPI to CPU7, CPU7 receives the IPI and in the callback function it start a hrtimer with 4ms, so the 4ms timer delta value can let 'menu' governor to choose deepest state in the next entering idle time. From then on, CPU7 restarts hrtimer with 1ms interval for total 10 times, so this can utilize the typical pattern in 'menu' governor to have prediction for 1ms duration, finally idle governor is easily to select a shallow state, on Hikey board it usually is to select CPU off state. From then on, CPU7 stays in this shallow state for long time until there have other interrupts on it. C2: cluster off; C1: CPU off Idle state: C2C2C2C2C2C2C2C1 -> Interrupt: ^^ ^ ^ ^ ^ ^ ^ ^ IPI Timer Timer Timer Timer Timer Timer Timer Timer 4ms 1ms 1ms 1ms 1ms 1ms 1ms 1ms To fix this issue, the idle loop needs to support reenabling sched tick. This patch checks the conditions 'stop_tick' is false when the tick is stopped, this condition indicates the cpuidle governor asks to reenable the tick and we can use tick_nohz_idle_restart_tick() for this purpose. A synthetic case is used to to verify this patch, we use CPU0 to send IPI to wake up CPU7 with 50ms interval, CPU7 generate a series hrtimer events (the first interval is 4ms, then the sequential 10 timer events are 1ms interval, same as described above). We do statistics for idle states duration, the unit is second (s), the testing result shows the C2 state (deepest state) staying time can be improved significantly for CPU7 (+7.942s for 10s execution time on CPU7) and all CPUs wide (+13.360s for ~80s of all CPUs execution time). Without patches With patches Difference --- CPUC0 C1 C2 C0 C1 C2 C0 C1 C2 00.000 0.027 9.941 0.055 0.038 9.700 +0.055 +0.010 -0.240 10.045 0.000 9.964 0.019 0.000 9.943 -0.026 +0.000 -0.020 20.002 0.003 10.007 0.035 0.053 9.916 +0.033 +0.049 -0.090 30.000 0.023 9.994 0.024 0.246 9.732 +0.024 +0.222 -0.261 40.032 0.000 9.985 0.015 0.007 9.993 -0.016 +0.007 +0.008 50.001 0.000 9.226 0.039 0.000 9.971 +0.038 +0.000 +0.744 60.000 0.000 0.000 0.036 0.000 5.278 +0.036 +0.000 +5.278 71.894 8.013 0.059 1.509 0.026 8.002 -0.384 -7.987 +7.942 All 1.976 8.068 59.179 1.737 0.372 72.539 -0.239 -7.695 +13.360 Cc: Daniel Lezcano Cc: Vincent Guittot Signed-off-by: Leo Yan --- kernel/sched/idle.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index 1a3e9bd..802286e 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -190,10 +190,18 @@ static void cpuidle_idle_call(void) */ next_state = cpuidle_select(drv, dev, &stop_tick); - if (stop_tick) + if (stop_tick) { tick_nohz_idle_stop_tick(); - else + } else { + /* +* The cpuidle framework says to not stop tick but +* the tick has been stopped yet, so restart it. +*/ + if (tick_nohz_tick_stopped()) + tick_nohz_idle_restart_tick(); + tick_nohz_idle_retain_tick(); + } rcu_idle_enter(); -- 2.7.4
Re: [PATCH] sched: idle: Reenable sched tick for cpuidle request
On Thu, Aug 09, 2018 at 01:47:27PM +0800, Leo Yan wrote: > The idle loop stops tick by respecting the decision from cpuidle > framework, if the condition 'need_resched()' is false without any task > scheduling, the CPU keeps running in the loop in do_idle() and it has no > chance call tick_nohz_idle_exit() to enable the tick. This results in > the idle loop cannot reenable sched tick afterwards, if the idle > governor selects a shallow state, thus the powernightmares issue can > occur again. > > This issue can be easily reproduce with the case on Arm Hikey board: use > CPU0 to send IPI to CPU7, CPU7 receives the IPI and in the callback > function it start a hrtimer with 4ms, so the 4ms timer delta value can > let 'menu' governor to choose deepest state in the next entering idle > time. From then on, CPU7 restarts hrtimer with 1ms interval for total > 10 times, so this can utilize the typical pattern in 'menu' governor to > have prediction for 1ms duration, finally idle governor is easily to > select a shallow state, on Hikey board it usually is to select CPU off > state. From then on, CPU7 stays in this shallow state for long time > until there have other interrupts on it. > > C2: cluster off; C1: CPU off > > Idle state: C2C2C2C2C2C2C2C1 > -> > Interrupt: ^^ ^ ^ ^ ^ ^ ^ ^ > IPI Timer Timer Timer Timer Timer Timer Timer Timer >4ms 1ms 1ms 1ms 1ms 1ms 1ms 1ms > > To fix this issue, the idle loop needs to support reenabling sched tick. > This patch checks the conditions 'stop_tick' is false when the tick is > stopped, this condition indicates the cpuidle governor asks to reenable > the tick and we can use tick_nohz_idle_restart_tick() for this purpose. > > A synthetic case is used to to verify this patch, we use CPU0 to send > IPI to wake up CPU7 with 50ms interval, CPU7 generate a series hrtimer > events (the first interval is 4ms, then the sequential 10 timer events > are 1ms interval, same as described above). We do statistics for idle > states duration, the unit is second (s), the testing result shows the > C2 state (deepest state) staying time can be improved significantly for > CPU7 (+7.942s for 10s execution time on CPU7) and all CPUs wide > (+13.360s for ~80s of all CPUs execution time). > >Without patches With patches Difference > --- > CPUC0 C1 C2 C0 C1 C2 C0 C1 C2 > 00.000 0.027 9.941 0.055 0.038 9.700 +0.055 +0.010 -0.240 > 10.045 0.000 9.964 0.019 0.000 9.943 -0.026 +0.000 -0.020 > 20.002 0.003 10.007 0.035 0.053 9.916 +0.033 +0.049 -0.090 > 30.000 0.023 9.994 0.024 0.246 9.732 +0.024 +0.222 -0.261 > 40.032 0.000 9.985 0.015 0.007 9.993 -0.016 +0.007 +0.008 > 50.001 0.000 9.226 0.039 0.000 9.971 +0.038 +0.000 +0.744 > 60.000 0.000 0.000 0.036 0.000 5.278 +0.036 +0.000 +5.278 > 71.894 8.013 0.059 1.509 0.026 8.002 -0.384 -7.987 +7.942 > All 1.976 8.068 59.179 1.737 0.372 72.539 -0.239 -7.695 +13.360 I found the CPU6 data in upper table is flaw when I read this again, CPU6 has no any ftrace event for idle entering/exiting from the start testing, both two runs have the same issue. so the result is not reliable for CPU6. Retested this case and at the beginning to wake up all CPUs so we can have sane idle ftrace events. Below is result, the conclusion is: CPU7 has improvement for staying in deepest state and there have no regression on other CPUs. Without patches With patches Difference -- CPUC0 C1 C2 C0 C1 C2 C0 C1 C2 00.000 0.021 9.837 0.000 0.022 9.919 +0.000 +0.000 +0.081 10.000 0.003 10.034 0.028 0.000 9.983 +0.028 -0.003 -0.051 20.023 0.031 9.963 0.007 0.019 9.986 -0.016 -0.011 +0.023 30.028 0.003 9.976 0.000 0.008 10.006 -0.027 +0.005 +0.030 40.052 0.000 9.971 0.023 0.000 9.994 -0.028 +0.000 +0.022 50.027 0.000 10.002 0.024 0.000 9.996 -0.002 +0.000 -0.006 60.013 0.000 10.018 0.025 0.000 9.992 +0.011 +0.000 -0.025 71.766 8.041 0.043 1.981 0.030 7.872 +0.214 -8.011 +7.829 All 1.912 8.101 69.847 2.092 0.081 77.752 +0.180 -8.020 +7.905 Another important dependency should to mention, we also need another prerequisite patch "cpuidle: menu: Correct the criteria for stopping tick&qu
Re: [PATCH] sched: idle: Reenable sched tick for cpuidle request
On Thu, Aug 09, 2018 at 12:45:49PM +0200, Peter Zijlstra wrote: > On Thu, Aug 09, 2018 at 01:47:27PM +0800, Leo Yan wrote: > > diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c > > index 1a3e9bd..802286e 100644 > > --- a/kernel/sched/idle.c > > +++ b/kernel/sched/idle.c > > @@ -190,10 +190,18 @@ static void cpuidle_idle_call(void) > > */ > > next_state = cpuidle_select(drv, dev, &stop_tick); > > > > - if (stop_tick) > > + if (stop_tick) { > > tick_nohz_idle_stop_tick(); > > - else > > + } else { > > + /* > > +* The cpuidle framework says to not stop tick but > > +* the tick has been stopped yet, so restart it. > > +*/ > > + if (tick_nohz_tick_stopped()) > > + tick_nohz_idle_restart_tick(); > > + > > I suspect you want an 'else' here. restart_tick already calls > timer_clear_idle(). No, from the testing I found must call retain_tick, otherwise the kernel compliants the warning from tick_nohz_idle_exit() when exit from idle state: WARN_ON_ONCE(ts->timer_expires_base); > > tick_nohz_idle_retain_tick(); > > + } > > > > However, I would rather we stuff all this into retain_tick. Ah, yes; I tested below change and it also have same improvement for idle state with my preivous change; please review if it's okay? diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index da9455a..fd2bfad 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -962,6 +962,10 @@ void tick_nohz_idle_stop_tick(void) void tick_nohz_idle_retain_tick(void) { + /* Restart the tikc if it has been stopped yet. */ + if (tick_nohz_tick_stopped()) + tick_nohz_idle_restart_tick(); + tick_nohz_retain_tick(this_cpu_ptr(&tick_cpu_sched)); /* * Undo the effect of get_next_timer_interrupt() called from Thanks, Leo Yan
Re: [PATCH] sched: idle: Reenable sched tick for cpuidle request
On Thu, Aug 09, 2018 at 05:42:30PM +0200, Rafael J. Wysocki wrote: [...] > >> This issue can be easily reproduce with the case on Arm Hikey board: use > >> CPU0 to send IPI to CPU7, CPU7 receives the IPI and in the callback > >> function it start a hrtimer with 4ms, so the 4ms timer delta value can > >> let 'menu' governor to choose deepest state in the next entering idle > >> time. From then on, CPU7 restarts hrtimer with 1ms interval for total > >> 10 times, so this can utilize the typical pattern in 'menu' governor to > >> have prediction for 1ms duration, finally idle governor is easily to > >> select a shallow state, on Hikey board it usually is to select CPU off > >> state. From then on, CPU7 stays in this shallow state for long time > >> until there have other interrupts on it. > > > > And which means that the above-mentioned code misses this case. > > And I don't really understand how this happens. :-/ > > If menu sees that the tick has been stopped, it sets > data->predicted_us to the minimum of TICK_USEC and > ktime_to_us(delta_next) and the latency requirements comes from PM QoS > (no interactivity boost). Thus the only case when it will say "do not > stop the tick" is when delta_next is below the tick period length, but > that's OK, because it means that there is a timer pending that much > time away, so it doesn't make sense to select a deeper idle state > then. > > If there is a short-interval timer pending every time we go idle, it > doesn't matter that the tick is stopped really, because the other > timer will wake the CPU up anyway. > > Have I missed anything? Yeah, you miss one case is if there haven't anymore timer event, for this case the ktime_to_us(delta_next) is a quite large value and data->predicted_us will be to set TICK_USEC; if HZ=1000 then TICK_USEC is 1000us, on Hikey board if data->predicted_us is 1000us then it's easily to set shallow state (C1) rather than C2. Unfortunately, this is the last time the CPU can predict idle state before it will stay in idle for long period. [...] > >> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c > >> index 1a3e9bd..802286e 100644 > >> --- a/kernel/sched/idle.c > >> +++ b/kernel/sched/idle.c > >> @@ -190,10 +190,18 @@ static void cpuidle_idle_call(void) > >>*/ > >> next_state = cpuidle_select(drv, dev, &stop_tick); > >> > >> - if (stop_tick) > >> + if (stop_tick) { > >> tick_nohz_idle_stop_tick(); > >> - else > >> + } else { > >> + /* > >> + * The cpuidle framework says to not stop tick but > >> + * the tick has been stopped yet, so restart it. > >> + */ > >> + if (tick_nohz_tick_stopped()) > >> + tick_nohz_idle_restart_tick(); > > > > You need an "else" here IMO as Peter said. Yeah, I have replied to Peter, after we restart the tick, I found must to call tick_retain() to clear 'ts->timer_expires_base' to 0, otherwise tick_nohz_idle_exit() reports warning when exit from idle loop. > And I really would prefer to avoid restarting the tick here, because > it is overhead and quite likely unnecessary. I understand the logic when read the code, actually I did some experiments on the function menu_select(), in menu_select() function it discards the consideration for typical pattern interval and it also tries to avoid to enable tick and select more shallow state at the bottom of function. So I agree that in the middle of idles it's redundant to reenable tick and the code is careful thought. But this patch tries to rescue the case at the last time the CPU enter one shallow idle state but without wake up event. > >> + > >> tick_nohz_idle_retain_tick(); > >> + } > >> > >> rcu_idle_enter(); > >> > >> > > > > Please CC cpuidle patches to linux...@vger.kernel.org, that helps a lot. > > Thanks, > Rafael
Re: [PATCH] sched: idle: Reenable sched tick for cpuidle request
On Thu, Aug 09, 2018 at 06:43:55PM +0200, Rafael J. Wysocki wrote: > On Thu, Aug 9, 2018 at 6:29 PM, wrote: > > On Thu, Aug 09, 2018 at 05:42:30PM +0200, Rafael J. Wysocki wrote: > > > > [...] > > > >> >> This issue can be easily reproduce with the case on Arm Hikey board: use > >> >> CPU0 to send IPI to CPU7, CPU7 receives the IPI and in the callback > >> >> function it start a hrtimer with 4ms, so the 4ms timer delta value can > >> >> let 'menu' governor to choose deepest state in the next entering idle > >> >> time. From then on, CPU7 restarts hrtimer with 1ms interval for total > >> >> 10 times, so this can utilize the typical pattern in 'menu' governor to > >> >> have prediction for 1ms duration, finally idle governor is easily to > >> >> select a shallow state, on Hikey board it usually is to select CPU off > >> >> state. From then on, CPU7 stays in this shallow state for long time > >> >> until there have other interrupts on it. > >> > > >> > And which means that the above-mentioned code misses this case. > >> > >> And I don't really understand how this happens. :-/ > >> > >> If menu sees that the tick has been stopped, it sets > >> data->predicted_us to the minimum of TICK_USEC and > >> ktime_to_us(delta_next) and the latency requirements comes from PM QoS > >> (no interactivity boost). Thus the only case when it will say "do not > >> stop the tick" is when delta_next is below the tick period length, but > >> that's OK, because it means that there is a timer pending that much > >> time away, so it doesn't make sense to select a deeper idle state > >> then. > >> > >> If there is a short-interval timer pending every time we go idle, it > >> doesn't matter that the tick is stopped really, because the other > >> timer will wake the CPU up anyway. > >> > >> Have I missed anything? > > > > Yeah, you miss one case is if there haven't anymore timer event, for this > > case the ktime_to_us(delta_next) is a quite large value and > > data->predicted_us will be to set TICK_USEC; if HZ=1000 then TICK_USEC is > > 1000us, on Hikey board if data->predicted_us is 1000us then it's easily > > to set shallow state (C1) rather than C2. Unfortunately, this is the > > last time the CPU can predict idle state before it will stay in idle > > for long period. > > Fair enough, but in that case the governor will want the tick to be > stopped, because expected_interval is TICK_USEC then, so I'm not sure > how the patch helps? Correct, I might introduce confusion at here and I mentioned in another email I have one prerequisite patch [1]: "cpuidle: menu: Correct the criteria for stopping tick", if without this dependency patch, the idle governor will always stop the tick even it selects one shallow state. Sorry when I sent patchs with [1], I didn't send to linux-pm mailing list, do you want me to send these patches to linux-pm? [...] Thanks, Leo Yan [1] https://lkml.org/lkml/2018/8/7/407
[RESEND PATCH v1 0/2] Optimization CPU idle state impacted by tick
After Rafael's patch series 'sched/cpuidle: Idle loop rework' has been merged in mainline kernel, it perfectly resolved the Powernightmares issue [1] with not stopping the tick during the idle loop; we verified this patch series on Arm platform (96boards Hikey620 with octa CA53 CPUs) with the rt-app [2] program to generate workloads: a single task with below combinded configurations with period 5ms and duty cycle 1%/3%/5%/10%/20%/30%/40%. After run these testing cases, we found the CPU cannot stay in deepest idle state as expected, the issues essentialy are related with sched tick: The prominent issue is the criteria for decision stopping tick; now the criteria is checking expected interval is less than TICK_USEC, but this doesn't consider from the perspective of idle state parameters, so we can observe the CPU even has enough sleeping time but it cannot run into deepest idle state; this is very serious for some specific ducy cycle cases. Another issue is after tick keeping running in idle state, the tick can heavily impact on 'menu' governor metrics, especially it will introduce many noise for next event correction factors. This patch series tries to fix these two issues; patch 0001 wants to define a time point to distinguish for stopping or not, this time point consideres the factors from tick period and the maximum target residency and use prediction period to compare this time point to decide if need to stop tick. Patch 0002 wants to always to give compensation for tick event so that dimiss the tick impaction on correction factors for next time prediction. Blow table are comparison results for testing cases between without and with this patch series; we run the test case with single task with period 5ms with different dutycycle, the total running time is 10s. Based on the tracing log, we do statistics for all CPUs for all idle states duration, the unit is second (s), on Hikey board the result shows the C2 state (the CPU deepest state) selection improvement. Some notations are used in the table: state: C0: WFI; C1: CPU OFF; C2: Cluster OFF All testing cases have single task with 5ms period: Without patches With patches Difference - - --- Duty cycle C0C1C2 C0C1 C2 C0C1 C2 1%0.218589 4.208460 87.995606 0.119723 0.847116 91.940569 -0.098866 -3.361344 +3.944963 3%0.801521 5.031361 86.444753 0.147346 0.820276 91.761191 -0.654175 -4.211085 +5.316438 5%0.590236 2.733048 88.284541 0.149237 1.042383 90.490482 -0.440999 -1.690665 +2.205941 10%0.601922 6.282368 84.899870 0.169491 1.304985 89.725754 -0.432431 -4.977383 +4.825884 20%1.381870 8.531687 80.627691 0.307390 3.302562 86.686887 -1.074480 -5.229125 +6.059196 30%1.785221 6.974483 81.083312 0.548050 5.319929 83.551747 -1.237171 -1.654554 +2.468435 40%1.403247 6.474203 80.577176 0.467686 6.366482 81.983384 -0.935561 -0.107721 +1.406208 Leo Yan (2): cpuidle: menu: Correct the criteria for stopping tick cpuidle: menu: Dismiss tick impaction on correction factors drivers/cpuidle/governors/menu.c | 55 1 file changed, 45 insertions(+), 10 deletions(-) -- 2.7.4
[RESEND PATCH v1 1/2] cpuidle: menu: Correct the criteria for stopping tick
The criteria for keeping tick running is the prediction duration is less than TICK_USEC, the mainline kernel configures HZ=250 so TICK_USEC equals to 4000us, so any prediction is less than 4000us will not stop tick and the idle state will be fixed up to one shallow state. On the other hand, let's use 96boards Hikey (CA53 octa-CPUs) as an example, the platform has the deepest C-state is cluster off state which its 'target_residency' is 2700us, if the 'menu' governor predicts the next idle duration is any value fallen into the range [2700us, 4000us), then the 'menu' governor will keep sched tick running and and roll back to a shallow CPU off state rather than cluster off state. Finally we can see the CPU has much less chance to run into deepest state when a task repeatedly running on it with 5000us period and 40% duty cycle (so the task runs for 2000us and then sleep for 3000us in every period). In theory, we should permit the CPU to stay in cluster off state due the CPU sleeping time 3000us is over its 'target_residency' 2700us. This issue is caused by the 'menu' governor's criteria for decision if need to enable tick and roll back to shallow state, the criteria is: 'expected_interval < TICK_USEC'. This criteria is only considering from tick aspect, but it doesn't consider idle state residency so misses better choice for deeper idle state; e.g., the deepest idle state 'target_residency' is less than TICK_USEC, which is quite common on Arm platforms. To fix this issue, this patch is to add one extra variable 'stop_tick_point' to help decision if need to stop tick or not. If prediction is longer than 'stop_tick_point' then we can stop tick, otherwise it will keep tick running. For 'stop_tick_point', except we need to compare prediction period with TICK_USEC, we also need consider from the perspective of deepest idle state 'target_residency'. Finally, 'stop_tick_point' is coming from the minimum value within the deepest idle state 'target_residency' and TICK_USEC. Cc: Daniel Lezcano Cc: Vincent Guittot Signed-off-by: Leo Yan --- drivers/cpuidle/governors/menu.c | 41 ++-- 1 file changed, 39 insertions(+), 2 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 30ab759..2ce4068 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -294,6 +294,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, unsigned int expected_interval; unsigned long nr_iowaiters, cpu_load; ktime_t delta_next; + unsigned int stop_tick_point; if (data->needs_update) { menu_update(drv, dev); @@ -406,11 +407,47 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, idx = 0; /* No states enabled. Must use 0. */ /* +* Decide the time point for tick stopping, if the prediction is before +* this time point it's better to keep the tick enabled and after the +* time point it means the CPU can stay in idle state for enough long +* time so should stop the tick. This point needs to consider two +* factors: the first one is tick period and the another factor is the +* maximum target residency. +* +* We can divide into below cases: +* +* The first case is the prediction is shorter than the maximum target +* residency and also shorter than tick period, this means the +* prediction isn't to use deepest idle state and it's suppose the CPU +* will be waken up within tick period, for this case we should keep +* the tick to be enabled; +* +* The second case is the prediction is shorter than the maximum target +* residency and longer than tick period, for this case the idle state +* selection has already based on the prediction for shallow state and +* we will expect some events can arrive later than tick to wake up the +* CPU; another thinking for this case is the CPU is likely to stay in +* the expected idle state for long while (which should be longer than +* tick period), so it's reasonable to stop the tick. +* +* The third case is the prediction is longer than the maximum target +* residency, but weather it's longer or shorter than tick period; for +* this case we have selected the deepest idle state so it's pointless +* to enable tick to wake up CPU from deepest state. +* +* To summary upper cases, we use the value of min(TICK_USEC, +* maximum_target_residency) as the critical point to decide if need to +* stop tick. +*/ + stop_tick_point = min_t(u
[RESEND PATCH v1 2/2] cpuidle: menu: Dismiss tick impaction on correction factors
If the idle duration predictor detects the tick is triggered, and with meeting the condition 'data->next_timer_us > TICK_USEC', it will give a big compensation for the 'measured' interval; this is purposed to avoid artificially small correction factor values. Unfortunately, this still cannot cover all cases of the tick impaction on correction factors, e.g. if the predicted next event is less than ITCK_USEC, then all wakening up by the ticks will be taken as usual case and reducing exit latency, as results the tick events heavily impacts the correction factors. Moreover, the coming tick sometimes is very soon, especially at the first time when the CPU becomes idle the tick expire time might be vary, so ticks can introduce big deviation on correction factors. If idle governor deliberately doesn't stop the tick timer, the tick event is coming as expected with fixed interval, so the tick event is predictable; if the tick event is coming early than other normal timer event and other possible wakeup events, we need to dismiss the tick impaction on correction factors, this can let the correction factor array is purely used for other wakeup events correctness rather than sched tick. This patch is to check if it's a tick wakeup, it takes the CPU can stay in the idle state for enough time so it gives high compensation for the measured' interval, this can avoid tick impaction on the correction factor array. Cc: Daniel Lezcano Cc: Vincent Guittot Signed-off-by: Leo Yan --- drivers/cpuidle/governors/menu.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 2ce4068..43cbde3 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -525,15 +525,13 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) * assume the state was never reached and the exit latency is 0. */ - if (data->tick_wakeup && data->next_timer_us > TICK_USEC) { + if (data->tick_wakeup) { /* -* The nohz code said that there wouldn't be any events within -* the tick boundary (if the tick was stopped), but the idle -* duration predictor had a differing opinion. Since the CPU -* was woken up by a tick (that wasn't stopped after all), the -* predictor was not quite right, so assume that the CPU could -* have been idle long (but not forever) to help the idle -* duration predictor do a better job next time. +* Since the CPU was woken up by a tick (that wasn't stopped +* after all), the predictor was not quite right, so assume +* that the CPU could have been idle long (but not forever) +* to help the idle duration predictor do a better job next +* time. */ measured_us = 9 * MAX_INTERESTING / 10; } else { -- 2.7.4
Re: [PATCH] sched: idle: Reenable sched tick for cpuidle request
On Thu, Aug 09, 2018 at 11:31:46PM +0200, Rafael J . Wysocki wrote: [...] > > >> And I really would prefer to avoid restarting the tick here, because > > >> it is overhead and quite likely unnecessary. > > > > > > I understand the logic when read the code, actually I did some experiments > > > on the function menu_select(), in menu_select() function it discards the > > > consideration for typical pattern interval and it also tries to avoid to > > > enable tick and select more shallow state at the bottom of function. So I > > > agree that in the middle of idles it's redundant to reenable tick and the > > > code is careful thought. > > > > > > But this patch tries to rescue the case at the last time the CPU enter one > > > shallow idle state but without wake up event. > > > > It is better to avoid entering a shallow state IMO. Let me think > > about that a bit. > > The simple change below should address this issue and I don't quite see > what it can break. It may cause deeper idle states to be selected with > the tick already stopped, but that really shouldn't be problematic, as > (since the tick has been stopped) there are no strict latency constraints, > so even if there is an early wakeup, we should be able to tolerate the > extra latency just fine. > > --- > drivers/cpuidle/governors/menu.c | 10 -- > 1 file changed, 4 insertions(+), 6 deletions(-) > > Index: linux-pm/drivers/cpuidle/governors/menu.c > === > --- linux-pm.orig/drivers/cpuidle/governors/menu.c > +++ linux-pm/drivers/cpuidle/governors/menu.c > @@ -349,14 +349,12 @@ static int menu_select(struct cpuidle_dr >* If the tick is already stopped, the cost of possible short >* idle duration misprediction is much higher, because the CPU >* may be stuck in a shallow idle state for a long time as a > - * result of it. In that case say we might mispredict and try > - * to force the CPU into a state for which we would have stopped > - * the tick, unless a timer is going to expire really soon > - * anyway. > + * result of it. In that case say we might mispredict and use > + * the known time to the closest timer event for the idle state > + * selection. >*/ > if (data->predicted_us < TICK_USEC) > - data->predicted_us = min_t(unsigned int, TICK_USEC, > -ktime_to_us(delta_next)); > + data->predicted_us = ktime_to_us(delta_next); I did the testing on this, but above change cannot really resolve the issue, it misses to handle the case if 'data->predicted_us > TICK_USEC'; if the prediction is longer than TICK_USEC, e.g. data->predicted_us is 2ms, TICK_USEC=1ms; for this case the deepest state will not be chosen and if the data->predicted_us is decided by typical pattern value but not the closest timer, finally the CPU still might stay in shallow state for long time. Actually in the CPU idle loop with the tick is stopped, I think we should achieve two targets: - Ensure the CPU can enter the deepest idle state at the last time it runs into into idle; - In the middle of idles, we will not reenable the tick at all; though the idle states can be chosen a shallow state for short prediction; To achieve the first target, we need to define what's the possible case the CPU might stay into shallow state but cannot be waken up in short time; so for this purpose it's pointeless to compare the value between 'data->predicted_us' and TICK_USEC, so I'd like to check if the next timer event is reliable to wake up CPU in short time, this can be finished by comparison between 'ktime_to_us(delta_next)' with maximum target residency; For the second target, we should not enable the tick again in the idle loop after the tick is stopped, whatever the governor choose any idle state. So how about below changes? I did some verify on this. diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 30ab759..e2de7c2 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -351,18 +351,21 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, data->predicted_us = min(data->predicted_us, expected_interval); if (tick_nohz_tick_stopped()) { + unsigned int delta_next_us = ktime_to_us(delta_next); + /* * If the tick is already stopped, the cost of possible short * idle duration misprediction is much higher, because the CPU * may be stuck in a shallow idle state for a long time as a -* result of it. In that case say we might mispredict and try -* to force the CPU into a state for which we would have stopped -* the tick
Re: [PATCH] sched: idle: Avoid retaining the tick when it has been stopped
On Thu, Aug 09, 2018 at 07:08:34PM +0200, Rafael J . Wysocki wrote: > From: Rafael J. Wysocki > > If the tick has been stopped already, but the governor has not asked to > stop it (which it can do sometimes), the idle loop should invoke > tick_nohz_idle_stop_tick(), to let tick_nohz_stop_tick() take care > of this case properly. IMHO, I don't think this patch is on the right way; from the idle loop side, it needs to provide sane fundamental supports, for example, it can stop or restart the tick per idle governor's request. On the other hand, the idle governors can decide their own policy for how to use the tick in idle loop. This patch seems mixes two things and finally it's possible to couple the implementation between idle loop and 'menu' governor for sched tick usage. I still think my patch to restart the tick is valid :) Thanks, Leo Yan > Fixes: 554c8aa8ecad (sched: idle: Select idle state before stopping the tick) > Signed-off-by: Rafael J. Wysocki > --- > kernel/sched/idle.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > Index: linux-pm/kernel/sched/idle.c > === > --- linux-pm.orig/kernel/sched/idle.c > +++ linux-pm/kernel/sched/idle.c > @@ -190,7 +190,7 @@ static void cpuidle_idle_call(void) >*/ > next_state = cpuidle_select(drv, dev, &stop_tick); > > - if (stop_tick) > + if (stop_tick || tick_nohz_tick_stopped()) > tick_nohz_idle_stop_tick(); > else > tick_nohz_idle_retain_tick(); >
Re: [RESEND PATCH v1 1/2] cpuidle: menu: Correct the criteria for stopping tick
On Thu, Aug 09, 2018 at 10:47:17PM +0200, Rafael J. Wysocki wrote: > On Thu, Aug 9, 2018 at 7:20 PM, Leo Yan wrote: > > The criteria for keeping tick running is the prediction duration is less > > than TICK_USEC, > > Yes, because if the predicted idle duration is less than the tick > period, stopping the tick is pointless overhead (if the governor > predicts a CPU wakeup within the tick period length range, it may as > well let the tick run, because the CPU will be woken up relatively > shortly anyway). > > > the mainline kernel configures HZ=250 so TICK_USEC equals > > To be precise, other values of HZ may be used too, depending on how > the kernel is configured. > > > to 4000us, so any prediction is less than 4000us will not stop tick and > > the idle state will be fixed up to one shallow state. On the other hand, > > let's use 96boards Hikey (CA53 octa-CPUs) as an example, the platform has > > the deepest C-state is cluster off state which its 'target_residency' is > > 2700us, if the 'menu' governor predicts the next idle duration is any > > value fallen into the range [2700us, 4000us), then the 'menu' governor > > will keep sched tick running and and roll back to a shallow CPU off state > > rather than cluster off state. > > Which is as expected. > > > Finally we can see the CPU has much less > > chance to run into deepest state when a task repeatedly running on it > > with 5000us period and 40% duty cycle (so the task runs for 2000us and > > then sleep for 3000us in every period). In theory, we should permit the > > CPU to stay in cluster off state due the CPU sleeping time 3000us is > > over its 'target_residency' 2700us. > > For every particular choice of the criteria you can find a particular > case in which it will be suboptimal. But this patch wants to resolve a common case rather than the particular case on Arm board; this issue is possible to happen cross different platforms. I will demonstrate in below comments. [...] > > diff --git a/drivers/cpuidle/governors/menu.c > > b/drivers/cpuidle/governors/menu.c > > index 30ab759..2ce4068 100644 > > --- a/drivers/cpuidle/governors/menu.c > > +++ b/drivers/cpuidle/governors/menu.c > > @@ -294,6 +294,7 @@ static int menu_select(struct cpuidle_driver *drv, > > struct cpuidle_device *dev, > > unsigned int expected_interval; > > unsigned long nr_iowaiters, cpu_load; > > ktime_t delta_next; > > + unsigned int stop_tick_point; > > > > if (data->needs_update) { > > menu_update(drv, dev); > > @@ -406,11 +407,47 @@ static int menu_select(struct cpuidle_driver *drv, > > struct cpuidle_device *dev, > > idx = 0; /* No states enabled. Must use 0. */ > > > > /* > > +* Decide the time point for tick stopping, if the prediction is > > before > > +* this time point it's better to keep the tick enabled and after > > the > > +* time point it means the CPU can stay in idle state for enough > > long > > +* time so should stop the tick. This point needs to consider two > > +* factors: the first one is tick period and the another factor is > > the > > +* maximum target residency. > > +* > > +* We can divide into below cases: > > +* > > +* The first case is the prediction is shorter than the maximum > > target > > +* residency and also shorter than tick period, this means the > > +* prediction isn't to use deepest idle state and it's suppose the > > CPU > > +* will be waken up within tick period, for this case we should keep > > +* the tick to be enabled; > > +* > > +* The second case is the prediction is shorter than the maximum > > target > > +* residency and longer than tick period, for this case the idle > > state > > +* selection has already based on the prediction for shallow state > > and > > +* we will expect some events can arrive later than tick to wake up > > the > > +* CPU; another thinking for this case is the CPU is likely to stay > > in > > +* the expected idle state for long while (which should be longer > > than > > +* tick period), so it's reasonable to stop the tick. > > +* > > +* The third case is the prediction is longer than the maximum > > target > > +* residency, but we
Re: [RESEND PATCH v1 1/2] cpuidle: menu: Correct the criteria for stopping tick
On Fri, Aug 10, 2018 at 09:22:10AM +0200, Rafael J. Wysocki wrote: > On Fri, Aug 10, 2018 at 9:13 AM, wrote: > > On Thu, Aug 09, 2018 at 10:47:17PM +0200, Rafael J. Wysocki wrote: > >> On Thu, Aug 9, 2018 at 7:20 PM, Leo Yan wrote: > > [cut] > > >> And that will cause the tick to be stopped unnecessarily in certain > >> situations, so why is this better? > > > > Let's see below two cases, the first one case we configure > > TICK_USEC=1000 (1ms) and the second case we configure TICK_USEC=4000 > > (4ms). > > > > Let's assume we do the testing one the same platform and have two runs, > > in the Case 1 we configure HZ=1000 so TICK_USEC=1ms, expected_interval > > is 3ms and deepest idle state target residency is 2ms, finally the idle > > governor will choose the deepest state and skip to calibrate to shallow > > state caused by 'expected_interval' > TICK_USEC; > > > > In the Case 2 we configure HZ=250 so TICK_USE=4ms, expected_interval > > (3ms) and deepest idle state target residency (2ms) are same with the > > Case 1; but because expected_interval < TICK_USEC so the idle governor > > will do calibration to select a shallower state. If we image on one > > platform, the deepest idle state's target residency is smaller value, > > then it has bigger gap with TICK_USEC, the deepest idle state is harder > > to be selected due 'expected_interval' can be easily hit the range > > [Deepest target residency..TICK_USEC). > > > > This patch has no any change for Case 1 and it wants to optimize for > > Case 2 so Case 2 has chance to stay in deepest idle state. I > > understand from the performance pespective, we need to avoid to stop > > tick for shallow states; on the other hand we cannot prevent CPU run > > into deepest idle state just only we want to keep the tick running, > > especially the expected interval is longer than the deepest state > > target residency. > > > > Case 1: > > Deepest idle state's target residency=2ms > > | > > V > > |> time (ms) > > ^ ^ > > | | > > TICK_USEC=1ms expected_interval=3ms > > > > > > Case 2: > > Deepest idle state's target residency = 2ms > > | > > V > > |> time (ms) > > ^ ^ > > | | > > expected_interval = 3ms TICK_USEC = 4ms > > > > > > > >> > unsigned int delta_next_us = ktime_to_us(delta_next); > >> > > >> > *stop_tick = false; > >> > -- > > Well, I don't quite agree with the approach here, then. > > As I said in the previous reply, IMO restarting the stopped tick > before leaving the loop in do_idle() is pointless overhead. It is not > necessary to do that to avoid leaving CPUs in shallow idle states for > too long (I'll send an alternative patch to fix this issue shortly). > > While you may think that pointless overhead is not a problem, I don't > quite agree with that. I disagree this patch will introduce any extra overhead. Firstly, the idle loop doesn't support restarting tick even this patch tells idle loop to restart the tick; secondly this patch is mainly to resolve issue for the CPU cannot stay in deepest state in Case 2, as a side effect it also can tell idle loop to restart the tick for case 3 in below, actually IMHO this makes sense to tell the idle loop to enable the tick but idle loop can ignore this info. Furthermore, we have another thread for the patch to always stop tick after the the tick has been stopped in the idle loop. So this patch is still valid. Case 3: Deepest idle state's target residency = 2ms | V |> time (ms) ^` | \ TICK_USEC=1ms expected_interval = 1.5ms
Re: [RESEND PATCH v1 1/2] cpuidle: menu: Correct the criteria for stopping tick
On Fri, Aug 10, 2018 at 04:49:06PM +0800, Leo Yan wrote: > On Fri, Aug 10, 2018 at 09:22:10AM +0200, Rafael J. Wysocki wrote: > > On Fri, Aug 10, 2018 at 9:13 AM, wrote: > > > On Thu, Aug 09, 2018 at 10:47:17PM +0200, Rafael J. Wysocki wrote: > > >> On Thu, Aug 9, 2018 at 7:20 PM, Leo Yan wrote: > > > > [cut] > > > > >> And that will cause the tick to be stopped unnecessarily in certain > > >> situations, so why is this better? > > > > > > Let's see below two cases, the first one case we configure > > > TICK_USEC=1000 (1ms) and the second case we configure TICK_USEC=4000 > > > (4ms). > > > > > > Let's assume we do the testing one the same platform and have two runs, > > > in the Case 1 we configure HZ=1000 so TICK_USEC=1ms, expected_interval > > > is 3ms and deepest idle state target residency is 2ms, finally the idle > > > governor will choose the deepest state and skip to calibrate to shallow > > > state caused by 'expected_interval' > TICK_USEC; > > > > > > In the Case 2 we configure HZ=250 so TICK_USE=4ms, expected_interval > > > (3ms) and deepest idle state target residency (2ms) are same with the > > > Case 1; but because expected_interval < TICK_USEC so the idle governor > > > will do calibration to select a shallower state. If we image on one > > > platform, the deepest idle state's target residency is smaller value, > > > then it has bigger gap with TICK_USEC, the deepest idle state is harder > > > to be selected due 'expected_interval' can be easily hit the range > > > [Deepest target residency..TICK_USEC). > > > > > > This patch has no any change for Case 1 and it wants to optimize for > > > Case 2 so Case 2 has chance to stay in deepest idle state. I > > > understand from the performance pespective, we need to avoid to stop > > > tick for shallow states; on the other hand we cannot prevent CPU run > > > into deepest idle state just only we want to keep the tick running, > > > especially the expected interval is longer than the deepest state > > > target residency. > > > > > > Case 1: > > > Deepest idle state's target residency=2ms > > > | > > > V > > > |> time (ms) > > > ^ ^ > > > | | > > > TICK_USEC=1ms expected_interval=3ms > > > > > > > > > Case 2: > > > Deepest idle state's target residency = 2ms > > > | > > > V > > > |> time (ms) > > > ^ ^ > > > | | > > > expected_interval = 3ms TICK_USEC = 4ms > > > > > > > > > > > >> > unsigned int delta_next_us = ktime_to_us(delta_next); > > >> > > > >> > *stop_tick = false; > > >> > -- > > > > Well, I don't quite agree with the approach here, then. > > > > As I said in the previous reply, IMO restarting the stopped tick > > before leaving the loop in do_idle() is pointless overhead. It is not > > necessary to do that to avoid leaving CPUs in shallow idle states for > > too long (I'll send an alternative patch to fix this issue shortly). > > > > While you may think that pointless overhead is not a problem, I don't > > quite agree with that. > > I disagree this patch will introduce any extra overhead. > > Firstly, the idle loop doesn't support restarting tick even this patch > tells idle loop to restart the tick; secondly this patch is mainly to > resolve issue for the CPU cannot stay in deepest state in Case 2, as a > side effect it also can tell idle loop to restart the tick for case 3 > in below, actually IMHO this makes sense to tell the idle loop to > enable the tick but idle loop can ignore this info. > > Furthermore, we have another thread for the patch to always stop > tick after the the tick has been stopped in the idle loop. > > So this patch is still valid. Correct for Case 3 as below, actually this case will disappear if we force to set expected_interval=ktime_to_us(delta_next) in another proposaled patch. If so, this patch will have no any chance to introduce extra ticks. expected_interval Deepest idle state's = min(TICK_USEC,ktime_to_us(delta_next)) target residency = 2ms = TICK_USEC = 1ms| | | V V |> time (ms) ^ | TICK_USEC=1ms
Re: [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively
On Fri, Aug 10, 2018 at 09:57:18AM +0200, Rafael J . Wysocki wrote: > From: Rafael J. Wysocki > Subject: [PATCH] cpuidle: menu: Handle stopped tick more aggressively > > Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states > with stopped tick) missed the case when the target residencies of > deep idle states of CPUs are above the tick boundary which may cause > the CPU to get stuck in a shallow idle state for a long time. > > Say there are two CPU idle states available: one shallow, with the > target residency much below the tick boundary and one deep, with > the target residency significantly above the tick boundary. In > that case, if the tick has been stopped already and the expected > next timer event is relatively far in the future, the governor will > assume the idle duration to be equal to TICK_USEC and it will select > the idle state for the CPU accordingly. However, that will cause the > shallow state to be selected even though it would have been more > energy-efficient to select the deep one. > > To address this issue, modify the governor to always assume idle > duration to be equal to the time till the closest timer event if > the tick is not running which will cause the selected idle states > to always match the known CPU wakeup time. > > Also make it always indicate that the tick should be stopped in > that case for consistency. > > Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with > stopped tick) > Reported-by: Leo Yan > Signed-off-by: Rafael J. Wysocki > --- > > -> v2: Initialize first_idx properly in the stopped tick case. > > --- > drivers/cpuidle/governors/menu.c | 55 > +-- > 1 file changed, 25 insertions(+), 30 deletions(-) > > Index: linux-pm/drivers/cpuidle/governors/menu.c > === > --- linux-pm.orig/drivers/cpuidle/governors/menu.c > +++ linux-pm/drivers/cpuidle/governors/menu.c > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr > { > struct menu_device *data = this_cpu_ptr(&menu_devices); > int latency_req = cpuidle_governor_latency_req(dev->cpu); > - int i; > - int first_idx; > - int idx; > + int first_idx = 0; > + int idx, i; > unsigned int interactivity_req; > unsigned int expected_interval; > unsigned long nr_iowaiters, cpu_load; > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr > /* determine the expected residency time, round up */ > data->next_timer_us = > ktime_to_us(tick_nohz_get_sleep_length(&delta_next)); > > + /* > + * If the tick is already stopped, the cost of possible short idle > + * duration misprediction is much higher, because the CPU may be stuck > + * in a shallow idle state for a long time as a result of it. In that > + * case say we might mispredict and use the known time till the closest > + * timer event for the idle state selection. > + */ > + if (tick_nohz_tick_stopped()) { > + data->predicted_us = ktime_to_us(delta_next); > + goto select; > + } > + This introduce two potential issues: - This will totally ignore the typical pattern in idle loop; I observed on the mmc driver can trigger multiple times (> 10 times) with consistent interval; but I have no strong opinion to not use next timer event for this case. - Will this break correction factors when the CPU exit from idle? data->bucket is stale value > get_iowait_load(&nr_iowaiters, &cpu_load); > data->bucket = which_bucket(data->next_timer_us, nr_iowaiters); > > @@ -322,7 +333,6 @@ static int menu_select(struct cpuidle_dr > expected_interval = get_typical_interval(data); > expected_interval = min(expected_interval, data->next_timer_us); > > - first_idx = 0; > if (drv->states[0].flags & CPUIDLE_FLAG_POLLING) { > struct cpuidle_state *s = &drv->states[1]; > unsigned int polling_threshold; > @@ -344,29 +354,15 @@ static int menu_select(struct cpuidle_dr >*/ > data->predicted_us = min(data->predicted_us, expected_interval); > > - if (tick_nohz_tick_stopped()) { > - /* > - * If the tick is already stopped, the cost of possible short > - * idle duration misprediction is much higher, because the CPU > - * may be stuck in a shallow idle state for a long time as a > - * result of it. In that case say we might mispredict and try > - * to force the CPU into a state for which
Re: [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively
On Fri, Aug 10, 2018 at 01:04:22PM +0200, Rafael J. Wysocki wrote: > On Fri, Aug 10, 2018 at 11:20 AM wrote: > > > > On Fri, Aug 10, 2018 at 09:57:18AM +0200, Rafael J . Wysocki wrote: > > > From: Rafael J. Wysocki > > > Subject: [PATCH] cpuidle: menu: Handle stopped tick more aggressively > > > > > > Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states > > > with stopped tick) missed the case when the target residencies of > > > deep idle states of CPUs are above the tick boundary which may cause > > > the CPU to get stuck in a shallow idle state for a long time. > > > > > > Say there are two CPU idle states available: one shallow, with the > > > target residency much below the tick boundary and one deep, with > > > the target residency significantly above the tick boundary. In > > > that case, if the tick has been stopped already and the expected > > > next timer event is relatively far in the future, the governor will > > > assume the idle duration to be equal to TICK_USEC and it will select > > > the idle state for the CPU accordingly. However, that will cause the > > > shallow state to be selected even though it would have been more > > > energy-efficient to select the deep one. > > > > > > To address this issue, modify the governor to always assume idle > > > duration to be equal to the time till the closest timer event if > > > the tick is not running which will cause the selected idle states > > > to always match the known CPU wakeup time. > > > > > > Also make it always indicate that the tick should be stopped in > > > that case for consistency. > > > > > > Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with > > > stopped tick) > > > Reported-by: Leo Yan > > > Signed-off-by: Rafael J. Wysocki > > > --- > > > > > > -> v2: Initialize first_idx properly in the stopped tick case. > > > > > > --- > > > drivers/cpuidle/governors/menu.c | 55 > > > +-- > > > 1 file changed, 25 insertions(+), 30 deletions(-) > > > > > > Index: linux-pm/drivers/cpuidle/governors/menu.c > > > === > > > --- linux-pm.orig/drivers/cpuidle/governors/menu.c > > > +++ linux-pm/drivers/cpuidle/governors/menu.c > > > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr > > > { > > > struct menu_device *data = this_cpu_ptr(&menu_devices); > > > int latency_req = cpuidle_governor_latency_req(dev->cpu); > > > - int i; > > > - int first_idx; > > > - int idx; > > > + int first_idx = 0; > > > + int idx, i; > > > unsigned int interactivity_req; > > > unsigned int expected_interval; > > > unsigned long nr_iowaiters, cpu_load; > > > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr > > > /* determine the expected residency time, round up */ > > > data->next_timer_us = > > > ktime_to_us(tick_nohz_get_sleep_length(&delta_next)); > > > > > > + /* > > > + * If the tick is already stopped, the cost of possible short idle > > > + * duration misprediction is much higher, because the CPU may be > > > stuck > > > + * in a shallow idle state for a long time as a result of it. In > > > that > > > + * case say we might mispredict and use the known time till the > > > closest > > > + * timer event for the idle state selection. > > > + */ > > > + if (tick_nohz_tick_stopped()) { > > > + data->predicted_us = ktime_to_us(delta_next); > > > + goto select; > > > + } > > > + > > > > This introduce two potential issues: > > > > - This will totally ignore the typical pattern in idle loop; I > > observed on the mmc driver can trigger multiple times (> 10 times) > > with consistent interval; > > I'm not sure what you mean by "ignore". You could see after move code from blow to this position, the typical pattern interval will not be accounted; so if in the middle of idles there have a bunch of interrupts with fix pattern, the upper code cannot detect this pattern anymore. [...] > > > - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) || > > > - expected_interval <
Re: [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively
p; > > > > > !tick_nohz_tick_stopped()) { > > > > > > > > I am not sure this logic is right... Why not use below checking, so > > > > for POLLING state we will never ask to stop the tick? > > > > > > > > if (drv->states[idx].flags & CPUIDLE_FLAG_POLLING || > > > > (expected_interval < TICK_USEC && > > > > !tick_nohz_tick_stopped())) { > > > > > > > > > > The only effect of it would be setting stop_tick to false, but why > > > would that matter? > > > > Please consider below situation, not sure if this case is existed or > > not: > > > > step1: first time: enter one idle state with stopping tick; > > step2: second time: select POLLING state and tick_nohz_tick_stopped() > > is true; > > > > So in step2, it cannot set stop_tick to false with below sentence. > > > > > > > unsigned int delta_next_us = ktime_to_us(delta_next); > > > > > > > > > > *stop_tick = false; > > But setting *stop_tick has no effect as far as the current code is > concerned (up to the bug fixed by the other patch). Yeah. > Also the POLLING state can only be selected if there are no other > states matching delta_next available in that case which means that > there will be a timer to break the polling loop soon enough (and BTW > the polling has a built-in timeout too), so I don't really see a > problem here. Ah, now I understand the logic and I misunderstand the POLLING mode before; now agree with this. Sorry for noise. Thanks, Leo Yan
Re: [PATCH v3] cpuidle: menu: Handle stopped tick more aggressively
On Fri, Aug 10, 2018 at 01:15:58PM +0200, Rafael J . Wysocki wrote: > From: Rafael J. Wysocki > > Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states > with stopped tick) missed the case when the target residencies of > deep idle states of CPUs are above the tick boundary which may cause > the CPU to get stuck in a shallow idle state for a long time. > > Say there are two CPU idle states available: one shallow, with the > target residency much below the tick boundary and one deep, with > the target residency significantly above the tick boundary. In > that case, if the tick has been stopped already and the expected > next timer event is relatively far in the future, the governor will > assume the idle duration to be equal to TICK_USEC and it will select > the idle state for the CPU accordingly. However, that will cause the > shallow state to be selected even though it would have been more > energy-efficient to select the deep one. > > To address this issue, modify the governor to always assume idle > duration to be equal to the time till the closest timer event if > the tick is not running which will cause the selected idle states > to always match the known CPU wakeup time. > > Also make it always indicate that the tick should be stopped in > that case for consistency. > > Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with > stopped tick) > Reported-by: Leo Yan > Signed-off-by: Rafael J. Wysocki > --- > > -> v2: Initialize first_idx properly in the stopped tick case. > > -> v3: Compute data->bucket before checking whether or not the tick has been >stopped already to prevent it from becoming stale. > > --- > drivers/cpuidle/governors/menu.c | 55 > +-- > 1 file changed, 25 insertions(+), 30 deletions(-) > > Index: linux-pm/drivers/cpuidle/governors/menu.c > === > --- linux-pm.orig/drivers/cpuidle/governors/menu.c > +++ linux-pm/drivers/cpuidle/governors/menu.c > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr > { > struct menu_device *data = this_cpu_ptr(&menu_devices); > int latency_req = cpuidle_governor_latency_req(dev->cpu); > - int i; > - int first_idx; > - int idx; > + int first_idx = 0; > + int idx, i; > unsigned int interactivity_req; > unsigned int expected_interval; > unsigned long nr_iowaiters, cpu_load; > @@ -311,6 +310,18 @@ static int menu_select(struct cpuidle_dr > data->bucket = which_bucket(data->next_timer_us, nr_iowaiters); > > /* > + * If the tick is already stopped, the cost of possible short idle > + * duration misprediction is much higher, because the CPU may be stuck > + * in a shallow idle state for a long time as a result of it. In that > + * case say we might mispredict and use the known time till the closest > + * timer event for the idle state selection. > + */ > + if (tick_nohz_tick_stopped()) { > + data->predicted_us = ktime_to_us(delta_next); > + goto select; > + } I tried this patch at my side, firstly just clarify this patch is okay for me, but there have other underlying issues I observed the CPU staying shallow idle state with tick stopped, so just note at here. >From my understanding, the rational for this patch is we only use the timer event as the reliable wake up source; if there have one short timer event then we can select shallow state, otherwise we also can select deepest idle state for long expired timer. This means the idle governor needs to know the reliable info for the timer event, so far I observe there at least have two issues for timer event delta value cannot be trusted. The first one issue is caused by timer cancel, I wrote one case for CPU_0 starting a hrtimer with pinned mode with short expire time and when the CPU_0 goes to sleep this short timeout timer can let idle governor selects a shallow state; at the meantime another CPU_1 will be used to try to cancel the timer, my purpose is to cheat CPU_0 so can see the CPU_0 staying in shallow state for long time; it has low percentage to cancel the timer successfully, but I do see seldomly the timer can be canceled successfully so CPU_0 will stay in idle for long time (I cannot explain why the timer cannot be canceled successfully for every time, this might be another issue?). This case is tricky, but it's possible happen in drivers with timer cancel. Another issue is caused by spurious interrupts; if we review the function tick_nohz_get_sleep_length(), it uses 'ts->idle_entrytime' to calculate tick or timer delta, s
Re: [RESEND PATCH v1 1/2] cpuidle: menu: Correct the criteria for stopping tick
On Sun, Aug 12, 2018 at 01:12:41PM +0200, Rafael J. Wysocki wrote: > On Fri, Aug 10, 2018 at 11:03 AM wrote: > > > > On Fri, Aug 10, 2018 at 04:49:06PM +0800, Leo Yan wrote: > > > On Fri, Aug 10, 2018 at 09:22:10AM +0200, Rafael J. Wysocki wrote: > > > > On Fri, Aug 10, 2018 at 9:13 AM, wrote: > > > > > On Thu, Aug 09, 2018 at 10:47:17PM +0200, Rafael J. Wysocki wrote: > > > > >> On Thu, Aug 9, 2018 at 7:20 PM, Leo Yan wrote: > > > > > > > > [cut] > > > > > > > > >> And that will cause the tick to be stopped unnecessarily in certain > > > > >> situations, so why is this better? > > > > > > > > > > Let's see below two cases, the first one case we configure > > > > > TICK_USEC=1000 (1ms) and the second case we configure TICK_USEC=4000 > > > > > (4ms). > > > > > > > > > > Let's assume we do the testing one the same platform and have two > > > > > runs, > > > > > in the Case 1 we configure HZ=1000 so TICK_USEC=1ms, expected_interval > > > > > is 3ms and deepest idle state target residency is 2ms, finally the > > > > > idle > > > > > governor will choose the deepest state and skip to calibrate to > > > > > shallow > > > > > state caused by 'expected_interval' > TICK_USEC; > > > > > > > > > > In the Case 2 we configure HZ=250 so TICK_USE=4ms, expected_interval > > > > > (3ms) and deepest idle state target residency (2ms) are same with the > > > > > Case 1; but because expected_interval < TICK_USEC so the idle governor > > > > > will do calibration to select a shallower state. If we image on one > > > > > platform, the deepest idle state's target residency is smaller value, > > > > > then it has bigger gap with TICK_USEC, the deepest idle state is > > > > > harder > > > > > to be selected due 'expected_interval' can be easily hit the range > > > > > [Deepest target residency..TICK_USEC). > > > > > > > > > > This patch has no any change for Case 1 and it wants to optimize for > > > > > Case 2 so Case 2 has chance to stay in deepest idle state. I > > > > > understand from the performance pespective, we need to avoid to stop > > > > > tick for shallow states; on the other hand we cannot prevent CPU run > > > > > into deepest idle state just only we want to keep the tick running, > > > > > especially the expected interval is longer than the deepest state > > > > > target residency. > > > > > > > > > > Case 1: > > > > > Deepest idle state's target residency=2ms > > > > > | > > > > > V > > > > > |> time (ms) > > > > > ^ ^ > > > > > | | > > > > > TICK_USEC=1ms expected_interval=3ms > > > > > > > > > > > > > > > Case 2: > > > > > Deepest idle state's target residency = 2ms > > > > > | > > > > > V > > > > > |> time (ms) > > > > > ^ ^ > > > > > | | > > > > > expected_interval = 3ms TICK_USEC = 4ms > > > > > > > > > > > > > > > > > > > >> > unsigned int delta_next_us = > > > > >> > ktime_to_us(delta_next); > > > > >> > > > > > >> > *stop_tick = false; > > > > >> > -- > > > > > > > > Well, I don't quite agree with the approach here, then. > > > > > > > > As I said in the previous reply, IMO restarting the stopped tick > > > > before leaving the loop in do_idle() is pointless overhead. It is not > > > > necessary to do that to avoid leaving CPUs in shallow idle states for > > > > too long (I'll send an alternative patch to fix this issue shortly). > > > > > > > > While y
[PATCH v1 5/5] cpuidle: menu: Change to compare prediction with tick delta
The tick stopping decision is made by comparing the prediction with TICK_USEC, if the prediction is shorter than TICK_USEC then this means the CPU is likely waken up before the tick event so it's pointless to stop tick. In reality when make the decision, though the tick period is fixed to TICK_USEC, but the CPU is randomly entering/exiting idle states so the next tick delta is float and should be in the range [0, TICK_USEC]. This can result in wrong decision for stopping tick, e.g. if the prediction is 3ms idle duration and we compare with TICK_USEC=4000 (HZ=250), this can lead to a wrong conclusion is the tick event will be later than the prediction duration so the governor doesn't stop the tick; but in fact the tick is expired for 1ms, so the tick wakes up the CPU ahead and the CPU cannot stay in idle for 3ms as expected. Alternatively, 'data->tick_delta_us' is for the tick delta value and it's a accurate estimation for tick event coming. This patch changes to compare prediction with tick delta rather than comparing with the static tick interval. Signed-off-by: Leo Yan --- drivers/cpuidle/governors/menu.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 566c65c..06d5942 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -300,10 +300,11 @@ static bool menu_decide_stopping_tick(struct cpuidle_driver *drv, return false; /* -* Don't stop the tick if the prediction is shorter than the -* tick period length. +* If the prediction is shorter than the next tick event, means +* the CPU will be waken up before the tick event; don't stop +* the tick. */ - if (data->predicted_us < TICK_USEC) + if (data->predicted_us < data->tick_delta_us) return false; /* -- 2.7.4
[PATCH v1 1/5] cpuidle: menu: Clean up variables usage in menu_select()
The usage for two variables 'data->predicted_us' and 'expected_interval' in menu_select() are confused, especially these two variables are assigned with each other: firstly 'data->predicted_us' is assigned to the minimum value between 'data->predicted_us' and 'expected_interval', so it presents the prediction period for taking account different factors and include consideration for expected interval; but later 'data->predicted_us' is assigned back to 'expected_interval' and from then on the function uses 'expected_interval' to select idle state; this results in 'expected_interval' has two different semantics between the top half and the bottom half of the same function. This patch is to clean up the usage of these two variables, we always use 'data->predicted_us' to present the idle duration predictions and it can be used to compare with idle state target residency or tick boundary for choosing idle state; we purely use 'expected_interval' to record the expected interval value, which is mainly for interval interrupt estimation. Signed-off-by: Leo Yan --- drivers/cpuidle/governors/menu.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 5eb7d6f..b972db1 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -363,7 +363,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, latency_req = interactivity_req; select: - expected_interval = data->predicted_us; /* * Find the idle state with the lowest power while satisfying * our constraints. @@ -386,7 +385,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * expected idle duration so that the tick is retained * as long as that target residency is low enough. */ - expected_interval = drv->states[idx].target_residency; + data->predicted_us = drv->states[idx].target_residency; break; } idx = i; @@ -400,7 +399,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * expected idle duration is shorter than the tick period length. */ if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) || - expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) { + data->predicted_us < TICK_USEC) && !tick_nohz_tick_stopped()) { unsigned int delta_next_us = ktime_to_us(delta_next); *stop_tick = false; -- 2.7.4
[PATCH v1 0/5] Improvement stopping tick decision making in 'menu' idle governor
We found the CPU cannot stay in deepest idle state as expected with running synthetic workloads with mainline kernel on Arm platform (96boards Hikey620 with octa CA53 CPUs). The main issue is the criteria for decision stopping tick; now the criteria is checking expected interval is less than TICK_USEC, but this doesn't consider the next tick detla is float due CPU randomly eneters and exits idle states; furthermore, it's stick to checking TICK_USEC as boundary for decision stopping tick, unfortunately this has hole to select a shallow state with stopping tick, so the CPU stays in shallow state for long time. This patch series is to explore more reasonable making decision for stopping tick and the most important fixing is to avoid powernightmares issue after we apply these criterias for making decisions. Patches 0001 ~ 0003 are used to refactor the variables and structures for more readable code, it also provides a function menu_decide_stopping_tick() which can be used to encapsulate the making decision logics. The last two patches are primary for improvement, patch 0004 'cpuidle: menu: Don't stay in shallow state for a long time' introduces a new criteria (it's a more strict criteria than before) for not stopping tick for shallow state cases; patch 0005 is use the dynamic tick detla to replace the static value TICK_USEC for decision if the tick is expired before or after the prediction, according this comparison we can get conclusion if need to stop tick or not. With more accurate decision for stopping tick, one immediate benefit is the CPUs have more chance to stay in deepest state, it also can avoid to run tick unnecessarily and so avoid a shallower state introduced by tick event. For the testing result in below table, we can see the result proves the improvement by better stopping tick decision making in this patch series, we run the workload generated by rt-app (a single task with period 5ms and duty cycle 1%/3%/5%/10%/20%/30%/40%), the total running time is 60s. We do statistics for all CPUs for all idle states duration, the unit is second (s), for cases (dutycycle=1%/3%/5%/10%/20%) we can see the shallow state C0/C1 duration are reduced and the time has been moved to deepest state, so the deepest state C2 duration can have improvement for ~9s to ~21s. for cases (dutycycle=30%/40%) though we can see the deepest state durations are parity between with and without patch series, but it has a minor improvement for C1 state duration by stealing C0 state duration. Some notations are used in the table: state: C0: WFI; C1: CPU OFF; C2: Cluster OFF All testing cases have single task with 5ms period: Without patches With patches Difference --- --- -- Duty cycleC0 C1 C2 C0 C1 C2C0C1 C2 1%2.397 16.528 471.905 0.9162.688 487.328 -1.481 -13.840 +15.422 3%3.957 20.541 464.434 1.5102.398 485.914 -2.447 -18.143 +21.480 5%2.866 8.609 474.777 1.1662.250 483.983 -1.699-6.359 +9.205 10%2.893 28.753 453.277 1.147 14.134 469.190 -1.745 -14.618 +15.913 20%7.620 41.086 431.735 1.595 35.055 442.482 -6.024-6.030 +10.747 30%4.394 38.328 431.442 1.964 40.857 430.973 -2.430+2.529 -0.468 40%7.390 29.415 430.914 1.789 34.832 431.588 -5.600+5.417 -0.673 P.s. for the testing, applied Rafael's patch 'cpuidle: menu: Handle stopped tick more aggressively' [1] to avoid select unexpected shallow state after tick has been stopped. [1] https://lkml.org/lkml/2018/8/10/259 Leo Yan (5): cpuidle: menu: Clean up variables usage in menu_select() cpuidle: menu: Record tick delta value in struct menu_device cpuidle: menu: Provide menu_decide_stopping_tick() cpuidle: menu: Don't stay in shallow state for a long time cpuidle: menu: Change to compare prediction with tick delta drivers/cpuidle/governors/menu.c | 104 --- 1 file changed, 76 insertions(+), 28 deletions(-) -- 2.7.4
[PATCH v1 2/5] cpuidle: menu: Record tick delta value in struct menu_device
Since the tick delta is used in multiple places in menu_select(), it's better to use single one variable to record this value; furthermore, for more readable we can refactor the code to split a separate function to making decision for stopping tick, which also needs to use tick delta value as one metric for consideration. To achieve these purposes, this patch adds a new item 'tick_delta_us' in struct menu_device to record tick delta value. This patch also is a preparation for optimization stopping tick in sequential patches. Signed-off-by: Leo Yan --- drivers/cpuidle/governors/menu.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index b972db1..83618ab 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -124,6 +124,7 @@ struct menu_device { int tick_wakeup; unsigned intnext_timer_us; + unsigned inttick_delta_us; unsigned intpredicted_us; unsigned intbucket; unsigned intcorrection_factor[BUCKETS]; @@ -305,6 +306,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, /* determine the expected residency time, round up */ data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next)); + data->tick_delta_us = ktime_to_us(delta_next); get_iowait_load(&nr_iowaiters, &cpu_load); data->bucket = which_bucket(data->next_timer_us, nr_iowaiters); @@ -317,7 +319,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * timer event for the idle state selection. */ if (tick_nohz_tick_stopped()) { - data->predicted_us = ktime_to_us(delta_next); + data->predicted_us = data->tick_delta_us; goto select; } @@ -400,11 +402,11 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, */ if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) || data->predicted_us < TICK_USEC) && !tick_nohz_tick_stopped()) { - unsigned int delta_next_us = ktime_to_us(delta_next); *stop_tick = false; - if (idx > 0 && drv->states[idx].target_residency > delta_next_us) { + if (idx > 0 && + drv->states[idx].target_residency > data->tick_delta_us) { /* * The tick is not going to be stopped and the target * residency of the state to be returned is not within @@ -417,7 +419,8 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, continue; idx = i; - if (drv->states[i].target_residency <= delta_next_us) + if (drv->states[i].target_residency <= + data->tick_delta_us) break; } } -- 2.7.4
[PATCH v1 4/5] cpuidle: menu: Don't stay in shallow state for a long time
To avoid staying in a shallow state for a long time, the menu governor relies on not stopping tick when detects the prediction is shorter than the tick event. This is just luckily to cover most cases but cannot say it is completely safe. For example, if the prediction is 2000us and the TICK_USEC=1000 so it's impossible to meet the condition 'data->predicted_us < TICK_USEC' and this lead to stop the tick for a shallow state; finally the CPU is possible to stay in this shallow state for very long time. This patch checks the candidate idle state isn't deepest one and find if the timer will come after more than 2 times of the maximum target residency, though the governor selects a shallow state according to prediction, due the timer is most reliable waken up source but it will come very late, so the CPU has chance to stay in the shallow state for a long time; the patch doesn't stop the tick for this case so can avoid powernightmares issue. Signed-off-by: Leo Yan --- drivers/cpuidle/governors/menu.c | 21 + 1 file changed, 21 insertions(+) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 4f02207..566c65c 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -284,6 +284,10 @@ static unsigned int get_typical_interval(struct menu_device *data) static bool menu_decide_stopping_tick(struct cpuidle_driver *drv, struct menu_device *data, int idx) { + int max_target_residency; + + max_target_residency = drv->states[drv->state_count-1].target_residency; + /* * If the tick has been stopped yet, force to stop it afterwards and * don't give chance to set *stop_tick to false. @@ -302,6 +306,23 @@ static bool menu_decide_stopping_tick(struct cpuidle_driver *drv, if (data->predicted_us < TICK_USEC) return false; + /* +* The candidate idle state isn't deepest one, on the other hand +* the most reliable wakeup source is timer (compare against to +* interrupts) says it will come after more than 2 times of maximum +* target residency, this means the CPU has risk to stay in shallow +* state for more than 2 times of maximum target residency. +* +* It's acceptable to stay in the shallow state at this time but we +* need to ensure to wake up the CPU by tick to check if has better +* choice. Finally it can have choice to select deeper state and +* avoid the CPU staying in shallow state for very long time and +* without any wake up event. +*/ + if (idx < drv->state_count - 1 && + data->next_timer_us > max_target_residency * 2) + return false; + /* Otherwise, let's stop the tick at this time. */ return true; } -- 2.7.4
[PATCH v1 3/5] cpuidle: menu: Provide menu_decide_stopping_tick()
This patch is only for code refactoring and without functional change. It introduces a new function menu_decide_stopping_tick(); we can use this function to focus on making stopping tick decision. With moving out stopping tick decision code, it lets the below piece code is simplized only for the idle state calibration and thus save one indent level in the loop. Signed-off-by: Leo Yan --- drivers/cpuidle/governors/menu.c | 76 ++-- 1 file changed, 50 insertions(+), 26 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 83618ab..4f02207 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -276,6 +276,37 @@ static unsigned int get_typical_interval(struct menu_device *data) } /** + * menu_decide_stopping_tick - decides if need to stopping tick + * @drv: cpuidle driver containing state data + * @data: menu_device structure pointer + * @idx: the candidate idle state index + */ +static bool menu_decide_stopping_tick(struct cpuidle_driver *drv, + struct menu_device *data, int idx) +{ + /* +* If the tick has been stopped yet, force to stop it afterwards and +* don't give chance to set *stop_tick to false. +*/ + if (tick_nohz_tick_stopped()) + return true; + + /* Don't stop the tick if the selected state is a polling one */ + if (drv->states[idx].flags & CPUIDLE_FLAG_POLLING) + return false; + + /* +* Don't stop the tick if the prediction is shorter than the +* tick period length. +*/ + if (data->predicted_us < TICK_USEC) + return false; + + /* Otherwise, let's stop the tick at this time. */ + return true; +} + +/** * menu_select - selects the next idle state to enter * @drv: cpuidle driver containing state data * @dev: the CPU @@ -396,33 +427,26 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, if (idx == -1) idx = 0; /* No states enabled. Must use 0. */ - /* -* Don't stop the tick if the selected state is a polling one or if the -* expected idle duration is shorter than the tick period length. -*/ - if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) || - data->predicted_us < TICK_USEC) && !tick_nohz_tick_stopped()) { - - *stop_tick = false; + *stop_tick = menu_decide_stopping_tick(drv, data, idx); - if (idx > 0 && - drv->states[idx].target_residency > data->tick_delta_us) { - /* -* The tick is not going to be stopped and the target -* residency of the state to be returned is not within -* the time until the next timer event including the -* tick, so try to correct that. -*/ - for (i = idx - 1; i >= 0; i--) { - if (drv->states[i].disabled || - dev->states_usage[i].disable) - continue; - - idx = i; - if (drv->states[i].target_residency <= - data->tick_delta_us) - break; - } + /* Calibrate the idle state according to the tick event. */ + if (!*stop_tick && idx > 0 && + drv->states[idx].target_residency > data->tick_delta_us) { + /* +* The tick is not going to be stopped and the target +* residency of the state to be returned is not within +* the time until the next timer event including the +* tick, so try to correct that. +*/ + for (i = idx - 1; i >= 0; i--) { + if (drv->states[i].disabled || + dev->states_usage[i].disable) + continue; + + idx = i; + if (drv->states[i].target_residency <= + data->tick_delta_us) + break; } } -- 2.7.4
Re: [RESEND PATCH v1 1/2] cpuidle: menu: Correct the criteria for stopping tick
On Mon, Aug 13, 2018 at 10:01:20AM +0200, Rafael J. Wysocki wrote: > On Sun, Aug 12, 2018 at 6:07 PM wrote: > > > > On Sun, Aug 12, 2018 at 01:12:41PM +0200, Rafael J. Wysocki wrote: > > > On Fri, Aug 10, 2018 at 11:03 AM wrote: > > [cut] > > > > > > > Assuming shot noise wakeups, if > > > drv->states[drv->state_count-1].target_residency is less than > > > TICK_USEC, the tick will be stopped for CPUs more often on average > > > with the patch applied (simply because the idle duration range for > > > which it will not be stopped is narrower then). > > > > Yes, good point, so in the new approach I try to change the code > > to compare with next tick delta rather than TICK_USEC, it can keeps > > running tick for the tick with long expire time (similiar with > > comparing with TICK_USEC) but we also can stop tick if the tick is likely > > to break idle residency. > > We tried something similar and the results were not encouraging. > Please see > https://lore.kernel.org/lkml/079e16b6-2e04-2518-0553-66becab22...@tu-dresden.de/ I reviewed the result, in the shared page, it said to use next tick delta and results in the power data increasing, I think this can be explained. If we use prediction to compare with next tick delta rather than TICK_USEC, Thomas gave the expectation is 'This works as a I expect in the sense of stopping the tick more often and allowing deeper sleep states in some cases.'; but we also need to expect the change will give more chance for powernightmares to occurring, if the expect_interval just falls into the range [delta_next_us..TICK_USEC) then idle governor will stop tick after comparing with the next tick delta, but at the meantime the idle governor is very likely to select one shallow state for expect_interval is a small value. So though the change gives more chance for staying deeper state but it also give more chance for staying in shallow state for longer time. >From the testing report, I don't find it do statistics for idle state duration time. The email said 'No more sched tick, no more C1E requests, but increased power.', so this is just for statistics for idle state requests (entering/exiting times), but not for duration statistics. So this isn't clear for me how the difference for idle duration. Because the change gives more chance for powernightmares, so we need use extra method to avoid it. This is why I add one extra patch for this [1], it checks for shallow state with long expire timer, for this case we should not stop the tick. Actually the powernightmares issue is not completely resolved so it still impact the power data; after we really get rid of the impaction of powernightmares, I think we may have chance to see the benefits of comparing with next tick delta. Based on these ideas, I worked out the patch set 'Improvement stopping tick decision making in 'menu' idle governor' [2], the testing result supports the idle duration improvement, which shared in the cover letter. Please help review and let me know if it's doable or not. Thanks for the suggestion. Thanks, Leo Yan [1] https://lkml.org/lkml/2018/8/12/84 [2] https://lkml.org/lkml/2018/8/12/82
Re: [PATCH v5] cpuidle: menu: Handle stopped tick more aggressively
On Tue, Aug 14, 2018 at 12:34:40PM +0200, Rafael J . Wysocki wrote: > From: Rafael J. Wysocki > > Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states > with stopped tick) missed the case when the target residencies of > deep idle states of CPUs are above the tick boundary which may cause > the CPU to get stuck in a shallow idle state for a long time. > > Say there are two CPU idle states available: one shallow, with the > target residency much below the tick boundary and one deep, with > the target residency significantly above the tick boundary. In > that case, if the tick has been stopped already and the expected > next timer event is relatively far in the future, the governor will > assume the idle duration to be equal to TICK_USEC and it will select > the idle state for the CPU accordingly. However, that will cause the > shallow state to be selected even though it would have been more > energy-efficient to select the deep one. > > To address this issue, modify the governor to always use the time > till the closest timer event instead of the predicted idle duration > if the latter is less than the tick period length and the tick has > been stopped already. Also make it extend the search for a matching > idle state if the tick is stopped to avoid settling on a shallow > state if deep states with target residencies above the tick period > length are available. > > In addition, make it always indicate that the tick should be stopped > if it has been stopped already for consistency. > > Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with > stopped tick) > Reported-by: Leo Yan > Signed-off-by: Rafael J. Wysocki > --- > -> v2: Initialize first_idx properly in the stopped tick case. > > v2 -> v3: Compute data->bucket before checking whether or not the tick has > been > stopped already to prevent it from becoming stale. > > v3 -> v4: Allow the usual state selection to be carried out if the tick has > been > stopped in case the predicted idle duration is greater than the tick > period length and a matching state can be found without overriding > the prediction result. > > v4 -> v5: Rework code to be more straightforward. Functionally, it should > behave like the v4. > --- > drivers/cpuidle/governors/menu.c | 36 > 1 file changed, 24 insertions(+), 12 deletions(-) > > Index: linux-pm/drivers/cpuidle/governors/menu.c > === > --- linux-pm.orig/drivers/cpuidle/governors/menu.c > +++ linux-pm/drivers/cpuidle/governors/menu.c > @@ -349,14 +349,12 @@ static int menu_select(struct cpuidle_dr >* If the tick is already stopped, the cost of possible short >* idle duration misprediction is much higher, because the CPU >* may be stuck in a shallow idle state for a long time as a > - * result of it. In that case say we might mispredict and try > - * to force the CPU into a state for which we would have stopped > - * the tick, unless a timer is going to expire really soon > - * anyway. > + * result of it. In that case say we might mispredict and use > + * the known time till the closest timer event for the idle > + * state selection. >*/ > if (data->predicted_us < TICK_USEC) > - data->predicted_us = min_t(unsigned int, TICK_USEC, > -ktime_to_us(delta_next)); > + data->predicted_us = ktime_to_us(delta_next); > } else { > /* >* Use the performance multiplier and the user-configurable > @@ -381,8 +379,22 @@ static int menu_select(struct cpuidle_dr > continue; > if (idx == -1) > idx = i; /* first enabled state */ > - if (s->target_residency > data->predicted_us) > - break; > + if (s->target_residency > data->predicted_us) { > + if (!tick_nohz_tick_stopped()) > + break; > + > + /* > + * If the state selected so far is shallow and this > + * state's target residency matches the time till the > + * closest timer event, select this one to avoid getting > + * stuck in the shallow one for too long. > + */ > + if (drv->states[idx].
Re: [PATCH] cpuidle: menu: Retain tick when shallow state is selected
On Tue, Aug 21, 2018 at 10:44:10AM +0200, Rafael J . Wysocki wrote: > From: Rafael J. Wysocki > > The case addressed by commit 5ef499cd571c (cpuidle: menu: Handle > stopped tick more aggressively) in the stopped tick case is present > when the tick has not been stopped yet too. Namely, if only two CPU > idle states, shallow state A with target residency significantly > below the tick boundary and deep state B with target residency > significantly above it, are available and the predicted idle > duration is above the tick boundary, but below the target residency > of state B, state A will be selected and the CPU may spend indefinite > amount of time in it, which is not quite energy-efficient. > > However, if the tick has not been stopped yet and the governor is > about to select a shallow idle state for the CPU even though the idle > duration predicted by it is above the tick boundary, it should be > fine to wake up the CPU early, so the tick can be retained then and > the governor will have a chance to select a deeper state when it runs > next time. > > [Note that when this really happens, it will make the idle duration > predictor believe that the CPU might be idle longer than predicted, > which will make it more likely to predict longer idle durations going > forward, but that will also cause deeper idle states to be selected > going forward, on average, which is what's needed here.] > > Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with > stopped tick) > Reported-by: Leo Yan > Signed-off-by: Rafael J. Wysocki > --- > > Commit 5ef499cd571c (cpuidle: menu: Handle stopped tick more aggressively) is > in linux-next only at this point. > > --- > drivers/cpuidle/governors/menu.c | 13 - > 1 file changed, 12 insertions(+), 1 deletion(-) > > Index: linux-pm/drivers/cpuidle/governors/menu.c > === > --- linux-pm.orig/drivers/cpuidle/governors/menu.c > +++ linux-pm/drivers/cpuidle/governors/menu.c > @@ -379,9 +379,20 @@ static int menu_select(struct cpuidle_dr > if (idx == -1) > idx = i; /* first enabled state */ > if (s->target_residency > data->predicted_us) { > - if (!tick_nohz_tick_stopped()) > + if (data->predicted_us < TICK_USEC) With this change, for tick has been stopped, it might introduce regression to select a shallow state and it's conflict with the effect of patch "cpuidle: menu: Handle stopped tick more aggressively". > break; > > + if (!tick_nohz_tick_stopped()) { > + /* > + * If the state selected so far is shallow, > + * waking up early won't hurt, so retain the > + * tick in that case and let the governor run > + * again in the next iteration of the loop. > + */ > + expected_interval = > drv->states[idx].target_residency; > + break; > + } > + This is reliable, how we can rely on a shallow idle state target residency to decide if need to stop a tick or not? > /* >* If the state selected so far is shallow and this >* state's target residency matches the time till the >
Re: [PATCH] cpuidle: menu: Retain tick when shallow state is selected
On Wed, Aug 22, 2018 at 08:02:00PM +0800, Leo Yan wrote: [...] > > + if (!tick_nohz_tick_stopped()) { > > + /* > > +* If the state selected so far is shallow, > > +* waking up early won't hurt, so retain the > > +* tick in that case and let the governor run > > +* again in the next iteration of the loop. > > +*/ > > + expected_interval = > > drv->states[idx].target_residency; > > + break; > > + } > > + > > This is reliable, how we can rely on a shallow idle state target > residency to decide if need to stop a tick or not? s/This is reliable/This isn't reliable > > > /* > > * If the state selected so far is shallow and this > > * state's target residency matches the time till the > >
Re: [PATCH v2 0/6] perf cs-etm: Fix tracing packet handling and minor refactoring
Hi Arnaldo, On Wed, Jul 11, 2018 at 03:45:39PM +0800, Leo Yan wrote: Just want to confirm, I saw the first two patches in this serise have been merged into perf/core branch [1], but the last 4 patches are missed. Could I know if you have trouble when you apply them? Or anything need me to follow up? Otherwise you might miss them :) Thanks, Leo Yan [1] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/log/?h=perf/core > Leo Yan (6): > perf cs-etm: Introduce invalid address macro > perf cs-etm: Bail out immediately for instruction sample failure > perf cs-etm: Fix start tracing packet handling > perf cs-etm: Support dummy address value for CS_ETM_TRACE_ON packet > perf cs-etm: Generate branch sample when receiving a CS_ETM_TRACE_ON > packet > perf cs-etm: Generate branch sample for CS_ETM_TRACE_ON packet > > tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 10 ++-- > tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 1 + > tools/perf/util/cs-etm.c| 71 > + > 3 files changed, 68 insertions(+), 14 deletions(-) > > -- > 2.7.4 >
Re: [PATCH v2 0/6] perf cs-etm: Fix tracing packet handling and minor refactoring
On Tue, Jul 31, 2018 at 11:24:45AM -0300, Arnaldo Carvalho de Melo wrote: > Em Mon, Jul 23, 2018 at 09:35:07AM +0800, leo@linaro.org escreveu: > > Hi Arnaldo, > > > > On Wed, Jul 11, 2018 at 03:45:39PM +0800, Leo Yan wrote: > > > > Just want to confirm, I saw the first two patches in this serise have > > been merged into perf/core branch [1], but the last 4 patches are > > missed. > > > > Could I know if you have trouble when you apply them? Or anything > > need me to follow up? Otherwise you might miss them :) > > Just this reminder was enough, checked that it has Mathieu's reviewed-by > tags, applied, will go thru my cross build containers, hopefully all > will test build ok and then it'll go to Ingo's direction. Cool! Thanks a lot, Arnaldo.
Re: [PATCH v9 7/9] coresight: add support for CPU debug module
On Thu, May 11, 2017 at 11:12:32AM -0600, Mathieu Poirier wrote: [...] > > +static int debug_probe(struct amba_device *adev, const struct amba_id *id) > > +{ > > + void __iomem *base; > > + struct device *dev = &adev->dev; > > + struct debug_drvdata *drvdata; > > + struct resource *res = &adev->res; > > + struct device_node *np = adev->dev.of_node; > > + int ret; > > + > > + drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL); > > + if (!drvdata) > > + return -ENOMEM; > > + > > + drvdata->cpu = np ? of_coresight_get_cpu(np) : 0; > > + if (per_cpu(debug_drvdata, drvdata->cpu)) { > > + dev_err(dev, "CPU%d drvdata has been initialized, " > > s/"has been"/"has already been" > > That way it really look like an error message. Will fix. > Also debug_probe() uses dev_xyz() but everywhere else in the driver it is > pr_xyz() - any specific reason for that? I suggest moving to dev_xyz(). Some logs are shared for all device instances, so I think should use pr_xyz(); will spin a new version to change other places to use dev_xyz() if the logs are for specific device instance. > Otherwise things look good. [...] Thanks, Leo Yan
[PATCH] samples/bpf: Add program for CPU state statistics
| pstate 4 : 8|| CPU 5 State: Duration(ms) Distribution cstate 0 : 95 || cstate 1 : 18377|| cstate 2 : 47609|* | pstate 0 : 1165 || pstate 1 : 243 || pstate 2 : 818 || pstate 3 : 1007 || pstate 4 : 9|| CPU 6 State: Duration(ms) Distribution cstate 0 : 102 || cstate 1 : 16629|** | cstate 2 : 49335|** | pstate 0 : 836 || pstate 1 : 253 || pstate 2 : 895 || pstate 3 : 1275 || pstate 4 : 6|| CPU 7 State: Duration(ms) Distribution cstate 0 : 88 || cstate 1 : 16070|** | cstate 2 : 50279|*** | pstate 0 : 948 || pstate 1 : 214 || pstate 2 : 873 || pstate 3 : 952 || pstate 4 : 0| | Cc: Daniel Lezcano Cc: Vincent Guittot Signed-off-by: Leo Yan --- samples/bpf/Makefile | 4 + samples/bpf/cpustat_kern.c | 281 + samples/bpf/cpustat_user.c | 234 + 3 files changed, 519 insertions(+) create mode 100644 samples/bpf/cpustat_kern.c create mode 100644 samples/bpf/cpustat_user.c diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile index adeaa13..e5d747f 100644 --- a/samples/bpf/Makefile +++ b/samples/bpf/Makefile @@ -41,6 +41,7 @@ hostprogs-y += xdp_redirect_map hostprogs-y += xdp_redirect_cpu hostprogs-y += xdp_monitor hostprogs-y += syscall_tp +hostprogs-y += cpustat # Libbpf dependencies LIBBPF := ../../tools/lib/bpf/bpf.o @@ -89,6 +90,7 @@ xdp_redirect_map-objs := bpf_load.o $(LIBBPF) xdp_redirect_map_user.o xdp_redirect_cpu-objs := bpf_load.o $(LIBBPF) xdp_redirect_cpu_user.o xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o +cpustat-objs := bpf_load.o $(LIBBPF) cpustat_user.o # Tell kbuild to always build the programs always := $(hostprogs-y) @@ -137,6 +139,7 @@ always += xdp_redirect_map_kern.o always += xdp_redirect_cpu_kern.o always += xdp_monitor_kern.o always += syscall_tp_kern.o +always += cpustat_kern.o HOSTCFLAGS += -I$(objtree)/usr/include HOSTCFLAGS += -I$(srctree)/tools/lib/ @@ -179,6 +182,7 @@ HOSTLOADLIBES_xdp_redirect_map += -lelf HOSTLOADLIBES_xdp_redirect_cpu += -lelf HOSTLOADLIBES_xdp_monitor += -lelf HOSTLOADLIBES_syscall_tp += -lelf +HOSTLOADLIBES_cpustat += -lelf # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline: # make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang diff --git a/samples/bpf/cpustat_kern.c b/samples/bpf/cpustat_kern.c new file mode 100644 index 000..68c84da --- /dev/null +++ b/samples/bpf/cpustat_kern.c @@ -0,0 +1,281 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include +#include "bpf_helpers.h" + +/* + * The CPU number, cstate number and pstate number are based + * on 96boards Hikey with octa CA53 CPUs. + * + * Every CPU have three idle states for cstate: + * WFI, CPU_OFF, CLUSTER_OFF + * + * Every CPU have 5 operating points: + * 208MHz, 432MHz, 729MHz, 960MHz, 1200MHz + * + * This code is based on these assumption and other platforms + * need to adjust these definitions. + */ +#define MAX_CPU8 +#define MAX_PSTATE_ENTRIES 5 +#define MAX_CSTATE_ENTRIES 3 + +static int cpu_opps[] = { 208000, 432000, 729000, 96, 120 }; + +/* + * my_map structure is used to record cstate and pstate index and + * timestamp (Idx, Ts), when new event incoming we need to update + * combination for new state index and timestamp (Idx`, Ts`). + * + * Based on (Idx, Ts) and (Idx`, Ts`) we can calculate the time + * interval for the previous state: Duration(Idx) = Ts` - Ts. + * + * Every CPU has one below array for recording state index and + * timestamp, and record for cstate and pstate saperately: + * + * +--
Re: [PATCH] samples/bpf: Add program for CPU state statistics
Hi Jesper, On Wed, Jan 31, 2018 at 10:14:27AM +0100, Jesper Dangaard Brouer wrote: > On Wed, 31 Jan 2018 02:29:59 +0800 > Leo Yan wrote: > > > CPU 0 > > State: Duration(ms) Distribution > > cstate 0 : 47555|* | > > cstate 1 : 0|| > > cstate 2 : 0|| > > pstate 0 : 15239|* | > > pstate 1 : 1521 || > > pstate 2 : 3188 |* | > > pstate 3 : 1836 || > > pstate 4 : 94 || > > > > CPU 1 > > State: Duration(ms) Distribution > > cstate 0 : 87 || > > cstate 1 : 16264|** | > > cstate 2 : 50458|*** | > > pstate 0 : 832 || > > pstate 1 : 131 || > > pstate 2 : 825 || > > pstate 3 : 787 || > > pstate 4 : 4|| > > > > CPU 2 > > State: Duration(ms) Distribution > > cstate 0 : 177 || > > cstate 1 : 9363 |* | > > cstate 2 : 55835|*** | > > pstate 0 : 1468 || > > pstate 1 : 350 || > > pstate 2 : 1062 || > > pstate 3 : 1164 || > > pstate 4 : 7|| > > The output gets very long as the number of CPUs grow... > What about using the following output: > > state(ms) cstate-0 cstate-1 cstate-2 pstate-0 pstate-1 pstate-2 > pstate-3 pstate-4 > CPU-047,555 0 015,239 1,521 1,836 > 1,83694 > CPU-18716,26450,458 832 131 825 > 787 4 > CPU-2 177 9,36355,835 1,468 350 1,062 > 1,164 7 > > Look at the code samples/bpf/xdp_redirect_cpu_user.c for an examples of > howto align the columns, and the trick to get printf to pretty print > with thousands separators use %' and setlocale(LC_NUMERIC, "en_US"). Thanks a lot for suggestion. Using row/columns looks good for me, will change code for this way. > P.S. net-next and bpf-next is closed at the moment. > Next time you submit read[1] and [2], especially howto indicate which > tree (bpf vs. bpf-next) the patch is against, as this helps the > workflow of the maintainers. Yeah, will read the docs and follow up the workflow and send new patch to target next merge window. Thanks, Leo Yan > [1] > https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/Documentation/bpf/bpf_devel_QA.txt > [2] Documentation/networking/netdev-FAQ.txt > -- > Best regards, > Jesper Dangaard Brouer > MSc.CS, Principal Kernel Engineer at Red Hat > LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH v4 2/3] mailbox: Add support for Hi3660 mailbox
Hi Jassi, On Fri, Jan 05, 2018 at 09:58:57AM +0530, Jassi Brar wrote: > On Tue, Dec 19, 2017 at 4:45 PM, Kaihua Zhong wrote: > > . > > diff --git a/drivers/mailbox/hi3660-mailbox.c > > b/drivers/mailbox/hi3660-mailbox.c > > new file mode 100644 > > index 000..3ceca40 > > --- /dev/null > > +++ b/drivers/mailbox/hi3660-mailbox.c > > @@ -0,0 +1,319 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > +// Copyright (c) 2017 Hisilicon Limited. > > +// Copyright (c) 2017 Linaro Limited. > > A blank here please. Will fix. > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include Will add blank. > . > > + > > +static inline struct hi3660_mbox *to_hi3660_mbox(struct mbox_controller > > *mbox) > > > inline in .c is out of fasion these days Will remove 'inline'. > > +{ > > + return container_of(mbox, struct hi3660_mbox, controller); > > +} > > + > > > > + > > +static int hi3660_mbox_startup(struct mbox_chan *chan) > > +{ > > + int ret; > > + > > + ret = hi3660_mbox_check_state(chan); > > + if (ret) > > + return ret; > > + > > + ret = hi3660_mbox_unlock(chan); > > + if (ret) > > + return ret; > > + > > + ret = hi3660_mbox_acquire_channel(chan); > > + if (ret) > > + return ret; > > + > > + return 0; > > +} > Do you not need to do "release channel" for shutdown() ? >From my testing, the driver don't need "release channel" operations. > . > > + > > +static int hi3660_mbox_send_data(struct mbox_chan *chan, void *msg) > > +{ > > + return hi3660_mbox_send(chan, msg); > > > Please directly have hi3660_mbox_send as .send_data Will fix. Thanks, Leo Yan
[PATCH v5 0/3] Add support for Hi3660 mailbox driver
Hi3660 mailbox controller is used to send message within multiple processors, MCU, HIFI, etc. This patch series is to implement an initial version for Hi3660 mailbox driver with "automatic acknowledge" mode. The patch set have been verified with Hi3660 stub clock driver, so we can send message to MCU to execute CPU frequency scaling. This is tested on 96boards Hikey960. Changes from v4: * According to Jassi suggestion, refactored mailbox driver and removed "inline" for function declaration; Changes from v3: * According to Jassi suggestion, refined structure name to "struct hi3660_chan_info"; * According to Jassi suggestion, moved channel 'lock'+'acquire' operations into .startup(); Changes from v2: * According to Mark Rutland suggestions, removed sev()/wfe() from driver, the system has no two masters sharing the same channel for data transferring so we don't need these instructions; * Refined DT binding and doc according to Rob suggestions; * Refined driver according to Julien suggestions; Changes from v1: * Added cover letter to track the changelog; * Added document for DT binding; * Refactored and simplized mailbox driver with "automatic ack" mode; * Refined commit logs for patches; Kaihua Zhong (2): mailbox: Add support for Hi3660 mailbox dts: arm64: Add mailbox binding for hi3660 Leo Yan (1): dt-bindings: mailbox: Introduce Hi3660 controller binding .../bindings/mailbox/hisilicon,hi3660-mailbox.txt | 51 arch/arm64/boot/dts/hisilicon/hi3660.dtsi | 8 + drivers/mailbox/Kconfig| 8 + drivers/mailbox/Makefile | 2 + drivers/mailbox/hi3660-mailbox.c | 316 + 5 files changed, 385 insertions(+) create mode 100644 Documentation/devicetree/bindings/mailbox/hisilicon,hi3660-mailbox.txt create mode 100644 drivers/mailbox/hi3660-mailbox.c -- 1.9.1
[PATCH v5 1/3] dt-bindings: mailbox: Introduce Hi3660 controller binding
Introduce a binding for the Hi3660 mailbox controller, the mailbox is used within application processor (AP), communication processor (CP), HIFI and MCU, etc. Acked-by: Rob Herring Signed-off-by: Leo Yan --- .../bindings/mailbox/hisilicon,hi3660-mailbox.txt | 51 ++ 1 file changed, 51 insertions(+) create mode 100644 Documentation/devicetree/bindings/mailbox/hisilicon,hi3660-mailbox.txt diff --git a/Documentation/devicetree/bindings/mailbox/hisilicon,hi3660-mailbox.txt b/Documentation/devicetree/bindings/mailbox/hisilicon,hi3660-mailbox.txt new file mode 100644 index 000..3e5b453 --- /dev/null +++ b/Documentation/devicetree/bindings/mailbox/hisilicon,hi3660-mailbox.txt @@ -0,0 +1,51 @@ +Hisilicon Hi3660 Mailbox Controller + +Hisilicon Hi3660 mailbox controller supports up to 32 channels. Messages +are passed between processors, including application & communication +processors, MCU, HIFI, etc. Each channel is unidirectional and accessed +by using MMIO registers; it supports maximum to 8 words message. + +Controller +-- + +Required properties: +- compatible: : Shall be "hisilicon,hi3660-mbox" +- reg: : Offset and length of the device's register set +- #mbox-cells: : Must be 3 + <&phandle channel dst_irq ack_irq> + phandle : Label name of controller + channel : Channel number + dst_irq : Remote interrupt vector + ack_irq : Local interrupt vector + +- interrupts: : Contains the two IRQ lines for mailbox. + +Example: + +mailbox: mailbox@e896b000 { + compatible = "hisilicon,hi3660-mbox"; + reg = <0x0 0xe896b000 0x0 0x1000>; + interrupts = <0x0 0xc0 0x4>, +<0x0 0xc1 0x4>; + #mbox-cells = <3>; +}; + +Client +-- + +Required properties: +- compatible : See the client docs +- mboxes : Standard property to specify a Mailbox (See ./mailbox.txt) + Cells must match 'mbox-cells' (See Controller docs above) + +Optional properties +- mbox-names : Name given to channels seen in the 'mboxes' property. + +Example: + +stub_clock: stub_clock@e896b500 { + compatible = "hisilicon,hi3660-stub-clk"; + reg = <0x0 0xe896b500 0x0 0x0100>; + #clock-cells = <1>; + mboxes = <&mailbox 13 3 0>; +}; -- 1.9.1
[PATCH v5 2/3] mailbox: Add support for Hi3660 mailbox
From: Kaihua Zhong Hi3660 mailbox controller is used to send message within multiple processors, MCU, HIFI, etc. It supports 32 mailbox channels and every channel can only be used for single transferring direction. Once the channel is enabled, it needs to specify the destination interrupt and acknowledge interrupt, these two interrupt vectors are used to create the connection between the mailbox and interrupt controllers. The data transferring supports two modes, one is named as "automatic acknowledge" mode so after send message the kernel doesn't need to wait for acknowledge from remote and directly return; there have another mode is to rely on handling interrupt for acknowledge. This commit is for initial version driver, which only supports "automatic acknowledge" mode to support CPU clock, which is the only one consumer to use mailbox and has been verified. Later may enhance this driver for interrupt mode (e.g. for supporting HIFI). Signed-off-by: Leo Yan Signed-off-by: Ruyi Wang Signed-off-by: Kaihua Zhong --- drivers/mailbox/Kconfig | 8 + drivers/mailbox/Makefile | 2 + drivers/mailbox/hi3660-mailbox.c | 316 +++ 3 files changed, 326 insertions(+) create mode 100644 drivers/mailbox/hi3660-mailbox.c diff --git a/drivers/mailbox/Kconfig b/drivers/mailbox/Kconfig index ba2f152..de8390d4 100644 --- a/drivers/mailbox/Kconfig +++ b/drivers/mailbox/Kconfig @@ -108,6 +108,14 @@ config TI_MESSAGE_MANAGER multiple processors within the SoC. Select this driver if your platform has support for the hardware block. +config HI3660_MBOX + tristate "Hi3660 Mailbox" + depends on ARCH_HISI && OF + help + An implementation of the hi3660 mailbox. It is used to send message + between application processors and other processors/MCU/DSP. Select + Y here if you want to use Hi3660 mailbox controller. + config HI6220_MBOX tristate "Hi6220 Mailbox" depends on ARCH_HISI diff --git a/drivers/mailbox/Makefile b/drivers/mailbox/Makefile index 4896f8d..cc23c3a 100644 --- a/drivers/mailbox/Makefile +++ b/drivers/mailbox/Makefile @@ -27,6 +27,8 @@ obj-$(CONFIG_TI_MESSAGE_MANAGER) += ti-msgmgr.o obj-$(CONFIG_XGENE_SLIMPRO_MBOX) += mailbox-xgene-slimpro.o +obj-$(CONFIG_HI3660_MBOX) += hi3660-mailbox.o + obj-$(CONFIG_HI6220_MBOX) += hi6220-mailbox.o obj-$(CONFIG_BCM_PDC_MBOX) += bcm-pdc-mailbox.o diff --git a/drivers/mailbox/hi3660-mailbox.c b/drivers/mailbox/hi3660-mailbox.c new file mode 100644 index 000..d6b1600 --- /dev/null +++ b/drivers/mailbox/hi3660-mailbox.c @@ -0,0 +1,316 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (c) 2017 Hisilicon Limited. +// Copyright (c) 2017 Linaro Limited. + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "mailbox.h" + +#define MBOX_CHAN_MAX 32 + +#define MBOX_RX0x0 +#define MBOX_TX0x1 + +#define MBOX_BASE(mbox, ch)((mbox)->base + ((ch) * 0x40)) +#define MBOX_SRC_REG 0x00 +#define MBOX_DST_REG 0x04 +#define MBOX_DCLR_REG 0x08 +#define MBOX_DSTAT_REG 0x0c +#define MBOX_MODE_REG 0x10 +#define MBOX_IMASK_REG 0x14 +#define MBOX_ICLR_REG 0x18 +#define MBOX_SEND_REG 0x1c +#define MBOX_DATA_REG 0x20 + +#define MBOX_IPC_LOCK_REG 0xa00 +#define MBOX_IPC_UNLOCK0x1acce551 + +#define MBOX_AUTOMATIC_ACK 1 + +#define MBOX_STATE_IDLEBIT(4) +#define MBOX_STATE_ACK BIT(7) + +#define MBOX_MSG_LEN 8 + +/** + * Hi3660 mailbox channel information + * + * A channel can be used for TX or RX, it can trigger remote + * processor interrupt to notify remote processor and can receive + * interrupt if has incoming message. + * + * @dst_irq: Interrupt vector for remote processor + * @ack_irq: Interrupt vector for local processor + */ +struct hi3660_chan_info { + unsigned int dst_irq; + unsigned int ack_irq; +}; + +/** + * Hi3660 mailbox controller data + * + * Mailbox controller includes 32 channels and can allocate + * channel for message transferring. + * + * @dev: Device to which it is attached + * @base: Base address of the register mapping region + * @chan: Representation of channels in mailbox controller + * @mchan: Representation of channel info + * @controller:Representation of a communication channel controller + */ +struct hi3660_mbox { + struct device *dev; + void __iomem *base; + struct mbox_chan chan[MBOX_CHAN_MAX]; + struct hi3660_chan_info mchan[MBOX_CHAN_MAX]; + s
[PATCH v5 3/3] dts: arm64: Add mailbox binding for hi3660
From: Kaihua Zhong Add DT binding for mailbox driver. Signed-off-by: Ruyi Wang Signed-off-by: Kaihua Zhong --- arch/arm64/boot/dts/hisilicon/hi3660.dtsi | 8 1 file changed, 8 insertions(+) diff --git a/arch/arm64/boot/dts/hisilicon/hi3660.dtsi b/arch/arm64/boot/dts/hisilicon/hi3660.dtsi index 22570c3..1ef7b94 100644 --- a/arch/arm64/boot/dts/hisilicon/hi3660.dtsi +++ b/arch/arm64/boot/dts/hisilicon/hi3660.dtsi @@ -286,6 +286,14 @@ #reset-cells = <2>; }; + mailbox: mailbox@e896b000 { + compatible = "hisilicon,hi3660-mbox"; + reg = <0x0 0xe896b000 0x0 0x1000>; + interrupts = , +; + #mbox-cells = <3>; + }; + stub_clock: stub_clock@e896b500 { compatible = "hisilicon,hi3660-stub-clk"; reg = <0x0 0xe896b500 0x0 0x0100>; -- 1.9.1
[PATCH] arm64: dts: Hi3660: Remove 'CPU_NAP' idle state
Thanks a lot for Vincent Guittot careful work to find bug for 'CPU_NAP' idle state. At early time, the CPU CA73 CPU_NAP idle state has been supported on Hikey960. Later we found the system has the hang issue and for resolving this issue Hisilicon released new MCU firmware, but unfortunately the new MCU firmware has side effect and results in the CA73 CPU cannot really enter CPU_NAP state and roll back to WFI state. After discussion we cannot see the possibility to enable CA73 CPU_NAP state anymore on Hikey960, based on this conclusion we should remove this state from DT binding. Cc: Daniel Lezcano Cc: Kevin Wang Cc: Vincent Guittot Signed-off-by: Leo Yan --- arch/arm64/boot/dts/hisilicon/hi3660.dtsi | 32 --- 1 file changed, 4 insertions(+), 28 deletions(-) diff --git a/arch/arm64/boot/dts/hisilicon/hi3660.dtsi b/arch/arm64/boot/dts/hisilicon/hi3660.dtsi index ab0b95b..8fddf0d 100644 --- a/arch/arm64/boot/dts/hisilicon/hi3660.dtsi +++ b/arch/arm64/boot/dts/hisilicon/hi3660.dtsi @@ -96,11 +96,7 @@ reg = <0x0 0x100>; enable-method = "psci"; next-level-cache = <&A73_L2>; - cpu-idle-states = < - &CPU_NAP - &CPU_SLEEP - &CLUSTER_SLEEP_1 - >; + cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP_1>; }; cpu5: cpu@101 { @@ -109,11 +105,7 @@ reg = <0x0 0x101>; enable-method = "psci"; next-level-cache = <&A73_L2>; - cpu-idle-states = < - &CPU_NAP - &CPU_SLEEP - &CLUSTER_SLEEP_1 - >; + cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP_1>; }; cpu6: cpu@102 { @@ -122,11 +114,7 @@ reg = <0x0 0x102>; enable-method = "psci"; next-level-cache = <&A73_L2>; - cpu-idle-states = < - &CPU_NAP - &CPU_SLEEP - &CLUSTER_SLEEP_1 - >; + cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP_1>; }; cpu7: cpu@103 { @@ -135,24 +123,12 @@ reg = <0x0 0x103>; enable-method = "psci"; next-level-cache = <&A73_L2>; - cpu-idle-states = < - &CPU_NAP - &CPU_SLEEP - &CLUSTER_SLEEP_1 - >; + cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP_1>; }; idle-states { entry-method = "psci"; - CPU_NAP: cpu-nap { - compatible = "arm,idle-state"; - arm,psci-suspend-param = <0x001>; - entry-latency-us = <7>; - exit-latency-us = <2>; - min-residency-us = <15>; - }; - CPU_SLEEP: cpu-sleep { compatible = "arm,idle-state"; local-timer-stop; -- 2.7.4
Re: [PATCH v3 3/6] coresight: Support panic kdump functionality
On Tue, Jan 09, 2018 at 11:41:26AM -0700, Mathieu Poirier wrote: > On Thu, Dec 21, 2017 at 04:20:12PM +0800, Leo Yan wrote: > > After kernel panic happens, coresight has many useful info can be used > > for analysis. For example, the trace info from ETB RAM can be used to > > check the CPU execution flows before crash. So we can save the tracing > > data from sink devices, and rely on kdump to save DDR content and uses > > "crash" tool to extract coresight dumping from vmcore file. > > > > This patch is to add a simple framework to support panic dump > > functionality; it registers panic notifier, and provide the general APIs > > {coresight_kdump_add|coresight_kdump_del} as helper functions so any > > coresight device can add itself into dump list or delete as needed. > > > > This driver provides helper function coresight_kdump_update() to update > > the dump buffer base address and buffer size. This function can be used > > by coresight driver, e.g. it can be used to save ETM meta data info at > > runtime and these info can be prepared pre panic happening. > > > > When kernel panic happens, the notifier iterates dump list and calls > > callback function to dump device specific info. The panic dump is > > mainly used to dump trace data so we can get to know the execution flow > > before the panic happens. > > > > Signed-off-by: Leo Yan > > --- > > drivers/hwtracing/coresight/Kconfig| 9 ++ > > drivers/hwtracing/coresight/Makefile | 1 + > > .../hwtracing/coresight/coresight-panic-kdump.c| 154 > > + > > drivers/hwtracing/coresight/coresight-priv.h | 13 ++ > > include/linux/coresight.h | 7 + > > 5 files changed, 184 insertions(+) > > create mode 100644 drivers/hwtracing/coresight/coresight-panic-kdump.c > > > > diff --git a/drivers/hwtracing/coresight/Kconfig > > b/drivers/hwtracing/coresight/Kconfig > > index ef9cb3c..4812529 100644 > > --- a/drivers/hwtracing/coresight/Kconfig > > +++ b/drivers/hwtracing/coresight/Kconfig > > @@ -103,4 +103,13 @@ config CORESIGHT_CPU_DEBUG > > properly, please refer Documentation/trace/coresight-cpu-debug.txt > > for detailed description and the example for usage. > > > > +config CORESIGHT_PANIC_KDUMP > > + bool "CoreSight Panic Kdump driver" > > + depends on ARM || ARM64 > > At this time only ETMv4 supports the feature, so it is only ARM64. Thanks for reviewing, Mathieu. Will change to only for ARM64. > > + help > > + This driver provides panic kdump functionality for CoreSight > > + devices. When a kernel panic happen a device supplied callback > > function > > + is used to save trace data to memory. From there we rely on kdump to > > extract > > + the trace data from kernel dump file. > > + > > endif > > diff --git a/drivers/hwtracing/coresight/Makefile > > b/drivers/hwtracing/coresight/Makefile > > index 61db9dd..946fe19 100644 > > --- a/drivers/hwtracing/coresight/Makefile > > +++ b/drivers/hwtracing/coresight/Makefile > > @@ -18,3 +18,4 @@ obj-$(CONFIG_CORESIGHT_SOURCE_ETM4X) += coresight-etm4x.o > > \ > > obj-$(CONFIG_CORESIGHT_DYNAMIC_REPLICATOR) += > > coresight-dynamic-replicator.o > > obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o > > obj-$(CONFIG_CORESIGHT_CPU_DEBUG) += coresight-cpu-debug.o > > +obj-$(CONFIG_CORESIGHT_PANIC_KDUMP) += coresight-panic-kdump.o > > diff --git a/drivers/hwtracing/coresight/coresight-panic-kdump.c > > b/drivers/hwtracing/coresight/coresight-panic-kdump.c > > new file mode 100644 > > index 000..c21d20b > > --- /dev/null > > +++ b/drivers/hwtracing/coresight/coresight-panic-kdump.c > > @@ -0,0 +1,154 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > +// Copyright (c) 2017 Linaro Limited. > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +#include "coresight-priv.h" > > + > > +typedef void (*coresight_cb_t)(void *data); > > + > > +/** > > + * struct coresight_kdump_node - Node information for dump > > + * @cpu: The cpu this node is affined to. > > + * @csdev: Handler for coresight device. > > + * @buf: Pointer for dump buffer. > > + * @buf_size: Length of dump buffer. > > + * @list: Hook to the list. > > + */ > > +struct coresight_kdump_no