Re: [RFCv5 PATCH 18/46] arm: topology: Define TC2 energy and provide it to the scheduler

2015-08-12 Thread Dietmar Eggemann
On 12/08/15 11:33, Peter Zijlstra wrote:
> On Tue, Jul 07, 2015 at 07:24:01PM +0100, Morten Rasmussen wrote:
>> +static struct capacity_state cap_states_cluster_a7[] = {
>> +/* Cluster only power */
>> + { .cap =  150, .power = 2967, }, /*  350 MHz */
>> + { .cap =  172, .power = 2792, }, /*  400 MHz */
>> + { .cap =  215, .power = 2810, }, /*  500 MHz */
>> + { .cap =  258, .power = 2815, }, /*  600 MHz */
>> + { .cap =  301, .power = 2919, }, /*  700 MHz */
>> + { .cap =  344, .power = 2847, }, /*  800 MHz */
>> + { .cap =  387, .power = 3917, }, /*  900 MHz */
>> + { .cap =  430, .power = 4905, }, /* 1000 MHz */
>> +};
> 
> So can I suggest a SCHED_DEBUG validation of the data provided?

Yes, we can do that.

> 
> Given the above table, it _never_ makes sense to run at .cap=150, it
> equally also doesn't make sense to run at .cap = 301.
>

Absolutely right.


> So please add a SCHED_DEBUG test on domain creation that validates that
> not only is the .cap monotonically increasing, but the .power is too.

The requirement for the current EAS code to work is even stricter. We require
not only monotonically increasing values for .cap and .power but also that
the energy efficiency (.cap/.power) is monotonically decreasing. Otherwise
we can't stop the search for an appropriate new OPP in find_new_capacity()
as soon as .cap >= the current 'max. group usage', because we can't assume
that this OPP is the most energy-efficient one. Only if efficiency never
rises with .cap is the first OPP that covers the usage also the most
efficient one that does.
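To make the early-exit argument concrete, here is a minimal C sketch under stated assumptions: struct capacity_state is reduced to its .cap and .power fields, and first_fitting_opp() is a hypothetical, simplified stand-in for the stopping rule described above, not the actual find_new_capacity() code.

```c
#include <stddef.h>

/* Minimal stand-in for the kernel's struct capacity_state (only these fields). */
struct capacity_state {
	unsigned long cap;	/* compute capacity at this OPP */
	unsigned long power;	/* power at this OPP, in consistent arbitrary units */
};

/* TC2 A7 cluster-only data, copied from the table quoted above (350..1000 MHz). */
static const struct capacity_state a7_cluster[] = {
	{ 150, 2967 }, { 172, 2792 }, { 215, 2810 }, { 258, 2815 },
	{ 301, 2919 }, { 344, 2847 }, { 387, 3917 }, { 430, 4905 },
};

/*
 * Hypothetical, simplified early-exit search (not the real find_new_capacity()):
 * return the index of the first OPP whose .cap covers max_usage. Stopping here
 * is only energy-optimal if .cap/.power never rises again at higher indices.
 */
static size_t first_fitting_opp(const struct capacity_state *cs, size_t nr,
				unsigned long max_usage)
{
	for (size_t i = 0; i < nr; i++)
		if (cs[i].cap >= max_usage)
			return i;
	return nr - 1;	/* nothing fits: fall back to the highest OPP */
}
```

With max_usage = 300 the search stops at the 700 MHz entry (.cap = 301, .power = 2919), even though the 800 MHz entry (.cap = 344, .power = 2847) offers more capacity for less power.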

For the example above we get .cap/.power = [0.05 0.06 0.08 0.09 0.10 0.12
0.10 0.09], so from this perspective only the last 3 OPPs (800, 900 and
1000 MHz) make sense on our TC2 test chip platform.

So we should check for monotonically decreasing (.cap/.power) values.
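One possible shape for that check, as a standalone C sketch. The helper names are hypothetical and the struct is a stand-in for the kernel's. Efficiencies are compared by cross-multiplying .cap and .power rather than dividing, so no floating point or rounding is involved. efficiency_x1000() only reproduces the ratios listed above in integer form.

```c
#include <stddef.h>

/* Minimal stand-in for the kernel's struct capacity_state. */
struct capacity_state {
	unsigned long cap;
	unsigned long power;
};

/* TC2 A7 cluster-only data, copied from the table quoted above (350..1000 MHz). */
static const struct capacity_state a7_cluster[] = {
	{ 150, 2967 }, { 172, 2792 }, { 215, 2810 }, { 258, 2815 },
	{ 301, 2919 }, { 344, 2847 }, { 387, 3917 }, { 430, 4905 },
};

/* Efficiency .cap/.power scaled by 1000 (truncated), for eyeballing only. */
static unsigned long efficiency_x1000(const struct capacity_state *cs)
{
	return cs->cap * 1000 / cs->power;
}

/* Non-zero if OPP a has strictly higher .cap/.power than OPP b (no division). */
static int more_efficient(const struct capacity_state *a,
			  const struct capacity_state *b)
{
	return a->cap * b->power > b->cap * a->power;
}

/*
 * Return the first index whose efficiency is higher than its predecessor's,
 * or 0 if .cap/.power is monotonically decreasing over the whole table.
 */
static size_t first_efficiency_violation(const struct capacity_state *cs,
					 size_t nr)
{
	for (size_t i = 1; i < nr; i++)
		if (more_efficient(&cs[i], &cs[i - 1]))
			return i;
	return 0;
}

/* Lowest index from which efficiency never increases again (nr must be > 0). */
static size_t first_useful_opp(const struct capacity_state *cs, size_t nr)
{
	size_t start = nr - 1;

	while (start > 0 && !more_efficient(&cs[start], &cs[start - 1]))
		start--;
	return start;
}
```

On the A7 cluster table, this reports index 1 (400 MHz is more efficient than 350 MHz) as the first violation. It also returns index 5 (800 MHz) as the start of the usable tail, which matches the three OPPs named above.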

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFCv5 PATCH 18/46] arm: topology: Define TC2 energy and provide it to the scheduler

2015-08-12 Thread Peter Zijlstra
On Tue, Jul 07, 2015 at 07:24:01PM +0100, Morten Rasmussen wrote:
> +static struct capacity_state cap_states_cluster_a7[] = {
> + /* Cluster only power */
> +  { .cap =  150, .power = 2967, }, /*  350 MHz */
> +  { .cap =  172, .power = 2792, }, /*  400 MHz */
> +  { .cap =  215, .power = 2810, }, /*  500 MHz */
> +  { .cap =  258, .power = 2815, }, /*  600 MHz */
> +  { .cap =  301, .power = 2919, }, /*  700 MHz */
> +  { .cap =  344, .power = 2847, }, /*  800 MHz */
> +  { .cap =  387, .power = 3917, }, /*  900 MHz */
> +  { .cap =  430, .power = 4905, }, /* 1000 MHz */
> + };

So can I suggest a SCHED_DEBUG validation of the data provided?

Given the above table, it _never_ makes sense to run at .cap=150, and
equally it doesn't make sense to run at .cap = 301: in both cases the next
OPP delivers more capacity for less power.

So please add a SCHED_DEBUG test on domain creation that validates that
not only is the .cap monotonically increasing, but the .power is too.
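Here is a standalone C sketch of the suggested validation, assuming that "monotonically increasing" means strictly increasing. In the kernel this would presumably sit under CONFIG_SCHED_DEBUG and report via printk. Here fprintf(stderr) stands in for that, and check_cap_states_monotonic() is a hypothetical name.

```c
#include <stddef.h>
#include <stdio.h>

/* Minimal stand-in for the kernel's struct capacity_state. */
struct capacity_state {
	unsigned long cap;
	unsigned long power;
};

/* TC2 A7 cluster-only data, copied from the table quoted above (350..1000 MHz). */
static const struct capacity_state a7_cluster[] = {
	{ 150, 2967 }, { 172, 2792 }, { 215, 2810 }, { 258, 2815 },
	{ 301, 2919 }, { 344, 2847 }, { 387, 3917 }, { 430, 4905 },
};

/*
 * Hypothetical debug-time check, e.g. run once per sched group when the
 * energy data is attached at domain creation. Requires strictly increasing
 * .cap and .power; reports every violation and returns how many were found.
 */
static int check_cap_states_monotonic(const char *name,
				      const struct capacity_state *cs, size_t nr)
{
	int bad = 0;

	for (size_t i = 1; i < nr; i++) {
		if (cs[i].cap <= cs[i - 1].cap) {
			fprintf(stderr, "%s: .cap not increasing at index %zu\n",
				name, i);
			bad++;
		}
		if (cs[i].power <= cs[i - 1].power) {
			fprintf(stderr, "%s: .power not increasing at index %zu\n",
				name, i);
			bad++;
		}
	}
	return bad;
}
```

For the A7 cluster table, it flags index 1 (.power drops from 2967 to 2792) and index 5 (.power drops from 2919 to 2847). These are exactly the .cap = 150 and .cap = 301 entries called out above.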


[RFCv5 PATCH 18/46] arm: topology: Define TC2 energy and provide it to the scheduler

2015-07-07 Thread Morten Rasmussen
From: Dietmar Eggemann 

This patch is only here to be able to test provisioning of energy-related
data from an arch topology shim layer to the scheduler. Since there is
currently no code that extracts energy-related data from the DTB or ACPI
and processes it in the topology shim layer, the contents of the
sched_group_energy structures as well as the idle_state and capacity_state
arrays are hard-coded here.

This patch defines the sched_group_energy structures as well as the
idle_state and capacity_state arrays for the cluster (relating to sched
groups (sgs) at the DIE sched domain level) and for the core (relating to
sgs at the MC sd level), for both the Cortex-A7 and the Cortex-A15.
It further provides the related implementations of the sched_domain_energy_f
functions (cpu_cluster_energy() and cpu_core_energy()).

To be able to propagate this information from the topology shim layer to
the scheduler, the elements of the arm_topology[] table have been
provisioned with the appropriate sched_domain_energy_f functions.
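For readers unfamiliar with the callback scheme, the following standalone C sketch shows one way per-level energy data could be handed out through function pointers. The struct fields follow the names used in this patch. The types, the energy_fn typedef, the CPU numbering (cpu_is_a7()) and the function bodies are assumptions for illustration and are not the code of this patch. Only the first two OPPs of each table are copied.

```c
#include <stddef.h>

/* Stand-ins mirroring the field names used in the patch; kernel types may differ. */
struct idle_state { unsigned long power; };
struct capacity_state { unsigned long cap; unsigned long power; };

struct sched_group_energy {
	unsigned int nr_idle_states;
	struct idle_state *idle_states;
	unsigned int nr_cap_states;
	struct capacity_state *cap_states;
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))
#define SGE(idle, cap) { ARRAY_LEN(idle), idle, ARRAY_LEN(cap), cap }

/* Values from the patch; only the first two OPPs of each table, for brevity. */
static struct idle_state idle_cluster_a7[]  = { { 25 }, { 10 } };
static struct idle_state idle_cluster_a15[] = { { 70 }, { 25 } };
static struct idle_state idle_core[]        = { { 0 } };	/* WFI, shared here */
static struct capacity_state cap_cluster_a7[]  = { { 150, 2967 }, { 172, 2792 } };
static struct capacity_state cap_cluster_a15[] = { { 426, 7920 }, { 512, 8165 } };
static struct capacity_state cap_core_a7[]     = { { 150, 187 }, { 172, 275 } };
static struct capacity_state cap_core_a15[]    = { { 426, 2021 }, { 512, 2312 } };

static const struct sched_group_energy energy_cluster_a7  = SGE(idle_cluster_a7, cap_cluster_a7);
static const struct sched_group_energy energy_cluster_a15 = SGE(idle_cluster_a15, cap_cluster_a15);
static const struct sched_group_energy energy_core_a7     = SGE(idle_core, cap_core_a7);
static const struct sched_group_energy energy_core_a15    = SGE(idle_core, cap_core_a15);

/* Hypothetical CPU numbering: cpus 0-1 are A15, cpus 2-4 are A7. */
static int cpu_is_a7(int cpu)
{
	return cpu >= 2;
}

/* Callback type modelled loosely on sched_domain_energy_f (signature assumed). */
typedef const struct sched_group_energy *(*energy_fn)(int cpu);

static const struct sched_group_energy *cpu_cluster_energy(int cpu)
{
	return cpu_is_a7(cpu) ? &energy_cluster_a7 : &energy_cluster_a15;
}

static const struct sched_group_energy *cpu_core_energy(int cpu)
{
	return cpu_is_a7(cpu) ? &energy_core_a7 : &energy_core_a15;
}

/* Stand-in for an arm_topology[]-style table: one energy callback per level. */
struct topology_level {
	const char *name;
	energy_fn energy;
};

static const struct topology_level topology[] = {
	{ "MC",  cpu_core_energy },	/* per-core data */
	{ "DIE", cpu_cluster_energy },	/* per-cluster data */
};
```

The design point is that the scheduler only needs a per-level callback taking a CPU number. Which table comes back is entirely the arch layer's decision.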

cc: Russell King 

Signed-off-by: Dietmar Eggemann 
---
 arch/arm/kernel/topology.c | 118 +++--
 1 file changed, 115 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index b35d3e5..bbe20c7 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -274,6 +274,119 @@ void store_cpu_topology(unsigned int cpuid)
cpu_topology[cpuid].socket_id, mpidr);
 }
 
+/*
+ * ARM TC2 specific energy cost model data. There are no unit requirements for
+ * the data. Data can be normalized to any reference point, but the
+ * normalization must be consistent. That is, one bogo-joule/watt must be the
+ * same quantity for all data, but we don't care what it is.
+ */
+static struct idle_state idle_states_cluster_a7[] = {
+	{ .power = 25 }, /* WFI */
+	{ .power = 10 }, /* cluster-sleep-l */
+};
+
+static struct idle_state idle_states_cluster_a15[] = {
+	{ .power = 70 }, /* WFI */
+	{ .power = 25 }, /* cluster-sleep-b */
+};
+
+static struct capacity_state cap_states_cluster_a7[] = {
+	/* Cluster only power */
+	{ .cap =  150, .power = 2967, }, /*  350 MHz */
+	{ .cap =  172, .power = 2792, }, /*  400 MHz */
+	{ .cap =  215, .power = 2810, }, /*  500 MHz */
+	{ .cap =  258, .power = 2815, }, /*  600 MHz */
+	{ .cap =  301, .power = 2919, }, /*  700 MHz */
+	{ .cap =  344, .power = 2847, }, /*  800 MHz */
+	{ .cap =  387, .power = 3917, }, /*  900 MHz */
+	{ .cap =  430, .power = 4905, }, /* 1000 MHz */
+};
+
+static struct capacity_state cap_states_cluster_a15[] = {
+	/* Cluster only power */
+	{ .cap =  426, .power =  7920, }, /*  500 MHz */
+	{ .cap =  512, .power =  8165, }, /*  600 MHz */
+	{ .cap =  597, .power =  8172, }, /*  700 MHz */
+	{ .cap =  682, .power =  8195, }, /*  800 MHz */
+	{ .cap =  768, .power =  8265, }, /*  900 MHz */
+	{ .cap =  853, .power =  8446, }, /* 1000 MHz */
+	{ .cap =  938, .power = 11426, }, /* 1100 MHz */
+	{ .cap = 1024, .power = 15200, }, /* 1200 MHz */
+};
+
+static struct sched_group_energy energy_cluster_a7 = {
+	.nr_idle_states = ARRAY_SIZE(idle_states_cluster_a7),
+	.idle_states    = idle_states_cluster_a7,
+	.nr_cap_states  = ARRAY_SIZE(cap_states_cluster_a7),
+	.cap_states     = cap_states_cluster_a7,
+};
+
+static struct sched_group_energy energy_cluster_a15 = {
+	.nr_idle_states = ARRAY_SIZE(idle_states_cluster_a15),
+	.idle_states    = idle_states_cluster_a15,
+	.nr_cap_states  = ARRAY_SIZE(cap_states_cluster_a15),
+	.cap_states     = cap_states_cluster_a15,
+};
+
+static struct idle_state idle_states_core_a7[] = {
+	{ .power = 0 }, /* WFI */
+};
+
+static struct idle_state idle_states_core_a15[] = {
+	{ .power = 0 }, /* WFI */
+};
+
+static struct capacity_state cap_states_core_a7[] = {
+	/* Power per cpu */
+	{ .cap =  150, .power =  187, }, /*  350 MHz */
+	{ .cap =  172, .power =  275, }, /*  400 MHz */
+	{ .cap =  215, .power =  334, }, /*  500 MHz */
+	{ .cap =  258, .power =  407, }, /*  600 MHz */
+	{ .cap =  301, .power =  447, }, /*  700 MHz */
+	{ .cap =  344, .power =  549, }, /*  800 MHz */
+	{ .cap =  387, .power =  761, }, /*  900 MHz */
+	{ .cap =  430, .power = 1024, }, /* 1000 MHz */
+};
+
+static struct capacity_state cap_states_core_a15[] = {
+	/* Power per cpu */
+	{ .cap =  426, .power = 2021, }, /*  500 MHz */
+	{ .cap =  512, .power = 2312, }, /*  600 MHz */
+	{ .cap =  597, .power = 2756, }, /*  700 MHz */
+	{ .cap =  682, .power = 3125, }, /*  800 MHz */
+	{ .cap =  768, .power = 3524, }, /*  900 MHz */
+	{ .cap =  853, .power = 3846, }, /* 
