from:"Gautham R. Shenoy"

[PATCH v2 1/2] cpu: Offline state Framework.

2009-08-28 Thread Gautham R Shenoy

Provide an interface by which the system administrator can decide what state
should the CPU go to when it is offlined.

To query the hotplug states, on needs to perform a read on the sysfs tunable:
/sys/devices/system/cpu/cpunumber/available_hotplug_states

To query or set the current state for a particular CPU, one needs to
use the sysfs interface:
/sys/devices/system/cpu/cpunumber/current_state

This patch implements the architecture independent bits of the
cpu-offline-state framework.

Architectures which want to expose the multiple offline-states to the
userspace are expected to write a driver which can register
with this framework.

Such a driver should:
- Implement the callbacks defined in the structure struct cpu_offline_driver
  which can be called into by this framework when the corresponding
  sysfs interfaces are read or written into.

- Ensure that the following operation puts the CPU in the same state
  as it did in the absence of the driver.
echo 0  /sys/devices/system/cpu/cpunumber/online

This framework also serializes the writes to the current_state
with respect to with the writes to the online sysfs tunable.

Signed-off-by: Gautham R Shenoy e...@in.ibm.com
---
 drivers/base/cpu.c  |  176 ---
 include/linux/cpu.h |   30 +
 2 files changed, 197 insertions(+), 9 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index e62a4cc..73efc55 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -20,7 +20,161 @@ EXPORT_SYMBOL(cpu_sysdev_class);
 
 static DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
 
+struct sys_device *get_cpu_sysdev(unsigned cpu)
+{
+   if (cpu  nr_cpu_ids  cpu_possible(cpu))
+   return per_cpu(cpu_sys_devices, cpu);
+   else
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(get_cpu_sysdev);
+
+
 #ifdef CONFIG_HOTPLUG_CPU
+
+struct cpu_offline_driver *cpu_offline_driver;
+static DEFINE_MUTEX(cpu_offline_driver_lock);
+
+ssize_t show_available_states(struct sys_device *dev,
+   struct sysdev_attribute *attr, char *buf)
+{
+   struct cpu *cpu = container_of(dev, struct cpu, sysdev);
+   int cpu_num = cpu-sysdev.id;
+   ssize_t ret;
+
+   mutex_lock(cpu_offline_driver_lock);
+   if (!cpu_offline_driver) {
+   ret = -EEXIST;
+   goto out_unlock;
+   }
+
+   ret = cpu_offline_driver-read_available_states(cpu_num, buf);
+
+out_unlock:
+   mutex_unlock(cpu_offline_driver_lock);
+
+   return ret;
+
+}
+
+ssize_t show_current_state(struct sys_device *dev,
+   struct sysdev_attribute *attr, char *buf)
+{
+   struct cpu *cpu = container_of(dev, struct cpu, sysdev);
+   int cpu_num = cpu-sysdev.id;
+   ssize_t ret = 0;
+
+   mutex_lock(cpu_offline_driver_lock);
+   if (!cpu_offline_driver) {
+   ret = -EEXIST;
+   goto out_unlock;
+   }
+
+   ret = cpu_offline_driver-read_current_state(cpu_num, buf);
+
+out_unlock:
+   mutex_unlock(cpu_offline_driver_lock);
+
+   return ret;
+}
+
+ssize_t store_current_state(struct sys_device *dev,
+   struct sysdev_attribute *attr,
+   const char *buf, size_t count)
+{
+   struct cpu *cpu = container_of(dev, struct cpu, sysdev);
+   int cpu_num = cpu-sysdev.id;
+   ssize_t ret = count;
+
+   mutex_lock(cpu_offline_driver_lock);
+   if (!cpu_offline_driver) {
+   ret = -EEXIST;
+   goto out_unlock;
+   }
+
+   ret = cpu_offline_driver-write_current_state(cpu_num, buf);
+
+out_unlock:
+   mutex_unlock(cpu_offline_driver_lock);
+
+   if (ret = 0)
+   ret = count;
+   return ret;
+}
+
+static SYSDEV_ATTR(available_hotplug_states, 0444, show_available_states,
+   NULL);
+static SYSDEV_ATTR(current_state, 0644, show_current_state,
+   store_current_state);
+
+/* Should be called with cpu_offline_driver_lock held */
+void cpu_offline_driver_add_cpu(struct sys_device *cpu_sys_dev)
+{
+   if (!cpu_offline_driver || !cpu_sys_dev)
+   return;
+
+   sysdev_create_file(cpu_sys_dev, attr_available_hotplug_states);
+   sysdev_create_file(cpu_sys_dev, attr_current_state);
+}
+
+/* Should be called with cpu_offline_driver_lock held */
+void cpu_offline_driver_remove_cpu(struct sys_device *cpu_sys_dev)
+{
+   if (!cpu_offline_driver || !cpu_sys_dev)
+   return;
+
+   sysdev_remove_file(cpu_sys_dev, attr_available_hotplug_states);
+   sysdev_remove_file(cpu_sys_dev, attr_current_state);
+
+}
+
+int register_cpu_offline_driver(struct cpu_offline_driver *arch_cpu_driver)
+{
+   int ret = 0;
+   int cpu;
+   mutex_lock(cpu_offline_driver_lock);
+
+   if (cpu_offline_driver != NULL) {
+   ret = -EEXIST

[PATCH v2 0/2] cpu: pseries: Offline state framework.

2009-08-28 Thread Gautham R Shenoy

Hi,

This is the version 2 of the patch series to provide a cpu-offline framework
that enables the administrators choose the state the offline CPU must be put
into when multiple such states are exposed by the underlying architecture.

Version 1 of the Patch can be found here:
http://lkml.org/lkml/2009/8/6/236

The patch-series exposes the following sysfs tunables to
allow the system-adminstrator to choose the state of a CPU:

To query the available hotplug states, one needs to read the sysfs tunable:
/sys/devices/system/cpu/cpunumber/available_hotplug_states
To query or set the current state, on needs to read/write the sysfs tunable:
/sys/devices/system/cpu/cpunumber/current_states

The patchset ensures that the writes to the current_state sysfs file are
serialized against the writes to the online file.

This patchset also contains the offline state driver implemented for
pSeries. For pSeries, we define three available_hotplug_states. They are:

online: The processor is online.

deallocate: This is the the default behaviour when the cpu is offlined
even in the absense of this driver. The CPU would call make an
rtas_stop_self() call and hand over the CPU back to the resource pool,
thereby effectively deallocating that vCPU from the LPAR.
NOTE: This would result in a configuration change to the LPAR
which is visible to the outside world.

deactivate: This cedes the vCPU to the hypervisor which
in turn can put the vCPU time to the best use.
NOTE: This option DOES NOT result in a configuration change
and the vCPU would be still entitled to the LPAR to which it earlier
belong to.

Awaiting your feedback.
---

Gautham R Shenoy (2):
  cpu: Implement cpu-offline-state driver for pSeries.
  cpu: Offline state Framework.


 arch/powerpc/platforms/pseries/Makefile |2 
 arch/powerpc/platforms/pseries/hotplug-cpu.c|   76 +-
 arch/powerpc/platforms/pseries/offline_driver.c |  161 +
 arch/powerpc/platforms/pseries/offline_driver.h |   20 +++
 arch/powerpc/platforms/pseries/smp.c|   17 ++
 drivers/base/cpu.c  |  176 ++-
 include/linux/cpu.h |   30 
 7 files changed, 465 insertions(+), 17 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.c
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.h

-- 
Thanks and Regards
gautham.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 2/2] cpu: Implement cpu-offline-state driver for pSeries.

2009-08-28 Thread Gautham R Shenoy

This patch implements the callbacks to handle the reads/writes
into the sysfs interfaces

/sys/devices/system/cpu/cpunumber/available_hotplug_states
and
/sys/devices/system/cpu/cpunumber/current_state

Currently, the patch defines two states which the processor can go to when it
is offlined. They are

- deallocate: This is the the default behaviour when the cpu is offlined even
  in the absense of this driver.
  The CPU would call make an rtas_stop_self() call and hand over the
  CPU back to the resource pool, thereby effectively deallocating
  that vCPU from the LPAR. This would result in a configuration change to the
  LPAR which is visible to the outside world.

- deactivate: This cedes the vCPU to the hypervisor which in turn can put the
  vCPU time to the best use. This option does not result in a configuration
  change and the vCPU would be still entitled to the LPAR to which it earlier
  belong to.

Signed-off-by: Gautham R Shenoy e...@in.ibm.com
---
 arch/powerpc/platforms/pseries/Makefile |2 
 arch/powerpc/platforms/pseries/hotplug-cpu.c|   76 ++-
 arch/powerpc/platforms/pseries/offline_driver.c |  161 +++
 arch/powerpc/platforms/pseries/offline_driver.h |   20 +++
 arch/powerpc/platforms/pseries/smp.c|   17 ++
 5 files changed, 268 insertions(+), 8 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.c
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.h

diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index 790c0b8..3a569c7 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -17,7 +17,7 @@ obj-$(CONFIG_KEXEC)   += kexec.o
 obj-$(CONFIG_PCI)  += pci.o pci_dlpar.o
 obj-$(CONFIG_PSERIES_MSI)  += msi.o
 
-obj-$(CONFIG_HOTPLUG_CPU)  += hotplug-cpu.o
+obj-$(CONFIG_HOTPLUG_CPU)  += hotplug-cpu.o offline_driver.o
 obj-$(CONFIG_MEMORY_HOTPLUG)   += hotplug-memory.o
 
 obj-$(CONFIG_HVC_CONSOLE)  += hvconsole.o
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index a20ead8..6880a1d 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -30,6 +30,7 @@
 #include asm/pSeries_reconfig.h
 #include xics.h
 #include plpar_wrappers.h
+#include offline_driver.h
 
 /* This version can't take the spinlock, because it never returns */
 static struct rtas_args rtas_stop_self_args = {
@@ -54,13 +55,62 @@ static void rtas_stop_self(void)
panic(Alas, I survived.\n);
 }
 
+static void cede_on_offline(void)
+{
+   unsigned int cpu = smp_processor_id();
+   unsigned int hwcpu = hard_smp_processor_id();
+
+   get_lppaca()-idle = 1;
+   if (!get_lppaca()-shared_proc)
+   get_lppaca()-donate_dedicated_cpu = 1;
+
+   printk(KERN_INFO cpu %u (hwid %u) ceding for offline with hint %d\n,
+   cpu, hwcpu, cede_latency_hint);
+   while (get_preferred_offline_state(cpu) != CPU_STATE_ONLINE) {
+   cede_processor();
+   printk(KERN_INFO cpu %u (hwid %u) returned from cede.\n,
+   cpu, hwcpu);
+   }
+
+   printk(KERN_INFO cpu %u (hwid %u) got prodded to go online\n,
+   cpu, hwcpu);
+
+   if (!get_lppaca()-shared_proc)
+   get_lppaca()-donate_dedicated_cpu = 0;
+   get_lppaca()-idle = 0;
+   unregister_slb_shadow(hwcpu, __pa(get_slb_shadow()));
+
+   /*
+* NOTE: Calling start_secondary() here for now to start
+* a new context.
+*
+* However, need to do it cleanly by resetting the stack
+* pointer.
+*/
+   start_secondary();
+}
+
 static void pseries_mach_cpu_die(void)
 {
+   unsigned int cpu = smp_processor_id();
+
local_irq_disable();
idle_task_exit();
xics_teardown_cpu();
-   unregister_slb_shadow(hard_smp_processor_id(), __pa(get_slb_shadow()));
-   rtas_stop_self();
+
+   if (get_preferred_offline_state(cpu) == CPU_DEALLOCATE) {
+
+   set_cpu_current_state(cpu, CPU_DEALLOCATE);
+   unregister_slb_shadow(hard_smp_processor_id(),
+   __pa(get_slb_shadow()));
+   rtas_stop_self();
+   goto out_bug;
+   } else if (get_preferred_offline_state(cpu) == CPU_DEACTIVATE) {
+   set_cpu_current_state(cpu, CPU_DEACTIVATE);
+   cede_on_offline();
+   }
+
+out_bug:
/* Should never get here... */
BUG();
for(;;);
@@ -112,11 +162,23 @@ static void pseries_cpu_die(unsigned int cpu)
int cpu_status;
unsigned int pcpu = get_hard_smp_processor_id(cpu);
 
-   for (tries = 0; tries  25; tries++) {
-   cpu_status = query_cpu_stopped(pcpu);
-   if (cpu_status == 0 || cpu_status == -1)
-   break

[PATCH v3 0/3] cpu: pseries: Cpu offline states framework

2009-09-15 Thread Gautham R Shenoy

Hi,

 RFC not for inclusion 

This is the version 3 of the patch series to provide a cpu-offline framework
that enables the administrators choose the state of a CPU when it is
offlined, when multiple such states are exposed by the underlying
architecture.

Changes from Version 2:(can be found here: http://lkml.org/lkml/2009/8/28/102)
- Addressed Andrew Morton's review comments regarding names of global
  variables, handling of error conditions and documentation of the interfaces.

- Implemented a patch to provide helper functions to set the cede latency
  specifier value in the VPA indicating latency expectation of the guest OS
  when the vcpu is ceded from a subsequent H_CEDE hypercall. Hypervisor may
  use this for better energy savings.

- Renamed of the cpu-hotplug states. deallocate is renamed
  as offline and deactivate is renamed as inactive.

The patch-series exposes the following sysfs tunables to
allow the system-adminstrator to choose the state of a CPU:

To query the available hotplug states, one needs to read the sysfs tunable:
/sys/devices/system/cpu/cpunumber/available_hotplug_states
To query or set the current state, on needs to read/write the sysfs tunable:
/sys/devices/system/cpu/cpunumber/current_hotplug_state

The patchset ensures that the writes to the current_hotplug_state sysfs file 
are
serialized against the writes to the online file.

This patchset contains the offline state driver implemented for
pSeries. For pSeries, we define three available_hotplug_states. They are:

online: The processor is online.

offline: This is the the default behaviour when the cpu is offlined
even in the absense of this driver. The CPU would call make an
rtas_stop_self() call and hand over the CPU back to the resource pool,
thereby effectively deallocating that vCPU from the LPAR.
NOTE: This would result in a configuration change to the LPAR
which is visible to the outside world.

inactive: This cedes the vCPU to the hypervisor with a cede latency
specifier value 2.
NOTE: This option does not result in a configuration change
and the vCPU would be still entitled to the LPAR to which it earlier
belong to.

Any feedback on the patchset will be immensely valuable.
---

Arun R Bharadwaj (1):
  pSeries: cede latency specifier helper function.

Gautham R Shenoy (2):
  cpu: Implement cpu-offline-state callbacks for pSeries.
  cpu: Offline state Framework.


 Documentation/cpu-hotplug.txt   |   22 +++
 arch/powerpc/include/asm/lppaca.h   |9 +
 arch/powerpc/platforms/pseries/Makefile |2 
 arch/powerpc/platforms/pseries/hotplug-cpu.c|   88 ++-
 arch/powerpc/platforms/pseries/offline_driver.c |  148 +++
 arch/powerpc/platforms/pseries/offline_driver.h |   20 +++
 arch/powerpc/platforms/pseries/plpar_wrappers.h |   17 ++
 arch/powerpc/platforms/pseries/smp.c|   17 ++
 arch/powerpc/xmon/xmon.c|3 
 drivers/base/cpu.c  |  181 ++-
 include/linux/cpu.h |   10 +
 11 files changed, 498 insertions(+), 19 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.c
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.h

-- 
Thanks and Regards
gautham.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 1/3] pSeries: cede latency specifier helper function.

2009-09-15 Thread Gautham R Shenoy

From: Arun R Bharadwaj a...@linux.vnet.ibm.com

This patch provides helper functions to set the cede latency specifier
value in the VPA indicating the latency expectation of the guest OS to
inform the hypervisor's choice of the platform dependent energy saving
mode chosen for the processor when unused during the subsequent
H_CEDE hypercall.

Signed-off-by: Arun R Bharadwaj a...@linux.vnet.ibm.com
Signed-off-by: Gautham R Shenoy e...@in.ibm.com
---
 arch/powerpc/include/asm/lppaca.h   |9 -
 arch/powerpc/platforms/pseries/plpar_wrappers.h |   17 +
 arch/powerpc/xmon/xmon.c|3 ++-
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/lppaca.h 
b/arch/powerpc/include/asm/lppaca.h
index f78f65c..aaa0066 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -100,7 +100,14 @@ struct lppaca {
// Used to pass parms from the OS to PLIC for SetAsrAndRfid
u64 saved_gpr3; // Saved GPR3   x20-x27
u64 saved_gpr4; // Saved GPR4   x28-x2F
-   u64 saved_gpr5; // Saved GPR5   x30-x37
+   union {
+   u64 saved_gpr5; // Saved GPR5   x30-x37
+   struct {
+   u8  cede_latency_hint;  //  x30
+   u8  reserved[7];//  x31-x36
+   } fields;
+   } gpr5_dword;
+
 
u8  dtl_enable_mask;// Dispatch Trace Log mask  x38-x38
u8  donate_dedicated_cpu;   // Donate dedicated CPU cycles  x39-x39
diff --git a/arch/powerpc/platforms/pseries/plpar_wrappers.h 
b/arch/powerpc/platforms/pseries/plpar_wrappers.h
index a24a6b2..1174d4b 100644
--- a/arch/powerpc/platforms/pseries/plpar_wrappers.h
+++ b/arch/powerpc/platforms/pseries/plpar_wrappers.h
@@ -9,11 +9,28 @@ static inline long poll_pending(void)
return plpar_hcall_norets(H_POLL_PENDING);
 }
 
+static inline u8 get_cede_latency_hint(void)
+{
+   return get_lppaca()-gpr5_dword.fields.cede_latency_hint;
+}
+
+static inline void set_cede_latency_hint(u8 latency_hint)
+{
+   get_lppaca()-gpr5_dword.fields.cede_latency_hint = latency_hint;
+}
+
 static inline long cede_processor(void)
 {
return plpar_hcall_norets(H_CEDE);
 }
 
+static inline long extended_cede_processor(u8 latency_hint)
+{
+   set_cede_latency_hint(latency_hint);
+   cede_processor();
+
+}
+
 static inline long vpa_call(unsigned long flags, unsigned long cpu,
unsigned long vpa)
 {
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index e1f33a8..a2089cd 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -1613,7 +1613,8 @@ static void super_regs(void)
   ptrLpPaca-saved_srr0, ptrLpPaca-saved_srr1);
printf(Saved Gpr3=%.16lx  Saved Gpr4=%.16lx \n,
   ptrLpPaca-saved_gpr3, ptrLpPaca-saved_gpr4);
-   printf(Saved Gpr5=%.16lx \n, 
ptrLpPaca-saved_gpr5);
+   printf(Saved Gpr5=%.16lx \n,
+   ptrLpPaca-gpr5_dword.saved_gpr5);
}
 #endif
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 2/3] cpu: Offline state Framework.

2009-09-15 Thread Gautham R Shenoy

Provide an interface by which the system administrator can decide what state
should the CPU go to when it is offlined.

To query the hotplug states, on needs to perform a read on:
/sys/devices/system/cpu/cpunumber/available_hotplug_states

To query or set the current state for a particular CPU, one needs to
use the sysfs interface

/sys/devices/system/cpu/cpunumber/current_hotplug_state

This patch implements the architecture independent bits of the
cpu-offline-state framework.

The architecture specific bits are expected to register the actual code which
implements the callbacks when the above mentioned sysfs interfaces are read or
written into. Thus the values provided by reading available_offline_states vary
with the architecture.

The patch provides serialization for writes to the current_hotplug_state
with respect to with the writes to the online sysfs tunable.

Signed-off-by: Gautham R Shenoy e...@in.ibm.com
---
 Documentation/cpu-hotplug.txt |   22 +
 drivers/base/cpu.c|  181 +++--
 include/linux/cpu.h   |   10 ++
 3 files changed, 204 insertions(+), 9 deletions(-)

diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
index 9d620c1..dcec06d 100644
--- a/Documentation/cpu-hotplug.txt
+++ b/Documentation/cpu-hotplug.txt
@@ -115,6 +115,28 @@ Just remember the critical section cannot call any
 function that can sleep or schedule this process away. The preempt_disable()
 will work as long as stop_machine_run() is used to take a cpu down.
 
+CPU-offline states
+--
+On architectures which allow the more than one valid state when
+the CPU goes offline, the system administrator can decide
+the state the CPU needs to go to when it is offlined.
+
+If the architecture has implemented a cpu-offline driver exposing these
+multiple offline states, the system administrator can use the following sysfs
+interfaces to query the available hotplug states and also query and set the
+current hotplug state for a given cpu:
+
+To query the hotplug states, on needs to perform a read on:
+/sys/devices/system/cpu/cpunumber/available_hotplug_states
+
+To query or set the current state for a particular CPU,
+one needs to use the sysfs interface
+
+/sys/devices/system/cpu/cpunumber/current_hotplug_state
+
+Writes to the online sysfs files are serialized against the writes to the
+current_hotplug_state file.
+
 CPU Hotplug - Frequently Asked Questions.
 
 Q: How to enable my kernel to support CPU hotplug?
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index e62a4cc..00c38be 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -20,7 +20,166 @@ EXPORT_SYMBOL(cpu_sysdev_class);
 
 static DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
 
+struct sys_device *get_cpu_sysdev(unsigned cpu)
+{
+   if (cpu  nr_cpu_ids  cpu_possible(cpu))
+   return per_cpu(cpu_sys_devices, cpu);
+   else
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(get_cpu_sysdev);
+
+
 #ifdef CONFIG_HOTPLUG_CPU
+
+struct cpu_offline_driver *cpu_offline_driver;
+static DEFINE_MUTEX(cpu_offline_driver_lock);
+
+ssize_t show_available_hotplug_states(struct sys_device *dev,
+   struct sysdev_attribute *attr, char *buf)
+{
+   struct cpu *cpu = container_of(dev, struct cpu, sysdev);
+   int cpu_num = cpu-sysdev.id;
+   ssize_t ret;
+
+   mutex_lock(cpu_offline_driver_lock);
+   if (!cpu_offline_driver) {
+   ret = -EEXIST;
+   goto out_unlock;
+   }
+
+   ret = cpu_offline_driver-read_available_states(cpu_num, buf);
+
+out_unlock:
+   mutex_unlock(cpu_offline_driver_lock);
+
+   return ret;
+
+}
+
+ssize_t show_current_hotplug_state(struct sys_device *dev,
+   struct sysdev_attribute *attr, char *buf)
+{
+   struct cpu *cpu = container_of(dev, struct cpu, sysdev);
+   int cpu_num = cpu-sysdev.id;
+   ssize_t ret = 0;
+
+   mutex_lock(cpu_offline_driver_lock);
+   if (!cpu_offline_driver) {
+   ret = -EEXIST;
+   goto out_unlock;
+   }
+
+   ret = cpu_offline_driver-read_current_state(cpu_num, buf);
+
+out_unlock:
+   mutex_unlock(cpu_offline_driver_lock);
+
+   return ret;
+}
+
+ssize_t store_current_hotplug_state(struct sys_device *dev,
+   struct sysdev_attribute *attr,
+   const char *buf, size_t count)
+{
+   struct cpu *cpu = container_of(dev, struct cpu, sysdev);
+   int cpu_num = cpu-sysdev.id;
+   ssize_t ret = count;
+
+   mutex_lock(cpu_offline_driver_lock);
+   if (!cpu_offline_driver) {
+   ret = -EEXIST;
+   goto out_unlock;
+   }
+
+   ret = cpu_offline_driver-write_current_state(cpu_num, buf);
+
+out_unlock:
+   mutex_unlock(cpu_offline_driver_lock);
+
+   if (ret = 0)
+   ret = count;
+   return ret;
+}
+
+static SYSDEV_ATTR

[PATCH v3 3/3] cpu: Implement cpu-offline-state callbacks for pSeries.

2009-09-15 Thread Gautham R Shenoy

This patch implements the callbacks to handle the reads/writes into the sysfs
interfaces

/sys/devices/system/cpu/cpunumber/available_hotplug_states
and
/sys/devices/system/cpu/cpunumber/current_hotplug_state

Currently, the patch defines two states which the processor can go to when it
is offlined. They are

- offline: The current behaviour when the cpu is offlined.
  The CPU would call make an rtas_stop_self() call and hand over the
  CPU back to the resource pool, thereby effectively deallocating
  that vCPU from the LPAR.

- inactive: This is expected to cede the processor to the hypervisor with a
  latency hint specifier value. Hypervisor may use this hint to provide
  better energy savings. In this state, the control of the vCPU will continue
  to be with the LPAR.

Signed-off-by: Gautham R Shenoy e...@in.ibm.com
---
 arch/powerpc/platforms/pseries/Makefile |2 
 arch/powerpc/platforms/pseries/hotplug-cpu.c|   88 +-
 arch/powerpc/platforms/pseries/offline_driver.c |  148 +++
 arch/powerpc/platforms/pseries/offline_driver.h |   20 +++
 arch/powerpc/platforms/pseries/smp.c|   17 +++
 5 files changed, 267 insertions(+), 8 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.c
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.h

diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index 790c0b8..3a569c7 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -17,7 +17,7 @@ obj-$(CONFIG_KEXEC)   += kexec.o
 obj-$(CONFIG_PCI)  += pci.o pci_dlpar.o
 obj-$(CONFIG_PSERIES_MSI)  += msi.o
 
-obj-$(CONFIG_HOTPLUG_CPU)  += hotplug-cpu.o
+obj-$(CONFIG_HOTPLUG_CPU)  += hotplug-cpu.o offline_driver.o
 obj-$(CONFIG_MEMORY_HOTPLUG)   += hotplug-memory.o
 
 obj-$(CONFIG_HVC_CONSOLE)  += hvconsole.o
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index a20ead8..1e06bb1 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -30,6 +30,7 @@
 #include asm/pSeries_reconfig.h
 #include xics.h
 #include plpar_wrappers.h
+#include offline_driver.h
 
 /* This version can't take the spinlock, because it never returns */
 static struct rtas_args rtas_stop_self_args = {
@@ -54,13 +55,74 @@ static void rtas_stop_self(void)
panic(Alas, I survived.\n);
 }
 
+static void cede_on_offline(u8 cede_latency_hint)
+{
+   unsigned int cpu = smp_processor_id();
+   unsigned int hwcpu = hard_smp_processor_id();
+   u8 old_cede_latency_hint;
+
+   old_cede_latency_hint = get_cede_latency_hint();
+   get_lppaca()-idle = 1;
+   if (!get_lppaca()-shared_proc)
+   get_lppaca()-donate_dedicated_cpu = 1;
+
+   printk(KERN_INFO cpu %u (hwid %u) ceding for offline with hint %d\n,
+   cpu, hwcpu, cede_latency_hint);
+   while (get_preferred_offline_state(cpu) != CPU_STATE_ONLINE) {
+   extended_cede_processor(cede_latency_hint);
+   printk(KERN_INFO cpu %u (hwid %u) returned from cede.\n,
+   cpu, hwcpu);
+   printk(KERN_INFO
+   Decrementer value = %x Timebase value = %llx\n,
+   get_dec(), get_tb());
+   }
+
+   printk(KERN_INFO cpu %u (hwid %u) got prodded to go online\n,
+   cpu, hwcpu);
+
+   if (!get_lppaca()-shared_proc)
+   get_lppaca()-donate_dedicated_cpu = 0;
+   get_lppaca()-idle = 0;
+
+   /* Reset the cede_latency specifier value */
+   set_cede_latency_hint(old_cede_latency_hint);
+
+   unregister_slb_shadow(hwcpu, __pa(get_slb_shadow()));
+
+   /*
+* NOTE: Calling start_secondary() here for now to start
+* a new context.
+*
+* However, need to do it cleanly by resetting the stack
+* pointer.
+*/
+   start_secondary();
+}
+
 static void pseries_mach_cpu_die(void)
 {
+   unsigned int cpu = smp_processor_id();
+   u8 cede_latency_hint = 0;
+
local_irq_disable();
idle_task_exit();
xics_teardown_cpu();
-   unregister_slb_shadow(hard_smp_processor_id(), __pa(get_slb_shadow()));
-   rtas_stop_self();
+
+   if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
+
+   set_cpu_current_state(cpu, CPU_STATE_OFFLINE);
+   unregister_slb_shadow(hard_smp_processor_id(),
+   __pa(get_slb_shadow()));
+   rtas_stop_self();
+   goto out_bug;
+   } else if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
+   set_cpu_current_state(cpu, CPU_STATE_INACTIVE);
+   cede_latency_hint = 2;
+   cede_on_offline(cede_latency_hint);
+
+   }
+
+out_bug:
/* Should never get here

[PATCH v4 0/4] pseries: Add cede support for cpu-offline

2009-10-09 Thread Gautham R Shenoy

Hi,

This is version 4 of patch series that provides a framework to choose the
state a pseries CPU must be put to when it is offlined.

Previous versions can be found here:
Version 3: http://lkml.org/lkml/2009/9/15/164
Version 2: http://lkml.org/lkml/2009/8/28/102
Version 1: http://lkml.org/lkml/2009/8/6/236

This patch series differs considerably from the previous iterations in that:
- It is based on top of Nathan Fontenot's
  pseries kernel handling of dynamic logical partitioning v2 patches
  found here: http://lkml.org/lkml/2009/9/18/173

- It does away with the sysfs tunables available_hotplug_states and
  currrent_hotplug_state, thereby refraining from exposing any platform
  specific option to the userspace.

- It provides a framework that discovers the support for ceding the vcpu
  to the hypervisor with a latency specifier value. When such a support is
  present, it would put the offlined CPUs into a ceded state while
  setting the cede   latency specifier value in the VPA indicating
  the latency expectation of the   guest OS. Hypervisor may use this for
  better energy savings.

- On platforms which don't have support for ceding with a latency specifier
  hint, it puts the offlined CPUs into the rtas_stop_self() state.

- Currently, Nathan's patches have provided sysfs interfaces
/sys/devices/system/cpu/{probe, release}
  to dynamically allocate and deallocate resources respectively. The current
  process of allocating and deallocating CPUs would require two steps,
  namely:
- Write to the probe/release file
- Write to the online file.

  This patch-series combines these two operations such that when the user
  writes to the probe, the CPUs would not only get allocated to the
  partition but also onlined. Similarly, when the user writes to release,
  the CPUs would be offlined before they are deallocated.

  Thus, the user will now have to play with only one set of sysfs tunables
  for performing deallocate and the sysfs tunable online can be used for
  the purpose of deactivating the CPU while still letting the LPAR own it.

- It provides serializations for handling the writes the the online file
  against the writes to probe/release pair so that at any given instant,
  the user could either be deallocating the CPUs or deactivating it, but
  not both.

With this approach, I hope that the objection regarding the layering
violation that was raised by Peter for previous approaches is
addressed, while also providing clean interfaces for the userspace tools
to distinguish between the deallocation and deactivation without exposing
any platform specific details.

The patchset has been tested on the available pseries platforms and it
works as per the expectations.

Comments are very much appreciated.
---

Gautham R Shenoy (4):
  pseries: Serialize cpu hotplug operations during deactivate Vs deallocate
  pseries: Add code to online/offline CPUs of a DLPAR node.
  pSeries: Add hooks to put the CPU into an appropriate offline state
  pSeries: extended_cede_processor() helper function.


 arch/powerpc/include/asm/lppaca.h   |9 +
 arch/powerpc/platforms/pseries/dlpar.c  |  129 ++-
 arch/powerpc/platforms/pseries/hotplug-cpu.c|  157 ++-
 arch/powerpc/platforms/pseries/offline_states.h |   18 +++
 arch/powerpc/platforms/pseries/plpar_wrappers.h |   22 +++
 arch/powerpc/platforms/pseries/smp.c|   19 +++
 arch/powerpc/xmon/xmon.c|3 
 drivers/base/cpu.c  |2 
 include/linux/cpu.h |   13 ++
 9 files changed, 356 insertions(+), 16 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/offline_states.h

-- 
Thanks and Regards
gautham.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 2/4] pSeries: Add hooks to put the CPU into an appropriate offline state

2009-10-09 Thread Gautham R Shenoy

When a CPU is offlined on POWER currently, we call rtas_stop_self() and hand
the CPU back to the resource pool. This path is used for DLPAR which will
cause a change in the LPAR configuration which will be visible outside.

This patch changes the default state a CPU is put into when it is offlined.
On platforms which support ceding the processor to the hypervisor with
latency hint specifier value, during a cpu offline operation,
instead of calling rtas_stop_self(), we cede the vCPU to the hypervisor
while passing a latency hint specifier value. The Hypervisor can use this hint
to provide better energy savings. Also, during the offline
operation, the control of the vCPU remains with the LPAR as oppposed to
returning it to the resource pool.

The patch achieves this by creating an infrastructure to set the
preferred_offline_state() which can be either
- CPU_STATE_OFFLINE: which is the current behaviour of calling
  rtas_stop_self()

- CPU_STATE_INACTIVE: which cedes the vCPU to the hypervisor with the latency
  hint specifier.

The codepath which wants to perform a DLPAR operation can set the
preferred_offline_state() of a CPU to CPU_STATE_OFFLINE before invoking
cpu_down().

Signed-off-by: Gautham R Shenoy e...@in.ibm.com
Signed-off-by: Nathan Fontenot nf...@austin.ibm.com
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c|  157 ++-
 arch/powerpc/platforms/pseries/offline_states.h |   18 +++
 arch/powerpc/platforms/pseries/smp.c|   19 +++
 3 files changed, 185 insertions(+), 9 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/offline_states.h

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index ebff6d9..8a47db4 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -30,6 +30,7 @@
 #include asm/pSeries_reconfig.h
 #include xics.h
 #include plpar_wrappers.h
+#include offline_states.h
 
 /* This version can't take the spinlock, because it never returns */
 static struct rtas_args rtas_stop_self_args = {
@@ -39,6 +40,37 @@ static struct rtas_args rtas_stop_self_args = {
.rets = rtas_stop_self_args.args[0],
 };
 
+static DEFINE_PER_CPU(enum cpu_state_vals, preferred_offline_state) =
+   CPU_STATE_OFFLINE;
+static DEFINE_PER_CPU(enum cpu_state_vals, current_state) = CPU_STATE_OFFLINE;
+
+static enum cpu_state_vals default_offline_state = CPU_STATE_OFFLINE;
+
+enum cpu_state_vals get_cpu_current_state(int cpu)
+{
+   return per_cpu(current_state, cpu);
+}
+
+void set_cpu_current_state(int cpu, enum cpu_state_vals state)
+{
+   per_cpu(current_state, cpu) = state;
+}
+
+enum cpu_state_vals get_preferred_offline_state(int cpu)
+{
+   return per_cpu(preferred_offline_state, cpu);
+}
+
+void set_preferred_offline_state(int cpu, enum cpu_state_vals state)
+{
+   per_cpu(preferred_offline_state, cpu) = state;
+}
+
+void set_default_offline_state(int cpu)
+{
+   per_cpu(preferred_offline_state, cpu) = default_offline_state;
+}
+
 static void rtas_stop_self(void)
 {
struct rtas_args *args = rtas_stop_self_args;
@@ -56,11 +88,66 @@ static void rtas_stop_self(void)
 
 static void pseries_mach_cpu_die(void)
 {
+   unsigned int cpu = smp_processor_id();
+   unsigned int hwcpu = hard_smp_processor_id();
+   u8 cede_latency_hint = 0;
+
local_irq_disable();
idle_task_exit();
xics_teardown_cpu();
-   unregister_slb_shadow(hard_smp_processor_id(), __pa(get_slb_shadow()));
-   rtas_stop_self();
+
+   if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
+   set_cpu_current_state(cpu, CPU_STATE_INACTIVE);
+   cede_latency_hint = 2;
+
+   get_lppaca()-idle = 1;
+   if (!get_lppaca()-shared_proc)
+   get_lppaca()-donate_dedicated_cpu = 1;
+
+   printk(KERN_INFO
+   cpu %u (hwid %u) ceding for offline with hint %d\n,
+   cpu, hwcpu, cede_latency_hint);
+   while (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
+   extended_cede_processor(cede_latency_hint);
+   printk(KERN_INFO cpu %u (hwid %u) returned from 
cede.\n,
+   cpu, hwcpu);
+   printk(KERN_INFO
+   Decrementer value = %x Timebase value = %llx\n,
+   get_dec(), get_tb());
+   }
+
+   printk(KERN_INFO cpu %u (hwid %u) got prodded to go online\n,
+   cpu, hwcpu);
+
+   if (!get_lppaca()-shared_proc)
+   get_lppaca()-donate_dedicated_cpu = 0;
+   get_lppaca()-idle = 0;
+   }
+
+   if (get_preferred_offline_state(cpu) == CPU_STATE_ONLINE) {
+   unregister_slb_shadow(hwcpu, __pa(get_slb_shadow

[PATCH v4 3/4] pseries: Add code to online/offline CPUs of a DLPAR node.

2009-10-09 Thread Gautham R Shenoy

Currently the cpu-allocation/deallocation on pSeries is a
two step process from the Userspace.

- Set the indicators and update the device tree by writing to the sysfs
  tunable probe during allocation and release during deallocation.
- Online / Offline the CPUs of the allocated/would_be_deallocated node by
  writing to the sysfs tunable online.

This patch adds kernel code to online/offline the CPUs soon_after/just_before
they have been allocated/would_be_deallocated. This way, the userspace tool
that performs DLPAR operations would only have to deal with one set of sysfs
tunables namely probe and release.

Signed-off-by: Gautham R Shenoy e...@in.ibm.com
Signed-off-by: Nathan Fontenot nf...@austin.ibm.com
---
 arch/powerpc/platforms/pseries/dlpar.c |  103 
 1 files changed, 102 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c 
b/arch/powerpc/platforms/pseries/dlpar.c
index 84d156b..9752386 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -20,7 +20,7 @@
 #include linux/sysdev.h
 #include linux/sysfs.h
 #include linux/cpu.h
-
+#include offline_states.h
 
 #include asm/prom.h
 #include asm/machdep.h
@@ -357,6 +357,98 @@ int remove_device_tree_nodes(struct device_node *dn)
return rc;
 }
 
+int online_node_cpus(struct device_node *dn)
+{
+   int rc = 0;
+   unsigned int cpu;
+   int len, nthreads, i;
+   const u32 *intserv;
+
+   intserv = of_get_property(dn, ibm,ppc-interrupt-server#s, len);
+   if (!intserv)
+   return -EINVAL;
+
+   nthreads = len / sizeof(u32);
+
+   cpu_maps_update_begin();
+   for (i = 0; i  nthreads; i++) {
+   for_each_present_cpu(cpu) {
+   if (get_hard_smp_processor_id(cpu) != intserv[i])
+   continue;
+   BUG_ON(get_cpu_current_state(cpu)
+   != CPU_STATE_OFFLINE);
+   cpu_maps_update_done();
+   rc = cpu_up(cpu);
+   if (rc)
+   goto out;
+   cpu_maps_update_begin();
+
+   break;
+   }
+   if (cpu == num_possible_cpus())
+   printk(KERN_WARNING Could not find cpu to online 
+  with physical id 0x%x\n, intserv[i]);
+   }
+   cpu_maps_update_done();
+
+out:
+   return rc;
+
+}
+
+int offline_node_cpus(struct device_node *dn)
+{
+   int rc = 0;
+   unsigned int cpu;
+   int len, nthreads, i;
+   const u32 *intserv;
+
+   intserv = of_get_property(dn, ibm,ppc-interrupt-server#s, len);
+   if (!intserv)
+   return -EINVAL;
+
+   nthreads = len / sizeof(u32);
+
+   cpu_maps_update_begin();
+   for (i = 0; i  nthreads; i++) {
+   for_each_present_cpu(cpu) {
+   if (get_hard_smp_processor_id(cpu) != intserv[i])
+   continue;
+
+   if (get_cpu_current_state(cpu) == CPU_STATE_OFFLINE)
+   break;
+
+   if (get_cpu_current_state(cpu) == CPU_STATE_ONLINE) {
+   cpu_maps_update_done();
+   rc = cpu_down(cpu);
+   if (rc)
+   goto out;
+   cpu_maps_update_begin();
+   break;
+
+   }
+
+   /*
+* The cpu is in CPU_STATE_INACTIVE.
+* Upgrade it's state to CPU_STATE_OFFLINE.
+*/
+   set_preferred_offline_state(cpu, CPU_STATE_OFFLINE);
+   BUG_ON(plpar_hcall_norets(H_PROD, intserv[i])
+   != H_SUCCESS);
+   __cpu_die(cpu);
+   break;
+   }
+   if (cpu == num_possible_cpus())
+   printk(KERN_WARNING Could not find cpu to offline 
+  with physical id 0x%x\n, intserv[i]);
+   }
+   cpu_maps_update_done();
+
+out:
+   return rc;
+
+}
+
 #define DR_ENTITY_SENSE9003
 #define DR_ENTITY_PRESENT  1
 #define DR_ENTITY_UNUSABLE 2
@@ -591,6 +683,8 @@ static ssize_t cpu_probe_store(struct class *class, const 
char *buf,
if (rc)
release_drc(drc_index);
 
+   rc = online_node_cpus(dn);
+
return rc ? rc : count;
 }
 
@@ -611,6 +705,11 @@ static ssize_t cpu_release_store(struct class *class, 
const char *buf,
return -EINVAL;
}
 
+   rc = offline_node_cpus(dn);
+
+   if (rc)
+   goto out;
+
rc = release_drc(*drc_index);
if (rc

[PATCH v4 4/4] pseries: Serialize cpu hotplug operations during deactivate Vs deallocate

2009-10-09 Thread Gautham R Shenoy

Currently the cpu-allocation/deallocation process comprises of two steps:
- Set the indicators and to update the device tree with DLPAR node
  information.

- Online/offline the allocated/deallocated CPU.

This is achieved by writing to the sysfs tunables probe during allocation
and release during deallocation.

At the sametime, the userspace can independently online/offline the CPUs of
the system using the sysfs tunable online.

It is quite possible that when a userspace tool offlines a CPU
for the purpose of deallocation and is in the process of updating the device
tree, some other userspace tool could bring the CPU back online by writing to
the online sysfs tunable thereby causing the deallocate process to fail.

The solution to this is to serialize writes to the probe/release sysfs
tunable with the writes to the online sysfs tunable.

This patch employs a mutex to provide this serialization, which is a no-op on
all architectures except PPC_PSERIES

Signed-off-by: Gautham R Shenoy e...@in.ibm.com
---
 arch/powerpc/platforms/pseries/dlpar.c |   26 ++
 drivers/base/cpu.c |2 ++
 include/linux/cpu.h|   13 +
 3 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c 
b/arch/powerpc/platforms/pseries/dlpar.c
index 9752386..fc261e6 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -644,6 +644,18 @@ static ssize_t memory_release_store(struct class *class, 
const char *buf,
return rc ? -1 : count;
 }
 
+static DEFINE_MUTEX(pseries_cpu_hotplug_mutex);
+
+void cpu_hotplug_driver_lock()
+{
+   mutex_lock(pseries_cpu_hotplug_mutex);
+}
+
+void cpu_hotplug_driver_unlock()
+{
+   mutex_unlock(pseries_cpu_hotplug_mutex);
+}
+
 static ssize_t cpu_probe_store(struct class *class, const char *buf,
   size_t count)
 {
@@ -656,14 +668,15 @@ static ssize_t cpu_probe_store(struct class *class, const 
char *buf,
if (rc)
return -EINVAL;
 
+   cpu_hotplug_driver_lock();
rc = acquire_drc(drc_index);
if (rc)
-   return rc;
+   goto out;
 
dn = configure_connector(drc_index);
if (!dn) {
release_drc(drc_index);
-   return rc;
+   goto out;
}
 
/* fixup dn name */
@@ -672,7 +685,8 @@ static ssize_t cpu_probe_store(struct class *class, const 
char *buf,
if (!cpu_name) {
free_cc_nodes(dn);
release_drc(drc_index);
-   return -ENOMEM;
+   rc = -ENOMEM;
+   goto out;
}
 
sprintf(cpu_name, /cpus/%s, dn-full_name);
@@ -684,6 +698,8 @@ static ssize_t cpu_probe_store(struct class *class, const 
char *buf,
release_drc(drc_index);
 
rc = online_node_cpus(dn);
+out:
+   cpu_hotplug_driver_unlock();
 
return rc ? rc : count;
 }
@@ -705,6 +721,7 @@ static ssize_t cpu_release_store(struct class *class, const 
char *buf,
return -EINVAL;
}
 
+   cpu_hotplug_driver_lock();
rc = offline_node_cpus(dn);
 
if (rc)
@@ -713,7 +730,7 @@ static ssize_t cpu_release_store(struct class *class, const 
char *buf,
rc = release_drc(*drc_index);
if (rc) {
of_node_put(dn);
-   return rc;
+   goto out;
}
 
rc = remove_device_tree_nodes(dn);
@@ -723,6 +740,7 @@ static ssize_t cpu_release_store(struct class *class, const 
char *buf,
of_node_put(dn);
 
 out:
+   cpu_hotplug_driver_unlock();
return rc ? rc : count;
 }
 
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index e62a4cc..07c3f05 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -35,6 +35,7 @@ static ssize_t __ref store_online(struct sys_device *dev, 
struct sysdev_attribut
struct cpu *cpu = container_of(dev, struct cpu, sysdev);
ssize_t ret;
 
+   cpu_hotplug_driver_lock();
switch (buf[0]) {
case '0':
ret = cpu_down(cpu-sysdev.id);
@@ -49,6 +50,7 @@ static ssize_t __ref store_online(struct sys_device *dev, 
struct sysdev_attribut
default:
ret = -EINVAL;
}
+   cpu_hotplug_driver_unlock();
 
if (ret = 0)
ret = count;
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 4753619..b0ad4e1 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -115,6 +115,19 @@ extern void put_online_cpus(void);
 #define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb)
 int cpu_down(unsigned int cpu);
 
+#ifdef CONFIG_PPC_PSERIES
+extern void cpu_hotplug_driver_lock(void);
+extern void cpu_hotplug_driver_unlock(void);
+#else
+static inline void cpu_hotplug_driver_lock(void)
+{
+}
+
+static inline void cpu_hotplug_driver_unlock(void)
+{
+}
+#endif
+
 #else

[PATCH v5 1/4] pSeries: extended_cede_processor() helper function.

2009-10-29 Thread Gautham R Shenoy

This patch provides an extended_cede_processor() helper function
which takes the cede latency hint as an argument. This hint is to be passed
on to the hypervisor to cede to the corresponding state on platforms
which support it.

Signed-off-by: Gautham R Shenoy e...@in.ibm.com
Signed-off-by: Arun R Bharadwaj a...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/lppaca.h   |9 -
 arch/powerpc/platforms/pseries/plpar_wrappers.h |   22 ++
 arch/powerpc/xmon/xmon.c|3 ++-
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/lppaca.h 
b/arch/powerpc/include/asm/lppaca.h
index f78f65c..14b592d 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -100,7 +100,14 @@ struct lppaca {
// Used to pass parms from the OS to PLIC for SetAsrAndRfid
u64 saved_gpr3; // Saved GPR3   x20-x27
u64 saved_gpr4; // Saved GPR4   x28-x2F
-   u64 saved_gpr5; // Saved GPR5   x30-x37
+   union {
+   u64 saved_gpr5; /* Saved GPR5   x30-x37 */
+   struct {
+   u8  cede_latency_hint;  /*  x30 */
+   u8  reserved[7];/*  x31-x36 */
+   } fields;
+   } gpr5_dword;
+
 
u8  dtl_enable_mask;// Dispatch Trace Log mask  x38-x38
u8  donate_dedicated_cpu;   // Donate dedicated CPU cycles  x39-x39
diff --git a/arch/powerpc/platforms/pseries/plpar_wrappers.h 
b/arch/powerpc/platforms/pseries/plpar_wrappers.h
index a24a6b2..0603c91 100644
--- a/arch/powerpc/platforms/pseries/plpar_wrappers.h
+++ b/arch/powerpc/platforms/pseries/plpar_wrappers.h
@@ -9,11 +9,33 @@ static inline long poll_pending(void)
return plpar_hcall_norets(H_POLL_PENDING);
 }
 
+static inline u8 get_cede_latency_hint(void)
+{
+   return get_lppaca()-gpr5_dword.fields.cede_latency_hint;
+}
+
+static inline void set_cede_latency_hint(u8 latency_hint)
+{
+   get_lppaca()-gpr5_dword.fields.cede_latency_hint = latency_hint;
+}
+
 static inline long cede_processor(void)
 {
return plpar_hcall_norets(H_CEDE);
 }
 
+static inline long extended_cede_processor(unsigned long latency_hint)
+{
+   long rc;
+   u8 old_latency_hint = get_cede_latency_hint();
+
+   set_cede_latency_hint(latency_hint);
+   rc = cede_processor();
+   set_cede_latency_hint(old_latency_hint);
+
+   return rc;
+}
+
 static inline long vpa_call(unsigned long flags, unsigned long cpu,
unsigned long vpa)
 {
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index c6f0a71..57124cf 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -1623,7 +1623,8 @@ static void super_regs(void)
   ptrLpPaca-saved_srr0, ptrLpPaca-saved_srr1);
printf(Saved Gpr3=%.16lx  Saved Gpr4=%.16lx \n,
   ptrLpPaca-saved_gpr3, ptrLpPaca-saved_gpr4);
-   printf(Saved Gpr5=%.16lx \n, 
ptrLpPaca-saved_gpr5);
+   printf(Saved Gpr5=%.16lx \n,
+   ptrLpPaca-gpr5_dword.saved_gpr5);
}
 #endif
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v5 2/4] pSeries: Add hooks to put the CPU into an appropriate offline state

2009-10-29 Thread Gautham R Shenoy

When a CPU is offlined on POWER currently, we call rtas_stop_self() and hand
the CPU back to the resource pool. This path is used for DLPAR which will
cause a change in the LPAR configuration which will be visible outside.

This patch changes the default state a CPU is put into when it is offlined.
On platforms which support ceding the processor to the hypervisor with
latency hint specifier value, during a cpu offline operation,
instead of calling rtas_stop_self(), we cede the vCPU to the hypervisor
while passing a latency hint specifier value. The Hypervisor can use this hint
to provide better energy savings. Also, during the offline
operation, the control of the vCPU remains with the LPAR as oppposed to
returning it to the resource pool.

The patch achieves this by creating an infrastructure to set the
preferred_offline_state() which can be either
- CPU_STATE_OFFLINE: which is the current behaviour of calling
  rtas_stop_self()

- CPU_STATE_INACTIVE: which cedes the vCPU to the hypervisor with the latency
  hint specifier.

The codepath which wants to perform a DLPAR operation can set the
preferred_offline_state() of a CPU to CPU_STATE_OFFLINE before invoking
cpu_down().

The patch also provides a boot-time command line argument to disable/enable
CPU_STATE_INACTIVE.

Signed-off-by: Gautham R Shenoy e...@in.ibm.com
Signed-off-by: Nathan Fontenot nf...@austin.ibm.com
---
 Documentation/cpu-hotplug.txt   |6 +
 arch/powerpc/platforms/pseries/hotplug-cpu.c|  182 ++-
 arch/powerpc/platforms/pseries/offline_states.h |   18 ++
 arch/powerpc/platforms/pseries/smp.c|   19 ++
 4 files changed, 216 insertions(+), 9 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/offline_states.h

diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
index 9d620c1..4d4a644 100644
--- a/Documentation/cpu-hotplug.txt
+++ b/Documentation/cpu-hotplug.txt
@@ -49,6 +49,12 @@ maxcpus=nRestrict boot time cpus to n. Say if you have 4 
cpus, using
 additional_cpus=n (*)  Use this to limit hotpluggable cpus. This option sets
cpu_possible_map = cpu_present_map + additional_cpus
 
+cede_offline={off,on}  Use this option to disable/enable putting offlined
+   processors to an extended H_CEDE state on
+   supported pseries platforms.
+   If nothing is specified,
+   cede_offline is set to on.
+
 (*) Option valid only for following architectures
 - ia64
 
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index ebff6d9..6ea4698 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -30,6 +30,7 @@
 #include asm/pSeries_reconfig.h
 #include xics.h
 #include plpar_wrappers.h
+#include offline_states.h
 
 /* This version can't take the spinlock, because it never returns */
 static struct rtas_args rtas_stop_self_args = {
@@ -39,6 +40,55 @@ static struct rtas_args rtas_stop_self_args = {
.rets = rtas_stop_self_args.args[0],
 };
 
+static DEFINE_PER_CPU(enum cpu_state_vals, preferred_offline_state) =
+   CPU_STATE_OFFLINE;
+static DEFINE_PER_CPU(enum cpu_state_vals, current_state) = CPU_STATE_OFFLINE;
+
+static enum cpu_state_vals default_offline_state = CPU_STATE_OFFLINE;
+
+static int cede_offline_enabled __read_mostly = 1;
+
+/*
+ * Enable/disable cede_offline when available.
+ */
+static int __init setup_cede_offline(char *str)
+{
+   if (!strcmp(str, off))
+   cede_offline_enabled = 0;
+   else if (!strcmp(str, on))
+   cede_offline_enabled = 1;
+   else
+   return 0;
+   return 1;
+}
+
+__setup(cede_offline=, setup_cede_offline);
+
+enum cpu_state_vals get_cpu_current_state(int cpu)
+{
+   return per_cpu(current_state, cpu);
+}
+
+void set_cpu_current_state(int cpu, enum cpu_state_vals state)
+{
+   per_cpu(current_state, cpu) = state;
+}
+
+enum cpu_state_vals get_preferred_offline_state(int cpu)
+{
+   return per_cpu(preferred_offline_state, cpu);
+}
+
+void set_preferred_offline_state(int cpu, enum cpu_state_vals state)
+{
+   per_cpu(preferred_offline_state, cpu) = state;
+}
+
+void set_default_offline_state(int cpu)
+{
+   per_cpu(preferred_offline_state, cpu) = default_offline_state;
+}
+
 static void rtas_stop_self(void)
 {
struct rtas_args *args = rtas_stop_self_args;
@@ -56,11 +106,61 @@ static void rtas_stop_self(void)
 
 static void pseries_mach_cpu_die(void)
 {
+   unsigned int cpu = smp_processor_id();
+   unsigned int hwcpu = hard_smp_processor_id();
+   u8 cede_latency_hint = 0;
+
local_irq_disable();
idle_task_exit();
xics_teardown_cpu();
-   unregister_slb_shadow(hard_smp_processor_id(), __pa(get_slb_shadow

[PATCH v5 4/4] pseries: Serialize cpu hotplug operations during deactivate Vs deallocate

2009-10-29 Thread Gautham R Shenoy

Currently the cpu-allocation/deallocation process comprises of two steps:
- Set the indicators and to update the device tree with DLPAR node
  information.

- Online/offline the allocated/deallocated CPU.

This is achieved by writing to the sysfs tunables probe during allocation
and release during deallocation.

At the sametime, the userspace can independently online/offline the CPUs of
the system using the sysfs tunable online.

It is quite possible that when a userspace tool offlines a CPU
for the purpose of deallocation and is in the process of updating the device
tree, some other userspace tool could bring the CPU back online by writing to
the online sysfs tunable thereby causing the deallocate process to fail.

The solution to this is to serialize writes to the probe/release sysfs
tunable with the writes to the online sysfs tunable.

This patch employs a mutex to provide this serialization, which is a no-op on
all architectures except PPC_PSERIES

Signed-off-by: Gautham R Shenoy e...@in.ibm.com
---
 arch/powerpc/platforms/pseries/dlpar.c |   28 +++-
 drivers/base/cpu.c |2 ++
 include/linux/cpu.h|   13 +
 3 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c 
b/arch/powerpc/platforms/pseries/dlpar.c
index 8e04a69..b6fc6ab 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -501,6 +501,18 @@ int release_drc(u32 drc_index)
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
+static DEFINE_MUTEX(pseries_cpu_hotplug_mutex);
+
+void cpu_hotplug_driver_lock()
+{
+   mutex_lock(pseries_cpu_hotplug_mutex);
+}
+
+void cpu_hotplug_driver_unlock()
+{
+   mutex_unlock(pseries_cpu_hotplug_mutex);
+}
+
 static ssize_t cpu_probe_store(struct class *class, const char *buf,
   size_t count)
 {
@@ -509,18 +521,19 @@ static ssize_t cpu_probe_store(struct class *class, const 
char *buf,
char *cpu_name;
int rc;
 
+   cpu_hotplug_driver_lock();
rc = strict_strtoul(buf, 0, drc_index);
if (rc)
-   return -EINVAL;
+   goto out;
 
rc = acquire_drc(drc_index);
if (rc)
-   return -EINVAL;
+   goto out;
 
dn = configure_connector(drc_index);
if (!dn) {
release_drc(drc_index);
-   return -EINVAL;
+   goto out;
}
 
/* fixup dn name */
@@ -529,7 +542,8 @@ static ssize_t cpu_probe_store(struct class *class, const 
char *buf,
if (!cpu_name) {
free_cc_nodes(dn);
release_drc(drc_index);
-   return -ENOMEM;
+   rc = -ENOMEM;
+   goto out;
}
 
sprintf(cpu_name, /cpus/%s, dn-full_name);
@@ -541,6 +555,8 @@ static ssize_t cpu_probe_store(struct class *class, const 
char *buf,
release_drc(drc_index);
 
rc = online_node_cpus(dn);
+out:
+   cpu_hotplug_driver_unlock();
 
return rc ? -EINVAL : count;
 }
@@ -562,6 +578,7 @@ static ssize_t cpu_release_store(struct class *class, const 
char *buf,
return -EINVAL;
}
 
+   cpu_hotplug_driver_lock();
rc = offline_node_cpus(dn);
 
if (rc)
@@ -570,7 +587,7 @@ static ssize_t cpu_release_store(struct class *class, const 
char *buf,
rc = release_drc(*drc_index);
if (rc) {
of_node_put(dn);
-   return -EINVAL;
+   goto out;
}
 
rc = remove_device_tree_nodes(dn);
@@ -579,6 +596,7 @@ static ssize_t cpu_release_store(struct class *class, const 
char *buf,
 
of_node_put(dn);
 out:
+   cpu_hotplug_driver_unlock();
return rc ? -EINVAL : count;
 }
 
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index e62a4cc..07c3f05 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -35,6 +35,7 @@ static ssize_t __ref store_online(struct sys_device *dev, 
struct sysdev_attribut
struct cpu *cpu = container_of(dev, struct cpu, sysdev);
ssize_t ret;
 
+   cpu_hotplug_driver_lock();
switch (buf[0]) {
case '0':
ret = cpu_down(cpu-sysdev.id);
@@ -49,6 +50,7 @@ static ssize_t __ref store_online(struct sys_device *dev, 
struct sysdev_attribut
default:
ret = -EINVAL;
}
+   cpu_hotplug_driver_unlock();
 
if (ret = 0)
ret = count;
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 4753619..b0ad4e1 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -115,6 +115,19 @@ extern void put_online_cpus(void);
 #define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb)
 int cpu_down(unsigned int cpu);
 
+#ifdef CONFIG_PPC_PSERIES
+extern void cpu_hotplug_driver_lock(void);
+extern void cpu_hotplug_driver_unlock(void);
+#else
+static inline void cpu_hotplug_driver_lock(void

Re: [PATCH v5 0/4] pseries: Add cede support for cpu-offline

2009-11-11 Thread Gautham R Shenoy

On Fri, Oct 30, 2009 at 10:52:43AM +0530, Gautham R Shenoy wrote:
 Hi,

Hi Peter,

Did you get a chance to look at this new design ?

 
 This is version 5 of patch series that provides a framework to choose the
 state a pseries CPU must be put to when it is offlined.
 
 Previous versions can be found here:
 Version 4: http://lkml.org/lkml/2009/10/9/59
 Version 3: http://lkml.org/lkml/2009/9/15/164
 Version 2: http://lkml.org/lkml/2009/8/28/102
 Version 1: http://lkml.org/lkml/2009/8/6/236
 
 Changes from the previous version include:
 - Rebased against Nathan Fontenot's latest pseries kernel handling of dynamic
   logical paritioning v4 patches found here:
   http://lkml.org/lkml/2009/10/21/98
 
 - Added boot-time option to disable putting the offlined vcpus into an
   extended H_CEDE state.
 
 - Addressed Ben's comments regarding the if-else sequencing in
   pseries_mach_cpu_die().
 
 - Addition of comments for pseries_cpu_die() to distinguish it from
   pseries_mach_cpu_die()
 
 Also,
 
 - This approach addresses Peter Z's objections regarding layering
   violations. The user simply offlines the cpu and doesn't worry about what
   state the CPU should be put into. That part is automatically handled by the
   kernel.
 
 - It does not add any additional sysfs interface instead uses the existing
   sysfs interface to offline CPUs.
 
 - On platforms which do not have support for ceding the vcpu with a
   latency specifier value, the offlining mechanism defaults to the current
   method of calling rtas_stop_self().
 
 The patchset has been tested on the available pseries platforms and it works
 as per the expectations. I believe that the patch set is ready for inclusion.
 ---
 
 Gautham R Shenoy (4):
   pseries: Serialize cpu hotplug operations during deactivate Vs 
 deallocate
   pseries: Add code to online/offline CPUs of a DLPAR node.
   pSeries: Add hooks to put the CPU into an appropriate offline state
   pSeries: extended_cede_processor() helper function.
 
 
  Documentation/cpu-hotplug.txt   |6 +
  arch/powerpc/include/asm/lppaca.h   |9 +
  arch/powerpc/platforms/pseries/dlpar.c  |  129 
  arch/powerpc/platforms/pseries/hotplug-cpu.c|  182 
 ++-
  arch/powerpc/platforms/pseries/offline_states.h |   18 ++
  arch/powerpc/platforms/pseries/plpar_wrappers.h |   22 +++
  arch/powerpc/platforms/pseries/smp.c|   19 ++
  arch/powerpc/xmon/xmon.c|3 
  drivers/base/cpu.c  |2 
  include/linux/cpu.h |   13 ++
  10 files changed, 387 insertions(+), 16 deletions(-)
  create mode 100644 arch/powerpc/platforms/pseries/offline_states.h
 
 -- 
 Thanks and Regards
 gautham.
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
 

-- 
Thanks and Regards
gautham
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/2] pseries: Don't panic when H_PROD fails during cpu-online.

2009-12-10 Thread Gautham R Shenoy

If an online-attempt on a CPU which has been offlined using H_CEDE
with an appropriate cede latency hint fails, don't panic.

Instead print the error message and let the __cpu_up() code notify the
CPU Hotplug framework of the failure, which in turn can notify the
other subsystem through CPU_UP_CANCELED.

Signed-off-by: Gautham R Shenoy e...@in.ibm.com
---
 arch/powerpc/platforms/pseries/smp.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/smp.c 
b/arch/powerpc/platforms/pseries/smp.c
index 8868c01..b488663 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -144,8 +144,8 @@ static void __devinit smp_pSeries_kick_cpu(int nr)
hcpuid = get_hard_smp_processor_id(nr);
rc = plpar_hcall_norets(H_PROD, hcpuid);
if (rc != H_SUCCESS)
-   panic(Error: Prod to wake up processor %d Ret= %ld\n,
-   nr, rc);
+   printk(KERN_ERR Error: Prod to wake up processor %d\
+   Ret= %ld\n, nr, rc);
}
 }
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 2/2] pseries: Make declarations of cpu_hotplug_driver_lock() ANSI compatible.

2009-12-10 Thread Gautham R Shenoy

And add the __acquires() and __releases() annotations, while at it.

Signed-off-by: Gautham R Shenoy e...@in.ibm.com
---
 arch/powerpc/platforms/pseries/dlpar.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c 
b/arch/powerpc/platforms/pseries/dlpar.c
index 12df9e8..67b7a10 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -346,12 +346,14 @@ int dlpar_release_drc(u32 drc_index)
 
 static DEFINE_MUTEX(pseries_cpu_hotplug_mutex);
 
-void cpu_hotplug_driver_lock()
+void cpu_hotplug_driver_lock(void)
+__acquires(pseries_cpu_hotplug_mutex)
 {
mutex_lock(pseries_cpu_hotplug_mutex);
 }
 
-void cpu_hotplug_driver_unlock()
+void cpu_hotplug_driver_unlock(void)
+__releases(pseries_cpu_hotplug_mutex)
 {
mutex_unlock(pseries_cpu_hotplug_mutex);
 }

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc/cpufreq: Add pr_warn() on OPAL firmware failures

2014-08-04 Thread Gautham R Shenoy

On Sun, Aug 03, 2014 at 02:54:05PM +0530, Vaidyanathan Srinivasan wrote:
 @@ -131,7 +136,12 @@ static unsigned int pstate_id_to_freq(int pstate_id)
   int i;
 
   i = powernv_pstate_info.max - pstate_id;
 - BUG_ON(i = powernv_pstate_info.nr_pstates || i  0);
 + if (i = powernv_pstate_info.nr_pstates || i  0) {
 + pr_warn(PState id %d outside of PState table, 
 + reporting nominal id %d instead\n,
 + pstate_id, powernv_pstate_info.nominal);
 + i = powernv_pstate_info.max - powernv_pstate_info.nominal;

As of now the default loglevel corresponds to KERN_WARNING so this
warning should get printed anyway. However, don't you think it would
be better if we make it a pr_err( ) since it's a platform error that's
causing the pstate_id to go out of bounds ?

Otherwise it looks ok.

Acked-by: Gautham R. Shenoy e...@linux.vnet.ibm.com

 + }
 
   return powernv_freqs[i].frequency;
  }


--
Thanks and Regards
gautham.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/4] powernv:cpufreq: Create pstate_id_to_freq() helper

2014-03-03 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

Create a helper routine that can return the cpu-frequency for the
corresponding pstate_id.

Also, cache the values of the pstate_max, pstate_min and
pstate_nominal and nr_pstates in a static structure so that they can
be reused in the future to perform any validations.

Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 drivers/cpufreq/powernv-cpufreq.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 345501e..d0a8dee 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -37,6 +37,15 @@ static DEFINE_PER_CPU(struct mutex, freq_switch_lock);
 #define POWERNV_MAX_PSTATES256
 
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
+struct powernv_pstate_info {
+   int pstate_min_id;
+   int pstate_max_id;
+   int pstate_nominal_id;
+   int nr_pstates;
+};
+
+static struct powernv_pstate_info powernv_pstate_info;
+
 
 /*
  * Initialize the freq table based on data obtained
@@ -106,9 +115,28 @@ static int init_powernv_pstates(void)
powernv_freqs[i].driver_data = 0;
powernv_freqs[i].frequency = CPUFREQ_TABLE_END;
 
+   powernv_pstate_info.pstate_min_id = pstate_min;
+   powernv_pstate_info.pstate_max_id = pstate_max;
+   powernv_pstate_info.pstate_nominal_id = pstate_nominal;
+   powernv_pstate_info.nr_pstates = nr_pstates;
+
return 0;
 }
 
+/**
+ * Returns the cpu frequency corresponding to the pstate_id.
+ */
+static unsigned int pstate_id_to_freq(int pstate_id)
+{
+   int i;
+
+   i = powernv_pstate_info.pstate_max_id - pstate_id;
+
+   BUG_ON(i = powernv_pstate_info.nr_pstates || i  0);
+   WARN_ON(powernv_freqs[i].driver_data != pstate_id);
+   return powernv_freqs[i].frequency;
+}
+
 static struct freq_attr *powernv_cpu_freq_attr[] = {
cpufreq_freq_attr_scaling_available_freqs,
NULL,
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 2/4] powernv:cpufreq: Export nominal frequency via sysfs.

2014-03-03 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

Create a driver attribute named cpuinfo_nominal_frequency which
creates a sysfs read-only file named cpuinfo_nominal_frequency. Export
the frequency corresponding to the nominal_pstate through this
interface.

Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 drivers/cpufreq/powernv-cpufreq.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index d0a8dee..c59eb26 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -137,8 +137,30 @@ static unsigned int pstate_id_to_freq(int pstate_id)
return powernv_freqs[i].frequency;
 }
 
+/**
+ * show_cpuinfo_nominal_freq - Show the nominal CPU frequency as indicated by
+ * the firmware
+ */
+static ssize_t show_cpuinfo_nominal_freq(struct cpufreq_policy *policy,
+   char *buf)
+{
+   int nominal_freq;
+   nominal_freq = pstate_id_to_freq(powernv_pstate_info.pstate_nominal_id);
+   return sprintf(buf, %u\n, nominal_freq);
+}
+
+
+struct freq_attr cpufreq_freq_attr_cpuinfo_nominal_freq = {
+   .attr = { .name = cpuinfo_nominal_freq,
+ .mode = 0444,
+   },
+   .show = show_cpuinfo_nominal_freq,
+};
+
+
 static struct freq_attr *powernv_cpu_freq_attr[] = {
cpufreq_freq_attr_scaling_available_freqs,
+   cpufreq_freq_attr_cpuinfo_nominal_freq,
NULL,
 };
 
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 0/4] powernv:cpufreq: Export nominal and current frequency via sysfs

2014-03-03 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

Hi,

On IBM POWERNV platforms[1], presently we do not report the current
operating frequency of the processor through the sysfs interface
cpuinfo_cur_freq since the cpu frequency driver[1] has not
implemented the -get(unsigned int cpu) method. Fix this by
implementing the -get(unsigned int cpu) method which will report the
frequency corresponding to the pstate_id on PMSR on cpu.

Also, export the nominal frequency of the processor through the sysfs
interface cpuinfo_nominal_freq by defining it as a driver attribute.

The patch series depends on the cpu-frequency driver patch-series[1].

[1] http://comments.gmane.org/gmane.linux.ports.ppc.embedded/67905

Thanks and Regards
gautham.

Summary:
==
Gautham R. Shenoy (4):
  powernv:cpufreq: Create pstate_id_to_freq() helper
  powernv:cpufreq: Export nominal frequency via sysfs.
  powernv:cpufreq: Create a powernv_cpu_to_core_mask() helper.
  powernv:cpufreq: Implement the driver-get() method

 drivers/cpufreq/powernv-cpufreq.c | 115 --
 1 file changed, 110 insertions(+), 5 deletions(-)

-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 4/4] powernv:cpufreq: Implement the driver-get() method

2014-03-03 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

The current frequency of a cpu is reported through the sysfs file
cpuinfo_cur_freq. This requires the driver to implement a
-get(unsigned int cpu) method which will return the current
operating frequency.

Implement a function named powernv_cpufreq_get() which corresponds to
the required -get() method.

Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 drivers/cpufreq/powernv-cpufreq.c | 41 +++
 1 file changed, 41 insertions(+)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index f0dae6f..5f43e4f 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -218,6 +218,46 @@ static inline void set_pmspr(unsigned long sprn, unsigned 
long val)
BUG();
 }
 
+/**
+ * Computes the current frequency on this cpu
+ * and stores the result in *ret_freq.
+ */
+static void powernv_get_cpu_freq(void *ret_freq)
+{
+   unsigned long pmspr_val;
+   long pstate_id;
+   int *cur_freq, freq;
+
+   cur_freq = (int *)ret_freq;
+   pmspr_val = get_pmspr(SPRN_PMSR);
+   pstate_id = pmspr_val;
+   pstate_id = pstate_id  56;
+   WARN_ON(pstate_id  0);
+   freq = pstate_id_to_freq(pstate_id);
+   pr_debug(cpu %d pmsr %lx pstate_id %ld frequency %d \n,
+   smp_processor_id(), pmspr_val, pstate_id, freq);
+   *cur_freq = freq;
+}
+
+/**
+ * Returns the cpu frequency as reported by the firmware for 'cpu'.
+ * This value is reported through the sysfs file cpuinfo_cur_freq.
+ */
+unsigned int powernv_cpufreq_get(unsigned int cpu)
+{
+   int ret_freq;
+   cpumask_var_t sibling_mask;
+
+   if (!zalloc_cpumask_var(sibling_mask, GFP_KERNEL))
+   return -ENOMEM;
+
+   powernv_cpu_to_core_mask(cpu, sibling_mask);
+   smp_call_function_any(sibling_mask, powernv_get_cpu_freq, ret_freq, 1);
+
+   free_cpumask_var(sibling_mask);
+   return ret_freq;
+}
+
 static void set_pstate(void *pstate)
 {
unsigned long val;
@@ -310,6 +350,7 @@ static int powernv_cpufreq_target(struct cpufreq_policy 
*policy,
 static struct cpufreq_driver powernv_cpufreq_driver = {
.verify = powernv_cpufreq_verify,
.target = powernv_cpufreq_target,
+   .get= powernv_cpufreq_get,
.init   = powernv_cpufreq_cpu_init,
.exit   = powernv_cpufreq_cpu_exit,
.name   = powernv-cpufreq,
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 14/52] powerpc, sysfs: Fix CPU hotplug callback registration

2014-03-06 Thread Gautham R Shenoy

Hello Ben,

On Fri, Mar 07, 2014 at 01:57:31PM +1100, Benjamin Herrenschmidt wrote:
 On Fri, 2014-02-14 at 13:22 +0530, Srivatsa S. Bhat wrote:
  Subsystems that want to register CPU hotplug callbacks, as well as perform
  initialization for the CPUs that are already online, often do it as shown
  below:
  
  get_online_cpus();
  
  for_each_online_cpu(cpu)
  init_cpu(cpu);
  
  register_cpu_notifier(foobar_cpu_notifier);
  
  put_online_cpus();
 
 This patch breaks a good half of my test configs with:
 
 /home/benh/linux-powerpc-test/arch/powerpc/kernel/sysfs.c: In function 
 'topology_init':
 /home/benh/linux-powerpc-test/arch/powerpc/kernel/sysfs.c:979:2: error: 
 implicit declaration of function 'cpu_notifier_register_begin' 
 [-Werror=implicit-function-declaration]
 /home/benh/linux-powerpc-test/arch/powerpc/kernel/sysfs.c:1004:2: error: 
 implicit declaration of function '__register_cpu_notifier' 
 [-Werror=implicit-function-declaration]
 /home/benh/linux-powerpc-test/arch/powerpc/kernel/sysfs.c:1006:2: error: 
 implicit declaration of function 'cpu_notifier_register_done' 
 [-Werror=implicit-function-declaration]
 cc1: all warnings being treated as errors
 make[2]: *** [arch/powerpc/kernel/sysfs.o] Error 1
 make[2]: *** Waiting for unfinished jobs

This patch depends on
[PATCH v2 02/52] CPU hotplug: Provide lockless versions of callback 
registration functions 
of the series (Can be found here: https://lkml.org/lkml/2014/2/14/59).
This particular patch defines 'cpu_notifier_register_begin',
'__register_cpu_notifier', and 'cpu_notifier_register_done' in the
cpu-hotplug core. Are you seeing the build breakage with this patch on
?

--
Thanks and Regards
gautham.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 0/6] powernv:cpufreq: Dynamic cpu-frequency scaling

2014-03-10 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

Hi,

This is the v2 of the consolidated patchset consisting
patches for enabling cpufreq on IBM POWERNV platforms
along with some enhancements. 

The v1 of these patches have been previously 
submitted on linuxppc-dev [1][2].

- This patchset contains code for the platform driver to support CPU
  frequency scaling on IBM POWERNV platforms.

- In addition to the standard control and status files exposed by the
  cpufreq core, the patchset exposes the nominal frequency through the
  file named cpuinfo_nominal_freq.

The patchset is based against commit c3bebc71c4bcdafa24b506adf0c1de3c1f77e2e0
of the mainline tree.

[1]: https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-February/115244.html
[2]: https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-March/115703.html

Gautham R. Shenoy (3):
  powernv:cpufreq: Create pstate_id_to_freq() helper
  powernv:cpufreq: Export nominal frequency via sysfs.
  powernv:cpufreq: Implement the driver-get() method

Srivatsa S. Bhat (2):
  powernv:cpufreq: Create a powernv_cpu_to_core_mask() helper.
  powernv,cpufreq:Add per-core locking to serialize frequency
transitions

Vaidyanathan Srinivasan (1):
  powernv: cpufreq driver for powernv platform

 arch/powerpc/include/asm/reg.h |   4 +
 arch/powerpc/platforms/powernv/Kconfig |   1 +
 drivers/cpufreq/Kconfig|   1 +
 drivers/cpufreq/Kconfig.powerpc|  13 ++
 drivers/cpufreq/Makefile   |   1 +
 drivers/cpufreq/powernv-cpufreq.c  | 397 +
 6 files changed, 417 insertions(+)
 create mode 100644 drivers/cpufreq/powernv-cpufreq.c

-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 5/6] powernv:cpufreq: Export nominal frequency via sysfs.

2014-03-10 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

Create a driver attribute named cpuinfo_nominal_freq which
creates a sysfs read-only file named cpuinfo_nominal_freq. Export
the frequency corresponding to the nominal_pstate through this
interface.

Nominal frequency is the highest non-turbo frequency for the
platform.  This is generally used for setting governor policies from
user space for optimal energy efficiency.

Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 drivers/cpufreq/powernv-cpufreq.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 0ecd163..183bbc4 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -142,8 +142,30 @@ static unsigned int pstate_id_to_freq(int pstate_id)
return powernv_freqs[i].frequency;
 }
 
+/**
+ * show_cpuinfo_nominal_freq - Show the nominal CPU frequency as indicated by
+ * the firmware
+ */
+static ssize_t show_cpuinfo_nominal_freq(struct cpufreq_policy *policy,
+   char *buf)
+{
+   int nominal_freq;
+   nominal_freq = pstate_id_to_freq(powernv_pstate_info.pstate_nominal_id);
+   return sprintf(buf, %u\n, nominal_freq);
+}
+
+
+struct freq_attr cpufreq_freq_attr_cpuinfo_nominal_freq = {
+   .attr = { .name = cpuinfo_nominal_freq,
+ .mode = 0444,
+   },
+   .show = show_cpuinfo_nominal_freq,
+};
+
+
 static struct freq_attr *powernv_cpu_freq_attr[] = {
cpufreq_freq_attr_scaling_available_freqs,
+   cpufreq_freq_attr_cpuinfo_nominal_freq,
NULL,
 };
 
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 1/6] powernv: cpufreq driver for powernv platform

2014-03-10 Thread Gautham R. Shenoy

From: Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com

Backend driver to dynamically set voltage and frequency on
IBM POWER non-virtualized platforms.  Power management SPRs
are used to set the required PState.

This driver works in conjunction with cpufreq governors
like 'ondemand' to provide a demand based frequency and
voltage setting on IBM POWER non-virtualized platforms.

PState table is obtained from OPAL v3 firmware through device
tree.

powernv_cpufreq back-end driver would parse the relevant device-tree
nodes and initialise the cpufreq subsystem on powernv platform.

Signed-off-by: Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com
Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
Signed-off-by: Anton Blanchard an...@samba.org
Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/reg.h |   4 +
 arch/powerpc/platforms/powernv/Kconfig |   1 +
 drivers/cpufreq/Kconfig|   1 +
 drivers/cpufreq/Kconfig.powerpc|  13 ++
 drivers/cpufreq/Makefile   |   1 +
 drivers/cpufreq/powernv-cpufreq.c  | 277 +
 6 files changed, 297 insertions(+)
 create mode 100644 drivers/cpufreq/powernv-cpufreq.c

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 90c06ec..84f92ca 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -271,6 +271,10 @@
 #define SPRN_HSRR1 0x13B   /* Hypervisor Save/Restore 1 */
 #define SPRN_IC0x350   /* Virtual Instruction Count */
 #define SPRN_VTB   0x351   /* Virtual Time Base */
+#define SPRN_PMICR 0x354   /* Power Management Idle Control Reg */
+#define SPRN_PMSR  0x355   /* Power Management Status Reg */
+#define SPRN_PMCR  0x374   /* Power Management Control Register */
+
 /* HFSCR and FSCR bit numbers are the same */
 #define FSCR_TAR_LG8   /* Enable Target Address Register */
 #define FSCR_EBB_LG7   /* Enable Event Based Branching */
diff --git a/arch/powerpc/platforms/powernv/Kconfig 
b/arch/powerpc/platforms/powernv/Kconfig
index 895e8a2..1fe12b1 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -11,6 +11,7 @@ config PPC_POWERNV
select PPC_UDBG_16550
select PPC_SCOM
select ARCH_RANDOM
+   select CPU_FREQ
default y
 
 config PPC_POWERNV_RTAS
diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
index 4b029c0..4ba1632 100644
--- a/drivers/cpufreq/Kconfig
+++ b/drivers/cpufreq/Kconfig
@@ -48,6 +48,7 @@ config CPU_FREQ_STAT_DETAILS
 choice
prompt Default CPUFreq governor
default CPU_FREQ_DEFAULT_GOV_USERSPACE if ARM_SA1100_CPUFREQ || 
ARM_SA1110_CPUFREQ
+   default CPU_FREQ_DEFAULT_GOV_ONDEMAND if POWERNV_CPUFREQ
default CPU_FREQ_DEFAULT_GOV_PERFORMANCE
help
  This option sets which CPUFreq governor shall be loaded at
diff --git a/drivers/cpufreq/Kconfig.powerpc b/drivers/cpufreq/Kconfig.powerpc
index ca0021a..93f8689 100644
--- a/drivers/cpufreq/Kconfig.powerpc
+++ b/drivers/cpufreq/Kconfig.powerpc
@@ -54,3 +54,16 @@ config PPC_PASEMI_CPUFREQ
help
  This adds the support for frequency switching on PA Semi
  PWRficient processors.
+
+config POWERNV_CPUFREQ
+   tristate CPU frequency scaling for IBM POWERNV platform
+   depends on PPC_POWERNV
+   select CPU_FREQ_GOV_PERFORMANCE
+   select CPU_FREQ_GOV_POWERSAVE
+   select CPU_FREQ_GOV_USERSPACE
+   select CPU_FREQ_GOV_ONDEMAND
+   select CPU_FREQ_GOV_CONSERVATIVE
+   default y
+   help
+This adds support for CPU frequency switching on IBM POWERNV
+platform
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 7494565..0dbb963 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -86,6 +86,7 @@ obj-$(CONFIG_PPC_CORENET_CPUFREQ)   += ppc-corenet-cpufreq.o
 obj-$(CONFIG_CPU_FREQ_PMAC)+= pmac32-cpufreq.o
 obj-$(CONFIG_CPU_FREQ_PMAC64)  += pmac64-cpufreq.o
 obj-$(CONFIG_PPC_PASEMI_CPUFREQ)   += pasemi-cpufreq.o
+obj-$(CONFIG_POWERNV_CPUFREQ)  += powernv-cpufreq.o
 
 
##
 # Other platform drivers
diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
new file mode 100644
index 000..ab1551f
--- /dev/null
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -0,0 +1,277 @@
+/*
+ * POWERNV cpufreq driver for the IBM POWER processors
+ *
+ * (C) Copyright IBM 2014
+ *
+ * Author: Vaidyanathan Srinivasan svaidy at linux.vnet.ibm.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope

[PATCH v2 3/6] powernv, cpufreq:Add per-core locking to serialize frequency transitions

2014-03-10 Thread Gautham R. Shenoy

From: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com

On POWER systems, the CPU frequency is controlled at a core-level and
hence we need to serialize so that only one of the threads in the core
switches the core's frequency at a time.

Using a global mutex lock would needlessly serialize _all_ frequency
transitions in the system (across all cores). So introduce per-core
locking to enable finer-grained synchronization and thereby enhance
the speed and responsiveness of the cpufreq driver to varying workload
demands.

The design of per-core locking is very simple and straight-forward: we
first define a Per-CPU lock and use the ones that belongs to the first
thread sibling of the core.

cpu_first_thread_sibling() macro is used to find the *common* lock for
all thread siblings belonging to a core.

Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
Signed-off-by: Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com
Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 drivers/cpufreq/powernv-cpufreq.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 4cad727..4c2e8ca 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -24,8 +24,15 @@
 #include linux/of.h
 #include asm/cputhreads.h
 
-/* FIXME: Make this per-core */
-static DEFINE_MUTEX(freq_switch_mutex);
+/* Per-Core locking for frequency transitions */
+static DEFINE_PER_CPU(struct mutex, freq_switch_lock);
+
+#define lock_core_freq(cpu)\
+   mutex_lock(per_cpu(freq_switch_lock,\
+   cpu_first_thread_sibling(cpu)));
+#define unlock_core_freq(cpu)  \
+   mutex_unlock(per_cpu(freq_switch_lock,\
+   cpu_first_thread_sibling(cpu)));
 
 #define POWERNV_MAX_PSTATES256
 
@@ -233,7 +240,7 @@ static int powernv_cpufreq_target(struct cpufreq_policy 
*policy,
freqs.new = powernv_freqs[new_index].frequency;
freqs.cpu = policy-cpu;
 
-   mutex_lock(freq_switch_mutex);
+   lock_core_freq(policy-cpu);
cpufreq_notify_transition(policy, freqs, CPUFREQ_PRECHANGE);
 
pr_debug(setting frequency for cpu %d to %d kHz index %d pstate %d,
@@ -245,7 +252,7 @@ static int powernv_cpufreq_target(struct cpufreq_policy 
*policy,
rc = powernv_set_freq(policy-cpus, new_index);
 
cpufreq_notify_transition(policy, freqs, CPUFREQ_POSTCHANGE);
-   mutex_unlock(freq_switch_mutex);
+   unlock_core_freq(policy-cpu);
 
return rc;
 }
@@ -262,7 +269,7 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
 
 static int __init powernv_cpufreq_init(void)
 {
-   int rc = 0;
+   int cpu, rc = 0;
 
/* Discover pstates from device tree and init */
 
@@ -272,6 +279,10 @@ static int __init powernv_cpufreq_init(void)
pr_info(powernv-cpufreq disabled\n);
return rc;
}
+   /* Init per-core mutex */
+   for_each_possible_cpu(cpu) {
+   mutex_init(per_cpu(freq_switch_lock, cpu));
+   }
 
rc = cpufreq_register_driver(powernv_cpufreq_driver);
return rc;
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 2/6] powernv:cpufreq: Create a powernv_cpu_to_core_mask() helper.

2014-03-10 Thread Gautham R. Shenoy

From: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com

Create a helper method that computes the cpumask corresponding to the
thread-siblings of a cpu. Use this for initializing the policy-cpus
mask for a given cpu.

(Original code written by Srivatsa S. Bhat. Gautham moved this to a
helper function!)

Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 drivers/cpufreq/powernv-cpufreq.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index ab1551f..4cad727 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -115,6 +115,23 @@ static struct freq_attr *powernv_cpu_freq_attr[] = {
 
 /* Helper routines */
 
+/**
+ * Sets the bits corresponding to the thread-siblings of cpu in its core
+ * in 'cpus'.
+ */
+static void powernv_cpu_to_core_mask(unsigned int cpu, cpumask_var_t cpus)
+{
+   int base, i;
+
+   base = cpu_first_thread_sibling(cpu);
+
+   for (i = 0; i  threads_per_core; i++) {
+   cpumask_set_cpu(base + i, cpus);
+   }
+
+   return;
+}
+
 /* Access helpers to power mgt SPR */
 
 static inline unsigned long get_pmspr(unsigned long sprn)
@@ -180,13 +197,8 @@ static int powernv_set_freq(cpumask_var_t cpus, unsigned 
int new_index)
 
 static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
 {
-   int base, i;
-
 #ifdef CONFIG_SMP
-   base = cpu_first_thread_sibling(policy-cpu);
-
-   for (i = 0; i  threads_per_core; i++)
-   cpumask_set_cpu(base + i, policy-cpus);
+   powernv_cpu_to_core_mask(policy-cpu, policy-cpus);
 #endif
policy-cpuinfo.transition_latency = 25000;
 
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 6/6] powernv:cpufreq: Implement the driver-get() method

2014-03-10 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

The current frequency of a cpu is reported through the sysfs file
cpuinfo_cur_freq. This requires the driver to implement a
-get(unsigned int cpu) method which will return the current
operating frequency.

Implement a function named powernv_cpufreq_get() which reads the local
pstate from the PMSR and returns the corresponding frequency.

Set the powernv_cpufreq_driver.get hook to powernv_cpufreq_get().

Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 drivers/cpufreq/powernv-cpufreq.c | 48 +++
 1 file changed, 48 insertions(+)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 183bbc4..6f3b6e1 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -223,6 +223,53 @@ static inline void set_pmspr(unsigned long sprn, unsigned 
long val)
BUG();
 }
 
+/*
+ * Computes the current frequency on this cpu
+ * and stores the result in *ret_freq.
+ */
+static void powernv_read_cpu_freq(void *ret_freq)
+{
+   unsigned long pmspr_val;
+   s8 local_pstate_id;
+   int *cur_freq, freq, pstate_id;
+
+   cur_freq = (int *)ret_freq;
+   pmspr_val = get_pmspr(SPRN_PMSR);
+
+   /* The local pstate id corresponds bits 48..55 in the PMSR.
+ * Note: Watch out for the sign! */
+   local_pstate_id = (pmspr_val  48)  0xFF;
+   pstate_id = local_pstate_id;
+
+   freq = pstate_id_to_freq(pstate_id);
+   pr_debug(cpu %d pmsr %lx pstate_id %d frequency %d \n,
+   smp_processor_id(), pmspr_val, pstate_id, freq);
+   *cur_freq = freq;
+}
+
+/*
+ * Returns the cpu frequency as reported by the firmware for 'cpu'.
+ * This value is reported through the sysfs file cpuinfo_cur_freq.
+ */
+unsigned int powernv_cpufreq_get(unsigned int cpu)
+{
+   int ret_freq;
+   cpumask_var_t sibling_mask;
+
+   if (unlikely(!zalloc_cpumask_var(sibling_mask, GFP_KERNEL))) {
+   smp_call_function_single(cpu, powernv_read_cpu_freq,
+   ret_freq, 1);
+   return ret_freq;
+   }
+
+   powernv_cpu_to_core_mask(cpu, sibling_mask);
+   smp_call_function_any(sibling_mask, powernv_read_cpu_freq,
+   ret_freq, 1);
+
+   free_cpumask_var(sibling_mask);
+   return ret_freq;
+}
+
 static void set_pstate(void *pstate)
 {
unsigned long val;
@@ -309,6 +356,7 @@ static int powernv_cpufreq_target(struct cpufreq_policy 
*policy,
 static struct cpufreq_driver powernv_cpufreq_driver = {
.verify = powernv_cpufreq_verify,
.target = powernv_cpufreq_target,
+   .get= powernv_cpufreq_get,
.init   = powernv_cpufreq_cpu_init,
.exit   = powernv_cpufreq_cpu_exit,
.name   = powernv-cpufreq,
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 4/6] powernv:cpufreq: Create pstate_id_to_freq() helper

2014-03-10 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

Create a helper routine that can return the cpu-frequency for the
corresponding pstate_id.

Also, cache the values of the pstate_max, pstate_min and
pstate_nominal and nr_pstates in a static structure so that they can
be reused in the future to perform any validations.

Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 drivers/cpufreq/powernv-cpufreq.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 4c2e8ca..0ecd163 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -39,6 +39,14 @@ static DEFINE_PER_CPU(struct mutex, freq_switch_lock);
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
 static int powernv_pstate_ids[POWERNV_MAX_PSTATES+1];
 
+struct powernv_pstate_info {
+   int pstate_min_id;
+   int pstate_max_id;
+   int pstate_nominal_id;
+   int nr_pstates;
+};
+static struct powernv_pstate_info powernv_pstate_info;
+
 /*
  * Initialize the freq table based on data obtained
  * from the firmware passed via device-tree
@@ -112,9 +120,28 @@ static int init_powernv_pstates(void)
for (i = 0; powernv_freqs[i].frequency != CPUFREQ_TABLE_END; i++)
pr_debug(%d: %d\n, i, powernv_freqs[i].frequency);
 
+   powernv_pstate_info.pstate_min_id = pstate_min;
+   powernv_pstate_info.pstate_max_id = pstate_max;
+   powernv_pstate_info.pstate_nominal_id = pstate_nominal;
+   powernv_pstate_info.nr_pstates = nr_pstates;
+
return 0;
 }
 
+/**
+ * Returns the cpu frequency corresponding to the pstate_id.
+ */
+static unsigned int pstate_id_to_freq(int pstate_id)
+{
+   int i;
+
+   i = powernv_pstate_info.pstate_max_id - pstate_id;
+
+   BUG_ON(i = powernv_pstate_info.nr_pstates || i  0);
+   WARN_ON(powernv_pstate_ids[i] != pstate_id);
+   return powernv_freqs[i].frequency;
+}
+
 static struct freq_attr *powernv_cpu_freq_attr[] = {
cpufreq_freq_attr_scaling_available_freqs,
NULL,
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[RFC PATCH 1/2] powerpc: powernv: Framework to show the correct clock in /proc/cpuinfo

2014-03-11 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

Currently, the code in setup-common.c for powerpc assumes that all
clock rates are same in a smp system. This value is cached in the
variable named ppc_proc_freq and is the value that is reported in
/proc/cpuinfo.

However on the PowerNV platform, the clock rate is same only across
the threads of the same core. Hence the value that is reported in
/proc/cpuinfo is incorrect on PowerNV platforms. We need a better way
to query and report the correct value of the processor clock in
/proc/cpuinfo.

The patch achieves this by creating a machdep_call named
get_proc_freq() which is expected to returns the frequency in Hz. The
code in show_cpuinfo() can invoke this method to display the correct
clock rate on platforms that have implemented this method. On the
other powerpc platforms it can use the value cached in ppc_proc_freq.

Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/machdep.h |  2 ++
 arch/powerpc/kernel/setup-common.c | 16 
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index ad3025d..eb1c6d4 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -113,6 +113,8 @@ struct machdep_calls {
/* Optional, may be NULL. */
void(*show_cpuinfo)(struct seq_file *m);
void(*show_percpuinfo)(struct seq_file *m, int i);
+   /* Returns the current operating frequency of cpu in Hz */
+   unsigned long   (*get_proc_freq)(unsigned int cpu);
 
void(*init_IRQ)(void);
 
diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index bc76cc6..bdd3045 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -209,6 +209,7 @@ static int show_cpuinfo(struct seq_file *m, void *v)
 {
unsigned long cpu_id = (unsigned long)v - 1;
unsigned int pvr;
+   unsigned long proc_freq;
unsigned short maj;
unsigned short min;
 
@@ -260,12 +261,19 @@ static int show_cpuinfo(struct seq_file *m, void *v)
 #endif /* CONFIG_TAU */
 
/*
-* Assume here that all clock rates are the same in a
-* smp system.  -- Cort
+* Platforms that have variable clock rates, should implement
+* the method ppc_md.get_proc_freq() that reports the clock
+* rate of a given cpu. The rest can use ppc_proc_freq to
+* report the clock rate that is same across all cpus.
 */
-   if (ppc_proc_freq)
+   if (ppc_md.get_proc_freq)
+   proc_freq = ppc_md.get_proc_freq(cpu_id);
+   else
+   proc_freq = ppc_proc_freq;
+
+   if (proc_freq)
seq_printf(m, clock\t\t: %lu.%06luMHz\n,
-  ppc_proc_freq / 100, ppc_proc_freq % 100);
+  proc_freq / 100, proc_freq % 100);
 
if (ppc_md.show_percpuinfo != NULL)
ppc_md.show_percpuinfo(m, cpu_id);
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[RFC PATCH 2/2] powerpc: powernv: Implement ppc_md.get_proc_freq()

2014-03-11 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

Implement a method named pnv_get_proc_freq(unsigned int cpu) which
returns the current clock rate on the 'cpu' in Hz to be reported in
/proc/cpuinfo. This method uses the value reported by cpufreq when
such a value is sane. Otherwise it falls back to old way of reporting
the clockrate, i.e. ppc_proc_freq.

Set the ppc_md.get_proc_freq() hook to pnv_get_proc_freq() on the
PowerNV platform.

Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/powernv/setup.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index 110f4fb..423e36d 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -28,6 +28,7 @@
 #include linux/bug.h
 #include linux/cpuidle.h
 #include linux/pci.h
+#include linux/cpufreq.h
 
 #include asm/machdep.h
 #include asm/firmware.h
@@ -235,6 +236,25 @@ void powernv_idle(void)
}
 }
 
+/*
+ * Returns the cpu frequency for 'cpu' in Hz. This is used by
+ * /proc/cpuinfo
+ */
+unsigned long pnv_get_proc_freq(unsigned int cpu)
+{
+   unsigned long ret_freq;
+
+   ret_freq = cpufreq_quick_get(cpu) * 1000ul;
+
+   /*
+* If the backend cpufreq driver does not exist,
+ * then fallback to old way of reporting the clockrate.
+*/
+   if (!ret_freq)
+   ret_freq = ppc_proc_freq;
+   return ret_freq;
+}
+
 define_machine(powernv) {
.name   = PowerNV,
.probe  = pnv_probe,
@@ -242,6 +262,7 @@ define_machine(powernv) {
.setup_arch = pnv_setup_arch,
.init_IRQ   = pnv_init_IRQ,
.show_cpuinfo   = pnv_show_cpuinfo,
+   .get_proc_freq  = pnv_get_proc_freq,
.progress   = pnv_progress,
.machine_shutdown   = pnv_shutdown,
.power_save = powernv_idle,
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[RFC PATCH 0/2] powernv: Show the correct clock value in /proc/cpuinfo

2014-03-11 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

Hi,

Currently, the code in setup-common.c for powerpc assumes that all
clock rates are same in a smp system. This value is cached in the
variable named ppc_proc_freq and is the value that is reported in
/proc/cpuinfo.

However on the PowerNV platform, the clock rate is same only across
the threads of the same core. Hence the value that is reported in
/proc/cpuinfo is incorrect on PowerNV platforms.

This patch-series fixes this problem by having /proc/cpuinfo report
the value returned by cpufreq_quick_get(cpu) whenever the cpufreq
backend driver is available and fallback to the old way of reporting
the clock rate in its absence.

These patches depend on the patches to enable dynamic cpufrequency
scaling on PowerNV that can be found here:
http://linuxppc.10917.n7.nabble.com/PATCH-v2-0-6-powernv-cpufreq-Dynamic-cpu-frequency-scaling-td80641.html

Gautham R. Shenoy (2):
  powerpc: powernv: Framework to show the correct clock in /proc/cpuinfo
  powerpc: powernv: Implement ppc_md.get_proc_freq()

 arch/powerpc/include/asm/machdep.h |  2 ++
 arch/powerpc/kernel/setup-common.c | 16 
 arch/powerpc/platforms/powernv/setup.c | 21 +
 3 files changed, 35 insertions(+), 4 deletions(-)

-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 2/6] powernv:cpufreq: Create a powernv_cpu_to_core_mask() helper.

2014-03-19 Thread Gautham R Shenoy

On Wed, Mar 19, 2014 at 12:05:20PM +0530, Srivatsa S. Bhat wrote:
 On 03/19/2014 05:07 AM, Benjamin Herrenschmidt wrote:
  On Mon, 2014-03-10 at 16:40 +0530, Gautham R. Shenoy wrote:
  From: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
 
  Create a helper method that computes the cpumask corresponding to the
  thread-siblings of a cpu. Use this for initializing the policy-cpus
  mask for a given cpu.
 
  (Original code written by Srivatsa S. Bhat. Gautham moved this to a
  helper function!)
  
  How does that differ from the existing entry in the sibling map  which
  you can obtain with the helper cpu_sibling_mask() ? (You probably want
  to *copy* the mask of course but I don't see the need of re-creating
  one from scratch).
 
 
 The intent here was to have a sibling mask that is invariant of CPU
 hotplug. cpu_sibling_mask is updated on every hotplug operation, and this
 would break cpufreq, since policy-cpus has to be hotplug invariant.
 
 This should have been noted in the changelog of patch 1 as well as this
 patch. (The earlier (internal) versions of this patchset had the
 description in the changelogs, but somehow it got dropped accidentally).
 I'll work with Gautham and ensure that we have this important info
 included in the changelog. Thanks for pointing it out!

I reused that part of the code because I was unaware of
cpu_sibling_mask()!

For implementing powernv_cpufreq_get(), cpu_sibling_mask() suffices
since we are using this mask as a parameter to
smp_call_function_any(), and hence are concerned only about the online
siblings. 

I shall fix the changelog to reflect Srivatsa's reason for having a
mask that does not vary with cpu-hotplug.

 
 Regards,
 Srivatsa S. Bhat
 
 
  Also, this should have been CCed to the cpufreq mailing list and maybe
  the relevant maintainer too.

Ok. Will do it in the subsequent version. 

Thanks for the review!

  
  Cheers,
  Ben.
  
  Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
  Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
  ---
   drivers/cpufreq/powernv-cpufreq.c | 24 ++--
   1 file changed, 18 insertions(+), 6 deletions(-)
 
  diff --git a/drivers/cpufreq/powernv-cpufreq.c 
  b/drivers/cpufreq/powernv-cpufreq.c
  index ab1551f..4cad727 100644
  --- a/drivers/cpufreq/powernv-cpufreq.c
  +++ b/drivers/cpufreq/powernv-cpufreq.c
  @@ -115,6 +115,23 @@ static struct freq_attr *powernv_cpu_freq_attr[] = {
   
   /* Helper routines */
   
  +/**
  + * Sets the bits corresponding to the thread-siblings of cpu in its core
  + * in 'cpus'.
  + */
  +static void powernv_cpu_to_core_mask(unsigned int cpu, cpumask_var_t cpus)
  +{
  +  int base, i;
  +
  +  base = cpu_first_thread_sibling(cpu);
  +
  +  for (i = 0; i  threads_per_core; i++) {
  +  cpumask_set_cpu(base + i, cpus);
  +  }
  +
  +  return;
  +}
  +
   /* Access helpers to power mgt SPR */
   
   static inline unsigned long get_pmspr(unsigned long sprn)
  @@ -180,13 +197,8 @@ static int powernv_set_freq(cpumask_var_t cpus, 
  unsigned int new_index)
   
   static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
   {
  -  int base, i;
  -
   #ifdef CONFIG_SMP
  -  base = cpu_first_thread_sibling(policy-cpu);
  -
  -  for (i = 0; i  threads_per_core; i++)
  -  cpumask_set_cpu(base + i, policy-cpus);
  +  powernv_cpu_to_core_mask(policy-cpu, policy-cpus);
   #endif
 policy-cpuinfo.transition_latency = 25000;
   
  
  
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 0/5] powernv: Enable Dynamic Frequency

2014-03-20 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

Hi,

This is the v3 of the consolidated patchset consisting
patches for enabling cpufreq on IBM POWERNV platforms
along with some enhancements. I have incorporated the review 
for v2 (which can be found at [1]).

- This patchset contains code for the platform driver to support CPU
  frequency scaling on IBM POWERNV platforms.

- In addition to the standard control and status files exposed by the
  cpufreq core, the patchset exposes the nominal frequency through the
  file named cpuinfo_nominal_freq.


The changes from from v2:
  - Updated the changelog for powernv: cpufreq driver for powernv platform
to record the fact that policy-cpus must be populated with a sibling mask
that includes even the offlined siblings.

  - Dropped the patch named 
powernv:cpufreq: Create a powernv_cpu_to_core_mask() helper
 since the driver-get() method can use cpu_sibling_mask().
  
  -  Updated powernv:cpufreq: Implement the driver-get() method to 
 use cpu_sibling_mask() in powernv_cpufreq_get()

The patchset is based against commit 4907cdca7210c5895311bddcf05a4c85b67d8566
of the mainline tree.

[1]: https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-March/115861.html


Gautham R. Shenoy (3):
  powernv:cpufreq: Create pstate_id_to_freq() helper
  powernv:cpufreq: Export nominal frequency via sysfs.
  powernv:cpufreq: Implement the driver-get() method

Srivatsa S. Bhat (1):
  powernv,cpufreq:Add per-core locking to serialize frequency
transitions

Vaidyanathan Srinivasan (1):
  powernv: cpufreq driver for powernv platform

 arch/powerpc/include/asm/reg.h |   4 +
 arch/powerpc/platforms/powernv/Kconfig |   1 +
 drivers/cpufreq/Kconfig|   1 +
 drivers/cpufreq/Kconfig.powerpc|  13 ++
 drivers/cpufreq/Makefile   |   1 +
 drivers/cpufreq/powernv-cpufreq.c  | 375 +
 6 files changed, 395 insertions(+)
 create mode 100644 drivers/cpufreq/powernv-cpufreq.c

-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 1/5] powernv: cpufreq driver for powernv platform

2014-03-20 Thread Gautham R. Shenoy

From: Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com

Backend driver to dynamically set voltage and frequency on
IBM POWER non-virtualized platforms.  Power management SPRs
are used to set the required PState.

This driver works in conjunction with cpufreq governors
like 'ondemand' to provide a demand based frequency and
voltage setting on IBM POWER non-virtualized platforms.

PState table is obtained from OPAL v3 firmware through device
tree.

powernv_cpufreq back-end driver would parse the relevant device-tree
nodes and initialise the cpufreq subsystem on powernv platform.

The policy-cpus needs to be populated in a hotplug-invariant manner
instead of using cpu_sibling_mask() which varies with cpu-hotplug. This
is because the cpufreq core code copies this content into policy-related_cpus
mask which is should not vary on cpu-hotplug.

[Change log updated by e...@linux.vnet.ibm.com]

Signed-off-by: Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com
Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
Signed-off-by: Anton Blanchard an...@samba.org
Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/reg.h |   4 +
 arch/powerpc/platforms/powernv/Kconfig |   1 +
 drivers/cpufreq/Kconfig|   1 +
 drivers/cpufreq/Kconfig.powerpc|  13 ++
 drivers/cpufreq/Makefile   |   1 +
 drivers/cpufreq/powernv-cpufreq.c  | 277 +
 6 files changed, 297 insertions(+)
 create mode 100644 drivers/cpufreq/powernv-cpufreq.c

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 90c06ec..84f92ca 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -271,6 +271,10 @@
 #define SPRN_HSRR1 0x13B   /* Hypervisor Save/Restore 1 */
 #define SPRN_IC0x350   /* Virtual Instruction Count */
 #define SPRN_VTB   0x351   /* Virtual Time Base */
+#define SPRN_PMICR 0x354   /* Power Management Idle Control Reg */
+#define SPRN_PMSR  0x355   /* Power Management Status Reg */
+#define SPRN_PMCR  0x374   /* Power Management Control Register */
+
 /* HFSCR and FSCR bit numbers are the same */
 #define FSCR_TAR_LG8   /* Enable Target Address Register */
 #define FSCR_EBB_LG7   /* Enable Event Based Branching */
diff --git a/arch/powerpc/platforms/powernv/Kconfig 
b/arch/powerpc/platforms/powernv/Kconfig
index 895e8a2..1fe12b1 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -11,6 +11,7 @@ config PPC_POWERNV
select PPC_UDBG_16550
select PPC_SCOM
select ARCH_RANDOM
+   select CPU_FREQ
default y
 
 config PPC_POWERNV_RTAS
diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
index 4b029c0..4ba1632 100644
--- a/drivers/cpufreq/Kconfig
+++ b/drivers/cpufreq/Kconfig
@@ -48,6 +48,7 @@ config CPU_FREQ_STAT_DETAILS
 choice
prompt Default CPUFreq governor
default CPU_FREQ_DEFAULT_GOV_USERSPACE if ARM_SA1100_CPUFREQ || 
ARM_SA1110_CPUFREQ
+   default CPU_FREQ_DEFAULT_GOV_ONDEMAND if POWERNV_CPUFREQ
default CPU_FREQ_DEFAULT_GOV_PERFORMANCE
help
  This option sets which CPUFreq governor shall be loaded at
diff --git a/drivers/cpufreq/Kconfig.powerpc b/drivers/cpufreq/Kconfig.powerpc
index ca0021a..93f8689 100644
--- a/drivers/cpufreq/Kconfig.powerpc
+++ b/drivers/cpufreq/Kconfig.powerpc
@@ -54,3 +54,16 @@ config PPC_PASEMI_CPUFREQ
help
  This adds the support for frequency switching on PA Semi
  PWRficient processors.
+
+config POWERNV_CPUFREQ
+   tristate CPU frequency scaling for IBM POWERNV platform
+   depends on PPC_POWERNV
+   select CPU_FREQ_GOV_PERFORMANCE
+   select CPU_FREQ_GOV_POWERSAVE
+   select CPU_FREQ_GOV_USERSPACE
+   select CPU_FREQ_GOV_ONDEMAND
+   select CPU_FREQ_GOV_CONSERVATIVE
+   default y
+   help
+This adds support for CPU frequency switching on IBM POWERNV
+platform
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 7494565..0dbb963 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -86,6 +86,7 @@ obj-$(CONFIG_PPC_CORENET_CPUFREQ)   += ppc-corenet-cpufreq.o
 obj-$(CONFIG_CPU_FREQ_PMAC)+= pmac32-cpufreq.o
 obj-$(CONFIG_CPU_FREQ_PMAC64)  += pmac64-cpufreq.o
 obj-$(CONFIG_PPC_PASEMI_CPUFREQ)   += pasemi-cpufreq.o
+obj-$(CONFIG_POWERNV_CPUFREQ)  += powernv-cpufreq.o
 
 
##
 # Other platform drivers
diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
new file mode 100644
index 000..ab1551f
--- /dev/null
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -0,0 +1,277 @@
+/*
+ * POWERNV cpufreq driver for the IBM POWER processors
+ *
+ * (C) Copyright IBM 2014
+ *
+ * Author: Vaidyanathan Srinivasan svaidy

[PATCH v3 2/5] powernv, cpufreq:Add per-core locking to serialize frequency transitions

2014-03-20 Thread Gautham R. Shenoy

From: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com

On POWER systems, the CPU frequency is controlled at a core-level and
hence we need to serialize so that only one of the threads in the core
switches the core's frequency at a time.

Using a global mutex lock would needlessly serialize _all_ frequency
transitions in the system (across all cores). So introduce per-core
locking to enable finer-grained synchronization and thereby enhance
the speed and responsiveness of the cpufreq driver to varying workload
demands.

The design of per-core locking is very simple and straight-forward: we
first define a Per-CPU lock and use the ones that belongs to the first
thread sibling of the core.

cpu_first_thread_sibling() macro is used to find the *common* lock for
all thread siblings belonging to a core.

Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
Signed-off-by: Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com
Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 drivers/cpufreq/powernv-cpufreq.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index ab1551f..66dae0d 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -24,8 +24,15 @@
 #include linux/of.h
 #include asm/cputhreads.h
 
-/* FIXME: Make this per-core */
-static DEFINE_MUTEX(freq_switch_mutex);
+/* Per-Core locking for frequency transitions */
+static DEFINE_PER_CPU(struct mutex, freq_switch_lock);
+
+#define lock_core_freq(cpu)\
+   mutex_lock(per_cpu(freq_switch_lock,\
+   cpu_first_thread_sibling(cpu)));
+#define unlock_core_freq(cpu)  \
+   mutex_unlock(per_cpu(freq_switch_lock,\
+   cpu_first_thread_sibling(cpu)));
 
 #define POWERNV_MAX_PSTATES256
 
@@ -221,7 +228,7 @@ static int powernv_cpufreq_target(struct cpufreq_policy 
*policy,
freqs.new = powernv_freqs[new_index].frequency;
freqs.cpu = policy-cpu;
 
-   mutex_lock(freq_switch_mutex);
+   lock_core_freq(policy-cpu);
cpufreq_notify_transition(policy, freqs, CPUFREQ_PRECHANGE);
 
pr_debug(setting frequency for cpu %d to %d kHz index %d pstate %d,
@@ -233,7 +240,7 @@ static int powernv_cpufreq_target(struct cpufreq_policy 
*policy,
rc = powernv_set_freq(policy-cpus, new_index);
 
cpufreq_notify_transition(policy, freqs, CPUFREQ_POSTCHANGE);
-   mutex_unlock(freq_switch_mutex);
+   unlock_core_freq(policy-cpu);
 
return rc;
 }
@@ -250,7 +257,7 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
 
 static int __init powernv_cpufreq_init(void)
 {
-   int rc = 0;
+   int cpu, rc = 0;
 
/* Discover pstates from device tree and init */
 
@@ -260,6 +267,10 @@ static int __init powernv_cpufreq_init(void)
pr_info(powernv-cpufreq disabled\n);
return rc;
}
+   /* Init per-core mutex */
+   for_each_possible_cpu(cpu) {
+   mutex_init(per_cpu(freq_switch_lock, cpu));
+   }
 
rc = cpufreq_register_driver(powernv_cpufreq_driver);
return rc;
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 3/5] powernv:cpufreq: Create pstate_id_to_freq() helper

2014-03-20 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

Create a helper routine that can return the cpu-frequency for the
corresponding pstate_id.

Also, cache the values of the pstate_max, pstate_min and
pstate_nominal and nr_pstates in a static structure so that they can
be reused in the future to perform any validations.

Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 drivers/cpufreq/powernv-cpufreq.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 66dae0d..e7b0292 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -39,6 +39,14 @@ static DEFINE_PER_CPU(struct mutex, freq_switch_lock);
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
 static int powernv_pstate_ids[POWERNV_MAX_PSTATES+1];
 
+struct powernv_pstate_info {
+   int pstate_min_id;
+   int pstate_max_id;
+   int pstate_nominal_id;
+   int nr_pstates;
+};
+static struct powernv_pstate_info powernv_pstate_info;
+
 /*
  * Initialize the freq table based on data obtained
  * from the firmware passed via device-tree
@@ -112,9 +120,28 @@ static int init_powernv_pstates(void)
for (i = 0; powernv_freqs[i].frequency != CPUFREQ_TABLE_END; i++)
pr_debug(%d: %d\n, i, powernv_freqs[i].frequency);
 
+   powernv_pstate_info.pstate_min_id = pstate_min;
+   powernv_pstate_info.pstate_max_id = pstate_max;
+   powernv_pstate_info.pstate_nominal_id = pstate_nominal;
+   powernv_pstate_info.nr_pstates = nr_pstates;
+
return 0;
 }
 
+/**
+ * Returns the cpu frequency corresponding to the pstate_id.
+ */
+static unsigned int pstate_id_to_freq(int pstate_id)
+{
+   int i;
+
+   i = powernv_pstate_info.pstate_max_id - pstate_id;
+
+   BUG_ON(i = powernv_pstate_info.nr_pstates || i  0);
+   WARN_ON(powernv_pstate_ids[i] != pstate_id);
+   return powernv_freqs[i].frequency;
+}
+
 static struct freq_attr *powernv_cpu_freq_attr[] = {
cpufreq_freq_attr_scaling_available_freqs,
NULL,
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 4/5] powernv:cpufreq: Export nominal frequency via sysfs.

2014-03-20 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

Create a driver attribute named cpuinfo_nominal_freq which
creates a sysfs read-only file named cpuinfo_nominal_freq. Export
the frequency corresponding to the nominal_pstate through this
interface.

Nominal frequency is the highest non-turbo frequency for the
platform.  This is generally used for setting governor policies from
user space for optimal energy efficiency.

Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 drivers/cpufreq/powernv-cpufreq.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index e7b0292..46bee8a 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -142,8 +142,30 @@ static unsigned int pstate_id_to_freq(int pstate_id)
return powernv_freqs[i].frequency;
 }
 
+/**
+ * show_cpuinfo_nominal_freq - Show the nominal CPU frequency as indicated by
+ * the firmware
+ */
+static ssize_t show_cpuinfo_nominal_freq(struct cpufreq_policy *policy,
+   char *buf)
+{
+   int nominal_freq;
+   nominal_freq = pstate_id_to_freq(powernv_pstate_info.pstate_nominal_id);
+   return sprintf(buf, %u\n, nominal_freq);
+}
+
+
+struct freq_attr cpufreq_freq_attr_cpuinfo_nominal_freq = {
+   .attr = { .name = cpuinfo_nominal_freq,
+ .mode = 0444,
+   },
+   .show = show_cpuinfo_nominal_freq,
+};
+
+
 static struct freq_attr *powernv_cpu_freq_attr[] = {
cpufreq_freq_attr_scaling_available_freqs,
+   cpufreq_freq_attr_cpuinfo_nominal_freq,
NULL,
 };
 
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 5/5] powernv:cpufreq: Implement the driver-get() method

2014-03-20 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

The current frequency of a cpu is reported through the sysfs file
cpuinfo_cur_freq. This requires the driver to implement a
-get(unsigned int cpu) method which will return the current
operating frequency.

Implement a function named powernv_cpufreq_get() which reads the local
pstate from the PMSR and returns the corresponding frequency.

Set the powernv_cpufreq_driver.get hook to powernv_cpufreq_get().

Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 drivers/cpufreq/powernv-cpufreq.c | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 46bee8a..ef6ed8c 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -206,6 +206,43 @@ static inline void set_pmspr(unsigned long sprn, unsigned 
long val)
BUG();
 }
 
+/*
+ * Computes the current frequency on this cpu
+ * and stores the result in *ret_freq.
+ */
+static void powernv_read_cpu_freq(void *ret_freq)
+{
+   unsigned long pmspr_val;
+   s8 local_pstate_id;
+   int *cur_freq, freq, pstate_id;
+
+   cur_freq = (int *)ret_freq;
+   pmspr_val = get_pmspr(SPRN_PMSR);
+
+   /* The local pstate id corresponds bits 48..55 in the PMSR.
+ * Note: Watch out for the sign! */
+   local_pstate_id = (pmspr_val  48)  0xFF;
+   pstate_id = local_pstate_id;
+
+   freq = pstate_id_to_freq(pstate_id);
+   pr_debug(cpu %d pmsr %lx pstate_id %d frequency %d \n,
+   smp_processor_id(), pmspr_val, pstate_id, freq);
+   *cur_freq = freq;
+}
+
+/*
+ * Returns the cpu frequency as reported by the firmware for 'cpu'.
+ * This value is reported through the sysfs file cpuinfo_cur_freq.
+ */
+unsigned int powernv_cpufreq_get(unsigned int cpu)
+{
+   int ret_freq;
+
+   smp_call_function_any(cpu_sibling_mask(cpu), powernv_read_cpu_freq,
+   ret_freq, 1);
+   return ret_freq;
+}
+
 static void set_pstate(void *pstate)
 {
unsigned long val;
@@ -297,6 +334,7 @@ static int powernv_cpufreq_target(struct cpufreq_policy 
*policy,
 static struct cpufreq_driver powernv_cpufreq_driver = {
.verify = powernv_cpufreq_verify,
.target = powernv_cpufreq_target,
+   .get= powernv_cpufreq_get,
.init   = powernv_cpufreq_cpu_init,
.exit   = powernv_cpufreq_cpu_exit,
.name   = powernv-cpufreq,
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 2/5] powernv, cpufreq:Add per-core locking to serialize frequency transitions

2014-03-21 Thread Gautham R Shenoy

On Thu, Mar 20, 2014 at 05:40:57PM +0530, Gautham R. Shenoy wrote:
 From: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
 
 On POWER systems, the CPU frequency is controlled at a core-level and
 hence we need to serialize so that only one of the threads in the core
 switches the core's frequency at a time.
 
 Using a global mutex lock would needlessly serialize _all_ frequency
 transitions in the system (across all cores). So introduce per-core
 locking to enable finer-grained synchronization and thereby enhance
 the speed and responsiveness of the cpufreq driver to varying workload
 demands.
 
 The design of per-core locking is very simple and straight-forward: we
 first define a Per-CPU lock and use the ones that belongs to the first
 thread sibling of the core.
 
 cpu_first_thread_sibling() macro is used to find the *common* lock for
 all thread siblings belonging to a core.
 

Forgot to add the following line.

Reviewed-by: Preeti U Murthy pre...@linux.vnet.ibm.com
 Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
 Signed-off-by: Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com
 Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
 ---
  drivers/cpufreq/powernv-cpufreq.c | 21 -
  1 file changed, 16 insertions(+), 5 deletions(-)
 
 diff --git a/drivers/cpufreq/powernv-cpufreq.c 
 b/drivers/cpufreq/powernv-cpufreq.c
 index ab1551f..66dae0d 100644
 --- a/drivers/cpufreq/powernv-cpufreq.c
 +++ b/drivers/cpufreq/powernv-cpufreq.c
 @@ -24,8 +24,15 @@
  #include linux/of.h
  #include asm/cputhreads.h
 
 -/* FIXME: Make this per-core */
 -static DEFINE_MUTEX(freq_switch_mutex);
 +/* Per-Core locking for frequency transitions */
 +static DEFINE_PER_CPU(struct mutex, freq_switch_lock);
 +
 +#define lock_core_freq(cpu)  \
 + mutex_lock(per_cpu(freq_switch_lock,\
 + cpu_first_thread_sibling(cpu)));
 +#define unlock_core_freq(cpu)\
 + mutex_unlock(per_cpu(freq_switch_lock,\
 + cpu_first_thread_sibling(cpu)));
 
  #define POWERNV_MAX_PSTATES  256
 
 @@ -221,7 +228,7 @@ static int powernv_cpufreq_target(struct cpufreq_policy 
 *policy,
   freqs.new = powernv_freqs[new_index].frequency;
   freqs.cpu = policy-cpu;
 
 - mutex_lock(freq_switch_mutex);
 + lock_core_freq(policy-cpu);
   cpufreq_notify_transition(policy, freqs, CPUFREQ_PRECHANGE);
 
   pr_debug(setting frequency for cpu %d to %d kHz index %d pstate %d,
 @@ -233,7 +240,7 @@ static int powernv_cpufreq_target(struct cpufreq_policy 
 *policy,
   rc = powernv_set_freq(policy-cpus, new_index);
 
   cpufreq_notify_transition(policy, freqs, CPUFREQ_POSTCHANGE);
 - mutex_unlock(freq_switch_mutex);
 + unlock_core_freq(policy-cpu);
 
   return rc;
  }
 @@ -250,7 +257,7 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
 
  static int __init powernv_cpufreq_init(void)
  {
 - int rc = 0;
 + int cpu, rc = 0;
 
   /* Discover pstates from device tree and init */
 
 @@ -260,6 +267,10 @@ static int __init powernv_cpufreq_init(void)
   pr_info(powernv-cpufreq disabled\n);
   return rc;
   }
 + /* Init per-core mutex */
 + for_each_possible_cpu(cpu) {
 + mutex_init(per_cpu(freq_switch_lock, cpu));
 + }
 
   rc = cpufreq_register_driver(powernv_cpufreq_driver);
   return rc;
 -- 
 1.8.3.1
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 3/5] powernv:cpufreq: Create pstate_id_to_freq() helper

2014-03-21 Thread Gautham R Shenoy

On Thu, Mar 20, 2014 at 05:40:58PM +0530, Gautham R. Shenoy wrote:
 From: Gautham R. Shenoy e...@linux.vnet.ibm.com
 
 Create a helper routine that can return the cpu-frequency for the
 corresponding pstate_id.
 
 Also, cache the values of the pstate_max, pstate_min and
 pstate_nominal and nr_pstates in a static structure so that they can
 be reused in the future to perform any validations.
 

Forgot to add the following line:

  Reviewed-by: Preeti U Murthy pre...@linux.vnet.ibm.com
 Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
 ---
  drivers/cpufreq/powernv-cpufreq.c | 27 +++
  1 file changed, 27 insertions(+)
 
 diff --git a/drivers/cpufreq/powernv-cpufreq.c 
 b/drivers/cpufreq/powernv-cpufreq.c
 index 66dae0d..e7b0292 100644
 --- a/drivers/cpufreq/powernv-cpufreq.c
 +++ b/drivers/cpufreq/powernv-cpufreq.c
 @@ -39,6 +39,14 @@ static DEFINE_PER_CPU(struct mutex, freq_switch_lock);
  static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
  static int powernv_pstate_ids[POWERNV_MAX_PSTATES+1];
 
 +struct powernv_pstate_info {
 + int pstate_min_id;
 + int pstate_max_id;
 + int pstate_nominal_id;
 + int nr_pstates;
 +};
 +static struct powernv_pstate_info powernv_pstate_info;
 +
  /*
   * Initialize the freq table based on data obtained
   * from the firmware passed via device-tree
 @@ -112,9 +120,28 @@ static int init_powernv_pstates(void)
   for (i = 0; powernv_freqs[i].frequency != CPUFREQ_TABLE_END; i++)
   pr_debug(%d: %d\n, i, powernv_freqs[i].frequency);
 
 + powernv_pstate_info.pstate_min_id = pstate_min;
 + powernv_pstate_info.pstate_max_id = pstate_max;
 + powernv_pstate_info.pstate_nominal_id = pstate_nominal;
 + powernv_pstate_info.nr_pstates = nr_pstates;
 +
   return 0;
  }
 
 +/**
 + * Returns the cpu frequency corresponding to the pstate_id.
 + */
 +static unsigned int pstate_id_to_freq(int pstate_id)
 +{
 + int i;
 +
 + i = powernv_pstate_info.pstate_max_id - pstate_id;
 +
 + BUG_ON(i = powernv_pstate_info.nr_pstates || i  0);
 + WARN_ON(powernv_pstate_ids[i] != pstate_id);
 + return powernv_freqs[i].frequency;
 +}
 +
  static struct freq_attr *powernv_cpu_freq_attr[] = {
   cpufreq_freq_attr_scaling_available_freqs,
   NULL,
 -- 
 1.8.3.1
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 5/5] powernv:cpufreq: Implement the driver-get() method

2014-03-21 Thread Gautham R Shenoy

On Thu, Mar 20, 2014 at 05:41:00PM +0530, Gautham R. Shenoy wrote:
 From: Gautham R. Shenoy e...@linux.vnet.ibm.com
 
 The current frequency of a cpu is reported through the sysfs file
 cpuinfo_cur_freq. This requires the driver to implement a
 -get(unsigned int cpu) method which will return the current
 operating frequency.
 
 Implement a function named powernv_cpufreq_get() which reads the local
 pstate from the PMSR and returns the corresponding frequency.
 
 Set the powernv_cpufreq_driver.get hook to powernv_cpufreq_get().

Forgot to add 

  Reviewed-by: Preeti U Murthy pre...@linux.vnet.ibm.com
 
 Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
 ---
  drivers/cpufreq/powernv-cpufreq.c | 38 ++
  1 file changed, 38 insertions(+)
 
 diff --git a/drivers/cpufreq/powernv-cpufreq.c 
 b/drivers/cpufreq/powernv-cpufreq.c
 index 46bee8a..ef6ed8c 100644
 --- a/drivers/cpufreq/powernv-cpufreq.c
 +++ b/drivers/cpufreq/powernv-cpufreq.c
 @@ -206,6 +206,43 @@ static inline void set_pmspr(unsigned long sprn, 
 unsigned long val)
   BUG();
  }
 
 +/*
 + * Computes the current frequency on this cpu
 + * and stores the result in *ret_freq.
 + */
 +static void powernv_read_cpu_freq(void *ret_freq)
 +{
 + unsigned long pmspr_val;
 + s8 local_pstate_id;
 + int *cur_freq, freq, pstate_id;
 +
 + cur_freq = (int *)ret_freq;
 + pmspr_val = get_pmspr(SPRN_PMSR);
 +
 + /* The local pstate id corresponds bits 48..55 in the PMSR.
 + * Note: Watch out for the sign! */
 + local_pstate_id = (pmspr_val  48)  0xFF;
 + pstate_id = local_pstate_id;
 +
 + freq = pstate_id_to_freq(pstate_id);
 + pr_debug(cpu %d pmsr %lx pstate_id %d frequency %d \n,
 + smp_processor_id(), pmspr_val, pstate_id, freq);
 + *cur_freq = freq;
 +}
 +
 +/*
 + * Returns the cpu frequency as reported by the firmware for 'cpu'.
 + * This value is reported through the sysfs file cpuinfo_cur_freq.
 + */
 +unsigned int powernv_cpufreq_get(unsigned int cpu)
 +{
 + int ret_freq;
 +
 + smp_call_function_any(cpu_sibling_mask(cpu), powernv_read_cpu_freq,
 + ret_freq, 1);
 + return ret_freq;
 +}
 +
  static void set_pstate(void *pstate)
  {
   unsigned long val;
 @@ -297,6 +334,7 @@ static int powernv_cpufreq_target(struct cpufreq_policy 
 *policy,
  static struct cpufreq_driver powernv_cpufreq_driver = {
   .verify = powernv_cpufreq_verify,
   .target = powernv_cpufreq_target,
 + .get= powernv_cpufreq_get,
   .init   = powernv_cpufreq_cpu_init,
   .exit   = powernv_cpufreq_cpu_exit,
   .name   = powernv-cpufreq,
 -- 
 1.8.3.1
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 0/5] powernv: Enable Dynamic Frequency

2014-03-21 Thread Gautham R Shenoy

Hi Viresh,

On Fri, Mar 21, 2014 at 01:16:03PM +0530, Viresh Kumar wrote:
 On Thu, Mar 20, 2014 at 5:40 PM, Gautham R. Shenoy
 e...@linux.vnet.ibm.com wrote:
  This is the v3 of the consolidated patchset consisting
  patches for enabling cpufreq on IBM POWERNV platforms
  along with some enhancements.
 
 This is the first time I saw them. Looks like you never Cc'd linux-pm list.
 Also, would be better to keep Maintainers in --to field so that they can
 review them as soon as possible. Otherwise they might stay on lists for
 a long time..

Yes, I hadn't Cc'ed linux-pm list last time around. Sorry about
that. Will keep this in mind the next time around. Also, your
suggestion regarding keeping the Maintainers in --to field is well
taken.

--
Thanks and Regards
gautham.
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 4/5] powernv:cpufreq: Export nominal frequency via sysfs.

2014-03-21 Thread Gautham R Shenoy

On Fri, Mar 21, 2014 at 02:17:28PM +0530, Viresh Kumar wrote:
  +static ssize_t show_cpuinfo_nominal_freq(struct cpufreq_policy *policy,
  +   char *buf)
  +{
  +   int nominal_freq;
  +   nominal_freq = 
  pstate_id_to_freq(powernv_pstate_info.pstate_nominal_id);
  +   return sprintf(buf, %u\n, nominal_freq);
 
 return sprintf(buf, %u\n,
 pstate_id_to_freq(powernv_pstate_info.pstate_nominal_id));

Sure. Will fix this. 
 
 ??
 
  +}
  +
  +
 
 remove extra blank line.
 
  +struct freq_attr cpufreq_freq_attr_cpuinfo_nominal_freq = {
  +   .attr = { .name = cpuinfo_nominal_freq,
  + .mode = 0444,
  +   },
 
 Align {}

Probably the use of ATTR_RO(cpuinfo_nominal_freq) and renaming
show_cpuinfo_nominal_freq to cpuinfo_nominal_freq_show() would be even
better. What do you think ?

 
  +   .show = show_cpuinfo_nominal_freq,
  +};
  +
  +
   static struct freq_attr *powernv_cpu_freq_attr[] = {
  cpufreq_freq_attr_scaling_available_freqs,
  +   cpufreq_freq_attr_cpuinfo_nominal_freq,
  NULL,
   };

This needs to be rewritten to include the entries of cpufreq_generic_attr.


Thanks for the review.

--
Regards
gautham. 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 1/5] powernv: cpufreq driver for powernv platform

2014-03-21 Thread Gautham R Shenoy

Hi Viresh,

On Fri, Mar 21, 2014 at 02:11:32PM +0530, Viresh Kumar wrote:
 On Thu, Mar 20, 2014 at 5:40 PM, Gautham R. Shenoy
 e...@linux.vnet.ibm.com wrote:
  From: Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com
 
 Hi Vaidy,
 
  diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
  index 4b029c0..4ba1632 100644
  --- a/drivers/cpufreq/Kconfig
  +++ b/drivers/cpufreq/Kconfig
  @@ -48,6 +48,7 @@ config CPU_FREQ_STAT_DETAILS
   choice
  prompt Default CPUFreq governor
  default CPU_FREQ_DEFAULT_GOV_USERSPACE if ARM_SA1100_CPUFREQ || 
  ARM_SA1110_CPUFREQ
  +   default CPU_FREQ_DEFAULT_GOV_ONDEMAND if POWERNV_CPUFREQ
 
 Probably we should remove SA1100's entry as well from here. This is
 not the right way of doing it. Imagine 100 platforms having entries here.
 If you want it, then select it from your platforms Kconfig.

Sure. Will move these bits and the other governor related bits to the
Powernv Kconfig.

  diff --git a/drivers/cpufreq/powernv-cpufreq.c 
  b/drivers/cpufreq/powernv-cpufreq.c
  new file mode 100644
  index 000..ab1551f
  --- /dev/null
  +
  +#define pr_fmt(fmt)powernv-cpufreq:  fmt
  +
  +#include linux/module.h
  +#include linux/cpufreq.h
  +#include linux/of.h
  +#include asm/cputhreads.h
 
 That's it? Sure?
 
 Even if things compile for you, you must explicitly include all the
 files on which
 you depend.

Ok. 

 
  +
  +   WARN_ON(len_ids != len_freqs);
  +   nr_pstates = min(len_ids, len_freqs) / sizeof(u32);
  +   WARN_ON(!nr_pstates);
 
 Why do you want to continue here?

Good point. We might be better off exiting at this point. 

 
  +   pr_debug(NR PStates %d\n, nr_pstates);
  +   for (i = 0; i  nr_pstates; i++) {
  +   u32 id = be32_to_cpu(pstate_ids[i]);
  +   u32 freq = be32_to_cpu(pstate_freqs[i]);
  +
  +   pr_debug(PState id %d freq %d MHz\n, id, freq);
  +   powernv_freqs[i].driver_data = i;
 
 I don't think you are using this field at all and this is the field you can
 use for driver_data and so you can get rid of powernv_pstate_ids[ ].

Using driver_data to record powernv_pstate_ids won't work since
powernv_pstate_ids can be negative. So a pstate_id -3 can be confused
with CPUFREQ_BOOST_FREQ thereby not displaying the frequency
corresponding to pstate id -3. So for now I think we will be retaining
powernv_pstate_ids.

 
  +   powernv_freqs[i].frequency = freq * 1000; /* kHz */
  +   powernv_pstate_ids[i] = id;
  +   }
  +   /* End of list marker entry */
  +   powernv_freqs[i].driver_data = 0;
 
 Not required.

Ok.
 
  +   powernv_freqs[i].frequency = CPUFREQ_TABLE_END;
  +
  +   /* Print frequency table */
  +   for (i = 0; powernv_freqs[i].frequency != CPUFREQ_TABLE_END; i++)
  +   pr_debug(%d: %d\n, i, powernv_freqs[i].frequency);
 
 You have already printed this table earlier..

Fair enough.

 
  +
  +   return 0;
  +}
  +
  +static struct freq_attr *powernv_cpu_freq_attr[] = {
  +   cpufreq_freq_attr_scaling_available_freqs,
  +   NULL,
  +};
 
 Can use this instead: cpufreq_generic_attr?

In this patch yes. But later patch introduces an additional attribute
for displaying the nominal frequency. Will handle that part in a clean
way in the next version.

 
  +/* Helper routines */
  +
  +/* Access helpers to power mgt SPR */
  +
  +static inline unsigned long get_pmspr(unsigned long sprn)
 
 Looks big enough not be inlined?

It is called from one function. It has been defined separately for
readability. 

 
  +{
  +   switch (sprn) {
  +   case SPRN_PMCR:
  +   return mfspr(SPRN_PMCR);
  +
  +   case SPRN_PMICR:
  +   return mfspr(SPRN_PMICR);
  +
  +   case SPRN_PMSR:
  +   return mfspr(SPRN_PMSR);
  +   }
  +   BUG();
  +}
  +
  +static inline void set_pmspr(unsigned long sprn, unsigned long val)
  +{
 
 Same here..

Same reason as above.

 
  +   switch (sprn) {
  +   case SPRN_PMCR:
  +   mtspr(SPRN_PMCR, val);
  +   return;
  +
  +   case SPRN_PMICR:
  +   mtspr(SPRN_PMICR, val);
  +   return;
  +
  +   case SPRN_PMSR:
  +   mtspr(SPRN_PMSR, val);
  +   return;
  +   }
  +   BUG();
  +}
  +
  +static void set_pstate(void *pstate)
  +{
  +   unsigned long val;
  +   unsigned long pstate_ul = *(unsigned long *) pstate;
 
 Why not sending value only to this routine instead of pointer?

Well this function is called via an smp_call_function. so, cannot send
a value :(

 
  +
  +   val = get_pmspr(SPRN_PMCR);
  +   val = val  0xULL;
 
 Maybe a blank line here?

Ok.

 
  +   /* Set both global(bits 56..63) and local(bits 48..55) PStates */
  +   val = val | (pstate_ul  56) | (pstate_ul  48);
 
 here as well?

Ok.
 

  +   pr_debug(Setting cpu %d pmcr to %016lX\n, smp_processor_id(), 
  val

Re: [PATCH v3 5/5] powernv:cpufreq: Implement the driver-get() method

2014-03-21 Thread Gautham R Shenoy

On Fri, Mar 21, 2014 at 03:01:29PM +0530, Viresh Kumar wrote:
 On Thu, Mar 20, 2014 at 5:41 PM, Gautham R. Shenoy
 e...@linux.vnet.ibm.com wrote:
  From: Gautham R. Shenoy e...@linux.vnet.ibm.com
 
  The current frequency of a cpu is reported through the sysfs file
  cpuinfo_cur_freq. This requires the driver to implement a
  -get(unsigned int cpu) method which will return the current
  operating frequency.
 
  Implement a function named powernv_cpufreq_get() which reads the local
  pstate from the PMSR and returns the corresponding frequency.
 
  Set the powernv_cpufreq_driver.get hook to powernv_cpufreq_get().
 
  Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
  ---
   drivers/cpufreq/powernv-cpufreq.c | 38 
  ++
   1 file changed, 38 insertions(+)
 
 Please merge these fixups with the first patch which is creating the driver.
 I understand that a different guy has created this patch and so wanted
 to have a patch on his name but its really difficult to review this
way.

Heh! Well, that wasn't the reason why this was sent out as a separate
patch, but never mind. Though I don't understand why it would be
difficult to review the patch though.

 Better add your signed-off in the first patch instead. Sending such changes
 once the driver is mainlined looks fine.

Sure, this makes sense.

 
  diff --git a/drivers/cpufreq/powernv-cpufreq.c 
  b/drivers/cpufreq/powernv-cpufreq.c
  index 46bee8a..ef6ed8c 100644
  --- a/drivers/cpufreq/powernv-cpufreq.c
  +++ b/drivers/cpufreq/powernv-cpufreq.c
  @@ -206,6 +206,43 @@ static inline void set_pmspr(unsigned long sprn, 
  unsigned long val)
  BUG();
   }
 
  +/*
  + * Computes the current frequency on this cpu
  + * and stores the result in *ret_freq.
  + */
  +static void powernv_read_cpu_freq(void *ret_freq)
  +{
  +   unsigned long pmspr_val;
  +   s8 local_pstate_id;
  +   int *cur_freq, freq, pstate_id;
  +
  +   cur_freq = (int *)ret_freq;
 
 You don't need cur_freq variable at all..

I don't like it either. But the compiler complains without this hack
:-(

 
  +   pmspr_val = get_pmspr(SPRN_PMSR);
  +
  +   /* The local pstate id corresponds bits 48..55 in the PMSR.
  + * Note: Watch out for the sign! */
  +   local_pstate_id = (pmspr_val  48)  0xFF;
  +   pstate_id = local_pstate_id;
 
 similarly local_pstate_id

well, I am interested in the bits 48..55 of pmspr_val. That's the
pstate_id which can be negative. So I'll like to keep
local_pstate_id. 
 
  +
  +   freq = pstate_id_to_freq(pstate_id);
  +   pr_debug(cpu %d pmsr %lx pstate_id %d frequency %d \n,
  +   smp_processor_id(), pmspr_val, pstate_id, freq);
  +   *cur_freq = freq;
 
 Move above print here after a blank line. Also remove freq variable as
 well and use *cur_freq directly.. And then you can rename it to freq
as well.

We can get rid of freq and use *cur_freq in its place.


Thanks for the detailed review.

--
Regards
gautham.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 5/5] powernv:cpufreq: Implement the driver-get() method

2014-03-21 Thread Gautham R Shenoy

On Fri, Mar 21, 2014 at 12:01:07PM +, David Laight wrote:
 From: Viresh Kumar
  
  On 21 March 2014 16:34, Gautham R Shenoy e...@linux.vnet.ibm.com wrote:
   Heh! Well, that wasn't the reason why this was sent out as a separate
   patch, but never mind. Though I don't understand why it would be
   difficult to review the patch though.
  
  Because the initial driver wasn't complete earlier. There were 2-3 patches
  after the first one which are changing what the first patch has added.
  Nothing else :)
  
+static void powernv_read_cpu_freq(void *ret_freq)
+{
+   unsigned long pmspr_val;
+   s8 local_pstate_id;
+   int *cur_freq, freq, pstate_id;
+
+   cur_freq = (int *)ret_freq;
  
   You don't need cur_freq variable at all..
  
   I don't like it either. But the compiler complains without this hack
   :-(
  
  Why would the compiler warn for doing this?:
  
  *(int *)ret_freq = freq;
 
 Because it is very likely to be wrong.
 In general casts of pointers to integer types are dangerous.
 In this case why not make the function return the value?

Because this function is called via an smp_call_function(). And we
need a way of returning the value to the caller.

 
   David
 
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 5/5] powernv:cpufreq: Implement the driver-get() method

2014-03-21 Thread Gautham R Shenoy

On Fri, Mar 21, 2014 at 05:15:34PM +0530, Viresh Kumar wrote:
 On 21 March 2014 16:34, Gautham R Shenoy e...@linux.vnet.ibm.com wrote:
  Heh! Well, that wasn't the reason why this was sent out as a separate
  patch, but never mind. Though I don't understand why it would be
  difficult to review the patch though.
 
 Because the initial driver wasn't complete earlier. There were 2-3 patches
 after the first one which are changing what the first patch has added.
 Nothing else :)
 
   +static void powernv_read_cpu_freq(void *ret_freq)
   +{
   +   unsigned long pmspr_val;
   +   s8 local_pstate_id;
   +   int *cur_freq, freq, pstate_id;
   +
   +   cur_freq = (int *)ret_freq;
 
  You don't need cur_freq variable at all..
 
  I don't like it either. But the compiler complains without this hack
  :-(
 
 Why would the compiler warn for doing this?:
 
 *(int *)ret_freq = freq;

The compiler doesn't complain if I do this.
 
   +   pmspr_val = get_pmspr(SPRN_PMSR);
   +
   +   /* The local pstate id corresponds bits 48..55 in the PMSR.
   + * Note: Watch out for the sign! */
   +   local_pstate_id = (pmspr_val  48)  0xFF;
   +   pstate_id = local_pstate_id;
 
  similarly local_pstate_id
 
  well, I am interested in the bits 48..55 of pmspr_val. That's the
  pstate_id which can be negative. So I'll like to keep
  local_pstate_id.
 
 Can you please explain at bit level how getting the value in a s8
 first and then into a s32 will help you here? Instead of getting it
 directly into s32.

Consider the case when pmspr = 0x00fe;

We are interested in extracting the value 'fe'. And ensure that when
we store this value into an int, we get the sign extension right.

So the following doesn't work: 

   pstate_id = (pmspr_val  48)  0x;

Since we get pstate_id = 0x00fe, which is not the same as
0xfffe.

Of course, we could do the following, but I wasn't sure if
two levels of type casting helped on the readability front.

   pstate_id = (int) ((s8)((pmspr_val  48)  0xFF));

--
Thanks and Regards
gautham.
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 1/5] powernv: cpufreq driver for powernv platform

2014-03-21 Thread Gautham R Shenoy

Hi Viresh,

On Fri, Mar 21, 2014 at 04:24:27PM +0530, Viresh Kumar wrote:
 On 21 March 2014 16:13, Gautham R Shenoy e...@linux.vnet.ibm.com wrote:
  On Fri, Mar 21, 2014 at 02:11:32PM +0530, Viresh Kumar wrote:
 
   +   pr_debug(PState id %d freq %d MHz\n, id, freq);
   +   powernv_freqs[i].driver_data = i;
 
  I don't think you are using this field at all and this is the field you can
  use for driver_data and so you can get rid of powernv_pstate_ids[ ].
 
  Using driver_data to record powernv_pstate_ids won't work since
  powernv_pstate_ids can be negative. So a pstate_id -3 can be confused
  with CPUFREQ_BOOST_FREQ thereby not displaying the frequency
  corresponding to pstate id -3. So for now I think we will be retaining
  powernv_pstate_ids.
 
 As I said earlier, this field is only used by cpufreq drivers and cpufreq core
 isn't supposed to touch it. CPUFREQ_BOOST_FREQ and other macros
 are there for .frequency field and not this one.
 

Ok, I had based my code on linus's git tree. I checked the 'pm-cpufreq'
branch of Rafael's 'linux-pm' tree and freq_table.c contains the
following code snippet in show_available_frequencies:

if (show_boost ^ (table[i].driver_data == CPUFREQ_BOOST_FREQ))
continue;

I suspect we're talking about different code bases. So could you
please tell me which one should I be looking at ?

 
   +static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
   +{
   +   int base, i;
   +
   +#ifdef CONFIG_SMP
 
  What will break if you don't have this ifdef here? Without that as well
  below code should work?
 
 Missed this one?
 
   +   base = cpu_first_thread_sibling(policy-cpu);
   +
   +   for (i = 0; i  threads_per_core; i++)
   +   cpumask_set_cpu(base + i, policy-cpus);
   +#endif
   +   policy-cpuinfo.transition_latency = 25000;
   +
   +   policy-cur = powernv_freqs[0].frequency;
   +   cpufreq_frequency_table_get_attr(powernv_freqs, policy-cpu);
 
  This doesn't exist anymore.
 
  Didn't get this comment!
 
 cpufreq_frequency_table_get_attr() routine doesn't exist anymore.
 
   +static int powernv_cpufreq_cpu_exit(struct cpufreq_policy *policy)
   +{
   +   cpufreq_frequency_table_put_attr(policy-cpu);
   +   return 0;
   +}
 
  You don't need this..
 
  Why not ?
 
 Because cpufreq_frequency_table_get_attr() and
 cpufreq_frequency_table_put_attr() don't exist anymore :)

Well, it does in the codebases that I was looking at. May be I've been
looking at the wrong place.

 
   +module_init(powernv_cpufreq_init);
   +module_exit(powernv_cpufreq_exit);
 
  Place these right after the routines without any blank lines in
  between.
 
  Is this the new convention ?
 
 Don't know I have been following this since long time, probably from the
 time I started with Mainline stuff.. I have seen many maintainers advising 
 this
 as it make code more readable, specially if these routines are quite big..
 
 Probably it isn't mentioned in coding guidelines but people follow it most of
 the times.

Ok. I was not aware of this. Good to know :-)
 

--
Thanks and Regards
gautham.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 5/5] powernv:cpufreq: Implement the driver-get() method

2014-03-21 Thread Gautham R Shenoy

On Fri, Mar 21, 2014 at 06:42:50PM +0530, Viresh Kumar wrote:
 On 21 March 2014 18:34, Gautham R Shenoy e...@linux.vnet.ibm.com wrote:
  Consider the case when pmspr = 0x00fe;
 
  We are interested in extracting the value 'fe'. And ensure that when
  we store this value into an int, we get the sign extension right.
 
  So the following doesn't work:
 
 pstate_id = (pmspr_val  48)  0x;
 
 What about pstate_id = (pmspr_val  48)  0xFF; ??

Nope. 

Suppose the pmspr_val is contained in the register 
%rax, and pstate_id corresponds to the address -20(%rbp) then:

pstate_id = (pmspr_val  48)  0xFF;

would corrspond to 

shrq$48, %rax// Left shift by 48 bits
andl$255, %eax   // Mask the lower 32 bits of %rax with 
0x00FF
movl%eax, -20(%rbp)  // Store the lower 32 bits of %rax
into pstate_id


On the other hand, 

   pstate_id = (int)((s8)((pmspr_val  48)  0xFF));

would correspond to:

shrq$48, %rax  // Left shift by 48 bits.
movsbl  %al, %eax  // Move the lower 8 bits of %rax to %eax
  with sign-extension. 
movl%eax, -20(%rbp) // store the result in pstate_id;

So with this, we are getting the sign extension due to the use of movsbl.

And if local_pstate_id corresponds to the address -1(%rbp) then:

   local_pstate_id = (pmspr_val  48)  0xFF;
   pstate_id = local_pstate_id;

would correspond to:

shrq$48, %rax// Left shift by 48 bits
movb%al, -1(%rbp)//copy the lower 8 bits to local_pstate_id
movsbl  -1(%rbp), %eax   //move the contents of
   local_pstate_id to %eax with sign extension.
movl%eax, -20(%rbp)  // Store the result in pstate_id


Hope this helps :-)

--
Thanks and Regards
gautham.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 1/5] powernv: cpufreq driver for powernv platform

2014-03-21 Thread Gautham R Shenoy

On Fri, Mar 21, 2014 at 04:24:27PM +0530, Viresh Kumar wrote:
   +static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
   +{
   +   int base, i;
   +
   +   base = cpu_first_thread_sibling(policy-cpu);
   +
   +   for (i = 0; i  threads_per_core; i++)
   +   cpumask_set_cpu(base + i, policy-cpus);
   +   policy-cpuinfo.transition_latency = 25000;
   +
   +   policy-cur = powernv_freqs[0].frequency;
   +   cpufreq_frequency_table_get_attr(powernv_freqs, policy-cpu);
 
  This doesn't exist anymore.

Ok, I guess the right thing to do at this point is call

   cpufreq_table_validate_and_show(policy, powernv_freqs);

Will fix the code to take care of this.
--
Thanks and Regards
gautham.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 5/5] powernv:cpufreq: Implement the driver-get() method

2014-03-22 Thread Gautham R Shenoy

Hi Ben,

On Sat, Mar 22, 2014 at 09:56:30AM +1100, Benjamin Herrenschmidt wrote:
 On Fri, 2014-03-21 at 16:34 +0530, Gautham R Shenoy wrote:
 
   
+/*
+ * Computes the current frequency on this cpu
+ * and stores the result in *ret_freq.
+ */
+static void powernv_read_cpu_freq(void *ret_freq)
+{
+   unsigned long pmspr_val;
+   s8 local_pstate_id;
+   int *cur_freq, freq, pstate_id;
+
+   cur_freq = (int *)ret_freq;
   
   You don't need cur_freq variable at all..
  
  I don't like it either. But the compiler complains without this hack
  :-(
 
 Casting integers into void * is a recipe for disaster... what is that
 supposed to be about ?

Like I mentioned elsewhere on this thread, we're calling
powernv_read_cpu_freq via an smp_call_function(). We use this to
obtain the frequency on the cpu where powernv_read_cpu_freq
executes and return it to the caller of smp_call_function.

 We lose all type checking and get exposed
 to endian issues etc... the day somebody uses a different type on both
 sides.
 
Yes, I understand the problem now. I'll think of a safer way to pass
the return value.

 Also is freq a frequency ? In this case an int isn't big enough.

freq is the frequency stored in the cpufreq_table. The value is in
kHz. So, int should be big enough.

 Cheers,
 Ben.
 


--
Thanks and Regards
gautham. 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4] powernv: Dynamic Frequency Scaling Enablement

2014-03-26 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

Hi,

This is the v4 of the patchset to enable Dynamic Frequency Scaling
on IBM PowerNV Platforms. I have incorporated the review comments
from the previous version (can be found at [1]). 

In this version, 

  * Multiple patches from the previous version have been into a single
patch, since the higher numbered patches implemented some helper
functions and the driver methods which should have been a part of
the first patch to begin with.

  * Use the generic helpers provided by the cpufreq core available in
the latest linux-next tree.

  * Fix the code to avoid casting integer pointers to void.

The patch is based on top of the commit:06ed26d1de59ce7cbbe68378b7e470be169750e5
of the linux-next tree.


[1]: https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg76675.html

Vaidyanathan Srinivasan (1):
  powernv, cpufreq: cpufreq driver for powernv platform

 arch/powerpc/configs/pseries_defconfig|   1 +
 arch/powerpc/configs/pseries_le_defconfig |   1 +
 arch/powerpc/include/asm/reg.h|   4 +
 arch/powerpc/platforms/powernv/Kconfig|   6 +
 drivers/cpufreq/Kconfig.powerpc   |   8 +
 drivers/cpufreq/Makefile  |   1 +
 drivers/cpufreq/powernv-cpufreq.c | 372 ++
 7 files changed, 393 insertions(+)
 create mode 100644 drivers/cpufreq/powernv-cpufreq.c

-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4] powernv, cpufreq: cpufreq driver for powernv platform

2014-03-26 Thread Gautham R. Shenoy

From: Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com

Backend driver to dynamically set voltage and frequency on
IBM POWER non-virtualized platforms.  Power management SPRs
are used to set the required PState.

This driver works in conjunction with cpufreq governors
like 'ondemand' to provide a demand based frequency and
voltage setting on IBM POWER non-virtualized platforms.

PState table is obtained from OPAL v3 firmware through device
tree.

powernv_cpufreq back-end driver would parse the relevant device-tree
nodes and initialise the cpufreq subsystem on powernv platform.

The code was originally written by sva...@linux.vnet.ibm.com. Over
time it was modified to accomodate bug-fixes as well as updates to the
the cpu-freq core. Relevant portions of the change logs corresponding
to those modifications are noted below:

 * The policy-cpus needs to be populated in a hotplug-invariant
   manner instead of using cpu_sibling_mask() which varies with
   cpu-hotplug. This is because the cpufreq core code copies this
   content into policy-related_cpus mask which is should not vary on
   cpu-hotplug. [Authored by srivatsa.b...@linux.vnet.ibm.com]

 * On POWER systems, the CPU frequency is controlled at a core-level
   and hence we need to serialize so that only one of the threads in
   the core switches the core's frequency at a time. Introduce
   per-core locking to enable finer-grained synchronization and
   thereby enhance the speed and responsiveness of the cpufreq driver
   to varying workload demands.

 The design of per-core locking is very simple and
   straight-forward: we first define a Per-CPU lock and use the ones
   that belongs to the first thread sibling of the core.

 cpu_first_thread_sibling() macro is used to find the *common*
   lock for all thread siblings belonging to a core. [Authored by
   srivatsa.b...@linux.vnet.ibm.com]

 * Create a helper routine that can return the cpu-frequency for the
   corresponding pstate_id. Also, cache the values of the pstate_max,
   pstate_min and pstate_nominal and nr_pstates in a static structure
   so that they can be reused in the future to perform any
   validations. [Authored by e...@linux.vnet.ibm.com]

 * Create a driver attribute named cpuinfo_nominal_freq which creates
   a sysfs read-only file named cpuinfo_nominal_freq. Export the
   frequency corresponding to the nominal_pstate through this
   interface.

 Nominal frequency is the highest non-turbo frequency for the
   platform.  This is generally used for setting governor policies
   from user space for optimal energy efficiency. [Authored by
   e...@linux.vnet.ibm.com]

 * Implement a powernv_cpufreq_get(unsigned int cpu) method which will
   return the current operating frequency. Export this via the sysfs
   interface cpuinfo_cur_freq by setting powernv_cpufreq_driver.get to
   powernv_cpufreq_get(). [Authored by e...@linux.vnet.ibm.com]

[Change log updated by e...@linux.vnet.ibm.com]

Reviewed-by: Preeti U Murthy pre...@linux.vnet.ibm.com
Signed-off-by: Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com
Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
Signed-off-by: Anton Blanchard an...@samba.org
Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 arch/powerpc/configs/pseries_defconfig|   1 +
 arch/powerpc/configs/pseries_le_defconfig |   1 +
 arch/powerpc/include/asm/reg.h|   4 +
 arch/powerpc/platforms/powernv/Kconfig|   6 +
 drivers/cpufreq/Kconfig.powerpc   |   8 +
 drivers/cpufreq/Makefile  |   1 +
 drivers/cpufreq/powernv-cpufreq.c | 372 ++
 7 files changed, 393 insertions(+)
 create mode 100644 drivers/cpufreq/powernv-cpufreq.c

diff --git a/arch/powerpc/configs/pseries_defconfig 
b/arch/powerpc/configs/pseries_defconfig
index e9a8b4e..a285d44 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -353,3 +353,4 @@ CONFIG_CRYPTO_DEV_NX_ENCRYPT=m
 CONFIG_VIRTUALIZATION=y
 CONFIG_KVM_BOOK3S_64=m
 CONFIG_KVM_BOOK3S_64_HV=y
+CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
diff --git a/arch/powerpc/configs/pseries_le_defconfig 
b/arch/powerpc/configs/pseries_le_defconfig
index 62771e0..47e6161 100644
--- a/arch/powerpc/configs/pseries_le_defconfig
+++ b/arch/powerpc/configs/pseries_le_defconfig
@@ -350,3 +350,4 @@ CONFIG_CRYPTO_LZO=m
 # CONFIG_CRYPTO_ANSI_CPRNG is not set
 CONFIG_CRYPTO_DEV_NX=y
 CONFIG_CRYPTO_DEV_NX_ENCRYPT=m
+CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 90c06ec..84f92ca 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -271,6 +271,10 @@
 #define SPRN_HSRR1 0x13B   /* Hypervisor Save/Restore 1 */
 #define SPRN_IC0x350   /* Virtual Instruction Count */
 #define SPRN_VTB   0x351   /* Virtual Time Base */
+#define SPRN_PMICR 0x354   /* Power Management Idle Control Reg */
+#define

Re: [PATCH v4] powernv, cpufreq: cpufreq driver for powernv platform

2014-03-27 Thread Gautham R Shenoy

On Thu, Mar 27, 2014 at 12:09:53PM +0530, Viresh Kumar wrote:
 Cc'ing Rafael.


Thanks. It was a lapse on my part by not Cc'ing Rafael in the first
place.
 
 On 26 March 2014 22:25, Gautham R. Shenoy e...@linux.vnet.ibm.com wrote:
  From: Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com
 
  Backend driver to dynamically set voltage and frequency on
  IBM POWER non-virtualized platforms.  Power management SPRs
  are used to set the required PState.
 
  This driver works in conjunction with cpufreq governors
  like 'ondemand' to provide a demand based frequency and
  voltage setting on IBM POWER non-virtualized platforms.
 
  PState table is obtained from OPAL v3 firmware through device
  tree.
 
  powernv_cpufreq back-end driver would parse the relevant device-tree
  nodes and initialise the cpufreq subsystem on powernv platform.
 
  The code was originally written by sva...@linux.vnet.ibm.com. Over
  time it was modified to accomodate bug-fixes as well as updates to the
  the cpu-freq core. Relevant portions of the change logs corresponding
  to those modifications are noted below:
 
   * The policy-cpus needs to be populated in a hotplug-invariant
 manner instead of using cpu_sibling_mask() which varies with
 cpu-hotplug. This is because the cpufreq core code copies this
 content into policy-related_cpus mask which is should not vary on
 
 s/is /

Good catch! Will fix this

[...]
 
   * On POWER systems, the CPU frequency is controlled at a core-level
 and hence we need to serialize so that only one of the threads in
 the core switches the core's frequency at a time. Introduce
 per-core locking to enable finer-grained synchronization and
 thereby enhance the speed and responsiveness of the cpufreq driver
 to varying workload demands.
 
   The design of per-core locking is very simple and
 straight-forward: we first define a Per-CPU lock and use the ones
 that belongs to the first thread sibling of the core.
 
   cpu_first_thread_sibling() macro is used to find the *common*
 lock for all thread siblings belonging to a core. [Authored by
 srivatsa.b...@linux.vnet.ibm.com]
 
 We don't need that after serialization patch of core is accepted. And it
 should be accepted soon, in one form or other.

As of now, I prefer this patch be based on code that is in the -next
tree. I'll get rid of the per-core locking once the serialization
patch of the core is accepted.

[...]
  --- a/arch/powerpc/configs/pseries_defconfig
  +++ b/arch/powerpc/configs/pseries_defconfig
  @@ -353,3 +353,4 @@ CONFIG_CRYPTO_DEV_NX_ENCRYPT=m
   CONFIG_VIRTUALIZATION=y
   CONFIG_KVM_BOOK3S_64=m
   CONFIG_KVM_BOOK3S_64_HV=y
  +CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
  diff --git a/arch/powerpc/configs/pseries_le_defconfig 
  b/arch/powerpc/configs/pseries_le_defconfig
  index 62771e0..47e6161 100644
  --- a/arch/powerpc/configs/pseries_le_defconfig
  +++ b/arch/powerpc/configs/pseries_le_defconfig
  @@ -350,3 +350,4 @@ CONFIG_CRYPTO_LZO=m
   # CONFIG_CRYPTO_ANSI_CPRNG is not set
   CONFIG_CRYPTO_DEV_NX=y
   CONFIG_CRYPTO_DEV_NX_ENCRYPT=m
  +CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
 
 don't know how Rafael want this, but probably this part could have gone
 through ppc tree..
 

That would mean multiple patches :-)

  +
  +#define pr_fmt(fmt)powernv-cpufreq:  fmt
  +
  +#include linux/module.h
  +#include linux/cpufreq.h
  +#include linux/smp.h
  +#include linux/of.h
  +#include asm/cputhreads.h
 
 I thought I gave a comment on missing headers here?

Ok, so these seem to be the missing ones.

#include linux/kernel.h
#include linux/percpu-defs.h
#include linux/mutex.h
#include linux/sysfs.h
#include linux/cpumask.h

#include asm/reg.h

Have I missed anything else ?


  +static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
  +static int powernv_pstate_ids[POWERNV_MAX_PSTATES+1];
 
 I though we don't need this anymore? Either Rafael will take my patch as
 is for the BOOST fixup or we will end up creating .isboost field in
 struct cpufreq_frequency_table and so you could have used .driver_data here.
 

I mentioned in my reply to your BOOST fixup patch that we would like
pstate_ids to be of the type signed int, while the .driver_data is
an unsigned int. If we end up using .driver_data, we would have to
be careful and not use it directly but reassign it to a signed int (or
typecast it) before using it.

Not just that, the pr_debugs in the core which are printed during
frequency transitions will then end up printing large positive values
(since it will interpret negative pstate_ids as unsigned int) in the
place of pstate_ids, which would not be particularly useful.

  +struct powernv_pstate_info {
  +   int pstate_min_id;
  +   int pstate_max_id;
  +   int pstate_nominal_id;
  +   int nr_pstates;
  +};
  +static struct powernv_pstate_info powernv_pstate_info;
 
 Maybe write it as this (if you like :), as this is the only instance
 of this struct):
 
 Also

Re: [PATCH v4] powernv, cpufreq: cpufreq driver for powernv platform

2014-03-27 Thread Gautham R Shenoy

On Thu, Mar 27, 2014 at 03:29:36PM +0530, Viresh Kumar wrote:
 On 27 March 2014 15:00, Gautham R Shenoy e...@linux.vnet.ibm.com wrote:
  As of now, I prefer this patch be based on code that is in the -next
  tree. I'll get rid of the per-core locking once the serialization
  patch of the core is accepted.
 
 Okay.. Then divide this patch into two parts, second one doing all
 the serialization stuff and I will review only the first one from V5 :)
 
 Maybe mention that in Patch2 as well, that once serialization patches
 are accepted, its not required.

Sure. This would be a good idea.

 
  don't know how Rafael want this, but probably this part could have gone
  through ppc tree..
 
  That would mean multiple patches :-)
 
 I wasn't opposed to multiple patches at all, but multiple patches for
 basic functionality of same driver file. Otherwise separating things
 into patches is the best receipe.

Fair enough!

 
   +#define pr_fmt(fmt)powernv-cpufreq:  fmt
   +
   +#include linux/module.h
   +#include linux/cpufreq.h
   +#include linux/smp.h
   +#include linux/of.h
   +#include asm/cputhreads.h
 
  I thought I gave a comment on missing headers here?
 
  Ok, so these seem to be the missing ones.
 
  #include linux/kernel.h
  #include linux/percpu-defs.h
  #include linux/mutex.h
 
 This may shift to Patch 2 ..

Ok.

 
  #include linux/sysfs.h
  #include linux/cpumask.h
 
  #include asm/reg.h
 
  Have I missed anything else ?
 
 Not sure :) .. I will see if I can find anymore..


Cool.
 
   +static struct cpufreq_frequency_table 
   powernv_freqs[POWERNV_MAX_PSTATES+1];
   +static int powernv_pstate_ids[POWERNV_MAX_PSTATES+1];
 
  I though we don't need this anymore? Either Rafael will take my patch as
  is for the BOOST fixup or we will end up creating .isboost field in
  struct cpufreq_frequency_table and so you could have used .driver_data 
  here.
 
 
  I mentioned in my reply to your BOOST fixup patch that we would like
  pstate_ids to be of the type signed int, while the .driver_data is
  an unsigned int. If we end up using .driver_data, we would have to
  be careful and not use it directly but reassign it to a signed int (or
  typecast it) before using it.
 
 Probably many people would want it to be unsigned it, so that they
 can put some strange hex values there..
 
 Which part of your code conflicts with this right now? I can see only
 this line in powernv_set_freq()

I am not denying the advantage of storing pstate_ids in .driver_data
since we will then have all the information in one place.
(That said, in the future if we want to export additional
information, say pertaining to the voltage, we will have to keep a
separate array anyway :-) )

However, as of now, I am wary about reusing a variable provided by
the core, whose type is fixed (which is not the type I am interested
in) and whose interpretation is ambiguous (since it is also being used
as a BOOST indicator).

When we have a .is_boost field, or a flags field in the cpufreq_table,
and once the .driver_data field is completely opaque, I would be
comfortable making use of .driver_data to store the pstate_ids. So
I'll send out those patches once the fixes are present in the
core. For now, let this code stay as it is. I think we should be ok
with the additional overhead of 1kb as of now.

 
 freq_data.pstate_id = powernv_pstate_ids[new_index];
 
 And I don't think we will have side effects here.
 
  Not just that, the pr_debugs in the core which are printed during
  frequency transitions will then end up printing large positive values
  (since it will interpret negative pstate_ids as unsigned int) in the
  place of pstate_ids, which would not be particularly useful.
 
 I have checked it again, those prints shouldn't have used that field.
 Will fix them.
 
 And so you can use that field in your code and I will get core fixed
 for you..

Like I said earlier, once the core is fixed, I'll be comfortable using
the .driver_data field for the purpose I need. Not until then :-)

 
  Why do you need to get these from DT? And not find that yourself here 
  instead?
 
  Well, I think we can obtain a more accurate value from the DT which
  would have already computed it. But I'll let vaidy answer this.
 
 Can we simply refer to frequency values here for finding min and max
 pstates? If yes, then probably this is the better place as there can be
 human errors in DT.. Also those binding are just not required then.
 And obviously less headache for you as well when you are changing
 pstate tables.
 
   +   freq_data = (struct powernv_smp_call_data *)arg;
 
  don't need casting here ?
 
  Doesn't harm having it there. Just like the #includes :-)
 
 Do you ever have these typecasts while you do kmalloc? It also
 returns void *.. That's the same..
 
 CodingStyle: Chapter 14: Allocating memory:
 
 Casting the return value which is a void pointer is redundant. The conversion
 from void pointer to any other pointer type is guaranteed by the C programming

[PATCH v5 0/3] powernv,cpufreq: Dynamic Frequency Scaling support

2014-03-28 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

Hi,

This is v5 of the patchset to enable dynamic frequency scaling on IBM
PowerNV platforms. This patchset does address all the review comments
obtained for v4 (which can be found at [1]).

Changes from v4:

* Created a separate patch to select the CPUFreq related Config
  options for PowerNV

* Dropped the per-core locking hunks in
  drivers/cpufreq/powernv-cpufreq.c since the CPUFreq core takes
  care of the for us after the following commit which is present in
  linux-next:

commit 12478cf0c55e5969f740bb38a24b1a0104ae18d8   
Author: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
Date:   Mon Mar 24 13:35:44 2014 +0530

cpufreq: Make sure frequency transitions are serialized

* [PATCH v5 3/3] gets rid of the powernv_pstate_ids[] array that
  was being used to record the pstate ids. After the following
  patch it is safe to use cpufreq_frequency_table.driver_data
  since it is opaque to the cpufreq core: 

From: Viresh Kumar viresh.ku...@linaro.org
Date: 2014-03-28 13:53:47
url: http://marc.info/?l=linux-pmm=139601416804702w=2

cpufreq: create another field .flags in cpufreq_frequency_table

The patchset is based on the commit
201544be8c37dffbf069bb5fc9edb5674f8c1754 of the linux-next tree.

While all the patches in the patchset apply cleanly on linux-next,
[PATCH v5 3/3] requires the Viresh's patch that was mentioned
above. Otherwise, the frequency corresponding to pstate id -3 will be
omited while reporting the scaling_available_frequencies in sysfs.

[1]: http://marc.info/?l=linux-pmm=139585297620612w=2

Gautham R. Shenoy (2):
  powernv, cpufreq: Select CPUFreq related Kconfig options for powernv
  powernv,cpufreq: Use cpufreq_frequency_table.driver_data to store
pstate ids

Vaidyanathan Srinivasan (1):
  powernv, cpufreq: cpufreq driver for powernv platform

 arch/powerpc/configs/pseries_defconfig|   1 +
 arch/powerpc/configs/pseries_le_defconfig |   1 +
 arch/powerpc/include/asm/reg.h|   4 +
 arch/powerpc/platforms/powernv/Kconfig|   6 +
 drivers/cpufreq/Kconfig.powerpc   |   8 +
 drivers/cpufreq/Makefile  |   1 +
 drivers/cpufreq/powernv-cpufreq.c | 341 ++
 7 files changed, 362 insertions(+)
 create mode 100644 drivers/cpufreq/powernv-cpufreq.c

-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v5 2/3] powernv, cpufreq: cpufreq driver for powernv platform

2014-03-28 Thread Gautham R. Shenoy

From: Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com

Backend driver to dynamically set voltage and frequency on
IBM POWER non-virtualized platforms.  Power management SPRs
are used to set the required PState.

This driver works in conjunction with cpufreq governors
like 'ondemand' to provide a demand based frequency and
voltage setting on IBM POWER non-virtualized platforms.

PState table is obtained from OPAL v3 firmware through device
tree.

powernv_cpufreq back-end driver would parse the relevant device-tree
nodes and initialise the cpufreq subsystem on powernv platform.

The code was originally written by sva...@linux.vnet.ibm.com. Over
time it was modified to accomodate bug-fixes as well as updates to the
the cpu-freq core. Relevant portions of the change logs corresponding
to those modifications are noted below:

 * The policy-cpus needs to be populated in a hotplug-invariant
   manner instead of using cpu_sibling_mask() which varies with
   cpu-hotplug. This is because the cpufreq core code copies this
   content into policy-related_cpus mask which should not vary on
   cpu-hotplug. [Authored by srivatsa.b...@linux.vnet.ibm.com]

 * Create a helper routine that can return the cpu-frequency for the
   corresponding pstate_id. Also, cache the values of the pstate_max,
   pstate_min and pstate_nominal and nr_pstates in a static structure
   so that they can be reused in the future to perform any
   validations. [Authored by e...@linux.vnet.ibm.com]

 * Create a driver attribute named cpuinfo_nominal_freq which creates
   a sysfs read-only file named cpuinfo_nominal_freq. Export the
   frequency corresponding to the nominal_pstate through this
   interface.

 Nominal frequency is the highest non-turbo frequency for the
   platform.  This is generally used for setting governor policies
   from user space for optimal energy efficiency. [Authored by
   e...@linux.vnet.ibm.com]

 * Implement a powernv_cpufreq_get(unsigned int cpu) method which will
   return the current operating frequency. Export this via the sysfs
   interface cpuinfo_cur_freq by setting powernv_cpufreq_driver.get to
   powernv_cpufreq_get(). [Authored by e...@linux.vnet.ibm.com]

[Change log updated by e...@linux.vnet.ibm.com]

Reviewed-by: Preeti U Murthy pre...@linux.vnet.ibm.com
Signed-off-by: Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com
Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
Signed-off-by: Anton Blanchard an...@samba.org
Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/reg.h|   4 +
 drivers/cpufreq/Kconfig.powerpc   |   8 +
 drivers/cpufreq/Makefile  |   1 +
 drivers/cpufreq/powernv-cpufreq.c | 342 ++
 4 files changed, 355 insertions(+)
 create mode 100644 drivers/cpufreq/powernv-cpufreq.c

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 1a36b8e..2189f8f 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -271,6 +271,10 @@
 #define SPRN_HSRR1 0x13B   /* Hypervisor Save/Restore 1 */
 #define SPRN_IC0x350   /* Virtual Instruction Count */
 #define SPRN_VTB   0x351   /* Virtual Time Base */
+#define SPRN_PMICR 0x354   /* Power Management Idle Control Reg */
+#define SPRN_PMSR  0x355   /* Power Management Status Reg */
+#define SPRN_PMCR  0x374   /* Power Management Control Register */
+
 /* HFSCR and FSCR bit numbers are the same */
 #define FSCR_TAR_LG8   /* Enable Target Address Register */
 #define FSCR_EBB_LG7   /* Enable Event Based Branching */
diff --git a/drivers/cpufreq/Kconfig.powerpc b/drivers/cpufreq/Kconfig.powerpc
index ca0021a..72564b7 100644
--- a/drivers/cpufreq/Kconfig.powerpc
+++ b/drivers/cpufreq/Kconfig.powerpc
@@ -54,3 +54,11 @@ config PPC_PASEMI_CPUFREQ
help
  This adds the support for frequency switching on PA Semi
  PWRficient processors.
+
+config POWERNV_CPUFREQ
+   tristate CPU frequency scaling for IBM POWERNV platform
+   depends on PPC_POWERNV
+   default y
+   help
+This adds support for CPU frequency switching on IBM POWERNV
+platform
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 7494565..0dbb963 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -86,6 +86,7 @@ obj-$(CONFIG_PPC_CORENET_CPUFREQ)   += ppc-corenet-cpufreq.o
 obj-$(CONFIG_CPU_FREQ_PMAC)+= pmac32-cpufreq.o
 obj-$(CONFIG_CPU_FREQ_PMAC64)  += pmac64-cpufreq.o
 obj-$(CONFIG_PPC_PASEMI_CPUFREQ)   += pasemi-cpufreq.o
+obj-$(CONFIG_POWERNV_CPUFREQ)  += powernv-cpufreq.o
 
 
##
 # Other platform drivers
diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
new file mode 100644
index 000..e1e5197
--- /dev/null
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -0,0 +1,342

[PATCH v5 1/3] powernv, cpufreq: Select CPUFreq related Kconfig options for powernv

2014-03-28 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

Enable CPUFreq for PowerNV. Select performance, powersave,
userspace and ondemand governors. Choose ondemand to be the
default governor.

Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
---
 arch/powerpc/configs/pseries_defconfig| 1 +
 arch/powerpc/configs/pseries_le_defconfig | 1 +
 arch/powerpc/platforms/powernv/Kconfig| 6 ++
 3 files changed, 8 insertions(+)

diff --git a/arch/powerpc/configs/pseries_defconfig 
b/arch/powerpc/configs/pseries_defconfig
index 9ea8342b..a905063 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -306,3 +306,4 @@ CONFIG_KVM_BOOK3S_64=m
 CONFIG_KVM_BOOK3S_64_HV=y
 CONFIG_TRANSPARENT_HUGEPAGE=y
 CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
+CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
diff --git a/arch/powerpc/configs/pseries_le_defconfig 
b/arch/powerpc/configs/pseries_le_defconfig
index 3c84f9d..58e3dbf 100644
--- a/arch/powerpc/configs/pseries_le_defconfig
+++ b/arch/powerpc/configs/pseries_le_defconfig
@@ -301,3 +301,4 @@ CONFIG_CRYPTO_LZO=m
 # CONFIG_CRYPTO_ANSI_CPRNG is not set
 CONFIG_CRYPTO_DEV_NX=y
 CONFIG_CRYPTO_DEV_NX_ENCRYPT=m
+CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
diff --git a/arch/powerpc/platforms/powernv/Kconfig 
b/arch/powerpc/platforms/powernv/Kconfig
index 895e8a2..c252ee9 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -11,6 +11,12 @@ config PPC_POWERNV
select PPC_UDBG_16550
select PPC_SCOM
select ARCH_RANDOM
+   select CPU_FREQ
+   select CPU_FREQ_GOV_PERFORMANCE
+   select CPU_FREQ_GOV_POWERSAVE
+   select CPU_FREQ_GOV_USERSPACE
+   select CPU_FREQ_GOV_ONDEMAND
+   select CPU_FREQ_GOV_CONSERVATIVE
default y
 
 config PPC_POWERNV_RTAS
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v5 3/3] powernv, cpufreq: Use cpufreq_frequency_table.driver_data to store pstate ids

2014-03-28 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

The .driver_data field in the cpufreq_frequency_table was supposed to
be private to the drivers. However at some later point, it was being
used to indicate if the particular frequency in the table is the
BOOST_FREQUENCY. After patches [1] and [2], the .driver_data is once
again private to the driver. Thus we can safely use
cpufreq_frequency_table.driver_data to store pstate_ids instead of
having to maintain a separate array powernv_pstate_ids[] for this
purpose.

[1]:
  Subject: cpufreq: don't print value of .driver_data from core
  From   : Viresh Kumar viresh.kumar@ linaro.org
  url: http://marc.info/?l=linux-pmm=139601421504709w=2

[2]:
  Subject: cpufreq: create another field .flags in cpufreq_frequency_table
  From   : Viresh Kumar viresh.ku...@linaro.org
  url: http://marc.info/?l=linux-pmm=139601416804702w=2

Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 drivers/cpufreq/powernv-cpufreq.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index e1e5197..9edccc6 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -33,7 +33,6 @@
 #define POWERNV_MAX_PSTATES256
 
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
-static int powernv_pstate_ids[POWERNV_MAX_PSTATES+1];
 
 /*
  * Note: The set of pstates consists of contiguous integers, the
@@ -112,7 +111,7 @@ static int init_powernv_pstates(void)
 
pr_debug(PState id %d freq %d MHz\n, id, freq);
powernv_freqs[i].frequency = freq * 1000; /* kHz */
-   powernv_pstate_ids[i] = id;
+   powernv_freqs[i].driver_data = id;
}
/* End of list marker entry */
powernv_freqs[i].frequency = CPUFREQ_TABLE_END;
@@ -283,7 +282,7 @@ static int powernv_cpufreq_target_index(struct 
cpufreq_policy *policy,
 {
struct powernv_smp_call_data freq_data;
 
-   freq_data.pstate_id = powernv_pstate_ids[new_index];
+   freq_data.pstate_id = powernv_freqs[new_index].driver_data;
 
/*
 * Use smp_call_function to send IPI and execute the
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH REPOST v5 2/3] powernv, cpufreq: cpufreq driver for powernv platform

2014-04-01 Thread Gautham R. Shenoy

From: Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com

Backend driver to dynamically set voltage and frequency on
IBM POWER non-virtualized platforms.  Power management SPRs
are used to set the required PState.

This driver works in conjunction with cpufreq governors
like 'ondemand' to provide a demand based frequency and
voltage setting on IBM POWER non-virtualized platforms.

PState table is obtained from OPAL v3 firmware through device
tree.

powernv_cpufreq back-end driver would parse the relevant device-tree
nodes and initialise the cpufreq subsystem on powernv platform.

The code was originally written by sva...@linux.vnet.ibm.com. Over
time it was modified to accomodate bug-fixes as well as updates to the
the cpu-freq core. Relevant portions of the change logs corresponding
to those modifications are noted below:

 * The policy-cpus needs to be populated in a hotplug-invariant
   manner instead of using cpu_sibling_mask() which varies with
   cpu-hotplug. This is because the cpufreq core code copies this
   content into policy-related_cpus mask which should not vary on
   cpu-hotplug. [Authored by srivatsa.b...@linux.vnet.ibm.com]

 * Create a helper routine that can return the cpu-frequency for the
   corresponding pstate_id. Also, cache the values of the pstate_max,
   pstate_min and pstate_nominal and nr_pstates in a static structure
   so that they can be reused in the future to perform any
   validations. [Authored by e...@linux.vnet.ibm.com]

 * Create a driver attribute named cpuinfo_nominal_freq which creates
   a sysfs read-only file named cpuinfo_nominal_freq. Export the
   frequency corresponding to the nominal_pstate through this
   interface.

 Nominal frequency is the highest non-turbo frequency for the
   platform.  This is generally used for setting governor policies
   from user space for optimal energy efficiency. [Authored by
   e...@linux.vnet.ibm.com]

 * Implement a powernv_cpufreq_get(unsigned int cpu) method which will
   return the current operating frequency. Export this via the sysfs
   interface cpuinfo_cur_freq by setting powernv_cpufreq_driver.get to
   powernv_cpufreq_get(). [Authored by e...@linux.vnet.ibm.com]

[Change log updated by e...@linux.vnet.ibm.com]

Reviewed-by: Preeti U Murthy pre...@linux.vnet.ibm.com
Signed-off-by: Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com
Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
Signed-off-by: Anton Blanchard an...@samba.org
Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/reg.h|   4 +
 drivers/cpufreq/Kconfig.powerpc   |   8 +
 drivers/cpufreq/Makefile  |   1 +
 drivers/cpufreq/powernv-cpufreq.c | 342 ++
 4 files changed, 355 insertions(+)
 create mode 100644 drivers/cpufreq/powernv-cpufreq.c

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 1a36b8e..2189f8f 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -271,6 +271,10 @@
 #define SPRN_HSRR1 0x13B   /* Hypervisor Save/Restore 1 */
 #define SPRN_IC0x350   /* Virtual Instruction Count */
 #define SPRN_VTB   0x351   /* Virtual Time Base */
+#define SPRN_PMICR 0x354   /* Power Management Idle Control Reg */
+#define SPRN_PMSR  0x355   /* Power Management Status Reg */
+#define SPRN_PMCR  0x374   /* Power Management Control Register */
+
 /* HFSCR and FSCR bit numbers are the same */
 #define FSCR_TAR_LG8   /* Enable Target Address Register */
 #define FSCR_EBB_LG7   /* Enable Event Based Branching */
diff --git a/drivers/cpufreq/Kconfig.powerpc b/drivers/cpufreq/Kconfig.powerpc
index ca0021a..72564b7 100644
--- a/drivers/cpufreq/Kconfig.powerpc
+++ b/drivers/cpufreq/Kconfig.powerpc
@@ -54,3 +54,11 @@ config PPC_PASEMI_CPUFREQ
help
  This adds the support for frequency switching on PA Semi
  PWRficient processors.
+
+config POWERNV_CPUFREQ
+   tristate CPU frequency scaling for IBM POWERNV platform
+   depends on PPC_POWERNV
+   default y
+   help
+This adds support for CPU frequency switching on IBM POWERNV
+platform
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 7494565..0dbb963 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -86,6 +86,7 @@ obj-$(CONFIG_PPC_CORENET_CPUFREQ)   += ppc-corenet-cpufreq.o
 obj-$(CONFIG_CPU_FREQ_PMAC)+= pmac32-cpufreq.o
 obj-$(CONFIG_CPU_FREQ_PMAC64)  += pmac64-cpufreq.o
 obj-$(CONFIG_PPC_PASEMI_CPUFREQ)   += pasemi-cpufreq.o
+obj-$(CONFIG_POWERNV_CPUFREQ)  += powernv-cpufreq.o
 
 
##
 # Other platform drivers
diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
new file mode 100644
index 000..e1e5197
--- /dev/null
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -0,0 +1,342

[PATCH REPOST v5 3/3] powernv, cpufreq: Use cpufreq_frequency_table.driver_data to store pstate ids

2014-04-01 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

The .driver_data field in the cpufreq_frequency_table was supposed to
be private to the drivers. However at some later point, it was being
used to indicate if the particular frequency in the table is the
BOOST_FREQUENCY. After patches [1] and [2], the .driver_data is once
again private to the driver. Thus we can safely use
cpufreq_frequency_table.driver_data to store pstate_ids instead of
having to maintain a separate array powernv_pstate_ids[] for this
purpose.

[1]:
  Subject: cpufreq: don't print value of .driver_data from core
  From   : Viresh Kumar viresh.kumar@ linaro.org
  url: http://marc.info/?l=linux-pmm=139601421504709w=2

[2]:
  Subject: cpufreq: create another field .flags in cpufreq_frequency_table
  From   : Viresh Kumar viresh.ku...@linaro.org
  url: http://marc.info/?l=linux-pmm=139601416804702w=2

Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 drivers/cpufreq/powernv-cpufreq.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index e1e5197..9edccc6 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -33,7 +33,6 @@
 #define POWERNV_MAX_PSTATES256
 
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
-static int powernv_pstate_ids[POWERNV_MAX_PSTATES+1];
 
 /*
  * Note: The set of pstates consists of contiguous integers, the
@@ -112,7 +111,7 @@ static int init_powernv_pstates(void)
 
pr_debug(PState id %d freq %d MHz\n, id, freq);
powernv_freqs[i].frequency = freq * 1000; /* kHz */
-   powernv_pstate_ids[i] = id;
+   powernv_freqs[i].driver_data = id;
}
/* End of list marker entry */
powernv_freqs[i].frequency = CPUFREQ_TABLE_END;
@@ -283,7 +282,7 @@ static int powernv_cpufreq_target_index(struct 
cpufreq_policy *policy,
 {
struct powernv_smp_call_data freq_data;
 
-   freq_data.pstate_id = powernv_pstate_ids[new_index];
+   freq_data.pstate_id = powernv_freqs[new_index].driver_data;
 
/*
 * Use smp_call_function to send IPI and execute the
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH REPOST v5 1/3] powernv, cpufreq: Select CPUFreq related Kconfig options for powernv

2014-04-01 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

Enable CPUFreq for PowerNV. Select performance, powersave,
userspace and ondemand governors. Choose ondemand to be the
default governor.

Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
---
 arch/powerpc/configs/pseries_defconfig| 1 +
 arch/powerpc/configs/pseries_le_defconfig | 1 +
 arch/powerpc/platforms/powernv/Kconfig| 6 ++
 3 files changed, 8 insertions(+)

diff --git a/arch/powerpc/configs/pseries_defconfig 
b/arch/powerpc/configs/pseries_defconfig
index 9ea8342b..a905063 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -306,3 +306,4 @@ CONFIG_KVM_BOOK3S_64=m
 CONFIG_KVM_BOOK3S_64_HV=y
 CONFIG_TRANSPARENT_HUGEPAGE=y
 CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
+CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
diff --git a/arch/powerpc/configs/pseries_le_defconfig 
b/arch/powerpc/configs/pseries_le_defconfig
index 3c84f9d..58e3dbf 100644
--- a/arch/powerpc/configs/pseries_le_defconfig
+++ b/arch/powerpc/configs/pseries_le_defconfig
@@ -301,3 +301,4 @@ CONFIG_CRYPTO_LZO=m
 # CONFIG_CRYPTO_ANSI_CPRNG is not set
 CONFIG_CRYPTO_DEV_NX=y
 CONFIG_CRYPTO_DEV_NX_ENCRYPT=m
+CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
diff --git a/arch/powerpc/platforms/powernv/Kconfig 
b/arch/powerpc/platforms/powernv/Kconfig
index 895e8a2..c252ee9 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -11,6 +11,12 @@ config PPC_POWERNV
select PPC_UDBG_16550
select PPC_SCOM
select ARCH_RANDOM
+   select CPU_FREQ
+   select CPU_FREQ_GOV_PERFORMANCE
+   select CPU_FREQ_GOV_POWERSAVE
+   select CPU_FREQ_GOV_USERSPACE
+   select CPU_FREQ_GOV_ONDEMAND
+   select CPU_FREQ_GOV_CONSERVATIVE
default y
 
 config PPC_POWERNV_RTAS
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 3/4] powernv:cpufreq: Create a powernv_cpu_to_core_mask() helper.

2014-03-03 Thread Gautham R. Shenoy

From: Gautham R. Shenoy e...@linux.vnet.ibm.com

Move the code that computes the cpumask corresponding to the
thread-siblings of a cpu to a new method named
powernv_cpu_to_core_mask() so that it can be used by other places in
the code.

Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 drivers/cpufreq/powernv-cpufreq.c | 24 +++-
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index c59eb26..f0dae6f 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -166,6 +166,23 @@ static struct freq_attr *powernv_cpu_freq_attr[] = {
 
 /* Helper routines */
 
+/**
+ * Sets the bits corresponding to the thread-siblings of cpu in its core
+ * in 'cpus'.
+ */
+static void powernv_cpu_to_core_mask(unsigned int cpu, cpumask_var_t cpus)
+{
+   int base, i;
+
+   base = cpu_first_thread_sibling(cpu);
+
+   for (i = 0; i  threads_per_core; i++) {
+   cpumask_set_cpu(base + i, cpus);
+   }
+
+   return;
+}
+
 /* Access helpers to power mgt SPR */
 
 static inline unsigned long get_pmspr(unsigned long sprn)
@@ -231,13 +248,10 @@ static int powernv_set_freq(cpumask_var_t cpus, unsigned 
int new_index)
 
 static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
 {
-   int base, i;
+   int i;
 
 #ifdef CONFIG_SMP
-   base = cpu_first_thread_sibling(policy-cpu);
-
-   for (i = 0; i  threads_per_core; i++)
-   cpumask_set_cpu(base + i, policy-cpus);
+   powernv_cpu_to_core_mask(policy-cpu, policy-cpus);
 #endif
policy-cpuinfo.transition_latency = 25000;
 
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] pseries: cpu: Reduce the polling interval in __cpu_up()

2009-06-24 Thread Gautham R Shenoy

Time time taken for a single cpu online operation on a pseries machine
is as follows:
Dedicated LPAR (POWER6): ~220ms.
Shared LPAR (POWER5)   : ~240ms.

Of this time, approximately 200ms is taken up by __cpu_up(). This is because
we poll every 200ms to check if the new cpu has notified it's presence
through the cpu_callin_map. We repeat this operation until the new cpu sets
the value in cpu_callin_map or 5 seconds elapse, whichever comes earlier.

However, using completion_structs instead of polling loops,
the time taken by the new processor to indicate it's presence has
found to be less than 1ms on pseries. This method however may not
work on all powerpc platforms due to the time-base synchronization code.

Keeping this in mind, we could reduce msleep polling interval from
200ms to 1ms while retaining the 5 second timeout.

With this, the time taken for a cpu online operation changes as follows:
Dedicated LPAR (POWER6): 20-25ms.
Shared LPAR (POWER5)   : 60-80ms.

In both these cases, it was found that the code polls through the loop
only once indicating that 1ms is a reasonable value, atleast on pseries.

The code needs testing on other powerpc platforms.

Signed-off-by: Gautham R Shenoy e...@in.ibm.com
---
 arch/powerpc/kernel/smp.c |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 65484b2..00c13a1 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -411,9 +411,8 @@ int __cpuinit __cpu_up(unsigned int cpu)
 * CPUs can take much longer to come up in the
 * hotplug case.  Wait five seconds.
 */
-   for (c = 25; c  !cpu_callin_map[cpu]; c--) {
-   msleep(200);
-   }
+   for (c = 5000; c  !cpu_callin_map[cpu]; c--)
+   msleep(1);
 #endif
 
if (!cpu_callin_map[cpu]) {

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 0/3] cpu: idle state framework for offline CPUs.

2009-08-05 Thread Gautham R Shenoy

Hi,

 RFC not for inclusion 

When we perform a CPU-Offline operation today, we do not put the CPU
into the most energy efficient state. On x86, it loops in hlt as opposed to
going to one of the low-power C-states. On pSeries, we call rtas_stop_self()
and hand over the vCPU back to the resource pool, thereby deallocating
the vCPU.

Thus, when applications or platforms desire to put a particular CPU
to an extended low-power state for a short while, currently they have to
piggy-back on scheduler heuristics such as sched_mc_powersavings or play with
exclusive Cpusets. The former does a good job based on the workload, but fails
to provide any guarentee that the CPU won't be used for the next  seconds,
while the latter might conflict with the existing CPUsets configurations.

There were efforts to alleviate these problems and various proposals have been
put forth. They include putting the CPU to the deepest possible idle-state
when offlined [1], removing the desired CPU from the topmost-cpuset [2],
a driver which forces a high-priority idle thread to run on the desired CPU
thereby putting it to idle [3].

In this patch-series, we propose to extend the CPU-Hotplug infrastructure
and allow the system administrator to choose the desired state the CPU should
go to when it is offlined. We think this approach addresses the concerns about
determinism as well as transparency, since CPU-Hotplug already provides
notification mechanism which the userspace can listen to for any change
in the configuration and correspondingly readjust any previously set
cpu-affinities. Also, approaches such as [1] can make use of this
extended infrastructure instead of putting the CPU to an arbitrary C-state
when it is offlined, thereby providing the system administrator a rope to hang
himself with should he feel the need to do so.

This patch-series tries to achieve this by implementing an architecture
independent framework that exposes sysfs tunables to allow the
system-adminstrator to choose the offline-state of a CPU.

/sys/devices/system/cpu/cpunumber/available_offline_states
and
/sys/devices/system/cpu/cpunumber/preferred_offline_states

For the purpose of proof-of-concept, we've implemented the backend for
pSeries. For pSeries, we define two available_offline_states. They are:

deallocate: This is default behaviour which on an offline, deallocates
the vCPU by invoking rtas_stop_self() and hands it back to
the resource pool.

deactivate: This calls H_CEDE, which will request the hypervisor to
idle the vCPU in the lowest power mode and give it back as soon as
we need it.


Any feedback on the patchset will be immensely valuable.

References:
---
[1] Pallipadi, Venkatesh: x86: Make offline cpus to go to deepest idle state
using mwait (URL: http://lkml.org/lkml/2009/5/22/431)

[2] Li, Shaohua: cpuset: add new API to change cpuset top group's cpus
(URL: http://lkml.org/lkml/2009/5/19/54)

[3] Li, Shaohua: new ACPI processor driver to force CPUs idle
(URL: http://www.spinics.net/lists/linux-acpi/msg22863.html)


Changelog:
---

Gautham R Shenoy (3):
  pSeries: cpu: Cede CPU during a deactivate-offline
  cpu: Implement cpu-offline-state callbacks for pSeries.
  cpu: Offline state Framework.


 arch/powerpc/platforms/pseries/hotplug-cpu.c|  160 ++-
 arch/powerpc/platforms/pseries/offline_driver.h |   17 ++
 arch/powerpc/platforms/pseries/plpar_wrappers.h |6 +
 arch/powerpc/platforms/pseries/smp.c|   18 ++-
 drivers/base/cpu.c  |  111 
 include/linux/cpu.h |   15 ++
 6 files changed, 319 insertions(+), 8 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.h

-- 
Thanks and Regards
gautham.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 2/3] cpu: Implement cpu-offline-state callbacks for pSeries.

2009-08-05 Thread Gautham R Shenoy

This patch implements the callbacks to handle the reads/writes into the sysfs
interfaces

/sys/devices/system/cpu/cpunumber/available_offline_states
and
/sys/devices/system/cpu/cpunumber/preferred_offline_state

Currently, the patch defines two states which the processor can go to when it
is offlined. They are

- deallocate: The current behaviour when the cpu is offlined.
  The CPU would call make an rtas_stop_self() call and hand over the
  CPU back to the resource pool, thereby effectively deallocating
  that vCPU from the LPAR.

- deactivate: This is expected to cede the processor to the hypervisor, so
  that on processors which support appropriate low-power states, they can
  be exploited. This can be considered as an extended tickless idle state.

The patch only implements the callbacks which will display the available
states, and record the preferred states. The code bits to call
rtas_stop_self() or H_CEDE, depending on the preferred_offline_state is
implemented in the next patch.

Signed-off-by: Gautham R Shenoy e...@in.ibm.com
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c|   90 +++
 arch/powerpc/platforms/pseries/offline_driver.h |   16 
 2 files changed, 106 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.h

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index a20ead8..f15de99 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -30,6 +30,95 @@
 #include asm/pSeries_reconfig.h
 #include xics.h
 #include plpar_wrappers.h
+#include offline_driver.h
+
+struct cpu_offline_state {
+   enum cpu_state_vals state_val;
+   const char *state_name;
+} pSeries_cpu_offline_states[] = {
+   {CPU_DEACTIVATE, deactivate},
+   {CPU_DEALLOCATE, deallocate},
+};
+
+DEFINE_PER_CPU(enum cpu_state_vals, preferred_offline_state) = CPU_DEALLOCATE;
+
+ssize_t pSeries_show_available_states(struct sys_device *dev,
+   struct sysdev_attribute *attr, char *buf)
+{
+   int state;
+   ssize_t ret = 0;
+
+   for (state = CPU_DEACTIVATE; state  CPU_MAX_OFFLINE_STATES; state++) {
+   if (state == CPU_STATE_ONLINE)
+   continue;
+
+   if (ret = (ssize_t) ((PAGE_SIZE / sizeof(char))
+   - (CPU_STATES_LEN + 2)))
+   goto out;
+   ret += scnprintf(buf[ret], CPU_STATES_LEN, %s ,
+   pSeries_cpu_offline_states[state].state_name);
+   }
+
+out:
+   ret += sprintf(buf[ret], \n);
+   return ret;
+}
+
+ssize_t pSeries_show_preferred_state(struct sys_device *dev,
+   struct sysdev_attribute *attr, char *buf)
+{
+   struct cpu *cpu = container_of(dev, struct cpu, sysdev);
+   int state = per_cpu(preferred_offline_state, cpu-sysdev.id);
+
+   return scnprintf(buf, CPU_STATES_LEN, %s\n,
+   pSeries_cpu_offline_states[state].state_name);
+}
+
+ssize_t pSeries_store_preferred_state(struct sys_device *dev,
+   struct sysdev_attribute *attr,
+   const char *buf, size_t count)
+{
+   struct cpu *cpu = container_of(dev, struct cpu, sysdev);
+   unsigned int ret = -EINVAL;
+   char state_name[CPU_STATES_LEN];
+   int i;
+   cpu_maps_update_begin();
+   ret = sscanf(buf, %15s, state_name);
+
+   if (ret != 1) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   for (i = CPU_DEACTIVATE; i  CPU_MAX_OFFLINE_STATES; i++)
+   if (!strnicmp(state_name,
+   pSeries_cpu_offline_states[i].state_name,
+   CPU_STATES_LEN))
+   break;
+
+   if (i == CPU_MAX_OFFLINE_STATES) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   per_cpu(preferred_offline_state, cpu-sysdev.id) =
+   pSeries_cpu_offline_states[i].state_val;
+   ret = 0;
+
+out_unlock:
+   cpu_maps_update_done();
+
+   if (ret)
+   return ret;
+   else
+   return count;
+}
+
+struct cpu_offline_driver pSeries_offline_driver = {
+   .show_available_states = pSeries_show_available_states,
+   .show_preferred_state = pSeries_show_preferred_state,
+   .store_preferred_state = pSeries_store_preferred_state,
+};
 
 /* This version can't take the spinlock, because it never returns */
 static struct rtas_args rtas_stop_self_args = {
@@ -281,6 +370,7 @@ static int __init pseries_cpu_hotplug_init(void)
ppc_md.cpu_die = pseries_mach_cpu_die;
smp_ops-cpu_disable = pseries_cpu_disable;
smp_ops-cpu_die = pseries_cpu_die;
+   register_cpu_offline_driver(pSeries_offline_driver);
 
/* Processors can be added/removed only on LPAR

[PATCH 1/3] cpu: Offline state Framework.

2009-08-05 Thread Gautham R Shenoy

Provide an interface by which the system administrator can decide what state
should the CPU go to when it is offlined.

To query the available offline states, on needs to perform a read on:
/sys/devices/system/cpu/cpunumber/available_offline_states

To query or set the preferred offline state for a particular CPU, one needs to
use the sysfs interface

/sys/devices/system/cpu/cpunumber/preferred_offline_state

This patch implements the architecture independent bits of the
cpu-offline-state framework.

The architecture specific bits are expected to register the actual code which
implements the callbacks when the above mentioned sysfs interfaces are read or
written into. Thus the values provided by reading available_offline_states are
expected to vary with the architecture.

Signed-off-by: Gautham R Shenoy e...@in.ibm.com
---
 drivers/base/cpu.c  |  111 +++
 include/linux/cpu.h |   15 +++
 2 files changed, 126 insertions(+), 0 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index e62a4cc..1a63de0 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -56,26 +56,137 @@ static ssize_t __ref store_online(struct sys_device *dev, 
struct sysdev_attribut
 }
 static SYSDEV_ATTR(online, 0644, show_online, store_online);
 
+static struct cpu_offline_driver *cpu_offline_driver;
+static SYSDEV_ATTR(available_offline_states, 0444, NULL, NULL);
+static SYSDEV_ATTR(preferred_offline_state, 0644, NULL, NULL);
+
+/* Should be called with cpu_add_remove_lock held */
+void cpu_offline_driver_add_cpu(struct sys_device *cpu_sys_dev)
+{
+   if (!cpu_offline_driver)
+   return;
+
+   sysdev_create_file(cpu_sys_dev, attr_available_offline_states);
+   sysdev_create_file(cpu_sys_dev, attr_preferred_offline_state);
+}
+
+/* Should be called with cpu_add_remove_lock held */
+void cpu_offline_driver_remove_cpu(struct sys_device *cpu_sys_dev)
+{
+   if (!cpu_offline_driver)
+   return;
+
+   sysdev_remove_file(cpu_sys_dev, attr_available_offline_states);
+   sysdev_remove_file(cpu_sys_dev, attr_preferred_offline_state);
+
+}
+
 static void __cpuinit register_cpu_control(struct cpu *cpu)
 {
sysdev_create_file(cpu-sysdev, attr_online);
+   cpu_offline_driver_add_cpu(cpu-sysdev);
 }
+
 void unregister_cpu(struct cpu *cpu)
 {
int logical_cpu = cpu-sysdev.id;
 
unregister_cpu_under_node(logical_cpu, cpu_to_node(logical_cpu));
 
+   cpu_offline_driver_remove_cpu(cpu-sysdev);
sysdev_remove_file(cpu-sysdev, attr_online);
 
sysdev_unregister(cpu-sysdev);
per_cpu(cpu_sys_devices, logical_cpu) = NULL;
return;
 }
+
+static int __cpuinit
+cpu_driver_callback(struct notifier_block *nfb, unsigned long action,
+   void *hcpu)
+{
+   struct sys_device *cpu_sysdev = per_cpu(cpu_sys_devices,
+   (unsigned long)(hcpu));
+
+   switch (action) {
+   case CPU_DEAD:
+   case CPU_DEAD_FROZEN:
+   cpu_offline_driver_remove_cpu(cpu_sysdev);
+   break;
+
+   case CPU_ONLINE:
+   case CPU_ONLINE_FROZEN:
+   cpu_offline_driver_add_cpu(cpu_sysdev);
+   break;
+   default:
+   break;
+   }
+
+   return NOTIFY_OK;
+}
+
+static struct notifier_block __cpuinitdata cpu_driver_notifier = {
+   .notifier_call = cpu_driver_callback,
+   .priority = 0
+};
+
+int register_cpu_offline_driver(struct cpu_offline_driver *arch_cpu_driver)
+{
+   int ret = 0;
+   cpu_maps_update_begin();
+
+   if (cpu_offline_driver != NULL) {
+   ret = -EEXIST;
+   goto out_unlock;
+   }
+
+   if (!(arch_cpu_driver-show_available_states 
+ arch_cpu_driver-show_preferred_state 
+ arch_cpu_driver-store_preferred_state)) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   attr_available_offline_states.show =
+   arch_cpu_driver-show_available_states;
+   attr_preferred_offline_state.show =
+   arch_cpu_driver-show_preferred_state;
+   attr_preferred_offline_state.store =
+   arch_cpu_driver-store_preferred_state;
+
+   cpu_offline_driver = arch_cpu_driver;
+
+out_unlock:
+   cpu_maps_update_done();
+   if (!ret)
+   register_cpu_notifier(cpu_driver_notifier);
+   return ret;
+}
+
+void unregister_cpu_offline_driver(struct cpu_offline_driver *arch_cpu_driver)
+{
+   cpu_maps_update_begin();
+
+   if (!cpu_offline_driver) {
+   WARN_ON(1);
+   cpu_maps_update_done();
+   return;
+   }
+
+   cpu_offline_driver = NULL;
+   attr_available_offline_states.show = NULL;
+   attr_preferred_offline_state.show = NULL;
+   attr_preferred_offline_state.store = NULL

[PATCH 3/3] pSeries: cpu: Cede CPU during a deactivate-offline

2009-08-05 Thread Gautham R Shenoy

Implements the pSeries specific code bits to put the CPU into
rtas_stop_self() state or H_CEDE state depending on the
preferred_offline_state value for that CPU.

Signed-off-by: Gautham R Shenoy e...@in.ibm.com
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c|   70 +--
 arch/powerpc/platforms/pseries/offline_driver.h |3 +
 arch/powerpc/platforms/pseries/plpar_wrappers.h |6 ++
 arch/powerpc/platforms/pseries/smp.c|   18 ++
 4 files changed, 88 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index f15de99..5b47d6c 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -41,6 +41,8 @@ struct cpu_offline_state {
 };
 
 DEFINE_PER_CPU(enum cpu_state_vals, preferred_offline_state) = CPU_DEALLOCATE;
+DEFINE_PER_CPU(enum cpu_state_vals, cpu_current_state);
+DEFINE_PER_CPU(int, cpu_offline_ack);
 
 ssize_t pSeries_show_available_states(struct sys_device *dev,
struct sysdev_attribute *attr, char *buf)
@@ -148,8 +150,47 @@ static void pseries_mach_cpu_die(void)
local_irq_disable();
idle_task_exit();
xics_teardown_cpu();
-   unregister_slb_shadow(hard_smp_processor_id(), __pa(get_slb_shadow()));
-   rtas_stop_self();
+   if (__get_cpu_var(preferred_offline_state) == CPU_DEACTIVATE) {
+
+   __get_cpu_var(cpu_offline_ack) = 1;
+   get_lppaca()-idle = 1;
+   if (!get_lppaca()-shared_proc)
+   get_lppaca()-donate_dedicated_cpu = 1;
+
+   printk(KERN_INFO cpu %u (hwid %u) ceding\n,
+   smp_processor_id(), hard_smp_processor_id());
+
+   while (__get_cpu_var(cpu_current_state) == CPU_DEACTIVATE) {
+   cede_processor();
+   printk(KERN_INFO cpu %u (hwid %u) Returned from cede.\
+   Decrementer value: %x. Timebase value:%llx \n,
+  smp_processor_id(), hard_smp_processor_id(),
+  get_dec(), get_tb());
+   }
+
+   printk(KERN_INFO cpu %u (hwid %u) Received online PROD \n,
+  smp_processor_id(), hard_smp_processor_id());
+
+   if (!get_lppaca()-shared_proc)
+   get_lppaca()-donate_dedicated_cpu = 0;
+   get_lppaca()-idle = 0;
+
+   unregister_slb_shadow(hard_smp_processor_id(),
+   __pa(get_slb_shadow()));
+
+   /**
+* NOTE: Calling start_secondary() here, is not a very nice
+* way of beginning a new context.
+*
+* We need to reset the stack-pointer.
+* Find a cleaner way to do this.
+*/
+   start_secondary(NULL);
+   } else {
+   unregister_slb_shadow(hard_smp_processor_id(),
+   __pa(get_slb_shadow()));
+   rtas_stop_self();
+   }
/* Should never get here... */
BUG();
for(;;);
@@ -192,6 +233,10 @@ static int pseries_cpu_disable(void)
 
/* FIXME: abstract this to not be platform specific later on */
xics_migrate_irqs_away();
+   __get_cpu_var(cpu_current_state) =
+   __get_cpu_var(preferred_offline_state);
+
+   __get_cpu_var(cpu_offline_ack) = 0;
return 0;
 }
 
@@ -201,11 +246,22 @@ static void pseries_cpu_die(unsigned int cpu)
int cpu_status;
unsigned int pcpu = get_hard_smp_processor_id(cpu);
 
-   for (tries = 0; tries  25; tries++) {
-   cpu_status = query_cpu_stopped(pcpu);
-   if (cpu_status == 0 || cpu_status == -1)
-   break;
-   cpu_relax();
+   if (per_cpu(preferred_offline_state, cpu) == CPU_DEACTIVATE) {
+   /* Wait for some Ack */
+   for (tries = 0; tries  1; tries++) {
+   cpu_status = !per_cpu(cpu_offline_ack, cpu);
+   if (!cpu_status)
+   break;
+   cpu_relax();
+   }
+
+   } else {
+   for (tries = 0; tries  25; tries++) {
+   cpu_status = query_cpu_stopped(pcpu);
+   if (cpu_status == 0 || cpu_status == -1)
+   break;
+   cpu_relax();
+   }
}
if (cpu_status != 0) {
printk(Querying DEAD? cpu %i (%i) shows %i\n,
diff --git a/arch/powerpc/platforms/pseries/offline_driver.h 
b/arch/powerpc/platforms/pseries/offline_driver.h
index bdae76a..571e085 100644
--- a/arch/powerpc/platforms/pseries/offline_driver.h
+++ b/arch/powerpc/platforms/pseries/offline_driver.h
@@ -12,5 +12,6 @@ enum

Re: [PATCH 0/3] cpu: idle state framework for offline CPUs.

2009-08-06 Thread Gautham R Shenoy

Hi Shaohua,

On Thu, Aug 06, 2009 at 09:58:55AM +0800, Shaohua Li wrote:
 Hi,
 
 On Wed, Aug 05, 2009 at 10:25:53PM +0800, Gautham R Shenoy wrote:
  In this patch-series, we propose to extend the CPU-Hotplug infrastructure
  and allow the system administrator to choose the desired state the CPU 
  should
  go to when it is offlined. We think this approach addresses the concerns 
  about
  determinism as well as transparency, since CPU-Hotplug already provides
  notification mechanism which the userspace can listen to for any change
  in the configuration and correspondingly readjust any previously set
  cpu-affinities.
 Peter dislikes any approach (including cpuhotplug) which breaks userspace 
 policy,
 even userspace can get a notification.

I think Peter's problem was more to do with the kernel offlining the CPUs
behind the scenes, right ?

We don't do that in this patch series. The option to offline the CPUs is
very much with the admin. The patch-series only provides the interface
that helps the admin choose the state the CPU must reside in when it
goes offline.

 
  Also, approaches such as [1] can make use of this
  extended infrastructure instead of putting the CPU to an arbitrary C-state
  when it is offlined, thereby providing the system administrator a rope to 
  hang
  himself with should he feel the need to do so.
 I didn't see the reason why administrator needs to know which state offline 
 cpu
 should stay. Don't know about powerpc side, but in x86 side, it appears 
 deepest
 C-state is already preferred.

We can still provide a sane default value based on what states are
available and what the BIOS limits us to. Thus we can still use the
idle-state-offline patch that Venki posted sometime ago, right ?

 
 Thanks,
 Shaohua

-- 
Thanks and Regards
gautham
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] book3s_hv_rmhandlers:Pass the correct trap argument to kvmhv_commence_exit

2015-05-21 Thread Gautham R. Shenoy

In guest_exit_cont we call kvmhv_commence_exit which expects the trap
number as the argument. However r3 doesn't contain the trap number at
this point and as a result we would be calling the function with a
spurious trap number.

Fix this by copying r12 into r3 before calling kvmhv_commence_exit as
r12 contains the trap number

Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 4d70df2..f0d7c54 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1170,6 +1170,7 @@ mc_cont:
bl  kvmhv_accumulate_time
 #endif
 
+   mr  r3, r12
/* Increment exit count, poke other threads to exit */
bl  kvmhv_commence_exit
nop
-- 
1.9.3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] book3s_hv_rmhandlers:Pass the correct trap argument to kvmhv_commence_exit

2015-08-17 Thread Gautham R Shenoy

Hi Sam,

On Fri, Aug 14, 2015 at 03:07:28PM +1000, Sam Bobroff wrote:
 On Thu, May 21, 2015 at 01:57:04PM +0530, Gautham R. Shenoy wrote:
  In guest_exit_cont we call kvmhv_commence_exit which expects the trap
  number as the argument. However r3 doesn't contain the trap number at
  this point and as a result we would be calling the function with a
  spurious trap number.
  
  Fix this by copying r12 into r3 before calling kvmhv_commence_exit as
  r12 contains the trap number
  
  Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
 
 Hi Gautham,
 
 I agree with your logic: r3 is quite clearly corrupted in that path. So:
 
 Reviewed-by: Sam Bobroff sam.bobr...@au1.ibm.com
 
 Just one comment: Do you have a case of this causing some visible problem due
 to the corrupted trap number? (I'll test the patch if you do.)
 

Actually no! I found this bug while reviewing the code for some other issue.

 Cheers,
 Sam.

--
Thanks and Regards
gautham.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: powerpc: Add an inline function to update HID0

2015-08-04 Thread Gautham R Shenoy

Hi Michael,

On Tue, Aug 04, 2015 at 08:08:58PM +1000, Michael Ellerman wrote:
 On Tue, 2015-04-08 at 08:30:58 UTC, Gautham R. Shenoy wrote:
  Section 3.7 of Version 1.2 of the Power8 Processor User's Manual
  prescribes that updates to HID0 be preceded by a SYNC instruction and
  followed by an ISYNC instruction (Page 91).
  
  Create a function name update_hid0() which follows this recipe and
  invoke it from the static split core path.
  
  Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
  ---
   arch/powerpc/include/asm/kvm_ppc.h   | 11 +++
 
 Why is it in there? It's not KVM related per se.

Ok. Will fix this.
 
 Where should it go? I think reg.h would be best, ideally near the definition
 for HID0, though that's probably not possible because of ASSEMBLY 
 requirements.
 So at the bottom of reg.h ?
 
  diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
  b/arch/powerpc/include/asm/kvm_ppc.h
  index c6ef05b..325f1d6 100644
  --- a/arch/powerpc/include/asm/kvm_ppc.h
  +++ b/arch/powerpc/include/asm/kvm_ppc.h
  @@ -685,4 +685,15 @@ static inline ulong kvmppc_get_ea_indexed(struct 
  kvm_vcpu *vcpu, int ra, int rb)
   
   extern void xics_wake_cpu(int cpu);
   
  +static inline void update_hid0(unsigned long hid0)
  +{
  +   /*
  +*  The HID0 update should at the very least be preceded by a
  +*  a SYNC instruction followed by an ISYNC instruction
  +*/
  +   mb();
  +   mtspr(SPRN_HID0, hid0);
  +   isync();
 
 That's going to turn into three separate inline asm blocks, which is maybe a
 bit unfortunate. Have you checked the generated code is what we want, ie. just
 sync, mtspr, isync ?
 

Yes, the objdump of subcore.o shows exactly these three instructions:
 7c 00 04 ac sync
 7c 70 fb a6 mtspr   1008,r3
 4c 00 01 2c isync

 cheers

--
Thanks and Regards
gautham. 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2] powerpc: Add an inline function to update HID0

2015-08-04 Thread Gautham R. Shenoy

Section 3.7 of Version 1.2 of the Power8 Processor User's Manual
prescribes that updates to HID0 be preceded by a SYNC instruction and
followed by an ISYNC instruction (Page 91).

Create an inline function name update_hid0() which follows this recipe
and invoke it from the static split core path.

Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
[v1-- v2: Moved defn of update_hid0 to reg.h from kvm_ppc.h]

 arch/powerpc/include/asm/reg.h   | 13 +
 arch/powerpc/platforms/powernv/subcore.c |  4 ++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index af56b5c..beb98ac 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -12,6 +12,8 @@
 
 #include linux/stringify.h
 #include asm/cputable.h
+#include asm/barrier.h
+#include asm/synch.h
 
 /* Pickup Book E specific registers. */
 #if defined(CONFIG_BOOKE) || defined(CONFIG_40x)
@@ -1281,6 +1283,17 @@ struct pt_regs;
 
 extern void ppc_save_regs(struct pt_regs *regs);
 
+static inline void update_hid0(unsigned long hid0)
+{
+   /*
+*  The HID0 update should at the very least be preceded by a
+*  a SYNC instruction followed by an ISYNC instruction
+*/
+   mb();
+   mtspr(SPRN_HID0, hid0);
+   isync();
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_REG_H */
diff --git a/arch/powerpc/platforms/powernv/subcore.c 
b/arch/powerpc/platforms/powernv/subcore.c
index f60f80a..37e77bf 100644
--- a/arch/powerpc/platforms/powernv/subcore.c
+++ b/arch/powerpc/platforms/powernv/subcore.c
@@ -190,7 +190,7 @@ static void unsplit_core(void)
 
hid0 = mfspr(SPRN_HID0);
hid0 = ~HID0_POWER8_DYNLPARDIS;
-   mtspr(SPRN_HID0, hid0);
+   update_hid0(hid0);
update_hid_in_slw(hid0);
 
while (mfspr(SPRN_HID0)  mask)
@@ -227,7 +227,7 @@ static void split_core(int new_mode)
/* Write new mode */
hid0  = mfspr(SPRN_HID0);
hid0 |= HID0_POWER8_DYNLPARDIS | split_parms[i].value;
-   mtspr(SPRN_HID0, hid0);
+   update_hid0(hid0);
update_hid_in_slw(hid0);
 
/* Wait for it to happen */
-- 
1.9.3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] powerpc: Add an inline function to update HID0

2015-08-04 Thread Gautham R. Shenoy

Section 3.7 of Version 1.2 of the Power8 Processor User's Manual
prescribes that updates to HID0 be preceded by a SYNC instruction and
followed by an ISYNC instruction (Page 91).

Create a function name update_hid0() which follows this recipe and
invoke it from the static split core path.

Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_ppc.h   | 11 +++
 arch/powerpc/platforms/powernv/subcore.c |  4 ++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index c6ef05b..325f1d6 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -685,4 +685,15 @@ static inline ulong kvmppc_get_ea_indexed(struct kvm_vcpu 
*vcpu, int ra, int rb)
 
 extern void xics_wake_cpu(int cpu);
 
+static inline void update_hid0(unsigned long hid0)
+{
+   /*
+*  The HID0 update should at the very least be preceded by a
+*  a SYNC instruction followed by an ISYNC instruction
+*/
+   mb();
+   mtspr(SPRN_HID0, hid0);
+   isync();
+}
+
 #endif /* __POWERPC_KVM_PPC_H__ */
diff --git a/arch/powerpc/platforms/powernv/subcore.c 
b/arch/powerpc/platforms/powernv/subcore.c
index f60f80a..37e77bf 100644
--- a/arch/powerpc/platforms/powernv/subcore.c
+++ b/arch/powerpc/platforms/powernv/subcore.c
@@ -190,7 +190,7 @@ static void unsplit_core(void)
 
hid0 = mfspr(SPRN_HID0);
hid0 = ~HID0_POWER8_DYNLPARDIS;
-   mtspr(SPRN_HID0, hid0);
+   update_hid0(hid0);
update_hid_in_slw(hid0);
 
while (mfspr(SPRN_HID0)  mask)
@@ -227,7 +227,7 @@ static void split_core(int new_mode)
/* Write new mode */
hid0  = mfspr(SPRN_HID0);
hid0 |= HID0_POWER8_DYNLPARDIS | split_parms[i].value;
-   mtspr(SPRN_HID0, hid0);
+   update_hid0(hid0);
update_hid_in_slw(hid0);
 
/* Wait for it to happen */
-- 
1.9.3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: powerpc: Add an inline function to update HID0

2015-08-05 Thread Gautham R Shenoy

Hi Segher,

Thanks for the suggestions. I will rename the function to
update_power8_hid0() and use asm volatile.


On Tue, Aug 04, 2015 at 09:30:57PM -0500, Segher Boessenkool wrote:
 On Tue, Aug 04, 2015 at 08:08:58PM +1000, Michael Ellerman wrote:
   +static inline void update_hid0(unsigned long hid0)
   +{
   + /*
   +  *  The HID0 update should at the very least be preceded by a
   +  *  a SYNC instruction followed by an ISYNC instruction
   +  */
   + mb();
   + mtspr(SPRN_HID0, hid0);
   + isync();
  
  That's going to turn into three separate inline asm blocks, which is maybe a
  bit unfortunate. Have you checked the generated code is what we want, ie. 
  just
  sync, mtspr, isync ?
 
 The mb() is not such a great name anyway: you don't want a memory
 barrier, you want an actual sync instruction (sync 0, hwsync,
 whatever the currently preferred spelling is).
 
 The function name should also say this is for POWER8 (the required
 sequences are different for some other processors; and some others
 might not even _have_ a HID0, or not at 1008).  power8_write_hid0
 or such?
 
 For writing it as one asm, why not just
 
   asm volatile(sync ; mtspr %0,%1 ; isync : : i(SPRN_HID0), r(hid0));
 
 instead of the stringify stuff?
 
 
 Segher
 

--
Thanks and Regards
gautham.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3] powerpc: Add an inline function to update POWER8 HID0

2015-08-05 Thread Gautham R. Shenoy

Section 3.7 of Version 1.2 of the Power8 Processor User's Manual
prescribes that updates to HID0 be preceded by a SYNC instruction and
followed by an ISYNC instruction (Page 91).

Create an inline function name update_power8_hid0() which follows this
recipe and invoke it from the static split core path.

Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
[v1 -- v2: Moved defn of update_hid0 to reg.h from kvm_ppc.h]
[v2 -- v3: Renamed to update_power8_hid0 and used asm volatile]

 arch/powerpc/include/asm/reg.h   | 9 +
 arch/powerpc/platforms/powernv/subcore.c | 4 ++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index af56b5c..1245d99 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1281,6 +1281,15 @@ struct pt_regs;
 
 extern void ppc_save_regs(struct pt_regs *regs);
 
+static inline void update_power8_hid0(unsigned long hid0)
+{
+   /*
+*  The HID0 update on Power8 should at the very least be
+*  preceded by a a SYNC instruction followed by an ISYNC
+*  instruction
+*/
+   asm volatile(sync; mtspr %0,%1; isync:: i(SPRN_HID0), r(hid0));
+}
 #endif /* __ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_REG_H */
diff --git a/arch/powerpc/platforms/powernv/subcore.c 
b/arch/powerpc/platforms/powernv/subcore.c
index f60f80a..503a73f 100644
--- a/arch/powerpc/platforms/powernv/subcore.c
+++ b/arch/powerpc/platforms/powernv/subcore.c
@@ -190,7 +190,7 @@ static void unsplit_core(void)
 
hid0 = mfspr(SPRN_HID0);
hid0 = ~HID0_POWER8_DYNLPARDIS;
-   mtspr(SPRN_HID0, hid0);
+   update_power8_hid0(hid0);
update_hid_in_slw(hid0);
 
while (mfspr(SPRN_HID0)  mask)
@@ -227,7 +227,7 @@ static void split_core(int new_mode)
/* Write new mode */
hid0  = mfspr(SPRN_HID0);
hid0 |= HID0_POWER8_DYNLPARDIS | split_parms[i].value;
-   mtspr(SPRN_HID0, hid0);
+   update_power8_hid0(hid0);
update_hid_in_slw(hid0);
 
/* Wait for it to happen */
-- 
1.9.3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] book3s_hv: Handle H_DOORBELL on the guest exit path

2015-10-15 Thread Gautham R. Shenoy

Currently a CPU running a guest can receive a H_DOORBELL in the
following two cases:
1) When the CPU is napping due to CEDE or there not being a guest
vcpu.
2) The CPU is running the guest vcpu.

Case 1), the doorbell message is not cleared since we were waking up
from nap. Hence when the EE bit gets set on transition from guest to
host, the H_DOORBELL interrupt is delivered to the host and the
corresponding handler is invoked.

However in Case 2), the message gets cleared by the action of taking
the H_DOORBELL interrupt. Since the CPU was running a guest, instead
of invoking the doorbell handler, the code invokes the second-level
interrupt handler to switch the context from the guest to the host. At
this point the setting of the EE bit doesn't result in the CPU getting
the doorbell interrupt since it has already been delivered once. So,
the handler for this doorbell is never invoked!

This causes softlockups if the missed DOORBELL was as IPI sent from a
sibling subcore CPU.

This patch fixes it by explitly invoking the doorbell handler on the
exit path if the exit reason is H_DOORBELL similar to the way an
EXTERNAL interrupt is handled. Since this will also handle Case 1), we
can unconditionally clear the doorbell message in
kvmppc_check_wake_reason.

Signed-off-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b98889e..106c7f9 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -150,6 +150,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
cmpwi   cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
cmpwi   r12, BOOK3S_INTERRUPT_EXTERNAL
beq 11f
+   cmpwi   r12, BOOK3S_INTERRUPT_H_DOORBELL
+   beq 15f /* Invoke the H_DOORBELL handler */
cmpwi   cr2, r12, BOOK3S_INTERRUPT_HMI
beq cr2, 14f/* HMI check */
 
@@ -174,6 +176,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mtspr   SPRN_HSRR1, r7
b   hmi_exception_after_realmode
 
+15:mtspr SPRN_HSRR0, r8
+   mtspr SPRN_HSRR1, r7
+   ba0xe80
+
 kvmppc_primary_no_guest:
/* We handle this much like a ceded vcpu */
/* put the HDEC into the DEC, since HDEC interrupts don't wake us */
@@ -2436,14 +2442,19 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
/* hypervisor doorbell */
 3: li  r12, BOOK3S_INTERRUPT_H_DOORBELL
+
+   /*
+* Clear the doorbell as we will invoke the handler
+* explicitly in the guest exit path.
+*/
+   lis r6, (PPC_DBELL_SERVER << (63-36))@h
+   PPC_MSGCLR(6)
/* see if it's a host IPI */
li  r3, 1
lbz r0, HSTATE_HOST_IPI(r13)
cmpwi   r0, 0
bnelr
-   /* if not, clear it and return -1 */
-   lis r6, (PPC_DBELL_SERVER << (63-36))@h
-   PPC_MSGCLR(6)
+   /* if not, return -1 */
li  r3, -1
blr
 
-- 
1.9.3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH RESEND v4 3/4] cpufreq: powernv: Add a trace print for the throttle event

2016-01-12 Thread Gautham R Shenoy

Hi Shilpa,

Just saw this resend!

On Tue, Jan 12, 2016 at 04:24:26AM -0600, Shilpasri G Bhat wrote:
> Record the throttle event with a trace print replacing the printk,
> except for events like throttling below nominal and occ reset
> event which print a warning message.
> 
> Signed-off-by: Shilpasri G Bhat 
> ---

[..snip..]

> 
> -static void powernv_cpufreq_throttle_check(void *data)
> +static void powernv_cpufreq_check_pmax(void)
   ^^^
This function only contains code moved from
powernv_cpufreq_throttle_check with pr_crit/pr_warns replaced by
trace_powernv_throttle. Furthermore, it is not called from any other
place. Given that the original function was ~60 lines do we really
need to split it into two separate functions ? If yes, could it be an
inline function ?

>  {
>   unsigned int cpu = smp_processor_id();
>   unsigned int chip_id = pir_to_chip_id(hard_smp_processor_id());
> - unsigned long pmsr;
>   int pmsr_pmax, i;
> 
> - pmsr = get_pmspr(SPRN_PMSR);
> + pmsr_pmax = (s8)PMSR_MAX(get_pmspr(SPRN_PMSR));
> 
>   for (i = 0; i < nr_chips; i++)
>   if (chips[i].id == chip_id)
>   break;
> 
> - /* Check for Pmax Capping */
> - pmsr_pmax = (s8)PMSR_MAX(pmsr);
>   if (pmsr_pmax != powernv_pstate_info.max) {
>   if (chips[i].throttled)
> - goto next;
> + return;
> +
>   chips[i].throttled = true;
>   if (pmsr_pmax < powernv_pstate_info.nominal)
> - pr_crit("CPU %d on Chip %u has Pmax reduced below 
> nominal frequency (%d < %d)\n",
> - cpu, chips[i].id, pmsr_pmax,
> - powernv_pstate_info.nominal);
> - else
> - pr_info("CPU %d on Chip %u has Pmax reduced below turbo 
> frequency (%d < %d)\n",
> - cpu, chips[i].id, pmsr_pmax,
> - powernv_pstate_info.max);
> + pr_warn_once("CPU %d on Chip %u has Pmax reduced below 
> nominal frequency (%d < %d)\n",
> +  cpu, chips[i].id, pmsr_pmax,
> +  powernv_pstate_info.nominal);
> +
> + trace_powernv_throttle(chips[i].id,
> +throttle_reason[chips[i].throt_reason],
> +pmsr_pmax);
>   } else if (chips[i].throttled) {
>   chips[i].throttled = false;
> - pr_info("CPU %d on Chip %u has Pmax restored to %d\n", cpu,
> - chips[i].id, pmsr_pmax);
> + trace_powernv_throttle(chips[i].id,
> +throttle_reason[chips[i].throt_reason],
> +pmsr_pmax);
>  }
> +}
> +
> +static void powernv_cpufreq_throttle_check(void *data)
> +{
> + unsigned long pmsr;
> +
> + pmsr = get_pmspr(SPRN_PMSR);
> +
> + /* Check for Pmax Capping */
> + powernv_cpufreq_check_pmax();
  
If you want to retain this function, you could pass pmsr as an
argument instead of computing it afresh in
powernv_cpufreq_check_pmax()

>   /* Check if Psafe_mode_active is set in PMSR. */
> -next:
>   if (pmsr & PMSR_PSAFE_ENABLE) {
>   throttled = true;
>   pr_info("Pstate set to safe frequency\n");

--
Thanks and Regards
gautham.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH RESEND v4 4/4] cpufreq: powernv: Add sysfs attributes to show throttle stats

2016-01-12 Thread Gautham R Shenoy

Hi Shilpa,

On Tue, Jan 12, 2016 at 04:24:27AM -0600, Shilpasri G Bhat wrote:

> +static inline int get_chip_index(struct kobject *kobj)

Probably have "get_chip_index(int id)". See the reason below.
> +{
> + int i, id;
> +
> + i = kstrtoint(kobj->name + 4, 0, );
> + if (i)
> + return i;
> +
> + for (i = 0; i < nr_chips; i++)
> + if (chips[i].id == id)
> + return i;

This pattern to obtain a chip index from the chip id is repeated in
multiple place inside this file. Might be worthwhile to move this to a
helper function, i.e get_chip_index(id)!

> + return -EINVAL;
> +}
> +
> +static ssize_t throttle_freq_show(struct kobject *kobj,
> +   struct kobj_attribute *attr, char *buf)
> +{
> + int i, count = 0, id;
> + 
We obtain the id from kobj here and then obtain the index from
id via the function below.


> + id = get_chip_index(kobj);


> + if (id < 0)
> + return id;
> +
> + for (i = 0; i < powernv_pstate_info.nr_pstates; i++)
> + count += sprintf([count], "%d %d\n",
> +powernv_freqs[i].frequency,
> +chips[id].pstate_stat[i]);
> +
> + return count;
> +}
> +
> +static struct kobj_attribute attr_throttle_frequencies =
> +__ATTR(throttle_frequencies, 0444, throttle_freq_show, NULL);
> +
--
Thanks and Regards
gautham.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 2/2] workqueue:Fix affinity of an unbound worker of a node with 1 online CPU

2016-06-07 Thread Gautham R. Shenoy

With commit e9d867a67fd03ccc ("sched: Allow per-cpu kernel threads to
run on online && !active"), __set_cpus_allowed_ptr() expects that only
strict per-cpu kernel threads can have affinity to an online CPU which
is not yet active.

This assumption is currently broken in the CPU_ONLINE notification
handler for the workqueues where restore_unbound_workers_cpumask()
calls set_cpus_allowed_ptr() when the first cpu in the unbound
worker's pool->attr->cpumask comes online. Since
set_cpus_allowed_ptr() is called with pool->attr->cpumask in which
only one CPU is online which is not yet active, we get the following
WARN_ON during an CPU online operation.

[ cut here ]
WARNING: CPU: 40 PID: 248 at kernel/sched/core.c:1166
__set_cpus_allowed_ptr+0x228/0x2e0
Modules linked in:
CPU: 40 PID: 248 Comm: cpuhp/40 Not tainted 4.6.0-autotest+ #4
<..snip..>
Call Trace:
[c00f273ff920] [c010493c] __set_cpus_allowed_ptr+0x2cc/0x2e0 
(unreliable)
[c00f273ffac0] [c00ed4b0] workqueue_cpu_up_callback+0x2c0/0x470
[c00f273ffb70] [c00f5c58] notifier_call_chain+0x98/0x100
[c00f273ffbc0] [c00c5ed0] __cpu_notify+0x70/0xe0
[c00f273ffc00] [c00c6028] notify_online+0x38/0x50
[c00f273ffc30] [c00c5214] cpuhp_invoke_callback+0x84/0x250
[c00f273ffc90] [c00c562c] cpuhp_up_callbacks+0x5c/0x120
[c00f273ffce0] [c00c64d4] cpuhp_thread_fun+0x184/0x1c0
[c00f273ffd20] [c00fa050] smpboot_thread_fn+0x290/0x2a0
[c00f273ffd80] [c00f45b0] kthread+0x110/0x130
[c00f273ffe30] [c0009570] ret_from_kernel_thread+0x5c/0x6c
---[ end trace 00f1456578b2a3b2 ]---

This patch sets the affinity of the worker to
a) the only online CPU in the cpumask of the worker pool when it comes
   online.
b) the cpumask of the worker pool when the second CPU in the pool's
   cpumask comes online.

Reported-by: Abdul Haleem <abdha...@linux.vnet.ibm.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Tejun Heo <hte...@gmail.com>
Cc: Michael Ellerman <m...@ellerman.id.au>
Signed-off-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
---
 kernel/workqueue.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e412794..1199f73 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4586,7 +4586,7 @@ static void rebind_workers(struct worker_pool *pool)
  *
  * An unbound pool may end up with a cpumask which doesn't have any online
  * CPUs.  When a worker of such pool get scheduled, the scheduler resets
- * its cpus_allowed.  If @cpu is in @pool's cpumask which didn't have any
+ * its cpus_allowed.  If @cpu is in @pool's cpumask which had at most one
  * online CPU before, cpus_allowed of all its workers should be restored.
  */
 static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
@@ -4600,15 +4600,26 @@ static void restore_unbound_workers_cpumask(struct 
worker_pool *pool, int cpu)
if (!cpumask_test_cpu(cpu, pool->attrs->cpumask))
return;
 
-   /* is @cpu the only online CPU? */
cpumask_and(, pool->attrs->cpumask, cpu_online_mask);
-   if (cpumask_weight() != 1)
+
+   /*
+* The affinity needs to be set
+* a) to @cpu when that is the only online CPU in
+*pool->attrs->cpumask.
+* b) to pool->attrs->cpumask when exactly two CPUs in
+*pool->attrs->cpumask are online. This affinity will be
+*retained when subsequent CPUs come online.
+*/
+   if (cpumask_weight() > 2)
return;
 
+   if (cpumask_weight() == 2)
+   cpumask_copy(, pool->attrs->cpumask);
+
/* as we're called from CPU_ONLINE, the following shouldn't fail */
for_each_pool_worker(worker, pool)
WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
- pool->attrs->cpumask) < 0);
+ ) < 0);
 }
 
 /*
-- 
1.9.3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE

2016-06-07 Thread Gautham R. Shenoy

Currently in the CPU_ONLINE workqueue handler, the
restore_unbound_workers_cpumask() will never call
set_cpus_allowed_ptr() for a newly created unbound worker thread.

This is because the function which creates a new unbound worker thread
when the first CPU in the node comes online [wq_update_unbound_numa()]
is invoked after the call to restore_unbound_workers_cpumask(). Thus
the affinity is never set for this worker thread when the first CPU in
the node comes online.

Furthermore, due to an optimization in
restore_unbound_workers_cpumask(), set_cpus_allowed_ptr() is not
called when subsequent CPUs in the node come online since it assumes
that the affinity would have been set when the first CPU has come
online.

This patch fixes this issue by invoking wq_update_unbound_numa()
before the calling restore_unbound_workers_cpumask().

Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Tejun Heo <hte...@gmail.com>
Cc: Michael Ellerman <m...@ellerman.id.au>
Signed-off-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
---
 kernel/workqueue.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e1c0e99..e412794 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4638,6 +4638,10 @@ static int workqueue_cpu_up_callback(struct 
notifier_block *nfb,
case CPU_ONLINE:
mutex_lock(_pool_mutex);
 
+   /* update NUMA affinity of unbound workqueues */
+   list_for_each_entry(wq, , list)
+   wq_update_unbound_numa(wq, cpu, true);
+
for_each_pool(pool, pi) {
mutex_lock(>attach_mutex);
 
@@ -4649,10 +4653,6 @@ static int workqueue_cpu_up_callback(struct 
notifier_block *nfb,
mutex_unlock(>attach_mutex);
}
 
-   /* update NUMA affinity of unbound workqueues */
-   list_for_each_entry(wq, , list)
-   wq_update_unbound_numa(wq, cpu, true);
-
mutex_unlock(_pool_mutex);
break;
}
-- 
1.9.3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 0/2] Fix CPU Online handling for unbounded worker threads

2016-06-07 Thread Gautham R. Shenoy

Hi,

This patchset fixes a couple of issues in the CPU_ONLINE notification
handling for the workqueues with respect to unbounded worker threads.

Patch 1 ensures that the affinity of a unbound worker thread
associated with a node whose very first CPU has come online is set
correctly. In the existing code path we will never call
set_cpus_allowed_ptr() for unbound worker threads that have been
created on a CPU Online operation after boot.

Patch 2 fixes the following WARN_ON() reported by Abdul when
set_cpus_allowed_ptr() for an unbound worker thread is invoked when
only one of the CPUs in its cpumask is online but not yet active.

 [ cut here ]
 WARNING: CPU: 40 PID: 248 at kernel/sched/core.c:1166 
__set_cpus_allowed_ptr+0x21c/0x290
 Modules linked in:
CPU: 40 PID: 248 Comm: cpuhp/40 Not tainted 4.6.0-autotest #1
task: c00f27284200 ti: c00f273fc000 task.ti: c00f273fc000
NIP: c010488c LR: c0104874 CTR: 
REGS: c00f273ff7d0 TRAP: 0700   Not tainted  (4.6.0-autotest)
MSR: 900100029033 <SF,HV,EE,ME,IR,DR,RI,LE,TM[E]>  CR: 28002804  XER: 
2000
CFAR: c05b0888 SOFTE: 0
GPR00: c010478c c00f273ffa50 c13ce400 
GPR04: c140ed98 0800 c007f64d9408 
GPR08:  0028 c140ee90 0020
GPR12: 2200 cfb96800 c00f44a8 c007fa158480
GPR16: c007fc621a70 c00f2721f800  0001
GPR20: c1571ef0  c134879f c12bc510
GPR24: 0100  c140ea98 c007f64d9408
GPR28: c007fbc21c00 ffea  c00f2728
NIP [c010488c] __set_cpus_allowed_ptr+0x21c/0x290
LR [c0104874] __set_cpus_allowed_ptr+0x204/0x290
Call Trace:
[c00f273ffa50] [c010478c] __set_cpus_allowed_ptr+0x11c/0x290 
(unreliable)
[c00f273ffac0] [c00ed4b0] workqueue_cpu_up_callback+0x2c0/0x470
[c00f273ffb70] [c00f5c58] notifier_call_chain+0x98/0x100
[c00f273ffbc0] [c00c5ed0] __cpu_notify+0x70/0xe0
[c00f273ffc00] [c00c6028] notify_online+0x38/0x50
[c00f273ffc30] [c00c5214] cpuhp_invoke_callback+0x84/0x250
[c00f273ffc90] [c00c562c] cpuhp_up_callbacks+0x5c/0x120
[c00f273ffce0] [c00c64d4] cpuhp_thread_fun+0x184/0x1c0
[c00f273ffd20] [c00fa050] smpboot_thread_fn+0x290/0x2a0
[c00f273ffd80] [c00f45b0] kthread+0x110/0x130
[c00f273ffe30] [c0009570] ret_from_kernel_thread+0x5c/0x6c
Instruction dump:
419eff3c 3d420004 38a00800 388a0998 7f63db78 484abfa1 6000 2fa3
409eff1c 813f0378 2f890001 419eff10 <0fe0> 4b08 6000 6000
 ---[ end trace cbc1c5cfbc9591d0 ]---

The patches are based on 4.7-rc2. I have tested the patches on a
multi-node x86_64 and a ppc64

Gautham R. Shenoy (2):
  workqueue: Move wq_update_unbound_numa() to the beginning of
CPU_ONLINE
  workqueue:Fix affinity of an unbound worker of a node with 1 online
CPU

 kernel/workqueue.c | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

-- 
1.9.3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 0/2] Fix CPU Online handling for unbounded worker threads

2016-06-12 Thread Gautham R Shenoy

Hi Peter, Thomas,
On Tue, Jun 07, 2016 at 08:44:01PM +0530, Gautham R. Shenoy wrote:
> Hi,
> 
> This patchset fixes a couple of issues in the CPU_ONLINE notification
> handling for the workqueues with respect to unbounded worker
threads.

Any thoughts on these patches ? They fix a race which
was causing WARN_ON() to be consistently reproduced on POWER machines
since 4.6.

Could you please review these patches ?

--
Thanks and Regard
gautham.

> 
> Patch 1 ensures that the affinity of a unbound worker thread
> associated with a node whose very first CPU has come online is set
> correctly. In the existing code path we will never call
> set_cpus_allowed_ptr() for unbound worker threads that have been
> created on a CPU Online operation after boot.
> 
> Patch 2 fixes the following WARN_ON() reported by Abdul when
> set_cpus_allowed_ptr() for an unbound worker thread is invoked when
> only one of the CPUs in its cpumask is online but not yet active.
> 
>  [ cut here ]
>  WARNING: CPU: 40 PID: 248 at kernel/sched/core.c:1166 
> __set_cpus_allowed_ptr+0x21c/0x290
>  Modules linked in:
> CPU: 40 PID: 248 Comm: cpuhp/40 Not tainted 4.6.0-autotest #1
> task: c00f27284200 ti: c00f273fc000 task.ti: c00f273fc000
> NIP: c010488c LR: c0104874 CTR: 
> REGS: c00f273ff7d0 TRAP: 0700   Not tainted  (4.6.0-autotest)
> MSR: 900100029033 <SF,HV,EE,ME,IR,DR,RI,LE,TM[E]>  CR: 28002804  XER: 
> 2000
> CFAR: c05b0888 SOFTE: 0
> GPR00: c010478c c00f273ffa50 c13ce400 
> GPR04: c140ed98 0800 c007f64d9408 
> GPR08:  0028 c140ee90 0020
> GPR12: 2200 cfb96800 c00f44a8 c007fa158480
> GPR16: c007fc621a70 c00f2721f800  0001
> GPR20: c1571ef0  c134879f c12bc510
> GPR24: 0100  c140ea98 c007f64d9408
> GPR28: c007fbc21c00 ffea  c00f2728
> NIP [c010488c] __set_cpus_allowed_ptr+0x21c/0x290
> LR [c0104874] __set_cpus_allowed_ptr+0x204/0x290
> Call Trace:
> [c00f273ffa50] [c010478c] __set_cpus_allowed_ptr+0x11c/0x290 
> (unreliable)
> [c00f273ffac0] [c00ed4b0] workqueue_cpu_up_callback+0x2c0/0x470
> [c00f273ffb70] [c00f5c58] notifier_call_chain+0x98/0x100
> [c00f273ffbc0] [c00c5ed0] __cpu_notify+0x70/0xe0
> [c00f273ffc00] [c00c6028] notify_online+0x38/0x50
> [c00f273ffc30] [c00c5214] cpuhp_invoke_callback+0x84/0x250
> [c00f273ffc90] [c00c562c] cpuhp_up_callbacks+0x5c/0x120
> [c00f273ffce0] [c00c64d4] cpuhp_thread_fun+0x184/0x1c0
> [c00f273ffd20] [c00fa050] smpboot_thread_fn+0x290/0x2a0
> [c00f273ffd80] [c00f45b0] kthread+0x110/0x130
> [c00f273ffe30] [c0009570] ret_from_kernel_thread+0x5c/0x6c
> Instruction dump:
> 419eff3c 3d420004 38a00800 388a0998 7f63db78 484abfa1 6000 2fa3
> 409eff1c 813f0378 2f890001 419eff10 <0fe0> 4b08 60000000 6000
>  ---[ end trace cbc1c5cfbc9591d0 ]---
> 
> The patches are based on 4.7-rc2. I have tested the patches on a
> multi-node x86_64 and a ppc64
> 
> Gautham R. Shenoy (2):
>   workqueue: Move wq_update_unbound_numa() to the beginning of
> CPU_ONLINE
>   workqueue:Fix affinity of an unbound worker of a node with 1 online
> CPU
> 
>  kernel/workqueue.c | 27 +++
>  1 file changed, 19 insertions(+), 8 deletions(-)
> 
> -- 
> 1.9.3
> 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4 08/10] powerpc/powernv: Add platform support for stop instruction

2016-05-24 Thread Gautham R Shenoy

Hi Shreyas,

On Tue, May 24, 2016 at 06:45:12PM +0530, Shreyas B. Prabhu wrote:
> POWER ISA v3 defines a new idle processor core mechanism. In summary,
>  a) new instruction named stop is added. This instruction replaces
>   instructions like nap, sleep, rvwinkle.
>  b) new per thread SPR named Processor Stop Status and Control Register
>   (PSSCR) is added which controls the behavior of stop instruction.
> 
> PSSCR layout:
> --
> | PLS | /// | SD | ESL | EC | PSLL | /// | TR | MTL | RL |
> --
> 0  4 41   4243   44 4854   5660
> 
> PSSCR key fields:
>   Bits 0:3  - Power-Saving Level Status. This field indicates the lowest
>   power-saving state the thread entered since stop instruction was last
>   executed.
> 
>   Bit 42 - Enable State Loss
>   0 - No state is lost irrespective of other fields
>   1 - Allows state loss
> 
>   Bits 44:47 - Power-Saving Level Limit
>   This limits the power-saving level that can be entered into.
> 
>   Bits 60:63 - Requested Level
>   Used to specify which power-saving level must be entered on executing
>   stop instruction
> 
> This patch adds support for stop instruction and PSSCR handling.

This version looks good to me.

Reviewed-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>

--
Thanks and Regards
gautham.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 2/2] workqueue:Fix affinity of an unbound worker of a node with 1 online CPU

2016-06-15 Thread Gautham R Shenoy

Hi Peter,

On Tue, Jun 14, 2016 at 01:22:34PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 07, 2016 at 08:44:03PM +0530, Gautham R. Shenoy wrote:
> 
> I'm still puzzled why we don't see this on x86. Afaict there's nothing
> PPC specific about this.

You are right. On PPC, at boot time we hit the WARN_ON like once in 5
times. Using some debug prints, I have verified that these are
instances when the workqueue subsystem gets initialized before all the
CPUs come online. On x86, I have never been able to hit this since it
appears that every time the workqueues get initialized only after all
the CPUs have come online.

PPC doesn't uses any specific unbound workqueue early in the boot. The
unbound workqueues causing the WARN_ON() were the
"events_unbound" workqueue which was created by workqueue_init().

=
[WQ] Creating Unbound workers for WQ events_unbound,cpumask 0-127.
 online mask 0
[WQ] Creating Unbound workers for WQ events_unbound,cpumask 0-31. 
 online mask 0
[WQ] Creating Unbound workers for WQ events_unbound,cpumask 32-63.
  online mask 0
[WQ] Creating Unbound workers for WQ events_unbound,cpumask 64-95.
  online mask 0
[WQ] Creating Unbound workers for WQ events_unbound,cpumask 96-127.
  online mask 0
=

Also, with the first patch in the series (which ensures that
restore_unbound_workers are called *after* the new workers for the
newly onlined CPUs are created) and without this one, you can
reproduce this WARN_ON on both x86 and PPC by offlining all the CPUs
of a node and bringing just one of them online. So essentially the BUG
fixed by the previous patch is currently hiding this BUG which is why
we are not able to reproduce this WARN_ON() with CPU-hotplug once the
system has booted.

> 
> > This patch sets the affinity of the worker to
> > a) the only online CPU in the cpumask of the worker pool when it comes
> >online.
> > b) the cpumask of the worker pool when the second CPU in the pool's
> >cpumask comes online.
> 
> This basically works around the WARN conditions, which I suppose is fair
> enough, but I would like a note here to revisit this once the whole cpu
> hotplug rework has settled.
> 

Sure.

> The real problem is that workqueues seem to want to create worker
> threads before there's anybody who would use them or something like
> that.

I am not sure about that. The workqueue creates unbound workers for a
node via wq_update_unbound_numa() whenever the first CPU of every node
comes online. So that seems legitimate. It then tries to affine these
workers to the cpumask of that node. Again this seems right. As an
optimization, it does this only when the first CPU of the node comes
online. Since this online CPU is not yet active, and since
nr_cpus_allowed > 1, we will hit the WARN_ON().

However, I agree with you that during boot-up, the workqueue subsystem
needs to create unbound worker threads for only the online CPUs
(instead of all possible CPUs as it currently does!) and let the
CPU_ONLINE notification take care of creating the remaining workers
when they are really required.

> 
> Or is that what PPC does funny? Use an unbound workqueue this early in
> cpu bringup?

Like I pointed out above, PPC doesn't use an unbound workqueue early
in the CPU bring up.

--
Thanks and Regards
gautham.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 2/2] workqueue:Fix affinity of an unbound worker of a node with 1 online CPU

2016-06-15 Thread Gautham R Shenoy

On Wed, Jun 15, 2016 at 01:32:49PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 15, 2016 at 03:49:36PM +0530, Gautham R Shenoy wrote:
> 
> > Also, with the first patch in the series (which ensures that
> > restore_unbound_workers are called *after* the new workers for the
> > newly onlined CPUs are created) and without this one, you can
> > reproduce this WARN_ON on both x86 and PPC by offlining all the CPUs
> > of a node and bringing just one of them online.
> 
> Ah good.
> 
> > I am not sure about that. The workqueue creates unbound workers for a
> > node via wq_update_unbound_numa() whenever the first CPU of every node
> > comes online. So that seems legitimate. It then tries to affine these
> > workers to the cpumask of that node. Again this seems right. As an
> > optimization, it does this only when the first CPU of the node comes
> > online. Since this online CPU is not yet active, and since
> > nr_cpus_allowed > 1, we will hit the WARN_ON().
> 
> So I had another look and isn't the below a much simpler solution?
> 
> It seems to work on my x86 with:
> 
>   for i in /sys/devices/system/cpu/cpu*/online ; do echo 0 > $i ; done
>   for i in /sys/devices/system/cpu/cpu*/online ; do echo 1 > $i ; done
> 
> without complaint.

Yup. This will work on PPC as well. 

We will no longer have the optimization in
restore_unbound_workers_cpumask() but I suppose we don't lose much by
resetting the affinity every time a CPU in the pool->attr->cpumask
comes online.

--
Thanks and Regards
gautham.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE

2016-06-21 Thread Gautham R Shenoy

On Tue, Jun 21, 2016 at 09:47:19PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 21, 2016 at 03:43:56PM -0400, Tejun Heo wrote:
> > On Tue, Jun 21, 2016 at 09:37:09PM +0200, Peter Zijlstra wrote:
> > > Hurm.. So I've applied it, just to get this issue sorted, but I'm not
> > > entirely sure I like it.
> > > 
> > > I think I prefer ego's version because that makes it harder to get stuff
> > > to run on !active,online cpus. I think we really want to be careful what
> > > gets to run during that state.
> > 
> > The original patch just did set_cpus_allowed one more time late enough
> > so that the target kthread (in most cases) doesn't have to go through
> > fallback rq selection afterwards.  I don't know what the long term
> > solution is but CPU_ONLINE callbacks should be able to bind kthreads
> > to the new CPU one way or the other.
> 
> Fair enough; clearly I need to stare harder. In any case, patch is on
> its way to sched/urgent.

Thanks Tejun, Peter!
> 

--
Regards
gautham.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE

2016-06-15 Thread Gautham R Shenoy

Hello Tejun,

On Wed, Jun 15, 2016 at 11:53:50AM -0400, Tejun Heo wrote:
> Hello,
> 
> On Tue, Jun 07, 2016 at 08:44:02PM +0530, Gautham R. Shenoy wrote:
> > Currently in the CPU_ONLINE workqueue handler, the
> > restore_unbound_workers_cpumask() will never call
> > set_cpus_allowed_ptr() for a newly created unbound worker thread.
> 
> Hmmm... did you actually verify that this happens?  A new kworker
> always gets bound to the cpumask that it's assigned to in
> create_worker().

Yes I have verified that this happens despite the fact that
create_worker() calls kthread_bind_mask() to bind the worker thread to
attrs->cpumask. However, this doesn't seem to be sufficient.

Consider the following case of a 2-node POWER machine running 4.7-rc3.

CPUs 0-79 belong to the 1st node and CPUs 80-159 belong to the second.

==
root@fir01:~# uname -r
4.7.0-rc3-vanilla
root@fir01:~# numactl -H
available: 2 nodes (0,8)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 
51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 
77 78 79
node 0 size: 65246 MB
node 0 free: 64025 MB
node 8 cpus: 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 
101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 
121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 
141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
node 8 size: 65304 MB
node 8 free: 64985 MB
node distances:
node   0   8 
  0:  10  40 
  8:  40  10 
==

If we inspect the unbound worker threads affinity we will observe that
the ordered unbound worker threads (pids 6, 1088, 1122) are affined to
the online CPUs while the remaining unbound worker threads are affined
to the nodemask.
==
root@fir01:/home/ego# ./pr_unbound_workers_affinity.sh #See [1] below
pid 6's current affinity list: 0-159
pid 7's current affinity list: 0-79
pid 1018's current affinity list: 80-159
pid 1054's current affinity list: 80-159
pid 1088's current affinity list: 0-159
pid 1089's current affinity list: 0-79
pid 1090's current affinity list: 80-159
pid 1122's current affinity list: 0-159
pid 1176's current affinity list: 0-79
pid 3683's current affinity list: 0-79
==

At this point if we offline all but CPU0, the only unbound workers
that exist are the unordered workers and those affined to the first
node.
==
root@fir01:/home/ego# ./cpuhp.sh 0 1 159 #See [2] below
root@fir01:/home/ego# ./pr_unbound_workers_affinity.sh 
pid 6's current affinity list: 0
pid 7's current affinity list: 0
pid 1088's current affinity list: 0
pid 1089's current affinity list: 0
pid 1122's current affinity list: 0
pid 1176's current affinity list: 0
pid 3683's current affinity list: 0
==

We now online CPU80 which is the first CPU in the second node. We
would expect that an unbound worker thread corresponding to the second
node would be created and would have the mask 80-159. However, the
newly created workers (pid 4109 and 4110) are affined to CPU0 instead
of CPU80!
==
root@fir01:/home/ego# ./cpuhp.sh 1 80 80
root@fir01:/home/ego# ./pr_unbound_workers_affinity.sh 
pid 6's current affinity list: 0,80
pid 7's current affinity list: 0
pid 1088's current affinity list: 0,80
pid 1089's current affinity list: 0
pid 1122's current affinity list: 0,80
pid 1176's current affinity list: 0
pid 3683's current affinity list: 0
pid 4109's current affinity list: 0
pid 4110's current affinity list: 0
==

Furthermore, if we now bring all the CPUs online, we don't expect new
worker threads to be created since the representative for the second
node would have been created with CPU80 coming online. However, we do
expect that those worker threads are affined to CPUs 80-159. But
that's not the case either!
==
root@fir01:/home/ego# ./cpuhp.sh 1 1 159
root@fir01:/home/ego# ./pr_unbound_workers_affinity.sh 
pid 6's current affinity list: 0-159
pid 7's current affinity list: 0-79
pid 1088's current affinity list: 0-159
pid 1089's current affinity list: 0-79
pid 1122's current affinity list: 0-159
pid 1176's current affinity list: 0-79
pid 3683's current affinity list: 0-79
pid 4109's current affinity list: 0
pid 4110's current affinity list: 0

Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE

2016-06-21 Thread Gautham R Shenoy

Hi Tejun,
On Thu, Jun 16, 2016 at 03:35:04PM -0400, Tejun Heo wrote:
> Hello,
> 
> So, the issue of the initial worker not having its affinity set
> correctly wasn't caused by the order of the operations.  Reordering
> just made set_cpus_allowed tried one more time late enough so that it
> hides the race condition most of the time.  The problem is that
> CPU_ONLINE callbacks are called while the cpu being onlined is online
> but not active and select_fallback_rq() only considers active cpus, so
> if a kthread gets scheduled in the meantime and it doesn't have any
> cpu which is active in its allowed mask, it's allowed mask gets reset
> to cpu_possible_mask.
> 
> Would something like the following make sense?
> 
> Thanks.
> -- 8< --
> Subject: [PATCH] sched: allow kthreads to fallback to online && !active cpus
> 
> During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is
> online but not active.  A CPU_ONLINE callback may create or bind a
> kthread so that its cpus_allowed mask only allows the CPU which is
> being brought online.  The kthread may start executing before the CPU
> is made active and can end up in select_fallback_rq().
> 
> In such cases, the expected behavior is selecting the CPU which is
> coming online; however, because select_fallback_rq() only chooses from
> active CPUs, it determines that the task doesn't have any viable CPU
> in its allowed mask and ends up overriding it to cpu_possible_mask.
> 
> CPU_ONLINE callbacks should be able to put kthreads on the CPU which
> is coming online.  Update select_fallback_rq() so that it follows
> cpu_online() rather than cpu_active() for kthreads.
> 
> Signed-off-by: Tejun Heo <t...@kernel.org>
> Reported-by: Gautham R Shenoy <e...@linux.vnet.ibm.com>

Hi Tejun,

This patch fixes the issue on POWER. I am able to see the worker
threads of the unbound workqueues of the newly onlined node with this.

Tested-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>

> ---
>  kernel/sched/core.c |4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 017d539..a12e3db 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1536,7 +1536,9 @@ static int select_fallback_rq(int cpu, struct 
> task_struct *p)
>   for (;;) {
>   /* Any allowed, online CPU? */
>   for_each_cpu(dest_cpu, tsk_cpus_allowed(p)) {
> - if (!cpu_active(dest_cpu))
> + if (!(p->flags & PF_KTHREAD) && !cpu_active(dest_cpu))
> + continue;
> + if (!cpu_online(dest_cpu))
>   continue;
>   goto out;
>   }
> 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 6/6] cpufreq: powernv: Add sysfs attributes to show throttle stats

2016-01-28 Thread Gautham R Shenoy

Hi Shilpa,

A minor nit.
On Thu, Jan 28, 2016 at 12:55:41PM +0530, Shilpasri G Bhat wrote:

[..snip..]
> +
> +What:
> /sys/devices/system/cpu/cpufreq/chip*/throttle_reasons/
> +Date:Jan 2016
> +Contact: Linux kernel mailing list 
> + Linux for PowerPC mailing list 
> +Description: CPU Frequency throttle reason stat for the chip
> +
> + This directory contains throttle reason files. Each file gives
> + the total number of times the max frequency is throttled, except
> + for 'unthrottle_count', which gives the total number of times
> + the max frequency is unthrottled after being throttled. Below
> + are the reason attributes.
> +
> + cpu_over_temperature: Throttled due to cpu over temperature
> +
> + occ_reset: Throttled due to reset of OCC
> +
> + over_current: Throttled due to over current

Overcurrent is a single word. No need of the extra space. You could
fix that and add my Reviewed-by.

--
Thanks and Regards
gautham.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 1/6] cpufreq: powernv: Free 'chips' on module exit

2016-01-28 Thread Gautham R Shenoy

On Thu, Jan 28, 2016 at 12:55:36PM +0530, Shilpasri G Bhat wrote:
> This will free the dynamically allocated memory of'chips' on
> module exit.
> 
> Signed-off-by: Shilpasri G Bhat <shilpa.b...@linux.vnet.ibm.com>

Reviewed-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>

--
Thanks and Regards
gautham.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 3/6] cpufreq: powernv: Remove cpu_to_chip_id() from hot-path

2016-01-28 Thread Gautham R Shenoy

On Thu, Jan 28, 2016 at 12:55:38PM +0530, Shilpasri G Bhat wrote:
> cpu_to_chip_id() does a DT walk through to find out the chip id by
> taking a contended device tree lock. This adds an unnecessary overhead
> in a hot path. So instead of calling cpu_to_chip_id() everytime cache
> the chip ids for all cores in the array 'core_to_chip_map' and use it
> in the hotpath.
> 
> Reported-by: Anton Blanchard <an...@samba.org>
> Signed-off-by: Shilpasri G Bhat <shilpa.b...@linux.vnet.ibm.com>

Reviewed-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>

--
Thanks and Regards
gautham.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 5/6] cpufreq: powernv: Replace pr_info with trace print for throttle event

2016-01-28 Thread Gautham R Shenoy

On Thu, Jan 28, 2016 at 12:55:40PM +0530, Shilpasri G Bhat wrote:
> Currently we use printk message to notify the throttle event. But this
> can flood the console if the cpu is throttled frequently. So replace the
> printk with the tracepoint to notify the throttle event. And also events
> like throttle below nominal frequency and OCC_RESET are reduced to
> pr_warn/pr_warn_once as pointed by MFG to not mark them as critical
> messages. This patch adds 'throttle_reason' to struct chip to store the
> throttle reason.
> 
> Signed-off-by: Shilpasri G Bhat <shilpa.b...@linux.vnet.ibm.com>

Reviewed-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>

--
Thanks and Regards
gautham.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v8 6/6] cpufreq: powernv: Add sysfs attributes to show throttle stats

2016-02-03 Thread Gautham R Shenoy

Hi Viresh,

> 
> What I can suggest is:
> - Move this directory inside cpuX/cpufreq/ directory, in a similar way
>   as to how we create 'stats' directory today.
> - You can then get policy->cpu, to get chip->id out of it.
> - The only disadvantage here is that the same chip directory will be
>   replicated in multiple policies, but that makes it more readable.

Thinking about it, having a sysfs group attached to a policy kobject
looks ok if replication of the same chip information across multiple
policies is not objectionable.

Regarding the table-format, it breaks the sysfs's one-value-per-file
rule. So I would still prefer each throttle reason being a separate
file which gives the number of times the chip frequency was throttled
due to that reason. We can live without the per-frequency
throttle stats listed in the throttle_status.

So, would the following be sysfs group structure be acceptable?

$ls -1 /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/
unthrottle
powercap
overtemp
supply_fault
overcurrent
occ_reset
turbo_stat
sub_turbo_stat

--
Thanks and Regards
gautham.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v5 5/5] cpufreq: powernv: Add sysfs attributes to show throttle stats

2016-01-21 Thread Gautham R Shenoy

On Thu, Jan 21, 2016 at 03:08:59PM +0530, Shilpasri G Bhat wrote:
> Signed-off-by: Shilpasri G Bhat <shilpa.b...@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>

--
Thanks and Regards
gautham.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v6 2/5] cpufreq: powernv: Remove cpu_to_chip_id() from hot-path

2016-01-24 Thread Gautham R Shenoy

Hello Balbir,

On Sat, Jan 23, 2016 at 07:59:20PM +1100, Balbir Singh wrote:
> On Fri, 22 Jan 2016 12:49:02 +0530
> Shilpasri G Bhat <shilpa.b...@linux.vnet.ibm.com> wrote:
> 
> > cpu_to_chip_id() does a DT walk through to find out the chip id by
> > taking a contended device tree lock. This adds an unnecessary overhead
> > in a hot path. So instead of calling cpu_to_chip_id() everytime cache
> > the chip ids for all cores in the array 'core_to_chip_map' and use it
> > in the hotpath.
> > 
> > Reported-by: Anton Blanchard <an...@samba.org>
> > Signed-off-by: Shilpasri G Bhat <shilpa.b...@linux.vnet.ibm.com>
> > Reviewed-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
> 
> snip
> 
> Does the core_to_chip_map need to be updated/refreshed on/after/ a
> cpu (core) hotplug? I presume id's don't change

No, the id's don't change on cpu/core hotplug. The core_to_chip_map is
initialized in init_chip_info() where we use for_each_possible_cpu().

Thanks for the review!
> 
> Balbir
> 
--
Thanks and Regards
gautham.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 2/9] powerpc/kvm: make hypervisor state restore a function

2016-05-18 Thread Gautham R Shenoy

Hi Shreyas,

On Tue, May 03, 2016 at 01:54:31PM +0530, Shreyas B. Prabhu wrote:
> In the current code, when the thread wakes up in reset vector, some
> of the state restore code and check for whether a thread needs to
> branch to kvm is duplicated. Reorder the code such that this
> duplication is avoided.
> 
> At a higher level this is what the change looks like-

I have manually verified that the code flow in the new patch is has
the same effect as whatever we were doing earlier. There a couple of
comments inline.

> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> b/arch/powerpc/kernel/exceptions-64s.S
> index 7716ceb..7ebfbb0 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -107,25 +107,8 @@ BEGIN_FTR_SECTION
>   beq 9f
> 
>   cmpwi   cr3,r13,2
> + bl  power7_restore_hyp_resource
> 
> - /*
> -  * Check if last bit of HSPGR0 is set. This indicates whether we are
> -  * waking up from winkle.
> -  */
> - GET_PACA(r13)
> - clrldi  r5,r13,63
> - clrrdi  r13,r13,1
> - cmpwi   cr4,r5,1
> - mtspr   SPRN_HSPRG0,r13
> -
> - lbz r0,PACA_THREAD_IDLE_STATE(r13)
> - cmpwi   cr2,r0,PNV_THREAD_NAP
> - bgt cr2,8f  /* Either sleep or Winkle */
> -
> - /* Waking up from nap should not cause hypervisor state loss */
> - bgt cr3,.
> -
> - /* Waking up from nap */
>   li  r0,PNV_THREAD_RUNNING
>   stb r0,PACA_THREAD_IDLE_STATE(r13)  /* Clear thread state */
> 
> @@ -143,13 +126,9 @@ BEGIN_FTR_SECTION
> 
>   /* Return SRR1 from power7_nap() */
>   mfspr   r3,SPRN_SRR1
> - beq cr3,2f
> - b   power7_wakeup_noloss
> -2:   b   power7_wakeup_loss
> -
> - /* Fast Sleep wakeup on PowerNV */
> -8:   GET_PACA(r13)

In the old code, we do a GET_PACA(r13) before invoking the
power7_wakeup_tb_loss. In the new code we don't. Can you explain
this omission ?


[..snip..]

> @@ -420,33 +451,9 @@ common_exit:
> 
>  hypervisor_state_restored:
> 
> - li  r5,PNV_THREAD_RUNNING
> - stb r5,PACA_THREAD_IDLE_STATE(r13)
> -
>   mtspr   SPRN_SRR1,r16
> -#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> - li  r0,KVM_HWTHREAD_IN_KERNEL
> - stb r0,HSTATE_HWTHREAD_STATE(r13)
> - /* Order setting hwthread_state vs. testing hwthread_req */
> - sync
> - lbz r0,HSTATE_HWTHREAD_REQ(r13)
> - cmpwi   r0,0
> - beq 6f
> - b   kvm_start_guest
> -6:
> -#endif
> -
> - REST_NVGPRS(r1)
> - REST_GPR(2, r1)
> - ld  r3,_CCR(r1)
> - ld  r4,_MSR(r1)
> - ld  r5,_NIP(r1)
> - addir1,r1,INT_FRAME_SIZE
> - mtcrr3
> - mfspr   r3,SPRN_SRR1/* Return SRR1 */
> - mtspr   SPRN_SRR1,r4
> - mtspr   SPRN_SRR0,r5
> - rfid
> + mtlrr17
> + blr


Perhaps you could add a comment against this blr to indicate that we
go back to the reset vector right after the call to
power7_restore_hyp_resource. 

> 
>  fastsleep_workaround_at_exit:
>   li  r3,1
> -- 
> 2.4.11
> 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 6/9] powerpc/powernv: set power_save func after the idle states are initialized

2016-05-18 Thread Gautham R Shenoy

On Tue, May 03, 2016 at 01:54:35PM +0530, Shreyas B. Prabhu wrote:
> pnv_init_idle_states discovers supported idle states from the
> device tree and does the required initialization. Set power_save
> function pointer only after this initialization is done
> 
> Signed-off-by: Shreyas B. Prabhy <shre...@linux.vnet.ibm.com>

Reviewed-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/powernv/idle.c  | 3 +++
>  arch/powerpc/platforms/powernv/setup.c | 2 +-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/idle.c 
> b/arch/powerpc/platforms/powernv/idle.c
> index fcc8b68..fbb09fb 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -285,6 +285,9 @@ static int __init pnv_init_idle_states(void)
>   }
> 
>   pnv_alloc_idle_core_states();
> +
> + if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED)
> + ppc_md.power_save = power7_idle;
>  out_free:
>   kfree(flags);
>  out:
> diff --git a/arch/powerpc/platforms/powernv/setup.c 
> b/arch/powerpc/platforms/powernv/setup.c
> index 1acb0c7..c9685b6 100644
> --- a/arch/powerpc/platforms/powernv/setup.c
> +++ b/arch/powerpc/platforms/powernv/setup.c
> @@ -312,7 +312,7 @@ define_machine(powernv) {
>   .get_proc_freq  = pnv_get_proc_freq,
>   .progress   = pnv_progress,
>   .machine_shutdown   = pnv_shutdown,
> - .power_save = power7_idle,
> + .power_save = NULL,
>   .calibrate_decr = generic_calibrate_decr,
>  #ifdef CONFIG_KEXEC
>   .kexec_cpu_down = pnv_kexec_cpu_down,
> -- 
> 2.4.11
> 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

1 2 3 4 5 6 7 >

1 - 100 of 623 matches

Mail list logo