Re: [RFC Part1 PATCH v3 10/17] resource: Provide resource struct in resource walk callback

2017-07-31 Thread Kees Cook
On Mon, Jul 24, 2017 at 12:07 PM, Brijesh Singh  wrote:
> From: Tom Lendacky 
>
> In prep for a new function that will need additional resource information
> during the resource walk, update the resource walk callback to pass the
> resource structure.  Since the current callback start and end arguments
> are pulled from the resource structure, the callback functions can obtain
> them from the resource structure directly.
>
> Signed-off-by: Tom Lendacky 
> Signed-off-by: Brijesh Singh 

This is a nice clean up even without the refactoring need. :)

Reviewed-by: Kees Cook 

Thanks!

-Kees

-- 
Kees Cook
Pixel Security
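A minimal user-space sketch of the callback change described in the patch. It assumes a simplified `struct resource` with only `start`, `end` and `flags`. The walker hands each whole resource to the callback. Existing callbacks still derive the range from `res->start`/`res->end`, and a new callback can read other fields. `walk_resources()`, `sum_size()`, `count_special()` and `RES_FLAG_SPECIAL` are hypothetical names for illustration, not the kernel's walker or flag definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct resource (assumption). */
struct resource {
	uint64_t start;
	uint64_t end;            /* inclusive, as in the kernel */
	unsigned long flags;
};

/* New-style callback: receives the whole resource, not just start/end. */
typedef int (*res_walk_fn)(struct resource *res, void *arg);

/* Hypothetical walker: stops early if the callback returns non-zero. */
static int walk_resources(struct resource *tbl, size_t n,
			  res_walk_fn fn, void *arg)
{
	int ret = 0;

	for (size_t i = 0; i < n; i++) {
		ret = fn(&tbl[i], arg);
		if (ret)
			break;
	}
	return ret;
}

/* An "old" callback that only needed start/end reads them from res. */
static int sum_size(struct resource *res, void *arg)
{
	uint64_t *total = arg;

	*total += res->end - res->start + 1;
	return 0;
}

/* A callback that needs additional resource information (flags). */
#define RES_FLAG_SPECIAL 0x1UL   /* hypothetical flag */

static int count_special(struct resource *res, void *arg)
{
	int *count = arg;

	if (res->flags & RES_FLAG_SPECIAL)
		(*count)++;
	return 0;
}
```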


Re: [Jfs-discussion] [PATCH] fs: convert a pile of fsync routines to errseq_t based reporting

2017-07-31 Thread Dave Kleikamp
On 07/28/2017 09:23 AM, Jeff Layton wrote:
> From: Jeff Layton 
> 
> This patch converts most of the in-kernel filesystems that do writeback
> out of the pagecache to report errors using the errseq_t-based
> infrastructure that was recently added. This allows them to report
> errors once for each open file description.
> 
> Most filesystems have a fairly straightforward fsync operation. They
> call filemap_write_and_wait_range to write back all of the data and
> wait on it, and then (sometimes) sync out the metadata.
> 
> For those filesystems this is a straightforward conversion from calling
> filemap_write_and_wait_range in their fsync operation to calling
> file_write_and_wait_range.
> 
> Signed-off-by: Jeff Layton 

Acked-by: Dave Kleikamp 
(for jfs)
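To make "report errors once for each open file description" concrete, here is a simplified model. It is not the kernel's packed 32-bit `errseq_t` encoding. The mapping records its latest error plus a counter, and each open file keeps the counter value it last saw. As I understand it, `file_write_and_wait_range()` differs from `filemap_write_and_wait_range()` mainly in checking the writeback error against the file's own cursor (via `file_check_and_advance_wb_err()`). The hypothetical `file_check_and_advance()` below imitates that check. All type and function names in the block are made up for illustration.

```c
#include <errno.h>

/* Simplified model, not the kernel's errseq_t: the latest error plus a
 * counter that is bumped every time a new error is recorded. */
struct mapping_err {
	int err;            /* last error, e.g. -EIO */
	unsigned int seq;   /* bumped on every recorded error */
};

/* Per open file description: the counter value it last saw. */
struct file_cursor {
	unsigned int since;
};

static void mapping_set_error(struct mapping_err *m, int err)
{
	m->err = err;
	m->seq++;
}

/* Sample the counter at open time. (The real errseq_sample() also makes
 * sure a not-yet-seen error still gets reported; this model skips that.) */
static void file_open(struct file_cursor *f, const struct mapping_err *m)
{
	f->since = m->seq;
}

/* fsync-style check: report a new error once per file, then advance. */
static int file_check_and_advance(struct file_cursor *f,
				  const struct mapping_err *m)
{
	if (f->since == m->seq)
		return 0;
	f->since = m->seq;
	return m->err;
}
```

Each file descriptor advances independently, so two files opened before the error both see it exactly once.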

> ---
>  arch/powerpc/platforms/cell/spufs/file.c   | 2 +-
>  drivers/staging/lustre/lustre/llite/file.c | 2 +-
>  drivers/video/fbdev/core/fb_defio.c        | 2 +-
>  fs/9p/vfs_file.c                           | 4 ++--
>  fs/affs/file.c                             | 2 +-
>  fs/afs/write.c                             | 2 +-
>  fs/cifs/file.c                             | 4 ++--
>  fs/exofs/file.c                            | 2 +-
>  fs/f2fs/file.c                             | 2 +-
>  fs/hfs/inode.c                             | 2 +-
>  fs/hfsplus/inode.c                         | 2 +-
>  fs/hostfs/hostfs_kern.c                    | 2 +-
>  fs/hpfs/file.c                             | 2 +-
>  fs/jffs2/file.c                            | 2 +-
>  fs/jfs/file.c                              | 2 +-
>  fs/ncpfs/file.c                            | 2 +-
>  fs/ntfs/dir.c                              | 2 +-
>  fs/ntfs/file.c                             | 2 +-
>  fs/ocfs2/file.c                            | 2 +-
>  fs/reiserfs/dir.c                          | 2 +-
>  fs/reiserfs/file.c                         | 2 +-
>  fs/ubifs/file.c                            | 2 +-
>  22 files changed, 24 insertions(+), 24 deletions(-)
> 
> Rolling up all of these conversions into a single patch, as Christoph
> Hellwig suggested. Most of these are not tested, but the conversion
> here is fairly straightforward.
> 
> Any maintainers who object, please let me know and I'll yank that
> part out of this patch.
> 
> diff --git a/arch/powerpc/platforms/cell/spufs/file.c b/arch/powerpc/platforms/cell/spufs/file.c
> index ae2f740a82f1..5ffcdeb1eb17 100644
> --- a/arch/powerpc/platforms/cell/spufs/file.c
> +++ b/arch/powerpc/platforms/cell/spufs/file.c
> @@ -1749,7 +1749,7 @@ static int spufs_mfc_flush(struct file *file, fl_owner_t id)
>  static int spufs_mfc_fsync(struct file *file, loff_t start, loff_t end, int datasync)
>  {
>   struct inode *inode = file_inode(file);
> - int err = filemap_write_and_wait_range(inode->i_mapping, start, end);
> + int err = file_write_and_wait_range(file, start, end);
>   if (!err) {
>   inode_lock(inode);
>   err = spufs_mfc_flush(file, NULL);
> diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
> index ab1c85c1ed38..f7d07735ac66 100644
> --- a/drivers/staging/lustre/lustre/llite/file.c
> +++ b/drivers/staging/lustre/lustre/llite/file.c
> @@ -2364,7 +2364,7 @@ int ll_fsync(struct file *file, loff_t start, loff_t end, int datasync)
>  PFID(ll_inode2fid(inode)), inode);
>   ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_FSYNC, 1);
>  
> - rc = filemap_write_and_wait_range(inode->i_mapping, start, end);
> + rc = file_write_and_wait_range(file, start, end);
>   inode_lock(inode);
>  
>   /* catch async errors that were recorded back when async writeback
> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
> index 37f69c061210..487d5e336e1b 100644
> --- a/drivers/video/fbdev/core/fb_defio.c
> +++ b/drivers/video/fbdev/core/fb_defio.c
> @@ -69,7 +69,7 @@ int fb_deferred_io_fsync(struct file *file, loff_t start, loff_t end, int datasy
>  {
>   struct fb_info *info = file->private_data;
>   struct inode *inode = file_inode(file);
> - int err = filemap_write_and_wait_range(inode->i_mapping, start, end);
> + int err = file_write_and_wait_range(file, start, end);
>   if (err)
>   return err;
>  
> diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
> index 3de3b4a89d89..4802d75b3cf7 100644
> --- a/fs/9p/vfs_file.c
> +++ b/fs/9p/vfs_file.c
> @@ -445,7 +445,7 @@ static int v9fs_file_fsync(struct file *filp, loff_t start, loff_t end,
>   struct p9_wstat wstat;
>   int retval;
>  
> - retval = filemap_write_and_wait_range(inode->i_mapping, start, end);
> + retval = file_write_and_wait_range(filp, start, end);
>   if (retval)
>   return retval;
>  
> @@ -468,7 +468,7 @@ int v9fs_file_fsync_dotl(struct file *filp, loff_t start, loff_t end,
>   struct inode 

[RFC PATCH 3/3] powerpc/mm: Separate LMB information from device tree format

2017-07-31 Thread Nathan Fontenot
With the upcoming introduction of a new device tree property format
for memory (ibm,dynamic-memory-v2), we should separate the data for
the LMBs on a system from the device tree format used to represent
them. Without doing this, we face the task of having to update each
piece of the kernel that wants LMB information to know how
to parse all possible device tree formats.

This patch solves this by creating an array of LMB information at boot
time. This array will hold all relevant information presented in
the device tree for each LMB: base address, drc index, associativity
array index, and flags.

Any need to get LMB information can now use the array to get LMB
information directly without having to determine which version of
the device tree format is in use and know how to parse each one.
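Here is a user-space sketch of how a consumer might use the array. It reuses the `struct lmb`/`struct lmb_data` layout, `for_each_lmb()` and `lmb_get_max_memory()` from the lmb.h/lmb.c changes in this patch, with `<stdint.h>` types standing in for `u32`/`u64`. `count_lmbs_with_flag()` is a hypothetical consumer. Like `lmb_get_max_memory()`, the sketch assumes the array is non-empty and sorted by base address.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the proposed asm/lmb.h, with stdint types. */
struct lmb {
	uint64_t base_address;
	uint32_t drc_index;
	uint32_t aa_index;
	uint32_t flags;
};

struct lmb_data {
	struct lmb *lmbs;
	int num_lmbs;
	uint32_t lmb_size;
};

static struct lmb_data *lmb_array;

/* Same shape as the patch's for_each_lmb(): from the first entry up to
 * (not including) one past the last. */
#define for_each_lmb(_lmb)					\
	for (_lmb = &lmb_array->lmbs[0];			\
	     _lmb != &lmb_array->lmbs[lmb_array->num_lmbs];	\
	     _lmb++)

/* As in the patch: end of the last LMB (assumes num_lmbs > 0). */
static uint64_t lmb_get_max_memory(void)
{
	int last = lmb_array->num_lmbs - 1;

	return lmb_array->lmbs[last].base_address + lmb_array->lmb_size;
}

/* Hypothetical consumer: count LMBs with a given flag bit set, without
 * knowing which device tree format the data came from. */
static int count_lmbs_with_flag(uint32_t flag)
{
	struct lmb *lmb;
	int n = 0;

	for_each_lmb(lmb)
		if (lmb->flags & flag)
			n++;
	return n;
}
```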

Signed-off-by: Nathan Fontenot 
---
 arch/powerpc/include/asm/lmb.h |   44 ++
 arch/powerpc/mm/Makefile   |2 
 arch/powerpc/mm/lmb.c  |  146 
 arch/powerpc/mm/numa.c |  181 +++-
 4 files changed, 221 insertions(+), 152 deletions(-)
 create mode 100644 arch/powerpc/include/asm/lmb.h
 create mode 100644 arch/powerpc/mm/lmb.c

diff --git a/arch/powerpc/include/asm/lmb.h b/arch/powerpc/include/asm/lmb.h
new file mode 100644
index 000..7ff2fa6
--- /dev/null
+++ b/arch/powerpc/include/asm/lmb.h
@@ -0,0 +1,44 @@
+/*
+ * lmb.h: Power specific logical memory block representation
+ *
+ * ** Add (C) **
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _ASM_POWERPC_LMB_H
+#define _ASM_POWERPC_LMB_H
+
+extern struct lmb_data *lmb_array;
+extern int n_mem_addr_cells, n_mem_size_cells;
+
+struct lmb {
+   u64 base_address;
+   u32 drc_index;
+   u32 aa_index;
+   u32 flags;
+};
+
+struct lmb_data {
+   struct lmb  *lmbs;
+   int num_lmbs;
+   u32 lmb_size;
+};
+
+extern struct lmb_data *lmb_array;
+
+#define for_each_lmb(_lmb) \
+   for (_lmb = &lmb_array->lmbs[0]; \
+        _lmb != &lmb_array->lmbs[lmb_array->num_lmbs]; \
+        _lmb++)
+
+extern int lmb_init(void);
+extern u32 lmb_get_lmb_size(void);
+extern u64 lmb_get_max_memory(void);
+extern unsigned long read_n_cells(int n, const __be32 **buf);
+extern void get_n_mem_cells(int *n_addr_cells, int *n_size_cells);
+
+#endif
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 7414034..56c2591 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -28,7 +28,7 @@ obj-$(CONFIG_40x) += 40x_mmu.o
 obj-$(CONFIG_44x)  += 44x_mmu.o
 obj-$(CONFIG_PPC_8xx)  += 8xx_mmu.o
 obj-$(CONFIG_PPC_FSL_BOOK3E)   += fsl_booke_mmu.o
-obj-$(CONFIG_NEED_MULTIPLE_NODES) += numa.o
+obj-$(CONFIG_NEED_MULTIPLE_NODES) += numa.o lmb.o
 obj-$(CONFIG_PPC_SPLPAR)   += vphn.o
 obj-$(CONFIG_PPC_MM_SLICES)+= slice.o
 obj-y  += hugetlbpage.o
diff --git a/arch/powerpc/mm/lmb.c b/arch/powerpc/mm/lmb.c
new file mode 100644
index 000..e12e5be
--- /dev/null
+++ b/arch/powerpc/mm/lmb.c
@@ -0,0 +1,146 @@
+/*
+ * pSeries LMB support
+ *
+ * ** Add (C)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "lmb: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct lmb_data __lmb_data;
+struct lmb_data *lmb_array = &__lmb_data;
+int n_mem_addr_cells, n_mem_size_cells;
+
+unsigned long read_n_cells(int n, const __be32 **buf)
+{
+   unsigned long result = 0;
+
+   while (n--) {
+   result = (result << 32) | of_read_number(*buf, 1);
+   (*buf)++;
+   }
+   return result;
+}
+
+void __init get_n_mem_cells(int *n_addr_cells, int *n_size_cells)
+{
+   struct device_node *memory = NULL;
+
+   memory = of_find_node_by_type(memory, "memory");
+   if (!memory)
+   panic("numa.c: No memory nodes found!");
+
+   *n_addr_cells = of_n_addr_cells(memory);
+   *n_size_cells = of_n_size_cells(memory);
+   of_node_put(memory);
+}
+
+u32 lmb_get_lmb_size(void)
+{
+   return lmb_array->lmb_size;
+}
+
+u64 lmb_get_max_memory(void)
+{
+   u32 last_index = lmb_array->num_lmbs - 1;
+
+   return lmb_array->lmbs[last_index].base_address + lmb_array->lmb_size;
+}
+
+/*
+ * Retrieve and validate the ibm,dynamic-memory property of the device tree.
+ *
+ * The layout of the ibm,dynamic-memory property is a number N of memblock

[RFC PATCH 2/3] powerpc/numa: Get device node when retrieving usm memory

2017-07-31 Thread Nathan Fontenot
When we move to using the kernel lmb structs instead of accessing
the device tree directly for LMB information we will no longer have
a pointer to the device node for memory to pass to
of_get_usable_memory().

This patch updates of_get_usable_memory() to no longer take a
device node pointer and to do the lookup of the memory device node
itself.

Signed-off-by: Nathan Fontenot 
---
 arch/powerpc/mm/numa.c |   12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 9c21953..24d9299 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -184,11 +184,19 @@ static const __be32 *of_get_associativity(struct device_node *dev)
  * it exists (the property exists only in kexec/kdump kernels,
  * added by kexec-tools)
  */
-static const __be32 *of_get_usable_memory(struct device_node *memory)
+static const __be32 *of_get_usable_memory(void)
 {
+   struct device_node *memory;
const __be32 *prop;
u32 len;
+
+   memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+   if (!memory)
+   return NULL;
+
prop = of_get_property(memory, "linux,drconf-usable-memory", &len);
+   of_node_put(memory);
+
if (!prop || len < sizeof(unsigned int))
return NULL;
return prop;
@@ -674,7 +682,7 @@ static void __init parse_drconf_memory(struct device_node *memory)
return;
 
/* check if this is a kexec/kdump kernel */
-   usm = of_get_usable_memory(memory);
+   usm = of_get_usable_memory();
if (usm != NULL)
is_kexec_kdump = 1;
 



[RFC PATCH 1/3] powerpc/numa: Get device node when retrieving associativity arrays

2017-07-31 Thread Nathan Fontenot
When we move to using the kernel lmb structs instead of accessing
the device tree directly for LMB information we will no longer have
a pointer to the device node for memory to pass to of_get_assoc_arrays().

This patch updates of_get_assoc_arrays() to no longer take a device node
pointer and to do the lookup of the memory device node itself.
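The pattern these first two patches adopt is to look the node up inside the helper and drop the reference on every exit path. That includes the early error return, which now needs its own `of_node_put()`. Below is a toy sketch with a hand-rolled reference count. `struct toy_node`, `toy_find_node_by_path()` and `toy_node_put()` are hypothetical stand-ins for `struct device_node`, `of_find_node_by_path()` and `of_node_put()`, and the property is modelled as already-decoded host-order cells.

```c
#include <stddef.h>
#include <string.h>

/* Toy device node with a reference count (hypothetical). */
struct toy_node {
	const char *path;
	int refcount;
	const unsigned int *prop;   /* "ibm,associativity-lookup-arrays" */
	unsigned int prop_len;      /* in bytes */
};

/* Lookup takes a reference, like of_find_node_by_path(). */
static struct toy_node *toy_find_node_by_path(struct toy_node *nodes,
					      size_t n, const char *path)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(nodes[i].path, path) == 0) {
			nodes[i].refcount++;
			return &nodes[i];
		}
	}
	return NULL;
}

static void toy_node_put(struct toy_node *np)
{
	if (np)
		np->refcount--;
}

/* Same control flow as the patched of_get_assoc_arrays(). */
static int toy_get_assoc_arrays(struct toy_node *nodes, size_t n,
				unsigned int *n_arrays, unsigned int *array_sz)
{
	struct toy_node *memory;

	memory = toy_find_node_by_path(nodes, n,
				       "/ibm,dynamic-reconfiguration-memory");
	if (!memory)
		return -1;

	if (!memory->prop || memory->prop_len < 2 * sizeof(unsigned int)) {
		toy_node_put(memory);   /* the error path must put too */
		return -1;
	}

	*n_arrays = memory->prop[0];
	*array_sz = memory->prop[1];
	toy_node_put(memory);
	return 0;
}
```

In the usage below, the point to check is that `refcount` is back to zero after both the successful call and the too-short-property call.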

Signed-off-by: Nathan Fontenot 
---
 arch/powerpc/mm/numa.c |   18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 371792e..9c21953 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -466,19 +466,27 @@ struct assoc_arrays {
  * indicating the size of each associativity array, followed by a list
  * of N associativity arrays.
  */
-static int of_get_assoc_arrays(struct device_node *memory,
-  struct assoc_arrays *aa)
+static int of_get_assoc_arrays(struct assoc_arrays *aa)
 {
+   struct device_node *memory;
const __be32 *prop;
u32 len;
 
+   memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+   if (!memory)
+   return -1;
+
prop = of_get_property(memory, "ibm,associativity-lookup-arrays", &len);
-   if (!prop || len < 2 * sizeof(unsigned int))
+   if (!prop || len < 2 * sizeof(unsigned int)) {
+   of_node_put(memory);
return -1;
+   }
 
aa->n_arrays = of_read_number(prop++, 1);
aa->array_sz = of_read_number(prop++, 1);
 
+   of_node_put(memory);
+
/* Now that we know the number of arrays and size of each array,
 * revalidate the size of the property read in.
 */
@@ -661,7 +669,7 @@ static void __init parse_drconf_memory(struct device_node *memory)
if (!lmb_size)
return;
 
-   rc = of_get_assoc_arrays(memory, &aa);
+   rc = of_get_assoc_arrays(&aa);
if (rc)
return;
 
@@ -996,7 +1004,7 @@ static int hot_add_drconf_scn_to_nid(struct device_node *memory,
if (!lmb_size)
return -1;
 
-   rc = of_get_assoc_arrays(memory, &aa);
+   rc = of_get_assoc_arrays(&aa);
if (rc)
return -1;
 



[RFC PATCH 0/3] Separate LMB data from device tree format

2017-07-31 Thread Nathan Fontenot
With the upcoming introduction of a new device tree property format
for memory (ibm,dynamic-memory-v2), we should separate LMB data
from the device tree format used to represent them. Doing this
allows any consumer of LMB information, currently the mm/numa
and pseries/hotplug-memory code, to access it directly without
having to worry about device tree format and how to parse it.

This patch set attempts to solve this by creating an array of LMB
information at boot time which holds all relevant data presented in
the device tree for each LMB: base address, drc index, associativity
array index, and flags.
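Here is a sketch of building such an array from the existing property. It assumes the v1 `ibm,dynamic-memory` layout: a count cell, followed for each LMB by the base address in `n_addr_cells` big-endian cells, the drc index, a reserved cell, the associativity array index and the flags. `read_n_cells()` follows the helper in patch 3/3. `read_be32()` and `parse_dynamic_memory()` are hypothetical, and they work on raw bytes so the sketch runs in user space.

```c
#include <stddef.h>
#include <stdint.h>

struct lmb {
	uint64_t base_address;
	uint32_t drc_index;
	uint32_t aa_index;
	uint32_t flags;
};

/* Decode one big-endian 32-bit cell and advance the cursor. */
static uint32_t read_be32(const uint8_t **buf)
{
	const uint8_t *p = *buf;

	*buf += 4;
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Same idea as read_n_cells() in the patch: fold n cells into one value. */
static uint64_t read_n_cells(int n, const uint8_t **buf)
{
	uint64_t result = 0;

	while (n--)
		result = (result << 32) | read_be32(buf);
	return result;
}

/* Assumed v1 layout: <count> then, per LMB,
 * <addr (n_addr_cells)> <drc index> <reserved> <aa index> <flags>.
 * Returns the number of LMBs parsed, or -1 if lmbs[] is too small. */
static int parse_dynamic_memory(const uint8_t *prop, int n_addr_cells,
				struct lmb *lmbs, int max)
{
	int n = (int)read_be32(&prop);

	if (n > max)
		return -1;

	for (int i = 0; i < n; i++) {
		lmbs[i].base_address = read_n_cells(n_addr_cells, &prop);
		lmbs[i].drc_index = read_be32(&prop);
		(void)read_be32(&prop);            /* reserved cell */
		lmbs[i].aa_index = read_be32(&prop);
		lmbs[i].flags = read_be32(&prop);
	}
	return n;
}
```

A -v2 parser would fill the same `struct lmb` entries from a different encoding, which is the point of separating the array from the device tree format.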

The first two patches are small updates to two routines to have them
look up the memory device node instead of having it passed to them.
This is needed since the new code will not need to look at the
device tree when getting LMB information.

The third patch introduces the new LMB data array in lmb.h and the
set of routines needed to initialize and access the array. The data
is initialized from parse_numa_properties() and the routines in numa.c
are updated to use the new LMB array.

A few notes:

This code has only been boot-tested. I would like to get some feedback
on this approach before venturing too far down this design path.

I considered initializing the LMB array in prom.c to avoid having to
write a new routine there to parse the -v2 property. My concern is
allocating the array for this information that early in boot, since the
number of LMBs can get very large on systems with 16 or 32 TB of memory.

The code for memory DLPAR (pseries/hotplug-memory.c) still needs
to be updated to use the new LMB array.


Any thoughts, feedback, comments (good and bad) would be appreciated.

Thanks,
-Nathan
---

Nathan Fontenot (3):
  powerpc/numa: Get device node when retrieving associativity arrays
  powerpc/numa: Get device node when retrieving usm memory
  powerpc/mm: Separate LMB information from device tree format


 arch/powerpc/include/asm/lmb.h |   44 
 arch/powerpc/mm/Makefile   |2 
 arch/powerpc/mm/lmb.c  |  146 
 arch/powerpc/mm/numa.c |  211 ++--
 4 files changed, 244 insertions(+), 159 deletions(-)
 create mode 100644 arch/powerpc/include/asm/lmb.h
 create mode 100644 arch/powerpc/mm/lmb.c



Re: [PATCH v3] powerpc/powernv: Enable PCI peer-to-peer

2017-07-31 Thread Brian King
Michael,

What do we need on this one before we can pull into your -next branch?

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



[RFC PATCH] powerpc: Disabling MEMORY_HOTPLUG_DEFAULT_ONLINE option for PPC64 arch

2017-07-31 Thread Daniel Henrique Barboza
Commit 943db62c316c ("powerpc/pseries: Revert 'Auto-online
hotplugged memory'") reverted the auto-online feature for pseries due
to problems with LMB removals not updating the device struct properly.
Among other things, this commit made the following change in
arch/powerpc/configs/pseries_defconfig:

@@ -58,7 +58,6 @@ CONFIG_KEXEC_FILE=y
 CONFIG_IRQ_ALL_CPUS=y
 CONFIG_MEMORY_HOTPLUG=y
 CONFIG_MEMORY_HOTREMOVE=y
-CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
 CONFIG_KSM=y

The intent was to disable the option in the pseries defconfig, since
after that commit the code no longer supports it. However, this change
alone isn't enough to prevent situations such as [1], where distros
can enable the option unaware of the consequences of doing so (e.g.
breaking LMB hotplug altogether).

Instead of relying on all distros knowing that pseries can't handle
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y after 943db62c316c, this patch
changes mm/Kconfig to make the MEMORY_HOTPLUG_DEFAULT_ONLINE config
unavailable for the PPC64 arch.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1476380

Fixes: 943db62c316c ("powerpc/pseries: Revert 'Auto-online hotplugged memory'")
Signed-off-by: Daniel Henrique Barboza 
---
 mm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 48b1af4..a342c77 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -169,7 +169,7 @@ config MEMORY_HOTPLUG_SPARSE
 config MEMORY_HOTPLUG_DEFAULT_ONLINE
 bool "Online the newly added memory blocks by default"
 default n
-depends on MEMORY_HOTPLUG
+depends on MEMORY_HOTPLUG && !PPC64
 help
  This option sets the default policy setting for memory hotplug
  onlining policy (/sys/devices/system/memory/auto_online_blocks) which
-- 
2.9.4



Re: [RFC v6 13/62] powerpc: track allocation status of all pkeys

2017-07-31 Thread Thiago Jung Bauermann

Ram Pai  writes:
>  static inline int mm_pkey_free(struct mm_struct *mm, int pkey)
>  {
> - return -EINVAL;
> + if (!pkey_inited)
> + return -1;

Sorry, I missed this earlier but the pkey_free syscall will pass this
value to userspace so it needs to be an errno as well (-EINVAL?).
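Here is a sketch of what returning an errno on every failure path might look like, using a toy per-mm allocation bitmap. `struct toy_mm` and its field are assumptions. `pkey_inited`, `mm_pkey_is_allocated()` and `mm_set_pkey_free()` only mirror the names in the quoted patch. On Linux, a bare `-1` would reach userspace as EPERM (errno 1), which is another reason to prefer `-EINVAL` here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: which pkeys an mm has allocated, one bit per key. */
struct toy_mm {
	uint32_t pkey_allocation_map;
};

static bool pkey_inited;   /* set once pkey support is detected */

static bool mm_pkey_is_allocated(const struct toy_mm *mm, int pkey)
{
	return mm->pkey_allocation_map & (1u << pkey);
}

static void mm_set_pkey_free(struct toy_mm *mm, int pkey)
{
	mm->pkey_allocation_map &= ~(1u << pkey);
}

/* Every failure returns a negative errno, since the syscall hands the
 * value straight back to userspace. */
static int mm_pkey_free(struct toy_mm *mm, int pkey)
{
	if (!pkey_inited)
		return -EINVAL;   /* a bare -1 would look like -EPERM */

	/* 32 is the toy bitmap's width, checked before shifting. */
	if (pkey < 0 || pkey >= 32 || !mm_pkey_is_allocated(mm, pkey))
		return -EINVAL;

	mm_set_pkey_free(mm, pkey);
	return 0;
}
```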

> +
> + if (!mm_pkey_is_allocated(mm, pkey))
> + return -EINVAL;
> +
> + mm_set_pkey_free(mm, pkey);
> +
> + return 0;
>  }

-- 
Thiago Jung Bauermann
IBM Linux Technology Center



[PATCH 3/3] powerpc/xmon: Disable tracing on xmon by default

2017-07-31 Thread Breno Leitao
Currently tracing stays enabled inside xmon, which adds noise to the
tracing buffer and makes it harder to tell which entries come from
regular (non-xmon) kernel functions and which are xmon 'noise' (such
as printk()s and terminal function tracing).

This patch simply disables tracing by default, giving a cleaner trace
of the failing functions just before the kernel entered xmon.

Signed-off-by: Breno Leitao 
---
 arch/powerpc/xmon/xmon.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 19276d2f2f25..b614cc3a3a65 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -89,7 +89,7 @@ static unsigned long nidump = 16;
 static unsigned long ncsum = 4096;
 static int termch;
 static char tmpstr[128];
-static char tracing_enabled = 1;
+static char tracing_enabled = 0;
 
 static long bus_error_jmp[JMP_BUF_LEN];
 static int catch_memory_errors;
@@ -463,6 +463,7 @@ static int xmon_core(struct pt_regs *regs, int fromipi)
 
local_irq_save(flags);
hard_irq_disable();
+   tracing_off();
 
bp = in_breakpoint_table(regs->nip, &offset);
if (bp != NULL) {
-- 
2.13.2



[PATCH 2/3] powerpc/xmon: Disable and enable tracing command

2017-07-31 Thread Breno Leitao
If tracing is enabled when you enter xmon, the tracing buffer
continues to be updated, which can lose data through buffer overflow
and fill the buffer with unnecessary entries from xmon functions.

This patch adds a new xmon command, 'v', that disables tracing and
re-enables it from inside xmon.
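The interplay between the new 'v' command and the dump path can be modelled with a toy on/off flag. This assumes that `ftrace_dump()` leaves tracing off, as the patch's `tracing_on()` call after the dump suggests. `ring_buffer_on`, `toy_tracing_on()`, `toy_tracing_off()`, `cmd_toggle_tracing()` and `after_dump()` are hypothetical. Only `tracing_enabled` mirrors the patch.

```c
#include <stdbool.h>

/* Toy stand-ins for tracing_on(), tracing_off() and tracing_is_on(). */
static bool ring_buffer_on = true;
static void toy_tracing_on(void)  { ring_buffer_on = true; }
static void toy_tracing_off(void) { ring_buffer_on = false; }

/* Remembers the user's choice, as the patch's tracing_enabled does. */
static char tracing_enabled = 1;

/* The 'v' command: flip tracing and remember the new state. */
static void cmd_toggle_tracing(void)
{
	if (ring_buffer_on) {
		tracing_enabled = 0;
		toy_tracing_off();
	} else {
		tracing_enabled = 1;
		toy_tracing_on();
	}
}

/* After a 'dt' dump: the dump leaves tracing off, so turn it back on
 * only if the user has not disabled it with 'v'. */
static void after_dump(void)
{
	toy_tracing_off();          /* stands in for ftrace_dump() */
	if (tracing_enabled)
		toy_tracing_on();
}
```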

Signed-off-by: Breno Leitao 
---
 arch/powerpc/xmon/xmon.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 0cbd910193fa..19276d2f2f25 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -89,6 +89,7 @@ static unsigned long nidump = 16;
 static unsigned long ncsum = 4096;
 static int termch;
 static char tmpstr[128];
+static char tracing_enabled = 1;
 
 static long bus_error_jmp[JMP_BUF_LEN];
 static int catch_memory_errors;
@@ -268,6 +269,7 @@ Commands:\n\
   Sr # read SPR #\n\
   Sw #v write v to SPR #\n\
   tprint backtrace\n\
+  vtrace enable/disable\n\
   xexit monitor and recover\n\
   Xexit monitor and don't recover\n"
 #if defined(CONFIG_PPC64) && !defined(CONFIG_PPC_BOOK3E)
@@ -983,6 +985,17 @@ cmds(struct pt_regs *excp)
case 'x':
case 'X':
return cmd;
+   case 'v':
+   if (tracing_is_on()) {
+   printk("Disabling tracing\n");
+   tracing_enabled = 0;
+   tracing_off();
+   } else {
+   printk("Enabling tracing\n");
+   tracing_enabled = 1;
+   tracing_on();
+   }
+   break;
case EOF:
printf(" \n");
mdelay(2000);
@@ -2353,7 +2366,8 @@ static void dump_tracing(void)
else
ftrace_dump(DUMP_ALL);
 
-   tracing_on();
+   if (tracing_enabled)
+   tracing_on();
 }
 
 static void dump_all_pacas(void)
-- 
2.13.2



[PATCH 1/3] powerpc/xmon: Dump ftrace buffers for the current CPU

2017-07-31 Thread Breno Leitao
The current xmon 'dt' command dumps the tracing buffers for all CPUs,
which can make the log hard to read, since most powerpc machines have
many CPUs. In addition, lines from different CPUs are interleaved in
the ftrace log.

This patch adds a new option, 'dtc', that dumps only the ftrace buffer
for the current CPU.
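Here is a toy sketch of the suffix check that `dump_tracing()` performs after 'dt'. One more character is read, and 'c' selects `DUMP_ORIG` instead of `DUMP_ALL`. `DUMP_ORIG` dumps the buffer of the CPU that called `ftrace_dump()`, which in xmon is the current CPU. The enum mirrors the kernel's `ftrace_dump_mode` names. The input cursor and `toy_inchar()` are hypothetical stand-ins for xmon's `inchar()`.

```c
#include <stddef.h>

/* Mirrors the kernel's mode names: every CPU vs. the calling CPU. */
enum dump_mode { DUMP_ALL, DUMP_ORIG };

/* Hypothetical input cursor standing in for xmon's inchar(). */
struct input {
	const char *s;
	size_t pos;
};

static int toy_inchar(struct input *in)
{
	char c = in->s[in->pos];

	if (c == '\0')
		return -1;   /* EOF */
	in->pos++;
	return (unsigned char)c;
}

/* Called after "dt" has been consumed: look at one more character. */
static enum dump_mode parse_dt_suffix(struct input *in)
{
	return toy_inchar(in) == 'c' ? DUMP_ORIG : DUMP_ALL;
}
```

As in the patch, the character after 'dt' is consumed whether or not it is 'c'.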

Signed-off-by: Breno Leitao 
---
 arch/powerpc/xmon/xmon.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 08e367e3e8c3..0cbd910193fa 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -234,6 +234,7 @@ Commands:\n\
   "\
   dr   dump stream of raw bytes\n\
   dt   dump the tracing buffers (uses printk)\n\
+  dtc  dump the tracing buffers for current CPU (uses printk)\n\
 "
 #ifdef CONFIG_PPC_POWERNV
 "  dx#   dump xive on CPU #\n\
@@ -2342,6 +2343,19 @@ static void dump_one_paca(int cpu)
sync();
 }
 
+static void dump_tracing(void)
+{
+   int c;
+
+   c = inchar();
+   if (c == 'c')
+   ftrace_dump(DUMP_ORIG);
+   else
+   ftrace_dump(DUMP_ALL);
+
+   tracing_on();
+}
+
 static void dump_all_pacas(void)
 {
int cpu;
@@ -2507,6 +2521,11 @@ dump(void)
}
 #endif
 
+   if (c == 't') {
+   dump_tracing();
+   return;
+   }
+
if (c == '\n')
termch = c;
 
@@ -2525,9 +2544,6 @@ dump(void)
dump_log_buf();
} else if (c == 'o') {
dump_opal_msglog();
-   } else if (c == 't') {
-   ftrace_dump(DUMP_ALL);
-   tracing_on();
} else if (c == 'r') {
scanhex();
if (ndump == 0)
-- 
2.13.2



Re: [RFC v6 21/62] powerpc: introduce execute-only pkey

2017-07-31 Thread Thiago Jung Bauermann

Ram Pai  writes:

> On Fri, Jul 28, 2017 at 07:17:13PM -0300, Thiago Jung Bauermann wrote:
>> 
>> Ram Pai  writes:
>> > --- a/arch/powerpc/mm/pkeys.c
>> > +++ b/arch/powerpc/mm/pkeys.c
>> > @@ -97,3 +97,60 @@ int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
>> >init_iamr(pkey, new_iamr_bits);
>> >return 0;
>> >  }
>> > +
>> > +static inline bool pkey_allows_readwrite(int pkey)
>> > +{
>> > +  int pkey_shift = pkeyshift(pkey);
>> > +
>> > +  if (!(read_uamor() & (0x3UL << pkey_shift)))
>> > +  return true;
>> > +
>> > +  return !(read_amr() & ((AMR_RD_BIT|AMR_WR_BIT) << pkey_shift));
>> > +}
>> > +
>> > +int __execute_only_pkey(struct mm_struct *mm)
>> > +{
>> > +  bool need_to_set_mm_pkey = false;
>> > +  int execute_only_pkey = mm->context.execute_only_pkey;
>> > +  int ret;
>> > +
>> > +  /* Do we need to assign a pkey for mm's execute-only maps? */
>> > +  if (execute_only_pkey == -1) {
>> > +  /* Go allocate one to use, which might fail */
>> > +  execute_only_pkey = mm_pkey_alloc(mm);
>> > +  if (execute_only_pkey < 0)
>> > +  return -1;
>> > +  need_to_set_mm_pkey = true;
>> > +  }
>> > +
>> > +  /*
>> > +   * We do not want to go through the relatively costly
>> > +   * dance to set AMR if we do not need to.  Check it
>> > +   * first and assume that if the execute-only pkey is
>> > +   * readwrite-disabled then we do not have to set it
>> > +   * ourselves.
>> > +   */
>> > +  if (!need_to_set_mm_pkey &&
>> > +  !pkey_allows_readwrite(execute_only_pkey))
>   ^
>   Here uamor and amr are read once each.

You are right. What confused me was that the call to mm_pkey_alloc above
also reads uamor and amr (and also iamr, and writes to all of those) but
if that function is called, then need_to_set_mm_pkey is true and
pkey_allows_readwrite won't be called.

>> > +  return execute_only_pkey;
>> > +
>> > +  /*
>> > +   * Set up AMR so that it denies access for everything
>> > +   * other than execution.
>> > +   */
>> > +  ret = __arch_set_user_pkey_access(current, execute_only_pkey,
>> > +  (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE));
>   ^^^
>   here amr and iamr are written once each if
>   the function returns successfully.

__arch_set_user_pkey_access also reads uamor for the second time in its
call to is_pkey_enabled, and reads amr for the second time as well in
its calls to init_amr. The first reads are in either
pkey_allows_readwrite or pkey_status_change (called from
__arch_activate_pkey).

If need_to_set_mm_pkey is true, then the iamr read in init_iamr is the
2nd one during __execute_only_pkey's execution. In this case the writes
to amr and iamr will be the 2nd ones as well. The first reads and writes
are in pkey_status_change.

>> > +  /*
>> > +   * If the AMR-set operation failed somehow, just return
>> > +   * 0 and effectively disable execute-only support.
>> > +   */
>> > +  if (ret) {
>> > +  mm_set_pkey_free(mm, execute_only_pkey);
>   ^^^
>   here only if __arch_set_user_pkey_access() fails
>   amr and iamr and uamor will be written once each.

I assume the error case isn't performance sensitive and didn't account
for mm_set_pkey_free in my analysis.

>> > +  return -1;
>> > +  }
>> > +
>> > +  /* We got one, store it and use it from here on out */
>> > +  if (need_to_set_mm_pkey)
>> > +  mm->context.execute_only_pkey = execute_only_pkey;
>> > +  return execute_only_pkey;
>> > +}
>> 
>> If you follow the code flow in __execute_only_pkey, the AMR and UAMOR
>> are read 3 times in total, and AMR is written twice. IAMR is read and
>> written twice. Since they are SPRs and access to them is slow (or isn't
>> it?), is it worth it to read them once in __execute_only_pkey and pass
>> down their values to the callees, and then write them once at the end of
>> the function?
>
> If my calculations are right: 
>   uamor may be read once and may be written once.
>   amr may be read once and is written once.
>   iamr is written once.
> So not that bad, i think.

If I'm following the code correctly:
if need_to_set_mm_pkey = true:
    uamor is read twice and written once.
    amr is read twice and written twice.
    iamr is read twice and written twice.
if need_to_set_mm_pkey = false:
    uamor is read twice.
    amr is read once or twice (depending on the value of uamor) and written once.
    iamr is read once and written once.
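The "read once, pass the values down, write once" idea from the earlier mail could look roughly like this toy model, which counts accesses to fake SPRs. The bit layout is an assumption based on my understanding of the powerpc pkey code (`pkeyshift()`, `AMR_RD_BIT`, `AMR_WR_BIT`): two bits per key, key 0 in the most significant pair, read-disable as the low bit of the pair. The structs and helpers are hypothetical. IAMR handling, key allocation and error paths are left out.

```c
#include <stdint.h>

/* Toy SPR with access counters; the real AMR/UAMOR use mfspr/mtspr,
 * this only models how often they are touched. */
struct spr {
	uint64_t val;
	int reads, writes;
};

static uint64_t spr_read(struct spr *r)
{
	r->reads++;
	return r->val;
}

static void spr_write(struct spr *r, uint64_t v)
{
	r->writes++;
	r->val = v;
}

/* Assumed layout: two bits per key, key 0 in the top two bits. */
#define AMR_RD_BIT 0x1ULL
#define AMR_WR_BIT 0x2ULL
static int pkeyshift(int pkey) { return 64 - 2 * (pkey + 1); }

/* Local copies: each SPR read once, AMR written back at most once. */
struct pkey_regs {
	uint64_t amr, orig_amr, uamor;
};

static void regs_load(struct pkey_regs *c, struct spr *amr, struct spr *uamor)
{
	c->amr = c->orig_amr = spr_read(amr);
	c->uamor = spr_read(uamor);
}

/* Same test as pkey_allows_readwrite() in the patch, on cached values. */
static int cached_allows_readwrite(const struct pkey_regs *c, int pkey)
{
	int shift = pkeyshift(pkey);

	if (!(c->uamor & (0x3ULL << shift)))
		return 1;
	return !(c->amr & ((AMR_RD_BIT | AMR_WR_BIT) << shift));
}

static void regs_store_amr(const struct pkey_regs *c, struct spr *amr)
{
	if (c->amr != c->orig_amr)   /* skip the write if nothing changed */
		spr_write(amr, c->amr);
}

/* Simplified execute-only flow: deny read/write for pkey if needed. */
static void toy_execute_only(struct spr *amr, struct spr *uamor, int pkey)
{
	struct pkey_regs c;

	regs_load(&c, amr, uamor);
	if (cached_allows_readwrite(&c, pkey))
		c.amr |= (AMR_RD_BIT | AMR_WR_BIT) << pkeyshift(pkey);
	regs_store_amr(&c, amr);
}
```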

-- 
Thiago Jung Bauermann
IBM Linux Technology Center



Re: [RESEND PATCH v5 00/16] eeprom: at24: Add OF device ID table

2017-07-31 Thread Javier Martinez Canillas
Hello Wolfram,

On Mon, Jul 31, 2017 at 5:30 PM, Wolfram Sang  wrote:
>
>> Patches can be applied independently since the DTS changes without driver
>> changes are no-op and the OF table won't be used without the DTS changes.
>
> But there is a dependency, no? If I apply the driver patch,
> non-converted device trees will not find their eeproms anymore. So, I

I don't think that's correct. If you apply this patch before the DTS
changes, the driver will still match using the I2C device ID table
like it has been doing it until today.

IOW, this is what will happen:

1- an OF device is registered with the wrong compatible (not found in
the OF table)
2- the I2C core strips the vendor part and fills the struct i2c_client
.name with the device part.
3- i2c_device_match() will be called since a new device has been registered
4- i2c_of_match_device() will fail because there's no OF entry that
matches the device compatible
5- the I2C core falls back to i2c_match_id() and matches using the I2C
device ID table.

So no noticeable difference AFAICT in that case.
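The match order in the steps above can be sketched as a toy function. It first tries the OF table with the full compatible string. If that fails, it strips the vendor prefix and falls back to the I2C ID table. The tables and function names are hypothetical. The real logic lives in the I2C core's `i2c_device_match()`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical driver tables. */
static const char *const of_table[] = { "atmel,24c02", NULL };
static const char *const id_table[] = { "24c02", "24c32", NULL };

static int in_table(const char *const *tbl, const char *name)
{
	for (; *tbl; tbl++)
		if (strcmp(*tbl, name) == 0)
			return 1;
	return 0;
}

/* Step 2: the client name is the compatible minus its "vendor," part. */
static const char *strip_vendor(const char *compatible)
{
	const char *comma = strchr(compatible, ',');

	return comma ? comma + 1 : compatible;
}

/* Steps 4-5: OF table first, then the I2C ID table.
 * Returns 1 for an OF match, 2 for an I2C ID match, 0 for none. */
static int toy_i2c_device_match(const char *compatible)
{
	if (in_table(of_table, compatible))
		return 1;
	if (in_table(id_table, strip_vendor(compatible)))
		return 2;
	return 0;
}
```

A device tree that still uses an unconverted compatible string lands in the second branch, which is why applying the driver patch alone should not break it.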

Best regards,
Javier


Re: [RESEND PATCH v5 00/16] eeprom: at24: Add OF device ID table

2017-07-31 Thread Wolfram Sang

> Patches can be applied independently since the DTS changes without driver
> changes are no-op and the OF table won't be used without the DTS changes.

But there is a dependency, no? If I apply the driver patch,
non-converted device trees will not find their eeproms anymore. So, I
need to wait until all DTS patches are upstream, right? I can pick patch
1, though. We can already document it.





Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-31 Thread Jonathan Cameron
On Mon, 31 Jul 2017 08:04:11 -0700
"Paul E. McKenney"  wrote:

> On Mon, Jul 31, 2017 at 12:08:47PM +0100, Jonathan Cameron wrote:
> > On Fri, 28 Jul 2017 12:03:50 -0700
> > "Paul E. McKenney"  wrote:
> >   
> > > On Fri, Jul 28, 2017 at 06:27:05PM +0100, Jonathan Cameron wrote:  
> > > > On Fri, 28 Jul 2017 09:55:29 -0700
> > > > "Paul E. McKenney"  wrote:
> > > > 
> > > > > On Fri, Jul 28, 2017 at 02:24:03PM +0100, Jonathan Cameron wrote:
> > > > > > On Fri, 28 Jul 2017 08:44:11 +0100
> > > > > > Jonathan Cameron  wrote:  
> > > > > 
> > > > > [ . . . ]
> > > > > 
> > > > > > Ok.  Some info.  I disabled a few drivers (usb and SAS) in the interest of having
> > > > > > fewer timer events.  Issue became much easier to trigger (on some runs before
> > > > > > I could get tracing up and running)
> > > > > > 
> > > > > > So logs are large enough that pastebin doesn't like them - please shout if
> > > > > > another timer period is of interest.
> > > > > > 
> > > > > > https://pastebin.com/iUZDfQGM for the timer trace.
> > > > > > https://pastebin.com/3w1F7amH for dmesg.
> > > > > > 
> > > > > > The relevant timeout on the RCU stall detector was 8 seconds.  Event is
> > > > > > detected around 835.
> > > > > > 
> > > > > > It's a lot of logs, so I haven't identified a smoking gun yet but there
> > > > > > may well be one in there.
> > > > > 
> > > > > The dmesg says:
> > > > > 
> > > > > rcu_preempt kthread starved for 2508 jiffies! g112 c111 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
> > > > > 
> > > > > So I look for "rcu_preempt" timer events and find these:
> > > > > 
> > > > > rcu_preempt-9 [019]    827.579114: timer_init: timer=8017d5fc7da0
> > > > > rcu_preempt-9 [019] d..1   827.579115: timer_start: timer=8017d5fc7da0 function=process_timeout
> > > > > 
> > > > > Next look for "8017d5fc7da0" and I don't find anything else.
> > > > It does show up off the bottom of what would fit in pastebin...
> > > > 
> > > >  rcu_preempt-9 [001] d..1   837.681077: timer_cancel: timer=8017d5fc7da0
> > > >  rcu_preempt-9 [001]    837.681086: timer_init: timer=8017d5fc7da0
> > > >  rcu_preempt-9 [001] d..1   837.681087: timer_start: timer=8017d5fc7da0 function=process_timeout expires=4295101298 [timeout=1] cpu=1 idx=0 flags=
> > > 
> > > Odd.  I would expect an expiration...  And ten seconds is way longer
> > > than the requested one jiffy!
> > >   
> > > > > The timeout was one jiffy, and more than a second later, no expiration.
> > > > > Is it possible that this event was lost?  I am not seeing any sign of
> > > > > this in the trace.
> > > > > 
> > > > > I don't see any sign of CPU hotplug (and I test with lots of that in
> > > > > any case).
> > > > > 
> > > > > The last time we saw something like this it was a timer HW/driver 
> > > > > problem,
> > > > > but it is a bit hard to imagine such a problem affecting both ARM64
> > > > > and SPARC.  ;-)
> > > > Could be different issues, both of which were hidden by that lockup 
> > > > detector.
> > > > 
> > > > There is an errata work around for the timers on this particular board.
> > > > I'm only vaguely aware of it, so may be unconnected.
> > > > 
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/clocksource/arm_arch_timer.c?h=v4.13-rc2=bb42ca47401010fc02901b5e8f79e40a26f208cb
> > > > 
> > > > Seems unlikely though! + we've not yet seen it on the other chips that
> > > > errata effects (not that that means much).
> > > 
> > > If you can reproduce quickly, might be worth trying anyway...
> > > 
> > >   Thanx, Paul  
> > Errata fix is running already and was for all those tests.  
> 
> I was afraid of that...  ;-)
It's a pretty rare errata it seems.  Not actually managed to catch
one yet. 
> 
> > I'll have a dig into the timers today and see where I get to.  
> 
> Look forward to seeing what you find!
Nothing obvious is turning up, other than that we don't seem to have the
issue when we aren't running hrtimers.

On the plus side, I just got a report that it is affecting our d03
boards, which is good, as I couldn't tell what the difference could be
with respect to this issue!

It does indeed look like we are consistently missing a timer before
the RCU splat occurs.

J
> 
>   Thanx, Paul
> 
> > Jonathan  
> > >   
> > > > Jonathan
> > > > 
> > > > > 
> > > > > Thomas, any debugging suggestions?
> > > > > 
> > > > >   Thanx, Paul
> > > > > 
> > > > 
> > >   
> >   
> 
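The "ten seconds" figure in this thread can be reproduced from the trace timestamps. The timer was started at 827.579115 and cancelled, having never fired, at 837.681077. HZ is not stated anywhere in the thread, so the sketch below assumes HZ=250 purely for illustration. Under that assumption one jiffy is 4 ms, and the 2508-jiffy starvation reported in dmesg also works out to roughly ten seconds. The helper names are hypothetical.

```c
#include <assert.h>

/* Assumption: HZ is not given in the thread; 250 is used for illustration. */
#define ASSUMED_HZ 250UL

/* Convert a jiffies count to milliseconds at the given HZ. */
static unsigned long jiffies_as_ms(unsigned long j, unsigned long hz)
{
	return j * 1000UL / hz;
}

/* Convert a gap between two trace timestamps, expressed in
 * microseconds, into whole jiffies (rounded down). */
static unsigned long usec_as_jiffies(unsigned long long us, unsigned long hz)
{
	return (unsigned long)(us * hz / 1000000ULL);
}
```

With the trace timestamps, 837681077 - 827579115 = 10101962 us. That is about 2525 jiffies at the assumed HZ, against a requested timeout of one jiffy (4 ms).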



Re: [PATCH] i2c: Convert to using %pOF instead of full_name

2017-07-31 Thread Wolfram Sang
On Tue, Jul 18, 2017 at 04:43:06PM -0500, Rob Herring wrote:
> Now that we have a custom printf format specifier, convert users of
> full_name to use %pOF instead. This is preparation to remove storing
> of the full path string for each node.
> 
> Signed-off-by: Rob Herring 
> Cc: Haavard Skinnemoen 
> Cc: Wolfram Sang 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Maxime Ripard 
> Cc: Chen-Yu Tsai 
> Cc: Peter Rosin 
> Cc: linux-...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-arm-ker...@lists.infradead.org

Applied to for-next, thanks!





Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-31 Thread Paul E. McKenney
On Mon, Jul 31, 2017 at 12:08:47PM +0100, Jonathan Cameron wrote:
> On Fri, 28 Jul 2017 12:03:50 -0700
> "Paul E. McKenney"  wrote:
> 
> > On Fri, Jul 28, 2017 at 06:27:05PM +0100, Jonathan Cameron wrote:
> > > On Fri, 28 Jul 2017 09:55:29 -0700
> > > "Paul E. McKenney"  wrote:
> > >   
> > > > On Fri, Jul 28, 2017 at 02:24:03PM +0100, Jonathan Cameron wrote:  
> > > > > On Fri, 28 Jul 2017 08:44:11 +0100
> > > > > Jonathan Cameron  wrote:
> > > > 
> > > > [ . . . ]
> > > >   
> > > > > Ok.  Some info.  I disabled a few driver (usb and SAS) in the 
> > > > > interest of having
> > > > > fewer timer events.  Issue became much easier to trigger (on some 
> > > > > runs before
> > > > > I could get tracing up and running)
> > > > >
> > > > > So logs are large enough that pastebin doesn't like them - please
> > > > > shout if another timer period is of interest.
> > > > > 
> > > > > https://pastebin.com/iUZDfQGM for the timer trace.
> > > > > https://pastebin.com/3w1F7amH for dmesg.  
> > > > > 
> > > > > The relevant timeout on the RCU stall detector was 8 seconds.  Event 
> > > > > is
> > > > > detected around 835.
> > > > > 
> > > > > It's a lot of logs, so I haven't identified a smoking gun yet but 
> > > > > there
> > > > > may well be one in there.
> > > > 
> > > > The dmesg says:
> > > > 
> > > > rcu_preempt kthread starved for 2508 jiffies! g112 c111 f0x0 
> > > > RCU_GP_WAIT_FQS(3) ->state=0x1
> > > > 
> > > > So I look for "rcu_preempt" timer events and find these:
> > > > 
> > > > rcu_preempt-9 [019]    827.579114: timer_init: 
> > > > timer=8017d5fc7da0
> > > > rcu_preempt-9 [019] d..1   827.579115: timer_start: 
> > > > timer=8017d5fc7da0 function=process_timeout 
> > > > 
> > > > Next look for "8017d5fc7da0" and I don't find anything else.  
> > > It does show up off the bottom of what would fit in pastebin...
> > > 
> > >  rcu_preempt-9 [001] d..1   837.681077: timer_cancel: 
> > > timer=8017d5fc7da0
> > >  rcu_preempt-9 [001]    837.681086: timer_init: 
> > > timer=8017d5fc7da0
> > >  rcu_preempt-9 [001] d..1   837.681087: timer_start: 
> > > timer=8017d5fc7da0 function=process_timeout expires=4295101298 
> > > [timeout=1] cpu=1 idx=0 flags=  
> > 
> > Odd.  I would expect an expiration...  And ten seconds is way longer
> > than the requested one jiffy!
> > 
> > > > The timeout was one jiffy, and more than a second later, no expiration.
> > > > Is it possible that this event was lost?  I am not seeing any sign of
> > > > this is the trace.
> > > > 
> > > > I don't see any sign of CPU hotplug (and I test with lots of that in
> > > > any case).
> > > > 
> > > > The last time we saw something like this it was a timer HW/driver 
> > > > problem,
> > > > but it is a bit hard to imagine such a problem affecting both ARM64
> > > > and SPARC.  ;-)  
> > > Could be different issues, both of which were hidden by that lockup 
> > > detector.
> > > 
> > > There is an errata work around for the timers on this particular board.
> > > I'm only vaguely aware of it, so may be unconnected.
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/clocksource/arm_arch_timer.c?h=v4.13-rc2=bb42ca47401010fc02901b5e8f79e40a26f208cb
> > > 
> > > Seems unlikely though! + we've not yet seen it on the other chips that
> > > errata effects (not that that means much).  
> > 
> > If you can reproduce quickly, might be worth trying anyway...
> > 
> > Thanx, Paul
> Errata fix is running already and was for all those tests.

I was afraid of that...  ;-)

> I'll have a dig into the timers today and see where I get to.

Look forward to seeing what you find!

Thanx, Paul

> Jonathan
> > 
> > > Jonathan
> > >   
> > > > 
> > > > Thomas, any debugging suggestions?
> > > > 
> > > > Thanx, Paul
> > > >   
> > >   
> > 
> 



RE: [PATCH 1/5] Fix packed and aligned attribute warnings.

2017-07-31 Thread David Laight
From: SZ Lin
> Sent: 29 July 2017 08:24
...
> diff --git a/drivers/char/tpm/tpm_ibmvtpm.h b/drivers/char/tpm/tpm_ibmvtpm.h
> index 91dfe766d080..9f708ca3dc84 100644
> --- a/drivers/char/tpm/tpm_ibmvtpm.h
> +++ b/drivers/char/tpm/tpm_ibmvtpm.h
> @@ -25,7 +25,7 @@ struct ibmvtpm_crq {
>   __be16 len;
>   __be32 data;
>   __be64 reserved;
> -} __attribute__((packed, aligned(8)));
> +} __packed __aligned(8);

You can't need __packed and __aligned(8) on that structure.
There are no gaps and you are saying it is always aligned.

So just remove the pointless attributes.

David
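David's point can be checked at compile time. The sketch below makes two simplifications:

- The two fields above `len` are not shown in the quoted hunk, so it assumes they are single bytes.
- It uses fixed-width types in place of the kernel's `__be` types, which have the same sizes.

Under those assumptions, every member already sits at a multiple of its own size. The attributed variant therefore has exactly the same layout as the plain one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout modelled on the quoted struct; the two leading u8 fields are an
 * assumption, since the hunk does not show them. */
struct crq_plain {
	uint8_t  valid;
	uint8_t  msg;
	uint16_t len;
	uint32_t data;
	uint64_t reserved;
};

/* Same members with the attributes the patch keeps (GCC/Clang syntax). */
struct crq_attr {
	uint8_t  valid;
	uint8_t  msg;
	uint16_t len;
	uint32_t data;
	uint64_t reserved;
} __attribute__((packed, aligned(8)));

/* No padding anywhere: each member starts right after the previous one. */
static_assert(offsetof(struct crq_plain, len) == 2, "gap before len");
static_assert(offsetof(struct crq_plain, data) == 4, "gap before data");
static_assert(offsetof(struct crq_plain, reserved) == 8, "gap before reserved");
static_assert(sizeof(struct crq_plain) == sizeof(struct crq_attr),
	      "attributes change the size");
```

On 64-bit targets, where `uint64_t` is 8-byte aligned, the plain struct is already 8-byte aligned too.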



Re: [RFC v6 20/62] powerpc: store and restore the pkey state across context switches

2017-07-31 Thread Michael Ellerman
Ram Pai  writes:
> On Thu, Jul 27, 2017 at 02:32:59PM -0300, Thiago Jung Bauermann wrote:
>> Ram Pai  writes:
>> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
>> > index 2ad725e..9429361 100644
>> > --- a/arch/powerpc/kernel/process.c
>> > +++ b/arch/powerpc/kernel/process.c
>> > @@ -1096,6 +1096,11 @@ static inline void save_sprs(struct thread_struct 
>> > *t)
>> >t->tar = mfspr(SPRN_TAR);
>> >}
>> >  #endif
>> > +#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
>> > +  t->amr = mfspr(SPRN_AMR);
>> > +  t->iamr = mfspr(SPRN_IAMR);
>> > +  t->uamor = mfspr(SPRN_UAMOR);
>> > +#endif
>> >  }
>> >
>> >  static inline void restore_sprs(struct thread_struct *old_thread,
>> > @@ -1131,6 +1136,14 @@ static inline void restore_sprs(struct 
>> > thread_struct *old_thread,
>> >mtspr(SPRN_TAR, new_thread->tar);
>> >}
>> >  #endif
>> > +#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
>> > +  if (old_thread->amr != new_thread->amr)
>> > +  mtspr(SPRN_AMR, new_thread->amr);
>> > +  if (old_thread->iamr != new_thread->iamr)
>> > +  mtspr(SPRN_IAMR, new_thread->iamr);
>> > +  if (old_thread->uamor != new_thread->uamor)
>> > +  mtspr(SPRN_UAMOR, new_thread->uamor);
>> > +#endif
>> >  }
>> 
>> Shouldn't the saving and restoring of the SPRs be guarded by a check for
>> whether memory protection keys are enabled? What happens when trying to
>> access these registers on a CPU which doesn't have them?
>
> Good point. need to guard it.  However; i think, these registers have been
> available since power6.

The kernel runs on CPUs much older than that.

IAMR was added on Power8.

And performance is also an issue, so we should only switch them when we
need to.

cheers
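Below is a userspace model of the approach described above. It skips the key registers entirely on CPUs without the feature, and otherwise writes only the registers whose value actually changes. The feature flag, the struct and `mock_mtspr()` are hypothetical stand-ins for the kernel's CPU feature checks and `mtspr()`. A single flag is a simplification: per the reply above, IAMR only exists from Power8 onward, so real code would likely need finer-grained checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread copy of the key-related registers. */
struct pkey_regs {
	unsigned long amr, iamr, uamor;
};

/* Number of simulated register writes; stands in for the cost of mtspr(). */
static int spr_writes;

/* Hypothetical stand-in for mtspr(): store the value and count the write. */
static void mock_mtspr(unsigned long *spr, unsigned long val)
{
	*spr = val;
	spr_writes++;
}

/* Switch from prev to next: do nothing unless the CPU has the registers,
 * and write only those whose value differs between the two threads. */
static void restore_pkey_regs(bool cpu_has_pkeys, struct pkey_regs *hw,
			      const struct pkey_regs *prev,
			      const struct pkey_regs *next)
{
	if (!cpu_has_pkeys)
		return;
	if (prev->amr != next->amr)
		mock_mtspr(&hw->amr, next->amr);
	if (prev->iamr != next->iamr)
		mock_mtspr(&hw->iamr, next->iamr);
	if (prev->uamor != next->uamor)
		mock_mtspr(&hw->uamor, next->uamor);
}
```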


Re: [RFC v6 19/62] powerpc: ability to create execute-disabled pkeys

2017-07-31 Thread Michael Ellerman
Ram Pai  writes:

> On Thu, Jul 27, 2017 at 11:54:31AM -0300, Thiago Jung Bauermann wrote:
>> 
>> Ram Pai  writes:
>> 
>> > --- a/arch/powerpc/include/asm/pkeys.h
>> > +++ b/arch/powerpc/include/asm/pkeys.h
>> > @@ -2,6 +2,18 @@
>> >  #define _ASM_PPC64_PKEYS_H
>> >
>> >  extern bool pkey_inited;
>> > +/* override any generic PKEY Permission defines */
>> > +#undef  PKEY_DISABLE_ACCESS
>> > +#define PKEY_DISABLE_ACCESS0x1
>> > +#undef  PKEY_DISABLE_WRITE
>> > +#define PKEY_DISABLE_WRITE 0x2
>> > +#undef  PKEY_DISABLE_EXECUTE
>> > +#define PKEY_DISABLE_EXECUTE   0x4
>> > +#undef  PKEY_ACCESS_MASK
>> > +#define PKEY_ACCESS_MASK   (PKEY_DISABLE_ACCESS |\
>> > +  PKEY_DISABLE_WRITE  |\
>> > +  PKEY_DISABLE_EXECUTE)
>> > +
>> 
>> Is it ok to #undef macros from another header? Especially since said
>> header is in uapi (include/uapi/asm-generic/mman-common.h).
>> 
>> Also, it's unnecessary to undef the _ACCESS and _WRITE macros since they
>> are identical to the original definition. And since these macros are
>> originally defined in an uapi header, the powerpc-specific ones should
>> be in an uapi header as well, if I understand it correctly.
>
> The architectural neutral code allows the implementation to define the
> macros to its taste. powerpc headers due to legacy reason includes the
> include/uapi/asm-generic/mman-common.h header. That header includes the
> generic definitions of only PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE.
> Unfortunately we end up importing them. I dont want to depend on them.
> Any changes there could effect us. Example if the generic uapi header
> changed PKEY_DISABLE_ACCESS to 0x4, we will have a conflict with
> PKEY_DISABLE_EXECUTE.  Hence I undef them and define the it my way.

Don't do that.

The generic header can't change the values, it's an ABI.

Doing it this way risks the uapi value diverging from the value used in
the powerpc code (due to a change in the powerpc version), which would
mean userspace and the kernel wouldn't agree on what the values meant
... which would be exciting.

cheers
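Below is a sketch of the alternative: leave the generic ABI values alone, add only the powerpc-specific bit, and add a compile-time check that it cannot collide with the generic bits. The two generic definitions are reproduced here, with values taken from the quoted patch, only so that the sketch compiles on its own. In the kernel they come from include/uapi/asm-generic/mman-common.h and would not be redefined. Per Thiago's point, the new bit would also belong in a uapi header.

```c
#include <assert.h>

/* Stand-ins for the generic uapi definitions (values from the quoted
 * patch); real code would include the header instead. */
#define PKEY_DISABLE_ACCESS	0x1
#define PKEY_DISABLE_WRITE	0x2

/* Architecture addition: define only the new bit, never #undef an ABI value. */
#ifndef PKEY_DISABLE_EXECUTE
#define PKEY_DISABLE_EXECUTE	0x4
#endif

#define PKEY_ACCESS_MASK	(PKEY_DISABLE_ACCESS | \
				 PKEY_DISABLE_WRITE  | \
				 PKEY_DISABLE_EXECUTE)

/* Break the build, rather than silently disagree with userspace,
 * if the new bit ever overlaps a generic one. */
static_assert((PKEY_DISABLE_EXECUTE &
	       (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)) == 0,
	      "PKEY_DISABLE_EXECUTE overlaps a generic pkey bit");
```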


[PATCH v2 3/3] powerpc/strict_kernel_rwx: Don't depend on !RELOCATABLE

2017-07-31 Thread Balbir Singh
The concerns with extra permissions and overlap have been
addressed, so remove the dependency on !RELOCATABLE.

Signed-off-by: Balbir Singh 
---
 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 36f858c..b5b8ba8 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -165,7 +165,7 @@ config PPC
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
-   select ARCH_HAS_STRICT_KERNEL_RWX   if (PPC_BOOK3S_64 && 
!RELOCATABLE && !HIBERNATION)
+   select ARCH_HAS_STRICT_KERNEL_RWX   if (PPC_BOOK3S_64 && 
!HIBERNATION)
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
select HAVE_CBPF_JITif !PPC64
select HAVE_CONTEXT_TRACKINGif PPC64
-- 
2.9.4



[PATCH v2 1/3] powerpc/mm/radix: Fix relocatable radix mappings for STRICT_RWX

2017-07-31 Thread Balbir Singh
The kernel PTE mappings are now exact even when the kernel is
relocated. This patch refactors create_physical_mapping() and
mark_rodata_ro(). Most of the work in create_physical_mapping() is now
done by a helper called __create_physical_mapping(), which is defined
differently depending on whether CONFIG_STRICT_KERNEL_RWX is enabled.

The goal of the patchset is to keep changes minimal when
CONFIG_STRICT_KERNEL_RWX is disabled; when it is enabled, however, we
split the linear mapping so that permissions strictly match what the
user expects.

Signed-off-by: Balbir Singh 
---
 arch/powerpc/mm/pgtable-radix.c | 183 +---
 1 file changed, 151 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 671a45d..6e0176d 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -164,8 +164,14 @@ void radix__mark_rodata_ro(void)
end = (unsigned long)__init_begin;
 
radix__change_memory_range(start, end, _PAGE_WRITE);
+
+   start = (unsigned long)__start_interrupts - PHYSICAL_START;
+   end = (unsigned long)__end_interrupts - PHYSICAL_START;
+
+   radix__change_memory_range(start, end, _PAGE_WRITE);
 }
 
+
 void radix__mark_initmem_nx(void)
 {
unsigned long start = (unsigned long)__init_begin;
@@ -173,6 +179,7 @@ void radix__mark_initmem_nx(void)
 
radix__change_memory_range(start, end, _PAGE_EXEC);
 }
+
 #endif /* CONFIG_STRICT_KERNEL_RWX */
 
 static inline void __meminit print_mapping(unsigned long start,
@@ -185,31 +192,36 @@ static inline void __meminit print_mapping(unsigned long 
start,
pr_info("Mapped range 0x%lx - 0x%lx with 0x%lx\n", start, end, size);
 }
 
-static int __meminit create_physical_mapping(unsigned long start,
-unsigned long end)
+/*
+ * Create physical mapping and return the last mapping size
+ * If the call is successful, end_of_mapping will return the
+ * last address mapped via this call, if not, it will leave
+ * the value untouched.
+ */
+static int __meminit __create_physical_mapping(unsigned long vstart,
+   unsigned long vend, pgprot_t prot,
+   unsigned long *end_of_mapping)
 {
-   unsigned long vaddr, addr, mapping_size = 0;
-   pgprot_t prot;
-   unsigned long max_mapping_size;
-#ifdef CONFIG_STRICT_KERNEL_RWX
-   int split_text_mapping = 1;
-#else
-   int split_text_mapping = 0;
-#endif
+   unsigned long mapping_size = 0;
+   static unsigned long previous_size;
+   unsigned long addr, start, end;
 
+   start = __pa(vstart);
+   end = __pa(vend);
start = _ALIGN_UP(start, PAGE_SIZE);
+
+   pr_devel("physical_mapping start %lx->%lx, prot %lx\n",
+vstart, vend, pgprot_val(prot));
+
for (addr = start; addr < end; addr += mapping_size) {
-   unsigned long gap, previous_size;
+   unsigned long gap;
int rc;
 
gap = end - addr;
previous_size = mapping_size;
-   max_mapping_size = PUD_SIZE;
 
-retry:
if (IS_ALIGNED(addr, PUD_SIZE) && gap >= PUD_SIZE &&
-   mmu_psize_defs[MMU_PAGE_1G].shift &&
-   PUD_SIZE <= max_mapping_size)
+   mmu_psize_defs[MMU_PAGE_1G].shift)
mapping_size = PUD_SIZE;
else if (IS_ALIGNED(addr, PMD_SIZE) && gap >= PMD_SIZE &&
 mmu_psize_defs[MMU_PAGE_2M].shift)
@@ -217,40 +229,147 @@ static int __meminit create_physical_mapping(unsigned 
long start,
else
mapping_size = PAGE_SIZE;
 
-   if (split_text_mapping && (mapping_size == PUD_SIZE) &&
-   (addr <= __pa_symbol(__init_begin)) &&
-   (addr + mapping_size) >= __pa_symbol(_stext)) {
-   max_mapping_size = PMD_SIZE;
-   goto retry;
+   if (previous_size != mapping_size) {
+   print_mapping(start, addr, previous_size);
+   start = addr;
+   previous_size = mapping_size;
}
 
-   if (split_text_mapping && (mapping_size == PMD_SIZE) &&
-   (addr <= __pa_symbol(__init_begin)) &&
-   (addr + mapping_size) >= __pa_symbol(_stext))
-   mapping_size = PAGE_SIZE;
+   rc = radix__map_kernel_page((unsigned long)__va(addr), addr,
+   prot, mapping_size);
+   if (rc)
+   return rc;
+   }
 
-   if (mapping_size != previous_size) {
-   print_mapping(start, addr, previous_size);
-   start = addr;
+   
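
The size-selection loop in __create_physical_mapping() can be modelled in userspace to see how a range gets split. The sketch assumes a 64K base page with 2M PMD and 1G PUD sizes. It also leaves out the mmu_psize_defs availability checks, print_mapping() and the actual page-table updates. It only counts how many mappings of each size the walk would create.

```c
#include <assert.h>

/* Assumed sizes (64K pages, 2M PMD, 1G PUD); illustrative only. */
#define MAP_PAGE (64UL << 10)
#define MAP_PMD  (2UL << 20)
#define MAP_PUD  (1UL << 30)

/* True if x is a multiple of the power-of-two a. */
#define IS_MULTIPLE(x, a) (((x) & ((a) - 1)) == 0)

struct map_counts {
	unsigned long pud, pmd, page;
};

/* Walk [start, end) the way the patch's loop does: at each address pick
 * the largest size that is aligned there and still fits in the remaining
 * gap. start and end are assumed to be MAP_PAGE aligned. */
static struct map_counts count_mappings(unsigned long start, unsigned long end)
{
	struct map_counts c = { 0, 0, 0 };
	unsigned long addr, size;

	for (addr = start; addr < end; addr += size) {
		unsigned long gap = end - addr;

		if (IS_MULTIPLE(addr, MAP_PUD) && gap >= MAP_PUD) {
			size = MAP_PUD;
			c.pud++;
		} else if (IS_MULTIPLE(addr, MAP_PMD) && gap >= MAP_PMD) {
			size = MAP_PMD;
			c.pmd++;
		} else {
			size = MAP_PAGE;
			c.page++;
		}
	}
	return c;
}
```

For example, a range starting 64K past a 2M boundary falls back to 31 small pages before the walk can use a single 2M mapping.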

[PATCH v2 2/3] powerpc/mm/hash: WARN if relocation is enabled and CONFIG_STRICT_KERNEL_RWX

2017-07-31 Thread Balbir Singh
For radix we split the mapping into smaller page sizes (at the cost of
additional TLB overhead), but for hash it's best to print a warning. In
the case of hash and no relocation, the kernel should be well aligned
to provide the least overhead with the current linear mapping size (16M).

Signed-off-by: Balbir Singh 
---
 arch/powerpc/mm/pgtable-hash64.c | 28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-hash64.c b/arch/powerpc/mm/pgtable-hash64.c
index 443a2c6..656f7f3 100644
--- a/arch/powerpc/mm/pgtable-hash64.c
+++ b/arch/powerpc/mm/pgtable-hash64.c
@@ -434,8 +434,26 @@ static bool hash__change_memory_range(unsigned long start, 
unsigned long end,
shift = mmu_psize_defs[mmu_linear_psize].shift;
step = 1 << shift;
 
-   start = ALIGN_DOWN(start, step);
-   end = ALIGN(end, step); // aligns up
+   if (!IS_ALIGNED(PHYSICAL_START, step)) {
+   /*
+* For the relocatable case we might have
+* a case where _stext shares the page
+* with rw memory or __init_begin might
+* share the page with executable text.
+* This breaks strict RWX, but allows the
+* kernel to boot. If PHYSICAL_START is mmu_linear_psize
+* aligned, then we can continue to make the same
+* assumptions as the non-relocatable case.
+*
+* TODO: If we really care about the relocatable
+* case, we can align __init_begin/end better.
+*/
+   start = ALIGN(start, step);
+   end = ALIGN_DOWN(end, step);
+   } else {
+   start = ALIGN_DOWN(start, step);
+   end = ALIGN(end, step); /* Aligns up */
+   }
 
if (start >= end)
return false;
@@ -455,6 +473,12 @@ void hash__mark_rodata_ro(void)
 {
unsigned long start, end;
 
+   if (PHYSICAL_START > MEMORY_START)
+   pr_warn("Detected relocation and CONFIG_STRICT_KERNEL_RWX "
+   "permissions are best effort, some non-text area "
+   "might still be left as executable");
+
+
start = (unsigned long)_stext;
end = (unsigned long)__init_begin;
 
-- 
2.9.4
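The two alignment branches added to hash__change_memory_range() can be illustrated with plain integers. Some assumptions apply:

- ALIGN_UP_P2/ALIGN_DOWN_P2 are local helpers modelled on the kernel's ALIGN()/ALIGN_DOWN() for power-of-two steps.
- adjust_range() is a hypothetical name.
- The 16M step used in the example matches the linear mapping size mentioned in the commit message.

```c
#include <assert.h>

/* Power-of-two alignment helpers, modelled on ALIGN()/ALIGN_DOWN(). */
#define ALIGN_UP_P2(x, a)   (((x) + (a) - 1) & ~((a) - 1))
#define ALIGN_DOWN_P2(x, a) ((x) & ~((a) - 1))

/* Adjust [*start, *end) to whole steps and return 1 if anything is left.
 * If phys_start is not step-aligned, shrink the range inward, so a step
 * shared with memory outside the range keeps its old permissions;
 * otherwise widen it outward, as in the non-relocatable case. */
static int adjust_range(unsigned long phys_start, unsigned long step,
			unsigned long *start, unsigned long *end)
{
	if (phys_start & (step - 1)) {
		*start = ALIGN_UP_P2(*start, step);
		*end = ALIGN_DOWN_P2(*end, step);
	} else {
		*start = ALIGN_DOWN_P2(*start, step);
		*end = ALIGN_UP_P2(*end, step);
	}
	return *start < *end;
}
```

The inward branch is what makes the result "best effort": a boundary step shared with RW or executable memory is simply left untouched. The range can also vanish entirely.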



[PATCH v2 0/3] Have CONFIG_STRICT_KERNEL_RWX work with CONFIG_RELOCATABLE

2017-07-31 Thread Balbir Singh
These patches make CONFIG_STRICT_KERNEL_RWX work with CONFIG_RELOCATABLE.
The first patch splits up the radix linear mapping on relocation to
support granular read-only and execute bits. The second patch warns if
relocation is actually done (PHYSICAL_START > MEMORY_START), since in that
case we only provide best-effort support for the expected permissions. We
could make the linear mapping more granular, but we decided to leave that
as a TODO (to check for performance/MPSS/etc).

The last patch changes the config so that we are no longer dependent on
!RELOCATABLE for CONFIG_STRICT_KERNEL_RWX feature.

Changelog v2
- Rebase on top of the changes made in v4.13
- Move hash tables to IS_ALIGNED logic

Balbir Singh (3):
  powerpc/mm/radix: Fix relocatable radix mappings for STRICT_RWX
  powerpc/mm/hash: WARN if relocation is enabled and
CONFIG_STRICT_KERNEL_RWX
  powerpc/strict_kernel_rwx: Don't depend on !RELOCATABLE

 arch/powerpc/Kconfig |   2 +-
 arch/powerpc/mm/pgtable-hash64.c |  28 +-
 arch/powerpc/mm/pgtable-radix.c  | 183 ---
 3 files changed, 178 insertions(+), 35 deletions(-)

-- 
2.9.4



Re: blk_mq_sched_insert_request: inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage

2017-07-31 Thread Michael Ellerman
Brian King  writes:

> On 07/28/2017 10:17 AM, Brian J King wrote:
>> Jens Axboe  wrote on 07/28/2017 09:25:48 AM:
>> 
>>> Can you try the below fix? Should be more palatable than the previous
>>> one. Brian, maybe you can take a look at the IRQ issue mentioned above?
>
> Michael,
>
> Does this address the issue you are seeing?

Yes it seems to, thanks.

I only see the trace on reboot, and not 100% of the time. But I've
survived a couple of reboots now without seeing anything, so I think
this is helping.

I'll put the patch in my Jenkins over night and let you know how it
survives that, which should be ~= 25 boots.

cheers


Re: [PATCH 23/24] powerpc/mm: Cleanup check for stack expansion

2017-07-31 Thread Christophe LEROY



On 25/07/2017 at 13:19, Michael Ellerman wrote:
> LEROY Christophe  writes:
>
>> Michael Ellerman  wrote:
>>
>>> LEROY Christophe  writes:
>>>
>>>> Benjamin Herrenschmidt  wrote:
>>>>
>>>>> When hitting below a VM_GROWSDOWN vma (typically growing the stack),
>>>>> we check whether it's a valid stack-growing instruction and we
>>>>> check the distance to GPR1. This is largely open coded with lots
>>>>> of comments, so move it out to a helper.
>>>>
>>>> Did you have a look at the following patch ? It's been waiting for
>>>> application for some weeks now.
>>>> https://patchwork.ozlabs.org/patch/771869
>>>
>>> I actually merged it last merge window, but found I had no good way to
>>> test it, so I took it out again until I can write a test case for it.
>>>
>>> The way I realised it wasn't being tested was by removing all the
>>> store_updates_sp logic entirely and having my system run happily for
>>> several days :}
>>
>> Which demonstrates how unlikely this is, hence doing that get_user()
>> at every fault is waste of time.
>
> Yes I agree.
>
>> How do you plan to handle that in parralele to ben's serie ?
>
> Not sure :)
>
>> I'll be back from vacation next week and may help finding a way to
>> test that. (A test program using alloca() ?)
>
> I was thinking hand-crafted asm, but that might be a pain to get working
> for 32 & 64-bit, in which case alloca() might work.

No need for anything very sophisticated, indeed.
The following app does the trick. If I modify store_updates_sp() to
always return 0, the app gets a SIGSEGV.


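#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	/* ~1 MiB stack frame: the stack has to grow well beyond its
	 * current extent, which is what exercises store_updates_sp(). */
	char buf[1024 * 1025];

	sprintf(buf, "Hello world !\n");
	printf("%s", buf);	/* don't pass data as the format string */

	exit(0);
}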

Christophe



cheers



Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-31 Thread Jonathan Cameron
On Wed, 26 Jul 2017 16:15:05 -0700
"Paul E. McKenney"  wrote:

> On Wed, Jul 26, 2017 at 03:45:40PM -0700, David Miller wrote:
> > From: "Paul E. McKenney" 
> > Date: Wed, 26 Jul 2017 15:36:58 -0700
> >   
> > > And without CONFIG_SOFTLOCKUP_DETECTOR, I see five runs of 24 with RCU
> > > CPU stall warnings.  So it seems likely that CONFIG_SOFTLOCKUP_DETECTOR
> > > really is having an effect.  
> > 
> > Thanks for all of the info Paul, I'll digest this and scan over the
> > code myself.
> > 
> > Just out of curiousity, what x86 idle method is your machine using?
> > The mwait one or the one which simply uses 'halt'?  The mwait variant
> > might mask this bug, and halt would be a lot closer to how sparc64 and
> > Jonathan's system operates.  
> 
> My kernel builds with CONFIG_INTEL_IDLE=n, which I believe means that
> I am not using the mwait one.  Here is a grep for IDLE in my .config:
> 
>   CONFIG_NO_HZ_IDLE=y
>   CONFIG_GENERIC_SMP_IDLE_THREAD=y
>   # CONFIG_IDLE_PAGE_TRACKING is not set
>   CONFIG_ACPI_PROCESSOR_IDLE=y
>   CONFIG_CPU_IDLE=y
>   # CONFIG_CPU_IDLE_GOV_LADDER is not set
>   CONFIG_CPU_IDLE_GOV_MENU=y
>   # CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is not set
>   # CONFIG_INTEL_IDLE is not set
> 
> > On sparc64 the cpu yield we do in the idle loop sleeps the cpu.  It's
> > local TICK register keeps advancing, and the local timer therefore
> > will still trigger.  Also, any externally generated interrupts
> > (including cross calls) will wake up the cpu as well.
> > 
> > The tick-sched code is really tricky wrt. NO_HZ even in the NO_HZ_IDLE
> > case.  One of my running theories is that we miss scheduling a tick
> > due to a race.  That would be consistent with the behavior we see
> > in the RCU dumps, I think.  
> 
> But wouldn't you have to miss a -lot- of ticks to get an RCU CPU stall
> warning?  By default, your grace period needs to extend for more than
> 21 seconds (more than one-third of a -minute-) to get one.  Or do
> you mean that the ticks get shut off now and forever, as opposed to
> just losing one of them?
> 
> > Anyways, just a theory, and that's why I keep mentioning that commit
> > about the revert of the revert (specifically
> > 411fe24e6b7c283c3a1911450cdba6dd3aaea56e).
> > 
> > :-)  
> 
> I am running an overnight test in preparation for attempting to push
> some fixes for regressions into 4.12, but will try reverting this
> and enabling CONFIG_HZ_PERIODIC tomorrow.
> 
> Jonathan, might the commit that Dave points out above be what reduces
> the probability of occurrence as you test older releases?
I just got around to trying this out of curiosity. Superficially, it did
appear to make the issue harder to hit (it took over 30 minutes), but
otherwise the issue looks much the same with or without that patch.

Next on my list is to disable hrtimers entirely and see what happens.

Jonathan
> 
>   Thanx, Paul
> 



Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-31 Thread Jonathan Cameron
On Fri, 28 Jul 2017 12:03:50 -0700
"Paul E. McKenney"  wrote:

> On Fri, Jul 28, 2017 at 06:27:05PM +0100, Jonathan Cameron wrote:
> > On Fri, 28 Jul 2017 09:55:29 -0700
> > "Paul E. McKenney"  wrote:
> >   
> > > On Fri, Jul 28, 2017 at 02:24:03PM +0100, Jonathan Cameron wrote:  
> > > > On Fri, 28 Jul 2017 08:44:11 +0100
> > > > Jonathan Cameron  wrote:
> > > 
> > > [ . . . ]
> > >   
> > > > Ok.  Some info.  I disabled a few driver (usb and SAS) in the interest 
> > > > of having
> > > > fewer timer events.  Issue became much easier to trigger (on some runs 
> > > > before
> > > > I could get tracing up and running)
> > > >
> > > > So logs are large enough that pastebin doesn't like them - please
> > > > shout if another timer period is of interest.
> > > > 
> > > > https://pastebin.com/iUZDfQGM for the timer trace.
> > > > https://pastebin.com/3w1F7amH for dmesg.  
> > > > 
> > > > The relevant timeout on the RCU stall detector was 8 seconds.  Event is
> > > > detected around 835.
> > > > 
> > > > It's a lot of logs, so I haven't identified a smoking gun yet but there
> > > > may well be one in there.
> > > 
> > > The dmesg says:
> > > 
> > > rcu_preempt kthread starved for 2508 jiffies! g112 c111 f0x0 
> > > RCU_GP_WAIT_FQS(3) ->state=0x1
> > > 
> > > So I look for "rcu_preempt" timer events and find these:
> > > 
> > > rcu_preempt-9 [019]    827.579114: timer_init: 
> > > timer=8017d5fc7da0
> > > rcu_preempt-9 [019] d..1   827.579115: timer_start: 
> > > timer=8017d5fc7da0 function=process_timeout 
> > > 
> > > Next look for "8017d5fc7da0" and I don't find anything else.  
> > It does show up off the bottom of what would fit in pastebin...
> > 
> >  rcu_preempt-9 [001] d..1   837.681077: timer_cancel: 
> > timer=8017d5fc7da0
> >  rcu_preempt-9 [001]    837.681086: timer_init: 
> > timer=8017d5fc7da0
> >  rcu_preempt-9 [001] d..1   837.681087: timer_start: 
> > timer=8017d5fc7da0 function=process_timeout expires=4295101298 
> > [timeout=1] cpu=1 idx=0 flags=  
> 
> Odd.  I would expect an expiration...  And ten seconds is way longer
> than the requested one jiffy!
> 
> > > The timeout was one jiffy, and more than a second later, no expiration.
> > > Is it possible that this event was lost?  I am not seeing any sign of
> > > this is the trace.
> > > 
> > > I don't see any sign of CPU hotplug (and I test with lots of that in
> > > any case).
> > > 
> > > The last time we saw something like this it was a timer HW/driver problem,
> > > but it is a bit hard to imagine such a problem affecting both ARM64
> > > and SPARC.  ;-)  
> > Could be different issues, both of which were hidden by that lockup 
> > detector.
> > 
> > There is an errata work around for the timers on this particular board.
> > I'm only vaguely aware of it, so may be unconnected.
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/clocksource/arm_arch_timer.c?h=v4.13-rc2=bb42ca47401010fc02901b5e8f79e40a26f208cb
> > 
> > Seems unlikely though! + we've not yet seen it on the other chips that
> > errata effects (not that that means much).  
> 
> If you can reproduce quickly, might be worth trying anyway...
> 
>   Thanx, Paul
Errata fix is running already and was for all those tests.

I'll have a dig into the timers today and see where I get to.

Jonathan
> 
> > Jonathan
> >   
> > > 
> > > Thomas, any debugging suggestions?
> > > 
> > >   Thanx, Paul
> > >   
> >   
> 



RE: [PATCH v2] qe: fix compile issue for arm64

2017-07-31 Thread Michael Ellerman
Qiang Zhao  writes:

> Fri 7/28/2017 2:14 PM, Michael Ellerman  wrote:
>
>> -Original Message-
>> From: Michael Ellerman [mailto:m...@ellerman.id.au]
>> Sent: Friday, July 28, 2017 2:14 PM
>> To: Qiang Zhao ; o...@buserror.net
>> Cc: valentin.longch...@keymile.com; linuxppc-dev@lists.ozlabs.org; linux-
>> ker...@vger.kernel.org; Qiang Zhao 
>> Subject: Re: [PATCH v2] qe: fix compile issue for arm64
>> 
>> Zhao Qiang  writes:
>> 
>> > Signed-off-by: Zhao Qiang 
>> > ---
>> > Changes for v2:
>> >- include all Errata QE_General4 in #ifdef
>> >
>> >  drivers/soc/fsl/qe/qe.c | 2 ++
>> >  1 file changed, 2 insertions(+)
>> 
>> AFAICS this driver can only be built on PPC, what am I missing?
>> 
>> config QUICC_ENGINE
>> bool "Freescale QUICC Engine (QE) Support"
>> depends on FSL_SOC && PPC32
>> 
>> cheers
>
> I sent another patchset to support it on arm64.

Where? I don't see it.

Shouldn't this patch be part of that series? Otherwise when that series
is merged the build will break on arm64.

cheers


Re: [PATCH 2/5] Fix "ERROR: code indent should use tabs where possible"

2017-07-31 Thread Michael Ellerman
SZ Lin  writes:

> ERROR: code indent should use tabs where possible
> +^I^I "Need to wait for TPM to finish\n");$
>
> Signed-off-by: SZ Lin 
> ---
>  drivers/char/tpm/tpm_ibmvtpm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
> index f01d083eced2..23913fc86158 100644
> --- a/drivers/char/tpm/tpm_ibmvtpm.c
> +++ b/drivers/char/tpm/tpm_ibmvtpm.c
> @@ -127,7 +127,7 @@ static int tpm_ibmvtpm_send(struct tpm_chip *chip, u8 
> *buf, size_t count)
>  
>   if (ibmvtpm->tpm_processing_cmd) {
>   dev_info(ibmvtpm->dev,
> -  "Need to wait for TPM to finish\n");
> + "Need to wait for TPM to finish\n");

There's no reason for that to be on a separate line at all. Just make it
a single line dev_info( ... );

cheers


Re: [RFC PATCH] powerpc: improve accounting of non maskable interrupts

2017-07-31 Thread Michael Ellerman
Nicholas Piggin  writes:

> This fixes a case of double counting MCEs on PowerNV.
>
> Adds a counter for the system reset interrupt, which will
> see more use as a debugging NMI.
>
> Adds a soft-NMI counter for the 64s watchdog. Although this could cause
> confusion because it only fires when interrupts are soft-disabled, so it
> won't increment much even when the watchdog is running.
>
> Signed-off-by: Nicholas Piggin 
> ---
> I can split these out or drop any objectionable bits. At least the
> MCE we should fix, not sure if the other bits are wanted.

Yeah if you can split it please that would be good.

I'm not sure how useful it is to count the SOFT NMIs. I guess it's not
much overhead so we may as well, if nothing else it might be handy for
us debugging.

If you can make the ifdef look less horrendous that would be great :D -
perhaps a kconfig symbol that keys off both.

cheers


Re: [PATCH] mpc832x_rdb: fix of_irq_to_resource() error check

2017-07-31 Thread Michael Ellerman
Scott Wood  writes:

> On Sat, 2017-07-29 at 22:52 +0300, Sergei Shtylyov wrote:
>> of_irq_to_resource() has recently been fixed to return negative error codes
>> as well as 0 in case of failure; however, the Freescale MPC832x RDB board
>> code still only regards 0 as a failure indication -- fix it up.
>> 
>> Fixes: 7a4228bbff76 ("of: irq: use of_irq_get() in of_irq_to_resource()")
>> Signed-off-by: Sergei Shtylyov 
>> 
>> ---
>> The patch is against the 'master' branch of Scott Wood's 'linux.git' repo
>> (the 'fixes' branch is too far behind).
>
> The master branch is also old.  Those branches are only used when needed to
> apply patches; I don't update them just to sync up.  If they're older than
> what's in Michael's or Linus's tree (as they almost always are), then use
> those instead.
>
> Not that I expect it to make a difference to this patch...
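The difference between the old and fixed checks can be sketched in a few lines of portable C. `fake_irq_to_resource()` is a hypothetical stand-in that simply returns what it is given; it models only the return convention described in the patch (a positive IRQ number on success, 0 or a negative errno on failure).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for of_irq_to_resource(): positive IRQ on
 * success, 0 or a negative errno on failure. */
static int fake_irq_to_resource(int simulated_result)
{
	return simulated_result;
}

/* Old check: only 0 counts as failure, so negative errnos slip through. */
static bool irq_ok_old(int ret)
{
	return ret != 0;
}

/* Fixed check: anything <= 0 is a failure. */
static bool irq_ok_fixed(int ret)
{
	return ret > 0;
}
```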

Do you want me to grab this as a fix for 4.13 ?

cheers


[PATCH] powerpc/perf: Add PM_LD_MISS_L1 and PM_BR_2PATH to power9 event list

2017-07-31 Thread Madhavan Srinivasan
Add a couple more events (PM_LD_MISS_L1 and PM_BR_2PATH) to the
power9 event list and to the power9_event_alternatives array (these
events can be counted in more than one PMC).
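How an alternatives table is consulted can be sketched with a cut-down copy holding just the two new rows. This is a hypothetical linear lookup written for illustration, not the kernel's find_alternative(); `MAX_ALT` is assumed to be 2 here and the event codes are copied from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ALT 2	/* assumption for this sketch */

/* Event codes copied from the patch. */
#define PM_LD_MISS_L1      0x3e054
#define PM_LD_MISS_L1_ALT  0x400f0
#define PM_BR_2PATH        0x20036
#define PM_BR_2PATH_ALT    0x40036

/* Cut-down alternatives table containing only the two new rows. */
static const unsigned int alt_table[][MAX_ALT] = {
	{ PM_BR_2PATH,   PM_BR_2PATH_ALT },
	{ PM_LD_MISS_L1, PM_LD_MISS_L1_ALT },
};

/* Return the row index that contains @event, or -1 if it has no alternatives. */
static int lookup_alt_row(uint64_t event)
{
	size_t i, j;

	for (i = 0; i < sizeof(alt_table) / sizeof(alt_table[0]); i++)
		for (j = 0; j < MAX_ALT; j++)
			if (alt_table[i][j] == event)
				return (int)i;
	return -1;
}
```

Any event code in a row can then be scheduled on whichever PMC its sibling codes allow.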

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/power9-events-list.h | 8 
 arch/powerpc/perf/power9-pmu.c | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/arch/powerpc/perf/power9-events-list.h b/arch/powerpc/perf/power9-events-list.h
index 50689180a6c1..c5ed95d8f976 100644
--- a/arch/powerpc/perf/power9-events-list.h
+++ b/arch/powerpc/perf/power9-events-list.h
@@ -23,6 +23,9 @@ EVENT(PM_BR_MPRED_CMPL,   0x400f6)
 EVENT(PM_LD_REF_L1,0x100fc)
 /* Load Missed L1 */
 EVENT(PM_LD_MISS_L1_FIN,   0x2c04e)
+EVENT(PM_LD_MISS_L1,   0x3e054)
+/* Alternate event code for PM_LD_MISS_L1 */
+EVENT(PM_LD_MISS_L1_ALT,   0x400f0)
 /* Store Missed L1 */
 EVENT(PM_ST_MISS_L1,   0x300f0)
 /* L1 cache data prefetches */
@@ -62,3 +65,8 @@ EVENT(PM_INST_DISP,   0x200f2)
 EVENT(PM_INST_DISP_ALT,0x300f2)
 /* Alternate Branch event code */
 EVENT(PM_BR_CMPL_ALT,  0x10012)
+/* Branch events that are not strongly biased */
+EVENT(PM_BR_2PATH, 0x20036)
+/* Alternate branch events that are not strongly biased */
+EVENT(PM_BR_2PATH_ALT, 0x40036)
+
diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index 805bdfcb38ec..ed92159adf3e 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -109,6 +109,8 @@ static const unsigned int power9_event_alternatives[][MAX_ALT] = {
{ PM_INST_DISP, PM_INST_DISP_ALT },
{ PM_RUN_CYC_ALT,   PM_RUN_CYC },
{ PM_RUN_INST_CMPL_ALT, PM_RUN_INST_CMPL },
+   { PM_LD_MISS_L1,PM_LD_MISS_L1_ALT },
+   { PM_BR_2PATH,  PM_BR_2PATH_ALT },
 };
 
 static int power9_get_alternatives(u64 event, unsigned int flags, u64 alt[])
-- 
2.7.4



Re: [PATCH v2] powerpc/powernv: Use darn instr for random_seed on p9

2017-07-31 Thread Michael Ellerman
Hi Matt,

A few comments inline ...

Matt Brown  writes:
> Currently ppc_md.get_random_seed uses the powernv_get_random_long function.
> A guest calling this function would have to go through the hypervisor. The

This is not quite right. The powernv routine is only ever used on bare
metal. In a guest we use pseries_get_random_long(), which does go via
the hypervisor.

On both Power8 and Power9 there is a hardware RNG per chip. The
difference is that on Power8 we had to go and find it in the device tree
and read from it via MMIO. On Power9 that is all done transparently for
us by the DARN instruction.

> 'darn' instruction, introduced in POWER9, allows us to bypass this by
> directly obtaining a value from the mmio region.
>
> This patch adds a function for ppc_md.get_random_seed on p9,
> utilising the darn instruction.
>
> Signed-off-by: Matt Brown 
> ---
> v2:
>   - remove repeat darn attempts
>   - move hook to rng_init
> ---
>  arch/powerpc/include/asm/ppc-opcode.h |  4 
>  arch/powerpc/platforms/powernv/rng.c  | 22 ++
>  2 files changed, 26 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
> index c4ced1d..d5f7082 100644
> --- a/arch/powerpc/include/asm/ppc-opcode.h
> +++ b/arch/powerpc/include/asm/ppc-opcode.h
> @@ -134,6 +134,7 @@
>  #define PPC_INST_COPY0x7c00060c
>  #define PPC_INST_COPY_FIRST  0x7c20060c
>  #define PPC_INST_CP_ABORT0x7c00068c
> +#define PPC_INST_DARN0x7c0005e6

That looks right to me.

> @@ -325,6 +326,9 @@
>  
>  /* Deal with instructions that older assemblers aren't aware of */
>  #define  PPC_CP_ABORTstringify_in_c(.long PPC_INST_CP_ABORT)
> +#define PPC_DARN(t, l)   stringify_in_c(.long PPC_INST_DARN |  \
> + ___PPC_RT(t)   |  \
> + ___PPC_RA(l))

But this is not quite.

The macros are:

#define ___PPC_RA(a)(((a) & 0x1f) << 16)
#define ___PPC_RS(s)(((s) & 0x1f) << 21)
#define ___PPC_RT(t)___PPC_RS(t)

But the definition of darn is:

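 +---------+---------+---------+---------+---------+---------+----+
 |   31    |   RT    |    /    |    L    |    /    |   755   | /  |
 | 31 - 26 | 25 - 21 | 20 - 18 | 17 - 16 | 15 - 11 |  10 - 1 | 0  |
 +---------+---------+---------+---------+---------+---------+----+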

Using ___PPC_RT() gets you the right shift and mask, but because it's
the triple underscore version, it doesn't check that you pass a register
number to it. You should use __PPC_RT() instead.

And ___PPC_RA() is not quite right. The L field is only 2 bits wide, not
the 5 that ___PPC_RA() allows.

We don't have a __PPC_L() macro, because L fields vary in size and
location. So I think you're best off open coding it, e.g.:

+#define PPC_DARN(t, l) stringify_in_c(.long PPC_INST_DARN |  \
+   __PPC_RT(t)|  \
+   (((l) & 0x3) << 16))
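To make the field positions concrete, here is a small host-side sketch that assembles the darn instruction word from its fields, following the layout table above. It is not the kernel's `PPC_DARN()` macro (which emits a `.long` for the assembler); `encode_darn()` and its helpers are hypothetical names used only for illustration.

```c
#include <stdint.h>

#define PPC_INST_DARN	0x7c0005e6u	/* primary opcode 31, XO 755 */

/* RT: 5-bit register number in bits 25-21 (same shift as ___PPC_RT()). */
static uint32_t darn_rt(unsigned int rt)
{
	return (rt & 0x1fu) << 21;
}

/* L: only 2 bits wide, in bits 17-16 (the open-coded form above). */
static uint32_t darn_l(unsigned int l)
{
	return (l & 0x3u) << 16;
}

static uint32_t encode_darn(unsigned int rt, unsigned int l)
{
	return PPC_INST_DARN | darn_rt(rt) | darn_l(l);
}
```

With the 2-bit mask, an out-of-range L cannot spill into the reserved bits 20-18, which a 5-bit ___PPC_RA()-style mask would allow.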

> diff --git a/arch/powerpc/platforms/powernv/rng.c b/arch/powerpc/platforms/powernv/rng.c
> index 5dcbdea..ab6f411 100644
> --- a/arch/powerpc/platforms/powernv/rng.c
> +++ b/arch/powerpc/platforms/powernv/rng.c
> @@ -8,6 +8,7 @@
>   */
>  
>  #define pr_fmt(fmt)  "powernv-rng: " fmt
> +#define DARN_ERR 0xul

Usual place for constants is after all the includes, before the first
code or variables.

>  #include 
>  #include 
> @@ -16,6 +17,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -67,6 +69,21 @@ int powernv_get_random_real_mode(unsigned long *v)
>   return 1;
>  }
>  
> +int powernv_get_random_darn(unsigned long *v)
> +{
> + unsigned long val;
> +
> + /* Using DARN with L=1 - conditioned random number */

Actually L=1 is a 64-bit conditioned random number, vs L=0 for 32-bit.

> + asm (PPC_DARN(%0, 1)"\n" : "=r"(val) :);
> +
> + if (val == DARN_ERR)
> + return 0;
> +
> + *v = val;
> +
> + return 1;
> +}
> +
>  int powernv_get_random_long(unsigned long *v)
>  {
>   struct powernv_rng *rng;
> @@ -136,6 +153,7 @@ static __init int rng_create(struct device_node *dn)
>  static __init int rng_init(void)
>  {
>   struct device_node *dn;
> + unsigned long drn_test;

Buy yourself a vowel! ;)

>   int rc;
>  
>   for_each_compatible_node(dn, NULL, "ibm,power-rng") {
> @@ -150,6 +168,10 @@ static __init int rng_init(void)
>   of_platform_device_create(dn, NULL, NULL);
>   }
>  
> + if (cpu_has_feature(CPU_FTR_ARCH_300) &&
> + powernv_get_random_darn(_test))
> + ppc_md.get_random_seed = powernv_get_random_darn;

I know we took the loop out of powernv_get_random_darn(), but we might
want to put it back in this case. ie. we don't want to skip 
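One way to reinstate a bounded retry at init time is sketched below. It is a host-side illustration only: the raw read is injected through a hypothetical `darn_source_fn` callback so the logic can run without POWER9 hardware, and the retry counts in the test are arbitrary. The all-ones error value follows the ISA description of darn for its 64-bit forms.

```c
#include <stdbool.h>
#include <stdint.h>

/* All-ones is what darn returns when no random number is available. */
#define DARN_ERR_64	UINT64_MAX

/* Hypothetical source of raw darn results, so the logic is testable off-target. */
typedef uint64_t (*darn_source_fn)(void *ctx);

/* Try up to @max_tries reads; store the first non-error value in @out. */
static bool get_seed_with_retry(darn_source_fn src, void *ctx,
				unsigned int max_tries, uint64_t *out)
{
	for (unsigned int i = 0; i < max_tries; i++) {
		uint64_t val = src(ctx);

		if (val != DARN_ERR_64) {
			*out = val;
			return true;
		}
	}
	return false;
}

/* Test double: fail a fixed number of times, then return a constant. */
struct fake_darn {
	unsigned int failures_left;
};

static uint64_t fake_darn_read(void *ctx)
{
	struct fake_darn *f = ctx;

	if (f->failures_left > 0) {
		f->failures_left--;
		return DARN_ERR_64;
	}
	return 0x1234;
}
```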

Re: [PATCH] powerpc/asm/cacheflush: Cleanup cacheflush function params

2017-07-31 Thread Christophe LEROY



On 20/07/2017 at 08:28, Matt Brown wrote:

The cacheflush prototypes currently use start and stop values and each
call requires typecasting the address to an unsigned long.
This patch changes the cacheflush prototypes to follow the x86 style of
using base and size values, with base being a void pointer.

All callers of the cacheflush functions, including drivers, have been
modified to conform to the new prototypes.

The 64 bit cacheflush functions which were implemented in assembly code
(flush_dcache_range, flush_inval_dcache_range) have been translated into
C for readability and coherence.
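The (base, size) convention can be illustrated with a portable sketch of the loop a C flush routine would run: round the start down to a cache line, then visit each line up to `base + size`. The per-line cache operation (for example `dcbf`) is replaced with a counting callback, the 32-byte line size is an assumption, and `for_each_cache_line()` is a hypothetical name, not the patch's code.

```c
#include <stddef.h>
#include <stdint.h>

#define L1_CACHE_BYTES 32u	/* assumption: real value is platform-specific */

/* Stand-in for a per-line cache instruction such as dcbf. */
typedef void (*cache_line_op)(uintptr_t line_addr, void *ctx);

/* Visit every cache line touched by [base, base + size). */
static void for_each_cache_line(void *base, size_t size,
				cache_line_op op, void *ctx)
{
	uintptr_t start = (uintptr_t)base & ~(uintptr_t)(L1_CACHE_BYTES - 1);
	uintptr_t stop = (uintptr_t)base + size;

	for (uintptr_t addr = start; addr < stop; addr += L1_CACHE_BYTES)
		op(addr, ctx);
}

/* Example operation: count the lines visited. */
static void count_line(uintptr_t line_addr, void *ctx)
{
	(void)line_addr;
	(*(unsigned int *)ctx)++;
}
```

For example, a 0x40-byte range starting at 0x1010 touches the lines at 0x1000, 0x1020 and 0x1040.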

Signed-off-by: Matt Brown 
---
  arch/powerpc/include/asm/cacheflush.h| 47 +
  arch/powerpc/kernel/misc_64.S| 52 
  arch/powerpc/mm/dma-noncoherent.c| 15 
  arch/powerpc/platforms/512x/mpc512x_shared.c | 10 +++---
  arch/powerpc/platforms/85xx/smp.c|  6 ++--
  arch/powerpc/sysdev/dart_iommu.c |  5 +--
  drivers/ata/pata_bf54x.c |  3 +-
  drivers/char/agp/uninorth-agp.c  |  6 ++--
  drivers/gpu/drm/drm_cache.c  |  3 +-
  drivers/macintosh/smu.c  | 15 
  drivers/mmc/host/bfin_sdh.c  |  3 +-
  drivers/mtd/nand/bf5xx_nand.c|  6 ++--
  drivers/soc/fsl/qbman/dpaa_sys.h |  2 +-
  drivers/soc/fsl/qbman/qman_ccsr.c|  3 +-
  drivers/spi/spi-bfin5xx.c| 10 +++---
  drivers/tty/serial/mpsc.c| 46 
  drivers/usb/musb/blackfin.c  |  6 ++--
  17 files changed, 86 insertions(+), 152 deletions(-)



[...]


diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
index 3825284..5fd3171 100644
--- a/arch/powerpc/mm/dma-noncoherent.c
+++ b/arch/powerpc/mm/dma-noncoherent.c
@@ -204,9 +204,9 @@ __dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t
 * kernel direct-mapped region for device DMA.
 */
{
-   unsigned long kaddr = (unsigned long)page_address(page);
+   void *kaddr = page_address(page);
memset(page_address(page), 0, size);
-   flush_dcache_range(kaddr, kaddr + size);
+   flush_dcache_range(kaddr, size);
}
  
  	/*

@@ -316,9 +316,6 @@ EXPORT_SYMBOL(__dma_free_coherent);
   */
  void __dma_sync(void *vaddr, size_t size, int direction)
  {
-   unsigned long start = (unsigned long)vaddr;
-   unsigned long end   = start + size;
-
switch (direction) {
case DMA_NONE:
BUG();
@@ -328,15 +325,15 @@ void __dma_sync(void *vaddr, size_t size, int direction)
 * the potential for discarding uncommitted data from the cache
 */
if ((start | end) & (L1_CACHE_BYTES - 1))


How can the above compile when 'start' and 'end' are removed?
Shouldn't it be replaced by

if (((unsigned long)vaddr | size) & (L1_CACHE_BYTES - 1))
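A compilable form of the alignment test, as a sketch: C does not allow `|` on a `void *`, so the pointer has to be cast to an integer first (`uintptr_t` here, where kernel code would typically use `unsigned long`). Testing `vaddr | size` is equivalent to the old `start | end` check, because when `vaddr` is aligned, `vaddr + size` is aligned exactly when `size` is. The 32-byte line size and the helper name are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define L1_CACHE_BYTES 32u	/* assumption for illustration */

/* True when either end of [vaddr, vaddr + size) is not cache-line aligned,
 * i.e. when the partial lines must be flushed rather than invalidated. */
static bool range_is_unaligned(const void *vaddr, size_t size)
{
	return (((uintptr_t)vaddr | size) & (L1_CACHE_BYTES - 1)) != 0;
}
```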


-   flush_dcache_range(start, end);
+   flush_dcache_range(vaddr, size);
else
-   invalidate_dcache_range(start, end);
+   invalidate_dcache_range(vaddr, size);
break;
case DMA_TO_DEVICE: /* writeback only */
-   clean_dcache_range(start, end);
+   clean_dcache_range(vaddr, size);
break;
case DMA_BIDIRECTIONAL: /* writeback and invalidate */
-   flush_dcache_range(start, end);
+   flush_dcache_range(vaddr, size);
break;
}
  }


[...]

Christophe


Re: [RFC Part1 PATCH v3 10/17] resource: Provide resource struct in resource walk callback

2017-07-31 Thread Borislav Petkov
On Mon, Jul 24, 2017 at 02:07:50PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky 
> 
> In prep for a new function that will need additional resource information
> during the resource walk, update the resource walk callback to pass the
> resource structure.  Since the current callback start and end arguments
> are pulled from the resource structure, the callback functions can obtain
> them from the resource structure directly.
> 
> Signed-off-by: Tom Lendacky 
> Signed-off-by: Brijesh Singh 
> ---
>  arch/powerpc/kernel/machine_kexec_file_64.c | 12 +---
>  arch/x86/kernel/crash.c | 18 +-
>  arch/x86/kernel/pmem.c  |  2 +-
>  include/linux/ioport.h  |  4 ++--
>  include/linux/kexec.h   |  2 +-
>  kernel/kexec_file.c |  5 +++--
>  kernel/resource.c   |  9 +
>  7 files changed, 30 insertions(+), 22 deletions(-)
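The shape of the change can be sketched in plain C: the walker hands each callback a pointer to the whole resource, and a callback that only needs the bounds reads `start` and `end` from it. `struct res`, `walk_res()` and `sum_sizes()` are hypothetical, cut-down stand-ins for the kernel's `struct resource` and walk helpers; `end` is treated as inclusive, as in the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down resource record. */
struct res {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* New-style callback: receives the whole resource, not (start, end). */
typedef int (*res_walk_fn)(struct res *r, void *arg);

/* Walk an array of resources, stopping at the first non-zero return. */
static int walk_res(struct res *tbl, size_t n, void *arg, res_walk_fn func)
{
	int ret = 0;

	for (size_t i = 0; i < n && !ret; i++)
		ret = func(&tbl[i], arg);
	return ret;
}

/* Example callback: sum region sizes using the struct's own fields. */
static int sum_sizes(struct res *r, void *arg)
{
	*(uint64_t *)arg += r->end - r->start + 1;
	return 0;
}
```

A later callback that needs more than the bounds (flags, name) can read them from the same pointer without another signature change.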

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


[PATCH] powerpc/perf: Factor out PPMU_ONLY_COUNT_RUN check code from power8

2017-07-31 Thread Madhavan Srinivasan
There are some hardware events on Power systems which only
count when the processor is not idle, and there are some
fixed-function counters which count such events. For example,
the "run cycles" event counts cycles when the processor is
not idle. If the user asks to count cycles, we can use
"run cycles" if this is a per-task event, since the processor
is running when the task is running, by definition. We can't
use "run cycles" if the user asks for "cycles" on a system-wide
counter.

Currently on power8 this check is done using the PPMU_ONLY_COUNT_RUN
flag in the power8_get_alternatives() function. Based on the
flag, events are switched if needed. This logic should
also be enabled on power9, so factor out the code into
isa207_get_alternatives().
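The PPMU_ONLY_COUNT_RUN expansion can be exercised on its own with a standalone sketch that mirrors the switch in this patch: each cycles or instructions event gains its run-only twin as an extra alternative. The event codes are the raw values used in the patch; `add_run_alternatives()` is a hypothetical name, and the caller must size `alt[]` for up to twice `num_alt` entries.

```c
#include <stdint.h>

/* Raw event codes as used in the patch. */
#define PM_CYC            0x1e
#define PM_RUN_CYC        0x600f4
#define PM_INST_CMPL      0x2
#define PM_RUN_INST_CMPL  0x500fa

/* Append run-state equivalents; returns the new number of alternatives. */
static int add_run_alternatives(uint64_t alt[], int num_alt)
{
	int j = num_alt;

	for (int i = 0; i < num_alt; ++i) {
		switch (alt[i]) {
		case PM_CYC:
			alt[j++] = PM_RUN_CYC;
			break;
		case PM_RUN_CYC:
			alt[j++] = PM_CYC;
			break;
		case PM_INST_CMPL:
			alt[j++] = PM_RUN_INST_CMPL;
			break;
		case PM_RUN_INST_CMPL:
			alt[j++] = PM_INST_CMPL;
			break;
		}
	}
	return j;
}
```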

Fixes: efe881afdd999 ('powerpc/perf: Factor out event_alternative function')
Reported-by: Anton Blanchard 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/isa207-common.c | 29 +++--
 arch/powerpc/perf/isa207-common.h |  4 ++--
 arch/powerpc/perf/power8-pmu.c| 33 +
 arch/powerpc/perf/power9-pmu.c|  5 +++--
 4 files changed, 37 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index 3f3aa9a7063a..26c81aaa4c2d 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -488,8 +488,8 @@ static int find_alternative(u64 event, const unsigned int ev_alt[][MAX_ALT], int
return -1;
 }
 
-int isa207_get_alternatives(u64 event, u64 alt[],
-   const unsigned int ev_alt[][MAX_ALT], int size)
+int isa207_get_alternatives(u64 event, u64 alt[], int size, unsigned int flags,
+   const unsigned int ev_alt[][MAX_ALT])
 {
int i, j, num_alt = 0;
u64 alt_event;
@@ -505,5 +505,30 @@ int isa207_get_alternatives(u64 event, u64 alt[],
}
}
 
+   if (flags & PPMU_ONLY_COUNT_RUN) {
+   /*
+* We're only counting in RUN state, so PM_CYC is equivalent to
+* PM_RUN_CYC and PM_INST_CMPL === PM_RUN_INST_CMPL.
+*/
+   j = num_alt;
+   for (i = 0; i < num_alt; ++i) {
+   switch (alt[i]) {
+   case 0x1e:  /* PM_CYC */
+   alt[j++] = 0x600f4; /* PM_RUN_CYC */
+   break;
+   case 0x600f4:
+   alt[j++] = 0x1e;
+   break;
+   case 0x2:   /* PM_INST_CMPL */
+   alt[j++] = 0x500fa; /* PM_RUN_INST_CMPL */
+   break;
+   case 0x500fa:
+   alt[j++] = 0x2;
+   break;
+   }
+   }
+   num_alt = j;
+   }
+
return num_alt;
 }
diff --git a/arch/powerpc/perf/isa207-common.h b/arch/powerpc/perf/isa207-common.h
index 8acbe6e802c7..348cb6b9b911 100644
--- a/arch/powerpc/perf/isa207-common.h
+++ b/arch/powerpc/perf/isa207-common.h
@@ -287,8 +287,8 @@ int isa207_compute_mmcr(u64 event[], int n_ev,
unsigned int hwc[], unsigned long mmcr[],
struct perf_event *pevents[]);
 void isa207_disable_pmc(unsigned int pmc, unsigned long mmcr[]);
-int isa207_get_alternatives(u64 event, u64 alt[],
-   const unsigned int ev_alt[][MAX_ALT], int size);
+int isa207_get_alternatives(u64 event, u64 alt[], int size, unsigned int flags,
+   const unsigned int ev_alt[][MAX_ALT]);
 void isa207_get_mem_data_src(union perf_mem_data_src *dsrc, u32 flags,
struct pt_regs *regs);
 void isa207_get_mem_weight(u64 *weight);
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 5463516e369b..0bd27769e81e 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -50,34 +50,11 @@ static const unsigned int event_alternatives[][MAX_ALT] = {
 
 static int power8_get_alternatives(u64 event, unsigned int flags, u64 alt[])
 {
-   int i, j, num_alt = 0;
-
-   num_alt = isa207_get_alternatives(event, alt, event_alternatives,
-   (int)ARRAY_SIZE(event_alternatives));
-   if (flags & PPMU_ONLY_COUNT_RUN) {
-   /*
-* We're only counting in RUN state, so PM_CYC is equivalent to
-* PM_RUN_CYC and PM_INST_CMPL === PM_RUN_INST_CMPL.
-*/
-   j = num_alt;
-   for (i = 0; i < num_alt; ++i) {
-   switch (alt[i]) {
-   case PM_CYC:
-   

Re: [PATCH 1/2] KVM: PPC: e500: fix some NULL dereferences on error

2017-07-31 Thread Dan Carpenter
On Mon, Jul 31, 2017 at 04:03:40PM +1000, Paul Mackerras wrote:
> On Thu, Jul 13, 2017 at 10:38:29AM +0300, Dan Carpenter wrote:
> > There are some error paths in kvmppc_core_vcpu_create_e500() where we
> > forget to set the error code.  It means that we return ERR_PTR(0) which
> > is NULL and it results in a NULL pointer dereference in the caller.
> > 
> > Signed-off-by: Dan Carpenter 
> 
> Are these user-triggerable, and therefore needing to go into 4.13
> and be back-ported to the stable trees?  Or can they wait for 4.14?
> 

These are static checker fixes...  I imagine that they might be user
triggerable with quite a bit of work, but it's only a NULL dereference.
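Why a missing error code ends in a NULL dereference can be seen with a userspace re-implementation of the kernel's error-pointer helpers, modelled on `include/linux/err.h` (a sketch, not the kernel header itself): `ERR_PTR(0)` is NULL, and `IS_ERR()` does not treat it as an error, so a caller that checks only `IS_ERR()` goes on to use the NULL pointer.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace copies of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}
```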

regards,
dan carpenter



[RESEND][PATCH V10 0/3] powernv : Add support for OPAL-OCC command/response interface

2017-07-31 Thread Shilpasri G Bhat
On P9, the OCC (On-Chip-Controller) supports a shared-memory-based
command-response interface. Within the shared memory there is an OPAL
command buffer and OCC response buffer that can be used to send
inband commands to OCC. The following commands are supported:

1) Set system powercap
2) Set CPU-GPU power shifting ratio
3) Clear min/max for OCC sensor groups

Changes from V9:
- Fixed return after erroring from mutex_lock_interruptible()
- Added documentation
- [RESEND] Fixed the version number of the patch-set in Subject

Shilpasri G Bhat (3):
  powernv: powercap: Add support for powercap framework
  powernv: Add support to set power-shifting-ratio
  powernv: Add support to clear sensor groups data

 .../ABI/testing/sysfs-firmware-opal-powercap   |  31 +++
 Documentation/ABI/testing/sysfs-firmware-opal-psr  |  18 ++
 .../bindings/powerpc/opal/sensor-groups.txt|  23 ++
 arch/powerpc/include/asm/opal-api.h|   8 +-
 arch/powerpc/include/asm/opal.h|   9 +
 arch/powerpc/include/uapi/asm/opal-occ.h   |  23 ++
 arch/powerpc/platforms/powernv/Makefile|   2 +-
 arch/powerpc/platforms/powernv/opal-occ.c  | 116 ++
 arch/powerpc/platforms/powernv/opal-powercap.c | 244 +
 arch/powerpc/platforms/powernv/opal-psr.c  | 175 +++
 arch/powerpc/platforms/powernv/opal-wrappers.S |   5 +
 arch/powerpc/platforms/powernv/opal.c  |  10 +
 12 files changed, 662 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-opal-powercap
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-opal-psr
 create mode 100644 Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt
 create mode 100644 arch/powerpc/include/uapi/asm/opal-occ.h
 create mode 100644 arch/powerpc/platforms/powernv/opal-occ.c
 create mode 100644 arch/powerpc/platforms/powernv/opal-powercap.c
 create mode 100644 arch/powerpc/platforms/powernv/opal-psr.c

-- 
1.8.3.1



[RESEND][PATCH V10 3/3] powernv: Add support to clear sensor groups data

2017-07-31 Thread Shilpasri G Bhat
Adds support for clearing different sensor groups. OCC inband sensor
groups like CSM, Profiler, Job Scheduler can be cleared using this
driver. The min/max of all sensors belonging to these sensor groups
will be cleared.
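For userspace callers, the request number defined in this patch's uapi header can be reproduced and decoded with the standard `_IOC_*` macros from `<linux/ioctl.h>`. This is a sketch: it spells the size argument as `uint32_t`, a fixed-width type usable outside the kernel, and it only checks the encoding; it does not open the driver's device node, whose name is not shown here.

```c
#include <linux/ioctl.h>
#include <stdint.h>

/* Userspace spelling of the request number from the patch's uapi header. */
#define OPAL_OCC_IOCTL_CLEAR_SENSOR_GROUPS _IOR('o', 1, uint32_t)
```

The direction, type, number and size bits can then be read back with `_IOC_DIR()`, `_IOC_TYPE()`, `_IOC_NR()` and `_IOC_SIZE()`.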

Signed-off-by: Shilpasri G Bhat 
---
 .../bindings/powerpc/opal/sensor-groups.txt|  23 
 arch/powerpc/include/asm/opal-api.h|   3 +-
 arch/powerpc/include/asm/opal.h|   2 +
 arch/powerpc/include/uapi/asm/opal-occ.h   |  23 
 arch/powerpc/platforms/powernv/Makefile|   2 +-
 arch/powerpc/platforms/powernv/opal-occ.c  | 116 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/opal.c  |   3 +
 8 files changed, 171 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt
 create mode 100644 arch/powerpc/include/uapi/asm/opal-occ.h
 create mode 100644 arch/powerpc/platforms/powernv/opal-occ.c

diff --git a/Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt b/Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt
new file mode 100644
index 000..304b87c
--- /dev/null
+++ b/Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt
@@ -0,0 +1,23 @@
+IBM OPAL Sensor Groups Binding
+---
+
+Node: /ibm,opal/sensor-groups
+
+Description: Contains sensor groups available in the Powernv P9
+servers. Each child node indicates a sensor group.
+
+- compatible : Should be "ibm,opal-occ-sensor-group"
+
+Each child node contains the following properties:
+
+- type : String to indicate the type of sensor-group
+
+- sensor-group-id: Abstract unique identifier provided by firmware of
+  type  which is used for sensor-group
+  operations like clearing the min/max history of all
+  sensors belonging to the group.
+
+- ibm,chip-id : Chip ID
+
+- sensors : Phandle array of child nodes of /ibm,opal/sensor/
+   belonging to this group
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 92e31fd..0841659 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -195,7 +195,8 @@
 #define OPAL_SET_POWERCAP  153
 #define OPAL_GET_POWER_SHIFT_RATIO 154
 #define OPAL_SET_POWER_SHIFT_RATIO 155
-#define OPAL_LAST  155
+#define OPAL_SENSOR_GROUPS_CLEAR   156
+#define OPAL_LAST  156
 
 /* Device tree flags */
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index b9ea77f..a716def 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -271,6 +271,7 @@ int64_t opal_xive_set_vp_info(uint64_t vp,
 int opal_set_powercap(u32 handle, int token, u32 pcap);
 int opal_get_power_shift_ratio(u32 handle, int token, u32 *psr);
 int opal_set_power_shift_ratio(u32 handle, int token, u32 psr);
+int opal_sensor_groups_clear(u32 group_hndl, int token);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
@@ -351,6 +352,7 @@ static inline int opal_get_async_rc(struct opal_msg msg)
 
 void opal_powercap_init(void);
 void opal_psr_init(void);
+int opal_sensor_groups_clear_history(u32 handle);
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/powerpc/include/uapi/asm/opal-occ.h b/arch/powerpc/include/uapi/asm/opal-occ.h
new file mode 100644
index 000..97c45e2
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/opal-occ.h
@@ -0,0 +1,23 @@
+/*
+ * OPAL OCC command interface
+ * Supported on POWERNV platform
+ *
+ * (C) Copyright IBM 2017
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _UAPI_ASM_POWERPC_OPAL_OCC_H_
+#define _UAPI_ASM_POWERPC_OPAL_OCC_H_
+
+#define OPAL_OCC_IOCTL_CLEAR_SENSOR_GROUPS _IOR('o', 1, __u32)
+
+#endif /* _UAPI_ASM_POWERPC_OPAL_OCC_H */
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 9ed7d33..f193b33 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -2,7 +2,7 @@ obj-y   += setup.o opal-wrappers.o opal.o opal-async.o idle.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
 obj-y 

[RESEND][PATCH V10 2/3] powernv: Add support to set power-shifting-ratio

2017-07-31 Thread Shilpasri G Bhat
This patch adds support for setting the power-shifting-ratio, which hints
to the firmware how to distribute/throttle power between different entities
in a system (e.g. CPU vs. GPU). This ratio is used by the OCC in its
power-capping algorithm.
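A minimal userspace sketch for setting a ratio through the sysfs interface documented in this patch. It assumes the chip id appears as a decimal suffix in `cpu_to_gpu_X` and that the attribute accepts a plain decimal value in the documented 0-100 range; `format_psr()` and `write_psr()` are hypothetical helpers, not part of the patch.

```c
#include <stdio.h>

/* Format a ratio for the sysfs file; -1 if out of the documented 0-100
 * range or if @buf is too small. Returns the string length otherwise. */
static int format_psr(char *buf, size_t len, unsigned int ratio)
{
	int n;

	if (ratio > 100)
		return -1;
	n = snprintf(buf, len, "%u\n", ratio);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Write the ratio for one chip (file-name pattern is an assumption). */
static int write_psr(unsigned int chip_id, unsigned int ratio)
{
	char path[64], val[8];
	FILE *f;
	int ret;

	if (format_psr(val, sizeof(val), ratio) < 0)
		return -1;
	snprintf(path, sizeof(path),
		 "/sys/firmware/opal/psr/cpu_to_gpu_%u", chip_id);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ret = (fputs(val, f) == EOF) ? -1 : 0;
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}
```

Checking fclose() matters here because a sysfs store may only report its error when the buffered write is flushed.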

Signed-off-by: Shilpasri G Bhat 
---
 Documentation/ABI/testing/sysfs-firmware-opal-psr |  18 +++
 arch/powerpc/include/asm/opal-api.h   |   4 +-
 arch/powerpc/include/asm/opal.h   |   3 +
 arch/powerpc/platforms/powernv/Makefile   |   2 +-
 arch/powerpc/platforms/powernv/opal-psr.c | 175 ++
 arch/powerpc/platforms/powernv/opal-wrappers.S|   2 +
 arch/powerpc/platforms/powernv/opal.c |   3 +
 7 files changed, 205 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-opal-psr
 create mode 100644 arch/powerpc/platforms/powernv/opal-psr.c

diff --git a/Documentation/ABI/testing/sysfs-firmware-opal-psr b/Documentation/ABI/testing/sysfs-firmware-opal-psr
new file mode 100644
index 000..cc2ece7
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-opal-psr
@@ -0,0 +1,18 @@
+What:  /sys/firmware/opal/psr
+Date:  August 2017
+Contact:   Linux for PowerPC mailing list 
+Description:   Power-Shift-Ratio directory for Powernv P9 servers
+
+   Power-Shift-Ratio allows providing hints to the firmware
+   on how to shift/throttle power between different entities in
+   the system. Each attribute in this directory indicates
+   a settable PSR.
+
+What:  /sys/firmware/opal/psr/cpu_to_gpu_X
+Date:  August 2017
+Contact:   Linux for PowerPC mailing list 
+Description:   PSR sysfs attributes for Powernv P9 servers
+
+   Power-Shift-Ratio between CPU and GPU for a given chip
+   with chip-id X. This file gives the ratio (0-100)
+   which is used by OCC for power-capping.
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index c3e0c4a..92e31fd 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -193,7 +193,9 @@
 #define OPAL_NPU_MAP_LPAR  148
 #define OPAL_GET_POWERCAP  152
 #define OPAL_SET_POWERCAP  153
-#define OPAL_LAST  153
+#define OPAL_GET_POWER_SHIFT_RATIO 154
+#define OPAL_SET_POWER_SHIFT_RATIO 155
+#define OPAL_LAST  155
 
 /* Device tree flags */
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index ec2087c..b9ea77f 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -269,6 +269,8 @@ int64_t opal_xive_set_vp_info(uint64_t vp,
 int64_t opal_xive_dump(uint32_t type, uint32_t id);
 int opal_get_powercap(u32 handle, int token, u32 *pcap);
 int opal_set_powercap(u32 handle, int token, u32 pcap);
+int opal_get_power_shift_ratio(u32 handle, int token, u32 *psr);
+int opal_set_power_shift_ratio(u32 handle, int token, u32 psr);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
@@ -348,6 +350,7 @@ static inline int opal_get_async_rc(struct opal_msg msg)
 void opal_wake_poller(void);
 
 void opal_powercap_init(void);
+void opal_psr_init(void);
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index e79f806..9ed7d33 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -2,7 +2,7 @@ obj-y   += setup.o opal-wrappers.o opal.o opal-async.o idle.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
 obj-y  += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
-obj-y  += opal-kmsg.o opal-powercap.o
+obj-y  += opal-kmsg.o opal-powercap.o opal-psr.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_PCI)  += pci.o pci-ioda.o npu-dma.o
diff --git a/arch/powerpc/platforms/powernv/opal-psr.c b/arch/powerpc/platforms/powernv/opal-psr.c
new file mode 100644
index 000..7313b7f
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-psr.c
@@ -0,0 +1,175 @@
+/*
+ * PowerNV OPAL Power-Shift-Ratio interface
+ *
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "opal-psr: " fmt
+
+#include 
+#include 
+#include 
+
+#include 
+
+DEFINE_MUTEX(psr_mutex);
+
+static struct kobject 

[RESEND][PATCH V10 1/3] powernv: powercap: Add support for powercap framework

2017-07-31 Thread Shilpasri G Bhat
Adds a generic powercap framework to change the system powercap
inband through OPAL-OCC command/response interface.
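A userspace sketch of the check implied by the powercap ABI documentation: read the limits, and only write `powercap-current` when the request lies between `powercap-min` and `powercap-max`. Treating both bounds as inclusive is an assumption, as are the helper names `read_sysfs_uint()` and `powercap_request_valid()`; values are in watts per the ABI text.

```c
#include <stdbool.h>
#include <stdio.h>

/* Read a single unsigned decimal value from a sysfs file; -1 on error. */
static int read_sysfs_uint(const char *path, unsigned int *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%u", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* A request is acceptable if it lies within [min_w, max_w]. */
static bool powercap_request_valid(unsigned int min_w, unsigned int max_w,
				   unsigned int req_w)
{
	return min_w <= req_w && req_w <= max_w;
}
```

In use, a tool would read `powercap-min` and `powercap-max` from `/sys/firmware/opal/powercap/system-powercap/`, validate the request, and only then write it to `powercap-current`.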

Signed-off-by: Shilpasri G Bhat 
---
 .../ABI/testing/sysfs-firmware-opal-powercap   |  31 +++
 arch/powerpc/include/asm/opal-api.h|   5 +-
 arch/powerpc/include/asm/opal.h|   4 +
 arch/powerpc/platforms/powernv/Makefile|   2 +-
 arch/powerpc/platforms/powernv/opal-powercap.c | 244 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |   2 +
 arch/powerpc/platforms/powernv/opal.c  |   4 +
 7 files changed, 290 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-opal-powercap
 create mode 100644 arch/powerpc/platforms/powernv/opal-powercap.c

diff --git a/Documentation/ABI/testing/sysfs-firmware-opal-powercap b/Documentation/ABI/testing/sysfs-firmware-opal-powercap
new file mode 100644
index 000..c9b66ec
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-opal-powercap
@@ -0,0 +1,31 @@
+What:  /sys/firmware/opal/powercap
+Date:  August 2017
+Contact:   Linux for PowerPC mailing list 
+Description:   Powercap directory for Powernv (P8, P9) servers
+
+   Each folder in this directory contains a
+   power-cappable component.
+
+What:  /sys/firmware/opal/powercap/system-powercap
+   /sys/firmware/opal/powercap/system-powercap/powercap-min
+   /sys/firmware/opal/powercap/system-powercap/powercap-max
+   /sys/firmware/opal/powercap/system-powercap/powercap-current
+Date:  August 2017
+Contact:   Linux for PowerPC mailing list 
+Description:   System powercap directory and attributes applicable for
+   Powernv (P8, P9) servers
+
+   This directory provides powercap information. It
+   contains the following sysfs attributes:
+
+   - powercap-min : This file provides the minimum
+ possible powercap in watts
+
+   - powercap-max : This file provides the maximum
+ possible powercap in watts
+
+   - powercap-current : This file provides the current
+ powercap set on the system. Writing to this file
+ creates a request for setting a new powercap. The
+ powercap requested must be between powercap-min
+ and powercap-max.
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 3130a73..c3e0c4a 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -42,6 +42,7 @@
 #define OPAL_I2C_STOP_ERR  -24
 #define OPAL_XIVE_PROVISIONING -31
 #define OPAL_XIVE_FREE_ACTIVE  -32
+#define OPAL_TIMEOUT   -33
 
 /* API Tokens (in r0) */
 #define OPAL_INVALID_CALL -1
@@ -190,7 +191,9 @@
 #define OPAL_NPU_INIT_CONTEXT  146
 #define OPAL_NPU_DESTROY_CONTEXT   147
 #define OPAL_NPU_MAP_LPAR  148
-#define OPAL_LAST  148
+#define OPAL_GET_POWERCAP  152
+#define OPAL_SET_POWERCAP  153
+#define OPAL_LAST  153
 
 /* Device tree flags */
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 588fb1c..ec2087c 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -267,6 +267,8 @@ int64_t opal_xive_set_vp_info(uint64_t vp,
 int64_t opal_xive_free_irq(uint32_t girq);
 int64_t opal_xive_sync(uint32_t type, uint32_t id);
 int64_t opal_xive_dump(uint32_t type, uint32_t id);
+int opal_get_powercap(u32 handle, int token, u32 *pcap);
+int opal_set_powercap(u32 handle, int token, u32 pcap);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
@@ -345,6 +347,8 @@ static inline int opal_get_async_rc(struct opal_msg msg)
 
 void opal_wake_poller(void);
 
+void opal_powercap_init(void);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_OPAL_H */
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index b5d98cb..e79f806 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -2,7 +2,7 @@ obj-y   += setup.o opal-wrappers.o opal.o opal-async.o idle.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
 obj-y  += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
-obj-y  += opal-kmsg.o
+obj-y  += opal-kmsg.o opal-powercap.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_PCI)  += pci.o pci-ioda.o npu-dma.o
diff 

[PATCH V9 1/3] powernv: powercap: Add support for powercap framework

2017-07-31 Thread Shilpasri G Bhat
Adds a generic powercap framework to change the system powercap
inband through OPAL-OCC command/response interface.

Signed-off-by: Shilpasri G Bhat 
---
 .../ABI/testing/sysfs-firmware-opal-powercap   |  31 +++
 arch/powerpc/include/asm/opal-api.h|   5 +-
 arch/powerpc/include/asm/opal.h|   4 +
 arch/powerpc/platforms/powernv/Makefile|   2 +-
 arch/powerpc/platforms/powernv/opal-powercap.c | 244 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |   2 +
 arch/powerpc/platforms/powernv/opal.c  |   4 +
 7 files changed, 290 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-opal-powercap
 create mode 100644 arch/powerpc/platforms/powernv/opal-powercap.c

diff --git a/Documentation/ABI/testing/sysfs-firmware-opal-powercap b/Documentation/ABI/testing/sysfs-firmware-opal-powercap
new file mode 100644
index 000..c9b66ec
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-opal-powercap
@@ -0,0 +1,31 @@
+What:  /sys/firmware/opal/powercap
+Date:  August 2017
+Contact:   Linux for PowerPC mailing list 
+Description:   Powercap directory for Powernv (P8, P9) servers
+
+   Each folder in this directory contains a
+   power-cappable component.
+
+What:  /sys/firmware/opal/powercap/system-powercap
+   /sys/firmware/opal/powercap/system-powercap/powercap-min
+   /sys/firmware/opal/powercap/system-powercap/powercap-max
+   /sys/firmware/opal/powercap/system-powercap/powercap-current
+Date:  August 2017
+Contact:   Linux for PowerPC mailing list 
+Description:   System powercap directory and attributes applicable for
+   Powernv (P8, P9) servers
+
+   This directory provides powercap information. It
+   contains the following sysfs attributes:
+
+   - powercap-min : This file provides the minimum
+ possible powercap in watts
+
+   - powercap-max : This file provides the maximum
+ possible powercap in watts
+
+   - powercap-current : This file provides the current
+ powercap set on the system. Writing to this file
+ creates a request for setting a new powercap. The
+ powercap requested must be between powercap-min
+ and powercap-max.
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 3130a73..c3e0c4a 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -42,6 +42,7 @@
 #define OPAL_I2C_STOP_ERR  -24
 #define OPAL_XIVE_PROVISIONING -31
 #define OPAL_XIVE_FREE_ACTIVE  -32
+#define OPAL_TIMEOUT   -33
 
 /* API Tokens (in r0) */
 #define OPAL_INVALID_CALL -1
@@ -190,7 +191,9 @@
 #define OPAL_NPU_INIT_CONTEXT  146
 #define OPAL_NPU_DESTROY_CONTEXT   147
 #define OPAL_NPU_MAP_LPAR  148
-#define OPAL_LAST  148
+#define OPAL_GET_POWERCAP  152
+#define OPAL_SET_POWERCAP  153
+#define OPAL_LAST  153
 
 /* Device tree flags */
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 588fb1c..ec2087c 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -267,6 +267,8 @@ int64_t opal_xive_set_vp_info(uint64_t vp,
 int64_t opal_xive_free_irq(uint32_t girq);
 int64_t opal_xive_sync(uint32_t type, uint32_t id);
 int64_t opal_xive_dump(uint32_t type, uint32_t id);
+int opal_get_powercap(u32 handle, int token, u32 *pcap);
+int opal_set_powercap(u32 handle, int token, u32 pcap);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
@@ -345,6 +347,8 @@ static inline int opal_get_async_rc(struct opal_msg msg)
 
 void opal_wake_poller(void);
 
+void opal_powercap_init(void);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_OPAL_H */
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index b5d98cb..e79f806 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -2,7 +2,7 @@ obj-y   += setup.o opal-wrappers.o opal.o opal-async.o idle.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
 obj-y  += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
-obj-y  += opal-kmsg.o
+obj-y  += opal-kmsg.o opal-powercap.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_PCI)  += pci.o pci-ioda.o npu-dma.o
diff 

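For illustration only, a minimal user-space sketch of the powercap ABI documented in this patch: read powercap-min and powercap-max, reject out-of-range requests, then write the new value to powercap-current. The helper names are hypothetical, the directory is passed in (on a real system it would be /sys/firmware/opal/powercap/system-powercap), and it assumes each attribute holds a single decimal value in watts.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Read one decimal value (watts) from <dir>/<attr>; 0 or negative errno. */
static int read_watts(const char *dir, const char *attr, unsigned int *out)
{
	char path[512], buf[32], *end;
	unsigned long v;
	FILE *f;

	snprintf(path, sizeof(path), "%s/%s", dir, attr);
	f = fopen(path, "r");
	if (!f)
		return -errno;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -EIO;
	}
	fclose(f);

	errno = 0;
	v = strtoul(buf, &end, 10);
	if (end == buf || errno != 0 || v > UINT_MAX)
		return -EINVAL;
	*out = (unsigned int)v;
	return 0;
}

/*
 * Hypothetical helper: validate a request against powercap-min/max and,
 * if it is in range, write it to powercap-current.
 */
int set_system_powercap(const char *dir, unsigned int watts)
{
	unsigned int min, max;
	char path[512];
	FILE *f;
	int rc;

	rc = read_watts(dir, "powercap-min", &min);
	if (rc)
		return rc;
	rc = read_watts(dir, "powercap-max", &max);
	if (rc)
		return rc;
	if (watts < min || watts > max)
		return -ERANGE;

	snprintf(path, sizeof(path), "%s/powercap-current", dir);
	f = fopen(path, "w");
	if (!f)
		return -errno;
	rc = (fprintf(f, "%u\n", watts) < 0) ? -EIO : 0;
	/* With stdio buffering, a sysfs store() error is typically reported
	 * when fclose() flushes the buffer, so check its result too. */
	if (fclose(f) != 0 && rc == 0)
		rc = -errno;
	return rc;
}
```

Checking the bounds in user space is only a convenience; the kernel/firmware side remains the authority on which values are accepted.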
[PATCH V9 3/3] powernv: Add support to clear sensor groups data

2017-07-31 Thread Shilpasri G Bhat
Adds support for clearing different sensor groups. OCC inband sensor
groups like CSM, Profiler and Job Scheduler can be cleared using this
driver. The min/max of all sensors belonging to these sensor groups
will be cleared.

Signed-off-by: Shilpasri G Bhat 
---
 .../bindings/powerpc/opal/sensor-groups.txt|  23 
 arch/powerpc/include/asm/opal-api.h|   3 +-
 arch/powerpc/include/asm/opal.h|   2 +
 arch/powerpc/include/uapi/asm/opal-occ.h   |  23 
 arch/powerpc/platforms/powernv/Makefile|   2 +-
 arch/powerpc/platforms/powernv/opal-occ.c  | 116 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/opal.c  |   3 +
 8 files changed, 171 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt
 create mode 100644 arch/powerpc/include/uapi/asm/opal-occ.h
 create mode 100644 arch/powerpc/platforms/powernv/opal-occ.c

diff --git a/Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt b/Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt
new file mode 100644
index 000..304b87c
--- /dev/null
+++ b/Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt
@@ -0,0 +1,23 @@
+IBM OPAL Sensor Groups Binding
+---
+
+Node: /ibm,opal/sensor-groups
+
+Description: Contains sensor groups available in the Powernv P9
+servers. Each child node indicates a sensor group.
+
+- compatible : Should be "ibm,opal-occ-sensor-group"
+
+Each child node contains the following properties:
+
+- type : String to indicate the type of sensor-group
+
+- sensor-group-id: Abstract unique identifier provided by firmware of
+  type  which is used for sensor-group
+  operations like clearing the min/max history of all
+  sensors belonging to the group.
+
+- ibm,chip-id : Chip ID
+
+- sensors : Phandle array of child nodes of /ibm,opal/sensor/
+   belonging to this group
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 92e31fd..0841659 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -195,7 +195,8 @@
 #define OPAL_SET_POWERCAP  153
 #define OPAL_GET_POWER_SHIFT_RATIO 154
 #define OPAL_SET_POWER_SHIFT_RATIO 155
-#define OPAL_LAST  155
+#define OPAL_SENSOR_GROUPS_CLEAR   156
+#define OPAL_LAST  156
 
 /* Device tree flags */
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index b9ea77f..a716def 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -271,6 +271,7 @@ int64_t opal_xive_set_vp_info(uint64_t vp,
 int opal_set_powercap(u32 handle, int token, u32 pcap);
 int opal_get_power_shift_ratio(u32 handle, int token, u32 *psr);
 int opal_set_power_shift_ratio(u32 handle, int token, u32 psr);
+int opal_sensor_groups_clear(u32 group_hndl, int token);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
@@ -351,6 +352,7 @@ static inline int opal_get_async_rc(struct opal_msg msg)
 
 void opal_powercap_init(void);
 void opal_psr_init(void);
+int opal_sensor_groups_clear_history(u32 handle);
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/powerpc/include/uapi/asm/opal-occ.h b/arch/powerpc/include/uapi/asm/opal-occ.h
new file mode 100644
index 000..97c45e2
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/opal-occ.h
@@ -0,0 +1,23 @@
+/*
+ * OPAL OCC command interface
+ * Supported on POWERNV platform
+ *
+ * (C) Copyright IBM 2017
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _UAPI_ASM_POWERPC_OPAL_OCC_H_
+#define _UAPI_ASM_POWERPC_OPAL_OCC_H_
+
+#define OPAL_OCC_IOCTL_CLEAR_SENSOR_GROUPS _IOR('o', 1, u32)
+
+#endif /* _UAPI_ASM_POWERPC_OPAL_OCC_H */
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 9ed7d33..f193b33 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -2,7 +2,7 @@ obj-y   += setup.o opal-wrappers.o opal.o opal-async.o idle.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
 obj-y 

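A hedged user-space sketch of how the ioctl defined in the uapi header above could be issued. Two assumptions to flag: the device node that would be opened to obtain `fd` is not visible in the truncated patch, so it is left to the caller; and the sketch passes the sensor-group handle by pointer, which is what the _IOR(..., u32) encoding conventionally implies but is not confirmed by the driver source shown here. __u32 is used because the kernel-internal u32 type is not available to user-space programs. occ_clear_sensor_group() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Same encoding as the patch's uapi header, spelled with __u32. */
#define OPAL_OCC_IOCTL_CLEAR_SENSOR_GROUPS _IOR('o', 1, __u32)

/*
 * Hypothetical wrapper: ask the driver behind `fd` to clear the min/max
 * history of the sensor group identified by `group_id` (the firmware's
 * sensor-group-id). ASSUMPTION: the handle is passed by pointer.
 * Returns 0 or a negative errno value.
 */
int occ_clear_sensor_group(int fd, uint32_t group_id)
{
	__u32 arg = group_id;

	if (ioctl(fd, OPAL_OCC_IOCTL_CLEAR_SENSOR_GROUPS, &arg) < 0)
		return -errno;
	return 0;
}
```

The request number itself can be decoded with the standard _IOC_TYPE(), _IOC_NR(), _IOC_SIZE() and _IOC_DIR() macros, which is a quick way to confirm that user space and the driver agree on the encoding.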
[PATCH V9 2/3] powernv: Add support to set power-shifting-ratio

2017-07-31 Thread Shilpasri G Bhat
This patch adds support for setting the power-shifting-ratio, which hints
to the firmware how to distribute/throttle power between different entities
in a system (e.g. CPU vs. GPU). This ratio is used by the OCC power-capping
algorithm.

Signed-off-by: Shilpasri G Bhat 
---
 Documentation/ABI/testing/sysfs-firmware-opal-psr |  18 +++
 arch/powerpc/include/asm/opal-api.h   |   4 +-
 arch/powerpc/include/asm/opal.h   |   3 +
 arch/powerpc/platforms/powernv/Makefile   |   2 +-
 arch/powerpc/platforms/powernv/opal-psr.c | 175 ++
 arch/powerpc/platforms/powernv/opal-wrappers.S|   2 +
 arch/powerpc/platforms/powernv/opal.c |   3 +
 7 files changed, 205 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-opal-psr
 create mode 100644 arch/powerpc/platforms/powernv/opal-psr.c

diff --git a/Documentation/ABI/testing/sysfs-firmware-opal-psr b/Documentation/ABI/testing/sysfs-firmware-opal-psr
new file mode 100644
index 000..cc2ece7
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-opal-psr
@@ -0,0 +1,18 @@
+What:  /sys/firmware/opal/psr
+Date:  August 2017
+Contact:   Linux for PowerPC mailing list 
+Description:   Power-Shift-Ratio directory for Powernv P9 servers
+
+   Power-Shift-Ratio provides hints to the firmware on how
+   to shift/throttle power between different entities in
+   the system. Each attribute in this directory indicates
+   a settable PSR.
+
+What:  /sys/firmware/opal/psr/cpu_to_gpu_X
+Date:  August 2017
+Contact:   Linux for PowerPC mailing list 
+Description:   PSR sysfs attributes for Powernv P9 servers
+
+   Power-Shift-Ratio between CPU and GPU for a given chip
+   with chip-id X. This file gives the ratio (0-100)
+   which is used by OCC for power-capping.
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index c3e0c4a..92e31fd 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -193,7 +193,9 @@
 #define OPAL_NPU_MAP_LPAR  148
 #define OPAL_GET_POWERCAP  152
 #define OPAL_SET_POWERCAP  153
-#define OPAL_LAST  153
+#define OPAL_GET_POWER_SHIFT_RATIO 154
+#define OPAL_SET_POWER_SHIFT_RATIO 155
+#define OPAL_LAST  155
 
 /* Device tree flags */
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index ec2087c..b9ea77f 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -269,6 +269,8 @@ int64_t opal_xive_set_vp_info(uint64_t vp,
 int64_t opal_xive_dump(uint32_t type, uint32_t id);
 int opal_get_powercap(u32 handle, int token, u32 *pcap);
 int opal_set_powercap(u32 handle, int token, u32 pcap);
+int opal_get_power_shift_ratio(u32 handle, int token, u32 *psr);
+int opal_set_power_shift_ratio(u32 handle, int token, u32 psr);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
@@ -348,6 +350,7 @@ static inline int opal_get_async_rc(struct opal_msg msg)
 void opal_wake_poller(void);
 
 void opal_powercap_init(void);
+void opal_psr_init(void);
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index e79f806..9ed7d33 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -2,7 +2,7 @@ obj-y   += setup.o opal-wrappers.o opal.o opal-async.o idle.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
 obj-y  += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
-obj-y  += opal-kmsg.o opal-powercap.o
+obj-y  += opal-kmsg.o opal-powercap.o opal-psr.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_PCI)  += pci.o pci-ioda.o npu-dma.o
diff --git a/arch/powerpc/platforms/powernv/opal-psr.c b/arch/powerpc/platforms/powernv/opal-psr.c
new file mode 100644
index 000..7313b7f
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-psr.c
@@ -0,0 +1,175 @@
+/*
+ * PowerNV OPAL Power-Shift-Ratio interface
+ *
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "opal-psr: " fmt
+
+#include 
+#include 
+#include 
+
+#include 
+
+DEFINE_MUTEX(psr_mutex);
+
+static struct kobject 

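A minimal user-space sketch of setting the CPU-to-GPU power-shifting ratio through the sysfs ABI documented in this patch. It assumes the X in cpu_to_gpu_X is the chip id rendered in decimal and that the attribute accepts a single decimal value; psr_attr_path() and psr_set() are hypothetical names, and the base directory (/sys/firmware/opal/psr on a real system) is passed in.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build "<dir>/cpu_to_gpu_<chip_id>"; assumes a decimal chip id. */
int psr_attr_path(char *buf, size_t len, const char *dir, unsigned int chip_id)
{
	int n = snprintf(buf, len, "%s/cpu_to_gpu_%u", dir, chip_id);

	return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}

/* Write a CPU-to-GPU power-shifting ratio for one chip. */
int psr_set(const char *dir, unsigned int chip_id, unsigned int ratio)
{
	char path[512];
	FILE *f;
	int rc;

	if (ratio > 100)	/* the ABI documents a 0-100 range */
		return -EINVAL;
	rc = psr_attr_path(path, sizeof(path), dir, chip_id);
	if (rc)
		return rc;
	f = fopen(path, "w");
	if (!f)
		return -errno;
	rc = (fprintf(f, "%u\n", ratio) < 0) ? -EIO : 0;
	if (fclose(f) != 0 && rc == 0)	/* store() errors surface on flush */
		rc = -errno;
	return rc;
}
```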
[PATCH V9 0/3] powernv : Add support for OPAL-OCC command/response interface

2017-07-31 Thread Shilpasri G Bhat
In P9, the OCC (On-Chip-Controller) supports a shared-memory-based
command-response interface. Within the shared memory there is an OPAL
command buffer and an OCC response buffer that can be used to send
inband commands to the OCC. The following commands are supported:

1) Set system powercap
2) Set CPU-GPU power shifting ratio
3) Clear min/max for OCC sensor groups

Changes from V8:
- Fixed the error return after mutex_lock_interruptible() fails
- Added documentation

Shilpasri G Bhat (3):
  powernv: powercap: Add support for powercap framework
  powernv: Add support to set power-shifting-ratio
  powernv: Add support to clear sensor groups data

 .../ABI/testing/sysfs-firmware-opal-powercap   |  31 +++
 Documentation/ABI/testing/sysfs-firmware-opal-psr  |  18 ++
 .../bindings/powerpc/opal/sensor-groups.txt|  23 ++
 arch/powerpc/include/asm/opal-api.h|   8 +-
 arch/powerpc/include/asm/opal.h|   9 +
 arch/powerpc/include/uapi/asm/opal-occ.h   |  23 ++
 arch/powerpc/platforms/powernv/Makefile|   2 +-
 arch/powerpc/platforms/powernv/opal-occ.c  | 116 ++
 arch/powerpc/platforms/powernv/opal-powercap.c | 244 +
 arch/powerpc/platforms/powernv/opal-psr.c  | 175 +++
 arch/powerpc/platforms/powernv/opal-wrappers.S |   5 +
 arch/powerpc/platforms/powernv/opal.c  |  10 +
 12 files changed, 662 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-opal-powercap
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-opal-psr
 create mode 100644 Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt
 create mode 100644 arch/powerpc/include/uapi/asm/opal-occ.h
 create mode 100644 arch/powerpc/platforms/powernv/opal-occ.c
 create mode 100644 arch/powerpc/platforms/powernv/opal-powercap.c
 create mode 100644 arch/powerpc/platforms/powernv/opal-psr.c

-- 
1.8.3.1



Re: [PATCH] fs: convert a pile of fsync routines to errseq_t based reporting

2017-07-31 Thread Jan Kara
On Fri 28-07-17 10:23:21, Jeff Layton wrote:
> From: Jeff Layton 
> 
> This patch converts most of the in-kernel filesystems that do writeback
> out of the pagecache to report errors using the errseq_t-based
> infrastructure that was recently added. This allows them to report
> errors once for each open file description.
> 
> Most filesystems have a fairly straightforward fsync operation. They
> call filemap_write_and_wait_range to write back all of the data and
> wait on it, and then (sometimes) sync out the metadata.
> 
> For those filesystems this is a straightforward conversion from calling
> filemap_write_and_wait_range in their fsync operation to calling
> file_write_and_wait_range.
> 
> Signed-off-by: Jeff Layton 

This all looks rather obvious. Feel free to add:

Acked-by: Jan Kara 

Honza


> ---
>  arch/powerpc/platforms/cell/spufs/file.c   | 2 +-
>  drivers/staging/lustre/lustre/llite/file.c | 2 +-
>  drivers/video/fbdev/core/fb_defio.c| 2 +-
>  fs/9p/vfs_file.c   | 4 ++--
>  fs/affs/file.c | 2 +-
>  fs/afs/write.c | 2 +-
>  fs/cifs/file.c | 4 ++--
>  fs/exofs/file.c| 2 +-
>  fs/f2fs/file.c | 2 +-
>  fs/hfs/inode.c | 2 +-
>  fs/hfsplus/inode.c | 2 +-
>  fs/hostfs/hostfs_kern.c| 2 +-
>  fs/hpfs/file.c | 2 +-
>  fs/jffs2/file.c| 2 +-
>  fs/jfs/file.c  | 2 +-
>  fs/ncpfs/file.c| 2 +-
>  fs/ntfs/dir.c  | 2 +-
>  fs/ntfs/file.c | 2 +-
>  fs/ocfs2/file.c| 2 +-
>  fs/reiserfs/dir.c  | 2 +-
>  fs/reiserfs/file.c | 2 +-
>  fs/ubifs/file.c| 2 +-
>  22 files changed, 24 insertions(+), 24 deletions(-)
> 
> Rolling up all of these conversions into a single patch, as Christoph
> Hellwig suggested. Most of these are not tested, but the conversion
> here is fairly straightforward.
> 
> Any maintainers who object, please let me know and I'll yank that
> part out of this patch.
> 
> diff --git a/arch/powerpc/platforms/cell/spufs/file.c b/arch/powerpc/platforms/cell/spufs/file.c
> index ae2f740a82f1..5ffcdeb1eb17 100644
> --- a/arch/powerpc/platforms/cell/spufs/file.c
> +++ b/arch/powerpc/platforms/cell/spufs/file.c
> @@ -1749,7 +1749,7 @@ static int spufs_mfc_flush(struct file *file, fl_owner_t id)
>  static int spufs_mfc_fsync(struct file *file, loff_t start, loff_t end, int datasync)
>  {
>   struct inode *inode = file_inode(file);
> - int err = filemap_write_and_wait_range(inode->i_mapping, start, end);
> + int err = file_write_and_wait_range(file, start, end);
>   if (!err) {
>   inode_lock(inode);
>   err = spufs_mfc_flush(file, NULL);
> diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
> index ab1c85c1ed38..f7d07735ac66 100644
> --- a/drivers/staging/lustre/lustre/llite/file.c
> +++ b/drivers/staging/lustre/lustre/llite/file.c
> @@ -2364,7 +2364,7 @@ int ll_fsync(struct file *file, loff_t start, loff_t end, int datasync)
>  PFID(ll_inode2fid(inode)), inode);
>   ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_FSYNC, 1);
>  
> - rc = filemap_write_and_wait_range(inode->i_mapping, start, end);
> + rc = file_write_and_wait_range(file, start, end);
>   inode_lock(inode);
>  
>   /* catch async errors that were recorded back when async writeback
> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
> index 37f69c061210..487d5e336e1b 100644
> --- a/drivers/video/fbdev/core/fb_defio.c
> +++ b/drivers/video/fbdev/core/fb_defio.c
> @@ -69,7 +69,7 @@ int fb_deferred_io_fsync(struct file *file, loff_t start, loff_t end, int datasy
>  {
>   struct fb_info *info = file->private_data;
>   struct inode *inode = file_inode(file);
> - int err = filemap_write_and_wait_range(inode->i_mapping, start, end);
> + int err = file_write_and_wait_range(file, start, end);
>   if (err)
>   return err;
>  
> diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
> index 3de3b4a89d89..4802d75b3cf7 100644
> --- a/fs/9p/vfs_file.c
> +++ b/fs/9p/vfs_file.c
> @@ -445,7 +445,7 @@ static int v9fs_file_fsync(struct file *filp, loff_t start, loff_t end,
>   struct p9_wstat wstat;
>   int retval;
>  
> - retval = filemap_write_and_wait_range(inode->i_mapping, start, end);
> + retval = file_write_and_wait_range(filp, start, end);
>   if (retval)
>   return retval;
>  
> @@ -468,7 +468,7 @@ 

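To make "report errors once for each open file description" concrete, here is a deliberately simplified user-space model of errseq_t-style reporting. It is not the kernel's implementation: the real errseq_t packs an errno value, a "seen" flag and a counter into one 32-bit word, and errseq_sample() still lets a newly opened file see an error that nobody has observed yet, which this model ignores. All names below are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Per-mapping error state (simplified; not the kernel's bit layout). */
struct mapping_err {
	int err;		/* most recent writeback error (negative errno) */
	unsigned int seq;	/* bumped every time a new error is recorded */
};

/* Per-open-file cursor into the mapping's error sequence. */
struct open_file {
	unsigned int seen;	/* value of mapping_err.seq last reported */
};

/* Writeback path: record an error against the mapping. */
void record_wb_error(struct mapping_err *m, int err)
{
	m->err = err;
	m->seq++;
}

/* open(): start from the current sequence, so older errors are skipped. */
void file_open(struct open_file *f, const struct mapping_err *m)
{
	f->seen = m->seq;
}

/*
 * fsync(): report the latest error once for this open file description,
 * then advance the cursor so the same error is not reported again.
 */
int file_check_and_advance(struct open_file *f, const struct mapping_err *m)
{
	if (f->seen == m->seq)
		return 0;
	f->seen = m->seq;
	return m->err;
}
```

The point of the conversion is visible in the model: two descriptors opened on the same file each get the error once, instead of the first fsync() caller clearing it for everyone.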
Re: powerpc/smp: Call smp_ops->setup_cpu() directly on the boot CPU

2017-07-31 Thread Michael Ellerman
On Thu, 2017-07-27 at 13:23:37 UTC, Michael Ellerman wrote:
> In smp_cpus_done() we need to call smp_ops->setup_cpu() for the boot
> CPU, which means it has to run *on* the boot CPU.
> 
> In the past we ensured it ran on the boot CPU by changing the CPU
> affinity mask of current directly. That was removed in commit
> 6d11b87d55eb ("powerpc/smp: Replace open coded task affinity logic"),
> and replaced with a work queue call.
> 
> Unfortunately using a work queue leads to a lockdep warning, now that
> the CPU hotplug lock is a regular semaphore:
> 
>   ==
>   WARNING: possible circular locking dependency detected
>   ...
>   kworker/0:1/971 is trying to acquire lock:
>    (cpu_hotplug_lock.rw_sem){++}, at: [] apply_workqueue_attrs+0x34/0xa0
> 
>   but task is already holding lock:
>    (()){+.+.+.}, at: [] process_one_work+0x25c/0x800
>   ...
>        CPU0                    CPU1
>
>   lock(());
>                                lock(cpu_hotplug_lock.rw_sem);
>                                lock(());
>   lock(cpu_hotplug_lock.rw_sem);
> 
> Although the deadlock can't happen in practice, because
> smp_cpus_done() only runs in early boot before CPU hotplug is allowed,
> lockdep can't tell that.
> 
> Luckily in commit 8fb12156b8db ("init: Pin init task to the boot CPU,
> initially") tglx changed the generic code to pin init to the boot CPU
> to begin with. The unpinning of init from the boot CPU happens in
> sched_init_smp(), which is called after smp_cpus_done().
> 
> So smp_cpus_done() is always called on the boot CPU, which means we
> don't need the work queue call at all - and the lockdep warning goes
> away.
> 
> Signed-off-by: Michael Ellerman 

Applied to powerpc fixes.

https://git.kernel.org/powerpc/c/7b7622bb95eb587cbaa79608e47b83

cheers


Re: powerpc/boot: Fix 64-bit boot wrapper build with non-biarch compiler

2017-07-31 Thread Michael Ellerman
On Wed, 2017-07-26 at 13:19:04 UTC, Michael Ellerman wrote:
> Historically the boot wrapper was always built 32-bit big endian, even
> for 64-bit kernels. That was because old firmwares didn't necessarily
> support booting a 64-bit image. Because of that arch/powerpc/boot/Makefile
> uses CROSS32CC for compilation.
> 
> When we added 64-bit little endian support, we also added support
> for building the boot wrapper 64-bit. However, we kept using
> CROSS32CC, because in most cases it is just CC and everything works.
> 
> However if the user doesn't specify CROSS32_COMPILE (which no one ever
> does AFAIK), and CC is *not* biarch (32/64-bit capable), then CROSS32CC
> becomes just "gcc". On native systems that is probably OK, but if we're
> cross-building it definitely isn't, leading to e.g.:
> 
>   gcc ... -m64 -mlittle-endian -mabi=elfv2 ... arch/powerpc/boot/cpm-serial.c
>   gcc: error: unrecognized argument in option ‘-mabi=elfv2’
>   gcc: error: unrecognized command line option ‘-mlittle-endian’
>   make: *** [zImage] Error 2
> 
> To fix it, stop using CROSS32CC, because we may or may not be building
> 32-bit. Instead set up a BOOTCC, which defaults to CC, and only use
> CROSS32_COMPILE if it's set and we're building for 32-bit.
> 
> Fixes: 147c05168fc8 ("powerpc/boot: Add support for 64bit little endian wrapper")
> Signed-off-by: Michael Ellerman 

Applied to powerpc fixes.

https://git.kernel.org/powerpc/c/65c5ec11c25eff6ba6e9b1cbfff014

cheers

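The fix boils down to a small decision rule. Purely as an illustration (the real change is in arch/powerpc/boot/Makefile, which is not quoted in this message), here is that rule modeled in C; pick_bootcc() and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Model of the boot-wrapper compiler choice: use CC by default, and use
 * "<CROSS32_COMPILE>gcc" only if CROSS32_COMPILE is set (non-empty) AND the
 * wrapper is being built 32-bit. `buf` receives the composed name if needed.
 */
const char *pick_bootcc(const char *cc, const char *cross32_compile,
			bool wrapper_is_32bit, char *buf, size_t len)
{
	if (wrapper_is_32bit && cross32_compile && cross32_compile[0] != '\0') {
		snprintf(buf, len, "%sgcc", cross32_compile);
		return buf;
	}
	return cc;	/* 64-bit wrapper, or no CROSS32_COMPILE given */
}
```

In the failing case from the report (a cross build with a non-biarch CC and no CROSS32_COMPILE), the rule keeps using CC instead of falling back to a bare "gcc".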

Re: powerpc/powernv/pci: Return failure for some uses of dma_set_mask()

2017-07-31 Thread Michael Ellerman
On Wed, 2017-07-26 at 05:26:40 UTC, Alistair Popple wrote:
> Commit 8e3f1b1d8255 ("powerpc/powernv/pci: Enable 64-bit devices to access
> >4GB DMA space") introduced the ability for PCI device drivers to request a
> DMA mask between 64 and 32 bits and actually get a mask greater than
> 32 bits. However, currently if certain machine-configuration-dependent
> conditions are not met, the code silently falls back to a 32-bit mask.
> 
> This makes it hard for device drivers to detect which mask they actually
> got. Instead we should return an error when the request could not be
> fulfilled, which allows drivers to either fall back or implement other
> workarounds as documented in DMA-API-HOWTO.txt.
> 
> Signed-off-by: Alistair Popple 
> Acked-by: Russell Currey 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/253fd51e2f533552ae35a0c661705d

cheers

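A small user-space model of why returning an error matters: when an unsatisfiable request fails instead of silently installing a 32-bit mask, the driver can fall back explicitly and knows which mask it actually got. DMA_BIT_MASK() matches the kernel macro; struct fake_dev, fake_dma_set_mask() and driver_setup_dma() are hypothetical, and returning -EIO is an assumption borrowed from the generic dma_set_mask() rather than taken from this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical device model. */
struct fake_dev {
	uint64_t dma_mask;		/* mask actually installed */
	uint64_t platform_limit;	/* largest mask the platform can honour */
};

/* Fixed behaviour: refuse what cannot be honoured instead of faking success. */
int fake_dma_set_mask(struct fake_dev *dev, uint64_t mask)
{
	if (mask > dev->platform_limit)
		return -EIO;
	dev->dma_mask = mask;
	return 0;
}

/* Driver side: try the preferred width, fall back to 32 bits on failure. */
int driver_setup_dma(struct fake_dev *dev, int preferred_bits)
{
	if (fake_dma_set_mask(dev, DMA_BIT_MASK(preferred_bits)) == 0)
		return preferred_bits;
	if (fake_dma_set_mask(dev, DMA_BIT_MASK(32)) == 0)
		return 32;
	return -EIO;
}
```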

Re: powerpc/tm: fix TM SPRs in code dump file

2017-07-31 Thread Michael Ellerman
On Wed, 2017-07-19 at 05:44:13 UTC, Gustavo Romero wrote:
> Currently flush_tmregs_to_thread() does not update the thread structures
> from the live state before a core dump, resulting in wrong values of
> TFHAR, TFIAR, and TEXASR in core dump files.
> 
> This commit fixes it by copying from the live state to the appropriate
> thread structures when necessary.
> 
> Signed-off-by: Gustavo Romero 
> Reviewed-by: Cyril Bur 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/cd63f3cf1d59b7ad8419eba1cac8f9

cheers


Re: [PATCH 1/2] KVM: PPC: e500: fix some NULL dereferences on error

2017-07-31 Thread Paul Mackerras
On Thu, Jul 13, 2017 at 10:38:29AM +0300, Dan Carpenter wrote:
> There are some error paths in kvmppc_core_vcpu_create_e500() where we
> forget to set the error code.  It means that we return ERR_PTR(0) which
> is NULL and it results in a NULL pointer dereference in the caller.
> 
> Signed-off-by: Dan Carpenter 

Are these user-triggerable, and therefore needing to go into 4.13
and be back-ported to the stable trees?  Or can they wait for 4.14?

Paul.
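Why a forgotten error code turns into a NULL dereference, as a user-space sketch: ERR_PTR(0) is just NULL and IS_ERR(NULL) is false, so a caller that checks IS_ERR() and then uses the pointer crashes. ERR_PTR()/PTR_ERR()/IS_ERR() below mirror the kernel helpers; struct vcpu and the two create functions are hypothetical and only shaped like the e500 error path, with -ENOMEM chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space copies of the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct vcpu { int id; };

/* Hypothetical buggy error path: err is never assigned before the goto. */
void *create_vcpu_buggy(bool alloc_fails)
{
	static struct vcpu v;
	int err = 0;

	if (alloc_fails)
		goto out;	/* BUG: missing "err = -ENOMEM;" */
	return &v;
out:
	return ERR_PTR(err);	/* ERR_PTR(0) == NULL, and IS_ERR(NULL) is false */
}

/* Fixed shape: set the error code before bailing out. */
void *create_vcpu_fixed(bool alloc_fails)
{
	static struct vcpu v;
	int err;

	if (alloc_fails) {
		err = -ENOMEM;
		goto out;
	}
	return &v;
out:
	return ERR_PTR(err);
}
```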