[patch added to 3.12-stable] powerpc: Fix build warning on 32-bit PPC

2017-01-25 Thread Jiri Slaby
From: Larry Finger 

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===

commit 8ae679c4bc2ea2d16d92620da8e3e9332fa4039f upstream.

I am getting the following warning when I build kernel 4.9-git on my
PowerBook G4 with a 32-bit PPC processor:

AS  arch/powerpc/kernel/misc_32.o
  arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef]

This problem is evident after commit 989cea5c14be ("kbuild: prevent
lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an
error that has been in the code since 2005 when this source file was
created.  That was with commit 9994a33865f4 ("powerpc: Introduce
entry_{32,64}.S, misc_{32,64}.S, systbl.S").

The offending line does not make a lot of sense.  This error does not
seem to cause any problems in the executable, so I am not recommending
that the fix be applied to any stable versions.

Thanks to Nicholas Piggin for suggesting this solution.

Fixes: 9994a33865f4 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S")
Signed-off-by: Larry Finger 
Cc: Nicholas Piggin 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Linus Torvalds 
Signed-off-by: Jiri Slaby 
---
 arch/powerpc/kernel/misc_32.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index ace34137a501..e23298f065df 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -313,7 +313,7 @@ _GLOBAL(flush_instruction_cache)
lis r3, KERNELBASE@h
iccci   0,r3
 #endif
-#elif CONFIG_FSL_BOOKE
+#elif defined(CONFIG_FSL_BOOKE)
 BEGIN_FTR_SECTION
mfspr   r3,SPRN_L1CSR0
ori r3,r3,L1CSR0_CFI|L1CSR0_CLFC
-- 
2.11.0
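
For readers unfamiliar with the warning, here is a minimal user-space sketch of the difference between testing a symbol's value and testing whether it exists. It is not kernel code: CONFIG_DEMO_A, CONFIG_DEMO_B and demo_branch() are hypothetical names used only for illustration.

```c
/*
 * CONFIG_DEMO_A and CONFIG_DEMO_B are hypothetical symbols standing in for
 * kernel config options; neither is defined in this file.
 *
 * Writing "#elif CONFIG_DEMO_B" would still compile, because an undefined
 * identifier evaluates to 0 in a preprocessor expression, but gcc -Wundef
 * reports it. defined() only asks whether the symbol exists.
 */
#if defined(CONFIG_DEMO_A)
#define DEMO_BRANCH "A"
#elif defined(CONFIG_DEMO_B)
#define DEMO_BRANCH "B"
#else
#define DEMO_BRANCH "fallback"
#endif

/* Return the name of the branch the preprocessor selected. */
const char *demo_branch(void)
{
	return DEMO_BRANCH;
}
```

With neither symbol defined, both spellings select the same branch, which is consistent with the commit message's remark that the generated code was not affected.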



Re: [bug] stack protector panics on v4.10-rc1+

2017-01-25 Thread Benjamin Herrenschmidt
On Thu, 2017-01-26 at 18:05 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2017-01-25 at 09:24 +0530, Balbir Singh wrote:
> > That makes sense. We then wait for the right gcc version? I guess we
> > also push for per-task guard value as opposed to a global one?
> 
> I'm thinking per-cpu will be easier as r13 is readily available as
> PACA.

Actually it has to be per-task ... so we'll have to put it in the PACA
and context switch the value in there.

Cheers,
Ben.
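
The idea of keeping a per-task guard value in a per-CPU area and updating it on context switch can be modelled in user space as below. This is only a sketch under assumptions: struct demo_task, struct demo_paca and both functions are hypothetical stand-ins, not the kernel's task_struct, PACA or switch code.

```c
#include <stdint.h>

/* Hypothetical stand-in for a task: each task owns its own guard value. */
struct demo_task {
	uint64_t stack_canary;
};

/* Hypothetical stand-in for the per-CPU area that r13 would point at. */
struct demo_paca {
	uint64_t canary;			/* value a stack check would read */
	const struct demo_task *current_task;
};

/* On a context switch, publish the incoming task's guard value. */
void demo_switch_to(struct demo_paca *paca, const struct demo_task *next)
{
	paca->current_task = next;
	paca->canary = next->stack_canary;
}

/* Conceptual epilogue check: compare the saved copy with the live value. */
int demo_canary_ok(const struct demo_paca *paca, uint64_t saved_on_stack)
{
	return paca->canary == saved_on_stack;
}
```

The per-CPU slot is just a cache of the current task's value, so a frame saved by one task never validates against another task's guard.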



Re: [bug] stack protector panics on v4.10-rc1+

2017-01-25 Thread Benjamin Herrenschmidt
On Wed, 2017-01-25 at 09:24 +0530, Balbir Singh wrote:
> That makes sense. We then wait for the right gcc version? I guess we
> also push for per-task guard value as opposed to a global one?

I'm thinking per-cpu will be easier as r13 is readily available as
PACA.

Cheers,
Ben.



Re: [PATCH net 2/5] ibmvnic: Fix MTU settings

2017-01-25 Thread David Miller
From: Thomas Falcon 
Date: Wed, 25 Jan 2017 15:02:20 -0600

> In the current driver, the MTU is set to the maximum value
> capable for the backing device. This patch sets the MTU to the
> default value for a Linux net device.

Why are you doing this?

What happens to users who depend upon the current behavior?

They will break, and that isn't acceptable.


Re: [PATCH net 1/5] ibmvnic: harden interrupt handler

2017-01-25 Thread David Miller
From: Thomas Falcon 
Date: Wed, 25 Jan 2017 15:02:19 -0600

> Move most interrupt handler processing into a tasklet, and
> introduce a delay after re-enabling interrupts to fix timing
> issues encountered in hardware testing.
> 
> Signed-off-by: Thomas Falcon 

I don't think you have any idea what the real problem is.  This looks
like a hack, at best.  Next patch you'll increase the delay to "20",
right?  And if that doesn't work you'll try "40".

Or if you do know the reason, you need to explain it in detail in this
commit message so that we can properly evaluate this patch.

Furthermore, if you're going to move all of your packet processing
into software interrupt context, you might as well use NAPI polling
which is a purposefully built piece of infrastructure for doing
exactly this.
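
The budgeted-polling idea behind NAPI can be illustrated with a small user-space model. It deliberately does not use the kernel's NAPI API; struct demo_dev, demo_irq() and demo_poll() are hypothetical names, and the device is reduced to a counter of pending packets.

```c
#include <stdbool.h>

/* Hypothetical device model: pending packets and an interrupt-enable flag. */
struct demo_dev {
	int pending;
	int processed;
	bool irq_enabled;
};

/* Interrupt handler: mask further interrupts and defer the work to polling. */
void demo_irq(struct demo_dev *dev)
{
	dev->irq_enabled = false;
}

/*
 * Poll routine: handle at most @budget packets. Interrupts are re-enabled
 * only when the queue was drained before the budget ran out; otherwise the
 * caller is expected to poll again.
 */
int demo_poll(struct demo_dev *dev, int budget)
{
	int done = 0;

	while (done < budget && dev->pending > 0) {
		dev->pending--;
		dev->processed++;
		done++;
	}

	if (done < budget)
		dev->irq_enabled = true;

	return done;
}
```

In this model no fixed delay is needed after re-enabling interrupts, because the interrupt stays masked for as long as there is work left.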



[PATCH v2 10/10] VAS: Define copy/paste interfaces

2017-01-25 Thread Sukadev Bhattiprolu
Define interfaces (wrappers) to the 'copy' and 'paste' instructions
(which are new in PowerISA 3.0). These are intended to be used
by NX driver(s) to submit Coprocessor Request Blocks (CRBs) to the
NX hardware engines.

Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/include/asm/vas.h  | 17 +-
 drivers/misc/vas/copy-paste.h   | 74 +
 drivers/misc/vas/vas-internal.h | 14 
 drivers/misc/vas/vas-window.c   | 43 
 4 files changed, 147 insertions(+), 1 deletion(-)
 create mode 100644 drivers/misc/vas/copy-paste.h

diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h
index a841084..27710d1 100644
--- a/arch/powerpc/include/asm/vas.h
+++ b/arch/powerpc/include/asm/vas.h
@@ -116,11 +116,26 @@ struct vas_window *vas_tx_win_open(int vasid, enum vas_cop_type cop,
 int vas_win_close(struct vas_window *win);
 
 /*
+ * Copy the co-processor request block (CRB) @crb into the local L2 cache.
+ * For now, @offset must be 0 and @first must be true.
+ */
+extern int vas_copy_crb(void *crb, int offset, bool first);
+
+/*
+ * Paste a previously copied CRB (see vas_copy_crb()) from the L2 cache to
+ * the hardware address associated with the window @win. For now, @off must
+ * be 0 and @last must be true. @re is expected/assumed to be true for NX windows.
+ */
+extern int vas_paste_crb(struct vas_window *win, int off, bool last, bool re);
+
+
+
+
+/*
  * Get/Set bit fields
  */
 #define GET_FIELD(m, v)(((v) & (m)) >> MASK_LSH(m))
 #define MASK_LSH(m)(__builtin_ffsl(m) - 1)
 #define SET_FIELD(m, v, val)   \
 		(((v) & ~(m)) | ((((typeof(v))(val)) << MASK_LSH(m)) & (m)))
-
 #endif
diff --git a/drivers/misc/vas/copy-paste.h b/drivers/misc/vas/copy-paste.h
new file mode 100644
index 000..7783bb8
--- /dev/null
+++ b/drivers/misc/vas/copy-paste.h
@@ -0,0 +1,74 @@
+/*
+ * Copyright 2016 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+/*
+ * Macros taken from tools/testing/selftests/powerpc/context_switch/cp_abort.c
+ */
+#define PASTE(RA, RB, L, RC) \
+   .long (0x7c00070c | (RA) << (31-15) | (RB) << (31-20) \
+ | (L) << (31-10) | (RC) << (31-31))
+
+#define COPY(RA, RB, L) \
+   .long (0x7c00060c | (RA) << (31-15) | (RB) << (31-20) \
+ | (L) << (31-10))
+
+#define CR0_FXM"0x80"
+#define CR0_SHIFT  28
+#define CR0_MASK   0xF
+/*
+ * Copy/paste instructions:
+ *
+ * copy RA,RB,L
+ * Copy contents of address (RA) + effective_address(RB)
+ * to internal copy-buffer.
+ *
+ * L == 1 indicates this is the first copy.
+ *
+ * L == 0 indicates it's a continuation of a prior first copy.
+ *
+ * paste RA,RB,L
+ * Paste contents of internal copy-buffer to the address
+ * (RA) + effective_address(RB)
+ *
+ * L == 0 indicates it's a continuation of a prior paste, i.e.
+ * don't wait for the completion or update status.
+ *
+ * L == 1 indicates this is the last paste in the group (i.e.
+ * wait for the group to complete and update status in CR0).
+ *
+ * For Power9, the L bit must be 'true' in both copy and paste.
+ */
+
+static inline int vas_copy(void *crb, int offset, int first)
+{
+   WARN_ON_ONCE(!first);
+
+   __asm__ __volatile(stringify_in_c(COPY(%0, %1, %2))";"
+   :
+   : "b" (offset), "b" (crb), "i" (1)
+   : "memory");
+
+   return 0;
+}
+
+static inline int vas_paste(void *paste_address, int offset, int last)
+{
+   unsigned long long cr;
+
+   WARN_ON_ONCE(!last);
+
+   cr = 0;
+   __asm__ __volatile(stringify_in_c(PASTE(%1, %2, 1, 1))";"
+   "mfocrf %0," CR0_FXM ";"
+   : "=r" (cr)
+   : "b" (paste_address), "b" (offset)
+   : "memory");
+
+   return cr;
+}
diff --git a/drivers/misc/vas/vas-internal.h b/drivers/misc/vas/vas-internal.h
index 139d12a..d1c2b90 100644
--- a/drivers/misc/vas/vas-internal.h
+++ b/drivers/misc/vas/vas-internal.h
@@ -450,4 +450,18 @@ static inline uint64_t read_hvwc_reg(struct vas_window *win,
return in_be64(win->hvwc_map+reg);
 }
 
+#ifdef vas_debug
+
+static void print_fifo_msg_count(struct vas_window *txwin)
+{
+   uint64_t read_hvwc_reg(struct vas_window *w, char *n, uint64_t o);
+   pr_devel("Winid %d, Msg count %llu\n", txwin->winid,
+   (uint64_t)read_hvwc_reg(txwin, VREG(LRFIFO_PUSH)));
+}
+#else  /* vas_debug */
+
+#define print_fifo_msg_count(window)
+
+#endif /* vas_debug */
+
 #endif
diff --git a/drivers/misc/vas/vas-window.c 
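
The COPY() and PASTE() macros in copy-paste.h build raw instruction words with shifts and ORs. The sketch below performs the same arithmetic in plain C so the resulting words can be inspected. The demo_* names are hypothetical, and demo_cr0() only shows how CR0_SHIFT and CR0_MASK could be applied to the value returned by mfocrf; the patch itself returns the raw value.

```c
#include <stdint.h>

/* Base opcodes as they appear in the COPY()/PASTE() macros of the patch. */
#define DEMO_COPY_OPCODE	0x7c00060cU
#define DEMO_PASTE_OPCODE	0x7c00070cU

/* Build the 32-bit instruction word for "copy RA,RB,L". */
uint32_t demo_encode_copy(uint32_t ra, uint32_t rb, uint32_t l)
{
	return DEMO_COPY_OPCODE | (ra << (31 - 15)) | (rb << (31 - 20)) |
	       (l << (31 - 10));
}

/* Build the 32-bit instruction word for "paste RA,RB,L" with record bit RC. */
uint32_t demo_encode_paste(uint32_t ra, uint32_t rb, uint32_t l, uint32_t rc)
{
	return DEMO_PASTE_OPCODE | (ra << (31 - 15)) | (rb << (31 - 20)) |
	       (l << (31 - 10)) | (rc << (31 - 31));
}

/* Extract the 4-bit CR0 field using the patch's CR0_SHIFT (28) and CR0_MASK (0xF). */
unsigned int demo_cr0(uint64_t cr)
{
	return (unsigned int)((cr >> 28) & 0xF);
}
```

For example, demo_encode_copy(3, 4, 1) places 3 in the RA field, 4 in the RB field and sets the L bit on top of the base opcode.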

[PATCH v2 07/10] VAS: Define vas_rx_win_open() interface

2017-01-25 Thread Sukadev Bhattiprolu
Define the vas_rx_win_open() interface. This interface is intended to be
used by the Nest Accelerator (NX) driver(s) to set up receive windows for
one or more NX engines (which implement compression/encryption algorithms
in the hardware).

Follow-on patches will provide an interface to close the window and to open
a send window that kernel subsystems can use to access the NX engines.

The interface to open a receive window is expected to be invoked for each
instance of VAS in the system.

Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/include/asm/vas.h  |  39 +
 drivers/misc/vas/vas-internal.h |  11 +++
 drivers/misc/vas/vas-window.c   | 182 
 3 files changed, 232 insertions(+)

diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h
index fef9e87..b6362e9 100644
--- a/arch/powerpc/include/asm/vas.h
+++ b/arch/powerpc/include/asm/vas.h
@@ -38,6 +38,45 @@ enum vas_thresh_ctl {
 };
 
 /*
+ * Receive window attributes specified by the (in-kernel) owner of window.
+ */
+struct vas_rx_win_attr {
+   void *rx_fifo;
+   int rx_fifo_size;
+   int wcreds_max;
+
+   bool pin_win;
+   bool rej_no_credit;
+   bool tx_wcred_mode;
+   bool rx_wcred_mode;
+   bool tx_win_ord_mode;
+   bool rx_win_ord_mode;
+   bool data_stamp;
+   bool nx_win;
+   bool fault_win;
+   bool notify_disable;
+   bool intr_disable;
+   bool notify_early;
+
+   int lnotify_lpid;
+   int lnotify_pid;
+   int lnotify_tid;
+   int pswid;
+
+   enum vas_thresh_ctl tc_mode;
+};
+
+/*
+ * Open a VAS receive window for the instance of VAS identified by @vasid
+ * Use @attr to initialize the attributes of the window.
+ *
+ * Return a handle to the window or ERR_PTR() on error.
+ */
+struct vas_window *vas_rx_win_open(int vasid, enum vas_cop_type cop,
+   struct vas_rx_win_attr *attr);
+
+
+/*
  * Get/Set bit fields
  */
 #define GET_FIELD(m, v)(((v) & (m)) >> MASK_LSH(m))
diff --git a/drivers/misc/vas/vas-internal.h b/drivers/misc/vas/vas-internal.h
index 0a396ea..139d12a 100644
--- a/drivers/misc/vas/vas-internal.h
+++ b/drivers/misc/vas/vas-internal.h
@@ -396,6 +396,16 @@ extern struct vas_instance *find_vas_instance(int vasid);
 #define VREG(r)VREG_SFX(r, _OFFSET)
 
 #ifndef vas_debug
+static inline void dump_rx_win_attr(struct vas_rx_win_attr *attr)
+{
+   pr_err("VAS: fault %d, notify %d, intr %d early %d\n",
+   attr->fault_win, attr->notify_disable,
+   attr->intr_disable, attr->notify_early);
+
+   pr_err("VAS: rx_fifo_size %d, max value %d\n",
+   attr->rx_fifo_size, VAS_RX_FIFO_SIZE_MAX);
+}
+
 static inline void vas_log_write(struct vas_window *win, char *name,
void *regptr, uint64_t val)
 {
@@ -408,6 +418,7 @@ static inline void vas_log_write(struct vas_window *win, 
char *name,
 #else  /* vas_debug */
 
 #define vas_log_write(win, name, reg, val)
+#define dump_rx_win_attr(attr)
 
 #endif /* vas_debug */
 
diff --git a/drivers/misc/vas/vas-window.c b/drivers/misc/vas/vas-window.c
index 3ea698a..a640d59 100644
--- a/drivers/misc/vas/vas-window.c
+++ b/drivers/misc/vas/vas-window.c
@@ -529,3 +529,185 @@ int vas_window_reset(struct vas_instance *vinst, int winid)
 
return 0;
 }
+
+static void put_rx_win(struct vas_window *rxwin)
+{
+   /* Better not be a send window! */
+   WARN_ON_ONCE(rxwin->tx_win);
+
+   atomic_dec(&rxwin->num_txwins);
+}
+
+struct vas_window *get_vinstance_rxwin(struct vas_instance *vinst,
+   enum vas_cop_type cop)
+{
+   struct vas_window *rxwin;
+
+   mutex_lock(&vinst->mutex);
+
+   rxwin = vinst->rxwin[cop];
+   if (rxwin)
+   atomic_inc(&rxwin->num_txwins);
+
+   mutex_unlock(&vinst->mutex);
+
+   return rxwin;
+}
+
+static void set_vinstance_rxwin(struct vas_instance *vinst,
+   enum vas_cop_type cop, struct vas_window *window)
+{
+   mutex_lock(&vinst->mutex);
+
+   /*
+* There should only be one receive window for a coprocessor type.
+*/
+   WARN_ON_ONCE(vinst->rxwin[cop]);
+   vinst->rxwin[cop] = window;
+
+   mutex_unlock(&vinst->mutex);
+}
+
+static void init_winctx_for_rxwin(struct vas_window *rxwin,
+   struct vas_rx_win_attr *rxattr,
+   struct vas_winctx *winctx)
+{
+   /*
+* We first zero (memset()) all fields and only set non-zero fields.
+* Following fields are 0/false but maybe deserve a comment:
+*
+*  ->user_win             No support for user Rx windows yet
+*  ->notify_os_intr_reg   In powerNV, send intrs to HV
+*  ->notify_disable       False for NX windows
+*  ->xtra_write           False for NX windows
+*  ->notify_early         NA for NX 

[PATCH v2 06/10] VAS: Define helpers to alloc/free windows

2017-01-25 Thread Sukadev Bhattiprolu
Define helpers to allocate/free VAS window objects. These will
be used in follow-on patches when opening/closing windows.

Signed-off-by: Sukadev Bhattiprolu 
---
 drivers/misc/vas/vas-window.c | 72 ++-
 1 file changed, 71 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/vas/vas-window.c b/drivers/misc/vas/vas-window.c
index c2e6b4e..3ea698a 100644
--- a/drivers/misc/vas/vas-window.c
+++ b/drivers/misc/vas/vas-window.c
@@ -454,8 +454,78 @@ int init_winctx_regs(struct vas_window *window, struct vas_winctx *winctx)
return 0;
 }
 
-/* stub for now */
+DEFINE_SPINLOCK(vas_ida_lock);
+
+void vas_release_window_id(struct ida *ida, int winid)
+{
+   spin_lock(&vas_ida_lock);
+   ida_remove(ida, winid);
+   spin_unlock(&vas_ida_lock);
+}
+
+int vas_assign_window_id(struct ida *ida)
+{
+   int rc, winid;
+
+   rc = ida_pre_get(ida, GFP_KERNEL);
+   if (!rc)
+   return -EAGAIN;
+
+   spin_lock(&vas_ida_lock);
+   rc = ida_get_new_above(ida, 1, &winid);
+   spin_unlock(&vas_ida_lock);
+
+   if (rc)
+   return rc;
+
+   if (winid > VAS_MAX_WINDOWS_PER_CHIP) {
+   pr_err("VAS: Too many (%d) open windows\n", winid);
+   vas_release_window_id(ida, winid);
+   return -EAGAIN;
+   }
+
+   return winid;
+}
+
+static void vas_window_free(struct vas_window *window)
+{
+   unmap_wc_mmio_bars(window);
+   kfree(window->paste_addr_name);
+   kfree(window);
+}
+
+static struct vas_window *vas_window_alloc(struct vas_instance *vinst, int id)
+{
+   struct vas_window *window;
+
+   window = kzalloc(sizeof(*window), GFP_KERNEL);
+   if (!window)
+   return NULL;
+
+   window->vinst = vinst;
+   window->winid = id;
+
+   if (map_wc_mmio_bars(window))
+   goto out_free;
+
+   return window;
+
+out_free:
+   kfree(window);
+   return NULL;
+}
+
 int vas_window_reset(struct vas_instance *vinst, int winid)
 {
+   struct vas_window *window;
+
+   window = vas_window_alloc(vinst, winid);
+   if (!window)
+   return -ENOMEM;
+
+   reset_window_regs(window);
+
+   vas_window_free(window);
+
return 0;
 }
-- 
2.7.4
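
The allocation policy in vas_assign_window_id() (IDs start at 1, IDs above a fixed maximum are rejected with -EAGAIN, released IDs become available again) can be sketched without the kernel's IDA. This is a simplified, single-threaded model: DEMO_MAX_WINDOWS is a hypothetical limit standing in for VAS_MAX_WINDOWS_PER_CHIP, a plain array replaces the IDA, and the spinlock the patch takes is omitted.

```c
#include <errno.h>
#include <stdbool.h>

#define DEMO_MAX_WINDOWS 8	/* hypothetical limit for illustration */

/* demo_in_use[id] is true while @id is allocated; index 0 is never used. */
static bool demo_in_use[DEMO_MAX_WINDOWS + 1];

/* Return the smallest free id in [1, DEMO_MAX_WINDOWS], or -EAGAIN if full. */
int demo_assign_window_id(void)
{
	for (int id = 1; id <= DEMO_MAX_WINDOWS; id++) {
		if (!demo_in_use[id]) {
			demo_in_use[id] = true;
			return id;
		}
	}
	return -EAGAIN;
}

/* Release an id so that a later allocation can reuse it. */
void demo_release_window_id(int id)
{
	if (id >= 1 && id <= DEMO_MAX_WINDOWS)
		demo_in_use[id] = false;
}
```

A caller treats a negative return value as an error and any positive value as a usable window ID.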



[PATCH v2 05/10] VAS: Define helpers to init window context

2017-01-25 Thread Sukadev Bhattiprolu
Define helpers to initialize window context registers of the VAS
hardware. These will be used in follow-on patches when opening/closing
VAS windows.
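
The VREG() helper introduced in this patch expands one register short name into two arguments: the name as a string and the matching offset macro. The following user-space sketch shows that expansion. The offset values, DEMO_STRINGIFY() and demo_describe() are hypothetical; the kernel version uses __stringify() and the real VAS_*_OFFSET definitions.

```c
#include <stdint.h>

/* Two-level stringification, standing in for the kernel's __stringify(). */
#define DEMO_STRINGIFY_1(x)	#x
#define DEMO_STRINGIFY(x)	DEMO_STRINGIFY_1(x)

/* Hypothetical offsets; the real values are defined by the VAS driver. */
#define VAS_LPID_OFFSET	0x010
#define VAS_PID_OFFSET	0x018

#define VREG_SFX(n, s)	DEMO_STRINGIFY(n), VAS_##n##s
#define VREG(r)		VREG_SFX(r, _OFFSET)

struct demo_reg {
	const char *name;
	int32_t offset;
};

/* Receives the two arguments that a single VREG() use expands to. */
struct demo_reg demo_describe(const char *name, int32_t offset)
{
	struct demo_reg r = { name, offset };

	return r;
}
```

A call such as demo_describe(VREG(LPID)) therefore passes the string "LPID" together with VAS_LPID_OFFSET, which is how write_hvwc_reg() can log a register by name without the caller spelling it twice.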

Signed-off-by: Sukadev Bhattiprolu 
---
 drivers/misc/vas/vas-internal.h |  56 +++
 drivers/misc/vas/vas-window.c   | 330 
 2 files changed, 386 insertions(+)

diff --git a/drivers/misc/vas/vas-internal.h b/drivers/misc/vas/vas-internal.h
index 61cfaad..0a396ea 100644
--- a/drivers/misc/vas/vas-internal.h
+++ b/drivers/misc/vas/vas-internal.h
@@ -11,6 +11,7 @@
 #define VAS_INTERNAL_H
 #include 
 #include 
+#include 
 #include 
 
 #ifdef CONFIG_PPC_4K_PAGES
@@ -383,4 +384,59 @@ struct vas_winctx {
 extern int vas_initialized;
 extern int vas_window_reset(struct vas_instance *vinst, int winid);
 extern struct vas_instance *find_vas_instance(int vasid);
+
+/*
+ * VREG(x):
+ * Expand a register's short name (eg: LPID) into two parameters:
+ * - the register's short name in string form ("LPID"), and
+ * - the name of the macro (eg: VAS_LPID_OFFSET), defining the
+ *   register's offset in the window context
+ */
+#define VREG_SFX(n, s) __stringify(n), VAS_##n##s
+#define VREG(r)VREG_SFX(r, _OFFSET)
+
+#ifndef vas_debug
+static inline void vas_log_write(struct vas_window *win, char *name,
+   void *regptr, uint64_t val)
+{
+   if (val)
+   pr_err("%swin #%d: %s reg %p, val 0x%llx\n",
+   win->tx_win ? "Tx" : "Rx", win->winid, name,
+   regptr, val);
+}
+
+#else  /* vas_debug */
+
+#define vas_log_write(win, name, reg, val)
+
+#endif /* vas_debug */
+
+static inline void write_uwc_reg(struct vas_window *win, char *name,
+   int32_t reg, uint64_t val)
+{
+   void *regptr;
+
+   regptr = win->uwc_map + reg;
+   vas_log_write(win, name, regptr, val);
+
+   out_be64(regptr, val);
+}
+
+static inline void write_hvwc_reg(struct vas_window *win, char *name,
+   int32_t reg, uint64_t val)
+{
+   void *regptr;
+
+   regptr = win->hvwc_map + reg;
+   vas_log_write(win, name, regptr, val);
+
+   out_be64(regptr, val);
+}
+
+static inline uint64_t read_hvwc_reg(struct vas_window *win,
+   char *name __maybe_unused, int32_t reg)
+{
+   return in_be64(win->hvwc_map+reg);
+}
+
 #endif
diff --git a/drivers/misc/vas/vas-window.c b/drivers/misc/vas/vas-window.c
index cfbd2f4..c2e6b4e 100644
--- a/drivers/misc/vas/vas-window.c
+++ b/drivers/misc/vas/vas-window.c
@@ -14,6 +14,8 @@
 #include 
 #include "vas-internal.h"
 
+static int fault_winid;
+
 /*
  * Compute the paste address region for the window @window using the
  * ->win_base_addr and ->win_id_shift we got from device tree.
@@ -124,6 +126,334 @@ int map_wc_mmio_bars(struct vas_window *window)
return 0;
 }
 
+/*
+ * Reset all valid registers in the HV and OS/User Window Contexts for
+ * the window identified by @window.
+ *
+ * NOTE: We cannot really use a for loop to reset window context. Not all
+ *  offsets in a window context are valid registers and the valid
+ *  registers are not sequential. And, we can only write to offsets
+ *  with valid registers (or is that only in Simics?).
+ */
+void reset_window_regs(struct vas_window *window)
+{
+   write_hvwc_reg(window, VREG(LPID), 0ULL);
+   write_hvwc_reg(window, VREG(PID), 0ULL);
+   write_hvwc_reg(window, VREG(XLATE_MSR), 0ULL);
+   write_hvwc_reg(window, VREG(XLATE_LPCR), 0ULL);
+   write_hvwc_reg(window, VREG(XLATE_CTL), 0ULL);
+   write_hvwc_reg(window, VREG(AMR), 0ULL);
+   write_hvwc_reg(window, VREG(SEIDR), 0ULL);
+   write_hvwc_reg(window, VREG(FAULT_TX_WIN), 0ULL);
+   write_hvwc_reg(window, VREG(OSU_INTR_SRC_RA), 0ULL);
+   write_hvwc_reg(window, VREG(HV_INTR_SRC_RA), 0ULL);
+   write_hvwc_reg(window, VREG(PSWID), 0ULL);
+   write_hvwc_reg(window, VREG(SPARE1), 0ULL);
+   write_hvwc_reg(window, VREG(SPARE2), 0ULL);
+   write_hvwc_reg(window, VREG(SPARE3), 0ULL);
+   write_hvwc_reg(window, VREG(SPARE4), 0ULL);
+   write_hvwc_reg(window, VREG(SPARE5), 0ULL);
+   write_hvwc_reg(window, VREG(SPARE6), 0ULL);
+   write_hvwc_reg(window, VREG(LFIFO_BAR), 0ULL);
+   write_hvwc_reg(window, VREG(LDATA_STAMP_CTL), 0ULL);
+   write_hvwc_reg(window, VREG(LDMA_CACHE_CTL), 0ULL);
+   write_hvwc_reg(window, VREG(LRFIFO_PUSH), 0ULL);
+   write_hvwc_reg(window, VREG(CURR_MSG_COUNT), 0ULL);
+   write_hvwc_reg(window, VREG(LNOTIFY_AFTER_COUNT), 0ULL);
+   write_hvwc_reg(window, VREG(LRX_WCRED), 0ULL);
+   write_hvwc_reg(window, VREG(LRX_WCRED_ADDER), 0ULL);
+   write_hvwc_reg(window, VREG(TX_WCRED), 0ULL);
+   write_hvwc_reg(window, VREG(TX_WCRED_ADDER), 0ULL);
+   write_hvwc_reg(window, VREG(LFIFO_SIZE), 0ULL);
+   write_hvwc_reg(window, VREG(WINCTL), 0ULL);
+   

[PATCH v2 08/10] VAS: Define vas_win_close() interface

2017-01-25 Thread Sukadev Bhattiprolu
Define the vas_win_close() interface which should be used to close a
send or receive window.

While the hardware configurations required to open send and receive windows
differ, the configuration to close a window is the same for both. So we use
a single interface to close the window.
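
The closing rule (a receive window cannot be closed while send windows are still attached to it) can be sketched in user space with a reference count. struct demo_window is a trimmed-down, hypothetical version of struct vas_window, and C11 atomics stand in for the kernel's atomic_t; no hardware state is modelled.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced window object; field names follow the patch. */
struct demo_window {
	bool tx_win;			/* true for a send window */
	atomic_int num_txwins;		/* attached send windows (Rx only) */
	struct demo_window *rxwin;	/* paired receive window (Tx only) */
};

/* Attach a send window to a receive window, taking a reference on it. */
void demo_attach(struct demo_window *tx, struct demo_window *rx)
{
	tx->tx_win = true;
	tx->rxwin = rx;
	atomic_fetch_add(&rx->num_txwins, 1);
}

/* Close either kind of window; an Rx window with users returns -EAGAIN. */
int demo_win_close(struct demo_window *win)
{
	if (!win)
		return 0;

	if (!win->tx_win && atomic_load(&win->num_txwins) != 0)
		return -EAGAIN;

	/* A send window drops its reference on the matching receive window. */
	if (win->tx_win && win->rxwin) {
		atomic_fetch_sub(&win->rxwin->num_txwins, 1);
		win->rxwin = NULL;
	}

	return 0;
}
```

Callers are expected to close send windows first; closing the receive window then succeeds once its count has dropped to zero.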

Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/include/asm/vas.h |  6 +
 drivers/misc/vas/vas-window.c  | 52 ++
 2 files changed, 58 insertions(+)

diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h
index b6362e9..bda851a 100644
--- a/arch/powerpc/include/asm/vas.h
+++ b/arch/powerpc/include/asm/vas.h
@@ -75,6 +75,12 @@ struct vas_rx_win_attr {
 struct vas_window *vas_rx_win_open(int vasid, enum vas_cop_type cop,
struct vas_rx_win_attr *attr);
 
+/*
+ * Close the send or receive window identified by @win. For receive windows
+ * return -EAGAIN if there are active send windows attached to this receive
+ * window.
+ */
+int vas_win_close(struct vas_window *win);
 
 /*
  * Get/Set bit fields
diff --git a/drivers/misc/vas/vas-window.c b/drivers/misc/vas/vas-window.c
index a640d59..4b06780 100644
--- a/drivers/misc/vas/vas-window.c
+++ b/drivers/misc/vas/vas-window.c
@@ -711,3 +711,55 @@ struct vas_window *vas_rx_win_open(int vasid, enum vas_cop_type cop,
vas_release_window_id(&vinst->ida, rxwin->winid);
return ERR_PTR(rc);
 }
+
+int vas_win_close(struct vas_window *window)
+{
+   uint64_t val;
+   int cached;
+
+   if (!window)
+   return 0;
+
+   if (!window->tx_win && atomic_read(&window->num_txwins) != 0) {
+   pr_devel("VAS: Attempting to close an active Rx window!\n");
+   WARN_ON_ONCE(1);
+   return -EAGAIN;
+   }
+
+   /* Unpin window from cache and close it */
+   val = 0ULL;
+   val = SET_FIELD(VAS_WINCTL_PIN, val, 0);
+   val = SET_FIELD(VAS_WINCTL_OPEN, val, 0);
+   write_hvwc_reg(window, VREG(WINCTL), val);
+
+   /*
+* See Section 1.11.1 for details on closing window, including
+*  - disable new paste operations
+*  - block till pending requests are completed
+*  - If Rx window, ensure FIFO is empty.
+*/
+
+   /* Cast window context out of the cache */
+retry:
+   val = read_hvwc_reg(window, VREG(WIN_CTX_CACHING_CTL));
+   cached = GET_FIELD(VAS_WIN_CACHE_STATUS, val);
+   if (cached) {
+   val = 0ULL;
+   val = SET_FIELD(VAS_CASTOUT_REQ, val, 1);
+   val = SET_FIELD(VAS_PUSH_TO_MEM, val, 0);
+   write_hvwc_reg(window, VREG(WIN_CTX_CACHING_CTL), val);
+
+   schedule_timeout(2000);
+   goto retry;
+   }
+
+   /* if send window, drop reference to matching receive window */
+   if (window->tx_win)
+   put_rx_win(window->rxwin);
+
+   vas_release_window_id(&window->vinst->ida, window->winid);
+
+   vas_window_free(window);
+
+   return 0;
+}
-- 
2.7.4
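
vas_win_close() builds and inspects register values with the SET_FIELD() and GET_FIELD() helpers, both of which take the mask as their first argument. The sketch below exercises the same macro shapes in user space. The two masks and the demo_* functions are hypothetical, __builtin_ffsll and __typeof__ are used here instead of __builtin_ffsl and typeof so the example does not depend on the width of long or on GNU keyword mode, and a GCC-compatible compiler is assumed.

```c
#include <stdint.h>

/* Shift count of a mask: position of its least significant set bit. */
#define MASK_LSH(m)		(__builtin_ffsll(m) - 1)
#define GET_FIELD(m, v)		(((v) & (m)) >> MASK_LSH(m))
#define SET_FIELD(m, v, val)	\
	(((v) & ~(m)) | ((((__typeof__(v))(val)) << MASK_LSH(m)) & (m)))

/* Hypothetical field masks; the real VAS_WINCTL_* masks are not shown here. */
#define DEMO_FIELD_A	0x0000000000000F00ULL
#define DEMO_FIELD_B	0x00000000FF000000ULL

/* Build a register value from two field values, starting from zero. */
uint64_t demo_build_reg(uint64_t a, uint64_t b)
{
	uint64_t val = 0ULL;

	val = SET_FIELD(DEMO_FIELD_A, val, a);
	val = SET_FIELD(DEMO_FIELD_B, val, b);
	return val;
}

/* Read field A back out of a register value. */
uint64_t demo_read_a(uint64_t reg)
{
	return GET_FIELD(DEMO_FIELD_A, reg);
}
```

Values wider than the field are truncated by the final mask, so a neighbouring field is never disturbed.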



[PATCH v2 09/10] VAS: Define vas_tx_win_open()

2017-01-25 Thread Sukadev Bhattiprolu
Define an interface to open a VAS send window. This interface is
intended to be used by the Nest Accelerator (NX) driver(s) to open
a send window and use it to submit compression/encryption requests
to a VAS receive window.

The receive window, identified by the [node, chip, cop] parameters,
must already be open in VAS (i.e., connected to an NX engine).

Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/include/asm/vas.h |  33 ++
 drivers/misc/vas/vas-window.c  | 142 +
 2 files changed, 175 insertions(+)

diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h
index bda851a..a841084 100644
--- a/arch/powerpc/include/asm/vas.h
+++ b/arch/powerpc/include/asm/vas.h
@@ -67,6 +67,26 @@ struct vas_rx_win_attr {
 };
 
 /*
+ * Window attributes specified by the in-kernel owner of a send window.
+ */
+struct vas_tx_win_attr {
+   enum vas_cop_type cop;
+   int wcreds_max;
+   int lpid;
+   int pid;
+   int pswid;
+   int rsvd_txbuf_count;
+
+   bool user_win;
+   bool pin_win;
+   bool rej_no_credit;
+   bool rsvd_txbuf_enable;
+   bool tx_win_ord_mode;
+   bool rx_win_ord_mode;
+   enum vas_thresh_ctl tc_mode;
+};
+
+/*
  * Open a VAS receive window for the instance of VAS identified by @vasid
  * Use @attr to initialize the attributes of the window.
  *
@@ -76,6 +96,19 @@ struct vas_window *vas_rx_win_open(int vasid, enum vas_cop_type cop,
struct vas_rx_win_attr *attr);
 
 /*
+ * Open a VAS send window for the instance of VAS identified by @vasid
+ * and the co-processor type @cop. Use @attr to initialize attributes
+ * of the window.
+ *
+ * Note: The instance of VAS must already have an open receive window for
+ * the coprocessor type @cop.
+ *
+ * Return a handle to the send window or ERR_PTR() on error.
+ */
+struct vas_window *vas_tx_win_open(int vasid, enum vas_cop_type cop,
+   struct vas_tx_win_attr *attr);
+
+/*
  * Close the send or receive window identified by @win. For receive windows
  * return -EAGAIN if there are active send windows attached to this receive
  * window.
diff --git a/drivers/misc/vas/vas-window.c b/drivers/misc/vas/vas-window.c
index 4b06780..3b4b801 100644
--- a/drivers/misc/vas/vas-window.c
+++ b/drivers/misc/vas/vas-window.c
@@ -712,6 +712,148 @@ struct vas_window *vas_rx_win_open(int vasid, enum vas_cop_type cop,
return ERR_PTR(rc);
 }
 
+static void init_winctx_for_txwin(struct vas_window *txwin,
+   struct vas_tx_win_attr *txattr,
+   struct vas_winctx *winctx)
+{
+   /*
+* We first zero all fields and only set non-zero ones. Following
+* are some fields set to 0/false for the stated reason:
+*
+*  ->notify_os_intr_reg   In powerNV, send intrs to HV
+*  ->rsvd_txbuf_count     Not supported yet.
+*  ->notify_disable       False for NX windows
+*  ->xtra_write           False for NX windows
+*  ->notify_early         NA for NX windows
+*  ->lnotify_lpid         NA for Tx windows
+*  ->lnotify_pid          NA for Tx windows
+*  ->lnotify_tid          NA for Tx windows
+*  ->tx_win_cred_mode     Ignore for now for NX windows
+*  ->rx_win_cred_mode     Ignore for now for NX windows
+*/
+   memset(winctx, 0, sizeof(struct vas_winctx));
+
+   winctx->wcreds_max = txattr->wcreds_max ?: VAS_WCREDS_DEFAULT;
+
+   winctx->user_win = txattr->user_win;
+   winctx->nx_win = txwin->rxwin->nx_win;
+   winctx->pin_win = txattr->pin_win;
+
+   winctx->rx_word_mode = true;
+   winctx->tx_word_mode = true;
+
+   if (winctx->nx_win) {
+   winctx->data_stamp = true;
+   winctx->intr_disable = true;
+   }
+
+   winctx->lpid = txattr->lpid;
+   winctx->pid = txattr->pid;
+   winctx->rx_win_id = txwin->rxwin->winid;
+   winctx->fault_win_id = fault_winid;
+
+   winctx->dma_type = VAS_DMA_TYPE_INJECT;
+   winctx->tc_mode = txattr->tc_mode;
+   winctx->min_scope = VAS_SCOPE_LOCAL;
+   winctx->max_scope = VAS_SCOPE_VECTORED_GROUP;
+   winctx->irq_port = txwin->irq_port;
+}
+
+static bool tx_win_args_valid(enum vas_cop_type cop,
+   struct vas_tx_win_attr *attr)
+{
+   if (attr->tc_mode != VAS_THRESH_DISABLED)
+   return false;
+
+   if (cop > VAS_COP_TYPE_MAX)
+   return false;
+
+   if (attr->user_win) {
+   if (cop != VAS_COP_TYPE_GZIP && cop != VAS_COP_TYPE_GZIP_HIPRI)
+   return false;
+
+   if (attr->rsvd_txbuf_count != 0)
+   return false;
+   }
+
+   return true;
+}
+
+struct vas_window *vas_tx_win_open(int vasid, enum vas_cop_type cop,
+   struct 

Re: Query regarding randomization bits for a ASLR elf on PPC64

2017-01-25 Thread Kees Cook
On Sun, Jan 22, 2017 at 9:34 PM, Bhupesh Sharma  wrote:
> I was recently looking at ways to extend the randomization range for a
> ASLR elf on a PPC64LE system.
>
> I basically have been using 28-bits of randomization on x86_64 for an
> ASLR elf using appropriate ARCH_MMAP_RND_BITS_MIN and
> ARCH_MMAP_RND_BITS_MAX values:
>
> http://lxr.free-electrons.com/source/arch/x86/Kconfig#L192
>
> And I understand from looking at the PPC64 code base that both
> ARCH_MMAP_RND_BITS_MIN and ARCH_MMAP_RND_BITS_MAX are not used in the
> current upstream code.

Yeah, looks like PPC could use it. If you've got hardware to test
with, please add it. :)

> I am looking at ways to randomize the mmap, stack and brk ranges for a
> ASLR elf on PPC64LE. Currently I am using a PAGE SIZE of 64K in my
> config file and hence the randomization usually translates to
> something like this for me:

Just to be clear: 64K pages will lose you 4 bits of entropy when
compared to 4K on x86_64. (Assuming I'm doing the math right...)

> mmap:
> ---
> http://lxr.free-electrons.com/source/arch/powerpc/mm/mmap.c#L67
>
> rnd = get_random_long() % (1UL<<(30-PAGE_SHIFT));
>
> Since PAGE_SHIFT is 16 for 64K page size, this computation reduces to:
> rnd = get_random_long() % (1UL<<(14));
>
> If I compare this to x86_64, I see there:
>
> http://lxr.free-electrons.com/source/arch/x86/mm/mmap.c#L79
>
> rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
>
> So, if mmap_rnd_bits = 28, this equates to:
> rnd = get_random_long() & ((1UL << 28) - 1);
>
> Observations and Queries:
> --
>
> - So, x86_64 gives approx twice number of random bits for a ASLR elf
> running on it as compared to PPC64 although both use a 48-bit VA.
>
> - I also see this comment for PPC at various places, regarding 1GB
> randomness spread for PPC64. Is this restricted by the hardware or the
> kernel usage?:
>
> /* 8MB for 32bit, 1GB for 64bit */
> if (is_32bit_task())
>         rnd = get_random_long() % (1<<(23-PAGE_SHIFT));
> else
>         rnd = get_random_long() % (1UL<<(30-PAGE_SHIFT));

Yeah, I'm not sure about this. The comments above the MIN_GAP* macros
seem to talk about making sure there is the 1GB stack gap, but that
shouldn't limit mmap.

Stack base is randomized in fs/binfmt_elf.c randomize_stack_top()
which uses STACK_RND_MASK (and PAGE_SHIFT).

x86:
/* 1GB for 64bit, 8MB for 32bit */
#define STACK_RND_MASK (test_thread_flag(TIF_ADDR32) ? 0x7ff : 0x3f)

powerpc:
/* 1GB for 64bit, 8MB for 32bit */
#define STACK_RND_MASK (is_32bit_task() ? \
(0x7ff >> (PAGE_SHIFT - 12)) : \
(0x3 >> (PAGE_SHIFT - 12)))

So, in the 64k page case, stack randomization entropy is reduced, but
otherwise identical to x86.

x86 and powerpc both use arch_mmap_rnd() for both mmap and ET_DYN
(with different bases).

x86 uses ELF_ET_DYN_BASE as TASK_SIZE / 3 * 2 (which the ELF loader
pushes back up to the nearest PAGE_SIZE alignment: 0x5000),
though powerpc uses 0x2000, so it should have significantly more
space for mmap and ET_DYN ASLR than x86.

> - I tried to increase the randomness to 28 bits for PPC as well by
> making the PPC mmap, brk code equivalent to x86_64 and it works fine
> for my use case.

The brk randomization on powerpc doesn't use the more common
randomize_page() approach that other archs use...

/* 8MB for 32bit, 1GB for 64bit */
if (is_32bit_task())
        rnd = (get_random_long() % (1UL<<(23-PAGE_SHIFT)));
else
        rnd = (get_random_long() % (1UL<<(30-PAGE_SHIFT)));

return rnd << PAGE_SHIFT;

x86 uses 0x0200 (via randomize_page()), which, if I'm doing the
math right is 14 bits, regardless of 32/64-bit. arm64 uses 0x4000
(20 bits) on 64-bit processes and the same as x86 (14) for 32-bit
processes. Looks like powerpc uses either 13 or 20 for 4k pages, which
is close to the same.

> - But, I am not sure this is the right thing to do and whether the
> PPC64 also supports the MIN and MAX ranges for randomization.

It can support it once you implement the Kconfigs for it. :)

> - If it does I would like to understand, test and push a patch to
> implement the same for PPC64 in upstream.
>
> Sorry for the long mail, but would really appreciate if someone can
> help me understand the details here.

Hopefully this helped a bit. I would literally draw out the memory
map, and double-check nothing can collide at your max values.

-Kees

-- 
Kees Cook
Nexus Security
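
The entropy arithmetic discussed in this message can be checked with a few lines of C. This is a sketch based only on the formulas quoted in the thread: the powerpc 64-bit mmap range of 1UL << (30 - PAGE_SHIFT) pages and the x86_64 mask form with mmap_rnd_bits. All demo_* function names are hypothetical.

```c
#include <stdint.h>

/* Bits of mmap randomization from the quoted powerpc formula: 30 - PAGE_SHIFT. */
unsigned int demo_ppc64_mmap_rnd_bits(unsigned int page_shift)
{
	return 30 - page_shift;
}

/* Bytes spanned by @bits of page-granular randomness at a given page size. */
uint64_t demo_rnd_span(unsigned int bits, unsigned int page_shift)
{
	return 1ULL << (bits + page_shift);
}

/* powerpc style: reduce a random value modulo a power-of-two range. */
uint64_t demo_rnd_mod(uint64_t r, unsigned int bits)
{
	return r % (1ULL << bits);
}

/* x86 style: mask off the low bits; same result for a power-of-two range. */
uint64_t demo_rnd_mask(uint64_t r, unsigned int bits)
{
	return r & ((1ULL << bits) - 1);
}
```

With 64K pages (PAGE_SHIFT 16) the powerpc formula yields 14 bits, against 18 bits with 4K pages, which matches the 4-bit difference mentioned above; in both cases the span is 1GB. By contrast, 28 bits at 4K pages spans 1TB.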


[PATCH v2 01/10] VAS: Define macros, register fields and structures

2017-01-25 Thread Sukadev Bhattiprolu
Define macros for the VAS hardware registers and bit-fields as well
as a couple of data structures needed by the VAS driver.

Signed-off-by: Sukadev Bhattiprolu 
---
Changelog[v2]
- Add an overview of VAS in vas-internal.h
- Get window context parameters from device tree and drop
  unnecessary macros.
---
 MAINTAINERS |   6 +
 arch/powerpc/include/asm/vas.h  |  40 +
 drivers/misc/vas/vas-internal.h | 383 
 3 files changed, 429 insertions(+)
 create mode 100644 arch/powerpc/include/asm/vas.h
 create mode 100644 drivers/misc/vas/vas-internal.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 63cefa6..54f015c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12739,6 +12739,12 @@ S: Maintained
 F: Documentation/fb/uvesafb.txt
 F: drivers/video/fbdev/uvesafb.*
 
+VAS (IBM Virtual Accelerator Switchboard) DRIVER
+M: Sukadev Bhattiprolu 
+L: linuxppc-dev@lists.ozlabs.org
+S: Supported
+F: drivers/misc/vas/*
+
 VF610 NAND DRIVER
 M: Stefan Agner 
 L: linux-...@lists.infradead.org
diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h
new file mode 100644
index 000..1c10437
--- /dev/null
+++ b/arch/powerpc/include/asm/vas.h
@@ -0,0 +1,40 @@
+/*
+ * Copyright 2016 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef VAS_H
+#define VAS_H
+
+#define VAS_RX_FIFO_SIZE_MAX   (8 << 20)   /* 8MB */
+/*
+ * Co-processor Engine type.
+ */
+enum vas_cop_type {
+   VAS_COP_TYPE_FAULT,
+   VAS_COP_TYPE_842,
+   VAS_COP_TYPE_842_HIPRI,
+   VAS_COP_TYPE_GZIP,
+   VAS_COP_TYPE_GZIP_HIPRI,
+   VAS_COP_TYPE_MAX,
+};
+
+/*
+ * Threshold Control Mode: Have paste operation fail if the number of
+ * requests in receive FIFO exceeds a threshold.
+ *
+ * NOTE: No special error code yet if paste is rejected because of these
+ *  limits. So users can't distinguish between this and other errors.
+ */
+enum vas_thresh_ctl {
+   VAS_THRESH_DISABLED,
+   VAS_THRESH_FIFO_GT_HALF_FULL,
+   VAS_THRESH_FIFO_GT_QTR_FULL,
+   VAS_THRESH_FIFO_GT_EIGHTH_FULL,
+};
+
+#endif
diff --git a/drivers/misc/vas/vas-internal.h b/drivers/misc/vas/vas-internal.h
new file mode 100644
index 000..aa4e781
--- /dev/null
+++ b/drivers/misc/vas/vas-internal.h
@@ -0,0 +1,383 @@
+/*
+ * Copyright 2016 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef VAS_INTERNAL_H
+#define VAS_INTERNAL_H
+#include 
+#include 
+#include 
+
+#ifdef CONFIG_PPC_4K_PAGES
+#  error "TODO: Compute RMA/Paste-address for 4K pages."
+#else
+#ifndef CONFIG_PPC_64K_PAGES
+#  error "Unexpected Page size."
+#endif
+#endif
+
+/*
+ * Overview of Virtual Accelerator Switchboard (VAS).
+ *
+ * VAS is a hardware "switchboard" that allows senders and receivers to
+ * exchange messages with _minimal_ kernel involvement. The receivers are
+ * typically NX coprocessor engines that perform compression or encryption
+ * in hardware, but receivers can also be other software threads.
+ *
+ * Senders are user/kernel threads that submit compression/encryption or
+ * other requests to the receivers. Senders must format their messages as
+ * Coprocessor Request Blocks (CRB)s and submit them using the instructions
+ * "copy" and "paste" which were introduced in Power9.
+ *
+ * A Power node can have (up to?) 8 Power chips. There is one instance of
+ * VAS in each Power9 chip. Each instance of VAS has 64K windows or ports.
+ * Senders and receivers must each connect to a separate window before they
+ * can exchange messages through the switchboard.
+ *
+ * Each window is described by two types of window contexts:
+ *
+ * Hypervisor Window Context (HVWC) of size VAS_HVWC_SIZE bytes
+ * OS/User Window Context (UWC) of size VAS_UWC_SIZE bytes.
+ *
+ * A window context can be viewed as a set of 64-bit registers. The settings
+ * in these registers configure/control/determine the behavior of the VAS
+ * hardware when messages are sent/received through the window. The registers
+ * in the HVWC are configured by the kernel while the registers in the UWC can
+ * be configured by the kernel or by the user space application that is using
+ * the window.
+ *
+ * The HVWCs for all windows on a specific instance of VAS are in a contiguous
+ * range of hardware addresses or Base address region (BAR) referred to as the
+ * HVWC BAR for the instance. Similarly the UWCs for all windows on an instance
+ 

[PATCH v2 04/10] VAS: Define helpers for access MMIO regions

2017-01-25 Thread Sukadev Bhattiprolu
Define some helper functions to access the MMIO regions. We use these
in follow-on patches to read/write VAS hardware registers. These
helpers are also used to later issue 'paste' instructions to submit
requests to the NX hardware engines.

Signed-off-by: Sukadev Bhattiprolu 

Changelog [v2]:
- Get HVWC, UWC and paste addresses from window->vinst (i.e., DT)
  rather than kernel macros.
---
 drivers/misc/vas/vas-window.c | 112 ++
 1 file changed, 112 insertions(+)

diff --git a/drivers/misc/vas/vas-window.c b/drivers/misc/vas/vas-window.c
index 468f3bf..cfbd2f4 100644
--- a/drivers/misc/vas/vas-window.c
+++ b/drivers/misc/vas/vas-window.c
@@ -9,9 +9,121 @@
 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include "vas-internal.h"
 
+/*
+ * Compute the paste address region for the window @window using the
+ * ->win_base_addr and ->win_id_shift we got from device tree.
+ */
+void compute_paste_address(struct vas_window *window, uint64_t *addr, int *len)
+{
+   uint64_t base, shift;
+   int winid;
+
+   base = window->vinst->win_base_addr;
+   shift = window->vinst->win_id_shift;
+   winid = window->winid;
+
+   *addr  = base + (winid << shift);
+   *len = PAGE_SIZE;
+
+   pr_debug("Txwin #%d: Paste addr 0x%llx\n", winid, *addr);
+}
+
+static inline void get_hvwc_mmio_bar(struct vas_window *window,
+   uint64_t *start, int *len)
+{
+   uint64_t pbaddr;
+
+   pbaddr = window->vinst->hvwc_bar_start;
+   *start = pbaddr + window->winid * VAS_HVWC_SIZE;
+   *len = VAS_HVWC_SIZE;
+}
+
+static inline void get_uwc_mmio_bar(struct vas_window *window,
+   uint64_t *start, int *len)
+{
+   uint64_t pbaddr;
+
+   pbaddr = window->vinst->uwc_bar_start;
+   *start = pbaddr + window->winid * VAS_UWC_SIZE;
+   *len = VAS_UWC_SIZE;
+}
+
+static void *map_mmio_region(char *name, uint64_t start, int len)
+{
+   void *map;
+
+   if (!request_mem_region(start, len, name)) {
+   pr_devel("%s(): request_mem_region(0x%llx, %d) failed\n",
+   __func__, start, len);
+   return NULL;
+   }
+
+   map = __ioremap(start, len, pgprot_val(pgprot_cached(__pgprot(0))));
+   if (!map) {
+   pr_devel("%s(): ioremap(0x%llx, %d) failed\n", __func__, start,
+   len);
+   return NULL;
+   }
+
+   return map;
+}
+
+/*
+ * Unmap the MMIO regions for a window.
+ */
+void unmap_wc_mmio_bars(struct vas_window *window)
+{
+   int len;
+   uint64_t busaddr_start;
+
+   if (window->paste_kaddr) {
+   iounmap(window->paste_kaddr);
+   compute_paste_address(window, &busaddr_start, &len);
+   release_mem_region((phys_addr_t)busaddr_start, len);
+   }
+
+   if (window->hvwc_map) {
+   iounmap(window->hvwc_map);
+   get_hvwc_mmio_bar(window, &busaddr_start, &len);
+   release_mem_region((phys_addr_t)busaddr_start, len);
+   }
+
+   if (window->uwc_map) {
+   iounmap(window->uwc_map);
+   get_uwc_mmio_bar(window, &busaddr_start, &len);
+   release_mem_region((phys_addr_t)busaddr_start, len);
+   }
+}
+
+/*
+ * Find the Hypervisor Window Context (HVWC) MMIO Base Address Region and the
+ * OS/User Window Context (UWC) MMIO Base Address Region for the given window.
+ * Map these bus addresses and save the mapped kernel addresses in @window.
+ */
+int map_wc_mmio_bars(struct vas_window *window)
+{
+   int len;
+   uint64_t start;
+
+   window->hvwc_map = window->uwc_map = NULL;
+
+   get_hvwc_mmio_bar(window, &start, &len);
+   window->hvwc_map = map_mmio_region("HVWCM_Window", start, len);
+
+   get_uwc_mmio_bar(window, &start, &len);
+   window->uwc_map = map_mmio_region("UWCM_Window", start, len);
+
+   if (!window->hvwc_map || !window->uwc_map)
+   return -1;
+
+   return 0;
+}
+
 /* stub for now */
 int vas_window_reset(struct vas_instance *vinst, int winid)
 {
-- 
2.7.4
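
For readers following the address arithmetic in this patch, here is a
small userspace sketch of the two computations: the paste address
(base + (winid << shift)) and a window-context address
(BAR start + winid * context size). The struct and function names are
hypothetical stand-ins, the example values are made up, and widening
winid to 64 bits before shifting is a choice made in this sketch, not
something the patch itself does.

```c
#include <stdint.h>

/* Hypothetical stand-in for the device-tree values held in window->vinst. */
struct demo_vinst {
	uint64_t win_base_addr;		/* base of the paste address region */
	uint64_t win_id_shift;		/* per-window stride, as a shift */
	uint64_t hvwc_bar_start;	/* start of the HVWC BAR */
};

/* Paste address: base + (winid << shift), widened to 64 bits first. */
static uint64_t demo_paste_addr(const struct demo_vinst *v, int winid)
{
	return v->win_base_addr + ((uint64_t)winid << v->win_id_shift);
}

/* Window-context address: BAR start + winid * per-window context size. */
static uint64_t demo_wc_addr(uint64_t bar_start, int winid, uint64_t wc_size)
{
	return bar_start + (uint64_t)winid * wc_size;
}
```

The widening matters once winid << shift no longer fits in an int, for
example a window id near 64K combined with a shift of 16 or more.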



[PATCH v2 00/10] Enable VAS

2017-01-25 Thread Sukadev Bhattiprolu
Power9 introduces a hardware subsystem referred to as the Virtual
Accelerator Switchboard (VAS). VAS allows kernel subsystems and user
space processes to directly access the Nest Accelerator (NX) engines
which implement compression and encryption algorithms in the hardware.

NX has been in Power processors since Power7+, but access to the NX
engines was through the 'icswx' instruction which is only available
to the kernel/hypervisor. Starting with Power9, access to the NX
engines is provided to both kernel and user space processes through
VAS.

The switchboard (i.e., VAS) multiplexes accesses between "receivers" and
"senders", where the "receivers" are typically the NX engines and
"senders" are the kernel subsystems and user processes that wish to
access the receivers (NX engines).  Once a sender is "connected" to
a receiver through the switchboard, the sender can submit compression/
encryption requests to the hardware using the new (PowerISA 3.0)
"copy" and "paste" instructions.

In the initial OPAL and PowerNV kernel patchsets, the "senders" can
only be kernel subsystems (e.g., the NX-842 driver). A follow-on patch set
will allow senders to be user-space processes.

This kernel patch set configures the VAS subsystems and provides
kernel interfaces to drivers like NX-842 to open receive and send
windows in VAS and to submit requests to the NX engine.

This patch set has been tested in a Simics Power9 environment using
a modified NX-842 kernel driver and a compression self-test module from
Power8. The corresponding OPAL patchset for VAS support was posted to
the skiboot mailing list:

https://lists.ozlabs.org/pipermail/skiboot/2017-January/006193.html

OPAL and kernel patchsets for NX-842 driver will be posted separately.
All four patchsets are needed to effectively use VAS/NX in Power9.

Thanks to input from Ben Herrenschmidt, Michael Neuling, Michael Ellerman
and Haren Myneni.

Changelog[v2]
- Use vas-id, HVWC, UWC and paste address, entries from device tree
  rather than defining/computing them in kernel and reorg code.

Sukadev Bhattiprolu (10):
  VAS: Define macros, register fields and structures
  Move GET_FIELD/SET_FIELD to vas.h
  VAS: Define vas_init() and vas_exit()
  VAS: Define helpers for access MMIO regions
  VAS: Define helpers to init window context
  VAS: Define helpers to alloc/free windows
  VAS: Define vas_rx_win_open() interface
  VAS: Define vas_win_close() interface
  VAS: Define vas_tx_win_open()
  VAS: Define copy/paste interfaces

 MAINTAINERS|   6 +
 arch/powerpc/include/asm/reg.h |   1 +
 arch/powerpc/include/asm/vas.h | 141 ++
 drivers/crypto/nx/nx-842-powernv.c |   1 +
 drivers/crypto/nx/nx-842.h |   5 -
 drivers/misc/Kconfig   |   1 +
 drivers/misc/Makefile  |   1 +
 drivers/misc/vas/Kconfig   |  20 +
 drivers/misc/vas/Makefile  |   3 +
 drivers/misc/vas/copy-paste.h  |  74 +++
 drivers/misc/vas/vas-internal.h| 467 ++
 drivers/misc/vas/vas-window.c  | 950 +
 drivers/misc/vas/vas.c | 156 ++
 13 files changed, 1821 insertions(+), 5 deletions(-)
 create mode 100644 arch/powerpc/include/asm/vas.h
 create mode 100644 drivers/misc/vas/Kconfig
 create mode 100644 drivers/misc/vas/Makefile
 create mode 100644 drivers/misc/vas/copy-paste.h
 create mode 100644 drivers/misc/vas/vas-internal.h
 create mode 100644 drivers/misc/vas/vas-window.c
 create mode 100644 drivers/misc/vas/vas.c

-- 
2.7.4



[PATCH v2 03/10] VAS: Define vas_init() and vas_exit()

2017-01-25 Thread Sukadev Bhattiprolu
Implement vas_init() and vas_exit() functions for a new VAS module.
This VAS module is essentially a library for other device drivers
and kernel users of the NX coprocessors like NX-842 and NX-GZIP.

Signed-off-by: Sukadev Bhattiprolu 
---
Changelog[v2]:
- Get HVWC, UWC and window address parameters from device tree.
---
 arch/powerpc/include/asm/reg.h  |   1 +
 drivers/misc/Kconfig|   1 +
 drivers/misc/Makefile   |   1 +
 drivers/misc/vas/Kconfig|  20 ++
 drivers/misc/vas/Makefile   |   3 +
 drivers/misc/vas/vas-internal.h |   3 +
 drivers/misc/vas/vas-window.c   |  19 +
 drivers/misc/vas/vas.c  | 156 
 8 files changed, 204 insertions(+)
 create mode 100644 drivers/misc/vas/Kconfig
 create mode 100644 drivers/misc/vas/Makefile
 create mode 100644 drivers/misc/vas/vas-window.c
 create mode 100644 drivers/misc/vas/vas.c

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 9e1499f..9cba3c18 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1210,6 +1210,7 @@
 #define PVR_POWER8E0x004B
 #define PVR_POWER8NVL  0x004C
 #define PVR_POWER8 0x004D
+#define PVR_POWER9 0x004E
 #define PVR_BE 0x0070
 #define PVR_PA6T   0x0090
 
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 64971ba..c84ab67 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -778,4 +778,5 @@ source "drivers/misc/mic/Kconfig"
 source "drivers/misc/genwqe/Kconfig"
 source "drivers/misc/echo/Kconfig"
 source "drivers/misc/cxl/Kconfig"
+source "drivers/misc/vas/Kconfig"
 endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 3198336..97a076e 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_GENWQE)  += genwqe/
 obj-$(CONFIG_ECHO) += echo/
 obj-$(CONFIG_VEXPRESS_SYSCFG)  += vexpress-syscfg.o
 obj-$(CONFIG_CXL_BASE) += cxl/
+obj-$(CONFIG_VAS)  += vas/
 obj-$(CONFIG_PANEL) += panel.o
 
 lkdtm-$(CONFIG_LKDTM)  += lkdtm_core.o
diff --git a/drivers/misc/vas/Kconfig b/drivers/misc/vas/Kconfig
new file mode 100644
index 000..c212cea
--- /dev/null
+++ b/drivers/misc/vas/Kconfig
@@ -0,0 +1,20 @@
+#
+# IBM Virtual Accelerator Switchboard (VAS) compatible devices
+#depends on PPC_POWERNV && PCI_MSI && EEH
+#
+
+config VAS
+   tristate "Support for IBM Virtual Accelerator Switchboard (VAS)"
+   depends on PPC_POWERNV
+   default n
+   help
+ Select this option to enable driver support for IBM Virtual
+ Accelerator Switchboard (VAS).
+ VAS allows accelerators in coprocessors like NX-842 to be
+ directly available to a user process.  This driver enables
+ userspace programs to access these accelerators via
+ /dev/vas/vas-nxM.N devices.
+
+ VAS adapters are found in POWER9 based systems.
+
+ If unsure, say N.
diff --git a/drivers/misc/vas/Makefile b/drivers/misc/vas/Makefile
new file mode 100644
index 000..7dd7139
--- /dev/null
+++ b/drivers/misc/vas/Makefile
@@ -0,0 +1,3 @@
+ccflags-y  := $(call cc-disable-warning, unused-const-variable)
+ccflags-$(CONFIG_PPC_WERROR)   += -Werror
+obj-$(CONFIG_VAS)  += vas.o vas-window.o
diff --git a/drivers/misc/vas/vas-internal.h b/drivers/misc/vas/vas-internal.h
index aa4e781..61cfaad 100644
--- a/drivers/misc/vas/vas-internal.h
+++ b/drivers/misc/vas/vas-internal.h
@@ -380,4 +380,7 @@ struct vas_winctx {
enum vas_notify_after_count notify_after_count;
 };
 
+extern int vas_initialized;
+extern int vas_window_reset(struct vas_instance *vinst, int winid);
+extern struct vas_instance *find_vas_instance(int vasid);
 #endif
diff --git a/drivers/misc/vas/vas-window.c b/drivers/misc/vas/vas-window.c
new file mode 100644
index 000..468f3bf
--- /dev/null
+++ b/drivers/misc/vas/vas-window.c
@@ -0,0 +1,19 @@
+/*
+ * Copyright 2016 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include "vas-internal.h"
+
+/* stub for now */
+int vas_window_reset(struct vas_instance *vinst, int winid)
+{
+   return 0;
+}
diff --git a/drivers/misc/vas/vas.c b/drivers/misc/vas/vas.c
new file mode 100644
index 000..1e28d10
--- /dev/null
+++ b/drivers/misc/vas/vas.c
@@ -0,0 +1,156 @@
+/*
+ * Copyright 2016 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 

[PATCH v2 02/10] Move GET_FIELD/SET_FIELD to vas.h

2017-01-25 Thread Sukadev Bhattiprolu
Move the GET_FIELD and SET_FIELD macros to vas.h as VAS and other
users of VAS, including NX-842, can use those macros.

Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/include/asm/vas.h | 8 
 drivers/crypto/nx/nx-842-powernv.c | 1 +
 drivers/crypto/nx/nx-842.h | 5 -
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h
index 1c10437..fef9e87 100644
--- a/arch/powerpc/include/asm/vas.h
+++ b/arch/powerpc/include/asm/vas.h
@@ -37,4 +37,12 @@ enum vas_thresh_ctl {
VAS_THRESH_FIFO_GT_EIGHTH_FULL,
 };
 
+/*
+ * Get/Set bit fields
+ */
+#define GET_FIELD(m, v)        (((v) & (m)) >> MASK_LSH(m))
+#define MASK_LSH(m)            (__builtin_ffsl(m) - 1)
+#define SET_FIELD(m, v, val)   \
+   (((v) & ~(m)) | ((((typeof(v))(val)) << MASK_LSH(m)) & (m)))
+
 #endif
diff --git a/drivers/crypto/nx/nx-842-powernv.c 
b/drivers/crypto/nx/nx-842-powernv.c
index 1710f80..ea6fb6c 100644
--- a/drivers/crypto/nx/nx-842-powernv.c
+++ b/drivers/crypto/nx/nx-842-powernv.c
@@ -22,6 +22,7 @@
 
 #include 
 #include 
+#include 
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Dan Streetman ");
diff --git a/drivers/crypto/nx/nx-842.h b/drivers/crypto/nx/nx-842.h
index a4eee3b..30929bd 100644
--- a/drivers/crypto/nx/nx-842.h
+++ b/drivers/crypto/nx/nx-842.h
@@ -100,11 +100,6 @@ static inline unsigned long nx842_get_pa(void *addr)
return page_to_phys(vmalloc_to_page(addr)) + offset_in_page(addr);
 }
 
-/* Get/Set bit fields */
-#define MASK_LSH(m)(__builtin_ffsl(m) - 1)
-#define GET_FIELD(v, m)(((v) & (m)) >> MASK_LSH(m))
-#define SET_FIELD(v, m, val)   (((v) & ~(m)) | (((val) << MASK_LSH(m)) & (m)))
-
 /**
  * This provides the driver's constraints.  Different nx842 implementations
  * may have varying requirements.  The constraints are:
-- 
2.7.4
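
A short usage sketch of the macros as they appear in the new vas.h,
compiled in userspace with GCC or Clang. Note the (mask, value) argument
order, which differs from the definitions removed from nx-842.h. The
DEMO_FIELD_MASK layout and the demo_set_then_get() helper are
hypothetical, and __typeof__ is used here instead of typeof only so the
sketch also builds in strict ISO C modes.

```c
#include <stdint.h>

/* Same shape as the macros in the patch; __typeof__ is the GNU C spelling. */
#define MASK_LSH(m)		(__builtin_ffsl(m) - 1)
#define GET_FIELD(m, v)		(((v) & (m)) >> MASK_LSH(m))
#define SET_FIELD(m, v, val)	\
	(((v) & ~(m)) | ((((__typeof__(v))(val)) << MASK_LSH(m)) & (m)))

/* Hypothetical 4-bit field occupying bits 4..7 of a register image. */
#define DEMO_FIELD_MASK	0x00f0UL

/* Write val into the field, then read the field back out. */
static uint64_t demo_set_then_get(uint64_t reg, uint64_t val)
{
	reg = SET_FIELD(DEMO_FIELD_MASK, reg, val);
	return GET_FIELD(DEMO_FIELD_MASK, reg);
}
```

Values wider than the field are masked off by SET_FIELD rather than
spilling into neighbouring bits, so writing 0x1f into the 4-bit field
above reads back as 0xf.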



gcc trunk fails to build kernel on PowerPC64 due to oprofile warnings

2017-01-25 Thread Anton Blanchard
Hi,

gcc trunk has failed to build PowerPC64 kernels for a month or so. The issue
is in oprofile, which is common code but ends up being sucked into
arch/powerpc and therefore subject to the -Werror applied to arch/powerpc:
 
linux/arch/powerpc/oprofile/../../../drivers/oprofile/oprofile_stats.c: In 
function ‘oprofile_create_stats_files’:
linux/arch/powerpc/oprofile/../../../drivers/oprofile/oprofile_stats.c:55:25: 
error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes 
into a region of size 7 [-Werror=format-truncation=]
   snprintf(buf, 10, "cpu%d", i);
 ^~
linux/arch/powerpc/oprofile/../../../drivers/oprofile/oprofile_stats.c:55:21: 
note: using the range [1, -2147483648] for directive argument
   snprintf(buf, 10, "cpu%d", i);
 ^~~
linux/arch/powerpc/oprofile/../../../drivers/oprofile/oprofile_stats.c:55:3: 
note: format output between 5 and 15 bytes into a destination of size 10
   snprintf(buf, 10, "cpu%d", i);
   ^
  LD  crypto/async_tx/built-in.o
  CC  lib/random32.o
cc1: all warnings being treated as errors

Anton


[PATCH net 5/5] ibmvnic: init completion struct before requesting long term mapped buffers

2017-01-25 Thread Thomas Falcon
Initialize this completion structure before requesting that
a buffer be long-term mapped. This fix resolves a bug where firmware
sends a response before the structure is initialized.

Signed-off-by: John Allen 
Signed-off-by: Nathan Fontenot 
Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index ec6c5fe..d1ffc61 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -189,9 +189,9 @@ static int alloc_long_term_buff(struct ibmvnic_adapter 
*adapter,
}
ltb->map_id = adapter->map_id;
adapter->map_id++;
+   init_completion(&adapter->fw_done);
send_request_map(adapter, ltb->addr,
 ltb->size, ltb->map_id);
-   init_completion(&adapter->fw_done);
wait_for_completion(&adapter->fw_done);
return 0;
 }
-- 
1.8.3.1
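
The ordering problem can be illustrated with a single-threaded userspace
model of a completion as a counter of pending "done" events. Everything
named demo_* is hypothetical. The real kernel object also carries a wait
queue, and the real race involves the firmware response arriving on
another CPU, so treat this only as a sketch of why initializing after
sending the request can lose the wakeup.

```c
/* Userspace model of a completion: a counter of pending "done" events. */
struct demo_completion {
	unsigned int done;
};

static void demo_init_completion(struct demo_completion *c)
{
	c->done = 0;		/* forgets any event already recorded */
}

static void demo_complete(struct demo_completion *c)
{
	c->done++;		/* what the firmware response handler would do */
}

/* Non-blocking stand-in for wait_for_completion(): 1 if it would return. */
static int demo_would_wake(struct demo_completion *c)
{
	if (!c->done)
		return 0;
	c->done--;
	return 1;
}

/* Returns 1 if the waiter wakes, 0 if the early response was lost. */
static int demo_request(int init_first)
{
	struct demo_completion c = { 0 };

	if (init_first)
		demo_init_completion(&c);
	demo_complete(&c);	/* response arrives right after the request */
	if (!init_first)
		demo_init_completion(&c);	/* old ordering: wipes the event */
	return demo_would_wake(&c);
}
```

With init_first set the waiter sees the event. With the old ordering the
event is erased before the wait begins, which in the driver would leave
wait_for_completion() blocked.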



[PATCH net 4/5] ibmvnic: Fix endian errors in error reporting output

2017-01-25 Thread Thomas Falcon
Error reports received from firmware were not being converted from
big endian values, leading to bogus error codes reported on little
endian systems.

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index ec4eaed..ec6c5fe 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2185,12 +2185,12 @@ static void handle_error_info_rsp(union ibmvnic_crq 
*crq,
 
if (!found) {
dev_err(dev, "Couldn't find error id %x\n",
-   crq->request_error_rsp.error_id);
+   be32_to_cpu(crq->request_error_rsp.error_id));
return;
}
 
dev_err(dev, "Detailed info for error id %x:",
-   crq->request_error_rsp.error_id);
+   be32_to_cpu(crq->request_error_rsp.error_id));
 
for (i = 0; i < error_buff->len; i++) {
pr_cont("%02x", (int)error_buff->buff[i]);
@@ -2269,8 +2269,8 @@ static void handle_error_indication(union ibmvnic_crq 
*crq,
dev_err(dev, "Firmware reports %serror id %x, cause %d\n",
crq->error_indication.
flags & IBMVNIC_FATAL_ERROR ? "FATAL " : "",
-   crq->error_indication.error_id,
-   crq->error_indication.error_cause);
+   be32_to_cpu(crq->error_indication.error_id),
+   be16_to_cpu(crq->error_indication.error_cause));
 
error_buff = kmalloc(sizeof(*error_buff), GFP_ATOMIC);
if (!error_buff)
-- 
1.8.3.1
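
To show the effect on a concrete value, here is a userspace sketch that
decodes big-endian wire bytes explicitly, next to the misreading a
little-endian CPU gets when it prints the field without conversion. The
demo_* helpers are hypothetical and stand in for be32_to_cpu() and
be16_to_cpu(), which operate on already-loaded integers rather than byte
arrays.

```c
#include <stdint.h>

/* Decode big-endian wire bytes into host integers, whatever the host order. */
static uint32_t demo_be32_decode(const uint8_t b[4])
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

static uint16_t demo_be16_decode(const uint8_t b[2])
{
	return (uint16_t)(((uint16_t)b[0] << 8) | b[1]);
}

/* What a little-endian CPU sees if it prints the field without converting. */
static uint32_t demo_le32_misread(const uint8_t b[4])
{
	return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
	       ((uint32_t)b[1] << 8) | (uint32_t)b[0];
}
```

For the bytes 00 00 01 02, the decoded error id is 0x102, while the
unconverted little-endian read is 0x02010000.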



[PATCH net 2/5] ibmvnic: Fix MTU settings

2017-01-25 Thread Thomas Falcon
In the current driver, the MTU is set to the maximum value
supported by the backing device. This patch sets the MTU to the
default value for a Linux net device. It also corrects a
discrepancy between MTU values received from firmware, which include
the ethernet header length, and net device MTU values. Finally, it removes
redundant min/max MTU assignments after device initialization.

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 48debde2..f95f6a4 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1496,7 +1496,7 @@ static void init_sub_crqs(struct ibmvnic_adapter 
*adapter, int retry)
adapter->req_rx_queues = adapter->opt_rx_comp_queues;
adapter->req_rx_add_queues = adapter->max_rx_add_queues;
 
-   adapter->req_mtu = adapter->max_mtu;
+   adapter->req_mtu = adapter->netdev->mtu + ETH_HLEN;
}
 
total_queues = adapter->req_tx_queues + adapter->req_rx_queues;
@@ -2626,12 +2626,12 @@ static void handle_query_cap_rsp(union ibmvnic_crq *crq,
break;
case MIN_MTU:
adapter->min_mtu = be64_to_cpu(crq->query_capability.number);
-   netdev->min_mtu = adapter->min_mtu;
+   netdev->min_mtu = adapter->min_mtu - ETH_HLEN;
netdev_dbg(netdev, "min_mtu = %lld\n", adapter->min_mtu);
break;
case MAX_MTU:
adapter->max_mtu = be64_to_cpu(crq->query_capability.number);
-   netdev->max_mtu = adapter->max_mtu;
+   netdev->max_mtu = adapter->max_mtu - ETH_HLEN;
netdev_dbg(netdev, "max_mtu = %lld\n", adapter->max_mtu);
break;
case MAX_MULTICAST_FILTERS:
@@ -3672,9 +3672,7 @@ static void handle_crq_init_rsp(struct work_struct *work)
goto task_failed;
 
netdev->real_num_tx_queues = adapter->req_tx_queues;
-   netdev->mtu = adapter->req_mtu;
-   netdev->min_mtu = adapter->min_mtu;
-   netdev->max_mtu = adapter->max_mtu;
+   netdev->mtu = adapter->req_mtu - ETH_HLEN;
 
if (adapter->failover) {
adapter->failover = false;
@@ -3814,7 +3812,7 @@ static int ibmvnic_probe(struct vio_dev *dev, const 
struct vio_device_id *id)
}
 
netdev->real_num_tx_queues = adapter->req_tx_queues;
-   netdev->mtu = adapter->req_mtu;
+   netdev->mtu = adapter->req_mtu - ETH_HLEN;
 
rc = register_netdev(netdev);
if (rc) {
-- 
1.8.3.1
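
The conversion this patch applies in several places comes down to adding
or subtracting the Ethernet header length. A minimal sketch, with
hypothetical helper names and DEMO_ETH_HLEN set to 14, the same value as
the kernel's ETH_HLEN:

```c
#include <stdint.h>

#define DEMO_ETH_HLEN	14	/* Ethernet header length, same value as ETH_HLEN */

/*
 * Firmware MTU values include the Ethernet header; netdev MTUs do not.
 * The caller is assumed to pass a firmware value of at least DEMO_ETH_HLEN.
 */
static uint64_t demo_fw_to_netdev_mtu(uint64_t fw_mtu)
{
	return fw_mtu - DEMO_ETH_HLEN;
}

static uint64_t demo_netdev_to_fw_mtu(uint64_t netdev_mtu)
{
	return netdev_mtu + DEMO_ETH_HLEN;
}
```

So a default netdev MTU of 1500 corresponds to a requested firmware value
of 1514.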



[PATCH net 3/5] ibmvnic: Fix endian error when requesting device capabilites

2017-01-25 Thread Thomas Falcon
When an IBM VNIC client driver requests a faulty device setting, the
server returns an acceptable value for the client to request.
This 64 bit value was incorrectly being swapped as a 32 bit value,
resulting in loss of data. This patch corrects that by using
the 64 bit swap function.

Signed-off-by: John Allen 
Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index f95f6a4..ec4eaed 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2388,10 +2388,10 @@ static void handle_request_cap_rsp(union ibmvnic_crq 
*crq,
case PARTIALSUCCESS:
dev_info(dev, "req=%lld, rsp=%ld in %s queue, retrying.\n",
 *req_value,
-(long int)be32_to_cpu(crq->request_capability_rsp.
+(long int)be64_to_cpu(crq->request_capability_rsp.
   number), name);
release_sub_crqs_no_irqs(adapter);
-   *req_value = be32_to_cpu(crq->request_capability_rsp.number);
+   *req_value = be64_to_cpu(crq->request_capability_rsp.number);
init_sub_crqs(adapter, 1);
return;
default:
-- 
1.8.3.1



[PATCH net 1/5] ibmvnic: harden interrupt handler

2017-01-25 Thread Thomas Falcon
Move most interrupt handler processing into a tasklet, and
introduce a delay after re-enabling interrupts to fix timing
issues encountered in hardware testing.

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 21 +++--
 drivers/net/ethernet/ibm/ibmvnic.h |  1 +
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index c125966..09071bf 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -3414,6 +3414,18 @@ static void ibmvnic_handle_crq(union ibmvnic_crq *crq,
 static irqreturn_t ibmvnic_interrupt(int irq, void *instance)
 {
struct ibmvnic_adapter *adapter = instance;
+   unsigned long flags;
+
+   spin_lock_irqsave(&adapter->crq.lock, flags);
+   vio_disable_interrupts(adapter->vdev);
+   tasklet_schedule(&adapter->tasklet);
+   spin_unlock_irqrestore(&adapter->crq.lock, flags);
+   return IRQ_HANDLED;
+}
+
+static void ibmvnic_tasklet(void *data)
+{
+   struct ibmvnic_adapter *adapter = data;
struct ibmvnic_crq_queue *queue = &adapter->crq;
struct vio_dev *vdev = adapter->vdev;
union ibmvnic_crq *crq;
@@ -3421,7 +3433,6 @@ static irqreturn_t ibmvnic_interrupt(int irq, void 
*instance)
bool done = false;
 
spin_lock_irqsave(&queue->lock, flags);
-   vio_disable_interrupts(vdev);
while (!done) {
/* Pull all the valid messages off the CRQ */
while ((crq = ibmvnic_next_crq(adapter)) != NULL) {
@@ -3429,6 +3440,8 @@ static irqreturn_t ibmvnic_interrupt(int irq, void 
*instance)
crq->generic.first = 0;
}
vio_enable_interrupts(vdev);
+   /* delay in case of firmware hiccup */
+   mdelay(10);
crq = ibmvnic_next_crq(adapter);
if (crq) {
vio_disable_interrupts(vdev);
@@ -3439,7 +3452,6 @@ static irqreturn_t ibmvnic_interrupt(int irq, void 
*instance)
}
}
spin_unlock_irqrestore(&queue->lock, flags);
-   return IRQ_HANDLED;
 }
 
 static int ibmvnic_reenable_crq_queue(struct ibmvnic_adapter *adapter)
@@ -3494,6 +3506,7 @@ static void ibmvnic_release_crq_queue(struct 
ibmvnic_adapter *adapter)
 
netdev_dbg(adapter->netdev, "Releasing CRQ\n");
free_irq(vdev->irq, adapter);
+   tasklet_kill(&adapter->tasklet);
do {
rc = plpar_hcall_norets(H_FREE_CRQ, vdev->unit_address);
} while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
@@ -3539,6 +3552,9 @@ static int ibmvnic_init_crq_queue(struct ibmvnic_adapter 
*adapter)
 
retrc = 0;
 
+   tasklet_init(&adapter->tasklet, (void *)ibmvnic_tasklet,
+(unsigned long)adapter);
+
netdev_dbg(adapter->netdev, "registering irq 0x%x\n", vdev->irq);
rc = request_irq(vdev->irq, ibmvnic_interrupt, 0, IBMVNIC_NAME,
 adapter);
@@ -3560,6 +3576,7 @@ static int ibmvnic_init_crq_queue(struct ibmvnic_adapter 
*adapter)
return retrc;
 
 req_irq_failed:
+   tasklet_kill(&adapter->tasklet);
do {
rc = plpar_hcall_norets(H_FREE_CRQ, vdev->unit_address);
} while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
diff --git a/drivers/net/ethernet/ibm/ibmvnic.h 
b/drivers/net/ethernet/ibm/ibmvnic.h
index dd775d9..0d0edc3 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.h
+++ b/drivers/net/ethernet/ibm/ibmvnic.h
@@ -1049,5 +1049,6 @@ struct ibmvnic_adapter {
 
struct work_struct vnic_crq_init;
struct work_struct ibmvnic_xport;
+   struct tasklet_struct tasklet;
bool failover;
 };
-- 
2.7.4



Re: [PATCH v4 2/5] ia64: reuse append_elf_note() and final_note() functions

2017-01-25 Thread Hari Bathini



On Tuesday 24 January 2017 11:53 PM, Tony Luck wrote:

On Tue, Jan 24, 2017 at 10:11 AM, Hari Bathini
 wrote:


Hello IA64 folks,

Could you please review this patch..?

It looks OK in principle.  My lab is in partial disarray at the
moment (just got back from a sabbatical) so I can't test
build and boot. Have you cross-compiled it (or gotten a success
build report from zero-day)?


I haven't gotten a success/failure build report from zero-day. Not sure 
what to make of it.

But I did try cross-compiling and it was successful. Should that do?

Thanks
Hari



If you have ... then add an Acked-by: Tony Luck 

-Tony





[PATCH] tty: serial: constify uart_ops structures

2017-01-25 Thread Bhumika Goyal
Declare uart_ops structures as const as they are only stored in the ops
field of an uart_port structure. This field is of type const, so
uart_ops structures having this property can be made const too.
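
A miniature, self-contained sketch of the pattern, with hypothetical
demo_* types standing in for uart_port and uart_ops. Because the port
only keeps a pointer-to-const to its ops table, the table itself can be
declared const, which lets the toolchain place it in read-only data
rather than in writable .data.

```c
/* Hypothetical miniature of the uart_port / uart_ops relationship. */
struct demo_port;

struct demo_ops {
	unsigned int (*tx_empty)(struct demo_port *port);
};

struct demo_port {
	const struct demo_ops *ops;	/* pointer to const, as in uart_port */
	unsigned int fifo_level;
};

/* Report whether the (pretend) transmit FIFO has drained. */
static unsigned int demo_tx_empty(struct demo_port *port)
{
	return port->fifo_level == 0;
}

/* const lets the table live in read-only data instead of writable .data. */
static const struct demo_ops demo_uart_ops = {
	.tx_empty = demo_tx_empty,
};
```

Callers are unaffected: port->ops->tx_empty(port) works the same way
whether or not the table is const.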

File size details before and after patching.
First line of every .o file shows the file size before patching
and second line shows the size after patching.

   text    data     bss     dec     hex filename

   2977     456      64    3497     da9 drivers/tty/serial/amba-pl010.o
   3169     272      64    3505     db1 drivers/tty/serial/amba-pl010.o

   3109     456       0    3565     ded drivers/tty/serial/efm32-uart.o
   3301     272       0    3573     df5 drivers/tty/serial/efm32-uart.o

  10668     753       1   11422    2c9e drivers/tty/serial/icom.o
  10860     561       1   11422    2c9e drivers/tty/serial/icom.o

  23904     408       8   24320    5f00 drivers/tty/serial/ioc3_serial.o
  24088     224       8   24320    5f00 drivers/tty/serial/ioc3_serial.o

  10516     560       4   11080    2b48 drivers/tty/serial/ioc4_serial.o
  10709     368       4   11081    2b49 drivers/tty/serial/ioc4_serial.o

   7853     648    1216    9717    25f5 drivers/tty/serial/mpsc.o
   8037     456    1216    9709    25ed drivers/tty/serial/mpsc.o

  10248     456       0   10704    29d0 drivers/tty/serial/omap-serial.o
  10440     272       0   10712    29d8 drivers/tty/serial/omap-serial.o

   8122     532    1984   10638    298e drivers/tty/serial/pmac_zilog.o
   8306     340    1984   10630    2986 drivers/tty/serial/pmac_zilog.o

   3808     456       0    4264    10a8 drivers/tty/serial/pxa.o
   4000     264       0    4264    10a8 drivers/tty/serial/pxa.o

  21781    3864       0   25645    642d drivers/tty/serial/serial-tegra.o
  22037    3608       0   25645    642d drivers/tty/serial/serial-tegra.o

   2481     456      96    3033     bd9 drivers/tty/serial/sprd_serial.o
   2673     272      96    3041     be1 drivers/tty/serial/sprd_serial.o

   5534     300     512    6346    18ca drivers/tty/serial/vr41xx_siu.o
   5630     204     512    6346    18ca drivers/tty/serial/vr41xx_siu.o

   6730    1576     128    8434    20f2 drivers/tty/serial/vt8500_serial.o
   6986    1320     128    8434    20f2 drivers/tty/serial/vt8500_serial.o

Cross compiled for mips architecture.

   3005     488       0    3493     da5 drivers/tty/serial/pnx8xxx_uart.o
   3189     304       0    3493     da5 drivers/tty/serial/pnx8xxx_uart.o

   4272     196    1056    5524    1594 drivers/tty/serial/dz.o
   4368     100    1056    5524    1594 drivers/tty/serial/dz.o

   6551     144      16    6711    1a37 drivers/tty/serial/ip22zilog.o
   6647      48      16    6711    1a37 drivers/tty/serial/ip22zilog.o

   9612     428    1520   11560    2d28 drivers/tty/serial/serial_txx9.o
   9708     332    1520   11560    2d28 drivers/tty/serial/serial_txx9.o

   4156     296      16    4468    1174 drivers/tty/serial/ar933x_uart.o
   4252     200      16    4468    1174 drivers/tty/serial/ar933x_uart.o

Cross compiled for arm architecture.

  11716    1780      44   13540    34e4 drivers/tty/serial/sirfsoc_uart.o
  11808    1688      44   13540    34e4 drivers/tty/serial/sirfsoc_uart.o

  13352     596      56   14004    36b4 drivers/tty/serial/amba-pl011.o
  13444     504      56   14004    36b4 drivers/tty/serial/amba-pl011.o

Cross compiled for sparc architecture.

   4664     528      32    5224    1468 drivers/tty/serial/sunhv.o
   4848     344      32    5224    1468 drivers/tty/serial/sunhv.o

   8080     332      28    8440    20f8 drivers/tty/serial/sunzilog.o
   8184     228      28    8440    20f8 drivers/tty/serial/sunzilog.o

Cross compiled for ia64 architecture.

  10226     549     472   11247    2bef drivers/tty/serial/sn_console.o
  10414     365     472   11251    2bf3 drivers/tty/serial/sn_console.o

The files drivers/tty/serial/zs.o, drivers/tty/serial/lpc32xx_hs.o and
drivers/tty/serial/lantiq.o did not compile.

Signed-off-by: Bhumika Goyal 
---
 drivers/tty/serial/amba-pl010.c   | 2 +-
 drivers/tty/serial/amba-pl011.c   | 2 +-
 drivers/tty/serial/ar933x_uart.c  | 2 +-
 drivers/tty/serial/dz.c           | 2 +-
 drivers/tty/serial/efm32-uart.c   | 2 +-
 drivers/tty/serial/icom.c         | 2 +-
 drivers/tty/serial/ioc3_serial.c  | 2 +-
 drivers/tty/serial/ioc4_serial.c  | 2 +-
 drivers/tty/serial/ip22zilog.c    | 2 +-
 drivers/tty/serial/lantiq.c       | 2 +-
 drivers/tty/serial/lpc32xx_hs.c   | 2 +-
 drivers/tty/serial/mpsc.c         | 2 +-
 drivers/tty/serial/omap-serial.c  | 2 +-
 drivers/tty/serial/pmac_zilog.c   | 2 +-
 drivers/tty/serial/pnx8xxx_uart.c | 2 +-
 drivers/tty/serial/pxa.c          | 2 +-
 drivers/tty/serial/serial-tegra.c | 2 +-
 drivers/tty/serial/serial_txx9.c  | 2 +-
 drivers/tty/serial/sirfsoc_uart.c | 2 +-
 drivers/tty/serial/sn_console.c   | 2 +-
 drivers/tty/serial/sprd_serial.c  | 2 +-
 drivers/tty/serial/sunhv.c        | 2 +-
 

[PATCH] powerpc/mm: use the correct pointer when setting a 2M pte

2017-01-25 Thread Reza Arbab
When setting a 2M pte, radix__map_kernel_page() is using the address

ptep = (pte_t *)pudp;

Fix this conversion to use pmdp instead. Use pmdp_ptep() to do this
instead of casting the pointer.

Signed-off-by: Reza Arbab 
---
 arch/powerpc/mm/pgtable-radix.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index cfa53cc..34f1a0d 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -65,7 +65,7 @@ int radix__map_kernel_page(unsigned long ea, unsigned long pa,
if (!pmdp)
return -ENOMEM;
if (map_page_size == PMD_SIZE) {
-   ptep = (pte_t *)pudp;
+   ptep = pmdp_ptep(pmdp);
goto set_the_pte;
}
ptep = pte_alloc_kernel(pmdp, ea);
@@ -90,7 +90,7 @@ int radix__map_kernel_page(unsigned long ea, unsigned long pa,
}
pmdp = pmd_offset(pudp, ea);
if (map_page_size == PMD_SIZE) {
-   ptep = (pte_t *)pudp;
+   ptep = pmdp_ptep(pmdp);
goto set_the_pte;
}
if (!pmd_present(*pmdp)) {
-- 
1.8.3.1



Re: BUILD_BUG_ON(!__builtin_constant_p(feature)) breaks bcc trace tool

2017-01-25 Thread Arnd Bergmann
On Wed, Jan 25, 2017 at 11:35 AM, David Laight  wrote:
> From: Michael Ellerman

>> #define inline inline __attribute__((always_inline)) notrace
>>
>> So in fact every inline function is marked always_inline all the time,
>> which seems dubious.
>
> I've had to do that in the past to get gcc to inline some small leaf functions
> that were only called once.
> (That was for some embedded code where I don't actually want any function
> calls at all.)
>
> To my mind 'inline' should mean 'always_inline' since you need to explicitly
> stop functions being inlined even when not marked 'inline'.

On x86, this is configurable using OPTIMIZE_INLINING, but IIRC no other
architecture supports it in mainline Linux. I have played a bit with
enabling it on ARM, which showed a couple of build warnings and errors
from the changed inlining, but they were all fixable. I have not submitted
that since I have not actually been able to do extensive runtime testing
on it.

It may turn out to be worthwhile for powerpc, where you have a much more
limited set of configurations you care about and you already do some
regular testing. Potentially this is more efficient than the current
default, and any function that actually requires being forced inline can
be annotated as __always_inline rather than plain inline.

 Arnd


[PowerPC] [next-20170125] WARNING at kernel/sched/sched.h:804 set_next_entity+0xb88/0xca0

2017-01-25 Thread abdul

Hi,

Today's next tree has warning messages in dmesg when running rcutorture tests.

Machine : Power8 PowerVM LPAR
Build kernel : 4.10.0-rc5-next-20170125

Steps to recreate:
1. modprobe rcutorture
2. With 32 CPUs: offline CPUs 0-15 while keeping CPUs 16-31 online, and vice versa, in a loop
3. modprobe -r rcutorture

trace messages:
rcu-torture: Creating rcu_torture_cbflood task
rcu-torture: rcu_torture_cbflood task started
rcu-torture: rcu_torture_cbflood task started
rq->clock_update_flags < RQCF_ACT_SKIP
------------[ cut here ]------------
WARNING: CPU: 1 PID: 8780 at kernel/sched/sched.h:804 set_next_entity+0xb88/0xca0
Modules linked in: rcutorture torture tun bridge stp llc kvm xt_tcpudp ipt_REJECT nf_reject_ipv4 xt_conntrack nfnetlink iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_filter vmx_crypto pseries_rng rng_core binfmt_misc nfsd ip_tables x_tables autofs4
CPU: 1 PID: 8780 Comm: torture_shuffle Not tainted 4.10.0-rc5-next-20170125-autotest #1
task: c003a652 task.stack: c003aa3e4000
NIP: c16c1a18 LR: c16c1a14 CTR: c1b38d30
REGS: c003aa3e7900 TRAP: 0700   Not tainted (4.10.0-rc5-next-20170125-autotest)
MSR: 8282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
  CR: 28042022  XER: 
CFAR: c1f8c204 SOFTE: 0
GPR00: c16c1a14 c003aa3e7b80 c2631500 0026
GPR04:  0006 6574616470755f6b 3c207367616c665f
GPR08:  c230bdc0 0003bda23000 01dd
GPR12: 2200 cea80400 c169e0d8 c003a643ff00
GPR16:    
GPR20:   0001 c264aaa8
GPR24:  d6370cb8 d6370400 c264aaa8
GPR28: d63712f0 c003bfd75a80 c003bfd75a00 c003acd74a80
NIP [c16c1a18] set_next_entity+0xb88/0xca0
LR [c16c1a14] set_next_entity+0xb84/0xca0
Call Trace:
[c003aa3e7b80] [c16c1a14] set_next_entity+0xb84/0xca0 (unreliable)
[c003aa3e7c30] [c16c1b6c] set_curr_task_fair+0x3c/0x60
[c003aa3e7c60] [c16ace64] do_set_cpus_allowed+0xd4/0x1c0
[c003aa3e7ca0] [c16ad6e8] __set_cpus_allowed_ptr+0x128/0x290
[c003aa3e7d10] [d636eaec] torture_kthread_stopping+0x1dc/0x710 [torture]
[c003aa3e7dc0] [c169e21c] kthread+0x14c/0x190
[c003aa3e7e30] [c15cbc60] ret_from_kernel_thread+0x5c/0x7c
Instruction dump:
fbdf0068 4bfff4fc 3d42fff2 892af4ab 2f89 40fef8a4 3921 3c62ffb4
38636328 992af4ab 488ca7b5 6000 <0fe0> 4bfff884 3d02fff2 8928f4ab
---[ end trace 33443e753ef16727 ]---
rcu-torture: Stopping rcu_torture_reader
rcu-torture: Stopping rcu_torture_reader
rcu-torture: Stopping torture_shuffle task
rcu-torture: Stopping rcu_torture_reader

Config file attached
#
# Automatically generated file; DO NOT EDIT.
# Linux/powerpc 4.9.0-rc8 Kernel Configuration
#
CONFIG_PPC64=y

#
# Processor support
#
CONFIG_PPC_BOOK3S_64=y
# CONFIG_PPC_BOOK3E_64 is not set
# CONFIG_POWER7_CPU is not set
CONFIG_POWER8_CPU=y
CONFIG_PPC_BOOK3S=y
CONFIG_PPC_FPU=y
CONFIG_ALTIVEC=y
CONFIG_VSX=y
# CONFIG_PPC_ICSWX is not set
CONFIG_PPC_STD_MMU=y
CONFIG_PPC_STD_MMU_64=y
CONFIG_PPC_RADIX_MMU=y
CONFIG_PPC_MM_SLICES=y
CONFIG_PPC_HAVE_PMU_SUPPORT=y
CONFIG_PPC_PERF_CTRS=y
CONFIG_SMP=y
CONFIG_NR_CPUS=2048
CONFIG_PPC_DOORBELL=y
# CONFIG_CPU_BIG_ENDIAN is not set
CONFIG_CPU_LITTLE_ENDIAN=y
CONFIG_PPC64_BOOT_WRAPPER=y
CONFIG_64BIT=y
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_MMU=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NR_IRQS=512
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_ILOG2_U32=y
CONFIG_ARCH_HAS_ILOG2_U64=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_HAS_DMA_SET_COHERENT_MASK=y
CONFIG_PPC=y
CONFIG_GENERIC_CSUM=y
CONFIG_EARLY_PRINTK=y
CONFIG_PANIC_TIMEOUT=180
CONFIG_COMPAT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_PPC_UDBG_16550=y
# CONFIG_GENERIC_TBSYNC is not set
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
CONFIG_EPAPR_BOOT=y
# CONFIG_DEFAULT_UIMAGE is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
# CONFIG_PPC_DCR_NATIVE is not set
# CONFIG_PPC_DCR_MMIO is not set
# CONFIG_PPC_OF_PLATFORM_PCI is not set
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_PPC_EMULATE_SSTEP=y
CONFIG_ZONE_DMA32=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZI

[PATCH] KVM: PPC: Book3S PR: Refactor program interrupt related code into separate function

2017-01-25 Thread Thomas Huth
The function kvmppc_handle_exit_pr() is quite huge and thus hard to read,
and even contains a "spaghetti-code"-like goto between the different case
labels of the big switch statement. This can be made much more readable
by moving the code related to injecting program interrupts / instruction
emulation into a separate function instead.
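
As a rough illustration of the kind of refactoring described above, here is a self-contained sketch. All names (`toy_exit`, `toy_exit_progint()`, `toy_handle_exit()`) and the FP-unavailable branch are hypothetical stand-ins, not the KVM code; the point is only that a path formerly reached by a goto between case labels becomes an ordinary function call.

```c
#include <assert.h>
#include <stddef.h>

enum toy_exit { TOY_EXIT_PROGRAM, TOY_EXIT_FP_UNAVAIL, TOY_EXIT_OTHER };
enum toy_resume { TOY_RESUME_GUEST, TOY_RESUME_HOST };

/*
 * Shared path that used to live under a label inside the switch.
 * *injected records which kind of interrupt the toy "injects":
 * 1 = flags taken from the exit itself, 2 = fallback illegal-instruction.
 */
static enum toy_resume toy_exit_progint(enum toy_exit nr, int *injected)
{
	*injected = (nr == TOY_EXIT_PROGRAM) ? 1 : 2;
	return TOY_RESUME_GUEST;
}

enum toy_resume toy_handle_exit(enum toy_exit nr, int fpu_emulated,
				int *injected)
{
	assert(injected != NULL);
	*injected = 0;

	switch (nr) {
	case TOY_EXIT_PROGRAM:
		return toy_exit_progint(nr, injected);
	case TOY_EXIT_FP_UNAVAIL:
		if (fpu_emulated)
			return TOY_RESUME_GUEST;
		/* before: goto program_interrupt; after: a plain call */
		return toy_exit_progint(nr, injected);
	default:
		return TOY_RESUME_HOST;
	}
}
```

The helper takes the exit number as a parameter, so each caller keeps its own context without sharing local variables across case labels.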

Signed-off-by: Thomas Huth 
---
 arch/powerpc/kvm/book3s_pr.c | 130 +--
 1 file changed, 65 insertions(+), 65 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 1482961..d4dfc0c 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -902,6 +902,69 @@ static void kvmppc_clear_debug(struct kvm_vcpu *vcpu)
}
 }
 
+static int kvmppc_exit_pr_progint(struct kvm_run *run, struct kvm_vcpu *vcpu,
+ unsigned int exit_nr)
+{
+   enum emulation_result er;
+   ulong flags;
+   u32 last_inst;
+   int emul, r;
+
+   /*
+* shadow_srr1 only contains valid flags if we came here via a program
+* exception. The other exceptions (emulation assist, FP unavailable,
+* etc.) do not provide flags in SRR1, so use an illegal-instruction
+* exception when injecting a program interrupt into the guest.
+*/
+   if (exit_nr == BOOK3S_INTERRUPT_PROGRAM)
+   flags = vcpu->arch.shadow_srr1 & 0x1f0000ull;
+   else
+   flags = SRR1_PROGILL;
+
+   emul = kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
+   if (emul != EMULATE_DONE)
+   return RESUME_GUEST;
+
+   if (kvmppc_get_msr(vcpu) & MSR_PR) {
+#ifdef EXIT_DEBUG
+   pr_info("Userspace triggered 0x700 exception at\n 0x%lx (0x%x)\n",
+   kvmppc_get_pc(vcpu), last_inst);
+#endif
+   if ((last_inst & 0xff0007ff) != (INS_DCBZ & 0xfffffff7)) {
+   kvmppc_core_queue_program(vcpu, flags);
+   return RESUME_GUEST;
+   }
+   }
+
+   vcpu->stat.emulated_inst_exits++;
+   er = kvmppc_emulate_instruction(run, vcpu);
+   switch (er) {
+   case EMULATE_DONE:
+   r = RESUME_GUEST_NV;
+   break;
+   case EMULATE_AGAIN:
+   r = RESUME_GUEST;
+   break;
+   case EMULATE_FAIL:
+   pr_crit("%s: emulation at %lx failed (%08x)\n",
+   __func__, kvmppc_get_pc(vcpu), last_inst);
+   kvmppc_core_queue_program(vcpu, flags);
+   r = RESUME_GUEST;
+   break;
+   case EMULATE_DO_MMIO:
+   run->exit_reason = KVM_EXIT_MMIO;
+   r = RESUME_HOST_NV;
+   break;
+   case EMULATE_EXIT_USER:
+   r = RESUME_HOST_NV;
+   break;
+   default:
+   BUG();
+   }
+
+   return r;
+}
+
 int kvmppc_handle_exit_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
  unsigned int exit_nr)
 {
@@ -1044,71 +1107,8 @@ int kvmppc_handle_exit_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
break;
case BOOK3S_INTERRUPT_PROGRAM:
case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
-   {
-   enum emulation_result er;
-   ulong flags;
-   u32 last_inst;
-   int emul;
-
-program_interrupt:
-   /*
-* shadow_srr1 only contains valid flags if we came here via
-* a program exception. The other exceptions (emulation assist,
-* FP unavailable, etc.) do not provide flags in SRR1, so use
-* an illegal-instruction exception when injecting a program
-* interrupt into the guest.
-*/
-   if (exit_nr == BOOK3S_INTERRUPT_PROGRAM)
-   flags = vcpu->arch.shadow_srr1 & 0x1f0000ull;
-   else
-   flags = SRR1_PROGILL;
-
-   emul = kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
-   if (emul != EMULATE_DONE) {
-   r = RESUME_GUEST;
-   break;
-   }
-
-   if (kvmppc_get_msr(vcpu) & MSR_PR) {
-#ifdef EXIT_DEBUG
-   pr_info("Userspace triggered 0x700 exception at\n 0x%lx (0x%x)\n",
-   kvmppc_get_pc(vcpu), last_inst);
-#endif
-   if ((last_inst & 0xff0007ff) !=
-   (INS_DCBZ & 0xfffffff7)) {
-   kvmppc_core_queue_program(vcpu, flags);
-   r = RESUME_GUEST;
-   break;
-   }
-   }
-
-   vcpu->stat.emulated_inst_exits++;
-   er = kvmppc_emulate_instruction(run, vcpu);
-   switch (er) {
-   case EMULATE_DONE:
-   r = 

"Unable to handle kernel paging request for instruction fetch" on P4080

2017-01-25 Thread Thomas De Schampheleire
Hi,

We are experiencing kernel panics of the type "Unable to handle kernel paging
request for instruction fetch" but are stuck in our analysis. We would
appreciate any help you can give.

The problem occurs from time to time on different instances of a particular
embedded system. The kernel is very old, 2.6.36.4, and runs on an 8-core
Freescale P4080 QorIQ processor (e500mc-based). Upgrading the kernel to a newer
version is not an option at this point.

Here is an example panic log: (Panic 1)

<1>[2497591.165517] Unable to handle kernel paging request for instruction fetch
<1>[2497591.248027] Faulting instruction address: 0x4837ad68
<4>[2497591.309541] Oops: Kernel access of bad area, sig: 11 [#1]
<4>[2497591.376218] PREEMPT SMP NR_CPUS=8 P4080 DS
<0>[2497591.427290] last sysfs file:
/sys/module/ndps_b_reboot_helper/parameters/panic_counter
<4>[2497591.524180] Modules linked in: uio_generic_driver
reborn_macfilter ndps_b_reboot_helper ramoops cpld_watchdog physmap_of
sysfs_exports write_unlock ndps_a_cpld reborn_class generic_access
<4>[2497591.726423] NIP: 4837ad68 LR: 4837ad69 CTR: c0009c30
<4>[2497591.787899] REGS: e3619cc0 TRAP: 0400   Tainted: GW
(2.6.36.4)
<4>[2497591.871243] MSR: 00029002   CR: 24002844  XER: 
<4>[2497591.946313] TASK = ec2dc600[444] 'watchdog_monito' THREAD:
e3618000 CPU: 0
<4>[2497592.028624] GPR00: 4837ad69 e3619d70 ec2dc600 0001
00260100 0008 0001 039d
<4>[2497592.130812] GPR08:   00029002 c3389040
 100d624c 3b9ac9ff 0201
<4>[2497592.233007] GPR16: 03fb 1009d604 1424120d e3638000
0200 0678 0001 039d
<4>[2497592.335209] GPR24: 1424120d ec1f3800 0001 481c
4819b729 7fe5fb78 7c641b78 3c60c042
<4>[2497592.439505] NIP [4837ad68] 0x4837ad68
<4>[2497592.485353] LR [4837ad69] 0x4837ad69
<4>[2497592.530159] Call Trace:
<4>[2497592.561424] [e3619d70] [4837ad69] 0x4837ad69 (unreliable)
<4>[2497592.628123] Instruction dump:
<4>[2497592.665641]     
  
<4>[2497592.760511]     
  
<0>[2497592.855933] Kernel panic - not syncing: Fatal exception
<4>[2497592.920568] Call Trace:
<4>[2497592.951911] [e3619bf0] [c0006fe4] show_stack+0x78/0x18c (unreliable)
<4>[2497593.030177] [e3619c30] [c0389c34] panic+0xc0/0x1e8
<4>[2497593.089648] [e3619c80] [c000ba9c] die+0x1f0/0x1fc
<4>[2497593.148080] [e3619ca0] [c001109c] bad_page_fault+0xb0/0xc8
<4>[2497593.215872] [e3619cb0] [c000e3f8] handle_page_fault+0x7c/0x80
<4>[2497593.286798] --- Exception: 400 at 0x4837ad68
<4>[2497593.286803] LR = 0x4837ad69

Our analysis of the above panic log led to the conclusion
that there is a kernel stack corruption, such that when restoring the link
register (LR) from the stack frame near the end of a function, an invalid value
is restored, so that the fetching of that instruction upon return from that
function causes an instruction fetch exception (0x400).
This is based on the fact that registers R27-R31 contain values that are
actually coming from the vmlinux image itself, in this case they are parts
from the function smp_message_recv. In another case, we have seen opcodes from
generic_smp_call_function_interrupt.
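
One way to sanity-check the observation that these registers hold instruction words is to look at the primary opcode, i.e. the top six bits of each 32-bit value. The sketch below is a userspace helper with hypothetical names (`ppc_primary_opcode()`, `ppc_is_branch_and_link()`, `ppc_count_plausible_insns()`); it only recognises three primary opcodes (15 for addis/lis, 18 for the I-form branch, 31 for the X-form group) and is a heuristic, since arbitrary data can decode as a valid instruction too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Primary opcode: the six most significant bits of a 32-bit instruction word. */
static inline unsigned int ppc_primary_opcode(uint32_t word)
{
	return word >> 26;
}

/* Primary opcode 18 is the I-form branch; the lowest bit is LK ("and link"). */
static inline int ppc_is_branch_and_link(uint32_t word)
{
	return ppc_primary_opcode(word) == 18 && (word & 1u);
}

/*
 * Count how many words decode to one of three common primary opcodes:
 * 15 (addis, which includes lis), 18 (branch) or 31 (X-form group).
 */
size_t ppc_count_plausible_insns(const uint32_t *words, size_t n)
{
	size_t hits = 0;

	assert(words != NULL);
	for (size_t i = 0; i < n; i++) {
		unsigned int op = ppc_primary_opcode(words[i]);

		if (op == 15 || op == 18 || op == 31)
			hits++;
	}
	return hits;
}
```

Applied to the R28-R31 values from Panic 1 (4819b729, 7fe5fb78, 7c641b78, 3c60c042), all four fall into these opcode groups, and the LR value 4837ad69 has primary opcode 18 with the LK bit set, which is consistent with it being a branch-and-link instruction word rather than a return address.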

In a typical function epilogue (e.g. from doorbell_message_pass) we see:

   0xc0009de8 <+440>:   lwz     r0,52(r1)     # load value from memory address (r1+52) and store into r0
   0xc0009dec <+444>:   lwz     r25,20(r1)
   0xc0009df0 <+448>:   lwz     r26,24(r1)
   0xc0009df4 <+452>:   mtlr    r0            # store value of r0 into LR (link register)
   0xc0009df8 <+456>:   lwz     r27,28(r1)    # load value from memory address (r1+28) and store into r27
   0xc0009dfc <+460>:   lwz     r28,32(r1)    # r28 <- (r1+32)
   0xc0009e00 <+464>:   lwz     r29,36(r1)    # r29 <- (r1+36)
   0xc0009e04 <+468>:   lwz     r30,40(r1)    # r30 <- (r1+40)
   0xc0009e08 <+472>:   lwz     r31,44(r1)    # r31 <- (r1+44)
   0xc0009e0c <+476>:   addi    r1,r1,48
   0xc0009e10 <+480>:   blr                   # jump to value stored in link register

So, when the values at r1 + 52 (and other offsets nearby) are corrupted, then
after this code the registers r0, r25-31 contain bogus values. As r0 is used to
provision the link register LR, an instruction fetch abort is unavoidable.

Unfortunately, it is unclear what would cause the corruption of memory.
Moreover, in some of the other kernel panics, we do _not_ see this pattern of
r27-r31 containing values coming from the .text section of the kernel image
itself. There may be different corruption mechanisms at play here.

While the signature of the different panics varies in several ways, the
recurring aspects are:
- always reported on CPU 0 (even though this is an SMP system with 8 running
  cores)
- R0 always contains the same value as LR, confirming that LR is always restored
  from a corrupted R0 value
- the CTR register is always the same 

RE: BUILD_BUG_ON(!__builtin_constant_p(feature)) breaks bcc trace tool

2017-01-25 Thread David Laight
From: Michael Ellerman
> Sent: 24 January 2017 06:16
> Anton Blanchard  writes:
> >> We added:
> >>
> >> BUILD_BUG_ON(!__builtin_constant_p(feature))
> >>
> >> to cpu_has_feature() and mmu_has_feature() in order to catch usage
> >> issues (such as cpu_has_feature(cpu_has_feature(X)). Unfortunately
> >> LLVM isn't smart enough to resolve this, and it errors out.
> >>
> >> I work around it in my clang/LLVM builds of the kernel, but I have
> >> just discovered that it causes a lot of issues for the bcc (eBPF)
> >> trace tool (which uses LLVM).
> >>
> >> How should we work around this? Wrap the checks in !clang perhaps?
> >
> > Looks like it's a weakness in LLVM with inlining:
> >
> > #include <assert.h>
> >
> > #if 1
> > static inline void foo(unsigned long x)

Does making this 'const unsigned long x' help?
ISTR that making a slight difference to the instruction patterns
gcc would use.

> > {
> > assert(__builtin_constant_p(x));
> > }
> > #else
> > #define foo(X) assert(__builtin_constant_p(X))
> > #endif
...
> #define inline inline __attribute__((always_inline)) notrace
> 
> So in fact every inline function is marked always_inline all the time,
> which seems dubious.

I've had to do that in the past to get gcc to inline some small leaf functions
that were only called once.
(That was for some embedded code where I don't actually want any function calls
at all.)

To my mind 'inline' should mean 'always_inline' since you need to explicitly
stop functions being inlined even when not marked 'inline'.

David
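
For reference, the difference between the two forms discussed in this thread can be sketched as follows. This is a minimal example assuming a GCC-compatible compiler; the names `is_constant_macro`, `is_constant_inline()`, `macro_on_literal()`, `inline_on_literal()` and `check_macro_form()` are made up for illustration. The macro applies the builtin directly to the caller's expression, while the function applies it to a parameter, so its result depends on the compiler, the optimisation level and whether the constant is propagated after inlining.

```c
#include <assert.h>

/*
 * Macro form, as in the #else branch quoted above: the builtin is applied
 * directly to the caller's expression.
 */
#define is_constant_macro(X) __builtin_constant_p(X)

/*
 * Function form: the builtin is applied to the parameter, so the result
 * depends on the compiler inlining the call and propagating the constant.
 */
static inline __attribute__((always_inline))
int is_constant_inline(unsigned long x)
{
	return __builtin_constant_p(x);
}

int macro_on_literal(void)
{
	return is_constant_macro(42UL);
}

int inline_on_literal(void)
{
	/* May be 0 or 1; no particular value is guaranteed here. */
	return is_constant_inline(42UL);
}

void check_macro_form(void)
{
	/* A literal handed straight to the builtin is a constant expression. */
	assert(is_constant_macro(42UL));
}
```

Only the macro form gives a result that does not depend on the inliner, which is why a BUILD_BUG_ON() placed inside an inline function can behave differently between compilers.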



Re: [PATCH] powerpc: opal-msglog: Report size of memcons log

2017-01-25 Thread Joel Stanley
On Wed, Jan 25, 2017 at 1:33 PM, Michael Ellerman  wrote:
> Michael Ellerman  writes:
>
>> Joel Stanley  writes:
>>
>>> The OPAL memory console is reported to be size zero, as we do not
>>> initialise the struct attr with any size information due to the size
>>> being variable. This leads users to think that the console is empty.
>>
>> Hmm OK. That is a general property of /proc and /sys files that are
>> dynamically generated, so users probably need to get used to it :)
>>
>>> Instead report the maximum size.
>>
>> But OK. That sounds sane enough. My only worry is that it might confuse
>> some tools, ie. the file claims to be x bytes but is actually smaller.
>> But I guess that can actually happen anyway with any file.
>>
>> So I'll merge this and stop blabbing :)
>
> Hmm, but then I get:
>
>   $ ls -la msglog
>   -r--r--r-- 1 root root 4503599627370496 Jan 25 13:09 msglog
>
> I know firmware likes to spit out lots of messages, but 4PB seems a bit
> large :P
>
> I fixed it with the patch below which I'll fold in, resulting in:
>
>   $ ls -la msglog
>   -r--r--r-- 1 root root 1048576 Jan 25 13:30 msglog
>
>
> Which seems more likely.

My bad. In my excitement I forgot to mention that this was just an
idea, and I had not boot tested it.

Thanks for testing it and finding the bug.

Cheers,

Joel


[PATCH v6 5/5] Documentation:powerpc: Add device-tree bindings for power-mgt

2017-01-25 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

Document the device-tree bindings defining the properties under
the @power-mgt node in the device tree that describe the idle states
for Linux running on baremetal POWER servers.

These bindings are documented separately instead of using the
common idle state bindings, since the idle states on POWER servers
are exposed as property arrays whereas the common idle state bindings
expect idle states to be described as nodes.
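
The "property arrays" layout can be pictured with a small sketch: entry i of each array describes idle state i. The struct and function names below (`idle_state_view`, `idle_state_at()`) are hypothetical and the code does no device-tree parsing; it only shows how parallel arrays are combined into a per-state view.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One idle state, gathered from the parallel property arrays. */
struct idle_state_view {
	const char *name;	/* from ibm,cpu-idle-state-names */
	uint32_t flags;		/* from ibm,cpu-idle-state-flags */
	uint32_t latency_ns;	/* from ibm,cpu-idle-state-latencies-ns */
};

/*
 * Entry i of every array describes idle state i. All arrays are assumed
 * to have the same length, count. Returns 0 on success, -1 if i is out
 * of range.
 */
int idle_state_at(const char *const names[], const uint32_t flags[],
		  const uint32_t latencies_ns[], size_t count, size_t i,
		  struct idle_state_view *out)
{
	assert(names && flags && latencies_ns && out);

	if (i >= count)
		return -1;

	out->name = names[i];
	out->flags = flags[i];
	out->latency_ns = latencies_ns[i];
	return 0;
}
```

With a node-based binding each state would instead carry its own properties, so no index bookkeeping of this kind would be needed.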

Cc: Rob Herring 
Signed-off-by: Gautham R. Shenoy 
---
 .../devicetree/bindings/powerpc/opal/power-mgt.txt | 118 +
 1 file changed, 118 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/powerpc/opal/power-mgt.txt

diff --git a/Documentation/devicetree/bindings/powerpc/opal/power-mgt.txt b/Documentation/devicetree/bindings/powerpc/opal/power-mgt.txt
new file mode 100644
index 0000000..9d619e9
--- /dev/null
+++ b/Documentation/devicetree/bindings/powerpc/opal/power-mgt.txt
@@ -0,0 +1,118 @@
+IBM Power-Management Bindings
+=============================
+
+Linux running on baremetal POWER machines has access to the processor
+idle states. The description of these idle states is exposed via the
+node @power-mgt in the device-tree by the firmware.
+
+Definitions:
+
+Typically each idle state has the following associated properties:
+
+- name: The name of the idle state as defined by the firmware.
+
+- flags: indicating some aspects of this idle states such as the
+ extent of state-loss, whether timebase is stopped on this
+ idle states and so on. The flag bits are as follows:
+
+- exit-latency: The latency involved in transitioning the state of the
+   CPU from idle to running.
+
+- target-residency: The minimum time that the CPU needs to reside in
+   this idle state in order to accrue power-savings
+   benefit.
+
+Properties
+
+The following properties provide details about the idle states. These
+properties are exposed as arrays. Each entry in the property array
+provides the value of that property for the idle state associated with
+the array index of that entry.
+
+If idle-states are defined, then the properties
+"ibm,cpu-idle-state-names" and "ibm,cpu-idle-state-flags" are
+required. The other properties are required unless mentioned
+otherwise. The length of all the property arrays must be the same.
+
+- ibm,cpu-idle-state-names:
+   Array of strings containing the names of the idle states.
+
+- ibm,cpu-idle-state-flags:
+   Array of unsigned 32-bit values containing the values of the
+   flags associated with the the aforementioned idle-states. The
+   flag bits are as follows:
+   0x00000001 /* Decrementer would stop */
+   0x00000002 /* Needs timebase restore */
+   0x00001000 /* Restore GPRs like nap */
+   0x00002000 /* Restore hypervisor resource from PACA pointer */
+   0x00004000 /* Program PORE to restore PACA pointer */
+   0x00010000 /* This is a nap state (POWER7,POWER8) */
+   0x00020000 /* This is a fast-sleep state (POWER8)*/
+   0x00040000 /* This is a winkle state (POWER8) */
+   0x00080000 /* This is a fast-sleep state which requires a */
+              /* software workaround for restoring the */
+              /* timebase (POWER8) */
+   0x00800000 /* This state uses SPR PMICR instruction */
+              /* (POWER8)*/
+   0x00100000 /* This is a fast stop state (POWER9) */
+   0x00200000 /* This is a deep-stop state (POWER9) */
+
+- ibm,cpu-idle-state-latencies-ns:
+   Array of unsigned 32-bit values containing the values of the
+   exit-latencies (in ns) for the idle states in
+   ibm,cpu-idle-state-names.
+
+- ibm,cpu-idle-state-residency-ns:
+   Array of unsigned 32-bit values containing the values of the
+   target-residency (in ns) for the idle states in
+   ibm,cpu-idle-state-names. On POWER8 this is an optional
+   property. If the property is absent, the target residency for
+   the "Nap", "FastSleep" are defined to 1 and 3
+   respectively by the kernel. On POWER9 this property is required.
+
+- ibm,cpu-idle-state-psscr:
+   Array of unsigned 64-bit values containing the values for the
+   PSSCR for each of the idle states in ibm,cpu-idle-state-names.
+   This property is required on POWER9 and absent on POWER8.
+
+- ibm,cpu-idle-state-psscr-mask:
+   Array of unsigned 64-bit values containing the masks
+   indicating which psscr fields are set in the corresponding
+   entries of ibm,cpu-idle-state-psscr. This property is
+   required on POWER9 and absent on POWER8.
+
+   Whenever the firmware sets an entry in
+   ibm,cpu-idle-state-psscr-mask value to 

[PATCH v6 4/5] powernv: Pass PSSCR value and mask to power9_idle_stop

2017-01-25 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

The power9_idle_stop method currently takes only the requested stop
level as a parameter and picks up the rest of the PSSCR bits from a
hand-coded macro. This is not a very flexible design, especially when
the firmware has the capability to communicate the psscr value and the
mask associated with a particular stop state via device tree.

This patch modifies the power9_idle_stop API to take as parameters the
PSSCR value and the PSSCR mask corresponding to the stop state that
needs to be set. These PSSCR value and mask are respectively obtained
by parsing the "ibm,cpu-idle-state-psscr" and
"ibm,cpu-idle-state-psscr-mask" fields from the device tree.

In addition to this, the patch adds support for handling stop states
for which ESL and EC bits in the PSSCR are zero. As per the
architecture, a wakeup from these stop states resumes execution from
the subsequent instruction as opposed to waking up at the System
Vector.

Older firmware sets only the Requested Level (RL) field in the
psscr and psscr-mask exposed in the device tree. For such firmware,
where psscr-mask=0xf, this patch sets sane default values for the
remaining PSSCR fields (i.e. PSLL, MTL, ESL, EC, and TR). For newer
firmware, the patch validates that the invariants required by the ISA
for the psscr values are maintained by the firmware.

The skiboot patch that exports fully populated PSSCR values and the
mask for all the stop states can be found here:
https://lists.ozlabs.org/pipermail/skiboot/2016-September/004869.html

[Optimize the number of instructions before entering STOP with
ESL=EC=0, validate that the PSSCR values provided by the firmware
maintain the invariants required as per the ISA, as suggested by
Balbir Singh]

Acked-by: Balbir Singh 
Signed-off-by: Gautham R. Shenoy 
---
 arch/powerpc/include/asm/cpuidle.h   |  44 ++
 arch/powerpc/include/asm/processor.h |   3 +-
 arch/powerpc/kernel/idle_book3s.S|  30 ---
 arch/powerpc/platforms/powernv/idle.c| 138 ---
 arch/powerpc/platforms/powernv/powernv.h |   3 +-
 arch/powerpc/platforms/powernv/smp.c |  14 ++--
 drivers/cpuidle/cpuidle-powernv.c|  52 +---
 7 files changed, 241 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
index 0a3255b..fd321eb4 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -10,11 +10,55 @@
 #define PNV_CORE_IDLE_LOCK_BIT  0x100
 #define PNV_CORE_IDLE_THREAD_BITS   0x0FF
 
+/*
+ *  NOTE =
+ * The older firmware populates only the RL field in the psscr_val and
+ * sets the psscr_mask to 0xf. On such a firmware, the kernel sets the
+ * remaining PSSCR fields to default values as follows:
+ *
+ * - ESL and EC bits are set to 1. So wakeup from any stop state will be
+ *   at vector 0x100.
+ *
+ * - MTL and PSLL are set to the maximum allowed value as per the ISA,
+ *i.e. 15.
+ *
+ * - The Transition Rate, TR is set to the Maximum value 3.
+ */
+#define PSSCR_HV_DEFAULT_VAL    (PSSCR_ESL | PSSCR_EC |\
+   PSSCR_PSLL_MASK | PSSCR_TR_MASK |   \
+   PSSCR_MTL_MASK)
+
+#define PSSCR_HV_DEFAULT_MASK   (PSSCR_ESL | PSSCR_EC |\
+   PSSCR_PSLL_MASK | PSSCR_TR_MASK |   \
+   PSSCR_MTL_MASK | PSSCR_RL_MASK)
+#define PSSCR_EC_SHIFT    20
+#define PSSCR_ESL_SHIFT   21
+#define GET_PSSCR_EC(x)   (((x) & PSSCR_EC) >> PSSCR_EC_SHIFT)
+#define GET_PSSCR_ESL(x)  (((x) & PSSCR_ESL) >> PSSCR_ESL_SHIFT)
+#define GET_PSSCR_RL(x)   ((x) & PSSCR_RL_MASK)
+
+#define ERR_EC_ESL_MISMATCH             -1
+#define ERR_DEEP_STATE_ESL_MISMATCH     -2
+
 #ifndef __ASSEMBLY__
 extern u32 pnv_fastsleep_workaround_at_entry[];
 extern u32 pnv_fastsleep_workaround_at_exit[];
 
 extern u64 pnv_first_deep_stop_state;
+
+int validate_psscr_val_mask(u64 *psscr_val, u64 *psscr_mask, u32 flags);
+static inline void report_invalid_psscr_val(u64 psscr_val, int err)
+{
+   switch (err) {
+   case ERR_EC_ESL_MISMATCH:
+   pr_warn("Invalid psscr 0x%016llx : ESL,EC bits unequal",
+   psscr_val);
+   break;
+   case ERR_DEEP_STATE_ESL_MISMATCH:
+   pr_warn("Invalid psscr 0x%016llx : ESL cleared for deep stop-state",
+   psscr_val);
+   }
+}
 #endif
 
 #endif
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 1ba8144..21e0b52 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -454,7 +454,8 @@ static inline unsigned long get_clean_sp(unsigned long sp, int is_32)
 extern unsigned long power7_nap(int check_irq);
 

[PATCH v6 2/5] powernv:stop: Rename pnv_arch300_idle_init to pnv_power9_idle_init

2017-01-25 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

Balbir pointed out that the name of the function pnv_arch300_idle_init
was inconsistent with the names of the variables and functions
pertaining to POWER9 features in idle_book3s.S.

This patch renames pnv_arch300_idle_init to pnv_power9_idle_init.

This patch does not change any behaviour.

Signed-off-by: Gautham R. Shenoy 
---
 arch/powerpc/platforms/powernv/idle.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 479c256..57bec03 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -298,7 +298,7 @@ static void power9_idle(void)
  * @dt_idle_states: Number of idle state entries
  * Returns 0 on success
  */
-static int __init pnv_arch300_idle_init(struct device_node *np, u32 *flags,
+static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
int dt_idle_states)
 {
u64 *psscr_val = NULL;
@@ -373,7 +373,7 @@ static void __init pnv_probe_idle_states(void)
}
 
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
-   if (pnv_arch300_idle_init(np, flags, dt_idle_states))
+   if (pnv_power9_idle_init(np, flags, dt_idle_states))
goto out;
}
 
-- 
1.9.4



[PATCH v6 3/5] cpuidle:powernv: Add helper function to populate powernv idle states.

2017-01-25 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

In the current code for powernv_add_idle_states, there is a lot of code
duplication while initializing an idle state in powernv_states table.

Add an inline helper function to populate the powernv_states[] table
for a given idle state. Invoke this for populating the "Nap",
"Fastsleep" and the stop states in powernv_add_idle_states.

Acked-by: Balbir Singh 
Signed-off-by: Gautham R. Shenoy 
---
 drivers/cpuidle/cpuidle-powernv.c | 89 +++
 include/linux/cpuidle.h   |  1 +
 2 files changed, 54 insertions(+), 36 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 0835a37..6871b7f 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -20,6 +20,10 @@
 #include 
 #include 
 
+/*
+ * Expose only those Hardware idle states via the cpuidle framework
+ * that have latency value below POWERNV_THRESHOLD_LATENCY_NS.
+ */
 #define POWERNV_THRESHOLD_LATENCY_NS 20
 
 static struct cpuidle_driver powernv_idle_driver = {
@@ -167,6 +171,24 @@ static int powernv_cpuidle_driver_init(void)
return 0;
 }
 
+static inline void add_powernv_state(int index, const char *name,
+unsigned int flags,
+int (*idle_fn)(struct cpuidle_device *,
+   struct cpuidle_driver *,
+   int),
+unsigned int target_residency,
+unsigned int exit_latency,
+u64 psscr_val)
+{
+   strlcpy(powernv_states[index].name, name, CPUIDLE_NAME_LEN);
+   strlcpy(powernv_states[index].desc, name, CPUIDLE_NAME_LEN);
+   powernv_states[index].flags = flags;
+   powernv_states[index].target_residency = target_residency;
+   powernv_states[index].exit_latency = exit_latency;
+   powernv_states[index].enter = idle_fn;
+   stop_psscr_table[index] = psscr_val;
+}
+
 static int powernv_add_idle_states(void)
 {
struct device_node *power_mgt;
@@ -236,6 +258,7 @@ static int powernv_add_idle_states(void)
"ibm,cpu-idle-state-residency-ns", residency_ns, 
dt_idle_states);
 
for (i = 0; i < dt_idle_states; i++) {
+   unsigned int exit_latency, target_residency;
/*
 * If an idle state has exit latency beyond
 * POWERNV_THRESHOLD_LATENCY_NS then don't use it
@@ -243,28 +266,33 @@ static int powernv_add_idle_states(void)
 */
if (latency_ns[i] > POWERNV_THRESHOLD_LATENCY_NS)
continue;
+   /*
+* Firmware passes residency and latency values in ns.
+* cpuidle expects it in us.
+*/
+   exit_latency = latency_ns[i] / 1000;
+   if (!rc)
+   target_residency = residency_ns[i] / 1000;
+   else
+   target_residency = 0;
 
/*
-* Cpuidle accepts exit_latency and target_residency in us.
-* Use default target_residency values if f/w does not expose it.
+* For nap and fastsleep, use default target_residency
+* values if f/w does not expose it.
 */
if (flags[i] & OPAL_PM_NAP_ENABLED) {
+   if (!rc)
+   target_residency = 100;
/* Add NAP state */
-   strcpy(powernv_states[nr_idle_states].name, "Nap");
-   strcpy(powernv_states[nr_idle_states].desc, "Nap");
-   powernv_states[nr_idle_states].flags = 0;
-   powernv_states[nr_idle_states].target_residency = 100;
-   powernv_states[nr_idle_states].enter = nap_loop;
+   add_powernv_state(nr_idle_states, "Nap",
+ CPUIDLE_FLAG_NONE, nap_loop,
+ target_residency, exit_latency, 0);
} else if ((flags[i] & OPAL_PM_STOP_INST_FAST) &&
!(flags[i] & OPAL_PM_TIMEBASE_STOP)) {
-   strncpy(powernv_states[nr_idle_states].name,
-   names[i], CPUIDLE_NAME_LEN);
-   strncpy(powernv_states[nr_idle_states].desc,
-   names[i], CPUIDLE_NAME_LEN);
-   powernv_states[nr_idle_states].flags = 0;
-
-   powernv_states[nr_idle_states].enter = stop_loop;
-   stop_psscr_table[nr_idle_states] = psscr_val[i];
+   add_powernv_state(nr_idle_states, names[i],
+   

[PATCH v6 0/5] powernv:stop: Use psscr_val,mask provided by firmware

2017-01-25 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

This is the sixth iteration of the patchset to use the psscr_val and
psscr_mask provided by the firmware for each of the stop states.

The previous versions can be found here:
[v5]: https://lkml.org/lkml/2017/1/10/147
[v4]: https://lkml.org/lkml/2016/12/9/288
[v3]: https://lkml.org/lkml/2016/11/10/37
[v2]: https://lkml.org/lkml/2016/10/27/143
[v1]: https://lkml.org/lkml/2016/9/29/45

This version addresses the feedback provided by Balbir and Rob Herring to
v5. The key changes are:

- [PATCH 2] Rename pnv_arch300_idle_init to pnv_power9_idle_init to be
  consistent with the nomenclature of variables and functions in
  idle_book3s.S.

- [PATCH 5] Updated the introduction to Idle-state properties in the
  devicetree bindings documentation in order to clarify when the
  "name" and "flags" properties are required. Also fixed the typos in
  the Documentation for the device-tree bindings.

Synopsis
==
The current implementation of the ISA v3.0 stop instruction support
has a couple of shortcomings.

a) The code hand-codes the values for ESL, EC, TR, MTL bits of PSSCR
   and uses only the RL field from the firmware. While this is not
   incorrect, since the hand-coded values are legitimate, it is not a
   very flexible design since the firmware has the capability to
   communicate these values via the "ibm,cpu-idle-state-psscr" and
   "ibm,cpu-idle-state-psscr-mask" properties. In cases where the
   firmware provides values for these fields that differ from the
   hand-coded values, the current code will not work as intended.

b) Due to issue a), the current code assumes that ESL=EC=1 for all the
   stop states and hence the wakeup from the stop instruction will
   happen at 0x100, the system-reset vector. However, the ISA v3.0
   allows the ESL=EC=0 behaviour where the corresponding stop-state
   loses no state and wakes up from the subsequent instruction. The
   current code doesn't handle this case.
   
This patch series addresses these issues.

The first patch in the series renames the existing
IDLE_STATE_ENTER_SEQ macro to IDLE_STATE_ENTER_SEQ_NORET. It reuses
the name IDLE_STATE_ENTER_SEQ for entering into stop-states which wake
up at the subsequent instruction.

The second patch in the series renames pnv_arch300_idle_init()
to pnv_power9_idle_init.

The third patch adds a helper function in cpuidle-powernv.c for
initializing entries of the powernv_states[] table that is passed to
the cpu-idle core. This eliminates some of the code duplication in the
function that discovers and initializes the stop states.

The fourth patch in the series fixes issues a) and b) by ensuring that
the psscr-value and the psscr-mask provided by the firmware are what
will be used to set a particular stop state. It also adds support for
handling wake-up from stop states which were entered with ESL=EC=0.
It also validates that the psscr values exposed by the firmware
maintain the invariants mentioned in the ISA.
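
As a rough illustration of what "using the psscr-value and psscr-mask"
means, the sketch below replaces only the PSSCR bits selected by the
mask and then tells which wake-up path a given value implies. The
function names are hypothetical and the code is a user-space model
under those assumptions, not the implementation in the fourth patch.

```c
#include <stdint.h>

/* Replace only the PSSCR bits selected by mask; all other bits are kept. */
uint64_t sketch_apply_psscr(uint64_t cur, uint64_t val, uint64_t mask)
{
    return (cur & ~mask) | (val & mask);
}

/*
 * ESL (bit 21) and EC (bit 20) both zero: execution resumes at the
 * instruction after stop. Otherwise wake-up is at the 0x100 vector.
 */
int sketch_wakes_at_next_insn(uint64_t psscr)
{
    const uint64_t esl_ec = (1ULL << 21) | (1ULL << 20);

    return (psscr & esl_ec) == 0;
}
```

With a mask of 0xf only the RL field is changed, which corresponds to
what the older firmware can express.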

The fourth patch also handles the older firmware which sets only the
Requested Level (RL) field in the psscr and psscr-mask exposed in the
device tree. In the presence of such older firmware, this patch will
set sane default values for the remaining PSSCR fields (i.e. PSLL,
MTL, ESL, EC, and TR).

The fifth patch provides the documentation for the device-tree
bindings describing the idle state properties under the @power-mgt
node in the device-tree.

The skiboot patch that populates all the relevant fields in the PSSCR
values and the mask for all the stop states can be found here:
https://lists.ozlabs.org/pipermail/skiboot/2016-September/004869.html

Gautham R. Shenoy (5):
  powernv:idle: Add IDLE_STATE_ENTER_SEQ_NORET macro
  powernv:stop: Rename pnv_arch300_idle_init to pnv_power9_idle_init
  cpuidle:powernv: Add helper function to populate powernv idle states.
  powernv: Pass PSSCR value and mask to power9_idle_stop
  Documentation:powerpc: Add device-tree bindings for power-mgt

 .../devicetree/bindings/powerpc/opal/power-mgt.txt | 118 +
 arch/powerpc/include/asm/cpuidle.h |  49 ++-
 arch/powerpc/include/asm/processor.h   |   3 +-
 arch/powerpc/kernel/exceptions-64s.S   |   6 +-
 arch/powerpc/kernel/idle_book3s.S  |  40 +++---
 arch/powerpc/platforms/powernv/idle.c  | 142 ++---
 arch/powerpc/platforms/powernv/powernv.h   |   3 +-
 arch/powerpc/platforms/powernv/smp.c   |  14 +-
 drivers/cpuidle/cpuidle-powernv.c  | 129 +--
 include/linux/cpuidle.h|   1 +
 10 files changed, 421 insertions(+), 84 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/powerpc/opal/power-mgt.txt

-- 
1.9.4



[PATCH v6 1/5] powernv:idle: Add IDLE_STATE_ENTER_SEQ_NORET macro

2017-01-25 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

Currently all the low-power idle states are expected to wake up
at reset vector 0x100. This is why the macro IDLE_STATE_ENTER_SEQ,
which puts the CPU into an idle state, never returns.

On ISA v3.0, when the ESL and EC bits in the PSSCR are zero, the CPU
is expected to wake up at the instruction following the idle
instruction.

This patch adds a new macro named IDLE_STATE_ENTER_SEQ_NORET for the
no-return variant and reuses the name IDLE_STATE_ENTER_SEQ
for a variant that allows resuming operation at the instruction
following the idle instruction.

Acked-by: Balbir Singh 
Signed-off-by: Gautham R. Shenoy 
---
 arch/powerpc/include/asm/cpuidle.h   |  5 -
 arch/powerpc/kernel/exceptions-64s.S |  6 +++---
 arch/powerpc/kernel/idle_book3s.S| 10 +-
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
index 3919332..0a3255b 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -21,7 +21,7 @@
 
 /* Idle state entry routines */
 #ifdef CONFIG_PPC_P7_NAP
-#define	IDLE_STATE_ENTER_SEQ(IDLE_INST) \
+#define IDLE_STATE_ENTER_SEQ(IDLE_INST) \
/* Magic NAP/SLEEP/WINKLE mode enter sequence */\
std r0,0(r1);   \
ptesync;\
@@ -29,6 +29,9 @@
 1: cmpd    cr0,r0,r0;  \
bne 1b; \
IDLE_INST;  \
+
+#define	IDLE_STATE_ENTER_SEQ_NORET(IDLE_INST)   \
+   IDLE_STATE_ENTER_SEQ(IDLE_INST) \
b   .
 #endif /* CONFIG_PPC_P7_NAP */
 
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index d39d611..069aac8 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -381,12 +381,12 @@ EXC_COMMON_BEGIN(machine_check_handle_early)
lbz r3,PACA_THREAD_IDLE_STATE(r13)
cmpwi   r3,PNV_THREAD_NAP
bgt 10f
-   IDLE_STATE_ENTER_SEQ(PPC_NAP)
+   IDLE_STATE_ENTER_SEQ_NORET(PPC_NAP)
/* No return */
 10:
cmpwi   r3,PNV_THREAD_SLEEP
bgt 2f
-   IDLE_STATE_ENTER_SEQ(PPC_SLEEP)
+   IDLE_STATE_ENTER_SEQ_NORET(PPC_SLEEP)
/* No return */
 
 2:
@@ -400,7 +400,7 @@ EXC_COMMON_BEGIN(machine_check_handle_early)
 */
ori r13,r13,1
SET_PACA(r13)
-   IDLE_STATE_ENTER_SEQ(PPC_WINKLE)
+   IDLE_STATE_ENTER_SEQ_NORET(PPC_WINKLE)
/* No return */
 4:
 #endif
diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 72dac0b..be90e2f 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -205,7 +205,7 @@ pnv_enter_arch207_idle_mode:
stb r3,PACA_THREAD_IDLE_STATE(r13)
cmpwi   cr3,r3,PNV_THREAD_SLEEP
bge cr3,2f
-   IDLE_STATE_ENTER_SEQ(PPC_NAP)
+   IDLE_STATE_ENTER_SEQ_NORET(PPC_NAP)
/* No return */
 2:
/* Sleep or winkle */
@@ -239,7 +239,7 @@ pnv_fastsleep_workaround_at_entry:
 
 common_enter: /* common code for all the threads entering sleep or winkle */
bgt cr3,enter_winkle
-   IDLE_STATE_ENTER_SEQ(PPC_SLEEP)
+   IDLE_STATE_ENTER_SEQ_NORET(PPC_SLEEP)
 
 fastsleep_workaround_at_entry:
ori r15,r15,PNV_CORE_IDLE_LOCK_BIT
@@ -261,7 +261,7 @@ fastsleep_workaround_at_entry:
 enter_winkle:
bl  save_sprs_to_stack
 
-   IDLE_STATE_ENTER_SEQ(PPC_WINKLE)
+   IDLE_STATE_ENTER_SEQ_NORET(PPC_WINKLE)
 
 /*
  * r3 - requested stop state
@@ -280,7 +280,7 @@ power_enter_stop:
ld  r4,ADDROFF(pnv_first_deep_stop_state)(r5)
cmpd    r3,r4
bge 2f
-   IDLE_STATE_ENTER_SEQ(PPC_STOP)
+   IDLE_STATE_ENTER_SEQ_NORET(PPC_STOP)
 2:
 /*
  * Entering deep idle state.
@@ -302,7 +302,7 @@ lwarx_loop_stop:
 
bl  save_sprs_to_stack
 
-   IDLE_STATE_ENTER_SEQ(PPC_STOP)
+   IDLE_STATE_ENTER_SEQ_NORET(PPC_STOP)
 
 _GLOBAL(power7_idle)
/* Now check if user or arch enabled NAP mode */
-- 
1.9.4



Re: [PATCH v5 5/5] Documentation:powerpc: Add device-tree bindings for power-mgt

2017-01-25 Thread Gautham R Shenoy
Hello Rob,

Thank you very much for your review. I had missed this mail
and found it while looking at the lkml thread while preparing for the
next iteration.

On Fri, Jan 13, 2017 at 10:57:43AM -0600, Rob Herring wrote:
> On Tue, Jan 10, 2017 at 02:37:04PM +0530, Gautham R. Shenoy wrote:
> > From: "Gautham R. Shenoy" 
> > 
> > Document the device-tree bindings defining the the properties under
> > the @power-mgt node in the device tree that describe the idle states
> > for Linux running on baremetal POWER servers.
> > 
> 
> We have "common" idle state bindings. Perhaps some explanation why those 
> can't be used.

Are you referring to
Documentation/devicetree/bindings/power/domain-idle-state.txt ?

On POWER, since POWER8 time, the DVFS states as well as the idle
states were exposed as properties under the common /ibm,opal/power-mgt
node. Since the DVFS states were large in number, creating a separate
node for each one of them didn't seem like a good design. Hence the
DVFS state properties were encoded as property arrays. The same design
was carried forward to the idle-states as well.

This is why the common idle-state bindings in domain-idle-state.txt
cannot be used: there, each idle state is described as a separate node.

> 
> > Signed-off-by: Gautham R. Shenoy 
> > ---
> > [v4]-> [v5]: Fixed a couple of typos.
> > 
> >  .../devicetree/bindings/powerpc/opal/power-mgt.txt | 125 
> > +
> >  1 file changed, 125 insertions(+)
> >  create mode 100644 
> > Documentation/devicetree/bindings/powerpc/opal/power-mgt.txt
> > 
> > diff --git a/Documentation/devicetree/bindings/powerpc/opal/power-mgt.txt 
> > b/Documentation/devicetree/bindings/powerpc/opal/power-mgt.txt
> > new file mode 100644
> > index 000..4967831
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/powerpc/opal/power-mgt.txt
> > @@ -0,0 +1,125 @@
> > +IBM Power-Management Bindings
> > +=
> > +
> > +Linux running on baremetal POWER machines has access to the processor
> > +idle states. The description of these idle states is exposed via the
> > +node @power-mgt in the device-tree by the firmware.
> > +
> > +Definitions:
> > +
> > +Typically each idle state has the following associated properties:
> > +
> > +- name: The name of the idle state as defined by the firmware.
> > +
> > +- flags: indicating some aspects of this idle states such as the
> > + extent of state-loss, whether timebase is stopped on this
> > + idle states and so on. The flag bits are as follows:
> > +
> > +- exit-latency: The latency involved in transitioning the state of the
> > +   CPU from idle to running.
> > +
> > +- target-residency: The minimum time that the CPU needs to reside in
> > +   this idle state in order to accrue power-savings
> > +   benefit.
> > +
> > +Properties
> > +
> > +The following properties provide details about the idle states. These
> > +properties are optional unless mentioned otherwise below.
> 
> -names is optional but everything else seems to require it. It is not 
> clear what the binding looks like if -names is not present.

The firmware could choose not to expose the idle-states to the kernel,
in which case it would not expose any of these properties.

However, if the firmware chooses to expose idle-states, then the 
ibm,cpu-idle-state-names and ibm,cpu-idle-state-flags properties are
required.

I will reword this as follows:

  The following properties provide details about the idle
  states. These properties are exposed as arrays. Each entry in the
  property array provides the value of that property for the idle
  state associated with the array index of that entry.

  If idle-states are defined, then the properties
  "ibm,cpu-idle-state-names" and "ibm,cpu-idle-state-flags" are
  required. The other properties are required unless mentioned
  otherwise. The length of all the property arrays must be the same.
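
To illustrate the indexing described in the reworded text, here is a
small C sketch that assembles one idle state from entry idx of each
property array and rejects arrays of unequal length. The struct and
function names are hypothetical, and the arrays are assumed to have
already been read from the device tree; this is not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* One idle state, built from entry idx of each property array. */
struct sketch_idle_state {
    const char *name;        /* from ibm,cpu-idle-state-names */
    uint32_t flags;          /* from ibm,cpu-idle-state-flags */
    uint32_t residency_ns;   /* from ibm,cpu-idle-state-residency-ns */
};

/*
 * Returns 0 and fills *out on success, or -1 if the property arrays
 * differ in length or idx is out of range.
 */
int sketch_get_idle_state(const char *const *names, size_t n_names,
                          const uint32_t *flags, size_t n_flags,
                          const uint32_t *residency_ns, size_t n_residency,
                          size_t idx, struct sketch_idle_state *out)
{
    /* All property arrays must describe the same number of states. */
    if (n_names != n_flags || n_names != n_residency)
        return -1;
    if (idx >= n_names)
        return -1;

    out->name = names[idx];
    out->flags = flags[idx];
    out->residency_ns = residency_ns[idx];
    return 0;
}
```

The same index selects the name, the flags and the residency, which is
why a length mismatch between the arrays has to be treated as an error.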


> 
> > +
> > +- ibm,cpu-idle-state-names:
> > +   Array of strings containing the names of the idle states.
> > +
> > +- ibm,cpu-idle-state-flags:
> > +   Array of unsigned 32-bit values containing the values of the
> > +   flags associated with the the aforementioned idle-states. This
> > +   property is required on POWER9 whenever
> > +   ibm,cpu-idle-state-names is defined and the length of this
> > +   property array should be the same as
> > +   ibm,-cpu-idle-state-names.The flag bits are as follows:
> 
> s/ibm,-cpu/ibm,cpu/
> 
> Needs a space after the period.

Thanks for catching this. Will fix it in the next iteration.

> 
> > +   0x0001 /* Decrementer would stop */
> > +   0x0002 /* Needs timebase restore */
> > +   0x1000 /* Restore GPRs like nap */
> > +   0x2000 /* Restore hypervisor resource from PACA pointer */
> > +   0x4000 /* Program PORE to restore PACA pointer */
> > +