Re: [PATCH v2,5/5] drivers: uio: new driver for fsl_85xx_cache_sram

2020-04-15 Thread Greg KH
On Wed, Apr 15, 2020 at 02:27:51PM -0500, Scott Wood wrote:
> > > + dev_err(&pdev->dev, "error no valid uio-map configured\n");
> > > + ret = -EINVAL;
> > > + goto err_info_free_internel;
> > > + }
> > > +
> > > + info->version = "0.1.0";
> > 
> > Could you define some DRIVER_VERSION in the top of the file next to 
> > DRIVER_NAME instead of hard coding in the middle on a function ?
> 
> That's what v1 had, and Greg KH said to remove it.  I'm guessing that he
> thought it was the common-but-pointless practice of having the driver print a
> version number that never gets updated, rather than something the UIO API
> (unfortunately, compared to a feature query interface) expects.  That said,
> I'm not sure what the value is of making it a macro since it should only be
> used once, that use is self documenting, it isn't tunable, etc.  Though if
> this isn't a macro, UIO_NAME also shouldn't be (and if it is made a macro
> again, it should be UIO_VERSION, not DRIVER_VERSION).
> 
> Does this really need a three-part version scheme?  What's wrong with a
> version of "1", to be changed to "2" in the hopefully-unlikely event that the
> userspace API changes?  Assuming UIO is used for this at all, which doesn't
> seem like a great fit to me.

No driver version numbers at all please, they do not make any sense when
the driver is included in the kernel tree.

greg k-h


Re: [PATCH v2,5/5] drivers: uio: new driver for fsl_85xx_cache_sram

2020-04-15 Thread Greg KH
On Wed, Apr 15, 2020 at 02:26:55PM -0500, Scott Wood wrote:
> Instead, have module parameters that take the sizes and alignments you'd like
> to allocate and expose to userspace.  Better still would be some sort of
> dynamic allocation (e.g. open a fd, ioctl to set the requested size/alignment,
> if it succeeds you can mmap it, and when the fd is closed the region is
> freed).

No module parameters please, this is not the 1990's.

Use device tree, that is what it is there for.

thanks,

greg k-h


[PATCH v11 14/14] powerpc: Use mm_context vas_windows counter to issue CP_ABORT

2020-04-15 Thread Haren Myneni


set_thread_uses_vas() sets the used_vas flag for a process that opened a
VAS window, and CP_ABORT is issued during context switch only for that
process. In multi-thread applications, windows can be shared. For example,
thread A can open a window and thread B can run COPY/PASTE instructions to
send an NX request, which may cause corruption, snooping or a covert
channel. Also, once this flag is set, CP_ABORT keeps being issued even
after the VAS window is closed.

So define a vas_windows counter in the process mm_context, increment this
counter for each window open and decrement it on window close. If
vas_windows is non-zero, issue CP_ABORT during context switch. This means
the foreign real address mapping is cleared only if the process / thread
uses COPY/PASTE, and it is disabled for that process once no windows are
open.

Move the set_thread_uses_vas() code to vas_tx_win_open() as this
functionality is needed only for userspace-opened windows. VAS userspace
support is being added along with this fix, so there is no need to
include this fix in stable releases.

Fixes: 9d2a4d71332c ("powerpc: Define set_thread_uses_vas()")
Signed-off-by: Haren Myneni 
Reported-by: Nicholas Piggin 
Suggested-by: Milton Miller 
Suggested-by: Nicholas Piggin 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/book3s/64/mmu.h|  3 +++
 arch/powerpc/include/asm/mmu_context.h  | 30 +
 arch/powerpc/include/asm/processor.h|  1 -
 arch/powerpc/include/asm/switch_to.h|  2 --
 arch/powerpc/kernel/process.c   | 24 ++-
 arch/powerpc/platforms/powernv/vas-window.c | 22 -
 6 files changed, 48 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index bb3deb7..f0a9ff6 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -116,6 +116,9 @@ struct patb_entry {
/* Number of users of the external (Nest) MMU */
atomic_t copros;
 
+   /* Number of user space windows opened in process mm_context */
+   atomic_t vas_windows;
+
struct hash_mm_context *hash_context;
 
unsigned long vdso_base;
diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index 360367c..1a474f6b 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -185,11 +185,41 @@ static inline void mm_context_remove_copro(struct 
mm_struct *mm)
dec_mm_active_cpus(mm);
}
 }
+
+/*
+ * vas_windows counter shows number of open windows in the mm
+ * context. During context switch, use this counter to clear the
+ * foreign real address mapping (CP_ABORT) for the thread / process
+ * that intend to use COPY/PASTE. When a process closes all windows,
+ * disable CP_ABORT which is expensive to run.
+ *
+ * For user context, register a copro so that TLBIs are seen by the
+ * nest MMU. mm_context_add/remove_vas_window() are used only for user
+ * space windows.
+ */
+static inline void mm_context_add_vas_window(struct mm_struct *mm)
+{
+   atomic_inc(&mm->context.vas_windows);
+   mm_context_add_copro(mm);
+}
+
+static inline void mm_context_remove_vas_window(struct mm_struct *mm)
+{
+   int v;
+
+   mm_context_remove_copro(mm);
+   v = atomic_dec_if_positive(&mm->context.vas_windows);
+
+   /* Detect imbalance between add and remove */
+   WARN_ON(v < 0);
+}
 #else
 static inline void inc_mm_active_cpus(struct mm_struct *mm) { }
 static inline void dec_mm_active_cpus(struct mm_struct *mm) { }
 static inline void mm_context_add_copro(struct mm_struct *mm) { }
 static inline void mm_context_remove_copro(struct mm_struct *mm) { }
+static inline void mm_context_add_vas_windows(struct mm_struct *mm) { }
+static inline void mm_context_remove_vas_windows(struct mm_struct *mm) { }
 #endif
 
 
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index eedcbfb..bfa336f 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -272,7 +272,6 @@ struct thread_struct {
unsignedmmcr0;
 
unsignedused_ebb;
-   unsigned intused_vas;
 #endif
 };
 
diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index 5b03d8a..012db9a 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -91,8 +91,6 @@ static inline void clear_task_ebb(struct task_struct *t)
 #endif
 }
 
-extern int set_thread_uses_vas(void);
-
 extern int set_thread_tidr(struct task_struct *t);
 
 #endif /* _ASM_POWERPC_SWITCH_TO_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index fad50db..ed3f645 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1221,7 +1221,8 @@ struct task_struct *__switch_to(struct task_struct *prev,
 * mappings, w

[PATCH v11 13/14] powerpc/vas: Free send window in VAS instance after credits returned

2020-04-15 Thread Haren Myneni


NX may still be processing requests while the send window is being
closed. Wait until all credits are returned and only then free the send
window from the VAS instance.

Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/powernv/vas-window.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c 
b/arch/powerpc/platforms/powernv/vas-window.c
index d0c07cf..e15b405 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -1316,14 +1316,14 @@ int vas_win_close(struct vas_window *window)
 
unmap_paste_region(window);
 
-   clear_vinst_win(window);
-
poll_window_busy_state(window);
 
unpin_close_window(window);
 
poll_window_credits(window);
 
+   clear_vinst_win(window);
+
poll_window_castout(window);
 
/* if send window, drop reference to matching receive window */
-- 
1.8.3.1





[PATCH v11 12/14] powerpc/vas: Display process stuck message

2020-04-15 Thread Haren Myneni


A process can not close its send window until all requests are processed,
i.e. it has to wait until the window state is no longer busy and all send
credits are returned. Display rate-limited warning messages if closing
the window takes longer than expected.

Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/powernv/vas-window.c | 30 -
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c 
b/arch/powerpc/platforms/powernv/vas-window.c
index 4b5adf5..d0c07cf 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -1181,6 +1181,7 @@ static void poll_window_credits(struct vas_window *window)
 {
u64 val;
int creds, mode;
+   int count = 0;
 
val = read_hvwc_reg(window, VREG(WINCTL));
if (window->tx_win)
@@ -1199,10 +1200,27 @@ static void poll_window_credits(struct vas_window 
*window)
creds = GET_FIELD(VAS_LRX_WCRED, val);
}
 
+   /*
+* Takes around few milliseconds to complete all pending requests
+* and return credits.
+* TODO: Scan fault FIFO and invalidate CRBs points to this window
+*   and issue CRB Kill to stop all pending requests. Need only
+*   if there is a bug in NX or fault handling in kernel.
+*/
if (creds < window->wcreds_max) {
val = 0;
set_current_state(TASK_UNINTERRUPTIBLE);
schedule_timeout(msecs_to_jiffies(10));
+   count++;
+   /*
+* Process can not close send window until all credits are
+* returned.
+*/
+   if (!(count % 1000))
+   pr_warn_ratelimited("VAS: pid %d stuck. Waiting for 
credits returned for Window(%d). creds %d, Retries %d\n",
+   vas_window_pid(window), window->winid,
+   creds, count);
+
goto retry;
}
 }
@@ -1216,6 +1234,7 @@ static void poll_window_busy_state(struct vas_window 
*window)
 {
int busy;
u64 val;
+   int count = 0;
 
 retry:
val = read_hvwc_reg(window, VREG(WIN_STATUS));
@@ -1223,7 +1242,16 @@ static void poll_window_busy_state(struct vas_window 
*window)
if (busy) {
val = 0;
set_current_state(TASK_UNINTERRUPTIBLE);
-   schedule_timeout(msecs_to_jiffies(5));
+   schedule_timeout(msecs_to_jiffies(10));
+   count++;
+   /*
+* Takes around few milliseconds to process all pending
+* requests.
+*/
+   if (!(count % 1000))
+   pr_warn_ratelimited("VAS: pid %d stuck. Window (ID=%d) 
is in busy state. Retries %d\n",
+   vas_window_pid(window), window->winid, count);
+
goto retry;
}
 }
-- 
1.8.3.1





[PATCH v11 11/14] powerpc/vas: Do not use default credits for receive window

2020-04-15 Thread Haren Myneni


The system checkstops if the RxFIFO overruns with more requests than the
maximum number of CRBs allowed in the FIFO at any time. So the max
credits value (rxattr.wcreds_max) is set by the driver and passed to
vas_rx_win_open().

Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/powernv/vas-window.c | 4 ++--
 arch/powerpc/platforms/powernv/vas.h| 2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c 
b/arch/powerpc/platforms/powernv/vas-window.c
index 3ef7120..4b5adf5 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -772,7 +772,7 @@ static bool rx_win_args_valid(enum vas_cop_type cop,
if (attr->rx_fifo_size > VAS_RX_FIFO_SIZE_MAX)
return false;
 
-   if (attr->wcreds_max > VAS_RX_WCREDS_MAX)
+   if (!attr->wcreds_max)
return false;
 
if (attr->nx_win) {
@@ -877,7 +877,7 @@ struct vas_window *vas_rx_win_open(int vasid, enum 
vas_cop_type cop,
rxwin->nx_win = rxattr->nx_win;
rxwin->user_win = rxattr->user_win;
rxwin->cop = cop;
-   rxwin->wcreds_max = rxattr->wcreds_max ?: VAS_WCREDS_DEFAULT;
+   rxwin->wcreds_max = rxattr->wcreds_max;
 
init_winctx_for_rxwin(rxwin, rxattr, &winctx);
init_winctx_regs(rxwin, &winctx);
diff --git a/arch/powerpc/platforms/powernv/vas.h 
b/arch/powerpc/platforms/powernv/vas.h
index 60bdda6..a7143b1 100644
--- a/arch/powerpc/platforms/powernv/vas.h
+++ b/arch/powerpc/platforms/powernv/vas.h
@@ -101,11 +101,9 @@
 /*
  * Initial per-process credits.
  * Max send window credits:4K-1 (12-bits in VAS_TX_WCRED)
- * Max receive window credits: 64K-1 (16 bits in VAS_LRX_WCRED)
  *
  * TODO: Needs tuning for per-process credits
  */
-#define VAS_RX_WCREDS_MAX  ((64 << 10) - 1)
 #define VAS_TX_WCREDS_MAX  ((4 << 10) - 1)
 #define VAS_WCREDS_DEFAULT (1 << 10)
 
-- 
1.8.3.1





[PATCH v11 10/14] powerpc/vas: Print CRB and FIFO values

2020-04-15 Thread Haren Myneni


Dump the FIFO entries if the send window can not be found, and print the
CRB for debugging.

Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/powernv/vas-fault.c | 41 ++
 1 file changed, 41 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/vas-fault.c 
b/arch/powerpc/platforms/powernv/vas-fault.c
index b6bec64..25db70b 100644
--- a/arch/powerpc/platforms/powernv/vas-fault.c
+++ b/arch/powerpc/platforms/powernv/vas-fault.c
@@ -26,6 +26,28 @@
  */
 #define VAS_FAULT_WIN_FIFO_SIZE(4 << 20)
 
+static void dump_crb(struct coprocessor_request_block *crb)
+{
+   struct data_descriptor_entry *dde;
+   struct nx_fault_stamp *nx;
+
+   dde = &crb->source;
+   pr_devel("SrcDDE: addr 0x%llx, len %d, count %d, idx %d, flags %d\n",
+   be64_to_cpu(dde->address), be32_to_cpu(dde->length),
+   dde->count, dde->index, dde->flags);
+
+   dde = &crb->target;
+   pr_devel("TgtDDE: addr 0x%llx, len %d, count %d, idx %d, flags %d\n",
+   be64_to_cpu(dde->address), be32_to_cpu(dde->length),
+   dde->count, dde->index, dde->flags);
+
+   nx = &crb->stamp.nx;
+   pr_devel("NX Stamp: PSWID 0x%x, FSA 0x%llx, flags 0x%x, FS 0x%x\n",
+   be32_to_cpu(nx->pswid),
+   be64_to_cpu(crb->stamp.nx.fault_storage_addr),
+   nx->flags, nx->fault_status);
+}
+
 /*
  * Update the CSB to indicate a translation error.
  *
@@ -148,6 +170,23 @@ static void update_csb(struct vas_window *window,
pid_vnr(pid), rc);
 }
 
+static void dump_fifo(struct vas_instance *vinst, void *entry)
+{
+   unsigned long *end = vinst->fault_fifo + vinst->fault_fifo_size;
+   unsigned long *fifo = entry;
+   int i;
+
+   pr_err("Fault fifo size %d, Max crbs %d\n", vinst->fault_fifo_size,
+   vinst->fault_fifo_size / CRB_SIZE);
+
+   /* Dump 10 CRB entries or until end of FIFO */
+   pr_err("Fault FIFO Dump:\n");
+   for (i = 0; i < 10*(CRB_SIZE/8) && fifo < end; i += 4, fifo += 4) {
+   pr_err("[%.3d, %p]: 0x%.16lx 0x%.16lx 0x%.16lx 0x%.16lx\n",
+   i, fifo, *fifo, *(fifo+1), *(fifo+2), *(fifo+3));
+   }
+}
+
 /*
  * Process valid CRBs in fault FIFO.
  * NX process user space requests, return credit and update the status
@@ -233,6 +272,7 @@ irqreturn_t vas_fault_thread_fn(int irq, void *data)
vinst->vas_id, vinst->fault_fifo, fifo,
vinst->fault_crbs);
 
+   dump_crb(crb);
window = vas_pswid_to_window(vinst,
be32_to_cpu(crb->stamp.nx.pswid));
 
@@ -245,6 +285,7 @@ irqreturn_t vas_fault_thread_fn(int irq, void *data)
 * But we should not get here.
 * TODO: Disable IRQ.
 */
+   dump_fifo(vinst, (void *)entry);
pr_err("VAS[%d] fault_fifo %p, fifo %p, pswid 0x%x, 
fault_crbs %d bad CRB?\n",
vinst->vas_id, vinst->fault_fifo, fifo,
be32_to_cpu(crb->stamp.nx.pswid),
-- 
1.8.3.1





[PATCH v11 09/14] powerpc/vas: Return credits after handling fault

2020-04-15 Thread Haren Myneni


NX uses a credit mechanism to control the number of requests issued on a
specific window at any point of time. Only send windows and the fault
window use credits. When a request is issued on a given window, a credit
is taken, and it is returned after that request is processed. If no
credit is available, NX returns RMA_Busy for a send window and RMA_Reject
for the fault window.

NX expects the OS to return the credit for the send window after
processing a fault CRB. A credit also has to be returned for the fault
window after handling the fault.

Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/powernv/vas-fault.c  |  9 
 arch/powerpc/platforms/powernv/vas-window.c | 36 +
 arch/powerpc/platforms/powernv/vas.h|  1 +
 3 files changed, 46 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/vas-fault.c 
b/arch/powerpc/platforms/powernv/vas-fault.c
index 354577d..b6bec64 100644
--- a/arch/powerpc/platforms/powernv/vas-fault.c
+++ b/arch/powerpc/platforms/powernv/vas-fault.c
@@ -224,6 +224,10 @@ irqreturn_t vas_fault_thread_fn(int irq, void *data)
memcpy(crb, fifo, CRB_SIZE);
entry->stamp.nx.pswid = cpu_to_be32(FIFO_INVALID_ENTRY);
entry->ccw |= cpu_to_be32(CCW0_INVALID);
+   /*
+* Return credit for the fault window.
+*/
+   vas_return_credit(vinst->fault_win, false);
 
pr_devel("VAS[%d] fault_fifo %p, fifo %p, fault_crbs %d\n",
vinst->vas_id, vinst->fault_fifo, fifo,
@@ -249,6 +253,11 @@ irqreturn_t vas_fault_thread_fn(int irq, void *data)
WARN_ON_ONCE(1);
} else {
update_csb(window, crb);
+   /*
+* Return credit for send window after processing
+* fault CRB.
+*/
+   vas_return_credit(window, true);
}
}
 }
diff --git a/arch/powerpc/platforms/powernv/vas-window.c 
b/arch/powerpc/platforms/powernv/vas-window.c
index f12f7eb..3ef7120 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -1317,6 +1317,42 @@ int vas_win_close(struct vas_window *window)
 }
 EXPORT_SYMBOL_GPL(vas_win_close);
 
+/*
+ * Return credit for the given window.
+ * Send windows and fault window uses credit mechanism as follows:
+ *
+ * Send windows:
+ * - The default number of credits available for each send window is
+ *   1024. It means 1024 requests can be issued asynchronously at the
+ *   same time. If the credit is not available, that request will be
+ *   returned with RMA_Busy.
+ * - One credit is taken when NX request is issued.
+ * - This credit is returned after NX processed that request.
+ * - If NX encounters translation error, kernel will return the
+ *   credit on the specific send window after processing the fault CRB.
+ *
+ * Fault window:
+ * - The total number credits available is FIFO_SIZE/CRB_SIZE.
+ *   Means 4MB/128 in the current implementation. If credit is not
+ *   available, RMA_Reject is returned.
+ * - A credit is taken when NX pastes CRB in fault FIFO.
+ * - The kernel with return credit on fault window after reading entry
+ *   from fault FIFO.
+ */
+void vas_return_credit(struct vas_window *window, bool tx)
+{
+   uint64_t val;
+
+   val = 0ULL;
+   if (tx) { /* send window */
+   val = SET_FIELD(VAS_TX_WCRED, val, 1);
+   write_hvwc_reg(window, VREG(TX_WCRED_ADDER), val);
+   } else {
+   val = SET_FIELD(VAS_LRX_WCRED, val, 1);
+   write_hvwc_reg(window, VREG(LRX_WCRED_ADDER), val);
+   }
+}
+
 struct vas_window *vas_pswid_to_window(struct vas_instance *vinst,
uint32_t pswid)
 {
diff --git a/arch/powerpc/platforms/powernv/vas.h 
b/arch/powerpc/platforms/powernv/vas.h
index cd165c8..60bdda6 100644
--- a/arch/powerpc/platforms/powernv/vas.h
+++ b/arch/powerpc/platforms/powernv/vas.h
@@ -436,6 +436,7 @@ struct vas_winctx {
 extern int vas_setup_fault_window(struct vas_instance *vinst);
 extern irqreturn_t vas_fault_thread_fn(int irq, void *data);
 extern irqreturn_t vas_fault_handler(int irq, void *dev_id);
+extern void vas_return_credit(struct vas_window *window, bool tx);
 extern struct vas_window *vas_pswid_to_window(struct vas_instance *vinst,
uint32_t pswid);
 
-- 
1.8.3.1





[PATCH v11 07/14] powerpc/vas: Setup thread IRQ handler per VAS instance

2020-04-15 Thread Haren Myneni


When NX encounters a translation error on the CRB or any request buffer,
it raises an interrupt on the CPU so that the fault can be handled. It
can raise a single interrupt for multiple faults, and expects the OS to
handle these faults and return credits for the fault window after
processing them.

Set up a threaded IRQ handler and IRQ thread function per VAS instance.
The IRQ handler checks whether the thread is already woken up and able to
handle new faults; if so it returns IRQ_HANDLED, otherwise it wakes the
thread up to process the new faults.

The thread function reads each CRB entry from the fault FIFO until it
sees an invalid entry. After reading each CRB, it determines the
corresponding send window using the pswid (from the CRB) and processes
the fault CRB, then invalidates the entry and returns the credit.
Processing the fault CRB and returning credits are described in
subsequent patches.
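
A condensed sketch of the hard-IRQ side described above, for orientation
only (the real handler is vas_fault_handler() in the full patch; the
fault_lock and fifo_in_progress fields are the ones referenced in the
comments further down):

#include <linux/interrupt.h>
#include <linux/spinlock.h>

#include "vas.h"        /* struct vas_instance: fault_lock, fifo_in_progress */

static irqreturn_t vas_fault_handler_sketch(int irq, void *data)
{
        struct vas_instance *vinst = data;
        irqreturn_t ret = IRQ_WAKE_THREAD;
        unsigned long flags;

        /*
         * If the thread is already draining the fault FIFO, the new
         * faults will be picked up by that pass, so report them handled.
         * Otherwise mark the FIFO as in progress and wake the thread.
         */
        spin_lock_irqsave(&vinst->fault_lock, flags);
        if (vinst->fifo_in_progress)
                ret = IRQ_HANDLED;
        else
                vinst->fifo_in_progress = 1;
        spin_unlock_irqrestore(&vinst->fault_lock, flags);

        return ret;
}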

Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/powernv/vas-fault.c  | 131 
 arch/powerpc/platforms/powernv/vas-window.c |  60 +
 arch/powerpc/platforms/powernv/vas.c|  23 -
 arch/powerpc/platforms/powernv/vas.h|   7 ++
 4 files changed, 220 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/vas-fault.c 
b/arch/powerpc/platforms/powernv/vas-fault.c
index 4044998..0da8358 100644
--- a/arch/powerpc/platforms/powernv/vas-fault.c
+++ b/arch/powerpc/platforms/powernv/vas-fault.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "vas.h"
@@ -25,6 +26,136 @@
 #define VAS_FAULT_WIN_FIFO_SIZE(4 << 20)
 
 /*
+ * Process valid CRBs in fault FIFO.
+ * NX process user space requests, return credit and update the status
+ * in CRB. If it encounters transalation error when accessing CRB or
+ * request buffers, raises interrupt on the CPU to handle the fault.
+ * It takes credit on fault window, updates nx_fault_stamp in CRB with
+ * the following information and pastes CRB in fault FIFO.
+ *
+ * pswid - window ID of the window on which the request is sent.
+ * fault_storage_addr - fault address
+ *
+ * It can raise a single interrupt for multiple faults. Expects OS to
+ * process all valid faults and return credit for each fault on user
+ * space and fault windows. This fault FIFO control will be done with
+ * credit mechanism. NX can continuously paste CRBs until credits are not
+ * available on fault window. Otherwise, returns with RMA_reject.
+ *
+ * Total credits available on fault window: FIFO_SIZE(4MB)/CRBS_SIZE(128)
+ *
+ */
+irqreturn_t vas_fault_thread_fn(int irq, void *data)
+{
+   struct vas_instance *vinst = data;
+   struct coprocessor_request_block *crb, *entry;
+   struct coprocessor_request_block buf;
+   struct vas_window *window;
+   unsigned long flags;
+   void *fifo;
+
+   crb = &buf;
+
+   /*
+* VAS can interrupt with multiple page faults. So process all
+* valid CRBs within fault FIFO until reaches invalid CRB.
+* We use CCW[0] and pswid to validate validate CRBs:
+*
+* CCW[0]   Reserved bit. When NX pastes CRB, CCW[0]=0
+*  OS sets this bit to 1 after reading CRB.
+* pswidNX assigns window ID. Set pswid to -1 after
+*  reading CRB from fault FIFO.
+*
+* We exit this function if no valid CRBs are available to process.
+* So acquire fault_lock and reset fifo_in_progress to 0 before
+* exit.
+* In case kernel receives another interrupt with different page
+* fault, interrupt handler returns with IRQ_HANDLED if
+* fifo_in_progress is set. Means these new faults will be
+* handled by the current thread. Otherwise set fifo_in_progress
+* and return IRQ_WAKE_THREAD to wake up thread.
+*/
+   while (true) {
+   spin_lock_irqsave(&vinst->fault_lock, flags);
+   /*
+* Advance the fault fifo pointer to next CRB.
+* Use CRB_SIZE rather than sizeof(*crb) since the latter is
+* aligned to CRB_ALIGN (256) but the CRB written to by VAS is
+* only CRB_SIZE in len.
+*/
+   fifo = vinst->fault_fifo + (vinst->fault_crbs * CRB_SIZE);
+   entry = fifo;
+
+   if ((entry->stamp.nx.pswid == cpu_to_be32(FIFO_INVALID_ENTRY))
+   || (entry->ccw & cpu_to_be32(CCW0_INVALID))) {
+   vinst->fifo_in_progress = 0;
+   spin_unlock_irqrestore(&vinst->fault_lock, flags);
+   return IRQ_HANDLED;
+   }
+
+   spin_unlock_irqrestore(&vinst->fault_lock, flags);
+   vinst->fault_crbs++;
+   if (vinst->fault_crbs == (vinst->fault_fifo_size / CRB_SIZE))
+   vinst->fault_crbs = 0;
+
+   memcpy(crb, fifo, CRB_SIZE);
+   entry->stamp.nx.pswid = cpu_to_be32(FIFO_

[PATCH v11 08/14] powerpc/vas: Update CSB and notify process for fault CRBs

2020-04-15 Thread Haren Myneni


Applications poll on the CSB for a status update after requests are
issued. NX processes these requests and updates the CSB with the status.
If it encounters a translation error, it pastes the CRB in the fault FIFO
and raises an interrupt. The kernel handles the fault by reading the CRB
from the fault FIFO and processing the fault CRB.

For each fault CRB, update the fault address in the CRB
(fault_storage_addr) and the translation error status in the CSB so that
user space can touch the fault address and resend the request. If user
space passed an invalid CSB address, send a SIGSEGV signal to the
process.

In the case of multi-thread applications, the child thread may not be
available. So if the task is not running, send the signal to the tgid.

Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/powernv/vas-fault.c | 126 -
 1 file changed, 125 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/vas-fault.c 
b/arch/powerpc/platforms/powernv/vas-fault.c
index 0da8358..354577d 100644
--- a/arch/powerpc/platforms/powernv/vas-fault.c
+++ b/arch/powerpc/platforms/powernv/vas-fault.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -26,6 +27,128 @@
 #define VAS_FAULT_WIN_FIFO_SIZE(4 << 20)
 
 /*
+ * Update the CSB to indicate a translation error.
+ *
+ * User space will be polling on CSB after the request is issued.
+ * If NX can handle the request without any issues, it updates CSB.
+ * Whereas if NX encounters page fault, the kernel will handle the
+ * fault and update CSB with translation error.
+ *
+ * If we are unable to update the CSB means copy_to_user failed due to
+ * invalid csb_addr, send a signal to the process.
+ */
+static void update_csb(struct vas_window *window,
+   struct coprocessor_request_block *crb)
+{
+   struct coprocessor_status_block csb;
+   struct kernel_siginfo info;
+   struct task_struct *tsk;
+   void __user *csb_addr;
+   struct pid *pid;
+   int rc;
+
+   /*
+* NX user space windows can not be opened for task->mm=NULL
+* and faults will not be generated for kernel requests.
+*/
+   if (WARN_ON_ONCE(!window->mm || !window->user_win))
+   return;
+
+   csb_addr = (void __user *)be64_to_cpu(crb->csb_addr);
+
+   memset(&csb, 0, sizeof(csb));
+   csb.cc = CSB_CC_TRANSLATION;
+   csb.ce = CSB_CE_TERMINATION;
+   csb.cs = 0;
+   csb.count = 0;
+
+   /*
+* NX operates and returns in BE format as defined CRB struct.
+* So saves fault_storage_addr in BE as NX pastes in FIFO and
+* expects user space to convert to CPU format.
+*/
+   csb.address = crb->stamp.nx.fault_storage_addr;
+   csb.flags = 0;
+
+   pid = window->pid;
+   tsk = get_pid_task(pid, PIDTYPE_PID);
+   /*
+* Process closes send window after all pending NX requests are
+* completed. In multi-thread applications, a child thread can
+* open a window and can exit without closing it. May be some
+* requests are pending or this window can be used by other
+* threads later. We should handle faults if NX encounters
+* pages faults on these requests. Update CSB with translation
+* error and fault address. If csb_addr passed by user space is
+* invalid, send SEGV signal to pid saved in window. If the
+* child thread is not running, send the signal to tgid.
+* Parent thread (tgid) will close this window upon its exit.
+*
+* pid and mm references are taken when window is opened by
+* process (pid). So tgid is used only when child thread opens
+* a window and exits without closing it.
+*/
+   if (!tsk) {
+   pid = window->tgid;
+   tsk = get_pid_task(pid, PIDTYPE_PID);
+   /*
+* Parent thread (tgid) will be closing window when it
+* exits. So should not get here.
+*/
+   if (WARN_ON_ONCE(!tsk))
+   return;
+   }
+
+   /* Return if the task is exiting. */
+   if (tsk->flags & PF_EXITING) {
+   put_task_struct(tsk);
+   return;
+   }
+
+   use_mm(window->mm);
+   rc = copy_to_user(csb_addr, &csb, sizeof(csb));
+   /*
+* User space polls on csb.flags (first byte). So add barrier
+* then copy first byte with csb flags update.
+*/
+   if (!rc) {
+   csb.flags = CSB_V;
+   /* Make sure update to csb.flags is visible now */
+   smp_mb();
+   rc = copy_to_user(csb_addr, &csb, sizeof(u8));
+   }
+   unuse_mm(window->mm);
+   put_task_struct(tsk);
+
+   /* Success */
+   if (!rc)
+   return;
+
+   pr_debug("Invalid CSB address 0x%p signalling pid(%d)\n",
+   csb_addr, pid_vnr(pid));
+

[PATCH v11 06/14] powerpc/vas: Take reference to PID and mm for user space windows

2020-04-15 Thread Haren Myneni


When a process opens a window, its pid and tgid will be saved in the
vas_window struct. This window will be closed when the process exits.
The kernel handles NX faults by updating the CSB or, if the userspace CSB
address is invalid, by sending a SEGV signal to the pid of the process.

In multi-thread applications, a window can be opened by a child thread,
but it will not be closed when this thread exits. It is expected that
the parent will clean up all resources, including NX windows opened by
child threads. A child thread can send NX requests using this window
and could be killed before completion is reported. If the pid assigned
to this thread is reused while requests are pending, a failure SEGV
would be directed to the wrong place.

To prevent reusing the pid, take references to the pid and mm when the
window is opened and release them when the window is closed. Then if the
child thread is not running, the SEGV signal will be sent to the thread
group leader (tgid).

Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/powernv/vas-debug.c  |  2 +-
 arch/powerpc/platforms/powernv/vas-window.c | 50 ++---
 arch/powerpc/platforms/powernv/vas.h|  9 +-
 3 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-debug.c 
b/arch/powerpc/platforms/powernv/vas-debug.c
index 09e63df..ef9a717 100644
--- a/arch/powerpc/platforms/powernv/vas-debug.c
+++ b/arch/powerpc/platforms/powernv/vas-debug.c
@@ -38,7 +38,7 @@ static int info_show(struct seq_file *s, void *private)
 
seq_printf(s, "Type: %s, %s\n", cop_to_str(window->cop),
window->tx_win ? "Send" : "Receive");
-   seq_printf(s, "Pid : %d\n", window->pid);
+   seq_printf(s, "Pid : %d\n", vas_window_pid(window));
 
 unlock:
mutex_unlock(&vas_mutex);
diff --git a/arch/powerpc/platforms/powernv/vas-window.c 
b/arch/powerpc/platforms/powernv/vas-window.c
index dc46bf6..063cda2 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -12,6 +12,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include "vas.h"
@@ -876,8 +878,6 @@ struct vas_window *vas_rx_win_open(int vasid, enum 
vas_cop_type cop,
rxwin->user_win = rxattr->user_win;
rxwin->cop = cop;
rxwin->wcreds_max = rxattr->wcreds_max ?: VAS_WCREDS_DEFAULT;
-   if (rxattr->user_win)
-   rxwin->pid = task_pid_vnr(current);
 
init_winctx_for_rxwin(rxwin, rxattr, &winctx);
init_winctx_regs(rxwin, &winctx);
@@ -1027,7 +1027,6 @@ struct vas_window *vas_tx_win_open(int vasid, enum 
vas_cop_type cop,
txwin->tx_win = 1;
txwin->rxwin = rxwin;
txwin->nx_win = txwin->rxwin->nx_win;
-   txwin->pid = attr->pid;
txwin->user_win = attr->user_win;
txwin->wcreds_max = attr->wcreds_max ?: VAS_WCREDS_DEFAULT;
 
@@ -1057,6 +1056,40 @@ struct vas_window *vas_tx_win_open(int vasid, enum 
vas_cop_type cop,
rc = set_thread_uses_vas();
if (rc)
goto free_window;
+
+   /*
+* Window opened by a child thread may not be closed when
+* it exits. So take reference to its pid and release it
+* when the window is free by parent thread.
+* Acquire a reference to the task's pid to make sure
+* pid will not be re-used - needed only for multithread
+* applications.
+*/
+   txwin->pid = get_task_pid(current, PIDTYPE_PID);
+   /*
+* Acquire a reference to the task's mm.
+*/
+   txwin->mm = get_task_mm(current);
+
+   if (!txwin->mm) {
+   put_pid(txwin->pid);
+   pr_err("VAS: pid(%d): mm_struct is not found\n",
+   current->pid);
+   rc = -EPERM;
+   goto free_window;
+   }
+
+   mmgrab(txwin->mm);
+   mmput(txwin->mm);
+   mm_context_add_copro(txwin->mm);
+   /*
+* Process closes window during exit. In the case of
+* multithread application, the child thread can open
+* window and can exit without closing it. Expects parent
+* thread to use and close the window. So do not need
+* to take pid reference for parent thread.
+*/
+   txwin->tgid = find_get_pid(task_tgid_vnr(current));
}
 
set_vinst_win(vinst, txwin);
@@ -1257,8 +1290,17 @@ int vas_win_close(struct vas_window *window)
poll_window_castout(window);
 
/* if send window, drop reference to matching receive window */
-   if (window->tx_win)
+   if (window->tx_win) {
+   if (window->user_win) {
+   /* Dro

Re: [PATCH v2,5/5] drivers: uio: new driver for fsl_85xx_cache_sram

2020-04-15 Thread Christophe Leroy




On 16/04/2020 at 07:22, Wang Wenhu wrote:

Yes, kzalloc() would zero the allocated area, so the init of the
remaining array elements is redundant. I will remove the block in v3.


+   dev_err(&pdev->dev, "error no valid uio-map configured\n");
+   ret = -EINVAL;
+   goto err_info_free_internel;
+   }
+
+   info->version = "0.1.0";


Could you define some DRIVER_VERSION in the top of the file next to
DRIVER_NAME instead of hard coding in the middle on a function ?


That's what v1 had, and Greg KH said to remove it.  I'm guessing that he
thought it was the common-but-pointless practice of having the driver print a
version number that never gets updated, rather than something the UIO API
(unfortunately, compared to a feature query interface) expects.  That said,
I'm not sure what the value is of making it a macro since it should only be
used once, that use is self documenting, it isn't tunable, etc.  Though if
this isn't a macro, UIO_NAME also shouldn't be (and if it is made a macro
again, it should be UIO_VERSION, not DRIVER_VERSION).

Does this really need a three-part version scheme?  What's wrong with a
version of "1", to be changed to "2" in the hopefully-unlikely event that the
userspace API changes?  Assuming UIO is used for this at all, which doesn't
seem like a great fit to me.

-Scott



As Scott mentioned, the version is defined as a necessity by the uio core
but is actually useless for us here (and for many other types of devices,
I guess). So maybe the better way is to make it optional, but that
belongs first to the uio core.

For the cache-sram uio driver, I will define a UIO_VERSION macro as a
compromise that fits everyone's concerns, with no confusion as Greg first
mentioned.
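
A minimal sketch of that compromise, with assumed names next to the
existing UIO_NAME definition:

#include <linux/uio_driver.h>

#define UIO_NAME        "uio_fsl_85xx_cache_sram"       /* assumed driver name */
#define UIO_VERSION     "1"     /* UIO API version, bumped only if the API changes */

/* hypothetical helper, shown only to illustrate where the macros are used */
static void uio_sram_fill_info(struct uio_info *info)
{
        info->name = UIO_NAME;
        info->version = UIO_VERSION;
}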


Yes I like it.




+static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
+   {   .compatible = "uio,fsl,p2020-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p2010-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1020-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1011-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1013-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1022-l2-cache-controller",},
+   {   .compatible = "uio,fsl,mpc8548-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,mpc8544-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,mpc8572-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,mpc8536-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1021-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1012-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1025-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1016-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1024-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1015-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1010-l2-cache-controller",},
+   {   .compatible = "uio,fsl,bsc9131-l2-cache-controller",  },
+   {},
+};


NACK

The device tree describes the hardware, not what driver you want to bind the
hardware to, or how you want to allocate the resources.  And even if defining
nodes for sram allocation were the right way to go, why do you have a separate
compatible for each chip when you're just describing software configuration?

Instead, have module parameters that take the sizes and alignments you'd like
to allocate and expose to userspace.  Better still would be some sort of
dynamic allocation (e.g. open a fd, ioctl to set the requested size/alignment,
if it succeeds you can mmap it, and when the fd is closed the region is
freed).

-Scott
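
A minimal user-space sketch of the dynamic allocation described above,
with a hypothetical device node, ioctl number and argument struct (none
of these exist today):

#include <stdint.h>
#include <stddef.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

struct sram_alloc_req {                 /* hypothetical ioctl argument */
        uint64_t size;
        uint64_t align;
};

#define SRAM_IOC_ALLOC  _IOW('S', 0, struct sram_alloc_req)

/* Returns the fd on success; the region stays allocated until it is closed. */
static int sram_alloc(size_t size, size_t align, void **out)
{
        struct sram_alloc_req req = { .size = size, .align = align };
        int fd = open("/dev/fsl-cache-sram", O_RDWR);   /* hypothetical node */

        if (fd < 0)
                return -1;
        if (ioctl(fd, SRAM_IOC_ALLOC, &req) < 0)
                goto err;
        *out = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (*out == MAP_FAILED)
                goto err;
        return fd;
err:
        close(fd);
        return -1;
}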



Cannot agree more. But what if I want to define more than one cache-sram
uio device? How about using the device tree for a pseudo uio cache-sram
driver?

static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
{   .compatible = "uio,cache-sram",   },
{},
};



You can still give it a name in line with your driver, i.e.:
"uio,mpc85xx-cache-sram"


Afterwards, if you have different behaviours depending on the compatible,
then you have to add a .data field which will tell the driver which
behaviour to implement.
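
A short sketch of that approach; the variant names and the second
compatible string are made up for illustration, only
"uio,mpc85xx-cache-sram" comes from the discussion above:

#include <linux/of.h>
#include <linux/of_device.h>
#include <linux/platform_device.h>

enum sram_variant { SRAM_GENERIC = 1, SRAM_P2020 };     /* illustrative */

static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
        { .compatible = "uio,mpc85xx-cache-sram",
          .data = (void *)(unsigned long)SRAM_GENERIC },
        { .compatible = "uio,p2020-cache-sram",
          .data = (void *)(unsigned long)SRAM_P2020 },
        {},
};

static int uio_sram_probe(struct platform_device *pdev)
{
        enum sram_variant variant =
                (enum sram_variant)(unsigned long)of_device_get_match_data(&pdev->dev);

        /* The match data tells the driver which behaviour to implement. */
        if (variant == SRAM_P2020)
                dev_info(&pdev->dev, "using P2020-specific setup\n");

        return 0;
}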


Christophe


[PATCH v11 05/14] powerpc/vas: Register NX with fault window ID and IRQ port value

2020-04-15 Thread Haren Myneni


For each user space send window, register the fault window ID and IRQ
port value with NX so that NX pastes CRBs in the fault FIFO when it sees
a fault on the request buffer.

Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/powernv/vas-window.c | 15 +--
 arch/powerpc/platforms/powernv/vas.h| 15 +++
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c 
b/arch/powerpc/platforms/powernv/vas-window.c
index 1783fa9..dc46bf6 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -373,7 +373,7 @@ int init_winctx_regs(struct vas_window *window, struct 
vas_winctx *winctx)
init_xlate_regs(window, winctx->user_win);
 
val = 0ULL;
-   val = SET_FIELD(VAS_FAULT_TX_WIN, val, 0);
+   val = SET_FIELD(VAS_FAULT_TX_WIN, val, winctx->fault_win_id);
write_hvwc_reg(window, VREG(FAULT_TX_WIN), val);
 
/* In PowerNV, interrupts go to HV. */
@@ -748,6 +748,8 @@ static void init_winctx_for_rxwin(struct vas_window *rxwin,
 
winctx->min_scope = VAS_SCOPE_LOCAL;
winctx->max_scope = VAS_SCOPE_VECTORED_GROUP;
+   if (rxwin->vinst->virq)
+   winctx->irq_port = rxwin->vinst->irq_port;
 }
 
 static bool rx_win_args_valid(enum vas_cop_type cop,
@@ -944,13 +946,22 @@ static void init_winctx_for_txwin(struct vas_window 
*txwin,
winctx->lpid = txattr->lpid;
winctx->pidr = txattr->pidr;
winctx->rx_win_id = txwin->rxwin->winid;
+   /*
+* IRQ and fault window setup is successful. Set fault window
+* for the send window so that ready to handle faults.
+*/
+   if (txwin->vinst->virq)
+   winctx->fault_win_id = txwin->vinst->fault_win->winid;
 
winctx->dma_type = VAS_DMA_TYPE_INJECT;
winctx->tc_mode = txattr->tc_mode;
winctx->min_scope = VAS_SCOPE_LOCAL;
winctx->max_scope = VAS_SCOPE_VECTORED_GROUP;
+   if (txwin->vinst->virq)
+   winctx->irq_port = txwin->vinst->irq_port;
 
-   winctx->pswid = 0;
+   winctx->pswid = txattr->pswid ? txattr->pswid :
+   encode_pswid(txwin->vinst->vas_id, txwin->winid);
 }
 
 static bool tx_win_args_valid(enum vas_cop_type cop,
diff --git a/arch/powerpc/platforms/powernv/vas.h 
b/arch/powerpc/platforms/powernv/vas.h
index 9c8e3f5..88d084d 100644
--- a/arch/powerpc/platforms/powernv/vas.h
+++ b/arch/powerpc/platforms/powernv/vas.h
@@ -467,6 +467,21 @@ static inline u64 read_hvwc_reg(struct vas_window *win,
return in_be64(win->hvwc_map+reg);
 }
 
+/*
+ * Encode/decode the Partition Send Window ID (PSWID) for a window in
+ * a way that we can uniquely identify any window in the system. i.e.
+ * we should be able to locate the 'struct vas_window' given the PSWID.
+ *
+ * BitsUsage
+ * 0:7 VAS id (8 bits)
+ * 8:15Unused, 0 (3 bits)
+ * 16:31   Window id (16 bits)
+ */
+static inline u32 encode_pswid(int vasid, int winid)
+{
+   return ((u32)winid | (vasid << (31 - 7)));
+}
+
 static inline void decode_pswid(u32 pswid, int *vasid, int *winid)
 {
if (vasid)
-- 
1.8.3.1





[PATCH v11 04/14] powerpc/vas: Setup fault window per VAS instance

2020-04-15 Thread Haren Myneni


Set up a fault window for each VAS instance. When NX gets a fault on a
request buffer, it pastes the fault CRB in the corresponding fault FIFO
and then raises an interrupt to the OS. The kernel handles this
interrupt and processes the fault CRBs from this FIFO.

Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/powernv/Makefile |  2 +-
 arch/powerpc/platforms/powernv/vas-fault.c  | 77 +
 arch/powerpc/platforms/powernv/vas-window.c |  4 +-
 arch/powerpc/platforms/powernv/vas.c| 20 
 arch/powerpc/platforms/powernv/vas.h| 21 
 5 files changed, 121 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/vas-fault.c

diff --git a/arch/powerpc/platforms/powernv/Makefile 
b/arch/powerpc/platforms/powernv/Makefile
index c0f8120..395789f 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -17,7 +17,7 @@ obj-$(CONFIG_MEMORY_FAILURE)  += opal-memory-errors.o
 obj-$(CONFIG_OPAL_PRD) += opal-prd.o
 obj-$(CONFIG_PERF_EVENTS) += opal-imc.o
 obj-$(CONFIG_PPC_MEMTRACE) += memtrace.o
-obj-$(CONFIG_PPC_VAS)  += vas.o vas-window.o vas-debug.o
+obj-$(CONFIG_PPC_VAS)  += vas.o vas-window.o vas-debug.o vas-fault.o
 obj-$(CONFIG_OCXL_BASE)+= ocxl.o
 obj-$(CONFIG_SCOM_DEBUGFS) += opal-xscom.o
 obj-$(CONFIG_PPC_SECURE_BOOT) += opal-secvar.o
diff --git a/arch/powerpc/platforms/powernv/vas-fault.c 
b/arch/powerpc/platforms/powernv/vas-fault.c
new file mode 100644
index 000..4044998
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/vas-fault.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * VAS Fault handling.
+ * Copyright 2019, IBM Corporation
+ */
+
+#define pr_fmt(fmt) "vas: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vas.h"
+
+/*
+ * The maximum FIFO size for fault window can be 8MB
+ * (VAS_RX_FIFO_SIZE_MAX). Using 4MB FIFO since each VAS
+ * instance will be having fault window.
+ * 8MB FIFO can be used if expects more faults for each VAS
+ * instance.
+ */
+#define VAS_FAULT_WIN_FIFO_SIZE(4 << 20)
+
+/*
+ * Fault window is opened per VAS instance. NX pastes fault CRB in fault
+ * FIFO upon page faults.
+ */
+int vas_setup_fault_window(struct vas_instance *vinst)
+{
+   struct vas_rx_win_attr attr;
+
+   vinst->fault_fifo_size = VAS_FAULT_WIN_FIFO_SIZE;
+   vinst->fault_fifo = kzalloc(vinst->fault_fifo_size, GFP_KERNEL);
+   if (!vinst->fault_fifo) {
+   pr_err("Unable to alloc %d bytes for fault_fifo\n",
+   vinst->fault_fifo_size);
+   return -ENOMEM;
+   }
+
+   /*
+* Invalidate all CRB entries. NX pastes valid entry for each fault.
+*/
+   memset(vinst->fault_fifo, FIFO_INVALID_ENTRY, vinst->fault_fifo_size);
+   vas_init_rx_win_attr(&attr, VAS_COP_TYPE_FAULT);
+
+   attr.rx_fifo_size = vinst->fault_fifo_size;
+   attr.rx_fifo = vinst->fault_fifo;
+
+   /*
+* Max creds is based on number of CRBs can fit in the FIFO.
+* (fault_fifo_size/CRB_SIZE). If 8MB FIFO is used, max creds
+* will be 0x since the receive creds field is 16bits wide.
+*/
+   attr.wcreds_max = vinst->fault_fifo_size / CRB_SIZE;
+   attr.lnotify_lpid = 0;
+   attr.lnotify_pid = mfspr(SPRN_PID);
+   attr.lnotify_tid = mfspr(SPRN_PID);
+
+   vinst->fault_win = vas_rx_win_open(vinst->vas_id, VAS_COP_TYPE_FAULT,
+   &attr);
+
+   if (IS_ERR(vinst->fault_win)) {
+   pr_err("VAS: Error %ld opening FaultWin\n",
+   PTR_ERR(vinst->fault_win));
+   kfree(vinst->fault_fifo);
+   return PTR_ERR(vinst->fault_win);
+   }
+
+   pr_devel("VAS: Created FaultWin %d, LPID/PID/TID [%d/%d/%d]\n",
+   vinst->fault_win->winid, attr.lnotify_lpid,
+   attr.lnotify_pid, attr.lnotify_tid);
+
+   return 0;
+}
diff --git a/arch/powerpc/platforms/powernv/vas-window.c 
b/arch/powerpc/platforms/powernv/vas-window.c
index 0c0d27d..1783fa9 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -827,9 +827,9 @@ void vas_init_rx_win_attr(struct vas_rx_win_attr *rxattr, 
enum vas_cop_type cop)
rxattr->fault_win = true;
rxattr->notify_disable = true;
rxattr->rx_wcred_mode = true;
-   rxattr->tx_wcred_mode = true;
rxattr->rx_win_ord_mode = true;
-   rxattr->tx_win_ord_mode = true;
+   rxattr->rej_no_credit = true;
+   rxattr->tc_mode = VAS_THRESH_DISABLED;
} else if (cop == VAS_COP_TYPE_FTW) {
rxattr->user_win = true;
rxattr->intr_disable = true;
diff --git a/arch/powerpc/platforms/powernv/vas.c 
b/arch/powerpc/platforms/powernv/vas.c
index 

[PATCH v11 03/14] powerpc/vas: Alloc and setup IRQ and trigger port address

2020-04-15 Thread Haren Myneni


Allocate a xive irq on each chip with a vas instance. The NX coprocessor
raises a host CPU interrupt via vas if it encounters a page fault on a
user space request buffer. Subsequent patches register the trigger port
with the NX coprocessor, and create a vas fault handler for this
interrupt mapping.

Signed-off-by: Haren Myneni 
Reviewed-by: Cédric Le Goater 
---
 arch/powerpc/platforms/powernv/vas.c | 44 +++-
 arch/powerpc/platforms/powernv/vas.h |  2 ++
 2 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas.c 
b/arch/powerpc/platforms/powernv/vas.c
index ed9cc6d..3303cfe 100644
--- a/arch/powerpc/platforms/powernv/vas.c
+++ b/arch/powerpc/platforms/powernv/vas.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "vas.h"
 
@@ -25,10 +26,12 @@
 
 static int init_vas_instance(struct platform_device *pdev)
 {
-   int rc, cpu, vasid;
-   struct resource *res;
-   struct vas_instance *vinst;
struct device_node *dn = pdev->dev.of_node;
+   struct vas_instance *vinst;
+   struct xive_irq_data *xd;
+   uint32_t chipid, hwirq;
+   struct resource *res;
+   int rc, cpu, vasid;
 
rc = of_property_read_u32(dn, "ibm,vas-id", &vasid);
if (rc) {
@@ -36,6 +39,12 @@ static int init_vas_instance(struct platform_device *pdev)
return -ENODEV;
}
 
+   rc = of_property_read_u32(dn, "ibm,chip-id", &chipid);
+   if (rc) {
+   pr_err("No ibm,chip-id property for %s?\n", pdev->name);
+   return -ENODEV;
+   }
+
if (pdev->num_resources != 4) {
pr_err("Unexpected DT configuration for [%s, %d]\n",
pdev->name, vasid);
@@ -69,9 +78,32 @@ static int init_vas_instance(struct platform_device *pdev)
 
vinst->paste_win_id_shift = 63 - res->end;
 
-   pr_devel("Initialized instance [%s, %d], paste_base 0x%llx, "
-   "paste_win_id_shift 0x%llx\n", pdev->name, vasid,
-   vinst->paste_base_addr, vinst->paste_win_id_shift);
+   hwirq = xive_native_alloc_irq_on_chip(chipid);
+   if (!hwirq) {
+   pr_err("Inst%d: Unable to allocate global irq for chip %d\n",
+   vinst->vas_id, chipid);
+   return -ENOENT;
+   }
+
+   vinst->virq = irq_create_mapping(NULL, hwirq);
+   if (!vinst->virq) {
+   pr_err("Inst%d: Unable to map global irq %d\n",
+   vinst->vas_id, hwirq);
+   return -EINVAL;
+   }
+
+   xd = irq_get_handler_data(vinst->virq);
+   if (!xd) {
+   pr_err("Inst%d: Invalid virq %d\n",
+   vinst->vas_id, vinst->virq);
+   return -EINVAL;
+   }
+
+   vinst->irq_port = xd->trig_page;
+   pr_devel("Initialized instance [%s, %d] paste_base 0x%llx 
paste_win_id_shift 0x%llx IRQ %d Port 0x%llx\n",
+   pdev->name, vasid, vinst->paste_base_addr,
+   vinst->paste_win_id_shift, vinst->virq,
+   vinst->irq_port);
 
for_each_possible_cpu(cpu) {
if (cpu_to_chip_id(cpu) == of_get_ibm_chip_id(dn))
diff --git a/arch/powerpc/platforms/powernv/vas.h 
b/arch/powerpc/platforms/powernv/vas.h
index 5574aec..598608b 100644
--- a/arch/powerpc/platforms/powernv/vas.h
+++ b/arch/powerpc/platforms/powernv/vas.h
@@ -313,6 +313,8 @@ struct vas_instance {
u64 paste_base_addr;
u64 paste_win_id_shift;
 
+   u64 irq_port;
+   int virq;
struct mutex mutex;
struct vas_window *rxwin[VAS_COP_TYPE_MAX];
struct vas_window *windows[VAS_WINDOWS_PER_CHIP];
-- 
1.8.3.1





[PATCH v11 02/14] powerpc/vas: Define nx_fault_stamp in coprocessor_request_block

2020-04-15 Thread Haren Myneni


After processing a page fault, the kernel sets the fault address and
status in the CRB for an NX page fault on a user space address. User
space gets the signal, handles the fault mentioned in the CRB by bringing
the page into memory, and sends the NX request again.

Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Haren Myneni 
---
 arch/powerpc/include/asm/icswx.h | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/icswx.h b/arch/powerpc/include/asm/icswx.h
index 9872f85..965b1f3 100644
--- a/arch/powerpc/include/asm/icswx.h
+++ b/arch/powerpc/include/asm/icswx.h
@@ -108,6 +108,17 @@ struct data_descriptor_entry {
__be64 address;
 } __packed __aligned(DDE_ALIGN);
 
+/* 4.3.2 NX-stamped Fault CRB */
+
+#define NX_STAMP_ALIGN  (0x10)
+
+struct nx_fault_stamp {
+   __be64 fault_storage_addr;
+   __be16 reserved;
+   __u8   flags;
+   __u8   fault_status;
+   __be32 pswid;
+} __packed __aligned(NX_STAMP_ALIGN);
 
 /* Chapter 6.5.2 Coprocessor-Request Block (CRB) */
 
@@ -135,10 +146,15 @@ struct coprocessor_request_block {
 
struct coprocessor_completion_block ccb;
 
-   u8 reserved[48];
+   union {
+   struct nx_fault_stamp nx;
+   u8 reserved[16];
+   } stamp;
+
+   u8 reserved[32];
 
struct coprocessor_status_block csb;
-} __packed __aligned(CRB_ALIGN);
+} __packed;
 
 
 /* RFC02167 Initiate Coprocessor Instructions document
-- 
1.8.3.1





[PATCH v11 01/14] powerpc/xive: Define xive_native_alloc_irq_on_chip()

2020-04-15 Thread Haren Myneni


This function allocates an IRQ on a specific chip. VAS needs per-chip IRQ
allocation and will have an IRQ handler per VAS instance.

Signed-off-by: Haren Myneni 
Reviewed-by: Cédric Le Goater 
---
 arch/powerpc/include/asm/xive.h   | 9 -
 arch/powerpc/sysdev/xive/native.c | 6 +++---
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
index 93f982db..d08ea11 100644
--- a/arch/powerpc/include/asm/xive.h
+++ b/arch/powerpc/include/asm/xive.h
@@ -5,6 +5,8 @@
 #ifndef _ASM_POWERPC_XIVE_H
 #define _ASM_POWERPC_XIVE_H
 
+#include 
+
 #define XIVE_INVALID_VP0x
 
 #ifdef CONFIG_PPC_XIVE
@@ -108,7 +110,6 @@ struct xive_q {
 int xive_native_populate_irq_data(u32 hw_irq,
  struct xive_irq_data *data);
 void xive_cleanup_irq_data(struct xive_irq_data *xd);
-u32 xive_native_alloc_irq(void);
 void xive_native_free_irq(u32 irq);
 int xive_native_configure_irq(u32 hw_irq, u32 target, u8 prio, u32 sw_irq);
 
@@ -137,6 +138,12 @@ int xive_native_set_queue_state(u32 vp_id, uint32_t prio, 
u32 qtoggle,
u32 qindex);
 int xive_native_get_vp_state(u32 vp_id, u64 *out_state);
 bool xive_native_has_queue_state_support(void);
+extern u32 xive_native_alloc_irq_on_chip(u32 chip_id);
+
+static inline u32 xive_native_alloc_irq(void)
+{
+   return xive_native_alloc_irq_on_chip(OPAL_XIVE_ANY_CHIP);
+}
 
 #else
 
diff --git a/arch/powerpc/sysdev/xive/native.c 
b/arch/powerpc/sysdev/xive/native.c
index 0ff6b73..14d4406 100644
--- a/arch/powerpc/sysdev/xive/native.c
+++ b/arch/powerpc/sysdev/xive/native.c
@@ -279,12 +279,12 @@ static int xive_native_get_ipi(unsigned int cpu, struct 
xive_cpu *xc)
 }
 #endif /* CONFIG_SMP */
 
-u32 xive_native_alloc_irq(void)
+u32 xive_native_alloc_irq_on_chip(u32 chip_id)
 {
s64 rc;
 
for (;;) {
-   rc = opal_xive_allocate_irq(OPAL_XIVE_ANY_CHIP);
+   rc = opal_xive_allocate_irq(chip_id);
if (rc != OPAL_BUSY)
break;
msleep(OPAL_BUSY_DELAY_MS);
@@ -293,7 +293,7 @@ u32 xive_native_alloc_irq(void)
return 0;
return rc;
 }
-EXPORT_SYMBOL_GPL(xive_native_alloc_irq);
+EXPORT_SYMBOL_GPL(xive_native_alloc_irq_on_chip);
 
 void xive_native_free_irq(u32 irq)
 {
-- 
1.8.3.1





[PATCH v11 00/14] powerpc/vas: Page fault handling for user space NX requests

2020-04-15 Thread Haren Myneni


On power9, the Virtual Accelerator Switchboard (VAS) allows user space or
the kernel to communicate with the Nest Accelerator (NX) directly using
COPY/PASTE instructions. NX provides various functionalities such as
compression and encryption, but only compression (842 and GZIP formats)
is supported in the Linux kernel on power9.

The 842 compression driver (drivers/crypto/nx/nx-842-powernv.c)
is already included in Linux. Only GZIP support will be available from
user space.

Applications can issue GZIP compression / decompression requests to NX
with COPY/PASTE instructions. While NX is processing these requests, it
can hit a fault on the request buffer (not in memory). It then issues an
interrupt and pastes a fault CRB in the fault FIFO, and expects the
kernel to handle the fault and return credits for both the send and
fault windows after processing.

This patch series adds IRQ and fault window setup, and NX fault handling:
- Alloc an IRQ and trigger port address, and configure the IRQ per VAS
  instance.
- Set the port# for each window so that an interrupt is generated when a
  fault is noticed.
- Set up the fault window and FIFO in which NX pastes fault CRBs.
- Set up an IRQ thread fault handler per VAS instance.
- When receiving an interrupt, read CRBs from the fault FIFO and update the
  coprocessor_status_block (CSB) of the corresponding CRB with a translation
  failure (CSB_CC_TRANSLATION). After issuing NX requests, the process polls
  on the CSB address; when it sees a translation error, it can touch the
  request buffer to bring the page into memory and reissue the NX request
  (a user-space sketch of that loop follows this list).
- If copy_to_user() fails on the user space CSB address, the OS sends a SEGV
  signal.
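
A rough user-space sketch of that poll-and-retry loop; the structure
layout and the CSB_V / CSB_CC_TRANSLATION values below are simplified
stand-ins, the real definitions come with the separate NX-GZIP
enablement series:

#include <stdint.h>
#include <errno.h>
#include <endian.h>

struct csb {                            /* simplified stand-in */
        volatile uint8_t flags;         /* valid bit is set last by kernel/NX */
        uint8_t cs;
        uint8_t cc;                     /* completion code */
        uint8_t ce;
        uint32_t count;
        uint64_t reserved;
        uint64_t address;               /* fault address, big-endian */
};

#define CSB_V                   0x80    /* assumed valid-bit value */
#define CSB_CC_TRANSLATION      5       /* assumed translation-error code */

static int wait_for_csb(struct csb *csb)
{
        while (!(csb->flags & CSB_V))
                ;                       /* real code would pause/back off here */

        if (csb->cc == CSB_CC_TRANSLATION) {
                /* Touch the faulting page so it is resident, then retry. */
                volatile char *fault =
                        (volatile char *)(uintptr_t)be64toh(csb->address);
                (void)*fault;
                return -EAGAIN;         /* caller re-pastes the CRB */
        }
        return csb->cc ? -EIO : 0;
}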

Tested these patches with NX-GZIP enable patches and posted them as separate
patch series.

Patch 1: Define alloc IRQ per chip which is needed to alloc IRQ per VAS
   instance.
Patch 2: Define nx_fault_stamp on which NX writes fault status for the fault
 CRB
Patch 3: Alloc and setup IRQ and trigger port address for each VAS instance
Patches 4 & 5: Setup fault window and register NX per each VAS instance. This
 window is used for NX to paste fault CRB in FIFO.
Patch 6: Take references to the pid and mm so that the pid is not reused
 until the window is closed. Needed for multi-thread applications
 where a child can open a window and the parent can use it later.
Patch 7: Setup threaded IRQ handler per VAS
Patch 8: Process CRBs from fault FIFO and notify tasks by updating CSB or
 through signals.
Patches 9 & 11: Return credits for send and fault windows after handling
faults.
Patches 10 & 12: Dump FIFO / CRB data and messages for error conditions
Patch 13: Fix closing send window after all credits are returned. This issue
 happens only for user space requests. No page faults on kernel
 request buffer.
Patch 14: For each process / thread, use mm_context->vas_windows counter to
 clear foreign address mapping and disable it.

Changelog:

V2:
  - Use threaded IRQ instead of own kernel thread handler
  - Use pswid instead of user space CSB address to find valid CRB
  - Removed unused macros and other changes as suggested by Christoph Hellwig

V3:
  - Rebased to 5.5-rc2
  - Use struct pid * instead of pid_t for vas_window tgid
  - Code cleanup as suggested by Christoph Hellwig

V4:
  - Define xive alloc and get IRQ info based on chip ID and use these
   functions for IRQ setup per VAS instance. It eliminates skiboot
dependency as suggested by Oliver.

V5:
  - Do not update CSB if the process is exiting (patch8)

V6:
  - Add interrupt handler instead of default one and return IRQ_HANDLED
if the fault handling thread is already in progress. (Patch7)
  - Use platform send window ID and CCW[0] bit to find valid CRB in
fault FIFO (Patch7).
  - Return fault address to user space in BE and other changes as
suggested by Michael Neuling. (patch8)
  - Rebased to 5.6-rc4

V7:
  - Fixed sparse warnings (patches 4, 9 and 10)

V8:
  - Moved mm_context_remove_copro() before mmdrop() (patch6)
  - Moved barrier before csb.flags store and add WARN_ON_ONCE() checks (patch8)

V9:
  - Rebased to 5.6
  - Changes based on Cedric's comments
- Removed "Define xive_native_alloc_get_irq_info()" patch and used
  irq_get_handler_data() (patch3)
  - Changes based on comments from Nicholas Piggin
- Moved "Taking PID reference" patch before setting VAS fault handler
  patch
- Removed mutex_lock/unlock (patch7)
- Other cleanup changes

V10:
  - Include patch to enable and disable CP_ABORT execution using
mm_context->vas_windows counter.
  - Remove 'if (txwin)' line which is covered with 'else' before (patch6)

V11:
  - Added comments for fault_lock and fifo_in_progress elements (patch7)
  - Use pr_warn_ratelimited instead of pr_debug to display message during
window close (patch12)
  - Moved set_thread_uses_vas() to vas_win_open() (patch14)

Haren Myneni (14):
  powerpc/xive: Define xive_native_alloc_irq_on_chip()
  powerpc/vas: Define nx_fault_stamp in coprocessor_request

Re: [PATCH] powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()

2020-04-15 Thread Christophe Leroy

Hi,

On 16/04/2020 at 00:06, Segher Boessenkool wrote:

Hi!

On Wed, Apr 15, 2020 at 09:20:26AM +, Christophe Leroy wrote:

At the time being, __put_user()/__get_user() and friends only use
register indirect with immediate index addressing, with the index
set to 0. Ex:

lwz reg1, 0(reg2)


This is called a "D-form" instruction, or sometimes "offset addressing".
Don't talk about an "index", it confuses things, because the *other*
kind is called "indexed" already, also in the ISA docs!  (X-form, aka
indexed addressing, [reg+reg], where D-form does [reg+imm], and both
forms can do [reg]).
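
As a minimal illustration of the two forms (register choices are arbitrary):

	lwz  r9, 8(r3)		# D-form, "offset addressing":  EA = r3 + 8
	lwzx r9, r3, r4		# X-form, "indexed addressing": EA = r3 + r4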


In the "Programming Environments Manual for 32-Bit Implementations of 
the PowerPC™ Architecture", they list the following addressing modes:


Load and store operations have three categories of effective address
generation that depend on the operands specified:
• Register indirect with immediate index mode
• Register indirect with index mode
• Register indirect mode





Give the compiler the opportunity to use other adressing modes
whenever possible, to get more optimised code.


Great :-)


--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -114,7 +114,7 @@ extern long __put_user_bad(void);
   */
  #define __put_user_asm(x, addr, err, op)  \
__asm__ __volatile__(   \
-   "1:" op " %1,0(%2)   # put_user\n"  \
+   "1:" op "%U2%X2 %1,%2# put_user\n"  \
"2:\n"\
".section .fixup,\"ax\"\n"  \
"3:li %0,%3\n"\
@@ -122,7 +122,7 @@ extern long __put_user_bad(void);
".previous\n" \
EX_TABLE(1b, 3b)\
: "=r" (err)  \
-   : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err))
+   : "r" (x), "m" (*addr), "i" (-EFAULT), "0" (err))


%Un on an "m" operand doesn't do much: you need to make it "m<>" if you
want pre-modify ("update") insns to be generated.  (You then will want
to make sure that operand is used in a way GCC can understand; since it
is used only once here, that works fine).


Ah ? Indeed I got the idea from include/asm/io.h where there is:

#define DEF_MMIO_IN_D(name, size, insn) \
static inline u##size name(const volatile u##size __iomem *addr)\
{   \
u##size ret;\
__asm__ __volatile__("sync;"#insn"%U1%X1 %0,%1;twi 0,%0,0;isync"\
: "=r" (ret) : "m" (*addr) : "memory");   \
return ret; \
}

It should be "m<>" there as well ?
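
For reference, a minimal sketch of what the update-capable form would look like
(variable names are placeholders):

	/*
	 * "m<>" lets GCC pick pre-modify (update) addressing, so %U1 can
	 * expand to the "u" suffix and %X1 to the "x" suffix of the load
	 * when an update or indexed form is chosen.
	 */
	__asm__ __volatile__("lwz%U1%X1 %0,%1" : "=r" (val) : "m<>" (*addr));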




@@ -130,8 +130,8 @@ extern long __put_user_bad(void);
  #else /* __powerpc64__ */
  #define __put_user_asm2(x, addr, err) \
__asm__ __volatile__(   \
-   "1:stw %1,0(%2)\n"\
-   "2:stw %1+1,4(%2)\n"  \
+   "1:stw%U2%X2 %1,%2\n" \
+   "2:stw%U2%X2 %L1,%L2\n"   \
"3:\n"\
".section .fixup,\"ax\"\n"  \
"4:li %0,%3\n"\
@@ -140,7 +140,7 @@ extern long __put_user_bad(void);
EX_TABLE(1b, 4b)\
EX_TABLE(2b, 4b)\
: "=r" (err)  \
-   : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err))
+   : "r" (x), "m" (*addr), "i" (-EFAULT), "0" (err))


Here, it doesn't work.  You don't want two consecutive update insns in
any case.  Easiest is to just not use "m<>", and then, don't use %Un
(which won't do anything, but it is confusing).


Can't we leave the Un on the second stw ?



Same for the reads.

Rest looks fine, and update should be good with that fixed as said.

Reviewed-by: Segher Boessenkool 


Segher



Thanks for the review
Christophe


Re: [PATCH v2,5/5] drivers: uio: new driver for fsl_85xx_cache_sram

2020-04-15 Thread Wang Wenhu
Yes, kzalloc() zeroes the allocated area, so the init of the remaining array
elements is redundant. I will remove the block in v3.

>> > +  dev_err(&pdev->dev, "error no valid uio-map configured\n");
>> > +  ret = -EINVAL;
>> > +  goto err_info_free_internel;
>> > +  }
>> > +
>> > +  info->version = "0.1.0";
>> 
>> Could you define some DRIVER_VERSION in the top of the file next to 
>> DRIVER_NAME instead of hard coding in the middle on a function ?
>
>That's what v1 had, and Greg KH said to remove it.  I'm guessing that he
>thought it was the common-but-pointless practice of having the driver print a
>version number that never gets updated, rather than something the UIO API
>(unfortunately, compared to a feature query interface) expects.  That said,
>I'm not sure what the value is of making it a macro since it should only be
>used once, that use is self documenting, it isn't tunable, etc.  Though if
>this isn't a macro, UIO_NAME also shouldn't be (and if it is made a macro
>again, it should be UIO_VERSION, not DRIVER_VERSION).
>
>Does this really need a three-part version scheme?  What's wrong with a
>version of "1", to be changed to "2" in the hopefully-unlikely event that the
>userspace API changes?  Assuming UIO is used for this at all, which doesn't
>seem like a great fit to me.
>
>-Scott
>

As Scott mentioned, the version is required by the uio core but is actually
useless for us here (and for many other types of devices, I guess). So maybe
the better way is to make it optional, but that change belongs first in the
uio core.

For the cache-sram uio driver, I will define a UIO_VERSION macro as a
compromise that fits all concerns, without the confusion Greg first mentioned.

>> +static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
>> +{   .compatible = "uio,fsl,p2020-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p2010-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1020-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1011-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1013-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1022-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,mpc8548-l2-cache-controller",},
>> +{   .compatible = "uio,fsl,mpc8544-l2-cache-controller",},
>> +{   .compatible = "uio,fsl,mpc8572-l2-cache-controller",},
>> +{   .compatible = "uio,fsl,mpc8536-l2-cache-controller",},
>> +{   .compatible = "uio,fsl,p1021-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1012-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1025-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1016-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1024-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1015-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1010-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,bsc9131-l2-cache-controller",},
>> +{},
>> +};
>
>NACK
>
>The device tree describes the hardware, not what driver you want to bind the
>hardware to, or how you want to allocate the resources.  And even if defining
>nodes for sram allocation were the right way to go, why do you have a separate
>compatible for each chip when you're just describing software configuration?
>
>Instead, have module parameters that take the sizes and alignments you'd like
>to allocate and expose to userspace.  Better still would be some sort of
>dynamic allocation (e.g. open a fd, ioctl to set the requested size/alignment,
>if it succeeds you can mmap it, and when the fd is closed the region is
>freed).
>
>-Scott
>

Could not agree more. But what if I want to define more than one cache-sram uio
device? How about using the device tree for a pseudo uio cache-sram driver, like
this:

static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
{   .compatible = "uio,cache-sram", },
{},
};

Thanks,
Wenhu
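
As a rough illustration of the dynamic allocation model Scott describes above
(every name below -- the device node, ioctl number and request struct -- is a
made-up placeholder, not an existing ABI):

#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

struct sram_alloc_req {
	unsigned long size;
	unsigned long align;
};
#define SRAM_IOC_ALLOC	_IOWR('S', 0, struct sram_alloc_req)	/* placeholder */

/* Open the device, request a region, and map it; the kernel would free the
 * region when the file descriptor is closed.
 */
static void *sram_alloc(size_t size, size_t align, int *fdp)
{
	struct sram_alloc_req req = { .size = size, .align = align };
	int fd = open("/dev/fsl_sram", O_RDWR);		/* placeholder node */
	void *p;

	if (fd < 0)
		return NULL;
	if (ioctl(fd, SRAM_IOC_ALLOC, &req) < 0) {
		close(fd);
		return NULL;
	}
	p = mmap(NULL, req.size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return NULL;
	}
	*fdp = fd;
	return p;
}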


[PATCH v2] KVM: Optimize kvm_arch_vcpu_ioctl_run function

2020-04-15 Thread Tianjia Zhang
In earlier versions of kvm, 'kvm_run' was an independent structure
and was not included in the vcpu structure. At present, 'kvm_run'
is already included in the vcpu structure, so the parameter
'kvm_run' is redundant.

This patch simplifies the function definition, removes the extra
'kvm_run' parameter, and extracts it from the 'kvm_vcpu' structure
where necessary.

Signed-off-by: Tianjia Zhang 
---

v2 change:
  remove 'kvm_run' parameter and extract it from 'kvm_vcpu'

 arch/mips/kvm/mips.c   |  3 ++-
 arch/powerpc/kvm/powerpc.c |  3 ++-
 arch/s390/kvm/kvm-s390.c   |  3 ++-
 arch/x86/kvm/x86.c | 11 ++-
 include/linux/kvm_host.h   |  2 +-
 virt/kvm/arm/arm.c |  6 +++---
 virt/kvm/kvm_main.c|  2 +-
 7 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 8f05dd0a0f4e..ec24adf4857e 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -439,8 +439,9 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu 
*vcpu,
return -ENOIOCTLCMD;
 }
 
-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
+   struct kvm_run *run = vcpu->run;
int r = -EINTR;
 
vcpu_load(vcpu);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index e15166b0a16d..7e24691e138a 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1764,8 +1764,9 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, 
struct kvm_one_reg *reg)
return r;
 }
 
-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
+   struct kvm_run *run = vcpu->run;
int r;
 
vcpu_load(vcpu);
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 19a81024fe16..443af3ead739 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4333,8 +4333,9 @@ static void store_regs(struct kvm_vcpu *vcpu, struct 
kvm_run *kvm_run)
store_regs_fmt2(vcpu, kvm_run);
 }
 
-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
+   struct kvm_run *kvm_run = vcpu->run;
int rc;
 
if (kvm_run->immediate_exit)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3bf2ecafd027..a0338e86c90f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8707,8 +8707,9 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
trace_kvm_fpu(0);
 }
 
-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
+   struct kvm_run *kvm_run = vcpu->run;
int r;
 
vcpu_load(vcpu);
@@ -8726,18 +8727,18 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
struct kvm_run *kvm_run)
r = -EAGAIN;
if (signal_pending(current)) {
r = -EINTR;
-   vcpu->run->exit_reason = KVM_EXIT_INTR;
+   kvm_run->exit_reason = KVM_EXIT_INTR;
++vcpu->stat.signal_exits;
}
goto out;
}
 
-   if (vcpu->run->kvm_valid_regs & ~KVM_SYNC_X86_VALID_FIELDS) {
+   if (kvm_run->kvm_valid_regs & ~KVM_SYNC_X86_VALID_FIELDS) {
r = -EINVAL;
goto out;
}
 
-   if (vcpu->run->kvm_dirty_regs) {
+   if (kvm_run->kvm_dirty_regs) {
r = sync_regs(vcpu);
if (r != 0)
goto out;
@@ -8767,7 +8768,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *kvm_run)
 
 out:
kvm_put_guest_fpu(vcpu);
-   if (vcpu->run->kvm_valid_regs)
+   if (kvm_run->kvm_valid_regs)
store_regs(vcpu);
post_kvm_run_save(vcpu);
kvm_sigset_deactivate(vcpu);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6d58beb65454..1e17ef719595 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -866,7 +866,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
struct kvm_mp_state *mp_state);
 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
struct kvm_guest_debug *dbg);
-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run);
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu);
 
 int kvm_arch_init(void *opaque);
 void kvm_arch_exit(void);
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 48d0ec44ad77..f5390ac2165b 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -639,7 +639,6 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
 /**
  * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
  * @vcpu:  The VCPU pointer
- * @run:   The kvm_run structure pointer used for userspace state

[PATCH] KVM: PPC: Book3S HV: Handle non-present PTEs in page fault functions

2020-04-15 Thread Paul Mackerras
Since cd758a9b57ee "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT
page fault handler", it's been possible in fairly rare circumstances to
load a non-present PTE in kvmppc_book3s_hv_page_fault() when running a
guest on a POWER8 host.

Because that case wasn't checked for, we could misinterpret the non-present
PTE as being a cache-inhibited PTE.  That could mismatch with the
corresponding hash PTE, which would cause the function to fail with -EFAULT
a little further down.  That would propagate up to the KVM_RUN ioctl()
generally causing the KVM userspace (usually qemu) to fall over.

This addresses the problem by catching that case and returning to the guest
instead, letting it fault again, and retrying the whole page fault from
the beginning.

For completeness, this fixes the radix page fault handler in the same
way.  For radix this didn't cause any obvious misbehaviour, because we
ended up putting the non-present PTE into the guest's partition-scoped
page tables, leading immediately to another hypervisor data/instruction
storage interrupt, which would go through the page fault path again
and fix things up.

Fixes: cd758a9b57ee "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page 
fault handler"
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1820402
Reported-by: David Gibson 
Signed-off-by: Paul Mackerras 
---
This is a reworked version of the patch David Gibson sent recently,
with the fix applied to the radix case as well. The commit message
is mostly stolen from David's patch.

 arch/powerpc/kvm/book3s_64_mmu_hv.c| 9 +
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 9 +
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 3aecec8..20b7dce 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -604,18 +604,19 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
 */
local_irq_disable();
ptep = __find_linux_pte(vcpu->arch.pgdir, hva, NULL, &shift);
+   pte = __pte(0);
+   if (ptep)
+   pte = *ptep;
+   local_irq_enable();
/*
 * If the PTE disappeared temporarily due to a THP
 * collapse, just return and let the guest try again.
 */
-   if (!ptep) {
-   local_irq_enable();
+   if (!pte_present(pte)) {
if (page)
put_page(page);
return RESUME_GUEST;
}
-   pte = *ptep;
-   local_irq_enable();
hpa = pte_pfn(pte) << PAGE_SHIFT;
pte_size = PAGE_SIZE;
if (shift)
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 134fbc1..7bf94ba 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -815,18 +815,19 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
 */
local_irq_disable();
ptep = __find_linux_pte(vcpu->arch.pgdir, hva, NULL, &shift);
+   pte = __pte(0);
+   if (ptep)
+   pte = *ptep;
+   local_irq_enable();
/*
 * If the PTE disappeared temporarily due to a THP
 * collapse, just return and let the guest try again.
 */
-   if (!ptep) {
-   local_irq_enable();
+   if (!pte_present(pte)) {
if (page)
put_page(page);
return RESUME_GUEST;
}
-   pte = *ptep;
-   local_irq_enable();
 
/* If we're logging dirty pages, always map single pages */
large_enable = !(memslot->flags & KVM_MEM_LOG_DIRTY_PAGES);
-- 
2.7.4



Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

2020-04-15 Thread Florian Weimer
* Rich Felker:

> My preference would be that it work just like the i386 AT_SYSINFO
> where you just replace "int $128" with "call *%%gs:16" and the kernel
> provides a stub in the vdso that performs either scv or the old
> mechanism with the same calling convention.

The i386 mechanism has received some criticism because it provides an
effective means to redirect execution flow to anyone who can write to
the TCB.  I am not sure if it makes sense to copy it.


Re: [PATCH v2] powerpc/setup_64: Set cache-line-size based on cache-block-size

2020-04-15 Thread Chris Packham
Hi All,

On Wed, 2020-03-25 at 16:18 +1300, Chris Packham wrote:
> If {i,d}-cache-block-size is set and {i,d}-cache-line-size is not,
> use
> the block-size value for both. Per the devicetree spec cache-line-
> size
> is only needed if it differs from the block size.
> 
> Signed-off-by: Chris Packham 
> ---
> It looks as though the bsizep = lsizep is not required per the spec
> but it's
> probably safer to retain it.
> 
> Changes in v2:
> - Scott pointed out that u-boot should be filling in the cache
> properties
>   (which it does). But it does not specify a cache-line-size because
> it
>   provides a cache-block-size and the spec says you don't have to if
> they are
>   the same. So the error is in the parsing not in the devicetree
> itself.
> 

Ping? This thread went kind of quiet.

>  arch/powerpc/kernel/setup_64.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/setup_64.c
> b/arch/powerpc/kernel/setup_64.c
> index e05e6dd67ae6..dd8a238b54b8 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -516,6 +516,8 @@ static bool __init parse_cache_info(struct
> device_node *np,
>   lsizep = of_get_property(np, propnames[3], NULL);
>   if (bsizep == NULL)
>   bsizep = lsizep;
> + if (lsizep == NULL)
> + lsizep = bsizep;
>   if (lsizep != NULL)
>   lsize = be32_to_cpu(*lsizep);
>   if (bsizep != NULL)


Re: [PATCH v2, 1/5] powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable

2020-04-15 Thread Wang Wenhu
From: Scott Wood 

>> +bool "32-bit kernel"
>
>Why make that user selectable ?
>
>Either a kernel is 64-bit or it is 32-bit. So having PPC64 user 
>selectable is all we need.
>
>And what is the link between this change and the description in the log ?
>
>>  default y if !PPC64
>>  select KASAN_VMALLOC if KASAN && MODULES
>>   
>> @@ -15,6 +15,7 @@ config PPC_BOOK3S_32
>>  bool
>>   
>>   menu "Processor support"
>> +
>
>Why adding this space ?
>
>>   choice
>>  prompt "Processor Type"
>>  depends on PPC32
>> @@ -211,9 +212,9 @@ config PPC_BOOK3E
>>  depends on PPC_BOOK3E_64
>>   
>>   config E500
>> +bool "e500 Support"
>>  select FSL_EMB_PERFMON
>>  select PPC_FSL_BOOK3E
>> -bool
>
>Why make this user-selectable ? This is already selected by the 
>processors requiring it, ie 8500, e5500 and e6500.
>
>Is there any other case where we need E500 ?
>
>And again, what's the link between this change and the description in 
>the log ?
>
>
>>   
>>   config PPC_E500MC
>>  bool "e500mc Support"
>> 
>
>Christophe

Hi, Scott, Christophe!

I find that I did not quite get the point of the differences between
configurability and selectability (maybe words I created) of Kconfig items.

You are right that FSL_85XX_CACHE_SRAM should only be selected by a caller
and never be enabled separately.

Same answer for the comments from Christophe. I will drop this patch in v3.

Thanks,
Wenhu


Re: CVE-2020-11669: Linux kernel 4.10 to 5.1: powerpc: guest can cause DoS on POWER9 KVM hosts

2020-04-15 Thread Paul Mackerras
On Wed, Apr 15, 2020 at 04:03:29PM +0200, Michal Suchánek wrote:
> On Wed, Apr 15, 2020 at 10:52:53PM +1000, Andrew Donnellan wrote:
> > The Linux kernel for powerpc from v4.10 to v5.1 has a bug where the
> > Authority Mask Register (AMR), Authority Mask Override Register (AMOR) and
> > User Authority Mask Override Register (UAMOR) are not correctly saved and
> > restored when the CPU is going into/coming out of idle state.
> > 
> > On POWER9 CPUs, this means that a CPU may return from idle with the AMR
> > value of another thread on the same core.
> > 
> > This allows a trivial Denial of Service attack against KVM hosts, by booting
> > a guest kernel which makes use of the AMR, such as a v5.2 or later kernel
> > with Kernel Userspace Access Prevention (KUAP) enabled.
> > 
> > The guest kernel will set the AMR to prevent userspace access, then the
> > thread will go idle. At a later point, the hardware thread that the guest
> > was using may come out of idle and start executing in the host, without
> > restoring the host AMR value. The host kernel can get caught in a page fault
> > loop, as the AMR is unexpectedly causing memory accesses to fail in the
> > host, and the host is eventually rendered unusable.
> 
> Hello,
> 
> shouldn't the kernel restore the host registers when leaving the guest?

It does.  That's not the bug.

> I recall some code exists for handling the *AM*R when leaving guest. Can
> the KVM guest enter idle without exiting to host?

No, we currently never execute the "stop" instruction in guest context.

The bug occurs when a thread that is in the host goes idle and
executes the stop instruction to go to a power-saving state, while
another thread is executing inside a guest.  Hardware loses the first
thread's AMR while it is stopped, and as it happens, it is possible
for the first thread to wake up with the contents of its AMR equal to
the other thread's AMR.  This can happen even if the first thread has
never executed in the guest.

The kernel needs to save and restore AMR (among other registers)
across the stop instruction because of this hardware behaviour.
We missed the AMR initially, which is what led to this vulnerability.

Paul.


[PATCH] KVM: PPC: Handle non-present PTEs in kvmppc_book3s_hv_page_fault()

2020-04-15 Thread David Gibson
Since cd758a9b57ee "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT
page fault handler", it's been possible in fairly rare circumstances to
load a non-present PTE in kvmppc_book3s_hv_page_fault() when running a
guest on a POWER8 host.

Because that case wasn't checked for, we could misinterpret the non-present
PTE as being a cache-inhibited PTE.  That could mismatch with the
corresponding hash PTE, which would cause the function to fail with -EFAULT
a little further down.  That would propagate up to the KVM_RUN ioctl()
generally causing the KVM userspace (usually qemu) to fall over.

This addresses the problem by catching that case and returning to the guest
instead.

Fixes: cd758a9b57ee "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page 
fault handler"
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1820402
Suggested-by: Paul Mackerras 
Signed-off-by: David Gibson 
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 6404df613ea3..394fca8e630a 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -616,6 +616,11 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
}
pte = *ptep;
local_irq_enable();
+   if (!pte_present(pte)) {
+   if (page)
+   put_page(page);
+   return RESUME_GUEST;
+   }
hpa = pte_pfn(pte) << PAGE_SHIFT;
pte_size = PAGE_SIZE;
if (shift)
-- 
2.25.2



Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

2020-04-15 Thread Nicholas Piggin
Excerpts from Rich Felker's message of April 16, 2020 1:03 pm:
> On Thu, Apr 16, 2020 at 12:53:31PM +1000, Nicholas Piggin wrote:
>> > Not to mention the dcache line to access
>> > __hwcap or whatever, and the icache lines to setup access TOC-relative
>> > access to it. (Of course you could put a copy of its value in TLS at a
>> > fixed offset, which would somewhat mitigate both.)
>> > 
>> >> And finally, the HWCAP test can eventually go away in future. A vdso
>> >> call can not.
>> > 
>> > We support nearly arbitrarily old kernels (with limited functionality)
>> > and hardware (with full functionality) and don't intend for that to
>> > change, ever. But indeed glibc might want too eventually drop the
>> > check.
>> 
>> Ah, cool. Any build-time flexibility there?
>> 
>> We may or may not be getting a new ABI that will use instructions not 
>> supported by old processors.
>> 
>> https://sourceware.org/legacy-ml/binutils/2019-05/msg00331.html
>> 
>> Current ABI continues to work of course and be the default for some 
>> time, but building for new one would give some opportunity to drop
>> such support for old procs, at least for glibc.
> 
> What does "new ABI" entail to you? In the terminology I use with musl,
> "new ABI" and "new ISA level" are different things. You can compile
> (explicit -march or compiler default) binaries that won't run on older
> cpus due to use of new insns etc., but we consider it the same ABI if
> you can link code for an older/baseline ISA level with the
> newer-ISA-level object files, i.e. if the interface surface for
> linkage remains compatible. We also try to avoid gratuitous
> proliferation of different ABIs unless there's a strong underlying
> need (like addition of softfloat ABIs for archs that usually have FPU,
> or vice versa).

Yeah it will be a new ABI type that also requires a new ISA level.
As far as I know (and I'm not on the toolchain side) there will be
some call compatibility between the two, so it may be fine to
continue with the existing ABI for libc. But it is just something that
comes to mind as a build-time cutover where we might be able to
assume particular features.

> In principle the same could be done for kernels except it's a bigger
> silent gotcha (possible ENOSYS in places where it shouldn't be able to
> happen rather than a trapping SIGILL or similar) and there's rarely
> any serious performance or size benefit to dropping support for older
> kernels.

Right, I don't think it'd be a huge problem whatever way we go,
compared with the cost of the system call.

Thanks,
Nick


Re: [PATCH] papr/scm: Add bad memory ranges to nvdimm bad ranges

2020-04-15 Thread Vaibhav Jain
Hi Santosh,

Some review comments below.

Santosh Sivaraj  writes:

> Subscribe to the MCE notification and add the physical address which
> generated a memory error to nvdimm bad range.
>
> Signed-off-by: Santosh Sivaraj 
> ---
>
> This patch depends on "powerpc/mce: Add MCE notification chain" [1].
>
> Unlike the previous series[2], the patch adds badblock registration only for
> pseries scm driver. Handling badblocks for baremetal (powernv) PMEM will be 
> done
> later and if possible get the badblock handling as a common code.
>
> [1] 
> https://lore.kernel.org/linuxppc-dev/20200330071219.12284-1-ganes...@linux.ibm.com/
> [2] 
> https://lore.kernel.org/linuxppc-dev/20190820023030.18232-1-sant...@fossix.org/
>
> arch/powerpc/platforms/pseries/papr_scm.c | 96 ++-
>  1 file changed, 95 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
> b/arch/powerpc/platforms/pseries/papr_scm.c
> index 0b4467e378e5..5012cbf4606e 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -12,6 +12,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  #include 
>  
> @@ -39,8 +41,12 @@ struct papr_scm_priv {
>   struct resource res;
>   struct nd_region *region;
>   struct nd_interleave_set nd_set;
> + struct list_head region_list;
>  };
>  
> +LIST_HEAD(papr_nd_regions);
> +DEFINE_MUTEX(papr_ndr_lock);
> +
>  static int drc_pmem_bind(struct papr_scm_priv *p)
>  {
>   unsigned long ret[PLPAR_HCALL_BUFSIZE];
> @@ -372,6 +378,10 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>   dev_info(dev, "Region registered with target node %d and online 
> node %d",
>target_nid, online_nid);
>  
> + mutex_lock(&papr_ndr_lock);
> + list_add_tail(&p->region_list, &papr_nd_regions);
> + mutex_unlock(&papr_ndr_lock);
> +
>   return 0;
>  
>  err: nvdimm_bus_unregister(p->bus);
> @@ -379,6 +389,68 @@ err: nvdimm_bus_unregister(p->bus);
>   return -ENXIO;
>  }
>  
> +void papr_scm_add_badblock(struct nd_region *region, struct nvdimm_bus *bus,
> +u64 phys_addr)
> +{
> + u64 aligned_addr = ALIGN_DOWN(phys_addr, L1_CACHE_BYTES);
> +
> + if (nvdimm_bus_add_badrange(bus, aligned_addr, L1_CACHE_BYTES)) {
> + pr_err("Bad block registration for 0x%llx failed\n", phys_addr);
> + return;
> + }
> +
> + pr_debug("Add memory range (0x%llx - 0x%llx) as bad range\n",
> +  aligned_addr, aligned_addr + L1_CACHE_BYTES);
> +
> + nvdimm_region_notify(region, NVDIMM_REVALIDATE_POISON);
> +}
> +
> +static int handle_mce_ue(struct notifier_block *nb, unsigned long val,
> +  void *data)
> +{
> + struct machine_check_event *evt = data;
> + struct papr_scm_priv *p;
> + u64 phys_addr;
> + bool found = false;
> +
> + if (evt->error_type != MCE_ERROR_TYPE_UE)
> + return NOTIFY_DONE;
> +
> + if (list_empty(&papr_nd_regions))
> + return NOTIFY_DONE;
> +
> + phys_addr = evt->u.ue_error.physical_address +
> + (evt->u.ue_error.effective_address & ~PAGE_MASK);
Though it seems that you are trying to get the actual physical address
from the page-aligned evt->u.ue_error.physical_address, it would be nice
if you could put a comment as to why you are doing this seemingly weird
math with real and effective addresses here.
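Something along these lines (wording only given as illustration) would make
the intent obvious at the call site:

	/*
	 * physical_address is page aligned; add the page offset taken
	 * from the effective address to get the exact failing location.
	 */
	phys_addr = evt->u.ue_error.physical_address +
		    (evt->u.ue_error.effective_address & ~PAGE_MASK);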
> +
> + if (!evt->u.ue_error.physical_address_provided ||
> + !is_zone_device_page(pfn_to_page(phys_addr >> PAGE_SHIFT)))
> + return NOTIFY_DONE;
> +
> + /* mce notifier is called from a process context, so mutex is safe */
> + mutex_lock(&papr_ndr_lock);
> + list_for_each_entry(p, &papr_nd_regions, region_list) {
> + struct resource res = p->res;
> +
> + if (phys_addr >= res.start && phys_addr <= res.end) {
> + found = true;
> + break;
> + }
> + }
> +
> + mutex_unlock(&papr_ndr_lock);
> +
> + if (!found)
> + return NOTIFY_DONE;
> +
> + papr_scm_add_badblock(p->region, p->bus, phys_addr);
I see a possible race between papr_scm_add_badblock() and
papr_scm_remove(), as a bad block may be reported just as a region is
being removed. Would recommend calling papr_scm_add_badblock() while
still holding papr_ndr_lock.

> +
> + return NOTIFY_OK;
> +}
> +
> +static struct notifier_block mce_ue_nb = {
> + .notifier_call = handle_mce_ue
> +};
> +
>  static int papr_scm_probe(struct platform_device *pdev)
>  {
>   struct device_node *dn = pdev->dev.of_node;
> @@ -476,6 +548,10 @@ static int papr_scm_remove(struct platform_device *pdev)
>  {
>   struct papr_scm_priv *p = platform_get_drvdata(pdev);
>  
> + mutex_lock(&papr_ndr_lock);
> + list_del(&(p->region_list));
> + mutex_unlock(&papr_ndr_lock);
> +
>   nvdimm_bus_unregister(p->bus

Re: [Bug 206203] kmemleak reports various leaks in drivers/of/unittest.c

2020-04-15 Thread Frank Rowand
On 4/8/20 10:22 AM, Frank Rowand wrote:
> Hi Michael,
> 
> On 4/7/20 10:13 PM, Michael Ellerman wrote:
>> bugzilla-dae...@bugzilla.kernel.org writes:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=206203
>>>
>>> Erhard F. (erhar...@mailbox.org) changed:
>>>
>>>What|Removed |Added
>>> 
>>>  Attachment #286801|0   |1
>>> is obsolete||
>>>
>>> --- Comment #10 from Erhard F. (erhar...@mailbox.org) ---
>>> Created attachment 288189
>>>   --> https://bugzilla.kernel.org/attachment.cgi?id=288189&action=edit
>>> kmemleak output (kernel 5.6.2, Talos II)
>>
>> These are all in or triggered by the of unittest code AFAICS.
>> Content of the log reproduced below.
>>
>> Frank/Rob, are these memory leaks expected?
> 
> Thanks for the report.  I'll look at each one.

Only one of the leaks was expected.  I have patches to fix the
unexpected leaks and to remove the expected leak so that the
kmemleak report of it will not have to be checked again.

I expect to send the patch series tomorrow (Thursday).

-Frank

> 
> -Frank
> 
> 
>>
>> cheers
>>
>>
>> unreferenced object 0xc007eb89ca58 (size 192):
>>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>>   hex dump (first 32 bytes):
>> c0 00 00 00 00 d9 21 38 00 00 00 00 00 00 00 00  ..!8
>> c0 00 00 07 ec 97 80 08 00 00 00 00 00 00 00 00  
>>   backtrace:
>> [<07b50c76>] .__of_node_dup+0x38/0x1c0
>> [] .of_unittest_changeset+0x13c/0xa20
>> [<925a8013>] .of_unittest+0x1ba0/0x3778
>> [] .do_one_initcall+0x7c/0x420
>> [] .kernel_init_freeable+0x318/0x3d8
>> [<01b957ee>] .kernel_init+0x14/0x168
>> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68
>> unreferenced object 0xc007ec978008 (size 8):
>>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>>   hex dump (first 8 bytes):
>> 6e 31 00 6b 6b 6b 6b a5  n1..
>>   backtrace:
>> [] .kstrdup+0x44/0xb0
>> [] .__of_node_dup+0x50/0x1c0
>> [] .of_unittest_changeset+0x13c/0xa20
>> [<925a8013>] .of_unittest+0x1ba0/0x3778
>> [] .do_one_initcall+0x7c/0x420
>> [] .kernel_init_freeable+0x318/0x3d8
>> [<01b957ee>] .kernel_init+0x14/0x168
>> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68
>> unreferenced object 0xc007eb89e318 (size 192):
>>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>>   hex dump (first 32 bytes):
>> c0 00 00 00 00 d9 21 38 00 00 00 00 00 00 00 00  ..!8
>> c0 00 00 07 ec 97 ab 08 00 00 00 00 00 00 00 00  
>>   backtrace:
>> [<07b50c76>] .__of_node_dup+0x38/0x1c0
>> [<881dc9c4>] .of_unittest_changeset+0x194/0xa20
>> [<925a8013>] .of_unittest+0x1ba0/0x3778
>> [] .do_one_initcall+0x7c/0x420
>> [] .kernel_init_freeable+0x318/0x3d8
>> [<01b957ee>] .kernel_init+0x14/0x168
>> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68
>> unreferenced object 0xc007ec97ab08 (size 8):
>>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>>   hex dump (first 8 bytes):
>> 6e 32 00 6b 6b 6b 6b a5  n2..
>>   backtrace:
>> [] .kstrdup+0x44/0xb0
>> [] .__of_node_dup+0x50/0x1c0
>> [<881dc9c4>] .of_unittest_changeset+0x194/0xa20
>> [<925a8013>] .of_unittest+0x1ba0/0x3778
>> [] .do_one_initcall+0x7c/0x420
>> [] .kernel_init_freeable+0x318/0x3d8
>> [<01b957ee>] .kernel_init+0x14/0x168
>> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68
>> unreferenced object 0xc007eb89e528 (size 192):
>>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>>   hex dump (first 32 bytes):
>> c0 00 00 07 ec 97 bd d8 00 00 00 00 00 00 00 00  
>> c0 00 00 07 ec 97 b3 18 00 00 00 00 00 00 00 00  
>>   backtrace:
>> [<07b50c76>] .__of_node_dup+0x38/0x1c0
>> [] .of_unittest_changeset+0x1ec/0xa20
>> [<925a8013>] .of_unittest+0x1ba0/0x3778
>> [] .do_one_initcall+0x7c/0x420
>> [] .kernel_init_freeable+0x318/0x3d8
>> [<01b957ee>] .kernel_init+0x14/0x168
>> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68
>> unreferenced object 0xc007ec97b318 (size 8):
>>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>>   hex dump (first 8 bytes):
>> 6e 32 31 00 6b 6b 6b a5  n21.kkk.
>>   backtrace:
>> [] .kstrdup+0x44/0xb0

Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

2020-04-15 Thread Rich Felker
On Thu, Apr 16, 2020 at 12:53:31PM +1000, Nicholas Piggin wrote:
> > Not to mention the dcache line to access
> > __hwcap or whatever, and the icache lines to setup access TOC-relative
> > access to it. (Of course you could put a copy of its value in TLS at a
> > fixed offset, which would somewhat mitigate both.)
> > 
> >> And finally, the HWCAP test can eventually go away in future. A vdso
> >> call can not.
> > 
> > We support nearly arbitrarily old kernels (with limited functionality)
> > and hardware (with full functionality) and don't intend for that to
> > change, ever. But indeed glibc might want too eventually drop the
> > check.
> 
> Ah, cool. Any build-time flexibility there?
> 
> We may or may not be getting a new ABI that will use instructions not 
> supported by old processors.
> 
> https://sourceware.org/legacy-ml/binutils/2019-05/msg00331.html
> 
> Current ABI continues to work of course and be the default for some 
> time, but building for new one would give some opportunity to drop
> such support for old procs, at least for glibc.

What does "new ABI" entail to you? In the terminology I use with musl,
"new ABI" and "new ISA level" are different things. You can compile
(explicit -march or compiler default) binaries that won't run on older
cpus due to use of new insns etc., but we consider it the same ABI if
you can link code for an older/baseline ISA level with the
newer-ISA-level object files, i.e. if the interface surface for
linkage remains compatible. We also try to avoid gratuitous
proliferation of different ABIs unless there's a strong underlying
need (like addition of softfloat ABIs for archs that usually have FPU,
or vice versa).

In principle the same could be done for kernels except it's a bigger
silent gotcha (possible ENOSYS in places where it shouldn't be able to
happen rather than a trapping SIGILL or similar) and there's rarely
any serious performance or size benefit to dropping support for older
kernels.

Rich


Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

2020-04-15 Thread Nicholas Piggin
Excerpts from Rich Felker's message of April 16, 2020 12:35 pm:
> On Thu, Apr 16, 2020 at 12:24:16PM +1000, Nicholas Piggin wrote:
>> >> > Likewise, it's not useful to have different error return mechanisms
>> >> > because the caller just has to branch to support both (or the
>> >> > kernel-provided stub just has to emulate one for it; that could work
>> >> > if you really want to change the bad existing convention).
>> >> > 
>> >> > Thoughts?
>> >> 
>> >> The existing convention has to change somewhat because of the clobbers,
>> >> so I thought we could change the error return at the same time. I'm
>> >> open to not changing it and using CR0[SO], but others liked the idea.
>> >> Pro: it matches sc and vsyscall. Con: it's different from other common
>> >> archs. Performnce-wise it would really be a wash -- cost of conditional
>> >> branch is not the cmp but the mispredict.
>> > 
>> > If you do the branch on hwcap at each syscall, then you significantly
>> > increase code size of every syscall point, likely turning a bunch of
>> > trivial functions that didn't need stack frames into ones that do. You
>> > also potentially make them need a TOC pointer. Making them all just do
>> > an indirect call unconditionally (with pointer in TLS like i386?) is a
>> > lot more efficient in code size and at least as good for performance.
>> 
>> I disagree. Doing the long vdso indirect call *necessarily* requires
>> touching a new icache line, and even a new TLB entry. Indirect branches
> 
> The increase in number of icache lines from the branch at every
> syscall point is far greater than the use of a single extra icache
> line shared by all syscalls.

That's true, I was thinking of a single function that does the test and 
calls syscalls, which might be the fair comparison.

> Not to mention the dcache line to access
> __hwcap or whatever, and the icache lines to setup access TOC-relative
> access to it. (Of course you could put a copy of its value in TLS at a
> fixed offset, which would somewhat mitigate both.)
> 
>> And finally, the HWCAP test can eventually go away in future. A vdso
>> call can not.
> 
> We support nearly arbitrarily old kernels (with limited functionality)
> and hardware (with full functionality) and don't intend for that to
> change, ever. But indeed glibc might want too eventually drop the
> check.

Ah, cool. Any build-time flexibility there?

We may or may not be getting a new ABI that will use instructions not 
supported by old processors.

https://sourceware.org/legacy-ml/binutils/2019-05/msg00331.html

Current ABI continues to work of course and be the default for some 
time, but building for new one would give some opportunity to drop
such support for old procs, at least for glibc.

> 
>> If you really want to select with an indirect branch rather than
>> direct conditional, you can do that all within the library.
> 
> OK. It's a little bit more work if that's not the interface the kernel
> will give us, but it's no big deal.

Okay.

Thanks,
Nick


Re: [PATCH kernel v2 0/7] powerpc/powenv/ioda: Allow huge DMA window at 4GB

2020-04-15 Thread Oliver O'Halloran
On Thu, Apr 16, 2020 at 12:34 PM Oliver O'Halloran  wrote:
>
> On Thu, Apr 16, 2020 at 11:27 AM Alexey Kardashevskiy  wrote:
> >
> > Anyone? Is it totally useless or wrong approach? Thanks,
>
> I wouldn't say it's either, but I still hate it.
>
> The 4GB mode being per-PHB makes it difficult to use unless we force
> that mode on 100% of the time which I'd prefer not to do. Ideally
> devices that actually support 64bit addressing (which is most of them)
> should be able to use no-translate mode when possible since a) It's
> faster, and b) It frees up room in the TCE cache devices that actually
> need them. I know you've done some testing with 100G NICs and found
> the overhead was fine, but IMO that's a bad test since it's pretty
> much the best-case scenario since all the devices on the PHB are in
> the same PE. The PHB's TCE cache only hits when the TCE matches the
> DMA bus address and the PE number for the device so in a multi-PE
> environment there's a lot of potential for TCE cache thrashing. If
> there was one or two PEs under that PHB it's probably not going to
> matter, but if you have an NVMe rack with 20 drives it starts to look
> a bit ugly.
>
> That all said, it might be worth doing this anyway since we probably
> want the software infrastructure in place to take advantage of it.
> Maybe expand the command line parameters to allow it to be enabled on
> a per-PHB basis rather than globally.

Since we're on the topic

I've been thinking the real issue we have is that we're trying to pick
an "optimal" IOMMU config at a point where we don't have enough
information to work out what's actually optimal. The IOMMU config is
done on a per-PE basis, but since PEs may contain devices with
different DMA masks (looking at you, weird AMD audio function) we're
always going to have to pick something conservative as the default
config for TVE#0 (64k, no bypass mapping) since the driver will tell
us what the device actually supports long after the IOMMU configuration
is done. What we really want is to be able to have separate IOMMU
contexts for each device, or at the very least a separate context for
the crippled devices.

We could allow a per-device IOMMU context by extending the Master /
Slave PE thing to cover DMA in addition to MMIO. Right now we only use
slave PEs when a device's MMIO BARs extend over multiple m64 segments.
When that happens an MMIO error causes the PHB to freezes the PE
corresponding to one of those segments, but not any of the others. To
present a single "PE" to the EEH core we check the freeze status of
each of the slave PEs when the EEH core does a PE status check and if
any of them are frozen, we freeze the rest of them too. When a driver
sets a limited DMA mask we could move that device to a separate slave
PE so that it has its own IOMMU context tailored to its DMA
addressing limits.

Thoughts?

Oliver


Re: [PATCH v2 4/4] mm/vmalloc: Hugepage vmalloc mappings

2020-04-15 Thread Nicholas Piggin
Excerpts from Will Deacon's message of April 15, 2020 8:47 pm:
> Hi Nick,
> 
> On Mon, Apr 13, 2020 at 10:53:03PM +1000, Nicholas Piggin wrote:
>> For platforms that define HAVE_ARCH_HUGE_VMAP and support PMD vmap mappings,
>> have vmalloc attempt to allocate PMD-sized pages first, before falling back
>> to small pages. Allocations which use something other than PAGE_KERNEL
>> protections are not permitted to use huge pages yet, not all callers expect
>> this (e.g., module allocations vs strict module rwx).
>> 
>> This gives a 6x reduction in dTLB misses for a `git diff` (of linux), from
>> 45600 to 6500 and a 2.2% reduction in cycles on a 2-node POWER9.
> 
> I wonder if it's worth extending vmap() to handle higher order pages in
> a similar way? That might be helpful for tracing PMUs such as Arm SPE,
> where the CPU streams tracing data out to a virtually addressed buffer
> (see rb_alloc_aux_page()).

Yeah, it becomes pretty trivial to do that with VM_HUGE_PAGES after
this patch. I have something to do it but no callers ready yet; if
you have an easy one we can add it.

>> This can result in more internal fragmentation and memory overhead for a
>> given allocation. It can also cause greater NUMA unbalance on hashdist
>> allocations.
>> 
>> There may be other callers that expect small pages under vmalloc but use
>> PAGE_KERNEL, I'm not sure if it's feasible to catch them all. An
>> alternative would be a new function or flag which enables large mappings,
>> and use that in callers.
>> 
>> Signed-off-by: Nicholas Piggin 
>> ---
>>  include/linux/vmalloc.h |   2 +
>>  mm/vmalloc.c| 135 +---
>>  2 files changed, 102 insertions(+), 35 deletions(-)
>> 
>> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
>> index 291313a7e663..853b82eac192 100644
>> --- a/include/linux/vmalloc.h
>> +++ b/include/linux/vmalloc.h
>> @@ -24,6 +24,7 @@ struct notifier_block; /* in notifier.h */
>>  #define VM_UNINITIALIZED0x0020  /* vm_struct is not fully 
>> initialized */
>>  #define VM_NO_GUARD 0x0040  /* don't add guard page */
>>  #define VM_KASAN0x0080  /* has allocated kasan shadow 
>> memory */
>> +#define VM_HUGE_PAGES   0x0100  /* may use huge pages */
> 
> Please can you add a check for this in the arm64 change_memory_common()
> code? Other architectures might need something similar, but we need to
> forbid changing memory attributes for portions of the huge page.

Yeah good idea, I can look about adding some more checks.

> 
> In general, I'm a bit wary of software table walkers tripping over this.
> For example, I don't think apply_to_existing_page_range() can handle
> huge mappings at all, but the one user (KASAN) only ever uses page mappings
> so it's ok there.

Right, I have something to warn for apply_to_page_range() (and am looking
at adding support for bigger pages). It doesn't even have a test-and-warn
at the moment, which isn't good practice IMO, so we should add one
even without huge vmap.
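
For instance, something as small as this in the pmd-level walker would catch
it (illustrative only, not the actual patch):

	/* apply_to_page_range() only knows how to walk PTE-level mappings */
	if (WARN_ON_ONCE(pmd_leaf(*pmd)))
		return -EINVAL;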

> 
>> @@ -2325,9 +2356,11 @@ static struct vm_struct *__get_vm_area_node(unsigned 
>> long size,
>>  if (unlikely(!size))
>>  return NULL;
>>  
>> -if (flags & VM_IOREMAP)
>> -align = 1ul << clamp_t(int, get_count_order_long(size),
>> -   PAGE_SHIFT, IOREMAP_MAX_ORDER);
>> +if (flags & VM_IOREMAP) {
>> +align = max(align,
>> +1ul << clamp_t(int, get_count_order_long(size),
>> +   PAGE_SHIFT, IOREMAP_MAX_ORDER));
>> +}
> 
> 
> I don't follow this part. Please could you explain why you're potentially
> aligning above IOREMAP_MAX_ORDER? It doesn't seem to follow from the rest
> of the patch.

Trying to remember. If the caller asks for a particular alignment we 
shouldn't reduce it. Should put it in another patch.

Thanks,
Nick


Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

2020-04-15 Thread Rich Felker
On Thu, Apr 16, 2020 at 12:24:16PM +1000, Nicholas Piggin wrote:
> >> > Likewise, it's not useful to have different error return mechanisms
> >> > because the caller just has to branch to support both (or the
> >> > kernel-provided stub just has to emulate one for it; that could work
> >> > if you really want to change the bad existing convention).
> >> > 
> >> > Thoughts?
> >> 
> >> The existing convention has to change somewhat because of the clobbers,
> >> so I thought we could change the error return at the same time. I'm
> >> open to not changing it and using CR0[SO], but others liked the idea.
> >> Pro: it matches sc and vsyscall. Con: it's different from other common
> >> archs. Performnce-wise it would really be a wash -- cost of conditional
> >> branch is not the cmp but the mispredict.
> > 
> > If you do the branch on hwcap at each syscall, then you significantly
> > increase code size of every syscall point, likely turning a bunch of
> > trivial functions that didn't need stack frames into ones that do. You
> > also potentially make them need a TOC pointer. Making them all just do
> > an indirect call unconditionally (with pointer in TLS like i386?) is a
> > lot more efficient in code size and at least as good for performance.
> 
> I disagree. Doing the long vdso indirect call *necessarily* requires
> touching a new icache line, and even a new TLB entry. Indirect branches

The increase in number of icache lines from the branch at every
syscall point is far greater than the use of a single extra icache
line shared by all syscalls. Not to mention the dcache line to access
__hwcap or whatever, and the icache lines to setup access TOC-relative
access to it. (Of course you could put a copy of its value in TLS at a
fixed offset, which would somewhat mitigate both.)

> And finally, the HWCAP test can eventually go away in future. A vdso
> call can not.

We support nearly arbitrarily old kernels (with limited functionality)
and hardware (with full functionality) and don't intend for that to
change, ever. But indeed glibc might want too eventually drop the
check.

> If you really want to select with an indirect branch rather than
> direct conditional, you can do that all within the library.

OK. It's a little bit more work if that's not the interface the kernel
will give us, but it's no big deal.

Rich
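
To make the two dispatch schemes being compared concrete, here is a rough
libc-side sketch. All of the helper names are made up, and a real
implementation would cache the hwcap value rather than call getauxval() on
every syscall:

#include <sys/auxv.h>

#ifndef PPC_FEATURE2_SCV
#define PPC_FEATURE2_SCV 0x00100000	/* as defined in asm/cputable.h */
#endif

/* hypothetical arch back-ends */
long __syscall_scv(long nr, long a, long b, long c);	/* 'scv 0' entry */
long __syscall_sc(long nr, long a, long b, long c);	/* legacy 'sc' entry */

/* (a) branch on HWCAP2 at each syscall site */
static inline long syscall_hwcap(long nr, long a, long b, long c)
{
	if (getauxval(AT_HWCAP2) & PPC_FEATURE2_SCV)
		return __syscall_scv(nr, a, b, c);
	return __syscall_sc(nr, a, b, c);
}

/* (b) one indirect call through a pointer selected once at startup,
 * in the spirit of the i386 AT_SYSINFO approach discussed above
 */
extern long (*__syscall_dispatch)(long nr, long a, long b, long c);
static inline long syscall_indirect(long nr, long a, long b, long c)
{
	return __syscall_dispatch(nr, a, b, c);
}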


Re: [PATCH kernel v2 0/7] powerpc/powenv/ioda: Allow huge DMA window at 4GB

2020-04-15 Thread Oliver O'Halloran
On Thu, Apr 16, 2020 at 11:27 AM Alexey Kardashevskiy  wrote:
>
> Anyone? Is it totally useless or wrong approach? Thanks,

I wouldn't say it's either, but I still hate it.

The 4GB mode being per-PHB makes it difficult to use unless we force
that mode on 100% of the time which I'd prefer not to do. Ideally
devices that actually support 64bit addressing (which is most of them)
should be able to use no-translate mode when possible since a) It's
faster, and b) It frees up room in the TCE cache devices that actually
need them. I know you've done some testing with 100G NICs and found
the overhead was fine, but IMO that's a bad test since it's pretty
much the best-case scenario since all the devices on the PHB are in
the same PE. The PHB's TCE cache only hits when the TCE matches the
DMA bus address and the PE number for the device so in a multi-PE
environment there's a lot of potential for TCE cache thrashing. If
there was one or two PEs under that PHB it's probably not going to
matter, but if you have an NVMe rack with 20 drives it starts to look
a bit ugly.

That all said, it might be worth doing this anyway since we probably
want the software infrastructure in place to take advantage of it.
Maybe expand the command line parameters to allow it to be enabled on
a per-PHB basis rather than globally.

Oliver


Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

2020-04-15 Thread Nicholas Piggin
Excerpts from Rich Felker's message of April 16, 2020 10:48 am:
> On Thu, Apr 16, 2020 at 10:16:54AM +1000, Nicholas Piggin wrote:
>> Excerpts from Rich Felker's message of April 16, 2020 8:55 am:
>> > On Thu, Apr 16, 2020 at 07:45:09AM +1000, Nicholas Piggin wrote:
>> >> I would like to enable Linux support for the powerpc 'scv' instruction,
>> >> as a faster system call instruction.
>> >> 
>> >> This requires two things to be defined: Firstly a way to advertise to 
>> >> userspace that kernel supports scv, and a way to allocate and advertise
>> >> support for individual scv vectors. Secondly, a calling convention ABI
>> >> for this new instruction.
>> >> 
>> >> Thanks to those who commented last time, since then I have removed my
>> >> answered questions and unpopular alternatives but you can find them
>> >> here
>> >> 
>> >> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-January/203545.html
>> >> 
>> >> Let me try one more with a wider cc list, and then we'll get something
>> >> merged. Any questions or counter-opinions are welcome.
>> >> 
>> >> System Call Vectored (scv) ABI
>> >> ==
>> >> 
>> >> The scv instruction is introduced with POWER9 / ISA3, it comes with an
>> >> rfscv counter-part. The benefit of these instructions is performance
>> >> (trading slower SRR0/1 with faster LR/CTR registers, and entering the
>> >> kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR 
>> >> updates. The scv instruction has 128 interrupt entry points (not enough 
>> >> to cover the Linux system call space).
>> >> 
>> >> The proposal is to assign scv numbers very conservatively and allocate 
>> >> them as individual HWCAP features as we add support for more. The zero 
>> >> vector ('scv 0') will be used for normal system calls, equivalent to 'sc'.
>> >> 
>> >> Advertisement
>> >> 
>> >> Linux has not enabled FSCR[SCV] yet, so the instruction will cause a
>> >> SIGILL in current environments. Linux has defined a HWCAP2 bit 
>> >> PPC_FEATURE2_SCV for SCV support, but does not set it.
>> >> 
>> >> When scv instruction support and the scv 0 vector for system calls are 
>> >> added, PPC_FEATURE2_SCV will indicate support for these. Other vectors 
>> >> should not be used without future HWCAP bits indicating support, which is
>> >> how we will allocate them. (Should unallocated ones generate SIGILL, or
>> >> return -ENOSYS in r3?)
>> >> 
>> >> Calling convention
>> >> 
>> >> The proposal is for scv 0 to provide the standard Linux system call ABI 
>> >> with the following differences from sc convention[1]:
>> >> 
>> >> - LR is to be volatile across scv calls. This is necessary because the 
>> >>   scv instruction clobbers LR. From previous discussion, this should be 
>> >>   possible to deal with in GCC clobbers and CFI.
>> >> 
>> >> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the
>> >>   kernel system call exit to avoid restoring the CR register (although 
>> >>   we probably still would anyway to avoid information leak).
>> >> 
>> >> - Error handling: I think the consensus has been to move to using negative
>> >>   return value in r3 rather than CR0[SO]=1 to indicate error, which 
>> >> matches
>> >>   most other architectures and is closer to a function call.
>> >> 
>> >> The number of scratch registers (r9-r12) at kernel entry seems 
>> >> sufficient that we don't have any costly spilling, patch is here[2].  
>> >> 
>> >> [1] 
>> >> https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst
>> >> [2] 
>> >> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-February/204840.html
>> > 
>> > My preference would be that it work just like the i386 AT_SYSINFO
>> > where you just replace "int $128" with "call *%%gs:16" and the kernel
>> > provides a stub in the vdso that performs either scv or the old
>> > mechanism with the same calling convention. Then if the kernel doesn't
>> > provide it (because the kernel is too old) libc would have to provide
>> > its own stub that uses the legacy method and matches the calling
>> > convention of the one the kernel is expected to provide.
>> 
>> I'm not sure if that's necessary. That's done on x86-32 because they
>> select different sequences to use based on the CPU running and if the host
>> kernel is 32 or 64 bit. Sure they could in theory have a bunch of HWCAP
>> bits and select the right sequence in libc as well I suppose.
> 
> It's not just a HWCAP. It's a contract between the kernel and
> userspace to support a particular calling convention that's not
> exposed except as the public entry point the kernel exports via
> AT_SYSINFO.

Right.

>> > Note that any libc that actually makes use of the new functionality is
>> > not going to be able to make clobbers conditional on support for it;
>> > branching around different clobbers is going to defeat any gains vs
>> > always just treating anything clobbered by either method as clobbered.
>> 
>> Well it would have to test HWCAP and 

Re: [PATCH 01/34] docs: filesystems: fix references for doc files there

2020-04-15 Thread Joseph Qi



On 2020/4/15 22:32, Mauro Carvalho Chehab wrote:
> Several files there were renamed to ReST. Fix the broken
> references.
> 
> Signed-off-by: Mauro Carvalho Chehab 
> ---
>  Documentation/ABI/stable/sysfs-devices-node   | 2 +-
>  Documentation/ABI/testing/procfs-smaps_rollup | 2 +-
>  Documentation/admin-guide/cpu-load.rst| 2 +-
>  Documentation/admin-guide/nfs/nfsroot.rst | 2 +-
>  Documentation/driver-api/driver-model/device.rst  | 2 +-
>  Documentation/driver-api/driver-model/overview.rst| 2 +-
>  Documentation/filesystems/dax.txt | 2 +-
>  Documentation/filesystems/dnotify.txt | 2 +-
>  Documentation/filesystems/ramfs-rootfs-initramfs.rst  | 2 +-
>  Documentation/powerpc/firmware-assisted-dump.rst  | 2 +-
>  Documentation/process/adding-syscalls.rst | 2 +-
>  .../translations/it_IT/process/adding-syscalls.rst| 2 +-
>  Documentation/translations/zh_CN/filesystems/sysfs.txt| 6 +++---
>  drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h   | 2 +-
>  fs/Kconfig| 2 +-
>  fs/Kconfig.binfmt | 2 +-
>  fs/adfs/Kconfig   | 2 +-
>  fs/affs/Kconfig   | 2 +-
>  fs/afs/Kconfig| 6 +++---
>  fs/bfs/Kconfig| 2 +-
>  fs/cramfs/Kconfig | 2 +-
>  fs/ecryptfs/Kconfig   | 2 +-
>  fs/fat/Kconfig| 8 
>  fs/fuse/Kconfig   | 2 +-
>  fs/fuse/dev.c | 2 +-
>  fs/hfs/Kconfig| 2 +-
>  fs/hpfs/Kconfig   | 2 +-
>  fs/isofs/Kconfig  | 2 +-
>  fs/namespace.c| 2 +-
>  fs/notify/inotify/Kconfig | 2 +-
>  fs/ntfs/Kconfig   | 2 +-
>  fs/ocfs2/Kconfig  | 2 +-

For ocfs2 part,
Acked-by: Joseph Qi 

>  fs/overlayfs/Kconfig  | 6 +++---
>  fs/proc/Kconfig   | 4 ++--
>  fs/romfs/Kconfig  | 2 +-
>  fs/sysfs/dir.c| 2 +-
>  fs/sysfs/file.c   | 2 +-
>  fs/sysfs/mount.c  | 2 +-
>  fs/sysfs/symlink.c| 2 +-
>  fs/sysv/Kconfig   | 2 +-
>  fs/udf/Kconfig| 2 +-
>  include/linux/relay.h | 2 +-
>  include/linux/sysfs.h | 2 +-
>  kernel/relay.c| 2 +-
>  44 files changed, 54 insertions(+), 54 deletions(-)
> 
> diff --git a/Documentation/ABI/stable/sysfs-devices-node 
> b/Documentation/ABI/stable/sysfs-devices-node
> index df8413cf1468..484fc04bcc25 100644
> --- a/Documentation/ABI/stable/sysfs-devices-node
> +++ b/Documentation/ABI/stable/sysfs-devices-node
> @@ -54,7 +54,7 @@ Date:   October 2002
>  Contact: Linux Memory Management list 
>  Description:
>   Provides information about the node's distribution and memory
> - utilization. Similar to /proc/meminfo, see 
> Documentation/filesystems/proc.txt
> + utilization. Similar to /proc/meminfo, see 
> Documentation/filesystems/proc.rst
>  
>  What:/sys/devices/system/node/nodeX/numastat
>  Date:October 2002
> diff --git a/Documentation/ABI/testing/procfs-smaps_rollup 
> b/Documentation/ABI/testing/procfs-smaps_rollup
> index 274df44d8b1b..046978193368 100644
> --- a/Documentation/ABI/testing/procfs-smaps_rollup
> +++ b/Documentation/ABI/testing/procfs-smaps_rollup
> @@ -11,7 +11,7 @@ Description:
>   Additionally, the fields Pss_Anon, Pss_File and Pss_Shmem
>   are not present in /proc/pid/smaps.  These fields represent
>   the sum of the Pss field of each type (anon, file, shmem).
> - For more details, see Documentation/filesystems/proc.txt
> + For more details, see Documentation/filesystems/proc.rst
>   and the procfs man page.
>  
>   Typical output looks like this:
> diff --git a/Documentation/admin-guide/cpu-load.rst 
> b/Documentation/admin-guide/cpu-load.rst
> index 2d01ce43d2a2..ebdecf864080 100644
> --- a/Documentation/admin-guide/cpu-load.rst
> +++ b/Documentation/ad

Re: [PATCH kernel v2 0/7] powerpc/powenv/ioda: Allow huge DMA window at 4GB

2020-04-15 Thread Alexey Kardashevskiy
Anyone? Is it totally useless or the wrong approach? Thanks,


On 08/04/2020 19:43, Alexey Kardashevskiy wrote:
> 
> 
> On 23/03/2020 18:53, Alexey Kardashevskiy wrote:
>> Here is an attempt to support bigger DMA space for devices
>> supporting DMA masks less than 59 bits (GPUs come to mind
>> first). POWER9 PHBs have an option to map 2 windows at 0
>> and select a window based on the DMA address being below or above
>> 4GB.
>>
>> This adds the "iommu=iommu_bypass" kernel parameter and
>> supports VFIO+pseries machine - currently this requires telling
>> upstream+unmodified QEMU about this via
>> -global spapr-pci-host-bridge.dma64_win_addr=0x1
>> or per-phb property. 4/4 advertises the new option but
>> there is no automation around it in QEMU (should it be?).
>>
>> For now it is either 1<<59 or 4GB mode; dynamic switching is
>> not supported (could be via sysfs).
>>
>> This is a rebased version of
>> https://lore.kernel.org/kvm/20191202015953.127902-1-...@ozlabs.ru/
>>
>> The main change since v1 is that now it is 7 patches with
>> clearer separation of steps.
>>
>>
>> This is based on 6c90b86a745a "Merge tag 'mmc-v5.6-rc6' of 
>> git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc"
>>
>> Please comment. Thanks.
> 
> Ping?
> 
> 
>>
>>
>>
>> Alexey Kardashevskiy (7):
>>   powerpc/powernv/ioda: Move TCE bypass base to PE
>>   powerpc/powernv/ioda: Rework for huge DMA window at 4GB
>>   powerpc/powernv/ioda: Allow smaller TCE table levels
>>   powerpc/powernv/phb4: Use IOMMU instead of bypassing
>>   powerpc/iommu: Add a window number to
>> iommu_table_group_ops::get_table_size
>>   powerpc/powernv/phb4: Add 4GB IOMMU bypass mode
>>   vfio/spapr_tce: Advertise and allow a huge DMA windows at 4GB
>>
>>  arch/powerpc/include/asm/iommu.h  |   3 +
>>  arch/powerpc/include/asm/opal-api.h   |   9 +-
>>  arch/powerpc/include/asm/opal.h   |   2 +
>>  arch/powerpc/platforms/powernv/pci.h  |   4 +-
>>  include/uapi/linux/vfio.h |   2 +
>>  arch/powerpc/platforms/powernv/npu-dma.c  |   1 +
>>  arch/powerpc/platforms/powernv/opal-call.c|   2 +
>>  arch/powerpc/platforms/powernv/pci-ioda-tce.c |   4 +-
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 234 ++
>>  drivers/vfio/vfio_iommu_spapr_tce.c   |  17 +-
>>  10 files changed, 213 insertions(+), 65 deletions(-)
>>
> 

-- 
Alexey


Re: [PATCH 1/5] powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable

2020-04-15 Thread kbuild test robot
Hi Wang,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on char-misc/char-misc-testing staging/staging-testing 
v5.7-rc1 next-20200415]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Wang-Wenhu/drivers-uio-new-driver-uio_fsl_85xx_cache_sram/20200416-040633
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross 
ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot 

All error/warnings (new ones prefixed by >>):

   WARNING: unmet direct dependencies detected for ARCH_32BIT_OFF_T
   Depends on !64BIT
   Selected by
   - PPC && PPC32
   In file included from include/linux/atomic-fallback.h:1185,
   from include/linux/atomic.h:74,
   from include/linux/debug_locks.h:6,
   from include/linux/lockdep.h:28,
   from include/linux/spinlock_types.h:18,
   from kernel/bounds.c:14:
>> include/asm-generic/atomic64.h:14:3: error: conflicting types for 
>> 'atomic64_t'
   14 | } atomic64_t;
   | ^~
   In file included from include/linux/page-flags.h:9,
   from kernel/bounds.c:10:
   include/linux/types.h:178:3: note: previous declaration of 'atomic64_t' was 
here
   178 | } atomic64_t;
   | ^~
   In file included from include/linux/atomic-fallback.h:1185,
   from include/linux/atomic.h:74,
   from include/linux/debug_locks.h:6,
   from include/linux/lockdep.h:28,
   from include/linux/spinlock_types.h:18,
   from kernel/bounds.c:14:
>> include/asm-generic/atomic64.h:18:12: error: conflicting types for 
>> 'atomic64_read'
   18 | extern s64 atomic64_read(const atomic64_t *v);
   | ^
   In file included from include/linux/atomic.h:7,
   from include/linux/debug_locks.h:6,
   from include/linux/lockdep.h:28,
   from include/linux/spinlock_types.h:18,
   from kernel/bounds.c:14:
   arch/powerpc/include/asm/atomic.h:300:23: note: previous definition of 
'atomic64_read' was here
   300 | static __inline__ s64 atomic64_read(const atomic64_t *v)
   | ^
   In file included from include/linux/atomic-fallback.h:1185,
   from include/linux/atomic.h:74,
   from include/linux/debug_locks.h:6,
   from include/linux/lockdep.h:28,
   from include/linux/spinlock_types.h:18,
   from kernel/bounds.c:14:
>> include/asm-generic/atomic64.h:19:13: error: conflicting types for 
>> 'atomic64_set'
   19 | extern void atomic64_set(atomic64_t *v, s64 i);
   | ^~~~
   In file included from include/linux/atomic.h:7,
   from include/linux/debug_locks.h:6,
   from include/linux/lockdep.h:28,
   from include/linux/spinlock_types.h:18,
   from kernel/bounds.c:14:
   arch/powerpc/include/asm/atomic.h:309:24: note: previous definition of 
'atomic64_set' was here
   309 | static __inline__ void atomic64_set(atomic64_t *v, s64 i)
   | ^~~~
   In file included from include/linux/atomic-fallback.h:1185,
   from include/linux/atomic.h:74,
   from include/linux/debug_locks.h:6,
   from include/linux/lockdep.h:28,
   from include/linux/spinlock_types.h:18,
   from kernel/bounds.c:14:
>> include/asm-generic/atomic64.h:32: warning: "ATOMIC64_OPS" redefined
   32 | #define ATOMIC64_OPS(op) ATOMIC64_OP(op) ATOMIC64_OP_RETURN(op) 
ATOMIC64_FETCH_OP(op)
   |
   In file included from include/linux/atomic.h:7,
   from include/linux/debug_locks.h:6,
   from include/linux/lockdep.h:28,
   from include/linux/spinlock_types.h:18,
   from kernel/bounds.c:14:
   arch/powerpc/include/asm/atomic.h:380: note: this is the location of the 
previous definition
   380 | #define ATOMIC64_OPS(op, asm_op) |
   In file included from include/linux/atomic-fallback.h:1185,
   from include/linux/atomic.h:74,
   from include/linux/debug_locks.h:6,
   from include/linux/lockdep.h:28,
   from include/linux/spinlock_types.h:18,
   from kernel/bounds.c:14:
>> include/asm-generic/atomic64.h:24:14: error: conflicting types for 
>> 'atomic64_add'
   24 | extern void atomic64_##op(s64 a, atomic64_t *v);
   | ^
>> include/asm-generic/atomic64.h:32:26: note: in expansion of macro 
>> 'ATOMIC64_OP'
   32 | #define ATOMIC64_OPS(op) ATOMIC64_OP(op) ATOMIC64_OP_RETURN(op) 
ATOMIC64_FETCH_OP(op)
   | ^~~
>> include/asm-ge

Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

2020-04-15 Thread Rich Felker
On Thu, Apr 16, 2020 at 10:16:54AM +1000, Nicholas Piggin wrote:
> Excerpts from Rich Felker's message of April 16, 2020 8:55 am:
> > On Thu, Apr 16, 2020 at 07:45:09AM +1000, Nicholas Piggin wrote:
> >> I would like to enable Linux support for the powerpc 'scv' instruction,
> >> as a faster system call instruction.
> >> 
> >> This requires two things to be defined: Firstly a way to advertise to 
> >> userspace that kernel supports scv, and a way to allocate and advertise
> >> support for individual scv vectors. Secondly, a calling convention ABI
> >> for this new instruction.
> >> 
> >> Thanks to those who commented last time, since then I have removed my
> >> answered questions and unpopular alternatives but you can find them
> >> here
> >> 
> >> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-January/203545.html
> >> 
> >> Let me try one more with a wider cc list, and then we'll get something
> >> merged. Any questions or counter-opinions are welcome.
> >> 
> >> System Call Vectored (scv) ABI
> >> ==
> >> 
> >> The scv instruction is introduced with POWER9 / ISA3, it comes with an
> >> rfscv counter-part. The benefit of these instructions is performance
> >> (trading slower SRR0/1 with faster LR/CTR registers, and entering the
> >> kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR 
> >> updates). The scv instruction has 128 interrupt entry points (not enough 
> >> to cover the Linux system call space).
> >> 
> >> The proposal is to assign scv numbers very conservatively and allocate 
> >> them as individual HWCAP features as we add support for more. The zero 
> >> vector ('scv 0') will be used for normal system calls, equivalent to 'sc'.
> >> 
> >> Advertisement
> >> 
> >> Linux has not enabled FSCR[SCV] yet, so the instruction will cause a
> >> SIGILL in current environments. Linux has defined a HWCAP2 bit 
> >> PPC_FEATURE2_SCV for SCV support, but does not set it.
> >> 
> >> When scv instruction support and the scv 0 vector for system calls are 
> >> added, PPC_FEATURE2_SCV will indicate support for these. Other vectors 
> >> should not be used without future HWCAP bits indicating support, which is
> >> how we will allocate them. (Should unallocated ones generate SIGILL, or
> >> return -ENOSYS in r3?)
> >> 
> >> Calling convention
> >> 
> >> The proposal is for scv 0 to provide the standard Linux system call ABI 
> >> with the following differences from sc convention[1]:
> >> 
> >> - LR is to be volatile across scv calls. This is necessary because the 
> >>   scv instruction clobbers LR. From previous discussion, this should be 
> >>   possible to deal with in GCC clobbers and CFI.
> >> 
> >> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the
> >>   kernel system call exit to avoid restoring the CR register (although 
> >>   we probably still would anyway to avoid information leak).
> >> 
> >> - Error handling: I think the consensus has been to move to using negative
> >>   return value in r3 rather than CR0[SO]=1 to indicate error, which matches
> >>   most other architectures and is closer to a function call.
> >> 
> >> The number of scratch registers (r9-r12) at kernel entry seems 
> >> sufficient that we don't have any costly spilling, patch is here[2].  
> >> 
> >> [1] 
> >> https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst
> >> [2] 
> >> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-February/204840.html
> > 
> > My preference would be that it work just like the i386 AT_SYSINFO
> > where you just replace "int $128" with "call *%%gs:16" and the kernel
> > provides a stub in the vdso that performs either scv or the old
> > mechanism with the same calling convention. Then if the kernel doesn't
> > provide it (because the kernel is too old) libc would have to provide
> > its own stub that uses the legacy method and matches the calling
> > convention of the one the kernel is expected to provide.
> 
> I'm not sure if that's necessary. That's done on x86-32 because they
> select different sequences to use based on the CPU running and if the host
> kernel is 32 or 64 bit. Sure they could in theory have a bunch of HWCAP
> bits and select the right sequence in libc as well I suppose.

It's not just a HWCAP. It's a contract between the kernel and
userspace to support a particular calling convention that's not
exposed except as the public entry point the kernel exports via
AT_SYSINFO.

> > Note that any libc that actually makes use of the new functionality is
> > not going to be able to make clobbers conditional on support for it;
> > branching around different clobbers is going to defeat any gains vs
> > always just treating anything clobbered by either method as clobbered.
> 
> Well it would have to test HWCAP and patch in or branch to two 
> completely different sequences including register save/restores yes.
> You could have the same asm and matching clobbers to put the sequence
> inline 

Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

2020-04-15 Thread Nicholas Piggin
Excerpts from Rich Felker's message of April 16, 2020 8:55 am:
> On Thu, Apr 16, 2020 at 07:45:09AM +1000, Nicholas Piggin wrote:
>> I would like to enable Linux support for the powerpc 'scv' instruction,
>> as a faster system call instruction.
>> 
>> This requires two things to be defined: Firstly a way to advertise to 
>> userspace that kernel supports scv, and a way to allocate and advertise
>> support for individual scv vectors. Secondly, a calling convention ABI
>> for this new instruction.
>> 
>> Thanks to those who commented last time, since then I have removed my
>> answered questions and unpopular alternatives but you can find them
>> here
>> 
>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-January/203545.html
>> 
>> Let me try one more with a wider cc list, and then we'll get something
>> merged. Any questions or counter-opinions are welcome.
>> 
>> System Call Vectored (scv) ABI
>> ==
>> 
>> The scv instruction is introduced with POWER9 / ISA3, it comes with an
>> rfscv counter-part. The benefit of these instructions is performance
>> (trading slower SRR0/1 with faster LR/CTR registers, and entering the
>> kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR 
>> updates). The scv instruction has 128 interrupt entry points (not enough 
>> to cover the Linux system call space).
>> 
>> The proposal is to assign scv numbers very conservatively and allocate 
>> them as individual HWCAP features as we add support for more. The zero 
>> vector ('scv 0') will be used for normal system calls, equivalent to 'sc'.
>> 
>> Advertisement
>> 
>> Linux has not enabled FSCR[SCV] yet, so the instruction will cause a
>> SIGILL in current environments. Linux has defined a HWCAP2 bit 
>> PPC_FEATURE2_SCV for SCV support, but does not set it.
>> 
>> When scv instruction support and the scv 0 vector for system calls are 
>> added, PPC_FEATURE2_SCV will indicate support for these. Other vectors 
>> should not be used without future HWCAP bits indicating support, which is
>> how we will allocate them. (Should unallocated ones generate SIGILL, or
>> return -ENOSYS in r3?)
>> 
>> Calling convention
>> 
>> The proposal is for scv 0 to provide the standard Linux system call ABI 
>> with the following differences from sc convention[1]:
>> 
>> - LR is to be volatile across scv calls. This is necessary because the 
>>   scv instruction clobbers LR. From previous discussion, this should be 
>>   possible to deal with in GCC clobbers and CFI.
>> 
>> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the
>>   kernel system call exit to avoid restoring the CR register (although 
>>   we probably still would anyway to avoid information leak).
>> 
>> - Error handling: I think the consensus has been to move to using negative
>>   return value in r3 rather than CR0[SO]=1 to indicate error, which matches
>>   most other architectures and is closer to a function call.
>> 
>> The number of scratch registers (r9-r12) at kernel entry seems 
>> sufficient that we don't have any costly spilling, patch is here[2].  
>> 
>> [1] 
>> https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst
>> [2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-February/204840.html
> 
> My preference would be that it work just like the i386 AT_SYSINFO
> where you just replace "int $128" with "call *%%gs:16" and the kernel
> provides a stub in the vdso that performs either scv or the old
> mechanism with the same calling convention. Then if the kernel doesn't
> provide it (because the kernel is too old) libc would have to provide
> its own stub that uses the legacy method and matches the calling
> convention of the one the kernel is expected to provide.

I'm not sure if that's necessary. That's done on x86-32 because they
select different sequences to use based on the CPU running and if the host
kernel is 32 or 64 bit. Sure they could in theory have a bunch of HWCAP
bits and select the right sequence in libc as well I suppose.

> Note that any libc that actually makes use of the new functionality is
> not going to be able to make clobbers conditional on support for it;
> branching around different clobbers is going to defeat any gains vs
> always just treating anything clobbered by either method as clobbered.

Well it would have to test HWCAP and patch in or branch to two 
completely different sequences including register save/restores yes.
You could have the same asm and matching clobbers to put the sequence
inline and then you could patch the one sc/scv instruction I suppose.

A bit of logic to select between them doesn't defeat gains though,
it's about 90 cycle improvement which is a handful of branch mispredicts 
so it really is an improvement. Eventually userspace will stop 
supporting the old variant too.
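
As a concrete shape of that selection logic, a rough sketch (illustration
only): PPC_FEATURE2_SCV is the bit the proposal assigns, taken from the
kernel uapi headers, and the two wrappers below merely stand in for the full
register save/restore sequences:

#include <sys/auxv.h>
#include <asm/cputable.h>	/* PPC_FEATURE2_SCV */

extern long scv0_syscall3(long nr, long a0, long a1, long a2); /* 'scv 0' path   */
extern long sc_syscall3(long nr, long a0, long a1, long a2);   /* legacy 'sc' path */

static int have_scv = -1;

static long do_syscall3(long nr, long a0, long a1, long a2)
{
	/* test HWCAP2 once, then branch to the matching sequence */
	if (have_scv < 0)
		have_scv = !!(getauxval(AT_HWCAP2) & PPC_FEATURE2_SCV);

	return have_scv ? scv0_syscall3(nr, a0, a1, a2)
			: sc_syscall3(nr, a0, a1, a2);
}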

> Likewise, it's not useful to have different error return mechanisms
> because the caller just has to branch to support both (or the
> kernel-provided s

Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

2020-04-15 Thread Rich Felker
On Thu, Apr 16, 2020 at 07:45:09AM +1000, Nicholas Piggin wrote:
> I would like to enable Linux support for the powerpc 'scv' instruction,
> as a faster system call instruction.
> 
> This requires two things to be defined: Firstly a way to advertise to 
> userspace that kernel supports scv, and a way to allocate and advertise
> support for individual scv vectors. Secondly, a calling convention ABI
> for this new instruction.
> 
> Thanks to those who commented last time, since then I have removed my
> answered questions and unpopular alternatives but you can find them
> here
> 
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-January/203545.html
> 
> Let me try one more with a wider cc list, and then we'll get something
> merged. Any questions or counter-opinions are welcome.
> 
> System Call Vectored (scv) ABI
> ==
> 
> The scv instruction is introduced with POWER9 / ISA3, it comes with an
> rfscv counter-part. The benefit of these instructions is performance
> (trading slower SRR0/1 with faster LR/CTR registers, and entering the
> kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR 
> updates). The scv instruction has 128 interrupt entry points (not enough 
> to cover the Linux system call space).
> 
> The proposal is to assign scv numbers very conservatively and allocate 
> them as individual HWCAP features as we add support for more. The zero 
> vector ('scv 0') will be used for normal system calls, equivalent to 'sc'.
> 
> Advertisement
> 
> Linux has not enabled FSCR[SCV] yet, so the instruction will cause a
> SIGILL in current environments. Linux has defined a HWCAP2 bit 
> PPC_FEATURE2_SCV for SCV support, but does not set it.
> 
> When scv instruction support and the scv 0 vector for system calls are 
> added, PPC_FEATURE2_SCV will indicate support for these. Other vectors 
> should not be used without future HWCAP bits indicating support, which is
> how we will allocate them. (Should unallocated ones generate SIGILL, or
> return -ENOSYS in r3?)
> 
> Calling convention
> 
> The proposal is for scv 0 to provide the standard Linux system call ABI 
> with the following differences from sc convention[1]:
> 
> - LR is to be volatile across scv calls. This is necessary because the 
>   scv instruction clobbers LR. From previous discussion, this should be 
>   possible to deal with in GCC clobbers and CFI.
> 
> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the
>   kernel system call exit to avoid restoring the CR register (although 
>   we probably still would anyway to avoid information leak).
> 
> - Error handling: I think the consensus has been to move to using negative
>   return value in r3 rather than CR0[SO]=1 to indicate error, which matches
>   most other architectures and is closer to a function call.
> 
> The number of scratch registers (r9-r12) at kernel entry seems 
> sufficient that we don't have any costly spilling, patch is here[2].  
> 
> [1] 
> https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst
> [2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-February/204840.html

My preference would be that it work just like the i386 AT_SYSINFO
where you just replace "int $128" with "call *%%gs:16" and the kernel
provides a stub in the vdso that performs either scv or the old
mechanism with the same calling convention. Then if the kernel doesn't
provide it (because the kernel is too old) libc would have to provide
its own stub that uses the legacy method and matches the calling
convention of the one the kernel is expected to provide.

Note that any libc that actually makes use of the new functionality is
not going to be able to make clobbers conditional on support for it;
branching around different clobbers is going to defeat any gains vs
always just treating anything clobbered by either method as clobbered.
Likewise, it's not useful to have different error return mechanisms
because the caller just has to branch to support both (or the
kernel-provided stub just has to emulate one for it; that could work
if you really want to change the bad existing convention).

Thoughts?

Rich
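
For illustration, the stub arrangement described above could look roughly
like this on the libc side; every name here is invented, and nothing below
is an existing kernel or libc interface:

/* entry point with the regular function-call ABI, advertised by the kernel
 * through an AT_SYSINFO-like auxv entry (hypothetical) */
typedef long (*syscall_entry_fn)(long nr, long a0, long a1, long a2,
				 long a3, long a4, long a5);

/* libc-provided fallback that uses the legacy 'sc' convention */
extern long __legacy_sc_syscall(long nr, long a0, long a1, long a2,
				long a3, long a4, long a5);

static syscall_entry_fn syscall_entry;

static void syscall_entry_init(void *vdso_entry)	/* NULL on old kernels */
{
	syscall_entry = vdso_entry ? (syscall_entry_fn)vdso_entry
				   : __legacy_sc_syscall;
}

static long my_syscall3(long nr, long a0, long a1, long a2)
{
	return syscall_entry(nr, a0, a1, a2, 0, 0, 0);
}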


Re: [PATCH v2] powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'

2020-04-15 Thread Segher Boessenkool
Hi!

On Wed, Apr 15, 2020 at 09:25:59AM +, Christophe Leroy wrote:
> +#define __put_user_goto(x, ptr, label) \
> + __put_user_nocheck_goto((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)), 
> label)

This line gets too long, can you break it up somehow?

> +#define __put_user_asm_goto(x, addr, label, op)  \
> + asm volatile goto(  \
> + "1: " op "%U1%X1 %0,%1  # put_user\n"   \
> + EX_TABLE(1b, %l2)   \
> + :   \
> + : "r" (x), "m" (*addr)  \
> + :   \
> + : label)

Same "%Un" problem as in the other patch.  You could use "m<>" here,
but maybe just dropping "%Un" is better.

> +#ifdef __powerpc64__
> +#define __put_user_asm2_goto(x, ptr, label)  \
> + __put_user_asm_goto(x, ptr, label, "std")
> +#else /* __powerpc64__ */
> +#define __put_user_asm2_goto(x, addr, label) \
> + asm volatile goto(  \
> + "1: stw%U1%X1 %0, %1\n" \
> + "2: stw%U1%X1 %L0, %L1\n"   \
> + EX_TABLE(1b, %l2)   \
> + EX_TABLE(2b, %l2)   \
> + :   \
> + : "r" (x), "m" (*addr)  \
> + :   \
> + : label)
> +#endif /* __powerpc64__ */

Here, you should drop it for sure.

Rest looks fine.

Reviewed-by: Segher Boessenkool 


Segher


Re: [PATCH] powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()

2020-04-15 Thread Segher Boessenkool
Hi!

On Wed, Apr 15, 2020 at 09:20:26AM +, Christophe Leroy wrote:
> At the time being, __put_user()/__get_user() and friends only use
> register indirect with immediate index addressing, with the index
> set to 0. Ex:
> 
>   lwz reg1, 0(reg2)

This is called a "D-form" instruction, or sometimes "offset addressing".
Don't talk about an "index", it confuses things, because the *other*
kind is called "indexed" already, also in the ISA docs!  (X-form, aka
indexed addressing, [reg+reg], where D-form does [reg+imm], and both
forms can do [reg]).

> Give the compiler the opportunity to use other addressing modes
> whenever possible, to get more optimised code.

Great :-)

> --- a/arch/powerpc/include/asm/uaccess.h
> +++ b/arch/powerpc/include/asm/uaccess.h
> @@ -114,7 +114,7 @@ extern long __put_user_bad(void);
>   */
>  #define __put_user_asm(x, addr, err, op) \
>   __asm__ __volatile__(   \
> - "1: " op " %1,0(%2) # put_user\n"   \
> + "1: " op "%U2%X2 %1,%2  # put_user\n"   \
>   "2:\n"  \
>   ".section .fixup,\"ax\"\n"  \
>   "3: li %0,%3\n" \
> @@ -122,7 +122,7 @@ extern long __put_user_bad(void);
>   ".previous\n"   \
>   EX_TABLE(1b, 3b)\
>   : "=r" (err)\
> - : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err))
> + : "r" (x), "m" (*addr), "i" (-EFAULT), "0" (err))

%Un on an "m" operand doesn't do much: you need to make it "m<>" if you
want pre-modify ("update") insns to be generated.  (You then will want
to make sure that operand is used in a way GCC can understand; since it
is used only once here, that works fine).
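
A minimal sketch of that pairing, mirroring the structure of the reviewed
accessor (illustration only, not the kernel macro):

/*
 * With a plain "m" operand GCC only uses offset (D-form) or indexed
 * (X-form) addressing, e.g. "stw r9,8(r3)" or "stwx r9,r3,r4".  Adding
 * "<>" also allows pre-modify (update) forms such as "stwu r9,8(r3)";
 * "%U1" then expands to the "u" suffix and "%X1" to "x" when GCC picks
 * those forms.  The operand is referenced exactly once, as required for
 * "<>" to be safe.
 */
static inline void put_u32(unsigned int x, unsigned int *addr)
{
	asm volatile("stw%U1%X1 %0,%1"
		     : /* no outputs; mirrors the reviewed macro */
		     : "r" (x), "m<>" (*addr)
		     : "memory");
}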

> @@ -130,8 +130,8 @@ extern long __put_user_bad(void);
>  #else /* __powerpc64__ */
>  #define __put_user_asm2(x, addr, err)\
>   __asm__ __volatile__(   \
> - "1: stw %1,0(%2)\n" \
> - "2: stw %1+1,4(%2)\n"   \
> + "1: stw%U2%X2 %1,%2\n"  \
> + "2: stw%U2%X2 %L1,%L2\n"\
>   "3:\n"  \
>   ".section .fixup,\"ax\"\n"  \
>   "4: li %0,%3\n" \
> @@ -140,7 +140,7 @@ extern long __put_user_bad(void);
>   EX_TABLE(1b, 4b)\
>   EX_TABLE(2b, 4b)\
>   : "=r" (err)\
> - : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err))
> + : "r" (x), "m" (*addr), "i" (-EFAULT), "0" (err))

Here, it doesn't work.  You don't want two consecutive update insns in
any case.  Easiest is to just not use "m<>", and then, don't use %Un
(which won't do anything, but it is confusing).

Same for the reads.

Rest looks fine, and update should be good with that fixed as said.

Reviewed-by: Segher Boessenkool 


Segher


Powerpc Linux 'scv' system call ABI proposal take 2

2020-04-15 Thread Nicholas Piggin
I would like to enable Linux support for the powerpc 'scv' instruction,
as a faster system call instruction.

This requires two things to be defined: Firstly a way to advertise to 
userspace that kernel supports scv, and a way to allocate and advertise
support for individual scv vectors. Secondly, a calling convention ABI
for this new instruction.

Thanks to those who commented last time, since then I have removed my
answered questions and unpopular alternatives but you can find them
here

https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-January/203545.html

Let me try one more with a wider cc list, and then we'll get something
merged. Any questions or counter-opinions are welcome.

System Call Vectored (scv) ABI
==

The scv instruction is introduced with POWER9 / ISA3, it comes with an
rfscv counter-part. The benefit of these instructions is performance
(trading slower SRR0/1 with faster LR/CTR registers, and entering the
kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR 
updates). The scv instruction has 128 interrupt entry points (not enough 
to cover the Linux system call space).

The proposal is to assign scv numbers very conservatively and allocate 
them as individual HWCAP features as we add support for more. The zero 
vector ('scv 0') will be used for normal system calls, equivalent to 'sc'.

Advertisement

Linux has not enabled FSCR[SCV] yet, so the instruction will cause a
SIGILL in current environments. Linux has defined a HWCAP2 bit 
PPC_FEATURE2_SCV for SCV support, but does not set it.

When scv instruction support and the scv 0 vector for system calls are 
added, PPC_FEATURE2_SCV will indicate support for these. Other vectors 
should not be used without future HWCAP bits indicating support, which is
how we will allocate them. (Should unallocated ones generate SIGILL, or
return -ENOSYS in r3?)

Calling convention

The proposal is for scv 0 to provide the standard Linux system call ABI 
with the following differences from sc convention[1]:

- LR is to be volatile across scv calls. This is necessary because the 
  scv instruction clobbers LR. From previous discussion, this should be 
  possible to deal with in GCC clobbers and CFI.

- CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the
  kernel system call exit to avoid restoring the CR register (although 
  we probably still would anyway to avoid information leak).

- Error handling: I think the consensus has been to move to using negative
  return value in r3 rather than CR0[SO]=1 to indicate error, which matches
  most other architectures and is closer to a function call.

The number of scratch registers (r9-r12) at kernel entry seems 
sufficient that we don't have any costly spilling, patch is here[2].  

[1] 
https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst
[2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-February/204840.html
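
To make the convention concrete, a hedged sketch of a raw 3-argument wrapper
(not part of the proposal itself): syscall number in r0, arguments in r3-r5,
result or negative errno back in r3.  It assumes a toolchain that accepts the
scv instruction (e.g. -mcpu=power9), and a real libc would additionally have
to deal with the LR clobber and CFI as noted above.

static inline long scv0_syscall3(long nr, long a, long b, long c)
{
	register long r0 asm("r0") = nr;
	register long r3 asm("r3") = a;
	register long r4 asm("r4") = b;
	register long r5 asm("r5") = c;

	asm volatile("scv 0"
		     : "+r" (r0), "+r" (r3), "+r" (r4), "+r" (r5)
		     :
		     : "r6", "r7", "r8", "r9", "r10", "r11", "r12",
		       "lr", "ctr", "cr0", "cr1", "cr5", "cr6", "cr7",
		       "memory");

	return r3;	/* a value in [-4095, -1] means -errno in this scheme */
}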



Re: [PATCH v5 0/6] implement KASLR for powerpc/fsl_booke/64

2020-04-15 Thread Scott Wood
On Mon, 2020-03-30 at 10:20 +0800, Jason Yan wrote:
> This is a try to implement KASLR for Freescale BookE64 which is based on
> my earlier implementation for Freescale BookE32:
> 
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718&state=*
> 
> The implementation for Freescale BookE64 is similar to BookE32. One
> difference is that Freescale BookE64 sets up a TLB mapping of 1G during
> booting. Another difference is that ppc64 needs the kernel to be
> 64K-aligned. So we can randomize the kernel in this 1G mapping and make
> it 64K-aligned. This can save some code to create another TLB map at
> early boot. The disadvantage is that we only have about 1G/64K = 16384
> slots to put the kernel in.
> 
> KERNELBASE
> 
>   64K |--> kernel <--|
>|  |  |
> +--+--+--++--+--+--+--+--+--+--+--+--++--+--+
> |  |  |  ||  |  |  |  |  |  |  |  |  ||  |  |
> +--+--+--++--+--+--+--+--+--+--+--+--++--+--+
> | |1G
> |->   offset<-|
> 
>   kernstart_virt_addr
> 
> I'm not sure if the number of slots is enough or whether the design has
> any defects. If you have some better ideas, I would be happy to hear them.
> 
> Thank you all.
> 
> v4->v5:
>   Fix "-Werror=maybe-uninitialized" compile error.
>   Fix typo "similar as" -> "similar to".
> v3->v4:
>   Do not define __kaslr_offset as a fixed symbol. Reference __run_at_load
> and
> __kaslr_offset by symbol instead of magic offsets.
>   Use IS_ENABLED(CONFIG_PPC32) instead of #ifdef CONFIG_PPC32.
>   Change kaslr-booke32 to kaslr-booke in index.rst
>   Switch some instructions to 64-bit.
> v2->v3:
>   Fix build error when KASLR is disabled.
> v1->v2:
>   Add __kaslr_offset for the secondary cpu boot up.
> 
> Jason Yan (6):
>   powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and
> kaslr_early_init()
>   powerpc/fsl_booke/64: introduce reloc_kernel_entry() helper
>   powerpc/fsl_booke/64: implement KASLR for fsl_booke64
>   powerpc/fsl_booke/64: do not clear the BSS for the second pass
>   powerpc/fsl_booke/64: clear the original kernel if randomized
>   powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst
> and add 64bit part
> 
>  Documentation/powerpc/index.rst   |  2 +-
>  .../{kaslr-booke32.rst => kaslr-booke.rst}| 35 ++-
>  arch/powerpc/Kconfig  |  2 +-
>  arch/powerpc/kernel/exceptions-64e.S  | 23 +
>  arch/powerpc/kernel/head_64.S | 13 +++
>  arch/powerpc/kernel/setup_64.c|  3 +
>  arch/powerpc/mm/mmu_decl.h| 23 +++--
>  arch/powerpc/mm/nohash/kaslr_booke.c  | 91 +--
>  8 files changed, 147 insertions(+), 45 deletions(-)
>  rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} (59%)
> 

Acked-by: Scott Wood 

-Scott
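
For reference, the slot arithmetic from the quoted cover letter works out as
below (a toy sketch with invented names, not code from the series):

#define SZ_64K	0x00010000UL		/* kernel alignment */
#define SZ_1G	0x40000000UL		/* boot-time TLB mapping */

/* pick a random 64K-aligned offset inside the 1G mapping */
static unsigned long pick_kaslr_offset(unsigned long seed)
{
	unsigned long nr_slots = SZ_1G / SZ_64K;	/* 16384 slots */

	return (seed % nr_slots) * SZ_64K;
}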




Re: [PATCH v2,5/5] drivers: uio: new driver for fsl_85xx_cache_sram

2020-04-15 Thread Scott Wood
On Wed, 2020-04-15 at 18:52 +0200, Christophe Leroy wrote:
> 
> Le 15/04/2020 à 17:24, Wang Wenhu a écrit :
> > +
> > +   if (uiomem >= &info->mem[MAX_UIO_MAPS]) {
> 
> I'd prefer
>   if (uiomem - info->mem >= MAX_UIO_MAPS) {
> 
> > +   dev_warn(&pdev->dev, "more than %d uio-maps for
> > device.\n",
> > +MAX_UIO_MAPS);
> > +   break;
> > +   }
> > +   }
> > +
> > +   while (uiomem < &info->mem[MAX_UIO_MAPS]) {
> 
> I'd prefer
> 
>   while (uiomem - info->mem < MAX_UIO_MAPS) {
> 

I wouldn't.  You're turning a simple comparison into a division and a
comparison (if the compiler doesn't optimize it back into the original form),
and making it less clear in the process.

Of course, working with array indices to begin with instead of incrementing a
pointer would be more idiomatic.
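
For illustration, the index-based shape could look roughly like this, reusing
names from the patch and eliding the per-node allocation (a sketch, not a
drop-in replacement):

	int i = 0;

	for_each_child_of_node(parent, node) {
		if (i >= MAX_UIO_MAPS) {
			dev_warn(&pdev->dev, "more than %d uio-maps for device.\n",
				 MAX_UIO_MAPS);
			break;
		}
		/* ... allocate the cache-sram region, fill info->mem[i] ... */
		i++;
	}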

> > +   uiomem->size = 0;
> > +   ++uiomem;
> > +   }
> > +
> > +   if (info->mem[0].size == 0) {
> 
> Is there any point in doing all the clearing loop above if it's to bail 
> out here ?
> 
> Wouldn't it be cleaner to do the test above the clearing loop, by just 
> checking whether uiomem is still equal to info->mem ?

There's no point doing the clearing at all, since the array was allocated with
kzalloc().

> > +   dev_err(&pdev->dev, "error no valid uio-map configured\n");
> > +   ret = -EINVAL;
> > +   goto err_info_free_internel;
> > +   }
> > +
> > +   info->version = "0.1.0";
> 
> Could you define some DRIVER_VERSION in the top of the file next to 
> DRIVER_NAME instead of hard coding in the middle on a function ?

That's what v1 had, and Greg KH said to remove it.  I'm guessing that he
thought it was the common-but-pointless practice of having the driver print a
version number that never gets updated, rather than something the UIO API
(unfortunately, compared to a feature query interface) expects.  That said,
I'm not sure what the value is of making it a macro since it should only be
used once, that use is self documenting, it isn't tunable, etc.  Though if
this isn't a macro, UIO_NAME also shouldn't be (and if it is made a macro
again, it should be UIO_VERSION, not DRIVER_VERSION).

Does this really need a three-part version scheme?  What's wrong with a
version of "1", to be changed to "2" in the hopefully-unlikely event that the
userspace API changes?  Assuming UIO is used for this at all, which doesn't
seem like a great fit to me.

-Scott




Re: [PATCH v2,5/5] drivers: uio: new driver for fsl_85xx_cache_sram

2020-04-15 Thread Scott Wood
On Wed, 2020-04-15 at 08:24 -0700, Wang Wenhu wrote:
> +static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
> + {   .compatible = "uio,fsl,p2020-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p2010-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1020-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1011-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1013-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1022-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,mpc8548-l2-cache-controller",},
> + {   .compatible = "uio,fsl,mpc8544-l2-cache-controller",},
> + {   .compatible = "uio,fsl,mpc8572-l2-cache-controller",},
> + {   .compatible = "uio,fsl,mpc8536-l2-cache-controller",},
> + {   .compatible = "uio,fsl,p1021-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1012-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1025-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1016-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1024-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1015-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1010-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,bsc9131-l2-cache-controller",},
> + {},
> +};

NACK

The device tree describes the hardware, not what driver you want to bind the
hardware to, or how you want to allocate the resources.  And even if defining
nodes for sram allocation were the right way to go, why do you have a separate
compatible for each chip when you're just describing software configuration?

Instead, have module parameters that take the sizes and alignments you'd like
to allocate and expose to userspace.  Better still would be some sort of
dynamic allocation (e.g. open a fd, ioctl to set the requested size/alignment,
if it succeeds you can mmap it, and when the fd is closed the region is
freed).
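
From userspace, such a dynamic scheme might look roughly like the following;
the device node, ioctl number and request layout are all invented here for
illustration, no such interface exists in the posted driver:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/ioctl.h>

struct sram_alloc_req {
	unsigned long size;
	unsigned long align;
};

#define SRAM_IOC_ALLOC	_IOW('S', 0, struct sram_alloc_req)

/* the fd must stay open while the mapping is in use;
 * closing it is what frees the region in this scheme */
static void *sram_alloc(int *fdp, size_t size, size_t align)
{
	struct sram_alloc_req req = { .size = size, .align = align };
	void *p;

	*fdp = open("/dev/fsl-sram", O_RDWR);		/* hypothetical node */
	if (*fdp < 0 || ioctl(*fdp, SRAM_IOC_ALLOC, &req) < 0)
		return NULL;

	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, *fdp, 0);
	return p == MAP_FAILED ? NULL : p;
}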

-Scott




Re: [PATCH v2,1/5] powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable

2020-04-15 Thread Scott Wood
On Wed, 2020-04-15 at 08:24 -0700, Wang Wenhu wrote:
> Enable FSL_85XX_CACHE_SRAM selection. On e500 platforms, the cache
> could be configured and used as a piece of SRAM which is hignly
> friendly for some user level application performances.
> 
> Cc: Greg Kroah-Hartman 
> Cc: Christophe Leroy 
> Cc: Scott Wood 
> Cc: Michael Ellerman 
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Wang Wenhu 
> ---
> Changes since v1:
>  * None
> ---
>  arch/powerpc/platforms/85xx/Kconfig| 2 +-
>  arch/powerpc/platforms/Kconfig.cputype | 5 +++--
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/85xx/Kconfig
> b/arch/powerpc/platforms/85xx/Kconfig
> index fa3d29dcb57e..6debb4f1b9cc 100644
> --- a/arch/powerpc/platforms/85xx/Kconfig
> +++ b/arch/powerpc/platforms/85xx/Kconfig
> @@ -17,7 +17,7 @@ if FSL_SOC_BOOKE
>  if PPC32
>  
>  config FSL_85XX_CACHE_SRAM
> - bool
> + bool "Freescale 85xx Cache-Sram"
>   select PPC_LIB_RHEAP
>   help
> When selected, this option enables cache-sram support

NACK

As discussed before, the driver that uses this API should "select" this
symbol.

-Scott




Re: [PATCH v2,5/5] drivers: uio: new driver for fsl_85xx_cache_sram

2020-04-15 Thread Christophe Leroy




Le 15/04/2020 à 17:24, Wang Wenhu a écrit :

A driver for Freescale 85xx platforms to access the Cache-Sram from
user level. This is extremely helpful for some user-space applications
that require high performance memory accesses.

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Wang Wenhu 
---
Changes since v1:
  * Addressed comments of Greg K-H
  * Moved kfree(info->name) into uio_info_free_internal()
---
  drivers/uio/Kconfig   |   8 ++
  drivers/uio/Makefile  |   1 +
  drivers/uio/uio_fsl_85xx_cache_sram.c | 182 ++
  3 files changed, 191 insertions(+)
  create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c

diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
index 202ee81cfc2b..afd38ec13de0 100644
--- a/drivers/uio/Kconfig
+++ b/drivers/uio/Kconfig
@@ -105,6 +105,14 @@ config UIO_NETX
  To compile this driver as a module, choose M here; the module
  will be called uio_netx.
  
+config UIO_FSL_85XX_CACHE_SRAM

+   tristate "Freescale 85xx Cache-Sram driver"
+   depends on FSL_85XX_CACHE_SRAM


Is there any point having FSL_85XX_CACHE_SRAM without this ?

Should it be the other way round, leave FSL_85XX_CACHE_SRAM unselectable 
by user, and this driver select FSL_85XX_CACHE_SRAM instead of depending 
on it ?



+   help
+ Generic driver for accessing the Cache-Sram form user level. This
+ is extremely helpful for some user-space applications that require
+ high performance memory accesses.
+
  config UIO_FSL_ELBC_GPCM
tristate "eLBC/GPCM driver"
depends on FSL_LBC
diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
index c285dd2a4539..be2056cffc21 100644
--- a/drivers/uio/Makefile
+++ b/drivers/uio/Makefile
@@ -10,4 +10,5 @@ obj-$(CONFIG_UIO_NETX)+= uio_netx.o
  obj-$(CONFIG_UIO_PRUSS) += uio_pruss.o
  obj-$(CONFIG_UIO_MF624) += uio_mf624.o
  obj-$(CONFIG_UIO_FSL_ELBC_GPCM)   += uio_fsl_elbc_gpcm.o
+obj-$(CONFIG_UIO_FSL_85XX_CACHE_SRAM)  += uio_fsl_85xx_cache_sram.o
  obj-$(CONFIG_UIO_HV_GENERIC)  += uio_hv_generic.o
diff --git a/drivers/uio/uio_fsl_85xx_cache_sram.c 
b/drivers/uio/uio_fsl_85xx_cache_sram.c
new file mode 100644
index ..fb6903fdaddb
--- /dev/null
+++ b/drivers/uio/uio_fsl_85xx_cache_sram.c
@@ -0,0 +1,182 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2020 Vivo Communication Technology Co. Ltd.
+ * Copyright (C) 2020 Wang Wenhu 
+ * All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DRIVER_NAME"uio_fsl_85xx_cache_sram"
+#define UIO_NAME   "uio_cache_sram"
+
+static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
+   {   .compatible = "uio,fsl,p2020-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p2010-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1020-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1011-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1013-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1022-l2-cache-controller",},
+   {   .compatible = "uio,fsl,mpc8548-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,mpc8544-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,mpc8572-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,mpc8536-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1021-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1012-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1025-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1016-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1024-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1015-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1010-l2-cache-controller",},
+   {   .compatible = "uio,fsl,bsc9131-l2-cache-controller",  },
+   {},
+};
+
+static void uio_info_free_internal(struct uio_info *info)
+{
+   struct uio_mem *uiomem = &info->mem[0];
+
+   while (uiomem < &info->mem[MAX_UIO_MAPS]) {
+   if (uiomem->size) {
+   mpc85xx_cache_sram_free(uiomem->internal_addr);
+   kfree(uiomem->name);
+   }
+   uiomem++;
+   }
+
+   kfree(info->name);
+}
+
+static int uio_fsl_85xx_cache_sram_probe(struct platform_device *pdev)
+{
+   struct device_node *parent = pdev->dev.of_node;
+   struct device_node *node = NULL;
+   struct uio_info *info;
+   struct uio_mem *uiomem;
+   const char *dt_name;
+   u32 mem_size;
+   u32 align;


Align is not used outside of the for loop, it should be declared in the 
loop block.



+   void *virt;


Same for virt


+  

Re: [PATCH v2,1/5] powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable

2020-04-15 Thread Christophe Leroy




Le 15/04/2020 à 17:24, Wang Wenhu a écrit :

Enable FSL_85XX_CACHE_SRAM selection. On e500 platforms, the cache
could be configured and used as a piece of SRAM, which is highly
beneficial to the performance of some user-level applications.


It looks like the following patches are fixing errors generated by selecting 
FSL_85XX_CACHE_SRAM.


So this patch should go after the patches which fix the errors, i.e. it 
should be patch 4 in the series.


Christophe


Re: [PATCH v2,1/5] powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable

2020-04-15 Thread Christophe Leroy




Le 15/04/2020 à 17:24, Wang Wenhu a écrit :

Enable FSL_85XX_CACHE_SRAM selection. On e500 platforms, the cache
could be configured and used as a piece of SRAM, which is highly
beneficial to the performance of some user-level applications.

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Wang Wenhu 
---
Changes since v1:
  * None
---
  arch/powerpc/platforms/85xx/Kconfig| 2 +-
  arch/powerpc/platforms/Kconfig.cputype | 5 +++--
  2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/Kconfig 
b/arch/powerpc/platforms/85xx/Kconfig
index fa3d29dcb57e..6debb4f1b9cc 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -17,7 +17,7 @@ if FSL_SOC_BOOKE
  if PPC32
  
  config FSL_85XX_CACHE_SRAM

-   bool
+   bool "Freescale 85xx Cache-Sram"
select PPC_LIB_RHEAP
help
  When selected, this option enables cache-sram support
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 0c3c1902135c..1921e9a573e8 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -1,6 +1,6 @@
  # SPDX-License-Identifier: GPL-2.0
  config PPC32
-   bool
+   bool "32-bit kernel"


Why make that user selectable ?

Either a kernel is 64-bit or it is 32-bit. So having PPC64 user 
selectable is all we need.


And what is the link between this change and the description in the log ?


default y if !PPC64
select KASAN_VMALLOC if KASAN && MODULES
  
@@ -15,6 +15,7 @@ config PPC_BOOK3S_32

bool
  
  menu "Processor support"

+


Why adding this space ?


  choice
prompt "Processor Type"
depends on PPC32
@@ -211,9 +212,9 @@ config PPC_BOOK3E
depends on PPC_BOOK3E_64
  
  config E500

+   bool "e500 Support"
select FSL_EMB_PERFMON
select PPC_FSL_BOOK3E
-   bool


Why make this user-selectable ? This is already selected by the 
processors requiring it, ie 8500, e5500 and e6500.


Is there any other case where we need E500 ?

And again, what's the link between this change and the description in 
the log ?



  
  config PPC_E500MC

bool "e500mc Support"



Christophe


Re: [RFC PATCH 3/3] powerpc/lib: Use a temporary mm for code patching

2020-04-15 Thread Christopher M Riedl
> On April 15, 2020 3:45 AM Christophe Leroy  wrote:
> 
>  
> Le 15/04/2020 à 07:11, Christopher M Riedl a écrit :
> >> On March 24, 2020 11:25 AM Christophe Leroy  
> >> wrote:
> >>
> >>   
> >> Le 23/03/2020 à 05:52, Christopher M. Riedl a écrit :
> >>> Currently, code patching a STRICT_KERNEL_RWX exposes the temporary
> >>> mappings to other CPUs. These mappings should be kept local to the CPU
> >>> doing the patching. Use the pre-initialized temporary mm and patching
> >>> address for this purpose. Also add a check after patching to ensure the
> >>> patch succeeded.
> >>>
> >>> Based on x86 implementation:
> >>>
> >>> commit b3fd8e83ada0
> >>> ("x86/alternatives: Use temporary mm for text poking")
> >>>
> >>> Signed-off-by: Christopher M. Riedl 
> >>> ---
> >>>arch/powerpc/lib/code-patching.c | 128 ++-
> >>>1 file changed, 57 insertions(+), 71 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/lib/code-patching.c 
> >>> b/arch/powerpc/lib/code-patching.c
> >>> index 18b88ecfc5a8..f156132e8975 100644
> >>> --- a/arch/powerpc/lib/code-patching.c
> >>> +++ b/arch/powerpc/lib/code-patching.c
> >>> @@ -19,6 +19,7 @@
> >>>#include 
> >>>#include 
> >>>#include 
> >>> +#include 
> >>>
> >>>static int __patch_instruction(unsigned int *exec_addr, unsigned int 
> >>> instr,
> >>>  unsigned int *patch_addr)
> >>> @@ -65,99 +66,79 @@ void __init poking_init(void)
> >>>   pte_unmap_unlock(ptep, ptl);
> >>>}
> >>>
> >>> -static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
> >>> -
> >>> -static int text_area_cpu_up(unsigned int cpu)
> >>> -{
> >>> - struct vm_struct *area;
> >>> -
> >>> - area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> >>> - if (!area) {
> >>> - WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> >>> - cpu);
> >>> - return -1;
> >>> - }
> >>> - this_cpu_write(text_poke_area, area);
> >>> -
> >>> - return 0;
> >>> -}
> >>> -
> >>> -static int text_area_cpu_down(unsigned int cpu)
> >>> -{
> >>> - free_vm_area(this_cpu_read(text_poke_area));
> >>> - return 0;
> >>> -}
> >>> -
> >>> -/*
> >>> - * Run as a late init call. This allows all the boot time patching to be 
> >>> done
> >>> - * simply by patching the code, and then we're called here prior to
> >>> - * mark_rodata_ro(), which happens after all init calls are run. Although
> >>> - * BUG_ON() is rude, in this case it should only happen if ENOMEM, and 
> >>> we judge
> >>> - * it as being preferable to a kernel that will crash later when someone 
> >>> tries
> >>> - * to use patch_instruction().
> >>> - */
> >>> -static int __init setup_text_poke_area(void)
> >>> -{
> >>> - BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> >>> - "powerpc/text_poke:online", text_area_cpu_up,
> >>> - text_area_cpu_down));
> >>> -
> >>> - return 0;
> >>> -}
> >>> -late_initcall(setup_text_poke_area);
> >>> +struct patch_mapping {
> >>> + spinlock_t *ptl; /* for protecting pte table */
> >>> + struct temp_mm temp_mm;
> >>> +};
> >>>
> >>>/*
> >>> * This can be called for kernel text or a module.
> >>> */
> >>> -static int map_patch_area(void *addr, unsigned long text_poke_addr)
> >>> +static int map_patch(const void *addr, struct patch_mapping 
> >>> *patch_mapping)
> >>
> >> Why change the name ?
> >>
> > 
> > It's not really an "area" anymore.
> > 
> >>>{
> >>> - unsigned long pfn;
> >>> - int err;
> >>> + struct page *page;
> >>> + pte_t pte, *ptep;
> >>> + pgprot_t pgprot;
> >>>
> >>>   if (is_vmalloc_addr(addr))
> >>> - pfn = vmalloc_to_pfn(addr);
> >>> + page = vmalloc_to_page(addr);
> >>>   else
> >>> - pfn = __pa_symbol(addr) >> PAGE_SHIFT;
> >>> + page = virt_to_page(addr);
> >>>
> >>> - err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL);
> >>> + if (radix_enabled())
> >>> + pgprot = __pgprot(pgprot_val(PAGE_KERNEL));
> >>> + else
> >>> + pgprot = PAGE_SHARED;
> >>
> >> Can you explain the difference between radix and non radix ?
> >>
> >> Why PAGE_KERNEL for a page that is mapped in userspace ?
> >>
> >> Why do you need to do __pgprot(pgprot_val(PAGE_KERNEL)) instead of just
> >> using PAGE_KERNEL ?
> >>
> > 
> > On hash there is a manual check which prevents setting _PAGE_PRIVILEGED for
> > kernel to userspace access in __hash_page - hence we cannot access the 
> > mapping
> > if the page is mapped PAGE_KERNEL on hash. However, I would like to use
> > PAGE_KERNEL here as well and am working on understanding why this check is
> > done in hash and if this can change. On radix this works just fine.
> > 
> > The page is mapped PAGE_KERNEL because the address is technically a 
> > userspace
> > address - but only to keep the mapping local to this CPU doing the patching.
> > PAGE_KERNEL makes it clear both in intent and protection that this is a 
> > kernel
> > mapping.
> > 
> > I think the correct

Re: [RFC PATCH] powerpc/lib: Fixing use a temporary mm for code patching

2020-04-15 Thread Christopher M Riedl
> On April 15, 2020 4:12 AM Christophe Leroy  wrote:
> 
>  
> Le 15/04/2020 à 07:16, Christopher M Riedl a écrit :
> >> On March 26, 2020 9:42 AM Christophe Leroy  wrote:
> >>
> >>   
> >> This patch fixes the RFC series identified below.
> >> It fixes three points:
> >> - Failure with CONFIG_PPC_KUAP
> >> - Failure to write due to lack of DIRTY bit set on the 8xx
> >> - Inadequately complex WARN post verification
> >>
> >> However, it has an impact on the CPU load. Here is the time
> >> needed on an 8xx to run the ftrace selftests without and
> >> with this series:
> >> - Without CONFIG_STRICT_KERNEL_RWX ==> 38 seconds
> >> - With CONFIG_STRICT_KERNEL_RWX==> 40 seconds
> >> - With CONFIG_STRICT_KERNEL_RWX + this series  ==> 43 seconds
> >>
> >> Link: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166003
> >> Signed-off-by: Christophe Leroy 
> >> ---
> >>   arch/powerpc/lib/code-patching.c | 5 -
> >>   1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/powerpc/lib/code-patching.c 
> >> b/arch/powerpc/lib/code-patching.c
> >> index f156132e8975..4ccff427592e 100644
> >> --- a/arch/powerpc/lib/code-patching.c
> >> +++ b/arch/powerpc/lib/code-patching.c
> >> @@ -97,6 +97,7 @@ static int map_patch(const void *addr, struct 
> >> patch_mapping *patch_mapping)
> >>}
> >>   
> >>pte = mk_pte(page, pgprot);
> >> +  pte = pte_mkdirty(pte);
> >>set_pte_at(patching_mm, patching_addr, ptep, pte);
> >>   
> >>init_temp_mm(&patch_mapping->temp_mm, patching_mm);
> >> @@ -168,7 +169,9 @@ static int do_patch_instruction(unsigned int *addr, 
> >> unsigned int instr)
> >>(offset_in_page((unsigned long)addr) /
> >>sizeof(unsigned int));
> >>   
> >> +  allow_write_to_user(patch_addr, sizeof(instr));
> >>__patch_instruction(addr, instr, patch_addr);
> >> +  prevent_write_to_user(patch_addr, sizeof(instr));
> >>
> > 
> > On radix we can map the page with PAGE_KERNEL protection which ends up
> > setting EAA[0] in the radix PTE. This means the KUAP (AMR) protection is
> > ignored (ISA v3.0b Fig. 35) since we are accessing the page from MSR[PR]=0.
> > 
> > Can we employ a similar approach on the 8xx? I would prefer *not* to wrap
> > the __patch_instruction() with the allow_/prevent_write_to_user() KUAP 
> > things
> > because this is a temporary kernel mapping which really isn't userspace in
> > the usual sense.
> 
> On the 8xx, that's pretty different.
> 
> The PTE doesn't control whether a page is user page or a kernel page. 
> The only thing that is set in the PTE is whether a page is linked to a 
> given PID or not.
> PAGE_KERNEL tells that the page can be addressed with any PID.
> 
> The user access right is given by a kind of zone, which is in the PGD 
> entry. Every pages above PAGE_OFFSET are defined as belonging to zone 0. 
> Every pages below PAGE_OFFSET are defined as belonging to zone 1.
> 
> By default, zone 0 can only be accessed by kernel, and zone 1 can only 
> be accessed by user. When kernel wants to access zone 1, it temporarily 
> changes properties of zone 1 to allow both kernel and user accesses.
> 
> So, if your mapping is below PAGE_OFFSET, it is in zone 1 and kernel 
> must unlock it to access it.
> 
> 
> And this is more or less the same on hash/32. This is managed by segment 
> registers. One segment register corresponds to a 256Mbytes area. Every 
> pages below PAGE_OFFSET can only be read by default by kernel. Only user 
> can write if the PTE allows it. When the kernel needs to write at an 
> address below PAGE_OFFSET, it must change the segment properties in the 
> corresponding segment register.
> 
> So, for both cases, if we want to have it local to a task while still 
> allowing kernel access, it means we have to define a new special area 
> between TASK_SIZE and PAGE_OFFSET which belongs to kernel zone.
> 
> That looks complex to me for a small benefit, especially as 8xx is not 
> SMP and neither are most of the hash/32 targets.
> 

Agreed. So I guess the solution is to differentiate between radix/non-radix
and use PAGE_SHARED for non-radix along with the KUAP functions when KUAP
is enabled. Hmm, I need to think about this some more, especially if it's
acceptable to temporarily map kernel text as PAGE_SHARED for patching. Do
you see any obvious problems on 8xx and hash/32 w/ using PAGE_SHARED?

I don't necessarily want to drop the local mm patching idea for non-radix
platforms since that means we would have to maintain two implementations.

> Christophe


Re: Linux-next POWER9 NULL pointer NIP since 1st Apr.

2020-04-15 Thread Qian Cai



> On Apr 10, 2020, at 3:20 PM, Qian Cai  wrote:
> 
> 
> 
>> On Apr 9, 2020, at 10:14 AM, Steven Rostedt  wrote:
>> 
>> On Thu, 9 Apr 2020 06:06:35 -0400
>> Qian Cai  wrote:
>> 
> I’ll go to bisect some more but it is going to take a while.
> 
> $ git log --oneline 4c205c84e249..8e99cf91b99b
> 8e99cf91b99b tracing: Do not allocate buffer in trace_find_next_entry() 
> in atomic
> 2ab2a0924b99 tracing: Add documentation on set_ftrace_notrace_pid and 
> set_event_notrace_pid
> ebed9628f5c2 selftests/ftrace: Add test to test new set_event_notrace_pid 
> file
> ed8839e072b8 selftests/ftrace: Add test to test new 
> set_ftrace_notrace_pid file
> 276836260301 tracing: Create set_event_notrace_pid to not trace tasks  
 
> b3b1e6ededa4 ftrace: Create set_ftrace_notrace_pid to not trace tasks
> 717e3f5ebc82 ftrace: Make function trace pid filtering a bit more exact  
 
 If it is affecting function tracing, it is probably one of the above two
 commits.  
>>> 
>>> OK, it was narrowed down to one of those messed with mcount here,
>> 
>> Thing is, nothing here touches mcount.
> 
> Yes, you are right. I went back to test the commit just before the 5.7-trace 
> merge request,
> I did reproduce there. The thing is that this bastard could take more 6-hour 
> to happen,
> so my previous attempt did not wait long enough. Back to the square one…

OK, I started to test all commits for up to 12 hours. The progress so far is:

BAD: v5.6-rc1
GOOD: v5.5
GOOD: 153b5c566d30 Merge tag 'microblaze-v5.6-rc1' of 
git://git.monstr.eu/linux-2.6-microblaze

The next step I’ll be testing,

71c3a888cbca Merge tag 'powerpc-5.6-1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

If that is BAD, the merge request is the culprit. I can see a few commits that are
more related than others.

5290ae2b8e5f powerpc/64: Use {SAVE,REST}_NVGPRS macros
ed0bc98f8cbe powerpc/64s: Reimplement power4_idle code in C

Does it ring any bell yet?




Re: [PATCH] lib/mpi: Fix building for powerpc with clang

2020-04-15 Thread Herbert Xu
On Mon, Apr 13, 2020 at 12:50:42PM -0700, Nathan Chancellor wrote:
> 0day reports over and over on an powerpc randconfig with clang:
> 
> lib/mpi/generic_mpih-mul1.c:37:13: error: invalid use of a cast in a
> inline asm context requiring an l-value: remove the cast or build with
> -fheinous-gnu-extensions
> 
> Remove the superfluous casts, which have been done previously for x86
> and arm32 in commit dea632cadd12 ("lib/mpi: fix build with clang") and
> commit 7b7c1df2883d ("lib/mpi/longlong.h: fix building with 32-bit
> x86").
> 
> Reported-by: kbuild test robot 
> Link: https://github.com/ClangBuiltLinux/linux/issues/991
> Signed-off-by: Nathan Chancellor 

Acked-by: Herbert Xu 
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
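
For context, the kind of change involved looks roughly like this (illustrative
PowerPC snippet, not the actual lib/mpi macros):

	unsigned int sum, a = 1, b = 2;

	/* With a cast on the output operand it is no longer an l-value and
	 * clang rejects it:
	 *
	 *	asm("add %0,%1,%2" : "=r" ((unsigned int)sum) : "r" (a), "r" (b));
	 *
	 * Dropping the superfluous cast keeps gcc happy and fixes the clang
	 * build: */
	asm("add %0,%1,%2" : "=r" (sum) : "r" (a), "r" (b));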


[PATCH v2, 4/5] powerpc: sysdev: fix compile error for fsl_85xx_l2ctlr

2020-04-15 Thread Wang Wenhu
Include "linux/of_address.h" to fix the compile error for
mpc85xx_l2ctlr_of_probe() when compiling fsl_85xx_l2ctlr.c.

  CC  arch/powerpc/sysdev/fsl_85xx_l2ctlr.o
arch/powerpc/sysdev/fsl_85xx_l2ctlr.c: In function ‘mpc85xx_l2ctlr_of_probe’:
arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:11: error: implicit declaration of 
function ‘of_iomap’; did you mean ‘pci_iomap’? 
[-Werror=implicit-function-declaration]
  l2ctlr = of_iomap(dev->dev.of_node, 0);
   ^~~~
   pci_iomap
arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:9: error: assignment makes pointer 
from integer without a cast [-Werror=int-conversion]
  l2ctlr = of_iomap(dev->dev.of_node, 0);
 ^
cc1: all warnings being treated as errors
scripts/Makefile.build:267: recipe for target 
'arch/powerpc/sysdev/fsl_85xx_l2ctlr.o' failed
make[2]: *** [arch/powerpc/sysdev/fsl_85xx_l2ctlr.o] Error 1

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support")
Signed-off-by: Wang Wenhu 
---
Changes since v1:
 * None
---
 arch/powerpc/sysdev/fsl_85xx_l2ctlr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c 
b/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c
index 2d0af0c517bb..7533572492f0 100644
--- a/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c
+++ b/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include <linux/of_address.h>
 #include 
 
 #include "fsl_85xx_cache_ctlr.h"
-- 
2.17.1



[PATCH v2,5/5] drivers: uio: new driver for fsl_85xx_cache_sram

2020-04-15 Thread Wang Wenhu
A driver for Freescale 85xx platforms to access the Cache-Sram from
user level. This is extremely helpful for some user-space applications
that require high performance memory accesses.

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Wang Wenhu 
---
Changes since v1:
 * Addressed comments of Greg K-H
 * Moved kfree(info->name) into uio_info_free_internal()
---
 drivers/uio/Kconfig   |   8 ++
 drivers/uio/Makefile  |   1 +
 drivers/uio/uio_fsl_85xx_cache_sram.c | 182 ++
 3 files changed, 191 insertions(+)
 create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c

diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
index 202ee81cfc2b..afd38ec13de0 100644
--- a/drivers/uio/Kconfig
+++ b/drivers/uio/Kconfig
@@ -105,6 +105,14 @@ config UIO_NETX
  To compile this driver as a module, choose M here; the module
  will be called uio_netx.
 
+config UIO_FSL_85XX_CACHE_SRAM
+   tristate "Freescale 85xx Cache-Sram driver"
+   depends on FSL_85XX_CACHE_SRAM
+   help
+ Generic driver for accessing the Cache-Sram form user level. This
+ is extremely helpful for some user-space applications that require
+ high performance memory accesses.
+
 config UIO_FSL_ELBC_GPCM
tristate "eLBC/GPCM driver"
depends on FSL_LBC
diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
index c285dd2a4539..be2056cffc21 100644
--- a/drivers/uio/Makefile
+++ b/drivers/uio/Makefile
@@ -10,4 +10,5 @@ obj-$(CONFIG_UIO_NETX)+= uio_netx.o
 obj-$(CONFIG_UIO_PRUSS) += uio_pruss.o
 obj-$(CONFIG_UIO_MF624) += uio_mf624.o
 obj-$(CONFIG_UIO_FSL_ELBC_GPCM)+= uio_fsl_elbc_gpcm.o
+obj-$(CONFIG_UIO_FSL_85XX_CACHE_SRAM)  += uio_fsl_85xx_cache_sram.o
 obj-$(CONFIG_UIO_HV_GENERIC)   += uio_hv_generic.o
diff --git a/drivers/uio/uio_fsl_85xx_cache_sram.c 
b/drivers/uio/uio_fsl_85xx_cache_sram.c
new file mode 100644
index ..fb6903fdaddb
--- /dev/null
+++ b/drivers/uio/uio_fsl_85xx_cache_sram.c
@@ -0,0 +1,182 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2020 Vivo Communication Technology Co. Ltd.
+ * Copyright (C) 2020 Wang Wenhu 
+ * All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DRIVER_NAME"uio_fsl_85xx_cache_sram"
+#define UIO_NAME   "uio_cache_sram"
+
+static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
+   {   .compatible = "uio,fsl,p2020-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p2010-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1020-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1011-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1013-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1022-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,mpc8548-l2-cache-controller",},
+   {   .compatible = "uio,fsl,mpc8544-l2-cache-controller",},
+   {   .compatible = "uio,fsl,mpc8572-l2-cache-controller",},
+   {   .compatible = "uio,fsl,mpc8536-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1021-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1012-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1025-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1016-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1024-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1015-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1010-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,bsc9131-l2-cache-controller",},
+   {},
+};
+
+static void uio_info_free_internal(struct uio_info *info)
+{
+   struct uio_mem *uiomem = &info->mem[0];
+
+   while (uiomem < &info->mem[MAX_UIO_MAPS]) {
+   if (uiomem->size) {
+   mpc85xx_cache_sram_free(uiomem->internal_addr);
+   kfree(uiomem->name);
+   }
+   uiomem++;
+   }
+
+   kfree(info->name);
+}
+
+static int uio_fsl_85xx_cache_sram_probe(struct platform_device *pdev)
+{
+   struct device_node *parent = pdev->dev.of_node;
+   struct device_node *node = NULL;
+   struct uio_info *info;
+   struct uio_mem *uiomem;
+   const char *dt_name;
+   u32 mem_size;
+   u32 align;
+   void *virt;
+   phys_addr_t phys;
+   int ret = -ENODEV;
+
+   /* alloc uio_info for one device */
+   info = kzalloc(sizeof(*info), GFP_KERNEL);
+   if (!info) {
+   ret = -ENOMEM;
+   goto err_out;
+   }
+
+   /* get optional uio name */
+   if (of_property_read_string(parent, "uio_name", &dt_name))
+   

[PATCH v2, 2/5] powerpc: sysdev: fix compile error for fsl_85xx_cache_sram

2020-04-15 Thread Wang Wenhu
Include linux/io.h into fsl_85xx_cache_sram.c to fix the
implicit-declaration compile error when building Cache-Sram.

arch/powerpc/sysdev/fsl_85xx_cache_sram.c: In function ‘instantiate_cache_sram’:
arch/powerpc/sysdev/fsl_85xx_cache_sram.c:97:26: error: implicit declaration of 
function ‘ioremap_coherent’; did you mean ‘bitmap_complement’? 
[-Werror=implicit-function-declaration]
  cache_sram->base_virt = ioremap_coherent(cache_sram->base_phys,
  ^~~~
  bitmap_complement
arch/powerpc/sysdev/fsl_85xx_cache_sram.c:97:24: error: assignment makes 
pointer from integer without a cast [-Werror=int-conversion]
  cache_sram->base_virt = ioremap_coherent(cache_sram->base_phys,
^
arch/powerpc/sysdev/fsl_85xx_cache_sram.c:123:2: error: implicit declaration of 
function ‘iounmap’; did you mean ‘roundup’? 
[-Werror=implicit-function-declaration]
  iounmap(cache_sram->base_virt);
  ^~~
  roundup
cc1: all warnings being treated as errors

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support")
Signed-off-by: WANG Wenhu 
---
Changes since v1:
 * None
---
 arch/powerpc/sysdev/fsl_85xx_cache_sram.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c 
b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
index f6c665dac725..be3aef4229d7 100644
--- a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
+++ b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include <linux/io.h>
 
 #include "fsl_85xx_cache_ctlr.h"
 
-- 
2.17.1



[PATCH v2,0/5] drivers: uio: new driver uio_fsl_85xx_cache_sram

2020-04-15 Thread Wang Wenhu
This series add a new uio driver for freescale 85xx platforms to
access the Cache-Sram from user level. This is extremely helpful
for the user-space applications that require high performance memory
accesses.

It fixes the compile errors and warning of the hardware level drivers
and implements the uio driver in uio_fsl_85xx_cache_sram.c.
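
As background, a minimal user-space consumer of such a UIO device could look
like the sketch below; the device node name (/dev/uio0) and the 64 KiB mapping
size are assumptions, not something defined by this series:

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	const size_t len = 64 * 1024;		/* assumed SRAM map size */
	int fd = open("/dev/uio0", O_RDWR);

	if (fd < 0)
		return 1;

	/* UIO exposes mapping N at an mmap offset of N * page size. */
	void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
			 fd, 0 * (off_t)getpagesize());
	if (mem == MAP_FAILED) {
		close(fd);
		return 1;
	}

	volatile uint32_t *sram = mem;
	sram[0] = 0xdeadbeef;			/* direct, low-latency access */

	munmap(mem, len);
	close(fd);
	return 0;
}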

Changes since v1:
 * Addressed comments of Greg K-H
 * Moved kfree(info->name) into uio_info_free_internal()

Wang Wenhu (5):
  powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable
  powerpc: sysdev: fix compile error for fsl_85xx_cache_sram
  powerpc: sysdev: fix compile warning for fsl_85xx_cache_sram
  powerpc: sysdev: fix compile error for fsl_85xx_l2ctlr
  drivers: uio: new driver for fsl_85xx_cache_sram

 arch/powerpc/platforms/85xx/Kconfig   |   2 +-
 arch/powerpc/platforms/Kconfig.cputype|   5 +-
 arch/powerpc/sysdev/fsl_85xx_cache_sram.c |   3 +-
 arch/powerpc/sysdev/fsl_85xx_l2ctlr.c |   1 +
 drivers/uio/Kconfig   |   8 +
 drivers/uio/Makefile  |   1 +
 drivers/uio/uio_fsl_85xx_cache_sram.c | 182 ++
 7 files changed, 198 insertions(+), 4 deletions(-)
 create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c

-- 
2.17.1



[PATCH v2,1/5] powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable

2020-04-15 Thread Wang Wenhu
Enable FSL_85XX_CACHE_SRAM selection. On e500 platforms, the cache
can be configured and used as a piece of SRAM, which is highly
beneficial to the performance of some user-level applications.

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Wang Wenhu 
---
Changes since v1:
 * None
---
 arch/powerpc/platforms/85xx/Kconfig| 2 +-
 arch/powerpc/platforms/Kconfig.cputype | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/Kconfig 
b/arch/powerpc/platforms/85xx/Kconfig
index fa3d29dcb57e..6debb4f1b9cc 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -17,7 +17,7 @@ if FSL_SOC_BOOKE
 if PPC32
 
 config FSL_85XX_CACHE_SRAM
-   bool
+   bool "Freescale 85xx Cache-Sram"
select PPC_LIB_RHEAP
help
  When selected, this option enables cache-sram support
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 0c3c1902135c..1921e9a573e8 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 config PPC32
-   bool
+   bool "32-bit kernel"
default y if !PPC64
select KASAN_VMALLOC if KASAN && MODULES
 
@@ -15,6 +15,7 @@ config PPC_BOOK3S_32
bool
 
 menu "Processor support"
+
 choice
prompt "Processor Type"
depends on PPC32
@@ -211,9 +212,9 @@ config PPC_BOOK3E
depends on PPC_BOOK3E_64
 
 config E500
+   bool "e500 Support"
select FSL_EMB_PERFMON
select PPC_FSL_BOOK3E
-   bool
 
 config PPC_E500MC
bool "e500mc Support"
-- 
2.17.1



[PATCH v2, 3/5] powerpc: sysdev: fix compile warning for fsl_85xx_cache_sram

2020-04-15 Thread Wang Wenhu
Function instantiate_cache_sram should not be linked into the init
section, because its caller mpc85xx_l2ctlr_of_probe is not an __init function.

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support")
Signed-off-by: Wang Wenhu 

Warning information:
  MODPOST vmlinux.o
WARNING: modpost: vmlinux.o(.text+0x1e540): Section mismatch in reference from 
the function mpc85xx_l2ctlr_of_probe() to the function 
.init.text:instantiate_cache_sram()
The function mpc85xx_l2ctlr_of_probe() references
the function __init instantiate_cache_sram().
This is often because mpc85xx_l2ctlr_of_probe lacks a __init
annotation or the annotation of instantiate_cache_sram is wrong.
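
The pattern modpost complains about boils down to the following; the function
names here are made up purely for illustration:

#include <linux/init.h>
#include <linux/platform_device.h>

/* .init.text is discarded once the kernel has finished booting... */
static int __init setup_sram(void)
{
	return 0;
}

/* ...but a probe callback may run at any later time (e.g. on deferred
 * probe), so modpost flags this reference into freed init code. */
static int sram_probe(struct platform_device *pdev)
{
	return setup_sram();
}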
---
Changes since v1:
 * None
---
 arch/powerpc/sysdev/fsl_85xx_cache_sram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c 
b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
index be3aef4229d7..3de5ac8382c0 100644
--- a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
+++ b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
@@ -68,7 +68,7 @@ void mpc85xx_cache_sram_free(void *ptr)
 }
 EXPORT_SYMBOL(mpc85xx_cache_sram_free);
 
-int __init instantiate_cache_sram(struct platform_device *dev,
+int instantiate_cache_sram(struct platform_device *dev,
struct sram_parameters sram_params)
 {
int ret = 0;
-- 
2.17.1



[PATCH] powerpc/uaccess: Don't set KUAP by default on book3s/32

2020-04-15 Thread Christophe Leroy
On book3s/32, KUAP is a heavy process, as it requires determining
which segments are impacted and unlocking/locking
each of them.

And since the implementation of user_access_begin/end, it
is even worse for the time being because, unlike __get_user(),
user_access_begin doesn't differentiate between read and write
and also unlocks access for reads, although that's unneeded
on book3s/32.
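
To illustrate with a generic sketch (not code from this patch): even a pure
read ends up opening the user window for writes as well, and on book3s/32 that
means unlocking every segment covering the access:

static int read_one_word(const u32 __user *uptr, u32 *val)
{
	if (!user_access_begin(uptr, sizeof(*uptr)))
		return -EFAULT;
	/* Grants both read and write access although only a read follows. */
	unsafe_get_user(*val, uptr, fault);
	user_access_end();
	return 0;

fault:
	user_access_end();
	return -EFAULT;
}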

As shown by the size of a kernel built with KUAP and one without,
the overhead is 64k bytes of code. As a comparison a similar
build on an 8xx has an overhead of only 8k bytes of code.

   textdata bss dec hex filename
7230416 1425868  837376 9493660  90dc9c vmlinux.kuap6xx
7165012 1425548  837376 9427936  8fdbe0 vmlinux.nokuap6xx
6519796 1960028  477464 8957288  88ad68 vmlinux.kuap8xx
6511664 1959864  477464 8948992  888d00 vmlinux.nokuap8xx

Until a more optimised KUAP is implemented on book3s/32,
don't select it by default.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/platforms/Kconfig.cputype | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 0c3c1902135c..0c7151c98b56 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -389,7 +389,7 @@ config PPC_HAVE_KUAP
 config PPC_KUAP
bool "Kernel Userspace Access Protection"
depends on PPC_HAVE_KUAP
-   default y
+   default y if !PPC_BOOK3S_32
help
  Enable support for Kernel Userspace Access Protection (KUAP)
 
-- 
2.25.0



[PATCH] powerpc/uaccess: Don't set KUEP by default on book3s/32

2020-04-15 Thread Christophe Leroy
On book3s/32, KUEP is a heavy process, as it requires setting/unsetting
the NX bit in each of the 12 user segments
every time the kernel is entered from or exited to user space.

Don't select KUEP by default on book3s/32.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/platforms/Kconfig.cputype | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 0c7151c98b56..11412078e732 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -377,7 +377,7 @@ config PPC_HAVE_KUEP
 config PPC_KUEP
bool "Kernel Userspace Execution Prevention"
depends on PPC_HAVE_KUEP
-   default y
+   default y if !PPC_BOOK3S_32
help
  Enable support for Kernel Userspace Execution Prevention (KUEP)
 
-- 
2.25.0



[PATCH 01/34] docs: filesystems: fix references for doc files there

2020-04-15 Thread Mauro Carvalho Chehab
Several files there were renamed to ReST. Fix the broken
references.

Signed-off-by: Mauro Carvalho Chehab 
---
 Documentation/ABI/stable/sysfs-devices-node   | 2 +-
 Documentation/ABI/testing/procfs-smaps_rollup | 2 +-
 Documentation/admin-guide/cpu-load.rst| 2 +-
 Documentation/admin-guide/nfs/nfsroot.rst | 2 +-
 Documentation/driver-api/driver-model/device.rst  | 2 +-
 Documentation/driver-api/driver-model/overview.rst| 2 +-
 Documentation/filesystems/dax.txt | 2 +-
 Documentation/filesystems/dnotify.txt | 2 +-
 Documentation/filesystems/ramfs-rootfs-initramfs.rst  | 2 +-
 Documentation/powerpc/firmware-assisted-dump.rst  | 2 +-
 Documentation/process/adding-syscalls.rst | 2 +-
 .../translations/it_IT/process/adding-syscalls.rst| 2 +-
 Documentation/translations/zh_CN/filesystems/sysfs.txt| 6 +++---
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h   | 2 +-
 fs/Kconfig| 2 +-
 fs/Kconfig.binfmt | 2 +-
 fs/adfs/Kconfig   | 2 +-
 fs/affs/Kconfig   | 2 +-
 fs/afs/Kconfig| 6 +++---
 fs/bfs/Kconfig| 2 +-
 fs/cramfs/Kconfig | 2 +-
 fs/ecryptfs/Kconfig   | 2 +-
 fs/fat/Kconfig| 8 
 fs/fuse/Kconfig   | 2 +-
 fs/fuse/dev.c | 2 +-
 fs/hfs/Kconfig| 2 +-
 fs/hpfs/Kconfig   | 2 +-
 fs/isofs/Kconfig  | 2 +-
 fs/namespace.c| 2 +-
 fs/notify/inotify/Kconfig | 2 +-
 fs/ntfs/Kconfig   | 2 +-
 fs/ocfs2/Kconfig  | 2 +-
 fs/overlayfs/Kconfig  | 6 +++---
 fs/proc/Kconfig   | 4 ++--
 fs/romfs/Kconfig  | 2 +-
 fs/sysfs/dir.c| 2 +-
 fs/sysfs/file.c   | 2 +-
 fs/sysfs/mount.c  | 2 +-
 fs/sysfs/symlink.c| 2 +-
 fs/sysv/Kconfig   | 2 +-
 fs/udf/Kconfig| 2 +-
 include/linux/relay.h | 2 +-
 include/linux/sysfs.h | 2 +-
 kernel/relay.c| 2 +-
 44 files changed, 54 insertions(+), 54 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-devices-node 
b/Documentation/ABI/stable/sysfs-devices-node
index df8413cf1468..484fc04bcc25 100644
--- a/Documentation/ABI/stable/sysfs-devices-node
+++ b/Documentation/ABI/stable/sysfs-devices-node
@@ -54,7 +54,7 @@ Date: October 2002
 Contact:   Linux Memory Management list 
 Description:
Provides information about the node's distribution and memory
-   utilization. Similar to /proc/meminfo, see 
Documentation/filesystems/proc.txt
+   utilization. Similar to /proc/meminfo, see 
Documentation/filesystems/proc.rst
 
 What:  /sys/devices/system/node/nodeX/numastat
 Date:  October 2002
diff --git a/Documentation/ABI/testing/procfs-smaps_rollup 
b/Documentation/ABI/testing/procfs-smaps_rollup
index 274df44d8b1b..046978193368 100644
--- a/Documentation/ABI/testing/procfs-smaps_rollup
+++ b/Documentation/ABI/testing/procfs-smaps_rollup
@@ -11,7 +11,7 @@ Description:
Additionally, the fields Pss_Anon, Pss_File and Pss_Shmem
are not present in /proc/pid/smaps.  These fields represent
the sum of the Pss field of each type (anon, file, shmem).
-   For more details, see Documentation/filesystems/proc.txt
+   For more details, see Documentation/filesystems/proc.rst
and the procfs man page.
 
Typical output looks like this:
diff --git a/Documentation/admin-guide/cpu-load.rst 
b/Documentation/admin-guide/cpu-load.rst
index 2d01ce43d2a2..ebdecf864080 100644
--- a/Documentation/admin-guide/cpu-load.rst
+++ b/Documentation/admin-guide/cpu-load.rst
@@ -105,7 +105,7 @@ References
 --
 
 - http://lkml.org/lkml/2007/2/12/6
-- Documentation/filesystems/proc.txt (1.8)
+- Documentation/filesystems/proc.rst (1.8)
 
 
 Thanks
diff --git a/Documentation/admin-guide/nfs/nfsroot.rst 

[PATCH 29/34] docs: filesystems: convert spufs/spufs.txt to ReST

2020-04-15 Thread Mauro Carvalho Chehab
This file is in groff output format. Manually convert it to
ReST format, trying to preserve a similar output after parsing.

Signed-off-by: Mauro Carvalho Chehab 
---
 Documentation/filesystems/spufs/index.rst |  1 +
 .../spufs/{spufs.txt => spufs.rst}| 59 +--
 MAINTAINERS   |  2 +-
 3 files changed, 30 insertions(+), 32 deletions(-)
 rename Documentation/filesystems/spufs/{spufs.txt => spufs.rst} (95%)

diff --git a/Documentation/filesystems/spufs/index.rst 
b/Documentation/filesystems/spufs/index.rst
index 39553c6ebefd..939cf59a7d9e 100644
--- a/Documentation/filesystems/spufs/index.rst
+++ b/Documentation/filesystems/spufs/index.rst
@@ -8,4 +8,5 @@ SPU Filesystem
 .. toctree::
:maxdepth: 1
 
+   spufs
spu_create
diff --git a/Documentation/filesystems/spufs/spufs.txt 
b/Documentation/filesystems/spufs/spufs.rst
similarity index 95%
rename from Documentation/filesystems/spufs/spufs.txt
rename to Documentation/filesystems/spufs/spufs.rst
index caf36aaae804..8a42859bb100 100644
--- a/Documentation/filesystems/spufs/spufs.txt
+++ b/Documentation/filesystems/spufs/spufs.rst
@@ -1,12 +1,18 @@
-SPUFS(2)   Linux Programmer's Manual  SPUFS(2)
+.. SPDX-License-Identifier: GPL-2.0
 
+=
+spufs
+=
 
+Name
+
 
-NAME
spufs - the SPU file system
 
 
-DESCRIPTION
+Description
+===
+
The SPU file system is used on PowerPC machines that implement the Cell
Broadband Engine Architecture in order to access Synergistic  Processor
Units (SPUs).
@@ -21,7 +27,9 @@ DESCRIPTION
ally add or remove files.
 
 
-MOUNT OPTIONS
+Mount Options
+=
+
uid=
   set the user owning the mount point, the default is 0 (root).
 
@@ -29,7 +37,9 @@ MOUNT OPTIONS
   set the group owning the mount point, the default is 0 (root).
 
 
-FILES
+Files
+=
+
The files in spufs mostly follow the standard behavior for regular sys-
tem  calls like read(2) or write(2), but often support only a subset of
the operations supported on regular file systems. This list details the
@@ -125,14 +135,12 @@ FILES
   space is available for writing.
 
 
-   /mbox_stat
-   /ibox_stat
-   /wbox_stat
+   /mbox_stat, /ibox_stat, /wbox_stat
Read-only files that contain the length of the current queue, i.e.  how
many  words  can  be  read  from  mbox or ibox or how many words can be
written to wbox without blocking.  The files can be read only in 4-byte
units  and  return  a  big-endian  binary integer number.  The possible
-   operations on an open *box_stat file are:
+   operations on an open ``*box_stat`` file are:
 
read(2)
   If a count smaller than four is requested, read returns  -1  and
@@ -143,12 +151,7 @@ FILES
   in EAGAIN.
 
 
-   /npc
-   /decr
-   /decr_status
-   /spu_tag_mask
-   /event_mask
-   /srr0
+   /npc, /decr, /decr_status, /spu_tag_mask, /event_mask, /srr0
Internal  registers  of  the SPU. The representation is an ASCII string
with the numeric value of the next instruction to  be  executed.  These
can  be  used in read/write mode for debugging, but normal operation of
@@ -157,17 +160,14 @@ FILES
 
The contents of these files are:
 
+   === ===
npc Next Program Counter
-
decrSPU Decrementer
-
decr_status Decrementer Status
-
spu_tag_maskMFC tag mask for SPU DMA
-
event_mask  Event mask for SPU interrupts
-
srr0Interrupt Return address register
+   === ===
 
 
The   possible   operations   on   an   open  npc,  decr,  decr_status,
@@ -206,8 +206,7 @@ FILES
   from the data buffer, updating the value of the fpcr register.
 
 
-   /signal1
-   /signal2
+   /signal1, /signal2
The two signal notification channels of an SPU.  These  are  read-write
files  that  operate  on  a 32 bit word.  Writing to one of these files
triggers an interrupt on the SPU.  The  value  written  to  the  signal
@@ -233,8 +232,7 @@ FILES
   file.
 
 
-   /signal1_type
-   /signal2_type
+   /signal1_type, /signal2_type
These two files change the behavior of the signal1 and signal2  notifi-
cation  files.  The  contain  a numerical ASCII string which is read as
either "1" or "0".  In mode 0 (overwrite), the  hardware  replaces  the
@@ -259,18 +257,17 @@ FILES
   the previous setting.
 
 
-EXAMPLES
+Examples
+
/etc/fstab entry
   none  /spu  spufs gid=spu   00
 
 
-AUTHORS
+Authors
+===
Arnd  Bergmann  ,  Mark  Nutter ,
Ulrich Weigand 
 
-SEE ALSO
+See Also
+
  

[PATCH 00/34] fs: convert remaining docs to ReST file format

2020-04-15 Thread Mauro Carvalho Chehab
This patch series convert the remaining files under Documentation/filesystems
to the ReST file format. It is based on linux-next (next-20200414).

PS.: I opted to add mainly the MLs from the output of get_maintainers.pl to the c/c
list of patch 00/34, because otherwise the number of Ccs would be too large,
which would very likely cause ML servers to reject it.

The results of those changes (together with other changes from my pending
doc patches) are available at:

   https://www.infradead.org/~mchehab/kernel_docs/filesystems/index.html

Mauro Carvalho Chehab (34):
  docs: filesystems: fix references for doc files there
  docs: filesystems: convert caching/object.txt to ReST
  docs: filesystems: convert caching/fscache.txt to ReST format
  docs: filesystems: caching/netfs-api.txt: convert it to ReST
  docs: filesystems: caching/operations.txt: convert it to ReST
  docs: filesystems: caching/cachefiles.txt: convert to ReST
  docs: filesystems: caching/backend-api.txt: convert it to ReST
  docs: filesystems: convert cifs/cifsroot.rst to ReST
  docs: filesystems: convert configfs.txt to ReST
  docs: filesystems: convert automount-support.txt to ReST
  docs: filesystems: convert coda.txt to ReST
  docs: filesystems: convert dax.txt to ReST
  docs: filesystems: convert devpts.txt to ReST
  docs: filesystems: convert dnotify.txt to ReST
  docs: filesystems: convert fiemap.txt to ReST
  docs: filesystems: convert files.txt to ReST
  docs: filesystems: convert fuse-io.txt to ReST
  docs: filesystems: convert gfs2-glocks.txt to ReST
  docs: filesystems: convert locks.txt to ReST
  docs: filesystems: convert mandatory-locking.txt to ReST
  docs: filesystems: convert mount_api.txt to ReST
  docs: filesystems: rename path-lookup.txt file
  docs: filesystems: convert path-walking.txt to ReST
  docs: filesystems: convert quota.txt to ReST
  docs: filesystems: convert seq_file.txt to ReST
  docs: filesystems: convert sharedsubtree.txt to ReST
  docs: filesystems: split spufs.txt into 3 separate files
  docs: filesystems: convert spufs/spu_create.txt to ReST
  docs: filesystems: convert spufs/spufs.txt to ReST
  docs: filesystems: convert spufs/spu_run.txt to ReST
  docs: filesystems: convert sysfs-pci.txt to ReST
  docs: filesystems: convert sysfs-tagging.txt to ReST
  docs: filesystems: convert xfs-delayed-logging-design.txt to ReST
  docs: filesystems: convert xfs-self-describing-metadata.txt to ReST

 Documentation/ABI/stable/sysfs-devices-node   |2 +-
 Documentation/ABI/testing/procfs-smaps_rollup |2 +-
 Documentation/admin-guide/cpu-load.rst|2 +-
 Documentation/admin-guide/ext4.rst|2 +-
 Documentation/admin-guide/nfs/nfsroot.rst |2 +-
 Documentation/admin-guide/sysctl/kernel.rst   |2 +-
 .../driver-api/driver-model/device.rst|2 +-
 .../driver-api/driver-model/overview.rst  |2 +-
 ...ount-support.txt => automount-support.rst} |   23 +-
 .../{backend-api.txt => backend-api.rst}  |  165 +-
 .../{cachefiles.txt => cachefiles.rst}|  139 +-
 Documentation/filesystems/caching/fscache.rst |  565 ++
 Documentation/filesystems/caching/fscache.txt |  448 -
 Documentation/filesystems/caching/index.rst   |   14 +
 .../caching/{netfs-api.txt => netfs-api.rst}  |  172 +-
 .../caching/{object.txt => object.rst}|   43 +-
 .../{operations.txt => operations.rst}|   45 +-
 .../cifs/{cifsroot.txt => cifsroot.rst}   |   56 +-
 Documentation/filesystems/coda.rst| 1670 
 Documentation/filesystems/coda.txt| 1676 -
 .../{configfs/configfs.txt => configfs.rst}   |  129 +-
 .../filesystems/{dax.txt => dax.rst}  |   11 +-
 Documentation/filesystems/devpts.rst  |   36 +
 Documentation/filesystems/devpts.txt  |   26 -
 .../filesystems/{dnotify.txt => dnotify.rst}  |   13 +-
 Documentation/filesystems/ext2.rst|2 +-
 .../filesystems/{fiemap.txt => fiemap.rst}|  133 +-
 .../filesystems/{files.txt => files.rst}  |   15 +-
 .../filesystems/{fuse-io.txt => fuse-io.rst}  |6 +
 .../{gfs2-glocks.txt => gfs2-glocks.rst}  |  147 +-
 Documentation/filesystems/index.rst   |   26 +
 .../filesystems/{locks.txt => locks.rst}  |   14 +-
 ...tory-locking.txt => mandatory-locking.rst} |   25 +-
 .../{mount_api.txt => mount_api.rst}  |  329 ++--
 .../{path-lookup.txt => path-walking.rst} |   88 +-
 Documentation/filesystems/porting.rst |2 +-
 Documentation/filesystems/proc.rst|2 +-
 .../filesystems/{quota.txt => quota.rst}  |   41 +-
 .../filesystems/ramfs-rootfs-initramfs.rst|2 +-
 .../{seq_file.txt => seq_file.rst}|   61 +-
 .../{sharedsubtree.txt => sharedsubtree.rst}  |  394 ++--
 Documentation/filesystems/spufs/index.rst |   13 +
 .../filesystems/spufs/spu_create.rst  |  131 ++
 Documentation/filesystems/spufs/spu_run.rst   |  138 ++
 .../{spufs.txt => spufs/spufs.

Re: [PATCH v6 6/7] ASoC: dt-bindings: fsl_easrc: Add document for EASRC

2020-04-15 Thread Rob Herring
On Tue, Apr 14, 2020 at 9:56 PM Shengjiu Wang  wrote:
>
> Hi Rob
>
> On Tue, Apr 14, 2020 at 11:49 PM Rob Herring  wrote:
> >
> > On Wed, Apr 01, 2020 at 04:45:39PM +0800, Shengjiu Wang wrote:
> > > EASRC (Enhanced Asynchronous Sample Rate Converter) is a new
> > > IP module found on i.MX8MN.
> > >
> > > Signed-off-by: Shengjiu Wang 
> > > ---
> > >  .../devicetree/bindings/sound/fsl,easrc.yaml  | 101 ++
> > >  1 file changed, 101 insertions(+)
> > >  create mode 100644 Documentation/devicetree/bindings/sound/fsl,easrc.yaml
> > >
> > > diff --git a/Documentation/devicetree/bindings/sound/fsl,easrc.yaml 
> > > b/Documentation/devicetree/bindings/sound/fsl,easrc.yaml
> > > new file mode 100644
> > > index ..14ea60084420
> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/sound/fsl,easrc.yaml
> > > @@ -0,0 +1,101 @@
> > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > +%YAML 1.2
> > > +---
> > > +$id: http://devicetree.org/schemas/sound/fsl,easrc.yaml#
> > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > +
> > > +title: NXP Asynchronous Sample Rate Converter (ASRC) Controller
> > > +
> > > +maintainers:
> > > +  - Shengjiu Wang 
> > > +
> > > +properties:
> > > +  $nodename:
> > > +pattern: "^easrc@.*"
> > > +
> > > +  compatible:
> > > +const: fsl,imx8mn-easrc
> > > +
> > > +  reg:
> > > +maxItems: 1
> > > +
> > > +  interrupts:
> > > +maxItems: 1
> > > +
> > > +  clocks:
> > > +items:
> > > +  - description: Peripheral clock
> > > +
> > > +  clock-names:
> > > +items:
> > > +  - const: mem
> > > +
> > > +  dmas:
> > > +maxItems: 8
> > > +
> > > +  dma-names:
> > > +items:
> > > +  - const: ctx0_rx
> > > +  - const: ctx0_tx
> > > +  - const: ctx1_rx
> > > +  - const: ctx1_tx
> > > +  - const: ctx2_rx
> > > +  - const: ctx2_tx
> > > +  - const: ctx3_rx
> > > +  - const: ctx3_tx
> > > +
> > > +  firmware-name:
> > > +allOf:
> > > +  - $ref: /schemas/types.yaml#/definitions/string
> > > +  - const: imx/easrc/easrc-imx8mn.bin
> > > +description: The coefficient table for the filters
> > > +
> > > +  fsl,asrc-rate:
> >
> > fsl,asrc-rate-hz
>
> Can we keep "fsl,asrc-rate", because I want this property
> align with the one in fsl,asrc.txt.  These two asrc modules
> can share same property name.

Oh, yes.

So with the example fixed:

Reviewed-by: Rob Herring 
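
For reference, sharing the property name also keeps the consuming code
identical for both IP blocks; a minimal driver-side sketch (variable names
assumed):

	u32 asrc_rate = 0;

	/* Same property name as the existing fsl,asrc binding. */
	if (of_property_read_u32(np, "fsl,asrc-rate", &asrc_rate))
		dev_warn(dev, "no fsl,asrc-rate set, using hardware default\n");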


Re: [PATCH 5/5] drivers: uio: new driver for fsl_85xx_cache_sram

2020-04-15 Thread Wang Wenhu
Hi, Greg k-h!
Thank you for your fast reply. All the comments will
be addressed with v2 soon. Detailed explanations are
just below each specific comment.

>On Wed, Apr 15, 2020 at 05:33:46AM -0700, Wang Wenhu wrote:
>> A driver for freescale 85xx platforms to access the Cache-Sram form
>> user level. This is extremely helpful for some user-space applications
>> that require high performance memory accesses.
>> 
>> Cc: Greg Kroah-Hartman 
>> Cc: Christophe Leroy 
>> Cc: Scott Wood 
>> Cc: Michael Ellerman 
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Signed-off-by: Wang Wenhu 
>> ---
>>  drivers/uio/Kconfig   |   8 ++
>>  drivers/uio/Makefile  |   1 +
>>  drivers/uio/uio_fsl_85xx_cache_sram.c | 195 ++
>>  3 files changed, 204 insertions(+)
>>  create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c
>> 
>> diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
>> index 202ee81cfc2b..afd38ec13de0 100644
>> --- a/drivers/uio/Kconfig
>> +++ b/drivers/uio/Kconfig
>> @@ -105,6 +105,14 @@ config UIO_NETX
>>To compile this driver as a module, choose M here; the module
>>will be called uio_netx.
>>  
>> +config UIO_FSL_85XX_CACHE_SRAM
>> +tristate "Freescale 85xx Cache-Sram driver"
>> +depends on FSL_85XX_CACHE_SRAM
>> +help
>> +  Generic driver for accessing the Cache-Sram form user level. This
>> +  is extremely helpful for some user-space applications that require
>> +  high performance memory accesses.
>> +
>>  config UIO_FSL_ELBC_GPCM
>>  tristate "eLBC/GPCM driver"
>>  depends on FSL_LBC
>> diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
>> index c285dd2a4539..be2056cffc21 100644
>> --- a/drivers/uio/Makefile
>> +++ b/drivers/uio/Makefile
>> @@ -10,4 +10,5 @@ obj-$(CONFIG_UIO_NETX) += uio_netx.o
>>  obj-$(CONFIG_UIO_PRUSS) += uio_pruss.o
>>  obj-$(CONFIG_UIO_MF624) += uio_mf624.o
>>  obj-$(CONFIG_UIO_FSL_ELBC_GPCM) += uio_fsl_elbc_gpcm.o
>> +obj-$(CONFIG_UIO_FSL_85XX_CACHE_SRAM)   += uio_fsl_85xx_cache_sram.o
>>  obj-$(CONFIG_UIO_HV_GENERIC)+= uio_hv_generic.o
>> diff --git a/drivers/uio/uio_fsl_85xx_cache_sram.c 
>> b/drivers/uio/uio_fsl_85xx_cache_sram.c
>> new file mode 100644
>> index ..e11202dd5b93
>> --- /dev/null
>> +++ b/drivers/uio/uio_fsl_85xx_cache_sram.c
>> @@ -0,0 +1,195 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Copyright (C) 2020 Vivo Communication Technology Co. Ltd.
>> + * Copyright (C) 2020 Wang Wenhu 
>> + * All rights reserved.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms of the GNU General Public License version 2 as published
>> + * by the Free Software Foundation.
>
>Nit, you don't need this sentence anymore now that you have the SPDX
>line above
>
Got it, I will delete it in v2.
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#define DRIVER_VERSION  "0.1.0"
>
>Don't do DRIVER_VERSIONs, they never work once the code is in the kernel
>tree.
>
>> +#define DRIVER_NAME "uio_fsl_85xx_cache_sram"
>
>KBUILD_MODNAME?

Yes, and sorry, but I did not quite get what should be done here?
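
A small sketch of what the KBUILD_MODNAME hint presumably amounts to
(illustrative only, not a statement of what v2 will do):

/* Kbuild already defines KBUILD_MODNAME as a string literal holding the
 * module name derived from the object file, so the local define can just
 * alias it instead of duplicating the string: */
#define DRIVER_NAME	KBUILD_MODNAME	/* "uio_fsl_85xx_cache_sram" */
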

>
>> +#define UIO_NAME"uio_cache_sram"
>> +
>> +static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
>> +{   .compatible = "uio,fsl,p2020-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p2010-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1020-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1011-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1013-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1022-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,mpc8548-l2-cache-controller",},
>> +{   .compatible = "uio,fsl,mpc8544-l2-cache-controller",},
>> +{   .compatible = "uio,fsl,mpc8572-l2-cache-controller",},
>> +{   .compatible = "uio,fsl,mpc8536-l2-cache-controller",},
>> +{   .compatible = "uio,fsl,p1021-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1012-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1025-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1016-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1024-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1015-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,p1010-l2-cache-controller",  },
>> +{   .compatible = "uio,fsl,bsc9131-l2-cache-controller",},
>> +{},
>> +};
>> +
>> +static void uio_info_free_internal(struct uio_info *info)
>> +{
>> +struct uio_mem *uiomem = &info->mem[0];
>> +
>> +while (uiomem < &info->mem[MAX_UIO_MAPS]) {
>> +if (uiomem->size) {
>> +mpc85xx_cache_sram_free(uiomem->internal_addr);
>> +kfree(uiomem->name);
>> 

Re: CVE-2020-11669: Linux kernel 4.10 to 5.1: powerpc: guest can cause DoS on POWER9 KVM hosts

2020-04-15 Thread Michal Suchánek
On Wed, Apr 15, 2020 at 10:52:53PM +1000, Andrew Donnellan wrote:
> The Linux kernel for powerpc from v4.10 to v5.1 has a bug where the
> Authority Mask Register (AMR), Authority Mask Override Register (AMOR) and
> User Authority Mask Override Register (UAMOR) are not correctly saved and
> restored when the CPU is going into/coming out of idle state.
> 
> On POWER9 CPUs, this means that a CPU may return from idle with the AMR
> value of another thread on the same core.
> 
> This allows a trivial Denial of Service attack against KVM hosts, by booting
> a guest kernel which makes use of the AMR, such as a v5.2 or later kernel
> with Kernel Userspace Access Prevention (KUAP) enabled.
> 
> The guest kernel will set the AMR to prevent userspace access, then the
> thread will go idle. At a later point, the hardware thread that the guest
> was using may come out of idle and start executing in the host, without
> restoring the host AMR value. The host kernel can get caught in a page fault
> loop, as the AMR is unexpectedly causing memory accesses to fail in the
> host, and the host is eventually rendered unusable.

Hello,

shouldn't the kernel restore the host registers when leaving the guest?

I recall some code exists for handling the *AM*R when leaving guest. Can
the KVM guest enter idle without exiting to host?

Thanks

Michal


[PATCH] i2c: powermac: Simplify reading the "reg" and "i2c-address" property

2020-04-15 Thread Aishwarya R
>> Use of_property_read_u32 to read the "reg" and "i2c-address" property
>> instead of using of_get_property to check the return values.
>>
>> Signed-off-by: Aishwarya R 

> This is quite a fragile driver. Have you tested it on HW?

This change is not tested with the hardware.
But of_property_read_u32 is better here than the generic of_get_property.
It makes sure that the value is read properly, independent of system endianness.
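
A short sketch of the difference, using the generic OF API (the surrounding
variables are assumed):

	struct device_node *np = dev->of_node;
	const __be32 *prop;
	u32 addr;

	/* of_get_property() hands back raw big-endian cell data; the caller
	 * has to remember the be32_to_cpup() conversion itself. */
	prop = of_get_property(np, "reg", NULL);
	if (prop)
		addr = be32_to_cpup(prop);

	/* of_property_read_u32() converts internally and reports a missing
	 * or too-short property through its return value. */
	if (of_property_read_u32(np, "reg", &addr))
		return -EINVAL;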


Re: [PATCH 4.19] powerpc/powernv/idle: Restore AMR/UAMOR/AMOR after idle

2020-04-15 Thread Greg KH
On Wed, Apr 15, 2020 at 10:40:05PM +1000, Andrew Donnellan wrote:
> From: Michael Ellerman 
> 
> commit 53a712bae5dd919521a58d7bad773b949358add0 upstream.
> 
> In order to implement KUAP (Kernel Userspace Access Protection) on
> Power9 we will be using the AMR, and therefore indirectly the
> UAMOR/AMOR.
> 
> So save/restore these regs in the idle code.
> 
> Signed-off-by: Michael Ellerman 
> [ajd: Backport to 4.19 tree, CVE-2020-11669]
> Signed-off-by: Andrew Donnellan 
> ---
>  arch/powerpc/kernel/idle_book3s.S | 27 +++
>  1 file changed, 23 insertions(+), 4 deletions(-)

This and the 4.14 patch now queued up, thanks.

greg k-h


CVE-2020-11669: Linux kernel 4.10 to 5.1: powerpc: guest can cause DoS on POWER9 KVM hosts

2020-04-15 Thread Andrew Donnellan
The Linux kernel for powerpc from v4.10 to v5.1 has a bug where the 
Authority Mask Register (AMR), Authority Mask Override Register (AMOR) 
and User Authority Mask Override Register (UAMOR) are not correctly 
saved and restored when the CPU is going into/coming out of idle state.


On POWER9 CPUs, this means that a CPU may return from idle with the AMR 
value of another thread on the same core.


This allows a trivial Denial of Service attack against KVM hosts, by 
booting a guest kernel which makes use of the AMR, such as a v5.2 or 
later kernel with Kernel Userspace Access Prevention (KUAP) enabled.


The guest kernel will set the AMR to prevent userspace access, then the 
thread will go idle. At a later point, the hardware thread that the 
guest was using may come out of idle and start executing in the host, 
without restoring the host AMR value. The host kernel can get caught in 
a page fault loop, as the AMR is unexpectedly causing memory accesses to 
fail in the host, and the host is eventually rendered unusable.
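
In code terms, the guest side of the scenario is roughly the following sketch
(macro names as used by KUAP-enabled kernels; the idle entry itself is omitted):

	/* A v5.2+ guest with KUAP blocks userspace access via the AMR... */
	mtspr(SPRN_AMR, AMR_KUAP_BLOCKED);

	/* ...and then cedes the vCPU / enters an idle state.  If the host
	 * does not restore its own AMR on wakeup, this blocked value leaks
	 * into host context and host memory accesses start faulting. */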


The fix is to correctly save and restore the AMR in the idle state 
handling code.


The bug does not affect POWER8 or earlier Power CPUs.

CVE-2020-11669 has been assigned.

The bug has already been fixed upstream in kernels v5.2 onwards, by [0].

Fixes have been submitted for inclusion in upstream stable kernel trees 
for v4.19[1] and v4.14[2].


The bug is already fixed in Red Hat Enterprise Linux 8 kernels from 
4.18.0-147 onwards - see RHSA-2019:3517[3].


Thanks to David Gibson of Red Hat for the initial bug report.

[0] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=53a712bae5dd919521a58d7bad773b949358add0


[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208661.html

[2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208660.html

[3] https://access.redhat.com/errata/RHSA-2019:3517

--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited



Re: [PATCH 5/5] drivers: uio: new driver for fsl_85xx_cache_sram

2020-04-15 Thread Greg KH
On Wed, Apr 15, 2020 at 05:33:46AM -0700, Wang Wenhu wrote:
> A driver for freescale 85xx platforms to access the Cache-Sram form
> user level. This is extremely helpful for some user-space applications
> that require high performance memory accesses.
> 
> Cc: Greg Kroah-Hartman 
> Cc: Christophe Leroy 
> Cc: Scott Wood 
> Cc: Michael Ellerman 
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Wang Wenhu 
> ---
>  drivers/uio/Kconfig   |   8 ++
>  drivers/uio/Makefile  |   1 +
>  drivers/uio/uio_fsl_85xx_cache_sram.c | 195 ++
>  3 files changed, 204 insertions(+)
>  create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c
> 
> diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
> index 202ee81cfc2b..afd38ec13de0 100644
> --- a/drivers/uio/Kconfig
> +++ b/drivers/uio/Kconfig
> @@ -105,6 +105,14 @@ config UIO_NETX
> To compile this driver as a module, choose M here; the module
> will be called uio_netx.
>  
> +config UIO_FSL_85XX_CACHE_SRAM
> + tristate "Freescale 85xx Cache-Sram driver"
> + depends on FSL_85XX_CACHE_SRAM
> + help
> +   Generic driver for accessing the Cache-Sram form user level. This
> +   is extremely helpful for some user-space applications that require
> +   high performance memory accesses.
> +
>  config UIO_FSL_ELBC_GPCM
>   tristate "eLBC/GPCM driver"
>   depends on FSL_LBC
> diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
> index c285dd2a4539..be2056cffc21 100644
> --- a/drivers/uio/Makefile
> +++ b/drivers/uio/Makefile
> @@ -10,4 +10,5 @@ obj-$(CONFIG_UIO_NETX)  += uio_netx.o
>  obj-$(CONFIG_UIO_PRUSS) += uio_pruss.o
>  obj-$(CONFIG_UIO_MF624) += uio_mf624.o
>  obj-$(CONFIG_UIO_FSL_ELBC_GPCM)  += uio_fsl_elbc_gpcm.o
> +obj-$(CONFIG_UIO_FSL_85XX_CACHE_SRAM)+= uio_fsl_85xx_cache_sram.o
>  obj-$(CONFIG_UIO_HV_GENERIC) += uio_hv_generic.o
> diff --git a/drivers/uio/uio_fsl_85xx_cache_sram.c 
> b/drivers/uio/uio_fsl_85xx_cache_sram.c
> new file mode 100644
> index ..e11202dd5b93
> --- /dev/null
> +++ b/drivers/uio/uio_fsl_85xx_cache_sram.c
> @@ -0,0 +1,195 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2020 Vivo Communication Technology Co. Ltd.
> + * Copyright (C) 2020 Wang Wenhu 
> + * All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published
> + * by the Free Software Foundation.

Nit, you don't need this sentence anymore now that you have the SPDX
line above

> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define DRIVER_VERSION   "0.1.0"

Don't do DRIVER_VERSIONs, they never work once the code is in the kernel
tree.

> +#define DRIVER_NAME  "uio_fsl_85xx_cache_sram"

KBUILD_MODNAME?

> +#define UIO_NAME "uio_cache_sram"
> +
> +static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
> + {   .compatible = "uio,fsl,p2020-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p2010-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1020-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1011-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1013-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1022-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,mpc8548-l2-cache-controller",},
> + {   .compatible = "uio,fsl,mpc8544-l2-cache-controller",},
> + {   .compatible = "uio,fsl,mpc8572-l2-cache-controller",},
> + {   .compatible = "uio,fsl,mpc8536-l2-cache-controller",},
> + {   .compatible = "uio,fsl,p1021-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1012-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1025-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1016-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1024-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1015-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,p1010-l2-cache-controller",  },
> + {   .compatible = "uio,fsl,bsc9131-l2-cache-controller",},
> + {},
> +};
> +
> +static void uio_info_free_internal(struct uio_info *info)
> +{
> + struct uio_mem *uiomem = &info->mem[0];
> +
> + while (uiomem < &info->mem[MAX_UIO_MAPS]) {
> + if (uiomem->size) {
> + mpc85xx_cache_sram_free(uiomem->internal_addr);
> + kfree(uiomem->name);
> + }
> + uiomem++;
> + }
> +}
> +
> +static int uio_fsl_85xx_cache_sram_probe(struct platform_device *pdev)
> +{
> + struct device_node *parent = pdev->dev.of_node;
> + struct device_node *node = NULL;
> + struct uio_info *info;
> + 

[PATCH 4.19] powerpc/powernv/idle: Restore AMR/UAMOR/AMOR after idle

2020-04-15 Thread Andrew Donnellan
From: Michael Ellerman 

commit 53a712bae5dd919521a58d7bad773b949358add0 upstream.

In order to implement KUAP (Kernel Userspace Access Protection) on
Power9 we will be using the AMR, and therefore indirectly the
UAMOR/AMOR.

So save/restore these regs in the idle code.

Signed-off-by: Michael Ellerman 
[ajd: Backport to 4.19 tree, CVE-2020-11669]
Signed-off-by: Andrew Donnellan 
---
 arch/powerpc/kernel/idle_book3s.S | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/idle_book3s.S 
b/arch/powerpc/kernel/idle_book3s.S
index 36178000a2f2..4a860d3b9229 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -170,8 +170,11 @@ core_idle_lock_held:
bne-core_idle_lock_held
blr
 
-/* Reuse an unused pt_regs slot for IAMR */
+/* Reuse some unused pt_regs slots for AMR/IAMR/UAMOR/UAMOR */
+#define PNV_POWERSAVE_AMR  _TRAP
 #define PNV_POWERSAVE_IAMR _DAR
+#define PNV_POWERSAVE_UAMOR_DSISR
+#define PNV_POWERSAVE_AMOR RESULT
 
 /*
  * Pass requested state in r3:
@@ -205,8 +208,16 @@ pnv_powersave_common:
SAVE_NVGPRS(r1)
 
 BEGIN_FTR_SECTION
+   mfspr   r4, SPRN_AMR
mfspr   r5, SPRN_IAMR
+   mfspr   r6, SPRN_UAMOR
+   std r4, PNV_POWERSAVE_AMR(r1)
std r5, PNV_POWERSAVE_IAMR(r1)
+   std r6, PNV_POWERSAVE_UAMOR(r1)
+BEGIN_FTR_SECTION_NESTED(42)
+   mfspr   r7, SPRN_AMOR
+   std r7, PNV_POWERSAVE_AMOR(r1)
+END_FTR_SECTION_NESTED_IFSET(CPU_FTR_HVMODE, 42)
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
mfcrr5
@@ -935,12 +946,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
REST_GPR(2, r1)
 
 BEGIN_FTR_SECTION
-   /* IAMR was saved in pnv_powersave_common() */
+   /* These regs were saved in pnv_powersave_common() */
+   ld  r4, PNV_POWERSAVE_AMR(r1)
ld  r5, PNV_POWERSAVE_IAMR(r1)
+   ld  r6, PNV_POWERSAVE_UAMOR(r1)
+   mtspr   SPRN_AMR, r4
mtspr   SPRN_IAMR, r5
+   mtspr   SPRN_UAMOR, r6
+BEGIN_FTR_SECTION_NESTED(42)
+   ld  r7, PNV_POWERSAVE_AMOR(r1)
+   mtspr   SPRN_AMOR, r7
+END_FTR_SECTION_NESTED_IFSET(CPU_FTR_HVMODE, 42)
/*
-* We don't need an isync here because the upcoming mtmsrd is
-* execution synchronizing.
+* We don't need an isync here after restoring IAMR because the upcoming
+* mtmsrd is execution synchronizing.
 */
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
-- 
2.20.1



[PATCH 4.14] powerpc/powernv/idle: Restore AMR/UAMOR/AMOR after idle

2020-04-15 Thread Andrew Donnellan
From: Michael Ellerman 

commit 53a712bae5dd919521a58d7bad773b949358add0 upstream.

In order to implement KUAP (Kernel Userspace Access Protection) on
Power9 we will be using the AMR, and therefore indirectly the
UAMOR/AMOR.

So save/restore these regs in the idle code.

Signed-off-by: Michael Ellerman 
[ajd: Backport to 4.14 tree, CVE-2020-11669]
Signed-off-by: Andrew Donnellan 
---
 arch/powerpc/kernel/idle_book3s.S | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/idle_book3s.S 
b/arch/powerpc/kernel/idle_book3s.S
index 74fc20431082..01b823bdb49c 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -163,8 +163,11 @@ core_idle_lock_held:
bne-core_idle_lock_held
blr
 
-/* Reuse an unused pt_regs slot for IAMR */
+/* Reuse some unused pt_regs slots for AMR/IAMR/UAMOR/UAMOR */
+#define PNV_POWERSAVE_AMR  _TRAP
 #define PNV_POWERSAVE_IAMR _DAR
+#define PNV_POWERSAVE_UAMOR_DSISR
+#define PNV_POWERSAVE_AMOR RESULT
 
 /*
  * Pass requested state in r3:
@@ -198,8 +201,16 @@ pnv_powersave_common:
SAVE_NVGPRS(r1)
 
 BEGIN_FTR_SECTION
+   mfspr   r4, SPRN_AMR
mfspr   r5, SPRN_IAMR
+   mfspr   r6, SPRN_UAMOR
+   std r4, PNV_POWERSAVE_AMR(r1)
std r5, PNV_POWERSAVE_IAMR(r1)
+   std r6, PNV_POWERSAVE_UAMOR(r1)
+BEGIN_FTR_SECTION_NESTED(42)
+   mfspr   r7, SPRN_AMOR
+   std r7, PNV_POWERSAVE_AMOR(r1)
+END_FTR_SECTION_NESTED_IFSET(CPU_FTR_HVMODE, 42)
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
mfcrr5
@@ -951,12 +962,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
REST_GPR(2, r1)
 
 BEGIN_FTR_SECTION
-   /* IAMR was saved in pnv_powersave_common() */
+   /* These regs were saved in pnv_powersave_common() */
+   ld  r4, PNV_POWERSAVE_AMR(r1)
ld  r5, PNV_POWERSAVE_IAMR(r1)
+   ld  r6, PNV_POWERSAVE_UAMOR(r1)
+   mtspr   SPRN_AMR, r4
mtspr   SPRN_IAMR, r5
+   mtspr   SPRN_UAMOR, r6
+BEGIN_FTR_SECTION_NESTED(42)
+   ld  r7, PNV_POWERSAVE_AMOR(r1)
+   mtspr   SPRN_AMOR, r7
+END_FTR_SECTION_NESTED_IFSET(CPU_FTR_HVMODE, 42)
/*
-* We don't need an isync here because the upcoming mtmsrd is
-* execution synchronizing.
+* We don't need an isync here after restoring IAMR because the upcoming
+* mtmsrd is execution synchronizing.
 */
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
-- 
2.20.1



Applied "ASoC: fsl_micfil: Omit superfluous error message in fsl_micfil_probe()" to the asoc tree

2020-04-15 Thread Mark Brown
The patch

   ASoC: fsl_micfil: Omit superfluous error message in fsl_micfil_probe()

has been applied to the asoc tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

>From 83b35f4586e235bfb785a7947b555ad8f3d96887 Mon Sep 17 00:00:00 2001
From: Tang Bin 
Date: Wed, 15 Apr 2020 12:45:13 +0800
Subject: [PATCH] ASoC: fsl_micfil: Omit superfluous error message in
 fsl_micfil_probe()

In the function fsl_micfil_probe(), when getting the IRQ fails, the function
platform_get_irq() already logs an error message, so remove the redundant message here.

Signed-off-by: Tang Bin 
Signed-off-by: Shengju Zhang 
Link: 
https://lore.kernel.org/r/20200415044513.17492-1-tang...@cmss.chinamobile.com
Signed-off-by: Mark Brown 
---
 sound/soc/fsl/fsl_micfil.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/sound/soc/fsl/fsl_micfil.c b/sound/soc/fsl/fsl_micfil.c
index f7f2d29f1bfe..e73bd6570a08 100644
--- a/sound/soc/fsl/fsl_micfil.c
+++ b/sound/soc/fsl/fsl_micfil.c
@@ -702,10 +702,8 @@ static int fsl_micfil_probe(struct platform_device *pdev)
for (i = 0; i < MICFIL_IRQ_LINES; i++) {
micfil->irq[i] = platform_get_irq(pdev, i);
dev_err(&pdev->dev, "GET IRQ: %d\n", micfil->irq[i]);
-   if (micfil->irq[i] < 0) {
-   dev_err(&pdev->dev, "no irq for node %s\n", pdev->name);
+   if (micfil->irq[i] < 0)
return micfil->irq[i];
-   }
}
 
if (of_property_read_bool(np, "fsl,shared-interrupt"))
-- 
2.20.1



[PATCH 4/5] powerpc: sysdev: fix compile error for fsl_85xx_l2ctlr

2020-04-15 Thread Wang Wenhu
Include "linux/of_address.h" to fix the compile error for
mpc85xx_l2ctlr_of_probe() when compiling fsl_85xx_l2ctlr.c.

  CC  arch/powerpc/sysdev/fsl_85xx_l2ctlr.o
arch/powerpc/sysdev/fsl_85xx_l2ctlr.c: In function ‘mpc85xx_l2ctlr_of_probe’:
arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:11: error: implicit declaration of 
function ‘of_iomap’; did you mean ‘pci_iomap’? 
[-Werror=implicit-function-declaration]
  l2ctlr = of_iomap(dev->dev.of_node, 0);
   ^~~~
   pci_iomap
arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:9: error: assignment makes pointer 
from integer without a cast [-Werror=int-conversion]
  l2ctlr = of_iomap(dev->dev.of_node, 0);
 ^
cc1: all warnings being treated as errors
scripts/Makefile.build:267: recipe for target 
'arch/powerpc/sysdev/fsl_85xx_l2ctlr.o' failed
make[2]: *** [arch/powerpc/sysdev/fsl_85xx_l2ctlr.o] Error 1

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support")
Signed-off-by: Wang Wenhu 
---
 arch/powerpc/sysdev/fsl_85xx_l2ctlr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c 
b/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c
index 2d0af0c517bb..7533572492f0 100644
--- a/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c
+++ b/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include <linux/of_address.h>
 #include 
 
 #include "fsl_85xx_cache_ctlr.h"
-- 
2.17.1



[PATCH 5/5] drivers: uio: new driver for fsl_85xx_cache_sram

2020-04-15 Thread Wang Wenhu
A driver for Freescale 85xx platforms to access the Cache-Sram from
user level. This is extremely helpful for some user-space applications
that require high performance memory accesses.

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Wang Wenhu 
---
 drivers/uio/Kconfig   |   8 ++
 drivers/uio/Makefile  |   1 +
 drivers/uio/uio_fsl_85xx_cache_sram.c | 195 ++
 3 files changed, 204 insertions(+)
 create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c

diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
index 202ee81cfc2b..afd38ec13de0 100644
--- a/drivers/uio/Kconfig
+++ b/drivers/uio/Kconfig
@@ -105,6 +105,14 @@ config UIO_NETX
  To compile this driver as a module, choose M here; the module
  will be called uio_netx.
 
+config UIO_FSL_85XX_CACHE_SRAM
+   tristate "Freescale 85xx Cache-Sram driver"
+   depends on FSL_85XX_CACHE_SRAM
+   help
+ Generic driver for accessing the Cache-Sram form user level. This
+ is extremely helpful for some user-space applications that require
+ high performance memory accesses.
+
 config UIO_FSL_ELBC_GPCM
tristate "eLBC/GPCM driver"
depends on FSL_LBC
diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
index c285dd2a4539..be2056cffc21 100644
--- a/drivers/uio/Makefile
+++ b/drivers/uio/Makefile
@@ -10,4 +10,5 @@ obj-$(CONFIG_UIO_NETX)+= uio_netx.o
 obj-$(CONFIG_UIO_PRUSS) += uio_pruss.o
 obj-$(CONFIG_UIO_MF624) += uio_mf624.o
 obj-$(CONFIG_UIO_FSL_ELBC_GPCM)+= uio_fsl_elbc_gpcm.o
+obj-$(CONFIG_UIO_FSL_85XX_CACHE_SRAM)  += uio_fsl_85xx_cache_sram.o
 obj-$(CONFIG_UIO_HV_GENERIC)   += uio_hv_generic.o
diff --git a/drivers/uio/uio_fsl_85xx_cache_sram.c 
b/drivers/uio/uio_fsl_85xx_cache_sram.c
new file mode 100644
index ..e11202dd5b93
--- /dev/null
+++ b/drivers/uio/uio_fsl_85xx_cache_sram.c
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2020 Vivo Communication Technology Co. Ltd.
+ * Copyright (C) 2020 Wang Wenhu 
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DRIVER_VERSION "0.1.0"
+#define DRIVER_NAME"uio_fsl_85xx_cache_sram"
+#define UIO_NAME   "uio_cache_sram"
+
+static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
+   {   .compatible = "uio,fsl,p2020-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p2010-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1020-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1011-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1013-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1022-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,mpc8548-l2-cache-controller",},
+   {   .compatible = "uio,fsl,mpc8544-l2-cache-controller",},
+   {   .compatible = "uio,fsl,mpc8572-l2-cache-controller",},
+   {   .compatible = "uio,fsl,mpc8536-l2-cache-controller",},
+   {   .compatible = "uio,fsl,p1021-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1012-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1025-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1016-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1024-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1015-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,p1010-l2-cache-controller",  },
+   {   .compatible = "uio,fsl,bsc9131-l2-cache-controller",},
+   {},
+};
+
+static void uio_info_free_internal(struct uio_info *info)
+{
+   struct uio_mem *uiomem = &info->mem[0];
+
+   while (uiomem < &info->mem[MAX_UIO_MAPS]) {
+   if (uiomem->size) {
+   mpc85xx_cache_sram_free(uiomem->internal_addr);
+   kfree(uiomem->name);
+   }
+   uiomem++;
+   }
+}
+
+static int uio_fsl_85xx_cache_sram_probe(struct platform_device *pdev)
+{
+   struct device_node *parent = pdev->dev.of_node;
+   struct device_node *node = NULL;
+   struct uio_info *info;
+   struct uio_mem *uiomem;
+   const char *dt_name;
+   u32 mem_size;
+   u32 align;
+   void *virt;
+   phys_addr_t phys;
+   int ret = -ENODEV;
+
+   /* alloc uio_info for one device */
+   info = kzalloc(sizeof(*info), GFP_KERNEL);
+   if (!info) {
+   dev_err(&pdev->dev, "kzalloc uio_info failed\n");
+   ret = -ENOMEM;
+  
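
As a rough illustration of how such a UIO region is consumed, a minimal
user-space sketch follows. The device node /dev/uio0, map index 0 and the
mapping size are assumptions for illustration only; the real size is
published under /sys/class/uio/uioX/maps/mapN/size.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/dev/uio0", O_RDWR);	/* assumed device node */
	if (fd < 0) {
		perror("open /dev/uio0");
		return 1;
	}

	/* UIO convention: map N is mmap'ed at offset N * page size. */
	long pagesz = sysconf(_SC_PAGESIZE);
	size_t len = 0x10000;	/* placeholder size for this sketch */
	void *sram = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_SHARED, fd, 0 * pagesz);
	if (sram == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return 1;
	}

	((volatile unsigned int *)sram)[0] = 0xdeadbeef;	/* touch the SRAM */

	munmap(sram, len);
	close(fd);
	return 0;
}
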

[PATCH 3/5] powerpc: sysdev: fix compile warning for fsl_85xx_cache_sram

2020-04-15 Thread Wang Wenhu
Function instantiate_cache_sram should not be linked into the init
section because its caller mpc85xx_l2ctlr_of_probe is not __init.

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support")
Signed-off-by: Wang Wenhu 

Warning information:
  MODPOST vmlinux.o
WARNING: modpost: vmlinux.o(.text+0x1e540): Section mismatch in reference from 
the function mpc85xx_l2ctlr_of_probe() to the function 
.init.text:instantiate_cache_sram()
The function mpc85xx_l2ctlr_of_probe() references
the function __init instantiate_cache_sram().
This is often because mpc85xx_l2ctlr_of_probe lacks a __init
annotation or the annotation of instantiate_cache_sram is wrong.
---
 arch/powerpc/sysdev/fsl_85xx_cache_sram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c 
b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
index be3aef4229d7..3de5ac8382c0 100644
--- a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
+++ b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
@@ -68,7 +68,7 @@ void mpc85xx_cache_sram_free(void *ptr)
 }
 EXPORT_SYMBOL(mpc85xx_cache_sram_free);
 
-int __init instantiate_cache_sram(struct platform_device *dev,
+int instantiate_cache_sram(struct platform_device *dev,
struct sram_parameters sram_params)
 {
int ret = 0;
-- 
2.17.1



[PATCH 2/5] powerpc: sysdev: fix compile error for fsl_85xx_cache_sram

2020-04-15 Thread Wang Wenhu
Include linux/io.h in fsl_85xx_cache_sram.c to fix the
implicit-declaration compile errors when building the Cache-Sram code.

arch/powerpc/sysdev/fsl_85xx_cache_sram.c: In function ‘instantiate_cache_sram’:
arch/powerpc/sysdev/fsl_85xx_cache_sram.c:97:26: error: implicit declaration of 
function ‘ioremap_coherent’; did you mean ‘bitmap_complement’? 
[-Werror=implicit-function-declaration]
  cache_sram->base_virt = ioremap_coherent(cache_sram->base_phys,
  ^~~~
  bitmap_complement
arch/powerpc/sysdev/fsl_85xx_cache_sram.c:97:24: error: assignment makes 
pointer from integer without a cast [-Werror=int-conversion]
  cache_sram->base_virt = ioremap_coherent(cache_sram->base_phys,
^
arch/powerpc/sysdev/fsl_85xx_cache_sram.c:123:2: error: implicit declaration of 
function ‘iounmap’; did you mean ‘roundup’? 
[-Werror=implicit-function-declaration]
  iounmap(cache_sram->base_virt);
  ^~~
  roundup
cc1: all warnings being treated as errors

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support")
Signed-off-by: Wang Wenhu 
---
 arch/powerpc/sysdev/fsl_85xx_cache_sram.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c 
b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
index f6c665dac725..be3aef4229d7 100644
--- a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
+++ b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include <linux/io.h>
 
 #include "fsl_85xx_cache_ctlr.h"
 
-- 
2.17.1



[PATCH 0/5] drivers: uio: new driver uio_fsl_85xx_cache_sram

2020-04-15 Thread Wang Wenhu
This series adds a new UIO driver for Freescale 85xx platforms to
access the Cache-Sram from user level. This is extremely helpful
for user-space applications that require high-performance memory
accesses.

It fixes the compile errors and warnings in the hardware-level drivers
and implements the UIO driver in uio_fsl_85xx_cache_sram.c.

Wang Wenhu (5):
  powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable
  powerpc: sysdev: fix compile error for fsl_85xx_cache_sram
  powerpc: sysdev: fix compile warning for fsl_85xx_cache_sram
  powerpc: sysdev: fix compile error for fsl_85xx_l2ctlr
  drivers: uio: new driver for fsl_85xx_cache_sram

 arch/powerpc/platforms/85xx/Kconfig   |   2 +-
 arch/powerpc/platforms/Kconfig.cputype|   5 +-
 arch/powerpc/sysdev/fsl_85xx_cache_sram.c |   3 +-
 arch/powerpc/sysdev/fsl_85xx_l2ctlr.c |   1 +
 drivers/uio/Kconfig   |   8 +
 drivers/uio/Makefile  |   1 +
 drivers/uio/uio_fsl_85xx_cache_sram.c | 195 ++
 7 files changed, 211 insertions(+), 4 deletions(-)
 create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c

-- 
2.17.1



[PATCH 1/5] powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable

2020-04-15 Thread Wang Wenhu
Enable FSL_85XX_CACHE_SRAM selection. On e500 platforms, the cache
can be configured and used as a piece of SRAM, which is highly
beneficial for the performance of some user-level applications.

Cc: Greg Kroah-Hartman 
Cc: Christophe Leroy 
Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Wang Wenhu 
---
 arch/powerpc/platforms/85xx/Kconfig| 2 +-
 arch/powerpc/platforms/Kconfig.cputype | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/Kconfig 
b/arch/powerpc/platforms/85xx/Kconfig
index fa3d29dcb57e..6debb4f1b9cc 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -17,7 +17,7 @@ if FSL_SOC_BOOKE
 if PPC32
 
 config FSL_85XX_CACHE_SRAM
-   bool
+   bool "Freescale 85xx Cache-Sram"
select PPC_LIB_RHEAP
help
  When selected, this option enables cache-sram support
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 0c3c1902135c..1921e9a573e8 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 config PPC32
-   bool
+   bool "32-bit kernel"
default y if !PPC64
select KASAN_VMALLOC if KASAN && MODULES
 
@@ -15,6 +15,7 @@ config PPC_BOOK3S_32
bool
 
 menu "Processor support"
+
 choice
prompt "Processor Type"
depends on PPC32
@@ -211,9 +212,9 @@ config PPC_BOOK3E
depends on PPC_BOOK3E_64
 
 config E500
+   bool "e500 Support"
select FSL_EMB_PERFMON
select PPC_FSL_BOOK3E
-   bool
 
 config PPC_E500MC
bool "e500mc Support"
-- 
2.17.1



[PATCH AUTOSEL 4.9 06/21] powerpc/maple: Fix declaration made after definition

2020-04-15 Thread Sasha Levin
From: Nathan Chancellor 

[ Upstream commit af6cf95c4d003fccd6c2ecc99a598fb854b537e7 ]

When building ppc64 defconfig, Clang errors (trimmed for brevity):

  arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration
  must precede definition [-Werror,-Wignored-attributes]
  machine_device_initcall(maple, maple_cpc925_edac_setup);
  ^

machine_device_initcall expands to __define_machine_initcall, which in
turn has the macro machine_is used in it, which declares mach_##name
with an __attribute__((weak)). define_machine actually defines
mach_##name, which in this file happens before the declaration, hence
the warning.

To fix this, move define_machine after machine_device_initcall so that
the declaration occurs before the definition, which matches how
machine_device_initcall and define_machine work throughout
arch/powerpc.

While we're here, remove some spaces before tabs.

Fixes: 8f101a051ef0 ("edac: cpc925 MC platform device setup")
Reported-by: Nick Desaulniers 
Suggested-by: Ilie Halip 
Signed-off-by: Nathan Chancellor 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancel...@gmail.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/maple/setup.c | 34 ++--
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/maple/setup.c 
b/arch/powerpc/platforms/maple/setup.c
index b7f937563827d..d1fee2d35b49c 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -299,23 +299,6 @@ static int __init maple_probe(void)
return 1;
 }
 
-define_machine(maple) {
-   .name   = "Maple",
-   .probe  = maple_probe,
-   .setup_arch = maple_setup_arch,
-   .init_IRQ   = maple_init_IRQ,
-   .pci_irq_fixup  = maple_pci_irq_fixup,
-   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
-   .restart= maple_restart,
-   .halt   = maple_halt,
-   .get_boot_time  = maple_get_boot_time,
-   .set_rtc_time   = maple_set_rtc_time,
-   .get_rtc_time   = maple_get_rtc_time,
-   .calibrate_decr = generic_calibrate_decr,
-   .progress   = maple_progress,
-   .power_save = power4_idle,
-};
-
 #ifdef CONFIG_EDAC
 /*
  * Register a platform device for CPC925 memory controller on
@@ -372,3 +355,20 @@ static int __init maple_cpc925_edac_setup(void)
 }
 machine_device_initcall(maple, maple_cpc925_edac_setup);
 #endif
+
+define_machine(maple) {
+   .name   = "Maple",
+   .probe  = maple_probe,
+   .setup_arch = maple_setup_arch,
+   .init_IRQ   = maple_init_IRQ,
+   .pci_irq_fixup  = maple_pci_irq_fixup,
+   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
+   .restart= maple_restart,
+   .halt   = maple_halt,
+   .get_boot_time  = maple_get_boot_time,
+   .set_rtc_time   = maple_set_rtc_time,
+   .get_rtc_time   = maple_get_rtc_time,
+   .calibrate_decr = generic_calibrate_decr,
+   .progress   = maple_progress,
+   .power_save = power4_idle,
+};
-- 
2.20.1



[PATCH AUTOSEL 4.14 10/30] powerpc/maple: Fix declaration made after definition

2020-04-15 Thread Sasha Levin
From: Nathan Chancellor 

[ Upstream commit af6cf95c4d003fccd6c2ecc99a598fb854b537e7 ]

When building ppc64 defconfig, Clang errors (trimmed for brevity):

  arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration
  must precede definition [-Werror,-Wignored-attributes]
  machine_device_initcall(maple, maple_cpc925_edac_setup);
  ^

machine_device_initcall expands to __define_machine_initcall, which in
turn has the macro machine_is used in it, which declares mach_##name
with an __attribute__((weak)). define_machine actually defines
mach_##name, which in this file happens before the declaration, hence
the warning.

To fix this, move define_machine after machine_device_initcall so that
the declaration occurs before the definition, which matches how
machine_device_initcall and define_machine work throughout
arch/powerpc.

While we're here, remove some spaces before tabs.

Fixes: 8f101a051ef0 ("edac: cpc925 MC platform device setup")
Reported-by: Nick Desaulniers 
Suggested-by: Ilie Halip 
Signed-off-by: Nathan Chancellor 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancel...@gmail.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/maple/setup.c | 34 ++--
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/maple/setup.c 
b/arch/powerpc/platforms/maple/setup.c
index b7f937563827d..d1fee2d35b49c 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -299,23 +299,6 @@ static int __init maple_probe(void)
return 1;
 }
 
-define_machine(maple) {
-   .name   = "Maple",
-   .probe  = maple_probe,
-   .setup_arch = maple_setup_arch,
-   .init_IRQ   = maple_init_IRQ,
-   .pci_irq_fixup  = maple_pci_irq_fixup,
-   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
-   .restart= maple_restart,
-   .halt   = maple_halt,
-   .get_boot_time  = maple_get_boot_time,
-   .set_rtc_time   = maple_set_rtc_time,
-   .get_rtc_time   = maple_get_rtc_time,
-   .calibrate_decr = generic_calibrate_decr,
-   .progress   = maple_progress,
-   .power_save = power4_idle,
-};
-
 #ifdef CONFIG_EDAC
 /*
  * Register a platform device for CPC925 memory controller on
@@ -372,3 +355,20 @@ static int __init maple_cpc925_edac_setup(void)
 }
 machine_device_initcall(maple, maple_cpc925_edac_setup);
 #endif
+
+define_machine(maple) {
+   .name   = "Maple",
+   .probe  = maple_probe,
+   .setup_arch = maple_setup_arch,
+   .init_IRQ   = maple_init_IRQ,
+   .pci_irq_fixup  = maple_pci_irq_fixup,
+   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
+   .restart= maple_restart,
+   .halt   = maple_halt,
+   .get_boot_time  = maple_get_boot_time,
+   .set_rtc_time   = maple_set_rtc_time,
+   .get_rtc_time   = maple_get_rtc_time,
+   .calibrate_decr = generic_calibrate_decr,
+   .progress   = maple_progress,
+   .power_save = power4_idle,
+};
-- 
2.20.1



[PATCH AUTOSEL 4.19 10/40] powerpc/maple: Fix declaration made after definition

2020-04-15 Thread Sasha Levin
From: Nathan Chancellor 

[ Upstream commit af6cf95c4d003fccd6c2ecc99a598fb854b537e7 ]

When building ppc64 defconfig, Clang errors (trimmed for brevity):

  arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration
  must precede definition [-Werror,-Wignored-attributes]
  machine_device_initcall(maple, maple_cpc925_edac_setup);
  ^

machine_device_initcall expands to __define_machine_initcall, which in
turn has the macro machine_is used in it, which declares mach_##name
with an __attribute__((weak)). define_machine actually defines
mach_##name, which in this file happens before the declaration, hence
the warning.

To fix this, move define_machine after machine_device_initcall so that
the declaration occurs before the definition, which matches how
machine_device_initcall and define_machine work throughout
arch/powerpc.

While we're here, remove some spaces before tabs.

Fixes: 8f101a051ef0 ("edac: cpc925 MC platform device setup")
Reported-by: Nick Desaulniers 
Suggested-by: Ilie Halip 
Signed-off-by: Nathan Chancellor 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancel...@gmail.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/maple/setup.c | 34 ++--
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/maple/setup.c 
b/arch/powerpc/platforms/maple/setup.c
index b7f937563827d..d1fee2d35b49c 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -299,23 +299,6 @@ static int __init maple_probe(void)
return 1;
 }
 
-define_machine(maple) {
-   .name   = "Maple",
-   .probe  = maple_probe,
-   .setup_arch = maple_setup_arch,
-   .init_IRQ   = maple_init_IRQ,
-   .pci_irq_fixup  = maple_pci_irq_fixup,
-   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
-   .restart= maple_restart,
-   .halt   = maple_halt,
-   .get_boot_time  = maple_get_boot_time,
-   .set_rtc_time   = maple_set_rtc_time,
-   .get_rtc_time   = maple_get_rtc_time,
-   .calibrate_decr = generic_calibrate_decr,
-   .progress   = maple_progress,
-   .power_save = power4_idle,
-};
-
 #ifdef CONFIG_EDAC
 /*
  * Register a platform device for CPC925 memory controller on
@@ -372,3 +355,20 @@ static int __init maple_cpc925_edac_setup(void)
 }
 machine_device_initcall(maple, maple_cpc925_edac_setup);
 #endif
+
+define_machine(maple) {
+   .name   = "Maple",
+   .probe  = maple_probe,
+   .setup_arch = maple_setup_arch,
+   .init_IRQ   = maple_init_IRQ,
+   .pci_irq_fixup  = maple_pci_irq_fixup,
+   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
+   .restart= maple_restart,
+   .halt   = maple_halt,
+   .get_boot_time  = maple_get_boot_time,
+   .set_rtc_time   = maple_set_rtc_time,
+   .get_rtc_time   = maple_get_rtc_time,
+   .calibrate_decr = generic_calibrate_decr,
+   .progress   = maple_progress,
+   .power_save = power4_idle,
+};
-- 
2.20.1



[PATCH AUTOSEL 5.4 32/84] powerpc/maple: Fix declaration made after definition

2020-04-15 Thread Sasha Levin
From: Nathan Chancellor 

[ Upstream commit af6cf95c4d003fccd6c2ecc99a598fb854b537e7 ]

When building ppc64 defconfig, Clang errors (trimmed for brevity):

  arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration
  must precede definition [-Werror,-Wignored-attributes]
  machine_device_initcall(maple, maple_cpc925_edac_setup);
  ^

machine_device_initcall expands to __define_machine_initcall, which in
turn has the macro machine_is used in it, which declares mach_##name
with an __attribute__((weak)). define_machine actually defines
mach_##name, which in this file happens before the declaration, hence
the warning.

To fix this, move define_machine after machine_device_initcall so that
the declaration occurs before the definition, which matches how
machine_device_initcall and define_machine work throughout
arch/powerpc.

While we're here, remove some spaces before tabs.

Fixes: 8f101a051ef0 ("edac: cpc925 MC platform device setup")
Reported-by: Nick Desaulniers 
Suggested-by: Ilie Halip 
Signed-off-by: Nathan Chancellor 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancel...@gmail.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/maple/setup.c | 34 ++--
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/maple/setup.c 
b/arch/powerpc/platforms/maple/setup.c
index 9cd6f3e1000b3..09a0594350b69 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -294,23 +294,6 @@ static int __init maple_probe(void)
return 1;
 }
 
-define_machine(maple) {
-   .name   = "Maple",
-   .probe  = maple_probe,
-   .setup_arch = maple_setup_arch,
-   .init_IRQ   = maple_init_IRQ,
-   .pci_irq_fixup  = maple_pci_irq_fixup,
-   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
-   .restart= maple_restart,
-   .halt   = maple_halt,
-   .get_boot_time  = maple_get_boot_time,
-   .set_rtc_time   = maple_set_rtc_time,
-   .get_rtc_time   = maple_get_rtc_time,
-   .calibrate_decr = generic_calibrate_decr,
-   .progress   = maple_progress,
-   .power_save = power4_idle,
-};
-
 #ifdef CONFIG_EDAC
 /*
  * Register a platform device for CPC925 memory controller on
@@ -367,3 +350,20 @@ static int __init maple_cpc925_edac_setup(void)
 }
 machine_device_initcall(maple, maple_cpc925_edac_setup);
 #endif
+
+define_machine(maple) {
+   .name   = "Maple",
+   .probe  = maple_probe,
+   .setup_arch = maple_setup_arch,
+   .init_IRQ   = maple_init_IRQ,
+   .pci_irq_fixup  = maple_pci_irq_fixup,
+   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
+   .restart= maple_restart,
+   .halt   = maple_halt,
+   .get_boot_time  = maple_get_boot_time,
+   .set_rtc_time   = maple_set_rtc_time,
+   .get_rtc_time   = maple_get_rtc_time,
+   .calibrate_decr = generic_calibrate_decr,
+   .progress   = maple_progress,
+   .power_save = power4_idle,
+};
-- 
2.20.1



[PATCH AUTOSEL 5.4 31/84] powerpc/prom_init: Pass the "os-term" message to hypervisor

2020-04-15 Thread Sasha Levin
From: Alexey Kardashevskiy 

[ Upstream commit 74bb84e5117146fa73eb9d01305975c53022b3c3 ]

The "os-term" RTAS calls has one argument with a message address of OS
termination cause. rtas_os_term() already passes it but the recently
added prom_init's version of that missed it; it also does not fill
args correctly.

This passes the message address and initializes the number of arguments.

Fixes: 6a9c930bd775 ("powerpc/prom_init: Add the ESM call to prom_init")
Signed-off-by: Alexey Kardashevskiy 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200312074404.87293-1-...@ozlabs.ru
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/prom_init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index eba9d4ee4baf6..689664cd4e79b 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1761,6 +1761,9 @@ static void __init prom_rtas_os_term(char *str)
if (token == 0)
prom_panic("Could not get token for ibm,os-term\n");
os_term_args.token = cpu_to_be32(token);
+   os_term_args.nargs = cpu_to_be32(1);
+   os_term_args.nret = cpu_to_be32(1);
+   os_term_args.args[0] = cpu_to_be32(__pa(str));
prom_rtas_hcall((uint64_t)&os_term_args);
 }
 #endif /* CONFIG_PPC_SVM */
-- 
2.20.1



[PATCH AUTOSEL 5.4 21/84] KVM: PPC: Book3S HV: Fix H_CEDE return code for nested guests

2020-04-15 Thread Sasha Levin
From: Michael Roth 

[ Upstream commit 1f50cc1705350a4697923203fedd7d8fb1087fe2 ]

The h_cede_tm kvm-unit-test currently fails when run inside an L1 guest
via the guest/nested hypervisor.

  ./run-tests.sh -v
  ...
  TESTNAME=h_cede_tm TIMEOUT=90s ACCEL= ./powerpc/run powerpc/tm.elf -smp 
2,threads=2 -machine cap-htm=on -append "h_cede_tm"
  FAIL h_cede_tm (2 tests, 1 unexpected failures)

While the test relates to transactional memory instructions, the actual
failure is due to the return code of the H_CEDE hypercall, which is
reported as 224 instead of 0. This happens even when no TM instructions
are issued.

224 is the value placed in r3 to execute a hypercall for H_CEDE, and r3
is where the caller expects the return code to be placed upon return.

In the case of guest running under a nested hypervisor, issuing H_CEDE
causes a return from H_ENTER_NESTED. In this case H_CEDE is
specially-handled immediately rather than later in
kvmppc_pseries_do_hcall() as with most other hcalls, but we forget to
set the return code for the caller, hence why kvm-unit-test sees the
224 return code and reports an error.

Guest kernels generally don't check the return value of H_CEDE, so
that likely explains why this hasn't caused issues outside of
kvm-unit-tests so far.

Fix this by setting r3 to 0 after we finish processing the H_CEDE.

RHBZ: 1778556

Fixes: 4bad77799fed ("KVM: PPC: Book3S HV: Handle hypercalls correctly when 
nested")
Cc: linuxppc-...@ozlabs.org
Cc: David Gibson 
Cc: Paul Mackerras 
Signed-off-by: Michael Roth 
Reviewed-by: David Gibson 
Signed-off-by: Paul Mackerras 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kvm/book3s_hv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 36abbe3c346df..e2183fed947d4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3623,6 +3623,7 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 
time_limit,
if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested &&
kvmppc_get_gpr(vcpu, 3) == H_CEDE) {
kvmppc_nested_cede(vcpu);
+   kvmppc_set_gpr(vcpu, 3, 0);
trap = 0;
}
} else {
-- 
2.20.1



[PATCH AUTOSEL 5.5 046/106] powerpc/maple: Fix declaration made after definition

2020-04-15 Thread Sasha Levin
From: Nathan Chancellor 

[ Upstream commit af6cf95c4d003fccd6c2ecc99a598fb854b537e7 ]

When building ppc64 defconfig, Clang errors (trimmed for brevity):

  arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration
  must precede definition [-Werror,-Wignored-attributes]
  machine_device_initcall(maple, maple_cpc925_edac_setup);
  ^

machine_device_initcall expands to __define_machine_initcall, which in
turn has the macro machine_is used in it, which declares mach_##name
with an __attribute__((weak)). define_machine actually defines
mach_##name, which in this file happens before the declaration, hence
the warning.

To fix this, move define_machine after machine_device_initcall so that
the declaration occurs before the definition, which matches how
machine_device_initcall and define_machine work throughout
arch/powerpc.

While we're here, remove some spaces before tabs.

Fixes: 8f101a051ef0 ("edac: cpc925 MC platform device setup")
Reported-by: Nick Desaulniers 
Suggested-by: Ilie Halip 
Signed-off-by: Nathan Chancellor 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancel...@gmail.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/maple/setup.c | 34 ++--
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/maple/setup.c 
b/arch/powerpc/platforms/maple/setup.c
index 9cd6f3e1000b3..09a0594350b69 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -294,23 +294,6 @@ static int __init maple_probe(void)
return 1;
 }
 
-define_machine(maple) {
-   .name   = "Maple",
-   .probe  = maple_probe,
-   .setup_arch = maple_setup_arch,
-   .init_IRQ   = maple_init_IRQ,
-   .pci_irq_fixup  = maple_pci_irq_fixup,
-   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
-   .restart= maple_restart,
-   .halt   = maple_halt,
-   .get_boot_time  = maple_get_boot_time,
-   .set_rtc_time   = maple_set_rtc_time,
-   .get_rtc_time   = maple_get_rtc_time,
-   .calibrate_decr = generic_calibrate_decr,
-   .progress   = maple_progress,
-   .power_save = power4_idle,
-};
-
 #ifdef CONFIG_EDAC
 /*
  * Register a platform device for CPC925 memory controller on
@@ -367,3 +350,20 @@ static int __init maple_cpc925_edac_setup(void)
 }
 machine_device_initcall(maple, maple_cpc925_edac_setup);
 #endif
+
+define_machine(maple) {
+   .name   = "Maple",
+   .probe  = maple_probe,
+   .setup_arch = maple_setup_arch,
+   .init_IRQ   = maple_init_IRQ,
+   .pci_irq_fixup  = maple_pci_irq_fixup,
+   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
+   .restart= maple_restart,
+   .halt   = maple_halt,
+   .get_boot_time  = maple_get_boot_time,
+   .set_rtc_time   = maple_set_rtc_time,
+   .get_rtc_time   = maple_get_rtc_time,
+   .calibrate_decr = generic_calibrate_decr,
+   .progress   = maple_progress,
+   .power_save = power4_idle,
+};
-- 
2.20.1



[PATCH AUTOSEL 5.5 045/106] powerpc/prom_init: Pass the "os-term" message to hypervisor

2020-04-15 Thread Sasha Levin
From: Alexey Kardashevskiy 

[ Upstream commit 74bb84e5117146fa73eb9d01305975c53022b3c3 ]

The "os-term" RTAS calls has one argument with a message address of OS
termination cause. rtas_os_term() already passes it but the recently
added prom_init's version of that missed it; it also does not fill
args correctly.

This passes the message address and initializes the number of arguments.

Fixes: 6a9c930bd775 ("powerpc/prom_init: Add the ESM call to prom_init")
Signed-off-by: Alexey Kardashevskiy 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200312074404.87293-1-...@ozlabs.ru
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/prom_init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 577345382b23f..673f13b87db13 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1773,6 +1773,9 @@ static void __init prom_rtas_os_term(char *str)
if (token == 0)
prom_panic("Could not get token for ibm,os-term\n");
os_term_args.token = cpu_to_be32(token);
+   os_term_args.nargs = cpu_to_be32(1);
+   os_term_args.nret = cpu_to_be32(1);
+   os_term_args.args[0] = cpu_to_be32(__pa(str));
prom_rtas_hcall((uint64_t)&os_term_args);
 }
 #endif /* CONFIG_PPC_SVM */
-- 
2.20.1



[PATCH AUTOSEL 5.5 031/106] KVM: PPC: Book3S HV: Fix H_CEDE return code for nested guests

2020-04-15 Thread Sasha Levin
From: Michael Roth 

[ Upstream commit 1f50cc1705350a4697923203fedd7d8fb1087fe2 ]

The h_cede_tm kvm-unit-test currently fails when run inside an L1 guest
via the guest/nested hypervisor.

  ./run-tests.sh -v
  ...
  TESTNAME=h_cede_tm TIMEOUT=90s ACCEL= ./powerpc/run powerpc/tm.elf -smp 
2,threads=2 -machine cap-htm=on -append "h_cede_tm"
  FAIL h_cede_tm (2 tests, 1 unexpected failures)

While the test relates to transactional memory instructions, the actual
failure is due to the return code of the H_CEDE hypercall, which is
reported as 224 instead of 0. This happens even when no TM instructions
are issued.

224 is the value placed in r3 to execute a hypercall for H_CEDE, and r3
is where the caller expects the return code to be placed upon return.

In the case of guest running under a nested hypervisor, issuing H_CEDE
causes a return from H_ENTER_NESTED. In this case H_CEDE is
specially-handled immediately rather than later in
kvmppc_pseries_do_hcall() as with most other hcalls, but we forget to
set the return code for the caller, hence why kvm-unit-test sees the
224 return code and reports an error.

Guest kernels generally don't check the return value of H_CEDE, so
that likely explains why this hasn't caused issues outside of
kvm-unit-tests so far.

Fix this by setting r3 to 0 after we finish processing the H_CEDE.

RHBZ: 1778556

Fixes: 4bad77799fed ("KVM: PPC: Book3S HV: Handle hypercalls correctly when 
nested")
Cc: linuxppc-...@ozlabs.org
Cc: David Gibson 
Cc: Paul Mackerras 
Signed-off-by: Michael Roth 
Reviewed-by: David Gibson 
Signed-off-by: Paul Mackerras 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kvm/book3s_hv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ef6aa63b071b3..a1d793b96d2b7 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3628,6 +3628,7 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 
time_limit,
if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested &&
kvmppc_get_gpr(vcpu, 3) == H_CEDE) {
kvmppc_nested_cede(vcpu);
+   kvmppc_set_gpr(vcpu, 3, 0);
trap = 0;
}
} else {
-- 
2.20.1



[PATCH AUTOSEL 5.6 053/129] powerpc/prom_init: Pass the "os-term" message to hypervisor

2020-04-15 Thread Sasha Levin
From: Alexey Kardashevskiy 

[ Upstream commit 74bb84e5117146fa73eb9d01305975c53022b3c3 ]

The "os-term" RTAS calls has one argument with a message address of OS
termination cause. rtas_os_term() already passes it but the recently
added prom_init's version of that missed it; it also does not fill
args correctly.

This passes the message address and initializes the number of arguments.

Fixes: 6a9c930bd775 ("powerpc/prom_init: Add the ESM call to prom_init")
Signed-off-by: Alexey Kardashevskiy 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200312074404.87293-1-...@ozlabs.ru
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/prom_init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 577345382b23f..673f13b87db13 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1773,6 +1773,9 @@ static void __init prom_rtas_os_term(char *str)
if (token == 0)
prom_panic("Could not get token for ibm,os-term\n");
os_term_args.token = cpu_to_be32(token);
+   os_term_args.nargs = cpu_to_be32(1);
+   os_term_args.nret = cpu_to_be32(1);
+   os_term_args.args[0] = cpu_to_be32(__pa(str));
prom_rtas_hcall((uint64_t)&os_term_args);
 }
 #endif /* CONFIG_PPC_SVM */
-- 
2.20.1



[PATCH AUTOSEL 5.6 054/129] powerpc/maple: Fix declaration made after definition

2020-04-15 Thread Sasha Levin
From: Nathan Chancellor 

[ Upstream commit af6cf95c4d003fccd6c2ecc99a598fb854b537e7 ]

When building ppc64 defconfig, Clang errors (trimmed for brevity):

  arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration
  must precede definition [-Werror,-Wignored-attributes]
  machine_device_initcall(maple, maple_cpc925_edac_setup);
  ^

machine_device_initcall expands to __define_machine_initcall, which in
turn has the macro machine_is used in it, which declares mach_##name
with an __attribute__((weak)). define_machine actually defines
mach_##name, which in this file happens before the declaration, hence
the warning.

To fix this, move define_machine after machine_device_initcall so that
the declaration occurs before the definition, which matches how
machine_device_initcall and define_machine work throughout
arch/powerpc.

While we're here, remove some spaces before tabs.

Fixes: 8f101a051ef0 ("edac: cpc925 MC platform device setup")
Reported-by: Nick Desaulniers 
Suggested-by: Ilie Halip 
Signed-off-by: Nathan Chancellor 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancel...@gmail.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/maple/setup.c | 34 ++--
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/maple/setup.c 
b/arch/powerpc/platforms/maple/setup.c
index 6f019df37916f..15b2c6eb506d0 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -291,23 +291,6 @@ static int __init maple_probe(void)
return 1;
 }
 
-define_machine(maple) {
-   .name   = "Maple",
-   .probe  = maple_probe,
-   .setup_arch = maple_setup_arch,
-   .init_IRQ   = maple_init_IRQ,
-   .pci_irq_fixup  = maple_pci_irq_fixup,
-   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
-   .restart= maple_restart,
-   .halt   = maple_halt,
-   .get_boot_time  = maple_get_boot_time,
-   .set_rtc_time   = maple_set_rtc_time,
-   .get_rtc_time   = maple_get_rtc_time,
-   .calibrate_decr = generic_calibrate_decr,
-   .progress   = maple_progress,
-   .power_save = power4_idle,
-};
-
 #ifdef CONFIG_EDAC
 /*
  * Register a platform device for CPC925 memory controller on
@@ -364,3 +347,20 @@ static int __init maple_cpc925_edac_setup(void)
 }
 machine_device_initcall(maple, maple_cpc925_edac_setup);
 #endif
+
+define_machine(maple) {
+   .name   = "Maple",
+   .probe  = maple_probe,
+   .setup_arch = maple_setup_arch,
+   .init_IRQ   = maple_init_IRQ,
+   .pci_irq_fixup  = maple_pci_irq_fixup,
+   .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
+   .restart= maple_restart,
+   .halt   = maple_halt,
+   .get_boot_time  = maple_get_boot_time,
+   .set_rtc_time   = maple_set_rtc_time,
+   .get_rtc_time   = maple_get_rtc_time,
+   .calibrate_decr = generic_calibrate_decr,
+   .progress   = maple_progress,
+   .power_save = power4_idle,
+};
-- 
2.20.1



[PATCH AUTOSEL 5.6 039/129] KVM: PPC: Book3S HV: Fix H_CEDE return code for nested guests

2020-04-15 Thread Sasha Levin
From: Michael Roth 

[ Upstream commit 1f50cc1705350a4697923203fedd7d8fb1087fe2 ]

The h_cede_tm kvm-unit-test currently fails when run inside an L1 guest
via the guest/nested hypervisor.

  ./run-tests.sh -v
  ...
  TESTNAME=h_cede_tm TIMEOUT=90s ACCEL= ./powerpc/run powerpc/tm.elf -smp 
2,threads=2 -machine cap-htm=on -append "h_cede_tm"
  FAIL h_cede_tm (2 tests, 1 unexpected failures)

While the test relates to transactional memory instructions, the actual
failure is due to the return code of the H_CEDE hypercall, which is
reported as 224 instead of 0. This happens even when no TM instructions
are issued.

224 is the value placed in r3 to execute a hypercall for H_CEDE, and r3
is where the caller expects the return code to be placed upon return.

In the case of guest running under a nested hypervisor, issuing H_CEDE
causes a return from H_ENTER_NESTED. In this case H_CEDE is
specially-handled immediately rather than later in
kvmppc_pseries_do_hcall() as with most other hcalls, but we forget to
set the return code for the caller, hence why kvm-unit-test sees the
224 return code and reports an error.

Guest kernels generally don't check the return value of H_CEDE, so
that likely explains why this hasn't caused issues outside of
kvm-unit-tests so far.

Fix this by setting r3 to 0 after we finish processing the H_CEDE.

RHBZ: 1778556

Fixes: 4bad77799fed ("KVM: PPC: Book3S HV: Handle hypercalls correctly when 
nested")
Cc: linuxppc-...@ozlabs.org
Cc: David Gibson 
Cc: Paul Mackerras 
Signed-off-by: Michael Roth 
Reviewed-by: David Gibson 
Signed-off-by: Paul Mackerras 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kvm/book3s_hv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2cefd071b8483..c0c43a7338304 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3616,6 +3616,7 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 
time_limit,
if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested &&
kvmppc_get_gpr(vcpu, 3) == H_CEDE) {
kvmppc_nested_cede(vcpu);
+   kvmppc_set_gpr(vcpu, 3, 0);
trap = 0;
}
} else {
-- 
2.20.1



Re: [PATCH 1/4] dma-mapping: move the remaining DMA API calls out of line

2020-04-15 Thread Alexey Kardashevskiy



On 15/04/2020 16:18, Christoph Hellwig wrote:
> On Wed, Apr 15, 2020 at 12:26:04PM +1000, Alexey Kardashevskiy wrote:
>> May be this is correct and allowed (no idea) but removing exported
>> symbols at least deserves a mention in the commit log, does not it?
>>
>> The rest of the series is fine and works. Thanks,
> 
> Maybe I can throw in a line, but the point is that dma_direct_*
> was exported as dma_* called them inline.  Now dma_* is out of line
> and exported instead, which always was the actual API.

They become inline in 2/4.

And the fact that they were exported leaves the possibility that there is
a driver somewhere relying on these symbols, or that a distro kernel won't
build because the symbol disappeared from the exports (I do not know what
KABI guarantees or whether the mainline kernel cares). I do not care in
particular, but some might; a line separated with empty lines in the
commit log would do.


-- 
Alexey


Re: [PATCH] i2c: powermac: Simplify reading the "reg" and "i2c-address" property

2020-04-15 Thread Wolfram Sang
On Wed, Apr 08, 2020 at 03:33:53PM +0530, Aishwarya R wrote:
> Use of_property_read_u32 to read the "reg" and "i2c-address" property
> instead of using of_get_property to check the return values.
> 
> Signed-off-by: Aishwarya R 

This is quite a fragile driver. Have you tested it on HW?



signature.asc
Description: PGP signature
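
As a rough sketch of the conversion described above (np is a hypothetical
device node pointer, and the helper names are illustrative; the exact
properties handled by the real patch may differ), the old and new styles
look roughly like this:

#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/of.h>

/* Old style: raw pointer from of_get_property(); the caller checks it
 * and does the endian conversion itself. */
static int read_reg_old(struct device_node *np, u32 *out)
{
	const __be32 *prop = of_get_property(np, "reg", NULL);

	if (!prop)
		return -EINVAL;
	*out = be32_to_cpup(prop);
	return 0;
}

/* New style: of_property_read_u32() returns 0 on success and writes the
 * CPU-endian value through the out pointer. */
static int read_reg_new(struct device_node *np, u32 *out)
{
	return of_property_read_u32(np, "reg", out);
}
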


Re: [PATCH v2 4/4] mm/vmalloc: Hugepage vmalloc mappings

2020-04-15 Thread Will Deacon
Hi Nick,

On Mon, Apr 13, 2020 at 10:53:03PM +1000, Nicholas Piggin wrote:
> For platforms that define HAVE_ARCH_HUGE_VMAP and support PMD vmap mappings,
> have vmalloc attempt to allocate PMD-sized pages first, before falling back
> to small pages. Allocations which use something other than PAGE_KERNEL
> protections are not permitted to use huge pages yet, not all callers expect
> this (e.g., module allocations vs strict module rwx).
> 
> This gives a 6x reduction in dTLB misses for a `git diff` (of linux), from
> 45600 to 6500 and a 2.2% reduction in cycles on a 2-node POWER9.

I wonder if it's worth extending vmap() to handle higher order pages in
a similar way? That might be helpful for tracing PMUs such as Arm SPE,
where the CPU streams tracing data out to a virtually addressed buffer
(see rb_alloc_aux_page()).

> This can result in more internal fragmentation and memory overhead for a
> given allocation. It can also cause greater NUMA unbalance on hashdist
> allocations.
> 
> There may be other callers that expect small pages under vmalloc but use
> PAGE_KERNEL, I'm not sure if it's feasible to catch them all. An
> alternative would be a new function or flag which enables large mappings,
> and use that in callers.
> 
> Signed-off-by: Nicholas Piggin 
> ---
>  include/linux/vmalloc.h |   2 +
>  mm/vmalloc.c| 135 +---
>  2 files changed, 102 insertions(+), 35 deletions(-)
> 
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 291313a7e663..853b82eac192 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -24,6 +24,7 @@ struct notifier_block;  /* in notifier.h */
>  #define VM_UNINITIALIZED 0x0020  /* vm_struct is not fully 
> initialized */
>  #define VM_NO_GUARD  0x0040  /* don't add guard page */
>  #define VM_KASAN 0x0080  /* has allocated kasan shadow 
> memory */
> +#define VM_HUGE_PAGES0x0100  /* may use huge pages */

Please can you add a check for this in the arm64 change_memory_common()
code? Other architectures might need something similar, but we need to
forbid changing memory attributes for portions of the huge page.

In general, I'm a bit wary of software table walkers tripping over this.
For example, I don't think apply_to_existing_page_range() can handle
huge mappings at all, but the one user (KASAN) only ever uses page mappings
so it's ok there.

> @@ -2325,9 +2356,11 @@ static struct vm_struct *__get_vm_area_node(unsigned 
> long size,
>   if (unlikely(!size))
>   return NULL;
>  
> - if (flags & VM_IOREMAP)
> - align = 1ul << clamp_t(int, get_count_order_long(size),
> -PAGE_SHIFT, IOREMAP_MAX_ORDER);
> + if (flags & VM_IOREMAP) {
> + align = max(align,
> + 1ul << clamp_t(int, get_count_order_long(size),
> +PAGE_SHIFT, IOREMAP_MAX_ORDER));
> + }


I don't follow this part. Please could you explain why you're potentially
aligning above IOREMAP_MAX_ORDER? It doesn't seem to follow from the rest
of the patch.

Cheers,

Will


Re: [PATCH v2] Fix: buffer overflow during hvc_alloc().

2020-04-15 Thread Greg KH
On Tue, Apr 14, 2020 at 10:15:03PM +0300, and...@daynix.com wrote:
> From: Andrew Melnychenko 
> 
> If there are a lot (more than 16) of virtio-console devices,
> or the virtio_console module is reloaded,
> the buffers 'vtermnos' and 'cons_ops' are overflowed.
> In older kernels it overruns a spinlock, which leads to the kernel freezing:
> https://bugzilla.redhat.com/show_bug.cgi?id=1786239
> 
> To reproduce the issue, you can try simple script that
> loads/unloads module. Something like this:
> while [ 1 ]
> do
>   modprobe virtio_console
>   sleep 2
>   modprobe -r virtio_console
>   sleep 2
> done
> 
> Description of problem:
> Guest get 'Call Trace' when loading module "virtio_console"
> and unloading it frequently - clearly reproduced on kernel-4.18.0:
> 
> [   81.498208] [ cut here ]
> [   81.499263] pvqspinlock: lock 0x92080020 has corrupted value 
> 0xc0774ca0!
> [   81.501000] WARNING: CPU: 0 PID: 785 at 
> kernel/locking/qspinlock_paravirt.h:500 
> __pv_queued_spin_unlock_slowpath+0xc0/0xd0
> [   81.503173] Modules linked in: virtio_console fuse xt_CHECKSUM 
> ipt_MASQUERADE xt_conntrack ipt_REJECT nft_counter nf_nat_tftp nft_objref 
> nf_conntrack_tftp tun bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 
> nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct 
> nf_tables_set nft_chain_nat_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 
> nft_chain_route_ipv6 nft_chain_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 
> nf_nat_ipv4 nf_nat nf_conntrack nft_chain_route_ipv4 ip6_tables nft_compat 
> ip_set nf_tables nfnetlink sunrpc bochs_drm drm_vram_helper ttm 
> drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm i2c_piix4 
> pcspkr crct10dif_pclmul crc32_pclmul joydev ghash_clmulni_intel ip_tables xfs 
> libcrc32c sd_mod sg ata_generic ata_piix virtio_net libata crc32c_intel 
> net_failover failover serio_raw virtio_scsi dm_mirror dm_region_hash dm_log 
> dm_mod [last unloaded: virtio_console]
> [   81.517019] CPU: 0 PID: 785 Comm: kworker/0:2 Kdump: loaded Not tainted 
> 4.18.0-167.el8.x86_64 #1
> [   81.518639] Hardware name: Red Hat KVM, BIOS 
> 1.12.0-5.scrmod+el8.2.0+5159+d8aa4d83 04/01/2014
> [   81.520205] Workqueue: events control_work_handler [virtio_console]
> [   81.521354] RIP: 0010:__pv_queued_spin_unlock_slowpath+0xc0/0xd0
> [   81.522450] Code: 07 00 48 63 7a 10 e8 bf 64 f5 ff 66 90 c3 8b 05 e6 cf d6 
> 01 85 c0 74 01 c3 8b 17 48 89 fe 48 c7 c7 38 4b 29 91 e8 3a 6c fa ff <0f> 0b 
> c3 0f 0b 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48
> [   81.525830] RSP: 0018:b51a01ffbd70 EFLAGS: 00010282
> [   81.526798] RAX:  RBX: 0010 RCX: 
> 
> [   81.528110] RDX: 9e66f1826480 RSI: 9e66f1816a08 RDI: 
> 9e66f1816a08
> [   81.529437] RBP: 9153ff10 R08: 026c R09: 
> 0053
> [   81.530732] R10:  R11: b51a01ffbc18 R12: 
> 9e66cd682200
> [   81.532133] R13: 9153ff10 R14: 9e6685569500 R15: 
> 9e66cd682000
> [   81.533442] FS:  () GS:9e66f180() 
> knlGS:
> [   81.534914] CS:  0010 DS:  ES:  CR0: 80050033
> [   81.535971] CR2: 5624c55b14d0 CR3: 0003a023c000 CR4: 
> 003406f0
> [   81.537283] Call Trace:
> [   81.537763]  __raw_callee_save___pv_queued_spin_unlock_slowpath+0x11/0x20
> [   81.539011]  .slowpath+0x9/0xe
> [   81.539585]  hvc_alloc+0x25e/0x300
> [   81.540237]  init_port_console+0x28/0x100 [virtio_console]
> [   81.541251]  handle_control_message.constprop.27+0x1c4/0x310 
> [virtio_console]
> [   81.542546]  control_work_handler+0x70/0x10c [virtio_console]
> [   81.543601]  process_one_work+0x1a7/0x3b0
> [   81.544356]  worker_thread+0x30/0x390
> [   81.545025]  ? create_worker+0x1a0/0x1a0
> [   81.545749]  kthread+0x112/0x130
> [   81.546358]  ? kthread_flush_work_fn+0x10/0x10
> [   81.547183]  ret_from_fork+0x22/0x40
> [   81.547842] ---[ end trace aa97649bd16c8655 ]---
> [   83.546539] general protection fault:  [#1] SMP NOPTI
> [   83.547422] CPU: 5 PID: 3225 Comm: modprobe Kdump: loaded Tainted: G   
>  W- -  - 4.18.0-167.el8.x86_64 #1
> [   83.549191] Hardware name: Red Hat KVM, BIOS 
> 1.12.0-5.scrmod+el8.2.0+5159+d8aa4d83 04/01/2014
> [   83.550544] RIP: 0010:__pv_queued_spin_lock_slowpath+0x19a/0x2a0
> [   83.551504] Code: c4 c1 ea 12 41 be 01 00 00 00 4c 8d 6d 14 41 83 e4 03 8d 
> 42 ff 49 c1 e4 05 48 98 49 81 c4 40 a5 02 00 4c 03 24 c5 60 48 34 91 <49> 89 
> 2c 24 b8 00 80 00 00 eb 15 84 c0 75 0a 41 0f b6 54 24 14 84
> [   83.554449] RSP: 0018:b51a0323fdb0 EFLAGS: 00010202
> [   83.555290] RAX: 301c RBX: 92080020 RCX: 
> 0001
> [   83.556426] RDX: 301d RSI:  RDI: 
> 
> [   83.557556] RBP: 9e66f196a540 R08: 028a R09: 
> 9e66d2757788
> [   83.558688] R10:  R11:  R12:
