Re: [PATCH v2,5/5] drivers: uio: new driver for fsl_85xx_cache_sram
On Wed, Apr 15, 2020 at 02:27:51PM -0500, Scott Wood wrote:
> > > +		dev_err(&pdev->dev, "error no valid uio-map configured\n");
> > > +		ret = -EINVAL;
> > > +		goto err_info_free_internel;
> > > +	}
> > > +
> > > +	info->version = "0.1.0";
> >
> > Could you define some DRIVER_VERSION in the top of the file next to
> > DRIVER_NAME instead of hard coding in the middle on a function ?
>
> That's what v1 had, and Greg KH said to remove it. I'm guessing that he
> thought it was the common-but-pointless practice of having the driver print a
> version number that never gets updated, rather than something the UIO API
> (unfortunately, compared to a feature query interface) expects. That said,
> I'm not sure what the value is of making it a macro since it should only be
> used once, that use is self documenting, it isn't tunable, etc. Though if
> this isn't a macro, UIO_NAME also shouldn't be (and if it is made a macro
> again, it should be UIO_VERSION, not DRIVER_VERSION).
>
> Does this really need a three-part version scheme? What's wrong with a
> version of "1", to be changed to "2" in the hopefully-unlikely event that the
> userspace API changes? Assuming UIO is used for this at all, which doesn't
> seem like a great fit to me.

No driver version numbers at all please, they do not make any sense when
the driver is included in the kernel tree.

greg k-h
Re: [PATCH v2,5/5] drivers: uio: new driver for fsl_85xx_cache_sram
On Wed, Apr 15, 2020 at 02:26:55PM -0500, Scott Wood wrote:
> Instead, have module parameters that take the sizes and alignments you'd like
> to allocate and expose to userspace. Better still would be some sort of
> dynamic allocation (e.g. open a fd, ioctl to set the requested size/alignment,
> if it succeeds you can mmap it, and when the fd is closed the region is
> freed).

No module parameters please, this is not the 1990's.

Use device tree, that is what it is there for.

thanks,

greg k-h
[PATCH v11 14/14] powerpc: Use mm_context vas_windows counter to issue CP_ABORT
set_thread_uses_vas() sets the used_vas flag for a process that opened a
VAS window, and CP_ABORT is issued during context switch only for that
process. In a multi-threaded application, windows can be shared. For
example, thread A can open a window and thread B can run COPY/PASTE
instructions to send an NX request, which may cause corruption, snooping,
or a covert channel. Also, once this flag is set, CP_ABORT continues to
be issued even after the VAS window is closed.

So define a vas_windows counter in the process mm_context, increment it
on each window open, and decrement it on each window close. If
vas_windows is nonzero, issue CP_ABORT during context switch. This clears
the foreign real address mapping only if the process / thread uses
COPY/PASTE, and disables CP_ABORT for that process once no windows are
open.

Move the set_thread_uses_vas() code to vas_tx_win_open(), since this
functionality is needed only for windows opened from userspace.

VAS userspace support is being added along with this fix, so there is no
need to include this fix in stable releases.

Fixes: 9d2a4d71332c ("powerpc: Define set_thread_uses_vas()")
Signed-off-by: Haren Myneni
Reported-by: Nicholas Piggin
Suggested-by: Milton Miller
Suggested-by: Nicholas Piggin
Reviewed-by: Nicholas Piggin
---
 arch/powerpc/include/asm/book3s/64/mmu.h    |  3 +++
 arch/powerpc/include/asm/mmu_context.h      | 30 +
 arch/powerpc/include/asm/processor.h        |  1 -
 arch/powerpc/include/asm/switch_to.h        |  2 --
 arch/powerpc/kernel/process.c               | 24 ++-
 arch/powerpc/platforms/powernv/vas-window.c | 22 -
 6 files changed, 48 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index bb3deb7..f0a9ff6 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -116,6 +116,9 @@ struct patb_entry {
 	/* Number of users of the external (Nest) MMU */
 	atomic_t copros;

+	/* Number of user space windows opened in process mm_context */
+	atomic_t vas_windows;
+
 	struct hash_mm_context *hash_context;

 	unsigned long vdso_base;
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 360367c..1a474f6b 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -185,11 +185,41 @@ static inline void mm_context_remove_copro(struct mm_struct *mm)
 			dec_mm_active_cpus(mm);
 	}
 }
+
+/*
+ * vas_windows counter shows number of open windows in the mm
+ * context. During context switch, use this counter to clear the
+ * foreign real address mapping (CP_ABORT) for the thread / process
+ * that intend to use COPY/PASTE. When a process closes all windows,
+ * disable CP_ABORT which is expensive to run.
+ *
+ * For user context, register a copro so that TLBIs are seen by the
+ * nest MMU. mm_context_add/remove_vas_window() are used only for user
+ * space windows.
+ */
+static inline void mm_context_add_vas_window(struct mm_struct *mm)
+{
+	atomic_inc(&mm->context.vas_windows);
+	mm_context_add_copro(mm);
+}
+
+static inline void mm_context_remove_vas_window(struct mm_struct *mm)
+{
+	int v;
+
+	mm_context_remove_copro(mm);
+	v = atomic_dec_if_positive(&mm->context.vas_windows);
+
+	/* Detect imbalance between add and remove */
+	WARN_ON(v < 0);
+}
 #else
 static inline void inc_mm_active_cpus(struct mm_struct *mm) { }
 static inline void dec_mm_active_cpus(struct mm_struct *mm) { }
 static inline void mm_context_add_copro(struct mm_struct *mm) { }
 static inline void mm_context_remove_copro(struct mm_struct *mm) { }
+static inline void mm_context_add_vas_windows(struct mm_struct *mm) { }
+static inline void mm_context_remove_vas_windows(struct mm_struct *mm) { }
 #endif

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index eedcbfb..bfa336f 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -272,7 +272,6 @@ struct thread_struct {
 	unsigned	mmcr0;

 	unsigned	used_ebb;
-	unsigned int	used_vas;
 #endif
 };

diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index 5b03d8a..012db9a 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -91,8 +91,6 @@ static inline void clear_task_ebb(struct task_struct *t)
 #endif
 }

-extern int set_thread_uses_vas(void);
-
 extern int set_thread_tidr(struct task_struct *t);

 #endif /* _ASM_POWERPC_SWITCH_TO_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index fad50db..ed3f645 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1221,7 +1221,8 @@ struct task_struct *__switch_to(struct task_struct *prev,
 	 * mappings, w
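
For readers following the counter semantics in this patch, here is a minimal userspace sketch, not the kernel code. It assumes C11 atomics in place of the kernel's atomic_t helpers; struct mm_model and the model_* function names are hypothetical and exist only for illustration. It models three things the commit message describes: increment on window open, decrement-if-positive on close, and the context-switch check that decides whether CP_ABORT is needed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-mm context; only the counter is modeled. */
struct mm_model {
	atomic_int vas_windows;	/* open user space windows in this mm */
};

/* Open path: every successful user space window open bumps the counter. */
static void model_add_vas_window(struct mm_model *mm)
{
	assert(mm != NULL);
	atomic_fetch_add(&mm->vas_windows, 1);
}

/*
 * Close path: decrement only while the counter is positive, in the
 * spirit of atomic_dec_if_positive().  Returns false on an unbalanced
 * close, which is the case the patch flags with WARN_ON().
 */
static bool model_remove_vas_window(struct mm_model *mm)
{
	int old = atomic_load(&mm->vas_windows);

	while (old > 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&mm->vas_windows, &old, old - 1))
			return true;
	}
	return false;
}

/* Context switch: CP_ABORT is needed only while some window is open. */
static bool model_needs_cp_abort(struct mm_model *mm)
{
	return atomic_load(&mm->vas_windows) > 0;
}
```

Because the counter lives in the shared mm rather than in one thread's state, any thread of the process sees a nonzero count while a window is open, and the count drops back to zero once the last window is closed.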
[PATCH v11 13/14] powerpc/vas: Free send window in VAS instance after credits returned
NX may still be processing requests while a window is being closed. Wait
until all credits are returned, and only then free the send window from
the VAS instance.

Signed-off-by: Haren Myneni
---
 arch/powerpc/platforms/powernv/vas-window.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index d0c07cf..e15b405 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -1316,14 +1316,14 @@ int vas_win_close(struct vas_window *window)

 	unmap_paste_region(window);

-	clear_vinst_win(window);
-
 	poll_window_busy_state(window);

 	unpin_close_window(window);

 	poll_window_credits(window);

+	clear_vinst_win(window);
+
 	poll_window_castout(window);

 	/* if send window, drop reference to matching receive window */
--
1.8.3.1
[PATCH v11 12/14] powerpc/vas: Display process stuck message
A process cannot close a send window until all requests are processed,
which means waiting until the window state is no longer busy and the send
credits are returned. Display debug messages if closing the window takes
longer than expected.

Signed-off-by: Haren Myneni
---
 arch/powerpc/platforms/powernv/vas-window.c | 30 -
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index 4b5adf5..d0c07cf 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -1181,6 +1181,7 @@ static void poll_window_credits(struct vas_window *window)
 {
 	u64 val;
 	int creds, mode;
+	int count = 0;

 	val = read_hvwc_reg(window, VREG(WINCTL));
 	if (window->tx_win)
@@ -1199,10 +1200,27 @@ static void poll_window_credits(struct vas_window *window)
 		creds = GET_FIELD(VAS_LRX_WCRED, val);
 	}

+	/*
+	 * Takes around few milliseconds to complete all pending requests
+	 * and return credits.
+	 * TODO: Scan fault FIFO and invalidate CRBs points to this window
+	 *       and issue CRB Kill to stop all pending requests. Need only
+	 *       if there is a bug in NX or fault handling in kernel.
+	 */
 	if (creds < window->wcreds_max) {
 		val = 0;
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		schedule_timeout(msecs_to_jiffies(10));
+		count++;
+		/*
+		 * Process can not close send window until all credits are
+		 * returned.
+		 */
+		if (!(count % 1000))
+			pr_warn_ratelimited("VAS: pid %d stuck. Waiting for credits returned for Window(%d). creds %d, Retries %d\n",
+				vas_window_pid(window), window->winid,
+				creds, count);
+
 		goto retry;
 	}
 }
@@ -1216,6 +1234,7 @@ static void poll_window_busy_state(struct vas_window *window)
 {
 	int busy;
 	u64 val;
+	int count = 0;

 retry:
 	val = read_hvwc_reg(window, VREG(WIN_STATUS));
@@ -1223,7 +1242,16 @@ static void poll_window_busy_state(struct vas_window *window)
 	if (busy) {
 		val = 0;
 		set_current_state(TASK_UNINTERRUPTIBLE);
-		schedule_timeout(msecs_to_jiffies(5));
+		schedule_timeout(msecs_to_jiffies(10));
+		count++;
+		/*
+		 * Takes around few milliseconds to process all pending
+		 * requests.
+		 */
+		if (!(count % 1000))
+			pr_warn_ratelimited("VAS: pid %d stuck. Window (ID=%d) is in busy state. Retries %d\n",
+				vas_window_pid(window), window->winid, count);
+
 		goto retry;
 	}
 }
--
1.8.3.1
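
The retry-and-warn pattern both hunks add can be sketched in plain C as below. This is an illustration under stated assumptions, not the kernel code: poll_until_idle(), busy_countdown(), and WARN_EVERY are hypothetical names, the is_busy callback stands in for reading the window status or credit register, and the 10 ms sleep per retry is left out so the model runs instantly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define WARN_EVERY 1000	/* same interval as the patch: every 1000 retries */

/*
 * Poll until is_busy() reports false and return the number of retries.
 * A message is emitted on every WARN_EVERY-th retry, and *warnings counts
 * how many messages were emitted.
 */
static int poll_until_idle(bool (*is_busy)(void *ctx), void *ctx, int *warnings)
{
	int count = 0;

	assert(is_busy != NULL && warnings != NULL);

	while (is_busy(ctx)) {
		count++;
		if (!(count % WARN_EVERY)) {
			fprintf(stderr, "stuck: still busy after %d retries\n", count);
			(*warnings)++;
		}
	}
	return count;
}

/* Illustrative busy source: reports busy for a fixed number of polls. */
static bool busy_countdown(void *ctx)
{
	int *remaining = ctx;

	if (*remaining == 0)
		return false;
	(*remaining)--;
	return true;
}
```

With the patch's 10 ms sleep per retry, a warning every 1000 retries means the message appears at most about once per 10 seconds of waiting, before any additional rate limiting.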
[PATCH v11 11/14] powerpc/vas: Do not use default credits for receive window
The system checkstops if the RxFIFO overruns with more requests than the
maximum number of CRBs allowed in the FIFO at any time. So the max
credits value (rxattr.wcreds_max) is set by the driver and passed to
vas_rx_win_open().

Signed-off-by: Haren Myneni
---
 arch/powerpc/platforms/powernv/vas-window.c | 4 ++--
 arch/powerpc/platforms/powernv/vas.h        | 2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index 3ef7120..4b5adf5 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -772,7 +772,7 @@ static bool rx_win_args_valid(enum vas_cop_type cop,
 	if (attr->rx_fifo_size > VAS_RX_FIFO_SIZE_MAX)
 		return false;

-	if (attr->wcreds_max > VAS_RX_WCREDS_MAX)
+	if (!attr->wcreds_max)
 		return false;

 	if (attr->nx_win) {
@@ -877,7 +877,7 @@ struct vas_window *vas_rx_win_open(int vasid, enum vas_cop_type cop,
 	rxwin->nx_win = rxattr->nx_win;
 	rxwin->user_win = rxattr->user_win;
 	rxwin->cop = cop;
-	rxwin->wcreds_max = rxattr->wcreds_max ?: VAS_WCREDS_DEFAULT;
+	rxwin->wcreds_max = rxattr->wcreds_max;

 	init_winctx_for_rxwin(rxwin, rxattr, &winctx);
 	init_winctx_regs(rxwin, &winctx);
diff --git a/arch/powerpc/platforms/powernv/vas.h b/arch/powerpc/platforms/powernv/vas.h
index 60bdda6..a7143b1 100644
--- a/arch/powerpc/platforms/powernv/vas.h
+++ b/arch/powerpc/platforms/powernv/vas.h
@@ -101,11 +101,9 @@
 /*
  * Initial per-process credits.
  * Max send window credits:    4K-1 (12-bits in VAS_TX_WCRED)
- * Max receive window credits: 64K-1 (16 bits in VAS_LRX_WCRED)
  *
  * TODO: Needs tuning for per-process credits
  */
-#define VAS_RX_WCREDS_MAX		((64 << 10) - 1)
 #define VAS_TX_WCREDS_MAX		((4 << 10) - 1)
 #define VAS_WCREDS_DEFAULT		(1 << 10)

--
1.8.3.1
[PATCH v11 10/14] powerpc/vas: Print CRB and FIFO values
Dump the FIFO entries if the send window cannot be found, and print the
CRB for debugging.

Signed-off-by: Sukadev Bhattiprolu
Signed-off-by: Haren Myneni
---
 arch/powerpc/platforms/powernv/vas-fault.c | 41 ++
 1 file changed, 41 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/vas-fault.c b/arch/powerpc/platforms/powernv/vas-fault.c
index b6bec64..25db70b 100644
--- a/arch/powerpc/platforms/powernv/vas-fault.c
+++ b/arch/powerpc/platforms/powernv/vas-fault.c
@@ -26,6 +26,28 @@
  */
 #define VAS_FAULT_WIN_FIFO_SIZE	(4 << 20)

+static void dump_crb(struct coprocessor_request_block *crb)
+{
+	struct data_descriptor_entry *dde;
+	struct nx_fault_stamp *nx;
+
+	dde = &crb->source;
+	pr_devel("SrcDDE: addr 0x%llx, len %d, count %d, idx %d, flags %d\n",
+		be64_to_cpu(dde->address), be32_to_cpu(dde->length),
+		dde->count, dde->index, dde->flags);
+
+	dde = &crb->target;
+	pr_devel("TgtDDE: addr 0x%llx, len %d, count %d, idx %d, flags %d\n",
+		be64_to_cpu(dde->address), be32_to_cpu(dde->length),
+		dde->count, dde->index, dde->flags);
+
+	nx = &crb->stamp.nx;
+	pr_devel("NX Stamp: PSWID 0x%x, FSA 0x%llx, flags 0x%x, FS 0x%x\n",
+		be32_to_cpu(nx->pswid),
+		be64_to_cpu(crb->stamp.nx.fault_storage_addr),
+		nx->flags, nx->fault_status);
+}
+
 /*
  * Update the CSB to indicate a translation error.
  *
@@ -148,6 +170,23 @@ static void update_csb(struct vas_window *window,
 			pid_vnr(pid), rc);
 }

+static void dump_fifo(struct vas_instance *vinst, void *entry)
+{
+	unsigned long *end = vinst->fault_fifo + vinst->fault_fifo_size;
+	unsigned long *fifo = entry;
+	int i;
+
+	pr_err("Fault fifo size %d, Max crbs %d\n", vinst->fault_fifo_size,
+			vinst->fault_fifo_size / CRB_SIZE);
+
+	/* Dump 10 CRB entries or until end of FIFO */
+	pr_err("Fault FIFO Dump:\n");
+	for (i = 0; i < 10*(CRB_SIZE/8) && fifo < end; i += 4, fifo += 4) {
+		pr_err("[%.3d, %p]: 0x%.16lx 0x%.16lx 0x%.16lx 0x%.16lx\n",
+			i, fifo, *fifo, *(fifo+1), *(fifo+2), *(fifo+3));
+	}
+}
+
 /*
  * Process valid CRBs in fault FIFO.
  * NX process user space requests, return credit and update the status
@@ -233,6 +272,7 @@ irqreturn_t vas_fault_thread_fn(int irq, void *data)
 				vinst->vas_id, vinst->fault_fifo, fifo,
 				vinst->fault_crbs);

+		dump_crb(crb);
 		window = vas_pswid_to_window(vinst,
 				be32_to_cpu(crb->stamp.nx.pswid));

@@ -245,6 +285,7 @@ irqreturn_t vas_fault_thread_fn(int irq, void *data)
 			 * But we should not get here.
 			 * TODO: Disable IRQ.
 			 */
+			dump_fifo(vinst, (void *)entry);
 			pr_err("VAS[%d] fault_fifo %p, fifo %p, pswid 0x%x, fault_crbs %d bad CRB?\n",
 				vinst->vas_id, vinst->fault_fifo, fifo,
 				be32_to_cpu(crb->stamp.nx.pswid),
--
1.8.3.1
[PATCH v11 09/14] powerpc/vas: Return credits after handling fault
NX uses a credit mechanism to control the number of requests issued on a
specific window at any point in time. Only send windows and the fault
window use credits. When a request is issued on a given window, a credit
is taken. This credit is returned after that request is processed. If no
credits are available, NX returns RMA_Busy for a send window and
RMA_Reject for the fault window.

NX expects the OS to return the credit for the send window after
processing a fault CRB. A credit also has to be returned for the fault
window after handling the fault.

Signed-off-by: Sukadev Bhattiprolu
Signed-off-by: Haren Myneni
---
 arch/powerpc/platforms/powernv/vas-fault.c  |  9
 arch/powerpc/platforms/powernv/vas-window.c | 36 +
 arch/powerpc/platforms/powernv/vas.h        |  1 +
 3 files changed, 46 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/vas-fault.c b/arch/powerpc/platforms/powernv/vas-fault.c
index 354577d..b6bec64 100644
--- a/arch/powerpc/platforms/powernv/vas-fault.c
+++ b/arch/powerpc/platforms/powernv/vas-fault.c
@@ -224,6 +224,10 @@ irqreturn_t vas_fault_thread_fn(int irq, void *data)
 		memcpy(crb, fifo, CRB_SIZE);
 		entry->stamp.nx.pswid = cpu_to_be32(FIFO_INVALID_ENTRY);
 		entry->ccw |= cpu_to_be32(CCW0_INVALID);
+		/*
+		 * Return credit for the fault window.
+		 */
+		vas_return_credit(vinst->fault_win, false);

 		pr_devel("VAS[%d] fault_fifo %p, fifo %p, fault_crbs %d\n",
 				vinst->vas_id, vinst->fault_fifo, fifo,
@@ -249,6 +253,11 @@ irqreturn_t vas_fault_thread_fn(int irq, void *data)
 			WARN_ON_ONCE(1);
 		} else {
 			update_csb(window, crb);
+			/*
+			 * Return credit for send window after processing
+			 * fault CRB.
+			 */
+			vas_return_credit(window, true);
 		}
 	}
 }
diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index f12f7eb..3ef7120 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -1317,6 +1317,42 @@ int vas_win_close(struct vas_window *window)
 }
 EXPORT_SYMBOL_GPL(vas_win_close);

+/*
+ * Return credit for the given window.
+ * Send windows and fault window uses credit mechanism as follows:
+ *
+ * Send windows:
+ * - The default number of credits available for each send window is
+ *   1024. It means 1024 requests can be issued asynchronously at the
+ *   same time. If the credit is not available, that request will be
+ *   returned with RMA_Busy.
+ * - One credit is taken when NX request is issued.
+ * - This credit is returned after NX processed that request.
+ * - If NX encounters translation error, kernel will return the
+ *   credit on the specific send window after processing the fault CRB.
+ *
+ * Fault window:
+ * - The total number credits available is FIFO_SIZE/CRB_SIZE.
+ *   Means 4MB/128 in the current implementation. If credit is not
+ *   available, RMA_Reject is returned.
+ * - A credit is taken when NX pastes CRB in fault FIFO.
+ * - The kernel will return credit on fault window after reading entry
+ *   from fault FIFO.
+ */
+void vas_return_credit(struct vas_window *window, bool tx)
+{
+	uint64_t val;
+
+	val = 0ULL;
+	if (tx) { /* send window */
+		val = SET_FIELD(VAS_TX_WCRED, val, 1);
+		write_hvwc_reg(window, VREG(TX_WCRED_ADDER), val);
+	} else {
+		val = SET_FIELD(VAS_LRX_WCRED, val, 1);
+		write_hvwc_reg(window, VREG(LRX_WCRED_ADDER), val);
+	}
+}
+
 struct vas_window *vas_pswid_to_window(struct vas_instance *vinst,
 		uint32_t pswid)
 {
diff --git a/arch/powerpc/platforms/powernv/vas.h b/arch/powerpc/platforms/powernv/vas.h
index cd165c8..60bdda6 100644
--- a/arch/powerpc/platforms/powernv/vas.h
+++ b/arch/powerpc/platforms/powernv/vas.h
@@ -436,6 +436,7 @@ struct vas_winctx {
 extern int vas_setup_fault_window(struct vas_instance *vinst);
 extern irqreturn_t vas_fault_thread_fn(int irq, void *data);
 extern irqreturn_t vas_fault_handler(int irq, void *dev_id);
+extern void vas_return_credit(struct vas_window *window, bool tx);
 extern struct vas_window *vas_pswid_to_window(struct vas_instance *vinst,
 						uint32_t pswid);

--
1.8.3.1
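
The credit accounting described in this patch can be modeled with a simple counter. The sketch below is a userspace illustration only: struct credit_pool and the credit_* helpers are hypothetical names, and in the real hardware the credits live in window registers that the kernel adjusts through the WCRED adder registers shown above. FAULT_WIN_CREDITS just works out the 4MB/128 figure quoted in the comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Fault window credits per the patch comment: FIFO_SIZE / CRB_SIZE. */
#define FAULT_WIN_CREDITS ((4u << 20) / 128u)

/* Hypothetical model of one window's credit pool. */
struct credit_pool {
	unsigned int avail;	/* credits currently available */
	unsigned int max;	/* credits the window started with */
};

static void credit_pool_init(struct credit_pool *p, unsigned int max)
{
	p->avail = max;
	p->max = max;
}

/*
 * Issue a request: takes one credit, or fails when none are left.
 * A failure here corresponds to RMA_Busy on a send window or
 * RMA_Reject on the fault window.
 */
static bool credit_take(struct credit_pool *p)
{
	if (p->avail == 0)
		return false;
	p->avail--;
	return true;
}

/* Request completed or fault handled: give one credit back. */
static void credit_return(struct credit_pool *p)
{
	assert(p->avail < p->max);	/* returning more than taken is a bug */
	p->avail++;
}
```

If a credit were never returned after a fault, the pool would shrink by one for every fault, and the window would eventually reject every request. That is the failure mode this patch prevents by calling vas_return_credit() in both fault paths.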
[PATCH v11 07/14] powerpc/vas: Setup thread IRQ handler per VAS instance
When NX encounters a translation error on a CRB or any request buffer, it
raises an interrupt on the CPU to handle the fault. It can raise one
interrupt for multiple faults, and expects the OS to handle these faults
and return credits for the fault window after processing them.

Set up a threaded IRQ handler and an IRQ thread function for each VAS
instance. The IRQ handler checks whether the thread is already awake and
can handle the new faults. If so, it returns IRQ_HANDLED; otherwise it
wakes up the thread to process the new faults.

The thread function reads each CRB entry from the fault FIFO until it
sees an invalid entry. After reading each CRB, it determines the
corresponding send window using the pswid (from the CRB) and processes
the fault CRB. It then invalidates the entry and returns the credit.
Processing the fault CRB and returning the credit are described in
subsequent patches.

Signed-off-by: Sukadev Bhattiprolu
Signed-off-by: Haren Myneni
---
 arch/powerpc/platforms/powernv/vas-fault.c  | 131
 arch/powerpc/platforms/powernv/vas-window.c |  60 +
 arch/powerpc/platforms/powernv/vas.c        |  23 -
 arch/powerpc/platforms/powernv/vas.h        |   7 ++
 4 files changed, 220 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/vas-fault.c b/arch/powerpc/platforms/powernv/vas-fault.c
index 4044998..0da8358 100644
--- a/arch/powerpc/platforms/powernv/vas-fault.c
+++ b/arch/powerpc/platforms/powernv/vas-fault.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include

 #include "vas.h"
@@ -25,6 +26,136 @@
 #define VAS_FAULT_WIN_FIFO_SIZE	(4 << 20)

 /*
+ * Process valid CRBs in fault FIFO.
+ * NX process user space requests, return credit and update the status
+ * in CRB. If it encounters translation error when accessing CRB or
+ * request buffers, raises interrupt on the CPU to handle the fault.
+ * It takes credit on fault window, updates nx_fault_stamp in CRB with
+ * the following information and pastes CRB in fault FIFO.
+ *
+ * pswid - window ID of the window on which the request is sent.
+ * fault_storage_addr - fault address
+ *
+ * It can raise a single interrupt for multiple faults. Expects OS to
+ * process all valid faults and return credit for each fault on user
+ * space and fault windows. This fault FIFO control will be done with
+ * credit mechanism. NX can continuously paste CRBs until credits are not
+ * available on fault window. Otherwise, returns with RMA_reject.
+ *
+ * Total credits available on fault window: FIFO_SIZE(4MB)/CRBS_SIZE(128)
+ *
+ */
+irqreturn_t vas_fault_thread_fn(int irq, void *data)
+{
+	struct vas_instance *vinst = data;
+	struct coprocessor_request_block *crb, *entry;
+	struct coprocessor_request_block buf;
+	struct vas_window *window;
+	unsigned long flags;
+	void *fifo;
+
+	crb = &buf;
+
+	/*
+	 * VAS can interrupt with multiple page faults. So process all
+	 * valid CRBs within fault FIFO until reaches invalid CRB.
+	 * We use CCW[0] and pswid to validate CRBs:
+	 *
+	 * CCW[0]	Reserved bit. When NX pastes CRB, CCW[0]=0
+	 *		OS sets this bit to 1 after reading CRB.
+	 * pswid	NX assigns window ID. Set pswid to -1 after
+	 *		reading CRB from fault FIFO.
+	 *
+	 * We exit this function if no valid CRBs are available to process.
+	 * So acquire fault_lock and reset fifo_in_progress to 0 before
+	 * exit.
+	 * In case kernel receives another interrupt with different page
+	 * fault, interrupt handler returns with IRQ_HANDLED if
+	 * fifo_in_progress is set. Means these new faults will be
+	 * handled by the current thread. Otherwise set fifo_in_progress
+	 * and return IRQ_WAKE_THREAD to wake up thread.
+	 */
+	while (true) {
+		spin_lock_irqsave(&vinst->fault_lock, flags);
+		/*
+		 * Advance the fault fifo pointer to next CRB.
+		 * Use CRB_SIZE rather than sizeof(*crb) since the latter is
+		 * aligned to CRB_ALIGN (256) but the CRB written to by VAS is
+		 * only CRB_SIZE in len.
+		 */
+		fifo = vinst->fault_fifo + (vinst->fault_crbs * CRB_SIZE);
+		entry = fifo;
+
+		if ((entry->stamp.nx.pswid == cpu_to_be32(FIFO_INVALID_ENTRY))
+			|| (entry->ccw & cpu_to_be32(CCW0_INVALID))) {
+			vinst->fifo_in_progress = 0;
+			spin_unlock_irqrestore(&vinst->fault_lock, flags);
+			return IRQ_HANDLED;
+		}
+
+		spin_unlock_irqrestore(&vinst->fault_lock, flags);
+		vinst->fault_crbs++;
+		if (vinst->fault_crbs == (vinst->fault_fifo_size / CRB_SIZE))
+			vinst->fault_crbs = 0;
+
+		memcpy(crb, fifo, CRB_SIZE);
+		entry->stamp.nx.pswid = cpu_to_be32(FIFO_
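
The FIFO walk that vas_fault_thread_fn() performs (read from the current index, stop at an invalid entry, invalidate what was consumed, wrap at the end) can be sketched in isolation as follows. This is a simplified userspace model, not the patch itself: struct fifo_model, fifo_drain(), and SLOT_INVALID are hypothetical names, each slot holds only a window ID instead of a full CRB, and the locking and CCW[0] check are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_INVALID 0xffffffffu	/* plays the role of FIFO_INVALID_ENTRY */

struct fifo_model {
	uint32_t *slots;	/* one window ID per slot; SLOT_INVALID = empty */
	size_t nslots;
	size_t next;		/* plays the role of vinst->fault_crbs */
};

/*
 * Consume entries starting at fifo->next until an invalid slot is seen
 * or out[] is full.  Each consumed slot is marked invalid again, and the
 * index wraps at the end of the buffer.  Returns the number of entries
 * copied to out[].
 */
static size_t fifo_drain(struct fifo_model *fifo, uint32_t *out, size_t out_len)
{
	size_t handled = 0;

	assert(fifo->nslots > 0 && fifo->next < fifo->nslots);

	while (handled < out_len) {
		uint32_t id = fifo->slots[fifo->next];

		if (id == SLOT_INVALID)
			break;				/* nothing more to process */

		fifo->slots[fifo->next] = SLOT_INVALID;	/* hand the slot back */
		fifo->next++;
		if (fifo->next == fifo->nslots)
			fifo->next = 0;			/* wrap around */

		out[handled++] = id;
	}
	return handled;
}
```

Marking each consumed slot invalid is what lets the walk terminate: when the consumer catches up with the producer, the next slot still carries the invalid marker from the previous pass.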
[PATCH v11 08/14] powerpc/vas: Update CSB and notify process for fault CRBs
Applications poll on the CSB for the status update after requests are
issued. NX processes these requests and updates the CSB with the status.
If it encounters a translation error, it pastes the CRB in the fault FIFO
and raises an interrupt. The kernel handles the fault by reading the CRB
from the fault FIFO and processing the fault CRB.

For each fault CRB, update the fault address in the CRB
(fault_storage_addr) and the translation error status in the CSB, so that
user space can touch the fault address and resend the request. If user
space passed an invalid CSB address, send SIGSEGV to the process.

In multi-threaded applications, the child thread may no longer be
available. So if the task is not running, send the signal to the tgid.

Signed-off-by: Sukadev Bhattiprolu
Signed-off-by: Haren Myneni
---
 arch/powerpc/platforms/powernv/vas-fault.c | 126 -
 1 file changed, 125 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/vas-fault.c b/arch/powerpc/platforms/powernv/vas-fault.c
index 0da8358..354577d 100644
--- a/arch/powerpc/platforms/powernv/vas-fault.c
+++ b/arch/powerpc/platforms/powernv/vas-fault.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -26,6 +27,128 @@
 #define VAS_FAULT_WIN_FIFO_SIZE	(4 << 20)

 /*
+ * Update the CSB to indicate a translation error.
+ *
+ * User space will be polling on CSB after the request is issued.
+ * If NX can handle the request without any issues, it updates CSB.
+ * Whereas if NX encounters page fault, the kernel will handle the
+ * fault and update CSB with translation error.
+ *
+ * If we are unable to update the CSB means copy_to_user failed due to
+ * invalid csb_addr, send a signal to the process.
+ */
+static void update_csb(struct vas_window *window,
+			struct coprocessor_request_block *crb)
+{
+	struct coprocessor_status_block csb;
+	struct kernel_siginfo info;
+	struct task_struct *tsk;
+	void __user *csb_addr;
+	struct pid *pid;
+	int rc;
+
+	/*
+	 * NX user space windows can not be opened for task->mm=NULL
+	 * and faults will not be generated for kernel requests.
+	 */
+	if (WARN_ON_ONCE(!window->mm || !window->user_win))
+		return;
+
+	csb_addr = (void __user *)be64_to_cpu(crb->csb_addr);
+
+	memset(&csb, 0, sizeof(csb));
+	csb.cc = CSB_CC_TRANSLATION;
+	csb.ce = CSB_CE_TERMINATION;
+	csb.cs = 0;
+	csb.count = 0;
+
+	/*
+	 * NX operates and returns in BE format as defined CRB struct.
+	 * So saves fault_storage_addr in BE as NX pastes in FIFO and
+	 * expects user space to convert to CPU format.
+	 */
+	csb.address = crb->stamp.nx.fault_storage_addr;
+	csb.flags = 0;
+
+	pid = window->pid;
+	tsk = get_pid_task(pid, PIDTYPE_PID);
+	/*
+	 * Process closes send window after all pending NX requests are
+	 * completed. In multi-thread applications, a child thread can
+	 * open a window and can exit without closing it. May be some
+	 * requests are pending or this window can be used by other
+	 * threads later. We should handle faults if NX encounters
+	 * pages faults on these requests. Update CSB with translation
+	 * error and fault address. If csb_addr passed by user space is
+	 * invalid, send SEGV signal to pid saved in window. If the
+	 * child thread is not running, send the signal to tgid.
+	 * Parent thread (tgid) will close this window upon its exit.
+	 *
+	 * pid and mm references are taken when window is opened by
+	 * process (pid). So tgid is used only when child thread opens
+	 * a window and exits without closing it.
+	 */
+	if (!tsk) {
+		pid = window->tgid;
+		tsk = get_pid_task(pid, PIDTYPE_PID);
+		/*
+		 * Parent thread (tgid) will be closing window when it
+		 * exits. So should not get here.
+		 */
+		if (WARN_ON_ONCE(!tsk))
+			return;
+	}
+
+	/* Return if the task is exiting. */
+	if (tsk->flags & PF_EXITING) {
+		put_task_struct(tsk);
+		return;
+	}
+
+	use_mm(window->mm);
+	rc = copy_to_user(csb_addr, &csb, sizeof(csb));
+	/*
+	 * User space polls on csb.flags (first byte). So add barrier
+	 * then copy first byte with csb flags update.
+	 */
+	if (!rc) {
+		csb.flags = CSB_V;
+		/* Make sure update to csb.flags is visible now */
+		smp_mb();
+		rc = copy_to_user(csb_addr, &csb, sizeof(u8));
+	}
+	unuse_mm(window->mm);
+	put_task_struct(tsk);
+
+	/* Success */
+	if (!rc)
+		return;
+
+	pr_debug("Invalid CSB address 0x%p signalling pid(%d)\n",
+			csb_addr, pid_vnr(pid));
+
[PATCH v11 06/14] powerpc/vas: Take reference to PID and mm for user space windows
When a process opens a window, its pid and tgid are saved in the
vas_window struct. This window is closed when the process exits. The
kernel handles NX faults by updating the CSB, or by sending a SEGV signal
to the pid of the process if the userspace CSB address is invalid.

In multi-threaded applications, a window can be opened by a child thread,
but it will not be closed when this thread exits. The parent is expected
to clean up all resources, including NX windows opened by child threads.
A child thread can send NX requests using this window and could be killed
before completion is reported. If the pid assigned to this thread is
reused while requests are pending, a failure SEGV would be directed to
the wrong place.

To prevent the pid from being reused, take references to the pid and mm
when the window is opened and release them when the window is closed.
Then, if the child thread is not running, the SEGV signal is sent to the
thread group leader (tgid).

Signed-off-by: Haren Myneni
---
 arch/powerpc/platforms/powernv/vas-debug.c  |  2 +-
 arch/powerpc/platforms/powernv/vas-window.c | 50 ++---
 arch/powerpc/platforms/powernv/vas.h        |  9 +-
 3 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-debug.c b/arch/powerpc/platforms/powernv/vas-debug.c
index 09e63df..ef9a717 100644
--- a/arch/powerpc/platforms/powernv/vas-debug.c
+++ b/arch/powerpc/platforms/powernv/vas-debug.c
@@ -38,7 +38,7 @@ static int info_show(struct seq_file *s, void *private)

 	seq_printf(s, "Type: %s, %s\n", cop_to_str(window->cop),
 					window->tx_win ? "Send" : "Receive");
-	seq_printf(s, "Pid : %d\n", window->pid);
+	seq_printf(s, "Pid : %d\n", vas_window_pid(window));

 unlock:
 	mutex_unlock(&vas_mutex);
diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index dc46bf6..063cda2 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -12,6 +12,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include "vas.h"
@@ -876,8 +878,6 @@ struct vas_window *vas_rx_win_open(int vasid, enum vas_cop_type cop,
 	rxwin->user_win = rxattr->user_win;
 	rxwin->cop = cop;
 	rxwin->wcreds_max = rxattr->wcreds_max ?: VAS_WCREDS_DEFAULT;
-	if (rxattr->user_win)
-		rxwin->pid = task_pid_vnr(current);

 	init_winctx_for_rxwin(rxwin, rxattr, &winctx);
 	init_winctx_regs(rxwin, &winctx);
@@ -1027,7 +1027,6 @@ struct vas_window *vas_tx_win_open(int vasid, enum vas_cop_type cop,
 	txwin->tx_win = 1;
 	txwin->rxwin = rxwin;
 	txwin->nx_win = txwin->rxwin->nx_win;
-	txwin->pid = attr->pid;
 	txwin->user_win = attr->user_win;
 	txwin->wcreds_max = attr->wcreds_max ?: VAS_WCREDS_DEFAULT;

@@ -1057,6 +1056,40 @@ struct vas_window *vas_tx_win_open(int vasid, enum vas_cop_type cop,
 		rc = set_thread_uses_vas();
 		if (rc)
 			goto free_window;
+
+		/*
+		 * Window opened by a child thread may not be closed when
+		 * it exits. So take reference to its pid and release it
+		 * when the window is free by parent thread.
+		 * Acquire a reference to the task's pid to make sure
+		 * pid will not be re-used - needed only for multithread
+		 * applications.
+		 */
+		txwin->pid = get_task_pid(current, PIDTYPE_PID);
+		/*
+		 * Acquire a reference to the task's mm.
+		 */
+		txwin->mm = get_task_mm(current);
+
+		if (!txwin->mm) {
+			put_pid(txwin->pid);
+			pr_err("VAS: pid(%d): mm_struct is not found\n",
+					current->pid);
+			rc = -EPERM;
+			goto free_window;
+		}
+
+		mmgrab(txwin->mm);
+		mmput(txwin->mm);
+		mm_context_add_copro(txwin->mm);
+		/*
+		 * Process closes window during exit. In the case of
+		 * multithread application, the child thread can open
+		 * window and can exit without closing it. Expects parent
+		 * thread to use and close the window. So do not need
+		 * to take pid reference for parent thread.
+		 */
+		txwin->tgid = find_get_pid(task_tgid_vnr(current));
 	}

 	set_vinst_win(vinst, txwin);
@@ -1257,8 +1290,17 @@ int vas_win_close(struct vas_window *window)
 	poll_window_castout(window);

 	/* if send window, drop reference to matching receive window */
-	if (window->tx_win)
+	if (window->tx_win) {
+		if (window->user_win) {
+			/* Dro
Re: [PATCH v2,5/5] drivers: uio: new driver for fsl_85xx_cache_sram
On 16/04/2020 at 07:22, Wang Wenhu wrote:
> Yes, kzalloc() would clean the allocated areas and the init of remaining
> array elements is redundant. I will remove the block in v3.
>
> >> > + dev_err(&pdev->dev, "error no valid uio-map configured\n");
> >> > + ret = -EINVAL;
> >> > + goto err_info_free_internel;
> >> > + }
> >> > +
> >> > + info->version = "0.1.0";
> >>
> >> Could you define some DRIVER_VERSION in the top of the file next to
> >> DRIVER_NAME instead of hard coding in the middle on a function ?
> >
> > That's what v1 had, and Greg KH said to remove it. I'm guessing that he
> > thought it was the common-but-pointless practice of having the driver print a
> > version number that never gets updated, rather than something the UIO API
> > (unfortunately, compared to a feature query interface) expects. That said,
> > I'm not sure what the value is of making it a macro since it should only be
> > used once, that use is self documenting, it isn't tunable, etc. Though if
> > this isn't a macro, UIO_NAME also shouldn't be (and if it is made a macro
> > again, it should be UIO_VERSION, not DRIVER_VERSION).
> >
> > Does this really need a three-part version scheme? What's wrong with a
> > version of "1", to be changed to "2" in the hopefully-unlikely event that the
> > userspace API changes? Assuming UIO is used for this at all, which doesn't
> > seem like a great fit to me.
> >
> > -Scott
>
> As Scott mentioned, the version is required by the uio core but is actually
> useless for us here (and for many other types of devices, I guess). So maybe
> the better way is to make it optional, but that belongs first to the uio core.
>
> For the cache-sram uio driver, I will define a UIO_VERSION macro as a
> compromise that suits everyone, avoiding the confusion Greg first mentioned.

Yes I like it.

> >> +static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
> >> +{ .compatible = "uio,fsl,p2020-l2-cache-controller", },
> >> +{ .compatible = "uio,fsl,p2010-l2-cache-controller", },
> >> +{ .compatible = "uio,fsl,p1020-l2-cache-controller", },
> >> +{ .compatible = "uio,fsl,p1011-l2-cache-controller", },
> >> +{ .compatible = "uio,fsl,p1013-l2-cache-controller", },
> >> +{ .compatible = "uio,fsl,p1022-l2-cache-controller", },
> >> +{ .compatible = "uio,fsl,mpc8548-l2-cache-controller",},
> >> +{ .compatible = "uio,fsl,mpc8544-l2-cache-controller",},
> >> +{ .compatible = "uio,fsl,mpc8572-l2-cache-controller",},
> >> +{ .compatible = "uio,fsl,mpc8536-l2-cache-controller",},
> >> +{ .compatible = "uio,fsl,p1021-l2-cache-controller", },
> >> +{ .compatible = "uio,fsl,p1012-l2-cache-controller", },
> >> +{ .compatible = "uio,fsl,p1025-l2-cache-controller", },
> >> +{ .compatible = "uio,fsl,p1016-l2-cache-controller", },
> >> +{ .compatible = "uio,fsl,p1024-l2-cache-controller", },
> >> +{ .compatible = "uio,fsl,p1015-l2-cache-controller", },
> >> +{ .compatible = "uio,fsl,p1010-l2-cache-controller", },
> >> +{ .compatible = "uio,fsl,bsc9131-l2-cache-controller",},
> >> +{},
> >> +};
> >
> > NACK
> >
> > The device tree describes the hardware, not what driver you want to bind the
> > hardware to, or how you want to allocate the resources. And even if defining
> > nodes for sram allocation were the right way to go, why do you have a separate
> > compatible for each chip when you're just describing software configuration?
> >
> > Instead, have module parameters that take the sizes and alignments you'd like
> > to allocate and expose to userspace. Better still would be some sort of
> > dynamic allocation (e.g. open a fd, ioctl to set the requested size/alignment,
> > if it succeeds you can mmap it, and when the fd is closed the region is
> > freed).
> >
> > -Scott
>
> Can not agree more.
>
> But what if I want to define more than one cache-sram uio device? How about
> using the device tree for a pseudo uio cache-sram driver?
>
> static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
>         { .compatible = "uio,cache-sram", },
>         {},
> };

You can still give it a name in line with your driver, ie:
"uio,mpc85xx-cache-sram"

After that, if you have different behaviours depending on the compatible, then
you have to add a .data field which will tell the driver which behaviour to
implement.

Christophe
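The .data suggestion above can be sketched in plain userspace C. This is only an illustration: struct match_entry is a stand-in for the kernel's struct of_device_id, and struct sram_variant, its default_align field and the two alignment values are hypothetical names invented for the example, not part of the posted driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for struct of_device_id (compatible + data only). */
struct match_entry {
	const char *compatible;
	const void *data;
};

/* Hypothetical per-compatible behaviour selected through .data. */
struct sram_variant {
	unsigned int default_align;
};

static const struct sram_variant generic_variant = { .default_align = 64 };
static const struct sram_variant mpc85xx_variant = { .default_align = 4096 };

static const struct match_entry match_table[] = {
	{ .compatible = "uio,cache-sram",         .data = &generic_variant },
	{ .compatible = "uio,mpc85xx-cache-sram", .data = &mpc85xx_variant },
	{ NULL, NULL },	/* sentinel, like the empty {} entry in the kernel table */
};

/* Return the variant attached to a compatible string, or NULL if unmatched. */
static const struct sram_variant *lookup_variant(const char *compatible)
{
	const struct match_entry *m;

	assert(compatible);
	for (m = match_table; m->compatible; m++)
		if (strcmp(m->compatible, compatible) == 0)
			return m->data;
	return NULL;
}
```

In a real driver the lookup would not be hand-written; of_device_get_match_data() returns the .data pointer of the entry that matched the probed device.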
[PATCH v11 05/14] powerpc/vas: Register NX with fault window ID and IRQ port value
For each user space send window, register NX with fault window ID and port value so that NX paste CRBs in this fault FIFO when it sees fault on the request buffer. Signed-off-by: Sukadev Bhattiprolu Signed-off-by: Haren Myneni --- arch/powerpc/platforms/powernv/vas-window.c | 15 +-- arch/powerpc/platforms/powernv/vas.h| 15 +++ 2 files changed, 28 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c index 1783fa9..dc46bf6 100644 --- a/arch/powerpc/platforms/powernv/vas-window.c +++ b/arch/powerpc/platforms/powernv/vas-window.c @@ -373,7 +373,7 @@ int init_winctx_regs(struct vas_window *window, struct vas_winctx *winctx) init_xlate_regs(window, winctx->user_win); val = 0ULL; - val = SET_FIELD(VAS_FAULT_TX_WIN, val, 0); + val = SET_FIELD(VAS_FAULT_TX_WIN, val, winctx->fault_win_id); write_hvwc_reg(window, VREG(FAULT_TX_WIN), val); /* In PowerNV, interrupts go to HV. */ @@ -748,6 +748,8 @@ static void init_winctx_for_rxwin(struct vas_window *rxwin, winctx->min_scope = VAS_SCOPE_LOCAL; winctx->max_scope = VAS_SCOPE_VECTORED_GROUP; + if (rxwin->vinst->virq) + winctx->irq_port = rxwin->vinst->irq_port; } static bool rx_win_args_valid(enum vas_cop_type cop, @@ -944,13 +946,22 @@ static void init_winctx_for_txwin(struct vas_window *txwin, winctx->lpid = txattr->lpid; winctx->pidr = txattr->pidr; winctx->rx_win_id = txwin->rxwin->winid; + /* +* IRQ and fault window setup is successful. Set fault window +* for the send window so that ready to handle faults. +*/ + if (txwin->vinst->virq) + winctx->fault_win_id = txwin->vinst->fault_win->winid; winctx->dma_type = VAS_DMA_TYPE_INJECT; winctx->tc_mode = txattr->tc_mode; winctx->min_scope = VAS_SCOPE_LOCAL; winctx->max_scope = VAS_SCOPE_VECTORED_GROUP; + if (txwin->vinst->virq) + winctx->irq_port = txwin->vinst->irq_port; - winctx->pswid = 0; + winctx->pswid = txattr->pswid ? 
txattr->pswid : + encode_pswid(txwin->vinst->vas_id, txwin->winid); } static bool tx_win_args_valid(enum vas_cop_type cop, diff --git a/arch/powerpc/platforms/powernv/vas.h b/arch/powerpc/platforms/powernv/vas.h index 9c8e3f5..88d084d 100644 --- a/arch/powerpc/platforms/powernv/vas.h +++ b/arch/powerpc/platforms/powernv/vas.h @@ -467,6 +467,21 @@ static inline u64 read_hvwc_reg(struct vas_window *win, return in_be64(win->hvwc_map+reg); } +/* + * Encode/decode the Partition Send Window ID (PSWID) for a window in + * a way that we can uniquely identify any window in the system. i.e. + * we should be able to locate the 'struct vas_window' given the PSWID. + * + * BitsUsage + * 0:7 VAS id (8 bits) + * 8:15Unused, 0 (3 bits) + * 16:31 Window id (16 bits) + */ +static inline u32 encode_pswid(int vasid, int winid) +{ + return ((u32)winid | (vasid << (31 - 7))); +} + static inline void decode_pswid(u32 pswid, int *vasid, int *winid) { if (vasid) -- 1.8.3.1
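As a rough illustration of the PSWID layout described in the comment above, here is a userspace sketch of the encode/decode pair. It assumes the big-endian bit numbering used in that comment (bit 0 is the most significant bit), so the VAS id lands in the top byte and the window id in the low 16 bits. The pswid_* names and the two macros are invented for this example. Note that bits 8:15 span eight bits, and that the sketch casts the VAS id to an unsigned type before shifting so that ids of 128 and above do not shift into the sign bit.

```c
#include <assert.h>
#include <stdint.h>

/* Big-endian bit numbering: bit 0 is the MSB of the 32-bit word. */
#define PSWID_VAS_SHIFT	(31 - 7)	/* VAS id occupies bits 0:7 */
#define PSWID_WIN_MASK	0xffffu		/* window id occupies bits 16:31 */

static inline uint32_t pswid_encode(unsigned int vasid, unsigned int winid)
{
	assert(vasid <= 0xff && winid <= PSWID_WIN_MASK);
	return ((uint32_t)vasid << PSWID_VAS_SHIFT) | (uint32_t)winid;
}

/* Either output pointer may be NULL, mirroring decode_pswid() in vas.h. */
static inline void pswid_decode(uint32_t pswid, unsigned int *vasid,
				unsigned int *winid)
{
	if (vasid)
		*vasid = pswid >> PSWID_VAS_SHIFT;
	if (winid)
		*winid = pswid & PSWID_WIN_MASK;
}
```

Encoding VAS id 3 and window id 0x1234 this way gives 0x03001234, and decoding that value returns the same pair.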
[PATCH v11 04/14] powerpc/vas: Setup fault window per VAS instance
Setup fault window for each VAS instance. When NX gets a fault on request buffer, pastes fault CRB in the corresponding fault FIFO and then raises an interrupt to the OS. The kernel handles this fault and process faults CRB from this FIFO. Signed-off-by: Sukadev Bhattiprolu Signed-off-by: Haren Myneni --- arch/powerpc/platforms/powernv/Makefile | 2 +- arch/powerpc/platforms/powernv/vas-fault.c | 77 + arch/powerpc/platforms/powernv/vas-window.c | 4 +- arch/powerpc/platforms/powernv/vas.c| 20 arch/powerpc/platforms/powernv/vas.h| 21 5 files changed, 121 insertions(+), 3 deletions(-) create mode 100644 arch/powerpc/platforms/powernv/vas-fault.c diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index c0f8120..395789f 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -17,7 +17,7 @@ obj-$(CONFIG_MEMORY_FAILURE) += opal-memory-errors.o obj-$(CONFIG_OPAL_PRD) += opal-prd.o obj-$(CONFIG_PERF_EVENTS) += opal-imc.o obj-$(CONFIG_PPC_MEMTRACE) += memtrace.o -obj-$(CONFIG_PPC_VAS) += vas.o vas-window.o vas-debug.o +obj-$(CONFIG_PPC_VAS) += vas.o vas-window.o vas-debug.o vas-fault.o obj-$(CONFIG_OCXL_BASE)+= ocxl.o obj-$(CONFIG_SCOM_DEBUGFS) += opal-xscom.o obj-$(CONFIG_PPC_SECURE_BOOT) += opal-secvar.o diff --git a/arch/powerpc/platforms/powernv/vas-fault.c b/arch/powerpc/platforms/powernv/vas-fault.c new file mode 100644 index 000..4044998 --- /dev/null +++ b/arch/powerpc/platforms/powernv/vas-fault.c @@ -0,0 +1,77 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * VAS Fault handling. + * Copyright 2019, IBM Corporation + */ + +#define pr_fmt(fmt) "vas: " fmt + +#include +#include +#include +#include +#include +#include + +#include "vas.h" + +/* + * The maximum FIFO size for fault window can be 8MB + * (VAS_RX_FIFO_SIZE_MAX). Using 4MB FIFO since each VAS + * instance will be having fault window. + * 8MB FIFO can be used if expects more faults for each VAS + * instance. 
+ */ +#define VAS_FAULT_WIN_FIFO_SIZE(4 << 20) + +/* + * Fault window is opened per VAS instance. NX pastes fault CRB in fault + * FIFO upon page faults. + */ +int vas_setup_fault_window(struct vas_instance *vinst) +{ + struct vas_rx_win_attr attr; + + vinst->fault_fifo_size = VAS_FAULT_WIN_FIFO_SIZE; + vinst->fault_fifo = kzalloc(vinst->fault_fifo_size, GFP_KERNEL); + if (!vinst->fault_fifo) { + pr_err("Unable to alloc %d bytes for fault_fifo\n", + vinst->fault_fifo_size); + return -ENOMEM; + } + + /* +* Invalidate all CRB entries. NX pastes valid entry for each fault. +*/ + memset(vinst->fault_fifo, FIFO_INVALID_ENTRY, vinst->fault_fifo_size); + vas_init_rx_win_attr(&attr, VAS_COP_TYPE_FAULT); + + attr.rx_fifo_size = vinst->fault_fifo_size; + attr.rx_fifo = vinst->fault_fifo; + + /* +* Max creds is based on number of CRBs can fit in the FIFO. +* (fault_fifo_size/CRB_SIZE). If 8MB FIFO is used, max creds +* will be 0x since the receive creds field is 16bits wide. +*/ + attr.wcreds_max = vinst->fault_fifo_size / CRB_SIZE; + attr.lnotify_lpid = 0; + attr.lnotify_pid = mfspr(SPRN_PID); + attr.lnotify_tid = mfspr(SPRN_PID); + + vinst->fault_win = vas_rx_win_open(vinst->vas_id, VAS_COP_TYPE_FAULT, + &attr); + + if (IS_ERR(vinst->fault_win)) { + pr_err("VAS: Error %ld opening FaultWin\n", + PTR_ERR(vinst->fault_win)); + kfree(vinst->fault_fifo); + return PTR_ERR(vinst->fault_win); + } + + pr_devel("VAS: Created FaultWin %d, LPID/PID/TID [%d/%d/%d]\n", + vinst->fault_win->winid, attr.lnotify_lpid, + attr.lnotify_pid, attr.lnotify_tid); + + return 0; +} diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c index 0c0d27d..1783fa9 100644 --- a/arch/powerpc/platforms/powernv/vas-window.c +++ b/arch/powerpc/platforms/powernv/vas-window.c @@ -827,9 +827,9 @@ void vas_init_rx_win_attr(struct vas_rx_win_attr *rxattr, enum vas_cop_type cop) rxattr->fault_win = true; rxattr->notify_disable = true; rxattr->rx_wcred_mode = true; - 
rxattr->tx_wcred_mode = true; rxattr->rx_win_ord_mode = true; - rxattr->tx_win_ord_mode = true; + rxattr->rej_no_credit = true; + rxattr->tc_mode = VAS_THRESH_DISABLED; } else if (cop == VAS_COP_TYPE_FTW) { rxattr->user_win = true; rxattr->intr_disable = true; diff --git a/arch/powerpc/platforms/powernv/vas.c b/arch/powerpc/platforms/powernv/vas.c index
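The credit calculation in vas_setup_fault_window() can be worked through with a small userspace sketch. It assumes a CRB is 128 bytes (CRB_SIZE_ASSUMED below is a stand-in for the kernel's CRB_SIZE and should be checked against icswx.h). The clamp to the 16-bit field is added here only to illustrate the comment about the 8MB case; the posted patch itself just divides.

```c
#include <assert.h>
#include <stdint.h>

#define CRB_SIZE_ASSUMED	128u	/* assumption: one CRB is 128 bytes */
#define WCREDS_FIELD_MAX	0xffffu	/* receive credits field is 16 bits wide */

static_assert(WCREDS_FIELD_MAX == UINT16_MAX, "credits field is 16 bits");

/* Number of CRBs that fit in the FIFO, limited to what the field can hold. */
static inline uint32_t fault_win_credits(uint32_t fifo_size)
{
	uint32_t creds = fifo_size / CRB_SIZE_ASSUMED;

	return creds > WCREDS_FIELD_MAX ? WCREDS_FIELD_MAX : creds;
}
```

Under that assumption the 4MB FIFO chosen by the patch holds 32768 CRBs, while an 8MB FIFO would hold 65536, one more than a 16-bit field can express.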
[PATCH v11 03/14] powerpc/vas: Alloc and setup IRQ and trigger port address
Allocate a xive irq on each chip with a vas instance. The NX coprocessor raises a host CPU interrupt via vas if it encounters page fault on user space request buffer. Subsequent patches register the trigger port with the NX coprocessor, and create a vas fault handler for this interrupt mapping. Signed-off-by: Haren Myneni Reviewed-by: Cédric Le Goater --- arch/powerpc/platforms/powernv/vas.c | 44 +++- arch/powerpc/platforms/powernv/vas.h | 2 ++ 2 files changed, 40 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/platforms/powernv/vas.c b/arch/powerpc/platforms/powernv/vas.c index ed9cc6d..3303cfe 100644 --- a/arch/powerpc/platforms/powernv/vas.c +++ b/arch/powerpc/platforms/powernv/vas.c @@ -15,6 +15,7 @@ #include #include #include +#include #include "vas.h" @@ -25,10 +26,12 @@ static int init_vas_instance(struct platform_device *pdev) { - int rc, cpu, vasid; - struct resource *res; - struct vas_instance *vinst; struct device_node *dn = pdev->dev.of_node; + struct vas_instance *vinst; + struct xive_irq_data *xd; + uint32_t chipid, hwirq; + struct resource *res; + int rc, cpu, vasid; rc = of_property_read_u32(dn, "ibm,vas-id", &vasid); if (rc) { @@ -36,6 +39,12 @@ static int init_vas_instance(struct platform_device *pdev) return -ENODEV; } + rc = of_property_read_u32(dn, "ibm,chip-id", &chipid); + if (rc) { + pr_err("No ibm,chip-id property for %s?\n", pdev->name); + return -ENODEV; + } + if (pdev->num_resources != 4) { pr_err("Unexpected DT configuration for [%s, %d]\n", pdev->name, vasid); @@ -69,9 +78,32 @@ static int init_vas_instance(struct platform_device *pdev) vinst->paste_win_id_shift = 63 - res->end; - pr_devel("Initialized instance [%s, %d], paste_base 0x%llx, " - "paste_win_id_shift 0x%llx\n", pdev->name, vasid, - vinst->paste_base_addr, vinst->paste_win_id_shift); + hwirq = xive_native_alloc_irq_on_chip(chipid); + if (!hwirq) { + pr_err("Inst%d: Unable to allocate global irq for chip %d\n", + vinst->vas_id, chipid); + return -ENOENT; + } + + 
vinst->virq = irq_create_mapping(NULL, hwirq); + if (!vinst->virq) { + pr_err("Inst%d: Unable to map global irq %d\n", + vinst->vas_id, hwirq); + return -EINVAL; + } + + xd = irq_get_handler_data(vinst->virq); + if (!xd) { + pr_err("Inst%d: Invalid virq %d\n", + vinst->vas_id, vinst->virq); + return -EINVAL; + } + + vinst->irq_port = xd->trig_page; + pr_devel("Initialized instance [%s, %d] paste_base 0x%llx paste_win_id_shift 0x%llx IRQ %d Port 0x%llx\n", + pdev->name, vasid, vinst->paste_base_addr, + vinst->paste_win_id_shift, vinst->virq, + vinst->irq_port); for_each_possible_cpu(cpu) { if (cpu_to_chip_id(cpu) == of_get_ibm_chip_id(dn)) diff --git a/arch/powerpc/platforms/powernv/vas.h b/arch/powerpc/platforms/powernv/vas.h index 5574aec..598608b 100644 --- a/arch/powerpc/platforms/powernv/vas.h +++ b/arch/powerpc/platforms/powernv/vas.h @@ -313,6 +313,8 @@ struct vas_instance { u64 paste_base_addr; u64 paste_win_id_shift; + u64 irq_port; + int virq; struct mutex mutex; struct vas_window *rxwin[VAS_COP_TYPE_MAX]; struct vas_window *windows[VAS_WINDOWS_PER_CHIP]; -- 1.8.3.1
[PATCH v11 02/14] powerpc/vas: Define nx_fault_stamp in coprocessor_request_block
Kernel sets fault address and status in CRB for NX page fault on user space address after processing page fault. User space gets the signal and handles the fault mentioned in CRB by bringing the page in to memory and send NX request again. Signed-off-by: Sukadev Bhattiprolu Signed-off-by: Haren Myneni --- arch/powerpc/include/asm/icswx.h | 20 ++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/icswx.h b/arch/powerpc/include/asm/icswx.h index 9872f85..965b1f3 100644 --- a/arch/powerpc/include/asm/icswx.h +++ b/arch/powerpc/include/asm/icswx.h @@ -108,6 +108,17 @@ struct data_descriptor_entry { __be64 address; } __packed __aligned(DDE_ALIGN); +/* 4.3.2 NX-stamped Fault CRB */ + +#define NX_STAMP_ALIGN (0x10) + +struct nx_fault_stamp { + __be64 fault_storage_addr; + __be16 reserved; + __u8 flags; + __u8 fault_status; + __be32 pswid; +} __packed __aligned(NX_STAMP_ALIGN); /* Chapter 6.5.2 Coprocessor-Request Block (CRB) */ @@ -135,10 +146,15 @@ struct coprocessor_request_block { struct coprocessor_completion_block ccb; - u8 reserved[48]; + union { + struct nx_fault_stamp nx; + u8 reserved[16]; + } stamp; + + u8 reserved[32]; struct coprocessor_status_block csb; -} __packed __aligned(CRB_ALIGN); +} __packed; /* RFC02167 Initiate Coprocessor Instructions document -- 1.8.3.1
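The size arithmetic behind the hunk above (a 16-byte stamp plus 32 reserved bytes replacing the old 48 reserved bytes) can be checked with a userspace mirror of the structure. The struct name and helper are invented for this sketch, and the big-endian field types are replaced with plain fixed-width integers, so it verifies layout only, not byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct nx_fault_stamp; __beNN fields kept as raw integers. */
struct nx_fault_stamp_layout {
	uint64_t fault_storage_addr;	/* __be64 in the patch */
	uint16_t reserved;		/* __be16 */
	uint8_t  flags;
	uint8_t  fault_status;
	uint32_t pswid;			/* __be32 */
} __attribute__((packed, aligned(16)));

static_assert(sizeof(struct nx_fault_stamp_layout) == 16,
	      "stamp must fill the 16-byte slot of the union");
static_assert(offsetof(struct nx_fault_stamp_layout, flags) == 10,
	      "flags follows the 8-byte address and 2 reserved bytes");
static_assert(offsetof(struct nx_fault_stamp_layout, pswid) == 12,
	      "pswid is the last 4 bytes");

/* Bytes covered by the stamp union plus the remaining reserved[32] array. */
static inline size_t stamp_plus_reserved(void)
{
	return sizeof(struct nx_fault_stamp_layout) + 32;
}
```

The total stays at 48 bytes, so the offset of the CSB inside the CRB is unchanged by the patch.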
[PATCH v11 01/14] powerpc/xive: Define xive_native_alloc_irq_on_chip()
This function allocates IRQ on a specific chip. VAS needs per chip IRQ allocation and will have IRQ handler per VAS instance. Signed-off-by: Haren Myneni Reviewed-by: Cédric Le Goater --- arch/powerpc/include/asm/xive.h | 9 - arch/powerpc/sysdev/xive/native.c | 6 +++--- 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h index 93f982db..d08ea11 100644 --- a/arch/powerpc/include/asm/xive.h +++ b/arch/powerpc/include/asm/xive.h @@ -5,6 +5,8 @@ #ifndef _ASM_POWERPC_XIVE_H #define _ASM_POWERPC_XIVE_H +#include + #define XIVE_INVALID_VP0x #ifdef CONFIG_PPC_XIVE @@ -108,7 +110,6 @@ struct xive_q { int xive_native_populate_irq_data(u32 hw_irq, struct xive_irq_data *data); void xive_cleanup_irq_data(struct xive_irq_data *xd); -u32 xive_native_alloc_irq(void); void xive_native_free_irq(u32 irq); int xive_native_configure_irq(u32 hw_irq, u32 target, u8 prio, u32 sw_irq); @@ -137,6 +138,12 @@ int xive_native_set_queue_state(u32 vp_id, uint32_t prio, u32 qtoggle, u32 qindex); int xive_native_get_vp_state(u32 vp_id, u64 *out_state); bool xive_native_has_queue_state_support(void); +extern u32 xive_native_alloc_irq_on_chip(u32 chip_id); + +static inline u32 xive_native_alloc_irq(void) +{ + return xive_native_alloc_irq_on_chip(OPAL_XIVE_ANY_CHIP); +} #else diff --git a/arch/powerpc/sysdev/xive/native.c b/arch/powerpc/sysdev/xive/native.c index 0ff6b73..14d4406 100644 --- a/arch/powerpc/sysdev/xive/native.c +++ b/arch/powerpc/sysdev/xive/native.c @@ -279,12 +279,12 @@ static int xive_native_get_ipi(unsigned int cpu, struct xive_cpu *xc) } #endif /* CONFIG_SMP */ -u32 xive_native_alloc_irq(void) +u32 xive_native_alloc_irq_on_chip(u32 chip_id) { s64 rc; for (;;) { - rc = opal_xive_allocate_irq(OPAL_XIVE_ANY_CHIP); + rc = opal_xive_allocate_irq(chip_id); if (rc != OPAL_BUSY) break; msleep(OPAL_BUSY_DELAY_MS); @@ -293,7 +293,7 @@ u32 xive_native_alloc_irq(void) return 0; return rc; } 
-EXPORT_SYMBOL_GPL(xive_native_alloc_irq); +EXPORT_SYMBOL_GPL(xive_native_alloc_irq_on_chip); void xive_native_free_irq(u32 irq) { -- 1.8.3.1
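The retry-while-busy shape of xive_native_alloc_irq_on_chip() can be sketched outside the kernel as follows. Everything here is a mock: FW_BUSY stands in for OPAL_BUSY, the function-pointer parameter replaces the direct opal_xive_allocate_irq() call so the loop can be exercised with a fake, and the sleep between attempts is left as a comment.

```c
#include <assert.h>
#include <stdint.h>

#define FW_BUSY	(-2)	/* hypothetical stand-in for OPAL_BUSY */

typedef int64_t (*fw_alloc_fn)(uint32_t chip_id);

/* Returns the allocated IRQ number, or 0 if the firmware reported an error. */
static uint32_t alloc_irq_on_chip(fw_alloc_fn fw_alloc, uint32_t chip_id)
{
	int64_t rc;

	assert(fw_alloc);
	for (;;) {
		rc = fw_alloc(chip_id);
		if (rc != FW_BUSY)
			break;
		/* the kernel sleeps for OPAL_BUSY_DELAY_MS before retrying */
	}
	if (rc < 0)
		return 0;
	return (uint32_t)rc;
}

/* Fake firmware: busy twice, then hands out an IRQ derived from the chip id. */
static int busy_left = 2;

static int64_t fake_fw_alloc(uint32_t chip_id)
{
	if (busy_left > 0) {
		busy_left--;
		return FW_BUSY;
	}
	return 0x100 + chip_id;
}
```

Because 0 doubles as the failure value, callers such as init_vas_instance() only need a `if (!hwirq)` check.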
[PATCH v11 00/14] powerpc/vas: Page fault handling for user space NX requests
On power9, Virtual Accelerator Switchboard (VAS) allows user space or kernel
to communicate with Nest Accelerator (NX) directly using COPY/PASTE
instructions. NX provides various functionalities such as compression,
encryption, etc. But only compression (842 and GZIP formats) is supported in
Linux kernel on power9.

842 compression driver (drivers/crypto/nx/nx-842-powernv.c) is already
included in Linux. Only GZIP support will be available from user space.

Applications can issue GZIP compression / decompression requests to NX with
COPY/PASTE instructions. When NX is processing these requests, can hit fault
on the request buffer (not in memory). It issues an interrupt and pastes fault
CRB in fault FIFO. Expects kernel to handle this fault and return credits for
both send and fault windows after processing.

This patch series adds IRQ and fault window setup, and NX fault handling:
- Alloc IRQ and trigger port address, and configure IRQ per VAS instance.
- Set port# for each window to generate an interrupt when noticed fault.
- Set fault window and FIFO on which NX paste fault CRB.
- Setup IRQ thread fault handler per VAS instance.
- When receiving an interrupt, Read CRBs from fault FIFO and update
  coprocessor_status_block (CSB) in the corresponding CRB with translation
  failure (CSB_CC_TRANSLATION). After issuing NX requests, process polls on
  CSB address. When it sees translation error, can touch the request buffer
  to bring the page in to memory and reissue NX request.
- If copy_to_user fails on user space CSB address, OS sends SEGV signal.

Tested these patches with NX-GZIP enable patches and posted them as separate
patch series.

Patch 1: Define alloc IRQ per chip which is needed to alloc IRQ per VAS
         instance.
Patch 2: Define nx_fault_stamp on which NX writes fault status for the fault
         CRB
Patch 3: Alloc and setup IRQ and trigger port address for each VAS instance
Patches 4 & 5: Setup fault window and register NX per each VAS instance. This
         window is used for NX to paste fault CRB in FIFO.
Patch 6: Reference to pid and mm so that pid is not used until window closed.
         Needed for multi thread application where child can open a window
         and can be used by the parent later.
Patch 7: Setup threaded IRQ handler per VAS
Patch 8: Process CRBs from fault FIFO and notify tasks by updating CSB or
         through signals.
Patches 9 & 11: Return credits for send and fault windows after handling
         faults.
Patches 10 & 12: Dump FIFO / CRB data and messages for error conditions
Patch 13: Fix closing send window after all credits are returned. This issue
         happens only for user space requests. No page faults on kernel
         request buffer.
Patch 14: For each process / thread, use mm_context->vas_windows counter to
         clear foreign address mapping and disable it.

Changelog:

V2:
  - Use threaded IRQ instead of own kernel thread handler
  - Use pswid instead of user space CSB address to find valid CRB
  - Removed unused macros and other changes as suggested by Christoph Hellwig

V3:
  - Rebased to 5.5-rc2
  - Use struct pid * instead of pid_t for vas_window tgid
  - Code cleanup as suggested by Christoph Hellwig

V4:
  - Define xive alloc and get IRQ info based on chip ID and use these
    functions for IRQ setup per VAS instance. It eliminates skiboot
    dependency as suggested by Oliver.

V5:
  - Do not update CSB if the process is exiting (patch8)

V6:
  - Add interrupt handler instead of default one and return IRQ_HANDLED if
    the fault handling thread is already in progress. (Patch7)
  - Use platform send window ID and CCW[0] bit to find valid CRB in fault
    FIFO (Patch7).
  - Return fault address to user space in BE and other changes as suggested
    by Michael Neuling. (patch8)
  - Rebased to 5.6-rc4

V7:
  - Fixed sparse warnings (patches 4, 9 and 10)

V8:
  - Moved mm_context_remove_copro() before mmdrop() (patch6)
  - Moved barrier before csb.flags store and add WARN_ON_ONCE() checks
    (patch8)

V9:
  - Rebased to 5.6
  - Changes based on Cedric's comments
      - Removed "Define xive_native_alloc_get_irq_info()" patch and used
        irq_get_handler_data() (patch3)
  - Changes based on comments from Nicholas Piggin
      - Moved "Taking PID reference" patch before setting VAS fault handler
        patch
      - Removed mutex_lock/unlock (patch7)
      - Other cleanup changes

V10:
  - Include patch to enable and disable CP_ABORT execution using
    mm_context->vas_windows counter.
  - Remove 'if (txwin)' line which is covered with 'else' before (patch6)

V11:
  - Added comments for fault_lock and fifo_in_progress elements (patch7)
  - Use pr_warn_ratelimited instead of pr_debug to display message during
    window close (patch12)
  - Moved set_thread_uses_vas() to vas_win_open() (patch14)

Haren Myneni (14):
  powerpc/xive: Define xive_native_alloc_irq_on_chip()
  powerpc/vas: Define nx_fault_stamp in coprocessor_request
Re: [PATCH] powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()
Hi,

On 16/04/2020 at 00:06, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Apr 15, 2020 at 09:20:26AM +, Christophe Leroy wrote:
>> At the time being, __put_user()/__get_user() and friends only use
>> register indirect with immediate index addressing, with the index
>> set to 0. Ex:
>>
>>   lwz reg1, 0(reg2)
>
> This is called a "D-form" instruction, or sometimes "offset addressing".
> Don't talk about an "index", it confuses things, because the *other*
> kind is called "indexed" already, also in the ISA docs! (X-form, aka
> indexed addressing, [reg+reg], where D-form does [reg+imm], and both
> forms can do [reg]).

In the "Programming Environments Manual for 32-Bit Implementations of
the PowerPC™ Architecture", they list the following addressing modes:

Load and store operations have three categories of effective address
generation that depend on the operands specified:
• Register indirect with immediate index mode
• Register indirect with index mode
• Register indirect mode

>> Give the compiler the opportunity to use other addressing modes
>> whenever possible, to get more optimised code.
>
> Great :-)
>
>> --- a/arch/powerpc/include/asm/uaccess.h
>> +++ b/arch/powerpc/include/asm/uaccess.h
>> @@ -114,7 +114,7 @@ extern long __put_user_bad(void);
>>   */
>>  #define __put_user_asm(x, addr, err, op) \
>>      __asm__ __volatile__( \
>> -        "1:" op " %1,0(%2) # put_user\n" \
>> +        "1:" op "%U2%X2 %1,%2# put_user\n" \
>>          "2:\n" \
>>          ".section .fixup,\"ax\"\n" \
>>          "3:li %0,%3\n" \
>> @@ -122,7 +122,7 @@ extern long __put_user_bad(void);
>>          ".previous\n" \
>>          EX_TABLE(1b, 3b) \
>>          : "=r" (err) \
>> -        : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err))
>> +        : "r" (x), "m" (*addr), "i" (-EFAULT), "0" (err))
>
> %Un on an "m" operand doesn't do much: you need to make it "m<>" if you
> want pre-modify ("update") insns to be generated. (You then will want
> to make sure that operand is used in a way GCC can understand; since it
> is used only once here, that works fine).

Ah ? Indeed I got the idea from include/asm/io.h where there is:

#define DEF_MMIO_IN_D(name, size, insn) \
static inline u##size name(const volatile u##size __iomem *addr) \
{ \
    u##size ret; \
    __asm__ __volatile__("sync;"#insn"%U1%X1 %0,%1;twi 0,%0,0;isync"\
        : "=r" (ret) : "m" (*addr) : "memory"); \
    return ret; \
}

It should be "m<>" there as well ?

>> @@ -130,8 +130,8 @@ extern long __put_user_bad(void);
>>  #else /* __powerpc64__ */
>>  #define __put_user_asm2(x, addr, err) \
>>      __asm__ __volatile__( \
>> -        "1:stw %1,0(%2)\n" \
>> -        "2:stw %1+1,4(%2)\n" \
>> +        "1:stw%U2%X2 %1,%2\n" \
>> +        "2:stw%U2%X2 %L1,%L2\n" \
>>          "3:\n" \
>>          ".section .fixup,\"ax\"\n" \
>>          "4:li %0,%3\n" \
>> @@ -140,7 +140,7 @@ extern long __put_user_bad(void);
>>          EX_TABLE(1b, 4b) \
>>          EX_TABLE(2b, 4b) \
>>          : "=r" (err) \
>> -        : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err))
>> +        : "r" (x), "m" (*addr), "i" (-EFAULT), "0" (err))
>
> Here, it doesn't work. You don't want two consecutive update insns in
> any case. Easiest is to just not use "m<>", and then, don't use %Un
> (which won't do anything, but it is confusing).

Can't we leave the Un on the second stw ?

> Same for the reads.
>
> Rest looks fine, and update should be good with that fixed as said.
>
> Reviewed-by: Segher Boessenkool
>
> Segher

Thanks for the review
Christophe
Re: [PATCH v2,5/5] drivers: uio: new driver for fsl_85xx_cache_sram
Yes, kzalloc() would clean the allocated areas and the init of remaining
array elements is redundant. I will remove the block in v3.

>> > + dev_err(&pdev->dev, "error no valid uio-map configured\n");
>> > + ret = -EINVAL;
>> > + goto err_info_free_internel;
>> > + }
>> > +
>> > + info->version = "0.1.0";
>>
>> Could you define some DRIVER_VERSION in the top of the file next to
>> DRIVER_NAME instead of hard coding in the middle on a function ?
>
>That's what v1 had, and Greg KH said to remove it. I'm guessing that he
>thought it was the common-but-pointless practice of having the driver print a
>version number that never gets updated, rather than something the UIO API
>(unfortunately, compared to a feature query interface) expects. That said,
>I'm not sure what the value is of making it a macro since it should only be
>used once, that use is self documenting, it isn't tunable, etc. Though if
>this isn't a macro, UIO_NAME also shouldn't be (and if it is made a macro
>again, it should be UIO_VERSION, not DRIVER_VERSION).
>
>Does this really need a three-part version scheme? What's wrong with a
>version of "1", to be changed to "2" in the hopefully-unlikely event that the
>userspace API changes? Assuming UIO is used for this at all, which doesn't
>seem like a great fit to me.
>
>-Scott
>

As Scott mentioned, the version is required by the uio core but is actually
useless for us here (and for many other types of devices, I guess). So maybe
the better way is to make it optional, but that belongs first to the uio core.

For the cache-sram uio driver, I will define a UIO_VERSION macro as a
compromise that suits everyone, avoiding the confusion Greg first mentioned.

>> +static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
>> +{ .compatible = "uio,fsl,p2020-l2-cache-controller", },
>> +{ .compatible = "uio,fsl,p2010-l2-cache-controller", },
>> +{ .compatible = "uio,fsl,p1020-l2-cache-controller", },
>> +{ .compatible = "uio,fsl,p1011-l2-cache-controller", },
>> +{ .compatible = "uio,fsl,p1013-l2-cache-controller", },
>> +{ .compatible = "uio,fsl,p1022-l2-cache-controller", },
>> +{ .compatible = "uio,fsl,mpc8548-l2-cache-controller",},
>> +{ .compatible = "uio,fsl,mpc8544-l2-cache-controller",},
>> +{ .compatible = "uio,fsl,mpc8572-l2-cache-controller",},
>> +{ .compatible = "uio,fsl,mpc8536-l2-cache-controller",},
>> +{ .compatible = "uio,fsl,p1021-l2-cache-controller", },
>> +{ .compatible = "uio,fsl,p1012-l2-cache-controller", },
>> +{ .compatible = "uio,fsl,p1025-l2-cache-controller", },
>> +{ .compatible = "uio,fsl,p1016-l2-cache-controller", },
>> +{ .compatible = "uio,fsl,p1024-l2-cache-controller", },
>> +{ .compatible = "uio,fsl,p1015-l2-cache-controller", },
>> +{ .compatible = "uio,fsl,p1010-l2-cache-controller", },
>> +{ .compatible = "uio,fsl,bsc9131-l2-cache-controller",},
>> +{},
>> +};
>
>NACK
>
>The device tree describes the hardware, not what driver you want to bind the
>hardware to, or how you want to allocate the resources. And even if defining
>nodes for sram allocation were the right way to go, why do you have a separate
>compatible for each chip when you're just describing software configuration?
>
>Instead, have module parameters that take the sizes and alignments you'd like
>to allocate and expose to userspace. Better still would be some sort of
>dynamic allocation (e.g. open a fd, ioctl to set the requested size/alignment,
>if it succeeds you can mmap it, and when the fd is closed the region is
>freed).
>
>-Scott
>

Can not agree more.

But what if I want to define more than one cache-sram uio device? How about
using the device tree for a pseudo uio cache-sram driver?

static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = {
        { .compatible = "uio,cache-sram", },
        {},
};

Thanks,
Wenhu
[PATCH v2] KVM: Optimize kvm_arch_vcpu_ioctl_run function
In earlier versions of kvm, 'kvm_run' is an independent structure and is not included in the vcpu structure. At present, 'kvm_run' is already included in the vcpu structure, so the parameter 'kvm_run' is redundant. This patch simplify the function definition, removes the extra 'kvm_run' parameter, and extract it from the 'kvm_vcpu' structure if necessary. Signed-off-by: Tianjia Zhang --- v2 change: remove 'kvm_run' parameter and extract it from 'kvm_vcpu' arch/mips/kvm/mips.c | 3 ++- arch/powerpc/kvm/powerpc.c | 3 ++- arch/s390/kvm/kvm-s390.c | 3 ++- arch/x86/kvm/x86.c | 11 ++- include/linux/kvm_host.h | 2 +- virt/kvm/arm/arm.c | 6 +++--- virt/kvm/kvm_main.c| 2 +- 7 files changed, 17 insertions(+), 13 deletions(-) diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index 8f05dd0a0f4e..ec24adf4857e 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -439,8 +439,9 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, return -ENOIOCTLCMD; } -int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) +int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) { + struct kvm_run *run = vcpu->run; int r = -EINTR; vcpu_load(vcpu); diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index e15166b0a16d..7e24691e138a 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -1764,8 +1764,9 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg) return r; } -int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) +int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) { + struct kvm_run *run = vcpu->run; int r; vcpu_load(vcpu); diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 19a81024fe16..443af3ead739 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -4333,8 +4333,9 @@ static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) store_regs_fmt2(vcpu, kvm_run); } -int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
struct kvm_run *kvm_run) +int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) { + struct kvm_run *kvm_run = vcpu->run; int rc; if (kvm_run->immediate_exit) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 3bf2ecafd027..a0338e86c90f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8707,8 +8707,9 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) trace_kvm_fpu(0); } -int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) +int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) { + struct kvm_run *kvm_run = vcpu->run; int r; vcpu_load(vcpu); @@ -8726,18 +8727,18 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) r = -EAGAIN; if (signal_pending(current)) { r = -EINTR; - vcpu->run->exit_reason = KVM_EXIT_INTR; + kvm_run->exit_reason = KVM_EXIT_INTR; ++vcpu->stat.signal_exits; } goto out; } - if (vcpu->run->kvm_valid_regs & ~KVM_SYNC_X86_VALID_FIELDS) { + if (kvm_run->kvm_valid_regs & ~KVM_SYNC_X86_VALID_FIELDS) { r = -EINVAL; goto out; } - if (vcpu->run->kvm_dirty_regs) { + if (kvm_run->kvm_dirty_regs) { r = sync_regs(vcpu); if (r != 0) goto out; @@ -8767,7 +8768,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) out: kvm_put_guest_fpu(vcpu); - if (vcpu->run->kvm_valid_regs) + if (kvm_run->kvm_valid_regs) store_regs(vcpu); post_kvm_run_save(vcpu); kvm_sigset_deactivate(vcpu); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 6d58beb65454..1e17ef719595 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -866,7 +866,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, struct kvm_mp_state *mp_state); int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg); -int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run); +int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu); int kvm_arch_init(void *opaque); void kvm_arch_exit(void); diff --git a/virt/kvm/arm/arm.c 
b/virt/kvm/arm/arm.c index 48d0ec44ad77..f5390ac2165b 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -639,7 +639,6 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu) /** * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code * @vcpu: The VCPU pointer - * @run: The kvm_run structure pointer used for userspace state
[PATCH] KVM: PPC: Book3S HV: Handle non-present PTEs in page fault functions
Since cd758a9b57ee "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler", it's been possible in fairly rare circumstances to load a non-present PTE in kvmppc_book3s_hv_page_fault() when running a guest on a POWER8 host. Because that case wasn't checked for, we could misinterpret the non-present PTE as being a cache-inhibited PTE. That could mismatch with the corresponding hash PTE, which would cause the function to fail with -EFAULT a little further down. That would propagate up to the KVM_RUN ioctl() generally causing the KVM userspace (usually qemu) to fall over. This addresses the problem by catching that case and returning to the guest instead, letting it fault again, and retrying the whole page fault from the beginning. For completeness, this fixes the radix page fault handler in the same way. For radix this didn't cause any obvious misbehaviour, because we ended up putting the non-present PTE into the guest's partition-scoped page tables, leading immediately to another hypervisor data/instruction storage interrupt, which would go through the page fault path again and fix things up. Fixes: cd758a9b57ee "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler" Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1820402 Reported-by: David Gibson Signed-off-by: Paul Mackerras --- This is a reworked version of the patch David Gibson sent recently, with the fix applied to the radix case as well. The commit message is mostly stolen from David's patch. 
arch/powerpc/kvm/book3s_64_mmu_hv.c| 9 + arch/powerpc/kvm/book3s_64_mmu_radix.c | 9 + 2 files changed, 10 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 3aecec8..20b7dce 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -604,18 +604,19 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, */ local_irq_disable(); ptep = __find_linux_pte(vcpu->arch.pgdir, hva, NULL, &shift); + pte = __pte(0); + if (ptep) + pte = *ptep; + local_irq_enable(); /* * If the PTE disappeared temporarily due to a THP * collapse, just return and let the guest try again. */ - if (!ptep) { - local_irq_enable(); + if (!pte_present(pte)) { if (page) put_page(page); return RESUME_GUEST; } - pte = *ptep; - local_irq_enable(); hpa = pte_pfn(pte) << PAGE_SHIFT; pte_size = PAGE_SIZE; if (shift) diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index 134fbc1..7bf94ba 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -815,18 +815,19 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu, */ local_irq_disable(); ptep = __find_linux_pte(vcpu->arch.pgdir, hva, NULL, &shift); + pte = __pte(0); + if (ptep) + pte = *ptep; + local_irq_enable(); /* * If the PTE disappeared temporarily due to a THP * collapse, just return and let the guest try again. */ - if (!ptep) { - local_irq_enable(); + if (!pte_present(pte)) { if (page) put_page(page); return RESUME_GUEST; } - pte = *ptep; - local_irq_enable(); /* If we're logging dirty pages, always map single pages */ large_enable = !(memslot->flags & KVM_MEM_LOG_DIRTY_PAGES); -- 2.7.4
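The reordering made in both hunks can be illustrated outside the kernel. What follows is a toy model only, not kernel code: toy_pte_t, TOY_PTE_PRESENT, TOY_PAGE_SHIFT and the toy_* functions are hypothetical stand-ins for pte_t, pte_present() and the fault handlers, and the interrupt disabling around the lookup is not modelled. It shows the PTE being snapshotted first, with a missing entry treated as an all-zero PTE, so that one presence check covers both the NULL pointer case and the non-present case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
typedef uint64_t toy_pte_t;
#define TOY_PTE_PRESENT 0x1ULL
#define TOY_PAGE_SHIFT  12

enum toy_result { TOY_RESUME_GUEST, TOY_MAPPED };

static int toy_pte_present(toy_pte_t pte)
{
	return (pte & TOY_PTE_PRESENT) != 0;
}

/*
 * Snapshot the PTE first (in the kernel this happens with interrupts
 * disabled), treating a missing entry as an all-zero, non-present PTE.
 * A single presence check then handles both "no PTE pointer" and
 * "PTE exists but is not present".
 */
static enum toy_result toy_page_fault(const toy_pte_t *ptep, uint64_t *hpa)
{
	toy_pte_t pte = 0;

	if (ptep)
		pte = *ptep;

	if (!toy_pte_present(pte))
		return TOY_RESUME_GUEST;	/* let the guest fault again */

	/* Strip the low flag bits to get a page-aligned address. */
	*hpa = (pte >> TOY_PAGE_SHIFT) << TOY_PAGE_SHIFT;
	return TOY_MAPPED;
}
```

In this model a NULL pointer and a cleared present bit both lead to TOY_RESUME_GUEST, which mirrors the retry behaviour described in the commit message.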
Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2
* Rich Felker: > My preference would be that it work just like the i386 AT_SYSINFO > where you just replace "int $128" with "call *%%gs:16" and the kernel > provides a stub in the vdso that performs either scv or the old > mechanism with the same calling convention. The i386 mechanism has received some criticism because it provides an effective means to redirect execution flow to anyone who can write to the TCB. I am not sure if it makes sense to copy it.
Re: [PATCH v2] powerpc/setup_64: Set cache-line-size based on cache-block-size
Hi All, On Wed, 2020-03-25 at 16:18 +1300, Chris Packham wrote: > If {i,d}-cache-block-size is set and {i,d}-cache-line-size is not, > use > the block-size value for both. Per the devicetree spec cache-line- > size > is only needed if it differs from the block size. > > Signed-off-by: Chris Packham > --- > It looks as though the bsizep = lsizep is not required per the spec > but it's > probably safer to retain it. > > Changes in v2: > - Scott pointed out that u-boot should be filling in the cache > properties > (which it does). But it does not specify a cache-line-size because > it > provides a cache-block-size and the spec says you don't have to if > they are > the same. So the error is in the parsing not in the devicetree > itself. > Ping? This thread went kind of quiet. > arch/powerpc/kernel/setup_64.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/arch/powerpc/kernel/setup_64.c > b/arch/powerpc/kernel/setup_64.c > index e05e6dd67ae6..dd8a238b54b8 100644 > --- a/arch/powerpc/kernel/setup_64.c > +++ b/arch/powerpc/kernel/setup_64.c > @@ -516,6 +516,8 @@ static bool __init parse_cache_info(struct > device_node *np, > lsizep = of_get_property(np, propnames[3], NULL); > if (bsizep == NULL) > bsizep = lsizep; > + if (lsizep == NULL) > + lsizep = bsizep; > if (lsizep != NULL) > lsize = be32_to_cpu(*lsizep); > if (bsizep != NULL)
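For anyone skimming the thread, the fallback in the quoted hunk can be sketched as a standalone function. This is an illustration under assumptions: struct toy_cache_sizes, toy_resolve_sizes() and the dflt parameter are made-up names, the property values are taken as already being in CPU byte order (the kernel code applies be32_to_cpu()), and a NULL pointer stands for a property that is absent from the devicetree node.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
struct toy_cache_sizes {
	uint32_t block_size;
	uint32_t line_size;
};

/*
 * Each property falls back to the other one when it is missing.
 * If neither is present, both keep the caller-supplied default.
 */
static struct toy_cache_sizes toy_resolve_sizes(const uint32_t *bsizep,
						const uint32_t *lsizep,
						uint32_t dflt)
{
	struct toy_cache_sizes out = { dflt, dflt };

	if (bsizep == NULL)
		bsizep = lsizep;	/* existing behaviour */
	if (lsizep == NULL)
		lsizep = bsizep;	/* fallback added by the patch */
	if (lsizep != NULL)
		out.line_size = *lsizep;
	if (bsizep != NULL)
		out.block_size = *bsizep;
	return out;
}
```

With only a block size provided, as in the u-boot case described above, the line size now takes the block size value instead of staying at the default.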
Re: [PATCH v2, 1/5] powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable
From: Scott Wood >> +bool "32-bit kernel" > >Why make that user selectable ? > >Either a kernel is 64-bit or it is 32-bit. So having PPC64 user >selectable is all we need. > >And what is the link between this change and the description in the log ? > >> default y if !PPC64 >> select KASAN_VMALLOC if KASAN && MODULES >> >> @@ -15,6 +15,7 @@ config PPC_BOOK3S_32 >> bool >> >> menu "Processor support" >> + > >Why adding this space ? > >> choice >> prompt "Processor Type" >> depends on PPC32 >> @@ -211,9 +212,9 @@ config PPC_BOOK3E >> depends on PPC_BOOK3E_64 >> >> config E500 >> +bool "e500 Support" >> select FSL_EMB_PERFMON >> select PPC_FSL_BOOK3E >> -bool > >Why make this user-selectable ? This is already selected by the >processors requiring it, ie 8500, e5500 and e6500. > >Is there any other case where we need E500 ? > >And again, what's the link between this change and the description in >the log ? > > >> >> config PPC_E500MC >> bool "e500mc Support" >> > >Christophe Hi, Scott, Christophe! I did not fully understand the difference between configurability and selectability (maybe words I made up) of Kconfig items. You are right that FSL_85XX_CACHE_SRAM should only be selected by a caller and never enabled separately. The same answer applies to Christophe's comments. I will drop this patch in v3. Thanks, Wenhu
Re: CVE-2020-11669: Linux kernel 4.10 to 5.1: powerpc: guest can cause DoS on POWER9 KVM hosts
On Wed, Apr 15, 2020 at 04:03:29PM +0200, Michal Suchánek wrote: > On Wed, Apr 15, 2020 at 10:52:53PM +1000, Andrew Donnellan wrote: > > The Linux kernel for powerpc from v4.10 to v5.1 has a bug where the > > Authority Mask Register (AMR), Authority Mask Override Register (AMOR) and > > User Authority Mask Override Register (UAMOR) are not correctly saved and > > restored when the CPU is going into/coming out of idle state. > > > > On POWER9 CPUs, this means that a CPU may return from idle with the AMR > > value of another thread on the same core. > > > > This allows a trivial Denial of Service attack against KVM hosts, by booting > > a guest kernel which makes use of the AMR, such as a v5.2 or later kernel > > with Kernel Userspace Access Prevention (KUAP) enabled. > > > > The guest kernel will set the AMR to prevent userspace access, then the > > thread will go idle. At a later point, the hardware thread that the guest > > was using may come out of idle and start executing in the host, without > > restoring the host AMR value. The host kernel can get caught in a page fault > > loop, as the AMR is unexpectedly causing memory accesses to fail in the > > host, and the host is eventually rendered unusable. > > Hello, > > shouldn't the kernel restore the host registers when leaving the guest? It does. That's not the bug. > I recall some code exists for handling the *AM*R when leaving guest. Can > the KVM guest enter idle without exiting to host? No, we currently never execute the "stop" instruction in guest context. The bug occurs when a thread that is in the host goes idle and executes the stop instruction to go to a power-saving state, while another thread is executing inside a guest. Hardware loses the first thread's AMR while it is stopped, and as it happens, it is possible for the first thread to wake up with the contents of its AMR equal to the other thread's AMR. This can happen even if the first thread has never executed in the guest. 
The kernel needs to save and restore AMR (among other registers) across the stop instruction because of this hardware behaviour. We missed the AMR initially, which is what led to this vulnerability. Paul.
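To make the failure mode concrete, here is a toy model of it in plain C. It is not kernel code and does not touch real SPRs: struct toy_core, toy_hw_stop_and_wake() and the toy_idle_* functions are invented for illustration, and the model makes the loss deterministic (the woken thread always ends up with the sibling's value), whereas on real hardware this is only something that can happen.

```c
#include <stdint.h>

/* Toy model only: one "core" with two hardware threads. */
struct toy_core {
	uint64_t amr[2];	/* per-thread AMR as seen by software */
};

/*
 * Model of the hardware behaviour described above: the AMR of a stopped
 * thread is lost, and on wakeup it holds the sibling thread's value.
 */
static void toy_hw_stop_and_wake(struct toy_core *core, int thread)
{
	core->amr[thread] = core->amr[thread ^ 1];
}

/* Idle path without save/restore: the thread wakes with a foreign AMR. */
static uint64_t toy_idle_buggy(struct toy_core *core, int thread)
{
	toy_hw_stop_and_wake(core, thread);
	return core->amr[thread];
}

/* Idle path with the AMR saved before stop and restored after wakeup. */
static uint64_t toy_idle_fixed(struct toy_core *core, int thread)
{
	uint64_t saved = core->amr[thread];

	toy_hw_stop_and_wake(core, thread);
	core->amr[thread] = saved;
	return core->amr[thread];
}
```

In the model, thread 0 plays the host thread that goes idle and thread 1 plays the thread running a guest with a restrictive AMR. Only the fixed path leaves thread 0 with its own value after wakeup.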
[PATCH] KVM: PPC: Handle non-present PTEs in kvmppc_book3s_hv_page_fault()
Since cd758a9b57ee "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler", it's been possible in fairly rare circumstances to load a non-present PTE in kvmppc_book3s_hv_page_fault() when running a guest on a POWER8 host. Because that case wasn't checked for, we could misinterpret the non-present PTE as being a cache-inhibited PTE. That could mismatch with the corresponding hash PTE, which would cause the function to fail with -EFAULT a little further down. That would propagate up to the KVM_RUN ioctl() generally causing the KVM userspace (usually qemu) to fall over. This addresses the problem by catching that case and returning to the guest instead. Fixes: cd758a9b57ee "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler" Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1820402 Suggested-by: Paul Mackerras Signed-off-by: David Gibson --- arch/powerpc/kvm/book3s_64_mmu_hv.c | 5 + 1 file changed, 5 insertions(+) diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 6404df613ea3..394fca8e630a 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -616,6 +616,11 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, } pte = *ptep; local_irq_enable(); + if (!pte_present(pte)) { + if (page) + put_page(page); + return RESUME_GUEST; + } hpa = pte_pfn(pte) << PAGE_SHIFT; pte_size = PAGE_SIZE; if (shift) -- 2.25.2
Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2
Excerpts from Rich Felker's message of April 16, 2020 1:03 pm: > On Thu, Apr 16, 2020 at 12:53:31PM +1000, Nicholas Piggin wrote: >> > Not to mention the dcache line to access >> > __hwcap or whatever, and the icache lines to setup access TOC-relative >> > access to it. (Of course you could put a copy of its value in TLS at a >> > fixed offset, which would somewhat mitigate both.) >> > >> >> And finally, the HWCAP test can eventually go away in future. A vdso >> >> call can not. >> > >> > We support nearly arbitrarily old kernels (with limited functionality) >> > and hardware (with full functionality) and don't intend for that to >> > change, ever. But indeed glibc might want too eventually drop the >> > check. >> >> Ah, cool. Any build-time flexibility there? >> >> We may or may not be getting a new ABI that will use instructions not >> supported by old processors. >> >> https://sourceware.org/legacy-ml/binutils/2019-05/msg00331.html >> >> Current ABI continues to work of course and be the default for some >> time, but building for new one would give some opportunity to drop >> such support for old procs, at least for glibc. > > What does "new ABI" entail to you? In the terminology I use with musl, > "new ABI" and "new ISA level" are different things. You can compile > (explicit -march or compiler default) binaries that won't run on older > cpus due to use of new insns etc., but we consider it the same ABI if > you can link code for an older/baseline ISA level with the > newer-ISA-level object files, i.e. if the interface surface for > linkage remains compatible. We also try to avoid gratuitous > proliferation of different ABIs unless there's a strong underlying > need (like addition of softfloat ABIs for archs that usually have FPU, > or vice versa). Yeah it will be a new ABI type that also requires a new ISA level. 
As far as I know (and I'm not on the toolchain side) there will be some call compatibility between the two, so it may be fine to continue with the existing ABI for libc. But it's just something that comes to mind as a build-time cutover where we might be able to assume particular features. > In principle the same could be done for kernels except it's a bigger > silent gotcha (possible ENOSYS in places where it shouldn't be able to > happen rather than a trapping SIGILL or similar) and there's rarely > any serious performance or size benefit to dropping support for older > kernels. Right, I don't think it'd be a huge problem whatever way we go, compared with the cost of the system call. Thanks, Nick
Re: [PATCH] papr/scm: Add bad memory ranges to nvdimm bad ranges
Hi Santosh, Some review comments below. Santosh Sivaraj writes: > Subscribe to the MCE notification and add the physical address which > generated a memory error to nvdimm bad range. > > Signed-off-by: Santosh Sivaraj > --- > > This patch depends on "powerpc/mce: Add MCE notification chain" [1]. > > Unlike the previous series[2], the patch adds badblock registration only for > pseries scm driver. Handling badblocks for baremetal (powernv) PMEM will be > done > later and if possible get the badblock handling as a common code. > > [1] > https://lore.kernel.org/linuxppc-dev/20200330071219.12284-1-ganes...@linux.ibm.com/ > [2] > https://lore.kernel.org/linuxppc-dev/20190820023030.18232-1-sant...@fossix.org/ > > arch/powerpc/platforms/pseries/papr_scm.c | 96 ++- > 1 file changed, 95 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/platforms/pseries/papr_scm.c > b/arch/powerpc/platforms/pseries/papr_scm.c > index 0b4467e378e5..5012cbf4606e 100644 > --- a/arch/powerpc/platforms/pseries/papr_scm.c > +++ b/arch/powerpc/platforms/pseries/papr_scm.c > @@ -12,6 +12,8 @@ > #include > #include > #include > +#include > +#include > > #include > > @@ -39,8 +41,12 @@ struct papr_scm_priv { > struct resource res; > struct nd_region *region; > struct nd_interleave_set nd_set; > + struct list_head region_list; > }; > > +LIST_HEAD(papr_nd_regions); > +DEFINE_MUTEX(papr_ndr_lock); > + > static int drc_pmem_bind(struct papr_scm_priv *p) > { > unsigned long ret[PLPAR_HCALL_BUFSIZE]; > @@ -372,6 +378,10 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p) > dev_info(dev, "Region registered with target node %d and online > node %d", >target_nid, online_nid); > > + mutex_lock(&papr_ndr_lock); > + list_add_tail(&p->region_list, &papr_nd_regions); > + mutex_unlock(&papr_ndr_lock); > + > return 0; > > err: nvdimm_bus_unregister(p->bus); > @@ -379,6 +389,68 @@ err: nvdimm_bus_unregister(p->bus); > return -ENXIO; > } > > +void papr_scm_add_badblock(struct nd_region *region, struct 
nvdimm_bus *bus, > +u64 phys_addr) > +{ > + u64 aligned_addr = ALIGN_DOWN(phys_addr, L1_CACHE_BYTES); > + > + if (nvdimm_bus_add_badrange(bus, aligned_addr, L1_CACHE_BYTES)) { > + pr_err("Bad block registration for 0x%llx failed\n", phys_addr); > + return; > + } > + > + pr_debug("Add memory range (0x%llx - 0x%llx) as bad range\n", > + aligned_addr, aligned_addr + L1_CACHE_BYTES); > + > + nvdimm_region_notify(region, NVDIMM_REVALIDATE_POISON); > +} > + > +static int handle_mce_ue(struct notifier_block *nb, unsigned long val, > + void *data) > +{ > + struct machine_check_event *evt = data; > + struct papr_scm_priv *p; > + u64 phys_addr; > + bool found = false; > + > + if (evt->error_type != MCE_ERROR_TYPE_UE) > + return NOTIFY_DONE; > + > + if (list_empty(&papr_nd_regions)) > + return NOTIFY_DONE; > + > + phys_addr = evt->u.ue_error.physical_address + > + (evt->u.ue_error.effective_address & ~PAGE_MASK); Though it seems that you are trying to get the actual physical address from the page-aligned evt->u.ue_error.physical_address, it would be nice if you could add a comment explaining why you are doing this seemingly weird math with real and effective addresses here. > + > + if (!evt->u.ue_error.physical_address_provided || > + !is_zone_device_page(pfn_to_page(phys_addr >> PAGE_SHIFT))) > + return NOTIFY_DONE; > + > + /* mce notifier is called from a process context, so mutex is safe */ > + mutex_lock(&papr_ndr_lock); > + list_for_each_entry(p, &papr_nd_regions, region_list) { > + struct resource res = p->res; > + > + if (phys_addr >= res.start && phys_addr <= res.end) { > + found = true; > + break; > + } > + } > + > + mutex_unlock(&papr_ndr_lock); > + > + if (!found) > + return NOTIFY_DONE; > + > + papr_scm_add_badblock(p->region, p->bus, phys_addr); I see a possible race between papr_scm_add_badblock() and papr_scm_remove(), as a bad block may be reported just as a region is being removed. I would recommend calling papr_scm_add_badblock() while holding papr_ndr_lock. 
> + > + return NOTIFY_OK; > +} > + > +static struct notifier_block mce_ue_nb = { > + .notifier_call = handle_mce_ue > +}; > + > static int papr_scm_probe(struct platform_device *pdev) > { > struct device_node *dn = pdev->dev.of_node; > @@ -476,6 +548,10 @@ static int papr_scm_remove(struct platform_device *pdev) > { > struct papr_scm_priv *p = platform_get_drvdata(pdev); > > + mutex_lock(&papr_ndr_lock); > + list_del(&(p->region_list)); > + mutex_unlock(&papr_ndr_lock); > + > nvdimm_bus_unregister(p->bus
Re: [Bug 206203] kmemleak reports various leaks in drivers/of/unittest.c
On 4/8/20 10:22 AM, Frank Rowand wrote: > Hi Michael, > > On 4/7/20 10:13 PM, Michael Ellerman wrote: >> bugzilla-dae...@bugzilla.kernel.org writes: >>> https://bugzilla.kernel.org/show_bug.cgi?id=206203 >>> >>> Erhard F. (erhar...@mailbox.org) changed: >>> >>>What|Removed |Added >>> >>> Attachment #286801|0 |1 >>> is obsolete|| >>> >>> --- Comment #10 from Erhard F. (erhar...@mailbox.org) --- >>> Created attachment 288189 >>> --> https://bugzilla.kernel.org/attachment.cgi?id=288189&action=edit >>> kmemleak output (kernel 5.6.2, Talos II) >> >> These are all in or triggered by the of unittest code AFAICS. >> Content of the log reproduced below. >> >> Frank/Rob, are these memory leaks expected? > > Thanks for the report. I'll look at each one. Only one of the leaks was expected. I have patches to fix the unexpected leaks and to remove the expected leak so that the kmemleak report of it will not have to be checked again. I expect to send the patch series tomorrow (Thursday). -Frank > > -Frank > > >> >> cheers >> >> >> unreferenced object 0xc007eb89ca58 (size 192): >> comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s) >> hex dump (first 32 bytes): >> c0 00 00 00 00 d9 21 38 00 00 00 00 00 00 00 00 ..!8 >> c0 00 00 07 ec 97 80 08 00 00 00 00 00 00 00 00 >> backtrace: >> [<07b50c76>] .__of_node_dup+0x38/0x1c0 >> [] .of_unittest_changeset+0x13c/0xa20 >> [<925a8013>] .of_unittest+0x1ba0/0x3778 >> [ ] .do_one_initcall+0x7c/0x420 >> [ ] .kernel_init_freeable+0x318/0x3d8 >> [<01b957ee>] .kernel_init+0x14/0x168 >> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68 >> unreferenced object 0xc007ec978008 (size 8): >> comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s) >> hex dump (first 8 bytes): >> 6e 31 00 6b 6b 6b 6b a5 n1.. 
>> backtrace: >> [ ] .kstrdup+0x44/0xb0 >> [ ] .__of_node_dup+0x50/0x1c0 >> [ ] .of_unittest_changeset+0x13c/0xa20 >> [<925a8013>] .of_unittest+0x1ba0/0x3778 >> [ ] .do_one_initcall+0x7c/0x420 >> [ ] .kernel_init_freeable+0x318/0x3d8 >> [<01b957ee>] .kernel_init+0x14/0x168 >> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68 >> unreferenced object 0xc007eb89e318 (size 192): >> comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s) >> hex dump (first 32 bytes): >> c0 00 00 00 00 d9 21 38 00 00 00 00 00 00 00 00 ..!8 >> c0 00 00 07 ec 97 ab 08 00 00 00 00 00 00 00 00 >> backtrace: >> [<07b50c76>] .__of_node_dup+0x38/0x1c0 >> [<881dc9c4>] .of_unittest_changeset+0x194/0xa20 >> [<925a8013>] .of_unittest+0x1ba0/0x3778 >> [ ] .do_one_initcall+0x7c/0x420 >> [ ] .kernel_init_freeable+0x318/0x3d8 >> [<01b957ee>] .kernel_init+0x14/0x168 >> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68 >> unreferenced object 0xc007ec97ab08 (size 8): >> comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s) >> hex dump (first 8 bytes): >> 6e 32 00 6b 6b 6b 6b a5 n2.. 
>> backtrace: >> [ ] .kstrdup+0x44/0xb0 >> [ ] .__of_node_dup+0x50/0x1c0 >> [<881dc9c4>] .of_unittest_changeset+0x194/0xa20 >> [<925a8013>] .of_unittest+0x1ba0/0x3778 >> [ ] .do_one_initcall+0x7c/0x420 >> [ ] .kernel_init_freeable+0x318/0x3d8 >> [<01b957ee>] .kernel_init+0x14/0x168 >> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68 >> unreferenced object 0xc007eb89e528 (size 192): >> comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s) >> hex dump (first 32 bytes): >> c0 00 00 07 ec 97 bd d8 00 00 00 00 00 00 00 00 >> c0 00 00 07 ec 97 b3 18 00 00 00 00 00 00 00 00 >> backtrace: >> [<07b50c76>] .__of_node_dup+0x38/0x1c0 >> [ ] .of_unittest_changeset+0x1ec/0xa20 >> [<925a8013>] .of_unittest+0x1ba0/0x3778 >> [ ] .do_one_initcall+0x7c/0x420 >> [ ] .kernel_init_freeable+0x318/0x3d8 >> [<01b957ee>] .kernel_init+0x14/0x168 >> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68 >> unreferenced object 0xc007ec97b318 (size 8): >> comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s) >> hex dump (first 8 bytes): >> 6e 32 31 00 6b 6b 6b a5 n21.kkk. >> backtrace: >> [ ] .kstrdup+0x44/0xb0
Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2
On Thu, Apr 16, 2020 at 12:53:31PM +1000, Nicholas Piggin wrote: > > Not to mention the dcache line to access > > __hwcap or whatever, and the icache lines to setup access TOC-relative > > access to it. (Of course you could put a copy of its value in TLS at a > > fixed offset, which would somewhat mitigate both.) > > > >> And finally, the HWCAP test can eventually go away in future. A vdso > >> call can not. > > > > We support nearly arbitrarily old kernels (with limited functionality) > > and hardware (with full functionality) and don't intend for that to > > change, ever. But indeed glibc might want too eventually drop the > > check. > > Ah, cool. Any build-time flexibility there? > > We may or may not be getting a new ABI that will use instructions not > supported by old processors. > > https://sourceware.org/legacy-ml/binutils/2019-05/msg00331.html > > Current ABI continues to work of course and be the default for some > time, but building for new one would give some opportunity to drop > such support for old procs, at least for glibc. What does "new ABI" entail to you? In the terminology I use with musl, "new ABI" and "new ISA level" are different things. You can compile (explicit -march or compiler default) binaries that won't run on older cpus due to use of new insns etc., but we consider it the same ABI if you can link code for an older/baseline ISA level with the newer-ISA-level object files, i.e. if the interface surface for linkage remains compatible. We also try to avoid gratuitous proliferation of different ABIs unless there's a strong underlying need (like addition of softfloat ABIs for archs that usually have FPU, or vice versa). In principle the same could be done for kernels except it's a bigger silent gotcha (possible ENOSYS in places where it shouldn't be able to happen rather than a trapping SIGILL or similar) and there's rarely any serious performance or size benefit to dropping support for older kernels. Rich
Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2
Excerpts from Rich Felker's message of April 16, 2020 12:35 pm: > On Thu, Apr 16, 2020 at 12:24:16PM +1000, Nicholas Piggin wrote: >> >> > Likewise, it's not useful to have different error return mechanisms >> >> > because the caller just has to branch to support both (or the >> >> > kernel-provided stub just has to emulate one for it; that could work >> >> > if you really want to change the bad existing convention). >> >> > >> >> > Thoughts? >> >> >> >> The existing convention has to change somewhat because of the clobbers, >> >> so I thought we could change the error return at the same time. I'm >> >> open to not changing it and using CR0[SO], but others liked the idea. >> >> Pro: it matches sc and vsyscall. Con: it's different from other common >> >> archs. Performnce-wise it would really be a wash -- cost of conditional >> >> branch is not the cmp but the mispredict. >> > >> > If you do the branch on hwcap at each syscall, then you significantly >> > increase code size of every syscall point, likely turning a bunch of >> > trivial functions that didn't need stack frames into ones that do. You >> > also potentially make them need a TOC pointer. Making them all just do >> > an indirect call unconditionally (with pointer in TLS like i386?) is a >> > lot more efficient in code size and at least as good for performance. >> >> I disagree. Doing the long vdso indirect call *necessarily* requires >> touching a new icache line, and even a new TLB entry. Indirect branches > > The increase in number of icache lines from the branch at every > syscall point is far greater than the use of a single extra icache > line shared by all syscalls. That's true, I was thinking of a single function that does the test and calls syscalls, which might be the fair comparison. > Not to mention the dcache line to access > __hwcap or whatever, and the icache lines to setup access TOC-relative > access to it. 
(Of course you could put a copy of its value in TLS at a > fixed offset, which would somewhat mitigate both.) > >> And finally, the HWCAP test can eventually go away in future. A vdso >> call can not. > > We support nearly arbitrarily old kernels (with limited functionality) > and hardware (with full functionality) and don't intend for that to > change, ever. But indeed glibc might want too eventually drop the > check. Ah, cool. Any build-time flexibility there? We may or may not be getting a new ABI that will use instructions not supported by old processors. https://sourceware.org/legacy-ml/binutils/2019-05/msg00331.html Current ABI continues to work of course and be the default for some time, but building for new one would give some opportunity to drop such support for old procs, at least for glibc. > >> If you really want to select with an indirect branch rather than >> direct conditional, you can do that all within the library. > > OK. It's a little bit more work if that's not the interface the kernel > will give us, but it's no big deal. Okay. Thanks, Nick
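The two dispatch styles being argued about here can be put side by side in a small sketch. It is purely illustrative: toy_sc() and toy_scv() are ordinary functions standing in for the two system call instructions, TOY_HWCAP_SCV is a made-up feature bit, and nothing below reflects an actual kernel or libc interface.

```c
#include <stdint.h>

typedef long (*toy_syscall_fn)(long nr);

/* Ordinary functions standing in for the two entry mechanisms. */
static long toy_sc(long nr)  { return nr + 1000; }	/* legacy 'sc' path */
static long toy_scv(long nr) { return nr + 2000; }	/* 'scv' path */

#define TOY_HWCAP_SCV 0x1UL	/* hypothetical feature bit */

/*
 * Style 1: test a hwcap word at every syscall site.  Each site pays for
 * the load of the hwcap value and a conditional branch.
 */
static long toy_syscall_branch(unsigned long hwcap, long nr)
{
	if (hwcap & TOY_HWCAP_SCV)
		return toy_scv(nr);
	return toy_sc(nr);
}

/*
 * Style 2: select the entry point once, inside the library, then have
 * every site make an unconditional indirect call through the pointer.
 */
static toy_syscall_fn toy_select(unsigned long hwcap)
{
	return (hwcap & TOY_HWCAP_SCV) ? toy_scv : toy_sc;
}
```

Both styles reach the same entry point for a given hwcap value. The disagreement in the thread is about the code size and cache footprint of each, which a sketch like this cannot measure.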
Re: [PATCH kernel v2 0/7] powerpc/powenv/ioda: Allow huge DMA window at 4GB
On Thu, Apr 16, 2020 at 12:34 PM Oliver O'Halloran wrote: > > On Thu, Apr 16, 2020 at 11:27 AM Alexey Kardashevskiy wrote: > > > > Anyone? Is it totally useless or wrong approach? Thanks, > > I wouldn't say it's either, but I still hate it. > > The 4GB mode being per-PHB makes it difficult to use unless we force > that mode on 100% of the time which I'd prefer not to do. Ideally > devices that actually support 64bit addressing (which is most of them) > should be able to use no-translate mode when possible since a) It's > faster, and b) It frees up room in the TCE cache devices that actually > need them. I know you've done some testing with 100G NICs and found > the overhead was fine, but IMO that's a bad test since it's pretty > much the best-case scenario since all the devices on the PHB are in > the same PE. The PHB's TCE cache only hits when the TCE matches the > DMA bus address and the PE number for the device so in a multi-PE > environment there's a lot of potential for TCE cache trashing. If > there was one or two PEs under that PHB it's probably not going to > matter, but if you have an NVMe rack with 20 drives it starts to look > a bit ugly. > > That all said, it might be worth doing this anyway since we probably > want the software infrastructure in place to take advantage of it. > Maybe expand the command line parameters to allow it to be enabled on > a per-PHB basis rather than globally. Since we're on the topic, I've been thinking the real issue we have is that we're trying to pick an "optimal" IOMMU config at a point where we don't have enough information to work out what's actually optimal. The IOMMU config is done on a per-PE basis, but since PEs may contain devices with different DMA masks (looking at you, weird AMD audio function) we're always going to have to pick something conservative as the default config for TVE#0 (64k, no bypass mapping) since the driver will tell us what the device actually supports long after the IOMMU configuration is done. 
What we really want is to be able to have separate IOMMU contexts for each device, or at the very least a separate context for the crippled devices. We could allow a per-device IOMMU context by extending the Master / Slave PE thing to cover DMA in addition to MMIO. Right now we only use slave PEs when a device's MMIO BARs extend over multiple m64 segments. When that happens an MMIO error causes the PHB to freeze the PE corresponding to one of those segments, but not any of the others. To present a single "PE" to the EEH core we check the freeze status of each of the slave PEs when the EEH core does a PE status check and if any of them are frozen, we freeze the rest of them too. When a driver sets a limited DMA mask we could move that device to a separate slave PE so that it has its own IOMMU context tailored to its DMA addressing limits. Thoughts? Oliver
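The freeze propagation across slave PEs described above can be sketched roughly as follows. This is a simplified model with invented names (struct toy_pe, toy_group_check_freeze()); the real powernv code goes through firmware calls and its own PE structures, none of which is reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a slave PE; only the freeze state is modelled. */
struct toy_pe {
	bool frozen;
};

/*
 * Status check for a group of slave PEs: if any member is frozen, freeze
 * the remaining members so the group presents as one frozen "PE".
 * Returns the freeze state of the group as a whole.
 */
static bool toy_group_check_freeze(struct toy_pe *slaves, size_t n)
{
	bool any = false;
	size_t i;

	for (i = 0; i < n; i++)
		any = any || slaves[i].frozen;

	if (any)
		for (i = 0; i < n; i++)
			slaves[i].frozen = true;

	return any;
}
```

Moving a device with a limited DMA mask into its own slave PE would add one more member to such a group, so the same check would keep presenting a single PE to the EEH core.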
Re: [PATCH v2 4/4] mm/vmalloc: Hugepage vmalloc mappings
Excerpts from Will Deacon's message of April 15, 2020 8:47 pm: > Hi Nick, > > On Mon, Apr 13, 2020 at 10:53:03PM +1000, Nicholas Piggin wrote: >> For platforms that define HAVE_ARCH_HUGE_VMAP and support PMD vmap mappings, >> have vmalloc attempt to allocate PMD-sized pages first, before falling back >> to small pages. Allocations which use something other than PAGE_KERNEL >> protections are not permitted to use huge pages yet, not all callers expect >> this (e.g., module allocations vs strict module rwx). >> >> This gives a 6x reduction in dTLB misses for a `git diff` (of linux), from >> 45600 to 6500 and a 2.2% reduction in cycles on a 2-node POWER9. > > I wonder if it's worth extending vmap() to handle higher order pages in > a similar way? That might be helpful for tracing PMUs such as Arm SPE, > where the CPU streams tracing data out to a virtually addressed buffer > (see rb_alloc_aux_page()). Yeah it becomes pretty trivial to do that with VM_HUGE_PAGES after this patch, I have something to do it but no callers ready yet, if you have an easy one we can add it. >> This can result in more internal fragmentation and memory overhead for a >> given allocation. It can also cause greater NUMA unbalance on hashdist >> allocations. >> >> There may be other callers that expect small pages under vmalloc but use >> PAGE_KERNEL, I'm not sure if it's feasible to catch them all. An >> alternative would be a new function or flag which enables large mappings, >> and use that in callers. 
>> >> Signed-off-by: Nicholas Piggin >> --- >> include/linux/vmalloc.h | 2 + >> mm/vmalloc.c| 135 +--- >> 2 files changed, 102 insertions(+), 35 deletions(-) >> >> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h >> index 291313a7e663..853b82eac192 100644 >> --- a/include/linux/vmalloc.h >> +++ b/include/linux/vmalloc.h >> @@ -24,6 +24,7 @@ struct notifier_block; /* in notifier.h */ >> #define VM_UNINITIALIZED0x0020 /* vm_struct is not fully >> initialized */ >> #define VM_NO_GUARD 0x0040 /* don't add guard page */ >> #define VM_KASAN0x0080 /* has allocated kasan shadow >> memory */ >> +#define VM_HUGE_PAGES 0x0100 /* may use huge pages */ > > Please can you add a check for this in the arm64 change_memory_common() > code? Other architectures might need something similar, but we need to > forbid changing memory attributes for portions of the huge page. Yeah good idea, I can look about adding some more checks. > > In general, I'm a bit wary of software table walkers tripping over this. > For example, I don't think apply_to_existing_page_range() can handle > huge mappings at all, but the one user (KASAN) only ever uses page mappings > so it's ok there. Right, I have something to warn for apply to page range (and looking at adding support for bigger pages). It doesn't even have a test and warn at the moment which isn't good practice IMO so we should add one even without huge vmap. > >> @@ -2325,9 +2356,11 @@ static struct vm_struct *__get_vm_area_node(unsigned >> long size, >> if (unlikely(!size)) >> return NULL; >> >> -if (flags & VM_IOREMAP) >> -align = 1ul << clamp_t(int, get_count_order_long(size), >> - PAGE_SHIFT, IOREMAP_MAX_ORDER); >> +if (flags & VM_IOREMAP) { >> +align = max(align, >> +1ul << clamp_t(int, get_count_order_long(size), >> + PAGE_SHIFT, IOREMAP_MAX_ORDER)); >> +} > > > I don't follow this part. Please could you explain why you're potentially > aligning above IOREMAP_MAX_ORDER? It doesn't seem to follow from the rest > of the patch. 
Trying to remember. If the caller asks for a particular alignment we shouldn't reduce it. Should put it in another patch. Thanks, Nick
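A minimal userspace sketch of the alignment change discussed above, assuming illustrative values for PAGE_SHIFT and IOREMAP_MAX_ORDER (the real ones are per-architecture kernel constants). The `sketch_*` and `ioremap_align_*` names are hypothetical helpers, not kernel functions. It only shows why taking max() with the caller's alignment can yield a value above IOREMAP_MAX_ORDER, which is what Will asked about.

```c
#include <assert.h>

/* Illustrative stand-ins; the real values are per-architecture kernel constants. */
#define SKETCH_PAGE_SHIFT        12
#define SKETCH_IOREMAP_MAX_ORDER 21

/* Order of the smallest power of two >= size, like get_count_order_long(). */
static int sketch_count_order(unsigned long size)
{
	int order = 0;

	assert(size != 0);
	while (order < (int)(sizeof(unsigned long) * 8 - 1) &&
	       (1UL << order) < size)
		order++;
	return order;
}

static int sketch_clamp(int v, int lo, int hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Behaviour before the hunk: the caller's alignment is overwritten. */
static unsigned long ioremap_align_old(unsigned long size, unsigned long align)
{
	(void)align;
	return 1UL << sketch_clamp(sketch_count_order(size),
				   SKETCH_PAGE_SHIFT, SKETCH_IOREMAP_MAX_ORDER);
}

/* Behaviour after the hunk: never go below what the caller asked for. */
static unsigned long ioremap_align_new(unsigned long size, unsigned long align)
{
	unsigned long computed = ioremap_align_old(size, align);

	return align > computed ? align : computed;
}
```

With these assumed constants, a caller asking for 4MB alignment on an 8KB VM_IOREMAP area would get 8KB from the old code and 4MB from the new code, so the result is no longer bounded by IOREMAP_MAX_ORDER.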
Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2
On Thu, Apr 16, 2020 at 12:24:16PM +1000, Nicholas Piggin wrote: > >> > Likewise, it's not useful to have different error return mechanisms > >> > because the caller just has to branch to support both (or the > >> > kernel-provided stub just has to emulate one for it; that could work > >> > if you really want to change the bad existing convention). > >> > > >> > Thoughts? > >> > >> The existing convention has to change somewhat because of the clobbers, > >> so I thought we could change the error return at the same time. I'm > >> open to not changing it and using CR0[SO], but others liked the idea. > >> Pro: it matches sc and vsyscall. Con: it's different from other common > >> archs. Performance-wise it would really be a wash -- cost of conditional > >> branch is not the cmp but the mispredict. > > > > If you do the branch on hwcap at each syscall, then you significantly > > increase code size of every syscall point, likely turning a bunch of > > trivial functions that didn't need stack frames into ones that do. You > > also potentially make them need a TOC pointer. Making them all just do > > an indirect call unconditionally (with pointer in TLS like i386?) is a > > lot more efficient in code size and at least as good for performance. > > I disagree. Doing the long vdso indirect call *necessarily* requires > touching a new icache line, and even a new TLB entry. Indirect branches The increase in number of icache lines from the branch at every syscall point is far greater than the use of a single extra icache line shared by all syscalls. Not to mention the dcache line to access __hwcap or whatever, and the icache lines to set up TOC-relative access to it. (Of course you could put a copy of its value in TLS at a fixed offset, which would somewhat mitigate both.) > And finally, the HWCAP test can eventually go away in future. A vdso > call can not.
We support nearly arbitrarily old kernels (with limited functionality) and hardware (with full functionality) and don't intend for that to change, ever. But indeed glibc might want to eventually drop the check. > If you really want to select with an indirect branch rather than > direct conditional, you can do that all within the library. OK. It's a little bit more work if that's not the interface the kernel will give us, but it's no big deal. Rich
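A rough sketch of the "select with an indirect branch within the library" option mentioned above, assuming the HWCAP2 bit is tested once at startup and every syscall then goes through a function pointer. The stubs `entry_sc` and `entry_scv` are hypothetical placeholders that only return a marker value; a real libc would put the `sc` or `scv 0` sequence there. The PPC_FEATURE2_SCV value is what I believe the kernel uapi header defines, so check it against `asm/cputable.h` before relying on it.

```c
#include <assert.h>
#include <stddef.h>

#ifndef PPC_FEATURE2_SCV
#define PPC_FEATURE2_SCV 0x00100000UL /* assumed value of the HWCAP2 bit */
#endif

typedef long (*syscall_entry_fn)(long nr, long arg);

/* Hypothetical stand-ins for the two instruction sequences. */
static long entry_sc(long nr, long arg)  { (void)nr; (void)arg; return 0; }
static long entry_scv(long nr, long arg) { (void)nr; (void)arg; return 1; }

/* Pick the entry point from the HWCAP2 word; done once, not per call. */
static syscall_entry_fn select_entry(unsigned long hwcap2)
{
	return (hwcap2 & PPC_FEATURE2_SCV) ? entry_scv : entry_sc;
}

/* In a real libc this pointer might live in TLS at a fixed offset. */
static syscall_entry_fn syscall_entry;

static void syscall_entry_init(unsigned long hwcap2)
{
	syscall_entry = select_entry(hwcap2);
}

/* Every syscall site does one unconditional indirect call. */
static long do_syscall(long nr, long arg)
{
	assert(syscall_entry != NULL);
	return syscall_entry(nr, arg);
}
```

On Linux the argument to `syscall_entry_init()` would typically come from `getauxval(AT_HWCAP2)` in `<sys/auxv.h>`. The trade-off argued in the thread is between this single shared call target and a conditional branch on the HWCAP bit at every syscall site.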
Re: [PATCH kernel v2 0/7] powerpc/powenv/ioda: Allow huge DMA window at 4GB
On Thu, Apr 16, 2020 at 11:27 AM Alexey Kardashevskiy wrote: > > Anyone? Is it totally useless or wrong approach? Thanks, I wouldn't say it's either, but I still hate it. The 4GB mode being per-PHB makes it difficult to use unless we force that mode on 100% of the time, which I'd prefer not to do. Ideally devices that actually support 64bit addressing (which is most of them) should be able to use no-translate mode when possible since a) It's faster, and b) It frees up room in the TCE cache for devices that actually need it. I know you've done some testing with 100G NICs and found the overhead was fine, but IMO that's a bad test: it's pretty much the best-case scenario, since all the devices on the PHB are in the same PE. The PHB's TCE cache only hits when the TCE matches the DMA bus address and the PE number for the device, so in a multi-PE environment there's a lot of potential for TCE cache thrashing. If there were one or two PEs under that PHB it's probably not going to matter, but if you have an NVMe rack with 20 drives it starts to look a bit ugly. That all said, it might be worth doing this anyway since we probably want the software infrastructure in place to take advantage of it. Maybe expand the command line parameters to allow it to be enabled on a per-PHB basis rather than globally. Oliver
Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2
Excerpts from Rich Felker's message of April 16, 2020 10:48 am: > On Thu, Apr 16, 2020 at 10:16:54AM +1000, Nicholas Piggin wrote: >> Excerpts from Rich Felker's message of April 16, 2020 8:55 am: >> > On Thu, Apr 16, 2020 at 07:45:09AM +1000, Nicholas Piggin wrote: >> >> I would like to enable Linux support for the powerpc 'scv' instruction, >> >> as a faster system call instruction. >> >> >> >> This requires two things to be defined: Firstly a way to advertise to >> >> userspace that kernel supports scv, and a way to allocate and advertise >> >> support for individual scv vectors. Secondly, a calling convention ABI >> >> for this new instruction. >> >> >> >> Thanks to those who commented last time, since then I have removed my >> >> answered questions and unpopular alternatives but you can find them >> >> here >> >> >> >> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-January/203545.html >> >> >> >> Let me try one more with a wider cc list, and then we'll get something >> >> merged. Any questions or counter-opinions are welcome. >> >> >> >> System Call Vectored (scv) ABI >> >> == >> >> >> >> The scv instruction is introduced with POWER9 / ISA3, it comes with an >> >> rfscv counter-part. The benefit of these instructions is performance >> >> (trading slower SRR0/1 with faster LR/CTR registers, and entering the >> >> kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR >> >> updates. The scv instruction has 128 interrupt entry points (not enough >> >> to cover the Linux system call space). >> >> >> >> The proposal is to assign scv numbers very conservatively and allocate >> >> them as individual HWCAP features as we add support for more. The zero >> >> vector ('scv 0') will be used for normal system calls, equivalent to 'sc'. >> >> >> >> Advertisement >> >> >> >> Linux has not enabled FSCR[SCV] yet, so the instruction will cause a >> >> SIGILL in current environments. 
Linux has defined a HWCAP2 bit >> >> PPC_FEATURE2_SCV for SCV support, but does not set it. >> >> >> >> When scv instruction support and the scv 0 vector for system calls are >> >> added, PPC_FEATURE2_SCV will indicate support for these. Other vectors >> >> should not be used without future HWCAP bits indicating support, which is >> >> how we will allocate them. (Should unallocated ones generate SIGILL, or >> >> return -ENOSYS in r3?) >> >> >> >> Calling convention >> >> >> >> The proposal is for scv 0 to provide the standard Linux system call ABI >> >> with the following differences from sc convention[1]: >> >> >> >> - LR is to be volatile across scv calls. This is necessary because the >> >> scv instruction clobbers LR. From previous discussion, this should be >> >> possible to deal with in GCC clobbers and CFI. >> >> >> >> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the >> >> kernel system call exit to avoid restoring the CR register (although >> >> we probably still would anyway to avoid information leak). >> >> >> >> - Error handling: I think the consensus has been to move to using negative >> >> return value in r3 rather than CR0[SO]=1 to indicate error, which >> >> matches >> >> most other architectures and is closer to a function call. >> >> >> >> The number of scratch registers (r9-r12) at kernel entry seems >> >> sufficient that we don't have any costly spilling, patch is here[2]. >> >> >> >> [1] >> >> https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst >> >> [2] >> >> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-February/204840..html >> > >> > My preference would be that it work just like the i386 AT_SYSINFO >> > where you just replace "int $128" with "call *%%gs:16" and the kernel >> > provides a stub in the vdso that performs either scv or the old >> > mechanism with the same calling convention. 
Then if the kernel doesn't >> > provide it (because the kernel is too old) libc would have to provide >> > its own stub that uses the legacy method and matches the calling >> > convention of the one the kernel is expected to provide. >> >> I'm not sure if that's necessary. That's done on x86-32 because they >> select different sequences to use based on the CPU running and if the host >> kernel is 32 or 64 bit. Sure they could in theory have a bunch of HWCAP >> bits and select the right sequence in libc as well I suppose. > > It's not just a HWCAP. It's a contract between the kernel and > userspace to support a particular calling convention that's not > exposed except as the public entry point the kernel exports via > AT_SYSINFO. Right. >> > Note that any libc that actually makes use of the new functionality is >> > not going to be able to make clobbers conditional on support for it; >> > branching around different clobbers is going to defeat any gains vs >> > always just treating anything clobbered by either method as clobbered. >> >> Well it would have to test HWCAP and
Re: [PATCH 01/34] docs: filesystems: fix references for doc files there
On 2020/4/15 22:32, Mauro Carvalho Chehab wrote: > Several files there were renamed to ReST. Fix the broken > references. > > Signed-off-by: Mauro Carvalho Chehab > --- > Documentation/ABI/stable/sysfs-devices-node | 2 +- > Documentation/ABI/testing/procfs-smaps_rollup | 2 +- > Documentation/admin-guide/cpu-load.rst| 2 +- > Documentation/admin-guide/nfs/nfsroot.rst | 2 +- > Documentation/driver-api/driver-model/device.rst | 2 +- > Documentation/driver-api/driver-model/overview.rst| 2 +- > Documentation/filesystems/dax.txt | 2 +- > Documentation/filesystems/dnotify.txt | 2 +- > Documentation/filesystems/ramfs-rootfs-initramfs.rst | 2 +- > Documentation/powerpc/firmware-assisted-dump.rst | 2 +- > Documentation/process/adding-syscalls.rst | 2 +- > .../translations/it_IT/process/adding-syscalls.rst| 2 +- > Documentation/translations/zh_CN/filesystems/sysfs.txt| 6 +++--- > drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h | 2 +- > fs/Kconfig| 2 +- > fs/Kconfig.binfmt | 2 +- > fs/adfs/Kconfig | 2 +- > fs/affs/Kconfig | 2 +- > fs/afs/Kconfig| 6 +++--- > fs/bfs/Kconfig| 2 +- > fs/cramfs/Kconfig | 2 +- > fs/ecryptfs/Kconfig | 2 +- > fs/fat/Kconfig| 8 > fs/fuse/Kconfig | 2 +- > fs/fuse/dev.c | 2 +- > fs/hfs/Kconfig| 2 +- > fs/hpfs/Kconfig | 2 +- > fs/isofs/Kconfig | 2 +- > fs/namespace.c| 2 +- > fs/notify/inotify/Kconfig | 2 +- > fs/ntfs/Kconfig | 2 +- > fs/ocfs2/Kconfig | 2 +- For ocfs2 part, Acked-by: Joseph Qi > fs/overlayfs/Kconfig | 6 +++--- > fs/proc/Kconfig | 4 ++-- > fs/romfs/Kconfig | 2 +- > fs/sysfs/dir.c| 2 +- > fs/sysfs/file.c | 2 +- > fs/sysfs/mount.c | 2 +- > fs/sysfs/symlink.c| 2 +- > fs/sysv/Kconfig | 2 +- > fs/udf/Kconfig| 2 +- > include/linux/relay.h | 2 +- > include/linux/sysfs.h | 2 +- > kernel/relay.c| 2 +- > 44 files changed, 54 insertions(+), 54 deletions(-) > > diff --git a/Documentation/ABI/stable/sysfs-devices-node > b/Documentation/ABI/stable/sysfs-devices-node > index df8413cf1468..484fc04bcc25 100644 > --- a/Documentation/ABI/stable/sysfs-devices-node > 
+++ b/Documentation/ABI/stable/sysfs-devices-node > @@ -54,7 +54,7 @@ Date: October 2002 > Contact: Linux Memory Management list > Description: > Provides information about the node's distribution and memory > - utilization. Similar to /proc/meminfo, see > Documentation/filesystems/proc.txt > + utilization. Similar to /proc/meminfo, see > Documentation/filesystems/proc.rst > > What:/sys/devices/system/node/nodeX/numastat > Date:October 2002 > diff --git a/Documentation/ABI/testing/procfs-smaps_rollup > b/Documentation/ABI/testing/procfs-smaps_rollup > index 274df44d8b1b..046978193368 100644 > --- a/Documentation/ABI/testing/procfs-smaps_rollup > +++ b/Documentation/ABI/testing/procfs-smaps_rollup > @@ -11,7 +11,7 @@ Description: > Additionally, the fields Pss_Anon, Pss_File and Pss_Shmem > are not present in /proc/pid/smaps. These fields represent > the sum of the Pss field of each type (anon, file, shmem). > - For more details, see Documentation/filesystems/proc.txt > + For more details, see Documentation/filesystems/proc.rst > and the procfs man page. > > Typical output looks like this: > diff --git a/Documentation/admin-guide/cpu-load.rst > b/Documentation/admin-guide/cpu-load.rst > index 2d01ce43d2a2..ebdecf864080 100644 > --- a/Documentation/admin-guide/cpu-load.rst > +++ b/Documentation/ad
Re: [PATCH kernel v2 0/7] powerpc/powenv/ioda: Allow huge DMA window at 4GB
Anyone? Is it totally useless or wrong approach? Thanks, On 08/04/2020 19:43, Alexey Kardashevskiy wrote: > > > On 23/03/2020 18:53, Alexey Kardashevskiy wrote: >> Here is an attempt to support bigger DMA space for devices >> supporting DMA masks less than 59 bits (GPUs come to mind >> first). POWER9 PHBs have an option to map 2 windows at 0 >> and select a window based on DMA address being below or above >> 4GB. >> >> This adds the "iommu=iommu_bypass" kernel parameter and >> supports VFIO+pseries machine - currently this requires telling >> upstream+unmodified QEMU about this via >> -global spapr-pci-host-bridge.dma64_win_addr=0x1 >> or per-phb property. 4/4 advertises the new option but >> there is no automation around it in QEMU (should it be?). >> >> For now it is either 1<<59 or 4GB mode; dynamic switching is >> not supported (could be via sysfs). >> >> This is a rebased version of >> https://lore.kernel.org/kvm/20191202015953.127902-1-...@ozlabs.ru/ >> >> The main change since v1 is that now it is 7 patches with >> clearer separation of steps. >> >> >> This is based on 6c90b86a745a "Merge tag 'mmc-v5.6-rc6' of >> git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc" >> >> Please comment. Thanks. > > Ping? 
> > >> >> >> >> Alexey Kardashevskiy (7): >> powerpc/powernv/ioda: Move TCE bypass base to PE >> powerpc/powernv/ioda: Rework for huge DMA window at 4GB >> powerpc/powernv/ioda: Allow smaller TCE table levels >> powerpc/powernv/phb4: Use IOMMU instead of bypassing >> powerpc/iommu: Add a window number to >> iommu_table_group_ops::get_table_size >> powerpc/powernv/phb4: Add 4GB IOMMU bypass mode >> vfio/spapr_tce: Advertise and allow a huge DMA windows at 4GB >> >> arch/powerpc/include/asm/iommu.h | 3 + >> arch/powerpc/include/asm/opal-api.h | 9 +- >> arch/powerpc/include/asm/opal.h | 2 + >> arch/powerpc/platforms/powernv/pci.h | 4 +- >> include/uapi/linux/vfio.h | 2 + >> arch/powerpc/platforms/powernv/npu-dma.c | 1 + >> arch/powerpc/platforms/powernv/opal-call.c| 2 + >> arch/powerpc/platforms/powernv/pci-ioda-tce.c | 4 +- >> arch/powerpc/platforms/powernv/pci-ioda.c | 234 ++ >> drivers/vfio/vfio_iommu_spapr_tce.c | 17 +- >> 10 files changed, 213 insertions(+), 65 deletions(-) >> > -- Alexey
Re: [PATCH 1/5] powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable
Hi Wang, Thank you for the patch! Yet something to improve: [auto build test ERROR on powerpc/next] [also build test ERROR on char-misc/char-misc-testing staging/staging-testing v5.7-rc1 next-20200415] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system. BTW, we also suggest to use '--base' option to specify the base tree in git format-patch, please see https://stackoverflow.com/a/37406982] url: https://github.com/0day-ci/linux/commits/Wang-Wenhu/drivers-uio-new-driver-uio_fsl_85xx_cache_sram/20200416-040633 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next config: powerpc-allyesconfig (attached as .config) compiler: powerpc64-linux-gcc (GCC) 9.3.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross ARCH=powerpc If you fix the issue, kindly add following tag as appropriate Reported-by: kbuild test robot All error/warnings (new ones prefixed by >>): WARNING: unmet direct dependencies detected for ARCH_32BIT_OFF_T Depends on !64BIT Selected by - PPC && PPC32 In file included from include/linux/atomic-fallback.h:1185, from include/linux/atomic.h:74, from include/linux/debug_locks.h:6, from include/linux/lockdep.h:28, from include/linux/spinlock_types.h:18, from kernel/bounds.c:14: >> include/asm-generic/atomic64.h:14:3: error: conflicting types for >> 'atomic64_t' 14 | } atomic64_t; | ^~ In file included from include/linux/page-flags.h:9, from kernel/bounds.c:10: include/linux/types.h:178:3: note: previous declaration of 'atomic64_t' was here 178 | } atomic64_t; | ^~ In file included from include/linux/atomic-fallback.h:1185, from include/linux/atomic.h:74, from include/linux/debug_locks.h:6, from include/linux/lockdep.h:28, from include/linux/spinlock_types.h:18, from kernel/bounds.c:14: >> 
include/asm-generic/atomic64.h:18:12: error: conflicting types for >> 'atomic64_read' 18 | extern s64 atomic64_read(const atomic64_t | ^ In file included from include/linux/atomic.h:7, from include/linux/debug_locks.h:6, from include/linux/lockdep.h:28, from include/linux/spinlock_types.h:18, from kernel/bounds.c:14: arch/powerpc/include/asm/atomic.h:300:23: note: previous definition of 'atomic64_read' was here 300 | static __inline__ s64 atomic64_read(const atomic64_t | ^ In file included from include/linux/atomic-fallback.h:1185, from include/linux/atomic.h:74, from include/linux/debug_locks.h:6, from include/linux/lockdep.h:28, from include/linux/spinlock_types.h:18, from kernel/bounds.c:14: >> include/asm-generic/atomic64.h:19:13: error: conflicting types for >> 'atomic64_set' 19 | extern void atomic64_set(atomic64_t s64 i); | ^~~~ In file included from include/linux/atomic.h:7, from include/linux/debug_locks.h:6, from include/linux/lockdep.h:28, from include/linux/spinlock_types.h:18, from kernel/bounds.c:14: arch/powerpc/include/asm/atomic.h:309:24: note: previous definition of 'atomic64_set' was here 309 | static __inline__ void atomic64_set(atomic64_t s64 i) | ^~~~ In file included from include/linux/atomic-fallback.h:1185, from include/linux/atomic.h:74, from include/linux/debug_locks.h:6, from include/linux/lockdep.h:28, from include/linux/spinlock_types.h:18, from kernel/bounds.c:14: >> include/asm-generic/atomic64.h:32: warning: "ATOMIC64_OPS" redefined 32 | #define ATOMIC64_OPS(op) ATOMIC64_OP(op) ATOMIC64_OP_RETURN(op) ATOMIC64_FETCH_OP(op) | In file included from include/linux/atomic.h:7, from include/linux/debug_locks.h:6, from include/linux/lockdep.h:28, from include/linux/spinlock_types.h:18, from kernel/bounds.c:14: arch/powerpc/include/asm/atomic.h:380: note: this is the location of the previous definition 380 | #define ATOMIC64_OPS(op, asm_op) | In file included from include/linux/atomic-fallback.h:1185, from include/linux/atomic.h:74, from 
include/linux/debug_locks.h:6, from include/linux/lockdep.h:28, from include/linux/spinlock_types.h:18, from kernel/bounds.c:14: >> include/asm-generic/atomic64.h:24:14: error: conflicting types for >> 'atomic64_add' 24 | extern void atomic64_##op(s64 a, atomic64_t | ^ >> include/asm-generic/atomic64.h:32:26: note: in expansion of macro >> 'ATOMIC64_OP' 32 | #define ATOMIC64_OPS(op) ATOMIC64_OP(op) ATOMIC64_OP_RETURN(op) ATOMIC64_FETCH_OP(op) | ^~~ >> include/asm-ge
Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2
On Thu, Apr 16, 2020 at 10:16:54AM +1000, Nicholas Piggin wrote: > Excerpts from Rich Felker's message of April 16, 2020 8:55 am: > > On Thu, Apr 16, 2020 at 07:45:09AM +1000, Nicholas Piggin wrote: > >> I would like to enable Linux support for the powerpc 'scv' instruction, > >> as a faster system call instruction. > >> > >> This requires two things to be defined: Firstly a way to advertise to > >> userspace that kernel supports scv, and a way to allocate and advertise > >> support for individual scv vectors. Secondly, a calling convention ABI > >> for this new instruction. > >> > >> Thanks to those who commented last time, since then I have removed my > >> answered questions and unpopular alternatives but you can find them > >> here > >> > >> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-January/203545.html > >> > >> Let me try one more with a wider cc list, and then we'll get something > >> merged. Any questions or counter-opinions are welcome. > >> > >> System Call Vectored (scv) ABI > >> == > >> > >> The scv instruction is introduced with POWER9 / ISA3, it comes with an > >> rfscv counter-part. The benefit of these instructions is performance > >> (trading slower SRR0/1 with faster LR/CTR registers, and entering the > >> kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR > >> updates. The scv instruction has 128 interrupt entry points (not enough > >> to cover the Linux system call space). > >> > >> The proposal is to assign scv numbers very conservatively and allocate > >> them as individual HWCAP features as we add support for more. The zero > >> vector ('scv 0') will be used for normal system calls, equivalent to 'sc'. > >> > >> Advertisement > >> > >> Linux has not enabled FSCR[SCV] yet, so the instruction will cause a > >> SIGILL in current environments. Linux has defined a HWCAP2 bit > >> PPC_FEATURE2_SCV for SCV support, but does not set it. 
> >> > >> When scv instruction support and the scv 0 vector for system calls are > >> added, PPC_FEATURE2_SCV will indicate support for these. Other vectors > >> should not be used without future HWCAP bits indicating support, which is > >> how we will allocate them. (Should unallocated ones generate SIGILL, or > >> return -ENOSYS in r3?) > >> > >> Calling convention > >> > >> The proposal is for scv 0 to provide the standard Linux system call ABI > >> with the following differences from sc convention[1]: > >> > >> - LR is to be volatile across scv calls. This is necessary because the > >> scv instruction clobbers LR. From previous discussion, this should be > >> possible to deal with in GCC clobbers and CFI. > >> > >> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the > >> kernel system call exit to avoid restoring the CR register (although > >> we probably still would anyway to avoid information leak). > >> > >> - Error handling: I think the consensus has been to move to using negative > >> return value in r3 rather than CR0[SO]=1 to indicate error, which matches > >> most other architectures and is closer to a function call. > >> > >> The number of scratch registers (r9-r12) at kernel entry seems > >> sufficient that we don't have any costly spilling, patch is here[2]. > >> > >> [1] > >> https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst > >> [2] > >> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-February/204840..html > > > > My preference would be that it work just like the i386 AT_SYSINFO > > where you just replace "int $128" with "call *%%gs:16" and the kernel > > provides a stub in the vdso that performs either scv or the old > > mechanism with the same calling convention. 
Then if the kernel doesn't > > provide it (because the kernel is too old) libc would have to provide > > its own stub that uses the legacy method and matches the calling > > convention of the one the kernel is expected to provide. > > I'm not sure if that's necessary. That's done on x86-32 because they > select different sequences to use based on the CPU running and if the host > kernel is 32 or 64 bit. Sure they could in theory have a bunch of HWCAP > bits and select the right sequence in libc as well I suppose. It's not just a HWCAP. It's a contract between the kernel and userspace to support a particular calling convention that's not exposed except as the public entry point the kernel exports via AT_SYSINFO. > > Note that any libc that actually makes use of the new functionality is > > not going to be able to make clobbers conditional on support for it; > > branching around different clobbers is going to defeat any gains vs > > always just treating anything clobbered by either method as clobbered. > > Well it would have to test HWCAP and patch in or branch to two > completely different sequences including register save/restores yes. > You could have the same asm and matching clobbers to put the sequence > inline
Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2
Excerpts from Rich Felker's message of April 16, 2020 8:55 am: > On Thu, Apr 16, 2020 at 07:45:09AM +1000, Nicholas Piggin wrote: >> I would like to enable Linux support for the powerpc 'scv' instruction, >> as a faster system call instruction. >> >> This requires two things to be defined: Firstly a way to advertise to >> userspace that kernel supports scv, and a way to allocate and advertise >> support for individual scv vectors. Secondly, a calling convention ABI >> for this new instruction. >> >> Thanks to those who commented last time, since then I have removed my >> answered questions and unpopular alternatives but you can find them >> here >> >> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-January/203545.html >> >> Let me try one more with a wider cc list, and then we'll get something >> merged. Any questions or counter-opinions are welcome. >> >> System Call Vectored (scv) ABI >> == >> >> The scv instruction is introduced with POWER9 / ISA3, it comes with an >> rfscv counter-part. The benefit of these instructions is performance >> (trading slower SRR0/1 with faster LR/CTR registers, and entering the >> kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR >> updates. The scv instruction has 128 interrupt entry points (not enough >> to cover the Linux system call space). >> >> The proposal is to assign scv numbers very conservatively and allocate >> them as individual HWCAP features as we add support for more. The zero >> vector ('scv 0') will be used for normal system calls, equivalent to 'sc'. >> >> Advertisement >> >> Linux has not enabled FSCR[SCV] yet, so the instruction will cause a >> SIGILL in current environments. Linux has defined a HWCAP2 bit >> PPC_FEATURE2_SCV for SCV support, but does not set it. >> >> When scv instruction support and the scv 0 vector for system calls are >> added, PPC_FEATURE2_SCV will indicate support for these. 
Other vectors >> should not be used without future HWCAP bits indicating support, which is >> how we will allocate them. (Should unallocated ones generate SIGILL, or >> return -ENOSYS in r3?) >> >> Calling convention >> >> The proposal is for scv 0 to provide the standard Linux system call ABI >> with the following differences from sc convention[1]: >> >> - LR is to be volatile across scv calls. This is necessary because the >> scv instruction clobbers LR. From previous discussion, this should be >> possible to deal with in GCC clobbers and CFI. >> >> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the >> kernel system call exit to avoid restoring the CR register (although >> we probably still would anyway to avoid information leak). >> >> - Error handling: I think the consensus has been to move to using negative >> return value in r3 rather than CR0[SO]=1 to indicate error, which matches >> most other architectures and is closer to a function call. >> >> The number of scratch registers (r9-r12) at kernel entry seems >> sufficient that we don't have any costly spilling, patch is here[2]. >> >> [1] >> https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst >> [2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-February/204840.html > > My preference would be that it work just like the i386 AT_SYSINFO > where you just replace "int $128" with "call *%%gs:16" and the kernel > provides a stub in the vdso that performs either scv or the old > mechanism with the same calling convention. Then if the kernel doesn't > provide it (because the kernel is too old) libc would have to provide > its own stub that uses the legacy method and matches the calling > convention of the one the kernel is expected to provide. I'm not sure if that's necessary. That's done on x86-32 because they select different sequences to use based on the CPU running and if the host kernel is 32 or 64 bit. 
Sure they could in theory have a bunch of HWCAP bits and select the right sequence in libc as well I suppose. > Note that any libc that actually makes use of the new functionality is > not going to be able to make clobbers conditional on support for it; > branching around different clobbers is going to defeat any gains vs > always just treating anything clobbered by either method as clobbered. Well it would have to test HWCAP and patch in or branch to two completely different sequences including register save/restores yes. You could have the same asm and matching clobbers to put the sequence inline and then you could patch the one sc/scv instruction I suppose. A bit of logic to select between them doesn't defeat gains though, it's about 90 cycle improvement which is a handful of branch mispredicts so it really is an improvement. Eventually userspace will stop supporting the old variant too. > Likewise, it's not useful to have different error return mechanisms > because the caller just has to branch to support both (or the > kernel-provided s
Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2
On Thu, Apr 16, 2020 at 07:45:09AM +1000, Nicholas Piggin wrote: > I would like to enable Linux support for the powerpc 'scv' instruction, > as a faster system call instruction. > > This requires two things to be defined: Firstly a way to advertise to > userspace that kernel supports scv, and a way to allocate and advertise > support for individual scv vectors. Secondly, a calling convention ABI > for this new instruction. > > Thanks to those who commented last time, since then I have removed my > answered questions and unpopular alternatives but you can find them > here > > https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-January/203545.html > > Let me try one more with a wider cc list, and then we'll get something > merged. Any questions or counter-opinions are welcome. > > System Call Vectored (scv) ABI > == > > The scv instruction is introduced with POWER9 / ISA3, it comes with an > rfscv counter-part. The benefit of these instructions is performance > (trading slower SRR0/1 with faster LR/CTR registers, and entering the > kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR > updates. The scv instruction has 128 interrupt entry points (not enough > to cover the Linux system call space). > > The proposal is to assign scv numbers very conservatively and allocate > them as individual HWCAP features as we add support for more. The zero > vector ('scv 0') will be used for normal system calls, equivalent to 'sc'. > > Advertisement > > Linux has not enabled FSCR[SCV] yet, so the instruction will cause a > SIGILL in current environments. Linux has defined a HWCAP2 bit > PPC_FEATURE2_SCV for SCV support, but does not set it. > > When scv instruction support and the scv 0 vector for system calls are > added, PPC_FEATURE2_SCV will indicate support for these. Other vectors > should not be used without future HWCAP bits indicating support, which is > how we will allocate them. (Should unallocated ones generate SIGILL, or > return -ENOSYS in r3?) 
> > Calling convention > > The proposal is for scv 0 to provide the standard Linux system call ABI > with the following differences from sc convention[1]: > > - LR is to be volatile across scv calls. This is necessary because the > scv instruction clobbers LR. From previous discussion, this should be > possible to deal with in GCC clobbers and CFI. > > - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the > kernel system call exit to avoid restoring the CR register (although > we probably still would anyway to avoid information leak). > > - Error handling: I think the consensus has been to move to using negative > return value in r3 rather than CR0[SO]=1 to indicate error, which matches > most other architectures and is closer to a function call. > > The number of scratch registers (r9-r12) at kernel entry seems > sufficient that we don't have any costly spilling, patch is here[2]. > > [1] > https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst > [2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-February/204840.html My preference would be that it work just like the i386 AT_SYSINFO where you just replace "int $128" with "call *%%gs:16" and the kernel provides a stub in the vdso that performs either scv or the old mechanism with the same calling convention. Then if the kernel doesn't provide it (because the kernel is too old) libc would have to provide its own stub that uses the legacy method and matches the calling convention of the one the kernel is expected to provide. Note that any libc that actually makes use of the new functionality is not going to be able to make clobbers conditional on support for it; branching around different clobbers is going to defeat any gains vs always just treating anything clobbered by either method as clobbered. 
Likewise, it's not useful to have different error return mechanisms because the caller just has to branch to support both (or the kernel-provided stub just has to emulate one for it; that could work if you really want to change the bad existing convention). Thoughts? Rich
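The point about error return mechanisms can be made concrete with a small model. The following is only a sketch under stated assumptions, not libc or kernel code: `struct raw_result`, `normalize_sc`, `normalize_scv`, `pick_normalizer` and `finish_syscall` are hypothetical names, and the [-4095, -1] error range is the usual Linux convention on other architectures, assumed here for the proposed scv 0 ABI.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of what a raw system call leaves behind. */
struct raw_result {
    long r3;     /* value in r3 on return */
    int cr0_so;  /* 1 if CR0[SO] was set (only meaningful for 'sc') */
};

/* Legacy 'sc' convention: CR0[SO]=1 flags an error and r3 holds a
 * positive errno value. Fold that into a single negative value. */
static long normalize_sc(struct raw_result res)
{
    assert(!res.cr0_so || res.r3 > 0);
    return res.cr0_so ? -res.r3 : res.r3;
}

/* Proposed 'scv 0' convention: r3 already carries a negative errno
 * on failure, so nothing needs to be done. */
static long normalize_scv(struct raw_result res)
{
    return res.r3;
}

typedef long (*normalize_fn)(struct raw_result);

/* Selected once at startup (assumption: based on a HWCAP2 check or on
 * whether the kernel provides a stub), so callers never branch. */
static normalize_fn pick_normalizer(int kernel_has_scv)
{
    return kernel_has_scv ? normalize_scv : normalize_sc;
}

/* libc-style tail: translate the unified value into errno and -1. */
static long finish_syscall(long ret)
{
    if (ret < 0 && ret >= -4095) {
        errno = (int)-ret;
        return -1;
    }
    return ret;
}
```

Both paths end up in the same `finish_syscall()` tail, which is the property argued for above: the caller sees one convention regardless of which instruction the stub used.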
Re: [PATCH v2] powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'
Hi! On Wed, Apr 15, 2020 at 09:25:59AM +, Christophe Leroy wrote: > +#define __put_user_goto(x, ptr, label) \ > + __put_user_nocheck_goto((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)), > label) This line gets too long, can you break it up somehow? > +#define __put_user_asm_goto(x, addr, label, op) \ > + asm volatile goto( \ > + "1: " op "%U1%X1 %0,%1 # put_user\n" \ > + EX_TABLE(1b, %l2) \ > + : \ > + : "r" (x), "m" (*addr) \ > + : \ > + : label) Same "%Un" problem as in the other patch. You could use "m<>" here, but maybe just dropping "%Un" is better. > +#ifdef __powerpc64__ > +#define __put_user_asm2_goto(x, ptr, label) \ > + __put_user_asm_goto(x, ptr, label, "std") > +#else /* __powerpc64__ */ > +#define __put_user_asm2_goto(x, addr, label) \ > + asm volatile goto( \ > + "1: stw%U1%X1 %0, %1\n" \ > + "2: stw%U1%X1 %L0, %L1\n" \ > + EX_TABLE(1b, %l2) \ > + EX_TABLE(2b, %l2) \ > + : \ > + : "r" (x), "m" (*addr) \ > + : \ > + : label) > +#endif /* __powerpc64__ */ Here, you should drop it for sure. Rest looks fine. Reviewed-by: Segher Boessenkool Segher
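The calling shape that the 'asm goto' variant gives to unsafe_put_user() can be illustrated without any inline assembly. This is a sketch in plain C, not the kernel implementation: `store_faults`, `put_or_goto` and `write_pair` are hypothetical names, and a NULL destination stands in for a faulting user address. In the real patch the jump to the label comes from the exception table entry rather than from an explicit test.

```c
#include <stddef.h>

/* Hypothetical stand-in for a store that can fault: here a NULL
 * destination plays the role of a bad user address. */
static int store_faults(const int *dst)
{
    return dst == NULL;
}

/* Same call shape as unsafe_put_user(x, ptr, label): no error value is
 * produced; on a fault, control transfers to the caller's label. */
#define put_or_goto(x, ptr, label)   \
    do {                             \
        if (store_faults(ptr))       \
            goto label;              \
        *(ptr) = (x);                \
    } while (0)

/* Writes two values; returns 0 on success, -1 if either store faults. */
static int write_pair(int *a, int *b, int va, int vb)
{
    put_or_goto(va, a, efault);
    put_or_goto(vb, b, efault);
    return 0;
efault:
    return -1;
}
```

The benefit visible even in this model is that the success path carries no error variable: a sequence of stores shares a single failure label.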
Re: [PATCH] powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()
Hi! On Wed, Apr 15, 2020 at 09:20:26AM +, Christophe Leroy wrote: > At the time being, __put_user()/__get_user() and friends only use > register indirect with immediate index addressing, with the index > set to 0. Ex: > > lwz reg1, 0(reg2) This is called a "D-form" instruction, or sometimes "offset addressing". Don't talk about an "index", it confuses things, because the *other* kind is called "indexed" already, also in the ISA docs! (X-form, aka indexed addressing, [reg+reg], where D-form does [reg+imm], and both forms can do [reg]). > Give the compiler the opportunity to use other adressing modes > whenever possible, to get more optimised code. Great :-) > --- a/arch/powerpc/include/asm/uaccess.h > +++ b/arch/powerpc/include/asm/uaccess.h > @@ -114,7 +114,7 @@ extern long __put_user_bad(void); > */ > #define __put_user_asm(x, addr, err, op) \ > __asm__ __volatile__( \ > - "1: " op " %1,0(%2) # put_user\n" \ > + "1: " op "%U2%X2 %1,%2 # put_user\n" \ > "2:\n" \ > ".section .fixup,\"ax\"\n" \ > "3: li %0,%3\n" \ > @@ -122,7 +122,7 @@ extern long __put_user_bad(void); > ".previous\n" \ > EX_TABLE(1b, 3b)\ > : "=r" (err)\ > - : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err)) > + : "r" (x), "m" (*addr), "i" (-EFAULT), "0" (err)) %Un on an "m" operand doesn't do much: you need to make it "m<>" if you want pre-modify ("update") insns to be generated. (You then will want to make sure that operand is used in a way GCC can understand; since it is used only once here, that works fine). 
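For readers less familiar with the terminology, the addressing forms mentioned above can be modelled in a few lines of C. This is only an illustrative sketch, not kernel or compiler code: `struct regs` and the `ea_*` helpers are made-up names, and the model ignores effective-address truncation in 32-bit mode.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified register file: 32 general-purpose registers. */
struct regs {
    uint64_t gpr[32];
};

/* (RA|0): register number 0 in the base position reads as the value 0. */
static uint64_t base_or_zero(const struct regs *r, unsigned ra)
{
    return ra == 0 ? 0 : r->gpr[ra];
}

/* D-form, e.g. "lwz rt,d(ra)": EA = (RA|0) + sign-extended 16-bit d. */
static uint64_t ea_dform(const struct regs *r, unsigned ra, int16_t d)
{
    return base_or_zero(r, ra) + (uint64_t)(int64_t)d;
}

/* X-form (indexed), e.g. "lwzx rt,ra,rb": EA = (RA|0) + RB. */
static uint64_t ea_xform(const struct regs *r, unsigned ra, unsigned rb)
{
    return base_or_zero(r, ra) + r->gpr[rb];
}

/* Update form, e.g. "stwu rs,d(ra)": same EA as D-form, then RA := EA.
 * RA = 0 is an invalid form for update instructions. */
static uint64_t ea_dform_update(struct regs *r, unsigned ra, int16_t d)
{
    uint64_t ea;

    assert(ra != 0);
    ea = r->gpr[ra] + (uint64_t)(int64_t)d;
    r->gpr[ra] = ea;
    return ea;
}
```

In this model, applying the update form twice to the same base register moves the base twice, which is one way to see why a pair of update instructions is a poor fit for a two-word store to adjacent addresses.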
> @@ -130,8 +130,8 @@ extern long __put_user_bad(void); > #else /* __powerpc64__ */ > #define __put_user_asm2(x, addr, err)\ > __asm__ __volatile__( \ > - "1: stw %1,0(%2)\n" \ > - "2: stw %1+1,4(%2)\n" \ > + "1: stw%U2%X2 %1,%2\n" \ > + "2: stw%U2%X2 %L1,%L2\n"\ > "3:\n" \ > ".section .fixup,\"ax\"\n" \ > "4: li %0,%3\n" \ > @@ -140,7 +140,7 @@ extern long __put_user_bad(void); > EX_TABLE(1b, 4b)\ > EX_TABLE(2b, 4b)\ > : "=r" (err)\ > - : "r" (x), "b" (addr), "i" (-EFAULT), "0" (err)) > + : "r" (x), "m" (*addr), "i" (-EFAULT), "0" (err)) Here, it doesn't work. You don't want two consecutive update insns in any case. Easiest is to just not use "m<>", and then, don't use %Un (which won't do anything, but it is confusing). Same for the reads. Rest looks fine, and update should be good with that fixed as said. Reviewed-by: Segher Boessenkool Segher
Powerpc Linux 'scv' system call ABI proposal take 2
I would like to enable Linux support for the powerpc 'scv' instruction, as a faster system call instruction. This requires two things to be defined: Firstly a way to advertise to userspace that kernel supports scv, and a way to allocate and advertise support for individual scv vectors. Secondly, a calling convention ABI for this new instruction. Thanks to those who commented last time, since then I have removed my answered questions and unpopular alternatives but you can find them here https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-January/203545.html Let me try one more with a wider cc list, and then we'll get something merged. Any questions or counter-opinions are welcome. System Call Vectored (scv) ABI == The scv instruction is introduced with POWER9 / ISA3, it comes with an rfscv counter-part. The benefit of these instructions is performance (trading slower SRR0/1 with faster LR/CTR registers, and entering the kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR updates. The scv instruction has 128 interrupt entry points (not enough to cover the Linux system call space). The proposal is to assign scv numbers very conservatively and allocate them as individual HWCAP features as we add support for more. The zero vector ('scv 0') will be used for normal system calls, equivalent to 'sc'. Advertisement Linux has not enabled FSCR[SCV] yet, so the instruction will cause a SIGILL in current environments. Linux has defined a HWCAP2 bit PPC_FEATURE2_SCV for SCV support, but does not set it. When scv instruction support and the scv 0 vector for system calls are added, PPC_FEATURE2_SCV will indicate support for these. Other vectors should not be used without future HWCAP bits indicating support, which is how we will allocate them. (Should unallocated ones generate SIGILL, or return -ENOSYS in r3?) 
Calling convention The proposal is for scv 0 to provide the standard Linux system call ABI with the following differences from sc convention[1]: - LR is to be volatile across scv calls. This is necessary because the scv instruction clobbers LR. From previous discussion, this should be possible to deal with in GCC clobbers and CFI. - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the kernel system call exit to avoid restoring the CR register (although we probably still would anyway to avoid information leak). - Error handling: I think the consensus has been to move to using negative return value in r3 rather than CR0[SO]=1 to indicate error, which matches most other architectures and is closer to a function call. The number of scratch registers (r9-r12) at kernel entry seems sufficient that we don't have any costly spilling, patch is here[2]. [1] https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst [2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-February/204840.html
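The advertisement scheme described above could be consumed from userspace roughly as follows. This is a minimal sketch, assuming a Linux libc that provides getauxval(); `hwcap2_has_scv` and `kernel_supports_scv0` are hypothetical helper names. The PPC_FEATURE2_SCV value is the one from the kernel's powerpc uapi headers and is defined locally only so the sketch also compiles on other hosts.

```c
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP2 (Linux) */

/* Bit value as defined in the kernel's powerpc uapi cputable.h;
 * repeated here so the sketch builds on non-powerpc hosts too. */
#ifndef PPC_FEATURE2_SCV
#define PPC_FEATURE2_SCV 0x00100000UL
#endif

/* Pure helper: does this HWCAP2 word advertise scv support? */
static int hwcap2_has_scv(unsigned long hwcap2)
{
    return (hwcap2 & PPC_FEATURE2_SCV) != 0;
}

/* Under the proposal, a set bit means both the scv instruction and
 * the 'scv 0' system call vector are usable. */
static int kernel_supports_scv0(void)
{
    return hwcap2_has_scv(getauxval(AT_HWCAP2));
}
```

Under the proposal, any vector other than 0 would need its own future HWCAP bit, so a check like this says nothing about vectors 1 to 127.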
Re: [PATCH v5 0/6] implement KASLR for powerpc/fsl_booke/64
On Mon, 2020-03-30 at 10:20 +0800, Jason Yan wrote: > This is a try to implement KASLR for Freescale BookE64 which is based on > my earlier implementation for Freescale BookE32: > https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718&state=* > > The implementation for Freescale BookE64 is similar as BookE32. One > difference is that Freescale BookE64 set up a TLB mapping of 1G during > booting. Another difference is that ppc64 needs the kernel to be > 64K-aligned. So we can randomize the kernel in this 1G mapping and make > it 64K-aligned. This can save some code to creat another TLB map at > early boot. The disadvantage is that we only have about 1G/64K = 16384 > slots to put the kernel in. > > KERNELBASE > > 64K |--> kernel <--| >| | | > +--+--+--++--+--+--+--+--+--+--+--+--++--+--+ > | | | || | | | | | | | | || | | > +--+--+--++--+--+--+--+--+--+--+--+--++--+--+ > | |1G > |-> offset<-| > > kernstart_virt_addr > > I'm not sure if the slot numbers is enough or the design has any > defects. If you have some better ideas, I would be happy to hear that. > > Thank you all. > > v4->v5: > Fix "-Werror=maybe-uninitialized" compile error. > Fix typo "similar as" -> "similar to". > v3->v4: > Do not define __kaslr_offset as a fixed symbol. Reference __run_at_load > and > __kaslr_offset by symbol instead of magic offsets. > Use IS_ENABLED(CONFIG_PPC32) instead of #ifdef CONFIG_PPC32. > Change kaslr-booke32 to kaslr-booke in index.rst > Switch some instructions to 64-bit. > v2->v3: > Fix build error when KASLR is disabled. > v1->v2: > Add __kaslr_offset for the secondary cpu boot up. 
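The slot arithmetic in the cover letter (1G/64K = 16384) can be sketched as follows. This is an illustration only, not the code from kaslr_booke.c: `kaslr_slots` and `kaslr_offset` are hypothetical names, the random value is passed in by the caller, and the sketch assumes the only constraints are 64K alignment and fitting inside the 1G mapping. The actual patches may apply further constraints that are not modelled here.

```c
#include <assert.h>

#define SZ_64K 0x10000UL
#define SZ_1G  0x40000000UL

/* Number of 64K-aligned start offsets at which a kernel image of
 * kernel_size bytes still fits entirely inside the 1G mapping. */
static unsigned long kaslr_slots(unsigned long kernel_size)
{
    if (kernel_size > SZ_1G)
        return 0;
    return (SZ_1G - kernel_size) / SZ_64K + 1;
}

/* Map a random number onto one of the valid slots and return the
 * resulting byte offset from the start of the mapping. */
static unsigned long kaslr_offset(unsigned long random,
                                  unsigned long kernel_size)
{
    unsigned long slots = kaslr_slots(kernel_size);
    unsigned long offset;

    if (slots == 0)
        return 0;
    offset = (random % slots) * SZ_64K;
    assert(offset % SZ_64K == 0 && offset + kernel_size <= SZ_1G);
    return offset;
}
```

With a realistic image size the usable slot count drops below 16384; for a 32M image this model gives 15873 slots, which is in line with the "about" in the cover letter.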
> > Jason Yan (6): > powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and > kaslr_early_init() > powerpc/fsl_booke/64: introduce reloc_kernel_entry() helper > powerpc/fsl_booke/64: implement KASLR for fsl_booke64 > powerpc/fsl_booke/64: do not clear the BSS for the second pass > powerpc/fsl_booke/64: clear the original kernel if randomized > powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst > and add 64bit part > > Documentation/powerpc/index.rst | 2 +- > .../{kaslr-booke32.rst => kaslr-booke.rst}| 35 ++- > arch/powerpc/Kconfig | 2 +- > arch/powerpc/kernel/exceptions-64e.S | 23 + > arch/powerpc/kernel/head_64.S | 13 +++ > arch/powerpc/kernel/setup_64.c| 3 + > arch/powerpc/mm/mmu_decl.h| 23 +++-- > arch/powerpc/mm/nohash/kaslr_booke.c | 91 +-- > 8 files changed, 147 insertions(+), 45 deletions(-) > rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} (59%) > Acked-by: Scott Wood -Scott
Re: [PATCH v2,5/5] drivers: uio: new driver for fsl_85xx_cache_sram
On Wed, 2020-04-15 at 18:52 +0200, Christophe Leroy wrote: > > Le 15/04/2020 à 17:24, Wang Wenhu a écrit : > > + > > + if (uiomem >= &info->mem[MAX_UIO_MAPS]) { > > I'd prefer > if (uiomem - info->mem >= MAX_UIO_MAPS) { > > > + dev_warn(&pdev->dev, "more than %d uio-maps for > > device.\n", > > +MAX_UIO_MAPS); > > + break; > > + } > > + } > > + > > + while (uiomem < &info->mem[MAX_UIO_MAPS]) { > > I'd prefer > > while (uiomem - info->mem < MAX_UIO_MAPS) { > I wouldn't. You're turning a simple comparison into a division and a comparison (if the compiler doesn't optimize it back into the original form), and making it less clear in the process. Of course, working with array indices to begin with instead of incrementing a pointer would be more idiomatic. > > + uiomem->size = 0; > > + ++uiomem; > > + } > > + > > + if (info->mem[0].size == 0) { > > Is there any point in doing all the clearing loop above if it's to bail > out here ? > > Wouldn't it be cleaner to do the test above the clearing loop, by just > checking whether uiomem is still equal to info->mem ? There's no point doing the clearing at all, since the array was allocated with kzalloc(). > > + dev_err(&pdev->dev, "error no valid uio-map configured\n"); > > + ret = -EINVAL; > > + goto err_info_free_internel; > > + } > > + > > + info->version = "0.1.0"; > > Could you define some DRIVER_VERSION in the top of the file next to > DRIVER_NAME instead of hard coding in the middle on a function ? That's what v1 had, and Greg KH said to remove it. I'm guessing that he thought it was the common-but-pointless practice of having the driver print a version number that never gets updated, rather than something the UIO API (unfortunately, compared to a feature query interface) expects. That said, I'm not sure what the value is of making it a macro since it should only be used once, that use is self documenting, it isn't tunable, etc. 
Though if this isn't a macro, UIO_NAME also shouldn't be (and if it is made a macro again, it should be UIO_VERSION, not DRIVER_VERSION). Does this really need a three-part version scheme? What's wrong with a version of "1", to be changed to "2" in the hopefully-unlikely event that the userspace API changes? Assuming UIO is used for this at all, which doesn't seem like a great fit to me. -Scott
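The two points made about the loops above (iterate by array index, and rely on zeroed allocation instead of a clearing loop) can be sketched in userspace C. This is not the driver code: `struct map`, `struct info`, `info_alloc` and `info_fill` are hypothetical names, `MAX_MAPS` is a stand-in for MAX_UIO_MAPS, and calloc() plays the role of kzalloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_MAPS 5  /* hypothetical stand-in for MAX_UIO_MAPS */

struct map {
    size_t size;
};

struct info {
    struct map mem[MAX_MAPS];
};

/* calloc() zeroes the allocation much like kzalloc() does, so unused
 * entries already have size == 0 and no clearing loop is needed. */
static struct info *info_alloc(void)
{
    return calloc(1, sizeof(struct info));
}

/* Fill at most MAX_MAPS entries from sizes[], skipping zero-sized
 * requests; returns the number of entries actually used. */
static size_t info_fill(struct info *info, const size_t *sizes, size_t n)
{
    size_t i, used = 0;

    assert(info != NULL);
    for (i = 0; i < n && used < MAX_MAPS; i++) {
        if (sizes[i] == 0)
            continue;  /* invalid entry, ignore it */
        info->mem[used++].size = sizes[i];
    }
    return used;
}
```

A return value of 0 from `info_fill()` corresponds to the "no valid uio-map configured" case, so the caller can test that directly rather than inspecting `mem[0].size`.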
Re: [PATCH v2,5/5] drivers: uio: new driver for fsl_85xx_cache_sram
On Wed, 2020-04-15 at 08:24 -0700, Wang Wenhu wrote: > +static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = { > + { .compatible = "uio,fsl,p2020-l2-cache-controller", }, > + { .compatible = "uio,fsl,p2010-l2-cache-controller", }, > + { .compatible = "uio,fsl,p1020-l2-cache-controller", }, > + { .compatible = "uio,fsl,p1011-l2-cache-controller", }, > + { .compatible = "uio,fsl,p1013-l2-cache-controller", }, > + { .compatible = "uio,fsl,p1022-l2-cache-controller", }, > + { .compatible = "uio,fsl,mpc8548-l2-cache-controller",}, > + { .compatible = "uio,fsl,mpc8544-l2-cache-controller",}, > + { .compatible = "uio,fsl,mpc8572-l2-cache-controller",}, > + { .compatible = "uio,fsl,mpc8536-l2-cache-controller",}, > + { .compatible = "uio,fsl,p1021-l2-cache-controller", }, > + { .compatible = "uio,fsl,p1012-l2-cache-controller", }, > + { .compatible = "uio,fsl,p1025-l2-cache-controller", }, > + { .compatible = "uio,fsl,p1016-l2-cache-controller", }, > + { .compatible = "uio,fsl,p1024-l2-cache-controller", }, > + { .compatible = "uio,fsl,p1015-l2-cache-controller", }, > + { .compatible = "uio,fsl,p1010-l2-cache-controller", }, > + { .compatible = "uio,fsl,bsc9131-l2-cache-controller",}, > + {}, > +}; NACK The device tree describes the hardware, not what driver you want to bind the hardware to, or how you want to allocate the resources. And even if defining nodes for sram allocation were the right way to go, why do you have a separate compatible for each chip when you're just describing software configuration? Instead, have module parameters that take the sizes and alignments you'd like to allocate and expose to userspace. Better still would be some sort of dynamic allocation (e.g. open a fd, ioctl to set the requested size/alignment, if it succeeds you can mmap it, and when the fd is closed the region is freed). -Scott
Re: [PATCH v2,1/5] powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable
On Wed, 2020-04-15 at 08:24 -0700, Wang Wenhu wrote: > Enable FSL_85XX_CACHE_SRAM selection. On e500 platforms, the cache > could be configured and used as a piece of SRAM which is hignly > friendly for some user level application performances. > > Cc: Greg Kroah-Hartman > Cc: Christophe Leroy > Cc: Scott Wood > Cc: Michael Ellerman > Cc: linuxppc-dev@lists.ozlabs.org > Signed-off-by: Wang Wenhu > --- > Changes since v1: > * None > --- > arch/powerpc/platforms/85xx/Kconfig| 2 +- > arch/powerpc/platforms/Kconfig.cputype | 5 +++-- > 2 files changed, 4 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/platforms/85xx/Kconfig > b/arch/powerpc/platforms/85xx/Kconfig > index fa3d29dcb57e..6debb4f1b9cc 100644 > --- a/arch/powerpc/platforms/85xx/Kconfig > +++ b/arch/powerpc/platforms/85xx/Kconfig > @@ -17,7 +17,7 @@ if FSL_SOC_BOOKE > if PPC32 > > config FSL_85XX_CACHE_SRAM > - bool > + bool "Freescale 85xx Cache-Sram" > select PPC_LIB_RHEAP > help > When selected, this option enables cache-sram support NACK As discussed before, the driver that uses this API should "select" this symbol. -Scott
Re: [PATCH v2,5/5] drivers: uio: new driver for fsl_85xx_cache_sram
Le 15/04/2020 à 17:24, Wang Wenhu a écrit : A driver for freescale 85xx platforms to access the Cache-Sram form user level. This is extremely helpful for some user-space applications that require high performance memory accesses. Cc: Greg Kroah-Hartman Cc: Christophe Leroy Cc: Scott Wood Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Wang Wenhu --- Changes since v1: * Addressed comments of Greg K-H * Moved kfree(info->name) into uio_info_free_internal() --- drivers/uio/Kconfig | 8 ++ drivers/uio/Makefile | 1 + drivers/uio/uio_fsl_85xx_cache_sram.c | 182 ++ 3 files changed, 191 insertions(+) create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig index 202ee81cfc2b..afd38ec13de0 100644 --- a/drivers/uio/Kconfig +++ b/drivers/uio/Kconfig @@ -105,6 +105,14 @@ config UIO_NETX To compile this driver as a module, choose M here; the module will be called uio_netx. +config UIO_FSL_85XX_CACHE_SRAM + tristate "Freescale 85xx Cache-Sram driver" + depends on FSL_85XX_CACHE_SRAM Is there any point having FSL_85XX_CACHE_SRAM without this ? Should it be the other way round, leave FSL_85XX_CACHE_SRAM unselectable by user, and this driver select FSL_85XX_CACHE_SRAM instead of depending on it ? + help + Generic driver for accessing the Cache-Sram form user level. This + is extremely helpful for some user-space applications that require + high performance memory accesses. 
+ config UIO_FSL_ELBC_GPCM tristate "eLBC/GPCM driver" depends on FSL_LBC diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile index c285dd2a4539..be2056cffc21 100644 --- a/drivers/uio/Makefile +++ b/drivers/uio/Makefile @@ -10,4 +10,5 @@ obj-$(CONFIG_UIO_NETX)+= uio_netx.o obj-$(CONFIG_UIO_PRUSS) += uio_pruss.o obj-$(CONFIG_UIO_MF624) += uio_mf624.o obj-$(CONFIG_UIO_FSL_ELBC_GPCM) += uio_fsl_elbc_gpcm.o +obj-$(CONFIG_UIO_FSL_85XX_CACHE_SRAM) += uio_fsl_85xx_cache_sram.o obj-$(CONFIG_UIO_HV_GENERIC) += uio_hv_generic.o diff --git a/drivers/uio/uio_fsl_85xx_cache_sram.c b/drivers/uio/uio_fsl_85xx_cache_sram.c new file mode 100644 index ..fb6903fdaddb --- /dev/null +++ b/drivers/uio/uio_fsl_85xx_cache_sram.c @@ -0,0 +1,182 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2020 Vivo Communication Technology Co. Ltd. + * Copyright (C) 2020 Wang Wenhu + * All rights reserved. + */ + +#include +#include +#include +#include +#include +#include + +#define DRIVER_NAME"uio_fsl_85xx_cache_sram" +#define UIO_NAME "uio_cache_sram" + +static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = { + { .compatible = "uio,fsl,p2020-l2-cache-controller",}, + { .compatible = "uio,fsl,p2010-l2-cache-controller",}, + { .compatible = "uio,fsl,p1020-l2-cache-controller",}, + { .compatible = "uio,fsl,p1011-l2-cache-controller",}, + { .compatible = "uio,fsl,p1013-l2-cache-controller",}, + { .compatible = "uio,fsl,p1022-l2-cache-controller",}, + { .compatible = "uio,fsl,mpc8548-l2-cache-controller", }, + { .compatible = "uio,fsl,mpc8544-l2-cache-controller", }, + { .compatible = "uio,fsl,mpc8572-l2-cache-controller", }, + { .compatible = "uio,fsl,mpc8536-l2-cache-controller", }, + { .compatible = "uio,fsl,p1021-l2-cache-controller",}, + { .compatible = "uio,fsl,p1012-l2-cache-controller",}, + { .compatible = "uio,fsl,p1025-l2-cache-controller",}, + { .compatible = "uio,fsl,p1016-l2-cache-controller",}, + { .compatible = "uio,fsl,p1024-l2-cache-controller",}, + { 
.compatible = "uio,fsl,p1015-l2-cache-controller",}, + { .compatible = "uio,fsl,p1010-l2-cache-controller",}, + { .compatible = "uio,fsl,bsc9131-l2-cache-controller", }, + {}, +}; + +static void uio_info_free_internal(struct uio_info *info) +{ + struct uio_mem *uiomem = &info->mem[0]; + + while (uiomem < &info->mem[MAX_UIO_MAPS]) { + if (uiomem->size) { + mpc85xx_cache_sram_free(uiomem->internal_addr); + kfree(uiomem->name); + } + uiomem++; + } + + kfree(info->name); +} + +static int uio_fsl_85xx_cache_sram_probe(struct platform_device *pdev) +{ + struct device_node *parent = pdev->dev.of_node; + struct device_node *node = NULL; + struct uio_info *info; + struct uio_mem *uiomem; + const char *dt_name; + u32 mem_size; + u32 align; Align is not used outside of the for loop, it should be declared in the loop block. + void *virt; Same for virt +
Re: [PATCH v2,1/5] powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable
On 15/04/2020 at 17:24, Wang Wenhu wrote:
> Enable FSL_85XX_CACHE_SRAM selection. On e500 platforms, the cache
> could be configured and used as a piece of SRAM which is highly
> friendly for some user level application performances.

It looks like the following patches fix errors that are triggered by selecting FSL_85XX_CACHE_SRAM. So this patch should go after the patches that fix those errors, i.e. it should be patch 4 in the series.

Christophe
Re: [PATCH v2,1/5] powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable
On 15/04/2020 at 17:24, Wang Wenhu wrote:
> Enable FSL_85XX_CACHE_SRAM selection. On e500 platforms, the cache
> could be configured and used as a piece of SRAM which is highly
> friendly for some user level application performances.
>
> Cc: Greg Kroah-Hartman
> Cc: Christophe Leroy
> Cc: Scott Wood
> Cc: Michael Ellerman
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Wang Wenhu
> ---
> Changes since v1:
>  * None
> ---
>  arch/powerpc/platforms/85xx/Kconfig    | 2 +-
>  arch/powerpc/platforms/Kconfig.cputype | 5 +++--
>  2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig
> index fa3d29dcb57e..6debb4f1b9cc 100644
> --- a/arch/powerpc/platforms/85xx/Kconfig
> +++ b/arch/powerpc/platforms/85xx/Kconfig
> @@ -17,7 +17,7 @@ if FSL_SOC_BOOKE
>  if PPC32
>
>  config FSL_85XX_CACHE_SRAM
> -	bool
> +	bool "Freescale 85xx Cache-Sram"
>  	select PPC_LIB_RHEAP
>  	help
>  	  When selected, this option enables cache-sram support
> diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
> index 0c3c1902135c..1921e9a573e8 100644
> --- a/arch/powerpc/platforms/Kconfig.cputype
> +++ b/arch/powerpc/platforms/Kconfig.cputype
> @@ -1,6 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0
>  config PPC32
> -	bool
> +	bool "32-bit kernel"

Why make that user-selectable? A kernel is either 64-bit or 32-bit, so having PPC64 user-selectable is all we need.

And what is the link between this change and the description in the log?

>  	default y if !PPC64
>  	select KASAN_VMALLOC if KASAN && MODULES
>
> @@ -15,6 +15,7 @@ config PPC_BOOK3S_32
>  	bool
>
>  menu "Processor support"
> +

Why add this blank line?

>  choice
>  	prompt "Processor Type"
>  	depends on PPC32
> @@ -211,9 +212,9 @@ config PPC_BOOK3E
>  	depends on PPC_BOOK3E_64
>
>  config E500
> +	bool "e500 Support"
>  	select FSL_EMB_PERFMON
>  	select PPC_FSL_BOOK3E
> -	bool

Why make this user-selectable? It is already selected by the processors requiring it, i.e. 8500, e5500 and e6500.

Is there any other case where we need E500?

And again, what is the link between this change and the description in the log?

>
>  config PPC_E500MC
>  	bool "e500mc Support"

Christophe
Re: [RFC PATCH 3/3] powerpc/lib: Use a temporary mm for code patching
> On April 15, 2020 3:45 AM Christophe Leroy wrote: > > > Le 15/04/2020 à 07:11, Christopher M Riedl a écrit : > >> On March 24, 2020 11:25 AM Christophe Leroy > >> wrote: > >> > >> > >> Le 23/03/2020 à 05:52, Christopher M. Riedl a écrit : > >>> Currently, code patching a STRICT_KERNEL_RWX exposes the temporary > >>> mappings to other CPUs. These mappings should be kept local to the CPU > >>> doing the patching. Use the pre-initialized temporary mm and patching > >>> address for this purpose. Also add a check after patching to ensure the > >>> patch succeeded. > >>> > >>> Based on x86 implementation: > >>> > >>> commit b3fd8e83ada0 > >>> ("x86/alternatives: Use temporary mm for text poking") > >>> > >>> Signed-off-by: Christopher M. Riedl > >>> --- > >>>arch/powerpc/lib/code-patching.c | 128 ++- > >>>1 file changed, 57 insertions(+), 71 deletions(-) > >>> > >>> diff --git a/arch/powerpc/lib/code-patching.c > >>> b/arch/powerpc/lib/code-patching.c > >>> index 18b88ecfc5a8..f156132e8975 100644 > >>> --- a/arch/powerpc/lib/code-patching.c > >>> +++ b/arch/powerpc/lib/code-patching.c > >>> @@ -19,6 +19,7 @@ > >>>#include > >>>#include > >>>#include > >>> +#include > >>> > >>>static int __patch_instruction(unsigned int *exec_addr, unsigned int > >>> instr, > >>> unsigned int *patch_addr) > >>> @@ -65,99 +66,79 @@ void __init poking_init(void) > >>> pte_unmap_unlock(ptep, ptl); > >>>} > >>> > >>> -static DEFINE_PER_CPU(struct vm_struct *, text_poke_area); > >>> - > >>> -static int text_area_cpu_up(unsigned int cpu) > >>> -{ > >>> - struct vm_struct *area; > >>> - > >>> - area = get_vm_area(PAGE_SIZE, VM_ALLOC); > >>> - if (!area) { > >>> - WARN_ONCE(1, "Failed to create text area for cpu %d\n", > >>> - cpu); > >>> - return -1; > >>> - } > >>> - this_cpu_write(text_poke_area, area); > >>> - > >>> - return 0; > >>> -} > >>> - > >>> -static int text_area_cpu_down(unsigned int cpu) > >>> -{ > >>> - free_vm_area(this_cpu_read(text_poke_area)); > >>> - return 0; > >>> -} > 
>>> - > >>> -/* > >>> - * Run as a late init call. This allows all the boot time patching to be > >>> done > >>> - * simply by patching the code, and then we're called here prior to > >>> - * mark_rodata_ro(), which happens after all init calls are run. Although > >>> - * BUG_ON() is rude, in this case it should only happen if ENOMEM, and > >>> we judge > >>> - * it as being preferable to a kernel that will crash later when someone > >>> tries > >>> - * to use patch_instruction(). > >>> - */ > >>> -static int __init setup_text_poke_area(void) > >>> -{ > >>> - BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, > >>> - "powerpc/text_poke:online", text_area_cpu_up, > >>> - text_area_cpu_down)); > >>> - > >>> - return 0; > >>> -} > >>> -late_initcall(setup_text_poke_area); > >>> +struct patch_mapping { > >>> + spinlock_t *ptl; /* for protecting pte table */ > >>> + struct temp_mm temp_mm; > >>> +}; > >>> > >>>/* > >>> * This can be called for kernel text or a module. > >>> */ > >>> -static int map_patch_area(void *addr, unsigned long text_poke_addr) > >>> +static int map_patch(const void *addr, struct patch_mapping > >>> *patch_mapping) > >> > >> Why change the name ? > >> > > > > It's not really an "area" anymore. > > > >>>{ > >>> - unsigned long pfn; > >>> - int err; > >>> + struct page *page; > >>> + pte_t pte, *ptep; > >>> + pgprot_t pgprot; > >>> > >>> if (is_vmalloc_addr(addr)) > >>> - pfn = vmalloc_to_pfn(addr); > >>> + page = vmalloc_to_page(addr); > >>> else > >>> - pfn = __pa_symbol(addr) >> PAGE_SHIFT; > >>> + page = virt_to_page(addr); > >>> > >>> - err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL); > >>> + if (radix_enabled()) > >>> + pgprot = __pgprot(pgprot_val(PAGE_KERNEL)); > >>> + else > >>> + pgprot = PAGE_SHARED; > >> > >> Can you explain the difference between radix and non radix ? > >> > >> Why PAGE_KERNEL for a page that is mapped in userspace ? 
> >> > >> Why do you need to do __pgprot(pgprot_val(PAGE_KERNEL)) instead of just > >> using PAGE_KERNEL ? > >> > > > > On hash there is a manual check which prevents setting _PAGE_PRIVILEGED for > > kernel to userspace access in __hash_page - hence we cannot access the > > mapping > > if the page is mapped PAGE_KERNEL on hash. However, I would like to use > > PAGE_KERNEL here as well and am working on understanding why this check is > > done in hash and if this can change. On radix this works just fine. > > > > The page is mapped PAGE_KERNEL because the address is technically a > > userspace > > address - but only to keep the mapping local to this CPU doing the patching. > > PAGE_KERNEL makes it clear both in intent and protection that this is a > > kernel > > mapping. > > > > I think the correct
Re: [RFC PATCH] powerpc/lib: Fixing use a temporary mm for code patching
> On April 15, 2020 4:12 AM Christophe Leroy wrote: > > > Le 15/04/2020 à 07:16, Christopher M Riedl a écrit : > >> On March 26, 2020 9:42 AM Christophe Leroy wrote: > >> > >> > >> This patch fixes the RFC series identified below. > >> It fixes three points: > >> - Failure with CONFIG_PPC_KUAP > >> - Failure to write do to lack of DIRTY bit set on the 8xx > >> - Inadequaly complex WARN post verification > >> > >> However, it has an impact on the CPU load. Here is the time > >> needed on an 8xx to run the ftrace selftests without and > >> with this series: > >> - Without CONFIG_STRICT_KERNEL_RWX ==> 38 seconds > >> - With CONFIG_STRICT_KERNEL_RWX==> 40 seconds > >> - With CONFIG_STRICT_KERNEL_RWX + this series ==> 43 seconds > >> > >> Link: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166003 > >> Signed-off-by: Christophe Leroy > >> --- > >> arch/powerpc/lib/code-patching.c | 5 - > >> 1 file changed, 4 insertions(+), 1 deletion(-) > >> > >> diff --git a/arch/powerpc/lib/code-patching.c > >> b/arch/powerpc/lib/code-patching.c > >> index f156132e8975..4ccff427592e 100644 > >> --- a/arch/powerpc/lib/code-patching.c > >> +++ b/arch/powerpc/lib/code-patching.c > >> @@ -97,6 +97,7 @@ static int map_patch(const void *addr, struct > >> patch_mapping *patch_mapping) > >>} > >> > >>pte = mk_pte(page, pgprot); > >> + pte = pte_mkdirty(pte); > >>set_pte_at(patching_mm, patching_addr, ptep, pte); > >> > >>init_temp_mm(&patch_mapping->temp_mm, patching_mm); > >> @@ -168,7 +169,9 @@ static int do_patch_instruction(unsigned int *addr, > >> unsigned int instr) > >>(offset_in_page((unsigned long)addr) / > >>sizeof(unsigned int)); > >> > >> + allow_write_to_user(patch_addr, sizeof(instr)); > >>__patch_instruction(addr, instr, patch_addr); > >> + prevent_write_to_user(patch_addr, sizeof(instr)); > >> > > > > On radix we can map the page with PAGE_KERNEL protection which ends up > > setting EAA[0] in the radix PTE. 
This means the KUAP (AMR) protection is > > ignored (ISA v3.0b Fig. 35) since we are accessing the page from MSR[PR]=0. > > > > Can we employ a similar approach on the 8xx? I would prefer *not* to wrap > > the __patch_instruction() with the allow_/prevent_write_to_user() KUAP > > things > > because this is a temporary kernel mapping which really isn't userspace in > > the usual sense. > > On the 8xx, that's pretty different. > > The PTE doesn't control whether a page is user page or a kernel page. > The only thing that is set in the PTE is whether a page is linked to a > given PID or not. > PAGE_KERNEL tells that the page can be addressed with any PID. > > The user access right is given by a kind of zone, which is in the PGD > entry. Every pages above PAGE_OFFSET are defined as belonging to zone 0. > Every pages below PAGE_OFFSET are defined as belonging to zone 1. > > By default, zone 0 can only be accessed by kernel, and zone 1 can only > be accessed by user. When kernel wants to access zone 1, it temporarily > changes properties of zone 1 to allow both kernel and user accesses. > > So, if your mapping is below PAGE_OFFSET, it is in zone 1 and kernel > must unlock it to access it. > > > And this is more or less the same on hash/32. This is managed by segment > registers. One segment register corresponds to a 256Mbytes area. Every > pages below PAGE_OFFSET can only be read by default by kernel. Only user > can write if the PTE allows it. When the kernel needs to write at an > address below PAGE_OFFSET, it must change the segment properties in the > corresponding segment register. > > So, for both cases, if we want to have it local to a task while still > allowing kernel access, it means we have to define a new special area > between TASK_SIZE and PAGE_OFFSET which belongs to kernel zone. > > That looks complex to me for a small benefit, especially as 8xx is not > SMP and neither are most of the hash/32 targets. > Agreed. 
So I guess the solution is to differentiate between radix/non-radix and use PAGE_SHARED for non-radix along with the KUAP functions when KUAP is enabled. Hmm, I need to think about this some more, especially if it's acceptable to temporarily map kernel text as PAGE_SHARED for patching. Do you see any obvious problems on 8xx and hash/32 w/ using PAGE_SHARED? I don't necessarily want to drop the local mm patching idea for non-radix platforms since that means we would have to maintain two implementations. > Christophe
Re: Linux-next POWER9 NULL pointer NIP since 1st Apr.
> On Apr 10, 2020, at 3:20 PM, Qian Cai wrote: > > > >> On Apr 9, 2020, at 10:14 AM, Steven Rostedt wrote: >> >> On Thu, 9 Apr 2020 06:06:35 -0400 >> Qian Cai wrote: >> > I’ll go to bisect some more but it is going to take a while. > > $ git log --oneline 4c205c84e249..8e99cf91b99b > 8e99cf91b99b tracing: Do not allocate buffer in trace_find_next_entry() > in atomic > 2ab2a0924b99 tracing: Add documentation on set_ftrace_notrace_pid and > set_event_notrace_pid > ebed9628f5c2 selftests/ftrace: Add test to test new set_event_notrace_pid > file > ed8839e072b8 selftests/ftrace: Add test to test new > set_ftrace_notrace_pid file > 276836260301 tracing: Create set_event_notrace_pid to not trace tasks > b3b1e6ededa4 ftrace: Create set_ftrace_notrace_pid to not trace tasks > 717e3f5ebc82 ftrace: Make function trace pid filtering a bit more exact If it is affecting function tracing, it is probably one of the above two commits. >>> >>> OK, it was narrowed down to one of those messed with mcount here, >> >> Thing is, nothing here touches mcount. > > Yes, you are right. I went back to test the commit just before the 5.7-trace > merge request, > I did reproduce there. The thing is that this bastard could take more 6-hour > to happen, > so my previous attempt did not wait long enough. Back to the square one… OK, I starts to test all commits up to 12 hours. The progess on far is, BAD: v5.6-rc1 GOOD: v5.5 GOOD: 153b5c566d30 Merge tag 'microblaze-v5.6-rc1' of git://git.monstr.eu/linux-2.6-microblaze The next step I’ll be testing, 71c3a888cbca Merge tag 'powerpc-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux IF that is BAD, the merge request is the culprit. I can see a few commits are more related that others. 5290ae2b8e5f powerpc/64: Use {SAVE,REST}_NVGPRS macros ed0bc98f8cbe powerpc/64s: Reimplement power4_idle code in C Does it ring any bell yet?
Re: [PATCH] lib/mpi: Fix building for powerpc with clang
On Mon, Apr 13, 2020 at 12:50:42PM -0700, Nathan Chancellor wrote: > 0day reports over and over on an powerpc randconfig with clang: > > lib/mpi/generic_mpih-mul1.c:37:13: error: invalid use of a cast in a > inline asm context requiring an l-value: remove the cast or build with > -fheinous-gnu-extensions > > Remove the superfluous casts, which have been done previously for x86 > and arm32 in commit dea632cadd12 ("lib/mpi: fix build with clang") and > commit 7b7c1df2883d ("lib/mpi/longlong.h: fix building with 32-bit > x86"). > > Reported-by: kbuild test robot > Link: https://github.com/ClangBuiltLinux/linux/issues/991 > Signed-off-by: Nathan Chancellor Acked-by: Herbert Xu -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH v2, 4/5] powerpc: sysdev: fix compile error for fsl_85xx_l2ctlr
Include "linux/of_address.h" to fix the compile error for mpc85xx_l2ctlr_of_probe() when compiling fsl_85xx_cache_sram.c. CC arch/powerpc/sysdev/fsl_85xx_l2ctlr.o arch/powerpc/sysdev/fsl_85xx_l2ctlr.c: In function ‘mpc85xx_l2ctlr_of_probe’: arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:11: error: implicit declaration of function ‘of_iomap’; did you mean ‘pci_iomap’? [-Werror=implicit-function-declaration] l2ctlr = of_iomap(dev->dev.of_node, 0); ^~~~ pci_iomap arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:9: error: assignment makes pointer from integer without a cast [-Werror=int-conversion] l2ctlr = of_iomap(dev->dev.of_node, 0); ^ cc1: all warnings being treated as errors scripts/Makefile.build:267: recipe for target 'arch/powerpc/sysdev/fsl_85xx_l2ctlr.o' failed make[2]: *** [arch/powerpc/sysdev/fsl_85xx_l2ctlr.o] Error 1 Cc: Greg Kroah-Hartman Cc: Christophe Leroy Cc: Scott Wood Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support") Signed-off-by: Wang Wenhu --- Changes since v1: * None --- arch/powerpc/sysdev/fsl_85xx_l2ctlr.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c b/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c index 2d0af0c517bb..7533572492f0 100644 --- a/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c +++ b/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include "fsl_85xx_cache_ctlr.h" -- 2.17.1
[PATCH v2,5/5] drivers: uio: new driver for fsl_85xx_cache_sram
A driver for freescale 85xx platforms to access the Cache-Sram from user level. This is extremely helpful for some user-space applications that require high performance memory accesses. Cc: Greg Kroah-Hartman Cc: Christophe Leroy Cc: Scott Wood Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Wang Wenhu --- Changes since v1: * Addressed comments of Greg K-H * Moved kfree(info->name) into uio_info_free_internal() --- drivers/uio/Kconfig | 8 ++ drivers/uio/Makefile | 1 + drivers/uio/uio_fsl_85xx_cache_sram.c | 182 ++ 3 files changed, 191 insertions(+) create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig index 202ee81cfc2b..afd38ec13de0 100644 --- a/drivers/uio/Kconfig +++ b/drivers/uio/Kconfig @@ -105,6 +105,14 @@ config UIO_NETX To compile this driver as a module, choose M here; the module will be called uio_netx. +config UIO_FSL_85XX_CACHE_SRAM + tristate "Freescale 85xx Cache-Sram driver" + depends on FSL_85XX_CACHE_SRAM + help + Generic driver for accessing the Cache-Sram form user level. This + is extremely helpful for some user-space applications that require + high performance memory accesses. 
+ config UIO_FSL_ELBC_GPCM tristate "eLBC/GPCM driver" depends on FSL_LBC diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile index c285dd2a4539..be2056cffc21 100644 --- a/drivers/uio/Makefile +++ b/drivers/uio/Makefile @@ -10,4 +10,5 @@ obj-$(CONFIG_UIO_NETX)+= uio_netx.o obj-$(CONFIG_UIO_PRUSS) += uio_pruss.o obj-$(CONFIG_UIO_MF624) += uio_mf624.o obj-$(CONFIG_UIO_FSL_ELBC_GPCM)+= uio_fsl_elbc_gpcm.o +obj-$(CONFIG_UIO_FSL_85XX_CACHE_SRAM) += uio_fsl_85xx_cache_sram.o obj-$(CONFIG_UIO_HV_GENERIC) += uio_hv_generic.o diff --git a/drivers/uio/uio_fsl_85xx_cache_sram.c b/drivers/uio/uio_fsl_85xx_cache_sram.c new file mode 100644 index ..fb6903fdaddb --- /dev/null +++ b/drivers/uio/uio_fsl_85xx_cache_sram.c @@ -0,0 +1,182 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2020 Vivo Communication Technology Co. Ltd. + * Copyright (C) 2020 Wang Wenhu + * All rights reserved. + */ + +#include +#include +#include +#include +#include +#include + +#define DRIVER_NAME"uio_fsl_85xx_cache_sram" +#define UIO_NAME "uio_cache_sram" + +static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = { + { .compatible = "uio,fsl,p2020-l2-cache-controller", }, + { .compatible = "uio,fsl,p2010-l2-cache-controller", }, + { .compatible = "uio,fsl,p1020-l2-cache-controller", }, + { .compatible = "uio,fsl,p1011-l2-cache-controller", }, + { .compatible = "uio,fsl,p1013-l2-cache-controller", }, + { .compatible = "uio,fsl,p1022-l2-cache-controller", }, + { .compatible = "uio,fsl,mpc8548-l2-cache-controller",}, + { .compatible = "uio,fsl,mpc8544-l2-cache-controller",}, + { .compatible = "uio,fsl,mpc8572-l2-cache-controller",}, + { .compatible = "uio,fsl,mpc8536-l2-cache-controller",}, + { .compatible = "uio,fsl,p1021-l2-cache-controller", }, + { .compatible = "uio,fsl,p1012-l2-cache-controller", }, + { .compatible = "uio,fsl,p1025-l2-cache-controller", }, + { .compatible = "uio,fsl,p1016-l2-cache-controller", }, + { .compatible = "uio,fsl,p1024-l2-cache-controller", }, + { 
.compatible = "uio,fsl,p1015-l2-cache-controller", }, + { .compatible = "uio,fsl,p1010-l2-cache-controller", }, + { .compatible = "uio,fsl,bsc9131-l2-cache-controller",}, + {}, +}; + +static void uio_info_free_internal(struct uio_info *info) +{ + struct uio_mem *uiomem = &info->mem[0]; + + while (uiomem < &info->mem[MAX_UIO_MAPS]) { + if (uiomem->size) { + mpc85xx_cache_sram_free(uiomem->internal_addr); + kfree(uiomem->name); + } + uiomem++; + } + + kfree(info->name); +} + +static int uio_fsl_85xx_cache_sram_probe(struct platform_device *pdev) +{ + struct device_node *parent = pdev->dev.of_node; + struct device_node *node = NULL; + struct uio_info *info; + struct uio_mem *uiomem; + const char *dt_name; + u32 mem_size; + u32 align; + void *virt; + phys_addr_t phys; + int ret = -ENODEV; + + /* alloc uio_info for one device */ + info = kzalloc(sizeof(*info), GFP_KERNEL); + if (!info) { + ret = -ENOMEM; + goto err_out; + } + + /* get optional uio name */ + if (of_property_read_string(parent, "uio_name", &dt_name)) +
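The cleanup helper in this patch walks the fixed-size map array and releases only the slots that were populated (non-zero size). As a rough user-space sketch of that pattern, assuming hypothetical stand-ins throughout (`demo_info`, `demo_mem`, `DEMO_MAX_MAPS`, and `free()` in place of `mpc85xx_cache_sram_free()` and `kfree()`), it could look like the following. Unlike the patch, the sketch also zeroes each slot so that a second call is harmless; that is an addition for illustration, not something the driver does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_MAPS 5	/* hypothetical stand-in for MAX_UIO_MAPS */

/* Hypothetical user-space mirror of one map slot. */
struct demo_mem {
	char *name;
	void *addr;
	size_t size;	/* 0 means the slot was never populated */
};

/* Hypothetical user-space mirror of the per-device info. */
struct demo_info {
	char *name;
	struct demo_mem mem[DEMO_MAX_MAPS];
};

/* Release every populated slot, then the info name; returns slots freed. */
int demo_info_free_internal(struct demo_info *info)
{
	struct demo_mem *m = &info->mem[0];
	int freed = 0;

	assert(info != NULL);

	/* One-past-the-end pointer is the loop sentinel, as in the patch. */
	while (m < &info->mem[DEMO_MAX_MAPS]) {
		if (m->size) {
			free(m->addr);
			free(m->name);
			memset(m, 0, sizeof(*m));	/* not in the patch: makes a repeat call safe */
			freed++;
		}
		m++;
	}

	free(info->name);
	info->name = NULL;
	return freed;
}
```

Keying the cleanup on `size` means a probe error path can call the helper at any point, regardless of how many maps were set up before the failure.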
[PATCH v2, 2/5] powerpc: sysdev: fix compile error for fsl_85xx_cache_sram
Include linux/io.h into fsl_85xx_cache_sram.c to fix the implicit-declaration compile error when building Cache-Sram. arch/powerpc/sysdev/fsl_85xx_cache_sram.c: In function ‘instantiate_cache_sram’: arch/powerpc/sysdev/fsl_85xx_cache_sram.c:97:26: error: implicit declaration of function ‘ioremap_coherent’; did you mean ‘bitmap_complement’? [-Werror=implicit-function-declaration] cache_sram->base_virt = ioremap_coherent(cache_sram->base_phys, ^~~~ bitmap_complement arch/powerpc/sysdev/fsl_85xx_cache_sram.c:97:24: error: assignment makes pointer from integer without a cast [-Werror=int-conversion] cache_sram->base_virt = ioremap_coherent(cache_sram->base_phys, ^ arch/powerpc/sysdev/fsl_85xx_cache_sram.c:123:2: error: implicit declaration of function ‘iounmap’; did you mean ‘roundup’? [-Werror=implicit-function-declaration] iounmap(cache_sram->base_virt); ^~~ roundup cc1: all warnings being treated as errors Cc: Greg Kroah-Hartman Cc: Christophe Leroy Cc: Scott Wood Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support") Signed-off-by: WANG Wenhu --- Changes since v1: * None --- arch/powerpc/sysdev/fsl_85xx_cache_sram.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c index f6c665dac725..be3aef4229d7 100644 --- a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c +++ b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c @@ -17,6 +17,7 @@ #include #include #include +#include #include "fsl_85xx_cache_ctlr.h" -- 2.17.1
[PATCH v2,0/5] drivers: uio: new driver uio_fsl_85xx_cache_sram
This series adds a new uio driver for freescale 85xx platforms to access the Cache-Sram from user level. This is extremely helpful for the user-space applications that require high performance memory accesses. It fixes the compile errors and warning of the hardware level drivers and implements the uio driver in uio_fsl_85xx_cache_sram.c. Changes since v1: * Addressed comments of Greg K-H * Moved kfree(info->name) into uio_info_free_internal() Wang Wenhu (5): powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable powerpc: sysdev: fix compile error for fsl_85xx_cache_sram powerpc: sysdev: fix compile warning for fsl_85xx_cache_sram powerpc: sysdev: fix compile error for fsl_85xx_l2ctlr drivers: uio: new driver for fsl_85xx_cache_sram arch/powerpc/platforms/85xx/Kconfig | 2 +- arch/powerpc/platforms/Kconfig.cputype| 5 +- arch/powerpc/sysdev/fsl_85xx_cache_sram.c | 3 +- arch/powerpc/sysdev/fsl_85xx_l2ctlr.c | 1 + drivers/uio/Kconfig | 8 + drivers/uio/Makefile | 1 + drivers/uio/uio_fsl_85xx_cache_sram.c | 182 ++ 7 files changed, 198 insertions(+), 4 deletions(-) create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c -- 2.17.1
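For readers unfamiliar with how a UIO region is reached from user level, here is a minimal sketch of the generic UIO convention: mapping N of a device is selected by passing N times the page size as the mmap() offset, and the map size is normally read from /sys/class/uio/uioX/maps/mapN/size. The helper names `uio_map_offset` and `uio_map_region` are hypothetical, and the device path the caller passes (for example /dev/uio0) is an assumption; nothing here is taken from the driver in this series.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* UIO selects mapping N through the mmap() offset: N * page size. */
off_t uio_map_offset(int map_index, long page_size)
{
	assert(map_index >= 0 && page_size > 0);
	return (off_t)map_index * page_size;
}

/*
 * Map region `map_index` of the UIO device node at `dev_path`.
 * `size` is supplied by the caller (typically read from sysfs).
 * Returns MAP_FAILED if the device cannot be opened or mapped.
 */
void *uio_map_region(const char *dev_path, int map_index, size_t size)
{
	void *p;
	int fd = open(dev_path, O_RDWR | O_SYNC);

	if (fd < 0)
		return MAP_FAILED;

	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
		 uio_map_offset(map_index, sysconf(_SC_PAGESIZE)));

	/* The mapping, if any, remains valid after the descriptor is closed. */
	close(fd);
	return p;
}
```

A caller would check the result against MAP_FAILED, use the memory, and release it with munmap() using the same size.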
[PATCH v2,1/5] powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable
Enable FSL_85XX_CACHE_SRAM selection. On e500 platforms, the cache can be configured and used as a piece of SRAM, which is highly beneficial to the performance of some user-level applications. Cc: Greg Kroah-Hartman Cc: Christophe Leroy Cc: Scott Wood Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Wang Wenhu --- Changes since v1: * None --- arch/powerpc/platforms/85xx/Kconfig| 2 +- arch/powerpc/platforms/Kconfig.cputype | 5 +++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig index fa3d29dcb57e..6debb4f1b9cc 100644 --- a/arch/powerpc/platforms/85xx/Kconfig +++ b/arch/powerpc/platforms/85xx/Kconfig @@ -17,7 +17,7 @@ if FSL_SOC_BOOKE if PPC32 config FSL_85XX_CACHE_SRAM - bool + bool "Freescale 85xx Cache-Sram" select PPC_LIB_RHEAP help When selected, this option enables cache-sram support diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index 0c3c1902135c..1921e9a573e8 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 config PPC32 - bool + bool "32-bit kernel" default y if !PPC64 select KASAN_VMALLOC if KASAN && MODULES @@ -15,6 +15,7 @@ config PPC_BOOK3S_32 bool menu "Processor support" + choice prompt "Processor Type" depends on PPC32 @@ -211,9 +212,9 @@ config PPC_BOOK3E depends on PPC_BOOK3E_64 config E500 + bool "e500 Support" select FSL_EMB_PERFMON select PPC_FSL_BOOK3E - bool config PPC_E500MC bool "e500mc Support" -- 2.17.1
[PATCH v2, 3/5] powerpc: sysdev: fix compile warning for fsl_85xx_cache_sram
Function instantiate_cache_sram should not be linked into the init section, because its caller mpc85xx_l2ctlr_of_probe is not __init. Cc: Greg Kroah-Hartman Cc: Christophe Leroy Cc: Scott Wood Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support") Signed-off-by: Wang Wenhu Warning information: MODPOST vmlinux.o WARNING: modpost: vmlinux.o(.text+0x1e540): Section mismatch in reference from the function mpc85xx_l2ctlr_of_probe() to the function .init.text:instantiate_cache_sram() The function mpc85xx_l2ctlr_of_probe() references the function __init instantiate_cache_sram(). This is often because mpc85xx_l2ctlr_of_probe lacks a __init annotation or the annotation of instantiate_cache_sram is wrong. --- Changes since v1: * None --- arch/powerpc/sysdev/fsl_85xx_cache_sram.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c index be3aef4229d7..3de5ac8382c0 100644 --- a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c +++ b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c @@ -68,7 +68,7 @@ void mpc85xx_cache_sram_free(void *ptr) } EXPORT_SYMBOL(mpc85xx_cache_sram_free); -int __init instantiate_cache_sram(struct platform_device *dev, +int instantiate_cache_sram(struct platform_device *dev, struct sram_parameters sram_params) { int ret = 0; -- 2.17.1
[PATCH] powerpc/uaccess: Don't set KUAP by default on book3s/32
On book3s/32, KUAP is a heavy process, as it requires determining which segments are impacted and unlocking/locking each of them. Since the implementation of user_access_begin/end it is even worse for the time being, because unlike __get_user(), user_access_begin doesn't distinguish between read and write and unlocks access for reads as well, although that's unneeded on book3s/32. As shown by the size of a kernel built with KUAP and one without, the overhead is 64k bytes of code. As a comparison, a similar build on an 8xx has an overhead of only 8k bytes of code.
   text    data     bss     dec     hex filename
7230416 1425868  837376 9493660  90dc9c vmlinux.kuap6xx
7165012 1425548  837376 9427936  8fdbe0 vmlinux.nokuap6xx
6519796 1960028  477464 8957288  88ad68 vmlinux.kuap8xx
6511664 1959864  477464 8948992  888d00 vmlinux.nokuap8xx
Until a more optimised KUAP is implemented on book3s/32, don't select it by default. Signed-off-by: Christophe Leroy --- arch/powerpc/platforms/Kconfig.cputype | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index 0c3c1902135c..0c7151c98b56 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -389,7 +389,7 @@ config PPC_HAVE_KUAP config PPC_KUAP bool "Kernel Userspace Access Protection" depends on PPC_HAVE_KUAP - default y + default y if !PPC_BOOK3S_32 help Enable support for Kernel Userspace Access Protection (KUAP) -- 2.25.0
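The 64k and 8k figures quoted in the commit message can be checked against its size table by subtracting the text column of each non-KUAP build from the matching KUAP build. The small sketch below does that arithmetic; the struct and function names (`size_row`, `text_overhead`, `print_overhead`) are made up for illustration, and only the numbers come from the message. The differences come out at 65404 bytes (about 63.9 KiB) for the 6xx build and 8132 bytes (about 7.9 KiB) for the 8xx build.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the `size` output quoted in the commit message. */
struct size_row {
	const char *name;
	unsigned long text;
	unsigned long data;
	unsigned long bss;
};

/* Figures copied from the table in the commit message. */
const struct size_row kuap6xx   = { "vmlinux.kuap6xx",   7230416, 1425868, 837376 };
const struct size_row nokuap6xx = { "vmlinux.nokuap6xx", 7165012, 1425548, 837376 };
const struct size_row kuap8xx   = { "vmlinux.kuap8xx",   6519796, 1960028, 477464 };
const struct size_row nokuap8xx = { "vmlinux.nokuap8xx", 6511664, 1959864, 477464 };

/* Extra code (text) bytes in the KUAP build relative to the non-KUAP build. */
unsigned long text_overhead(const struct size_row *kuap,
			    const struct size_row *nokuap)
{
	assert(kuap->text >= nokuap->text);
	return kuap->text - nokuap->text;
}

/* Print both overheads in bytes and KiB. */
void print_overhead(void)
{
	unsigned long o6 = text_overhead(&kuap6xx, &nokuap6xx);
	unsigned long o8 = text_overhead(&kuap8xx, &nokuap8xx);

	printf("book3s/32 (6xx): %lu bytes (%.1f KiB)\n", o6, o6 / 1024.0);
	printf("8xx:             %lu bytes (%.1f KiB)\n", o8, o8 / 1024.0);
}
```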
[PATCH] powerpc/uaccess: Don't set KUEP by default on book3s/32
On book3s/32, KUEP is a heavy process, as it requires setting/unsetting the NX bit in each of the 12 user segments every time the kernel is entered/exited from/to user space. Don't select KUEP by default on book3s/32. Signed-off-by: Christophe Leroy --- arch/powerpc/platforms/Kconfig.cputype | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index 0c7151c98b56..11412078e732 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -377,7 +377,7 @@ config PPC_HAVE_KUEP config PPC_KUEP bool "Kernel Userspace Execution Prevention" depends on PPC_HAVE_KUEP - default y + default y if !PPC_BOOK3S_32 help Enable support for Kernel Userspace Execution Prevention (KUEP) -- 2.25.0
[PATCH 01/34] docs: filesystems: fix references for doc files there
Several files there were renamed to ReST. Fix the broken references. Signed-off-by: Mauro Carvalho Chehab --- Documentation/ABI/stable/sysfs-devices-node | 2 +- Documentation/ABI/testing/procfs-smaps_rollup | 2 +- Documentation/admin-guide/cpu-load.rst| 2 +- Documentation/admin-guide/nfs/nfsroot.rst | 2 +- Documentation/driver-api/driver-model/device.rst | 2 +- Documentation/driver-api/driver-model/overview.rst| 2 +- Documentation/filesystems/dax.txt | 2 +- Documentation/filesystems/dnotify.txt | 2 +- Documentation/filesystems/ramfs-rootfs-initramfs.rst | 2 +- Documentation/powerpc/firmware-assisted-dump.rst | 2 +- Documentation/process/adding-syscalls.rst | 2 +- .../translations/it_IT/process/adding-syscalls.rst| 2 +- Documentation/translations/zh_CN/filesystems/sysfs.txt| 6 +++--- drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h | 2 +- fs/Kconfig| 2 +- fs/Kconfig.binfmt | 2 +- fs/adfs/Kconfig | 2 +- fs/affs/Kconfig | 2 +- fs/afs/Kconfig| 6 +++--- fs/bfs/Kconfig| 2 +- fs/cramfs/Kconfig | 2 +- fs/ecryptfs/Kconfig | 2 +- fs/fat/Kconfig| 8 fs/fuse/Kconfig | 2 +- fs/fuse/dev.c | 2 +- fs/hfs/Kconfig| 2 +- fs/hpfs/Kconfig | 2 +- fs/isofs/Kconfig | 2 +- fs/namespace.c| 2 +- fs/notify/inotify/Kconfig | 2 +- fs/ntfs/Kconfig | 2 +- fs/ocfs2/Kconfig | 2 +- fs/overlayfs/Kconfig | 6 +++--- fs/proc/Kconfig | 4 ++-- fs/romfs/Kconfig | 2 +- fs/sysfs/dir.c| 2 +- fs/sysfs/file.c | 2 +- fs/sysfs/mount.c | 2 +- fs/sysfs/symlink.c| 2 +- fs/sysv/Kconfig | 2 +- fs/udf/Kconfig| 2 +- include/linux/relay.h | 2 +- include/linux/sysfs.h | 2 +- kernel/relay.c| 2 +- 44 files changed, 54 insertions(+), 54 deletions(-) diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node index df8413cf1468..484fc04bcc25 100644 --- a/Documentation/ABI/stable/sysfs-devices-node +++ b/Documentation/ABI/stable/sysfs-devices-node @@ -54,7 +54,7 @@ Date: October 2002 Contact: Linux Memory Management list Description: Provides information about the node's distribution and memory 
- utilization. Similar to /proc/meminfo, see Documentation/filesystems/proc.txt + utilization. Similar to /proc/meminfo, see Documentation/filesystems/proc.rst What: /sys/devices/system/node/nodeX/numastat Date: October 2002 diff --git a/Documentation/ABI/testing/procfs-smaps_rollup b/Documentation/ABI/testing/procfs-smaps_rollup index 274df44d8b1b..046978193368 100644 --- a/Documentation/ABI/testing/procfs-smaps_rollup +++ b/Documentation/ABI/testing/procfs-smaps_rollup @@ -11,7 +11,7 @@ Description: Additionally, the fields Pss_Anon, Pss_File and Pss_Shmem are not present in /proc/pid/smaps. These fields represent the sum of the Pss field of each type (anon, file, shmem). - For more details, see Documentation/filesystems/proc.txt + For more details, see Documentation/filesystems/proc.rst and the procfs man page. Typical output looks like this: diff --git a/Documentation/admin-guide/cpu-load.rst b/Documentation/admin-guide/cpu-load.rst index 2d01ce43d2a2..ebdecf864080 100644 --- a/Documentation/admin-guide/cpu-load.rst +++ b/Documentation/admin-guide/cpu-load.rst @@ -105,7 +105,7 @@ References -- - http://lkml.org/lkml/2007/2/12/6 -- Documentation/filesystems/proc.txt (1.8) +- Documentation/filesystems/proc.rst (1.8) Thanks diff --git a/Documentation/admin-guide/nfs/nfsroot.rst
[PATCH 29/34] docs: filesystems: convert spufs/spufs.txt to ReST
This file is in groff output format. Manually convert it to ReST format, trying to preserve a similar output after parsing. Signed-off-by: Mauro Carvalho Chehab --- Documentation/filesystems/spufs/index.rst | 1 + .../spufs/{spufs.txt => spufs.rst}| 59 +-- MAINTAINERS | 2 +- 3 files changed, 30 insertions(+), 32 deletions(-) rename Documentation/filesystems/spufs/{spufs.txt => spufs.rst} (95%) diff --git a/Documentation/filesystems/spufs/index.rst b/Documentation/filesystems/spufs/index.rst index 39553c6ebefd..939cf59a7d9e 100644 --- a/Documentation/filesystems/spufs/index.rst +++ b/Documentation/filesystems/spufs/index.rst @@ -8,4 +8,5 @@ SPU Filesystem .. toctree:: :maxdepth: 1 + spufs spu_create diff --git a/Documentation/filesystems/spufs/spufs.txt b/Documentation/filesystems/spufs/spufs.rst similarity index 95% rename from Documentation/filesystems/spufs/spufs.txt rename to Documentation/filesystems/spufs/spufs.rst index caf36aaae804..8a42859bb100 100644 --- a/Documentation/filesystems/spufs/spufs.txt +++ b/Documentation/filesystems/spufs/spufs.rst @@ -1,12 +1,18 @@ -SPUFS(2) Linux Programmer's Manual SPUFS(2) +.. SPDX-License-Identifier: GPL-2.0 += +spufs += +Name + -NAME spufs - the SPU file system -DESCRIPTION +Description +=== + The SPU file system is used on PowerPC machines that implement the Cell Broadband Engine Architecture in order to access Synergistic Processor Units (SPUs). @@ -21,7 +27,9 @@ DESCRIPTION ally add or remove files. -MOUNT OPTIONS +Mount Options += + uid= set the user owning the mount point, the default is 0 (root). @@ -29,7 +37,9 @@ MOUNT OPTIONS set the group owning the mount point, the default is 0 (root). -FILES +Files += + The files in spufs mostly follow the standard behavior for regular sys- tem calls like read(2) or write(2), but often support only a subset of the operations supported on regular file systems. This list details the @@ -125,14 +135,12 @@ FILES space is available for writing. 
- /mbox_stat - /ibox_stat - /wbox_stat + /mbox_stat, /ibox_stat, /wbox_stat Read-only files that contain the length of the current queue, i.e. how many words can be read from mbox or ibox or how many words can be written to wbox without blocking. The files can be read only in 4-byte units and return a big-endian binary integer number. The possible - operations on an open *box_stat file are: + operations on an open ``*box_stat`` file are: read(2) If a count smaller than four is requested, read returns -1 and @@ -143,12 +151,7 @@ FILES in EAGAIN. - /npc - /decr - /decr_status - /spu_tag_mask - /event_mask - /srr0 + /npc, /decr, /decr_status, /spu_tag_mask, /event_mask, /srr0 Internal registers of the SPU. The representation is an ASCII string with the numeric value of the next instruction to be executed. These can be used in read/write mode for debugging, but normal operation of @@ -157,17 +160,14 @@ FILES The contents of these files are: + === === npc Next Program Counter - decrSPU Decrementer - decr_status Decrementer Status - spu_tag_maskMFC tag mask for SPU DMA - event_mask Event mask for SPU interrupts - srr0Interrupt Return address register + === === The possible operations on an open npc, decr, decr_status, @@ -206,8 +206,7 @@ FILES from the data buffer, updating the value of the fpcr register. - /signal1 - /signal2 + /signal1, /signal2 The two signal notification channels of an SPU. These are read-write files that operate on a 32 bit word. Writing to one of these files triggers an interrupt on the SPU. The value written to the signal @@ -233,8 +232,7 @@ FILES file. - /signal1_type - /signal2_type + /signal1_type, /signal2_type These two files change the behavior of the signal1 and signal2 notifi- cation files. The contain a numerical ASCII string which is read as either "1" or "0". In mode 0 (overwrite), the hardware replaces the @@ -259,18 +257,17 @@ FILES the previous setting. 
-EXAMPLES +Examples + /etc/fstab entry none /spu spufs gid=spu 00 -AUTHORS +Authors +=== Arnd Bergmann , Mark Nutter , Ulrich Weigand -SEE ALSO +See Also +
[PATCH 00/34] fs: convert remaining docs to ReST file format
This patch series converts the remaining files under Documentation/filesystems to the ReST file format. It is based on linux-next (next-20200414). PS.: I opted to add mainly MLs from the output of get_maintainers.pl to the c/c list of patch 00/34, because otherwise the number of c/c would be too large, which would very likely cause ML servers to reject it. The results of those changes (together with other changes from my pending doc patches) are available at: https://www.infradead.org/~mchehab/kernel_docs/filesystems/index.html Mauro Carvalho Chehab (34): docs: filesystems: fix references for doc files there docs: filesystems: convert caching/object.txt to ReST docs: filesystems: convert caching/fscache.txt to ReST format docs: filesystems: caching/netfs-api.txt: convert it to ReST docs: filesystems: caching/operations.txt: convert it to ReST docs: filesystems: caching/cachefiles.txt: convert to ReST docs: filesystems: caching/backend-api.txt: convert it to ReST docs: filesystems: convert cifs/cifsroot.rst to ReST docs: filesystems: convert configfs.txt to ReST docs: filesystems: convert automount-support.txt to ReST docs: filesystems: convert coda.txt to ReST docs: filesystems: convert dax.txt to ReST docs: filesystems: convert devpts.txt to ReST docs: filesystems: convert dnotify.txt to ReST docs: filesystems: convert fiemap.txt to ReST docs: filesystems: convert files.txt to ReST docs: filesystems: convert fuse-io.txt to ReST docs: filesystems: convert gfs2-glocks.txt to ReST docs: filesystems: convert locks.txt to ReST docs: filesystems: convert mandatory-locking.txt to ReST docs: filesystems: convert mount_api.txt to ReST docs: filesystems: rename path-lookup.txt file docs: filesystems: convert path-walking.txt to ReST docs: filesystems: convert quota.txt to ReST docs: filesystems: convert seq_file.txt to ReST docs: filesystems: convert sharedsubtree.txt to ReST docs: filesystems: split spufs.txt into 3 separate files docs: filesystems: convert 
spufs/spu_create.txt to ReST docs: filesystems: convert spufs/spufs.txt to ReST docs: filesystems: convert spufs/spu_run.txt to ReST docs: filesystems: convert sysfs-pci.txt to ReST docs: filesystems: convert sysfs-tagging.txt to ReST docs: filesystems: convert xfs-delayed-logging-design.txt to ReST docs: filesystems: convert xfs-self-describing-metadata.txt to ReST Documentation/ABI/stable/sysfs-devices-node |2 +- Documentation/ABI/testing/procfs-smaps_rollup |2 +- Documentation/admin-guide/cpu-load.rst|2 +- Documentation/admin-guide/ext4.rst|2 +- Documentation/admin-guide/nfs/nfsroot.rst |2 +- Documentation/admin-guide/sysctl/kernel.rst |2 +- .../driver-api/driver-model/device.rst|2 +- .../driver-api/driver-model/overview.rst |2 +- ...ount-support.txt => automount-support.rst} | 23 +- .../{backend-api.txt => backend-api.rst} | 165 +- .../{cachefiles.txt => cachefiles.rst}| 139 +- Documentation/filesystems/caching/fscache.rst | 565 ++ Documentation/filesystems/caching/fscache.txt | 448 - Documentation/filesystems/caching/index.rst | 14 + .../caching/{netfs-api.txt => netfs-api.rst} | 172 +- .../caching/{object.txt => object.rst}| 43 +- .../{operations.txt => operations.rst}| 45 +- .../cifs/{cifsroot.txt => cifsroot.rst} | 56 +- Documentation/filesystems/coda.rst| 1670 Documentation/filesystems/coda.txt| 1676 - .../{configfs/configfs.txt => configfs.rst} | 129 +- .../filesystems/{dax.txt => dax.rst} | 11 +- Documentation/filesystems/devpts.rst | 36 + Documentation/filesystems/devpts.txt | 26 - .../filesystems/{dnotify.txt => dnotify.rst} | 13 +- Documentation/filesystems/ext2.rst|2 +- .../filesystems/{fiemap.txt => fiemap.rst}| 133 +- .../filesystems/{files.txt => files.rst} | 15 +- .../filesystems/{fuse-io.txt => fuse-io.rst} |6 + .../{gfs2-glocks.txt => gfs2-glocks.rst} | 147 +- Documentation/filesystems/index.rst | 26 + .../filesystems/{locks.txt => locks.rst} | 14 +- ...tory-locking.txt => mandatory-locking.rst} | 25 +- .../{mount_api.txt => mount_api.rst} | 
329 ++-- .../{path-lookup.txt => path-walking.rst} | 88 +- Documentation/filesystems/porting.rst |2 +- Documentation/filesystems/proc.rst|2 +- .../filesystems/{quota.txt => quota.rst} | 41 +- .../filesystems/ramfs-rootfs-initramfs.rst|2 +- .../{seq_file.txt => seq_file.rst}| 61 +- .../{sharedsubtree.txt => sharedsubtree.rst} | 394 ++-- Documentation/filesystems/spufs/index.rst | 13 + .../filesystems/spufs/spu_create.rst | 131 ++ Documentation/filesystems/spufs/spu_run.rst | 138 ++ .../{spufs.txt => spufs/spufs.
Re: [PATCH v6 6/7] ASoC: dt-bindings: fsl_easrc: Add document for EASRC
On Tue, Apr 14, 2020 at 9:56 PM Shengjiu Wang wrote: > > Hi Rob > > On Tue, Apr 14, 2020 at 11:49 PM Rob Herring wrote: > > > > On Wed, Apr 01, 2020 at 04:45:39PM +0800, Shengjiu Wang wrote: > > > EASRC (Enhanced Asynchronous Sample Rate Converter) is a new > > > IP module found on i.MX8MN. > > > > > > Signed-off-by: Shengjiu Wang > > > --- > > > .../devicetree/bindings/sound/fsl,easrc.yaml | 101 ++ > > > 1 file changed, 101 insertions(+) > > > create mode 100644 Documentation/devicetree/bindings/sound/fsl,easrc.yaml > > > > > > diff --git a/Documentation/devicetree/bindings/sound/fsl,easrc.yaml > > > b/Documentation/devicetree/bindings/sound/fsl,easrc.yaml > > > new file mode 100644 > > > index ..14ea60084420 > > > --- /dev/null > > > +++ b/Documentation/devicetree/bindings/sound/fsl,easrc.yaml > > > @@ -0,0 +1,101 @@ > > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) > > > +%YAML 1.2 > > > +--- > > > +$id: http://devicetree.org/schemas/sound/fsl,easrc.yaml# > > > +$schema: http://devicetree.org/meta-schemas/core.yaml# > > > + > > > +title: NXP Asynchronous Sample Rate Converter (ASRC) Controller > > > + > > > +maintainers: > > > + - Shengjiu Wang > > > + > > > +properties: > > > + $nodename: > > > +pattern: "^easrc@.*" > > > + > > > + compatible: > > > +const: fsl,imx8mn-easrc > > > + > > > + reg: > > > +maxItems: 1 > > > + > > > + interrupts: > > > +maxItems: 1 > > > + > > > + clocks: > > > +items: > > > + - description: Peripheral clock > > > + > > > + clock-names: > > > +items: > > > + - const: mem > > > + > > > + dmas: > > > +maxItems: 8 > > > + > > > + dma-names: > > > +items: > > > + - const: ctx0_rx > > > + - const: ctx0_tx > > > + - const: ctx1_rx > > > + - const: ctx1_tx > > > + - const: ctx2_rx > > > + - const: ctx2_tx > > > + - const: ctx3_rx > > > + - const: ctx3_tx > > > + > > > + firmware-name: > > > +allOf: > > > + - $ref: /schemas/types.yaml#/definitions/string > > > + - const: imx/easrc/easrc-imx8mn.bin > > > +description: The 
coefficient table for the filters > > > + > > > + fsl,asrc-rate: > > > > fsl,asrc-rate-hz > > Can we keep "fsl,asrc-rate", because I want this property > align with the one in fsl,asrc.txt. These two asrc modules > can share same property name. Oh, yes. So with the example fixed: Reviewed-by: Rob Herring
Re: [PATCH 5/5] drivers: uio: new driver for fsl_85xx_cache_sram
>On Wed, Apr 15, 2020 at 05:33:46AM -0700, Wang Wenhu wrote:
Hi, Greg k-h! Thank you for your fast reply. All the comments will be addressed in v2 soon. Detailed explanations are just below each specific comment. >> A driver for freescale 85xx platforms to access the Cache-Sram form >> user level. This is extremely helpful for some user-space applications >> that require high performance memory accesses. >> >> Cc: Greg Kroah-Hartman >> Cc: Christophe Leroy >> Cc: Scott Wood >> Cc: Michael Ellerman >> Cc: linuxppc-dev@lists.ozlabs.org >> Signed-off-by: Wang Wenhu >> --- >> drivers/uio/Kconfig | 8 ++ >> drivers/uio/Makefile | 1 + >> drivers/uio/uio_fsl_85xx_cache_sram.c | 195 ++ >> 3 files changed, 204 insertions(+) >> create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c >> >> diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig >> index 202ee81cfc2b..afd38ec13de0 100644 >> --- a/drivers/uio/Kconfig >> +++ b/drivers/uio/Kconfig >> @@ -105,6 +105,14 @@ config UIO_NETX >>To compile this driver as a module, choose M here; the module >>will be called uio_netx. >> >> +config UIO_FSL_85XX_CACHE_SRAM >> +tristate "Freescale 85xx Cache-Sram driver" >> +depends on FSL_85XX_CACHE_SRAM >> +help >> + Generic driver for accessing the Cache-Sram form user level. This >> + is extremely helpful for some user-space applications that require >> + high performance memory accesses. 
>> + >> config UIO_FSL_ELBC_GPCM >> tristate "eLBC/GPCM driver" >> depends on FSL_LBC >> diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile >> index c285dd2a4539..be2056cffc21 100644 >> --- a/drivers/uio/Makefile >> +++ b/drivers/uio/Makefile >> @@ -10,4 +10,5 @@ obj-$(CONFIG_UIO_NETX) += uio_netx.o >> obj-$(CONFIG_UIO_PRUSS) += uio_pruss.o >> obj-$(CONFIG_UIO_MF624) += uio_mf624.o >> obj-$(CONFIG_UIO_FSL_ELBC_GPCM) += uio_fsl_elbc_gpcm.o >> +obj-$(CONFIG_UIO_FSL_85XX_CACHE_SRAM) += uio_fsl_85xx_cache_sram.o >> obj-$(CONFIG_UIO_HV_GENERIC)+= uio_hv_generic.o >> diff --git a/drivers/uio/uio_fsl_85xx_cache_sram.c >> b/drivers/uio/uio_fsl_85xx_cache_sram.c >> new file mode 100644 >> index ..e11202dd5b93 >> --- /dev/null >> +++ b/drivers/uio/uio_fsl_85xx_cache_sram.c >> @@ -0,0 +1,195 @@ >> +// SPDX-License-Identifier: GPL-2.0 >> +/* >> + * Copyright (C) 2020 Vivo Communication Technology Co. Ltd. >> + * Copyright (C) 2020 Wang Wenhu >> + * All rights reserved. >> + * >> + * This program is free software; you can redistribute it and/or modify it >> + * under the terms of the GNU General Public License version 2 as published >> + * by the Free Software Foundation. > >Nit, you don't need this sentance anymore now that you have the SPDX >line above > Got, I will delete it with v2. >> + */ >> + >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> + >> +#define DRIVER_VERSION "0.1.0" > >Don't do DRIVER_VERSIONs, they never work once the code is in the kernel >tree. > >> +#define DRIVER_NAME "uio_fsl_85xx_cache_sram" > >KBUILD_MODNAME? Yes, and sorry for that I did not get what should have been done? 
> >> +#define UIO_NAME"uio_cache_sram" >> + >> +static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = { >> +{ .compatible = "uio,fsl,p2020-l2-cache-controller", }, >> +{ .compatible = "uio,fsl,p2010-l2-cache-controller", }, >> +{ .compatible = "uio,fsl,p1020-l2-cache-controller", }, >> +{ .compatible = "uio,fsl,p1011-l2-cache-controller", }, >> +{ .compatible = "uio,fsl,p1013-l2-cache-controller", }, >> +{ .compatible = "uio,fsl,p1022-l2-cache-controller", }, >> +{ .compatible = "uio,fsl,mpc8548-l2-cache-controller",}, >> +{ .compatible = "uio,fsl,mpc8544-l2-cache-controller",}, >> +{ .compatible = "uio,fsl,mpc8572-l2-cache-controller",}, >> +{ .compatible = "uio,fsl,mpc8536-l2-cache-controller",}, >> +{ .compatible = "uio,fsl,p1021-l2-cache-controller", }, >> +{ .compatible = "uio,fsl,p1012-l2-cache-controller", }, >> +{ .compatible = "uio,fsl,p1025-l2-cache-controller", }, >> +{ .compatible = "uio,fsl,p1016-l2-cache-controller", }, >> +{ .compatible = "uio,fsl,p1024-l2-cache-controller", }, >> +{ .compatible = "uio,fsl,p1015-l2-cache-controller", }, >> +{ .compatible = "uio,fsl,p1010-l2-cache-controller", }, >> +{ .compatible = "uio,fsl,bsc9131-l2-cache-controller",}, >> +{}, >> +}; >> + >> +static void uio_info_free_internal(struct uio_info *info) >> +{ >> +struct uio_mem *uiomem = &info->mem[0]; >> + >> +while (uiomem < &info->mem[MAX_UIO_MAPS]) { >> +if (uiomem->size) { >> +mpc85xx_cache_sram_free(uiomem->internal_addr); >> +kfree(uiomem->name); >>
Re: CVE-2020-11669: Linux kernel 4.10 to 5.1: powerpc: guest can cause DoS on POWER9 KVM hosts
On Wed, Apr 15, 2020 at 10:52:53PM +1000, Andrew Donnellan wrote: > The Linux kernel for powerpc from v4.10 to v5.1 has a bug where the > Authority Mask Register (AMR), Authority Mask Override Register (AMOR) and > User Authority Mask Override Register (UAMOR) are not correctly saved and > restored when the CPU is going into/coming out of idle state. > > On POWER9 CPUs, this means that a CPU may return from idle with the AMR > value of another thread on the same core. > > This allows a trivial Denial of Service attack against KVM hosts, by booting > a guest kernel which makes use of the AMR, such as a v5.2 or later kernel > with Kernel Userspace Access Prevention (KUAP) enabled. > > The guest kernel will set the AMR to prevent userspace access, then the > thread will go idle. At a later point, the hardware thread that the guest > was using may come out of idle and start executing in the host, without > restoring the host AMR value. The host kernel can get caught in a page fault > loop, as the AMR is unexpectedly causing memory accesses to fail in the > host, and the host is eventually rendered unusable. Hello, shouldn't the kernel restore the host registers when leaving the guest? I recall some code exists for handling the *AM*R when leaving guest. Can the KVM guest enter idle without exiting to host? Thanks Michal
[PATCH] i2c: powermac: Simplify reading the "reg" and "i2c-address" property
>> Use of_property_read_u32 to read the "reg" and "i2c-address" property
>> instead of using of_get_property to check the return values.
>>
>> Signed-off-by: Aishwarya R
> This is quite a fragile driver. Have you tested it on HW?

This change has not been tested on hardware. However, of_property_read_u32 is a better fit here than the generic of_get_property, because it ensures the value is read correctly regardless of system endianness.
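To illustrate the endianness point, here is a minimal userspace sketch, not kernel code. It assumes only that device tree property cells are stored big-endian; every toy_* name and the 0x50 value are hypothetical and made up for this example. toy_read_u32() mimics the length check plus conversion that a typed accessor performs, while toy_raw_read() shows what reinterpreting the raw bytes does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical raw "reg" property value: one 32-bit cell, big-endian. */
static const uint8_t toy_reg_prop[4] = { 0x00, 0x00, 0x00, 0x50 };

/* Decode one big-endian 32-bit cell; the result does not depend on host byte order. */
static uint32_t toy_be32_decode(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Toy analogue of a typed property accessor: reject a missing or short
 * property, otherwise decode the first cell. Returns 0 on success.
 */
static int toy_read_u32(const uint8_t *prop, size_t len, uint32_t *out)
{
	if (!prop || len < sizeof(uint32_t))
		return -1;
	*out = toy_be32_decode(prop);
	return 0;
}

/*
 * Naive raw read: copies the bytes in host order, so the result is only
 * correct on a big-endian host.
 */
static uint32_t toy_raw_read(const uint8_t *prop)
{
	uint32_t v;

	memcpy(&v, prop, sizeof(v));
	return v;
}
```

With this sketch, toy_read_u32() yields 0x50 on any host, whereas toy_raw_read() only yields 0x50 on a big-endian host. That is the class of mistake the typed helper is meant to rule out.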
Re: [PATCH 4.19] powerpc/powernv/idle: Restore AMR/UAMOR/AMOR after idle
On Wed, Apr 15, 2020 at 10:40:05PM +1000, Andrew Donnellan wrote: > From: Michael Ellerman > > commit 53a712bae5dd919521a58d7bad773b949358add0 upstream. > > In order to implement KUAP (Kernel Userspace Access Protection) on > Power9 we will be using the AMR, and therefore indirectly the > UAMOR/AMOR. > > So save/restore these regs in the idle code. > > Signed-off-by: Michael Ellerman > [ajd: Backport to 4.19 tree, CVE-2020-11669] > Signed-off-by: Andrew Donnellan > --- > arch/powerpc/kernel/idle_book3s.S | 27 +++ > 1 file changed, 23 insertions(+), 4 deletions(-) This and the 4.14 patch now queued up, thanks. greg k-h
CVE-2020-11669: Linux kernel 4.10 to 5.1: powerpc: guest can cause DoS on POWER9 KVM hosts
The Linux kernel for powerpc from v4.10 to v5.1 has a bug where the
Authority Mask Register (AMR), Authority Mask Override Register (AMOR) and
User Authority Mask Override Register (UAMOR) are not correctly saved and
restored when the CPU is going into/coming out of idle state.

On POWER9 CPUs, this means that a CPU may return from idle with the AMR
value of another thread on the same core.

This allows a trivial Denial of Service attack against KVM hosts, by booting
a guest kernel which makes use of the AMR, such as a v5.2 or later kernel
with Kernel Userspace Access Prevention (KUAP) enabled.

The guest kernel will set the AMR to prevent userspace access, then the
thread will go idle. At a later point, the hardware thread that the guest
was using may come out of idle and start executing in the host, without
restoring the host AMR value. The host kernel can get caught in a page fault
loop, as the AMR is unexpectedly causing memory accesses to fail in the
host, and the host is eventually rendered unusable.

The fix is to correctly save and restore the AMR in the idle state handling
code.

The bug does not affect POWER8 or earlier Power CPUs.

CVE-2020-11669 has been assigned.

The bug has already been fixed upstream in kernels v5.2 onwards, by [0].
Fixes have been submitted for inclusion in upstream stable kernel trees for
v4.19[1] and v4.14[2].

The bug is already fixed in Red Hat Enterprise Linux 8 kernels from
4.18.0-147 onwards - see RHSA-2019:3517[3].

Thanks to David Gibson of Red Hat for the initial bug report.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=53a712bae5dd919521a58d7bad773b949358add0
[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208661.html
[2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208660.html
[3] https://access.redhat.com/errata/RHSA-2019:3517

-- 
Andrew Donnellan              OzLabs, ADL Canberra
a...@linux.ibm.com             IBM Australia Limited
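The failure mode and the fix described above can be pictured with a toy userspace model. This is only a sketch under loud assumptions: it is not the kernel's idle code, struct toy_hw_thread and the toy_* functions are hypothetical, the register values are arbitrary, and "idle" is modelled simply as the register coming back with a stale value.

```c
#include <stdint.h>

/* Toy model of one hardware thread; only the AMR is tracked. */
struct toy_hw_thread {
	uint64_t amr;
};

/*
 * Toy stand-in for an idle state that loses register contents: on wakeup
 * the register holds some stale value (for example one left by a guest).
 */
static void toy_idle(struct toy_hw_thread *t, uint64_t stale)
{
	t->amr = stale;
}

/* Buggy path: nothing is saved, so the stale value survives the wakeup. */
static uint64_t toy_idle_without_restore(struct toy_hw_thread *t, uint64_t stale)
{
	toy_idle(t, stale);
	return t->amr;
}

/* Fixed path: save before idle, restore after wakeup. */
static uint64_t toy_idle_with_restore(struct toy_hw_thread *t, uint64_t stale)
{
	uint64_t saved = t->amr;	/* save the host value before going idle */

	toy_idle(t, stale);
	t->amr = saved;			/* put the host value back on wakeup */
	return t->amr;
}
```

In this model the buggy path returns whatever stale value idle left behind, while the fixed path always returns the value the host had before going idle.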
Re: [PATCH 5/5] drivers: uio: new driver for fsl_85xx_cache_sram
On Wed, Apr 15, 2020 at 05:33:46AM -0700, Wang Wenhu wrote: > A driver for freescale 85xx platforms to access the Cache-Sram form > user level. This is extremely helpful for some user-space applications > that require high performance memory accesses. > > Cc: Greg Kroah-Hartman > Cc: Christophe Leroy > Cc: Scott Wood > Cc: Michael Ellerman > Cc: linuxppc-dev@lists.ozlabs.org > Signed-off-by: Wang Wenhu > --- > drivers/uio/Kconfig | 8 ++ > drivers/uio/Makefile | 1 + > drivers/uio/uio_fsl_85xx_cache_sram.c | 195 ++ > 3 files changed, 204 insertions(+) > create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c > > diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig > index 202ee81cfc2b..afd38ec13de0 100644 > --- a/drivers/uio/Kconfig > +++ b/drivers/uio/Kconfig > @@ -105,6 +105,14 @@ config UIO_NETX > To compile this driver as a module, choose M here; the module > will be called uio_netx. > > +config UIO_FSL_85XX_CACHE_SRAM > + tristate "Freescale 85xx Cache-Sram driver" > + depends on FSL_85XX_CACHE_SRAM > + help > + Generic driver for accessing the Cache-Sram form user level. This > + is extremely helpful for some user-space applications that require > + high performance memory accesses. 
> + > config UIO_FSL_ELBC_GPCM > tristate "eLBC/GPCM driver" > depends on FSL_LBC > diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile > index c285dd2a4539..be2056cffc21 100644 > --- a/drivers/uio/Makefile > +++ b/drivers/uio/Makefile > @@ -10,4 +10,5 @@ obj-$(CONFIG_UIO_NETX) += uio_netx.o > obj-$(CONFIG_UIO_PRUSS) += uio_pruss.o > obj-$(CONFIG_UIO_MF624) += uio_mf624.o > obj-$(CONFIG_UIO_FSL_ELBC_GPCM) += uio_fsl_elbc_gpcm.o > +obj-$(CONFIG_UIO_FSL_85XX_CACHE_SRAM)+= uio_fsl_85xx_cache_sram.o > obj-$(CONFIG_UIO_HV_GENERIC) += uio_hv_generic.o > diff --git a/drivers/uio/uio_fsl_85xx_cache_sram.c > b/drivers/uio/uio_fsl_85xx_cache_sram.c > new file mode 100644 > index ..e11202dd5b93 > --- /dev/null > +++ b/drivers/uio/uio_fsl_85xx_cache_sram.c > @@ -0,0 +1,195 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Copyright (C) 2020 Vivo Communication Technology Co. Ltd. > + * Copyright (C) 2020 Wang Wenhu > + * All rights reserved. > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published > + * by the Free Software Foundation. Nit, you don't need this sentance anymore now that you have the SPDX line above > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +#define DRIVER_VERSION "0.1.0" Don't do DRIVER_VERSIONs, they never work once the code is in the kernel tree. > +#define DRIVER_NAME "uio_fsl_85xx_cache_sram" KBUILD_MODNAME? 
> +#define UIO_NAME "uio_cache_sram" > + > +static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = { > + { .compatible = "uio,fsl,p2020-l2-cache-controller", }, > + { .compatible = "uio,fsl,p2010-l2-cache-controller", }, > + { .compatible = "uio,fsl,p1020-l2-cache-controller", }, > + { .compatible = "uio,fsl,p1011-l2-cache-controller", }, > + { .compatible = "uio,fsl,p1013-l2-cache-controller", }, > + { .compatible = "uio,fsl,p1022-l2-cache-controller", }, > + { .compatible = "uio,fsl,mpc8548-l2-cache-controller",}, > + { .compatible = "uio,fsl,mpc8544-l2-cache-controller",}, > + { .compatible = "uio,fsl,mpc8572-l2-cache-controller",}, > + { .compatible = "uio,fsl,mpc8536-l2-cache-controller",}, > + { .compatible = "uio,fsl,p1021-l2-cache-controller", }, > + { .compatible = "uio,fsl,p1012-l2-cache-controller", }, > + { .compatible = "uio,fsl,p1025-l2-cache-controller", }, > + { .compatible = "uio,fsl,p1016-l2-cache-controller", }, > + { .compatible = "uio,fsl,p1024-l2-cache-controller", }, > + { .compatible = "uio,fsl,p1015-l2-cache-controller", }, > + { .compatible = "uio,fsl,p1010-l2-cache-controller", }, > + { .compatible = "uio,fsl,bsc9131-l2-cache-controller",}, > + {}, > +}; > + > +static void uio_info_free_internal(struct uio_info *info) > +{ > + struct uio_mem *uiomem = &info->mem[0]; > + > + while (uiomem < &info->mem[MAX_UIO_MAPS]) { > + if (uiomem->size) { > + mpc85xx_cache_sram_free(uiomem->internal_addr); > + kfree(uiomem->name); > + } > + uiomem++; > + } > +} > + > +static int uio_fsl_85xx_cache_sram_probe(struct platform_device *pdev) > +{ > + struct device_node *parent = pdev->dev.of_node; > + struct device_node *node = NULL; > + struct uio_info *info; > +
[PATCH 4.19] powerpc/powernv/idle: Restore AMR/UAMOR/AMOR after idle
From: Michael Ellerman

commit 53a712bae5dd919521a58d7bad773b949358add0 upstream.

In order to implement KUAP (Kernel Userspace Access Protection) on
Power9 we will be using the AMR, and therefore indirectly the
UAMOR/AMOR.

So save/restore these regs in the idle code.

Signed-off-by: Michael Ellerman
[ajd: Backport to 4.19 tree, CVE-2020-11669]
Signed-off-by: Andrew Donnellan
---
 arch/powerpc/kernel/idle_book3s.S | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 36178000a2f2..4a860d3b9229 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -170,8 +170,11 @@ core_idle_lock_held:
 	bne-	core_idle_lock_held
 	blr
 
-/* Reuse an unused pt_regs slot for IAMR */
+/* Reuse some unused pt_regs slots for AMR/IAMR/UAMOR/UAMOR */
+#define PNV_POWERSAVE_AMR	_TRAP
 #define PNV_POWERSAVE_IAMR	_DAR
+#define PNV_POWERSAVE_UAMOR	_DSISR
+#define PNV_POWERSAVE_AMOR	RESULT
 
 /*
  * Pass requested state in r3:
@@ -205,8 +208,16 @@ pnv_powersave_common:
 	SAVE_NVGPRS(r1)
 
 BEGIN_FTR_SECTION
+	mfspr	r4, SPRN_AMR
 	mfspr	r5, SPRN_IAMR
+	mfspr	r6, SPRN_UAMOR
+	std	r4, PNV_POWERSAVE_AMR(r1)
 	std	r5, PNV_POWERSAVE_IAMR(r1)
+	std	r6, PNV_POWERSAVE_UAMOR(r1)
+BEGIN_FTR_SECTION_NESTED(42)
+	mfspr	r7, SPRN_AMOR
+	std	r7, PNV_POWERSAVE_AMOR(r1)
+END_FTR_SECTION_NESTED_IFSET(CPU_FTR_HVMODE, 42)
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
 	mfcr	r5
@@ -935,12 +946,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
 	REST_GPR(2, r1)
 
 BEGIN_FTR_SECTION
-	/* IAMR was saved in pnv_powersave_common() */
+	/* These regs were saved in pnv_powersave_common() */
+	ld	r4, PNV_POWERSAVE_AMR(r1)
 	ld	r5, PNV_POWERSAVE_IAMR(r1)
+	ld	r6, PNV_POWERSAVE_UAMOR(r1)
+	mtspr	SPRN_AMR, r4
 	mtspr	SPRN_IAMR, r5
+	mtspr	SPRN_UAMOR, r6
+BEGIN_FTR_SECTION_NESTED(42)
+	ld	r7, PNV_POWERSAVE_AMOR(r1)
+	mtspr	SPRN_AMOR, r7
+END_FTR_SECTION_NESTED_IFSET(CPU_FTR_HVMODE, 42)
 	/*
-	 * We don't need an isync here because the upcoming mtmsrd is
-	 * execution synchronizing.
+	 * We don't need an isync here after restoring IAMR because the upcoming
+	 * mtmsrd is execution synchronizing.
 	 */
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
-- 
2.20.1
[PATCH 4.14] powerpc/powernv/idle: Restore AMR/UAMOR/AMOR after idle
From: Michael Ellerman

commit 53a712bae5dd919521a58d7bad773b949358add0 upstream.

In order to implement KUAP (Kernel Userspace Access Protection) on
Power9 we will be using the AMR, and therefore indirectly the
UAMOR/AMOR.

So save/restore these regs in the idle code.

Signed-off-by: Michael Ellerman
[ajd: Backport to 4.14 tree, CVE-2020-11669]
Signed-off-by: Andrew Donnellan
---
 arch/powerpc/kernel/idle_book3s.S | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 74fc20431082..01b823bdb49c 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -163,8 +163,11 @@ core_idle_lock_held:
 	bne-	core_idle_lock_held
 	blr
 
-/* Reuse an unused pt_regs slot for IAMR */
+/* Reuse some unused pt_regs slots for AMR/IAMR/UAMOR/UAMOR */
+#define PNV_POWERSAVE_AMR	_TRAP
 #define PNV_POWERSAVE_IAMR	_DAR
+#define PNV_POWERSAVE_UAMOR	_DSISR
+#define PNV_POWERSAVE_AMOR	RESULT
 
 /*
  * Pass requested state in r3:
@@ -198,8 +201,16 @@ pnv_powersave_common:
 	SAVE_NVGPRS(r1)
 
 BEGIN_FTR_SECTION
+	mfspr	r4, SPRN_AMR
 	mfspr	r5, SPRN_IAMR
+	mfspr	r6, SPRN_UAMOR
+	std	r4, PNV_POWERSAVE_AMR(r1)
 	std	r5, PNV_POWERSAVE_IAMR(r1)
+	std	r6, PNV_POWERSAVE_UAMOR(r1)
+BEGIN_FTR_SECTION_NESTED(42)
+	mfspr	r7, SPRN_AMOR
+	std	r7, PNV_POWERSAVE_AMOR(r1)
+END_FTR_SECTION_NESTED_IFSET(CPU_FTR_HVMODE, 42)
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
 	mfcr	r5
@@ -951,12 +962,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
 	REST_GPR(2, r1)
 
 BEGIN_FTR_SECTION
-	/* IAMR was saved in pnv_powersave_common() */
+	/* These regs were saved in pnv_powersave_common() */
+	ld	r4, PNV_POWERSAVE_AMR(r1)
 	ld	r5, PNV_POWERSAVE_IAMR(r1)
+	ld	r6, PNV_POWERSAVE_UAMOR(r1)
+	mtspr	SPRN_AMR, r4
 	mtspr	SPRN_IAMR, r5
+	mtspr	SPRN_UAMOR, r6
+BEGIN_FTR_SECTION_NESTED(42)
+	ld	r7, PNV_POWERSAVE_AMOR(r1)
+	mtspr	SPRN_AMOR, r7
+END_FTR_SECTION_NESTED_IFSET(CPU_FTR_HVMODE, 42)
 	/*
-	 * We don't need an isync here because the upcoming mtmsrd is
-	 * execution synchronizing.
+	 * We don't need an isync here after restoring IAMR because the upcoming
+	 * mtmsrd is execution synchronizing.
 	 */
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
-- 
2.20.1
Applied "ASoC: fsl_micfil: Omit superfluous error message in fsl_micfil_probe()" to the asoc tree
The patch ASoC: fsl_micfil: Omit superfluous error message in fsl_micfil_probe() has been applied to the asoc tree at https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark >From 83b35f4586e235bfb785a7947b555ad8f3d96887 Mon Sep 17 00:00:00 2001 From: Tang Bin Date: Wed, 15 Apr 2020 12:45:13 +0800 Subject: [PATCH] ASoC: fsl_micfil: Omit superfluous error message in fsl_micfil_probe() In the function fsl_micfil_probe(), when get irq failed, the function platform_get_irq() logs an error message, so remove redundant message here. 
Signed-off-by: Tang Bin Signed-off-by: Shengju Zhang Link: https://lore.kernel.org/r/20200415044513.17492-1-tang...@cmss.chinamobile.com Signed-off-by: Mark Brown --- sound/soc/fsl/fsl_micfil.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/sound/soc/fsl/fsl_micfil.c b/sound/soc/fsl/fsl_micfil.c index f7f2d29f1bfe..e73bd6570a08 100644 --- a/sound/soc/fsl/fsl_micfil.c +++ b/sound/soc/fsl/fsl_micfil.c @@ -702,10 +702,8 @@ static int fsl_micfil_probe(struct platform_device *pdev) for (i = 0; i < MICFIL_IRQ_LINES; i++) { micfil->irq[i] = platform_get_irq(pdev, i); dev_err(&pdev->dev, "GET IRQ: %d\n", micfil->irq[i]); - if (micfil->irq[i] < 0) { - dev_err(&pdev->dev, "no irq for node %s\n", pdev->name); + if (micfil->irq[i] < 0) return micfil->irq[i]; - } } if (of_property_read_bool(np, "fsl,shared-interrupt")) -- 2.20.1
[PATCH 4/5] powerpc: sysdev: fix compile error for fsl_85xx_l2ctlr
Include "linux/of_address.h" to fix the compile error for mpc85xx_l2ctlr_of_probe() when compiling fsl_85xx_cache_sram.c. CC arch/powerpc/sysdev/fsl_85xx_l2ctlr.o arch/powerpc/sysdev/fsl_85xx_l2ctlr.c: In function ‘mpc85xx_l2ctlr_of_probe’: arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:11: error: implicit declaration of function ‘of_iomap’; did you mean ‘pci_iomap’? [-Werror=implicit-function-declaration] l2ctlr = of_iomap(dev->dev.of_node, 0); ^~~~ pci_iomap arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:9: error: assignment makes pointer from integer without a cast [-Werror=int-conversion] l2ctlr = of_iomap(dev->dev.of_node, 0); ^ cc1: all warnings being treated as errors scripts/Makefile.build:267: recipe for target 'arch/powerpc/sysdev/fsl_85xx_l2ctlr.o' failed make[2]: *** [arch/powerpc/sysdev/fsl_85xx_l2ctlr.o] Error 1 Cc: Greg Kroah-Hartman Cc: Christophe Leroy Cc: Scott Wood Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support") Signed-off-by: Wang Wenhu --- arch/powerpc/sysdev/fsl_85xx_l2ctlr.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c b/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c index 2d0af0c517bb..7533572492f0 100644 --- a/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c +++ b/arch/powerpc/sysdev/fsl_85xx_l2ctlr.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include "fsl_85xx_cache_ctlr.h" -- 2.17.1
[PATCH 5/5] drivers: uio: new driver for fsl_85xx_cache_sram
A driver for freescale 85xx platforms to access the Cache-Sram form user level. This is extremely helpful for some user-space applications that require high performance memory accesses. Cc: Greg Kroah-Hartman Cc: Christophe Leroy Cc: Scott Wood Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Wang Wenhu --- drivers/uio/Kconfig | 8 ++ drivers/uio/Makefile | 1 + drivers/uio/uio_fsl_85xx_cache_sram.c | 195 ++ 3 files changed, 204 insertions(+) create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig index 202ee81cfc2b..afd38ec13de0 100644 --- a/drivers/uio/Kconfig +++ b/drivers/uio/Kconfig @@ -105,6 +105,14 @@ config UIO_NETX To compile this driver as a module, choose M here; the module will be called uio_netx. +config UIO_FSL_85XX_CACHE_SRAM + tristate "Freescale 85xx Cache-Sram driver" + depends on FSL_85XX_CACHE_SRAM + help + Generic driver for accessing the Cache-Sram form user level. This + is extremely helpful for some user-space applications that require + high performance memory accesses. + config UIO_FSL_ELBC_GPCM tristate "eLBC/GPCM driver" depends on FSL_LBC diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile index c285dd2a4539..be2056cffc21 100644 --- a/drivers/uio/Makefile +++ b/drivers/uio/Makefile @@ -10,4 +10,5 @@ obj-$(CONFIG_UIO_NETX)+= uio_netx.o obj-$(CONFIG_UIO_PRUSS) += uio_pruss.o obj-$(CONFIG_UIO_MF624) += uio_mf624.o obj-$(CONFIG_UIO_FSL_ELBC_GPCM)+= uio_fsl_elbc_gpcm.o +obj-$(CONFIG_UIO_FSL_85XX_CACHE_SRAM) += uio_fsl_85xx_cache_sram.o obj-$(CONFIG_UIO_HV_GENERIC) += uio_hv_generic.o diff --git a/drivers/uio/uio_fsl_85xx_cache_sram.c b/drivers/uio/uio_fsl_85xx_cache_sram.c new file mode 100644 index ..e11202dd5b93 --- /dev/null +++ b/drivers/uio/uio_fsl_85xx_cache_sram.c @@ -0,0 +1,195 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2020 Vivo Communication Technology Co. Ltd. + * Copyright (C) 2020 Wang Wenhu + * All rights reserved. 
+ * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published + * by the Free Software Foundation. + */ + +#include +#include +#include +#include +#include +#include + +#define DRIVER_VERSION "0.1.0" +#define DRIVER_NAME"uio_fsl_85xx_cache_sram" +#define UIO_NAME "uio_cache_sram" + +static const struct of_device_id uio_mpc85xx_l2ctlr_of_match[] = { + { .compatible = "uio,fsl,p2020-l2-cache-controller", }, + { .compatible = "uio,fsl,p2010-l2-cache-controller", }, + { .compatible = "uio,fsl,p1020-l2-cache-controller", }, + { .compatible = "uio,fsl,p1011-l2-cache-controller", }, + { .compatible = "uio,fsl,p1013-l2-cache-controller", }, + { .compatible = "uio,fsl,p1022-l2-cache-controller", }, + { .compatible = "uio,fsl,mpc8548-l2-cache-controller",}, + { .compatible = "uio,fsl,mpc8544-l2-cache-controller",}, + { .compatible = "uio,fsl,mpc8572-l2-cache-controller",}, + { .compatible = "uio,fsl,mpc8536-l2-cache-controller",}, + { .compatible = "uio,fsl,p1021-l2-cache-controller", }, + { .compatible = "uio,fsl,p1012-l2-cache-controller", }, + { .compatible = "uio,fsl,p1025-l2-cache-controller", }, + { .compatible = "uio,fsl,p1016-l2-cache-controller", }, + { .compatible = "uio,fsl,p1024-l2-cache-controller", }, + { .compatible = "uio,fsl,p1015-l2-cache-controller", }, + { .compatible = "uio,fsl,p1010-l2-cache-controller", }, + { .compatible = "uio,fsl,bsc9131-l2-cache-controller",}, + {}, +}; + +static void uio_info_free_internal(struct uio_info *info) +{ + struct uio_mem *uiomem = &info->mem[0]; + + while (uiomem < &info->mem[MAX_UIO_MAPS]) { + if (uiomem->size) { + mpc85xx_cache_sram_free(uiomem->internal_addr); + kfree(uiomem->name); + } + uiomem++; + } +} + +static int uio_fsl_85xx_cache_sram_probe(struct platform_device *pdev) +{ + struct device_node *parent = pdev->dev.of_node; + struct device_node *node = NULL; + struct uio_info *info; + struct uio_mem *uiomem; + const 
char *dt_name; + u32 mem_size; + u32 align; + void *virt; + phys_addr_t phys; + int ret = -ENODEV; + + /* alloc uio_info for one device */ + info = kzalloc(sizeof(*info), GFP_KERNEL); + if (!info) { + dev_err(&pdev->dev, "kzalloc uio_info failed\n"); + ret = -ENOMEM; +
[PATCH 3/5] powerpc: sysdev: fix compile warning for fsl_85xx_cache_sram
The function instantiate_cache_sram() should not be placed in the init
section, because its caller mpc85xx_l2ctlr_of_probe() is not __init.

Cc: Greg Kroah-Hartman
Cc: Christophe Leroy
Cc: Scott Wood
Cc: Michael Ellerman
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support")
Signed-off-by: Wang Wenhu

Warning information:
  MODPOST vmlinux.o
WARNING: modpost: vmlinux.o(.text+0x1e540): Section mismatch in reference from the function mpc85xx_l2ctlr_of_probe() to the function .init.text:instantiate_cache_sram()
The function mpc85xx_l2ctlr_of_probe() references
the function __init instantiate_cache_sram().
This is often because mpc85xx_l2ctlr_of_probe lacks a __init
annotation or the annotation of instantiate_cache_sram is wrong.
---
 arch/powerpc/sysdev/fsl_85xx_cache_sram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
index be3aef4229d7..3de5ac8382c0 100644
--- a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
+++ b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
@@ -68,7 +68,7 @@ void mpc85xx_cache_sram_free(void *ptr)
 }
 EXPORT_SYMBOL(mpc85xx_cache_sram_free);
 
-int __init instantiate_cache_sram(struct platform_device *dev,
+int instantiate_cache_sram(struct platform_device *dev,
 		struct sram_parameters sram_params)
 {
 	int ret = 0;
-- 
2.17.1
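For readers unfamiliar with why modpost complains, here is a toy userspace model of the hazard. It is a sketch only: the kernel discards init-section memory after boot, which is modelled here by clearing a function pointer. All toy_* names are hypothetical and do not exist in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

typedef int (*toy_fn)(void);

/* Stand-in for a routine that was (wrongly) marked as init-only. */
static int toy_instantiate(void)
{
	return 0;
}

/* Toy model of init memory: the slot is only valid until boot finishes. */
struct toy_init_slot {
	toy_fn fn;
	bool discarded;
};

/* Models the point where the kernel frees its init sections. */
static void toy_free_initmem(struct toy_init_slot *s)
{
	s->fn = NULL;
	s->discarded = true;
}

/*
 * Models a probe routine that is not init-only and so may run at any
 * time, for example on a late driver bind. A real kernel would jump into
 * freed memory here; the toy reports an error instead.
 */
static int toy_probe(const struct toy_init_slot *s)
{
	if (s->discarded)
		return -1;
	return s->fn();
}
```

Calling toy_probe() before toy_free_initmem() succeeds, and calling it afterwards fails. Dropping the init-only marking from the callee, as the patch does, corresponds to never discarding the slot.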
[PATCH 2/5] powerpc: sysdev: fix compile error for fsl_85xx_cache_sram
Include linux/io.h into fsl_85xx_cache_sram.c to fix the implicit-declaration compile error when building Cache-Sram. arch/powerpc/sysdev/fsl_85xx_cache_sram.c: In function ‘instantiate_cache_sram’: arch/powerpc/sysdev/fsl_85xx_cache_sram.c:97:26: error: implicit declaration of function ‘ioremap_coherent’; did you mean ‘bitmap_complement’? [-Werror=implicit-function-declaration] cache_sram->base_virt = ioremap_coherent(cache_sram->base_phys, ^~~~ bitmap_complement arch/powerpc/sysdev/fsl_85xx_cache_sram.c:97:24: error: assignment makes pointer from integer without a cast [-Werror=int-conversion] cache_sram->base_virt = ioremap_coherent(cache_sram->base_phys, ^ arch/powerpc/sysdev/fsl_85xx_cache_sram.c:123:2: error: implicit declaration of function ‘iounmap’; did you mean ‘roundup’? [-Werror=implicit-function-declaration] iounmap(cache_sram->base_virt); ^~~ roundup cc1: all warnings being treated as errors Cc: Greg Kroah-Hartman Cc: Christophe Leroy Cc: Scott Wood Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support") Signed-off-by: Wang Wenhu --- arch/powerpc/sysdev/fsl_85xx_cache_sram.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c index f6c665dac725..be3aef4229d7 100644 --- a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c +++ b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c @@ -17,6 +17,7 @@ #include #include #include +#include #include "fsl_85xx_cache_ctlr.h" -- 2.17.1
[PATCH 0/5] drivers: uio: new driver uio_fsl_85xx_cache_sram
This series adds a new uio driver for freescale 85xx platforms to access
the Cache-Sram from user level. This is extremely helpful for user-space
applications that require high performance memory accesses.

It also fixes the compile errors and the warning in the hardware-level
drivers and implements the uio driver in uio_fsl_85xx_cache_sram.c.

Wang Wenhu (5):
  powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable
  powerpc: sysdev: fix compile error for fsl_85xx_cache_sram
  powerpc: sysdev: fix compile warning for fsl_85xx_cache_sram
  powerpc: sysdev: fix compile error for fsl_85xx_l2ctlr
  drivers: uio: new driver for fsl_85xx_cache_sram

 arch/powerpc/platforms/85xx/Kconfig       | 2 +-
 arch/powerpc/platforms/Kconfig.cputype    | 5 +-
 arch/powerpc/sysdev/fsl_85xx_cache_sram.c | 3 +-
 arch/powerpc/sysdev/fsl_85xx_l2ctlr.c     | 1 +
 drivers/uio/Kconfig                       | 8 +
 drivers/uio/Makefile                      | 1 +
 drivers/uio/uio_fsl_85xx_cache_sram.c     | 195 ++
 7 files changed, 211 insertions(+), 4 deletions(-)
 create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c

-- 
2.17.1
[PATCH 1/5] powerpc: 85xx: make FSL_85XX_CACHE_SRAM configurable
Enable FSL_85XX_CACHE_SRAM selection. On e500 platforms, the cache can
be configured and used as a piece of SRAM, which is highly beneficial to
the performance of some user-level applications.

Cc: Greg Kroah-Hartman
Cc: Christophe Leroy
Cc: Scott Wood
Cc: Michael Ellerman
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Wang Wenhu
---
 arch/powerpc/platforms/85xx/Kconfig    | 2 +-
 arch/powerpc/platforms/Kconfig.cputype | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig
index fa3d29dcb57e..6debb4f1b9cc 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -17,7 +17,7 @@ if FSL_SOC_BOOKE
 if PPC32
 
 config FSL_85XX_CACHE_SRAM
-	bool
+	bool "Freescale 85xx Cache-Sram"
 	select PPC_LIB_RHEAP
 	help
 	  When selected, this option enables cache-sram support
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 0c3c1902135c..1921e9a573e8 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 config PPC32
-	bool
+	bool "32-bit kernel"
 	default y if !PPC64
 	select KASAN_VMALLOC if KASAN && MODULES
 
@@ -15,6 +15,7 @@ config PPC_BOOK3S_32
 	bool
 
 menu "Processor support"
+
 choice
 	prompt "Processor Type"
 	depends on PPC32
@@ -211,9 +212,9 @@ config PPC_BOOK3E
 	depends on PPC_BOOK3E_64
 
 config E500
+	bool "e500 Support"
 	select FSL_EMB_PERFMON
 	select PPC_FSL_BOOK3E
-	bool
 
 config PPC_E500MC
 	bool "e500mc Support"
-- 
2.17.1
[PATCH AUTOSEL 4.9 06/21] powerpc/maple: Fix declaration made after definition
From: Nathan Chancellor [ Upstream commit af6cf95c4d003fccd6c2ecc99a598fb854b537e7 ] When building ppc64 defconfig, Clang errors (trimmed for brevity): arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration must precede definition [-Werror,-Wignored-attributes] machine_device_initcall(maple, maple_cpc925_edac_setup); ^ machine_device_initcall expands to __define_machine_initcall, which in turn has the macro machine_is used in it, which declares mach_##name with an __attribute__((weak)). define_machine actually defines mach_##name, which in this file happens before the declaration, hence the warning. To fix this, move define_machine after machine_device_initcall so that the declaration occurs before the definition, which matches how machine_device_initcall and define_machine work throughout arch/powerpc. While we're here, remove some spaces before tabs. Fixes: 8f101a051ef0 ("edac: cpc925 MC platform device setup") Reported-by: Nick Desaulniers Suggested-by: Ilie Halip Signed-off-by: Nathan Chancellor Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancel...@gmail.com Signed-off-by: Sasha Levin --- arch/powerpc/platforms/maple/setup.c | 34 ++-- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/platforms/maple/setup.c b/arch/powerpc/platforms/maple/setup.c index b7f937563827d..d1fee2d35b49c 100644 --- a/arch/powerpc/platforms/maple/setup.c +++ b/arch/powerpc/platforms/maple/setup.c @@ -299,23 +299,6 @@ static int __init maple_probe(void) return 1; } -define_machine(maple) { - .name = "Maple", - .probe = maple_probe, - .setup_arch = maple_setup_arch, - .init_IRQ = maple_init_IRQ, - .pci_irq_fixup = maple_pci_irq_fixup, - .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq, - .restart= maple_restart, - .halt = maple_halt, - .get_boot_time = maple_get_boot_time, - .set_rtc_time = maple_set_rtc_time, - .get_rtc_time = maple_get_rtc_time, - .calibrate_decr = generic_calibrate_decr, - 
.progress = maple_progress, - .power_save = power4_idle, -}; - #ifdef CONFIG_EDAC /* * Register a platform device for CPC925 memory controller on @@ -372,3 +355,20 @@ static int __init maple_cpc925_edac_setup(void) } machine_device_initcall(maple, maple_cpc925_edac_setup); #endif + +define_machine(maple) { + .name = "Maple", + .probe = maple_probe, + .setup_arch = maple_setup_arch, + .init_IRQ = maple_init_IRQ, + .pci_irq_fixup = maple_pci_irq_fixup, + .pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq, + .restart= maple_restart, + .halt = maple_halt, + .get_boot_time = maple_get_boot_time, + .set_rtc_time = maple_set_rtc_time, + .get_rtc_time = maple_get_rtc_time, + .calibrate_decr = generic_calibrate_decr, + .progress = maple_progress, + .power_save = power4_idle, +}; -- 2.20.1
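The ordering rule from the commit message above can be sketched in plain C. This is a minimal illustration and not the kernel's macros: struct toy_machdep, toy_mach_maple and toy_machine_is_maple() are hypothetical names that only mimic the shape described there, where one macro declares the symbol with a weak attribute and another macro defines it.

```c
/* Hypothetical stand-in for the machine description structure. */
struct toy_machdep {
	const char *name;
};

/*
 * Step 1: the declaration that carries the weak attribute comes first.
 * This is the role the commit message assigns to the declaring macro.
 */
extern struct toy_machdep toy_mach_maple __attribute__((weak));

/*
 * Step 2: the definition follows the declaration. With the two in the
 * opposite order, Clang reports that the attribute declaration must
 * precede the definition and ignores the attribute.
 */
struct toy_machdep toy_mach_maple = {
	.name = "Maple",
};

/* Toy analogue of a "which machine is this?" check: compare addresses. */
static int toy_machine_is_maple(const struct toy_machdep *cur)
{
	return cur == &toy_mach_maple;
}
```

Moving the definition below the last use of the declaring macro, as the patch does for define_machine(maple), gives the same declaration-then-definition order as this sketch.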
[PATCH AUTOSEL 4.14 10/30] powerpc/maple: Fix declaration made after definition
From: Nathan Chancellor

[ Upstream commit af6cf95c4d003fccd6c2ecc99a598fb854b537e7 ]

When building ppc64 defconfig, Clang errors (trimmed for brevity):

  arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration
  must precede definition [-Werror,-Wignored-attributes]
  machine_device_initcall(maple, maple_cpc925_edac_setup);
  ^

machine_device_initcall expands to __define_machine_initcall, which in
turn has the macro machine_is used in it, which declares mach_##name
with an __attribute__((weak)). define_machine actually defines
mach_##name, which in this file happens before the declaration, hence
the warning.

To fix this, move define_machine after machine_device_initcall so that
the declaration occurs before the definition, which matches how
machine_device_initcall and define_machine work throughout
arch/powerpc.

While we're here, remove some spaces before tabs.

Fixes: 8f101a051ef0 ("edac: cpc925 MC platform device setup")
Reported-by: Nick Desaulniers
Suggested-by: Ilie Halip
Signed-off-by: Nathan Chancellor
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancel...@gmail.com
Signed-off-by: Sasha Levin
---
 arch/powerpc/platforms/maple/setup.c | 34 ++--
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/maple/setup.c b/arch/powerpc/platforms/maple/setup.c
index b7f937563827d..d1fee2d35b49c 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -299,23 +299,6 @@ static int __init maple_probe(void)
 	return 1;
 }
 
-define_machine(maple) {
-	.name = "Maple",
-	.probe = maple_probe,
-	.setup_arch = maple_setup_arch,
-	.init_IRQ = maple_init_IRQ,
-	.pci_irq_fixup = maple_pci_irq_fixup,
-	.pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
-	.restart = maple_restart,
-	.halt = maple_halt,
-	.get_boot_time = maple_get_boot_time,
-	.set_rtc_time = maple_set_rtc_time,
-	.get_rtc_time = maple_get_rtc_time,
-	.calibrate_decr = generic_calibrate_decr,
-	.progress = maple_progress,
-	.power_save = power4_idle,
-};
-
 #ifdef CONFIG_EDAC
 /*
  * Register a platform device for CPC925 memory controller on
@@ -372,3 +355,20 @@ static int __init maple_cpc925_edac_setup(void)
 }
 machine_device_initcall(maple, maple_cpc925_edac_setup);
 #endif
+
+define_machine(maple) {
+	.name = "Maple",
+	.probe = maple_probe,
+	.setup_arch = maple_setup_arch,
+	.init_IRQ = maple_init_IRQ,
+	.pci_irq_fixup = maple_pci_irq_fixup,
+	.pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
+	.restart = maple_restart,
+	.halt = maple_halt,
+	.get_boot_time = maple_get_boot_time,
+	.set_rtc_time = maple_set_rtc_time,
+	.get_rtc_time = maple_get_rtc_time,
+	.calibrate_decr = generic_calibrate_decr,
+	.progress = maple_progress,
+	.power_save = power4_idle,
+};
--
2.20.1
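To make the ordering issue concrete, here is a minimal userspace sketch. It is not the kernel's macro code: `demo_machdep_calls`, `demo_machine_id`, `mach_demo`, `demo_machine_is()` and `demo_select_machine()` are hypothetical stand-ins for `struct machdep_calls`, `machine_id`, `mach_##name`, `machine_is()` and the boot-time probe, and a GCC- or Clang-compatible compiler is assumed. The weak declaration appears before the definition, which is the order the patch restores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct machdep_calls. */
struct demo_machdep_calls {
	const char *name;
};

/* Stand-in for machine_id: points at the machine description in use. */
static struct demo_machdep_calls *demo_machine_id;

/*
 * Roughly what a machine_is()-style check needs: a declaration of the
 * symbol that carries __attribute__((weak)). It comes first.
 */
extern struct demo_machdep_calls mach_demo __attribute__((weak));

static int demo_machine_is(void)
{
	return demo_machine_id == &mach_demo;
}

/*
 * Roughly what a define_machine()-style macro emits: the definition.
 * Moving this above the weak declaration reproduces the ordering that
 * Clang flags under -Wignored-attributes in the report above.
 */
struct demo_machdep_calls mach_demo = {
	.name = "Demo",
};

/* Select the demo machine, as a boot-time probe would. */
static void demo_select_machine(void)
{
	assert(mach_demo.name != NULL);
	demo_machine_id = &mach_demo;
}
```

Before `demo_select_machine()` runs, `demo_machine_is()` returns 0; afterwards it returns 1. The behaviour is the same with either ordering on compilers that accept both; only the diagnostic differs.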
[PATCH AUTOSEL 4.19 10/40] powerpc/maple: Fix declaration made after definition
From: Nathan Chancellor

[ Upstream commit af6cf95c4d003fccd6c2ecc99a598fb854b537e7 ]

When building ppc64 defconfig, Clang errors (trimmed for brevity):

  arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration
  must precede definition [-Werror,-Wignored-attributes]
  machine_device_initcall(maple, maple_cpc925_edac_setup);
  ^

machine_device_initcall expands to __define_machine_initcall, which in
turn has the macro machine_is used in it, which declares mach_##name
with an __attribute__((weak)). define_machine actually defines
mach_##name, which in this file happens before the declaration, hence
the warning.

To fix this, move define_machine after machine_device_initcall so that
the declaration occurs before the definition, which matches how
machine_device_initcall and define_machine work throughout
arch/powerpc.

While we're here, remove some spaces before tabs.

Fixes: 8f101a051ef0 ("edac: cpc925 MC platform device setup")
Reported-by: Nick Desaulniers
Suggested-by: Ilie Halip
Signed-off-by: Nathan Chancellor
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancel...@gmail.com
Signed-off-by: Sasha Levin
---
 arch/powerpc/platforms/maple/setup.c | 34 ++--
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/maple/setup.c b/arch/powerpc/platforms/maple/setup.c
index b7f937563827d..d1fee2d35b49c 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -299,23 +299,6 @@ static int __init maple_probe(void)
 	return 1;
 }
 
-define_machine(maple) {
-	.name = "Maple",
-	.probe = maple_probe,
-	.setup_arch = maple_setup_arch,
-	.init_IRQ = maple_init_IRQ,
-	.pci_irq_fixup = maple_pci_irq_fixup,
-	.pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
-	.restart = maple_restart,
-	.halt = maple_halt,
-	.get_boot_time = maple_get_boot_time,
-	.set_rtc_time = maple_set_rtc_time,
-	.get_rtc_time = maple_get_rtc_time,
-	.calibrate_decr = generic_calibrate_decr,
-	.progress = maple_progress,
-	.power_save = power4_idle,
-};
-
 #ifdef CONFIG_EDAC
 /*
  * Register a platform device for CPC925 memory controller on
@@ -372,3 +355,20 @@ static int __init maple_cpc925_edac_setup(void)
 }
 machine_device_initcall(maple, maple_cpc925_edac_setup);
 #endif
+
+define_machine(maple) {
+	.name = "Maple",
+	.probe = maple_probe,
+	.setup_arch = maple_setup_arch,
+	.init_IRQ = maple_init_IRQ,
+	.pci_irq_fixup = maple_pci_irq_fixup,
+	.pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
+	.restart = maple_restart,
+	.halt = maple_halt,
+	.get_boot_time = maple_get_boot_time,
+	.set_rtc_time = maple_set_rtc_time,
+	.get_rtc_time = maple_get_rtc_time,
+	.calibrate_decr = generic_calibrate_decr,
+	.progress = maple_progress,
+	.power_save = power4_idle,
+};
--
2.20.1
[PATCH AUTOSEL 5.4 32/84] powerpc/maple: Fix declaration made after definition
From: Nathan Chancellor

[ Upstream commit af6cf95c4d003fccd6c2ecc99a598fb854b537e7 ]

When building ppc64 defconfig, Clang errors (trimmed for brevity):

  arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration
  must precede definition [-Werror,-Wignored-attributes]
  machine_device_initcall(maple, maple_cpc925_edac_setup);
  ^

machine_device_initcall expands to __define_machine_initcall, which in
turn has the macro machine_is used in it, which declares mach_##name
with an __attribute__((weak)). define_machine actually defines
mach_##name, which in this file happens before the declaration, hence
the warning.

To fix this, move define_machine after machine_device_initcall so that
the declaration occurs before the definition, which matches how
machine_device_initcall and define_machine work throughout
arch/powerpc.

While we're here, remove some spaces before tabs.

Fixes: 8f101a051ef0 ("edac: cpc925 MC platform device setup")
Reported-by: Nick Desaulniers
Suggested-by: Ilie Halip
Signed-off-by: Nathan Chancellor
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancel...@gmail.com
Signed-off-by: Sasha Levin
---
 arch/powerpc/platforms/maple/setup.c | 34 ++--
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/maple/setup.c b/arch/powerpc/platforms/maple/setup.c
index 9cd6f3e1000b3..09a0594350b69 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -294,23 +294,6 @@ static int __init maple_probe(void)
 	return 1;
 }
 
-define_machine(maple) {
-	.name = "Maple",
-	.probe = maple_probe,
-	.setup_arch = maple_setup_arch,
-	.init_IRQ = maple_init_IRQ,
-	.pci_irq_fixup = maple_pci_irq_fixup,
-	.pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
-	.restart = maple_restart,
-	.halt = maple_halt,
-	.get_boot_time = maple_get_boot_time,
-	.set_rtc_time = maple_set_rtc_time,
-	.get_rtc_time = maple_get_rtc_time,
-	.calibrate_decr = generic_calibrate_decr,
-	.progress = maple_progress,
-	.power_save = power4_idle,
-};
-
 #ifdef CONFIG_EDAC
 /*
  * Register a platform device for CPC925 memory controller on
@@ -367,3 +350,20 @@ static int __init maple_cpc925_edac_setup(void)
 }
 machine_device_initcall(maple, maple_cpc925_edac_setup);
 #endif
+
+define_machine(maple) {
+	.name = "Maple",
+	.probe = maple_probe,
+	.setup_arch = maple_setup_arch,
+	.init_IRQ = maple_init_IRQ,
+	.pci_irq_fixup = maple_pci_irq_fixup,
+	.pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
+	.restart = maple_restart,
+	.halt = maple_halt,
+	.get_boot_time = maple_get_boot_time,
+	.set_rtc_time = maple_set_rtc_time,
+	.get_rtc_time = maple_get_rtc_time,
+	.calibrate_decr = generic_calibrate_decr,
+	.progress = maple_progress,
+	.power_save = power4_idle,
+};
--
2.20.1
[PATCH AUTOSEL 5.4 31/84] powerpc/prom_init: Pass the "os-term" message to hypervisor
From: Alexey Kardashevskiy

[ Upstream commit 74bb84e5117146fa73eb9d01305975c53022b3c3 ]

The "os-term" RTAS calls has one argument with a message address of
OS termination cause. rtas_os_term() already passes it but the recently
added prom_init's version of that missed it; it also does not fill args
correctly.

This passes the message address and initializes the number of arguments.

Fixes: 6a9c930bd775 ("powerpc/prom_init: Add the ESM call to prom_init")
Signed-off-by: Alexey Kardashevskiy
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20200312074404.87293-1-...@ozlabs.ru
Signed-off-by: Sasha Levin
---
 arch/powerpc/kernel/prom_init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index eba9d4ee4baf6..689664cd4e79b 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1761,6 +1761,9 @@ static void __init prom_rtas_os_term(char *str)
 	if (token == 0)
 		prom_panic("Could not get token for ibm,os-term\n");
 	os_term_args.token = cpu_to_be32(token);
+	os_term_args.nargs = cpu_to_be32(1);
+	os_term_args.nret = cpu_to_be32(1);
+	os_term_args.args[0] = cpu_to_be32(__pa(str));
 	prom_rtas_hcall((uint64_t)&os_term_args);
 }
 #endif /* CONFIG_PPC_SVM */
--
2.20.1
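As an illustration of what the three added lines accomplish, the sketch below fills a reduced argument block in userspace. It is a sketch under assumptions, not the kernel code: `demo_rtas_args`, `demo_cpu_to_be32()` and `demo_fill_os_term()` are hypothetical names loosely modeled on the fields the diff touches, and the message address is passed as a plain 32-bit integer rather than being derived with `__pa()`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced argument block with the fields the diff touches. */
struct demo_rtas_args {
	uint32_t token;
	uint32_t nargs;
	uint32_t nret;
	uint32_t args[16];
};

/* Userspace stand-in for cpu_to_be32(): store v in big-endian byte order. */
static uint32_t demo_cpu_to_be32(uint32_t v)
{
	uint8_t bytes[4] = {
		(uint8_t)(v >> 24), (uint8_t)(v >> 16),
		(uint8_t)(v >> 8), (uint8_t)v,
	};
	uint32_t out;

	memcpy(&out, bytes, sizeof(out));
	return out;
}

/*
 * Fill the block for an os-term style call: one input argument (the
 * address of the message) and one return value.
 */
static void demo_fill_os_term(struct demo_rtas_args *a, uint32_t token,
			      uint32_t msg_addr)
{
	assert(a != NULL);
	memset(a, 0, sizeof(*a));
	a->token = demo_cpu_to_be32(token);
	a->nargs = demo_cpu_to_be32(1);
	a->nret = demo_cpu_to_be32(1);
	a->args[0] = demo_cpu_to_be32(msg_addr);
}
```

Without the `nargs`, `nret` and `args[0]` assignments the block carries only the token, which is the omission the commit message describes.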
[PATCH AUTOSEL 5.4 21/84] KVM: PPC: Book3S HV: Fix H_CEDE return code for nested guests
From: Michael Roth

[ Upstream commit 1f50cc1705350a4697923203fedd7d8fb1087fe2 ]

The h_cede_tm kvm-unit-test currently fails when run inside an L1 guest
via the guest/nested hypervisor.

  ./run-tests.sh -v
  ...
  TESTNAME=h_cede_tm TIMEOUT=90s ACCEL= ./powerpc/run powerpc/tm.elf -smp 2,threads=2 -machine cap-htm=on -append "h_cede_tm"
  FAIL h_cede_tm (2 tests, 1 unexpected failures)

While the test relates to transactional memory instructions, the actual
failure is due to the return code of the H_CEDE hypercall, which is
reported as 224 instead of 0. This happens even when no TM instructions
are issued.

224 is the value placed in r3 to execute a hypercall for H_CEDE, and r3
is where the caller expects the return code to be placed upon return.

In the case of guest running under a nested hypervisor, issuing H_CEDE
causes a return from H_ENTER_NESTED. In this case H_CEDE is
specially-handled immediately rather than later in
kvmppc_pseries_do_hcall() as with most other hcalls, but we forget to
set the return code for the caller, hence why kvm-unit-test sees the
224 return code and reports an error.

Guest kernels generally don't check the return value of H_CEDE, so
that likely explains why this hasn't caused issues outside of
kvm-unit-tests so far.

Fix this by setting r3 to 0 after we finish processing the H_CEDE.

RHBZ: 1778556

Fixes: 4bad77799fed ("KVM: PPC: Book3S HV: Handle hypercalls correctly when nested")
Cc: linuxppc-...@ozlabs.org
Cc: David Gibson
Cc: Paul Mackerras
Signed-off-by: Michael Roth
Reviewed-by: David Gibson
Signed-off-by: Paul Mackerras
Signed-off-by: Sasha Levin
---
 arch/powerpc/kvm/book3s_hv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 36abbe3c346df..e2183fed947d4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3623,6 +3623,7 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
 		if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested &&
 		    kvmppc_get_gpr(vcpu, 3) == H_CEDE) {
 			kvmppc_nested_cede(vcpu);
+			kvmppc_set_gpr(vcpu, 3, 0);
 			trap = 0;
 		}
 	} else {
--
2.20.1
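The register convention behind this bug can be modeled in a few lines. This is a toy model under stated assumptions, not KVM code: `demo_vcpu`, `demo_do_cede()` and the `fix_applied` switch are hypothetical, and only the value 224 for H_CEDE and the return code 0 are taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_H_SUCCESS 0UL
#define DEMO_H_CEDE 224UL /* the hcall number cited in the commit message */

/* Hypothetical, heavily reduced vCPU: registers plus a "ceded" flag. */
struct demo_vcpu {
	unsigned long gpr[32];
	int ceded;
};

/*
 * Model of the early H_CEDE path. r3 carries the hcall number on entry
 * and is where the caller reads the return code afterwards. With
 * fix_applied == 0 the handler never overwrites r3, so the caller
 * reads back 224.
 */
static unsigned long demo_do_cede(struct demo_vcpu *vcpu, int fix_applied)
{
	assert(vcpu != NULL);
	if (vcpu->gpr[3] != DEMO_H_CEDE)
		return vcpu->gpr[3];	/* not H_CEDE: leave r3 untouched */

	vcpu->ceded = 1;		/* stands in for kvmppc_nested_cede() */
	if (fix_applied)
		vcpu->gpr[3] = DEMO_H_SUCCESS;	/* stands in for kvmppc_set_gpr(vcpu, 3, 0) */
	return vcpu->gpr[3];
}
```

Calling `demo_do_cede()` with `fix_applied` set to 0 leaves 224 in `gpr[3]`, matching the failure the test reports; with it set to 1 the caller sees 0.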
[PATCH AUTOSEL 5.5 046/106] powerpc/maple: Fix declaration made after definition
From: Nathan Chancellor

[ Upstream commit af6cf95c4d003fccd6c2ecc99a598fb854b537e7 ]

When building ppc64 defconfig, Clang errors (trimmed for brevity):

  arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration
  must precede definition [-Werror,-Wignored-attributes]
  machine_device_initcall(maple, maple_cpc925_edac_setup);
  ^

machine_device_initcall expands to __define_machine_initcall, which in
turn has the macro machine_is used in it, which declares mach_##name
with an __attribute__((weak)). define_machine actually defines
mach_##name, which in this file happens before the declaration, hence
the warning.

To fix this, move define_machine after machine_device_initcall so that
the declaration occurs before the definition, which matches how
machine_device_initcall and define_machine work throughout
arch/powerpc.

While we're here, remove some spaces before tabs.

Fixes: 8f101a051ef0 ("edac: cpc925 MC platform device setup")
Reported-by: Nick Desaulniers
Suggested-by: Ilie Halip
Signed-off-by: Nathan Chancellor
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancel...@gmail.com
Signed-off-by: Sasha Levin
---
 arch/powerpc/platforms/maple/setup.c | 34 ++--
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/maple/setup.c b/arch/powerpc/platforms/maple/setup.c
index 9cd6f3e1000b3..09a0594350b69 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -294,23 +294,6 @@ static int __init maple_probe(void)
 	return 1;
 }
 
-define_machine(maple) {
-	.name = "Maple",
-	.probe = maple_probe,
-	.setup_arch = maple_setup_arch,
-	.init_IRQ = maple_init_IRQ,
-	.pci_irq_fixup = maple_pci_irq_fixup,
-	.pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
-	.restart = maple_restart,
-	.halt = maple_halt,
-	.get_boot_time = maple_get_boot_time,
-	.set_rtc_time = maple_set_rtc_time,
-	.get_rtc_time = maple_get_rtc_time,
-	.calibrate_decr = generic_calibrate_decr,
-	.progress = maple_progress,
-	.power_save = power4_idle,
-};
-
 #ifdef CONFIG_EDAC
 /*
  * Register a platform device for CPC925 memory controller on
@@ -367,3 +350,20 @@ static int __init maple_cpc925_edac_setup(void)
 }
 machine_device_initcall(maple, maple_cpc925_edac_setup);
 #endif
+
+define_machine(maple) {
+	.name = "Maple",
+	.probe = maple_probe,
+	.setup_arch = maple_setup_arch,
+	.init_IRQ = maple_init_IRQ,
+	.pci_irq_fixup = maple_pci_irq_fixup,
+	.pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
+	.restart = maple_restart,
+	.halt = maple_halt,
+	.get_boot_time = maple_get_boot_time,
+	.set_rtc_time = maple_set_rtc_time,
+	.get_rtc_time = maple_get_rtc_time,
+	.calibrate_decr = generic_calibrate_decr,
+	.progress = maple_progress,
+	.power_save = power4_idle,
+};
--
2.20.1
[PATCH AUTOSEL 5.5 045/106] powerpc/prom_init: Pass the "os-term" message to hypervisor
From: Alexey Kardashevskiy

[ Upstream commit 74bb84e5117146fa73eb9d01305975c53022b3c3 ]

The "os-term" RTAS calls has one argument with a message address of
OS termination cause. rtas_os_term() already passes it but the recently
added prom_init's version of that missed it; it also does not fill args
correctly.

This passes the message address and initializes the number of arguments.

Fixes: 6a9c930bd775 ("powerpc/prom_init: Add the ESM call to prom_init")
Signed-off-by: Alexey Kardashevskiy
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20200312074404.87293-1-...@ozlabs.ru
Signed-off-by: Sasha Levin
---
 arch/powerpc/kernel/prom_init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 577345382b23f..673f13b87db13 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1773,6 +1773,9 @@ static void __init prom_rtas_os_term(char *str)
 	if (token == 0)
 		prom_panic("Could not get token for ibm,os-term\n");
 	os_term_args.token = cpu_to_be32(token);
+	os_term_args.nargs = cpu_to_be32(1);
+	os_term_args.nret = cpu_to_be32(1);
+	os_term_args.args[0] = cpu_to_be32(__pa(str));
 	prom_rtas_hcall((uint64_t)&os_term_args);
 }
 #endif /* CONFIG_PPC_SVM */
--
2.20.1
[PATCH AUTOSEL 5.5 031/106] KVM: PPC: Book3S HV: Fix H_CEDE return code for nested guests
From: Michael Roth

[ Upstream commit 1f50cc1705350a4697923203fedd7d8fb1087fe2 ]

The h_cede_tm kvm-unit-test currently fails when run inside an L1 guest
via the guest/nested hypervisor.

  ./run-tests.sh -v
  ...
  TESTNAME=h_cede_tm TIMEOUT=90s ACCEL= ./powerpc/run powerpc/tm.elf -smp 2,threads=2 -machine cap-htm=on -append "h_cede_tm"
  FAIL h_cede_tm (2 tests, 1 unexpected failures)

While the test relates to transactional memory instructions, the actual
failure is due to the return code of the H_CEDE hypercall, which is
reported as 224 instead of 0. This happens even when no TM instructions
are issued.

224 is the value placed in r3 to execute a hypercall for H_CEDE, and r3
is where the caller expects the return code to be placed upon return.

In the case of guest running under a nested hypervisor, issuing H_CEDE
causes a return from H_ENTER_NESTED. In this case H_CEDE is
specially-handled immediately rather than later in
kvmppc_pseries_do_hcall() as with most other hcalls, but we forget to
set the return code for the caller, hence why kvm-unit-test sees the
224 return code and reports an error.

Guest kernels generally don't check the return value of H_CEDE, so
that likely explains why this hasn't caused issues outside of
kvm-unit-tests so far.

Fix this by setting r3 to 0 after we finish processing the H_CEDE.

RHBZ: 1778556

Fixes: 4bad77799fed ("KVM: PPC: Book3S HV: Handle hypercalls correctly when nested")
Cc: linuxppc-...@ozlabs.org
Cc: David Gibson
Cc: Paul Mackerras
Signed-off-by: Michael Roth
Reviewed-by: David Gibson
Signed-off-by: Paul Mackerras
Signed-off-by: Sasha Levin
---
 arch/powerpc/kvm/book3s_hv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ef6aa63b071b3..a1d793b96d2b7 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3628,6 +3628,7 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
 		if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested &&
 		    kvmppc_get_gpr(vcpu, 3) == H_CEDE) {
 			kvmppc_nested_cede(vcpu);
+			kvmppc_set_gpr(vcpu, 3, 0);
 			trap = 0;
 		}
 	} else {
--
2.20.1
[PATCH AUTOSEL 5.6 053/129] powerpc/prom_init: Pass the "os-term" message to hypervisor
From: Alexey Kardashevskiy

[ Upstream commit 74bb84e5117146fa73eb9d01305975c53022b3c3 ]

The "os-term" RTAS calls has one argument with a message address of
OS termination cause. rtas_os_term() already passes it but the recently
added prom_init's version of that missed it; it also does not fill args
correctly.

This passes the message address and initializes the number of arguments.

Fixes: 6a9c930bd775 ("powerpc/prom_init: Add the ESM call to prom_init")
Signed-off-by: Alexey Kardashevskiy
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20200312074404.87293-1-...@ozlabs.ru
Signed-off-by: Sasha Levin
---
 arch/powerpc/kernel/prom_init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 577345382b23f..673f13b87db13 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1773,6 +1773,9 @@ static void __init prom_rtas_os_term(char *str)
 	if (token == 0)
 		prom_panic("Could not get token for ibm,os-term\n");
 	os_term_args.token = cpu_to_be32(token);
+	os_term_args.nargs = cpu_to_be32(1);
+	os_term_args.nret = cpu_to_be32(1);
+	os_term_args.args[0] = cpu_to_be32(__pa(str));
 	prom_rtas_hcall((uint64_t)&os_term_args);
 }
 #endif /* CONFIG_PPC_SVM */
--
2.20.1
[PATCH AUTOSEL 5.6 054/129] powerpc/maple: Fix declaration made after definition
From: Nathan Chancellor

[ Upstream commit af6cf95c4d003fccd6c2ecc99a598fb854b537e7 ]

When building ppc64 defconfig, Clang errors (trimmed for brevity):

  arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration
  must precede definition [-Werror,-Wignored-attributes]
  machine_device_initcall(maple, maple_cpc925_edac_setup);
  ^

machine_device_initcall expands to __define_machine_initcall, which in
turn has the macro machine_is used in it, which declares mach_##name
with an __attribute__((weak)). define_machine actually defines
mach_##name, which in this file happens before the declaration, hence
the warning.

To fix this, move define_machine after machine_device_initcall so that
the declaration occurs before the definition, which matches how
machine_device_initcall and define_machine work throughout
arch/powerpc.

While we're here, remove some spaces before tabs.

Fixes: 8f101a051ef0 ("edac: cpc925 MC platform device setup")
Reported-by: Nick Desaulniers
Suggested-by: Ilie Halip
Signed-off-by: Nathan Chancellor
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancel...@gmail.com
Signed-off-by: Sasha Levin
---
 arch/powerpc/platforms/maple/setup.c | 34 ++--
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/maple/setup.c b/arch/powerpc/platforms/maple/setup.c
index 6f019df37916f..15b2c6eb506d0 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -291,23 +291,6 @@ static int __init maple_probe(void)
 	return 1;
 }
 
-define_machine(maple) {
-	.name = "Maple",
-	.probe = maple_probe,
-	.setup_arch = maple_setup_arch,
-	.init_IRQ = maple_init_IRQ,
-	.pci_irq_fixup = maple_pci_irq_fixup,
-	.pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
-	.restart = maple_restart,
-	.halt = maple_halt,
-	.get_boot_time = maple_get_boot_time,
-	.set_rtc_time = maple_set_rtc_time,
-	.get_rtc_time = maple_get_rtc_time,
-	.calibrate_decr = generic_calibrate_decr,
-	.progress = maple_progress,
-	.power_save = power4_idle,
-};
-
 #ifdef CONFIG_EDAC
 /*
  * Register a platform device for CPC925 memory controller on
@@ -364,3 +347,20 @@ static int __init maple_cpc925_edac_setup(void)
 }
 machine_device_initcall(maple, maple_cpc925_edac_setup);
 #endif
+
+define_machine(maple) {
+	.name = "Maple",
+	.probe = maple_probe,
+	.setup_arch = maple_setup_arch,
+	.init_IRQ = maple_init_IRQ,
+	.pci_irq_fixup = maple_pci_irq_fixup,
+	.pci_get_legacy_ide_irq = maple_pci_get_legacy_ide_irq,
+	.restart = maple_restart,
+	.halt = maple_halt,
+	.get_boot_time = maple_get_boot_time,
+	.set_rtc_time = maple_set_rtc_time,
+	.get_rtc_time = maple_get_rtc_time,
+	.calibrate_decr = generic_calibrate_decr,
+	.progress = maple_progress,
+	.power_save = power4_idle,
+};
--
2.20.1
[PATCH AUTOSEL 5.6 039/129] KVM: PPC: Book3S HV: Fix H_CEDE return code for nested guests
From: Michael Roth

[ Upstream commit 1f50cc1705350a4697923203fedd7d8fb1087fe2 ]

The h_cede_tm kvm-unit-test currently fails when run inside an L1 guest
via the guest/nested hypervisor.

  ./run-tests.sh -v
  ...
  TESTNAME=h_cede_tm TIMEOUT=90s ACCEL= ./powerpc/run powerpc/tm.elf -smp 2,threads=2 -machine cap-htm=on -append "h_cede_tm"
  FAIL h_cede_tm (2 tests, 1 unexpected failures)

While the test relates to transactional memory instructions, the actual
failure is due to the return code of the H_CEDE hypercall, which is
reported as 224 instead of 0. This happens even when no TM instructions
are issued.

224 is the value placed in r3 to execute a hypercall for H_CEDE, and r3
is where the caller expects the return code to be placed upon return.

In the case of guest running under a nested hypervisor, issuing H_CEDE
causes a return from H_ENTER_NESTED. In this case H_CEDE is
specially-handled immediately rather than later in
kvmppc_pseries_do_hcall() as with most other hcalls, but we forget to
set the return code for the caller, hence why kvm-unit-test sees the
224 return code and reports an error.

Guest kernels generally don't check the return value of H_CEDE, so
that likely explains why this hasn't caused issues outside of
kvm-unit-tests so far.

Fix this by setting r3 to 0 after we finish processing the H_CEDE.

RHBZ: 1778556

Fixes: 4bad77799fed ("KVM: PPC: Book3S HV: Handle hypercalls correctly when nested")
Cc: linuxppc-...@ozlabs.org
Cc: David Gibson
Cc: Paul Mackerras
Signed-off-by: Michael Roth
Reviewed-by: David Gibson
Signed-off-by: Paul Mackerras
Signed-off-by: Sasha Levin
---
 arch/powerpc/kvm/book3s_hv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2cefd071b8483..c0c43a7338304 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3616,6 +3616,7 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
 		if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested &&
 		    kvmppc_get_gpr(vcpu, 3) == H_CEDE) {
 			kvmppc_nested_cede(vcpu);
+			kvmppc_set_gpr(vcpu, 3, 0);
 			trap = 0;
 		}
 	} else {
--
2.20.1
Re: [PATCH 1/4] dma-mapping: move the remaining DMA API calls out of line
On 15/04/2020 16:18, Christoph Hellwig wrote:
> On Wed, Apr 15, 2020 at 12:26:04PM +1000, Alexey Kardashevskiy wrote:
>> May be this is correct and allowed (no idea) but removing exported
>> symbols at least deserves a mention in the commit log, does not it?
>>
>> The rest of the series is fine and works. Thanks,
>
> Maybe I can throw in a line, but the point is that dma_direct_*
> was exported as dma_* called them inline. Now dma_* is out of line
> and exported instead, which always was the actual API.

They become inline in 2/4.

And the fact they were exported leaves possibility that there is a
driver somewhere relying on these symbols or distro kernel won't build
because the symbol disappeared from exports (I do not know what KABI
guarantees or if mainline kernel cares).

I do not care in particular but some might, a line separated with empty
lines in the commit log would do.

--
Alexey
Re: [PATCH] i2c: powermac: Simplify reading the "reg" and "i2c-address" property
On Wed, Apr 08, 2020 at 03:33:53PM +0530, Aishwarya R wrote:
> Use of_property_read_u32 to read the "reg" and "i2c-address" property
> instead of using of_get_property to check the return values.
>
> Signed-off-by: Aishwarya R

This is quite a fragile driver. Have you tested it on HW?
Re: [PATCH v2 4/4] mm/vmalloc: Hugepage vmalloc mappings
Hi Nick,

On Mon, Apr 13, 2020 at 10:53:03PM +1000, Nicholas Piggin wrote:
> For platforms that define HAVE_ARCH_HUGE_VMAP and support PMD vmap mappings,
> have vmalloc attempt to allocate PMD-sized pages first, before falling back
> to small pages. Allocations which use something other than PAGE_KERNEL
> protections are not permitted to use huge pages yet, not all callers expect
> this (e.g., module allocations vs strict module rwx).
>
> This gives a 6x reduction in dTLB misses for a `git diff` (of linux), from
> 45600 to 6500 and a 2.2% reduction in cycles on a 2-node POWER9.

I wonder if it's worth extending vmap() to handle higher order pages in
a similar way? That might be helpful for tracing PMUs such as Arm SPE,
where the CPU streams tracing data out to a virtually addressed buffer
(see rb_alloc_aux_page()).

> This can result in more internal fragmentation and memory overhead for a
> given allocation. It can also cause greater NUMA unbalance on hashdist
> allocations.
>
> There may be other callers that expect small pages under vmalloc but use
> PAGE_KERNEL, I'm not sure if it's feasible to catch them all. An
> alternative would be a new function or flag which enables large mappings,
> and use that in callers.
>
> Signed-off-by: Nicholas Piggin
> ---
>  include/linux/vmalloc.h | 2 +
>  mm/vmalloc.c | 135 +---
>  2 files changed, 102 insertions(+), 35 deletions(-)
>
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 291313a7e663..853b82eac192 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -24,6 +24,7 @@ struct notifier_block; /* in notifier.h */
>  #define VM_UNINITIALIZED 0x0020 /* vm_struct is not fully
> initialized */
>  #define VM_NO_GUARD 0x0040 /* don't add guard page */
>  #define VM_KASAN 0x0080 /* has allocated kasan shadow
> memory */
> +#define VM_HUGE_PAGES 0x0100 /* may use huge pages */

Please can you add a check for this in the arm64 change_memory_common()
code? Other architectures might need something similar, but we need to
forbid changing memory attributes for portions of the huge page.

In general, I'm a bit wary of software table walkers tripping over this.
For example, I don't think apply_to_existing_page_range() can handle
huge mappings at all, but the one user (KASAN) only ever uses page
mappings so it's ok there.

> @@ -2325,9 +2356,11 @@ static struct vm_struct *__get_vm_area_node(unsigned
> long size,
>  	if (unlikely(!size))
>  		return NULL;
>
> -	if (flags & VM_IOREMAP)
> -		align = 1ul << clamp_t(int, get_count_order_long(size),
> -				       PAGE_SHIFT, IOREMAP_MAX_ORDER);
> +	if (flags & VM_IOREMAP) {
> +		align = max(align,
> +			    1ul << clamp_t(int, get_count_order_long(size),
> +					   PAGE_SHIFT, IOREMAP_MAX_ORDER));
> +	}

I don't follow this part. Please could you explain why you're
potentially aligning above IOREMAP_MAX_ORDER? It doesn't seem to follow
from the rest of the patch.

Cheers,

Will
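The question about aligning above IOREMAP_MAX_ORDER can be checked with a small userspace model of the quoted hunk. This is only a sketch: the `demo_` helpers are hypothetical reimplementations of `get_count_order_long()`, `clamp_t()` and `max()`, and the values chosen for `DEMO_PAGE_SHIFT` and `DEMO_IOREMAP_MAX_ORDER` are assumptions, since both are architecture-dependent.

```c
#include <assert.h>

/* Assumed values for illustration only; both are architecture-dependent. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_IOREMAP_MAX_ORDER 21

/* Smallest order such that (1ul << order) >= size; -1 for size == 0. */
static int demo_count_order(unsigned long size)
{
	const int max_order = (int)(sizeof(unsigned long) * 8) - 1;
	int order = 0;

	if (size == 0)
		return -1;
	while (order < max_order && (1ul << order) < size)
		order++;
	return order;
}

static int demo_clamp(int v, int lo, int hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Before the quoted hunk: the caller's align is simply overwritten. */
static unsigned long demo_align_old(unsigned long align, unsigned long size)
{
	(void)align;
	return 1ul << demo_clamp(demo_count_order(size),
				 DEMO_PAGE_SHIFT, DEMO_IOREMAP_MAX_ORDER);
}

/* After the quoted hunk: max() lets a larger caller align through. */
static unsigned long demo_align_new(unsigned long align, unsigned long size)
{
	unsigned long clamped = demo_align_old(align, size);

	assert(align != 0);
	return align > clamped ? align : clamped;
}
```

Under these assumed constants, a request with `align` and `size` both equal to `1ul << 30` yields `1ul << 21` from the old expression and `1ul << 30` from the new one, which is the case the reply asks about.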
Re: [PATCH v2] Fix: buffer overflow during hvc_alloc().
On Tue, Apr 14, 2020 at 10:15:03PM +0300, and...@daynix.com wrote: > From: Andrew Melnychenko > > If there is a lot(more then 16) of virtio-console devices > or virtio_console module is reloaded > - buffers 'vtermnos' and 'cons_ops' are overflowed. > In older kernels it overruns spinlock which leads to kernel freezing: > https://bugzilla.redhat.com/show_bug.cgi?id=1786239 > > To reproduce the issue, you can try simple script that > loads/unloads module. Something like this: > while [ 1 ] > do > modprobe virtio_console > sleep 2 > modprobe -r virtio_console > sleep 2 > done > > Description of problem: > Guest get 'Call Trace' when loading module "virtio_console" > and unloading it frequently - clearly reproduced on kernel-4.18.0: > > [ 81.498208] [ cut here ] > [ 81.499263] pvqspinlock: lock 0x92080020 has corrupted value > 0xc0774ca0! > [ 81.501000] WARNING: CPU: 0 PID: 785 at > kernel/locking/qspinlock_paravirt.h:500 > __pv_queued_spin_unlock_slowpath+0xc0/0xd0 > [ 81.503173] Modules linked in: virtio_console fuse xt_CHECKSUM > ipt_MASQUERADE xt_conntrack ipt_REJECT nft_counter nf_nat_tftp nft_objref > nf_conntrack_tftp tun bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 > nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct > nf_tables_set nft_chain_nat_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 > nft_chain_route_ipv6 nft_chain_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 > nf_nat_ipv4 nf_nat nf_conntrack nft_chain_route_ipv4 ip6_tables nft_compat > ip_set nf_tables nfnetlink sunrpc bochs_drm drm_vram_helper ttm > drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm i2c_piix4 > pcspkr crct10dif_pclmul crc32_pclmul joydev ghash_clmulni_intel ip_tables xfs > libcrc32c sd_mod sg ata_generic ata_piix virtio_net libata crc32c_intel > net_failover failover serio_raw virtio_scsi dm_mirror dm_region_hash dm_log > dm_mod [last unloaded: virtio_console] > [ 81.517019] CPU: 0 PID: 785 Comm: kworker/0:2 Kdump: loaded Not tainted > 
4.18.0-167.el8.x86_64 #1 > [ 81.518639] Hardware name: Red Hat KVM, BIOS > 1.12.0-5.scrmod+el8.2.0+5159+d8aa4d83 04/01/2014 > [ 81.520205] Workqueue: events control_work_handler [virtio_console] > [ 81.521354] RIP: 0010:__pv_queued_spin_unlock_slowpath+0xc0/0xd0 > [ 81.522450] Code: 07 00 48 63 7a 10 e8 bf 64 f5 ff 66 90 c3 8b 05 e6 cf d6 > 01 85 c0 74 01 c3 8b 17 48 89 fe 48 c7 c7 38 4b 29 91 e8 3a 6c fa ff <0f> 0b > c3 0f 0b 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 > [ 81.525830] RSP: 0018:b51a01ffbd70 EFLAGS: 00010282 > [ 81.526798] RAX: RBX: 0010 RCX: > > [ 81.528110] RDX: 9e66f1826480 RSI: 9e66f1816a08 RDI: > 9e66f1816a08 > [ 81.529437] RBP: 9153ff10 R08: 026c R09: > 0053 > [ 81.530732] R10: R11: b51a01ffbc18 R12: > 9e66cd682200 > [ 81.532133] R13: 9153ff10 R14: 9e6685569500 R15: > 9e66cd682000 > [ 81.533442] FS: () GS:9e66f180() > knlGS: > [ 81.534914] CS: 0010 DS: ES: CR0: 80050033 > [ 81.535971] CR2: 5624c55b14d0 CR3: 0003a023c000 CR4: > 003406f0 > [ 81.537283] Call Trace: > [ 81.537763] __raw_callee_save___pv_queued_spin_unlock_slowpath+0x11/0x20 > [ 81.539011] .slowpath+0x9/0xe > [ 81.539585] hvc_alloc+0x25e/0x300 > [ 81.540237] init_port_console+0x28/0x100 [virtio_console] > [ 81.541251] handle_control_message.constprop.27+0x1c4/0x310 > [virtio_console] > [ 81.542546] control_work_handler+0x70/0x10c [virtio_console] > [ 81.543601] process_one_work+0x1a7/0x3b0 > [ 81.544356] worker_thread+0x30/0x390 > [ 81.545025] ? create_worker+0x1a0/0x1a0 > [ 81.545749] kthread+0x112/0x130 > [ 81.546358] ? 
kthread_flush_work_fn+0x10/0x10 > [ 81.547183] ret_from_fork+0x22/0x40 > [ 81.547842] ---[ end trace aa97649bd16c8655 ]--- > [ 83.546539] general protection fault: [#1] SMP NOPTI > [ 83.547422] CPU: 5 PID: 3225 Comm: modprobe Kdump: loaded Tainted: G > W- - - 4.18.0-167.el8.x86_64 #1 > [ 83.549191] Hardware name: Red Hat KVM, BIOS > 1.12.0-5.scrmod+el8.2.0+5159+d8aa4d83 04/01/2014 > [ 83.550544] RIP: 0010:__pv_queued_spin_lock_slowpath+0x19a/0x2a0 > [ 83.551504] Code: c4 c1 ea 12 41 be 01 00 00 00 4c 8d 6d 14 41 83 e4 03 8d > 42 ff 49 c1 e4 05 48 98 49 81 c4 40 a5 02 00 4c 03 24 c5 60 48 34 91 <49> 89 > 2c 24 b8 00 80 00 00 eb 15 84 c0 75 0a 41 0f b6 54 24 14 84 > [ 83.554449] RSP: 0018:b51a0323fdb0 EFLAGS: 00010202 > [ 83.555290] RAX: 301c RBX: 92080020 RCX: > 0001 > [ 83.556426] RDX: 301d RSI: RDI: > > [ 83.557556] RBP: 9e66f196a540 R08: 028a R09: > 9e66d2757788 > [ 83.558688] R10: R11: R12: