RE: [PATCH] powerpc: mitigate impact of decrementer reset
Paul,

what if your tb wraps during the test?

> -----Original Message-----
> From: Linuxppc-dev [mailto:linuxppc-dev-bounces+heinz.wrobel=freescale@lists.ozlabs.org] On Behalf Of Paul Clarke
> Sent: Tuesday, October 07, 2014 21:13
> To: linuxppc-dev@lists.ozlabs.org
> Subject: [PATCH] powerpc: mitigate impact of decrementer reset
>
> The POWER ISA defines an always-running decrementer which can be used
> to schedule interrupts after a certain time interval has elapsed.
> The decrementer counts down at the same frequency as the Time Base,
> which is 512 MHz. The maximum value of the decrementer is 0x7fffffff.
> This works out to a maximum interval of about 4.19 seconds.
>
> If a larger interval is desired, the kernel will set the decrementer
> to its maximum value and reset it after it expires (underflows)
> a sufficient number of times until the desired interval has elapsed.
>
> The negative effect of this is that an unwanted latency spike will
> impact normal processing at most every 4.19 seconds. On an IBM
> POWER8-based system, this spike was measured at about 25-30
> microseconds, much of which was basic, opportunistic housekeeping
> tasks that could otherwise have waited.
>
> This patch short-circuits the reset of the decrementer, exiting after
> the decrementer reset, but before the housekeeping tasks if the only
> need for the interrupt is simply to reset it. After this patch,
> the latency spike was measured at about 150 nanoseconds.
>
> Signed-off-by: Paul A. Clarke <p...@us.ibm.com>
> ---
>  arch/powerpc/kernel/time.c | 13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index 368ab37..962a06b 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -528,6 +528,7 @@ void timer_interrupt(struct pt_regs * regs)
>  {
>  	struct pt_regs *old_regs;
>  	u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
> +	u64 now;
>
>  	/* Ensure a positive value is written to the decrementer, or else
>  	 * some CPUs will continue to take decrementer exceptions.
> @@ -550,6 +551,18 @@ void timer_interrupt(struct pt_regs * regs)
>  	 */
>  	may_hard_irq_enable();
>
> +	/* If this is simply the decrementer expiring (underflow) due to
> +	 * the limited size of the decrementer, and not a set timer,
> +	 * reset (if needed) and return
> +	 */
> +	now = get_tb_or_rtc();
> +	if (now < *next_tb) {

What if now and *next_tb are not on the same wrap count? They are both
modulo values AFAICS. Shouldn't the test here be more like

	if ((*next_tb - now) < 2^63)

so that it checks for deltas within the range instead of comparing
absolute values?

> +		now = *next_tb - now;
> +		if (now <= DECREMENTER_MAX)
> +			set_dec((int)now);
> +		__get_cpu_var(irq_stat).timer_irqs_others++;
> +		return;
> +	}
>
>  #if defined(CONFIG_PPC32) && defined(CONFIG_PPC_PMAC)
>  	if (atomic_read(&ppc_n_lost_interrupts) != 0)
> --
> 2.1.2.330.g565301e

BR,

Heinz
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
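The wrap concern raised in this reply can be illustrated with a small userspace sketch. It is not the code from the patch: the helper names tb_deadline_pending() and dec_reload_value() are made up for illustration, and unlike the patch (which only calls set_dec() when the remainder fits) this sketch clamps the reload value to DECREMENTER_MAX. The idea is to compare the modulo-2^64 difference of the two time base values rather than their absolute values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest decrementer value, as given in the patch description. */
#define DECREMENTER_MAX 0x7fffffffULL

/*
 * Wrap-safe "is the deadline still ahead of us?" test.
 * The unsigned subtraction is done modulo 2^64, so the result is the
 * forward distance from now to next_tb even if the counter wrapped in
 * between. The deadline counts as pending when that distance is
 * non-zero and below 2^63.
 */
static bool tb_deadline_pending(uint64_t now, uint64_t next_tb)
{
	uint64_t delta = next_tb - now;

	return delta != 0 && delta < (1ULL << 63);
}

/*
 * Ticks to load into the decrementer: the remaining distance, clamped
 * to DECREMENTER_MAX (an assumption of this sketch, see above).
 * Only meaningful while the deadline is still pending.
 */
static uint64_t dec_reload_value(uint64_t now, uint64_t next_tb)
{
	uint64_t delta = next_tb - now;

	assert(tb_deadline_pending(now, next_tb));
	return delta <= DECREMENTER_MAX ? delta : DECREMENTER_MAX;
}
```

With now = UINT64_MAX - 9 and next_tb = 10, a plain now < *next_tb comparison would report the deadline as already passed, while the delta test sees it 20 ticks ahead.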
Re: [PATCH v3 4/7] sound/radeon: Add quirk for broken 64-bit MSI
On Wed, Oct 8, 2014 at 1:28 AM, Benjamin Herrenschmidt
<b...@kernel.crashing.org> wrote:
> On Tue, 2014-10-07 at 19:47 -0400, Alex Deucher wrote:
> > > This moves the setting of the quirk flag to the audio driver.
> > >
> > > While recent ASICs have that problem fixed, they don't seem to
> > > be listed in the PCI IDs of the current driver, so let's quirk all
> > > the ATI HDMI for now. The consequences are nil on x86 anyway.
> > >
> > > Signed-off-by: Alex Deucher <alexdeuc...@gmail.com>
> > > Signed-off-by: Benjamin Herrenschmidt <b...@kernel.crashing.org>
> > > CC: sta...@vger.kernel.org
> >
> > Further discussion with the hw teams has revealed that this is still
> > an issue on newer asics so I think your original patch is correct
> > after all. Just disable 64 bit MSIs on all AMD audio PCI ids.
>
> All right, I won't resend the whole series, I can just pick up my
> previous patch.
>
> Takashi, Bjorn, Dave, this series covers your 3 areas of maintainership,
> how do you want to proceed? I'm happy to merge the whole lot via powerpc
> ASAP (since it's all CC'ed stable) if you guys send me the appropriate
> acks, otherwise, let me know.

I don't remember if I gave my formal review of your original patch, so if not,

Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>

Alex
Re: [PATCH v3 4/7] sound/radeon: Add quirk for broken 64-bit MSI
On Tue, 2014-10-07 at 19:47 -0400, Alex Deucher wrote:
> > This moves the setting of the quirk flag to the audio driver.
> >
> > While recent ASICs have that problem fixed, they don't seem to
> > be listed in the PCI IDs of the current driver, so let's quirk all
> > the ATI HDMI for now. The consequences are nil on x86 anyway.
> >
> > Signed-off-by: Alex Deucher <alexdeuc...@gmail.com>
> > Signed-off-by: Benjamin Herrenschmidt <b...@kernel.crashing.org>
> > CC: sta...@vger.kernel.org
>
> Further discussion with the hw teams has revealed that this is still
> an issue on newer asics so I think your original patch is correct
> after all. Just disable 64 bit MSIs on all AMD audio PCI ids.

All right, I won't resend the whole series, I can just pick up my
previous patch.

Takashi, Bjorn, Dave, this series covers your 3 areas of maintainership,
how do you want to proceed? I'm happy to merge the whole lot via powerpc
ASAP (since it's all CC'ed stable) if you guys send me the appropriate
acks, otherwise, let me know.

Cheers,
Ben.
Re: [PATCH v3 4/7] sound/radeon: Add quirk for broken 64-bit MSI
At Wed, 08 Oct 2014 16:28:16 +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2014-10-07 at 19:47 -0400, Alex Deucher wrote:
> > > This moves the setting of the quirk flag to the audio driver.
> > >
> > > While recent ASICs have that problem fixed, they don't seem to
> > > be listed in the PCI IDs of the current driver, so let's quirk all
> > > the ATI HDMI for now. The consequences are nil on x86 anyway.
> > >
> > > Signed-off-by: Alex Deucher <alexdeuc...@gmail.com>
> > > Signed-off-by: Benjamin Herrenschmidt <b...@kernel.crashing.org>
> > > CC: sta...@vger.kernel.org
> >
> > Further discussion with the hw teams has revealed that this is still
> > an issue on newer asics so I think your original patch is correct
> > after all. Just disable 64 bit MSIs on all AMD audio PCI ids.
>
> All right, I won't resend the whole series, I can just pick up my
> previous patch.
>
> Takashi, Bjorn, Dave, this series covers your 3 areas of maintainership,
> how do you want to proceed? I'm happy to merge the whole lot via powerpc
> ASAP (since it's all CC'ed stable) if you guys send me the appropriate
> acks, otherwise, let me know.

Feel free to merge through your tree.

Reviewed-by: Takashi Iwai <ti...@suse.de>

thanks,

Takashi
Re: [PATCH 08/44] kernel: Move pm_power_off to common code
On Tue, Oct 07, 2014 at 07:28:10AM +0200, Guenter Roeck wrote:
> pm_power_off is defined for all architectures. Move it to common code.
>
> Have all architectures call do_kernel_poweroff instead of pm_power_off.
> Some architectures point pm_power_off to machine_power_off. For those,
> call do_kernel_poweroff from machine_power_off instead.

For the CRIS parts:

>  arch/cris/kernel/process.c | 4 +---

Acked-by: Jesper Nilsson <jesper.nils...@axis.com>

/^JN - Jesper Nilsson
--
Jesper Nilsson -- jesper.nils...@axis.com
Re: [PATCH] powerpc: Reimplement __get_SP() as a function not a define
On Wed, 2014-10-01 at 15:10 +1000, Anton Blanchard wrote:
> Li Zhong points out an issue with our current __get_SP()
> implementation. If ftrace function tracing is enabled (ie -pg
> profiling using _mcount) we spill a stack frame on 64bit all the
> time.
>
> If a function calls __get_SP() and later calls a function that is
> tail call optimised, we will pop the stack frame and the value
> returned by __get_SP() is no longer valid. An example from Li can
> be found in save_stack_trace -> save_context_stack:
>
> c00432c0 <.save_stack_trace>:
> c00432c0:	mflr	r0
> c00432c4:	std	r0,16(r1)
> c00432c8:	stdu	r1,-128(r1)	<-- stack frame for _mcount
> c00432cc:	std	r3,112(r1)
> c00432d0:	bl	._mcount
> c00432d4:	nop
>
> c00432d8:	mr	r4,r1	<-- __get_SP()
>
> c00432dc:	ld	r5,632(r13)
> c00432e0:	ld	r3,112(r1)
> c00432e4:	li	r6,1
>
> c00432e8:	addi	r1,r1,128	<-- pop stack frame
>
> c00432ec:	ld	r0,16(r1)
> c00432f0:	mtlr	r0
> c00432f4:	b	.save_context_stack	<-- tail call optimized
>
> save_context_stack ends up with a stack pointer below the current
> one, and it is likely to be scribbled over.
>
> Fix this by making __get_SP() a function which returns the
> caller's stack frame. Also replace inline assembly which grabs
> the stack pointer in save_stack_trace and show_stack with
> __get_SP().
>
> Reported-by: Li Zhong <zh...@linux.vnet.ibm.com>
> Signed-off-by: Anton Blanchard <an...@samba.org>
> ---
>  arch/powerpc/include/asm/reg.h   | 3 +--
>  arch/powerpc/kernel/misc.S       | 4
>  arch/powerpc/kernel/process.c    | 2 +-
>  arch/powerpc/kernel/stacktrace.c | 2 +-
>  4 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 0c05059..0f973c0 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -1264,8 +1264,7 @@ static inline unsigned long mfvtb (void)
>
>  #define proc_trap()	asm volatile("trap")
>
> -#define __get_SP()	({unsigned long sp; \
> -			asm volatile("mr %0,1": "=r" (sp)); sp;})
> +extern unsigned long __get_SP(void);

It seems that some module code is using __get_SP, e.g. xfs in the
example below:

ERROR: ".__get_SP" [fs/xfs/xfs.ko] undefined!

Maybe we need to export this symbol in arch/powerpc/kernel/ppc_ksyms.c?

diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
index 48d17d6f..eebd4e4 100644
--- a/arch/powerpc/kernel/ppc_ksyms.c
+++ b/arch/powerpc/kernel/ppc_ksyms.c
@@ -207,3 +207,5 @@ EXPORT_SYMBOL_GPL(mmu_psize_defs);
 #ifdef CONFIG_EPAPR_PARAVIRT
 EXPORT_SYMBOL(epapr_hypercall_start);
 #endif
+
+EXPORT_SYMBOL(__get_SP);

With the above compile error fixed, this patch solved the SP issue I
saw, so

Tested-by: Li Zhong <zh...@linux.vnet.ibm.com>

>
>  extern unsigned long scom970_read(unsigned int address);
>  extern void scom970_write(unsigned int address, unsigned long value);
> diff --git a/arch/powerpc/kernel/misc.S b/arch/powerpc/kernel/misc.S
> index 7ce26d4..120deb7 100644
> --- a/arch/powerpc/kernel/misc.S
> +++ b/arch/powerpc/kernel/misc.S
> @@ -114,3 +114,7 @@ _GLOBAL(longjmp)
>  	mtlr	r0
>  	mr	r3,r4
>  	blr
> +
> +_GLOBAL(__get_SP)
> +	PPC_LL	r3,0(r1)
> +	blr
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index aa1df89..3cc6439 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1545,7 +1545,7 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
>  		tsk = current;
>  	if (sp == 0) {
>  		if (tsk == current)
> -			asm("mr %0,1" : "=r" (sp));
> +			sp = __get_SP();
>  		else
>  			sp = tsk->thread.ksp;
>  	}
> diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
> index 3d30ef1..7f65bae 100644
> --- a/arch/powerpc/kernel/stacktrace.c
> +++ b/arch/powerpc/kernel/stacktrace.c
> @@ -50,7 +50,7 @@ void save_stack_trace(struct stack_trace *trace)
>  {
>  	unsigned long sp;
>
> -	asm("mr %0,1" : "=r" (sp));
> +	sp = __get_SP();
>  	save_context_stack(trace, sp, current, 1);
>  }
[PATCH v4 0/16] POWER8 Coherent Accelerator device driver
This is the latest version of the cxl driver. Change log below:

v4:
  - Updates based on comments from mpe (offline and online).
  - Refactor the sstp lock to be an entry lock.
  - Fixed error paths on new status_mutex in start_work
  - added some missing include files
  - moved associating pid/mm from open() to start_work ioctl.
  - improved IDR setup and destroy
  - fix block comments.
  - remove #undef at top of files
  - wed -> work_element_descriptor on user visible interfaces
  - Lots of documentation updates.
  - Device name changes.
    - No longer has a default dev name /dev/afuM.N for each mode.
    - Dedicated, slave and master all have distinct char devs.
  - Prevent AFU reset when contexts active.
  - Endian bug fix for find_free_sste().
  - Fix locking on reset_store_afu.
  - Make CXL_IOCTL_GET_PROCESS_ELEMENT return a __u32 instead of int.
  - Rename event.afu_err.err to error
  - Fixed master specific sysfs attribute creation
  - fix sparse errors with debugfs. Was passing iomem ptrs to userspace.

v3:
  - Updates based on comments from mpe, benh, aneesh and offline reviews.
  - Fixed bug freeing AFU IRQs that also freed the multiplexed PSL IRQ
  - Change copro_flush_all_slbs to a static inline as suggested by mpe
  - Implement sanitisation routines to clear out more registers and do
    full adapter wide tlbia and slbia when initialising hardware
  - Add self testcase to msi_bitmap to test allocations are aligned to a
    power of 2 and cleanup comment as suggested by mpe
  - Clean up cxl_use_count
  - Split out detach_process_native into two logical functions
  - Improve comment in set_msi_irq_chip as requested by mpe
  - Move cxl functions in pci-ioda.c to be under just one
    #ifdef CONFIG_CXL_BASE
  - Cleanup hash_page and hash_page_mm from mpe's and Aneesh's reviews
  - Remove dead code in cxl_alloc_sst
  - Add timeout in afu_slbia_native
  - Remove cxl backend and driver ops abstractions
  - Removed separate cxl-pci module
  - Merged cxl pci module init calls into main driver init
  - Refactor afu_read() to be a bit simpler and more closely follow
    existing patterns in the kernel
  - Userspace API updates from reviews:
    - Added ioctl to get the process element number, and removed it as a
      return from the start work ioctl
    - Alter cxl_event to have one common header struct
    - Dropped check error ioctl
    - Added current and binary compatible API version numbers to sysfs
    - read() now takes a 4K (or greater) buffer
    - Pack event structs to reduce unnecessary reserved fields
    - Event sizes can now differ
    - All event sizes are 64bit multiples to allow future event coalescing
    - Add flags fields to indicate which fields contain valid data
    - Add BUILD_BUG_ONs to protect against inadvertently changing API
      without bumping version number and/or flags
    - Update documentation
  - Skip CXL SLBIA codepath if CXL is not in use
  - Split cxl_slbia_core into two functions to be easier to read
  - Refactor copro_data_segment (renamed to copro_calc_slb) since we are
    no longer merging with hash_page and cleaned up parameters.
  - Some renames:
    - struct cxl_t -> struct cxl
    - struct cxl_afu_t -> struct cxl_afu
    - struct cxl_context_t -> struct cxl_context
    - copro_data_segment -> copro_calc_slb
    - ctx->ph -> ctx->pe
  - Added ctx->status mutex lock around for start and release context

v2:
  - Updates based on comments from Anton, Gavin, Aneesh, jk and offline
    reviews
  - Simplified copro_data_segment() and merged code with hash_page_mm()
    (New patch 10/17)
  - PCIe code simplifications based on Gavin's review
  - Removed redundant comment in msi_bitmap_alloc_hwirqs()
  - Fix for locking in idr_remove in core driver
  - Ensure PSL is enabled when PHB is flipped to CXL mode
  - Added CONFIG_PPC_COPRO_BASE to compile copro_fault.c
  - Merged SPU and cxl slb flushing calls into copro_flush_all_slbs()
    (New patch 03/17)
  - Moved slb_vsid_shift() to static inline from #define
  - Don't write paca->context when demoting segments and mm != current
  - Fix minor typos in documentation

v1:
  - Initial post

This adds support for the Coherent Accelerator (cxl) attached to POWER8
processors. This coherent accelerator interface is designed to allow the
coherent connection of FPGA based accelerators (and other devices) to
POWER systems.

IBM refers to this as the Coherent Accelerator Processor Interface or
CAPI. In this driver it's referred to by the name cxl to avoid confusion
with the ISDN CAPI subsystem.

An overview of the patches:
  Patches 1-3: Split some of the old Cell co-processor code out so it
               can be reused.
  Patches 4-10: Add infrastructure to arch/powerpc needed by cxl.
  Patch 11: Add callbacks needed for invalidating cxl mm contexts.
  Patch 12: Add cxl specific support that needs to be built in to the
            kernel (can't be a module).
  Patches 13-15: Add the majority of the device driver and API header.
  Patch 16: Documentation.

The documentation
[PATCH v4 01/16] powerpc/cell: Move spu_handle_mm_fault() out of cell platform
From: Ian Munsie <imun...@au1.ibm.com>

Currently spu_handle_mm_fault() is in the cell platform.

This code is generically useful for other non-cell co-processors on
powerpc.

This patch moves this function out of the cell platform into
arch/powerpc/mm so that others may use it.

Signed-off-by: Ian Munsie <imun...@au1.ibm.com>
Signed-off-by: Michael Neuling <mi...@neuling.org>
---
 arch/powerpc/Kconfig | 4
 arch/powerpc/include/asm/copro.h | 16
 arch/powerpc/include/asm/spu.h | 5 ++---
 arch/powerpc/mm/Makefile | 1 +
 .../{platforms/cell/spu_fault.c => mm/copro_fault.c} | 14 ++
 arch/powerpc/platforms/cell/Kconfig | 1 +
 arch/powerpc/platforms/cell/Makefile | 2 +-
 arch/powerpc/platforms/cell/spufs/fault.c | 4 ++--
 8 files changed, 33 insertions(+), 14 deletions(-)
 create mode 100644 arch/powerpc/include/asm/copro.h
 rename arch/powerpc/{platforms/cell/spu_fault.c => mm/copro_fault.c} (89%)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 4bc7b62..8f094e9 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -603,6 +603,10 @@ config PPC_SUBPAGE_PROT
 	  to set access permissions (read/write, readonly, or no access)
 	  on the 4k subpages of each 64k page.

+config PPC_COPRO_BASE
+	bool
+	default n
+
 config SCHED_SMT
 	bool "SMT (Hyperthreading) scheduler support"
 	depends on PPC64 && SMP
diff --git a/arch/powerpc/include/asm/copro.h b/arch/powerpc/include/asm/copro.h
new file mode 100644
index 0000000..51cae85
--- /dev/null
+++ b/arch/powerpc/include/asm/copro.h
@@ -0,0 +1,16 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _ASM_POWERPC_COPRO_H
+#define _ASM_POWERPC_COPRO_H
+
+int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
+			  unsigned long dsisr, unsigned *flt);
+
+#endif /* _ASM_POWERPC_COPRO_H */
diff --git a/arch/powerpc/include/asm/spu.h b/arch/powerpc/include/asm/spu.h
index 37b7ca3..a6e6e2b 100644
--- a/arch/powerpc/include/asm/spu.h
+++ b/arch/powerpc/include/asm/spu.h
@@ -27,6 +27,8 @@
 #include <linux/workqueue.h>
 #include <linux/device.h>
 #include <linux/mutex.h>
+#include <asm/reg.h>
+#include <asm/copro.h>

 #define LS_SIZE (256 * 1024)
 #define LS_ADDR_MASK (LS_SIZE - 1)
@@ -277,9 +279,6 @@ void spu_remove_dev_attr(struct device_attribute *attr);
 int spu_add_dev_attr_group(struct attribute_group *attrs);
 void spu_remove_dev_attr_group(struct attribute_group *attrs);

-int spu_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
-		unsigned long dsisr, unsigned *flt);
-
 /*
  * Notifier blocks:
  *
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index d0130ff..325e861 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -34,3 +34,4 @@ obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += hugepage-hash64.o
 obj-$(CONFIG_PPC_SUBPAGE_PROT)	+= subpage-prot.o
 obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o
 obj-$(CONFIG_HIGHMEM)		+= highmem.o
+obj-$(CONFIG_PPC_COPRO_BASE)	+= copro_fault.o
diff --git a/arch/powerpc/platforms/cell/spu_fault.c b/arch/powerpc/mm/copro_fault.c
similarity index 89%
rename from arch/powerpc/platforms/cell/spu_fault.c
rename to arch/powerpc/mm/copro_fault.c
index 641e727..ba7df14 100644
--- a/arch/powerpc/platforms/cell/spu_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -1,5 +1,5 @@
 /*
- * SPU mm fault handler
+ * CoProcessor (SPU/AFU) mm fault handler
  *
  * (C) Copyright IBM Deutschland Entwicklung GmbH 2007
  *
@@ -23,16 +23,14 @@
 #include <linux/sched.h>
 #include <linux/mm.h>
 #include <linux/export.h>
-
-#include <asm/spu.h>
-#include <asm/spu_csa.h>
+#include <asm/reg.h>

 /*
  * This ought to be kept in sync with the powerpc specific do_page_fault
  * function. Currently, there are a few corner cases that we haven't had
  * to handle fortunately.
  */
-int spu_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
+int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
 		unsigned long dsisr, unsigned *flt)
 {
 	struct vm_area_struct *vma;
@@ -58,12 +56,12 @@ int spu_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
 		goto out_unlock;
 	}

-	is_write = dsisr & MFC_DSISR_ACCESS_PUT;
+	is_write = dsisr & DSISR_ISSTORE;
 	if (is_write) {
 		if (!(vma->vm_flags & VM_WRITE))
 			goto out_unlock;
 	} else {
-		if (dsisr & MFC_DSISR_ACCESS_DENIED)
+		if (dsisr & DSISR_PROTFAULT)
 			goto out_unlock;
[PATCH v4 02/16] powerpc/cell: Move data segment faulting code out of cell platform
From: Ian Munsie <imun...@au1.ibm.com>

__spu_trap_data_seg() currently contains code to determine the VSID and
ESID required for a particular EA and mm struct.

This code is generically useful for other co-processors. This moves the
code out of the cell platform so it can be used by other powerpc code.
It also adds 1TB segment handling which Cell didn't support. The new
function is called copro_calculate_slb().

This also moves the internal struct spu_slb to a generic struct
copro_slb which is now used in the Cell and copro code. We use this new
struct instead of passing around esid and vsid parameters.

Signed-off-by: Ian Munsie <imun...@au1.ibm.com>
Signed-off-by: Michael Neuling <mi...@neuling.org>
---
 arch/powerpc/include/asm/copro.h | 7 +
 arch/powerpc/include/asm/mmu-hash64.h | 7 +
 arch/powerpc/mm/copro_fault.c | 46
 arch/powerpc/mm/slb.c | 3 --
 arch/powerpc/platforms/cell/spu_base.c | 55 ++
 5 files changed, 69 insertions(+), 49 deletions(-)

diff --git a/arch/powerpc/include/asm/copro.h b/arch/powerpc/include/asm/copro.h
index 51cae85..b0e6a18 100644
--- a/arch/powerpc/include/asm/copro.h
+++ b/arch/powerpc/include/asm/copro.h
@@ -10,7 +10,14 @@
 #ifndef _ASM_POWERPC_COPRO_H
 #define _ASM_POWERPC_COPRO_H

+struct copro_slb
+{
+	u64 esid, vsid;
+};
+
 int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
 			  unsigned long dsisr, unsigned *flt);

+int copro_calculate_slb(struct mm_struct *mm, u64 ea, struct copro_slb *slb);
+
 #endif /* _ASM_POWERPC_COPRO_H */
diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index d765144..aeabd02 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -190,6 +190,13 @@ static inline unsigned int mmu_psize_to_shift(unsigned int mmu_psize)

 #ifndef __ASSEMBLY__

+static inline int slb_vsid_shift(int ssize)
+{
+	if (ssize == MMU_SEGSIZE_256M)
+		return SLB_VSID_SHIFT;
+	return SLB_VSID_SHIFT_1T;
+}
+
 static inline int segment_shift(int ssize)
 {
 	if (ssize == MMU_SEGSIZE_256M)
diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
index ba7df14..a15a23e 100644
--- a/arch/powerpc/mm/copro_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -24,6 +24,7 @@
 #include <linux/mm.h>
 #include <linux/export.h>
 #include <asm/reg.h>
+#include <asm/copro.h>

 /*
  * This ought to be kept in sync with the powerpc specific do_page_fault
@@ -90,3 +91,48 @@ out_unlock:
 	return ret;
 }
 EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
+
+int copro_calculate_slb(struct mm_struct *mm, u64 ea, struct copro_slb *slb)
+{
+	u64 vsid;
+	int psize, ssize;
+
+	slb->esid = (ea & ESID_MASK) | SLB_ESID_V;
+
+	switch (REGION_ID(ea)) {
+	case USER_REGION_ID:
+		pr_devel("%s: 0x%llx -- USER_REGION_ID\n", __func__, ea);
+		psize = get_slice_psize(mm, ea);
+		ssize = user_segment_size(ea);
+		vsid = get_vsid(mm->context.id, ea, ssize);
+		break;
+	case VMALLOC_REGION_ID:
+		pr_devel("%s: 0x%llx -- VMALLOC_REGION_ID\n", __func__, ea);
+		if (ea < VMALLOC_END)
+			psize = mmu_vmalloc_psize;
+		else
+			psize = mmu_io_psize;
+		ssize = mmu_kernel_ssize;
+		vsid = get_kernel_vsid(ea, mmu_kernel_ssize);
+		break;
+	case KERNEL_REGION_ID:
+		pr_devel("%s: 0x%llx -- KERNEL_REGION_ID\n", __func__, ea);
+		psize = mmu_linear_psize;
+		ssize = mmu_kernel_ssize;
+		vsid = get_kernel_vsid(ea, mmu_kernel_ssize);
+		break;
+	default:
+		pr_debug("%s: invalid region access at %016llx\n", __func__, ea);
+		return 1;
+	}
+
+	vsid = (vsid << slb_vsid_shift(ssize)) | SLB_VSID_USER;
+
+	vsid |= mmu_psize_defs[psize].sllp |
+		((ssize == MMU_SEGSIZE_1T) ? SLB_VSID_B_1T : 0);
+
+	slb->vsid = vsid;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(copro_calculate_slb);
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 0399a67..6e450ca 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -46,9 +46,6 @@ static inline unsigned long mk_esid_data(unsigned long ea, int ssize,
 	return (ea & slb_esid_mask(ssize)) | SLB_ESID_V | slot;
 }

-#define slb_vsid_shift(ssize)	\
-	((ssize) == MMU_SEGSIZE_256M? SLB_VSID_SHIFT: SLB_VSID_SHIFT_1T)
-
 static inline unsigned long mk_vsid_data(unsigned long ea, int ssize,
 					 unsigned long flags)
 {
diff --git a/arch/powerpc/platforms/cell/spu_base.c b/arch/powerpc/platforms/cell/spu_base.c
index 2930d1e..ffcbd24 100644
--- a/arch/powerpc/platforms/cell/spu_base.c
+++ b/arch/powerpc/platforms/cell/spu_base.c
@@
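As a rough userspace illustration of how the esid/vsid pair in struct copro_slb is assembled, here is a minimal sketch. The constant values are assumptions made for the example (the real definitions live in the kernel's mmu-hash64.h), and copro_slb_pack() is a hypothetical helper: it takes an already-computed raw VSID and flag bits instead of calling get_vsid() or looking up mmu_psize_defs[].

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
enum { MMU_SEGSIZE_256M = 0, MMU_SEGSIZE_1T = 1 };
#define SLB_VSID_SHIFT		12
#define SLB_VSID_SHIFT_1T	24
#define ESID_MASK		0xfffffffff0000000ULL
#define SLB_ESID_V		0x0000000008000000ULL	/* entry valid bit */

struct copro_slb {
	uint64_t esid, vsid;
};

/* Same shape as the static inline that replaces the old #define. */
static inline int slb_vsid_shift(int ssize)
{
	if (ssize == MMU_SEGSIZE_256M)
		return SLB_VSID_SHIFT;
	return SLB_VSID_SHIFT_1T;
}

/*
 * Hypothetical helper: pack a raw VSID and flag bits into an SLB entry
 * for the segment containing ea.
 */
static void copro_slb_pack(struct copro_slb *slb, uint64_t ea,
			   uint64_t raw_vsid, int ssize, uint64_t flags)
{
	assert(slb);
	slb->esid = (ea & ESID_MASK) | SLB_ESID_V;
	slb->vsid = (raw_vsid << slb_vsid_shift(ssize)) | flags;
}
```

Passing one struct around instead of separate esid and vsid parameters keeps the two halves of the entry together, which is the design choice the patch description mentions.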
[PATCH v4 03/16] powerpc/cell: Make spu_flush_all_slbs() generic
From: Ian Munsie <imun...@au1.ibm.com>

This moves spu_flush_all_slbs() into a generic call copro_flush_all_slbs().

This will be useful when we add cxl which also needs a similar SLB flush call.

Signed-off-by: Ian Munsie <imun...@au1.ibm.com>
Signed-off-by: Michael Neuling <mi...@neuling.org>
---
 arch/powerpc/include/asm/copro.h | 6 ++
 arch/powerpc/mm/copro_fault.c | 9 +
 arch/powerpc/mm/hash_utils_64.c | 10 +++---
 arch/powerpc/mm/slice.c | 10 +++---
 4 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/copro.h b/arch/powerpc/include/asm/copro.h
index b0e6a18..ce216df 100644
--- a/arch/powerpc/include/asm/copro.h
+++ b/arch/powerpc/include/asm/copro.h
@@ -20,4 +20,10 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,

 int copro_calculate_slb(struct mm_struct *mm, u64 ea, struct copro_slb *slb);

+
+#ifdef CONFIG_PPC_COPRO_BASE
+void copro_flush_all_slbs(struct mm_struct *mm);
+#else
+static inline void copro_flush_all_slbs(struct mm_struct *mm) {}
+#endif
 #endif /* _ASM_POWERPC_COPRO_H */
diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
index a15a23e..f2aa5a8 100644
--- a/arch/powerpc/mm/copro_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -25,6 +25,7 @@
 #include <linux/export.h>
 #include <asm/reg.h>
 #include <asm/copro.h>
+#include <asm/spu.h>

 /*
  * This ought to be kept in sync with the powerpc specific do_page_fault
@@ -136,3 +137,11 @@ int copro_calculate_slb(struct mm_struct *mm, u64 ea, struct copro_slb *slb)
 	return 0;
 }
 EXPORT_SYMBOL_GPL(copro_calculate_slb);
+
+void copro_flush_all_slbs(struct mm_struct *mm)
+{
+#ifdef CONFIG_SPU_BASE
+	spu_flush_all_slbs(mm);
+#endif
+}
+EXPORT_SYMBOL_GPL(copro_flush_all_slbs);
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index daee7f4..5c0738d 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -51,7 +51,7 @@
 #include <asm/cacheflush.h>
 #include <asm/cputable.h>
 #include <asm/sections.h>
-#include <asm/spu.h>
+#include <asm/copro.h>
 #include <asm/udbg.h>
 #include <asm/code-patching.h>
 #include <asm/fadump.h>
@@ -901,9 +901,7 @@ void demote_segment_4k(struct mm_struct *mm, unsigned long addr)
 	if (get_slice_psize(mm, addr) == MMU_PAGE_4K)
 		return;
 	slice_set_range_psize(mm, addr, 1, MMU_PAGE_4K);
-#ifdef CONFIG_SPU_BASE
-	spu_flush_all_slbs(mm);
-#endif
+	copro_flush_all_slbs(mm);
 	if (get_paca_psize(addr) != MMU_PAGE_4K) {
 		get_paca()->context = mm->context;
 		slb_flush_and_rebolt();
@@ -1141,9 +1139,7 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
 			       "to 4kB pages because of "
 			       "non-cacheable mapping\n");
 			psize = mmu_vmalloc_psize = MMU_PAGE_4K;
-#ifdef CONFIG_SPU_BASE
-			spu_flush_all_slbs(mm);
-#endif
+			copro_flush_all_slbs(mm);
 		}
 	}

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index b0c75cc..a81791c 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -32,7 +32,7 @@
 #include <linux/export.h>
 #include <asm/mman.h>
 #include <asm/mmu.h>
-#include <asm/spu.h>
+#include <asm/copro.h>

 /* some sanity checks */
 #if (PGTABLE_RANGE >> 43) > SLICE_MASK_SIZE
@@ -232,9 +232,7 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz

 	spin_unlock_irqrestore(&slice_convert_lock, flags);

-#ifdef CONFIG_SPU_BASE
-	spu_flush_all_slbs(mm);
-#endif
+	copro_flush_all_slbs(mm);
 }

 /*
@@ -671,9 +669,7 @@ void slice_set_psize(struct mm_struct *mm, unsigned long address,

 	spin_unlock_irqrestore(&slice_convert_lock, flags);

-#ifdef CONFIG_SPU_BASE
-	spu_flush_all_slbs(mm);
-#endif
+	copro_flush_all_slbs(mm);
 }

 void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
--
1.9.1
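The pattern used in copro.h above (a real prototype when the config option is set, an empty static inline otherwise) can be shown in a tiny standalone sketch. The flush_calls counter and demote_segment() are stand-ins invented for this example, and struct mm_struct is left opaque. The point is only that the call site carries no #ifdef.

```c
struct mm_struct;		/* opaque for this illustration */

static int flush_calls;		/* hypothetical counter, illustration only */

/* Uncomment to simulate a kernel built with the option enabled. */
/* #define CONFIG_PPC_COPRO_BASE 1 */

#ifdef CONFIG_PPC_COPRO_BASE
static void copro_flush_all_slbs(struct mm_struct *mm)
{
	(void)mm;
	flush_calls++;		/* stand-in for the real flush work */
}
#else
/* Empty stub: the compiler can drop the call entirely. */
static inline void copro_flush_all_slbs(struct mm_struct *mm) { (void)mm; }
#endif

/* Stand-in for a caller such as demote_segment_4k(). */
static void demote_segment(struct mm_struct *mm)
{
	/* No #ifdef needed here, whichever way the option is set. */
	copro_flush_all_slbs(mm);
}
```

With the option left undefined, as written, calling demote_segment() leaves flush_calls at 0.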
[PATCH v4 04/16] powerpc/msi: Improve IRQ bitmap allocator
From: Ian Munsie <imun...@au1.ibm.com>

Currently msi_bitmap_alloc_hwirqs() will round up any IRQ allocation
requests to the next power of 2, e.g. ask for 5 IRQs and you'll get 8.
This wastes a lot of IRQs which can be a scarce resource.

For cxl we may require multiple IRQs for every context that is attached
to the accelerator. There may be 1000s of contexts attached, hence we
can easily run out of IRQs, especially if we are needlessly wasting
them.

This changes msi_bitmap_alloc_hwirqs() to allocate only the required
number of IRQs, hence avoiding this wastage. It keeps the natural
alignment requirement though.

Signed-off-by: Ian Munsie <imun...@au1.ibm.com>
Signed-off-by: Michael Neuling <mi...@neuling.org>
---
 arch/powerpc/sysdev/msi_bitmap.c | 36 +---
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/sysdev/msi_bitmap.c b/arch/powerpc/sysdev/msi_bitmap.c
index 2ff6302..871d94b 100644
--- a/arch/powerpc/sysdev/msi_bitmap.c
+++ b/arch/powerpc/sysdev/msi_bitmap.c
@@ -20,32 +20,37 @@ int msi_bitmap_alloc_hwirqs(struct msi_bitmap *bmp, int num)
 	int offset, order = get_count_order(num);

 	spin_lock_irqsave(&bmp->lock, flags);
-	/*
-	 * This is fast, but stricter than we need. We might want to add
-	 * a fallback routine which does a linear search with no alignment.
-	 */
-	offset = bitmap_find_free_region(bmp->bitmap, bmp->irq_count, order);
+
+	offset = bitmap_find_next_zero_area(bmp->bitmap, bmp->irq_count, 0,
+					    num, (1 << order) - 1);
+	if (offset > bmp->irq_count)
+		goto err;
+
+	bitmap_set(bmp->bitmap, offset, num);
 	spin_unlock_irqrestore(&bmp->lock, flags);

-	pr_debug("msi_bitmap: allocated 0x%x (2^%d) at offset 0x%x\n",
-		 num, order, offset);
+	pr_debug("msi_bitmap: allocated 0x%x at offset 0x%x\n", num, offset);

 	return offset;
+err:
+	spin_unlock_irqrestore(&bmp->lock, flags);
+	return -ENOMEM;
 }
+EXPORT_SYMBOL(msi_bitmap_alloc_hwirqs);

 void msi_bitmap_free_hwirqs(struct msi_bitmap *bmp, unsigned int offset,
 			    unsigned int num)
 {
 	unsigned long flags;
-	int order = get_count_order(num);

-	pr_debug("msi_bitmap: freeing 0x%x (2^%d) at offset 0x%x\n",
-		 num, order, offset);
+	pr_debug("msi_bitmap: freeing 0x%x at offset 0x%x\n",
+		 num, offset);

 	spin_lock_irqsave(&bmp->lock, flags);
-	bitmap_release_region(bmp->bitmap, offset, order);
+	bitmap_clear(bmp->bitmap, offset, num);
 	spin_unlock_irqrestore(&bmp->lock, flags);
 }
+EXPORT_SYMBOL(msi_bitmap_free_hwirqs);

 void msi_bitmap_reserve_hwirq(struct msi_bitmap *bmp, unsigned int hwirq)
 {
@@ -180,6 +185,15 @@ void __init test_basics(void)
 	msi_bitmap_free_hwirqs(&bmp, size / 2, 1);
 	check(msi_bitmap_alloc_hwirqs(&bmp, 1) == size / 2);

+	/* Check we get a naturally aligned offset */
+	check(msi_bitmap_alloc_hwirqs(&bmp, 2) % 2 == 0);
+	check(msi_bitmap_alloc_hwirqs(&bmp, 4) % 4 == 0);
+	check(msi_bitmap_alloc_hwirqs(&bmp, 8) % 8 == 0);
+	check(msi_bitmap_alloc_hwirqs(&bmp, 9) % 16 == 0);
+	check(msi_bitmap_alloc_hwirqs(&bmp, 3) % 4 == 0);
+	check(msi_bitmap_alloc_hwirqs(&bmp, 7) % 8 == 0);
+	check(msi_bitmap_alloc_hwirqs(&bmp, 121) % 128 == 0);
+
 	msi_bitmap_free(&bmp);

 	/* Clients may check bitmap == NULL for not-allocated */
--
1.9.1
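To make the allocation policy concrete, here is a small userspace model of it, assuming a fixed 64-entry pool and no locking. It does not use the kernel bitmap API: alloc_hwirqs(), free_hwirqs() and roundup_pow2() are hypothetical names, and the pool is a plain byte array. Like the patched function, it reserves exactly num entries but only at offsets aligned to the next power of two of num.

```c
#include <assert.h>

#define IRQ_COUNT 64

/* One flag per hwirq; a real implementation would use a bitmap. */
static unsigned char irq_used[IRQ_COUNT];

/* Smallest power of two >= num (for num >= 1). */
static int roundup_pow2(int num)
{
	int p = 1;

	while (p < num)
		p <<= 1;
	return p;
}

/*
 * Reserve exactly num entries at the lowest free offset that is
 * aligned to roundup_pow2(num). Returns the offset, or -1 on failure.
 */
static int alloc_hwirqs(int num)
{
	int align, start, i;

	if (num < 1 || num > IRQ_COUNT)
		return -1;

	align = roundup_pow2(num);
	for (start = 0; start + num <= IRQ_COUNT; start += align) {
		for (i = 0; i < num; i++)
			if (irq_used[start + i])
				break;
		if (i < num)
			continue;	/* range busy, try next aligned offset */
		for (i = 0; i < num; i++)
			irq_used[start + i] = 1;
		return start;
	}
	return -1;
}

/* Release exactly the num entries starting at offset. */
static void free_hwirqs(int offset, int num)
{
	int i;

	assert(offset >= 0 && num >= 0 && offset + num <= IRQ_COUNT);
	for (i = 0; i < num; i++)
		irq_used[offset + i] = 0;
}
```

Starting from an empty pool, a request for 5 lands at offset 0 and uses only entries 0 to 4, so a following single-IRQ request can take entry 5 instead of being pushed to 8 as it would be with power-of-two sized regions.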
[PATCH v4 05/16] powerpc/mm: Export mmu_kernel_ssize and mmu_linear_psize
From: Ian Munsie <imun...@au1.ibm.com>

Export mmu_kernel_ssize and mmu_linear_psize. These are needed by the cxl
driver, which has its own MMU. To set up the MMU, cxl needs access to these.

Signed-off-by: Ian Munsie <imun...@au1.ibm.com>
Signed-off-by: Michael Neuling <mi...@neuling.org>
---
 arch/powerpc/mm/hash_utils_64.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 5c0738d..bbdb054 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -98,6 +98,7 @@ unsigned long htab_size_bytes;
 unsigned long htab_hash_mask;
 EXPORT_SYMBOL_GPL(htab_hash_mask);
 int mmu_linear_psize = MMU_PAGE_4K;
+EXPORT_SYMBOL_GPL(mmu_linear_psize);
 int mmu_virtual_psize = MMU_PAGE_4K;
 int mmu_vmalloc_psize = MMU_PAGE_4K;
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
@@ -105,6 +106,7 @@ int mmu_vmemmap_psize = MMU_PAGE_4K;
 #endif
 int mmu_io_psize = MMU_PAGE_4K;
 int mmu_kernel_ssize = MMU_SEGSIZE_256M;
+EXPORT_SYMBOL_GPL(mmu_kernel_ssize);
 int mmu_highuser_ssize = MMU_SEGSIZE_256M;
 u16 mmu_slb_size = 64;
 EXPORT_SYMBOL_GPL(mmu_slb_size);
-- 
1.9.1
[PATCH v4 06/16] powerpc/powernv: Split out set MSI IRQ chip code
From: Ian Munsie <imun...@au1.ibm.com>

Some of the MSI IRQ code in pnv_pci_ioda_msi_setup() is generically useful so
split it out.

This will be used by some of the cxl PCIe code later.

Signed-off-by: Ian Munsie <imun...@au1.ibm.com>
Signed-off-by: Michael Neuling <mi...@neuling.org>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 42 ++++++++++++++++++++++++------------------
 1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index df241b1..baf3de6 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1306,14 +1306,35 @@ static void pnv_ioda2_msi_eoi(struct irq_data *d)
 	icp_native_eoi(d);
 }
 
+
+static void set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
+{
+	struct irq_data *idata;
+	struct irq_chip *ichip;
+
+	if (phb->type != PNV_PHB_IODA2)
+		return;
+
+	if (!phb->ioda.irq_chip_init) {
+		/*
+		 * First time we setup an MSI IRQ, we need to setup the
+		 * corresponding IRQ chip to route correctly.
+		 */
+		idata = irq_get_irq_data(virq);
+		ichip = irq_data_get_irq_chip(idata);
+		phb->ioda.irq_chip_init = 1;
+		phb->ioda.irq_chip = *ichip;
+		phb->ioda.irq_chip.irq_eoi = pnv_ioda2_msi_eoi;
+	}
+	irq_set_chip(virq, &phb->ioda.irq_chip);
+}
+
 static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 				  unsigned int hwirq, unsigned int virq,
 				  unsigned int is_64, struct msi_msg *msg)
 {
 	struct pnv_ioda_pe *pe = pnv_ioda_get_pe(dev);
 	struct pci_dn *pdn = pci_get_pdn(dev);
-	struct irq_data *idata;
-	struct irq_chip *ichip;
 	unsigned int xive_num = hwirq - phb->msi_base;
 	__be32 data;
 	int rc;
@@ -1365,22 +1386,7 @@ static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 	}
 	msg->data = be32_to_cpu(data);
 
-	/*
-	 * Change the IRQ chip for the MSI interrupts on PHB3.
-	 * The corresponding IRQ chip should be populated for
-	 * the first time.
-	 */
-	if (phb->type == PNV_PHB_IODA2) {
-		if (!phb->ioda.irq_chip_init) {
-			idata = irq_get_irq_data(virq);
-			ichip = irq_data_get_irq_chip(idata);
-			phb->ioda.irq_chip_init = 1;
-			phb->ioda.irq_chip = *ichip;
-			phb->ioda.irq_chip.irq_eoi = pnv_ioda2_msi_eoi;
-		}
-
-		irq_set_chip(virq, &phb->ioda.irq_chip);
-	}
+	set_msi_irq_chip(phb, virq);
 
 	pr_devel("%s: %s-bit MSI on hwirq %x (xive #%d),"
 		 " address=%x_%08x data=%x PE# %d\n",
-- 
1.9.1
[PATCH v4 07/16] cxl: Add new header for call backs and structs
From: Ian Munsie <imun...@au1.ibm.com>

This new header adds callbacks and structs needed by the rest of the kernel to
hook into the cxl infrastructure.

This adds the cxl_ctx_in_use() function for use in the mm code to see if any
cxl contexts are currently in use. This is used by tlbie() to determine
whether it can do local TLB invalidations or not. This also adds get/put calls
for the cxl driver module to refcount the active cxl contexts.

cxl_ctx_get/put/in_use are static inlined here as they are called in tlbie,
which we want to be fast (mpe's suggestion).

Empty functions are provided when CONFIG_CXL_BASE is not enabled.

Signed-off-by: Ian Munsie <imun...@au1.ibm.com>
Signed-off-by: Michael Neuling <mi...@neuling.org>
---
 include/misc/cxl.h | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)
 create mode 100644 include/misc/cxl.h

diff --git a/include/misc/cxl.h b/include/misc/cxl.h
new file mode 100644
index 0000000..975cc78
--- /dev/null
+++ b/include/misc/cxl.h
@@ -0,0 +1,48 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _MISC_CXL_H
+#define _MISC_CXL_H
+
+#ifdef CONFIG_CXL_BASE
+
+#define CXL_IRQ_RANGES 4
+
+struct cxl_irq_ranges {
+	irq_hw_number_t offset[CXL_IRQ_RANGES];
+	irq_hw_number_t range[CXL_IRQ_RANGES];
+};
+
+extern atomic_t cxl_use_count;
+
+static inline bool cxl_ctx_in_use(void)
+{
+	return (atomic_read(&cxl_use_count) != 0);
+}
+
+static inline void cxl_ctx_get(void)
+{
+	atomic_inc(&cxl_use_count);
+}
+
+static inline void cxl_ctx_put(void)
+{
+	atomic_dec(&cxl_use_count);
+}
+
+void cxl_slbia(struct mm_struct *mm);
+
+#else /* CONFIG_CXL_BASE */
+
+static inline bool cxl_ctx_in_use(void) { return false; }
+static inline void cxl_slbia(struct mm_struct *mm) {}
+
+#endif /* CONFIG_CXL_BASE */
+
+#endif
-- 
1.9.1
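As a rough illustration of the get/put/in_use scheme the header above describes, the following user-space sketch uses C11 atomics in place of the kernel's atomic_t. The `toy_` names are hypothetical and exist only for this example; the kernel versions are the cxl_ctx_* inlines in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Zero-initialised: no contexts attached at start-up. */
static atomic_int toy_use_count;

/* Cheap check meant for a hot path: is any context attached right now? */
static inline bool toy_ctx_in_use(void)
{
	return atomic_load(&toy_use_count) != 0;
}

/* Called when a context is attached. */
static inline void toy_ctx_get(void)
{
	atomic_fetch_add(&toy_use_count, 1);
}

/* Called when a context is detached. */
static inline void toy_ctx_put(void)
{
	atomic_fetch_sub(&toy_use_count, 1);
}
```

The point of keeping these as inlines around a single counter is that the reader side costs one atomic load, which matters when the check sits in something as frequently called as a TLB invalidation.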
[PATCH v4 08/16] powerpc/powerpc: Add new PCIe functions for allocating cxl interrupts
From: Ian Munsie <imun...@au1.ibm.com>

This adds a number of functions for allocating IRQs under powernv PCIe for cxl.

Signed-off-by: Ian Munsie <imun...@au1.ibm.com>
Signed-off-by: Michael Neuling <mi...@neuling.org>
---
 arch/powerpc/include/asm/pnv-pci.h        |  31 ++
 arch/powerpc/platforms/powernv/pci-ioda.c | 154 ++
 2 files changed, 185 insertions(+)
 create mode 100644 arch/powerpc/include/asm/pnv-pci.h

diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
new file mode 100644
index 0000000..f09a22f
--- /dev/null
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -0,0 +1,31 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _ASM_PNV_PCI_H
+#define _ASM_PNV_PCI_H
+
+#include <linux/pci.h>
+#include <misc/cxl.h>
+
+int pnv_phb_to_cxl(struct pci_dev *dev);
+int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
+			   unsigned int virq);
+int pnv_cxl_alloc_hwirqs(struct pci_dev *dev, int num);
+void pnv_cxl_release_hwirqs(struct pci_dev *dev, int hwirq, int num);
+int pnv_cxl_get_irq_count(struct pci_dev *dev);
+struct device_node *pnv_pci_to_phb_node(struct pci_dev *dev);
+
+#ifdef CONFIG_CXL_BASE
+int pnv_cxl_alloc_hwirq_ranges(struct cxl_irq_ranges *irqs,
+			       struct pci_dev *dev, int num);
+void pnv_cxl_release_hwirq_ranges(struct cxl_irq_ranges *irqs,
+				  struct pci_dev *dev);
+#endif
+
+#endif
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index baf3de6..2dfc857 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -37,6 +37,9 @@
 #include <asm/xics.h>
 #include <asm/debug.h>
 #include <asm/firmware.h>
+#include <asm/pnv-pci.h>
+
+#include <misc/cxl.h>
 
 #include "powernv.h"
 #include "pci.h"
@@ -1329,6 +1332,157 @@ static void set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
 	irq_set_chip(virq, &phb->ioda.irq_chip);
 }
 
+#ifdef CONFIG_CXL_BASE
+
+struct device_node *pnv_pci_to_phb_node(struct pci_dev *dev)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+
+	return hose->dn;
+}
+EXPORT_SYMBOL(pnv_pci_to_phb_node);
+
+int pnv_phb_to_cxl(struct pci_dev *dev)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	struct pnv_ioda_pe *pe;
+	int rc;
+
+	pe = pnv_ioda_get_pe(dev);
+	if (!pe)
+		return -ENODEV;
+
+	pe_info(pe, "Switching PHB to CXL\n");
+
+	rc = opal_pci_set_phb_cxl_mode(phb->opal_id, 1, pe->pe_number);
+	if (rc)
+		dev_err(&dev->dev, "opal_pci_set_phb_cxl_mode failed: %i\n", rc);
+
+	return rc;
+}
+EXPORT_SYMBOL(pnv_phb_to_cxl);
+
+/* Find PHB for cxl dev and allocate MSI hwirqs?
+ * Returns the absolute hardware IRQ number
+ */
+int pnv_cxl_alloc_hwirqs(struct pci_dev *dev, int num)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	int hwirq = msi_bitmap_alloc_hwirqs(&phb->msi_bmp, num);
+
+	if (hwirq < 0) {
+		dev_warn(&dev->dev, "Failed to find a free MSI\n");
+		return -ENOSPC;
+	}
+
+	return phb->msi_base + hwirq;
+}
+EXPORT_SYMBOL(pnv_cxl_alloc_hwirqs);
+
+void pnv_cxl_release_hwirqs(struct pci_dev *dev, int hwirq, int num)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+
+	msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq - phb->msi_base, num);
+}
+EXPORT_SYMBOL(pnv_cxl_release_hwirqs);
+
+void pnv_cxl_release_hwirq_ranges(struct cxl_irq_ranges *irqs,
+				  struct pci_dev *dev)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	int i, hwirq;
+
+	for (i = 1; i < CXL_IRQ_RANGES; i++) {
+		if (!irqs->range[i])
+			continue;
+		pr_devel("cxl release irq range 0x%x: offset: 0x%lx limit: %ld\n",
+			 i, irqs->offset[i],
+			 irqs->range[i]);
+		hwirq = irqs->offset[i] - phb->msi_base;
+		msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq,
+				       irqs->range[i]);
+	}
+}
+EXPORT_SYMBOL(pnv_cxl_release_hwirq_ranges);
+
+int pnv_cxl_alloc_hwirq_ranges(struct cxl_irq_ranges *irqs,
+			       struct pci_dev *dev, int num)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	int i, hwirq, try;
+
+	memset(irqs, 0, sizeof(struct
[PATCH v4 09/16] powerpc/mm: Add new hash_page_mm()
From: Ian Munsie <imun...@au1.ibm.com>

This adds a new function hash_page_mm() based on the existing hash_page().
This version allows any struct mm to be passed in, rather than assuming
current. This is useful for servicing co-processor faults which are not in the
context of the current running process.

We need to be careful here as the current hash_page() assumes current in a few
places.

Signed-off-by: Ian Munsie <imun...@au1.ibm.com>
Signed-off-by: Michael Neuling <mi...@neuling.org>
---
 arch/powerpc/include/asm/mmu-hash64.h |  1 +
 arch/powerpc/mm/hash_utils_64.c       | 24 +++++++++++++++++-------
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index aeabd02..764e141 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -324,6 +324,7 @@ extern int __hash_page_64K(unsigned long ea, unsigned long access,
 			   unsigned int local, int ssize);
 struct mm_struct;
 unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap);
+extern int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap);
 extern int hash_page(unsigned long ea, unsigned long access, unsigned long trap);
 int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
 		     pte_t *ptep, unsigned long trap, int local, int ssize,
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index bbdb054..698834d 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -904,7 +904,7 @@ void demote_segment_4k(struct mm_struct *mm, unsigned long addr)
 		return;
 	slice_set_range_psize(mm, addr, 1, MMU_PAGE_4K);
 	copro_flush_all_slbs(mm);
-	if (get_paca_psize(addr) != MMU_PAGE_4K) {
+	if ((get_paca_psize(addr) != MMU_PAGE_4K) && (current->mm == mm)) {
 		get_paca()->context = mm->context;
 		slb_flush_and_rebolt();
 	}
@@ -989,12 +989,11 @@ static void check_paca_psize(unsigned long ea, struct mm_struct *mm,
  *	-1 - critical hash insertion error
  *	-2 - access not permitted by subpage protection mechanism
  */
-int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
+int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap)
 {
 	enum ctx_state prev_state = exception_enter();
 	pgd_t *pgdir;
 	unsigned long vsid;
-	struct mm_struct *mm;
 	pte_t *ptep;
 	unsigned hugeshift;
 	const struct cpumask *tmp;
@@ -1008,7 +1007,6 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
 	switch (REGION_ID(ea)) {
 	case USER_REGION_ID:
 		user_region = 1;
-		mm = current->mm;
 		if (! mm) {
 			DBG_LOW(" user region with no mm !\n");
 			rc = 1;
@@ -1019,7 +1017,6 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
 		vsid = get_vsid(mm->context.id, ea, ssize);
 		break;
 	case VMALLOC_REGION_ID:
-		mm = &init_mm;
 		vsid = get_kernel_vsid(ea, mmu_kernel_ssize);
 		if (ea < VMALLOC_END)
 			psize = mmu_vmalloc_psize;
@@ -1104,7 +1101,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
 			WARN_ON(1);
 		}
 #endif
-		check_paca_psize(ea, mm, psize, user_region);
+		if (current->mm == mm)
+			check_paca_psize(ea, mm, psize, user_region);
 
 		goto bail;
 	}
@@ -1145,7 +1143,8 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
 		}
 	}
 
-	check_paca_psize(ea, mm, psize, user_region);
+	if (current->mm == mm)
+		check_paca_psize(ea, mm, psize, user_region);
 #endif /* CONFIG_PPC_64K_PAGES */
 
 #ifdef CONFIG_PPC_HAS_HASH_64K
@@ -1180,6 +1179,17 @@ bail:
 	exception_exit(prev_state);
 	return rc;
 }
+EXPORT_SYMBOL_GPL(hash_page_mm);
+
+int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
+{
+	struct mm_struct *mm = current->mm;
+
+	if (REGION_ID(ea) == VMALLOC_REGION_ID)
+		mm = &init_mm;
+
+	return hash_page_mm(mm, ea, access, trap);
+}
 EXPORT_SYMBOL_GPL(hash_page);
 
 void hash_preload(struct mm_struct *mm, unsigned long ea,
-- 
1.9.1
[PATCH v4 10/16] powerpc/opal: Add PHB to cxl mode call
From: Ian Munsie <imun...@au1.ibm.com>

This adds the OPAL call to change a PHB into cxl mode.

Signed-off-by: Ian Munsie <imun...@au1.ibm.com>
Signed-off-by: Michael Neuling <mi...@neuling.org>
---
 arch/powerpc/include/asm/opal.h                | 2 ++
 arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
 2 files changed, 3 insertions(+)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 86055e5..84c37c4dbc 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -146,6 +146,7 @@ struct opal_sg_list {
 #define OPAL_GET_PARAM				89
 #define OPAL_SET_PARAM				90
 #define OPAL_DUMP_RESEND			91
+#define OPAL_PCI_SET_PHB_CXL_MODE		93
 #define OPAL_DUMP_INFO2				94
 #define OPAL_PCI_EEH_FREEZE_SET			97
 #define OPAL_HANDLE_HMI				98
@@ -924,6 +925,7 @@ int64_t opal_sensor_read(uint32_t sensor_hndl, int token,
 		__be32 *sensor_data);
 int64_t opal_handle_hmi(void);
 int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end);
 int64_t opal_unregister_dump_region(uint32_t id);
+int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t pe_number);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 2e6ce1b..0fb56dc 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -247,3 +247,4 @@ OPAL_CALL(opal_set_param,			OPAL_SET_PARAM);
 OPAL_CALL(opal_handle_hmi,			OPAL_HANDLE_HMI);
 OPAL_CALL(opal_register_dump_region,		OPAL_REGISTER_DUMP_REGION);
 OPAL_CALL(opal_unregister_dump_region,		OPAL_UNREGISTER_DUMP_REGION);
+OPAL_CALL(opal_pci_set_phb_cxl_mode,		OPAL_PCI_SET_PHB_CXL_MODE);
-- 
1.9.1
[PATCH v4 11/16] powerpc/mm: Add hooks for cxl
From: Ian Munsie <imun...@au1.ibm.com>

This adds hooks into the core powerpc mm code for cxl.

The core powerpc code sometimes uses local tlbie. Unfortunately this won't work
with the current cxl driver as it relies on snooping tlbie broadcasts. The cxl
hardware can have TLB entries invalidated via MMIO but this is not currently
supported by the driver. In future we can make local tlbie smarter so that it
invalidates cxl contexts via MMIO when it needs to, but for now we have this
workaround.

This workaround checks for any active cxl contexts and, if there are any,
disables local tlbie.

This also adds a hook for when SLBs are invalidated. This ensures any
corresponding SLBs in cxl are also invalidated at the same time. This is
required for segment demotion.

Signed-off-by: Ian Munsie <imun...@au1.ibm.com>
Signed-off-by: Michael Neuling <mi...@neuling.org>
---
 arch/powerpc/mm/copro_fault.c    | 2 ++
 arch/powerpc/mm/hash_native_64.c | 6 +++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
index f2aa5a8..0f9939e 100644
--- a/arch/powerpc/mm/copro_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -26,6 +26,7 @@
 #include <asm/reg.h>
 #include <asm/copro.h>
 #include <asm/spu.h>
+#include <misc/cxl.h>
 
 /*
  * This ought to be kept in sync with the powerpc specific do_page_fault
@@ -143,5 +144,6 @@ void copro_flush_all_slbs(struct mm_struct *mm)
 #ifdef CONFIG_SPU_BASE
 	spu_flush_all_slbs(mm);
 #endif
+	cxl_slbia(mm);
 }
 EXPORT_SYMBOL_GPL(copro_flush_all_slbs);
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index afc0a82..ae4962a 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -29,6 +29,8 @@
 #include <asm/kexec.h>
 #include <asm/ppc-opcode.h>
 
+#include <misc/cxl.h>
+
 #ifdef DEBUG_LOW
 #define DBG_LOW(fmt...) udbg_printf(fmt)
 #else
@@ -149,9 +151,11 @@ static inline void __tlbiel(unsigned long vpn, int psize, int apsize, int ssize)
 static inline void tlbie(unsigned long vpn, int psize, int apsize,
 			 int ssize, int local)
 {
-	unsigned int use_local = local && mmu_has_feature(MMU_FTR_TLBIEL);
+	unsigned int use_local;
 	int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
 
+	use_local = local && mmu_has_feature(MMU_FTR_TLBIEL) && !cxl_ctx_in_use();
+
 	if (use_local)
 		use_local = mmu_psize_defs[psize].tlbiel;
 	if (lock_tlbie && !use_local)
-- 
1.9.1
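The decision this patch changes in tlbie() can be written out as a small pure function. The sketch below is only an illustration: `toy_use_local_tlbie()` and its boolean parameters are hypothetical stand-ins for `local`, `mmu_has_feature(MMU_FTR_TLBIEL)`, `cxl_ctx_in_use()` and `mmu_psize_defs[psize].tlbiel`.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether a local (non-broadcast) TLB invalidation may be used.
 * Any attached accelerator context forces the broadcast form, because the
 * accelerator only learns about invalidations by snooping broadcasts.
 */
static bool toy_use_local_tlbie(bool local, bool cpu_has_tlbiel,
				bool ctx_in_use, bool psize_supports_tlbiel)
{
	bool use_local = local && cpu_has_tlbiel && !ctx_in_use;

	/* Even when allowed, the page size must support the local form. */
	if (use_local)
		use_local = psize_supports_tlbiel;

	return use_local;
}
```

Note that the check is global rather than per-mm: one attached context anywhere disables the local form everywhere, which is the trade-off the changelog calls a workaround.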
[PATCH v4 12/16] cxl: Add base builtin support
From: Ian Munsie <imun...@au1.ibm.com>

This adds the base cxl support that cannot be built as a module. Specifically
it adds the cxl callbacks that are called from the core powerpc mm code, which
must always exist irrespective of whether the cxl module is loaded or not. This
is similar to how cell works with CONFIG_SPU_BASE.

This adds a cxl_slbia() call (similar to spu_flush_all_slbs()) which checks if
the cxl module is loaded and in use, returning immediately if it is not. If it
is in use it calls into the cxl SLB invalidation code.

Signed-off-by: Ian Munsie <imun...@au1.ibm.com>
Signed-off-by: Michael Neuling <mi...@neuling.org>
---
 drivers/misc/Kconfig      |  1 +
 drivers/misc/Makefile     |  1 +
 drivers/misc/cxl/Kconfig  |  8 ++++++++
 drivers/misc/cxl/Makefile |  1 +
 drivers/misc/cxl/base.c   | 86 +++
 5 files changed, 97 insertions(+)
 create mode 100644 drivers/misc/cxl/Kconfig
 create mode 100644 drivers/misc/cxl/Makefile
 create mode 100644 drivers/misc/cxl/base.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index b841180..bbeb451 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -527,4 +527,5 @@ source "drivers/misc/vmw_vmci/Kconfig"
 source "drivers/misc/mic/Kconfig"
 source "drivers/misc/genwqe/Kconfig"
 source "drivers/misc/echo/Kconfig"
+source "drivers/misc/cxl/Kconfig"
 endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 5497d02..7d5c4cd 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -55,3 +55,4 @@ obj-y				+= mic/
 obj-$(CONFIG_GENWQE)		+= genwqe/
 obj-$(CONFIG_ECHO)		+= echo/
 obj-$(CONFIG_VEXPRESS_SYSCFG)	+= vexpress-syscfg.o
+obj-$(CONFIG_CXL_BASE)		+= cxl/
diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
new file mode 100644
index 0000000..5cdd319
--- /dev/null
+++ b/drivers/misc/cxl/Kconfig
@@ -0,0 +1,8 @@
+#
+# IBM Coherent Accelerator (CXL) compatible devices
+#
+
+config CXL_BASE
+	bool
+	default n
+	select PPC_COPRO_BASE
diff --git a/drivers/misc/cxl/Makefile b/drivers/misc/cxl/Makefile
new file mode 100644
index 0000000..e30ad0a
--- /dev/null
+++ b/drivers/misc/cxl/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_CXL_BASE)		+= base.o
diff --git a/drivers/misc/cxl/base.c b/drivers/misc/cxl/base.c
new file mode 100644
index 0000000..0654ad8
--- /dev/null
+++ b/drivers/misc/cxl/base.c
@@ -0,0 +1,86 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/rcupdate.h>
+#include <asm/errno.h>
+#include <misc/cxl.h>
+#include "cxl.h"
+
+/* protected by rcu */
+static struct cxl_calls *cxl_calls;
+
+atomic_t cxl_use_count = ATOMIC_INIT(0);
+EXPORT_SYMBOL(cxl_use_count);
+
+#ifdef CONFIG_CXL_MODULE
+
+static inline struct cxl_calls *cxl_calls_get(void)
+{
+	struct cxl_calls *calls = NULL;
+
+	rcu_read_lock();
+	calls = rcu_dereference(cxl_calls);
+	if (calls && !try_module_get(calls->owner))
+		calls = NULL;
+	rcu_read_unlock();
+
+	return calls;
+}
+
+static inline void cxl_calls_put(struct cxl_calls *calls)
+{
+	BUG_ON(calls != cxl_calls);
+
+	/* we don't need to rcu this, as we hold a reference to the module */
+	module_put(cxl_calls->owner);
+}
+
+#else /* !defined CONFIG_CXL_MODULE */
+
+static inline struct cxl_calls *cxl_calls_get(void)
+{
+	return cxl_calls;
+}
+
+static inline void cxl_calls_put(struct cxl_calls *calls) { }
+
+#endif /* CONFIG_CXL_MODULE */
+
+void cxl_slbia(struct mm_struct *mm)
+{
+	struct cxl_calls *calls;
+
+	calls = cxl_calls_get();
+	if (!calls)
+		return;
+
+	if (cxl_ctx_in_use())
+		calls->cxl_slbia(mm);
+
+	cxl_calls_put(calls);
+}
+
+int register_cxl_calls(struct cxl_calls *calls)
+{
+	if (cxl_calls)
+		return -EBUSY;
+
+	rcu_assign_pointer(cxl_calls, calls);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(register_cxl_calls);
+
+void unregister_cxl_calls(struct cxl_calls *calls)
+{
+	BUG_ON(cxl_calls->owner != calls->owner);
+	RCU_INIT_POINTER(cxl_calls, NULL);
+	synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(unregister_cxl_calls);
-- 
1.9.1
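The built-in/module split in base.c follows a common pattern: the always-present code keeps one pointer to a table of callbacks, and the optional part registers itself there. A stripped-down, single-threaded sketch of that pattern follows. It deliberately leaves out the RCU protection and module reference counting that the real code relies on, and every `toy_` name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Callback table supplied by the optional provider. */
struct toy_calls {
	void (*slbia)(int *counter);
};

/* NULL while no provider is registered. */
static struct toy_calls *toy_registered;

/* Only one provider may be registered at a time. */
static int toy_register(struct toy_calls *calls)
{
	if (toy_registered)
		return -EBUSY;
	toy_registered = calls;
	return 0;
}

static void toy_unregister(struct toy_calls *calls)
{
	if (toy_registered == calls)
		toy_registered = NULL;
}

/* Built-in hook: returns immediately unless a provider is registered. */
static void toy_slbia(int *counter)
{
	struct toy_calls *calls = toy_registered;

	if (!calls)
		return;
	calls->slbia(counter);
}

/* Example provider callback: just counts how often it was invoked. */
static void toy_count_call(int *counter)
{
	(*counter)++;
}
```

In the kernel version the pointer read is wrapped in rcu_read_lock()/rcu_dereference() and a module reference is taken before the callback runs, so that the module cannot be unloaded while the hook is executing; none of that is modelled here.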
[PATCH v4 14/16] cxl: Add userspace header file
From: Ian Munsie <imun...@au1.ibm.com>

This adds a header file for use by userspace programs wanting to interact with
the kernel cxl driver. It defines structs and magic numbers required for
userspace to interact with devices in /dev/cxl/afuM.N.

Further documentation on this interface is added in a subsequent patch in
Documentation/powerpc/cxl.txt.

It also adds this new userspace header file to Kbuild so it's exported when
doing "make headers_install".

Signed-off-by: Ian Munsie <imun...@au1.ibm.com>
Signed-off-by: Michael Neuling <mi...@neuling.org>
---
 include/uapi/Kbuild      |  1 +
 include/uapi/misc/Kbuild |  2 ++
 include/uapi/misc/cxl.h  | 87
 3 files changed, 90 insertions(+)
 create mode 100644 include/uapi/misc/Kbuild
 create mode 100644 include/uapi/misc/cxl.h

diff --git a/include/uapi/Kbuild b/include/uapi/Kbuild
index 81d2106..245aa6e 100644
--- a/include/uapi/Kbuild
+++ b/include/uapi/Kbuild
@@ -12,3 +12,4 @@ header-y += video/
 header-y += drm/
 header-y += xen/
 header-y += scsi/
+header-y += misc/
diff --git a/include/uapi/misc/Kbuild b/include/uapi/misc/Kbuild
new file mode 100644
index 0000000..e96cae7
--- /dev/null
+++ b/include/uapi/misc/Kbuild
@@ -0,0 +1,2 @@
+# misc Header export list
+header-y += cxl.h
diff --git a/include/uapi/misc/cxl.h b/include/uapi/misc/cxl.h
new file mode 100644
index 0000000..c232be6
--- /dev/null
+++ b/include/uapi/misc/cxl.h
@@ -0,0 +1,87 @@
+/*
+ * Copyright 2014 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _UAPI_MISC_CXL_H
+#define _UAPI_MISC_CXL_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+/* Structs for IOCTLS for userspace to talk to the kernel */
+struct cxl_ioctl_start_work {
+	__u64 flags;
+	__u64 work_element_descriptor;
+	__u64 amr;
+	__s16 num_interrupts;
+	__s16 reserved1;
+	__s32 reserved2;
+	__u64 reserved3;
+	__u64 reserved4;
+	__u64 reserved5;
+	__u64 reserved6;
+};
+#define CXL_START_WORK_AMR		0x0001ULL
+#define CXL_START_WORK_NUM_IRQS		0x0002ULL
+#define CXL_START_WORK_ALL		(CXL_START_WORK_AMR |\
+					 CXL_START_WORK_NUM_IRQS)
+
+/* IOCTL numbers */
+#define CXL_MAGIC 0xCA
+#define CXL_IOCTL_START_WORK		_IOW(CXL_MAGIC, 0x00, struct cxl_ioctl_start_work)
+#define CXL_IOCTL_GET_PROCESS_ELEMENT	_IOR(CXL_MAGIC, 0x01, __u32)
+
+/* Events from read() */
+#define CXL_READ_MIN_SIZE 0x1000 /* 4K */
+
+enum cxl_event_type {
+	CXL_EVENT_RESERVED      = 0,
+	CXL_EVENT_AFU_INTERRUPT = 1,
+	CXL_EVENT_DATA_STORAGE  = 2,
+	CXL_EVENT_AFU_ERROR     = 3,
+};
+
+struct cxl_event_header {
+	__u16 type;
+	__u16 size;
+	__u16 process_element;
+	__u16 reserved1;
+};
+
+struct cxl_event_afu_interrupt {
+	__u16 flags;
+	__u16 irq; /* Raised AFU interrupt number */
+	__u32 reserved1;
+};
+
+struct cxl_event_data_storage {
+	__u16 flags;
+	__u16 reserved1;
+	__u32 reserved2;
+	__u64 addr;
+	__u64 dsisr;
+	__u64 reserved3;
+};
+
+struct cxl_event_afu_error {
+	__u16 flags;
+	__u16 reserved1;
+	__u32 reserved2;
+	__u64 error;
+};
+
+struct cxl_event {
+	struct cxl_event_header header;
+	union {
+		struct cxl_event_afu_interrupt irq;
+		struct cxl_event_data_storage fault;
+		struct cxl_event_afu_error afu_error;
+	};
+};
+
+#endif /* _UAPI_MISC_CXL_H */
-- 
1.9.1
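To give a feel for how userspace might consume the event structs defined above, here is a hedged sketch that walks a buffer of events as it could come back from read(). It assumes that `header.size` is the total size of one event including the header, so that the next event starts `size` bytes after the current one; that assumption should be checked against Documentation/powerpc/cxl.txt. The struct is a local mirror of struct cxl_event_header using <stdint.h> types, and `toy_count_events()` is a hypothetical helper, not part of the interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct cxl_event_header from the patch. */
struct toy_event_header {
	uint16_t type;
	uint16_t size;		/* assumed: whole event, header included */
	uint16_t process_element;
	uint16_t reserved1;
};

/*
 * Count the events of a given type in buf[0..len).
 * Returns -1 if the buffer ends mid-event or an event claims an
 * impossible size, so a short or corrupt read is not walked blindly.
 */
static int toy_count_events(const unsigned char *buf, size_t len, uint16_t type)
{
	size_t off = 0;
	int count = 0;

	while (off < len) {
		struct toy_event_header hdr;

		if (len - off < sizeof(hdr))
			return -1;
		/* memcpy avoids unaligned access into the byte buffer */
		memcpy(&hdr, buf + off, sizeof(hdr));

		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;
		if (hdr.type == type)
			count++;

		off += hdr.size;
	}
	return count;
}
```

A real consumer would switch on `type` and interpret the bytes after the header as the matching member of the union in struct cxl_event; the size checks are the part worth keeping in any variant.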
[PATCH v4 15/16] cxl: Add driver to Kbuild and Makefiles
From: Ian Munsie <imun...@au1.ibm.com>

Signed-off-by: Ian Munsie <imun...@au1.ibm.com>
Signed-off-by: Michael Neuling <mi...@neuling.org>
---
 drivers/misc/cxl/Kconfig  | 17 +++++++++++++++++
 drivers/misc/cxl/Makefile |  2 ++
 2 files changed, 19 insertions(+)

diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
index 5cdd319..a990b39 100644
--- a/drivers/misc/cxl/Kconfig
+++ b/drivers/misc/cxl/Kconfig
@@ -6,3 +6,20 @@ config CXL_BASE
 	bool
 	default n
 	select PPC_COPRO_BASE
+
+config CXL
+	tristate "Support for IBM Coherent Accelerators (CXL)"
+	depends on PPC_POWERNV && PCI_MSI
+	select CXL_BASE
+	default m
+	help
+	  Select this option to enable driver support for IBM Coherent
+	  Accelerators (CXL). CXL is otherwise known as Coherent Accelerator
+	  Processor Interface (CAPI). CAPI allows accelerators in FPGAs to be
+	  coherently attached to a CPU via an MMU. This driver enables
+	  userspace programs to access these accelerators via /dev/cxl/afuM.N
+	  devices.
+
+	  CAPI adapters are found in POWER8 based systems.
+
+	  If unsure, say N.
diff --git a/drivers/misc/cxl/Makefile b/drivers/misc/cxl/Makefile
index e30ad0a..165e98f 100644
--- a/drivers/misc/cxl/Makefile
+++ b/drivers/misc/cxl/Makefile
@@ -1 +1,3 @@
+cxl-y				+= main.o file.o irq.o fault.o native.o context.o sysfs.o debugfs.o pci.o
+obj-$(CONFIG_CXL)		+= cxl.o
 obj-$(CONFIG_CXL_BASE)		+= base.o
-- 
1.9.1
[PATCH v4 16/16] cxl: Add documentation for userspace APIs
From: Ian Munsie <imun...@au1.ibm.com>

This documentation gives an overview of the hardware architecture, userspace
APIs via /dev/cxl/afuM.N and the sysfs files. It also adds a MAINTAINERS file
entry for cxl.

Signed-off-by: Ian Munsie <imun...@au1.ibm.com>
Signed-off-by: Michael Neuling <mi...@neuling.org>
---
 Documentation/ABI/testing/sysfs-class-cxl | 130 ++
 Documentation/ioctl/ioctl-number.txt      |   1 +
 Documentation/powerpc/00-INDEX            |   2 +
 Documentation/powerpc/cxl.txt             | 379 ++
 MAINTAINERS                               |  12 +
 include/uapi/misc/cxl.h                   |   7 +-
 6 files changed, 528 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-class-cxl
 create mode 100644 Documentation/powerpc/cxl.txt

diff --git a/Documentation/ABI/testing/sysfs-class-cxl b/Documentation/ABI/testing/sysfs-class-cxl
new file mode 100644
index 0000000..0a5508c
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-cxl
@@ -0,0 +1,130 @@
+Slave contexts (eg. /sys/class/cxl/afu0.0s):
+
+What:           /sys/class/cxl/<afu>/irqs_max
+Date:           September 2014
+Contact:        linuxppc-dev@lists.ozlabs.org
+Description:    read/write
+                Decimal value of maximum number of interrupts that can be
+                requested by userspace. The default on probe is the maximum
+                that hardware can support (eg. 2037). Write values will limit
+                userspace applications to that many userspace interrupts. Must
+                be >= irqs_min.
+
+What:           /sys/class/cxl/<afu>/irqs_min
+Date:           September 2014
+Contact:        linuxppc-dev@lists.ozlabs.org
+Description:    read only
+                Decimal value of the minimum number of interrupts that
+                userspace must request on a CXL_START_WORK ioctl. Userspace may
+                omit the num_interrupts field in the START_WORK IOCTL to get
+                this minimum automatically.
+
+What:           /sys/class/cxl/<afu>/mmio_size
+Date:           September 2014
+Contact:        linuxppc-dev@lists.ozlabs.org
+Description:    read only
+                Decimal value of the size of the MMIO space that may be mmaped
+                by userspace.
+
+What:           /sys/class/cxl/<afu>/modes_supported
+Date:           September 2014
+Contact:        linuxppc-dev@lists.ozlabs.org
+Description:    read only
+                List of the modes this AFU supports. One per line.
+                Valid entries are: "dedicated_process" and "afu_directed"
+
+What:           /sys/class/cxl/<afu>/mode
+Date:           September 2014
+Contact:        linuxppc-dev@lists.ozlabs.org
+Description:    read/write
+                The current mode the AFU is using. Will be one of the modes
+                given in modes_supported. Writing will change the mode
+                provided that no user contexts are attached.
+
+
+What:           /sys/class/cxl/<afu>/prefault_mode
+Date:           September 2014
+Contact:        linuxppc-dev@lists.ozlabs.org
+Description:    read/write
+                Set the mode for prefaulting in segments into the segment table
+                when performing the START_WORK ioctl. Possible values:
+                        none: No prefaulting (default)
+                        work_element_descriptor: Treat the work element
+                                 descriptor as an effective address and
+                                 prefault what it points to.
+                        all: all segments process calling START_WORK maps.
+
+What:           /sys/class/cxl/<afu>/reset
+Date:           September 2014
+Contact:        linuxppc-dev@lists.ozlabs.org
+Description:    write only
+                Writing 1 here will reset the AFU provided there are no
+                contexts active on the AFU.
+
+What:           /sys/class/cxl/<afu>/api_version
+Date:           September 2014
+Contact:        linuxppc-dev@lists.ozlabs.org
+Description:    read only
+                Decimal value of the current version of the kernel/user API.
+
+What:           /sys/class/cxl/<afu>/api_version_com
+Date:           September 2014
+Contact:        linuxppc-dev@lists.ozlabs.org
+Description:    read only
+                Decimal value of the lowest version of the userspace API
+                this kernel supports.
+
+
+
+Master contexts (eg. /sys/class/cxl/afu0.0m)
+
+What:           /sys/class/cxl/<afu>m/mmio_size
+Date:           September 2014
+Contact:        linuxppc-dev@lists.ozlabs.org
+Description:    read only
+                Decimal value of the size of the MMIO space that may be mmaped
+                by userspace. This includes all slave contexts space also.
+
+What:           /sys/class/cxl/<afu>m/pp_mmio_len
+Date:           September 2014
+Contact:        linuxppc-dev@lists.ozlabs.org
+Description:    read only
+                Decimal value of the Per Process MMIO space length.
+
+What:
Re: [PATCH] tools/perf/powerpc: Fix build break
"Aneesh Kumar K.V" aneesh.ku...@linux.vnet.ibm.com writes:

>   CC       arch/powerpc/util/skip-callchain-idx.o
> arch/powerpc/util/skip-callchain-idx.c: In function ‘check_return_reg’:
> arch/powerpc/util/skip-callchain-idx.c:55:3: error: implicit declaration of function ‘pr_debug’ [-Werror=implicit-function-declaration]
>    pr_debug("dwarf_frame_register() %s\n", dwarf_errmsg(-1));
>
> Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
> ---
>  tools/perf/arch/powerpc/util/skip-callchain-idx.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/tools/perf/arch/powerpc/util/skip-callchain-idx.c b/tools/perf/arch/powerpc/util/skip-callchain-idx.c
> index a7c23a4b3778..d73ef8bb08c7 100644
> --- a/tools/perf/arch/powerpc/util/skip-callchain-idx.c
> +++ b/tools/perf/arch/powerpc/util/skip-callchain-idx.c
> @@ -15,6 +15,7 @@
>
>  #include "util/thread.h"
>  #include "util/callchain.h"
> +#include "util/debug.h"
>
>  /*
>   * When saving the callchain on Power, the kernel conservatively saves

We still have this broken upstream.

-aneesh
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: powerpc: mitigate impact of decrementer reset
On 10/08/2014 08:22 AM, Michael Ellerman wrote:
> On Tue, 2014-07-10 at 19:13:24 UTC, Paul Clarke wrote:
>> The POWER ISA defines an always-running decrementer which can be used to
>> schedule interrupts after a certain time interval has elapsed.
>> The decrementer counts down at the same frequency as the Time Base,
>> which is 512 MHz. The maximum value of the decrementer is 0x7fffffff.
>> This works out to a maximum interval of about 4.19 seconds.
>>
>> If a larger interval is desired, the kernel will set the decrementer
>> to its maximum value and reset it after it expires (underflows)
>> a sufficient number of times until the desired interval has elapsed.
>>
>> The negative effect of this is that an unwanted latency spike will
>> impact normal processing at most every 4.19 seconds. On an IBM
>> POWER8-based system, this spike was measured at about 25-30
>> microseconds, much of which was basic, opportunistic housekeeping
>> tasks that could otherwise have waited.
>>
>> This patch short-circuits the reset of the decrementer, exiting after
>> the decrementer reset, but before the housekeeping tasks if the only
>> need for the interrupt is simply to reset it. After this patch,
>> the latency spike was measured at about 150 nanoseconds.
>
> Hi Paul,
>
> Thanks for the excellent changelog. But this patch makes me a bit nervous :)
>
> Do you know where the latency is coming from? Is it primarily the irq work?
>
> If so I'd prefer if we could move the short circuit into __timer_interrupt()
> itself. That way we'd still have the trace points usable, and it would
> hopefully result in less duplicated logic.

I agree, this is perhaps the better approach.

Regards
Preeti U Murthy

> cheers
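The 4.19 second figure in the quoted changelog can be checked with a few lines of C. This is only a sketch of the arithmetic, assuming the 512 MHz Time Base frequency stated in the changelog and a decrementer maximum of 0x7fffffff (the largest positive 32-bit value, which is the value that yields 4.19 seconds). The names TB_HZ, DEC_MAX, dec_max_interval_us and dec_reloads_needed are made up for illustration and are not kernel symbols.

```c
#include <stdint.h>

/* Time Base frequency quoted in the changelog (512 MHz). */
#define TB_HZ   512000000ULL
/* Assumed largest positive decrementer value (32-bit signed). */
#define DEC_MAX 0x7fffffffULL

/* Longest interval a single decrementer load can cover, in microseconds. */
static inline uint64_t dec_max_interval_us(void)
{
	return DEC_MAX * 1000000ULL / TB_HZ;
}

/* Number of decrementer loads needed to cover `ticks` Time Base ticks. */
static inline uint64_t dec_reloads_needed(uint64_t ticks)
{
	return (ticks + DEC_MAX - 1) / DEC_MAX;
}
```

With these assumptions a 10 second timer (5,120,000,000 ticks) needs three decrementer loads, so two of the resulting interrupts exist only to re-arm the decrementer. Those are the interrupts the patch tries to make cheap.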
Re: [PATCH v4 13/16] cxl: Driver code for powernv PCIe based cards for userspace access
Excerpts from Michael Neuling's message of 2014-10-08 19:55:02 +1100:
> +static ssize_t afu_read(struct file *file, char __user *buf, size_t count,
> +			loff_t *off)
...
> +	for (;;) {
> +		prepare_to_wait(&ctx->wq, &wait, TASK_INTERRUPTIBLE);
> +		if (ctx_event_pending(ctx))
> +			break;
> +
> +		spin_unlock_irqrestore(&ctx->lock, flags);
> +		if (file->f_flags & O_NONBLOCK)
> +			return -EAGAIN;
> +
> +		if (signal_pending(current))
> +			return -ERESTARTSYS;

Looks like I mucked this up while refactoring - these two cases no
longer call finish_wait() which can lead to a crash if something later
wakes up the ctx->wq...

I'll post a fix in a separate patch shortly.

-Ian
[PATCH] CXL: Fix afu_read() not doing finish_wait() on signal or non-blocking
If afu_read() returned due to a signal or the AFU file descriptor being
opened non-blocking it would not call finish_wait() before returning,
which could lead to a crash later when something else wakes up the wait
queue.

This patch restructures the wait logic to ensure that the cleanup is
done correctly.

Signed-off-by: Ian Munsie imun...@au1.ibm.com
---
 drivers/misc/cxl/file.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
index 847b7e6..378b099 100644
--- a/drivers/misc/cxl/file.c
+++ b/drivers/misc/cxl/file.c
@@ -273,6 +273,7 @@ static ssize_t afu_read(struct file *file, char __user *buf, size_t count,
 	struct cxl_context *ctx = file->private_data;
 	struct cxl_event event;
 	unsigned long flags;
+	int rc;
 	DEFINE_WAIT(wait);
 
 	if (count < CXL_READ_MIN_SIZE)
@@ -285,13 +286,17 @@ static ssize_t afu_read(struct file *file, char __user *buf, size_t count,
 		if (ctx_event_pending(ctx))
 			break;
 
-		spin_unlock_irqrestore(&ctx->lock, flags);
-		if (file->f_flags & O_NONBLOCK)
-			return -EAGAIN;
+		if (file->f_flags & O_NONBLOCK) {
+			rc = -EAGAIN;
+			goto out;
+		}
 
-		if (signal_pending(current))
-			return -ERESTARTSYS;
+		if (signal_pending(current)) {
+			rc = -ERESTARTSYS;
+			goto out;
+		}
 
+		spin_unlock_irqrestore(&ctx->lock, flags);
 		pr_devel("afu_read going to sleep...\n");
 		schedule();
 		pr_devel("afu_read woken up\n");
@@ -336,6 +341,11 @@ static ssize_t afu_read(struct file *file, char __user *buf, size_t count,
 	if (copy_to_user(buf, &event, event.header.size))
 		return -EFAULT;
 	return event.header.size;
+
+out:
+	finish_wait(&ctx->wq, &wait);
+	spin_unlock_irqrestore(&ctx->lock, flags);
+	return rc;
 }
 
 static const struct file_operations afu_fops = {
-- 
2.1.0
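The shape of the fix above (every early exit funnels through one label that undoes prepare_to_wait()) can be tried outside the kernel. The following is a userspace sketch only, not the cxl driver code: struct fake_ctx, struct fake_wait and the fake_* helpers are hypothetical stand-ins, -EINTR stands in for -ERESTARTSYS (which userspace headers do not define), and the "sleep" is simulated by a countdown. Unlike the real patch, the success path here also goes through the out label.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a wait queue entry. */
struct fake_wait {
	bool queued;                    /* must be false again on every return */
};

/* Hypothetical stand-in for the per-context driver state. */
struct fake_ctx {
	bool event_pending;
	bool nonblock;                  /* models O_NONBLOCK on the file */
	bool signal_pending;            /* models signal_pending(current) */
	unsigned wakeups_until_event;   /* simulated sleep length */
	struct fake_wait w;
};

static inline void fake_prepare_to_wait(struct fake_wait *w) { w->queued = true; }
static inline void fake_finish_wait(struct fake_wait *w)     { w->queued = false; }

/* Returns 0 once an event was consumed, or a negative errno value. */
static inline int fake_read(struct fake_ctx *ctx)
{
	int rc = 0;

	for (;;) {
		fake_prepare_to_wait(&ctx->w);
		if (ctx->event_pending)
			break;
		if (ctx->nonblock) {
			rc = -EAGAIN;
			goto out;
		}
		if (ctx->signal_pending) {
			rc = -EINTR;
			goto out;
		}
		/* Stand-in for schedule(): the event shows up after a few wakeups. */
		if (ctx->wakeups_until_event == 0 || --ctx->wakeups_until_event == 0)
			ctx->event_pending = true;
	}
	ctx->event_pending = false;     /* consume the event */
out:
	/* Single cleanup point shared by all exit paths. */
	fake_finish_wait(&ctx->w);
	return rc;
}
```

The point of the single label is that adding a new early exit later cannot forget the cleanup, which is the class of bug the patch fixes.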
Re: [PATCH 08/44] kernel: Move pm_power_off to common code
Guenter Roeck li...@roeck-us.net wrote:
> pm_power_off is defined for all architectures. Move it to common code.
>
> Have all architectures call do_kernel_poweroff instead of pm_power_off.
> Some architectures point pm_power_off to machine_power_off. For those,
> call do_kernel_poweroff from machine_power_off instead.

For UniCore32 part,
Acked-by: Xuetao Guan g...@mprc.pku.edu.cn

Thanks
Xuetao
Re: [PATCH] powerpc: mitigate impact of decrementer reset
On 10/08/2014 12:37 AM, Heinz Wrobel wrote:
> what if your tb wraps during the test?

Per the Power ISA, Time Base is 64 bits, monotonically increasing, and
is writable only in hypervisor state. To my understanding, it is set to
zero at boot (although this is not prescribed).

Also, as noted by others, the logic is roughly duplicated (with some
differences) from the analogous code in __timer_interrupt just above it.

I don't see wrapping as a concern.

PC
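For a sense of scale, here is a small sketch of how long a 64-bit Time Base takes to wrap. It assumes the 512 MHz frequency from the patch changelog and a counter that starts at zero, which the reply above notes is typical but not prescribed. The function names are illustrative only.

```c
#include <stdint.h>

/* Time Base frequency taken from the patch changelog (512 MHz). */
#define TB_HZ 512000000ULL

/* Whole 365-day years before a 64-bit counter starting at zero wraps. */
static inline uint64_t tb_years_until_wrap(void)
{
	const uint64_t secs_per_year = 365ULL * 24 * 60 * 60;

	return (UINT64_MAX / TB_HZ) / secs_per_year;
}

/*
 * Ticks elapsed between two reads. Unsigned subtraction is modulo 2^64,
 * so the result stays correct even if the counter wrapped once in between.
 */
static inline uint64_t tb_elapsed(uint64_t start, uint64_t end)
{
	return end - start;
}
```

Under these assumptions the wrap is more than a thousand years away, and an interval computed by unsigned subtraction would survive a single wrap anyway.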
Re: [PATCH V3 3/3] powerpc, ptrace: Enable support for miscellaneous registers
On 08/28/2014 03:05 AM, Sukadev Bhattiprolu wrote:
> Anshuman Khandual [khand...@linux.vnet.ibm.com] wrote:
> | This patch enables get and set of miscellaneous registers through ptrace
> | PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing new powerpc
> | specific register set REGSET_MISC support corresponding to the new ELF
> | core note NT_PPC_MISC added previously in this regard.
> |
> | Signed-off-by: Anshuman Khandual khand...@linux.vnet.ibm.com
> | ---
> |  arch/powerpc/kernel/ptrace.c | 81
> |  1 file changed, 81 insertions(+)
> |
> | diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> | index 17642ef..63b883a 100644
> | --- a/arch/powerpc/kernel/ptrace.c
> | +++ b/arch/powerpc/kernel/ptrace.c
> | @@ -1149,6 +1149,76 @@ static int tm_cvmx_set(struct task_struct *target, const struct user_regset *reg
> |  #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
> |
> |  /*
> | + * Miscellaneous Registers
> | + *
> | + * struct {
> | + *	unsigned long dscr;
> | + *	unsigned long ppr;
> | + *	unsigned long tar;
> | + * };
> | + */
> | +static int misc_get(struct task_struct *target, const struct user_regset *regset,
> | +		    unsigned int pos, unsigned int count,
> | +		    void *kbuf, void __user *ubuf)
> | +{
> | +	int ret;
> | +
> | +	/* DSCR register */
> | +	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> | +				  &target->thread.dscr, 0,
> | +				  sizeof(unsigned long));
> | +
> | +	BUILD_BUG_ON(offsetof(struct thread_struct, dscr) + sizeof(unsigned long) +
> | +		     sizeof(unsigned long) != offsetof(struct thread_struct, ppr));
>
> I see these in arch/powerpc/include/asm/processor.h
>
> #ifdef CONFIG_PPC64
> 	unsigned long dscr;
> 	int dscr_inherit;
> 	unsigned long ppr;	/* used to save/restore SMT priority */
> #endif
>
> where there is an 'int' between ppr and dscr. So, should one of
> the above sizeof(unsigned long) be changed to sizeof(int) ?

Right, I understand that but strangely I get this compile time error
when it is changed to sizeof(int).

error: call to ‘__compiletime_assert_1350’ declared with attribute error: BUILD_BUG_ON failed: TSO(dscr) + sizeof(unsigned long) + sizeof(int) != TSO(ppr)
  BUILD_BUG_ON(TSO(dscr) + sizeof(unsigned long) + sizeof(int) != TSO(ppr));

may be I am missing something here.

> Also, since we use offsetof(struct thread_struct, field) heavily, a macro
> local to the file, may simplify the code.

Right, will do that.

> #define TSO(f)	(offsetof(struct thread_struct, f))
>
> | +
> | +	/* PPR register */
> | +	if (!ret)
> | +		ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> | +					  &target->thread.ppr, sizeof(unsigned long),
> | +					  2 * sizeof(unsigned long));
> | +
> | +	BUILD_BUG_ON(offsetof(struct thread_struct, ppr) + sizeof(unsigned long)
> | +		     != offsetof(struct thread_struct, tar));
> | +	/* TAR register */
> | +	if (!ret)
> | +		ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> | +					  &target->thread.tar, 2 * sizeof(unsigned long),
> | +					  3 * sizeof(unsigned long));
> | +	return ret;
> | +}
> | +
> | +static int misc_set(struct task_struct *target, const struct user_regset *regset,
> | +		    unsigned int pos, unsigned int count,
> | +		    const void *kbuf, const void __user *ubuf)
> | +{
> | +	int ret;
> | +
> | +	/* DSCR register */
> | +	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
> | +				 &target->thread.dscr, 0,
> | +				 sizeof(unsigned long));
> | +
> | +	BUILD_BUG_ON(offsetof(struct thread_struct, dscr) + sizeof(unsigned long) +
> | +		     sizeof(unsigned long) != offsetof(struct thread_struct, ppr));
> | +
> | +	/* PPR register */
> | +	if (!ret)
> | +		ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
> | +					 &target->thread.ppr, sizeof(unsigned long),
> | +					 2 * sizeof(unsigned long));
> | +
> | +	BUILD_BUG_ON(offsetof(struct thread_struct, ppr) + sizeof(unsigned long)
> | +		     != offsetof(struct thread_struct, tar));
> | +
> | +	/* TAR register */
> | +	if (!ret)
> | +		ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
> | +					 &target->thread.tar, 2 * sizeof(unsigned long),
> | +					 3 * sizeof(unsigned long));
> | +	return ret;
> | +}
> | +
> | +/*
> |   * These are our native regset flavors.
> |   */
> |  enum powerpc_regset {
> | @@ -1169,6 +1239,7 @@ enum powerpc_regset {
> |  	REGSET_TM_CFPR,		/* TM checkpointed FPR */
> |
Re: [PATCH v2 1/2] spi: fsl-spi: Fix parameter ram offset setup for CPM1
On 07/10/2014 02:15, Scott Wood wrote:
> On Sat, 2014-10-04 at 14:02 +0200, christophe leroy wrote:
>> On 03/10/2014 22:29, Scott Wood wrote:
>>> On Fri, 2014-10-03 at 18:49 +0200, Christophe Leroy wrote:
>>>> On CPM1, the SPI parameter RAM has a default location. In fsl_spi_cpm_get_pram()
>>>> there was a confusion between the SPI_BASE register and the base of the SPI
>>>> parameter RAM. Fortunately, it was working properly with MPC866 and MPC885 because
>>>> they do set SPI_BASE, but on MPC860 and other old MPC8xx that doesn't set
>>>> SPI_BASE, pram_ofs was not properly set.
>>>> This patch fixes this confusion.
>>>>
>>>> Signed-off-by: Christophe Leroy christophe.le...@c-s.fr
>>>>
>>>> ---
>>>> Changes from v1 to v2: none
>>>>
>>>>  drivers/spi/spi-fsl-cpm.c | 9 -
>>>>  1 file changed, 4 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/spi/spi-fsl-cpm.c b/drivers/spi/spi-fsl-cpm.c
>>>> index 54b0637..0f3a912 100644
>>>> --- a/drivers/spi/spi-fsl-cpm.c
>>>> +++ b/drivers/spi/spi-fsl-cpm.c
>>>> @@ -262,15 +262,14 @@ static unsigned long fsl_spi_cpm_get_pram(struct mpc8xxx_spi *mspi)
>>>>  		pram_ofs = cpm_muram_alloc(SPI_PRAM_SIZE, 64);
>>>>  		out_be16(spi_base, pram_ofs);
>>>>  	} else {
>>>> -		struct spi_pram __iomem *pram = spi_base;
>>>> -		u16 rpbase = in_be16(&pram->rpbase);
>>>> +		u16 rpbase = in_be16(spi_base);
>>>>
>>>> -		/* Microcode relocation patch applied? */
>>>> +		/* Microcode relocation patch applied | rpbase set by default */
>>>>  		if (rpbase) {
>>>>  			pram_ofs = rpbase;
>>>>  		} else {
>>>> -			pram_ofs = cpm_muram_alloc(SPI_PRAM_SIZE, 64);
>>>> -			out_be16(spi_base, pram_ofs);
>>>> +			pram_ofs = offsetof(cpm8xx_t, cp_dparam[PROFF_SPI]) -
>>>> +				offsetof(cpm8xx_t, cp_dpmem[0]);
>>>>  		}
>>>>  	}
>>> Why is PROFF_SPI not coming from the device tree?
>> That's where it starts to become tricky.
>> PROFF_SPI is defined in cpm1.h which is included by the driver already.
> Yes, but those values shouldn't be used. It's a leftover from the old
> way of hardcoding things and describing the hardware with kconfig rather
> than the device tree.
>
>> It provides the default offset from the start of the parameter RAM.
>>
>> Previously I had the following in my device tree, and the last part of
>> the source above (the one for rpbase == 0) could not work.
>>
>> 	spi: spi@a80 {
>> 		cell-index = <0>;
>> 		compatible = "fsl,spi", "fsl,cpm1-spi";
>> 		reg = <0xa80 0x30 0x3d80 0x30>;
>>
>> First reg area was the area for SPI registers. Second area was the
>> parameter RAM zone, which was just mapped to get access to the SPI_BASE
>> pointer (rpbase)
>>
>> Now I have
>>
>> 		compatible = "fsl,spi", "fsl,cpm1-spi-reloc";
>> 		reg = <0xa80 0x30 0x3dac 0x2>;
>>
>> First reg area is the area for SPI registers. Second area is the
>> SPI_BASE, as for the CPM2.
>> On recent 8xx (885 and 866 at least) it contains the offset (=0x1D80) of
>> the parameter RAM. But on old ones (860, ...) it contains 0. Therefore
>> we have to get the default index in another way.
>> What I wanted was to keep something similar to what's done with CPM2.
>>
>> What should it look like if that offset had to be in the device tree ?
> If the offset is not relocatable or discoverable, it should stay in the
> device tree. If you have an old chip you wouldn't have
> fsl,cpm1-spi-reloc and thus you'd still have 0x3d80 0x30 in reg.

This index is from the start of the dual port RAM. It is 0x2000 above
the start of the CPM area.
In the DTS, we have:

	soc@ff000000 {
		compatible = "fsl,mpc885", "fsl,pq1-soc";
		#address-cells = <1>;
		#size-cells = <1>;
		device_type = "soc";
		ranges = <0x0 0xff000000 0x28000>;
		bus-frequency = <0>;
		clock-frequency = <0>;

		cpm@9c0 {
			#address-cells = <1>;
			#size-cells = <1>;
			compatible = "fsl,mpc885-cpm", "fsl,cpm1";
			ranges;
			reg = <0x9c0 0x40>;
			brg-frequency = <0>;
			interrupts = <0>;	// cpm error interrupt
			interrupt-parent = <&CPM_PIC>;

			muram@2000 {
				#address-cells = <1>;
				#size-cells = <1>;
				ranges = <0x0 0x2000 0x2000>;

				data@0 {
					compatible = "fsl,cpm-muram-data";
					reg = <0x0 0x1c00>;
				};
			};

			spi: spi@a80 {
				#address-cells = <1>;
				#size-cells = <0>;
				cell-index = <0>;
				compatible = "fsl,spi", "fsl,cpm1-spi";
				reg = <0xa80 0x30 0x3d80 0x30>;
				interrupts = <5>;
				interrupt-parent = <&CPM_PIC>;
				mode = "cpu";

The binding allows me to do an of_iomap() on the parameter RAM, hence to
get access to the relocation index which is inside it. But if the
relocation index is 0, I have to calculate it by myself because the
calling function expects it in return.
The binding is
Re: [PATCH v2 1/2] spi: fsl-spi: Fix parameter ram offset setup for CPM1
On Wed, 2014-10-08 at 18:21 +0200, leroy christophe wrote:
> On 07/10/2014 02:15, Scott Wood wrote:
>> On Sat, 2014-10-04 at 14:02 +0200, christophe leroy wrote:
>>> What should it look like if that offset had to be in the device tree ?
>> If the offset is not relocatable or discoverable, it should stay in the
>> device tree. If you have an old chip you wouldn't have
>> fsl,cpm1-spi-reloc and thus you'd still have 0x3d80 0x30 in reg.
> This index is from the start of the dual port RAM. It is 0x2000 above
> the start of the CPM area.
> In the DTS, we have:
>
> 	soc@ff000000 {
> 		compatible = "fsl,mpc885", "fsl,pq1-soc";
> 		#address-cells = <1>;
> 		#size-cells = <1>;
> 		device_type = "soc";
> 		ranges = <0x0 0xff000000 0x28000>;
> 		bus-frequency = <0>;
> 		clock-frequency = <0>;
>
> 		cpm@9c0 {
> 			#address-cells = <1>;
> 			#size-cells = <1>;
> 			compatible = "fsl,mpc885-cpm", "fsl,cpm1";
> 			ranges;
> 			reg = <0x9c0 0x40>;
> 			brg-frequency = <0>;
> 			interrupts = <0>;	// cpm error interrupt
> 			interrupt-parent = <&CPM_PIC>;
>
> 			muram@2000 {
> 				#address-cells = <1>;
> 				#size-cells = <1>;
> 				ranges = <0x0 0x2000 0x2000>;
>
> 				data@0 {
> 					compatible = "fsl,cpm-muram-data";
> 					reg = <0x0 0x1c00>;
> 				};
> 			};
>
> 			spi: spi@a80 {
> 				#address-cells = <1>;
> 				#size-cells = <0>;
> 				cell-index = <0>;
> 				compatible = "fsl,spi", "fsl,cpm1-spi";
> 				reg = <0xa80 0x30 0x3d80 0x30>;
> 				interrupts = <5>;
> 				interrupt-parent = <&CPM_PIC>;
> 				mode = "cpu";
>
> The binding allows me to do an of_iomap() on the parameter RAM, hence to
> get access to the relocation index which is inside it. But if the
> relocation index is 0, I have to calculate it by myself because the
> calling function expects it in return.
> The binding is also supposed to tell that the muram is at 0xff002000.
> But I don't know how I can get this info and use it to calculate the
> index of my param RAM ?
> I need to calculate the index which is 1d80 (0x3d80 - 0x2000)

What binding are you talking about? There is no published binding for
this yet.

As for what the driver should do, it should do an of_iomap(), but what
it does with the resulting memory depends on the compatible. For
fsl,cpm1-spi, the result would be the parameter RAM for the device. For
fsl,cpm1-spi-reloc and fsl,cpm2-spi, it would be the relocation
register. The driver would either read the contents of the register, or
write a different offset.

My understanding is that the relocation register would only be zero on
the chips where we'd use fsl,cpm1-spi, not fsl,cpm1-spi-reloc.

-Scott
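The offset arithmetic discussed in this thread (0x3d80 - 0x2000 = 0x1d80) and the "use the relocation value if it is non-zero, otherwise fall back to the default" rule from Christophe's patch can be written down as a short standalone sketch. This is not driver code and does not settle the binding question debated above. MURAM_BASE and SPI_PRAM_DEFAULT are hypothetical names for the 0x2000 and 0x3d80 values quoted from the DTS, and spi_pram_offset() is an illustrative helper.

```c
#include <stdint.h>

/* Start of the dual port RAM within the CPM area (muram@2000 in the DTS). */
#define MURAM_BASE       0x2000u
/* Default SPI parameter RAM location within the CPM area (from reg). */
#define SPI_PRAM_DEFAULT 0x3d80u

/*
 * rpbase is the value read from the relocation register.
 * Non-zero: the parameter RAM lives at that offset in the dual port RAM.
 * Zero: assume no relocation and use the default location, expressed as
 * an offset from the start of the dual port RAM.
 */
static inline uint16_t spi_pram_offset(uint16_t rpbase)
{
	if (rpbase)
		return rpbase;
	return (uint16_t)(SPI_PRAM_DEFAULT - MURAM_BASE);
}
```

On the chips described above as setting the register by default, the value read back and the computed fallback are the same number, 0x1d80.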
Re: [PATCH 2/2] spi: fsl-spi: Allow dynamic allocation of CPM1 parameter RAM
On 07/10/2014 02:19, Scott Wood wrote:
> On Sat, 2014-10-04 at 12:15 +0200, christophe leroy wrote:
>> On 03/10/2014 22:24, Scott Wood wrote:
>>> On Fri, 2014-10-03 at 22:15 +0200, christophe leroy wrote:
>>>> On 03/10/2014 16:44, Mark Brown wrote:
>>>>> On Fri, Oct 03, 2014 at 02:56:09PM +0200, Christophe Leroy wrote:
>>>>>> +config CPM1_RELOCSPI
>>>>>> +	bool "Dynamic SPI relocation"
>>>>>> +	default n
>>>>>> +	help
>>>>>> +	  On recent MPC8xx (at least MPC866 and MPC885) SPI can be relocated
>>>>>> +	  without micropatch. This activates relocation to a dynamically
>>>>>> +	  allocated area in the CPM Dual port RAM.
>>>>>> +	  When combined with SPI relocation patch (for older MPC8xx) it avoids
>>>>>> +	  the loss of additional Dual port RAM space just above the patch,
>>>>>> +	  which might be needed for example when using the CPM QMC.
>>>>> Something like this shouldn't be a compile time option. Either it
>>>>> should be unconditional or it should be triggered in some system
>>>>> specific manner (from DT, from knowing about other users or similar).
>>>> Can't be unconditional as older versions of mpc8xx (eg MPC860) don't
>>>> support relocation without a micropatch.
>>>> I have therefore submitted a v2 based on a DTS compatible property.
>>> So the device tree change is about whether relocation is supported, not
>>> whether it is required?
>> Indeed no, my intention is to say that relocation is requested.
>> Do you mean that it should then not use a compatible ?
> The device tree describes hardware. It doesn't tell software how to use
> that hardware.
>
> Based on one of your other e-mails, I think what you want to say here is
> that the old binding didn't describe the registers needed for
> relocation, so the new compatible describes the new binding, rather than
> requesting that software do a relocation. Software that sees the new
> binding could choose to relocate, or just choose to read the current
> offset from the register.

Not exactly. The old binding does describe the entire default param RAM
(0x3d80 size 0x30). The relocation index is within this param RAM at
0x3dac. So the old binding is enough to allow relocation.
The issue today with the driver (hence my first patch) is that the
driver reads the relocation index but takes a wrong decision if the
index is 0: it assumes that a null index means that a param RAM shall be
allocated, which is wrong. A null index means that the component doesn't
support relocation, so the default param RAM shall be used. The function
used for that is supposed to return the index. So when the index is
null, I need to calculate it.

Now, it can't be the SPI driver by itself that decides if it has to
relocate or not. Because it depends whether I need to relocate or not.
There is no point in wasting another area of the dualport RAM if I
don't need to use SCC2 in a mode that overlaps the SPI parameter RAM.

Today on the old MPC8xx, a microcode patch is needed in order to be able
to relocate, and relocated address is directly fixed by the code
handling the patch (sysdev/micropatch.c). The patch loading function is
called very early in the boot process by cpm_reset() which is called by
the xxx_setup_arch().

I have two issues with the way it is done today:
1/ the address which is hard coded in the micropatch loading function
is within the area for descriptors for the QMC, so I would need to use
another address.
2/ for new MPC8xx which don't need microcode patch, I have no way today
to relocate.

I have the same issue with the relocation of SMC1. Today when we
activate SMC1 relocation microcode patch, the loading function has a
hard coded relocation area for SMC1 which is the area dedicated to the
MPC8xx DSP. It means that I need to change it as I want to use the DSP.

Would it be acceptable to define a fixed relocation address in the
Kconfig in which we select microcode patch (arch/powerpc/platforms/8xx),
instead of having it hardcoded in micropatch.c ?
Or maybe it would be possible to select which microcode patch we
want/need via the device tree and which address shall be used for
relocation ? What would you suggest to describe it ?

>>> How about checking for the existing specific-SoC compatibles?
>> What do you mean ?
> Look for fsl,mpc885-cpm-i2c etc. Or, if you didn't follow that pattern
> (remember, I can't see your device tree!), look for fsl,mpc885-cpm or
> fsl,mpc866-cpm in the parent node.
>
> It's moot though, if the device tree also needs to be modified to
> describe the register used to relocate.
>
> -Scott

I'm not sure I understood your question. My full device tree below

Christophe

/*
 * MIA ethernet Device Tree Source
 *
 * Copyright 2011 CSSI, Inc
 */

/dts-v1/;

/ {
	model = "MIAE";
	compatible = "fsl,cmpc885", "fsl,mod885";
	#address-cells = <1>;
	#size-cells = <1>;

	aliases {
		ethernet0 = &eth0;
		ethernet1 = &eth1;
		mdio = &phy;
		serial0 = &smc1;
	};

	cpus {
		#address-cells = <1>;
		#size-cells = <0>;

		PowerPC,885@0 {
			device_type = "cpu";
Re: [PATCH V3 3/3] powerpc, ptrace: Enable support for miscellaneous registers
Anshuman Khandual [khand...@linux.vnet.ibm.com] wrote:
| On 08/28/2014 03:05 AM, Sukadev Bhattiprolu wrote:
| >
| > I see these in arch/powerpc/include/asm/processor.h
| >
| > #ifdef CONFIG_PPC64
| > 	unsigned long dscr;
| > 	int dscr_inherit;
| > 	unsigned long ppr;	/* used to save/restore SMT priority */
| > #endif
| >
| > where there is an 'int' between ppr and dscr. So, should one of
| > the above sizeof(unsigned long) be changed to sizeof(int) ?
|
| Right, I understand that but strangely I get this compile time error
| when it is changed to sizeof(int).
|
| error: call to ‘__compiletime_assert_1350’ declared with attribute error:
| BUILD_BUG_ON failed: TSO(dscr) + sizeof(unsigned long) + sizeof(int) != TSO(ppr)
|   BUILD_BUG_ON(TSO(dscr) + sizeof(unsigned long) + sizeof(int) != TSO(ppr));
|
| may be I am missing something here.

I guess there is a 4-byte padding after dscr_inherit. We could make that
explicit by adding a field or just go with the sizeof(unsigned long).

Thanks,

Sukadev
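The padding guess above is easy to confirm in userspace. The sketch below copies only the three fields and their order from the processor.h excerpt quoted in this thread. struct misc_regs_layout and padding_before_ppr() are illustrative names, not kernel definitions, and the real thread_struct has many more members around these.

```c
#include <stddef.h>

/* Same field order as the quoted excerpt; not the real thread_struct. */
struct misc_regs_layout {
	unsigned long dscr;
	int dscr_inherit;
	unsigned long ppr;
};

/* Bytes the compiler inserts between dscr_inherit and ppr. */
static inline size_t padding_before_ppr(void)
{
	return offsetof(struct misc_regs_layout, ppr)
	     - offsetof(struct misc_regs_layout, dscr_inherit)
	     - sizeof(int);
}
```

On a 64-bit ABI where unsigned long is 8 bytes and 8-byte aligned, ppr cannot start at offset 12, so the compiler places it at 16. That is why adding sizeof(int) fails the BUILD_BUG_ON while adding a second sizeof(unsigned long) passes.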
Re: [PATCH] tools/perf/powerpc: Fix build break
Aneesh Kumar K.V [aneesh.ku...@linux.vnet.ibm.com] wrote:
| "Aneesh Kumar K.V" aneesh.ku...@linux.vnet.ibm.com writes:
|
| >   CC       arch/powerpc/util/skip-callchain-idx.o
| > arch/powerpc/util/skip-callchain-idx.c: In function ‘check_return_reg’:
| > arch/powerpc/util/skip-callchain-idx.c:55:3: error: implicit declaration of function ‘pr_debug’ [-Werror=implicit-function-declaration]
| >    pr_debug("dwarf_frame_register() %s\n", dwarf_errmsg(-1));
| >
| > Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
| > ---
| >  tools/perf/arch/powerpc/util/skip-callchain-idx.c | 1 +
| >  1 file changed, 1 insertion(+)
| >
| > diff --git a/tools/perf/arch/powerpc/util/skip-callchain-idx.c b/tools/perf/arch/powerpc/util/skip-callchain-idx.c
| > index a7c23a4b3778..d73ef8bb08c7 100644
| > --- a/tools/perf/arch/powerpc/util/skip-callchain-idx.c
| > +++ b/tools/perf/arch/powerpc/util/skip-callchain-idx.c
| > @@ -15,6 +15,7 @@
| >
| >  #include "util/thread.h"
| >  #include "util/callchain.h"
| > +#include "util/debug.h"
| >
| >  /*
| >   * When saving the callchain on Power, the kernel conservatively saves
|
| We still have this broken upstream.

The fix is in Ingo's tree, commit ad7e767.

Ingo, can you push this fix to Linus - it fixes a build failure in Powerpc.

Sukadev

|
| -aneesh
Re: [PATCH 2/2] spi: fsl-spi: Allow dynamic allocation of CPM1 parameter RAM
On Wed, 2014-10-08 at 18:46 +0200, leroy christophe wrote:
> On 07/10/2014 02:19, Scott Wood wrote:
>> On Sat, 2014-10-04 at 12:15 +0200, christophe leroy wrote:
>>> On 03/10/2014 22:24, Scott Wood wrote:
>>>> On Fri, 2014-10-03 at 22:15 +0200, christophe leroy wrote:
>>>>> On 03/10/2014 16:44, Mark Brown wrote:
>>>>>> On Fri, Oct 03, 2014 at 02:56:09PM +0200, Christophe Leroy wrote:
>>>>>>> +config CPM1_RELOCSPI
>>>>>>> +	bool "Dynamic SPI relocation"
>>>>>>> +	default n
>>>>>>> +	help
>>>>>>> +	  On recent MPC8xx (at least MPC866 and MPC885) SPI can be relocated
>>>>>>> +	  without micropatch. This activates relocation to a dynamically
>>>>>>> +	  allocated area in the CPM Dual port RAM.
>>>>>>> +	  When combined with SPI relocation patch (for older MPC8xx) it avoids
>>>>>>> +	  the loss of additional Dual port RAM space just above the patch,
>>>>>>> +	  which might be needed for example when using the CPM QMC.
>>>>>> Something like this shouldn't be a compile time option. Either it
>>>>>> should be unconditional or it should be triggered in some system
>>>>>> specific manner (from DT, from knowing about other users or similar).
>>>>> Can't be unconditional as older versions of mpc8xx (eg MPC860) don't
>>>>> support relocation without a micropatch.
>>>>> I have therefore submitted a v2 based on a DTS compatible property.
>>>> So the device tree change is about whether relocation is supported, not
>>>> whether it is required?
>>> Indeed no, my intention is to say that relocation is requested.
>>> Do you mean that it should then not use a compatible ?
>> The device tree describes hardware. It doesn't tell software how to use
>> that hardware.
>>
>> Based on one of your other e-mails, I think what you want to say here is
>> that the old binding didn't describe the registers needed for
>> relocation, so the new compatible describes the new binding, rather than
>> requesting that software do a relocation. Software that sees the new
>> binding could choose to relocate, or just choose to read the current
>> offset from the register.
> Not exactly. The old binding does describe the entire default param RAM
> (0x3d80 size 0x30). The relocation index is within this param RAM at
> 0x3dac. So the old binding is enough to allow relocation.

Oh, so the relocation register is part of the region? If you relocate
the region, does the relocation register move, or stay at 0x3dac? I
checked the manual and it wasn't clear. I had assumed it worked the
same as cpm2, where the relocation register does not move.

> The issue today with the driver (hence my first patch) is that the
> driver reads the relocation index but takes a wrong decision if the
> index is 0: it assumes that a null index means that a param RAM shall be
> allocated, which is wrong. A null index means that the component doesn't
> support relocation, so the default param RAM shall be used. The function
> used for that is supposed to return the index. So when the index is
> null, I need to calculate it.
>
> Now, it can't be the SPI driver by itself that decides if it has to
> relocate or not. Because it depends whether I need to relocate or not.
> There is no point in wasting another area of the dualport RAM if I
> don't need to use SCC2 in a mode that overlaps the SPI parameter RAM.

Is the DPRAM currently fully utilized? If it's really important to not
waste 48 bytes of DPRAM, could you make the policy decision in platform
code, or check at runtime what mode SCC2 is in?

> Today on the old MPC8xx, a microcode patch is needed in order to be able
> to relocate, and relocated address is directly fixed by the code
> handling the patch (sysdev/micropatch.c). The patch loading function is
> called very early in the boot process by cpm_reset() which is called by
> the xxx_setup_arch().
>
> I have two issues with the way it is done today:
> 1/ the address which is hard coded in the micropatch loading function
> is within the area for descriptors for the QMC, so I would need to use
> another address.
> 2/ for new MPC8xx which don't need microcode patch, I have no way today
> to relocate.
>
> I have the same issue with the relocation of SMC1. Today when we
> activate SMC1 relocation microcode patch, the loading function has a
> hard coded relocation area for SMC1 which is the area dedicated to the
> MPC8xx DSP. It means that I need to change it as I want to use the DSP.
>
> Would it be acceptable to define a fixed relocation address in the
> Kconfig in which we select microcode patch (arch/powerpc/platforms/8xx),
> instead of having it hardcoded in micropatch.c ?

No, that would prevent the ability to build support for all 8xx in one
kernel.

> Or maybe it would be possible to select which microcode patch we
> want/need via the device tree and which address shall be used for
> relocation ? What would you suggest to describe it ?

Yes, use the existing information in the device tree, or use PVR, to
determine which chip you're on and thus which microcode to use.

>>>> How about checking for the existing specific-SoC compatibles?
>>> What do
Re: [PATCH 0/2] net: fs_enet: Remove non NAPI RX and add NAPI for TX
From: Christophe Leroy christophe.le...@c-s.fr
Date: Tue, 7 Oct 2014 15:04:53 +0200 (CEST)

> When using a MPC8xx as a router, 'perf' shows a significant time spent in
> fs_enet_interrupt() and fs_enet_start_xmit().
> 'perf annotate' shows that the time spent in fs_enet_start_xmit is indeed spent
> between spin_unlock_irqrestore() and the following instruction, hence in
> interrupt handling. This is due to the TX complete interrupt that fires after
> each transmitted packet.
> This patchset first removes all non NAPI handling as NAPI has become the only
> mode for RX, then adds NAPI for handling TX complete.
> This improves NAT TCP throughput by 21% on MPC885 with FEC.
>
> Tested on MPC885 with FEC.
>
> [PATCH 1/2] net: fs_enet: Remove non NAPI RX
> [PATCH 2/2] net: fs_enet: Add NAPI TX
>
> Signed-off-by: Christophe Leroy christophe.le...@c-s.fr

Series applied, thanks.

Any particular reason you didn't just put the TX reclaim calls into
the existing NAPI handler?

That's what other drivers do, because TX reclaim can make SKBs
available for RX packet receive on the local cpu. So generally you
have one NAPI context that first does any pending TX reclaim, then
polls the RX ring for new packets.
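The ordering suggested in the reply above (one poll routine that reclaims TX first, then processes RX within the budget) can be modelled in a few lines of plain C. This is a toy model under stated assumptions, not fs_enet code and not the kernel NAPI API: struct fake_nic and fake_poll() are hypothetical, and counters stand in for descriptor rings and SKBs.

```c
#include <stddef.h>

/* Hypothetical device state; a real driver would walk descriptor rings. */
struct fake_nic {
	size_t tx_done;      /* transmitted buffers waiting to be reclaimed */
	size_t rx_pending;   /* received packets waiting to be processed */
	size_t free_bufs;    /* buffers available for reuse */
	int irqs_enabled;    /* 1 once the poll routine re-enables interrupts */
};

/* One poll pass: reclaim TX first, then receive at most `budget` packets. */
static inline int fake_poll(struct fake_nic *nic, int budget)
{
	int done = 0;

	/* TX reclaim comes first so freed buffers are available for RX. */
	nic->free_bufs += nic->tx_done;
	nic->tx_done = 0;

	while (done < budget && nic->rx_pending > 0) {
		nic->rx_pending--;
		done++;
	}

	/* Budget not exhausted: no work left, so go back to interrupt mode. */
	if (done < budget)
		nic->irqs_enabled = 1;

	return done;
}
```

In this model TX completion costs no interrupt of its own: it is handled whenever the poll routine runs, which is the effect the reply describes for drivers that share one NAPI context.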
Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500
On Tue, 2014-10-07 at 22:08 -0500, Jia Hongtao-B38951 wrote:
> -Original Message-
> From: Wood Scott-B07421
> Sent: Tuesday, September 30, 2014 2:36 AM
> To: Guenter Roeck
> Cc: Benjamin Herrenschmidt; Paul Mackerras; Michael Ellerman; linuxppc-
> d...@lists.ozlabs.org; linux-ker...@vger.kernel.org; Jojy G Varghese;
> Guenter Roeck; Jia Hongtao-B38951
> Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check
> exception on E500MC / E5500
>
>> On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
>>> From: Jojy G Varghese jo...@juniper.net
>>>
>>> For E500MC and E5500, a machine check exception in pci(e) memory space
>>> crashes the kernel.
>>>
>>> Testing shows that the MCAR(U) register is zero on a MC exception for
>>> the E5500 core. At the same time, DEAR register has been found to have
>>> the address of the faulty load address during an MC exception for this
>>> core.
>>>
>>> This fix changes the current behavior to fixup the result register and
>>> instruction pointers in the case of a load operation on a faulty PCI
>>> address.
>>>
>>> The changes are:
>>> - Added the hook to pci machine check handing to the e500mc machine
>>>   check exception handler.
>>> - For the E5500 core, load faulting address from SPRN_DEAR register.
>>>   As mentioned above, this is necessary because the E5500 core does not
>>>   report the fault address in the MCAR register.
>>>
>>> Cc: Scott Wood scottw...@freescale.com
>>> Signed-off-by: Jojy G Varghese jo...@juniper.net
>>> [Guenter Roeck: updated description]
>>> Signed-off-by: Guenter Roeck gro...@juniper.net
>>> Signed-off-by: Guenter Roeck li...@roeck-us.net
>>> ---
>>>  arch/powerpc/kernel/traps.c   | 3 ++-
>>>  arch/powerpc/sysdev/fsl_pci.c | 5 +
>>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>>> index 0dc43f9..ecb709b 100644
>>> --- a/arch/powerpc/kernel/traps.c
>>> +++ b/arch/powerpc/kernel/traps.c
>>> @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
>>>  	int recoverable = 1;
>>>
>>>  	if (reason & MCSR_LD) {
>>> -		recoverable = fsl_rio_mcheck_exception(regs);
>>> +		recoverable = fsl_rio_mcheck_exception(regs) ||
>>> +			fsl_pci_mcheck_exception(regs);
>>>  		if (recoverable == 1)
>>>  			goto silent_out;
>>>  	}
>>> diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
>>> index c507767..bdb956b 100644
>>> --- a/arch/powerpc/sysdev/fsl_pci.c
>>> +++ b/arch/powerpc/sysdev/fsl_pci.c
>>> @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
>>>  #endif
>>>  	addr += mfspr(SPRN_MCAR);
>>>
>>> +#ifdef CONFIG_E5500_CPU
>>> +	if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
>>> +		addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
>>> +#endif
>>
>> Kconfig tells you what hardware is supported, not what hardware you're
>> actually running on.
>>
>> Jia Hongtao, do you know anything about this issue? Is there an erratum?
>
> Sorry for the late response, I just returned from my vacation.
> I don't know this issue.
>
>> What chips are affected by the erratum covered by
>> http://patchwork.ozlabs.org/patch/240239/?
>
> MPC8544, MPC8548, MPC8572 are affected by this erratum.

What is the erratum number?

> I checked P4080 which uses e500mc and no such erratum is found.

What is the erratum behavior, and how does it differ from the problem
that Jojy and Guenter are trying to solve?

-Scott
[PATCH] CXL: Fix afu_read() not doing finish_wait() on signal or non-blocking
From: Ian Munsie imun...@au1.ibm.com

If afu_read() returned due to a signal or the AFU file descriptor being
opened non-blocking it would not call finish_wait() before returning,
which could lead to a crash later when something else wakes up the wait
queue.

This patch restructures the wait logic to ensure that the cleanup is done
correctly.

Signed-off-by: Ian Munsie imun...@au1.ibm.com
---

Resending with correct whitespace as my mailer decided to replace tabs
with spaces on the last try.

 drivers/misc/cxl/file.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
index 847b7e6..378b099 100644
--- a/drivers/misc/cxl/file.c
+++ b/drivers/misc/cxl/file.c
@@ -273,6 +273,7 @@ static ssize_t afu_read(struct file *file, char __user *buf, size_t count,
 	struct cxl_context *ctx = file->private_data;
 	struct cxl_event event;
 	unsigned long flags;
+	int rc;
 	DEFINE_WAIT(wait);
 
 	if (count < CXL_READ_MIN_SIZE)
@@ -285,13 +286,17 @@ static ssize_t afu_read(struct file *file, char __user *buf, size_t count,
 		if (ctx_event_pending(ctx))
 			break;
 
-		spin_unlock_irqrestore(&ctx->lock, flags);
-		if (file->f_flags & O_NONBLOCK)
-			return -EAGAIN;
+		if (file->f_flags & O_NONBLOCK) {
+			rc = -EAGAIN;
+			goto out;
+		}
 
-		if (signal_pending(current))
-			return -ERESTARTSYS;
+		if (signal_pending(current)) {
+			rc = -ERESTARTSYS;
+			goto out;
+		}
 
+		spin_unlock_irqrestore(&ctx->lock, flags);
 		pr_devel("afu_read going to sleep...\n");
 		schedule();
 		pr_devel("afu_read woken up\n");
@@ -336,6 +341,11 @@ static ssize_t afu_read(struct file *file, char __user *buf, size_t count,
 	if (copy_to_user(buf, &event, event.header.size))
 		return -EFAULT;
 	return event.header.size;
+
+out:
+	finish_wait(&ctx->wq, &wait);
+	spin_unlock_irqrestore(&ctx->lock, flags);
+	return rc;
 }
 
 static const struct file_operations afu_fops = {
-- 
2.1.0
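
The control-flow change in this patch can be modelled outside the kernel. The sketch below is a userspace mock, not the cxl driver: `mock_wq`, `mock_prepare_to_wait()`, `mock_finish_wait()` and `mock_read()` are hypothetical stand-ins, `-EINTR` stands in for `-ERESTARTSYS` (which userspace headers do not define), and the counter only records whether a waiter is still queued when the function returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel wait queue: counts queued waiters. */
struct mock_wq {
	int queued;
};

/* Queue the waiter once; repeated calls for the same waiter are no-ops. */
static void mock_prepare_to_wait(struct mock_wq *wq, bool *on_queue)
{
	if (!*on_queue) {
		wq->queued++;
		*on_queue = true;
	}
}

/* Remove the waiter from the queue if it is still there. */
static void mock_finish_wait(struct mock_wq *wq, bool *on_queue)
{
	if (*on_queue) {
		wq->queued--;
		*on_queue = false;
	}
	assert(wq->queued >= 0);
}

/*
 * Models the restructured loop: both early exits jump to "out",
 * so the waiter is always dequeued before the function returns.
 */
static int mock_read(struct mock_wq *wq, bool event_pending,
		     bool nonblock, bool signal_pending)
{
	bool on_queue = false;
	int rc;

	for (;;) {
		mock_prepare_to_wait(wq, &on_queue);
		if (event_pending)
			break;
		if (nonblock) {
			rc = -EAGAIN;
			goto out;
		}
		if (signal_pending) {
			rc = -EINTR;	/* userspace stand-in for -ERESTARTSYS */
			goto out;
		}
		/* A driver would sleep here; the mock pretends an event arrived. */
		event_pending = true;
	}
	mock_finish_wait(wq, &on_queue);
	return 0;

out:
	mock_finish_wait(wq, &on_queue);
	return rc;
}
```

Calling `mock_read()` with the non-blocking or signal flag set leaves `queued` at zero, which is the property the original early `return` statements violated.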
RE: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500
> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Thursday, October 09, 2014 7:48 AM
> To: Jia Hongtao-B38951
> Cc: Guenter Roeck; Benjamin Herrenschmidt; Paul Mackerras; Michael
> Ellerman; linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org;
> Jojy G Varghese; Guenter Roeck
> Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check
> exception on E500MC / E5500
>
> On Tue, 2014-10-07 at 22:08 -0500, Jia Hongtao-B38951 wrote:
> > > -----Original Message-----
> > > From: Wood Scott-B07421
> > > Sent: Tuesday, September 30, 2014 2:36 AM
> > > To: Guenter Roeck
> > > Cc: Benjamin Herrenschmidt; Paul Mackerras; Michael Ellerman; linuxppc-
> > > d...@lists.ozlabs.org; linux-ker...@vger.kernel.org; Jojy G Varghese;
> > > Guenter Roeck; Jia Hongtao-B38951
> > > Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check
> > > exception on E500MC / E5500
> > >
> > > On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> > > > From: Jojy G Varghese jo...@juniper.net
> > > >
> > > > For E500MC and E5500, a machine check exception in pci(e) memory
> > > > space crashes the kernel.
> > > >
> > > > Testing shows that the MCAR(U) register is zero on a MC exception
> > > > for the E5500 core. At the same time, DEAR register has been found
> > > > to have the address of the faulty load address during an MC
> > > > exception for this core.
> > > >
> > > > This fix changes the current behavior to fixup the result register
> > > > and instruction pointers in the case of a load operation on a faulty
> > > > PCI address.
> > > >
> > > > The changes are:
> > > > - Added the hook to pci machine check handing to the e500mc machine
> > > >   check exception handler.
> > > > - For the E5500 core, load faulting address from SPRN_DEAR register.
> > > >   As mentioned above, this is necessary because the E5500 core does
> > > >   not report the fault address in the MCAR register.
> > > >
> > > > Cc: Scott Wood scottw...@freescale.com
> > > > Signed-off-by: Jojy G Varghese jo...@juniper.net
> > > > [Guenter Roeck: updated description]
> > > > Signed-off-by: Guenter Roeck gro...@juniper.net
> > > > Signed-off-by: Guenter Roeck li...@roeck-us.net
> > > > ---
> > > >  arch/powerpc/kernel/traps.c   | 3 ++-
> > > >  arch/powerpc/sysdev/fsl_pci.c | 5 +++++
> > > >  2 files changed, 7 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > > > index 0dc43f9..ecb709b 100644
> > > > --- a/arch/powerpc/kernel/traps.c
> > > > +++ b/arch/powerpc/kernel/traps.c
> > > > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
> > > >  	int recoverable = 1;
> > > >
> > > >  	if (reason & MCSR_LD) {
> > > > -		recoverable = fsl_rio_mcheck_exception(regs);
> > > > +		recoverable = fsl_rio_mcheck_exception(regs) ||
> > > > +			fsl_pci_mcheck_exception(regs);
> > > >  		if (recoverable == 1)
> > > >  			goto silent_out;
> > > >  	}
> > > > diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> > > > index c507767..bdb956b 100644
> > > > --- a/arch/powerpc/sysdev/fsl_pci.c
> > > > +++ b/arch/powerpc/sysdev/fsl_pci.c
> > > > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
> > > >  #endif
> > > >  	addr += mfspr(SPRN_MCAR);
> > > >
> > > > +#ifdef CONFIG_E5500_CPU
> > > > +	if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> > > > +		addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
> > > > +#endif
> > >
> > > Kconfig tells you what hardware is supported, not what hardware you're
> > > actually running on.
> > >
> > > Jia Hongtao, do you know anything about this issue? Is there an
> > > erratum?
> >
> > Sorry for the late response, I just return from my vacation.
> > I don't know this issue.
> >
> > > What chips are affected by the erratum covered by
> > > http://patchwork.ozlabs.org/patch/240239/?
> >
> > MPC8544, MPC8548, MPC8572 are affected by this erratum.
>
> What is the erratum number?

The erratum number is not the same on each chip:
MPC8544: PCIe 4
MPC8548: PCI-Ex 39
MPC8572: PCI-Ex 3

> > I checked P4080 which using e500mc and no such erratum is found.
>
> What is the erratum behavior, and how does it differ from the problem
> that Jojy and Guenter are trying to solve?

Here is the description of the erratum:

When its link goes down, the PCI Express controller clears all outstanding
transactions with an error indicator and sends a link down exception to
the interrupt controller if PEX_PME_MES_DISR[LDDD] = 0. If, however, any
transactions are sent to the controller after the link down event, they
will be accepted by the controller and wait for the link to come back up
before starting any timeout counters (e.g. completion timeout). There is
no mechanism to cancel the new transactions short of a device HRESET.

For e500mc, what Jojy and Guenter described looks like the same erratum
as on e500, but I am not 100% sure.
For e5500, I don't quite understand the behavior yet.

> -Scott
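
Scott's remark earlier in this thread, that Kconfig describes what the kernel supports rather than what it is running on, can be illustrated with a small userspace model. This is a sketch only: `enum core_type`, `struct mc_regs` and `fault_addr()` are hypothetical names, and the rule "on e5500 take the address from DEAR, otherwise from MCAR" simply restates the patch description. It does not model the virtual-to-physical translation in the patch or the erratum discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical core identifiers; a kernel would determine this at run time. */
enum core_type { CORE_E500MC, CORE_E5500 };

/* Hypothetical snapshot of the registers a handler would read. */
struct mc_regs {
	uint64_t mcar;	/* machine check address register */
	uint64_t dear;	/* data exception address register */
};

/*
 * Choose the faulting address from the core actually in use,
 * instead of from a compile-time #ifdef on the supported cores.
 */
static uint64_t fault_addr(enum core_type running_on, const struct mc_regs *r)
{
	assert(r != NULL);
	if (running_on == CORE_E5500)
		return r->dear;	/* patch description: MCAR reads as zero here */
	return r->mcar;
}
```

With a run-time check like this, a kernel built with e5500 support still takes the MCAR path when it boots on another core.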
Re: [PATCH V3 2/3] powerpc, ptrace: Enable support for transactional memory register sets
On 08/28/2014 03:05 AM, Sukadev Bhattiprolu wrote:
> Anshuman Khandual [khand...@linux.vnet.ibm.com] wrote:
> | This patch enables get and set of transactional memory related register
> | sets through PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing
> | four new powerpc specific register sets i.e REGSET_TM_SPR, REGSET_TM_CGPR,
> | REGSET_TM_CFPR, REGSET_CVMX support corresponding to these following new
> | ELF core note types added previously in this regard.
> |
> | 	(1) NT_PPC_TM_SPR
> | 	(2) NT_PPC_TM_CGPR
> | 	(3) NT_PPC_TM_CFPR
> | 	(4) NT_PPC_TM_CVMX
> |
> | Signed-off-by: Anshuman Khandual khand...@linux.vnet.ibm.com
> | ---
> |  arch/powerpc/include/asm/switch_to.h | 8 +
> |  arch/powerpc/kernel/process.c| 24 ++
> |  arch/powerpc/kernel/ptrace.c | 792 +--
> |  3 files changed, 795 insertions(+), 29 deletions(-)
> |
> | diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
> | index 0e83e7d..2737f46 100644
> | --- a/arch/powerpc/include/asm/switch_to.h
> | +++ b/arch/powerpc/include/asm/switch_to.h
> | @@ -80,6 +80,14 @@ static inline void flush_spe_to_thread(struct task_struct *t)
> |  }
> |  #endif
> |
> | +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> | +extern void flush_tmregs_to_thread(struct task_struct *);
> | +#else
> | +static inline void flush_tmregs_to_thread(struct task_struct *t)
> | +{
> | +}
> | +#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
> | +
> |  static inline void clear_task_ebb(struct task_struct *t)
> |  {
> |  #ifdef CONFIG_PPC_BOOK3S_64
> | diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> | index 31d0215..e247898 100644
> | --- a/arch/powerpc/kernel/process.c
> | +++ b/arch/powerpc/kernel/process.c
> | @@ -695,6 +695,30 @@ static inline void __switch_to_tm(struct task_struct *prev)
> |  	}
> |  }
> |
> | +void flush_tmregs_to_thread(struct task_struct *tsk)
> | +{
> | +	/*
> | +	 * If task is not current, it should have been flushed
> | +	 * already to it's thread_struct during __switch_to().
> | +	 */
> | +	if (tsk != current)
> | +		return;
> | +
> | +	preempt_disable();
> | +	if (tsk->thread.regs) {
> | +		/*
> | +		 * If we are still current, the TM state need to
> | +		 * be flushed to thread_struct as it will be still
> | +		 * present in the current cpu.
> | +		 */
> | +		if (MSR_TM_ACTIVE(tsk->thread.regs->msr)) {
> | +			__switch_to_tm(tsk);
> | +			tm_recheckpoint_new_task(tsk);
> | +		}
> | +	}
> | +	preempt_enable();
> | +}
> | +
> |  /*
> |   * This is called if we are on the way out to userspace and the
> |   * TIF_RESTORE_TM flag is set. It checks if we need to reload
> | diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> | index 2e3d2bf..17642ef 100644
> | --- a/arch/powerpc/kernel/ptrace.c
> | +++ b/arch/powerpc/kernel/ptrace.c
> | @@ -357,6 +357,17 @@ static int gpr_set(struct task_struct *target, const struct user_regset *regset,
> |  	return ret;
> |  }
> |
> | +/*
> | + * When any transaction is active, thread_struct->transact_fp holds
> | + * the current running value of all FPR registers and thread_struct->
> | + * fp_state holds the last checkpointed FPR registers state for the
> | + * current transaction.
> | + *
> | + * struct data {
> | + * 	u64	fpr[32];
> | + * 	u64	fpscr;
> | + * };
> | + */
>
> Maybe a reference to 'struct thread_fp_state' in the comments will help ?

Okay, will try to add.

> |  static int fpr_get(struct task_struct *target, const struct user_regset *regset,
> |  		   unsigned int pos, unsigned int count,
> |  		   void *kbuf, void __user *ubuf)
> | @@ -365,21 +376,41 @@ static int fpr_get(struct task_struct *target, const struct user_regset *regset,
> |  	u64 buf[33];
> |  	int i;
> |  #endif
> | -	flush_fp_to_thread(target);
> | +	if (MSR_TM_ACTIVE(target->thread.regs->msr)) {
> | +		flush_fp_to_thread(target);
> | +		flush_altivec_to_thread(target);
> | +		flush_tmregs_to_thread(target);
> | +	} else {
> | +		flush_fp_to_thread(target);
> | +	}
>
> flush_fp_to_thread(target) is unconditional - so could be outside
> the if and else blocks ?

yes

> |
> |  #ifdef CONFIG_VSX
> |  	/* copy to local buffer then write that out */
> | -	for (i = 0; i < 32 ; i++)
> | -		buf[i] = target->thread.TS_FPR(i);
> | -	buf[32] = target->thread.fp_state.fpscr;
> | +	if (MSR_TM_ACTIVE(target->thread.regs->msr)) {
> | +		for (i = 0; i < 32 ; i++)
> | +			buf[i] = target->thread.TS_TRANS_FPR(i);
> | +		buf[32] = target->thread.transact_fp.fpscr;
> | +	} else {
> | +		for (i = 0; i < 32 ; i++)
> | +			buf[i] = target->thread.TS_FPR(i);
> | +		buf[32] = target->thread.fp_state.fpscr;
> | +	}
> |  	return user_regset_copyout(pos, count, kbuf, ubuf, buf, 0, -1);
> |
> |  #else
> | -	BUILD_BUG_ON(offsetof(struct
Re: [PATCH 0/2] net: fs_enet: Remove non NAPI RX and add NAPI for TX
On 08/10/2014 at 22:03, David Miller wrote:
> From: Christophe Leroy christophe.le...@c-s.fr
> Date: Tue, 7 Oct 2014 15:04:53 +0200 (CEST)
>
> > When using a MPC8xx as a router, 'perf' shows a significant time spent in
> > fs_enet_interrupt() and fs_enet_start_xmit().
> > 'perf annotate' shows that the time spent in fs_enet_start_xmit is indeed
> > spent between spin_unlock_irqrestore() and the following instruction,
> > hence in interrupt handling. This is due to the TX complete interrupt that
> > fires after each transmitted packet.
> > This patchset first removes all non NAPI handling as NAPI has become the
> > only mode for RX, then adds NAPI for handling TX complete.
> > This improves NAT TCP throughput by 21% on MPC885 with FEC.
> >
> > Tested on MPC885 with FEC.
> >
> > [PATCH 1/2] net: fs_enet: Remove non NAPI RX
> > [PATCH 2/2] net: fs_enet: Add NAPI TX
> >
> > Signed-off-by: Christophe Leroy christophe.le...@c-s.fr
>
> Series applied, thanks.
>
> Any particular reason you didn't just put the TX reclaim calls into
> the existing NAPI handler?

Not really. I used the gianfar.c driver as a model.

> That's what other drivers do, because TX reclaim can make SKBs
> available for RX packet receive on the local cpu. So generally
> you have one NAPI context that first does any pending TX reclaim,
> then polls the RX ring for new packets.

Is that a better approach?
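
David's suggestion, a single NAPI context that reclaims completed TX buffers before polling the RX ring, can be sketched as a userspace model. This is not fs_enet or gianfar code: `struct mock_dev`, `mock_tx_reclaim()`, `mock_rx_poll()` and `mock_poll()` are hypothetical, the rings are reduced to counters, and letting RX consume buffers freed by TX is a deliberate simplification of "TX reclaim can make SKBs available for RX".

```c
#include <assert.h>

/* Hypothetical device state: rings reduced to simple counters. */
struct mock_dev {
	int tx_done;	/* transmitted packets awaiting reclaim */
	int rx_pending;	/* received packets waiting in the RX ring */
	int free_skbs;	/* buffers available for reuse */
};

/* Reclaim all completed TX buffers; returns how many were freed. */
static int mock_tx_reclaim(struct mock_dev *dev)
{
	int n = dev->tx_done;

	dev->free_skbs += n;
	dev->tx_done = 0;
	return n;
}

/* Process up to budget RX packets, consuming one free buffer each. */
static int mock_rx_poll(struct mock_dev *dev, int budget)
{
	int done = 0;

	while (done < budget && dev->rx_pending > 0 && dev->free_skbs > 0) {
		dev->rx_pending--;
		dev->free_skbs--;
		done++;
	}
	return done;
}

/* One poll pass: TX reclaim first, then RX within the budget. */
static int mock_poll(struct mock_dev *dev, int budget)
{
	assert(budget > 0);
	mock_tx_reclaim(dev);
	return mock_rx_poll(dev, budget);
}
```

In this model, reversing the two calls in `mock_poll()` would leave RX starved whenever `free_skbs` starts at zero, which is the ordering argument made in the quoted reply.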