linux-next: build failure after merge of the powerpc tree
Hi all,

After merging the powerpc tree, today's linux-next build (powerpc64 allnoconfig) failed like this:

arch/powerpc/kernel/irq.o: In function `.replay_system_reset':
irq.c:(.text+0x10): undefined reference to `.ppc_save_regs'

Caused by commit 78adf6c214f0 ("powerpc/64s: Implement system reset idle wakeup reason"). I have applied the following fix patch:

From: Stephen Rothwell
Date: Thu, 2 Nov 2017 17:45:18 +1100
Subject: [PATCH] powerpc/64s: ppc_save_regs is now needed for all 64s builds

Fixes: 78adf6c214f0 ("powerpc/64s: Implement system reset idle wakeup reason")
Signed-off-by: Stephen Rothwell
---
 arch/powerpc/kernel/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 91960f83039c..34b6e729f38c 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -128,7 +128,7 @@ obj64-$(CONFIG_PPC_TRANSACTIONAL_MEM) += tm.o
 obj-$(CONFIG_PPC64)	+= $(obj64-y)
 obj-$(CONFIG_PPC32)	+= $(obj32-y)

-ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC_CORE),)
+ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC_CORE)$(CONFIG_PPC_BOOK3S),)
 obj-y	+= ppc_save_regs.o
 endif
--
2.14.1

--
Cheers,
Stephen Rothwell
Re: [PATCH v3] kernel/module_64.c: Add REL24 relocation support of livepatch symbols
On Tuesday 31 October 2017 07:49 PM, Naveen N. Rao wrote:

Hi Kamalesh, Sorry for the late review. Overall, the patch looks good to me. So:

Acked-by: Naveen N. Rao

However, I have a few minor comments which can be addressed in a subsequent patch.

Thanks for the review.

[...]

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 0b0f896..005aaea 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
[...]
@@ -239,10 +257,19 @@ static void relaswap(void *_x, void *_y, int size)
 /* Get size of potential trampolines required. */
 static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
-				    const Elf64_Shdr *sechdrs)
+				    const Elf64_Shdr *sechdrs,
+				    struct module *me)
 {
 	/* One extra reloc so it's always 0-funcaddr terminated */
 	unsigned long relocs = 1;

You might want to get rid of 'relocs' above and simply use the below two.

+	/*
+	 * size of livepatch stub is 28 instructions, whereas the
+	 * non-livepatch stub requires 7 instructions. Account for
+	 * different stub sizes and track the livepatch relocation
+	 * count in me->arch.klp_relocs.
+	 */
+	unsigned long sec_relocs = 0;
+	unsigned long klp_relocs = 0;

You should also initialize this to 1 (similar to relocs above) for use in the iterators below. Or, update the iterators to use me->arch.klp_relocs.

(1) relocs is the sum of the per-section sec_relocs values; each section's relocation count is assigned to sec_relocs. If the section is a livepatch section, it is added to klp_relocs, otherwise to relocs. sec_relocs acts as a temporary variable holding the current section's relocation count.

 	unsigned i;

 	/* Every relocated section...
 */
@@ -262,9 +289,14 @@ static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
 			     sechdrs[i].sh_size / sizeof(Elf64_Rela),
 			     sizeof(Elf64_Rela), relacmp, relaswap);

-		relocs += count_relocs((void *)sechdrs[i].sh_addr,
+		sec_relocs = count_relocs((void *)sechdrs[i].sh_addr,
 				       sechdrs[i].sh_size / sizeof(Elf64_Rela));

How about also capturing 'sec_relocs' in 'struct mod_arch_specific'? (2) That will help simplify a lot of the calculations here and elsewhere. Note that there are many other places where the number of stubs is derived looking at 'sh_size', which is incorrect since we now have klp stubs as well (create_ftrace_stub() for instance).

+		relocs += sec_relocs;
+#ifdef CONFIG_LIVEPATCH
+		if (sechdrs[i].sh_flags & SHF_RELA_LIVEPATCH)
+			klp_relocs += sec_relocs;
+#endif

If a module has SHF_RELA_LIVEPATCH, but the kernel is compiled without CONFIG_LIVEPATCH, should we refuse to load the kernel module?

Yes, the module load will fail.

You might want to consider removing the above #ifdef and processing some of these flags regardless of CONFIG_LIVEPATCH.

The #ifdef guard can be replaced with a check for SHF_RELA_LIVEPATCH in the section flags.

 	}
 }
@@ -273,6 +305,15 @@ static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
 	relocs++;
 #endif

+	relocs -= klp_relocs;
+#ifdef CONFIG_LIVEPATCH
+	me->arch.klp_relocs = klp_relocs;
+
+	pr_debug("Looks like a total of %lu stubs, (%lu) livepatch stubs, max\n",
                                                  ^ (%lu livepatch stubs)

Just wondering why the brackets are the way they are...

Using the brackets as you suggested reads better.

+		 relocs, klp_relocs);
+	return (relocs * sizeof(struct ppc64_stub_entry) +
+		klp_relocs * sizeof(struct ppc64le_klp_stub_entry));
+#endif
 	pr_debug("Looks like a total of %lu stubs, max\n", relocs);
 	return relocs * sizeof(struct ppc64_stub_entry);
 }

[...]
+#ifdef CONFIG_LIVEPATCH
+static unsigned long klp_stub_for_addr(const Elf64_Shdr *sechdrs,
+				       unsigned long addr,
+				       struct module *me)
+{
+	struct ppc64le_klp_stub_entry *klp_stubs;
+	unsigned int num_klp_stubs = me->arch.klp_relocs;
+	unsigned int i, num_stubs;
+
+	num_stubs = (sechdrs[me->arch.stubs_section].sh_size -
+		     (num_klp_stubs * sizeof(*klp_stubs))) /
+		     sizeof(struct ppc64_stub_entry);

(2) This can be simplified if we have me->arch.sec_relocs. Having it will make the calculation look simple:

num_stubs = (me->arch.sec_relocs * sizeof(struct ppc64_stub_entry));

+
+	/*
+	 * Create livepatch stubs after the regular stubs.
+	 */
+
Re: [PATCH 1/2] powerpc: Don't enable FP/Altivec if not checkpointed
You are welcome! :)

On Thu, Nov 02, 2017 at 01:44:07PM +1100, Cyril Bur wrote:
> On Thu, 2017-11-02 at 10:19 +0800, kbuild test robot wrote:
> > Hi Cyril,
> >
> > Thank you for the patch! Yet something to improve:
>
> Once again robot, you have done brilliantly! You're 100% correct and
> the last thing I want to do is break the build with
> CONFIG_PPC_TRANSACTIONAL_MEM turned off.
>
> Life saver,
> Thanks so much kbuild.
>
> Cyril
>
> > [auto build test ERROR on powerpc/next]
> > [also build test ERROR on v4.14-rc7 next-20171018]
> > [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> >
> > url: https://github.com/0day-ci/linux/commits/Cyril-Bur/powerpc-Don-t-enable-FP-Altivec-if-not-checkpointed/20171102-073816
> > base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
> > config: powerpc-asp8347_defconfig (attached as .config)
> > compiler: powerpc-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
> > reproduce:
> >         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> >         chmod +x ~/bin/make.cross
> >         # save the attached .config to linux build tree
> >         make.cross ARCH=powerpc
> >
> > All errors (new ones prefixed by >>):
> >
> >    arch/powerpc/kernel/process.c: In function 'is_transactionally_fp':
> > >> arch/powerpc/kernel/process.c:243:15: error: 'struct thread_struct' has no member named 'ckpt_regs'
> >      (tsk->thread.ckpt_regs.msr & MSR_FP);
> >                  ^
> >    arch/powerpc/kernel/process.c:244:1: error: control reaches end of non-void function [-Werror=return-type]
> >     }
> >     ^
> >    cc1: all warnings being treated as errors
> >
> > vim +243 arch/powerpc/kernel/process.c
> >
> >    239
> >    240	static int is_transactionally_fp(struct task_struct *tsk)
> >    241	{
> >    242		return msr_tm_active(tsk->thread.regs->msr) &&
> >  > 243			(tsk->thread.ckpt_regs.msr & MSR_FP);
> >    244	}
> >    245
> >
> > ---
> > 0-DAY kernel test infrastructure                Open Source Technology Center
> > https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
[PATCH v2] powerpc/powernv: Improve local TLB flush for boot and MCE on POWER9
There are two cases outside the normal address space management where a CPU's local TLB is to be flushed:

1. Booting the kernel, in case something has left stale entries in the TLB (e.g., kexec).

2. Machine check, to clean corrupted TLB entries.

CPU state restore from deep idle states also currently flushes the TLB. However, this is just a side-effect of reusing the boot code to set CPU state, rather than a requirement itself.

This type of TLB flush is coded inflexibly, several times for each CPU type, and the implementations have a number of problems with ISA v3.0B:

- The current radix mode of the MMU is not taken into account. tlbiel is undefined if the R field does not match the current radix mode.

- ISA v3.0B hash must flush the partition and process table caches.

- ISA v3.0B radix must flush partition and process scoped translations, the partition and process table caches, and also the page walk cache.

To improve this situation, consolidate the flushing code and implement it in C and inline asm under the mm/ directory with the rest of the flush code. Add ISA v3.0B cases for radix and hash. Take it out of the early cputable detection hooks, and move it later in the boot process after the MMU registers are set up and before relocation is first turned on. Provide a capability for the LPID flush to specify radix mode.

The TLB flush is no longer called when restoring from deep idle states.
Signed-off-by: Nicholas Piggin --- Booted this on power8 and power9 with hash and radix arch/powerpc/include/asm/book3s/64/tlbflush-hash.h | 1 + .../powerpc/include/asm/book3s/64/tlbflush-radix.h | 3 + arch/powerpc/include/asm/book3s/64/tlbflush.h | 34 ++ arch/powerpc/include/asm/cputable.h| 12 --- arch/powerpc/kernel/cpu_setup_power.S | 50 - arch/powerpc/kernel/cputable.c | 14 --- arch/powerpc/kernel/dt_cpu_ftrs.c | 30 -- arch/powerpc/kernel/mce_power.c| 115 + arch/powerpc/kvm/book3s_hv_ras.c | 6 +- arch/powerpc/mm/hash_native_64.c | 97 + arch/powerpc/mm/hash_utils_64.c| 8 ++ arch/powerpc/mm/pgtable-radix.c| 6 ++ arch/powerpc/mm/tlb-radix.c| 66 13 files changed, 219 insertions(+), 223 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h index 99c99bb04353..f0e17f3f3f2f 100644 --- a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h +++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h @@ -50,6 +50,7 @@ static inline void arch_leave_lazy_mmu_mode(void) #define arch_flush_lazy_mmu_mode() do {} while (0) +extern void hash__tlbiel_all(unsigned int action); extern void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize, int ssize, unsigned long flags); diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h index af06c6fe8a9f..db025dd7f619 100644 --- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h +++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h @@ -10,6 +10,8 @@ static inline int mmu_get_ap(int psize) return mmu_psize_defs[psize].ap; } +extern void radix__tlbiel_all(unsigned int action); + extern void radix__flush_hugetlb_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned long end); extern void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start, @@ -46,4 +48,5 @@ extern void radix__flush_tlb_lpid(unsigned long lpid); extern void radix__flush_tlb_all(void); extern 
void radix__flush_tlb_pte_p9_dd1(unsigned long old_pte, struct mm_struct *mm, unsigned long address); + #endif diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h index 70760d018bcd..d9f3dc35d8fd 100644 --- a/arch/powerpc/include/asm/book3s/64/tlbflush.h +++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h @@ -7,6 +7,40 @@ #include #include +/* TLB flush actions. Used as argument to tlbiel_all() */ +enum { + TLB_INVAL_SCOPE_GLOBAL = 0, /* invalidate all TLBs */ + TLB_INVAL_SCOPE_LPID = 1, /* invalidate TLBs for current LPID */ +}; + +static inline void tlbiel_all(void) +{ + /* +* This is used for host machine check and bootup. +* +* This uses early_radix_enabled and implementations use +* early_cpu_has_feature etc because that works early in boot +* and this is the machine check path which is not performance +* critical. +*/ + if (early_radix_enabled()) + radix__tlbiel_all(TLB_INVAL_SCOPE_GLOBAL); + else + hash__tlbiel_all(TLB_INVAL_SCOPE_GLOBAL); +} + +static
Re: [RFC 2/2] powerpc/mm: Enable deferred flushing of TLB during reclaim
On 11/01/2017 03:47 PM, Anshuman Khandual wrote: > Deferred flushing can only be enabled on POWER9 DD2.0 processor onwards. > Because prior versions of POWER9 and previous hash table based POWER > processors will do TLB flushing in pte_get_and_clear() function itself > which then prevents batching and eventual flush completion later on. > > Signed-off-by: Anshuman Khandual > --- > arch/powerpc/Kconfig| 1 + > arch/powerpc/include/asm/tlbbatch.h | 30 +++ > arch/powerpc/include/asm/tlbflush.h | 3 +++ > arch/powerpc/mm/tlb-radix.c | 49 > + > 4 files changed, 83 insertions(+) > create mode 100644 arch/powerpc/include/asm/tlbbatch.h > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > index 809c468..f06b565 100644 > --- a/arch/powerpc/Kconfig > +++ b/arch/powerpc/Kconfig > @@ -230,6 +230,7 @@ config PPC > select SPARSE_IRQ > select SYSCTL_EXCEPTION_TRACE > select VIRT_TO_BUS if !PPC64 > + select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH if (PPC64 && PPC_BOOK3S) > # > # Please keep this list sorted alphabetically. > # > diff --git a/arch/powerpc/include/asm/tlbbatch.h > b/arch/powerpc/include/asm/tlbbatch.h > new file mode 100644 > index 000..fc762ef > --- /dev/null > +++ b/arch/powerpc/include/asm/tlbbatch.h > @@ -0,0 +1,30 @@ > +#ifndef _ARCH_POWERPC_TLBBATCH_H > +#define _ARCH_POWERPC_TLBBATCH_H > + > +#include > + > +#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH > + > +#define MAX_BATCHED_MM 1024 > + > +struct arch_tlbflush_unmap_batch { > + /* > + * Each bit set is a CPU that potentially has a > + * TLB entry for one of the PFN being flushed. > + * This represents whether all deferred struct > + * mm will be flushed for any given CPU. 
> + */ > + struct cpumask cpumask; > + > + /* All the deferred struct mm */ > + struct mm_struct *mm[MAX_BATCHED_MM]; > + unsigned long int nr_mm; > + > +}; > + > +extern bool arch_tlbbatch_should_defer(struct mm_struct *mm); > +extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch); > +extern void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch, > + struct mm_struct *mm); > +#endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */ > +#endif /* _ARCH_POWERPC_TLBBATCH_H */ > diff --git a/arch/powerpc/include/asm/tlbflush.h > b/arch/powerpc/include/asm/tlbflush.h > index 13dbcd4..2041923 100644 > --- a/arch/powerpc/include/asm/tlbflush.h > +++ b/arch/powerpc/include/asm/tlbflush.h > @@ -20,6 +20,9 @@ > */ > #ifdef __KERNEL__ > > +#include > +#include > + > #ifdef CONFIG_PPC_MMU_NOHASH > /* > * TLB flushing for software loaded TLB chips > diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c > index b3e849c..506e7ed 100644 > --- a/arch/powerpc/mm/tlb-radix.c > +++ b/arch/powerpc/mm/tlb-radix.c > @@ -12,6 +12,8 @@ > #include > #include > #include > +#include > +#include > > #include > #include > @@ -519,3 +521,50 @@ extern void radix_kvm_prefetch_workaround(struct > mm_struct *mm) > } > EXPORT_SYMBOL_GPL(radix_kvm_prefetch_workaround); > #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ > + > +#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH > +static void clear_tlb(void *data) > +{ > + struct arch_tlbflush_unmap_batch *batch = data; > + int i; > + > + WARN_ON(!radix_enabled() || cpu_has_feature(CPU_FTR_POWER9_DD1)); > + > + for (i = 0; i < batch->nr_mm; i++) { > + if (batch->mm[i]) > + radix__local_flush_tlb_mm(batch->mm[i]); > + } > +} Instead of clearing each affected 'struct mm' on the CPU, we can just TLB flush the entire CPU for the given partition. But it's not really giving any improvement over the flushing mechanism described above.
diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c index 506e7ed..b0eb218 100644 --- a/arch/powerpc/mm/tlb-radix.c +++ b/arch/powerpc/mm/tlb-radix.c @@ -525,15 +525,8 @@ extern void radix_kvm_prefetch_workaround(struct mm_struct *mm) #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH static void clear_tlb(void *data) { - struct arch_tlbflush_unmap_batch *batch = data; - int i; - WARN_ON(!radix_enabled() || cpu_has_feature(CPU_FTR_POWER9_DD1)); - - for (i = 0; i < batch->nr_mm; i++) { - if (batch->mm[i]) - radix__local_flush_tlb_mm(batch->mm[i]); - } + cur_cpu_spec->flush_tlb(TLB_INVAL_SCOPE_LPID); }

$ time ./run case-lru-file-mmap-read
real	4m15.766s
user	108m6.967s
sys	393m15.152s
Re: [PATCH 1/2] powerpc: Don't enable FP/Altivec if not checkpointed
On Thu, Nov 02, 2017 at 01:44:07PM +1100, Cyril Bur wrote: > On Thu, 2017-11-02 at 10:19 +0800, kbuild test robot wrote: > > Hi Cyril, > > > > Thank you for the patch! Yet something to improve: > > > > Once again robot, you have done brilliantly! You're 100% correct and > the last thing I want to do is break the build with > CONFIG_PPC_TRANSACTIONAL_MEM turned off. > > Life saver, > Thanks so much kbuild. Thanks Cyril, we are glad to do some help. > > Cyril > > > [auto build test ERROR on powerpc/next] > > [also build test ERROR on v4.14-rc7 next-20171018] > > [if your patch is applied to the wrong git tree, please drop us a note to > > help improve the system] > > > > url: > > https://github.com/0day-ci/linux/commits/Cyril-Bur/powerpc-Don-t-enable-FP-Altivec-if-not-checkpointed/20171102-073816 > > base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git > > next > > config: powerpc-asp8347_defconfig (attached as .config) > > compiler: powerpc-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705 > > reproduce: > > wget > > https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O > > ~/bin/make.cross > > chmod +x ~/bin/make.cross > > # save the attached .config to linux build tree > > make.cross ARCH=powerpc > > > > All errors (new ones prefixed by >>): > > > >arch/powerpc/kernel/process.c: In function 'is_transactionally_fp': > > > > arch/powerpc/kernel/process.c:243:15: error: 'struct thread_struct' has > > > > no member named 'ckpt_regs' > > > > (tsk->thread.ckpt_regs.msr & MSR_FP); > > ^ > >arch/powerpc/kernel/process.c:244:1: error: control reaches end of > > non-void function [-Werror=return-type] > > } > > ^ > >cc1: all warnings being treated as errors > > > > vim +243 arch/powerpc/kernel/process.c > > > >239 > >240 static int is_transactionally_fp(struct task_struct *tsk) > >241 { > >242 return msr_tm_active(tsk->thread.regs->msr) && > > > 243 (tsk->thread.ckpt_regs.msr & MSR_FP); > >244 } > >245 > > > > --- > > 0-DAY kernel test 
infrastructureOpen Source Technology > > Center > > https://lists.01.org/pipermail/kbuild-all Intel > > Corporation >
Re: [RFC PATCH 0/7] powerpc/64s/radix TLB flush performance improvements
On Thu, 2 Nov 2017 08:49:49 +0530 Anshuman Khandual wrote: > On 11/01/2017 07:09 PM, Nicholas Piggin wrote: > > On Wed, 1 Nov 2017 17:35:51 +0530 > > Anshuman Khandual wrote: > > > >> On 10/31/2017 12:14 PM, Nicholas Piggin wrote: > >>> Here's a random mix of performance improvements for radix TLB flushing > >>> code. The main aims are to reduce the amount of translation that gets > >>> invalidated, and to reduce global flushes where we can do local. > >>> > >>> To that end, a parallel kernel compile benchmark using powerpc:tlbie > >>> tracepoint shows a reduction in tlbie instructions from about 290,000 > >>> to 80,000, and a reduction in tlbiel instructions from 49,500,000 to > >>> 15,000,000. Looks great, but unfortunately does not translate to a > >>> statistically significant performance improvement! The needle on TLB > >>> misses does not move much, I suspect because a lot of the flushing is > >>> done a startup and shutdown, and because a significant cost of TLB > >>> flushing itself is in the barriers. > >> > >> Does memory barrier initiate a single global invalidation with tlbie ? > >> > > > > I'm not quite sure what you're asking, and I don't know the details > > of how the hardware handles it, but from the measurements in patch > > 1 of the series we can see there is a benefit for both tlbie and > > tlbiel of batching them up between barriers. > > Ahh, I might have got the statement "a significant cost of TLB flushing > itself is in the barriers" wrong. I guess you were mentioning about the > total cost of multiple TLB flushes with memory barriers in between each > of them which is causing the high execution cost. This got reduced by > packing multiple tlbie(l) instruction between a single memory barrier. > Yes that did get reduced for the va range flush in my patches. However the big reduction in the number of tlbiel calls came from more use of range flushes and fewer use of PID flushes. But the PID flushes already have such optimization. 
Therefore despite tlbiel being reduced, the number of barriers probably has not gone down a great deal on this workload, which may explain why performance numbers are basically in the noise. Thanks, Nick
Re: [RFC PATCH 0/7] powerpc/64s/radix TLB flush performance improvements
On 11/01/2017 07:09 PM, Nicholas Piggin wrote: > On Wed, 1 Nov 2017 17:35:51 +0530 > Anshuman Khandual wrote: > >> On 10/31/2017 12:14 PM, Nicholas Piggin wrote: >>> Here's a random mix of performance improvements for radix TLB flushing >>> code. The main aims are to reduce the amount of translation that gets >>> invalidated, and to reduce global flushes where we can do local. >>> >>> To that end, a parallel kernel compile benchmark using powerpc:tlbie >>> tracepoint shows a reduction in tlbie instructions from about 290,000 >>> to 80,000, and a reduction in tlbiel instructions from 49,500,000 to >>> 15,000,000. Looks great, but unfortunately does not translate to a >>> statistically significant performance improvement! The needle on TLB >>> misses does not move much, I suspect because a lot of the flushing is >>> done a startup and shutdown, and because a significant cost of TLB >>> flushing itself is in the barriers. >> >> Does memory barrier initiate a single global invalidation with tlbie ? >> > > I'm not quite sure what you're asking, and I don't know the details > of how the hardware handles it, but from the measurements in patch > 1 of the series we can see there is a benefit for both tlbie and > tlbiel of batching them up between barriers. Ahh, I might have got the statement "a significant cost of TLB flushing itself is in the barriers" wrong. I guess you were mentioning about the total cost of multiple TLB flushes with memory barriers in between each of them which is causing the high execution cost. This got reduced by packing multiple tlbie(l) instruction between a single memory barrier.
[PATCH v3 3/4] powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint
Lazy save and restore of FP/Altivec means that a userspace process can be sent to userspace with FP or Altivec disabled and loaded only as required (by way of an FP/Altivec unavailable exception). Transactional Memory complicates this situation as a transaction could be started without FP/Altivec being loaded up. This causes the hardware to checkpoint incorrect registers. Handling FP/Altivec unavailable exceptions while a thread is transactional requires a reclaim and recheckpoint to ensure the CPU has correct state for both sets of registers.

tm_reclaim() has optimisations to not always save the FP/Altivec registers to the checkpointed save area. This was originally done because the caller might have information that the checkpointed registers aren't valid due to lazy save and restore. We've also been a little vague as to how tm_reclaim() leaves the FP/Altivec state since it doesn't necessarily always save it to the thread struct. This has led to an (incorrect) assumption that it leaves the checkpointed state on the CPU.

tm_recheckpoint() has similar optimisations in reverse. It may not always reload the checkpointed FP/Altivec registers from the thread struct before the trecheckpoint. It is therefore quite unclear where it expects to get the state from. This didn't help with the assumption made about tm_reclaim().

These optimisations sit in what is by definition a slow path. If a process has to go through a reclaim/recheckpoint then its transaction will be doomed on returning to userspace. This means that the process will be unable to complete its transaction and will be forced to its failure handler. This is already an out-of-line case for userspace. Furthermore, the cost of copying 64 times 128 bits from registers isn't very long[0] (at all) on modern processors. As such it appears these optimisations have only served to increase code complexity and are unlikely to have had a measurable performance impact.
Our transactional memory handling has been riddled with bugs. A cause of this has been difficulty in following the code flow; code complexity has not been our friend here. It makes sense to remove these optimisations in favour of a (hopefully) more stable implementation.

This patch does mean that sometimes the assembly will needlessly save 'junk' registers which will subsequently get overwritten with the correct value by the C code which calls the assembly function. This small inefficiency is far outweighed by the reduction in complexity for general TM code, context switching paths, and the transactional facility unavailable exception handler.

0: I tried to measure it once for other work and found that it was hiding in the noise of everything else I was working with. I find it exceedingly likely this will be the case here.

Signed-off-by: Cyril Bur --- V2: Unchanged V3: Unchanged arch/powerpc/include/asm/tm.h | 5 ++-- arch/powerpc/kernel/process.c | 22 ++- arch/powerpc/kernel/signal_32.c | 2 +- arch/powerpc/kernel/signal_64.c | 2 +- arch/powerpc/kernel/tm.S| 59 - arch/powerpc/kernel/traps.c | 26 +- 6 files changed, 35 insertions(+), 81 deletions(-) diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h index 82e06ca3a49b..33d965911bec 100644 --- a/arch/powerpc/include/asm/tm.h +++ b/arch/powerpc/include/asm/tm.h @@ -11,10 +11,9 @@ extern void tm_enable(void); extern void tm_reclaim(struct thread_struct *thread, - unsigned long orig_msr, uint8_t cause); + uint8_t cause); extern void tm_reclaim_current(uint8_t cause); -extern void tm_recheckpoint(struct thread_struct *thread, - unsigned long orig_msr); +extern void tm_recheckpoint(struct thread_struct *thread); extern void tm_abort(uint8_t cause); extern void tm_save_sprs(struct thread_struct *thread); extern void tm_restore_sprs(struct thread_struct *thread); diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index bf651f2fd3bd..b00c291cd05c 100644 ---
a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -869,6 +869,8 @@ static void tm_reclaim_thread(struct thread_struct *thr, giveup_all(container_of(thr, struct task_struct, thread)); + tm_reclaim(thr, cause); + /* * If we are in a transaction and FP is off then we can't have * used FP inside that transaction. Hence the checkpointed @@ -887,8 +889,6 @@ static void tm_reclaim_thread(struct thread_struct *thr, if ((thr->ckpt_regs.msr & MSR_VEC) == 0) memcpy(&thr->ckvr_state, &thr->vr_state, sizeof(struct thread_vr_state)); - - tm_reclaim(thr, thr->ckpt_regs.msr, cause); } void tm_reclaim_current(uint8_t cause) @@ -937,11 +937,9 @@ static inline void tm_reclaim_task(struct task_struct *tsk) tm_save_sprs(thr); } -extern void __tm_recheckpoint(struct thread_struct *th
[PATCH v3 4/4] powerpc: Remove facility loadups on transactional {fp, vec, vsx} unavailable
After handling a transactional FP, Altivec or VSX unavailable exception, the return to userspace code will detect that the TIF_RESTORE_TM bit is set and call restore_tm_state(). restore_tm_state() will call restore_math() to ensure that the correct facilities are loaded.

This means that all the loadup code in {fp,altivec,vsx}_unavailable_tm() is doing pointless work and can simply be removed.

Signed-off-by: Cyril Bur --- V2: Obvious cleanup which should have been in v1 V3: Unchanged arch/powerpc/kernel/traps.c | 30 -- 1 file changed, 30 deletions(-) diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index 4a7bc64352fd..3181e85ef17c 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -1471,12 +1471,6 @@ void facility_unavailable_exception(struct pt_regs *regs) void fp_unavailable_tm(struct pt_regs *regs) { - /* -* Save the MSR now because tm_reclaim_current() is likely to -* change it -*/ - unsigned long orig_msr = regs->msr; - /* Note: This does not handle any kind of FP laziness. */ TM_DEBUG("FP Unavailable trap whilst transactional at 0x%lx, MSR=%lx\n", @@ -1502,24 +1496,10 @@ void fp_unavailable_tm(struct pt_regs *regs) * so we don't want to load the VRs from the thread_struct. */ tm_recheckpoint(&current->thread); - - /* If VMX is in use, get the transactional values back */ - if (orig_msr & MSR_VEC) { - msr_check_and_set(MSR_VEC); - load_vr_state(&current->thread.vr_state); - /* At this point all the VSX state is loaded, so enable it */ - regs->msr |= MSR_VSX; - } } void altivec_unavailable_tm(struct pt_regs *regs) { - /* -* Save the MSR now because tm_reclaim_current() is likely to -* change it -*/ - unsigned long orig_msr = regs->msr; - /* See the comments in fp_unavailable_tm(). This function operates * the same way.
*/ @@ -1531,12 +1511,6 @@ void altivec_unavailable_tm(struct pt_regs *regs) current->thread.load_vec = 1; tm_recheckpoint(&current->thread); current->thread.used_vr = 1; - - if (orig_msr & MSR_FP) { - msr_check_and_set(MSR_FP); - load_fp_state(&current->thread.fp_state); - regs->msr |= MSR_VSX; - } } void vsx_unavailable_tm(struct pt_regs *regs) @@ -1561,10 +1535,6 @@ void vsx_unavailable_tm(struct pt_regs *regs) current->thread.load_fp = 1; tm_recheckpoint(&current->thread); - - msr_check_and_set(MSR_FP | MSR_VEC); - load_fp_state(&current->thread.fp_state); - load_vr_state(&current->thread.vr_state); } #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ -- 2.15.0
[PATCH v3 2/4] powerpc: Force reload for recheckpoint during tm {fp, vec, vsx} unavailable exception
Lazy save and restore of FP/Altivec means that a userspace process can be sent to userspace with FP or Altivec disabled and loaded only as required (by way of an FP/Altivec unavailable exception). Transactional Memory complicates this situation as a transaction could be started without FP/Altivec being loaded up. This causes the hardware to checkpoint incorrect registers. Handling FP/Altivec unavailable exceptions while a thread is transactional requires a reclaim and recheckpoint to ensure the CPU has correct state for both sets of registers.

tm_reclaim() has optimisations to not always save the FP/Altivec registers to the checkpointed save area. This was originally done because the caller might have information that the checkpointed registers aren't valid due to lazy save and restore. We've also been a little vague as to how tm_reclaim() leaves the FP/Altivec state since it doesn't necessarily always save it to the thread struct. This has led to an (incorrect) assumption that it leaves the checkpointed state on the CPU.

tm_recheckpoint() has similar optimisations in reverse. It may not always reload the checkpointed FP/Altivec registers from the thread struct before the trecheckpoint. It is therefore quite unclear where it expects to get the state from. This didn't help with the assumption made about tm_reclaim().

This patch is a minimal fix for ease of backporting. A more correct fix which removes the msr parameter to tm_reclaim() and tm_recheckpoint() altogether has been upstreamed to apply on top of this patch.

Fixes: dc3106690b20 ("powerpc: tm: Always use fp_state and vr_state to store live registers")
Signed-off-by: Cyril Bur
---
V2: Add this patch for ease of backporting the same fix as the next patch.
V3: No change arch/powerpc/kernel/process.c | 4 ++-- arch/powerpc/kernel/traps.c | 22 +- 2 files changed, 19 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index cff887e67eb9..bf651f2fd3bd 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -867,6 +867,8 @@ static void tm_reclaim_thread(struct thread_struct *thr, if (!MSR_TM_SUSPENDED(mfmsr())) return; + giveup_all(container_of(thr, struct task_struct, thread)); + /* * If we are in a transaction and FP is off then we can't have * used FP inside that transaction. Hence the checkpointed @@ -886,8 +888,6 @@ static void tm_reclaim_thread(struct thread_struct *thr, memcpy(&thr->ckvr_state, &thr->vr_state, sizeof(struct thread_vr_state)); - giveup_all(container_of(thr, struct task_struct, thread)); - tm_reclaim(thr, thr->ckpt_regs.msr, cause); } diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index ef6a45969812..a7d42c89a257 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -1471,6 +1471,12 @@ void facility_unavailable_exception(struct pt_regs *regs) void fp_unavailable_tm(struct pt_regs *regs) { + /* +* Save the MSR now because tm_reclaim_current() is likely to +* change it +*/ + unsigned long orig_msr = regs->msr; + /* Note: This does not handle any kind of FP laziness. */ TM_DEBUG("FP Unavailable trap whilst transactional at 0x%lx, MSR=%lx\n", @@ -1495,10 +1501,10 @@ void fp_unavailable_tm(struct pt_regs *regs) * If VMX is in use, the VRs now hold checkpointed values, * so we don't want to load the VRs from the thread_struct. 
*/ - tm_recheckpoint(&current->thread, MSR_FP); + tm_recheckpoint(&current->thread, orig_msr | MSR_FP); /* If VMX is in use, get the transactional values back */ - if (regs->msr & MSR_VEC) { + if (orig_msr & MSR_VEC) { msr_check_and_set(MSR_VEC); load_vr_state(&current->thread.vr_state); /* At this point all the VSX state is loaded, so enable it */ @@ -1508,6 +1514,12 @@ void fp_unavailable_tm(struct pt_regs *regs) void altivec_unavailable_tm(struct pt_regs *regs) { + /* +* Save the MSR now because tm_reclaim_current() is likely to +* change it +*/ + unsigned long orig_msr = regs->msr; + /* See the comments in fp_unavailable_tm(). This function operates * the same way. */ @@ -1517,10 +1529,10 @@ void altivec_unavailable_tm(struct pt_regs *regs) regs->nip, regs->msr); tm_reclaim_current(TM_CAUSE_FAC_UNAV); current->thread.load_vec = 1; - tm_recheckpoint(&current->thread, MSR_VEC); + tm_recheckpoint(&current->thread, orig_msr | MSR_VEC); current->thread.used_vr = 1; - if (regs->msr & MSR_FP) { + if (orig_msr & MSR_FP) { msr_check_and_set(MSR_FP); load_fp_state(&current->thread.fp_state); regs->msr |= MSR_VSX; @@ -1559,7 +1571,7 @@ void vsx_u
[PATCH v3 1/4] powerpc: Don't enable FP/Altivec if not checkpointed
Lazy save and restore of FP/Altivec means that a userspace process can be sent to userspace with FP or Altivec disabled and loaded only as required (by way of an FP/Altivec unavailable exception). Transactional Memory complicates this situation as a transaction could be started without FP/Altivec being loaded up. This causes the hardware to checkpoint incorrect registers. Handling FP/Altivec unavailable exceptions while a thread is transactional requires a reclaim and recheckpoint to ensure the CPU has correct state for both sets of registers. Lazy save and restore of FP/Altivec cannot be done if a process is transactional. If a facility was enabled it must remain enabled whenever a thread is transactional. Commit dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use") ensures that the facilities are always enabled if a thread is transactional. A bug in the introduced code may cause it to inadvertently enable a facility that was (and should remain) disabled. The problem with this extraneous enablement is that the registers for the erroneously enabled facility have not been correctly recheckpointed - the recheckpointing code assumed the facility would remain disabled. Further compounding the issue, the transactional {fp,altivec,vsx} unavailable code has been incorrectly using the MSR to enable facilities. The presence of the {FP,VEC,VSX} bit in the regs->msr simply indicates whether the registers are live on the CPU, not whether the kernel should load them before returning to userspace. This has worked due to the bug mentioned above. This causes transactional threads which return to their failure handler to observe incorrect checkpointed registers. Perhaps an example will help illustrate the problem: A userspace process is running and uses both FP and Altivec registers. This process then continues to run for some time without touching either set of registers. The kernel subsequently disables the facilities as part of lazy save and restore.
The userspace process then performs a tbegin and the CPU checkpoints 'junk' FP and Altivec registers. The process then performs a floating point instruction triggering an FP unavailable exception in the kernel. The kernel then loads the FP registers - and only the FP registers. Since the thread is transactional it must perform a reclaim and recheckpoint to ensure both the checkpointed registers and the transactional registers are correct. It then (correctly) enables MSR[FP] for the process. Later (on exception exit) the kernel also (inadvertently) enables MSR[VEC]. The process is then returned to userspace. Since the act of loading the FP registers doomed the transaction we know the CPU will fail the transaction, restore its checkpointed registers, and return the process to its failure handler. The problem is that we're now running with Altivec enabled and the 'junk' checkpointed registers are restored. The kernel had only recheckpointed FP. This patch solves this by only activating FP/Altivec if userspace was using them when it entered the kernel and not simply if the process is transactional.
Fixes: dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use") Signed-off-by: Cyril Bur --- V2: Rather than incorrectly using the MSR to enable {FP,VEC,VSX} use the load_fp and load_vec booleans to help restore_math() make the correct decision V3: Put tm_active_with_{fp,altivec}() inside a #ifdef CONFIG_PPC_TRANSACTIONAL_MEM arch/powerpc/kernel/process.c | 18 -- arch/powerpc/kernel/traps.c | 8 2 files changed, 20 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index a0c74bbf3454..cff887e67eb9 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -97,9 +97,23 @@ static inline bool msr_tm_active(unsigned long msr) { return MSR_TM_ACTIVE(msr); } + +static bool tm_active_with_fp(struct task_struct *tsk) +{ + return msr_tm_active(tsk->thread.regs->msr) && + (tsk->thread.ckpt_regs.msr & MSR_FP); +} + +static bool tm_active_with_altivec(struct task_struct *tsk) +{ + return msr_tm_active(tsk->thread.regs->msr) && + (tsk->thread.ckpt_regs.msr & MSR_VEC); +} #else static inline bool msr_tm_active(unsigned long msr) { return false; } static inline void check_if_tm_restore_required(struct task_struct *tsk) { } +static inline bool tm_active_with_fp(struct task_struct *tsk) { return false; } +static inline bool tm_active_with_altivec(struct task_struct *tsk) { return false; } #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ bool strict_msr_control; @@ -232,7 +246,7 @@ EXPORT_SYMBOL(enable_kernel_fp); static int restore_fp(struct task_struct *tsk) { - if (tsk->thread.load_fp || msr_tm_active(tsk->thread.regs->msr)) { + if (tsk->thread.load_fp || tm_active_with_fp(tsk)) { load_fp_state(&current->thread.fp_state); current-
Re: [PATCH 1/2] powerpc: Don't enable FP/Altivec if not checkpointed
On Thu, 2017-11-02 at 10:19 +0800, kbuild test robot wrote: > Hi Cyril, > > Thank you for the patch! Yet something to improve: > Once again robot, you have done brilliantly! You're 100% correct and the last thing I want to do is break the build with CONFIG_PPC_TRANSACTIONAL_MEM turned off. Life saver, Thanks so much kbuild. Cyril > [auto build test ERROR on powerpc/next] > [also build test ERROR on v4.14-rc7 next-20171018] > [if your patch is applied to the wrong git tree, please drop us a note to > help improve the system] > > url: > https://github.com/0day-ci/linux/commits/Cyril-Bur/powerpc-Don-t-enable-FP-Altivec-if-not-checkpointed/20171102-073816 > base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next > config: powerpc-asp8347_defconfig (attached as .config) > compiler: powerpc-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705 > reproduce: > wget > https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O > ~/bin/make.cross > chmod +x ~/bin/make.cross > # save the attached .config to linux build tree > make.cross ARCH=powerpc > > All errors (new ones prefixed by >>): > >arch/powerpc/kernel/process.c: In function 'is_transactionally_fp': > > > arch/powerpc/kernel/process.c:243:15: error: 'struct thread_struct' has > > > no member named 'ckpt_regs' > > (tsk->thread.ckpt_regs.msr & MSR_FP); > ^ >arch/powerpc/kernel/process.c:244:1: error: control reaches end of > non-void function [-Werror=return-type] > } > ^ >cc1: all warnings being treated as errors > > vim +243 arch/powerpc/kernel/process.c > >239 >240static int is_transactionally_fp(struct task_struct *tsk) >241{ >242return msr_tm_active(tsk->thread.regs->msr) && > > 243(tsk->thread.ckpt_regs.msr & MSR_FP); >244} >245 > > --- > 0-DAY kernel test infrastructureOpen Source Technology Center > https://lists.01.org/pipermail/kbuild-all Intel Corporation
Re: [PATCH 1/2] powerpc: Don't enable FP/Altivec if not checkpointed
Hi Cyril, Thank you for the patch! Yet something to improve: [auto build test ERROR on powerpc/next] [also build test ERROR on v4.14-rc7 next-20171018] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Cyril-Bur/powerpc-Don-t-enable-FP-Altivec-if-not-checkpointed/20171102-073816 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next config: powerpc-asp8347_defconfig (attached as .config) compiler: powerpc-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=powerpc All errors (new ones prefixed by >>): arch/powerpc/kernel/process.c: In function 'is_transactionally_fp': >> arch/powerpc/kernel/process.c:243:15: error: 'struct thread_struct' has no >> member named 'ckpt_regs' (tsk->thread.ckpt_regs.msr & MSR_FP); ^ arch/powerpc/kernel/process.c:244:1: error: control reaches end of non-void function [-Werror=return-type] } ^ cc1: all warnings being treated as errors vim +243 arch/powerpc/kernel/process.c 239 240 static int is_transactionally_fp(struct task_struct *tsk) 241 { 242 return msr_tm_active(tsk->thread.regs->msr) && > 243 (tsk->thread.ckpt_regs.msr & MSR_FP); 244 } 245 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
[PATCH v2 2/2] powerpc/64s: idle skip POWER9 DD1 and DD2.0 specific workarounds on DD2.1
DD2.1 does not have to flush the ERAT after a state-loss idle. It also does not have to save and restore MMCR0 (although it does have to save and restore in deep idle states, like other PMU registers). Performance testing was done on a DD2.1 using only the stop0 idle state (the shallowest state which supports state loss), using the context_switch selftest configured to ping-pong between two threads on the same core and two different cores. Performance improvement for same core is 7.0%, different cores is 14.8%. Reviewed-by: Vaidyanathan Srinivasan Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/idle_book3s.S | 31 --- 1 file changed, 20 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S index 175d49f468af..59657783d1d5 100644 --- a/arch/powerpc/kernel/idle_book3s.S +++ b/arch/powerpc/kernel/idle_book3s.S @@ -112,12 +112,14 @@ power9_save_additional_sprs: std r4, STOP_HFSCR(r13) mfspr r3, SPRN_MMCRA - mfspr r4, SPRN_MMCR1 + mfspr r4, SPRN_MMCR0 std r3, STOP_MMCRA(r13) - std r4, STOP_MMCR1(r13) + std r4, _MMCR0(r1) - mfspr r3, SPRN_MMCR2 - std r3, STOP_MMCR2(r13) + mfspr r3, SPRN_MMCR1 + mfspr r4, SPRN_MMCR2 + std r3, STOP_MMCR1(r13) + std r4, STOP_MMCR2(r13) blr power9_restore_additional_sprs: @@ -135,11 +137,14 @@ power9_restore_additional_sprs: ld r4, STOP_MMCRA(r13) mtspr SPRN_HFSCR, r3 mtspr SPRN_MMCRA, r4 - /* We have already restored PACA_MMCR0 */ - ld r3, STOP_MMCR1(r13) - ld r4, STOP_MMCR2(r13) - mtspr SPRN_MMCR1, r3 - mtspr SPRN_MMCR2, r4 + + ld r3, _MMCR0(r1) + ld r4, STOP_MMCR1(r13) + mtspr SPRN_MMCR0, r3 + mtspr SPRN_MMCR1, r4 + + ld r3, STOP_MMCR2(r13) + mtspr SPRN_MMCR2, r3 blr /* @@ -350,6 +355,7 @@ power_enter_stop: b pnv_wakeup_noloss .Lhandle_esl_ec_set: +BEGIN_FTR_SECTION /* * POWER9 DD2 can incorrectly set PMAO when waking up after a * state-loss idle.
Saving and restoring MMCR0 over idle is a @@ -357,6 +363,7 @@ power_enter_stop: */ mfspr r4,SPRN_MMCR0 std r4,_MMCR0(r1) +END_FTR_SECTION_IFSET(CPU_FTR_POWER9_DD1 | CPU_FTR_POWER9_DD20) /* * Check if the requested state is a deep idle state. @@ -542,15 +549,17 @@ pnv_restore_hyp_resource_arch300: * then clear bit 60 in MMCRA to ensure the PMU starts running. */ blt cr3,1f +BEGIN_FTR_SECTION PPC_INVALIDATE_ERAT ld r1,PACAR1(r13) + ld r4,_MMCR0(r1) + mtspr SPRN_MMCR0,r4 +END_FTR_SECTION_IFSET(CPU_FTR_POWER9_DD1 | CPU_FTR_POWER9_DD20) mfspr r4,SPRN_MMCRA ori r4,r4,(1 << (63-60)) mtspr SPRN_MMCRA,r4 xori r4,r4,(1 << (63-60)) mtspr SPRN_MMCRA,r4 - ld r4,_MMCR0(r1) - mtspr SPRN_MMCR0,r4 1: /* * POWER ISA 3. Use PSSCR to determine if we -- 2.15.0
[PATCH v2 1/2] powerpc: add POWER9_DD20 feature
Cc: Michael Neuling Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/cputable.h | 5 - arch/powerpc/kernel/cputable.c | 20 arch/powerpc/kernel/dt_cpu_ftrs.c | 2 ++ 3 files changed, 26 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index a9bf921f4efc..194dc3006446 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -215,6 +215,7 @@ enum { #define CPU_FTR_DABRX LONG_ASM_CONST(0x0800) #define CPU_FTR_PMAO_BUG LONG_ASM_CONST(0x1000) #define CPU_FTR_POWER9_DD1 LONG_ASM_CONST(0x4000) +#define CPU_FTR_POWER9_DD20LONG_ASM_CONST(0x8000) #ifndef __ASSEMBLY__ @@ -477,6 +478,7 @@ enum { CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_ARCH_300) #define CPU_FTRS_POWER9_DD1 ((CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD1) & \ (~CPU_FTR_SAO)) +#define CPU_FTRS_POWER9_DD20 (CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD20) #define CPU_FTRS_CELL (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \ CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \ @@ -495,7 +497,8 @@ enum { (CPU_FTRS_POWER4 | CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \ CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \ CPU_FTRS_POWER8 | CPU_FTRS_POWER8_DD1 | CPU_FTRS_CELL | \ -CPU_FTRS_PA6T | CPU_FTR_VSX | CPU_FTRS_POWER9 | CPU_FTRS_POWER9_DD1) +CPU_FTRS_PA6T | CPU_FTR_VSX | CPU_FTRS_POWER9 | \ +CPU_FTRS_POWER9_DD1 | CPU_FTRS_POWER9_DD20) #endif #else enum { diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c index 760872916013..171820190de7 100644 --- a/arch/powerpc/kernel/cputable.c +++ b/arch/powerpc/kernel/cputable.c @@ -547,6 +547,26 @@ static struct cpu_spec __initdata cpu_specs[] = { .machine_check_early= __machine_check_early_realmode_p9, .platform = "power9", }, + { /* Power9 DD2.0 */ + .pvr_mask = 0xefff, + .pvr_value = 0x004e0200, + .cpu_name = "POWER9 (raw)", + .cpu_features = CPU_FTRS_POWER9_DD20, + .cpu_user_features = COMMON_USER_POWER9, + 
.cpu_user_features2 = COMMON_USER2_POWER9, + .mmu_features = MMU_FTRS_POWER9, + .icache_bsize = 128, + .dcache_bsize = 128, + .num_pmcs = 6, + .pmc_type = PPC_PMC_IBM, + .oprofile_cpu_type = "ppc64/power9", + .oprofile_type = PPC_OPROFILE_INVALID, + .cpu_setup = __setup_cpu_power9, + .cpu_restore= __restore_cpu_power9, + .flush_tlb = __flush_tlb_power9, + .machine_check_early= __machine_check_early_realmode_p9, + .platform = "power9", + }, { /* Power9 */ .pvr_mask = 0x, .pvr_value = 0x004e, diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c index 7275fed271af..63b9d7edd63f 100644 --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -735,6 +735,8 @@ static __init void cpufeatures_cpu_quirks(void) */ if ((version & 0xff00) == 0x004e0100) cur_cpu_spec->cpu_features |= CPU_FTR_POWER9_DD1; + else if ((version & 0xefff) == 0x004e0200) + cur_cpu_spec->cpu_features |= CPU_FTR_POWER9_DD20; } static void __init cpufeatures_setup_finished(void) -- 2.15.0
[PATCH v2 0/2] Add POWER9 DD2.0 feature, remove idle workarounds in DD2.1
Changes since v1: - Changed the unnecessarily 'inverted else' conditions for the DD2.0 feature workarounds, noted by mpe. - One of the cases was incorrect, removing the code for DD2.0 and leaving it in for DD2.1. Performance is further slightly improved with this case fixed. Nicholas Piggin (2): powerpc: add POWER9_DD20 feature powerpc/64s: idle skip POWER9 DD1 and DD2.0 specific workarounds on DD2.1 arch/powerpc/include/asm/cputable.h | 5 - arch/powerpc/kernel/cputable.c | 20 arch/powerpc/kernel/dt_cpu_ftrs.c | 2 ++ arch/powerpc/kernel/idle_book3s.S | 31 --- 4 files changed, 46 insertions(+), 12 deletions(-) -- 2.15.0
RE: [PATCH v10 0/4] this patchset is to remove PPCisms for QEIC
On Wed, 1 Nov 2017, Thomas Gleixner wrote: > -Original Message- > From: Thomas Gleixner [mailto:t...@linutronix.de] > Sent: Thursday, November 02, 2017 1:10 AM > To: Qiang Zhao > Cc: Michael Ellerman ; Jason Cooper > ; Marc Zyngier ; > o...@buserror.net; linuxppc-dev@lists.ozlabs.org; Xiaobo Xie > ; linux-ker...@vger.kernel.org > Subject: RE: [PATCH v10 0/4] this patchset is to remove PPCisms for QEIC > > On Wed, 1 Nov 2017, Qiang Zhao wrote: > > Michael Ellerman wrote > > > > > > Qiang Zhao writes: > > > > > > > Hi all, > > > > > > > > Could anybody review this patchset and take action on them? Thank you! > > > > > > Who maintains this? I don't actually know, it's not powerpc code, or is > > > it? > > > > Yes, it's not powerpc code, it is irqchip code, maintained by Thomas, Jason > > and > Marc according to MAINTAINERS file. > > > > Hi Thomas, Jason and Marc, > > > > Could you keep an eye on this patchset? Thank you! > > It's on my radar, but I have zero capacity at the moment. Hopefully Marc can > spare a few cycles. > > Thanks, > > tglx Thank you! Best Regards Qiang Zhao
RE: [Patch v10] QE: remove PPCisms for QE
On 01/11/17 05:19PM, Julien Thierry wrote: > -Original Message- > From: Julien Thierry [mailto:julien.thie...@arm.com] > Sent: Wednesday, November 01, 2017 5:19 PM > To: Qiang Zhao ; o...@buserror.net > Cc: linuxppc-dev@lists.ozlabs.org; linux-arm-ker...@lists.infradead.org > Subject: Re: [Patch v10] QE: remove PPCisms for QE > > Hi Zhao, > > I just noticed a small nit. > > > /* wait for the QE_CR_FLG to clear */ > > - ret = spin_event_timeout((in_be32(&qe_immr->cp.cecr) & QE_CR_FLG) > == 0, > > - 100, 0); > > + ret = -EIO; > > + for (i = 0; i < 100; i++) { > > + if ((ioread32be(&qe_immr->cp.cecr) & QE_CR_FLG) == 0) { > > + ret = 0; > > + break; > > + } > > + udelay(1); > > + } > > + > > /* On timeout (e.g. failure), the expression will be false (ret == 0), > >otherwise it will be true (ret == 1). */ > > nit: > The comment here is no longer valid, on timeout ret == -EIO and on success 0. It > should probably be removed to avoid confusion. Thank you for the reminder; I had overlooked it and will fix it in the next version. BR -Qiang Zhao
Re: [PATCH] selftests/powerpc: Check FP/VEC on exception in TM
On Wed, 2017-11-01 at 15:23 -0400, Gustavo Romero wrote: > Add a self test to check if FP/VEC/VSX registers are sane (restored > correctly) after a FP/VEC/VSX unavailable exception is caught during a > transaction. > > This test checks all possibilities in a thread regarding the combination > of MSR.[FP|VEC] states in a thread and for each scenario raises a > FP/VEC/VSX unavailable exception in transactional state, verifying if > vs0 and vs32 registers, which are representatives of FP/VEC/VSX reg > sets, are not corrupted. > Thanks Gustavo, I do have one more thought on an improvement for this test, which is that: + /* Counter for busy wait */ + uint64_t counter = 0x1ff00; is a bit fragile; what we should do is have the test work out how long it should spin until it reliably gets a TM_CAUSE_FAC_UNAV failure and then use that for these tests. This will only become a problem if we were to change kernel heuristics, which is fine for now. I'll try to get that added soon but for now this test has proven too useful to delay adding as is.
> Signed-off-by: Gustavo Romero > Signed-off-by: Breno Leitao > Signed-off-by: Cyril Bur > --- > tools/testing/selftests/powerpc/tm/Makefile| 3 +- > .../testing/selftests/powerpc/tm/tm-unavailable.c | 368 > + > tools/testing/selftests/powerpc/tm/tm.h| 5 + > 3 files changed, 375 insertions(+), 1 deletion(-) > create mode 100644 tools/testing/selftests/powerpc/tm/tm-unavailable.c > > diff --git a/tools/testing/selftests/powerpc/tm/Makefile > b/tools/testing/selftests/powerpc/tm/Makefile > index 7bfcd45..24855c0 100644 > --- a/tools/testing/selftests/powerpc/tm/Makefile > +++ b/tools/testing/selftests/powerpc/tm/Makefile > @@ -2,7 +2,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr > tm-signal-context-chk-fpu > tm-signal-context-chk-vmx tm-signal-context-chk-vsx > > TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv > tm-signal-stack \ > - tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail \ > + tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable \ > $(SIGNAL_CONTEXT_CHK_TESTS) > > include ../../lib.mk > @@ -16,6 +16,7 @@ $(OUTPUT)/tm-syscall: CFLAGS += -I../../../../../usr/include > $(OUTPUT)/tm-tmspr: CFLAGS += -pthread > $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64 > $(OUTPUT)/tm-resched-dscr: ../pmu/lib.o > +$(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 > -Wno-error=uninitialized -mvsx > > SIGNAL_CONTEXT_CHK_TESTS := $(patsubst > %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS)) > $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S > diff --git a/tools/testing/selftests/powerpc/tm/tm-unavailable.c > b/tools/testing/selftests/powerpc/tm/tm-unavailable.c > new file mode 100644 > index 000..69a4e8c > --- /dev/null > +++ b/tools/testing/selftests/powerpc/tm/tm-unavailable.c > @@ -0,0 +1,368 @@ > +/* > + * Copyright 2017, Gustavo Romero, Breno Leitao, Cyril Bur, IBM Corp. > + * Licensed under GPLv2. 
> + * > + * Force FP, VEC and VSX unavailable exception during transaction in all > + * possible scenarios regarding the MSR.FP and MSR.VEC state, e.g. when FP > + * is enable and VEC is disable, when FP is disable and VEC is enable, and > + * so on. Then we check if the restored state is correctly set for the > + * FP and VEC registers to the previous state we set just before we entered > + * in TM, i.e. we check if it corrupts somehow the recheckpointed FP and > + * VEC/Altivec registers on abortion due to an unavailable exception in TM. > + * N.B. In this test we do not test all the FP/Altivec/VSX registers for > + * corruption, but only for registers vs0 and vs32, which are respectively > + * representatives of FP and VEC/Altivec reg sets. > + */ > + > +#define _GNU_SOURCE > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "tm.h" > + > +#define DEBUG 0 > + > +/* Unavailable exceptions to test in HTM */ > +#define FP_UNA_EXCEPTION 0 > +#define VEC_UNA_EXCEPTION1 > +#define VSX_UNA_EXCEPTION2 > + > +#define NUM_EXCEPTIONS 3 > + > +struct Flags { > + int touch_fp; > + int touch_vec; > + int result; > + int exception; > +} flags; > + > +bool expecting_failure(void) > +{ > + if (flags.touch_fp && flags.exception == FP_UNA_EXCEPTION) > + return false; > + > + if (flags.touch_vec && flags.exception == VEC_UNA_EXCEPTION) > + return false; > + > + /* If both FP and VEC are touched it does not mean that touching > + * VSX won't raise an exception. However since FP and VEC state > + * are already correctly loaded, the transaction is not aborted > + * (i.e. treclaimed/trecheckpointed) and MSR.VSX is just set as 1, > + * so a TM failure is not expected also in this case. > + */ > + if ((flags.touch_fp && flags.touch_vec) && > + flags.exception == VSX_UNA_EXCEPTION) > + return
linux-next: manual merge of the powerpc tree with the powerpc-fixes tree
Hi all, Today's linux-next merge of the powerpc tree got a conflict in: arch/powerpc/mm/tlb-radix.c between commit: 26e53d5ebe2e ("powerpc/64s/radix: Fix preempt imbalance in TLB flush") from the powerpc-fixes tree and commit: 6773027205ea ("powerpc/mm/radix: Drop unneeded NULL check") from the powerpc tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc arch/powerpc/mm/tlb-radix.c index d304028641a2,3a07d7a5e2fe.. --- a/arch/powerpc/mm/tlb-radix.c +++ b/arch/powerpc/mm/tlb-radix.c @@@ -359,8 -359,7 +359,8 @@@ void radix__flush_tlb_collapsed_pmd(str unsigned long pid, end; - pid = mm ? mm->context.id : 0; + pid = mm->context.id; + preempt_disable(); if (unlikely(pid == MMU_NO_CONTEXT)) goto no_context;
[PATCH] [net-next, v2] ibmvnic: Feature implementation of Vital Product Data (VPD) for the ibmvnic driver
This patch implements and enables VPD support for the ibmvnic driver. Moreover, it includes the implementation of suitable structs, signal transmission/handling and functions which allow the retrieval of firmware information from the ibmvnic card. Signed-off-by: Desnes A. Nunes do Rosario Signed-off-by: Thomas Falcon --- drivers/net/ethernet/ibm/ibmvnic.c | 135 - drivers/net/ethernet/ibm/ibmvnic.h | 27 +++- 2 files changed, 159 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index d0cff28..ee7a27b 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -573,6 +573,15 @@ static int reset_tx_pools(struct ibmvnic_adapter *adapter) return 0; } +static void release_vpd_data(struct ibmvnic_adapter *adapter) +{ + if (!adapter->vpd) + return; + + kfree(adapter->vpd->buff); + kfree(adapter->vpd); +} + static void release_tx_pools(struct ibmvnic_adapter *adapter) { struct ibmvnic_tx_pool *tx_pool; @@ -753,6 +762,8 @@ static void release_resources(struct ibmvnic_adapter *adapter) { int i; + release_vpd_data(adapter); + release_tx_pools(adapter); release_rx_pools(adapter); @@ -833,6 +844,53 @@ static int set_real_num_queues(struct net_device *netdev) return rc; } +static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter) +{ + struct device *dev = &adapter->vdev->dev; + union ibmvnic_crq crq; + dma_addr_t dma_addr; + int len; + + if (adapter->vpd->buff) + len = adapter->vpd->len; + + reinit_completion(&adapter->fw_done); + crq.get_vpd_size.first = IBMVNIC_CRQ_CMD; + crq.get_vpd_size.cmd = GET_VPD_SIZE; + ibmvnic_send_crq(adapter, &crq); + wait_for_completion(&adapter->fw_done); + + if (!adapter->vpd->buff) + adapter->vpd->buff = kzalloc(adapter->vpd->len, GFP_KERNEL); + else if (adapter->vpd->len != len) + adapter->vpd->buff = + krealloc(adapter->vpd->buff, +adapter->vpd->len, GFP_KERNEL); + + if (!adapter->vpd->buff) { + dev_err(dev, "Could not allocate VPD buffer\n"); + return
-ENOMEM; + } + + adapter->vpd->dma_addr = + dma_map_single(dev, adapter->vpd->buff, adapter->vpd->len, + DMA_FROM_DEVICE); + if (dma_mapping_error(dev, dma_addr)) { + dev_err(dev, "Could not map VPD buffer\n"); + return -ENOMEM; + } + + reinit_completion(&adapter->fw_done); + crq.get_vpd.first = IBMVNIC_CRQ_CMD; + crq.get_vpd.cmd = GET_VPD; + crq.get_vpd.ioba = cpu_to_be32(adapter->vpd->dma_addr); + crq.get_vpd.len = cpu_to_be32((u32)adapter->vpd->len); + ibmvnic_send_crq(adapter, &crq); + wait_for_completion(&adapter->fw_done); + + return 0; +} + static int init_resources(struct ibmvnic_adapter *adapter) { struct net_device *netdev = adapter->netdev; @@ -850,6 +908,10 @@ static int init_resources(struct ibmvnic_adapter *adapter) if (rc) return rc; + adapter->vpd = kzalloc(sizeof(*adapter->vpd), GFP_KERNEL); + if (!adapter->vpd) + return -ENOMEM; + adapter->map_id = 1; adapter->napi = kcalloc(adapter->req_rx_queues, sizeof(struct napi_struct), GFP_KERNEL); @@ -950,6 +1012,10 @@ static int ibmvnic_open(struct net_device *netdev) rc = __ibmvnic_open(netdev); netif_carrier_on(netdev); + + /* Vital Product Data (VPD) */ + ibmvnic_get_vpd(adapter); + mutex_unlock(&adapter->reset_lock); return rc; @@ -1878,11 +1944,15 @@ static int ibmvnic_get_link_ksettings(struct net_device *netdev, return 0; } -static void ibmvnic_get_drvinfo(struct net_device *dev, +static void ibmvnic_get_drvinfo(struct net_device *netdev, struct ethtool_drvinfo *info) { + struct ibmvnic_adapter *adapter = netdev_priv(netdev); + strlcpy(info->driver, ibmvnic_driver_name, sizeof(info->driver)); strlcpy(info->version, IBMVNIC_DRIVER_VERSION, sizeof(info->version)); + strlcpy(info->fw_version, adapter->fw_version, + sizeof(info->fw_version)); } static u32 ibmvnic_get_msglevel(struct net_device *netdev) @@ -3076,6 +3146,63 @@ static void send_cap_queries(struct ibmvnic_adapter *adapter) ibmvnic_send_crq(adapter, &crq); } +static void handle_vpd_size_rsp(union ibmvnic_crq *crq, + struct ibmvnic_adapter 
*adapter) +{ + struct device *dev = &adapter->vdev->dev; + + if (crq->get_vpd_size_rsp.rc.code) { + dev_err(dev, "Error retrieving VPD size, rc=%x\n", + crq->get_vpd_
[PATCH FEAT] ibmvnic: Feature implementation of Vital Product Data (VPD) for the ibmvnic driver
This patch implements and enables VPD support for the ibmvnic driver. Moreover, it includes the implementation of suitable structs, signal transmission/handling and functions which allow the retrieval of firmware information from the ibmvnic card. Co-Authored-By: Thomas Falcon --- drivers/net/ethernet/ibm/ibmvnic.c | 140 - drivers/net/ethernet/ibm/ibmvnic.h | 27 ++- 2 files changed, 164 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index d0cff28..120f3c0 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -107,6 +107,9 @@ static union sub_crq *ibmvnic_next_scrq(struct ibmvnic_adapter *, struct ibmvnic_sub_crq_queue *); static int ibmvnic_poll(struct napi_struct *napi, int data); static void send_map_query(struct ibmvnic_adapter *adapter); +static int ibmvnic_get_vpd(struct ibmvnic_adapter *); +static void handle_vpd_size_rsp(union ibmvnic_crq *, struct ibmvnic_adapter *); +static void handle_vpd_rsp(union ibmvnic_crq *,struct ibmvnic_adapter *); static void send_request_map(struct ibmvnic_adapter *, dma_addr_t, __be32, u8); static void send_request_unmap(struct ibmvnic_adapter *, u8); static void send_login(struct ibmvnic_adapter *adapter); @@ -573,6 +576,15 @@ static int reset_tx_pools(struct ibmvnic_adapter *adapter) return 0; } +static void release_vpd_data(struct ibmvnic_adapter *adapter) +{ + if(!adapter->vpd) + return; + + kfree(adapter->vpd->buff); + kfree(adapter->vpd); +} + static void release_tx_pools(struct ibmvnic_adapter *adapter) { struct ibmvnic_tx_pool *tx_pool; @@ -753,6 +765,8 @@ static void release_resources(struct ibmvnic_adapter *adapter) { int i; + release_vpd_data(adapter); + release_tx_pools(adapter); release_rx_pools(adapter); @@ -850,6 +864,10 @@ static int init_resources(struct ibmvnic_adapter *adapter) if (rc) return rc; + adapter->vpd = kzalloc(sizeof(struct ibmvnic_vpd), GFP_KERNEL); + if (!adapter->vpd) + return -ENOMEM; +
adapter->map_id = 1; adapter->napi = kcalloc(adapter->req_rx_queues, sizeof(struct napi_struct), GFP_KERNEL); @@ -950,6 +968,10 @@ static int ibmvnic_open(struct net_device *netdev) rc = __ibmvnic_open(netdev); netif_carrier_on(netdev); + + /* Vital Product Data (VPD) */ + ibmvnic_get_vpd(adapter); + mutex_unlock(&adapter->reset_lock); return rc; @@ -1878,11 +1900,15 @@ static int ibmvnic_get_link_ksettings(struct net_device *netdev, return 0; } -static void ibmvnic_get_drvinfo(struct net_device *dev, +static void ibmvnic_get_drvinfo(struct net_device *netdev, struct ethtool_drvinfo *info) { + struct ibmvnic_adapter *adapter = netdev_priv(netdev); + strlcpy(info->driver, ibmvnic_driver_name, sizeof(info->driver)); strlcpy(info->version, IBMVNIC_DRIVER_VERSION, sizeof(info->version)); + strlcpy(info->fw_version, adapter->fw_version, + sizeof(info->fw_version)); } static u32 ibmvnic_get_msglevel(struct net_device *netdev) @@ -2923,6 +2947,110 @@ static void send_login(struct ibmvnic_adapter *adapter) return; } +static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter) +{ + struct device *dev = &adapter->vdev->dev; + union ibmvnic_crq crq; + dma_addr_t dma_addr; + int len; + + if (adapter->vpd->buff) + len = adapter->vpd->len; + + reinit_completion(&adapter->fw_done); + crq.get_vpd_size.first = IBMVNIC_CRQ_CMD; + crq.get_vpd_size.cmd = GET_VPD_SIZE; + ibmvnic_send_crq(adapter, &crq); + wait_for_completion(&adapter->fw_done); + + if (!adapter->vpd->buff) + adapter->vpd->buff = kzalloc(adapter->vpd->len, GFP_KERNEL); + else if (adapter->vpd->len != len) + adapter->vpd->buff = + krealloc(adapter->vpd->buff, +adapter->vpd->len, GFP_KERNEL); + + if (!adapter->vpd->buff) { + dev_err(dev, "Could allocate VPD buffer\n"); + return -ENOMEM; + } + + adapter->vpd->dma_addr = + dma_map_single(dev, adapter->vpd->buff, adapter->vpd->len, + DMA_FROM_DEVICE); + if (dma_mapping_error(dev, dma_addr)) { + dev_err(dev, "Could not map VPD buffer\n"); + return -ENOMEM; + } + + 
reinit_completion(&adapter->fw_done); + crq.get_vpd.first = IBMVNIC_CRQ_CMD; + crq.get_vpd.cmd = GET_VPD; + crq.get_vpd.ioba = cpu_to_be32(adapter->vpd->dma_addr); + crq.get_vpd.len = cpu_to_be32((u32)adapter->vpd->len); + ibmvnic_send_crq(adapter,
[PATCH] selftests/powerpc: Check FP/VEC on exception in TM
Add a selftest to check that FP/VEC/VSX registers are sane (restored correctly) after a FP/VEC/VSX unavailable exception is caught during a transaction. This test checks all possibilities regarding the combination of MSR.[FP|VEC] states in a thread and for each scenario raises a FP/VEC/VSX unavailable exception in transactional state, verifying that vs0 and vs32 registers, which are representatives of FP/VEC/VSX reg sets, are not corrupted. Signed-off-by: Gustavo Romero Signed-off-by: Breno Leitao Signed-off-by: Cyril Bur --- tools/testing/selftests/powerpc/tm/Makefile| 3 +- .../testing/selftests/powerpc/tm/tm-unavailable.c | 368 + tools/testing/selftests/powerpc/tm/tm.h| 5 + 3 files changed, 375 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/powerpc/tm/tm-unavailable.c diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile index 7bfcd45..24855c0 100644 --- a/tools/testing/selftests/powerpc/tm/Makefile +++ b/tools/testing/selftests/powerpc/tm/Makefile @@ -2,7 +2,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu tm-signal-context-chk-vmx tm-signal-context-chk-vsx TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \ - tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail \ + tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable \ $(SIGNAL_CONTEXT_CHK_TESTS) include ../../lib.mk @@ -16,6 +16,7 @@ $(OUTPUT)/tm-syscall: CFLAGS += -I../../../../../usr/include $(OUTPUT)/tm-tmspr: CFLAGS += -pthread $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64 $(OUTPUT)/tm-resched-dscr: ../pmu/lib.o +$(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 -Wno-error=uninitialized -mvsx SIGNAL_CONTEXT_CHK_TESTS := $(patsubst %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS)) $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S diff --git a/tools/testing/selftests/powerpc/tm/tm-unavailable.c b/tools/testing/selftests/powerpc/tm/tm-unavailable.c new file mode
100644 index 000..69a4e8c --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/tm-unavailable.c @@ -0,0 +1,368 @@ +/* + * Copyright 2017, Gustavo Romero, Breno Leitao, Cyril Bur, IBM Corp. + * Licensed under GPLv2. + * + * Force FP, VEC and VSX unavailable exception during transaction in all + * possible scenarios regarding the MSR.FP and MSR.VEC state, e.g. when FP + * is enable and VEC is disable, when FP is disable and VEC is enable, and + * so on. Then we check if the restored state is correctly set for the + * FP and VEC registers to the previous state we set just before we entered + * in TM, i.e. we check if it corrupts somehow the recheckpointed FP and + * VEC/Altivec registers on abortion due to an unavailable exception in TM. + * N.B. In this test we do not test all the FP/Altivec/VSX registers for + * corruption, but only for registers vs0 and vs32, which are respectively + * representatives of FP and VEC/Altivec reg sets. + */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include + +#include "tm.h" + +#define DEBUG 0 + +/* Unavailable exceptions to test in HTM */ +#define FP_UNA_EXCEPTION 0 +#define VEC_UNA_EXCEPTION 1 +#define VSX_UNA_EXCEPTION 2 + +#define NUM_EXCEPTIONS 3 + +struct Flags { + int touch_fp; + int touch_vec; + int result; + int exception; +} flags; + +bool expecting_failure(void) +{ + if (flags.touch_fp && flags.exception == FP_UNA_EXCEPTION) + return false; + + if (flags.touch_vec && flags.exception == VEC_UNA_EXCEPTION) + return false; + + /* If both FP and VEC are touched it does not mean that touching +* VSX won't raise an exception. However since FP and VEC state +* are already correctly loaded, the transaction is not aborted +* (i.e. treclaimed/trecheckpointed) and MSR.VSX is just set as 1, +* so a TM failure is not expected also in this case. 
+*/ + if ((flags.touch_fp && flags.touch_vec) && +flags.exception == VSX_UNA_EXCEPTION) + return false; + + return true; +} + +/* Check if failure occurred whilst in transaction. */ +bool is_failure(uint64_t condition_reg) +{ + /* When failure handling occurs, CR0 is set to 0b1010 (0xa). Otherwise +* transaction completes without failure and hence reaches out 'tend.' +* that sets CR0 to 0b0100 (0x4). +*/ + return ((condition_reg >> 28) & 0xa) == 0xa; +} + +void *ping(void *input) +{ + + /* Expected values for vs0 and vs32 after a TM failure. They must +* never change, otherwise they got corrupted. +*/ + uint64_t high_vs0 = 0x; + uint64_t low_vs0 = 0x; + uint64_t high_vs32 = 0x; + uint64_t low_vs32 = 0x; + +
[PATCH V6 2/2] pseries/initnodes: Ensure nodes initialized for hotplug
pseries/nodes: On pseries systems which allow 'hot-add' of CPU, it may occur that the new resources are to be inserted into nodes that were not used for memory resources at bootup. Many different configurations of PowerPC resources may need to be supported depending upon the environment. This patch fixes some problems encountered at runtime with configurations that support memory-less nodes, or that hot-add CPUs into nodes that are memoryless during system execution after boot. Signed-off-by: Michael Bringmann --- Changes in V6: -- Add some needed node initialization to runtime code that maps CPUs based on VPHN associativity -- Add error checks and alternate recovery for compile flag CONFIG_MEMORY_HOTPLUG -- Add alternate node selection recovery for !CONFIG_MEMORY_HOTPLUG --- arch/powerpc/mm/numa.c | 51 ++-- 1 file changed, 40 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 334a1ff..163f4cc 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -551,7 +551,7 @@ static int numa_setup_cpu(unsigned long lcpu) nid = of_node_to_nid_single(cpu); out_present: - if (nid < 0 || !node_online(nid)) + if (nid < 0 || !node_possible(nid)) nid = first_online_node; map_cpu_to_node(lcpu, nid); @@ -867,7 +867,7 @@ void __init dump_numa_cpu_topology(void) } /* Initialize NODE_DATA for a node on the local memory */ -static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn) +static void setup_node_data(int nid, u64 start_pfn, u64 end_pfn) { u64 spanned_pages = end_pfn - start_pfn; const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES); @@ -913,10 +913,8 @@ static void __init find_possible_nodes(void) min_common_depth); for (i = 0; i < numnodes; i++) { - if (!node_possible(i)) { - setup_node_data(i, 0, 0); + if (!node_possible(i)) node_set(i, node_possible_map); - } } out: @@ -1312,6 +1310,42 @@ static long vphn_get_associativity(unsigned long cpu, return rc; } +static inline int find_cpu_nid(int 
cpu) +{ + __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0}; + int new_nid; + + /* Use associativity from first thread for all siblings */ + vphn_get_associativity(cpu, associativity); + new_nid = associativity_to_nid(associativity); + if (new_nid < 0 || !node_possible(new_nid)) + new_nid = first_online_node; + + if (NODE_DATA(new_nid) == NULL) { +#ifdef CONFIG_MEMORY_HOTPLUG + /* +* Need to ensure that NODE_DATA is initialized +* for a node from available memory (see +* memblock_alloc_try_nid). If unable to init +* the node, then default to nearest node that +* has memory installed. +*/ + if (try_online_node(new_nid)) + new_nid = first_online_node; +#else + /* +* Default to using the nearest node that has +* memory installed. Otherwise, it would be +* necessary to patch the kernel MM code to deal +* with more memoryless-node error conditions. +*/ + new_nid = first_online_node; +#endif + } + + return new_nid; +} + /* * Update the CPU maps and sysfs entries for a single CPU when its NUMA * characteristics change. This function doesn't perform any locking and is @@ -1379,7 +1413,6 @@ int numa_update_cpu_topology(bool cpus_locked) { unsigned int cpu, sibling, changed = 0; struct topology_update_data *updates, *ud; - __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0}; cpumask_t updated_cpus; struct device *dev; int weight, new_nid, i = 0; @@ -1417,11 +1450,7 @@ int numa_update_cpu_topology(bool cpus_locked) continue; } - /* Use associativity from first thread for all siblings */ - vphn_get_associativity(cpu, associativity); - new_nid = associativity_to_nid(associativity); - if (new_nid < 0 || !node_online(new_nid)) - new_nid = first_online_node; + new_nid = find_cpu_nid(cpu); if (new_nid == numa_cpu_lookup_table[cpu]) { cpumask_andnot(&cpu_associativity_changes_mask,
[PATCH V6 1/2] pseries/nodes: Ensure enough nodes avail for operations
pseries/nodes: On pseries systems which allow 'hot-add' of CPU or memory resources, it may occur that the new resources are to be inserted into nodes that were not used for these resources at bootup. In the kernel, any node that is used must be defined and initialized. This patch ensures that sufficient nodes are defined to support configuration requirements after boot, as well as at boot. This patch extracts the value of the lowest domain level (number of allocable resources) from the device tree property "ibm,max-associativity-domains" to use as the maximum number of nodes to setup as possibly available in the system. This new setting will override the instruction, nodes_and(node_possible_map, node_possible_map, node_online_map); presently seen in the function arch/powerpc/mm/numa.c:initmem_init(). If the property is not present at boot, no operation will be performed to define or enable additional nodes. Signed-off-by: Michael Bringmann --- Changes in V6: -- Remove some node initialization/allocation from boot setup --- arch/powerpc/mm/numa.c | 40 +--- 1 file changed, 37 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index eb604b3..334a1ff 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -892,6 +892,37 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn) NODE_DATA(nid)->node_spanned_pages = spanned_pages; } +static void __init find_possible_nodes(void) +{ + struct device_node *rtas; + u32 numnodes, i; + + if (min_common_depth <= 0) + return; + + rtas = of_find_node_by_path("/rtas"); + if (!rtas) + return; + + if (of_property_read_u32_index(rtas, + "ibm,max-associativity-domains", + min_common_depth, &numnodes)) + goto out; + + pr_info("numa: Nodes = %d (mcd = %d)\n", numnodes, + min_common_depth); + + for (i = 0; i < numnodes; i++) { + if (!node_possible(i)) { + setup_node_data(i, 0, 0); + node_set(i, node_possible_map); + } + } + +out: + of_node_put(rtas); +} + void __init 
initmem_init(void) { int nid, cpu; @@ -905,12 +936,15 @@ void __init initmem_init(void) memblock_dump_all(); /* -* Reduce the possible NUMA nodes to the online NUMA nodes, -* since we do not support node hotplug. This ensures that we -* lower the maximum NUMA node ID to what is actually present. +* Modify the set of possible NUMA nodes to reflect information +* available about the set of online nodes, and the set of nodes +* that we expect to make use of for this platform's affinity +* calculations. */ nodes_and(node_possible_map, node_possible_map, node_online_map); + find_possible_nodes(); + for_each_online_node(nid) { unsigned long start_pfn, end_pfn;
[PATCH V6 0/2] pseries/nodes: Fix issues with memoryless nodes
pseries/nodes: Ensure enough nodes avail for operations pseries/initnodes: Ensure nodes initialized for hotplug Signed-off-by: Michael Bringmann Michael Bringmann (2): pseries/nodes: Ensure enough nodes avail for operations pseries/initnodes: Ensure nodes initialized for hotplug --- Changes in V6: -- Remove some node initialization/allocation from boot setup -- Add some needed node initialization to runtime code that maps CPUs based on VPHN associativity -- Add error checks and alternate recovery for compile flag CONFIG_MEMORY_HOTPLUG -- Add alternate node selection recovery for !CONFIG_MEMORY_HOTPLUG
Re: linux-next: manual merge of the tip tree with the powerpc tree
On Tue, Oct 31, 2017 at 10:53 PM, Stephen Rothwell wrote: > Hi all, > > Today's linux-next merge of the tip tree got a conflict in: > > arch/powerpc/mm/numa.c > > between commit: > > cee5405da402 ("powerpc/hotplug: Improve responsiveness of hotplug change") > > from the powerpc tree and commit: > > df7e828c1b69 ("timer: Remove init_timer_deferrable() in favor of > timer_setup()") > > from the tip tree. > > I fixed it up (see below) and can carry the fix as necessary. This > is now fixed as far as linux-next is concerned, but any non trivial > conflicts should be mentioned to your upstream maintainer when your tree > is submitted for merging. You may also want to consider cooperating > with the maintainer of the conflicting tree to minimise any particularly > complex conflicts. > > -- > Cheers, > Stephen Rothwell > > diff --cc arch/powerpc/mm/numa.c > index eb604b3574fa,73016451f330.. > --- a/arch/powerpc/mm/numa.c > +++ b/arch/powerpc/mm/numa.c > @@@ -1506,9 -1466,7 +1505,7 @@@ static struct timer_list topology_timer > > static void reset_topology_timer(void) > { > - topology_timer.data = 0; > - topology_timer.expires = jiffies + topology_timer_secs * HZ; > - mod_timer(&topology_timer, topology_timer.expires); > - mod_timer(&topology_timer, jiffies + 60 * HZ); > ++ mod_timer(&topology_timer, jiffies + topology_timer_secs * HZ); > } > > #ifdef CONFIG_SMP > @@@ -1561,13 -1520,14 +1558,14 @@@ int start_topology_update(void > rc = of_reconfig_notifier_register(&dt_update_nb); > #endif > } > - } else if (firmware_has_feature(FW_FEATURE_VPHN) && > + } > + if (firmware_has_feature(FW_FEATURE_VPHN) && >lppaca_shared_proc(get_lppaca())) { > if (!vphn_enabled) { > - prrn_enabled = 0; > vphn_enabled = 1; > setup_cpu_associativity_change_counters(); > - init_timer_deferrable(&topology_timer); > + timer_setup(&topology_timer, topology_timer_fn, > + TIMER_DEFERRABLE); > reset_topology_timer(); > } > } Thanks, this looks correct to me! -Kees -- Kees Cook Pixel Security
Re: [PATCH FEAT] ibmvnic: Feature implementation of Vital Product Data (VPD) for the ibmvnic driver
> > diff --git a/drivers/net/ethernet/ibm/ibmvnic.c > > b/drivers/net/ethernet/ibm/ibmvnic.c > > index d0cff28..120f3c0 100644 > > --- a/drivers/net/ethernet/ibm/ibmvnic.c > > +++ b/drivers/net/ethernet/ibm/ibmvnic.c > > @@ -107,6 +107,9 @@ static union sub_crq *ibmvnic_next_scrq(struct > > ibmvnic_adapter *, > > struct ibmvnic_sub_crq_queue *); > > static int ibmvnic_poll(struct napi_struct *napi, int data); > > static void send_map_query(struct ibmvnic_adapter *adapter); > > +static int ibmvnic_get_vpd(struct ibmvnic_adapter *); > > +static void handle_vpd_size_rsp(union ibmvnic_crq *, struct > > ibmvnic_adapter *); > > +static void handle_vpd_rsp(union ibmvnic_crq *,struct ibmvnic_adapter *); > > There should be a space after the comma. And you should try to avoid forward declarations. Move the code around so they are not needed. Andrew
RE: [PATCH v10 0/4] this patchset is to remove PPCisms for QEIC
On Wed, 1 Nov 2017, Qiang Zhao wrote: > Michael Ellerman wrote > > > > Qiang Zhao writes: > > > > > Hi all, > > > > > > Could anybody review this patchset and take action on them? Thank you! > > > > Who maintains this? I don't actually know, it's not powerpc code, or is it? > > Yes, it's not powerpc code, it is irqchip code, maintained by Thomas, Jason > and Marc according to MAINTAINERS file. > > Hi Thomas, Jason and Marc, > > Could you keep an eye on this patchset? Thank you! It's on my radar, but I have zero capacity at the moment. Hopefully Marc can spare a few cycles. Thanks, tglx
Re: [PATCH FEAT] ibmvnic: Feature implementation of Vital Product Data (VPD) for the ibmvnic driver
On 11/01/2017 09:39 AM, Desnes Augusto Nunes do Rosario wrote: > This patch implements and enables VDP support for the ibmvnic driver. > Moreover, it includes the implementation of suitable structs, signal > transmission/handling and fuctions which allows the retrival of firmware > information from the ibmvnic card. > > Co-Authored-By: Thomas Falcon > --- Hi, you should include a Signed-off-by tag in the commit message. You should also include the branch the patch is meant for in the PATCH field. In this case, it would be net-next. The commit message should also be wrapped (75 char per line), and 'fuctions' is missing an n. > drivers/net/ethernet/ibm/ibmvnic.c | 140 > - > drivers/net/ethernet/ibm/ibmvnic.h | 27 ++- > 2 files changed, 164 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/ethernet/ibm/ibmvnic.c > b/drivers/net/ethernet/ibm/ibmvnic.c > index d0cff28..120f3c0 100644 > --- a/drivers/net/ethernet/ibm/ibmvnic.c > +++ b/drivers/net/ethernet/ibm/ibmvnic.c > @@ -107,6 +107,9 @@ static union sub_crq *ibmvnic_next_scrq(struct > ibmvnic_adapter *, > struct ibmvnic_sub_crq_queue *); > static int ibmvnic_poll(struct napi_struct *napi, int data); > static void send_map_query(struct ibmvnic_adapter *adapter); > +static int ibmvnic_get_vpd(struct ibmvnic_adapter *); > +static void handle_vpd_size_rsp(union ibmvnic_crq *, struct ibmvnic_adapter > *); > +static void handle_vpd_rsp(union ibmvnic_crq *,struct ibmvnic_adapter *); There should be a space after the comma. > static void send_request_map(struct ibmvnic_adapter *, dma_addr_t, __be32, > u8); > static void send_request_unmap(struct ibmvnic_adapter *, u8); > static void send_login(struct ibmvnic_adapter *adapter); > @@ -573,6 +576,15 @@ static int reset_tx_pools(struct ibmvnic_adapter > *adapter) > return 0; > } > > +static void release_vpd_data(struct ibmvnic_adapter *adapter) > +{ > + if(!adapter->vpd) There should be a space between the if here. 
There are also some style checks that were picked up by checkpatch.pl. It's a good idea to run your patch through that before submission. Thanks, Tom > + return; > + > + kfree(adapter->vpd->buff); > + kfree(adapter->vpd); > +} > + > static void release_tx_pools(struct ibmvnic_adapter *adapter) > { > struct ibmvnic_tx_pool *tx_pool; > @@ -753,6 +765,8 @@ static void release_resources(struct ibmvnic_adapter > *adapter) > { > int i; > > + release_vpd_data(adapter); > + > release_tx_pools(adapter); > release_rx_pools(adapter); > > @@ -850,6 +864,10 @@ static int init_resources(struct ibmvnic_adapter > *adapter) > if (rc) > return rc; > > + adapter->vpd = kzalloc(sizeof(struct ibmvnic_vpd), GFP_KERNEL); > + if (!adapter->vpd) > + return -ENOMEM; > + > adapter->map_id = 1; > adapter->napi = kcalloc(adapter->req_rx_queues, > sizeof(struct napi_struct), GFP_KERNEL); > @@ -950,6 +968,10 @@ static int ibmvnic_open(struct net_device *netdev) > > rc = __ibmvnic_open(netdev); > netif_carrier_on(netdev); > + > + /* Vital Product Data (VPD) */ > + ibmvnic_get_vpd(adapter); > + > mutex_unlock(&adapter->reset_lock); > > return rc; > @@ -1878,11 +1900,15 @@ static int ibmvnic_get_link_ksettings(struct > net_device *netdev, > return 0; > } > > -static void ibmvnic_get_drvinfo(struct net_device *dev, > +static void ibmvnic_get_drvinfo(struct net_device *netdev, > struct ethtool_drvinfo *info) > { > + struct ibmvnic_adapter *adapter = netdev_priv(netdev); > + > strlcpy(info->driver, ibmvnic_driver_name, sizeof(info->driver)); > strlcpy(info->version, IBMVNIC_DRIVER_VERSION, sizeof(info->version)); > + strlcpy(info->fw_version, adapter->fw_version, > + sizeof(info->fw_version)); > } > > static u32 ibmvnic_get_msglevel(struct net_device *netdev) > @@ -2923,6 +2947,110 @@ static void send_login(struct ibmvnic_adapter > *adapter) > return; > } > > +static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter) > +{ > + struct device *dev = &adapter->vdev->dev; > + union ibmvnic_crq crq; > + 
dma_addr_t dma_addr; > + int len; > + > + if (adapter->vpd->buff) > + len = adapter->vpd->len; > + > + reinit_completion(&adapter->fw_done); > + crq.get_vpd_size.first = IBMVNIC_CRQ_CMD; > + crq.get_vpd_size.cmd = GET_VPD_SIZE; > + ibmvnic_send_crq(adapter, &crq); > + wait_for_completion(&adapter->fw_done); > + > + if (!adapter->vpd->buff) > + adapter->vpd->buff = kzalloc(adapter->vpd->len, GFP_KERNEL); > + else if (adapter->vpd->len != len) > + adapter->vpd->buff = > + krealloc(adapter->vpd->buff, > + adapter->vpd->len, GFP_KERNEL); > + > + if
Re: [PATCH v3 1/2] livepatch: send a fake signal to all blocking tasks
On 11/01, Petr Mladek wrote: > > On Tue 2017-10-31 12:48:52, Miroslav Benes wrote: > > + if (task->flags & PF_KTHREAD) { > > + /* > > +* Wake up a kthread which still has not been migrated. > > +*/ > > + wake_up_process(task); > > I have just noticed that freezer used wake_up_state(p, TASK_INTERRUPTIBLE); > IMHO, we should do so as well. I won't argue, but... > wake_up_process() wakes also tasks in TASK_UNINTERRUPTIBLE state. > These might not be ready for an unexpected wakeup. For example, > see concat_dev_erase() in drivers/mtd/mtdconcat.c. I'd say that concat_dev_erase() should be fixed, any code should be ready for spurious wakeup. Note also that wake_up_state(TASK_INTERRUPTIBLE) won't wakeup the TASK_IDLE kthreads, and most of the kthreads which use TASK_INTERRUPTIBLE should use TASK_IDLE today, because in most cases TASK_INTERRUPTIBLE was used to not contribute to loadavg. Oleg.
Re: [PATCH v3 1/2] livepatch: send a fake signal to all blocking tasks
On Tue 2017-10-31 12:48:52, Miroslav Benes wrote: > Live patching consistency model is of LEAVE_PATCHED_SET and > SWITCH_THREAD. This means that all tasks in the system have to be marked > one by one as safe to call a new patched function. Safe means when a > task is not (sleeping) in a set of patched functions. That is, no > patched function is on the task's stack. Another clearly safe place is > the boundary between kernel and userspace. The patching waits for all > tasks to get outside of the patched set or to cross the boundary. The > transition is completed afterwards. > > The problem is that a task can block the transition for quite a long > time, if not forever. It could sleep in a set of patched functions, for > example. Luckily we can force the task to leave the set by sending it a > fake signal, that is a signal with no data in signal pending structures > (no handler, no sign of proper signal delivered). Suspend/freezer use > this to freeze the tasks as well. The task gets TIF_SIGPENDING set and > is woken up (if it has been sleeping in the kernel before) or kicked by > rescheduling IPI (if it was running on other CPU). This causes the task > to go to kernel/userspace boundary where the signal would be handled and > the task would be marked as safe in terms of live patching. > > diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c > index b004a1fb6032..6700d3b22615 100644 > --- a/kernel/livepatch/transition.c > +++ b/kernel/livepatch/transition.c > @@ -577,3 +577,43 @@ void klp_copy_process(struct task_struct *child) > > /* TIF_PATCH_PENDING gets copied in setup_thread_stack() */ > } > + > +/* > + * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set. > + * Kthreads with TIF_PATCH_PENDING set are woken up. Only admin can request > this > + * action currently. 
> + */ > +void klp_force_signals(void) > +{ > + struct task_struct *g, *task; > + > + pr_notice("signaling remaining tasks\n"); > + > + read_lock(&tasklist_lock); > + for_each_process_thread(g, task) { > + if (!klp_patch_pending(task)) > + continue; > + > + /* > + * There is a small race here. We could see TIF_PATCH_PENDING > + * set and decide to wake up a kthread or send a fake signal. > + * Meanwhile the task could migrate itself and the action > + * would be meaningless. It is not serious though. > + */ > + if (task->flags & PF_KTHREAD) { > + /* > + * Wake up a kthread which still has not been migrated. > + */ > + wake_up_process(task); I have just noticed that freezer used wake_up_state(p, TASK_INTERRUPTIBLE); IMHO, we should do so as well. wake_up_process() wakes also tasks in TASK_UNINTERRUPTIBLE state. These might not be ready for an unexpected wakeup. For example, see concat_dev_erase() in drivers/mtd/mtdconcat.c. With this change, feel free to use Reviewed-by: Petr Mladek Best Regards, Petr
Re: [PATCH v3 1/2] livepatch: send a fake signal to all blocking tasks
> +/* > + * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set. > + * Kthreads with TIF_PATCH_PENDING set are woken up. Only admin can request > this > + * action currently. > + */ > +void klp_force_signals(void) > +{ > + struct task_struct *g, *task; > + > + pr_notice("signaling remaining tasks\n"); > + > + read_lock(&tasklist_lock); > + for_each_process_thread(g, task) { > + if (!klp_patch_pending(task)) > + continue; > + > + /* > + * There is a small race here. We could see TIF_PATCH_PENDING > + * set and decide to wake up a kthread or send a fake signal. > + * Meanwhile the task could migrate itself and the action > + * would be meaningless. It is not serious though. > + */ > + if (task->flags & PF_KTHREAD) { > + /* > + * Wake up a kthread which still has not been migrated. > + */ > + wake_up_process(task); So this is not as safe as one would hope. It tries to wake all TASK_NORMAL tasks, which could cause headaches. Let's make it wake_up_state(task, TASK_INTERRUPTIBLE); to wake only kthreads sleeping interruptibly. Thanks Petr for spotting this (offline). Miroslav > + } else { > + /* > + * Send fake signal to all non-kthread tasks which are > + * still not migrated. > + */ > + spin_lock_irq(&task->sighand->siglock); > + signal_wake_up(task, 0); > + spin_unlock_irq(&task->sighand->siglock); > + } > + } > + read_unlock(&tasklist_lock); > +}
Re: [RFC PATCH 0/7] powerpc/64s/radix TLB flush performance improvements
On Wed, 1 Nov 2017 17:35:51 +0530 Anshuman Khandual wrote: > On 10/31/2017 12:14 PM, Nicholas Piggin wrote: > > Here's a random mix of performance improvements for radix TLB flushing > > code. The main aims are to reduce the amount of translation that gets > > invalidated, and to reduce global flushes where we can do local. > > > > To that end, a parallel kernel compile benchmark using powerpc:tlbie > > tracepoint shows a reduction in tlbie instructions from about 290,000 > > to 80,000, and a reduction in tlbiel instructions from 49,500,000 to > > 15,000,000. Looks great, but unfortunately does not translate to a > > statistically significant performance improvement! The needle on TLB > > misses does not move much, I suspect because a lot of the flushing is > > done a startup and shutdown, and because a significant cost of TLB > > flushing itself is in the barriers. > > Does memory barrier initiate a single global invalidation with tlbie ? > I'm not quite sure what you're asking, and I don't know the details of how the hardware handles it, but from the measurements in patch 1 of the series we can see there is a benefit for both tlbie and tlbiel of batching them up between barriers. Thanks, Nick
Re: [RFC PATCH 0/7] powerpc/64s/radix TLB flush performance improvements
On 10/31/2017 12:14 PM, Nicholas Piggin wrote: > Here's a random mix of performance improvements for radix TLB flushing > code. The main aims are to reduce the amount of translation that gets > invalidated, and to reduce global flushes where we can do local. > > To that end, a parallel kernel compile benchmark using powerpc:tlbie > tracepoint shows a reduction in tlbie instructions from about 290,000 > to 80,000, and a reduction in tlbiel instructions from 49,500,000 to > 15,000,000. Looks great, but unfortunately does not translate to a > statistically significant performance improvement! The needle on TLB > misses does not move much, I suspect because a lot of the flushing is > done a startup and shutdown, and because a significant cost of TLB > flushing itself is in the barriers. Does memory barrier initiate a single global invalidation with tlbie ?
Re: [PATCH 00/16] Remove hash page table slot tracking from linux PTE
On Sun, Oct 29, 2017 at 05:51:06PM -0700, Ram Pai wrote: > On Mon, Oct 30, 2017 at 09:04:17AM +1100, Paul Mackerras wrote: > > On Sat, Oct 28, 2017 at 03:35:32PM -0700, Ram Pai wrote: > > > > > > I like the idea of not tracking the slots at all. It is something the > > > guest should not be knowing or tracking. > > > > Why do you say that? > > 'slot' is a internal mechanism used by the hash table to accelerate > mapping an address to a hash-page. If the hash-table implementation > choose to use a different mechanism to accelerate the mapping, it can't > because that mechanism is baked into the logic of all the consumers. Not all operating systems use the HPT as a cache of translations that are also stored somewhere, as Linux does. Those OSes are perfectly entitled to control the slot allocation for their own purposes in whatever way they want. So having the interface use the slot number is fine; just as having alternative interfaces that don't need to specify the slot number for the kind of usage that Linux does is also fine. Paul.
[RFC 2/2] powerpc/mm: Enable deferred flushing of TLB during reclaim
Deferred flushing can only be enabled on POWER9 DD2.0 processors and later,
because earlier POWER9 revisions and previous hash-table-based POWER
processors do the TLB flush in pte_get_and_clear() itself, which prevents
batching and an eventual deferred flush completion later on.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/Kconfig                | 1 +
 arch/powerpc/include/asm/tlbbatch.h | 30 +++
 arch/powerpc/include/asm/tlbflush.h | 3 +++
 arch/powerpc/mm/tlb-radix.c         | 49 +
 4 files changed, 83 insertions(+)
 create mode 100644 arch/powerpc/include/asm/tlbbatch.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 809c468..f06b565 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -230,6 +230,7 @@ config PPC
 	select SPARSE_IRQ
 	select SYSCTL_EXCEPTION_TRACE
 	select VIRT_TO_BUS if !PPC64
+	select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH if (PPC64 && PPC_BOOK3S)
 	#
 	# Please keep this list sorted alphabetically.
 	#
diff --git a/arch/powerpc/include/asm/tlbbatch.h b/arch/powerpc/include/asm/tlbbatch.h
new file mode 100644
index 000..fc762ef
--- /dev/null
+++ b/arch/powerpc/include/asm/tlbbatch.h
@@ -0,0 +1,30 @@
+#ifndef _ARCH_POWERPC_TLBBATCH_H
+#define _ARCH_POWERPC_TLBBATCH_H
+
+#include
+
+#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
+
+#define MAX_BATCHED_MM 1024
+
+struct arch_tlbflush_unmap_batch {
+	/*
+	 * Each bit set is a CPU that potentially has a
+	 * TLB entry for one of the PFN being flushed.
+	 * This represents whether all deferred struct
+	 * mm will be flushed for any given CPU.
+	 */
+	struct cpumask cpumask;
+
+	/* All the deferred struct mm */
+	struct mm_struct *mm[MAX_BATCHED_MM];
+	unsigned long int nr_mm;
+
+};
+
+extern bool arch_tlbbatch_should_defer(struct mm_struct *mm);
+extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
+extern void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
+				 struct mm_struct *mm);
+#endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */
+#endif /* _ARCH_POWERPC_TLBBATCH_H */
diff --git a/arch/powerpc/include/asm/tlbflush.h b/arch/powerpc/include/asm/tlbflush.h
index 13dbcd4..2041923 100644
--- a/arch/powerpc/include/asm/tlbflush.h
+++ b/arch/powerpc/include/asm/tlbflush.h
@@ -20,6 +20,9 @@
  */
 #ifdef __KERNEL__
 
+#include
+#include
+
 #ifdef CONFIG_PPC_MMU_NOHASH
 /*
  * TLB flushing for software loaded TLB chips
diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index b3e849c..506e7ed 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -12,6 +12,8 @@
 #include
 #include
 #include
+#include
+#include
 
 #include
 #include
@@ -519,3 +521,50 @@ extern void radix_kvm_prefetch_workaround(struct mm_struct *mm)
 }
 EXPORT_SYMBOL_GPL(radix_kvm_prefetch_workaround);
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
+
+#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
+static void clear_tlb(void *data)
+{
+	struct arch_tlbflush_unmap_batch *batch = data;
+	int i;
+
+	WARN_ON(!radix_enabled() || cpu_has_feature(CPU_FTR_POWER9_DD1));
+
+	for (i = 0; i < batch->nr_mm; i++) {
+		if (batch->mm[i])
+			radix__local_flush_tlb_mm(batch->mm[i]);
+	}
+}
+
+void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
+{
+	WARN_ON(!radix_enabled() || cpu_has_feature(CPU_FTR_POWER9_DD1));
+
+	smp_call_function_many(&batch->cpumask, (void *)clear_tlb, batch, 1);
+	batch->nr_mm = 0;
+	cpumask_clear(&batch->cpumask);
+}
+
+void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
+			  struct mm_struct *mm)
+{
+	WARN_ON(!radix_enabled() || cpu_has_feature(CPU_FTR_POWER9_DD1));
+
+	++batch->nr_mm;
+	if (batch->nr_mm != MAX_BATCHED_MM)
+		batch->mm[batch->nr_mm] = mm;
+	else
+		pr_err("Deferred TLB flush: missed a struct mm\n");
+	cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
+}
+
+bool arch_tlbbatch_should_defer(struct mm_struct *mm)
+{
+	if (!radix_enabled() || cpu_has_feature(CPU_FTR_POWER9_DD1))
+		return false;
+
+	if (!mm_is_thread_local(mm))
+		return true;
+
+	return false;
+}
+#endif
-- 
1.8.3.1
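The core of the patch is a small batching discipline: arch_tlbbatch_add_mm() accumulates struct mm pointers together with a CPU mask, and arch_tlbbatch_flush() walks the queue once on each affected CPU and then resets it. The following is a minimal userspace model of that add/flush cycle; the names, bounds handling, and return values here are illustrative only (the kernel code uses real cpumasks, drops overflowing entries with pr_err(), and flushes via radix__local_flush_tlb_mm()):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BATCHED_MM 1024

struct mm_struct;                 /* opaque stand-in for the kernel type */

struct tlb_batch {
	struct mm_struct *mm[MAX_BATCHED_MM];
	size_t nr_mm;
};

/* Queue one address space for a deferred flush. Returns 0 on success and
 * -1 when the batch is full (the kernel patch logs and drops instead). */
static int batch_add_mm(struct tlb_batch *b, struct mm_struct *mm)
{
	if (b->nr_mm >= MAX_BATCHED_MM)
		return -1;
	b->mm[b->nr_mm++] = mm;   /* store first, then advance the count */
	return 0;
}

/* Model of arch_tlbbatch_flush(): visit every queued mm exactly once,
 * then reset the batch so it can be reused. Returns the flush count. */
static size_t batch_flush(struct tlb_batch *b)
{
	size_t flushed = 0;

	for (size_t i = 0; i < b->nr_mm; i++)
		if (b->mm[i])
			flushed++;   /* radix__local_flush_tlb_mm() in the patch */
	b->nr_mm = 0;
	return flushed;
}
```

The store-then-increment indexing keeps slot 0 in use and makes `nr_mm` exactly the number of valid entries, so the flush loop and the add path agree on the occupied range.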
[RFC 1/2] mm/tlbbatch: Introduce arch_tlbbatch_should_defer()
The entire scheme of deferred TLB flush in the reclaim path rests on the
assumption that the cost of refilling TLB entries is less than that of
flushing out individual entries by sending IPIs to remote CPUs. But an
architecture can have different ways to evaluate that. Hence, apart from
checking TTU_BATCH_FLUSH in the TTU flags, the rest of the decision
should be architecture specific.

Signed-off-by: Anshuman Khandual
---
 arch/x86/include/asm/tlbflush.h | 12
 mm/rmap.c                       |  9 +
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index c4aed0d..5875f2c 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -366,6 +366,18 @@ static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long a)
 void native_flush_tlb_others(const struct cpumask *cpumask,
 			     const struct flush_tlb_info *info);
 
+static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
+{
+	bool should_defer = false;
+
+	/* If remote CPUs need to be flushed then defer batch the flush */
+	if (cpumask_any_but(mm_cpumask(mm), get_cpu()) < nr_cpu_ids)
+		should_defer = true;
+	put_cpu();
+
+	return should_defer;
+}
+
 static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
 					struct mm_struct *mm)
 {
diff --git a/mm/rmap.c b/mm/rmap.c
index b874c47..bfbfe92 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -627,17 +627,10 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, bool writable)
  */
 static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags)
 {
-	bool should_defer = false;
-
 	if (!(flags & TTU_BATCH_FLUSH))
 		return false;
 
-	/* If remote CPUs need to be flushed then defer batch the flush */
-	if (cpumask_any_but(mm_cpumask(mm), get_cpu()) < nr_cpu_ids)
-		should_defer = true;
-	put_cpu();
-
-	return should_defer;
+	return arch_tlbbatch_should_defer(mm);
 }
 
 /*
-- 
1.8.3.1
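The refactoring keeps only the generic TTU_BATCH_FLUSH check in mm/rmap.c and pushes the "is deferral worth it" question down to the architecture. A hedged userspace sketch of that split follows; the toy bitmask stands in for mm_cpumask(), the fixed `this_cpu` for get_cpu(), and the TTU_BATCH_FLUSH value is illustrative, not the kernel's:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in; the kernel consults mm_cpumask(mm). */
struct mm_struct {
	unsigned long cpumask;    /* bit N set: CPU N may cache TLB entries */
};

#define TTU_BATCH_FLUSH 0x40      /* value is illustrative */

static int this_cpu = 0;          /* get_cpu() in the kernel */

/* Arch decision: defer only when some *other* CPU may hold TLB entries
 * for this mm, so that one batched IPI can replace many per-page IPIs. */
static bool arch_tlbbatch_should_defer(struct mm_struct *mm)
{
	unsigned long others = mm->cpumask & ~(1UL << this_cpu);

	return others != 0;
}

/* Generic reclaim code keeps only the flag check and delegates the rest. */
static bool should_defer_flush(struct mm_struct *mm, unsigned int flags)
{
	if (!(flags & TTU_BATCH_FLUSH))
		return false;
	return arch_tlbbatch_should_defer(mm);
}
```

The split mirrors the patch: the flag test is policy shared by every arch, while the cost model (here, "any remote CPU in the mask") is free to differ per architecture, as the powerpc version's radix/thread-local checks do.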
[RFC 0/2] Enable ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH on POWER
From: Anshuman Khandual

Batched TLB flush during the reclaim path has been around for a couple of
years now and is enabled on the x86 platform. The idea is to batch
multiple page TLB invalidation requests together and completely flush all
those CPUs that might have a TLB cache for any of the unmapped pages,
instead of sending multiple IPIs and flushing out individual pages each
time reclaim unmaps one page. This has the potential to improve
performance for certain types of workloads under memory pressure, provided
some conditions related to individual page TLB invalidation, CPU-wide TLB
invalidation, system-wide TLB invalidation, TLB reload, IPI costs etc are
met. Please refer to commit 72b252aed5 ("mm: send one IPI per CPU to TLB
flush all entries after unmapping pages") from Mel Gorman for more details
on how it can impact the performance of various workloads.

This enablement improves performance for the original test case
'case-lru-file-mmap-read' from the vm-scalability bucket, but only from a
system time perspective.

time ./run case-lru-file-mmap-read

Without the patch:

real	4m20.364s
user	102m52.492s
sys	433m26.190s

With the patch:

real	4m15.942s   (- 1.69%)
user	111m16.662s (+ 7.55%)
sys	382m35.202s (- 11.73%)

Parallel kernel compilation does not see any performance improvement or
degradation with or without this patch. It remains within the margin of
error.

Without the patch:

real	1m13.850s
user	39m21.803s
sys	2m43.362s

With the patch:

real	1m14.481s (+ 0.85%)
user	39m27.409s (+ 0.23%)
sys	2m44.656s (+ 0.79%)

It batches up multiple struct mm during reclaim and keeps accumulating the
superset of the struct mm CPU masks that might have a TLB which needs to
be invalidated. Then a local struct mm wide invalidation is performed on
that CPU mask for all the batched ones.

Please do the review and let me know if there is any other way to do this
better. Thank you.
Anshuman Khandual (2):
  mm/tlbbatch: Introduce arch_tlbbatch_should_defer()
  powerpc/mm: Enable deferred flushing of TLB during reclaim

 arch/powerpc/Kconfig                | 1 +
 arch/powerpc/include/asm/tlbbatch.h | 30 +++
 arch/powerpc/include/asm/tlbflush.h | 3 +++
 arch/powerpc/mm/tlb-radix.c         | 49 +
 arch/x86/include/asm/tlbflush.h     | 12 +
 mm/rmap.c                           | 9 +--
 6 files changed, 96 insertions(+), 8 deletions(-)
 create mode 100644 arch/powerpc/include/asm/tlbbatch.h

-- 
1.8.3.1
Re: [PATCH] powerpc/perf: Fix core-imc hotplug callback failure during imc initialization
On Wednesday 01 November 2017 06:22 AM, Michael Ellerman wrote:

Anju T Sudhakar writes:

Call trace observed during boot:

What's the actual oops?

I could recreate this in mambo with CPUS=2 and THREAD=2. Here is the
complete stack trace.

[    0.045367] core_imc memory allocation for cpu 2 failed
[    0.045408] Unable to handle kernel paging request for data at address 0x7d20e2a6f92d03b8
[    0.045443] Faulting instruction address: 0xc00dde18
cpu 0x0: Vector: 380 (Data Access Out of Range) at [c000fd1cb890]
    pc: c00dde18: event_function_call+0x28/0x14c
    lr: c00dde00: event_function_call+0x10/0x14c
    sp: c000fd1cbb10
   msr: 90009033
   dar: 7d20e2a6f92d03b8
  current = 0xc000fd15da00
  paca    = 0xcfff   softe: 0   irq_happened: 0x01
    pid   = 11, comm = cpuhp/0
Linux version 4.14.0-rc7-00014-g0a08377b127b (maddy@SrihariSrinidhi) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.1)) #5 SMP Wed Nov 1 14:12:27 IST 2017
enter ? for help
[c000fd1cbb10] (unreliable)
[c000fd1cbba0] c00de180 perf_remove_from_context+0x30/0x9c
[c000fd1cbbe0] c00e9108 perf_pmu_migrate_context+0x9c/0x224
[c000fd1cbc60] c00682e0 ppc_core_imc_cpu_offline+0xdc/0x144
[c000fd1cbcb0] c0070568 cpuhp_invoke_callback+0xe4/0x244
[c000fd1cbd10] c0070824 cpuhp_thread_fun+0x15c/0x1b0
[c000fd1cbd60] c008e8cc smpboot_thread_fn+0x1e0/0x200
[c000fd1cbdc0] c008ae58 kthread+0x150/0x158
[c000fd1cbe30] c000b464 ret_from_kernel_thread+0x5c/0x78
[c00ff38ffb80] c02ddfac perf_pmu_migrate_context+0xac/0x470
[c00ff38ffc40] c011385c ppc_core_imc_cpu_offline+0x1ac/0x1e0
[c00ff38ffc90] c0125758 cpuhp_invoke_callback+0x198/0x5d0
[c00ff38ffd00] c012782c cpuhp_thread_fun+0x8c/0x3d0
[c00ff38ffd60] c01678d0 smpboot_thread_fn+0x290/0x2a0
[c00ff38ffdc0] c015ee78 kthread+0x168/0x1b0
[c00ff38ffe30] c000b368 ret_from_kernel_thread+0x5c/0x74

While registering the cpuhotplug callbacks for core-imc, if we fail in the
cpuhotplug online path for any random core (either because the opal call
to initialize the core-imc counters fails or because memory allocation
fails for that core), ppc_core_imc_cpu_offline() will get invoked for the
other CPUs that successfully returned from the cpuhotplug online path.
But in the ppc_core_imc_cpu_offline() path we are trying to migrate the
event context when the core-imc counters are not even initialized, thus
creating the above stack dump. Add a check to see whether the core-imc
counters are enabled or not in the cpuhotplug offline path before
migrating the context, to handle this failing scenario.

Why do we need a bool to track this? Can't we just check the data
structure we're deinitialising has been initialised?

My bad. Yes, we could do that. Something like this will work?

@@ -606,6 +608,20 @@ static int ppc_core_imc_cpu_offline(unsigned int cpu)
 	if (!cpumask_test_and_clear_cpu(cpu, &core_imc_cpumask))
 		return 0;
 
+	/*
+	 * Check whether core_imc is registered. We could end up here
+	 * if the cpuhotplug callback registration fails. i.e, callback
+	 * invokes the offline path for all successfully registered cpus.
+	 * At this stage, core_imc pmu will not be registered and we
+	 * should return here.
+	 *
+	 * We return with a zero since this is not an offline failure.
+	 * And cpuhp_setup_state() returns the actual failure reason
+	 * to the caller, which in turn will call the cleanup routine.
+	 */
+	if (!core_imc_pmu->pmu.event_init)
+		return 0;
+
 	/* Find any online cpu in that core except the current "cpu" */
 	ncpu = cpumask_any_but(cpu_sibling_mask(cpu), cpu);

Doesn't this also mean we won't clean up the initialisation for any CPUs
that have been initialised?

No, we do. cpuhp_setup_state() returns the actual failure reason, which
in turn calls the cleanup routine.

Thanks for the review comments.

Maddy

cheers

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 8812624..08139f9 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -30,6 +30,7 @@ static struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
 static cpumask_t nest_imc_cpumask;
 struct imc_pmu_ref *nest_imc_refc;
 static int nest_pmus;
+static bool core_imc_enabled;
 
 /* Core IMC data structures and variables */
 
@@ -607,6 +608,19 @@ static int ppc_core_imc_cpu_offline(unsigned int cpu)
 	if (!cpumask_test_and_clear_cpu(cpu, &core_imc_cpumask))
 		return 0;
 
+	/*
+	 * See if core imc counters are enabled or not.
+	 *
+	 * Suppose we reach here from core_imc_cpumask_init(),
+	 * since we failed at the cpuhotplug online path for any random
+	 * core (either because opal call t
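The fix being discussed is a guard pattern: before tearing down per-CPU state in the offline callback, check a field that only becomes non-NULL once registration actually completed, and return success (not an error) when it hasn't. A simplified userspace model of that guard follows; the struct layout and function names are stand-ins (the kernel checks core_imc_pmu->pmu.event_init and migrates via perf_pmu_migrate_context()):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the imc pmu: event_init is only set once the
 * pmu has actually been registered with the perf core. */
struct pmu {
	void (*event_init)(void);
};

struct imc_pmu {
	struct pmu pmu;
};

static int migrations;            /* counts simulated context migrations */

static void fake_event_init(void) { }

/* Model of the offline callback: bail out early when the pmu was never
 * registered, instead of migrating a context that does not exist.
 * Returning 0 matters: an unregistered pmu is not an offline failure. */
static int core_imc_cpu_offline_model(struct imc_pmu *core_imc_pmu)
{
	if (core_imc_pmu->pmu.event_init == NULL)
		return 0;         /* registration never happened; nothing to do */

	migrations++;             /* perf_pmu_migrate_context() in the kernel */
	return 0;
}
```

This is why no extra bool is needed: the data structure being deinitialised already records whether initialisation happened.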
Re: [Patch v10] QE: remove PPCisms for QE
Hi Zhao,

I just noticed a small nit.

On 01/11/17 02:01, Zhao Qiang wrote:

QE was supported on PowerPC and was dependent on PPC. Now it is supported
on other platforms, so remove the PPCisms.

Signed-off-by: Zhao Qiang
---
Changes for v2:
	- na
Changes for v3:
	- add NO_IRQ
Changes for v4:
	- modify spin_event_timeout to opencoded timeout loop
	- remove NO_IRQ
	- modify virq_to_hw to opencoded code
Changes for v5:
	- modify commit msg
	- modify depends of QUICC_ENGINE
	- add kerneldoc header for qe_issue_cmd
Changes for v6:
	- add dependency on FSL_SOC and PPC32 for drivers depending on
	  QUICC_ENGINE but not available on ARM
Changes for v7:
	- split qeic part to another patch
	- rebase
Changes for v8:
	- include in ucc_uart
Changes for v9:
	- fix cast warning
Changes for v10:
	- rebase

 drivers/net/ethernet/freescale/Kconfig | 11 ++---
 drivers/soc/fsl/qe/Kconfig             |  2 +-
 drivers/soc/fsl/qe/qe.c                | 82 +-
 drivers/soc/fsl/qe/qe_io.c             | 42 -
 drivers/soc/fsl/qe/qe_tdm.c            |  8 ++--
 drivers/soc/fsl/qe/ucc.c               | 10 ++---
 drivers/soc/fsl/qe/ucc_fast.c          | 74 +++---
 drivers/tty/serial/Kconfig             |  2 +-
 drivers/tty/serial/ucc_uart.c          |  1 +
 drivers/usb/gadget/udc/Kconfig         |  2 +-
 drivers/usb/host/Kconfig               |  2 +-
 include/soc/fsl/qe/qe.h                |  1 -
 12 files changed, 126 insertions(+), 111 deletions(-)

[...]

diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c
index 2ef6fc6487c1..1d695870ea9e 100644
--- a/drivers/soc/fsl/qe/qe.c
+++ b/drivers/soc/fsl/qe/qe.c

[...]

@@ -132,20 +142,26 @@ int qe_issue_cmd(u32 cmd, u32 device, u8 mcn_protocol, u32 cmd_input)
 		mcn_shift = QE_CR_MCN_NORMAL_SHIFT;
 	}
 
-	out_be32(&qe_immr->cp.cecdr, cmd_input);
-	out_be32(&qe_immr->cp.cecr,
-		 (cmd | QE_CR_FLG | ((u32) device << dev_shift) | (u32)
-		  mcn_protocol << mcn_shift));
+	iowrite32be(cmd_input, &qe_immr->cp.cecdr);
+	iowrite32be((cmd | QE_CR_FLG | ((u32)device << dev_shift) |
+		    (u32)mcn_protocol << mcn_shift), &qe_immr->cp.cecr);
 	}
 
 	/* wait for the QE_CR_FLG to clear */
-	ret = spin_event_timeout((in_be32(&qe_immr->cp.cecr) & QE_CR_FLG) == 0,
-				 100, 0);
+	ret = -EIO;
+	for (i = 0; i < 100; i++) {
+		if ((ioread32be(&qe_immr->cp.cecr) & QE_CR_FLG) == 0) {
+			ret = 0;
+			break;
+		}
+		udelay(1);
+	}
+
 	/* On timeout (e.g. failure), the expression will be false (ret == 0),
 	 * otherwise it will be true (ret == 1). */

nit: The comment here is no longer valid: on timeout ret == -EIO and on
success ret == 0. It should probably be removed to avoid confusion.

Cheers,

-- 
Julien Thierry
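The open-coded replacement for spin_event_timeout() is a bounded poll: try up to 100 times, succeed as soon as the flag clears, otherwise report -EIO — which is exactly why the old "ret == 1 on success" comment is now stale. A userspace sketch of the same loop shape follows; the register layout and flag value are illustrative (the kernel reads the real cecr register with ioread32be() and udelay()s between polls):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define QE_CR_FLG 0x10000U        /* illustrative busy-flag bit */

/* Poll a register until a busy flag clears, up to `spins` iterations.
 * Returns 0 on success and -EIO on timeout, matching the reworked loop:
 * ret starts pessimistic and is only cleared when the flag is seen low. */
static int wait_flag_clear(const volatile uint32_t *reg, unsigned int spins)
{
	for (unsigned int i = 0; i < spins; i++) {
		if ((*reg & QE_CR_FLG) == 0)
			return 0;
		/* udelay(1) in the kernel; omitted in this userspace model */
	}
	return -EIO;
}
```

Initialising the result to the error value and overwriting it on success keeps the loop body minimal and makes the timeout path the fall-through case, which is the structure Julien's nit asks the stale comment to reflect.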