linux-next: build failure after merge of the powerpc tree

2017-11-01 Thread Stephen Rothwell
Hi all,

After merging the powerpc tree, today's linux-next build (powerpc64
allnoconfig) failed like this:

arch/powerpc/kernel/irq.o: In function `.replay_system_reset':
irq.c:(.text+0x10): undefined reference to `.ppc_save_regs'

Caused by commit

  78adf6c214f0 ("powerpc/64s: Implement system reset idle wakeup reason")

I have applied the following fix patch:

From: Stephen Rothwell 
Date: Thu, 2 Nov 2017 17:45:18 +1100
Subject: [PATCH] powerpc/64s: ppc_save_regs is now needed for all 64s builds

Fixes: 78adf6c214f0 ("powerpc/64s: Implement system reset idle wakeup reason")
Signed-off-by: Stephen Rothwell 
---
 arch/powerpc/kernel/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 91960f83039c..34b6e729f38c 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -128,7 +128,7 @@ obj64-$(CONFIG_PPC_TRANSACTIONAL_MEM)   += tm.o
 obj-$(CONFIG_PPC64)+= $(obj64-y)
 obj-$(CONFIG_PPC32)+= $(obj32-y)
 
-ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC_CORE),)
+ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC_CORE)$(CONFIG_PPC_BOOK3S),)
 obj-y  += ppc_save_regs.o
 endif
 
-- 
2.14.1

-- 
Cheers,
Stephen Rothwell


Re: [PATCH v3] kernel/module_64.c: Add REL24 relocation support of livepatch symbols

2017-11-01 Thread Kamalesh Babulal

On Tuesday 31 October 2017 07:49 PM, Naveen N. Rao wrote:

Hi Kamalesh,
Sorry for the late review. Overall, the patch looks good to me. So:
Acked-by: Naveen N. Rao 

However, I have a few minor comments which can be addressed in a
subsequent patch.



Thanks for the review.

[...]


diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 0b0f896..005aaea 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c


[...]


@@ -239,10 +257,19 @@ static void relaswap(void *_x, void *_y, int size)

 /* Get size of potential trampolines required. */
 static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
-   const Elf64_Shdr *sechdrs)
+   const Elf64_Shdr *sechdrs,
+   struct module *me)
 {
/* One extra reloc so it's always 0-funcaddr terminated */
unsigned long relocs = 1;


You might want to get rid of 'relocs' above and simply use the two
counters below.


+   /*
+    * The size of a livepatch stub is 28 instructions, whereas a
+    * non-livepatch stub requires 7 instructions. Account for the
+    * different stub sizes and track the livepatch relocation
+    * count in me->arch.klp_relocs.
+    */
+   unsigned long sec_relocs = 0;
+   unsigned long klp_relocs = 0;


You should also initialize this to 1 (similar to relocs above) for use
in the iterators below. Or, update the iterators to use
me->arch.klp_relocs (1)


relocs is Sum(sec_relocs); the per-section relocation count is assigned
to sec_relocs. If the section is a livepatch section, the count is added
to klp_relocs, otherwise to relocs. sec_relocs acts as a temporary
variable holding the current section's relocation count.
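
In condensed form, the accounting is (sketch, using the names from the
patch):

  for (i = 1; i < hdr->e_shnum; i++) {
          if (sechdrs[i].sh_type != SHT_RELA)
                  continue;
          /* relocation count for this section only */
          sec_relocs = count_relocs((void *)sechdrs[i].sh_addr,
                                    sechdrs[i].sh_size / sizeof(Elf64_Rela));
          relocs += sec_relocs;                   /* running total */
          if (sechdrs[i].sh_flags & SHF_RELA_LIVEPATCH)
                  klp_relocs += sec_relocs;       /* livepatch subset */
  }
  relocs -= klp_relocs;   /* regular stubs = total minus livepatch stubs */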

unsigned i;

/* Every relocated section... */
@@ -262,9 +289,14 @@ static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
 sechdrs[i].sh_size / sizeof(Elf64_Rela),
 sizeof(Elf64_Rela), relacmp, relaswap);

-   relocs += count_relocs((void *)sechdrs[i].sh_addr,
+   sec_relocs = count_relocs((void *)sechdrs[i].sh_addr,
   sechdrs[i].sh_size
   / sizeof(Elf64_Rela));


How about also capturing 'sec_relocs' in 'struct mod_arch_specific'? (2)
That will help simplify a lot of the calculations here and elsewhere.
Note that there are many other places where the number of stubs is
derived by looking at 'sh_size', which is incorrect since we now have
klp stubs as well (create_ftrace_stub() for instance).


+   relocs += sec_relocs;
+#ifdef CONFIG_LIVEPATCH
+   if (sechdrs[i].sh_flags & SHF_RELA_LIVEPATCH)
+   klp_relocs += sec_relocs;
+#endif


If a module has SHF_RELA_LIVEPATCH, but the kernel is compiled without
CONFIG_LIVEPATCH, should we refuse to load the kernel module?


Yes, the module load will fail.



You might want to consider removing the above #ifdef and processing some
of these flags regardless of CONFIG_LIVEPATCH.


The #ifdef guard can be replaced with a check for SHF_RELA_LIVEPATCH in
the section flags.
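
Something like this (sketch; SHF_RELA_LIVEPATCH is defined
unconditionally in include/uapi/linux/elf.h, so the test compiles
regardless of CONFIG_LIVEPATCH):

  /* no #ifdef needed, the section flag alone identifies klp relas */
  if (sechdrs[i].sh_flags & SHF_RELA_LIVEPATCH)
          klp_relocs += sec_relocs;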

}
}

@@ -273,6 +305,15 @@ static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
relocs++;
 #endif

+   relocs -= klp_relocs;
+#ifdef CONFIG_LIVEPATCH
+   me->arch.klp_relocs = klp_relocs;
+
+   pr_debug("Looks like a total of %lu stubs, (%lu) livepatch stubs, 
max\n",

   ^
   (%lu livepatch stubs)

Just wondering why the brackets are the way they are...


It does read better with the brackets as you suggested.

+   relocs, klp_relocs);
+   return (relocs * sizeof(struct ppc64_stub_entry) +
+   klp_relocs * sizeof(struct ppc64le_klp_stub_entry));
+#endif
pr_debug("Looks like a total of %lu stubs, max\n", relocs);
return relocs * sizeof(struct ppc64_stub_entry);
 }


[...]



+#ifdef CONFIG_LIVEPATCH
+static unsigned long klp_stub_for_addr(const Elf64_Shdr *sechdrs,
+  unsigned long addr,
+  struct module *me)
+{
+   struct ppc64le_klp_stub_entry *klp_stubs;
+   unsigned int num_klp_stubs = me->arch.klp_relocs;
+   unsigned int i, num_stubs;
+
+   num_stubs = (sechdrs[me->arch.stubs_section].sh_size -
+   (num_klp_stubs * sizeof(*klp_stubs))) /
+   sizeof(struct ppc64_stub_entry);


(2) This can be simplified if we have me->arch.sec_relocs.


Having it will make the calculation look simpler:
num_stubs = me->arch.sec_relocs * sizeof(struct ppc64_stub_entry);

+
+   /*
+* Create livepatch stubs after the regular stubs.
+*/
+ 

Re: [PATCH 1/2] powerpc: Don't enable FP/Altivec if not checkpointed

2017-11-01 Thread Fengguang Wu

You are welcome! :)

On Thu, Nov 02, 2017 at 01:44:07PM +1100, Cyril Bur wrote:

On Thu, 2017-11-02 at 10:19 +0800, kbuild test robot wrote:

Hi Cyril,

Thank you for the patch! Yet something to improve:



Once again robot, you have done brilliantly! You're 100% correct and
the last thing I want to do is break the build with
CONFIG_PPC_TRANSACTIONAL_MEM turned off.

Life saver,
Thanks so much kbuild.

Cyril


[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.14-rc7 next-20171018]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Cyril-Bur/powerpc-Don-t-enable-FP-Altivec-if-not-checkpointed/20171102-073816
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-asp8347_defconfig (attached as .config)
compiler: powerpc-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc

All errors (new ones prefixed by >>):

   arch/powerpc/kernel/process.c: In function 'is_transactionally_fp':
> > arch/powerpc/kernel/process.c:243:15: error: 'struct thread_struct' has no member named 'ckpt_regs'

  (tsk->thread.ckpt_regs.msr & MSR_FP);
  ^
   arch/powerpc/kernel/process.c:244:1: error: control reaches end of non-void function [-Werror=return-type]
}
^
   cc1: all warnings being treated as errors

vim +243 arch/powerpc/kernel/process.c

   239  
   240  static int is_transactionally_fp(struct task_struct *tsk)
   241  {
   242  return msr_tm_active(tsk->thread.regs->msr) &&
 > 243   (tsk->thread.ckpt_regs.msr & MSR_FP);
   244  }
   245  

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




[PATCH v2] powerpc/powernv: Improve local TLB flush for boot and MCE on POWER9

2017-11-01 Thread Nicholas Piggin
There are two cases outside the normal address space management
where a CPU's local TLB is to be flushed:

  1. Booting the kernel, in case something has left stale entries in
 the TLB (e.g., kexec).

  2. Machine check, to clean corrupted TLB entries.

CPU state restore from deep idle states also currently flushes the TLB.
However this is just a side-effect of reusing the boot code to set CPU
state, rather than a requirement itself.

This type of TLB flush is coded inflexibly, several times for each CPU
type, and they have a number of problems with ISA v3.0B:

- The current radix mode of the MMU is not taken into account. tlbiel
  is undefined if the R field does not match the current radix mode.

- ISA v3.0B hash must flush the partition and process table caches.

- ISA v3.0B radix must flush partition and process scoped translations,
  partition and process table caches, and also the page walk cache.

To improve this situation, consolidate the flushing code and implement
it in C and inline asm under the mm/ directory with the rest of the
flush code. Add ISA v3.0B cases for radix and hash. Take it out from early
cputable detection hooks, and move it later in the boot process after
the MMU registers are set up and before relocation is first turned on.

Provide capability for LPID flush to specify radix mode.

The TLB flush is no longer called when restoring from deep idle states.
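
As a sketch, callers of the new interface then reduce to (illustrative
only):

  /* boot, or host machine check: flush the entire local TLB */
  tlbiel_all();

  /* guest-scoped flush, e.g. from KVM: only the current LPID */
  radix__tlbiel_all(TLB_INVAL_SCOPE_LPID);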

Signed-off-by: Nicholas Piggin 
---
Booted this on power8 and power9 with hash and radix

 arch/powerpc/include/asm/book3s/64/tlbflush-hash.h |   1 +
 .../powerpc/include/asm/book3s/64/tlbflush-radix.h |   3 +
 arch/powerpc/include/asm/book3s/64/tlbflush.h  |  34 ++
 arch/powerpc/include/asm/cputable.h|  12 ---
 arch/powerpc/kernel/cpu_setup_power.S  |  50 -
 arch/powerpc/kernel/cputable.c |  14 ---
 arch/powerpc/kernel/dt_cpu_ftrs.c  |  30 --
 arch/powerpc/kernel/mce_power.c| 115 +
 arch/powerpc/kvm/book3s_hv_ras.c   |   6 +-
 arch/powerpc/mm/hash_native_64.c   |  97 +
 arch/powerpc/mm/hash_utils_64.c|   8 ++
 arch/powerpc/mm/pgtable-radix.c|   6 ++
 arch/powerpc/mm/tlb-radix.c|  66 
 13 files changed, 219 insertions(+), 223 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h 
b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
index 99c99bb04353..f0e17f3f3f2f 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
@@ -50,6 +50,7 @@ static inline void arch_leave_lazy_mmu_mode(void)
 
 #define arch_flush_lazy_mmu_mode()  do {} while (0)
 
+extern void hash__tlbiel_all(unsigned int action);
 
 extern void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize,
int ssize, unsigned long flags);
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h 
b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index af06c6fe8a9f..db025dd7f619 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -10,6 +10,8 @@ static inline int mmu_get_ap(int psize)
return mmu_psize_defs[psize].ap;
 }
 
+extern void radix__tlbiel_all(unsigned int action);
+
 extern void radix__flush_hugetlb_tlb_range(struct vm_area_struct *vma,
   unsigned long start, unsigned long 
end);
 extern void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long 
start,
@@ -46,4 +48,5 @@ extern void radix__flush_tlb_lpid(unsigned long lpid);
 extern void radix__flush_tlb_all(void);
 extern void radix__flush_tlb_pte_p9_dd1(unsigned long old_pte, struct 
mm_struct *mm,
unsigned long address);
+
 #endif
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h 
b/arch/powerpc/include/asm/book3s/64/tlbflush.h
index 70760d018bcd..d9f3dc35d8fd 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
@@ -7,6 +7,40 @@
 #include 
 #include 
 
+/* TLB flush actions. Used as argument to tlbiel_all() */
+enum {
+   TLB_INVAL_SCOPE_GLOBAL = 0, /* invalidate all TLBs */
+   TLB_INVAL_SCOPE_LPID = 1,   /* invalidate TLBs for current LPID */
+};
+
+static inline void tlbiel_all(void)
+{
+   /*
+* This is used for host machine check and bootup.
+*
+* This uses early_radix_enabled and implementations use
+* early_cpu_has_feature etc because that works early in boot
+* and this is the machine check path which is not performance
+* critical.
+*/
+   if (early_radix_enabled())
+   radix__tlbiel_all(TLB_INVAL_SCOPE_GLOBAL);
+   else
+   hash__tlbiel_all(TLB_INVAL_SCOPE_GLOBAL);
+}
+
+static

Re: [RFC 2/2] powerpc/mm: Enable deferred flushing of TLB during reclaim

2017-11-01 Thread Anshuman Khandual
On 11/01/2017 03:47 PM, Anshuman Khandual wrote:
> Deferred flushing can only be enabled on POWER9 DD2.0 processors onwards,
> because prior versions of POWER9 and previous hash table based POWER
> processors do TLB flushing in the pte_get_and_clear() function itself,
> which prevents batching and eventual flush completion later on.
> 
> Signed-off-by: Anshuman Khandual 
> ---
>  arch/powerpc/Kconfig|  1 +
>  arch/powerpc/include/asm/tlbbatch.h | 30 +++
>  arch/powerpc/include/asm/tlbflush.h |  3 +++
>  arch/powerpc/mm/tlb-radix.c | 49 
> +
>  4 files changed, 83 insertions(+)
>  create mode 100644 arch/powerpc/include/asm/tlbbatch.h
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 809c468..f06b565 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -230,6 +230,7 @@ config PPC
>   select SPARSE_IRQ
>   select SYSCTL_EXCEPTION_TRACE
>   select VIRT_TO_BUS  if !PPC64
> + select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH if (PPC64 && PPC_BOOK3S)
>   #
>   # Please keep this list sorted alphabetically.
>   #
> diff --git a/arch/powerpc/include/asm/tlbbatch.h 
> b/arch/powerpc/include/asm/tlbbatch.h
> new file mode 100644
> index 000..fc762ef
> --- /dev/null
> +++ b/arch/powerpc/include/asm/tlbbatch.h
> @@ -0,0 +1,30 @@
> +#ifndef _ARCH_POWERPC_TLBBATCH_H
> +#define _ARCH_POWERPC_TLBBATCH_H
> +
> +#include 
> +
> +#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> +
> +#define MAX_BATCHED_MM 1024
> +
> +struct arch_tlbflush_unmap_batch {
> + /*
> +  * Each bit set is a CPU that potentially has a
> +  * TLB entry for one of the PFN being flushed.
> +  * This represents whether all deferred struct
> +  * mm will be flushed for any given CPU.
> +  */
> + struct cpumask cpumask;
> +
> + /* All the deferred struct mm */
> + struct mm_struct *mm[MAX_BATCHED_MM];
> + unsigned long int nr_mm;
> + 
> +};
> +
> +extern bool arch_tlbbatch_should_defer(struct mm_struct *mm);
> +extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
> +extern void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
> + struct mm_struct *mm);
> +#endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */
> +#endif /* _ARCH_POWERPC_TLBBATCH_H */
> diff --git a/arch/powerpc/include/asm/tlbflush.h 
> b/arch/powerpc/include/asm/tlbflush.h
> index 13dbcd4..2041923 100644
> --- a/arch/powerpc/include/asm/tlbflush.h
> +++ b/arch/powerpc/include/asm/tlbflush.h
> @@ -20,6 +20,9 @@
>   */
>  #ifdef __KERNEL__
> 
> +#include 
> +#include 
> +
>  #ifdef CONFIG_PPC_MMU_NOHASH
>  /*
>   * TLB flushing for software loaded TLB chips
> diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
> index b3e849c..506e7ed 100644
> --- a/arch/powerpc/mm/tlb-radix.c
> +++ b/arch/powerpc/mm/tlb-radix.c
> @@ -12,6 +12,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> 
>  #include 
>  #include 
> @@ -519,3 +521,50 @@ extern void radix_kvm_prefetch_workaround(struct 
> mm_struct *mm)
>  }
>  EXPORT_SYMBOL_GPL(radix_kvm_prefetch_workaround);
>  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
> +
> +#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> +static void clear_tlb(void *data)
> +{
> + struct arch_tlbflush_unmap_batch *batch = data;
> + int i;
> +
> + WARN_ON(!radix_enabled() || cpu_has_feature(CPU_FTR_POWER9_DD1));
> +
> + for (i = 0; i < batch->nr_mm; i++) {
> + if (batch->mm[i])
> + radix__local_flush_tlb_mm(batch->mm[i]);
> + }
> +}

Instead of flushing each affected 'struct mm' on the CPU, we can
just flush the entire TLB on that CPU for the given partition. But it's
not really giving any improvement over the flushing mechanism
described above.

diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index 506e7ed..b0eb218 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -525,15 +525,8 @@ extern void radix_kvm_prefetch_workaround(struct mm_struct 
*mm)
 #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
 static void clear_tlb(void *data)
 {
-   struct arch_tlbflush_unmap_batch *batch = data;
-   int i;
-
WARN_ON(!radix_enabled() || cpu_has_feature(CPU_FTR_POWER9_DD1));
-
-   for (i = 0; i < batch->nr_mm; i++) {
-   if (batch->mm[i])
-   radix__local_flush_tlb_mm(batch->mm[i]);
-   }
+   cur_cpu_spec->flush_tlb(TLB_INVAL_SCOPE_LPID);
 }

$ time ./run case-lru-file-mmap-read

real    4m15.766s
user    108m6.967s
sys     393m15.152s
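
For context, the generic side that drives this API during reclaim looks
roughly like the following (sketch of the mm/rmap.c flow; tlb_ubc is
the per-task struct tlbflush_unmap_batch):

  /* unmap path: defer the flush instead of doing it immediately */
  if (arch_tlbbatch_should_defer(mm)) {
          arch_tlbbatch_add_mm(&tlb_ubc->arch, mm);
          tlb_ubc->flush_required = true;
  }

  /* later, before the unmapped pages are actually freed */
  arch_tlbbatch_flush(&tlb_ubc->arch);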



Re: [PATCH 1/2] powerpc: Don't enable FP/Altivec if not checkpointed

2017-11-01 Thread Philip Li
On Thu, Nov 02, 2017 at 01:44:07PM +1100, Cyril Bur wrote:
> On Thu, 2017-11-02 at 10:19 +0800, kbuild test robot wrote:
> > Hi Cyril,
> > 
> > Thank you for the patch! Yet something to improve:
> > 
> 
> Once again robot, you have done brilliantly! You're 100% correct and
> the last thing I want to do is break the build with
> CONFIG_PPC_TRANSACTIONAL_MEM turned off.
> 
> Life saver,
> Thanks so much kbuild.
Thanks Cyril, we are glad to be of some help.

> 
> Cyril
> 
> > [auto build test ERROR on powerpc/next]
> > [also build test ERROR on v4.14-rc7 next-20171018]
> > [if your patch is applied to the wrong git tree, please drop us a note to 
> > help improve the system]
> > 
> > url:
> > https://github.com/0day-ci/linux/commits/Cyril-Bur/powerpc-Don-t-enable-FP-Altivec-if-not-checkpointed/20171102-073816
> > base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
> > next
> > config: powerpc-asp8347_defconfig (attached as .config)
> > compiler: powerpc-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
> > reproduce:
> > wget 
> > https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> > ~/bin/make.cross
> > chmod +x ~/bin/make.cross
> > # save the attached .config to linux build tree
> > make.cross ARCH=powerpc 
> > 
> > All errors (new ones prefixed by >>):
> > 
> >arch/powerpc/kernel/process.c: In function 'is_transactionally_fp':
> > > > arch/powerpc/kernel/process.c:243:15: error: 'struct thread_struct' has 
> > > > no member named 'ckpt_regs'
> > 
> >   (tsk->thread.ckpt_regs.msr & MSR_FP);
> >   ^
> >arch/powerpc/kernel/process.c:244:1: error: control reaches end of 
> > non-void function [-Werror=return-type]
> > }
> > ^
> >cc1: all warnings being treated as errors
> > 
> > vim +243 arch/powerpc/kernel/process.c
> > 
> >239  
> >240  static int is_transactionally_fp(struct task_struct *tsk)
> >241  {
> >242  return msr_tm_active(tsk->thread.regs->msr) &&
> >  > 243  (tsk->thread.ckpt_regs.msr & MSR_FP);
> >244  }
> >245  
> > 
> > ---
> > 0-DAY kernel test infrastructure            Open Source Technology Center
> > https://lists.01.org/pipermail/kbuild-all   Intel Corporation
> 


Re: [RFC PATCH 0/7] powerpc/64s/radix TLB flush performance improvements

2017-11-01 Thread Nicholas Piggin
On Thu, 2 Nov 2017 08:49:49 +0530
Anshuman Khandual  wrote:

> On 11/01/2017 07:09 PM, Nicholas Piggin wrote:
> > On Wed, 1 Nov 2017 17:35:51 +0530
> > Anshuman Khandual  wrote:
> >   
> >> On 10/31/2017 12:14 PM, Nicholas Piggin wrote:  
> >>> Here's a random mix of performance improvements for radix TLB flushing
> >>> code. The main aims are to reduce the amount of translation that gets
> >>> invalidated, and to reduce global flushes where we can do local.
> >>>
> >>> To that end, a parallel kernel compile benchmark using powerpc:tlbie
> >>> tracepoint shows a reduction in tlbie instructions from about 290,000
> >>> to 80,000, and a reduction in tlbiel instructions from 49,500,000 to
> >>> 15,000,000. Looks great, but unfortunately does not translate to a
> >>> statistically significant performance improvement! The needle on TLB
> >>> misses does not move much, I suspect because a lot of the flushing is
> >>> done a startup and shutdown, and because a significant cost of TLB
> >>> flushing itself is in the barriers.
> >>
> >> Does memory barrier initiate a single global invalidation with tlbie ?
> >>  
> > 
> > I'm not quite sure what you're asking, and I don't know the details
> > of how the hardware handles it, but from the measurements in patch
> > 1 of the series we can see there is a benefit for both tlbie and
> > tlbiel of batching them up between barriers.  
> 
> Ahh, I might have got the statement "a significant cost of TLB flushing
> itself is in the barriers" wrong. I guess you were referring to the
> total cost of multiple TLB flushes with memory barriers between each of
> them, which causes the high execution cost. This got reduced by packing
> multiple tlbie(l) instructions behind a single pair of memory barriers.
> 

Yes, that did get reduced for the va range flush in my patches. However
the big reduction in the number of tlbiel calls came from more use of
range flushes and fewer PID flushes. But the PID flushes already have
such an optimisation. Therefore, despite tlbiel being reduced, the
number of barriers probably has not gone down a great deal on this
workload, which may explain why the performance numbers are basically
in the noise.
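
For illustration, the va range flush batches roughly like this (sketch
of the tlb-radix.c pattern), so the barrier cost is paid once per batch
rather than once per tlbiel:

  asm volatile("ptesync" : : : "memory");
  for (addr = start; addr < end; addr += page_size)
          __tlbiel_va(addr, pid, ap, RIC_FLUSH_TLB);      /* no barrier each */
  asm volatile("ptesync" : : : "memory");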

Thanks,
Nick


Re: [RFC PATCH 0/7] powerpc/64s/radix TLB flush performance improvements

2017-11-01 Thread Anshuman Khandual
On 11/01/2017 07:09 PM, Nicholas Piggin wrote:
> On Wed, 1 Nov 2017 17:35:51 +0530
> Anshuman Khandual  wrote:
> 
>> On 10/31/2017 12:14 PM, Nicholas Piggin wrote:
>>> Here's a random mix of performance improvements for radix TLB flushing
>>> code. The main aims are to reduce the amount of translation that gets
>>> invalidated, and to reduce global flushes where we can do local.
>>>
>>> To that end, a parallel kernel compile benchmark using powerpc:tlbie
>>> tracepoint shows a reduction in tlbie instructions from about 290,000
>>> to 80,000, and a reduction in tlbiel instructions from 49,500,000 to
>>> 15,000,000. Looks great, but unfortunately does not translate to a
>>> statistically significant performance improvement! The needle on TLB
>>> misses does not move much, I suspect because a lot of the flushing is
>>> done a startup and shutdown, and because a significant cost of TLB
>>> flushing itself is in the barriers.  
>>
>> Does memory barrier initiate a single global invalidation with tlbie ?
>>
> 
> I'm not quite sure what you're asking, and I don't know the details
> of how the hardware handles it, but from the measurements in patch
> 1 of the series we can see there is a benefit for both tlbie and
> tlbiel of batching them up between barriers.

Ahh, I might have got the statement "a significant cost of TLB flushing
itself is in the barriers" wrong. I guess you were referring to the
total cost of multiple TLB flushes with memory barriers between each of
them, which causes the high execution cost. This got reduced by packing
multiple tlbie(l) instructions behind a single pair of memory barriers.



[PATCH v3 3/4] powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint

2017-11-01 Thread Cyril Bur
Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

tm_reclaim() has optimisations to not always save the FP/Altivec
registers to the checkpointed save area. This was originally done
because the caller might have information that the checkpointed
registers aren't valid due to lazy save and restore. We've also been a
little vague as to how tm_reclaim() leaves the FP/Altivec state since it
doesn't necessarily always save it to the thread struct. This has led
to an (incorrect) assumption that it leaves the checkpointed state on
the CPU.

tm_recheckpoint() has similar optimisations in reverse. It may not
always reload the checkpointed FP/Altivec registers from the thread
struct before the trecheckpoint. It is therefore quite unclear where it
expects to get the state from. This didn't help with the assumption
made about tm_reclaim().

These optimisations sit in what is by definition a slow path. If a
process has to go through a reclaim/recheckpoint then its transaction
will be doomed on returning to userspace. This means that the process
will be unable to complete its transaction and will be forced to its
failure handler. This is already an out-of-line case for userspace.
Furthermore, the cost of copying 64 times 128 bits (1 KiB) from registers
is negligible[0] on modern processors. As such it appears these optimisations
have only served to increase code complexity and are unlikely to have
had a measurable performance impact.

Our transactional memory handling has been riddled with bugs. A cause
of this has been difficulty in following the code flow, code complexity
has not been our friend here. It makes sense to remove these
optimisations in favour of a (hopefully) more stable implementation.

This patch does mean that sometimes the assembly will needlessly save
'junk' registers which will subsequently get overwritten with the
correct value by the C code which calls the assembly function. This
small inefficiency is far outweighed by the reduction in complexity for
general TM code, context switching paths, and transactional facility
unavailable exception handler.

0: I tried to measure it once for other work and found that it was
hiding in the noise of everything else I was working with. I find it
exceedingly likely this will be the case here.

Signed-off-by: Cyril Bur 
---
V2: Unchanged
V3: Unchanged

 arch/powerpc/include/asm/tm.h   |  5 ++--
 arch/powerpc/kernel/process.c   | 22 ++-
 arch/powerpc/kernel/signal_32.c |  2 +-
 arch/powerpc/kernel/signal_64.c |  2 +-
 arch/powerpc/kernel/tm.S| 59 -
 arch/powerpc/kernel/traps.c | 26 +-
 6 files changed, 35 insertions(+), 81 deletions(-)

diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h
index 82e06ca3a49b..33d965911bec 100644
--- a/arch/powerpc/include/asm/tm.h
+++ b/arch/powerpc/include/asm/tm.h
@@ -11,10 +11,9 @@
 
 extern void tm_enable(void);
 extern void tm_reclaim(struct thread_struct *thread,
-  unsigned long orig_msr, uint8_t cause);
+  uint8_t cause);
 extern void tm_reclaim_current(uint8_t cause);
-extern void tm_recheckpoint(struct thread_struct *thread,
-   unsigned long orig_msr);
+extern void tm_recheckpoint(struct thread_struct *thread);
 extern void tm_abort(uint8_t cause);
 extern void tm_save_sprs(struct thread_struct *thread);
 extern void tm_restore_sprs(struct thread_struct *thread);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index bf651f2fd3bd..b00c291cd05c 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -869,6 +869,8 @@ static void tm_reclaim_thread(struct thread_struct *thr,
 
giveup_all(container_of(thr, struct task_struct, thread));
 
+   tm_reclaim(thr, cause);
+
/*
 * If we are in a transaction and FP is off then we can't have
 * used FP inside that transaction. Hence the checkpointed
@@ -887,8 +889,6 @@ static void tm_reclaim_thread(struct thread_struct *thr,
if ((thr->ckpt_regs.msr & MSR_VEC) == 0)
memcpy(&thr->ckvr_state, &thr->vr_state,
   sizeof(struct thread_vr_state));
-
-   tm_reclaim(thr, thr->ckpt_regs.msr, cause);
 }
 
 void tm_reclaim_current(uint8_t cause)
@@ -937,11 +937,9 @@ static inline void tm_reclaim_task(struct task_struct *tsk)
tm_save_sprs(thr);
 }
 
-extern void __tm_recheckpoint(struct thread_struct *th

[PATCH v3 4/4] powerpc: Remove facility loadups on transactional {fp, vec, vsx} unavailable

2017-11-01 Thread Cyril Bur
After handling a transactional FP, Altivec or VSX unavailable exception,
the return-to-userspace code will detect that the TIF_RESTORE_TM bit is
set and call restore_tm_state(). restore_tm_state() will call
restore_math() to ensure that the correct facilities are loaded.

This means that all the loadup code in {fp,altivec,vsx}_unavailable_tm()
is doing pointless work and can simply be removed.
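
For reference, the path relied upon is roughly (sketch of
restore_tm_state() in kernel/process.c, details elided):

  clear_thread_flag(TIF_RESTORE_TM);
  if (!MSR_TM_ACTIVE(regs->msr))
          return;

  msr_diff = current->thread.ckpt_regs.msr & ~regs->msr;
  msr_diff &= MSR_FP | MSR_VEC | MSR_VSX;
  ...
  restore_math(regs);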

Signed-off-by: Cyril Bur 
---
V2: Obvious cleanup which should have been in v1
V3: Unchanged
 arch/powerpc/kernel/traps.c | 30 --
 1 file changed, 30 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 4a7bc64352fd..3181e85ef17c 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1471,12 +1471,6 @@ void facility_unavailable_exception(struct pt_regs *regs)
 
 void fp_unavailable_tm(struct pt_regs *regs)
 {
-   /*
-* Save the MSR now because tm_reclaim_current() is likely to
-* change it
-*/
-   unsigned long orig_msr = regs->msr;
-
/* Note:  This does not handle any kind of FP laziness. */
 
TM_DEBUG("FP Unavailable trap whilst transactional at 0x%lx, MSR=%lx\n",
@@ -1502,24 +1496,10 @@ void fp_unavailable_tm(struct pt_regs *regs)
 * so we don't want to load the VRs from the thread_struct.
 */
tm_recheckpoint(¤t->thread);
-
-   /* If VMX is in use, get the transactional values back */
-   if (orig_msr & MSR_VEC) {
-   msr_check_and_set(MSR_VEC);
-   load_vr_state(¤t->thread.vr_state);
-   /* At this point all the VSX state is loaded, so enable it */
-   regs->msr |= MSR_VSX;
-   }
 }
 
 void altivec_unavailable_tm(struct pt_regs *regs)
 {
-   /*
-* Save the MSR now because tm_reclaim_current() is likely to
-* change it
-*/
-   unsigned long orig_msr = regs->msr;
-
/* See the comments in fp_unavailable_tm().  This function operates
 * the same way.
 */
@@ -1531,12 +1511,6 @@ void altivec_unavailable_tm(struct pt_regs *regs)
current->thread.load_vec = 1;
tm_recheckpoint(¤t->thread);
current->thread.used_vr = 1;
-
-   if (orig_msr & MSR_FP) {
-   msr_check_and_set(MSR_FP);
-   load_fp_state(¤t->thread.fp_state);
-   regs->msr |= MSR_VSX;
-   }
 }
 
 void vsx_unavailable_tm(struct pt_regs *regs)
@@ -1561,10 +1535,6 @@ void vsx_unavailable_tm(struct pt_regs *regs)
current->thread.load_fp = 1;
 
tm_recheckpoint(¤t->thread);
-
-   msr_check_and_set(MSR_FP | MSR_VEC);
-   load_fp_state(¤t->thread.fp_state);
-   load_vr_state(¤t->thread.vr_state);
 }
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
-- 
2.15.0



[PATCH v3 2/4] powerpc: Force reload for recheckpoint during tm {fp, vec, vsx} unavailable exception

2017-11-01 Thread Cyril Bur
Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

tm_reclaim() has optimisations to not always save the FP/Altivec
registers to the checkpointed save area. This was originally done
because the caller might have information that the checkpointed
registers aren't valid due to lazy save and restore. We've also been a
little vague as to how tm_reclaim() leaves the FP/Altivec state since it
doesn't necessarily always save it to the thread struct. This has led
to an (incorrect) assumption that it leaves the checkpointed state on
the CPU.

tm_recheckpoint() has similar optimisations in reverse. It may not
always reload the checkpointed FP/Altivec registers from the thread
struct before the trecheckpoint. It is therefore quite unclear where it
expects to get the state from. This didn't help with the assumption
made about tm_reclaim().

This patch is a minimal fix for ease of backporting. A more correct fix
which removes the msr parameter to tm_reclaim() and tm_recheckpoint()
altogether has been upstreamed to apply on top of this patch.

Fixes: dc3106690b20 ("powerpc: tm: Always use fp_state and vr_state to store live registers")

Signed-off-by: Cyril Bur 
---
V2: Add this patch for ease of backporting the same fix as the next
patch.
V3: No change

 arch/powerpc/kernel/process.c |  4 ++--
 arch/powerpc/kernel/traps.c   | 22 +-
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index cff887e67eb9..bf651f2fd3bd 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -867,6 +867,8 @@ static void tm_reclaim_thread(struct thread_struct *thr,
if (!MSR_TM_SUSPENDED(mfmsr()))
return;
 
+   giveup_all(container_of(thr, struct task_struct, thread));
+
/*
 * If we are in a transaction and FP is off then we can't have
 * used FP inside that transaction. Hence the checkpointed
@@ -886,8 +888,6 @@ static void tm_reclaim_thread(struct thread_struct *thr,
memcpy(&thr->ckvr_state, &thr->vr_state,
   sizeof(struct thread_vr_state));
 
-   giveup_all(container_of(thr, struct task_struct, thread));
-
tm_reclaim(thr, thr->ckpt_regs.msr, cause);
 }
 
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index ef6a45969812..a7d42c89a257 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1471,6 +1471,12 @@ void facility_unavailable_exception(struct pt_regs *regs)
 
 void fp_unavailable_tm(struct pt_regs *regs)
 {
+   /*
+* Save the MSR now because tm_reclaim_current() is likely to
+* change it
+*/
+   unsigned long orig_msr = regs->msr;
+
/* Note:  This does not handle any kind of FP laziness. */
 
TM_DEBUG("FP Unavailable trap whilst transactional at 0x%lx, MSR=%lx\n",
@@ -1495,10 +1501,10 @@ void fp_unavailable_tm(struct pt_regs *regs)
 * If VMX is in use, the VRs now hold checkpointed values,
 * so we don't want to load the VRs from the thread_struct.
 */
-   tm_recheckpoint(¤t->thread, MSR_FP);
+   tm_recheckpoint(¤t->thread, orig_msr | MSR_FP);
 
/* If VMX is in use, get the transactional values back */
-   if (regs->msr & MSR_VEC) {
+   if (orig_msr & MSR_VEC) {
msr_check_and_set(MSR_VEC);
load_vr_state(¤t->thread.vr_state);
/* At this point all the VSX state is loaded, so enable it */
@@ -1508,6 +1514,12 @@ void fp_unavailable_tm(struct pt_regs *regs)
 
 void altivec_unavailable_tm(struct pt_regs *regs)
 {
+   /*
+* Save the MSR now because tm_reclaim_current() is likely to
+* change it
+*/
+   unsigned long orig_msr = regs->msr;
+
/* See the comments in fp_unavailable_tm().  This function operates
 * the same way.
 */
@@ -1517,10 +1529,10 @@ void altivec_unavailable_tm(struct pt_regs *regs)
 regs->nip, regs->msr);
tm_reclaim_current(TM_CAUSE_FAC_UNAV);
current->thread.load_vec = 1;
-   tm_recheckpoint(¤t->thread, MSR_VEC);
+   tm_recheckpoint(¤t->thread, orig_msr | MSR_VEC);
current->thread.used_vr = 1;
 
-   if (regs->msr & MSR_FP) {
+   if (orig_msr & MSR_FP) {
msr_check_and_set(MSR_FP);
load_fp_state(¤t->thread.fp_state);
regs->msr |= MSR_VSX;
@@ -1559,7 +1571,7 @@ void vsx_u

[PATCH v3 1/4] powerpc: Don't enable FP/Altivec if not checkpointed

2017-11-01 Thread Cyril Bur
Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

Lazy save and restore of FP/Altivec cannot be done if a process is
transactional. If a facility was enabled it must remain enabled whenever
a thread is transactional.

Commit dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware
transactional memory in use") ensures that the facilities are always
enabled if a thread is transactional. A bug in the introduced code may
cause it to inadvertently enable a facility that was (and should remain)
disabled. The problem with this extraneous enablement is that the
registers for the erroneously enabled facility have not been correctly
recheckpointed - the recheckpointing code assumed the facility would
remain disabled.

Further compounding the issue, the transactional {fp,altivec,vsx}
unavailable code has been incorrectly using the MSR to enable
facilities. The presence of the {FP,VEC,VSX} bits in regs->msr simply
indicates whether the registers are live on the CPU, not whether the
kernel should load them before returning to userspace. This has worked
due to the bug
mentioned above.

This causes transactional threads which return to their failure handler
to observe incorrect checkpointed registers. Perhaps an example will
help illustrate the problem:

A userspace process is running and uses both FP and Altivec registers.
This process then continues to run for some time without touching
either sets of registers. The kernel subsequently disables the
facilities as part of lazy save and restore. The userspace process then
performs a tbegin and the CPU checkpoints 'junk' FP and Altivec
registers. The process then performs a floating point instruction
triggering a fp unavailable exception in the kernel.

The kernel then loads the FP registers - and only the FP registers.
Since the thread is transactional it must perform a reclaim and
recheckpoint to ensure both the checkpointed registers and the
transactional registers are correct. It then (correctly) enables
MSR[FP] for the process. Later (on exception exit) the kernel also
(inadvertently) enables MSR[VEC]. The process is then returned to
userspace.

Since the act of loading the FP registers doomed the transaction we know
CPU will fail the transaction, restore its checkpointed registers, and
return the process to its failure handler. The problem is that we're
now running with Altivec enabled and the 'junk' checkpointed registers
are restored. The kernel had only recheckpointed FP.

This patch solves this by only activating FP/Altivec if userspace was
using them when it entered the kernel and not simply if the process is
transactional.

Fixes: dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use")

Signed-off-by: Cyril Bur 
---
V2: Rather than incorrectly using the MSR to enable {FP,VEC,VSX} use
the load_fp and load_vec booleans to help restore_math() make the
correct decision
V3: Put tm_active_with_{fp,altivec}() inside a #ifdef
CONFIG_PPC_TRANSACTIONAL_MEM 


 arch/powerpc/kernel/process.c | 18 --
 arch/powerpc/kernel/traps.c   |  8 
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a0c74bbf3454..cff887e67eb9 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -97,9 +97,23 @@ static inline bool msr_tm_active(unsigned long msr)
 {
return MSR_TM_ACTIVE(msr);
 }
+
+static bool tm_active_with_fp(struct task_struct *tsk)
+{
+   return msr_tm_active(tsk->thread.regs->msr) &&
+   (tsk->thread.ckpt_regs.msr & MSR_FP);
+}
+
+static bool tm_active_with_altivec(struct task_struct *tsk)
+{
+   return msr_tm_active(tsk->thread.regs->msr) &&
+   (tsk->thread.ckpt_regs.msr & MSR_VEC);
+}
 #else
 static inline bool msr_tm_active(unsigned long msr) { return false; }
 static inline void check_if_tm_restore_required(struct task_struct *tsk) { }
+static inline bool tm_active_with_fp(struct task_struct *tsk) { return false; }
+static inline bool tm_active_with_altivec(struct task_struct *tsk) { return false; }
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
 bool strict_msr_control;
@@ -232,7 +246,7 @@ EXPORT_SYMBOL(enable_kernel_fp);
 
 static int restore_fp(struct task_struct *tsk)
 {
-   if (tsk->thread.load_fp || msr_tm_active(tsk->thread.regs->msr)) {
+   if (tsk->thread.load_fp || tm_active_with_fp(tsk)) {
load_fp_state(¤t->thread.fp_state);
current-

Re: [PATCH 1/2] powerpc: Don't enable FP/Altivec if not checkpointed

2017-11-01 Thread Cyril Bur
On Thu, 2017-11-02 at 10:19 +0800, kbuild test robot wrote:
> Hi Cyril,
> 
> Thank you for the patch! Yet something to improve:
> 

Once again robot, you have done brilliantly! You're 100% correct and
the last thing I want to do is break the build with
CONFIG_PPC_TRANSACTIONAL_MEM turned off.

Life saver,
Thanks so much kbuild.

Cyril

> [auto build test ERROR on powerpc/next]
> [also build test ERROR on v4.14-rc7 next-20171018]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Cyril-Bur/powerpc-Don-t-enable-FP-Altivec-if-not-checkpointed/20171102-073816
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
> config: powerpc-asp8347_defconfig (attached as .config)
> compiler: powerpc-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=powerpc 
> 
> All errors (new ones prefixed by >>):
> 
>arch/powerpc/kernel/process.c: In function 'is_transactionally_fp':
> > > arch/powerpc/kernel/process.c:243:15: error: 'struct thread_struct' has 
> > > no member named 'ckpt_regs'
> 
>   (tsk->thread.ckpt_regs.msr & MSR_FP);
>   ^
>arch/powerpc/kernel/process.c:244:1: error: control reaches end of 
> non-void function [-Werror=return-type]
> }
> ^
>cc1: all warnings being treated as errors
> 
> vim +243 arch/powerpc/kernel/process.c
> 
>239
>240static int is_transactionally_fp(struct task_struct *tsk)
>241{
>242return msr_tm_active(tsk->thread.regs->msr) &&
>  > 243(tsk->thread.ckpt_regs.msr & MSR_FP);
>244}
>245
> 
> ---
> 0-DAY kernel test infrastructure            Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Re: [PATCH 1/2] powerpc: Don't enable FP/Altivec if not checkpointed

2017-11-01 Thread kbuild test robot
Hi Cyril,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.14-rc7 next-20171018]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Cyril-Bur/powerpc-Don-t-enable-FP-Altivec-if-not-checkpointed/20171102-073816
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-asp8347_defconfig (attached as .config)
compiler: powerpc-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   arch/powerpc/kernel/process.c: In function 'is_transactionally_fp':
>> arch/powerpc/kernel/process.c:243:15: error: 'struct thread_struct' has no 
>> member named 'ckpt_regs'
  (tsk->thread.ckpt_regs.msr & MSR_FP);
  ^
   arch/powerpc/kernel/process.c:244:1: error: control reaches end of non-void 
function [-Werror=return-type]
}
^
   cc1: all warnings being treated as errors

vim +243 arch/powerpc/kernel/process.c

   239  
   240  static int is_transactionally_fp(struct task_struct *tsk)
   241  {
   242  return msr_tm_active(tsk->thread.regs->msr) &&
 > 243  (tsk->thread.ckpt_regs.msr & MSR_FP);
   244  }
   245  

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


[PATCH v2 2/2] powerpc/64s: idle skip POWER9 DD1 and DD2.0 specific workarounds on DD2.1

2017-11-01 Thread Nicholas Piggin
DD2.1 does not have to flush the ERAT after a state-loss idle. It also
does not have to save and restore MMCR0 (although it does have to save
and restore it in deep idle states, like other PMU registers).

Performance testing was done on a DD2.1 using only the stop0 idle state
(the shallowest state which supports state loss), using the
context_switch selftest configured to ping-pong between two threads on
the same core and on two different cores.

Performance improvement for same core is 7.0%, different cores is 14.8%.

Reviewed-by: Vaidyanathan Srinivasan 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/idle_book3s.S | 31 ---
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/idle_book3s.S 
b/arch/powerpc/kernel/idle_book3s.S
index 175d49f468af..59657783d1d5 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -112,12 +112,14 @@ power9_save_additional_sprs:
std r4, STOP_HFSCR(r13)
 
mfspr   r3, SPRN_MMCRA
-   mfspr   r4, SPRN_MMCR1
+   mfspr   r4, SPRN_MMCR0
std r3, STOP_MMCRA(r13)
-   std r4, STOP_MMCR1(r13)
+   std r4, _MMCR0(r1)
 
-   mfspr   r3, SPRN_MMCR2
-   std r3, STOP_MMCR2(r13)
+   mfspr   r3, SPRN_MMCR1
+   mfspr   r4, SPRN_MMCR2
+   std r3, STOP_MMCR1(r13)
+   std r4, STOP_MMCR2(r13)
blr
 
 power9_restore_additional_sprs:
@@ -135,11 +137,14 @@ power9_restore_additional_sprs:
ld  r4, STOP_MMCRA(r13)
mtspr   SPRN_HFSCR, r3
mtspr   SPRN_MMCRA, r4
-   /* We have already restored PACA_MMCR0 */
-   ld  r3, STOP_MMCR1(r13)
-   ld  r4, STOP_MMCR2(r13)
-   mtspr   SPRN_MMCR1, r3
-   mtspr   SPRN_MMCR2, r4
+
+   ld  r3, _MMCR0(r1)
+   ld  r4, STOP_MMCR1(r13)
+   mtspr   SPRN_MMCR0, r3
+   mtspr   SPRN_MMCR1, r4
+
+   ld  r3, STOP_MMCR2(r13)
+   mtspr   SPRN_MMCR2, r3
blr
 
 /*
@@ -350,6 +355,7 @@ power_enter_stop:
b   pnv_wakeup_noloss
 
 .Lhandle_esl_ec_set:
+BEGIN_FTR_SECTION
/*
 * POWER9 DD2 can incorrectly set PMAO when waking up after a
 * state-loss idle. Saving and restoring MMCR0 over idle is a
@@ -357,6 +363,7 @@ power_enter_stop:
 */
mfspr   r4,SPRN_MMCR0
std r4,_MMCR0(r1)
+END_FTR_SECTION_IFSET(CPU_FTR_POWER9_DD1 | CPU_FTR_POWER9_DD20)
 
 /*
  * Check if the requested state is a deep idle state.
@@ -542,15 +549,17 @@ pnv_restore_hyp_resource_arch300:
 * then clear bit 60 in MMCRA to ensure the PMU starts running.
 */
blt cr3,1f
+BEGIN_FTR_SECTION
PPC_INVALIDATE_ERAT
ld  r1,PACAR1(r13)
+   ld  r4,_MMCR0(r1)
+   mtspr   SPRN_MMCR0,r4
+END_FTR_SECTION_IFSET(CPU_FTR_POWER9_DD1 | CPU_FTR_POWER9_DD20)
mfspr   r4,SPRN_MMCRA
ori r4,r4,(1 << (63-60))
mtspr   SPRN_MMCRA,r4
xorir4,r4,(1 << (63-60))
mtspr   SPRN_MMCRA,r4
-   ld  r4,_MMCR0(r1)
-   mtspr   SPRN_MMCR0,r4
 1:
/*
 * POWER ISA 3. Use PSSCR to determine if we
-- 
2.15.0



[PATCH v2 1/2] powerpc: add POWER9_DD20 feature

2017-11-01 Thread Nicholas Piggin
Cc: Michael Neuling 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/cputable.h |  5 -
 arch/powerpc/kernel/cputable.c  | 20 
 arch/powerpc/kernel/dt_cpu_ftrs.c   |  2 ++
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index a9bf921f4efc..194dc3006446 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -215,6 +215,7 @@ enum {
 #define CPU_FTR_DABRX  LONG_ASM_CONST(0x0800000000000000)
 #define CPU_FTR_PMAO_BUG   LONG_ASM_CONST(0x1000000000000000)
 #define CPU_FTR_POWER9_DD1 LONG_ASM_CONST(0x4000000000000000)
+#define CPU_FTR_POWER9_DD20    LONG_ASM_CONST(0x8000000000000000)
 
 #ifndef __ASSEMBLY__
 
@@ -477,6 +478,7 @@ enum {
CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_ARCH_300)
 #define CPU_FTRS_POWER9_DD1 ((CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD1) & \
 (~CPU_FTR_SAO))
+#define CPU_FTRS_POWER9_DD20 (CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD20)
 #define CPU_FTRS_CELL  (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
@@ -495,7 +497,8 @@ enum {
(CPU_FTRS_POWER4 | CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \
 CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \
 CPU_FTRS_POWER8 | CPU_FTRS_POWER8_DD1 | CPU_FTRS_CELL | \
-CPU_FTRS_PA6T | CPU_FTR_VSX | CPU_FTRS_POWER9 | 
CPU_FTRS_POWER9_DD1)
+CPU_FTRS_PA6T | CPU_FTR_VSX | CPU_FTRS_POWER9 | \
+CPU_FTRS_POWER9_DD1 | CPU_FTRS_POWER9_DD20)
 #endif
 #else
 enum {
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 760872916013..171820190de7 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -547,6 +547,26 @@ static struct cpu_spec __initdata cpu_specs[] = {
.machine_check_early= __machine_check_early_realmode_p9,
.platform   = "power9",
},
+   {   /* Power9 DD2.0 */
+   .pvr_mask   = 0xffffefff,
+   .pvr_value  = 0x004e0200,
+   .cpu_name   = "POWER9 (raw)",
+   .cpu_features   = CPU_FTRS_POWER9_DD20,
+   .cpu_user_features  = COMMON_USER_POWER9,
+   .cpu_user_features2 = COMMON_USER2_POWER9,
+   .mmu_features   = MMU_FTRS_POWER9,
+   .icache_bsize   = 128,
+   .dcache_bsize   = 128,
+   .num_pmcs   = 6,
+   .pmc_type   = PPC_PMC_IBM,
+   .oprofile_cpu_type  = "ppc64/power9",
+   .oprofile_type  = PPC_OPROFILE_INVALID,
+   .cpu_setup  = __setup_cpu_power9,
+   .cpu_restore= __restore_cpu_power9,
+   .flush_tlb  = __flush_tlb_power9,
+   .machine_check_early= __machine_check_early_realmode_p9,
+   .platform   = "power9",
+   },
{   /* Power9 */
.pvr_mask   = 0xffff0000,
.pvr_value  = 0x004e0000,
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 7275fed271af..63b9d7edd63f 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -735,6 +735,8 @@ static __init void cpufeatures_cpu_quirks(void)
 */
if ((version & 0xffffff00) == 0x004e0100)
cur_cpu_spec->cpu_features |= CPU_FTR_POWER9_DD1;
+   else if ((version & 0xffffefff) == 0x004e0200)
+   cur_cpu_spec->cpu_features |= CPU_FTR_POWER9_DD20;
 }
 
 static void __init cpufeatures_setup_finished(void)
-- 
2.15.0



[PATCH v2 0/2] Add POWER9 DD2.0 feature, remove idle workarounds in DD2.1

2017-11-01 Thread Nicholas Piggin
Changes since v1:
- Change the unnecessarily 'inverted else' conditions for the DD2.0
  feature workarounds, noted by mpe.
- One of the cases was incorrect, removing the code for DD2.0 and
  leaving it in for DD2.1. Performance is further slightly improved
  with this case fixed.
  

Nicholas Piggin (2):
  powerpc: add POWER9_DD20 feature
  powerpc/64s: idle skip POWER9 DD1 and DD2.0 specific workarounds on
DD2.1

 arch/powerpc/include/asm/cputable.h |  5 -
 arch/powerpc/kernel/cputable.c  | 20 
 arch/powerpc/kernel/dt_cpu_ftrs.c   |  2 ++
 arch/powerpc/kernel/idle_book3s.S   | 31 ---
 4 files changed, 46 insertions(+), 12 deletions(-)

-- 
2.15.0



RE: [PATCH v10 0/4] this patchset is to remove PPCisms for QEIC

2017-11-01 Thread Qiang Zhao
On Wed, 1 Nov 2017, Thomas Gleixner  wrote:


> -Original Message-
> From: Thomas Gleixner [mailto:t...@linutronix.de]
> Sent: Thursday, November 02, 2017 1:10 AM
> To: Qiang Zhao 
> Cc: Michael Ellerman ; Jason Cooper
> ; Marc Zyngier ;
> o...@buserror.net; linuxppc-dev@lists.ozlabs.org; Xiaobo Xie
> ; linux-ker...@vger.kernel.org
> Subject: RE: [PATCH v10 0/4] this patchset is to remove PPCisms for QEIC
> 
> On Wed, 1 Nov 2017, Qiang Zhao wrote:
> > Michael Ellerman  wrote
> > >
> > > Qiang Zhao  writes:
> > >
> > > > Hi all,
> > > >
> > > > Could anybody review this patchset and take action on them? Thank you!
> > >
> > > Who maintains this? I don't actually know, it's not powerpc code, or is 
> > > it?
> >
> > Yes, it's not powerpc code, it is irqchip code, maintained by Thomas, Jason 
> > and
> Marc according to MAINTAINERS file.
> >
> > Hi Thomas, Jason and Marc,
> >
> > Could you keep an eye on this patchset? Thank you!
> 
> It's on my radar, but I have zero capacity at the moment. Hopefully Marc can
> spare a few cycles.
> 
> Thanks,
> 
>   tglx

Thank you!

Best Regards
Qiang Zhao


RE: [Patch v10] QE: remove PPCisms for QE

2017-11-01 Thread Qiang Zhao
On 01/11/17 05:19PM, Julien Thierry  wrote:
> -Original Message-
> From: Julien Thierry [mailto:julien.thie...@arm.com]
> Sent: Wednesday, November 01, 2017 5:19 PM
> To: Qiang Zhao ; o...@buserror.net
> Cc: linuxppc-dev@lists.ozlabs.org; linux-arm-ker...@lists.infradead.org
> Subject: Re: [Patch v10] QE: remove PPCisms for QE
> 
> Hi Zhao,
> 
> I just noticed a small nit.
> 
> > /* wait for the QE_CR_FLG to clear */
> > -   ret = spin_event_timeout((in_be32(&qe_immr->cp.cecr) & QE_CR_FLG)
> == 0,
> > -  100, 0);
> > +   ret = -EIO;
> > +   for (i = 0; i < 100; i++) {
> > +   if ((ioread32be(&qe_immr->cp.cecr) & QE_CR_FLG) == 0) {
> > +   ret = 0;
> > +   break;
> > +   }
> > +   udelay(1);
> > +   }
> > +
> > /* On timeout (e.g. failure), the expression will be false (ret == 0),
> >otherwise it will be true (ret == 1). */
> 
> nit:
> The comment here is no longer valid, on timeout ret == -EIO and on success 0. 
> It
> should probably be removed to avoid confusion.

Thank you for the reminder; I missed it and will fix it in the next version.
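
As a possible cleanup (sketch only, assuming <linux/iopoll.h> is usable
in this driver), the open-coded loop could also be written as:

  u32 val;

  ret = readx_poll_timeout_atomic(ioread32be, &qe_immr->cp.cecr, val,
                                  (val & QE_CR_FLG) == 0, 1, 100);
  /* ret is 0 on success and -ETIMEDOUT on timeout, so the stale
   * "(ret == 1)" comment goes away entirely. */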
 
BR
-Qiang Zhao


Re: [PATCH] selftests/powerpc: Check FP/VEC on exception in TM

2017-11-01 Thread Cyril Bur
On Wed, 2017-11-01 at 15:23 -0400, Gustavo Romero wrote:
> Add a self test to check if FP/VEC/VSX registers are sane (restored
> correctly) after a FP/VEC/VSX unavailable exception is caught during a
> transaction.
> 
> This test checks all possibilities in a thread regarding the combination
> of MSR.[FP|VEC] states in a thread and for each scenario raises a
> FP/VEC/VSX unavailable exception in transactional state, verifying if
> vs0 and vs32 registers, which are representatives of FP/VEC/VSX reg
> sets, are not corrupted.
> 

Thanks Gustavo,

I do have one more thought on an improvement for this test which is
that:
+   /* Counter for busy wait */
+   uint64_t counter = 0x1ff00;
is a bit fragile; what we should do is have the test work out how long
it should spin until it reliably gets a TM_CAUSE_FAC_UNAV failure, and
then use that for these tests.

This will only become a problem if we were to change kernel heuristics
which is fine for now. I'll try to get that added soon but for now this
test has proven too useful to delay adding as is.

> Signed-off-by: Gustavo Romero 
> Signed-off-by: Breno Leitao 
> Signed-off-by: Cyril Bur 
> ---
>  tools/testing/selftests/powerpc/tm/Makefile|   3 +-
>  .../testing/selftests/powerpc/tm/tm-unavailable.c  | 368 
> +
>  tools/testing/selftests/powerpc/tm/tm.h|   5 +
>  3 files changed, 375 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/powerpc/tm/tm-unavailable.c
> 
> diff --git a/tools/testing/selftests/powerpc/tm/Makefile 
> b/tools/testing/selftests/powerpc/tm/Makefile
> index 7bfcd45..24855c0 100644
> --- a/tools/testing/selftests/powerpc/tm/Makefile
> +++ b/tools/testing/selftests/powerpc/tm/Makefile
> @@ -2,7 +2,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr 
> tm-signal-context-chk-fpu
>   tm-signal-context-chk-vmx tm-signal-context-chk-vsx
>  
>  TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv 
> tm-signal-stack \
> - tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail \
> + tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable \
>   $(SIGNAL_CONTEXT_CHK_TESTS)
>  
>  include ../../lib.mk
> @@ -16,6 +16,7 @@ $(OUTPUT)/tm-syscall: CFLAGS += -I../../../../../usr/include
>  $(OUTPUT)/tm-tmspr: CFLAGS += -pthread
>  $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64
>  $(OUTPUT)/tm-resched-dscr: ../pmu/lib.o
> +$(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 
> -Wno-error=uninitialized -mvsx
>  
>  SIGNAL_CONTEXT_CHK_TESTS := $(patsubst 
> %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS))
>  $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S
> diff --git a/tools/testing/selftests/powerpc/tm/tm-unavailable.c 
> b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
> new file mode 100644
> index 000..69a4e8c
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
> @@ -0,0 +1,368 @@
> +/*
> + * Copyright 2017, Gustavo Romero, Breno Leitao, Cyril Bur, IBM Corp.
> + * Licensed under GPLv2.
> + *
> + * Force FP, VEC and VSX unavailable exception during transaction in all
> + * possible scenarios regarding the MSR.FP and MSR.VEC state, e.g. when FP
> + * is enabled and VEC is disabled, when FP is disabled and VEC is enabled, and
> + * so on. Then we check if the restored state is correctly set for the
> + * FP and VEC registers to the previous state we set just before we entered
> + * in TM, i.e. we check if it corrupts somehow the recheckpointed FP and
> + * VEC/Altivec registers on abortion due to an unavailable exception in TM.
> + * N.B. In this test we do not test all the FP/Altivec/VSX registers for
> + * corruption, but only for registers vs0 and vs32, which are respectively
> + * representatives of FP and VEC/Altivec reg sets.
> + */
> +
> +#define _GNU_SOURCE
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "tm.h"
> +
> +#define DEBUG 0
> +
> +/* Unavailable exceptions to test in HTM */
> +#define FP_UNA_EXCEPTION 0
> +#define VEC_UNA_EXCEPTION1
> +#define VSX_UNA_EXCEPTION2
> +
> +#define NUM_EXCEPTIONS   3
> +
> +struct Flags {
> + int touch_fp;
> + int touch_vec;
> + int result;
> + int exception;
> +} flags;
> +
> +bool expecting_failure(void)
> +{
> + if (flags.touch_fp && flags.exception == FP_UNA_EXCEPTION)
> + return false;
> +
> + if (flags.touch_vec && flags.exception == VEC_UNA_EXCEPTION)
> + return false;
> +
> + /* If both FP and VEC are touched it does not mean that touching
> +  * VSX won't raise an exception. However since FP and VEC state
> +  * are already correctly loaded, the transaction is not aborted
> +  * (i.e. treclaimed/trecheckpointed) and MSR.VSX is just set as 1,
> +  * so a TM failure is not expected also in this case.
> +  */
> + if ((flags.touch_fp && flags.touch_vec) &&
> +  flags.exception == VSX_UNA_EXCEPTION)
> + return false;

linux-next: manual merge of the powerpc tree with the powerpc-fixes tree

2017-11-01 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the powerpc tree got a conflict in:

  arch/powerpc/mm/tlb-radix.c

between commit:

  26e53d5ebe2e ("powerpc/64s/radix: Fix preempt imbalance in TLB flush")

from the powerpc-fixes tree and commit:

  6773027205ea ("powerpc/mm/radix: Drop unneeded NULL check")

from the powerpc tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/powerpc/mm/tlb-radix.c
index d304028641a2,3a07d7a5e2fe..
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@@ -359,8 -359,7 +359,8 @@@ void radix__flush_tlb_collapsed_pmd(str
unsigned long pid, end;
  
  
-   pid = mm ? mm->context.id : 0;
+   pid = mm->context.id;
 +  preempt_disable();
if (unlikely(pid == MMU_NO_CONTEXT))
goto no_context;
  


[PATCH] [net-next, v2] ibmvnic: Feature implementation of Vital Product Data (VPD) for the ibmvnic driver

2017-11-01 Thread Desnes Augusto Nunes do Rosario
This patch implements and enables VPD support for the ibmvnic driver.
Moreover, it includes the implementation of suitable structs, signal
transmission/handling and functions which allow the retrieval of firmware
information from the ibmvnic card.

Signed-off-by: Desnes A. Nunes do Rosario 
Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 135 -
 drivers/net/ethernet/ibm/ibmvnic.h |  27 +++-
 2 files changed, 159 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index d0cff28..ee7a27b 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -573,6 +573,15 @@ static int reset_tx_pools(struct ibmvnic_adapter *adapter)
return 0;
 }
 
+static void release_vpd_data(struct ibmvnic_adapter *adapter)
+{
+   if (!adapter->vpd)
+   return;
+
+   kfree(adapter->vpd->buff);
+   kfree(adapter->vpd);
+}
+
 static void release_tx_pools(struct ibmvnic_adapter *adapter)
 {
struct ibmvnic_tx_pool *tx_pool;
@@ -753,6 +762,8 @@ static void release_resources(struct ibmvnic_adapter 
*adapter)
 {
int i;
 
+   release_vpd_data(adapter);
+
release_tx_pools(adapter);
release_rx_pools(adapter);
 
@@ -833,6 +844,53 @@ static int set_real_num_queues(struct net_device *netdev)
return rc;
 }
 
+static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter)
+{
+   struct device *dev = &adapter->vdev->dev;
+   union ibmvnic_crq crq;
+   dma_addr_t dma_addr;
+   int len;
+
+   if (adapter->vpd->buff)
+   len = adapter->vpd->len;
+
+   reinit_completion(&adapter->fw_done);
+   crq.get_vpd_size.first = IBMVNIC_CRQ_CMD;
+   crq.get_vpd_size.cmd = GET_VPD_SIZE;
+   ibmvnic_send_crq(adapter, &crq);
+   wait_for_completion(&adapter->fw_done);
+
+   if (!adapter->vpd->buff)
+   adapter->vpd->buff = kzalloc(adapter->vpd->len, GFP_KERNEL);
+   else if (adapter->vpd->len != len)
+   adapter->vpd->buff =
+   krealloc(adapter->vpd->buff,
+adapter->vpd->len, GFP_KERNEL);
+
+   if (!adapter->vpd->buff) {
+   dev_err(dev, "Could allocate VPD buffer\n");
+   return -ENOMEM;
+   }
+
+   adapter->vpd->dma_addr =
+   dma_map_single(dev, adapter->vpd->buff, adapter->vpd->len,
+  DMA_FROM_DEVICE);
+   if (dma_mapping_error(dev, adapter->vpd->dma_addr)) {
+   dev_err(dev, "Could not map VPD buffer\n");
+   return -ENOMEM;
+   }
+
+   reinit_completion(&adapter->fw_done);
+   crq.get_vpd.first = IBMVNIC_CRQ_CMD;
+   crq.get_vpd.cmd = GET_VPD;
+   crq.get_vpd.ioba = cpu_to_be32(adapter->vpd->dma_addr);
+   crq.get_vpd.len = cpu_to_be32((u32)adapter->vpd->len);
+   ibmvnic_send_crq(adapter, &crq);
+   wait_for_completion(&adapter->fw_done);
+
+   return 0;
+}
+
 static int init_resources(struct ibmvnic_adapter *adapter)
 {
struct net_device *netdev = adapter->netdev;
@@ -850,6 +908,10 @@ static int init_resources(struct ibmvnic_adapter *adapter)
if (rc)
return rc;
 
+   adapter->vpd = kzalloc(sizeof(*adapter->vpd), GFP_KERNEL);
+   if (!adapter->vpd)
+   return -ENOMEM;
+
adapter->map_id = 1;
adapter->napi = kcalloc(adapter->req_rx_queues,
sizeof(struct napi_struct), GFP_KERNEL);
@@ -950,6 +1012,10 @@ static int ibmvnic_open(struct net_device *netdev)
 
rc = __ibmvnic_open(netdev);
netif_carrier_on(netdev);
+
+   /* Vital Product Data (VPD) */
+   ibmvnic_get_vpd(adapter);
+
mutex_unlock(&adapter->reset_lock);
 
return rc;
@@ -1878,11 +1944,15 @@ static int ibmvnic_get_link_ksettings(struct net_device 
*netdev,
return 0;
 }
 
-static void ibmvnic_get_drvinfo(struct net_device *dev,
+static void ibmvnic_get_drvinfo(struct net_device *netdev,
struct ethtool_drvinfo *info)
 {
+   struct ibmvnic_adapter *adapter = netdev_priv(netdev);
+
strlcpy(info->driver, ibmvnic_driver_name, sizeof(info->driver));
strlcpy(info->version, IBMVNIC_DRIVER_VERSION, sizeof(info->version));
+   strlcpy(info->fw_version, adapter->fw_version,
+   sizeof(info->fw_version));
 }
 
 static u32 ibmvnic_get_msglevel(struct net_device *netdev)
@@ -3076,6 +3146,63 @@ static void send_cap_queries(struct ibmvnic_adapter 
*adapter)
ibmvnic_send_crq(adapter, &crq);
 }
 
+static void handle_vpd_size_rsp(union ibmvnic_crq *crq,
+   struct ibmvnic_adapter *adapter)
+{
+   struct device *dev = &adapter->vdev->dev;
+
+   if (crq->get_vpd_size_rsp.rc.code) {
+   dev_err(dev, "Error retrieving VPD size, rc=%x\n",
+   crq->get_vpd_

[PATCH FEAT] ibmvnic: Feature implementation of Vital Product Data (VPD) for the ibmvnic driver

2017-11-01 Thread Desnes Augusto Nunes do Rosario
This patch implements and enables VPD support for the ibmvnic driver. Moreover,
it includes the implementation of suitable structs, signal
transmission/handling and fuctions which allow the retrieval of firmware
information from the ibmvnic card.

Co-Authored-By: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 140 -
 drivers/net/ethernet/ibm/ibmvnic.h |  27 ++-
 2 files changed, 164 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index d0cff28..120f3c0 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -107,6 +107,9 @@ static union sub_crq *ibmvnic_next_scrq(struct 
ibmvnic_adapter *,
struct ibmvnic_sub_crq_queue *);
 static int ibmvnic_poll(struct napi_struct *napi, int data);
 static void send_map_query(struct ibmvnic_adapter *adapter);
+static int ibmvnic_get_vpd(struct ibmvnic_adapter *);
+static void handle_vpd_size_rsp(union ibmvnic_crq *, struct ibmvnic_adapter *);
+static void handle_vpd_rsp(union ibmvnic_crq *,struct ibmvnic_adapter *);
 static void send_request_map(struct ibmvnic_adapter *, dma_addr_t, __be32, u8);
 static void send_request_unmap(struct ibmvnic_adapter *, u8);
 static void send_login(struct ibmvnic_adapter *adapter);
@@ -573,6 +576,15 @@ static int reset_tx_pools(struct ibmvnic_adapter *adapter)
return 0;
 }
 
+static void release_vpd_data(struct ibmvnic_adapter *adapter)
+{
+   if(!adapter->vpd)
+   return;
+
+   kfree(adapter->vpd->buff);
+   kfree(adapter->vpd);
+}
+
 static void release_tx_pools(struct ibmvnic_adapter *adapter)
 {
struct ibmvnic_tx_pool *tx_pool;
@@ -753,6 +765,8 @@ static void release_resources(struct ibmvnic_adapter 
*adapter)
 {
int i;
 
+   release_vpd_data(adapter);
+
release_tx_pools(adapter);
release_rx_pools(adapter);
 
@@ -850,6 +864,10 @@ static int init_resources(struct ibmvnic_adapter *adapter)
if (rc)
return rc;
 
+   adapter->vpd = kzalloc(sizeof(struct ibmvnic_vpd), GFP_KERNEL);
+   if (!adapter->vpd)
+   return -ENOMEM;
+
adapter->map_id = 1;
adapter->napi = kcalloc(adapter->req_rx_queues,
sizeof(struct napi_struct), GFP_KERNEL);
@@ -950,6 +968,10 @@ static int ibmvnic_open(struct net_device *netdev)
 
rc = __ibmvnic_open(netdev);
netif_carrier_on(netdev);
+
+   /* Vital Product Data (VPD) */
+   ibmvnic_get_vpd(adapter);
+
mutex_unlock(&adapter->reset_lock);
 
return rc;
@@ -1878,11 +1900,15 @@ static int ibmvnic_get_link_ksettings(struct net_device 
*netdev,
return 0;
 }
 
-static void ibmvnic_get_drvinfo(struct net_device *dev,
+static void ibmvnic_get_drvinfo(struct net_device *netdev,
struct ethtool_drvinfo *info)
 {
+   struct ibmvnic_adapter *adapter = netdev_priv(netdev);
+
strlcpy(info->driver, ibmvnic_driver_name, sizeof(info->driver));
strlcpy(info->version, IBMVNIC_DRIVER_VERSION, sizeof(info->version));
+   strlcpy(info->fw_version, adapter->fw_version,
+   sizeof(info->fw_version));
 }
 
 static u32 ibmvnic_get_msglevel(struct net_device *netdev)
@@ -2923,6 +2947,110 @@ static void send_login(struct ibmvnic_adapter *adapter)
return;
 }
 
+static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter)
+{
+   struct device *dev = &adapter->vdev->dev;
+   union ibmvnic_crq crq;
+   dma_addr_t dma_addr;
+   int len;
+
+   if (adapter->vpd->buff)
+   len = adapter->vpd->len;
+
+   reinit_completion(&adapter->fw_done);
+   crq.get_vpd_size.first = IBMVNIC_CRQ_CMD;
+   crq.get_vpd_size.cmd = GET_VPD_SIZE;
+   ibmvnic_send_crq(adapter, &crq);
+   wait_for_completion(&adapter->fw_done);
+
+   if (!adapter->vpd->buff)
+   adapter->vpd->buff = kzalloc(adapter->vpd->len, GFP_KERNEL);
+   else if (adapter->vpd->len != len)
+   adapter->vpd->buff =
+   krealloc(adapter->vpd->buff,
+adapter->vpd->len, GFP_KERNEL);
+
+   if (!adapter->vpd->buff) {
+   dev_err(dev, "Could allocate VPD buffer\n");
+   return -ENOMEM;
+   }
+
+   adapter->vpd->dma_addr =
+   dma_map_single(dev, adapter->vpd->buff, adapter->vpd->len,
+  DMA_FROM_DEVICE);
+   if (dma_mapping_error(dev, adapter->vpd->dma_addr)) {
+   dev_err(dev, "Could not map VPD buffer\n");
+   return -ENOMEM;
+   }
+
+   reinit_completion(&adapter->fw_done);
+   crq.get_vpd.first = IBMVNIC_CRQ_CMD;
+   crq.get_vpd.cmd = GET_VPD;
+   crq.get_vpd.ioba = cpu_to_be32(adapter->vpd->dma_addr);
+   crq.get_vpd.len = cpu_to_be32((u32)adapter->vpd->len);
+   ibmvnic_send_crq(adapter,

[PATCH] selftests/powerpc: Check FP/VEC on exception in TM

2017-11-01 Thread Gustavo Romero
Add a self test to check if FP/VEC/VSX registers are sane (restored
correctly) after a FP/VEC/VSX unavailable exception is caught during a
transaction.

This test checks all possibilities regarding the combination of
MSR.[FP|VEC] states in a thread and for each scenario raises a
FP/VEC/VSX unavailable exception in transactional state, verifying that
vs0 and vs32, which are representatives of the FP/VEC/VSX reg sets,
are not corrupted.

Signed-off-by: Gustavo Romero 
Signed-off-by: Breno Leitao 
Signed-off-by: Cyril Bur 
---
 tools/testing/selftests/powerpc/tm/Makefile|   3 +-
 .../testing/selftests/powerpc/tm/tm-unavailable.c  | 368 +
 tools/testing/selftests/powerpc/tm/tm.h|   5 +
 3 files changed, 375 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-unavailable.c

diff --git a/tools/testing/selftests/powerpc/tm/Makefile 
b/tools/testing/selftests/powerpc/tm/Makefile
index 7bfcd45..24855c0 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -2,7 +2,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr 
tm-signal-context-chk-fpu
tm-signal-context-chk-vmx tm-signal-context-chk-vsx
 
 TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv 
tm-signal-stack \
-   tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail \
+   tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable \
$(SIGNAL_CONTEXT_CHK_TESTS)
 
 include ../../lib.mk
@@ -16,6 +16,7 @@ $(OUTPUT)/tm-syscall: CFLAGS += -I../../../../../usr/include
 $(OUTPUT)/tm-tmspr: CFLAGS += -pthread
 $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64
 $(OUTPUT)/tm-resched-dscr: ../pmu/lib.o
+$(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 -Wno-error=uninitialized 
-mvsx
 
 SIGNAL_CONTEXT_CHK_TESTS := $(patsubst 
%,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS))
 $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S
diff --git a/tools/testing/selftests/powerpc/tm/tm-unavailable.c 
b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
new file mode 100644
index 000..69a4e8c
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
@@ -0,0 +1,368 @@
+/*
+ * Copyright 2017, Gustavo Romero, Breno Leitao, Cyril Bur, IBM Corp.
+ * Licensed under GPLv2.
+ *
+ * Force FP, VEC and VSX unavailable exception during transaction in all
+ * possible scenarios regarding the MSR.FP and MSR.VEC state, e.g. when FP
+ * is enabled and VEC is disabled, when FP is disabled and VEC is enabled, and
+ * so on. Then we check if the restored state is correctly set for the
+ * FP and VEC registers to the previous state we set just before we entered
+ * in TM, i.e. we check if it corrupts somehow the recheckpointed FP and
+ * VEC/Altivec registers on abortion due to an unavailable exception in TM.
+ * N.B. In this test we do not test all the FP/Altivec/VSX registers for
+ * corruption, but only for registers vs0 and vs32, which are respectively
+ * representatives of FP and VEC/Altivec reg sets.
+ */
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "tm.h"
+
+#define DEBUG 0
+
+/* Unavailable exceptions to test in HTM */
+#define FP_UNA_EXCEPTION   0
+#define VEC_UNA_EXCEPTION  1
+#define VSX_UNA_EXCEPTION  2
+
+#define NUM_EXCEPTIONS 3
+
+struct Flags {
+   int touch_fp;
+   int touch_vec;
+   int result;
+   int exception;
+} flags;
+
+bool expecting_failure(void)
+{
+   if (flags.touch_fp && flags.exception == FP_UNA_EXCEPTION)
+   return false;
+
+   if (flags.touch_vec && flags.exception == VEC_UNA_EXCEPTION)
+   return false;
+
+   /* If both FP and VEC are touched it does not mean that touching
+* VSX won't raise an exception. However since FP and VEC state
+* are already correctly loaded, the transaction is not aborted
+* (i.e. treclaimed/trecheckpointed) and MSR.VSX is just set as 1,
+* so a TM failure is not expected also in this case.
+*/
+   if ((flags.touch_fp && flags.touch_vec) &&
+flags.exception == VSX_UNA_EXCEPTION)
+   return false;
+
+   return true;
+}
+
+/* Check if failure occurred whilst in transaction. */
+bool is_failure(uint64_t condition_reg)
+{
+   /* When failure handling occurs, CR0 is set to 0b1010 (0xa). Otherwise
+* the transaction completes without failure and hence reaches the
+* 'tend.' that sets CR0 to 0b0100 (0x4).
+*/
+   return ((condition_reg >> 28) & 0xa) == 0xa;
+}
+
+void *ping(void *input)
+{
+
+   /* Expected values for vs0 and vs32 after a TM failure. They must
+* never change; if they did, they were corrupted.
+*/
+   uint64_t high_vs0 = 0x;
+   uint64_t low_vs0 = 0x;
+   uint64_t high_vs32 = 0x;
+   uint64_t low_vs32 = 0x;
+
+  

[PATCH V6 2/2] pseries/initnodes: Ensure nodes initialized for hotplug

2017-11-01 Thread Michael Bringmann
pseries/nodes: On pseries systems which allow 'hot-add' of CPU,
it may occur that the new resources are to be inserted into nodes
that were not used for memory resources at bootup.  Many different
configurations of PowerPC resources may need to be supported depending
upon the environment.  This patch fixes some problems encountered at
runtime with configurations that support memory-less nodes, or that
hot-add CPUs into nodes that are memoryless during system execution
after boot.

Signed-off-by: Michael Bringmann 
---
Changes in V6:
  -- Add some needed node initialization to runtime code that maps
 CPUs based on VPHN associativity
  -- Add error checks and alternate recovery for compile flag
 CONFIG_MEMORY_HOTPLUG
  -- Add alternate node selection recovery for !CONFIG_MEMORY_HOTPLUG
---
 arch/powerpc/mm/numa.c |   51 ++--
 1 file changed, 40 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 334a1ff..163f4cc 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -551,7 +551,7 @@ static int numa_setup_cpu(unsigned long lcpu)
nid = of_node_to_nid_single(cpu);
 
 out_present:
-   if (nid < 0 || !node_online(nid))
+   if (nid < 0 || !node_possible(nid))
nid = first_online_node;
 
map_cpu_to_node(lcpu, nid);
@@ -867,7 +867,7 @@ void __init dump_numa_cpu_topology(void)
 }
 
 /* Initialize NODE_DATA for a node on the local memory */
-static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
+static void setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
 {
u64 spanned_pages = end_pfn - start_pfn;
const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
@@ -913,10 +913,8 @@ static void __init find_possible_nodes(void)
min_common_depth);
 
for (i = 0; i < numnodes; i++) {
-   if (!node_possible(i)) {
-   setup_node_data(i, 0, 0);
+   if (!node_possible(i))
node_set(i, node_possible_map);
-   }
}
 
 out:
@@ -1312,6 +1310,42 @@ static long vphn_get_associativity(unsigned long cpu,
return rc;
 }
 
+static inline int find_cpu_nid(int cpu)
+{
+   __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
+   int new_nid;
+
+   /* Use associativity from first thread for all siblings */
+   vphn_get_associativity(cpu, associativity);
+   new_nid = associativity_to_nid(associativity);
+   if (new_nid < 0 || !node_possible(new_nid))
+   new_nid = first_online_node;
+
+   if (NODE_DATA(new_nid) == NULL) {
+#ifdef CONFIG_MEMORY_HOTPLUG
+   /*
+* Need to ensure that NODE_DATA is initialized
+* for a node from available memory (see
+* memblock_alloc_try_nid).  If unable to init
+* the node, then default to nearest node that
+* has memory installed.
+*/
+   if (try_online_node(new_nid))
+   new_nid = first_online_node;
+#else
+   /*
+* Default to using the nearest node that has
+* memory installed.  Otherwise, it would be 
+* necessary to patch the kernel MM code to deal
+* with more memoryless-node error conditions.
+*/
+   new_nid = first_online_node;
+#endif
+   }
+
+   return new_nid;
+}
+
 /*
  * Update the CPU maps and sysfs entries for a single CPU when its NUMA
  * characteristics change. This function doesn't perform any locking and is
@@ -1379,7 +1413,6 @@ int numa_update_cpu_topology(bool cpus_locked)
 {
unsigned int cpu, sibling, changed = 0;
struct topology_update_data *updates, *ud;
-   __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
cpumask_t updated_cpus;
struct device *dev;
int weight, new_nid, i = 0;
@@ -1417,11 +1450,7 @@ int numa_update_cpu_topology(bool cpus_locked)
continue;
}
 
-   /* Use associativity from first thread for all siblings */
-   vphn_get_associativity(cpu, associativity);
-   new_nid = associativity_to_nid(associativity);
-   if (new_nid < 0 || !node_online(new_nid))
-   new_nid = first_online_node;
+   new_nid = find_cpu_nid(cpu);
 
if (new_nid == numa_cpu_lookup_table[cpu]) {
cpumask_andnot(&cpu_associativity_changes_mask,



[PATCH V6 1/2] pseries/nodes: Ensure enough nodes avail for operations

2017-11-01 Thread Michael Bringmann
pseries/nodes: On pseries systems which allow 'hot-add' of CPU or
memory resources, it may occur that the new resources are to be
inserted into nodes that were not used for these resources at bootup.
In the kernel, any node that is used must be defined and initialized.
This patch ensures that sufficient nodes are defined to support
configuration requirements after boot, as well as at boot.

This patch extracts the value of the lowest domain level (number
of allocable resources) from the device tree property
"ibm,max-associativity-domains" to use as the maximum number of nodes
to setup as possibly available in the system.  This new setting will
override the instruction,

nodes_and(node_possible_map, node_possible_map, node_online_map);

presently seen in the function arch/powerpc/mm/numa.c:initmem_init().

If the property is not present at boot, no operation will be performed
to define or enable additional nodes.

Signed-off-by: Michael Bringmann 
---
Changes in V6:
  -- Remove some node initialization/allocation from boot setup
---
 arch/powerpc/mm/numa.c |   40 +---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index eb604b3..334a1ff 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -892,6 +892,37 @@ static void __init setup_node_data(int nid, u64 start_pfn, 
u64 end_pfn)
NODE_DATA(nid)->node_spanned_pages = spanned_pages;
 }
 
+static void __init find_possible_nodes(void)
+{
+   struct device_node *rtas;
+   u32 numnodes, i;
+
+   if (min_common_depth <= 0)
+   return;
+
+   rtas = of_find_node_by_path("/rtas");
+   if (!rtas)
+   return;
+
+   if (of_property_read_u32_index(rtas,
+   "ibm,max-associativity-domains",
+   min_common_depth, &numnodes))
+   goto out;
+
+   pr_info("numa: Nodes = %d (mcd = %d)\n", numnodes,
+   min_common_depth);
+
+   for (i = 0; i < numnodes; i++) {
+   if (!node_possible(i)) {
+   setup_node_data(i, 0, 0);
+   node_set(i, node_possible_map);
+   }
+   }
+
+out:
+   of_node_put(rtas);
+}
+
 void __init initmem_init(void)
 {
int nid, cpu;
@@ -905,12 +936,15 @@ void __init initmem_init(void)
memblock_dump_all();
 
/*
-* Reduce the possible NUMA nodes to the online NUMA nodes,
-* since we do not support node hotplug. This ensures that  we
-* lower the maximum NUMA node ID to what is actually present.
+* Modify the set of possible NUMA nodes to reflect information
+* available about the set of online nodes, and the set of nodes
+* that we expect to make use of for this platform's affinity
+* calculations.
 */
nodes_and(node_possible_map, node_possible_map, node_online_map);
 
+   find_possible_nodes();
+
for_each_online_node(nid) {
unsigned long start_pfn, end_pfn;
 



[PATCH V6 0/2] pseries/nodes: Fix issues with memoryless nodes

2017-11-01 Thread Michael Bringmann
pseries/nodes: Ensure enough nodes avail for operations

pseries/initnodes: Ensure nodes initialized for hotplug

Signed-off-by: Michael Bringmann 

Michael Bringmann (2):
  pseries/nodes: Ensure enough nodes avail for operations
  pseries/initnodes: Ensure nodes initialized for hotplug
---
Changes in V6:
  -- Remove some node initialization/allocation from boot setup
  -- Add some needed node initialization to runtime code that maps
 CPUs based on VPHN associativity
  -- Add error checks and alternate recovery for compile flag
 CONFIG_MEMORY_HOTPLUG
  -- Add alternate node selection recovery for !CONFIG_MEMORY_HOTPLUG



Re: linux-next: manual merge of the tip tree with the powerpc tree

2017-11-01 Thread Kees Cook
On Tue, Oct 31, 2017 at 10:53 PM, Stephen Rothwell  
wrote:
> Hi all,
>
> Today's linux-next merge of the tip tree got a conflict in:
>
>   arch/powerpc/mm/numa.c
>
> between commit:
>
>   cee5405da402 ("powerpc/hotplug: Improve responsiveness of hotplug change")
>
> from the powerpc tree and commit:
>
>   df7e828c1b69 ("timer: Remove init_timer_deferrable() in favor of 
> timer_setup()")
>
> from the tip tree.
>
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
>
> --
> Cheers,
> Stephen Rothwell
>
> diff --cc arch/powerpc/mm/numa.c
> index eb604b3574fa,73016451f330..
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@@ -1506,9 -1466,7 +1505,7 @@@ static struct timer_list topology_timer
>
>   static void reset_topology_timer(void)
>   {
> -   topology_timer.data = 0;
> -   topology_timer.expires = jiffies + topology_timer_secs * HZ;
> -   mod_timer(&topology_timer, topology_timer.expires);
>  -  mod_timer(&topology_timer, jiffies + 60 * HZ);
> ++  mod_timer(&topology_timer, jiffies + topology_timer_secs * HZ);
>   }
>
>   #ifdef CONFIG_SMP
> @@@ -1561,13 -1520,14 +1558,14 @@@ int start_topology_update(void
> rc = of_reconfig_notifier_register(&dt_update_nb);
>   #endif
> }
>  -  } else if (firmware_has_feature(FW_FEATURE_VPHN) &&
>  +  }
>  +  if (firmware_has_feature(FW_FEATURE_VPHN) &&
>lppaca_shared_proc(get_lppaca())) {
> if (!vphn_enabled) {
>  -  prrn_enabled = 0;
> vphn_enabled = 1;
> setup_cpu_associativity_change_counters();
> -   init_timer_deferrable(&topology_timer);
> +   timer_setup(&topology_timer, topology_timer_fn,
> +   TIMER_DEFERRABLE);
> reset_topology_timer();
> }
> }

Thanks, this looks correct to me!

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH FEAT] ibmvnic: Feature implementation of Vital Product Data (VPD) for the ibmvnic driver

2017-11-01 Thread Andrew Lunn
> > diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
> > b/drivers/net/ethernet/ibm/ibmvnic.c
> > index d0cff28..120f3c0 100644
> > --- a/drivers/net/ethernet/ibm/ibmvnic.c
> > +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> > @@ -107,6 +107,9 @@ static union sub_crq *ibmvnic_next_scrq(struct 
> > ibmvnic_adapter *,
> > struct ibmvnic_sub_crq_queue *);
> >  static int ibmvnic_poll(struct napi_struct *napi, int data);
> >  static void send_map_query(struct ibmvnic_adapter *adapter);
> > +static int ibmvnic_get_vpd(struct ibmvnic_adapter *);
> > +static void handle_vpd_size_rsp(union ibmvnic_crq *, struct 
> > ibmvnic_adapter *);
> > +static void handle_vpd_rsp(union ibmvnic_crq *,struct ibmvnic_adapter *);
> 
> There should be a space after the comma.

And you should try to avoid forward declarations. Move the code around
so they are not needed.

   Andrew


RE: [PATCH v10 0/4] this patchset is to remove PPCisms for QEIC

2017-11-01 Thread Thomas Gleixner
On Wed, 1 Nov 2017, Qiang Zhao wrote:
> Michael Ellerman  wrote
> > 
> > Qiang Zhao  writes:
> > 
> > > Hi all,
> > >
> > > Could anybody review this patchset and take action on them? Thank you!
> > 
> > Who maintains this? I don't actually know, it's not powerpc code, or is it?
> 
> Yes, it's not powerpc code, it is irqchip code, maintained by Thomas, Jason 
> and Marc according to MAINTAINERS file.
> 
> Hi Thomas, Jason and Marc,
> 
> Could you keep an eye on this patchset? Thank you!

It's on my radar, but I have zero capacity at the moment. Hopefully Marc
can spare a few cycles.

Thanks,

tglx


Re: [PATCH FEAT] ibmvnic: Feature implementation of Vital Product Data (VPD) for the ibmvnic driver

2017-11-01 Thread Thomas Falcon
On 11/01/2017 09:39 AM, Desnes Augusto Nunes do Rosario wrote:
> This patch implements and enables VPD support for the ibmvnic driver.
> Moreover, it includes the implementation of suitable structs, signal
> transmission/handling and fuctions which allow the retrieval of firmware
> information from the ibmvnic card.
>
> Co-Authored-By: Thomas Falcon 
> ---

Hi, you should include a Signed-off-by tag in the commit message.  You should 
also include the branch the patch is meant for in the PATCH field.  In this 
case, it would be net-next. The commit message should also be wrapped (75 char 
per line), and 'fuctions' is missing an n.
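
In other words, something like this (illustrative only):

    [PATCH net-next] ibmvnic: Feature implementation of Vital Product Data (VPD)

    <commit message body, wrapped at around 75 characters per line>

    Signed-off-by: Desnes A. Nunes do Rosario 
    Signed-off-by: Thomas Falcon 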

>  drivers/net/ethernet/ibm/ibmvnic.c | 140 
> -
>  drivers/net/ethernet/ibm/ibmvnic.h |  27 ++-
>  2 files changed, 164 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
> b/drivers/net/ethernet/ibm/ibmvnic.c
> index d0cff28..120f3c0 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -107,6 +107,9 @@ static union sub_crq *ibmvnic_next_scrq(struct 
> ibmvnic_adapter *,
>   struct ibmvnic_sub_crq_queue *);
>  static int ibmvnic_poll(struct napi_struct *napi, int data);
>  static void send_map_query(struct ibmvnic_adapter *adapter);
> +static int ibmvnic_get_vpd(struct ibmvnic_adapter *);
> +static void handle_vpd_size_rsp(union ibmvnic_crq *, struct ibmvnic_adapter 
> *);
> +static void handle_vpd_rsp(union ibmvnic_crq *,struct ibmvnic_adapter *);

There should be a space after the comma.

>  static void send_request_map(struct ibmvnic_adapter *, dma_addr_t, __be32, 
> u8);
>  static void send_request_unmap(struct ibmvnic_adapter *, u8);
>  static void send_login(struct ibmvnic_adapter *adapter);
> @@ -573,6 +576,15 @@ static int reset_tx_pools(struct ibmvnic_adapter 
> *adapter)
>   return 0;
>  }
>
> +static void release_vpd_data(struct ibmvnic_adapter *adapter)
> +{
> + if(!adapter->vpd)

There should be a space after the 'if' here. There are also some style checks 
that were picked up by checkpatch.pl. It's a good idea to run your patch 
through that before submission.

Thanks,
Tom

> + return;
> +
> + kfree(adapter->vpd->buff);
> + kfree(adapter->vpd);
> +}
> +
>  static void release_tx_pools(struct ibmvnic_adapter *adapter)
>  {
>   struct ibmvnic_tx_pool *tx_pool;
> @@ -753,6 +765,8 @@ static void release_resources(struct ibmvnic_adapter 
> *adapter)
>  {
>   int i;
>
> + release_vpd_data(adapter);
> +
>   release_tx_pools(adapter);
>   release_rx_pools(adapter);
>
> @@ -850,6 +864,10 @@ static int init_resources(struct ibmvnic_adapter 
> *adapter)
>   if (rc)
>   return rc;
>
> + adapter->vpd = kzalloc(sizeof(struct ibmvnic_vpd), GFP_KERNEL);
> + if (!adapter->vpd)
> + return -ENOMEM;
> +
>   adapter->map_id = 1;
>   adapter->napi = kcalloc(adapter->req_rx_queues,
>   sizeof(struct napi_struct), GFP_KERNEL);
> @@ -950,6 +968,10 @@ static int ibmvnic_open(struct net_device *netdev)
>
>   rc = __ibmvnic_open(netdev);
>   netif_carrier_on(netdev);
> +
> + /* Vital Product Data (VPD) */
> + ibmvnic_get_vpd(adapter);
> +
>   mutex_unlock(&adapter->reset_lock);
>
>   return rc;
> @@ -1878,11 +1900,15 @@ static int ibmvnic_get_link_ksettings(struct 
> net_device *netdev,
>   return 0;
>  }
>
> -static void ibmvnic_get_drvinfo(struct net_device *dev,
> +static void ibmvnic_get_drvinfo(struct net_device *netdev,
>   struct ethtool_drvinfo *info)
>  {
> + struct ibmvnic_adapter *adapter = netdev_priv(netdev);
> +
>   strlcpy(info->driver, ibmvnic_driver_name, sizeof(info->driver));
>   strlcpy(info->version, IBMVNIC_DRIVER_VERSION, sizeof(info->version));
> + strlcpy(info->fw_version, adapter->fw_version,
> + sizeof(info->fw_version));
>  }
>
>  static u32 ibmvnic_get_msglevel(struct net_device *netdev)
> @@ -2923,6 +2947,110 @@ static void send_login(struct ibmvnic_adapter 
> *adapter)
>   return;
>  }
>
> +static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter)
> +{
> + struct device *dev = &adapter->vdev->dev;
> + union ibmvnic_crq crq;
> + dma_addr_t dma_addr;
> + int len;
> +
> + if (adapter->vpd->buff)
> + len = adapter->vpd->len;
> +
> + reinit_completion(&adapter->fw_done);
> + crq.get_vpd_size.first = IBMVNIC_CRQ_CMD;
> + crq.get_vpd_size.cmd = GET_VPD_SIZE;
> + ibmvnic_send_crq(adapter, &crq);
> + wait_for_completion(&adapter->fw_done);
> +
> + if (!adapter->vpd->buff)
> + adapter->vpd->buff = kzalloc(adapter->vpd->len, GFP_KERNEL);
> + else if (adapter->vpd->len != len)
> + adapter->vpd->buff =
> + krealloc(adapter->vpd->buff,
> +  adapter->vpd->len, GFP_KERNEL);
> +
> + if

Re: [PATCH v3 1/2] livepatch: send a fake signal to all blocking tasks

2017-11-01 Thread Oleg Nesterov
On 11/01, Petr Mladek wrote:
>
> On Tue 2017-10-31 12:48:52, Miroslav Benes wrote:
> > +   if (task->flags & PF_KTHREAD) {
> > +   /*
> > +* Wake up a kthread which still has not been migrated.
> > +*/
> > +   wake_up_process(task);
>
> I have just noticed that freezer used wake_up_state(p, TASK_INTERRUPTIBLE);
> IMHO, we should do so as well.

I won't argue, but...

> wake_up_process() wakes also tasks in TASK_UNINTERRUPTIBLE state.
> These might not be ready for an unexpected wakeup. For example,
> see concat_dev_erase() in drivers/mtd/mtdconcat.c.

I'd say that concat_dev_erase() should be fixed; any code should be ready
for a spurious wakeup.

Note also that wake_up_state(TASK_INTERRUPTIBLE) won't wake up the TASK_IDLE
kthreads, and most of the kthreads which use TASK_INTERRUPTIBLE should use
TASK_IDLE today, because in most cases TASK_INTERRUPTIBLE was used just to
avoid contributing to loadavg.
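
For reference, the TASK_IDLE pattern in a kthread looks something like
this (an illustrative sketch only; work_available() stands in for
whatever condition the kthread actually waits on):

static int my_kthread_fn(void *data)
{
	while (!kthread_should_stop()) {
		set_current_state(TASK_IDLE);	/* UNINTERRUPTIBLE | NOLOAD */
		if (!work_available())
			schedule();
		__set_current_state(TASK_RUNNING);

		/* Handle work here; re-check the condition, since
		 * wakeups can be spurious.
		 */
	}
	return 0;
}

Such a kthread neither contributes to loadavg nor gets woken by
wake_up_state(task, TASK_INTERRUPTIBLE).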

Oleg.



Re: [PATCH v3 1/2] livepatch: send a fake signal to all blocking tasks

2017-11-01 Thread Petr Mladek
On Tue 2017-10-31 12:48:52, Miroslav Benes wrote:
> The live patching consistency model is one of LEAVE_PATCHED_SET and
> SWITCH_THREAD. This means that all tasks in the system have to be marked
> one by one as safe to call a new patched function. Safe means that a
> task is not (sleeping) in the set of patched functions. That is, no
> patched function is on the task's stack. Another clearly safe place is
> the boundary between kernel and userspace. The patching waits for all
> tasks to get outside of the patched set or to cross the boundary. The
> transition is completed afterwards.
> 
> The problem is that a task can block the transition for quite a long
> time, if not forever. It could sleep in a set of patched functions, for
> example.  Luckily we can force the task to leave the set by sending it a
> fake signal, that is a signal with no data in signal pending structures
> (no handler, no sign of proper signal delivered). Suspend/freezer use
> this to freeze the tasks as well. The task gets TIF_SIGPENDING set and
> is woken up (if it has been sleeping in the kernel before) or kicked by
> rescheduling IPI (if it was running on other CPU). This causes the task
> to go to kernel/userspace boundary where the signal would be handled and
> the task would be marked as safe in terms of live patching.
> 
> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index b004a1fb6032..6700d3b22615 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -577,3 +577,43 @@ void klp_copy_process(struct task_struct *child)
>  
>   /* TIF_PATCH_PENDING gets copied in setup_thread_stack() */
>  }
> +
> +/*
> + * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set.
> + * Kthreads with TIF_PATCH_PENDING set are woken up. Only admin can request 
> this
> + * action currently.
> + */
> +void klp_force_signals(void)
> +{
> + struct task_struct *g, *task;
> +
> + pr_notice("signaling remaining tasks\n");
> +
> + read_lock(&tasklist_lock);
> + for_each_process_thread(g, task) {
> + if (!klp_patch_pending(task))
> + continue;
> +
> + /*
> +  * There is a small race here. We could see TIF_PATCH_PENDING
> +  * set and decide to wake up a kthread or send a fake signal.
> +  * Meanwhile the task could migrate itself and the action
> +  * would be meaningless. It is not serious though.
> +  */
> + if (task->flags & PF_KTHREAD) {
> + /*
> +  * Wake up a kthread which still has not been migrated.
> +  */
> + wake_up_process(task);

I have just noticed that freezer used wake_up_state(p, TASK_INTERRUPTIBLE);
IMHO, we should do so as well.

wake_up_process() wakes also tasks in TASK_UNINTERRUPTIBLE state.
These might not be ready for an unexpected wakeup. For example,
see concat_dev_erase() in drivers/mtd/mtdconcat.c.

With this change, feel free to use

Reviewed-by: Petr Mladek 

Best Regards,
Petr


Re: [PATCH v3 1/2] livepatch: send a fake signal to all blocking tasks

2017-11-01 Thread Miroslav Benes

> +/*
> + * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set.
> + * Kthreads with TIF_PATCH_PENDING set are woken up. Only admin can request 
> this
> + * action currently.
> + */
> +void klp_force_signals(void)
> +{
> + struct task_struct *g, *task;
> +
> + pr_notice("signaling remaining tasks\n");
> +
> + read_lock(&tasklist_lock);
> + for_each_process_thread(g, task) {
> + if (!klp_patch_pending(task))
> + continue;
> +
> + /*
> +  * There is a small race here. We could see TIF_PATCH_PENDING
> +  * set and decide to wake up a kthread or send a fake signal.
> +  * Meanwhile the task could migrate itself and the action
> +  * would be meaningless. It is not serious though.
> +  */
> + if (task->flags & PF_KTHREAD) {
> + /*
> +  * Wake up a kthread which still has not been migrated.
> +  */
> + wake_up_process(task);

So this is not as safe as one would hope. It tries to wake all TASK_NORMAL 
tasks, which could cause headaches. Let's make it

wake_up_state(task, TASK_INTERRUPTIBLE);

to wake only kthreads sleeping interruptibly.

Thanks Petr for spotting this (offline).

Miroslav

> + } else {
> + /*
> +  * Send fake signal to all non-kthread tasks which are
> +  * still not migrated.
> +  */
> + spin_lock_irq(&task->sighand->siglock);
> + signal_wake_up(task, 0);
> + spin_unlock_irq(&task->sighand->siglock);
> + }
> + }
> + read_unlock(&tasklist_lock);
> +}


Re: [RFC PATCH 0/7] powerpc/64s/radix TLB flush performance improvements

2017-11-01 Thread Nicholas Piggin
On Wed, 1 Nov 2017 17:35:51 +0530
Anshuman Khandual  wrote:

> On 10/31/2017 12:14 PM, Nicholas Piggin wrote:
> > Here's a random mix of performance improvements for radix TLB flushing
> > code. The main aims are to reduce the amount of translation that gets
> > invalidated, and to reduce global flushes where we can do local.
> > 
> > To that end, a parallel kernel compile benchmark using powerpc:tlbie
> > tracepoint shows a reduction in tlbie instructions from about 290,000
> > to 80,000, and a reduction in tlbiel instructions from 49,500,000 to
> > 15,000,000. Looks great, but unfortunately does not translate to a
> > statistically significant performance improvement! The needle on TLB
> > misses does not move much, I suspect because a lot of the flushing is
> > done at startup and shutdown, and because a significant cost of TLB
> > flushing itself is in the barriers.  
> 
> Does a memory barrier initiate a single global invalidation with tlbie?
> 

I'm not quite sure what you're asking, and I don't know the details
of how the hardware handles it, but from the measurements in patch
1 of the series we can see there is a benefit for both tlbie and
tlbiel of batching them up between barriers.
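
For reference, the sequences being amortised look roughly like this
(from memory, so treat it as a sketch; see radix__flush_tlb_range()
and friends for the real code):

	ptesync			# order prior PTE updates vs the invalidations
	tlbie ...		# repeated, one per page in the batch
	eieio; tlbsync; ptesync	# one completion sequence per batch

versus the local form:

	ptesync
	tlbiel ...		# repeated, one per page
	ptesync

so the fixed barrier cost is paid once per batch instead of once per
page.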

Thanks,
Nick


Re: [RFC PATCH 0/7] powerpc/64s/radix TLB flush performance improvements

2017-11-01 Thread Anshuman Khandual
On 10/31/2017 12:14 PM, Nicholas Piggin wrote:
> Here's a random mix of performance improvements for radix TLB flushing
> code. The main aims are to reduce the amount of translation that gets
> invalidated, and to reduce global flushes where we can do local.
> 
> To that end, a parallel kernel compile benchmark using powerpc:tlbie
> tracepoint shows a reduction in tlbie instructions from about 290,000
> to 80,000, and a reduction in tlbiel instructions from 49,500,000 to
> 15,000,000. Looks great, but unfortunately does not translate to a
> statistically significant performance improvement! The needle on TLB
> misses does not move much, I suspect because a lot of the flushing is
> done at startup and shutdown, and because a significant cost of TLB
> flushing itself is in the barriers.

Does a memory barrier initiate a single global invalidation with tlbie?



Re: [PATCH 00/16] Remove hash page table slot tracking from linux PTE

2017-11-01 Thread Paul Mackerras
On Sun, Oct 29, 2017 at 05:51:06PM -0700, Ram Pai wrote:
> On Mon, Oct 30, 2017 at 09:04:17AM +1100, Paul Mackerras wrote:
> > On Sat, Oct 28, 2017 at 03:35:32PM -0700, Ram Pai wrote:
> > > 
> > > I like the idea of not tracking the slots at all. It is something the
> > > guest should not be knowing or tracking.
> > 
> > Why do you say that?
> 
> 'slot' is an internal mechanism used by the hash table to accelerate
> mapping an address to a hash-page.  If the hash-table implementation
> chooses to use a different mechanism to accelerate the mapping, it can't,
> because that mechanism is baked into the logic of all the consumers.

Not all operating systems use the HPT as a cache of translations that
are also stored somewhere, as Linux does.  Those OSes are perfectly
entitled to control the slot allocation for their own purposes in
whatever way they want.  So having the interface use the slot number
is fine; just as having alternative interfaces that don't need to
specify the slot number for the kind of usage that Linux does is also
fine.

Paul.


[RFC 2/2] powerpc/mm: Enable deferred flushing of TLB during reclaim

2017-11-01 Thread Anshuman Khandual
Deferred flushing can only be enabled from the POWER9 DD2.0 processor
onwards, because prior versions of POWER9 and earlier hash-table-based
POWER processors do TLB flushing in the ptep_get_and_clear() function
itself, which prevents batching and the eventual deferred flush.

Signed-off-by: Anshuman Khandual 
---
 arch/powerpc/Kconfig|  1 +
 arch/powerpc/include/asm/tlbbatch.h | 30 +++
 arch/powerpc/include/asm/tlbflush.h |  3 +++
 arch/powerpc/mm/tlb-radix.c | 49 +
 4 files changed, 83 insertions(+)
 create mode 100644 arch/powerpc/include/asm/tlbbatch.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 809c468..f06b565 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -230,6 +230,7 @@ config PPC
select SPARSE_IRQ
select SYSCTL_EXCEPTION_TRACE
select VIRT_TO_BUS  if !PPC64
+   select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH if (PPC64 && PPC_BOOK3S)
#
# Please keep this list sorted alphabetically.
#
diff --git a/arch/powerpc/include/asm/tlbbatch.h 
b/arch/powerpc/include/asm/tlbbatch.h
new file mode 100644
index 000..fc762ef
--- /dev/null
+++ b/arch/powerpc/include/asm/tlbbatch.h
@@ -0,0 +1,30 @@
+#ifndef _ARCH_POWERPC_TLBBATCH_H
+#define _ARCH_POWERPC_TLBBATCH_H
+
+#include 
+
+#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
+
+#define MAX_BATCHED_MM 1024
+
+struct arch_tlbflush_unmap_batch {
+   /*
+* Each set bit is a CPU that potentially has a
+* TLB entry for one of the PFNs being flushed.
+* This represents whether all deferred struct
+* mm's will be flushed for any given CPU.
+*/
+   struct cpumask cpumask;
+
+   /* All the deferred struct mm */
+   struct mm_struct *mm[MAX_BATCHED_MM];
+   unsigned long int nr_mm;
+   
+};
+
+extern bool arch_tlbbatch_should_defer(struct mm_struct *mm);
+extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
+extern void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
+   struct mm_struct *mm);
+#endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */
+#endif /* _ARCH_POWERPC_TLBBATCH_H */
diff --git a/arch/powerpc/include/asm/tlbflush.h 
b/arch/powerpc/include/asm/tlbflush.h
index 13dbcd4..2041923 100644
--- a/arch/powerpc/include/asm/tlbflush.h
+++ b/arch/powerpc/include/asm/tlbflush.h
@@ -20,6 +20,9 @@
  */
 #ifdef __KERNEL__
 
+#include 
+#include 
+
 #ifdef CONFIG_PPC_MMU_NOHASH
 /*
  * TLB flushing for software loaded TLB chips
diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index b3e849c..506e7ed 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -12,6 +12,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -519,3 +521,50 @@ extern void radix_kvm_prefetch_workaround(struct mm_struct 
*mm)
 }
 EXPORT_SYMBOL_GPL(radix_kvm_prefetch_workaround);
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
+
+#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
+static void clear_tlb(void *data)
+{
+   struct arch_tlbflush_unmap_batch *batch = data;
+   int i;
+
+   WARN_ON(!radix_enabled() || cpu_has_feature(CPU_FTR_POWER9_DD1));
+
+   for (i = 0; i < batch->nr_mm; i++) {
+   if (batch->mm[i])
+   radix__local_flush_tlb_mm(batch->mm[i]);
+   }
+}
+
+void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
+{
+   WARN_ON(!radix_enabled() || cpu_has_feature(CPU_FTR_POWER9_DD1));
+
+   smp_call_function_many(&batch->cpumask, (void *)clear_tlb, batch, 1);
+   batch->nr_mm = 0;
+   cpumask_clear(&batch->cpumask);
+}
+
+void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch, struct 
mm_struct *mm)
+{
+   WARN_ON(!radix_enabled() || cpu_has_feature(CPU_FTR_POWER9_DD1));
+
+   if (batch->nr_mm < MAX_BATCHED_MM)
+   batch->mm[batch->nr_mm++] = mm;
+   else
+   pr_err("Deferred TLB flush: missed a struct mm\n");
+   cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
+}
+
+bool arch_tlbbatch_should_defer(struct mm_struct *mm)
+{
+   if (!radix_enabled() || cpu_has_feature(CPU_FTR_POWER9_DD1))
+   return false;
+
+   if (!mm_is_thread_local(mm))
+   return true;
+
+   return false;
+}
+#endif
-- 
1.8.3.1



[RFC 1/2] mm/tlbbatch: Introduce arch_tlbbatch_should_defer()

2017-11-01 Thread Anshuman Khandual
The entire scheme of deferred TLB flush in the reclaim path rests on the
fact that the cost to refill TLB entries is less than that of flushing out
individual entries by sending IPIs to remote CPUs. But an architecture can
have different ways to evaluate that. Hence, apart from checking
TTU_BATCH_FLUSH in the TTU flags, the rest of the decision should be
architecture specific.

Signed-off-by: Anshuman Khandual 
---
 arch/x86/include/asm/tlbflush.h | 12 
 mm/rmap.c   |  9 +
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index c4aed0d..5875f2c 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -366,6 +366,18 @@ static inline void flush_tlb_page(struct vm_area_struct 
*vma, unsigned long a)
 void native_flush_tlb_others(const struct cpumask *cpumask,
 const struct flush_tlb_info *info);
 
+static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
+{
+   bool should_defer = false;
+
+   /* If remote CPUs need to be flushed then defer batch the flush */
+   if (cpumask_any_but(mm_cpumask(mm), get_cpu()) < nr_cpu_ids)
+   should_defer = true;
+   put_cpu();
+
+   return should_defer;
+}
+
 static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch 
*batch,
struct mm_struct *mm)
 {
diff --git a/mm/rmap.c b/mm/rmap.c
index b874c47..bfbfe92 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -627,17 +627,10 @@ static void set_tlb_ubc_flush_pending(struct mm_struct 
*mm, bool writable)
  */
 static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags)
 {
-   bool should_defer = false;
-
if (!(flags & TTU_BATCH_FLUSH))
return false;
 
-   /* If remote CPUs need to be flushed then defer batch the flush */
-   if (cpumask_any_but(mm_cpumask(mm), get_cpu()) < nr_cpu_ids)
-   should_defer = true;
-   put_cpu();
-
-   return should_defer;
+   return arch_tlbbatch_should_defer(mm);
 }
 
 /*
-- 
1.8.3.1



[RFC 0/2] Enable ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH on POWER

2017-11-01 Thread Anshuman Khandual
From: Anshuman Khandual 

Batched TLB flush during the reclaim path has been around for a couple of
years now and is enabled on the x86 platform. The idea is to batch multiple
page TLB invalidation requests together and completely flush all those CPUs
which might have TLB entries for any of the unmapped pages, instead of
sending multiple IPIs and flushing out individual pages each time reclaim
unmaps one page. This has the potential to improve performance for certain
types of workloads under memory pressure, provided some conditions related
to individual page TLB invalidation, CPU wide TLB invalidation, system
wide TLB invalidation, TLB reload, IPI costs etc. are met.
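
The shape of the scheme on the reclaim side is roughly the following
(a simplified sketch: the surrounding loop is pseudocode, while
should_defer_flush(), set_tlb_ubc_flush_pending() and
try_to_unmap_flush() are the existing hooks in mm/rmap.c):

	/* For each page that the reclaim pass unmaps: */
	pteval = ptep_get_and_clear(mm, address, pte);
	if (should_defer_flush(mm, flags))
		/* Record the mm's CPUs in the batch instead of flushing now. */
		set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
	else
		flush_tlb_page(vma, address);	/* old, unbatched behaviour */

	/* ... and then once per batch, a single arch-wide flush: */
	try_to_unmap_flush();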

Please refer to commit 72b252aed5 ("mm: send one IPI per CPU to TLB flush
all entries after unmapping pages") from Mel Gorman for more details on
how it can impact the performance of various workloads. This enablement
improves performance for the original test case 'case-lru-file-mmap-read'
from the vm-scalability bucket, but only from a system time perspective.

time ./run case-lru-file-mmap-read

Without the patch:

real4m20.364s
user102m52.492s
sys 433m26.190s

With the patch:

real4m15.942s   (-  1.69%)
user111m16.662s (+  7.55%)
sys 382m35.202s (- 11.73%)

Parallel kernel compilation does not see any performance improvement or
degradation with or without this patch. It remains within the margin of
error.

Without the patch:

real1m13.850s
user39m21.803s
sys 2m43.362s

With the patch:

real1m14.481s   (+ 0.85%)
user39m27.409s  (+ 0.23%)
sys 2m44.656s   (+ 0.79%)

It batches up multiple struct mm's during reclaim and keeps accumulating
the superset of the cpu masks which might have a TLB that needs to be
invalidated. Then a local struct mm wide invalidation is performed on
that cpu mask for all the batched ones. Please review and let me know
if there is a better way to do this. Thank you.

Anshuman Khandual (2):
  mm/tlbbatch: Introduce arch_tlbbatch_should_defer()
  powerpc/mm: Enable deferred flushing of TLB during reclaim

 arch/powerpc/Kconfig|  1 +
 arch/powerpc/include/asm/tlbbatch.h | 30 +++
 arch/powerpc/include/asm/tlbflush.h |  3 +++
 arch/powerpc/mm/tlb-radix.c | 49 +
 arch/x86/include/asm/tlbflush.h | 12 +
 mm/rmap.c   |  9 +--
 6 files changed, 96 insertions(+), 8 deletions(-)
 create mode 100644 arch/powerpc/include/asm/tlbbatch.h

-- 
1.8.3.1



Re: [PATCH] powerpc/perf: Fix core-imc hotplug callback failure during imc initialization

2017-11-01 Thread Madhavan Srinivasan



On Wednesday 01 November 2017 06:22 AM, Michael Ellerman wrote:
> Anju T Sudhakar writes:
>
>> Call trace observed during boot:
>
> What's the actual oops?

I could recreate this in mambo with CPUS=2 and THREAD=2.
Here is the complete stack trace.

[    0.045367] core_imc memory allocation for cpu 2 failed
[    0.045408] Unable to handle kernel paging request for data at 
address 0x7d20e2a6f92d03b8

[    0.045443] Faulting instruction address: 0xc00dde18
cpu 0x0: Vector: 380 (Data Access Out of Range) at [c000fd1cb890]
    pc: c00dde18: event_function_call+0x28/0x14c
    lr: c00dde00: event_function_call+0x10/0x14c
    sp: c000fd1cbb10
   msr: 90009033
   dar: 7d20e2a6f92d03b8
  current = 0xc000fd15da00
  paca    = 0xcfff   softe: 0    irq_happened: 0x01
    pid   = 11, comm = cpuhp/0
Linux version 4.14.0-rc7-00014-g0a08377b127b (maddy@SrihariSrinidhi) 
(gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.1)) #5 SMP 
Wed Nov 1 14:12:27 IST 2017

enter ? for help
[c000fd1cbb10]  (unreliable)
[c000fd1cbba0] c00de180 perf_remove_from_context+0x30/0x9c
[c000fd1cbbe0] c00e9108 perf_pmu_migrate_context+0x9c/0x224
[c000fd1cbc60] c00682e0 ppc_core_imc_cpu_offline+0xdc/0x144
[c000fd1cbcb0] c0070568 cpuhp_invoke_callback+0xe4/0x244
[c000fd1cbd10] c0070824 cpuhp_thread_fun+0x15c/0x1b0
[c000fd1cbd60] c008e8cc smpboot_thread_fn+0x1e0/0x200
[c000fd1cbdc0] c008ae58 kthread+0x150/0x158
[c000fd1cbe30] c000b464 ret_from_kernel_thread+0x5c/0x78





>> [c00ff38ffb80] c02ddfac perf_pmu_migrate_context+0xac/0x470
>> [c00ff38ffc40] c011385c ppc_core_imc_cpu_offline+0x1ac/0x1e0
>> [c00ff38ffc90] c0125758 cpuhp_invoke_callback+0x198/0x5d0
>> [c00ff38ffd00] c012782c cpuhp_thread_fun+0x8c/0x3d0
>> [c00ff38ffd60] c01678d0 smpboot_thread_fn+0x290/0x2a0
>> [c00ff38ffdc0] c015ee78 kthread+0x168/0x1b0
>> [c00ff38ffe30] c000b368 ret_from_kernel_thread+0x5c/0x74
>>
>> While registering the cpuhotplug callbacks for core-imc, if we fail
>> in the cpuhotplug online path for any random core (either because the opal
>> call to initialize the core-imc counters fails or because memory allocation
>> fails for that core), ppc_core_imc_cpu_offline() will get invoked for the
>> other cpus which successfully returned from the cpuhotplug online path.
>>
>> But in the ppc_core_imc_cpu_offline() path we are trying to migrate the
>> event context, when core-imc counters are not even initialized. Thus
>> creating the above stack dump.
>>
>> Add a check to see if core-imc counters are enabled or not in the
>> cpuhotplug offline path before migrating the context to handle this
>> failing scenario.
>
> Why do we need a bool to track this? Can't we just check the data
> structure we're deinitialising has been initialised?

My bad, yes we could do that. Something like this should work:

@@ -606,6 +608,20 @@ static int ppc_core_imc_cpu_offline(unsigned int cpu)
    if (!cpumask_test_and_clear_cpu(cpu, &core_imc_cpumask))
    return 0;

+   /*
+    * Check whether core_imc is registered. We could end up here
+    * if the cpuhotplug callback registration fails, i.e. the callback
+    * invokes the offline path for all successfully registered cpus.
+    * At this stage, the core_imc pmu will not be registered and we
+    * should return here.
+    *
+    * We return with a zero since this is not an offline failure.
+    * And cpuhp_setup_state() returns the actual failure reason
+    * to the caller, which in turn will call the cleanup routine.
+    */
+   if (!core_imc_pmu->pmu.event_init)
+   return 0;
+
    /* Find any online cpu in that core except the current "cpu" */
    ncpu = cpumask_any_but(cpu_sibling_mask(cpu), cpu);




> Doesn't this also mean we won't clean up the initialisation for any CPUs
> that have been initialised?

No, we do. cpuhp_setup_state() returns the actual failure reason,
which in turn calls the cleanup routine.

Thanks for the review comments.
Maddy



> cheers


diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 8812624..08139f9 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -30,6 +30,7 @@ static struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
  static cpumask_t nest_imc_cpumask;
  struct imc_pmu_ref *nest_imc_refc;
  static int nest_pmus;
+static bool core_imc_enabled;
  
  /* Core IMC data structures and variables */
  
@@ -607,6 +608,19 @@ static int ppc_core_imc_cpu_offline(unsigned int cpu)

if (!cpumask_test_and_clear_cpu(cpu, &core_imc_cpumask))
return 0;
  
+	/*
+* See if core imc counters are enabled or not.
+*
+* Suppose we reach here from core_imc_cpumask_init(),
+* since we failed at the cpuhotplug online path for any random
+* core (either because opal call t

Re: [Patch v10] QE: remove PPCisms for QE

2017-11-01 Thread Julien Thierry

Hi Zhao,

I just noticed a small nit.

On 01/11/17 02:01, Zhao Qiang wrote:

QE was supported on PowerPC, and dependent on PPC.
Now it is supported on other platforms, so remove PPCisms.

Signed-off-by: Zhao Qiang 
---
Changes for v2:
- na
Changes for v3:
- add NO_IRQ
Changes for v4:
- modify spin_event_timeout to opencoded timeout loop
- remove NO_IRQ
- modify virq_to_hw to opencoed code
Changes for v5:
- modify commit msg
- modify depends of QUICC_ENGINE
- add kerneldoc header for qe_issue_cmd
Changes for v6:
- add dependency on FSL_SOC and PPC32 for drivers
  depending on QUICC_ENGING but not available on ARM
Changes for v7:
- split qeic part to another patch
- rebase
Changes for v8:
- include  in ucc_uart
Changes for v9:
- fix cast warning
Changes for v10:
- rebase

  drivers/net/ethernet/freescale/Kconfig | 11 ++---
  drivers/soc/fsl/qe/Kconfig |  2 +-
  drivers/soc/fsl/qe/qe.c| 82 +-
  drivers/soc/fsl/qe/qe_io.c | 42 -
  drivers/soc/fsl/qe/qe_tdm.c|  8 ++--
  drivers/soc/fsl/qe/ucc.c   | 10 ++---
  drivers/soc/fsl/qe/ucc_fast.c  | 74 +++---
  drivers/tty/serial/Kconfig |  2 +-
  drivers/tty/serial/ucc_uart.c  |  1 +
  drivers/usb/gadget/udc/Kconfig |  2 +-
  drivers/usb/host/Kconfig   |  2 +-
  include/soc/fsl/qe/qe.h|  1 -
  12 files changed, 126 insertions(+), 111 deletions(-)



[...]


diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c
index 2ef6fc6487c1..1d695870ea9e 100644
--- a/drivers/soc/fsl/qe/qe.c
+++ b/drivers/soc/fsl/qe/qe.c


[...]


@@ -132,20 +142,26 @@ int qe_issue_cmd(u32 cmd, u32 device, u8 mcn_protocol, 
u32 cmd_input)
mcn_shift = QE_CR_MCN_NORMAL_SHIFT;
}
  
-		out_be32(&qe_immr->cp.cecdr, cmd_input);

-   out_be32(&qe_immr->cp.cecr,
-(cmd | QE_CR_FLG | ((u32) device << dev_shift) | (u32)
- mcn_protocol << mcn_shift));
+   iowrite32be(cmd_input, &qe_immr->cp.cecdr);
+   iowrite32be((cmd | QE_CR_FLG | ((u32)device << dev_shift) |
+   (u32)mcn_protocol << mcn_shift), &qe_immr->cp.cecr);
}
  
  	/* wait for the QE_CR_FLG to clear */

-   ret = spin_event_timeout((in_be32(&qe_immr->cp.cecr) & QE_CR_FLG) == 0,
-  100, 0);
+   ret = -EIO;
+   for (i = 0; i < 100; i++) {
+   if ((ioread32be(&qe_immr->cp.cecr) & QE_CR_FLG) == 0) {
+   ret = 0;
+   break;
+   }
+   udelay(1);
+   }
+
/* On timeout (e.g. failure), the expression will be false (ret == 0),
   otherwise it will be true (ret == 1). */


nit:
The comment here is no longer valid: on timeout ret == -EIO and on
success ret == 0. It should probably be removed to avoid confusion.


Cheers,

--
Julien Thierry