Re: [PATCH 1/1] serial/uuc_uart: Set shutdown timeout to CONFIG_HZ independent 2ms

2016-12-05 Thread Alexander Stein
On Monday 05 December 2016 10:04:27, Timur Tabi wrote:
> Alexander Stein wrote:
> > -   schedule_timeout(2);
> > +   schedule_timeout(msecs_to_jiffies(2));
> 
> NACK.
> 
> So I don't remember why I wrote this code, but I don't think I was
> expecting it to be 2ms.  Instead, I think I just wanted it to be some
> delay, but I believed that schedule_timeout(1) was too short or would be
> "optimized" out somehow.
> 
> Note that right below this, I do:
> 
>   if (qe_port->wait_closing) {
>   /* Wait a bit longer */
>   set_current_state(TASK_UNINTERRUPTIBLE);
>   schedule_timeout(qe_port->wait_closing);
>   }
> 
> And wait_closing is a number of jiffies, so I knew that
> schedule_timeout() took jiffies as a parameter.
> 
> So I think I'm going to NACK this patch, since I believe I knew what I
> was doing when I wrote it five years ago.

Okay, I was just wondering why the timeout is dependent on the timer tick. 
That didn't seem obvious to me.
Thinking about it again, I would rather replace those lines with msleep() instead.
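Something along these lines, as an untested sketch:

	-	set_current_state(TASK_UNINTERRUPTIBLE);
	-	schedule_timeout(2);
	+	msleep(2);

msleep() takes milliseconds and sleeps uninterruptibly, so it would keep the
intent without depending on HZ.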

Best regards,
Alexander



Re: [PATCH 1/2] powerpc/powernv/opal-dump : Handles opal_dump_info properly

2016-12-05 Thread Mukesh Ojha

Hi Michael,

Can you please have a look at this patchset? There are no
functional changes involved in it.

Thanks,
Mukesh




On Thursday 01 December 2016 02:38 PM, Mukesh Ojha wrote:

Move the return value check of 'opal_dump_info' to the proper place; previously
we were unnecessarily filling in all the dump info even on failure.

Signed-off-by: Mukesh Ojha 
---
  arch/powerpc/platforms/powernv/opal-dump.c | 9 ++---
  1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-dump.c b/arch/powerpc/platforms/powernv/opal-dump.c
index 4c82782..ae32212 100644
--- a/arch/powerpc/platforms/powernv/opal-dump.c
+++ b/arch/powerpc/platforms/powernv/opal-dump.c
@@ -225,13 +225,16 @@ static int64_t dump_read_info(uint32_t *dump_id, uint32_t *dump_size, uint32_t *
if (rc == OPAL_PARAMETER)
rc = opal_dump_info(&id, &size);

+   if (rc) {
+   pr_warn("%s: Failed to get dump info (%d)\n",
+   __func__, rc);
+   return rc;
+   }
+
*dump_id = be32_to_cpu(id);
*dump_size = be32_to_cpu(size);
*dump_type = be32_to_cpu(type);

-   if (rc)
-   pr_warn("%s: Failed to get dump info (%d)\n",
-   __func__, rc);
return rc;
  }





Re: [PATCH] PPC: sstep.c: Add modsw, moduw instruction emulation

2016-12-05 Thread Naveen N. Rao
By the way, I missed mentioning previously: please use 'powerpc: ' 
prefix for the subject, rather than PPC.


On 2016/12/04 10:25PM, PrasannaKumar Muralidharan wrote:
> Add modsw and moduw instruction emulation support to analyse_instr.

Also, it would be better if you could briefly describe what these
instructions do, for the benefit of others.
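
(For reference: modsw and moduw are the ISA 3.0 signed/unsigned 32-bit modulo
instructions. The emulation amounts to something like the sketch below,
following the style of the existing divide cases in analyse_instr() -- an
illustration, not the actual patch:)

	case 779:	/* modsw */
		regs->gpr[rd] = (long int)(int)regs->gpr[ra] %
				(long int)(int)regs->gpr[rb];
		goto arith_done;

	case 267:	/* moduw */
		regs->gpr[rd] = (unsigned int)regs->gpr[ra] %
				(unsigned int)regs->gpr[rb];
		goto arith_done;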

- Naveen



Re: [PATCH v3 2/3] powerpc: get hugetlbpage handling more generic

2016-12-05 Thread Christophe LEROY



On 06/12/2016 at 02:18, Scott Wood wrote:

On Wed, 2016-09-21 at 10:11 +0200, Christophe Leroy wrote:

Today there are two implementations of hugetlbpages which are managed
by exclusive #ifdefs:
* FSL_BOOKE: several directory entries point to the same single hugepage
* BOOK3S: one upper level directory entry points to a table of hugepages

In preparation for the implementation of hugepage support on the 8xx, we
need a mix of the two above solutions, because the 8xx needs both cases
depending on the size of pages:
* In 4k page size mode, each PGD entry covers a 4M bytes area. It means
that 2 PGD entries will be necessary to cover an 8M hugepage while a
single PGD entry will cover 8x 512k hugepages.
* In 16k page size mode, each PGD entry covers a 64M bytes area. It means
that 8x 8M hugepages will be covered by one PGD entry and 64x 512k
hugepages will be covered by one PGD entry.
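(Worked numbers for the 4k case: one PGD entry spans 4M, so an 8M hugepage
needs 8M / 4M = 2 consecutive PGD entries, while a single entry holds
4M / 512k = 8 of the 512k hugepages.)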

This patch:
* removes #ifdefs in favor of if/else based on the range sizes
* merges the two huge_pte_alloc() functions as they are pretty similar
* merges the two hugetlbpage_init() functions as they are pretty similar

[snip]

@@ -860,16 +803,34 @@ static int __init hugetlbpage_init(void)
 * if we have pdshift and shift value same, we don't
 * use pgt cache for hugepd.
 */
-   if (pdshift != shift) {
+   if (pdshift > shift) {
pgtable_cache_add(pdshift - shift, NULL);
if (!PGT_CACHE(pdshift - shift))
panic("hugetlbpage_init(): could not create
"
  "pgtable cache for %d bit
pagesize\n", shift);
}
+#ifdef CONFIG_PPC_FSL_BOOK3E
+   else if (!hugepte_cache) {


This else never triggers on book3e, because the way this function calculates
pdshift is wrong for book3e (it uses PyD_SHIFT instead of HUGEPD_PxD_SHIFT).
 We later get OOMs because huge_pte_alloc() calculates pdshift correctly,
tries to use hugepte_cache, and fails.


Ok, I'll check it again; I was expecting it to still work properly on 
book3e, because after applying patch 3 it works properly on the 8xx.




If the point of this patch is to remove the compile-time decision on whether
to do things the book3e way, why are there still ifdefs such as the ones
controlling the definition of HUGEPD_PxD_SHIFT?  How does what you're doing on
8xx (for certain page sizes) differ from book3e?


Some of the things done for book3e are common to the 8xx but differ from 
book3s. For that reason, the following change appears in several places in 
the next patch (3/3):

-#ifdef CONFIG_PPC_FSL_BOOK3E
+#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)

Christophe


Re: [PATCH] PPC: sstep.c: Add modsw, moduw instruction emulation

2016-12-05 Thread Naveen N. Rao
On 2016/12/06 01:21AM, PrasannaKumar Muralidharan wrote:
> >> +
> >> + case 267:   /* moduw */
> >
> > Please move this case further up so that the extended opcodes are in
> > numerical order.
> 
> Placed it after the divide instructions as that appeared logical. Also placed
> 267 below 779 as that is the order in which the instructions are
> documented in the ISA book. This may help in finding related
> instructions together. If this style is not preferred I can arrange them
> in numerical order.

I guessed as much, but if you look at the existing function, you'll see 
that things have been arranged in numerical order. As such, it's best to 
stick to that convention.

- Naveen



[PATCH 3/3] powerpc: enable support for GCC plugins

2016-12-05 Thread Andrew Donnellan
Enable support for GCC plugins on powerpc.

Add an additional version check in gcc-plugins-check to advise users to
upgrade to gcc 5.2+ on powerpc to avoid issues with header files (gcc <=
4.6) or missing copies of rs6000-cpus.def (4.8 to 5.1 on 64-bit targets).

Signed-off-by: Andrew Donnellan 

---

Open to bikeshedding on the gcc version check.

Compile tested with all plugins enabled on gcc 4.6-6.2,
x86->ppc{32,64,64le} and 4.8-6.2 ppc64le->ppc{32,64,64le}. Thanks to
Chris Smart for help with this.

I think it's best to take this through powerpc#next with an ACK from
Kees/Emese?
---
 arch/powerpc/Kconfig | 1 +
 scripts/Makefile.gcc-plugins | 8 
 2 files changed, 9 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 65fba4c..6efbc08 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -92,6 +92,7 @@ config PPC
select HAVE_DYNAMIC_FTRACE_WITH_REGS if MPROFILE_KERNEL
select HAVE_FUNCTION_TRACER
select HAVE_FUNCTION_GRAPH_TRACER
+   select HAVE_GCC_PLUGINS
select SYSCTL_EXCEPTION_TRACE
select VIRT_TO_BUS if !PPC64
select HAVE_IDE
diff --git a/scripts/Makefile.gcc-plugins b/scripts/Makefile.gcc-plugins
index 26c67b7..9835a75 100644
--- a/scripts/Makefile.gcc-plugins
+++ b/scripts/Makefile.gcc-plugins
@@ -47,6 +47,14 @@ gcc-plugins-check: FORCE
 ifdef CONFIG_GCC_PLUGINS
   ifeq ($(PLUGINCC),)
 ifneq ($(GCC_PLUGINS_CFLAGS),)
+  # Various gccs between 4.5 and 5.1 have bugs on powerpc due to missing
+  # header files. gcc <= 4.6 doesn't work at all, gccs from 4.8 to 5.1 have
+  # issues with 64-bit targets.
+  ifeq ($(ARCH),powerpc)
+ifeq ($(call cc-ifversion, -le, 0501, y), y)
+ @echo "Cannot use CONFIG_GCC_PLUGINS: plugin support on gcc <= 5.1 is 
buggy on powerpc, please upgrade to gcc 5.2 or newer" >&2 && exit 1
+endif
+  endif
   ifeq ($(call cc-ifversion, -ge, 0405, y), y)
$(Q)$(srctree)/scripts/gcc-plugin.sh --show-error "$(__PLUGINCC)" "$(HOSTCXX)" "$(CC)" || true
@echo "Cannot use CONFIG_GCC_PLUGINS: your gcc installation does not support plugins, perhaps the necessary headers are missing?" >&2 && exit 1
-- 
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



[PATCH 2/3] powerpc: correctly disable latent entropy GCC plugin on prom_init.o

2016-12-05 Thread Andrew Donnellan
Commit 38addce8b600 ("gcc-plugins: Add latent_entropy plugin") excludes
certain powerpc early boot code from the latent entropy plugin by adding
appropriate CFLAGS. It looks like this was supposed to cover prom_init.o,
but ended up saying init.o (which doesn't exist) instead. Fix the typo.

Fixes: 38addce8b600 ("gcc-plugins: Add latent_entropy plugin")
Signed-off-by: Andrew Donnellan 

---

I think that we potentially could get rid of some of these disables, but
it's safer to leave it for now.
---
 arch/powerpc/kernel/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 1925341..adb52d1 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -15,7 +15,7 @@ CFLAGS_btext.o+= -fPIC
 endif
 
 CFLAGS_cputable.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
-CFLAGS_init.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
+CFLAGS_prom_init.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
 CFLAGS_btext.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
 CFLAGS_prom.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
 
-- 
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



[PATCH 1/3] gcc-plugins: fix definition of DISABLE_LATENT_ENTROPY_PLUGIN

2016-12-05 Thread Andrew Donnellan
The variable DISABLE_LATENT_ENTROPY_PLUGIN is defined when
CONFIG_PAX_LATENT_ENTROPY is set. That symbol is a leftover from the original
PaX version of the plugin code and doesn't actually exist in the kernel tree.
Change the condition to depend on CONFIG_GCC_PLUGIN_LATENT_ENTROPY instead.

Fixes: 38addce8b600 ("gcc-plugins: Add latent_entropy plugin")
Signed-off-by: Andrew Donnellan 
---
 scripts/Makefile.gcc-plugins | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/Makefile.gcc-plugins b/scripts/Makefile.gcc-plugins
index 060d2cb..26c67b7 100644
--- a/scripts/Makefile.gcc-plugins
+++ b/scripts/Makefile.gcc-plugins
@@ -8,7 +8,7 @@ ifdef CONFIG_GCC_PLUGINS
 
  gcc-plugin-$(CONFIG_GCC_PLUGIN_LATENT_ENTROPY)	+= latent_entropy_plugin.so
  gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_LATENT_ENTROPY)	+= -DLATENT_ENTROPY_PLUGIN
-  ifdef CONFIG_PAX_LATENT_ENTROPY
+  ifdef CONFIG_GCC_PLUGIN_LATENT_ENTROPY
     DISABLE_LATENT_ENTROPY_PLUGIN	+= -fplugin-arg-latent_entropy_plugin-disable
   endif
 
-- 
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH 1/3] KVM: PPC: Book3S: Change interrupt call to reduce scratch space use on HV

2016-12-05 Thread Paul Mackerras
On Thu, Dec 01, 2016 at 06:18:10PM +1100, Nicholas Piggin wrote:
> Change the calling convention to put the trap number together with
> CR in the two halves of r12, which frees up HSTATE_SCRATCH2 in the HV
> handler and leaves r9 free.

Cute idea!  Some comments below...

> The 64-bit PR handler entry translates the calling convention back
> to match the previous call convention (i.e., shared with 32-bit), for
> simplicity.
> 
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/include/asm/exception-64s.h | 28 +++-
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 15 +++
>  arch/powerpc/kvm/book3s_segment.S| 27 ---
>  3 files changed, 42 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/exception-64s.h 
> b/arch/powerpc/include/asm/exception-64s.h
> index 9a3eee6..bc8fc45 100644
> --- a/arch/powerpc/include/asm/exception-64s.h
> +++ b/arch/powerpc/include/asm/exception-64s.h
> @@ -233,7 +233,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
>  
>  #endif
>  
> -#define __KVM_HANDLER_PROLOG(area, n)				\
> +#define __KVM_HANDLER(area, h, n)\
>   BEGIN_FTR_SECTION_NESTED(947)   \
>   ld  r10,area+EX_CFAR(r13);  \
>   std r10,HSTATE_CFAR(r13);   \
> @@ -243,30 +243,32 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
>   std r10,HSTATE_PPR(r13);\
>   END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948);\
>   ld  r10,area+EX_R10(r13);   \
> - stw r9,HSTATE_SCRATCH1(r13);\
> - ld  r9,area+EX_R9(r13); \
>   std r12,HSTATE_SCRATCH0(r13);   \
> -
> -#define __KVM_HANDLER(area, h, n)\
> - __KVM_HANDLER_PROLOG(area, n)   \
> - li  r12,n;  \
> + li  r12,(n);\
> + sldir12,r12,32; \
> + or  r12,r12,r9; \

Did you consider doing it the other way around, i.e. with r12
containing (cr << 32) | trap?  That would save 1 instruction in each
handler:

+   sldir12,r9,32;  \
+   ori r12,r12,(n);\
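(A standalone C illustration of the two packings being discussed, with
made-up example values -- purely to show the layout:)

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		uint64_t trap = 0x500;		/* example trap number */
		uint64_t cr   = 0x28000000;	/* example CR image */

		/* Patch as posted: trap in the high half, CR in the low. */
		uint64_t posted = (trap << 32) | cr;

		/* Suggested alternative: CR high, trap low, so the trap
		 * number can be OR-ed in with a single ori. */
		uint64_t suggested = (cr << 32) | trap;

		printf("posted=%#llx suggested=%#llx\n",
		       (unsigned long long)posted,
		       (unsigned long long)suggested);
		return 0;
	}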

> + ld  r9,area+EX_R9(r13); \
> + std r9,HSTATE_SCRATCH1(r13);\

Why not put this std in kvmppc_interrupt[_hv] rather than in each
handler?

>   b   kvmppc_interrupt
>  
>  #define __KVM_HANDLER_SKIP(area, h, n)				\
>   cmpwi   r10,KVM_GUEST_MODE_SKIP;\
> - ld  r10,area+EX_R10(r13);   \
>   beq 89f;\
> - stw r9,HSTATE_SCRATCH1(r13);\
>   BEGIN_FTR_SECTION_NESTED(948)   \
> - ld  r9,area+EX_PPR(r13);\
> - std r9,HSTATE_PPR(r13); \
> + ld  r10,area+EX_PPR(r13);   \
> + std r10,HSTATE_PPR(r13);\
>   END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948);\
> - ld  r9,area+EX_R9(r13); \
> + ld  r10,area+EX_R10(r13);   \
>   std r12,HSTATE_SCRATCH0(r13);   \
> - li  r12,n;  \
> + li  r12,(n);\
> + sldir12,r12,32; \
> + or  r12,r12,r9; \
> + ld  r9,area+EX_R9(r13); \
> + std r9,HSTATE_SCRATCH1(r13);\

Same comment again, of course.

>   b   kvmppc_interrupt;   \
>  89:  mtocrf  0x80,r9;\
>   ld  r9,area+EX_R9(r13); \
> + ld  r10,area+EX_R10(r13);   \
>   b   kvmppc_skip_##h##interrupt
>  
>  #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index c3c1d1b..0536c73 100644
> --- a/arch/powerpc/kvm/book

Re: [PATCH v8 2/3] perf annotate: Support jump instruction with target as second operand

2016-12-05 Thread Ravi Bangoria
Hi Arnaldo,

Hmm, so it's difficult to find an example of this when we use debuginfo.
Because...

jump__parse tries to look for two things: 'offset' and 'target address'.

objdump with debuginfo will include the offset in the assembly, e.g. the
annotation of 'smp_call_function_single' with the perf.data and vmlinux I
shared.

   │c016d6ac:   cmpwi  cr7,r9,0                                         ▒
   │c016d6b0: ↑ bne    cr7,c016d59c <.smp_call_function_single+0x8c>    ▒
   │c016d6b4:   addis  r10,r2,-15                                       ▒

objdump of the same function with kcore:

   │c016d6ac:   cmpwi  cr7,r9,0        ▒
   │c016d6b0: ↓ bne    cr7,0xc016d59c  ▒
   │c016d6b4:   addis  r10,r2,-15      ▒

Annotating the first case won't show any visible issue because we get the
offset directly. But in this case as well, we are parsing a wrong target
address into ops->target.addr.

Since we don't have the offset in the second case, we use the target address
to find it. And thus it shows wrong output, something like:

   │   cmpwi  cr7,r9,0    ▒
   │ ↓ bne    3fe92afc    ▒
   │   addis  r10,r2,-15  ▒

BTW, we have a lot of such instructions in the kernel.

Thanks,
-Ravi


On Monday 05 December 2016 09:26 PM, Ravi Bangoria wrote:
> Archs like powerpc have jump instructions that include the target address
> as the second operand. For example, 'bne  cr7,0xc00f6154'. Add
> support for such instructions in perf annotate.
>
> objdump o/p:
>   c00f6140:   ld r9,1032(r31)
>   c00f6144:   cmpdi  cr7,r9,0
> c00f6148:   bne    cr7,0xc00f6154
> c00f614c:   ld     r9,2312(r30)
> c00f6150:   std    r9,1032(r31)
>   c00f6154:   ld r9,88(r31)
>
> Corresponding perf annotate o/p:
>
> Before patch:
>  ld r9,1032(r31)
>  cmpdi  cr7,r9,0
>   v  bne    3ff09f2c
>  ld     r9,2312(r30)
>  std    r9,1032(r31)
>   74:ld r9,88(r31)
>
> After patch:
>  ld r9,1032(r31)
>  cmpdi  cr7,r9,0
>   v  bne    74
>  ld     r9,2312(r30)
>  std    r9,1032(r31)
>   74:ld r9,88(r31)
>
> Signed-off-by: Ravi Bangoria 
> ---
> Changes in v8:
>   - v7: https://lkml.org/lkml/2016/9/21/436
>   - Rebase to acme/perf/core
>   - Little change in patch description.
>   - No logical changes. (Cross arch annotate patches are in. This patch
> is for hardening annotate for powerpc.)
>
>  tools/perf/util/annotate.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index ea7e0de..590244e 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -223,8 +223,12 @@ bool ins__is_call(const struct ins *ins)
>  static int jump__parse(struct arch *arch __maybe_unused, struct ins_operands *ops, struct map *map __maybe_unused)
>  {
>   const char *s = strchr(ops->raw, '+');
> + const char *c = strchr(ops->raw, ',');
>
> - ops->target.addr = strtoull(ops->raw, NULL, 16);
> + if (c++ != NULL)
> + ops->target.addr = strtoull(c, NULL, 16);
> + else
> + ops->target.addr = strtoull(ops->raw, NULL, 16);
>
>   if (s++ != NULL)
>   ops->target.offset = strtoull(s, NULL, 16);
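
(A small standalone demo of the parsing change -- not perf code, just the
strtoull() behaviour on a raw operand string like the ones above:)

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	int main(void)
	{
		const char *raw = "cr7,0xc016d59c";
		const char *c = strchr(raw, ',');

		/* Old behaviour: 'c' in "cr7" is a valid hex digit, so the
		 * whole string parses as 0xc instead of the branch target. */
		printf("old: %#llx\n", strtoull(raw, NULL, 16));

		/* New behaviour: parse the second operand, after the comma. */
		if (c != NULL)
			printf("new: %#llx\n", strtoull(c + 1, NULL, 16));
		return 0;
	}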



Re: [PATCH] cxl: prevent read/write to AFU config space while AFU not configured

2016-12-05 Thread Ian Munsie
Acked-by: Ian Munsie 

Looks like a reasonable solution

> Pradipta found this while doing testing for cxlflash. I've tested this 
> patch and I'm satisfied that it solves the issue, but I've asked Pradipta 
> to test it a bit further. 

:)



[PATCH RFC v2 3/3] powerpc/64: Enable use of radix MMU under hypervisor on POWER9

2016-12-05 Thread Paul Mackerras
To use radix as a guest, we first need to tell the hypervisor via
the ibm,client-architecture call that we support POWER9 and
architecture v3.00, that we can do either radix or hash, and that
we would like to choose later using an hcall (the
H_REGISTER_PROC_TBL hcall).

Then we need to check whether the hypervisor agreed to us using
radix.  We need to do this very early on in the kernel boot process
before any of the MMU initialization is done.  If the hypervisor
doesn't agree, we can't use radix and therefore clear the radix
MMU feature bit.

Later, when we have set up our process table, which points to the
radix tree for each process, we need to install that using the
H_REGISTER_PROC_TBL hcall.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/book3s/64/mmu.h |  2 ++
 arch/powerpc/include/asm/hvcall.h| 11 +++
 arch/powerpc/include/asm/prom.h  |  9 +
 arch/powerpc/kernel/prom_init.c  | 18 +-
 arch/powerpc/mm/init_64.c| 29 +
 arch/powerpc/mm/pgtable-radix.c  |  2 ++
 arch/powerpc/platforms/pseries/lpar.c| 29 +
 7 files changed, 99 insertions(+), 1 deletion(-)
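
For context, the registration step described above then boils down to an
hcall along these lines -- a minimal sketch of the pseries hook, where the
exact parameter encoding is an assumption, not the literal patch code:

	static long register_radix_proc_tbl(unsigned long base,
					    unsigned long page_size,
					    unsigned long table_size)
	{
		unsigned long flags = PROC_TABLE_NEW | PROC_TABLE_RADIX |
				      PROC_TABLE_GTSE;

		return plpar_hcall_norets(H_REGISTER_PROC_TBL, flags,
					  base, page_size, table_size);
	}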

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 8afb0e0..e8cbdc0 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -138,5 +138,7 @@ static inline void setup_initial_memory_limit(phys_addr_t 
first_memblock_base,
 extern int (*register_process_table)(unsigned long base, unsigned long 
page_size,
 unsigned long tbl_size);
 
+extern void radix_init_pseries(void);
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_MMU_H_ */
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 77ff1ba..54d11b3 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -276,6 +276,7 @@
 #define H_GET_MPP_X0x314
 #define H_SET_MODE 0x31C
 #define H_CLEAR_HPT0x358
+#define H_REGISTER_PROC_TBL0x37C
 #define H_SIGNAL_SYS_RESET 0x380
 #define MAX_HCALL_OPCODE   H_SIGNAL_SYS_RESET
 
@@ -313,6 +314,16 @@
 #define H_SIGNAL_SYS_RESET_ALL_OTHERS  -2
 /* >= 0 values are CPU number */
 
+/* Flag values used in H_REGISTER_PROC_TBL hcall */
+#define PROC_TABLE_OP_MASK 0x18
+#define PROC_TABLE_DEREG   0x10
+#define PROC_TABLE_NEW 0x18
+#define PROC_TABLE_TYPE_MASK   0x06
+#define PROC_TABLE_HPT_SLB 0x00
+#define PROC_TABLE_HPT_PT  0x02
+#define PROC_TABLE_RADIX   0x04
+#define PROC_TABLE_GTSE0x01
+
 #ifndef __ASSEMBLY__
 
 /**
diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
index e6d83d0..8af2546 100644
--- a/arch/powerpc/include/asm/prom.h
+++ b/arch/powerpc/include/asm/prom.h
@@ -121,6 +121,8 @@ struct of_drconf_cell {
 #define OV1_PPC_2_06   0x02/* set if we support PowerPC 2.06 */
 #define OV1_PPC_2_07   0x01/* set if we support PowerPC 2.07 */
 
+#define OV1_PPC_3_00   0x80/* set if we support PowerPC 3.00 */
+
 /* Option vector 2: Open Firmware options supported */
 #define OV2_REAL_MODE  0x20/* set if we want OF in real mode */
 
@@ -155,6 +157,13 @@ struct of_drconf_cell {
 #define OV5_PFO_HW_842 0x1140  /* PFO Compression Accelerator */
 #define OV5_PFO_HW_ENCR0x1120  /* PFO Encryption Accelerator */
 #define OV5_SUB_PROCESSORS 0x1501  /* 1,2,or 4 Sub-Processors supported */
+#define OV5_XIVE_EXPLOIT   0x1701  /* XIVE exploitation supported */
+#define OV5_MMU_RADIX_300  0x1880  /* ISA v3.00 radix MMU supported */
+#define OV5_MMU_HASH_300   0x1840  /* ISA v3.00 hash MMU supported */
+#define OV5_MMU_SEGM_RADIX 0x1820  /* radix mode (no segmentation) */
+#define OV5_MMU_PROC_TBL   0x1810  /* hcall selects SLB or proc table */
+#define OV5_MMU_SLB0x1800  /* always use SLB */
+#define OV5_MMU_GTSE   0x1808  /* Guest translation shootdown */
 
 /* Option Vector 6: IBM PAPR hints */
 #define OV6_LINUX  0x02/* Linux is our OS */
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index ec47a93..358d43f 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -649,6 +649,7 @@ static void __init early_cmdline_parse(void)
 struct option_vector1 {
u8 byte1;
u8 arch_versions;
+   u8 arch_versions3;
 } __packed;
 
 struct option_vector2 {
@@ -691,6 +692,9 @@ struct option_vector5 {
u8 reserved2;
__be16 reserved3;
u8 subprocessors;
+   u8 byte22;
+   u8 intarch;
+   u8 mmu;
 } __packed;
 
 struct option_vector6 {
@@ -700,7 +704,7 @@ struct option_vector6 {
 } __packed;
 
 struct ibm_arch_vec {
-   struct { u32 mask, val; } pvrs[10];
+ 

Re: [PATCH v8 2/6] powerpc: pSeries/Kconfig: Add qspinlock build config

2016-12-05 Thread Pan Xinhui



On 2016/12/6 09:24, Pan Xinhui wrote:



On 2016/12/6 08:58, Boqun Feng wrote:

On Mon, Dec 05, 2016 at 10:19:22AM -0500, Pan Xinhui wrote:

pSeries/powerNV will use qspinlock from now on.

Signed-off-by: Pan Xinhui 
---
 arch/powerpc/platforms/pseries/Kconfig | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index bec90fb..8a87d06 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig


Why here? Not arch/powerpc/platforms/Kconfig?


@@ -23,6 +23,14 @@ config PPC_PSERIES
 select PPC_DOORBELL
 default y

+config ARCH_USE_QUEUED_SPINLOCKS
+	default y
+	bool "Enable qspinlock"


I think you just enable qspinlock by default for all PPC platforms. I
guess you need to put

depends on PPC_PSERIES || PPC_POWERNV

here to achieve what you mean in your commit message.


oh, yes, need depends on PPC_PSERIES || PPC_POWERNV.


Yes, that's another good way.
I prefer to put it in pseries/Kconfig, the same as the pv-qspinlock config.
When we build powernv, it still includes pSeries's config anyway.

thanks
xinhui


Regards,
Boqun


+	help
+	  Enabling this option will let the kernel use qspinlock, which is a kind
+	  of fair lock.  It has shown a good performance improvement on x86 and
+	  also on ppc, especially in high-contention cases.
+
 config PPC_SPLPAR
 depends on PPC_PSERIES
 bool "Support for shared-processor logical partitions"
--
2.4.11





[RFC][PATCH] powerpc/64s: use start, size rather than start, end for exception handlers

2016-12-05 Thread Nicholas Piggin
start,size has the benefit of being easier to search for (start,end
usually gives you the preceding vector from the one you want, as the first
result).

Suggested-by: Benjamin Herrenschmidt 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/head-64.h   | 158 ++--
 arch/powerpc/kernel/exceptions-64s.S | 195 ++-
 2 files changed, 185 insertions(+), 168 deletions(-)

diff --git a/arch/powerpc/include/asm/head-64.h b/arch/powerpc/include/asm/head-64.h
index c691fc2..a475711 100644
--- a/arch/powerpc/include/asm/head-64.h
+++ b/arch/powerpc/include/asm/head-64.h
@@ -38,8 +38,8 @@
  * li  r10,128
  * mv  r11,r10
 
- * FIXED_SECTION_ENTRY_BEGIN_LOCATION(section_name, label2, start_address)
- * FIXED_SECTION_ENTRY_END_LOCATION(section_name, label2, end_address)
+ * FIXED_SECTION_ENTRY_BEGIN_LOCATION(section_name, label2, start_address, size)
+ * FIXED_SECTION_ENTRY_END_LOCATION(section_name, label2, start_address, size)
  * CLOSE_FIXED_SECTION(section_name)
  *
  * ZERO_FIXED_SECTION can be used to emit zeroed data.
@@ -102,9 +102,15 @@ end_##sname:
 #define FIXED_SECTION_ENTRY_BEGIN(sname, name) \
__FIXED_SECTION_ENTRY_BEGIN(sname, name, IFETCH_ALIGN_BYTES)
 
-#define FIXED_SECTION_ENTRY_BEGIN_LOCATION(sname, name, start) \
+#define FIXED_SECTION_ENTRY_BEGIN_LOCATION(sname, name, start, size) \
USE_FIXED_SECTION(sname);   \
name##_start = (start); \
+   .if ((start) % (size) != 0);\
+   .error "Fixed section exception vector misalignment";   \
+   .endif; \
+   .if ((size) != 0x20) && ((size) != 0x80) && ((size) != 0x100); \
+   .error "Fixed section exception vector bad size";   \
+   .endif; \
.if (start) < sname##_start;\
.error "Fixed section underflow";   \
.abort; \
@@ -113,16 +119,16 @@ end_##sname:
.global name;   \
 name:
 
-#define FIXED_SECTION_ENTRY_END_LOCATION(sname, name, end) \
-   .if (end) > sname##_end;\
+#define FIXED_SECTION_ENTRY_END_LOCATION(sname, name, start, size) \
+   .if (start) + (size) > sname##_end; \
.error "Fixed section overflow";\
.abort; \
.endif; \
-   .if (. - name > end - name##_start);\
+   .if (. - name > (start) + (size) - name##_start);   \
.error "Fixed entry overflow";  \
.abort; \
.endif; \
-   . = ((end) - sname##_start);\
+   . = ((start) + (size) - sname##_start); \
 
 
 /*
@@ -191,17 +197,17 @@ end_##sname:
  * and OOL handlers are implemented as types of TRAMP and TRAMP_VIRT handlers.
  */
 
-#define EXC_REAL_BEGIN(name, start, end)			\
-	FIXED_SECTION_ENTRY_BEGIN_LOCATION(real_vectors, exc_real_##start##_##name, start)
+#define EXC_REAL_BEGIN(name, start, size)			\
+	FIXED_SECTION_ENTRY_BEGIN_LOCATION(real_vectors, exc_real_##start##_##name, start, size)

-#define EXC_REAL_END(name, start, end)				\
-	FIXED_SECTION_ENTRY_END_LOCATION(real_vectors, exc_real_##start##_##name, end)
+#define EXC_REAL_END(name, start, size)				\
+	FIXED_SECTION_ENTRY_END_LOCATION(real_vectors, exc_real_##start##_##name, start, size)

-#define EXC_VIRT_BEGIN(name, start, end)			\
-	FIXED_SECTION_ENTRY_BEGIN_LOCATION(virt_vectors, exc_virt_##start##_##name, start)
+#define EXC_VIRT_BEGIN(name, start, size)			\
+	FIXED_SECTION_ENTRY_BEGIN_LOCATION(virt_vectors, exc_virt_##start##_##name, start, size)

-#define EXC_VIRT_END(name, start, end)				\
-	FIXED_SECTION_ENTRY_END_LOCATION(virt_vectors, exc_virt_##start##_##name, end)
+#define EXC_VIRT_END(name, start, size)				\
+	FIXED_SECTION_ENTRY_END_LOCATION(virt_vectors, exc_virt_##start##_##name, start, size)
 
 #define EXC_COMMON_BEGIN(name) \
USE_TEXT_SECTION(); \
@@ -223,140 +229,140 @@ end_##sname:
 #define TRAMP_KVM_BEGIN(name)
 #endif
 
-#define EXC_REAL_NONE(start, end)  \
-	FIXED_SECTION_ENTRY_BEGIN_LOCATION(real_vectors, exc_real_##start##_##unused, start); \
-   FIXED_SEC

[PATCH] powerpc/64s: tidy up after exception handler rework

2016-12-05 Thread Nicholas Piggin
Somewhere along the line, search/replace left some garbled naming and
untidy alignment. Might as well fix them all up now, while the git
blame history doesn't extend too far.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/head-64.h   | 160 +--
 arch/powerpc/kernel/exceptions-64s.S |   2 +-
 2 files changed, 81 insertions(+), 81 deletions(-)

diff --git a/arch/powerpc/include/asm/head-64.h b/arch/powerpc/include/asm/head-64.h
index fca7033..c691fc2 100644
--- a/arch/powerpc/include/asm/head-64.h
+++ b/arch/powerpc/include/asm/head-64.h
@@ -102,7 +102,7 @@ end_##sname:
 #define FIXED_SECTION_ENTRY_BEGIN(sname, name) \
__FIXED_SECTION_ENTRY_BEGIN(sname, name, IFETCH_ALIGN_BYTES)
 
-#define FIXED_SECTION_ENTRY_BEGIN_LOCATION(sname, name, start) \
+#define FIXED_SECTION_ENTRY_BEGIN_LOCATION(sname, name, start) \
USE_FIXED_SECTION(sname);   \
name##_start = (start); \
.if (start) < sname##_start;\
@@ -113,7 +113,7 @@ end_##sname:
.global name;   \
 name:
 
-#define FIXED_SECTION_ENTRY_END_LOCATION(sname, name, end) \
+#define FIXED_SECTION_ENTRY_END_LOCATION(sname, name, end) \
.if (end) > sname##_end;\
.error "Fixed section overflow";\
.abort; \
@@ -147,12 +147,12 @@ end_##sname:
  * Following are the BOOK3S exception handler helper macros.
  * Handlers come in a number of types, and each type has a number of varieties.
  *
- * EXC_REAL_*- real, unrelocated exception vectors
- * EXC_VIRT_*- virt (AIL), unrelocated exception vectors
+ * EXC_REAL_* - real, unrelocated exception vectors
+ * EXC_VIRT_* - virt (AIL), unrelocated exception vectors
  * TRAMP_REAL_*   - real, unrelocated helpers (virt can call these)
- * TRAMP_VIRT_*  - virt, unreloc helpers (in practice, real can use)
- * TRAMP_KVM - KVM handlers that get put into real, unrelocated
- * EXC_COMMON_*  - virt, relocated common handlers
+ * TRAMP_VIRT_*   - virt, unreloc helpers (in practice, real can use)
+ * TRAMP_KVM  - KVM handlers that get put into real, unrelocated
+ * EXC_COMMON_*   - virt, relocated common handlers
  *
  * The EXC handlers are given a name, and branch to name_common, or the
  * appropriate KVM or masking function. Vector handler verieties are as
@@ -194,20 +194,20 @@ end_##sname:
 #define EXC_REAL_BEGIN(name, start, end)			\
	FIXED_SECTION_ENTRY_BEGIN_LOCATION(real_vectors, exc_real_##start##_##name, start)

-#define EXC_REAL_END(name, start, end)	\
+#define EXC_REAL_END(name, start, end)		\
	FIXED_SECTION_ENTRY_END_LOCATION(real_vectors, exc_real_##start##_##name, end)

 #define EXC_VIRT_BEGIN(name, start, end)			\
	FIXED_SECTION_ENTRY_BEGIN_LOCATION(virt_vectors, exc_virt_##start##_##name, start)

-#define EXC_VIRT_END(name, start, end)	\
+#define EXC_VIRT_END(name, start, end)		\
	FIXED_SECTION_ENTRY_END_LOCATION(virt_vectors, exc_virt_##start##_##name, end)
 
-#define EXC_COMMON_BEGIN(name) \
-   USE_TEXT_SECTION(); \
-   .balign IFETCH_ALIGN_BYTES; \
-   .global name;   \
-   DEFINE_FIXED_SYMBOL(name);  \
+#define EXC_COMMON_BEGIN(name) \
+   USE_TEXT_SECTION(); \
+   .balign IFETCH_ALIGN_BYTES; \
+   .global name;   \
+   DEFINE_FIXED_SYMBOL(name);  \
 name:
 
 #define TRAMP_REAL_BEGIN(name) \
@@ -217,7 +217,7 @@ end_##sname:
FIXED_SECTION_ENTRY_BEGIN(virt_trampolines, name)
 
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
-#define TRAMP_KVM_BEGIN(name)  \
+#define TRAMP_KVM_BEGIN(name)  \
TRAMP_REAL_BEGIN(name)
 #else
 #define TRAMP_KVM_BEGIN(name)
@@ -232,132 +232,132 @@ end_##sname:
	FIXED_SECTION_ENTRY_END_LOCATION(virt_vectors, exc_virt_##start##_##unused, end);
 
 
-#define EXC_REAL(name, start, end) \
-   EXC_REAL_BEGIN(name, start, end);   \
+#define EXC_REAL(name, start, end) \
+   EXC_REAL_BEGIN(name, start, end);   \
STD_EXCEPTION_PSERIES(start, name##_common);\
EXC_REAL_END(name, start, end);

Re: [PATCH v8 3/6] powerpc: lib/locks.c: Add cpu yield/wake helper function

2016-12-05 Thread Pan Xinhui



On 2016/12/6 09:23, Boqun Feng wrote:

On Mon, Dec 05, 2016 at 10:19:23AM -0500, Pan Xinhui wrote:

Add two corresponding helper functions to support pv-qspinlock.

For normal use, __spin_yield_cpu will confer the current vcpu's slices to the
target vcpu (say, a lock holder). If the target vcpu is not specified or it
is in the running state, whether such conferring to the lpar happens or not
depends on the *confer* argument.

Because the hcall itself introduces latency and a little overhead, and we
do NOT want to suffer any latency in some cases, e.g. in an interrupt handler,
the second parameter *confer* can indicate such a case.

__spin_wake_cpu is simpler; it will wake up one vcpu regardless of its
current vcpu state.

Signed-off-by: Pan Xinhui 
---
 arch/powerpc/include/asm/spinlock.h |  4 +++
 arch/powerpc/lib/locks.c| 59 +
 2 files changed, 63 insertions(+)

diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index 954099e..6426bd5 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -64,9 +64,13 @@ static inline bool vcpu_is_preempted(int cpu)
 /* We only yield to the hypervisor if we are in shared processor mode */
 #define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
 extern void __spin_yield(arch_spinlock_t *lock);
+extern void __spin_yield_cpu(int cpu, int confer);
+extern void __spin_wake_cpu(int cpu);
 extern void __rw_yield(arch_rwlock_t *lock);
 #else /* SPLPAR */
 #define __spin_yield(x)barrier()
+#define __spin_yield_cpu(x, y) barrier()
+#define __spin_wake_cpu(x) barrier()
 #define __rw_yield(x)  barrier()
 #define SHARED_PROCESSOR   0
 #endif
diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c
index 6574626..bd872c9 100644
--- a/arch/powerpc/lib/locks.c
+++ b/arch/powerpc/lib/locks.c
@@ -23,6 +23,65 @@
 #include 
 #include 

+/*
+ * confer our slices to a specified cpu and return. If it is in running state
+ * or cpu is -1, then we will check confer. If confer is NULL, we will return
+ * otherwise we confer our slices to lpar.
+ */
+void __spin_yield_cpu(int cpu, int confer)
+{
+   unsigned int holder_cpu = cpu, yield_count;


As I said at:

https://marc.info/?l=linux-kernel&m=147455748619343&w=2

@holder_cpu is not necessary and doesn't help anything.


+
+   if (cpu == -1)
+   goto yield_to_lpar;
+
+   BUG_ON(holder_cpu >= nr_cpu_ids);
+   yield_count = be32_to_cpu(lppaca_of(holder_cpu).yield_count);
+
+   /* if cpu is running, confer slices to lpar conditionally*/
+   if ((yield_count & 1) == 0)
+   goto yield_to_lpar;
+
+   plpar_hcall_norets(H_CONFER,
+   get_hard_smp_processor_id(holder_cpu), yield_count);
+   return;
+
+yield_to_lpar:
+   if (confer)
+   plpar_hcall_norets(H_CONFER, -1, 0);
+}
+EXPORT_SYMBOL_GPL(__spin_yield_cpu);
+
+void __spin_wake_cpu(int cpu)
+{
+   unsigned int holder_cpu = cpu;


And it's even wrong to call the parameter of _wake_cpu() a holder_cpu,
because it's not the current lock holder.


oh, its name is really misleading.

thanks


Regards,
Boqun


+
+   BUG_ON(holder_cpu >= nr_cpu_ids);
+   /*
+* NOTE: we should always do this hcall regardless of
+* the yield_count of the holder_cpu.
+* as thers might be a case like below;
+*  CPU 1   CPU 2
+*  yielded = true
+* if (yielded)
+*  __spin_wake_cpu()
+*  __spin_yield_cpu()
+*
+* So we might lose a wake if we check the yield_count and
+* return directly if the holder_cpu is running.
+* IOW. do NOT code like below.
+*  yield_count = be32_to_cpu(lppaca_of(holder_cpu).yield_count);
+*  if ((yield_count & 1) == 0)
+*  return;
+*
+* a PROD hcall marks the target_cpu proded, which cause the next cede
+* or confer called on the target_cpu invalid.
+*/
+   plpar_hcall_norets(H_PROD,
+   get_hard_smp_processor_id(holder_cpu));
+}
+EXPORT_SYMBOL_GPL(__spin_wake_cpu);
+
 #ifndef CONFIG_QUEUED_SPINLOCKS
 void __spin_yield(arch_spinlock_t *lock)
 {
--
2.4.11





Re: [PATCH v8 2/6] powerpc: pSeries/Kconfig: Add qspinlock build config

2016-12-05 Thread Pan Xinhui



On 2016/12/6 08:58, Boqun Feng wrote:

On Mon, Dec 05, 2016 at 10:19:22AM -0500, Pan Xinhui wrote:

pSeries/powerNV will use qspinlock from now on.

Signed-off-by: Pan Xinhui 
---
 arch/powerpc/platforms/pseries/Kconfig | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index bec90fb..8a87d06 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig


Why here? Not arch/powerpc/platforms/Kconfig?


@@ -23,6 +23,14 @@ config PPC_PSERIES
select PPC_DOORBELL
default y

+config ARCH_USE_QUEUED_SPINLOCKS
+   default y
+   bool "Enable qspinlock"


I think you just enable qspinlock by default for all PPC platforms. I
guess you need to put

depends on PPC_PSERIES || PPC_POWERNV

here to achieve what you mean in your commit message.


Yes, that's another good way.
I prefer to put it in pseries/Kconfig, the same as the pv-qspinlock config.
When we build powernv, it still includes pSeries's config anyway.

thanks
xinhui


Regards,
Boqun


+	help
+	  Enabling this option will let the kernel use qspinlock, which is a kind
+	  of fair lock.  It has shown a good performance improvement on x86 and
+	  also on ppc, especially in high-contention cases.
+
 config PPC_SPLPAR
depends on PPC_PSERIES
bool "Support for shared-processor logical partitions"
--
2.4.11





Re: [PATCH v8 3/6] powerpc: lib/locks.c: Add cpu yield/wake helper function

2016-12-05 Thread Boqun Feng
On Mon, Dec 05, 2016 at 10:19:23AM -0500, Pan Xinhui wrote:
> Add two corresponding helper functions to support pv-qspinlock.
> 
> For normal use, __spin_yield_cpu will confer the current vcpu's slices to the
> target vcpu (say, a lock holder). If the target vcpu is not specified or it
> is in the running state, whether such conferring to the lpar happens or not
> depends on the *confer* argument.
> 
> Because the hcall itself introduces latency and a little overhead, and we
> do NOT want to suffer any latency in some cases, e.g. in an interrupt handler,
> the second parameter *confer* can indicate such a case.
> 
> __spin_wake_cpu is simpler; it will wake up one vcpu regardless of its
> current vcpu state.
> 
> Signed-off-by: Pan Xinhui 
> ---
>  arch/powerpc/include/asm/spinlock.h |  4 +++
>  arch/powerpc/lib/locks.c| 59 +
>  2 files changed, 63 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
> index 954099e..6426bd5 100644
> --- a/arch/powerpc/include/asm/spinlock.h
> +++ b/arch/powerpc/include/asm/spinlock.h
> @@ -64,9 +64,13 @@ static inline bool vcpu_is_preempted(int cpu)
>  /* We only yield to the hypervisor if we are in shared processor mode */
>  #define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
>  extern void __spin_yield(arch_spinlock_t *lock);
> +extern void __spin_yield_cpu(int cpu, int confer);
> +extern void __spin_wake_cpu(int cpu);
>  extern void __rw_yield(arch_rwlock_t *lock);
>  #else /* SPLPAR */
>  #define __spin_yield(x)barrier()
> +#define __spin_yield_cpu(x, y) barrier()
> +#define __spin_wake_cpu(x) barrier()
>  #define __rw_yield(x)  barrier()
>  #define SHARED_PROCESSOR   0
>  #endif
> diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c
> index 6574626..bd872c9 100644
> --- a/arch/powerpc/lib/locks.c
> +++ b/arch/powerpc/lib/locks.c
> @@ -23,6 +23,65 @@
>  #include 
>  #include 
>  
> +/*
> + * confer our slices to a specified cpu and return. If it is in running state
> + * or cpu is -1, then we will check confer. If confer is NULL, we will return
> + * otherwise we confer our slices to lpar.
> + */
> +void __spin_yield_cpu(int cpu, int confer)
> +{
> + unsigned int holder_cpu = cpu, yield_count;

As I said at:

https://marc.info/?l=linux-kernel&m=147455748619343&w=2

@holder_cpu is not necessary and doesn't help anything.

> +
> + if (cpu == -1)
> + goto yield_to_lpar;
> +
> + BUG_ON(holder_cpu >= nr_cpu_ids);
> + yield_count = be32_to_cpu(lppaca_of(holder_cpu).yield_count);
> +
> + /* if cpu is running, confer slices to lpar conditionally*/
> + if ((yield_count & 1) == 0)
> + goto yield_to_lpar;
> +
> + plpar_hcall_norets(H_CONFER,
> + get_hard_smp_processor_id(holder_cpu), yield_count);
> + return;
> +
> +yield_to_lpar:
> + if (confer)
> + plpar_hcall_norets(H_CONFER, -1, 0);
> +}
> +EXPORT_SYMBOL_GPL(__spin_yield_cpu);
> +
> +void __spin_wake_cpu(int cpu)
> +{
> + unsigned int holder_cpu = cpu;

And it's even wrong to call the parameter of _wake_cpu() a holder_cpu,
because it's not the current lock holder.

Regards,
Boqun

> +
> + BUG_ON(holder_cpu >= nr_cpu_ids);
> + /*
> +  * NOTE: we should always do this hcall regardless of
> +  * the yield_count of the holder_cpu.
> +  * as thers might be a case like below;
> +  *  CPU 1   CPU 2
> +  *  yielded = true
> +  * if (yielded)
> +  *  __spin_wake_cpu()
> +  *  __spin_yield_cpu()
> +  *
> +  * So we might lose a wake if we check the yield_count and
> +  * return directly if the holder_cpu is running.
> +  * IOW. do NOT code like below.
> +  *  yield_count = be32_to_cpu(lppaca_of(holder_cpu).yield_count);
> +  *  if ((yield_count & 1) == 0)
> +  *  return;
> +  *
> +  * a PROD hcall marks the target_cpu proded, which cause the next cede
> +  * or confer called on the target_cpu invalid.
> +  */
> + plpar_hcall_norets(H_PROD,
> + get_hard_smp_processor_id(holder_cpu));
> +}
> +EXPORT_SYMBOL_GPL(__spin_wake_cpu);
> +
>  #ifndef CONFIG_QUEUED_SPINLOCKS
>  void __spin_yield(arch_spinlock_t *lock)
>  {
> -- 
> 2.4.11
> 




Re: [PATCH v3 2/3] powerpc: get hugetlbpage handling more generic

2016-12-05 Thread Scott Wood
On Wed, 2016-09-21 at 10:11 +0200, Christophe Leroy wrote:
> Today there are two implementations of hugetlbpages which are managed
> by exclusive #ifdefs:
> * FSL_BOOKE: several directory entries point to the same single hugepage
> * BOOK3S: one upper level directory entry points to a table of hugepages
> 
> In preparation for the implementation of hugepage support on the 8xx, we
> need a mix of the two above solutions, because the 8xx needs both cases
> depending on the size of pages:
> * In 4k page size mode, each PGD entry covers a 4M bytes area. It means
> that 2 PGD entries will be necessary to cover an 8M hugepage while a
> single PGD entry will cover 8x 512k hugepages.
> * In 16k page size mode, each PGD entry covers a 64M bytes area. It means
> that 8x 8M hugepages will be covered by one PGD entry and 64x 512k
> hugepages will be covered by one PGD entry.
> 
> This patch:
> * removes #ifdefs in favor of if/else based on the range sizes
> * merges the two huge_pte_alloc() functions as they are pretty similar
> * merges the two hugetlbpage_init() functions as they are pretty similar
[snip]
> @@ -860,16 +803,34 @@ static int __init hugetlbpage_init(void)
>    * if we have pdshift and shift value same, we don't
>    * use pgt cache for hugepd.
>    */
> - if (pdshift != shift) {
> + if (pdshift > shift) {
>   pgtable_cache_add(pdshift - shift, NULL);
>   if (!PGT_CACHE(pdshift - shift))
>   panic("hugetlbpage_init(): could not create
> "
>     "pgtable cache for %d bit
> pagesize\n", shift);
>   }
> +#ifdef CONFIG_PPC_FSL_BOOK3E
> + else if (!hugepte_cache) {

This else never triggers on book3e, because the way this function calculates
pdshift is wrong for book3e (it uses PyD_SHIFT instead of HUGEPD_PxD_SHIFT).
 We later get OOMs because huge_pte_alloc() calculates pdshift correctly,
tries to use hugepte_cache, and fails.

If the point of this patch is to remove the compile-time decision on whether
to do things the book3e way, why are there still ifdefs such as the ones
controlling the definition of HUGEPD_PxD_SHIFT?  How does what you're doing on
8xx (for certain page sizes) differ from book3e?

-Scott



Re: [PATCH v8 1/6] powerpc/qspinlock: powerpc support qspinlock

2016-12-05 Thread Pan Xinhui

Correcting Waiman's address.

On 2016/12/6 08:47, Boqun Feng wrote:

On Mon, Dec 05, 2016 at 10:19:21AM -0500, Pan Xinhui wrote:

This patch adds basic code to enable qspinlock on powerpc. qspinlock is
one kind of fair-lock implementation, and we have seen some performance
improvement under some scenarios.

queued_spin_unlock() releases the lock by just one write of zero to the
::locked byte, which sits at a different place in the lock word on the two
endiannesses.
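
(A standalone illustration of the endianness point -- not kernel code, just
showing where the locked byte lives inside the 32-bit lock word:)

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		uint32_t lock = 0x000000ff;	/* ::locked == 0xff */
		uint8_t *p = (uint8_t *)&lock;

		/* On little-endian p[0] is the locked byte; on big-endian
		 * it is p[3] -- hence the
		 * "+ 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN)" in the patch. */
		printf("p[0] = %#x, p[3] = %#x\n", p[0], p[3]);
		return 0;
	}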

We override some of the arch_spin_XXX functions, as powerpc has the io_sync
stuff which makes sure the I/O operations are protected by the lock correctly.

There is another special case, see commit
2c610022711 ("locking/qspinlock: Fix spin_unlock_wait() some more")

Signed-off-by: Pan Xinhui 
---
 arch/powerpc/include/asm/qspinlock.h  | 66 +++
 arch/powerpc/include/asm/spinlock.h   | 31 +--
 arch/powerpc/include/asm/spinlock_types.h |  4 ++
 arch/powerpc/lib/locks.c  | 59 +++
 4 files changed, 147 insertions(+), 13 deletions(-)
 create mode 100644 arch/powerpc/include/asm/qspinlock.h

diff --git a/arch/powerpc/include/asm/qspinlock.h b/arch/powerpc/include/asm/qspinlock.h
new file mode 100644
index 000..4c89256
--- /dev/null
+++ b/arch/powerpc/include/asm/qspinlock.h
@@ -0,0 +1,66 @@
+#ifndef _ASM_POWERPC_QSPINLOCK_H
+#define _ASM_POWERPC_QSPINLOCK_H
+
+#include 
+
+#define SPIN_THRESHOLD (1 << 15)
+#define queued_spin_unlock queued_spin_unlock
+#define queued_spin_is_locked queued_spin_is_locked
+#define queued_spin_unlock_wait queued_spin_unlock_wait
+
+extern void queued_spin_unlock_wait(struct qspinlock *lock);
+
+static inline u8 *__qspinlock_lock_byte(struct qspinlock *lock)
+{
+   return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
+}
+
+static inline void queued_spin_unlock(struct qspinlock *lock)
+{
+   /* release semantics is required */
+   smp_store_release(__qspinlock_lock_byte(lock), 0);
+}
+
+static inline int queued_spin_is_locked(struct qspinlock *lock)
+{
+   smp_mb();
+   return atomic_read(&lock->val);
+}
+
+#include 
+
+/* we need override it as ppc has io_sync stuff */
+#undef arch_spin_trylock
+#undef arch_spin_lock
+#undef arch_spin_lock_flags
+#undef arch_spin_unlock
+#define arch_spin_trylock arch_spin_trylock
+#define arch_spin_lock arch_spin_lock
+#define arch_spin_lock_flags arch_spin_lock_flags
+#define arch_spin_unlock arch_spin_unlock
+
+static inline int arch_spin_trylock(arch_spinlock_t *lock)
+{
+   CLEAR_IO_SYNC;
+   return queued_spin_trylock(lock);
+}
+
+static inline void arch_spin_lock(arch_spinlock_t *lock)
+{
+   CLEAR_IO_SYNC;
+   queued_spin_lock(lock);
+}
+
+static inline
+void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags)
+{
+   CLEAR_IO_SYNC;
+   queued_spin_lock(lock);
+}
+
+static inline void arch_spin_unlock(arch_spinlock_t *lock)
+{
+   SYNC_IO;
+   queued_spin_unlock(lock);
+}
+#endif /* _ASM_POWERPC_QSPINLOCK_H */
diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index 8c1b913..954099e 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -60,6 +60,23 @@ static inline bool vcpu_is_preempted(int cpu)
 }
 #endif

+#if defined(CONFIG_PPC_SPLPAR)
+/* We only yield to the hypervisor if we are in shared processor mode */
+#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
+extern void __spin_yield(arch_spinlock_t *lock);
+extern void __rw_yield(arch_rwlock_t *lock);
+#else /* SPLPAR */
+#define __spin_yield(x)barrier()
+#define __rw_yield(x)  barrier()
+#define SHARED_PROCESSOR   0
+#endif
+
+#ifdef CONFIG_QUEUED_SPINLOCKS
+#include 
+#else
+
+#define arch_spin_relax(lock)  __spin_yield(lock)
+
 static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 {
return lock.slock == 0;
@@ -114,18 +131,6 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
  * held.  Conveniently, we have a word in the paca that holds this
  * value.
  */
-
-#if defined(CONFIG_PPC_SPLPAR)
-/* We only yield to the hypervisor if we are in shared processor mode */
-#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
-extern void __spin_yield(arch_spinlock_t *lock);
-extern void __rw_yield(arch_rwlock_t *lock);
-#else /* SPLPAR */
-#define __spin_yield(x)barrier()
-#define __rw_yield(x)  barrier()
-#define SHARED_PROCESSOR   0
-#endif
-
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
CLEAR_IO_SYNC;
@@ -203,6 +208,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
smp_mb();
 }

+#endif /* !CONFIG_QUEUED_SPINLOCKS */
 /*
  * Read-write spinlocks, allowing multiple readers
  * but only one writer.
@@ -338,7 +344,6 @@ static inline void arch_write_unlock(arch_rwlock_t *rw)
 #define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
 #define arch_write_lock_flags(lock, flags) arch_write_lock(lock

Re: [PATCH v8 2/6] powerpc: pSeries/Kconfig: Add qspinlock build config

2016-12-05 Thread Boqun Feng
On Mon, Dec 05, 2016 at 10:19:22AM -0500, Pan Xinhui wrote:
> pSeries/powerNV will use qspinlock from now on.
> 
> Signed-off-by: Pan Xinhui 
> ---
>  arch/powerpc/platforms/pseries/Kconfig | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
> index bec90fb..8a87d06 100644
> --- a/arch/powerpc/platforms/pseries/Kconfig
> +++ b/arch/powerpc/platforms/pseries/Kconfig

Why here? Not arch/powerpc/platforms/Kconfig?

> @@ -23,6 +23,14 @@ config PPC_PSERIES
>   select PPC_DOORBELL
>   default y
>  
> +config ARCH_USE_QUEUED_SPINLOCKS
> + default y
> + bool "Enable qspinlock"

I think you just enable qspinlock by default for all PPC platforms. I
guess you need to put

depends on PPC_PSERIES || PPC_POWERNV

here to achieve what you mean in your commit message.

Regards,
Boqun

> +	help
> +	  Enabling this option will let the kernel use qspinlock, which is a kind
> +	  of fair lock.  It has shown a good performance improvement on x86 and
> +	  also on ppc, especially in high-contention cases.
> +
>  config PPC_SPLPAR
>   depends on PPC_PSERIES
>   bool "Support for shared-processor logical partitions"
> -- 
> 2.4.11
> 




Re: [PATCH v8 1/6] powerpc/qspinlock: powerpc support qspinlock

2016-12-05 Thread Boqun Feng
On Mon, Dec 05, 2016 at 10:19:21AM -0500, Pan Xinhui wrote:
> This patch adds basic code to enable qspinlock on powerpc. qspinlock is
> one kind of fair-lock implementation, and we have seen some performance
> improvement under some scenarios.
> 
> queued_spin_unlock() releases the lock by just one write of zero to the
> ::locked byte, which sits at a different place in the lock word on the two
> endiannesses.
> 
> We override some of the arch_spin_XXX functions, as powerpc has the io_sync
> stuff which makes sure the I/O operations are protected by the lock correctly.
> 
> There is another special case, see commit
> 2c610022711 ("locking/qspinlock: Fix spin_unlock_wait() some more")
> 
> Signed-off-by: Pan Xinhui 
> ---
>  arch/powerpc/include/asm/qspinlock.h  | 66 +++
>  arch/powerpc/include/asm/spinlock.h   | 31 +--
>  arch/powerpc/include/asm/spinlock_types.h |  4 ++
>  arch/powerpc/lib/locks.c  | 59 +++
>  4 files changed, 147 insertions(+), 13 deletions(-)
>  create mode 100644 arch/powerpc/include/asm/qspinlock.h
> 
> diff --git a/arch/powerpc/include/asm/qspinlock.h b/arch/powerpc/include/asm/qspinlock.h
> new file mode 100644
> index 000..4c89256
> --- /dev/null
> +++ b/arch/powerpc/include/asm/qspinlock.h
> @@ -0,0 +1,66 @@
> +#ifndef _ASM_POWERPC_QSPINLOCK_H
> +#define _ASM_POWERPC_QSPINLOCK_H
> +
> +#include 
> +
> +#define SPIN_THRESHOLD (1 << 15)
> +#define queued_spin_unlock queued_spin_unlock
> +#define queued_spin_is_locked queued_spin_is_locked
> +#define queued_spin_unlock_wait queued_spin_unlock_wait
> +
> +extern void queued_spin_unlock_wait(struct qspinlock *lock);
> +
> +static inline u8 *__qspinlock_lock_byte(struct qspinlock *lock)
> +{
> + return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
> +}
> +
> +static inline void queued_spin_unlock(struct qspinlock *lock)
> +{
> + /* release semantics is required */
> + smp_store_release(__qspinlock_lock_byte(lock), 0);
> +}
> +
> +static inline int queued_spin_is_locked(struct qspinlock *lock)
> +{
> + smp_mb();
> + return atomic_read(&lock->val);
> +}
> +
> +#include 
> +
> +/* we need override it as ppc has io_sync stuff */
> +#undef arch_spin_trylock
> +#undef arch_spin_lock
> +#undef arch_spin_lock_flags
> +#undef arch_spin_unlock
> +#define arch_spin_trylock arch_spin_trylock
> +#define arch_spin_lock arch_spin_lock
> +#define arch_spin_lock_flags arch_spin_lock_flags
> +#define arch_spin_unlock arch_spin_unlock
> +
> +static inline int arch_spin_trylock(arch_spinlock_t *lock)
> +{
> + CLEAR_IO_SYNC;
> + return queued_spin_trylock(lock);
> +}
> +
> +static inline void arch_spin_lock(arch_spinlock_t *lock)
> +{
> + CLEAR_IO_SYNC;
> + queued_spin_lock(lock);
> +}
> +
> +static inline
> +void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags)
> +{
> + CLEAR_IO_SYNC;
> + queued_spin_lock(lock);
> +}
> +
> +static inline void arch_spin_unlock(arch_spinlock_t *lock)
> +{
> + SYNC_IO;
> + queued_spin_unlock(lock);
> +}
> +#endif /* _ASM_POWERPC_QSPINLOCK_H */
> diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
> index 8c1b913..954099e 100644
> --- a/arch/powerpc/include/asm/spinlock.h
> +++ b/arch/powerpc/include/asm/spinlock.h
> @@ -60,6 +60,23 @@ static inline bool vcpu_is_preempted(int cpu)
>  }
>  #endif
>  
> +#if defined(CONFIG_PPC_SPLPAR)
> +/* We only yield to the hypervisor if we are in shared processor mode */
> +#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
> +extern void __spin_yield(arch_spinlock_t *lock);
> +extern void __rw_yield(arch_rwlock_t *lock);
> +#else /* SPLPAR */
> +#define __spin_yield(x)barrier()
> +#define __rw_yield(x)  barrier()
> +#define SHARED_PROCESSOR   0
> +#endif
> +
> +#ifdef CONFIG_QUEUED_SPINLOCKS
> +#include 
> +#else
> +
> +#define arch_spin_relax(lock)  __spin_yield(lock)
> +
>  static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
>  {
>   return lock.slock == 0;
> @@ -114,18 +131,6 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
>   * held.  Conveniently, we have a word in the paca that holds this
>   * value.
>   */
> -
> -#if defined(CONFIG_PPC_SPLPAR)
> -/* We only yield to the hypervisor if we are in shared processor mode */
> -#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
> -extern void __spin_yield(arch_spinlock_t *lock);
> -extern void __rw_yield(arch_rwlock_t *lock);
> -#else /* SPLPAR */
> -#define __spin_yield(x)  barrier()
> -#define __rw_yield(x)barrier()
> -#define SHARED_PROCESSOR 0
> -#endif
> -
>  static inline void arch_spin_lock(arch_spinlock_t *lock)
>  {
>   CLEAR_IO_SYNC;
> @@ -203,6 +208,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
>   smp_mb();
>  }
>  
> +#endif /* !CONFIG_QUEUED_SPINLOCKS */
>  /*
>   * Read-write spinlocks, allowing multiple readers
>   * but o

[GIT PULL 00/20] perf/core improvements and fixes

2016-12-05 Thread Arnaldo Carvalho de Melo
Hi Ingo,

Please consider pulling,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit e7af7b15121ca08c31a0ab9df71a41b4c53365b4:

  Merge tag 'perf-core-for-mingo-20161201' of 
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core 
(2016-12-02 10:08:03 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git 
tags/perf-core-for-mingo-20161205

for you to fetch changes up to bec60e50af83741cde1786ab475d4bf472aed6f9:

  perf annotate: Show raw form for jump instruction with indirect target 
(2016-12-05 17:21:57 -0300)


perf/core improvements and fixes:

Fixes:

- Do not show a bogus target address in 'perf annotate' for targetless powerpc
  jump instructions such as 'bctr' (Ravi Bangoria)

- tools/build fixes related to race conditions with the fixdep utility (Jiri Olsa)

- Fix building objtool with clang (Peter Foley)

Infrastructure:

- Support linking perf with clang and LLVM libraries, initially statically, but
  this limitation will be lifted and shared libraries, when available, will
  be preferred to the static build, that should, as with other features, be
  enabled explicitly (Wang Nan)

Signed-off-by: Arnaldo Carvalho de Melo 


Jiri Olsa (7):
  tools build: Make fixdep parsing wait for last target
  tools build: Make the .cmd file more readable
  tools build: Move tabs to spaces where suitable
  perf tools: Move install-gtk target into rules area
  perf tools: Move python/perf.so target into rules area
  perf tools: Cleanup build directory before each test
  perf tools: Add non config targets

Peter Foley (1):
  tools build: Fix objtool build with clang

Ravi Bangoria (1):
  perf annotate: Show raw form for jump instruction with indirect target

Wang Nan (11):
  perf tools: Pass context to perf hook functions
  perf llvm: Extract helpers in llvm-utils.c
  tools build: Add feature detection for LLVM
  tools build: Add feature detection for clang
  perf build: Add clang and llvm compile and linking support
  perf clang: Add builtin clang support ant test case
  perf clang: Use real file system for #include
  perf clang: Allow passing CFLAGS to builtin clang
  perf clang: Update test case to use real BPF script
  perf clang: Support compile IR to BPF object and add testcase
  perf clang: Compile BPF script using builtin clang support

 tools/build/Build.include  |  20 ++--
 tools/build/Makefile.feature   | 138 +-
 tools/build/feature/Makefile   | 120 +--
 tools/build/feature/test-clang.cpp |  21 
 tools/build/feature/test-llvm.cpp  |   8 ++
 tools/build/fixdep.c   |   5 +-
 tools/perf/Makefile.config |  62 +---
 tools/perf/Makefile.perf   |  56 +++
 tools/perf/tests/Build |   1 +
 tools/perf/tests/builtin-test.c|   9 ++
 tools/perf/tests/clang.c   |  46 +
 tools/perf/tests/llvm.h|   7 ++
 tools/perf/tests/make  |   4 +-
 tools/perf/tests/perf-hooks.c  |  14 ++-
 tools/perf/tests/tests.h   |   3 +
 tools/perf/util/Build  |   2 +
 tools/perf/util/annotate.c |   3 +
 tools/perf/util/bpf-loader.c   |  19 +++-
 tools/perf/util/c++/Build  |   2 +
 tools/perf/util/c++/clang-c.h  |  43 
 tools/perf/util/c++/clang-test.cpp |  62 
 tools/perf/util/c++/clang.cpp  | 195 +
 tools/perf/util/c++/clang.h|  26 +
 tools/perf/util/llvm-utils.c   |  76 +++
 tools/perf/util/llvm-utils.h   |   6 ++
 tools/perf/util/perf-hooks.c   |  10 +-
 tools/perf/util/perf-hooks.h   |   6 +-
 tools/perf/util/util-cxx.h |  26 +
 28 files changed, 795 insertions(+), 195 deletions(-)
 create mode 100644 tools/build/feature/test-clang.cpp
 create mode 100644 tools/build/feature/test-llvm.cpp
 create mode 100644 tools/perf/tests/clang.c
 create mode 100644 tools/perf/util/c++/Build
 create mode 100644 tools/perf/util/c++/clang-c.h
 create mode 100644 tools/perf/util/c++/clang-test.cpp
 create mode 100644 tools/perf/util/c++/clang.cpp
 create mode 100644 tools/perf/util/c++/clang.h
 create mode 100644 tools/perf/util/util-cxx.h

  # uname -a
  Linux jouet 4.8.8-300.fc25.x86_64 #1 SMP Tue Nov 15 18:10:06 UTC 2016 x86_64 
x86_64 x86_64 GNU/Linux
  # perf test
   1: vmlinux symtab matches kallsyms: Ok
   2: Detect openat syscall event: Ok
   3: Detect openat syscall event on all cpus: Ok
   4: Read samples using the mmap interface  : Ok
   5: Parse event definition strings : Ok
   6: PERF_RECORD_* events & pe

[PATCH 20/20] perf annotate: Show raw form for jump instruction with indirect target

2016-12-05 Thread Arnaldo Carvalho de Melo
From: Ravi Bangoria 

For jump instructions that do not include the target address as a direct operand,
show the original disassembled line for them. This is needed for certain
powerpc jump instructions that use a target address in a register (such as bctr,
btar, ...).

Before:
 ld r12,32088(r12)
 mtctr  r12
  v  bctr   ca2c
 std    r2,24(r1)
 addis  r12,r2,-1

After:
 ld r12,32088(r12)
 mtctr  r12
  v  bctr
 std    r2,24(r1)
 addis  r12,r2,-1

Committer notes:

Testing it using a perf.data file and vmlinux for powerpc64,
cross-annotating it on a x86_64 workstation:

Before:

  .__bpf_prog_run  vmlinux.powerpc
 │std    r10,512(r9)  ▒
 │lbz    r9,0(r31)    ▒
 │rldicr r9,r9,3,60   ▒
 │ldx    r9,r30,r9    ▒
 │mtctr  r9           ▒
  100.00 │  ↓ bctr   3fe01510 ▒
 │lwa    r10,4(r31)   ▒
 │lwz    r9,0(r31)    ▒
  
  Invalid jump offset: 3fe01510

After:

  .__bpf_prog_run  vmlinux.powerpc
 │std    r10,512(r9)  ▒
 │lbz    r9,0(r31)    ▒
 │rldicr r9,r9,3,60   ▒
 │ldx    r9,r30,r9    ▒
 │mtctr  r9           ▒
  100.00 │  ↓ bctr            ▒
 │lwa    r10,4(r31)   ▒
 │lwz    r9,0(r31)    ▒
  
  Invalid jump offset: 3fe01510

This, in turn, uncovers another problem with jumps without operands, the
ENTER/-> operation, to jump to the target, still continues using the bogus
target :-)

BTW, this was the file used for the above tests:

  [acme@jouet ravi_bangoria]$ perf report --header-only -i 
perf.data.f22vm.powerdev
  # 
  # captured on: Thu Nov 24 12:40:38 2016
  # hostname : pdev-f22-qemu
  # os release : 4.4.10-200.fc22.ppc64
  # perf version : 4.9.rc1.g6298ce
  # arch : ppc64
  # nrcpus online : 48
  # nrcpus avail : 48
  # cpudesc : POWER7 (architected), altivec supported
  # cpuid : 74,513
  # total memory : 4158976 kB
  # cmdline : /home/ravi/Workspace/linux/tools/perf/perf record -a
  # event : name = cycles:ppp, , size = 112, { sample_period, sample_freq } = 
4000, sample_type = IP|TID|TIME|CPU|PERIOD, disabled = 1, inherit = 1, mmap = 
1, c
  # HEADER_CPU_TOPOLOGY info available, use -I to display
  # HEADER_NUMA_TOPOLOGY info available, use -I to display
  # pmu mappings: cpu = 4, software = 1, tracepoint = 2, breakpoint = 5
  # missing features: HEADER_TRACING_DATA HEADER_BRANCH_STACK HEADER_GROUP_DESC 
HEADER_AUXTRACE HEADER_STAT HEADER_CACHE
  # 
  #
  [acme@jouet ravi_bangoria]$

Suggested-by: Michael Ellerman 
Signed-off-by: Ravi Bangoria 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Chris Riyder 
Cc: Kim Phillips 
Cc: Markus Trippelsdorf 
Cc: Masami Hiramatsu 
Cc: Naveen N. Rao 
Cc: Peter Zijlstra 
Cc: Taeung Song 
Cc: linuxppc-dev@lists.ozlabs.org
Link: 
http://lkml.kernel.org/r/1480953407-7605-1-git-send-email-ravi.bango...@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/annotate.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 4012b1de2813..ea7e0de4b9c1 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -237,6 +237,9 @@ static int jump__parse(struct arch *arch __maybe_unused, 
struct ins_operands *op
 static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
   struct ins_operands *ops)
 {
+   if (!ops->target.addr)
+   return ins__raw_scnprintf(ins, bf, size, ops);
+
return scnprintf(bf, size, "%-6.6s %" PRIx64, ins->name, 
ops->target.offset);
 }
 
-- 
2.9.3



Re: [PATCH v8 1/3] perf annotate: Show raw form for jump instruction with indirect target

2016-12-05 Thread Arnaldo Carvalho de Melo
Em Mon, Dec 05, 2016 at 05:21:42PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Dec 05, 2016 at 09:26:45PM +0530, Ravi Bangoria escreveu:
> > For jump instructions that do not include the target address as a direct
> > operand, show the original disassembled line for them. This is needed
> > for certain powerpc jump instructions that use a target address in a
> > register (such as bctr, btar, ...).
> 
> Found it, .__bpf_prog_run, that is present in that perf.data file you
> sent me, has it, will use it in my committer notes for this patch.

So, I've added these committer notes while testing it, will continue
processing your patches later/tomorrow, thanks!

Committer notes:

Testing it using a perf.data file and vmlinux for powerpc64,
cross-annotating it on a x86_64 workstation:

Before:

  .__bpf_prog_run  vmlinux.powerpc
 │std    r10,512(r9)  ▒
 │lbz    r9,0(r31)    ▒
 │rldicr r9,r9,3,60   ▒
 │ldx    r9,r30,r9    ▒
 │mtctr  r9           ▒
  100.00 │  ↓ bctr   3fe01510 ▒
 │lwa    r10,4(r31)   ▒
 │lwz    r9,0(r31)    ▒
  
  Invalid jump offset: 3fe01510

After:

  .__bpf_prog_run  vmlinux.powerpc
 │std    r10,512(r9)  ▒
 │lbz    r9,0(r31)    ▒
 │rldicr r9,r9,3,60   ▒
 │ldx    r9,r30,r9    ▒
 │mtctr  r9           ▒
  100.00 │  ↓ bctr            ▒
 │lwa    r10,4(r31)   ▒
 │lwz    r9,0(r31)    ▒
  
  Invalid jump offset: 3fe01510

This, in turn, uncovers another problem with jumps without operands, the
ENTER/-> operation, to jump to the target, still continues using the bogus
target :-)

BTW, this was the file used for the above tests:

  [acme@jouet ravi_bangoria]$ perf report --header-only -i 
perf.data.f22vm.powerdev
  # 
  # captured on: Thu Nov 24 12:40:38 2016
  # hostname : pdev-f22-qemu
  # os release : 4.4.10-200.fc22.ppc64
  # perf version : 4.9.rc1.g6298ce
  # arch : ppc64
  # nrcpus online : 48
  # nrcpus avail : 48
  # cpudesc : POWER7 (architected), altivec supported
  # cpuid : 74,513
  # total memory : 4158976 kB
  # cmdline : /home/ravi/Workspace/linux/tools/perf/perf record -a
  # event : name = cycles:ppp, , size = 112, { sample_period, sample_freq } 
= 4000, sample_type = IP|TID|TIME|CPU|PERIOD, disabled = 1, inherit = 1, mmap = 
1, c
  # HEADER_CPU_TOPOLOGY info available, use -I to display
  # HEADER_NUMA_TOPOLOGY info available, use -I to display
  # pmu mappings: cpu = 4, software = 1, tracepoint = 2, breakpoint = 5
  # missing features: HEADER_TRACING_DATA HEADER_BRANCH_STACK 
HEADER_GROUP_DESC HEADER_AUXTRACE HEADER_STAT HEADER_CACHE
  # 
  #
  [acme@jouet ravi_bangoria]$

Suggested-by: Michael Ellerman 
Signed-off-by: Ravi Bangoria 
Tested-by: Arnaldo Carvalho de Melo 
 
> - Arnaldo
> 
> > 
> > Before:
> >  ld r12,32088(r12)
> >  mtctr  r12
> >   v  bctr   ca2c
> >  std    r2,24(r1)
> >  addis  r12,r2,-1
> > 
> > After:
> >  ld r12,32088(r12)
> >  mtctr  r12
> >   v  bctr
> >  std    r2,24(r1)
> >  addis  r12,r2,-1
> > 
> > Suggested-by: Michael Ellerman 
> > Signed-off-by: Ravi Bangoria 
> > ---
> > Changes in v8:
> >   - v7: https://lkml.org/lkml/2016/9/21/436
> >   - Rebase to acme/perf/core
> >   - No logical changes. (Cross arch annotate patches are in. This patch
> > is for hardening annotate for powerpc.)
> > 
> >  tools/perf/util/annotate.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> > index 4012b1d..ea7e0de 100644
> > --- a/tools/perf/util/annotate.c
> > +++ b/tools/perf/util/annotate.c
> > @@ -237,6 +237,9 @@ static int jump__parse(struct arch *arch 
> > __maybe_unused, struct ins_operands *op
> >  static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
> >struct ins_operands *ops)
> >  {
> > +   if (!ops->target.addr)
> > +   return ins__raw_scnprintf(ins, bf, size, ops);
> > +
> > return scnprintf(bf, size, "%-6.6s %" PRIx64, ins->name, 
> > ops->target.offset);
> >  }
> >  
> > -- 
> > 2.4.11


Re: [PATCH v8 1/3] perf annotate: Show raw form for jump instruction with indirect target

2016-12-05 Thread Arnaldo Carvalho de Melo
Em Mon, Dec 05, 2016 at 09:26:45PM +0530, Ravi Bangoria escreveu:
> For jump instructions that do not include the target address as a direct
> operand, show the original disassembled line for them. This is needed
> for certain powerpc jump instructions that use a target address in a
> register (such as bctr, btar, ...).

Found it, .__bpf_prog_run, that is present in that perf.data file you
sent me, has it, will use it in my committer notes for this patch.

- Arnaldo

> 
> Before:
>  ld r12,32088(r12)
>  mtctr  r12
>   v  bctr   ca2c
>  std    r2,24(r1)
>  addis  r12,r2,-1
> 
> After:
>  ld r12,32088(r12)
>  mtctr  r12
>   v  bctr
>  std    r2,24(r1)
>  addis  r12,r2,-1
> 
> Suggested-by: Michael Ellerman 
> Signed-off-by: Ravi Bangoria 
> ---
> Changes in v8:
>   - v7: https://lkml.org/lkml/2016/9/21/436
>   - Rebase to acme/perf/core
>   - No logical changes. (Cross arch annotate patches are in. This patch
> is for hardening annotate for powerpc.)
> 
>  tools/perf/util/annotate.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index 4012b1d..ea7e0de 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -237,6 +237,9 @@ static int jump__parse(struct arch *arch __maybe_unused, 
> struct ins_operands *op
>  static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
>  struct ins_operands *ops)
>  {
> + if (!ops->target.addr)
> + return ins__raw_scnprintf(ins, bf, size, ops);
> +
>   return scnprintf(bf, size, "%-6.6s %" PRIx64, ins->name, 
> ops->target.offset);
>  }
>  
> -- 
> 2.4.11


Re: [PATCH] PPC: sstep.c: Add modsw, moduw instruction emulation

2016-12-05 Thread PrasannaKumar Muralidharan
Hi Naveen,

Thanks for the review.

>> ---
>>  arch/powerpc/lib/sstep.c | 9 +
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
>> index 9c78a9c..5acef72 100644
>> --- a/arch/powerpc/lib/sstep.c
>> +++ b/arch/powerpc/lib/sstep.c
>> @@ -1148,6 +1148,15 @@ int __kprobes analyse_instr(struct instruction_op 
>> *op, struct pt_regs *regs,
>>   (int) regs->gpr[rb];
>>   goto arith_done;
>>
>> + case 779:   /* modsw */
>> + regs->gpr[rd] = (int) regs->gpr[ra] %
>> + (int) regs->gpr[rb];
>> + goto arith_done;
>
> Since these instructions don't update CR, you can directly goto
> instr_done.

Sure. Will use that.

>> +
>> + case 267:   /* moduw */
>
> Please move this case further up so that the extended opcodes are in
> numerical order.

I placed it after the divide instructions as that appeared logical. I also
placed 267 below 779 as that is the order in which the instructions are
documented in the ISA book, which may help in finding related instructions
together. If this style is not preferred I can arrange them in numerical
order. (A small sketch of the two opcodes' semantics follows below.)

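For context, here is a small user-space C sketch (illustrative only, not
kernel code) of the signed vs. unsigned 32-bit modulo semantics the two
opcodes emulate:

    #include <stdio.h>

    int main(void)
    {
        unsigned long ra = 0xfffffffb, rb = 3;  /* ra is -5 as a signed int */

        /* modsw: signed modulo, -5 % 3 == -2 */
        printf("modsw: %d\n", (int)ra % (int)rb);
        /* moduw: unsigned modulo, 4294967291 % 3 == 2 */
        printf("moduw: %u\n", (unsigned int)ra % (unsigned int)rb);
        return 0;
    }
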
Regards,
PrasannaKumar


Re: [PATCH v8 1/3] perf annotate: Show raw form for jump instruction with indirect target

2016-12-05 Thread Arnaldo Carvalho de Melo
Em Mon, Dec 05, 2016 at 09:26:45PM +0530, Ravi Bangoria escreveu:
> For jump instructions that do not include the target address as a direct
> operand, show the original disassembled line for them. This is needed
> for certain powerpc jump instructions that use a target address in a
> register (such as bctr, btar, ...).

Please, mention the name of the function you copied the annotated
examples from, so that I can reproduce it here using the files you
provided (perf.data and vmlinux for powerpc).

Searching one such function now...
 
> Before:
>  ld r12,32088(r12)
>  mtctr  r12
>   v  bctr   ca2c
>  std    r2,24(r1)
>  addis  r12,r2,-1
> 
> After:
>  ld r12,32088(r12)
>  mtctr  r12
>   v  bctr
>  std    r2,24(r1)
>  addis  r12,r2,-1
> 
> Suggested-by: Michael Ellerman 
> Signed-off-by: Ravi Bangoria 
> ---
> Changes in v8:
>   - v7: https://lkml.org/lkml/2016/9/21/436
>   - Rebase to acme/perf/core
>   - No logical changes. (Cross arch annotate patches are in. This patch
> is for hardening annotate for powerpc.)
> 
>  tools/perf/util/annotate.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index 4012b1d..ea7e0de 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -237,6 +237,9 @@ static int jump__parse(struct arch *arch __maybe_unused, 
> struct ins_operands *op
>  static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
>  struct ins_operands *ops)
>  {
> + if (!ops->target.addr)
> + return ins__raw_scnprintf(ins, bf, size, ops);
> +
>   return scnprintf(bf, size, "%-6.6s %" PRIx64, ins->name, 
> ops->target.offset);
>  }
>  
> -- 
> 2.4.11


Re: [PATCH v2] of/irq: improve error report on irq discovery process failure

2016-12-05 Thread Guilherme G. Piccoli
On 12/05/2016 12:28 PM, Rob Herring wrote:
> On Mon, Dec 5, 2016 at 7:59 AM, Guilherme G. Piccoli
>  wrote:
>> On PowerPC machines some PCI slots might not have level triggered
>> interrupts capability (also known as level signaled interrupts),
>> leading of_irq_parse_pci() to complain by presenting error messages
>> on the kernel log - in this case, the properties "interrupt-map" and
>> "interrupt-map-mask" are not present on device's node in the device
>> tree.
>>
>> This patch introduces a different message for this specific case,
>> and also reduces its level from error to warning. Besides, we warn
>> (once) that possibly some PCI slots on the system have no level
>> triggered interrupts available.
>> We changed some error return codes too on function of_irq_parse_raw()
>> in order other failure's cases can be presented in a more precise way.
>>
>> Before this patch, when an adapter was plugged in a slot without level
>> interrupts capability on PowerPC, we saw a generic error message
>> like this:
>>
>> [54.239] pci 002d:70:00.0: of_irq_parse_pci() failed with rc=-22
>>
>> Now, with this applied, we see the following specific message:
>>
>> [16.154] pci 0014:60:00.1: of_irq_parse_pci: no interrupt-map found,
>> INTx interrupts not available
>>
>> Finally, we standardize the error path in of_irq_parse_raw() by always
>> taking the fail path instead of returning directly from the loop.
>>
>> Signed-off-by: Guilherme G. Piccoli 
>> ---
>>
>> v2:
>>   * Changed function return code to always return negative values;
> 
> Are you sure this is safe? This is tricky because of differing values
> of NO_IRQ (0 or -1).

Thanks Rob, but this is purely bad wording on my part. I'm sorry - I
meant to say that I changed only my positive return code (which was
suggested to be removed in the prior revision) to a negative return code!

So, I changed only the code I added myself in v1 =)


> 
>>   * Improved/simplified warning outputs;
>>   * Changed some return codes and some error paths in of_irq_parse_raw()
>> in order to be more precise/consistent;
> 
> This too could have some side effects on callers.
> 
> Not saying don't do these changes, just need some assurances this has
> been considered.

Thanks for your attention. I performed a quick investigation before
changing this: all the places that use the return values just extract
"true/false" information from them, meaning they basically just compare
to 0. So changing -EINVAL to -ENOENT wouldn't hurt any user of these
return values; it only becomes more informative IMHO.

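For illustration, the caller pattern I'm describing looks like this (a
generic sketch, not a specific call site):

    rc = of_irq_parse_raw(addr, &out_irq);
    if (rc)         /* callers only branch on zero vs. non-zero, so   */
        return rc;  /* -EINVAL vs. -ENOENT takes the same path here   */
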
Now, regarding the only error path that was changed: for some reason,
this was the only place in which we didn't goto fail label in case of
failure - it was added by a legacy commit from Ben, dated from 2006:
006b64de60 ("[POWERPC] Make OF irq map code detect more error cases").
Then it was carried by Grant Likely's commit 7dc2e1134a ("of/irq: merge
irq mapping code"), 6-year old commit.
I wasn't able to imagine a scenario in which changing this would break
something; I believe the change improves consistency, but I'd remove it
if you or somebody else thinks it should be removed.

Cheers,

Guilherme


> 
> Rob
> 



Re: [PATCH 1/1] serial/uuc_uart: Set shutdown timeout to CONFIG_HZ independent 2ms

2016-12-05 Thread Timur Tabi

Alexander Stein wrote:

-   schedule_timeout(2);
+   schedule_timeout(msecs_to_jiffies(2));


NACK.

So I don't remember why I wrote this code, but I don't think I was 
expecting it to be 2ms.  Instead, I think I just wanted it to be some 
delay, but I believed that schedule_timeout(1) was too short or would be 
"optimized" out somehow.


Note that right below this, I do:

if (qe_port->wait_closing) {
/* Wait a bit longer */
set_current_state(TASK_UNINTERRUPTIBLE);
schedule_timeout(qe_port->wait_closing);
}

And wait_closing is a number of jiffies, so I knew that 
schedule_timeout() took jiffies as a parameter.


So I think I'm going to NACK this patch, since I believe I knew what I 
was doing when I wrote it five years ago.


[PATCH 1/1] serial/uuc_uart: Set shutdown timeout to CONFIG_HZ independent 2ms

2016-12-05 Thread Alexander Stein
schedule_timeout() takes a timeout with jiffies resolution, so pass the
2ms converted to a jiffies value. This makes the timeout independent of
CONFIG_HZ.

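For illustration, assuming CONFIG_HZ=250 (a 4 ms tick), the difference is:

    schedule_timeout(2);                    /* 2 jiffies = 8 ms */
    schedule_timeout(msecs_to_jiffies(2));  /* 1 jiffy  = 4 ms (2 ms rounded up) */

With CONFIG_HZ=1000 both forms sleep 2 ms; with CONFIG_HZ=100 the old form
sleeps 20 ms while the converted form sleeps one 10 ms tick.
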
Signed-off-by: Alexander Stein 
---
 drivers/tty/serial/ucc_uart.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/serial/ucc_uart.c b/drivers/tty/serial/ucc_uart.c
index 481eb29..c6c01a4 100644
--- a/drivers/tty/serial/ucc_uart.c
+++ b/drivers/tty/serial/ucc_uart.c
@@ -827,7 +827,7 @@ static void qe_uart_shutdown(struct uart_port *port)
break;
}
set_current_state(TASK_UNINTERRUPTIBLE);
-   schedule_timeout(2);
+   schedule_timeout(msecs_to_jiffies(2));
}
 
if (qe_port->wait_closing) {
-- 
2.7.3



[PATCH v8 3/3] perf annotate: Fix jump target outside of function address range

2016-12-05 Thread Ravi Bangoria
If the jump target is outside of the function range, perf is not handling it
correctly. In particular, when the target address is less than the function
start address, the target offset will be negative. But the target offset is
declared to be unsigned, which converts the negative number into its 2's
complement. See the example below. Here the target of the 'jmpq' instruction
at 34cf8 is 34ac0, which is less than the function start address (34cf0).

34ac0 - 34cf0 = -0x230 = 0xfffffffffffffdd0

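The wraparound is easy to reproduce with a minimal user-space C sketch
(illustrative only, not part of the patch):

    #include <stdio.h>
    #include <inttypes.h>

    int main(void)
    {
        uint64_t start = 0x34cf0, target = 0x34ac0;
        uint64_t offset = target - start;   /* wraps: 2's complement */

        printf("unsigned: %#" PRIx64 "\n", offset);        /* 0xfffffffffffffdd0 */
        printf("signed:   %" PRId64 "\n", (int64_t)offset); /* -560 == -0x230 */
        return 0;
    }
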
Objdump output:

  00034cf0 <__sigaction>:
  __GI___sigaction():
34cf0: lea-0x20(%rdi),%eax
34cf3: cmp-bashx1,%eax
34cf6: jbe34d00 <__sigaction+0x10>
34cf8: jmpq   34ac0 <__GI___libc_sigaction>
34cfd: nopl   (%rax)
34d00: mov0x386161(%rip),%rax# 3bae68 <_DYNAMIC+0x2e8>
34d07: movl   -bashx16,%fs:(%rax)
34d0e: mov-bashx,%eax
34d13: retq

perf annotate before applying patch:

  __GI___sigaction  /usr/lib64/libc-2.22.so
       lea    -0x20(%rdi),%eax
       cmp    $0x1,%eax
    v  jbe    10
    v  jmpq   fffffffffffffdd0
       nop
 10:   mov    _DYNAMIC+0x2e8,%rax
       movl   $0x16,%fs:(%rax)
       mov    $0xffffffff,%eax
   retq

perf annotate after applying patch:

  __GI___sigaction  /usr/lib64/libc-2.22.so
       lea    -0x20(%rdi),%eax
       cmp    $0x1,%eax
    v  jbe    10
    ^  jmpq   34ac0 <__GI___libc_sigaction>
       nop
 10:   mov    _DYNAMIC+0x2e8,%rax
       movl   $0x16,%fs:(%rax)
       mov    $0xffffffff,%eax
   retq

Signed-off-by: Ravi Bangoria 
---
Changes in v8:
  - v7: https://lkml.org/lkml/2016/9/21/436
  - Rebased to acme/perf/core.
  - No logical changes. (Cross arch annotate patches are in. This patch
is for hardening annotate.)

tools/perf/ui/browsers/annotate.c |  5 +++--
 tools/perf/util/annotate.c| 14 +-
 tools/perf/util/annotate.h|  5 +++--
 3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c 
b/tools/perf/ui/browsers/annotate.c
index ec7a30f..ba36aac 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -215,7 +215,7 @@ static void annotate_browser__write(struct ui_browser 
*browser, void *entry, int
ui_browser__set_color(browser, color);
if (dl->ins.ops && dl->ins.ops->scnprintf) {
if (ins__is_jump(&dl->ins)) {
-   bool fwd = dl->ops.target.offset > 
(u64)dl->offset;
+   bool fwd = dl->ops.target.offset > dl->offset;
 
ui_browser__write_graph(browser, fwd ? 
SLSMG_DARROW_CHAR :

SLSMG_UARROW_CHAR);
@@ -245,7 +245,8 @@ static bool disasm_line__is_valid_jump(struct disasm_line 
*dl, struct symbol *sy
 {
if (!dl || !dl->ins.ops || !ins__is_jump(&dl->ins)
|| !disasm_line__has_offset(dl)
-   || dl->ops.target.offset >= symbol__size(sym))
+   || dl->ops.target.offset < 0
+   || dl->ops.target.offset >= (s64)symbol__size(sym))
return false;
 
return true;
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 590244e..c81a395 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -230,10 +230,12 @@ static int jump__parse(struct arch *arch __maybe_unused, 
struct ins_operands *op
else
ops->target.addr = strtoull(ops->raw, NULL, 16);
 
-   if (s++ != NULL)
+   if (s++ != NULL) {
ops->target.offset = strtoull(s, NULL, 16);
-   else
-   ops->target.offset = UINT64_MAX;
+   ops->target.offset_avail = true;
+   } else {
+   ops->target.offset_avail = false;
+   }
 
return 0;
 }
@@ -241,7 +243,7 @@ static int jump__parse(struct arch *arch __maybe_unused, 
struct ins_operands *op
 static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
   struct ins_operands *ops)
 {
-   if (!ops->target.addr)
+   if (!ops->target.addr || ops->target.offset < 0)
return ins__raw_scnprintf(ins, bf, size, ops);
 
return scnprintf(bf, size, "%-6.6s %" PRIx64, ins->name, 
ops->target.offset);
@@ -1209,9 +1211,11 @@ static int symbol__parse_objdump_line(struct symbol 
*sym, struct map *map,
if (dl == NULL)
return -1;
 
-   if (dl->ops.target.offset == UINT64_MAX)
+   if (!disasm_line__has_offset(dl)) {
dl->ops.target.offset = dl->ops.target.addr -
map__rip_2objdump(map, sym->start);
+   dl->ops.target.offset_avail = true;
+   }
 
/* kcore has no symbols, so add the call target name */
if (dl->ins.ops && ins__is_call(&dl->ins) && !dl->ops.target.name) {
diff --git a/tools

[PATCH v8 1/3] perf annotate: Show raw form for jump instruction with indirect target

2016-12-05 Thread Ravi Bangoria
For jump instructions that do not include the target address as a direct
operand, show the original disassembled line for them. This is needed
for certain powerpc jump instructions that use a target address in a
register (such as bctr, btar, ...).

Before:
 ld r12,32088(r12)
 mtctr  r12
  v  bctr   ca2c
 std    r2,24(r1)
 addis  r12,r2,-1

After:
 ld r12,32088(r12)
 mtctr  r12
  v  bctr
 std    r2,24(r1)
 addis  r12,r2,-1

Suggested-by: Michael Ellerman 
Signed-off-by: Ravi Bangoria 
---
Changes in v8:
  - v7: https://lkml.org/lkml/2016/9/21/436
  - Rebase to acme/perf/core
  - No logical changes. (Cross arch annotate patches are in. This patch
is for hardening annotate for powerpc.)

 tools/perf/util/annotate.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 4012b1d..ea7e0de 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -237,6 +237,9 @@ static int jump__parse(struct arch *arch __maybe_unused, 
struct ins_operands *op
 static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
   struct ins_operands *ops)
 {
+   if (!ops->target.addr)
+   return ins__raw_scnprintf(ins, bf, size, ops);
+
return scnprintf(bf, size, "%-6.6s %" PRIx64, ins->name, 
ops->target.offset);
 }
 
-- 
2.4.11



[PATCH v8 2/3] perf annotate: Support jump instruction with target as second operand

2016-12-05 Thread Ravi Bangoria
Arches like powerpc have jump instructions that include the target address
as the second operand. For example, 'bne  cr7,0xc00f6154'. Add
support for such instructions in perf annotate.

objdump o/p:
  c00f6140:   ld     r9,1032(r31)
  c00f6144:   cmpdi  cr7,r9,0
  c00f6148:   bne    cr7,0xc00f6154
  c00f614c:   ld     r9,2312(r30)
  c00f6150:   std    r9,1032(r31)
  c00f6154:   ld     r9,88(r31)

Corresponding perf annotate o/p:

Before patch:
 ld     r9,1032(r31)
 cmpdi  cr7,r9,0
  v  bne    3ff09f2c
 ld     r9,2312(r30)
 std    r9,1032(r31)
  74:   ld     r9,88(r31)

After patch:
 ld     r9,1032(r31)
 cmpdi  cr7,r9,0
  v  bne    74
 ld     r9,2312(r30)
 std    r9,1032(r31)
  74:   ld     r9,88(r31)

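The operand handling this patch adds can be sketched in user-space C as
follows (simplified and illustrative only; the real code lives in
jump__parse()):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        const char *raw = "cr7,0xc00f6154";  /* raw operands of 'bne' */
        const char *c = strchr(raw, ',');

        /* if a comma is present, the target is the second operand */
        unsigned long long addr = strtoull(c ? c + 1 : raw, NULL, 16);
        printf("target = %#llx\n", addr);
        return 0;
    }
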
Signed-off-by: Ravi Bangoria 
---
Changes in v8:
  - v7: https://lkml.org/lkml/2016/9/21/436
  - Rebase to acme/perf/core
  - Little change in patch description.
  - No logical changes. (Cross arch annotate patches are in. This patch
is for hardening annotate for powerpc.)

 tools/perf/util/annotate.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index ea7e0de..590244e 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -223,8 +223,12 @@ bool ins__is_call(const struct ins *ins)
 static int jump__parse(struct arch *arch __maybe_unused, struct ins_operands 
*ops, struct map *map __maybe_unused)
 {
const char *s = strchr(ops->raw, '+');
+   const char *c = strchr(ops->raw, ',');
 
-   ops->target.addr = strtoull(ops->raw, NULL, 16);
+   if (c++ != NULL)
+   ops->target.addr = strtoull(c, NULL, 16);
+   else
+   ops->target.addr = strtoull(ops->raw, NULL, 16);
 
if (s++ != NULL)
ops->target.offset = strtoull(s, NULL, 16);
-- 
2.4.11



Re: [PATCH v2] of/irq: improve error report on irq discovery process failure

2016-12-05 Thread Rob Herring
On Mon, Dec 5, 2016 at 7:59 AM, Guilherme G. Piccoli
 wrote:
> On PowerPC machines some PCI slots might not have level triggered
> interrupts capability (also known as level signaled interrupts),
> leading of_irq_parse_pci() to complain by presenting error messages
> on the kernel log - in this case, the properties "interrupt-map" and
> "interrupt-map-mask" are not present on device's node in the device
> tree.
>
> This patch introduces a different message for this specific case,
> and also reduces its level from error to warning. Besides, we warn
> (once) that possibly some PCI slots on the system have no level
> triggered interrupts available.
> We also changed some error return codes in of_irq_parse_raw()
> so that other failure cases can be presented in a more precise way.
>
> Before this patch, when an adapter was plugged in a slot without level
> interrupts capability on PowerPC, we saw a generic error message
> like this:
>
> [54.239] pci 002d:70:00.0: of_irq_parse_pci() failed with rc=-22
>
> Now, with this applied, we see the following specific message:
>
> [16.154] pci 0014:60:00.1: of_irq_parse_pci: no interrupt-map found,
> INTx interrupts not available
>
> Finally, we standardize the error path in of_irq_parse_raw() by always
> taking the fail path instead of returning directly from the loop.
>
> Signed-off-by: Guilherme G. Piccoli 
> ---
>
> v2:
>   * Changed function return code to always return negative values;

Are you sure this is safe? This is tricky because of differing values
of NO_IRQ (0 or -1).

>   * Improved/simplified warning outputs;
>   * Changed some return codes and some error paths in of_irq_parse_raw()
> in order to be more precise/consistent;

This too could have some side effects on callers.

Not saying don't do these changes, just need some assurances this has
been considered.

Rob


[PATCH v2] of/irq: improve error report on irq discovery process failure

2016-12-05 Thread Guilherme G. Piccoli
On PowerPC machines some PCI slots might not have level triggered
interrupts capability (also known as level signaled interrupts),
leading of_irq_parse_pci() to complain by presenting error messages
on the kernel log - in this case, the properties "interrupt-map" and
"interrupt-map-mask" are not present on device's node in the device
tree.

This patch introduces a different message for this specific case,
and also reduces its level from error to warning. Besides, we warn
(once) that possibly some PCI slots on the system have no level
triggered interrupts available.
We also changed some error return codes in of_irq_parse_raw()
so that other failure cases can be presented in a more precise way.

Before this patch, when an adapter was plugged in a slot without level
interrupts capability on PowerPC, we saw a generic error message
like this:

[54.239] pci 002d:70:00.0: of_irq_parse_pci() failed with rc=-22

Now, with this applied, we see the following specific message:

[16.154] pci 0014:60:00.1: of_irq_parse_pci: no interrupt-map found,
INTx interrupts not available

Finally, we standardize the error path in of_irq_parse_raw() by always
taking the fail path instead of returning directly from the loop.

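The resulting single-exit error-path shape can be sketched as a runnable
stand-alone condensation (hypothetical helpers, illustrative only):

    #include <stdio.h>
    #include <errno.h>

    static int parse(int args_ok, int in_range, int found_map)
    {
        int rc = -EINVAL;        /* default: malformed input */

        if (!args_ok)
            goto fail;
        if (!in_range) {
            rc = -EFAULT;
            goto fail;
        }
        if (found_map)
            return 0;
        rc = -ENOENT;            /* fell out of the loop: no interrupt-map */
    fail:
        return rc;               /* single exit for all failures */
    }

    int main(void)
    {
        /* prints: -22 -14 -2 0 */
        printf("%d %d %d %d\n", parse(0, 1, 1), parse(1, 0, 1),
               parse(1, 1, 0), parse(1, 1, 1));
        return 0;
    }
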
Signed-off-by: Guilherme G. Piccoli 
---

v2:
  * Changed function return code to always return negative values;
  * Improved/simplified warning outputs;
  * Changed some return codes and some error paths in of_irq_parse_raw()
in order to be more precise/consistent;

 drivers/of/irq.c| 19 ---
 drivers/of/of_pci_irq.c | 10 +-
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/of/irq.c b/drivers/of/irq.c
index 393fea8..9deee86 100644
--- a/drivers/of/irq.c
+++ b/drivers/of/irq.c
@@ -104,7 +104,7 @@ int of_irq_parse_raw(const __be32 *addr, struct 
of_phandle_args *out_irq)
const __be32 *match_array = initial_match_array;
const __be32 *tmp, *imap, *imask, dummy_imask[] = { [0 ... 
MAX_PHANDLE_ARGS] = ~0 };
u32 intsize = 1, addrsize, newintsize = 0, newaddrsize = 0;
-   int imaplen, match, i;
+   int imaplen, match, i, rc = -EINVAL;
 
 #ifdef DEBUG
of_print_phandle_args("of_irq_parse_raw: ", out_irq);
@@ -134,7 +134,7 @@ int of_irq_parse_raw(const __be32 *addr, struct 
of_phandle_args *out_irq)
pr_debug("of_irq_parse_raw: ipar=%s, size=%d\n", 
of_node_full_name(ipar), intsize);
 
if (out_irq->args_count != intsize)
-   return -EINVAL;
+   goto fail;
 
/* Look for this #address-cells. We have to implement the old linux
 * trick of looking for the parent here as some device-trees rely on it
@@ -153,8 +153,10 @@ int of_irq_parse_raw(const __be32 *addr, struct 
of_phandle_args *out_irq)
pr_debug(" -> addrsize=%d\n", addrsize);
 
/* Range check so that the temporary buffer doesn't overflow */
-   if (WARN_ON(addrsize + intsize > MAX_PHANDLE_ARGS))
+   if (WARN_ON(addrsize + intsize > MAX_PHANDLE_ARGS)) {
+   rc = -EFAULT;
goto fail;
+   }
 
/* Precalculate the match array - this simplifies match loop */
for (i = 0; i < addrsize; i++)
@@ -240,10 +242,11 @@ int of_irq_parse_raw(const __be32 *addr, struct 
of_phandle_args *out_irq)
newintsize, newaddrsize);
 
/* Check for malformed properties */
-   if (WARN_ON(newaddrsize + newintsize > 
MAX_PHANDLE_ARGS))
-   goto fail;
-   if (imaplen < (newaddrsize + newintsize))
+   if (WARN_ON(newaddrsize + newintsize > MAX_PHANDLE_ARGS)
+   || (imaplen < (newaddrsize + newintsize))) {
+   rc = -EFAULT;
goto fail;
+   }
 
imap += newaddrsize + newintsize;
imaplen -= newaddrsize + newintsize;
@@ -271,11 +274,13 @@ int of_irq_parse_raw(const __be32 *addr, struct 
of_phandle_args *out_irq)
ipar = newpar;
newpar = NULL;
}
+   rc = -ENOENT; /* No interrupt-map found */
+
  fail:
of_node_put(ipar);
of_node_put(newpar);
 
-   return -EINVAL;
+   return rc;
 }
 EXPORT_SYMBOL_GPL(of_irq_parse_raw);
 
diff --git a/drivers/of/of_pci_irq.c b/drivers/of/of_pci_irq.c
index 2306313..c175d9c 100644
--- a/drivers/of/of_pci_irq.c
+++ b/drivers/of/of_pci_irq.c
@@ -93,7 +93,15 @@ int of_irq_parse_pci(const struct pci_dev *pdev, struct 
of_phandle_args *out_irq
goto err;
return 0;
 err:
-   dev_err(&pdev->dev, "of_irq_parse_pci() failed with rc=%d\n", rc);
+   if (rc == -ENOENT) {
+   dev_warn(&pdev->dev,
+   "%s: no interrupt-map found, INTx interrupts not 
available\n",
+   __func__);
+   pr_warn_once("%s: possibly 

[PATCH] cxl: prevent read/write to AFU config space while AFU not configured

2016-12-05 Thread Andrew Donnellan
During EEH recovery, we deconfigure all AFUs whilst leaving the
corresponding vPHB and virtual PCI device in place.

If something attempts to interact with the AFU's PCI config space (e.g.
running lspci) after the AFU has been deconfigured and before it's
reconfigured, cxl_pcie_{read,write}_config() will read invalid values from
the deconfigured struct cxl_afu and proceed to Oops when they try to
dereference pointers that have been set to NULL during deconfiguration.

Add a rwsem to struct cxl_afu so we can prevent interaction with config
space while the AFU is deconfigured.

Reported-by: Pradipta Ghosh 
Suggested-by: Frederic Barrat 
Cc: sta...@vger.kernel.org # 4.4+
Signed-off-by: Andrew Donnellan 
---

Pradipta found this while doing testing for cxlflash. I've tested this 
patch and I'm satisfied that it solves the issue, but I've asked Pradipta 
to test it a bit further. 

---

 drivers/misc/cxl/cxl.h  |  2 ++
 drivers/misc/cxl/main.c |  3 ++-
 drivers/misc/cxl/pci.c  |  2 ++
 drivers/misc/cxl/vphb.c | 11 ++-
 4 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index a144073..379c463 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -418,6 +418,8 @@ struct cxl_afu {
struct dentry *debugfs;
struct mutex contexts_lock;
spinlock_t afu_cntl_lock;
+   /* Used to block access to AFU config space while deconfigured */
+   struct rw_semaphore configured_rwsem;
 
/* AFU error buffer fields and bin attribute for sysfs */
u64 eb_len, eb_offset;
diff --git a/drivers/misc/cxl/main.c b/drivers/misc/cxl/main.c
index 62e0dfb..2a6bf1d 100644
--- a/drivers/misc/cxl/main.c
+++ b/drivers/misc/cxl/main.c
@@ -268,7 +268,8 @@ struct cxl_afu *cxl_alloc_afu(struct cxl *adapter, int 
slice)
idr_init(&afu->contexts_idr);
mutex_init(&afu->contexts_lock);
spin_lock_init(&afu->afu_cntl_lock);
-
+   init_rwsem(&afu->configured_rwsem);
+   down_write(&afu->configured_rwsem);
afu->prefault_mode = CXL_PREFAULT_NONE;
afu->irqs_max = afu->adapter->user_irqs;
 
diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index c4d79b5d..c7b2121 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1129,6 +1129,7 @@ static int pci_configure_afu(struct cxl_afu *afu, struct 
cxl *adapter, struct pc
if ((rc = cxl_native_register_psl_irq(afu)))
goto err2;
 
+   up_write(&afu->configured_rwsem);
return 0;
 
 err2:
@@ -1141,6 +1142,7 @@ static int pci_configure_afu(struct cxl_afu *afu, struct 
cxl *adapter, struct pc
 
 static void pci_deconfigure_afu(struct cxl_afu *afu)
 {
+   down_write(&afu->configured_rwsem);
cxl_native_release_psl_irq(afu);
if (afu->adapter->native->sl_ops->release_serr_irq)
afu->adapter->native->sl_ops->release_serr_irq(afu);
diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c
index 3519ace..d79aba5 100644
--- a/drivers/misc/cxl/vphb.c
+++ b/drivers/misc/cxl/vphb.c
@@ -88,9 +88,16 @@ static int cxl_pcie_config_info(struct pci_bus *bus, 
unsigned int devfn,
return PCIBIOS_DEVICE_NOT_FOUND;
 
afu = (struct cxl_afu *)phb->private_data;
+
+   /* Grab a reader lock on afu. We rely on the caller to release this! */
+   if (!down_read_trylock(&afu->configured_rwsem))
+   return PCIBIOS_DEVICE_NOT_FOUND;
+
record = cxl_pcie_cfg_record(bus->number, devfn);
-   if (record > afu->crs_num)
+   if (record > afu->crs_num) {
+   up_read(&afu->configured_rwsem);
return PCIBIOS_DEVICE_NOT_FOUND;
+   }
 
*_afu = afu;
*_record = record;
@@ -127,6 +134,7 @@ static int cxl_pcie_read_config(struct pci_bus *bus, 
unsigned int devfn,
WARN_ON(1);
}
 
+   up_read(&afu->configured_rwsem); /* locked in cxl_pcie_config_info() */
if (rc)
return PCIBIOS_DEVICE_NOT_FOUND;
 
@@ -157,6 +165,7 @@ static int cxl_pcie_write_config(struct pci_bus *bus, 
unsigned int devfn,
WARN_ON(1);
}
 
+   up_read(&afu->configured_rwsem); /* locked in cxl_pcie_config_info() */
if (rc)
return PCIBIOS_SET_FAILED;
 
-- 
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [RFC PATCH] PCI: designware: add host_init() error handling

2016-12-05 Thread Joao Pinto
Às 11:51 AM de 12/2/2016, Srinivas Kandagatla escreveu:
> 
> 
> On 02/12/16 10:32, Joao Pinto wrote:
>>
>> Hi Srinivas,
>>
>> Às 11:51 AM de 12/1/2016, Srinivas Kandagatla escreveu:
>>>  drivers/pci/host/pci-dra7xx.c   |  4 +++-
>>>  drivers/pci/host/pci-exynos.c   |  4 +++-
>>>  drivers/pci/host/pci-imx6.c |  4 +++-
>>>  drivers/pci/host/pci-keystone.c |  4 +++-
>>>  drivers/pci/host/pci-layerscape.c   | 12 
>>>  drivers/pci/host/pcie-armada8k.c|  4 +++-
>>>  drivers/pci/host/pcie-designware-plat.c |  4 +++-
>>>  drivers/pci/host/pcie-designware.c  |  4 +++-
>>>  drivers/pci/host/pcie-designware.h  |  2 +-
>>>  drivers/pci/host/pcie-qcom.c|  6 --
>>>  drivers/pci/host/pcie-spear13xx.c   |  4 +++-
>>>  11 files changed, 37 insertions(+), 15 deletions(-)
>>>
>>
>> Thanks for the patch!
>>
>> In my opinion your idea is good, but only the qcom driver is able to detect
>> failure in its specific host init routine; all others have a 'return 0' even
>> if something did not initialize properly. I would recommend that we take this
>> issue a bit further and add the error checking to all the specific pci
>> drivers in order to make them as robust as qcom's.
> I totally agree with you, I can give this a go in next version.

Sure, but I think it would be better to finish now since we are on top of the
task. I can help you if you need.

Thanks Joao

> 
> Thanks,
> srini
> 
>>
>> Thanks,
>> Joao
>>



[PATCH v8 6/6] powerpc/pv-qspinlock: Optimise native unlock path

2016-12-05 Thread Pan Xinhui
Avoid a function call in the native version of qspinlock unlock. On powerNV,
before applying this patch, every unlock is expensive. This small
optimization enhances the performance.

We use a static_key with jump_label, which removes unnecessary loads of
the lppaca and related data.

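For background, a minimal kernel-style sketch of the static-key pattern used
here (a generic illustration; the patch itself uses STATIC_KEY_TRUE_INIT and
sharedprocessor_key):

    #include <linux/jump_label.h>

    static DEFINE_STATIC_KEY_TRUE(native_key);

    void unlock_path(void)
    {
        if (static_branch_likely(&native_key))
            /* patched to fall straight through: fast native unlock */ ;
        else
            /* only reached after static_branch_disable(&native_key) */ ;
    }
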
Signed-off-by: Pan Xinhui 
---
 arch/powerpc/include/asm/qspinlock_paravirt.h | 18 +-
 arch/powerpc/kernel/paravirt.c|  4 
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/qspinlock_paravirt.h 
b/arch/powerpc/include/asm/qspinlock_paravirt.h
index d87cda0..8d39446 100644
--- a/arch/powerpc/include/asm/qspinlock_paravirt.h
+++ b/arch/powerpc/include/asm/qspinlock_paravirt.h
@@ -6,12 +6,14 @@
 #define _ASM_QSPINLOCK_PARAVIRT_H
 
 #include  
+#include  
 
 extern void pv_lock_init(void);
 extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
 extern void __pv_init_lock_hash(void);
 extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
 extern void __pv_queued_spin_unlock(struct qspinlock *lock);
+extern struct static_key_true sharedprocessor_key;
 
 static inline void pv_queued_spin_lock(struct qspinlock *lock, u32 val)
 {
@@ -20,7 +22,21 @@ static inline void pv_queued_spin_lock(struct qspinlock 
*lock, u32 val)
 
 static inline void pv_queued_spin_unlock(struct qspinlock *lock)
 {
-   pv_lock_op.unlock(lock);
+   /*
+* on powerNV and pSeries with jump_label, code will be
+*  PowerNV:pSeries:
+*  nop;b 2f;
+*  native unlock   2:
+*  pv unlock;
+* In this way, we can do unlock quick in native case.
+*
+* IF jump_label is not enabled, we fall back into
+* if condition, IOW, ld && cmp && bne.
+*/
+   if (static_branch_likely(&sharedprocessor_key))
+   native_queued_spin_unlock(lock);
+   else
+   pv_lock_op.unlock(lock);
 }
 
 static inline void pv_wait(u8 *ptr, u8 val)
diff --git a/arch/powerpc/kernel/paravirt.c b/arch/powerpc/kernel/paravirt.c
index e697b17..a0a000e 100644
--- a/arch/powerpc/kernel/paravirt.c
+++ b/arch/powerpc/kernel/paravirt.c
@@ -140,6 +140,9 @@ struct pv_lock_ops pv_lock_op = {
 };
 EXPORT_SYMBOL(pv_lock_op);
 
+struct static_key_true sharedprocessor_key = STATIC_KEY_TRUE_INIT;
+EXPORT_SYMBOL(sharedprocessor_key);
+
 void __init pv_lock_init(void)
 {
if (SHARED_PROCESSOR) {
@@ -149,5 +152,6 @@ void __init pv_lock_init(void)
pv_lock_op.unlock = __pv_queued_spin_unlock;
pv_lock_op.wait = __pv_wait;
pv_lock_op.kick = __pv_kick;
+   static_branch_disable(&sharedprocessor_key);
}
 }
-- 
2.4.11



[PATCH v8 5/6] powerpc: pSeries: Add pv-qspinlock build config/make

2016-12-05 Thread Pan Xinhui
pSeries runs as a guest and might need pv-qspinlock.

Signed-off-by: Pan Xinhui 
---
 arch/powerpc/kernel/Makefile   | 1 +
 arch/powerpc/platforms/pseries/Kconfig | 8 
 2 files changed, 9 insertions(+)

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 1925341..4780415 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -53,6 +53,7 @@ obj-$(CONFIG_PPC_970_NAP) += idle_power4.o
 obj-$(CONFIG_PPC_P7_NAP)   += idle_book3s.o
 procfs-y   := proc_powerpc.o
 obj-$(CONFIG_PROC_FS)  += $(procfs-y)
+obj-$(CONFIG_PARAVIRT_SPINLOCKS)   += paravirt.o
 rtaspci-$(CONFIG_PPC64)-$(CONFIG_PCI)  := rtas_pci.o
 obj-$(CONFIG_PPC_RTAS) += rtas.o rtas-rtc.o $(rtaspci-y-y)
 obj-$(CONFIG_PPC_RTAS_DAEMON)  += rtasd.o
diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index 8a87d06..0288c78 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -31,6 +31,14 @@ config ARCH_USE_QUEUED_SPINLOCKS
  fairlock.  It has shown a good performance improvement on x86 and also ppc
  especially in high contention cases.
 
+config PARAVIRT_SPINLOCKS
+   bool "Paravirtualization support for qspinlock"
+   depends on PPC_SPLPAR && QUEUED_SPINLOCKS
+   default y
+   help
+ If the kernel needs to run as a guest then enable this option.
+ Generally it can give the kernel better performance.
+
 config PPC_SPLPAR
depends on PPC_PSERIES
bool "Support for shared-processor logical partitions"
-- 
2.4.11



[PATCH v8 4/6] powerpc/pv-qspinlock: powerpc support pv-qspinlock

2016-12-05 Thread Pan Xinhui
The default pv-qspinlock uses qspinlock (the native version of
pv-qspinlock). pv_lock initialization should be done at boot stage with
irqs disabled. And if we run as a guest with powerKVM/pHyp
shared_processor mode, restore the pv_lock_ops callbacks to
pv-qspinlock (the pv version), which makes full use of virtualization.

There is a hash table; we store the cpu number into it and the key is
the lock. So every time, pv_wait can find out the lock holder by
searching for the lock. We also store the lock in a per_cpu struct and
remove it when we own the lock, so pv_wait can know which lock we are
spinning on. But the cpu in the hash table might not be the correct
lock holder, since for performance reasons we do not handle hash
conflicts.

Also introduce spin_lock_holder, which tells who owns the lock now.
Currently the only user is spin_unlock_wait.

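The lock -> holder-cpu mapping described above can be sketched like this
(hypothetical helper names; the actual patch builds on the generic
__pv_init_lock_hash() infrastructure and deliberately ignores collisions):

    #define HOLDER_HASH_BITS    8
    static int holder_cpu[1 << HOLDER_HASH_BITS];

    static unsigned int holder_slot(void *lock)
    {
        return ((unsigned long)lock >> 4) & ((1 << HOLDER_HASH_BITS) - 1);
    }

    static void record_holder(void *lock, int cpu)  /* once we take the lock */
    {
        holder_cpu[holder_slot(lock)] = cpu;
    }

    static int lookup_holder(void *lock)  /* used by pv_wait; may be stale */
    {
        return holder_cpu[holder_slot(lock)];
    }
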
Signed-off-by: Pan Xinhui 
---
 arch/powerpc/include/asm/qspinlock.h   |  29 +++-
 arch/powerpc/include/asm/qspinlock_paravirt.h  |  36 +
 .../powerpc/include/asm/qspinlock_paravirt_types.h |  13 ++
 arch/powerpc/kernel/paravirt.c | 153 +
 arch/powerpc/lib/locks.c   |   8 +-
 arch/powerpc/platforms/pseries/setup.c |   5 +
 6 files changed, 241 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt.h
 create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt_types.h
 create mode 100644 arch/powerpc/kernel/paravirt.c

diff --git a/arch/powerpc/include/asm/qspinlock.h 
b/arch/powerpc/include/asm/qspinlock.h
index 4c89256..8fd6349 100644
--- a/arch/powerpc/include/asm/qspinlock.h
+++ b/arch/powerpc/include/asm/qspinlock.h
@@ -15,7 +15,7 @@ static inline u8 *__qspinlock_lock_byte(struct qspinlock 
*lock)
return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
 }
 
-static inline void queued_spin_unlock(struct qspinlock *lock)
+static inline void native_queued_spin_unlock(struct qspinlock *lock)
 {
/* release semantics is required */
smp_store_release(__qspinlock_lock_byte(lock), 0);
@@ -27,6 +27,33 @@ static inline int queued_spin_is_locked(struct qspinlock 
*lock)
return atomic_read(&lock->val);
 }
 
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#include 
+/*
+ * try to know who is the lock holder, however it is not always true
+ * Return:
+ * -1, we did not know the lock holder.
+ * other value, likely is the lock holder.
+ */
+extern int spin_lock_holder(void *lock);
+
+static inline void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
+{
+   pv_queued_spin_lock(lock, val);
+}
+
+static inline void queued_spin_unlock(struct qspinlock *lock)
+{
+   pv_queued_spin_unlock(lock);
+}
+#else
+#define spin_lock_holder(l) (-1)
+static inline void queued_spin_unlock(struct qspinlock *lock)
+{
+   native_queued_spin_unlock(lock);
+}
+#endif
+
 #include 
 
 /* we need override it as ppc has io_sync stuff */
diff --git a/arch/powerpc/include/asm/qspinlock_paravirt.h 
b/arch/powerpc/include/asm/qspinlock_paravirt.h
new file mode 100644
index 000..d87cda0
--- /dev/null
+++ b/arch/powerpc/include/asm/qspinlock_paravirt.h
@@ -0,0 +1,36 @@
+#ifndef CONFIG_PARAVIRT_SPINLOCKS
+#error "do not include this file"
+#endif
+
+#ifndef _ASM_QSPINLOCK_PARAVIRT_H
+#define _ASM_QSPINLOCK_PARAVIRT_H
+
+#include  
+
+extern void pv_lock_init(void);
+extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+extern void __pv_init_lock_hash(void);
+extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+extern void __pv_queued_spin_unlock(struct qspinlock *lock);
+
+static inline void pv_queued_spin_lock(struct qspinlock *lock, u32 val)
+{
+   pv_lock_op.lock(lock, val);
+}
+
+static inline void pv_queued_spin_unlock(struct qspinlock *lock)
+{
+   pv_lock_op.unlock(lock);
+}
+
+static inline void pv_wait(u8 *ptr, u8 val)
+{
+   pv_lock_op.wait(ptr, val);
+}
+
+static inline void pv_kick(int cpu)
+{
+   pv_lock_op.kick(cpu);
+}
+
+#endif
diff --git a/arch/powerpc/include/asm/qspinlock_paravirt_types.h 
b/arch/powerpc/include/asm/qspinlock_paravirt_types.h
new file mode 100644
index 000..83611ed
--- /dev/null
+++ b/arch/powerpc/include/asm/qspinlock_paravirt_types.h
@@ -0,0 +1,13 @@
+#ifndef _ASM_QSPINLOCK_PARAVIRT_TYPES_H
+#define _ASM_QSPINLOCK_PARAVIRT_TYPES_H
+
+struct pv_lock_ops {
+   void (*lock)(struct qspinlock *lock, u32 val);
+   void (*unlock)(struct qspinlock *lock);
+   void (*wait)(u8 *ptr, u8 val);
+   void (*kick)(int cpu);
+};
+
+extern struct pv_lock_ops pv_lock_op;
+
+#endif
diff --git a/arch/powerpc/kernel/paravirt.c b/arch/powerpc/kernel/paravirt.c
new file mode 100644
index 000..e697b17
--- /dev/null
+++ b/arch/powerpc/kernel/paravirt.c
@@ -0,0 +1,153 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundatio

[PATCH v8 3/6] powerpc: lib/locks.c: Add cpu yield/wake helper function

2016-12-05 Thread Pan Xinhui
Add two corresponding helper functions to support pv-qspinlock.

For normal use, __spin_yield_cpu will confer the current vcpu's slices to the
target vcpu (say, a lock holder). If the target vcpu is not specified or it
is in the running state, whether such conferring to the lpar happens depends
on the *confer* parameter.

This is because the hcall itself introduces latency and a little overhead,
and we do NOT want to suffer any latency in some cases, e.g. in an interrupt
handler. The second parameter *confer* can indicate such a case.

__spin_wake_cpu is simpler; it will wake up one vcpu regardless of its
current vcpu state.

Signed-off-by: Pan Xinhui 
---
 arch/powerpc/include/asm/spinlock.h |  4 +++
 arch/powerpc/lib/locks.c| 59 +
 2 files changed, 63 insertions(+)

diff --git a/arch/powerpc/include/asm/spinlock.h 
b/arch/powerpc/include/asm/spinlock.h
index 954099e..6426bd5 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -64,9 +64,13 @@ static inline bool vcpu_is_preempted(int cpu)
 /* We only yield to the hypervisor if we are in shared processor mode */
 #define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
 extern void __spin_yield(arch_spinlock_t *lock);
+extern void __spin_yield_cpu(int cpu, int confer);
+extern void __spin_wake_cpu(int cpu);
 extern void __rw_yield(arch_rwlock_t *lock);
 #else /* SPLPAR */
 #define __spin_yield(x)barrier()
+#define __spin_yield_cpu(x, y) barrier()
+#define __spin_wake_cpu(x) barrier()
 #define __rw_yield(x)  barrier()
 #define SHARED_PROCESSOR   0
 #endif
diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c
index 6574626..bd872c9 100644
--- a/arch/powerpc/lib/locks.c
+++ b/arch/powerpc/lib/locks.c
@@ -23,6 +23,65 @@
 #include 
 #include 
 
+/*
+ * confer our slices to a specified cpu and return. If it is in running state
+ * or cpu is -1, then we will check confer. If confer is NULL, we will return
+ * otherwise we confer our slices to lpar.
+ */
+void __spin_yield_cpu(int cpu, int confer)
+{
+   unsigned int holder_cpu = cpu, yield_count;
+
+   if (cpu == -1)
+   goto yield_to_lpar;
+
+   BUG_ON(holder_cpu >= nr_cpu_ids);
+   yield_count = be32_to_cpu(lppaca_of(holder_cpu).yield_count);
+
+   /* if cpu is running, confer slices to lpar conditionally*/
+   if ((yield_count & 1) == 0)
+   goto yield_to_lpar;
+
+   plpar_hcall_norets(H_CONFER,
+   get_hard_smp_processor_id(holder_cpu), yield_count);
+   return;
+
+yield_to_lpar:
+   if (confer)
+   plpar_hcall_norets(H_CONFER, -1, 0);
+}
+EXPORT_SYMBOL_GPL(__spin_yield_cpu);
+
+void __spin_wake_cpu(int cpu)
+{
+   unsigned int holder_cpu = cpu;
+
+   BUG_ON(holder_cpu >= nr_cpu_ids);
+   /*
+* NOTE: we should always do this hcall regardless of
+* the yield_count of the holder_cpu.
+* as there might be a case like below;
+*  CPU 1   CPU 2
+*  yielded = true
+* if (yielded)
+*  __spin_wake_cpu()
+*  __spin_yield_cpu()
+*
+* So we might lose a wake if we check the yield_count and
+* return directly if the holder_cpu is running.
+* IOW. do NOT code like below.
+*  yield_count = be32_to_cpu(lppaca_of(holder_cpu).yield_count);
+*  if ((yield_count & 1) == 0)
+*  return;
+*
+* a PROD hcall marks the target_cpu proded, which cause the next cede
+* or confer called on the target_cpu invalid.
+*/
+   plpar_hcall_norets(H_PROD,
+   get_hard_smp_processor_id(holder_cpu));
+}
+EXPORT_SYMBOL_GPL(__spin_wake_cpu);
+
 #ifndef CONFIG_QUEUED_SPINLOCKS
 void __spin_yield(arch_spinlock_t *lock)
 {
-- 
2.4.11



[PATCH v8 2/6] powerpc: pSeries/Kconfig: Add qspinlock build config

2016-12-05 Thread Pan Xinhui
pSeries/powerNV will use qspinlock from now on.

Signed-off-by: Pan Xinhui 
---
 arch/powerpc/platforms/pseries/Kconfig | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index bec90fb..8a87d06 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -23,6 +23,14 @@ config PPC_PSERIES
select PPC_DOORBELL
default y
 
+config ARCH_USE_QUEUED_SPINLOCKS
+   default y
+   bool "Enable qspinlock"
+   help
+ Enabling this option will let kernel use qspinlock which is a kind of
+ fairlock.  It has shown a good performance improvement on x86 and also ppc
+ especially in high contention cases.
+
 config PPC_SPLPAR
depends on PPC_PSERIES
bool "Support for shared-processor logical partitions"
-- 
2.4.11



[PATCH v8 1/6] powerpc/qspinlock: powerpc support qspinlock

2016-12-05 Thread Pan Xinhui
This patch adds basic code to enable qspinlock on powerpc. qspinlock is
one kind of fairlock implementation, and it has shown some performance
improvement in some scenarios.

queued_spin_unlock() releases the lock by just one write of zero to the
::locked field, which sits at different places in the two endianness
systems.

We override some arch_spin_XXX functions, as powerpc has io_sync logic
which makes sure the io operations are correctly protected by the lock.

There is another special case, see commit
2c610022711 ("locking/qspinlock: Fix spin_unlock_wait() some more")

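The endianness point can be demonstrated with a small user-space C sketch
(illustrative only):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t val = 0x000000ff;  /* the "locked" byte is the low 8 bits */
        uint8_t *p = (uint8_t *)&val;

        /* little-endian: p[0] == 0xff; big-endian: p[3] == 0xff.
         * That is why the unlock write targets byte 0 or byte 3. */
        printf("byte0=%#x byte3=%#x\n", p[0], p[3]);
        return 0;
    }
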
Signed-off-by: Pan Xinhui 
---
 arch/powerpc/include/asm/qspinlock.h  | 66 +++
 arch/powerpc/include/asm/spinlock.h   | 31 +--
 arch/powerpc/include/asm/spinlock_types.h |  4 ++
 arch/powerpc/lib/locks.c  | 59 +++
 4 files changed, 147 insertions(+), 13 deletions(-)
 create mode 100644 arch/powerpc/include/asm/qspinlock.h

diff --git a/arch/powerpc/include/asm/qspinlock.h 
b/arch/powerpc/include/asm/qspinlock.h
new file mode 100644
index 000..4c89256
--- /dev/null
+++ b/arch/powerpc/include/asm/qspinlock.h
@@ -0,0 +1,66 @@
+#ifndef _ASM_POWERPC_QSPINLOCK_H
+#define _ASM_POWERPC_QSPINLOCK_H
+
+#include 
+
+#define SPIN_THRESHOLD (1 << 15)
+#define queued_spin_unlock queued_spin_unlock
+#define queued_spin_is_locked queued_spin_is_locked
+#define queued_spin_unlock_wait queued_spin_unlock_wait
+
+extern void queued_spin_unlock_wait(struct qspinlock *lock);
+
+static inline u8 *__qspinlock_lock_byte(struct qspinlock *lock)
+{
+   return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
+}
+
+static inline void queued_spin_unlock(struct qspinlock *lock)
+{
+   /* release semantics is required */
+   smp_store_release(__qspinlock_lock_byte(lock), 0);
+}
+
+static inline int queued_spin_is_locked(struct qspinlock *lock)
+{
+   smp_mb();
+   return atomic_read(&lock->val);
+}
+
+#include 
+
+/* we need override it as ppc has io_sync stuff */
+#undef arch_spin_trylock
+#undef arch_spin_lock
+#undef arch_spin_lock_flags
+#undef arch_spin_unlock
+#define arch_spin_trylock arch_spin_trylock
+#define arch_spin_lock arch_spin_lock
+#define arch_spin_lock_flags arch_spin_lock_flags
+#define arch_spin_unlock arch_spin_unlock
+
+static inline int arch_spin_trylock(arch_spinlock_t *lock)
+{
+   CLEAR_IO_SYNC;
+   return queued_spin_trylock(lock);
+}
+
+static inline void arch_spin_lock(arch_spinlock_t *lock)
+{
+   CLEAR_IO_SYNC;
+   queued_spin_lock(lock);
+}
+
+static inline
+void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags)
+{
+   CLEAR_IO_SYNC;
+   queued_spin_lock(lock);
+}
+
+static inline void arch_spin_unlock(arch_spinlock_t *lock)
+{
+   SYNC_IO;
+   queued_spin_unlock(lock);
+}
+#endif /* _ASM_POWERPC_QSPINLOCK_H */
diff --git a/arch/powerpc/include/asm/spinlock.h 
b/arch/powerpc/include/asm/spinlock.h
index 8c1b913..954099e 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -60,6 +60,23 @@ static inline bool vcpu_is_preempted(int cpu)
 }
 #endif
 
+#if defined(CONFIG_PPC_SPLPAR)
+/* We only yield to the hypervisor if we are in shared processor mode */
+#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
+extern void __spin_yield(arch_spinlock_t *lock);
+extern void __rw_yield(arch_rwlock_t *lock);
+#else /* SPLPAR */
+#define __spin_yield(x)barrier()
+#define __rw_yield(x)  barrier()
+#define SHARED_PROCESSOR   0
+#endif
+
+#ifdef CONFIG_QUEUED_SPINLOCKS
+#include 
+#else
+
+#define arch_spin_relax(lock)  __spin_yield(lock)
+
 static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 {
return lock.slock == 0;
@@ -114,18 +131,6 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
  * held.  Conveniently, we have a word in the paca that holds this
  * value.
  */
-
-#if defined(CONFIG_PPC_SPLPAR)
-/* We only yield to the hypervisor if we are in shared processor mode */
-#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
-extern void __spin_yield(arch_spinlock_t *lock);
-extern void __rw_yield(arch_rwlock_t *lock);
-#else /* SPLPAR */
-#define __spin_yield(x)barrier()
-#define __rw_yield(x)  barrier()
-#define SHARED_PROCESSOR   0
-#endif
-
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
CLEAR_IO_SYNC;
@@ -203,6 +208,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t 
*lock)
smp_mb();
 }
 
+#endif /* !CONFIG_QUEUED_SPINLOCKS */
 /*
  * Read-write spinlocks, allowing multiple readers
  * but only one writer.
@@ -338,7 +344,6 @@ static inline void arch_write_unlock(arch_rwlock_t *rw)
 #define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
 #define arch_write_lock_flags(lock, flags) arch_write_lock(lock)
 
-#define arch_spin_relax(lock)  __spin_yield(lock)
 #define arch_read_relax(lock)  __rw_yield(lock)
 #define arch_write_relax(lock)	__rw_yield(lock)
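
A side note on the __qspinlock_lock_byte() arithmetic in the patch above: the
"locked" flag lives in the least-significant byte of the 32-bit lock word,
which sits at byte offset 0 on little-endian and offset 3 on big-endian, hence
the 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN). A minimal stand-alone sketch of the
same idea (plain userspace C, not kernel code):

#include <stdint.h>
#include <stdio.h>

/* Return the address of the least-significant byte of *val. */
static uint8_t *lsb_byte(uint32_t *val)
{
	union { uint32_t w; uint8_t b[4]; } probe = { .w = 1 };

	/* probe.b[0] == 1 on little-endian, so the LSB is at offset 0
	 * there; on big-endian it is at offset 3. */
	return (uint8_t *)val + (probe.b[0] ? 0 : 3);
}

int main(void)
{
	uint32_t lock = 0;

	*lsb_byte(&lock) = 1;			/* "take" the lock */
	printf("lock word is 0x%08x\n", lock);	/* 0x00000001 either way */
	return 0;
}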

[PATCH v8 0/6] Implement qspinlock/pv-qspinlock on ppc

2016-12-05 Thread Pan Xinhui
Hi All,
  this is the fair-lock patchset. The patches apply and build cleanly; they are
based on linux-next.
  qspinlock avoids the waiter-starvation problem. It is about the same speed in
the single-threaded case and can be much faster under high contention,
especially when the spinlock is embedded within the data structure to be
protected.

v7 -> v8:
add one patch to drop a function call in the native qspinlock unlock path.
Enabling qspinlock is now a compile-time option.
rebase onto linux-next (4.9-rc7)
v6 -> v7:
rebase onto 4.8-rc4
v1 -> v6:
too many details. snip. 

some benchmark results below

perf bench
these numbers are ops per sec, so higher is better.
***
on pSeries with 32 vcpus, 32GB memory, pHyp.

test case       | pv-qspinlock | qspinlock | current-spinlock
--------------------------------------------------------------
futex hash      | 618572       | 552332    | 553788
futex lock-pi   | 364          | 364       | 364
sched pipe      | 78984        | 76060     | 81454


unix bench:
these numbers are scores, so higher is better.

on PowerNV with 16 cores (cpus, smt off), 32GB memory:
------------------------------------------------------
pv-qspinlock and qspinlock give very similar results because pv-qspinlock uses
the native version, which adds only a single callback's overhead.

test case                             | pv-qspinlock and qspinlock | current-spinlock
--------------------------------------------------------------------------------------
Execl Throughput                      |  761.1 |  761.4
File Copy 1024 bufsize 2000 maxblocks | 1259.8 | 1286.6
File Copy 256 bufsize 500 maxblocks   |  782.2 |  790.3
File Copy 4096 bufsize 8000 maxblocks | 2741.5 | 2817.4
Pipe Throughput                       | 1063.2 | 1036.7
Pipe-based Context Switching          |  284.7 |  281.1
Process Creation                      |  679.6 |  649.1
Shell Scripts (1 concurrent)          | 1933.2 | 1922.9
Shell Scripts (8 concurrent)          | 5003.3 | 4899.8
System Call Overhead                  |  900.6 |  896.8
--------------------------------------------------------------------------------------
System Benchmarks Index Score         | 1139.3 | 1133.0

***
on pSeries with 32 vcpus, 32GB memory, pHyp.

test case                             | pv-qspinlock | qspinlock | current-spinlock
------------------------------------------------------------------------------------
Execl Throughput                      |  877.1 |  891.2 |  872.8
File Copy 1024 bufsize 2000 maxblocks | 1390.4 | 1399.2 | 1395.0
File Copy 256 bufsize 500 maxblocks   |  882.4 |  889.5 |  881.8
File Copy 4096 bufsize 8000 maxblocks | 3112.3 | 3113.4 | 3121.7
Pipe Throughput                       | 1095.8 | 1162.6 | 1158.5
Pipe-based Context Switching          |  194.9 |  192.7 |  200.7
Process Creation                      |  518.4 |  526.4 |  509.1
Shell Scripts (1 concurrent)          | 1401.9 | 1413.9 | 1402.2
Shell Scripts (8 concurrent)          | 3215.6 | 3246.6 | 3229.1
System Call Overhead                  |  833.2 |  892.4 |  888.1
------------------------------------------------------------------------------------
System Benchmarks Index Score         | 1033.7 | 1052.5 | 1047.8


**
on pSeries with 32 vcpus, 16GB memory, KVM.

test case                             | pv-qspinlock | qspinlock | current-spinlock
------------------------------------------------------------------------------------
Execl Throughput                      |  497.4 |  518.7 |  497.8
File Copy 1024 bufsize 2000 maxblocks | 1368.8 | 1390.1 | 1343.3
File Copy 256 bufsize 500 maxblocks   |  857.7 |  859.8 |  831.4
File Copy 4096 bufsize 8000 maxblocks | 2851.7 | 2838.1 | 2785.5
Pipe Throughput                       | 1221.9 | 1265.3 | 1250.4
Pipe-based Context S

[GIT PULL] Please pull powerpc/linux.git powerpc-4.9-7 tag

2016-12-05 Thread Michael Ellerman
Hi Linus,

Please pull what is hopefully the last batch of powerpc fixes for 4.9.

The following changes since commit 984d7a1ec67ce3a46324fa4bcb4c745bbc266cf2:

  powerpc/mm: Fixup kernel read only mapping (2016-11-25 14:18:25 +1100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-4.9-7

for you to fetch changes up to dadc4a1bb9f0095343ed9dd4f1d9f3825d7b3e45:

  powerpc/64: Fix placement of .text to be immediately following .head.text (2016-12-01 22:26:31 +1100)


powerpc fixes for 4.9 #7

Four fixes, the first for code we merged this cycle and three that are also
going to stable:

 - On 64-bit Book3E we were not placing the .text section where we said we would
   in the asm.
 - We broke building the boot wrapper on some 32-bit toolchains.
 - Lazy icache flushing was broken on pre-POWER5 machines.
 - One of the error paths in our EEH code would lead to a deadlock.

Thanks to:
  Andrew Donnellan, Ben Hutchings, Benjamin Herrenschmidt, Nicholas Piggin.


Andrew Donnellan (1):
  powerpc/eeh: Fix deadlock when PE frozen state can't be cleared

Ben Hutchings (1):
  powerpc/boot: Fix build failure in 32-bit boot wrapper

Benjamin Herrenschmidt (1):
  powerpc/mm: Fix lazy icache flush on pre-POWER5

Nicholas Piggin (1):
  powerpc/64: Fix placement of .text to be immediately following .head.text

 arch/powerpc/boot/Makefile| 3 ++-
 arch/powerpc/boot/opal.c  | 2 +-
 arch/powerpc/kernel/eeh_driver.c  | 4 +++-
 arch/powerpc/kernel/vmlinux.lds.S | 9 +
 arch/powerpc/mm/hash64_4k.c   | 2 +-
 arch/powerpc/mm/hash64_64k.c  | 4 ++--
 6 files changed, 18 insertions(+), 6 deletions(-)




Re: [RFC][PATCH] powerpc/oops: Provide disassembly on OOPS

2016-12-05 Thread Michael Ellerman
Balbir Singh  writes:

> This patch is tied to xmon; it can be refactored out
> better later if required. The idea is to provide
> disassembly using xmon so that when we get an OOPS
> we see something like the following below
>
> ...
> NIP [c063a230] lkdtm_WARNING+0x0/0x10
> LR [c063986c] lkdtm_do_action+0x3c/0x80
> Call Trace:
> [c000ef1bbbc0] [c09a5804] printk+0x50/0x64 (unreliable)
> [c000ef1bbbe0] [c0639cc0] direct_entry+0x100/0x1b0
> [c000ef1bbc70] [c043eb4c] full_proxy_write+0x8c/0x100
> [c000ef1bbcd0] [c028fe24] __vfs_write+0x54/0x1c0
> [c000ef1bbd80] [c0291138] vfs_write+0xc8/0x260
> [c000ef1bbdd0] [c0292c98] SyS_write+0x78/0x120
> [c000ef1bbe30] [c000b220] system_call+0x38/0xfc
> Instruction dump:
> c063a200  38630618  addir3,r3,1560
> c063a204  f8010010  std r0,16(r1)
> c063a208  f821ffa1  stdur1,-96(r1)
> c063a20c  4836b0f1  bl  c09a52fc# panic+0x8/0x304
> c063a210  6000  nop
> c063a214  6000  nop
> c063a218  6000  nop
> c063a21c  6042  ori r2,r2,0
> c063a220  0fe0  twi 31,r0,0
> c063a224  6000  nop
> c063a228  6000  nop
> c063a22c  6042  ori r2,r2,0
> c063a230  0fe0  twi 31,r0,0
>   

> NOTE: The <> around the instruction that caused the
> OOPS is now replaced with a marker following the disassembly
> in the output.

I think I'd prefer:
  c063a22c  6042  ori r2,r2,0
  c063a230  0fe0  twi 31,r0,0   # <- nip
  c063a234  4e800020  blr

Or maybe:
  c063a22c  6042  ori r2,r2,0
  c063a230  0fe0  twi 31,r0,0   # Faulting instruction
  c063a234  4e800020  blr

?

> An issue was raised as to whether calling
> xmon during an OOPS can cause further issues. xmon has been used
> robustly in the past to look at OOPSes and disassemble them,
> and moreover the OOPS output comes at the end, so we've already
> captured the GPRs and the stack trace.

Once it's refactored properly you won't be calling xmon at all, you'll
just be calling the disassembly code.

The problem we have is that currently print_insn_powerpc() is built
using nonstdio.h, which means it is calling xmon_printf(), and that's
not what we want to do for an oops. An oops is printed using printk. So
that will need more work.
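
To make that concrete, one possible shape of the refactor is to thread a
printf-style callback through the disassembler, so an oops path can print via
printk. A rough sketch, assuming print_insn_powerpc() grows a callback
parameter (hypothetical signature; today it prints through xmon's nonstdio
layer):

/* Hypothetical: callers supply their own printf. */
typedef int (*dis_printf_t)(const char *fmt, ...);
int print_insn_powerpc(unsigned long insn, unsigned long memaddr,
		       dis_printf_t pr);

/* An oops path could then disassemble around the NIP using printk: */
static void show_disassembly(struct pt_regs *regs)
{
	unsigned long pc = regs->nip - 8 * sizeof(int);
	int i;

	for (i = 0; i < 16; i++, pc += sizeof(int)) {
		unsigned int insn;

		/* don't fault while dumping a possibly bad address */
		if (probe_kernel_address((const void *)pc, insn))
			break;
		printk("%016lx  %08x  ", pc, insn);
		print_insn_powerpc(insn, pc, printk);
		printk("%s\n", pc == regs->nip ? "\t# <- nip" : "");
	}
}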

> NOTE2: If CONFIG_XMON_DISASSEMBLY is turned off, the disassembly
> will be printed as a list of .long(s). It is highly recommended
> to have both CONFIG_XMON_DISASSEMBLY and CONFIG_XMON for usable
> output.

So once it's refactored CONFIG_XMON_DISASSEMBLY would become
CONFIG_PPC_DISASSEMBLY (or something like that), and there'd be no
dependency on CONFIG_XMON.

And so there'd be no fallback to printing longs, you'd either print the
old compact format, or the disassembled format.

cheers


Re: [PATCH RFC 3/3] powerpc/64: Enable use of radix MMU under hypervisor on POWER9

2016-12-05 Thread Paul Mackerras
On Mon, Dec 05, 2016 at 07:55:32PM +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2016-12-05 at 19:04 +1100, Paul Mackerras wrote:
> > +   vec5 = of_get_flat_dt_prop(chosen, "ibm,architecture-vec-5", &size);
> > +   if (!vec5 || size <= OV5_INDX(OV5_MMU_RADIX_300))
> > +   return;
> 
> Could be bike shedding but shouldn't we first check if
> we are in an LPAR and bail out if we are not, then
> if we *are* and the above size is too small to contain
> the ARCH 3.00 options, also disable radix as obviously
> the hypervisor doesn't know about it ?

This is *very* early on, so early that we haven't yet decided what
platform we're on.  If we're not in an LPAR then we won't have a
/chosen/ibm,architecture-vec-5 property.  Any hypervisor that is too
old to have that property will also be too old to set the radix
bit in the ibm,pa-features property, so we won't use radix.

If we do have the property but it's short then yes that's a good
indication that the hypervisor can't do radix, though in that case
it's strange that it set the radix bit in the ibm,pa-features property
(which must have been set otherwise we wouldn't have got here).

I'll do a new patch.
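
Presumably something along these lines: a sketch of the revised early check,
where both a too-short vector and a clear radix bit disable radix (the
function name and the treatment of the missing-property case are guesses,
not the final patch):

/* Runs very early, from flat-devicetree context, before MMU init. */
static void __init early_check_vec5(void)
{
	unsigned long root = of_get_flat_dt_root();
	int chosen, size;
	const u8 *vec5;

	chosen = of_get_flat_dt_subnode_by_name(root, "chosen");
	if (chosen < 0)
		return;		/* no /chosen: probably no hypervisor */
	vec5 = of_get_flat_dt_prop(chosen, "ibm,architecture-vec-5", &size);
	if (!vec5)
		return;		/* old hypervisor: pa-features lacks radix too */
	if (size <= OV5_INDX(OV5_MMU_RADIX_300) ||
	    !(vec5[OV5_INDX(OV5_MMU_RADIX_300)] &
	      OV5_FEAT(OV5_MMU_RADIX_300)))
		/* vector too short or radix bit clear: don't use radix */
		cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
}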

> > +   if (!(vec5[OV5_INDX(OV5_MMU_RADIX_300)] & 
> > OV5_FEAT(OV5_MMU_RADIX_300)))
> > +   /* Hypervisor doesn't support radix */
> > +   cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
> > +}
> > +

Paul.


Re: [PATCH RFC 3/3] powerpc/64: Enable use of radix MMU under hypervisor on POWER9

2016-12-05 Thread Benjamin Herrenschmidt
On Mon, 2016-12-05 at 19:04 +1100, Paul Mackerras wrote:
> +   vec5 = of_get_flat_dt_prop(chosen, "ibm,architecture-vec-5", &size);
> +   if (!vec5 || size <= OV5_INDX(OV5_MMU_RADIX_300))
> +   return;

Could be bike shedding but shouldn't we first check if
we are in an LPAR and bail out if we are not, then
if we *are* and the above size is too small to contain
the ARCH 3.00 options, also disable radix as obviously
the hypervisor doesn't know about it ?

> +   if (!(vec5[OV5_INDX(OV5_MMU_RADIX_300)] & 
> OV5_FEAT(OV5_MMU_RADIX_300)))
> +   /* Hypervisor doesn't support radix */
> +   cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
> +}
> +



Re: [PATCH] PPC: sstep.c: Add modsw, moduw instruction emulation

2016-12-05 Thread Naveen N. Rao
On 2016/12/04 10:25PM, PrasannaKumar Muralidharan wrote:
> Add modsw and moduw instruction emulation support to analyse_instr.
> 
> Signed-off-by: PrasannaKumar Muralidharan 

Hi Prasanna,
Thanks for the patch! A few minor comments below...

> ---
>  arch/powerpc/lib/sstep.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index 9c78a9c..5acef72 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -1148,6 +1148,15 @@ int __kprobes analyse_instr(struct instruction_op *op, struct pt_regs *regs,
>   (int) regs->gpr[rb];
>   goto arith_done;
> 
> + case 779:   /* modsw */
> + regs->gpr[rd] = (int) regs->gpr[ra] %
> + (int) regs->gpr[rb];
> + goto arith_done;

Since these instructions don't update CR, you can directly goto 
instr_done.

> +
> + case 267:   /* moduw */

Please move this case further up so that the extended opcodes are in 
numerical order.
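
Taken together, the two comments would reshape the hunk roughly like this
(an untested sketch of the suggested rework):

	case 267:	/* moduw */
		/* unsigned 32-bit modulo */
		regs->gpr[rd] = (unsigned int) regs->gpr[ra] %
			(unsigned int) regs->gpr[rb];
		/* no CR update, so skip arith_done */
		goto instr_done;

	...

	case 779:	/* modsw */
		/* signed 32-bit modulo */
		regs->gpr[rd] = (int) regs->gpr[ra] %
			(int) regs->gpr[rb];
		goto instr_done;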


- Naveen



[PATCH RFC 3/3] powerpc/64: Enable use of radix MMU under hypervisor on POWER9

2016-12-05 Thread Paul Mackerras
To use radix as a guest, we first need to tell the hypervisor via
the ibm,client-architecture-support call that we support POWER9 and
architecture v3.00, and that we can do either radix or hash and
that we would like to choose later using an hcall (the
H_REGISTER_PROC_TBL hcall).

Then we need to check whether the hypervisor agreed to us using
radix.  We need to do this very early on in the kernel boot process
before any of the MMU initialization is done.  If the hypervisor
doesn't agree, we can't use radix and therefore clear the radix
MMU feature bit.

Later, when we have set up our process table, which points to the
radix tree for each process, we need to install that using the
H_REGISTER_PROC_TBL hcall.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/book3s/64/mmu.h |  2 ++
 arch/powerpc/include/asm/hvcall.h| 11 +++
 arch/powerpc/include/asm/prom.h  |  9 +
 arch/powerpc/kernel/prom_init.c  | 18 +-
 arch/powerpc/mm/init_64.c| 28 
 arch/powerpc/mm/pgtable-radix.c  |  2 ++
 arch/powerpc/platforms/pseries/lpar.c| 29 +
 7 files changed, 98 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 8afb0e0..e8cbdc0 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -138,5 +138,7 @@ static inline void setup_initial_memory_limit(phys_addr_t first_memblock_base,
 extern int (*register_process_table)(unsigned long base, unsigned long page_size,
				     unsigned long tbl_size);
 
+extern void radix_init_pseries(void);
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_MMU_H_ */
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 77ff1ba..54d11b3 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -276,6 +276,7 @@
 #define H_GET_MPP_X		0x314
 #define H_SET_MODE		0x31C
 #define H_CLEAR_HPT		0x358
+#define H_REGISTER_PROC_TBL	0x37C
 #define H_SIGNAL_SYS_RESET	0x380
 #define MAX_HCALL_OPCODE	H_SIGNAL_SYS_RESET
 
@@ -313,6 +314,16 @@
 #define H_SIGNAL_SYS_RESET_ALL_OTHERS  -2
 /* >= 0 values are CPU number */
 
+/* Flag values used in H_REGISTER_PROC_TBL hcall */
+#define PROC_TABLE_OP_MASK 0x18
+#define PROC_TABLE_DEREG   0x10
+#define PROC_TABLE_NEW 0x18
+#define PROC_TABLE_TYPE_MASK   0x06
+#define PROC_TABLE_HPT_SLB 0x00
+#define PROC_TABLE_HPT_PT  0x02
+#define PROC_TABLE_RADIX   0x04
+#define PROC_TABLE_GTSE0x01
+
 #ifndef __ASSEMBLY__
 
 /**
diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
index e6d83d0..8af2546 100644
--- a/arch/powerpc/include/asm/prom.h
+++ b/arch/powerpc/include/asm/prom.h
@@ -121,6 +121,8 @@ struct of_drconf_cell {
 #define OV1_PPC_2_06	0x02	/* set if we support PowerPC 2.06 */
 #define OV1_PPC_2_07	0x01	/* set if we support PowerPC 2.07 */
 
+#define OV1_PPC_3_00	0x80	/* set if we support PowerPC 3.00 */
+
 /* Option vector 2: Open Firmware options supported */
 #define OV2_REAL_MODE	0x20	/* set if we want OF in real mode */
 
@@ -155,6 +157,13 @@ struct of_drconf_cell {
 #define OV5_PFO_HW_842 0x1140  /* PFO Compression Accelerator */
 #define OV5_PFO_HW_ENCR	0x1120	/* PFO Encryption Accelerator */
 #define OV5_SUB_PROCESSORS 0x1501  /* 1,2,or 4 Sub-Processors supported */
+#define OV5_XIVE_EXPLOIT   0x1701  /* XIVE exploitation supported */
+#define OV5_MMU_RADIX_300  0x1880  /* ISA v3.00 radix MMU supported */
+#define OV5_MMU_HASH_300   0x1840  /* ISA v3.00 hash MMU supported */
+#define OV5_MMU_SEGM_RADIX 0x1820  /* radix mode (no segmentation) */
+#define OV5_MMU_PROC_TBL   0x1810  /* hcall selects SLB or proc table */
+#define OV5_MMU_SLB	0x1800	/* always use SLB */
+#define OV5_MMU_GTSE   0x1808  /* Guest translation shootdown */
 
 /* Option Vector 6: IBM PAPR hints */
 #define OV6_LINUX  0x02/* Linux is our OS */
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index ec47a93..358d43f 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -649,6 +649,7 @@ static void __init early_cmdline_parse(void)
 struct option_vector1 {
u8 byte1;
u8 arch_versions;
+   u8 arch_versions3;
 } __packed;
 
 struct option_vector2 {
@@ -691,6 +692,9 @@ struct option_vector5 {
u8 reserved2;
__be16 reserved3;
u8 subprocessors;
+   u8 byte22;
+   u8 intarch;
+   u8 mmu;
 } __packed;
 
 struct option_vector6 {
@@ -700,7 +704,7 @@ struct option_vector6 {
 } __packed;
 
 struct ibm_arch_vec {
-   struct { u32 mask, val; } pvrs[10];
+  
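
The prom_init.c vector-5 plumbing and the lpar.c changes are cut off above.
Registering the process table via the new hcall presumably looks something
like the sketch below; the function name is inferred from the lpar.c entry
in the diffstat, while plpar_hcall_norets(), H_IS_LONG_BUSY() and
get_longbusy_msecs() are the existing pseries hcall helpers:

static long pseries_lpar_register_process_table(unsigned long base,
			unsigned long page_size, unsigned long table_size)
{
	long rc;
	unsigned long flags = PROC_TABLE_NEW;

	if (radix_enabled())
		flags |= PROC_TABLE_RADIX | PROC_TABLE_GTSE;

	for (;;) {
		/* retry while the hypervisor reports "long busy" */
		rc = plpar_hcall_norets(H_REGISTER_PROC_TBL, flags,
					base, page_size, table_size);
		if (!H_IS_LONG_BUSY(rc))
			break;
		mdelay(get_longbusy_msecs(rc));
	}
	if (rc != H_SUCCESS)
		pr_err("Failed to register process table (rc=%ld)\n", rc);
	return rc;
}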

[PATCH RFC 2/3] powerpc/64: Always enable radix support for 64-bit Book 3S kernels

2016-12-05 Thread Paul Mackerras
This removes the ability for the user to choose whether or not to
include support for the radix MMU in kernels built to run on 64-bit
Book 3S machines.  Excluding radix support saves only about 25kiB
of text and 13kiB of data, a total of a little over half a page.
Having the option expands the space of option combinations that
need to be tested, which is an ongoing burden on developers,
as well as increasing the number of #ifdefs in the code.
Given that the space savings are small, let's remove the option.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/platforms/Kconfig.cputype | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index ca2da30..52a71ca 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -333,13 +333,8 @@ config PPC_STD_MMU_64
depends on PPC_STD_MMU && PPC64
 
 config PPC_RADIX_MMU
-   bool "Radix MMU Support"
+   def_bool y
depends on PPC_BOOK3S_64
-   default y
-   help
- Enable support for the Power ISA 3.0 Radix style MMU. Currently this
- is only implemented by IBM Power9 CPUs, if you don't have one of them
- you can probably disable this.
 
 config PPC_MMU_NOHASH
def_bool y
-- 
2.7.4



[PATCH 1/3] powerpc/64: Fixes for the ibm,client-architecture-support options

2016-12-05 Thread Paul Mackerras
This fixes the values for some of the option vector 5 bits passed in
the ibm,client-architecture-support call.  The "platform
facilities options" bits are in byte 17 not byte 14, so the
upper 8 bits of their definitions need to be 0x11 not 0x0E.
The "sub processor support" option is in byte 21 not byte 15.

When checking whether option bits are set, we should check that
the offset of the byte being checked is less than the vector
length that we got from the hypervisor.
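
For reference, the OV5_* constants pack the byte index into the high byte
and the bit mask into the low byte; the decoding macros in asm/prom.h look
like this, which is what makes the 0x0E -> 0x11 change a byte-offset fix:

#define OV5_FEAT(x)	((x) & 0xff)	/* bit mask within the byte */
#define OV5_INDX(x)	((x) >> 8)	/* byte index into vector 5 */

/* e.g. OV5_PFO_HW_RNG = 0x1180 decodes to byte 17 (0x11), bit mask 0x80;
 * the old value 0x0E80 tested the same bit in byte 14. */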

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/prom.h   | 8 
 arch/powerpc/platforms/pseries/firmware.c | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
index 5e57705..e6d83d0 100644
--- a/arch/powerpc/include/asm/prom.h
+++ b/arch/powerpc/include/asm/prom.h
@@ -151,10 +151,10 @@ struct of_drconf_cell {
 #define OV5_XCMO   0x0440  /* Page Coalescing */
 #define OV5_TYPE1_AFFINITY 0x0580  /* Type 1 NUMA affinity */
 #define OV5_PRRN   0x0540  /* Platform Resource Reassignment */
-#define OV5_PFO_HW_RNG 0x0E80  /* PFO Random Number Generator */
-#define OV5_PFO_HW_842 0x0E40  /* PFO Compression Accelerator */
-#define OV5_PFO_HW_ENCR	0x0E20	/* PFO Encryption Accelerator */
-#define OV5_SUB_PROCESSORS 0x0F01  /* 1,2,or 4 Sub-Processors supported */
+#define OV5_PFO_HW_RNG 0x1180  /* PFO Random Number Generator */
+#define OV5_PFO_HW_842 0x1140  /* PFO Compression Accelerator */
+#define OV5_PFO_HW_ENCR	0x1120	/* PFO Encryption Accelerator */
+#define OV5_SUB_PROCESSORS 0x1501  /* 1,2,or 4 Sub-Processors supported */
 
 /* Option Vector 6: IBM PAPR hints */
 #define OV6_LINUX  0x02/* Linux is our OS */
diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
index ea7f09b..7d67623 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -126,7 +126,7 @@ static void __init fw_vec5_feature_init(const char *vec5, unsigned long len)
index = OV5_INDX(vec5_fw_features_table[i].feature);
feat = OV5_FEAT(vec5_fw_features_table[i].feature);
 
-   if (vec5[index] & feat)
+   if (index < len && (vec5[index] & feat))
powerpc_firmware_features |=
vec5_fw_features_table[i].val;
}
-- 
2.7.4