[PATCH 5/5] KVM: PPC: Add MMIO emulation for remaining floating-point instructions

2017-03-22 Thread Paul Mackerras
For completeness, this adds emulation of the lfiwax and lfiwzx
instructions.  lfiwax sign-extends the 32-bit value it loads while
lfiwzx zero-extends it, which is why the cases below use
kvmppc_handle_loads() and kvmppc_handle_load() respectively.  With
this, all floating-point load and store instructions as of Power ISA
v2.07 are emulated.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/ppc-opcode.h |  2 ++
 arch/powerpc/kvm/emulate_loadstore.c  | 14 ++
 2 files changed, 16 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 738bac1..73f06f4 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -122,6 +122,8 @@
 #define OP_31_XOP_STFDX 727
 #define OP_31_XOP_STFDUX759
 #define OP_31_XOP_LHBRX 790
+#define OP_31_XOP_LFIWAX855
+#define OP_31_XOP_LFIWZX887
 #define OP_31_XOP_STHBRX918
 #define OP_31_XOP_STFIWX983
 
diff --git a/arch/powerpc/kvm/emulate_loadstore.c b/arch/powerpc/kvm/emulate_loadstore.c
index f10ba0c..af83353 100644
--- a/arch/powerpc/kvm/emulate_loadstore.c
+++ b/arch/powerpc/kvm/emulate_loadstore.c
@@ -270,6 +270,20 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed);
break;
 
+   case OP_31_XOP_LFIWAX:
+   if (kvmppc_check_fp_disabled(vcpu))
+   return EMULATE_DONE;
+   emulated = kvmppc_handle_loads(run, vcpu,
+   KVM_MMIO_REG_FPR|rt, 4, 1);
+   break;
+
+   case OP_31_XOP_LFIWZX:
+   if (kvmppc_check_fp_disabled(vcpu))
+   return EMULATE_DONE;
+   emulated = kvmppc_handle_load(run, vcpu,
+   KVM_MMIO_REG_FPR|rt, 4, 1);
+   break;
+
case OP_31_XOP_STFSX:
if (kvmppc_check_fp_disabled(vcpu))
return EMULATE_DONE;
-- 
2.7.4



[PATCH 4/5] KVM: PPC: Emulation for more integer loads and stores

2017-03-22 Thread Paul Mackerras
This adds emulation for the following integer loads and stores,
thus enabling them to be used in a guest for accessing emulated
MMIO locations.

- lhaux
- lwaux
- lwzux
- ldu
- lwa
- stdux
- stwux
- stdu
- ldbrx
- stdbrx

Previously, most of these would cause an emulation failure exit to
userspace, though ldu and lwa got treated incorrectly as ld, and
stdu got treated incorrectly as std.

This also tidies up some of the formatting and updates the comment
listing instructions that still need to be implemented.

With this, all integer loads and stores that are defined in the Power
ISA v2.07 are emulated, except for those that are permitted to trap
when used on cache-inhibited or write-through mappings (and which do
in fact trap on POWER8), that is, lmw/stmw, lswi/stswi, lswx/stswx,
lq/stq, and l[bhwdq]arx/st[bhwdq]cx.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/ppc-opcode.h |   5 ++
 arch/powerpc/kvm/emulate_loadstore.c  | 135 ++
 2 files changed, 91 insertions(+), 49 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 94e7df2..738bac1 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -96,6 +96,8 @@
 #define OP_31_XOP_LBZX  87
 #define OP_31_XOP_STDX  149
 #define OP_31_XOP_STWX  151
+#define OP_31_XOP_STDUX 181
+#define OP_31_XOP_STWUX 183
 #define OP_31_XOP_STBX  215
 #define OP_31_XOP_LBZUX 119
 #define OP_31_XOP_STBUX 247
@@ -104,13 +106,16 @@
 #define OP_31_XOP_MFSPR 339
 #define OP_31_XOP_LWAX  341
 #define OP_31_XOP_LHAX  343
+#define OP_31_XOP_LWAUX 373
 #define OP_31_XOP_LHAUX 375
 #define OP_31_XOP_STHX  407
 #define OP_31_XOP_STHUX 439
 #define OP_31_XOP_MTSPR 467
 #define OP_31_XOP_DCBI  470
+#define OP_31_XOP_LDBRX 532
 #define OP_31_XOP_LWBRX 534
 #define OP_31_XOP_TLBSYNC   566
+#define OP_31_XOP_STDBRX660
 #define OP_31_XOP_STWBRX662
 #define OP_31_XOP_STFSX663
 #define OP_31_XOP_STFSUX695
diff --git a/arch/powerpc/kvm/emulate_loadstore.c b/arch/powerpc/kvm/emulate_loadstore.c
index a0f27a3..f10ba0c 100644
--- a/arch/powerpc/kvm/emulate_loadstore.c
+++ b/arch/powerpc/kvm/emulate_loadstore.c
@@ -58,18 +58,14 @@ static bool kvmppc_check_vsx_disabled(struct kvm_vcpu *vcpu)
 }
 #endif /* CONFIG_VSX */
 
-/* XXX to do:
- * lhax
- * lhaux
- * lswx
- * lswi
- * stswx
- * stswi
- * lha
- * lhau
- * lmw
- * stmw
+/*
+ * XXX to do:
+ * lfiwax, lfiwzx
+ * vector loads and stores
  *
+ * Instructions that trap when used on cache-inhibited mappings
+ * are not emulated here: multiple and string instructions,
+ * lq/stq, and the load-reserve/store-conditional instructions.
  */
 int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 {
@@ -110,6 +106,11 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
emulated = kvmppc_handle_load(run, vcpu, rt, 4, 1);
break;
 
+   case OP_31_XOP_LWZUX:
+   emulated = kvmppc_handle_load(run, vcpu, rt, 4, 1);
+   kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed);
+   break;
+
case OP_31_XOP_LBZX:
emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1);
break;
@@ -121,26 +122,34 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 
case OP_31_XOP_STDX:
emulated = kvmppc_handle_store(run, vcpu,
-  kvmppc_get_gpr(vcpu, rs),
-   8, 1);
+   kvmppc_get_gpr(vcpu, rs), 8, 1);
+   break;
+
+   case OP_31_XOP_STDUX:
+   emulated = kvmppc_handle_store(run, vcpu,
+   kvmppc_get_gpr(vcpu, rs), 8, 1);
+   kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed);
break;
 
case OP_31_XOP_STWX:
emulated = kvmppc_handle_store(run, vcpu,
-  kvmppc_get_gpr(vcpu, rs),
-  4, 1);
+   kvmppc_get_gpr(vcpu, rs), 4, 1);
+   break;
+
+   case OP_31_XOP_STWUX:
+   emulated = kvmppc_handle_store(run, vcpu,
+   kvmppc_get_gpr(vcpu, rs), 4, 1);
+   kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed);
break;
 
case OP_31_XOP_STBX:
emulated = kvmppc_handle_store(run, vcpu,
-  kvmppc_get_gpr(vcpu, rs),
-  1, 1);
+   kvmppc_get_gpr(vcpu, rs), 1, 1);
break;

[PATCH 2/5] KVM: PPC: Book3S: Add MMIO emulation for FP and VSX instructions

2017-03-22 Thread Paul Mackerras
From: Bin Lu 

This patch provides the MMIO load/store emulation for instructions
operating on the following types: double, vector unsigned char, vector
signed char, vector unsigned short, vector signed short, vector
unsigned int, vector signed int and vector double.

The instructions that this adds emulation for are:

- ldx, ldux, lwax,
- lfs, lfsx, lfsu, lfsux, lfd, lfdx, lfdu, lfdux,
- stfs, stfsx, stfsu, stfsux, stfd, stfdx, stfdu, stfdux, stfiwx,
- lxsdx, lxsspx, lxsiwax, lxsiwzx, lxvd2x, lxvw4x, lxvdsx,
- stxsdx, stxsspx, stxsiwx, stxvd2x, stxvw4x

[pau...@ozlabs.org - some cleanups, fixes and rework, make it
 compile for Book E]

Signed-off-by: Bin Lu 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/disassemble.h |   5 +
 arch/powerpc/include/asm/kvm_host.h|  23 +++
 arch/powerpc/include/asm/kvm_ppc.h |   7 +
 arch/powerpc/include/asm/ppc-opcode.h  |  50 +
 arch/powerpc/kvm/Makefile  |   2 +-
 arch/powerpc/kvm/emulate_loadstore.c   | 335 -
 arch/powerpc/kvm/powerpc.c | 317 ++-
 7 files changed, 731 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/disassemble.h b/arch/powerpc/include/asm/disassemble.h
index 4852e84..c0a5505 100644
--- a/arch/powerpc/include/asm/disassemble.h
+++ b/arch/powerpc/include/asm/disassemble.h
@@ -87,6 +87,11 @@ static inline unsigned int get_oc(u32 inst)
return (inst >> 11) & 0x7fff;
 }
 
+static inline unsigned int get_tx_or_sx(u32 inst)
+{
+   return (inst) & 0x1;
+}
+
 #define IS_XFORM(inst) (get_op(inst)  == 31)
 #define IS_DSFORM(inst)(get_op(inst) >= 56)
 
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 7bba8f4..201438b 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -441,6 +441,11 @@ struct mmio_hpte_cache {
unsigned int index;
 };
 
+#define KVMPPC_VSX_COPY_NONE   0
+#define KVMPPC_VSX_COPY_WORD   1
+#define KVMPPC_VSX_COPY_DWORD  2
+#define KVMPPC_VSX_COPY_DWORD_LOAD_DUMP3
+
 struct openpic;
 
 struct kvm_vcpu_arch {
@@ -644,6 +649,21 @@ struct kvm_vcpu_arch {
u8 io_gpr; /* GPR used as IO source/target */
u8 mmio_host_swabbed;
u8 mmio_sign_extend;
+   /* conversion between single and double precision */
+   u8 mmio_sp64_extend;
+   /*
+* Number of simulations for vsx.
+* If we use 2*8bytes to simulate 1*16bytes,
+* then the number should be 2 and
+* mmio_vsx_copy_type=KVMPPC_VSX_COPY_DWORD.
+* If we use 4*4bytes to simulate 1*16bytes,
+* the number should be 4 and
+* mmio_vsx_copy_type=KVMPPC_VSX_COPY_WORD.
+*/
+   u8 mmio_vsx_copy_nums;
+   u8 mmio_vsx_offset;
+   u8 mmio_vsx_copy_type;
+   u8 mmio_vsx_tx_sx_enabled;
u8 osi_needed;
u8 osi_enabled;
u8 papr_enabled;
@@ -732,6 +752,8 @@ struct kvm_vcpu_arch {
 };
 
 #define VCPU_FPR(vcpu, i)  (vcpu)->arch.fp.fpr[i][TS_FPROFFSET]
+#define VCPU_VSX_FPR(vcpu, i, j)   ((vcpu)->arch.fp.fpr[i][j])
+#define VCPU_VSX_VR(vcpu, i)   ((vcpu)->arch.vr.vr[i])
 
 /* Values for vcpu->arch.state */
 #define KVMPPC_VCPU_NOTREADY   0
@@ -745,6 +767,7 @@ struct kvm_vcpu_arch {
 #define KVM_MMIO_REG_FPR   0x0020
 #define KVM_MMIO_REG_QPR   0x0040
 #define KVM_MMIO_REG_FQPR  0x0060
+#define KVM_MMIO_REG_VSX   0x0080
 
 #define __KVM_HAVE_ARCH_WQP
 #define __KVM_HAVE_CREATE_DEVICE
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 4f1f22f..bbecec4 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -78,9 +78,15 @@ extern int kvmppc_handle_load(struct kvm_run *run, struct kvm_vcpu *vcpu,
 extern int kvmppc_handle_loads(struct kvm_run *run, struct kvm_vcpu *vcpu,
unsigned int rt, unsigned int bytes,
   int is_default_endian);
+extern int kvmppc_handle_vsx_load(struct kvm_run *run, struct kvm_vcpu *vcpu,
+   unsigned int rt, unsigned int bytes,
+   int is_default_endian, int mmio_sign_extend);
 extern int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu,
   u64 val, unsigned int bytes,
   int is_default_endian);
+extern int kvmppc_handle_vsx_store(struct kvm_run *run, struct kvm_vcpu *vcpu,
+   int rs, unsigned int bytes,
+   int is_default_endian);
 
 extern int kvmppc_load_last_inst(struct kvm_vcpu *vcpu,
 enum instruction_type type, u32 *inst);
@@ -243,6 +249,7 @@ union kvmppc_one_reg {
u64 dval;
vector128 vval;
u64 vsxval[2];
+   u32 vsx32val[4];

[PATCH 3/5] KVM: PPC: Add MMIO emulation for stdx (store doubleword indexed)

2017-03-22 Thread Paul Mackerras
From: Alexey Kardashevskiy 

This adds missing stdx emulation for emulated MMIO accesses by KVM
guests.  This allows the Mellanox mlx5_core driver from recent kernels
to work when MMIO emulation is enforced by userspace.

Signed-off-by: Alexey Kardashevskiy 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/ppc-opcode.h | 1 +
 arch/powerpc/kvm/emulate_loadstore.c  | 6 ++
 2 files changed, 7 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 1e37c3c..94e7df2 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -94,6 +94,7 @@
 #define OP_31_XOP_TRAP_64   68
 #define OP_31_XOP_DCBF  86
 #define OP_31_XOP_LBZX  87
+#define OP_31_XOP_STDX  149
 #define OP_31_XOP_STWX  151
 #define OP_31_XOP_STBX  215
 #define OP_31_XOP_LBZUX 119
diff --git a/arch/powerpc/kvm/emulate_loadstore.c b/arch/powerpc/kvm/emulate_loadstore.c
index 9cda1b9..a0f27a3 100644
--- a/arch/powerpc/kvm/emulate_loadstore.c
+++ b/arch/powerpc/kvm/emulate_loadstore.c
@@ -119,6 +119,12 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed);
break;
 
+   case OP_31_XOP_STDX:
+   emulated = kvmppc_handle_store(run, vcpu,
+  kvmppc_get_gpr(vcpu, rs),
+   8, 1);
+   break;
+
case OP_31_XOP_STWX:
emulated = kvmppc_handle_store(run, vcpu,
   kvmppc_get_gpr(vcpu, rs),
-- 
2.7.4



[PATCH 1/5] KVM: PPC: Provide functions for queueing up FP/VEC/VSX unavailable interrupts

2017-03-22 Thread Paul Mackerras
This provides functions that can be used for generating interrupts
indicating that a given functional unit (floating point, vector, or
VSX) is unavailable.  These functions will be used in instruction
emulation code.
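
For context, the emulation code added later in this series gates each
floating-point access on a check along these lines (a sketch of the
pattern only, not the exact patch text):

	#ifdef CONFIG_PPC_FPU
	static bool kvmppc_check_fp_disabled(struct kvm_vcpu *vcpu)
	{
		if (!(kvmppc_get_msr(vcpu) & MSR_FP)) {
			kvmppc_core_queue_fpunavail(vcpu);
			return true;
		}
		return false;
	}
	#endif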

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_ppc.h |  3 +++
 arch/powerpc/kvm/book3s.c  | 18 ++
 arch/powerpc/kvm/booke.c   |  5 +
 3 files changed, 26 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index dd11c4c..4f1f22f 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -132,6 +132,9 @@ extern void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu);
 extern int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu);
 extern int kvmppc_core_pending_dec(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags);
+extern void kvmppc_core_queue_fpunavail(struct kvm_vcpu *vcpu);
+extern void kvmppc_core_queue_vec_unavail(struct kvm_vcpu *vcpu);
+extern void kvmppc_core_queue_vsx_unavail(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_dec(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_dequeue_dec(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index b6b5c18..0ff0d07 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -197,6 +197,24 @@ void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
 }
 EXPORT_SYMBOL_GPL(kvmppc_core_queue_program);
 
+void kvmppc_core_queue_fpunavail(struct kvm_vcpu *vcpu)
+{
+   /* might as well deliver this straight away */
+   kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_FP_UNAVAIL, 0);
+}
+
+void kvmppc_core_queue_vec_unavail(struct kvm_vcpu *vcpu)
+{
+   /* might as well deliver this straight away */
+   kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_ALTIVEC, 0);
+}
+
+void kvmppc_core_queue_vsx_unavail(struct kvm_vcpu *vcpu)
+{
+   /* might as well deliver this straight away */
+   kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_VSX, 0);
+}
+
 void kvmppc_core_queue_dec(struct kvm_vcpu *vcpu)
 {
kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_DECREMENTER);
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 0514cbd..3c296c2 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -300,6 +300,11 @@ void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong esr_flags)
kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_PROGRAM);
 }
 
+void kvmppc_core_queue_fpunavail(struct kvm_vcpu *vcpu)
+{
+   kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_FP_UNAVAIL);
+}
+
 void kvmppc_core_queue_dec(struct kvm_vcpu *vcpu)
 {
kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DECREMENTER);
-- 
2.7.4



[PATCH 0/5] KVM: PPC: Improve MMIO emulation

2017-03-22 Thread Paul Mackerras
Guests accessing emulated MMIO can do so using a wide variety of load
and store instructions on PPC machines.  However, KVM currently only
knows about a subset of the load and store instructions available.
This patchset expands the set of load and store instructions that KVM
can emulate to include all of the integer loads and stores (except
those that trap when used on cache-inhibited mappings), all of the
floating-point loads and stores and all of the VSX loads and stores
defined in the Power ISA v2.07B (which is the architecture for
POWER8).

This does not implement Altivec/VMX loads and stores or the new loads
and stores defined in Power ISA v3.00.

This patch set is against v4.11-rc3.

---

 arch/powerpc/include/asm/disassemble.h |   5 +
 arch/powerpc/include/asm/kvm_host.h|  23 ++
 arch/powerpc/include/asm/kvm_ppc.h |  10 +
 arch/powerpc/include/asm/ppc-opcode.h  |  58 
 arch/powerpc/kvm/Makefile  |   2 +-
 arch/powerpc/kvm/book3s.c  |  18 ++
 arch/powerpc/kvm/booke.c   |   5 +
 arch/powerpc/kvm/emulate_loadstore.c   | 472 ++---
 arch/powerpc/kvm/powerpc.c | 317 +-
 9 files changed, 862 insertions(+), 48 deletions(-)




Re: [PATCH] powerpc/powernv/cpuidle: Pass correct drv->cpumask for registration

2017-03-22 Thread Vaidyanathan Srinivasan
* Michael Ellerman  [2017-03-22 21:55:50]:

> Vaidyanathan Srinivasan  writes:
> > * Michael Ellerman  [2017-03-20 14:05:39]:
> >> Vaidyanathan Srinivasan  writes:
> >  
> >> > On powernv platform cpu_present could be less than cpu_possible
> >> > in cases where firmware detects the cpu, but it is not available
> >> > for OS.
> >> 
> >> It's entirely normal for present < possible, on my laptop for example,
> >> so I don't see how that causes the bug.
> >
> > Yes, present < possible is in itself not a problem.  It is whether a
> > cpu_device exists for that cpu or not.
> ...
> >
> > Currently if CONFIG_HOTPLUG_CPU=n, then we skip calling register_cpu()
> > and that causes the problem.
> ...
> >> 
> >> I really don't understand how a CPU not being present leads to a crash
> >> in printf()? Something in that call chain should have checked that the
> >> CPU was registered before crashing in printf() - surely?
> >
> > Yes, we should have just failed to register the cpuidle driver.  I have
> > the fix here:
> >
> > [PATCH] cpuidle: Validate cpu_dev in cpuidle_add_sysfs
> > http://patchwork.ozlabs.org/patch/740634/
> 
> OK. Can you send a v2 of this with a better change log that includes all
> the clarifications above.
> 
> And despite your subject being powerpc/powernv/cpuidle, this is a
> cpuidle patch. I can merge it, but I at least need you to Cc the cpuidle
> maintainers so they have a chance to see it.

Thanks for the review, I will post a v2 with more detailed commit log
and CC cpuidle maintainers and linux-pm.

--Vaidy



Re: [PATCH 4/5] powerpc/smp: add cpu_cache_mask

2017-03-22 Thread Oliver O'Halloran
On Wed, Mar 15, 2017 at 10:26 PM, Michael Ellerman  wrote:
> Oliver O'Halloran  writes:
>
>> Traditionally we have only ever tracked which CPUs are in the same core
>> (cpu_sibling_mask) and on the same die (cpu_core_mask). For Power9 we
>> need to be aware of which CPUs share cache with each other so this patch
>> adds cpu_cache_mask and the underlying cpu_cache_map variable to track
>> this.
>
> But which cache?

I'm not sure it matters. All the scheduler really wants to know is
that migrating between cpus with a shared cache is cheaper than
migrating elsewhere.

> Some CPUs on Power8 share L3, or L4.

Eh... it's not really the same. The "L4" is part of the memory buffers
and its function is conceptually different to the processor caches.
The L3 on P8 is only shared when the core that owns it is offline (or
sleeping), so the scheduler doesn't really need to be aware of it. Even
if the scheduler was aware, I don't think it can take advantage of it
without some terrible hacks.

>
> I think just call it cpu_l2cache_map to make it explicit.

I was being deliberately vague. I know it's only a shared L2 currently,
but it's possible we might have a (real) shared L3 in the future. The
latest high-end x86 chips have some L3 sharing across the entire
chip, so you never know. I'm not particularly attached to the name
though, so I'll rename it if you really want.

Oliver


[PATCH v3 0/6] powerpc/perf: Export memory hierarchy level

2017-03-22 Thread Madhavan Srinivasan
The Power8/Power9 Performance Monitoring Unit (PMU) supports
different sampling modes (SM) such as Random Instruction
Sampling (RIS), Random Load/Store Facility Sampling (RLS)
and Random Branch Sampling (RBS). Sampling mode RLS updates the
Sampled Instruction Event Register [SIER] bits with memory
hierarchy information for a cache reload. This patchset exports
the hierarchy information to the user via the perf_mem_data_src
object from SIER.

Patchset is a rebase of the work posted previously with minor
updates to it.

https://lkml.org/lkml/2015/6/11/92

Changelog v2:
-Updated the commit messages
-Fixed isa207_find_source() to consider all the possible sier[ldst] values.

Changelog v1:
- Fixed authorship for the first patch and added suka's "Signed-off-by:".

Madhavan Srinivasan (5):
  powerpc/perf: Export memory hierarchy info to user space
  powerpc/perf: Support to export MMCRA[TEC*] field to userspace
  powerpc/perf: Support to export SIERs bit in Power8
  powerpc/perf: Support to export SIERs bit in Power9
  powerpc/perf: Add Power8 mem_access event to sysfs

Sukadev Bhattiprolu (1):
  powerpc/perf: Define big-endian version of perf_mem_data_src

 arch/powerpc/include/asm/perf_event_server.h |  3 +
 arch/powerpc/perf/core-book3s.c  |  8 +++
 arch/powerpc/perf/isa207-common.c| 88 
 arch/powerpc/perf/isa207-common.h| 26 +++-
 arch/powerpc/perf/power8-events-list.h   |  6 ++
 arch/powerpc/perf/power8-pmu.c   |  4 ++
 arch/powerpc/perf/power9-pmu.c   |  2 +
 include/uapi/linux/perf_event.h  | 16 +
 tools/include/uapi/linux/perf_event.h| 16 +
 9 files changed, 168 insertions(+), 1 deletion(-)

-- 
2.7.4



[PATCH v3 6/6] powerpc/perf: Add Power8 mem_access event to sysfs

2017-03-22 Thread Madhavan Srinivasan
This patch adds a "mem_access" event to sysfs. It is not a raw event
supported by the Power8 PMU as-is; instead, it is formed based on the
raw event encoding specified in isa207-common.h.

The primary PMU event used here is PM_MRK_INST_CMPL.
This event tracks only completed marked instructions.

Random sampling mode (MMCRA[SM]) with Random Instruction
Sampling (RIS) is enabled to mark the type of instructions.

With random sampling in RLS mode and the PM_MRK_INST_CMPL event,
the LDST/DATA_SRC fields in SIER identify the memory hierarchy
level (e.g. L1, L2) that satisfied a data-cache miss for a marked
instruction.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Sukadev Bhattiprolu 
Cc: Daniel Axtens 
Cc: Andrew Donnellan 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Stephane Eranian 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/power8-events-list.h | 6 ++
 arch/powerpc/perf/power8-pmu.c | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/arch/powerpc/perf/power8-events-list.h b/arch/powerpc/perf/power8-events-list.h
index 3a2e6e8ebb92..0f1d184627cc 100644
--- a/arch/powerpc/perf/power8-events-list.h
+++ b/arch/powerpc/perf/power8-events-list.h
@@ -89,3 +89,9 @@ EVENT(PM_MRK_FILT_MATCH,  0x2013c)
 EVENT(PM_MRK_FILT_MATCH_ALT,   0x3012e)
 /* Alternate event code for PM_LD_MISS_L1 */
 EVENT(PM_LD_MISS_L1_ALT,   0x400f0)
+/*
+ * Memory Access Event -- mem_access
+ * Primary PMU event used here is PM_MRK_INST_CMPL, along with
+ * Random Load/Store Facility Sampling (RLS) in Random sampling mode (MMCRA[SM]).
+ */
+EVENT(MEM_ACCESS,  0x10401e0)
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 932d7536f0eb..5463516e369b 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -90,6 +90,7 @@ GENERIC_EVENT_ATTR(branch-instructions,   PM_BRU_FIN);
 GENERIC_EVENT_ATTR(branch-misses,  PM_BR_MPRED_CMPL);
 GENERIC_EVENT_ATTR(cache-references,   PM_LD_REF_L1);
 GENERIC_EVENT_ATTR(cache-misses,   PM_LD_MISS_L1);
+GENERIC_EVENT_ATTR(mem_access, MEM_ACCESS);
 
 CACHE_EVENT_ATTR(L1-dcache-load-misses,PM_LD_MISS_L1);
 CACHE_EVENT_ATTR(L1-dcache-loads,  PM_LD_REF_L1);
@@ -120,6 +121,7 @@ static struct attribute *power8_events_attr[] = {
GENERIC_EVENT_PTR(PM_BR_MPRED_CMPL),
GENERIC_EVENT_PTR(PM_LD_REF_L1),
GENERIC_EVENT_PTR(PM_LD_MISS_L1),
+   GENERIC_EVENT_PTR(MEM_ACCESS),
 
CACHE_EVENT_PTR(PM_LD_MISS_L1),
CACHE_EVENT_PTR(PM_LD_REF_L1),
-- 
2.7.4



[PATCH v3 5/6] powerpc/perf: Support to export SIERs bit in Power9

2017-03-22 Thread Madhavan Srinivasan
Patch to export SIER bits to userspace via
perf_mem_data_src and perf_sample_data struct.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Sukadev Bhattiprolu 
Cc: Daniel Axtens 
Cc: Andrew Donnellan 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Stephane Eranian 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/power9-pmu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index 7f6582708e06..018f8e90ac35 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -427,6 +427,8 @@ static struct power_pmu power9_pmu = {
.bhrb_filter_map= power9_bhrb_filter_map,
.get_constraint = isa207_get_constraint,
.get_alternatives   = power9_get_alternatives,
+   .get_mem_data_src   = isa207_get_mem_data_src,
+   .get_mem_weight = isa207_get_mem_weight,
.disable_pmc= isa207_disable_pmc,
.flags  = PPMU_HAS_SIER | PPMU_ARCH_207S,
.n_generic  = ARRAY_SIZE(power9_generic_events),
-- 
2.7.4



[PATCH v3 3/6] powerpc/perf: Support to export MMCRA[TEC*] field to userspace

2017-03-22 Thread Madhavan Srinivasan
When the threshold feature is used with MMCRA[Threshold Event Counter
Event], MMCRA[Threshold Start Event] and MMCRA[Threshold End Event],
the MMCRA[Threshold Event Counter Exponent] and MMCRA[Threshold Event
Counter Multiplier] fields are updated with the corresponding threshold
event count values. This patch exports MMCRA[TECX/TECM] to userspace in
the 'weight' field of struct perf_sample_data.
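
The exponent scales the multiplier in steps of four, i.e.
weight = multiplier << (2 * exponent), as isa207_get_mem_weight()
below implements. For illustration, with hypothetical field values
(not taken from the patch):

	u64 exp      = 2;	/* MMCRA[Threshold Event Counter Exponent] */
	u64 mantissa = 0x20;	/* MMCRA[Threshold Event Counter Multiplier] */
	u64 weight   = mantissa << (2 * exp);	/* 0x20 << 4 = 0x200 = 512 */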

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Thomas Gleixner 
Cc: Sebastian Andrzej Siewior 
Cc: Anna-Maria Gleixner 
Cc: Daniel Axtens 
Cc: Sukadev Bhattiprolu 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Stephane Eranian 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/perf_event_server.h |  1 +
 arch/powerpc/perf/core-book3s.c  |  4 
 arch/powerpc/perf/isa207-common.c|  8 
 arch/powerpc/perf/isa207-common.h| 10 ++
 4 files changed, 23 insertions(+)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index 446cdcd9b7f5..723bf48e7494 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -40,6 +40,7 @@ struct power_pmu {
u64 alt[]);
void(*get_mem_data_src)(union perf_mem_data_src *dsrc,
u32 flags, struct pt_regs *regs);
+   void(*get_mem_weight)(u64 *weight);
u64 (*bhrb_filter_map)(u64 branch_sample_type);
void(*config_bhrb)(u64 pmu_bhrb_filter);
void(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index e241ebebab6f..6c2d4168daec 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2053,6 +2053,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
ppmu->get_mem_data_src)
ppmu->get_mem_data_src(&data.data_src, ppmu->flags, regs);
 
+   if (event->attr.sample_type & PERF_SAMPLE_WEIGHT &&
+   ppmu->get_mem_weight)
+   ppmu->get_mem_weight(&data.weight);
+
if (perf_event_overflow(event, &data, regs))
power_pmu_stop(event, 0);
}
diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index 41cc053ee692..292f6a242bb4 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -227,6 +227,14 @@ void isa207_get_mem_data_src(union perf_mem_data_src 
*dsrc, u32 flags,
}
 }
 
+void isa207_get_mem_weight(u64 *weight)
+{
+   u64 mmcra = mfspr(SPRN_MMCRA);
+   u64 exp = MMCRA_THR_CTR_EXP(mmcra);
+   u64 mantissa = MMCRA_THR_CTR_MANT(mmcra);
+
+   *weight = mantissa << (2 * exp);
+}
 
 int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
 {
diff --git a/arch/powerpc/perf/isa207-common.h b/arch/powerpc/perf/isa207-common.h
index 592aa0917cf3..23e0516df4a4 100644
--- a/arch/powerpc/perf/isa207-common.h
+++ b/arch/powerpc/perf/isa207-common.h
@@ -248,6 +248,15 @@
 #define MMCRA_SDAR_MODE_TLB(1ull << MMCRA_SDAR_MODE_SHIFT)
 #define MMCRA_SDAR_MODE_NO_UPDATES ~(0x3ull << MMCRA_SDAR_MODE_SHIFT)
 #define MMCRA_IFM_SHIFT30
+#define MMCRA_THR_CTR_MANT_SHIFT   19
+#define MMCRA_THR_CTR_MANT_MASK0x7Ful
+#define MMCRA_THR_CTR_MANT(v)  (((v) >> MMCRA_THR_CTR_MANT_SHIFT) &\
+   MMCRA_THR_CTR_MANT_MASK)
+
+#define MMCRA_THR_CTR_EXP_SHIFT27
+#define MMCRA_THR_CTR_EXP_MASK 0x7ul
+#define MMCRA_THR_CTR_EXP(v)   (((v) >> MMCRA_THR_CTR_EXP_SHIFT) &\
+   MMCRA_THR_CTR_EXP_MASK)
 
 /* MMCR1 Threshold Compare bit constant for power9 */
 #define p9_MMCRA_THR_CMP_SHIFT 45
@@ -282,5 +291,6 @@ int isa207_get_alternatives(u64 event, u64 alt[],
const unsigned int ev_alt[][MAX_ALT], int size);
 void isa207_get_mem_data_src(union perf_mem_data_src *dsrc, u32 flags,
struct pt_regs *regs);
+void isa207_get_mem_weight(u64 *weight);
 
 #endif
-- 
2.7.4



[PATCH v3 2/6] powerpc/perf: Export memory hierarchy info to user space

2017-03-22 Thread Madhavan Srinivasan
The LDST and DATA_SRC fields in SIER identify the memory hierarchy
level (e.g. L1, L2), from which a data-cache miss for a marked
instruction was satisfied. Use the 'perf_mem_data_src' object to
export this hierarchy level to user space.
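
For example, using the PERF_MEM_S() helper from the uapi header (an
illustrative sketch, not code from this patch), a marked load that
hit in the L2 could be reported as:

	dsrc->val = PERF_MEM_S(OP, LOAD) | PERF_MEM_S(LVL, HIT) |
		    PERF_MEM_S(LVL, L2);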

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Thomas Gleixner 
Cc: Sebastian Andrzej Siewior 
Cc: Anna-Maria Gleixner 
Cc: Daniel Axtens 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Stephane Eranian 
Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Madhavan Srinivasan 
---
Changelog v2:
- Fixed isa207_find_source() to consider all the possible sier[ldst] values.


 arch/powerpc/include/asm/perf_event_server.h |  2 +
 arch/powerpc/perf/core-book3s.c  |  4 ++
 arch/powerpc/perf/isa207-common.c| 80 
 arch/powerpc/perf/isa207-common.h| 16 +-
 4 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index ae0a23091a9b..446cdcd9b7f5 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -38,6 +38,8 @@ struct power_pmu {
unsigned long *valp);
int (*get_alternatives)(u64 event_id, unsigned int flags,
u64 alt[]);
+   void(*get_mem_data_src)(union perf_mem_data_src *dsrc,
+   u32 flags, struct pt_regs *regs);
u64 (*bhrb_filter_map)(u64 branch_sample_type);
void(*config_bhrb)(u64 pmu_bhrb_filter);
void(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 2ff13249f87a..e241ebebab6f 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2049,6 +2049,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
data.br_stack = &cpuhw->bhrb_stack;
}
 
+   if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
+   ppmu->get_mem_data_src)
+   ppmu->get_mem_data_src(&data.data_src, ppmu->flags, regs);
+
if (perf_event_overflow(event, &data, regs))
power_pmu_stop(event, 0);
}
diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index cd951fd231c4..41cc053ee692 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -148,6 +148,86 @@ static bool is_thresh_cmp_valid(u64 event)
return true;
 }
 
+static inline u64 isa207_find_source(u64 idx, u32 sub_idx)
+{
+   u64 ret = PERF_MEM_NA;
+
+   switch(idx) {
+   case 0:
+   /* Nothing to do */
+   break;
+   case 1:
+   ret = PLH(LVL, L1);
+   break;
+   case 2:
+   ret = PLH(LVL, L2);
+   break;
+   case 3:
+   ret = PLH(LVL, L3);
+   break;
+   case 4:
+   if (sub_idx <= 1)
+   ret = PLH(LVL, LOC_RAM);
+   else if (sub_idx > 1 && sub_idx <= 2)
+   ret = PLH(LVL, REM_RAM1);
+   else
+   ret = PLH(LVL, REM_RAM2);
+   ret |= P(SNOOP, HIT);
+   break;
+   case 5:
+   ret = PLH(LVL, REM_CCE1);
+   if ((sub_idx == 0) || (sub_idx == 2) || (sub_idx == 4))
+   ret |= P(SNOOP, HIT);
+   else if ((sub_idx == 1) || (sub_idx == 3) || (sub_idx == 5))
+   ret |= P(SNOOP, HITM);
+   break;
+   case 6:
+   ret = PLH(LVL, REM_CCE2);
+   if ((sub_idx == 0) || (sub_idx == 2))
+   ret |= P(SNOOP, HIT);
+   else if ((sub_idx == 1) || (sub_idx == 3))
+   ret |= P(SNOOP, HITM);
+   break;
+   case 7:
+   ret = PSM(LVL, L1);
+   break;
+   }
+
+   return ret;
+}
+
+static inline bool is_load_store_inst(u64 sier)
+{
+   u64 val;
+   val = (sier & ISA207_SIER_TYPE_MASK) >> ISA207_SIER_TYPE_SHIFT;
+
+   /* 1 = load, 2 = store */
+   return val == 1 || val == 2;
+}
+
+void isa207_get_mem_data_src(union perf_mem_data_src *dsrc, u32 flags,
+   struct pt_regs *regs)
+{
+ 

[PATCH v3 4/6] powerpc/perf: Support to export SIERs bit in Power8

2017-03-22 Thread Madhavan Srinivasan
Patch to export SIER bits to userspace via
perf_mem_data_src and perf_sample_data struct.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Daniel Axtens 
Cc: Andrew Donnellan 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Stephane Eranian 
Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/power8-pmu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index ce15b19a7962..932d7536f0eb 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -325,6 +325,8 @@ static struct power_pmu power8_pmu = {
.bhrb_filter_map= power8_bhrb_filter_map,
.get_constraint = isa207_get_constraint,
.get_alternatives   = power8_get_alternatives,
+   .get_mem_data_src   = isa207_get_mem_data_src,
+   .get_mem_weight = isa207_get_mem_weight,
.disable_pmc= isa207_disable_pmc,
.flags  = PPMU_HAS_SIER | PPMU_ARCH_207S,
.n_generic  = ARRAY_SIZE(power8_generic_events),
-- 
2.7.4



[PATCH v3 1/6] powerpc/perf: Define big-endian version of perf_mem_data_src

2017-03-22 Thread Madhavan Srinivasan
From: Sukadev Bhattiprolu 

perf_mem_data_src is a union that is initialized via the ->val field
and accessed via the bitmap fields. For this to work on big-endian
platforms (which is broken now), we also need a big-endian
representation of perf_mem_data_src. That is, on a big-endian system,
a user requesting PERF_SAMPLE_DATA_SRC (perf report -d) will get the
default value from perf_sample_data_init(), which is PERF_MEM_NA. The
value for PERF_MEM_NA is constructed using shifts:

  /* TLB access */
  #define PERF_MEM_TLB_NA   0x01 /* not available */
  ...
  #define PERF_MEM_TLB_SHIFT26

  #define PERF_MEM_S(a, s) \
(((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)

  #define PERF_MEM_NA (PERF_MEM_S(OP, NA)   |\
PERF_MEM_S(LVL, NA)   |\
PERF_MEM_S(SNOOP, NA) |\
PERF_MEM_S(LOCK, NA)  |\
PERF_MEM_S(TLB, NA))

Which works out as:

  ((0x01 << 0) | (0x01 << 5) | (0x01 << 19) | (0x01 << 24) | (0x01 << 26))

Which means the PERF_MEM_NA value comes out of the kernel as 0x5080021
in CPU endian.
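
The sum can be sanity-checked with a small standalone program (not
part of the patch):

	#include <assert.h>

	int main(void)
	{
		unsigned long val = (0x01UL << 0) | (0x01UL << 5) |
				    (0x01UL << 19) | (0x01UL << 24) |
				    (0x01UL << 26);

		assert(val == 0x5080021);	/* the value quoted above */
		return 0;
	}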

But then in the perf tool, the code uses the bitfields to inspect the
value, and currently the bitfields are defined using little endian
ordering.

So eg. in perf_mem__tlb_scnprintf() we see:
  data_src->val = 0x5080021
 op = 0x0
lvl = 0x0
  snoop = 0x0
   lock = 0x0
   dtlb = 0x0
   rsvd = 0x5080021

Patch does a minimal fix of adding big endian definition of the bitfields
to match the values that are already exported by the kernel on big endian.
And it makes no change on little endian.

Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Stephane Eranian 
Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Madhavan Srinivasan 
---
Changelog v2:
-Added Michael Ellerman's explanation to commit message.
Changelog v1:
-Fixed authorship and added suka's "Signed-off-by:".

 include/uapi/linux/perf_event.h   | 16 
 tools/include/uapi/linux/perf_event.h | 16 
 2 files changed, 32 insertions(+)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c66a485a24ac..c4af1159a200 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -891,6 +891,7 @@ enum perf_callchain_context {
#define PERF_FLAG_PID_CGROUP   (1UL << 2) /* pid=cgroup id, per-cpu mode only */
 #define PERF_FLAG_FD_CLOEXEC   (1UL << 3) /* O_CLOEXEC */
 
+#if defined(__LITTLE_ENDIAN_BITFIELD)
 union perf_mem_data_src {
__u64 val;
struct {
@@ -902,6 +903,21 @@ union perf_mem_data_src {
mem_rsvd:31;
};
 };
+#elif defined(__BIG_ENDIAN_BITFIELD)
+union perf_mem_data_src {
+   __u64 val;
+   struct {
+   __u64   mem_rsvd:31,
+   mem_dtlb:7, /* tlb access */
+   mem_lock:2, /* lock instr */
+   mem_snoop:5,/* snoop mode */
+   mem_lvl:14, /* memory hierarchy level */
+   mem_op:5;   /* type of opcode */
+   };
+};
+#else
+#error "Unknown endianness"
+#endif
 
 /* type of opcode (load/store/prefetch,code) */
 #define PERF_MEM_OP_NA 0x01 /* not available */
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index c66a485a24ac..c4af1159a200 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -891,6 +891,7 @@ enum perf_callchain_context {
#define PERF_FLAG_PID_CGROUP   (1UL << 2) /* pid=cgroup id, per-cpu mode only */
 #define PERF_FLAG_FD_CLOEXEC   (1UL << 3) /* O_CLOEXEC */
 
+#if defined(__LITTLE_ENDIAN_BITFIELD)
 union perf_mem_data_src {
__u64 val;
struct {
@@ -902,6 +903,21 @@ union perf_mem_data_src {
mem_rsvd:31;
};
 };
+#elif defined(__BIG_ENDIAN_BITFIELD)
+union perf_mem_data_src {
+   __u64 val;
+   struct {
+   __u64   mem_rsvd:31,
+   mem_dtlb:7, /* tlb access */
+   mem_lock:2, /* lock instr */
+   mem_snoop:5,/* snoop mode */
+   mem_lvl:14, /* memory hierarchy level */
+   mem_op:5;   /* type of opcode */
+   };
+};
+#else
+#error "Unknown endianness"
+#endif
 
 /* type of opcode (load/store/prefetch,code) */
 #define PERF_MEM_OP_NA 0x01 /* not available */
-- 
2.7.4



Re: [PATCH 2/5] powerpc/smp: add set_cpus_related()

2017-03-22 Thread Oliver O'Halloran
On Wed, Mar 15, 2017 at 10:18 PM, Michael Ellerman  wrote:
> Oliver O'Halloran  writes:
>> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
>> index dfe0e1d9cd06..1c531887ca51 100644
>> --- a/arch/powerpc/kernel/smp.c
>> +++ b/arch/powerpc/kernel/smp.c
>> @@ -377,6 +377,25 @@ static void smp_store_cpu_info(int id)
>>  #endif
>>  }
>>
>> +/*
>> + * Relationships between CPUs are maintained in a set of per-cpu cpumasks. We
>> + * need to ensure that they are kept consistent between CPUs when they are
>> + * changed.
>> + *
>> + * This is slightly tricky since the core mask must be a strict superset of
>> + * the sibling mask.
>> + */
>> +static void set_cpus_related(int i, int j, bool related, struct cpumask *(*relation_fn)(int))
>> +{
>> + if (related) {
>> + cpumask_set_cpu(i, relation_fn(j));
>> + cpumask_set_cpu(j, relation_fn(i));
>> + } else {
>> + cpumask_clear_cpu(i, relation_fn(j));
>> + cpumask_clear_cpu(j, relation_fn(i));
>> + }
>> +}
>
> I think you pushed the abstraction one notch too far on this one, or
> perhaps not far enough.
>
> We end up with a function called "set" that might clear, depending on a
> bool you pass. Which is hard to parse, eg:
>
> set_cpus_related(cpu, base + i, false, cpu_sibling_mask);
>
> And I know there's two places where we pass an existing bool "add", but
> there's four where we pass true or false.

I think you're looking at this patch. With the full series applied we
never pass a literal to set_cpus_related() directly:

[12:14 oliver ~/.../powerpc/kernel (p9-sched $%)]$ gg set_cpus_related
smp.c:391:static void set_cpus_related(int i, int j, bool related, struct cpumask *(*relation_fn)(int))
smp.c:647:  set_cpus_related(cpu, cpu, add, cpu_core_mask);
smp.c:651:  set_cpus_related(cpu, i, add, cpu_core_mask);
smp.c:685:  set_cpus_related(cpu, cpu, onlining, mask_fn);
smp.c:697:  set_cpus_related(cpu, i, onlining, mask_fn);
smp.c:721:  set_cpus_related(cpu, base + i, onlining, cpu_sibling_mask);
smp.c:736:  set_cpus_related(cpu, cpu, onlining, cpu_core_mask);
smp.c:746:  set_cpus_related(cpu, i, onlining, cpu_core_mask);

I agree that set_cpus_related() is probably a bad name,
make_cpus_related() maybe?

>
> If we want to push it in that direction I think we should just pass the
> set/clear routine instead of the flag, so:
>
> do_cpus_related(cpu, base + i, cpumask_clear_cpu, cpu_sibling_mask);
>
> But that might be overdoing it.

I think this would be ok.
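
For concreteness, that variant might look like this (a sketch only,
not code from the series):

	static void do_cpus_related(int i, int j,
				    void (*op)(unsigned int, struct cpumask *),
				    struct cpumask *(*mask_fn)(int))
	{
		op(i, mask_fn(j));	/* op is cpumask_set_cpu or cpumask_clear_cpu */
		op(j, mask_fn(i));
	}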

>
> So I think we should just do:
>
> static void set_cpus_related(int i, int j, struct cpumask *(*mask_func)(int))
> {
> cpumask_set_cpu(i, mask_func(j));
> cpumask_set_cpu(j, mask_func(i));
> }
>
> static void clear_cpus_related(int i, int j, struct cpumask *(*mask_func)(int))
> {
> cpumask_clear_cpu(i, mask_func(j));
> cpumask_clear_cpu(j, mask_func(i));
> }
>
>
> So the cases with add become:
>
> if (add)
> set_cpus_related(cpu, i, cpu_core_mask(i));
> else
> clear_cpus_related(cpu, i, cpu_core_mask(i));

Dunno, I was trying to get rid of this sort of thing since the logic
is duplicated in a lot of places. Seemed to me that it was just
pointlessly verbose rather than being helpfully explicit.

>
> Which is not as pretty but more explicit.
>
> And the other cases look much better, eg:
>
> clear_cpus_related(cpu, base + i, cpu_sibling_mask);
>
> ??
>
> cheers


Re: [v7] powerpc/powernv: add hdat attribute to sysfs

2017-03-22 Thread Andrew Donnellan

On 23/03/17 09:27, Matt Brown wrote:

The HDAT data area is consumed by skiboot and turned into a device-tree.
In some cases we would like to look directly at the HDAT, so this patch
adds a sysfs node to allow it to be viewed.  This is not possible through
/dev/mem as it is reserved memory which is stopped by the /dev/mem filter.

Signed-off-by: Matt Brown 


Reviewed-by: Andrew Donnellan 


---
Changelog:

v7: 
- moved exported_attrs and attr_name into opal_export_attrs
---
 arch/powerpc/platforms/powernv/opal.c | 84 +++
 1 file changed, 84 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 2822935..b8f057f 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -604,6 +604,87 @@ static void opal_export_symmap(void)
pr_warn("Error %d creating OPAL symbols file\n", rc);
 }

+static ssize_t export_attr_read(struct file *fp, struct kobject *kobj,
+struct bin_attribute *bin_attr, char *buf,
+loff_t off, size_t count)
+{
+   return memory_read_from_buffer(buf, count, &off, bin_attr->private,
+  bin_attr->size);
+}
+
+/*
+ * opal_export_attrs: creates a sysfs node for each property listed in
+ * the device-tree under /ibm,opal/firmware/exports/
+ * All new sysfs nodes are created under /opal/exports/.
+ * This allows for reserved memory regions (e.g. HDAT) to be read.
+ * The new sysfs nodes are only readable by root.
+ */
+static void opal_export_attrs(void)
+{
+   /* /sys/firmware/opal/exports */
+   struct kobject *opal_export_kobj;
+   struct bin_attribute *exported_attrs;
+   char **attr_name;
+
+   struct bin_attribute *attr_tmp;
+   const __be64 *syms;
+   unsigned int size;
+   struct device_node *fw;
+   struct property *prop;
+   int rc;
+   int attr_count = 0;
+   int n = 0;
+
+   /* Create new 'exports' directory */
+   opal_export_kobj = kobject_create_and_add("exports", opal_kobj);
+   if (!opal_export_kobj) {
+   pr_warn("kobject_create_and_add opal_exports failed\n");
+   return;
+   }
+
+   fw = of_find_node_by_path("/ibm,opal/firmware/exports");
+   if (!fw)
+   return;
+
+   for (prop = fw->properties; prop != NULL; prop = prop->next)
+   attr_count++;
+
+   if (attr_count > 2) {
+   exported_attrs = kzalloc(sizeof(exported_attrs)*(attr_count-2),
+   GFP_KERNEL);
+   attr_name = kzalloc(sizeof(char *)*(attr_count-2), GFP_KERNEL);
+   }
+
+   for_each_property_of_node(fw, prop) {
+
+   attr_name[n] = kstrdup(prop->name, GFP_KERNEL);
+   syms = of_get_property(fw, attr_name[n], &size);
+
+   if (!strcmp(attr_name[n], "name") ||
+   !strcmp(attr_name[n], "phandle"))
+   continue;
+
+   if (!syms || size != 2 * sizeof(__be64))
+   continue;
+
+   attr_tmp = &exported_attrs[n];
+   attr_tmp->attr.name = attr_name[n];
+   attr_tmp->attr.mode = 0400;
+   attr_tmp->read = export_attr_read;
+   attr_tmp->private = __va(be64_to_cpu(syms[0]));
+   attr_tmp->size = be64_to_cpu(syms[1]);
+
+   rc = sysfs_create_bin_file(opal_export_kobj, attr_tmp);
+   if (rc)
+   pr_warn("Error %d creating OPAL sysfs exports/%s file\n",
+   rc, attr_name[n]);
+   n++;
+   }
+
+   of_node_put(fw);
+
+}
+
 static void __init opal_dump_region_init(void)
 {
void *addr;
@@ -742,6 +823,9 @@ static int __init opal_init(void)
opal_msglog_sysfs_init();
}

+   /* Export all properties */
+   opal_export_attrs();
+
/* Initialize platform devices: IPMI backend, PRD & flash interface */
opal_pdev_init("ibm,opal-ipmi");
opal_pdev_init("ibm,opal-flash");



--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH 1/5] powerpc/smp: use cpu_to_chip_id() to find siblings

2017-03-22 Thread Oliver O'Halloran
On Wed, Mar 15, 2017 at 10:18 PM, Michael Ellerman  wrote:
> Oliver O'Halloran  writes:
>
>> To determine which logical CPUs are on the same core the kernel uses the
>> ibm,chip-id property from the device tree node associated with that cpu.
>> The lookup for this information is currently open coded in both
>> traverse_siblings() and traverse_siblings_chip_id(). This patch replaces
>> these manual lookups with the existing cpu_to_chip_id() function.
>
> Some minor nits.
>
> cpu_to_chip_id() actually searches recursively up the parents until it
> finds a ibm,chip-id, so it's not a 1:1 replacement for the existing
> logic, but it's probably still an OK conversion. It's still worth
> mentioning in the change log thought.

fair enough

>> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
>> index 893bd7f79be6..dfe0e1d9cd06 100644
>> --- a/arch/powerpc/kernel/smp.c
>> +++ b/arch/powerpc/kernel/smp.c
>> @@ -664,23 +655,19 @@ static void traverse_core_siblings(int cpu, bool add)
>>  {
>>   struct device_node *l2_cache, *np;
>>   const struct cpumask *mask;
>> - int i, chip, plen;
>> - const __be32 *prop;
>> + int chip_id;
>> + int i;
>>
>> - /* First see if we have ibm,chip-id properties in cpu nodes */
>> - np = of_get_cpu_node(cpu, NULL);
>> - if (np) {
>> - chip = -1;
>> - prop = of_get_property(np, "ibm,chip-id", );
>> - if (prop && plen == sizeof(int))
>> - chip = of_read_number(prop, 1);
>> - of_node_put(np);
>> - if (chip >= 0) {
>> - traverse_siblings_chip_id(cpu, add, chip);
>> - return;
>> - }
>> + /* threads that share a chip-id are considered siblings (same die) */
>
> You might know it means the "same die", but AFAIK there's no actual
> definition for what the chip-id means, so let's not write comments that
> might be wrong in future. Just saying they're considered siblings is
> sufficient.
>
> Also "Threads" :)

The cpu masks are all built in terms of threads, so this is
technically correct even if it sounds stupid. Maybe "logical cpus"
would be better?

>
> cheers


[v7] powerpc/powernv: add hdat attribute to sysfs

2017-03-22 Thread Matt Brown
The HDAT data area is consumed by skiboot and turned into a device-tree.
In some cases we would like to look directly at the HDAT, so this patch
adds a sysfs node to allow it to be viewed.  This is not possible through
/dev/mem as it is reserved memory which is stopped by the /dev/mem filter.

Signed-off-by: Matt Brown 
---
Changelog:

v7: 
- moved exported_attrs and attr_name into opal_export_attrs
---
 arch/powerpc/platforms/powernv/opal.c | 84 +++
 1 file changed, 84 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 2822935..b8f057f 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -604,6 +604,87 @@ static void opal_export_symmap(void)
pr_warn("Error %d creating OPAL symbols file\n", rc);
 }
 
+static ssize_t export_attr_read(struct file *fp, struct kobject *kobj,
+struct bin_attribute *bin_attr, char *buf,
+loff_t off, size_t count)
+{
+   return memory_read_from_buffer(buf, count, &off, bin_attr->private,
+  bin_attr->size);
+}
+
+/*
+ * opal_export_attrs: creates a sysfs node for each property listed in
+ * the device-tree under /ibm,opal/firmware/exports/
+ * All new sysfs nodes are created under /opal/exports/.
+ * This allows for reserved memory regions (e.g. HDAT) to be read.
+ * The new sysfs nodes are only readable by root.
+ */
+static void opal_export_attrs(void)
+{
+   /* /sys/firmware/opal/exports */
+   struct kobject *opal_export_kobj;
+   struct bin_attribute *exported_attrs;
+   char **attr_name;
+
+   struct bin_attribute *attr_tmp;
+   const __be64 *syms;
+   unsigned int size;
+   struct device_node *fw;
+   struct property *prop;
+   int rc;
+   int attr_count = 0;
+   int n = 0;
+
+   /* Create new 'exports' directory */
+   opal_export_kobj = kobject_create_and_add("exports", opal_kobj);
+   if (!opal_export_kobj) {
+   pr_warn("kobject_create_and_add opal_exports failed\n");
+   return;
+   }
+
+   fw = of_find_node_by_path("/ibm,opal/firmware/exports");
+   if (!fw)
+   return;
+
+   for (prop = fw->properties; prop != NULL; prop = prop->next)
+   attr_count++;
+
+   if (attr_count > 2) {
+   exported_attrs = kzalloc(sizeof(exported_attrs)*(attr_count-2),
+   GFP_KERNEL);
+   attr_name = kzalloc(sizeof(char *)*(attr_count-2), GFP_KERNEL);
+   }
+
+   for_each_property_of_node(fw, prop) {
+
+   attr_name[n] = kstrdup(prop->name, GFP_KERNEL);
+   syms = of_get_property(fw, attr_name[n], &size);
+
+   if (!strcmp(attr_name[n], "name") ||
+   !strcmp(attr_name[n], "phandle"))
+   continue;
+
+   if (!syms || size != 2 * sizeof(__be64))
+   continue;
+
+   attr_tmp = &exported_attrs[n];
+   attr_tmp->attr.name = attr_name[n];
+   attr_tmp->attr.mode = 0400;
+   attr_tmp->read = export_attr_read;
+   attr_tmp->private = __va(be64_to_cpu(syms[0]));
+   attr_tmp->size = be64_to_cpu(syms[1]);
+
+   rc = sysfs_create_bin_file(opal_export_kobj, attr_tmp);
+   if (rc)
+   pr_warn("Error %d creating OPAL sysfs exports/%s file\n",
+   rc, attr_name[n]);
+   n++;
+   }
+
+   of_node_put(fw);
+
+}
+
 static void __init opal_dump_region_init(void)
 {
void *addr;
@@ -742,6 +823,9 @@ static int __init opal_init(void)
opal_msglog_sysfs_init();
}
 
+   /* Export all properties */
+   opal_export_attrs();
+
/* Initialize platform devices: IPMI backend, PRD & flash interface */
opal_pdev_init("ibm,opal-ipmi");
opal_pdev_init("ibm,opal-flash");
-- 
2.9.3



[PATCH 3/3] powerpc/configs: Re-enable POWER8 crc32c

2017-03-22 Thread Anton Blanchard
From: Anton Blanchard 

The config option for the POWER8 crc32c recently changed from
CONFIG_CRYPT_CRC32C_VPMSUM to CONFIG_CRYPTO_CRC32C_VPMSUM. Update
the configs.

Signed-off-by: Anton Blanchard 

[PATCH 2/3] powerpc/configs: Make oprofile a module

2017-03-22 Thread Anton Blanchard
From: Anton Blanchard 

Most people use perf these days, so save about 31kB by making oprofile
a module.

Signed-off-by: Anton Blanchard 
---
 arch/powerpc/configs/powernv_defconfig | 2 +-
 arch/powerpc/configs/ppc64_defconfig   | 2 +-
 arch/powerpc/configs/pseries_defconfig | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig
index eb78c74..4926d7f 100644
--- a/arch/powerpc/configs/powernv_defconfig
+++ b/arch/powerpc/configs/powernv_defconfig
@@ -33,7 +33,7 @@ CONFIG_BLK_DEV_INITRD=y
 CONFIG_BPF_SYSCALL=y
 # CONFIG_COMPAT_BRK is not set
 CONFIG_PROFILING=y
-CONFIG_OPROFILE=y
+CONFIG_OPROFILE=m
 CONFIG_KPROBES=y
 CONFIG_JUMP_LABEL=y
 CONFIG_MODULES=y
diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
index bdca32e..dfac33c 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -19,7 +19,7 @@ CONFIG_BLK_DEV_INITRD=y
 CONFIG_BPF_SYSCALL=y
 # CONFIG_COMPAT_BRK is not set
 CONFIG_PROFILING=y
-CONFIG_OPROFILE=y
+CONFIG_OPROFILE=m
 CONFIG_KPROBES=y
 CONFIG_JUMP_LABEL=y
 CONFIG_MODULES=y
diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig
index cd26091..47f72c8 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -34,7 +34,7 @@ CONFIG_BLK_DEV_INITRD=y
 CONFIG_BPF_SYSCALL=y
 # CONFIG_COMPAT_BRK is not set
 CONFIG_PROFILING=y
-CONFIG_OPROFILE=y
+CONFIG_OPROFILE=m
 CONFIG_KPROBES=y
 CONFIG_JUMP_LABEL=y
 CONFIG_MODULES=y
-- 
2.9.3



[PATCH 1/3] powerpc/configs: Re-enable ISO9660_FS as a built-in in 64 bit configs

2017-03-22 Thread Anton Blanchard
From: Anton Blanchard 

It turns out cloud-config uses ISO9660 filesystems to inject
configuration data into cloud images. The cloud-config failures when
ISO9660_FS is not enabled are cryptic, and building it in makes
mainline testing easier, so re-enable it.

Signed-off-by: Anton Blanchard 
---
 arch/powerpc/configs/powernv_defconfig | 2 +-
 arch/powerpc/configs/ppc64_defconfig   | 2 +-
 arch/powerpc/configs/pseries_defconfig | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig
index ac8b833..eb78c74 100644
--- a/arch/powerpc/configs/powernv_defconfig
+++ b/arch/powerpc/configs/powernv_defconfig
@@ -261,7 +261,7 @@ CONFIG_NILFS2_FS=m
 CONFIG_AUTOFS4_FS=m
 CONFIG_FUSE_FS=m
 CONFIG_OVERLAY_FS=m
-CONFIG_ISO9660_FS=m
+CONFIG_ISO9660_FS=y
 CONFIG_UDF_FS=m
 CONFIG_MSDOS_FS=y
 CONFIG_VFAT_FS=m
diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
index 4f1288b..bdca32e 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -291,7 +291,7 @@ CONFIG_NILFS2_FS=m
 CONFIG_AUTOFS4_FS=m
 CONFIG_FUSE_FS=m
 CONFIG_OVERLAY_FS=m
-CONFIG_ISO9660_FS=m
+CONFIG_ISO9660_FS=y
 CONFIG_UDF_FS=m
 CONFIG_MSDOS_FS=y
 CONFIG_VFAT_FS=m
diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig
index 4ff68b7..cd26091 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -259,7 +259,7 @@ CONFIG_NILFS2_FS=m
 CONFIG_AUTOFS4_FS=m
 CONFIG_FUSE_FS=m
 CONFIG_OVERLAY_FS=m
-CONFIG_ISO9660_FS=m
+CONFIG_ISO9660_FS=y
 CONFIG_UDF_FS=m
 CONFIG_MSDOS_FS=y
 CONFIG_VFAT_FS=m
-- 
2.9.3



Re: Optimised memset64/memset32 for powerpc

2017-03-22 Thread Matthew Wilcox
On Wed, Mar 22, 2017 at 06:18:05AM -0700, Matthew Wilcox wrote:
> There's one other potential user I've been wondering about, which are the
> various console drivers.  They use 'memsetw' to blank the entire console
> or lines of the console when scrolling, but the only architecture which
> ever bothered implementing an optimised version of it was Alpha.
> 
> Might be worth it on powerpc actually ... better than a loop calling
> cpu_to_le16() on each iteration.  That'd complete the set with a
> memset16().

All hail plane rides ... This would need to be resplit and merged properly,
but I think it makes life a little saner.

I make no claims that the ARM assembly in here is correct.  The single
x86 instruction that I wrote^W copied and pasted appears to be correct by
my understanding of the instruction set.
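
For reference, the generic C fallback for a helper like memset16() is
trivial (a sketch, assuming no architecture override):

	#include <stddef.h>
	#include <stdint.h>

	void *memset16(uint16_t *s, uint16_t v, size_t count)
	{
		uint16_t *p = s;

		while (count--)
			*p++ = v;
		return s;
	}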


diff --git a/arch/alpha/include/asm/string.h b/arch/alpha/include/asm/string.h
index c2911f591704..74c0a693b76b 100644
--- a/arch/alpha/include/asm/string.h
+++ b/arch/alpha/include/asm/string.h
@@ -65,13 +65,14 @@ extern void * memchr(const void *, int, size_t);
aligned values.  The DEST and COUNT parameters must be even for correct operation.  */
 
-#define __HAVE_ARCH_MEMSETW
-extern void * __memsetw(void *dest, unsigned short, size_t count);
-
-#define memsetw(s, c, n)\
-(__builtin_constant_p(c)\
- ? __constant_c_memset((s),0x0001000100010001UL*(unsigned short)(c),(n)) \
- : __memsetw((s),(c),(n)))
+#define __HAVE_ARCH_MEMSET16
+extern void * __memset16(void *dest, unsigned short, size_t count);
+static inline void *memset16(uint16_t *p, uint16_t v, size_t n)
+{
+   if (__builtin_constant_p(v))
+   return __constant_c_memset(p, 0x0001000100010001UL * v, n * 2);
+   return __memset16(p, v, n * 2);
+}
 
 #endif /* __KERNEL__ */
 
diff --git a/arch/alpha/include/asm/vga.h b/arch/alpha/include/asm/vga.h
index c00106bac521..3c1c2b6128e7 100644
--- a/arch/alpha/include/asm/vga.h
+++ b/arch/alpha/include/asm/vga.h
@@ -34,7 +34,7 @@ static inline void scr_memsetw(u16 *s, u16 c, unsigned int count)
if (__is_ioaddr(s))
memsetw_io((u16 __iomem *) s, c, count);
else
-   memsetw(s, c, count);
+   memset16(s, c, count / 2);
 }
 
 /* Do not trust that the usage will be correct; analyze the arguments.  */
diff --git a/arch/alpha/lib/memset.S b/arch/alpha/lib/memset.S
index 89a26f5e89de..f824969e9e77 100644
--- a/arch/alpha/lib/memset.S
+++ b/arch/alpha/lib/memset.S
@@ -20,7 +20,7 @@
.globl memset
.globl __memset
.globl ___memset
-   .globl __memsetw
+   .globl __memset16
.globl __constant_c_memset
 
.ent ___memset
@@ -110,8 +110,8 @@ EXPORT_SYMBOL(___memset)
 EXPORT_SYMBOL(__constant_c_memset)
 
.align 5
-   .ent __memsetw
-__memsetw:
+   .ent __memset16
+__memset16:
.prologue 0
 
inswl $17,0,$1  /* E0 */
@@ -123,8 +123,8 @@ __memsetw:
or $1,$4,$17/* E0 */
br __constant_c_memset  /* .. E1 */
 
-   .end __memsetw
-EXPORT_SYMBOL(__memsetw)
+   .end __memset16
+EXPORT_SYMBOL(__memset16)
 
 memset = ___memset
 __memset = ___memset
diff --git a/arch/arm/include/asm/string.h b/arch/arm/include/asm/string.h
index da88299f758b..bc7a1be7a76a 100644
--- a/arch/arm/include/asm/string.h
+++ b/arch/arm/include/asm/string.h
@@ -24,15 +24,22 @@ extern void * memchr(const void *, int, __kernel_size_t);
 #define __HAVE_ARCH_MEMSET
 extern void * memset(void *, int, __kernel_size_t);
 
-#define __HAVE_ARCH_MEMSET_PLUS
-extern void *__memset32(uint32_t *, uint32_t v, __kernel_size_t);
-extern void *__memset64(uint64_t *, uint32_t low, __kernel_size_t, uint32_t hi);
+#define __HAVE_ARCH_MEMSET16
+extern void *__memset16(uint16_t *, uint16_t v, __kernel_size_t);
+static inline void *memset16(uint16_t *p, uint16_t v, __kernel_size_t n)
+{
+   return __memset16(p, v, n * 2);
+}
 
+#define __HAVE_ARCH_MEMSET32
+extern void *__memset32(uint32_t *, uint32_t v, __kernel_size_t);
 static inline void *memset32(uint32_t *p, uint32_t v, __kernel_size_t n)
 {
return __memset32(p, v, n * 4);
 }
 
+#define __HAVE_ARCH_MEMSET64
+extern void *__memset64(uint64_t *, uint32_t low, __kernel_size_t, uint32_t hi);
 static inline void *memset64(uint64_t *p, uint64_t v, __kernel_size_t n)
 {
return __memset64(p, v, n * 8, v >> 32);
diff --git a/arch/arm/lib/memset.S b/arch/arm/lib/memset.S
index a835ff9ed30c..0b6cbaa25b33 100644
--- a/arch/arm/lib/memset.S
+++ b/arch/arm/lib/memset.S
@@ -21,12 +21,12 @@ ENTRY(memset)
 UNWIND( .fnstart )
	ands	r3, r0, #3  @ 1 unaligned?
mov ip, r0  @ preserve r0 as return value
+   orr r1, r1, r1, lsl #8
bne 6f  @ 1
 /*
  * we know that the pointer in ip is aligned to a word boundary.
  */
-1: orr 

[PATCH v4 3/3] powerpc/xmon: add debugfs entry for xmon

2017-03-22 Thread Guilherme G. Piccoli
Currently the xmon debugger is set only via the kernel boot command-line.
It's disabled by default, and can be enabled with "xmon=on" on the
command-line. Also, xmon may be triggered via the sysrq mechanism.
But we cannot enable/disable xmon at runtime; that requires a kernel
reload.

This patch introduces a debugfs entry for xmon, allowing users to query
its current state and change it if desired. Basically, the "xmon" file
to read from/write to is under the debugfs mount point, in the powerpc
directory. It's a simple attribute: value 0 means xmon is disabled and
value 1 the opposite. Writing either state to the file takes immediate
effect in the debugger.
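
For example, assuming debugfs is mounted at /sys/kernel/debug, reading
/sys/kernel/debug/powerpc/xmon ("cat") queries the state and writing
("echo 1 >") enables the debugger immediately.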

Signed-off-by: Guilherme G. Piccoli 
---
v4: fixed a bug in the patch (s/xmon_off/xmon_on/g basically).

v3: logic improved based in the changes made on patch 1.

 arch/powerpc/xmon/xmon.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 25a32815f310..0fab9d7349eb 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -29,6 +29,10 @@
 #include 
 #include 
 
+#ifdef CONFIG_DEBUG_FS
+#include 
+#endif
+
 #include 
 #include 
 #include 
@@ -3316,6 +3320,33 @@ static int __init setup_xmon_sysrq(void)
 device_initcall(setup_xmon_sysrq);
 #endif /* CONFIG_MAGIC_SYSRQ */
 
+#ifdef CONFIG_DEBUG_FS
+static int xmon_dbgfs_set(void *data, u64 val)
+{
+   xmon_on = !!val;
+   xmon_init(xmon_on);
+
+   return 0;
+}
+
+static int xmon_dbgfs_get(void *data, u64 *val)
+{
+   *val = xmon_on;
+   return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(xmon_dbgfs_ops, xmon_dbgfs_get,
+   xmon_dbgfs_set, "%llu\n");
+
+static int __init setup_xmon_dbgfs(void)
+{
+   debugfs_create_file("xmon", 0600, powerpc_debugfs_root, NULL,
+   &xmon_dbgfs_ops);
+   return 0;
+}
+device_initcall(setup_xmon_dbgfs);
+#endif /* CONFIG_DEBUG_FS */
+
 static int xmon_early __initdata;
 
 static int __init early_parse_xmon(char *p)
-- 
2.11.0



[PATCH v4 0/3] powerpc/xmon: improvements and fixes

2017-03-22 Thread Guilherme G. Piccoli
This series contains some improvements and fixes to xmon:

1) Pan Xinhui fixed a long-term bug, in which the xmon debugger got
stuck enabled after invoked by sysrq, regardless the state it was
set in the kernel command-line.

2) A debugfs entry was added in order to allow users to enable/disable
xmon without needing a kernel reload.

3) The nobt option was dropped and some minor issues were fixed, like
a misplacement of __initdata.

@mpe: The series was rebased against powerpc-next.
Also, I previously sent the patchset with patches at different versions;
now all patches are at the same version, v4.


Guilherme G. Piccoli (2):
  powerpc/xmon: drop the nobt option from xmon plus minor fixes
  powerpc/xmon: add debugfs entry for xmon

Pan Xinhui (1):
  powerpc/xmon: Fix an unexpected xmon on/off state change

 arch/powerpc/xmon/xmon.c | 59 +++-
 1 file changed, 43 insertions(+), 16 deletions(-)

-- 
2.11.0



[PATCH v4 2/3] powerpc/xmon: drop the nobt option from xmon plus minor fixes

2017-03-22 Thread Guilherme G. Piccoli
The xmon parameter nobt was added a long time ago, by commit 26c8af5f01df
("[POWERPC] print backtrace when entering xmon"). The problem at the time
was that during a crash on a machine with a USB keyboard, xmon wouldn't
respond to commands from the keyboard, so printing the backtrace wouldn't
be possible.

The idea then was to automatically show the backtrace the first time xmon
is invoked after a crash (if it recovers, the next time xmon won't show
the backtrace automatically). The nobt parameter was added _only_ to
prevent this automatic trace. It seems USB keyboards didn't work that
well back then!

We don't need this parameter anymore; the auto-backtrace feature is
useful (imagine the case of an auto-reboot script, for example), so this
patch extends the functionality by always showing the backtrace
automatically when xmon is invoked, and removes the nobt parameter.

Also, this patch fixes __initdata placement on xmon_early and replaces
__initcall() with modern device_initcall() on sysrq handler.

Signed-off-by: Guilherme G. Piccoli 
---
v4: extended the auto backtrace functionality, by showing the trace
in every xmon invocation [mpe suggestion].

 arch/powerpc/xmon/xmon.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index a89db1b3f66d..25a32815f310 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -185,8 +185,6 @@ static void dump_tlb_44x(void);
 static void dump_tlb_book3e(void);
 #endif
 
-static int xmon_no_auto_backtrace;
-
 #ifdef CONFIG_PPC64
 #define REG	"%.16lx"
 #else
@@ -885,10 +883,7 @@ cmds(struct pt_regs *excp)
last_cmd = NULL;
xmon_regs = excp;
 
-   if (!xmon_no_auto_backtrace) {
-   xmon_no_auto_backtrace = 1;
-   xmon_show_stack(excp->gpr[1], excp->link, excp->nip);
-   }
+   xmon_show_stack(excp->gpr[1], excp->link, excp->nip);
 
for(;;) {
 #ifdef CONFIG_SMP
@@ -3318,10 +3313,10 @@ static int __init setup_xmon_sysrq(void)
	register_sysrq_key('x', &sysrq_xmon_op);
return 0;
 }
-__initcall(setup_xmon_sysrq);
+device_initcall(setup_xmon_sysrq);
 #endif /* CONFIG_MAGIC_SYSRQ */
 
-static int __initdata xmon_early;
+static int xmon_early __initdata;
 
 static int __init early_parse_xmon(char *p)
 {
@@ -3335,8 +3330,6 @@ static int __init early_parse_xmon(char *p)
xmon_on = 1;
} else if (strncmp(p, "off", 3) == 0)
xmon_on = 0;
-   else if (strncmp(p, "nobt", 4) == 0)
-   xmon_no_auto_backtrace = 1;
else
return 1;
 
-- 
2.11.0



[PATCH v4 1/3] powerpc/xmon: Fix an unexpected xmon on/off state change

2017-03-22 Thread Guilherme G. Piccoli
From: Pan Xinhui 

Once xmon is triggered by sysrq-x, it stays enabled afterwards even if
it was disabled during boot. This can cause a system reset interrupt to
fail to dump. So keep xmon in its original state after exit.

We have several ways to set xmon on or off:
1) by the build config CONFIG_XMON_DEFAULT.
2) by a boot cmdline with xmon, xmon=early or xmon=on to enable xmon,
and xmon=off to disable it. This value overrides that of step 1.
3) by the debugfs interface proposed in this patchset. This value can
override those of steps 1 and 2.

Signed-off-by: Pan Xinhui 
Signed-off-by: Guilherme G. Piccoli 
---
v3: changed xmon_off to xmon_on, simplifying the logic [mpe suggestion].

 arch/powerpc/xmon/xmon.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 16321ad9e70c..a89db1b3f66d 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -76,6 +76,7 @@ static int xmon_gate;
 #endif /* CONFIG_SMP */
 
 static unsigned long in_xmon __read_mostly = 0;
+static int xmon_on = IS_ENABLED(CONFIG_XMON_DEFAULT);
 
 static unsigned long adrs;
 static int size = 1;
@@ -3302,6 +3303,8 @@ static void sysrq_handle_xmon(int key)
/* ensure xmon is enabled */
xmon_init(1);
debugger(get_irq_regs());
+   if (!xmon_on)
+   xmon_init(0);
 }
 
 static struct sysrq_key_op sysrq_xmon_op = {
@@ -3318,7 +3321,7 @@ static int __init setup_xmon_sysrq(void)
 __initcall(setup_xmon_sysrq);
 #endif /* CONFIG_MAGIC_SYSRQ */
 
-static int __initdata xmon_early, xmon_off;
+static int __initdata xmon_early;
 
 static int __init early_parse_xmon(char *p)
 {
@@ -3326,10 +3329,12 @@ static int __init early_parse_xmon(char *p)
/* just "xmon" is equivalent to "xmon=early" */
xmon_init(1);
xmon_early = 1;
-   } else if (strncmp(p, "on", 2) == 0)
+   xmon_on = 1;
+   } else if (strncmp(p, "on", 2) == 0) {
xmon_init(1);
-   else if (strncmp(p, "off", 3) == 0)
-   xmon_off = 1;
+   xmon_on = 1;
+   } else if (strncmp(p, "off", 3) == 0)
+   xmon_on = 0;
else if (strncmp(p, "nobt", 4) == 0)
xmon_no_auto_backtrace = 1;
else
@@ -3341,10 +3346,8 @@ early_param("xmon", early_parse_xmon);
 
 void __init xmon_setup(void)
 {
-#ifdef CONFIG_XMON_DEFAULT
-   if (!xmon_off)
+   if (xmon_on)
xmon_init(1);
-#endif
if (xmon_early)
debugger(NULL);
 }
-- 
2.11.0



Re: [FIX PATCH v1] powerpc/pseries: Fix reference count leak during CPU unplug

2017-03-22 Thread Michael Bringmann
I get the error when removing a CPU that has been hotplugged after boot.

On 03/14/2017 03:42 PM, Tyrel Datwyler wrote:
> On 03/13/2017 03:29 AM, Bharata B Rao wrote:
>> On Thu, Mar 09, 2017 at 01:34:00PM -0800, Tyrel Datwyler wrote:
>>> On 03/08/2017 08:37 PM, Bharata B Rao wrote:
 The following warning is seen when a CPU is hot unplugged on a PowerKVM
 guest:
>>>
>>> Is this the case with cpus present at boot? What about cpus hotplugged
>>> after boot?
>>
>> I have observed this for CPUs that are hotplugged.
> 
> If removing a cpu present at boot works, but removing one that has been
> hotplugged after boot reproduces the problem it is more likely the case
> that we failed to take a reference during hotplug or released a
> reference we shouldn't have. I'd have to go look at the hot add path.
> 
>>
>>>
>>> My suspicion is that the refcount was wrong to begin with. See my
>>> comments below. The use of the of_node_put() calls is correct as in each
>>> case we incremented the ref count earlier in the same function.
>>>

 refcount_t: underflow; use-after-free.
 [ cut here ]
 WARNING: CPU: 0 PID: 53 at lib/refcount.c:128 refcount_sub_and_test+0xd8/0xf0
 Modules linked in:
 CPU: 0 PID: 53 Comm: kworker/u510:1 Not tainted 4.11.0-rc1 #3
 Workqueue: pseries hotplug workque pseries_hp_work_fn
 task: c000fb475000 task.stack: c000fb81c000
 NIP: c06f0808 LR: c06f0804 CTR: c07b98c0
 REGS: c000fb81f710 TRAP: 0700   Not tainted  (4.11.0-rc1)
 MSR: 8282b033 
   CR: 4800  XER: 2000
 CFAR: c0c438e0 SOFTE: 1
 GPR00: c06f0804 c000fb81f990 c1573b00 0026
 GPR04:  016c 667265652e0d0a73 652d61667465722d
 GPR08: 0007 0007 0001 0006
 GPR12: 2200 cff4 c010c578 c001f11b9f40
 GPR16: c001fe0312a8 c001fe031078 c001fe031020 0001
 GPR20:   c1454808 fef7
 GPR24:  c001f1677648  
 GPR28: 1008 c0e4d3d8  c001eaae07d8
 NIP [c06f0808] refcount_sub_and_test+0xd8/0xf0
 LR [c06f0804] refcount_sub_and_test+0xd4/0xf0
 Call Trace:
 [c000fb81f990] [c06f0804] refcount_sub_and_test+0xd4/0xf0 (unreliable)
 [c000fb81f9f0] [c06d04b4] kobject_put+0x44/0x2a0
 [c000fb81fa70] [c09d5284] of_node_put+0x34/0x50
 [c000fb81faa0] [c00aceb8] dlpar_cpu_remove_by_index+0x108/0x130
 [c000fb81fb30] [c00ae128] dlpar_cpu+0x78/0x550
 [c000fb81fbe0] [c00a7b40] handle_dlpar_errorlog+0xc0/0x160
 [c000fb81fc50] [c00a7c74] pseries_hp_work_fn+0x94/0xa0
 [c000fb81fc80] [c0102cec] process_one_work+0x23c/0x540
 [c000fb81fd20] [c010309c] worker_thread+0xac/0x620
 [c000fb81fdc0] [c010c6c4] kthread+0x154/0x1a0
 [c000fb81fe30] [c000bbe0] ret_from_kernel_thread+0x5c/0x7c

 Fix this by ensuring that of_node_put() is called only from the
 error path in dlpar_cpu_remove_by_index(). In the normal path,
 of_node_put() happens as part of dlpar_detach_node().

 Signed-off-by: Bharata B Rao 
 Cc: Nathan Fontenot 
 ---
 Changes in v1:
 - Fixed the refcount problem in the userspace driven unplug path
   in addition to in-kernel unplug path. (Sachin Sant)

 v0: https://patchwork.ozlabs.org/patch/736547/

  arch/powerpc/platforms/pseries/hotplug-cpu.c | 12 
  1 file changed, 8 insertions(+), 4 deletions(-)

 diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
 index 7bc0e91..c5ed510 100644
 --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
 +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
 @@ -619,7 +619,8 @@ static int dlpar_cpu_remove_by_index(u32 drc_index)
}

rc = dlpar_cpu_remove(dn, drc_index);
 -  of_node_put(dn);
 +  if (rc)
 +  of_node_put(dn);
>>>
>>> I think there is another issue at play here because this is wrong.
>>> Regardless of whether the dlpar_cpu_remove() succeeds or fails we still
>>> need of_node_put() for both cases because we incremented the ref count
>>> earlier in this function with a call to cpu_drc_index_to_dn(). That
>>> call doesn't, but should, document that it returns a device_node with
>>> an incremented refcount.
>>>
return rc;
  }

 @@ -856,9 +857,12 @@ static ssize_t dlpar_cpu_release(const char *buf, size_t count)
}

rc = dlpar_cpu_remove(dn, 
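
For illustration, the pattern Tyrel argues for (the lookup returns the
node with its refcount raised, so every exit path must drop it) would
look roughly like this; a sketch, not the committed fix:

static int dlpar_cpu_remove_by_index(u32 drc_index)
{
	struct device_node *dn;
	int rc;

	dn = cpu_drc_index_to_dn(drc_index);	/* returns dn with refcount held */
	if (!dn)
		return -ENODEV;

	rc = dlpar_cpu_remove(dn, drc_index);
	of_node_put(dn);	/* drop the reference on success and failure alike */
	return rc;
}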

Re: [v3 PATCH 4/4] powernv: Recover correct PACA on wakeup from a stop on P9 DD1

2017-03-22 Thread Nicholas Piggin
On Wed, 22 Mar 2017 20:34:17 +0530
"Gautham R. Shenoy"  wrote:

> From: "Gautham R. Shenoy" 
> 
> POWER9 DD1.0 hardware has an issue due to which the SPRs of a thread
> waking up from stop 0,1,2 with ESL=1 can end up being misplaced in the
> core. Thus the HSPRG0 of a thread waking up from stop can contain the
> paca pointer of its sibling.
> 
> This patch implements a context recovery framework within threads of a
> core, by provisioning space in paca_struct for saving every sibling
> thread's paca pointer. Basically, we should be able to arrive at the
> right paca pointer from any of the threads' existing paca pointers.
> 
> At bootup, during powernv idle-init, we save the paca address of every
> CPU in each of its siblings' paca_struct in the slot corresponding to
> this CPU's index in the core.
> 
> On wakeup from a stop, the thread will determine its index in the core
> from the TIR register and recover its PACA pointer by indexing into
> the correct slot in the provisioned space in the current PACA.
> 
> Furthermore, ensure that the NVGPRs are restored from the stack on the
> way out by setting the NAPSTATELOST in paca.
> 
> [Changelog written with inputs from sva...@linux.vnet.ibm.com]

Looks good.

Reviewed-by: Nicholas Piggin 


[v3 PATCH 1/4] powernv: Move CPU-Offline idle state invocation from smp.c to idle.c

2017-03-22 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

Move the piece of code in powernv/smp.c::pnv_smp_cpu_kill_self() which
transitions the CPU to the deepest available platform idle state to a
new function named pnv_cpu_offline() in powernv/idle.c. The rationale
behind this code movement is that the data required to determine the
deepest available platform state resides in powernv/idle.c.

Reviewed-by: Nicholas Piggin 
Signed-off-by: Gautham R. Shenoy 
---
 arch/powerpc/include/asm/cpuidle.h   |  1 +
 arch/powerpc/platforms/powernv/idle.c| 25 +
 arch/powerpc/platforms/powernv/powernv.h |  2 --
 arch/powerpc/platforms/powernv/smp.c | 18 ++
 4 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
index 1557315..4649ca0 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -46,6 +46,7 @@
 
 extern u64 pnv_first_deep_stop_state;
 
+unsigned long pnv_cpu_offline(unsigned int cpu);
 int validate_psscr_val_mask(u64 *psscr_val, u64 *psscr_mask, u32 flags);
 static inline void report_invalid_psscr_val(u64 psscr_val, int err)
 {
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 4ee837e..419edff 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -266,6 +266,31 @@ static void power9_idle(void)
 u64 pnv_deepest_stop_psscr_mask;
 
 /*
+ * pnv_cpu_offline: A function that puts the CPU into the deepest
+ * available platform idle state on a CPU-Offline.
+ */
+unsigned long pnv_cpu_offline(unsigned int cpu)
+{
+   unsigned long srr1;
+
+   u32 idle_states = pnv_get_supported_cpuidle_states();
+
+   if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+   srr1 = power9_idle_stop(pnv_deepest_stop_psscr_val,
+   pnv_deepest_stop_psscr_mask);
+   } else if (idle_states & OPAL_PM_WINKLE_ENABLED) {
+   srr1 = power7_winkle();
+   } else if ((idle_states & OPAL_PM_SLEEP_ENABLED) ||
+  (idle_states & OPAL_PM_SLEEP_ENABLED_ER1)) {
+   srr1 = power7_sleep();
+   } else {
+   srr1 = power7_nap(1);
+   }
+
+   return srr1;
+}
+
+/*
  * Power ISA 3.0 idle initialization.
  *
  * POWER ISA 3.0 defines a new SPR Processor stop Status and Control
diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h
index 6130522..6dbc0a1 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -18,8 +18,6 @@ static inline void pnv_pci_shutdown(void) { }
 #endif
 
 extern u32 pnv_get_supported_cpuidle_states(void);
-extern u64 pnv_deepest_stop_psscr_val;
-extern u64 pnv_deepest_stop_psscr_mask;
 
 extern void pnv_lpc_init(void);
 
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 8b67e1e..914b456 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "powernv.h"
 
@@ -140,7 +141,6 @@ static void pnv_smp_cpu_kill_self(void)
 {
unsigned int cpu;
unsigned long srr1, wmask;
-   u32 idle_states;
 
/* Standard hot unplug procedure */
local_irq_disable();
@@ -155,8 +155,6 @@ static void pnv_smp_cpu_kill_self(void)
if (cpu_has_feature(CPU_FTR_ARCH_207S))
wmask = SRR1_WAKEMASK_P8;
 
-   idle_states = pnv_get_supported_cpuidle_states();
-
/* We don't want to take decrementer interrupts while we are offline,
 * so clear LPCR:PECE1. We keep PECE2 (and LPCR_PECE_HVEE on P9)
 * enabled as to let IPIs in.
@@ -184,19 +182,7 @@ static void pnv_smp_cpu_kill_self(void)
kvmppc_set_host_ipi(cpu, 0);
 
ppc64_runlatch_off();
-
-   if (cpu_has_feature(CPU_FTR_ARCH_300)) {
-   srr1 = power9_idle_stop(pnv_deepest_stop_psscr_val,
-   pnv_deepest_stop_psscr_mask);
-   } else if (idle_states & OPAL_PM_WINKLE_ENABLED) {
-   srr1 = power7_winkle();
-   } else if ((idle_states & OPAL_PM_SLEEP_ENABLED) ||
-  (idle_states & OPAL_PM_SLEEP_ENABLED_ER1)) {
-   srr1 = power7_sleep();
-   } else {
-   srr1 = power7_nap(1);
-   }
-
+   srr1 = pnv_cpu_offline(cpu);
ppc64_runlatch_on();
 
/*
-- 
1.9.4



[v3 PATCH 4/4] powernv: Recover correct PACA on wakeup from a stop on P9 DD1

2017-03-22 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

POWER9 DD1.0 hardware has an issue due to which the SPRs of a thread
waking up from stop 0,1,2 with ESL=1 can end up being misplaced in the
core. Thus the HSPRG0 of a thread waking up from stop can contain the
paca pointer of its sibling.

This patch implements a context recovery framework within threads of a
core, by provisioning space in paca_struct for saving every sibling
thread's paca pointer. Basically, we should be able to arrive at the
right paca pointer from any of the threads' existing paca pointers.

At bootup, during powernv idle-init, we save the paca address of every
CPU in each of its siblings' paca_struct in the slot corresponding to
this CPU's index in the core.

On wakeup from a stop, the thread will determine its index in the core
from the TIR register and recover its PACA pointer by indexing into
the correct slot in the provisioned space in the current PACA.

Furthermore, ensure that the NVGPRs are restored from the stack on the
way out by setting NAPSTATELOST in the paca.
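
In C terms, the recovery sequence implemented in assembly below amounts
to the following (a sketch; mfspr()/setup_paca() stand in for the real
low-level primitives):

	int i = mfspr(SPRN_TIR);	/* this thread's index in the core */
	struct paca_struct *paca = local_paca->thread_sibling_pacas[i];

	setup_paca(paca);		/* repoint r13/HSPRG0 at the right paca */
	paca->nap_state_lost = 1;	/* force NVGPR restore from the stack */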

[Changelog written with inputs from sva...@linux.vnet.ibm.com]

Signed-off-by: Gautham R. Shenoy 
---
 arch/powerpc/include/asm/paca.h   |  5 
 arch/powerpc/kernel/asm-offsets.c |  1 +
 arch/powerpc/kernel/idle_book3s.S | 48 ++-
 arch/powerpc/platforms/powernv/idle.c | 30 ++
 4 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 708c3e5..2a17c15 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -172,6 +172,11 @@ struct paca_struct {
u8 thread_mask;
/* Mask to denote subcore sibling threads */
u8 subcore_sibling_mask;
+   /*
+* Pointer to an array which contains pointer
+* to the sibling threads' paca.
+*/
+   struct paca_struct **thread_sibling_pacas;
 #endif
 
 #ifdef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 4367e7d..6ec5016 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -727,6 +727,7 @@ int main(void)
OFFSET(PACA_THREAD_IDLE_STATE, paca_struct, thread_idle_state);
OFFSET(PACA_THREAD_MASK, paca_struct, thread_mask);
OFFSET(PACA_SUBCORE_SIBLING_MASK, paca_struct, subcore_sibling_mask);
+   OFFSET(PACA_SIBLING_PACA_PTRS, paca_struct, thread_sibling_pacas);
 #endif
 
DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 9957287..24717a7 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -375,6 +375,46 @@ _GLOBAL(power9_idle_stop)
li  r4,1
b   pnv_powersave_common
/* No return */
+
+
+/*
+ * On waking up from stop 0,1,2 with ESL=1 on POWER9 DD1,
+ * HSPRG0 will be set to the HSPRG0 value of one of the
+ * threads in this core. Thus the value we have in r13
+ * may not be this thread's paca pointer.
+ *
+ * Fortunately, the TIR remains invariant. Since this thread's
+ * paca pointer is recorded in all its sibling's paca, we can
+ * correctly recover this thread's paca pointer if we
+ * know the index of this thread in the core.
+ *
+ * This index can be obtained from the TIR.
+ *
+ * i.e, thread's position in the core = TIR.
+ * If this value is i, then this thread's paca is
+ * paca->thread_sibling_pacas[i].
+ */
+power9_dd1_recover_paca:
+   mfspr   r4, SPRN_TIR
+   /*
+* Since each entry in thread_sibling_pacas is 8 bytes
+* we need to left-shift by 3 bits. Thus r4 = i * 8
+*/
+   sldir4, r4, 3
+   /* Get &paca->thread_sibling_pacas[0] in r5 */
+   ld  r5, PACA_SIBLING_PACA_PTRS(r13)
+   /* Load paca->thread_sibling_pacas[i] into r13 */
+   ldx r13, r4, r5
+   SET_PACA(r13)
+   ld  r2, PACATOC(r13)
+   /*
+* Indicate that we have lost NVGPR state
+* which needs to be restored from the stack.
+*/
+   li  r3, 1
+   stb r3,PACA_NAPSTATELOST(r13)
+   blr
+
 /*
  * Called from reset vector. Check whether we have woken up with
  * hypervisor state loss. If yes, restore hypervisor state and return
@@ -385,7 +425,13 @@ _GLOBAL(power9_idle_stop)
  */
 _GLOBAL(pnv_restore_hyp_resource)
 BEGIN_FTR_SECTION
-   ld  r2,PACATOC(r13);
+BEGIN_FTR_SECTION_NESTED(70)
+   mflrr6
+   bl  power9_dd1_recover_paca
+   mtlrr6
+FTR_SECTION_ELSE_NESTED(70)
+   ld  r2, PACATOC(r13)
+ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_POWER9_DD1, 70)
/*
 * POWER ISA 3. Use PSSCR to determine if we
 * are waking up from deep idle state
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 63ade78..b369e39 100644
--- 

[v3 PATCH 3/4] powernv:idle: Don't override default/deepest directly in kernel

2017-03-22 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

Currently during idle-init on power9, if we don't find suitable stop
states in the device tree that can be used as the
default_stop/deepest_stop, we set stop0 (ESL=1,EC=1) as both the
default stop state psscr to be used by power9_idle and the deepest
stop state used by CPU-Hotplug.

However, if the platform firmware has not configured or enabled a stop
state, the kernel should not make any assumptions and fallback to a
default choice.

If the kernel uses a stop state that is not configured by the platform
firmware, it may lead to further failures which should be avoided.

In this patch, we modify the init code to ensure that the kernel uses
only the stop states exposed by the firmware through the device
tree. When a suitable default stop state isn't found, we disable
ppc_md.power_save for power9. Similarly, when a suitable
deepest_stop_state is not found in the device tree exported by the
firmware, fall back to the default busy-wait loop in the CPU-Hotplug
code.

[Changelog written with inputs from sva...@linux.vnet.ibm.com]
Reviewed-by: Nicholas Piggin 
Signed-off-by: Gautham R. Shenoy 
---
 arch/powerpc/platforms/powernv/idle.c | 35 ++-
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index f335e0f..63ade78 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -147,7 +147,6 @@ u32 pnv_get_supported_cpuidle_states(void)
 }
 EXPORT_SYMBOL_GPL(pnv_get_supported_cpuidle_states);
 
-
 static void pnv_fastsleep_workaround_apply(void *info)
 
 {
@@ -241,8 +240,9 @@ static DEVICE_ATTR(fastsleep_workaround_applyonce, 0600,
  * The default stop state that will be used by ppc_md.power_save
  * function on platforms that support stop instruction.
  */
-u64 pnv_default_stop_val;
-u64 pnv_default_stop_mask;
+static u64 pnv_default_stop_val;
+static u64 pnv_default_stop_mask;
+static bool default_stop_found;
 
 /*
  * Used for ppc_md.power_save which needs a function with no parameters
@@ -262,8 +262,9 @@ static void power9_idle(void)
  * psscr value and mask of the deepest stop idle state.
  * Used when a cpu is offlined.
  */
-u64 pnv_deepest_stop_psscr_val;
-u64 pnv_deepest_stop_psscr_mask;
+static u64 pnv_deepest_stop_psscr_val;
+static u64 pnv_deepest_stop_psscr_mask;
+static bool deepest_stop_found;
 
 /*
  * pnv_cpu_offline: A function that puts the CPU into the deepest
@@ -275,7 +276,7 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
 
u32 idle_states = pnv_get_supported_cpuidle_states();
 
-   if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+   if (cpu_has_feature(CPU_FTR_ARCH_300) && deepest_stop_found) {
srr1 = power9_idle_stop(pnv_deepest_stop_psscr_val,
pnv_deepest_stop_psscr_mask);
} else if (idle_states & OPAL_PM_WINKLE_ENABLED) {
@@ -385,7 +386,6 @@ static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
u32 *residency_ns = NULL;
u64 max_residency_ns = 0;
int rc = 0, i;
-   bool default_stop_found = false, deepest_stop_found = false;
 
psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val), GFP_KERNEL);
psscr_mask = kcalloc(dt_idle_states, sizeof(*psscr_mask), GFP_KERNEL);
@@ -465,21 +465,24 @@ static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
}
}
 
-   if (!default_stop_found) {
-   pnv_default_stop_val = PSSCR_HV_DEFAULT_VAL;
-   pnv_default_stop_mask = PSSCR_HV_DEFAULT_MASK;
-   pr_warn("Setting default stop psscr val=0x%016llx,mask=0x%016llx\n",
+   if (unlikely(!default_stop_found)) {
+   pr_warn("cpuidle-powernv: No suitable default stop state found. Disabling platform idle.\n");
+   } else {
+   ppc_md.power_save = power9_idle;
+   pr_info("cpuidle-powernv: Default stop: psscr = 0x%016llx,mask=0x%016llx\n",
pnv_default_stop_val, pnv_default_stop_mask);
}
 
-   if (!deepest_stop_found) {
-   pnv_deepest_stop_psscr_val = PSSCR_HV_DEFAULT_VAL;
-   pnv_deepest_stop_psscr_mask = PSSCR_HV_DEFAULT_MASK;
-   pr_warn("Setting default stop psscr val=0x%016llx,mask=0x%016llx\n",
+   if (unlikely(!deepest_stop_found)) {
+   pr_warn("cpuidle-powernv: No suitable stop state for CPU-Hotplug. Offlined CPUs will busy wait");
+   } else {
+   pr_info("cpuidle-powernv: Deepest stop: psscr = 0x%016llx,mask=0x%016llx\n",
pnv_deepest_stop_psscr_val,
pnv_deepest_stop_psscr_mask);
}
 
+   pr_info("cpuidle-powernv: Requested Level (RL) value of first deep stop = 0x%llx\n",
+   pnv_first_deep_stop_state);
 out:

[v3 PATCH 2/4] powernv:smp: Add busy-wait loop as fall back for CPU-Hotplug

2017-03-22 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

Currently, the powernv cpu-offline function assumes that platform idle
states such as stop on POWER9, winkle/sleep/nap on POWER8 are always
available. On POWER8, it picks nap as the default state if other deep
idle states like sleep/winkle are not available and enabled in the
platform.

On POWER9, nap is not available and all idle states are managed by the
STOP instruction.  The parameters of the idle state are passed through
the processor stop status control register (PSSCR).  As such, executing
STOP takes its parameters from the current PSSCR. We do not want to
make any assumptions in the kernel about what STOP states and PSSCR
features are configured by the platform.

Ideally the platform will configure a good set of stop states that the
kernel can use.  If the platform chooses not to configure any state, or
an error in platform firmware leads to no stop states being configured
or allowed to be requested, we would like to start with a clean slate.

This patch adds a fallback method for CPU-Hotplug that is similar to
snooze loop at idle where the threads are left to spin at low priority
and hence reduce the cycles consumed.

This is a safe fallback mechanism for the case when no stop state can
be requested because the platform firmware did not configure any, most
likely due to an error condition.

Requesting a stop state when the platform has not configured them or
enabled them would lead to further error conditions which could be
difficult to debug.

[Changelog written with inputs from sva...@linux.vnet.ibm.com]
Reviewed-by: Nicholas Piggin 
Signed-off-by: Gautham R. Shenoy 
---
 arch/powerpc/platforms/powernv/idle.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 419edff..f335e0f 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -283,8 +283,16 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
} else if ((idle_states & OPAL_PM_SLEEP_ENABLED) ||
   (idle_states & OPAL_PM_SLEEP_ENABLED_ER1)) {
srr1 = power7_sleep();
-   } else {
+   } else if (idle_states & OPAL_PM_NAP_ENABLED) {
srr1 = power7_nap(1);
+   } else {
+   /* This is the fallback method. We emulate snooze */
+   while (!generic_check_cpu_restart(cpu)) {
+   HMT_low();
+   HMT_very_low();
+   }
+   srr1 = 0;
+   HMT_medium();
}
 
return srr1;
-- 
1.9.4



[v3 PATCH 0/4] powernv:idle: Fixes for CPU-Hotplug on POWER DD1.0

2017-03-22 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

Hi,

This is the third version of the patchset containing the fixes to
make CPU-Hotplug work correctly on POWER9 DD1 systems.

The earlier versions can be found here:
[v2] : https://lkml.org/lkml/2017/3/20/555
[v1] : https://lkml.org/lkml/2017/3/13/46

The only change in this patch series from v2 are the following
optimizations suggested by Nicholas Piggin.

- Dynamically allocate the thread_sibling_pacas array to contain
  "threads_per_core" number of slots instead of declaring the array
  size upfront.

- Use SPRN_TIR instead of (SPRN_PIR & 0x3) to determine the thread's
  index within a core.

Patch 4 in the series requires Nicholas Piggin's ack. Rest of the
patches are unchanged from the previous version.

These patches are based on v4.11-rc3.

The patches have been tested with stop1 (ESL=EC=1) as the
deepest-state entered into during CPU-Hotplug.

Gautham R. Shenoy (4):
  powernv: Move CPU-Offline idle state invocation from smp.c to idle.c
  powernv:smp: Add busy-wait loop as fall back for CPU-Hotplug
  powernv:idle: Don't override default/deepest directly in kernel
  powernv: Recover correct PACA on wakeup from a stop on P9 DD1

 arch/powerpc/include/asm/cpuidle.h   |  1 +
 arch/powerpc/include/asm/paca.h  |  5 ++
 arch/powerpc/kernel/asm-offsets.c|  1 +
 arch/powerpc/kernel/idle_book3s.S| 48 +++-
 arch/powerpc/platforms/powernv/idle.c| 96 ++--
 arch/powerpc/platforms/powernv/powernv.h |  2 -
 arch/powerpc/platforms/powernv/smp.c | 18 +-
 7 files changed, 136 insertions(+), 35 deletions(-)

-- 
1.9.4



Re: [PATCH 1/3] drivers/of/base.c: Add of_property_read_u64_index

2017-03-22 Thread Rob Herring
On Tue, Mar 21, 2017 at 10:49 PM, Alistair Popple  wrote:
> There is of_property_read_u32_index but no u64 variant. This patch
> adds one similar to the u32 version for u64.
>
> Signed-off-by: Alistair Popple 
> ---
>  drivers/of/base.c  | 31 +++
>  include/linux/of.h |  3 +++
>  2 files changed, 34 insertions(+)

Acked-by: Rob Herring 
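
For context, the new helper mirrors of_property_read_u32_index(); a
hypothetical caller would look like this (the property name is purely
illustrative):

#include <linux/of.h>

/* Read the third (index 2) u64 value of "example-prop" from np. */
static int example_read_third_u64(const struct device_node *np, u64 *out)
{
	return of_property_read_u64_index(np, "example-prop", 2, out);
}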


Re: Optimised memset64/memset32 for powerpc

2017-03-22 Thread Matthew Wilcox
On Wed, Mar 22, 2017 at 08:26:12AM +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2017-03-21 at 06:29 -0700, Matthew Wilcox wrote:
> > 
> > Well, those are the generic versions in the first patch:
> > 
> > http://git.infradead.org/users/willy/linux-dax.git/commitdiff/538b977
> > 6ac925199969bd5af4e994da776d461e7
> > 
> > so if those are good enough for you guys, there's no need for you to
> > do anything.
> > 
> > Thanks for your time!
> 
> I suspect on ppc64 we can do much better, if anything moving 64-bit at
> a time. Matthew, what are the main use cases of these ?

I've only converted two users so far -- zram was the initial inspiration
for this.  It notices when a page has a pattern in it which is
representable as a repetition of an 'unsigned long' (this seems to be
a relatively common thing for userspace to do -- not as common as an
entirely zero page, but common enough to be worth optimising for).  So it
may be doing an entire page worth of this to handle a page fault, or if
there's an I/O to such a page, it will be doing a multiple of 512 bytes.
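
(Roughly, the zram check is a loop of this shape, a sketch assuming a
helper along the lines of zram's page_same_filled():)

/* Return true if the page is one repeating unsigned long, reporting
 * the element so it can be stored in place of the whole page. */
static bool page_same_filled(void *ptr, unsigned long *element)
{
	unsigned long *page = ptr;
	unsigned long val = page[0];
	unsigned int pos;

	for (pos = 1; pos < PAGE_SIZE / sizeof(*page); pos++)
		if (page[pos] != val)
			return false;

	*element = val;
	return true;
}

Re-inflating such a page on read is then exactly a memset64() of
PAGE_SIZE / 8 elements, which is where an optimised routine pays off.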

The other user is sym53c8xx_2; it's an initialisation path thing, and
it saves a few bytes in the driver to call the optimised routine rather
than have its own loop to initialise the array.

I suspect we have additional places in the kernel that could use
memset32/memset64 -- look for loops which store a value which is not
dependent on the loop counter.  They're probably not performance path
though; I'd focus on zram as being the case to optimise for.

There's one other potential user I've been wondering about, which are the
various console drivers.  They use 'memsetw' to blank the entire console
or lines of the console when scrolling, but the only architecture which
ever bothered implementing an optimised version of it was Alpha.

Might be worth it on powerpc actually ... better than a loop calling
cpu_to_le16() on each iteration.  That'd complete the set with a
memset16().


Re: Revert "powerpc/64: Disable use of radix under a hypervisor"

2017-03-22 Thread Michael Ellerman
On Tue, 2017-03-21 at 01:38:02 UTC, Paul Mackerras wrote:
> This reverts commit 3f91a89d424a79f8082525db5a375e438887bb3e.
> 
> Now that we do have the machinery for using the radix MMU under a
> hypervisor, the extra check and comment introduced in 3f91a89d424a are
> no longer correct.  The result is that when booted under a hypervisor
> that only allows use of radix, we clear the MMU_FTR_TYPE_RADIX and
> then set it again, and print a warning about ignoring the
> disable_radix command line option, even though the command line does
> not include "disable_radix".
> 
> Signed-off-by: Paul Mackerras 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/fc36a903265c18d124cefaba364a7f

cheers


Re: gcc-plugins: update architecture list in documentation

2017-03-22 Thread Michael Ellerman
On Mon, 2017-03-20 at 06:55:22 UTC, Andrew Donnellan wrote:
> Commit 65c059bcaa73 ("powerpc: Enable support for GCC plugins") enabled GCC
> plugins on powerpc, but neglected to update the architecture list in the
> docs. Rectify this.
> 
> Fixes: 65c059bcaa73 ("powerpc: Enable support for GCC plugins")
> Signed-off-by: Andrew Donnellan 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/cc638a488a5205713b51eabd047be6

cheers


Re: [PATCH] powerpc: fix /proc/self/stack

2017-03-22 Thread Michael Ellerman
Thadeu Lima de Souza Cascardo  writes:

> For the current task, the kernel stack would only tell the last time the
> process was rescheduled, if ever. Use the current stack pointer for the
> current task.

You say "fix" in the subject, but is it a bug, or just an enhancement?

> This is also consistent with some other architectures.

Such as .. arm64 and x86 (though it's buried in the unwind code).

> diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
> index 6671195..2446066 100644
> --- a/arch/powerpc/kernel/stacktrace.c
> +++ b/arch/powerpc/kernel/stacktrace.c
> @@ -59,7 +59,12 @@ EXPORT_SYMBOL_GPL(save_stack_trace);
>  
>  void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
>  {
> - save_context_stack(trace, tsk->thread.ksp, tsk, 0);
> + unsigned long sp = tsk->thread.ksp;
> +
> + if (tsk == current)
> + sp = current_stack_pointer();
else
sp = tsk->thread.ksp;

Would be clearer IMHO.
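
With the patch plus that tweak folded in, the function would read
something like:

void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
{
	unsigned long sp;

	if (tsk == current)
		sp = current_stack_pointer();
	else
		sp = tsk->thread.ksp;

	save_context_stack(trace, sp, tsk, 0);
}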

> +
> + save_context_stack(trace, sp, tsk, 0);
>  }
>  EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
>  
> -- 
> 2.9.3


cheers


Re: [PATCH] powerpc/powernv/cpuidle: Pass correct drv->cpumask for registration

2017-03-22 Thread Michael Ellerman
Vaidyanathan Srinivasan  writes:
> * Michael Ellerman  [2017-03-20 14:05:39]:
>> Vaidyanathan Srinivasan  writes:
>  
>> > On powernv platform cpu_present could be less than cpu_possible
>> > in cases where firmware detects the cpu, but it is not available
>> > for OS.
>> 
>> It's entirely normal for present < possible, on my laptop for example,
>> so I don't see how that causes the bug.
>
> Yes, present < possible in itself not a problem.  It is whether
> cpu_device exist for that cpu or not.
...
>
> Currently if CONFIG_HOTPLUG_CPU=n, then we skip calling register_cpu()
> and that causes the problem.
...
>> 
>> I really don't understand how a CPU not being present leads to a crash
>> in printf()? Something in that call chain should have checked that the
>> CPU was registered before crashing in printf() - surely?
>
> Yes, we should have just failed to register the cpuidle driver.  I have
> the fix here:
>
> [PATCH] cpuidle: Validate cpu_dev in cpuidle_add_sysfs
> http://patchwork.ozlabs.org/patch/740634/

OK. Can you send a v2 of this with a better change log that includes all
the clarifications above.

And despite your subject being powerpc/powernv/cpuidle, this is a
cpuidle patch. I can merge it, but I at least need you to Cc the cpuidle
maintainers so they have a chance to see it.

cheers


Re: [PATCH 1/2] powerpc/powernv: process interrupts from system reset wakeup

2017-03-22 Thread Michael Ellerman
Nicholas Piggin  writes:

> When the CPU wakes from low power state, it begins at the system reset
> interrupt with the exception that caused the wakeup encoded in SRR1.
>
> Today, powernv idle wakeup ignores the wakeup reason (except a special
> case for HMI), and the regular interrupt corresponding to the exception
> will fire after the idle wakeup exits.
>
> Change this to replay the interrupt from the idle wakeup before
> interrupts are hard-enabled.
>
> Test on POWER8 of context_switch selftests benchmark with polling idle
> disabled (e.g., always nap) gives the following results:
>
>                                original   wakeup direct
> Different threads, same core:    315k/s   264k/s
> Different cores:                 235k/s   242k/s
>
> There is a slowdown for doorbell IPI (same core) case because system
> reset wakeup does not clear the message and the doorbell interrupt fires
> again needlessly.

Seems like a win.

> diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
> index 5011b69107a7..c0d9fd2e8c04 100644
> --- a/arch/powerpc/include/asm/machdep.h
> +++ b/arch/powerpc/include/asm/machdep.h
> @@ -223,7 +223,6 @@ struct machdep_calls {
>  
>  extern void e500_idle(void);
>  extern void power4_idle(void);
> -extern void power7_idle(void);
>  extern void ppc6xx_idle(void);
>  extern void book3e_idle(void);
>  
> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
> index e0fecbcea2a2..8190943d2619 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -454,6 +454,7 @@ extern int powersave_nap; /* set if nap mode can be used in idle loop */
>  extern unsigned long power7_nap(int check_irq);
>  extern unsigned long power7_sleep(void);
>  extern unsigned long power7_winkle(void);
> +extern unsigned long power7_idle(void);
>  extern unsigned long power9_idle_stop(unsigned long stop_psscr_val,
> unsigned long stop_psscr_mask);
>  
> diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> index 54546b632026..6b28a4f9c1fd 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -234,11 +234,16 @@ u64 pnv_default_stop_mask;
>  /*
>   * Used for ppc_md.power_save which needs a function with no parameters
>   */
> -static void power9_idle(void)
> +static void power9_power_save(void)
>  {
>   power9_idle_stop(pnv_default_stop_val, pnv_default_stop_mask);
>  }
>  
> +static void power7_power_save(void)
> +{
> + power7_idle();
> +}

Erk. This makes me wonder if we can just mandate using cpuidle for
powernv and drop ppc_md.power_save. I wonder who, if anyone, has ever
tested powernv without cpuidle.

I notice we already have:

config CPU_IDLE
bool "CPU idle PM support"
default y if ACPI || PPC_PSERIES


But not your problem for this patch.

> @@ -534,9 +539,9 @@ static int __init pnv_init_idle_states(void)
>   }
>  
>   if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED)
> - ppc_md.power_save = power7_idle;
> + ppc_md.power_save = power7_power_save;
>   else if (supported_cpuidle_states & OPAL_PM_STOP_INST_FAST)
> - ppc_md.power_save = power9_idle;
> + ppc_md.power_save = power9_power_save;
>  
>  out:
>   return 0;
> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> index 370593006f5f..e7e080c0790c 100644
> --- a/drivers/cpuidle/cpuidle-powernv.c
> +++ b/drivers/cpuidle/cpuidle-powernv.c
> @@ -70,13 +70,37 @@ static int snooze_loop(struct cpuidle_device *dev,
>   return index;
>  }
>  
> +/*
> + * Table to convert shifted SRR1 wakeup reason for POWER7, POWER8, POWER9
> + * to __replay_interrupt vector.
> + */
> +static const unsigned short srr1_wakeup_to_replay_table[0x10] =
> +{ 0, 0, 0, 0xe80,/* 0x3 = hv doorbell */
> +  0, 0xa00,  /* 0x5 = doorbell */
> +  0x900, /* 0x6 = decrementer */
> +  0, 0x500,  /* 0x8 = external */
> +  0xea0, /* 0x9 = hv virt (POWER9) */
> +  0xe60, /* 0xa = hmi */
> +  0, 0, 0, 0, 0, };
> +
> +/* Shift SRR1_WAKEMASK_P8 down and convert to __replay_interrupt vector */
> +#define SRR1_TO_REPLAY(srr1) \
> + ((unsigned int)srr1_wakeup_to_replay_table[((srr1) >> 18) & 0xf])

This is a bit hairy, I'd just use a switch, but I guess this generates
vastly better code?
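
For comparison, the switch form would be (same mapping as the table; a
sketch, not tested):

/* Convert shifted SRR1 wakeup reason to a __replay_interrupt vector. */
static unsigned int srr1_to_replay(u64 srr1)
{
	switch ((srr1 >> 18) & 0xf) {
	case 0x3: return 0xe80;	/* hv doorbell */
	case 0x5: return 0xa00;	/* doorbell */
	case 0x6: return 0x900;	/* decrementer */
	case 0x8: return 0x500;	/* external */
	case 0x9: return 0xea0;	/* hv virt (POWER9) */
	case 0xa: return 0xe60;	/* hmi */
	default:  return 0;
	}
}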

> +
>  static int nap_loop(struct cpuidle_device *dev,
>   struct cpuidle_driver *drv,
>   int index)
>  {
> + u64 srr1;
> + unsigned int reason;
> +
>   ppc64_runlatch_off();
> - power7_idle();
> + srr1 = power7_idle();
>   ppc64_runlatch_on();
> +
> + reason = SRR1_TO_REPLAY(srr1);
> + __replay_interrupt(reason);
> +
>   return index;
>  }
>  
> @@ -111,10 +135,17 @@ static 

Re: [v2 PATCH 4/4] powernv: Recover correct PACA on wakeup from a stop on P9 DD1

2017-03-22 Thread Nicholas Piggin
On Wed, 22 Mar 2017 11:28:46 +0530
Gautham R Shenoy  wrote:

> On Tue, Mar 21, 2017 at 02:59:46AM +1000, Nicholas Piggin wrote:
> > On Mon, 20 Mar 2017 21:24:18 +0530
> > "Gautham R. Shenoy"  wrote:
> >   
> > > From: "Gautham R. Shenoy" 
> > > 
> > > POWER9 DD1.0 hardware has an issue due to which the SPRs of a thread
> > > waking up from stop 0,1,2 with ESL=1 can end up being misplaced in the
> > > core. Thus the HSPRG0 of a thread waking up from stop can contain the
> > > paca pointer of its sibling.
> > > 
> > > This patch implements a context recovery framework within threads of a
> > > core, by provisioning space in paca_struct for saving every sibling
> > > thread's paca pointer. Basically, we should be able to arrive at the
> > > right paca pointer from any of the threads' existing paca pointers.
> > > 
> > > At bootup, during powernv idle-init, we save the paca address of every
> > > CPU in each of its siblings' paca_struct in the slot corresponding to
> > > this CPU's index in the core.
> > > 
> > > On wakeup from a stop, the thread will determine its index in the core
> > > from the lower 2 bits of the PIR register and recover its PACA pointer
> > > by indexing into the correct slot in the provisioned space in the
> > > current PACA.
> > > 
> > > Furthermore, ensure that the NVGPRs are restored from the stack on the
> > > way out by setting the NAPSTATELOST in paca.  
> > 
> > Thanks for expanding on this, it makes the patch easier to follow :)
> > 
> > As noted before, I think if we use PACA_EXNMI for system reset, then
> > *hopefully* there should be minimal races with the initial use of other
> > thread's PACA at the start of the exception. So I'll work on getting
> > that in, but it need not prevent this patch from being merged first
> > IMO.
> >   
> > > [Changelog written with inputs from sva...@linux.vnet.ibm.com]
> > > 
> > > Signed-off-by: Gautham R. Shenoy 
> > > ---
> > >  arch/powerpc/include/asm/paca.h   |  5 
> > >  arch/powerpc/kernel/asm-offsets.c |  1 +
> > >  arch/powerpc/kernel/idle_book3s.S | 49 ++-
> > >  arch/powerpc/platforms/powernv/idle.c | 22 
> > >  4 files changed, 76 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/powerpc/include/asm/paca.h 
> > > b/arch/powerpc/include/asm/paca.h
> > > index 708c3e5..4405630 100644
> > > --- a/arch/powerpc/include/asm/paca.h
> > > +++ b/arch/powerpc/include/asm/paca.h
> > > @@ -172,6 +172,11 @@ struct paca_struct {
> > >   u8 thread_mask;
> > >   /* Mask to denote subcore sibling threads */
> > >   u8 subcore_sibling_mask;
> > > + /*
> > > +  * Pointer to an array which contains pointer
> > > +  * to the sibling threads' paca.
> > > +  */
> > > + struct paca_struct *thread_sibling_pacas[8];  
> 
> > 
> > Is 8 the right number? I wonder if we have a define for it.  
> 
> Thats the maximum number of threads per core that we have had on POWER
> so far.
> 
> Perhaps, I can make this
> 
>struct paca_struct **thread_sibling_pacas;
> 
> and allocate threads_per_core number of slots in
> pnv_init_idle_states. Sounds ok ?

I guess that would minimise PACA overhead for non-DD1 machines,
so if it's not too much trouble, that might be good.


> > > +power9_dd1_recover_paca:
> > > + mfspr   r4, SPRN_PIR
> > > + clrldi  r4, r4, 62  
> > 
> > Does SPRN_TIR work?  
> 
> I wasn't aware of SPRN_TIR!
> 
> I can check this. If my reading of the ISA is correct, TIR should
> contain the thread number, which is in the range [0..3].

Yep.


> > Reviewed-by: Nicholas Piggin 
> >   
> 
> Thanks for reviewing the patch.

No problems. Don't worry about the machine check wakeup for the moment
either. It's more important to just get the normal wakeup fix in I think.
We can revisit what to do there after my machine check patches go in
(idle machine check does not really work right now for POWER9 anyway).

Thanks,
Nick