Re: [PATCH] selftests/powerpc: Skip tm-unavailable if TM is not enabled

2018-03-05 Thread Cyril Bur
On Mon, 2018-03-05 at 15:48 -0500, Gustavo Romero wrote:
> Some processor revisions do not support transactional memory, and
> additionally kernel support can be disabled. In either case the
> tm-unavailable test should be skipped, otherwise it will fail with
> a SIGILL.
> 
> This commit also sets this selftest to be called through the test
> harness, as is done for the other TM selftests.
> 
> Finally, it avoids using "ping" as a thread name since it's
> ambiguous and can be confusing when shown, for instance,
> in a kernel backtrace log.
> 

I spent more time than I care to admit looking at backtraces wondering
how "ping" got in the mix ;).


> Fixes: 77fad8bfb1d2 ("selftests/powerpc: Check FP/VEC on exception in TM")
> Signed-off-by: Gustavo Romero <grom...@linux.vnet.ibm.com>

Reviewed-by: Cyril Bur <cyril...@gmail.com>

> ---
>  .../testing/selftests/powerpc/tm/tm-unavailable.c  | 24 ++++++++++++++++--------
>  1 file changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/testing/selftests/powerpc/tm/tm-unavailable.c 
> b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
> index e6a0fad..156c8e7 100644
> --- a/tools/testing/selftests/powerpc/tm/tm-unavailable.c
> +++ b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
> @@ -80,7 +80,7 @@ bool is_failure(uint64_t condition_reg)
>   return ((condition_reg >> 28) & 0xa) == 0xa;
>  }
>  
> -void *ping(void *input)
> +void *tm_una_ping(void *input)
>  {
>  
>   /*
> @@ -280,7 +280,7 @@ void *ping(void *input)
>  }
>  
>  /* Thread to force context switch */
> -void *pong(void *not_used)
> +void *tm_una_pong(void *not_used)
>  {
>   /* Wait thread get its name "pong". */
>   if (DEBUG)
> @@ -311,11 +311,11 @@ void test_fp_vec(int fp, int vec, pthread_attr_t *attr)
>   do {
>   int rc;
>  
> - /* Bind 'ping' to CPU 0, as specified in 'attr'. */
> - rc = pthread_create(&t0, attr, ping, (void *) &flags);
> + /* Bind to CPU 0, as specified in 'attr'. */
> + rc = pthread_create(&t0, attr, tm_una_ping, (void *) &flags);
>   if (rc)
>   pr_err(rc, "pthread_create()");
> - rc = pthread_setname_np(t0, "ping");
> + rc = pthread_setname_np(t0, "tm_una_ping");
>   if (rc)
>   pr_warn(rc, "pthread_setname_np");
>   rc = pthread_join(t0, &ret_value);
> @@ -333,13 +333,15 @@ void test_fp_vec(int fp, int vec, pthread_attr_t *attr)
>   }
>  }
>  
> -int main(int argc, char **argv)
> +int tm_unavailable_test(void)
>  {
>   int rc, exception; /* FP = 0, VEC = 1, VSX = 2 */
>   pthread_t t1;
>   pthread_attr_t attr;
>   cpu_set_t cpuset;
>  
> + SKIP_IF(!have_htm());
> +
>   /* Set only CPU 0 in the mask. Both threads will be bound to CPU 0. */
>   CPU_ZERO(&cpuset);
>   CPU_SET(0, &cpuset);
> @@ -354,12 +356,12 @@ int main(int argc, char **argv)
>   if (rc)
>   pr_err(rc, "pthread_attr_setaffinity_np()");
>  
> - rc = pthread_create(&t1, &attr /* Bind 'pong' to CPU 0 */, pong, NULL);
> + rc = pthread_create(&t1, &attr /* Bind to CPU 0 */, tm_una_pong, NULL);
>   if (rc)
>   pr_err(rc, "pthread_create()");
>  
>   /* Name it for systemtap convenience */
> - rc = pthread_setname_np(t1, "pong");
> + rc = pthread_setname_np(t1, "tm_una_pong");
>   if (rc)
>   pr_warn(rc, "pthread_create()");
>  
> @@ -394,3 +396,9 @@ int main(int argc, char **argv)
>   exit(0);
>   }
>  }
> +
> +int main(int argc, char **argv)
> +{
> + test_harness_set_timeout(220);
> + return test_harness(tm_unavailable_test, "tm_unavailable_test");
> +}
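
As an aside, the SKIP_IF(!have_htm()) gate added above boils down to an
AT_HWCAP2 probe. A minimal sketch of the helper (assuming the usual form
of have_htm() from the selftests' tm.h; the in-tree implementation may
differ in detail):

    #include <stdbool.h>
    #include <sys/auxv.h>

    #ifndef PPC_FEATURE2_HTM
    #define PPC_FEATURE2_HTM 0x40000000 /* from uapi/asm/cputable.h */
    #endif

    static inline bool have_htm(void)
    {
        /* Set only when both the CPU and the kernel advertise HTM. */
        return getauxval(AT_HWCAP2) & PPC_FEATURE2_HTM;
    }

This covers both failure modes from the commit message: old processor
revisions never set the bit, and a kernel with TM support disabled
should clear it as well.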


Re: [RFC PATCH 05/12] [WIP] powerpc/tm: Reclaim/recheckpoint on entry/exit

2018-02-19 Thread Cyril Bur
On Tue, 2018-02-20 at 16:25 +1100, Michael Neuling wrote:
> > > > @@ -1055,6 +1082,8 @@ void restore_tm_state(struct pt_regs *regs)
> > > > msr_diff = current->thread.ckpt_regs.msr & ~regs->msr;
> > > > msr_diff &= MSR_FP | MSR_VEC | MSR_VSX;
> > > >  
> > > > +   tm_recheckpoint(&current->thread);
> > > > +
> > > 
> > > So why do we do tm_recheckpoint at all? Shouldn't most of the tm_blah 
> > > code go
> > > away in process.c after all this?
> > > 
> > 
> > I'm not sure I follow, we need to recheckpoint because we're going back
> > to userspace? Or would you rather calling the tm.S code directly from
> > the exception return path?
> 
> Yeah, I was thinking that's the point of this series: we do tm_reclaim right on
> entry and tm_recheckpoint right on exit.
> 

Yeah, that's the ultimate goal. Considering I haven't been attacked or
offered more drugs, I feel like what I've done isn't crazy. Your
feedback is great, thanks.
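
For what it's worth, the exit side ends up shaped roughly like this (a
loose paraphrase of restore_tm_state() with the hunk above applied, not
the final code):

    void restore_tm_state(struct pt_regs *regs)
    {
        unsigned long msr_diff;

        clear_thread_flag(TIF_RESTORE_TM);
        if (!MSR_TM_ACTIVE(regs->msr))
            return;

        msr_diff = current->thread.ckpt_regs.msr & ~regs->msr;
        msr_diff &= MSR_FP | MSR_VEC | MSR_VSX;

        /* New: reload the checkpointed state on the way out */
        tm_recheckpoint(&current->thread);

        /* ... restore FP/VEC/VSX exactly as before ... */
    }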

> The bits in between (ie. the tm_blah() calls process.c) would mostly go away.
> 
> 
> > Yes, I hope we'll be able to have a fairly big cleanup commit of tm_
> > code in process.c at the end of this series.
> 
> Yep, agreed.
> 
> Mikey


Re: [RFC PATCH 10/12] [WIP] powerpc/tm: Correctly save/restore checkpointed sprs

2018-02-19 Thread Cyril Bur
On Tue, 2018-02-20 at 14:00 +1100, Michael Neuling wrote:
> This needs a description of what you're trying to do.  "Correctly" doesn't
> really mean anything.
> 
> 
> On Tue, 2018-02-20 at 11:22 +1100, Cyril Bur wrote:
> > ---
> >  arch/powerpc/kernel/process.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
> >  arch/powerpc/kernel/ptrace.c  |  9 +++------
> >  2 files changed, 58 insertions(+), 8 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index cd3ae80a6878..674f75c56172 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -859,6 +859,8 @@ static inline bool tm_enabled(struct task_struct *tsk)
> > return tsk && tsk->thread.regs && (tsk->thread.regs->msr & MSR_TM);
> >  }
> >  
> > +static inline void save_sprs(struct thread_struct *t);
> > +
> >  static void tm_reclaim_thread(struct thread_struct *thr, uint8_t cause)
> >  {
> > /*
> > @@ -879,6 +881,8 @@ static void tm_reclaim_thread(struct thread_struct *thr,
> > uint8_t cause)
> > if (!MSR_TM_SUSPENDED(mfmsr()))
> > return;
> >  
> > +   save_sprs(thr);
> > +
> > giveup_all(container_of(thr, struct task_struct, thread));
> >  
> > tm_reclaim(thr, cause);
> > @@ -991,6 +995,37 @@ void tm_recheckpoint(struct thread_struct *thread)
> >  
> > __tm_recheckpoint(thread);
> >  
> > +   /*
> > +* This is a stripped down restore_sprs(), we need to do this
> > +* now as we might go straight out to userspace and currently
> > +* the checkpointed values are on the CPU.
> > +*
> > +* TODO: Improve
> > +*/
> > +#ifdef CONFIG_ALTIVEC
> > +   if (cpu_has_feature(CPU_FTR_ALTIVEC))
> > +   mtspr(SPRN_VRSAVE, thread->vrsave);
> > +#endif
> > +#ifdef CONFIG_PPC_BOOK3S_64
> > +   if (cpu_has_feature(CPU_FTR_DSCR)) {
> > +   u64 dscr = get_paca()->dscr_default;
> > +   if (thread->dscr_inherit)
> > +   dscr = thread->dscr;
> > +
> > +   mtspr(SPRN_DSCR, dscr);
> > +   }
> > +
> > +   if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
> > +   /* The EBB regs aren't checkpointed */
> > +   mtspr(SPRN_FSCR, thread->fscr);
> > +
> > +   mtspr(SPRN_TAR, thread->tar);
> > +   }
> > +
> > +   /* I think we don't need to */
> > +   if (cpu_has_feature(CPU_FTR_ARCH_300))
> > +   mtspr(SPRN_TIDR, thread->tidr);
> > +#endif
> 
> Why are you touching all the above hunk?

I copied restore_sprs. I'm tidying that up now - we can't call
restore_sprs because we don't have a prev and next thread.

> 
> > local_irq_restore(flags);
> >  }
> >  
> > @@ -1193,6 +1228,11 @@ struct task_struct *__switch_to(struct task_struct
> > *prev,
> >  #endif
> >  
> > new_thread = &new->thread;
> > +   /*
> > +* Why not &prev->thread; ?
> > +* What is the difference between &prev->thread and
> > +* &current->thread ?
> > +*/
> 
> Why not just work it out and FIX THE CODE, rather than just rabbiting on about
> it! :-P

Agreed - I started to and then had a mini freakout that things would
end really badly if they're not the same. So I left that comment as a
reminder to investigate.

They should be the same though right?

> 
> > old_thread = &current->thread;
> >  
> > WARN_ON(!irqs_disabled());
> > @@ -1237,8 +1277,16 @@ struct task_struct *__switch_to(struct task_struct
> > *prev,
> > /*
> >  * We need to save SPRs before treclaim/trecheckpoint as these will
> >  * change a number of them.
> > +*
> > +* Because we're now reclaiming on kernel entry, we've had to
> > +* already save them. Don't do it again.
> > +* Note: To deliver a signal in the signal context, we'll have
> > +* turned off TM because we don't want the signal context to
> > +* have the transactional state of the main thread - what if
> > +* we go through __switch_to() at that point? Can we?
> >  */
> > -   save_sprs(&prev->thread);
> > +   if (!prev->thread.regs || !MSR_TM_ACTIVE(prev->thread.regs->msr))
> > +   save_sprs(&prev->thread);
> >  
> > /* Save FPU, Altivec, VSX and SPE state */
> > giveup_all(prev);
> > @@ -1260,8 +1308,13 @@ struct task_struct *__switch_to(struct task_struct
> > *prev,
> >  * for this is we manually crea

Re: [RFC PATCH 05/12] [WIP] powerpc/tm: Reclaim/recheckpoint on entry/exit

2018-02-19 Thread Cyril Bur
On Tue, 2018-02-20 at 13:50 +1100, Michael Neuling wrote:
> On Tue, 2018-02-20 at 11:22 +1100, Cyril Bur wrote:
> 
> 
> The comment from the cover sheet should be here
> 
> > ---
> >  arch/powerpc/include/asm/exception-64s.h | 25 +
> >  arch/powerpc/kernel/entry_64.S   |  5 +
> >  arch/powerpc/kernel/process.c| 37 +++++++++++++++++++++++++++++++++----
> >  3 files changed, 63 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/exception-64s.h 
> > b/arch/powerpc/include/asm/exception-64s.h
> > index 471b2274fbeb..f904f19a9ec2 100644
> > --- a/arch/powerpc/include/asm/exception-64s.h
> > +++ b/arch/powerpc/include/asm/exception-64s.h
> > @@ -35,6 +35,7 @@
> >   * implementations as possible.
> >   */
> >  #include 
> > +#include 
> >  
> >  /* PACA save area offsets (exgen, exmc, etc) */
> >  #define EX_R9  0
> > @@ -127,6 +128,26 @@
> > hrfid;  \
> > b   hrfi_flush_fallback
> >  
> > +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> > +#define TM_KERNEL_ENTRY \
> > +   ld  r3,_MSR(r1);\
> > +   /* Probably don't need to check if coming from user/kernel */   \
> > +   /* If TM is suspended or active then we must have come from*/   \
> > +   /* userspace */ \
> > +   andi.   r0,r3,MSR_PR;   \
> > +   beq 1f; \
> > +   rldicl. r3,r3,(64-MSR_TS_LG),(64-2); /* SUSPENDED or ACTIVE*/   \
> > +   beql+   1f; /* Not SUSPENDED or ACTIVE */   \
> > +   bl  save_nvgprs;\
> > +   RECONCILE_IRQ_STATE(r10,r11);   \
> > +   li  r3,TM_CAUSE_MISC;   \
> > +   bl  tm_reclaim_current; /* uint8 cause */   \
> > +1:
> > +
> > +#else /* CONFIG_PPC_TRANSACTIONAL_MEM */
> > +#define TM_KERNEL_ENTRY
> > +#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
> > +
> >  #ifdef CONFIG_RELOCATABLE
> >  #define __EXCEPTION_RELON_PROLOG_PSERIES_1(label, h)   \
> > mfspr   r11,SPRN_##h##SRR0; /* save SRR0 */ \
> > @@ -675,6 +696,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL)
> > EXCEPTION_PROLOG_COMMON(trap, area);\
> > /* Volatile regs are potentially clobbered here */  \
> > additions;  \
> > +   /* This is going to need to go somewhere else as well */\
> > +   /* See comment in tm_recheckpoint()   */\
> > +   TM_KERNEL_ENTRY;\
> > addir3,r1,STACK_FRAME_OVERHEAD; \
> > bl  hdlr;   \
> > b   ret
> > @@ -689,6 +713,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL)
> > EXCEPTION_PROLOG_COMMON_3(trap);\
> > /* Volatile regs are potentially clobbered here */  \
> > additions;  \
> > +   TM_KERNEL_ENTRY;\
> > addir3,r1,STACK_FRAME_OVERHEAD; \
> > bl  hdlr
> >  
> > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> > index 2cb5109a7ea3..107c15c6f48b 100644
> > --- a/arch/powerpc/kernel/entry_64.S
> > +++ b/arch/powerpc/kernel/entry_64.S
> > @@ -126,6 +126,11 @@ BEGIN_FW_FTR_SECTION
> >  33:
> >  END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
> >  #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE && CONFIG_PPC_SPLPAR */
> > +   TM_KERNEL_ENTRY
> > +   REST_GPR(0,r1)
> > +   REST_4GPRS(3,r1)
> > +   REST_2GPRS(7,r1)
> > +   addir9,r1,STACK_FRAME_OVERHEAD
> 
> Why are we doing these restores here now?

The syscall handler expects the syscall number to still be in r0 and
the arguments in r3-r8; the call into C from TM_KERNEL_ENTRY clobbers
those volatile registers, hence the restores from the stack frame.

> 
> >  
> > /*
> >  * A syscall should always be called with interrupts enabled
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index 77dc6d8288eb..ea75da0fd506 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -951,6 +951,23 @@ void tm_recheckpoint(struct thread_struct 

Re: [RFC PATCH 06/12] [WIP] powerpc/tm: Remove dead code from __switch_to_tm()

2018-02-19 Thread Cyril Bur
On Tue, 2018-02-20 at 13:52 +1100, Michael Neuling wrote:
> Not sure I understand this.. should it be merged with the last patch?
> 

It's all going to have to be one patch - I've left it split out to make
it more obvious which bits I've had to mess with; this series
absolutely doesn't bisect.

> Needs a comment here.
> 
> 
> On Tue, 2018-02-20 at 11:22 +1100, Cyril Bur wrote:
> > ---
> >  arch/powerpc/kernel/process.c | 24 +---
> >  1 file changed, 5 insertions(+), 19 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index ea75da0fd506..574b05fe7d66 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -1027,27 +1027,13 @@ static inline void __switch_to_tm(struct 
> > task_struct *prev,
> > struct task_struct *new)
> >  {
> > /*
> > -* So, with the rework none of this code should not be needed.
> > -* I've left in the reclaim for now. This *should* save us
> > -* from any mistake in the new code. Also the
> > -* enabling/disabling logic of MSR_TM really should be
> > +* The enabling/disabling logic of MSR_TM really should be
> >  * refactored into a common way with MSR_{FP,VEC,VSX}
> >  */
> > -   if (cpu_has_feature(CPU_FTR_TM)) {
> > -   if (tm_enabled(prev) || tm_enabled(new))
> > -   tm_enable();
> > -
> > -   if (tm_enabled(prev)) {
> > -   prev->thread.load_tm++;
> > -   tm_reclaim_task(prev);
> > -   /*
> > -* The disabling logic may be confused don't
> > -* disable for now
> > -*
> > -* if (!MSR_TM_ACTIVE(prev->thread.regs->msr) && 
> > prev->thread.load_tm == 0)
> > -*  prev->thread.regs->msr &= ~MSR_TM;
> > -*/
> > -   }
> > +   if (cpu_has_feature(CPU_FTR_TM) && tm_enabled(prev)) {
> > +   prev->thread.load_tm++;
> > +   if (!MSR_TM_ACTIVE(prev->thread.regs->msr) && 
> > prev->thread.load_tm == 0)
> > +   prev->thread.regs->msr &= ~MSR_TM;
> > }
> >  }
> >  


Re: [RFC PATCH 12/12] [WIP] selftests/powerpc: Remove incorrect tm-syscall selftest

2018-02-19 Thread Cyril Bur
On Tue, 2018-02-20 at 14:04 +1100, Michael Neuling wrote:
> > --- a/tools/testing/selftests/powerpc/tm/tm-syscall.c
> > +++ /dev/null
> > @@ -1,106 +0,0 @@
> > -/*
> > - * Copyright 2015, Sam Bobroff, IBM Corp.
> > - * Licensed under GPLv2.
> > - *
> > - * Test the kernel's system call code to ensure that a system call
> > - * made from within an active HTM transaction is aborted with the
> > - * correct failure code.
> 
> The above is still true
> 
> > - * Conversely, ensure that a system call made from within a
> > - * suspended transaction can succeed.
> 
> This isn't true anymore
> 
> So can we just modify the test to remove the second part?
> 

Oh true, I overlooked that.

Thanks

> Mikey


[RFC PATCH 12/12] [WIP] selftests/powerpc: Remove incorrect tm-syscall selftest

2018-02-19 Thread Cyril Bur
Currently we perform transactional memory work as late as possible.
That is, we run in the kernel with the userspace checkpointed state on
the CPU until we absolutely must remove it and store it away - likely
on a process switch, but possibly also for signals or ptrace.

What this means is that if userspace does a system call in suspended
mode, it is possible that we will handle the system call and return
without needing to do a reclaim/recheckpoint, and so userspace can
expect to resume its transaction.

This is what tm-syscall tests for - the ability to perform a system
call in suspended state and still resume the transaction afterwards.

The TM rework means that we now deal with any transactional state on
entry to the kernel, no matter the reason for entry (some exceptions
apply). We will categorically doom any suspended transaction that makes
a system call, making that transaction unresumable.

This test will now always fail, no matter what. I would like to note
here that this new behaviour does not break userspace at all. Hardware
Transactional Memory gives zero guarantee of forward progress and any
correct userspace has always had to implement a non-HTM fallback.
Relying on this specific kernel behaviour also meant relying on the
stars aligning in the hardware: that there was no cache overlap and
that the hardware footprint was large enough to handle any system call
without dooming the transaction.
---
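As an illustration of that last point, a correct HTM user already wraps
every transaction in something like the following (a sketch only: the
counter type, lock and MAX_TM_TRIES are illustrative, and
failure_is_persistent() is the helper from the selftests' tm.h). A
transaction doomed by the new reclaim-on-entry behaviour simply takes
the fallback path like any other aborted transaction:

    #define MAX_TM_TRIES 10

    struct counter {
        pthread_mutex_t lock;
        unsigned long value;
    };

    static void counter_inc(struct counter *c)
    {
        for (int tries = 0; tries < MAX_TM_TRIES; tries++) {
            if (__builtin_tbegin(0)) {
                c->value++;             /* transactional path */
                __builtin_tend(0);
                return;
            }
            /* Aborted: persistent failures never succeed on retry */
            if (failure_is_persistent())
                break;
        }

        pthread_mutex_lock(&c->lock);   /* mandatory non-HTM fallback */
        c->value++;
        pthread_mutex_unlock(&c->lock);
    }
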
 tools/testing/selftests/powerpc/tm/Makefile|   4 +-
 .../testing/selftests/powerpc/tm/tm-syscall-asm.S  |  28 --
 tools/testing/selftests/powerpc/tm/tm-syscall.c| 106 -
 3 files changed, 1 insertion(+), 137 deletions(-)
 delete mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
 delete mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall.c

diff --git a/tools/testing/selftests/powerpc/tm/Makefile 
b/tools/testing/selftests/powerpc/tm/Makefile
index 7a1e53297588..88d6edffcb24 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -2,7 +2,7 @@
 SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu \
	tm-signal-context-chk-vmx tm-signal-context-chk-vsx
 
-TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \
+TEST_GEN_PROGS := tm-resched-dscr tm-signal-msr-resv tm-signal-stack \
 	tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable tm-trap \
 	tm-signal-drop-transaction \
 	$(SIGNAL_CONTEXT_CHK_TESTS)
@@ -13,8 +13,6 @@ $(TEST_GEN_PROGS): ../harness.c ../utils.c
 
 CFLAGS += -mhtm
 
-$(OUTPUT)/tm-syscall: tm-syscall-asm.S
-$(OUTPUT)/tm-syscall: CFLAGS += -I../../../../../usr/include
 $(OUTPUT)/tm-tmspr: CFLAGS += -pthread
 $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64
 $(OUTPUT)/tm-resched-dscr: ../pmu/lib.o
diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S 
b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
deleted file mode 100644
index bd1ca25febe4..
--- a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
+++ /dev/null
@@ -1,28 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#include 
-#include 
-
-   .text
-FUNC_START(getppid_tm_active)
-   tbegin.
-   beq 1f
-   li  r0, __NR_getppid
-   sc
-   tend.
-   blr
-1:
-   li  r3, -1
-   blr
-
-FUNC_START(getppid_tm_suspended)
-   tbegin.
-   beq 1f
-   li  r0, __NR_getppid
-   tsuspend.
-   sc
-   tresume.
-   tend.
-   blr
-1:
-   li  r3, -1
-   blr
diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall.c 
b/tools/testing/selftests/powerpc/tm/tm-syscall.c
deleted file mode 100644
index 454b965a2db3..
--- a/tools/testing/selftests/powerpc/tm/tm-syscall.c
+++ /dev/null
@@ -1,106 +0,0 @@
-/*
- * Copyright 2015, Sam Bobroff, IBM Corp.
- * Licensed under GPLv2.
- *
- * Test the kernel's system call code to ensure that a system call
- * made from within an active HTM transaction is aborted with the
- * correct failure code.
- * Conversely, ensure that a system call made from within a
- * suspended transaction can succeed.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include "utils.h"
-#include "tm.h"
-
-extern int getppid_tm_active(void);
-extern int getppid_tm_suspended(void);
-
-unsigned retries = 0;
-
-#define TEST_DURATION 10 /* seconds */
-#define TM_RETRIES 100
-
-pid_t getppid_tm(bool suspend)
-{
-   int i;
-   pid_t pid;
-
-   for (i = 0; i < TM_RETRIES; i++) {
-   if (suspend)
-   pid = getppid_tm_suspended();
-   else
-   pid = getppid_tm_active();
-
-   if (pid >= 0)
-   return pid;
-
-   if (failure_is_persistent()) {
-   if (failure_is_syscall())
-   return -1;
-
-   printf("Unexpected 

[RFC PATCH 10/12] [WIP] powerpc/tm: Correctly save/restore checkpointed sprs

2018-02-19 Thread Cyril Bur
---
 arch/powerpc/kernel/process.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 arch/powerpc/kernel/ptrace.c  |  9 +++------
 2 files changed, 58 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index cd3ae80a6878..674f75c56172 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -859,6 +859,8 @@ static inline bool tm_enabled(struct task_struct *tsk)
return tsk && tsk->thread.regs && (tsk->thread.regs->msr & MSR_TM);
 }
 
+static inline void save_sprs(struct thread_struct *t);
+
 static void tm_reclaim_thread(struct thread_struct *thr, uint8_t cause)
 {
/*
@@ -879,6 +881,8 @@ static void tm_reclaim_thread(struct thread_struct *thr, 
uint8_t cause)
if (!MSR_TM_SUSPENDED(mfmsr()))
return;
 
+   save_sprs(thr);
+
giveup_all(container_of(thr, struct task_struct, thread));
 
tm_reclaim(thr, cause);
@@ -991,6 +995,37 @@ void tm_recheckpoint(struct thread_struct *thread)
 
__tm_recheckpoint(thread);
 
+   /*
+* This is a stripped down restore_sprs(), we need to do this
+* now as we might go straight out to userspace and currently
+* the checkpointed values are on the CPU.
+*
+* TODO: Improve
+*/
+#ifdef CONFIG_ALTIVEC
+   if (cpu_has_feature(CPU_FTR_ALTIVEC))
+   mtspr(SPRN_VRSAVE, thread->vrsave);
+#endif
+#ifdef CONFIG_PPC_BOOK3S_64
+   if (cpu_has_feature(CPU_FTR_DSCR)) {
+   u64 dscr = get_paca()->dscr_default;
+   if (thread->dscr_inherit)
+   dscr = thread->dscr;
+
+   mtspr(SPRN_DSCR, dscr);
+   }
+
+   if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
+   /* The EBB regs aren't checkpointed */
+   mtspr(SPRN_FSCR, thread->fscr);
+
+   mtspr(SPRN_TAR, thread->tar);
+   }
+
+   /* I think we don't need to */
+   if (cpu_has_feature(CPU_FTR_ARCH_300))
+   mtspr(SPRN_TIDR, thread->tidr);
+#endif
local_irq_restore(flags);
 }
 
@@ -1193,6 +1228,11 @@ struct task_struct *__switch_to(struct task_struct *prev,
 #endif
 
new_thread = &new->thread;
+   /*
+* Why not &prev->thread; ?
+* What is the difference between &prev->thread and
+* &current->thread ?
+*/
old_thread = &current->thread;
 
WARN_ON(!irqs_disabled());
@@ -1237,8 +1277,16 @@ struct task_struct *__switch_to(struct task_struct *prev,
/*
 * We need to save SPRs before treclaim/trecheckpoint as these will
 * change a number of them.
+*
+* Because we're now reclaiming on kernel entry, we've had to
+* already save them. Don't do it again.
+* Note: To deliver a signal in the signal context, we'll have
+* turned off TM because we don't want the signal context to
+* have the transactional state of the main thread - what if
+* we go through __switch_to() at that point? Can we?
 */
-   save_sprs(&prev->thread);
+   if (!prev->thread.regs || !MSR_TM_ACTIVE(prev->thread.regs->msr))
+   save_sprs(&prev->thread);
 
/* Save FPU, Altivec, VSX and SPE state */
giveup_all(prev);
@@ -1260,8 +1308,13 @@ struct task_struct *__switch_to(struct task_struct *prev,
 * for this is we manually create a stack frame for new tasks that
 * directly returns through ret_from_fork() or
 * ret_from_kernel_thread(). See copy_thread() for details.
+*
+* It isn't strictly necessary that we avoid the restore here
+* because we'll simply restore again after the recheckpoint,
+* but we can avoid it for performance reasons.
 */
-   restore_sprs(old_thread, new_thread);
+   if (!new_thread->regs || !MSR_TM_ACTIVE(new_thread->regs->msr))
+   restore_sprs(old_thread, new_thread);
 
last = _switch(old_thread, new_thread);
 
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index ca72d7391d40..16001987ba71 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -135,12 +135,9 @@ static void flush_tmregs_to_thread(struct task_struct *tsk)
if ((!cpu_has_feature(CPU_FTR_TM)) || (tsk != current))
return;
 
-   if (MSR_TM_SUSPENDED(mfmsr())) {
-   tm_reclaim_current(TM_CAUSE_SIGNAL);
-   } else {
-   tm_enable();
-   tm_save_sprs(&(tsk->thread));
-   }
+   BUG_ON(MSR_TM_SUSPENDED(mfmsr()));
+   tm_enable();
+   tm_save_sprs(&(tsk->thread));
 }
 #else
 static inline void flush_tmregs_to_thread(struct task_struct *tsk) { }
-- 
2.16.2



[RFC PATCH 08/12] [WIP] powerpc/tm: Fix *unavailable_tm exceptions

2018-02-19 Thread Cyril Bur
---
 arch/powerpc/kernel/process.c | 11 ++-
 arch/powerpc/kernel/traps.c   |  3 ---
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 574b05fe7d66..8a32fd062a2b 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -500,11 +500,20 @@ void giveup_all(struct task_struct *tsk)
 
usermsr = tsk->thread.regs->msr;
 
+   /*
+* The *_unavailable_tm() functions might call this in a
+* transaction but with no FP or VEC or VSX, meaning that the
+* if condition below will be true. This is bad since we will
+* have performed a reclaim but not set the TIF flag, which
+* must be set in order to trigger the recheckpoint.
+*
+* possible TODO: Move setting the TIF flag into reclaim code
+*/
+   check_if_tm_restore_required(tsk);
if ((usermsr & msr_all_available) == 0)
return;
 
msr_check_and_set(msr_all_available);
-   check_if_tm_restore_required(tsk);
 
WARN_ON((usermsr & MSR_VSX) && !((usermsr & MSR_FP) && (usermsr & 
MSR_VEC)));
 
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 1e48d157196a..dccfcaf4f603 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1728,7 +1728,6 @@ void fp_unavailable_tm(struct pt_regs *regs)
 * If VMX is in use, the VRs now hold checkpointed values,
 * so we don't want to load the VRs from the thread_struct.
 */
-   tm_recheckpoint(&current->thread);
 }
 
 void altivec_unavailable_tm(struct pt_regs *regs)
@@ -1742,7 +1741,6 @@ void altivec_unavailable_tm(struct pt_regs *regs)
 regs->nip, regs->msr);
tm_reclaim_current(TM_CAUSE_FAC_UNAV);
current->thread.load_vec = 1;
-   tm_recheckpoint(&current->thread);
current->thread.used_vr = 1;
 }
 
@@ -1767,7 +1765,6 @@ void vsx_unavailable_tm(struct pt_regs *regs)
current->thread.load_vec = 1;
current->thread.load_fp = 1;
 
-   tm_recheckpoint(&current->thread);
 }
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
-- 
2.16.2



[RFC PATCH 11/12] [WIP] powerpc/tm: Afterthoughts

2018-02-19 Thread Cyril Bur
---
 arch/powerpc/kernel/process.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 674f75c56172..6ce41ee62b24 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1079,6 +1079,12 @@ static inline void __switch_to_tm(struct task_struct 
*prev,
if (!MSR_TM_ACTIVE(prev->thread.regs->msr) && 
prev->thread.load_tm == 0)
prev->thread.regs->msr &= ~MSR_TM;
}
+
+   /*
+* Now that we're reclaiming on kernel entry, we should never
+* get here still with user checkpointed state on the CPU
+*/
+   BUG_ON(MSR_TM_ACTIVE(mfmsr()));
 }
 
 /*
@@ -1326,7 +1332,17 @@ struct task_struct *__switch_to(struct task_struct *prev,
}
 
if (current_thread_info()->task->thread.regs) {
-   restore_math(current_thread_info()->task->thread.regs);
+   /*
+* Calling this now has reloaded the live state, which
+* gets overwritten with the checkpointed state right
+* before the trecheckpoint. BUT the MSR still says that
+* the live state is on the CPU, which it isn't.
+*
+* restore_math(current_thread_info()->task->thread.regs);
+* Therefore:
+*/
+   if 
(!MSR_TM_ACTIVE(current_thread_info()->task->thread.regs->msr))
+   restore_math(current_thread_info()->task->thread.regs);
 
/*
 * The copy-paste buffer can only store into foreign real
-- 
2.16.2



[RFC PATCH 04/12] selftests/powerpc: Use less common thread names

2018-02-19 Thread Cyril Bur
"ping" and "pong" (in particular "ping") are common names. If a
selftests causes a kernel BUG_ON or any kind of backtrace the process
name is displayed. Setting a more unique name avoids confusion as to
which process caused the problem.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
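One wrinkle worth noting with longer names: Linux limits thread names
to 16 bytes including the NUL terminator (TASK_COMM_LEN), and
pthread_setname_np() returns ERANGE for anything longer, which the
pr_warn() calls below would report. A defensive setter might look like
this (a sketch; set_thread_name() is a made-up helper, not from the
tree):

    #include <string.h>
    #include <pthread.h>

    /* Truncate to the kernel's 16-byte comm limit before setting. */
    static int set_thread_name(pthread_t t, const char *name)
    {
        char comm[16];

        strncpy(comm, name, sizeof(comm) - 1);
        comm[sizeof(comm) - 1] = '\0';
        return pthread_setname_np(t, comm);
    }
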
 tools/testing/selftests/powerpc/tm/tm-unavailable.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/powerpc/tm/tm-unavailable.c 
b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
index e6a0fad2bfd0..bcfa8add5748 100644
--- a/tools/testing/selftests/powerpc/tm/tm-unavailable.c
+++ b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
@@ -315,7 +315,7 @@ void test_fp_vec(int fp, int vec, pthread_attr_t *attr)
rc = pthread_create(&t0, attr, ping, (void *) &flags);
if (rc)
pr_err(rc, "pthread_create()");
-   rc = pthread_setname_np(t0, "ping");
+   rc = pthread_setname_np(t0, "tm-unavailable-ping");
if (rc)
pr_warn(rc, "pthread_setname_np");
rc = pthread_join(t0, &ret_value);
@@ -359,7 +359,7 @@ int main(int argc, char **argv)
pr_err(rc, "pthread_create()");
 
/* Name it for systemtap convenience */
-   rc = pthread_setname_np(t1, "pong");
+   rc = pthread_setname_np(t1, "tm-unavailable-pong");
if (rc)
pr_warn(rc, "pthread_create()");
 
-- 
2.16.2



[RFC PATCH 01/12] powerpc/tm: Remove struct thread_info param from tm_reclaim_thread()

2018-02-19 Thread Cyril Bur
tm_reclaim_thread() doesn't use the parameter anymore; both callers
have to go to the trouble of getting it despite having no other need
for a struct thread_info.

It was previously used but became unused in dc3106690b20 ("powerpc: tm:
Always use fp_state and vr_state to store live registers")

Just remove it and adjust the callers.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/kernel/process.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 1738c4127b32..77dc6d8288eb 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -850,8 +850,7 @@ static inline bool tm_enabled(struct task_struct *tsk)
return tsk && tsk->thread.regs && (tsk->thread.regs->msr & MSR_TM);
 }
 
-static void tm_reclaim_thread(struct thread_struct *thr,
- struct thread_info *ti, uint8_t cause)
+static void tm_reclaim_thread(struct thread_struct *thr, uint8_t cause)
 {
/*
 * Use the current MSR TM suspended bit to track if we have
@@ -898,7 +897,7 @@ static void tm_reclaim_thread(struct thread_struct *thr,
 void tm_reclaim_current(uint8_t cause)
 {
tm_enable();
-   tm_reclaim_thread(&current->thread, current_thread_info(), cause);
+   tm_reclaim_thread(&current->thread, cause);
 }
 
 static inline void tm_reclaim_task(struct task_struct *tsk)
@@ -929,7 +928,7 @@ static inline void tm_reclaim_task(struct task_struct *tsk)
 thr->regs->ccr, thr->regs->msr,
 thr->regs->trap);
 
-   tm_reclaim_thread(thr, task_thread_info(tsk), TM_CAUSE_RESCHED);
+   tm_reclaim_thread(thr, TM_CAUSE_RESCHED);
 
TM_DEBUG("--- tm_reclaim on pid %d complete\n",
 tsk->pid);
-- 
2.16.2



[RFC PATCH 07/12] [WIP] powerpc/tm: Add TM_KERNEL_ENTRY in more delicate exception pathes

2018-02-19 Thread Cyril Bur
---
 arch/powerpc/kernel/entry_64.S   | 15 ++-
 arch/powerpc/kernel/exceptions-64s.S | 31 ---
 2 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 107c15c6f48b..32e8d8f7e091 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -967,7 +967,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
bl  __check_irq_replay
cmpwi   cr0,r3,0
beq .Lrestore_no_replay
- 
+
+   /*
+* We decide VERY late if we need to replay interrupts; there's
+* not much that can be done about that, so this will have to
+* do.
+*/
+   TM_KERNEL_ENTRY
+   /*
+* This will restore r3 that TM_KERNEL_ENTRY clobbered.
+* Clearly not ideal! I wonder if we could change the trap
+* value beforehand...
+*/
+   bl  __check_irq_replay
+
/*
 * We need to re-emit an interrupt. We do so by re-using our
 * existing exception frame. We first change the trap value,
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 3ac87e53b3da..c8899bf77fb0 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -504,6 +504,11 @@ EXC_COMMON_BEGIN(data_access_common)
li  r5,0x300
std r3,_DAR(r1)
std r4,_DSISR(r1)
+   /*
+* Can't do TM_KERNEL_ENTRY here as do_hash_page might jump to a
+* point very late in the exception exit code, well after any
+* possibility of doing a recheckpoint
+*/
 BEGIN_MMU_FTR_SECTION
b   do_hash_page/* Try to handle as hpte fault */
 MMU_FTR_SECTION_ELSE
@@ -548,6 +553,11 @@ EXC_COMMON_BEGIN(instruction_access_common)
li  r5,0x400
std r3,_DAR(r1)
std r4,_DSISR(r1)
+   /*
+* Can't do TM_KERNEL_ENTRY here as do_hash_page might jump to a
+* point very late in the exception exit code, well after any
+* possibility of doing a recheckpoint
+*/
 BEGIN_MMU_FTR_SECTION
b   do_hash_page/* Try to handle as hpte fault */
 MMU_FTR_SECTION_ELSE
@@ -761,6 +771,7 @@ EXC_COMMON_BEGIN(alignment_common)
std r4,_DSISR(r1)
bl  save_nvgprs
RECONCILE_IRQ_STATE(r10, r11)
+   TM_KERNEL_ENTRY
addir3,r1,STACK_FRAME_OVERHEAD
bl  alignment_exception
b   ret_from_except
@@ -1668,7 +1679,9 @@ do_hash_page:
 
 /* Here we have a page fault that hash_page can't handle. */
 handle_page_fault:
-11:andis.  r0,r4,DSISR_DABRMATCH@h
+11:TM_KERNEL_ENTRY
+   ld  r4,_DSISR(r1)
+   andis.  r0,r4,DSISR_DABRMATCH@h
bne-handle_dabr_fault
ld  r4,_DAR(r1)
ld  r5,_DSISR(r1)
@@ -1685,6 +1698,10 @@ handle_page_fault:
 
 /* We have a data breakpoint exception - handle it */
 handle_dabr_fault:
+   /*
+* Don't need to do TM_KERNEL_ENTRY here as we'll
+* come from handle_page_fault: which has done it already
+*/
bl  save_nvgprs
ld  r4,_DAR(r1)
ld  r5,_DSISR(r1)
@@ -1698,7 +1715,14 @@ handle_dabr_fault:
  * the PTE insertion
  */
 13:bl  save_nvgprs
-   mr  r5,r3
+   /*
+* Use a non-volatile as the TM code will call into C; r3 is
+* the return value from __hash_page() so not exactly easy to
+* get again.
+*/
+   mr  r31,r3
+   TM_KERNEL_ENTRY
+   mr  r5, r31
addir3,r1,STACK_FRAME_OVERHEAD
ld  r4,_DAR(r1)
bl  low_hash_fault
@@ -1713,7 +1737,8 @@ handle_dabr_fault:
  * the access, or panic if there isn't a handler.
  */
 77:bl  save_nvgprs
-   mr  r4,r3
+   TM_KERNEL_ENTRY
+   ld  r4,_DAR(r1)
addir3,r1,STACK_FRAME_OVERHEAD
li  r5,SIGSEGV
bl  bad_page_fault
-- 
2.16.2



[RFC PATCH 09/12] [WIP] powerpc/tm: Tweak signal code to handle new reclaim/recheckpoint times

2018-02-19 Thread Cyril Bur
---
 arch/powerpc/kernel/process.c   | 13 -
 arch/powerpc/kernel/signal.c| 11 ++-
 arch/powerpc/kernel/signal_32.c | 16 ++--
 arch/powerpc/kernel/signal_64.c | 41 +
 4 files changed, 49 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 8a32fd062a2b..cd3ae80a6878 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1070,9 +1070,20 @@ void restore_tm_state(struct pt_regs *regs)
 * again, anything else could lead to an incorrect ckpt_msr being
 * saved and therefore incorrect signal contexts.
 */
-   clear_thread_flag(TIF_RESTORE_TM);
+
+   /*
+* So, on signals we're going to have cleared the TM bits from
+* the MSR, meaning that heading to the userspace signal
+* handler this will be true.
+* I'm not convinced clearing the TIF_RESTORE_TM flag is a
+* good idea; however, we should do it only if we actually
+* recheckpoint, which we'll need to do once the signal
+* handler is done and we're returning to the main thread of
+* execution.
+*/
if (!MSR_TM_ACTIVE(regs->msr))
return;
+   clear_thread_flag(TIF_RESTORE_TM);
 
msr_diff = current->thread.ckpt_regs.msr & ~regs->msr;
msr_diff &= MSR_FP | MSR_VEC | MSR_VSX;
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index 61db86ecd318..4f0398c6ce03 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -191,16 +191,17 @@ unsigned long get_tm_stackpointer(struct task_struct *tsk)
 *
 * For signals taken in non-TM or suspended mode, we use the
 * normal/non-checkpointed stack pointer.
+*
+* We now do reclaims on kernel entry, so we should absolutely
+* never need to reclaim here.
+* TODO Update the comment above if needed.
 */
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
BUG_ON(tsk != current);
 
-   if (MSR_TM_ACTIVE(tsk->thread.regs->msr)) {
-   tm_reclaim_current(TM_CAUSE_SIGNAL);
-   if (MSR_TM_TRANSACTIONAL(tsk->thread.regs->msr))
-   return tsk->thread.ckpt_regs.gpr[1];
-   }
+   if (MSR_TM_TRANSACTIONAL(tsk->thread.regs->msr))
+   return tsk->thread.ckpt_regs.gpr[1];
 #endif
return tsk->thread.regs->gpr[1];
 }
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index a46de0035214..a87a7c8b5d9e 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -860,21 +860,9 @@ static long restore_tm_user_regs(struct pt_regs *regs,
tm_enable();
/* Make sure the transaction is marked as failed */
current->thread.tm_texasr |= TEXASR_FS;
-   /* This loads the checkpointed FP/VEC state, if used */
-   tm_recheckpoint(&current->thread);
 
-   /* This loads the speculative FP/VEC state, if used */
-   msr_check_and_set(msr & (MSR_FP | MSR_VEC));
-   if (msr & MSR_FP) {
-   load_fp_state(&current->thread.fp_state);
-   regs->msr |= (MSR_FP | current->thread.fpexc_mode);
-   }
-#ifdef CONFIG_ALTIVEC
-   if (msr & MSR_VEC) {
-   load_vr_state(&current->thread.vr_state);
-   regs->msr |= MSR_VEC;
-   }
-#endif
+   /* See comment in signal_64.c */
+   set_thread_flag(TIF_RESTORE_TM);
 
return 0;
 }
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 720117690822..a7751d1fcac6 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -568,21 +568,20 @@ static long restore_tm_sigcontexts(struct task_struct 
*tsk,
}
}
 #endif
-   tm_enable();
/* Make sure the transaction is marked as failed */
tsk->thread.tm_texasr |= TEXASR_FS;
-   /* This loads the checkpointed FP/VEC state, if used */
-   tm_recheckpoint(&tsk->thread);
 
-   msr_check_and_set(msr & (MSR_FP | MSR_VEC));
-   if (msr & MSR_FP) {
-   load_fp_state(&tsk->thread.fp_state);
-   regs->msr |= (MSR_FP | tsk->thread.fpexc_mode);
-   }
-   if (msr & MSR_VEC) {
-   load_vr_state(&tsk->thread.vr_state);
-   regs->msr |= MSR_VEC;
-   }
+   /*
+* I believe this is only necessary if the
+* clear_thread_flag(TIF_RESTORE_TM); in restore_tm_state()
+* stays before the if (!MSR_TM_ACTIVE(regs->msr)).
+*
+* Actually no, we should follow the comment in
+* restore_tm_state(), but this should ALSO be here
+* if the signal handler does something crazy like 'generate'
+* a transaction.
+*/
+   set_thread_flag(TIF_RESTORE_TM);
 
return err;
 }
@@ -734,6 +733,22 @@ int sys_rt_sigreturn(unsigned long r3, unsigned long r4, 
unsigned long r5,
if 

[RFC PATCH 03/12] selftests/powerpc: Add tm-signal-drop-transaction TM test

2018-02-19 Thread Cyril Bur
This test uses a signal to 'discard' a transaction. That is, it takes
a signal while a thread is in a suspended transaction and simply
clears the suspended MSR[TS] bits. Because this will send the
userspace thread back to the tbegin. + 4 address, we should also set
CR0 to be nice.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
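For reference, MSR[TS] is a two-bit field (bits 33:34 counting from the
least-significant bit, matching the ~(3ULL << 33) mask in the handler
below): 0b00 means non-transactional, 0b01 suspended, 0b10
transactional. The same operation with named constants (mirroring the
kernel's MSR_TS_* definitions in reg.h):

    #define MSR_TS_S    (1ULL << 33)    /* suspended */
    #define MSR_TS_T    (1ULL << 34)    /* transactional */
    #define MSR_TS_MASK (MSR_TS_S | MSR_TS_T)

    /* Drop the transaction: same as regs->msr &= ~(3ULL << 33) below */
    regs->msr &= ~MSR_TS_MASK;
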
 tools/testing/selftests/powerpc/tm/Makefile|  1 +
 .../powerpc/tm/tm-signal-drop-transaction.c| 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+)
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-drop-transaction.c

diff --git a/tools/testing/selftests/powerpc/tm/Makefile 
b/tools/testing/selftests/powerpc/tm/Makefile
index a23453943ad2..7a1e53297588 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -4,6 +4,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu
 
 TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \
 	tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable tm-trap \
+	tm-signal-drop-transaction \
 	$(SIGNAL_CONTEXT_CHK_TESTS)
 
 include ../../lib.mk
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-drop-transaction.c 
b/tools/testing/selftests/powerpc/tm/tm-signal-drop-transaction.c
new file mode 100644
index ..a8397f7e7faa
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-drop-transaction.c
@@ -0,0 +1,74 @@
+/*
+ * Copyright 2018, Cyril Bur, IBM Corp.
+ * Licensed under GPLv2.
+ *
+ * This test uses a signal handler to make a thread go from a
+ * suspended transactional state to a non-transactional state. In
+ * practice, why would userspace ever do this? In theory, it can.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "utils.h"
+#include "tm.h"
+
+static bool passed;
+
+static void signal_usr1(int signum, siginfo_t *info, void *uc)
+{
+   ucontext_t *ucp = uc;
+   struct pt_regs *regs = ucp->uc_mcontext.regs;
+
+   passed = true;
+
+   /* I really hope I got that right, we want to clear both MSR_TS bits */
+   regs->msr &= ~(3ULL << 33);
+   /* Set CR0 to 0b0010 */
+   regs->ccr &= ~(0xDULL << 28);
+}
+
+int test_drop(void)
+{
+   struct sigaction act;
+
+   SKIP_IF(!have_htm());
+
+   act.sa_sigaction = signal_usr1;
+   sigemptyset(&act.sa_mask);
+   act.sa_flags = SA_SIGINFO;
+   if (sigaction(SIGUSR1, &act, NULL) < 0) {
+   perror("sigaction sigusr1");
+   exit(1);
+   }
+
+
+   asm __volatile__(
+   "tbegin.;"
+   "beq1f; "
+   "tsuspend.;"
+   "1: ;"
+   : : : "memory", "cr0");
+
+   if (!passed && !tcheck_transactional()) {
+   fprintf(stderr, "Not in suspended state: 0x%1x\n", tcheck());
+   exit(1);
+   }
+
+   kill(getpid(), SIGUSR1);
+
+   /* If we reach here, we've passed.  Otherwise we've probably crashed
+* the kernel */
+
+   return 0;
+}
+
+int main(int argc, char *argv[])
+{
+   return test_harness(test_drop, "tm_signal_drop_transaction");
+}
-- 
2.16.2



[RFC PATCH 05/12] [WIP] powerpc/tm: Reclaim/recheckpoint on entry/exit

2018-02-19 Thread Cyril Bur
---
 arch/powerpc/include/asm/exception-64s.h | 25 +
 arch/powerpc/kernel/entry_64.S   |  5 +
 arch/powerpc/kernel/process.c| 37 +++++++++++++++++++++++++++++++++----
 3 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 471b2274fbeb..f904f19a9ec2 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -35,6 +35,7 @@
  * implementations as possible.
  */
 #include 
+#include 
 
 /* PACA save area offsets (exgen, exmc, etc) */
 #define EX_R9  0
@@ -127,6 +128,26 @@
hrfid;  \
b   hrfi_flush_fallback
 
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+#define TM_KERNEL_ENTRY \
+   ld  r3,_MSR(r1);\
+   /* Probably don't need to check if coming from user/kernel */   \
+   /* If TM is suspended or active then we must have come from*/   \
+   /* userspace */ \
+   andi.   r0,r3,MSR_PR;   \
+   beq 1f; \
+   rldicl. r3,r3,(64-MSR_TS_LG),(64-2); /* SUSPENDED or ACTIVE*/   \
+   beql+   1f; /* Not SUSPENDED or ACTIVE */   \
+   bl  save_nvgprs;\
+   RECONCILE_IRQ_STATE(r10,r11);   \
+   li  r3,TM_CAUSE_MISC;   \
+   bl  tm_reclaim_current; /* uint8 cause */   \
+1:
+
+#else /* CONFIG_PPC_TRANSACTIONAL_MEM */
+#define TM_KERNEL_ENTRY
+#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
+
 #ifdef CONFIG_RELOCATABLE
 #define __EXCEPTION_RELON_PROLOG_PSERIES_1(label, h)   \
mfspr   r11,SPRN_##h##SRR0; /* save SRR0 */ \
@@ -675,6 +696,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL)
EXCEPTION_PROLOG_COMMON(trap, area);\
/* Volatile regs are potentially clobbered here */  \
additions;  \
+   /* This is going to need to go somewhere else as well */\
+   /* See comment in tm_recheckpoint()   */\
+   TM_KERNEL_ENTRY;\
addir3,r1,STACK_FRAME_OVERHEAD; \
bl  hdlr;   \
b   ret
@@ -689,6 +713,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL)
EXCEPTION_PROLOG_COMMON_3(trap);\
/* Volatile regs are potentially clobbered here */  \
additions;  \
+   TM_KERNEL_ENTRY;\
addir3,r1,STACK_FRAME_OVERHEAD; \
bl  hdlr
 
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 2cb5109a7ea3..107c15c6f48b 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -126,6 +126,11 @@ BEGIN_FW_FTR_SECTION
 33:
 END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE && CONFIG_PPC_SPLPAR */
+   TM_KERNEL_ENTRY
+   REST_GPR(0,r1)
+   REST_4GPRS(3,r1)
+   REST_2GPRS(7,r1)
+   addir9,r1,STACK_FRAME_OVERHEAD
 
/*
 * A syscall should always be called with interrupts enabled
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 77dc6d8288eb..ea75da0fd506 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -951,6 +951,23 @@ void tm_recheckpoint(struct thread_struct *thread)
if (!(thread->regs->msr & MSR_TM))
return;
 
+   /*
+* This is 'that' comment.
+*
+* If we get here with TM suspended or active then something
+* has gone wrong. I've added this now as a proof of concept.
+*
+* The problem I'm seeing without it is an attempt to
+* recheckpoint a CPU without a previous reclaim.
+*
+* I've probably missed an exception entry with the
+* TM_KERNEL_ENTRY macro. Should be easy enough to find.
+*/
+   if (MSR_TM_ACTIVE(mfmsr()))
+   return;
+
+   tm_enable();
+
/* We really can't be interrupted here as the TEXASR registers can't
 * change and later in the trecheckpoint code, we have a userspace R1.
 * So let's hard disable over this region.
@@ -1009,6 +1026,13 @@ static inline void tm_recheckpoint_new_task(struct 
task_struct *new)
 static inline void __switch_to_tm(struct task_struct *prev,
struct task_struct *new)
 {
+   /*
+* So, with the 

[RFC PATCH 06/12] [WIP] powerpc/tm: Remove dead code from __switch_to_tm()

2018-02-19 Thread Cyril Bur
---
 arch/powerpc/kernel/process.c | 24 +---
 1 file changed, 5 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index ea75da0fd506..574b05fe7d66 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1027,27 +1027,13 @@ static inline void __switch_to_tm(struct task_struct 
*prev,
struct task_struct *new)
 {
/*
-* So, with the rework none of this code should not be needed.
-* I've left in the reclaim for now. This *should* save us
-* from any mistake in the new code. Also the
-* enabling/disabling logic of MSR_TM really should be
+* The enabling/disabling logic of MSR_TM really should be
 * refactored into a common way with MSR_{FP,VEC,VSX}
 */
-   if (cpu_has_feature(CPU_FTR_TM)) {
-   if (tm_enabled(prev) || tm_enabled(new))
-   tm_enable();
-
-   if (tm_enabled(prev)) {
-   prev->thread.load_tm++;
-   tm_reclaim_task(prev);
-   /*
-* The disabling logic may be confused don't
-* disable for now
-*
-* if (!MSR_TM_ACTIVE(prev->thread.regs->msr) && 
prev->thread.load_tm == 0)
-*  prev->thread.regs->msr &= ~MSR_TM;
-*/
-   }
+   if (cpu_has_feature(CPU_FTR_TM) && tm_enabled(prev)) {
+   prev->thread.load_tm++;
+   if (!MSR_TM_ACTIVE(prev->thread.regs->msr) && 
prev->thread.load_tm == 0)
+   prev->thread.regs->msr &= ~MSR_TM;
}
 }
 
-- 
2.16.2



[RFC PATCH 02/12] selftests/powerpc: Fix tm.h helpers

2018-02-19 Thread Cyril Bur
Turns out the tcheck() helpers were subtly wrong: the old tcheck() asm
never copied CR0 into its output (there was no mfcr) and then masked
out all but a single bit of the field.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
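For context: tcheck writes a 4-bit result into the named CR field -
roughly the doomed bit, then the two MSR[TS] bits, then a zero bit per
the ISA. With the fixed helper returning the whole nibble, the bit
values the predicates rely on (inferred from tm.h, not all spelled out
in the hunk below) are:

    /* Nibble returned by the fixed tcheck() */
    #define TCHECK_DOOMED       0x8     /* transaction doomed */
    #define TCHECK_ACTIVE       0x4     /* MSR[TS] = transactional */
    #define TCHECK_SUSPENDED    0x2     /* MSR[TS] = suspended */
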
 tools/testing/selftests/powerpc/tm/tm.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/powerpc/tm/tm.h 
b/tools/testing/selftests/powerpc/tm/tm.h
index df4204247d45..e187a0d3160c 100644
--- a/tools/testing/selftests/powerpc/tm/tm.h
+++ b/tools/testing/selftests/powerpc/tm/tm.h
@@ -57,11 +57,11 @@ static inline bool failure_is_nesting(void)
return (__builtin_get_texasru() & 0x40);
 }
 
-static inline int tcheck(void)
+static inline uint8_t tcheck(void)
 {
-   long cr;
-   asm volatile ("tcheck 0" : "=r"(cr) : : "cr0");
-   return (cr >> 28) & 4;
+   unsigned long cr;
+   asm volatile ("tcheck 0; mfcr %0;" : "=r"(cr) : : "cr0");
+   return (cr >> 28) & 0xF;
 }
 
 static inline bool tcheck_doomed(void)
@@ -81,7 +81,7 @@ static inline bool tcheck_suspended(void)
 
 static inline bool tcheck_transactional(void)
 {
-   return tcheck() & 6;
+   return (tcheck_active()) || (tcheck_suspended());
 }
 
 #endif /* _SELFTESTS_POWERPC_TM_TM_H */
-- 
2.16.2



[RFC PATCH 00/12] Deal with TM on kernel entry and exit

2018-02-19 Thread Cyril Bur
This is very much a proof of concept and, if it isn't clear from the
commit names, still a work in progress.

I believe I have something that works - all the powerpc selftests
pass. I would like to get some eyes on it to a) see if I've missed
anything big and b) get some opinions on whether it looks like a net
improvement.

Obviously it is still a bit rough around the edges; I'll have to
convince myself that the SPR code is correct. I don't think the
TM_KERNEL_ENTRY macro needs to check that we came from userspace - if
TM is on then we can probably assume we did. Maybe add a check outside
the fastpath. Some of the BUG_ON()s will probably go.

Background:
Currently TM is dealt with when we need to. That is, when we switch
processes, we'll (if necessary) reclaim the outgoing process and (if
necessary) recheckpoint the incoming process. Same with signals: if we
need to deliver a signal, we'll ensure we've reclaimed in order to
have all the information, and go from there.
I, along with some others, got curious to see what it would look like
if we did the 'opposite'.
At all kernel entry points that won't simply zoom straight to an
RFID we now check if the thread was transactional and do the reclaim.
Correspondingly, we do the recheckpoint quite late on exception exit.
It turns out we already had a lot of the code paths set up on the exit
path, as there were things that TM had already special-cased on exit.
I wasn't sure if it would lead to more or less complexity and thought
I'd have to try it to see. I feel like it was almost a win, but SPRs
did add some annoying caveats.
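
For anyone skimming, the shape of the change is (pseudocode, names from
this series; the real entry hook is the TM_KERNEL_ENTRY asm macro):

    /* kernel entry (exceptions and syscalls) */
    if (came_from_userspace && MSR_TM_ACTIVE(regs->msr))
        tm_reclaim_current(TM_CAUSE_MISC);  /* stash checkpointed state */

    /* ... handle the exception / syscall / signal / reschedule ... */

    /* kernel exit, via restore_tm_state() when TIF_RESTORE_TM is set */
    if (MSR_TM_ACTIVE(regs->msr))
        tm_recheckpoint(&current->thread);  /* reload checkpointed state */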

In order to get this past Michael I'm going to prove it performs, or
rather, doesn't slow anything down - workload suggestions welcome.

Thanks,

Cyril Bur (12):
  powerpc/tm: Remove struct thread_info param from tm_reclaim_thread()
  selftests/powerpc: Fix tm.h helpers
  selftests/powerpc: Add tm-signal-drop-transaction TM test
  selftests/powerpc: Use less common thread names
  [WIP] powerpc/tm: Reclaim/recheckpoint on entry/exit
  [WIP] powerpc/tm: Remove dead code from __switch_to_tm()
  [WIP] powerpc/tm: Add TM_KERNEL_ENTRY in more delicate exception
pathes
  [WIP] powerpc/tm: Fix *unavailable_tm exceptions
  [WIP] powerpc/tm: Tweak signal code to handle new reclaim/recheckpoint
times
  [WIP] powerpc/tm: Correctly save/restore checkpointed sprs
  [WIP] powerpc/tm: Afterthoughts
  [WIP] selftests/powerpc: Remove incorrect tm-syscall selftest

 arch/powerpc/include/asm/exception-64s.h   |  25 
 arch/powerpc/kernel/entry_64.S |  20 ++-
 arch/powerpc/kernel/exceptions-64s.S   |  31 -
 arch/powerpc/kernel/process.c  | 145 ++---
 arch/powerpc/kernel/ptrace.c   |   9 +-
 arch/powerpc/kernel/signal.c   |  11 +-
 arch/powerpc/kernel/signal_32.c|  16 +--
 arch/powerpc/kernel/signal_64.c|  41 --
 arch/powerpc/kernel/traps.c|   3 -
 tools/testing/selftests/powerpc/tm/Makefile|   5 +-
 .../powerpc/tm/tm-signal-drop-transaction.c|  74 +++
 .../testing/selftests/powerpc/tm/tm-syscall-asm.S  |  28 
 tools/testing/selftests/powerpc/tm/tm-syscall.c| 106 ---
 .../testing/selftests/powerpc/tm/tm-unavailable.c  |   4 +-
 tools/testing/selftests/powerpc/tm/tm.h|  10 +-
 15 files changed, 319 insertions(+), 209 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-drop-transaction.c
 delete mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
 delete mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall.c

-- 
2.16.2



Re: [PATCH] pseries/drmem: Check for zero filled ibm, dynamic-memory property.

2018-02-15 Thread Cyril Bur
On Thu, 2018-02-15 at 21:27 -0600, Nathan Fontenot wrote:
> Some versions of QEMU will produce an ibm,dynamic-reconfiguration-memory
> node with a ibm,dynamic-memory property that is zero-filled. This causes
> the drmem code to oops trying to parse this property.
> 
> The fix for this is to validate that the property does contain LMB
> entries before trying to parse it and bail if the count is zero.
> 
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=2048
> NUMA
> pSeries
> Modules linked in:
> Supported: Yes
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.14-11.2-default #1
> task: c0007e639680 task.stack: c0007e648000
> NIP: c0c709a4 LR: c0c70998 CTR: 
> REGS: c0007e64b8d0 TRAP: 0300   Not tainted  (4.12.14-11.2-default)
> MSR: 80010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
>   CR: 84000248  XER: 
> CFAR: c067018c DAR: 0010 DSISR: 4200 SOFTE: 1
> GPR00: c0c70998 c0007e64bb50 c1157b00 
> GPR04: c0007e64bb70  002f 0022
> GPR08: 0003 c6f63fac c6f63fb0 001e
> GPR12:  cfa8 c000dca8 
> GPR16:    
> GPR20:    
> GPR24: c0cccb98 c0c636f0 c0c56cd0 0007
> GPR28: c0cccba8 c0007c30 c0007e64bbf0 0010
> NIP [c0c709a4] read_drconf_v1_cell+0x54/0x9c
> LR [c0c70998] read_drconf_v1_cell+0x48/0x9c
> Call Trace:
> [c0007e64bb50] [c0c56cd0] __param_initcall_debug+0x0/0x28 
> (unreliable)
> [c0007e64bb90] [c0c70e24] drmem_init+0x144/0x2f8
> [c0007e64bc40] [c000d034] do_one_initcall+0x64/0x1d0
> [c0007e64bd00] [c0c643d0] kernel_init_freeable+0x298/0x38c
> [c0007e64bdc0] [c000dcc4] kernel_init+0x24/0x160
> [c0007e64be30] [c000b428] ret_from_kernel_thread+0x5c/0xb4
> Instruction dump:
> 7c9e2378 6000 e9429050 e93e 7c240b78 7c7f1b78 f9240021 e86a0002
> 4804e41d 6000 e9210020 39490004  f9410020 39490010 7d004c2c
> 
> The ibm,dynamic-reconfiguration-memory device tree property
> generated that causes this:
> 
> ibm,dynamic-reconfiguration-memory {
> ibm,lmb-size = <0x0 0x1000>;
> ibm,memory-flags-mask = <0xff>;
> ibm,dynamic-memory = <0x0 0x0 0x0 0x0 0x0 0x0>;
> linux,phandle = <0x7e57eed8>;
> ibm,associativity-lookup-arrays = <0x1 0x4 0x0 0x0 0x0 0x0>;
> ibm,memory-preservation-time = <0x0>;
> };
> 
> Signed-off-by: Nathan Fontenot <nf...@linux.vnet.ibm.com>

Works for me.

Reviewed-by: Cyril Bur <cyril...@gmail.com>

> ---
>  arch/powerpc/mm/drmem.c |8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
> index 1604110c4238..916844f99c64 100644
> --- a/arch/powerpc/mm/drmem.c
> +++ b/arch/powerpc/mm/drmem.c
> @@ -216,6 +216,8 @@ static void __init __walk_drmem_v1_lmbs(const __be32 
> *prop, const __be32 *usm,
>   u32 i, n_lmbs;
>  
>   n_lmbs = of_read_number(prop++, 1);
> + if (n_lmbs == 0)
> + return;
>  
>   for (i = 0; i < n_lmbs; i++) {
>   read_drconf_v1_cell(, );
> @@ -245,6 +247,8 @@ static void __init __walk_drmem_v2_lmbs(const __be32 
> *prop, const __be32 *usm,
>   u32 i, j, lmb_sets;
>  
>   lmb_sets = of_read_number(prop++, 1);
> + if (lmb_sets == 0)
> + return;
>  
>   for (i = 0; i < lmb_sets; i++) {
>   read_drconf_v2_cell(_cell, );
> @@ -354,6 +358,8 @@ static void __init init_drmem_v1_lmbs(const __be32 *prop)
>   struct drmem_lmb *lmb;
>  
>   drmem_info->n_lmbs = of_read_number(prop++, 1);
> + if (drmem_info->n_lmbs == 0)
> + return;
>  
>   drmem_info->lmbs = kcalloc(drmem_info->n_lmbs, sizeof(*lmb),
>  GFP_KERNEL);
> @@ -373,6 +379,8 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
>   int lmb_index;
>  
>   lmb_sets = of_read_number(prop++, 1);
> + if (lmb_sets == 0)
> + return;
>  
>   /* first pass, calculate the number of LMBs */
>   p = prop;
> 


Re: 4.16-rc1 virtual machine crash on boot

2018-02-13 Thread Cyril Bur
On Tue, 2018-02-13 at 21:12 -0800, Tyrel Datwyler wrote:
> On 02/13/2018 05:20 PM, Cyril Bur wrote:
> > Hello all,
> 
> Does reverting commit 02ef6dd8109b581343ebeb1c4c973513682535d6 alleviate the 
> issue?
> 

Hi Tyrel,

No it doesn't. Same backtrace.

> -Tyrel
> 
> > 
> > I'm seeing this crash trying to boot a KVM virtual machine. This kernel
> > was compiled with pseries_le_defconfig and run using the following qemu
> > commandline:
> > 
> > qemu-system-ppc64 -enable-kvm -cpu POWER8 -smp 4 -m 4G -M pseries
> > -nographic -vga none -drive file=vm.raw,if=virtio,format=raw -drive
> > file=mkvmconf2xeO,if=virtio,format=raw -netdev type=user,id=net0
> > -device virtio-net-pci,netdev=net0 -kernel vmlinux_tscr -append
> > 'root=/dev/vdb1 rw cloud-init=disabled'
> > 
> > qemu-system-ppc64 --version
> > QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.16), Copyright
> > (c) 2003-2008 Fabrice Bellard
> > 
> > 
> > Key type dns_resolver registered
> > Unable to handle kernel paging request for data at address 0x0010
> > Faulting instruction address: 0xc18f2bbc
> > Oops: Kernel access of bad area, sig: 11 [#1]
> > LE SMP NR_CPUS=2048 NUMA pSeries
> > CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1v4.16-rc1 #8
> > NIP:  c18f2bbc LR: c18f2bb4 CTR: 
> > REGS: c000fea838d0 TRAP: 0380   Not tainted  (4.16.0-rc1v4.16-rc1)
> > MSR:  82009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 84000248  XER:
> > 2000
> > CFAR: c19591a0 SOFTE: 0 
> > GPR00: c18f2bb4 c000fea83b50 c1bd8400
> >  
> > GPR04: c000fea83b70  002f
> > 0022 
> > GPR08:  c22a3e90 
> > 0220 
> > GPR12:  cfb40980 c000d698
> >  
> > GPR16:   
> >  
> > GPR20:   
> >  
> > GPR24:  c18b9248 c18e36d8
> > c19738a8 
> > GPR28: 0007 c000fc68 c000fea83bf0
> > 0010 
> > NIP [c18f2bbc] read_drconf_v1_cell+0x50/0x9c
> > LR [c18f2bb4] read_drconf_v1_cell+0x48/0x9c
> > Call Trace:
> > [c000fea83b50] [c18f2bb4] read_drconf_v1_cell+0x48/0x9c
> > (unreliable)
> > [c000fea83b90] [c18f305c] drmem_init+0x13c/0x2ec
> > [c000fea83c40] [c18e4288] do_one_initcall+0xdc/0x1ac
> > [c000fea83d00] [c18e45d4] kernel_init_freeable+0x27c/0x358
> > [c000fea83dc0] [c000d6bc] kernel_init+0x2c/0x160
> > [c000fea83e30] [c000bc20] ret_from_kernel_thread+0x5c/0xbc
> > Instruction dump:
> > 7c7f1b78 6000 6000 7c240b78 3d22ffdc 3929f0a4 e95e
> > e8690002 
> > f9440021 4806657d 6000 e9210020  39090004 39490010
> > f9010020 
> > ---[ end trace bd9f49f482d30e03 ]---
> > 
> > Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
> > 
> > WARNING: CPU: 1 PID: 1 at drivers/tty/vt/vt.c:3883
> > do_unblank_screen+0x1f0/0x270
> > CPU: 1 PID: 1 Comm: swapper/0 Tainted: G  D  4.16.0-
> > rc1v4.16-rc1 #8
> > NIP:  c09aa800 LR: c09aa63c CTR: c148f5f0
> > REGS: c000fea832c0 TRAP: 0700   Tainted:
> > G  D   (4.16.0-rc1v4.16-rc1)
> > MSR:  82029033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 2800  XER:
> > 2000
> > CFAR: c09aa658 SOFTE: 1 
> > GPR00: c09aa63c c000fea83540 c1bd8400
> >  
> > GPR04: 0001 c000fb0c200e 1dd7
> > c000fea834d0 
> > GPR08: fe43  
> > 0001 
> > GPR12: 28002428 cfb40980 c000d698
> >  
> > GPR16:   
> >  
> > GPR20:   
> >  
> > GPR24: c000fea4 c000feadf910 c1a4a7a8
> > c1cc4ea0 
> > GPR28: c173f4f0 c1cc4ec8 
> >  
> > NIP [c09aa800] do_unblank_screen+0x1f0/0x270
> > LR [c09aa63c] do_unblank_screen+0x2c/0x270
> > Call Trace:
> > [c000fea83540] [c09aa63c] do_unblank_screen+0x2c/0x27

[PATCH] powerpc: Expose TSCR via sysfs only on powernv

2018-02-13 Thread Cyril Bur
The TSCR can only be accessed in hypervisor mode.

Fixes: 88b5e12eeb11 ("powerpc: Expose TSCR via sysfs")
Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/kernel/sysfs.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 5a8bfee6e187..04d0bbd7a1dd 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -788,7 +788,8 @@ static int register_cpu_online(unsigned int cpu)
if (cpu_has_feature(CPU_FTR_PPCAS_ARCH_V2))
device_create_file(s, &dev_attr_pir);
 
-   if (cpu_has_feature(CPU_FTR_ARCH_206))
+   if (cpu_has_feature(CPU_FTR_ARCH_206) &&
+   !firmware_has_feature(FW_FEATURE_LPAR))
device_create_file(s, &dev_attr_tscr);
 #endif /* CONFIG_PPC64 */
 
@@ -873,7 +874,8 @@ static int unregister_cpu_online(unsigned int cpu)
if (cpu_has_feature(CPU_FTR_PPCAS_ARCH_V2))
device_remove_file(s, &dev_attr_pir);
 
-   if (cpu_has_feature(CPU_FTR_ARCH_206))
+   if (cpu_has_feature(CPU_FTR_ARCH_206) &&
+   !firmware_has_feature(FW_FEATURE_LPAR))
device_remove_file(s, &dev_attr_tscr);
 #endif /* CONFIG_PPC64 */
 
-- 
2.16.1



4.16-rc1 virtual machine crash on boot

2018-02-13 Thread Cyril Bur
Hello all,

I'm seeing this crash trying to boot a KVM virtual machine. This kernel
was compiled with pseries_le_defconfig and run using the following qemu
commandline:

qemu-system-ppc64 -enable-kvm -cpu POWER8 -smp 4 -m 4G -M pseries
-nographic -vga none -drive file=vm.raw,if=virtio,format=raw -drive
file=mkvmconf2xeO,if=virtio,format=raw -netdev type=user,id=net0
-device virtio-net-pci,netdev=net0 -kernel vmlinux_tscr -append
'root=/dev/vdb1 rw cloud-init=disabled'

qemu-system-ppc64 --version
QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.16), Copyright
(c) 2003-2008 Fabrice Bellard


Key type dns_resolver registered
Unable to handle kernel paging request for data at address 0x0010
Faulting instruction address: 0xc18f2bbc
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA pSeries
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1v4.16-rc1 #8
NIP:  c18f2bbc LR: c18f2bb4 CTR: 
REGS: c000fea838d0 TRAP: 0380   Not tainted  (4.16.0-rc1v4.16-rc1)
MSR:  82009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 84000248  XER:
2000
CFAR: c19591a0 SOFTE: 0 
GPR00: c18f2bb4 c000fea83b50 c1bd8400
 
GPR04: c000fea83b70  002f
0022 
GPR08:  c22a3e90 
0220 
GPR12:  cfb40980 c000d698
 
GPR16:   
 
GPR20:   
 
GPR24:  c18b9248 c18e36d8
c19738a8 
GPR28: 0007 c000fc68 c000fea83bf0
0010 
NIP [c18f2bbc] read_drconf_v1_cell+0x50/0x9c
LR [c18f2bb4] read_drconf_v1_cell+0x48/0x9c
Call Trace:
[c000fea83b50] [c18f2bb4] read_drconf_v1_cell+0x48/0x9c
(unreliable)
[c000fea83b90] [c18f305c] drmem_init+0x13c/0x2ec
[c000fea83c40] [c18e4288] do_one_initcall+0xdc/0x1ac
[c000fea83d00] [c18e45d4] kernel_init_freeable+0x27c/0x358
[c000fea83dc0] [c000d6bc] kernel_init+0x2c/0x160
[c000fea83e30] [c000bc20] ret_from_kernel_thread+0x5c/0xbc
Instruction dump:
7c7f1b78 6000 6000 7c240b78 3d22ffdc 3929f0a4 e95e
e8690002 
f9440021 4806657d 6000 e9210020  39090004 39490010
f9010020 
---[ end trace bd9f49f482d30e03 ]---

Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b

WARNING: CPU: 1 PID: 1 at drivers/tty/vt/vt.c:3883
do_unblank_screen+0x1f0/0x270
CPU: 1 PID: 1 Comm: swapper/0 Tainted: G  D  4.16.0-
rc1v4.16-rc1 #8
NIP:  c09aa800 LR: c09aa63c CTR: c148f5f0
REGS: c000fea832c0 TRAP: 0700   Tainted:
G  D   (4.16.0-rc1v4.16-rc1)
MSR:  82029033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 2800  XER:
2000
CFAR: c09aa658 SOFTE: 1 
GPR00: c09aa63c c000fea83540 c1bd8400
 
GPR04: 0001 c000fb0c200e 1dd7
c000fea834d0 
GPR08: fe43  
0001 
GPR12: 28002428 cfb40980 c000d698
 
GPR16:   
 
GPR20:   
 
GPR24: c000fea4 c000feadf910 c1a4a7a8
c1cc4ea0 
GPR28: c173f4f0 c1cc4ec8 
 
NIP [c09aa800] do_unblank_screen+0x1f0/0x270
LR [c09aa63c] do_unblank_screen+0x2c/0x270
Call Trace:
[c000fea83540] [c09aa63c] do_unblank_screen+0x2c/0x270
(unreliable)
[c000fea835b0] [c08a2a70] bust_spinlocks+0x40/0x80
[c000fea835d0] [c00da90c] panic+0x1b8/0x32c
[c000fea83670] [c00e1bd4] do_exit+0xcb4/0xcc0
[c000fea83730] [c00275fc] die+0x29c/0x450
[c000fea837c0] [c0053f88] bad_page_fault+0xe8/0x160
[c000fea83830] [c0028a90] slb_miss_bad_addr+0x40/0x90
[c000fea83860] [c0008b08] bad_addr_slb+0x158/0x160
--- interrupt: 380 at read_drconf_v1_cell+0x50/0x9c
LR = read_drconf_v1_cell+0x48/0x9c
[c000fea83b90] [c18f305c] drmem_init+0x13c/0x2ec
[c000fea83c40] [c18e4288] do_one_initcall+0xdc/0x1ac
[c000fea83d00] [c18e45d4] kernel_init_freeable+0x27c/0x358
[c000fea83dc0] [c000d6bc] kernel_init+0x2c/0x160
[c000fea83e30] [c000bc20] ret_from_kernel_thread+0x5c/0xbc
Instruction dump:
3c62ffbf 38840001 7c8407b4 38639ca8 4b7ae0ed 6000 38210070
e8010010 
ebc1fff0 ebe1fff8 7c0803a6 4e800020 <0fe0> 4bfffe58 6000
6042 
---[ end trace bd9f49f482d30e04 ]---
Rebooting in 10 seconds..


[PATCH] powerpc/tm: Update function prototype comment

2018-02-04 Thread Cyril Bur
In commit eb5c3f1c8647 ("powerpc: Always save/restore checkpointed regs
during treclaim/trecheckpoint") __tm_recheckpoint was modified to no
longer take the second parameter 'unsigned long orig_msr' as part of a
TM rewrite to simplify the reclaiming/recheckpointing process.

There is a comment in the asm file where the function is declared which
has an incorrect prototype with the 'orig_msr' parameter.

This patch corrects the comment.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/kernel/tm.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S
index b92ac8e711db..2eb20264e70d 100644
--- a/arch/powerpc/kernel/tm.S
+++ b/arch/powerpc/kernel/tm.S
@@ -300,8 +300,8 @@ _GLOBAL(tm_reclaim)
blr
 
 
-   /* void __tm_recheckpoint(struct thread_struct *thread,
-*unsigned long orig_msr)
+   /*
+* void __tm_recheckpoint(struct thread_struct *thread)
 *  - Restore the checkpointed register state saved by tm_reclaim
 *when we switch_to a process.
 *
-- 
2.16.1



Re: [PATCH] powerpc/tm: Remove struct thread_info param from tm_reclaim_thread()

2018-02-01 Thread Cyril Bur
On Thu, 2018-02-01 at 15:46 +1100, Michael Ellerman wrote:
> Cyril Bur <cyril...@gmail.com> writes:
> 
> > tm_reclaim_thread() doesn't use the parameter anymore; both callers have
> > to bother getting it even though they have no need for a struct thread_info
> > either.
> 
> In future please tell me why the parameter is unused and when it became
> unused.
> 

Thanks, will do!

> In this case it was previously used but the last usage was removed in:
> 
> dc3106690b20 ("powerpc: tm: Always use fp_state and vr_state to store live 
> registers")
> 
> cheers
> 
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index bfdd783e3916..a47498da6562 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -853,8 +853,7 @@ static inline bool tm_enabled(struct task_struct *tsk)
> > return tsk && tsk->thread.regs && (tsk->thread.regs->msr & MSR_TM);
> >  }
> >  
> > -static void tm_reclaim_thread(struct thread_struct *thr,
> > - struct thread_info *ti, uint8_t cause)
> > +static void tm_reclaim_thread(struct thread_struct *thr, uint8_t cause)
> >  {
> > /*
> >  * Use the current MSR TM suspended bit to track if we have
> > @@ -901,7 +900,7 @@ static void tm_reclaim_thread(struct thread_struct *thr,
> >  void tm_reclaim_current(uint8_t cause)
> >  {
> > tm_enable();
> > -   tm_reclaim_thread(&current->thread, current_thread_info(), cause);
> > +   tm_reclaim_thread(&current->thread, cause);
> >  }
> >  
> >  static inline void tm_reclaim_task(struct task_struct *tsk)
> > @@ -932,7 +931,7 @@ static inline void tm_reclaim_task(struct task_struct 
> > *tsk)
> >  thr->regs->ccr, thr->regs->msr,
> >  thr->regs->trap);
> >  
> > -   tm_reclaim_thread(thr, task_thread_info(tsk), TM_CAUSE_RESCHED);
> > +   tm_reclaim_thread(thr, TM_CAUSE_RESCHED);
> >  
> > TM_DEBUG("--- tm_reclaim on pid %d complete\n",
> >  tsk->pid);
> > -- 
> > 2.16.1


[PATCH] powerpc/tm: Remove struct thread_info param from tm_reclaim_thread()

2018-01-31 Thread Cyril Bur
tm_reclaim_thread() doesn't use the parameter anymore; both callers have
to bother getting it even though they have no need for a struct thread_info
either.

Just remove it and adjust the callers.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/kernel/process.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index bfdd783e3916..a47498da6562 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -853,8 +853,7 @@ static inline bool tm_enabled(struct task_struct *tsk)
return tsk && tsk->thread.regs && (tsk->thread.regs->msr & MSR_TM);
 }
 
-static void tm_reclaim_thread(struct thread_struct *thr,
- struct thread_info *ti, uint8_t cause)
+static void tm_reclaim_thread(struct thread_struct *thr, uint8_t cause)
 {
/*
 * Use the current MSR TM suspended bit to track if we have
@@ -901,7 +900,7 @@ static void tm_reclaim_thread(struct thread_struct *thr,
 void tm_reclaim_current(uint8_t cause)
 {
tm_enable();
-   tm_reclaim_thread(&current->thread, current_thread_info(), cause);
+   tm_reclaim_thread(&current->thread, cause);
 }
 
 static inline void tm_reclaim_task(struct task_struct *tsk)
@@ -932,7 +931,7 @@ static inline void tm_reclaim_task(struct task_struct *tsk)
 thr->regs->ccr, thr->regs->msr,
 thr->regs->trap);
 
-   tm_reclaim_thread(thr, task_thread_info(tsk), TM_CAUSE_RESCHED);
+   tm_reclaim_thread(thr, TM_CAUSE_RESCHED);
 
TM_DEBUG("--- tm_reclaim on pid %d complete\n",
 tsk->pid);
-- 
2.16.1



Re: [PATCH 2/2] selftests/powerpc: Calculate spin time in tm-unavailable

2017-12-10 Thread Cyril Bur
On Mon, 2017-12-11 at 13:02 +1100, Michael Ellerman wrote:
> Cyril Bur <cyril...@gmail.com> writes:
> 
> > On Tue, 2017-11-21 at 11:31 -0200, Gustavo Romero wrote:
> > > Hi Cyril,
> > > 
> > > On 21-11-2017 05:17, Cyril Bur wrote:
> > > > Currently the tm-unavailable test spins for a fixed amount of time in
> > > > an attempt to ensure the FPU/VMX/VSX facilities are off. This value was
> > > > experimentally tested to be long enough.
> > > > 
> > > > Problems may arise if kernel heuristics were to change. This patch
> > > > should future proof this test.
> > > 
> > > I've tried it on a VM running on '4.14.0-rc7' and apparently it gets stuck
> > > or is pretty slow on calibration, since it ran for ~7m without finding the
> > > correct value (before it would take about 3m), like:
> > > 
> > > $ time ./tm-unavailable
> > > Testing required spin time required for facility unavailable...
> > >   Trying 0x1800...
> > >   Trying 0x1900...
> > >   Trying 0x1a00...
> > > ...
> > >   Trying 0xfd00... ^C
> > > 
> > > real  7m15.304s
> > > user  7m15.291s
> > > sys   0m0.004s
> > > 
> > 
> > Interesting! I didn't test in a VM. I guess hypervisor switching
> > completely changes the heuristic. Ok I'll have to rethink.
> > 
> > Maybe the increase should be a multiplier to get to a good state more
> > quickly.
> 
> Yeah this sucks in a VM:
> 
> # /home/michael/tm-unavailable
> Testing required spin time required for facility unavailable...
>   Trying 0x1800...
>   Trying 0x1900...
> ...
>   Trying 0x11000...
> 
> etc.
> 
> I got sick of waiting for it, but it's causing my selftests job to time
> out so it must be taking > ~1 hour.
> 

Yeah sorry, I'll see if I can come up with a better way for a VM. Needs a
few more cycles from me.

Cyril

> cheers


Re: [PATCH 2/2] selftests/powerpc: Calculate spin time in tm-unavailable

2017-11-21 Thread Cyril Bur
On Tue, 2017-11-21 at 11:31 -0200, Gustavo Romero wrote:
> Hi Cyril,
> 
> On 21-11-2017 05:17, Cyril Bur wrote:
> > Currently the tm-unavailable test spins for a fixed amount of time in
> > an attempt to ensure the FPU/VMX/VSX facilities are off. This value was
> > experimentally tested to be long enough.
> > 
> > Problems may arise if kernel heuristics were to change. This patch
> > should future proof this test.
> 
> I've tried it on a VM running on '4.14.0-rc7' and apparently it gets stuck
> or is pretty slow on calibration, since it ran for ~7m without finding the
> correct value (before it would take about 3m), like:
> 
> $ time ./tm-unavailable
> Testing required spin time required for facility unavailable...
>   Trying 0x1800...
>   Trying 0x1900...
>   Trying 0x1a00...
> ...
>   Trying 0xfd00... ^C
> 
> real  7m15.304s
> user  7m15.291s
> sys   0m0.004s
> 

Interesting! I didn't test in a VM. I guess hypervisor switching
completely changes the heuristic. Ok I'll have to rethink.

Maybe the increase should be a multiplier to get to a good state more
quickly.
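
Roughly what I have in mind (an untested sketch only; spin_time_ok() is
a stand-in for a single calibration attempt, it isn't a helper that
exists in the test today):

	/*
	 * Grow the busy-wait counter geometrically instead of by a
	 * fixed COUNTER_INCREMENT, so a VM converges in a handful of
	 * doublings rather than thousands of small steps.
	 */
	uint64_t counter = 0x1800;

	while (!spin_time_ok(counter)) {
		if (counter > UINT64_MAX / 2)
			break;	/* give up rather than overflow */
		counter *= 2;
	}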

> Trying it on a BM running on '4.13.0-rc2' it indeed found an initial value for
> the timeout but for some reason the value was not sufficient for the
> subsequent tests and the value raised more and more (I understand that it's
> an expected behavior tho). Even though it runs in about half the time (~3m,
> good!), I think the output could be a little bit less "overloaded":
> 

Happy to put some (or all) of that output inside if (DEBUG)
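e.g. something like (sketch only, reusing the test's existing DEBUG flag
and the print from the patch):

	if (DEBUG) {
		printf("\tTrying 0x%08" PRIx64 "... ", flags->counter);
		fflush(stdout);
	}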

> $ ./tm-unavailable
> Testing required spin time required for facility unavailable...
>   Trying 0x1800...
>   Trying 0x1900...
>   Trying 0x1a00...
>   Trying 0x1b00...
>   Trying 0x1c00...
>   Trying 0x1d00...
>   Trying 0x1e00...
>   Trying 0x1f00... 1, 2, 3
> Spin time required for a reliable facility unavailable TM failure: 0x1f00
> Checking if FP/VEC registers are sane after a FP unavailable exception...
> If MSR.FP=0 MSR.VEC=0:
>   Expecting the transaction to fail, but it didn't
>   FP ok VEC ok
> Adjusting the facility unavailable spin time...
>   Trying 0x2100... 1, 2, 3
> Now using 0x2100
> If MSR.FP=0 MSR.VEC=0:
>   Expecting the transaction to fail, but it didn't
>   FP ok VEC ok
> Adjusting the facility unavailable spin time...
>   Trying 0x2300... 1, 2, 3
> Now using 0x2300
> If MSR.FP=1 MSR.VEC=0: FP ok VEC ok
> If MSR.FP=0 MSR.VEC=1:
>   Expecting the transaction to fail, but it didn't
>   FP ok VEC ok
> Now using 0x4700
> ...
> 
> So, putting the output question aside, are you getting a different result on
> a VM, i.e. did you notice if it got stuck/was pretty slow?
> 
> 
> Regards,
> Gustavo
> 
> > Signed-off-by: Cyril Bur <cyril...@gmail.com>
> > ---
> > Because the test no longer needs to use such a conservative time for
> > the busy wait, it actually runs much faster.
> > 
> > 
> >  .../testing/selftests/powerpc/tm/tm-unavailable.c  | 92 
> > --
> >  1 file changed, 84 insertions(+), 8 deletions(-)
> > 
> > diff --git a/tools/testing/selftests/powerpc/tm/tm-unavailable.c 
> > b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
> > index e6a0fad2bfd0..54aeb7a7fbb1 100644
> > --- a/tools/testing/selftests/powerpc/tm/tm-unavailable.c
> > +++ b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
> > @@ -33,6 +33,11 @@
> >  #define VEC_UNA_EXCEPTION  1
> >  #define VSX_UNA_EXCEPTION  2
> > 
> > +#define ERR_RETRY 1
> > +#define ERR_ADJUST 2
> > +
> > +#define COUNTER_INCREMENT (0x100)
> > +
> >  #define NUM_EXCEPTIONS 3
> >  #define err_at_line(status, errnum, format, ...) \
> > error_at_line(status, errnum, __FILE__, __LINE__, format, ##__VA_ARGS__)
> > @@ -45,6 +50,7 @@ struct Flags {
> > int touch_vec;
> > int result;
> > int exception;
> > +   uint64_t counter;
> >  } flags;
> > 
> >  bool expecting_failure(void)
> > @@ -87,14 +93,12 @@ void *ping(void *input)
> >  * Expected values for vs0 and vs32 after a TM failure. They must never
> >  * change, otherwise they got corrupted.
> >  */
> > +   long rc = 0;
> > uint64_t high_vs0 = 0x;
> > uint64_t low_vs0 = 0x;
> > uint64_t high_vs32 = 0x;
> > uint64_t low_vs32 = 0x;
> > 
> > -   /* Counter for busy wait */
> > -   uint64_t counter = 0x1ff00;

[PATCH 2/2] selftests/powerpc: Calculate spin time in tm-unavailable

2017-11-20 Thread Cyril Bur
Currently the tm-unavailable test spins for a fixed amount of time in
an attempt to ensure the FPU/VMX/VSX facilities are off. This value was
experimentally tested to be long enough.

Problems may arise if kernel heuristics were to change. This patch
should future proof this test.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
Because the test no longer needs to use such a conservative time for
the busy wait, it actually runs much faster.


 .../testing/selftests/powerpc/tm/tm-unavailable.c  | 92 --
 1 file changed, 84 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/powerpc/tm/tm-unavailable.c 
b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
index e6a0fad2bfd0..54aeb7a7fbb1 100644
--- a/tools/testing/selftests/powerpc/tm/tm-unavailable.c
+++ b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
@@ -33,6 +33,11 @@
 #define VEC_UNA_EXCEPTION  1
 #define VSX_UNA_EXCEPTION  2
 
+#define ERR_RETRY 1
+#define ERR_ADJUST 2
+
+#define COUNTER_INCREMENT (0x100)
+
 #define NUM_EXCEPTIONS 3
 #define err_at_line(status, errnum, format, ...) \
error_at_line(status, errnum, __FILE__, __LINE__, format, ##__VA_ARGS__)
@@ -45,6 +50,7 @@ struct Flags {
int touch_vec;
int result;
int exception;
+   uint64_t counter;
 } flags;
 
 bool expecting_failure(void)
@@ -87,14 +93,12 @@ void *ping(void *input)
 * Expected values for vs0 and vs32 after a TM failure. They must never
 * change, otherwise they got corrupted.
 */
+   long rc = 0;
uint64_t high_vs0 = 0x;
uint64_t low_vs0 = 0x;
uint64_t high_vs32 = 0x;
uint64_t low_vs32 = 0x;
 
-   /* Counter for busy wait */
-   uint64_t counter = 0x1ff00;
-
/*
 * Variable to keep a copy of CR register content taken just after we
 * leave the transactional state.
@@ -217,7 +221,7 @@ void *ping(void *input)
  [ex_fp] "i"  (FP_UNA_EXCEPTION),
  [ex_vec]"i"  (VEC_UNA_EXCEPTION),
  [ex_vsx]"i"  (VSX_UNA_EXCEPTION),
- [counter]   "r"  (counter)
+ [counter]   "r"  (flags.counter)
 
: "cr0", "ctr", "v10", "vs0", "vs10", "vs3", "vs32", "vs33",
  "vs34", "fr10"
@@ -232,14 +236,14 @@ void *ping(void *input)
if (expecting_failure() && !is_failure(cr_)) {
printf("\n\tExpecting the transaction to fail, %s",
"but it didn't\n\t");
-   flags.result++;
+   rc = ERR_ADJUST;
}
 
/* Check if we were not expecting a failure and a it occurred. */
if (!expecting_failure() && is_failure(cr_)) {
printf("\n\tUnexpected transaction failure 0x%02lx\n\t",
failure_code());
-   return (void *) -1;
+   rc = ERR_RETRY;
}
 
/*
@@ -249,7 +253,7 @@ void *ping(void *input)
if (is_failure(cr_) && !failure_is_unavailable()) {
printf("\n\tUnexpected failure cause 0x%02lx\n\t",
failure_code());
-   return (void *) -1;
+   rc = ERR_RETRY;
}
 
/* 0x4 is a success and 0xa is a fail. See comment in is_failure(). */
@@ -276,7 +280,7 @@ void *ping(void *input)
 
putchar('\n');
 
-   return NULL;
+   return (void *)rc;
 }
 
 /* Thread to force context switch */
@@ -291,6 +295,55 @@ void *pong(void *not_used)
sched_yield();
 }
 
+static void flags_set_counter(struct Flags *flags)
+{
+   uint64_t cr_;
+   int count = 0;
+
+   do {
+   if (count == 0)
+   printf("\tTrying 0x%08" PRIx64 "... ", flags->counter);
+   else
+   printf("%d, ", count);
+   fflush(stdout);
+   asm (
+   /*
+* Wait an amount of context switches so
+* load_fp and load_vec overflow and MSR.FP,
+* MSR.VEC, and MSR.VSX become zero (off).
+*/
+   "   mtctr   %[counter]  ;"
+
+   /* Decrement CTR branch if CTR non zero. */
+   "1: bdnz 1b ;"
+   "   tbegin. ;"
+   "   beq tfail   ;"
+
+   /* Get a facility unavailable */
+   " 

[PATCH 1/2] selftests/powerpc: Check for pthread errors in tm-unavailable

2017-11-20 Thread Cyril Bur
Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 .../testing/selftests/powerpc/tm/tm-unavailable.c  | 43 +-
 1 file changed, 34 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/powerpc/tm/tm-unavailable.c 
b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
index 96c37f84ce54..e6a0fad2bfd0 100644
--- a/tools/testing/selftests/powerpc/tm/tm-unavailable.c
+++ b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
@@ -15,6 +15,7 @@
  */
 
 #define _GNU_SOURCE
+#include <error.h>
 #include 
 #include 
 #include 
@@ -33,6 +34,11 @@
 #define VSX_UNA_EXCEPTION  2
 
 #define NUM_EXCEPTIONS 3
+#define err_at_line(status, errnum, format, ...) \
+   error_at_line(status, errnum, __FILE__, __LINE__, format, ##__VA_ARGS__)
+
+#define pr_warn(code, format, ...) err_at_line(0, code, format, ##__VA_ARGS__)
+#define pr_err(code, format, ...) err_at_line(1, code, format, ##__VA_ARGS__)
 
 struct Flags {
int touch_fp;
@@ -303,10 +309,19 @@ void test_fp_vec(int fp, int vec, pthread_attr_t *attr)
 * checking if the failure cause is the one we expect.
 */
do {
+   int rc;
+
/* Bind 'ping' to CPU 0, as specified in 'attr'. */
-   pthread_create(&t0, attr, ping, (void *) &flags);
-   pthread_setname_np(t0, "ping");
-   pthread_join(t0, &ret_value);
+   rc = pthread_create(&t0, attr, ping, (void *) &flags);
+   if (rc)
+   pr_err(rc, "pthread_create()");
+   rc = pthread_setname_np(t0, "ping");
+   if (rc)
+   pr_warn(rc, "pthread_setname_np");
+   rc = pthread_join(t0, &ret_value);
+   if (rc)
+   pr_err(rc, "pthread_join");
+
retries--;
} while (ret_value != NULL && retries);
 
@@ -320,7 +335,7 @@ void test_fp_vec(int fp, int vec, pthread_attr_t *attr)
 
 int main(int argc, char **argv)
 {
-   int exception; /* FP = 0, VEC = 1, VSX = 2 */
+   int rc, exception; /* FP = 0, VEC = 1, VSX = 2 */
pthread_t t1;
pthread_attr_t attr;
cpu_set_t cpuset;
@@ -330,13 +345,23 @@ int main(int argc, char **argv)
CPU_SET(0, &cpuset);
 
/* Init pthread attribute. */
-   pthread_attr_init(&attr);
+   rc = pthread_attr_init(&attr);
+   if (rc)
+   pr_err(rc, "pthread_attr_init()");
 
/* Set CPU 0 mask into the pthread attribute. */
-   pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpuset);
-
-   pthread_create(&t1, &attr /* Bind 'pong' to CPU 0 */, pong, NULL);
-   pthread_setname_np(t1, "pong"); /* Name it for systemtap convenience */
+   rc = pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpuset);
+   if (rc)
+   pr_err(rc, "pthread_attr_setaffinity_np()");
+
+   rc = pthread_create(&t1, &attr /* Bind 'pong' to CPU 0 */, pong, NULL);
+   if (rc)
+   pr_err(rc, "pthread_create()");
+
+   /* Name it for systemtap convenience */
+   rc = pthread_setname_np(t1, "pong");
+   if (rc)
+   pr_warn(rc, "pthread_create()");
 
flags.result = 0;
 
-- 
2.15.0



Re: [PATCH v5 06/10] powerpc/opal: Rework the opal-async interface

2017-11-06 Thread Cyril Bur
On Mon, 2017-11-06 at 20:41 +1100, Michael Ellerman wrote:
> Cyril Bur <cyril...@gmail.com> writes:
> 
> > diff --git a/arch/powerpc/platforms/powernv/opal-async.c 
> > b/arch/powerpc/platforms/powernv/opal-async.c
> > index c43421ab2d2f..fbae8a37ce2c 100644
> > --- a/arch/powerpc/platforms/powernv/opal-async.c
> > +++ b/arch/powerpc/platforms/powernv/opal-async.c
> > @@ -23,40 +23,45 @@
> >  #include 
> >  #include 
> >  
> > -#define N_ASYNC_COMPLETIONS 64
> > +enum opal_async_token_state {
> > +   ASYNC_TOKEN_UNALLOCATED = 0,
> > +   ASYNC_TOKEN_ALLOCATED,
> > +   ASYNC_TOKEN_COMPLETED
> > +};
> > +
> > +struct opal_async_token {
> > +   enum opal_async_token_state state;
> > +   struct opal_msg response;
> > +};
> >  
> > -static DECLARE_BITMAP(opal_async_complete_map, N_ASYNC_COMPLETIONS) = 
> > {~0UL};
> > -static DECLARE_BITMAP(opal_async_token_map, N_ASYNC_COMPLETIONS);
> >  static DECLARE_WAIT_QUEUE_HEAD(opal_async_wait);
> >  static DEFINE_SPINLOCK(opal_async_comp_lock);
> >  static struct semaphore opal_async_sem;
> > -static struct opal_msg *opal_async_responses;
> >  static unsigned int opal_max_async_tokens;
> > +static struct opal_async_token *opal_async_tokens;
> >  
> >  static int __opal_async_get_token(void)
> >  {
> > unsigned long flags;
> > -   int token;
> > +   int token = -EBUSY;
> >  
> > spin_lock_irqsave(&opal_async_comp_lock, flags);
> > -   token = find_first_bit(opal_async_complete_map, opal_max_async_tokens);
> > -   if (token >= opal_max_async_tokens) {
> > -   token = -EBUSY;
> > -   goto out;
> > +   for (token = 0; token < opal_max_async_tokens; token++) {
> > +   if (opal_async_tokens[token].state == ASYNC_TOKEN_UNALLOCATED) {
> > +   opal_async_tokens[token].state = ASYNC_TOKEN_ALLOCATED;
> > +   goto out;
> > +   }
> > }
> > -
> > -   if (__test_and_set_bit(token, opal_async_token_map)) {
> > -   token = -EBUSY;
> > -   goto out;
> > -   }
> > -
> > -   __clear_bit(token, opal_async_complete_map);
> > -
> >  out:
> > spin_unlock_irqrestore(&opal_async_comp_lock, flags);
> > return token;
> >  }
> 
> Resulting in:
> 
>  static int __opal_async_get_token(void)
>  {
>   unsigned long flags;
> + int token = -EBUSY;
>  
>   spin_lock_irqsave(&opal_async_comp_lock, flags);
> + for (token = 0; token < opal_max_async_tokens; token++) {
> + if (opal_async_tokens[token].state == ASYNC_TOKEN_UNALLOCATED) {
> + opal_async_tokens[token].state = ASYNC_TOKEN_ALLOCATED;
> + goto out;
> + }
>   }
>  out:
>   spin_unlock_irqrestore(&opal_async_comp_lock, flags);
>   return token;
>  }
> 
> So when no unallocated token is found we return opal_max_async_tokens :(
> 
> I changed it to:
> 
> static int __opal_async_get_token(void)
> {
>   unsigned long flags;
>   int i, token = -EBUSY;
> 
>   spin_lock_irqsave(&opal_async_comp_lock, flags);
> 
>   for (i = 0; i < opal_max_async_tokens; i++) {
>   if (opal_async_tokens[i].state == ASYNC_TOKEN_UNALLOCATED) {
>   opal_async_tokens[i].state = ASYNC_TOKEN_ALLOCATED;
>   token = i;
>   break;
>   }
>   }
> 
>   spin_unlock_irqrestore(&opal_async_comp_lock, flags);
>   return token;
> }
> 
> 

Thanks!!

> >  
> > +/*
> > + * Note: If the returned token is used in an opal call and opal returns
> > + * OPAL_ASYNC_COMPLETION you MUST opal_async_wait_response() before
> 
>  ^
>  call
> 
> 
> cheers


Re: [PATCH] selftests/powerpc: Check FP/VEC on exception in TM

2017-11-05 Thread Cyril Bur
On Fri, 2017-11-03 at 10:28 -0200, Gustavo Romero wrote:
> Hi Cyril!
> 
> On 01-11-2017 20:10, Cyril Bur wrote:
> > Thanks Gustavo,
> > 
> > I do have one more thought on an improvement for this test which is
> > that:
> > +   /* Counter for busy wait */
> > +   uint64_t counter = 0x1ff00;
> > is a bit fragile, what we should do is have the test work out how long it
> > should spin until it reliably gets a TM_CAUSE_FAC_UNAV failure and then
> > use that for these tests.
> > 
> > This will only become a problem if we were to change kernel heuristics
> > which is fine for now. I'll try to get that added soon but for now this
> > test has proven too useful to delay adding as is.
> 
> I see. Yup, 'counter' value was indeed determined experimentally under many
> different scenarios (VM and BM, different CPU loads, etc). At least if the
> heuristics change and hurt the test, it will catch that, pointing out that
> the expected failure did not happen, like:
> 
> Checking if FP/VEC registers are sane after a FP unavailable exception...
> If MSR.FP=0 MSR.VEC=0:
> Expecting the transaction to fail, but it didn't
> FP ok VEC ok
> ...
> 
> So it won't let a harmful change pass silently :-)
> 

Yeah, all for merging as is.

It would be nice if, when someone does make a heuristic change, they
didn't also have to go fix tests - there is nothing more annoying
than a fragile test suite.

> 
> > > Signed-off-by: Gustavo Romero <grom...@linux.vnet.ibm.com>
> > > Signed-off-by: Breno Leitao <lei...@debian.org>
> > > Signed-off-by: Cyril Bur <cyril...@gmail.com>
> 
> Thanks a lot for reviewing it.
> 
> Cheers,
> Gustavo
> 


[PATCH v5 10/10] mtd: powernv_flash: Use opal_async_wait_response_interruptible()

2017-11-02 Thread Cyril Bur
The OPAL calls performed in this driver shouldn't be using
opal_async_wait_response() as this performs a wait_event() which, on
long running OPAL calls, could result in hung task warnings. wait_event()
prevents timely signal delivery which is also undesirable.

This patch also attempts to quieten down the use of dev_err() when
errors haven't actually occurred, and to return better information up
the stack rather than always -EIO.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
Acked-by: Boris Brezillon <boris.brezil...@free-electrons.com>
---
 drivers/mtd/devices/powernv_flash.c | 57 +++--
 1 file changed, 35 insertions(+), 22 deletions(-)

diff --git a/drivers/mtd/devices/powernv_flash.c 
b/drivers/mtd/devices/powernv_flash.c
index 3343d4f5c4f3..26f9feaa5d17 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -89,33 +89,46 @@ static int powernv_flash_async_op(struct mtd_info *mtd, 
enum flash_op op,
return -EIO;
}
 
-   if (rc == OPAL_SUCCESS)
-   goto out_success;
+   if (rc == OPAL_ASYNC_COMPLETION) {
+   rc = opal_async_wait_response_interruptible(token, &msg);
+   if (rc) {
+   /*
+* If we return the mtd core will free the
+* buffer we've just passed to OPAL but OPAL
+* will continue to read or write from that
+* memory.
+* It may be tempting to ultimately return 0
+* if we're doing a read or a write since we
+* are going to end up waiting until OPAL is
+* done. However, because the MTD core sends
+* us the userspace request in chunks, we need
+* it to know we've been interrupted.
+*/
+   rc = -EINTR;
+   if (opal_async_wait_response(token, &msg))
+   dev_err(dev, "opal_async_wait_response() 
failed\n");
+   goto out;
+   }
+   rc = opal_get_async_rc(msg);
+   }
 
-   if (rc != OPAL_ASYNC_COMPLETION) {
+   /*
+* OPAL does mutual exclusion on the flash, it will return
+* OPAL_BUSY.
+* During firmware updates by the service processor OPAL may
+* be (temporarily) prevented from accessing the flash, in
+* this case OPAL will also return OPAL_BUSY.
+* Both cases aren't errors exactly but the flash could have
+* changed, userspace should be informed.
+*/
+   if (rc != OPAL_SUCCESS && rc != OPAL_BUSY)
dev_err(dev, "opal_flash_async_op(op=%d) failed (rc %d)\n",
op, rc);
-   rc = -EIO;
-   goto out;
-   }
 
-   rc = opal_async_wait_response(token, &msg);
-   if (rc) {
-   dev_err(dev, "opal async wait failed (rc %d)\n", rc);
-   rc = -EIO;
-   goto out;
-   }
-
-   rc = opal_get_async_rc(msg);
-out_success:
-   if (rc == OPAL_SUCCESS) {
-   rc = 0;
-   if (retlen)
-   *retlen = len;
-   } else {
-   rc = -EIO;
-   }
+   if (rc == OPAL_SUCCESS && retlen)
+   *retlen = len;
 
+   rc = opal_error_code(rc);
 out:
opal_async_release_token(token);
return rc;
-- 
2.15.0



[PATCH v5 03/10] mtd: powernv_flash: Remove pointless goto in driver init

2017-11-02 Thread Cyril Bur
powernv_flash_probe() has pointless goto statements which jump to the
end of the function to simply return a variable. Rather than checking
for error and going to the label, just return the error as soon as it is
detected.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
Acked-by: Boris Brezillon <boris.brezil...@free-electrons.com>
---
 drivers/mtd/devices/powernv_flash.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/mtd/devices/powernv_flash.c 
b/drivers/mtd/devices/powernv_flash.c
index ca3ca6adf71e..4dd3b5d2feb2 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -227,21 +227,20 @@ static int powernv_flash_probe(struct platform_device 
*pdev)
int ret;
 
data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
-   if (!data) {
-   ret = -ENOMEM;
-   goto out;
-   }
+   if (!data)
+   return -ENOMEM;
+
data->mtd.priv = data;
 
ret = of_property_read_u32(dev->of_node, "ibm,opal-id", &(data->id));
if (ret) {
dev_err(dev, "no device property 'ibm,opal-id'\n");
-   goto out;
+   return ret;
}
 
ret = powernv_flash_set_driver_info(dev, &data->mtd);
if (ret)
-   goto out;
+   return ret;
 
dev_set_drvdata(dev, data);
 
@@ -250,10 +249,7 @@ static int powernv_flash_probe(struct platform_device 
*pdev)
 * with an ffs partition at the start, it should prove easier for users
 * to deal with partitions or not as they see fit
 */
-   ret = mtd_device_register(&data->mtd, NULL, 0);
-
-out:
-   return ret;
+   return mtd_device_register(&data->mtd, NULL, 0);
 }
 
 /**
-- 
2.15.0



[PATCH v5 06/10] powerpc/opal: Rework the opal-async interface

2017-11-02 Thread Cyril Bur
Future work will add an opal_async_wait_response_interruptible()
which will call wait_event_interruptible(). This work requires extra
token state to be tracked as wait_event_interruptible() can return and
the caller could release the token before OPAL responds.

Currently token state is tracked with two bitfields which are 64 bits
big but may not need to be as OPAL informs Linux how many async tokens
there are. It also uses an array indexed by token to store response
messages for each token.

The bitfields make it difficult to add more state and also provide a
hard maximum as to how many tokens there can be - it is possible that
OPAL will inform Linux that there are more than 64 tokens.

Rather than add a bitfield to track the extra state, rework the
internals slightly.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/platforms/powernv/opal-async.c | 92 -
 1 file changed, 50 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-async.c 
b/arch/powerpc/platforms/powernv/opal-async.c
index c43421ab2d2f..fbae8a37ce2c 100644
--- a/arch/powerpc/platforms/powernv/opal-async.c
+++ b/arch/powerpc/platforms/powernv/opal-async.c
@@ -1,7 +1,7 @@
 /*
  * PowerNV OPAL asynchronous completion interfaces
  *
- * Copyright 2013 IBM Corp.
+ * Copyright 2013-2017 IBM Corp.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -23,40 +23,45 @@
 #include 
 #include 
 
-#define N_ASYNC_COMPLETIONS 64
+enum opal_async_token_state {
+   ASYNC_TOKEN_UNALLOCATED = 0,
+   ASYNC_TOKEN_ALLOCATED,
+   ASYNC_TOKEN_COMPLETED
+};
+
+struct opal_async_token {
+   enum opal_async_token_state state;
+   struct opal_msg response;
+};
 
-static DECLARE_BITMAP(opal_async_complete_map, N_ASYNC_COMPLETIONS) = {~0UL};
-static DECLARE_BITMAP(opal_async_token_map, N_ASYNC_COMPLETIONS);
 static DECLARE_WAIT_QUEUE_HEAD(opal_async_wait);
 static DEFINE_SPINLOCK(opal_async_comp_lock);
 static struct semaphore opal_async_sem;
-static struct opal_msg *opal_async_responses;
 static unsigned int opal_max_async_tokens;
+static struct opal_async_token *opal_async_tokens;
 
 static int __opal_async_get_token(void)
 {
unsigned long flags;
-   int token;
+   int token = -EBUSY;
 
spin_lock_irqsave(&opal_async_comp_lock, flags);
-   token = find_first_bit(opal_async_complete_map, opal_max_async_tokens);
-   if (token >= opal_max_async_tokens) {
-   token = -EBUSY;
-   goto out;
+   for (token = 0; token < opal_max_async_tokens; token++) {
+   if (opal_async_tokens[token].state == ASYNC_TOKEN_UNALLOCATED) {
+   opal_async_tokens[token].state = ASYNC_TOKEN_ALLOCATED;
+   goto out;
+   }
}
-
-   if (__test_and_set_bit(token, opal_async_token_map)) {
-   token = -EBUSY;
-   goto out;
-   }
-
-   __clear_bit(token, opal_async_complete_map);
-
 out:
spin_unlock_irqrestore(&opal_async_comp_lock, flags);
return token;
 }
 
+/*
+ * Note: If the returned token is used in an opal call and opal returns
+ * OPAL_ASYNC_COMPLETION you MUST opal_async_wait_response() before
+ * calling any other opal_async_* function
+ */
 int opal_async_get_token_interruptible(void)
 {
int token;
@@ -76,6 +81,7 @@ EXPORT_SYMBOL_GPL(opal_async_get_token_interruptible);
 static int __opal_async_release_token(int token)
 {
unsigned long flags;
+   int rc;
 
if (token < 0 || token >= opal_max_async_tokens) {
pr_err("%s: Passed token is out of range, token %d\n",
@@ -84,11 +90,18 @@ static int __opal_async_release_token(int token)
}
 
spin_lock_irqsave(&opal_async_comp_lock, flags);
-   __set_bit(token, opal_async_complete_map);
-   __clear_bit(token, opal_async_token_map);
+   switch (opal_async_tokens[token].state) {
+   case ASYNC_TOKEN_COMPLETED:
+   case ASYNC_TOKEN_ALLOCATED:
+   opal_async_tokens[token].state = ASYNC_TOKEN_UNALLOCATED;
+   rc = 0;
+   break;
+   default:
+   rc = 1;
+   }
spin_unlock_irqrestore(&opal_async_comp_lock, flags);
 
-   return 0;
+   return rc;
 }
 
 int opal_async_release_token(int token)
@@ -96,12 +109,10 @@ int opal_async_release_token(int token)
int ret;
 
ret = __opal_async_release_token(token);
-   if (ret)
-   return ret;
-
-   up(&opal_async_sem);
+   if (!ret)
+   up(&opal_async_sem);
 
-   return 0;
+   return ret;
 }
 EXPORT_SYMBOL_GPL(opal_async_release_token);
 
@@ -122,13 +133,15 @@ int opal_async_wait_response(uint64_t token, struct 
opal_msg *msg)
 * functional.
 */
opal_wake_poller();
-   wait_event(opal_async_wait, test_bit(token, opal_async_comple

[PATCH v5 08/10] powerpc/opal: Add opal_async_wait_response_interruptible() to opal-async

2017-11-02 Thread Cyril Bur
This patch adds an _interruptible version of opal_async_wait_response().
This is useful when a long running OPAL call is performed on behalf of a
userspace thread, for example, the opal_flash_{read,write,erase}
functions performed by the powernv-flash MTD driver.

It is foreseeable that these functions would take upwards of two minutes,
causing the wait_event() to block long enough to cause hung task
warnings. Furthermore, wait_event_interruptible() is preferable as
otherwise there is no way for signals to stop the process, which is going
to be confusing in userspace.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/include/asm/opal.h |  2 +
 arch/powerpc/platforms/powernv/opal-async.c | 87 +++--
 2 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 0078eb5acf98..f95ca4560bfa 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -307,6 +307,8 @@ extern void opal_notifier_update_evt(uint64_t evt_mask, 
uint64_t evt_val);
 extern int opal_async_get_token_interruptible(void);
 extern int opal_async_release_token(int token);
 extern int opal_async_wait_response(uint64_t token, struct opal_msg *msg);
+extern int opal_async_wait_response_interruptible(uint64_t token,
+   struct opal_msg *msg);
 extern int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data);
 
 struct rtc_time;
diff --git a/arch/powerpc/platforms/powernv/opal-async.c 
b/arch/powerpc/platforms/powernv/opal-async.c
index fbae8a37ce2c..e2004606b75b 100644
--- a/arch/powerpc/platforms/powernv/opal-async.c
+++ b/arch/powerpc/platforms/powernv/opal-async.c
@@ -26,6 +26,8 @@
 enum opal_async_token_state {
ASYNC_TOKEN_UNALLOCATED = 0,
ASYNC_TOKEN_ALLOCATED,
+   ASYNC_TOKEN_DISPATCHED,
+   ASYNC_TOKEN_ABANDONED,
ASYNC_TOKEN_COMPLETED
 };
 
@@ -58,8 +60,10 @@ static int __opal_async_get_token(void)
 }
 
 /*
- * Note: If the returned token is used in an opal call and opal returns
- * OPAL_ASYNC_COMPLETION you MUST opal_async_wait_response() before
+ * Note: If the returned token is used in an opal call and opal
+ * returns OPAL_ASYNC_COMPLETION you MUST call one of
+ * opal_async_wait_response() or
+ * opal_async_wait_response_interruptible() at least once before
 * calling any other opal_async_* function
  */
 int opal_async_get_token_interruptible(void)
@@ -96,6 +100,16 @@ static int __opal_async_release_token(int token)
opal_async_tokens[token].state = ASYNC_TOKEN_UNALLOCATED;
rc = 0;
break;
+   /*
+* DISPATCHED and ABANDONED tokens must wait for OPAL to
+* respond.
+* Mark a DISPATCHED token as ABANDONED so that the response
+* handling code knows no one cares and that it can
+* free it then.
+*/
+   case ASYNC_TOKEN_DISPATCHED:
+   opal_async_tokens[token].state = ASYNC_TOKEN_ABANDONED;
+   /* Fall through */
default:
rc = 1;
}
@@ -128,7 +142,11 @@ int opal_async_wait_response(uint64_t token, struct 
opal_msg *msg)
return -EINVAL;
}
 
-   /* Wakeup the poller before we wait for events to speed things
+   /*
+* There is no need to mark the token as dispatched, wait_event()
+* will block until the token completes.
+*
+* Wakeup the poller before we wait for events to speed things
 * up on platforms or simulators where the interrupts aren't
 * functional.
 */
@@ -141,11 +159,66 @@ int opal_async_wait_response(uint64_t token, struct 
opal_msg *msg)
 }
 EXPORT_SYMBOL_GPL(opal_async_wait_response);
 
+int opal_async_wait_response_interruptible(uint64_t token, struct opal_msg 
*msg)
+{
+   unsigned long flags;
+   int ret;
+
+   if (token >= opal_max_async_tokens) {
+   pr_err("%s: Invalid token passed\n", __func__);
+   return -EINVAL;
+   }
+
+   if (!msg) {
+   pr_err("%s: Invalid message pointer passed\n", __func__);
+   return -EINVAL;
+   }
+
+   /*
+* The first time this gets called we mark the token as DISPATCHED
+* so that if wait_event_interruptible() returns not zero and the
+* caller frees the token, we know not to actually free the token
+* until the response comes.
+*
+* Only change if the token is ALLOCATED - it may have been
+* completed even before the caller gets around to calling this
+* the first time.
+*
+* There is also a dirty great comment at the token allocation
+* function that if the opal call returns OPAL_ASYNC_COMPLETION to
+* the caller then the caller *must* call this or the not
+* interruptible version before doing anything e

[PATCH v5 02/10] mtd: powernv_flash: Don't treat OPAL_SUCCESS as an error

2017-11-02 Thread Cyril Bur
While this driver expects to interact asynchronously, OPAL is well
within its rights to return OPAL_SUCCESS to indicate that the operation
completed without the need for a callback. We shouldn't treat
OPAL_SUCCESS as an error; rather, we should wrap up and return promptly to
the caller.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
Acked-by: Boris Brezillon <boris.brezil...@free-electrons.com>
---
I'll note here that currently no OPAL exists that will return
OPAL_SUCCESS so there isn't the possibility of a bug today.
---
 drivers/mtd/devices/powernv_flash.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/mtd/devices/powernv_flash.c 
b/drivers/mtd/devices/powernv_flash.c
index f9ec38281ff2..ca3ca6adf71e 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -63,7 +63,6 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum 
flash_op op,
if (token < 0) {
if (token != -ERESTARTSYS)
dev_err(dev, "Failed to get an async token\n");
-
return token;
}
 
@@ -83,21 +82,25 @@ static int powernv_flash_async_op(struct mtd_info *mtd, 
enum flash_op op,
return -EIO;
}
 
+   if (rc == OPAL_SUCCESS)
+   goto out_success;
+
if (rc != OPAL_ASYNC_COMPLETION) {
dev_err(dev, "opal_flash_async_op(op=%d) failed (rc %d)\n",
op, rc);
-   opal_async_release_token(token);
-   return -EIO;
+   rc = -EIO;
+   goto out;
}
 
rc = opal_async_wait_response(token, &msg);
-   opal_async_release_token(token);
if (rc) {
dev_err(dev, "opal async wait failed (rc %d)\n", rc);
-   return -EIO;
+   rc = -EIO;
+   goto out;
}
 
rc = opal_get_async_rc(msg);
+out_success:
if (rc == OPAL_SUCCESS) {
rc = 0;
if (retlen)
@@ -106,6 +109,8 @@ static int powernv_flash_async_op(struct mtd_info *mtd, 
enum flash_op op,
rc = -EIO;
}
 
+out:
+   opal_async_release_token(token);
return rc;
 }
 
-- 
2.15.0



[PATCH v5 00/10] Allow opal-async waiters to get interrupted

2017-11-02 Thread Cyril Bur
V5: Address review from Boris Brezillon, thanks!
Minor cleanups and descriptions - no functional changes.

V4: Rework and rethink.

To recap:
Userspace MTD read()s/write()s and erases to powernv_flash become
calls into the OPAL firmware which subsequently handles flash access.
Because the read()s, write()s or erases can be large (bounded of
course my the size of flash) OPAL may take some time to service the
request, this causes the powernv_flash driver to sit in a wait_event()
for potentially minutes. This causes two problems, firstly, tools
appear to hang for the entire time as they cannot be interrupted by
signals and secondly, this can trigger hung task warnings. The correct
solution is to use wait_event_interruptible() which my rework (as part
of this series) of the opal-async infrastructure provides.

The final patch in this series achieves this. It should eliminate both
hung tasks and threads locking up.
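
For powernv_flash the caller-side pattern ends up looking roughly like
this (a sketch only - patch 10 has the real code, including the buffer
ownership handling on interruption):

	rc = opal_async_wait_response_interruptible(token, &msg);
	if (rc) {
		/*
		 * Interrupted by a signal before OPAL responded. OPAL
		 * may still use the buffers tied to this token, so we
		 * can't simply free them and carry on.
		 */
		return rc;
	}
	rc = opal_error_code(opal_get_async_rc(msg));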

Included in this series are other simpler fixes for powernv_flash:

Don't always return EIO on error. OPAL does mutual exclusion on the
flash and also knows when the service processor takes control of the
flash, in both of these cases it will return OPAL_BUSY, translating
this to EIO is misleading to userspace.

Handle receiving OPAL_SUCCESS when it expects OPAL_ASYNC_COMPLETION
and don't treat it as an error. Unfortunately there are too many drivers
out there with the incorrect behaviour so this means OPAL can never
return anything but OPAL_ASYNC_COMPLETION; this shouldn't prevent the
code from being correct.

Don't return ERESTARTSYS if token acquisition is interrupted as
powernv_flash can't be sure it hasn't already performed some work, let
userspace deal with the problem.

Change the incorrect use of BUG_ON() to WARN_ON() in powernv_flash.

Not for powernv_flash, a fix from Stewart Smith which fits into this
series as it relies on my improvements to the opal-async
infrastructure.

V3: export opal_error_code() so that powernv_flash can be built=m

Hello,

Version one of this series ignored that OPAL may continue to use
buffers passed to it after Linux kfree()s the buffer. This version
addresses this, not in a particularly nice way - future work could
make this better. This version also includes a few cleanups and fixups
to the powernv_flash driver done along the course of this work that I
thought I would just send.

The problem we're trying to solve here is that currently all users of
the opal-async calls must use wait_event(), this may be undesirable
when there is a userspace process behind the request for the opal
call, if OPAL takes too long to complete the call then hung task
warnings will appear.

In order to solve the problem callers should use
wait_event_interruptible(). Due to the interruptible nature of this
call, the opal-async infrastructure needs to track extra state
associated with each async token; this is prepared for in patch 6/10.

While I was working on the opal-async infrastructure improvements
Stewart fixed another problem and he relies on the corrected behaviour
of opal-async so I've sent it here.

Hello MTD folk, traditionally Michael Ellerman takes powernv_flash
driver patches through the powerpc tree; as always your feedback is
very welcome.

Thanks,

Cyril

Cyril Bur (9):
  mtd: powernv_flash: Use WARN_ON_ONCE() rather than BUG_ON()
  mtd: powernv_flash: Don't treat OPAL_SUCCESS as an error
  mtd: powernv_flash: Remove pointless goto in driver init
  mtd: powernv_flash: Don't return -ERESTARTSYS on interrupted token
acquisition
  powerpc/opal: Make __opal_async_{get,release}_token() static
  powerpc/opal: Rework the opal-async interface
  powerpc/opal: Add opal_async_wait_response_interruptible() to
opal-async
  powerpc/powernv: Add OPAL_BUSY to opal_error_code()
  mtd: powernv_flash: Use opal_async_wait_response_interruptible()

Stewart Smith (1):
  powernv/opal-sensor: remove not needed lock

 arch/powerpc/include/asm/opal.h  |   4 +-
 arch/powerpc/platforms/powernv/opal-async.c  | 183 +++
 arch/powerpc/platforms/powernv/opal-sensor.c |  17 +--
 arch/powerpc/platforms/powernv/opal.c|   2 +
 drivers/mtd/devices/powernv_flash.c  |  83 +++-
 5 files changed, 194 insertions(+), 95 deletions(-)

-- 
2.15.0



[PATCH v5 05/10] powerpc/opal: Make __opal_async_{get, release}_token() static

2017-11-02 Thread Cyril Bur
There are no callers of either __opal_async_get_token() or
__opal_async_release_token() outside of opal-async.c, so make them
static.

This patch also removes the possibility of an "emergency through
synchronous call to __opal_async_get_token()"; as such it makes more
sense to initialise opal_async_sem for the maximum number of async
tokens.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/include/asm/opal.h |  2 --
 arch/powerpc/platforms/powernv/opal-async.c | 10 +++---
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 726c23304a57..0078eb5acf98 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -304,9 +304,7 @@ extern void opal_notifier_enable(void);
 extern void opal_notifier_disable(void);
 extern void opal_notifier_update_evt(uint64_t evt_mask, uint64_t evt_val);
 
-extern int __opal_async_get_token(void);
 extern int opal_async_get_token_interruptible(void);
-extern int __opal_async_release_token(int token);
 extern int opal_async_release_token(int token);
 extern int opal_async_wait_response(uint64_t token, struct opal_msg *msg);
 extern int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data);
diff --git a/arch/powerpc/platforms/powernv/opal-async.c 
b/arch/powerpc/platforms/powernv/opal-async.c
index cf33769a7b72..c43421ab2d2f 100644
--- a/arch/powerpc/platforms/powernv/opal-async.c
+++ b/arch/powerpc/platforms/powernv/opal-async.c
@@ -33,7 +33,7 @@ static struct semaphore opal_async_sem;
 static struct opal_msg *opal_async_responses;
 static unsigned int opal_max_async_tokens;
 
-int __opal_async_get_token(void)
+static int __opal_async_get_token(void)
 {
unsigned long flags;
int token;
@@ -73,7 +73,7 @@ int opal_async_get_token_interruptible(void)
 }
 EXPORT_SYMBOL_GPL(opal_async_get_token_interruptible);
 
-int __opal_async_release_token(int token)
+static int __opal_async_release_token(int token)
 {
unsigned long flags;
 
@@ -199,11 +199,7 @@ int __init opal_async_comp_init(void)
goto out_opal_node;
}
 
-   /* Initialize to 1 less than the maximum tokens available, as we may
-* require to pop one during emergency through synchronous call to
-* __opal_async_get_token()
-*/
-   sema_init(&opal_async_sem, opal_max_async_tokens - 1);
+   sema_init(&opal_async_sem, opal_max_async_tokens);
 
 out_opal_node:
of_node_put(opal_node);
-- 
2.15.0



[PATCH v5 07/10] powernv/opal-sensor: remove not needed lock

2017-11-02 Thread Cyril Bur
From: Stewart Smith <stew...@linux.vnet.ibm.com>

Parallel sensor reads could run out of async tokens due to
opal_get_sensor_data grabbing tokens but then doing the sensor
read behind a mutex, essentially serializing the (possibly
asynchronous and relatively slow) sensor read.

It turns out that the mutex isn't needed at all: not only should the
OPAL interface allow concurrent reads, the implementation is certainly
safe for that. And if any sensor we were reading from somewhere isn't,
the kernel is the wrong place to do the mutual exclusion; OPAL should
be doing it for the kernel.

So, remove the mutex.

Additionally, we shouldn't be printing out an error when we don't
get a token as the only way this should happen is if we've been
interrupted in down_interruptible() on the semaphore.

Reported-by: Robert Lippert <rlipp...@google.com>
Signed-off-by: Stewart Smith <stew...@linux.vnet.ibm.com>
Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/platforms/powernv/opal-sensor.c | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-sensor.c 
b/arch/powerpc/platforms/powernv/opal-sensor.c
index aa267f120033..0a7074bb91dc 100644
--- a/arch/powerpc/platforms/powernv/opal-sensor.c
+++ b/arch/powerpc/platforms/powernv/opal-sensor.c
@@ -19,13 +19,10 @@
  */
 
 #include 
-#include <linux/mutex.h>
 #include 
 #include 
 #include 
 
-static DEFINE_MUTEX(opal_sensor_mutex);
-
 /*
  * This will return sensor information to driver based on the requested sensor
  * handle. A handle is an opaque id for the powernv, read by the driver from 
the
@@ -38,13 +35,9 @@ int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data)
__be32 data;
 
token = opal_async_get_token_interruptible();
-   if (token < 0) {
-   pr_err("%s: Couldn't get the token, returning\n", __func__);
-   ret = token;
-   goto out;
-   }
+   if (token < 0)
+   return token;
 
-   mutex_lock(&opal_sensor_mutex);
ret = opal_sensor_read(sensor_hndl, token, &data);
switch (ret) {
case OPAL_ASYNC_COMPLETION:
@@ -52,7 +45,7 @@ int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data)
if (ret) {
pr_err("%s: Failed to wait for the async response, 
%d\n",
   __func__, ret);
-   goto out_token;
+   goto out;
}
 
ret = opal_error_code(opal_get_async_rc(msg));
@@ -73,10 +66,8 @@ int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data)
break;
}
 
-out_token:
-   mutex_unlock(&opal_sensor_mutex);
-   opal_async_release_token(token);
 out:
+   opal_async_release_token(token);
return ret;
 }
 EXPORT_SYMBOL_GPL(opal_get_sensor_data);
-- 
2.15.0



[PATCH v5 09/10] powerpc/powernv: Add OPAL_BUSY to opal_error_code()

2017-11-02 Thread Cyril Bur
Also export opal_error_code() so that it can be used in modules

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/platforms/powernv/opal.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/opal.c 
b/arch/powerpc/platforms/powernv/opal.c
index 65c79ecf5a4d..041ddbd1fc57 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -998,6 +998,7 @@ int opal_error_code(int rc)
 
case OPAL_PARAMETER:return -EINVAL;
case OPAL_ASYNC_COMPLETION: return -EINPROGRESS;
+   case OPAL_BUSY:
case OPAL_BUSY_EVENT:   return -EBUSY;
case OPAL_NO_MEM:   return -ENOMEM;
case OPAL_PERMISSION:   return -EPERM;
@@ -1037,3 +1038,4 @@ EXPORT_SYMBOL_GPL(opal_write_oppanel_async);
 /* Export this for KVM */
 EXPORT_SYMBOL_GPL(opal_int_set_mfrr);
 EXPORT_SYMBOL_GPL(opal_int_eoi);
+EXPORT_SYMBOL_GPL(opal_error_code);
-- 
2.15.0



[PATCH v5 04/10] mtd: powernv_flash: Don't return -ERESTARTSYS on interrupted token acquisition

2017-11-02 Thread Cyril Bur
Because the MTD core might split up a read() or write() from userspace
into several calls to the driver, we may fail to get a token but already
have done some work; best to return -EINTR back to userspace and have
them decide what to do.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
Acked-by: Boris Brezillon <boris.brezil...@free-electrons.com>
---
 drivers/mtd/devices/powernv_flash.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/mtd/devices/powernv_flash.c 
b/drivers/mtd/devices/powernv_flash.c
index 4dd3b5d2feb2..3343d4f5c4f3 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -47,6 +47,11 @@ enum flash_op {
FLASH_OP_ERASE,
 };
 
+/*
+ * Don't return -ERESTARTSYS if we can't get a token, the MTD core
+ * might have split up the call from userspace and called into the
+ * driver more than once, we'll already have done some amount of work.
+ */
 static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op,
loff_t offset, size_t len, size_t *retlen, u_char *buf)
 {
@@ -63,6 +68,8 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum 
flash_op op,
if (token < 0) {
if (token != -ERESTARTSYS)
dev_err(dev, "Failed to get an async token\n");
+   else
+   token = -EINTR;
return token;
}
 
-- 
2.15.0



[PATCH v5 01/10] mtd: powernv_flash: Use WARN_ON_ONCE() rather than BUG_ON()

2017-11-02 Thread Cyril Bur
BUG_ON() should be reserved for situations where we can no longer
guarantee the integrity of the system. In the case where
powernv_flash_async_op() receives an impossible op, we can still
guarantee the integrity of the system.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
Acked-by: Boris Brezillon <boris.brezil...@free-electrons.com>
---
 drivers/mtd/devices/powernv_flash.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/devices/powernv_flash.c 
b/drivers/mtd/devices/powernv_flash.c
index f5396f26ddb4..f9ec38281ff2 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -78,7 +78,9 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum 
flash_op op,
rc = opal_flash_erase(info->id, offset, len, token);
break;
default:
-   BUG_ON(1);
+   WARN_ON_ONCE(1);
+   opal_async_release_token(token);
+   return -EIO;
}
 
if (rc != OPAL_ASYNC_COMPLETION) {
-- 
2.15.0



[PATCH v3 3/4] powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint

2017-11-01 Thread Cyril Bur
Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

tm_reclaim() has optimisations to not always save the FP/Altivec
registers to the checkpointed save area. This was originally done
because the caller might have information that the checkpointed
registers aren't valid due to lazy save and restore. We've also been a
little vague as to how tm_reclaim() leaves the FP/Altivec state since it
doesn't necessarily always save it to the thread struct. This has led
to an (incorrect) assumption that it leaves the checkpointed state on
the CPU.

tm_recheckpoint() has similar optimisations in reverse. It may not
always reload the checkpointed FP/Altivec registers from the thread
struct before the trecheckpoint. It is therefore quite unclear where it
expects to get the state from. This didn't help with the assumption
made about tm_reclaim().

These optimisations sit in what is by definition a slow path. If a
process has to go through a reclaim/recheckpoint then its transaction
will be doomed on returning to userspace. This means that the process
will be unable to complete its transaction and be forced to its failure
handler. This is already an out of line case for userspace. Furthermore,
the cost of copying 64 times 128 bits from registers isn't very high[0]
(at all) on modern processors. As such it appears these optimisations
have only served to increase code complexity and are unlikely to have
had a measurable performance impact.

Our transactional memory handling has been riddled with bugs. A cause
of this has been difficulty in following the code flow, code complexity
has not been our friend here. It makes sense to remove these
optimisations in favour of a (hopefully) more stable implementation.

This patch does mean that sometimes the assembly will needlessly save
'junk' registers which will subsequently get overwritten with the
correct value by the C code which calls the assembly function. This
small inefficiency is far outweighed by the reduction in complexity for
general TM code, context switching paths, and transactional facility
unavailable exception handler.

0: I tried to measure it once for other work and found that it was
hiding in the noise of everything else I was working with. I find it
exceedingly likely this will be the case here.
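
As a sanity check on the copy cost referred to above: 64 registers of
128 bits is 64 * 16 = 1024 bytes, i.e. a single 1 KiB memcpy. A
stand-alone sketch of that copy (a hypothetical microbenchmark shape,
not code from the patch):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	uint8_t src[64 * 16], dst[64 * 16]; /* 64 VSX regs, 128 bits each */

	memset(src, 0xa5, sizeof(src));
	memcpy(dst, src, sizeof(dst)); /* the cost being discussed */
	printf("copied %zu bytes\n", sizeof(dst));
	return 0;
}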

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
V2: Unchanged
V3: Unchanged

 arch/powerpc/include/asm/tm.h   |  5 ++--
 arch/powerpc/kernel/process.c   | 22 ++-
 arch/powerpc/kernel/signal_32.c |  2 +-
 arch/powerpc/kernel/signal_64.c |  2 +-
 arch/powerpc/kernel/tm.S| 59 -
 arch/powerpc/kernel/traps.c | 26 +-
 6 files changed, 35 insertions(+), 81 deletions(-)

diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h
index 82e06ca3a49b..33d965911bec 100644
--- a/arch/powerpc/include/asm/tm.h
+++ b/arch/powerpc/include/asm/tm.h
@@ -11,10 +11,9 @@
 
 extern void tm_enable(void);
 extern void tm_reclaim(struct thread_struct *thread,
-  unsigned long orig_msr, uint8_t cause);
+  uint8_t cause);
 extern void tm_reclaim_current(uint8_t cause);
-extern void tm_recheckpoint(struct thread_struct *thread,
-   unsigned long orig_msr);
+extern void tm_recheckpoint(struct thread_struct *thread);
 extern void tm_abort(uint8_t cause);
 extern void tm_save_sprs(struct thread_struct *thread);
 extern void tm_restore_sprs(struct thread_struct *thread);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index bf651f2fd3bd..b00c291cd05c 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -869,6 +869,8 @@ static void tm_reclaim_thread(struct thread_struct *thr,
 
giveup_all(container_of(thr, struct task_struct, thread));
 
+   tm_reclaim(thr, cause);
+
/*
 * If we are in a transaction and FP is off then we can't have
 * used FP inside that transaction. Hence the checkpointed
@@ -887,8 +889,6 @@ static void tm_reclaim_thread(struct thread_struct *thr,
if ((thr->ckpt_regs.msr & MSR_VEC) == 0)
memcpy(&thr->ckvr_state, &thr->vr_state,
   sizeof(struct thread_vr_state));
-
-   tm_reclaim(thr, thr->ckpt_regs.msr, cause);
 }
 
 void tm_reclaim_current(uint8_t cause)
@@ -937,11 +937,9 @@ static inline void tm_reclaim_task(struct task_struct *tsk)
tm_save_sprs(thr);
 }
 
-extern void __tm_recheckpoint(struct thread_struct *thread,

[PATCH v3 4/4] powerpc: Remove facility loadups on transactional {fp, vec, vsx} unavailable

2017-11-01 Thread Cyril Bur
After handling a transactional FP, Altivec or VSX unavailable exception,
the return to userspace code will detect that the TIF_RESTORE_TM bit is
set and call restore_tm_state(). restore_tm_state() will call
restore_math() to ensure that the correct facilities are loaded.

This means that all the loadup code in {fp,altivec,vsx}_unavailable_tm()
is doing pointless work and can simply be removed.
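
The path being relied on, as a self-contained mock (all names below
are stand-ins for the kernel symbols mentioned above, not the real
code):

#include <stdbool.h>
#include <stdio.h>

static bool tif_restore_tm = true; /* set by the unavailable handler */

static void restore_math(void)
{
	printf("restore_math(): load the facilities userspace needs\n");
}

static void restore_tm_state(void)
{
	tif_restore_tm = false;
	restore_math();
}

int main(void)
{
	/* On the return-to-userspace path the kernel notices the flag,
	 * so the exception handlers need not load anything themselves. */
	if (tif_restore_tm)
		restore_tm_state();
	return 0;
}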

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
V2: Obvious cleanup which should have been in v1
V3: Unchanged
 arch/powerpc/kernel/traps.c | 30 --
 1 file changed, 30 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 4a7bc64352fd..3181e85ef17c 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1471,12 +1471,6 @@ void facility_unavailable_exception(struct pt_regs *regs)
 
 void fp_unavailable_tm(struct pt_regs *regs)
 {
-   /*
-* Save the MSR now because tm_reclaim_current() is likely to
-* change it
-*/
-   unsigned long orig_msr = regs->msr;
-
/* Note:  This does not handle any kind of FP laziness. */
 
TM_DEBUG("FP Unavailable trap whilst transactional at 0x%lx, MSR=%lx\n",
@@ -1502,24 +1496,10 @@ void fp_unavailable_tm(struct pt_regs *regs)
 * so we don't want to load the VRs from the thread_struct.
 */
tm_recheckpoint(&current->thread);
-
-   /* If VMX is in use, get the transactional values back */
-   if (orig_msr & MSR_VEC) {
-   msr_check_and_set(MSR_VEC);
-   load_vr_state(&current->thread.vr_state);
-   /* At this point all the VSX state is loaded, so enable it */
-   regs->msr |= MSR_VSX;
-   }
 }
 
 void altivec_unavailable_tm(struct pt_regs *regs)
 {
-   /*
-* Save the MSR now because tm_reclaim_current() is likely to
-* change it
-*/
-   unsigned long orig_msr = regs->msr;
-
/* See the comments in fp_unavailable_tm().  This function operates
 * the same way.
 */
@@ -1531,12 +1511,6 @@ void altivec_unavailable_tm(struct pt_regs *regs)
current->thread.load_vec = 1;
tm_recheckpoint(&current->thread);
current->thread.used_vr = 1;
-
-   if (orig_msr & MSR_FP) {
-   msr_check_and_set(MSR_FP);
-   load_fp_state(&current->thread.fp_state);
-   regs->msr |= MSR_VSX;
-   }
 }
 
 void vsx_unavailable_tm(struct pt_regs *regs)
@@ -1561,10 +1535,6 @@ void vsx_unavailable_tm(struct pt_regs *regs)
current->thread.load_fp = 1;
 
tm_recheckpoint(&current->thread);
-
-   msr_check_and_set(MSR_FP | MSR_VEC);
-   load_fp_state(&current->thread.fp_state);
-   load_vr_state(&current->thread.vr_state);
 }
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
-- 
2.15.0



[PATCH v3 2/4] powerpc: Force reload for recheckpoint during tm {fp, vec, vsx} unavailable exception

2017-11-01 Thread Cyril Bur
Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

tm_reclaim() has optimisations to not always save the FP/Altivec
registers to the checkpointed save area. This was originally done
because the caller might have information that the checkpointed
registers aren't valid due to lazy save and restore. We've also been a
little vague as to how tm_reclaim() leaves the FP/Altivec state since it
doesn't necessarily always save it to the thread struct. This has led
to an (incorrect) assumption that it leaves the checkpointed state on
the CPU.

tm_recheckpoint() has similar optimisations in reverse. It may not
always reload the checkpointed FP/Altivec registers from the thread
struct before the trecheckpoint. It is therefore quite unclear where it
expects to get the state from. This didn't help with the assumption
made about tm_reclaim().

This patch is a minimal fix for ease of backporting. A more correct fix
which removes the msr parameter to tm_reclaim() and tm_recheckpoint()
altogether has been upstreamed to apply on top of this patch.

Fixes: dc3106690b20 ("powerpc: tm: Always use fp_state and vr_state to
store live registers")

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
V2: Add this patch for ease of backporting the same fix as the next
patch.
V3: No change

 arch/powerpc/kernel/process.c |  4 ++--
 arch/powerpc/kernel/traps.c   | 22 +-
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index cff887e67eb9..bf651f2fd3bd 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -867,6 +867,8 @@ static void tm_reclaim_thread(struct thread_struct *thr,
if (!MSR_TM_SUSPENDED(mfmsr()))
return;
 
+   giveup_all(container_of(thr, struct task_struct, thread));
+
/*
 * If we are in a transaction and FP is off then we can't have
 * used FP inside that transaction. Hence the checkpointed
@@ -886,8 +888,6 @@ static void tm_reclaim_thread(struct thread_struct *thr,
memcpy(&thr->ckvr_state, &thr->vr_state,
   sizeof(struct thread_vr_state));
 
-   giveup_all(container_of(thr, struct task_struct, thread));
-
tm_reclaim(thr, thr->ckpt_regs.msr, cause);
 }
 
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index ef6a45969812..a7d42c89a257 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1471,6 +1471,12 @@ void facility_unavailable_exception(struct pt_regs *regs)
 
 void fp_unavailable_tm(struct pt_regs *regs)
 {
+   /*
+* Save the MSR now because tm_reclaim_current() is likely to
+* change it
+*/
+   unsigned long orig_msr = regs->msr;
+
/* Note:  This does not handle any kind of FP laziness. */
 
TM_DEBUG("FP Unavailable trap whilst transactional at 0x%lx, MSR=%lx\n",
@@ -1495,10 +1501,10 @@ void fp_unavailable_tm(struct pt_regs *regs)
 * If VMX is in use, the VRs now hold checkpointed values,
 * so we don't want to load the VRs from the thread_struct.
 */
-   tm_recheckpoint(&current->thread, MSR_FP);
+   tm_recheckpoint(&current->thread, orig_msr | MSR_FP);
 
/* If VMX is in use, get the transactional values back */
-   if (regs->msr & MSR_VEC) {
+   if (orig_msr & MSR_VEC) {
msr_check_and_set(MSR_VEC);
load_vr_state(&current->thread.vr_state);
/* At this point all the VSX state is loaded, so enable it */
@@ -1508,6 +1514,12 @@ void fp_unavailable_tm(struct pt_regs *regs)
 
 void altivec_unavailable_tm(struct pt_regs *regs)
 {
+   /*
+* Save the MSR now because tm_reclaim_current() is likely to
+* change it
+*/
+   unsigned long orig_msr = regs->msr;
+
/* See the comments in fp_unavailable_tm().  This function operates
 * the same way.
 */
@@ -1517,10 +1529,10 @@ void altivec_unavailable_tm(struct pt_regs *regs)
 regs->nip, regs->msr);
tm_reclaim_current(TM_CAUSE_FAC_UNAV);
current->thread.load_vec = 1;
-   tm_recheckpoint(&current->thread, MSR_VEC);
+   tm_recheckpoint(&current->thread, orig_msr | MSR_VEC);
current->thread.used_vr = 1;
 
-   if (regs->msr & MSR_FP) {
+   if (orig_msr & MSR_FP) {
msr_check_and_set(MSR_FP);
load_fp_state(&current->thread.fp_state);

[PATCH v3 1/4] powerpc: Don't enable FP/Altivec if not checkpointed

2017-11-01 Thread Cyril Bur
Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

Lazy save and restore of FP/Altivec cannot be done if a process is
transactional. If a facility was enabled it must remain enabled whenever
a thread is transactional.

Commit dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware
transactional memory in use") ensures that the facilities are always
enabled if a thread is transactional. A bug in the introduced code may
cause it to inadvertently enable a facility that was (and should remain)
disabled. The problem with this extraneous enablement is that the
registers for the erroneously enabled facility have not been correctly
recheckpointed - the recheckpointing code assumed the facility would
remain disabled.

Further compounding the issue, the transactional {fp,altivec,vsx}
unavailable code has been incorrectly using the MSR to enable
facilities. The presence of the {FP,VEC,VSX} bit in the regs->msr simply
means the registers are live on the CPU, not that the kernel should
load them before returning to userspace. This has worked due to the bug
mentioned above.
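
To make that distinction concrete before the worked example below, a
self-contained mock (the structure and logic are stand-ins; only the
MSR bit positions match the real ones):

#include <stdbool.h>
#include <stdio.h>

#define MSR_FP	(1u << 13)
#define MSR_VEC	(1u << 25)

struct thread_mock {
	unsigned int msr; /* what is live on the CPU right now */
	bool load_fp;	  /* what userspace was actually using */
	bool load_vec;
};

static void restore_math_mock(const struct thread_mock *t)
{
	/* Decide from load_*, not from the MSR bits. */
	if (t->load_fp)
		printf("reload FP state\n");
	if (t->load_vec)
		printf("reload VEC state\n");
}

int main(void)
{
	struct thread_mock t = { .msr = MSR_FP, .load_fp = true };

	restore_math_mock(&t);
	return 0;
}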

This causes transactional threads which return to their failure handler
to observe incorrect checkpointed registers. Perhaps an example will
help illustrate the problem:

A userspace process is running and uses both FP and Altivec registers.
This process then continues to run for some time without touching
either set of registers. The kernel subsequently disables the
facilities as part of lazy save and restore. The userspace process then
performs a tbegin and the CPU checkpoints 'junk' FP and Altivec
registers. The process then performs a floating point instruction
triggering a fp unavailable exception in the kernel.

The kernel then loads the FP registers - and only the FP registers.
Since the thread is transactional it must perform a reclaim and
recheckpoint to ensure both the checkpointed registers and the
transactional registers are correct. It then (correctly) enables
MSR[FP] for the process. Later (on exception exit) the kernel also
(inadvertently) enables MSR[VEC]. The process is then returned to
userspace.

Since the act of loading the FP registers doomed the transaction we know
the CPU will fail the transaction, restore its checkpointed registers, and
return the process to its failure handler. The problem is that we're
now running with Altivec enabled and the 'junk' checkpointed registers
are restored. The kernel had only recheckpointed FP.

This patch solves this by only activating FP/Altivec if userspace was
using them when it entered the kernel and not simply if the process is
transactional.
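
The failing sequence from the example, reduced to minimal userspace
form (a sketch assuming GCC's PowerPC HTM builtins and -mhtm, on a
TM-capable CPU; this is not part of the patch):

#include <htmintrin.h>
#include <stdio.h>

int main(void)
{
	double d = 1.0;

	if (__builtin_tbegin(0)) {
		/* FP use inside the transaction: if FP was lazily
		 * disabled, this raises the fp unavailable exception
		 * and dooms the transaction. */
		d = d * 2.0;
		__builtin_tend(0);
	} else {
		/* Failure handler: the checkpointed registers are
		 * restored here and must not contain 'junk'. */
	}
	printf("%f\n", d);
	return 0;
}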

Fixes: dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware
transactional memory in use")

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
V2: Rather than incorrectly using the MSR to enable {FP,VEC,VSX} use
the load_fp and load_vec booleans to help restore_math() make the
correct decision
V3: Put tm_active_with_{fp,altivec}() inside a #ifdef
CONFIG_PPC_TRANSACTIONAL_MEM 


 arch/powerpc/kernel/process.c | 18 --
 arch/powerpc/kernel/traps.c   |  8 
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a0c74bbf3454..cff887e67eb9 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -97,9 +97,23 @@ static inline bool msr_tm_active(unsigned long msr)
 {
return MSR_TM_ACTIVE(msr);
 }
+
+static bool tm_active_with_fp(struct task_struct *tsk)
+{
+   return msr_tm_active(tsk->thread.regs->msr) &&
+   (tsk->thread.ckpt_regs.msr & MSR_FP);
+}
+
+static bool tm_active_with_altivec(struct task_struct *tsk)
+{
+   return msr_tm_active(tsk->thread.regs->msr) &&
+   (tsk->thread.ckpt_regs.msr & MSR_VEC);
+}
 #else
 static inline bool msr_tm_active(unsigned long msr) { return false; }
 static inline void check_if_tm_restore_required(struct task_struct *tsk) { }
+static inline bool tm_active_with_fp(struct task_struct *tsk) { return false; }
+static inline bool tm_active_with_altivec(struct task_struct *tsk) { return false; }
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
 bool strict_msr_control;
@@ -232,7 +246,7 @@ EXPORT_SYMBOL(enable_kernel_fp);
 
 static int restore_fp(struct task_struct *tsk)
 {
-   if (tsk->thread.load_fp || msr_tm_active(tsk->thread.regs->msr)) {
-   if (tsk->thread.load_fp || msr_tm_active(tsk->thread.regs->msr)) {

Re: [PATCH 1/2] powerpc: Don't enable FP/Altivec if not checkpointed

2017-11-01 Thread Cyril Bur
On Thu, 2017-11-02 at 10:19 +0800, kbuild test robot wrote:
> Hi Cyril,
> 
> Thank you for the patch! Yet something to improve:
> 

Once again robot, you have done brilliantly! You're 100% correct and
the last thing I want to do is break the build with
CONFIG_PPC_TRANSACTIONAL_MEM turned off.

Life saver,
Thanks so much kbuild.

Cyril

> [auto build test ERROR on powerpc/next]
> [also build test ERROR on v4.14-rc7 next-20171018]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Cyril-Bur/powerpc-Don-t-enable-FP-Altivec-if-not-checkpointed/20171102-073816
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
> config: powerpc-asp8347_defconfig (attached as .config)
> compiler: powerpc-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=powerpc 
> 
> All errors (new ones prefixed by >>):
> 
>arch/powerpc/kernel/process.c: In function 'is_transactionally_fp':
> > > arch/powerpc/kernel/process.c:243:15: error: 'struct thread_struct' has 
> > > no member named 'ckpt_regs'
> 
>   (tsk->thread.ckpt_regs.msr & MSR_FP);
>   ^
>arch/powerpc/kernel/process.c:244:1: error: control reaches end of 
> non-void function [-Werror=return-type]
> }
> ^
>cc1: all warnings being treated as errors
> 
> vim +243 arch/powerpc/kernel/process.c
> 
>239
>240static int is_transactionally_fp(struct task_struct *tsk)
>241{
>242return msr_tm_active(tsk->thread.regs->msr) &&
>  > 243(tsk->thread.ckpt_regs.msr & MSR_FP);
>244}
>245
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Re: [PATCH] selftests/powerpc: Check FP/VEC on exception in TM

2017-11-01 Thread Cyril Bur
On Wed, 2017-11-01 at 15:23 -0400, Gustavo Romero wrote:
> Add a self test to check if FP/VEC/VSX registers are sane (restored
> correctly) after a FP/VEC/VSX unavailable exception is caught during a
> transaction.
> 
> This test checks all possibilities in a thread regarding the combination
> of MSR.[FP|VEC] states in a thread and for each scenario raises a
> FP/VEC/VSX unavailable exception in transactional state, verifying if
> vs0 and vs32 registers, which are representatives of FP/VEC/VSX reg
> sets, are not corrupted.
> 

Thanks Gustavo,

I do have one more thought on an improvement for this test which is
that:
+   /* Counter for busy wait */
+   uint64_t counter = 0x1ff00;
is a bit fragile; what we should do is have the test work out how long it
should spin until it reliably gets a TM_CAUSE_FAC_UNAV failure and then
use that for these tests.
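
One possible shape for that calibration (a runnable mock;
spin_causes_fac_unav() stands in for actually running the transaction
and checking whether the failure code is TM_CAUSE_FAC_UNAV):

#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static bool spin_causes_fac_unav(uint64_t spins)
{
	return spins >= 0x120000; /* stand-in for real TM behaviour */
}

int main(void)
{
	uint64_t spins = 0x1000;

	/* Double the busy-wait until the unavailable exception is
	 * reliably the cause of the transaction failure. */
	while (!spin_causes_fac_unav(spins))
		spins *= 2;

	printf("calibrated busy-wait: %#" PRIx64 " iterations\n", spins);
	return 0;
}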

This will only become a problem if we were to change kernel heuristics
which is fine for now. I'll try to get that added soon but for now this
test has proven too useful to delay adding as is.

> Signed-off-by: Gustavo Romero <grom...@linux.vnet.ibm.com>
> Signed-off-by: Breno Leitao <lei...@debian.org>
> Signed-off-by: Cyril Bur <cyril...@gmail.com>
> ---
>  tools/testing/selftests/powerpc/tm/Makefile|   3 +-
>  .../testing/selftests/powerpc/tm/tm-unavailable.c  | 368 
> +
>  tools/testing/selftests/powerpc/tm/tm.h|   5 +
>  3 files changed, 375 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/powerpc/tm/tm-unavailable.c
> 
> diff --git a/tools/testing/selftests/powerpc/tm/Makefile 
> b/tools/testing/selftests/powerpc/tm/Makefile
> index 7bfcd45..24855c0 100644
> --- a/tools/testing/selftests/powerpc/tm/Makefile
> +++ b/tools/testing/selftests/powerpc/tm/Makefile
> @@ -2,7 +2,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr 
> tm-signal-context-chk-fpu
>   tm-signal-context-chk-vmx tm-signal-context-chk-vsx
>  
>  TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv 
> tm-signal-stack \
> - tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail \
> + tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable \
>   $(SIGNAL_CONTEXT_CHK_TESTS)
>  
>  include ../../lib.mk
> @@ -16,6 +16,7 @@ $(OUTPUT)/tm-syscall: CFLAGS += -I../../../../../usr/include
>  $(OUTPUT)/tm-tmspr: CFLAGS += -pthread
>  $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64
>  $(OUTPUT)/tm-resched-dscr: ../pmu/lib.o
> +$(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 
> -Wno-error=uninitialized -mvsx
>  
>  SIGNAL_CONTEXT_CHK_TESTS := $(patsubst 
> %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS))
>  $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S
> diff --git a/tools/testing/selftests/powerpc/tm/tm-unavailable.c 
> b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
> new file mode 100644
> index 000..69a4e8c
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
> @@ -0,0 +1,368 @@
> +/*
> + * Copyright 2017, Gustavo Romero, Breno Leitao, Cyril Bur, IBM Corp.
> + * Licensed under GPLv2.
> + *
> + * Force FP, VEC and VSX unavailable exception during transaction in all
> + * possible scenarios regarding the MSR.FP and MSR.VEC state, e.g. when FP
> + * is enable and VEC is disable, when FP is disable and VEC is enable, and
> + * so on. Then we check if the restored state is correctly set for the
> + * FP and VEC registers to the previous state we set just before we entered
> + * in TM, i.e. we check if it corrupts somehow the recheckpointed FP and
> + * VEC/Altivec registers on abortion due to an unavailable exception in TM.
> + * N.B. In this test we do not test all the FP/Altivec/VSX registers for
> + * corruption, but only for registers vs0 and vs32, which are respectively
> + * representatives of FP and VEC/Altivec reg sets.
> + */
> +
> +#define _GNU_SOURCE
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "tm.h"
> +
> +#define DEBUG 0
> +
> +/* Unavailable exceptions to test in HTM */
> +#define FP_UNA_EXCEPTION 0
> +#define VEC_UNA_EXCEPTION1
> +#define VSX_UNA_EXCEPTION2
> +
> +#define NUM_EXCEPTIONS   3
> +
> +struct Flags {
> + int touch_fp;
> + int touch_vec;
> + int result;
> + int exception;
> +} flags;
> +
> +bool expecting_failure(void)
> +{
> + if (flags.touch_fp && flags.exception == FP_UNA_EXCEPTION)
> + return false;
> +
> + if (flags.touch_vec && flags.exception == VEC_UNA_EXCEPTION)
> + return false;
> +
> + /* If both FP and VEC are touched it does 

[PATCH v2 2/4] powerpc: Force reload for recheckpoint during tm {fp, vec, vsx} unavailable exception

2017-10-30 Thread Cyril Bur
Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

tm_reclaim() has optimisations to not always save the FP/Altivec
registers to the checkpointed save area. This was originally done
because the caller might have information that the checkpointed
registers aren't valid due to lazy save and restore. We've also been a
little vague as to how tm_reclaim() leaves the FP/Altivec state since it
doesn't necessarily always save it to the thread struct. This has led
to an (incorrect) assumption that it leaves the checkpointed state on
the CPU.

tm_recheckpoint() has similar optimisations in reverse. It may not
always reload the checkpointed FP/Altivec registers from the thread
struct before the trecheckpoint. It is therefore quite unclear where it
expects to get the state from. This didn't help with the assumption
made about tm_reclaim().

This patch is a minimal fix for ease of backporting. A more correct fix
which removes the msr parameter to tm_reclaim() and tm_recheckpoint()
altogether has been upstreamed to apply on top of this patch.

Fixes: dc3106690b20 ("powerpc: tm: Always use fp_state and vr_state to
store live registers")

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
V2: Add this patch for ease of backporting the same fix as the next
patch.

 arch/powerpc/kernel/process.c |  4 ++--
 arch/powerpc/kernel/traps.c   | 22 +-
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index ebb5b58a4138..cfa75e99dcfb 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -866,6 +866,8 @@ static void tm_reclaim_thread(struct thread_struct *thr,
if (!MSR_TM_SUSPENDED(mfmsr()))
return;
 
+   giveup_all(container_of(thr, struct task_struct, thread));
+
/*
 * If we are in a transaction and FP is off then we can't have
 * used FP inside that transaction. Hence the checkpointed
@@ -885,8 +887,6 @@ static void tm_reclaim_thread(struct thread_struct *thr,
memcpy(&thr->ckvr_state, &thr->vr_state,
   sizeof(struct thread_vr_state));
 
-   giveup_all(container_of(thr, struct task_struct, thread));
-
tm_reclaim(thr, thr->ckpt_regs.msr, cause);
 }
 
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index ef6a45969812..a7d42c89a257 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1471,6 +1471,12 @@ void facility_unavailable_exception(struct pt_regs *regs)
 
 void fp_unavailable_tm(struct pt_regs *regs)
 {
+   /*
+* Save the MSR now because tm_reclaim_current() is likely to
+* change it
+*/
+   unsigned long orig_msr = regs->msr;
+
/* Note:  This does not handle any kind of FP laziness. */
 
TM_DEBUG("FP Unavailable trap whilst transactional at 0x%lx, MSR=%lx\n",
@@ -1495,10 +1501,10 @@ void fp_unavailable_tm(struct pt_regs *regs)
 * If VMX is in use, the VRs now hold checkpointed values,
 * so we don't want to load the VRs from the thread_struct.
 */
-   tm_recheckpoint(&current->thread, MSR_FP);
+   tm_recheckpoint(&current->thread, orig_msr | MSR_FP);
 
/* If VMX is in use, get the transactional values back */
-   if (regs->msr & MSR_VEC) {
+   if (orig_msr & MSR_VEC) {
msr_check_and_set(MSR_VEC);
load_vr_state(&current->thread.vr_state);
/* At this point all the VSX state is loaded, so enable it */
@@ -1508,6 +1514,12 @@ void fp_unavailable_tm(struct pt_regs *regs)
 
 void altivec_unavailable_tm(struct pt_regs *regs)
 {
+   /*
+* Save the MSR now because tm_reclaim_current() is likely to
+* change it
+*/
+   unsigned long orig_msr = regs->msr;
+
/* See the comments in fp_unavailable_tm().  This function operates
 * the same way.
 */
@@ -1517,10 +1529,10 @@ void altivec_unavailable_tm(struct pt_regs *regs)
 regs->nip, regs->msr);
tm_reclaim_current(TM_CAUSE_FAC_UNAV);
current->thread.load_vec = 1;
-   tm_recheckpoint(&current->thread, MSR_VEC);
+   tm_recheckpoint(&current->thread, orig_msr | MSR_VEC);
current->thread.used_vr = 1;
 
-   if (regs->msr & MSR_FP) {
+   if (orig_msr & MSR_FP) {
msr_check_and_set(MSR_FP);
load_fp_state(&current->thread.fp_state);

[PATCH v2 4/4] powerpc: Remove facility loadups on transactional {fp, vec, vsx} unavailable

2017-10-30 Thread Cyril Bur
After handling a transactional FP, Altivec or VSX unavailable exception,
the return to userspace code will detect that the TIF_RESTORE_TM bit is
set and call restore_tm_state(). restore_tm_state() will call
restore_math() to ensure that the correct facilities are loaded.

This means that all the loadup code in {fp,altivec,vsx}_unavailable_tm()
is doing pointless work and can simply be removed.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
V2: Obvious cleanup which should have been in v1

 arch/powerpc/kernel/traps.c | 30 --
 1 file changed, 30 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 4a7bc64352fd..3181e85ef17c 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1471,12 +1471,6 @@ void facility_unavailable_exception(struct pt_regs *regs)
 
 void fp_unavailable_tm(struct pt_regs *regs)
 {
-   /*
-* Save the MSR now because tm_reclaim_current() is likely to
-* change it
-*/
-   unsigned long orig_msr = regs->msr;
-
/* Note:  This does not handle any kind of FP laziness. */
 
TM_DEBUG("FP Unavailable trap whilst transactional at 0x%lx, MSR=%lx\n",
@@ -1502,24 +1496,10 @@ void fp_unavailable_tm(struct pt_regs *regs)
 * so we don't want to load the VRs from the thread_struct.
 */
tm_recheckpoint(&current->thread);
-
-   /* If VMX is in use, get the transactional values back */
-   if (orig_msr & MSR_VEC) {
-   msr_check_and_set(MSR_VEC);
-   load_vr_state(&current->thread.vr_state);
-   /* At this point all the VSX state is loaded, so enable it */
-   regs->msr |= MSR_VSX;
-   }
 }
 
 void altivec_unavailable_tm(struct pt_regs *regs)
 {
-   /*
-* Save the MSR now because tm_reclaim_current() is likely to
-* change it
-*/
-   unsigned long orig_msr = regs->msr;
-
/* See the comments in fp_unavailable_tm().  This function operates
 * the same way.
 */
@@ -1531,12 +1511,6 @@ void altivec_unavailable_tm(struct pt_regs *regs)
current->thread.load_vec = 1;
tm_recheckpoint(&current->thread);
current->thread.used_vr = 1;
-
-   if (orig_msr & MSR_FP) {
-   msr_check_and_set(MSR_FP);
-   load_fp_state(&current->thread.fp_state);
-   regs->msr |= MSR_VSX;
-   }
 }
 
 void vsx_unavailable_tm(struct pt_regs *regs)
@@ -1561,10 +1535,6 @@ void vsx_unavailable_tm(struct pt_regs *regs)
current->thread.load_fp = 1;
 
tm_recheckpoint(&current->thread);
-
-   msr_check_and_set(MSR_FP | MSR_VEC);
-   load_fp_state(&current->thread.fp_state);
-   load_vr_state(&current->thread.vr_state);
 }
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
-- 
2.14.3



[PATCH v2 3/4] powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint

2017-10-30 Thread Cyril Bur
Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

tm_reclaim() has optimisations to not always save the FP/Altivec
registers to the checkpointed save area. This was originally done
because the caller might have information that the checkpointed
registers aren't valid due to lazy save and restore. We've also been a
little vague as to how tm_reclaim() leaves the FP/Altivec state since it
doesn't necessarily always save it to the thread struct. This has led
to an (incorrect) assumption that it leaves the checkpointed state on
the CPU.

tm_recheckpoint() has similar optimisations in reverse. It may not
always reload the checkpointed FP/Altivec registers from the thread
struct before the trecheckpoint. It is therefore quite unclear where it
expects to get the state from. This didn't help with the assumption
made about tm_reclaim().

These optimisations sit in what is by definition a slow path. If a
process has to go through a reclaim/recheckpoint then its transaction
will be doomed on returning to userspace. This means that the process
will be unable to complete its transaction and be forced to its failure
handler. This is already an out of line case for userspace. Furthermore,
the cost of copying 64 times 128 bits from registers isn't very high[0]
(at all) on modern processors. As such it appears these optimisations
have only served to increase code complexity and are unlikely to have
had a measurable performance impact.

Our transactional memory handling has been riddled with bugs. A cause
of this has been difficulty in following the code flow, code complexity
has not been our friend here. It makes sense to remove these
optimisations in favour of a (hopefully) more stable implementation.

This patch does mean that sometimes the assembly will needlessly save
'junk' registers which will subsequently get overwritten with the
correct value by the C code which calls the assembly function. This
small inefficiency is far outweighed by the reduction in complexity for
general TM code, context switching paths, and transactional facility
unavailable exception handler.

0: I tried to measure it once for other work and found that it was
hiding in the noise of everything else I was working with. I find it
exceedingly likely this will be the case here.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
V2: Unchanged

 arch/powerpc/include/asm/tm.h   |  5 ++--
 arch/powerpc/kernel/process.c   | 22 ++-
 arch/powerpc/kernel/signal_32.c |  2 +-
 arch/powerpc/kernel/signal_64.c |  2 +-
 arch/powerpc/kernel/tm.S| 59 -
 arch/powerpc/kernel/traps.c | 26 +-
 6 files changed, 35 insertions(+), 81 deletions(-)

diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h
index 82e06ca3a49b..33d965911bec 100644
--- a/arch/powerpc/include/asm/tm.h
+++ b/arch/powerpc/include/asm/tm.h
@@ -11,10 +11,9 @@
 
 extern void tm_enable(void);
 extern void tm_reclaim(struct thread_struct *thread,
-  unsigned long orig_msr, uint8_t cause);
+  uint8_t cause);
 extern void tm_reclaim_current(uint8_t cause);
-extern void tm_recheckpoint(struct thread_struct *thread,
-   unsigned long orig_msr);
+extern void tm_recheckpoint(struct thread_struct *thread);
 extern void tm_abort(uint8_t cause);
 extern void tm_save_sprs(struct thread_struct *thread);
 extern void tm_restore_sprs(struct thread_struct *thread);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index cfa75e99dcfb..4b322ede6420 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -868,6 +868,8 @@ static void tm_reclaim_thread(struct thread_struct *thr,
 
giveup_all(container_of(thr, struct task_struct, thread));
 
+   tm_reclaim(thr, cause);
+
/*
 * If we are in a transaction and FP is off then we can't have
 * used FP inside that transaction. Hence the checkpointed
@@ -886,8 +888,6 @@ static void tm_reclaim_thread(struct thread_struct *thr,
if ((thr->ckpt_regs.msr & MSR_VEC) == 0)
memcpy(&thr->ckvr_state, &thr->vr_state,
   sizeof(struct thread_vr_state));
-
-   tm_reclaim(thr, thr->ckpt_regs.msr, cause);
 }
 
 void tm_reclaim_current(uint8_t cause)
@@ -936,11 +936,9 @@ static inline void tm_reclaim_task(struct task_struct *tsk)
tm_save_sprs(thr);
 }
 
-extern void __tm_recheckpoint(struct thread_struct *thread,

[PATCH v2 1/4] powerpc: Don't enable FP/Altivec if not checkpointed

2017-10-30 Thread Cyril Bur
Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

Lazy save and restore of FP/Altivec cannot be done if a process is
transactional. If a facility was enabled it must remain enabled whenever
a thread is transactional.

Commit dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware
transactional memory in use") ensures that the facilities are always
enabled if a thread is transactional. A bug in the introduced code may
cause it to inadvertently enable a facility that was (and should remain)
disabled. The problem with this extraneous enablement is that the
registers for the erroneously enabled facility have not been correctly
recheckpointed - the recheckpointing code assumed the facility would
remain disabled.

Further compounding the issue, the transactional {fp,altivec,vsx}
unavailable code has been incorrectly using the MSR to enable
facilities. The presence of the {FP,VEC,VSX} bit in the regs->msr simply
means the registers are live on the CPU, not that the kernel should
load them before returning to userspace. This has worked due to the bug
mentioned above.

This causes transactional threads which return to their failure handler
to observe incorrect checkpointed registers. Perhaps an example will
help illustrate the problem:

A userspace process is running and uses both FP and Altivec registers.
This process then continues to run for some time without touching
either set of registers. The kernel subsequently disables the
facilities as part of lazy save and restore. The userspace process then
performs a tbegin and the CPU checkpoints 'junk' FP and Altivec
registers. The process then performs a floating point instruction
triggering a fp unavailable exception in the kernel.

The kernel then loads the FP registers - and only the FP registers.
Since the thread is transactional it must perform a reclaim and
recheckpoint to ensure both the checkpointed registers and the
transactional registers are correct. It then (correctly) enables
MSR[FP] for the process. Later (on exception exit) the kernel also
(inadvertently) enables MSR[VEC]. The process is then returned to
userspace.

Since the act of loading the FP registers doomed the transaction we know
the CPU will fail the transaction, restore its checkpointed registers, and
return the process to its failure handler. The problem is that we're
now running with Altivec enabled and the 'junk' checkpointed registers
are restored. The kernel had only recheckpointed FP.

This patch solves this by only activating FP/Altivec if userspace was
using them when it entered the kernel and not simply if the process is
transactional.

Fixes: dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware
transactional memory in use")

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
V2: Rather than incorrectly using the MSR to enable {FP,VEC,VSX} use
the load_fp and load_vec booleans to help restore_math() make the
correct decision

 arch/powerpc/kernel/process.c | 17 +++--
 arch/powerpc/kernel/traps.c   |  8 
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a0c74bbf3454..ebb5b58a4138 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -230,9 +230,15 @@ void enable_kernel_fp(void)
 }
 EXPORT_SYMBOL(enable_kernel_fp);
 
+static bool tm_active_with_fp(struct task_struct *tsk)
+{
+   return msr_tm_active(tsk->thread.regs->msr) &&
+   (tsk->thread.ckpt_regs.msr & MSR_FP);
+}
+
 static int restore_fp(struct task_struct *tsk)
 {
-   if (tsk->thread.load_fp || msr_tm_active(tsk->thread.regs->msr)) {
+   if (tsk->thread.load_fp || tm_active_with_fp(tsk)) {
load_fp_state(&current->thread.fp_state);
current->thread.load_fp++;
return 1;
@@ -311,10 +317,17 @@ void flush_altivec_to_thread(struct task_struct *tsk)
 }
 EXPORT_SYMBOL_GPL(flush_altivec_to_thread);
 
+static bool tm_active_with_altivec(struct task_struct *tsk)
+{
+   return msr_tm_active(tsk->thread.regs->msr) &&
+   (tsk->thread.ckpt_regs.msr & MSR_VEC);
+}
+
+
 static int restore_altivec(struct task_struct *tsk)
 {
if (cpu_has_feature(CPU_FTR_ALTIVEC) &&
-   (tsk->thread.load_vec || msr_tm_active(tsk->thread.regs->msr))) {
+   (tsk->thread.load_vec || tm_active_with_altivec(tsk))) {

Re: [PATCH v4 00/10] Allow opal-async waiters to get interrupted

2017-10-30 Thread Cyril Bur
On Mon, 2017-10-30 at 10:15 +0100, Boris Brezillon wrote:
> On Tue, 10 Oct 2017 14:32:52 +1100
> Cyril Bur <cyril...@gmail.com> wrote:
> 
> > V4: Rework and rethink.
> > 
> > To recap:
> > Userspace MTD read()s/write()s and erases to powernv_flash become
> > calls into the OPAL firmware which subsequently handles flash access.
> > Because the read()s, write()s or erases can be large (bounded of
> > course my the size of flash) OPAL may take some time to service the
> > request, this causes the powernv_flash driver to sit in a wait_event()
> > for potentially minutes. This causes two problems, firstly, tools
> > appear to hang for the entire time as they cannot be interrupted by
> > signals and secondly, this can trigger hung task warnings. The correct
> > solution is to use wait_event_interruptible() which my rework (as part
> > of this series) of the opal-async infrastructure provides.
> > 
> > The final patch in this series achieves this. It should eliminate both
> > hung tasks and threads locking up.
> > 
> > Included in this series are other simpler fixes for powernv_flash:
> > 
> > Don't always return EIO on error. OPAL does mutual exclusion on the
> > flash and also knows when the service processor takes control of the
> > flash, in both of these cases it will return OPAL_BUSY, translating
> > this to EIO is misleading to userspace.
> > 
> > Handle receiving OPAL_SUCCESS when it expects OPAL_ASYNC_COMPLETION
> > and don't treat it as an error. Unfortunately there are too many drivers
> > out there with the incorrect behaviour so this means OPAL can never
> > return anything but OPAL_ASYNC_COMPLETION, this shouldn't prevent the
> > code from being correct.
> > 
> > Don't return ERESTARTSYS if token acquisition is interrupted as
> > powernv_flash can't be sure it hasn't already performed some work, let
> > userspace deal with the problem.
> > 
> > Change the incorrect use of BUG_ON() to WARN_ON() in powernv_flash.
> > 
> > Not for powernv_flash, a fix from Stewart Smith which fits into this
> > series as it relies on my improvements to the opal-async
> > infrastructure.
> > 
> > V3: export opal_error_code() so that powernv_flash can be built=m
> > 
> > Hello,
> > 
> > Version one of this series ignored that OPAL may continue to use
> > buffers passed to it after Linux kfree()s the buffer. This version
> > addresses this, not in a particularly nice way - future work could
> > make this better. This version also includes a few cleanups and fixups
> > to powernv_flash driver one along the course of this work that I
> > thought I would just send.
> > 
> > The problem we're trying to solve here is that currently all users of
> > the opal-async calls must use wait_event(), this may be undesirable
> > when there is a userspace process behind the request for the opal
> > call, if OPAL takes too long to complete the call then hung task
> > warnings will appear.
> > 
> > In order to solve the problem callers should use
> > wait_event_interruptible(), due to the interruptible nature of this
> > call the opal-async infrastructure needs to track extra state
> > associated with each async token, this is prepared for in patch 6/10.
> > 
> > While I was working on the opal-async infrastructure improvements
> > Stewart fixed another problem and he relies on the corrected behaviour
> > of opal-async so I've sent it here.
> > 
> > Hello MTD folk, traditionally Michael Ellerman takes powernv_flash
> > driver patches through the powerpc tree, as always your feedback is
> > very welcome.
> 
> Just gave my acks on patches 1 to 4 and patch 10 (with minor comments
> on patch 3 and 10). Feel free to take the patches directly through the
> powerpc tree.
> 

Hi Boris, thanks very much for the acks. 

All good points - I'll fix that up in a v2

Thanks again,

Cyril

> > 
> > Thanks,
> > 
> > Cyril
> > 
> > Cyril Bur (9):
> >   mtd: powernv_flash: Use WARN_ON_ONCE() rather than BUG_ON()
> >   mtd: powernv_flash: Don't treat OPAL_SUCCESS as an error
> >   mtd: powernv_flash: Remove pointless goto in driver init
> >   mtd: powernv_flash: Don't return -ERESTARTSYS on interrupted token
> > acquisition
> >   powerpc/opal: Make __opal_async_{get,release}_token() static
> >   powerpc/opal: Rework the opal-async interface
> >   powerpc/opal: Add opal_async_wait_response_interruptible() to
> > opal-async
> >   powerpc/powernv: Add OPAL_BUSY to opal_error_code()
> >   mtd: powernv_flash: Use opal_async_wait_response_interruptible()
> > 
> > Stewart Smith (1):
> >   powernv/opal-sensor: remove not needed lock
> > 
> >  arch/powerpc/include/asm/opal.h  |   4 +-
> >  arch/powerpc/platforms/powernv/opal-async.c  | 183 
> > +++
> >  arch/powerpc/platforms/powernv/opal-sensor.c |  17 +--
> >  arch/powerpc/platforms/powernv/opal.c|   2 +
> >  drivers/mtd/devices/powernv_flash.c  |  83 +++-
> >  5 files changed, 194 insertions(+), 95 deletions(-)
> > 
> 
> 


[PATCH 2/2] powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint

2017-10-29 Thread Cyril Bur
Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

tm_reclaim() has optimisations to not always save the FP/Altivec
registers to the checkpointed save area. This was originally done
because the caller might have information that the checkpointed
registers aren't valid due to lazy save and restore. We've also been a
little vague as to how tm_reclaim() leaves the FP/Altivec state since it
doesn't necessarily always save it to the thread struct. This has led
to an (incorrect) assumption that it leaves the checkpointed state on
the CPU.

tm_recheckpoint() has similar optimisations in reverse. It may not
always reload the checkpointed FP/Altivec registers from the thread
struct before the trecheckpoint. It is therefore quite unclear where it
expects to get the state from. This didn't help with the assumption
made about tm_reclaim().

These optimisations sit in what is by definition a slow path. If a
process has to go through a reclaim/recheckpoint then its transaction
will be doomed on returning to userspace. This means that the process
will be unable to complete its transaction and be forced to its failure
handler. This is already an out of line case for userspace. Furthermore,
the cost of copying 64 times 128 bits from registers isn't very high[0]
(at all) on modern processors. As such it appears these optimisations
have only served to increase code complexity and are unlikely to have
had a measurable performance impact.

Our transactional memory handling has been riddled with bugs. A cause
of this has been difficulty in following the code flow, code complexity
has not been our friend here. It makes sense to remove these
optimisations in favour of a (hopefully) more stable implementation.

This patch does mean that sometimes the assembly will needlessly save
'junk' registers which will subsequently get overwritten with the
correct value by the C code which calls the assembly function. This
small inefficiency is far outweighed by the reduction in complexity for
general TM code, context switching paths, and transactional facility
unavailable exception handler.

0: I tried to measure it once for other work and found that it was
hiding in the noise of everything else I was working with. I find it
exceedingly likely this will be the case here.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/include/asm/tm.h   |  5 ++--
 arch/powerpc/kernel/process.c   | 26 +++---
 arch/powerpc/kernel/signal_32.c |  2 +-
 arch/powerpc/kernel/signal_64.c |  2 +-
 arch/powerpc/kernel/tm.S| 59 -
 arch/powerpc/kernel/traps.c | 23 +---
 6 files changed, 37 insertions(+), 80 deletions(-)

diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h
index 82e06ca3a49b..33d965911bec 100644
--- a/arch/powerpc/include/asm/tm.h
+++ b/arch/powerpc/include/asm/tm.h
@@ -11,10 +11,9 @@
 
 extern void tm_enable(void);
 extern void tm_reclaim(struct thread_struct *thread,
-  unsigned long orig_msr, uint8_t cause);
+  uint8_t cause);
 extern void tm_reclaim_current(uint8_t cause);
-extern void tm_recheckpoint(struct thread_struct *thread,
-   unsigned long orig_msr);
+extern void tm_recheckpoint(struct thread_struct *thread);
 extern void tm_abort(uint8_t cause);
 extern void tm_save_sprs(struct thread_struct *thread);
 extern void tm_restore_sprs(struct thread_struct *thread);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index da900cd86324..fc9b88ccc2a7 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -866,6 +866,10 @@ static void tm_reclaim_thread(struct thread_struct *thr,
if (!MSR_TM_SUSPENDED(mfmsr()))
return;
 
+   giveup_all(container_of(thr, struct task_struct, thread));
+
+   tm_reclaim(thr, cause);
+
/*
 * If we are in a transaction and FP is off then we can't have
 * used FP inside that transaction. Hence the checkpointed
@@ -884,10 +888,6 @@ static void tm_reclaim_thread(struct thread_struct *thr,
if ((thr->ckpt_regs.msr & MSR_VEC) == 0)
memcpy(&thr->ckvr_state, &thr->vr_state,
   sizeof(struct thread_vr_state));
-
-   giveup_all(container_of(thr, struct task_struct, thread));
-
-   tm_reclaim(thr, thr->ckpt_regs.msr, cause);
 }
 
 void tm_reclaim_current(uint8_t cause)
@@ -936,11 +936,9 @@ static inline void tm_reclaim_task(struct task_struct *tsk)

[PATCH 1/2] powerpc: Don't enable FP/Altivec if not checkpointed

2017-10-29 Thread Cyril Bur
Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

Lazy save and restore of FP/Altivec cannot be done if a process is
transactional. If a facility was enabled it must remain enabled whenever
a thread is transactional.

Commit dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware
transactional memory in use") ensures that the facilities are always
enabled if a thread is transactional. A bug in the introduced code may
cause it to inadvertently enable a facility that was (and should remain)
disabled.  The problem with this extraneous enablement is that the
registers for the erroneously enabled facility have not been correctly
recheckpointed - the recheckpointing code assumed the facility would
remain disabled.

This causes transactional threads which return to their failure handler
to observe incorrect checkpointed registers. Perhaps an example will
help illustrate the problem:

A userspace process is running and uses both FP and Altivec registers.
This process then continues to run for some time without touching
either set of registers. The kernel subsequently disables the
facilities as part of lazy save and restore. The userspace process then
performs a tbegin and the CPU checkpoints 'junk' FP and Altivec
registers. The process then performs a floating point instruction
triggering a fp unavailable exception in the kernel.

The kernel then loads the FP registers - and only the FP registers.
Since the thread is transactional it must perform a reclaim and
recheckpoint to ensure both the checkpointed registers and the
transactional registers are correct.  It then (correctly) enables
MSR[FP] for the process. Later (on exception exit) the kernel also
(inadvertently) enables MSR[VEC]. The process is then returned to
userspace.

Since the act of loading the FP registers doomed the transaction we know
the CPU will fail the transaction, restore its checkpointed registers, and
return the process to its failure handler. The problem is that we're
now running with Altivec enabled and the 'junk' checkpointed registers
are restored. The kernel had only recheckpointed FP.

This patch solves this by only activating FP/Altivec if userspace was
using them when it entered the kernel and not simply if the process is
transactional.

Fixes: dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware
transactional memory in use")

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/kernel/process.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a0c74bbf3454..da900cd86324 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -230,9 +230,15 @@ void enable_kernel_fp(void)
 }
 EXPORT_SYMBOL(enable_kernel_fp);
 
+static int is_transactionally_fp(struct task_struct *tsk)
+{
+   return msr_tm_active(tsk->thread.regs->msr) &&
+   (tsk->thread.ckpt_regs.msr & MSR_FP);
+}
+
 static int restore_fp(struct task_struct *tsk)
 {
-   if (tsk->thread.load_fp || msr_tm_active(tsk->thread.regs->msr)) {
+   if (tsk->thread.load_fp || is_transactionally_fp(tsk)) {
load_fp_state(&current->thread.fp_state);
current->thread.load_fp++;
return 1;
@@ -311,10 +317,17 @@ void flush_altivec_to_thread(struct task_struct *tsk)
 }
 EXPORT_SYMBOL_GPL(flush_altivec_to_thread);
 
+static int is_transactionally_altivec(struct task_struct *tsk)
+{
+   return msr_tm_active(tsk->thread.regs->msr) &&
+   (tsk->thread.ckpt_regs.msr & MSR_VEC);
+}
+
+
 static int restore_altivec(struct task_struct *tsk)
 {
if (cpu_has_feature(CPU_FTR_ALTIVEC) &&
-   (tsk->thread.load_vec || msr_tm_active(tsk->thread.regs->msr))) {
+   (tsk->thread.load_vec || is_transactionally_altivec(tsk))) {
load_vr_state(&tsk->thread.vr_state);
tsk->thread.used_vr = 1;
tsk->thread.load_vec++;
-- 
2.14.3



Re: [PATCH] powerpc/tm: fix live state of vs0/32 in tm_reclaim

2017-10-25 Thread Cyril Bur
On Wed, 2017-07-05 at 11:02 +1000, Michael Neuling wrote:
> On Tue, 2017-07-04 at 16:45 -0400, Gustavo Romero wrote:
> > Currently tm_reclaim() can return with a corrupted vs0 (fp0) or vs32 (v0)
> > due to the fact vs0 is used to save FPSCR and vs32 is used to save VSCR.
> 

Hi Mikey,

This completely fell off my radar, we do need something merged!

For what its worth I like the original patch.

> tm_reclaim() should have no state live in the registers once it returns.  It
> should all be saved in the thread struct. The above is not an issue in my 
> book.
> 

Yeah, this is something I agree with, however, if that is the case then
why have tm_recheckpoint() do partial reloads?

A partial reload only makes sense if we can be sure that reclaim will
have left the state at least (partially) correct - not with (as is the
case today) one corrupted fp or Altivec reg.

> Having a quick look at the code, I think there's an issue but we need 
> something
> more like this (completely untested).
> 
> When we recheckpoint inside an fp unavail, we need to recheckpoint vec if it 
> was
> enabled.  Currently we only ever recheckpoint the FP which seems like a bug. 
> Vice versa for the other way around.
> 

In your example, we don't need to reload VEC if we can trust that
reclaim left the checkpointed regs on the CPU correctly - this patch
achieves this.

Of course I'm more than happy to reduce complexity and not have this
optimisation at all, but then we should remove the entire parameter to
tm_recheckpoint(). Any in between feels dangerous.
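
To make the alternative concrete, dropping the optimisation entirely
would mean a prototype along these lines (hypothetical sketch, not a
posted patch):

	/*
	 * tm_reclaim() saves all checkpointed state into the thread
	 * struct; recheckpoint then restores all of it, so there are
	 * no partial reloads and no MSR parameter to get wrong.
	 */
	extern void tm_recheckpoint(struct thread_struct *thread);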


Cyril


> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index d4e545d27e..d1184264e2 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -1589,7 +1589,7 @@ void fp_unavailable_tm(struct pt_regs *regs)
>  * If VMX is in use, the VRs now hold checkpointed values,
>  * so we don't want to load the VRs from the thread_struct.
>  */
> -   tm_recheckpoint(&current->thread, MSR_FP);
> +   tm_recheckpoint(&current->thread, regs->msr);
>  
> /* If VMX is in use, get the transactional values back */
> if (regs->msr & MSR_VEC) {
> @@ -1611,7 +1611,7 @@ void altivec_unavailable_tm(struct pt_regs *regs)
>  regs->nip, regs->msr);
> tm_reclaim_current(TM_CAUSE_FAC_UNAV);
> regs->msr |= MSR_VEC;
> -   tm_recheckpoint(&current->thread, MSR_VEC);
> +   tm_recheckpoint(&current->thread, regs->msr);
> current->thread.used_vr = 1;
>  
> if (regs->msr & MSR_FP) {
> 
> 
> > Later, we recheckpoint trusting that the live state of FP and VEC are ok
> > depending on the MSR.FP and MSR.VEC bits, i.e. if MSR.FP is enabled that
> > means the FP registers checkpointed when we entered in TM are correct and
> > after a treclaim, we can trust the FP live state. Similarly for VEC regs.
> > However if tm_reclaim() does not return a sane state then tm_recheckpoint()
> > will recheckpoint a corrupted state from live state back to the checkpoint
> > area.
> 
> 
> 
> 
> > That commit fixes the corruption by restoring vs0 and vs32 from the
> > ckfp_state and ckvr_state after they are used to save FPSCR and VSCR,
> > respectively.
> > 
> > The effect of the issue described above is observed, for instance, once a
> > VSX unavailable exception is caught in the middle of a transaction with
> > MSR.FP = 1 or MSR.VEC = 1. If MSR.FP = 1, then after getting back to user
> > space FP state is corrupted. If MSR.VEC = 1, then VEC state is corrupted.
> > 
> > The issue does not occur if MSR.FP = 0 and MSR.VEC = 0 because ckfp_state
> > and ckvr_state are both copied from fp_state and vr_state, respectively,
> > and on recheckpointing both states will be restored from these thread
> > structures and not from the live state.
> > 
> > The issue does not occur also if MSR.FP = 1 and MSR.VEC = 1 because it
> > implies MSR.VSX = 1 and in that case the VSX unavailable exception does not
> > happen in the middle of the transactional block.
> > 
> > Finally, that commit also fixes the MSR used to check if FP and VEC bits
> > are enabled once we are in tm_reclaim_thread(). ckpt_regs.msr is valid only
> > if giveup_all() is called *before* using ckpt_regs.msr for checks because
> > check_if_tm_restore_required() in giveup_all() will copy regs->msr to
> > ckpt_regs.msr and so ckpt_regs.msr reflects exactly the MSR that the thread
> > had when it came off the processor.
> > 
> > No regression was observed on powerpc/tm selftests after this fix.
> > 
> > Signed-off-by: Gustavo Romero 
> > Signed-off-by: Breno Leitao 
> > ---
> >  arch/powerpc/kernel/process.c |  9 +++--
> >  arch/powerpc/kernel/tm.S  | 14 ++
> >  2 files changed, 21 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index 2ad725e..ac1fc51 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c

Re: [PATCH] powerpc/tm: Set ckpt_regs.msr before using it.

2017-10-24 Thread Cyril Bur
On Tue, 2017-10-24 at 15:13 -0200, Breno Leitao wrote:
> From: Breno Leitao <breno.lei...@debian.org>
> 
> On commit f48e91e87e67 ("powerpc/tm: Fix FP and VMX register
> corruption"), we check ckpt_regs.msr to see if a feature (as VEC, VSX
> and FP) is disabled (thus the hot registers might be bogus during the
> reclaim), and then copy the previously saved thread registers, with the
> non-bogus values, into the checkpoint area for a later trecheckpoint.
> This mechanism is used to recheckpoint the proper register values when
> a transaction started using the bogus registers, and these values were
> sent to the memory checkpoint area.
> 
> I see a problem in this code: ckpt_regs.msr is not properly set when
> it is used. For example, when there is a vsx_unavailable_tm() in code
> like the following, ckpt_regs.msr[FP] is 0:
> 
>  1: sleep_until_{fp,vec,vsx} = 0
>  2: fadd
>  3: tbegin.
>  4: beq
>  5: xxmrghd
>  6: tend.
> 
> In this case, line 5 will raise a vsx_unavailable_tm() exception, and
> the ckpt_regs.msr[FP] will be zero before the memcpy() block, executing
> the memcpy() even with the FP registers hot. That is not correct because
> we executed a floating point instruction on line 2, and MSR[FP] was set
> to 1.
> 
> Fortunately this does not cause a big problem as far as I can see, other
> than this extra memcpy(), because treclaim() will later overwrite the
> wrongly copied value, since it relies on the correct MSR value, which was
> updated by giveup_all->check_if_tm_restore_required. There might be a
> problem when laziness is being turned on, but I was not able to
> reproduce it.

I believe this analysis is correct; I have come to the same conclusion
in the past. I've also done a bunch of testing with variants of this
patch and haven't seen a difference, however, I do believe the code is
more correct with this patch.

Signed-off-by: Cyril Bur <cyril...@gmail.com>

Having said all that, nothing rules out that our tests simply aren't
good enough ;)

 
> 
> The solution I am proposing is updating ckpt_regs.msr before using it.
> 
> Signed-off-by: Breno Leitao <lei...@debian.org>
> Signed-off-by: Gustavo Romero <gusbrom...@gmail.com>
> CC: Cyril Bur <cyril...@gmail.com>
> CC: Michael Neuling <mi...@neuling.org>
> ---
>  arch/powerpc/kernel/process.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index c051dc2b42ad..773e9c5594e7 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -860,6 +860,9 @@ static void tm_reclaim_thread(struct thread_struct *thr,
>   if (!MSR_TM_SUSPENDED(mfmsr()))
>   return;
>  
> + /* Give up all the registers and set ckpt_regs.msr */
> + giveup_all(container_of(thr, struct task_struct, thread));
> +
>   /*
>* If we are in a transaction and FP is off then we can't have
>* used FP inside that transaction. Hence the checkpointed
> @@ -879,8 +882,6 @@ static void tm_reclaim_thread(struct thread_struct *thr,
>   memcpy(&thr->ckvr_state, &thr->vr_state,
>  sizeof(struct thread_vr_state));
>  
> - giveup_all(container_of(thr, struct task_struct, thread));
> -
>   tm_reclaim(thr, thr->ckpt_regs.msr, cause);
>  }
>  


Re: [PATCH v3 3/3] powerpc:selftest update memcmp_64 selftest for VMX implementation

2017-10-15 Thread Cyril Bur
On Fri, 2017-10-13 at 12:30 +0800, wei.guo.si...@gmail.com wrote:
> From: Simon Guo 
> 
> This patch adjusts the memcmp_64 selftest so that the memcmp selftest
> can be compiled successfully.
> 

Do they not compile at the moment?

> It also adds testcases for:
> - memcmp over 4K bytes in size.
> - s1/s2 with different/random offsets on a 16-byte boundary.
> - enter/exit_vmx_ops pairing.
> 

This is a great idea. Just a thought though: perhaps it might make
more sense to have each condition be tested for in a separate binary
rather than a single binary that tests everything.
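
Something along these lines in the stringloops Makefile is what I have
in mind (a hypothetical split; the target names here are invented):

	# one binary per condition instead of one binary testing everything
	TEST_GEN_PROGS := memcmp_base memcmp_large memcmp_offset memcmp_vmx_pair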

> Signed-off-by: Simon Guo 
> ---
>  .../selftests/powerpc/copyloops/asm/ppc_asm.h  |  4 +-
>  .../selftests/powerpc/stringloops/asm/ppc_asm.h| 22 +
>  .../testing/selftests/powerpc/stringloops/memcmp.c | 98 +-
>  3 files changed, 100 insertions(+), 24 deletions(-)
> 
> diff --git a/tools/testing/selftests/powerpc/copyloops/asm/ppc_asm.h 
> b/tools/testing/selftests/powerpc/copyloops/asm/ppc_asm.h
> index 80d34a9..51bf6fa 100644
> --- a/tools/testing/selftests/powerpc/copyloops/asm/ppc_asm.h
> +++ b/tools/testing/selftests/powerpc/copyloops/asm/ppc_asm.h
> @@ -35,11 +35,11 @@
>   li  r3,0
>   blr
>  
> -FUNC_START(enter_vmx_copy)
> +FUNC_START(enter_vmx_ops)
>   li  r3,1
>   blr
>  
> -FUNC_START(exit_vmx_copy)
> +FUNC_START(exit_vmx_ops)
>   blr
>  
>  FUNC_START(memcpy_power7)
> diff --git a/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h 
> b/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h
> index 11bece8..3326992 100644
> --- a/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h
> +++ b/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h
> @@ -1,3 +1,5 @@
> +#ifndef _PPC_ASM_H
> +#define _PPC_ASM_H
>  #include 
>  
>  #ifndef r1
> @@ -5,3 +7,23 @@
>  #endif
>  
>  #define _GLOBAL(A) FUNC_START(test_ ## A)
> +
> +#define CONFIG_ALTIVEC
> +
> +#define R14 r14
> +#define R15 r15
> +#define R16 r16
> +#define R17 r17
> +#define R18 r18
> +#define R19 r19
> +#define R20 r20
> +#define R21 r21
> +#define R22 r22
> +#define R29 r29
> +#define R30 r30
> +#define R31 r31
> +
> +#define STACKFRAMESIZE   256
> +#define STK_REG(i)   (112 + ((i)-14)*8)
> +
> +#endif
> diff --git a/tools/testing/selftests/powerpc/stringloops/memcmp.c 
> b/tools/testing/selftests/powerpc/stringloops/memcmp.c
> index 30b1222..f5225f6 100644
> --- a/tools/testing/selftests/powerpc/stringloops/memcmp.c
> +++ b/tools/testing/selftests/powerpc/stringloops/memcmp.c
> @@ -1,20 +1,40 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "utils.h"
>  
>  #define SIZE 256
>  #define ITERATIONS 1
>  
> +#define LARGE_SIZE (5 * 1024)
> +#define LARGE_ITERATIONS 1000
> +#define LARGE_MAX_OFFSET 32
> +#define LARGE_SIZE_START 4096
> +
> +#define MAX_OFFSET_DIFF_S1_S2 48
> +
> +int vmx_count;
> +int enter_vmx_ops(void)
> +{
> + vmx_count++;
> + return 1;
> +}
> +
> +void exit_vmx_ops(void)
> +{
> + vmx_count--;
> +}
>  int test_memcmp(const void *s1, const void *s2, size_t n);
>  
>  /* test all offsets and lengths */
> -static void test_one(char *s1, char *s2)
> +static void test_one(char *s1, char *s2, unsigned long max_offset,
> + unsigned long size_start, unsigned long max_size)
>  {
>   unsigned long offset, size;
>  
> - for (offset = 0; offset < SIZE; offset++) {
> - for (size = 0; size < (SIZE-offset); size++) {
> + for (offset = 0; offset < max_offset; offset++) {
> + for (size = size_start; size < (max_size - offset); size++) {
>   int x, y;
>   unsigned long i;
>  
> @@ -34,70 +54,104 @@ static void test_one(char *s1, char *s2)
>   printf("\n");
>   abort();
>   }
> +
> + if (vmx_count != 0) {
> + printf("vmx enter/exit not paired.(offset:%ld 
> size:%ld s1:%p s2:%p vc:%d\n",
> + offset, size, s1, s2, vmx_count);
> + printf("\n");
> + abort();
> + }
>   }
>   }
>  }
>  
> -static int testcase(void)
> +static int testcase(bool islarge)
>  {
>   char *s1;
>   char *s2;
>   unsigned long i;
>  
> - s1 = memalign(128, SIZE);
> + unsigned long comp_size = (islarge ? LARGE_SIZE : SIZE);
> + unsigned long alloc_size = comp_size + MAX_OFFSET_DIFF_S1_S2;
> + int iterations = islarge ? LARGE_ITERATIONS : ITERATIONS;
> +
> + s1 = memalign(128, alloc_size);
>   if (!s1) {
>   perror("memalign");
>   exit(1);
>   }
>  
> - s2 = memalign(128, SIZE);
> + s2 = memalign(128, alloc_size);
>   if (!s2) {
>   perror("memalign");
>   exit(1);
>   }
>  
> - srandom(1);
> + 

[PATCH v4 05/10] powerpc/opal: Make __opal_async_{get, release}_token() static

2017-10-09 Thread Cyril Bur
There are no callers of __opal_async_get_token() or
__opal_async_release_token() outside of opal-async.c, so make them
static.

This patch also removes the possibility of an "emergency through
synchronous call to __opal_async_get_token()"; as such, it makes more
sense to initialise opal_async_sem for the maximum number of async
tokens.
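
With the internal helpers hidden, the only supported calling pattern
goes through the public API; roughly (a sketch of a hypothetical
caller, error handling trimmed):

	struct opal_msg msg;
	int token, rc;

	token = opal_async_get_token_interruptible();
	if (token < 0)
		return token;

	rc = opal_flash_erase(id, offset, len, token);	/* any async OPAL call */
	if (rc == OPAL_ASYNC_COMPLETION)
		rc = opal_async_wait_response(token, &msg);

	opal_async_release_token(token);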

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/include/asm/opal.h |  2 --
 arch/powerpc/platforms/powernv/opal-async.c | 10 +++---
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 726c23304a57..0078eb5acf98 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -304,9 +304,7 @@ extern void opal_notifier_enable(void);
 extern void opal_notifier_disable(void);
 extern void opal_notifier_update_evt(uint64_t evt_mask, uint64_t evt_val);
 
-extern int __opal_async_get_token(void);
 extern int opal_async_get_token_interruptible(void);
-extern int __opal_async_release_token(int token);
 extern int opal_async_release_token(int token);
 extern int opal_async_wait_response(uint64_t token, struct opal_msg *msg);
 extern int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data);
diff --git a/arch/powerpc/platforms/powernv/opal-async.c 
b/arch/powerpc/platforms/powernv/opal-async.c
index cf33769a7b72..c43421ab2d2f 100644
--- a/arch/powerpc/platforms/powernv/opal-async.c
+++ b/arch/powerpc/platforms/powernv/opal-async.c
@@ -33,7 +33,7 @@ static struct semaphore opal_async_sem;
 static struct opal_msg *opal_async_responses;
 static unsigned int opal_max_async_tokens;
 
-int __opal_async_get_token(void)
+static int __opal_async_get_token(void)
 {
unsigned long flags;
int token;
@@ -73,7 +73,7 @@ int opal_async_get_token_interruptible(void)
 }
 EXPORT_SYMBOL_GPL(opal_async_get_token_interruptible);
 
-int __opal_async_release_token(int token)
+static int __opal_async_release_token(int token)
 {
unsigned long flags;
 
@@ -199,11 +199,7 @@ int __init opal_async_comp_init(void)
goto out_opal_node;
}
 
-   /* Initialize to 1 less than the maximum tokens available, as we may
-* require to pop one during emergency through synchronous call to
-* __opal_async_get_token()
-*/
-   sema_init(&opal_async_sem, opal_max_async_tokens - 1);
+   sema_init(&opal_async_sem, opal_max_async_tokens);
 
 out_opal_node:
of_node_put(opal_node);
-- 
2.14.2



[PATCH v4 10/10] mtd: powernv_flash: Use opal_async_wait_response_interruptible()

2017-10-09 Thread Cyril Bur
The OPAL calls performed in this driver shouldn't be using
opal_async_wait_response() as this performs a wait_event() which, on
long running OPAL calls, could result in hung task warnings. wait_event()
also prevents timely signal delivery, which is undesirable.

This patch also attempts to quieten down the use of dev_err() when
errors haven't actually occurred, and to return better information up
the stack rather than always -EIO.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 drivers/mtd/devices/powernv_flash.c | 57 +++--
 1 file changed, 35 insertions(+), 22 deletions(-)

diff --git a/drivers/mtd/devices/powernv_flash.c 
b/drivers/mtd/devices/powernv_flash.c
index 3343d4f5c4f3..42383dbca5a6 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -1,7 +1,7 @@
 /*
  * OPAL PNOR flash MTD abstraction
  *
- * Copyright IBM 2015
+ * Copyright IBM 2015-2017
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -89,33 +89,46 @@ static int powernv_flash_async_op(struct mtd_info *mtd, 
enum flash_op op,
return -EIO;
}
 
-   if (rc == OPAL_SUCCESS)
-   goto out_success;
+   if (rc == OPAL_ASYNC_COMPLETION) {
+   rc = opal_async_wait_response_interruptible(token, &msg);
+   if (rc) {
+   /*
+* If we return the mtd core will free the
+* buffer we've just passed to OPAL but OPAL
+* will continue to read or write from that
+* memory.
+* It may be tempting to ultimately return 0
+* if we're doing a read or a write since we
+* are going to end up waiting until OPAL is
+* done. However, because the MTD core sends
+* us the userspace request in chunks, we need
+* to let it know we've been interrupted.
+*/
+   rc = -EINTR;
+   if (opal_async_wait_response(token, &msg))
+   dev_err(dev, "opal_async_wait_response() failed\n");
+   goto out;
+   }
+   rc = opal_get_async_rc(msg);
+   }
 
-   if (rc != OPAL_ASYNC_COMPLETION) {
+   /*
+* OPAL does mutual exclusion on the flash, it will return
+* OPAL_BUSY.
+* During firmware updates by the service processor OPAL may
+* be (temporarily) prevented from accessing the flash, in
+* this case OPAL will also return OPAL_BUSY.
+* Both cases aren't errors exactly but the flash could have
+* changed, userspace should be informed.
+*/
+   if (rc != OPAL_SUCCESS && rc != OPAL_BUSY)
dev_err(dev, "opal_flash_async_op(op=%d) failed (rc %d)\n",
op, rc);
-   rc = -EIO;
-   goto out;
-   }
 
-   rc = opal_async_wait_response(token, &msg);
-   if (rc) {
-   dev_err(dev, "opal async wait failed (rc %d)\n", rc);
-   rc = -EIO;
-   goto out;
-   }
-
-   rc = opal_get_async_rc(msg);
-out_success:
-   if (rc == OPAL_SUCCESS) {
-   rc = 0;
-   if (retlen)
+   if (rc == OPAL_SUCCESS && retlen)
*retlen = len;
-   } else {
-   rc = -EIO;
-   }
 
+   rc = opal_error_code(rc);
 out:
opal_async_release_token(token);
return rc;
-- 
2.14.2



[PATCH v4 06/10] powerpc/opal: Rework the opal-async interface

2017-10-09 Thread Cyril Bur
Future work will add an opal_async_wait_response_interruptible()
which will call wait_event_interruptible(). This work requires extra
token state to be tracked as wait_event_interruptible() can return and
the caller could release the token before OPAL responds.

Currently token state is tracked with two bitfields which are 64 bits
big, but they may not need to be that big as OPAL informs Linux how many
async tokens there are. The code also uses an array indexed by token to
store the response message for each token.

The bitfields make it difficult to add more state and also provide a
hard maximum as to how many tokens there can be - it is possible that
OPAL will inform Linux that there are more than 64 tokens.

Rather than add a bitfield to track the extra state, rework the
internals slightly.
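
The resulting per-token lifecycle, sketched as comments (the
DISPATCHED and ABANDONED states arrive in a later patch in this
series):

	/*
	 * UNALLOCATED --__opal_async_get_token()------------> ALLOCATED
	 * ALLOCATED   --OPAL response arrives---------------> COMPLETED
	 * ALLOCATED/COMPLETED --__opal_async_release_token()-> UNALLOCATED
	 */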

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/platforms/powernv/opal-async.c | 92 -
 1 file changed, 50 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-async.c 
b/arch/powerpc/platforms/powernv/opal-async.c
index c43421ab2d2f..fbae8a37ce2c 100644
--- a/arch/powerpc/platforms/powernv/opal-async.c
+++ b/arch/powerpc/platforms/powernv/opal-async.c
@@ -1,7 +1,7 @@
 /*
  * PowerNV OPAL asynchronous completion interfaces
  *
- * Copyright 2013 IBM Corp.
+ * Copyright 2013-2017 IBM Corp.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -23,40 +23,45 @@
 #include 
 #include 
 
-#define N_ASYNC_COMPLETIONS	64
+enum opal_async_token_state {
+   ASYNC_TOKEN_UNALLOCATED = 0,
+   ASYNC_TOKEN_ALLOCATED,
+   ASYNC_TOKEN_COMPLETED
+};
+
+struct opal_async_token {
+   enum opal_async_token_state state;
+   struct opal_msg response;
+};
 
-static DECLARE_BITMAP(opal_async_complete_map, N_ASYNC_COMPLETIONS) = {~0UL};
-static DECLARE_BITMAP(opal_async_token_map, N_ASYNC_COMPLETIONS);
 static DECLARE_WAIT_QUEUE_HEAD(opal_async_wait);
 static DEFINE_SPINLOCK(opal_async_comp_lock);
 static struct semaphore opal_async_sem;
-static struct opal_msg *opal_async_responses;
 static unsigned int opal_max_async_tokens;
+static struct opal_async_token *opal_async_tokens;
 
 static int __opal_async_get_token(void)
 {
unsigned long flags;
-   int token;
+   int token = -EBUSY;
 
	spin_lock_irqsave(&opal_async_comp_lock, flags);
-   token = find_first_bit(opal_async_complete_map, opal_max_async_tokens);
-   if (token >= opal_max_async_tokens) {
-   token = -EBUSY;
-   goto out;
+   for (token = 0; token < opal_max_async_tokens; token++) {
+   if (opal_async_tokens[token].state == ASYNC_TOKEN_UNALLOCATED) {
+   opal_async_tokens[token].state = ASYNC_TOKEN_ALLOCATED;
+   goto out;
+   }
}
-
-   if (__test_and_set_bit(token, opal_async_token_map)) {
-   token = -EBUSY;
-   goto out;
-   }
-
-   __clear_bit(token, opal_async_complete_map);
-
 out:
	spin_unlock_irqrestore(&opal_async_comp_lock, flags);
return token;
 }
 
+/*
+ * Note: If the returned token is used in an opal call and opal returns
+ * OPAL_ASYNC_COMPLETION you MUST call opal_async_wait_response() before
+ * calling any other opal_async_* function
+ */
 int opal_async_get_token_interruptible(void)
 {
int token;
@@ -76,6 +81,7 @@ EXPORT_SYMBOL_GPL(opal_async_get_token_interruptible);
 static int __opal_async_release_token(int token)
 {
unsigned long flags;
+   int rc;
 
if (token < 0 || token >= opal_max_async_tokens) {
pr_err("%s: Passed token is out of range, token %d\n",
@@ -84,11 +90,18 @@ static int __opal_async_release_token(int token)
}
 
	spin_lock_irqsave(&opal_async_comp_lock, flags);
-   __set_bit(token, opal_async_complete_map);
-   __clear_bit(token, opal_async_token_map);
+   switch (opal_async_tokens[token].state) {
+   case ASYNC_TOKEN_COMPLETED:
+   case ASYNC_TOKEN_ALLOCATED:
+   opal_async_tokens[token].state = ASYNC_TOKEN_UNALLOCATED;
+   rc = 0;
+   break;
+   default:
+   rc = 1;
+   }
	spin_unlock_irqrestore(&opal_async_comp_lock, flags);
 
-   return 0;
+   return rc;
 }
 
 int opal_async_release_token(int token)
@@ -96,12 +109,10 @@ int opal_async_release_token(int token)
int ret;
 
ret = __opal_async_release_token(token);
-   if (ret)
-   return ret;
-
-   up(&opal_async_sem);
+   if (!ret)
+   up(&opal_async_sem);
 
-   return 0;
+   return ret;
 }
 EXPORT_SYMBOL_GPL(opal_async_release_token);
 
@@ -122,13 +133,15 @@ int opal_async_wait_response(uint64_t token, struct 
opal_msg *msg)
 * functional.
 */
opal_wake_poller();
-   wait_event(opal_async_wait, test_bit(token, opal_async_comple

[PATCH v4 09/10] powerpc/powernv: Add OPAL_BUSY to opal_error_code()

2017-10-09 Thread Cyril Bur
Also export opal_error_code() so that it can be used in modules

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/platforms/powernv/opal.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/opal.c 
b/arch/powerpc/platforms/powernv/opal.c
index 65c79ecf5a4d..041ddbd1fc57 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -998,6 +998,7 @@ int opal_error_code(int rc)
 
case OPAL_PARAMETER:return -EINVAL;
case OPAL_ASYNC_COMPLETION: return -EINPROGRESS;
+   case OPAL_BUSY:
case OPAL_BUSY_EVENT:   return -EBUSY;
case OPAL_NO_MEM:   return -ENOMEM;
case OPAL_PERMISSION:   return -EPERM;
@@ -1037,3 +1038,4 @@ EXPORT_SYMBOL_GPL(opal_write_oppanel_async);
 /* Export this for KVM */
 EXPORT_SYMBOL_GPL(opal_int_set_mfrr);
 EXPORT_SYMBOL_GPL(opal_int_eoi);
+EXPORT_SYMBOL_GPL(opal_error_code);
-- 
2.14.2



[PATCH v4 00/10] Allow opal-async waiters to get interrupted

2017-10-09 Thread Cyril Bur
V4: Rework and rethink.

To recap:
Userspace MTD read()s/write()s and erases to powernv_flash become
calls into the OPAL firmware which subsequently handles flash access.
Because the read()s, write()s or erases can be large (bounded of
course my the size of flash) OPAL may take some time to service the
request, this causes the powernv_flash driver to sit in a wait_event()
for potentially minutes. This causes two problems, firstly, tools
appear to hang for the entire time as they cannot be interrupted by
signals and secondly, this can trigger hung task warnings. The correct
solution is to use wait_event_interruptible() which my rework (as part
of this series) of the opal-async infrastructure provides.

The final patch in this series achieves this. It should eliminate both
hung tasks and threads locking up.
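
The shape of the change in the driver ends up as follows (a simplified
sketch of patch 10/10):

	rc = opal_async_wait_response_interruptible(token, &msg);
	if (rc) {
		/*
		 * A signal arrived. OPAL may still be using the buffer,
		 * so fall back to the uninterruptible wait before the
		 * MTD core frees it, then report the interruption.
		 */
		opal_async_wait_response(token, &msg);
		rc = -EINTR;
	}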

Included in this series are other simpler fixes for powernv_flash:

Don't always return EIO on error. OPAL does mutual exclusion on the
flash and also knows when the service processor takes control of the
flash, in both of these cases it will return OPAL_BUSY, translating
this to EIO is misleading to userspace.

Handle receiving OPAL_SUCCESS when the driver expects
OPAL_ASYNC_COMPLETION and don't treat it as an error. Unfortunately
there are too many drivers out there with the incorrect behaviour, so
in practice OPAL can never return anything but OPAL_ASYNC_COMPLETION;
that shouldn't prevent this code from being correct.

Don't return ERESTARTSYS if token acquisition is interrupted, as
powernv_flash can't be sure it hasn't already performed some work; let
userspace deal with the problem.

Change the incorrect use of BUG_ON() to WARN_ON() in powernv_flash.

Not for powernv_flash, a fix from Stewart Smith which fits into this
series as it relies on my improvements to the opal-async
infrastructure.

V3: export opal_error_code() so that powernv_flash can be built=m

Hello,

Version one of this series ignored that OPAL may continue to use
buffers passed to it after Linux kfree()s the buffer. This version
addresses this, not in a particularly nice way - future work could
make this better. This version also includes a few cleanups and fixups
to the powernv_flash driver done along the course of this work that I
thought I would just send.

The problem we're trying to solve here is that currently all users of
the opal-async calls must use wait_event(). This may be undesirable
when there is a userspace process behind the request for the opal
call: if OPAL takes too long to complete the call then hung task
warnings will appear.

In order to solve the problem callers should use
wait_event_interruptible(), due to the interruptible nature of this
call the opal-async infrastructure needs to track extra state
associated with each async token, this is prepared for in patch 6/10.

While I was working on the opal-async infrastructure improvements,
Stewart fixed another problem; his fix relies on the corrected behaviour
of opal-async, so I've sent it here.

Hello MTD folk: traditionally Michael Ellerman takes powernv_flash
driver patches through the powerpc tree; as always, your feedback is
very welcome.

Thanks,

Cyril

Cyril Bur (9):
  mtd: powernv_flash: Use WARN_ON_ONCE() rather than BUG_ON()
  mtd: powernv_flash: Don't treat OPAL_SUCCESS as an error
  mtd: powernv_flash: Remove pointless goto in driver init
  mtd: powernv_flash: Don't return -ERESTARTSYS on interrupted token
acquisition
  powerpc/opal: Make __opal_async_{get,release}_token() static
  powerpc/opal: Rework the opal-async interface
  powerpc/opal: Add opal_async_wait_response_interruptible() to
opal-async
  powerpc/powernv: Add OPAL_BUSY to opal_error_code()
  mtd: powernv_flash: Use opal_async_wait_response_interruptible()

Stewart Smith (1):
  powernv/opal-sensor: remove not needed lock

 arch/powerpc/include/asm/opal.h  |   4 +-
 arch/powerpc/platforms/powernv/opal-async.c  | 183 +++
 arch/powerpc/platforms/powernv/opal-sensor.c |  17 +--
 arch/powerpc/platforms/powernv/opal.c|   2 +
 drivers/mtd/devices/powernv_flash.c  |  83 +++-
 5 files changed, 194 insertions(+), 95 deletions(-)

-- 
2.14.2



[PATCH v4 07/10] powernv/opal-sensor: remove not needed lock

2017-10-09 Thread Cyril Bur
From: Stewart Smith <stew...@linux.vnet.ibm.com>

Parallel sensor reads could run out of async tokens due to
opal_get_sensor_data grabbing tokens but then doing the sensor
read behind a mutex, essentially serializing the (possibly
asynchronous and relatively slow) sensor read.

It turns out that the mutex isn't needed at all: not only
should the OPAL interface allow concurrent reads, the implementation
is certainly safe for that. And if any sensor we were reading
from somewhere isn't, the kernel is the wrong place to do the mutual
exclusion; OPAL should be doing it for the kernel.

So, remove the mutex.
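
For reference, the serialized pattern being removed looked roughly
like this (sketch):

	/*
	 * N concurrent readers each grab a token, then queue behind a
	 * single mutex while holding their tokens idle.
	 */
	token = opal_async_get_token_interruptible();
	mutex_lock(&opal_sensor_mutex);		/* tokens pile up here */
	ret = opal_sensor_read(sensor_hndl, token, &data);
	/* the possibly slow async wait also happened under the lock */
	mutex_unlock(&opal_sensor_mutex);
	opal_async_release_token(token);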

Additionally, we shouldn't be printing out an error when we don't
get a token as the only way this should happen is if we've been
interrupted in down_interruptible() on the semaphore.

Reported-by: Robert Lippert <rlipp...@google.com>
Signed-off-by: Stewart Smith <stew...@linux.vnet.ibm.com>
Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/platforms/powernv/opal-sensor.c | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-sensor.c 
b/arch/powerpc/platforms/powernv/opal-sensor.c
index aa267f120033..0a7074bb91dc 100644
--- a/arch/powerpc/platforms/powernv/opal-sensor.c
+++ b/arch/powerpc/platforms/powernv/opal-sensor.c
@@ -19,13 +19,10 @@
  */
 
 #include 
-#include 
 #include 
 #include 
 #include 
 
-static DEFINE_MUTEX(opal_sensor_mutex);
-
 /*
  * This will return sensor information to driver based on the requested sensor
 * handle. A handle is an opaque id for the powernv, read by the driver from the
@@ -38,13 +35,9 @@ int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data)
__be32 data;
 
token = opal_async_get_token_interruptible();
-   if (token < 0) {
-   pr_err("%s: Couldn't get the token, returning\n", __func__);
-   ret = token;
-   goto out;
-   }
+   if (token < 0)
+   return token;
 
-   mutex_lock(&opal_sensor_mutex);
	ret = opal_sensor_read(sensor_hndl, token, &data);
switch (ret) {
case OPAL_ASYNC_COMPLETION:
@@ -52,7 +45,7 @@ int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data)
if (ret) {
pr_err("%s: Failed to wait for the async response, 
%d\n",
   __func__, ret);
-   goto out_token;
+   goto out;
}
 
ret = opal_error_code(opal_get_async_rc(msg));
@@ -73,10 +66,8 @@ int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data)
break;
}
 
-out_token:
-   mutex_unlock(&opal_sensor_mutex);
-   opal_async_release_token(token);
 out:
+   opal_async_release_token(token);
return ret;
 }
 EXPORT_SYMBOL_GPL(opal_get_sensor_data);
-- 
2.14.2



[PATCH v4 01/10] mtd: powernv_flash: Use WARN_ON_ONCE() rather than BUG_ON()

2017-10-09 Thread Cyril Bur
BUG_ON() should be reserved for situations where we can no longer
guarantee the integrity of the system. In the case where
powernv_flash_async_op() receives an impossible op, we can still
guarantee the integrity of the system.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 drivers/mtd/devices/powernv_flash.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/devices/powernv_flash.c 
b/drivers/mtd/devices/powernv_flash.c
index f5396f26ddb4..f9ec38281ff2 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -78,7 +78,9 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum 
flash_op op,
rc = opal_flash_erase(info->id, offset, len, token);
break;
default:
-   BUG_ON(1);
+   WARN_ON_ONCE(1);
+   opal_async_release_token(token);
+   return -EIO;
}
 
if (rc != OPAL_ASYNC_COMPLETION) {
-- 
2.14.2



[PATCH v4 08/10] powerpc/opal: Add opal_async_wait_response_interruptible() to opal-async

2017-10-09 Thread Cyril Bur
This patch adds an _interruptible version of opal_async_wait_response().
This is useful when a long running OPAL call is performed on behalf of a
userspace thread, for example, the opal_flash_{read,write,erase}
functions performed by the powernv_flash MTD driver.

It is foreseeable that these functions could take upwards of two minutes,
causing the wait_event() to block long enough to cause hung task
warnings. Furthermore, wait_event_interruptible() is preferable, as
otherwise there is no way for signals to stop the process, which is
going to be confusing in userspace.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/include/asm/opal.h |  2 +
 arch/powerpc/platforms/powernv/opal-async.c | 87 +++--
 2 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 0078eb5acf98..f95ca4560bfa 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -307,6 +307,8 @@ extern void opal_notifier_update_evt(uint64_t evt_mask, 
uint64_t evt_val);
 extern int opal_async_get_token_interruptible(void);
 extern int opal_async_release_token(int token);
 extern int opal_async_wait_response(uint64_t token, struct opal_msg *msg);
+extern int opal_async_wait_response_interruptible(uint64_t token,
+   struct opal_msg *msg);
 extern int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data);
 
 struct rtc_time;
diff --git a/arch/powerpc/platforms/powernv/opal-async.c 
b/arch/powerpc/platforms/powernv/opal-async.c
index fbae8a37ce2c..e2004606b75b 100644
--- a/arch/powerpc/platforms/powernv/opal-async.c
+++ b/arch/powerpc/platforms/powernv/opal-async.c
@@ -26,6 +26,8 @@
 enum opal_async_token_state {
ASYNC_TOKEN_UNALLOCATED = 0,
ASYNC_TOKEN_ALLOCATED,
+   ASYNC_TOKEN_DISPATCHED,
+   ASYNC_TOKEN_ABANDONED,
ASYNC_TOKEN_COMPLETED
 };
 
@@ -58,8 +60,10 @@ static int __opal_async_get_token(void)
 }
 
 /*
- * Note: If the returned token is used in an opal call and opal returns
- * OPAL_ASYNC_COMPLETION you MUST call opal_async_wait_response() before
+ * Note: If the returned token is used in an opal call and opal
+ * returns OPAL_ASYNC_COMPLETION you MUST call one of
+ * opal_async_wait_response() or
+ * opal_async_wait_response_interruptible() at least once before
  * calling any other opal_async_* function
  */
 int opal_async_get_token_interruptible(void)
@@ -96,6 +100,16 @@ static int __opal_async_release_token(int token)
opal_async_tokens[token].state = ASYNC_TOKEN_UNALLOCATED;
rc = 0;
break;
+   /*
+* DISPATCHED and ABANDONED tokens must wait for OPAL to
+* respond.
+* Mark a DISPATCHED token as ABANDONED so that the response
+* handling code knows no one cares and that it can free it
+* then.
+*/
+   case ASYNC_TOKEN_DISPATCHED:
+   opal_async_tokens[token].state = ASYNC_TOKEN_ABANDONED;
+   /* Fall through */
default:
rc = 1;
}
@@ -128,7 +142,11 @@ int opal_async_wait_response(uint64_t token, struct 
opal_msg *msg)
return -EINVAL;
}
 
-   /* Wakeup the poller before we wait for events to speed things
+   /*
+* There is no need to mark the token as dispatched, wait_event()
+* will block until the token completes.
+*
+* Wakeup the poller before we wait for events to speed things
 * up on platforms or simulators where the interrupts aren't
 * functional.
 */
@@ -141,11 +159,66 @@ int opal_async_wait_response(uint64_t token, struct 
opal_msg *msg)
 }
 EXPORT_SYMBOL_GPL(opal_async_wait_response);
 
+int opal_async_wait_response_interruptible(uint64_t token, struct opal_msg 
*msg)
+{
+   unsigned long flags;
+   int ret;
+
+   if (token >= opal_max_async_tokens) {
+   pr_err("%s: Invalid token passed\n", __func__);
+   return -EINVAL;
+   }
+
+   if (!msg) {
+   pr_err("%s: Invalid message pointer passed\n", __func__);
+   return -EINVAL;
+   }
+
+   /*
+* The first time this gets called we mark the token as DISPATCHED
+* so that if wait_event_interruptible() returns non-zero and the
+* caller frees the token, we know not to actually free the token
+* until the response comes.
+*
+* Only change if the token is ALLOCATED - it may have been
+* completed even before the caller gets around to calling this
+* the first time.
+*
+* There is also a dirty great comment at the token allocation
+* function that if the opal call returns OPAL_ASYNC_COMPLETION to
+* the caller then the caller *must* call this or the not
+* interruptible version before doing anything e

[PATCH v4 03/10] mtd: powernv_flash: Remove pointless goto in driver init

2017-10-09 Thread Cyril Bur
Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 drivers/mtd/devices/powernv_flash.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/mtd/devices/powernv_flash.c 
b/drivers/mtd/devices/powernv_flash.c
index ca3ca6adf71e..4dd3b5d2feb2 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -227,21 +227,20 @@ static int powernv_flash_probe(struct platform_device 
*pdev)
int ret;
 
data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
-   if (!data) {
-   ret = -ENOMEM;
-   goto out;
-   }
+   if (!data)
+   return -ENOMEM;
+
data->mtd.priv = data;
 
ret = of_property_read_u32(dev->of_node, "ibm,opal-id", &(data->id));
if (ret) {
dev_err(dev, "no device property 'ibm,opal-id'\n");
-   goto out;
+   return ret;
}
 
	ret = powernv_flash_set_driver_info(dev, &data->mtd);
if (ret)
-   goto out;
+   return ret;
 
dev_set_drvdata(dev, data);
 
@@ -250,10 +249,7 @@ static int powernv_flash_probe(struct platform_device 
*pdev)
 * with an ffs partition at the start, it should prove easier for users
 * to deal with partitions or not as they see fit
 */
-   ret = mtd_device_register(&data->mtd, NULL, 0);
-
-out:
-   return ret;
+   return mtd_device_register(&data->mtd, NULL, 0);
 }
 
 /**
-- 
2.14.2



[PATCH v4 04/10] mtd: powernv_flash: Don't return -ERESTARTSYS on interrupted token acquisition

2017-10-09 Thread Cyril Bur
Because the MTD core might split up a read() or write() from userspace
into several calls to the driver, we may fail to get a token but have
already done some work. It is best to return -EINTR back to userspace
and let it decide what to do.
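
From userspace the result is the standard interrupted-syscall pattern;
a sketch in plain C (hypothetical caller of the MTD character device):

	ssize_t n = write(fd, buf, len);	/* fd: an open /dev/mtdX */
	if (n < 0 && errno == EINTR) {
		/*
		 * Earlier chunks may already be on the flash; the
		 * caller decides whether to retry, rewind, or verify.
		 */
	}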

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 drivers/mtd/devices/powernv_flash.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/mtd/devices/powernv_flash.c 
b/drivers/mtd/devices/powernv_flash.c
index 4dd3b5d2feb2..3343d4f5c4f3 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -47,6 +47,11 @@ enum flash_op {
FLASH_OP_ERASE,
 };
 
+/*
+ * Don't return -ERESTARTSYS if we can't get a token, the MTD core
+ * might have split up the call from userspace and called into the
+ * driver more than once, we'll already have done some amount of work.
+ */
 static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op,
loff_t offset, size_t len, size_t *retlen, u_char *buf)
 {
@@ -63,6 +68,8 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum 
flash_op op,
if (token < 0) {
if (token != -ERESTARTSYS)
dev_err(dev, "Failed to get an async token\n");
+   else
+   token = -EINTR;
return token;
}
 
-- 
2.14.2



[PATCH v4 02/10] mtd: powernv_flash: Don't treat OPAL_SUCCESS as an error

2017-10-09 Thread Cyril Bur
While this driver expects to interact asynchronously, OPAL is well
within its rights to return OPAL_SUCCESS to indicate that the operation
completed without the need for a callback. We shouldn't treat
OPAL_SUCCESS as an error; rather, we should wrap up and return promptly to
the caller.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
I'll note here that currently no OPAL exists that will return
OPAL_SUCCESS, so there isn't the possibility of a bug today.
---
 drivers/mtd/devices/powernv_flash.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/mtd/devices/powernv_flash.c 
b/drivers/mtd/devices/powernv_flash.c
index f9ec38281ff2..ca3ca6adf71e 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -63,7 +63,6 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum 
flash_op op,
if (token < 0) {
if (token != -ERESTARTSYS)
dev_err(dev, "Failed to get an async token\n");
-
return token;
}
 
@@ -83,21 +82,25 @@ static int powernv_flash_async_op(struct mtd_info *mtd, 
enum flash_op op,
return -EIO;
}
 
+   if (rc == OPAL_SUCCESS)
+   goto out_success;
+
if (rc != OPAL_ASYNC_COMPLETION) {
dev_err(dev, "opal_flash_async_op(op=%d) failed (rc %d)\n",
op, rc);
-   opal_async_release_token(token);
-   return -EIO;
+   rc = -EIO;
+   goto out;
}
 
	rc = opal_async_wait_response(token, &msg);
-   opal_async_release_token(token);
if (rc) {
dev_err(dev, "opal async wait failed (rc %d)\n", rc);
-   return -EIO;
+   rc = -EIO;
+   goto out;
}
 
rc = opal_get_async_rc(msg);
+out_success:
if (rc == OPAL_SUCCESS) {
rc = 0;
if (retlen)
@@ -106,6 +109,8 @@ static int powernv_flash_async_op(struct mtd_info *mtd, 
enum flash_op op,
rc = -EIO;
}
 
+out:
+   opal_async_release_token(token);
return rc;
 }
 
-- 
2.14.2



[PATCH 3/3] powerpc/tm: P9 disable transactionally suspended sigcontexts

2017-10-06 Thread Cyril Bur
From: Michael Neuling <mi...@neuling.org>

Unfortunately userspace can construct a sigcontext which enables
suspend. Thus userspace can force Linux into a path where trechkpt is
executed.

This patch blocks this from happening on POWER9 by sanity checking
sigcontexts passed in.

ptrace doesn't have this problem as only MSR SE and BE can be changed
via ptrace.

This patch also adds a number of WARN_ON()s in case we ever enter
suspend when we shouldn't. This should catch systems that don't have
the firmware change and are running TM.

A future firmware change will allow suspend mode on POWER9 but that is
going to require additional Linux changes to support. In the interim,
this allows TM to continue to (partially) work while stopping
userspace from crashing Linux.

Signed-off-by: Michael Neuling <mi...@neuling.org>
Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/kernel/process.c   | 2 ++
 arch/powerpc/kernel/signal_32.c | 4 
 arch/powerpc/kernel/signal_64.c | 5 +
 3 files changed, 11 insertions(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a0c74bbf3454..5b81673c5026 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -903,6 +903,8 @@ static inline void tm_reclaim_task(struct task_struct *tsk)
if (!MSR_TM_ACTIVE(thr->regs->msr))
goto out_and_saveregs;
 
+   WARN_ON(!tm_suspend_supported());
+
TM_DEBUG("--- tm_reclaim on pid %d (NIP=%lx, "
 "ccr=%lx, msr=%lx, trap=%lx)\n",
 tsk->pid, thr->regs->nip,
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 92fb1c8dbbd8..9eac0131c080 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -519,6 +519,8 @@ static int save_tm_user_regs(struct pt_regs *regs,
 {
unsigned long msr = regs->msr;
 
+   WARN_ON(!tm_suspend_supported());
+
/* Remove TM bits from thread's MSR.  The MSR in the sigcontext
 * just indicates to userland that we were doing a transaction, but we
 * don't want to return in transactional state.  This also ensures
@@ -769,6 +771,8 @@ static long restore_tm_user_regs(struct pt_regs *regs,
int i;
 #endif
 
+   if (!tm_suspend_supported())
+   return 1;
/*
 * restore general registers but not including MSR or SOFTE. Also
 * take care of keeping r2 (TLS) intact if not a signal.
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index c83c115858c1..6d28caf8496f 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -214,6 +214,8 @@ static long setup_tm_sigcontexts(struct sigcontext __user 
*sc,
 
BUG_ON(!MSR_TM_ACTIVE(regs->msr));
 
+   WARN_ON(!tm_suspend_supported());
+
/* Remove TM bits from thread's MSR.  The MSR in the sigcontext
 * just indicates to userland that we were doing a transaction, but we
 * don't want to return in transactional state.  This also ensures
@@ -430,6 +432,9 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
 
BUG_ON(tsk != current);
 
+   if (!tm_suspend_supported())
+   return -EINVAL;
+
/* copy the GPRs */
err |= __copy_from_user(regs->gpr, tm_sc->gp_regs, sizeof(regs->gpr));
	err |= __copy_from_user(&tsk->thread.ckpt_regs, sc->gp_regs,
-- 
2.14.2



[PATCH 2/3] powerpc/tm: P9 disabled suspend mode workaround

2017-10-06 Thread Cyril Bur
[from Michael Neuling's original patch]
Each POWER9 core is made of two super slices. Each super slice can
only have one thread at a time in TM suspend mode. The super slice
prevents ever entering a state where both threads are in suspend by
aborting transactions on tsuspend or exceptions into the kernel.

Unfortunately for context switch we need trechkpt, which forces suspend
mode. If one thread is already in suspend and a second, previously
suspended thread needs to be restored, the trechkpt must be executed.
Currently the trechkpt will hang in this case until the other thread
exits suspend. This causes problems for Linux, resulting in hung task
and RCU stall detectors going off.

To work around this, we disable suspend in the core. This is done via a
firmware change which stops the hardware ever getting into suspend.
The hardware will always roll back a transaction on any tsuspend or
entry into the kernel.

[added by Cyril Bur]
As the no-suspend firmware change is novel and untested, using it should
be opt-in for users. Furthermore, the kernel currently has no method to
know if the firmware has applied the no-suspend workaround. This patch
extends the ppc_tm commandline option to allow users to opt in if they
are sure that their firmware has been updated and they understand the
risks involved.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  7 +--
 arch/powerpc/include/asm/cputable.h |  6 ++
 arch/powerpc/include/asm/tm.h   |  6 --
 arch/powerpc/kernel/cputable.c  | 12 
 arch/powerpc/kernel/setup_64.c  | 16 ++--
 5 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 4e2b5d9078a0..a0f757f749cf 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -805,8 +805,11 @@
Disable RADIX MMU mode on POWER9
 
ppc_tm= [PPC]
-   Format: {"off"}
-   Disable Hardware Transactional Memory
+   Format: {"off" | "no-suspend"}
+   "Off" Will disable Hardware Transactional Memory.
+   "no-suspend" Informs the kernel that the
+   hardware will not transition into the kernel
+   with a suspended transaction.
 
disable_cpu_apicid= [X86,APIC,SMP]
Format: 
diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index a9bf921f4efc..e66101830af2 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -124,6 +124,12 @@ extern void identify_cpu_name(unsigned int pvr);
 extern void do_feature_fixups(unsigned long value, void *fixup_start,
  void *fixup_end);
 
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+extern bool tm_suspend_supported(void);
+#else
+static inline bool tm_suspend_supported(void) { return false; }
+#endif
+
 extern const char *powerpc_base_platform;
 
 #ifdef CONFIG_JUMP_LABEL_FEATURE_CHECKS
diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h
index eca1c866ca97..1fd0b5f72861 100644
--- a/arch/powerpc/include/asm/tm.h
+++ b/arch/powerpc/include/asm/tm.h
@@ -9,9 +9,11 @@
 
 #ifndef __ASSEMBLY__
 
-#define TM_STATE_ON0
-#define TM_STATE_OFF   1
+#define TM_STATE_ON0
+#define TM_STATE_OFF   1
+#define TM_STATE_NO_SUSPEND2
 
+extern int ppc_tm_state;
 extern void tm_enable(void);
 extern void tm_reclaim(struct thread_struct *thread,
   unsigned long orig_msr, uint8_t cause);
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 760872916013..2cb01b48123a 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -22,6 +22,7 @@
 #include   /* for PTRRELOC on ARCH=ppc */
 #include 
 #include 
+#include 
 
 static struct cpu_spec the_cpu_spec __read_mostly;
 
@@ -2301,6 +2302,17 @@ void __init identify_cpu_name(unsigned int pvr)
}
 }
 
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+bool tm_suspend_supported(void)
+{
+   if (cpu_has_feature(CPU_FTR_TM)) {
+   if (pvr_version_is(PVR_POWER9) && ppc_tm_state != TM_STATE_NO_SUSPEND)
+   return false;
+   return true;
+   }
+   return false;
+}
+#endif
 
 #ifdef CONFIG_JUMP_LABEL_FEATURE_CHECKS
 struct static_key_true cpu_feature_keys[NUM_CPU_FTR_KEYS] = {
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index e37c26d2e54b..227ac600a1b7 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -251,12 +251,14 @@ static void cpu_ready_for_interrupts(void)
get_

[PATCH 1/3] powerpc/tm: Add commandline option to disable hardware transactional memory

2017-10-06 Thread Cyril Bur
Currently the kernel relies on firmware to inform it whether or not the
CPU supports HTM; as long as the kernel was built with
CONFIG_PPC_TRANSACTIONAL_MEM=y, it will allow userspace to make use
of the facility.

There may be situations where it would be advantageous for the kernel
to not allow userspace to use HTM, currently the only way to achieve
this is to recompile the kernel with CONFIG_PPC_TRANSACTIONAL_MEM=n.

This patch adds a simple commandline option so that HTM can be disabled
at boot time.
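
Usage is then just a matter of appending the option to the kernel
command line, for example:

	ppc_tm=off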

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  4 
 arch/powerpc/include/asm/tm.h   |  3 +++
 arch/powerpc/kernel/setup_64.c  | 28 +
 3 files changed, 35 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 05496622b4ef..4e2b5d9078a0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -804,6 +804,10 @@
disable_radix   [PPC]
Disable RADIX MMU mode on POWER9
 
+   ppc_tm= [PPC]
+   Format: {"off"}
+   Disable Hardware Transactional Memory
+
disable_cpu_apicid= [X86,APIC,SMP]
Format: 
The number of initial APIC ID for the
diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h
index 82e06ca3a49b..eca1c866ca97 100644
--- a/arch/powerpc/include/asm/tm.h
+++ b/arch/powerpc/include/asm/tm.h
@@ -9,6 +9,9 @@
 
 #ifndef __ASSEMBLY__
 
+#define TM_STATE_ON0
+#define TM_STATE_OFF   1
+
 extern void tm_enable(void);
 extern void tm_reclaim(struct thread_struct *thread,
   unsigned long orig_msr, uint8_t cause);
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index b89c6aac48c9..e37c26d2e54b 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -68,6 +68,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef DEBUG
 #define DBG(fmt...) udbg_printf(fmt)
@@ -250,6 +251,31 @@ static void cpu_ready_for_interrupts(void)
get_paca()->kernel_msr = MSR_KERNEL;
 }
 
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+static int ppc_tm_state;
+static int __init parse_ppc_tm(char *p)
+{
+   if (strcmp(p, "off") == 0)
+   ppc_tm_state = TM_STATE_OFF;
+   else
+   printk(KERN_NOTICE "Unknown value to cmdline ppc_tm '%s'\n", p);
+   return 0;
+}
+early_param("ppc_tm", parse_ppc_tm);
+
+static void check_disable_tm(void)
+{
+   if (cpu_has_feature(CPU_FTR_TM) && ppc_tm_state == TM_STATE_OFF) {
+   printk(KERN_NOTICE "Disabling hardware transactional memory 
(HTM)\n");
+   cur_cpu_spec->cpu_user_features2 &=
+   ~(PPC_FEATURE2_HTM_NOSC | PPC_FEATURE2_HTM);
+   cur_cpu_spec->cpu_features &= ~CPU_FTR_TM;
+   }
+}
+#else
+static void check_disable_tm(void) { }
+#endif
+
 /*
  * Early initialization entry point. This is called by head.S
  * with MMU translation disabled. We rely on the "feature" of
@@ -299,6 +325,8 @@ void __init early_setup(unsigned long dt_ptr)
 */
early_init_devtree(__va(dt_ptr));
 
+   check_disable_tm();
+
/* Now we know the logical id of our boot cpu, setup the paca. */
	setup_paca(&paca[boot_cpuid]);
fixup_boot_paca();
-- 
2.14.2



Re: [PATCH v2 2/3] powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision

2017-09-25 Thread Cyril Bur
On Sun, 2017-09-24 at 05:18 +0800, Simon Guo wrote:
> Hi Cyril,
> On Sat, Sep 23, 2017 at 12:06:48AM +1000, Cyril Bur wrote:
> > On Thu, 2017-09-21 at 07:34 +0800, wei.guo.si...@gmail.com wrote:
> > > From: Simon Guo <wei.guo.si...@gmail.com>
> > > 
> > > This patch add VMX primitives to do memcmp() in case the compare size
> > > exceeds 4K bytes.
> > > 
> > 
> > Hi Simon,
> > 
> > Sorry I didn't see this sooner; I've actually been working on a kernel
> > version of glibc commit dec4a7105e (powerpc: Improve memcmp performance
> > for POWER8); unfortunately I've been distracted and it still isn't done.
> 
> Thanks for sync with me. Let's consolidate our effort together :)
> 
> I had a quick check of glibc commit dec4a7105e.
> Looks like the aligned case comparison with VSX is launched without any
> size limitation in rN, which means it will incur a VSX reg load penalty
> even when the length is 9 bytes.
> 

This was written for userspace which doesn't have to explicitly enable
VMX in order to use it - we need to be smarter in the kernel.

> It did some optimization when src/dest addrs don't have the same offset
> on an 8-byte alignment boundary. I need to read more closely.
> 
> > I wonder if we can consolidate our efforts here. One thing I did come
> > across in my testing is that for memcmp() that will fail early (I
> > haven't narrowed down the the optimal number yet) the cost of enabling
> > VMX actually turns out to be a performance regression, as such I've
> > added a small check of the first 64 bytes to the start before enabling
> > VMX to ensure the penalty is worth taking.
> 
> Will there still be a penalty if the 65th byte differs?  
> 

I haven't benchmarked it exactly; my rationale for 64 bytes was that it
is the stride of the vectorised copy loop, so if we know we'll fail
before even completing one iteration of the vectorised loop there isn't
any point using the vector regs.
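
In C terms the pre-check amounts to something like this (a sketch of
the idea only; the real code is in memcmp_64.S; s1 and s2 are const
unsigned char pointers):

	/*
	 * Compare one vector-loop stride (64 bytes) scalar-first; only
	 * pay the cost of enabling VMX if that whole stride matches.
	 */
	size_t i, pre = len < 64 ? len : 64;

	for (i = 0; i < pre; i++)
		if (s1[i] != s2[i])
			return (int)s1[i] - (int)s2[i];	/* early exit, no VMX */
	if (len <= 64)
		return 0;
	/* identical so far and more to compare: enter_vmx_ops() pays off */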

> > 
> > Also, you should consider doing 4K and greater: KSM (Kernel Samepage
> > Merging) uses PAGE_SIZE, which can be as small as 4K.
> 
> Currently the VMX will only be applied when size exceeds 4K. Are you
> suggesting a bigger threshold than 4K?
> 

Equal to or greater than 4K, KSM will benefit.
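
The KSM case boils down to page-sized compares, roughly (sketched from
mm/ksm.c; exact code varies by kernel version):

	addr1 = kmap_atomic(page1);
	addr2 = kmap_atomic(page2);
	ret = memcmp(addr1, addr2, PAGE_SIZE);	/* exactly 4K with 4K pages */
	kunmap_atomic(addr2);
	kunmap_atomic(addr1);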

> We can sync more offline for v3.
> 
> Thanks,
> - Simon


Re: [PATCH v2 2/3] powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision

2017-09-22 Thread Cyril Bur
On Thu, 2017-09-21 at 07:34 +0800, wei.guo.si...@gmail.com wrote:
> From: Simon Guo 
> 
> This patch add VMX primitives to do memcmp() in case the compare size
> exceeds 4K bytes.
> 

Hi Simon,

Sorry I didn't see this sooner; I've actually been working on a kernel
version of glibc commit dec4a7105e (powerpc: Improve memcmp performance
for POWER8); unfortunately I've been distracted and it still isn't done.
I wonder if we can consolidate our efforts here. One thing I did come
across in my testing is that for memcmp() that will fail early (I
haven't narrowed down the optimal number yet) the cost of enabling
VMX actually turns out to be a performance regression; as such, I've
added a small check of the first 64 bytes to the start before enabling
VMX to ensure the penalty is worth taking.

Also, you should consider doing 4K and greater: KSM (Kernel Samepage
Merging) uses PAGE_SIZE, which can be as small as 4K.

Cyril

> Test result with the following test program (replace the "^>" with ""):
> --
> > # cat tools/testing/selftests/powerpc/stringloops/memcmp.c
> > #include 
> > #include 
> > #include 
> > #include 
> > #include "utils.h"
> > #define SIZE (1024 * 1024 * 900)
> > #define ITERATIONS 40
> 
> int test_memcmp(const void *s1, const void *s2, size_t n);
> 
> static int testcase(void)
> {
> char *s1;
> char *s2;
> unsigned long i;
> 
> s1 = memalign(128, SIZE);
> if (!s1) {
> perror("memalign");
> exit(1);
> }
> 
> s2 = memalign(128, SIZE);
> if (!s2) {
> perror("memalign");
> exit(1);
> }
> 
> for (i = 0; i < SIZE; i++)  {
> s1[i] = i & 0xff;
> s2[i] = i & 0xff;
> }
> for (i = 0; i < ITERATIONS; i++) {
>   int ret = test_memcmp(s1, s2, SIZE);
> 
>   if (ret) {
>   printf("return %d at[%ld]! should have returned 
> zero\n", ret, i);
>   abort();
>   }
>   }
> 
> return 0;
> }
> 
> int main(void)
> {
> return test_harness(testcase, "memcmp");
> }
> --
> Without VMX patch:
>    7.435191479 seconds time elapsed ( +- 0.51% )
> With VMX patch:
>    6.802038938 seconds time elapsed ( +- 0.56% )
> There is ~+8% improvement.
> 
> However I am not aware of any use case in the kernel for memcmp on
> large sizes yet.
> 
> Signed-off-by: Simon Guo 
> ---
>  arch/powerpc/include/asm/asm-prototypes.h |  2 +-
>  arch/powerpc/lib/copypage_power7.S|  2 +-
>  arch/powerpc/lib/memcmp_64.S  | 82 
> +++
>  arch/powerpc/lib/memcpy_power7.S  |  2 +-
>  arch/powerpc/lib/vmx-helper.c |  2 +-
>  5 files changed, 86 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
> b/arch/powerpc/include/asm/asm-prototypes.h
> index 7330150..e6530d8 100644
> --- a/arch/powerpc/include/asm/asm-prototypes.h
> +++ b/arch/powerpc/include/asm/asm-prototypes.h
> @@ -49,7 +49,7 @@ void __trace_hcall_exit(long opcode, unsigned long retval,
>  /* VMX copying */
>  int enter_vmx_usercopy(void);
>  int exit_vmx_usercopy(void);
> -int enter_vmx_copy(void);
> +int enter_vmx_ops(void);
>  void * exit_vmx_copy(void *dest);
>  
>  /* Traps */
> diff --git a/arch/powerpc/lib/copypage_power7.S 
> b/arch/powerpc/lib/copypage_power7.S
> index ca5fc8f..9e7729e 100644
> --- a/arch/powerpc/lib/copypage_power7.S
> +++ b/arch/powerpc/lib/copypage_power7.S
> @@ -60,7 +60,7 @@ _GLOBAL(copypage_power7)
>   std r4,-STACKFRAMESIZE+STK_REG(R30)(r1)
>   std r0,16(r1)
>   stdur1,-STACKFRAMESIZE(r1)
> - bl  enter_vmx_copy
> + bl  enter_vmx_ops
>   cmpwi   r3,0
>   ld  r0,STACKFRAMESIZE+16(r1)
>   ld  r3,STK_REG(R31)(r1)
> diff --git a/arch/powerpc/lib/memcmp_64.S b/arch/powerpc/lib/memcmp_64.S
> index 6dccfb8..40218fc 100644
> --- a/arch/powerpc/lib/memcmp_64.S
> +++ b/arch/powerpc/lib/memcmp_64.S
> @@ -162,6 +162,13 @@ _GLOBAL(memcmp)
>   blr
>  
>  .Llong:
> +#ifdef CONFIG_ALTIVEC
> + /* Try to use vmx loop if length is larger than 4K */
> + cmpldi  cr6,r5,4096
> + bgt cr6,.Lvmx_cmp
> +
> +.Llong_novmx_cmp:
> +#endif
>   li  off8,8
>   li  off16,16
>   li  off24,24
> @@ -319,4 +326,79 @@ _GLOBAL(memcmp)
>  8:
>   blr
>  
> +#ifdef CONFIG_ALTIVEC
> +.Lvmx_cmp:
> + mflr	r0
> + std r3,-STACKFRAMESIZE+STK_REG(R31)(r1)
> + std r4,-STACKFRAMESIZE+STK_REG(R30)(r1)
> + std r5,-STACKFRAMESIZE+STK_REG(R29)(r1)
> + std r0,16(r1)
> + stdu	r1,-STACKFRAMESIZE(r1)
> + bl  enter_vmx_ops
> + cmpwi   cr1,r3,0
> + ld  r0,STACKFRAMESIZE+16(r1)
> + ld  

Re: [PATCH v2] powerpc/tm: Flush TM only if CPU has TM feature

2017-09-14 Thread Cyril Bur
On Wed, 2017-09-13 at 22:13 -0400, Gustavo Romero wrote:
> Commit cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump")
> added code to access TM SPRs in flush_tmregs_to_thread(). However
> flush_tmregs_to_thread() does not check if TM feature is available on
> CPU before trying to access TM SPRs in order to copy live state to
> thread structures. flush_tmregs_to_thread() is indeed guarded by
> CONFIG_PPC_TRANSACTIONAL_MEM but it might be the case that kernel
> was compiled with CONFIG_PPC_TRANSACTIONAL_MEM enabled and ran on
> a CPU without TM feature available, thus rendering the execution
> of TM instructions that are treated by the CPU as illegal instructions.
> 
> The fix is just to add proper checking in flush_tmregs_to_thread()
> if CPU has the TM feature before accessing any TM-specific resource,
> returning immediately if TM is not available on the CPU. Adding
> that checking in flush_tmregs_to_thread() instead of in places
> where it is called, like in vsr_get() and vsr_set(), is better because
> avoids the same problem cropping up elsewhere.
> 
> Cc: sta...@vger.kernel.org # v4.13+
> Fixes: cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump")
> Signed-off-by: Gustavo Romero <grom...@linux.vnet.ibm.com>

Keeping in mind I reviewed cd63f3c and feeling a bit sheepish having
missed this.

Reviewed-by: Cyril Bur <cyril...@gmail.com>

> ---
>  arch/powerpc/kernel/ptrace.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> index 07cd22e..f52ad5b 100644
> --- a/arch/powerpc/kernel/ptrace.c
> +++ b/arch/powerpc/kernel/ptrace.c
> @@ -131,7 +131,7 @@ static void flush_tmregs_to_thread(struct task_struct 
> *tsk)
>* in the appropriate thread structures from live.
>*/
>  
> - if (tsk != current)
> + if ((!cpu_has_feature(CPU_FTR_TM)) || (tsk != current))
>   return;
>  
>   if (MSR_TM_SUSPENDED(mfmsr())) {


Re: [PATCH] powerpc: Use reg.h values for program check reason codes

2017-08-16 Thread Cyril Bur
On Wed, 2017-08-16 at 10:52 +0200, Christophe LEROY wrote:
> Hi,
> 
> Le 16/08/2017 à 08:50, Cyril Bur a écrit :
> > Small amount of #define duplication, makes sense for these to be in
> > reg.h.
> > 
> > Signed-off-by: Cyril Bur <cyril...@gmail.com>
> 
> Looks similar to the following already-applied commit, doesn't it?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=merge=d30a5a5262ca64d58aa07fb2ecd7f992df83b4bc
> 

Oops, I think I'm based off Linus' tree. Sorry for the noise.


Cyril

*starts writing patch to rename to PROGTMBAD*... because clearly haha
;)

> Christophe
> 
> > ---
> >   arch/powerpc/include/asm/reg.h |  1 +
> >   arch/powerpc/kernel/traps.c| 10 +-
> >   2 files changed, 6 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> > index a3b6575c7842..c22b1ae5ad03 100644
> > --- a/arch/powerpc/include/asm/reg.h
> > +++ b/arch/powerpc/include/asm/reg.h
> > @@ -675,6 +675,7 @@
> >   * may not be recoverable */
> >   #define SRR1_WS_DEEPER	0x00020000 /* Some resources not maintained */
> >   #define SRR1_WS_DEEP	0x00010000 /* All resources maintained */
> > +#define   SRR1_PROGTMBAD	0x00200000 /* TM Bad Thing */
> >   #define   SRR1_PROGFPE	0x00100000 /* Floating Point Enabled */
> >   #define   SRR1_PROGILL	0x00080000 /* Illegal instruction */
> >   #define   SRR1_PROGPRIV	0x00040000 /* Privileged instruction */
> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > index 1f7ec178db05..0a5ddaea8bf1 100644
> > --- a/arch/powerpc/kernel/traps.c
> > +++ b/arch/powerpc/kernel/traps.c
> > @@ -416,11 +416,11 @@ static inline int check_io_access(struct pt_regs 
> > *regs)
> >  exception is in the MSR. */
> >   #define get_reason(regs)  ((regs)->msr)
> >   #define get_mc_reason(regs)   ((regs)->msr)
> > -#define REASON_TM		0x200000
> > -#define REASON_FP		0x100000
> > -#define REASON_ILLEGAL		0x80000
> > -#define REASON_PRIVILEGED	0x40000
> > -#define REASON_TRAP		0x20000
> > +#define REASON_TM  SRR1_PROGTMBAD
> > +#define REASON_FP  SRR1_PROGFPE
> > +#define REASON_ILLEGAL SRR1_PROGILL
> > +#define REASON_PRIVILEGED  SRR1_PROGPRIV
> > +#define REASON_TRAPSRR1_PROGTRAP
> >   
> >   #define single_stepping(regs) ((regs)->msr & MSR_SE)
> >   #define clear_single_step(regs)   ((regs)->msr &= ~MSR_SE)
> > 


[PATCH] powerpc: Use reg.h values for program check reason codes

2017-08-16 Thread Cyril Bur
Small amount of #define duplication, makes sense for these to be in
reg.h.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/include/asm/reg.h |  1 +
 arch/powerpc/kernel/traps.c| 10 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index a3b6575c7842..c22b1ae5ad03 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -675,6 +675,7 @@
  * may not be recoverable */
 #define SRR1_WS_DEEPER		0x00020000 /* Some resources not maintained */
 #define SRR1_WS_DEEP		0x00010000 /* All resources maintained */
+#define   SRR1_PROGTMBAD	0x00200000 /* TM Bad Thing */
 #define   SRR1_PROGFPE		0x00100000 /* Floating Point Enabled */
 #define   SRR1_PROGILL		0x00080000 /* Illegal instruction */
 #define   SRR1_PROGPRIV		0x00040000 /* Privileged instruction */
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 1f7ec178db05..0a5ddaea8bf1 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -416,11 +416,11 @@ static inline int check_io_access(struct pt_regs *regs)
exception is in the MSR. */
 #define get_reason(regs)   ((regs)->msr)
 #define get_mc_reason(regs)((regs)->msr)
-#define REASON_TM		0x200000
-#define REASON_FP		0x100000
-#define REASON_ILLEGAL		0x80000
-#define REASON_PRIVILEGED	0x40000
-#define REASON_TRAP		0x20000
+#define REASON_TM  SRR1_PROGTMBAD
+#define REASON_FP  SRR1_PROGFPE
+#define REASON_ILLEGAL SRR1_PROGILL
+#define REASON_PRIVILEGED  SRR1_PROGPRIV
+#define REASON_TRAPSRR1_PROGTRAP
 
 #define single_stepping(regs)  ((regs)->msr & MSR_SE)
 #define clear_single_step(regs)((regs)->msr &= ~MSR_SE)
-- 
2.14.1



Re: [PATCH V9 1/3] powernv: powercap: Add support for powercap framework

2017-07-30 Thread Cyril Bur
On Mon, 2017-07-31 at 07:54 +0530, Shilpasri G Bhat wrote:
> Adds a generic powercap framework to change the system powercap
> inband through OPAL-OCC command/response interface.
> 
> Signed-off-by: Shilpasri G Bhat 
> ---
> Changes from V8:
> - Use __pa() while passing pointer in opal call
> - Use mutex_lock_interruptible()
> - Fix error codes returned to user
> - Allocate and add sysfs attributes in a single loop
> 
>  arch/powerpc/include/asm/opal-api.h|   5 +-
>  arch/powerpc/include/asm/opal.h|   4 +
>  arch/powerpc/platforms/powernv/Makefile|   2 +-
>  arch/powerpc/platforms/powernv/opal-powercap.c | 243 
> +
>  arch/powerpc/platforms/powernv/opal-wrappers.S |   2 +
>  arch/powerpc/platforms/powernv/opal.c  |   4 +
>  6 files changed, 258 insertions(+), 2 deletions(-)
>  create mode 100644 arch/powerpc/platforms/powernv/opal-powercap.c
> 
> diff --git a/arch/powerpc/include/asm/opal-api.h 
> b/arch/powerpc/include/asm/opal-api.h
> index 3130a73..c3e0c4a 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -42,6 +42,7 @@
>  #define OPAL_I2C_STOP_ERR		-24
>  #define OPAL_XIVE_PROVISIONING	-31
>  #define OPAL_XIVE_FREE_ACTIVE		-32
> +#define OPAL_TIMEOUT			-33
>  
>  /* API Tokens (in r0) */
>  #define OPAL_INVALID_CALL   -1
> @@ -190,7 +191,9 @@
>  #define OPAL_NPU_INIT_CONTEXT		146
>  #define OPAL_NPU_DESTROY_CONTEXT	147
>  #define OPAL_NPU_MAP_LPAR		148
> -#define OPAL_LAST			148
> +#define OPAL_GET_POWERCAP		152
> +#define OPAL_SET_POWERCAP		153
> +#define OPAL_LAST			153
>  
>  /* Device tree flags */
>  
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 588fb1c..ec2087c 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -267,6 +267,8 @@ int64_t opal_xive_set_vp_info(uint64_t vp,
>  int64_t opal_xive_free_irq(uint32_t girq);
>  int64_t opal_xive_sync(uint32_t type, uint32_t id);
>  int64_t opal_xive_dump(uint32_t type, uint32_t id);
> +int opal_get_powercap(u32 handle, int token, u32 *pcap);
> +int opal_set_powercap(u32 handle, int token, u32 pcap);
>  
>  /* Internal functions */
>  extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
> @@ -345,6 +347,8 @@ static inline int opal_get_async_rc(struct opal_msg msg)
>  
>  void opal_wake_poller(void);
>  
> +void opal_powercap_init(void);
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif /* _ASM_POWERPC_OPAL_H */
> diff --git a/arch/powerpc/platforms/powernv/Makefile 
> b/arch/powerpc/platforms/powernv/Makefile
> index b5d98cb..e79f806 100644
> --- a/arch/powerpc/platforms/powernv/Makefile
> +++ b/arch/powerpc/platforms/powernv/Makefile
> @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o opal-async.o idle.o
>  obj-y			+= opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
>  obj-y			+= rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
>  obj-y			+= opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
> -obj-y			+= opal-kmsg.o
> +obj-y			+= opal-kmsg.o opal-powercap.o
>  
>  obj-$(CONFIG_SMP)+= smp.o subcore.o subcore-asm.o
>  obj-$(CONFIG_PCI)+= pci.o pci-ioda.o npu-dma.o
> diff --git a/arch/powerpc/platforms/powernv/opal-powercap.c 
> b/arch/powerpc/platforms/powernv/opal-powercap.c
> new file mode 100644
> index 000..9be5093
> --- /dev/null
> +++ b/arch/powerpc/platforms/powernv/opal-powercap.c
> @@ -0,0 +1,243 @@
> +/*
> + * PowerNV OPAL Powercap interface
> + *
> + * Copyright 2017 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#define pr_fmt(fmt) "opal-powercap: " fmt
> +
> +#include <linux/of.h>
> +#include <linux/kobject.h>
> +#include <linux/slab.h>
> +
> +#include <asm/opal.h>
> +
> +DEFINE_MUTEX(powercap_mutex);
> +
> +static struct kobject *powercap_kobj;
> +
> +struct powercap_attr {
> + u32 handle;
> + struct kobj_attribute attr;
> +};
> +
> +static struct pcap {
> + struct attribute_group pg;
> + struct powercap_attr *pattrs;
> +} *pcaps;
> +
> +static ssize_t powercap_show(struct kobject *kobj, struct kobj_attribute 
> *attr,
> +  char *buf)
> +{
> + struct powercap_attr *pcap_attr = container_of(attr,
> + struct powercap_attr, attr);
> + struct opal_msg msg;
> + u32 pcap;
> + int ret, token;
> +
> + token = opal_async_get_token_interruptible();
> + if (token < 0) {
> +

Re: [PATCH v4 2/5] powerpc/lib/sstep: Add popcnt instruction emulation

2017-07-30 Thread Cyril Bur
On Mon, 2017-07-31 at 10:58 +1000, Matt Brown wrote:
> This adds emulations for the popcntb, popcntw, and popcntd instructions.
> Tested for correctness against the popcnt{b,w,d} instructions on ppc64le.
> 
> Signed-off-by: Matt Brown <matthew.brown@gmail.com>

Unlike the rest of this series, it isn't immediately clear that this
one is correct; we're definitely on the other side of the optimisation
vs readability line. It looks like it is correct, but perhaps add some
comments to clarify.
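
Something along these lines (my own annotation, not part of the patch)
is what I have in mind; it spells out the sideways-addition steps the
emulation relies on:

/* 64-bit population count by sideways addition, annotated. */
static unsigned long long popcount64(unsigned long long v)
{
	/* Each 2-bit field becomes the count of the bits it held (0..2). */
	v -= (v >> 1) & 0x5555555555555555ULL;
	/* Sum adjacent 2-bit counts into 4-bit fields (0..4). */
	v = (v & 0x3333333333333333ULL) + ((v >> 2) & 0x3333333333333333ULL);
	/* Sum adjacent 4-bit counts into bytes (0..8): this is popcntb. */
	v = (v + (v >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
	/* Fold bytes into 16-bit then 32-bit sums; masking here with
	 * 0x0000003f0000003f gives popcntw. */
	v += v >> 8;
	v += v >> 16;
	/* Fold the two word counts together for popcntd (0..64). */
	return (v + (v >> 32)) & 0x7f;
}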

Otherwise,

Reviewed-by: Cyril Bur <cyril...@gmail.com>

> ---
> v4:
>   - change ifdef macro from __powerpc64__ to CONFIG_PPC64
>   - slight optimisations 
>   (now identical to the popcntb implementation in kernel/traps.c)
> v3:
>   - optimised using the Giles-Miller method of side-ways addition
> v2:
>   - fixed opcodes
>   - fixed typecasting
>   - fixed bitshifting error for both 32 and 64bit arch
> ---
>  arch/powerpc/lib/sstep.c | 42 +-
>  1 file changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index 87d277f..2fd7377 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -612,6 +612,34 @@ static nokprobe_inline void do_cmpb(struct pt_regs 
> *regs, unsigned long v1,
>   regs->gpr[rd] = out_val;
>  }
>  
> +/*
> + * The size parameter is used to adjust the equivalent popcnt instruction.
> + * popcntb = 8, popcntw = 32, popcntd = 64
> + */
> +static nokprobe_inline void do_popcnt(struct pt_regs *regs, unsigned long v1,
> + int size, int ra)
> +{
> + unsigned long long out = v1;
> +
> + out -= (out >> 1) & 0x5555555555555555;
> + out = (0x3333333333333333 & out) + (0x3333333333333333 & (out >> 2));
> + out = (out + (out >> 4)) & 0x0f0f0f0f0f0f0f0f;
> +
> + if (size == 8) {/* popcntb */
> + regs->gpr[ra] = out;
> + return;
> + }
> + out += out >> 8;
> + out += out >> 16;
> + if (size == 32) {   /* popcntw */
> + regs->gpr[ra] = out & 0x0000003f0000003f;
> + return;
> + }
> +
> + out = (out + (out >> 32)) & 0x7f;
> + regs->gpr[ra] = out;/* popcntd */
> +}
> +
>  static nokprobe_inline int trap_compare(long v1, long v2)
>  {
>   int ret = 0;
> @@ -1194,6 +1222,10 @@ int analyse_instr(struct instruction_op *op, struct 
> pt_regs *regs,
>   regs->gpr[ra] = regs->gpr[rd] & ~regs->gpr[rb];
>   goto logical_done;
>  
> + case 122:   /* popcntb */
> + do_popcnt(regs, regs->gpr[rd], 8, ra);
> + goto logical_done;
> +
>   case 124:   /* nor */
>   regs->gpr[ra] = ~(regs->gpr[rd] | regs->gpr[rb]);
>   goto logical_done;
> @@ -1206,6 +1238,10 @@ int analyse_instr(struct instruction_op *op, struct 
> pt_regs *regs,
>   regs->gpr[ra] = regs->gpr[rd] ^ regs->gpr[rb];
>   goto logical_done;
>  
> + case 378:   /* popcntw */
> + do_popcnt(regs, regs->gpr[rd], 32, ra);
> + goto logical_done;
> +
>   case 412:   /* orc */
>   regs->gpr[ra] = regs->gpr[rd] | ~regs->gpr[rb];
>   goto logical_done;
> @@ -1217,7 +1253,11 @@ int analyse_instr(struct instruction_op *op, struct 
> pt_regs *regs,
>   case 476:   /* nand */
>   regs->gpr[ra] = ~(regs->gpr[rd] & regs->gpr[rb]);
>   goto logical_done;
> -
> +#ifdef CONFIG_PPC64
> + case 506:   /* popcntd */
> + do_popcnt(regs, regs->gpr[rd], 64, ra);
> + goto logical_done;
> +#endif
>   case 922:   /* extsh */
>   regs->gpr[ra] = (signed short) regs->gpr[rd];
>   goto logical_done;


Re: [PATCH v4 5/5] powerpc/lib/sstep: Add isel instruction emulation

2017-07-30 Thread Cyril Bur
On Mon, 2017-07-31 at 10:58 +1000, Matt Brown wrote:
> This adds emulation for the isel instruction.
> Tested for correctness against the isel instruction and its extended
> mnemonics (lt, gt, eq) on ppc64le.
> 
> Signed-off-by: Matt Brown <matthew.brown@gmail.com>

Reviewed-by: Cyril Bur <cyril...@gmail.com>

> ---
> v4:
>   - simplify if statement to ternary op
>   (same as isel emulation in kernel/traps.c)
> v2:
>   - fixed opcode
>   - fixed definition to include the 'if RA=0, a=0' clause
>   - fixed ccr bitshifting error
> ---
>  arch/powerpc/lib/sstep.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index af4eef9..473bab5 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -1240,6 +1240,14 @@ int analyse_instr(struct instruction_op *op, struct 
> pt_regs *regs,
>  /*
>   * Logical instructions
>   */
> + case 15:/* isel */
> + mb = (instr >> 6) & 0x1f; /* bc */
> + val = (regs->ccr >> (31 - mb)) & 1;
> + val2 = (ra) ? regs->gpr[ra] : 0;
> +
> + regs->gpr[rd] = (val) ? val2 : regs->gpr[rb];
> + goto logical_done;
> +
>   case 26:/* cntlzw */
>   asm("cntlzw %0,%1" : "=r" (regs->gpr[ra]) :
>   "r" (regs->gpr[rd]));


Re: [PATCH v4 4/5] powerpc/lib/sstep: Add prty instruction emulation

2017-07-30 Thread Cyril Bur
On Mon, 2017-07-31 at 10:58 +1000, Matt Brown wrote:
> This adds emulation for the prtyw and prtyd instructions.
> Tested for logical correctness against the prtyw and prtyd instructions
> on ppc64le.
> 
> Signed-off-by: Matt Brown <matthew.brown@gmail.com>

Reviewed-by: Cyril Bur <cyril...@gmail.com>

> ---
> v4:
>   - use simpler xor method
> v3:
>   - optimised using the Giles-Miller method of side-ways addition
> v2:
>   - fixed opcodes
>   - fixed bitshifting and typecast errors
>   - merged do_prtyw and do_prtyd into single function
> ---
>  arch/powerpc/lib/sstep.c | 26 ++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index c9fd613..af4eef9 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -657,6 +657,24 @@ static nokprobe_inline void do_bpermd(struct pt_regs 
> *regs, unsigned long v1,
>   regs->gpr[ra] = perm;
>  }
>  #endif /* CONFIG_PPC64 */
> +/*
> + * The size parameter adjusts the equivalent prty instruction.
> + * prtyw = 32, prtyd = 64
> + */
> +static nokprobe_inline void do_prty(struct pt_regs *regs, unsigned long v,
> + int size, int ra)
> +{
> + unsigned long long res = v ^ (v >> 8);
> +
> + res ^= res >> 16;
> + if (size == 32) {   /* prtyw */
> + regs->gpr[ra] = res & 0x0000000100000001;
> + return;
> + }
> +
> + res ^= res >> 32;
> + regs->gpr[ra] = res & 1;/*prtyd */
> +}
>  
>  static nokprobe_inline int trap_compare(long v1, long v2)
>  {
> @@ -1247,6 +1265,14 @@ int analyse_instr(struct instruction_op *op, struct 
> pt_regs *regs,
>   case 124:   /* nor */
>   regs->gpr[ra] = ~(regs->gpr[rd] | regs->gpr[rb]);
>   goto logical_done;
> +
> + case 154:   /* prtyw */
> + do_prty(regs, regs->gpr[rd], 32, ra);
> + goto logical_done;
> +
> + case 186:   /* prtyd */
> + do_prty(regs, regs->gpr[rd], 64, ra);
> + goto logical_done;
>  #ifdef CONFIG_PPC64
>   case 252:   /* bpermd */
>   do_bpermd(regs, regs->gpr[rd], regs->gpr[rb], ra);


Re: [PATCH v4 3/5] powerpc/lib/sstep: Add bpermd instruction emulation

2017-07-30 Thread Cyril Bur
On Mon, 2017-07-31 at 10:58 +1000, Matt Brown wrote:
> This adds emulation for the bpermd instruction.
> Tested for correctness against the bpermd instruction on ppc64le.
> 
> Signed-off-by: Matt Brown <matthew.brown@gmail.com>

Reviewed-by: Cyril Bur <cyril...@gmail.com>

> ---
> v4:
>   - change ifdef macro from __powerpc64__ to CONFIG_PPC64
> v2:
>   - fixed opcode
>   - added ifdef tags to do_bpermd func
>   - fixed bitshifting errors
> ---
>  arch/powerpc/lib/sstep.c | 24 +++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index 2fd7377..c9fd613 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -640,6 +640,24 @@ static nokprobe_inline void do_popcnt(struct pt_regs 
> *regs, unsigned long v1,
>   regs->gpr[ra] = out;/* popcntd */
>  }
>  
> +#ifdef CONFIG_PPC64
> +static nokprobe_inline void do_bpermd(struct pt_regs *regs, unsigned long v1,
> + unsigned long v2, int ra)
> +{
> + unsigned char perm, idx;
> + unsigned int i;
> +
> + perm = 0;
> + for (i = 0; i < 8; i++) {
> + idx = (v1 >> (i * 8)) & 0xff;
> + if (idx < 64)
> + if (v2 & PPC_BIT(idx))
> + perm |= 1 << i;
> + }
> + regs->gpr[ra] = perm;
> +}
> +#endif /* CONFIG_PPC64 */
> +
>  static nokprobe_inline int trap_compare(long v1, long v2)
>  {
>   int ret = 0;
> @@ -1229,7 +1247,11 @@ int analyse_instr(struct instruction_op *op, struct 
> pt_regs *regs,
>   case 124:   /* nor */
>   regs->gpr[ra] = ~(regs->gpr[rd] | regs->gpr[rb]);
>   goto logical_done;
> -
> +#ifdef CONFIG_PPC64
> + case 252:   /* bpermd */
> + do_bpermd(regs, regs->gpr[rd], regs->gpr[rb], ra);
> + goto logical_done;
> +#endif
>   case 284:   /* xor */
>   regs->gpr[ra] = ~(regs->gpr[rd] ^ regs->gpr[rb]);
>   goto logical_done;


Re: [PATCH v4 1/5] powerpc/lib/sstep: Add cmpb instruction emulation

2017-07-30 Thread Cyril Bur
On Mon, 2017-07-31 at 10:58 +1000, Matt Brown wrote:
> This patch adds emulation of the cmpb instruction, enabling xmon to
> emulate this instruction.
> Tested for correctness against the cmpb asm instruction on ppc64le.
> 
> Signed-off-by: Matt Brown <matthew.brown@gmail.com>

Reviewed-by: Cyril Bur <cyril...@gmail.com>

> ---
> v2: 
>   - fixed opcode
>   - fixed mask typecasting
> ---
>  arch/powerpc/lib/sstep.c | 20 
>  1 file changed, 20 insertions(+)
> 
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index 33117f8..87d277f 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -596,6 +596,22 @@ static nokprobe_inline void do_cmp_unsigned(struct 
> pt_regs *regs, unsigned long
>   regs->ccr = (regs->ccr & ~(0xf << shift)) | (crval << shift);
>  }
>  
> +static nokprobe_inline void do_cmpb(struct pt_regs *regs, unsigned long v1,
> + unsigned long v2, int rd)
> +{
> + unsigned long long out_val, mask;
> + int i;
> +
> + out_val = 0;
> + for (i = 0; i < 8; i++) {
> + mask = 0xffUL << (i * 8);
> + if ((v1 & mask) == (v2 & mask))
> + out_val |= mask;
> + }
> +
> + regs->gpr[rd] = out_val;
> +}
> +
>  static nokprobe_inline int trap_compare(long v1, long v2)
>  {
>   int ret = 0;
> @@ -1049,6 +1065,10 @@ int analyse_instr(struct instruction_op *op, struct 
> pt_regs *regs,
>   do_cmp_unsigned(regs, val, val2, rd >> 2);
>   goto instr_done;
>  
> + case 508: /* cmpb */
> + do_cmpb(regs, regs->gpr[rd], regs->gpr[rb], ra);
> + goto instr_done;
> +
>  /*
>   * Arithmetic instructions
>   */


Re: [PATCH] powerpc/boot: Fix 64-bit boot wrapper build with non-biarch compiler

2017-07-28 Thread Cyril Bur
On Wed, 2017-07-26 at 23:19 +1000, Michael Ellerman wrote:
> Historically the boot wrapper was always built 32-bit big endian, even
> for 64-bit kernels. That was because old firmwares didn't necessarily
> support booting a 64-bit image. Because of that arch/powerpc/boot/Makefile
> uses CROSS32CC for compilation.
> 
> However when we added 64-bit little endian support, we also added
> support for building the boot wrapper 64-bit. However we kept using
> CROSS32CC, because in most cases it is just CC and everything works.
> 
> However if the user doesn't specify CROSS32_COMPILE (which no one ever
> does AFAIK), and CC is *not* biarch (32/64-bit capable), then CROSS32CC
> becomes just "gcc". On native systems that is probably OK, but if we're
> cross building it definitely isn't, leading to eg:
> 
>   gcc ... -m64 -mlittle-endian -mabi=elfv2 ... arch/powerpc/boot/cpm-serial.c
>   gcc: error: unrecognized argument in option ‘-mabi=elfv2’
>   gcc: error: unrecognized command line option ‘-mlittle-endian’
>   make: *** [zImage] Error 2
> 
> To fix it, stop using CROSS32CC, because we may or may not be building
> 32-bit. Instead setup a BOOTCC, which defaults to CC, and only use
> CROSS32_COMPILE if it's set and we're building for 32-bit.
> 
> Fixes: 147c05168fc8 ("powerpc/boot: Add support for 64bit little endian 
> wrapper")
> Signed-off-by: Michael Ellerman <m...@ellerman.id.au>

Without this patch applied and using a 64bit LE only toolchain my
powernv_defconfig build fails:

gcc: error: unrecognized argument in option ‘-mabi=elfv2’
gcc: note: valid arguments to ‘-mabi=’ are: ms sysv
  BOOTAS  arch/powerpc/boot/crt0.o
  BOOTCC  arch/powerpc/boot/cuboot.o
gcc: error: unrecognized argument in option ‘-mabi=elfv2’
gcc: note: valid arguments to ‘-mabi=’ are: ms sysv
  COPYarch/powerpc/boot/zlib.h
gcc: error: unrecognized command line option ‘-mlittle-endian’; did you
mean ‘-fconvert=little-endian’?
gcc: error: unrecognized argument in option ‘-mabi=elfv2’
gcc: error: unrecognized command line option ‘-mlittle-endian’; did you
mean ‘-fconvert=little-endian’?
gcc: note: valid arguments to ‘-mabi=’ are: ms sysv
  COPYarch/powerpc/boot/zutil.h
  COPYarch/powerpc/boot/inffast.h
  COPYarch/powerpc/boot/zconf.h
make[1]: *** [arch/powerpc/boot/Makefile:201: arch/powerpc/boot/crt0.o]
Error 1
make[1]: *** Waiting for unfinished jobs
  MODPOST 244 modules
gcc: error: unrecognized command line option ‘-mlittle-endian’; did you
mean ‘-fconvert=little-endian’?
make[1]: *** [arch/powerpc/boot/Makefile:198: arch/powerpc/boot/cpm-
serial.o] Error 1
make[1]: *** [arch/powerpc/boot/Makefile:198:
arch/powerpc/boot/cuboot.o] Error 1
  COPYarch/powerpc/boot/inffixed.h
make: *** [arch/powerpc/Makefile:289: zImage] Error 2
make: *** Waiting for unfinished jobs

With this patch applied builds fine. Please merge!

Reviewed-by: Cyril Bur <cyril...@gmail.com>

> ---
>  arch/powerpc/boot/Makefile | 14 +++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
> index a7814a7b1523..6f952fe1f084 100644
> --- a/arch/powerpc/boot/Makefile
> +++ b/arch/powerpc/boot/Makefile
> @@ -25,12 +25,20 @@ compress-$(CONFIG_KERNEL_XZ)   := CONFIG_KERNEL_XZ
>  BOOTCFLAGS:= -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
>-fno-strict-aliasing -Os -msoft-float -pipe \
>-fomit-frame-pointer -fno-builtin -fPIC -nostdinc \
> -  -isystem $(shell $(CROSS32CC) -print-file-name=include) \
>-D$(compress-y)
>  
> +BOOTCC := $(CC)
>  ifdef CONFIG_PPC64_BOOT_WRAPPER
>  BOOTCFLAGS   += -m64
> +else
> +BOOTCFLAGS   += -m32
> +ifdef CROSS32_COMPILE
> +BOOTCC := $(CROSS32_COMPILE)gcc
> +endif
>  endif
> +
> +BOOTCFLAGS   += -isystem $(shell $(BOOTCC) -print-file-name=include)
> +
>  ifdef CONFIG_CPU_BIG_ENDIAN
>  BOOTCFLAGS   += -mbig-endian
>  else
> @@ -183,10 +191,10 @@ clean-files := $(zlib-) $(zlibheader-) 
> $(zliblinuxheader-) \
>   empty.c zImage.coff.lds zImage.ps3.lds zImage.lds
>  
>  quiet_cmd_bootcc = BOOTCC  $@
> -  cmd_bootcc = $(CROSS32CC) -Wp,-MD,$(depfile) $(BOOTCFLAGS) -c -o $@ $<
> +  cmd_bootcc = $(BOOTCC) -Wp,-MD,$(depfile) $(BOOTCFLAGS) -c -o $@ $<
>  
>  quiet_cmd_bootas = BOOTAS  $@
> -  cmd_bootas = $(CROSS32CC) -Wp,-MD,$(depfile) $(BOOTAFLAGS) -c -o $@ $<
> +  cmd_bootas = $(BOOTCC) -Wp,-MD,$(depfile) $(BOOTAFLAGS) -c -o $@ $<
>  
>  quiet_cmd_bootar = BOOTAR  $@
>cmd_bootar = $(CROSS32AR) -cr$(KBUILD_ARFLAGS) $@. $(filter-out 
> FORCE,$^); mv $@. $@


Re: [PATCH] powerpc/configs: Add a powernv_be_defconfig

2017-07-28 Thread Cyril Bur
On Mon, 2017-07-24 at 22:50 +1000, Michael Ellerman wrote:
> Although pretty much everyone using powernv is running little endian,
> we should still test we can build for big endian. So add a
> powernv_be_defconfig, which is autogenerated by flipping the endian
> symbol in powernv_defconfig.
> 
> Signed-off-by: Michael Ellerman <m...@ellerman.id.au>

Reviewed-by: Cyril Bur <cyril...@gmail.com>

> ---
>  arch/powerpc/Makefile  | 4 
>  arch/powerpc/configs/be.config | 1 +
>  2 files changed, 5 insertions(+)
>  create mode 100644 arch/powerpc/configs/be.config
> 
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 8d4ed73d5490..7b8eddfc46d2 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -316,6 +316,10 @@ PHONY += ppc64le_defconfig
>  ppc64le_defconfig:
>   $(call merge_into_defconfig,ppc64_defconfig,le)
>  
> +PHONY += powernv_be_defconfig
> +powernv_be_defconfig:
> + $(call merge_into_defconfig,powernv_defconfig,be)
> +
>  PHONY += mpc85xx_defconfig
>  mpc85xx_defconfig:
>   $(call merge_into_defconfig,mpc85xx_basic_defconfig,\
> diff --git a/arch/powerpc/configs/be.config b/arch/powerpc/configs/be.config
> new file mode 100644
> index ..c5cdc99a6530
> --- /dev/null
> +++ b/arch/powerpc/configs/be.config
> @@ -0,0 +1 @@
> +CONFIG_CPU_BIG_ENDIAN=y


Re: [PATCH V8 3/3] powernv: Add support to clear sensor groups data

2017-07-27 Thread Cyril Bur
On Wed, 2017-07-26 at 10:35 +0530, Shilpasri G Bhat wrote:
> Adds support for clearing different sensor groups. OCC inband sensor
> groups like CSM, Profiler, Job Scheduler can be cleared using this
> driver. The min/max of all sensors belonging to these sensor groups
> will be cleared.
> 

Hi Shilpasri,

I think some comments from v1 also apply here.

Other comments inline

Thanks,

Cyril

> Signed-off-by: Shilpasri G Bhat 
> ---
> Changes from V7:
> - s/send_occ_command/opal_sensor_groups_clear_history
> 
>  arch/powerpc/include/asm/opal-api.h|   3 +-
>  arch/powerpc/include/asm/opal.h|   2 +
>  arch/powerpc/include/uapi/asm/opal-occ.h   |  23 ++
>  arch/powerpc/platforms/powernv/Makefile|   2 +-
>  arch/powerpc/platforms/powernv/opal-occ.c  | 109 
> +
>  arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
>  arch/powerpc/platforms/powernv/opal.c  |   3 +
>  7 files changed, 141 insertions(+), 2 deletions(-)
>  create mode 100644 arch/powerpc/include/uapi/asm/opal-occ.h
>  create mode 100644 arch/powerpc/platforms/powernv/opal-occ.c
> 
> diff --git a/arch/powerpc/include/asm/opal-api.h 
> b/arch/powerpc/include/asm/opal-api.h
> index 0d37315..342738a 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -195,7 +195,8 @@
>  #define OPAL_SET_POWERCAP		153
>  #define OPAL_GET_PSR			154
>  #define OPAL_SET_PSR			155
> -#define OPAL_LAST			155
> +#define OPAL_SENSOR_GROUPS_CLEAR	156
> +#define OPAL_LAST			156
>  
>  /* Device tree flags */
>  
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 58b30a4..92db6af 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -271,6 +271,7 @@ int64_t opal_xive_set_vp_info(uint64_t vp,
>  int opal_set_powercap(u32 handle, int token, u32 pcap);
>  int opal_get_power_shifting_ratio(u32 handle, int token, u32 *psr);
>  int opal_set_power_shifting_ratio(u32 handle, int token, u32 psr);
> +int opal_sensor_groups_clear(u32 group_hndl, int token);
>  
>  /* Internal functions */
>  extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
> @@ -351,6 +352,7 @@ static inline int opal_get_async_rc(struct opal_msg msg)
>  
>  void opal_powercap_init(void);
>  void opal_psr_init(void);
> +int opal_sensor_groups_clear_history(u32 handle);
>  
>  #endif /* __ASSEMBLY__ */
>  
> diff --git a/arch/powerpc/include/uapi/asm/opal-occ.h 
> b/arch/powerpc/include/uapi/asm/opal-occ.h
> new file mode 100644
> index 000..97c45e2
> --- /dev/null
> +++ b/arch/powerpc/include/uapi/asm/opal-occ.h
> @@ -0,0 +1,23 @@
> +/*
> + * OPAL OCC command interface
> + * Supported on POWERNV platform
> + *
> + * (C) Copyright IBM 2017
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2, or (at your option)
> + * any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#ifndef _UAPI_ASM_POWERPC_OPAL_OCC_H_
> +#define _UAPI_ASM_POWERPC_OPAL_OCC_H_
> +
> +#define OPAL_OCC_IOCTL_CLEAR_SENSOR_GROUPS   _IOR('o', 1, u32)
> +
> +#endif /* _UAPI_ASM_POWERPC_OPAL_OCC_H */
> diff --git a/arch/powerpc/platforms/powernv/Makefile 
> b/arch/powerpc/platforms/powernv/Makefile
> index 9ed7d33..f193b33 100644
> --- a/arch/powerpc/platforms/powernv/Makefile
> +++ b/arch/powerpc/platforms/powernv/Makefile
> @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o opal-async.o idle.o
>  obj-y			+= opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
>  obj-y			+= rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
>  obj-y			+= opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
> -obj-y			+= opal-kmsg.o opal-powercap.o opal-psr.o
> +obj-y			+= opal-kmsg.o opal-powercap.o opal-psr.o opal-occ.o
>  
>  obj-$(CONFIG_SMP)+= smp.o subcore.o subcore-asm.o
>  obj-$(CONFIG_PCI)+= pci.o pci-ioda.o npu-dma.o
> diff --git a/arch/powerpc/platforms/powernv/opal-occ.c 
> b/arch/powerpc/platforms/powernv/opal-occ.c
> new file mode 100644
> index 000..d1d4b28
> --- /dev/null
> +++ b/arch/powerpc/platforms/powernv/opal-occ.c
> @@ -0,0 +1,109 @@
> +/*
> + * Copyright IBM Corporation 2017
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 

Re: [PATCH V8 2/3] powernv: Add support to set power-shifting-ratio

2017-07-27 Thread Cyril Bur
On Wed, 2017-07-26 at 10:35 +0530, Shilpasri G Bhat wrote:
> This patch adds support to set power-shifting-ratio for CPU-GPU which
> is used by OCC power capping algorithm.
> 
> Signed-off-by: Shilpasri G Bhat 


Hi Shilpasri,

I started looking through this - a lot of the comments to patch 1/3
apply here, so I'll stop repeating myself :).


Thanks,

Cyril
> ---
> Changes from V7:
> - Replaced sscanf with kstrtoint
> 
>  arch/powerpc/include/asm/opal-api.h|   4 +-
>  arch/powerpc/include/asm/opal.h|   3 +
>  arch/powerpc/platforms/powernv/Makefile|   2 +-
>  arch/powerpc/platforms/powernv/opal-psr.c  | 169 
> +
>  arch/powerpc/platforms/powernv/opal-wrappers.S |   2 +
>  arch/powerpc/platforms/powernv/opal.c  |   3 +
>  6 files changed, 181 insertions(+), 2 deletions(-)
>  create mode 100644 arch/powerpc/platforms/powernv/opal-psr.c
> 
> diff --git a/arch/powerpc/include/asm/opal-api.h 
> b/arch/powerpc/include/asm/opal-api.h
> index c3e0c4a..0d37315 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -193,7 +193,9 @@
>  #define OPAL_NPU_MAP_LPAR		148
>  #define OPAL_GET_POWERCAP		152
>  #define OPAL_SET_POWERCAP		153
> -#define OPAL_LAST			153
> +#define OPAL_GET_PSR			154
> +#define OPAL_SET_PSR			155
> +#define OPAL_LAST			155
>  
>  /* Device tree flags */
>  
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index ec2087c..58b30a4 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -269,6 +269,8 @@ int64_t opal_xive_set_vp_info(uint64_t vp,
>  int64_t opal_xive_dump(uint32_t type, uint32_t id);
>  int opal_get_powercap(u32 handle, int token, u32 *pcap);
>  int opal_set_powercap(u32 handle, int token, u32 pcap);
> +int opal_get_power_shifting_ratio(u32 handle, int token, u32 *psr);
> +int opal_set_power_shifting_ratio(u32 handle, int token, u32 psr);
>  
>  /* Internal functions */
>  extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
> @@ -348,6 +350,7 @@ static inline int opal_get_async_rc(struct opal_msg msg)
>  void opal_wake_poller(void);
>  
>  void opal_powercap_init(void);
> +void opal_psr_init(void);
>  
>  #endif /* __ASSEMBLY__ */
>  
> diff --git a/arch/powerpc/platforms/powernv/Makefile 
> b/arch/powerpc/platforms/powernv/Makefile
> index e79f806..9ed7d33 100644
> --- a/arch/powerpc/platforms/powernv/Makefile
> +++ b/arch/powerpc/platforms/powernv/Makefile
> @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o opal-async.o idle.o
>  obj-y			+= opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
>  obj-y			+= rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
>  obj-y			+= opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
> -obj-y			+= opal-kmsg.o opal-powercap.o
> +obj-y			+= opal-kmsg.o opal-powercap.o opal-psr.o
>  
>  obj-$(CONFIG_SMP)+= smp.o subcore.o subcore-asm.o
>  obj-$(CONFIG_PCI)+= pci.o pci-ioda.o npu-dma.o
> diff --git a/arch/powerpc/platforms/powernv/opal-psr.c 
> b/arch/powerpc/platforms/powernv/opal-psr.c
> new file mode 100644
> index 000..07e3f78
> --- /dev/null
> +++ b/arch/powerpc/platforms/powernv/opal-psr.c
> @@ -0,0 +1,169 @@
> +/*
> + * PowerNV OPAL Power-Shifting-Ratio interface
> + *
> + * Copyright 2017 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#define pr_fmt(fmt) "opal-psr: " fmt
> +
> +#include <linux/of.h>
> +#include <linux/kobject.h>
> +#include <linux/slab.h>
> +
> +#include <asm/opal.h>
> +
> +DEFINE_MUTEX(psr_mutex);
> +
> +static struct kobject *psr_kobj;
> +
> +struct psr_attr {
> + u32 handle;
> + struct kobj_attribute attr;
> +};
> +
> +static struct psr_attr *psr_attrs;
> +static struct kobject *psr_kobj;
> +
> +static ssize_t psr_show(struct kobject *kobj, struct kobj_attribute *attr,
> + char *buf)
> +{
> + struct psr_attr *psr_attr = container_of(attr, struct psr_attr, attr);
> + struct opal_msg msg;
> + int psr, ret, token;
> +
> + token = opal_async_get_token_interruptible();
> + if (token < 0) {
> + pr_devel("Failed to get token\n");
> + return token;
> + }
> +
> + mutex_lock(&psr_mutex);
> + ret = opal_get_power_shifting_ratio(psr_attr->handle, token, &psr);

__pa()

> + switch (ret) {
> + case OPAL_ASYNC_COMPLETION:
> + ret = opal_async_wait_response(token, &msg);
> + if (ret) {
> + 

Re: [PATCH V8 1/3] powernv: powercap: Add support for powercap framework

2017-07-27 Thread Cyril Bur
On Wed, 2017-07-26 at 10:35 +0530, Shilpasri G Bhat wrote:
> Adds a generic powercap framework to change the system powercap
> inband through OPAL-OCC command/response interface.
> 
> Signed-off-by: Shilpasri G Bhat 
> ---
> Changes from V7:
> - Replaced sscanf with kstrtoint
> 
>  arch/powerpc/include/asm/opal-api.h|   5 +-
>  arch/powerpc/include/asm/opal.h|   4 +
>  arch/powerpc/platforms/powernv/Makefile|   2 +-
>  arch/powerpc/platforms/powernv/opal-powercap.c | 237 
> +
>  arch/powerpc/platforms/powernv/opal-wrappers.S |   2 +
>  arch/powerpc/platforms/powernv/opal.c  |   4 +
>  6 files changed, 252 insertions(+), 2 deletions(-)
>  create mode 100644 arch/powerpc/platforms/powernv/opal-powercap.c
> 
> diff --git a/arch/powerpc/include/asm/opal-api.h 
> b/arch/powerpc/include/asm/opal-api.h
> index 3130a73..c3e0c4a 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -42,6 +42,7 @@
>  #define OPAL_I2C_STOP_ERR		-24
>  #define OPAL_XIVE_PROVISIONING	-31
>  #define OPAL_XIVE_FREE_ACTIVE		-32
> +#define OPAL_TIMEOUT			-33
>  
>  /* API Tokens (in r0) */
>  #define OPAL_INVALID_CALL   -1
> @@ -190,7 +191,9 @@
>  #define OPAL_NPU_INIT_CONTEXT		146
>  #define OPAL_NPU_DESTROY_CONTEXT	147
>  #define OPAL_NPU_MAP_LPAR		148
> -#define OPAL_LAST			148
> +#define OPAL_GET_POWERCAP		152
> +#define OPAL_SET_POWERCAP		153
> +#define OPAL_LAST			153
>  
>  /* Device tree flags */
>  
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 588fb1c..ec2087c 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -267,6 +267,8 @@ int64_t opal_xive_set_vp_info(uint64_t vp,
>  int64_t opal_xive_free_irq(uint32_t girq);
>  int64_t opal_xive_sync(uint32_t type, uint32_t id);
>  int64_t opal_xive_dump(uint32_t type, uint32_t id);
> +int opal_get_powercap(u32 handle, int token, u32 *pcap);
> +int opal_set_powercap(u32 handle, int token, u32 pcap);
>  
>  /* Internal functions */
>  extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
> @@ -345,6 +347,8 @@ static inline int opal_get_async_rc(struct opal_msg msg)
>  
>  void opal_wake_poller(void);
>  
> +void opal_powercap_init(void);
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif /* _ASM_POWERPC_OPAL_H */
> diff --git a/arch/powerpc/platforms/powernv/Makefile 
> b/arch/powerpc/platforms/powernv/Makefile
> index b5d98cb..e79f806 100644
> --- a/arch/powerpc/platforms/powernv/Makefile
> +++ b/arch/powerpc/platforms/powernv/Makefile
> @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o opal-async.o idle.o
>  obj-y			+= opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
>  obj-y			+= rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
>  obj-y			+= opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
> -obj-y			+= opal-kmsg.o
> +obj-y			+= opal-kmsg.o opal-powercap.o
>  
>  obj-$(CONFIG_SMP)+= smp.o subcore.o subcore-asm.o
>  obj-$(CONFIG_PCI)+= pci.o pci-ioda.o npu-dma.o
> diff --git a/arch/powerpc/platforms/powernv/opal-powercap.c 
> b/arch/powerpc/platforms/powernv/opal-powercap.c
> new file mode 100644
> index 000..7c57f4b
> --- /dev/null
> +++ b/arch/powerpc/platforms/powernv/opal-powercap.c
> @@ -0,0 +1,237 @@
> +/*
> + * PowerNV OPAL Powercap interface
> + *
> + * Copyright 2017 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#define pr_fmt(fmt) "opal-powercap: " fmt
> +
> +#include <linux/of.h>
> +#include <linux/kobject.h>
> +#include <linux/slab.h>
> +
> +#include <asm/opal.h>
> +
> +DEFINE_MUTEX(powercap_mutex);
> +
> +static struct kobject *powercap_kobj;
> +
> +struct powercap_attr {
> + u32 handle;
> + struct kobj_attribute attr;
> +};
> +
> +static struct attribute_group *pattr_groups;
> +static struct powercap_attr *pcap_attrs;
> +
> +static ssize_t powercap_show(struct kobject *kobj, struct kobj_attribute 
> *attr,
> +  char *buf)
> +{
> + struct powercap_attr *pcap_attr = container_of(attr,
> + struct powercap_attr, attr);
> + struct opal_msg msg;
> + u32 pcap;
> + int ret, token;
> +
> + token = opal_async_get_token_interruptible();
> + if (token < 0) {
> + pr_devel("Failed to get token\n");
> + return token;
> + }
> +
> + mutex_lock(&powercap_mutex);

If this is purely a userspace interface, 

Re: [PATCH] powerpc/tm: fix TM SPRs in code dump file

2017-07-23 Thread Cyril Bur
On Wed, 2017-07-19 at 01:44 -0400, Gustavo Romero wrote:
> Currently flush_tmregs_to_thread() does not update the thread
> structures from live state before a core dump, resulting in wrong values of
> TFHAR, TFIAR, and TEXASR in core dump files.
> 
> That commit fixes it by copying from live state to the appropriate thread
> structures when it's necessary.
> 
> Signed-off-by: Gustavo Romero <grom...@linux.vnet.ibm.com>

Gustavo was nice enough to provide me with a simple test case:

int main(void)
{
	/* Load recognisable values into the TM SPRs (0x4841434b is "HACK") */
	__builtin_set_texasr(0x4841434b);
	__builtin_set_tfhar(0xbfee00);
	__builtin_set_tfiar(0x4841434b);

	/* Dump core so the SPR values can be checked in the core file */
	asm volatile (".long 0x0");

	return 0;
}

Running this binary in a loop and inspecting the resulting core file
with a modified elfutils also provided by Gustavo
(https://sourceware.org/ml/elfutils-devel/2017-q3/msg00030.html)
should always show the values that those __builtin functions set.
__builtin_set_{texasr,tfhar,tfiar} are just wrappers around the
corresponding mtspr instruction.

On an unmodified 4.13-rc1 it takes in the order of 10 executions of the
test to observe an incorrect TM SPR values in the core file (typically
zero).

The above test was run on the same 4.13-rc1 with this patch applied for
a over 48 hours. The test was executed at a rate of about one run per
second. An incorrect value was never observed.

This gives me confidence that this patch is correct.

Running the kernel selftests does not detect any regressions.

Reviewed-by: Cyril Bur <cyril...@gmail.com>


> ---
>  arch/powerpc/kernel/ptrace.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> index 925a4ef..660ed39 100644
> --- a/arch/powerpc/kernel/ptrace.c
> +++ b/arch/powerpc/kernel/ptrace.c
> @@ -127,12 +127,19 @@ static void flush_tmregs_to_thread(struct task_struct 
> *tsk)
>* If task is not current, it will have been flushed already to
>* it's thread_struct during __switch_to().
>*
> -  * A reclaim flushes ALL the state.
> +  * A reclaim flushes ALL the state or if not in TM save TM SPRs
> +  * in the appropriate thread structures from live.
>*/
>  
> - if (tsk == current && MSR_TM_SUSPENDED(mfmsr()))
> - tm_reclaim_current(TM_CAUSE_SIGNAL);
> + if (tsk != current)
> + return;
>  
> + if (MSR_TM_SUSPENDED(mfmsr())) {
> + tm_reclaim_current(TM_CAUSE_SIGNAL);
> + } else {
> + tm_enable();
> + tm_save_sprs(&(tsk->thread));
> + }
>  }
>  #else
>  static inline void flush_tmregs_to_thread(struct task_struct *tsk) { }


Re: [PATCH v3 02/10] mtd: powernv_flash: Lock around concurrent access to OPAL

2017-07-17 Thread Cyril Bur
On Mon, 2017-07-17 at 19:29 +1000, Balbir Singh wrote:
> On Mon, 2017-07-17 at 17:55 +1000, Cyril Bur wrote:
> > On Mon, 2017-07-17 at 17:34 +1000, Balbir Singh wrote:
> > > On Wed, 2017-07-12 at 14:22 +1000, Cyril Bur wrote:
> > > > OPAL can only manage one flash access at a time and will return an
> > > > OPAL_BUSY error for each concurrent access to the flash. The simplest
> > > > way to prevent this from happening is with a mutex.
> > > > 
> > > > Signed-off-by: Cyril Bur <cyril...@gmail.com>
> > > > ---
> > > 
> > > Should the mutex_lock() be mutex_lock_interruptible()? Are we OK waiting 
> > > on
> > > the mutex while other operations with the lock are busy?
> > > 
> > 
> > This is a good question. My best interpretation is that
> > _interruptible() should be used when you'll only be coming from a user
> > context, which is mostly true for this driver; however, MTD does
> > provide kernel interfaces, so I was hesitant; there isn't a great deal
> > of use of _interruptible() in drivers/mtd.
> > 
> > Thoughts?
> 
> What are the kernel interfaces (I have not read through mtd in detail)?
> I would still like to see us not blocked in mutex_lock() across threads
> for parallel calls, one option is to use mutex_trylock() and return if
> someone already holds the mutex with -EBUSY, but you'll need to evaluate
> what that means for every call.
> 

Yeah, maybe mutex_trylock() is the way to go; thinking quickly, I don't
see how it could be a problem for userspace using powernv_flash. I'm
honestly not too sure about the depths of the MTD kernel interfaces,
but I've seen a tonne of cool stuff you could do with them, hence my
reluctance to go with _interruptible().
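
For reference, the two shapes being weighed look roughly like this
(a sketch only, not the driver as posted):

	/* Option 1: block on the lock, but let a signal abort the wait. */
	if (mutex_lock_interruptible(&info->lock))
		return -EINTR;
	/* ... do the OPAL flash op ... */
	mutex_unlock(&info->lock);

	/* Option 2: never sleep on the lock; report busy to the caller. */
	if (!mutex_trylock(&info->lock))
		return -EBUSY;
	/* ... do the OPAL flash op ... */
	mutex_unlock(&info->lock);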

Cyril
> Balbir Singh.
> 


Re: [PATCH v3 03/10] mtd: powernv_flash: Don't treat OPAL_SUCCESS as an error

2017-07-17 Thread Cyril Bur
On Mon, 2017-07-17 at 18:50 +1000, Balbir Singh wrote:
> On Wed, 2017-07-12 at 14:22 +1000, Cyril Bur wrote:
> > While this driver expects to interact asynchronously, OPAL is well
> > within its rights to return OPAL_SUCCESS to indicate that the operation
> > completed without the need for a callback. We shouldn't treat
> > OPAL_SUCCESS as an error rather we should wrap up and return promptly to
> > the caller.
> > 
> > Signed-off-by: Cyril Bur <cyril...@gmail.com>
> > ---
> > I'll note here that currently no OPAL exists that will return
> > OPAL_SUCCESS so there isn't the possibility of a bug today.
> 
> It would help if you mentioned that OPAL_SUCCESS can be returned by the async
> call: effectively, what we expected to be an asynchronous call with a
> callback is one where OPAL returned immediately with success.
> 

Ah, my favourite problem: commit messages.

Thanks,

Cyril

> Balbir Singh.
> 


Re: [PATCH v3 06/10] powerpc/opal: Rework the opal-async interface

2017-07-17 Thread Cyril Bur
On Mon, 2017-07-17 at 21:30 +1000, Balbir Singh wrote:
> On Wed, 2017-07-12 at 14:23 +1000, Cyril Bur wrote:
> > Future work will add an opal_async_wait_response_interruptible()
> > which will call wait_event_interruptible(). This work requires extra
> > token state to be tracked as wait_event_interruptible() can return and
> > the caller could release the token before OPAL responds.
> > 
> > Currently token state is tracked with two bitfields which are 64 bits
> > big but may not need to be as OPAL informs Linux how many async tokens
> > there are. It also uses an array indexed by token to store response
> > messages for each token.
> > 
> > The bitfields make it difficult to add more state and also provide a
> > hard maximum as to how many tokens there can be - it is possible that
> > OPAL will inform Linux that there are more than 64 tokens.
> > 
> > Rather than add a bitfield to track the extra state, rework the
> > internals slightly.
> > 
> > Signed-off-by: Cyril Bur <cyril...@gmail.com>
> > ---
> >  arch/powerpc/platforms/powernv/opal-async.c | 97 
> > -
> >  1 file changed, 53 insertions(+), 44 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/powernv/opal-async.c 
> > b/arch/powerpc/platforms/powernv/opal-async.c
> > index 1d56ac9da347..d692372a0363 100644
> > --- a/arch/powerpc/platforms/powernv/opal-async.c
> > +++ b/arch/powerpc/platforms/powernv/opal-async.c
> > @@ -1,7 +1,7 @@
> >  /*
> >   * PowerNV OPAL asynchronous completion interfaces
> >   *
> > - * Copyright 2013 IBM Corp.
> > + * Copyright 2013-2017 IBM Corp.
> >   *
> >   * This program is free software; you can redistribute it and/or
> >   * modify it under the terms of the GNU General Public License
> > @@ -23,40 +23,46 @@
>  #include <asm/machdep.h>
>  #include <asm/opal.h>
> >  
> -#define N_ASYNC_COMPLETIONS	64
> > +enum opal_async_token_state {
> > +   ASYNC_TOKEN_FREE,
> > +   ASYNC_TOKEN_ALLOCATED,
> > +   ASYNC_TOKEN_COMPLETED
> > +};
> 
> Are these states mutually exclusive? Does _COMPLETED imply that it is also
> _ALLOCATED? 

Yes

> ALLOCATED and FREE are confusing, I would use IN_USE and NOT_IN_USE
> for tokens. If these are mutually exclusive then you can use IN_USE and 
> !IN_USE
> 

Perhaps instead of _FREE it could be _UNALLOCATED ?

> > +
> > +struct opal_async_token {
> > +   enum opal_async_token_state state;
> > +   struct opal_msg response;
> > +};
> >  
> > -static DECLARE_BITMAP(opal_async_complete_map, N_ASYNC_COMPLETIONS) = {~0UL};
> > -static DECLARE_BITMAP(opal_async_token_map, N_ASYNC_COMPLETIONS);
> >  static DECLARE_WAIT_QUEUE_HEAD(opal_async_wait);
> >  static DEFINE_SPINLOCK(opal_async_comp_lock);
> >  static struct semaphore opal_async_sem;
> > -static struct opal_msg *opal_async_responses;
> >  static unsigned int opal_max_async_tokens;
> > +static struct opal_async_token *opal_async_tokens;
> >  
> >  static int __opal_async_get_token(void)
> >  {
> > unsigned long flags;
> > int token;
> >  
> > -   spin_lock_irqsave(&opal_async_comp_lock, flags);
> > -   token = find_first_bit(opal_async_complete_map, opal_max_async_tokens);
> > -   if (token >= opal_max_async_tokens) {
> > -   token = -EBUSY;
> > -   goto out;
> > -   }
> > -
> > -   if (__test_and_set_bit(token, opal_async_token_map)) {
> > -   token = -EBUSY;
> > -   goto out;
> > +   for (token = 0; token < opal_max_async_tokens; token++) {
> > +   spin_lock_irqsave(&opal_async_comp_lock, flags);
> 
> Why is the spin lock inside the for loop? If the last token is free, the
> number of times we'll take and release a lock is extensive, why are we
> doing it this way?
> 

Otherwise we might hold the lock for quite some time. At the moment I
think it isn't a big deal since OPAL gives us 8 tokens, but there is
current work to increase that number, and while it seems it might only
grow to 16, for a while it was looking like it might grow more.

In a previous iteration I had a check inside the loop but outside the
lock for (state == ASYNC_TOKEN_FREE), which would then proceed to take
the lock, check again and mark the token allocated...

Or I could put the lock around the loop; I'm not attached to any
particular approach.
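
That earlier variant looked roughly like this (a sketch from memory,
not the code as posted; the unlocked read is only a hint, so the state
is re-checked under the lock):

static int __opal_async_get_token(void)
{
	unsigned long flags;
	int token;

	for (token = 0; token < opal_max_async_tokens; token++) {
		/* Cheap unlocked peek: skip obviously busy tokens. */
		if (opal_async_tokens[token].state != ASYNC_TOKEN_FREE)
			continue;

		spin_lock_irqsave(&opal_async_comp_lock, flags);
		/* Re-check under the lock in case someone raced us. */
		if (opal_async_tokens[token].state == ASYNC_TOKEN_FREE) {
			opal_async_tokens[token].state = ASYNC_TOKEN_ALLOCATED;
			spin_unlock_irqrestore(&opal_async_comp_lock, flags);
			return token;
		}
		spin_unlock_irqrestore(&opal_async_comp_lock, flags);
	}

	return -EBUSY;
}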

> > +   if (opal_async_tokens[token].state == ASYNC_TOKEN_FREE) {
> > +   opal_async_tokens[token].state = ASYNC_TOKEN_ALLOCATED;
> > +   spin_unlock_irqrestore(&opal_async_comp_lock, flags);

Re: [PATCH v3 01/10] mtd: powernv_flash: Use WARN_ON_ONCE() rather than BUG_ON()

2017-07-17 Thread Cyril Bur
On Mon, 2017-07-17 at 13:33 +0200, Frans Klaver wrote:
> On Wed, Jul 12, 2017 at 6:22 AM, Cyril Bur <cyril...@gmail.com> wrote:
> > BUG_ON() should be reserved for situations where we can no longer
> > guarantee the integrity of the system. In the case where
> > powernv_flash_async_op() receives an impossible op, we can still
> > guarantee the integrity of the system.
> > 
> > Signed-off-by: Cyril Bur <cyril...@gmail.com>
> > ---
> >  drivers/mtd/devices/powernv_flash.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/mtd/devices/powernv_flash.c 
> > b/drivers/mtd/devices/powernv_flash.c
> > index f5396f26ddb4..a9a20c00687c 100644
> > --- a/drivers/mtd/devices/powernv_flash.c
> > +++ b/drivers/mtd/devices/powernv_flash.c
> > @@ -78,7 +78,8 @@ static int powernv_flash_async_op(struct mtd_info *mtd, 
> > enum flash_op op,
> > rc = opal_flash_erase(info->id, offset, len, token);
> > break;
> > default:
> > -   BUG_ON(1);
> > +   WARN_ON_ONCE(1);
> > +   return -EIO;
> 
> Based on the fact that all three values in enum flash_op are handled,
> I would go as far as stating that the default lemma adds no value and
> can be removed.
> 

The way I see it, it isn't doing any harm being there, and in cases of
future programmer error or corruption events that WARN_ON might prove
useful.

> Frans


Re: [PATCH v3 02/10] mtd: powernv_flash: Lock around concurrent access to OPAL

2017-07-17 Thread Cyril Bur
On Mon, 2017-07-17 at 17:34 +1000, Balbir Singh wrote:
> On Wed, 2017-07-12 at 14:22 +1000, Cyril Bur wrote:
> > OPAL can only manage one flash access at a time and will return an
> > OPAL_BUSY error for each concurrent access to the flash. The simplest
> > way to prevent this from happening is with a mutex.
> > 
> > Signed-off-by: Cyril Bur <cyril...@gmail.com>
> > ---
> 
> Should the mutex_lock() be mutex_lock_interruptible()? Are we OK waiting on
> the mutex while other operations with the lock are busy?
> 

This is a good question. My best interpretation is that
_interruptible() should be used when you'll only be coming from a user
context, which is mostly true for this driver; however, MTD does
provide kernel interfaces, so I was hesitant; there isn't a great deal
of use of _interruptible() in drivers/mtd.

Thoughts?

Cyril

> Balbir Singh.
> 


[PATCH v3 03/10] mtd: powernv_flash: Don't treat OPAL_SUCCESS as an error

2017-07-11 Thread Cyril Bur
While this driver expects to interact asynchronously, OPAL is well
within its rights to return OPAL_SUCCESS to indicate that the operation
completed without the need for a callback. We shouldn't treat
OPAL_SUCCESS as an error rather we should wrap up and return promptly to
the caller.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
I'll note here that currently no OPAL exists that will return
OPAL_SUCCESS so there isn't the possibility of a bug today.

 drivers/mtd/devices/powernv_flash.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/mtd/devices/powernv_flash.c 
b/drivers/mtd/devices/powernv_flash.c
index 7b41af06f4fe..d50b5f200f73 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -66,9 +66,8 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum 
flash_op op,
if (token < 0) {
if (token != -ERESTARTSYS)
dev_err(dev, "Failed to get an async token\n");
-
-   rc = token;
-   goto out;
+   mutex_unlock(&info->lock);
+   return token;
}
 
switch (op) {
@@ -87,23 +86,25 @@ static int powernv_flash_async_op(struct mtd_info *mtd, 
enum flash_op op,
goto out;
}
 
+   if (rc == OPAL_SUCCESS)
+   goto out_success;
+
if (rc != OPAL_ASYNC_COMPLETION) {
dev_err(dev, "opal_flash_async_op(op=%d) failed (rc %d)\n",
op, rc);
-   opal_async_release_token(token);
rc = -EIO;
goto out;
}
 
	rc = opal_async_wait_response(token, &msg);
-   opal_async_release_token(token);
-   mutex_unlock(&info->lock);
if (rc) {
dev_err(dev, "opal async wait failed (rc %d)\n", rc);
-   return -EIO;
+   rc = -EIO;
+   goto out;
}
 
rc = opal_get_async_rc(msg);
+out_success:
if (rc == OPAL_SUCCESS) {
rc = 0;
if (retlen)
@@ -112,8 +113,8 @@ static int powernv_flash_async_op(struct mtd_info *mtd, 
enum flash_op op,
rc = -EIO;
}
 
-   return rc;
 out:
+   opal_async_release_token(token);
-   mutex_unlock(&info->lock);
return rc;
 }
-- 
2.13.2



[PATCH v3 04/10] mtd: powernv_flash: Remove pointless goto in driver init

2017-07-11 Thread Cyril Bur
Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 drivers/mtd/devices/powernv_flash.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/mtd/devices/powernv_flash.c 
b/drivers/mtd/devices/powernv_flash.c
index d50b5f200f73..d7243b72ba6e 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -232,21 +232,20 @@ static int powernv_flash_probe(struct platform_device 
*pdev)
int ret;
 
data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
-   if (!data) {
-   ret = -ENOMEM;
-   goto out;
-   }
+   if (!data)
+   return -ENOMEM;
+
data->mtd.priv = data;
 
ret = of_property_read_u32(dev->of_node, "ibm,opal-id", &(data->id));
if (ret) {
dev_err(dev, "no device property 'ibm,opal-id'\n");
-   goto out;
+   return ret;
}
 
	ret = powernv_flash_set_driver_info(dev, &data->mtd);
if (ret)
-   goto out;
+   return ret;
 
	mutex_init(&data->lock);
 
@@ -257,10 +256,7 @@ static int powernv_flash_probe(struct platform_device 
*pdev)
 * with an ffs partition at the start, it should prove easier for users
 * to deal with partitions or not as they see fit
 */
-   ret = mtd_device_register(&data->mtd, NULL, 0);
-
-out:
-   return ret;
+   return mtd_device_register(&data->mtd, NULL, 0);
 }
 
 /**
-- 
2.13.2



[PATCH v3 08/10] powerpc/opal: Add opal_async_wait_response_interruptible() to opal-async

2017-07-11 Thread Cyril Bur
This patch adds an _interruptible version of opal_async_wait_response().
This is useful when a long running OPAL call is performed on behalf of a
userspace thread, for example, the opal_flash_{read,write,erase}
functions performed by the powernv-flash MTD driver.

It is foreseeable that these functions would take upwards of two
minutes, causing the wait_event() to block long enough to trigger hung
task warnings. Furthermore, wait_event_interruptible() is preferable as
otherwise there is no way for signals to stop the process, which is
going to be confusing in userspace.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
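For reviewers, the intended caller pattern is roughly as follows
(sketch only; variable names invented, and the actual MTD driver
conversion happens in a later patch in this series):

        token = opal_async_get_token_interruptible();
        if (token < 0)
                return token;

        rc = opal_flash_read(id, offset, __pa(buf), len, token);
        if (rc == OPAL_ASYNC_COMPLETION) {
                rc = opal_async_wait_response_interruptible(token, &msg);
                if (rc) {
                        /* Interrupted by a signal. OPAL still owns the
                         * token, so releasing it here only marks it
                         * ABANDONED; the response handler frees it when
                         * OPAL eventually responds. */
                }
        }
        opal_async_release_token(token);
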
 arch/powerpc/include/asm/opal.h |  2 ++
 arch/powerpc/platforms/powernv/opal-async.c | 87 +++--
 2 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 5553ad2f3e53..6e9e53d744f3 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -294,6 +294,8 @@ extern void opal_notifier_update_evt(uint64_t evt_mask, uint64_t evt_val);
 extern int opal_async_get_token_interruptible(void);
 extern int opal_async_release_token(int token);
 extern int opal_async_wait_response(uint64_t token, struct opal_msg *msg);
+extern int opal_async_wait_response_interruptible(uint64_t token,
+   struct opal_msg *msg);
 extern int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data);
 
 struct rtc_time;
diff --git a/arch/powerpc/platforms/powernv/opal-async.c b/arch/powerpc/platforms/powernv/opal-async.c
index d692372a0363..f6b30cfceb8f 100644
--- a/arch/powerpc/platforms/powernv/opal-async.c
+++ b/arch/powerpc/platforms/powernv/opal-async.c
@@ -26,6 +26,8 @@
 enum opal_async_token_state {
ASYNC_TOKEN_FREE,
ASYNC_TOKEN_ALLOCATED,
+   ASYNC_TOKEN_DISPATCHED,
+   ASYNC_TOKEN_ABANDONED,
ASYNC_TOKEN_COMPLETED
 };
 
@@ -59,8 +61,10 @@ static int __opal_async_get_token(void)
 }
 
 /*
- * Note: If the returned token is used in an opal call and opal returns
- * OPAL_ASYNC_COMPLETION you MUST call opal_async_wait_response() before
+ * Note: If the returned token is used in an opal call and opal
+ * returns OPAL_ASYNC_COMPLETION you MUST call one of
+ * opal_async_wait_response() or
+ * opal_async_wait_response_interruptible() at least once before
  * calling any other opal_async_* function
  */
 int opal_async_get_token_interruptible(void)
@@ -97,6 +101,16 @@ static int __opal_async_release_token(int token)
opal_async_tokens[token].state = ASYNC_TOKEN_FREE;
rc = 0;
break;
+   /*
+* DISPATCHED and ABANDONED tokens must wait for OPAL to
+* respond.
+* Mark a DISPATCHED token as ABANDONED so that the response
+* handling code knows no one cares and that it can
+* free it then.
+*/
+   case ASYNC_TOKEN_DISPATCHED:
+   opal_async_tokens[token].state = ASYNC_TOKEN_ABANDONED;
+   /* Fall through */
default:
rc = 1;
}
@@ -129,7 +143,11 @@ int opal_async_wait_response(uint64_t token, struct opal_msg *msg)
return -EINVAL;
}
 
-   /* Wakeup the poller before we wait for events to speed things
+   /*
+* There is no need to mark the token as dispatched, wait_event()
+* will block until the token completes.
+*
+* Wakeup the poller before we wait for events to speed things
 * up on platforms or simulators where the interrupts aren't
 * functional.
 */
@@ -142,11 +160,66 @@ int opal_async_wait_response(uint64_t token, struct opal_msg *msg)
 }
 EXPORT_SYMBOL_GPL(opal_async_wait_response);
 
+int opal_async_wait_response_interruptible(uint64_t token, struct opal_msg *msg)
+{
+   unsigned long flags;
+   int ret;
+
+   if (token >= opal_max_async_tokens) {
+   pr_err("%s: Invalid token passed\n", __func__);
+   return -EINVAL;
+   }
+
+   if (!msg) {
+   pr_err("%s: Invalid message pointer passed\n", __func__);
+   return -EINVAL;
+   }
+
+   /*
+* The first time this gets called we mark the token as DISPATCHED
+* so that if wait_event_interruptible() returns not zero and the
+* caller frees the token, we know not to actually free the token
+* until the response comes.
+*
+* Only change if the token is ALLOCATED - it may have been
+* completed even before the caller gets around to calling this
+* the first time.
+*
+* There is also a dirty great comment at the token allocation
+* function that if the opal call returns OPAL_ASYNC_COMPLETION to
+* the caller then the caller *must* call this or the
+* non-interruptible version before doing anything else with the
+* token.
+*/
+ 

[PATCH v3 06/10] powerpc/opal: Rework the opal-async interface

2017-07-11 Thread Cyril Bur
Future work will add an opal_async_wait_response_interruptible()
which will call wait_event_interruptible(). This work requires extra
token state to be tracked as wait_event_interruptible() can return and
the caller could release the token before OPAL responds.

Currently token state is tracked with two bitfields which are 64 bits
big but may not need to be, as OPAL informs Linux how many async tokens
there are. The code also uses an array indexed by token to store the
response message for each token.

The bitfields make it difficult to add more state and also impose a
hard maximum on how many tokens there can be - it is possible that
OPAL will inform Linux that there are more than 64 tokens.

Rather than add a bitfield to track the extra state, rework the
internals slightly.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
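The resulting token lifecycle, for reference (ASCII sketch; the
DISPATCHED and ABANDONED states arrive in a later patch):

        FREE      -> ALLOCATED    opal_async_get_token_interruptible()
        ALLOCATED -> COMPLETED    OPAL response message arrives
        ALLOCATED -> FREE         opal_async_release_token() before use
        COMPLETED -> FREE         opal_async_release_token()
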
 arch/powerpc/platforms/powernv/opal-async.c | 97 -
 1 file changed, 53 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-async.c b/arch/powerpc/platforms/powernv/opal-async.c
index 1d56ac9da347..d692372a0363 100644
--- a/arch/powerpc/platforms/powernv/opal-async.c
+++ b/arch/powerpc/platforms/powernv/opal-async.c
@@ -1,7 +1,7 @@
 /*
  * PowerNV OPAL asynchronous completion interfaces
  *
- * Copyright 2013 IBM Corp.
+ * Copyright 2013-2017 IBM Corp.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -23,40 +23,46 @@
 #include 
 #include 
 
-#define N_ASYNC_COMPLETIONS    64
+enum opal_async_token_state {
+   ASYNC_TOKEN_FREE,
+   ASYNC_TOKEN_ALLOCATED,
+   ASYNC_TOKEN_COMPLETED
+};
+
+struct opal_async_token {
+   enum opal_async_token_state state;
+   struct opal_msg response;
+};
 
-static DECLARE_BITMAP(opal_async_complete_map, N_ASYNC_COMPLETIONS) = {~0UL};
-static DECLARE_BITMAP(opal_async_token_map, N_ASYNC_COMPLETIONS);
 static DECLARE_WAIT_QUEUE_HEAD(opal_async_wait);
 static DEFINE_SPINLOCK(opal_async_comp_lock);
 static struct semaphore opal_async_sem;
-static struct opal_msg *opal_async_responses;
 static unsigned int opal_max_async_tokens;
+static struct opal_async_token *opal_async_tokens;
 
 static int __opal_async_get_token(void)
 {
unsigned long flags;
int token;
 
-   spin_lock_irqsave(&opal_async_comp_lock, flags);
-   token = find_first_bit(opal_async_complete_map, opal_max_async_tokens);
-   if (token >= opal_max_async_tokens) {
-   token = -EBUSY;
-   goto out;
-   }
-
-   if (__test_and_set_bit(token, opal_async_token_map)) {
-   token = -EBUSY;
-   goto out;
+   for (token = 0; token < opal_max_async_tokens; token++) {
+   spin_lock_irqsave(&opal_async_comp_lock, flags);
+   if (opal_async_tokens[token].state == ASYNC_TOKEN_FREE) {
+   opal_async_tokens[token].state = ASYNC_TOKEN_ALLOCATED;
+   spin_unlock_irqrestore(&opal_async_comp_lock, flags);
+   return token;
+   }
+   spin_unlock_irqrestore(&opal_async_comp_lock, flags);
}
 
-   __clear_bit(token, opal_async_complete_map);
-
-out:
-   spin_unlock_irqrestore(&opal_async_comp_lock, flags);
-   return token;
+   return -EBUSY;
 }
 
+/*
+ * Note: If the returned token is used in an opal call and opal returns
+ * OPAL_ASYNC_COMPLETION you MUST call opal_async_wait_response() before
+ * calling any other opal_async_* function
+ */
 int opal_async_get_token_interruptible(void)
 {
int token;
@@ -76,6 +82,7 @@ EXPORT_SYMBOL_GPL(opal_async_get_token_interruptible);
 static int __opal_async_release_token(int token)
 {
unsigned long flags;
+   int rc;
 
if (token < 0 || token >= opal_max_async_tokens) {
pr_err("%s: Passed token is out of range, token %d\n",
@@ -84,11 +91,18 @@ static int __opal_async_release_token(int token)
}
 
spin_lock_irqsave(&opal_async_comp_lock, flags);
-   __set_bit(token, opal_async_complete_map);
-   __clear_bit(token, opal_async_token_map);
+   switch (opal_async_tokens[token].state) {
+   case ASYNC_TOKEN_COMPLETED:
+   case ASYNC_TOKEN_ALLOCATED:
+   opal_async_tokens[token].state = ASYNC_TOKEN_FREE;
+   rc = 0;
+   break;
+   default:
+   rc = 1;
+   }
spin_unlock_irqrestore(&opal_async_comp_lock, flags);
 
-   return 0;
+   return rc;
 }
 
 int opal_async_release_token(int token)
@@ -96,12 +110,10 @@ int opal_async_release_token(int token)
int ret;
 
ret = __opal_async_release_token(token);
-   if (ret)
-   return ret;
-
-   up(&opal_async_sem);
+   if (!ret)
+   up(&opal_async_sem);
 
-   return 0;
+   return ret;
 }
 EXPORT_SYMBOL_GPL(opal_async_release_token);
 
@@ -122,13 +134,15 @@ int opal_async_wait_response(uint

[PATCH v3 10/10] mtd: powernv_flash: Use opal_async_wait_response_interruptible()

2017-07-11 Thread Cyril Bur
The OPAL calls performed in this driver shouldn't be using
opal_async_wait_response() as this performs a wait_event() which, on
long running OPAL calls, could result in hung task warnings. wait_event()
also prevents timely signal delivery, which is undesirable.

This patch also attempts to quieten down the use of dev_err() when
errors haven't actually occurred, and to return better information up
the stack rather than always -EIO.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
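To illustrate the chunking point made in the comment below: the mtd
character device layer submits a large userspace request roughly like
this (paraphrased from drivers/mtd/mtdchar.c, not the exact code):

        while (count) {
                size_t retlen;

                ret = mtd_read(mtd, *ppos, len, &retlen, kbuf);
                if (ret)
                        break;  /* -EINTR stops the remaining chunks */
                *ppos += retlen;
                count -= retlen;
        }
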
 drivers/mtd/devices/powernv_flash.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c
index d7243b72ba6e..cfa274ba7e40 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -90,16 +90,34 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op,
goto out_success;
 
if (rc != OPAL_ASYNC_COMPLETION) {
-   dev_err(dev, "opal_flash_async_op(op=%d) failed (rc %d)\n",
+   if (rc != OPAL_BUSY)
+   dev_err(dev, "opal_flash_async_op(op=%d) failed (rc 
%d)\n",
op, rc);
-   rc = -EIO;
+   rc = opal_error_code(rc);
goto out;
}
 
-   rc = opal_async_wait_response(token, &msg);
+   rc = opal_async_wait_response_interruptible(token, &msg);
if (rc) {
-   dev_err(dev, "opal async wait failed (rc %d)\n", rc);
-   rc = -EIO;
+   /*
+* Awkward, we've been interrupted but we cannot return. If we
+* do return the mtd core will free the buffer we've just
+* passed to OPAL but OPAL will continue to read or write from
+* that memory.
+* Future work will introduce a call to tell OPAL to stop
+* using the buffer.
+* It may be tempting to ultimately return 0 if we're doing a
+* read or a write since we are going to end up waiting until
+* OPAL is done. However, because the MTD core sends us the
+* userspace request in chunks, we must report EINTR so that
+* it doesn't just send us the next chunk, thus defeating the
+* point of the _interruptible wait.
+*/
+   rc = -EINTR;
+   if (op == FLASH_OP_READ || op == FLASH_OP_WRITE) {
+   if (opal_async_wait_response(token, &msg))
+   dev_err(dev, "opal async wait failed (rc %d)\n", rc);
+   }
goto out;
}
 
-- 
2.13.2


