Re: [PATCH] selftests/powerpc: Skip tm-unavailable if TM is not enabled
On Mon, 2018-03-05 at 15:48 -0500, Gustavo Romero wrote: > Some processor revisions do not support transactional memory, and > additionally kernel support can be disabled. In either case the > tm-unavailable test should be skipped, otherwise it will fail with > a SIGILL. > > That commit also sets this selftest to be called through the test > harness as it's done for other TM selftests. > > Finally, it avoids using "ping" as a thread name since it's > ambiguous and can be confusing when shown, for instance, > in a kernel backtrace log. > I spent more time than I care to admit looking at backtraces wondering how "ping" got in the mix ;). > Fixes: 77fad8bfb1d2 ("selftests/powerpc: Check FP/VEC on exception in TM") > Signed-off-by: Gustavo Romero <grom...@linux.vnet.ibm.com> Reviewed-by: Cyril Bur <cyril...@gmail.com> > --- > .../testing/selftests/powerpc/tm/tm-unavailable.c | 24 > ++ > 1 file changed, 16 insertions(+), 8 deletions(-) > > diff --git a/tools/testing/selftests/powerpc/tm/tm-unavailable.c > b/tools/testing/selftests/powerpc/tm/tm-unavailable.c > index e6a0fad..156c8e7 100644 > --- a/tools/testing/selftests/powerpc/tm/tm-unavailable.c > +++ b/tools/testing/selftests/powerpc/tm/tm-unavailable.c > @@ -80,7 +80,7 @@ bool is_failure(uint64_t condition_reg) > return ((condition_reg >> 28) & 0xa) == 0xa; > } > > -void *ping(void *input) > +void *tm_una_ping(void *input) > { > > /* > @@ -280,7 +280,7 @@ void *ping(void *input) > } > > /* Thread to force context switch */ > -void *pong(void *not_used) > +void *tm_una_pong(void *not_used) > { > /* Wait thread get its name "pong". */ > if (DEBUG) > @@ -311,11 +311,11 @@ void test_fp_vec(int fp, int vec, pthread_attr_t *attr) > do { > int rc; > > - /* Bind 'ping' to CPU 0, as specified in 'attr'. */ > - rc = pthread_create(, attr, ping, (void *) ); > + /* Bind to CPU 0, as specified in 'attr'. 
*/ > + rc = pthread_create(, attr, tm_una_ping, (void *) ); > if (rc) > pr_err(rc, "pthread_create()"); > - rc = pthread_setname_np(t0, "ping"); > + rc = pthread_setname_np(t0, "tm_una_ping"); > if (rc) > pr_warn(rc, "pthread_setname_np"); > rc = pthread_join(t0, _value); > @@ -333,13 +333,15 @@ void test_fp_vec(int fp, int vec, pthread_attr_t *attr) > } > } > > -int main(int argc, char **argv) > +int tm_unavailable_test(void) > { > int rc, exception; /* FP = 0, VEC = 1, VSX = 2 */ > pthread_t t1; > pthread_attr_t attr; > cpu_set_t cpuset; > > + SKIP_IF(!have_htm()); > + > /* Set only CPU 0 in the mask. Both threads will be bound to CPU 0. */ > CPU_ZERO(); > CPU_SET(0, ); > @@ -354,12 +356,12 @@ int main(int argc, char **argv) > if (rc) > pr_err(rc, "pthread_attr_setaffinity_np()"); > > - rc = pthread_create(, /* Bind 'pong' to CPU 0 */, pong, NULL); > + rc = pthread_create(, /* Bind to CPU 0 */, tm_una_pong, NULL); > if (rc) > pr_err(rc, "pthread_create()"); > > /* Name it for systemtap convenience */ > - rc = pthread_setname_np(t1, "pong"); > + rc = pthread_setname_np(t1, "tm_una_pong"); > if (rc) > pr_warn(rc, "pthread_create()"); > > @@ -394,3 +396,9 @@ int main(int argc, char **argv) > exit(0); > } > } > + > +int main(int argc, char **argv) > +{ > + test_harness_set_timeout(220); > + return test_harness(tm_unavailable_test, "tm_unavailable_test"); > +}
Re: [RFC PATCH 05/12] [WIP] powerpc/tm: Reclaim/recheckpoint on entry/exit
On Tue, 2018-02-20 at 16:25 +1100, Michael Neuling wrote:
> > > > @@ -1055,6 +1082,8 @@ void restore_tm_state(struct pt_regs *regs)
> > > >  	msr_diff = current->thread.ckpt_regs.msr & ~regs->msr;
> > > >  	msr_diff &= MSR_FP | MSR_VEC | MSR_VSX;
> > > >  
> > > > +	tm_recheckpoint(&current->thread);
> > > > +
> > > 
> > > So why do we do tm_recheckpoint at all? Shouldn't most of the tm_blah
> > > code go away in process.c after all this?
> > 
> > I'm not sure I follow, we need to recheckpoint because we're going back
> > to userspace? Or would you rather calling the tm.S code directly from
> > the exception return path?
> 
> Yeah, I was thinking the point of this series was. We do tm_reclaim right on
> entry and tm_recheckpoint right on exit.

Yeah, that's the ultimate goal. Considering I haven't been attacked or
offered more drugs, I feel like what I've done isn't crazy. Your feedback
is great, thanks.

> The bits in between (ie. the tm_blah() calls in process.c) would mostly
> go away.
> 
> > Yes, I hope we'll be able to have a fairly big cleanup commit of tm_
> > code in process.c at the end of this series.
> 
> Yep, agreed.
> 
> Mikey
Re: [RFC PATCH 10/12] [WIP] powerpc/tm: Correctly save/restore checkpointed sprs
On Tue, 2018-02-20 at 14:00 +1100, Michael Neuling wrote: > This needs a description of what you're trying to do. "Correctly" doesn't > really mean anything. > > > On Tue, 2018-02-20 at 11:22 +1100, Cyril Bur wrote: > > --- > > arch/powerpc/kernel/process.c | 57 > > +- > > - > > arch/powerpc/kernel/ptrace.c | 9 +++ > > 2 files changed, 58 insertions(+), 8 deletions(-) > > > > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c > > index cd3ae80a6878..674f75c56172 100644 > > --- a/arch/powerpc/kernel/process.c > > +++ b/arch/powerpc/kernel/process.c > > @@ -859,6 +859,8 @@ static inline bool tm_enabled(struct task_struct *tsk) > > return tsk && tsk->thread.regs && (tsk->thread.regs->msr & MSR_TM); > > } > > > > +static inline void save_sprs(struct thread_struct *t); > > + > > static void tm_reclaim_thread(struct thread_struct *thr, uint8_t cause) > > { > > /* > > @@ -879,6 +881,8 @@ static void tm_reclaim_thread(struct thread_struct *thr, > > uint8_t cause) > > if (!MSR_TM_SUSPENDED(mfmsr())) > > return; > > > > + save_sprs(thr); > > + > > giveup_all(container_of(thr, struct task_struct, thread)); > > > > tm_reclaim(thr, cause); > > @@ -991,6 +995,37 @@ void tm_recheckpoint(struct thread_struct *thread) > > > > __tm_recheckpoint(thread); > > > > + /* > > +* This is a stripped down restore_sprs(), we need to do this > > +* now as we might go straight out to userspace and currently > > +* the checkpointed values are on the CPU. 
> > +* > > +* TODO: Improve > > +*/ > > +#ifdef CONFIG_ALTIVEC > > + if (cpu_has_feature(CPU_FTR_ALTIVEC)) > > + mtspr(SPRN_VRSAVE, thread->vrsave); > > +#endif > > +#ifdef CONFIG_PPC_BOOK3S_64 > > + if (cpu_has_feature(CPU_FTR_DSCR)) { > > + u64 dscr = get_paca()->dscr_default; > > + if (thread->dscr_inherit) > > + dscr = thread->dscr; > > + > > + mtspr(SPRN_DSCR, dscr); > > + } > > + > > + if (cpu_has_feature(CPU_FTR_ARCH_207S)) { > > + /* The EBB regs aren't checkpointed */ > > + mtspr(SPRN_FSCR, thread->fscr); > > + > > + mtspr(SPRN_TAR, thread->tar); > > + } > > + > > + /* I think we don't need to */ > > + if (cpu_has_feature(CPU_FTR_ARCH_300)) > > + mtspr(SPRN_TIDR, thread->tidr); > > +#endif > > Why are you touching all the above hunk? I copied restore_sprs. I'm tidying that up now - we can't call restore_sprs because we don't have a prev and next thread. > > > local_irq_restore(flags); > > } > > > > @@ -1193,6 +1228,11 @@ struct task_struct *__switch_to(struct task_struct > > *prev, > > #endif > > > > new_thread = >thread; > > + /* > > +* Why not >thread; ? > > +* What is the difference between >thread and > > +* >thread ? > > +*/ > > Why not just work it out and FIX THE CODE, rather than just rabbiting on about > it! :-P Agreed - I started to and then had a mini freakout that things would end really badly if they're not the same. So I left that comment as a reminder to investigate. They should be the same though right? > > > old_thread = >thread; > > > > WARN_ON(!irqs_disabled()); > > @@ -1237,8 +1277,16 @@ struct task_struct *__switch_to(struct task_struct > > *prev, > > /* > > * We need to save SPRs before treclaim/trecheckpoint as these will > > * change a number of them. > > +* > > +* Because we're now reclaiming on kernel entry, we've had to > > +* already save them. Don't do it again. 
> > +* Note: To deliver a signal in the signal context, we'll have > > +* turned off TM because we don't want the signal context to > > +* have the transactional state of the main thread - what if > > +* we go through switch to at that point? Can we? > > */ > > - save_sprs(>thread); > > + if (!prev->thread.regs || !MSR_TM_ACTIVE(prev->thread.regs->msr)) > > + save_sprs(>thread); > > > > /* Save FPU, Altivec, VSX and SPE state */ > > giveup_all(prev); > > @@ -1260,8 +1308,13 @@ struct task_struct *__switch_to(struct task_struct > > *prev, > > * for this is we manually crea
Re: [RFC PATCH 05/12] [WIP] powerpc/tm: Reclaim/recheckpoint on entry/exit
On Tue, 2018-02-20 at 13:50 +1100, Michael Neuling wrote: > On Tue, 2018-02-20 at 11:22 +1100, Cyril Bur wrote: > > > The comment from the cover sheet should be here > > > --- > > arch/powerpc/include/asm/exception-64s.h | 25 + > > arch/powerpc/kernel/entry_64.S | 5 + > > arch/powerpc/kernel/process.c| 37 > > > > 3 files changed, 63 insertions(+), 4 deletions(-) > > > > diff --git a/arch/powerpc/include/asm/exception-64s.h > > b/arch/powerpc/include/asm/exception-64s.h > > index 471b2274fbeb..f904f19a9ec2 100644 > > --- a/arch/powerpc/include/asm/exception-64s.h > > +++ b/arch/powerpc/include/asm/exception-64s.h > > @@ -35,6 +35,7 @@ > > * implementations as possible. > > */ > > #include > > +#include > > > > /* PACA save area offsets (exgen, exmc, etc) */ > > #define EX_R9 0 > > @@ -127,6 +128,26 @@ > > hrfid; \ > > b hrfi_flush_fallback > > > > +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM > > +#define TM_KERNEL_ENTRY > > \ > > + ld r3,_MSR(r1);\ > > + /* Probably don't need to check if coming from user/kernel */ \ > > + /* If TM is suspended or active then we must have come from*/ \ > > + /* userspace */ \ > > + andi. r0,r3,MSR_PR; \ > > + beq 1f; \ > > + rldicl. 
r3,r3,(64-MSR_TS_LG),(64-2); /* SUSPENDED or ACTIVE*/ \ > > + beql+ 1f; /* Not SUSPENDED or ACTIVE */ \ > > + bl save_nvgprs;\ > > + RECONCILE_IRQ_STATE(r10,r11); \ > > + li r3,TM_CAUSE_MISC; \ > > + bl tm_reclaim_current; /* uint8 cause */ \ > > +1: > > + > > +#else /* CONFIG_PPC_TRANSACTIONAL_MEM */ > > +#define TM_KERNEL_ENTRY > > +#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ > > + > > #ifdef CONFIG_RELOCATABLE > > #define __EXCEPTION_RELON_PROLOG_PSERIES_1(label, h) > > \ > > mfspr r11,SPRN_##h##SRR0; /* save SRR0 */ \ > > @@ -675,6 +696,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL) > > EXCEPTION_PROLOG_COMMON(trap, area);\ > > /* Volatile regs are potentially clobbered here */ \ > > additions; \ > > + /* This is going to need to go somewhere else as well */\ > > + /* See comment in tm_recheckpoint() */\ > > + TM_KERNEL_ENTRY;\ > > addir3,r1,STACK_FRAME_OVERHEAD; \ > > bl hdlr; \ > > b ret > > @@ -689,6 +713,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL) > > EXCEPTION_PROLOG_COMMON_3(trap);\ > > /* Volatile regs are potentially clobbered here */ \ > > additions; \ > > + TM_KERNEL_ENTRY;\ > > addir3,r1,STACK_FRAME_OVERHEAD; \ > > bl hdlr > > > > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S > > index 2cb5109a7ea3..107c15c6f48b 100644 > > --- a/arch/powerpc/kernel/entry_64.S > > +++ b/arch/powerpc/kernel/entry_64.S > > @@ -126,6 +126,11 @@ BEGIN_FW_FTR_SECTION > > 33: > > END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR) > > #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE && CONFIG_PPC_SPLPAR */ > > + TM_KERNEL_ENTRY > > + REST_GPR(0,r1) > > + REST_4GPRS(3,r1) > > + REST_2GPRS(7,r1) > > + addir9,r1,STACK_FRAME_OVERHEAD > > Why are we doing these restores here now? The syscall handler expects the syscall params to still be in their respective regs. 
> > > > > /* > > * A syscall should always be called with interrupts enabled > > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c > > index 77dc6d8288eb..ea75da0fd506 100644 > > --- a/arch/powerpc/kernel/process.c > > +++ b/arch/powerpc/kernel/process.c > > @@ -951,6 +951,23 @@ void tm_recheckpoint(struct thread_struct
Re: [RFC PATCH 06/12] [WIP] powerpc/tm: Remove dead code from __switch_to_tm()
On Tue, 2018-02-20 at 13:52 +1100, Michael Neuling wrote:
> Not sure I understand this.. should it be merged with the last patch?

It's all going to have to be one patch - I've left it split out to make
it more obvious which bits I've had to mess with; this series absolutely
doesn't bisect.

> Needs a comment here.
> 
> > On Tue, 2018-02-20 at 11:22 +1100, Cyril Bur wrote:
> > ---
> >  arch/powerpc/kernel/process.c | 24 +++++-------------------
> >  1 file changed, 5 insertions(+), 19 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index ea75da0fd506..574b05fe7d66 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -1027,27 +1027,13 @@ static inline void __switch_to_tm(struct task_struct *prev,
> >  		struct task_struct *new)
> >  {
> >  	/*
> > -	 * So, with the rework none of this code should not be needed.
> > -	 * I've left in the reclaim for now. This *should* save us
> > -	 * from any mistake in the new code. Also the
> > -	 * enabling/disabling logic of MSR_TM really should be
> > +	 * The enabling/disabling logic of MSR_TM really should be
> >  	 * refactored into a common way with MSR_{FP,VEC,VSX}
> >  	 */
> > -	if (cpu_has_feature(CPU_FTR_TM)) {
> > -		if (tm_enabled(prev) || tm_enabled(new))
> > -			tm_enable();
> > -
> > -		if (tm_enabled(prev)) {
> > -			prev->thread.load_tm++;
> > -			tm_reclaim_task(prev);
> > -			/*
> > -			 * The disabling logic may be confused don't
> > -			 * disable for now
> > -			 *
> > -			 * if (!MSR_TM_ACTIVE(prev->thread.regs->msr) && prev->thread.load_tm == 0)
> > -			 *	prev->thread.regs->msr &= ~MSR_TM;
> > -			 */
> > -		}
> > +	if (cpu_has_feature(CPU_FTR_TM) && tm_enabled(prev)) {
> > +		prev->thread.load_tm++;
> > +		if (!MSR_TM_ACTIVE(prev->thread.regs->msr) && prev->thread.load_tm == 0)
> > +			prev->thread.regs->msr &= ~MSR_TM;
> >  	}
> >  }
Re: [RFC PATCH 12/12] [WIP] selftests/powerpc: Remove incorrect tm-syscall selftest
On Tue, 2018-02-20 at 14:04 +1100, Michael Neuling wrote:
> > --- a/tools/testing/selftests/powerpc/tm/tm-syscall.c
> > +++ /dev/null
> > @@ -1,106 +0,0 @@
> > -/*
> > - * Copyright 2015, Sam Bobroff, IBM Corp.
> > - * Licensed under GPLv2.
> > - *
> > - * Test the kernel's system call code to ensure that a system call
> > - * made from within an active HTM transaction is aborted with the
> > - * correct failure code.
> 
> The above is still true.
> 
> > - * Conversely, ensure that a system call made from within a
> > - * suspended transaction can succeed.
> 
> This isn't true anymore.
> 
> So can we just modify the test to remove the second part?

Oh true, I overlooked that. Thanks.

> Mikey
[RFC PATCH 12/12] [WIP] selftests/powerpc: Remove incorrect tm-syscall selftest
Currently we perform transactional memory work as late as possible. That
is, we run in the kernel with the userspace checkpointed state on the CPU
until we absolutely must remove it and store it away - most likely on a
process switch, but possibly also for signals or ptrace.

What this means is that if userspace does a system call in suspended
mode, it is possible that we will handle the system call and return
without the need to do a reclaim/recheckpoint, and so userspace can
expect to resume its transaction. This is what tm-syscall tests for - the
ability to perform a system call in suspended state and still resume the
transaction afterwards.

The TM reworks mean that we now deal with any transactional state on
entry to the kernel, no matter the reason for entry (some exceptions
apply). We will categorically doom any suspended transaction that makes a
system call, making that transaction unresumable. This test will now
always fail.

It is worth noting that this new behaviour does not break userspace at
all. Hardware Transactional Memory gives zero guarantee of forward
progress, so any correct userspace already has to implement a non-HTM
fallback. Relying on this specific kernel behaviour also meant relying on
the stars aligning in the hardware such that there were no cache overlaps
and that the cache had a large enough footprint to handle any system call
without dooming a transaction.
--- tools/testing/selftests/powerpc/tm/Makefile| 4 +- .../testing/selftests/powerpc/tm/tm-syscall-asm.S | 28 -- tools/testing/selftests/powerpc/tm/tm-syscall.c| 106 - 3 files changed, 1 insertion(+), 137 deletions(-) delete mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.S delete mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall.c diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile index 7a1e53297588..88d6edffcb24 100644 --- a/tools/testing/selftests/powerpc/tm/Makefile +++ b/tools/testing/selftests/powerpc/tm/Makefile @@ -2,7 +2,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu \ tm-signal-context-chk-vmx tm-signal-context-chk-vsx -TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \ +TEST_GEN_PROGS := tm-resched-dscr tm-signal-msr-resv tm-signal-stack \ tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable tm-trap \ tm-signal-drop-transaction \ $(SIGNAL_CONTEXT_CHK_TESTS) @@ -13,8 +13,6 @@ $(TEST_GEN_PROGS): ../harness.c ../utils.c CFLAGS += -mhtm -$(OUTPUT)/tm-syscall: tm-syscall-asm.S -$(OUTPUT)/tm-syscall: CFLAGS += -I../../../../../usr/include $(OUTPUT)/tm-tmspr: CFLAGS += -pthread $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64 $(OUTPUT)/tm-resched-dscr: ../pmu/lib.o diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S deleted file mode 100644 index bd1ca25febe4.. --- a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S +++ /dev/null @@ -1,28 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#include -#include - - .text -FUNC_START(getppid_tm_active) - tbegin. - beq 1f - li r0, __NR_getppid - sc - tend. - blr -1: - li r3, -1 - blr - -FUNC_START(getppid_tm_suspended) - tbegin. - beq 1f - li r0, __NR_getppid - tsuspend. - sc - tresume. - tend. 
- blr -1: - li r3, -1 - blr diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall.c b/tools/testing/selftests/powerpc/tm/tm-syscall.c deleted file mode 100644 index 454b965a2db3.. --- a/tools/testing/selftests/powerpc/tm/tm-syscall.c +++ /dev/null @@ -1,106 +0,0 @@ -/* - * Copyright 2015, Sam Bobroff, IBM Corp. - * Licensed under GPLv2. - * - * Test the kernel's system call code to ensure that a system call - * made from within an active HTM transaction is aborted with the - * correct failure code. - * Conversely, ensure that a system call made from within a - * suspended transaction can succeed. - */ - -#include -#include -#include -#include -#include -#include - -#include "utils.h" -#include "tm.h" - -extern int getppid_tm_active(void); -extern int getppid_tm_suspended(void); - -unsigned retries = 0; - -#define TEST_DURATION 10 /* seconds */ -#define TM_RETRIES 100 - -pid_t getppid_tm(bool suspend) -{ - int i; - pid_t pid; - - for (i = 0; i < TM_RETRIES; i++) { - if (suspend) - pid = getppid_tm_suspended(); - else - pid = getppid_tm_active(); - - if (pid >= 0) - return pid; - - if (failure_is_persistent()) { - if (failure_is_syscall()) - return -1; - - printf("Unexpected
[RFC PATCH 10/12] [WIP] powerpc/tm: Correctly save/restore checkpointed sprs
--- arch/powerpc/kernel/process.c | 57 +-- arch/powerpc/kernel/ptrace.c | 9 +++ 2 files changed, 58 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index cd3ae80a6878..674f75c56172 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -859,6 +859,8 @@ static inline bool tm_enabled(struct task_struct *tsk) return tsk && tsk->thread.regs && (tsk->thread.regs->msr & MSR_TM); } +static inline void save_sprs(struct thread_struct *t); + static void tm_reclaim_thread(struct thread_struct *thr, uint8_t cause) { /* @@ -879,6 +881,8 @@ static void tm_reclaim_thread(struct thread_struct *thr, uint8_t cause) if (!MSR_TM_SUSPENDED(mfmsr())) return; + save_sprs(thr); + giveup_all(container_of(thr, struct task_struct, thread)); tm_reclaim(thr, cause); @@ -991,6 +995,37 @@ void tm_recheckpoint(struct thread_struct *thread) __tm_recheckpoint(thread); + /* +* This is a stripped down restore_sprs(), we need to do this +* now as we might go straight out to userspace and currently +* the checkpointed values are on the CPU. +* +* TODO: Improve +*/ +#ifdef CONFIG_ALTIVEC + if (cpu_has_feature(CPU_FTR_ALTIVEC)) + mtspr(SPRN_VRSAVE, thread->vrsave); +#endif +#ifdef CONFIG_PPC_BOOK3S_64 + if (cpu_has_feature(CPU_FTR_DSCR)) { + u64 dscr = get_paca()->dscr_default; + if (thread->dscr_inherit) + dscr = thread->dscr; + + mtspr(SPRN_DSCR, dscr); + } + + if (cpu_has_feature(CPU_FTR_ARCH_207S)) { + /* The EBB regs aren't checkpointed */ + mtspr(SPRN_FSCR, thread->fscr); + + mtspr(SPRN_TAR, thread->tar); + } + + /* I think we don't need to */ + if (cpu_has_feature(CPU_FTR_ARCH_300)) + mtspr(SPRN_TIDR, thread->tidr); +#endif local_irq_restore(flags); } @@ -1193,6 +1228,11 @@ struct task_struct *__switch_to(struct task_struct *prev, #endif new_thread = >thread; + /* +* Why not >thread; ? +* What is the difference between >thread and +* >thread ? 
+*/ old_thread = >thread; WARN_ON(!irqs_disabled()); @@ -1237,8 +1277,16 @@ struct task_struct *__switch_to(struct task_struct *prev, /* * We need to save SPRs before treclaim/trecheckpoint as these will * change a number of them. +* +* Because we're now reclaiming on kernel entry, we've had to +* already save them. Don't do it again. +* Note: To deliver a signal in the signal context, we'll have +* turned off TM because we don't want the signal context to +* have the transactional state of the main thread - what if +* we go through switch to at that point? Can we? */ - save_sprs(>thread); + if (!prev->thread.regs || !MSR_TM_ACTIVE(prev->thread.regs->msr)) + save_sprs(>thread); /* Save FPU, Altivec, VSX and SPE state */ giveup_all(prev); @@ -1260,8 +1308,13 @@ struct task_struct *__switch_to(struct task_struct *prev, * for this is we manually create a stack frame for new tasks that * directly returns through ret_from_fork() or * ret_from_kernel_thread(). See copy_thread() for details. +* +* It isn't stricly nessesary that we avoid the restore here +* because we'll simply restore again after the recheckpoint, +* but we can avoid it for performance reasons. */ - restore_sprs(old_thread, new_thread); + if (!new_thread->regs || !MSR_TM_ACTIVE(new_thread->regs->msr)) + restore_sprs(old_thread, new_thread); last = _switch(old_thread, new_thread); diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c index ca72d7391d40..16001987ba71 100644 --- a/arch/powerpc/kernel/ptrace.c +++ b/arch/powerpc/kernel/ptrace.c @@ -135,12 +135,9 @@ static void flush_tmregs_to_thread(struct task_struct *tsk) if ((!cpu_has_feature(CPU_FTR_TM)) || (tsk != current)) return; - if (MSR_TM_SUSPENDED(mfmsr())) { - tm_reclaim_current(TM_CAUSE_SIGNAL); - } else { - tm_enable(); - tm_save_sprs(&(tsk->thread)); - } + BUG_ON(MSR_TM_SUSPENDED(mfmsr())); + tm_enable(); + tm_save_sprs(&(tsk->thread)); } #else static inline void flush_tmregs_to_thread(struct task_struct *tsk) { } -- 2.16.2
[RFC PATCH 08/12] [WIP] powerpc/tm: Fix *unavailable_tm exceptions
---
 arch/powerpc/kernel/process.c | 11 ++++++++++-
 arch/powerpc/kernel/traps.c   |  3 ---
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 574b05fe7d66..8a32fd062a2b 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -500,11 +500,20 @@ void giveup_all(struct task_struct *tsk)
 
 	usermsr = tsk->thread.regs->msr;
 
+	/*
+	 * The *_unavailable_tm() functions might call this in a
+	 * transaction but with no FP or VEC or VSX, meaning that the
+	 * if condition below will be true. This is bad since we will
+	 * have performed a reclaim but not set the TIF flag which
+	 * must be set in order to trigger the recheckpoint.
+	 *
+	 * possibleTODO: Move setting the TIF flag into reclaim code
+	 */
+	check_if_tm_restore_required(tsk);
 	if ((usermsr & msr_all_available) == 0)
 		return;
 
 	msr_check_and_set(msr_all_available);
-	check_if_tm_restore_required(tsk);
 
 	WARN_ON((usermsr & MSR_VSX) && !((usermsr & MSR_FP) && (usermsr & MSR_VEC)));
 
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 1e48d157196a..dccfcaf4f603 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1728,7 +1728,6 @@ void fp_unavailable_tm(struct pt_regs *regs)
 	 * If VMX is in use, the VRs now hold checkpointed values,
 	 * so we don't want to load the VRs from the thread_struct.
 	 */
-	tm_recheckpoint(&current->thread);
 }
 
 void altivec_unavailable_tm(struct pt_regs *regs)
@@ -1742,7 +1741,6 @@ void altivec_unavailable_tm(struct pt_regs *regs)
 		 regs->nip, regs->msr);
 	tm_reclaim_current(TM_CAUSE_FAC_UNAV);
 	current->thread.load_vec = 1;
-	tm_recheckpoint(&current->thread);
 	current->thread.used_vr = 1;
 }
 
@@ -1767,7 +1765,6 @@ void vsx_unavailable_tm(struct pt_regs *regs)
 
 	current->thread.load_vec = 1;
 	current->thread.load_fp = 1;
-	tm_recheckpoint(&current->thread);
 }
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
-- 
2.16.2
[RFC PATCH 11/12] [WIP] powerpc/tm: Afterthoughts
---
 arch/powerpc/kernel/process.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 674f75c56172..6ce41ee62b24 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1079,6 +1079,12 @@ static inline void __switch_to_tm(struct task_struct *prev,
 		if (!MSR_TM_ACTIVE(prev->thread.regs->msr) && prev->thread.load_tm == 0)
 			prev->thread.regs->msr &= ~MSR_TM;
 	}
+
+	/*
+	 * Now that we're reclaiming on kernel entry, we should never
+	 * get here still with user checkpointed state on the CPU
+	 */
+	BUG_ON(MSR_TM_ACTIVE(mfmsr()));
 }
 
 /*
@@ -1326,7 +1332,17 @@ struct task_struct *__switch_to(struct task_struct *prev,
 	}
 
 	if (current_thread_info()->task->thread.regs) {
-		restore_math(current_thread_info()->task->thread.regs);
+		/*
+		 * Calling this now has reloaded the live state, which
+		 * gets overwritten with the checkpointed state right
+		 * before the trecheckpoint. BUT the MSR still says
+		 * that the live state is on the CPU, which it isn't.
+		 *
+		 * restore_math(current_thread_info()->task->thread.regs);
+		 * Therefore:
+		 */
+		if (!MSR_TM_ACTIVE(current_thread_info()->task->thread.regs->msr))
+			restore_math(current_thread_info()->task->thread.regs);
 
 		/*
 		 * The copy-paste buffer can only store into foreign real
-- 
2.16.2
[RFC PATCH 04/12] selftests/powerpc: Use less common thread names
"ping" and "pong" (in particular "ping") are common names. If a selftest
causes a kernel BUG_ON or any kind of backtrace, the process name is
displayed. Setting a more unique name avoids confusion as to which
process caused the problem.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 tools/testing/selftests/powerpc/tm/tm-unavailable.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/powerpc/tm/tm-unavailable.c b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
index e6a0fad2bfd0..bcfa8add5748 100644
--- a/tools/testing/selftests/powerpc/tm/tm-unavailable.c
+++ b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
@@ -315,7 +315,7 @@ void test_fp_vec(int fp, int vec, pthread_attr_t *attr)
 		rc = pthread_create(&t0, attr, ping, (void *) &flags);
 		if (rc)
 			pr_err(rc, "pthread_create()");
-		rc = pthread_setname_np(t0, "ping");
+		rc = pthread_setname_np(t0, "tm-unavailable-ping");
 		if (rc)
 			pr_warn(rc, "pthread_setname_np");
 		rc = pthread_join(t0, &ret_value);
@@ -359,7 +359,7 @@ int main(int argc, char **argv)
 		pr_err(rc, "pthread_create()");
 
 	/* Name it for systemtap convenience */
-	rc = pthread_setname_np(t1, "pong");
+	rc = pthread_setname_np(t1, "tm-unavailable-pong");
 	if (rc)
 		pr_warn(rc, "pthread_create()");
 
-- 
2.16.2
[RFC PATCH 01/12] powerpc/tm: Remove struct thread_info param from tm_reclaim_thread()
tm_reclaim_thread() doesn't use the thread_info parameter anymore, and
both callers have to bother getting one despite having no other need for
a struct thread_info.

The parameter became unused in dc3106690b20 ("powerpc: tm: Always use
fp_state and vr_state to store live registers").

Just remove it and adjust the callers.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/kernel/process.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 1738c4127b32..77dc6d8288eb 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -850,8 +850,7 @@ static inline bool tm_enabled(struct task_struct *tsk)
 	return tsk && tsk->thread.regs && (tsk->thread.regs->msr & MSR_TM);
 }
 
-static void tm_reclaim_thread(struct thread_struct *thr,
-			      struct thread_info *ti, uint8_t cause)
+static void tm_reclaim_thread(struct thread_struct *thr, uint8_t cause)
 {
 	/*
 	 * Use the current MSR TM suspended bit to track if we have
@@ -898,7 +897,7 @@ static void tm_reclaim_thread(struct thread_struct *thr,
 void tm_reclaim_current(uint8_t cause)
 {
 	tm_enable();
-	tm_reclaim_thread(&current->thread, current_thread_info(), cause);
+	tm_reclaim_thread(&current->thread, cause);
 }
 
 static inline void tm_reclaim_task(struct task_struct *tsk)
@@ -929,7 +928,7 @@ static inline void tm_reclaim_task(struct task_struct *tsk)
 		 thr->regs->ccr, thr->regs->msr,
 		 thr->regs->trap);
 
-	tm_reclaim_thread(thr, task_thread_info(tsk), TM_CAUSE_RESCHED);
+	tm_reclaim_thread(thr, TM_CAUSE_RESCHED);
 
 	TM_DEBUG("--- tm_reclaim on pid %d complete\n",
 		 tsk->pid);
-- 
2.16.2
[RFC PATCH 07/12] [WIP] powerpc/tm: Add TM_KERNEL_ENTRY in more delicate exception pathes
--- arch/powerpc/kernel/entry_64.S | 15 ++- arch/powerpc/kernel/exceptions-64s.S | 31 --- 2 files changed, 42 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 107c15c6f48b..32e8d8f7e091 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -967,7 +967,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) bl __check_irq_replay cmpwi cr0,r3,0 beq .Lrestore_no_replay - + + /* +* We decide VERY late if we need to replay interrupts, theres +* not much which can be done about that so this will have to +* do +*/ + TM_KERNEL_ENTRY + /* +* This will restore r3 that TM_KERNEL_ENTRY clobbered. +* Clearly not ideal! I wonder if we could change the trap +* value beforehand... +*/ + bl __check_irq_replay + /* * We need to re-emit an interrupt. We do so by re-using our * existing exception frame. We first change the trap value, diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 3ac87e53b3da..c8899bf77fb0 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -504,6 +504,11 @@ EXC_COMMON_BEGIN(data_access_common) li r5,0x300 std r3,_DAR(r1) std r4,_DSISR(r1) + /* +* Can't do TM_KERNEL_ENTRY here as do_hash_page might jump to +* very late in the expection exit code, well after any +* possiblity of doing a recheckpoint +*/ BEGIN_MMU_FTR_SECTION b do_hash_page/* Try to handle as hpte fault */ MMU_FTR_SECTION_ELSE @@ -548,6 +553,11 @@ EXC_COMMON_BEGIN(instruction_access_common) li r5,0x400 std r3,_DAR(r1) std r4,_DSISR(r1) + /* +* Can't do TM_KERNEL_ENTRY here as do_hash_page might jump to +* very late in the expection exit code, well after any +* possiblity of doing a recheckpoint +*/ BEGIN_MMU_FTR_SECTION b do_hash_page/* Try to handle as hpte fault */ MMU_FTR_SECTION_ELSE @@ -761,6 +771,7 @@ EXC_COMMON_BEGIN(alignment_common) std r4,_DSISR(r1) bl save_nvgprs RECONCILE_IRQ_STATE(r10, r11) + TM_KERNEL_ENTRY 
addir3,r1,STACK_FRAME_OVERHEAD bl alignment_exception b ret_from_except @@ -1668,7 +1679,9 @@ do_hash_page: /* Here we have a page fault that hash_page can't handle. */ handle_page_fault: -11:andis. r0,r4,DSISR_DABRMATCH@h +11:TM_KERNEL_ENTRY + ld r4,_DSISR(r1) + andis. r0,r4,DSISR_DABRMATCH@h bne-handle_dabr_fault ld r4,_DAR(r1) ld r5,_DSISR(r1) @@ -1685,6 +1698,10 @@ handle_page_fault: /* We have a data breakpoint exception - handle it */ handle_dabr_fault: + /* +* Don't need to do TM_KERNEL_ENTRY here as we'll +* come from handle_page_fault: which has done it already +*/ bl save_nvgprs ld r4,_DAR(r1) ld r5,_DSISR(r1) @@ -1698,7 +1715,14 @@ handle_dabr_fault: * the PTE insertion */ 13:bl save_nvgprs - mr r5,r3 + /* +* Use a non-volatile as the TM code will call, r3 is the +* return value from __hash_page() so not exactly easy to get +* again. +*/ + mr r31,r3 + TM_KERNEL_ENTRY + mr r5, r31 addir3,r1,STACK_FRAME_OVERHEAD ld r4,_DAR(r1) bl low_hash_fault @@ -1713,7 +1737,8 @@ handle_dabr_fault: * the access, or panic if there isn't a handler. */ 77:bl save_nvgprs - mr r4,r3 + TM_KERNEL_ENTRY + ld r4,_DAR(r1) addir3,r1,STACK_FRAME_OVERHEAD li r5,SIGSEGV bl bad_page_fault -- 2.16.2
[RFC PATCH 09/12] [WIP] powerpc/tm: Tweak signal code to handle new reclaim/recheckpoint times
--- arch/powerpc/kernel/process.c | 13 - arch/powerpc/kernel/signal.c| 11 ++- arch/powerpc/kernel/signal_32.c | 16 ++-- arch/powerpc/kernel/signal_64.c | 41 + 4 files changed, 49 insertions(+), 32 deletions(-) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 8a32fd062a2b..cd3ae80a6878 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1070,9 +1070,20 @@ void restore_tm_state(struct pt_regs *regs) * again, anything else could lead to an incorrect ckpt_msr being * saved and therefore incorrect signal contexts. */ - clear_thread_flag(TIF_RESTORE_TM); + + /* +* So, on signals we're going to have cleared the TM bits from +* the MSR, meaning that heading to userspace signal handler +* this will be true. +* I'm not convinced clearing the TIF_RESTORE_TM flag is a +* good idea however, we should do it only if we actually +* recheckpoint, which we'll need to do once the signal +* hanlder is done and we're returning to the main thread of +* execution. +*/ if (!MSR_TM_ACTIVE(regs->msr)) return; + clear_thread_flag(TIF_RESTORE_TM); msr_diff = current->thread.ckpt_regs.msr & ~regs->msr; msr_diff &= MSR_FP | MSR_VEC | MSR_VSX; diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c index 61db86ecd318..4f0398c6ce03 100644 --- a/arch/powerpc/kernel/signal.c +++ b/arch/powerpc/kernel/signal.c @@ -191,16 +191,17 @@ unsigned long get_tm_stackpointer(struct task_struct *tsk) * * For signals taken in non-TM or suspended mode, we use the * normal/non-checkpointed stack pointer. +* +* We now do reclaims on kernel entry, we should absolutely +* never need to reclaim here. +* TODO Update the comment above if needed. 
*/ #ifdef CONFIG_PPC_TRANSACTIONAL_MEM BUG_ON(tsk != current); - if (MSR_TM_ACTIVE(tsk->thread.regs->msr)) { - tm_reclaim_current(TM_CAUSE_SIGNAL); - if (MSR_TM_TRANSACTIONAL(tsk->thread.regs->msr)) - return tsk->thread.ckpt_regs.gpr[1]; - } + if (MSR_TM_TRANSACTIONAL(tsk->thread.regs->msr)) + return tsk->thread.ckpt_regs.gpr[1]; #endif return tsk->thread.regs->gpr[1]; } diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c index a46de0035214..a87a7c8b5d9e 100644 --- a/arch/powerpc/kernel/signal_32.c +++ b/arch/powerpc/kernel/signal_32.c @@ -860,21 +860,9 @@ static long restore_tm_user_regs(struct pt_regs *regs, tm_enable(); /* Make sure the transaction is marked as failed */ current->thread.tm_texasr |= TEXASR_FS; - /* This loads the checkpointed FP/VEC state, if used */ - tm_recheckpoint(>thread); - /* This loads the speculative FP/VEC state, if used */ - msr_check_and_set(msr & (MSR_FP | MSR_VEC)); - if (msr & MSR_FP) { - load_fp_state(>thread.fp_state); - regs->msr |= (MSR_FP | current->thread.fpexc_mode); - } -#ifdef CONFIG_ALTIVEC - if (msr & MSR_VEC) { - load_vr_state(>thread.vr_state); - regs->msr |= MSR_VEC; - } -#endif + /* See comment in signal_64.c */ + set_thread_flag(TIF_RESTORE_TM); return 0; } diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index 720117690822..a7751d1fcac6 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -568,21 +568,20 @@ static long restore_tm_sigcontexts(struct task_struct *tsk, } } #endif - tm_enable(); /* Make sure the transaction is marked as failed */ tsk->thread.tm_texasr |= TEXASR_FS; - /* This loads the checkpointed FP/VEC state, if used */ - tm_recheckpoint(>thread); - msr_check_and_set(msr & (MSR_FP | MSR_VEC)); - if (msr & MSR_FP) { - load_fp_state(>thread.fp_state); - regs->msr |= (MSR_FP | tsk->thread.fpexc_mode); - } - if (msr & MSR_VEC) { - load_vr_state(>thread.vr_state); - regs->msr |= MSR_VEC; - } + /* +* I believe this 
is only nessesary if the +* clear_thread_flag(TIF_RESTORE_TM); in restore_tm_state() +* stays before the if (!MSR_TM_ACTIVE(regs->msr). +* +* Actually no, we should follow the comment in +* restore_tm_state() but this should ALSO be here if +* if the signal handler does something crazy like 'generate' +* a transaction. +*/ + set_thread_flag(TIF_RESTORE_TM); return err; } @@ -734,6 +733,22 @@ int sys_rt_sigreturn(unsigned long r3, unsigned long r4, unsigned long r5, if
[RFC PATCH 03/12] selftests/powerpc: Add tm-signal-drop-transaction TM test
This test uses a signal to 'discard' a transaction. That is, it delivers a signal to a thread in a suspended transaction and simply removes the suspended MSR bit. Because this sends the userspace thread back to the tbegin + 4 address, we should also set CR0 so that userspace sees a failed tbegin. Signed-off-by: Cyril Bur <cyril...@gmail.com> --- tools/testing/selftests/powerpc/tm/Makefile| 1 + .../powerpc/tm/tm-signal-drop-transaction.c| 74 ++ 2 files changed, 75 insertions(+) create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-drop-transaction.c diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile index a23453943ad2..7a1e53297588 100644 --- a/tools/testing/selftests/powerpc/tm/Makefile +++ b/tools/testing/selftests/powerpc/tm/Makefile @@ -4,6 +4,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \ tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable tm-trap \ + tm-signal-drop-transaction \ $(SIGNAL_CONTEXT_CHK_TESTS) include ../../lib.mk diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-drop-transaction.c b/tools/testing/selftests/powerpc/tm/tm-signal-drop-transaction.c new file mode 100644 index ..a8397f7e7faa --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/tm-signal-drop-transaction.c @@ -0,0 +1,74 @@ +/* + * Copyright 2018, Cyril Bur, IBM Corp. + * Licensed under GPLv2. + * + * This test uses a signal handler to make a thread go from + * transactional state to nothing state. In practice, why would + * userspace ever do this? In theory, it can. 
+ */ + +#include +#include +#include +#include +#include +#include +#include + +#include "utils.h" +#include "tm.h" + +static bool passed; + +static void signal_usr1(int signum, siginfo_t *info, void *uc) +{ + ucontext_t *ucp = uc; + struct pt_regs *regs = ucp->uc_mcontext.regs; + + passed = true; + + /* I really hope I got that right, we want to clear both MSR_TS bits */ + regs->msr &= ~(3ULL << 33); + /* Set CR0 to 0b0010 */ + regs->ccr &= ~(0xDULL << 28); +} + +int test_drop(void) +{ + struct sigaction act; + + SKIP_IF(!have_htm()); + + act.sa_sigaction = signal_usr1; + sigemptyset(&act.sa_mask); + act.sa_flags = SA_SIGINFO; + if (sigaction(SIGUSR1, &act, NULL) < 0) { + perror("sigaction sigusr1"); + exit(1); + } + + + asm __volatile__( + "tbegin.;" + "beq 1f; " + "tsuspend.;" + "1: ;" + : : : "memory", "cr0"); + + if (!passed && !tcheck_transactional()) { + fprintf(stderr, "Not in suspended state: 0x%1x\n", tcheck()); + exit(1); + } + + kill(getpid(), SIGUSR1); + + /* If we reach here, we've passed. Otherwise we've probably crashed +* the kernel */ + + return 0; +} + +int main(int argc, char *argv[]) +{ + return test_harness(test_drop, "tm_signal_drop_transaction"); +} -- 2.16.2
[RFC PATCH 05/12] [WIP] powerpc/tm: Reclaim/recheckpoint on entry/exit
--- arch/powerpc/include/asm/exception-64s.h | 25 + arch/powerpc/kernel/entry_64.S | 5 + arch/powerpc/kernel/process.c| 37 3 files changed, 63 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 471b2274fbeb..f904f19a9ec2 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -35,6 +35,7 @@ * implementations as possible. */ #include +#include /* PACA save area offsets (exgen, exmc, etc) */ #define EX_R9 0 @@ -127,6 +128,26 @@ hrfid; \ b hrfi_flush_fallback +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM +#define TM_KERNEL_ENTRY \ + ld r3,_MSR(r1);\ + /* Probably don't need to check if coming from user/kernel */ \ + /* If TM is suspended or active then we must have come from*/ \ + /* userspace */ \ + andi. r0,r3,MSR_PR; \ + beq 1f; \ + rldicl. r3,r3,(64-MSR_TS_LG),(64-2); /* SUSPENDED or ACTIVE*/ \ + beql+ 1f; /* Not SUSPENDED or ACTIVE */ \ + bl save_nvgprs;\ + RECONCILE_IRQ_STATE(r10,r11); \ + li r3,TM_CAUSE_MISC; \ + bl tm_reclaim_current; /* uint8 cause */ \ +1: + +#else /* CONFIG_PPC_TRANSACTIONAL_MEM */ +#define TM_KERNEL_ENTRY +#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ + #ifdef CONFIG_RELOCATABLE #define __EXCEPTION_RELON_PROLOG_PSERIES_1(label, h) \ mfspr r11,SPRN_##h##SRR0; /* save SRR0 */ \ @@ -675,6 +696,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL) EXCEPTION_PROLOG_COMMON(trap, area);\ /* Volatile regs are potentially clobbered here */ \ additions; \ + /* This is going to need to go somewhere else as well */\ + /* See comment in tm_recheckpoint() */\ + TM_KERNEL_ENTRY;\ addir3,r1,STACK_FRAME_OVERHEAD; \ bl hdlr; \ b ret @@ -689,6 +713,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL) EXCEPTION_PROLOG_COMMON_3(trap);\ /* Volatile regs are potentially clobbered here */ \ additions; \ + TM_KERNEL_ENTRY;\ addir3,r1,STACK_FRAME_OVERHEAD; \ bl hdlr diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 2cb5109a7ea3..107c15c6f48b 
100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -126,6 +126,11 @@ BEGIN_FW_FTR_SECTION 33: END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR) #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE && CONFIG_PPC_SPLPAR */ + TM_KERNEL_ENTRY + REST_GPR(0,r1) + REST_4GPRS(3,r1) + REST_2GPRS(7,r1) + addir9,r1,STACK_FRAME_OVERHEAD /* * A syscall should always be called with interrupts enabled diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 77dc6d8288eb..ea75da0fd506 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -951,6 +951,23 @@ void tm_recheckpoint(struct thread_struct *thread) if (!(thread->regs->msr & MSR_TM)) return; + /* +* This is 'that' comment. +* +* If we get where with tm suspended or active then something +* has gone wrong. I've added this now as a proof of concept. +* +* The problem I'm seeing without it is an attempt to +* recheckpoint a CPU without a previous reclaim. +* +* I'm probably missed an exception entry with the +* TM_KERNEL_ENTRY macro. Should be easy enough to find. +*/ + if (MSR_TM_ACTIVE(mfmsr())) + return; + + tm_enable(); + /* We really can't be interrupted here as the TEXASR registers can't * change and later in the trecheckpoint code, we have a userspace R1. * So let's hard disable over this region. @@ -1009,6 +1026,13 @@ static inline void tm_recheckpoint_new_task(struct task_struct *new) static inline void __switch_to_tm(struct task_struct *prev, struct task_struct *new) { + /* +* So, with the
[RFC PATCH 06/12] [WIP] powerpc/tm: Remove dead code from __switch_to_tm()
--- arch/powerpc/kernel/process.c | 24 +--- 1 file changed, 5 insertions(+), 19 deletions(-) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index ea75da0fd506..574b05fe7d66 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1027,27 +1027,13 @@ static inline void __switch_to_tm(struct task_struct *prev, struct task_struct *new) { /* -* So, with the rework none of this code should not be needed. -* I've left in the reclaim for now. This *should* save us -* from any mistake in the new code. Also the -* enabling/disabling logic of MSR_TM really should be +* The enabling/disabling logic of MSR_TM really should be * refactored into a common way with MSR_{FP,VEC,VSX} */ - if (cpu_has_feature(CPU_FTR_TM)) { - if (tm_enabled(prev) || tm_enabled(new)) - tm_enable(); - - if (tm_enabled(prev)) { - prev->thread.load_tm++; - tm_reclaim_task(prev); - /* -* The disabling logic may be confused don't -* disable for now -* -* if (!MSR_TM_ACTIVE(prev->thread.regs->msr) && prev->thread.load_tm == 0) -* prev->thread.regs->msr &= ~MSR_TM; -*/ - } + if (cpu_has_feature(CPU_FTR_TM) && tm_enabled(prev)) { + prev->thread.load_tm++; + if (!MSR_TM_ACTIVE(prev->thread.regs->msr) && prev->thread.load_tm == 0) + prev->thread.regs->msr &= ~MSR_TM; } } -- 2.16.2
[RFC PATCH 02/12] selftests/powerpc: Fix tm.h helpers
The tcheck() helpers were subtly wrong: tcheck() never executed mfcr, so the "result" in cr was whatever happened to be in the output register, and the & 4 mask discarded the doomed and suspended bits so tcheck_doomed() and tcheck_suspended() could never return true. Read the CR for real and return the full 4-bit field. Signed-off-by: Cyril Bur <cyril...@gmail.com> --- tools/testing/selftests/powerpc/tm/tm.h | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/powerpc/tm/tm.h b/tools/testing/selftests/powerpc/tm/tm.h index df4204247d45..e187a0d3160c 100644 --- a/tools/testing/selftests/powerpc/tm/tm.h +++ b/tools/testing/selftests/powerpc/tm/tm.h @@ -57,11 +57,11 @@ static inline bool failure_is_nesting(void) return (__builtin_get_texasru() & 0x40); } -static inline int tcheck(void) +static inline uint8_t tcheck(void) { - long cr; - asm volatile ("tcheck 0" : "=r"(cr) : : "cr0"); - return (cr >> 28) & 4; + unsigned long cr; + asm volatile ("tcheck 0; mfcr %0;" : "=r"(cr) : : "cr0"); + return (cr >> 28) & 0xF; } static inline bool tcheck_doomed(void) @@ -81,7 +81,7 @@ static inline bool tcheck_suspended(void) static inline bool tcheck_transactional(void) { - return tcheck() & 6; + return (tcheck_active()) || (tcheck_suspended()); } #endif /* _SELFTESTS_POWERPC_TM_TM_H */ -- 2.16.2
[RFC PATCH 00/12] Deal with TM on kernel entry and exit
This is very much a proof of concept and, if it isn't clear from the commit names, still a work in progress. I believe I have something that works - all the powerpc selftests pass. I would like to get some eyes on it to a) see if I've missed anything big and b) get some opinions on whether it looks like a net improvement. Obviously it is still a bit rough around the edges; I'll have to convince myself that the SPR code is correct. I don't think the TM_KERNEL_ENTRY macro needs to check that we came from userspace - if TM is on then we can probably assume it. Maybe a check not in the fastpath. Some of the BUG_ON()s will probably go. Background: Currently TM is dealt with when we need to. That is, when we switch processes, we'll (if necessary) reclaim the outgoing process and (if necessary) recheckpoint the incoming process. Same with signals: if we need to deliver a signal, we'll ensure we've reclaimed in order to have all the information and go from there. I, along with some others, got curious to see what it would look like if we did the 'opposite'. At all kernel entry points that won't simply zoom straight to an RFID we now check if the thread was transactional and do the reclaim. Correspondingly, we do the recheckpoint quite late on exception exit. It turns out we already had a lot of the code paths set up on the exit path, as there were things that TM had special cased on exit already. I wasn't sure if it would lead to more or less complexity and thought I'd have to try it to see. I feel like it was almost a win but SPRs did add some annoying caveats. In order to get this past Michael I'm going to prove it performs, or rather, doesn't slow anything down - workload suggestions welcome. 
Thanks, Cyril Bur (12): powerpc/tm: Remove struct thread_info param from tm_reclaim_thread() selftests/powerpc: Fix tm.h helpers selftests/powerpc: Add tm-signal-drop-transaction TM test selftests/powerpc: Use less common thread names [WIP] powerpc/tm: Reclaim/recheckpoint on entry/exit [WIP] powerpc/tm: Remove dead code from __switch_to_tm() [WIP] powerpc/tm: Add TM_KERNEL_ENTRY in more delicate exception pathes [WIP] powerpc/tm: Fix *unavailable_tm exceptions [WIP] powerpc/tm: Tweak signal code to handle new reclaim/recheckpoint times [WIP] powerpc/tm: Correctly save/restore checkpointed sprs [WIP] powerpc/tm: Afterthoughts [WIP] selftests/powerpc: Remove incorrect tm-syscall selftest arch/powerpc/include/asm/exception-64s.h | 25 arch/powerpc/kernel/entry_64.S | 20 ++- arch/powerpc/kernel/exceptions-64s.S | 31 - arch/powerpc/kernel/process.c | 145 ++--- arch/powerpc/kernel/ptrace.c | 9 +- arch/powerpc/kernel/signal.c | 11 +- arch/powerpc/kernel/signal_32.c| 16 +-- arch/powerpc/kernel/signal_64.c| 41 -- arch/powerpc/kernel/traps.c| 3 - tools/testing/selftests/powerpc/tm/Makefile| 5 +- .../powerpc/tm/tm-signal-drop-transaction.c| 74 +++ .../testing/selftests/powerpc/tm/tm-syscall-asm.S | 28 tools/testing/selftests/powerpc/tm/tm-syscall.c| 106 --- .../testing/selftests/powerpc/tm/tm-unavailable.c | 4 +- tools/testing/selftests/powerpc/tm/tm.h| 10 +- 15 files changed, 319 insertions(+), 209 deletions(-) create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-drop-transaction.c delete mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.S delete mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall.c -- 2.16.2
Re: [PATCH] pseries/drmem: Check for zero filled ibm, dynamic-memory property.
On Thu, 2018-02-15 at 21:27 -0600, Nathan Fontenot wrote: > Some versions of QEMU will produce an ibm,dynamic-reconfiguration-memory > node with a ibm,dynamic-memory property that is zero-filled. This causes > the drmem code to oops trying to parse this property. > > The fix for this is to validate that the property does contain LMB > entries before trying to parse it and bail if the count is zero. > > Oops: Kernel access of bad area, sig: 11 [#1] > SMP NR_CPUS=2048 > NUMA > pSeries > Modules linked in: > Supported: Yes > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.14-11.2-default #1 > task: c0007e639680 task.stack: c0007e648000 > NIP: c0c709a4 LR: c0c70998 CTR: > REGS: c0007e64b8d0 TRAP: 0300 Not tainted (4.12.14-11.2-default) > MSR: 80010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> > CR: 84000248 XER: > CFAR: c067018c DAR: 0010 DSISR: 4200 SOFTE: 1 > GPR00: c0c70998 c0007e64bb50 c1157b00 > GPR04: c0007e64bb70 002f 0022 > GPR08: 0003 c6f63fac c6f63fb0 001e > GPR12: cfa8 c000dca8 > GPR16: > GPR20: > GPR24: c0cccb98 c0c636f0 c0c56cd0 0007 > GPR28: c0cccba8 c0007c30 c0007e64bbf0 0010 > NIP [c0c709a4] read_drconf_v1_cell+0x54/0x9c > LR [c0c70998] read_drconf_v1_cell+0x48/0x9c > Call Trace: > [c0007e64bb50] [c0c56cd0] __param_initcall_debug+0x0/0x28 > (unreliable) > [c0007e64bb90] [c0c70e24] drmem_init+0x144/0x2f8 > [c0007e64bc40] [c000d034] do_one_initcall+0x64/0x1d0 > [c0007e64bd00] [c0c643d0] kernel_init_freeable+0x298/0x38c > [c0007e64bdc0] [c000dcc4] kernel_init+0x24/0x160 > [c0007e64be30] [c000b428] ret_from_kernel_thread+0x5c/0xb4 > Instruction dump: > 7c9e2378 6000 e9429050 e93e 7c240b78 7c7f1b78 f9240021 e86a0002 > 4804e41d 6000 e9210020 39490004 f9410020 39490010 7d004c2c > > The ibm,dynamic-reconfiguration-memory device tree property > generated that causes this: > > ibm,dynamic-reconfiguration-memory { > ibm,lmb-size = <0x0 0x1000>; > ibm,memory-flags-mask = <0xff>; > ibm,dynamic-memory = <0x0 0x0 0x0 0x0 0x0 0x0>; > linux,phandle = <0x7e57eed8>; > 
ibm,associativity-lookup-arrays = <0x1 0x4 0x0 0x0 0x0 0x0>; > ibm,memory-preservation-time = <0x0>; > }; > > Signed-off-by: Nathan Fontenot <nf...@linux.vnet.ibm.com> Works for me. Reviewed-by: Cyril Bur <cyril...@gmail.com> > --- > arch/powerpc/mm/drmem.c |8 > 1 file changed, 8 insertions(+) > > diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c > index 1604110c4238..916844f99c64 100644 > --- a/arch/powerpc/mm/drmem.c > +++ b/arch/powerpc/mm/drmem.c > @@ -216,6 +216,8 @@ static void __init __walk_drmem_v1_lmbs(const __be32 > *prop, const __be32 *usm, > u32 i, n_lmbs; > > n_lmbs = of_read_number(prop++, 1); > + if (n_lmbs == 0) > + return; > > for (i = 0; i < n_lmbs; i++) { > read_drconf_v1_cell(, ); > @@ -245,6 +247,8 @@ static void __init __walk_drmem_v2_lmbs(const __be32 > *prop, const __be32 *usm, > u32 i, j, lmb_sets; > > lmb_sets = of_read_number(prop++, 1); > + if (lmb_sets == 0) > + return; > > for (i = 0; i < lmb_sets; i++) { > read_drconf_v2_cell(_cell, ); > @@ -354,6 +358,8 @@ static void __init init_drmem_v1_lmbs(const __be32 *prop) > struct drmem_lmb *lmb; > > drmem_info->n_lmbs = of_read_number(prop++, 1); > + if (drmem_info->n_lmbs == 0) > + return; > > drmem_info->lmbs = kcalloc(drmem_info->n_lmbs, sizeof(*lmb), > GFP_KERNEL); > @@ -373,6 +379,8 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop) > int lmb_index; > > lmb_sets = of_read_number(prop++, 1); > + if (lmb_sets == 0) > + return; > > /* first pass, calculate the number of LMBs */ > p = prop; >
Re: 4.16-rc1 virtual machine crash on boot
On Tue, 2018-02-13 at 21:12 -0800, Tyrel Datwyler wrote: > On 02/13/2018 05:20 PM, Cyril Bur wrote: > > Hello all, > > Does reverting commit 02ef6dd8109b581343ebeb1c4c973513682535d6 alleviate the > issue? > Hi Tyrel, No it doesn't. Same backtrace. > -Tyrel > > > > > I'm seeing this crash trying to boot a KVM virtual machine. This kernel > > was compiled with pseries_le_defconfig and run using the following qemu > > commandline: > > > > qemu-system-ppc64 -enable-kvm -cpu POWER8 -smp 4 -m 4G -M pseries > > -nographic -vga none -drive file=vm.raw,if=virtio,format=raw -drive > > file=mkvmconf2xeO,if=virtio,format=raw -netdev type=user,id=net0 > > -device virtio-net-pci,netdev=net0 -kernel vmlinux_tscr -append > > 'root=/dev/vdb1 rw cloud-init=disabled' > > > > qemu-system-ppc64 --version > > QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.16), Copyright > > (c) 2003-2008 Fabrice Bellard > > > > > > Key type dns_resolver registered > > Unable to handle kernel paging request for data at address 0x0010 > > Faulting instruction address: 0xc18f2bbc > > Oops: Kernel access of bad area, sig: 11 [#1] > > LE SMP NR_CPUS=2048 NUMA pSeries > > CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1v4.16-rc1 #8 > > NIP: c18f2bbc LR: c18f2bb4 CTR: > > REGS: c000fea838d0 TRAP: 0380 Not tainted (4.16.0-rc1v4.16-rc1) > > MSR: 82009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 84000248 XER: > > 2000 > > CFAR: c19591a0 SOFTE: 0 > > GPR00: c18f2bb4 c000fea83b50 c1bd8400 > > > > GPR04: c000fea83b70 002f > > 0022 > > GPR08: c22a3e90 > > 0220 > > GPR12: cfb40980 c000d698 > > > > GPR16: > > > > GPR20: > > > > GPR24: c18b9248 c18e36d8 > > c19738a8 > > GPR28: 0007 c000fc68 c000fea83bf0 > > 0010 > > NIP [c18f2bbc] read_drconf_v1_cell+0x50/0x9c > > LR [c18f2bb4] read_drconf_v1_cell+0x48/0x9c > > Call Trace: > > [c000fea83b50] [c18f2bb4] read_drconf_v1_cell+0x48/0x9c > > (unreliable) > > [c000fea83b90] [c18f305c] drmem_init+0x13c/0x2ec > > [c000fea83c40] [c18e4288] do_one_initcall+0xdc/0x1ac > > 
[c000fea83d00] [c18e45d4] kernel_init_freeable+0x27c/0x358 > > [c000fea83dc0] [c000d6bc] kernel_init+0x2c/0x160 > > [c000fea83e30] [c000bc20] ret_from_kernel_thread+0x5c/0xbc > > Instruction dump: > > 7c7f1b78 6000 6000 7c240b78 3d22ffdc 3929f0a4 e95e > > e8690002 > > f9440021 4806657d 6000 e9210020 39090004 39490010 > > f9010020 > > ---[ end trace bd9f49f482d30e03 ]--- > > > > Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b > > > > WARNING: CPU: 1 PID: 1 at drivers/tty/vt/vt.c:3883 > > do_unblank_screen+0x1f0/0x270 > > CPU: 1 PID: 1 Comm: swapper/0 Tainted: G D 4.16.0- > > rc1v4.16-rc1 #8 > > NIP: c09aa800 LR: c09aa63c CTR: c148f5f0 > > REGS: c000fea832c0 TRAP: 0700 Tainted: > > G D (4.16.0-rc1v4.16-rc1) > > MSR: 82029033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 2800 XER: > > 2000 > > CFAR: c09aa658 SOFTE: 1 > > GPR00: c09aa63c c000fea83540 c1bd8400 > > > > GPR04: 0001 c000fb0c200e 1dd7 > > c000fea834d0 > > GPR08: fe43 > > 0001 > > GPR12: 28002428 cfb40980 c000d698 > > > > GPR16: > > > > GPR20: > > > > GPR24: c000fea4 c000feadf910 c1a4a7a8 > > c1cc4ea0 > > GPR28: c173f4f0 c1cc4ec8 > > > > NIP [c09aa800] do_unblank_screen+0x1f0/0x270 > > LR [c09aa63c] do_unblank_screen+0x2c/0x270 > > Call Trace: > > [c000fea83540] [c09aa63c] do_unblank_screen+0x2c/0x27
[PATCH] powerpc: Expose TSCR via sysfs only on powernv
The TSCR can only be accessed in hypervisor mode. Fixes: 88b5e12eeb11 ("powerpc: Expose TSCR via sysfs") Signed-off-by: Cyril Bur <cyril...@gmail.com> --- arch/powerpc/kernel/sysfs.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c index 5a8bfee6e187..04d0bbd7a1dd 100644 --- a/arch/powerpc/kernel/sysfs.c +++ b/arch/powerpc/kernel/sysfs.c @@ -788,7 +788,8 @@ static int register_cpu_online(unsigned int cpu) if (cpu_has_feature(CPU_FTR_PPCAS_ARCH_V2)) device_create_file(s, _attr_pir); - if (cpu_has_feature(CPU_FTR_ARCH_206)) + if (cpu_has_feature(CPU_FTR_ARCH_206) && + !firmware_has_feature(FW_FEATURE_LPAR)) device_create_file(s, _attr_tscr); #endif /* CONFIG_PPC64 */ @@ -873,7 +874,8 @@ static int unregister_cpu_online(unsigned int cpu) if (cpu_has_feature(CPU_FTR_PPCAS_ARCH_V2)) device_remove_file(s, _attr_pir); - if (cpu_has_feature(CPU_FTR_ARCH_206)) + if (cpu_has_feature(CPU_FTR_ARCH_206) && + !firmware_has_feature(FW_FEATURE_LPAR)) device_remove_file(s, _attr_tscr); #endif /* CONFIG_PPC64 */ -- 2.16.1
4.16-rc1 virtual machine crash on boot
Hello all, I'm seeing this crash trying to boot a KVM virtual machine. This kernel was compiled with pseries_le_defconfig and run using the following qemu commandline: qemu-system-ppc64 -enable-kvm -cpu POWER8 -smp 4 -m 4G -M pseries -nographic -vga none -drive file=vm.raw,if=virtio,format=raw -drive file=mkvmconf2xeO,if=virtio,format=raw -netdev type=user,id=net0 -device virtio-net-pci,netdev=net0 -kernel vmlinux_tscr -append 'root=/dev/vdb1 rw cloud-init=disabled' qemu-system-ppc64 --version QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.16), Copyright (c) 2003-2008 Fabrice Bellard Key type dns_resolver registered Unable to handle kernel paging request for data at address 0x0010 Faulting instruction address: 0xc18f2bbc Oops: Kernel access of bad area, sig: 11 [#1] LE SMP NR_CPUS=2048 NUMA pSeries CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1v4.16-rc1 #8 NIP: c18f2bbc LR: c18f2bb4 CTR: REGS: c000fea838d0 TRAP: 0380 Not tainted (4.16.0-rc1v4.16-rc1) MSR: 82009033CR: 84000248 XER: 2000 CFAR: c19591a0 SOFTE: 0 GPR00: c18f2bb4 c000fea83b50 c1bd8400 GPR04: c000fea83b70 002f 0022 GPR08: c22a3e90 0220 GPR12: cfb40980 c000d698 GPR16: GPR20: GPR24: c18b9248 c18e36d8 c19738a8 GPR28: 0007 c000fc68 c000fea83bf0 0010 NIP [c18f2bbc] read_drconf_v1_cell+0x50/0x9c LR [c18f2bb4] read_drconf_v1_cell+0x48/0x9c Call Trace: [c000fea83b50] [c18f2bb4] read_drconf_v1_cell+0x48/0x9c (unreliable) [c000fea83b90] [c18f305c] drmem_init+0x13c/0x2ec [c000fea83c40] [c18e4288] do_one_initcall+0xdc/0x1ac [c000fea83d00] [c18e45d4] kernel_init_freeable+0x27c/0x358 [c000fea83dc0] [c000d6bc] kernel_init+0x2c/0x160 [c000fea83e30] [c000bc20] ret_from_kernel_thread+0x5c/0xbc Instruction dump: 7c7f1b78 6000 6000 7c240b78 3d22ffdc 3929f0a4 e95e e8690002 f9440021 4806657d 6000 e9210020 39090004 39490010 f9010020 ---[ end trace bd9f49f482d30e03 ]--- Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x000b WARNING: CPU: 1 PID: 1 at drivers/tty/vt/vt.c:3883 do_unblank_screen+0x1f0/0x270 CPU: 1 PID: 1 Comm: swapper/0 Tainted: G D 4.16.0- rc1v4.16-rc1 #8 NIP: c09aa800 LR: c09aa63c CTR: c148f5f0 REGS: c000fea832c0 TRAP: 0700 Tainted: G D (4.16.0-rc1v4.16-rc1) MSR: 82029033 CR: 2800 XER: 2000 CFAR: c09aa658 SOFTE: 1 GPR00: c09aa63c c000fea83540 c1bd8400 GPR04: 0001 c000fb0c200e 1dd7 c000fea834d0 GPR08: fe43 0001 GPR12: 28002428 cfb40980 c000d698 GPR16: GPR20: GPR24: c000fea4 c000feadf910 c1a4a7a8 c1cc4ea0 GPR28: c173f4f0 c1cc4ec8 NIP [c09aa800] do_unblank_screen+0x1f0/0x270 LR [c09aa63c] do_unblank_screen+0x2c/0x270 Call Trace: [c000fea83540] [c09aa63c] do_unblank_screen+0x2c/0x270 (unreliable) [c000fea835b0] [c08a2a70] bust_spinlocks+0x40/0x80 [c000fea835d0] [c00da90c] panic+0x1b8/0x32c [c000fea83670] [c00e1bd4] do_exit+0xcb4/0xcc0 [c000fea83730] [c00275fc] die+0x29c/0x450 [c000fea837c0] [c0053f88] bad_page_fault+0xe8/0x160 [c000fea83830] [c0028a90] slb_miss_bad_addr+0x40/0x90 [c000fea83860] [c0008b08] bad_addr_slb+0x158/0x160 --- interrupt: 380 at read_drconf_v1_cell+0x50/0x9c LR = read_drconf_v1_cell+0x48/0x9c [c000fea83b90] [c18f305c] drmem_init+0x13c/0x2ec [c000fea83c40] [c18e4288] do_one_initcall+0xdc/0x1ac [c000fea83d00] [c18e45d4] kernel_init_freeable+0x27c/0x358 [c000fea83dc0] [c000d6bc] kernel_init+0x2c/0x160 [c000fea83e30] [c000bc20] ret_from_kernel_thread+0x5c/0xbc Instruction dump: 3c62ffbf 38840001 7c8407b4 38639ca8 4b7ae0ed 6000 38210070 e8010010 ebc1fff0 ebe1fff8 7c0803a6 4e800020 <0fe0> 4bfffe58 6000 6042 ---[ end trace bd9f49f482d30e04 ]--- Rebooting in 10 seconds..
[PATCH] powerpc/tm: Update function prototype comment
In commit eb5c3f1c8647 ("powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint") __tm_recheckpoint was modified to no longer take the second parameter 'unsigned long orig_msr' as part of a TM rewrite to simplify the reclaiming/recheckpointing process. There is a comment in the asm file where the function is declared that still shows the old prototype with the 'orig_msr' parameter. This patch corrects the comment. Signed-off-by: Cyril Bur <cyril...@gmail.com> --- arch/powerpc/kernel/tm.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S index b92ac8e711db..2eb20264e70d 100644 --- a/arch/powerpc/kernel/tm.S +++ b/arch/powerpc/kernel/tm.S @@ -300,8 +300,8 @@ _GLOBAL(tm_reclaim) blr - /* void __tm_recheckpoint(struct thread_struct *thread, -*unsigned long orig_msr) + /* +* void __tm_recheckpoint(struct thread_struct *thread) * - Restore the checkpointed register state saved by tm_reclaim *when we switch_to a process. * -- 2.16.1
Re: [PATCH] powerpc/tm: Remove struct thread_info param from tm_reclaim_thread()
On Thu, 2018-02-01 at 15:46 +1100, Michael Ellerman wrote: > Cyril Bur <cyril...@gmail.com> writes: > > > tm_reclaim_thread() doesn't use the parameter anymore, both callers have > > to bother getting it as they have no need for a struct thread_info > > either. > > In future please tell me why the parameter is unused and when it became > unused. > Thanks, will do! > In this case it was previously used but the last usage was removed in: > > dc3106690b20 ("powerpc: tm: Always use fp_state and vr_state to store live > registers") > > cheers > > > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c > > index bfdd783e3916..a47498da6562 100644 > > --- a/arch/powerpc/kernel/process.c > > +++ b/arch/powerpc/kernel/process.c > > @@ -853,8 +853,7 @@ static inline bool tm_enabled(struct task_struct *tsk) > > return tsk && tsk->thread.regs && (tsk->thread.regs->msr & MSR_TM); > > } > > > > -static void tm_reclaim_thread(struct thread_struct *thr, > > - struct thread_info *ti, uint8_t cause) > > +static void tm_reclaim_thread(struct thread_struct *thr, uint8_t cause) > > { > > /* > > * Use the current MSR TM suspended bit to track if we have > > @@ -901,7 +900,7 @@ static void tm_reclaim_thread(struct thread_struct *thr, > > void tm_reclaim_current(uint8_t cause) > > { > > tm_enable(); > > - tm_reclaim_thread(>thread, current_thread_info(), cause); > > + tm_reclaim_thread(>thread, cause); > > } > > > > static inline void tm_reclaim_task(struct task_struct *tsk) > > @@ -932,7 +931,7 @@ static inline void tm_reclaim_task(struct task_struct > > *tsk) > > thr->regs->ccr, thr->regs->msr, > > thr->regs->trap); > > > > - tm_reclaim_thread(thr, task_thread_info(tsk), TM_CAUSE_RESCHED); > > + tm_reclaim_thread(thr, TM_CAUSE_RESCHED); > > > > TM_DEBUG("--- tm_reclaim on pid %d complete\n", > > tsk->pid); > > -- > > 2.16.1
[PATCH] powerpc/tm: Remove struct thread_info param from tm_reclaim_thread()
tm_reclaim_thread() doesn't use the parameter anymore, both callers have to bother getting it as they have no need for a struct thread_info either. Just remove it and adjust the callers. Signed-off-by: Cyril Bur <cyril...@gmail.com> --- arch/powerpc/kernel/process.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index bfdd783e3916..a47498da6562 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -853,8 +853,7 @@ static inline bool tm_enabled(struct task_struct *tsk) return tsk && tsk->thread.regs && (tsk->thread.regs->msr & MSR_TM); } -static void tm_reclaim_thread(struct thread_struct *thr, - struct thread_info *ti, uint8_t cause) +static void tm_reclaim_thread(struct thread_struct *thr, uint8_t cause) { /* * Use the current MSR TM suspended bit to track if we have @@ -901,7 +900,7 @@ static void tm_reclaim_thread(struct thread_struct *thr, void tm_reclaim_current(uint8_t cause) { tm_enable(); - tm_reclaim_thread(>thread, current_thread_info(), cause); + tm_reclaim_thread(>thread, cause); } static inline void tm_reclaim_task(struct task_struct *tsk) @@ -932,7 +931,7 @@ static inline void tm_reclaim_task(struct task_struct *tsk) thr->regs->ccr, thr->regs->msr, thr->regs->trap); - tm_reclaim_thread(thr, task_thread_info(tsk), TM_CAUSE_RESCHED); + tm_reclaim_thread(thr, TM_CAUSE_RESCHED); TM_DEBUG("--- tm_reclaim on pid %d complete\n", tsk->pid); -- 2.16.1
Re: [PATCH 2/2] selftests/powerpc: Calculate spin time in tm-unavailable
On Mon, 2017-12-11 at 13:02 +1100, Michael Ellerman wrote:
> Cyril Bur <cyril...@gmail.com> writes:
> 
> > On Tue, 2017-11-21 at 11:31 -0200, Gustavo Romero wrote:
> > > Hi Cyril,
> > > 
> > > On 21-11-2017 05:17, Cyril Bur wrote:
> > > > Currently the tm-unavailable test spins for a fixed amount of time in
> > > > an attempt to ensure the FPU/VMX/VSX facilities are off. This value was
> > > > experimentally tested to be long enough.
> > > > 
> > > > Problems may arise if kernel heuristics were to change. This patch
> > > > should future proof this test.
> > > 
> > > I've tried it on a VM running on '4.14.0-rc7' and apparently it gets stuck
> > > pretty slow on calibration, since it ran ~7m without finding the correct value
> > > (before it would take about 3m), like:
> > > 
> > > $ time ./tm-unavailable
> > > Testing required spin time required for facility unavailable...
> > > Trying 0x1800...
> > > Trying 0x1900...
> > > Trying 0x1a00...
> > > ...
> > > Trying 0xfd00... ^C
> > > 
> > > real    7m15.304s
> > > user    7m15.291s
> > > sys     0m0.004s
> > 
> > Interesting! I didn't test in a VM. I guess hypervisor switching
> > completely changes the heuristic. Ok I'll have to rethink.
> > 
> > Maybe the increase should be a multiplier to get to a good state more
> > quickly.
> 
> Yeah this sucks in a VM:
> 
>   # /home/michael/tm-unavailable
>   Testing required spin time required for facility unavailable...
>   Trying 0x1800...
>   Trying 0x1900...
>   ...
>   Trying 0x11000...
> 
> etc.
> 
> I got sick of waiting for it, but it's causing my selftests job to time
> out so it must be taking > ~1 hour.

Yeah sorry, I'll see if I can come up with a better way for a VM. Needs a
few more cycles from me.

Cyril

> cheers
Re: [PATCH 2/2] selftests/powerpc: Calculate spin time in tm-unavailable
On Tue, 2017-11-21 at 11:31 -0200, Gustavo Romero wrote: > Hi Cyril, > > On 21-11-2017 05:17, Cyril Bur wrote: > > Currently the tm-unavailable test spins for a fixed amount of time in > > an attempt to ensure the FPU/VMX/VSX facilities are off. This value was > > experimentally tested to be long enough. > > > > Problems may arise if kernel heuristics were to change. This patch > > should future proof this test. > > I've tried it on a VM running on '4.14.0-rc7' and apparently it gets stuck > pretty slow on calibration, since it ran ~7m without finding the correct value > (before it would take about 3m), like: > > $ time ./tm-unavailable > Testing required spin time required for facility unavailable... > Trying 0x1800... > Trying 0x1900... > Trying 0x1a00... > ... > Trying 0xfd00... ^C > > real 7m15.304s > user 7m15.291s > sys 0m0.004s > Interesting! I didn't test in a VM. I guess hypervisor switching completely changes the heuristic. Ok I'll have to rethink. Maybe the increase should be a multiplier to get to a good state more quickly. > Trying it on a BM running on '4.13.0-rc2' it indeed found an initial value for > the timeout but for some reason the value was not sufficient for the > subsequent > tests and the value raised more and more (I understand that it's an expected > behavior tho). Even tho it runs about half the time (~3m, good!) but I think > the > output could be little bit less "overloaded": > Happy to put some (or all) of that output inside if (DEBUG) > $ ./tm-unavailable > Testing required spin time required for facility unavailable... > Trying 0x1800... > Trying 0x1900... > Trying 0x1a00... > Trying 0x1b00... > Trying 0x1c00... > Trying 0x1d00... > Trying 0x1e00... > Trying 0x1f00... 1, 2, 3 > Spin time required for a reliable facility unavailable TM failure: 0x1f00 > Checking if FP/VEC registers are sane after a FP unavailable exception... 
> If MSR.FP=0 MSR.VEC=0: > Expecting the transaction to fail, but it didn't > FP ok VEC ok > Adjusting the facility unavailable spin time... > Trying 0x2100... 1, 2, 3 > Now using 0x2100 > If MSR.FP=0 MSR.VEC=0: > Expecting the transaction to fail, but it didn't > FP ok VEC ok > Adjusting the facility unavailable spin time... > Trying 0x2300... 1, 2, 3 > Now using 0x2300 > If MSR.FP=1 MSR.VEC=0: FP ok VEC ok > If MSR.FP=0 MSR.VEC=1: > Expecting the transaction to fail, but it didn't > FP ok VEC ok > Now using 0x4700 > ... > > So, putting output question aside, are you getting a different result on VM, > i.e. did you notice if it got stuck/pretty slow? > > > Regards, > Gustavo > > > Signed-off-by: Cyril Bur <cyril...@gmail.com> > > --- > > Because the test no longer needs to use such a conservative time for > > the busy wait, it actually runs much faster. > > > > > > .../testing/selftests/powerpc/tm/tm-unavailable.c | 92 > > -- > > 1 file changed, 84 insertions(+), 8 deletions(-) > > > > diff --git a/tools/testing/selftests/powerpc/tm/tm-unavailable.c > > b/tools/testing/selftests/powerpc/tm/tm-unavailable.c > > index e6a0fad2bfd0..54aeb7a7fbb1 100644 > > --- a/tools/testing/selftests/powerpc/tm/tm-unavailable.c > > +++ b/tools/testing/selftests/powerpc/tm/tm-unavailable.c > > @@ -33,6 +33,11 @@ > > #define VEC_UNA_EXCEPTION 1 > > #define VSX_UNA_EXCEPTION 2 > > > > +#define ERR_RETRY 1 > > +#define ERR_ADJUST 2 > > + > > +#define COUNTER_INCREMENT (0x100) > > + > > #define NUM_EXCEPTIONS 3 > > #define err_at_line(status, errnum, format, ...) \ > > error_at_line(status, errnum, __FILE__, __LINE__, format ##__VA_ARGS__) > > @@ -45,6 +50,7 @@ struct Flags { > > int touch_vec; > > int result; > > int exception; > > + uint64_t counter; > > } flags; > > > > bool expecting_failure(void) > > @@ -87,14 +93,12 @@ void *ping(void *input) > > * Expected values for vs0 and vs32 after a TM failure. They must never > > * change, otherwise they got corrupted. 
> > */ > > + long rc = 0; > > uint64_t high_vs0 = 0x; > > uint64_t low_vs0 = 0x; > > uint64_t high_vs32 = 0x; > > uint64_t low_vs32 = 0x; > > > > - /* Counter for busy wait */ > > - uint64_t counter =
[PATCH 2/2] selftests/powerpc: Calculate spin time in tm-unavailable
Currently the tm-unavailable test spins for a fixed amount of time in an attempt to ensure the FPU/VMX/VSX facilities are off. This value was experimentally tested to be long enough. Problems may arise if kernel heuristics were to change. This patch should future proof this test. Signed-off-by: Cyril Bur <cyril...@gmail.com> --- Because the test no longer needs to use such a conservative time for the busy wait, it actually runs much faster. .../testing/selftests/powerpc/tm/tm-unavailable.c | 92 -- 1 file changed, 84 insertions(+), 8 deletions(-) diff --git a/tools/testing/selftests/powerpc/tm/tm-unavailable.c b/tools/testing/selftests/powerpc/tm/tm-unavailable.c index e6a0fad2bfd0..54aeb7a7fbb1 100644 --- a/tools/testing/selftests/powerpc/tm/tm-unavailable.c +++ b/tools/testing/selftests/powerpc/tm/tm-unavailable.c @@ -33,6 +33,11 @@ #define VEC_UNA_EXCEPTION 1 #define VSX_UNA_EXCEPTION 2 +#define ERR_RETRY 1 +#define ERR_ADJUST 2 + +#define COUNTER_INCREMENT (0x100) + #define NUM_EXCEPTIONS 3 #define err_at_line(status, errnum, format, ...) \ error_at_line(status, errnum, __FILE__, __LINE__, format ##__VA_ARGS__) @@ -45,6 +50,7 @@ struct Flags { int touch_vec; int result; int exception; + uint64_t counter; } flags; bool expecting_failure(void) @@ -87,14 +93,12 @@ void *ping(void *input) * Expected values for vs0 and vs32 after a TM failure. They must never * change, otherwise they got corrupted. */ + long rc = 0; uint64_t high_vs0 = 0x; uint64_t low_vs0 = 0x; uint64_t high_vs32 = 0x; uint64_t low_vs32 = 0x; - /* Counter for busy wait */ - uint64_t counter = 0x1ff00; - /* * Variable to keep a copy of CR register content taken just after we * leave the transactional state. 
@@ -217,7 +221,7 @@ void *ping(void *input) [ex_fp] "i" (FP_UNA_EXCEPTION), [ex_vec]"i" (VEC_UNA_EXCEPTION), [ex_vsx]"i" (VSX_UNA_EXCEPTION), - [counter] "r" (counter) + [counter] "r" (flags.counter) : "cr0", "ctr", "v10", "vs0", "vs10", "vs3", "vs32", "vs33", "vs34", "fr10" @@ -232,14 +236,14 @@ void *ping(void *input) if (expecting_failure() && !is_failure(cr_)) { printf("\n\tExpecting the transaction to fail, %s", "but it didn't\n\t"); - flags.result++; + rc = ERR_ADJUST; } /* Check if we were not expecting a failure and a it occurred. */ if (!expecting_failure() && is_failure(cr_)) { printf("\n\tUnexpected transaction failure 0x%02lx\n\t", failure_code()); - return (void *) -1; + rc = ERR_RETRY; } /* @@ -249,7 +253,7 @@ void *ping(void *input) if (is_failure(cr_) && !failure_is_unavailable()) { printf("\n\tUnexpected failure cause 0x%02lx\n\t", failure_code()); - return (void *) -1; + rc = ERR_RETRY; } /* 0x4 is a success and 0xa is a fail. See comment in is_failure(). */ @@ -276,7 +280,7 @@ void *ping(void *input) putchar('\n'); - return NULL; + return (void *)rc; } /* Thread to force context switch */ @@ -291,6 +295,55 @@ void *pong(void *not_used) sched_yield(); } +static void flags_set_counter(struct Flags *flags) +{ + uint64_t cr_; + int count = 0; + + do { + if (count == 0) + printf("\tTrying 0x%08" PRIx64 "... ", flags->counter); + else + printf("%d, ", count); + fflush(stdout); + asm ( + /* +* Wait an amount of context switches so +* load_fp and load_vec overflow and MSR.FP, +* MSR.VEC, and MSR.VSX become zero (off). +*/ + " mtctr %[counter] ;" + + /* Decrement CTR branch if CTR non zero. */ + "1: bdnz 1b ;" + " tbegin. ;" + " beq tfail ;" + + /* Get a facility unavailable */ + "
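The calibration idea in this patch, and the multiplicative growth suggested later in the review thread for the VM case, can be modelled in plain userspace C. This is a sketch only: `transaction_fails()` is a hypothetical stand-in for the real behaviour (the kernel's load_fp/load_vec heuristic turning MSR.FP/MSR.VEC off and the transaction failing with a facility-unavailable cause), and the constants mirror the test's output rather than any real kernel threshold.

```c
#include <assert.h>

/* Stand-in for the kernel heuristic: the spin count above which the
 * facility-unavailable TM failure is reliably observed. */
static unsigned long threshold = 0x1f00;

static int transaction_fails(unsigned long counter)
{
	return counter >= threshold;
}

/* Additive stepping, as in the patch: many probes when the starting
 * guess is far from the threshold (the slow-in-a-VM symptom). */
static unsigned long calibrate_additive(void)
{
	unsigned long counter = 0x1800;

	while (!transaction_fails(counter))
		counter += 0x100; /* COUNTER_INCREMENT */
	return counter;
}

/* Multiplicative growth, the variant suggested in review: reaches a
 * workable value in O(log n) probes instead of O(n). */
static unsigned long calibrate_multiplicative(void)
{
	unsigned long counter = 0x1800;

	while (!transaction_fails(counter))
		counter *= 2;
	return counter;
}
```

The trade-off the thread is circling is visible here: additive stepping finds a tighter bound but probes linearly, which is painful when hypervisor context switches push the threshold orders of magnitude higher.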
[PATCH 1/2] selftests/powerpc: Check for pthread errors in tm-unavailable
Signed-off-by: Cyril Bur <cyril...@gmail.com> --- .../testing/selftests/powerpc/tm/tm-unavailable.c | 43 +- 1 file changed, 34 insertions(+), 9 deletions(-) diff --git a/tools/testing/selftests/powerpc/tm/tm-unavailable.c b/tools/testing/selftests/powerpc/tm/tm-unavailable.c index 96c37f84ce54..e6a0fad2bfd0 100644 --- a/tools/testing/selftests/powerpc/tm/tm-unavailable.c +++ b/tools/testing/selftests/powerpc/tm/tm-unavailable.c @@ -15,6 +15,7 @@ */ #define _GNU_SOURCE +#include #include #include #include @@ -33,6 +34,11 @@ #define VSX_UNA_EXCEPTION 2 #define NUM_EXCEPTIONS 3 +#define err_at_line(status, errnum, format, ...) \ + error_at_line(status, errnum, __FILE__, __LINE__, format ##__VA_ARGS__) + +#define pr_warn(code, format, ...) err_at_line(0, code, format, ##__VA_ARGS__) +#define pr_err(code, format, ...) err_at_line(1, code, format, ##__VA_ARGS__) struct Flags { int touch_fp; @@ -303,10 +309,19 @@ void test_fp_vec(int fp, int vec, pthread_attr_t *attr) * checking if the failure cause is the one we expect. */ do { + int rc; + /* Bind 'ping' to CPU 0, as specified in 'attr'. */ - pthread_create(, attr, ping, (void *) ); - pthread_setname_np(t0, "ping"); - pthread_join(t0, _value); + rc = pthread_create(, attr, ping, (void *) ); + if (rc) + pr_err(rc, "pthread_create()"); + rc = pthread_setname_np(t0, "ping"); + if (rc) + pr_warn(rc, "pthread_setname_np"); + rc = pthread_join(t0, _value); + if (rc) + pr_err(rc, "pthread_join"); + retries--; } while (ret_value != NULL && retries); @@ -320,7 +335,7 @@ void test_fp_vec(int fp, int vec, pthread_attr_t *attr) int main(int argc, char **argv) { - int exception; /* FP = 0, VEC = 1, VSX = 2 */ + int rc, exception; /* FP = 0, VEC = 1, VSX = 2 */ pthread_t t1; pthread_attr_t attr; cpu_set_t cpuset; @@ -330,13 +345,23 @@ int main(int argc, char **argv) CPU_SET(0, ); /* Init pthread attribute. 
*/ - pthread_attr_init(); + rc = pthread_attr_init(); + if (rc) + pr_err(rc, "pthread_attr_init()"); /* Set CPU 0 mask into the pthread attribute. */ - pthread_attr_setaffinity_np(, sizeof(cpu_set_t), ); - - pthread_create(, /* Bind 'pong' to CPU 0 */, pong, NULL); - pthread_setname_np(t1, "pong"); /* Name it for systemtap convenience */ + rc = pthread_attr_setaffinity_np(, sizeof(cpu_set_t), ); + if (rc) + pr_err(rc, "pthread_attr_setaffinity_np()"); + + rc = pthread_create(, /* Bind 'pong' to CPU 0 */, pong, NULL); + if (rc) + pr_err(rc, "pthread_create()"); + + /* Name it for systemtap convenience */ + rc = pthread_setname_np(t1, "pong"); + if (rc) + pr_warn(rc, "pthread_create()"); flags.result = 0; -- 2.15.0
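The error-checking pattern this patch introduces can be shown as a self-contained userspace sketch. The wrapper macros follow the patch's shape (note glibc's `error_at_line()` takes the format string and varargs separated by a comma, and `pthread_setname_np()` needs `_GNU_SOURCE`); the `worker` thread and `run_checked_thread()` are illustrative names, not part of the selftest. The key detail is that pthread functions return the error code directly rather than setting `errno`, which is why `rc` is fed to the macro.

```c
#define _GNU_SOURCE /* for pthread_setname_np() */
#include <assert.h>
#include <error.h>
#include <pthread.h>

/* Same shape as the selftest's wrappers: fatal errors exit(1), purely
 * cosmetic failures only warn. */
#define err_at_line(status, errnum, format, ...) \
	error_at_line(status, errnum, __FILE__, __LINE__, format, ##__VA_ARGS__)

#define pr_warn(code, format, ...) err_at_line(0, code, format, ##__VA_ARGS__)
#define pr_err(code, format, ...)  err_at_line(1, code, format, ##__VA_ARGS__)

static void *worker(void *arg)
{
	return arg; /* NULL in: NULL out, mimicking a passing test thread */
}

int run_checked_thread(void)
{
	pthread_t t;
	void *ret;
	int rc;

	rc = pthread_create(&t, NULL, worker, NULL);
	if (rc)
		pr_err(rc, "pthread_create()");      /* unrecoverable */

	rc = pthread_setname_np(t, "worker");        /* name is cosmetic */
	if (rc)
		pr_warn(rc, "pthread_setname_np()"); /* warn and continue */

	rc = pthread_join(t, &ret);
	if (rc)
		pr_err(rc, "pthread_join()");

	return ret == NULL ? 0 : 1;
}
```

Build with `-pthread`. Splitting `pr_err` from `pr_warn` matches the patch's judgement call: a failed `pthread_create()` invalidates the test, while a failed thread rename does not.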
Re: [PATCH v5 06/10] powerpc/opal: Rework the opal-async interface
On Mon, 2017-11-06 at 20:41 +1100, Michael Ellerman wrote: > Cyril Bur <cyril...@gmail.com> writes: > > > diff --git a/arch/powerpc/platforms/powernv/opal-async.c > > b/arch/powerpc/platforms/powernv/opal-async.c > > index c43421ab2d2f..fbae8a37ce2c 100644 > > --- a/arch/powerpc/platforms/powernv/opal-async.c > > +++ b/arch/powerpc/platforms/powernv/opal-async.c > > @@ -23,40 +23,45 @@ > > #include > > #include > > > > -#define N_ASYNC_COMPLETIONS64 > > +enum opal_async_token_state { > > + ASYNC_TOKEN_UNALLOCATED = 0, > > + ASYNC_TOKEN_ALLOCATED, > > + ASYNC_TOKEN_COMPLETED > > +}; > > + > > +struct opal_async_token { > > + enum opal_async_token_state state; > > + struct opal_msg response; > > +}; > > > > -static DECLARE_BITMAP(opal_async_complete_map, N_ASYNC_COMPLETIONS) = > > {~0UL}; > > -static DECLARE_BITMAP(opal_async_token_map, N_ASYNC_COMPLETIONS); > > static DECLARE_WAIT_QUEUE_HEAD(opal_async_wait); > > static DEFINE_SPINLOCK(opal_async_comp_lock); > > static struct semaphore opal_async_sem; > > -static struct opal_msg *opal_async_responses; > > static unsigned int opal_max_async_tokens; > > +static struct opal_async_token *opal_async_tokens; > > > > static int __opal_async_get_token(void) > > { > > unsigned long flags; > > - int token; > > + int token = -EBUSY; > > > > spin_lock_irqsave(_async_comp_lock, flags); > > - token = find_first_bit(opal_async_complete_map, opal_max_async_tokens); > > - if (token >= opal_max_async_tokens) { > > - token = -EBUSY; > > - goto out; > > + for (token = 0; token < opal_max_async_tokens; token++) { > > + if (opal_async_tokens[token].state == ASYNC_TOKEN_UNALLOCATED) { > > + opal_async_tokens[token].state = ASYNC_TOKEN_ALLOCATED; > > + goto out; > > + } > > } > > - > > - if (__test_and_set_bit(token, opal_async_token_map)) { > > - token = -EBUSY; > > - goto out; > > - } > > - > > - __clear_bit(token, opal_async_complete_map); > > - > > out: > > spin_unlock_irqrestore(_async_comp_lock, flags); > > return token; > > } > > 
> Resulting in:
> 
>  static int __opal_async_get_token(void)
>  {
>  	unsigned long flags;
> +	int token = -EBUSY;
> 
>  	spin_lock_irqsave(&opal_async_comp_lock, flags);
> +	for (token = 0; token < opal_max_async_tokens; token++) {
> +		if (opal_async_tokens[token].state == ASYNC_TOKEN_UNALLOCATED) {
> +			opal_async_tokens[token].state = ASYNC_TOKEN_ALLOCATED;
> +			goto out;
> +		}
> 	}
> out:
> 	spin_unlock_irqrestore(&opal_async_comp_lock, flags);
> 	return token;
> }
> 
> So when no unallocated token is found we return opal_max_async_tokens :(
> 
> I changed it to:
> 
> static int __opal_async_get_token(void)
> {
> 	unsigned long flags;
> 	int i, token = -EBUSY;
> 
> 	spin_lock_irqsave(&opal_async_comp_lock, flags);
> 
> 	for (i = 0; i < opal_max_async_tokens; i++) {
> 		if (opal_async_tokens[i].state == ASYNC_TOKEN_UNALLOCATED) {
> 			opal_async_tokens[i].state = ASYNC_TOKEN_ALLOCATED;
> 			token = i;
> 			break;
> 		}
> 	}
> 
> 	spin_unlock_irqrestore(&opal_async_comp_lock, flags);
> 	return token;
> }

Thanks!!

> > +/*
> > + * Note: If the returned token is used in an opal call and opal returns
> > + * OPAL_ASYNC_COMPLETION you MUST opal_async_wait_response() before
>                                     ^
>                                     call
> 
> cheers
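The bug Michael spotted generalises: reusing the loop counter as the return value makes the `-EBUSY` initialiser dead code, so a full table returns the loop bound instead of an error. A minimal userspace model of both shapes (locking and the real OPAL types elided; `EBUSY_RC` is an illustrative constant, not the kernel's `-EBUSY`):

```c
#include <assert.h>

enum token_state { TOKEN_UNALLOCATED = 0, TOKEN_ALLOCATED };

#define MAX_TOKENS 2
#define EBUSY_RC (-16)

static enum token_state tokens[MAX_TOKENS];

/* Buggy shape from the original patch: the loop counter doubles as the
 * return value, so when the table is full the function returns
 * MAX_TOKENS rather than EBUSY_RC. */
static int get_token_buggy(void)
{
	int token = EBUSY_RC; /* dead store: clobbered by the for-init */

	for (token = 0; token < MAX_TOKENS; token++) {
		if (tokens[token] == TOKEN_UNALLOCATED) {
			tokens[token] = TOKEN_ALLOCATED;
			return token;
		}
	}
	return token; /* == MAX_TOKENS here */
}

/* Fixed shape, as merged: keep the loop index and the result separate. */
static int get_token_fixed(void)
{
	int i, token = EBUSY_RC;

	for (i = 0; i < MAX_TOKENS; i++) {
		if (tokens[i] == TOKEN_UNALLOCATED) {
			tokens[i] = TOKEN_ALLOCATED;
			token = i;
			break;
		}
	}
	return token;
}
```

Callers that treat any non-negative value as a valid token would happily index one past the end of the token array with the buggy shape, which is why the fix matters beyond cosmetics.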
Re: [PATCH] selftests/powerpc: Check FP/VEC on exception in TM
On Fri, 2017-11-03 at 10:28 -0200, Gustavo Romero wrote:
> Hi Cyril!
> 
> On 01-11-2017 20:10, Cyril Bur wrote:
> > Thanks Gustavo,
> > 
> > I do have one more thought on an improvement for this test which is
> > that:
> > 
> > +	/* Counter for busy wait */
> > +	uint64_t counter = 0x1ff00;
> > 
> > is a bit fragile, what we should do is have the test work out how long it
> > should spin until it reliably gets a TM_CAUSE_FAC_UNAV failure and then
> > use that for these tests.
> > 
> > This will only become a problem if we were to change kernel heuristics
> > which is fine for now. I'll try to get that added soon but for now this
> > test has proven too useful to delay adding as is.
> 
> I see. Yup, 'counter' value was indeed determined experimentally under many
> different scenarios (VM and BM, different CPU loads, etc). At least if the
> heuristics changes hurting the test it will catch that pointing out that
> the expected failure did not happen, like:
> 
> Checking if FP/VEC registers are sane after a FP unavailable exception...
> If MSR.FP=0 MSR.VEC=0:
> Expecting the transaction to fail, but it didn't
> FP ok VEC ok
> ...
> 
> So it won't let the hurting change pass fine silently :-)

Yeah, all for merging as is. It would be nice so that when someone does
make a heuristic change they don't also have to go fix tests - there is
nothing more annoying than a fragile test suite.

> > > Signed-off-by: Gustavo Romero <grom...@linux.vnet.ibm.com>
> > > Signed-off-by: Breno Leitao <lei...@debian.org>
> > > Signed-off-by: Cyril Bur <cyril...@gmail.com>
> 
> Thanks a lot for reviewing it.
> 
> Cheers,
> Gustavo
[PATCH v5 10/10] mtd: powernv_flash: Use opal_async_wait_response_interruptible()
The OPAL calls performed in this driver shouldn't be using opal_async_wait_response() as this performs a wait_event() which, on long running OPAL calls could result in hung task warnings. wait_event() prevents timely signal delivery which is also undesirable. This patch also attempts to quieten down the use of dev_err() when errors haven't actually occurred and also to return better information up the stack rather than always -EIO. Signed-off-by: Cyril Bur <cyril...@gmail.com> Acked-by: Boris Brezillon <boris.brezil...@free-electrons.com> --- drivers/mtd/devices/powernv_flash.c | 57 +++-- 1 file changed, 35 insertions(+), 22 deletions(-) diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c index 3343d4f5c4f3..26f9feaa5d17 100644 --- a/drivers/mtd/devices/powernv_flash.c +++ b/drivers/mtd/devices/powernv_flash.c @@ -89,33 +89,46 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op, return -EIO; } - if (rc == OPAL_SUCCESS) - goto out_success; + if (rc == OPAL_ASYNC_COMPLETION) { + rc = opal_async_wait_response_interruptible(token, ); + if (rc) { + /* +* If we return the mtd core will free the +* buffer we've just passed to OPAL but OPAL +* will continue to read or write from that +* memory. +* It may be tempting to ultimately return 0 +* if we're doing a read or a write since we +* are going to end up waiting until OPAL is +* done. However, because the MTD core sends +* us the userspace request in chunks, we need +* it to know we've been interrupted. +*/ + rc = -EINTR; + if (opal_async_wait_response(token, )) + dev_err(dev, "opal_async_wait_response() failed\n"); + goto out; + } + rc = opal_get_async_rc(msg); + } - if (rc != OPAL_ASYNC_COMPLETION) { + /* +* OPAL does mutual exclusion on the flash, it will return +* OPAL_BUSY. +* During firmware updates by the service processor OPAL may +* be (temporarily) prevented from accessing the flash, in +* this case OPAL will also return OPAL_BUSY. 
+* Both cases aren't errors exactly but the flash could have +* changed, userspace should be informed. +*/ + if (rc != OPAL_SUCCESS && rc != OPAL_BUSY) dev_err(dev, "opal_flash_async_op(op=%d) failed (rc %d)\n", op, rc); - rc = -EIO; - goto out; - } - rc = opal_async_wait_response(token, ); - if (rc) { - dev_err(dev, "opal async wait failed (rc %d)\n", rc); - rc = -EIO; - goto out; - } - - rc = opal_get_async_rc(msg); -out_success: - if (rc == OPAL_SUCCESS) { - rc = 0; - if (retlen) - *retlen = len; - } else { - rc = -EIO; - } + if (rc == OPAL_SUCCESS && retlen) + *retlen = len; + rc = opal_error_code(rc); out: opal_async_release_token(token); return rc; -- 2.15.0
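The "don't flatten everything to -EIO" point at the end of this patch can be illustrated with a small model. This is a hypothetical stand-in for the kernel's `opal_error_code()`, using illustrative constant values (not the real OPAL ABI numbers and not `<errno.h>` macros); the point it demonstrates is that `OPAL_BUSY`, which means firmware or the service processor holds the flash, reaches userspace as a busy indication rather than a generic I/O error.

```c
#include <assert.h>

/* Illustrative values only -- not the real OPAL or errno numbers. */
#define OPAL_SUCCESS    0
#define OPAL_BUSY     (-1)
#define OPAL_HARDWARE (-6)

#define EIO_RC    5
#define EBUSY_RC 16

/* Model of the OPAL-rc to errno mapping the driver switches to: only
 * genuine failures become an I/O error; "flash temporarily held" is
 * reported as busy so userspace can retry sensibly. */
static int opal_error_code_model(int opal_rc)
{
	switch (opal_rc) {
	case OPAL_SUCCESS:
		return 0;
	case OPAL_BUSY:
		return -EBUSY_RC;
	default:
		return -EIO_RC; /* hardware and other unexpected failures */
	}
}
```

With the old code every non-success path returned `-EIO`, so a flash update by the service processor looked identical to a broken controller; the mapping above is what lets tools distinguish "try again later" from "give up".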
[PATCH v5 03/10] mtd: powernv_flash: Remove pointless goto in driver init
powernv_flash_probe() has pointless goto statements which jump to the
end of the function to simply return a variable. Rather than checking
for error and going to the label, just return the error as soon as it is
detected.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
Acked-by: Boris Brezillon <boris.brezil...@free-electrons.com>
---
 drivers/mtd/devices/powernv_flash.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c
index ca3ca6adf71e..4dd3b5d2feb2 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -227,21 +227,20 @@ static int powernv_flash_probe(struct platform_device *pdev)
 	int ret;
 
 	data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
-	if (!data) {
-		ret = -ENOMEM;
-		goto out;
-	}
+	if (!data)
+		return -ENOMEM;
+
 	data->mtd.priv = data;
 
 	ret = of_property_read_u32(dev->of_node, "ibm,opal-id", &(data->id));
 	if (ret) {
 		dev_err(dev, "no device property 'ibm,opal-id'\n");
-		goto out;
+		return ret;
 	}
 
 	ret = powernv_flash_set_driver_info(dev, &data->mtd);
 	if (ret)
-		goto out;
+		return ret;
 
 	dev_set_drvdata(dev, data);
 
@@ -250,10 +249,7 @@ static int powernv_flash_probe(struct platform_device *pdev)
 	 * with an ffs partition at the start, it should prove easier for users
 	 * to deal with partitions or not as they see fit
 	 */
-	ret = mtd_device_register(&data->mtd, NULL, 0);
-
-out:
-	return ret;
+	return mtd_device_register(&data->mtd, NULL, 0);
 }
 
 /**
-- 
2.15.0
[PATCH v5 06/10] powerpc/opal: Rework the opal-async interface
Future work will add an opal_async_wait_response_interruptible() which will call wait_event_interruptible(). This work requires extra token state to be tracked as wait_event_interruptible() can return and the caller could release the token before OPAL responds. Currently token state is tracked with two bitfields which are 64 bits big but may not need to be as OPAL informs Linux how many async tokens there are. It also uses an array indexed by token to store response messages for each token. The bitfields make it difficult to add more state and also provide a hard maximum as to how many tokens there can be - it is possible that OPAL will inform Linux that there are more than 64 tokens. Rather than add a bitfield to track the extra state, rework the internals slightly. Signed-off-by: Cyril Bur <cyril...@gmail.com> --- arch/powerpc/platforms/powernv/opal-async.c | 92 - 1 file changed, 50 insertions(+), 42 deletions(-) diff --git a/arch/powerpc/platforms/powernv/opal-async.c b/arch/powerpc/platforms/powernv/opal-async.c index c43421ab2d2f..fbae8a37ce2c 100644 --- a/arch/powerpc/platforms/powernv/opal-async.c +++ b/arch/powerpc/platforms/powernv/opal-async.c @@ -1,7 +1,7 @@ /* * PowerNV OPAL asynchronous completion interfaces * - * Copyright 2013 IBM Corp. + * Copyright 2013-2017 IBM Corp. 
* * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License @@ -23,40 +23,45 @@ #include #include -#define N_ASYNC_COMPLETIONS64 +enum opal_async_token_state { + ASYNC_TOKEN_UNALLOCATED = 0, + ASYNC_TOKEN_ALLOCATED, + ASYNC_TOKEN_COMPLETED +}; + +struct opal_async_token { + enum opal_async_token_state state; + struct opal_msg response; +}; -static DECLARE_BITMAP(opal_async_complete_map, N_ASYNC_COMPLETIONS) = {~0UL}; -static DECLARE_BITMAP(opal_async_token_map, N_ASYNC_COMPLETIONS); static DECLARE_WAIT_QUEUE_HEAD(opal_async_wait); static DEFINE_SPINLOCK(opal_async_comp_lock); static struct semaphore opal_async_sem; -static struct opal_msg *opal_async_responses; static unsigned int opal_max_async_tokens; +static struct opal_async_token *opal_async_tokens; static int __opal_async_get_token(void) { unsigned long flags; - int token; + int token = -EBUSY; spin_lock_irqsave(_async_comp_lock, flags); - token = find_first_bit(opal_async_complete_map, opal_max_async_tokens); - if (token >= opal_max_async_tokens) { - token = -EBUSY; - goto out; + for (token = 0; token < opal_max_async_tokens; token++) { + if (opal_async_tokens[token].state == ASYNC_TOKEN_UNALLOCATED) { + opal_async_tokens[token].state = ASYNC_TOKEN_ALLOCATED; + goto out; + } } - - if (__test_and_set_bit(token, opal_async_token_map)) { - token = -EBUSY; - goto out; - } - - __clear_bit(token, opal_async_complete_map); - out: spin_unlock_irqrestore(_async_comp_lock, flags); return token; } +/* + * Note: If the returned token is used in an opal call and opal returns + * OPAL_ASYNC_COMPLETION you MUST opal_async_wait_response() before + * calling another other opal_async_* function + */ int opal_async_get_token_interruptible(void) { int token; @@ -76,6 +81,7 @@ EXPORT_SYMBOL_GPL(opal_async_get_token_interruptible); static int __opal_async_release_token(int token) { unsigned long flags; + int rc; if (token < 0 || token >= opal_max_async_tokens) { 
pr_err("%s: Passed token is out of range, token %d\n", @@ -84,11 +90,18 @@ static int __opal_async_release_token(int token) } spin_lock_irqsave(_async_comp_lock, flags); - __set_bit(token, opal_async_complete_map); - __clear_bit(token, opal_async_token_map); + switch (opal_async_tokens[token].state) { + case ASYNC_TOKEN_COMPLETED: + case ASYNC_TOKEN_ALLOCATED: + opal_async_tokens[token].state = ASYNC_TOKEN_UNALLOCATED; + rc = 0; + break; + default: + rc = 1; + } spin_unlock_irqrestore(_async_comp_lock, flags); - return 0; + return rc; } int opal_async_release_token(int token) @@ -96,12 +109,10 @@ int opal_async_release_token(int token) int ret; ret = __opal_async_release_token(token); - if (ret) - return ret; - - up(_async_sem); + if (!ret) + up(_async_sem); - return 0; + return ret; } EXPORT_SYMBOL_GPL(opal_async_release_token); @@ -122,13 +133,15 @@ int opal_async_wait_response(uint64_t token, struct opal_msg *msg) * functional. */ opal_wake_poller(); - wait_event(opal_async_wait, test_bit(token, opal_async_comple
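The core of this rework, replacing two bitmaps with a per-token state enum, can be modelled in a few lines of userspace C. This sketch mirrors the patch's `__opal_async_release_token()` with the spinlock, message storage, and semaphore elided: releasing an ALLOCATED or COMPLETED token recycles it, while anything else reports failure so the caller-count semaphore is not incorrectly bumped.

```c
#include <assert.h>

enum opal_async_token_state {
	ASYNC_TOKEN_UNALLOCATED = 0,
	ASYNC_TOKEN_ALLOCATED,
	ASYNC_TOKEN_COMPLETED
};

#define MAX_TOKENS 4

static enum opal_async_token_state state[MAX_TOKENS];

/* Model of __opal_async_release_token() (locking elided). Returns 0 on
 * a legitimate release, non-zero for out-of-range or double release. */
static int release_token_model(int token)
{
	if (token < 0 || token >= MAX_TOKENS)
		return 1;

	switch (state[token]) {
	case ASYNC_TOKEN_COMPLETED:
	case ASYNC_TOKEN_ALLOCATED:
		state[token] = ASYNC_TOKEN_UNALLOCATED;
		return 0;
	default:
		return 1; /* releasing an unallocated token is a caller bug */
	}
}
```

The win over the bitmap pair is exactly what the commit message says: an enum per token has room for more states (the interruptible-wait patch later adds two), and the array is sized from what OPAL reports rather than being capped at 64.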
[PATCH v5 08/10] powerpc/opal: Add opal_async_wait_response_interruptible() to opal-async
This patch adds an _interruptible version of opal_async_wait_response(). This is useful when a long running OPAL call is performed on behalf of a userspace thread, for example, the opal_flash_{read,write,erase} functions performed by the powernv-flash MTD driver. It is foreseeable that these functions would take upwards of two minutes causing the wait_event() to block long enough to cause hung task warnings. Furthermore, wait_event_interruptible() is preferable as otherwise there is no way for signals to stop the process which is going to be confusing in userspace. Signed-off-by: Cyril Bur <cyril...@gmail.com> --- arch/powerpc/include/asm/opal.h | 2 + arch/powerpc/platforms/powernv/opal-async.c | 87 +++-- 2 files changed, 85 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 0078eb5acf98..f95ca4560bfa 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -307,6 +307,8 @@ extern void opal_notifier_update_evt(uint64_t evt_mask, uint64_t evt_val); extern int opal_async_get_token_interruptible(void); extern int opal_async_release_token(int token); extern int opal_async_wait_response(uint64_t token, struct opal_msg *msg); +extern int opal_async_wait_response_interruptible(uint64_t token, + struct opal_msg *msg); extern int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data); struct rtc_time; diff --git a/arch/powerpc/platforms/powernv/opal-async.c b/arch/powerpc/platforms/powernv/opal-async.c index fbae8a37ce2c..e2004606b75b 100644 --- a/arch/powerpc/platforms/powernv/opal-async.c +++ b/arch/powerpc/platforms/powernv/opal-async.c @@ -26,6 +26,8 @@ enum opal_async_token_state { ASYNC_TOKEN_UNALLOCATED = 0, ASYNC_TOKEN_ALLOCATED, + ASYNC_TOKEN_DISPATCHED, + ASYNC_TOKEN_ABANDONED, ASYNC_TOKEN_COMPLETED }; @@ -58,8 +60,10 @@ static int __opal_async_get_token(void) } /* - * Note: If the returned token is used in an opal call and opal returns - * OPAL_ASYNC_COMPLETION you MUST 
opal_async_wait_response() before + * Note: If the returned token is used in an opal call and opal + * returns OPAL_ASYNC_COMPLETION you MUST one of + * opal_async_wait_response() or + * opal_async_wait_response_interruptible() at least once before * calling another other opal_async_* function */ int opal_async_get_token_interruptible(void) @@ -96,6 +100,16 @@ static int __opal_async_release_token(int token) opal_async_tokens[token].state = ASYNC_TOKEN_UNALLOCATED; rc = 0; break; + /* +* DISPATCHED and ABANDONED tokens must wait for OPAL to +* respond. +* Mark a DISPATCHED token as ABANDONED so that the response +* response handling code knows no one cares and that it can +* free it then. +*/ + case ASYNC_TOKEN_DISPATCHED: + opal_async_tokens[token].state = ASYNC_TOKEN_ABANDONED; + /* Fall through */ default: rc = 1; } @@ -128,7 +142,11 @@ int opal_async_wait_response(uint64_t token, struct opal_msg *msg) return -EINVAL; } - /* Wakeup the poller before we wait for events to speed things + /* +* There is no need to mark the token as dispatched, wait_event() +* will block until the token completes. +* +* Wakeup the poller before we wait for events to speed things * up on platforms or simulators where the interrupts aren't * functional. */ @@ -141,11 +159,66 @@ int opal_async_wait_response(uint64_t token, struct opal_msg *msg) } EXPORT_SYMBOL_GPL(opal_async_wait_response); +int opal_async_wait_response_interruptible(uint64_t token, struct opal_msg *msg) +{ + unsigned long flags; + int ret; + + if (token >= opal_max_async_tokens) { + pr_err("%s: Invalid token passed\n", __func__); + return -EINVAL; + } + + if (!msg) { + pr_err("%s: Invalid message pointer passed\n", __func__); + return -EINVAL; + } + + /* +* The first time this gets called we mark the token as DISPATCHED +* so that if wait_event_interruptible() returns not zero and the +* caller frees the token, we know not to actually free the token +* until the response comes. 
+* +* Only change if the token is ALLOCATED - it may have been +* completed even before the caller gets around to calling this +* the first time. +* +* There is also a dirty great comment at the token allocation +* function that if the opal call returns OPAL_ASYNC_COMPLETION to +* the caller then the caller *must* call this or the not +* interruptible version before doing anything e
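The DISPATCHED/ABANDONED interplay this patch adds is easiest to see as a small state machine. The sketch below models only the transitions (locks, wait queues, and the response message are elided, and the function names are illustrative): a DISPATCHED token cannot be recycled on release because OPAL may still write its response into it, so release parks it as ABANDONED and the response handler performs the real free.

```c
#include <assert.h>

enum token_state {
	UNALLOCATED = 0,
	ALLOCATED,
	DISPATCHED, /* interruptible waiter started; OPAL owns the slot */
	ABANDONED,  /* waiter gave up; response handler must free it */
	COMPLETED
};

/* Model of __opal_async_release_token() with the new states: releasing
 * a DISPATCHED token defers the free rather than recycling the slot. */
static enum token_state release(enum token_state s)
{
	switch (s) {
	case ALLOCATED:
	case COMPLETED:
		return UNALLOCATED;
	case DISPATCHED:
		return ABANDONED; /* OPAL hasn't responded: don't recycle */
	default:
		return s; /* invalid release; state unchanged */
	}
}

/* Model of the response-message handler's side of the protocol. */
static enum token_state response_arrived(enum token_state s)
{
	if (s == ABANDONED)
		return UNALLOCATED; /* nobody is waiting: free immediately */
	return COMPLETED;           /* keep the message, wake any waiter */
}
```

This is the extra state the cover letter alludes to: `wait_event()` never needed it because the waiter could not return early, but `wait_event_interruptible()` lets the caller disappear while the buffer and token are still in OPAL's hands.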
[PATCH v5 02/10] mtd: powernv_flash: Don't treat OPAL_SUCCESS as an error
While this driver expects to interact asynchronously, OPAL is well
within its rights to return OPAL_SUCCESS to indicate that the operation
completed without the need for a callback. We shouldn't treat
OPAL_SUCCESS as an error rather we should wrap up and return promptly to
the caller.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
Acked-by: Boris Brezillon <boris.brezil...@free-electrons.com>
---
I'll note here that currently no OPAL exists that will return
OPAL_SUCCESS so there isn't the possibility of a bug today.
---
 drivers/mtd/devices/powernv_flash.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c
index f9ec38281ff2..ca3ca6adf71e 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -63,7 +63,6 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op,
 	if (token < 0) {
 		if (token != -ERESTARTSYS)
 			dev_err(dev, "Failed to get an async token\n");
-
 		return token;
 	}
 
@@ -83,21 +82,25 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op,
 		return -EIO;
 	}
 
+	if (rc == OPAL_SUCCESS)
+		goto out_success;
+
 	if (rc != OPAL_ASYNC_COMPLETION) {
 		dev_err(dev, "opal_flash_async_op(op=%d) failed (rc %d)\n",
 				op, rc);
-		opal_async_release_token(token);
-		return -EIO;
+		rc = -EIO;
+		goto out;
 	}
 
 	rc = opal_async_wait_response(token, &msg);
-	opal_async_release_token(token);
 	if (rc) {
 		dev_err(dev, "opal async wait failed (rc %d)\n", rc);
-		return -EIO;
+		rc = -EIO;
+		goto out;
 	}
 
 	rc = opal_get_async_rc(msg);
+out_success:
 	if (rc == OPAL_SUCCESS) {
 		rc = 0;
 		if (retlen)
@@ -106,6 +109,8 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op,
 		rc = -EIO;
 	}
 
+out:
+	opal_async_release_token(token);
 	return rc;
 }
-- 
2.15.0
[PATCH v5 00/10] Allow opal-async waiters to get interrupted
V5: Address review from Boris Brezillon, thanks! Minor cleanups and
descriptions - no functional changes.

V4: Rework and rethink.

To recap: Userspace MTD read()s/write()s and erases to powernv_flash
become calls into the OPAL firmware which subsequently handles flash
access. Because the read()s, write()s or erases can be large (bounded
of course by the size of the flash) OPAL may take some time to service
the request, which causes the powernv_flash driver to sit in a
wait_event() for potentially minutes. This causes two problems:
firstly, tools appear to hang for the entire time as they cannot be
interrupted by signals, and secondly, this can trigger hung task
warnings. The correct solution is to use wait_event_interruptible(),
which my rework (as part of this series) of the opal-async
infrastructure provides. The final patch in this series achieves this.
It should eliminate both hung tasks and threads locking up.

Included in this series are other simpler fixes for powernv_flash:

Don't always return EIO on error. OPAL does mutual exclusion on the
flash and also knows when the service processor takes control of the
flash; in both of these cases it will return OPAL_BUSY, and translating
this to EIO is misleading to userspace.

Handle receiving OPAL_SUCCESS when it expects OPAL_ASYNC_COMPLETION and
don't treat it as an error. Unfortunately there are too many drivers
out there with the incorrect behaviour, so this means OPAL can never
return anything but OPAL_ASYNC_COMPLETION; this shouldn't prevent the
code from being correct.

Don't return ERESTARTSYS if token acquisition is interrupted, as
powernv_flash can't be sure it hasn't already performed some work; let
userspace deal with the problem.

Change the incorrect use of BUG_ON() to WARN_ON() in powernv_flash.

Not for powernv_flash: a fix from Stewart Smith which fits into this
series as it relies on my improvements to the opal-async
infrastructure.
V3: export opal_error_code() so that powernv_flash can be built=m

Hello,

Version one of this series ignored that OPAL may continue to use
buffers passed to it after Linux kfree()s the buffer. This version
addresses this, not in a particularly nice way - future work could make
this better. This version also includes a few cleanups and fixups to
the powernv_flash driver done along the course of this work that I
thought I would just send.

The problem we're trying to solve here is that currently all users of
the opal-async calls must use wait_event(); this may be undesirable
when there is a userspace process behind the request for the opal call,
as if OPAL takes too long to complete the call then hung task warnings
will appear. In order to solve the problem callers should use
wait_event_interruptible(). Due to the interruptible nature of this
call the opal-async infrastructure needs to track extra state
associated with each async token; this is prepared for in patch 6/10.

While I was working on the opal-async infrastructure improvements
Stewart fixed another problem, and he relies on the corrected behaviour
of opal-async, so I've sent it here.

Hello MTD folk, traditionally Michael Ellerman takes powernv_flash
driver patches through the powerpc tree; as always your feedback is
very welcome.
Thanks,

Cyril

Cyril Bur (9):
  mtd: powernv_flash: Use WARN_ON_ONCE() rather than BUG_ON()
  mtd: powernv_flash: Don't treat OPAL_SUCCESS as an error
  mtd: powernv_flash: Remove pointless goto in driver init
  mtd: powernv_flash: Don't return -ERESTARTSYS on interrupted token
    acquisition
  powerpc/opal: Make __opal_async_{get,release}_token() static
  powerpc/opal: Rework the opal-async interface
  powerpc/opal: Add opal_async_wait_response_interruptible() to
    opal-async
  powerpc/powernv: Add OPAL_BUSY to opal_error_code()
  mtd: powernv_flash: Use opal_async_wait_response_interruptible()

Stewart Smith (1):
  powernv/opal-sensor: remove not needed lock

 arch/powerpc/include/asm/opal.h              |   4 +-
 arch/powerpc/platforms/powernv/opal-async.c  | 183 +++
 arch/powerpc/platforms/powernv/opal-sensor.c |  17 +--
 arch/powerpc/platforms/powernv/opal.c        |   2 +
 drivers/mtd/devices/powernv_flash.c          |  83 +++-
 5 files changed, 194 insertions(+), 95 deletions(-)

-- 
2.15.0
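Patch 6/10's point that interruptible waiters force the infrastructure to "track extra state associated with each async token" can be illustrated with a small state machine. This is purely a sketch - the state names and the fixed-size array are assumptions of mine, not the series' actual data structures - but it shows why a signalled-away waiter cannot simply free its token: OPAL still owns it until the completion arrives.

```c
#include <assert.h>

/* Hypothetical per-token states. A waiter interrupted by a signal
 * marks its token ABANDONED; the eventual completion then recycles the
 * token instead of handing the response to a waiter that is long gone. */
enum tok_state { TOK_FREE, TOK_ALLOCATED, TOK_DISPATCHED,
		 TOK_ABANDONED, TOK_COMPLETED };

#define NR_TOKENS 4
static enum tok_state state[NR_TOKENS];

static int tok_get(void)
{
	for (int i = 0; i < NR_TOKENS; i++)
		if (state[i] == TOK_FREE) {
			state[i] = TOK_ALLOCATED;
			return i;
		}
	return -1;		/* pool exhausted */
}

static void tok_dispatch(int t) { state[t] = TOK_DISPATCHED; }

/* Waiter interrupted: can't free the token yet, firmware still owns it. */
static void tok_abandon(int t) { state[t] = TOK_ABANDONED; }

/* Completion: an abandoned token goes straight back to the free pool,
 * otherwise a live waiter will collect the response. */
static void tok_complete(int t)
{
	state[t] = (state[t] == TOK_ABANDONED) ? TOK_FREE : TOK_COMPLETED;
}
```

The key transition is `ABANDONED -> FREE` on completion; without that extra state the token would leak every time a waiter took a signal.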
[PATCH v5 05/10] powerpc/opal: Make __opal_async_{get, release}_token() static
There are no callers of either __opal_async_get_token() or
__opal_async_release_token() outside of opal-async.c, so make them
static.

This patch also removes the possibility of "emergency through
synchronous call to __opal_async_get_token()"; as such it makes more
sense to initialise opal_async_sem for the maximum number of async
tokens.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/include/asm/opal.h             |  2 --
 arch/powerpc/platforms/powernv/opal-async.c | 10 +++---
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 726c23304a57..0078eb5acf98 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -304,9 +304,7 @@ extern void opal_notifier_enable(void);
 extern void opal_notifier_disable(void);
 extern void opal_notifier_update_evt(uint64_t evt_mask, uint64_t evt_val);
-extern int __opal_async_get_token(void);
 extern int opal_async_get_token_interruptible(void);
-extern int __opal_async_release_token(int token);
 extern int opal_async_release_token(int token);
 extern int opal_async_wait_response(uint64_t token, struct opal_msg *msg);
 extern int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data);
diff --git a/arch/powerpc/platforms/powernv/opal-async.c b/arch/powerpc/platforms/powernv/opal-async.c
index cf33769a7b72..c43421ab2d2f 100644
--- a/arch/powerpc/platforms/powernv/opal-async.c
+++ b/arch/powerpc/platforms/powernv/opal-async.c
@@ -33,7 +33,7 @@ static struct semaphore opal_async_sem;
 static struct opal_msg *opal_async_responses;
 static unsigned int opal_max_async_tokens;
 
-int __opal_async_get_token(void)
+static int __opal_async_get_token(void)
 {
 	unsigned long flags;
 	int token;
@@ -73,7 +73,7 @@ int opal_async_get_token_interruptible(void)
 }
 EXPORT_SYMBOL_GPL(opal_async_get_token_interruptible);
 
-int __opal_async_release_token(int token)
+static int __opal_async_release_token(int token)
 {
 	unsigned long flags;
 
@@ -199,11 +199,7 @@ int __init opal_async_comp_init(void)
 		goto out_opal_node;
 	}
 
-	/* Initialize to 1 less than the maximum tokens available, as we may
-	 * require to pop one during emergency through synchronous call to
-	 * __opal_async_get_token()
-	 */
-	sema_init(&opal_async_sem, opal_max_async_tokens - 1);
+	sema_init(&opal_async_sem, opal_max_async_tokens);
 
 out_opal_node:
 	of_node_put(opal_node);

-- 
2.15.0
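The hunk above sizes the semaphore to cover every token now that the reserved "emergency" token is gone. A userspace sketch of the same gating idea with a POSIX counting semaphore (the pool size and function names are mine, chosen only to mirror the shape of the kernel code):

```c
#include <semaphore.h>

#define MAX_TOKENS 3

static sem_t tok_sem;

/* With no emergency synchronous path to reserve for, the semaphore can
 * cover every token: initialise to MAX_TOKENS, not MAX_TOKENS - 1. */
static void tok_pool_init(void)
{
	sem_init(&tok_sem, 0, MAX_TOKENS);
}

/* Non-blocking probe standing in for down_interruptible(): 0 on
 * success, -1 when every token is out. */
static int tok_try_get(void)
{
	return sem_trywait(&tok_sem);
}

static void tok_put(void)
{
	sem_post(&tok_sem);
}
```

Initialising to `MAX_TOKENS - 1` here would make the fourth `tok_try_get()` fail even though a token is actually free, which is exactly the off-by-one the patch removes.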
[PATCH v5 07/10] powernv/opal-sensor: remove not needed lock
From: Stewart Smith <stew...@linux.vnet.ibm.com>

Parallel sensor reads could run out of async tokens due to
opal_get_sensor_data grabbing tokens but then doing the sensor read
behind a mutex, essentially serializing the (possibly asynchronous and
relatively slow) sensor read.

It turns out that the mutex isn't needed at all: not only should the
OPAL interface allow concurrent reads, the implementation is certainly
safe for that, and if any sensor we were reading from somewhere isn't,
doing the mutual exclusion in the kernel is the wrong place to do it -
OPAL should be doing it for the kernel.

So, remove the mutex.

Additionally, we shouldn't be printing out an error when we don't get a
token, as the only way this should happen is if we've been interrupted
in down_interruptible() on the semaphore.

Reported-by: Robert Lippert <rlipp...@google.com>
Signed-off-by: Stewart Smith <stew...@linux.vnet.ibm.com>
Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/platforms/powernv/opal-sensor.c | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-sensor.c b/arch/powerpc/platforms/powernv/opal-sensor.c
index aa267f120033..0a7074bb91dc 100644
--- a/arch/powerpc/platforms/powernv/opal-sensor.c
+++ b/arch/powerpc/platforms/powernv/opal-sensor.c
@@ -19,13 +19,10 @@
  */
 
 #include
-#include
 #include
 #include
 #include
 
-static DEFINE_MUTEX(opal_sensor_mutex);
-
 /*
  * This will return sensor information to driver based on the requested sensor
  * handle. A handle is an opaque id for the powernv, read by the driver from the
@@ -38,13 +35,9 @@ int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data)
 	__be32 data;
 
 	token = opal_async_get_token_interruptible();
-	if (token < 0) {
-		pr_err("%s: Couldn't get the token, returning\n", __func__);
-		ret = token;
-		goto out;
-	}
+	if (token < 0)
+		return token;
 
-	mutex_lock(&opal_sensor_mutex);
 	ret = opal_sensor_read(sensor_hndl, token, &data);
 	switch (ret) {
 	case OPAL_ASYNC_COMPLETION:
@@ -52,7 +45,7 @@ int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data)
 		if (ret) {
 			pr_err("%s: Failed to wait for the async response, %d\n",
 			       __func__, ret);
-			goto out_token;
+			goto out;
 		}
 
 		ret = opal_error_code(opal_get_async_rc(msg));
@@ -73,10 +66,8 @@ int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data)
 		break;
 	}
 
-out_token:
-	mutex_unlock(&opal_sensor_mutex);
-	opal_async_release_token(token);
 out:
+	opal_async_release_token(token);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(opal_get_sensor_data);

-- 
2.15.0
[PATCH v5 09/10] powerpc/powernv: Add OPAL_BUSY to opal_error_code()
Also export opal_error_code() so that it can be used in modules.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/platforms/powernv/opal.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 65c79ecf5a4d..041ddbd1fc57 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -998,6 +998,7 @@ int opal_error_code(int rc)
 	case OPAL_PARAMETER:		return -EINVAL;
 	case OPAL_ASYNC_COMPLETION:	return -EINPROGRESS;
+	case OPAL_BUSY:
 	case OPAL_BUSY_EVENT:		return -EBUSY;
 	case OPAL_NO_MEM:		return -ENOMEM;
 	case OPAL_PERMISSION:		return -EPERM;
@@ -1037,3 +1038,4 @@ EXPORT_SYMBOL_GPL(opal_write_oppanel_async);
 /* Export this for KVM */
 EXPORT_SYMBOL_GPL(opal_int_set_mfrr);
 EXPORT_SYMBOL_GPL(opal_int_eoi);
+EXPORT_SYMBOL_GPL(opal_error_code);

-- 
2.15.0
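The mapping function's shape - a fall-through so plain OPAL_BUSY shares OPAL_BUSY_EVENT's -EBUSY - can be checked in isolation. A standalone sketch (the OPAL constant values here are stand-ins I picked for the example; only the switch structure mirrors the patch):

```c
#include <errno.h>

/* Illustrative OPAL return codes - values are placeholders for this
 * sketch, the mapping shape is what matches the patch. */
enum { OPAL_SUCCESS = 0, OPAL_PARAMETER = -1, OPAL_BUSY = -2,
       OPAL_PERMISSION = -3, OPAL_NO_MEM = -9,
       OPAL_BUSY_EVENT = -12, OPAL_ASYNC_COMPLETION = -15 };

static int opal_to_errno(int rc)
{
	switch (rc) {
	case OPAL_SUCCESS:		return 0;
	case OPAL_PARAMETER:		return -EINVAL;
	case OPAL_ASYNC_COMPLETION:	return -EINPROGRESS;
	case OPAL_BUSY:			/* new: falls through to -EBUSY */
	case OPAL_BUSY_EVENT:		return -EBUSY;
	case OPAL_NO_MEM:		return -ENOMEM;
	case OPAL_PERMISSION:		return -EPERM;
	default:			return -EIO;
	}
}
```

This is the translation the cover letter cares about: once OPAL_BUSY maps to -EBUSY, powernv_flash can stop collapsing "flash is busy" into the misleading -EIO.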
[PATCH v5 04/10] mtd: powernv_flash: Don't return -ERESTARTSYS on interrupted token acquisition
Because the MTD core might split up a read() or write() from userspace
into several calls to the driver, we may fail to get a token but
already have done some work; it is best to return -EINTR back to
userspace and have them decide what to do.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
Acked-by: Boris Brezillon <boris.brezil...@free-electrons.com>
---
 drivers/mtd/devices/powernv_flash.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c
index 4dd3b5d2feb2..3343d4f5c4f3 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -47,6 +47,11 @@ enum flash_op {
 	FLASH_OP_ERASE,
 };
 
+/*
+ * Don't return -ERESTARTSYS if we can't get a token, the MTD core
+ * might have split up the call from userspace and called into the
+ * driver more than once, we'll already have done some amount of work.
+ */
 static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op,
 		loff_t offset, size_t len, size_t *retlen, u_char *buf)
 {
@@ -63,6 +68,8 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op,
 	if (token < 0) {
 		if (token != -ERESTARTSYS)
 			dev_err(dev, "Failed to get an async token\n");
+		else
+			token = -EINTR;
 		return token;
 	}

-- 
2.15.0
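The rewrite rule itself is tiny: -ERESTARTSYS means "the kernel may transparently restart the syscall", which is wrong once partial progress may have been made, so it becomes -EINTR and the decision goes to userspace. A sketch of just that filter (ERESTARTSYS's value here matches the kernel-internal constant, which by design never reaches userspace):

```c
#include <errno.h>

#define ERESTARTSYS 512	/* kernel-internal, never visible to userspace */

/* If token acquisition was interrupted, report -EINTR instead of
 * -ERESTARTSYS: the MTD core may already have completed part of a
 * split-up read()/write(), so a transparent restart would redo work.
 * Any other value - an error or a valid token - passes through. */
static int filter_token_err(int token)
{
	if (token == -ERESTARTSYS)
		return -EINTR;
	return token;
}
```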
[PATCH v5 01/10] mtd: powernv_flash: Use WARN_ON_ONCE() rather than BUG_ON()
BUG_ON() should be reserved for situations where we can no longer
guarantee the integrity of the system. In the case where
powernv_flash_async_op() receives an impossible op, we can still
guarantee the integrity of the system.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
Acked-by: Boris Brezillon <boris.brezil...@free-electrons.com>
---
 drivers/mtd/devices/powernv_flash.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c
index f5396f26ddb4..f9ec38281ff2 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -78,7 +78,9 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op,
 		rc = opal_flash_erase(info->id, offset, len, token);
 		break;
 	default:
-		BUG_ON(1);
+		WARN_ON_ONCE(1);
+		opal_async_release_token(token);
+		return -EIO;
 	}
 
 	if (rc != OPAL_ASYNC_COMPLETION) {

-- 
2.15.0
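The behavioural difference is easy to show in userspace: instead of crashing on an impossible op, warn once, release whatever was acquired, and fail the request. A sketch where a one-shot latch stands in for WARN_ON_ONCE() (everything here is illustrative scaffolding, not the driver's code):

```c
#include <stdio.h>

enum flash_op { FLASH_OP_READ, FLASH_OP_WRITE, FLASH_OP_ERASE };

static int warned;	/* stands in for WARN_ON_ONCE()'s one-shot latch */

static void warn_once(void)
{
	if (!warned) {
		warned = 1;
		fprintf(stderr, "unexpected flash op\n");
	}
}

/* An impossible op is a driver bug, but not one that compromises
 * system integrity: recover by cleaning up and failing the request
 * rather than taking the whole machine down as BUG_ON() would. */
static int dispatch_op(int op)
{
	switch (op) {
	case FLASH_OP_READ:
	case FLASH_OP_WRITE:
	case FLASH_OP_ERASE:
		return 0;
	default:
		warn_once();	/* WARN_ON_ONCE(1) in the kernel */
		/* the real driver releases its async token here */
		return -5;	/* -EIO */
	}
}
```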
[PATCH v3 3/4] powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint
Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

tm_reclaim() has optimisations to not always save the FP/Altivec
registers to the checkpointed save area. This was originally done
because the caller might have information that the checkpointed
registers aren't valid due to lazy save and restore. We've also been a
little vague as to how tm_reclaim() leaves the FP/Altivec state since
it doesn't necessarily always save it to the thread struct. This has
led to an (incorrect) assumption that it leaves the checkpointed state
on the CPU.

tm_recheckpoint() has similar optimisations in reverse. It may not
always reload the checkpointed FP/Altivec registers from the thread
struct before the trecheckpoint. It is therefore quite unclear where it
expects to get the state from. This didn't help with the assumption
made about tm_reclaim().

These optimisations sit in what is by definition a slow path. If a
process has to go through a reclaim/recheckpoint then its transaction
will be doomed on returning to userspace. This means that the process
will be unable to complete its transaction and be forced to its failure
handler. This is already an out of line case for userspace.
Furthermore, the cost of copying 64 times 128 bits from registers isn't
very long[0] (at all) on modern processors. As such it appears these
optimisations have only served to increase code complexity and are
unlikely to have had a measurable performance impact.
Our transactional memory handling has been riddled with bugs. A cause
of this has been difficulty in following the code flow; code complexity
has not been our friend here. It makes sense to remove these
optimisations in favour of a (hopefully) more stable implementation.

This patch does mean that sometimes the assembly will needlessly save
'junk' registers which will subsequently get overwritten with the
correct value by the C code which calls the assembly function. This
small inefficiency is far outweighed by the reduction in complexity for
general TM code, context switching paths, and the transactional
facility unavailable exception handler.

0: I tried to measure it once for other work and found that it was
hiding in the noise of everything else I was working with. I find it
exceedingly likely this will be the case here.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
V2: Unchanged
V3: Unchanged

 arch/powerpc/include/asm/tm.h   |  5 ++--
 arch/powerpc/kernel/process.c   | 22 ++-
 arch/powerpc/kernel/signal_32.c |  2 +-
 arch/powerpc/kernel/signal_64.c |  2 +-
 arch/powerpc/kernel/tm.S        | 59 -
 arch/powerpc/kernel/traps.c     | 26 +-
 6 files changed, 35 insertions(+), 81 deletions(-)

diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h
index 82e06ca3a49b..33d965911bec 100644
--- a/arch/powerpc/include/asm/tm.h
+++ b/arch/powerpc/include/asm/tm.h
@@ -11,10 +11,9 @@
 extern void tm_enable(void);
 extern void tm_reclaim(struct thread_struct *thread,
-		       unsigned long orig_msr, uint8_t cause);
+		       uint8_t cause);
 extern void tm_reclaim_current(uint8_t cause);
-extern void tm_recheckpoint(struct thread_struct *thread,
-			    unsigned long orig_msr);
+extern void tm_recheckpoint(struct thread_struct *thread);
 extern void tm_abort(uint8_t cause);
 extern void tm_save_sprs(struct thread_struct *thread);
 extern void tm_restore_sprs(struct thread_struct *thread);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index bf651f2fd3bd..b00c291cd05c 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -869,6 +869,8 @@ static void tm_reclaim_thread(struct thread_struct *thr,
 	giveup_all(container_of(thr, struct task_struct, thread));
 
+	tm_reclaim(thr, cause);
+
 	/*
 	 * If we are in a transaction and FP is off then we can't have
 	 * used FP inside that transaction. Hence the checkpointed
@@ -887,8 +889,6 @@ static void tm_reclaim_thread(struct thread_struct *thr,
 	if ((thr->ckpt_regs.msr & MSR_VEC) == 0)
 		memcpy(&thr->ckvr_state, &thr->vr_state,
 		       sizeof(struct thread_vr_state));
-
-	tm_reclaim(thr, thr->ckpt_regs.msr, cause);
 }
 
 void tm_reclaim_current(uint8_t cause)
@@ -937,11 +937,9 @@ static inline void tm_reclaim_task(struct task_struct *tsk)
 		tm_save_sprs(thr);
 }
 
-extern void __
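The hunk above moves tm_reclaim() before the fix-up memcpys: reclaim now dumps checkpointed state unconditionally, and C code afterwards overwrites the checkpoint area from live state for any facility the transaction never had enabled. A data-only sketch of that fix-up rule (the MSR bit value and struct layout are invented for the example):

```c
#include <string.h>

#define MSR_FP 0x2000UL	/* illustrative bit, not the real MSR layout */

struct regset { double r[4]; };

struct thread {
	unsigned long ckpt_msr;		/* MSR checkpointed at tbegin */
	struct regset fp;		/* live FP state */
	struct regset ckfp;		/* checkpointed FP state */
};

/* After an unconditional reclaim dumped (possibly junk) checkpointed
 * FP registers: if the transaction never had FP enabled it cannot have
 * touched FP, so the checkpointed state must equal the live state. */
static void fixup_ckpt(struct thread *t)
{
	if (!(t->ckpt_msr & MSR_FP))
		memcpy(&t->ckfp, &t->fp, sizeof(t->fp));
}
```

With FP enabled in `ckpt_msr` the dumped checkpoint is kept; with it disabled the junk is replaced by the live registers - the same rule the patch applies for both FP and Altivec.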
[PATCH v3 4/4] powerpc: Remove facility loadups on transactional {fp, vec, vsx} unavailable
After handling a transactional FP, Altivec or VSX unavailable
exception, the return to userspace code will detect that the
TIF_RESTORE_TM bit is set and call restore_tm_state().
restore_tm_state() will call restore_math() to ensure that the correct
facilities are loaded. This means that all the loadup code in
{fp,altivec,vsx}_unavailable_tm() is doing pointless work and can
simply be removed.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
V2: Obvious cleanup which should have been in v1
V3: Unchanged

 arch/powerpc/kernel/traps.c | 30 --
 1 file changed, 30 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 4a7bc64352fd..3181e85ef17c 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1471,12 +1471,6 @@ void facility_unavailable_exception(struct pt_regs *regs)
 void fp_unavailable_tm(struct pt_regs *regs)
 {
-	/*
-	 * Save the MSR now because tm_reclaim_current() is likely to
-	 * change it
-	 */
-	unsigned long orig_msr = regs->msr;
-
 	/* Note:  This does not handle any kind of FP laziness. */
 
 	TM_DEBUG("FP Unavailable trap whilst transactional at 0x%lx, MSR=%lx\n",
@@ -1502,24 +1496,10 @@ void fp_unavailable_tm(struct pt_regs *regs)
 	 * so we don't want to load the VRs from the thread_struct.
 	 */
 	tm_recheckpoint(&current->thread);
-
-	/* If VMX is in use, get the transactional values back */
-	if (orig_msr & MSR_VEC) {
-		msr_check_and_set(MSR_VEC);
-		load_vr_state(&current->thread.vr_state);
-		/* At this point all the VSX state is loaded, so enable it */
-		regs->msr |= MSR_VSX;
-	}
 }
 
 void altivec_unavailable_tm(struct pt_regs *regs)
 {
-	/*
-	 * Save the MSR now because tm_reclaim_current() is likely to
-	 * change it
-	 */
-	unsigned long orig_msr = regs->msr;
-
 	/* See the comments in fp_unavailable_tm().  This function operates
 	 * the same way.
 	 */
@@ -1531,12 +1511,6 @@ void altivec_unavailable_tm(struct pt_regs *regs)
 	current->thread.load_vec = 1;
 	tm_recheckpoint(&current->thread);
 	current->thread.used_vr = 1;
-
-	if (orig_msr & MSR_FP) {
-		msr_check_and_set(MSR_FP);
-		load_fp_state(&current->thread.fp_state);
-		regs->msr |= MSR_VSX;
-	}
 }
 
 void vsx_unavailable_tm(struct pt_regs *regs)
@@ -1561,10 +1535,6 @@ void vsx_unavailable_tm(struct pt_regs *regs)
 	current->thread.load_fp = 1;
 	tm_recheckpoint(&current->thread);
-
-	msr_check_and_set(MSR_FP | MSR_VEC);
-	load_fp_state(&current->thread.fp_state);
-	load_vr_state(&current->thread.vr_state);
 }
 
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */

-- 
2.15.0
[PATCH v3 2/4] powerpc: Force reload for recheckpoint during tm {fp, vec, vsx} unavailable exception
Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

tm_reclaim() has optimisations to not always save the FP/Altivec
registers to the checkpointed save area. This was originally done
because the caller might have information that the checkpointed
registers aren't valid due to lazy save and restore. We've also been a
little vague as to how tm_reclaim() leaves the FP/Altivec state since
it doesn't necessarily always save it to the thread struct. This has
led to an (incorrect) assumption that it leaves the checkpointed state
on the CPU.

tm_recheckpoint() has similar optimisations in reverse. It may not
always reload the checkpointed FP/Altivec registers from the thread
struct before the trecheckpoint. It is therefore quite unclear where it
expects to get the state from. This didn't help with the assumption
made about tm_reclaim().

This patch is a minimal fix for ease of backporting. A more correct fix
which removes the msr parameter to tm_reclaim() and tm_recheckpoint()
altogether has been upstreamed to apply on top of this patch.

Fixes: dc3106690b20 ("powerpc: tm: Always use fp_state and vr_state to
store live registers")
Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
V2: Add this patch for ease of backporting the same fix as the next
patch.
V3: No change

 arch/powerpc/kernel/process.c |  4 ++--
 arch/powerpc/kernel/traps.c   | 22 +-
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index cff887e67eb9..bf651f2fd3bd 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -867,6 +867,8 @@ static void tm_reclaim_thread(struct thread_struct *thr,
 	if (!MSR_TM_SUSPENDED(mfmsr()))
 		return;
 
+	giveup_all(container_of(thr, struct task_struct, thread));
+
 	/*
 	 * If we are in a transaction and FP is off then we can't have
 	 * used FP inside that transaction. Hence the checkpointed
@@ -886,8 +888,6 @@ static void tm_reclaim_thread(struct thread_struct *thr,
 		memcpy(&thr->ckvr_state, &thr->vr_state,
 		       sizeof(struct thread_vr_state));
 
-	giveup_all(container_of(thr, struct task_struct, thread));
-
 	tm_reclaim(thr, thr->ckpt_regs.msr, cause);
 }
 
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index ef6a45969812..a7d42c89a257 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1471,6 +1471,12 @@ void facility_unavailable_exception(struct pt_regs *regs)
 void fp_unavailable_tm(struct pt_regs *regs)
 {
+	/*
+	 * Save the MSR now because tm_reclaim_current() is likely to
+	 * change it
+	 */
+	unsigned long orig_msr = regs->msr;
+
 	/* Note:  This does not handle any kind of FP laziness. */
 
 	TM_DEBUG("FP Unavailable trap whilst transactional at 0x%lx, MSR=%lx\n",
@@ -1495,10 +1501,10 @@ void fp_unavailable_tm(struct pt_regs *regs)
 	 * If VMX is in use, the VRs now hold checkpointed values,
 	 * so we don't want to load the VRs from the thread_struct.
 	 */
-	tm_recheckpoint(&current->thread, MSR_FP);
+	tm_recheckpoint(&current->thread, orig_msr | MSR_FP);
 
 	/* If VMX is in use, get the transactional values back */
-	if (regs->msr & MSR_VEC) {
+	if (orig_msr & MSR_VEC) {
 		msr_check_and_set(MSR_VEC);
 		load_vr_state(&current->thread.vr_state);
 		/* At this point all the VSX state is loaded, so enable it */
@@ -1508,6 +1514,12 @@ void fp_unavailable_tm(struct pt_regs *regs)
 
 void altivec_unavailable_tm(struct pt_regs *regs)
 {
+	/*
+	 * Save the MSR now because tm_reclaim_current() is likely to
+	 * change it
+	 */
+	unsigned long orig_msr = regs->msr;
+
 	/* See the comments in fp_unavailable_tm().  This function operates
 	 * the same way.
 	 */
@@ -1517,10 +1529,10 @@ void altivec_unavailable_tm(struct pt_regs *regs)
 		 regs->nip, regs->msr);
 	tm_reclaim_current(TM_CAUSE_FAC_UNAV);
 	current->thread.load_vec = 1;
-	tm_recheckpoint(&current->thread, MSR_VEC);
+	tm_recheckpoint(&current->thread, orig_msr | MSR_VEC);
 	current->thread.used_vr = 1;
 
-	if (regs->msr & MSR_FP) {
+	if (orig_msr & MSR_FP) {
 		msr_check_and_set(MSR_FP);
 		load_fp
[PATCH v3 1/4] powerpc: Don't enable FP/Altivec if not checkpointed
Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

Lazy save and restore of FP/Altivec cannot be done if a process is
transactional. If a facility was enabled it must remain enabled
whenever a thread is transactional.

Commit dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware
transactional memory in use") ensures that the facilities are always
enabled if a thread is transactional. A bug in the introduced code may
cause it to inadvertently enable a facility that was (and should
remain) disabled. The problem with this extraneous enablement is that
the registers for the erroneously enabled facility have not been
correctly recheckpointed - the recheckpointing code assumed the
facility would remain disabled.

Further compounding the issue, the transactional {fp,altivec,vsx}
unavailable code has been incorrectly using the MSR to enable
facilities. The presence of the {FP,VEC,VSX} bit in the regs->msr
simply indicates that the registers are live on the CPU, not that the
kernel should load them before returning to userspace. This has worked
due to the bug mentioned above.

This causes transactional threads which return to their failure handler
to observe incorrect checkpointed registers. Perhaps an example will
help illustrate the problem:

A userspace process is running and uses both FP and Altivec registers.
This process then continues to run for some time without touching
either set of registers. The kernel subsequently disables the
facilities as part of lazy save and restore. The userspace process then
performs a tbegin and the CPU checkpoints 'junk' FP and Altivec
registers. The process then performs a floating point instruction
triggering a fp unavailable exception in the kernel. The kernel then
loads the FP registers - and only the FP registers. Since the thread is
transactional it must perform a reclaim and recheckpoint to ensure both
the checkpointed registers and the transactional registers are correct.
It then (correctly) enables MSR[FP] for the process. Later (on
exception exit) the kernel also (inadvertently) enables MSR[VEC]. The
process is then returned to userspace.

Since the act of loading the FP registers doomed the transaction we
know the CPU will fail the transaction, restore its checkpointed
registers, and return the process to its failure handler. The problem
is that we're now running with Altivec enabled and the 'junk'
checkpointed registers are restored. The kernel had only recheckpointed
FP.

This patch solves this by only activating FP/Altivec if userspace was
using them when it entered the kernel and not simply if the process is
transactional.
Fixes: dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware
transactional memory in use")
Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
V2: Rather than incorrectly using the MSR to enable {FP,VEC,VSX} use
the load_fp and load_vec booleans to help restore_math() make the
correct decision
V3: Put tm_active_with_{fp,altivec}() inside a #ifdef
CONFIG_PPC_TRANSACTIONAL_MEM

 arch/powerpc/kernel/process.c | 18 --
 arch/powerpc/kernel/traps.c   |  8 
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a0c74bbf3454..cff887e67eb9 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -97,9 +97,23 @@ static inline bool msr_tm_active(unsigned long msr)
 {
 	return MSR_TM_ACTIVE(msr);
 }
+
+static bool tm_active_with_fp(struct task_struct *tsk)
+{
+	return msr_tm_active(tsk->thread.regs->msr) &&
+		(tsk->thread.ckpt_regs.msr & MSR_FP);
+}
+
+static bool tm_active_with_altivec(struct task_struct *tsk)
+{
+	return msr_tm_active(tsk->thread.regs->msr) &&
+		(tsk->thread.ckpt_regs.msr & MSR_VEC);
+}
 #else
 static inline bool msr_tm_active(unsigned long msr) { return false; }
 static inline void check_if_tm_restore_required(struct task_struct *tsk) { }
+static inline bool tm_active_with_fp(struct task_struct *tsk) { return false; }
+static inline bool tm_active_with_altivec(struct task_struct *tsk) { return false; }
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
 bool strict_msr_control;
@@ -232,7 +246,7 @@ EXPORT_SYMBOL(enable_kernel_fp);
 
 static int restore_fp(struct task_struct *tsk)
 {
-	if (tsk->thread.load_fp || msr_tm_active(tsk->thread.regs->msr)) {
+	if (tsk->thread.
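The predicates introduced above refine "is transactional" into "is transactional *and* had this facility checkpointed as enabled". A standalone sketch of that per-facility check (the bit positions and the flattened `struct thread` are invented for the example; only the boolean logic matches the patch):

```c
/* Illustrative bit positions only - the real MSR layout differs. */
#define MSR_TM_ACT	0x1UL
#define MSR_FP		0x2UL
#define MSR_VEC		0x4UL

struct thread { unsigned long msr, ckpt_msr; };

static int msr_tm_active(unsigned long msr)
{
	return !!(msr & MSR_TM_ACT);
}

/* The fix: a facility only has to stay enabled across a transaction if
 * it was checkpointed as enabled. Being transactional alone is not
 * enough - enabling an unchecked facility exposes junk checkpointed
 * registers on transaction failure. */
static int tm_active_with_fp(const struct thread *t)
{
	return msr_tm_active(t->msr) && (t->ckpt_msr & MSR_FP);
}

static int tm_active_with_altivec(const struct thread *t)
{
	return msr_tm_active(t->msr) && (t->ckpt_msr & MSR_VEC);
}
```

In the commit's example - FP checkpointed, Altivec not - `tm_active_with_fp()` is true while `tm_active_with_altivec()` is false, so restore_math() no longer drags the junk Altivec checkpoint back to userspace.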
Re: [PATCH 1/2] powerpc: Don't enable FP/Altivec if not checkpointed
On Thu, 2017-11-02 at 10:19 +0800, kbuild test robot wrote:
> Hi Cyril,
> 
> Thank you for the patch! Yet something to improve:

Once again robot, you have done brilliantly! You're 100% correct and
the last thing I want to do is break the build with
CONFIG_PPC_TRANSACTIONAL_MEM turned off.

Life saver, Thanks so much kbuild.

Cyril

> [auto build test ERROR on powerpc/next]
> [also build test ERROR on v4.14-rc7 next-20171018]
> [if your patch is applied to the wrong git tree, please drop us a note to
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Cyril-Bur/powerpc-Don-t-enable-FP-Altivec-if-not-checkpointed/20171102-073816
> base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
> config: powerpc-asp8347_defconfig (attached as .config)
> compiler: powerpc-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=powerpc
> 
> All errors (new ones prefixed by >>):
> 
>    arch/powerpc/kernel/process.c: In function 'is_transactionally_fp':
> >> arch/powerpc/kernel/process.c:243:15: error: 'struct thread_struct' has
> >> no member named 'ckpt_regs'
>      (tsk->thread.ckpt_regs.msr & MSR_FP);
>                   ^
>    arch/powerpc/kernel/process.c:244:1: error: control reaches end of
>    non-void function [-Werror=return-type]
>     }
>     ^
>    cc1: all warnings being treated as errors
> 
> vim +243 arch/powerpc/kernel/process.c
> 
>    239	
>    240	static int is_transactionally_fp(struct task_struct *tsk)
>    241	{
>    242		return msr_tm_active(tsk->thread.regs->msr) &&
>  > 243			(tsk->thread.ckpt_regs.msr & MSR_FP);
>    244	}
>    245	
> 
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Re: [PATCH] selftests/powerpc: Check FP/VEC on exception in TM
On Wed, 2017-11-01 at 15:23 -0400, Gustavo Romero wrote:
> Add a self test to check if FP/VEC/VSX registers are sane (restored
> correctly) after a FP/VEC/VSX unavailable exception is caught during a
> transaction.
> 
> This test checks all possibilities in a thread regarding the combination
> of MSR.[FP|VEC] states in a thread and for each scenario raises a
> FP/VEC/VSX unavailable exception in transactional state, verifying if
> vs0 and vs32 registers, which are representatives of FP/VEC/VSX reg
> sets, are not corrupted.

Thanks Gustavo, I do have one more thought on an improvement for this
test, which is that:

+	/* Counter for busy wait */
+	uint64_t counter = 0x1ff00;

is a bit fragile. What we should do is have the test work out how long
it should spin until it reliably gets a TM_CAUSE_FAC_UNAV failure and
then use that for these tests. This will only become a problem if we
were to change kernel heuristics, which is fine for now. I'll try to
get that added soon, but for now this test has proven too useful to
delay adding as is.
> Signed-off-by: Gustavo Romero <grom...@linux.vnet.ibm.com> > Signed-off-by: Breno Leitao <lei...@debian.org> > Signed-off-by: Cyril Bur <cyril...@gmail.com> > --- > tools/testing/selftests/powerpc/tm/Makefile| 3 +- > .../testing/selftests/powerpc/tm/tm-unavailable.c | 368 > + > tools/testing/selftests/powerpc/tm/tm.h| 5 + > 3 files changed, 375 insertions(+), 1 deletion(-) > create mode 100644 tools/testing/selftests/powerpc/tm/tm-unavailable.c > > diff --git a/tools/testing/selftests/powerpc/tm/Makefile > b/tools/testing/selftests/powerpc/tm/Makefile > index 7bfcd45..24855c0 100644 > --- a/tools/testing/selftests/powerpc/tm/Makefile > +++ b/tools/testing/selftests/powerpc/tm/Makefile > @@ -2,7 +2,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr > tm-signal-context-chk-fpu > tm-signal-context-chk-vmx tm-signal-context-chk-vsx > > TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv > tm-signal-stack \ > - tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail \ > + tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable \ > $(SIGNAL_CONTEXT_CHK_TESTS) > > include ../../lib.mk > @@ -16,6 +16,7 @@ $(OUTPUT)/tm-syscall: CFLAGS += -I../../../../../usr/include > $(OUTPUT)/tm-tmspr: CFLAGS += -pthread > $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64 > $(OUTPUT)/tm-resched-dscr: ../pmu/lib.o > +$(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 > -Wno-error=uninitialized -mvsx > > SIGNAL_CONTEXT_CHK_TESTS := $(patsubst > %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS)) > $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S > diff --git a/tools/testing/selftests/powerpc/tm/tm-unavailable.c > b/tools/testing/selftests/powerpc/tm/tm-unavailable.c > new file mode 100644 > index 000..69a4e8c > --- /dev/null > +++ b/tools/testing/selftests/powerpc/tm/tm-unavailable.c > @@ -0,0 +1,368 @@ > +/* > + * Copyright 2017, Gustavo Romero, Breno Leitao, Cyril Bur, IBM Corp. > + * Licensed under GPLv2. 
> + * > + * Force FP, VEC and VSX unavailable exception during transaction in all > + * possible scenarios regarding the MSR.FP and MSR.VEC state, e.g. when FP > + * is enable and VEC is disable, when FP is disable and VEC is enable, and > + * so on. Then we check if the restored state is correctly set for the > + * FP and VEC registers to the previous state we set just before we entered > + * in TM, i.e. we check if it corrupts somehow the recheckpointed FP and > + * VEC/Altivec registers on abortion due to an unavailable exception in TM. > + * N.B. In this test we do not test all the FP/Altivec/VSX registers for > + * corruption, but only for registers vs0 and vs32, which are respectively > + * representatives of FP and VEC/Altivec reg sets. > + */ > + > +#define _GNU_SOURCE > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "tm.h" > + > +#define DEBUG 0 > + > +/* Unavailable exceptions to test in HTM */ > +#define FP_UNA_EXCEPTION 0 > +#define VEC_UNA_EXCEPTION1 > +#define VSX_UNA_EXCEPTION2 > + > +#define NUM_EXCEPTIONS 3 > + > +struct Flags { > + int touch_fp; > + int touch_vec; > + int result; > + int exception; > +} flags; > + > +bool expecting_failure(void) > +{ > + if (flags.touch_fp && flags.exception == FP_UNA_EXCEPTION) > + return false; > + > + if (flags.touch_vec && flags.exception == VEC_UNA_EXCEPTION) > + return false; > + > + /* If both FP and VEC are touched it does
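The test's pass/fail decision comes from the condition-register image captured after the transaction: `is_failure()` (shown in the patch above) masks the top nibble of the 32-bit CR image, which is cr0, and requires both bits of the `0xa` pattern to be set. A self-contained copy of that bit test:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors is_failure() from tm-unavailable.c: shift the 4-bit cr0 field
 * (bits 31:28 of the CR image) down and require the 0b1010 pattern that
 * marks a failed transaction. */
int is_failure(uint64_t condition_reg)
{
    return ((condition_reg >> 28) & 0xa) == 0xa;
}
```

Values other than the exact `0xa` pattern in cr0 (including the `0b0010` a successful `tbegin.` leaves) are not treated as failures.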
[PATCH v2 2/4] powerpc: Force reload for recheckpoint during tm {fp, vec, vsx} unavailable exception
Lazy save and restore of FP/Altivec means that a userspace process can be sent to userspace with FP or Altivec disabled and loaded only as required (by way of an FP/Altivec unavailable exception). Transactional Memory complicates this situation as a transaction could be started without FP/Altivec being loaded up. This causes the hardware to checkpoint incorrect registers. Handling FP/Altivec unavailable exceptions while a thread is transactional requires a reclaim and recheckpoint to ensure the CPU has correct state for both sets of registers. tm_reclaim() has optimisations to not always save the FP/Altivec registers to the checkpointed save area. This was originally done because the caller might have information that the checkpointed registers aren't valid due to lazy save and restore. We've also been a little vague as to how tm_reclaim() leaves the FP/Altivec state since it doesn't necessarily always save it to the thread struct. This has led to an (incorrect) assumption that it leaves the checkpointed state on the CPU. tm_recheckpoint() has similar optimisations in reverse. It may not always reload the checkpointed FP/Altivec registers from the thread struct before the trecheckpoint. It is therefore quite unclear where it expects to get the state from. This didn't help with the assumption made about tm_reclaim(). This patch is a minimal fix for ease of backporting. A more correct fix which removes the msr parameter to tm_reclaim() and tm_recheckpoint() altogether has been upstreamed to apply on top of this patch. Fixes: dc3106690b20 ("powerpc: tm: Always use fp_state and vr_state to store live registers") Signed-off-by: Cyril Bur <cyril...@gmail.com> --- V2: Add this patch for ease of backporting the same fix as the next patch. 
arch/powerpc/kernel/process.c | 4 ++-- arch/powerpc/kernel/traps.c | 22 +- 2 files changed, 19 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index ebb5b58a4138..cfa75e99dcfb 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -866,6 +866,8 @@ static void tm_reclaim_thread(struct thread_struct *thr, if (!MSR_TM_SUSPENDED(mfmsr())) return; + giveup_all(container_of(thr, struct task_struct, thread)); + /* * If we are in a transaction and FP is off then we can't have * used FP inside that transaction. Hence the checkpointed @@ -885,8 +887,6 @@ static void tm_reclaim_thread(struct thread_struct *thr, memcpy(>ckvr_state, >vr_state, sizeof(struct thread_vr_state)); - giveup_all(container_of(thr, struct task_struct, thread)); - tm_reclaim(thr, thr->ckpt_regs.msr, cause); } diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index ef6a45969812..a7d42c89a257 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -1471,6 +1471,12 @@ void facility_unavailable_exception(struct pt_regs *regs) void fp_unavailable_tm(struct pt_regs *regs) { + /* +* Save the MSR now because tm_reclaim_current() is likely to +* change it +*/ + unsigned long orig_msr = regs->msr; + /* Note: This does not handle any kind of FP laziness. */ TM_DEBUG("FP Unavailable trap whilst transactional at 0x%lx, MSR=%lx\n", @@ -1495,10 +1501,10 @@ void fp_unavailable_tm(struct pt_regs *regs) * If VMX is in use, the VRs now hold checkpointed values, * so we don't want to load the VRs from the thread_struct. 
*/ - tm_recheckpoint(>thread, MSR_FP); + tm_recheckpoint(>thread, orig_msr | MSR_FP); /* If VMX is in use, get the transactional values back */ - if (regs->msr & MSR_VEC) { + if (orig_msr & MSR_VEC) { msr_check_and_set(MSR_VEC); load_vr_state(>thread.vr_state); /* At this point all the VSX state is loaded, so enable it */ @@ -1508,6 +1514,12 @@ void fp_unavailable_tm(struct pt_regs *regs) void altivec_unavailable_tm(struct pt_regs *regs) { + /* +* Save the MSR now because tm_reclaim_current() is likely to +* change it +*/ + unsigned long orig_msr = regs->msr; + /* See the comments in fp_unavailable_tm(). This function operates * the same way. */ @@ -1517,10 +1529,10 @@ void altivec_unavailable_tm(struct pt_regs *regs) regs->nip, regs->msr); tm_reclaim_current(TM_CAUSE_FAC_UNAV); current->thread.load_vec = 1; - tm_recheckpoint(>thread, MSR_VEC); + tm_recheckpoint(>thread, orig_msr | MSR_VEC); current->thread.used_vr = 1; - if (regs->msr & MSR_FP) { + if (orig_msr & MSR_FP) { msr_check_and_set(MSR_FP); load_fp_state(>thread.fp_state
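The core of this fix is an ordering bug pattern: the handler read `regs->msr` *after* `tm_reclaim_current()`, which may have already clobbered the facility bits, so the VEC state was lost when recheckpointing for an FP-unavailable exception. A portable sketch of the before/after behaviour (the MSR bit positions and function names here are illustrative stand-ins, not the kernel's definitions):

```c
#include <assert.h>

#define MSR_FP  (1ul << 13)   /* assumed bit positions, for illustration */
#define MSR_VEC (1ul << 25)

struct regs { unsigned long msr; };

/* Stand-in for tm_reclaim_current(): like the real function, it may
 * clobber facility bits in regs->msr as a side effect. */
static void fake_reclaim(struct regs *regs)
{
    regs->msr &= ~(MSR_FP | MSR_VEC);
}

/* Pre-fix behaviour: reading regs->msr after the reclaim loses the
 * information that VEC was enabled. */
unsigned long buggy_recheckpoint_msr(struct regs *regs)
{
    fake_reclaim(regs);
    return regs->msr | MSR_FP;            /* VEC bit already gone */
}

/* The patch: snapshot regs->msr before the call that may change it,
 * and base the recheckpoint MSR on the snapshot. */
unsigned long fixed_recheckpoint_msr(struct regs *regs)
{
    unsigned long orig_msr = regs->msr;   /* save first */
    fake_reclaim(regs);
    return orig_msr | MSR_FP;             /* VEC survives */
}
```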
[PATCH v2 4/4] powerpc: Remove facility loadups on transactional {fp, vec, vsx} unavailable
After handling a transactional FP, Altivec or VSX unavailable exception, the return-to-userspace code will detect that the TIF_RESTORE_TM bit is set and call restore_tm_state(). restore_tm_state() will call restore_math() to ensure that the correct facilities are loaded. This means that all the loadup code in {fp,altivec,vsx}_unavailable_tm() is doing pointless work and can simply be removed. Signed-off-by: Cyril Bur <cyril...@gmail.com> --- V2: Obvious cleanup which should have been in v1 arch/powerpc/kernel/traps.c | 30 -- 1 file changed, 30 deletions(-) diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index 4a7bc64352fd..3181e85ef17c 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -1471,12 +1471,6 @@ void facility_unavailable_exception(struct pt_regs *regs) void fp_unavailable_tm(struct pt_regs *regs) { - /* -* Save the MSR now because tm_reclaim_current() is likely to -* change it -*/ - unsigned long orig_msr = regs->msr; - /* Note: This does not handle any kind of FP laziness. */ TM_DEBUG("FP Unavailable trap whilst transactional at 0x%lx, MSR=%lx\n", @@ -1502,24 +1496,10 @@ void fp_unavailable_tm(struct pt_regs *regs) * so we don't want to load the VRs from the thread_struct. */ tm_recheckpoint(>thread); - - /* If VMX is in use, get the transactional values back */ - if (orig_msr & MSR_VEC) { - msr_check_and_set(MSR_VEC); - load_vr_state(>thread.vr_state); - /* At this point all the VSX state is loaded, so enable it */ - regs->msr |= MSR_VSX; - } } void altivec_unavailable_tm(struct pt_regs *regs) { - /* -* Save the MSR now because tm_reclaim_current() is likely to -* change it -*/ - unsigned long orig_msr = regs->msr; - /* See the comments in fp_unavailable_tm(). This function operates * the same way. 
*/ @@ -1531,12 +1511,6 @@ void altivec_unavailable_tm(struct pt_regs *regs) current->thread.load_vec = 1; tm_recheckpoint(>thread); current->thread.used_vr = 1; - - if (orig_msr & MSR_FP) { - msr_check_and_set(MSR_FP); - load_fp_state(>thread.fp_state); - regs->msr |= MSR_VSX; - } } void vsx_unavailable_tm(struct pt_regs *regs) @@ -1561,10 +1535,6 @@ void vsx_unavailable_tm(struct pt_regs *regs) current->thread.load_fp = 1; tm_recheckpoint(>thread); - - msr_check_and_set(MSR_FP | MSR_VEC); - load_fp_state(>thread.fp_state); - load_vr_state(>thread.vr_state); } #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ -- 2.14.3
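The cleanup's premise is that the exception handlers no longer need to load facility state themselves: they only set the `load_fp`/`load_vec` flags, and a single restore step on the way back to userspace (`restore_math()` in the real kernel) loads whatever the flags request. A toy model of that flag-driven flow (struct and function names are illustrative, not kernel API):

```c
#include <assert.h>

struct thread_model { int load_fp, load_vec, fp_loaded, vec_loaded; };

/* Model of the slimmed-down handler: request loads via flags only,
 * with no load_fp_state()/load_vr_state() calls of its own. */
static void fp_unavailable_model(struct thread_model *t, int vec_was_on)
{
    t->load_fp = 1;
    if (vec_was_on)
        t->load_vec = 1;
}

/* Model of restore_math() on the exit path: one place performs every
 * load the flags asked for. */
static void restore_math_model(struct thread_model *t)
{
    if (t->load_fp)
        t->fp_loaded = 1;
    if (t->load_vec)
        t->vec_loaded = 1;
}
```

Centralising the loads this way is what makes the per-handler loadup code in the patch above removable.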
[PATCH v2 3/4] powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint
Lazy save and restore of FP/Altivec means that a userspace process can be sent to userspace with FP or Altivec disabled and loaded only as required (by way of an FP/Altivec unavailable exception). Transactional Memory complicates this situation as a transaction could be started without FP/Altivec being loaded up. This causes the hardware to checkpoint incorrect registers. Handling FP/Altivec unavailable exceptions while a thread is transactional requires a reclaim and recheckpoint to ensure the CPU has correct state for both sets of registers. tm_reclaim() has optimisations to not always save the FP/Altivec registers to the checkpointed save area. This was originally done because the caller might have information that the checkpointed registers aren't valid due to lazy save and restore. We've also been a little vague as to how tm_reclaim() leaves the FP/Altivec state since it doesn't necessarily always save it to the thread struct. This has led to an (incorrect) assumption that it leaves the checkpointed state on the CPU. tm_recheckpoint() has similar optimisations in reverse. It may not always reload the checkpointed FP/Altivec registers from the thread struct before the trecheckpoint. It is therefore quite unclear where it expects to get the state from. This didn't help with the assumption made about tm_reclaim(). These optimisations sit in what is by definition a slow path. If a process has to go through a reclaim/recheckpoint then its transaction will be doomed on returning to userspace. This means that the process will be unable to complete its transaction and will be forced to its failure handler. This is already an out-of-line case for userspace. Furthermore, the cost of copying 64 times 128 bits from registers isn't very high[0] (at all) on modern processors. As such it appears these optimisations have only served to increase code complexity and are unlikely to have had a measurable performance impact. 
Our transactional memory handling has been riddled with bugs. A cause of this has been difficulty in following the code flow; code complexity has not been our friend here. It makes sense to remove these optimisations in favour of a (hopefully) more stable implementation. This patch does mean that sometimes the assembly will needlessly save 'junk' registers which will subsequently get overwritten with the correct value by the C code which calls the assembly function. This small inefficiency is far outweighed by the reduction in complexity for general TM code, context switching paths, and the transactional facility unavailable exception handler. 0: I tried to measure it once for other work and found that it was hiding in the noise of everything else I was working with. I find it exceedingly likely this will be the case here. Signed-off-by: Cyril Bur <cyril...@gmail.com> --- V2: Unchanged arch/powerpc/include/asm/tm.h | 5 ++-- arch/powerpc/kernel/process.c | 22 ++- arch/powerpc/kernel/signal_32.c | 2 +- arch/powerpc/kernel/signal_64.c | 2 +- arch/powerpc/kernel/tm.S| 59 - arch/powerpc/kernel/traps.c | 26 +- 6 files changed, 35 insertions(+), 81 deletions(-) diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h index 82e06ca3a49b..33d965911bec 100644 --- a/arch/powerpc/include/asm/tm.h +++ b/arch/powerpc/include/asm/tm.h @@ -11,10 +11,9 @@ extern void tm_enable(void); extern void tm_reclaim(struct thread_struct *thread, - unsigned long orig_msr, uint8_t cause); + uint8_t cause); extern void tm_reclaim_current(uint8_t cause); -extern void tm_recheckpoint(struct thread_struct *thread, - unsigned long orig_msr); +extern void tm_recheckpoint(struct thread_struct *thread); extern void tm_abort(uint8_t cause); extern void tm_save_sprs(struct thread_struct *thread); extern void tm_restore_sprs(struct thread_struct *thread); diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index cfa75e99dcfb..4b322ede6420 100644 --- 
a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -868,6 +868,8 @@ static void tm_reclaim_thread(struct thread_struct *thr, giveup_all(container_of(thr, struct task_struct, thread)); + tm_reclaim(thr, cause); + /* * If we are in a transaction and FP is off then we can't have * used FP inside that transaction. Hence the checkpointed @@ -886,8 +888,6 @@ static void tm_reclaim_thread(struct thread_struct *thr, if ((thr->ckpt_regs.msr & MSR_VEC) == 0) memcpy(>ckvr_state, >vr_state, sizeof(struct thread_vr_state)); - - tm_reclaim(thr, thr->ckpt_regs.msr, cause); } void tm_reclaim_current(uint8_t cause) @@ -936,11 +936,9 @@ static inline void tm_reclaim_task(struct task_struct *tsk) tm_save_sprs(thr); } -extern void __tm_recheckpoin
[PATCH v2 1/4] powerpc: Don't enable FP/Altivec if not checkpointed
Lazy save and restore of FP/Altivec means that a userspace process can be sent to userspace with FP or Altivec disabled and loaded only as required (by way of an FP/Altivec unavailable exception). Transactional Memory complicates this situation as a transaction could be started without FP/Altivec being loaded up. This causes the hardware to checkpoint incorrect registers. Handling FP/Altivec unavailable exceptions while a thread is transactional requires a reclaim and recheckpoint to ensure the CPU has correct state for both sets of registers. Lazy save and restore of FP/Altivec cannot be done if a process is transactional. If a facility was enabled it must remain enabled whenever a thread is transactional. Commit dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use") ensures that the facilities are always enabled if a thread is transactional. A bug in the introduced code may cause it to inadvertently enable a facility that was (and should remain) disabled. The problem with this extraneous enablement is that the registers for the erroneously enabled facility have not been correctly recheckpointed - the recheckpointing code assumed the facility would remain disabled. Further compounding the issue, the transactional {fp,altivec,vsx} unavailable code has been incorrectly using the MSR to enable facilities. The presence of the {FP,VEC,VSX} bit in the regs->msr simply means if the registers are live on the CPU, not if the kernel should load them before returning to userspace. This has worked due to the bug mentioned above. This causes transactional threads which return to their failure handler to observe incorrect checkpointed registers. Perhaps an example will help illustrate the problem: A userspace process is running and uses both FP and Altivec registers. This process then continues to run for some time without touching either sets of registers. The kernel subsequently disables the facilities as part of lazy save and restore. 
The userspace process then performs a tbegin and the CPU checkpoints 'junk' FP and Altivec registers. The process then performs a floating point instruction triggering an FP unavailable exception in the kernel. The kernel then loads the FP registers - and only the FP registers. Since the thread is transactional it must perform a reclaim and recheckpoint to ensure both the checkpointed registers and the transactional registers are correct. It then (correctly) enables MSR[FP] for the process. Later (on exception exit) the kernel also (inadvertently) enables MSR[VEC]. The process is then returned to userspace. Since the act of loading the FP registers doomed the transaction we know the CPU will fail the transaction, restore its checkpointed registers, and return the process to its failure handler. The problem is that we're now running with Altivec enabled and the 'junk' checkpointed registers are restored. The kernel had only recheckpointed FP. This patch solves this by only activating FP/Altivec if userspace was using them when it entered the kernel and not simply if the process is transactional. 
Fixes: dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use") Signed-off-by: Cyril Bur <cyril...@gmail.com> --- V2: Rather than incorrectly using the MSR to enable {FP,VEC,VSX} use the load_fp and load_vec booleans to help restore_math() make the correct decision arch/powerpc/kernel/process.c | 17 +++-- arch/powerpc/kernel/traps.c | 8 2 files changed, 19 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index a0c74bbf3454..ebb5b58a4138 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -230,9 +230,15 @@ void enable_kernel_fp(void) } EXPORT_SYMBOL(enable_kernel_fp); +static bool tm_active_with_fp(struct task_struct *tsk) +{ + return msr_tm_active(tsk->thread.regs->msr) && + (tsk->thread.ckpt_regs.msr & MSR_FP); +} + static int restore_fp(struct task_struct *tsk) { - if (tsk->thread.load_fp || msr_tm_active(tsk->thread.regs->msr)) { + if (tsk->thread.load_fp || tm_active_with_fp(tsk)) { load_fp_state(>thread.fp_state); current->thread.load_fp++; return 1; @@ -311,10 +317,17 @@ void flush_altivec_to_thread(struct task_struct *tsk) } EXPORT_SYMBOL_GPL(flush_altivec_to_thread); +static bool tm_active_with_altivec(struct task_struct *tsk) +{ + return msr_tm_active(tsk->thread.regs->msr) && + (tsk->thread.ckpt_regs.msr & MSR_VEC); +} + + static int restore_altivec(struct task_struct *tsk) { if (cpu_has_feature(CPU_FTR_ALTIVEC) && - (tsk->thread.load_vec || msr_tm_active(tsk->thread.regs->msr))) { + (tsk->thread.load_vec || tm_active_with_altivec(tsk))) {
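The v2 predicate in the diff above, `tm_active_with_fp()`, gates the transactional restore path on two conditions: the thread is actually in a transaction, and the checkpointed MSR says FP was enabled. A portable model of that predicate (the struct fields stand in for `thread.regs->msr` / `thread.ckpt_regs.msr`, and the MSR_FP bit value is an illustrative assumption):

```c
#include <assert.h>
#include <stdbool.h>

#define MSR_FP (1ul << 13)   /* assumed FP-available bit, illustration only */

struct task_model {
    bool tm_active;          /* stand-in for msr_tm_active(regs->msr) */
    unsigned long ckpt_msr;  /* stand-in for ckpt_regs.msr */
};

/* Restore FP on the transactional path only when the thread is
 * transactional AND FP was enabled in the checkpointed MSR — never
 * merely because the thread is transactional. */
static bool tm_active_with_fp(const struct task_model *t)
{
    return t->tm_active && (t->ckpt_msr & MSR_FP) != 0;
}
```

The altivec variant is identical with MSR_VEC, which is why the patch adds the two helpers side by side.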
Re: [PATCH v4 00/10] Allow opal-async waiters to get interrupted
On Mon, 2017-10-30 at 10:15 +0100, Boris Brezillon wrote: > On Tue, 10 Oct 2017 14:32:52 +1100 > Cyril Bur <cyril...@gmail.com> wrote: > > > V4: Rework and rethink. > > > > To recap: > > Userspace MTD read()s/write()s and erases to powernv_flash become > > calls into the OPAL firmware which subsequently handles flash access. > > Because the read()s, write()s or erases can be large (bounded of > > course my the size of flash) OPAL may take some time to service the > > request, this causes the powernv_flash driver to sit in a wait_event() > > for potentially minutes. This causes two problems, firstly, tools > > appear to hang for the entire time as they cannot be interrupted by > > signals and secondly, this can trigger hung task warnings. The correct > > solution is to use wait_event_interruptible() which my rework (as part > > of this series) of the opal-async infrastructure provides. > > > > The final patch in this series achieves this. It should eliminate both > > hung tasks and threads locking up. > > > > Included in this series are other simpler fixes for powernv_flash: > > > > Don't always return EIO on error. OPAL does mutual exclusion on the > > flash and also knows when the service processor takes control of the > > flash, in both of these cases it will return OPAL_BUSY, translating > > this to EIO is misleading to userspace. > > > > Handle receiving OPAL_SUCCESS when it expects OPAL_ASYNC_COMPLETION > > and don't treat it as an error. Unfortunately there are too many drivers > > out there with the incorrect behaviour so this means OPAL can never > > return anything but OPAL_ASYNC_COMPLETION, this shouldn't prevent the > > code from being correct. > > > > Don't return ERESTARTSYS if token acquisition is interrupted as > > powernv_flash can't be sure it hasn't already performed some work, let > > userspace deal with the problem. > > > > Change the incorrect use of BUG_ON() to WARN_ON() in powernv_flash. 
> > > > Not for powernv_flash, a fix from Stewart Smith which fits into this > > series as it relies on my improvements to the opal-async > > infrastructure. > > > > V3: export opal_error_code() so that powernv_flash can be built=m > > > > Hello, > > > > Version one of this series ignored that OPAL may continue to use > > buffers passed to it after Linux kfree()s the buffer. This version > > addresses this, not in a particularly nice way - future work could > > make this better. This version also includes a few cleanups and fixups > > to the powernv_flash driver done along the course of this work that I > > thought I would just send. > > > > The problem we're trying to solve here is that currently all users of > > the opal-async calls must use wait_event(), this may be undesirable > > when there is a userspace process behind the request for the opal > > call, if OPAL takes too long to complete the call then hung task > > warnings will appear. > > > > In order to solve the problem callers should use > > wait_event_interruptible(), due to the interruptible nature of this > > call the opal-async infrastructure needs to track extra state > > associated with each async token, this is prepared for in patch 6/10. > > > > While I was working on the opal-async infrastructure improvements > > Stewart fixed another problem and he relies on the corrected behaviour > > of opal-async so I've sent it here. > > > > Hello MTD folk, traditionally Michael Ellerman takes powernv_flash > > driver patches through the powerpc tree, as always your feedback is > > very welcome. > Just gave my acks on patches 1 to 4 and patch 10 (with minor comments > on patch 3 and 10). Feel free to take the patches directly through the > powerpc tree. > Hi Boris, thanks very much for the acks. 
All good points - I'll fix that up in a v2 Thanks again, Cyril > > > > Thanks, > > > > Cyril > > > > Cyril Bur (9): > > mtd: powernv_flash: Use WARN_ON_ONCE() rather than BUG_ON() > > mtd: powernv_flash: Don't treat OPAL_SUCCESS as an error > > mtd: powernv_flash: Remove pointless goto in driver init > > mtd: powernv_flash: Don't return -ERESTARTSYS on interrupted token > > acquisition > > powerpc/opal: Make __opal_async_{get,release}_token() static > > powerpc/opal: Rework the opal-async interface > > powerpc/opal: Add opal_async_wait_response_interruptible() to > > opal-async > > powerpc/powernv: Add OPAL_BUSY to opal_error_code() > > mtd: powernv_flash: Use opal_async_wait_response_interruptible() > > > > Stewart Smith (1): > > powernv/opal-sensor: remove not needed lock > > > > arch/powerpc/include/asm/opal.h | 4 +- > > arch/powerpc/platforms/powernv/opal-async.c | 183 > > +++ > > arch/powerpc/platforms/powernv/opal-sensor.c | 17 +-- > > arch/powerpc/platforms/powernv/opal.c| 2 + > > drivers/mtd/devices/powernv_flash.c | 83 +++- > > 5 files changed, 194 insertions(+), 95 deletions(-) > > > >
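The cover letter's point about not returning -ERESTARTSYS on interrupted token acquisition rests on a general rule: -ERESTARTSYS makes the kernel transparently restart the syscall, which is only safe if no work has been done yet, so a driver that may already have touched the flash must surface -EINTR and let userspace decide. A sketch of that decision (function name is illustrative; the series takes the conservative -EINTR branch precisely because powernv_flash cannot prove `work_started` is false):

```c
#include <assert.h>
#include <errno.h>

/* ERESTARTSYS is kernel-internal and not exposed in userspace errno.h,
 * so define its conventional value here for the sketch. */
#define ERESTARTSYS 512

static int flash_op_result(int work_started, int interrupted)
{
    if (!interrupted)
        return 0;
    /* Transparent restart is only safe when nothing has happened yet;
     * once work may have started, userspace must see the interruption. */
    return work_started ? -EINTR : -ERESTARTSYS;
}
```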
[PATCH 2/2] powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint
Lazy save and restore of FP/Altivec means that a userspace process can be sent to userspace with FP or Altivec disabled and loaded only as required (by way of an FP/Altivec unavailable exception). Transactional Memory complicates this situation as a transaction could be started without FP/Altivec being loaded up. This causes the hardware to checkpoint incorrect registers. Handling FP/Altivec unavailable exceptions while a thread is transactional requires a reclaim and recheckpoint to ensure the CPU has correct state for both sets of registers. tm_reclaim() has optimisations to not always save the FP/Altivec registers to the checkpointed save area. This was originally done because the caller might have information that the checkpointed registers aren't valid due to lazy save and restore. We've also been a little vague as to how tm_reclaim() leaves the FP/Altivec state since it doesn't necessarily always save it to the thread struct. This has led to an (incorrect) assumption that it leaves the checkpointed state on the CPU. tm_recheckpoint() has similar optimisations in reverse. It may not always reload the checkpointed FP/Altivec registers from the thread struct before the trecheckpoint. It is therefore quite unclear where it expects to get the state from. This didn't help with the assumption made about tm_reclaim(). These optimisations sit in what is by definition a slow path. If a process has to go through a reclaim/recheckpoint then its transaction will be doomed on returning to userspace. This means that the process will be unable to complete its transaction and will be forced to its failure handler. This is already an out-of-line case for userspace. Furthermore, the cost of copying 64 times 128 bits from registers isn't very high[0] (at all) on modern processors. As such it appears these optimisations have only served to increase code complexity and are unlikely to have had a measurable performance impact. 
Our transactional memory handling has been riddled with bugs. A cause of this has been difficulty in following the code flow; code complexity has not been our friend here. It makes sense to remove these optimisations in favour of a (hopefully) more stable implementation. This patch does mean that sometimes the assembly will needlessly save 'junk' registers which will subsequently get overwritten with the correct value by the C code which calls the assembly function. This small inefficiency is far outweighed by the reduction in complexity for general TM code, context switching paths, and the transactional facility unavailable exception handler. 0: I tried to measure it once for other work and found that it was hiding in the noise of everything else I was working with. I find it exceedingly likely this will be the case here. Signed-off-by: Cyril Bur <cyril...@gmail.com> --- arch/powerpc/include/asm/tm.h | 5 ++-- arch/powerpc/kernel/process.c | 26 +++--- arch/powerpc/kernel/signal_32.c | 2 +- arch/powerpc/kernel/signal_64.c | 2 +- arch/powerpc/kernel/tm.S| 59 - arch/powerpc/kernel/traps.c | 23 +--- 6 files changed, 37 insertions(+), 80 deletions(-) diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h index 82e06ca3a49b..33d965911bec 100644 --- a/arch/powerpc/include/asm/tm.h +++ b/arch/powerpc/include/asm/tm.h @@ -11,10 +11,9 @@ extern void tm_enable(void); extern void tm_reclaim(struct thread_struct *thread, - unsigned long orig_msr, uint8_t cause); + uint8_t cause); extern void tm_reclaim_current(uint8_t cause); -extern void tm_recheckpoint(struct thread_struct *thread, - unsigned long orig_msr); +extern void tm_recheckpoint(struct thread_struct *thread); extern void tm_abort(uint8_t cause); extern void tm_save_sprs(struct thread_struct *thread); extern void tm_restore_sprs(struct thread_struct *thread); diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index da900cd86324..fc9b88ccc2a7 100644 --- 
a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -866,6 +866,10 @@ static void tm_reclaim_thread(struct thread_struct *thr, if (!MSR_TM_SUSPENDED(mfmsr())) return; + giveup_all(container_of(thr, struct task_struct, thread)); + + tm_reclaim(thr, cause); + /* * If we are in a transaction and FP is off then we can't have * used FP inside that transaction. Hence the checkpointed @@ -884,10 +888,6 @@ static void tm_reclaim_thread(struct thread_struct *thr, if ((thr->ckpt_regs.msr & MSR_VEC) == 0) memcpy(>ckvr_state, >vr_state, sizeof(struct thread_vr_state)); - - giveup_all(container_of(thr, struct task_struct, thread)); - - tm_reclaim(thr, thr->ckpt_regs.msr, cause); } void tm_reclaim_current(uint8_t cause) @@ -936,11 +936,9
[PATCH 1/2] powerpc: Don't enable FP/Altivec if not checkpointed
Lazy save and restore of FP/Altivec means that a userspace process can be sent to userspace with FP or Altivec disabled and loaded only as required (by way of an FP/Altivec unavailable exception). Transactional Memory complicates this situation as a transaction could be started without FP/Altivec being loaded up. This causes the hardware to checkpoint incorrect registers. Handling FP/Altivec unavailable exceptions while a thread is transactional requires a reclaim and recheckpoint to ensure the CPU has correct state for both sets of registers. Lazy save and restore of FP/Altivec cannot be done if a process is transactional. If a facility was enabled it must remain enabled whenever a thread is transactional. Commit dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use") ensures that the facilities are always enabled if a thread is transactional. A bug in the introduced code may cause it to inadvertently enable a facility that was (and should remain) disabled. The problem with this extraneous enablement is that the registers for the erroneously enabled facility have not been correctly recheckpointed - the recheckpointing code assumed the facility would remain disabled. This causes transactional threads which return to their failure handler to observe incorrect checkpointed registers. Perhaps an example will help illustrate the problem: A userspace process is running and uses both FP and Altivec registers. This process then continues to run for some time without touching either sets of registers. The kernel subsequently disables the facilities as part of lazy save and restore. The userspace process then performs a tbegin and the CPU checkpoints 'junk' FP and Altivec registers. The process then performs a floating point instruction triggering a fp unavailable exception in the kernel. The kernel then loads the FP registers - and only the FP registers. 
Since the thread is transactional it must perform a reclaim and recheckpoint to ensure both the checkpointed registers and the transactional registers are correct. It then (correctly) enables MSR[FP] for the process. Later (on exception exit) the kernel also (inadvertently) enables MSR[VEC]. The process is then returned to userspace. Since the act of loading the FP registers doomed the transaction we know the CPU will fail the transaction, restore its checkpointed registers, and return the process to its failure handler. The problem is that we're now running with Altivec enabled and the 'junk' checkpointed registers are restored. The kernel had only recheckpointed FP. This patch solves this by only activating FP/Altivec if userspace was using them when it entered the kernel and not simply if the process is transactional. Fixes: dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use") Signed-off-by: Cyril Bur <cyril...@gmail.com> --- arch/powerpc/kernel/process.c | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index a0c74bbf3454..da900cd86324 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -230,9 +230,15 @@ void enable_kernel_fp(void) } EXPORT_SYMBOL(enable_kernel_fp); +static int is_transactionally_fp(struct task_struct *tsk) +{ + return msr_tm_active(tsk->thread.regs->msr) && + (tsk->thread.ckpt_regs.msr & MSR_FP); +} + static int restore_fp(struct task_struct *tsk) { - if (tsk->thread.load_fp || msr_tm_active(tsk->thread.regs->msr)) { + if (tsk->thread.load_fp || is_transactionally_fp(tsk)) { load_fp_state(>thread.fp_state); current->thread.load_fp++; return 1; @@ -311,10 +317,17 @@ void flush_altivec_to_thread(struct task_struct *tsk) } EXPORT_SYMBOL_GPL(flush_altivec_to_thread); +static int is_transactionally_altivec(struct task_struct *tsk) +{ + return msr_tm_active(tsk->thread.regs->msr) && + 
(tsk->thread.ckpt_regs.msr & MSR_VEC); +} + + static int restore_altivec(struct task_struct *tsk) { if (cpu_has_feature(CPU_FTR_ALTIVEC) && - (tsk->thread.load_vec || msr_tm_active(tsk->thread.regs->msr))) { + (tsk->thread.load_vec || is_transactionally_altivec(tsk))) { load_vr_state(>thread.vr_state); tsk->thread.used_vr = 1; tsk->thread.load_vec++; -- 2.14.3
Re: [PATCH] powerpc/tm: fix live state of vs0/32 in tm_reclaim
On Wed, 2017-07-05 at 11:02 +1000, Michael Neuling wrote: > On Tue, 2017-07-04 at 16:45 -0400, Gustavo Romero wrote: > > Currently tm_reclaim() can return with a corrupted vs0 (fp0) or vs32 (v0) > > due to the fact vs0 is used to save FPSCR and vs32 is used to save VSCR. > Hi Mikey, This completely fell off my radar, we do need something merged! For what it's worth I like the original patch. > tm_reclaim() should have no state live in the registers once it returns. It > should all be saved in the thread struct. The above is not an issue in my > book. > Yeah, this is something I agree with, however, if that is the case then why have tm_recheckpoint() do partial reloads? A partial reload only makes sense if we can be sure that reclaim will have left the state at least (partially) correct - not with (as is the case today) one corrupted fp or Altivec reg. > Having a quick look at the code, I think there's an issue but we need > something > more like this (completely untested). > > When we recheckpoint inside an fp unavail, we need to recheckpoint vec if it > was > enabled. Currently we only ever recheckpoint the FP which seems like a bug. > Vice versa for the other way around. > In your example, we don't need to reload VEC if we can trust that reclaim left the checkpointed regs on the CPU correctly - this patch achieves this. Of course I'm more than happy to reduce complexity and not have this optimisation at all but then we should remove the entire parameter to tm_recheckpoint(). Any in between feels dangerous. Cyril > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c > index d4e545d27e..d1184264e2 100644 > --- a/arch/powerpc/kernel/traps.c > +++ b/arch/powerpc/kernel/traps.c > @@ -1589,7 +1589,7 @@ void fp_unavailable_tm(struct pt_regs *regs) > * If VMX is in use, the VRs now hold checkpointed values, > * so we don't want to load the VRs from the thread_struct.
> */ > - tm_recheckpoint(&current->thread, MSR_FP); > + tm_recheckpoint(&current->thread, regs->msr); > > /* If VMX is in use, get the transactional values back */ > if (regs->msr & MSR_VEC) { > @@ -1611,7 +1611,7 @@ void altivec_unavailable_tm(struct pt_regs *regs) > regs->nip, regs->msr); > tm_reclaim_current(TM_CAUSE_FAC_UNAV); > regs->msr |= MSR_VEC; > - tm_recheckpoint(&current->thread, MSR_VEC); > + tm_recheckpoint(&current->thread, regs->msr); > current->thread.used_vr = 1; > > if (regs->msr & MSR_FP) { > > > > Later, we recheckpoint trusting that the live state of FP and VEC are ok > > depending on the MSR.FP and MSR.VEC bits, i.e. if MSR.FP is enabled that > > means the FP registers checkpointed when we entered in TM are correct and > > after a treclaim we can trust the FP live state. Similarly for the VEC regs. > > However if tm_reclaim() does not return a sane state then tm_recheckpoint() > > will recheckpoint a corrupted state from live state back to the checkpoint > > area. > > > > > > That commit fixes the corruption by restoring vs0 and vs32 from the > > ckfp_state and ckvr_state after they are used to save FPSCR and VSCR, > > respectively. > > > > The effect of the issue described above is observed, for instance, once a > > VSX unavailable exception is caught in the middle of a transaction with > > MSR.FP = 1 or MSR.VEC = 1. If MSR.FP = 1, then after getting back to user > > space FP state is corrupted. If MSR.VEC = 1, then VEC state is corrupted. > > > > The issue does not occur if MSR.FP = 0 and MSR.VEC = 0 because ckfp_state > > and ckvr_state are both copied from fp_state and vr_state, respectively, > > and on recheckpointing both states will be restored from these thread > > structures and not from the live state. > > > > The issue also does not occur if MSR.FP = 1 and MSR.VEC = 1 because it > > implies MSR.VSX = 1 and in that case the VSX unavailable exception does not > > happen in the middle of the transactional block.
> > > > Finally, that commit also fixes the MSR used to check if FP and VEC bits > > are enabled once we are in tm_reclaim_thread(). ckpt_regs.msr is valid only > > if giveup_all() is called *before* using ckpt_regs.msr for checks because > > check_if_tm_restore_required() in giveup_all() will copy regs->msr to > > ckpt_regs.msr and so ckpt_regs.msr reflects exactly the MSR that the thread > > had when it came off the processor. > > > > No regression was observed on powerpc/tm selftests after this fix. > > > > Signed-off-by: Gustavo Romero> > Signed-off-by: Breno Leitao > > --- > > arch/powerpc/kernel/process.c | 9 +++-- > > arch/powerpc/kernel/tm.S | 14 ++ > > 2 files changed, 21 insertions(+), 2 deletions(-) > > > > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c > > index 2ad725e..ac1fc51 100644 > > --- a/arch/powerpc/kernel/process.c > > +++ b/arch/powerpc/kernel/process.c
Re: [PATCH] powerpc/tm: Set ckpt_regs.msr before using it.
On Tue, 2017-10-24 at 15:13 -0200, Breno Leitao wrote: > From: Breno Leitao <breno.lei...@debian.org> > > In commit f48e91e87e67 ("powerpc/tm: Fix FP and VMX register > corruption"), we check ckpt_regs.msr to see if a feature (such as VEC, VSX > and FP) is disabled (thus the hot registers might be bogus during the > reclaim), and then copy the previously saved thread registers, with the > non-bogus values, into the checkpoint area for a later trecheckpoint. > This mechanism is used to recheckpoint the proper register values when > a transaction started using the bogus registers, and these values were > sent to the memory checkpoint area. > > I see a problem in this code: ckpt_regs.msr is not properly set when > it is used. For example, when there is a vsx_unavailable_tm() in code > like the following, ckpt_regs.msr[FP] is 0: > > 1: sleep_until_{fp,vec,vsx} = 0 > 2: fadd > 3: tbegin. > 4: beq > 5: xxmrghd > 6: tend. > > In this case, line 5 will raise a vsx_unavailable_tm() exception, and > ckpt_regs.msr[FP] will be zero before the memcpy() block, executing the > memcpy even with the FP registers hot. That is not correct because > we executed a floating point instruction on line 2, and MSR[FP] was set to > 1. > > Fortunately this does not cause a big problem as far as I can see, other than > this extra memcpy(), because treclaim() will later overwrite this wrong > copied value, since it relies on the correct MSR value, which was > updated by giveup_all->check_if_tm_restore_required. There might be a > problem when laziness is being turned on, but I was not able to > reproduce it. I believe this analysis is correct, I have come to the same conclusion in the past. I've also done a bunch of testing with variants of this patch and haven't seen a difference, however, I do believe the code is more correct with this patch.
Signed-off-by: Cyril Bur <cyril...@gmail.com> Having said all that, nothing rules out that our tests simply aren't good enough ;) > > The solution I am proposing is updating ckpt_regs.msr before using it. > > Signed-off-by: Breno Leitao <lei...@debian.org> > Signed-off-by: Gustavo Romero <gusbrom...@gmail.com> > CC: Cyril Bur <cyril...@gmail.com> > CC: Michael Neuling <mi...@neuling.org> > --- > arch/powerpc/kernel/process.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c > index c051dc2b42ad..773e9c5594e7 100644 > --- a/arch/powerpc/kernel/process.c > +++ b/arch/powerpc/kernel/process.c > @@ -860,6 +860,9 @@ static void tm_reclaim_thread(struct thread_struct *thr, > if (!MSR_TM_SUSPENDED(mfmsr())) > return; > > + /* Give up all the registers and set ckpt_regs.msr */ > + giveup_all(container_of(thr, struct task_struct, thread)); > + > /* >* If we are in a transaction and FP is off then we can't have >* used FP inside that transaction. Hence the checkpointed > @@ -879,8 +882,6 @@ static void tm_reclaim_thread(struct thread_struct *thr, > memcpy(&thr->ckvr_state, &thr->vr_state, > sizeof(struct thread_vr_state)); > > - giveup_all(container_of(thr, struct task_struct, thread)); > - > tm_reclaim(thr, thr->ckpt_regs.msr, cause); > } >
Re: [PATCH v3 3/3] powerpc:selftest update memcmp_64 selftest for VMX implementation
On Fri, 2017-10-13 at 12:30 +0800, wei.guo.si...@gmail.com wrote: > From: Simon Guo> > This patch adjusts selftest memcmp_64 so that the memcmp selftest can be > compiled successfully. > Do they not compile at the moment? > It also adds testcases for: > - memcmp over 4K bytes size. > - s1/s2 with different/random offset on 16 bytes boundary. > - enter/exit_vmx_ops pairing. > This is a great idea, just a thought though - perhaps it might make more sense to have each condition be tested for in a separate binary rather than a single binary that tests everything. > Signed-off-by: Simon Guo > --- > .../selftests/powerpc/copyloops/asm/ppc_asm.h | 4 +- > .../selftests/powerpc/stringloops/asm/ppc_asm.h| 22 + > .../testing/selftests/powerpc/stringloops/memcmp.c | 98 > +- > 3 files changed, 100 insertions(+), 24 deletions(-) > > diff --git a/tools/testing/selftests/powerpc/copyloops/asm/ppc_asm.h > b/tools/testing/selftests/powerpc/copyloops/asm/ppc_asm.h > index 80d34a9..51bf6fa 100644 > --- a/tools/testing/selftests/powerpc/copyloops/asm/ppc_asm.h > +++ b/tools/testing/selftests/powerpc/copyloops/asm/ppc_asm.h > @@ -35,11 +35,11 @@ > li r3,0 > blr > > -FUNC_START(enter_vmx_copy) > +FUNC_START(enter_vmx_ops) > li r3,1 > blr > > -FUNC_START(exit_vmx_copy) > +FUNC_START(exit_vmx_ops) > blr > > FUNC_START(memcpy_power7) > diff --git a/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h > b/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h > index 11bece8..3326992 100644 > --- a/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h > +++ b/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h > @@ -1,3 +1,5 @@ > +#ifndef _PPC_ASM_H > +#define _PPC_ASM_H > #include > > #ifndef r1 > @@ -5,3 +7,23 @@ > #endif > > #define _GLOBAL(A) FUNC_START(test_ ## A) > + > +#define CONFIG_ALTIVEC > + > +#define R14 r14 > +#define R15 r15 > +#define R16 r16 > +#define R17 r17 > +#define R18 r18 > +#define R19 r19 > +#define R20 r20 > +#define R21 r21 > +#define R22 r22 >
+#define R29 r29 > +#define R30 r30 > +#define R31 r31 > + > +#define STACKFRAMESIZE 256 > +#define STK_REG(i) (112 + ((i)-14)*8) > + > +#endif > diff --git a/tools/testing/selftests/powerpc/stringloops/memcmp.c > b/tools/testing/selftests/powerpc/stringloops/memcmp.c > index 30b1222..f5225f6 100644 > --- a/tools/testing/selftests/powerpc/stringloops/memcmp.c > +++ b/tools/testing/selftests/powerpc/stringloops/memcmp.c > @@ -1,20 +1,40 @@ > #include > #include > #include > +#include > #include "utils.h" > > #define SIZE 256 > #define ITERATIONS 1 > > +#define LARGE_SIZE (5 * 1024) > +#define LARGE_ITERATIONS 1000 > +#define LARGE_MAX_OFFSET 32 > +#define LARGE_SIZE_START 4096 > + > +#define MAX_OFFSET_DIFF_S1_S2 48 > + > +int vmx_count; > +int enter_vmx_ops(void) > +{ > + vmx_count++; > + return 1; > +} > + > +void exit_vmx_ops(void) > +{ > + vmx_count--; > +} > int test_memcmp(const void *s1, const void *s2, size_t n); > > /* test all offsets and lengths */ > -static void test_one(char *s1, char *s2) > +static void test_one(char *s1, char *s2, unsigned long max_offset, > + unsigned long size_start, unsigned long max_size) > { > unsigned long offset, size; > > - for (offset = 0; offset < SIZE; offset++) { > - for (size = 0; size < (SIZE-offset); size++) { > + for (offset = 0; offset < max_offset; offset++) { > + for (size = size_start; size < (max_size - offset); size++) { > int x, y; > unsigned long i; > > @@ -34,70 +54,104 @@ static void test_one(char *s1, char *s2) > printf("\n"); > abort(); > } > + > + if (vmx_count != 0) { > + printf("vmx enter/exit not paired.(offset:%ld > size:%ld s1:%p s2:%p vc:%d\n", > + offset, size, s1, s2, vmx_count); > + printf("\n"); > + abort(); > + } > } > } > } > > -static int testcase(void) > +static int testcase(bool islarge) > { > char *s1; > char *s2; > unsigned long i; > > - s1 = memalign(128, SIZE); > + unsigned long comp_size = (islarge ? 
LARGE_SIZE : SIZE); > + unsigned long alloc_size = comp_size + MAX_OFFSET_DIFF_S1_S2; > + int iterations = islarge ? LARGE_ITERATIONS : ITERATIONS; > + > + s1 = memalign(128, alloc_size); > if (!s1) { > perror("memalign"); > exit(1); > } > > - s2 = memalign(128, SIZE); > + s2 = memalign(128, alloc_size); > if (!s2) { > perror("memalign"); > exit(1); > } > > - srandom(1); > +
[PATCH v4 05/10] powerpc/opal: Make __opal_async_{get,release}_token() static
There are no external callers of __opal_async_get_token() or __opal_async_release_token(), so make them static. This patch also removes the possibility of "emergency through synchronous call to __opal_async_get_token()"; as such, it makes more sense to initialise opal_async_sem for the maximum number of async tokens. Signed-off-by: Cyril Bur <cyril...@gmail.com> --- arch/powerpc/include/asm/opal.h | 2 -- arch/powerpc/platforms/powernv/opal-async.c | 10 +++--- 2 files changed, 3 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 726c23304a57..0078eb5acf98 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -304,9 +304,7 @@ extern void opal_notifier_enable(void); extern void opal_notifier_disable(void); extern void opal_notifier_update_evt(uint64_t evt_mask, uint64_t evt_val); -extern int __opal_async_get_token(void); extern int opal_async_get_token_interruptible(void); -extern int __opal_async_release_token(int token); extern int opal_async_release_token(int token); extern int opal_async_wait_response(uint64_t token, struct opal_msg *msg); extern int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data); diff --git a/arch/powerpc/platforms/powernv/opal-async.c b/arch/powerpc/platforms/powernv/opal-async.c index cf33769a7b72..c43421ab2d2f 100644 --- a/arch/powerpc/platforms/powernv/opal-async.c +++ b/arch/powerpc/platforms/powernv/opal-async.c @@ -33,7 +33,7 @@ static struct semaphore opal_async_sem; static struct opal_msg *opal_async_responses; static unsigned int opal_max_async_tokens; -int __opal_async_get_token(void) +static int __opal_async_get_token(void) { unsigned long flags; int token; @@ -73,7 +73,7 @@ int opal_async_get_token_interruptible(void) } EXPORT_SYMBOL_GPL(opal_async_get_token_interruptible); -int __opal_async_release_token(int token) +static int __opal_async_release_token(int token) { unsigned long flags; @@ -199,11 +199,7 @@ int __init opal_async_comp_init(void) goto
out_opal_node; } - /* Initialize to 1 less than the maximum tokens available, as we may -* require to pop one during emergency through synchronous call to -* __opal_async_get_token() -*/ - sema_init(&opal_async_sem, opal_max_async_tokens - 1); + sema_init(&opal_async_sem, opal_max_async_tokens); out_opal_node: of_node_put(opal_node); -- 2.14.2
[PATCH v4 10/10] mtd: powernv_flash: Use opal_async_wait_response_interruptible()
The OPAL calls performed in this driver shouldn't be using opal_async_wait_response() as this performs a wait_event() which, on long running OPAL calls, could result in hung task warnings. wait_event() prevents timely signal delivery which is also undesirable. This patch also attempts to quieten down the use of dev_err() when errors haven't actually occurred and also to return better information up the stack rather than always -EIO. Signed-off-by: Cyril Bur <cyril...@gmail.com> --- drivers/mtd/devices/powernv_flash.c | 57 +++-- 1 file changed, 35 insertions(+), 22 deletions(-) diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c index 3343d4f5c4f3..42383dbca5a6 100644 --- a/drivers/mtd/devices/powernv_flash.c +++ b/drivers/mtd/devices/powernv_flash.c @@ -1,7 +1,7 @@ /* * OPAL PNOR flash MTD abstraction * - * Copyright IBM 2015 + * Copyright IBM 2015-2017 * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -89,33 +89,46 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op, return -EIO; } - if (rc == OPAL_SUCCESS) - goto out_success; + if (rc == OPAL_ASYNC_COMPLETION) { + rc = opal_async_wait_response_interruptible(token, &msg); + if (rc) { + /* +* If we return the mtd core will free the +* buffer we've just passed to OPAL but OPAL +* will continue to read or write from that +* memory. +* It may be tempting to ultimately return 0 +* if we're doing a read or a write since we +* are going to end up waiting until OPAL is +* done. However, because the MTD core sends +* us the userspace request in chunks, we need +* to let it know we've been interrupted. +*/ + rc = -EINTR; + if (opal_async_wait_response(token, &msg)) + dev_err(dev, "opal_async_wait_response() failed\n"); + goto out; + } + rc = opal_get_async_rc(msg); + } - if (rc != OPAL_ASYNC_COMPLETION) { + /* +* OPAL does mutual exclusion on the flash, it will return +* OPAL_BUSY.
+* During firmware updates by the service processor OPAL may +* be (temporarily) prevented from accessing the flash, in +* this case OPAL will also return OPAL_BUSY. +* Both cases aren't errors exactly but the flash could have +* changed, userspace should be informed. +*/ + if (rc != OPAL_SUCCESS && rc != OPAL_BUSY) dev_err(dev, "opal_flash_async_op(op=%d) failed (rc %d)\n", op, rc); - rc = -EIO; - goto out; - } - rc = opal_async_wait_response(token, &msg); - if (rc) { - dev_err(dev, "opal async wait failed (rc %d)\n", rc); - rc = -EIO; - goto out; - } - - rc = opal_get_async_rc(msg); -out_success: - if (rc == OPAL_SUCCESS) { - rc = 0; - if (retlen) + if (rc == OPAL_SUCCESS && retlen) *retlen = len; - } else { - rc = -EIO; - } + rc = opal_error_code(rc); out: opal_async_release_token(token); return rc; -- 2.14.2
[PATCH v4 06/10] powerpc/opal: Rework the opal-async interface
Future work will add an opal_async_wait_response_interruptible() which will call wait_event_interruptible(). This work requires extra token state to be tracked as wait_event_interruptible() can return and the caller could release the token before OPAL responds. Currently token state is tracked with two bitfields which are 64 bits big but may not need to be as OPAL informs Linux how many async tokens there are. It also uses an array indexed by token to store response messages for each token. The bitfields make it difficult to add more state and also provide a hard maximum as to how many tokens there can be - it is possible that OPAL will inform Linux that there are more than 64 tokens. Rather than add a bitfield to track the extra state, rework the internals slightly. Signed-off-by: Cyril Bur <cyril...@gmail.com> --- arch/powerpc/platforms/powernv/opal-async.c | 92 - 1 file changed, 50 insertions(+), 42 deletions(-) diff --git a/arch/powerpc/platforms/powernv/opal-async.c b/arch/powerpc/platforms/powernv/opal-async.c index c43421ab2d2f..fbae8a37ce2c 100644 --- a/arch/powerpc/platforms/powernv/opal-async.c +++ b/arch/powerpc/platforms/powernv/opal-async.c @@ -1,7 +1,7 @@ /* * PowerNV OPAL asynchronous completion interfaces * - * Copyright 2013 IBM Corp. + * Copyright 2013-2017 IBM Corp. 
* * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License @@ -23,40 +23,45 @@ #include #include -#define N_ASYNC_COMPLETIONS 64 +enum opal_async_token_state { + ASYNC_TOKEN_UNALLOCATED = 0, + ASYNC_TOKEN_ALLOCATED, + ASYNC_TOKEN_COMPLETED +}; + +struct opal_async_token { + enum opal_async_token_state state; + struct opal_msg response; +}; -static DECLARE_BITMAP(opal_async_complete_map, N_ASYNC_COMPLETIONS) = {~0UL}; -static DECLARE_BITMAP(opal_async_token_map, N_ASYNC_COMPLETIONS); static DECLARE_WAIT_QUEUE_HEAD(opal_async_wait); static DEFINE_SPINLOCK(opal_async_comp_lock); static struct semaphore opal_async_sem; -static struct opal_msg *opal_async_responses; static unsigned int opal_max_async_tokens; +static struct opal_async_token *opal_async_tokens; static int __opal_async_get_token(void) { unsigned long flags; - int token; + int i, token = -EBUSY; spin_lock_irqsave(&opal_async_comp_lock, flags); - token = find_first_bit(opal_async_complete_map, opal_max_async_tokens); - if (token >= opal_max_async_tokens) { - token = -EBUSY; - goto out; + for (i = 0; i < opal_max_async_tokens; i++) { + if (opal_async_tokens[i].state == ASYNC_TOKEN_UNALLOCATED) { + opal_async_tokens[i].state = ASYNC_TOKEN_ALLOCATED; + token = i; + goto out; + } } - - if (__test_and_set_bit(token, opal_async_token_map)) { - token = -EBUSY; - goto out; - } - - __clear_bit(token, opal_async_complete_map); - out: spin_unlock_irqrestore(&opal_async_comp_lock, flags); return token; } +/* + * Note: If the returned token is used in an opal call and opal returns + * OPAL_ASYNC_COMPLETION you MUST call opal_async_wait_response() before + * calling any other opal_async_* function + */ int opal_async_get_token_interruptible(void) { int token; @@ -76,6 +81,7 @@ EXPORT_SYMBOL_GPL(opal_async_get_token_interruptible); static int __opal_async_release_token(int token) { unsigned long flags; + int rc; if (token < 0 || token >= opal_max_async_tokens) {
pr_err("%s: Passed token is out of range, token %d\n", @@ -84,11 +90,18 @@ static int __opal_async_release_token(int token) } spin_lock_irqsave(&opal_async_comp_lock, flags); - __set_bit(token, opal_async_complete_map); - __clear_bit(token, opal_async_token_map); + switch (opal_async_tokens[token].state) { + case ASYNC_TOKEN_COMPLETED: + case ASYNC_TOKEN_ALLOCATED: + opal_async_tokens[token].state = ASYNC_TOKEN_UNALLOCATED; + rc = 0; + break; + default: + rc = 1; + } spin_unlock_irqrestore(&opal_async_comp_lock, flags); - return 0; + return rc; } int opal_async_release_token(int token) @@ -96,12 +109,10 @@ int opal_async_release_token(int token) int ret; ret = __opal_async_release_token(token); - if (ret) - return ret; - - up(&opal_async_sem); + if (!ret) + up(&opal_async_sem); - return 0; + return ret; } EXPORT_SYMBOL_GPL(opal_async_release_token); @@ -122,13 +133,15 @@ int opal_async_wait_response(uint64_t token, struct opal_msg *msg) * functional. */ opal_wake_poller(); - wait_event(opal_async_wait, test_bit(token, opal_async_comple
[PATCH v4 09/10] powerpc/powernv: Add OPAL_BUSY to opal_error_code()
Also export opal_error_code() so that it can be used in modules Signed-off-by: Cyril Bur <cyril...@gmail.com> --- arch/powerpc/platforms/powernv/opal.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c index 65c79ecf5a4d..041ddbd1fc57 100644 --- a/arch/powerpc/platforms/powernv/opal.c +++ b/arch/powerpc/platforms/powernv/opal.c @@ -998,6 +998,7 @@ int opal_error_code(int rc) case OPAL_PARAMETER:return -EINVAL; case OPAL_ASYNC_COMPLETION: return -EINPROGRESS; + case OPAL_BUSY: case OPAL_BUSY_EVENT: return -EBUSY; case OPAL_NO_MEM: return -ENOMEM; case OPAL_PERMISSION: return -EPERM; @@ -1037,3 +1038,4 @@ EXPORT_SYMBOL_GPL(opal_write_oppanel_async); /* Export this for KVM */ EXPORT_SYMBOL_GPL(opal_int_set_mfrr); EXPORT_SYMBOL_GPL(opal_int_eoi); +EXPORT_SYMBOL_GPL(opal_error_code); -- 2.14.2
[PATCH v4 00/10] Allow opal-async waiters to get interrupted
V4: Rework and rethink. To recap: Userspace MTD read()s/write()s and erases to powernv_flash become calls into the OPAL firmware which subsequently handles flash access. Because the read()s, write()s or erases can be large (bounded of course by the size of the flash) OPAL may take some time to service the request, this causes the powernv_flash driver to sit in a wait_event() for potentially minutes. This causes two problems: firstly, tools appear to hang for the entire time as they cannot be interrupted by signals, and secondly, this can trigger hung task warnings. The correct solution is to use wait_event_interruptible(), which my rework of the opal-async infrastructure (part of this series) provides. The final patch in this series achieves this. It should eliminate both hung tasks and threads locking up. Included in this series are other simpler fixes for powernv_flash: Don't always return EIO on error. OPAL does mutual exclusion on the flash and also knows when the service processor takes control of the flash, in both of these cases it will return OPAL_BUSY, translating this to EIO is misleading to userspace. Handle receiving OPAL_SUCCESS when it expects OPAL_ASYNC_COMPLETION and don't treat it as an error. Unfortunately there are too many drivers out there with the incorrect behaviour so this means OPAL can never return anything but OPAL_ASYNC_COMPLETION, this shouldn't prevent the code from being correct. Don't return ERESTARTSYS if token acquisition is interrupted as powernv_flash can't be sure it hasn't already performed some work, let userspace deal with the problem. Change the incorrect use of BUG_ON() to WARN_ON() in powernv_flash. Not for powernv_flash, a fix from Stewart Smith which fits into this series as it relies on my improvements to the opal-async infrastructure.
V3: export opal_error_code() so that powernv_flash can be built=m Hello, Version one of this series ignored that OPAL may continue to use buffers passed to it after Linux kfree()s the buffer. This version addresses this, not in a particularly nice way - future work could make this better. This version also includes a few cleanups and fixups to the powernv_flash driver, done along the course of this work, that I thought I would just send. The problem we're trying to solve here is that currently all users of the opal-async calls must use wait_event(), this may be undesirable when there is a userspace process behind the request for the opal call; if OPAL takes too long to complete the call then hung task warnings will appear. In order to solve the problem callers should use wait_event_interruptible(); due to the interruptible nature of this call the opal-async infrastructure needs to track extra state associated with each async token, this is prepared for in patch 6/10. While I was working on the opal-async infrastructure improvements Stewart fixed another problem and he relies on the corrected behaviour of opal-async so I've sent it here. Hello MTD folk, traditionally Michael Ellerman takes powernv_flash driver patches through the powerpc tree, as always your feedback is very welcome.
Thanks, Cyril Cyril Bur (9): mtd: powernv_flash: Use WARN_ON_ONCE() rather than BUG_ON() mtd: powernv_flash: Don't treat OPAL_SUCCESS as an error mtd: powernv_flash: Remove pointless goto in driver init mtd: powernv_flash: Don't return -ERESTARTSYS on interrupted token acquisition powerpc/opal: Make __opal_async_{get,release}_token() static powerpc/opal: Rework the opal-async interface powerpc/opal: Add opal_async_wait_response_interruptible() to opal-async powerpc/powernv: Add OPAL_BUSY to opal_error_code() mtd: powernv_flash: Use opal_async_wait_response_interruptible() Stewart Smith (1): powernv/opal-sensor: remove not needed lock arch/powerpc/include/asm/opal.h | 4 +- arch/powerpc/platforms/powernv/opal-async.c | 183 +++ arch/powerpc/platforms/powernv/opal-sensor.c | 17 +-- arch/powerpc/platforms/powernv/opal.c| 2 + drivers/mtd/devices/powernv_flash.c | 83 +++- 5 files changed, 194 insertions(+), 95 deletions(-) -- 2.14.2
[PATCH v4 07/10] powernv/opal-sensor: remove not needed lock
From: Stewart Smith <stew...@linux.vnet.ibm.com> Parallel sensor reads could run out of async tokens due to opal_get_sensor_data grabbing tokens but then doing the sensor read behind a mutex, essentially serializing the (possibly asynchronous and relatively slow) sensor read. It turns out that the mutex isn't needed at all: not only should the OPAL interface allow concurrent reads, the implementation is certainly safe for that; and if any sensor we were reading from somewhere isn't, the kernel is the wrong place to do the mutual exclusion, OPAL should be doing it for the kernel. So, remove the mutex. Additionally, we shouldn't be printing out an error when we don't get a token, as the only way this should happen is if we've been interrupted in down_interruptible() on the semaphore. Reported-by: Robert Lippert <rlipp...@google.com> Signed-off-by: Stewart Smith <stew...@linux.vnet.ibm.com> Signed-off-by: Cyril Bur <cyril...@gmail.com> --- arch/powerpc/platforms/powernv/opal-sensor.c | 17 - 1 file changed, 4 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/platforms/powernv/opal-sensor.c b/arch/powerpc/platforms/powernv/opal-sensor.c index aa267f120033..0a7074bb91dc 100644 --- a/arch/powerpc/platforms/powernv/opal-sensor.c +++ b/arch/powerpc/platforms/powernv/opal-sensor.c @@ -19,13 +19,10 @@ */ #include -#include #include #include #include -static DEFINE_MUTEX(opal_sensor_mutex); - /* * This will return sensor information to driver based on the requested sensor * handle.
A handle is an opaque id for the powernv, read by the driver from the @@ -38,13 +35,9 @@ int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data) __be32 data; token = opal_async_get_token_interruptible(); - if (token < 0) { - pr_err("%s: Couldn't get the token, returning\n", __func__); - ret = token; - goto out; - } + if (token < 0) + return token; - mutex_lock(&opal_sensor_mutex); ret = opal_sensor_read(sensor_hndl, token, &data); switch (ret) { case OPAL_ASYNC_COMPLETION: @@ -52,7 +45,7 @@ int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data) if (ret) { pr_err("%s: Failed to wait for the async response, %d\n", __func__, ret); - goto out_token; + goto out; } ret = opal_error_code(opal_get_async_rc(msg)); @@ -73,10 +66,8 @@ int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data) break; } -out_token: - mutex_unlock(&opal_sensor_mutex); - opal_async_release_token(token); out: + opal_async_release_token(token); return ret; } EXPORT_SYMBOL_GPL(opal_get_sensor_data); -- 2.14.2
[PATCH v4 01/10] mtd: powernv_flash: Use WARN_ON_ONCE() rather than BUG_ON()
BUG_ON() should be reserved for situations where we can no longer guarantee the integrity of the system. In the case where powernv_flash_async_op() receives an impossible op, we can still guarantee the integrity of the system. Signed-off-by: Cyril Bur <cyril...@gmail.com> --- drivers/mtd/devices/powernv_flash.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c index f5396f26ddb4..f9ec38281ff2 100644 --- a/drivers/mtd/devices/powernv_flash.c +++ b/drivers/mtd/devices/powernv_flash.c @@ -78,7 +78,9 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op, rc = opal_flash_erase(info->id, offset, len, token); break; default: - BUG_ON(1); + WARN_ON_ONCE(1); + opal_async_release_token(token); + return -EIO; } if (rc != OPAL_ASYNC_COMPLETION) { -- 2.14.2
[PATCH v4 08/10] powerpc/opal: Add opal_async_wait_response_interruptible() to opal-async
This patch adds an _interruptible version of opal_async_wait_response(). This is useful when a long running OPAL call is performed on behalf of a userspace thread, for example, the opal_flash_{read,write,erase} functions performed by the powernv-flash MTD driver. It is foreseeable that these functions would take upwards of two minutes causing the wait_event() to block long enough to cause hung task warnings. Furthermore, wait_event_interruptible() is preferable as otherwise there is no way for signals to stop the process which is going to be confusing in userspace. Signed-off-by: Cyril Bur <cyril...@gmail.com> --- arch/powerpc/include/asm/opal.h | 2 + arch/powerpc/platforms/powernv/opal-async.c | 87 +++-- 2 files changed, 85 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 0078eb5acf98..f95ca4560bfa 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -307,6 +307,8 @@ extern void opal_notifier_update_evt(uint64_t evt_mask, uint64_t evt_val); extern int opal_async_get_token_interruptible(void); extern int opal_async_release_token(int token); extern int opal_async_wait_response(uint64_t token, struct opal_msg *msg); +extern int opal_async_wait_response_interruptible(uint64_t token, + struct opal_msg *msg); extern int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data); struct rtc_time; diff --git a/arch/powerpc/platforms/powernv/opal-async.c b/arch/powerpc/platforms/powernv/opal-async.c index fbae8a37ce2c..e2004606b75b 100644 --- a/arch/powerpc/platforms/powernv/opal-async.c +++ b/arch/powerpc/platforms/powernv/opal-async.c @@ -26,6 +26,8 @@ enum opal_async_token_state { ASYNC_TOKEN_UNALLOCATED = 0, ASYNC_TOKEN_ALLOCATED, + ASYNC_TOKEN_DISPATCHED, + ASYNC_TOKEN_ABANDONED, ASYNC_TOKEN_COMPLETED }; @@ -58,8 +60,10 @@ static int __opal_async_get_token(void) } /* - * Note: If the returned token is used in an opal call and opal returns - * OPAL_ASYNC_COMPLETION you MUST 
opal_async_wait_response() before + * Note: If the returned token is used in an opal call and opal + * returns OPAL_ASYNC_COMPLETION you MUST one of + * opal_async_wait_response() or + * opal_async_wait_response_interruptible() at least once before * calling another other opal_async_* function */ int opal_async_get_token_interruptible(void) @@ -96,6 +100,16 @@ static int __opal_async_release_token(int token) opal_async_tokens[token].state = ASYNC_TOKEN_UNALLOCATED; rc = 0; break; + /* +* DISPATCHED and ABANDONED tokens must wait for OPAL to +* respond. +* Mark a DISPATCHED token as ABANDONED so that the response +* response handling code knows no one cares and that it can +* free it then. +*/ + case ASYNC_TOKEN_DISPATCHED: + opal_async_tokens[token].state = ASYNC_TOKEN_ABANDONED; + /* Fall through */ default: rc = 1; } @@ -128,7 +142,11 @@ int opal_async_wait_response(uint64_t token, struct opal_msg *msg) return -EINVAL; } - /* Wakeup the poller before we wait for events to speed things + /* +* There is no need to mark the token as dispatched, wait_event() +* will block until the token completes. +* +* Wakeup the poller before we wait for events to speed things * up on platforms or simulators where the interrupts aren't * functional. */ @@ -141,11 +159,66 @@ int opal_async_wait_response(uint64_t token, struct opal_msg *msg) } EXPORT_SYMBOL_GPL(opal_async_wait_response); +int opal_async_wait_response_interruptible(uint64_t token, struct opal_msg *msg) +{ + unsigned long flags; + int ret; + + if (token >= opal_max_async_tokens) { + pr_err("%s: Invalid token passed\n", __func__); + return -EINVAL; + } + + if (!msg) { + pr_err("%s: Invalid message pointer passed\n", __func__); + return -EINVAL; + } + + /* +* The first time this gets called we mark the token as DISPATCHED +* so that if wait_event_interruptible() returns not zero and the +* caller frees the token, we know not to actually free the token +* until the response comes. 
+* +* Only change if the token is ALLOCATED - it may have been +* completed even before the caller gets around to calling this +* the first time. +* +* There is also a dirty great comment at the token allocation +* function that if the opal call returns OPAL_ASYNC_COMPLETION to +* the caller then the caller *must* call this or the not +* interruptible version before doing anything e
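The DISPATCHED/ABANDONED token lifecycle the patch describes can be summarised in a small user-space sketch. The state names follow the patch; the real code additionally holds a spinlock and uses a wait queue, both omitted here, so this is only an illustration of the state transitions, not the kernel implementation:

```c
#include <assert.h>

/* Token states as introduced by the patch. */
enum opal_async_token_state {
	ASYNC_TOKEN_UNALLOCATED = 0,
	ASYNC_TOKEN_ALLOCATED,
	ASYNC_TOKEN_DISPATCHED,
	ASYNC_TOKEN_ABANDONED,
	ASYNC_TOKEN_COMPLETED
};

/*
 * Release a token, mirroring __opal_async_release_token().
 * Returns 0 on success, 1 if the token cannot be freed yet.
 * A DISPATCHED token is marked ABANDONED so that when OPAL finally
 * responds, the response handler knows no one is waiting and frees it.
 */
static int release_token(enum opal_async_token_state *state)
{
	switch (*state) {
	case ASYNC_TOKEN_ALLOCATED:
	case ASYNC_TOKEN_COMPLETED:
		*state = ASYNC_TOKEN_UNALLOCATED;
		return 0;
	case ASYNC_TOKEN_DISPATCHED:
		*state = ASYNC_TOKEN_ABANDONED;
		/* Fall through */
	default:
		return 1;
	}
}

/*
 * Response arrival: an ABANDONED token is freed immediately; any
 * other state becomes COMPLETED for the waiter to pick up.
 */
static void handle_response(enum opal_async_token_state *state)
{
	if (*state == ASYNC_TOKEN_ABANDONED)
		*state = ASYNC_TOKEN_UNALLOCATED;
	else
		*state = ASYNC_TOKEN_COMPLETED;
}
```

The key property being modelled: releasing a DISPATCHED token does not return it to the pool; only the eventual response does.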
[PATCH v4 03/10] mtd: powernv_flash: Remove pointless goto in driver init
Signed-off-by: Cyril Bur <cyril...@gmail.com> --- drivers/mtd/devices/powernv_flash.c | 16 ++-- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c index ca3ca6adf71e..4dd3b5d2feb2 100644 --- a/drivers/mtd/devices/powernv_flash.c +++ b/drivers/mtd/devices/powernv_flash.c @@ -227,21 +227,20 @@ static int powernv_flash_probe(struct platform_device *pdev) int ret; data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL); - if (!data) { - ret = -ENOMEM; - goto out; - } + if (!data) + return -ENOMEM; + data->mtd.priv = data; ret = of_property_read_u32(dev->of_node, "ibm,opal-id", &(data->id)); if (ret) { dev_err(dev, "no device property 'ibm,opal-id'\n"); - goto out; + return ret; } ret = powernv_flash_set_driver_info(dev, &data->mtd); if (ret) - goto out; + return ret; dev_set_drvdata(dev, data); @@ -250,10 +249,7 @@ static int powernv_flash_probe(struct platform_device *pdev) * with an ffs partition at the start, it should prove easier for users * to deal with partitions or not as they see fit */ - ret = mtd_device_register(&data->mtd, NULL, 0); - -out: - return ret; + return mtd_device_register(&data->mtd, NULL, 0); } /** -- 2.14.2
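The refactor above is the standard "drop the do-nothing goto" cleanup: when every error path just falls through to `out: return ret;` with no unwinding to perform (here the allocation is `devm_`-managed), direct returns are shorter and clearer. A generic user-space sketch of the before/after shapes, with hypothetical names:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct widget { int id; };

/*
 * Before: every failure jumps to an "out" label that only does
 * "return ret" -- the goto buys nothing.
 */
static int widget_probe_with_goto(int have_mem, struct widget **out_w)
{
	struct widget *w;
	int ret = 0;

	w = have_mem ? malloc(sizeof(*w)) : NULL;
	if (!w) {
		ret = -ENOMEM;
		goto out;	/* pointless: no cleanup to run */
	}
	*out_w = w;
out:
	return ret;
}

/*
 * After: with nothing to unwind on error, return directly and
 * drop both the label and the ret bookkeeping.
 */
static int widget_probe(int have_mem, struct widget **out_w)
{
	struct widget *w;

	w = have_mem ? malloc(sizeof(*w)) : NULL;
	if (!w)
		return -ENOMEM;
	*out_w = w;
	return 0;
}
```

Both versions are behaviourally identical; the goto form only earns its keep once an error path actually has resources to release.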
[PATCH v4 04/10] mtd: powernv_flash: Don't return -ERESTARTSYS on interrupted token acquisition
Because the MTD core might split up a read() or write() from userspace into several calls to the driver, we may fail to get a token but already have done some work; it is best to return -EINTR back to userspace and have the caller decide what to do. Signed-off-by: Cyril Bur <cyril...@gmail.com> --- drivers/mtd/devices/powernv_flash.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c index 4dd3b5d2feb2..3343d4f5c4f3 100644 --- a/drivers/mtd/devices/powernv_flash.c +++ b/drivers/mtd/devices/powernv_flash.c @@ -47,6 +47,11 @@ enum flash_op { FLASH_OP_ERASE, }; +/* + * Don't return -ERESTARTSYS if we can't get a token, the MTD core + * might have split up the call from userspace and called into the + * driver more than once, we'll already have done some amount of work. + */ static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op, loff_t offset, size_t len, size_t *retlen, u_char *buf) { @@ -63,6 +68,8 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op, if (token < 0) { if (token != -ERESTARTSYS) dev_err(dev, "Failed to get an async token\n"); + else + token = -EINTR; return token; } -- 2.14.2
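The whole fix is a one-branch error translation: -ERESTARTSYS would tell the VFS to transparently restart the syscall, which is wrong once partial work has been done, so it is rewritten to -EINTR before reaching userspace. A user-space sketch of that mapping (ERESTARTSYS is kernel-internal, so its value is defined locally here):

```c
#include <assert.h>
#include <errno.h>

/* Kernel-internal value, never visible to userspace. */
#define ERESTARTSYS 512

/*
 * Sketch of the patch's logic: a signal-interrupted token
 * acquisition is reported as -EINTR because the operation may
 * already have made partial progress and must not be silently
 * restarted by the MTD core / VFS.
 */
static int map_token_error(int token)
{
	if (token == -ERESTARTSYS)
		return -EINTR;
	return token;
}
```

Any other value, success or a real error like -EIO, passes through unchanged.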
[PATCH v4 02/10] mtd: powernv_flash: Don't treat OPAL_SUCCESS as an error
While this driver expects to interact asynchronously, OPAL is well within its rights to return OPAL_SUCCESS to indicate that the operation completed without the need for a callback. We shouldn't treat OPAL_SUCCESS as an error; rather, we should wrap up and return promptly to the caller. Signed-off-by: Cyril Bur <cyril...@gmail.com> --- I'll note here that currently no OPAL exists that will return OPAL_SUCCESS so there isn't the possibility of a bug today. --- drivers/mtd/devices/powernv_flash.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c index f9ec38281ff2..ca3ca6adf71e 100644 --- a/drivers/mtd/devices/powernv_flash.c +++ b/drivers/mtd/devices/powernv_flash.c @@ -63,7 +63,6 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op, if (token < 0) { if (token != -ERESTARTSYS) dev_err(dev, "Failed to get an async token\n"); - return token; } @@ -83,21 +82,25 @@ return -EIO; } + if (rc == OPAL_SUCCESS) + goto out_success; + if (rc != OPAL_ASYNC_COMPLETION) { dev_err(dev, "opal_flash_async_op(op=%d) failed (rc %d)\n", op, rc); - opal_async_release_token(token); - return -EIO; + rc = -EIO; + goto out; } rc = opal_async_wait_response(token, &msg); - opal_async_release_token(token); if (rc) { dev_err(dev, "opal async wait failed (rc %d)\n", rc); - return -EIO; + rc = -EIO; + goto out; } rc = opal_get_async_rc(msg); +out_success: if (rc == OPAL_SUCCESS) { rc = 0; if (retlen) @@ -106,6 +109,8 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op, rc = -EIO; } +out: + opal_async_release_token(token); return rc; } -- 2.14.2
[PATCH 3/3] powerpc/tm: P9 disable transactionally suspended sigcontexts
From: Michael Neuling <mi...@neuling.org> Unfortunately userspace can construct a sigcontext which enables suspend. Thus userspace can force Linux into a path where trechkpt is executed. This patch blocks this from happening on POWER9 by sanity checking sigcontexts passed in. ptrace doesn't have this problem as only MSR SE and BE can be changed via ptrace. This patch also adds a number of WARN_ON() in case we ever enter suspend when we shouldn't. This should catch systems that don't have the firmware change and are running TM. A future firmware change will allow suspend mode on POWER9 but that is going to require additional Linux changes to support. In the interim, this allows TM to continue to (partially) work while stopping userspace from crashing Linux. Signed-off-by: Michael Neuling <mi...@neuling.org> Signed-off-by: Cyril Bur <cyril...@gmail.com> --- arch/powerpc/kernel/process.c | 2 ++ arch/powerpc/kernel/signal_32.c | 4 arch/powerpc/kernel/signal_64.c | 5 + 3 files changed, 11 insertions(+) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index a0c74bbf3454..5b81673c5026 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -903,6 +903,8 @@ static inline void tm_reclaim_task(struct task_struct *tsk) if (!MSR_TM_ACTIVE(thr->regs->msr)) goto out_and_saveregs; + WARN_ON(!tm_suspend_supported()); + TM_DEBUG("--- tm_reclaim on pid %d (NIP=%lx, " "ccr=%lx, msr=%lx, trap=%lx)\n", tsk->pid, thr->regs->nip, diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c index 92fb1c8dbbd8..9eac0131c080 100644 --- a/arch/powerpc/kernel/signal_32.c +++ b/arch/powerpc/kernel/signal_32.c @@ -519,6 +519,8 @@ static int save_tm_user_regs(struct pt_regs *regs, { unsigned long msr = regs->msr; + WARN_ON(!tm_suspend_supported()); + /* Remove TM bits from thread's MSR. 
The MSR in the sigcontext * just indicates to userland that we were doing a transaction, but we * don't want to return in transactional state. This also ensures @@ -769,6 +771,8 @@ static long restore_tm_user_regs(struct pt_regs *regs, int i; #endif + if (!tm_suspend_supported()) + return 1; /* * restore general registers but not including MSR or SOFTE. Also * take care of keeping r2 (TLS) intact if not a signal. diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index c83c115858c1..6d28caf8496f 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -214,6 +214,8 @@ static long setup_tm_sigcontexts(struct sigcontext __user *sc, BUG_ON(!MSR_TM_ACTIVE(regs->msr)); + WARN_ON(!tm_suspend_supported()); + /* Remove TM bits from thread's MSR. The MSR in the sigcontext * just indicates to userland that we were doing a transaction, but we * don't want to return in transactional state. This also ensures @@ -430,6 +432,9 @@ static long restore_tm_sigcontexts(struct task_struct *tsk, BUG_ON(tsk != current); + if (!tm_suspend_supported()) + return -EINVAL; + /* copy the GPRs */ err |= __copy_from_user(regs->gpr, tm_sc->gp_regs, sizeof(regs->gpr)); err |= __copy_from_user(>thread.ckpt_regs, sc->gp_regs, -- 2.14.2
[PATCH 2/3] powerpc/tm: P9 disabled suspend mode workaround
[from Michael Neuling's original patch] Each POWER9 core is made of two super slices. Each super slice can only have one thread at a time in TM suspend mode. The super slice restricts ever entering a state where both threads are in suspend by aborting transactions on tsuspend or exceptions into the kernel. Unfortunately for context switch we need trechkpt which forces suspend mode. If a thread is already in suspend and a second thread that was suspended needs to be restored, the trechkpt must be executed. Currently the trechkpt will hang in this case until the other thread exits suspend. This causes problems for Linux, resulting in hangs and RCU stall detectors going off. To work around this, we disable suspend in the core. This is done via a firmware change which stops the hardware ever getting into suspend. The hardware will always roll back a transaction on any tsuspend or entry into the kernel. [added by Cyril Bur] As the no-suspend firmware change is novel and untested, using it should be opt-in for users. Furthermore, the kernel currently has no method to know if the firmware has applied the no-suspend workaround. This patch extends the ppc_tm commandline option to allow users to opt in if they are sure that their firmware has been updated and they understand the risks involved. 
Signed-off-by: Cyril Bur <cyril...@gmail.com> --- Documentation/admin-guide/kernel-parameters.txt | 7 +-- arch/powerpc/include/asm/cputable.h | 6 ++ arch/powerpc/include/asm/tm.h | 6 -- arch/powerpc/kernel/cputable.c | 12 arch/powerpc/kernel/setup_64.c | 16 ++-- 5 files changed, 37 insertions(+), 10 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 4e2b5d9078a0..a0f757f749cf 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -805,8 +805,11 @@ Disable RADIX MMU mode on POWER9 ppc_tm= [PPC] - Format: {"off"} - Disable Hardware Transactional Memory + Format: {"off" | "no-suspend"} + "Off" Will disable Hardware Transactional Memory. + "no-suspend" Informs the kernel that the + hardware will not transition into the kernel + with a suspended transaction. disable_cpu_apicid= [X86,APIC,SMP] Format: diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index a9bf921f4efc..e66101830af2 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -124,6 +124,12 @@ extern void identify_cpu_name(unsigned int pvr); extern void do_feature_fixups(unsigned long value, void *fixup_start, void *fixup_end); +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM +extern bool tm_suspend_supported(void); +#else +static inline bool tm_suspend_supported(void) { return false; } +#endif + extern const char *powerpc_base_platform; #ifdef CONFIG_JUMP_LABEL_FEATURE_CHECKS diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h index eca1c866ca97..1fd0b5f72861 100644 --- a/arch/powerpc/include/asm/tm.h +++ b/arch/powerpc/include/asm/tm.h @@ -9,9 +9,11 @@ #ifndef __ASSEMBLY__ -#define TM_STATE_ON0 -#define TM_STATE_OFF 1 +#define TM_STATE_ON0 +#define TM_STATE_OFF 1 +#define TM_STATE_NO_SUSPEND2 +extern int ppc_tm_state; extern void tm_enable(void); extern void tm_reclaim(struct 
thread_struct *thread, unsigned long orig_msr, uint8_t cause); diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c index 760872916013..2cb01b48123a 100644 --- a/arch/powerpc/kernel/cputable.c +++ b/arch/powerpc/kernel/cputable.c @@ -22,6 +22,7 @@ #include /* for PTRRELOC on ARCH=ppc */ #include #include +#include static struct cpu_spec the_cpu_spec __read_mostly; @@ -2301,6 +2302,17 @@ void __init identify_cpu_name(unsigned int pvr) } } +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM +bool tm_suspend_supported(void) +{ + if (cpu_has_feature(CPU_FTR_TM)) { + if (pvr_version_is(PVR_POWER9) && ppc_tm_state != TM_STATE_NO_SUSPEND) + return false; + return true; + } + return false; +} +#endif #ifdef CONFIG_JUMP_LABEL_FEATURE_CHECKS struct static_key_true cpu_feature_keys[NUM_CPU_FTR_KEYS] = { diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index e37c26d2e54b..227ac600a1b7 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -251,12 +251,14 @@ static void cpu_ready_for_interrupts(void) get_
[PATCH 1/3] powerpc/tm: Add commandline option to disable hardware transactional memory
Currently the kernel relies on firmware to inform it whether or not the CPU supports HTM and as long as the kernel was built with CONFIG_PPC_TRANSACTIONAL_MEM=y then it will allow userspace to make use of the facility. There may be situations where it would be advantageous for the kernel to not allow userspace to use HTM, currently the only way to achieve this is to recompile the kernel with CONFIG_PPC_TRANSACTIONAL_MEM=n. This patch adds a simple commandline option so that HTM can be disabled at boot time. Signed-off-by: Cyril Bur <cyril...@gmail.com> --- Documentation/admin-guide/kernel-parameters.txt | 4 arch/powerpc/include/asm/tm.h | 3 +++ arch/powerpc/kernel/setup_64.c | 28 + 3 files changed, 35 insertions(+) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 05496622b4ef..4e2b5d9078a0 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -804,6 +804,10 @@ disable_radix [PPC] Disable RADIX MMU mode on POWER9 + ppc_tm= [PPC] + Format: {"off"} + Disable Hardware Transactional Memory + disable_cpu_apicid= [X86,APIC,SMP] Format: The number of initial APIC ID for the diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h index 82e06ca3a49b..eca1c866ca97 100644 --- a/arch/powerpc/include/asm/tm.h +++ b/arch/powerpc/include/asm/tm.h @@ -9,6 +9,9 @@ #ifndef __ASSEMBLY__ +#define TM_STATE_ON0 +#define TM_STATE_OFF 1 + extern void tm_enable(void); extern void tm_reclaim(struct thread_struct *thread, unsigned long orig_msr, uint8_t cause); diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index b89c6aac48c9..e37c26d2e54b 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -68,6 +68,7 @@ #include #include #include +#include #ifdef DEBUG #define DBG(fmt...) 
udbg_printf(fmt) @@ -250,6 +251,31 @@ static void cpu_ready_for_interrupts(void) get_paca()->kernel_msr = MSR_KERNEL; } +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM +static int ppc_tm_state; +static int __init parse_ppc_tm(char *p) +{ + if (strcmp(p, "off") == 0) + ppc_tm_state = TM_STATE_OFF; + else + printk(KERN_NOTICE "Unknown value to cmdline ppc_tm '%s'\n", p); + return 0; +} +early_param("ppc_tm", parse_ppc_tm); + +static void check_disable_tm(void) +{ + if (cpu_has_feature(CPU_FTR_TM) && ppc_tm_state == TM_STATE_OFF) { + printk(KERN_NOTICE "Disabling hardware transactional memory (HTM)\n"); + cur_cpu_spec->cpu_user_features2 &= + ~(PPC_FEATURE2_HTM_NOSC | PPC_FEATURE2_HTM); + cur_cpu_spec->cpu_features &= ~CPU_FTR_TM; + } +} +#else +static void check_disable_tm(void) { } +#endif + /* * Early initialization entry point. This is called by head.S * with MMU translation disabled. We rely on the "feature" of @@ -299,6 +325,8 @@ void __init early_setup(unsigned long dt_ptr) */ early_init_devtree(__va(dt_ptr)); + check_disable_tm(); + /* Now we know the logical id of our boot cpu, setup the paca. */ setup_paca([boot_cpuid]); fixup_boot_paca(); -- 2.14.2
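The two new hooks in this patch are tiny: a string match on the `ppc_tm=` option and, if "off" was requested, a mask-clear of the HTM feature bits so userspace never sees the facility advertised. A user-space sketch of both (the two feature-bit values here are hypothetical stand-ins, not the real masks from cputable.h):

```c
#include <assert.h>
#include <string.h>

#define TM_STATE_ON  0
#define TM_STATE_OFF 1

/* Hypothetical stand-ins for the cpu_user_features2 HTM bits. */
#define PPC_FEATURE2_HTM      0x01u
#define PPC_FEATURE2_HTM_NOSC 0x02u

static int ppc_tm_state = TM_STATE_ON;
static unsigned int cpu_user_features2 =
	PPC_FEATURE2_HTM | PPC_FEATURE2_HTM_NOSC;

/*
 * Mirrors parse_ppc_tm(): only the exact string "off" flips the
 * state; anything else is ignored (the kernel logs a notice).
 * Returning 0 tells early_param() the option was consumed.
 */
static int parse_ppc_tm(const char *p)
{
	if (p && strcmp(p, "off") == 0)
		ppc_tm_state = TM_STATE_OFF;
	return 0;
}

/*
 * Mirrors check_disable_tm(): once the device tree has been parsed,
 * clear the HTM feature bits if the user asked for "off".
 */
static void check_disable_tm(void)
{
	if (ppc_tm_state == TM_STATE_OFF)
		cpu_user_features2 &=
			~(PPC_FEATURE2_HTM_NOSC | PPC_FEATURE2_HTM);
}
```

Because the check runs right after early_init_devtree(), the cleared bits never reach the AT_HWCAP2 auxiliary vector, which is how userspace normally discovers HTM.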
Re: [PATCH v2 2/3] powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision
On Sun, 2017-09-24 at 05:18 +0800, Simon Guo wrote: > Hi Cyril, > On Sat, Sep 23, 2017 at 12:06:48AM +1000, Cyril Bur wrote: > > On Thu, 2017-09-21 at 07:34 +0800, wei.guo.si...@gmail.com wrote: > > > From: Simon Guo <wei.guo.si...@gmail.com> > > > > > > This patch add VMX primitives to do memcmp() in case the compare size > > > exceeds 4K bytes. > > > > > > > Hi Simon, > > > > Sorry I didn't see this sooner, I've actually been working on a kernel > > version of glibc commit dec4a7105e (powerpc: Improve memcmp performance > > for POWER8) unfortunately I've been distracted and it still isn't done. > > Thanks for sync with me. Let's consolidate our effort together :) > > I have a quick check on glibc commit dec4a7105e. > Looks the aligned case comparison with VSX is launched without rN size > limitation, which means it will have a VSX reg load penalty even when the > length is 9 bytes. > This was written for userspace which doesn't have to explicitly enable VMX in order to use it - we need to be smarter in the kernel. > It did some optimization when src/dest addrs don't have the same offset > on 8 bytes alignment boundary. I need to read more closely. > > > I wonder if we can consolidate our efforts here. One thing I did come > > across in my testing is that for memcmp() that will fail early (I > > haven't narrowed down the the optimal number yet) the cost of enabling > > VMX actually turns out to be a performance regression, as such I've > > added a small check of the first 64 bytes to the start before enabling > > VMX to ensure the penalty is worth taking. > > Will there still be a penalty if the 65th byte differs? > I haven't benchmarked it exactly, my rationale for 64 bytes was that it is the stride of the vectorised copy loop so, if we know we'll fail before even completing one iteration of the vectorized loop there isn't any point using the vector regs. 
> > > > Also, you should consider doing 4K and greater, KSM (Kernel Samepage > > Merging) uses PAGE_SIZE which can be as small as 4K. > > Currently the VMX will only be applied when size exceeds 4K. Are you > suggesting a bigger threshold than 4K? > Equal to or greater than 4K, KSM will benefit. > We can sync more offline for v3. > > Thanks, > - Simon
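The early-exit heuristic Cyril describes, checking one vector-loop stride (64 bytes) with scalar code before paying the cost of enabling VMX, can be sketched in portable C. This is only an illustration of the control flow: the "VMX" path here is plain memcmp(), standing in for the real vectorised loop, and 64 is the assumed stride:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder for the real VMX compare loop. */
static int vmx_memcmp(const void *a, const void *b, size_t n)
{
	return memcmp(a, b, n);
}

/*
 * Compare the first 64 bytes with cheap scalar code; only if they
 * match is enabling the vector unit worth it for the remainder.
 * A mismatch in the prefix returns without touching vector state.
 */
static int memcmp_with_prefix_check(const void *a, const void *b, size_t n)
{
	const unsigned char *s1 = a, *s2 = b;
	size_t head = n < 64 ? n : 64;
	size_t i;

	for (i = 0; i < head; i++)
		if (s1[i] != s2[i])
			return s1[i] < s2[i] ? -1 : 1;
	if (n <= 64)
		return 0;
	return vmx_memcmp(s1 + 64, s2 + 64, n - 64);
}
```

In the kernel the trade-off is sharper than in the glibc version, since the vector path also has to save/restore the FP/VMX state of the interrupted context.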
Re: [PATCH v2 2/3] powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision
On Thu, 2017-09-21 at 07:34 +0800, wei.guo.si...@gmail.com wrote: > From: Simon Guo> > This patch add VMX primitives to do memcmp() in case the compare size > exceeds 4K bytes. > Hi Simon, Sorry I didn't see this sooner, I've actually been working on a kernel version of glibc commit dec4a7105e (powerpc: Improve memcmp performance for POWER8) unfortunately I've been distracted and it still isn't done. I wonder if we can consolidate our efforts here. One thing I did come across in my testing is that for memcmp() that will fail early (I haven't narrowed down the the optimal number yet) the cost of enabling VMX actually turns out to be a performance regression, as such I've added a small check of the first 64 bytes to the start before enabling VMX to ensure the penalty is worth taking. Also, you should consider doing 4K and greater, KSM (Kernel Samepage Merging) uses PAGE_SIZE which can be as small as 4K. Cyril > Test result with following test program(replace the "^>" with ""): > -- > > # cat tools/testing/selftests/powerpc/stringloops/memcmp.c > > #include > > #include > > #include > > #include > > #include "utils.h" > > #define SIZE (1024 * 1024 * 900) > > #define ITERATIONS 40 > > int test_memcmp(const void *s1, const void *s2, size_t n); > > static int testcase(void) > { > char *s1; > char *s2; > unsigned long i; > > s1 = memalign(128, SIZE); > if (!s1) { > perror("memalign"); > exit(1); > } > > s2 = memalign(128, SIZE); > if (!s2) { > perror("memalign"); > exit(1); > } > > for (i = 0; i < SIZE; i++) { > s1[i] = i & 0xff; > s2[i] = i & 0xff; > } > for (i = 0; i < ITERATIONS; i++) { > int ret = test_memcmp(s1, s2, SIZE); > > if (ret) { > printf("return %d at[%ld]! 
should have returned > zero\n", ret, i); > abort(); > } > } > > return 0; > } > > int main(void) > { > return test_harness(testcase, "memcmp"); > } > -- > Without VMX patch: >7.435191479 seconds time elapsed >( +- 0.51% ) > With VMX patch: >6.802038938 seconds time elapsed >( +- 0.56% ) > There is ~+8% improvement. > > However I am not aware whether there is use case in kernel for memcmp on > large size yet. > > Signed-off-by: Simon Guo > --- > arch/powerpc/include/asm/asm-prototypes.h | 2 +- > arch/powerpc/lib/copypage_power7.S| 2 +- > arch/powerpc/lib/memcmp_64.S | 82 > +++ > arch/powerpc/lib/memcpy_power7.S | 2 +- > arch/powerpc/lib/vmx-helper.c | 2 +- > 5 files changed, 86 insertions(+), 4 deletions(-) > > diff --git a/arch/powerpc/include/asm/asm-prototypes.h > b/arch/powerpc/include/asm/asm-prototypes.h > index 7330150..e6530d8 100644 > --- a/arch/powerpc/include/asm/asm-prototypes.h > +++ b/arch/powerpc/include/asm/asm-prototypes.h > @@ -49,7 +49,7 @@ void __trace_hcall_exit(long opcode, unsigned long retval, > /* VMX copying */ > int enter_vmx_usercopy(void); > int exit_vmx_usercopy(void); > -int enter_vmx_copy(void); > +int enter_vmx_ops(void); > void * exit_vmx_copy(void *dest); > > /* Traps */ > diff --git a/arch/powerpc/lib/copypage_power7.S > b/arch/powerpc/lib/copypage_power7.S > index ca5fc8f..9e7729e 100644 > --- a/arch/powerpc/lib/copypage_power7.S > +++ b/arch/powerpc/lib/copypage_power7.S > @@ -60,7 +60,7 @@ _GLOBAL(copypage_power7) > std r4,-STACKFRAMESIZE+STK_REG(R30)(r1) > std r0,16(r1) > stdur1,-STACKFRAMESIZE(r1) > - bl enter_vmx_copy > + bl enter_vmx_ops > cmpwi r3,0 > ld r0,STACKFRAMESIZE+16(r1) > ld r3,STK_REG(R31)(r1) > diff --git a/arch/powerpc/lib/memcmp_64.S b/arch/powerpc/lib/memcmp_64.S > index 6dccfb8..40218fc 100644 > --- a/arch/powerpc/lib/memcmp_64.S > +++ b/arch/powerpc/lib/memcmp_64.S > @@ -162,6 +162,13 @@ _GLOBAL(memcmp) > blr > > .Llong: > +#ifdef CONFIG_ALTIVEC > + /* Try to use vmx loop if length is larger than 4K */ > + 
cmpldi cr6,r5,4096 > + bgt cr6,.Lvmx_cmp > + > +.Llong_novmx_cmp: > +#endif > li off8,8 > li off16,16 > li off24,24 > @@ -319,4 +326,79 @@ _GLOBAL(memcmp) > 8: > blr > > +#ifdef CONFIG_ALTIVEC > +.Lvmx_cmp: > + mflrr0 > + std r3,-STACKFRAMESIZE+STK_REG(R31)(r1) > + std r4,-STACKFRAMESIZE+STK_REG(R30)(r1) > + std r5,-STACKFRAMESIZE+STK_REG(R29)(r1) > + std r0,16(r1) > + stdur1,-STACKFRAMESIZE(r1) > + bl enter_vmx_ops > + cmpwi cr1,r3,0 > + ld r0,STACKFRAMESIZE+16(r1) > + ld
Re: [PATCH v2] powerpc/tm: Flush TM only if CPU has TM feature
On Wed, 2017-09-13 at 22:13 -0400, Gustavo Romero wrote: > Commit cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump") > added code to access TM SPRs in flush_tmregs_to_thread(). However > flush_tmregs_to_thread() does not check if the TM feature is available on > the CPU before trying to access TM SPRs in order to copy live state to > thread structures. flush_tmregs_to_thread() is indeed guarded by > CONFIG_PPC_TRANSACTIONAL_MEM but it might be the case that the kernel > was compiled with CONFIG_PPC_TRANSACTIONAL_MEM enabled and ran on > a CPU without the TM feature available, thus resulting in the execution > of TM instructions, which the CPU treats as illegal instructions. > > The fix is just to add proper checking in flush_tmregs_to_thread() > if the CPU has the TM feature before accessing any TM-specific resource, > returning immediately if TM is not available on the CPU. Adding > that checking in flush_tmregs_to_thread() instead of in places > where it is called, like in vsr_get() and vsr_set(), is better because > it avoids the same problem cropping up elsewhere. > > Cc: sta...@vger.kernel.org # v4.13+ > Fixes: cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump") > Signed-off-by: Gustavo Romero <grom...@linux.vnet.ibm.com> Keeping in mind I reviewed cd63f3c and feeling a bit sheepish having missed this. Reviewed-by: Cyril Bur <cyril...@gmail.com> > --- > arch/powerpc/kernel/ptrace.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c > index 07cd22e..f52ad5b 100644 > --- a/arch/powerpc/kernel/ptrace.c > +++ b/arch/powerpc/kernel/ptrace.c > @@ -131,7 +131,7 @@ static void flush_tmregs_to_thread(struct task_struct > *tsk) >* in the appropriate thread structures from live. >*/ > > - if (tsk != current) > + if ((!cpu_has_feature(CPU_FTR_TM)) || (tsk != current)) > return; > > if (MSR_TM_SUSPENDED(mfmsr())) {
Re: [PATCH] powerpc: Use reg.h values for program check reason codes
On Wed, 2017-08-16 at 10:52 +0200, Christophe LEROY wrote: > Hi, > > Le 16/08/2017 à 08:50, Cyril Bur a écrit : > > Small amount of #define duplication, makes sense for these to be in > > reg.h. > > > > Signed-off-by: Cyril Bur <cyril...@gmail.com> > > Looks similar to the following applies commit, doesn't it ? > > https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=merge=d30a5a5262ca64d58aa07fb2ecd7f992df83b4bc > Oops, I think I'm based off Linus' tree. Sorry for the noise. Cyril *starts writing patch to rename to PROGTMBAD*... because clearly haha ;) > Christophe > > > --- > > arch/powerpc/include/asm/reg.h | 1 + > > arch/powerpc/kernel/traps.c| 10 +- > > 2 files changed, 6 insertions(+), 5 deletions(-) > > > > diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h > > index a3b6575c7842..c22b1ae5ad03 100644 > > --- a/arch/powerpc/include/asm/reg.h > > +++ b/arch/powerpc/include/asm/reg.h > > @@ -675,6 +675,7 @@ > > * may not be recoverable */ > > #define SRR1_WS_DEEPER0x0002 /* Some resources not > > maintained */ > > #define SRR1_WS_DEEP 0x0001 /* All resources maintained > > */ > > +#define SRR1_PROGTMBAD 0x0020 /* TM Bad Thing */ > > #define SRR1_PROGFPE0x0010 /* Floating Point Enabled */ > > #define SRR1_PROGILL0x0008 /* Illegal instruction */ > > #define SRR1_PROGPRIV 0x0004 /* Privileged instruction */ > > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c > > index 1f7ec178db05..0a5ddaea8bf1 100644 > > --- a/arch/powerpc/kernel/traps.c > > +++ b/arch/powerpc/kernel/traps.c > > @@ -416,11 +416,11 @@ static inline int check_io_access(struct pt_regs > > *regs) > > exception is in the MSR. 
*/ > > #define get_reason(regs) ((regs)->msr) > > #define get_mc_reason(regs) ((regs)->msr) > > -#define REASON_TM 0x20 > > -#define REASON_FP 0x10 > > -#define REASON_ILLEGAL 0x8 > > -#define REASON_PRIVILEGED 0x4 > > -#define REASON_TRAP0x2 > > +#define REASON_TM SRR1_PROGTMBAD > > +#define REASON_FP SRR1_PROGFPE > > +#define REASON_ILLEGAL SRR1_PROGILL > > +#define REASON_PRIVILEGED SRR1_PROGPRIV > > +#define REASON_TRAPSRR1_PROGTRAP > > > > #define single_stepping(regs) ((regs)->msr & MSR_SE) > > #define clear_single_step(regs) ((regs)->msr &= ~MSR_SE) > >
[PATCH] powerpc: Use reg.h values for program check reason codes
Small amount of #define duplication, makes sense for these to be in reg.h. Signed-off-by: Cyril Bur <cyril...@gmail.com> --- arch/powerpc/include/asm/reg.h | 1 + arch/powerpc/kernel/traps.c | 10 +- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index a3b6575c7842..c22b1ae5ad03 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -675,6 +675,7 @@ * may not be recoverable */ #define SRR1_WS_DEEPER 0x00020000 /* Some resources not maintained */ #define SRR1_WS_DEEP 0x00010000 /* All resources maintained */ +#define SRR1_PROGTMBAD 0x00200000 /* TM Bad Thing */ #define SRR1_PROGFPE 0x00100000 /* Floating Point Enabled */ #define SRR1_PROGILL 0x00080000 /* Illegal instruction */ #define SRR1_PROGPRIV 0x00040000 /* Privileged instruction */ diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index 1f7ec178db05..0a5ddaea8bf1 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -416,11 +416,11 @@ static inline int check_io_access(struct pt_regs *regs) exception is in the MSR. */ #define get_reason(regs) ((regs)->msr) #define get_mc_reason(regs) ((regs)->msr) -#define REASON_TM 0x200000 -#define REASON_FP 0x100000 -#define REASON_ILLEGAL 0x80000 -#define REASON_PRIVILEGED 0x40000 -#define REASON_TRAP 0x20000 +#define REASON_TM SRR1_PROGTMBAD +#define REASON_FP SRR1_PROGFPE +#define REASON_ILLEGAL SRR1_PROGILL +#define REASON_PRIVILEGED SRR1_PROGPRIV +#define REASON_TRAP SRR1_PROGTRAP #define single_stepping(regs) ((regs)->msr & MSR_SE) #define clear_single_step(regs) ((regs)->msr &= ~MSR_SE) -- 2.14.1
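The reason bits being deduplicated can be exercised with a small user-space decoder. The bit values below are taken from arch/powerpc/include/asm/reg.h; the decode order and the short labels are this sketch's own choices, since a real program check sets at most one of these bits:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Program-check reason bits in SRR1 (values from reg.h). */
#define SRR1_PROGTMBAD	0x00200000	/* TM Bad Thing */
#define SRR1_PROGFPE	0x00100000	/* Floating Point Enabled */
#define SRR1_PROGILL	0x00080000	/* Illegal instruction */
#define SRR1_PROGPRIV	0x00040000	/* Privileged instruction */
#define SRR1_PROGTRAP	0x00020000	/* Trap */

/* Map an SRR1 value at program-check time to a human-readable reason. */
static const char *program_check_reason(uint64_t srr1)
{
	if (srr1 & SRR1_PROGTMBAD)
		return "tm-bad-thing";
	if (srr1 & SRR1_PROGFPE)
		return "fp-enabled";
	if (srr1 & SRR1_PROGILL)
		return "illegal";
	if (srr1 & SRR1_PROGPRIV)
		return "privileged";
	if (srr1 & SRR1_PROGTRAP)
		return "trap";
	return "unknown";
}
```

This is exactly why the patch can alias REASON_* to SRR1_PROG*: on these exceptions get_reason() returns the MSR image, whose bits are the SRR1 values above.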
Re: [PATCH V9 1/3] powernv: powercap: Add support for powercap framework
On Mon, 2017-07-31 at 07:54 +0530, Shilpasri G Bhat wrote: > Adds a generic powercap framework to change the system powercap > inband through OPAL-OCC command/response interface. > > Signed-off-by: Shilpasri G Bhat> --- > Changes from V8: > - Use __pa() while passing pointer in opal call > - Use mutex_lock_interruptible() > - Fix error codes returned to user > - Allocate and add sysfs attributes in a single loop > > arch/powerpc/include/asm/opal-api.h| 5 +- > arch/powerpc/include/asm/opal.h| 4 + > arch/powerpc/platforms/powernv/Makefile| 2 +- > arch/powerpc/platforms/powernv/opal-powercap.c | 243 > + > arch/powerpc/platforms/powernv/opal-wrappers.S | 2 + > arch/powerpc/platforms/powernv/opal.c | 4 + > 6 files changed, 258 insertions(+), 2 deletions(-) > create mode 100644 arch/powerpc/platforms/powernv/opal-powercap.c > > diff --git a/arch/powerpc/include/asm/opal-api.h > b/arch/powerpc/include/asm/opal-api.h > index 3130a73..c3e0c4a 100644 > --- a/arch/powerpc/include/asm/opal-api.h > +++ b/arch/powerpc/include/asm/opal-api.h > @@ -42,6 +42,7 @@ > #define OPAL_I2C_STOP_ERR-24 > #define OPAL_XIVE_PROVISIONING -31 > #define OPAL_XIVE_FREE_ACTIVE-32 > +#define OPAL_TIMEOUT -33 > > /* API Tokens (in r0) */ > #define OPAL_INVALID_CALL -1 > @@ -190,7 +191,9 @@ > #define OPAL_NPU_INIT_CONTEXT146 > #define OPAL_NPU_DESTROY_CONTEXT 147 > #define OPAL_NPU_MAP_LPAR148 > -#define OPAL_LAST148 > +#define OPAL_GET_POWERCAP152 > +#define OPAL_SET_POWERCAP153 > +#define OPAL_LAST153 > > /* Device tree flags */ > > diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h > index 588fb1c..ec2087c 100644 > --- a/arch/powerpc/include/asm/opal.h > +++ b/arch/powerpc/include/asm/opal.h > @@ -267,6 +267,8 @@ int64_t opal_xive_set_vp_info(uint64_t vp, > int64_t opal_xive_free_irq(uint32_t girq); > int64_t opal_xive_sync(uint32_t type, uint32_t id); > int64_t opal_xive_dump(uint32_t type, uint32_t id); > +int opal_get_powercap(u32 handle, int token, u32 *pcap); > 
+int opal_set_powercap(u32 handle, int token, u32 pcap); > > /* Internal functions */ > extern int early_init_dt_scan_opal(unsigned long node, const char *uname, > @@ -345,6 +347,8 @@ static inline int opal_get_async_rc(struct opal_msg msg) > > void opal_wake_poller(void); > > +void opal_powercap_init(void); > + > #endif /* __ASSEMBLY__ */ > > #endif /* _ASM_POWERPC_OPAL_H */ > diff --git a/arch/powerpc/platforms/powernv/Makefile > b/arch/powerpc/platforms/powernv/Makefile > index b5d98cb..e79f806 100644 > --- a/arch/powerpc/platforms/powernv/Makefile > +++ b/arch/powerpc/platforms/powernv/Makefile > @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o > opal-async.o idle.o > obj-y+= opal-rtc.o opal-nvram.o opal-lpc.o > opal-flash.o > obj-y+= rng.o opal-elog.o opal-dump.o > opal-sysparam.o opal-sensor.o > obj-y+= opal-msglog.o opal-hmi.o opal-power.o > opal-irqchip.o > -obj-y+= opal-kmsg.o > +obj-y+= opal-kmsg.o opal-powercap.o > > obj-$(CONFIG_SMP)+= smp.o subcore.o subcore-asm.o > obj-$(CONFIG_PCI)+= pci.o pci-ioda.o npu-dma.o > diff --git a/arch/powerpc/platforms/powernv/opal-powercap.c > b/arch/powerpc/platforms/powernv/opal-powercap.c > new file mode 100644 > index 000..9be5093 > --- /dev/null > +++ b/arch/powerpc/platforms/powernv/opal-powercap.c > @@ -0,0 +1,243 @@ > +/* > + * PowerNV OPAL Powercap interface > + * > + * Copyright 2017 IBM Corp. > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public License > + * as published by the Free Software Foundation; either version > + * 2 of the License, or (at your option) any later version. 
> + */ > + > +#define pr_fmt(fmt) "opal-powercap: " fmt > + > +#include > +#include > +#include > + > +#include > + > +DEFINE_MUTEX(powercap_mutex); > + > +static struct kobject *powercap_kobj; > + > +struct powercap_attr { > + u32 handle; > + struct kobj_attribute attr; > +}; > + > +static struct pcap { > + struct attribute_group pg; > + struct powercap_attr *pattrs; > +} *pcaps; > + > +static ssize_t powercap_show(struct kobject *kobj, struct kobj_attribute > *attr, > + char *buf) > +{ > + struct powercap_attr *pcap_attr = container_of(attr, > + struct powercap_attr, attr); > + struct opal_msg msg; > + u32 pcap; > + int ret, token; > + > + token = opal_async_get_token_interruptible(); > + if (token < 0) { > +
Re: [PATCH v4 2/5] powerpc/lib/sstep: Add popcnt instruction emulation
On Mon, 2017-07-31 at 10:58 +1000, Matt Brown wrote: > This adds emulations for the popcntb, popcntw, and popcntd instructions. > Tested for correctness against the popcnt{b,w,d} instructions on ppc64le. > > Signed-off-by: Matt Brown <matthew.brown@gmail.com> Unlike the rest of this series, it isn't immediately clear that this is correct; we're definitely on the other side of the optimisation vs readability line. It does look correct, but perhaps some comments would clarify. Otherwise, Reviewed-by: Cyril Bur <cyril...@gmail.com> > --- > v4: > - change ifdef macro from __powerpc64__ to CONFIG_PPC64 > - slight optimisations > (now identical to the popcntb implementation in kernel/traps.c) > v3: > - optimised using the Giles-Miller method of side-ways addition > v2: > - fixed opcodes > - fixed typecasting > - fixed bitshifting error for both 32 and 64bit arch > --- > arch/powerpc/lib/sstep.c | 42 +- > 1 file changed, 41 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c > index 87d277f..2fd7377 100644 > --- a/arch/powerpc/lib/sstep.c > +++ b/arch/powerpc/lib/sstep.c > @@ -612,6 +612,34 @@ static nokprobe_inline void do_cmpb(struct pt_regs > *regs, unsigned long v1, > regs->gpr[rd] = out_val; > } > > +/* > + * The size parameter is used to adjust the equivalent popcnt instruction.
> + * popcntb = 8, popcntw = 32, popcntd = 64 > + */ > +static nokprobe_inline void do_popcnt(struct pt_regs *regs, unsigned long v1, > + int size, int ra) > +{ > + unsigned long long out = v1; > + > + out -= (out >> 1) & 0x; > + out = (0x & out) + (0x & (out >> 2)); > + out = (out + (out >> 4)) & 0x0f0f0f0f0f0f0f0f; > + > + if (size == 8) {/* popcntb */ > + regs->gpr[ra] = out; > + return; > + } > + out += out >> 8; > + out += out >> 16; > + if (size == 32) { /* popcntw */ > + regs->gpr[ra] = out & 0x003f003f; > + return; > + } > + > + out = (out + (out >> 32)) & 0x7f; > + regs->gpr[ra] = out;/* popcntd */ > +} > + > static nokprobe_inline int trap_compare(long v1, long v2) > { > int ret = 0; > @@ -1194,6 +1222,10 @@ int analyse_instr(struct instruction_op *op, struct > pt_regs *regs, > regs->gpr[ra] = regs->gpr[rd] & ~regs->gpr[rb]; > goto logical_done; > > + case 122: /* popcntb */ > + do_popcnt(regs, regs->gpr[rd], 8, ra); > + goto logical_done; > + > case 124: /* nor */ > regs->gpr[ra] = ~(regs->gpr[rd] | regs->gpr[rb]); > goto logical_done; > @@ -1206,6 +1238,10 @@ int analyse_instr(struct instruction_op *op, struct > pt_regs *regs, > regs->gpr[ra] = regs->gpr[rd] ^ regs->gpr[rb]; > goto logical_done; > > + case 378: /* popcntw */ > + do_popcnt(regs, regs->gpr[rd], 32, ra); > + goto logical_done; > + > case 412: /* orc */ > regs->gpr[ra] = regs->gpr[rd] | ~regs->gpr[rb]; > goto logical_done; > @@ -1217,7 +1253,11 @@ int analyse_instr(struct instruction_op *op, struct > pt_regs *regs, > case 476: /* nand */ > regs->gpr[ra] = ~(regs->gpr[rd] & regs->gpr[rb]); > goto logical_done; > - > +#ifdef CONFIG_PPC64 > + case 506: /* popcntd */ > + do_popcnt(regs, regs->gpr[rd], 64, ra); > + goto logical_done; > +#endif > case 922: /* extsh */ > regs->gpr[ra] = (signed short) regs->gpr[rd]; > goto logical_done;
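For readers puzzling over the side-ways addition above, here is a minimal userspace sketch of the same algorithm. Note the magic mask constants have been truncated to `0x` in the quoted diff by the archive; the full-width values below are the standard 64-bit bit-counting masks, so treat this as an illustrative sketch rather than the exact kernel code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace sketch of do_popcnt()'s Giles-Miller side-ways addition.
 * Each step doubles the width of the partial sums: 2-bit, then 4-bit,
 * then 8-bit fields, at which point every byte holds its own
 * population count (this is where popcntb stops).
 */
static uint64_t popcnt64(uint64_t v)
{
	v -= (v >> 1) & 0x5555555555555555ULL;	/* 2-bit sums */
	v = (v & 0x3333333333333333ULL) + ((v >> 2) & 0x3333333333333333ULL);
	v = (v + (v >> 4)) & 0x0f0f0f0f0f0f0f0fULL;	/* per-byte counts */

	/* folding by 8 and 16 accumulates byte counts into each word */
	v += v >> 8;
	v += v >> 16;
	/* popcntw would mask the low six bits of each word here */
	v = (v + (v >> 32)) & 0x7f;	/* popcntd: single 0..64 result */
	return v;
}
```

After the fold by 16, masking with `0x0000003f0000003f` would give the per-word popcntw result; the final fold by 32 produces the popcntd count.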
Re: [PATCH v4 5/5] powerpc/lib/sstep: Add isel instruction emulation
On Mon, 2017-07-31 at 10:58 +1000, Matt Brown wrote: > This adds emulation for the isel instruction. > Tested for correctness against the isel instruction and its extended > mnemonics (lt, gt, eq) on ppc64le. > > Signed-off-by: Matt Brown <matthew.brown@gmail.com> Reviewed-by: Cyril Bur <cyril...@gmail.com> > --- > v4: > - simplify if statement to ternary op > (same as isel emulation in kernel/traps.c) > v2: > - fixed opcode > - fixed definition to include the 'if RA=0, a=0' clause > - fixed ccr bitshifting error > --- > arch/powerpc/lib/sstep.c | 8 > 1 file changed, 8 insertions(+) > > diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c > index af4eef9..473bab5 100644 > --- a/arch/powerpc/lib/sstep.c > +++ b/arch/powerpc/lib/sstep.c > @@ -1240,6 +1240,14 @@ int analyse_instr(struct instruction_op *op, struct > pt_regs *regs, > /* > * Logical instructions > */ > + case 15:/* isel */ > + mb = (instr >> 6) & 0x1f; /* bc */ > + val = (regs->ccr >> (31 - mb)) & 1; > + val2 = (ra) ? regs->gpr[ra] : 0; > + > + regs->gpr[rd] = (val) ? val2 : regs->gpr[rb]; > + goto logical_done; > + > case 26:/* cntlzw */ > asm("cntlzw %0,%1" : "=r" (regs->gpr[ra]) : > "r" (regs->gpr[rd]));
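The isel semantics being emulated above can be sketched in a few lines of userspace C (the register-file array and function name here are made up for illustration, not taken from sstep.c):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of isel RT,RA,RB,BC: RT = CR[bc] ? (RA ? GPR[RA] : 0) : GPR[RB].
 * CR bits are numbered from the left, hence the (31 - bc) shift, and
 * RA=0 reads as the literal value 0, not GPR0.
 */
static uint64_t isel(const uint64_t gpr[32], uint32_t ccr,
		     int ra, int rb, int bc)
{
	uint64_t a = ra ? gpr[ra] : 0;
	int bit = (ccr >> (31 - bc)) & 1;

	return bit ? a : gpr[rb];
}
```

The extended mnemonics the patch was tested against map to bc values 0 (lt), 1 (gt), and 2 (eq) of CR0.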
Re: [PATCH v4 4/5] powerpc/lib/sstep: Add prty instruction emulation
On Mon, 2017-07-31 at 10:58 +1000, Matt Brown wrote: > This adds emulation for the prtyw and prtyd instructions. > Tested for logical correctness against the prtyw and prtyd instructions > on ppc64le. > > Signed-off-by: Matt Brown <matthew.brown@gmail.com> Reviewed-by: Cyril Bur <cyril...@gmail.com> > --- > v4: > - use simpler xor method > v3: > - optimised using the Giles-Miller method of side-ways addition > v2: > - fixed opcodes > - fixed bitshifting and typecast errors > - merged do_prtyw and do_prtyd into single function > --- > arch/powerpc/lib/sstep.c | 26 ++ > 1 file changed, 26 insertions(+) > > diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c > index c9fd613..af4eef9 100644 > --- a/arch/powerpc/lib/sstep.c > +++ b/arch/powerpc/lib/sstep.c > @@ -657,6 +657,24 @@ static nokprobe_inline void do_bpermd(struct pt_regs > *regs, unsigned long v1, > regs->gpr[ra] = perm; > } > #endif /* CONFIG_PPC64 */ > +/* > + * The size parameter adjusts the equivalent prty instruction. > + * prtyw = 32, prtyd = 64 > + */ > +static nokprobe_inline void do_prty(struct pt_regs *regs, unsigned long v, > + int size, int ra) > +{ > + unsigned long long res = v ^ (v >> 8); > + > + res ^= res >> 16; > + if (size == 32) { /* prtyw */ > + regs->gpr[ra] = res & 0x00010001; > + return; > + } > + > + res ^= res >> 32; > + regs->gpr[ra] = res & 1;/*prtyd */ > +} > > static nokprobe_inline int trap_compare(long v1, long v2) > { > @@ -1247,6 +1265,14 @@ int analyse_instr(struct instruction_op *op, struct > pt_regs *regs, > case 124: /* nor */ > regs->gpr[ra] = ~(regs->gpr[rd] | regs->gpr[rb]); > goto logical_done; > + > + case 154: /* prtyw */ > + do_prty(regs, regs->gpr[rd], 32, ra); > + goto logical_done; > + > + case 186: /* prtyd */ > + do_prty(regs, regs->gpr[rd], 64, ra); > + goto logical_done; > #ifdef CONFIG_PPC64 > case 252: /* bpermd */ > do_bpermd(regs, regs->gpr[rd], regs->gpr[rb], ra);
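The xor-fold method in do_prty() above can be sketched in userspace. The mask constants in the quoted diff have been truncated by the archive; the sketch below uses the full-width versions under the assumption they follow the usual one-parity-bit-per-word pattern:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of do_prty(): prtyw/prtyd compute the parity of the
 * least-significant bit of each byte, per 32-bit word or per
 * doubleword. XOR-folding by 8 then 16 collapses those byte LSBs
 * into bit 0 of each word.
 */
static uint64_t prtyw(uint64_t v)
{
	uint64_t res = v ^ (v >> 8);

	res ^= res >> 16;
	return res & 0x0000000100000001ULL;	/* one parity bit per word */
}

static uint64_t prtyd(uint64_t v)
{
	uint64_t res = v ^ (v >> 8);

	res ^= res >> 16;
	res ^= res >> 32;	/* fold the two word parities together */
	return res & 1;
}
```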
Re: [PATCH v4 3/5] powerpc/lib/sstep: Add bpermd instruction emulation
On Mon, 2017-07-31 at 10:58 +1000, Matt Brown wrote: > This adds emulation for the bpermd instruction. > Tested for correctness against the bpermd instruction on ppc64le. > > Signed-off-by: Matt Brown <matthew.brown@gmail.com> Reviewed-by: Cyril Bur <cyril...@gmail.com> > --- > v4: > - change ifdef macro from __powerpc64__ to CONFIG_PPC64 > v2: > - fixed opcode > - added ifdef tags to do_bpermd func > - fixed bitshifting errors > --- > arch/powerpc/lib/sstep.c | 24 +++- > 1 file changed, 23 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c > index 2fd7377..c9fd613 100644 > --- a/arch/powerpc/lib/sstep.c > +++ b/arch/powerpc/lib/sstep.c > @@ -640,6 +640,24 @@ static nokprobe_inline void do_popcnt(struct pt_regs > *regs, unsigned long v1, > regs->gpr[ra] = out;/* popcntd */ > } > > +#ifdef CONFIG_PPC64 > +static nokprobe_inline void do_bpermd(struct pt_regs *regs, unsigned long v1, > + unsigned long v2, int ra) > +{ > + unsigned char perm, idx; > + unsigned int i; > + > + perm = 0; > + for (i = 0; i < 8; i++) { > + idx = (v1 >> (i * 8)) & 0xff; > + if (idx < 64) > + if (v2 & PPC_BIT(idx)) > + perm |= 1 << i; > + } > + regs->gpr[ra] = perm; > +} > +#endif /* CONFIG_PPC64 */ > + > static nokprobe_inline int trap_compare(long v1, long v2) > { > int ret = 0; > @@ -1229,7 +1247,11 @@ int analyse_instr(struct instruction_op *op, struct > pt_regs *regs, > case 124: /* nor */ > regs->gpr[ra] = ~(regs->gpr[rd] | regs->gpr[rb]); > goto logical_done; > - > +#ifdef CONFIG_PPC64 > + case 252: /* bpermd */ > + do_bpermd(regs, regs->gpr[rd], regs->gpr[rb], ra); > + goto logical_done; > +#endif > case 284: /* xor */ > regs->gpr[ra] = ~(regs->gpr[rd] ^ regs->gpr[rb]); > goto logical_done;
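The bit-permute loop in do_bpermd() above is easier to follow with the PPC_BIT() definition spelled out; the key subtlety is that the selector bytes index bits in IBM (big-endian) numbering, where bit 0 is the most significant. A userspace sketch:

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB of the doubleword */
#define PPC_BIT(x)	(1ULL << (63 - (x)))

/*
 * Sketch of do_bpermd(): each of the eight bytes of v1 selects one
 * bit of v2 (by IBM bit number); selector values >= 64 contribute 0.
 * The eight selected bits form the result in the low byte.
 */
static uint64_t bpermd(uint64_t v1, uint64_t v2)
{
	uint64_t perm = 0;
	int i;

	for (i = 0; i < 8; i++) {
		unsigned int idx = (v1 >> (i * 8)) & 0xff;

		if (idx < 64 && (v2 & PPC_BIT(idx)))
			perm |= 1ULL << i;
	}
	return perm;
}
```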
Re: [PATCH v4 1/5] powerpc/lib/sstep: Add cmpb instruction emulation
On Mon, 2017-07-31 at 10:58 +1000, Matt Brown wrote: > This patch adds emulation of the cmpb instruction, enabling xmon to > emulate this instruction. > Tested for correctness against the cmpb asm instruction on ppc64le. > > Signed-off-by: Matt Brown <matthew.brown@gmail.com> Reviewed-by: Cyril Bur <cyril...@gmail.com> > --- > v2: > - fixed opcode > - fixed mask typecasting > --- > arch/powerpc/lib/sstep.c | 20 > 1 file changed, 20 insertions(+) > > diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c > index 33117f8..87d277f 100644 > --- a/arch/powerpc/lib/sstep.c > +++ b/arch/powerpc/lib/sstep.c > @@ -596,6 +596,22 @@ static nokprobe_inline void do_cmp_unsigned(struct > pt_regs *regs, unsigned long > regs->ccr = (regs->ccr & ~(0xf << shift)) | (crval << shift); > } > > +static nokprobe_inline void do_cmpb(struct pt_regs *regs, unsigned long v1, > + unsigned long v2, int rd) > +{ > + unsigned long long out_val, mask; > + int i; > + > + out_val = 0; > + for (i = 0; i < 8; i++) { > + mask = 0xffUL << (i * 8); > + if ((v1 & mask) == (v2 & mask)) > + out_val |= mask; > + } > + > + regs->gpr[rd] = out_val; > +} > + > static nokprobe_inline int trap_compare(long v1, long v2) > { > int ret = 0; > @@ -1049,6 +1065,10 @@ int analyse_instr(struct instruction_op *op, struct > pt_regs *regs, > do_cmp_unsigned(regs, val, val2, rd >> 2); > goto instr_done; > > + case 508: /* cmpb */ > + do_cmpb(regs, regs->gpr[rd], regs->gpr[rb], ra); > + goto instr_done; > + > /* > * Arithmetic instructions > */
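The byte-masking loop in do_cmpb() above can be exercised standalone; this is a userspace sketch of the same logic:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of do_cmpb(): compare v1 and v2 byte by byte, setting the
 * corresponding result byte to 0xff where the bytes match and 0x00
 * where they differ.
 */
static uint64_t cmpb(uint64_t v1, uint64_t v2)
{
	uint64_t out_val = 0, mask;
	int i;

	for (i = 0; i < 8; i++) {
		mask = 0xffULL << (i * 8);
		if ((v1 & mask) == (v2 & mask))
			out_val |= mask;
	}
	return out_val;
}
```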
Re: [PATCH] powerpc/boot: Fix 64-bit boot wrapper build with non-biarch compiler
On Wed, 2017-07-26 at 23:19 +1000, Michael Ellerman wrote: > Historically the boot wrapper was always built 32-bit big endian, even > for 64-bit kernels. That was because old firmwares didn't necessarily > support booting a 64-bit image. Because of that arch/powerpc/boot/Makefile > uses CROSS32CC for compilation. > > However when we added 64-bit little endian support, we also added > support for building the boot wrapper 64-bit. However we kept using > CROSS32CC, because in most cases it is just CC and everything works. > > However if the user doesn't specify CROSS32_COMPILE (which no one ever > does AFAIK), and CC is *not* biarch (32/64-bit capable), then CROSS32CC > becomes just "gcc". On native systems that is probably OK, but if we're > cross building it definitely isn't, leading to eg: > > gcc ... -m64 -mlittle-endian -mabi=elfv2 ... arch/powerpc/boot/cpm-serial.c > gcc: error: unrecognized argument in option ‘-mabi=elfv2’ > gcc: error: unrecognized command line option ‘-mlittle-endian’ > make: *** [zImage] Error 2 > > To fix it, stop using CROSS32CC, because we may or may not be building > 32-bit. Instead setup a BOOTCC, which defaults to CC, and only use > CROSS32_COMPILE if it's set and we're building for 32-bit. > > Fixes: 147c05168fc8 ("powerpc/boot: Add support for 64bit little endian > wrapper") > Signed-off-by: Michael Ellerman <m...@ellerman.id.au> Without this patch applied and using a 64bit LE only toolchain my powernv_defconfig build fails: gcc: error: unrecognized argument in option ‘-mabi=elfv2’ gcc: note: valid arguments to ‘-mabi=’ are: ms sysv BOOTAS arch/powerpc/boot/crt0.o BOOTCC arch/powerpc/boot/cuboot.o gcc: error: unrecognized argument in option ‘-mabi=elfv2’ gcc: note: valid arguments to ‘-mabi=’ are: ms sysv COPYarch/powerpc/boot/zlib.h gcc: error: unrecognized command line option ‘-mlittle-endian’; did you mean ‘-fconvert=little-endian’? 
gcc: error: unrecognized argument in option ‘-mabi=elfv2’ gcc: error: unrecognized command line option ‘-mlittle-endian’; did you mean ‘-fconvert=little-endian’? gcc: note: valid arguments to ‘-mabi=’ are: ms sysv COPYarch/powerpc/boot/zutil.h COPYarch/powerpc/boot/inffast.h COPYarch/powerpc/boot/zconf.h make[1]: *** [arch/powerpc/boot/Makefile:201: arch/powerpc/boot/crt0.o] Error 1 make[1]: *** Waiting for unfinished jobs MODPOST 244 modules gcc: error: unrecognized command line option ‘-mlittle-endian’; did you mean ‘-fconvert=little-endian’? make[1]: *** [arch/powerpc/boot/Makefile:198: arch/powerpc/boot/cpm- serial.o] Error 1 make[1]: *** [arch/powerpc/boot/Makefile:198: arch/powerpc/boot/cuboot.o] Error 1 COPYarch/powerpc/boot/inffixed.h make: *** [arch/powerpc/Makefile:289: zImage] Error 2 make: *** Waiting for unfinished jobs With this patch applied builds fine. Please merge! Reviewed-by: Cyril Bur <cyril...@gmail.com> > --- > arch/powerpc/boot/Makefile | 14 +++--- > 1 file changed, 11 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile > index a7814a7b1523..6f952fe1f084 100644 > --- a/arch/powerpc/boot/Makefile > +++ b/arch/powerpc/boot/Makefile > @@ -25,12 +25,20 @@ compress-$(CONFIG_KERNEL_XZ) := CONFIG_KERNEL_XZ > BOOTCFLAGS:= -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \ >-fno-strict-aliasing -Os -msoft-float -pipe \ >-fomit-frame-pointer -fno-builtin -fPIC -nostdinc \ > - -isystem $(shell $(CROSS32CC) -print-file-name=include) \ >-D$(compress-y) > > +BOOTCC := $(CC) > ifdef CONFIG_PPC64_BOOT_WRAPPER > BOOTCFLAGS += -m64 > +else > +BOOTCFLAGS += -m32 > +ifdef CROSS32_COMPILE > +BOOTCC := $(CROSS32_COMPILE)gcc > +endif > endif > + > +BOOTCFLAGS += -isystem $(shell $(BOOTCC) -print-file-name=include) > + > ifdef CONFIG_CPU_BIG_ENDIAN > BOOTCFLAGS += -mbig-endian > else > @@ -183,10 +191,10 @@ clean-files := $(zlib-) $(zlibheader-) > $(zliblinuxheader-) \ > empty.c zImage.coff.lds zImage.ps3.lds 
zImage.lds > > quiet_cmd_bootcc = BOOTCC $@ > - cmd_bootcc = $(CROSS32CC) -Wp,-MD,$(depfile) $(BOOTCFLAGS) -c -o $@ $< > + cmd_bootcc = $(BOOTCC) -Wp,-MD,$(depfile) $(BOOTCFLAGS) -c -o $@ $< > > quiet_cmd_bootas = BOOTAS $@ > - cmd_bootas = $(CROSS32CC) -Wp,-MD,$(depfile) $(BOOTAFLAGS) -c -o $@ $< > + cmd_bootas = $(BOOTCC) -Wp,-MD,$(depfile) $(BOOTAFLAGS) -c -o $@ $< > > quiet_cmd_bootar = BOOTAR $@ >cmd_bootar = $(CROSS32AR) -cr$(KBUILD_ARFLAGS) $@. $(filter-out > FORCE,$^); mv $@. $@
Re: [PATCH] powerpc/configs: Add a powernv_be_defconfig
On Mon, 2017-07-24 at 22:50 +1000, Michael Ellerman wrote: > Although pretty much everyone using powernv is running little endian, > we should still test we can build for big endian. So add a > powernv_be_defconfig, which is autogenerated by flipping the endian > symbol in powernv_defconfig. > > Signed-off-by: Michael Ellerman <m...@ellerman.id.au> Reviewed-by: Cyril Bur <cyril...@gmail.com> > --- > arch/powerpc/Makefile | 4 > arch/powerpc/configs/be.config | 1 + > 2 files changed, 5 insertions(+) > create mode 100644 arch/powerpc/configs/be.config > > diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile > index 8d4ed73d5490..7b8eddfc46d2 100644 > --- a/arch/powerpc/Makefile > +++ b/arch/powerpc/Makefile > @@ -316,6 +316,10 @@ PHONY += ppc64le_defconfig > ppc64le_defconfig: > $(call merge_into_defconfig,ppc64_defconfig,le) > > +PHONY += powernv_be_defconfig > +powernv_be_defconfig: > + $(call merge_into_defconfig,powernv_defconfig,be) > + > PHONY += mpc85xx_defconfig > mpc85xx_defconfig: > $(call merge_into_defconfig,mpc85xx_basic_defconfig,\ > diff --git a/arch/powerpc/configs/be.config b/arch/powerpc/configs/be.config > new file mode 100644 > index ..c5cdc99a6530 > --- /dev/null > +++ b/arch/powerpc/configs/be.config > @@ -0,0 +1 @@ > +CONFIG_CPU_BIG_ENDIAN=y
Re: [PATCH V8 3/3] powernv: Add support to clear sensor groups data
On Wed, 2017-07-26 at 10:35 +0530, Shilpasri G Bhat wrote: > Adds support for clearing different sensor groups. OCC inband sensor > groups like CSM, Profiler, Job Scheduler can be cleared using this > driver. The min/max of all sensors belonging to these sensor groups > will be cleared. > Hi Shilpasri, I think some comments from v1 also apply here. Other comments inline. Thanks, Cyril > Signed-off-by: Shilpasri G Bhat> --- > Changes from V7: > - s/send_occ_command/opal_sensor_groups_clear_history > > arch/powerpc/include/asm/opal-api.h| 3 +- > arch/powerpc/include/asm/opal.h| 2 + > arch/powerpc/include/uapi/asm/opal-occ.h | 23 ++ > arch/powerpc/platforms/powernv/Makefile| 2 +- > arch/powerpc/platforms/powernv/opal-occ.c | 109 > + > arch/powerpc/platforms/powernv/opal-wrappers.S | 1 + > arch/powerpc/platforms/powernv/opal.c | 3 + > 7 files changed, 141 insertions(+), 2 deletions(-) > create mode 100644 arch/powerpc/include/uapi/asm/opal-occ.h > create mode 100644 arch/powerpc/platforms/powernv/opal-occ.c > > diff --git a/arch/powerpc/include/asm/opal-api.h > b/arch/powerpc/include/asm/opal-api.h > index 0d37315..342738a 100644 > --- a/arch/powerpc/include/asm/opal-api.h > +++ b/arch/powerpc/include/asm/opal-api.h > @@ -195,7 +195,8 @@ > #define OPAL_SET_POWERCAP153 > #define OPAL_GET_PSR 154 > #define OPAL_SET_PSR 155 > -#define OPAL_LAST155 > +#define OPAL_SENSOR_GROUPS_CLEAR 156 > +#define OPAL_LAST156 > > /* Device tree flags */ > > diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h > index 58b30a4..92db6af 100644 > --- a/arch/powerpc/include/asm/opal.h > +++ b/arch/powerpc/include/asm/opal.h > @@ -271,6 +271,7 @@ int64_t opal_xive_set_vp_info(uint64_t vp, > int opal_set_powercap(u32 handle, int token, u32 pcap); > int opal_get_power_shifting_ratio(u32 handle, int token, u32 *psr); > int opal_set_power_shifting_ratio(u32 handle, int token, u32 psr); > +int opal_sensor_groups_clear(u32 group_hndl, int token); > > /* Internal
functions */ > extern int early_init_dt_scan_opal(unsigned long node, const char *uname, > @@ -351,6 +352,7 @@ static inline int opal_get_async_rc(struct opal_msg msg) > > void opal_powercap_init(void); > void opal_psr_init(void); > +int opal_sensor_groups_clear_history(u32 handle); > > #endif /* __ASSEMBLY__ */ > > diff --git a/arch/powerpc/include/uapi/asm/opal-occ.h > b/arch/powerpc/include/uapi/asm/opal-occ.h > new file mode 100644 > index 000..97c45e2 > --- /dev/null > +++ b/arch/powerpc/include/uapi/asm/opal-occ.h > @@ -0,0 +1,23 @@ > +/* > + * OPAL OCC command interface > + * Supported on POWERNV platform > + * > + * (C) Copyright IBM 2017 > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2, or (at your option) > + * any later version. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. 
> + */ > + > +#ifndef _UAPI_ASM_POWERPC_OPAL_OCC_H_ > +#define _UAPI_ASM_POWERPC_OPAL_OCC_H_ > + > +#define OPAL_OCC_IOCTL_CLEAR_SENSOR_GROUPS _IOR('o', 1, u32) > + > +#endif /* _UAPI_ASM_POWERPC_OPAL_OCC_H */ > diff --git a/arch/powerpc/platforms/powernv/Makefile > b/arch/powerpc/platforms/powernv/Makefile > index 9ed7d33..f193b33 100644 > --- a/arch/powerpc/platforms/powernv/Makefile > +++ b/arch/powerpc/platforms/powernv/Makefile > @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o > opal-async.o idle.o > obj-y+= opal-rtc.o opal-nvram.o opal-lpc.o > opal-flash.o > obj-y+= rng.o opal-elog.o opal-dump.o > opal-sysparam.o opal-sensor.o > obj-y+= opal-msglog.o opal-hmi.o opal-power.o > opal-irqchip.o > -obj-y+= opal-kmsg.o opal-powercap.o opal-psr.o > +obj-y+= opal-kmsg.o opal-powercap.o opal-psr.o > opal-occ.o > > obj-$(CONFIG_SMP)+= smp.o subcore.o subcore-asm.o > obj-$(CONFIG_PCI)+= pci.o pci-ioda.o npu-dma.o > diff --git a/arch/powerpc/platforms/powernv/opal-occ.c > b/arch/powerpc/platforms/powernv/opal-occ.c > new file mode 100644 > index 000..d1d4b28 > --- /dev/null > +++ b/arch/powerpc/platforms/powernv/opal-occ.c > @@ -0,0 +1,109 @@ > +/* > + * Copyright IBM Corporation 2017 > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License version
Re: [PATCH V8 2/3] powernv: Add support to set power-shifting-ratio
On Wed, 2017-07-26 at 10:35 +0530, Shilpasri G Bhat wrote: > This patch adds support to set power-shifting-ratio for CPU-GPU which > is used by OCC power capping algorithm. > > Signed-off-by: Shilpasri G BhatHi Shilpasri, I started looking through this - a lot of the comments to patch 1/3 apply here so I'll stop repeating myself :). Thanks, Cyril > --- > Changes from V7: > - Replaced sscanf with kstrtoint > > arch/powerpc/include/asm/opal-api.h| 4 +- > arch/powerpc/include/asm/opal.h| 3 + > arch/powerpc/platforms/powernv/Makefile| 2 +- > arch/powerpc/platforms/powernv/opal-psr.c | 169 > + > arch/powerpc/platforms/powernv/opal-wrappers.S | 2 + > arch/powerpc/platforms/powernv/opal.c | 3 + > 6 files changed, 181 insertions(+), 2 deletions(-) > create mode 100644 arch/powerpc/platforms/powernv/opal-psr.c > > diff --git a/arch/powerpc/include/asm/opal-api.h > b/arch/powerpc/include/asm/opal-api.h > index c3e0c4a..0d37315 100644 > --- a/arch/powerpc/include/asm/opal-api.h > +++ b/arch/powerpc/include/asm/opal-api.h > @@ -193,7 +193,9 @@ > #define OPAL_NPU_MAP_LPAR148 > #define OPAL_GET_POWERCAP152 > #define OPAL_SET_POWERCAP153 > -#define OPAL_LAST153 > +#define OPAL_GET_PSR 154 > +#define OPAL_SET_PSR 155 > +#define OPAL_LAST155 > > /* Device tree flags */ > > diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h > index ec2087c..58b30a4 100644 > --- a/arch/powerpc/include/asm/opal.h > +++ b/arch/powerpc/include/asm/opal.h > @@ -269,6 +269,8 @@ int64_t opal_xive_set_vp_info(uint64_t vp, > int64_t opal_xive_dump(uint32_t type, uint32_t id); > int opal_get_powercap(u32 handle, int token, u32 *pcap); > int opal_set_powercap(u32 handle, int token, u32 pcap); > +int opal_get_power_shifting_ratio(u32 handle, int token, u32 *psr); > +int opal_set_power_shifting_ratio(u32 handle, int token, u32 psr); > > /* Internal functions */ > extern int early_init_dt_scan_opal(unsigned long node, const char *uname, > @@ -348,6 +350,7 @@ static inline int
opal_get_async_rc(struct opal_msg msg) > void opal_wake_poller(void); > > void opal_powercap_init(void); > +void opal_psr_init(void); > > #endif /* __ASSEMBLY__ */ > > diff --git a/arch/powerpc/platforms/powernv/Makefile > b/arch/powerpc/platforms/powernv/Makefile > index e79f806..9ed7d33 100644 > --- a/arch/powerpc/platforms/powernv/Makefile > +++ b/arch/powerpc/platforms/powernv/Makefile > @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o > opal-async.o idle.o > obj-y+= opal-rtc.o opal-nvram.o opal-lpc.o > opal-flash.o > obj-y+= rng.o opal-elog.o opal-dump.o > opal-sysparam.o opal-sensor.o > obj-y+= opal-msglog.o opal-hmi.o opal-power.o > opal-irqchip.o > -obj-y+= opal-kmsg.o opal-powercap.o > +obj-y+= opal-kmsg.o opal-powercap.o opal-psr.o > > obj-$(CONFIG_SMP)+= smp.o subcore.o subcore-asm.o > obj-$(CONFIG_PCI)+= pci.o pci-ioda.o npu-dma.o > diff --git a/arch/powerpc/platforms/powernv/opal-psr.c > b/arch/powerpc/platforms/powernv/opal-psr.c > new file mode 100644 > index 000..07e3f78 > --- /dev/null > +++ b/arch/powerpc/platforms/powernv/opal-psr.c > @@ -0,0 +1,169 @@ > +/* > + * PowerNV OPAL Power-Shifting-Ratio interface > + * > + * Copyright 2017 IBM Corp. > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public License > + * as published by the Free Software Foundation; either version > + * 2 of the License, or (at your option) any later version. 
> + */ > + > +#define pr_fmt(fmt) "opal-psr: " fmt > + > +#include > +#include > +#include > + > +#include > + > +DEFINE_MUTEX(psr_mutex); > + > +static struct kobject *psr_kobj; > + > +struct psr_attr { > + u32 handle; > + struct kobj_attribute attr; > +}; > + > +static struct psr_attr *psr_attrs; > +static struct kobject *psr_kobj; > + > +static ssize_t psr_show(struct kobject *kobj, struct kobj_attribute *attr, > + char *buf) > +{ > + struct psr_attr *psr_attr = container_of(attr, struct psr_attr, attr); > + struct opal_msg msg; > + int psr, ret, token; > + > + token = opal_async_get_token_interruptible(); > + if (token < 0) { > + pr_devel("Failed to get token\n"); > + return token; > + } > + > + mutex_lock(_mutex); > + ret = opal_get_power_shifting_ratio(psr_attr->handle, token, ); __pa() > + switch (ret) { > + case OPAL_ASYNC_COMPLETION: > + ret = opal_async_wait_response(token, ); > + if (ret) { > +
Re: [PATCH V8 1/3] powernv: powercap: Add support for powercap framework
On Wed, 2017-07-26 at 10:35 +0530, Shilpasri G Bhat wrote: > Adds a generic powercap framework to change the system powercap > inband through OPAL-OCC command/response interface. > > Signed-off-by: Shilpasri G Bhat> --- > Changes from V7: > - Replaced sscanf with kstrtoint > > arch/powerpc/include/asm/opal-api.h| 5 +- > arch/powerpc/include/asm/opal.h| 4 + > arch/powerpc/platforms/powernv/Makefile| 2 +- > arch/powerpc/platforms/powernv/opal-powercap.c | 237 > + > arch/powerpc/platforms/powernv/opal-wrappers.S | 2 + > arch/powerpc/platforms/powernv/opal.c | 4 + > 6 files changed, 252 insertions(+), 2 deletions(-) > create mode 100644 arch/powerpc/platforms/powernv/opal-powercap.c > > diff --git a/arch/powerpc/include/asm/opal-api.h > b/arch/powerpc/include/asm/opal-api.h > index 3130a73..c3e0c4a 100644 > --- a/arch/powerpc/include/asm/opal-api.h > +++ b/arch/powerpc/include/asm/opal-api.h > @@ -42,6 +42,7 @@ > #define OPAL_I2C_STOP_ERR-24 > #define OPAL_XIVE_PROVISIONING -31 > #define OPAL_XIVE_FREE_ACTIVE-32 > +#define OPAL_TIMEOUT -33 > > /* API Tokens (in r0) */ > #define OPAL_INVALID_CALL -1 > @@ -190,7 +191,9 @@ > #define OPAL_NPU_INIT_CONTEXT146 > #define OPAL_NPU_DESTROY_CONTEXT 147 > #define OPAL_NPU_MAP_LPAR148 > -#define OPAL_LAST148 > +#define OPAL_GET_POWERCAP152 > +#define OPAL_SET_POWERCAP153 > +#define OPAL_LAST153 > > /* Device tree flags */ > > diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h > index 588fb1c..ec2087c 100644 > --- a/arch/powerpc/include/asm/opal.h > +++ b/arch/powerpc/include/asm/opal.h > @@ -267,6 +267,8 @@ int64_t opal_xive_set_vp_info(uint64_t vp, > int64_t opal_xive_free_irq(uint32_t girq); > int64_t opal_xive_sync(uint32_t type, uint32_t id); > int64_t opal_xive_dump(uint32_t type, uint32_t id); > +int opal_get_powercap(u32 handle, int token, u32 *pcap); > +int opal_set_powercap(u32 handle, int token, u32 pcap); > > /* Internal functions */ > extern int early_init_dt_scan_opal(unsigned long node, 
const char *uname, > @@ -345,6 +347,8 @@ static inline int opal_get_async_rc(struct opal_msg msg) > > void opal_wake_poller(void); > > +void opal_powercap_init(void); > + > #endif /* __ASSEMBLY__ */ > > #endif /* _ASM_POWERPC_OPAL_H */ > diff --git a/arch/powerpc/platforms/powernv/Makefile > b/arch/powerpc/platforms/powernv/Makefile > index b5d98cb..e79f806 100644 > --- a/arch/powerpc/platforms/powernv/Makefile > +++ b/arch/powerpc/platforms/powernv/Makefile > @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o > opal-async.o idle.o > obj-y+= opal-rtc.o opal-nvram.o opal-lpc.o > opal-flash.o > obj-y+= rng.o opal-elog.o opal-dump.o > opal-sysparam.o opal-sensor.o > obj-y+= opal-msglog.o opal-hmi.o opal-power.o > opal-irqchip.o > -obj-y+= opal-kmsg.o > +obj-y+= opal-kmsg.o opal-powercap.o > > obj-$(CONFIG_SMP)+= smp.o subcore.o subcore-asm.o > obj-$(CONFIG_PCI)+= pci.o pci-ioda.o npu-dma.o > diff --git a/arch/powerpc/platforms/powernv/opal-powercap.c > b/arch/powerpc/platforms/powernv/opal-powercap.c > new file mode 100644 > index 000..7c57f4b > --- /dev/null > +++ b/arch/powerpc/platforms/powernv/opal-powercap.c > @@ -0,0 +1,237 @@ > +/* > + * PowerNV OPAL Powercap interface > + * > + * Copyright 2017 IBM Corp. > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public License > + * as published by the Free Software Foundation; either version > + * 2 of the License, or (at your option) any later version. 
> + */ > + > +#define pr_fmt(fmt) "opal-powercap: " fmt > + > +#include > +#include > +#include > + > +#include > + > +DEFINE_MUTEX(powercap_mutex); > + > +static struct kobject *powercap_kobj; > + > +struct powercap_attr { > + u32 handle; > + struct kobj_attribute attr; > +}; > + > +static struct attribute_group *pattr_groups; > +static struct powercap_attr *pcap_attrs; > + > +static ssize_t powercap_show(struct kobject *kobj, struct kobj_attribute > *attr, > + char *buf) > +{ > + struct powercap_attr *pcap_attr = container_of(attr, > + struct powercap_attr, attr); > + struct opal_msg msg; > + u32 pcap; > + int ret, token; > + > + token = opal_async_get_token_interruptible(); > + if (token < 0) { > + pr_devel("Failed to get token\n"); > + return token; > + } > + > + mutex_lock(_mutex); If this is purely a userspace interface,
Re: [PATCH] powerpc/tm: fix TM SPRs in code dump file
On Wed, 2017-07-19 at 01:44 -0400, Gustavo Romero wrote:
> Currently flush_tmregs_to_thread() does not update the thread
> structures from live state before a core dump, rendering wrong values
> of TFHAR, TFIAR, and TEXASR in core dump files.
>
> This commit fixes it by copying from live state to the appropriate
> thread structures when it's necessary.
>
> Signed-off-by: Gustavo Romero <grom...@linux.vnet.ibm.com>

Gustavo was nice enough to provide me with a simple test case:

int main(void)
{
	__builtin_set_texasr(0x4841434b);
	__builtin_set_tfhar(0xbfee00);
	__builtin_set_tfiar(0x4841434b);

	asm volatile (".long 0x0");

	return 0;
}

Running this binary in a loop and inspecting the resulting core file
with a modified elfutils, also provided by Gustavo
(https://sourceware.org/ml/elfutils-devel/2017-q3/msg00030.html), one
should always observe the values that those __builtin functions set.
__builtin_set_{texasr,tfhar,tfiar} are just wrappers around the
corresponding mtspr instruction.

On an unmodified 4.13-rc1 it takes on the order of 10 executions of the
test to observe incorrect TM SPR values in the core file (typically
zero). The above test was run on the same 4.13-rc1 with this patch
applied for over 48 hours, at a rate of about one run per second. An
incorrect value was never observed. This gives me confidence that this
patch is correct. Running the kernel selftests does not detect any
regressions.

Reviewed-by: Cyril Bur <cyril...@gmail.com>

> ---
>  arch/powerpc/kernel/ptrace.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> index 925a4ef..660ed39 100644
> --- a/arch/powerpc/kernel/ptrace.c
> +++ b/arch/powerpc/kernel/ptrace.c
> @@ -127,12 +127,19 @@ static void flush_tmregs_to_thread(struct task_struct *tsk)
>  	 * If task is not current, it will have been flushed already to
>  	 * its thread_struct during __switch_to().
>  	 *
> -	 * A reclaim flushes ALL the state.
> +	 * A reclaim flushes ALL the state or if not in TM save TM SPRs
> +	 * in the appropriate thread structures from live.
>  	 */
>
> -	if (tsk == current && MSR_TM_SUSPENDED(mfmsr()))
> -		tm_reclaim_current(TM_CAUSE_SIGNAL);
> +	if (tsk != current)
> +		return;
>
> +	if (MSR_TM_SUSPENDED(mfmsr())) {
> +		tm_reclaim_current(TM_CAUSE_SIGNAL);
> +	} else {
> +		tm_enable();
> +		tm_save_sprs(&(tsk->thread));
> +	}
>  }
>  #else
>  static inline void flush_tmregs_to_thread(struct task_struct *tsk) { }
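The fixed logic boils down to three mutually exclusive cases. As a pure-function model (a userspace sketch with invented names, not the kernel code itself):

```c
/* Model of the fixed flush_tmregs_to_thread() decision: nothing to do
 * for a non-current task; a suspended transaction forces a reclaim
 * (which flushes everything); otherwise only the TM SPRs are copied
 * from the live registers. Illustrative names throughout. */
enum flush_action { FLUSH_NONE, FLUSH_RECLAIM, FLUSH_SAVE_SPRS };

enum flush_action flush_tmregs_action(int is_current, int tm_suspended)
{
	if (!is_current)
		return FLUSH_NONE;	/* already flushed at __switch_to() */
	if (tm_suspended)
		return FLUSH_RECLAIM;	/* a reclaim flushes ALL the state */
	return FLUSH_SAVE_SPRS;		/* just snapshot TFHAR/TFIAR/TEXASR */
}
```

The pre-patch code only handled the RECLAIM case, which is why a live, non-suspended task could dump stale SPR values.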
Re: [PATCH v3 02/10] mtd: powernv_flash: Lock around concurrent access to OPAL
On Mon, 2017-07-17 at 19:29 +1000, Balbir Singh wrote:
> On Mon, 2017-07-17 at 17:55 +1000, Cyril Bur wrote:
> > On Mon, 2017-07-17 at 17:34 +1000, Balbir Singh wrote:
> > > On Wed, 2017-07-12 at 14:22 +1000, Cyril Bur wrote:
> > > > OPAL can only manage one flash access at a time and will return an
> > > > OPAL_BUSY error for each concurrent access to the flash. The
> > > > simplest way to prevent this from happening is with a mutex.
> > > >
> > > > Signed-off-by: Cyril Bur <cyril...@gmail.com>
> > > > ---
> > >
> > > Should the mutex_lock() be mutex_lock_interruptible()? Are we OK
> > > waiting on the mutex while other operations with the lock are busy?
> > >
> >
> > This is a good question. My best interpretation is that
> > _interruptible() should be used when you'll only be coming from a user
> > context. Which is mostly true for this driver; however, MTD does
> > provide kernel interfaces, so I was hesitant - there isn't a great
> > deal of use of _interruptible() in drivers/mtd.
> >
> > Thoughts?
>
> What are the kernel interfaces (I have not read through mtd in detail)?
> I would still like to see us not blocked in mutex_lock() across threads
> for parallel calls. One option is to use mutex_trylock() and return
> -EBUSY if someone already holds the mutex, but you'll need to evaluate
> what that means for every call.
>

Yeah, maybe mutex_trylock() is the way to go; thinking quickly, I don't
see how it could be a problem for userspace using powernv_flash. I'm
honestly not too sure about the depths of the mtd kernel interfaces, but
I've seen a tonne of cool stuff you could do, hence my reluctance to go
with _interruptible().

Cyril

> Balbir Singh.
>
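The mutex_trylock() proposal discussed above can be sketched in plain C: a second caller is refused immediately with -EBUSY instead of sleeping on the lock. A C11 atomic flag stands in for the kernel mutex here; all names are illustrative, not the real driver's.

```c
#include <errno.h>
#include <stdatomic.h>

/* Userspace sketch of the trylock scheme: take the lock or fail at
 * once with -EBUSY, never block. Hypothetical names throughout. */
static atomic_flag flash_lock = ATOMIC_FLAG_INIT;

int flash_trylock(void)
{
	/* test_and_set returns true if the lock was already held */
	return atomic_flag_test_and_set(&flash_lock) ? -EBUSY : 0;
}

void flash_unlock(void)
{
	atomic_flag_clear(&flash_lock);
}

int flash_op(void)
{
	int rc = flash_trylock();

	if (rc)
		return rc;	/* someone else is talking to the flash */
	/* ... perform the OPAL flash call here ... */
	flash_unlock();
	return 0;
}
```

The evaluation Balbir asks for amounts to checking that every caller of flash_op() can sensibly handle -EBUSY rather than waiting.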
Re: [PATCH v3 03/10] mtd: powernv_flash: Don't treat OPAL_SUCCESS as an error
On Mon, 2017-07-17 at 18:50 +1000, Balbir Singh wrote:
> On Wed, 2017-07-12 at 14:22 +1000, Cyril Bur wrote:
> > While this driver expects to interact asynchronously, OPAL is well
> > within its rights to return OPAL_SUCCESS to indicate that the
> > operation completed without the need for a callback. We shouldn't
> > treat OPAL_SUCCESS as an error; rather, we should wrap up and return
> > promptly to the caller.
> >
> > Signed-off-by: Cyril Bur <cyril...@gmail.com>
> > ---
> > I'll note here that currently no OPAL exists that will return
> > OPAL_SUCCESS, so there isn't the possibility of a bug today.
>
> It would help if you mentioned OPAL_SUCCESS coming back from the async
> call - so effectively, what we expected to be an asynchronous call
> with a callback, but OPAL returned immediately with success.
>

Ah, my favourite problem: commit messages. Thanks,

Cyril

> Balbir Singh.
>
Re: [PATCH v3 06/10] powerpc/opal: Rework the opal-async interface
On Mon, 2017-07-17 at 21:30 +1000, Balbir Singh wrote:
> On Wed, 2017-07-12 at 14:23 +1000, Cyril Bur wrote:
> > Future work will add an opal_async_wait_response_interruptible()
> > which will call wait_event_interruptible(). This work requires extra
> > token state to be tracked as wait_event_interruptible() can return and
> > the caller could release the token before OPAL responds.
> >
> > Currently token state is tracked with two bitfields which are 64 bits
> > big but may not need to be as OPAL informs Linux how many async tokens
> > there are. It also uses an array indexed by token to store response
> > messages for each token.
> >
> > The bitfields make it difficult to add more state and also provide a
> > hard maximum as to how many tokens there can be - it is possible that
> > OPAL will inform Linux that there are more than 64 tokens.
> >
> > Rather than add a bitfield to track the extra state, rework the
> > internals slightly.
> >
> > Signed-off-by: Cyril Bur <cyril...@gmail.com>
> > ---
> >  arch/powerpc/platforms/powernv/opal-async.c | 97 ++++++++++++---------
> >  1 file changed, 53 insertions(+), 44 deletions(-)
> >
> > diff --git a/arch/powerpc/platforms/powernv/opal-async.c b/arch/powerpc/platforms/powernv/opal-async.c
> > index 1d56ac9da347..d692372a0363 100644
> > --- a/arch/powerpc/platforms/powernv/opal-async.c
> > +++ b/arch/powerpc/platforms/powernv/opal-async.c
> > @@ -1,7 +1,7 @@
> >  /*
> >   * PowerNV OPAL asynchronous completion interfaces
> >   *
> > - * Copyright 2013 IBM Corp.
> > + * Copyright 2013-2017 IBM Corp.
> >   *
> >   * This program is free software; you can redistribute it and/or
> >   * modify it under the terms of the GNU General Public License
> > @@ -23,40 +23,46 @@
> >  #include
> >  #include
> >
> > -#define N_ASYNC_COMPLETIONS	64
> > +enum opal_async_token_state {
> > +	ASYNC_TOKEN_FREE,
> > +	ASYNC_TOKEN_ALLOCATED,
> > +	ASYNC_TOKEN_COMPLETED
> > +};
>
> Are these states mutually exclusive? Does _COMPLETED imply that it is
> also _ALLOCATED?

Yes

> ALLOCATED and FREE are confusing, I would use IN_USE and NOT_IN_USE
> for tokens. If these are mutually exclusive then you can use IN_USE and
> !IN_USE
>

Perhaps instead of _FREE it could be _UNALLOCATED?

> > +
> > +struct opal_async_token {
> > +	enum opal_async_token_state state;
> > +	struct opal_msg response;
> > +};
> >
> > -static DECLARE_BITMAP(opal_async_complete_map, N_ASYNC_COMPLETIONS) = {~0UL};
> > -static DECLARE_BITMAP(opal_async_token_map, N_ASYNC_COMPLETIONS);
> >  static DECLARE_WAIT_QUEUE_HEAD(opal_async_wait);
> >  static DEFINE_SPINLOCK(opal_async_comp_lock);
> >  static struct semaphore opal_async_sem;
> > -static struct opal_msg *opal_async_responses;
> >  static unsigned int opal_max_async_tokens;
> > +static struct opal_async_token *opal_async_tokens;
> >
> >  static int __opal_async_get_token(void)
> >  {
> >  	unsigned long flags;
> >  	int token;
> >
> > -	spin_lock_irqsave(&opal_async_comp_lock, flags);
> > -	token = find_first_bit(opal_async_complete_map, opal_max_async_tokens);
> > -	if (token >= opal_max_async_tokens) {
> > -		token = -EBUSY;
> > -		goto out;
> > -	}
> > -
> > -	if (__test_and_set_bit(token, opal_async_token_map)) {
> > -		token = -EBUSY;
> > -		goto out;
> > +	for (token = 0; token < opal_max_async_tokens; token++) {
> > +		spin_lock_irqsave(&opal_async_comp_lock, flags);
>
> Why is the spin lock inside the for loop? If the last token is free, the
> number of times we'll take and release a lock is extensive, why are we
> doing it this way?
>

Otherwise we might hold the lock for quite some time. At the moment I
think it isn't a big deal since OPAL gives 8, but there is current work
to increase that number, and while it seems the number might only grow
to 16, for a while it was looking like it might grow more.

In a previous iteration I had a check inside the loop but outside the
lock for if (token == ASYNC_TOKEN_FREE) which would then proceed to
take the lock, check again and mark it allocated...

Or I could put the lock around the loop; I'm not attached to any
particular approach.

> > +		if (opal_async_tokens[token].state == ASYNC_TOKEN_FREE) {
> > +			opal_async_tokens[token].state = ASYNC_TOKEN_ALLOCATED;
> > +			spin_unlock_irqrestore(&opal_async_comp_lock, flags);
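The lock-around-the-loop alternative Cyril mentions can be modeled in userspace. In this sketch a pthread mutex stands in for the spinlock, -1 for -EBUSY, and the token count is invented:

```c
#include <pthread.h>

/* Userspace model of token allocation with the lock held around the
 * whole scan: one acquire/release per allocation instead of one per
 * probed token, at the cost of holding the lock for the full scan. */
enum token_state { TOKEN_FREE, TOKEN_ALLOCATED };

#define MAX_TOKENS 8

static enum token_state tokens[MAX_TOKENS];
static pthread_mutex_t token_lock = PTHREAD_MUTEX_INITIALIZER;

int get_token(void)
{
	int token, found = -1;

	pthread_mutex_lock(&token_lock);
	for (token = 0; token < MAX_TOKENS; token++) {
		if (tokens[token] == TOKEN_FREE) {
			tokens[token] = TOKEN_ALLOCATED;
			found = token;
			break;
		}
	}
	pthread_mutex_unlock(&token_lock);

	return found;	/* -1 plays the role of -EBUSY here */
}

void put_token(int token)
{
	pthread_mutex_lock(&token_lock);
	tokens[token] = TOKEN_FREE;
	pthread_mutex_unlock(&token_lock);
}
```

With only 8-16 tokens either placement is cheap; the trade-off only matters if the token count grows substantially.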
Re: [PATCH v3 01/10] mtd: powernv_flash: Use WARN_ON_ONCE() rather than BUG_ON()
On Mon, 2017-07-17 at 13:33 +0200, Frans Klaver wrote:
> On Wed, Jul 12, 2017 at 6:22 AM, Cyril Bur <cyril...@gmail.com> wrote:
> > BUG_ON() should be reserved for situations where we can no longer
> > guarantee the integrity of the system. In the case where
> > powernv_flash_async_op() receives an impossible op, we can still
> > guarantee the integrity of the system.
> >
> > Signed-off-by: Cyril Bur <cyril...@gmail.com>
> > ---
> >  drivers/mtd/devices/powernv_flash.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c
> > index f5396f26ddb4..a9a20c00687c 100644
> > --- a/drivers/mtd/devices/powernv_flash.c
> > +++ b/drivers/mtd/devices/powernv_flash.c
> > @@ -78,7 +78,8 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op,
> >  		rc = opal_flash_erase(info->id, offset, len, token);
> >  		break;
> >  	default:
> > -		BUG_ON(1);
> > +		WARN_ON_ONCE(1);
> > +		return -EIO;
>
> Based on the fact that all three values in enum flash_op are handled,
> I would go as far as stating that the default lemma adds no value and
> can be removed.
>

The way I see it is that it isn't doing any harm being there, and in
cases of future programmer error or corruption events that WARN_ON
might prove useful.

> Frans
Re: [PATCH v3 02/10] mtd: powernv_flash: Lock around concurrent access to OPAL
On Mon, 2017-07-17 at 17:34 +1000, Balbir Singh wrote:
> On Wed, 2017-07-12 at 14:22 +1000, Cyril Bur wrote:
> > OPAL can only manage one flash access at a time and will return an
> > OPAL_BUSY error for each concurrent access to the flash. The simplest
> > way to prevent this from happening is with a mutex.
> >
> > Signed-off-by: Cyril Bur <cyril...@gmail.com>
> > ---
>
> Should the mutex_lock() be mutex_lock_interruptible()? Are we OK
> waiting on the mutex while other operations with the lock are busy?
>

This is a good question. My best interpretation is that
_interruptible() should be used when you'll only be coming from a user
context. Which is mostly true for this driver; however, MTD does
provide kernel interfaces, so I was hesitant - there isn't a great deal
of use of _interruptible() in drivers/mtd.

Thoughts?

Cyril

> Balbir Singh.
>
[PATCH v3 03/10] mtd: powernv_flash: Don't treat OPAL_SUCCESS as an error
While this driver expects to interact asynchronously, OPAL is well
within its rights to return OPAL_SUCCESS to indicate that the operation
completed without the need for a callback. We shouldn't treat
OPAL_SUCCESS as an error; rather, we should wrap up and return promptly
to the caller.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
I'll note here that currently no OPAL exists that will return
OPAL_SUCCESS, so there isn't the possibility of a bug today.

 drivers/mtd/devices/powernv_flash.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c
index 7b41af06f4fe..d50b5f200f73 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -66,9 +66,8 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op,
 	if (token < 0) {
 		if (token != -ERESTARTSYS)
 			dev_err(dev, "Failed to get an async token\n");
-
-		rc = token;
-		goto out;
+		mutex_unlock(&info->lock);
+		return token;
 	}
 
 	switch (op) {
@@ -87,23 +86,25 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op,
 		goto out;
 	}
 
+	if (rc == OPAL_SUCCESS)
+		goto out_success;
+
 	if (rc != OPAL_ASYNC_COMPLETION) {
 		dev_err(dev, "opal_flash_async_op(op=%d) failed (rc %d)\n",
 			op, rc);
-		opal_async_release_token(token);
 		rc = -EIO;
 		goto out;
 	}
 
 	rc = opal_async_wait_response(token, &msg);
-	opal_async_release_token(token);
-	mutex_unlock(&info->lock);
 	if (rc) {
 		dev_err(dev, "opal async wait failed (rc %d)\n", rc);
-		return -EIO;
+		rc = -EIO;
+		goto out;
 	}
 
 	rc = opal_get_async_rc(msg);
+out_success:
 	if (rc == OPAL_SUCCESS) {
 		rc = 0;
 		if (retlen)
@@ -112,8 +113,8 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op,
 		rc = -EIO;
 	}
 
-	return rc;
 out:
+	opal_async_release_token(token);
 	mutex_unlock(&info->lock);
 	return rc;
 }
-- 
2.13.2
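The reworked control flow is easy to model: a synchronous OPAL_SUCCESS jumps straight to the wrap-up while the token is still released on every path. A minimal sketch with stand-in return codes (-5 for -EIO, OPAL_HARDWARE invented for the error case; not the real driver):

```c
#include <stdbool.h>

/* Stand-in return codes for the sketch; only the OPAL_SUCCESS /
 * OPAL_ASYNC_COMPLETION distinction mirrors the patch. */
enum { OPAL_SUCCESS = 0, OPAL_ASYNC_COMPLETION = 1, OPAL_HARDWARE = -6 };

static bool token_released;

static void release_token(void)
{
	token_released = true;
}

/* Model of powernv_flash_async_op() after the rework: 0 on success,
 * -5 standing in for -EIO, token released on every exit path. */
int flash_async_op(int opal_rc)
{
	int rc = opal_rc;

	token_released = false;

	if (rc == OPAL_SUCCESS)
		goto out_success;	/* completed synchronously, nothing to wait for */

	if (rc != OPAL_ASYNC_COMPLETION) {
		rc = -5;		/* a genuine error from OPAL */
		goto out;
	}

	/* ... wait for the async response, then fetch its return code ... */
	rc = OPAL_SUCCESS;

out_success:
	rc = (rc == OPAL_SUCCESS) ? 0 : -5;
out:
	release_token();
	return rc;
}
```

The key property the patch establishes is that the synchronous-success path shares the wrap-up and cleanup code instead of being reported as -EIO.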
[PATCH v3 04/10] mtd: powernv_flash: Remove pointless goto in driver init
Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 drivers/mtd/devices/powernv_flash.c | 16 +++++++---------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c
index d50b5f200f73..d7243b72ba6e 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -232,21 +232,20 @@ static int powernv_flash_probe(struct platform_device *pdev)
 	int ret;
 
 	data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
-	if (!data) {
-		ret = -ENOMEM;
-		goto out;
-	}
+	if (!data)
+		return -ENOMEM;
+
 	data->mtd.priv = data;
 
 	ret = of_property_read_u32(dev->of_node, "ibm,opal-id", &(data->id));
 	if (ret) {
 		dev_err(dev, "no device property 'ibm,opal-id'\n");
-		goto out;
+		return ret;
 	}
 
 	ret = powernv_flash_set_driver_info(dev, &data->mtd);
 	if (ret)
-		goto out;
+		return ret;
 
 	mutex_init(&data->lock);
 
@@ -257,10 +256,7 @@ static int powernv_flash_probe(struct platform_device *pdev)
 	 * with an ffs partition at the start, it should prove easier for users
 	 * to deal with partitions or not as they see fit
 	 */
-	ret = mtd_device_register(&data->mtd, NULL, 0);
-
-out:
-	return ret;
+	return mtd_device_register(&data->mtd, NULL, 0);
 }
 
 /**
-- 
2.13.2
[PATCH v3 08/10] powerpc/opal: Add opal_async_wait_response_interruptible() to opal-async
This patch adds an _interruptible version of opal_async_wait_response().
This is useful when a long running OPAL call is performed on behalf of
a userspace thread, for example, the opal_flash_{read,write,erase}
functions performed by the powernv-flash MTD driver.

It is foreseeable that these functions would take upwards of two
minutes, causing the wait_event() to block long enough to cause hung
task warnings. Furthermore, wait_event_interruptible() is preferable,
as otherwise there is no way for signals to stop the process, which is
going to be confusing in userspace.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/include/asm/opal.h             |  2 +
 arch/powerpc/platforms/powernv/opal-async.c | 87 ++++++++++++++++++++++++++---
 2 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 5553ad2f3e53..6e9e53d744f3 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -294,6 +294,8 @@ extern void opal_notifier_update_evt(uint64_t evt_mask, uint64_t evt_val);
 extern int opal_async_get_token_interruptible(void);
 extern int opal_async_release_token(int token);
 extern int opal_async_wait_response(uint64_t token, struct opal_msg *msg);
+extern int opal_async_wait_response_interruptible(uint64_t token,
+		struct opal_msg *msg);
 extern int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data);
 
 struct rtc_time;
diff --git a/arch/powerpc/platforms/powernv/opal-async.c b/arch/powerpc/platforms/powernv/opal-async.c
index d692372a0363..f6b30cfceb8f 100644
--- a/arch/powerpc/platforms/powernv/opal-async.c
+++ b/arch/powerpc/platforms/powernv/opal-async.c
@@ -26,6 +26,8 @@
 enum opal_async_token_state {
 	ASYNC_TOKEN_FREE,
 	ASYNC_TOKEN_ALLOCATED,
+	ASYNC_TOKEN_DISPATCHED,
+	ASYNC_TOKEN_ABANDONED,
 	ASYNC_TOKEN_COMPLETED
 };
 
@@ -59,8 +61,10 @@ static int __opal_async_get_token(void)
 }
 
 /*
- * Note: If the returned token is used in an opal call and opal returns
- * OPAL_ASYNC_COMPLETION you MUST call opal_async_wait_response() before
+ * Note: If the returned token is used in an opal call and opal
+ * returns OPAL_ASYNC_COMPLETION you MUST call one of
+ * opal_async_wait_response() or
+ * opal_async_wait_response_interruptible() at least once before
  * calling another other opal_async_* function
  */
 int opal_async_get_token_interruptible(void)
@@ -97,6 +101,16 @@ static int __opal_async_release_token(int token)
 		opal_async_tokens[token].state = ASYNC_TOKEN_FREE;
 		rc = 0;
 		break;
+	/*
+	 * DISPATCHED and ABANDONED tokens must wait for OPAL to
+	 * respond.
+	 * Mark a DISPATCHED token as ABANDONED so that the response
+	 * handling code knows no one cares and that it can
+	 * free it then.
+	 */
+	case ASYNC_TOKEN_DISPATCHED:
+		opal_async_tokens[token].state = ASYNC_TOKEN_ABANDONED;
+		/* Fall through */
 	default:
 		rc = 1;
 	}
@@ -129,7 +143,11 @@ int opal_async_wait_response(uint64_t token, struct opal_msg *msg)
 		return -EINVAL;
 	}
 
-	/* Wakeup the poller before we wait for events to speed things
+	/*
+	 * There is no need to mark the token as dispatched, wait_event()
+	 * will block until the token completes.
+	 *
+	 * Wakeup the poller before we wait for events to speed things
 	 * up on platforms or simulators where the interrupts aren't
 	 * functional.
 	 */
@@ -142,11 +160,66 @@ int opal_async_wait_response(uint64_t token, struct opal_msg *msg)
 }
 EXPORT_SYMBOL_GPL(opal_async_wait_response);
 
+int opal_async_wait_response_interruptible(uint64_t token, struct opal_msg *msg)
+{
+	unsigned long flags;
+	int ret;
+
+	if (token >= opal_max_async_tokens) {
+		pr_err("%s: Invalid token passed\n", __func__);
+		return -EINVAL;
+	}
+
+	if (!msg) {
+		pr_err("%s: Invalid message pointer passed\n", __func__);
+		return -EINVAL;
+	}
+
+	/*
+	 * The first time this gets called we mark the token as DISPATCHED
+	 * so that if wait_event_interruptible() returns not zero and the
+	 * caller frees the token, we know not to actually free the token
+	 * until the response comes.
+	 *
+	 * Only change if the token is ALLOCATED - it may have been
+	 * completed even before the caller gets around to calling this
+	 * the first time.
+	 *
+	 * There is also a dirty great comment at the token allocation
+	 * function that if the opal call returns OPAL_ASYNC_COMPLETION to
+	 * the caller then the caller *must* call this or the not
+	 * interruptible version before doing anything else with the
+	 * token.
+	 */
+
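The token lifecycle this patch describes amounts to a small state machine: releasing a token that is still DISPATCHED only marks it ABANDONED, and the response handler frees it when OPAL finally answers. An illustrative userspace sketch, simplified to a single token with no locking (not the kernel implementation):

```c
/* Model of the token states added by the patch. */
enum token_state {
	TOKEN_FREE,
	TOKEN_ALLOCATED,
	TOKEN_DISPATCHED,
	TOKEN_ABANDONED,
	TOKEN_COMPLETED,
};

/* Returns 0 if the token becomes free immediately, 1 if the release
 * must be deferred until OPAL responds (the kernel's rc convention). */
int release_token(enum token_state *state)
{
	switch (*state) {
	case TOKEN_ALLOCATED:
	case TOKEN_COMPLETED:
		*state = TOKEN_FREE;
		return 0;
	case TOKEN_DISPATCHED:
		/* no one cares any more; the response handler frees it */
		*state = TOKEN_ABANDONED;
		return 1;
	default:
		return 1;
	}
}

/* Called when the OPAL response arrives: abandoned tokens are freed
 * here, otherwise the token is marked completed for its waiter. */
void handle_response(enum token_state *state)
{
	if (*state == TOKEN_ABANDONED)
		*state = TOKEN_FREE;
	else
		*state = TOKEN_COMPLETED;
}
```

This is why an interrupted caller can safely release its token early: the buffer of state OPAL still owns is parked in ABANDONED rather than recycled.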
[PATCH v3 06/10] powerpc/opal: Rework the opal-async interface
Future work will add an opal_async_wait_response_interruptible()
which will call wait_event_interruptible(). This work requires extra
token state to be tracked as wait_event_interruptible() can return and
the caller could release the token before OPAL responds.

Currently token state is tracked with two bitfields which are 64 bits
big but may not need to be as OPAL informs Linux how many async tokens
there are. It also uses an array indexed by token to store response
messages for each token.

The bitfields make it difficult to add more state and also provide a
hard maximum as to how many tokens there can be - it is possible that
OPAL will inform Linux that there are more than 64 tokens.

Rather than add a bitfield to track the extra state, rework the
internals slightly.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 arch/powerpc/platforms/powernv/opal-async.c | 97 ++++++++++++++----------
 1 file changed, 53 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-async.c b/arch/powerpc/platforms/powernv/opal-async.c
index 1d56ac9da347..d692372a0363 100644
--- a/arch/powerpc/platforms/powernv/opal-async.c
+++ b/arch/powerpc/platforms/powernv/opal-async.c
@@ -1,7 +1,7 @@
 /*
  * PowerNV OPAL asynchronous completion interfaces
  *
- * Copyright 2013 IBM Corp.
+ * Copyright 2013-2017 IBM Corp.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -23,40 +23,46 @@
 #include
 #include
 
-#define N_ASYNC_COMPLETIONS	64
+enum opal_async_token_state {
+	ASYNC_TOKEN_FREE,
+	ASYNC_TOKEN_ALLOCATED,
+	ASYNC_TOKEN_COMPLETED
+};
+
+struct opal_async_token {
+	enum opal_async_token_state state;
+	struct opal_msg response;
+};
 
-static DECLARE_BITMAP(opal_async_complete_map, N_ASYNC_COMPLETIONS) = {~0UL};
-static DECLARE_BITMAP(opal_async_token_map, N_ASYNC_COMPLETIONS);
 static DECLARE_WAIT_QUEUE_HEAD(opal_async_wait);
 static DEFINE_SPINLOCK(opal_async_comp_lock);
 static struct semaphore opal_async_sem;
-static struct opal_msg *opal_async_responses;
 static unsigned int opal_max_async_tokens;
+static struct opal_async_token *opal_async_tokens;
 
 static int __opal_async_get_token(void)
 {
 	unsigned long flags;
 	int token;
 
-	spin_lock_irqsave(&opal_async_comp_lock, flags);
-	token = find_first_bit(opal_async_complete_map, opal_max_async_tokens);
-	if (token >= opal_max_async_tokens) {
-		token = -EBUSY;
-		goto out;
-	}
-
-	if (__test_and_set_bit(token, opal_async_token_map)) {
-		token = -EBUSY;
-		goto out;
+	for (token = 0; token < opal_max_async_tokens; token++) {
+		spin_lock_irqsave(&opal_async_comp_lock, flags);
+		if (opal_async_tokens[token].state == ASYNC_TOKEN_FREE) {
+			opal_async_tokens[token].state = ASYNC_TOKEN_ALLOCATED;
+			spin_unlock_irqrestore(&opal_async_comp_lock, flags);
+			return token;
+		}
+		spin_unlock_irqrestore(&opal_async_comp_lock, flags);
 	}
 
-	__clear_bit(token, opal_async_complete_map);
-
-out:
-	spin_unlock_irqrestore(&opal_async_comp_lock, flags);
-	return token;
+	return -EBUSY;
 }
 
+/*
+ * Note: If the returned token is used in an opal call and opal returns
+ * OPAL_ASYNC_COMPLETION you MUST call opal_async_wait_response() before
+ * calling another other opal_async_* function
+ */
 int opal_async_get_token_interruptible(void)
 {
 	int token;
@@ -76,6 +82,7 @@ EXPORT_SYMBOL_GPL(opal_async_get_token_interruptible);
 static int __opal_async_release_token(int token)
 {
 	unsigned long flags;
+	int rc;
 
 	if (token < 0 || token >= opal_max_async_tokens) {
 		pr_err("%s: Passed token is out of range, token %d\n",
@@ -84,11 +91,18 @@ static int __opal_async_release_token(int token)
 	}
 
 	spin_lock_irqsave(&opal_async_comp_lock, flags);
-	__set_bit(token, opal_async_complete_map);
-	__clear_bit(token, opal_async_token_map);
+	switch (opal_async_tokens[token].state) {
+	case ASYNC_TOKEN_COMPLETED:
+	case ASYNC_TOKEN_ALLOCATED:
+		opal_async_tokens[token].state = ASYNC_TOKEN_FREE;
+		rc = 0;
+		break;
+	default:
+		rc = 1;
+	}
 	spin_unlock_irqrestore(&opal_async_comp_lock, flags);
 
-	return 0;
+	return rc;
 }
 
 int opal_async_release_token(int token)
@@ -96,12 +110,10 @@ int opal_async_release_token(int token)
 	int ret;
 
 	ret = __opal_async_release_token(token);
-	if (ret)
-		return ret;
-
-	up(&opal_async_sem);
+	if (!ret)
+		up(&opal_async_sem);
 
-	return 0;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(opal_async_release_token);
 
@@ -122,13 +134,15 @@ int opal_async_wait_response(uint
[PATCH v3 10/10] mtd: powernv_flash: Use opal_async_wait_response_interruptible()
The OPAL calls performed in this driver shouldn't be using
opal_async_wait_response(), as this performs a wait_event() which, on
long running OPAL calls, could result in hung task warnings.
wait_event() also prevents timely signal delivery, which is
undesirable.

This patch also attempts to quieten down the use of dev_err() when
errors haven't actually occurred, and to return better information up
the stack rather than always -EIO.

Signed-off-by: Cyril Bur <cyril...@gmail.com>
---
 drivers/mtd/devices/powernv_flash.c | 28 +++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c
index d7243b72ba6e..cfa274ba7e40 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -90,16 +90,34 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op,
 		goto out_success;
 
 	if (rc != OPAL_ASYNC_COMPLETION) {
-		dev_err(dev, "opal_flash_async_op(op=%d) failed (rc %d)\n",
+		if (rc != OPAL_BUSY)
+			dev_err(dev, "opal_flash_async_op(op=%d) failed (rc %d)\n",
 				op, rc);
-		rc = -EIO;
+		rc = opal_error_code(rc);
 		goto out;
 	}
 
-	rc = opal_async_wait_response(token, &msg);
+	rc = opal_async_wait_response_interruptible(token, &msg);
 	if (rc) {
-		dev_err(dev, "opal async wait failed (rc %d)\n", rc);
-		rc = -EIO;
+		/*
+		 * Awkward, we've been interrupted but we cannot return. If we
+		 * do return the mtd core will free the buffer we've just
+		 * passed to OPAL but OPAL will continue to read or write from
+		 * that memory.
+		 * Future work will introduce a call to tell OPAL to stop
+		 * using the buffer.
+		 * It may be tempting to ultimately return 0 if we're doing a
+		 * read or a write since we are going to end up waiting until
+		 * OPAL is done. However, because the MTD core sends us the
+		 * userspace request in chunks, we must report EINTR so that
+		 * it doesn't just send us the next chunk, thus defeating the
+		 * point of the _interruptible wait.
+		 */
+		rc = -EINTR;
+		if (op == FLASH_OP_READ || op == FLASH_OP_WRITE) {
+			if (opal_async_wait_response(token, &msg))
+				dev_err(dev, "opal async wait failed (rc %d)\n", rc);
+		}
 		goto out;
 	}
-- 
2.13.2
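The comment's point about EINTR can be made concrete: the MTD core submits a userspace request in chunks and stops at the first failing chunk, so a driver that swallowed the interruption and returned 0 would simply be handed the next chunk. A toy model with invented names (-4 standing in for -EINTR):

```c
/* Toy model of the MTD core's chunked submission loop: it stops as
 * soon as one chunk fails, which is why the driver must report -EINTR
 * rather than pretend the interrupted chunk succeeded. */
int copy_in_chunks(int nchunks, int interrupted_at)
{
	int i;

	for (i = 0; i < nchunks; i++) {
		/* the driver reports -EINTR on the chunk where the signal lands */
		int rc = (i >= interrupted_at) ? -4 : 0;

		if (rc != 0)
			break;	/* the core stops submitting further chunks */
	}

	return i;	/* number of chunks completed */
}
```

If the driver returned 0 instead, the loop above would never break and the "interrupted" request would run to completion anyway, defeating the interruptible wait.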