Re: [PATCH v1] powerpc: Include running function as first entry in save_stack_trace() and friends

2021-03-04 Thread Christophe Leroy




On 04/03/2021 at 20:24, Segher Boessenkool wrote:

On Thu, Mar 04, 2021 at 09:54:44AM -0800, Nick Desaulniers wrote:

On Thu, Mar 4, 2021 at 9:42 AM Marco Elver  wrote:
include/linux/compiler.h:246:
prevent_tail_call_optimization

commit a9a3ed1eff36 ("x86: Fix early boot crash on gcc-10, third try")


https://github.com/linuxppc/linux/commit/a9a3ed1eff36



That is much heavier than needed (an mb()).  You can just put an empty
inline asm after a call before a return, and that call cannot be
optimised to a sibling call: (the end of a function is an implicit
return:)

Instead of:

void g(void);
void f(int x)
{
	if (x)
		g();
}

Do:

void g(void);
void f(int x)
{
	if (x)
		g();
	asm("");
}

This costs no extra instructions, and certainly not something as heavy
as an mb()!  It works without the "if" as well, of course, but with it
it is a more interesting example of a tail call.


In the commit mentioned at the top, it is said:

The next attempt to prevent compilers from tail-call optimizing
the last function call cpu_startup_entry(), ... , was to add an empty asm("").

This current solution was short and sweet, and reportedly, is supported
by both compilers but we didn't get very far this time: future (LTO?)
optimization passes could potentially eliminate this, which leads us
to the third attempt: having an actual memory barrier there which the
compiler cannot ignore or move around etc.
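
For reference, the helper that commit ended up adding is a one-liner built on
mb(); a sketch from memory — check include/linux/compiler.h for the exact form:

	#define prevent_tail_call_optimization()	mb()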

Christophe


Re: linux-next: manual merge of the tty tree with the powerpc-fixes tree

2021-03-04 Thread Greg KH
On Fri, Mar 05, 2021 at 12:05:23PM +1100, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the tty tree got a conflict in:
> 
>   drivers/tty/hvc/hvcs.c
> 
> between commit:
> 
>   386a966f5ce7 ("vio: make remove callback return void")
> 
> from the powerpc-fixes tree and commit:
> 
>   fb8d350c291c ("tty: hvc, drop unneeded forward declarations")
> 
> from the tty tree.
> 
> I fixed it up (they both removed the forward declaration of
> hvcs_remove(), but the latter removed more) and can carry the fix as
> necessary. This is now fixed as far as linux-next is concerned, but any
> non trivial conflicts should be mentioned to your upstream maintainer
> when your tree is submitted for merging.  You may also want to consider
> cooperating with the maintainer of the conflicting tree to minimise any
> particularly complex conflicts.

Thanks, that's the correct fix, we knew this was going to happen :(

greg k-h


Re: [PATCH v2 07/37] KVM: PPC: Book3S 64: Move GUEST_MODE_SKIP test into KVM

2021-03-04 Thread Daniel Axtens
Hi Nick,

> - .if IKVM_SKIP
> -89:  mtocrf  0x80,r9
> - ld  r10,IAREA+EX_CTR(r13)
> - mtctr   r10
> - ld  r9,IAREA+EX_R9(r13)
> - ld  r10,IAREA+EX_R10(r13)
> - ld  r11,IAREA+EX_R11(r13)
> - ld  r12,IAREA+EX_R12(r13)
> - .if IHSRR_IF_HVMODE
> - BEGIN_FTR_SECTION
> - b   kvmppc_skip_Hinterrupt
> - FTR_SECTION_ELSE
> - b   kvmppc_skip_interrupt
> - ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)

I was initially concerned that you were pulling out the complexities
around IHSRR_IF_HVMODE, but I checked exceptions-64s.S and the only
exception handler that sets IHSRR_IF_HVMODE is hardware_interrupt and
that does not set IKVM_SKIP, so we are indeed fine to not keep this
complex little asm section.

> - .elseif IHSRR
> - b   kvmppc_skip_Hinterrupt
> - .else
> - b   kvmppc_skip_interrupt
> - .endif
> - .endif
>  .endm

> +/*
> + * KVM uses a trick where it is running in MSR[HV]=1 mode in real-mode with 
> the
> + * guest MMU context loaded, and it sets KVM_GUEST_MODE_SKIP and enables
> + * MSR[DR]=1 while leaving MSR[IR]=0, so it continues to fetch HV 
> instructions
> + * but loads and stores will access the guest context. This is used to load
> + * the faulting instruction without walking page tables.
> + *
> + * However the guest context may not be able to translate, or it may cause a
> + * machine check or other issue, which will result in a fault in the host
> + * (even with KVM-HV).
> + *
> + * These faults are caught here and if the fault was (or was likely) due to
> + * that load, then we just return with the PC advanced +4 and skip the load,
> + * which then goes via the slow path.
> + */

This is a really helpful comment! Thanks!

> +.Lmaybe_skip:
> + cmpwi   r12,BOOK3S_INTERRUPT_MACHINE_CHECK
> + beq 1f
> + cmpwi   r12,BOOK3S_INTERRUPT_DATA_STORAGE
> + beq 1f
> + cmpwi   r12,BOOK3S_INTERRUPT_DATA_SEGMENT
> + beq 1f
> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> + /* HSRR interrupts have 2 added to trap vector */
> + cmpwi   r12,BOOK3S_INTERRUPT_H_DATA_STORAGE | 0x2

This took me by surprise for a while, but I checked how it works in
exceptions-64s.S and indeed the GEN_KVM macro will add 0x2 to the IVEC
if IHSRR is set, and it is set for h_data_storage.

So this checks out to me.

I have checked, to the best of my limited assembler capabilities that
the logic before and after matches. It seems good to me.

On that limited basis,
Reviewed-by: Daniel Axtens 

Kind regards,
Daniel


> + beq 2f
> +#endif
> + b   .Lno_skip
> +1:   mfspr   r9,SPRN_SRR0
> +	addi	r9,r9,4
> + mtspr   SPRN_SRR0,r9
> + ld  r12,HSTATE_SCRATCH0(r13)
> + ld  r9,HSTATE_SCRATCH2(r13)
> + GET_SCRATCH0(r13)
> + RFI_TO_KERNEL
> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> +2:   mfspr   r9,SPRN_HSRR0
> +	addi	r9,r9,4
> + mtspr   SPRN_HSRR0,r9
> + ld  r12,HSTATE_SCRATCH0(r13)
> + ld  r9,HSTATE_SCRATCH2(r13)
> + GET_SCRATCH0(r13)
> + HRFI_TO_KERNEL
> +#endif
> -- 
> 2.23.0


Re: [PATCH] powerpc/perf: prevent mixed EBB and non-EBB events

2021-03-04 Thread Athira Rajeev



> On 24-Feb-2021, at 5:51 PM, Thadeu Lima de Souza Cascardo wrote:
> 
> EBB events must be under exclusive groups, so there is no mix of EBB and
> non-EBB events on the same PMU. This requirement worked fine as perf core
> would not allow other pinned events to be scheduled together with exclusive
> events.
> 
> This assumption was broken by commit 1908dc911792 ("perf: Tweak
> perf_event_attr::exclusive semantics").
> 
> After that, the test cpu_event_pinned_vs_ebb_test started succeeding after
> read_events, but worse, the task would not have been given access to PMC1, so
> when it tried to write to it, it was killed with "illegal instruction".
> 
> Preventing mixed EBB and non-EBB events from being added to the same PMU will
> just revert to the previous behavior and the test will succeed.


Hi,

Thanks for checking this. I checked your patch, which fixes "check_excludes"
to make sure all events agree on EBB. But the PMU group constraints already
have a check for EBB events, in arch/powerpc/perf/isa207-common.c
(the isa207_get_constraint function).

<<>>
mask  |= CNST_EBB_VAL(ebb);
value |= CNST_EBB_MASK;
<<>>

But the above settings for mask and value are interchanged; that is what
actually needs fixing.

The patch below should fix this:

diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index e4f577da33d8..8b5eeb6fb2fb 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -447,8 +447,8 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp,
 * EBB events are pinned & exclusive, so this should never actually
 * hit, but we leave it as a fallback in case.
 */
-   mask  |= CNST_EBB_VAL(ebb);
-   value |= CNST_EBB_MASK;
+   mask  |= CNST_EBB_MASK;
+   value |= CNST_EBB_VAL(ebb);
 
*maskp = mask;
*valp = value;


Can you please try with this patch?

Thanks
Athira


> 
> Fixes: 1908dc911792 (perf: Tweak perf_event_attr::exclusive semantics)
> Signed-off-by: Thadeu Lima de Souza Cascardo 
> ---
> arch/powerpc/perf/core-book3s.c | 20 
> 1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index 43599e671d38..d767f7944f85 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -1010,9 +1010,25 @@ static int check_excludes(struct perf_event **ctrs, unsigned int cflags[],
> int n_prev, int n_new)
> {
>   int eu = 0, ek = 0, eh = 0;
> + bool ebb = false;
>   int i, n, first;
>   struct perf_event *event;
> 
> + n = n_prev + n_new;
> + if (n <= 1)
> + return 0;
> +
> + first = 1;
> + for (i = 0; i < n; ++i) {
> + event = ctrs[i];
> + if (first) {
> + ebb = is_ebb_event(event);
> + first = 0;
> + } else if (is_ebb_event(event) != ebb) {
> + return -EAGAIN;
> + }
> + }
> +
>   /*
>* If the PMU we're on supports per event exclude settings then we
>* don't need to do any of this logic. NB. This assumes no PMU has both
> @@ -1021,10 +1037,6 @@ static int check_excludes(struct perf_event **ctrs, unsigned int cflags[],
>   if (ppmu->flags & PPMU_ARCH_207S)
>   return 0;
> 
> - n = n_prev + n_new;
> - if (n <= 1)
> - return 0;
> -
>   first = 1;
>   for (i = 0; i < n; ++i) {
>   if (cflags[i] & PPMU_LIMITED_PMC_OK) {
> -- 
> 2.27.0
> 



Re: [PATCH] ibmvnic: remove excessive irqsave

2021-03-04 Thread Christophe Leroy




On 05/03/2021 at 02:43, angkery wrote:

From: Junlin Yang 

ibmvnic_remove locks multiple spinlocks while disabling interrupts:
spin_lock_irqsave(&adapter->state_lock, flags);
spin_lock_irqsave(&adapter->rwi_lock, flags);

there is no need for the second irqsave, since interrupts are disabled
at that point, so remove the second irqsave:


The problem is not that there is no need. The problem is a lot more serious:
as reported by coccinelle, the second _irqsave() overwrites the value saved in 'flags' by the
first _irqsave(), so when the second _irqrestore() comes, the value in 'flags' is not valid;
the value saved by the first _irqsave() has been lost. This likely leads to IRQs remaining
disabled, which is _THE_ problem really.
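
A minimal sketch of the failure mode (illustrative only, not the driver code):

	static DEFINE_SPINLOCK(lock_a);
	static DEFINE_SPINLOCK(lock_b);

	static void broken(void)
	{
		unsigned long flags;

		spin_lock_irqsave(&lock_a, flags);	/* flags = IRQ state before locking */
		spin_lock_irqsave(&lock_b, flags);	/* overwrites flags: IRQs already off here */
		spin_unlock_irqrestore(&lock_b, flags);	/* restores "IRQs off": harmless here */
		spin_unlock_irqrestore(&lock_a, flags);	/* restores "IRQs off" again: the original state is lost */
	}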



spin_lock_irqsave(&adapter->state_lock, flags);
spin_lock(&adapter->rwi_lock);

Generated by: ./scripts/coccinelle/locks/flags.cocci
./drivers/net/ethernet/ibm/ibmvnic.c:5413:1-18:
ERROR: nested lock+irqsave that reuses flags from line 5404.

Signed-off-by: Junlin Yang 
---
  drivers/net/ethernet/ibm/ibmvnic.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 2464c8a..a52668d 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -5408,9 +5408,9 @@ static void ibmvnic_remove(struct vio_dev *dev)
 * after setting state, so __ibmvnic_reset() which is called
 * from the flush_work() below, can make progress.
 */
-   spin_lock_irqsave(&adapter->rwi_lock, flags);
+   spin_lock(&adapter->rwi_lock);
adapter->state = VNIC_REMOVING;
-   spin_unlock_irqrestore(&adapter->rwi_lock, flags);
+   spin_unlock(&adapter->rwi_lock);
  
  	spin_unlock_irqrestore(&adapter->state_lock, flags);
  



Re: [PATCH v2 06/37] KVM: PPC: Book3S 64: move KVM interrupt entry to a common entry point

2021-03-04 Thread Daniel Axtens
Hi Nick,

> Rather than bifurcate the call depending on whether or not HV is
> possible, and have the HV entry test for PR, just make a single
> common point which does the demultiplexing. This makes it simpler
> to add another type of exit handler.
>
> Reviewed-by: Fabiano Rosas 

I checked this against the RFC which I was happy with and there are no
changes that I am concerned about. The expanded comment is also nice.

As such:
Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/kernel/exceptions-64s.S|  8 +-
>  arch/powerpc/kvm/Makefile   |  3 +++
>  arch/powerpc/kvm/book3s_64_entry.S  | 35 +
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 11 ++--
>  4 files changed, 41 insertions(+), 16 deletions(-)
>  create mode 100644 arch/powerpc/kvm/book3s_64_entry.S
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 0097e0676ed7..ba13d749d203 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -208,7 +208,6 @@ do_define_int n
>  .endm
>  
>  #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
> -#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>  /*
>   * All interrupts which set HSRR registers, as well as SRESET and MCE and
>   * syscall when invoked with "sc 1" switch to MSR[HV]=1 (HVMODE) to be taken,
> @@ -238,13 +237,8 @@ do_define_int n
>  
>  /*
>   * If an interrupt is taken while a guest is running, it is immediately 
> routed
> - * to KVM to handle. If both HV and PR KVM are possible, KVM interrupts go first
> - * to kvmppc_interrupt_hv, which handles the PR guest case.
> + * to KVM to handle.
>   */
> -#define kvmppc_interrupt kvmppc_interrupt_hv
> -#else
> -#define kvmppc_interrupt kvmppc_interrupt_pr
> -#endif
>  
>  .macro KVMTEST name
>   lbz r10,HSTATE_IN_GUEST(r13)
> diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
> index 2bfeaa13befb..cdd119028f64 100644
> --- a/arch/powerpc/kvm/Makefile
> +++ b/arch/powerpc/kvm/Makefile
> @@ -59,6 +59,9 @@ kvm-pr-y := \
>  kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \
>   tm.o
>  
> +kvm-book3s_64-builtin-objs-y += \
> + book3s_64_entry.o
> +
>  ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
>  kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \
>   book3s_rmhandlers.o
> diff --git a/arch/powerpc/kvm/book3s_64_entry.S b/arch/powerpc/kvm/book3s_64_entry.S
> new file mode 100644
> index ..e9a6a8fbb164
> --- /dev/null
> +++ b/arch/powerpc/kvm/book3s_64_entry.S
> @@ -0,0 +1,35 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/*
> + * This is branched to from interrupt handlers in exceptions-64s.S which set
> + * IKVM_REAL or IKVM_VIRT, if HSTATE_IN_GUEST was found to be non-zero.
> + */
> +.global  kvmppc_interrupt
> +.balign IFETCH_ALIGN_BYTES
> +kvmppc_interrupt:
> + /*
> +  * Register contents:
> +  * R12  = (guest CR << 32) | interrupt vector
> +  * R13  = PACA
> +  * guest R12 saved in shadow VCPU SCRATCH0
> +  * guest R13 saved in SPRN_SCRATCH0
> +  */
> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> + std r9,HSTATE_SCRATCH2(r13)
> + lbz r9,HSTATE_IN_GUEST(r13)
> + cmpwi   r9,KVM_GUEST_MODE_HOST_HV
> + beq kvmppc_bad_host_intr
> +#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
> + cmpwi   r9,KVM_GUEST_MODE_GUEST
> + ld  r9,HSTATE_SCRATCH2(r13)
> + beq kvmppc_interrupt_pr
> +#endif
> + b   kvmppc_interrupt_hv
> +#else
> + b   kvmppc_interrupt_pr
> +#endif
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 5e634db4809b..f976efb7e4a9 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -1269,16 +1269,8 @@ kvmppc_interrupt_hv:
>* R13  = PACA
>* guest R12 saved in shadow VCPU SCRATCH0
>* guest R13 saved in SPRN_SCRATCH0
> +  * guest R9 saved in HSTATE_SCRATCH2
>*/
> - std r9, HSTATE_SCRATCH2(r13)
> - lbz r9, HSTATE_IN_GUEST(r13)
> - cmpwi   r9, KVM_GUEST_MODE_HOST_HV
> - beq kvmppc_bad_host_intr
> -#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
> - cmpwi   r9, KVM_GUEST_MODE_GUEST
> - ld  r9, HSTATE_SCRATCH2(r13)
> - beq kvmppc_interrupt_pr
> -#endif
>   /* We're now back in the host but in guest MMU context */
>   li  r9, KVM_GUEST_MODE_HOST_HV
>   stb r9, HSTATE_IN_GUEST(r13)
> @@ -3280,6 +3272,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_P9_TM_HV_ASSIST)
>   * cfar is saved in HSTATE_CFAR(r13)
>   * ppr is saved in HSTATE_PPR(r13)
>   */
> +.global kvmppc_bad_host_intr
>  kvmppc_bad_host_intr:
>   /*
>* Switch to the emergency stack, but start half-way down in
> -- 
> 2.23.0


Re: [RFC PATCH v1] powerpc: Enable KFENCE for PPC32

2021-03-04 Thread Michael Ellerman
Marco Elver  writes:
> On Thu, Mar 04, 2021 at 12:48PM +0100, Christophe Leroy wrote:
>> On 04/03/2021 at 12:31, Marco Elver wrote:
>> > On Thu, 4 Mar 2021 at 12:23, Christophe Leroy
>> >  wrote:
>> > > On 03/03/2021 at 11:56, Marco Elver wrote:
>> > > > 
>> > > > Somewhat tangentially, I also note that e.g. show_regs(regs) (which
>> > > > was printed along the KFENCE report above) didn't include the top
>> > > > frame in the "Call Trace", so this assumption is definitely not
>> > > > isolated to KFENCE.
>> > > > 
>> > > 
>> > > Now, I have tested PPC64 (with the patch I sent yesterday to modify 
>> > > save_stack_trace_regs()
>> > > applied), and I get many failures. Any idea ?
>> > > 
>> > > [   17.653751][   T58] 
>> > > ==
>> > > [   17.654379][   T58] BUG: KFENCE: invalid free in 
>> > > .kfence_guarded_free+0x2e4/0x530
>> > > [   17.654379][   T58]
>> > > [   17.654831][   T58] Invalid free of 0xc0003c9c (in 
>> > > kfence-#77):
>> > > [   17.655358][   T58]  .kfence_guarded_free+0x2e4/0x530
>> > > [   17.655775][   T58]  .__slab_free+0x320/0x5a0
>> > > [   17.656039][   T58]  .test_double_free+0xe0/0x198
>> > > [   17.656308][   T58]  .kunit_try_run_case+0x80/0x110
>> > > [   17.656523][   T58]  .kunit_generic_run_threadfn_adapter+0x38/0x50
>> > > [   17.657161][   T58]  .kthread+0x18c/0x1a0
>> > > [   17.659148][   T58]  .ret_from_kernel_thread+0x58/0x70
>> > > [   17.659869][   T58]
> [...]
>> > 
>> > Looks like something is prepending '.' to function names. We expect
>> > the function name to appear as-is, e.g. "kfence_guarded_free",
>> > "test_double_free", etc.
>> > 
>> > Is there something special on ppc64, where the '.' is some convention?
>> > 
>> 
>> I think so, see 
>> https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#FUNC-DES
>> 
>> Also see commit https://github.com/linuxppc/linux/commit/02424d896
>
> Thanks -- could you try the below patch? You'll need to define
> ARCH_FUNC_PREFIX accordingly.
>
> We think, since there are only very few architectures that add a prefix,
> requiring  to define something like ARCH_FUNC_PREFIX is
> the simplest option. Let me know if this works for you.
>
> There is an alternative option, which is to dynamically figure out the
> prefix, but if this simpler option is fine with you, we'd prefer it.

We have rediscovered this problem in basically every tracing / debugging
feature added in the last 20 years :)

I think the simplest solution is the one tools/perf/util/symbol.c uses,
which is to just skip a leading '.'.

Does that work?

diff --git a/mm/kfence/report.c b/mm/kfence/report.c
index ab83d5a59bb1..67b49dc54b38 100644
--- a/mm/kfence/report.c
+++ b/mm/kfence/report.c
@@ -67,6 +67,9 @@ static int get_stack_skipnr(const unsigned long stack_entries[], int num_entries
for (skipnr = 0; skipnr < num_entries; skipnr++) {
int len = scnprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skipnr]);
 
+   if (buf[0] == '.')
+   memmove(buf, buf + 1, strlen(buf));
+
if (str_has_prefix(buf, "kfence_") || str_has_prefix(buf, "__kfence_") ||
!strncmp(buf, "__slab_free", len)) {
/*



cheers


Re: [PATCH v2 05/37] KVM: PPC: Book3S HV: Ensure MSR[ME] is always set in guest MSR

2021-03-04 Thread Daniel Axtens
>> This 'if' is technically redundant but you mention a future patch warning
>> on !(msr & MSR_ME) so I'm holding off on any judgement about the 'if' until
>> I get to that patch :)
>
> That's true. The warning is actually further down when we're setting up 
> the msr to run in guest mode. At this point the MSR I think comes from
> qemu (and arguably the guest setup code shouldn't need to know about HV
> specific MSR bits) so a warning here wouldn't be appropriate.
>
> I could remove the if, although the compiler might already do that.

Yes, I think the compiler is almost certainly going to remove that.

I'd probably not include the 'if' statement even though it probably gets
removed by the compiler but I think that's more a matter of taste than
anything else.

Kind regards,
Daniel

>
>> 
>> The patch seems sane to me, I agree that we don't want guests running with
>> MSR_ME=0 and kvmppc_set_msr_hv already ensures that the transactional state 
>> is
>> sane so this is another sanity-enforcement in the same sort of vein.
>> 
>> All up:
>> Reviewed-by: Daniel Axtens 
>
> Thanks,
> Nick


Re: [PATCH] ibmvnic: remove excessive irqsave

2021-03-04 Thread Sukadev Bhattiprolu
angkery [angk...@163.com] wrote:
> From: Junlin Yang 
> 
> ibmvnic_remove locks multiple spinlocks while disabling interrupts:
> spin_lock_irqsave(&adapter->state_lock, flags);
> spin_lock_irqsave(&adapter->rwi_lock, flags);
> 
> there is no need for the second irqsave, since interrupts are disabled
> at that point, so remove the second irqsave:
> spin_lock_irqsave(&adapter->state_lock, flags);
> spin_lock(&adapter->rwi_lock);
> 
> Generated by: ./scripts/coccinelle/locks/flags.cocci
> ./drivers/net/ethernet/ibm/ibmvnic.c:5413:1-18:
> ERROR: nested lock+irqsave that reuses flags from line 5404.
> 

Thanks. Please add

Fixes: 4a41c421f367 ("ibmvnic: serialize access to work queue on remove")

> Signed-off-by: Junlin Yang 

Reviewed-by: Sukadev Bhattiprolu 


[PATCH V3] mm: Generalize HUGETLB_PAGE_SIZE_VARIABLE

2021-03-04 Thread Anshuman Khandual
HUGETLB_PAGE_SIZE_VARIABLE need not be defined by each individual
platform that subscribes to it. Instead, just make it generic.

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Andrew Morton 
Cc: Christoph Hellwig 
Cc: Christophe Leroy 
Cc: linux-i...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Cc: linux-ker...@vger.kernel.org
Suggested-by: Christoph Hellwig 
Signed-off-by: Anshuman Khandual 
---
This change was originally suggested in an earlier discussion. This
applies on v5.12-rc1 and has been build tested on all applicable
platforms, i.e. ia64 and powerpc.

https://patchwork.kernel.org/project/linux-mm/patch/1613024531-19040-3-git-send-email-anshuman.khand...@arm.com/

Changes in V3:

- Dropped the bool description that enabled user selection
- Dropped the dependency on HUGETLB_PAGE for HUGETLB_PAGE_SIZE_VARIABLE

Changes in V2:

https://patchwork.kernel.org/project/linux-mm/patch/1614661987-23881-1-git-send-email-anshuman.khand...@arm.com/

- Added a description for HUGETLB_PAGE_SIZE_VARIABLE
- Added HUGETLB_PAGE dependency while selecting HUGETLB_PAGE_SIZE_VARIABLE

Changes in V1:

https://patchwork.kernel.org/project/linux-mm/patch/1614577853-7452-1-git-send-email-anshuman.khand...@arm.com/

 arch/ia64/Kconfig| 6 +-
 arch/powerpc/Kconfig | 6 +-
 mm/Kconfig   | 7 +++
 3 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 2ad7a8d29fcc..dccf5bfebf48 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -32,6 +32,7 @@ config IA64
select TTY
select HAVE_ARCH_TRACEHOOK
select HAVE_VIRT_CPU_ACCOUNTING
+   select HUGETLB_PAGE_SIZE_VARIABLE if HUGETLB_PAGE
select VIRT_TO_BUS
select GENERIC_IRQ_PROBE
select GENERIC_PENDING_IRQ if SMP
@@ -82,11 +83,6 @@ config STACKTRACE_SUPPORT
 config GENERIC_LOCKBREAK
def_bool n
 
-config HUGETLB_PAGE_SIZE_VARIABLE
-   bool
-   depends on HUGETLB_PAGE
-   default y
-
 config GENERIC_CALIBRATE_DELAY
bool
default y
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 386ae12d8523..11fea95a1f2c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -231,6 +231,7 @@ config PPC
select HAVE_HARDLOCKUP_DETECTOR_PERF	if PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !HAVE_HARDLOCKUP_DETECTOR_ARCH
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
+   select HUGETLB_PAGE_SIZE_VARIABLE   if PPC_BOOK3S_64 && HUGETLB_PAGE
select MMU_GATHER_RCU_TABLE_FREE
select MMU_GATHER_PAGE_SIZE
select HAVE_REGS_AND_STACK_ACCESS_API
@@ -415,11 +416,6 @@ config HIGHMEM
 
 source "kernel/Kconfig.hz"
 
-config HUGETLB_PAGE_SIZE_VARIABLE
-   bool
-   depends on HUGETLB_PAGE && PPC_BOOK3S_64
-   default y
-
 config MATH_EMULATION
bool "Math emulation"
depends on 4xx || PPC_8xx || PPC_MPC832x || BOOKE
diff --git a/mm/Kconfig b/mm/Kconfig
index 24c045b24b95..4413a69e7850 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -274,6 +274,13 @@ config ARCH_ENABLE_HUGEPAGE_MIGRATION
 config ARCH_ENABLE_THP_MIGRATION
bool
 
+config HUGETLB_PAGE_SIZE_VARIABLE
+   def_bool n
+   help
+	  Allows the pageblock_order value to be dynamic instead of just
+	  standard HUGETLB_PAGE_ORDER when there are multiple HugeTLB page
+	  sizes available on a platform.
+
 config CONTIG_ALLOC
def_bool (MEMORY_ISOLATION && COMPACTION) || CMA
 
-- 
2.20.1



[PATCH] ibmvnic: remove excessive irqsave

2021-03-04 Thread angkery
From: Junlin Yang 

ibmvnic_remove locks multiple spinlocks while disabling interrupts:
spin_lock_irqsave(&adapter->state_lock, flags);
spin_lock_irqsave(&adapter->rwi_lock, flags);

there is no need for the second irqsave, since interrupts are disabled
at that point, so remove the second irqsave:
spin_lock_irqsave(&adapter->state_lock, flags);
spin_lock(&adapter->rwi_lock);

Generated by: ./scripts/coccinelle/locks/flags.cocci
./drivers/net/ethernet/ibm/ibmvnic.c:5413:1-18:
ERROR: nested lock+irqsave that reuses flags from line 5404.

Signed-off-by: Junlin Yang 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 2464c8a..a52668d 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -5408,9 +5408,9 @@ static void ibmvnic_remove(struct vio_dev *dev)
 * after setting state, so __ibmvnic_reset() which is called
 * from the flush_work() below, can make progress.
 */
-   spin_lock_irqsave(&adapter->rwi_lock, flags);
+   spin_lock(&adapter->rwi_lock);
adapter->state = VNIC_REMOVING;
-   spin_unlock_irqrestore(&adapter->rwi_lock, flags);
+   spin_unlock(&adapter->rwi_lock);
 
spin_unlock_irqrestore(&adapter->state_lock, flags);
 
-- 
1.9.1




linux-next: manual merge of the tty tree with the powerpc-fixes tree

2021-03-04 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the tty tree got a conflict in:

  drivers/tty/hvc/hvcs.c

between commit:

  386a966f5ce7 ("vio: make remove callback return void")

from the powerpc-fixes tree and commit:

  fb8d350c291c ("tty: hvc, drop unneeded forward declarations")

from the tty tree.

I fixed it up (they both removed the forward declaration of
hvcs_remove(), but the latter removed more) and can carry the fix as
necessary. This is now fixed as far as linux-next is concerned, but any
non trivial conflicts should be mentioned to your upstream maintainer
when your tree is submitted for merging.  You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell




Re: [Freedreno] [PATCH 16/17] iommu: remove DOMAIN_ATTR_IO_PGTABLE_CFG

2021-03-04 Thread Rob Clark
On Thu, Mar 4, 2021 at 7:48 AM Robin Murphy  wrote:
>
> On 2021-03-01 08:42, Christoph Hellwig wrote:
> > Signed-off-by: Christoph Hellwig 
>
> Moreso than the previous patch, where the feature is at least relatively
> generic (note that there's a bunch of in-flight development around
> DOMAIN_ATTR_NESTING), I'm really not convinced that it's beneficial to
> bloat the generic iommu_ops structure with private driver-specific
> interfaces. The attribute interface is a great compromise for these
> kinds of things, and you can easily add type-checked wrappers around it
> for external callers (maybe even make the actual attributes internal
> between the IOMMU core and drivers) if that's your concern.

I suppose if this is *just* for the GPU we could move it into adreno_smmu_priv.

But one thing I'm not sure about is whether
IO_PGTABLE_QUIRK_ARM_OUTER_WBWA is something that other devices
*should* be using as well, but just haven't gotten around to yet.

BR,
-R

> Robin.
>
> > ---
> >   drivers/gpu/drm/msm/adreno/adreno_gpu.c |  2 +-
> >   drivers/iommu/arm/arm-smmu/arm-smmu.c   | 40 +++--
> >   drivers/iommu/iommu.c   |  9 ++
> >   include/linux/iommu.h   |  9 +-
> >   4 files changed, 29 insertions(+), 31 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > index 0f184c3dd9d9ec..78d98ab2ee3a68 100644
> > --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > @@ -191,7 +191,7 @@ void adreno_set_llc_attributes(struct iommu_domain *iommu)
> >   struct io_pgtable_domain_attr pgtbl_cfg;
> >
> >   pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA;
> > - iommu_domain_set_attr(iommu, DOMAIN_ATTR_IO_PGTABLE_CFG, &pgtbl_cfg);
> > + iommu_domain_set_pgtable_attr(iommu, &pgtbl_cfg);
> >   }
> >
> >   struct msm_gem_address_space *
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > index 2e17d990d04481..2858999c86dfd1 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > @@ -1515,40 +1515,22 @@ static int arm_smmu_domain_enable_nesting(struct iommu_domain *domain)
> >   return ret;
> >   }
> >
> > -static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
> > - enum iommu_attr attr, void *data)
> > +static int arm_smmu_domain_set_pgtable_attr(struct iommu_domain *domain,
> > + struct io_pgtable_domain_attr *pgtbl_cfg)
> >   {
> > - int ret = 0;
> >   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> > + int ret = -EPERM;
> >
> > - mutex_lock(&smmu_domain->init_mutex);
> > -
> > - switch(domain->type) {
> > - case IOMMU_DOMAIN_UNMANAGED:
> > - switch (attr) {
> > - case DOMAIN_ATTR_IO_PGTABLE_CFG: {
> > - struct io_pgtable_domain_attr *pgtbl_cfg = data;
> > -
> > - if (smmu_domain->smmu) {
> > - ret = -EPERM;
> > - goto out_unlock;
> > - }
> > + if (domain->type != IOMMU_DOMAIN_UNMANAGED)
> > + return -EINVAL;
> >
> > - smmu_domain->pgtbl_cfg = *pgtbl_cfg;
> > - break;
> > - }
> > - default:
> > - ret = -ENODEV;
> > - }
> > - break;
> > - case IOMMU_DOMAIN_DMA:
> > - ret = -ENODEV;
> > - break;
> > - default:
> > - ret = -EINVAL;
> > + mutex_lock(&smmu_domain->init_mutex);
> > + if (!smmu_domain->smmu) {
> > + smmu_domain->pgtbl_cfg = *pgtbl_cfg;
> > + ret = 0;
> >   }
> > -out_unlock:
> >   mutex_unlock(&smmu_domain->init_mutex);
> > +
> >   return ret;
> >   }
> >
> > @@ -1609,7 +1591,7 @@ static struct iommu_ops arm_smmu_ops = {
> >   .device_group   = arm_smmu_device_group,
> >   .dma_use_flush_queue= arm_smmu_dma_use_flush_queue,
> >   .dma_enable_flush_queue = arm_smmu_dma_enable_flush_queue,
> > - .domain_set_attr= arm_smmu_domain_set_attr,
> > + .domain_set_pgtable_attr = arm_smmu_domain_set_pgtable_attr,
> >   .domain_enable_nesting  = arm_smmu_domain_enable_nesting,
> >   .of_xlate   = arm_smmu_of_xlate,
> >   .get_resv_regions   = arm_smmu_get_resv_regions,
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 2e9e058501a953..8490aefd4b41f8 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -2693,6 +2693,15 @@ int iommu_domain_enable_nesting(struct iommu_domain *domain)
> >   }
> >   EXPORT_SYMBOL_GPL(iommu_domain_enable_nesting);
> >
> > +int iommu_domain_set_pgtable_attr(struct iommu_domain *domain,
> > + struct io_pgtable_domain_attr *pgtbl_

Re: [PATCH v1] powerpc: Include running function as first entry in save_stack_trace() and friends

2021-03-04 Thread Marco Elver
On Thu, 4 Mar 2021 at 19:51, Mark Rutland  wrote:
> On Thu, Mar 04, 2021 at 07:22:53PM +0100, Marco Elver wrote:
> > On Thu, 4 Mar 2021 at 19:02, Mark Rutland  wrote:
> > > On Thu, Mar 04, 2021 at 06:25:33PM +0100, Marco Elver wrote:
> > > > On Thu, Mar 04, 2021 at 04:59PM +, Mark Rutland wrote:
> > > > > On Thu, Mar 04, 2021 at 04:30:34PM +0100, Marco Elver wrote:
> > > > > > On Thu, 4 Mar 2021 at 15:57, Mark Rutland  
> > > > > > wrote:
> > > > > > > [adding Mark Brown]
> > > > > > >
> > > > > > > The bigger problem here is that skipping is dodgy to begin with, 
> > > > > > > and
> > > > > > > this is still liable to break in some cases. One big concern is 
> > > > > > > that
> > > > > > > (especially with LTO) we cannot guarantee the compiler will not 
> > > > > > > inline
> > > > > > > or outline functions, causing the skip value to be too large or 
> > > > > > > too
> > > > > > > small. That's liable to happen to callers, and in theory (though
> > > > > > > unlikely in practice), portions of arch_stack_walk() or
> > > > > > > stack_trace_save() could get outlined too.
> > > > > > >
> > > > > > > Unless we can get some strong guarantees from compiler folk such 
> > > > > > > that we
> > > > > > > can guarantee a specific function acts as a boundary for unwinding (and
> > > > > > > doesn't itself get split, etc), the only reliable way I can think 
> > > > > > > to
> > > > > > > solve this requires an assembly trampoline. Whatever we do is 
> > > > > > > liable to
> > > > > > > need some invasive rework.
> > > > > >
> > > > > > Will LTO and friends respect 'noinline'?
> > > > >
> > > > > I hope so (and suspect we'd have more problems otherwise), but I don't
> > > > > know whether they actually do so.
> > > > >
> > > > > I suspect even with 'noinline' the compiler is permitted to outline
> > > > > portions of a function if it wanted to (and IIUC it could still make
> > > > > specialized copies in the absence of 'noclone').
> > > > >
> > > > > > One thing I also noticed is that tail calls would also cause the 
> > > > > > stack
> > > > > > trace to appear somewhat incomplete (for some of my tests I've
> > > > > > disabled tail call optimizations).
> > > > >
> > > > > I assume you mean for a chain A->B->C where B tail-calls C, you get a
> > > > > trace A->C? ... or is A going missing too?
> > > >
> > > > Correct, it's just the A->C outcome.
> > >
> > > I'd assumed that those cases were benign, e.g. for livepatching what
> > > matters is what can be returned to, so B disappearing from the trace
> > > isn't a problem there.
> > >
> > > Is the concern debugability, or is there a functional issue you have in
> > > mind?
> >
> > For me, it's just been debuggability, and reliable test cases.
> >
> > > > > > Is there a way to also mark a function non-tail-callable?
> > > > >
> > > > > I think this can be bodged using __attribute__((optimize("$OPTIONS")))
> > > > > on a caller to inhibit TCO (though IIRC GCC doesn't reliably support
> > > > > function-local optimization options), but I don't expect there's any 
> > > > > way
> > > > > to mark a callee as not being tail-callable.
> > > >
> > > > I don't think this is reliable. It'd be
> > > > __attribute__((optimize("-fno-optimize-sibling-calls"))), but doesn't
> > > > work if applied to the function we do not want to tail-call-optimize,
> > > > but would have to be applied to the function that does the tail-calling.
> > >
> > > Yup; that's what I meant when I said you could do that on the caller but
> > > not the callee.
> > >
> > > I don't follow why you'd want to put this on the callee, though, so I
> > > think I'm missing something. Considering a set of functions in different
> > > compilation units:
> > >
> > >   A->B->C->D->E->F->G->H->I->J->K
> >
> > I was having this problem with KCSAN, where the compiler would
> > tail-call-optimize __tsan_X instrumentation.
>
> Those are compiler-generated calls, right? When those are generated the
> compilation unit (and whatever it has included) might not have provided
> a prototype anyway, and the compiler has special knowledge of the
> functions, so it feels like the compiler would need to inhibit TCO here
> for this to be robust. For their intended usage subjecting them to TCO
> doesn't seem to make sense AFAICT.
>
> I suspect that compilers have some way of handling that; otherwise I'd
> expect to have heard stories of mcount/fentry calls getting TCO'd and
> causing problems. So maybe there's an easy fix there?

I agree, the compiler builtins should be handled by the compiler
directly, perhaps that was a bad example. But we also have "explicit
instrumentation", e.g. everything that's in .

> > This would mean that KCSAN runtime functions ended up in the trace,
> > but the function where the access happened would not. However, I don't
> > care about the runtime functions, and instead want to see the function
> > where the access happened. In that case, I'd like to just mark
> > __tsan_X and any other kcsan instrumentation functions as
> > do-not-tail-call-optimize, which would solve the problem.

Re: [PATCH 2/5] CMDLINE: drivers: of: ifdef out cmdline section

2021-03-04 Thread Rob Herring
On Thu, Mar 4, 2021 at 3:24 PM Daniel Walker  wrote:
>
> On Thu, Mar 04, 2021 at 08:09:52AM +0100, Christophe Leroy wrote:
> >
> >
> > On 04/03/2021 at 05:47, Daniel Walker wrote:
> > > It looks like there's some seepage of cmdline stuff into
> > > the generic device tree code. This conflicts with the
> > > generic cmdline implementation so I remove it in the case
> > > when that's enabled.
> > >
> > > Cc: xe-linux-exter...@cisco.com
> > > Signed-off-by: Ruslan Ruslichenko 
> > > Signed-off-by: Daniel Walker 
> > > ---
> > >   drivers/of/fdt.c | 12 
> > >   1 file changed, 12 insertions(+)
> > >
> > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > index feb0f2d67fc5..cfe4f8d3c9f5 100644
> > > --- a/drivers/of/fdt.c
> > > +++ b/drivers/of/fdt.c
> > > @@ -25,6 +25,7 @@
> > >   #include 
> > >   #include 
> > >   #include 
> > > +#include 
> > >   #include   /* for COMMAND_LINE_SIZE */
> > >   #include 
> > > @@ -1048,8 +1049,18 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
> > > early_init_dt_check_for_initrd(node);
> > > +#ifdef CONFIG_GENERIC_CMDLINE
> > > /* Retrieve command line */
> > > p = of_get_flat_dt_prop(node, "bootargs", &l);
> > > +
> > > +   /*
> > > +* The builtin command line will be added here, or it can override
> > > +* with the DT bootargs.
> > > +*/
> > > +   cmdline_add_builtin(data,
> > > +   ((p != NULL && l > 0) ? p : NULL), /* This is sanity checking */
> >
> > Can we make this simpler? If p is NULL, p is already NULL, so (l > 0 ? p :
> > NULL) should be enough.
>
>
> I believe Rob gave me this line. Maybe he can comment on it.

It's an obvious improvement and LGTM.


Re: [PATCH v1] powerpc: Include running function as first entry in save_stack_trace() and friends

2021-03-04 Thread Marco Elver
On Thu, 4 Mar 2021 at 19:02, Mark Rutland  wrote:
> On Thu, Mar 04, 2021 at 06:25:33PM +0100, Marco Elver wrote:
> > On Thu, Mar 04, 2021 at 04:59PM +, Mark Rutland wrote:
> > > On Thu, Mar 04, 2021 at 04:30:34PM +0100, Marco Elver wrote:
> > > > On Thu, 4 Mar 2021 at 15:57, Mark Rutland  wrote:
> > > > > [adding Mark Brown]
> > > > >
> > > > > The bigger problem here is that skipping is dodgy to begin with, and
> > > > > this is still liable to break in some cases. One big concern is that
> > > > > (especially with LTO) we cannot guarantee the compiler will not inline
> > > > > or outline functions, causing the skip value to be too large or too
> > > > > small. That's liable to happen to callers, and in theory (though
> > > > > unlikely in practice), portions of arch_stack_walk() or
> > > > > stack_trace_save() could get outlined too.
> > > > >
> > > > > Unless we can get some strong guarantees from compiler folk such that 
> > > > > we
> > > > > can guarantee a specific function acts as a boundary for unwinding (and
> > > > > doesn't itself get split, etc), the only reliable way I can think to
> > > > > solve this requires an assembly trampoline. Whatever we do is liable 
> > > > > to
> > > > > need some invasive rework.
> > > >
> > > > Will LTO and friends respect 'noinline'?
> > >
> > > I hope so (and suspect we'd have more problems otherwise), but I don't
> > > know whether they actually do so.
> > >
> > > I suspect even with 'noinline' the compiler is permitted to outline
> > > portions of a function if it wanted to (and IIUC it could still make
> > > specialized copies in the absence of 'noclone').
> > >
> > > > One thing I also noticed is that tail calls would also cause the stack
> > > > trace to appear somewhat incomplete (for some of my tests I've
> > > > disabled tail call optimizations).
> > >
> > > I assume you mean for a chain A->B->C where B tail-calls C, you get a
> > > trace A->C? ... or is A going missing too?
> >
> > Correct, it's just the A->C outcome.
>
> I'd assumed that those cases were benign, e.g. for livepatching what
> matters is what can be returned to, so B disappearing from the trace
> isn't a problem there.
>
> Is the concern debugability, or is there a functional issue you have in
> mind?

For me, it's just been debuggability, and reliable test cases.

> > > > Is there a way to also mark a function non-tail-callable?
> > >
> > > I think this can be bodged using __attribute__((optimize("$OPTIONS")))
> > > on a caller to inhibit TCO (though IIRC GCC doesn't reliably support
> > > function-local optimization options), but I don't expect there's any way
> > > to mark a callee as not being tail-callable.
> >
> > I don't think this is reliable. It'd be
> > __attribute__((optimize("-fno-optimize-sibling-calls"))), but doesn't
> > work if applied to the function we do not want to tail-call-optimize,
> > but would have to be applied to the function that does the tail-calling.
>
> Yup; that's what I meant when I said you could do that on the caller but
> not the callee.
>
> I don't follow why you'd want to put this on the callee, though, so I
> think I'm missing something. Considering a set of functions in different
> compilation units:
>
>   A->B->C->D->E->F->G->H->I->J->K

I was having this problem with KCSAN, where the compiler would
tail-call-optimize __tsan_X instrumentation. This would mean that
KCSAN runtime functions ended up in the trace, but the function where
the access happened would not. However, I don't care about the runtime
functions, and instead want to see the function where the access
happened. In that case, I'd like to just mark __tsan_X and any other
kcsan instrumentation functions as do-not-tail-call-optimize, which
would solve the problem.

The solution today is that when you compile a kernel with KCSAN, every
instrumented TU is compiled with -fno-optimize-sibling-calls. The
better solution would be to just mark KCSAN runtime functions somehow,
but permit tail calling other things. Although, I probably still want
to see the full trace, and would decide that having
-fno-optimize-sibling-calls is a small price to pay in a
debug-only-kernel to get complete traces.

> ... if K were marked in this way, and J was compiled with visibility of
> this, J would stick around, but J's callers might not, and so a
> trace might see:
>
>   A->J->K
>
> ... do you just care about the final caller, i.e. you just need
> certainty that J will be in the trace?

Yes. But maybe it's a special problem that only sanitizers have.

> If so, we can somewhat bodge that by having K have an __always_inline
> wrapper which has a barrier() or similar after the real call to K, so
> the call couldn't be TCO'd.
>
> Otherwise I'd expect we'd probably need to disable TCO generally.
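
A sketch of that bodge (illustrative names; k_real() stands in for the real K):

	void k_real(void);

	static __always_inline void k(void)
	{
		k_real();
		barrier();	/* the call above can no longer be a tail call */
	}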

Thanks,
-- Marco


Re: [PATCH v1] powerpc: Include running function as first entry in save_stack_trace() and friends

2021-03-04 Thread Segher Boessenkool
Hi!

On Thu, Mar 04, 2021 at 02:57:30PM +, Mark Rutland wrote:
> It looks like GCC is happy to give us the function-entry-time FP if we use
> __builtin_frame_address(1),

From the GCC manual:
 Calling this function with a nonzero argument can have
 unpredictable effects, including crashing the calling program.  As
 a result, calls that are considered unsafe are diagnosed when the
 '-Wframe-address' option is in effect.  Such calls should only be
 made in debugging situations.

It *does* warn (the warning is in -Wall btw), on both powerpc and
aarch64.  Furthermore, using this builtin causes lousy code (it forces
the use of a frame pointer, which we normally try very hard to optimise
away, for good reason).
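
For example, even something this small warns under -Wall and drags in a frame
pointer (illustrative only; a nonzero level can crash at run time):

	void *callers_frame(void)
	{
		return __builtin_frame_address(1);	/* -Wframe-address fires here */
	}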

And, that warning is not an idle warning.  Non-zero arguments to
__builtin_frame_address can crash the program.  It won't on simpler
functions, but there is no real definition of what a simpler function
*is*.  It is meant for debugging, not for production use (this is also
why no one has bothered to make it faster).

On Power it should work, but on pretty much any other arch it won't.

> Unless we can get some strong guarantees from compiler folk such that we
> can guarantee a specific function acts as a boundary for unwinding (and
> doesn't itself get split, etc), the only reliable way I can think to
> solve this requires an assembly trampoline. Whatever we do is liable to
> need some invasive rework.

You cannot get such a guarantee, other than not letting the compiler
see into the routine at all, like with assembler code (not inline asm,
real assembler code).

The real way forward is to bite the bullet and to no longer pretend you
can do a full backtrace from just the stack contents.  You cannot.


Segher


Re: [PATCH v1] powerpc: Include running function as first entry in save_stack_trace() and friends

2021-03-04 Thread Nick Desaulniers
On Thu, Mar 4, 2021 at 9:42 AM Marco Elver  wrote:
>
> On Thu, Mar 04, 2021 at 04:59PM +, Mark Rutland wrote:
> > On Thu, Mar 04, 2021 at 04:30:34PM +0100, Marco Elver wrote:
> > > On Thu, 4 Mar 2021 at 15:57, Mark Rutland  wrote:
> > > > [adding Mark Brown]
> > > >
> > > > The bigger problem here is that skipping is dodgy to begin with, and
> > > > this is still liable to break in some cases. One big concern is that
> > > > (especially with LTO) we cannot guarantee the compiler will not inline
> > > > or outline functions, causing the skip value to be too large or too
> > > > small. That's liable to happen to callers, and in theory (though
> > > > unlikely in practice), portions of arch_stack_walk() or
> > > > stack_trace_save() could get outlined too.
> > > >
> > > > Unless we can get some strong guarantees from compiler folk such that we
> > > > can guarantee a specific function acts as a boundary for unwinding (and
> > > > doesn't itself get split, etc), the only reliable way I can think to
> > > > solve this requires an assembly trampoline. Whatever we do is liable to
> > > > need some invasive rework.
> > >
> > > Will LTO and friends respect 'noinline'?
> >
> > I hope so (and suspect we'd have more problems otherwise), but I don't
> > know whether they actually do so.
> >
> > I suspect even with 'noinline' the compiler is permitted to outline
> > portions of a function if it wanted to (and IIUC it could still make
> > specialized copies in the absence of 'noclone').
> >
> > > One thing I also noticed is that tail calls would also cause the stack
> > > trace to appear somewhat incomplete (for some of my tests I've
> > > disabled tail call optimizations).
> >
> > I assume you mean for a chain A->B->C where B tail-calls C, you get a
> > trace A->C? ... or is A going missing too?
>
> Correct, it's just the A->C outcome.
>
> > > Is there a way to also mark a function non-tail-callable?
> >
> > I think this can be bodged using __attribute__((optimize("$OPTIONS")))
> > on a caller to inhibit TCO (though IIRC GCC doesn't reliably support
> > function-local optimization options), but I don't expect there's any way
> > to mark a callee as not being tail-callable.
>
> I don't think this is reliable. It'd be
> __attribute__((optimize("-fno-optimize-sibling-calls"))), but doesn't
> work if applied to the function we do not want to tail-call-optimize,
> but would have to be applied to the function that does the tail-calling.
> So it's a bit backwards, even if it worked.
>
> > According to the GCC documentation, GCC won't TCO noreturn functions, but
> > obviously that's not something we can use generally.
> >
> > https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes

include/linux/compiler.h:246:
prevent_tail_call_optimization

commit a9a3ed1eff36 ("x86: Fix early boot crash on gcc-10, third try")

>
> Perhaps we can ask the toolchain folks to help add such an attribute. Or
> maybe the feature already exists somewhere, but hidden.
>
> +Cc linux-toolcha...@vger.kernel.org
>
> > > But I'm also not sure if with all that we'd be guaranteed the code we
> > > want, even though in practice it might.
> >
> > True! I'd just like to be on the least dodgy ground we can be.
>
> It's been dodgy for a while, and I'd welcome any low-cost fixes to make
> it less dodgy in the short-term at least. :-)
>
> Thanks,
> -- Marco



-- 
Thanks,
~Nick Desaulniers


Re: [PATCH v1] powerpc: Include running function as first entry in save_stack_trace() and friends

2021-03-04 Thread Marco Elver
On Thu, Mar 04, 2021 at 04:59PM +, Mark Rutland wrote:
> On Thu, Mar 04, 2021 at 04:30:34PM +0100, Marco Elver wrote:
> > On Thu, 4 Mar 2021 at 15:57, Mark Rutland  wrote:
> > > [adding Mark Brown]
> > >
> > > The bigger problem here is that skipping is dodgy to begin with, and
> > > this is still liable to break in some cases. One big concern is that
> > > (especially with LTO) we cannot guarantee the compiler will not inline
> > > or outline functions, causing the skip value to be too large or too
> > > small. That's liable to happen to callers, and in theory (though
> > > unlikely in practice), portions of arch_stack_walk() or
> > > stack_trace_save() could get outlined too.
> > >
> > > Unless we can get some strong guarantees from compiler folk such that we
> > > can guarantee a specific function acts as a boundary for unwinding (and
> > > doesn't itself get split, etc), the only reliable way I can think to
> > > solve this requires an assembly trampoline. Whatever we do is liable to
> > > need some invasive rework.
> > 
> > Will LTO and friends respect 'noinline'?
> 
> I hope so (and suspect we'd have more problems otherwise), but I don't
> know whether they actually do so.
> 
> I suspect even with 'noinline' the compiler is permitted to outline
> portions of a function if it wanted to (and IIUC it could still make
> specialized copies in the absence of 'noclone').
> 
> > One thing I also noticed is that tail calls would also cause the stack
> > trace to appear somewhat incomplete (for some of my tests I've
> > disabled tail call optimizations).
> 
> I assume you mean for a chain A->B->C where B tail-calls C, you get a
> trace A->C? ... or is A going missing too?

Correct, it's just the A->C outcome.

> > Is there a way to also mark a function non-tail-callable?
> 
> I think this can be bodged using __attribute__((optimize("$OPTIONS")))
> on a caller to inhibit TCO (though IIRC GCC doesn't reliably support
> function-local optimization options), but I don't expect there's any way
> to mark a callee as not being tail-callable.

I don't think this is reliable. It'd be
__attribute__((optimize("-fno-optimize-sibling-calls"))), but doesn't
work if applied to the function we do not want to tail-call-optimize,
but would have to be applied to the function that does the tail-calling.
So it's a bit backwards, even if it worked.
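
A sketch of the call-site form that does work, modulo GCC's patchy support for
function-local options (names are illustrative):

	void callee(void);

	__attribute__((optimize("-fno-optimize-sibling-calls")))
	void caller(void)
	{
		callee();	/* no longer eligible for tail-call optimisation */
	}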

> According to the GCC documentation, GCC won't TCO noreturn functions, but
> obviously that's not something we can use generally.
> 
> https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes

Perhaps we can ask the toolchain folks to help add such an attribute. Or
maybe the feature already exists somewhere, but hidden.

+Cc linux-toolcha...@vger.kernel.org

> > But I'm also not sure if with all that we'd be guaranteed the code we
> > want, even though in practice it might.
> 
> True! I'd just like to be on the least dodgy ground we can be.

It's been dodgy for a while, and I'd welcome any low-cost fixes to make
it less dodgy in the short-term at least. :-)

Thanks,
-- Marco


Re: [PATCH 2/5] CMDLINE: drivers: of: ifdef out cmdline section

2021-03-04 Thread Daniel Walker
On Thu, Mar 04, 2021 at 08:09:52AM +0100, Christophe Leroy wrote:
> 
> 
> On 04/03/2021 at 05:47, Daniel Walker wrote:
> > It looks like there's some seepage of cmdline stuff into
> > the generic device tree code. This conflicts with the
> > generic cmdline implementation so I remove it in the case
> > when that's enabled.
> > 
> > Cc: xe-linux-exter...@cisco.com
> > Signed-off-by: Ruslan Ruslichenko 
> > Signed-off-by: Daniel Walker 
> > ---
> >   drivers/of/fdt.c | 12 
> >   1 file changed, 12 insertions(+)
> > 
> > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > index feb0f2d67fc5..cfe4f8d3c9f5 100644
> > --- a/drivers/of/fdt.c
> > +++ b/drivers/of/fdt.c
> > @@ -25,6 +25,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include   /* for COMMAND_LINE_SIZE */
> >   #include 
> > @@ -1048,8 +1049,18 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
> > early_init_dt_check_for_initrd(node);
> > +#ifdef CONFIG_GENERIC_CMDLINE
> > /* Retrieve command line */
> > p = of_get_flat_dt_prop(node, "bootargs", &l);
> > +
> > +   /*
> > +* The builtin command line will be added here, or it can override
> > +* with the DT bootargs.
> > +*/
> > +   cmdline_add_builtin(data,
> > +   ((p != NULL && l > 0) ? p : NULL), /* This is sanity checking */
> 
> Can we make this simpler? If p is NULL, p is already NULL, so (l > 0 ? p :
> NULL) should be enough.


I believe Rob gave me this line. Maybe he can comment on it.

Daniel


Re: [PATCH 1/5] CMDLINE: add generic builtin command line

2021-03-04 Thread Daniel Walker
On Thu, Mar 04, 2021 at 08:00:49AM +0100, Christophe Leroy wrote:
> 
> 
> On 04/03/2021 at 05:47, Daniel Walker wrote:
> > This code allows architectures to use a generic builtin command line.
> > The state of the builtin command line options across architectures is
> > diverse. On x86 and mips they have pretty much the same code and the
> > code prepends the builtin command line onto the boot loader provided
> > one. On powerpc there is only a builtin override and nothing else.
> 
> This is not exact. powerpc has:
> CONFIG_FROM_BOOTLOADER
> CONFIG_EXTEND
> CONFIG_FORCE
 
I don't currently have ppc64 to test on, but CONFIG_FROM_BOOTLOADER should
likely stay; the other two can come from the generic code.


> > 
> > The code in this commit unifies the code into a generic
> > header file under the CONFIG_GENERIC_CMDLINE option. When this
> > option is enabled the architecture can call the cmdline_add_builtin()
> > to add the builtin command line.
> > 
> > Cc: xe-linux-exter...@cisco.com
> > Signed-off-by: Ruslan Bilovol 
> > Signed-off-by: Daniel Walker 
> > ---
> >   include/linux/cmdline.h | 75 +
> >   init/Kconfig| 68 +
> >   2 files changed, 143 insertions(+)
> >   create mode 100644 include/linux/cmdline.h
> > 
> > diff --git a/include/linux/cmdline.h b/include/linux/cmdline.h
> > new file mode 100644
> > index ..f44011d1a9ee
> > --- /dev/null
> > +++ b/include/linux/cmdline.h
> > @@ -0,0 +1,75 @@
> 
> Missing the SPDX Licence Identifier
> 
> > +#ifndef _LINUX_CMDLINE_H
> > +#define _LINUX_CMDLINE_H
> > +
> > +/*
> > + *
> > + * Copyright (C) 2006,2021. Cisco Systems, Inc.
> > + *
> > + * Generic Append/Prepend cmdline support.
> > + */
> > +
> > +#if defined(CONFIG_GENERIC_CMDLINE) && defined(CONFIG_CMDLINE_BOOL)
> 
> I think it would be better if we can avoid the CONFIG_CMDLINE_BOOL.
> By making the CMDLINEs default to "" at all times, I think we can do away
> with that BOOL.

Wouldn't it be annoying if you had to delete all the characters from two text
boxes vs. just disabling a single option? What if you accidentally leave a
space? Whoops.

> > +
> > +#ifndef CONFIG_CMDLINE_OVERRIDE
> > +/*
> > + * This function will append or prepend a builtin command line to the 
> > command
> 
> As far as I understand, it doesn't "append _or_ prepend" but it does "append 
> _and_ prepend"

I think the end result is described accurately; no need to get pedantic.

> > + * line provided by the bootloader. Kconfig options can be used to alter
> > + * the behavior of this builtin command line.
> > + * @dest: The destination of the final appended/prepended string
> > + * @src: The starting string or NULL if there isn't one.
> > + * @tmp: temporary space used for prepending
> > + * @length: the maximum length of the strings above.
> 
> Missing some parameters here, but I think we should avoid those 'strlcpy'
> and 'strlcat', see later comment.
> 
> > + */
> > +static inline void
> > +__cmdline_add_builtin(char *dest, const char *src, char *tmp, unsigned 
> > long length,
> > +   size_t (*strlcpy)(char *dest, const char *src, size_t size),
> > +   size_t (*strlcat)(char *dest, const char *src, size_t count)
> 
> Don't use names that override names of existing functions.
> 
> 'count' is __kernel_size_t not size_t
 
It type-checks all the parameters at compile time; it hasn't complained about
this that I've seen.


> > +   )
> > +{
> > +   if (src != dest && src != NULL) {
> > +   strlcpy(dest, " ", length);
> 
> Why do you need a space up front in that case ? Why not just copy the source 
> to the destination ?

There may not be a space between them; it doesn't cost anything to have one.

> > +   strlcat(dest, src, length);
> > +   }
> > +
> > +   if (sizeof(CONFIG_CMDLINE_APPEND) > 1)
> > +   strlcat(dest, " " CONFIG_CMDLINE_APPEND, length);
> > +
> > +   if (sizeof(CONFIG_CMDLINE_PREPEND) > 1) {
> > +   strlcpy(tmp, CONFIG_CMDLINE_PREPEND " ", length);
> > +   strlcat(tmp, dest, length);
> > +   strlcpy(dest, tmp, length);
> 
> Could we use memmove(), or implement strmove() and avoid the temporary buffer 
> at all ?

I don't really want to make a drastic alteration like this unless there is a
better reason for it. Most of this hasn't changed inside Cisco's tree for almost
a decade.
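
For illustration, the memmove() variant being suggested could look roughly like
this — a hypothetical helper (name and bounds handling are assumptions,
untested), not part of the posted patch:

static inline void cmdline_prepend(char *dest, const char *prefix, size_t length)
{
	size_t plen = strlen(prefix);
	size_t dlen = strnlen(dest, length - 1);

	if (plen >= length)
		return;
	if (dlen > length - plen - 1)
		dlen = length - plen - 1;	/* truncate the tail to fit */

	memmove(dest + plen, dest, dlen);	/* shift the old string right */
	memcpy(dest, prefix, plen);		/* drop the prefix in front */
	dest[plen + dlen] = '\0';
}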

> > +   }
> > +}
> > +
> > +#define cmdline_add_builtin_custom(dest, src, length, label, strlcpy, 
> > strlcat) \
> 
> It is misleading to call parameters 'strlcpy' or 'strlcat', it hides that 
> they are overriden.

I can change the names, it's not a big deal.

> > +{  
> > \
> > +   if (sizeof(CONFIG_CMDLINE_PREPEND) > 1) {   
> > \
> > +   static label char cmdline_tmp_space[length];
> >

Re: [PATCH 2/5] CMDLINE: drivers: of: ifdef out cmdline section

2021-03-04 Thread Daniel Walker
On Thu, Mar 04, 2021 at 08:32:37AM -0600, Rob Herring wrote:
> On Wed, Mar 3, 2021 at 10:48 PM Daniel Walker  wrote:
> >
> > It looks like there's some seepage of cmdline stuff into
> > the generic device tree code. This conflicts with the
> > generic cmdline implementation so I remove it in the case
> > when that's enabled.
> >
> > Cc: xe-linux-exter...@cisco.com
> > Signed-off-by: Ruslan Ruslichenko 
> > Signed-off-by: Daniel Walker 
> > ---
> >  drivers/of/fdt.c | 12 
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > index feb0f2d67fc5..cfe4f8d3c9f5 100644
> > --- a/drivers/of/fdt.c
> > +++ b/drivers/of/fdt.c
> > @@ -25,6 +25,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #include   /* for COMMAND_LINE_SIZE */
> >  #include 
> > @@ -1048,8 +1049,18 @@ int __init early_init_dt_scan_chosen(unsigned long 
> > node, const char *uname,
> >
> > early_init_dt_check_for_initrd(node);
> >
> > +#ifdef CONFIG_GENERIC_CMDLINE
> 
> What I like about Christophe's version is it removes the old DT
> implementation. Who's going to convert the rest of the DT based
> arches? That's arm, arm64, hexagon, microblaze, nios2, openrisc,
> riscv, sh, and xtensa. Either separate the common code from the config
> like Christophe's version or these all need converting. Though it's
> fine to hash out patch 1 with a couple of arches first.
 
I'm limited in what I can test, so I can't know for sure that I have something
which works on those architectures. Even powerpc 64 is part of this series but
I can't really test it at this time. Also Cisco's needs outstrip the
implementation of extend or override.

I have un-tested conversions for arm32, arm64, c6x, microblaze, nios2, and
openrisc. These could go into -next and we can see who complains. The
implementation on these architectures isn't all uniform.

> > /* Retrieve command line */
> > p = of_get_flat_dt_prop(node, "bootargs", &l);
> 
> This needs to be outside the ifdef.

Ok ..
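
(For reference, Rob's point amounts to keeping the fetch unconditional — a
rough sketch of the adjusted hunk, untested:)

	/* Retrieve command line */
	p = of_get_flat_dt_prop(node, "bootargs", &l);

#ifdef CONFIG_GENERIC_CMDLINE
	/* generic builtin command line handling uses p from here */
#endif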

Daniel


Re: [PATCH v1] powerpc: Include running function as first entry in save_stack_trace() and friends

2021-03-04 Thread Marco Elver
On Thu, 4 Mar 2021 at 15:57, Mark Rutland  wrote:
> [adding Mark Brown]
>
> On Wed, Mar 03, 2021 at 04:20:43PM +0100, Marco Elver wrote:
> > On Wed, Mar 03, 2021 at 03:52PM +0100, Christophe Leroy wrote:
> > > Le 03/03/2021 à 15:38, Marco Elver a écrit :
> > > > On Wed, 3 Mar 2021 at 15:09, Christophe Leroy
> > > >  wrote:
> > > > >
> > > > > It seems like all other sane architectures, namely x86 and arm64
> > > > > at least, include the running function as top entry when saving
> > > > > stack trace.
> > > > >
> > > > > Functionalities like KFENCE expect it.
> > > > >
> > > > > Do the same on powerpc, it allows KFENCE to properly identify the 
> > > > > faulting
> > > > > function as depicted below. Before the patch KFENCE was identifying
> > > > > finish_task_switch.isra as the faulting function.
> > > > >
> > > > > [   14.937370] 
> > > > > ==
> > > > > [   14.948692] BUG: KFENCE: invalid read in 
> > > > > test_invalid_access+0x54/0x108
> > > > > [   14.948692]
> > > > > [   14.956814] Invalid read at 0xdf98800a:
> > > > > [   14.960664]  test_invalid_access+0x54/0x108
> > > > > [   14.964876]  finish_task_switch.isra.0+0x54/0x23c
> > > > > [   14.969606]  kunit_try_run_case+0x5c/0xd0
> > > > > [   14.973658]  kunit_generic_run_threadfn_adapter+0x24/0x30
> > > > > [   14.979079]  kthread+0x15c/0x174
> > > > > [   14.982342]  ret_from_kernel_thread+0x14/0x1c
> > > > > [   14.986731]
> > > > > [   14.988236] CPU: 0 PID: 111 Comm: kunit_try_catch Tainted: GB  
> > > > >5.12.0-rc1-01537-g95f6e2088d7e-dirty #4682
> > > > > [   14.999795] NIP:  c016ec2c LR: c02f517c CTR: c016ebd8
> > > > > [   15.004851] REGS: e2449d90 TRAP: 0301   Tainted: GB
> > > > >   (5.12.0-rc1-01537-g95f6e2088d7e-dirty)
> > > > > [   15.015274] MSR:  9032   CR: 2204  XER: 
> > > > > 
> > > > > [   15.022043] DAR: df98800a DSISR: 2000
> > > > > [   15.022043] GPR00: c02f517c e2449e50 c1142080 e100dd24 c084b13c 
> > > > > 0008 c084b32b c016ebd8
> > > > > [   15.022043] GPR08: c085 df988000 c0d1 e2449eb0 22000288
> > > > > [   15.040581] NIP [c016ec2c] test_invalid_access+0x54/0x108
> > > > > [   15.046010] LR [c02f517c] kunit_try_run_case+0x5c/0xd0
> > > > > [   15.051181] Call Trace:
> > > > > [   15.053637] [e2449e50] [c005a68c] 
> > > > > finish_task_switch.isra.0+0x54/0x23c (unreliable)
> > > > > [   15.061338] [e2449eb0] [c02f517c] kunit_try_run_case+0x5c/0xd0
> > > > > [   15.067215] [e2449ed0] [c02f648c] 
> > > > > kunit_generic_run_threadfn_adapter+0x24/0x30
> > > > > [   15.074472] [e2449ef0] [c004e7b0] kthread+0x15c/0x174
> > > > > [   15.079571] [e2449f30] [c001317c] ret_from_kernel_thread+0x14/0x1c
> > > > > [   15.085798] Instruction dump:
> > > > > [   15.088784] 8129d608 38e7ebd8 81020280 911f004c 3900 995f0024 
> > > > > 907f0028 90ff001c
> > > > > [   15.096613] 3949000a 915f0020 3d40c0d1 3d00c085 <8929000a> 
> > > > > 3908adb0 812a4b98 3d40c02f
> > > > > [   15.104612] 
> > > > > ==
> > > > >
> > > > > Signed-off-by: Christophe Leroy 
> > > >
> > > > Acked-by: Marco Elver 
> > > >
> > > > Thank you, I think this looks like the right solution. Just a question 
> > > > below:
> > > >
> > > ...
> > >
> > > > > @@ -59,23 +70,26 @@ void save_stack_trace(struct stack_trace *trace)
> > > > >
> > > > >  sp = current_stack_frame();
> > > > >
> > > > > -   save_context_stack(trace, sp, current, 1);
> > > > > +   save_context_stack(trace, sp, (unsigned 
> > > > > long)save_stack_trace, current, 1);
> > > >
> > > > This causes ip == save_stack_trace and also below for
> > > > save_stack_trace_tsk. Does this mean save_stack_trace() is included in
> > > > the trace? Looking at kernel/stacktrace.c, I think the library wants
> > > > to exclude itself from the trace, as it does '.skip = skipnr + 1' (and
> > > > '.skip   = skipnr + (current == tsk)' for the _tsk variant).
> > > >
> > > > If the arch-helper here is included, should this use _RET_IP_ instead?
> > > >
> > >
> > > Don't really know, I was inspired by arm64 which has:
> > >
> > > void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> > >  struct task_struct *task, struct pt_regs *regs)
> > > {
> > > struct stackframe frame;
> > >
> > > if (regs)
> > > start_backtrace(&frame, regs->regs[29], regs->pc);
> > > else if (task == current)
> > > start_backtrace(&frame,
> > > (unsigned long)__builtin_frame_address(0),
> > > (unsigned long)arch_stack_walk);
> > > else
> > > start_backtrace(&frame, thread_saved_fp(task),
> > > thread_saved_pc(task));
> > >
> > > walk_stackframe(task, &frame, consume_entry, cookie);
> > > }
> > >
> > > But looking at x86 you may be right, so what should be done rea

Re: [PATCH v1] powerpc: Include running function as first entry in save_stack_trace() and friends

2021-03-04 Thread Segher Boessenkool
On Thu, Mar 04, 2021 at 09:54:44AM -0800, Nick Desaulniers wrote:
> On Thu, Mar 4, 2021 at 9:42 AM Marco Elver  wrote:
> include/linux/compiler.h:246:
> prevent_tail_call_optimization
> 
> commit a9a3ed1eff36 ("x86: Fix early boot crash on gcc-10, third try")

That is much heavier than needed (an mb()).  You can just put an empty
inline asm after a call before a return, and that call cannot be
optimised to a sibling call: (the end of a function is an implicit
return:)

Instead of:

void g(void);
void f(int x)
{
	if (x)
		g();
}

Do:

void g(void);
void f(int x)
{
	if (x)
		g();
	asm("");
}

This costs no extra instructions, and certainly not something as heavy
as an mb()!  It works without the "if" as well, of course, but with it
it is a more interesting example of a tail call.
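
For contrast, the in-tree helper referenced at the top of this message is a
full memory barrier; per the commit cited there it is simply:

/* include/linux/compiler.h */
#define prevent_tail_call_optimization()	mb()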


Segher


Re: [PATCH v1] powerpc: Include running function as first entry in save_stack_trace() and friends

2021-03-04 Thread Mark Rutland
On Thu, Mar 04, 2021 at 07:22:53PM +0100, Marco Elver wrote:
> On Thu, 4 Mar 2021 at 19:02, Mark Rutland  wrote:
> > On Thu, Mar 04, 2021 at 06:25:33PM +0100, Marco Elver wrote:
> > > On Thu, Mar 04, 2021 at 04:59PM +, Mark Rutland wrote:
> > > > On Thu, Mar 04, 2021 at 04:30:34PM +0100, Marco Elver wrote:
> > > > > On Thu, 4 Mar 2021 at 15:57, Mark Rutland  
> > > > > wrote:
> > > > > > [adding Mark Brown]
> > > > > >
> > > > > > The bigger problem here is that skipping is dodgy to begin with, and
> > > > > > this is still liable to break in some cases. One big concern is that
> > > > > > (especially with LTO) we cannot guarantee the compiler will not 
> > > > > > inline
> > > > > > or outline functions, causing the skip value to be too large or too
> > > > > > small. That's liable to happen to callers, and in theory (though
> > > > > > unlikely in practice), portions of arch_stack_walk() or
> > > > > > stack_trace_save() could get outlined too.
> > > > > >
> > > > > > Unless we can get some strong guarantees from compiler folk such 
> > > > > > that we
> > > > > > can guarantee a specific function acts as a boundary for unwinding (and
> > > > > > doesn't itself get split, etc), the only reliable way I can think to
> > > > > > solve this requires an assembly trampoline. Whatever we do is 
> > > > > > liable to
> > > > > > need some invasive rework.
> > > > >
> > > > > Will LTO and friends respect 'noinline'?
> > > >
> > > > I hope so (and suspect we'd have more problems otherwise), but I don't
> > > > know whether they actually do so.
> > > >
> > > > I suspect even with 'noinline' the compiler is permitted to outline
> > > > portions of a function if it wanted to (and IIUC it could still make
> > > > specialized copies in the absence of 'noclone').
> > > >
> > > > > One thing I also noticed is that tail calls would also cause the stack
> > > > > trace to appear somewhat incomplete (for some of my tests I've
> > > > > disabled tail call optimizations).
> > > >
> > > > I assume you mean for a chain A->B->C where B tail-calls C, you get a
> > > > trace A->C? ... or is A going missing too?
> > >
> > > Correct, it's just the A->C outcome.
> >
> > I'd assumed that those cases were benign, e.g. for livepatching what
> > matters is what can be returned to, so B disappearing from the trace
> > isn't a problem there.
> >
> > Is the concern debugability, or is there a functional issue you have in
> > mind?
> 
> For me, it's just been debuggability, and reliable test cases.
> 
> > > > > Is there a way to also mark a function non-tail-callable?
> > > >
> > > > I think this can be bodged using __attribute__((optimize("$OPTIONS")))
> > > > on a caller to inhibit TCO (though IIRC GCC doesn't reliably support
> > > > function-local optimization options), but I don't expect there's any way
> > > > to mark a callee as not being tail-callable.
> > >
> > > I don't think this is reliable. It'd be
> > > __attribute__((optimize("-fno-optimize-sibling-calls"))), but doesn't
> > > work if applied to the function we do not want to tail-call-optimize,
> > > but would have to be applied to the function that does the tail-calling.
> >
> > Yup; that's what I meant when I said you could do that on the caller but
> > not the callee.
> >
> > I don't follow why you'd want to put this on the callee, though, so I
> > think I'm missing something. Considering a set of functions in different
> > compilation units:
> >
> >   A->B->C->D->E->F->G->H->I->J->K
> 
> I was having this problem with KCSAN, where the compiler would
> tail-call-optimize __tsan_X instrumentation.

Those are compiler-generated calls, right? When those are generated the
compilation unit (and whatever it has included) might not have provided
a prototype anyway, and the compiler has special knowledge of the
functions, so it feels like the compiler would need to inhibit TCO here
for this to be robust. For their intended usage subjecting them to TCO
doesn't seem to make sense AFAICT.

I suspect that compilers have some way of handling that; otherwise I'd
expect to have heard stories of mcount/fentry calls getting TCO'd and
causing problems. So maybe there's an easy fix there?

> This would mean that KCSAN runtime functions ended up in the trace,
> but the function where the access happened would not. However, I don't
> care about the runtime functions, and instead want to see the function
> where the access happened. In that case, I'd like to just mark
> __tsan_X and any other kcsan instrumentation functions as
> do-not-tail-call-optimize, which would solve the problem.

I understand why we don't want to TCO these calls, but given the calls
are implicitly generated, I strongly suspect it's better to fix the
implicit call generation to not be TCO'd to begin with.

> The solution today is that when you compile a kernel with KCSAN, every
> instrumented TU is compiled with -fno-optimize-sibling-calls. The
> better solution would be to just mark KCSAN runtime functions somehow,
> but pe

Re: [PATCH] scsi: ibmvfc: Switch to using the new API kobj_to_dev()

2021-03-04 Thread Tyrel Datwyler
On 3/4/21 1:28 AM, Jiapeng Chong wrote:
> Fix the following coccicheck warnings:
> 
> ./drivers/scsi/ibmvscsi/ibmvfc.c:3483:60-61: WARNING opportunity for
> kobj_to_dev().
> 
> Reported-by: Abaci Robot 
> Signed-off-by: Jiapeng Chong 

Acked-by: Tyrel Datwyler 

> ---
>  drivers/scsi/ibmvscsi/ibmvfc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index 755313b..e5f1ca7 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -3480,7 +3480,7 @@ static ssize_t ibmvfc_read_trace(struct file *filp, 
> struct kobject *kobj,
>struct bin_attribute *bin_attr,
>char *buf, loff_t off, size_t count)
>  {
> - struct device *dev = container_of(kobj, struct device, kobj);
> + struct device *dev = kobj_to_dev(kobj);
>   struct Scsi_Host *shost = class_to_shost(dev);
>   struct ibmvfc_host *vhost = shost_priv(shost);
>   unsigned long flags = 0;
> 
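
(For reference, kobj_to_dev() in include/linux/device.h is exactly this
container_of() wrapper, so the conversion is behaviour-preserving:)

static inline struct device *kobj_to_dev(struct kobject *kobj)
{
	return container_of(kobj, struct device, kobj);
}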



Re: [RFC PATCH v1] powerpc: Enable KFENCE for PPC32

2021-03-04 Thread Marco Elver
On Thu, 4 Mar 2021 at 15:08, Christophe Leroy
 wrote:
>
>
>
> Le 04/03/2021 à 13:48, Marco Elver a écrit :
> >  From d118080eb9552073f5dcf1f86198f3d86d5ea850 Mon Sep 17 00:00:00 2001
> > From: Marco Elver 
> > Date: Thu, 4 Mar 2021 13:15:51 +0100
> > Subject: [PATCH] kfence: fix reports if constant function prefixes exist
> >
> > Some architectures prefix all functions with a constant string ('.' on
> > ppc64). Add ARCH_FUNC_PREFIX, which may optionally be defined in
> > <asm/kfence.h>, so that get_stack_skipnr() can work properly.
>
>
> It works, thanks.
>
> >
> > Link: 
> > https://lkml.kernel.org/r/f036c53d-7e81-763c-47f4-6024c6c5f...@csgroup.eu
> > Reported-by: Christophe Leroy 
> > Signed-off-by: Marco Elver 
>
> Tested-by: Christophe Leroy 

Thanks, I'll send this to Andrew for inclusion in -mm, since this is
not a strict dependency (it'll work without the patch, just the stack
traces aren't that pretty but still useful). If the ppc patches and
this make it into the next merge window, everything should be good for
5.13.

> > ---
> >   mm/kfence/report.c | 18 --
> >   1 file changed, 12 insertions(+), 6 deletions(-)
> >
> > diff --git a/mm/kfence/report.c b/mm/kfence/report.c
> > index 519f037720f5..e3f71451ad9e 100644
> > --- a/mm/kfence/report.c
> > +++ b/mm/kfence/report.c
> > @@ -20,6 +20,11 @@
> >
> >   #include "kfence.h"
> >
> > +/* May be overridden by <asm/kfence.h>. */
> > +#ifndef ARCH_FUNC_PREFIX
> > +#define ARCH_FUNC_PREFIX ""
> > +#endif
> > +
> >   extern bool no_hash_pointers;
> >
> >   /* Helper function to either print to a seq_file or to console. */
> > @@ -67,8 +72,9 @@ static int get_stack_skipnr(const unsigned long 
> > stack_entries[], int num_entries
> >   for (skipnr = 0; skipnr < num_entries; skipnr++) {
> >   int len = scnprintf(buf, sizeof(buf), "%ps", (void 
> > *)stack_entries[skipnr]);
> >
> > - if (str_has_prefix(buf, "kfence_") || str_has_prefix(buf, 
> > "__kfence_") ||
> > - !strncmp(buf, "__slab_free", len)) {
> > + if (str_has_prefix(buf, ARCH_FUNC_PREFIX "kfence_") ||
> > + str_has_prefix(buf, ARCH_FUNC_PREFIX "__kfence_") ||
> > + !strncmp(buf, ARCH_FUNC_PREFIX "__slab_free", len)) {
> >   /*
> >* In case of tail calls from any of the below
> >* to any of the above.
> > @@ -77,10 +83,10 @@ static int get_stack_skipnr(const unsigned long 
> > stack_entries[], int num_entries
> >   }
> >
> >   /* Also the *_bulk() variants by only checking prefixes. */
> > - if (str_has_prefix(buf, "kfree") ||
> > - str_has_prefix(buf, "kmem_cache_free") ||
> > - str_has_prefix(buf, "__kmalloc") ||
> > - str_has_prefix(buf, "kmem_cache_alloc"))
> > + if (str_has_prefix(buf, ARCH_FUNC_PREFIX "kfree") ||
> > + str_has_prefix(buf, ARCH_FUNC_PREFIX "kmem_cache_free") ||
> > + str_has_prefix(buf, ARCH_FUNC_PREFIX "__kmalloc") ||
> > + str_has_prefix(buf, ARCH_FUNC_PREFIX "kmem_cache_alloc"))
> >   goto found;
> >   }
> >   if (fallback < num_entries)
> >


Re: [PATCH v1] powerpc: Include running function as first entry in save_stack_trace() and friends

2021-03-04 Thread Mark Rutland
On Thu, Mar 04, 2021 at 06:25:33PM +0100, Marco Elver wrote:
> On Thu, Mar 04, 2021 at 04:59PM +, Mark Rutland wrote:
> > On Thu, Mar 04, 2021 at 04:30:34PM +0100, Marco Elver wrote:
> > > On Thu, 4 Mar 2021 at 15:57, Mark Rutland  wrote:
> > > > [adding Mark Brown]
> > > >
> > > > The bigger problem here is that skipping is dodgy to begin with, and
> > > > this is still liable to break in some cases. One big concern is that
> > > > (especially with LTO) we cannot guarantee the compiler will not inline
> > > > or outline functions, causing the skip value to be too large or too
> > > > small. That's liable to happen to callers, and in theory (though
> > > > unlikely in practice), portions of arch_stack_walk() or
> > > > stack_trace_save() could get outlined too.
> > > >
> > > > Unless we can get some strong guarantees from compiler folk such that we
> > > > can guarantee a specific function acts as a boundary for unwinding (and
> > > > doesn't itself get split, etc), the only reliable way I can think to
> > > > solve this requires an assembly trampoline. Whatever we do is liable to
> > > > need some invasive rework.
> > > 
> > > Will LTO and friends respect 'noinline'?
> > 
> > I hope so (and suspect we'd have more problems otherwise), but I don't
> > know whether they actually do so.
> > 
> > I suspect even with 'noinline' the compiler is permitted to outline
> > portions of a function if it wanted to (and IIUC it could still make
> > specialized copies in the absence of 'noclone').
> > 
> > > One thing I also noticed is that tail calls would also cause the stack
> > > trace to appear somewhat incomplete (for some of my tests I've
> > > disabled tail call optimizations).
> > 
> > I assume you mean for a chain A->B->C where B tail-calls C, you get a
> > trace A->C? ... or is A going missing too?
> 
> Correct, it's just the A->C outcome.

I'd assumed that those cases were benign, e.g. for livepatching what
matters is what can be returned to, so B disappearing from the trace
isn't a problem there.

Is the concern debugability, or is there a functional issue you have in
mind?

> > > Is there a way to also mark a function non-tail-callable?
> > 
> > I think this can be bodged using __attribute__((optimize("$OPTIONS")))
> > on a caller to inhibit TCO (though IIRC GCC doesn't reliably support
> > function-local optimization options), but I don't expect there's any way
> > to mark a callee as not being tail-callable.
> 
> I don't think this is reliable. It'd be
> __attribute__((optimize("-fno-optimize-sibling-calls"))), but doesn't
> work if applied to the function we do not want to tail-call-optimize,
> but would have to be applied to the function that does the tail-calling.

Yup; that's what I meant when I said you could do that on the caller but
not the callee.

I don't follow why you'd want to put this on the callee, though, so I
think I'm missing something. Considering a set of functions in different
compilation units:

  A->B->C->D->E->F->G->H->I->J->K

... if K were marked in this way, and J was compiled with visibility of
this, J would stick around, but J's callers might not, and so the
trace might see:

  A->J->K

... do you just care about the final caller, i.e. you just need
certainty that J will be in the trace?

If so, we can somewhat bodge that by having K have an __always_inline
wrapper which has a barrier() or similar after the real call to K, so
the call couldn't be TCO'd.

Otherwise I'd expect we'd probably need to disable TCO generally.
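
A minimal sketch of that bodge (hypothetical names; K_real() stands in for the
real callee):

void K_real(void);

static __always_inline void K(void)
{
	K_real();
	barrier();	/* the call above can no longer be in tail position */
}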

> So it's a bit backwards, even if it worked.
> 
> > According to the GCC documentation, GCC won't TCO noreturn functions, but
> > obviously that's not something we can use generally.
> > 
> > https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
> 
> Perhaps we can ask the toolchain folks to help add such an attribute. Or
> maybe the feature already exists somewhere, but hidden.
> 
> +Cc linux-toolcha...@vger.kernel.org
> 
> > > But I'm also not sure if with all that we'd be guaranteed the code we
> > > want, even though in practice it might.
> > 
> > True! I'd just like to be on the least dodgy ground we can be.
> 
> It's been dodgy for a while, and I'd welcome any low-cost fixes to make
> it less dodgy in the short-term at least. :-)

:)

Thanks,
Mark.


Re: [PATCH v1] powerpc: Include running function as first entry in save_stack_trace() and friends

2021-03-04 Thread Mark Rutland
On Thu, Mar 04, 2021 at 04:30:34PM +0100, Marco Elver wrote:
> On Thu, 4 Mar 2021 at 15:57, Mark Rutland  wrote:
> > [adding Mark Brown]
> >
> > The bigger problem here is that skipping is dodgy to begin with, and
> > this is still liable to break in some cases. One big concern is that
> > (especially with LTO) we cannot guarantee the compiler will not inline
> > or outline functions, causing the skip value to be too large or too
> > small. That's liable to happen to callers, and in theory (though
> > unlikely in practice), portions of arch_stack_walk() or
> > stack_trace_save() could get outlined too.
> >
> > Unless we can get some strong guarantees from compiler folk such that we
> > can guarantee a specific function acts as a boundary for unwinding (and
> > doesn't itself get split, etc), the only reliable way I can think to
> > solve this requires an assembly trampoline. Whatever we do is liable to
> > need some invasive rework.
> 
> Will LTO and friends respect 'noinline'?

I hope so (and suspect we'd have more problems otherwise), but I don't
know whether they actually do so.

I suspect even with 'noinline' the compiler is permitted to outline
portions of a function if it wanted to (and IIUC it could still make
specialized copies in the absence of 'noclone').

> One thing I also noticed is that tail calls would also cause the stack
> trace to appear somewhat incomplete (for some of my tests I've
> disabled tail call optimizations).

I assume you mean for a chain A->B->C where B tail-calls C, you get a
trace A->C? ... or is A going missing too?

> Is there a way to also mark a function non-tail-callable?

I think this can be bodged using __attribute__((optimize("$OPTIONS")))
on a caller to inhibit TCO (though IIRC GCC doesn't reliably support
function-local optimization options), but I don't expect there's any way
to mark a callee as not being tail-callable.

According to the GCC documentation, GCC won't TCO noreturn functions, but
obviously that's not something we can use generally.

https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes

> But I'm also not sure if with all that we'd be guaranteed the code we
> want, even though in practice it might.

True! I'd just like to be on the least dodgy ground we can be.

Thanks,
Mark.


Re: [RFC PATCH v1] powerpc: Enable KFENCE for PPC32

2021-03-04 Thread Marco Elver
On Thu, Mar 04, 2021 at 12:48PM +0100, Christophe Leroy wrote:
> Le 04/03/2021 à 12:31, Marco Elver a écrit :
> > On Thu, 4 Mar 2021 at 12:23, Christophe Leroy
> >  wrote:
> > > Le 03/03/2021 à 11:56, Marco Elver a écrit :
> > > > 
> > > > Somewhat tangentially, I also note that e.g. show_regs(regs) (which
> > > > was printed along the KFENCE report above) didn't include the top
> > > > frame in the "Call Trace", so this assumption is definitely not
> > > > isolated to KFENCE.
> > > > 
> > > 
> > > Now, I have tested PPC64 (with the patch I sent yesterday to modify 
> > > save_stack_trace_regs()
> > > applied), and I get many failures. Any idea ?
> > > 
> > > [   17.653751][   T58] 
> > > ==
> > > [   17.654379][   T58] BUG: KFENCE: invalid free in 
> > > .kfence_guarded_free+0x2e4/0x530
> > > [   17.654379][   T58]
> > > [   17.654831][   T58] Invalid free of 0xc0003c9c (in kfence-#77):
> > > [   17.655358][   T58]  .kfence_guarded_free+0x2e4/0x530
> > > [   17.655775][   T58]  .__slab_free+0x320/0x5a0
> > > [   17.656039][   T58]  .test_double_free+0xe0/0x198
> > > [   17.656308][   T58]  .kunit_try_run_case+0x80/0x110
> > > [   17.656523][   T58]  .kunit_generic_run_threadfn_adapter+0x38/0x50
> > > [   17.657161][   T58]  .kthread+0x18c/0x1a0
> > > [   17.659148][   T58]  .ret_from_kernel_thread+0x58/0x70
> > > [   17.659869][   T58]
[...]
> > 
> > Looks like something is prepending '.' to function names. We expect
> > the function name to appear as-is, e.g. "kfence_guarded_free",
> > "test_double_free", etc.
> > 
> > Is there something special on ppc64, where the '.' is some convention?
> > 
> 
> I think so, see 
> https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#FUNC-DES
> 
> Also see commit https://github.com/linuxppc/linux/commit/02424d896

Thanks -- could you try the below patch? You'll need to define
ARCH_FUNC_PREFIX accordingly.

We think, since there are only very few architectures that add a prefix,
requiring <asm/kfence.h> to define something like ARCH_FUNC_PREFIX is
the simplest option. Let me know if this works for you.

There is an alternative option, which is to dynamically figure out the
prefix, but if this simpler option is fine with you, we'd prefer it.
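
For ppc64 with the ELFv1 ABI (dot-prefixed text symbols), the per-arch define
could look like this — a sketch matching the powerpc patch that appears later
in this digest:

/* arch/powerpc/include/asm/kfence.h */
#if defined(CONFIG_PPC64) && !defined(PPC64_ELF_ABI_v2)
#define ARCH_FUNC_PREFIX "."
#endif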

Thanks,
-- Marco

-- >8 --

From d118080eb9552073f5dcf1f86198f3d86d5ea850 Mon Sep 17 00:00:00 2001
From: Marco Elver 
Date: Thu, 4 Mar 2021 13:15:51 +0100
Subject: [PATCH] kfence: fix reports if constant function prefixes exist

Some architectures prefix all functions with a constant string ('.' on
ppc64). Add ARCH_FUNC_PREFIX, which may optionally be defined in
<asm/kfence.h>, so that get_stack_skipnr() can work properly.

Link: https://lkml.kernel.org/r/f036c53d-7e81-763c-47f4-6024c6c5f...@csgroup.eu
Reported-by: Christophe Leroy 
Signed-off-by: Marco Elver 
---
 mm/kfence/report.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/mm/kfence/report.c b/mm/kfence/report.c
index 519f037720f5..e3f71451ad9e 100644
--- a/mm/kfence/report.c
+++ b/mm/kfence/report.c
@@ -20,6 +20,11 @@
 
 #include "kfence.h"
 
+/* May be overridden by <asm/kfence.h>. */
+#ifndef ARCH_FUNC_PREFIX
+#define ARCH_FUNC_PREFIX ""
+#endif
+
 extern bool no_hash_pointers;
 
 /* Helper function to either print to a seq_file or to console. */
@@ -67,8 +72,9 @@ static int get_stack_skipnr(const unsigned long 
stack_entries[], int num_entries
for (skipnr = 0; skipnr < num_entries; skipnr++) {
int len = scnprintf(buf, sizeof(buf), "%ps", (void 
*)stack_entries[skipnr]);
 
-   if (str_has_prefix(buf, "kfence_") || str_has_prefix(buf, 
"__kfence_") ||
-   !strncmp(buf, "__slab_free", len)) {
+   if (str_has_prefix(buf, ARCH_FUNC_PREFIX "kfence_") ||
+   str_has_prefix(buf, ARCH_FUNC_PREFIX "__kfence_") ||
+   !strncmp(buf, ARCH_FUNC_PREFIX "__slab_free", len)) {
/*
 * In case of tail calls from any of the below
 * to any of the above.
@@ -77,10 +83,10 @@ static int get_stack_skipnr(const unsigned long 
stack_entries[], int num_entries
}
 
/* Also the *_bulk() variants by only checking prefixes. */
-   if (str_has_prefix(buf, "kfree") ||
-   str_has_prefix(buf, "kmem_cache_free") ||
-   str_has_prefix(buf, "__kmalloc") ||
-   str_has_prefix(buf, "kmem_cache_alloc"))
+   if (str_has_prefix(buf, ARCH_FUNC_PREFIX "kfree") ||
+   str_has_prefix(buf, ARCH_FUNC_PREFIX "kmem_cache_free") ||
+   str_has_prefix(buf, ARCH_FUNC_PREFIX "__kmalloc") ||
+   str_has_prefix(buf, ARCH_FUNC_PREFIX "kmem_cache_alloc"))
goto found;
}
if (fallback < num_entries)
-- 
2.30.1.766.gb4fecdf3b7-goog


Re: [RFC PATCH v1] powerpc: Enable KFENCE for PPC32

2021-03-04 Thread Marco Elver
On Thu, 4 Mar 2021 at 13:00, Christophe Leroy
 wrote:
>
>
>
> Le 04/03/2021 à 12:48, Christophe Leroy a écrit :
> >
> >
> > Le 04/03/2021 à 12:31, Marco Elver a écrit :
> >> On Thu, 4 Mar 2021 at 12:23, Christophe Leroy
> >>  wrote:
> >>> Le 03/03/2021 à 11:56, Marco Elver a écrit :
> 
>  Somewhat tangentially, I also note that e.g. show_regs(regs) (which
>  was printed along the KFENCE report above) didn't include the top
>  frame in the "Call Trace", so this assumption is definitely not
>  isolated to KFENCE.
> 
> >>>
> >>> Now, I have tested PPC64 (with the patch I sent yesterday to modify 
> >>> save_stack_trace_regs()
> >>> applied), and I get many failures. Any idea ?
> >>>
> >>> [   17.653751][   T58] 
> >>> ==
> >>> [   17.654379][   T58] BUG: KFENCE: invalid free in 
> >>> .kfence_guarded_free+0x2e4/0x530
> >>> [   17.654379][   T58]
> >>> [   17.654831][   T58] Invalid free of 0xc0003c9c (in kfence-#77):
> >>> [   17.655358][   T58]  .kfence_guarded_free+0x2e4/0x530
> >>> [   17.655775][   T58]  .__slab_free+0x320/0x5a0
> >>> [   17.656039][   T58]  .test_double_free+0xe0/0x198
> >>> [   17.656308][   T58]  .kunit_try_run_case+0x80/0x110
> >>> [   17.656523][   T58]  .kunit_generic_run_threadfn_adapter+0x38/0x50
> >>> [   17.657161][   T58]  .kthread+0x18c/0x1a0
> >>> [   17.659148][   T58]  .ret_from_kernel_thread+0x58/0x70
> >>> [   17.659869][   T58]
> >>> [   17.663954][   T58] kfence-#77 [0xc0003c9c-0xc0003c9c001f, 
> >>> size=32, cache=kmalloc-32]
> >>> allocated by task 58:
> >>> [   17.666113][   T58]  .__kfence_alloc+0x1bc/0x510
> >>> [   17.667069][   T58]  .__kmalloc+0x280/0x4f0
> >>> [   17.667452][   T58]  .test_alloc+0x19c/0x430
> >>> [   17.667732][   T58]  .test_double_free+0x88/0x198
> >>> [   17.667971][   T58]  .kunit_try_run_case+0x80/0x110
> >>> [   17.668283][   T58]  .kunit_generic_run_threadfn_adapter+0x38/0x50
> >>> [   17.668553][   T58]  .kthread+0x18c/0x1a0
> >>> [   17.669315][   T58]  .ret_from_kernel_thread+0x58/0x70
> >>> [   17.669711][   T58]
> >>> [   17.669711][   T58] freed by task 58:
> >>> [   17.670116][   T58]  .kfence_guarded_free+0x3d0/0x530
> >>> [   17.670421][   T58]  .__slab_free+0x320/0x5a0
> >>> [   17.670603][   T58]  .test_double_free+0xb4/0x198
> >>> [   17.670827][   T58]  .kunit_try_run_case+0x80/0x110
> >>> [   17.671073][   T58]  .kunit_generic_run_threadfn_adapter+0x38/0x50
> >>> [   17.671410][   T58]  .kthread+0x18c/0x1a0
> >>> [   17.671618][   T58]  .ret_from_kernel_thread+0x58/0x70
> >>> [   17.671972][   T58]
> >>> [   17.672638][   T58] CPU: 0 PID: 58 Comm: kunit_try_catch Tainted: G
> >>> B
> >>> 5.12.0-rc1-01540-g0783285cc1b8-dirty #4685
> >>> [   17.673768][   T58] 
> >>> ==
> >>> [   17.677031][   T58] # test_double_free: EXPECTATION FAILED at 
> >>> mm/kfence/kfence_test.c:380
> >>> [   17.677031][   T58] Expected report_matches(&expect) to be true, 
> >>> but is false
> >>> [   17.684397][T1] not ok 7 - test_double_free
> >>> [   17.686463][   T59] # test_double_free-memcache: setup_test_cache: 
> >>> size=32, ctor=0x0
> >>> [   17.688403][   T59] # test_double_free-memcache: test_alloc: 
> >>> size=32, gfp=cc0, policy=any,
> >>> cache=1
> >>
> >> Looks like something is prepending '.' to function names. We expect
> >> the function name to appear as-is, e.g. "kfence_guarded_free",
> >> "test_double_free", etc.
> >>
> >> Is there something special on ppc64, where the '.' is some convention?
> >>
> >
> > I think so, see 
> > https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#FUNC-DES
> >
> > Also see commit https://github.com/linuxppc/linux/commit/02424d896
> >
>
> But I'm wondering, if the dot is the problem, how come the following one is ok ?
>
> [   79.574457][   T75] # test_krealloc: test_alloc: size=32, gfp=cc0, 
> policy=any, cache=0
> [   79.682728][   T75] 
> ==
> [   79.684017][   T75] BUG: KFENCE: use-after-free read in 
> .test_krealloc+0x4fc/0x5b8
> [   79.684017][   T75]
> [   79.684955][   T75] Use-after-free read at 0xc0003d06 (in 
> kfence-#130):
> [   79.687581][   T75]  .test_krealloc+0x4fc/0x5b8
> [   79.688216][   T75]  .test_krealloc+0x4e4/0x5b8
> [   79.688824][   T75]  .kunit_try_run_case+0x80/0x110
> [   79.689737][   T75]  .kunit_generic_run_threadfn_adapter+0x38/0x50
> [   79.690335][   T75]  .kthread+0x18c/0x1a0
> [   79.691092][   T75]  .ret_from_kernel_thread+0x58/0x70
> [   79.692081][   T75]
> [   79.692671][   T75] kfence-#130 [0xc0003d06-0xc0003d06001f, 
> size=32,
> cache=kmalloc-32] allocated by task 75:
> [   79.700977][   T75]  .__kfence_alloc+0x1bc/0x510
> [   79.701812][   T75]  .__kmalloc+0x280/0x4f0
> [   79.702695][   T75]  .test_alloc+0x19c/0x430
> [   79.703051][   T75]  .test_krealloc+0xa8/0

Re: [PATCH v2 1/3] powerpc: sstep: Fix load and update emulation

2021-03-04 Thread Segher Boessenkool
On Wed, Mar 03, 2021 at 10:01:27PM +0530, Naveen N. Rao wrote:
> On 2021/03/01 08:37PM, Segher Boessenkool wrote:
> > > And, r6 always ends up with 0xaea. It changes with the value I put into 
> > > r6 though.
> > 
> > That is exactly the behaviour specified for p8.  0aaa+0040=0aea.
> > 
> > > Granted, this is all up in the air, but it does look like there is more 
> > > going on and the value isn't the EA or the value at the address.
> > 
> > That *is* the EA.  The EA is the address the insn does the access at.
> 
> I'm probably missing something here. 0xaaa is the value I stored at an 
> offset of 64 bytes from the stack pointer (r1 is copied into r6). In the 
> ldu instruction above, the EA is 64(r6), which should translate to 
> r1+64.  The data returned by the load would be 0xaaa, which should be 
> discarded per the description you provided above. So, I would expect to 
> see a 0xc0.. address in r6.

Yes, I misread your code it seems.

> In fact, this looks to be the behavior documented for P9:
> 
> > > Power9 does:
> > >
> > >   Load with Update Instructions (RA = 0)
> > > EA is placed into R0.
> > >   Load with Update Instructions (RA = RT)
> > > The storage operand addressed by EA is accessed. The 
> > > displacement
> > > field is added to the data returned by the load and placed into 
> > > RT.

Yup.  So on what cpu did you test?

Either way, the kernel should not emulate any particular cpu here, I'd
say, esp. since recent cpus do different things for this invalid form.
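
To make the disagreement concrete, the test under discussion is essentially
(reconstructed from the thread, not verbatim):

	mr	r6, r1		# copy the stack pointer into r6
	li	r7, 0xaaa
	std	r7, 64(r1)	# store 0xaaa at r1 + 64
	ldu	r6, 64(r6)	# invalid form: RA == RT
	# observed: r6 = 0xaea, i.e. loaded data (0xaaa) + displacement (0x40),
	# matching the Power9 description above rather than an EA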


Segher


Re: [PATCH 16/17] iommu: remove DOMAIN_ATTR_IO_PGTABLE_CFG

2021-03-04 Thread Robin Murphy

On 2021-03-01 08:42, Christoph Hellwig wrote:

Signed-off-by: Christoph Hellwig 


Moreso than the previous patch, where the feature is at least relatively 
generic (note that there's a bunch of in-flight development around 
DOMAIN_ATTR_NESTING), I'm really not convinced that it's beneficial to 
bloat the generic iommu_ops structure with private driver-specific 
interfaces. The attribute interface is a great compromise for these 
kinds of things, and you can easily add type-checked wrappers around it 
for external callers (maybe even make the actual attributes internal 
between the IOMMU core and drivers) if that's your concern.
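
A type-checked wrapper over the attribute interface can be as small as this
(hypothetical sketch, reusing the attribute this patch removes):

static inline int
iommu_domain_set_pgtable_attr(struct iommu_domain *domain,
			      struct io_pgtable_domain_attr *pgtbl_cfg)
{
	return iommu_domain_set_attr(domain, DOMAIN_ATTR_IO_PGTABLE_CFG,
				     pgtbl_cfg);
}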


Robin.


---
  drivers/gpu/drm/msm/adreno/adreno_gpu.c |  2 +-
  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 40 +++--
  drivers/iommu/iommu.c   |  9 ++
  include/linux/iommu.h   |  9 +-
  4 files changed, 29 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 0f184c3dd9d9ec..78d98ab2ee3a68 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -191,7 +191,7 @@ void adreno_set_llc_attributes(struct iommu_domain *iommu)
struct io_pgtable_domain_attr pgtbl_cfg;
  
  	pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA;

-   iommu_domain_set_attr(iommu, DOMAIN_ATTR_IO_PGTABLE_CFG, &pgtbl_cfg);
+   iommu_domain_set_pgtable_attr(iommu, &pgtbl_cfg);
  }
  
  struct msm_gem_address_space *

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 2e17d990d04481..2858999c86dfd1 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1515,40 +1515,22 @@ static int arm_smmu_domain_enable_nesting(struct 
iommu_domain *domain)
return ret;
  }
  
-static int arm_smmu_domain_set_attr(struct iommu_domain *domain,

-   enum iommu_attr attr, void *data)
+static int arm_smmu_domain_set_pgtable_attr(struct iommu_domain *domain,
+   struct io_pgtable_domain_attr *pgtbl_cfg)
  {
-   int ret = 0;
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   int ret = -EPERM;
  
-	mutex_lock(&smmu_domain->init_mutex);

-
-   switch(domain->type) {
-   case IOMMU_DOMAIN_UNMANAGED:
-   switch (attr) {
-   case DOMAIN_ATTR_IO_PGTABLE_CFG: {
-   struct io_pgtable_domain_attr *pgtbl_cfg = data;
-
-   if (smmu_domain->smmu) {
-   ret = -EPERM;
-   goto out_unlock;
-   }
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+   return -EINVAL;
  
-			smmu_domain->pgtbl_cfg = *pgtbl_cfg;

-   break;
-   }
-   default:
-   ret = -ENODEV;
-   }
-   break;
-   case IOMMU_DOMAIN_DMA:
-   ret = -ENODEV;
-   break;
-   default:
-   ret = -EINVAL;
+   mutex_lock(&smmu_domain->init_mutex);
+   if (!smmu_domain->smmu) {
+   smmu_domain->pgtbl_cfg = *pgtbl_cfg;
+   ret = 0;
}
-out_unlock:
mutex_unlock(&smmu_domain->init_mutex);
+
return ret;
  }
  
@@ -1609,7 +1591,7 @@ static struct iommu_ops arm_smmu_ops = {

.device_group   = arm_smmu_device_group,
.dma_use_flush_queue= arm_smmu_dma_use_flush_queue,
.dma_enable_flush_queue = arm_smmu_dma_enable_flush_queue,
-   .domain_set_attr= arm_smmu_domain_set_attr,
+   .domain_set_pgtable_attr = arm_smmu_domain_set_pgtable_attr,
.domain_enable_nesting  = arm_smmu_domain_enable_nesting,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2e9e058501a953..8490aefd4b41f8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2693,6 +2693,15 @@ int iommu_domain_enable_nesting(struct iommu_domain 
*domain)
  }
  EXPORT_SYMBOL_GPL(iommu_domain_enable_nesting);
  
+int iommu_domain_set_pgtable_attr(struct iommu_domain *domain,

+   struct io_pgtable_domain_attr *pgtbl_cfg)
+{
+   if (!domain->ops->domain_set_pgtable_attr)
+   return -EINVAL;
+   return domain->ops->domain_set_pgtable_attr(domain, pgtbl_cfg);
+}
+EXPORT_SYMBOL_GPL(iommu_domain_set_pgtable_attr);
+
  void iommu_get_resv_regions(struct device *dev, struct list_head *list)
  {
const struct iommu_ops *ops = dev->bus->iommu_ops;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index aed88aa3bd3edf..39d3ed4d2700ac 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -40,6 +40,7 @@ struct iommu_domain;
  struct notifier_block;
  struct iommu_sva;
  struct iommu_fault_event;
+stru

Re: [PATCH v2] powerpc/32: remove bogus ppc_select syscall

2021-03-04 Thread Arnd Bergmann
On Thu, Mar 4, 2021 at 4:24 PM Christophe Leroy
 wrote:
> Le 04/03/2021 à 16:17, Arnd Bergmann a écrit :
> > On Thu, Mar 4, 2021 at 1:51 PM Christophe Leroy 
> >  wrote:
> >> ---
> >> First version was in 2008, at that time it was rejected, see
> >> http://patchwork.ozlabs.org/project/linuxppc-dev/patch/200809240839.14902.a...@arndb.de/
> >
> > The patch from 2008 did two things:
> >
> > - it removed the ppc32 specific 'select' syscall at #82
> > - it fixed the generic '_newselect' syscall at #142
> >
> > Back then, the decision was to only address the second issue, which
> > got merged in commit dad2f2fb0fc7 ("powerpc: Fix wrong error code from
> > ppc32 select syscall").
> >
> > It is probably ok to remove the old select system call now, but
> > my changelog text no longer makes sense, as the patch has nothing
> > to do with the bug that was reported back then.
> >
>
> I understood that the original reported bug was that calling that version of 
> select() with a
> negative value as first parameter would lead to a -EFAULT instead of -EINVAL. 
> That's exactly the
> case here: if you set n = -1 you get into the (unsigned long)n > 4096 path, then 
> the buffer is at
> 0x and access_ok() won't grant access to it so the return value will 
> be -EFAULT instead of
> -EINVAL.
>
> Am I missing something ?

This is the behavior of the ppc_select() implementation, but as far as
I can tell,
the bug report was for the problem that this behavior would happen for both
syscall #82 and syscall #142 when the correct behavior would have been to
only do it for #82 but not for #142.

   Arnd


Re: [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-04 Thread Robin Murphy

On 2021-03-01 08:42, Christoph Hellwig wrote:

Use explicit methods for setting and querying the information instead.


Now that everyone's using iommu-dma, is there any point in bouncing this 
through the drivers at all? Seems like it would make more sense for the 
x86 drivers to reflect their private options back to iommu_dma_strict 
(and allow Intel's caching mode to override it as well), then have 
iommu_dma_init_domain just test !iommu_dma_strict && 
domain->ops->flush_iotlb_all.
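
In code, the suggestion is roughly (sketch only; assumes the iommu_dma_strict
toggle in drivers/iommu/iommu.c is made visible to iommu-dma):

static bool iommu_dma_use_flush_queue(struct iommu_domain *domain)
{
	return !iommu_dma_strict && domain->ops->flush_iotlb_all;
}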


Robin.


Also remove the now unused iommu_domain_get_attr functionality.

Signed-off-by: Christoph Hellwig 
---
  drivers/iommu/amd/iommu.c   | 23 ++---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 47 ++---
  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 56 +
  drivers/iommu/dma-iommu.c   |  8 ++-
  drivers/iommu/intel/iommu.c | 27 ++
  drivers/iommu/iommu.c   | 19 +++
  include/linux/iommu.h   | 17 ++-
  7 files changed, 51 insertions(+), 146 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a69a8b573e40d0..37a8e51db17656 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1771,24 +1771,11 @@ static struct iommu_group 
*amd_iommu_device_group(struct device *dev)
return acpihid_device_group(dev);
  }
  
-static int amd_iommu_domain_get_attr(struct iommu_domain *domain,

-   enum iommu_attr attr, void *data)
+static bool amd_iommu_dma_use_flush_queue(struct iommu_domain *domain)
  {
-   switch (domain->type) {
-   case IOMMU_DOMAIN_UNMANAGED:
-   return -ENODEV;
-   case IOMMU_DOMAIN_DMA:
-   switch (attr) {
-   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   *(int *)data = !amd_iommu_unmap_flush;
-   return 0;
-   default:
-   return -ENODEV;
-   }
-   break;
-   default:
-   return -EINVAL;
-   }
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return false;
+   return !amd_iommu_unmap_flush;
  }
  
  /*

@@ -2257,7 +2244,7 @@ const struct iommu_ops amd_iommu_ops = {
.release_device = amd_iommu_release_device,
.probe_finalize = amd_iommu_probe_finalize,
.device_group = amd_iommu_device_group,
-   .domain_get_attr = amd_iommu_domain_get_attr,
+   .dma_use_flush_queue = amd_iommu_dma_use_flush_queue,
.get_resv_regions = amd_iommu_get_resv_regions,
.put_resv_regions = generic_iommu_put_resv_regions,
.is_attach_deferred = amd_iommu_is_attach_deferred,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8594b4a8304375..bf96172e8c1f71 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2449,33 +2449,21 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
return group;
  }
  
-static int arm_smmu_domain_get_attr(struct iommu_domain *domain,

-   enum iommu_attr attr, void *data)
+static bool arm_smmu_dma_use_flush_queue(struct iommu_domain *domain)
  {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
  
-	switch (domain->type) {

-   case IOMMU_DOMAIN_UNMANAGED:
-   switch (attr) {
-   case DOMAIN_ATTR_NESTING:
-   *(int *)data = (smmu_domain->stage == 
ARM_SMMU_DOMAIN_NESTED);
-   return 0;
-   default:
-   return -ENODEV;
-   }
-   break;
-   case IOMMU_DOMAIN_DMA:
-   switch (attr) {
-   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   *(int *)data = smmu_domain->non_strict;
-   return 0;
-   default:
-   return -ENODEV;
-   }
-   break;
-   default:
-   return -EINVAL;
-   }
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return false;
+   return smmu_domain->non_strict;
+}
+
+
+static void arm_smmu_dma_enable_flush_queue(struct iommu_domain *domain)
+{
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return;
+   to_smmu_domain(domain)->non_strict = true;
  }
  
  static int arm_smmu_domain_set_attr(struct iommu_domain *domain,

@@ -2505,13 +2493,7 @@ static int arm_smmu_domain_set_attr(struct iommu_domain 
*domain,
}
break;
case IOMMU_DOMAIN_DMA:
-   switch(attr) {
-   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   smmu_domain->non_strict = *(int *)data;
-   break;
-   default:
-   ret = -EN

Re: [RFC PATCH v1] powerpc: Enable KFENCE for PPC32

2021-03-04 Thread Marco Elver
On Thu, 4 Mar 2021 at 12:23, Christophe Leroy
 wrote:
> Le 03/03/2021 à 11:56, Marco Elver a écrit :
> >
> > Somewhat tangentially, I also note that e.g. show_regs(regs) (which
> > was printed along the KFENCE report above) didn't include the top
> > frame in the "Call Trace", so this assumption is definitely not
> > isolated to KFENCE.
> >
>
> Now, I have tested PPC64 (with the patch I sent yesterday to modify 
> save_stack_trace_regs()
> applied), and I get many failures. Any idea ?
>
> [   17.653751][   T58] 
> ==
> [   17.654379][   T58] BUG: KFENCE: invalid free in 
> .kfence_guarded_free+0x2e4/0x530
> [   17.654379][   T58]
> [   17.654831][   T58] Invalid free of 0xc0003c9c (in kfence-#77):
> [   17.655358][   T58]  .kfence_guarded_free+0x2e4/0x530
> [   17.655775][   T58]  .__slab_free+0x320/0x5a0
> [   17.656039][   T58]  .test_double_free+0xe0/0x198
> [   17.656308][   T58]  .kunit_try_run_case+0x80/0x110
> [   17.656523][   T58]  .kunit_generic_run_threadfn_adapter+0x38/0x50
> [   17.657161][   T58]  .kthread+0x18c/0x1a0
> [   17.659148][   T58]  .ret_from_kernel_thread+0x58/0x70
> [   17.659869][   T58]
> [   17.663954][   T58] kfence-#77 [0xc0003c9c-0xc0003c9c001f, 
> size=32, cache=kmalloc-32]
> allocated by task 58:
> [   17.666113][   T58]  .__kfence_alloc+0x1bc/0x510
> [   17.667069][   T58]  .__kmalloc+0x280/0x4f0
> [   17.667452][   T58]  .test_alloc+0x19c/0x430
> [   17.667732][   T58]  .test_double_free+0x88/0x198
> [   17.667971][   T58]  .kunit_try_run_case+0x80/0x110
> [   17.668283][   T58]  .kunit_generic_run_threadfn_adapter+0x38/0x50
> [   17.668553][   T58]  .kthread+0x18c/0x1a0
> [   17.669315][   T58]  .ret_from_kernel_thread+0x58/0x70
> [   17.669711][   T58]
> [   17.669711][   T58] freed by task 58:
> [   17.670116][   T58]  .kfence_guarded_free+0x3d0/0x530
> [   17.670421][   T58]  .__slab_free+0x320/0x5a0
> [   17.670603][   T58]  .test_double_free+0xb4/0x198
> [   17.670827][   T58]  .kunit_try_run_case+0x80/0x110
> [   17.671073][   T58]  .kunit_generic_run_threadfn_adapter+0x38/0x50
> [   17.671410][   T58]  .kthread+0x18c/0x1a0
> [   17.671618][   T58]  .ret_from_kernel_thread+0x58/0x70
> [   17.671972][   T58]
> [   17.672638][   T58] CPU: 0 PID: 58 Comm: kunit_try_catch Tainted: GB
> 5.12.0-rc1-01540-g0783285cc1b8-dirty #4685
> [   17.673768][   T58] 
> ==
> [   17.677031][   T58] # test_double_free: EXPECTATION FAILED at 
> mm/kfence/kfence_test.c:380
> [   17.677031][   T58] Expected report_matches(&expect) to be true, but 
> is false
> [   17.684397][T1] not ok 7 - test_double_free
> [   17.686463][   T59] # test_double_free-memcache: setup_test_cache: 
> size=32, ctor=0x0
> [   17.688403][   T59] # test_double_free-memcache: test_alloc: size=32, 
> gfp=cc0, policy=any,
> cache=1

Looks like something is prepending '.' to function names. We expect
the function name to appear as-is, e.g. "kfence_guarded_free",
"test_double_free", etc.

Is there something special on ppc64, where the '.' is some convention?

Thanks,
-- Marco


Re: [PATCH v2] powerpc/32: remove bogus ppc_select syscall

2021-03-04 Thread Christophe Leroy




Le 04/03/2021 à 16:17, Arnd Bergmann a écrit :

On Thu, Mar 4, 2021 at 1:51 PM Christophe Leroy
 wrote:


From: Arnd Bergmann 

The ppc_select function was introduced in linux-2.3.48 in order to support
code confusing the legacy select() calling convention with the standard one.
Even 24 years ago, all correctly built code should not have done this and
could have easily been phased out. Nothing that was compiled later should
actually try to use the old_select interface, and it would have been broken
already on all ppc64 kernels with the syscall emulation layer.

This patch brings the 32 bit compat ABI and the native 32 bit ABI for
powerpc into a consistent state, by removing support for both the
old_select system call number and the handler for it.

The bug report triggering this came from
Halesh Sadashiiv , who discovered that the
32 bit implementation of ppc_select would in case of a negative number
of file descriptors incorrectly return -EFAULT instead of -EINVAL.
There seems to be no way to fix this problem in a way that would
keep broken pre-1997 binaries running.

Signed-off-by: Arnd Bergmann 
Cc: Halesh Sadashiiv 
[chleroy: Rebased and updated the number of years elapsed in the commit message]
Signed-off-by: Christophe Leroy 
---
First version was in 2008, at that time it was rejected, see
http://patchwork.ozlabs.org/project/linuxppc-dev/patch/200809240839.14902.a...@arndb.de/


The patch from 2008 did two things:

- it removed the ppc32 specific 'select' syscall at #82
- it fixed the generic '_newselect' syscall at #142

Back then, the decision was to only address the second issue, which
got merged in commit dad2f2fb0fc7 ("powerpc: Fix wrong error code from
ppc32 select syscall").

It is probably ok to remove the old select system call now, but
my changelog text no longer makes sense, as the patch has nothing
to do with the bug that was reported back then.



I understood that the original reported bug was that calling that version of select() with a 
negative value as first parameter would lead to a -EFAULT instead of -EINVAL. That's exactly the 
case here: if you set n = -1 you get into the (unsigned long)n > 4096 path, then the buffer is at 
0x and access_ok() won't grant access to it so the return value will be -EFAULT instead of 
-EINVAL.
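
The path described above is, roughly (a reconstructed sketch of the legacy
handler, not verbatim kernel code):

int ppc_select(int n, fd_set __user *inp, fd_set __user *outp,
	       fd_set __user *exp, struct __kernel_old_timeval __user *tvp)
{
	if ((unsigned long)n >= 4096) {
		/* n = -1 lands here: buffer sits at the top of the address space */
		unsigned long __user *buffer = (unsigned long __user *)n;

		if (!access_ok(buffer, 5 * sizeof(unsigned long)) ||
		    __get_user(n, buffer))
			return -EFAULT;		/* instead of the expected -EINVAL */
		/* ... the remaining four arguments are fetched the same way ... */
	}
	return sys_select(n, inp, outp, exp, tvp);
}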


Am I missing something ?

Christophe


Re: [PATCH v2] powerpc/32: remove bogus ppc_select syscall

2021-03-04 Thread Arnd Bergmann
On Thu, Mar 4, 2021 at 1:51 PM Christophe Leroy
 wrote:
>
> From: Arnd Bergmann 
>
> The ppc_select function was introduced in linux-2.3.48 in order to support
> code confusing the legacy select() calling convention with the standard one.
> Even 24 years ago, all correctly built code should not have done this and
> could have easily been phased out. Nothing that was compiled later should
> actually try to use the old_select interface, and it would have been broken
> already on all ppc64 kernels with the syscall emulation layer.
>
> This patch brings the 32 bit compat ABI and the native 32 bit ABI for
> powerpc into a consistent state, by removing support for both the
> old_select system call number and the handler for it.
>
> The bug report triggering this came from
> Halesh Sadashiiv , who discovered that the
> 32 bit implementation of ppc_select would in case of a negative number
> of file descriptors incorrectly return -EFAULT instead of -EINVAL.
> There seems to be no way to fix this problem in a way that would
> keep broken pre-1997 binaries running.
>
> Signed-off-by: Arnd Bergmann 
> Cc: Halesh Sadashiiv 
> [chleroy: Rebased and updated the number of years elapsed in the commit 
> message]
> Signed-off-by: Christophe Leroy 
> ---
> First version was in 2008, at that time it was rejected, see
> http://patchwork.ozlabs.org/project/linuxppc-dev/patch/200809240839.14902.a...@arndb.de/

The patch from 2008 did two things:

- it removed the ppc32 specific 'select' syscall at #82
- it fixed the generic '_newselect' syscall at #142

Back then, the decision was to only address the second issue, which
got merged in commit dad2f2fb0fc7 ("powerpc: Fix wrong error code from
ppc32 select syscall").

It is probably ok to remove the old select system call now, but
my changelog text no longer makes sense, as the patch has nothing
to do with the bug that was reported back then.

   Arnd


Re: [PATCH v1] powerpc: Include running function as first entry in save_stack_trace() and friends

2021-03-04 Thread Mark Rutland
[adding Mark Brown]

On Wed, Mar 03, 2021 at 04:20:43PM +0100, Marco Elver wrote:
> On Wed, Mar 03, 2021 at 03:52PM +0100, Christophe Leroy wrote:
> > Le 03/03/2021 à 15:38, Marco Elver a écrit :
> > > On Wed, 3 Mar 2021 at 15:09, Christophe Leroy
> > >  wrote:
> > > > 
> > > > It seems like all other sane architectures, namely x86 and arm64
> > > > at least, include the running function as top entry when saving
> > > > stack trace.
> > > > 
> > > > Functionalities like KFENCE expect it.
> > > > 
> > > > Do the same on powerpc, it allows KFENCE to properly identify the 
> > > > faulting
> > > > function as depicted below. Before the patch KFENCE was identifying
> > > > finish_task_switch.isra as the faulting function.
> > > > 
> > > > [   14.937370] 
> > > > ==
> > > > [   14.948692] BUG: KFENCE: invalid read in 
> > > > test_invalid_access+0x54/0x108
> > > > [   14.948692]
> > > > [   14.956814] Invalid read at 0xdf98800a:
> > > > [   14.960664]  test_invalid_access+0x54/0x108
> > > > [   14.964876]  finish_task_switch.isra.0+0x54/0x23c
> > > > [   14.969606]  kunit_try_run_case+0x5c/0xd0
> > > > [   14.973658]  kunit_generic_run_threadfn_adapter+0x24/0x30
> > > > [   14.979079]  kthread+0x15c/0x174
> > > > [   14.982342]  ret_from_kernel_thread+0x14/0x1c
> > > > [   14.986731]
> > > > [   14.988236] CPU: 0 PID: 111 Comm: kunit_try_catch Tainted: GB
> > > >  5.12.0-rc1-01537-g95f6e2088d7e-dirty #4682
> > > > [   14.999795] NIP:  c016ec2c LR: c02f517c CTR: c016ebd8
> > > > [   15.004851] REGS: e2449d90 TRAP: 0301   Tainted: GB  
> > > > (5.12.0-rc1-01537-g95f6e2088d7e-dirty)
> > > > [   15.015274] MSR:  9032   CR: 2204  XER: 
> > > > 
> > > > [   15.022043] DAR: df98800a DSISR: 2000
> > > > [   15.022043] GPR00: c02f517c e2449e50 c1142080 e100dd24 c084b13c 
> > > > 0008 c084b32b c016ebd8
> > > > [   15.022043] GPR08: c085 df988000 c0d1 e2449eb0 22000288
> > > > [   15.040581] NIP [c016ec2c] test_invalid_access+0x54/0x108
> > > > [   15.046010] LR [c02f517c] kunit_try_run_case+0x5c/0xd0
> > > > [   15.051181] Call Trace:
> > > > [   15.053637] [e2449e50] [c005a68c] 
> > > > finish_task_switch.isra.0+0x54/0x23c (unreliable)
> > > > [   15.061338] [e2449eb0] [c02f517c] kunit_try_run_case+0x5c/0xd0
> > > > [   15.067215] [e2449ed0] [c02f648c] 
> > > > kunit_generic_run_threadfn_adapter+0x24/0x30
> > > > [   15.074472] [e2449ef0] [c004e7b0] kthread+0x15c/0x174
> > > > [   15.079571] [e2449f30] [c001317c] ret_from_kernel_thread+0x14/0x1c
> > > > [   15.085798] Instruction dump:
> > > > [   15.088784] 8129d608 38e7ebd8 81020280 911f004c 3900 995f0024 
> > > > 907f0028 90ff001c
> > > > [   15.096613] 3949000a 915f0020 3d40c0d1 3d00c085 <8929000a> 3908adb0 
> > > > 812a4b98 3d40c02f
> > > > [   15.104612] 
> > > > ==
> > > > 
> > > > Signed-off-by: Christophe Leroy 
> > > 
> > > Acked-by: Marco Elver 
> > > 
> > > Thank you, I think this looks like the right solution. Just a question 
> > > below:
> > > 
> > ...
> > 
> > > > @@ -59,23 +70,26 @@ void save_stack_trace(struct stack_trace *trace)
> > > > 
> > > >  sp = current_stack_frame();
> > > > 
> > > > -   save_context_stack(trace, sp, current, 1);
> > > > +   save_context_stack(trace, sp, (unsigned long)save_stack_trace, 
> > > > current, 1);
> > > 
> > > This causes ip == save_stack_trace and also below for
> > > save_stack_trace_tsk. Does this mean save_stack_trace() is included in
> > > the trace? Looking at kernel/stacktrace.c, I think the library wants
> > > to exclude itself from the trace, as it does '.skip = skipnr + 1' (and
> > > '.skip   = skipnr + (current == tsk)' for the _tsk variant).
> > > 
> > > If the arch-helper here is included, should this use _RET_IP_ instead?
> > > 
> > 
> > Don't really know, I was inspired by arm64 which has:
> > 
> > void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> >  struct task_struct *task, struct pt_regs *regs)
> > {
> > struct stackframe frame;
> > 
> > if (regs)
> > start_backtrace(&frame, regs->regs[29], regs->pc);
> > else if (task == current)
> > start_backtrace(&frame,
> > (unsigned long)__builtin_frame_address(0),
> > (unsigned long)arch_stack_walk);
> > else
> > start_backtrace(&frame, thread_saved_fp(task),
> > thread_saved_pc(task));
> > 
> > walk_stackframe(task, &frame, consume_entry, cookie);
> > }
> > 
> > But looking at x86 you may be right, so what should be done really ?
> 
> x86:
> 
> [2.843292] calling stack_trace_save:
> [2.843705]  test_func+0x6c/0x118
> [2.844184]  do_one_initcall+0x58/0x270
> [2.844618]  kernel_init_freeable+0x1da/0x23a
> [2.845110]  kernel_init+0xc/0x166
> [2.
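
For reference, the generic-layer behaviour referred to above: kernel/stacktrace.c
(roughly as of v5.12, with CONFIG_ARCH_STACKWALK) bumps the caller-supplied
skipnr by one so the library excludes its own frame, which is why an arch
helper that also records its own entry ends up double-counted:

unsigned int stack_trace_save(unsigned long *store, unsigned int size,
			      unsigned int skipnr)
{
	stack_trace_consume_fn consume_entry = stack_trace_consume_entry;
	struct stacktrace_cookie c = {
		.store	= store,
		.size	= size,
		.skip	= skipnr + 1,	/* skip stack_trace_save() itself */
	};

	arch_stack_walk(consume_entry, &c, current, NULL);
	return c.len;
}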

[PATCH v2 4/4] powerpc: Enable KFENCE on BOOK3S/64

2021-03-04 Thread Christophe Leroy
This reuses the DEBUG_PAGEALLOC logic.

Tested on qemu with ppc64_defconfig + CONFIG_KFENCE + CONFIG_KUNIT +
CONFIG_KFENCE_KUNIT_TEST.

Signed-off-by: Christophe Leroy 
---
v2: New
---
 arch/powerpc/Kconfig  |  2 +-
 arch/powerpc/include/asm/kfence.h |  8 
 arch/powerpc/mm/book3s64/hash_utils.c | 29 +--
 3 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d46db0bfb998..67c47b60cc84 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -185,7 +185,7 @@ config PPC
select HAVE_ARCH_KASAN  if PPC32 && PPC_PAGE_SHIFT <= 14
select HAVE_ARCH_KASAN_VMALLOC  if PPC32 && PPC_PAGE_SHIFT <= 14
select HAVE_ARCH_KGDB
-   select HAVE_ARCH_KFENCE if PPC32
+   select HAVE_ARCH_KFENCE if ARCH_SUPPORTS_DEBUG_PAGEALLOC
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
select HAVE_ARCH_NVRAM_OPS
diff --git a/arch/powerpc/include/asm/kfence.h 
b/arch/powerpc/include/asm/kfence.h
index a9846b68c6b9..532cc1a92fa5 100644
--- a/arch/powerpc/include/asm/kfence.h
+++ b/arch/powerpc/include/asm/kfence.h
@@ -11,11 +11,18 @@
 #include <linux/mm.h>
 #include <asm/pgtable.h>
 
+#if defined(CONFIG_PPC64) && !defined(PPC64_ELF_ABI_v2)
+#define ARCH_FUNC_PREFIX "."
+#endif
+
 static inline bool arch_kfence_init_pool(void)
 {
return true;
 }
 
+#ifdef CONFIG_PPC64
+bool kfence_protect_page(unsigned long addr, bool protect);
+#else
 static inline bool kfence_protect_page(unsigned long addr, bool protect)
 {
pte_t *kpte = virt_to_kpte(addr);
@@ -29,5 +36,6 @@ static inline bool kfence_protect_page(unsigned long addr, 
bool protect)
 
return true;
 }
+#endif
 
 #endif /* __ASM_POWERPC_KFENCE_H */
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
b/arch/powerpc/mm/book3s64/hash_utils.c
index cb09a49be798..b967a6403e59 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -323,8 +323,8 @@ int htab_bolt_mapping(unsigned long vstart, unsigned long 
vend,
break;
 
cond_resched();
-   if (debug_pagealloc_enabled() &&
-   (paddr >> PAGE_SHIFT) < linear_map_hash_count)
+   if (debug_pagealloc_enabled_or_kfence() &&
+   (paddr >> PAGE_SHIFT) < linear_map_hash_count)
linear_map_hash_slots[paddr >> PAGE_SHIFT] = ret | 0x80;
}
return ret < 0 ? ret : 0;
@@ -661,7 +661,7 @@ static void __init htab_init_page_sizes(void)
bool aligned = true;
init_hpte_page_sizes();
 
-   if (!debug_pagealloc_enabled()) {
+   if (!debug_pagealloc_enabled_or_kfence()) {
/*
 * Pick a size for the linear mapping. Currently, we only
 * support 16M, 1M and 4K which is the default
@@ -949,7 +949,7 @@ static void __init htab_initialize(void)
 
prot = pgprot_val(PAGE_KERNEL);
 
-   if (debug_pagealloc_enabled()) {
+   if (debug_pagealloc_enabled_or_kfence()) {
linear_map_hash_count = memblock_end_of_DRAM() >> PAGE_SHIFT;
linear_map_hash_slots = memblock_alloc_try_nid(
linear_map_hash_count, 1, MEMBLOCK_LOW_LIMIT,
@@ -1927,7 +1927,7 @@ long hpte_insert_repeating(unsigned long hash, unsigned 
long vpn,
return slot;
 }
 
-#ifdef CONFIG_DEBUG_PAGEALLOC
+#if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE)
 static DEFINE_SPINLOCK(linear_map_hash_lock);
 
 static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
@@ -1982,6 +1982,21 @@ static void kernel_unmap_linear_page(unsigned long 
vaddr, unsigned long lmi)
 mmu_kernel_ssize, 0);
 }
 
+#ifdef CONFIG_KFENCE
+bool kfence_protect_page(unsigned long addr, bool protect)
+{
+   unsigned long lmi = __pa(addr) >> PAGE_SHIFT;
+
+   if (protect)
+   kernel_unmap_linear_page(addr, lmi);
+   else
+   kernel_map_linear_page(addr, lmi);
+
+   return true;
+}
+#endif
+
+#ifdef CONFIG_DEBUG_PAGEALLOC
 void __kernel_map_pages(struct page *page, int numpages, int enable)
 {
unsigned long flags, vaddr, lmi;
@@ -2000,7 +2015,9 @@ void __kernel_map_pages(struct page *page, int numpages, 
int enable)
}
local_irq_restore(flags);
 }
-#endif /* CONFIG_DEBUG_PAGEALLOC */
+#endif
+
+#endif /* CONFIG_DEBUG_PAGEALLOC || CONFIG_KFENCE */
 
 void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base,
phys_addr_t first_memblock_size)
-- 
2.25.0



[PATCH v2 1/4] powerpc: Enable KFENCE for PPC32

2021-03-04 Thread Christophe Leroy
Add architecture specific implementation details for KFENCE and enable
KFENCE for the ppc32 architecture. In particular, this implements the
required interface in <asm/kfence.h>.

KFENCE requires that attributes for pages from its memory pool can
individually be set. Therefore, force the Read/Write linear map to be
mapped at page granularity.

Signed-off-by: Christophe Leroy 
Acked-by: Marco Elver 
---
v2: Added debug_pagealloc_enabled_or_kfence()
---
 arch/powerpc/Kconfig  | 13 ++--
 arch/powerpc/include/asm/kfence.h | 33 +++
 arch/powerpc/mm/book3s32/mmu.c|  2 +-
 arch/powerpc/mm/fault.c   |  7 ++-
 arch/powerpc/mm/init_32.c |  3 +++
 arch/powerpc/mm/mmu_decl.h|  5 +
 arch/powerpc/mm/nohash/8xx.c  |  4 ++--
 7 files changed, 57 insertions(+), 10 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kfence.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 386ae12d8523..d46db0bfb998 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -185,6 +185,7 @@ config PPC
select HAVE_ARCH_KASAN  if PPC32 && PPC_PAGE_SHIFT <= 14
select HAVE_ARCH_KASAN_VMALLOC  if PPC32 && PPC_PAGE_SHIFT <= 14
select HAVE_ARCH_KGDB
+   select HAVE_ARCH_KFENCE if PPC32
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
select HAVE_ARCH_NVRAM_OPS
@@ -786,7 +787,7 @@ config THREAD_SHIFT
 config DATA_SHIFT_BOOL
bool "Set custom data alignment"
depends on ADVANCED_OPTIONS
-   depends on STRICT_KERNEL_RWX || DEBUG_PAGEALLOC
+   depends on STRICT_KERNEL_RWX || DEBUG_PAGEALLOC || KFENCE
depends on PPC_BOOK3S_32 || (PPC_8xx && !PIN_TLB_DATA && 
!STRICT_KERNEL_RWX)
help
  This option allows you to set the kernel data alignment. When
@@ -798,13 +799,13 @@ config DATA_SHIFT_BOOL
 config DATA_SHIFT
int "Data shift" if DATA_SHIFT_BOOL
default 24 if STRICT_KERNEL_RWX && PPC64
-   range 17 28 if (STRICT_KERNEL_RWX || DEBUG_PAGEALLOC) && PPC_BOOK3S_32
-   range 19 23 if (STRICT_KERNEL_RWX || DEBUG_PAGEALLOC) && PPC_8xx
+   range 17 28 if (STRICT_KERNEL_RWX || DEBUG_PAGEALLOC || KFENCE) && 
PPC_BOOK3S_32
+   range 19 23 if (STRICT_KERNEL_RWX || DEBUG_PAGEALLOC || KFENCE) && 
PPC_8xx
default 22 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
-   default 18 if DEBUG_PAGEALLOC && PPC_BOOK3S_32
+   default 18 if (DEBUG_PAGEALLOC || KFENCE) && PPC_BOOK3S_32
default 23 if STRICT_KERNEL_RWX && PPC_8xx
-   default 23 if DEBUG_PAGEALLOC && PPC_8xx && PIN_TLB_DATA
-   default 19 if DEBUG_PAGEALLOC && PPC_8xx
+   default 23 if (DEBUG_PAGEALLOC || KFENCE) && PPC_8xx && PIN_TLB_DATA
+   default 19 if (DEBUG_PAGEALLOC || KFENCE) && PPC_8xx
default PPC_PAGE_SHIFT
help
  On Book3S 32 (603+), DBATs are used to map kernel text and rodata RO.
diff --git a/arch/powerpc/include/asm/kfence.h 
b/arch/powerpc/include/asm/kfence.h
new file mode 100644
index ..a9846b68c6b9
--- /dev/null
+++ b/arch/powerpc/include/asm/kfence.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * powerpc KFENCE support.
+ *
+ * Copyright (C) 2020 CS GROUP France
+ */
+
+#ifndef __ASM_POWERPC_KFENCE_H
+#define __ASM_POWERPC_KFENCE_H
+
+#include <linux/mm.h>
+#include <asm/pgtable.h>
+
+static inline bool arch_kfence_init_pool(void)
+{
+   return true;
+}
+
+static inline bool kfence_protect_page(unsigned long addr, bool protect)
+{
+   pte_t *kpte = virt_to_kpte(addr);
+
+   if (protect) {
+   pte_update(&init_mm, addr, kpte, _PAGE_PRESENT, 0, 0);
+   flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+   } else {
+   pte_update(&init_mm, addr, kpte, 0, _PAGE_PRESENT, 0);
+   }
+
+   return true;
+}
+
+#endif /* __ASM_POWERPC_KFENCE_H */
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index d7eb266a3f7a..a0db398b5c26 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -162,7 +162,7 @@ unsigned long __init mmu_mapin_ram(unsigned long base, 
unsigned long top)
unsigned long border = (unsigned long)__init_begin - PAGE_OFFSET;
 
 
-   if (debug_pagealloc_enabled() || __map_without_bats) {
+   if (debug_pagealloc_enabled_or_kfence() || __map_without_bats) {
pr_debug_once("Read-Write memory mapped without BATs\n");
if (base >= border)
return base;
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index bb368257b55c..bea13682c909 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include <linux/kfence.h>
 
 #include 
 #include 
@@ -418,8 +419,12 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned 
long address,
 * take a page fault to a kernel address or a page fault to a user
   

[PATCH v2 2/4] powerpc/64s: Remove unneeded #ifdef CONFIG_DEBUG_PAGEALLOC in hash_utils

2021-03-04 Thread Christophe Leroy
debug_pagealloc_enabled() is always defined and constant folds to
'false' when CONFIG_DEBUG_PAGEALLOC is not enabled.

Remove the #ifdefs; the code and associated static variables will
be optimised out by the compiler when CONFIG_DEBUG_PAGEALLOC is
not defined.

Signed-off-by: Christophe Leroy 
---
v2: New
---
 arch/powerpc/mm/book3s64/hash_utils.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
b/arch/powerpc/mm/book3s64/hash_utils.c
index 581b20a2feaf..f1b5a5f1d3a9 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -126,11 +126,8 @@ EXPORT_SYMBOL_GPL(mmu_slb_size);
 #ifdef CONFIG_PPC_64K_PAGES
 int mmu_ci_restrictions;
 #endif
-#ifdef CONFIG_DEBUG_PAGEALLOC
 static u8 *linear_map_hash_slots;
 static unsigned long linear_map_hash_count;
-static DEFINE_SPINLOCK(linear_map_hash_lock);
-#endif /* CONFIG_DEBUG_PAGEALLOC */
 struct mmu_hash_ops mmu_hash_ops;
 EXPORT_SYMBOL(mmu_hash_ops);
 
@@ -326,11 +323,9 @@ int htab_bolt_mapping(unsigned long vstart, unsigned long 
vend,
break;
 
cond_resched();
-#ifdef CONFIG_DEBUG_PAGEALLOC
if (debug_pagealloc_enabled() &&
(paddr >> PAGE_SHIFT) < linear_map_hash_count)
linear_map_hash_slots[paddr >> PAGE_SHIFT] = ret | 0x80;
-#endif /* CONFIG_DEBUG_PAGEALLOC */
}
return ret < 0 ? ret : 0;
 }
@@ -954,7 +949,6 @@ static void __init htab_initialize(void)
 
prot = pgprot_val(PAGE_KERNEL);
 
-#ifdef CONFIG_DEBUG_PAGEALLOC
if (debug_pagealloc_enabled()) {
linear_map_hash_count = memblock_end_of_DRAM() >> PAGE_SHIFT;
linear_map_hash_slots = memblock_alloc_try_nid(
@@ -964,7 +958,6 @@ static void __init htab_initialize(void)
panic("%s: Failed to allocate %lu bytes max_addr=%pa\n",
  __func__, linear_map_hash_count, &ppc64_rma_size);
}
-#endif /* CONFIG_DEBUG_PAGEALLOC */
 
/* create bolted the linear mapping in the hash table */
for_each_mem_range(i, &base, &end) {
@@ -1935,6 +1928,8 @@ long hpte_insert_repeating(unsigned long hash, unsigned 
long vpn,
 }
 
 #ifdef CONFIG_DEBUG_PAGEALLOC
+static DEFINE_SPINLOCK(linear_map_hash_lock);
+
 static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
 {
unsigned long hash;
-- 
2.25.0



[PATCH v2 3/4] powerpc/64s: Allow double call of kernel_[un]map_linear_page()

2021-03-04 Thread Christophe Leroy
If the page is already mapped or already unmapped, bail out.

Signed-off-by: Christophe Leroy 
---
v2: New
---
 arch/powerpc/mm/book3s64/hash_utils.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
b/arch/powerpc/mm/book3s64/hash_utils.c
index f1b5a5f1d3a9..cb09a49be798 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1944,6 +1944,9 @@ static void kernel_map_linear_page(unsigned long vaddr, 
unsigned long lmi)
if (!vsid)
return;
 
+   if (linear_map_hash_slots[lmi] & 0x80)
+   return;
+
ret = hpte_insert_repeating(hash, vpn, __pa(vaddr), mode,
HPTE_V_BOLTED,
mmu_linear_psize, mmu_kernel_ssize);
@@ -1963,7 +1966,10 @@ static void kernel_unmap_linear_page(unsigned long 
vaddr, unsigned long lmi)
 
hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize);
spin_lock(&linear_map_hash_lock);
-   BUG_ON(!(linear_map_hash_slots[lmi] & 0x80));
+   if (!(linear_map_hash_slots[lmi] & 0x80)) {
+   spin_unlock(&linear_map_hash_lock);
+   return;
+   }
hidx = linear_map_hash_slots[lmi] & 0x7f;
linear_map_hash_slots[lmi] = 0;
spin_unlock(&linear_map_hash_lock);
-- 
2.25.0



Re: [PATCH 2/5] CMDLINE: drivers: of: ifdef out cmdline section

2021-03-04 Thread Rob Herring
On Wed, Mar 3, 2021 at 10:48 PM Daniel Walker  wrote:
>
> It looks like there's some seepage of cmdline stuff into
> the generic device tree code. This conflicts with the
> generic cmdline implementation so I remove it in the case
> when that's enabled.
>
> Cc: xe-linux-exter...@cisco.com
> Signed-off-by: Ruslan Ruslichenko 
> Signed-off-by: Daniel Walker 
> ---
>  drivers/of/fdt.c | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index feb0f2d67fc5..cfe4f8d3c9f5 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include   /* for COMMAND_LINE_SIZE */
>  #include 
> @@ -1048,8 +1049,18 @@ int __init early_init_dt_scan_chosen(unsigned long 
> node, const char *uname,
>
> early_init_dt_check_for_initrd(node);
>
> +#ifdef CONFIG_GENERIC_CMDLINE

What I like about Christophe's version is it removes the old DT
implementation. Who's going to convert the rest of the DT based
arches? That's arm, arm64, hexagon, microblaze, nios2, openrisc,
riscv, sh, and xtensa. Either separate the common code from the config
like Christophe's version or these all need converting. Though it's
fine to hash out patch 1 with a couple of arches first.

> /* Retrieve command line */
> p = of_get_flat_dt_prop(node, "bootargs", &l);

This needs to be outside the ifdef.

> +
> +   /*
> +* The builtin command line will be added here, or it can override
> +* with the DT bootargs.
> +*/
> +   cmdline_add_builtin(data,
> +   ((p != NULL && l > 0) ? p : NULL), /* This is 
> sanity checking */
> +   COMMAND_LINE_SIZE);
> +#else
> if (p != NULL && l > 0)
> strlcpy(data, p, min(l, COMMAND_LINE_SIZE));
>
> @@ -1070,6 +1081,7 @@ int __init early_init_dt_scan_chosen(unsigned long 
> node, const char *uname,
> strlcpy(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
>  #endif
>  #endif /* CONFIG_CMDLINE */
> +#endif /* CONFIG_GENERIC_CMDLINE */
>
> pr_debug("Command line is: %s\n", (char *)data);
>
> --
> 2.25.1
>
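
A hedged sketch of the structure suggested above (illustrative only;
cmdline_add_builtin() is the helper proposed by this series, not an existing
upstream API):

	/* Retrieve command line, unconditionally */
	p = of_get_flat_dt_prop(node, "bootargs", &l);

#ifdef CONFIG_GENERIC_CMDLINE
	/* Combine the DT bootargs with the builtin command line */
	cmdline_add_builtin(data, (p != NULL && l > 0) ? p : NULL,
			    COMMAND_LINE_SIZE);
#else
	if (p != NULL && l > 0)
		strlcpy(data, p, min(l, COMMAND_LINE_SIZE));
	/* ... existing CONFIG_CMDLINE handling ... */
#endif /* CONFIG_GENERIC_CMDLINE */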


Re: [RFC PATCH v1] powerpc: Enable KFENCE for PPC32

2021-03-04 Thread Christophe Leroy




On 04/03/2021 at 13:48, Marco Elver wrote:

 From d118080eb9552073f5dcf1f86198f3d86d5ea850 Mon Sep 17 00:00:00 2001
From: Marco Elver 
Date: Thu, 4 Mar 2021 13:15:51 +0100
Subject: [PATCH] kfence: fix reports if constant function prefixes exist

Some architectures prefix all functions with a constant string ('.' on
ppc64). Add ARCH_FUNC_PREFIX, which may optionally be defined in
<asm/kfence.h>, so that get_stack_skipnr() can work properly.



It works, thanks.



Link: https://lkml.kernel.org/r/f036c53d-7e81-763c-47f4-6024c6c5f...@csgroup.eu
Reported-by: Christophe Leroy 
Signed-off-by: Marco Elver 


Tested-by: Christophe Leroy 


---
  mm/kfence/report.c | 18 --
  1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/mm/kfence/report.c b/mm/kfence/report.c
index 519f037720f5..e3f71451ad9e 100644
--- a/mm/kfence/report.c
+++ b/mm/kfence/report.c
@@ -20,6 +20,11 @@
  
  #include "kfence.h"
  
+/* May be overridden by <asm/kfence.h>. */

+#ifndef ARCH_FUNC_PREFIX
+#define ARCH_FUNC_PREFIX ""
+#endif
+
  extern bool no_hash_pointers;
  
  /* Helper function to either print to a seq_file or to console. */

@@ -67,8 +72,9 @@ static int get_stack_skipnr(const unsigned long 
stack_entries[], int num_entries
for (skipnr = 0; skipnr < num_entries; skipnr++) {
int len = scnprintf(buf, sizeof(buf), "%ps", (void 
*)stack_entries[skipnr]);
  
-		if (str_has_prefix(buf, "kfence_") || str_has_prefix(buf, "__kfence_") ||

-   !strncmp(buf, "__slab_free", len)) {
+   if (str_has_prefix(buf, ARCH_FUNC_PREFIX "kfence_") ||
+   str_has_prefix(buf, ARCH_FUNC_PREFIX "__kfence_") ||
+   !strncmp(buf, ARCH_FUNC_PREFIX "__slab_free", len)) {
/*
 * In case of tail calls from any of the below
 * to any of the above.
@@ -77,10 +83,10 @@ static int get_stack_skipnr(const unsigned long 
stack_entries[], int num_entries
}
  
  		/* Also the *_bulk() variants by only checking prefixes. */

-   if (str_has_prefix(buf, "kfree") ||
-   str_has_prefix(buf, "kmem_cache_free") ||
-   str_has_prefix(buf, "__kmalloc") ||
-   str_has_prefix(buf, "kmem_cache_alloc"))
+   if (str_has_prefix(buf, ARCH_FUNC_PREFIX "kfree") ||
+   str_has_prefix(buf, ARCH_FUNC_PREFIX "kmem_cache_free") ||
+   str_has_prefix(buf, ARCH_FUNC_PREFIX "__kmalloc") ||
+   str_has_prefix(buf, ARCH_FUNC_PREFIX "kmem_cache_alloc"))
goto found;
}
if (fallback < num_entries)
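
As a standalone illustration of the matching above (a userspace sketch, not
kernel code; the kernel's str_has_prefix() returns the matched length rather
than a boolean):

#include <stdio.h>
#include <string.h>

#define ARCH_FUNC_PREFIX "."	/* as on ppc64 ELFv1 */

static int has_prefix(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

int main(void)
{
	const char *frame = ".kfence_guarded_free+0x2e4/0x530";

	/* prints "0 1": only the prefixed pattern matches */
	printf("%d %d\n", has_prefix(frame, "kfence_"),
	       has_prefix(frame, ARCH_FUNC_PREFIX "kfence_"));
	return 0;
}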



Re: [PATCH v2 1/3] powerpc: sstep: Fix load and update emulation

2021-03-04 Thread Naveen N. Rao
On 2021/03/01 08:37PM, Segher Boessenkool wrote:
> Hi!
> 
> I didn't see this until now, almost a month later, sorry about that :-)

No problem.

> 
> On Thu, Feb 04, 2021 at 01:57:53PM +0530, Naveen N. Rao wrote:
> > On 2021/02/03 03:17PM, Segher Boessenkool wrote:
> > > Power8 does:
> > > 
> > >   Load with Update Instructions (RA = 0)
> > > EA is placed into R0.
> > >   Load with Update Instructions (RA = RT)
> > > EA is placed into RT. The storage operand addressed by EA is
> > > accessed, but the data returned by the load is discarded.
> > 
> > I'm actually not seeing that. This is what I am testing with:
> > li  8,0xaaa
> > mr  6,1
> > std 8,64(6)
> > #ldu6,64(6)
> > .long   0xe8c60041
> > 
> > And, r6 always ends up with 0xaea. It changes with the value I put into 
> > r6 though.
> 
> That is exactly the behaviour specified for p8.  0aaa+0040=0aea.
> 
> > Granted, this is all up in the air, but it does look like there is more 
> > going on and the value isn't the EA or the value at the address.
> 
> That *is* the EA.  The EA is the address the insn does the access at.

I'm probably missing something here. 0xaaa is the value I stored at an 
offset of 64 bytes from the stack pointer (r1 is copied into r6). In the 
ldu instruction above, the EA is 64(r6), which should translate to 
r1+64.  The data returned by the load would be 0xaaa, which should be 
discarded per the description you provided above. So, I would expect to 
see a 0xc0.. address in r6.

In fact, this looks to be the behavior documented for P9:

> > Power9 does:
> >
> >   Load with Update Instructions (RA = 0)
> > EA is placed into R0.
> >   Load with Update Instructions (RA = RT)
> > The storage operand addressed by EA is accessed. The 
> > displacement
> > field is added to the data returned by the load and placed into 
> > RT.

- Naveen
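
To spell out the two documented behaviours with the numbers from this thread,
here is a hedged C model (illustrative only) of "ldu r6,64(r6)" with RA == RT,
given that 0xaaa was stored at r6 + 64:

/* P8 documentation: EA is placed into RT. */
unsigned long ldu_invalid_form_p8(unsigned long r6)
{
	return r6 + 64;				/* a stack address here */
}

/* P9 documentation: the displacement is added to the loaded data. */
unsigned long ldu_invalid_form_p9(unsigned long r6)
{
	unsigned long data = *(unsigned long *)(r6 + 64);	/* 0xaaa */

	return data + 64;			/* 0xaaa + 0x40 = 0xaea */
}

The observed 0xaea matches the second model, which is the point made above.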


[PATCH v2] powerpc/32: remove bogus ppc_select syscall

2021-03-04 Thread Christophe Leroy
From: Arnd Bergmann 

The ppc_select function was introduced in linux-2.3.48 in order to support
code confusing the legacy select() calling convention with the standard one.
Even 24 years ago, no correctly built code should have done this, and the
old convention could easily have been phased out. Nothing compiled later should
actually try to use the old_select interface, and it would have been broken
already on all ppc64 kernels with the syscall emulation layer.

This patch brings the 32 bit compat ABI and the native 32 bit ABI for
powerpc into a consistent state, by removing support for both the
old_select system call number and the handler for it.

The bug report triggering this came from
Halesh Sadashiiv , who discovered that the
32 bit implementation of ppc_select would in case of a negative number
of file descriptors incorrectly return -EFAULT instead of -EINVAL.
There seems to be no way to fix this problem in a way that would
keep broken pre-1997 binaries running.

Signed-off-by: Arnd Bergmann 
Cc: Halesh Sadashiiv 
[chleroy: Rebased and updated the number of years elapsed in the commit message]
Signed-off-by: Christophe Leroy 
---
The first version was posted in 2008; at that time it was rejected, see
http://patchwork.ozlabs.org/project/linuxppc-dev/patch/200809240839.14902.a...@arndb.de/

If we decide to still keep this, then we'll have to:
- take into account -4096 < fd < 0 case
- use unsafe_get_user inside a uaccess_begin block (sketched below)
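
A hedged sketch of that second point, assuming the existing
user_access_begin()/unsafe_get_user() API and mirroring the argument block
layout of the removed code (illustrative only):

static int get_old_select_args(unsigned long *n, fd_set __user **inp,
			       fd_set __user **outp, fd_set __user **exp,
			       struct __kernel_old_timeval __user **tvp,
			       unsigned long __user *buffer)
{
	if (!user_access_begin(buffer, 5 * sizeof(unsigned long)))
		return -EFAULT;
	unsafe_get_user(*n, buffer, efault);
	unsafe_get_user(*inp, (fd_set __user * __user *)(buffer + 1), efault);
	unsafe_get_user(*outp, (fd_set __user * __user *)(buffer + 2), efault);
	unsafe_get_user(*exp, (fd_set __user * __user *)(buffer + 3), efault);
	unsafe_get_user(*tvp,
			(struct __kernel_old_timeval __user * __user *)(buffer + 4),
			efault);
	user_access_end();
	return 0;
efault:
	user_access_end();
	return -EFAULT;
}
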
---
 arch/powerpc/include/asm/asm-prototypes.h |  3 ---
 arch/powerpc/kernel/syscalls.c| 25 ---
 arch/powerpc/kernel/syscalls/syscall.tbl  |  4 +---
 3 files changed, 1 insertion(+), 31 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
b/arch/powerpc/include/asm/asm-prototypes.h
index 939f3c94c8f3..78e0a3bd448a 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -63,9 +63,6 @@ long sys_swapcontext(struct ucontext __user *old_ctx,
 #ifdef CONFIG_PPC32
 long sys_debug_setcontext(struct ucontext __user *ctx,
  int ndbg, struct sig_dbg_op __user *dbg);
-int
-ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp,
-  struct __kernel_old_timeval __user *tvp);
 unsigned long __init early_init(unsigned long dt_ptr);
 void __init machine_init(u64 dt_ptr);
 #endif
diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
index 078608ec2e92..70b0eb5bedfd 100644
--- a/arch/powerpc/kernel/syscalls.c
+++ b/arch/powerpc/kernel/syscalls.c
@@ -71,31 +71,6 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, size_t, len,
return do_mmap2(addr, len, prot, flags, fd, offset, PAGE_SHIFT);
 }
 
-#ifdef CONFIG_PPC32
-/*
- * Due to some executables calling the wrong select we sometimes
- * get wrong args.  This determines how the args are being passed
- * (a single ptr to them all args passed) then calls
- * sys_select() with the appropriate args. -- Cort
- */
-int
-ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp, 
struct __kernel_old_timeval __user *tvp)
-{
-   if ( (unsigned long)n >= 4096 )
-   {
-   unsigned long __user *buffer = (unsigned long __user *)n;
-   if (!access_ok(buffer, 5*sizeof(unsigned long))
-   || __get_user(n, buffer)
-   || __get_user(inp, ((fd_set __user * __user *)(buffer+1)))
-   || __get_user(outp, ((fd_set  __user * __user *)(buffer+2)))
-   || __get_user(exp, ((fd_set  __user * __user *)(buffer+3)))
-   || __get_user(tvp, ((struct __kernel_old_timeval  __user * 
__user *)(buffer+4
-   return -EFAULT;
-   }
-   return sys_select(n, inp, outp, exp, tvp);
-}
-#endif
-
 #ifdef CONFIG_PPC64
 long ppc64_personality(unsigned long personality)
 {
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
b/arch/powerpc/kernel/syscalls/syscall.tbl
index 0b2480cf3e47..5bb0e90e502e 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -110,9 +110,7 @@
 79 common  settimeofdaysys_settimeofday
compat_sys_settimeofday
 80 common  getgroups   sys_getgroups
 81 common  setgroups   sys_setgroups
-82 32  select  ppc_select  
sys_ni_syscall
-82 64  select  sys_ni_syscall
-82 spu select  sys_ni_syscall
+82 common  select  sys_ni_syscall
 83 common  symlink sys_symlink
 84 32  oldlstatsys_lstat   
sys_ni_syscall
 84 64  oldlstatsys_ni_syscall
-- 
2.25.0
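
For readers unfamiliar with the convention being removed: the legacy interface
passed a single user pointer to a five-word argument block in place of the
usual five arguments, which ppc_select() detected by treating any first
argument >= 4096 as a pointer. A hedged userspace illustration (the names are
made up for the sketch):

#include <sys/select.h>
#include <sys/time.h>

struct old_select_args {
	unsigned long n;
	fd_set *inp, *outp, *exp;
	struct timeval *tvp;
};

/* What a pre-1997 caller effectively asked the kernel to do. */
static int old_select(struct old_select_args *args)
{
	return select((int)args->n, args->inp, args->outp, args->exp,
		      args->tvp);
}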



Re: [RFC PATCH v1] powerpc: Enable KFENCE for PPC32

2021-03-04 Thread Christophe Leroy




On 04/03/2021 at 12:48, Christophe Leroy wrote:



On 04/03/2021 at 12:31, Marco Elver wrote:

On Thu, 4 Mar 2021 at 12:23, Christophe Leroy
 wrote:

On 03/03/2021 at 11:56, Marco Elver wrote:


Somewhat tangentially, I also note that e.g. show_regs(regs) (which
was printed along the KFENCE report above) didn't include the top
frame in the "Call Trace", so this assumption is definitely not
isolated to KFENCE.



Now, I have tested PPC64 (with the patch I sent yesterday to modify 
save_stack_trace_regs()
applied), and I get many failures. Any idea ?

[   17.653751][   T58] 
==
[   17.654379][   T58] BUG: KFENCE: invalid free in 
.kfence_guarded_free+0x2e4/0x530
[   17.654379][   T58]
[   17.654831][   T58] Invalid free of 0xc0003c9c (in kfence-#77):
[   17.655358][   T58]  .kfence_guarded_free+0x2e4/0x530
[   17.655775][   T58]  .__slab_free+0x320/0x5a0
[   17.656039][   T58]  .test_double_free+0xe0/0x198
[   17.656308][   T58]  .kunit_try_run_case+0x80/0x110
[   17.656523][   T58]  .kunit_generic_run_threadfn_adapter+0x38/0x50
[   17.657161][   T58]  .kthread+0x18c/0x1a0
[   17.659148][   T58]  .ret_from_kernel_thread+0x58/0x70
[   17.659869][   T58]
[   17.663954][   T58] kfence-#77 [0xc0003c9c-0xc0003c9c001f, 
size=32, cache=kmalloc-32]
allocated by task 58:
[   17.666113][   T58]  .__kfence_alloc+0x1bc/0x510
[   17.667069][   T58]  .__kmalloc+0x280/0x4f0
[   17.667452][   T58]  .test_alloc+0x19c/0x430
[   17.667732][   T58]  .test_double_free+0x88/0x198
[   17.667971][   T58]  .kunit_try_run_case+0x80/0x110
[   17.668283][   T58]  .kunit_generic_run_threadfn_adapter+0x38/0x50
[   17.668553][   T58]  .kthread+0x18c/0x1a0
[   17.669315][   T58]  .ret_from_kernel_thread+0x58/0x70
[   17.669711][   T58]
[   17.669711][   T58] freed by task 58:
[   17.670116][   T58]  .kfence_guarded_free+0x3d0/0x530
[   17.670421][   T58]  .__slab_free+0x320/0x5a0
[   17.670603][   T58]  .test_double_free+0xb4/0x198
[   17.670827][   T58]  .kunit_try_run_case+0x80/0x110
[   17.671073][   T58]  .kunit_generic_run_threadfn_adapter+0x38/0x50
[   17.671410][   T58]  .kthread+0x18c/0x1a0
[   17.671618][   T58]  .ret_from_kernel_thread+0x58/0x70
[   17.671972][   T58]
[   17.672638][   T58] CPU: 0 PID: 58 Comm: kunit_try_catch Tainted: G    B
5.12.0-rc1-01540-g0783285cc1b8-dirty #4685
[   17.673768][   T58] 
==
[   17.677031][   T58] # test_double_free: EXPECTATION FAILED at 
mm/kfence/kfence_test.c:380
[   17.677031][   T58] Expected report_matches(&expect) to be true, but is 
false
[   17.684397][    T1] not ok 7 - test_double_free
[   17.686463][   T59] # test_double_free-memcache: setup_test_cache: 
size=32, ctor=0x0
[   17.688403][   T59] # test_double_free-memcache: test_alloc: size=32, 
gfp=cc0, policy=any,
cache=1


Looks like something is prepending '.' to function names. We expect
the function name to appear as-is, e.g. "kfence_guarded_free",
"test_double_free", etc.

Is there something special on ppc64, where the '.' is some convention?



I think so, see 
https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#FUNC-DES

Also see commit https://github.com/linuxppc/linux/commit/02424d896



But I'm wondering, if the dot is the problem, how so is the following one ok ?

[   79.574457][   T75] # test_krealloc: test_alloc: size=32, gfp=cc0, 
policy=any, cache=0
[   79.682728][   T75] 
==
[   79.684017][   T75] BUG: KFENCE: use-after-free read in 
.test_krealloc+0x4fc/0x5b8
[   79.684017][   T75]
[   79.684955][   T75] Use-after-free read at 0xc0003d06 (in 
kfence-#130):
[   79.687581][   T75]  .test_krealloc+0x4fc/0x5b8
[   79.688216][   T75]  .test_krealloc+0x4e4/0x5b8
[   79.688824][   T75]  .kunit_try_run_case+0x80/0x110
[   79.689737][   T75]  .kunit_generic_run_threadfn_adapter+0x38/0x50
[   79.690335][   T75]  .kthread+0x18c/0x1a0
[   79.691092][   T75]  .ret_from_kernel_thread+0x58/0x70
[   79.692081][   T75]
[   79.692671][   T75] kfence-#130 [0xc0003d06-0xc0003d06001f, size=32, 
cache=kmalloc-32] allocated by task 75:

[   79.700977][   T75]  .__kfence_alloc+0x1bc/0x510
[   79.701812][   T75]  .__kmalloc+0x280/0x4f0
[   79.702695][   T75]  .test_alloc+0x19c/0x430
[   79.703051][   T75]  .test_krealloc+0xa8/0x5b8
[   79.703276][   T75]  .kunit_try_run_case+0x80/0x110
[   79.703693][   T75]  .kunit_generic_run_threadfn_adapter+0x38/0x50
[   79.704223][   T75]  .kthread+0x18c/0x1a0
[   79.704586][   T75]  .ret_from_kernel_thread+0x58/0x70
[   79.704968][   T75]
[   79.704968][   T75] freed by task 75:
[   79.705756][   T75]  .kfence_guarded_free+0x3d0/0x530
[   79.706754][   T75]  .__slab_free+0x320/0x5a0
[   79.708575][   T75]  .krealloc+0xe8/0x180
[   79.708970][   T75]  .test_krealloc+0x1c8/0x5b8
[   79.709606][   T75]  .kunit_try_run_case+0x80

[Bug 210749] sysfs: cannot create duplicate filename '/bus/nvmem/devices/module-vpd'

2021-03-04 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=210749

--- Comment #8 from Michael Ellerman (mich...@ellerman.id.au) ---
Actually I also reverted 4e302c3b568e ("misc: eeprom: at24: fix NVMEM name with
custom AT24 device name").

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[PATCH] powerpc/perf: Fix sampled instruction type for larx/stcx

2021-03-04 Thread Athira Rajeev
Sampled Instruction Event Register (SIER) field [46:48]
identifies the sampled instruction type. ISA v3.1 defines the value
0b111 for this field as reserved, but on POWER10 it denotes the
LARX/STCX type; this will hopefully be fixed in an ISA v3.1 update.

The patch fixes the functions to handle type value 7 for
CPU_FTR_ARCH_31.

Fixes: a64e697cef23 ("powerpc/perf: power10 Performance Monitoring support")
Signed-off-by: Athira Rajeev 
---
 arch/powerpc/perf/isa207-common.c | 30 +++---
 arch/powerpc/perf/isa207-common.h |  1 +
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/perf/isa207-common.c 
b/arch/powerpc/perf/isa207-common.c
index e4f577da33d8..754f904d8d69 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -266,6 +266,8 @@ void isa207_get_mem_data_src(union perf_mem_data_src *dsrc, 
u32 flags,
u32 sub_idx;
u64 sier;
u64 val;
+   u64 mmcra = mfspr(SPRN_MMCRA);
+   u32 op_type;
 
/* Skip if no SIER support */
if (!(flags & PPMU_HAS_SIER)) {
@@ -275,12 +277,34 @@ void isa207_get_mem_data_src(union perf_mem_data_src 
*dsrc, u32 flags,
 
sier = mfspr(SPRN_SIER);
val = (sier & ISA207_SIER_TYPE_MASK) >> ISA207_SIER_TYPE_SHIFT;
-   if (val == 1 || val == 2) {
+   if (val == 1 || val == 2 || (val == 7 && 
cpu_has_feature(CPU_FTR_ARCH_31))) {
idx = (sier & ISA207_SIER_LDST_MASK) >> ISA207_SIER_LDST_SHIFT;
sub_idx = (sier & ISA207_SIER_DATA_SRC_MASK) >> 
ISA207_SIER_DATA_SRC_SHIFT;
 
dsrc->val = isa207_find_source(idx, sub_idx);
-   dsrc->val |= (val == 1) ? P(OP, LOAD) : P(OP, STORE);
+   if (val == 7) {
+   /*
+* Type 0b111 denotes either larx or stcx instruction. 
Use the
+* MMCRA sampling bits [57:59] along with the type value
+* to determine the exact instruction type. If the 
sampling
+* criteria is neither load or store, set the type as 
default
+* to NA.
+*/
+   op_type = (mmcra >> MMCRA_SAMP_ELIG_SHIFT) & 
MMCRA_SAMP_ELIG_MASK;
+   switch (op_type) {
+   case 5:
+   dsrc->val |= P(OP, LOAD);
+   break;
+   case 7:
+   dsrc->val |= P(OP, STORE);
+   break;
+   default:
+   dsrc->val |= P(OP, NA);
+   break;
+   }
+   } else {
+   dsrc->val |= (val == 1) ? P(OP, LOAD) : P(OP, STORE);
+   }
}
 }
 
@@ -295,7 +319,7 @@ void isa207_get_mem_weight(u64 *weight)
if (cpu_has_feature(CPU_FTR_ARCH_31))
mantissa = P10_MMCRA_THR_CTR_MANT(mmcra);
 
-   if (val == 0 || val == 7)
+   if (val == 0 || (val == 7 && !cpu_has_feature(CPU_FTR_ARCH_31)))
*weight = 0;
else
*weight = mantissa << (2 * exp);
diff --git a/arch/powerpc/perf/isa207-common.h 
b/arch/powerpc/perf/isa207-common.h
index 1af0e8c97ac7..7b0242efe4b9 100644
--- a/arch/powerpc/perf/isa207-common.h
+++ b/arch/powerpc/perf/isa207-common.h
@@ -220,6 +220,7 @@
 /* Bits in MMCRA for PowerISA v2.07 */
 #define MMCRA_SAMP_MODE_SHIFT  1
 #define MMCRA_SAMP_ELIG_SHIFT  4
+#define MMCRA_SAMP_ELIG_MASK   7
 #define MMCRA_THR_CTL_SHIFT8
 #define MMCRA_THR_SEL_SHIFT16
 #define MMCRA_THR_CMP_SHIFT32
-- 
1.8.3.1
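
One detail worth spelling out: SPR fields such as SIER[46:48] use IBM bit
numbering, where bit 0 is the most significant bit of the 64-bit register. A
hedged helper for the conversion:

/* Convert an IBM-numbered bit (0 = MSB of a 64-bit SPR) to a
 * conventional shift.  For example, the least significant bit of
 * SIER[46:48] is IBM bit 48, giving shift 63 - 48 = 15 for the
 * 3-bit type field. */
static inline unsigned int ibm_bit_to_shift(unsigned int ibm_bit)
{
	return 63 - ibm_bit;
}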



Re: [RFC PATCH v1] powerpc: Enable KFENCE for PPC32

2021-03-04 Thread Christophe Leroy




On 04/03/2021 at 12:31, Marco Elver wrote:

On Thu, 4 Mar 2021 at 12:23, Christophe Leroy
 wrote:

On 03/03/2021 at 11:56, Marco Elver wrote:


Somewhat tangentially, I also note that e.g. show_regs(regs) (which
was printed along the KFENCE report above) didn't include the top
frame in the "Call Trace", so this assumption is definitely not
isolated to KFENCE.



Now, I have tested PPC64 (with the patch I sent yesterday to modify 
save_stack_trace_regs()
applied), and I get many failures. Any idea ?

[   17.653751][   T58] 
==
[   17.654379][   T58] BUG: KFENCE: invalid free in 
.kfence_guarded_free+0x2e4/0x530
[   17.654379][   T58]
[   17.654831][   T58] Invalid free of 0xc0003c9c (in kfence-#77):
[   17.655358][   T58]  .kfence_guarded_free+0x2e4/0x530
[   17.655775][   T58]  .__slab_free+0x320/0x5a0
[   17.656039][   T58]  .test_double_free+0xe0/0x198
[   17.656308][   T58]  .kunit_try_run_case+0x80/0x110
[   17.656523][   T58]  .kunit_generic_run_threadfn_adapter+0x38/0x50
[   17.657161][   T58]  .kthread+0x18c/0x1a0
[   17.659148][   T58]  .ret_from_kernel_thread+0x58/0x70
[   17.659869][   T58]
[   17.663954][   T58] kfence-#77 [0xc0003c9c-0xc0003c9c001f, 
size=32, cache=kmalloc-32]
allocated by task 58:
[   17.666113][   T58]  .__kfence_alloc+0x1bc/0x510
[   17.667069][   T58]  .__kmalloc+0x280/0x4f0
[   17.667452][   T58]  .test_alloc+0x19c/0x430
[   17.667732][   T58]  .test_double_free+0x88/0x198
[   17.667971][   T58]  .kunit_try_run_case+0x80/0x110
[   17.668283][   T58]  .kunit_generic_run_threadfn_adapter+0x38/0x50
[   17.668553][   T58]  .kthread+0x18c/0x1a0
[   17.669315][   T58]  .ret_from_kernel_thread+0x58/0x70
[   17.669711][   T58]
[   17.669711][   T58] freed by task 58:
[   17.670116][   T58]  .kfence_guarded_free+0x3d0/0x530
[   17.670421][   T58]  .__slab_free+0x320/0x5a0
[   17.670603][   T58]  .test_double_free+0xb4/0x198
[   17.670827][   T58]  .kunit_try_run_case+0x80/0x110
[   17.671073][   T58]  .kunit_generic_run_threadfn_adapter+0x38/0x50
[   17.671410][   T58]  .kthread+0x18c/0x1a0
[   17.671618][   T58]  .ret_from_kernel_thread+0x58/0x70
[   17.671972][   T58]
[   17.672638][   T58] CPU: 0 PID: 58 Comm: kunit_try_catch Tainted: GB
5.12.0-rc1-01540-g0783285cc1b8-dirty #4685
[   17.673768][   T58] 
==
[   17.677031][   T58] # test_double_free: EXPECTATION FAILED at 
mm/kfence/kfence_test.c:380
[   17.677031][   T58] Expected report_matches(&expect) to be true, but is 
false
[   17.684397][T1] not ok 7 - test_double_free
[   17.686463][   T59] # test_double_free-memcache: setup_test_cache: 
size=32, ctor=0x0
[   17.688403][   T59] # test_double_free-memcache: test_alloc: size=32, 
gfp=cc0, policy=any,
cache=1


Looks like something is prepending '.' to function names. We expect
the function name to appear as-is, e.g. "kfence_guarded_free",
"test_double_free", etc.

Is there something special on ppc64, where the '.' is some convention?



I think so, see 
https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#FUNC-DES

Also see commit https://github.com/linuxppc/linux/commit/02424d896

Christophe


[Bug 210749] sysfs: cannot create duplicate filename '/bus/nvmem/devices/module-vpd'

2021-03-04 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=210749

Michael Ellerman (mich...@ellerman.id.au) changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

--- Comment #7 from Michael Ellerman (mich...@ellerman.id.au) ---
Yeah, reverting that on mainline fixed it.

Not sure how we're going to fix it though, a straight revert isn't going to be
acceptable I suspect.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[PATCH] powerpc/pseries: export LPAR security flavor in lparcfg

2021-03-04 Thread Laurent Dufour
This makes it possible to read the security flavor from inside the LPAR.

Export it like this in /proc/powerpc/lparcfg:

$ grep security_flavor /proc/powerpc/lparcfg
security_flavor=1

Value means:
0 Speculative execution fully enabled
1 Speculative execution controls to mitigate user-to-kernel attacks
2 Speculative execution controls to mitigate user-to-kernel and
  user-to-user side-channel attacks

Signed-off-by: Laurent Dufour 
---
 arch/powerpc/include/asm/hvcall.h| 1 +
 arch/powerpc/platforms/pseries/lparcfg.c | 2 ++
 arch/powerpc/platforms/pseries/pseries.h | 1 +
 arch/powerpc/platforms/pseries/setup.c   | 8 
 4 files changed, 12 insertions(+)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index ed6086d57b22..455e188da26d 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -389,6 +389,7 @@
 #define H_CPU_BEHAV_FAVOUR_SECURITY(1ull << 63) // IBM bit 0
 #define H_CPU_BEHAV_L1D_FLUSH_PR   (1ull << 62) // IBM bit 1
 #define H_CPU_BEHAV_BNDS_CHK_SPEC_BAR  (1ull << 61) // IBM bit 2
+#define H_CPU_BEHAV_FAVOUR_SECURITY_H  (1ull << 60) // IBM bit 3
 #define H_CPU_BEHAV_FLUSH_COUNT_CACHE  (1ull << 58) // IBM bit 5
 #define H_CPU_BEHAV_FLUSH_LINK_STACK   (1ull << 57) // IBM bit 6
 
diff --git a/arch/powerpc/platforms/pseries/lparcfg.c 
b/arch/powerpc/platforms/pseries/lparcfg.c
index e278390ab28d..35f6c4929fbd 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -169,6 +169,7 @@ static void show_gpci_data(struct seq_file *m)
kfree(buf);
 }
 
+
 static unsigned h_pic(unsigned long *pool_idle_time,
  unsigned long *num_procs)
 {
@@ -537,6 +538,7 @@ static int pseries_lparcfg_data(struct seq_file *m, void *v)
parse_em_data(m);
maxmem_data(m);
 
+   seq_printf(m, "security_flavor=%u\n", pseries_security_flavor);
return 0;
 }
 
diff --git a/arch/powerpc/platforms/pseries/pseries.h 
b/arch/powerpc/platforms/pseries/pseries.h
index 4fe48c04c6c2..a25517dc2515 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -111,6 +111,7 @@ static inline unsigned long cmo_get_page_size(void)
 
 int dlpar_workqueue_init(void);
 
+extern u32 pseries_security_flavor;
 void pseries_setup_security_mitigations(void);
 void pseries_lpar_read_hblkrm_characteristics(void);
 
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 46e1540abc22..59080413a269 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -85,6 +85,7 @@ EXPORT_SYMBOL(CMO_PageSize);
 
 int fwnmi_active;  /* TRUE if an FWNMI handler is present */
 int ibm_nmi_interlock_token;
+u32 pseries_security_flavor;
 
 static void pSeries_show_cpuinfo(struct seq_file *m)
 {
@@ -534,9 +535,16 @@ static void init_cpu_char_feature_flags(struct 
h_cpu_char_result *result)
/*
 * The features below are enabled by default, so we instead look to see
 * if firmware has *disabled* them, and clear them if so.
+* H_CPU_BEHAV_FAVOUR_SECURITY_H could be set only if
+* H_CPU_BEHAV_FAVOUR_SECURITY is.
 */
if (!(result->behaviour & H_CPU_BEHAV_FAVOUR_SECURITY))
security_ftr_clear(SEC_FTR_FAVOUR_SECURITY);
+   else if (result->behaviour & H_CPU_BEHAV_FAVOUR_SECURITY_H)
+   pseries_security_flavor = 1;
+   else
+   pseries_security_flavor = 2;
+
 
if (!(result->behaviour & H_CPU_BEHAV_L1D_FLUSH_PR))
security_ftr_clear(SEC_FTR_L1D_FLUSH_PR);
-- 
2.30.1
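
Condensed, the flavor derivation added to setup.c reduces to the following
sketch (pseries_security_flavor is zero-initialized, which covers the
fully-enabled case):

static void set_security_flavor(struct h_cpu_char_result *result)
{
	if (!(result->behaviour & H_CPU_BEHAV_FAVOUR_SECURITY))
		pseries_security_flavor = 0;	/* speculation fully enabled */
	else if (result->behaviour & H_CPU_BEHAV_FAVOUR_SECURITY_H)
		pseries_security_flavor = 1;	/* user-to-kernel mitigations */
	else
		pseries_security_flavor = 2;	/* plus user-to-user mitigations */
}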



Re: [PATCH v2] powerpc: Fix save_stack_trace_regs() to have running function as first entry

2021-03-04 Thread Michael Ellerman
Christophe Leroy  writes:
> It seems like other architectures, namely x86 and arm64
> at least, include the running function as top entry when saving
> stack trace with save_stack_trace_regs().

Also riscv AFAICS.

> Functionalities like KFENCE expect it.
>
> Do the same on powerpc; it allows KFENCE to properly identify the faulting
> function as depicted below. Before the patch KFENCE was identifying
> finish_task_switch.isra as the faulting function.

Thanks, I think this is the right approach. There's kfence but also
several other users from what I can see with a quick grep.

...
>
> Signed-off-by: Christophe Leroy 
> Fixes: 35de3b1aa168 ("powerpc: Implement save_stack_trace_regs() to enable 
> kprobe stack tracing")
> Cc: sta...@vger.kernel.org

I'm not sure about the Cc to stable. I think we are fixing the behaviour
to match the (implied) intent of the API, but that doesn't mean we won't
break something by accident. I'll think about it :)

cheers


Re: [RFC PATCH v1] powerpc: Enable KFENCE for PPC32

2021-03-04 Thread Christophe Leroy




On 03/03/2021 at 11:56, Marco Elver wrote:


Somewhat tangentially, I also note that e.g. show_regs(regs) (which
was printed along the KFENCE report above) didn't include the top
frame in the "Call Trace", so this assumption is definitely not
isolated to KFENCE.



Now, I have tested PPC64 (with the patch I sent yesterday to modify save_stack_trace_regs() 
applied), and I get many failures. Any idea ?


[   17.653751][   T58] 
==
[   17.654379][   T58] BUG: KFENCE: invalid free in 
.kfence_guarded_free+0x2e4/0x530
[   17.654379][   T58]
[   17.654831][   T58] Invalid free of 0xc0003c9c (in kfence-#77):
[   17.655358][   T58]  .kfence_guarded_free+0x2e4/0x530
[   17.655775][   T58]  .__slab_free+0x320/0x5a0
[   17.656039][   T58]  .test_double_free+0xe0/0x198
[   17.656308][   T58]  .kunit_try_run_case+0x80/0x110
[   17.656523][   T58]  .kunit_generic_run_threadfn_adapter+0x38/0x50
[   17.657161][   T58]  .kthread+0x18c/0x1a0
[   17.659148][   T58]  .ret_from_kernel_thread+0x58/0x70
[   17.659869][   T58]
[   17.663954][   T58] kfence-#77 [0xc0003c9c-0xc0003c9c001f, size=32, cache=kmalloc-32] 
allocated by task 58:

[   17.666113][   T58]  .__kfence_alloc+0x1bc/0x510
[   17.667069][   T58]  .__kmalloc+0x280/0x4f0
[   17.667452][   T58]  .test_alloc+0x19c/0x430
[   17.667732][   T58]  .test_double_free+0x88/0x198
[   17.667971][   T58]  .kunit_try_run_case+0x80/0x110
[   17.668283][   T58]  .kunit_generic_run_threadfn_adapter+0x38/0x50
[   17.668553][   T58]  .kthread+0x18c/0x1a0
[   17.669315][   T58]  .ret_from_kernel_thread+0x58/0x70
[   17.669711][   T58]
[   17.669711][   T58] freed by task 58:
[   17.670116][   T58]  .kfence_guarded_free+0x3d0/0x530
[   17.670421][   T58]  .__slab_free+0x320/0x5a0
[   17.670603][   T58]  .test_double_free+0xb4/0x198
[   17.670827][   T58]  .kunit_try_run_case+0x80/0x110
[   17.671073][   T58]  .kunit_generic_run_threadfn_adapter+0x38/0x50
[   17.671410][   T58]  .kthread+0x18c/0x1a0
[   17.671618][   T58]  .ret_from_kernel_thread+0x58/0x70
[   17.671972][   T58]
[   17.672638][   T58] CPU: 0 PID: 58 Comm: kunit_try_catch Tainted: GB 
5.12.0-rc1-01540-g0783285cc1b8-dirty #4685

[   17.673768][   T58] 
==
[   17.677031][   T58] # test_double_free: EXPECTATION FAILED at 
mm/kfence/kfence_test.c:380
[   17.677031][   T58] Expected report_matches(&expect) to be true, but is 
false
[   17.684397][T1] not ok 7 - test_double_free
[   17.686463][   T59] # test_double_free-memcache: setup_test_cache: 
size=32, ctor=0x0
[   17.688403][   T59] # test_double_free-memcache: test_alloc: size=32, gfp=cc0, policy=any, 
cache=1

[   17.797584][   T59] 
==
[   17.801260][   T59] BUG: KFENCE: invalid free in 
.kfence_guarded_free+0x2e4/0x530
[   17.801260][   T59]
[   17.801512][   T59] Invalid free of 0xc0003c9effe0 (in kfence-#78):
[   17.801668][   T59]  .kfence_guarded_free+0x2e4/0x530
[   17.801849][   T59]  .__slab_free+0x320/0x5a0
[   17.801983][   T59]  .kmem_cache_free+0x31c/0x5c0
[   17.802109][   T59]  .test_double_free+0xd0/0x198
[   17.802227][   T59]  .kunit_try_run_case+0x80/0x110
[   17.802494][   T59]  .kunit_generic_run_threadfn_adapter+0x38/0x50
[   17.802641][   T59]  .kthread+0x18c/0x1a0
[   17.802821][   T59]  .ret_from_kernel_thread+0x58/0x70
[   17.802989][   T59]
[   17.803303][   T59] kfence-#78 [0xc0003c9effe0-0xc0003c9e, size=32, cache=test] 
allocated by task 59:

[   17.803666][   T59]  .__kfence_alloc+0x1bc/0x510
[   17.803898][   T59]  .kmem_cache_alloc+0x290/0x440
[   17.804036][   T59]  .test_alloc+0x188/0x430
[   17.804151][   T59]  .test_double_free+0x88/0x198
[   17.804363][   T59]  .kunit_try_run_case+0x80/0x110
[   17.804637][   T59]  .kunit_generic_run_threadfn_adapter+0x38/0x50
[   17.805099][   T59]  .kthread+0x18c/0x1a0
[   17.805313][   T59]  .ret_from_kernel_thread+0x58/0x70
[   17.806035][   T59]
[   17.806035][   T59] freed by task 59:
[   17.806495][   T59]  .kfence_guarded_free+0x3d0/0x530
[   17.806689][   T59]  .__slab_free+0x320/0x5a0
[   17.806941][   T59]  .kmem_cache_free+0x31c/0x5c0
[   17.807122][   T59]  .test_double_free+0xa8/0x198
[   17.807360][   T59]  .kunit_try_run_case+0x80/0x110
[   17.807538][   T59]  .kunit_generic_run_threadfn_adapter+0x38/0x50
[   17.807703][   T59]  .kthread+0x18c/0x1a0
[   17.808015][   T59]  .ret_from_kernel_thread+0x58/0x70
[   17.808220][   T59]
[   17.808406][   T59] CPU: 0 PID: 59 Comm: kunit_try_catch Tainted: GB 
5.12.0-rc1-01540-g0783285cc1b8-dirty #4685

[   17.808670][   T59] 
==
[   17.809882][   T59] # test_double_free-memcache: EXPECTATION FAILED at 
mm/kfence/kfence_test.c:380

[   17.809882][   T59] Expected report_matches(&expect) to be true, but is 
fal

Re: cleanup unused or almost unused IOMMU APIs and the FSL PAMU driver

2021-03-04 Thread Joerg Roedel
On Mon, Mar 01, 2021 at 09:42:40AM +0100, Christoph Hellwig wrote:
> Diffstat:
>  arch/powerpc/include/asm/fsl_pamu_stash.h   |   12 
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c |2 
>  drivers/iommu/amd/iommu.c   |   23 
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |   85 ---
>  drivers/iommu/arm/arm-smmu/arm-smmu.c   |  122 +---
>  drivers/iommu/dma-iommu.c   |8 
>  drivers/iommu/fsl_pamu.c|  264 --
>  drivers/iommu/fsl_pamu.h|   10 
>  drivers/iommu/fsl_pamu_domain.c |  694 
> ++--
>  drivers/iommu/fsl_pamu_domain.h |   46 -
>  drivers/iommu/intel/iommu.c |   55 --
>  drivers/iommu/iommu.c   |   75 ---
>  drivers/soc/fsl/qbman/qman_portal.c |   56 --
>  drivers/vfio/vfio_iommu_type1.c |   31 -
>  drivers/vhost/vdpa.c|   10 
>  include/linux/iommu.h   |   81 ---
>  16 files changed, 214 insertions(+), 1360 deletions(-)

Nice cleanup, thanks. The fsl_pamu driver and interface have always been
a little bit of an alien compared to other IOMMU drivers. I am inclined
to merge this after -rc3 is out, given some reviews. Can you also please
add changelogs to the last three patches?

Thanks,

Joerg


Re: [PATCH v2 34/37] KVM: PPC: Book3S HV: add virtual mode handlers for HPT hcalls and page faults

2021-03-04 Thread Nicholas Piggin
Excerpts from Fabiano Rosas's message of March 4, 2021 6:09 am:
> Nicholas Piggin  writes:
> 
>> In order to support hash guests in the P9 path (which does not do real
>> mode hcalls or page fault handling), these real-mode hash specific
>> interrupts need to be implemented in virt mode.
>>
>> Signed-off-by: Nicholas Piggin 
>> ---
>>  arch/powerpc/kvm/book3s_hv.c | 118 +--
>>  1 file changed, 113 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 9d2fa21201c1..1bbc46f2cfbf 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -935,6 +935,52 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>>  return RESUME_HOST;
>>
>>  switch (req) {
>> +case H_REMOVE:
>> +ret = kvmppc_h_remove(vcpu, kvmppc_get_gpr(vcpu, 4),
>> +kvmppc_get_gpr(vcpu, 5),
>> +kvmppc_get_gpr(vcpu, 6));
>> +if (ret == H_TOO_HARD)
>> +return RESUME_HOST;
>> +break;
>> +case H_ENTER:
>> +ret = kvmppc_h_enter(vcpu, kvmppc_get_gpr(vcpu, 4),
>> +kvmppc_get_gpr(vcpu, 5),
>> +kvmppc_get_gpr(vcpu, 6),
>> +kvmppc_get_gpr(vcpu, 7));
>> +if (ret == H_TOO_HARD)
>> +return RESUME_HOST;
>> +break;
>> +case H_READ:
>> +ret = kvmppc_h_read(vcpu, kvmppc_get_gpr(vcpu, 4),
>> +kvmppc_get_gpr(vcpu, 5));
>> +if (ret == H_TOO_HARD)
>> +return RESUME_HOST;
>> +break;
>> +case H_CLEAR_MOD:
>> +ret = kvmppc_h_clear_mod(vcpu, kvmppc_get_gpr(vcpu, 4),
>> +kvmppc_get_gpr(vcpu, 5));
>> +if (ret == H_TOO_HARD)
>> +return RESUME_HOST;
>> +break;
>> +case H_CLEAR_REF:
>> +ret = kvmppc_h_clear_ref(vcpu, kvmppc_get_gpr(vcpu, 4),
>> +kvmppc_get_gpr(vcpu, 5));
>> +if (ret == H_TOO_HARD)
>> +return RESUME_HOST;
>> +break;
>> +case H_PROTECT:
>> +ret = kvmppc_h_protect(vcpu, kvmppc_get_gpr(vcpu, 4),
>> +kvmppc_get_gpr(vcpu, 5),
>> +kvmppc_get_gpr(vcpu, 6));
>> +if (ret == H_TOO_HARD)
>> +return RESUME_HOST;
>> +break;
>> +case H_BULK_REMOVE:
>> +ret = kvmppc_h_bulk_remove(vcpu);
>> +if (ret == H_TOO_HARD)
>> +return RESUME_HOST;
>> +break;
>> +
> 
> Some of these symbols need to be exported.
> 
> ERROR: modpost: "kvmppc_h_bulk_remove" [arch/powerpc/kvm/kvm-hv.ko] undefined!
> ERROR: modpost: "kvmppc_h_clear_mod" [arch/powerpc/kvm/kvm-hv.ko] undefined!
> ERROR: modpost: "kvmppc_xive_xics_hcall" [arch/powerpc/kvm/kvm-hv.ko] 
> undefined!
> ERROR: modpost: "kvmppc_h_remove" [arch/powerpc/kvm/kvm-hv.ko] undefined!
> ERROR: modpost: "decrementers_next_tb" [arch/powerpc/kvm/kvm-hv.ko] undefined!
> ERROR: modpost: "kvmppc_hpte_hv_fault" [arch/powerpc/kvm/kvm-hv.ko] undefined!
> ERROR: modpost: "kvmppc_h_protect" [arch/powerpc/kvm/kvm-hv.ko] undefined!
> ERROR: modpost: "kvmppc_h_enter" [arch/powerpc/kvm/kvm-hv.ko] undefined!
> ERROR: modpost: "kvmppc_h_clear_ref" [arch/powerpc/kvm/kvm-hv.ko] undefined!
> ERROR: modpost: "kvmppc_h_read" [arch/powerpc/kvm/kvm-hv.ko] undefined!

Yeah sorry about that, there are a few issues there; I'll try to polish that
up a bit before the next post.

Thanks,
Nick
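
For reference, the usual fix for modpost errors like these is to export the
symbols from the built-in objects that define them so that the modular
kvm-hv.ko can link against them; a hedged example (the exact placement is
hypothetical):

/* e.g. next to the definitions in book3s_hv_rm_mmu.c */
EXPORT_SYMBOL_GPL(kvmppc_h_enter);
EXPORT_SYMBOL_GPL(kvmppc_h_remove);
EXPORT_SYMBOL_GPL(kvmppc_hpte_hv_fault);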


Re: [PATCH v2 30/37] KVM: PPC: Book3S HV: Implement radix prefetch workaround by disabling MMU

2021-03-04 Thread Nicholas Piggin
Excerpts from Fabiano Rosas's message of March 3, 2021 7:21 am:
> Nicholas Piggin  writes:
> 
>> Rather than partition the guest PID space and catch and flush a rogue
>> guest, instead work around this issue by ensuring the MMU is always
>> disabled in HV mode while the guest MMU context is switched in.
>>
>> This may be a bit less efficient, but it is a lot less complicated and
>> allows the P9 path to trivially implement the workaround too. Newer CPUs
>> are not subject to this issue.
>>
>> Signed-off-by: Nicholas Piggin 
>> ---
>>  arch/powerpc/include/asm/mmu_context.h   |  6 
>>  arch/powerpc/kvm/book3s_hv.c | 10 --
>>  arch/powerpc/kvm/book3s_hv_interrupt.c   | 14 ++--
>>  arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 34 --
>>  arch/powerpc/mm/book3s64/radix_pgtable.c | 27 +-
>>  arch/powerpc/mm/book3s64/radix_tlb.c | 46 
>>  arch/powerpc/mm/mmu_context.c|  4 +--
>>  7 files changed, 28 insertions(+), 113 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/mmu_context.h 
>> b/arch/powerpc/include/asm/mmu_context.h
>> index 652ce85f9410..bb5c7e5e142e 100644
>> --- a/arch/powerpc/include/asm/mmu_context.h
>> +++ b/arch/powerpc/include/asm/mmu_context.h
>> @@ -122,12 +122,6 @@ static inline bool need_extra_context(struct mm_struct 
>> *mm, unsigned long ea)
>>  }
>>  #endif
>>
>> -#if defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE) && defined(CONFIG_PPC_RADIX_MMU)
>> -extern void radix_kvm_prefetch_workaround(struct mm_struct *mm);
>> -#else
>> -static inline void radix_kvm_prefetch_workaround(struct mm_struct *mm) { }
>> -#endif
>> -
>>  extern void switch_cop(struct mm_struct *next);
>>  extern int use_cop(unsigned long acop, struct mm_struct *mm);
>>  extern void drop_cop(unsigned long acop, struct mm_struct *mm);
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index ad16331c3370..c3064075f1d7 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -806,6 +806,10 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, 
>> unsigned long mflags,
>>  /* KVM does not support mflags=2 (AIL=2) */
>>  if (mflags != 0 && mflags != 3)
>>  return H_UNSUPPORTED_FLAG_START;
>> +/* Prefetch bug */
>> +if (cpu_has_feature(CPU_FTR_P9_RADIX_PREFETCH_BUG) &&
>> +kvmhv_vcpu_is_radix(vcpu) && mflags == 3)
>> +return H_UNSUPPORTED_FLAG_START;
> 
> So does this mean that if the host has the prefetch bug, all of its
> guests will run with AIL=0 all the time?

All radix guests will, yes.

> And what we're avoiding here is
> a guest setting AIL=3 which would (since there's no HAIL) cause
> hypervisor interrupts to be taken with MMU on, is that it?

Yes that's right.

> Do we need to add this verification to kvmppc_set_lpcr as well? QEMU
> could in theory call the KVM_SET_ONE_REG ioctl and set AIL to any value.

Yeah I guess so. We don't restrict other AIL values there by the looks
but maybe we should.

Thanks,
Nick


Re: [PATCH v3] powerpc/uprobes: Validation for prefixed instruction

2021-03-04 Thread Ravi Bangoria




On 3/4/21 4:21 PM, Christophe Leroy wrote:



On 04/03/2021 at 11:13, Ravi Bangoria wrote:



On 3/4/21 1:02 PM, Christophe Leroy wrote:



On 04/03/2021 at 06:05, Ravi Bangoria wrote:

As per ISA 3.1, a prefixed instruction should not cross a 64-byte
boundary. So don't allow a Uprobe on such a prefixed instruction.

There are two ways the probed instruction is changed in mapped pages.
First, when a Uprobe is activated, it searches for all the relevant
pages and replaces the instruction in them. In this case, if that probe
is on a 64-byte-unaligned prefixed instruction, error out
directly. Second, when a Uprobe is already active and the user maps a
relevant page via mmap(), the instruction is replaced via the mmap() code
path. But because the Uprobe is invalid, the entire mmap() operation
cannot be stopped. In this case just print an error and continue.

Signed-off-by: Ravi Bangoria 
---
v2: 
https://lore.kernel.org/r/20210204104703.273429-1-ravi.bango...@linux.ibm.com
v2->v3:
   - Drop restriction for Uprobe on suffix of prefixed instruction.
  It needs a lot of code change including generic code but what
 we get in return is not worth it.

  arch/powerpc/kernel/uprobes.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index e8a63713e655..c400971ebe70 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -41,6 +41,14 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
  if (addr & 0x03)
  return -EINVAL;
+    if (!IS_ENABLED(CONFIG_PPC64) || !cpu_has_feature(CPU_FTR_ARCH_31))


cpu_has_feature(CPU_FTR_ARCH_31) should return 'false' when CONFIG_PPC64 is not 
enabled, no need to double check.


Ok.

I'm going to drop CONFIG_PPC64 check because it's not really
required as I replied to Naveen. So, I'll keep CPU_FTR_ARCH_31
check as is.




+    return 0;
+
+    if (ppc_inst_prefixed(auprobe->insn) && (addr & 0x3F) == 0x3C) {


Maybe 3C instead of 4F ? : (addr & 0x3C) == 0x3C


Didn't follow. It's not (addr & 0x3C), it's (addr & 0x3F).


Sorry I meant 3c instead of 3f (And usually we don't use capital letters for 
that).
The last two bits are supposed to always be 0, so it doesn't really matter; I just 
thought it would look better having the same value on both sides of the test, ie (addr 
& 0x3c) == 0x3c.


Ok yeah makes sense. Thanks.







What about

(addr & (SZ_64 - 4)) == SZ_64 - 4 to make it more explicit ?


Yes, this is a bit better. Though, it should be:

 (addr & (SZ_64 - 1)) == SZ_64 - 4


-1 or -4 should give the same results as instructions are always 32 bits 
aligned though.


Got it.

Ravi
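
Putting the agreed spelling together, a hedged sketch of the final check
(SZ_64 comes from linux/sizes.h; instructions are 4-byte aligned, so offset
0x3c is the only start address within a 64-byte block that makes an 8-byte
prefixed instruction straddle the boundary):

static bool is_boundary_crossing_prefixed(unsigned long addr)
{
	/* Equivalent to (addr & (SZ_64 - 4)) == SZ_64 - 4, since
	 * instructions are always 4-byte aligned. */
	return (addr & (SZ_64 - 1)) == SZ_64 - 4;
}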


Re: [PATCH v2 28/37] KVM: PPC: Book3S HV P9: Add helpers for OS SPR handling

2021-03-04 Thread Nicholas Piggin
Excerpts from Fabiano Rosas's message of March 3, 2021 1:04 am:
> Nicholas Piggin  writes:
> 
>> This is a first step to wrapping supervisor and user SPR saving and
>> loading up into helpers, which will then be called independently in
>> bare metal and nested HV cases in order to optimise SPR access.
>>
>> Signed-off-by: Nicholas Piggin 
>> ---
> 
> 
> 
>> +/* vcpu guest regs must already be saved */
>> +static void restore_p9_host_os_sprs(struct kvm_vcpu *vcpu,
>> +struct p9_host_os_sprs *host_os_sprs)
>> +{
>> +mtspr(SPRN_PSPB, 0);
>> +mtspr(SPRN_WORT, 0);
>> +mtspr(SPRN_UAMOR, 0);
>> +mtspr(SPRN_PSPB, 0);
> 
> Not your fault, but PSPB is set twice here.

Yeah you're right.

>> +
>> +mtspr(SPRN_DSCR, host_os_sprs->dscr);
>> +mtspr(SPRN_TIDR, host_os_sprs->tidr);
>> +mtspr(SPRN_IAMR, host_os_sprs->iamr);
>> +
>> +if (host_os_sprs->amr != vcpu->arch.amr)
>> +mtspr(SPRN_AMR, host_os_sprs->amr);
>> +
>> +if (host_os_sprs->fscr != vcpu->arch.fscr)
>> +mtspr(SPRN_FSCR, host_os_sprs->fscr);
>> +}
>> +
> 
> 
> 
>> @@ -3605,34 +3666,10 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
>>  vcpu->arch.dec_expires = dec + tb;
>>  vcpu->cpu = -1;
>>  vcpu->arch.thread_cpu = -1;
>> -vcpu->arch.ctrl = mfspr(SPRN_CTRLF);
>> -
>> -vcpu->arch.iamr = mfspr(SPRN_IAMR);
>> -vcpu->arch.pspb = mfspr(SPRN_PSPB);
>> -vcpu->arch.fscr = mfspr(SPRN_FSCR);
>> -vcpu->arch.tar = mfspr(SPRN_TAR);
>> -vcpu->arch.ebbhr = mfspr(SPRN_EBBHR);
>> -vcpu->arch.ebbrr = mfspr(SPRN_EBBRR);
>> -vcpu->arch.bescr = mfspr(SPRN_BESCR);
>> -vcpu->arch.wort = mfspr(SPRN_WORT);
>> -vcpu->arch.tid = mfspr(SPRN_TIDR);
>> -vcpu->arch.amr = mfspr(SPRN_AMR);
>> -vcpu->arch.uamor = mfspr(SPRN_UAMOR);
>> -vcpu->arch.dscr = mfspr(SPRN_DSCR);
>> -
>> -mtspr(SPRN_PSPB, 0);
>> -mtspr(SPRN_WORT, 0);
>> -mtspr(SPRN_UAMOR, 0);
>> -mtspr(SPRN_DSCR, host_dscr);
>> -mtspr(SPRN_TIDR, host_tidr);
>> -mtspr(SPRN_IAMR, host_iamr);
>> -mtspr(SPRN_PSPB, 0);
>>
>> -if (host_amr != vcpu->arch.amr)
>> -mtspr(SPRN_AMR, host_amr);
>> +restore_p9_host_os_sprs(vcpu, &host_os_sprs);
>>
>> -if (host_fscr != vcpu->arch.fscr)
>> -mtspr(SPRN_FSCR, host_fscr);
>> +store_spr_state(vcpu);
> 
> store_spr_state should come first, right? We want to save the guest
> state before restoring the host state.

Yes, good catch. I switched that back around later, but it looks like I
never brought the fix back to the right patch. Interestingly, things
pretty much work even like this if the guest or host doesn't do much
with the SPRs!

Thanks,
Nick
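
For clarity, the fix agreed on here is just to swap the two calls. The
helper names are the ones introduced by this patch series; the fragment
is an illustrative sketch of the intended order, not the merged code:

	/* Save all guest SPRs into vcpu->arch first... */
	store_spr_state(vcpu);
	/* ...then restore the host values saved on entry. */
	restore_p9_host_os_sprs(vcpu, &host_os_sprs);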


Re: [PATCH v3] powerpc/uprobes: Validation for prefixed instruction

2021-03-04 Thread Christophe Leroy




On 04/03/2021 at 11:13, Ravi Bangoria wrote:



On 3/4/21 1:02 PM, Christophe Leroy wrote:



On 04/03/2021 at 06:05, Ravi Bangoria wrote:

As per ISA 3.1, a prefixed instruction should not cross a 64-byte
boundary. So don't allow a Uprobe on such a prefixed instruction.

There are two ways the probed instruction is changed in mapped pages.
First, when a Uprobe is activated, it searches for all the relevant
pages and replaces the instruction in them. In this case, if that
probe is on a 64-byte-unaligned prefixed instruction, error out
directly. Second, when a Uprobe is already active and a user maps a
relevant page via mmap(), the instruction is replaced via the mmap()
code path. But because the Uprobe is invalid, the entire mmap()
operation cannot be stopped. In this case, just print an error and
continue.

Signed-off-by: Ravi Bangoria 
---
v2: https://lore.kernel.org/r/20210204104703.273429-1-ravi.bango...@linux.ibm.com
v2->v3:
   - Drop the restriction for a Uprobe on the suffix of a prefixed
     instruction. It needs a lot of code changes, including generic
     code, but what we get in return is not worth it.

  arch/powerpc/kernel/uprobes.c | 8 ++++++++
  1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index e8a63713e655..c400971ebe70 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -41,6 +41,14 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
  if (addr & 0x03)
  return -EINVAL;
+    if (!IS_ENABLED(CONFIG_PPC64) || !cpu_has_feature(CPU_FTR_ARCH_31))


cpu_has_feature(CPU_FTR_ARCH_31) should return 'false' when CONFIG_PPC64 is not enabled;
there is no need to double-check.


Ok.

I'm going to drop the CONFIG_PPC64 check because it's not really
required, as I replied to Naveen. So I'll keep the CPU_FTR_ARCH_31
check as is.




+    return 0;
+
+    if (ppc_inst_prefixed(auprobe->insn) && (addr & 0x3F) == 0x3C) {


Maybe 3C instead of 3F ? : (addr & 0x3C) == 0x3C


Didn't follow. It's not (addr & 0x3C), it's (addr & 0x3F).


Sorry, I meant 3c instead of 3f (and usually we don't use capital letters for
that).
The last two bits are supposed to always be 0, so it doesn't really matter; I just thought it
would look better having the same value on both sides of the test, i.e. (addr & 0x3c) == 0x3c.






What about

(addr & (SZ_64 - 4)) == SZ_64 - 4 to make it more explicit ?


Yes, this is a bit better. Though it should be:

     (addr & (SZ_64 - 1)) == SZ_64 - 4


-1 or -4 should give the same result, as instructions are always 32-bit
aligned, though.

Christophe


[Bug 210749] sysfs: cannot create duplicate filename '/bus/nvmem/devices/module-vpd'

2021-03-04 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=210749

Michael Ellerman (mich...@ellerman.id.au) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mich...@ellerman.id.au

--- Comment #6 from Michael Ellerman (mich...@ellerman.id.au) ---
Looks like: 61f764c307f6 ("eeprom: at24: Support custom device names for AT24
EEPROMs")


Re: [PATCH v3] powerpc/uprobes: Validation for prefixed instruction

2021-03-04 Thread Ravi Bangoria




On 3/4/21 1:02 PM, Christophe Leroy wrote:



On 04/03/2021 at 06:05, Ravi Bangoria wrote:

As per ISA 3.1, a prefixed instruction should not cross a 64-byte
boundary. So don't allow a Uprobe on such a prefixed instruction.

There are two ways the probed instruction is changed in mapped pages.
First, when a Uprobe is activated, it searches for all the relevant
pages and replaces the instruction in them. In this case, if that
probe is on a 64-byte-unaligned prefixed instruction, error out
directly. Second, when a Uprobe is already active and a user maps a
relevant page via mmap(), the instruction is replaced via the mmap()
code path. But because the Uprobe is invalid, the entire mmap()
operation cannot be stopped. In this case, just print an error and
continue.

Signed-off-by: Ravi Bangoria 
---
v2: https://lore.kernel.org/r/20210204104703.273429-1-ravi.bango...@linux.ibm.com
v2->v3:
   - Drop the restriction for a Uprobe on the suffix of a prefixed
     instruction. It needs a lot of code changes, including generic
     code, but what we get in return is not worth it.

  arch/powerpc/kernel/uprobes.c | 8 ++++++++
  1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index e8a63713e655..c400971ebe70 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -41,6 +41,14 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
  if (addr & 0x03)
  return -EINVAL;
+    if (!IS_ENABLED(CONFIG_PPC64) || !cpu_has_feature(CPU_FTR_ARCH_31))


cpu_has_feature(CPU_FTR_ARCH_31) should return 'false' when CONFIG_PPC64 is not
enabled; there is no need to double-check.


Ok.

I'm going to drop the CONFIG_PPC64 check because it's not really
required, as I replied to Naveen. So I'll keep the CPU_FTR_ARCH_31
check as is.




+    return 0;
+
+    if (ppc_inst_prefixed(auprobe->insn) && (addr & 0x3F) == 0x3C) {


Maybe 3C instead of 3F ? : (addr & 0x3C) == 0x3C


Didn't follow. It's not (addr & 0x3C), it's (addr & 0x3F).



What about

(addr & (SZ_64 - 4)) == SZ_64 - 4 to make it more explicit ?


Yes, this is a bit better. Though it should be:

(addr & (SZ_64 - 1)) == SZ_64 - 4

Ravi


[PATCH] scsi: ibmvfc: Switch to using the new API kobj_to_dev()

2021-03-04 Thread Jiapeng Chong
Fix the following coccicheck warning:

./drivers/scsi/ibmvscsi/ibmvfc.c:3483:60-61: WARNING opportunity for
kobj_to_dev().

Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 755313b..e5f1ca7 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -3480,7 +3480,7 @@ static ssize_t ibmvfc_read_trace(struct file *filp, struct kobject *kobj,
 struct bin_attribute *bin_attr,
 char *buf, loff_t off, size_t count)
 {
-   struct device *dev = container_of(kobj, struct device, kobj);
+   struct device *dev = kobj_to_dev(kobj);
struct Scsi_Host *shost = class_to_shost(dev);
struct ibmvfc_host *vhost = shost_priv(shost);
unsigned long flags = 0;
-- 
1.8.3.1
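
For context, kobj_to_dev() is a typed wrapper around the very same
container_of(), so the change is purely cosmetic and self-documenting.
Its definition in include/linux/device.h is simply:

static inline struct device *kobj_to_dev(struct kobject *kobj)
{
	return container_of(kobj, struct device, kobj);
}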



[PATCH v2] ASoC: imx-hdmi: fix platform_no_drv_owner.cocci warnings

2021-03-04 Thread Yang Li
./sound/soc/fsl/imx-hdmi.c:226:3-8: No need to set .owner here. The core
will do it.

Remove the .owner field; the call that registers the driver sets it
automatically.

Reported-by: Abaci Robot 
Signed-off-by: Yang Li 
---

Changes in v2:
- Use imx-hdmi instead of hdmi-codec in the subject

 sound/soc/fsl/imx-hdmi.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/sound/soc/fsl/imx-hdmi.c b/sound/soc/fsl/imx-hdmi.c
index dbbb761..cd0235a 100644
--- a/sound/soc/fsl/imx-hdmi.c
+++ b/sound/soc/fsl/imx-hdmi.c
@@ -223,7 +223,6 @@ static int imx_hdmi_probe(struct platform_device *pdev)
 static struct platform_driver imx_hdmi_driver = {
.driver = {
.name = "imx-hdmi",
-   .owner = THIS_MODULE,
.pm = &snd_soc_pm_ops,
.of_match_table = imx_hdmi_dt_ids,
},
-- 
1.8.3.1
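
The reason the core can do it: registering a platform driver already
records the owning module. platform_driver_register() is a macro (see
include/linux/platform_device.h) that passes THIS_MODULE down on the
caller's behalf:

/* from include/linux/platform_device.h */
#define platform_driver_register(drv) \
	__platform_driver_register(drv, THIS_MODULE)

__platform_driver_register() then stores that module pointer in
drv->driver.owner, which is why setting .owner by hand is redundant.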



[PATCH] powerpc: fix coding style issues

2021-03-04 Thread Qiang Ma
There are several style issues in this function,
so fix them.

Signed-off-by: Qiang Ma 
---
 arch/powerpc/kernel/syscalls.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
index 078608ec2e92..bcbb5fb2a0c1 100644
--- a/arch/powerpc/kernel/syscalls.c
+++ b/arch/powerpc/kernel/syscalls.c
@@ -81,15 +81,15 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, size_t, len,
 int
 ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp, struct __kernel_old_timeval __user *tvp)
 {
-   if ( (unsigned long)n >= 4096 )
-   {
+   if ((unsigned long)n >= 4096) {
unsigned long __user *buffer = (unsigned long __user *)n;
-   if (!access_ok(buffer, 5*sizeof(unsigned long))
+   if (!access_ok(buffer, 5 * sizeof(unsigned long))
|| __get_user(n, buffer)
-   || __get_user(inp, ((fd_set __user * __user *)(buffer+1)))
-   || __get_user(outp, ((fd_set  __user * __user *)(buffer+2)))
-   || __get_user(exp, ((fd_set  __user * __user *)(buffer+3)))
-   || __get_user(tvp, ((struct __kernel_old_timeval  __user * __user *)(buffer+4))))
+   || __get_user(inp, ((fd_set __user * __user *)(buffer + 1)))
+   || __get_user(outp, ((fd_set  __user * __user *)(buffer + 2)))
+   || __get_user(exp, ((fd_set  __user * __user *)(buffer + 3)))
+   || __get_user(tvp,
+   ((struct __kernel_old_timeval  __user * __user *)(buffer + 4))))
return -EFAULT;
}
return sys_select(n, inp, outp, exp, tvp);
-- 
2.20.1
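
In case the n >= 4096 test being restyled here looks odd: as I
understand this historical quirk, very old powerpc binaries called
select() with a single user pointer to a five-word argument block
instead of five separate arguments, and since the first page is never
mapped, any pointer value is >= 4096 and the first argument doubles as
a discriminator. A hypothetical view of the block the __get_user()
calls unpack, with illustrative field names:

/* Legacy five-word select() argument block (field names invented
 * for illustration; the kernel reads it word by word). */
struct old_select_block {
	unsigned long n;				/* buffer[0]: fd count */
	fd_set __user *inp;				/* buffer[1] */
	fd_set __user *outp;				/* buffer[2] */
	fd_set __user *exp;				/* buffer[3] */
	struct __kernel_old_timeval __user *tvp;	/* buffer[4] */
};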