[PATCH] arm64: make atomic helpers __always_inline

2021-01-08 Thread Arnd Bergmann
From: Arnd Bergmann 

With UBSAN enabled and building with clang, there are occasionally
warnings like

WARNING: modpost: vmlinux.o(.text+0xc533ec): Section mismatch in reference from 
the function arch_atomic64_or() to the variable .init.data:numa_nodes_parsed
The function arch_atomic64_or() references
the variable __initdata numa_nodes_parsed.
This is often because arch_atomic64_or lacks a __initdata
annotation or the annotation of numa_nodes_parsed is wrong.

for functions that end up not being inlined as intended but operating
on __initdata variables. Mark these as __always_inline, along with
the corresponding asm-generic wrappers.

Signed-off-by: Arnd Bergmann 
---
 arch/arm64/include/asm/atomic.h      | 10 +++++-----
 include/asm-generic/bitops/atomic.h |  6 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
index 015ddffaf6ca..b56a4b2bc248 100644
--- a/arch/arm64/include/asm/atomic.h
+++ b/arch/arm64/include/asm/atomic.h
@@ -17,7 +17,7 @@
 #include <asm/lse.h>
 
 #define ATOMIC_OP(op)  \
-static inline void arch_##op(int i, atomic_t *v)   \
+static __always_inline void arch_##op(int i, atomic_t *v)  \
 {  \
__lse_ll_sc_body(op, i, v); \
 }
@@ -32,7 +32,7 @@ ATOMIC_OP(atomic_sub)
 #undef ATOMIC_OP
 
 #define ATOMIC_FETCH_OP(name, op)  \
-static inline int arch_##op##name(int i, atomic_t *v)  \
+static __always_inline int arch_##op##name(int i, atomic_t *v) \
 {  \
return __lse_ll_sc_body(op##name, i, v);\
 }
@@ -56,7 +56,7 @@ ATOMIC_FETCH_OPS(atomic_sub_return)
 #undef ATOMIC_FETCH_OPS
 
 #define ATOMIC64_OP(op)  \
-static inline void arch_##op(long i, atomic64_t *v)\
+static __always_inline void arch_##op(long i, atomic64_t *v)   \
 {  \
__lse_ll_sc_body(op, i, v); \
 }
@@ -71,7 +71,7 @@ ATOMIC64_OP(atomic64_sub)
 #undef ATOMIC64_OP
 
 #define ATOMIC64_FETCH_OP(name, op)\
-static inline long arch_##op##name(long i, atomic64_t *v)  \
+static __always_inline long arch_##op##name(long i, atomic64_t *v) \
 {  \
return __lse_ll_sc_body(op##name, i, v);\
 }
@@ -94,7 +94,7 @@ ATOMIC64_FETCH_OPS(atomic64_sub_return)
 #undef ATOMIC64_FETCH_OP
 #undef ATOMIC64_FETCH_OPS
 
-static inline long arch_atomic64_dec_if_positive(atomic64_t *v)
+static __always_inline long arch_atomic64_dec_if_positive(atomic64_t *v)
 {
return __lse_ll_sc_body(atomic64_dec_if_positive, v);
 }
diff --git a/include/asm-generic/bitops/atomic.h b/include/asm-generic/bitops/atomic.h
index dd90c9792909..0e7316a86240 100644
--- a/include/asm-generic/bitops/atomic.h
+++ b/include/asm-generic/bitops/atomic.h
@@ -11,19 +11,19 @@
  * See Documentation/atomic_bitops.txt for details.
  */
 
-static inline void set_bit(unsigned int nr, volatile unsigned long *p)
+static __always_inline void set_bit(unsigned int nr, volatile unsigned long *p)
 {
p += BIT_WORD(nr);
atomic_long_or(BIT_MASK(nr), (atomic_long_t *)p);
 }
 
-static inline void clear_bit(unsigned int nr, volatile unsigned long *p)
+static __always_inline void clear_bit(unsigned int nr, volatile unsigned long *p)
 {
p += BIT_WORD(nr);
atomic_long_andnot(BIT_MASK(nr), (atomic_long_t *)p);
 }
 
-static inline void change_bit(unsigned int nr, volatile unsigned long *p)
+static __always_inline void change_bit(unsigned int nr, volatile unsigned long *p)
 {
p += BIT_WORD(nr);
atomic_long_xor(BIT_MASK(nr), (atomic_long_t *)p);
-- 
2.29.2



Re: [PATCH] arm64: make atomic helpers __always_inline

2021-01-13 Thread Will Deacon
On Tue, Jan 12, 2021 at 10:23:12AM +, Mark Rutland wrote:
> On Fri, Jan 08, 2021 at 09:39:53PM +0100, Peter Zijlstra wrote:
> > On Fri, Jan 08, 2021 at 09:32:58AM +, Will Deacon wrote:
> > > Hi Arnd,
> > > 
> > > On Fri, Jan 08, 2021 at 10:19:56AM +0100, Arnd Bergmann wrote:
> > > > From: Arnd Bergmann 
> > > > 
> > > > With UBSAN enabled and building with clang, there are occasionally
> > > > warnings like
> > > > 
> > > > WARNING: modpost: vmlinux.o(.text+0xc533ec): Section mismatch in 
> > > > reference from the function arch_atomic64_or() to the variable 
> > > > .init.data:numa_nodes_parsed
> > > > The function arch_atomic64_or() references
> > > > the variable __initdata numa_nodes_parsed.
> > > > This is often because arch_atomic64_or lacks a __initdata
> > > > annotation or the annotation of numa_nodes_parsed is wrong.
> > > > 
> > > > for functions that end up not being inlined as intended but operating
> > > > on __initdata variables. Mark these as __always_inline, along with
> > > > the corresponding asm-generic wrappers.
> > > 
> > > Hmm, I don't fully grok this. Why does it matter if a non '__init' 
> > > function
> > > is called with a pointer to some '__initdata'? Or is the reference coming
> > > from somewhere else? (where?).
> > 
> > FWIW the x86 atomics are __always_inline in part due to the noinstr
> > crud, which I imagine resulted in much the same 'fun'.
> 
> FWIW, I was planning on doing the same here as part of making arm64
> noinstr safe, so I reckon we should probably do this regardless of
> whether it's a complete fix for the section mismatch issue.

Fair enough:

Acked-by: Will Deacon 

Will


Re: [PATCH] arm64: make atomic helpers __always_inline

2021-01-13 Thread Catalin Marinas
On Fri, 8 Jan 2021 10:19:56 +0100, Arnd Bergmann wrote:
> With UBSAN enabled and building with clang, there are occasionally
> warnings like
> 
> WARNING: modpost: vmlinux.o(.text+0xc533ec): Section mismatch in reference 
> from the function arch_atomic64_or() to the variable 
> .init.data:numa_nodes_parsed
> The function arch_atomic64_or() references
> the variable __initdata numa_nodes_parsed.
> This is often because arch_atomic64_or lacks a __initdata
> annotation or the annotation of numa_nodes_parsed is wrong.
> 
> [...]

Applied to arm64 (for-next/fixes), thanks!

[1/1] arm64: make atomic helpers __always_inline
  https://git.kernel.org/arm64/c/c35a824c3183

-- 
Catalin



Re: [PATCH] arm64: make atomic helpers __always_inline

2021-01-12 Thread Mark Rutland
On Fri, Jan 08, 2021 at 09:39:53PM +0100, Peter Zijlstra wrote:
> On Fri, Jan 08, 2021 at 09:32:58AM +, Will Deacon wrote:
> > Hi Arnd,
> > 
> > On Fri, Jan 08, 2021 at 10:19:56AM +0100, Arnd Bergmann wrote:
> > > From: Arnd Bergmann 
> > > 
> > > With UBSAN enabled and building with clang, there are occasionally
> > > warnings like
> > > 
> > > WARNING: modpost: vmlinux.o(.text+0xc533ec): Section mismatch in 
> > > reference from the function arch_atomic64_or() to the variable 
> > > .init.data:numa_nodes_parsed
> > > The function arch_atomic64_or() references
> > > the variable __initdata numa_nodes_parsed.
> > > This is often because arch_atomic64_or lacks a __initdata
> > > annotation or the annotation of numa_nodes_parsed is wrong.
> > > 
> > > for functions that end up not being inlined as intended but operating
> > > on __initdata variables. Mark these as __always_inline, along with
> > > the corresponding asm-generic wrappers.
> > 
> > Hmm, I don't fully grok this. Why does it matter if a non '__init' function
> > is called with a pointer to some '__initdata'? Or is the reference coming
> > from somewhere else? (where?).
> 
> FWIW the x86 atomics are __always_inline in part due to the noinstr
> crud, which I imagine resulted in much the same 'fun'.

FWIW, I was planning on doing the same here as part of making arm64
noinstr safe, so I reckon we should probably do this regardless of
whether it's a complete fix for the section mismatch issue.
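
(As a rough standalone illustration of why noinstr code wants __always_inline
helpers -- simplified, with made-up names, and not the actual kernel macro
definitions, which carry more attributes than the two shown here:)

/* Rough sketch of the noinstr concern (hypothetical names). */
#define noinstr_sketch \
	__attribute__((section(".noinstr.text"), no_instrument_function))

static long counter;

/* Plain 'inline': the compiler is still free to emit this out of line. */
static inline void helper_or(long i, long *p)
{
	*p |= i;
}

noinstr_sketch void low_level_entry(void)
{
	/*
	 * If helper_or() ends up out of line, that copy lives in .text and
	 * gets normal instrumentation, which a .noinstr.text caller must
	 * avoid -- hence __always_inline on such low-level helpers.
	 */
	helper_or(1, &counter);
}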

Mark.


Re: [PATCH] arm64: make atomic helpers __always_inline

2021-01-08 Thread Will Deacon
Hi Arnd,

On Fri, Jan 08, 2021 at 10:19:56AM +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> With UBSAN enabled and building with clang, there are occasionally
> warnings like
> 
> WARNING: modpost: vmlinux.o(.text+0xc533ec): Section mismatch in reference 
> from the function arch_atomic64_or() to the variable 
> .init.data:numa_nodes_parsed
> The function arch_atomic64_or() references
> the variable __initdata numa_nodes_parsed.
> This is often because arch_atomic64_or lacks a __initdata
> annotation or the annotation of numa_nodes_parsed is wrong.
> 
> for functions that end up not being inlined as intended but operating
> on __initdata variables. Mark these as __always_inline, along with
> the corresponding asm-generic wrappers.

Hmm, I don't fully grok this. Why does it matter if a non '__init' function
is called with a pointer to some '__initdata'? Or is the reference coming
from somewhere else? (where?).

Will


Re: [PATCH] arm64: make atomic helpers __always_inline

2021-01-08 Thread Arnd Bergmann
On Fri, Jan 8, 2021 at 10:33 AM Will Deacon  wrote:
> On Fri, Jan 08, 2021 at 10:19:56AM +0100, Arnd Bergmann wrote:
> > From: Arnd Bergmann 
> >
> > With UBSAN enabled and building with clang, there are occasionally
> > warnings like
> >
> > WARNING: modpost: vmlinux.o(.text+0xc533ec): Section mismatch in reference 
> > from the function arch_atomic64_or() to the variable 
> > .init.data:numa_nodes_parsed
> > The function arch_atomic64_or() references
> > the variable __initdata numa_nodes_parsed.
> > This is often because arch_atomic64_or lacks a __initdata
> > annotation or the annotation of numa_nodes_parsed is wrong.
> >
> > for functions that end up not being inlined as intended but operating
> > on __initdata variables. Mark these as __always_inline, along with
> > the corresponding asm-generic wrappers.
>
> Hmm, I don't fully grok this. Why does it matter if a non '__init' function
> is called with a pointer to some '__initdata'? Or is the reference coming
> from somewhere else? (where?).

There are (at least) three ways for gcc to deal with a 'static inline'
function:

a) fully inline it as the __always_inline attribute does
b) not inline it at all, treating it as a regular static function
c) create a specialized version with different calling conventions

In this case, clang goes with option c when it notices that all
callers pass the same constant pointer. This means we have a
synthetic

static noinline long arch_atomic64_or(long i)
{
        return __lse_ll_sc_body(atomic64_fetch_or, i, &numa_nodes_parsed);
}

which is a few bytes shorter than option b as it saves a load in the
caller. This function definition however violates the kernel's rules
for section references, as the synthetic version is not marked __init.
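
As a rough standalone sketch of the same pattern (made-up names, outside the
kernel; whether a compiler actually emits such a clone depends entirely on
its heuristics):

/* sketch.c -- illustration only, hypothetical names */

/* Variable placed in a discardable section, analogous to __initdata. */
static long table __attribute__((section(".init.data"))) = 0;

/* Helper the author expects to be inlined. */
static inline void helper_or(long i, long *p)
{
	*p |= i;
}

/* The only caller always passes &table, so the compiler may emit a
 * specialized out-of-line clone of helper_or() that references table
 * directly -- and that clone ends up in .text rather than .init.text,
 * which is exactly the reference modpost complains about. */
__attribute__((section(".init.text"))) void setup_table(long i)
{
	helper_or(i, &table);
}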

  Arnd


Re: [PATCH] arm64: make atomic helpers __always_inline

2021-01-08 Thread Will Deacon
On Fri, Jan 08, 2021 at 11:26:53AM +0100, Arnd Bergmann wrote:
> On Fri, Jan 8, 2021 at 10:33 AM Will Deacon  wrote:
> > On Fri, Jan 08, 2021 at 10:19:56AM +0100, Arnd Bergmann wrote:
> > > From: Arnd Bergmann 
> > >
> > > With UBSAN enabled and building with clang, there are occasionally
> > > warnings like
> > >
> > > WARNING: modpost: vmlinux.o(.text+0xc533ec): Section mismatch in 
> > > reference from the function arch_atomic64_or() to the variable 
> > > .init.data:numa_nodes_parsed
> > > The function arch_atomic64_or() references
> > > the variable __initdata numa_nodes_parsed.
> > > This is often because arch_atomic64_or lacks a __initdata
> > > annotation or the annotation of numa_nodes_parsed is wrong.
> > >
> > > for functions that end up not being inlined as intended but operating
> > > on __initdata variables. Mark these as __always_inline, along with
> > > the corresponding asm-generic wrappers.
> >
> > Hmm, I don't fully grok this. Why does it matter if a non '__init' function
> > is called with a pointer to some '__initdata'? Or is the reference coming
> > from somewhere else? (where?).
> 
> There are (at least) three ways for gcc to deal with a 'static inline'
> function:
> 
> a) fully inline it as the __always_inline attribute does
> b) not inline it at all, treating it as a regular static function
> c) create a specialized version with different calling conventions
> 
> In this case, clang goes with option c when it notices that all
> callers pass the same constant pointer. This means we have a
> synthetic
> 
> static noinline long arch_atomic64_or(long i)
> {
> return __lse_ll_sc_body(atomic64_fetch_or, i, &numa_nodes_parsed);
> }
> 
> which is a few bytes shorter than option b as it saves a load in the
> caller. This function definition however violates the kernel's rules
> for section references, as the synthetic version is not marked __init.

Ah, I was hoping the compiler would've sorted that out, but then again, how
would it know? But doesn't this mean that whenever we get one caller passing
something like an __initdata pointer to a function, then that function needs
to be __always_inline for everybody? It feels like a slippery slope
considering the incentive to go back and replace it with 'inline' if the
caller goes away is very small.

Didn't we used to #define inline as __always_inline to avoid this situation?
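
(Roughly, from memory -- not the exact historical definition, since the
attribute list and the CONFIG_OPTIMIZE_INLINING conditionals changed over
time -- the trick was something like:)

/* Force 'inline' to mean always-inline everywhere. */
#define inline inline __attribute__((__always_inline__))

static inline int add_one(int x)
{
	return x + 1;
}

int caller(int x)
{
	return add_one(x);	/* cannot be left out of line */
}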

Will


Re: [PATCH] arm64: make atomic helpers __always_inline

2021-01-08 Thread Arnd Bergmann
On Fri, Jan 8, 2021 at 7:50 PM Will Deacon  wrote:
> On Fri, Jan 08, 2021 at 11:26:53AM +0100, Arnd Bergmann wrote:
> >
> > a) fully inline it as the __always_inline attribute does
> > b) not inline it at all, treating it as a regular static function
> > c) create a specialized version with different calling conventions
> >
> > In this case, clang goes with option c when it notices that all
> > callers pass the same constant pointer. This means we have a
> > synthetic
> >
> > static noinline long arch_atomic64_or(long i)
> > {
> > return __lse_ll_sc_body(atomic64_fetch_or, i, &numa_nodes_parsed);
> > }
> >
> > which is a few bytes shorter than option b as it saves a load in the
> > caller. This function definition however violates the kernel's rules
> > for section references, as the synthetic version is not marked __init.
>
> Ah, I was hoping the compiler would've sorted that out, but then again, how
> would it know? But doesn't this mean that whenever we get one caller passing
> something like an __initdata pointer to a function, then that function needs
> to be __always_inline for everybody? It feels like a slippery slope
> considering the incentive to go back and replace it with 'inline' if the
> caller goes away is very small.

It showed up after UBSAN was enabled, which changed the compiler's inlining decisions.
I think we've had similar cases in the past, and worked around them in the
same way. IIRC there were two or three such instances this time, and only
in functions that are supposed to be only a handful of instructions long.

One thing I did not try though was to look at the object file to find
out why it has done this. Here is the generated assembler code for reference:

        .p2align  2                             // -- Begin function arch_atomic64_or
        .type   arch_atomic64_or,@function
arch_atomic64_or:                               // @arch_atomic64_or
// %bb.0:
        hint    #25
        stp     x29, x30, [sp, #-48]!           // 16-byte Folded Spill
        stp     x20, x19, [sp, #32]             // 16-byte Folded Spill
        mov     x19, x1
        mov     x20, x0
        str     x21, [sp, #16]                  // 8-byte Folded Spill
        mov     x29, sp
        //APP
1:      b       .Ltmp2
        .pushsection    __jump_table, "aw"
        .align  3
        .long   1b - ., .Ltmp2 - .
        .quad   arm64_const_caps_ready+1 - .
        .popsection

        //NO_APP
// %bb.1:
        mov     w21, wzr
.LBB19_2:
        adrp    x0, system_uses_lse_atomics.__f
        eor     w1, w21, #0x1
        add     x0, x0, :lo12:system_uses_lse_atomics.__f
        mov     w2, #1
        mov     w3, wzr
        bl      ftrace_likely_update
        tbnz    w21, #0, .LBB19_7
// %bb.3:
        //APP
1:      b       .Ltmp3
        .pushsection    __jump_table, "aw"
        .align  3
        .long   1b - ., .Ltmp3 - .
        .quad   cpu_hwcap_keys+81 - .
        .popsection

        //NO_APP
// %bb.4:
        mov     w21, #1
.LBB19_5:
        adrp    x0, system_uses_lse_atomics.__f.20
        add     x0, x0, :lo12:system_uses_lse_atomics.__f.20
        mov     w2, #1
        mov     w1, w21
        mov     w3, wzr
        bl      ftrace_likely_update
        cbz     w21, .LBB19_7
// %bb.6:
        //APP
        .arch_extension lse
        stset   x20, [x19]

        //NO_APP
        b       .LBB19_8
.LBB19_7:
        //APP
        // atomic64_or
        b       3f
        .subsection     1
3:
        prfm    pstl1strm, [x19]
1:      ldxr    x8, [x19]
        orr     x8, x8, x20
        stxr    w9, x8, [x19]
        cbnz    w9, 1b
        b       4f
        .previous
4:

        //NO_APP
.LBB19_8:
        ldp     x20, x19, [sp, #32]             // 16-byte Folded Reload
        ldr     x21, [sp, #16]                  // 8-byte Folded Reload
        ldp     x29, x30, [sp], #48             // 16-byte Folded Reload
        hint    #29
        ret
.Ltmp2:                                         // Block address taken
.LBB19_9:
        mov     w21, #1
        b       .LBB19_2
.Ltmp3:                                         // Block address taken
.LBB19_10:
        mov     w21, wzr
        b       .LBB19_5
.Lfunc_end19:
        .size   arch_atomic64_or, .Lfunc_end19-arch_atomic64_or
                                                // -- End function
        .section        .init.text,"ax",@progbits
        .p2align  2                             // -- Begin function early_cpu_to_node


Admittedly, now that I look at the output, I tend to agree with the
compiler that it should not be inlined and my approach was wrong!

And indeed, CONFIG_UBSAN does not even change the contents of
the function, but it does reduce the amount of inlining overall, so
without UBSAN it does not happen.

This patch actually avoids the out-of-line version as well
and als

Re: [PATCH] arm64: make atomic helpers __always_inline

2021-01-08 Thread Peter Zijlstra
On Fri, Jan 08, 2021 at 09:32:58AM +, Will Deacon wrote:
> Hi Arnd,
> 
> On Fri, Jan 08, 2021 at 10:19:56AM +0100, Arnd Bergmann wrote:
> > From: Arnd Bergmann 
> > 
> > With UBSAN enabled and building with clang, there are occasionally
> > warnings like
> > 
> > WARNING: modpost: vmlinux.o(.text+0xc533ec): Section mismatch in reference 
> > from the function arch_atomic64_or() to the variable 
> > .init.data:numa_nodes_parsed
> > The function arch_atomic64_or() references
> > the variable __initdata numa_nodes_parsed.
> > This is often because arch_atomic64_or lacks a __initdata
> > annotation or the annotation of numa_nodes_parsed is wrong.
> > 
> > for functions that end up not being inlined as intended but operating
> > on __initdata variables. Mark these as __always_inline, along with
> > the corresponding asm-generic wrappers.
> 
> Hmm, I don't fully grok this. Why does it matter if a non '__init' function
> is called with a pointer to some '__initdata'? Or is the reference coming
> from somewhere else? (where?).

FWIW the x86 atomics are __always_inline in part due to the noinstr
crud, which I imagine resulted in much the same 'fun'.


Re: [PATCH] arm64: make atomic helpers __always_inline

2021-01-08 Thread Nick Desaulniers
On Fri, Jan 8, 2021 at 2:27 AM Arnd Bergmann  wrote:
>
> On Fri, Jan 8, 2021 at 10:33 AM Will Deacon  wrote:
> > On Fri, Jan 08, 2021 at 10:19:56AM +0100, Arnd Bergmann wrote:
> > > From: Arnd Bergmann 
> > >
> > > With UBSAN enabled and building with clang, there are occasionally
> > > warnings like
> > >
> > > WARNING: modpost: vmlinux.o(.text+0xc533ec): Section mismatch in 
> > > reference from the function arch_atomic64_or() to the variable 
> > > .init.data:numa_nodes_parsed
> > > The function arch_atomic64_or() references
> > > the variable __initdata numa_nodes_parsed.
> > > This is often because arch_atomic64_or lacks a __initdata
> > > annotation or the annotation of numa_nodes_parsed is wrong.
> > >
> > > for functions that end up not being inlined as intended but operating
> > > on __initdata variables. Mark these as __always_inline, along with
> > > the corresponding asm-generic wrappers.
> >
> > Hmm, I don't fully grok this. Why does it matter if a non '__init' function
> > is called with a pointer to some '__initdata'? Or is the reference coming
> > from somewhere else? (where?).
>
> There are (at least) three ways for gcc to deal with a 'static inline'
> function:
>
> a) fully inline it as the __always_inline attribute does
> b) not inline it at all, treating it as a regular static function
> c) create a specialized version with different calling conventions
>
> In this case, clang goes with option c when it notices that all
> callers pass the same constant pointer. This means we have a
> synthetic
>
> static noinline long arch_atomic64_or(long i)
> {
> return __lse_ll_sc_body(atomic64_fetch_or, i, &numa_nodes_parsed);
> }
>
> which is a few bytes shorter than option b as it saves a load in the
> caller. This function definition however violates the kernel's rules
> for section references, as the synthetic version is not marked __init.

Interesting, I didn't know LLVM could do that.  Do you have a simpler
test case? Maybe I could just fix that in LLVM. (I would guess that
when synthesizing a function from an existing function, the new
function needs to copy the original function's attributes as well).

-- 
Thanks,
~Nick Desaulniers