On Tue, Jul 16, 2019 at 11:15:54AM -0700, Nick Desaulniers wrote:
> On Sun, Jul 14, 2019 at 5:37 PM Josh Poimboeuf <jpoim...@redhat.com> wrote:
> >
> > On x86-64, with CONFIG_RETPOLINE=n, GCC's "global common subexpression
> > elimination" optimization results in ___bpf_prog_run()'s jumptable code
> > changing from this:
> >
> >       select_insn:
> >               jmp *jumptable(, %rax, 8)
> >               ...
> >       ALU64_ADD_X:
> >               ...
> >               jmp *jumptable(, %rax, 8)
> >       ALU_ADD_X:
> >               ...
> >               jmp *jumptable(, %rax, 8)
> >
> > to this:
> >
> >       select_insn:
> >               mov jumptable, %r12
> >               jmp *(%r12, %rax, 8)
> >               ...
> >       ALU64_ADD_X:
> >               ...
> >               jmp *(%r12, %rax, 8)
> >       ALU_ADD_X:
> >               ...
> >               jmp *(%r12, %rax, 8)
> >
> > The jumptable address is placed in a register once, at the beginning of
> > the function.  The function execution can then go through multiple
> > indirect jumps which rely on that same register value.  This has a few
> > issues:
> >
> > 1) Objtool isn't smart enough to be able to track such a register value
> >    across multiple recursive indirect jumps through the jump table.
> >
> > 2) With CONFIG_RETPOLINE enabled, this optimization actually results in
> >    a small slowdown.  I measured a ~4.7% slowdown in the test_bpf
> >    "tcpdump port 22" selftest.
> >
> >    This slowdown is actually predicted by the GCC manual:
> >
> >      Note: When compiling a program using computed gotos, a GCC
> >      extension, you may get better run-time performance if you
> >      disable the global common subexpression elimination pass by
> >      adding -fno-gcse to the command line.
> >
> > So just disable the optimization for this function.
> >
> > Fixes: e55a73251da3 ("bpf: Fix ORC unwinding in non-JIT BPF code")
> > Reported-by: Randy Dunlap <rdun...@infradead.org>
> > Signed-off-by: Josh Poimboeuf <jpoim...@redhat.com>
> > Acked-by: Alexei Starovoitov <a...@kernel.org>
> > ---
> > Cc: Alexei Starovoitov <a...@kernel.org>
> > Cc: Daniel Borkmann <dan...@iogearbox.net>
> > ---
> >  include/linux/compiler-gcc.h   | 2 ++
> >  include/linux/compiler_types.h | 4 ++++
> >  kernel/bpf/core.c              | 2 +-
> >  3 files changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> > index e8579412ad21..d7ee4c6bad48 100644
> > --- a/include/linux/compiler-gcc.h
> > +++ b/include/linux/compiler-gcc.h
> > @@ -170,3 +170,5 @@
> >  #else
> >  #define __diag_GCC_8(s)
> >  #endif
> > +
> > +#define __no_fgcse __attribute__((optimize("-fno-gcse")))
>
> + Miguel, maintainer of compiler_attributes.h
> I wonder if the optimize attributes can be feature detected?
> Is -fno-gcse supported all the way back to GCC 4.6?

Yeah, from snooping in the GCC tree it looks like it's been around for
18+ years.

--
Josh
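
For anyone following along without the kernel tree handy, here is a rough
userspace sketch of the pattern discussed above: a computed-goto dispatch
loop with GCSE disabled for that one function, plus one possible way to
feature-detect the attribute. The names NO_GCSE, struct insn, run() and
the three opcodes are made up for illustration and are not from the
patch. The sketch assumes a compiler that supports labels as values (GCC
or Clang). Where __has_attribute is missing (older GCC) or reports that
"optimize" is unsupported, the macro expands to nothing, so this is not a
substitute for checking the GCC version.

```c
#include <stddef.h>

/*
 * Hypothetical macro, modeled on the __no_fgcse define in the patch.
 * If the compiler has no __has_attribute, or does not know the
 * "optimize" attribute, NO_GCSE silently becomes a no-op.
 */
#if defined(__has_attribute)
# if __has_attribute(optimize)
#  define NO_GCSE __attribute__((optimize("-fno-gcse")))
# endif
#endif
#ifndef NO_GCSE
# define NO_GCSE
#endif

/* Toy instruction set, for illustration only. */
enum { OP_ADD, OP_SUB, OP_EXIT };

struct insn {
	unsigned char op;
	long imm;
};

/* Each handler ends with its own indirect jump through the table. */
NO_GCSE long run(const struct insn *insn)
{
	static const void *const jumptable[] = {
		[OP_ADD]  = &&do_add,
		[OP_SUB]  = &&do_sub,
		[OP_EXIT] = &&do_exit,
	};
	long acc = 0;

#define DISPATCH() goto *jumptable[insn->op]

	DISPATCH();
do_add:
	acc += insn->imm;
	insn++;
	DISPATCH();
do_sub:
	acc -= insn->imm;
	insn++;
	DISPATCH();
do_exit:
	return acc;

#undef DISPATCH
}
```

Calling run() on the program { {OP_ADD, 5}, {OP_SUB, 2}, {OP_EXIT, 0} }
returns 3. Whether the attribute changes the generated code can be
checked by comparing the disassembly of run() built with and without it.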