On 26/09/2018 06:03, rth7...@gmail.com wrote:
From: Richard Henderson <richard.hender...@linaro.org>
ARMv8.1 adds a (mandatory) Atomics extension, also known as the
Large System Extension. Deploying this extension at the OS level
has proved challenging.
The following is the result of a conversation between myself,
Alex Graf of SuSE, and Ramana Radhakrishnan of ARM, at last week's
Linaro Connect in Vancouver.
The current state of the world is that one can distribute two
different copies of a given shared library, placing the LSE-enabled
version in /lib64/atomics/; ld.so will then select it over the
/lib64/ version when HWCAP_ATOMICS is present.
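For readers unfamiliar with HWCAP_ATOMICS, the following is a minimal sketch of how a user-space program on AArch64 Linux might query the same capability bit that ld.so consults. The function name cpu_has_lse_atomics is hypothetical and not part of this patch set; on any other platform the sketch simply reports false.

```c
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP; also pulls in HWCAP_* bits */
#endif

/* Hypothetical helper: true when the kernel reports the ARMv8.1
   atomics (LSE) through the auxiliary vector. */
bool cpu_has_lse_atomics(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_ATOMICS)
    return (getauxval(AT_HWCAP) & HWCAP_ATOMICS) != 0;
#else
    return false;  /* not AArch64 Linux: nothing to report */
#endif
}
```

The result does not change while the process runs, so a caller would typically query it once at startup and cache the answer.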
Alex's main concern with this is that (1) he doesn't want to
distribute two copies of every library, or determine what a
reasonable subset would be, and (2) this solution does not work
for executables, e.g. mysql.
Ramana's main concern was to avoid the overhead of an indirect jump,
especially in how that would affect the (non-)branch-prediction of
the smallest implementations.
Therefore, I've created small out-of-line helpers that are directly
linked into every library or executable that requires them. There
will be two direct branches, both of which will be well-predicted.
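To illustrate the shape of such a helper, here is a rough C sketch, not the code from the patch (which lives in libgcc/config/aarch64/lse.c). The names have_lse_flag and ool_cas_u32 are hypothetical, and portable __atomic builtins stand in for the real instruction sequences. I am assuming the two direct branches are the call into the helper and the conditional branch on the flag inside it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag; a real helper would set it once at startup,
   e.g. from HWCAP_ATOMICS. */
static bool have_lse_flag = false;

/* Hypothetical out-of-line compare-and-swap: returns the value that
   was in *ptr before the operation. */
__attribute__((noinline))
uint32_t ool_cas_u32(uint32_t expected, uint32_t desired, uint32_t *ptr)
{
    if (have_lse_flag) {
        /* Stand-in for the single LSE CAS instruction. */
        __atomic_compare_exchange_n(ptr, &expected, desired, false,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        return expected;  /* unchanged on success, old value on failure */
    }

    /* Stand-in for the load-exclusive/store-exclusive loop: a weak CAS
       may fail spuriously, so retry while the value still matches. */
    uint32_t seen = expected;
    while (!__atomic_compare_exchange_n(ptr, &seen, desired, true,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)) {
        if (seen != expected)
            break;        /* genuine mismatch: report what we saw */
    }
    return seen;
}
```

Because the flag never changes after startup, the conditional branch inside the helper always goes the same way, which is what should make it easy to predict.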
In the process, I discovered a number of places within the code
where the existing implementation could be improved. In particular:
- the LSE patterns didn't use predicates or constraints that
match the actual instructions, requiring unnecessary splitting.
- the non-LSE compare-and-swap can use an extending compare to
avoid requiring the input to have been previously extended.
- TImode compare-and-swap was missing entirely. This brings
aarch64 to parity with x86_64 wrt __sync_val_compare_and_swap.
There is a final patch that enables the new option by default.
I am not necessarily expecting this one to be merged upstream;
rather, I expect the operating system to decide what the default
should be. It might be that this should be a configure option, so as
to make that OS choice easier, but I've just now thought of that. ;-)
I'm going to have to rely on Alex and/or Ramana to perform
testing on a system that supports LSE.
Thanks for this patchset -
I'll give this a whirl in the next couple of days but don't expect
results until Monday or so.
I do have an additional concern that I forgot to mention in Vancouver
(thanks to Wilco for reminding me): this now replaces a bunch of
inline instructions with what is effectively a library call, thereby
clobbering a whole bunch of caller-saved registers.
In which case I see 2 options.
- Maybe we should consider a private interface and restrict the
registers that these files are compiled with, to minimise the number
of caller-saved registers we trash.
- Alternatively, we should consider an option to inline these at -O2
or -O3, as we may just be trading the performance improvements we get
from using the LSE atomics for additional stacking and unstacking of
caller-saved registers in the main functions...
But anyway while we discuss that we'll have a look at testing and
benchmarking this.
regards
Ramana
r~
Richard Henderson (11):
aarch64: Simplify LSE cas generation
aarch64: Improve cas generation
aarch64: Improve swp generation
aarch64: Improve atomic-op lse generation
aarch64: Emit LSE st<op> instructions
Add visibility to libfunc constructors
Link static libgcc after shared libgcc for -shared-libgcc
aarch64: Add out-of-line functions for LSE atomics
aarch64: Implement -matomic-ool
aarch64: Implement TImode compare-and-swap
Enable -matomic-ool by default
gcc/config/aarch64/aarch64-protos.h | 20 +-
gcc/optabs-libfuncs.h | 2 +
gcc/common/config/aarch64/aarch64-common.c | 6 +-
gcc/config/aarch64/aarch64.c | 480 ++++++--------
gcc/gcc.c | 9 +-
gcc/optabs-libfuncs.c | 26 +-
.../atomic-comp-swap-release-acquire.c | 2 +-
.../gcc.target/aarch64/atomic-inst-ldadd.c | 18 +-
.../gcc.target/aarch64/atomic-inst-ldlogic.c | 54 +-
.../gcc.target/aarch64/atomic-op-acq_rel.c | 2 +-
.../gcc.target/aarch64/atomic-op-acquire.c | 2 +-
.../gcc.target/aarch64/atomic-op-char.c | 2 +-
.../gcc.target/aarch64/atomic-op-consume.c | 2 +-
.../gcc.target/aarch64/atomic-op-imm.c | 2 +-
.../gcc.target/aarch64/atomic-op-int.c | 2 +-
.../gcc.target/aarch64/atomic-op-long.c | 2 +-
.../gcc.target/aarch64/atomic-op-relaxed.c | 2 +-
.../gcc.target/aarch64/atomic-op-release.c | 2 +-
.../gcc.target/aarch64/atomic-op-seq_cst.c | 2 +-
.../gcc.target/aarch64/atomic-op-short.c | 2 +-
.../aarch64/atomic_cmp_exchange_zero_reg_1.c | 2 +-
.../atomic_cmp_exchange_zero_strong_1.c | 2 +-
.../gcc.target/aarch64/sync-comp-swap.c | 2 +-
.../gcc.target/aarch64/sync-op-acquire.c | 2 +-
.../gcc.target/aarch64/sync-op-full.c | 2 +-
libgcc/config/aarch64/lse.c | 280 ++++++++
gcc/config/aarch64/aarch64.opt | 4 +
gcc/config/aarch64/atomics.md | 608 ++++++++++--------
gcc/config/aarch64/iterators.md | 8 +-
gcc/config/aarch64/predicates.md | 12 +
gcc/doc/invoke.texi | 14 +-
libgcc/config.host | 4 +
libgcc/config/aarch64/t-lse | 48 ++
33 files changed, 1050 insertions(+), 577 deletions(-)
create mode 100644 libgcc/config/aarch64/lse.c
create mode 100644 libgcc/config/aarch64/t-lse