>   Here is a new rematerialization sub-pass of LRA.
> 
>   I've tested and benchmarked the sub-pass on x86-64 and ARM.  The
> sub-pass generates smaller code on average on both architectures
> (although the improvement is not significant), adds < 0.4% additional
> compilation time in -O2 mode of a release GCC (according to user time
> for compiling a 500K-line Fortran program and valgrind lackey
> instruction counts for compiling combine.i) and about 0.7% in -O0
> mode.  As for performance, the best result I found is a 1% SPECFP2000
> improvement on ARM Exynos 5410 (973 vs 963), but on Intel Haswell the
> results are practically the same (Haswell has a very sophisticated
> memory sub-system).

I ran SPEC2k on AArch64, and EON fails to run correctly with
-fno-caller-saves -mcpu=cortex-a57 -fomit-frame-pointer -Ofast. I'm not
sure whether this is AArch64 specific, but previously non-optimal register
allocation choices triggered a latent bug in ree (it's unclear why GCC
still allocates FP registers in high-pressure integer code, as I set the
costs for int<->FP moves high).

On SPECINT2k performance is ~0.5% worse (5.5% regression on perlbmk), and 
SPECFP is ~0.2% faster.

Generally I think it is good to have a specific pass for rematerialization.
However, should this not also affect the costs of instructions that can be
cheaply rematerialized? Similarly for the choice between caller-saving and
spilling (today the caller-save code doesn't care at all about
rematerialization, so it aggressively caller-saves values which could be
rematerialized - see e.g.
https://gcc.gnu.org/ml/gcc/2014-09/msg00071.html).
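
As a made-up example of the kind of case I mean (the function name and the
constant are invented, and whether the value actually gets caller-saved
depends on register pressure):

extern long h (long);              /* call that clobbers caller-saved regs */

long
mix (long x)
{
  long k = 0x123456789abcdefL;     /* built with movz/movk, no memory read */
  long t = h (x);                  /* k is live across the call            */
  return t ^ k;
}

If k ends up in a call-clobbered register, the caller-save code stores it to
a stack slot before the call and reloads it afterwards, even though simply
re-forming the constant after the call would avoid the stack traffic
entirely.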

Also I am confused by the claim "memory reads are not profitable to
rematerialize".  Surely rematerializing a memory read from const-data or the
literal pool is cheaper than spilling, as you avoid the store to the stack?
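
A hypothetical sketch of the case I have in mind (names are invented):

extern const double coeffs[8];     /* lives in .rodata / const-data        */
extern double use (double);        /* call that creates register pressure  */

double
f (double x)
{
  double c = coeffs[3];            /* load from read-only memory           */
  double a = use (x);              /* c must survive the call              */
  return a * c;
}

Spilling c costs a store to a stack slot plus a load back after the call;
rematerializing it costs only repeating the load from coeffs[3], which is
safe because the source is read-only, so the store is pure overhead.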

Wilco