The following patch fixes PR59466
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59466
LRA on PPC sometimes generates inefficient code such as
addis 9,2,.LC77@toc@ha
addi 9,9,.LC77@toc@l
ld 9,0(9)
instead of
addis 9,2,.LC77@toc@ha
ld 9,.LC77@toc@l(9)
I could not create a small test for this.  The smallest file where I
found it is bzip2.c from SPEC2000.
LRA generates a move insn with an invalid memory operand
[unspec[`*.LC29',%2:DI] 45], but it can handle it (make it valid) very
efficiently afterwards by trying numerous different transformations.
The PPC target machinery, through validize_mem, instead just puts the
whole address into a pseudo.  I could prevent the use of validize_mem
in rs6000.c, but I prefer to do it in change_address_1, as other
targets might have the same problem and it is better to have one
solution for all of them.
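To illustrate the intent, here is a minimal standalone sketch in plain
C (not GCC source; address_is_valid and copy_address_to_pseudo are
hypothetical stand-ins for memory_address_addr_space_p and the
legitimization that validize_mem ends up doing):

#include <stdbool.h>

static bool lra_in_progress;   /* models GCC's lra_in_progress flag */

static bool address_is_valid (const void *addr) { (void) addr; return false; }
static void *copy_address_to_pseudo (void *addr) { return addr; }

static void *
model_change_address (void *addr, bool validate)
{
  /* Before the patch, an invalid address was forced into a pseudo right
     away, hiding the @toc@ha/@toc@l structure from LRA.  With the patch,
     the address is left alone while LRA is in progress, so LRA can reload
     only the high part and fold the low part into the memory access.  */
  if (validate && !lra_in_progress && !address_is_valid (addr))
    addr = copy_address_to_pseudo (addr);
  return addr;
}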
Still, this does not fully solve the problem, as the insn
r257:DI=[unspec[`*.LC29',%2:DI] 45] can't be recognized: the
*movdi... pattern has operand predicates that reject the memory because
of its invalid address.  To fix this, a change in general_operand is
made.  As LRA cannot do anything with a regular insn that fails
recognition, I added an assert for this in lra_set_insn_recog_data to
catch such a situation earlier.
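As a minimal standalone model of that assert (not GCC source; the enum
and struct below are hypothetical stand-ins for GCC's insn
representation), the condition is: an insn whose pattern is not
recognized and which is not an asm must be one of the special codes,
because LRA can do nothing useful with an unrecognizable regular insn.

#include <assert.h>

enum pattern_code_model { CODE_USE, CODE_CLOBBER, CODE_ASM_INPUT, CODE_OTHER };

struct insn_model
{
  int icode;             /* -1 when no insn pattern was recognized */
  int asm_noperands;     /* negative when the insn is not an asm */
  enum pattern_code_model pattern_code;
};

static void
model_set_insn_recog_data (const struct insn_model *insn)
{
  /* Catch unrecognizable regular insns as early as possible.  */
  if (insn->icode < 0 && insn->asm_noperands < 0)
    assert (insn->pattern_code == CODE_USE
            || insn->pattern_code == CODE_CLOBBER
            || insn->pattern_code == CODE_ASM_INPUT);
}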
Again, LRA has very good code for legitimizing addresses by itself, and
it is better to use it.
After applying the patch I see a code size reduction on SPEC2000.
Before the patch (code sizes are given relative to reload-generated code):
----------------CFP2000-----------------
-0.471% 27171 27043 168.wupwise
-0.457% 14006 13942 171.swim
-0.392% 24515 24419 172.mgrid
0.226% 85079 85271 173.applu
0.751% 728891 734363 177.mesa
0.194% 214357 214773 178.galgel
-0.295% 21683 21619 179.art
0.412% 31089 31217 183.equake
-0.520% 79976 79560 187.facerec
0.000% 152504 152504 188.ammp
0.000% 43758 43758 189.lucas
-0.181% 1062265 1060337 191.fma3d
1.035% 1041684 1052468 200.sixtrack
-0.105% 151944 151784 301.apsi
Average = 0.0139775%
----------------CINT2000-----------------
0.252% 76242 76434 164.gzip
0.172% 186152 186472 175.vpr
-0.215% 2084612 2080132 176.gcc
0.000% 16716 16716 181.mcf
0.085% 225316 225508 186.crafty
-0.015% 210100 210068 197.parser
0.622% 433635 436332 252.eon
-0.298% 762014 759742 253.perlbmk
0.000% 904784 904784 254.gap
-0.285% 706432 704416 255.vortex
0.220% 58297 58425 256.bzip2
0.314% 265334 266166 300.twolf
Average = 0.070863%
After the patch:
----------------CFP2000-----------------
-0.589% 27171 27011 168.wupwise
-0.457% 14006 13942 171.swim
-0.392% 24515 24419 172.mgrid
-0.113% 85079 84983 173.applu
0.654% 728891 733659 177.mesa
0.060% 214357 214485 178.galgel
-0.295% 21683 21619 179.art
-0.412% 31089 30961 183.equake
-0.520% 79976 79560 187.facerec
0.000% 152504 152504 188.ammp
0.000% 43758 43758 189.lucas
-0.317% 1062265 1058897 191.fma3d
0.356% 1041684 1045396 200.sixtrack
-0.105% 151944 151784 301.apsi
Average = -0.152103%
----------------CINT2000-----------------
0.084% 76242 76306 164.gzip
-0.052% 186152 186056 175.vpr
-0.284% 2084612 2078692 176.gcc
0.000% 16716 16716 181.mcf
-0.312% 225316 224612 186.crafty
-0.091% 210100 209908 197.parser
0.622% 433635 436332 252.eon
-0.340% 762014 759422 253.perlbmk
0.000% 904784 904784 254.gap
-0.335% 706432 704064 255.vortex
0.110% 58297 58361 256.bzip2
-0.241% 265334 264694 300.twolf
Average = -0.070023%
Code size reduction for PPC in this case also means faster generated
code.  I see it, but I cannot provide a reliable SPEC2000 rate change.
The patch was successfully bootstrapped and tested on i686, x86_64,
and PPC64.
Ok to commit?
2013-12-11 Vladimir Makarov <vmaka...@redhat.com>
PR rtl-optimization/59466
* emit-rtl.c (change_address_1): Don't validate address for LRA.
* recog.c (general_operand): Accept any memory for LRA.
* lra.c (lra_set_insn_recog_data): Add an assert.
Index: emit-rtl.c
===================================================================
--- emit-rtl.c (revision 205870)
+++ emit-rtl.c (working copy)
@@ -1951,7 +1951,9 @@ change_address_1 (rtx memref, enum machi
&& (!validate || memory_address_addr_space_p (mode, addr, as)))
return memref;
- if (validate)
+ /* Don't validate the address for LRA.  LRA can make the address
+ valid by itself in the most efficient way.  */
+ if (validate && !lra_in_progress)
{
if (reload_in_progress || reload_completed)
gcc_assert (memory_address_addr_space_p (mode, addr, as));
Index: lra.c
===================================================================
--- lra.c (revision 205870)
+++ lra.c (working copy)
@@ -1072,9 +1072,16 @@ lra_set_insn_recog_data (rtx insn)
nop = asm_noperands (PATTERN (insn));
data->operand_loc = data->dup_loc = NULL;
if (nop < 0)
- /* Its is a special insn like USE or CLOBBER. */
- data->insn_static_data = insn_static_data
- = get_static_insn_data (-1, 0, 0, 1);
+ {
+ /* It is a special insn like USE or CLOBBER.  We should
+ recognize any regular insn, as otherwise LRA can do nothing
+ with this insn.  */
+ gcc_assert (GET_CODE (PATTERN (insn)) == USE
+ || GET_CODE (PATTERN (insn)) == CLOBBER
+ || GET_CODE (PATTERN (insn)) == ASM_INPUT);
+ data->insn_static_data = insn_static_data
+ = get_static_insn_data (-1, 0, 0, 1);
+ }
else
{
/* expand_asm_operands makes sure there aren't too many
Index: recog.c
===================================================================
--- recog.c (revision 205870)
+++ recog.c (working copy)
@@ -1021,8 +1021,12 @@ general_operand (rtx op, enum machine_mo
if (! volatile_ok && MEM_VOLATILE_P (op))
return 0;
- /* Use the mem's mode, since it will be reloaded thus. */
- if (memory_address_addr_space_p (GET_MODE (op), y, MEM_ADDR_SPACE (op)))
+ /* Use the mem's mode, since it will be reloaded thus.  LRA can
+ generate move insns with invalid addresses, which are made
+ valid and efficiently calculated by LRA through numerous
+ further transformations.  */
+ if (lra_in_progress
+ || memory_address_addr_space_p (GET_MODE (op), y, MEM_ADDR_SPACE (op)))
return 1;
}