http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58461
Bug ID: 58461
Summary: [MIPS] Using LRA instead of reload increases code size for mips16
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: matthew.fortune at imgtec dot com

Created attachment 30852
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30852&action=edit
Test case to trigger LRA reload issue

While working on enabling LRA for MIPS (mips16 in particular) I have observed that LRA is not producing as optimal code as classic reload. The underlying cause of this is that the register allocation decisions made in IRA were sub-optimal but classic reload 'fixes' them whereas LRA does not. Regardless of fixing the IRA issues there is probably something to fix in LRA.

I have attached a patch that applies to top of tree to enable LRA for mips/mips16 and exposes two options to demonstrate how LRA differs from classic reload. I have also attached a test case (reload_test_mips16.i) which is a function from libjpeg.

The two options added by the patch are -mreload and -mfix-regalloc. LRA is default on with the patch applied:

* mips-sde-elf-gcc -Os -mips16 -march=m4kec reload_test_mips16.i
  LRA introduces a number of reloads that involve $24 which is inaccessible to most mips16 instructions leading to an increase in code size.

* mips-sde-elf-gcc -Os -mips16 -march=m4kec -mreload ...
  Classic reload manages to avoid the reloads of $24 as its reloads converge to use the same reload register and eliminate $24 altogether.

* mips-sde-elf-gcc -Os -mips16 -march=m4kec -mfix-regalloc ...
  LRA now outperforms classic reload as the initial register allocation by IRA is better so LRA does not hit the problem I am reporting.
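The three configurations above could be compared with a small script along the following lines. This is only a sketch under stated assumptions: it compiles with -c so that only the object file is measured (the commands in the report do not use -c), it assumes a binutils size tool named mips-sde-elf-size is on the PATH, and the helper names are hypothetical. The -mreload and -mfix-regalloc options exist only with the attached patch applied.

```python
import subprocess

# Base command from the report. The "-mreload" and "-mfix-regalloc" options
# are added by the attached patch and do not exist in an unpatched GCC.
BASE = ["mips-sde-elf-gcc", "-Os", "-mips16", "-march=m4kec"]
VARIANTS = {
    "lra": [],                            # LRA, default with the patch
    "reload": ["-mreload"],               # classic reload
    "lra-fixed-ira": ["-mfix-regalloc"],  # LRA with the improved IRA allocation
}


def compile_command(variant, source="reload_test_mips16.i"):
    """Build the argv list for one variant; the object is named <variant>.o."""
    # Assumption: compile only (-c) so the measured size excludes libraries.
    return BASE + VARIANTS[variant] + ["-c", source, "-o", f"{variant}.o"]


def parse_text_size(size_output):
    """Return the text-section size from Berkeley-format `size` output."""
    lines = size_output.strip().splitlines()
    # Line 0 is the column header; the first column of line 1 is "text".
    return int(lines[1].split()[0])


def measure(variant, size_tool="mips-sde-elf-size"):
    """Compile one variant and return its text size in bytes."""
    cmd = compile_command(variant)
    subprocess.run(cmd, check=True)
    result = subprocess.run([size_tool, cmd[-1]], check=True,
                            capture_output=True, text=True)
    return parse_text_size(result.stdout)
```

With the patched toolchain installed, calling measure(name) for each key in VARIANTS would give the three text sizes to compare.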
The original register allocation from IRA is:

Disposition:
   26:r245 l0 2     2:r246 l0 4    28:r249 l0 2     3:r250 l0 16
    4:r251 l0 17    5:r252 l0 8     6:r253 l0 9    19:r260 l0 11
   15:r266 l0 24   12:r275 l0 3    11:r276 l0 2    10:r278 l0 10
   27:r281 l0 4     7:r282 l0 5     8:r283 l0 6     1:r284 l0 7
   29:r285 l0 mem  24:r286 l0 24   25:r287 l0 2    23:r288 l0 24
   22:r289 l0 24   21:r290 l0 24   20:r291 l0 24   18:r292 l0 24
   17:r293 l0 11   16:r294 l0 11   14:r295 l0 11   13:r296 l0 24
    9:r297 l0 24    0:r298 l0 24

The fixed register allocation from IRA is as follows, note the mems instead of hard registers 8-11,24:

Disposition:
   26:r245 l0 2     2:r246 l0 4    28:r249 l0 2     3:r250 l0 mem
    4:r251 l0 mem   5:r252 l0 16    6:r253 l0 17   19:r260 l0 mem
   15:r266 l0 mem  12:r275 l0 3    11:r276 l0 2    10:r278 l0 mem
   27:r281 l0 4     7:r282 l0 5     8:r283 l0 6     1:r284 l0 7
   29:r285 l0 mem  24:r286 l0 mem  25:r287 l0 2    23:r288 l0 mem
   22:r289 l0 mem  21:r290 l0 mem  20:r291 l0 mem  18:r292 l0 mem
   17:r293 l0 mem  16:r294 l0 mem  14:r295 l0 mem  13:r296 l0 mem
    9:r297 l0 24    0:r298 l0 24

So the issue (I believe) is that reloads from LRA do not converge as well as reloads introduced by classic reload. While this example can (and should) be fixed in the original register allocation it feels as though there is a problem to fix in LRA.

==============

[mfortune@mfortune-linux lra_bugreport]$ /althome/mips/tk/bin/mips-sde-elf-gcc -v
Using built-in specs.
COLLECT_GCC=/althome/mips/tk/bin/mips-sde-elf-gcc
COLLECT_LTO_WRAPPER=/althome/mips/tk/libexec/gcc/mips-sde-elf/4.9.0/lto-wrapper
Target: mips-sde-elf
Configured with: /althome/mips/git_br/gcc/configure --prefix=/althome/mips/tk --target=mips-sde-elf --with-gnu-as --with-gnu-ld --with-arch=mips32r2 --with-mips-plt --with-synci --with-llsc --with-newlib target_alias=mips-sde-elf --enable-languages=c,c++,lto
Thread model: single
gcc version 4.9.0 20130918 (experimental) (GCC)
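
To make the difference between the two Disposition dumps above easier to see, they can be parsed and checked against the registers that most mips16 instructions can address. The sketch below is illustrative only: the helper names are hypothetical, and the register set ($2-$7, $16, $17) is my summary of the MIPS16 general registers rather than something stated in the report.

```python
import re

# Assumption (not from the report): most MIPS16 instructions can only
# address $2-$7, $16 and $17 directly.
M16_REGS = {2, 3, 4, 5, 6, 7, 16, 17}

# One dump entry looks like "26:r245 l0 2" or "29:r285 l0 mem":
# allocno number, pseudo register, loop number, then hard register or "mem".
ENTRY = re.compile(r"(\d+):r(\d+)\s+l(\d+)\s+(mem|\d+)")

ORIGINAL = """
26:r245 l0 2 2:r246 l0 4 28:r249 l0 2 3:r250 l0 16 4:r251 l0 17
5:r252 l0 8 6:r253 l0 9 19:r260 l0 11 15:r266 l0 24 12:r275 l0 3
11:r276 l0 2 10:r278 l0 10 27:r281 l0 4 7:r282 l0 5 8:r283 l0 6
1:r284 l0 7 29:r285 l0 mem 24:r286 l0 24 25:r287 l0 2 23:r288 l0 24
22:r289 l0 24 21:r290 l0 24 20:r291 l0 24 18:r292 l0 24 17:r293 l0 11
16:r294 l0 11 14:r295 l0 11 13:r296 l0 24 9:r297 l0 24 0:r298 l0 24
"""

FIXED = """
26:r245 l0 2 2:r246 l0 4 28:r249 l0 2 3:r250 l0 mem 4:r251 l0 mem
5:r252 l0 16 6:r253 l0 17 19:r260 l0 mem 15:r266 l0 mem 12:r275 l0 3
11:r276 l0 2 10:r278 l0 mem 27:r281 l0 4 7:r282 l0 5 8:r283 l0 6
1:r284 l0 7 29:r285 l0 mem 24:r286 l0 mem 25:r287 l0 2 23:r288 l0 mem
22:r289 l0 mem 21:r290 l0 mem 20:r291 l0 mem 18:r292 l0 mem
17:r293 l0 mem 16:r294 l0 mem 14:r295 l0 mem 13:r296 l0 mem
9:r297 l0 24 0:r298 l0 24
"""


def parse_disposition(dump):
    """Map pseudo register number -> hard register number, or 'mem'."""
    alloc = {}
    for _allocno, pseudo, _loop, where in ENTRY.findall(dump):
        alloc[int(pseudo)] = where if where == "mem" else int(where)
    return alloc


def outside_m16(alloc):
    """Pseudos given a hard register that is not in M16_REGS."""
    return {p: r for p, r in alloc.items()
            if r != "mem" and r not in M16_REGS}


for name, dump in (("original", ORIGINAL), ("fixed", FIXED)):
    alloc = parse_disposition(dump)
    bad = outside_m16(alloc)
    print(f"{name}: {len(alloc)} pseudos, {len(bad)} outside M16_REGS: "
          f"{sorted(bad.items())}")
```

The pseudos listed for the original allocation are the ones placed in hard registers 8-11 and 24, which is where the extra reloads described above come from.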