Hello, This is a proposal to address what I think has been a long-standing, very nasty latent code generation problem, which just manifested in-house on a proprietary testcase.
This is intricate, so long email ... Visible misbehavior ==================== The visible effect is a powerpc-eabispe compiler (no red-zone) producing an epilogue sequence like addi %r11,%r1,184 # temp pointer within frame addi %r1,%r11,104 # release frame evldd %r21,16(%r11) # restore registers ... # ... evldd %r31,96(%r11) # ... blr # return This is causing troubles in situations where interrupt code somehow quicks in between the frame release and the blr, clobbering the stack slots before the register restore insn have run. Triggering conditions ===================== We are witnessing a situation where the rs6000 stack_tie mechanism is ineffective, out of a missed dependency between the stack_tie insn and some register restores. We have observed this with a gcc 4.7 back-end and weren't able to reproduce with a more recent version. I believe, however, that the problem is still latent, just requiring extremely precise conditions to trigger the problematic scheduling decision. At the RTL level, with our 4.7 based compiler, the chain of events is as follows: Compiling our testcase with -O2 -mcpu=8540 -mfloat-gprs=double, we get up to: ------------ p.adb.216r.split4: (insn 710 147 712 2 (set (reg/f:SI 11 %r11 [650]) (high:SI (symbol_ref/u:SI ("*.LC0") [flags 0x82]))) 495 {elf_high} ... (note 891 627 892 32 NOTE_INSN_EPILOGUE_BEG) ... ... sequence of register restores ... (insn 894 893 895 32 (set (reg:SI 11 %r11) (plus:SI (reg/f:SI 1 %r1) (const_int 184 [0xb8]))) ... (insn 907 906 908 32 (set (reg:V2SI 31 31) (mem:V2SI (plus:SI (reg:SI 11 11) (const_int 96 [0x60])) [0 S8 A64])) ... STACK TIE insn, intended to prevent the scheduler from moving ... ... restores past this point, so they remain issued prior to the ... ... following insn: (insn 908 907 909 32 (parallel [ (set (mem/c:BLK (reg/f:SI 1 1) [17 A8]) (const_int 0 [0])) (set (mem/c:BLK (reg:SI 11 11) [17 A8]) (const_int 0 [0])) ]) ../p.adb:84:8 681 {stack_tie} (nil)) ... bumping r1 (stack pointer), releasing the stack frame: (insn/f 909 908 910 32 (set (reg/f:SI 1 1) (plus:SI (reg:SI 11 11) (const_int 104 [0x68]))) ../p.adb:84:8 78 {*addsi3_internal1} The tie references a mem with alias set 17 (stack accesses). The restores use alias set 0, which should conflict anyway, so this looks OK. When sched2 kicks in, we however fail to establish a dependency between 907 (and other restores relative to r11) and the stack_tie. At the bottom of the chain, we see write_dependence_p claiming absence of dependence for 907 from if (! writep) { base = find_base_term (mem_addr); if (base && (GET_CODE (base) == LABEL_REF || (GET_CODE (base) == SYMBOL_REF && CONSTANT_POOL_ADDRESS_P (base)))) return 0; } with (gdb) pr mem_addr (plus:SI (reg:SI 11 11) (const_int 96 [0x60])) and (gdb) pr base (symbol_ref/u:SI ("*.LC0") [flags 0x82]) coming from insn 710, despite 894 in between. Ug. Further Analysis & patch proposal ================================= The reason why 894 is not accounted in the base ref computation is because it is part of the epilogue sequence, and init_alias_analysis has: /* Walk the insns adding values to the new_reg_base_value array. */ for (i = 0; i < rpo_cnt; i++) { ... if (could_be_prologue_epilogue && prologue_epilogue_contains (insn)) continue; The motivation for this is unclear to me. My rough understanding is that we probably really care about frame_related insns only here, at least on targets where the flag is supposed to be set accurately. This is the basis of the proposed patch, which essentially disconnects the shortcut above for !frame_related insns on targets which need precise alias analysis within epilogues, as indicated by a new target hook. On the key insn at hand, the frame_related bit was cleared on purpose, per: https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00543.html "1) Marking an instruction setting up r11 for use by _save64gpr_* as frame-related results in r11 being set as the cfa for the rest of the function. That's bad when r11 gets used for other things later. " I have verified that it cures the problem with our gcc-4.7 based toolchain, and we have been exercising it in gcc-4.9 based compilers on lots of ppc and e500 configurations without problems for a few weeks now I'll be happy to perform further testing if required. This is pretty heavy for me, however, so I'd rather have a discussion and a preliminary agreement on the approach first. This kind of issue isn't new. See for example the threads rooted at https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00543.html or https://gcc.gnu.org/ml/gcc-patches/2012-04/msg00070.html The misbehavior it produces is a real pain, with arbitrary hazards very hard to track down. We have been hit more than once already for a gain which is not so obvious. So, aside from the dependency issue which needs to be fixed somehow, I think it would make sense to consider using a strong blockage mecanism in expand_epilogue. Thoughts ? Thanks in advance for your feedback, With Kind Regards, Olivier ----- 2016-03-23 Olivier Hainque <hain...@adacore.com> * target.def (epilogue_aliases): New target hook. * function.c (prologue_epilogue_insns): Return bool and rewrite as combination of ... (prologue_insns, epilogue_insns): New functions. * rtl.h: Add prototypes for the new functions and adjust prototype of prologue_epilogue_insns for the return type change. * alias.c (init_alias_analysis): Honor targetm.epilogue_aliases. * config/rs6000/rs6000.c (TARGET_EPILOGUE_ALIASES): Redefine, return true. doc/ * tm.texi.in (TARGET_EPILOGUE_ALIASES): Add new @hook entry. * tm.texi: Regenerate.
ppc-frame.diff
Description: Binary data