rs6000 stack_tie mishap again

Olivier Hainque Wed, 23 Mar 2016 09:07:05 -0700

Hello,

This is a proposal to address what I think has been a long-standing, very
nasty latent code generation problem, which just manifested in-house on a
proprietary testcase.


This is intricate, so long email ...

Visible misbehavior
====================

The visible effect is a powerpc-eabispe compiler (no red-zone) producing an
epilogue sequence like

   addi %r11,%r1,184    # temp pointer within frame

   addi %r1,%r11,104    # release frame

   evldd %r21,16(%r11)  # restore registers
   ...                  # ...
   evldd %r31,96(%r11)  # ...

   blr                  # return

This is causing troubles in situations where interrupt code somehow quicks in
between the frame release and the blr, clobbering the stack slots before the
register restore insn have run.

Triggering conditions
=====================

We are witnessing a situation where the rs6000 stack_tie mechanism is
ineffective, out of a missed dependency between the stack_tie insn and some
register restores.

We have observed this with a gcc 4.7 back-end and weren't able to reproduce
with a more recent version. I believe, however, that the problem is still
latent, just requiring extremely precise conditions to trigger the problematic
scheduling decision.

At the RTL level, with our 4.7 based compiler, the chain of events is as
follows:

Compiling our testcase with -O2 -mcpu=8540 -mfloat-gprs=double, we get up to:

------------ p.adb.216r.split4:

(insn 710 147 712 2 (set (reg/f:SI 11 %r11 [650])
        (high:SI (symbol_ref/u:SI ("*.LC0") [flags 0x82]))) 495 {elf_high}
...

(note 891 627 892 32 NOTE_INSN_EPILOGUE_BEG)
...
... sequence of register restores ...

(insn 894 893 895 32 (set (reg:SI 11 %r11)
        (plus:SI (reg/f:SI 1 %r1)
            (const_int 184 [0xb8]))) 
... 
(insn 907 906 908 32 (set (reg:V2SI 31 31)
        (mem:V2SI (plus:SI (reg:SI 11 11)
                (const_int 96 [0x60])) [0 S8 A64])) 
 
... STACK TIE insn, intended to prevent the scheduler from moving ...
... restores past this point, so they remain issued prior to the  ...
... following insn:
 
(insn 908 907 909 32 (parallel [
            (set (mem/c:BLK (reg/f:SI 1 1) [17 A8])
                (const_int 0 [0]))
            (set (mem/c:BLK (reg:SI 11 11) [17 A8])
                (const_int 0 [0]))
        ]) ../p.adb:84:8 681 {stack_tie}
     (nil))
 
... bumping r1 (stack pointer), releasing the stack frame:
 
(insn/f 909 908 910 32 (set (reg/f:SI 1 1)
        (plus:SI (reg:SI 11 11)
            (const_int 104 [0x68]))) ../p.adb:84:8 78 {*addsi3_internal1}

The tie references a mem with alias set 17 (stack accesses). The restores
use alias set 0, which should conflict anyway, so this looks OK.

When sched2 kicks in, we however fail to establish a dependency between
907 (and other restores relative to r11) and the stack_tie.

At the bottom of the chain, we see write_dependence_p claiming absence
of dependence for 907 from

  if (! writep)
    {
      base = find_base_term (mem_addr);
      if (base && (GET_CODE (base) == LABEL_REF
                   || (GET_CODE (base) == SYMBOL_REF
                       && CONSTANT_POOL_ADDRESS_P (base))))
        return 0;
    }


with

(gdb) pr mem_addr
(plus:SI (reg:SI 11 11)
    (const_int 96 [0x60]))
 
and
 
(gdb) pr base
(symbol_ref/u:SI ("*.LC0") [flags 0x82])
 
coming from insn 710, despite 894 in between. Ug.

Further Analysis & patch proposal
=================================

The reason why 894 is not accounted in the base ref computation is because it
is part of the epilogue sequence, and init_alias_analysis has:

      /* Walk the insns adding values to the new_reg_base_value array.  */
      for (i = 0; i < rpo_cnt; i++)
        { ...
                  if (could_be_prologue_epilogue
                      && prologue_epilogue_contains (insn))
                    continue;

The motivation for this is unclear to me.

My rough understanding is that we probably really care about frame_related
insns only here, at least on targets where the flag is supposed to be set
accurately.

This is the basis of the proposed patch, which essentially disconnects the
shortcut above for !frame_related insns on targets which need precise alias
analysis within epilogues, as indicated by a new target hook.

On the key insn at hand, the frame_related bit was cleared on purpose,
per:

https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00543.html

  "1) Marking an instruction setting up r11 for use by _save64gpr_* as
   frame-related results in r11 being set as the cfa for the rest of the
   function.  That's bad when r11 gets used for other things later.
  "

I have verified that it cures the problem with our gcc-4.7 based toolchain,
and we have been exercising it in gcc-4.9 based compilers on lots of ppc and
e500 configurations without problems for a few weeks now

I'll be happy to perform further testing if required. This is pretty heavy for
me, however, so I'd rather have a discussion and a preliminary agreement on
the approach first.

This kind of issue isn't new. See for example the threads rooted at
https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00543.html
or https://gcc.gnu.org/ml/gcc-patches/2012-04/msg00070.html

The misbehavior it produces is a real pain, with arbitrary hazards very hard
to track down. We have been hit more than once already for a gain which is not
so obvious.

So, aside from the dependency issue which needs to be fixed somehow, I
think it would make sense to consider using a strong blockage mecanism in
expand_epilogue.

Thoughts ?

Thanks in advance for your feedback,

With Kind Regards,

Olivier

-----


2016-03-23  Olivier Hainque <hain...@adacore.com>

        * target.def (epilogue_aliases): New target hook.
        * function.c (prologue_epilogue_insns): Return bool and rewrite as
        combination of ...
        (prologue_insns, epilogue_insns): New functions.
        * rtl.h: Add prototypes for the new functions and adjust prototype
        of prologue_epilogue_insns for the return type change.
        * alias.c (init_alias_analysis): Honor targetm.epilogue_aliases.
    
        * config/rs6000/rs6000.c (TARGET_EPILOGUE_ALIASES): Redefine, return
        true.
    
        doc/
        * tm.texi.in (TARGET_EPILOGUE_ALIASES): Add new @hook entry.
        * tm.texi: Regenerate.

ppc-frame.diff
Description: Binary data

rs6000 stack_tie mishap again

Reply via email to