reduction-10.c and many more FAIL

vogt at linux dot vnet.ibm.com Tue, 29 Nov 2016 10:15:39 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78468


--- Comment #20 from Dominik Vogt <vogt at linux dot vnet.ibm.com> ---
(In reply to Eric Botcazou from comment #19)
> I think that the patch is simply incorrect and should be reverted, it very
> likely breaks other ports than PowerPC and SPARC and the failure more is
> quite nasty.

It does not break anything that wasn't broken before.  The Sparc backend was
just _lucky_ that the allocation code in the middlend was _broken_.  Otherwise
Gcc for Sparc (and Aix) would have generated code that makes dynamic
allocations with alloca() overlap.

(Actually this patch already helped to identify dynamic array bounds violations
in some Gcc library and Glibc that were real bugs that were hidden by Gcc's
over-allocation but possibly not by other compilers.  The unpatched Gcc
promotes array bounds violations in user code by providing some surprising
extra space that covers programming bugs most of the time.)

> IMO it's fundamentally backwards: instead of making it so that the alignment
> of VIRTUAL_STACK_DYNAMIC_REGNUM is honored by every dynamic allocation, it
> assumes that it is already honored to optimize the dynamic allocation.

The patch fixes the bug that causes dynamic stack allocation to overestimate
the needed space on the stack most of the time.  To do this, it uses
information available from elsewhere in the middleend.

It turns out that the backend (or middlend, depends on the point of view) lies
about the alignment of VIRTUAL_STACK_DYNAMIC_REGNUM.  There may be _other_
users users of that value that fail to do their job because they think the
stored alignment is correct.  Such users may do worse things than wasting some
stack space - we may just have not noticed them yet.

So, there is _another_ bug in the backends (or the middleend) that needs to be
fixed.  It's not "one fix instead of another" - there are two bugs that need
two separate fixes.

--

You say this should rather be fixed in the middleend, but actually it (i.e.
both bugs) _cannot_ be fixed in the middleend without correct alignment
information from the backend:

Consider this program:

-- snip --
__attribute__ ((noinline))
int *foo(int a1, int a2, int a3, int a4, int a5, int a6,
         int *pl, int *px, int *d, int *e)
{
  return d + a1 + a2 + a3 + a4 + a5 + a6;
}

int main(int argc, char **argv)
{
  int l;
  int x[argc];
  int *p;
  __attribute__ ((aligned(4))) int d[argc];
  __attribute__ ((aligned(8))) int e[argc];

  p = foo(argc + 1, argc + 2, argc + 3, argc + 4, argc + 5, argc + 6,
          &l, x, d, e);

  return (int)p;
}
-- snip --

Compiling it on Sparc (without the discussed patch) with "gcc -O3 -m32 -S
test.c" produces this assembly output:

-- snip --
main:
        save    %sp, -120, %sp
        sll     %i0, 2, %g1     ; i0 = 2 -> g1 = 8
        add     %g1, 10, %g2    ; g2 = 18
        add     %g1, 14, %g1    ; g1 = 22
        and     %g2, -8, %g2    ; g2 = 16
        and     %g1, -8, %g1    ; g1 = 16
        sub     %sp, %g2, %sp
        add     %sp, 108, %g3   ; g3 = fp - 28 (x)
        sub     %sp, %g2, %sp
        add     %sp, 108, %g2   ; g2 = fp - 44 (d)
        sub     %sp, %g1, %sp
        add     %sp, 112, %g1   ; g1 = fp - 56 (e)
        st      %g1, [%sp+104]
        add     %fp, -4, %g1    ; g1 = fp - 4 (&l)
        ... (set %o registers)
        st      %g2, [%sp+100]
        st      %g3, [%sp+96]
        call    foo, 0
         st     %g1, [%sp+92]
-- snip --

So, the unpatched stack layout is:

     fp   +--------------------+ sp0
          |         l          |
 fp - 4   +--------------------+
          |////// wasted //////| <--- where does this come from?
          |////// space  //////|
 fp - 12  +--------------------+ <--- start of dynamic allocation area
          |////// wasted //////|
          |////// space  //////|
 fp - 20  +--------------------+
          |        x[1]        |
          |        x[0]        |
 fp - 28  +--------------------+ <-------
          |////// wasted //////|         \
          |////// space  //////|          |
 fp - 36  +--------------------+          |
          |        d[1]        |          |
          |        d[0]        |          |
 fp - 44  +--------------------+ <----------
          |##### padding ######|          | \
 fp - 48  +--------------------+          |  |
          |        e[1]        |          |  |
          |        e[0]        |          |  |
 fp - 56  +--------------------+ <-------------
          |/// wasted space ///|          |  | \
 fp - 60  +--------------------+          |  |  |
          | outarg 10 (e)      |          |  |  |
          | outarg 9 (d)       |          |  |  |
          | outarg 8 (x)       |          |  |  |
          | outarg 7 (&l)      |          |  |  |
 fp - 76  +--------------------+          |  |  |
                   ...                    |  |  |
 fp - 120 +----- dynamic ------+ sp1      |  |  |
          |     allocation     |          |  |  |
          |         |          |         /   |  |
 fp - 136 +---      |       ---+ sp2 ----    |  |
          |         |          |  +108       |  |
          |         |          |            /   |
 fp - 152 +---      |       ---+ sp3 -------    |
          |         |          |  +108          |
          |         v          |               /
 fp - 168 +--------------------+ sp4 ----------
                                  +112

This wastes 8 + 8 + 4 = 20 bytes of stack space because of over-allocation and
bad alignment info.  With the patch plus the fix I've suggested first, this
wasted space goes away:

     fp   +--------------------+ sp0
          |         l          |
 fp - 4   +--------------------+
          |////// wasted //////| <--- where does this come from?
          |////// space  //////|
 fp - 12  +--------------------+
          |##### padding ######|
 fp - 16  +--------------------+ <--- start of dynamic allocation area
          |        x[1]        |
          |        x[0]        |
 fp - 24  +--------------------+
          |        d[1]        |
          |        d[0]        |
 fp - 28  +--------------------+
          |        e[1]        |
          |        e[0]        |
 fp - 40  +--------------------+
          | outarg 10 (e)      |
          | outarg 9 (d)       |
          | outarg 8 (x)       |
          | outarg 7 (&l)      |
 fp - 56  +--------------------+
                   ...
 fp - 120 +----- dynamic ------+
          |     allocation     |
          |         |          |
          |         v          |
 fp - 144 +--------------------+

At the beginning of the dynamic allocation area, the backend inserts four
padding bytes for functions that have calls_alloca set.  After that, each
allocation reserves memory in multiples of STACK_BOUNDARY (hard coded in
get_dynamic_stack_space()), and between them there is no additional space
(except if the alignment is > 8).

(Note that it's unclear where the extra 16 bytes after the automatic variables
come from.  This extra space appears when a function does any dynamic
allocation.  As I've seen it only on Sparc, it seems to be some buggy
calculation in the backend.)

--

It is also better to make the broken backends respect the STACK_BOUNDARY
alignment of VIRTUAL_STACK_DYNAMIC_REGNUM than make them tell the middleend the
real alignment:

get_dynamic_stack_space() allocates stack space in multiples of STACK_BOUNDARY
(a source code comment states that is to guarantee the stack pointer alignment
is not broken).  Allowing the backend to set this to a lower value would force
the function to round up more often.  In the example above, the 8-byte-aligned
allocation would need to 8 bytes for run time alignment because it does not
know the alignment of the pointer left by the previous allocation.  Each
following allocation may need another patch of padding.  This could only be
fixed by keeping track of whether a dynamic allocation is the first one in the
function, i.e. it might be necessary to generate code to track this, for
example with this code snippet:

 if (flag_x)
   px = alloca(x);
 if (flag_y)
   py = alloca(y);

--

Making the middleend force STACK_BOUNDARY alignment of
VIRTUAL_STACK_DYNAMIC_REGNUM is also possible, but without any information from
the backend we'd have to generate run time code to do that alignment.  So, why
not just use STACK_DYNAMIC_OFFSET et. al. to just guarantee that the alignment
is right?

[Bug libgomp/78468] [7 regression] libgomp.c/reduction-10.c and many more FAIL

Reply via email to