On Sun, Aug 5, 2018 at 12:48 AM, H.J. Lu <hjl.to...@gmail.com> wrote: > On Sat, Aug 04, 2018 at 11:48:15PM +0200, Uros Bizjak wrote: >> On Sat, Aug 4, 2018 at 9:49 PM, H.J. Lu <hjl.to...@gmail.com> wrote: >> > On Sat, Aug 4, 2018 at 12:09 PM, Uros Bizjak <ubiz...@gmail.com> wrote: >> >> On Sat, Aug 4, 2018 at 3:59 PM, H.J. Lu <hjl.to...@gmail.com> wrote: >> >>> On Sat, Aug 4, 2018 at 3:42 AM, Uros Bizjak <ubiz...@gmail.com> wrote: >> >>>> On Fri, Aug 3, 2018 at 12:55 AM, H.J. Lu <hongjiu...@intel.com> wrote: >> >>>>> We should always set cfun->machine->max_used_stack_alignment if the >> >>>>> maximum stack slot alignment may be greater than 64 bits. >> >>>>> >> >>>>> Tested on i686 and x86-64. OK for master and backport for GCC 8? >> >>>> >> >>>> Can you explain why 64 bits, and what this value represents? Is this >> >>>> value the same for 64bit and 32bit targets? >> >>>> >> >>>> Should crtl->max_used_stack_slot_alignment be compared to >> >>>> STACK_BOUNDARY or even MIN_STACK_BOUNDARY instead? >> >>> >> >>> In this case, we don't need to realign the incoming stack since both >> >>> crtl->max_used_stack_slot_alignment and crtl->preferred_stack_boundary >> >>> are 128 bits. We don't compute the largest alignment of stack slots to >> >>> keep stack frame properly aligned for it. Normally it is OK. But if >> >>> the largest alignment of stack slots > 64 bits and we don't keep stack >> >>> frame proper aligned, we will get segfault if aligned vector load/store >> >>> are used on these unaligned stack slots. My patch computes the >> >>> largest alignment of stack slots, which are actually used, if the >> >>> largest alignment of stack slots allocated is > 64 bits, which is >> >>> the smallest alignment for misaligned load/store. >> >> >> >> Does > 64 bits also hold for 32 bit targets, and -mabi=ms? I think >> >> that we need to compare to STACK_BOUNDARY instead: >> > >> > 64 bits requirement is independent of any psABIs nor 32-bit vs 64-bit. >> > cfun->machine->max_used_stack_alignment is used to decide how >> > stack frame should be aligned. It is always safe to compute it. I used >> > >> > else if (crtl->max_used_stack_slot_alignment > 64) >> > >> > to compute cfun->machine->max_used_stack_alignment only if >> > we have to. >> > >> >> --cut here-- >> >> Index: config/i386/i386.c >> >> =================================================================== >> >> --- config/i386/i386.c (revision 263307) >> >> +++ config/i386/i386.c (working copy) >> >> @@ -13281,8 +13281,7 @@ >> >> recompute_frame_layout_p = true; >> >> } >> >> } >> >> - else if (crtl->max_used_stack_slot_alignment >> >> - > crtl->preferred_stack_boundary) >> >> + else if (crtl->max_used_stack_slot_alignment > STACK_BOUNDARY) >> >> { >> > >> > This isn't correct.. cygming.h has >> > >> > #define STACK_BOUNDARY (TARGET_64BIT && ix86_abi == MS_ABI ? 128 : >> > BITS_PER_WORD) >> > >> > darwin.h has >> > >> > #undef STACK_BOUNDARY >> > #define STACK_BOUNDARY \ >> > ((profile_flag || (TARGET_64BIT && ix86_abi == MS_ABI)) \ >> > ? 128 : BITS_PER_WORD) >> > >> > i386.h has >> > >> > /* Boundary (in *bits*) on which stack pointer should be aligned. */ >> > #define STACK_BOUNDARY \ >> > (TARGET_64BIT && ix86_abi == MS_ABI ? 128 : BITS_PER_WORD) >> > >> > If STACK_BOUNDARY is 128 and max_used_stack_slot_alignment is 128, >> > we will get segment when 128bit aligned load/store is generated on >> > misaligned >> > stack slot. >> > >> >> /* We don't need to realign stack. But we still need to keep >> >> stack frame properly aligned to satisfy the largest alignment >> >> --cut here-- >> >> >> >> (The testcase works OK with -mabi=ms, which somehow suggests that we >> >> don't need realignment in this case). >> > >> > We may not hit 128bit aligned load/store on misaligned stack slot in this >> > case. It doesn't mean that won't happen. >> > >> > else if (crtl->max_used_stack_slot_alignment > 64) >> > >> > is the correct thing to do here. >> >> OK, but please add a comment, so in the future we will still know the >> purpose of the magic number. >> > > Like this? > > H.J. > --- > cfun->machine->max_used_stack_alignment is used to decide how stack frame > should be aligned. This is independent of any psABIs nor 32-bit vs 64-bit. > It is always safe to compute max_used_stack_alignment. We compute it only > if 128-bit aligned load/store may be generated on misaligned stack slot > which will lead to segfault. > > gcc/ > > PR target/86386 > * config/i386/i386.c (ix86_finalize_stack_frame_flags): Set > cfun->machine->max_used_stack_alignment if needed. > > gcc/testsuite/ > > PR target/86386 > * gcc.target/i386/pr86386.c: New file.
OK, but please write the condition as ">= 128". The number 64 confused me; I was thinking that it has something to do with minimum alignment on 64bit target, while 128 clearly shows that alignment is related to SSE moves. With ">= 128", I think that code is clear enough that a long comment is not needed. Thanks, and sorry for the confusion, Uros. > --- > gcc/config/i386/i386.c | 14 +++++++------ > gcc/testsuite/gcc.target/i386/pr86386.c | 26 +++++++++++++++++++++++++ > 2 files changed, 34 insertions(+), 6 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr86386.c > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index ee409cfe7e4..cf8c33bd909 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -13281,12 +13281,14 @@ ix86_finalize_stack_frame_flags (void) > recompute_frame_layout_p = true; > } > } > - else if (crtl->max_used_stack_slot_alignment > - > crtl->preferred_stack_boundary) > - { > - /* We don't need to realign stack. But we still need to keep > - stack frame properly aligned to satisfy the largest alignment > - of stack slots. */ > + else if (crtl->max_used_stack_slot_alignment > 64) > + { > + /* We don't need to realign stack. max_used_stack_alignment is > + used to decide how stack frame should be aligned. This is > + independent of any psABIs nor 32-bit vs 64-bit. It is always > + safe to compute max_used_stack_alignment. We compute it only > + if 128-bit aligned load/store may be generated on misaligned > + stack slot which will lead to segfault. */ > if (ix86_find_max_used_stack_alignment (stack_alignment, true)) > cfun->machine->max_used_stack_alignment > = stack_alignment / BITS_PER_UNIT; > diff --git a/gcc/testsuite/gcc.target/i386/pr86386.c > b/gcc/testsuite/gcc.target/i386/pr86386.c > new file mode 100644 > index 00000000000..a67cf45444e > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr86386.c > @@ -0,0 +1,26 @@ > +/* PR target/86386 */ > +/* { dg-do run { target { avx_runtime && int128 } } } */ > +/* { dg-options "-Os -fno-tree-dce -mstringop-strategy=vector_loop -mavx" } > */ > + > +unsigned c, d, e, f; > + > +unsigned __attribute__((noipa)) > +foo (unsigned char g, unsigned short h, unsigned i, unsigned long long j, > + unsigned char k, unsigned short l, unsigned m, unsigned __int128 n) > +{ > + __builtin_memset (&e, 0, 3); > + n <<= m; > + __builtin_memcpy (&m, 2 + (char *) &n, 1); > + m >>= 0; > + d ^= __builtin_mul_overflow (l, n, &m); > + return m; > +} > + > +int > +main () > +{ > + unsigned __int128 x = foo (0, 0, 0, 0, 0, 4, 1, 3); > + if (x != 24) > + __builtin_abort (); > + return 0; > +} > -- > 2.17.1 >