On Sun, Aug 5, 2018 at 12:48 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
> On Sat, Aug 04, 2018 at 11:48:15PM +0200, Uros Bizjak wrote:
>> On Sat, Aug 4, 2018 at 9:49 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>> > On Sat, Aug 4, 2018 at 12:09 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
>> >> On Sat, Aug 4, 2018 at 3:59 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>> >>> On Sat, Aug 4, 2018 at 3:42 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
>> >>>> On Fri, Aug 3, 2018 at 12:55 AM, H.J. Lu <hongjiu...@intel.com> wrote:
>> >>>>> We should always set cfun->machine->max_used_stack_alignment if the
>> >>>>> maximum stack slot alignment may be greater than 64 bits.
>> >>>>>
>> >>>>> Tested on i686 and x86-64.  OK for master and backport for GCC 8?
>> >>>>
>> >>>> Can you explain why 64 bits, and what this value represents? Is this
>> >>>> value the same for 64bit and 32bit targets?
>> >>>>
>> >>>> Should crtl->max_used_stack_slot_alignment be compared to
>> >>>> STACK_BOUNDARY or even MIN_STACK_BOUNDARY instead?
>> >>>
>> >>> In this case, we don't need to realign the incoming stack since both
>> >>> crtl->max_used_stack_slot_alignment and crtl->preferred_stack_boundary
>> >>> are 128 bits.  We don't compute the largest alignment of stack slots to
>> >>> keep stack frame properly aligned for it.  Normally it is OK.   But if
>> >>> the largest alignment of stack slots > 64 bits and we don't keep stack
>> >>> frame proper aligned, we will get segfault if aligned vector load/store
>> >>> are used on these unaligned stack slots. My patch computes the
>> >>> largest alignment of stack slots, which are actually used,  if the
>> >>> largest alignment of stack slots allocated is > 64 bits, which is
>> >>> the smallest alignment for misaligned load/store.
>> >>
>> >> Does > 64 bits also hold for 32 bit targets, and -mabi=ms? I think
>> >> that we need to compare to STACK_BOUNDARY instead:
>> >
>> > 64 bits requirement is independent of any psABIs nor 32-bit vs 64-bit.
>> > cfun->machine->max_used_stack_alignment is used to decide how
>> > stack frame should be aligned.  It is always safe to compute it.  I used
>> >
>> > else if (crtl->max_used_stack_slot_alignment > 64)
>> >
>> > to compute cfun->machine->max_used_stack_alignment only if
>> > we have to.
>> >
>> >> --cut here--
>> >> Index: config/i386/i386.c
>> >> ===================================================================
>> >> --- config/i386/i386.c  (revision 263307)
>> >> +++ config/i386/i386.c  (working copy)
>> >> @@ -13281,8 +13281,7 @@
>> >>           recompute_frame_layout_p = true;
>> >>         }
>> >>      }
>> >> -  else if (crtl->max_used_stack_slot_alignment
>> >> -          > crtl->preferred_stack_boundary)
>> >> +  else if (crtl->max_used_stack_slot_alignment > STACK_BOUNDARY)
>> >>      {
>> >
>> > This isn't correct..  cygming.h has
>> >
>> > #define STACK_BOUNDARY (TARGET_64BIT && ix86_abi == MS_ABI ? 128 :
>> > BITS_PER_WORD)
>> >
>> > darwin.h has
>> >
>> > #undef STACK_BOUNDARY
>> > #define STACK_BOUNDARY \
>> >  ((profile_flag || (TARGET_64BIT && ix86_abi == MS_ABI)) \
>> >   ? 128 : BITS_PER_WORD)
>> >
>> > i386.h has
>> >
>> > /* Boundary (in *bits*) on which stack pointer should be aligned.  */
>> > #define STACK_BOUNDARY \
>> >  (TARGET_64BIT && ix86_abi == MS_ABI ? 128 : BITS_PER_WORD)
>> >
>> > If STACK_BOUNDARY is 128 and max_used_stack_slot_alignment is 128,
>> > we will get segment when 128bit aligned load/store is generated on 
>> > misaligned
>> > stack slot.
>> >
>> >>        /* We don't need to realign stack.  But we still need to keep
>> >>          stack frame properly aligned to satisfy the largest alignment
>> >> --cut here--
>> >>
>> >> (The testcase works OK with -mabi=ms, which somehow suggests that we
>> >> don't need realignment in this case).
>> >
>> > We may not hit 128bit aligned load/store on misaligned stack slot in this
>> > case.  It doesn't mean that won't happen.
>> >
>> > else if (crtl->max_used_stack_slot_alignment > 64)
>> >
>> > is the correct thing to do here.
>>
>> OK, but please add a comment, so in the future we will still know the
>> purpose of the magic number.
>>
>
> Like this?
>
> H.J.
> ---
> cfun->machine->max_used_stack_alignment is used to decide how stack frame
> should be aligned.  This is independent of any psABIs nor 32-bit vs 64-bit.
> It is always safe to compute max_used_stack_alignment.  We compute it only
> if 128-bit aligned load/store may be generated on misaligned stack slot
> which will lead to segfault.
>
> gcc/
>
>         PR target/86386
>         * config/i386/i386.c (ix86_finalize_stack_frame_flags): Set
>         cfun->machine->max_used_stack_alignment if needed.
>
> gcc/testsuite/
>
>         PR target/86386
>         * gcc.target/i386/pr86386.c: New file.

OK, but please write the condition as ">= 128". The number 64 confused
me; I was thinking that it has something to do with minimum alignment
on 64bit target, while 128 clearly shows that alignment is related to
SSE moves. With ">= 128", I think that code is clear enough that a
long comment is not needed.

Thanks, and sorry for the confusion,
Uros.

> ---
>  gcc/config/i386/i386.c                  | 14 +++++++------
>  gcc/testsuite/gcc.target/i386/pr86386.c | 26 +++++++++++++++++++++++++
>  2 files changed, 34 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr86386.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index ee409cfe7e4..cf8c33bd909 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -13281,12 +13281,14 @@ ix86_finalize_stack_frame_flags (void)
>           recompute_frame_layout_p = true;
>         }
>      }
> -  else if (crtl->max_used_stack_slot_alignment
> -          > crtl->preferred_stack_boundary)
> -    {
> -      /* We don't need to realign stack.  But we still need to keep
> -        stack frame properly aligned to satisfy the largest alignment
> -        of stack slots.  */
> +  else if (crtl->max_used_stack_slot_alignment > 64)
> +    {
> +      /* We don't need to realign stack.  max_used_stack_alignment is
> +        used to decide how stack frame should be aligned.  This is
> +        independent of any psABIs nor 32-bit vs 64-bit.  It is always
> +        safe to compute max_used_stack_alignment.  We compute it only
> +        if 128-bit aligned load/store may be generated on misaligned
> +        stack slot which will lead to segfault.   */
>        if (ix86_find_max_used_stack_alignment (stack_alignment, true))
>         cfun->machine->max_used_stack_alignment
>           = stack_alignment / BITS_PER_UNIT;
> diff --git a/gcc/testsuite/gcc.target/i386/pr86386.c 
> b/gcc/testsuite/gcc.target/i386/pr86386.c
> new file mode 100644
> index 00000000000..a67cf45444e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr86386.c
> @@ -0,0 +1,26 @@
> +/* PR target/86386 */
> +/* { dg-do run { target { avx_runtime && int128 } } } */
> +/* { dg-options "-Os -fno-tree-dce -mstringop-strategy=vector_loop -mavx" } 
> */
> +
> +unsigned c, d, e, f;
> +
> +unsigned __attribute__((noipa))
> +foo (unsigned char g, unsigned short h, unsigned i, unsigned long long j,
> +     unsigned char k, unsigned short l, unsigned m, unsigned __int128 n)
> +{
> +  __builtin_memset (&e, 0, 3);
> +  n <<= m;
> +  __builtin_memcpy (&m, 2 + (char *) &n, 1);
> +  m >>= 0;
> +  d ^= __builtin_mul_overflow (l, n, &m);
> +  return m;
> +}
> +
> +int
> +main ()
> +{
> +  unsigned __int128 x = foo (0, 0, 0, 0, 0, 4, 1, 3);
> +  if (x != 24)
> +    __builtin_abort ();
> +  return 0;
> +}
> --
> 2.17.1
>

Reply via email to