https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120839

--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to H.J. Lu from comment #14)
> (In reply to Richard Biener from comment #12)
> > I don't see HJ is working on this but I clearly do not know enough of
> > this code.  I believe it's a backend issue though and a fix must be
> > done in the backend.
> 
> My current patch is at
> 
> https://patchwork.sourceware.org/project/gcc/list/?series=49364

But comment#4 and comment#5 are misguided.  The target cannot change the
alignment of objects accessible by the user.  Iff the ABI really
specifies that this should be passed with only 128-bit alignment (which I
doubt, see comment#10), then we have to emit a callee copy so that accesses
done in 'e' access an object with the specified alignment.

typedef struct {
  long double a, b;
} c __attribute__((aligned(32)));
double d;
void e(c f);// { d = f.a; }

c x;

void bar()
{
  e (x);
}

shows we do pass the object in a stack slot that is only 128-bit aligned.

But

void e(c f) { if ((unsigned long)&f & ((1<<5) - 1)) __builtin_abort (); }

shows we elide the alignment test.
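
For reference, a minimal sketch of what such an alignment test computes.
The helper and macro names are hypothetical and not part of the testcase.
Note that '-' binds tighter than '<<', so the mask for 32-byte alignment
needs explicit parentheses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mask of the low address bits that must be zero for 32-byte alignment.  */
#define ALIGN32_MASK ((1u << 5) - 1)      /* 31 */

/* What the unparenthesized spelling '1u << 5 - 1' is parsed as: only bit 4
   is tested, not bits 0..4.  */
#define UNPARENTHESIZED_MASK (1u << (5 - 1))  /* 16 */

/* Hypothetical helper: true if P is aligned to ALIGN bytes, where ALIGN
   is a power of two.  */
static bool
is_aligned (const void *p, size_t align)
{
  return ((uintptr_t) p & (align - 1)) == 0;
}
```

Because the parameter type is declared with aligned(32), the compiler may
assume such a test on &f always succeeds and fold it away, which is the
elision seen above.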

And

typedef int v8si __attribute__((vector_size(32)));
typedef struct {
  char a[32];
} c __attribute__((aligned(32)));
v8si d;
void __attribute__((noinline)) e(c f) { d = *(v8si *)f.a; }

c x;

void bar()
{
  e (x);
}

shows we pass 'x' in a 16-byte aligned stack slot, copy it to local,
properly aligned storage, and access that with the larger alignment.  This
happens in assign_parm_setup_block:

(insn 2 5 3 2 (set (reg:OI 99)
        (mem/c:OI (reg/f:DI 92 virtual-incoming-args) [0 f+0 S32 A64])) "t.c":6:39 -1
     (nil))
(insn 3 2 4 2 (set (mem/c:OI (plus:DI (reg/f:DI 93 virtual-stack-vars)
                (const_int -32 [0xffffffffffffffe0])) [0 f+0 S32 A256])
        (reg:OI 99)) "t.c":6:39 -1
     (nil))
(note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)
(insn 7 4 8 2 (set (reg:V8SI 100)
        (mem/c:V8SI (plus:DI (reg/f:DI 93 virtual-stack-vars)
                (const_int -32 [0xffffffffffffffe0])) [1 MEM[(v8si *)&f]+0 S32 A256])) "t.c":6:43 -1
     (nil))
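
Expressed at the C level, the callee copy in insns 2 and 3 amounts to
roughly the following.  This is only an illustrative sketch; the type and
function names are hypothetical, and the real copy is emitted as RTL by
the compiler rather than written by the user.

```c
#include <stdint.h>
#include <string.h>

typedef struct { char a[32]; } blk;

/* Hypothetical model of a callee copy: INCOMING_SLOT stands for the
   argument slot, which may be aligned to less than 32 bytes.  The data is
   copied into local storage with the full 32-byte alignment, and all later
   accesses use that copy.  Returns 1 if the local copy is 32-byte
   aligned.  */
static int
callee_copy_is_aligned (const void *incoming_slot, blk *out)
{
  _Alignas (32) blk local;

  memcpy (&local, incoming_slot, sizeof local);
  *out = local;
  return ((uintptr_t) &local & 31) == 0;
}
```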


That is something we somehow fail to do for the testcase in question - possibly
XFmode is special here or some other code is confused about the incoming
argument alignment.

(note 3 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 3 5 2 NOTE_INSN_FUNCTION_BEG)
(insn 5 2 6 2 (set (reg:XF 100)
        (mem/c:XF (reg/f:DI 92 virtual-incoming-args) [1 f.a+0 S16 A256])) "t.c":5:20 -1
     (nil))
(insn 6 5 0 2 (set (mem/c:DF (symbol_ref:DI ("d") [flags 0x2]  <var_decl 0x7ffff740de40 d>) [3 d+0 S8 A64])
        (float_truncate:DF (reg:XF 100))) "t.c":5:20 -1
     (nil))

assign_parm_setup_block doesn't do this because data->stack_parm is already
assigned.  It gets cleared in assign_parm_adjust_stack_rtl for the working
case but not here:

  /* If we can't trust the parm stack slot to be aligned enough for its
     ultimate type, don't use that slot after entry.  We'll make another
     stack slot, if we need one.  */
  if (stack_parm
      && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
           && ((optab_handler (movmisalign_optab, data->nominal_mode)
                != CODE_FOR_nothing)
               || targetm.slow_unaligned_access (data->nominal_mode,
                                                 MEM_ALIGN (stack_parm))))
          || (data->nominal_type
              && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
              && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))
    stack_parm = NULL; 

where the difference is that data->stack_parm has A64 in the working case
vs. A128 here, together with the PREFERRED_STACK_BOUNDARY check, which I do
not understand.  I don't quite understand the movmisalign optab check
either, but ... the latter check was introduced in r0-64961-gbfc45551d5ace4

I believe the MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY check needs
to be dropped; changing it to <= also works and is less aggressive.
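
To make the A64 vs A128 difference concrete, here is a standalone model of
only the second clause of that condition.  It is a sketch, not GCC code:
the function names are hypothetical, and the 128-bit preferred stack
boundary is an assumption (the usual x86-64 default).

```c
#include <stdbool.h>

/* Assumed for illustration: preferred stack boundary in bits.  */
enum { PREFERRED_STACK_BOUNDARY_BITS = 128 };

/* Second clause as currently written: the slot is abandoned only when
   its alignment is strictly below the preferred stack boundary.  */
static bool
drop_slot_current (unsigned type_align, unsigned slot_align)
{
  return type_align > slot_align
         && slot_align < PREFERRED_STACK_BOUNDARY_BITS;
}

/* Same clause with '<' changed to '<='.  */
static bool
drop_slot_relaxed (unsigned type_align, unsigned slot_align)
{
  return type_align > slot_align
         && slot_align <= PREFERRED_STACK_BOUNDARY_BITS;
}
```

Under these assumptions, a 256-bit aligned type in an A64 slot is dropped
by both variants, while the same type in an A128 slot is dropped only by
the '<=' variant, which matches the working and failing cases above.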

Anyway, either dropping the check or changing it to <= fixes the testcase,
and we emit

e:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        andq    $-32, %rsp
        movdqa  16(%rbp), %xmm0
        movaps  %xmm0, -32(%rsp)
        fldt    -32(%rsp)
        fstpl   d(%rip)
        leave
        .cfi_def_cfa 7, 8
        ret
