[Bug target/114566] [11/12/13/14 Regression] Misaligned vmovaps when compiling with stack-protector-strong for znver4

jakub at gcc dot gnu.org via Gcc-bugs Thu, 04 Apr 2024 10:08:23 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114566


Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |avieira at gcc dot gnu.org
            Summary|[11/12/13 Regression]       |[11/12/13/14 Regression]
                   |Misaligned vmovaps when     |Misaligned vmovaps when
                   |compiling with              |compiling with
                   |stack-protector-strong for  |stack-protector-strong for
                   |znver4                      |znver4

--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Ah, it is the following code that clears base_misaligned:
  /* The vector size of the epilogue is smaller than that of the main loop
     so the alignment is either the same or lower.  This means the dr will
     thus by definition be aligned.  */
  STMT_VINFO_DR_INFO (stmt_vinfo)->base_misaligned = false;
but somehow nothing forced the higher alignment on the var before.
And the assumption is simply wrong.
In the main loop it is using 512-bit vectors, and we have base_alignment 16 and
offset_alignment 32. So for the V16SFmode accesses in the main vectorized loop,
and likewise for the earlier one in the vectorized epilogue,
vect_compute_data_ref_alignment gave up early, because offset_alignment (32) is
smaller than vect_align_c (64):
  if (drb->offset_alignment < vect_align_c
      || !step_preserves_misalignment_p
      /* We need to know whether the step wrt the vectorized loop is
         negative when computing the starting misalignment below.  */
      || TREE_CODE (drb->step) != INTEGER_CST)
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "Unknown alignment for access: %T\n", ref);
      return;
    }
Only in the V8SFmode case in the epilogue, where vect_align_c is 32 rather than
64, does it get further and trigger:
  if (base_alignment < vect_align_c)
    {
      unsigned int max_alignment;
      tree base = get_base_for_alignment (drb->base_address, &max_alignment);
      if (max_alignment < vect_align_c
          || !vect_can_force_dr_alignment_p (base,
                                             vect_align_c * BITS_PER_UNIT))
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_NOTE, vect_location,
                             "can't force alignment of ref: %T\n", ref);
          return;
        }

      /* Force the alignment of the decl.
         NOTE: This is the only change to the code we make during
         the analysis phase, before deciding to vectorize the loop.  */
      if (dump_enabled_p ())
        dump_printf_loc (MSG_NOTE, vect_location,
                         "force alignment of %T\n", ref);

      dr_info->base_decl = base;
      dr_info->base_misaligned = true;
      base_misalignment = 0;
    }

So, if we don't want to force a higher base alignment just because of some
accesses in the vectorized epilogue, I think we need to recompute the
alignment/misalignment there as well.
Marking for 14 as well because I believe the trunk commit just made it latent
there rather than fixing it.
