https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114566
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |avieira at gcc dot gnu.org
            Summary|[11/12/13 Regression]       |[11/12/13/14 Regression]
                   |Misaligned vmovaps when     |Misaligned vmovaps when
                   |compiling with              |compiling with
                   |stack-protector-strong for  |stack-protector-strong for
                   |znver4                      |znver4

--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Ah, it is the

      /* The vector size of the epilogue is smaller than that of the main loop
         so the alignment is either the same or lower. This means the dr will
         thus by definition be aligned.  */
      STMT_VINFO_DR_INFO (stmt_vinfo)->base_misaligned = false;

that clears base_misaligned, but somehow nothing forced the higher alignment
on the var before.  And the assumption is just wrong.

In the main loop it is using 512-bit vectors and we have base_alignment 16,
offset_alignment 32.  So for the V16SFmode accesses in the main vectorized
loop, where vect_align_c is 64, vect_compute_data_ref_alignment gave up
already earlier:

  if (drb->offset_alignment < vect_align_c
      || !step_preserves_misalignment_p
      /* We need to know whether the step wrt the vectorized loop is
         negative when computing the starting misalignment below.  */
      || TREE_CODE (drb->step) != INTEGER_CST)
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "Unknown alignment for access: %T\n", ref);
      return;
    }

Only in the V8SFmode case in the epilogue, because vect_align_c is 32 there
rather than 64, does it go further and trigger

  if (base_alignment < vect_align_c)
    {
      unsigned int max_alignment;
      tree base = get_base_for_alignment (drb->base_address, &max_alignment);
      if (max_alignment < vect_align_c
          || !vect_can_force_dr_alignment_p (base,
                                             vect_align_c * BITS_PER_UNIT))
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_NOTE, vect_location,
                             "can't force alignment of ref: %T\n", ref);
          return;
        }

      /* Force the alignment of the decl.
         NOTE: This is the only change to the code we make during
         the analysis phase, before deciding to vectorize the loop.  */
      if (dump_enabled_p ())
        dump_printf_loc (MSG_NOTE, vect_location,
                         "force alignment of %T\n", ref);

      dr_info->base_decl = base;
      dr_info->base_misaligned = true;
      base_misalignment = 0;
    }

So, if we don't want to force higher base alignment just because of some
accesses in the vectorized epilogue, I think we need to recompute the
alignment/misalignment there as well.

Marking for 14 as well because I believe the trunk commit just made it latent
there rather than fixed.
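
To make the difference between the two vector modes concrete, here is a minimal standalone sketch of the two checks quoted above. It is a simplified model and not GCC code: the enum, model_alignment and check_reported_case are hypothetical names introduced only for illustration, the step conditions are folded into a single step_ok flag, and the max_alignment / vect_can_force_dr_alignment_p test is reduced to a can_force flag that is assumed to be true for the variable in this PR.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels for this sketch; these are not GCC names.  */
enum align_outcome
{
  ALIGN_UNKNOWN,     /* early return: "Unknown alignment for access" */
  ALIGN_CANT_FORCE,  /* early return: "can't force alignment of ref" */
  ALIGN_FORCED,      /* base_misaligned = true, decl alignment must be raised */
  ALIGN_KNOWN        /* base is already aligned enough, nothing to force */
};

/* Simplified model of the two quoted checks.  All alignments are in bytes.
   step_ok stands in for step_preserves_misalignment_p together with the
   INTEGER_CST step test; can_force stands in for the max_alignment and
   vect_can_force_dr_alignment_p test.  */
enum align_outcome
model_alignment (unsigned base_alignment, unsigned offset_alignment,
                 unsigned vect_align_c, bool step_ok, bool can_force)
{
  if (offset_alignment < vect_align_c || !step_ok)
    return ALIGN_UNKNOWN;

  if (base_alignment < vect_align_c)
    return can_force ? ALIGN_FORCED : ALIGN_CANT_FORCE;

  return ALIGN_KNOWN;
}

/* The values from the comment: base_alignment 16, offset_alignment 32.  */
void
check_reported_case (void)
{
  /* Main loop, V16SFmode, vect_align_c 64: gives up at the first check.  */
  assert (model_alignment (16, 32, 64, true, true) == ALIGN_UNKNOWN);

  /* Epilogue, V8SFmode, vect_align_c 32: passes the first check and then
     takes the "force alignment" path, setting base_misaligned.  */
  assert (model_alignment (16, 32, 32, true, true) == ALIGN_FORCED);
}
```

Under these assumptions only the epilogue access ever reaches the path that sets base_misaligned, which is why unconditionally clearing the flag for the epilogue loses the one request to raise the variable's alignment.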