[Bug tree-optimization/100253] [10/11/12 Regression] wrong code with -O2 -fno-tree-bit-ccp -ftree-slp-vectorize (unaligned movdqa)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100253

--- Comment #7 from CVS Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:af4ccaa7515b8e72449448c509916575831e6292

commit r12-284-gaf4ccaa7515b8e72449448c509916575831e6292
Author: Richard Biener
Date:   Thu Apr 29 11:52:08 2021 +0200

    tree-optimization/100253 - fix bogus aligned vectorized loads/stores

    At some point DR_MISALIGNMENT was supposed to be -1 when the
    access was not element aligned.  That's obviously not true at this
    point so this adjusts both store and load vectorizing to no longer
    assume this which in turn allows simplifying the code.

    2021-04-29  Richard Biener

            PR tree-optimization/100253
            * tree-vect-stmts.c (vectorizable_load): Do not assume
            element alignment when DR_MISALIGNMENT is -1.
            (vectorizable_store): Likewise.

            * g++.dg/pr100253.C: New testcase.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100253

--- Comment #6 from Richard Biener ---
So the issue is we're getting a dataref pointer like

  <__int128 unsigned> [(char * {ref-all}) + 25B]

and the first access has DR_MISALIGNMENT of 9 and the target alignment
is 16.  So we have

  align == 16
  misalign == 9

then we do

  data_ref = fold_build2 (MEM_REF, vectype,
                          dataref_ptr,
                          dataref_offset
                          ? dataref_offset
                          : build_int_cst (ref_type, 0));
  if (aligned_access_p (first_dr_info))
    ;
  else if (DR_MISALIGNMENT (first_dr_info) == -1)
    TREE_TYPE (data_ref)
      = build_aligned_type (TREE_TYPE (data_ref),
                            align * BITS_PER_UNIT);
  else
    TREE_TYPE (data_ref)
      = build_aligned_type (TREE_TYPE (data_ref),
                            TYPE_ALIGN (elem_type));

but since DR_MISALIGNMENT is not -1 we assume element alignment (since
DR_MISALIGNMENT is the misalign in elements and at least at some point
wasn't arbitrary ... unless I misremember).  Since the vector type
is vector(1) __int128 unsigned we get an aligned access.

Note how we're using 'align' in the == -1 case but that's the target
alignment ...

The load code has the same issue.  I'm testing a simplification.
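The arithmetic in the comment above can be sketched in standalone C. This is an illustration only, not GCC code: `lowest_set_bit` and `known_alignment` are hypothetical helpers named for this example. The sketch assumes byte units and that an access sitting `misalign` bytes past a `target_align`-aligned boundary is only guaranteed to be aligned to the lowest set bit of `misalign | target_align`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: isolate the lowest set bit of x (0 when x == 0). */
static uint64_t lowest_set_bit(uint64_t x)
{
    return x & (~x + 1);
}

/* Hypothetical helper, not part of GCC: the alignment in bytes that can
   be guaranteed for an access located `misalign` bytes past a boundary
   aligned to `target_align` bytes.  target_align must be a power of two
   and misalign must be smaller than target_align.  */
static uint64_t known_alignment(uint64_t target_align, uint64_t misalign)
{
    assert(target_align != 0 && (target_align & (target_align - 1)) == 0);
    assert(misalign < target_align);
    /* Every bit below the lowest set bit of (misalign | target_align)
       is zero in the address, so that bit is the guaranteed alignment.  */
    return lowest_set_bit(misalign | target_align);
}
```

With the values from the comment, `known_alignment(16, 9)` gives 1, so only byte alignment may be claimed for that access, while the element type `__int128 unsigned` would claim 16 bytes.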
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100253

Richard Biener changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
           Priority|P3                            |P2
   Target Milestone|---                           |10.4

--- Comment #5 from Richard Biener ---
I will have a look.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100253

--- Comment #4 from Andrew Pinski ---
(In reply to Hongtao.liu from comment #3)
> > I think SLP did not mark the load as unaligned even though it knows it is
> > one:
> But gimple tree is marked as aligned.

Right and we are saying the same thing just differently.  SLP is what needs to
mark the load as unaligned as it creates the (gimple) load in the first place.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100253

--- Comment #3 from Hongtao.liu ---
(In reply to Andrew Pinski from comment #2)
> The problem is right away in expand:
> ;; vect__36.383_12 = MEM [(char *
> {ref-all})_10 + 16B];
>
> (insn 23 22 0 (set (reg:V1TI 88 [ vect__36.383 ])
>         (mem:V1TI (plus:DI (reg/f:DI 86 [ _10 ])
>                 (const_int 16 [0x10])) [0 MEM
> [(char * {ref-all})_10 + 16B]+0 S16 A128])) -1
>      (nil))
>
>
> I think SLP did not mark the load as unaligned even though it knows it is
> one:

But gimple tree is marked as aligned.

    unit-size
    align:128 warn_if_not_align:0 symtab:0 alias-set -1
    canonical-type 0x7fffea300a80 precision:128 min max
    pointer_to_this >
    unsigned V1TI
    size
    unit-size
    align:128 warn_if_not_align:0 symtab:0 alias-set 31
    canonical-type 0x7fffe9a59150 nunits:1
    pointer_to_this >

    arg:0
    sizes-gimplified public unsigned type_6 DI
    size
    unit-size
    align:64 warn_if_not_align:0 symtab:0 alias-set -1
    canonical-type 0x7fffea30c498>
    visited
    def_stmt _11 = + _214;
    version:11
    ptr-info 0x7fffe9487330>
    arg:1
    constant 16>>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100253

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
          Component|rtl-optimization            |tree-optimization
   Last reconfirmed|                            |2021-04-25
     Ever confirmed|0                           |1

--- Comment #2 from Andrew Pinski ---
The problem is right away in expand:

;; vect__36.383_12 = MEM [(char * {ref-all})_10 + 16B];

(insn 23 22 0 (set (reg:V1TI 88 [ vect__36.383 ])
        (mem:V1TI (plus:DI (reg/f:DI 86 [ _10 ])
                (const_int 16 [0x10])) [0 MEM [(char * {ref-all})_10 + 16B]+0 S16 A128])) -1
     (nil))

I think SLP did not mark the load as unaligned even though it knows it is one:

t.cc:7:8: note: Vectorizing an unaligned access.
t.cc:7:8: note: vect_model_load_cost: unaligned supported by hardware.
t.cc:7:8: note: vect_model_load_cost: inside_cost = 24, prologue_cost = 0 .
t.cc:7:8: note: ==> examining statement: MEM <__int128 unsigned> [(char * {ref-all}) + 25B] = _36;
t.cc:7:8: note: vect_is_simple_use: operand # VUSE <.MEM_30>
MEM <__int128 unsignedD.19> [(charD.10 * {ref-all})_10], type of def: internal
t.cc:7:8: note: vect_is_simple_use: operand # VUSE <.MEM_35>
MEM <__int128 unsignedD.19> [(charD.10 * {ref-all})_19], type of def: internal
t.cc:7:8: note: Vectorizing an unaligned access.
t.cc:7:8: note: vect_model_store_cost: unaligned supported by hardware.

Confirmed.

With -fno-tree-bit-ccp (that is, with bit-CCP turned off), the propagation of
the unalignedness does not happen.