https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110935

            Bug ID: 110935
           Summary: Missed BB reduction vectorization because of missed
                    eliding of a permute
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

double vals[16];
double test ()
{
  vals[0]++;
  return vals[2] + vals[4] + vals[1] + vals[3];
}

has the reduction not vectorized with -ffast-math because

t.c:5:38: note: === vect_slp_analyze_operations ===
t.c:5:38: note: ==> examining statement: _8 = vals[3];
t.c:5:38: missed: BB vectorization with gaps at the end of a load is not supported
t.c:5:44: missed: not vectorized: relevant stmt not supported: _8 = vals[3];
t.c:5:38: note: removing SLP instance operations starting from: _11 = _7 + _8;
t.c:5:38: missed: not vectorized: bad operation in basic block.

we fail to elide the load permutation (BB vect allows a consecutive sub-set):

t.c:5:38: note: Final SLP tree for instance 0x51c8d60:
t.c:5:38: note: node 0x5285860 (max_nunits=2, refcnt=2) vector(2) double
t.c:5:38: note: op template: _8 = vals[3];
t.c:5:38: note: stmt 0 _8 = vals[3];
t.c:5:38: note: stmt 1 _6 = vals[1];
t.c:5:38: note: stmt 2 _3 = vals[2];
t.c:5:38: note: stmt 3 _4 = vals[4];
t.c:5:38: note: load permutation { 3 1 2 4 }
t.c:5:38: note: === vect_match_slp_patterns ===
t.c:5:38: note: Analyzing SLP tree 0x5285860 for patterns
t.c:5:38: note: SLP optimize permutations:
t.c:5:38: note: 1: { 2, 0, 1, 3 }
t.c:5:38: note: SLP optimize partitions:
t.c:5:38: note: -------------
t.c:5:38: note: partition 0 (layout 0):
t.c:5:38: note: nodes:
t.c:5:38: note: - 0x5285860:
t.c:5:38: note: weight: 1.000000
t.c:5:38: note: op template: _8 = vals[3];
t.c:5:38: note: edges:
t.c:5:38: note: layout 0: (*)
t.c:5:38: note: {depth: 0.000000, total: 0.000000}
t.c:5:38: note: + {depth: 1.000000, total: 1.000000}
t.c:5:38: note: + {depth: 0.000000, total: 0.000000}
t.c:5:38: note: = {depth: 1.000000, total: 1.000000}
t.c:5:38: note: layout 1:
t.c:5:38: note: {depth: 0.000000, total: 0.000000}
t.c:5:38: note: + {depth: 1.000000, total: 1.000000}
t.c:5:38: note: + {depth: 0.000000, total: 0.000000}
t.c:5:38: note: = {depth: 1.000000, total: 1.000000}
t.c:5:38: note: recording new base alignment for &vals
  alignment:    32
  misalignment: 0
  based on:     _1 = vals[0];
t.c:5:38: note: === vect_slp_analyze_instance_alignment ===
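
For illustration only: the load permutation { 3 1 2 4 } is a reordering of the consecutive elements vals[1] .. vals[4]. With -ffast-math the reduction may be reassociated, so the lane order should not matter and the permute could in principle be dropped in favour of a plain contiguous load starting at vals[1]. The check below is a minimal sketch of that "consecutive sub-set" test. The function name `perm_covers_consecutive_range` and its interface are hypothetical and are not taken from GCC's vectorizer sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code.  Returns true when perm[0..n-1] is
   some ordering of the n consecutive indices first, first + 1, ...,
   first + n - 1 without duplicates, and stores first in *first_out.
   Limited to n <= 64 so that a single 64-bit mask can track seen lanes.  */
bool
perm_covers_consecutive_range (const unsigned *perm, size_t n,
                               unsigned *first_out)
{
  assert (n > 0 && n <= 64);

  /* Find the smallest and largest index referenced by the permutation.  */
  unsigned lo = perm[0], hi = perm[0];
  for (size_t i = 1; i < n; i++)
    {
      if (perm[i] < lo)
        lo = perm[i];
      if (perm[i] > hi)
        hi = perm[i];
    }

  /* A consecutive range of n elements spans exactly n indices.  */
  if ((size_t) (hi - lo) + 1 != n)
    return false;

  /* Reject duplicates: every index in the span must appear exactly once.  */
  uint64_t seen = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint64_t bit = UINT64_C (1) << (perm[i] - lo);
      if (seen & bit)
        return false;
      seen |= bit;
    }

  *first_out = lo;
  return true;
}
```

Applied to the permutation from the dump, { 3, 1, 2, 4 }, this sketch reports a consecutive range starting at index 1, which corresponds to a contiguous load of vals[1] .. vals[4] with no gap at the end.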