[Bug tree-optimization/99656] [11 Regression] ICE in linear_loads_p

2021-03-19 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99656

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #5 from Richard Biener  ---
Fixed.  Thanks Tamar.

[Bug tree-optimization/99656] [11 Regression] ICE in linear_loads_p

2021-03-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99656

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Tamar Christina :

https://gcc.gnu.org/g:c3a2bc6daaa2d278cb5f323e2df4b8c2af4198ac

commit r11-7736-gc3a2bc6daaa2d278cb5f323e2df4b8c2af4198ac
Author: Tamar Christina 
Date:   Fri Mar 19 14:29:36 2021 +

slp: remove unneeded permute calculation (PR99656)

The attached testcase ICEs because, as you showed on the PR, we have one child
which is an internal node with a PERM of EVENEVEN and one with TOP.

The problem is that while we can conceptually merge the permute itself into
EVENEVEN, merging the lanes doesn't really make sense.

That said, we no longer even require the merged lanes, as we create the
permutes based on the KIND directly.

This patch just removes all of that code.

Unfortunately it still won't vectorize with the cost model enabled due to the
blend that's created combining the load and the external:

note: node 0x51f2ce8 (max_nunits=1, refcnt=1)
note: op: VEC_PERM_EXPR
note:   { }
note:   lane permutation { 0[0] 1[1] }
note:   children 0x51f23e0 0x51f2578
note: node 0x51f23e0 (max_nunits=2, refcnt=1)
note: op template: _16 = REALPART_EXPR <*t1_9(D)>;
note:   stmt 0 _16 = REALPART_EXPR <*t1_9(D)>;
note:   stmt 1 _16 = REALPART_EXPR <*t1_9(D)>;
note:   load permutation { 0 0 }
note: node (external) 0x51f2578 (max_nunits=1, refcnt=1)
note:   { _18, _18 }

which is costed as the load-and-split, plus the external splat, plus the
blend of the two, while in reality it's just a scalar load and insert.

The compiler (with the cost model disabled) generates

ldr q1, [x19]
dup v1.2d, v1.d[0]
ldr d0, [x19, 8]
fneg d0, d0
ins v1.d[1], v0.d[0]

while really it should be

ldp d1, d0, [x19]
fneg d0, d0
ins v1.d[1], v0.d[0]

but that's for another time.
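
For illustration only (not part of the commit): a minimal C++ sketch of how
the lane permutation { 0[0] 1[1] } in the dump above picks its lanes, assuming
the notation reads as child[lane].  Lanes, apply_lane_permutation and
blend_for are hypothetical names, and treating _18 as the negated second
double is an assumption inferred from the fneg in the assembly.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SLP child: the scalar value held in each lane.
using Lanes = std::vector<double>;

// Result lane i is children[perm[i].first][perm[i].second], mirroring the
// "child[lane]" notation of the dump (assumed reading).
Lanes apply_lane_permutation(
    const std::vector<Lanes> &children,
    const std::vector<std::pair<unsigned, unsigned>> &perm)
{
    Lanes result;
    result.reserve(perm.size());
    for (const auto &p : perm)
        result.push_back(children.at(p.first).at(p.second));
    return result;
}

// Build the blended vector for one complex value {re, im}, following the
// node structure shown in the dump.
Lanes blend_for(double re, double im)
{
    Lanes load = {re, re};        // load permutation { 0 0 }: real part twice
    double ext = -im;             // ASSUMPTION: _18 is the negated second double
    Lanes external = {ext, ext};  // external node { _18, _18 }
    return apply_lane_permutation({load, external}, {{0, 0}, {1, 1}});
}
```

With re = 3.0 and im = 4.0 the blend yields { 3.0, -4.0 }: only one lane of
each child is ever used, which is why a scalar load plus a single lane insert
would be enough.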

gcc/ChangeLog:

PR tree-optimization/99656
* tree-vect-slp-patterns.c (linear_loads_p,
complex_add_pattern::matches, is_eq_or_top,
vect_validate_multiplication, complex_mul_pattern::matches,
complex_fms_pattern::matches): Remove complex_perm_kinds_t.
* tree-vectorizer.h: (complex_load_perm_t): Removed.
(slp_tree_to_load_perm_map_t): Use complex_perm_kinds_t instead of
complex_load_perm_t.

gcc/testsuite/ChangeLog:

PR tree-optimization/99656
* gfortran.dg/vect/pr99656.f90: New test.

[Bug tree-optimization/99656] [11 Regression] ICE in linear_loads_p

2021-03-19 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99656

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
Started to ICE with r11-6734-gad2603433853129e847cade5e269c6a5f889a020

[Bug tree-optimization/99656] [11 Regression] ICE in linear_loads_p

2021-03-19 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99656

Tamar Christina  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |tnfchris at gcc dot gnu.org

--- Comment #2 from Tamar Christina  ---
I'll take a look, thanks!

[Bug tree-optimization/99656] [11 Regression] ICE in linear_loads_p

2021-03-19 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99656

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
            Summary|ICE in linear_loads_p       |[11 Regression] ICE in
                   |                            |linear_loads_p
     Ever confirmed|0                           |1
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |tnfchris at gcc dot gnu.org
   Target Milestone|---                         |11.0
   Last reconfirmed|                            |2021-03-19
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Richard Biener  ---
Confirmed.  -fwrapv is not needed

#2  0x0183e03e in linear_loads_p (perm_cache=0x7fffd080, root=0x355a838) at ../../src/trunk/gcc/tree-vect-slp-patterns.c:258
258 nloads[i] = all_loads[perm[i].first][perm[i].second];
(gdb) p debug(root)
t.f90:1:24: note: node 0x355a838 (max_nunits=1, refcnt=1)
t.f90:1:24: note: op: VEC_PERM_EXPR
t.f90:1:24: note:   { }
t.f90:1:24: note:   lane permutation { 0[0] 1[1] }
t.f90:1:24: note:   children 0x3559f30 0x355a0c8
$1 = void

(gdb) p debug ((slp_tree)0x3559f30)
t.f90:1:24: note: node 0x3559f30 (max_nunits=2, refcnt=4)
t.f90:1:24: note: op template: _16 = REALPART_EXPR <*t1_9(D)>;
t.f90:1:24: note:   stmt 0 _16 = REALPART_EXPR <*t1_9(D)>;
t.f90:1:24: note:   stmt 1 _16 = REALPART_EXPR <*t1_9(D)>;
t.f90:1:24: note:   load permutation { 0 0 }
$8 = void
(gdb) p debug ((slp_tree)0x355a0c8)
t.f90:1:24: note: node (external) 0x355a0c8 (max_nunits=1, refcnt=2)
t.f90:1:24: note:   { _18, _18 }
$9 = void

so one child doesn't have a load permutation.  Looks like we merge
PERM_TOP and PERM_EVENEVEN as PERM_EVENEVEN but for the PERM_TOP
child we then record NULL into all_loads.

Tamar?
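
For illustration only: a minimal C++ sketch of the failure mode described in
this comment, under simplifying assumptions.  PermKind, merge_kinds, LoadPerm
and gather_loads are hypothetical names, not the GCC declarations; an empty
vector stands in for the NULL recorded for the PERM_TOP child.  The lookup
mirrors the crashing line all_loads[perm[i].first][perm[i].second], but with
a bounds check added so the problem is reported instead of crashing.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical simplification of the permute kinds named in the thread.
enum PermKind { PERM_TOP, PERM_EVENEVEN, PERM_UNKNOWN };

// PERM_TOP merges with anything, so TOP + EVENEVEN gives EVENEVEN.
// Two different non-TOP kinds do not merge in this sketch.
PermKind merge_kinds(PermKind a, PermKind b)
{
    if (a == PERM_TOP) return b;
    if (b == PERM_TOP) return a;
    return a == b ? a : PERM_UNKNOWN;
}

// Per-child load permutation; empty models "no load permutation recorded".
using LoadPerm = std::vector<unsigned>;

// Guarded lookup: returns nullopt when a lane refers to a child that has no
// recorded loads, instead of indexing past the end.
std::optional<LoadPerm>
gather_loads(const std::vector<LoadPerm> &all_loads,
             const std::vector<std::pair<unsigned, unsigned>> &perm)
{
    LoadPerm nloads;
    for (const auto &p : perm) {
        if (p.first >= all_loads.size())
            return std::nullopt;
        const LoadPerm &child = all_loads[p.first];
        if (p.second >= child.size())
            return std::nullopt;   // e.g. the external (PERM_TOP) child
        nloads.push_back(child[p.second]);
    }
    return nloads;
}
```

The kinds merge cleanly, but with all_loads = { {0, 0}, {} } and the lane
permutation { 0[0] 1[1] } the second lane has nothing to read, which is the
situation in the backtrace.  Note that the eventual fix did not add a guard
like this one; it removed the lane merging altogether.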