[Bug tree-optimization/113458] Missed SLP for reduction of multiplication/addition with promotion

rguenth at gcc dot gnu.org via Gcc-bugs Thu, 18 Jan 2024 00:10:13 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113458


--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #2)
> > But if we reduce n to 4, the loop based vectorizer is not able to handle it
> > either.
> 
> Do we support 1 element vector(i.e V1SI) in vectorizer?

Yes, but I'm not sure we'd try it here.

For SVE with -msve-vector-bits=128 we fail to elide the load permutation,
though it looks odd:

t.c:7:13: missed:   unsupported vect permute { 3 2 0 1 7 6 4 5 11 10 8 9 }
t.c:7:13: missed:   unsupported load permutation
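
For reference, the 12-entry permute above is just the 4-lane load
permutation { 3 2 0 1 } repeated over three consecutive groups, with each
copy offset by its group's first lane. A minimal sketch of that index
arithmetic follows; expand_load_permutation is a hypothetical helper made up
for illustration, not a function in the vectorizer, and it says nothing
about why the target rejects the permute:

```c
#include <stddef.h>

/* Hypothetical helper, not a GCC function: repeat a per-group lane
   permutation PERM (GROUP_SIZE entries) over NGROUPS consecutive groups,
   offsetting each copy by the index of the group's first lane.  OUT must
   have room for GROUP_SIZE * NGROUPS entries.  */
static void
expand_load_permutation (const unsigned *perm, size_t group_size,
                         size_t ngroups, unsigned *out)
{
  for (size_t g = 0; g < ngroups; ++g)
    for (size_t i = 0; i < group_size; ++i)
      out[g * group_size + i] = (unsigned) (g * group_size) + perm[i];
}
```

Calling it with { 3, 2, 0, 1 }, a group size of 4 and 3 groups reproduces
the selector from the dump.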

the SLP tree is

t.c:7:13: note:   Final SLP tree for instance 0x45933a0:
t.c:7:13: note:   node 0x45b82a0 (max_nunits=4, refcnt=2) vector(4) int
t.c:7:13: note:   op template: patt_38 = _16 w* patt_36;
t.c:7:13: note:         stmt 0 patt_38 = _16 w* patt_36;
t.c:7:13: note:         stmt 1 patt_35 = _11 w* patt_33;
t.c:7:13: note:         stmt 2 patt_29 = _1 w* patt_27;
t.c:7:13: note:         stmt 3 patt_32 = _6 w* patt_30;
t.c:7:13: note:         children 0x45b8330 0x45b83c0
t.c:7:13: note:   node 0x45b8330 (max_nunits=4, refcnt=2) vector(4) short int
t.c:7:13: note:   op template: _16 = MEM[(short int *)a_22(D) + 6B];
t.c:7:13: note:         stmt 0 _16 = MEM[(short int *)a_22(D) + 6B];
t.c:7:13: note:         stmt 1 _11 = MEM[(short int *)a_22(D) + 4B];
t.c:7:13: note:         stmt 2 _1 = *a_22(D);
t.c:7:13: note:         stmt 3 _6 = MEM[(short int *)a_22(D) + 2B];
t.c:7:13: note:         load permutation { 3 2 0 1 }
t.c:7:13: note:   node 0x45b83c0 (max_nunits=4, refcnt=2) vector(4) signed short
t.c:7:13: note:   op template: patt_36 = (signed short) _18;
t.c:7:13: note:         stmt 0 patt_36 = (signed short) _18;
t.c:7:13: note:         stmt 1 patt_33 = (signed short) _13;
t.c:7:13: note:         stmt 2 patt_27 = (signed short) _3;
t.c:7:13: note:         stmt 3 patt_30 = (signed short) _8;
t.c:7:13: note:         children 0x45b8450
t.c:7:13: note:   node 0x45b8450 (max_nunits=4, refcnt=2) vector(4) signed char
t.c:7:13: note:   op template: _18 = MEM[(signed char *)b_23(D) + 3B];
t.c:7:13: note:         stmt 0 _18 = MEM[(signed char *)b_23(D) + 3B];
t.c:7:13: note:         stmt 1 _13 = MEM[(signed char *)b_23(D) + 2B];
t.c:7:13: note:         stmt 2 _3 = *b_23(D);
t.c:7:13: note:         stmt 3 _8 = MEM[(signed char *)b_23(D) + 1B];
t.c:7:13: note:         load permutation { 3 2 0 1 }
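
To make the tree easier to read: it has the shape one would expect from a
four-element multiply-accumulate with short loads from a and signed char
loads from b, multiplied with a widening multiply into int. The sketch
below is a hypothetical kernel with that shape, written only from what the
dump shows; the function name dot4 and the exact source form are
assumptions, and the real testcase is the one attached to the PR:

```c
/* Hypothetical kernel with the same shape as the SLP tree: short * signed
   char, promoted to int, summed over four elements.  Not the testcase
   from the PR.  */
int
dot4 (const short *a, const signed char *b)
{
  int sum = 0;
  for (int i = 0; i < 4; ++i)
    sum += a[i] * b[i];  /* both operands promote to int before the multiply */
  return sum;
}
```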

it looks like NEON doesn't have integer vectors(!?)

[Bug tree-optimization/113458] Missed SLP for reduction of multiplication/addition with promotion
