https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111970

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
So I can see we don't recognize a gather IFN during pattern recog here.

t.c:15:1: note:   Final SLP tree for instance 0x502e9a0:
t.c:15:1: note:   node 0x4f84700 (max_nunits=128, refcnt=2) vector(32) float
t.c:15:1: note:   op template: *_10 = _11;
t.c:15:1: note:         stmt 0 *_10 = _11;
t.c:15:1: note:         stmt 1 *_20 = _21;
t.c:15:1: note:         children 0x4f84790
t.c:15:1: note:   node 0x4f84790 (max_nunits=128, refcnt=2) vector(32) float
t.c:15:1: note:   op template: _11 = _8 + 1.0e+0;
t.c:15:1: note:         stmt 0 _11 = _8 + 1.0e+0;
t.c:15:1: note:         stmt 1 _21 = _18 + 2.0e+0;
t.c:15:1: note:         children 0x4f84820 0x4f84940
t.c:15:1: note:   node 0x4f84820 (max_nunits=128, refcnt=2) vector(32) float
t.c:15:1: note:   op template: _8 = *_7;
t.c:15:1: note:         stmt 0 _8 = *_7;
t.c:15:1: note:         stmt 1 _18 = *_17;
t.c:15:1: note:         children 0x4f848b0
t.c:15:1: note:   node 0x4f848b0 (max_nunits=128, refcnt=2) vector(128) unsigned char
t.c:15:1: note:   op template: _4 = *_3;
t.c:15:1: note:         stmt 0 _4 = *_3;
t.c:15:1: note:         stmt 1 _14 = *_13;
t.c:15:1: note:         load permutation { 0 1 }
t.c:15:1: note:   node (constant) 0x4f84940 (max_nunits=1, refcnt=1)
t.c:15:1: note:         { 1.0e+0, 2.0e+0 }
t.c:15:1: note:    === vect_match_slp_patterns ===
t.c:15:1: note:    Analyzing SLP tree 0x4f84700 for patterns
t.c:15:1: note:   === vect_make_slp_decision ===
t.c:15:1: note:   Decided to SLP 1 instances. Unrolling factor 64
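
For orientation, the SLP tree above (a two-lane store group fed by an add of
{ 1.0, 2.0 } to a float load whose address comes from an unsigned char load)
would be produced by a loop of roughly the following shape. This is only a
sketch reconstructed from the dump, not the testcase attached to the PR: the
function name `foo`, the array name `x`, the parameter `n`, and the `restrict`
qualifiers are assumptions; only `y`, `index`, and `uint8_t` appear in the
dumps.

```c
#include <stdint.h>

/* Hypothetical reconstruction of the loop shape; the real testcase may
   differ in names, types of the bounds, and qualifiers.  */
void
foo (float *restrict y, const float *restrict x,
     const uint8_t *restrict index, int n)
{
  for (int i = 0; i < n; ++i)
    {
      /* Lane 0 of the SLP group: _8 = *_7; _11 = _8 + 1.0; *_10 = _11  */
      y[2 * i] = x[index[2 * i]] + 1.0f;
      /* Lane 1 of the SLP group: _18 = *_17; _21 = _18 + 2.0; *_20 = _21  */
      y[2 * i + 1] = x[index[2 * i + 1]] + 2.0f;
    }
}
```

The indexed loads from `x` are what would need a gather (or an emulated one),
while the loads from `index` and the stores to `y` are contiguous groups of two.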

The vectorizer then tries a few other modes, one of which even has
.MASK_LEN_GATHER_LOAD, but that one fails to build SLP.  In the end we choose

t.c:15:1: note:  ***** Choosing vector mode RVVM4QI
t.c:15:1: note:  ***** Choosing epilogue vector mode RVVMF4QI

the main loop instance is

t.c:15:1: note:   Vectorizing SLP tree:
t.c:15:1: note:   node 0x4f849d0 (max_nunits=64, refcnt=1) vector(32) float
t.c:15:1: note:   op template: *_10 = _11;
t.c:15:1: note:         stmt 0 *_10 = _11;
t.c:15:1: note:         stmt 1 *_20 = _21;
t.c:15:1: note:         children 0x4f84a60
t.c:15:1: note:   node 0x4f84a60 (max_nunits=64, refcnt=1) vector(32) float
t.c:15:1: note:   op template: _11 = _8 + 1.0e+0;
t.c:15:1: note:         stmt 0 _11 = _8 + 1.0e+0;
t.c:15:1: note:         stmt 1 _21 = _18 + 2.0e+0;
t.c:15:1: note:         children 0x4f84af0 0x4f84c10
t.c:15:1: note:   node 0x4f84af0 (max_nunits=64, refcnt=1) vector(32) float
t.c:15:1: note:   op template: _8 = *_7;
t.c:15:1: note:         stmt 0 _8 = *_7;
t.c:15:1: note:         stmt 1 _18 = *_17;
t.c:15:1: note:         children 0x4f84b80
t.c:15:1: note:   node 0x4f84b80 (max_nunits=64, refcnt=1) vector(64) unsigned char
t.c:15:1: note:   op template: _4 = *_3;
t.c:15:1: note:         stmt 0 _4 = *_3;
t.c:15:1: note:         stmt 1 _14 = *_13;
t.c:15:1: note:   node (constant) 0x4f84c10 (max_nunits=1, refcnt=1) vector(32) float
t.c:15:1: note:         { 1.0e+0, 2.0e+0 }

So the main loop uses SLP with emulated gathers, whereas the epilogue does
not use SLP but does use gathers here.

  # vectp_index.6_209 = PHI <vectp_index.6_210(5), index_25(D)(2)>
  # vectp_y.12_601 = PHI <vectp_y.12_602(5), y_27(D)(2)>
  vect__4.8_211 = MEM <vector(64) unsigned char> [(uint8_t *)vectp_index.6_209];
...
  MEM <vector(32) float> [(float *)vectp_y.12_601] = vect__11.11_599;
  vectp_y.12_604 = vectp_y.12_601 + 128;
  MEM <vector(32) float> [(float *)vectp_y.12_604] = vect__11.11_599;
...
  vectp_index.6_210 = vectp_index.6_209 + 64;
  vectp_y.12_602 = vectp_y.12_604 + 128;
  ivtmp_607 = ivtmp_606 + 1;
  if (ivtmp_607 < 3)

Those IV updates look OK to me.
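
The pointer steps in the excerpt above can be cross-checked with a little
arithmetic. The sketch below only restates numbers visible in the GIMPLE (64
index lanes, 32 float lanes, two stores, steps of 64 and 128 bytes); the
constant and function names are made up for illustration, and a 4-byte `float`
is assumed.

```c
#include <stddef.h>
#include <stdint.h>

enum
{
  INDEX_LANES = 64,     /* vector(64) unsigned char load from vectp_index  */
  FLOAT_LANES = 32,     /* vector(32) float stores through vectp_y  */
  STORES_PER_ITER = 2,  /* two MEM <vector(32) float> stores per iteration  */
  INDEX_STEP_BYTES = 64,/* vectp_index.6_210 = vectp_index.6_209 + 64  */
  Y_STEP_BYTES = 128    /* each of the two "+ 128" updates of vectp_y  */
};

/* Bytes of index data consumed by one vector iteration.  */
static size_t
index_bytes_per_iter (void)
{
  return INDEX_LANES * sizeof (uint8_t);
}

/* Bytes by which vectp_y advances in one vector iteration.  */
static size_t
y_bytes_per_iter (void)
{
  return (size_t) STORES_PER_ITER * Y_STEP_BYTES;
}

/* Number of float elements of y written in one vector iteration.  */
static size_t
y_elems_per_iter (void)
{
  return y_bytes_per_iter () / sizeof (float);
}
```

Under these assumptions one iteration reads 64 index bytes (matching the
"+ 64" step) and writes 2 * 128 = 256 bytes, i.e. 64 floats, to y, so the
number of indices consumed equals the number of results stored.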

So I'm not sure what to do.  Does the testcase execute correctly with
--param vect-epilogues-nomask=0?
