https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenther at suse dot de
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=110062

--- Comment #13 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
The only difference in the input to SLP vectorization (without vs. with the
opacity field) is:

-  # _68 = PHI <_5(3)>
-  # _67 = PHI <_11(3)>
-  # _66 = PHI <_16(3)>
-  <retval>.r = _68;
-  <retval>.g = _67;
-  <retval>.b = _66;
+  # _70 = PHI <_5(3)>
+  # _69 = PHI <_11(3)>
+  # _68 = PHI <_16(3)>
+  <retval>.r = _70;
+  <retval>.g = _69;
+  <retval>.b = _68;
+  <retval>.o = r$o_33(D);

So SRA invents r$o_33(D) even though that variable is undefined.

SLP vectorizer then sees it as interleaving stores:

-t.c:19:16: note:       _1 = rgbs[i_35].r;
-t.c:19:16: note:       _7 = rgbs[i_35].g;
-t.c:19:16: note:       _12 = rgbs[i_35].b;
-t.c:19:16: note:   Detected interleaving store of size 3
-t.c:19:16: note:       <retval>.r = _68;
-t.c:19:16: note:       <retval>.g = _67;
-t.c:19:16: note:       <retval>.b = _66;
+t.c:19:16: note:       _1 = rgbs[i_37].r;
+t.c:19:16: note:       _7 = rgbs[i_37].g;
+t.c:19:16: note:       _12 = rgbs[i_37].b;
+t.c:19:16: note:   Detected interleaving store of size 4
+t.c:19:16: note:       <retval>.r = _70;
+t.c:19:16: note:       <retval>.g = _69;
+t.c:19:16: note:       <retval>.b = _68;
+t.c:19:16: note:       <retval>.o = r$o_33(D);

In the first case it first tries to vectorize with a vector of 3 doubles and
fails:

-t.c:19:16: note:     <retval>.r = _68;
-t.c:19:16: note:     <retval>.g = _67;
-t.c:19:16: note:     <retval>.b = _66;
-t.c:19:16: note:   starting SLP discovery for node 0x2cb4fe8
-t.c:19:16: note:   Build SLP for <retval>.r = _68;
-t.c:19:16: note:   get vectype for scalar type (group size 3): double
-t.c:19:16: note:   vectype: vector(2) double
-t.c:19:16: note:   nunits = 2
-t.c:19:16: missed:   Build SLP failed: unrolling required in basic block SLP
-t.c:19:16: note:   Build SLP for <retval>.g = _67;
-t.c:19:16: note:   get vectype for scalar type (group size 3): double
-t.c:19:16: note:   vectype: vector(2) double
-t.c:19:16: note:   nunits = 2
-t.c:19:16: missed:   Build SLP failed: unrolling required in basic block SLP
-t.c:19:16: note:   Build SLP for <retval>.b = _66;
-t.c:19:16: note:   get vectype for scalar type (group size 3): double
-t.c:19:16: note:   vectype: vector(2) double
-t.c:19:16: note:   nunits = 2
-t.c:19:16: missed:   Build SLP failed: unrolling required in basic block SLP
-t.c:19:16: note:   SLP discovery for node 0x2cb4fe8 failed

Later it tries to vectorize the first 2 items:

-t.c:19:16: note:   Splitting SLP group at stmt 2
-t.c:19:16: note:   Split group into 2 and 1
-t.c:19:16: note:   Starting SLP discovery for
-t.c:19:16: note:     <retval>.r = _68;
-t.c:19:16: note:     <retval>.g = _67;
-t.c:19:16

... and after a lot of blablabla succeeds.

If the opaque field is present we start with a vector of size 4:
+t.c:19:16: note:     <retval>.r = _70;
+t.c:19:16: note:     <retval>.g = _69;
+t.c:19:16: note:     <retval>.b = _68;
+t.c:19:16: note:     <retval>.o = r$o_33(D);


+t.c:19:16: note:   vect_is_simple_use: operand _70 = PHI <_5(3)>, type of def: internal
+t.c:19:16: note:   vect_is_simple_use: operand _69 = PHI <_11(3)>, type of def: internal
+t.c:19:16: note:   vect_is_simple_use: operand _68 = PHI <_16(3)>, type of def: internal
+t.c:19:16: note:   vect_is_simple_use: operand r$o_33(D), type of def: external
+t.c:19:16: missed:   treating operand as external
+t.c:19:16: note:   SLP discovery for node 0x2e80058 succeeded
+t.c:19:16: note:   SLP size 1 vs. limit 23.
+t.c:19:16: note:   Final SLP tree for instance 0x2def840:
+t.c:19:16: note:   node 0x2e80058 (max_nunits=4, refcnt=2) vector(4) double
+t.c:19:16: note:   op template: <retval>.r = _70;
+t.c:19:16: note:       stmt 0 <retval>.r = _70;
+t.c:19:16: note:       stmt 1 <retval>.g = _69;
+t.c:19:16: note:       stmt 2 <retval>.b = _68;
+t.c:19:16: note:       stmt 3 <retval>.o = r$o_33(D);
+t.c:19:16: note:       children 0x2e800d8
+t.c:19:16: note:   node (external) 0x2e800d8 (max_nunits=1, refcnt=1)
+t.c:19:16: note:       { _70, _69, _68, r$o_33(D) }

So it seems to succeed vectorizing with 4 entries but it does so for the single
return statement:

  <bb 3> [local count: 1063004409]:
  # i_37 = PHI <i_22(5), 0(2)>
  # r$r_40 = PHI <_5(5), r$r_25(D)(2)>
  # r$g_42 = PHI <_11(5), r$g_26(D)(2)>
  # r$b_44 = PHI <_16(5), r$b_27(D)(2)>
  # ivtmp_67 = PHI <ivtmp_66(5), 10000000(2)>
  _1 = rgbs[i_37].r;
  _2 = (int) _1;
  _3 = (double) _2;
  _4 = _3 * w_21(D);
  _5 = _4 + r$r_40;
  _7 = rgbs[i_37].g;
  _8 = (int) _7;
  _9 = (double) _8;
  _10 = _9 * w_21(D);
  _11 = _10 + r$g_42;
  _12 = rgbs[i_37].b;
  _13 = (int) _12;
  _14 = (double) _13;
  _15 = _14 * w_21(D);
  _16 = _15 + r$b_44;
  i_22 = i_37 + 1;
  ivtmp_66 = ivtmp_67 - 1;
  if (ivtmp_66 != 0)
    goto <bb 5>; [99.00%]
  else
    goto <bb 4>; [1.00%]

  <bb 5> [local count: 1052374367]:
  goto <bb 3>; [100.00%]

  <bb 4> [local count: 10737416]:
  # _70 = PHI <_5(3)>
  # _69 = PHI <_11(3)>
  # _68 = PHI <_16(3)>
  _65 = {_70, _69, _68, r$o_33(D)};
  MEM <vector(4) double> [(double *)&<retval>] = _65;

That seems somewhat pointless.
If one adds code initializing the opacity field then vectorization works well.
So perhaps the SLP vectorizer needs to be told how to deal with uninitialized
variables, which may be common in code like this after SRA?
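
For illustration, here is a minimal sketch of what "initializing the opacity
field" could look like at the source level. It is not the testcase from the
bug: the field names r, g, b, o and the array name rgbs are taken from the
dumps above, while the type names, the array length N and the function name
sum_weighted are assumptions. The sketch also zero-initializes r, g and b so
that it is well defined as standalone C.

```c
/* Hypothetical reconstruction; only r, g, b, o, rgbs, w and i appear in
   the dumps.  Type names, N and sum_weighted are assumed.  */
struct rgb { unsigned char r, g, b; };
struct drgb { double r, g, b, o; };

enum { N = 1000 };
struct rgb rgbs[N];

struct drgb
sum_weighted (double w)
{
  /* Every field gets a value, including o, so after SRA there is no
     default definition such as r$o_33(D) feeding the final stores.  */
  struct drgb r = { 0.0, 0.0, 0.0, 0.0 };

  for (int i = 0; i < N; i++)
    {
      r.r += rgbs[i].r * w;
      r.g += rgbs[i].g * w;
      r.b += rgbs[i].b * w;
    }
  return r;
}
```

Whether this exact form gets vectorized depends on the compiler version and
flags; the sketch only shows the source-level change being discussed.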

Richi, it is not clear to me where the SLP vectorizer discards the idea of
vectorizing the loop body in this case. But I think one needs to address:
+t.c:19:16: missed:   treating operand as external

I wonder if the loop would run faster if it used vectors of size 4 with the
last field unused.
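
A hand-written sketch of that idea, using the GCC vector extension
(vector_size attribute), might look as follows. This only illustrates the
shape of a 4-lane accumulation with an idle last lane; it is not what the
vectorizer emits, and the names v4df, struct pixel and sum_padded are
assumptions made for this example.

```c
/* Four doubles per vector; lane 3 is carried along but never used.  */
typedef double v4df __attribute__ ((vector_size (32)));

/* Hypothetical pixel type, assumed for this sketch.  */
struct pixel { unsigned char r, g, b; };

void
sum_padded (const struct pixel *p, int n, double w, double out[4])
{
  v4df acc = { 0.0, 0.0, 0.0, 0.0 };
  v4df wv = { w, w, w, w };

  for (int i = 0; i < n; i++)
    {
      /* Widen the three channels to double; pad the last lane with 0.  */
      v4df px = { p[i].r, p[i].g, p[i].b, 0.0 };
      acc += px * wv;
    }

  /* Copy the lanes out element by element to avoid returning a vector
     by value, which can trigger ABI warnings without AVX enabled.  */
  for (int k = 0; k < 4; k++)
    out[k] = acc[k];
}
```

Whether this is faster than two 2-lane operations plus a scalar one would
have to be measured on the target in question.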
