http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37021



--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-13 15:58:31 UTC ---

The following testcase shows the issue well:



_Complex double self[1024];
_Complex double a[1024][1024];
_Complex double b[1024];

void foo (void)
{
  int i, j;
  for (i = 0; i < 1024; i+=3)
    for (j = 0; j < 1024; j+=3)
      self[i] = self[i] + a[i][j]*b[j];
}



We have to get the complex multiplication pattern recognized by SLP,
which looks like this (without PRE):



  <bb 3>:

  <bb 4>:
  # j_21 = PHI <j_13(3), 0(7)>
  # self_I_RE_lsm.2_12 = PHI <_26(3), self_I_RE_lsm.2_7(7)>
  # self_I_IM_lsm.3_28 = PHI <_27(3), self_I_IM_lsm.3_8(7)>
  # ivtmp_16 = PHI <ivtmp_1(3), 342(7)>
  _2 = REALPART_EXPR <a[i_20][j_21]>;
  _18 = IMAGPART_EXPR <a[i_20][j_21]>;
  _19 = REALPART_EXPR <b[j_21]>;
  _17 = IMAGPART_EXPR <b[j_21]>;
  _4 = _19 * _2;
  _3 = _18 * _17;
  _6 = _17 * _2;
  _23 = _19 * _18;
  _24 = _4 - _3;
  _25 = _23 + _6;
  _26 = _24 + self_I_RE_lsm.2_12;
  _27 = _25 + self_I_IM_lsm.3_28;
  j_13 = j_21 + 3;
  ivtmp_1 = ivtmp_16 - 1;
  if (ivtmp_1 != 0)
    goto <bb 3>;
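
For readers less used to GIMPLE, one inner-loop iteration of the dump
can be written out by hand in C roughly as follows.  This is only an
illustrative sketch: the `cplx` type and the `mul_acc` name are
hypothetical, and the comments map each line back to the SSA names in
the dump.  It assumes the straightforward multiplication formula that
the dump shows (no special handling of NaN or infinity).

```c
/* Hypothetical helper type: one complex value split into its parts.  */
typedef struct { double re, im; } cplx;

/* One iteration of the inner loop: acc + a * b, expanded into the
   real and imaginary scalar operations seen in the dump.  */
static cplx mul_acc (cplx acc, cplx a, cplx b)
{
  double t4  = b.re * a.re;    /* _4  = _19 * _2   */
  double t3  = a.im * b.im;    /* _3  = _18 * _17  */
  double t6  = b.im * a.re;    /* _6  = _17 * _2   */
  double t23 = b.re * a.im;    /* _23 = _19 * _18  */
  cplx r;
  r.re = (t4 - t3) + acc.re;   /* _24 = _4 - _3;   _26 = _24 + RE part */
  r.im = (t23 + t6) + acc.im;  /* _25 = _23 + _6;  _27 = _25 + IM part */
  return r;
}
```

The two result lanes differ only in the middle operation: a subtraction
for the real part and an addition for the imaginary part.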



We fail to build the SLP tree for _25 = _23 + _6 because the matching
stmt is _24 = _4 - _3, which has a different operation (SSE3 addsub
would support vectorizing this).  I don't see how we can easily
make this supported with the current pattern support ... the
support doesn't allow tying together two SLP group members.
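
To illustrate why an addsub-style operation would fit here, the sketch
below models a two-lane addsub in plain C (subtract in lane 0, add in
lane 1) and uses it for one complex multiplication.  The helper names
`addsub2` and `cmul_addsub` are hypothetical and this is a scalar model
only, not what the vectorizer emits.

```c
/* Scalar model of a two-lane addsub: lane 0 subtracts, lane 1 adds.  */
static void addsub2 (const double x[2], const double y[2], double out[2])
{
  out[0] = x[0] - y[0];
  out[1] = x[1] + y[1];
}

/* (a[0] + i*a[1]) * (b[0] + i*b[1]), arranged so that the mismatched
   minus/plus pair becomes a single addsub over two lanes.  */
static void cmul_addsub (const double a[2], const double b[2], double out[2])
{
  double lhs[2] = { b[0] * a[0], b[0] * a[1] };  /* lanes: _4, _23  */
  double rhs[2] = { a[1] * b[1], b[1] * a[0] };  /* lanes: _3, _6   */
  addsub2 (lhs, rhs, out);                       /* lanes: _24, _25 */
}
```

With the operands grouped this way, the real and imaginary lanes share
one operation, which is the pairing that the SLP build currently
rejects.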

Simply allowing analysis to proceed here reveals that the interleaving
has a gap of 6, which makes the analysis fail.  Allowing it to proceed
for ncopies == 1 (thus no actual interleaving required) reveals that
the next check is slightly bogus in that case.  Fixing that leaves us with



t.c:9: note: Load permutation 0 0 1 0 1 1 0 1
t.c:9: note: Build SLP failed: unsupported load permutation _27 = _25 + self_I_IM_lsm.3_28;



... (to be continued)
