https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63341
--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Another testcase:

typedef union U { unsigned short s; unsigned char c; } __attribute__((packed)) U;
struct S { char e __attribute__((aligned (16))); U s[32]; };
struct S t = {0, {{0x5010}, {0x5111}, {0x5212}, {0x5313}, {0x5414}, {0x5515},
                  {0x5616}, {0x5717}, {0x5818}, {0x5919}, {0x5a1a}, {0x5b1b},
                  {0x5c1c}, {0x5d1d}, {0x5e1e}, {0x5f1f}, {0x6020}, {0x6121},
                  {0x6222}, {0x6323}, {0x6424}, {0x6525}, {0x6626}, {0x6727},
                  {0x6828}, {0x6929}, {0x6a2a}, {0x6b2b}, {0x6c2c}, {0x6d2d},
                  {0x6e2e}, {0x6f2f}}};
unsigned short d[32];

int
main ()
{
  int i;
  for (i = 0; i < 32; i++)
    d[i] = t.s[i].s + 4;
  for (i = 0; i < 32; i++)
    if (d[i] != t.s[i].s + 4)
      __builtin_abort ();
    else
      asm volatile ("" : : : "memory");
  return 0;
}

which fails similarly.

For both testcases, if I manually change the addi 9,10,15 instruction to
addi 9,10,16, it passes.  That instruction corresponds to the statement
created in the *.vect dump:
  vectp_t.9_3 = &MEM[(void *)&t + 15B];
which I would change to
  vectp_t.9_3 = &MEM[(void *)&t + 16B];

Here is what the vectorizer emits for this second testcase:

  <bb 2>:
  vectp_t.5_19 = &MEM[(void *)&t + 1B];
  vectp_t.5_5 = vectp_t.5_19 & 4294967280B;
  vect__7.3_4 = MEM[(short unsigned int *)vectp_t.5_5];
  vect__7.6_2 = __builtin_altivec_mask_for_load (vectp_t.5_19);
  vectp_t.9_3 = &MEM[(void *)&t + 15B];
  vect_cst_.13_33 = { 4, 4, 4, 4, 4, 4, 4, 4 };
  vectp_d.15_35 = &d;

  <bb 3>:
  # i_24 = PHI <i_10(4), 0(2)>
  # ivtmp_21 = PHI <ivtmp_20(4), 32(2)>
  # vect__7.7_1 = PHI <vect__7.10_31(4), vect__7.3_4(2)>
  # vectp_t.8_28 = PHI <vectp_t.8_29(4), vectp_t.9_3(2)>
  # vectp_d.14_36 = PHI <vectp_d.14_37(4), vectp_d.15_35(2)>
  # ivtmp_9 = PHI <ivtmp_39(4), 0(2)>
  vectp_t.8_30 = vectp_t.8_28 & 4294967280B;
  vect__7.10_31 = MEM[(short unsigned int *)vectp_t.8_30];
  vect__7.11_32 = REALIGN_LOAD <vect__7.7_1, vect__7.10_31, vect__7.6_2>;
  _7 = t.s[i_24].s;
  vect__8.12_34 = vect__7.11_32 + vect_cst_.13_33;
  _8 = _7 + 4;
  MEM[(short unsigned int *)vectp_d.14_36] = vect__8.12_34;
  i_10 = i_24 + 1;
  ivtmp_20 = ivtmp_21 - 1;
  vectp_t.8_29 = vectp_t.8_28 + 16;
  vectp_d.14_37 = vectp_d.14_36 + 16;
  ivtmp_39 = ivtmp_9 + 1;
  if (ivtmp_39 < 4)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  goto <bb 3>;

The SSA_NAMEs _1 and _31 aren't really used for anything except as arguments
to REALIGN_LOAD, so as long as the targets that support this (seemingly only
rs6000 and spu) handle the misaligned units fine (it is a question of whether
__builtin_altivec_mask_for_load computes the right mask for the permutations),
I'd say just fixing up the offset should be all that is needed.

Note that at least two of the three negative == true cases I saw on one x86_64
testcase use a negative offset and depend on it not having the low bits set
(so offset -7 must become -14 for V8HImode, not -15).  So, at least as a hack,
adding step - 1 to offset in vect_create_addr_base_for_vector_ref when offset
is non-NULL and positive might DTRT (because in all three negative == true
cases the offset will be negative).  But perhaps a cleaner approach would be a
bool flag saying that the offset is already in bytes, or something similar.
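
To make the arithmetic behind the +15B vs. +16B difference concrete, here is a
minimal standalone sketch (not GCC code; the 0x1000 base and the VECT_BYTES /
ELEM_BYTES constants are assumptions chosen to match the V8HImode case above).
The realign-load scheme advances the second load pointer by one vector minus
one element and relies on the 16-byte mask to snap it to the next aligned
chunk, which only works if the base is at least element-aligned:

  /* Standalone illustration, not GCC internals: with a byte-aligned base
     (&t + 1B in the testcase), base + 14 masked down to 16 bytes lands
     back in the first chunk, while adding ELEM_BYTES - 1 more (the
     step - 1 adjustment suggested above) reaches the second chunk.  */
  #include <stdio.h>
  #include <stdint.h>

  #define VECT_BYTES 16   /* V8HImode vector size in bytes */
  #define ELEM_BYTES 2    /* unsigned short element size */

  static uintptr_t
  chunk (uintptr_t addr)
  {
    return addr & ~(uintptr_t) (VECT_BYTES - 1);
  }

  int
  main (void)
  {
    uintptr_t base = 0x1000 + 1;                      /* like &t + 1B */
    uintptr_t cur = base + (VECT_BYTES - ELEM_BYTES); /* like &t + 15B */
    uintptr_t fix = cur + (ELEM_BYTES - 1);           /* like &t + 16B */

    printf ("current : %#lx masks to %#lx\n", (unsigned long) cur,
            (unsigned long) chunk (cur));
    printf ("proposed: %#lx masks to %#lx\n", (unsigned long) fix,
            (unsigned long) chunk (fix));
    return 0;
  }

For an element-aligned base the extra ELEM_BYTES - 1 makes no difference to
the masked address, so under these assumptions the adjustment only matters for
packed accesses like the ones in these testcases.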