https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111228

--- Comment #2 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Peter Bergner from comment #1)
> Confirmed.  The testsuite log shows for vsx-extract-6.c and vsx-extract-7.c:
> 
> gcc.target/powerpc/vsx-extract-6.c: \\mxxpermdi\\M found 2 times
> FAIL: gcc.target/powerpc/vsx-extract-6.c scan-assembler-times \\mxxpermdi\\M 1
> FAIL: gcc.target/powerpc/vsx-extract-6.c scan-assembler-not \\mvspltisw\\M
> 
> So we have one more xxpermdi than we expected and we also have a vspltisw
> when we expected none.  I haven't looked at whether the code is better or
> worse though, to know whether we should just update the expected counts or
> whether this is really a code quality regression.

With the commit, vsx-extract-6.c ends up with:

test_vpasted:
.LFB0:
        .cfi_startproc
        xxspltib 0,0
        xxpermdi 34,34,0,1
        xxpermdi 34,34,35,1
        blr

instead of the originally expected:

test_vpasted:
.LFB0:
        .cfi_startproc
        xxpermdi 34,34,35,1
        blr

I think it's a code quality regression. The optimized gimple IR has changed to:

__vector unsigned long long test_vpasted (__vector unsigned long long high,
__vector unsigned long long low)
{
  __vector unsigned long long res;

  <bb 2> [local count: 1073741824]:
  res_3 = VEC_PERM_EXPR <res_2(D), high_1(D), { 0, 3 }>;
  res_5 = VEC_PERM_EXPR <low_4(D), res_3, { 0, 3 }>;
  return res_5;

}

from:

__vector unsigned long long test_vpasted (__vector unsigned long long high,
__vector unsigned long long low)
{
  __vector unsigned long long res;
  long long unsigned int _1;
  long long unsigned int _2;

  <bb 2> [local count: 1073741824]:
  _1 = BIT_FIELD_REF <high_3(D), 64, 64>;
  res_5 = BIT_INSERT_EXPR <res_4(D), _1, 64 (64 bits)>;
  _2 = BIT_FIELD_REF <low_6(D), 64, 0>;
  res_7 = BIT_INSERT_EXPR <res_5, _2, 0 (64 bits)>;
  return res_7;

}

For the gimple statements:

  res_3 = VEC_PERM_EXPR <res_2(D), high_1(D), { 0, 3 }>;
  res_5 = VEC_PERM_EXPR <low_4(D), res_3, { 0, 3 }>;

I'd expect they can be further optimized into:

  res_5 = VEC_PERM_EXPR <low_4(D), high_1(D), { 0, 3 }>;
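
As a sanity check on that expectation, here is a minimal Python sketch of the two forms. It assumes the usual reading of VEC_PERM_EXPR <v0, v1, sel>, where each selector index picks from the concatenation of v0 and v1, and it only handles in-range selectors. The function names (vec_perm, two_step, one_step) are hypothetical and are not GCC internals.

```python
def vec_perm(v0, v1, sel):
    # Result lane i is element sel[i] of the concatenation v0 ++ v1.
    joined = list(v0) + list(v1)
    return [joined[i] for i in sel]

def two_step(high, low, res_undef):
    # The two permutes currently left in the optimized dump.
    res_3 = vec_perm(res_undef, high, [0, 3])  # {res_undef[0], high[1]}
    res_5 = vec_perm(low, res_3, [0, 3])       # {low[0], res_3[1]}
    return res_5

def one_step(high, low):
    # The single permute I'd expect after folding.
    return vec_perm(low, high, [0, 3])         # {low[0], high[1]}
```

The second permute only reads lane 1 of res_3, which comes from high, so in this model the uninitialized res never reaches the result and both forms agree.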
