Hi,
For following test-case, taken from pr88152.C:
#include <x86intrin.h>
template <typename T, size_t N>
using V [[gnu::vector_size(N)]] = T;
int f10 (V<unsigned long long, 16> a)
{
return _mm_movemask_pd (reinterpret_cast<__m128d> (a > __LONG_LONG_MAX__));
}
.optimized dump shows:
f10 (V a)
{
vector(2) signed long _1;
vector(2) long int _2;
vector(2) double _3;
int _6;
<bb 2> [local count: 1073741824]:
_1 = VIEW_CONVERT_EXPR<vector(2) signed long>(a_4(D));
_2 = VEC_COND_EXPR <_1 < { 0, 0 }, { -1, -1 }, { 0, 0 }>;
_3 = VIEW_CONVERT_EXPR<__m128d>(_2);
_6 = __builtin_ia32_movmskpd (_3); [tail call]
return _6;
}
IIUC, we're using -1 to represent true and 0 as false.
combine then does following combinations:
Trying 7 -> 9:
7: r90:V2DI=r89:V2DI>r93:V2DI
REG_DEAD r93:V2DI
REG_DEAD r89:V2DI
9: r91:V2DF=r90:V2DI#0
REG_DEAD r90:V2DI
Successfully matched this instruction:
(set (subreg:V2DI (reg:V2DF 91) 0)
(gt:V2DI (reg:V2DI 89)
(reg:V2DI 93)))
allowing combination of insns 7 and 9
Trying 6, 9 -> 10:
6: r89:V2DI=const_vector
9: r91:V2DF#0=r89:V2DI>r93:V2DI
REG_DEAD r89:V2DI
REG_DEAD r93:V2DI
10: r87:SI=unspec[r91:V2DF] 43
REG_DEAD r91:V2DF
Successfully matched this instruction:
(set (reg:SI 87)
(unspec:SI [
(lt:V2DF (reg:V2DI 93)
(const_vector:V2DI [
(const_int 0 [0]) repeated x2
]))
] UNSPEC_MOVMSK))
Is the above folding correct, since lt has V2DF mode,
and casting -1 (literally) to DFmode would result in -NaN ?
Also, should result of lt be having only integral modes ?
split2 then folds insn 10 into:
(insn 22 9 16 2 (set (reg:SI 0 ax [87])
(unspec:SI [
(reg:V2DF 20 xmm0 [93])
] UNSPEC_MOVMSK))
"../../stage1-build/gcc/include/emmintrin.h":958:34 4222
{sse2_movmskpd}
(nil))
deleting insn 10.
The issue is my patch for PR88833 results in following propagation in forwprop1:
In insn 10, replacing
(unspec:SI [
(reg:V2DF 91)
] UNSPEC_MOVMSK)
with (unspec:SI [
(subreg:V2DF (reg:V2DI 90) 0)
] UNSPEC_MOVMSK)
deleting insn 9 and this inhibits the above combinations,
resulting in failure of PR88152.C
With patch, combine shows:
Trying 7 -> 10:
7: r90:V2DI=r89:V2DI>r93:V2DI
REG_DEAD r93:V2DI
REG_DEAD r89:V2DI
10: r87:SI=unspec[r90:V2DI#0] 43
REG_DEAD r90:V2DI
Failed to match this instruction:
(set (reg:SI 87)
(unspec:SI [
(subreg:V2DF (gt:V2DI (reg:V2DI 89)
(reg:V2DI 93)) 0)
] UNSPEC_MOVMSK))
and subsequently fails to match 6, 7 -> 10
Patch:
http://people.linaro.org/~prathamesh.kulkarni/pr88833-10.diff
Upstream discussion about the issue:
https://gcc.gnu.org/ml/gcc-patches/2019-06/msg01651.html
Thanks,
Prathamesh