https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71488
Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-06-13
          Component|target                      |middle-end
   Target Milestone|---                         |7.0
            Summary|Wrong code on GCC trunk     |[6/7 Regression] Wrong code
                   |with ivybridge and westmere |for vector comparisons with
                   |targets                     |ivybridge and westmere
                   |                            |targets
     Ever confirmed|0                           |1

--- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
The following minimized test case shows the problem:

--cut here--
#include <iostream>
#include <valarray>

int var_4 = 1;
long long var_9 = 0;

int main() {
  std::valarray<std::valarray<long long>> v10;

  v10.resize(1);
  v10[0].resize(4);

  for (int i = 0; i < 4; i++)
    v10[0][i] = ((var_9 == 0) > unsigned (var_4 == 0)) + (var_9 == 0);

  std::cout << v10[0][0] << "\n";
}
--cut here--

This test should be compiled with "-std=c++11 -O3 -march=westmere" to obtain
the wrong result:

$ ./a.out
1

The correct result can be obtained by adding -fno-tree-vectorize to the
compile flags:

$ ./a.out
2

Looking at the asm dump, the problematic loop is:

.L22:
        movddup     var_9(%rip), %xmm0
        pxor        %xmm1, %xmm1
(1)     pcmpeqq     %xmm1, %xmm0
        salq        $63, %rax
        movdqa      .LC0(%rip), %xmm2
        sarq        $63, %rax
        movq        %rax, %xmm1
(2)     movdqa      %xmm0, %xmm3
        punpcklqdq  %xmm1, %xmm1
        pand        %xmm2, %xmm0
        shufps      $136, %xmm0, %xmm0
(3)     pcmpgtq     %xmm1, %xmm3
        movdqa      %xmm3, %xmm1
        pand        %xmm2, %xmm1
        shufps      $136, %xmm1, %xmm1
        paddd       %xmm1, %xmm0
        pmovsxdq    %xmm0, %xmm1
        psrldq      $8, %xmm0
        pmovsxdq    %xmm0, %xmm0
        movups      %xmm1, (%rdx)
        movups      %xmm0, 16(%rdx)

At insn (1), the vector (0xf...f,0xf...f) is generated as the result of
comparing vector (var_9,var_9) with vector (0,0). However, this result goes
through insn (2) directly to insn (3) as its input argument. This is certainly
wrong; the result of the comparison should first be masked with
(0x0...1,0x0...1).

The problem already exists at RTL expand time.
The corresponding insn sequence is:

;; mask__3.59_48 = vect_cst__51 == { 0, 0 };

(insn 117 116 118 (set (reg:V2DI 179)
        (vec_duplicate:V2DI (reg:DI 108 [ var_9.0_50 ]))) crash.cpp:29 4210 {*vec_dupv2di}
     (nil))

(insn 118 117 119 (set (reg:V2DI 180)
        (const_vector:V2DI [
                (const_int 0 [0])
                (const_int 0 [0])
            ])) crash.cpp:29 -1
     (nil))

(insn 119 118 120 (set (reg:V2DI 181)
        (eq:V2DI (reg:V2DI 179)
            (reg:V2DI 180))) crash.cpp:29 -1
     (nil))

(insn 120 119 0 (set (reg:V2DI 106 [ mask__3.59 ])
        (reg:V2DI 181)) crash.cpp:29 -1
     (nil))

;; vect_patt_111.61_79 = VEC_COND_EXPR <mask__3.59_48 > vect_cst__63, { 1, 1 }, { 0, 0 }>;

(insn 121 120 122 (set (reg:V2DI 182)
        (vec_duplicate:V2DI (reg:DI 117 [ _64 ]))) 4210 {*vec_dupv2di}
     (nil))

(insn 122 121 123 (set (reg:V2DI 183)
        (mem/u/c:V2DI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [5 S16 A128])) -1
     (expr_list:REG_EQUAL (const_vector:V2DI [
                (const_int 1 [0x1])
                (const_int 1 [0x1])
            ])
        (nil)))

(insn 123 122 124 (set (reg:V2DI 184)
        (gt:V2DI (reg:V2DI 106 [ mask__3.59 ])
            (reg:V2DI 182))) -1
     (nil))

(insn 124 123 0 (set (reg:V2DI 119 [ vect_patt_111.61 ])
        (and:V2DI (reg:V2DI 184)
            (reg:V2DI 183))) -1
     (nil))

Please note how the result of the comparison from (insn 119) directly enters
the follow-up comparison (insn 123). It looks to me that (insn 120) needs to
be an AND insn, as is the case with the comparison (insn 123) and its
corresponding (insn 124).

Confirmed as a middle-end problem.