https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71488

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-06-13
          Component|target                      |middle-end
   Target Milestone|---                         |7.0
            Summary|Wrong code on GCC trunk     |[6/7 Regression] Wrong code
                   |with ivybridge and westmere |for vector comparisons with
                   |targets                     |ivybridge and westmere
                   |                            |targets
     Ever confirmed|0                           |1

--- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
The following minimized test case shows the problem:

--cut here--
#include <iostream>
#include <valarray>

int var_4 = 1;
long long var_9 = 0;

int main() {

  std::valarray<std::valarray<long long>> v10;

  v10.resize(1);
  v10[0].resize(4);

  for (int i = 0; i < 4; i++)
    v10[0][i] = ((var_9 == 0) > unsigned (var_4 == 0)) + (var_9 == 0);

  std::cout << v10[0][0] << "\n";
}
--cut here--

This test should be compiled with "-std=c++11 -O3 -march=westmere" to obtain
the wrong result:

$ ./a.out
1

The correct result can be obtained by adding -fno-tree-vectorize to the compile
flags:

$ ./a.out
2

Looking at the asm dump, the problematic loop is:

.L22:
        movddup var_9(%rip), %xmm0
        pxor    %xmm1, %xmm1
(1)     pcmpeqq %xmm1, %xmm0
        salq    $63, %rax
        movdqa  .LC0(%rip), %xmm2
        sarq    $63, %rax
        movq    %rax, %xmm1
(2)     movdqa  %xmm0, %xmm3
        punpcklqdq      %xmm1, %xmm1
        pand    %xmm2, %xmm0
        shufps  $136, %xmm0, %xmm0
(3)     pcmpgtq %xmm1, %xmm3
        movdqa  %xmm3, %xmm1
        pand    %xmm2, %xmm1
        shufps  $136, %xmm1, %xmm1
        paddd   %xmm1, %xmm0
        pmovsxdq        %xmm0, %xmm1
        psrldq  $8, %xmm0
        pmovsxdq        %xmm0, %xmm0
        movups  %xmm1, (%rdx)
        movups  %xmm0, 16(%rdx)

At insn (1), the vector (0xf...f,0xf...f) is generated as the result of comparing
vector (var_9,var_9) with vector (0,0). However, this result goes through insn
(2) directly to insn (3) as its input argument. This is certainly wrong: the
result of the comparison should first be masked with (0x0...1,0x0...1).
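
To make the arithmetic concrete, here is a minimal scalar sketch of a single
vector lane. It is only a model, not the code GCC generates: the helper names
(lane_cmp_eq, lane_cmp_gt, expected, unmasked, masked) are made up for
illustration, and the right-hand operand of the second comparison is modeled
by its plain 0/1 value (it is 0 in the test case either way).

```cpp
#include <cstdint>

// Scalar C++ semantics: a comparison yields 0 or 1.
constexpr std::int64_t scalar_cmp_eq(std::int64_t a, std::int64_t b) {
  return a == b ? 1 : 0;
}

// Model of one pcmpeqq lane: all bits set (-1) when equal, else 0.
constexpr std::int64_t lane_cmp_eq(std::int64_t a, std::int64_t b) {
  return a == b ? -1 : 0;
}

// Model of one pcmpgtq lane (signed compare), same all-ones/zero convention.
constexpr std::int64_t lane_cmp_gt(std::int64_t a, std::int64_t b) {
  return a > b ? -1 : 0;
}

// What the source expression means for one element.
constexpr std::int64_t expected(std::int64_t var_9, int var_4) {
  return (scalar_cmp_eq(var_9, 0) > scalar_cmp_eq(var_4, 0))
         + scalar_cmp_eq(var_9, 0);
}

// Raw all-ones mask fed straight into the second comparison,
// as happens between insns (1) and (3).
constexpr std::int64_t unmasked(std::int64_t var_9, int var_4) {
  return (lane_cmp_gt(lane_cmp_eq(var_9, 0), scalar_cmp_eq(var_4, 0)) & 1)
         + (lane_cmp_eq(var_9, 0) & 1);
}

// Mask reduced to 0/1 with AND before the second comparison.
constexpr std::int64_t masked(std::int64_t var_9, int var_4) {
  return (lane_cmp_gt(lane_cmp_eq(var_9, 0) & 1, scalar_cmp_eq(var_4, 0)) & 1)
         + (lane_cmp_eq(var_9, 0) & 1);
}
```

With var_9 = 0 and var_4 = 1, unmasked() evaluates -1 > 0 in the second
comparison, which is false under a signed compare, so the sum is 1; masked()
evaluates 1 > 0 and the sum is 2, the same as expected().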

The problem already exists at RTL expand time. The corresponding insn sequence
is:

;; mask__3.59_48 = vect_cst__51 == { 0, 0 };

(insn 117 116 118 (set (reg:V2DI 179)
        (vec_duplicate:V2DI (reg:DI 108 [ var_9.0_50 ]))) crash.cpp:29 4210
{*vec_dupv2di}
     (nil))

(insn 118 117 119 (set (reg:V2DI 180)
        (const_vector:V2DI [
                (const_int 0 [0])
                (const_int 0 [0])
            ])) crash.cpp:29 -1
     (nil))

(insn 119 118 120 (set (reg:V2DI 181)
        (eq:V2DI (reg:V2DI 179)
            (reg:V2DI 180))) crash.cpp:29 -1
     (nil))

(insn 120 119 0 (set (reg:V2DI 106 [ mask__3.59 ])
        (reg:V2DI 181)) crash.cpp:29 -1
     (nil))

;; vect_patt_111.61_79 = VEC_COND_EXPR <mask__3.59_48 > vect_cst__63, { 1, 1 }, { 0, 0 }>;

(insn 121 120 122 (set (reg:V2DI 182)
        (vec_duplicate:V2DI (reg:DI 117 [ _64 ]))) 4210 {*vec_dupv2di}
     (nil))

(insn 122 121 123 (set (reg:V2DI 183)
        (mem/u/c:V2DI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [5  S16 A128]))
-1
     (expr_list:REG_EQUAL (const_vector:V2DI [
                (const_int 1 [0x1])
                (const_int 1 [0x1])
            ])
        (nil)))

(insn 123 122 124 (set (reg:V2DI 184)
        (gt:V2DI (reg:V2DI 106 [ mask__3.59 ])
            (reg:V2DI 182))) -1
     (nil))

(insn 124 123 0 (set (reg:V2DI 119 [ vect_patt_111.61 ])
        (and:V2DI (reg:V2DI 184)
            (reg:V2DI 183))) -1
     (nil))

Please note how the result of the comparison from (insn 119) directly enters the
follow-up comparison (insn 123). It looks to me that (insn 120) needs to be an AND
insn, as is the case with the comparison (insn 123) and its corresponding (insn
124).

Confirmed as a middle-end problem.
