https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87064

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dje at gcc dot gnu.org,
                   |                            |meissner at gcc dot gnu.org,
                   |                            |segher at gcc dot gnu.org
          Component|libgomp                     |target

--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
This seems to be a powerpc64le backend bug or an RA bug.
Reduced testcase for -fopenacc -O1:
program reduction_3
  implicit none
  integer, parameter    :: n = 10, vl = 32
  integer               :: i
  double precision      :: vresult, rv
  double precision, parameter :: e = 0.001
  double precision, dimension (n) :: array
  do i = 1, n
     array(i) = i
  end do
  rv = 0
  vresult = 0
  !$acc parallel vector_length(vl) copy(rv)
  !$acc loop reduction(max:rv) vector
  do i = 1, n
     rv = max (rv, array(i))
  end do
  !$acc end parallel
  do i = 1, n
     vresult = max (vresult, array(i))
  end do
  if (abs (rv - vresult) .ge. e) STOP 11
end program reduction_3

In the *.optimized dump everything looks correct:
  <bb 3> [local count: 437450368]:
  # vect_M.23_45 = PHI <vect_cst__39(2), vect_M.27_34(3)>
  # ivtmp.34_3 = PHI <ivtmp.34_43(2), ivtmp.34_4(3)>
  _2 = (void *) ivtmp.34_3;
  vect__28.26_44 = MEM[base: _2, offset: 0B];
  vect_M.27_34 = MAX_EXPR <vect__28.26_44, vect_M.23_45>;
  ivtmp.34_4 = ivtmp.34_3 + 16;
  if (ivtmp.34_4 != _25)
    goto <bb 3>; [80.00%]
  else
    goto <bb 4>; [20.00%]

  <bb 4> [local count: 437450371]:
  stmp_M.28_8 = .REDUC_MAX (vect_M.27_34);
  *_10 = stmp_M.28_8;
The loop indeed iterates properly, and we end up with a { 10.0, 9.0 } vector,
which the .REDUC_MAX ifn should reduce to 10.0.
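As a reference point, here is a scalar C model of what the vectorized loop and .REDUC_MAX compute. It is only a sketch, and it rests on two assumptions. First, the initial accumulator vect_cst__39 is taken to be {rv, rv} = {0.0, 0.0}. Second, each iteration consumes two consecutive doubles, matching the 16-byte ivtmp step. The type v2df and all function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical two-lane vector of doubles, indexed by lane. */
typedef struct { double lane[2]; } v2df;

/* Lane-wise maximum, modelling MAX_EXPR on V2DF operands. */
static v2df v2df_max(v2df a, v2df b)
{
    v2df r;
    for (int k = 0; k < 2; k++)
        r.lane[k] = a.lane[k] > b.lane[k] ? a.lane[k] : b.lane[k];
    return r;
}

/* Horizontal maximum across both lanes, modelling .REDUC_MAX. */
static double reduc_max(v2df v)
{
    return v.lane[0] > v.lane[1] ? v.lane[0] : v.lane[1];
}

/* Vectorized max reduction: two doubles (16 bytes) per iteration,
   then one horizontal reduction after the loop. */
static double vector_max(const double *array, int n, double rv)
{
    assert(n % 2 == 0);                 /* no epilogue in this model */
    v2df acc = { { rv, rv } };          /* assumed vect_cst__39 */
    for (int i = 0; i < n; i += 2) {
        v2df chunk = { { array[i], array[i + 1] } };
        acc = v2df_max(chunk, acc);     /* vect_M.27 = MAX_EXPR <...> */
    }
    return reduc_max(acc);
}
```

With array(i) = i for i = 1..10, this model finishes with the accumulator at {9.0, 10.0}, because lane 0 sees the odd-numbered elements. The report shows { 10.0, 9.0 }, and the lane order in the real registers depends on details this model ignores. Either way, the horizontal maximum is 10.0.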
During the early RTL passes it also looks correct:
(insn 20 19 21 4 (parallel [
            (set (reg:V2DF 134)
                (smax:V2DF (vec_concat:V2DF (vec_select:DF (reg:V2DF 128 [ vect_M.23 ])
                            (parallel [
                                    (const_int 1 [0x1])
                                ]))
                        (vec_select:DF (reg:V2DF 128 [ vect_M.23 ])
                            (parallel [
                                    (const_int 0 [0])
                                ])))
                    (reg:V2DF 128 [ vect_M.23 ])))
            (clobber (scratch:V2DF))
        ]) 1330 {vsx_reduc_smax_v2df}
     (nil))
(insn 21 20 22 4 (set (reg:DF 123 [ stmp_M.28 ])
        (vec_select:DF (reg:V2DF 134)
            (parallel [
                    (const_int 0 [0])
                ]))) 1219 {vsx_extract_v2df}
     (nil))
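A small C model makes the semantics of insns 20 and 21 concrete. The vec_concat of lanes 1 and 0 is simply the input with its lanes swapped. Because the smax is lane-wise, both lanes of reg 134 end up holding the maximum, and vsx_extract_v2df then takes lane 0. The sketch works in terms of RTL lane indices only and does not model register or memory layout. The names are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of a V2DF value, indexed by RTL lane number. */
typedef struct { double lane[2]; } v2df;

static double max_df(double a, double b) { return a > b ? a : b; }

/* (smax:V2DF (vec_concat:V2DF (vec_select 1) (vec_select 0)) x) */
static v2df reduc_smax_v2df(v2df x)
{
    v2df swapped = { { x.lane[1], x.lane[0] } };
    v2df r = { { max_df(swapped.lane[0], x.lane[0]),
                 max_df(swapped.lane[1], x.lane[1]) } };
    return r;
}

/* (vec_select:DF v (parallel [(const_int k)])) */
static double extract_df(v2df v, int k)
{
    assert(k == 0 || k == 1);
    return v.lane[k];
}
```

For x = { 10.0, 9.0 }, reduc_smax_v2df returns { 10.0, 10.0 }. Extracting lane 0 or lane 1 therefore gives 10.0, so selecting either lane of this smax is a correct reduction.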
Then combine turns that into:
(insn 21 20 22 4 (parallel [
            (set (reg:DF 123 [ stmp_M.28 ])
                (vec_select:DF (smax:V2DF (vec_concat:V2DF (vec_select:DF (reg:V2DF 128 [ vect_M.23 ])
                                (parallel [
                                        (const_int 1 [0x1])
                                    ]))
                            (vec_select:DF (reg:V2DF 128 [ vect_M.23 ])
                                (parallel [
                                        (const_int 0 [0])
                                    ])))
                        (reg:V2DF 128 [ vect_M.23 ]))
                    (parallel [
                            (const_int 1 [0x1])
                        ])))
            (clobber (scratch:DF))
        ]) 1336 {*vsx_reduc_smax_v2df_scalar}
     (expr_list:REG_DEAD (reg:V2DF 128 [ vect_M.23 ])
        (nil)))
That is then split into:
(insn 34 20 35 4 (set (reg:DF 137)
        (vec_select:DF (reg:V2DF 128 [ vect_M.23 ])
            (parallel [
                    (const_int 1 [0x1])
                ]))) -1
     (nil))
(insn 35 34 22 4 (set (reg:DF 123 [ stmp_M.28 ])
        (smax:DF (subreg:DF (reg:V2DF 128 [ vect_M.23 ]) 8)
            (reg:DF 137))) -1
     (nil))
At this point I'm already not sure whether it is correct.  As I said, at least
in the debugger the input to this .REDUC_MAX holds the value { 10, 9 }.
Is the vec_select extracting the second element (i.e. 9.0), and is (subreg 8)
also the second one?
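The question can be made concrete with a C model of the split insns 34 and 35. The model rests on an assumption that is not established above. It assumes that on a little-endian target GCC numbers V2DF lanes in memory order, so that (subreg:DF x 8) names the same element as vec_select index 1. Under that assumption, both smax operands are lane 1. For the { 10, 9 } input seen in the debugger, the split then yields 9.0 rather than 10.0. The helper names are hypothetical.

```c
#include <assert.h>

/* Hypothetical V2DF value in memory (lane) order: lane 0 at byte 0. */
typedef struct { double lane[2]; } v2df;

static double max_df(double a, double b) { return a > b ? a : b; }

/* (subreg:DF x byte) under the assumed little-endian layout:
   byte offset 0 is lane 0 and byte offset 8 is lane 1. */
static double subreg_df(const v2df *x, int byte)
{
    assert(byte == 0 || byte == 8);
    return x->lane[byte / 8];
}

/* Insns 34 and 35 as produced by the split. */
static double split_reduc(const v2df *x)
{
    double r137 = x->lane[1];               /* insn 34: vec_select index 1 */
    return max_df(subreg_df(x, 8), r137);   /* insn 35: smax (subreg 8) r137 */
}

/* What .REDUC_MAX should yield: the maximum over both lanes. */
static double reference_reduc(const v2df *x)
{
    return max_df(x->lane[0], x->lane[1]);
}
```

With x = { 10.0, 9.0 }, the model gives 9.0 for the split and 10.0 for the reference. The 9.0 is consistent with what the final xsmaxdp produces in the assembly below.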
In the end, that is what happens; the resulting assembly is:
   0x000000001000086c <+32>:    lxvd2x  vs0,0,r9
   0x0000000010000870 <+36>:    addi    r8,r1,-16
   0x0000000010000874 <+40>:    lxvd2x  vs12,0,r8
   0x0000000010000878 <+44>:    xxswapd vs12,vs12
   0x000000001000087c <+48>:    xvmaxdp vs0,vs12,vs0
   0x0000000010000880 <+52>:    xxswapd vs0,vs0
   0x0000000010000884 <+56>:    stxvd2x vs0,0,r8
   0x0000000010000888 <+60>:    xxswapd vs0,vs0
   0x000000001000088c <+64>:    addi    r9,r9,16
   0x0000000010000890 <+68>:    bdnz    0x1000086c <MAIN__._omp_fn.0+32>
=> 0x0000000010000894 <+72>:    lfd     f12,-8(r1)
   0x0000000010000898 <+76>:    xsmaxdp vs0,vs12,vs0
   0x000000001000089c <+80>:    stfd    f0,0(r10)
   0x00000000100008a0 <+84>:    blr
and at that point
x/2fg $r1-16
0x3fffffffed90: 10      9
p $vs0.v2_double
$6 = {10, 9}
p $vs12.v2_double
$7 = {8, 7}
Now, the lfd loads the second element (i.e. 9) into f12; in the debugger, after
the lfd insn, it shows
p $vs12.v2_double
$8 = {0, 9}
and xsmaxdp of {10, 9} and {0, 9} gives {0, 9}, so 9.0 rather than 10.0 is
what we store.
So, does the vsx_reduc_smax_v2df_scalar expander need adjustments for
little-endian?
