Toon Moene wrote:
> Paolo Bonzini wrote:
> 
>>> Attached you'll find the (preprocessed) source of the routine that
>>> printed the Infinity's (of course, I cannot be completely certain that
>>> it actually resulted in the wrong code, but at least it might be studied
>>> to see if it helps to find the culprit).
>>
>> No, this function is sane (the peephole *is* called a lot by this
>> function, but all is in due order).  I looked at the dumps and assembly
>> for -O2, -O3 and -O3 -fno-schedule-insns (*), and all is as expected.
> 
> Yeah, it was probably too much to hope for.

No, you were right, and that's great.  -ffast-math makes a difference,
because it enables more vectorization.

It goes like this:

(insn 494 493 495 44 statin.f:703 (set (reg:SF 371)
        (vec_select:SF (reg:V4SF 367)
            (parallel [
                    (const_int 0 [0x0])
                ]))) 1408 {*vec_extractv4sf_0} (expr_list:REG_DEAD (reg:V4SF 367)
        (nil)))

Registers 371 and 367 are coalesced into xmm0 (hard register 21).  Then
the vec_select is split to just

(set (reg:SF 21 [orig: 371]) (reg:SF 21 [orig: 367]))

The two operands are indeed != as pointers (they are distinct rtx
objects), but they have the same hard register number, so the peephole
should not apply in this case.  Here is a minimized testcase:

subroutine statin(x,y,pstratr,pconvecr,zhxy,zhxhy,ztmp)
integer :: x,y
real pstratr(x,y),pconvecr(x,y),zhxy(x,y)
real ztmp(4)
do j = 1,y
  do i = 1,x-2
   zttotrainr = zttotrainr + (pstratr(i,j) + pconvecr(i,j))*zhxy(i,j)
   ztstratr   = ztstratr   + pstratr(i,j)
   ztconvecr  = ztconvecr  + pconvecr(i,j)
   ztsenf     = ztsenf     + zhxy(i,j)
   ztlatf     = ztlatf     + zhxy(i,j)
   ztcldtop   = ztcldtop   + zhxy(i,j)
  enddo
enddo
ztmp(1)=zttotrainr
ztmp(2)=ztstratr
ztmp(3)=ztconvecr
ztmp(4)=ztsenf*ztlatf*ztcldtop
end

The following patch should fix it; you're welcome to run it through
HIRLAM.  I'm bootstrapping it in the meantime.

Index: gcc/config/i386/i386.md
===================================================================
--- gcc/config/i386/i386.md     (revision 144464)
+++ gcc/config/i386/i386.md     (working copy)
@@ -20795,7 +20795,7 @@
                      [(match_dup 0)
                       (match_operand:SI 2 "memory_operand" "")]))
               (clobber (reg:CC FLAGS_REG))])]
-  "operands[0] != operands[1]
+  "!rtx_equal_p (operands[0], operands[1])
    && GENERAL_REGNO_P (REGNO (operands[0]))
    && GENERAL_REGNO_P (REGNO (operands[1]))"
   [(set (match_dup 0) (match_dup 4))
@@ -20811,7 +20811,7 @@
                    (match_operator 3 "commutative_operator"
                      [(match_dup 0)
                       (match_operand 2 "memory_operand" "")]))]
-  "operands[0] != operands[1]
+  "!rtx_equal_p (operands[0], operands[1])
    && ((MMX_REG_P (operands[0]) && MMX_REG_P (operands[1]))
        || (SSE_REG_P (operands[0]) && SSE_REG_P (operands[1])))"
   [(set (match_dup 0) (match_dup 2))
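
To spell out the difference between the old and new conditions, here is a
minimal C sketch of pointer comparison versus structural comparison.  The
names struct toy_reg, toy_reg_equal_p, old_condition, new_condition and
toy_demo are hypothetical and exist only for illustration; this is not
GCC's real rtx type, and the real rtx_equal_p handles far more than
registers.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a register rtx.  NOT GCC's real rtx type. */
struct toy_reg {
    int mode;   /* hypothetical mode tag, e.g. 1 standing for SF */
    int regno;  /* hard register number, e.g. 21 */
};

/* Structural comparison, in the spirit of rtx_equal_p for registers:
   same object, or same mode and same register number. */
bool toy_reg_equal_p(const struct toy_reg *a, const struct toy_reg *b)
{
    return a == b || (a->mode == b->mode && a->regno == b->regno);
}

/* Old peephole condition: only asks whether the pointers differ. */
bool old_condition(const struct toy_reg *op0, const struct toy_reg *op1)
{
    return op0 != op1;
}

/* New peephole condition: asks whether the registers really differ. */
bool new_condition(const struct toy_reg *op0, const struct toy_reg *op1)
{
    return !toy_reg_equal_p(op0, op1);
}

/* Two distinct objects that both describe hard register 21, as in the
   split insn above. */
void toy_demo(void)
{
    struct toy_reg dst = { 1, 21 };
    struct toy_reg src = { 1, 21 };

    assert(old_condition(&dst, &src));   /* pointer test lets the peephole fire */
    assert(!new_condition(&dst, &src));  /* structural test rejects it */
}
```

In this toy model the old condition accepts a move from a register to
itself, while the new one rejects it, which is the behaviour the patch is
after.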

Paolo
