https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77848
Bug ID: 77848
Summary: Gimple if-conversion results in redundant comparisons
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: wschmidt at gcc dot gnu.org
Target Milestone: ---
Target: powerpc64le-unknown-linux-gnu

Gimple if-conversion is aggressive about converting PHIs to conditional
expressions. When these expressions are not vectorized, they remain in
conditional form throughout the middle end phases. Sometimes such conditionals
do not correspond to any target instructions, so they must be re-expanded to
branching logic. When this happens, and several conditionals have the same
condition, GCC doesn't manage to combine the redundant conditions (at least,
not always).

I suspect that if such unusable conditionals were converted back to branching
logic after failed vectorization, jump threading would be able to pick up the
pieces and generate good code again, but I'm not certain.

As an example, on powerpc64le-linux, consider this Fortran code:

$ gfortran -S -O3 -mcpu=power8 -mtune=power8 -funroll-loops -ffast-math -mrecip=all d138.f

      subroutine sub(x,a,n,m)
      implicit none
      real*8 x(*),a(*),atemp
      integer i,j,k,m,n
      real*8 s,t,u,v
      do j=1,m
         atemp=0.d0
         do i=1,n
            if (abs(a(i)).gt.atemp) then
               atemp=a(i)
               k = i
            end if
         enddo
         call dummy(atemp,k)
      enddo
      return
      end

Prior to if-conversion, we have:

  <bb 7>:
  # i_29 = PHI <i_20(10), 1(6)>
  # atemp_lsm.3_7 = PHI <atemp_lsm.3_9(10), 0.0(6)>
  # atemp_lsm.4_6 = PHI <atemp_lsm.4_28(10), 0(6)>
  # k_lsm.5_27 = PHI <k_lsm.5_26(10), k_lsm.5_38(6)>
  _1 = (integer(kind=8)) i_29;
  _2 = _1 + -1;
  _3 = *a_17(D)[_2];
  _4 = ABS_EXPR <_3>;
  if (_4 > atemp_lsm.3_7)
    goto <bb 8>;
  else
    goto <bb 9>;

  <bb 8>:

  <bb 9>:
  # atemp_lsm.3_9 = PHI <atemp_lsm.3_7(7), _3(8)>
  # atemp_lsm.4_28 = PHI <atemp_lsm.4_6(7), 1(8)>
  # k_lsm.5_26 = PHI <k_lsm.5_27(7), i_29(8)>
  i_20 = i_29 + 1;
  if (_16 < i_20)
    goto <bb 11>;
  else
    goto <bb 10>;

Following if-conversion, the PHIs in <bb 9> have been converted into
conditional expressions in <bb 7>:

  <bb 7>:
  # i_29 = PHI <i_20(8), 1(6)>
  # atemp_lsm.3_7 = PHI <atemp_lsm.3_9(8), 0.0(6)>
  # atemp_lsm.4_6 = PHI <atemp_lsm.4_28(8), 0(6)>
  # k_lsm.5_27 = PHI <k_lsm.5_26(8), k_lsm.5_38(6)>
  _1 = (integer(kind=8)) i_29;
  _2 = _1 + -1;
  _3 = *a_17(D)[_2];
  _4 = ABS_EXPR <_3>;
  atemp_lsm.3_9 = _4 > atemp_lsm.3_7 ? _3 : atemp_lsm.3_7;
  atemp_lsm.4_28 = _4 > atemp_lsm.3_7 ? 1 : atemp_lsm.4_6;
  k_lsm.5_26 = _4 > atemp_lsm.3_7 ? i_29 : k_lsm.5_27;
  i_20 = i_29 + 1;
  if (_16 < i_20)
    goto <bb 9>;
  else
    goto <bb 8>;

Types of the vars in the converted expressions are:

  integer(kind=4) k_lsm.5;
  logical(kind=4) atemp_lsm.4;
  real(kind=8) atemp_lsm.3;

The vectorizer is unable to vectorize the loop (unsupported pattern), so these
conditionals stay in place until expand time. The first of these corresponds
to a floating-point select statement, so it is fine.
But the other two perform floating-point comparisons to select between either
integer or logical values, and there is no such instruction for POWER. The
resulting code is (one iteration of an unrolled loop):

.L20:
        addi 8,3,1
        extsw 10,10
        extsw 3,8
        addi 4,4,8
.L42:
        lfd 2,0(4)
        fabs 3,2
        fcmpu 7,3,6
        fsub 4,6,3
        fsel 5,4,6,2
        ble 7,.L23
        li 9,1
.L23:
        fcmpu 0,3,6
        rldicl 9,9,0,32
        ble 0,.L24
        mr 10,3
.L24:

We didn't use to if-convert these prior to r235436
(https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/tree-if-conv.c?r1=235436&r2=235435&pathrev=235436).
Using GCC 6.2, we see the following preferable code:

.L46:
        addi 10,10,1
        addi 8,8,8
        extsw 10,10
.L37:
        lfd 3,0(8)
        fabs 4,3
        fcmpu 5,4,0
        ble 5,.L47
        fmr 12,3
        fmr 0,3
        mr 3,10
        li 4,1
        li 6,1
.L47:

The added if-conversion causes approximately 30% degradation in performance.
(I am not specifically blaming r235436; this just exposed the problem for this
particular case.)
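
For anyone who wants to experiment outside Fortran, the two shapes of the inner loop can be sketched in C. This is only an illustrative analogue, not the reproducer from this report: the names scan_result, scan_branch, scan_select and abs_f64 are hypothetical, the flag field merely mirrors the atemp_lsm.4 store-motion flag seen in the dumps, and k is initialized to 0 here as an assumption (the Fortran leaves it undefined on entry). The first function keeps a single guarded block, as in the pre-if-conversion GIMPLE; the second spells out the three selects that share one condition, as in the post-if-conversion GIMPLE.

```c
/* Hypothetical container for the three loop-carried values. */
struct scan_result {
    double atemp; /* mirrors atemp_lsm.3 */
    int flag;     /* mirrors atemp_lsm.4 (logical in the dump) */
    int k;        /* mirrors k_lsm.5; initial 0 is an assumption */
};

/* Hand-written absolute value, standing in for ABS_EXPR. */
double abs_f64(double v)
{
    return v < 0.0 ? -v : v;
}

/* Branching form: one comparison guards all three updates. */
struct scan_result scan_branch(const double *a, int n)
{
    struct scan_result r = { 0.0, 0, 0 };
    for (int i = 1; i <= n; i++) {
        double v = a[i - 1];
        if (abs_f64(v) > r.atemp) {
            r.atemp = v;
            r.flag = 1;
            r.k = i;
        }
    }
    return r;
}

/* If-converted form: the same condition appears in three selects. */
struct scan_result scan_select(const double *a, int n)
{
    struct scan_result r = { 0.0, 0, 0 };
    for (int i = 1; i <= n; i++) {
        double v = a[i - 1];
        double av = abs_f64(v);
        double old = r.atemp; /* all selects compare against the old value */
        r.atemp = av > old ? v : old;
        r.flag  = av > old ? 1 : r.flag;
        r.k     = av > old ? i : r.k;
    }
    return r;
}
```

Both functions should return identical results for any input. Whether a given GCC version and target produce the redundant comparisons for the C form is not something this report establishes, so treat the sketch as a starting point for inspecting dumps rather than a confirmed test case.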