On Thu, Aug 11, 2016 at 11:38 AM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Thu, Aug 11, 2016 at 11:56 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Thu, Aug 11, 2016 at 10:50 AM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On Wed, Aug 10, 2016 at 5:58 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>>>> Hi,
>>>> Due to some reasons, tree-if-conv.c now factors floating point comparison
>>>> out of cond_expr, resulting in mixed types in it.  This does help CSE on
>>>> common comparison operations.
>>>> Only problem is that test gcc.dg/vect/pr56541.c now requires
>>>> vect_cond_mixed to be vectorized.  This patch changes the test in that way.
>>>> Test result checked.  Is it OK?
>>>
>>> Hmm, I think the fix is to fix if-conversion not doing that.  Can you
>>> track down why this happens?
>> Hmm, but there are several common floating comparison operations in
>> the case, by doing this, we could do CSE on GIMPLE, otherwise we
>> depends on RTL optimizers.
>
> I see.
>
>> I thought we prefer GIMPLE level
>> transforms?
>
> Yes, but the vectorizer is happier with the conditions present in the
> COND_EXPR and thus we concluded we always want to have them there.
> forwprop will also aggressively put them back.  Note that we cannot put
> back tree_could_throw_p conditions (FP compares with signalling nans for
> example) to properly model EH (though for VEC_COND_EXPRs we don't really
> care here).
>
> Note that nothing between if-conversion and vectorization will perform
> the desired CSE anyway.

Hi Richard,
I looked into this one and found that it is not if-conversion that factors
the comparison out of the cond_expr.  For the test case:

  for (i=0; i!=1024; ++i)
    {
      float rR = a*z[i];
      float rL = b*z[i];
      float rMin = (rR<rL) ? rR : rL;
      float rMax = (rR<rL) ? rL : rR;
      rMin = (rMax>0) ? rMin : rBig;
      rMin = (rMin>0) ? rMin : rMax;
      ok[i] = rMin-c<rMax+d;
    }

The dump before jump threading looks like:

  <bb 7>:
  # iftmp.3_12 = PHI <rL_18(5), rR_17(6)>
  if (iftmp.3_12 > 0.0)
    goto <bb 9>;
  else
    goto <bb 8>;

  <bb 8>:

  <bb 9>:
  # iftmp.4_13 = PHI <iftmp.2_11(7), 1.5e+2(8)>
  if (iftmp.4_13 > 0.0)
    goto <bb 11>;
  else
    goto <bb 10>;

  <bb 10>:

  <bb 11>:
  # iftmp.5_14 = PHI <iftmp.4_13(9), iftmp.3_12(10)>

Jump threading in the dom pass threads edges (bb7 -> bb8 -> ... bb11) to
(bb6 -> bb12 -> bb9) as below:

  <bb 6>:
  # iftmp.3_12 = PHI <rL_18(4), rR_17(5), rL_18(11)>
  # iftmp.2_23 = PHI <iftmp.2_11(4), iftmp.2_11(5), iftmp.2_10(11)>
  if (iftmp.3_12 > 0.0)
    goto <bb 7>;
  else
    goto <bb 12>;

  <bb 7>:
  # iftmp.4_13 = PHI <iftmp.2_23(6)>
  if (iftmp.4_13 > 0.0)
    goto <bb 9>;
  else
    goto <bb 8>;

  <bb 8>:

  <bb 9>:
  # iftmp.5_14 = PHI <iftmp.4_13(7), iftmp.3_12(8), 1.5e+2(12)>
  //...

  <bb 12>:
  # iftmp.4_22 = PHI <1.5e+2(6)>
  goto <bb 9>;

This transform saves one comparison on the threaded path, but it creates a
multi-argument PHI, which results in the comparison being factored out of
the cond_expr.  It looks like threading destroys the vectorization
opportunity on targets that don't support vect_cond_mixed, but I guess that
is hard to tell within threading itself.  Any ideas?

Thanks,
bin
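
For reference, here is a rough scalar C sketch of the two shapes being discussed: the loop body with each comparison embedded in its select, and the same body with the comparisons factored out into separate boolean temporaries.  This is only a source-level analogy, not what GCC generates.  The function names (body_embedded, body_factored, shapes_agree) and the bool temporaries are made up for illustration, and the sketch assumes that a bool condition selecting float data is roughly what the "mixed types" remark refers to.

```c
#include <stdbool.h>

/* Shape A (hypothetical name): every select carries its own comparison,
   which is the form the vectorizer is said to prefer.  */
bool body_embedded(float rR, float rL, float rBig, float c, float d)
{
    float rMin = (rR < rL) ? rR : rL;
    float rMax = (rR < rL) ? rL : rR;
    rMin = (rMax > 0) ? rMin : rBig;
    rMin = (rMin > 0) ? rMin : rMax;
    return rMin - c < rMax + d;
}

/* Shape B (hypothetical name): comparisons are computed once into bool
   temporaries and the selects only consume them.  The repeated rR < rL
   is shared, but each select now has a bool condition and float data.  */
bool body_factored(float rR, float rL, float rBig, float c, float d)
{
    bool lt = rR < rL;            /* shared by the min and max selects */
    float rMin = lt ? rR : rL;
    float rMax = lt ? rL : rR;
    bool max_pos = rMax > 0;
    rMin = max_pos ? rMin : rBig;
    bool min_pos = rMin > 0;
    rMin = min_pos ? rMin : rMax;
    return rMin - c < rMax + d;
}

/* Returns 1 if both shapes agree on a small grid of inputs, else 0.
   rBig is set to 150, matching the 1.5e+2 constant in the dumps.  */
int shapes_agree(void)
{
    static const float vals[] = { -2.0f, -0.5f, 0.0f, 0.5f, 2.0f };
    const int n = (int) (sizeof vals / sizeof vals[0]);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k)
                if (body_embedded(vals[i], vals[j], 150.0f, vals[k], vals[k])
                    != body_factored(vals[i], vals[j], 150.0f, vals[k], vals[k]))
                    return 0;
    return 1;
}
```

Both functions compute the same result; the difference is only where the comparisons live, which is what matters for whether the target needs vect_cond_mixed.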