On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Hi Richard, > > I resend you patch1 and patch2 with minor changes: > 1. I renamed flag_force_vectorize to aggressive_if_conv. > 2. Use static cast for the first argument of gimple_phi_arg_edge. > I also very sorry that I sent you bad patch. > > Now let me answer on your questions related to second patch. > 1. Why we need both predicate_extended_scalar_phi and > predicate_arbitrary_scalar_phi? > > Let's consider the following simple test-case: > > #pragma omp simd safelen(8) > for (i=0; i<512; i++) > { > float t = a[i]; > if (t > 0.0f & t < 1.0e+17f) > if (c[i] != 0) /* c is integer array. */ > res += 1; > } > > we can see the following phi node correspondent to res: > > # res_1 = PHI <res_15(3), res_15(4), res_10(5)> > > It is clear that we can optimize it to phi node with 2 arguments only > and only one check can be used for phi predication (for reduction in > our case), namely predicate of bb_5. In general case we can't do it > even if we sort all phi argument values since we still have to produce > a chain of cond expressions to perform phi predication (see comments > for predicate_arbitrary_scalar_phi).
How so? We can always use !(condition) for the "last" value, thus treat it as an 'else' case. That even works for # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)> where the condition for edges 5 and 7 can be computed as ! (condition for 3 || condition for 4). Of course it is worthwhile to also sort single-occurances first so your case gets just the condiiton for edge 5 and its inversion used for edges 3 and 4 combined. > 2. Why we need to introduce find_insertion_point? > Let's consider another test-case extracted from 175.vpr ( t5.c is > attached) and we can see that bb_7 and bb_9 containig phi nodes has > only critical incoming edges and both contain code computing edge > predicates, e.g. > > <bb 7>: > # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> > _46 = xmax_17 == xmax_37; > _47 = xmax_17 == xmax_27; > _48 = _46 & _47; > _53 = xmax_17 == xmax_37; > _54 = ~_53; > _55 = xmax_17 == xmax_27; > _56 = _54 & _55; > _57 = _48 | _56; > xmax_edge_19 = xmax_edge_39 + 1; > goto <bb 11>; > > It is evident that we can not put phi predication at the block > beginning but need to put it after predicate computations. > Note also that if there are no critical edges for phi arguments > insertion point will be "after labels" Note also that phi result can > have use in this block too, so we can't put predication code to the > block end. So the issue is that predicate insertion for edge predicates does not happen on the edge but somewhere else (generally impossible for critical edges unless you split them). I think I've told you before that I prefer simple solutions to such issues, like splitting the edge! Certainly not involving a function walking GENERIC expressions. Thanks, Richard. > Let me know if you still have any questions. > > Best regards. > Yuri. > > > > > 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>> Hi All, >>> >>> Here is the second patch related to extended predication. >>> Few comments which explain a main goal of design. >>> >>> 1. I don't want to insert any critical edge splitting since it may >>> lead to less efficient binaries. >>> 2. One special case of extended PHI node predication was introduced >>> when #arguments is more than 2 but only two arguments are different >>> and one argument has the only occurrence. For such PHI conditional >>> scalar reduction is applied. >>> This is correspondent to the following statement: >>> if (q1 && q2 && q3) var++ >>> New function phi_has_two_different_args was introduced to detect such phi. >>> 3. Original algorithm for PHI predication used assumption that at >>> least one incoming edge for blocks containing PHI is not critical - it >>> guarantees that all computations related to predicate of normal edge >>> are already inserted above this block and >>> code related to PHI predication can be inserted at the beginning of >>> block. But this is not true for critical edges for which predicate >>> computations are in the block where code for phi predication must be >>> inserted. So new function find_insertion_point is introduced which is >>> simply found out the last statement in block defining predicates >>> correspondent to all incoming edges and insert phi predication code >>> after it (with some minor exceptions). >> >> Unfortunately the patch doesn't apply for me - I get >> >> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@ >> predicate_all_scalar_phis (struct loop *loop) >> >> a few remarks nevertheless. I don't see how we need both >> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi. >> Couldn't we simply sort an array of (edge, value) pairs after value >> and handle equal values specially in predicate_extended_scalar_phi? >> That would even make PHI <a, a, b, c, c> more optimal. >> >> I don't understand the need for find_insertion_point. All SSA names >> required for the predicates are defined upward - and the complex CFG >> is squashed to a single basic-block, thus the defs will dominate the >> inserted code if you insert after labels just like for the other case. >> Or what am I missing? ("flattening" of the basic-blocks of course needs >> to happen in dominator order - but I guess that happens already?) >> >> I'd like the extended PHI handling to be enablable by a flag even >> for !force-vectorization - I've seen cases with 3 PHI args multiple >> times that would have been nice to vectorize. I suggest to >> add -ftree-loop-if-convert-aggressive for this. We can do this as >> followup, but please rename the local flag_force_vectorize flag >> to something less looking like a flag, like simply 'aggressive'. >> >> Otherwise patch 2 looks ok to me. >> >> Richard. >> >> >>> ChangeLog: >>> >>> 2014-10-24 Yuri Rumyantsev <ysrum...@gmail.com> >>> >>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use >>> FLAG_FORCE_VECTORIZE instead of loop flag. >>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if >>> FLAG_FORCE_VECTORIZE is true. >>> (if_convertible_bb_p): Delete check that bb has at least one >>> non-critical incoming edge. >>> (phi_has_two_different_args): New function. >>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access >>> to phi arguments. Invoke phi_has_two_different_args to get phi >>> arguments if EXTENDED is true. Change check that block >>> containing reduction statement candidate is predecessor >>> of phi-block since phi may have more than two arguments. >>> (convert_scalar_cond_reduction): Add argument BEFORE to insert >>> statement before/after gsi point. >>> (predicate_scalar_phi): Add argument false (which means non-extended >>> predication) to call of is_cond_scalar_reduction. Add argument >>> true (which correspondent to argument BEFORE) to call of >>> convert_scalar_cond_reduction. >>> (get_predicate_for_edge): New function. >>> (predicate_arbitrary_scalar_phi): New function. >>> (predicate_extended_scalar_phi): New function. >>> (find_insertion_point): New function. >>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and >>> BEFORE. Initialize EXTENDED to true if BB containing phi has more >>> than 2 predecessors or both incoming edges are critical. Invoke >>> find_phi_replacement_condition and predicate_scalar_phi or >>> find_insertion_point and predicate_extended_scalar_phi depending on >>> EXTENDED value. >>> (insert_gimplified_predicates): Add check that non-predicated block >>> may have statements to insert. Insert predicate of BB just after label >>> if FLAG_FORCE_VECTORIZE is true. >>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which >>> is copy of inner or outer loop field force_vectorize.